[jira] [Closed] (HAWQ-1529) "segment resource manager" will NOT exit when postmaster died
[ https://issues.apache.org/jira/browse/HAWQ-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kuien Liu closed HAWQ-1529.
---------------------------
    Resolution: Fixed
    Fix Version/s: backlog

https://github.com/apache/incubator-hawq/pull/1290

> "segment resource manager" will NOT exit when postmaster died
> -------------------------------------------------------------
>
>                 Key: HAWQ-1529
>                 URL: https://issues.apache.org/jira/browse/HAWQ-1529
>             Project: Apache HAWQ
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Kuien Liu
>            Assignee: Radar Lei
>             Fix For: backlog
>
>
> If I send SIGKILL to postmaster of segment by 'kill -9', then postmaster
> dies, BUT "segment resource manager" and "logger process" are still alive and
> flushing "WARNING" each 30s.
> To my understanding, "logger process" is waiting for "segment resource
> manager", but the resource manager will not detect the alive-status of
> postmaster and continue waiting. Does it make sense? Why not quit in case of
> postmaster gone?
> The call stack of RM when postmaster is killed:
> #0 0x7f19023ccab6 in poll () from /lib64/libc.so.6
> #1 0x00a48c9e in processAllCommFileDescs () at rmcomm_AsyncComm.c:156
> #2 0x00a8ce5e in MainHandlerLoop_RMSEG () at resourcemanager_RMSEG.c:166
> #3 0x00a8cba3 in ResManagerMainSegment2ndPhase () at resourcemanager_RMSEG.c:71
> #4 0x00a8d966 in ResManagerMain (argc=0x3, argv=0x7fffa018b890) at resourcemanager.c:346
> #5 0x00a8db45 in ResManagerProcessStartup () at resourcemanager.c:411
> #6 0x00899b89 in CommenceNormalOperations () at postmaster.c:3673
> #7 0x0089a562 in do_reaper () at postmaster.c:4021
> #8 0x008969bb in ServerLoop () at postmaster.c:2136
> #9 0x00895a78 in PostmasterMain (argc=0xc, argv=0x229a730) at postmaster.c:1454
> #10 0x007b185d in main (argc=0xc, argv=0x229a730) at main.c:226
> #11 0x7f190231e994 in __libc_start_main () from /lib64/libc.so.6
> #12 0x004bde89 in _start ()

--
This message was sent by Atlassian JIRA (v6.4.14#64029)
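The behaviour the report asks for, a worker loop that stops once its parent process is gone, can be sketched as follows. This is only an illustration under stated assumptions, not HAWQ's implementation: `parent_is_alive()` and `run_main_loop()` are hypothetical names, and the liveness test simply compares `getppid()` with the parent PID recorded at startup (on POSIX systems an orphaned child is re-parented, so the value changes).

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical stand-in for a postmaster liveness check: the parent is
 * considered alive while getppid() still returns the PID seen at startup.
 */
static bool parent_is_alive(pid_t parent_at_startup)
{
    return getppid() == parent_at_startup;
}

/*
 * Hypothetical main loop: runs at most max_iterations rounds and stops as
 * soon as the parent is gone. Returns the number of completed rounds.
 */
static int run_main_loop(pid_t parent_at_startup, int max_iterations)
{
    int completed = 0;
    bool keep_running = true;

    assert(max_iterations >= 0);
    while (keep_running && completed < max_iterations) {
        if (!parent_is_alive(parent_at_startup)) {
            keep_running = false;   /* parent died: leave the loop */
            break;
        }
        /* A real loop would poll() its descriptors with a timeout here. */
        completed++;
    }
    return completed;
}
```

Because the check runs at the top of every round, a loop that polls with a finite timeout notices the missing parent within one timeout period instead of waiting indefinitely.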
[jira] [Commented] (HAWQ-1529) "segment resource manager" will NOT exit when postmaster died
[ https://issues.apache.org/jira/browse/HAWQ-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16180172#comment-16180172 ]

Kuien Liu commented on HAWQ-1529:
---------------------------------

https://github.com/apache/incubator-hawq/pull/1290 has been merged.

> "segment resource manager" will NOT exit when postmaster died
> -------------------------------------------------------------
>
>                 Key: HAWQ-1529
>                 URL: https://issues.apache.org/jira/browse/HAWQ-1529
>             Project: Apache HAWQ
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Kuien Liu
>            Assignee: Radar Lei
>
> If I send SIGKILL to postmaster of segment by 'kill -9', then postmaster
> dies, BUT "segment resource manager" and "logger process" are still alive and
> flushing "WARNING" each 30s.
> To my understanding, "logger process" is waiting for "segment resource
> manager", but the resource manager will not detect the alive-status of
> postmaster and continue waiting. Does it make sense? Why not quit in case of
> postmaster gone?
> The call stack of RM when postmaster is killed:
> #0 0x7f19023ccab6 in poll () from /lib64/libc.so.6
> #1 0x00a48c9e in processAllCommFileDescs () at rmcomm_AsyncComm.c:156
> #2 0x00a8ce5e in MainHandlerLoop_RMSEG () at resourcemanager_RMSEG.c:166
> #3 0x00a8cba3 in ResManagerMainSegment2ndPhase () at resourcemanager_RMSEG.c:71
> #4 0x00a8d966 in ResManagerMain (argc=0x3, argv=0x7fffa018b890) at resourcemanager.c:346
> #5 0x00a8db45 in ResManagerProcessStartup () at resourcemanager.c:411
> #6 0x00899b89 in CommenceNormalOperations () at postmaster.c:3673
> #7 0x0089a562 in do_reaper () at postmaster.c:4021
> #8 0x008969bb in ServerLoop () at postmaster.c:2136
> #9 0x00895a78 in PostmasterMain (argc=0xc, argv=0x229a730) at postmaster.c:1454
> #10 0x007b185d in main (argc=0xc, argv=0x229a730) at main.c:226
> #11 0x7f190231e994 in __libc_start_main () from /lib64/libc.so.6
> #12 0x004bde89 in _start ()

--
This message was sent by Atlassian JIRA (v6.4.14#64029)
[GitHub] incubator-hawq pull request #1290: HAWQ-1529. Fix segment resource manager h...
Github user kuien closed the pull request at: https://github.com/apache/incubator-hawq/pull/1290 ---
[jira] [Commented] (HAWQ-1480) Packing a core file in hawq
[ https://issues.apache.org/jira/browse/HAWQ-1480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16180029#comment-16180029 ]

ASF GitHub Bot commented on HAWQ-1480:
--------------------------------------

Github user janebeckman commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq-docs/pull/123#discussion_r140936494

    --- Diff: markdown/admin/packcore.html.md.erb ---
    @@ -0,0 +1,51 @@
    +---
    +title: HAWQ packcore utility
    +---
    +
    +
    +
    +## Core file
    +
    +Core file is a disk file that records the image of a process' memory in case the process crashes or terminates abruptly. This image can be later used to debug the state of process at the time when it was terminated. This information can be very useful to debug the cause failure.
    +
    +## Packcore
    +
    +Packcore utility helps in packing a core file with its context – executable, application and system shared libraries from the current environment. This information can be later unpacked on a different system and can be used for debugging. Packcore takes a core file, extracts the name of the binary which generated the core and executes `ldd` (List Dynamic Dependencies) to get the required information into a single tar archive.
    +
    +### Using packcore
    +
    +The packcore utility is located under `${GPHOME}/sbin`. Following are the options for packing a core file:
    +
    +```shell
    +$GPHOME/sbin/packcore
    +
    +or
    +
    +$GPHOME/sbin/packcore -b|--binary $GPHOME/bin/postgres
    +```
    +
    +Alternatively, you can navigate to `$GPHOME/sbin` and run the following:
    +
    +```shell
    +./packcore
    +
    +or
    +
    +./packcore -b|--binary $GPHOME/bin/postgres
    +```
    +Once finished the utility will create a tar file named `packcore-.tgz`. This file can be later unpacked on another system and used for debugging.
    --- End diff --

    When processing is completed, the utility creates a tar file in the format: `packcore-.tgz`. Unpack this file on another system to use it for debugging.

> Packing a core file in hawq
> ---------------------------
>
>                 Key: HAWQ-1480
>                 URL: https://issues.apache.org/jira/browse/HAWQ-1480
>             Project: Apache HAWQ
>          Issue Type: Improvement
>          Components: Command Line Tools
>            Reporter: Shubham Sharma
>            Assignee: Radar Lei
>             Fix For: 2.3.0.0-incubating
>
>
> Currently there is no way to packing a core file with its context –
> executable, application and system shared libraries in hawq. This information
> can be later unpacked on another system and helps in debugging. It is a
> useful feature to quickly gather all the data needed from a crash/core
> generated on the system to analyze it later.
> Another open source project, greenplum, uses a script
> [https://github.com/greenplum-db/gpdb/blob/master/gpMgmt/sbin/packcore] to
> collect this information. Tested this script against Hawq's installation and
> it collects the required information needed for debug.
> Can this be merged into Hawq, if yes, I can submit a pull request and test it.

--
This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HAWQ-1480) Packing a core file in hawq
[ https://issues.apache.org/jira/browse/HAWQ-1480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16180025#comment-16180025 ] ASF GitHub Bot commented on HAWQ-1480: -- Github user janebeckman commented on a diff in the pull request: https://github.com/apache/incubator-hawq-docs/pull/123#discussion_r140936073 --- Diff: markdown/admin/packcore.html.md.erb --- @@ -0,0 +1,51 @@ +--- +title: HAWQ packcore utility +--- + + + +## Core file + +Core file is a disk file that records the image of a process' memory in case the process crashes or terminates abruptly. This image can be later used to debug the state of process at the time when it was terminated. This information can be very useful to debug the cause failure. + +## Packcore + +Packcore utility helps in packing a core file with its context – executable, application and system shared libraries from the current environment. This information can be later unpacked on a different system and can be used for debugging. Packcore takes a core file, extracts the name of the binary which generated the core and executes `ldd` (List Dynamic Dependencies) to get the required information into a single tar archive. + +### Using packcore + +The packcore utility is located under `${GPHOME}/sbin`. Following are the options for packing a core file: --- End diff -- Run one of the following commands to pack a core file: > Packing a core file in hawq > --- > > Key: HAWQ-1480 > URL: https://issues.apache.org/jira/browse/HAWQ-1480 > Project: Apache HAWQ > Issue Type: Improvement > Components: Command Line Tools >Reporter: Shubham Sharma >Assignee: Radar Lei > Fix For: 2.3.0.0-incubating > > > Currently there is no way to packing a core file with its context – > executable, application and system shared libraries in hawq. This information > can be later unpacked on another system and helps in debugging. 
It is a > useful feature to quickly gather all the data needed from a crash/core > generated on the system to analyze it later. > Another open source project, greenplum, uses a script > [https://github.com/greenplum-db/gpdb/blob/master/gpMgmt/sbin/packcore] to > collect this information. Tested this script against Hawq's installation and > it collects the required information needed for debug. > Can this be merged into Hawq, if yes, I can submit a pull request and test it. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HAWQ-1480) Packing a core file in hawq
[ https://issues.apache.org/jira/browse/HAWQ-1480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16180006#comment-16180006 ] ASF GitHub Bot commented on HAWQ-1480: -- Github user janebeckman commented on a diff in the pull request: https://github.com/apache/incubator-hawq-docs/pull/123#discussion_r140933529 --- Diff: markdown/admin/packcore.html.md.erb --- @@ -0,0 +1,51 @@ +--- +title: HAWQ packcore utility +--- + + + +## Core file + +Core file is a disk file that records the image of a process' memory in case the process crashes or terminates abruptly. This image can be later used to debug the state of process at the time when it was terminated. This information can be very useful to debug the cause failure. --- End diff -- A core file is a disk file that records the image of a process' memory in case the process crashes or terminates abruptly. The information in this image is useful for debugging the state of a process at the time when it was terminated. > Packing a core file in hawq > --- > > Key: HAWQ-1480 > URL: https://issues.apache.org/jira/browse/HAWQ-1480 > Project: Apache HAWQ > Issue Type: Improvement > Components: Command Line Tools >Reporter: Shubham Sharma >Assignee: Radar Lei > Fix For: 2.3.0.0-incubating > > > Currently there is no way to packing a core file with its context – > executable, application and system shared libraries in hawq. This information > can be later unpacked on another system and helps in debugging. It is a > useful feature to quickly gather all the data needed from a crash/core > generated on the system to analyze it later. > Another open source project, greenplum, uses a script > [https://github.com/greenplum-db/gpdb/blob/master/gpMgmt/sbin/packcore] to > collect this information. Tested this script against Hawq's installation and > it collects the required information needed for debug. > Can this be merged into Hawq, if yes, I can submit a pull request and test it. 
-- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HAWQ-1480) Packing a core file in hawq
[ https://issues.apache.org/jira/browse/HAWQ-1480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16180005#comment-16180005 ] ASF GitHub Bot commented on HAWQ-1480: -- Github user janebeckman commented on a diff in the pull request: https://github.com/apache/incubator-hawq-docs/pull/123#discussion_r140933386 --- Diff: markdown/admin/packcore.html.md.erb --- @@ -0,0 +1,51 @@ +--- +title: HAWQ packcore utility +--- + + + +## Core file + +Core file is a disk file that records the image of a process' memory in case the process crashes or terminates abruptly. This image can be later used to debug the state of process at the time when it was terminated. This information can be very useful to debug the cause failure. + +## Packcore + +Packcore utility helps in packing a core file with its context – executable, application and system shared libraries from the current environment. This information can be later unpacked on a different system and can be used for debugging. Packcore takes a core file, extracts the name of the binary which generated the core and executes `ldd` (List Dynamic Dependencies) to get the required information into a single tar archive. --- End diff -- The Packcore utility helps pack a core file with its context, including the executable, application, and shared system libraries from the current environment. This information can be unpacked for later debugging on a different system. Packcore extracts the name of the binary that generated the core from the core file, then executes `ldd` (List Dynamic Dependencies) to create a single tar archive containing the core file information. 
> Packing a core file in hawq > --- > > Key: HAWQ-1480 > URL: https://issues.apache.org/jira/browse/HAWQ-1480 > Project: Apache HAWQ > Issue Type: Improvement > Components: Command Line Tools >Reporter: Shubham Sharma >Assignee: Radar Lei > Fix For: 2.3.0.0-incubating > > > Currently there is no way to packing a core file with its context – > executable, application and system shared libraries in hawq. This information > can be later unpacked on another system and helps in debugging. It is a > useful feature to quickly gather all the data needed from a crash/core > generated on the system to analyze it later. > Another open source project, greenplum, uses a script > [https://github.com/greenplum-db/gpdb/blob/master/gpMgmt/sbin/packcore] to > collect this information. Tested this script against Hawq's installation and > it collects the required information needed for debug. > Can this be merged into Hawq, if yes, I can submit a pull request and test it. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (HAWQ-1531) Templatize pxf-private.classpath and pxf-log4j.properties
Oleksandr Diachenko created HAWQ-1531:
-----------------------------------------

             Summary: Templatize pxf-private.classpath and pxf-log4j.properties
                 Key: HAWQ-1531
                 URL: https://issues.apache.org/jira/browse/HAWQ-1531
             Project: Apache HAWQ
          Issue Type: Improvement
          Components: PXF
            Reporter: Oleksandr Diachenko
            Assignee: Ed Espino
             Fix For: 2.3.0.0-incubating

Users should be able to initialize a PXF instance with a given HADOOP_HOME, HIVE_HOME, HBASE_HOME, log location, etc.

--
This message was sent by Atlassian JIRA (v6.4.14#64029)
[GitHub] incubator-hawq issue #1290: HAWQ-1529. Fix segment resource manager hang whe...
Github user jianlirong commented on the issue: https://github.com/apache/incubator-hawq/pull/1290 LGTM, +1 ---
[jira] [Created] (HAWQ-1530) Illegally killing a JDBC select query causes locking problems
Grant Krieger created HAWQ-1530:
-----------------------------------

             Summary: Illegally killing a JDBC select query causes locking problems
                 Key: HAWQ-1530
                 URL: https://issues.apache.org/jira/browse/HAWQ-1530
             Project: Apache HAWQ
          Issue Type: Bug
          Components: Transaction
            Reporter: Grant Krieger
            Assignee: Radar Lei

Hi,

When you run a long-running SELECT statement that joins two HAWQ tables from JDBC and forcibly kill the JDBC client (Ctrl+Alt+Del) before the query completes, the two tables remain locked even after the query completes on the server. The lock is visible in pg_locks.

The query cannot be killed with SELECT pg_terminate_backend(393937). The only way to get rid of the lock is to kill -9 the process from Linux or to restart HAWQ, but this can kill other things as well.

The JDBC client I am using is Aqua Data Studio.

I can provide exact steps to reproduce if required.

Thank you,
Grant

--
This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Closed] (HAWQ-1193) TDE support in HAWQ
[ https://issues.apache.org/jira/browse/HAWQ-1193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hongxu Ma closed HAWQ-1193.
---------------------------
    Resolution: Fixed

finished

> TDE support in HAWQ
> -------------------
>
>                 Key: HAWQ-1193
>                 URL: https://issues.apache.org/jira/browse/HAWQ-1193
>             Project: Apache HAWQ
>          Issue Type: New Feature
>          Components: libhdfs
>            Reporter: Hongxu Ma
>            Assignee: Hongxu Ma
>             Fix For: backlog
>
>         Attachments: HAWQ_TDE_Design_ver0.1.pdf, HAWQ_TDE_Design_ver0.2 .pdf
>
>
> TDE (transparent data encryption) has been supported since Hadoop 2.6:
> http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/TransparentEncryption.html
> https://issues.apache.org/jira/browse/HDFS-6134
> Using TDE guarantees that:
> 1. HDFS files are encrypted.
> 2. Network transfer between HDFS and the libhdfs client is encrypted.
> So HAWQ will update libhdfs3 to support it.

--
This message was sent by Atlassian JIRA (v6.4.14#64029)
[GitHub] incubator-hawq issue #1290: HAWQ-1529. Fix segment resource manager hang whe...
Github user linwen commented on the issue: https://github.com/apache/incubator-hawq/pull/1290 LGTM, +1 ---
[GitHub] incubator-hawq pull request #1290: HAWQ-1529. Fix segment resource manager h...
GitHub user kuien opened a pull request:

    https://github.com/apache/incubator-hawq/pull/1290

    HAWQ-1529. Fix segment resource manager hang when postmaster died

    If PostmasterIsAlive() is implicitly declared, the caller compares the
    full 32-bit %eax rather than the 8-bit %al, but PostmasterIsAlive() only
    sets the lower 8 bits (because 'bool' is really a 'char'). As a result,
    the segment resource manager never exits after the postmaster dies.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/kuien/incubator-hawq rmseg

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-hawq/pull/1290.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1290

commit 010a493e483785a49cbfadd33bc62c1c1e4b5321
Author: Kuien Liu
Date:   2017-09-25T08:40:54Z

    HAWQ-1529. Fix segment resource manager hang when postmaster died

    Change-Id: I418f55bdbcc927bbe3b892d77fb99f5b60c1f1eb

---
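The register-width mismatch described in this pull request can be modelled in portable C. The sketch below does not reproduce the undefined behaviour itself; it only simulates it, under the assumption that the callee overwrites the low byte of a 32-bit return register and leaves the upper 24 bits stale. All function names here are hypothetical and do not come from the HAWQ source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Model of the 32-bit return register (%eax). A callee returning an 8-bit
 * 'bool' writes only the low byte (%al); the upper 24 bits keep whatever
 * value was there before the call.
 */
static uint32_t eax_after_bool_return(uint32_t stale_eax, bool result)
{
    return (stale_eax & 0xFFFFFF00u) | (result ? 1u : 0u);
}

/* Caller that saw the real prototype: it tests only the low byte. */
static bool caller_with_prototype_sees_dead(uint32_t eax)
{
    return (uint8_t) eax == 0;
}

/* Caller that assumed an implicit 'int' return: it tests all 32 bits. */
static bool caller_with_implicit_int_sees_dead(uint32_t eax)
{
    return eax == 0;
}

/* Self-check: the callee reports "dead" while the upper bits hold stale data. */
static void demo_stale_upper_bits(void)
{
    uint32_t eax = eax_after_bool_return(0x12345600u, false);

    assert(caller_with_prototype_sees_dead(eax));     /* exits as intended */
    assert(!caller_with_implicit_int_sees_dead(eax)); /* loops forever */
}
```

In this model the two callers agree only when the upper bits happen to be zero, which is consistent with the report that including the header with the real prototype makes the loop exit.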
[jira] [Closed] (HAWQ-1510) Add TDE-related functionality into hawq command line tools
[ https://issues.apache.org/jira/browse/HAWQ-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hongxu Ma closed HAWQ-1510.
---------------------------
    Resolution: Fixed

done

> Add TDE-related functionality into hawq command line tools
> ----------------------------------------------------------
>
>                 Key: HAWQ-1510
>                 URL: https://issues.apache.org/jira/browse/HAWQ-1510
>             Project: Apache HAWQ
>          Issue Type: Sub-task
>          Components: Command Line Tools
>            Reporter: Hongxu Ma
>            Assignee: Hongxu Ma
>             Fix For: 2.3.0.0-incubating
>
>
> 1, hawq init
> The only way to enable TDE in HAWQ:
> the user should give a key name (already created by the hadoop key command)
> as a parameter when executing the init command; this makes the whole
> hawq_default directory an encryption zone.
> Note:
> * Converting an existing (non-empty) hawq_default directory into an
> encryption zone is not supported.
> * Creating an encryption zone needs HDFS *superuser privilege*, so if the
> HAWQ user and the HDFS superuser are not the same, you should create the
> encryption zone on the HAWQ directory manually before running the hawq-init
> script. Example:
> {code}
> hdfs crypto -createZone -keyName key_demo -path /hawq_default/
> {code}
> Command:
> {code}
> hawq init cluster --tde_keyname key_demo
> {code}
> -2, hawq state-
> -show the encryption zone info if user enable tde in hawq.-
> -3, hawq register-
> Cannot register files in different encryption zones / unencrypted zones.
> -4, hawq extract-
> Give the user a warning that the table data is stored in an encryption zone
> if the user enables TDE in HAWQ.

--
This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HAWQ-1510) Add TDE-related functionality into hawq command line tools
[ https://issues.apache.org/jira/browse/HAWQ-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hongxu Ma updated HAWQ-1510: Description: 1, hawq init the only way to enable tde in hawq: user should give a key name(already created by hadoop key command) parameter when execuate the init command, it makes the whole hawq_default directory as an encryption zone. note: * cannot support transfer the existed(and non-empty) hawq_default directory into an encryption zone. * create encryption zone need hdfs *superuser privilege*, so if hawq user and hdfs superuser is not the same one, you should create the encryption zone on hawq directory manually before running hawq-init script, example: {code} hdfs crypto -createZone -keyName key_demo -path /hawq_default/ {code} command: {code} hawq init cluster --tde_keyname key_demo {code} -2, hawq state- -show the encryption zone info if user enable tde in hawq.- -3, hawq register- cannot register file in different encryption zones / un-encryption zones. -4, hawq extract- give user a warning of the table data is stored in encryption zone if user enable tde in hawq. was: 1, hawq init the only way to enable tde in hawq: user should give a key name(already created by hadoop key command) parameter when execuate the init command, it makes the whole hawq_default directory as an encryption zone. note: * cannot support transfer the existed(and non-empty) hawq_default directory into an encryption zone. * create encryption zone need hdfs *superuser privilege*, so if hawq user and hdfs superuser is not the same one, you should create the encryption zone on hawq directory manually before running hawq-init script, example: {code} hdfs crypto -createZone -keyName key_demo -path /hawq_default/ {code} command: {code} hawq init cluster --tde_keyname key_demo {code} -2, hawq state- -show the encryption zone info if user enable tde in hawq.- 3, hawq register cannot register file in different encryption zones / un-encryption zones. 
4, hawq extract give user a warning of the table data is stored in encryption zone if user enable tde in hawq. > Add TDE-related functionality into hawq command line tools > -- > > Key: HAWQ-1510 > URL: https://issues.apache.org/jira/browse/HAWQ-1510 > Project: Apache HAWQ > Issue Type: Sub-task > Components: Command Line Tools >Reporter: Hongxu Ma >Assignee: Hongxu Ma > Fix For: 2.3.0.0-incubating > > > 1, hawq init > the only way to enable tde in hawq: > user should give a key name(already created by hadoop key command) parameter > when execuate the init command, it makes the whole hawq_default directory as > an encryption zone. > note: > * cannot support transfer the existed(and non-empty) hawq_default directory > into an encryption zone. > * create encryption zone need hdfs *superuser privilege*, so if hawq user and > hdfs superuser is not the same one, you should create the encryption zone on > hawq directory manually before running hawq-init script, example: > {code} > hdfs crypto -createZone -keyName key_demo -path /hawq_default/ > {code} > command: > {code} > hawq init cluster --tde_keyname key_demo > {code} > -2, hawq state- > -show the encryption zone info if user enable tde in hawq.- > -3, hawq register- > cannot register file in different encryption zones / un-encryption zones. > -4, hawq extract- > give user a warning of the table data is stored in encryption zone if user > enable tde in hawq. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HAWQ-1529) "segment resource manager" will NOT exit when postmaster died
[ https://issues.apache.org/jira/browse/HAWQ-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16178730#comment-16178730 ]

Kuien Liu commented on HAWQ-1529:
---------------------------------

Without including pmsignal.h, the full 32 bits of the return value of PostmasterIsAlive() (that is, %eax) are used for the comparison; with it, only %al is used, which is what we want.

> "segment resource manager" will NOT exit when postmaster died
> -------------------------------------------------------------
>
>                 Key: HAWQ-1529
>                 URL: https://issues.apache.org/jira/browse/HAWQ-1529
>             Project: Apache HAWQ
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Kuien Liu
>            Assignee: Radar Lei
>
> If I send SIGKILL to postmaster of segment by 'kill -9', then postmaster
> dies, BUT "segment resource manager" and "logger process" are still alive and
> flushing "WARNING" each 30s.
> To my understanding, "logger process" is waiting for "segment resource
> manager", but the resource manager will not detect the alive-status of
> postmaster and continue waiting. Does it make sense? Why not quit in case of
> postmaster gone?
> The call stack of RM when postmaster is killed:
> #0 0x7f19023ccab6 in poll () from /lib64/libc.so.6
> #1 0x00a48c9e in processAllCommFileDescs () at rmcomm_AsyncComm.c:156
> #2 0x00a8ce5e in MainHandlerLoop_RMSEG () at resourcemanager_RMSEG.c:166
> #3 0x00a8cba3 in ResManagerMainSegment2ndPhase () at resourcemanager_RMSEG.c:71
> #4 0x00a8d966 in ResManagerMain (argc=0x3, argv=0x7fffa018b890) at resourcemanager.c:346
> #5 0x00a8db45 in ResManagerProcessStartup () at resourcemanager.c:411
> #6 0x00899b89 in CommenceNormalOperations () at postmaster.c:3673
> #7 0x0089a562 in do_reaper () at postmaster.c:4021
> #8 0x008969bb in ServerLoop () at postmaster.c:2136
> #9 0x00895a78 in PostmasterMain (argc=0xc, argv=0x229a730) at postmaster.c:1454
> #10 0x007b185d in main (argc=0xc, argv=0x229a730) at main.c:226
> #11 0x7f190231e994 in __libc_start_main () from /lib64/libc.so.6
> #12 0x004bde89 in _start ()

--
This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Comment Edited] (HAWQ-1529) "segment resource manager" will NOT exit when postmaster died
[ https://issues.apache.org/jira/browse/HAWQ-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16178608#comment-16178608 ] Kuien Liu edited comment on HAWQ-1529 at 9/25/17 8:31 AM: -- A possible patch looks strange but does work. {code:diff} --- a/src/backend/resourcemanager/resourcemanager_RMSEG.c +++ b/src/backend/resourcemanager/resourcemanager_RMSEG.c @@ -26,6 +26,7 @@ #include "communication/rmcomm_MessageServer.h" #include "communication/rmcomm_RMSEG2RM.h" #include "resourceenforcer/resourceenforcer.h" +#include "storage/pmsignal.h" /* PostmasterIsAlive */ #include "cdb/cdbtmpdir.h" int ResManagerMainSegment2ndPhase(void) {code} was (Author: kuien): A possible patch looks strange but does work. {code:diff} --- a/src/backend/resourcemanager/resourcemanager_RMSEG.c +++ b/src/backend/resourcemanager/resourcemanager_RMSEG.c @@ -26,6 +26,7 @@ #include "communication/rmcomm_MessageServer.h" #include "communication/rmcomm_RMSEG2RM.h" #include "resourceenforcer/resourceenforcer.h" +#include "storage/pmsignal.h" /* PostmasterIsAlive */ #include "cdb/cdbtmpdir.h" int ResManagerMainSegment2ndPhase(void) @@ -156,7 +157,7 @@ int MainHandlerLoop_RMSEG(void) DRMGlobalInstance->ResourceManagerStartTime = gettime_microsec(); while( DRMGlobalInstance->ResManagerMainKeepRun ) { - if (!PostmasterIsAlive(true)) { + if (0 == PostmasterIsAlive(true)) { DRMGlobalInstance->ResManagerMainKeepRun = false; elog(LOG, "Postmaster is not alive, resource manager exits"); break; {code} > "segment resource manager" will NOT exit when postmaster died > - > > Key: HAWQ-1529 > URL: https://issues.apache.org/jira/browse/HAWQ-1529 > Project: Apache HAWQ > Issue Type: Improvement > Components: Core >Reporter: Kuien Liu >Assignee: Radar Lei > > If I send SIGKILL to postmaster of segment by 'kill -9', then postmaster > dies, BUT "segment resource manager" and "logger process" are still alive and > flushing "WARNING" each 30s. 
> To my understanding, "logger process" is waiting for "segment resource > manager", but the resource manager will not detect the alive-status of > postmaster and continue waiting. Does it make sense? Why not quit in case of > postmaster gone? > The call stack of RM when postmaster is killed: > #0 0x7f19023ccab6 in poll () from /lib64/libc.so.6 > #1 0x00a48c9e in processAllCommFileDescs () at rmcomm_AsyncComm.c:156 > #2 0x00a8ce5e in MainHandlerLoop_RMSEG () at > resourcemanager_RMSEG.c:166 > #3 0x00a8cba3 in ResManagerMainSegment2ndPhase () at > resourcemanager_RMSEG.c:71 > #4 0x00a8d966 in ResManagerMain (argc=0x3, argv=0x7fffa018b890) at > resourcemanager.c:346 > #5 0x00a8db45 in ResManagerProcessStartup () at resourcemanager.c:411 > #6 0x00899b89 in CommenceNormalOperations () at postmaster.c:3673 > #7 0x0089a562 in do_reaper () at postmaster.c:4021 > #8 0x008969bb in ServerLoop () at postmaster.c:2136 > #9 0x00895a78 in PostmasterMain (argc=0xc, argv=0x229a730) at > postmaster.c:1454 > #10 0x007b185d in main (argc=0xc, argv=0x229a730) at main.c:226 > #11 0x7f190231e994 in __libc_start_main () from /lib64/libc.so.6 > #12 0x004bde89 in _start () -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HAWQ-1529) "segment resource manager" will NOT exit when postmaster died
[ https://issues.apache.org/jira/browse/HAWQ-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16178608#comment-16178608 ]

Kuien Liu commented on HAWQ-1529:
---------------------------------

A possible patch looks strange but does work.
{code:c}
--- a/src/backend/resourcemanager/resourcemanager_RMSEG.c
+++ b/src/backend/resourcemanager/resourcemanager_RMSEG.c
@@ -26,6 +26,7 @@
 #include "communication/rmcomm_MessageServer.h"
 #include "communication/rmcomm_RMSEG2RM.h"
 #include "resourceenforcer/resourceenforcer.h"
+#include "storage/pmsignal.h" /* PostmasterIsAlive */
 #include "cdb/cdbtmpdir.h"
 int ResManagerMainSegment2ndPhase(void)
@@ -156,7 +157,7 @@ int MainHandlerLoop_RMSEG(void)
 	DRMGlobalInstance->ResourceManagerStartTime = gettime_microsec();
 	while( DRMGlobalInstance->ResManagerMainKeepRun ) {
-		if (!PostmasterIsAlive(true)) {
+		if (0 == PostmasterIsAlive(true)) {
 			DRMGlobalInstance->ResManagerMainKeepRun = false;
 			elog(LOG, "Postmaster is not alive, resource manager exits");
 			break;
{code}

> "segment resource manager" will NOT exit when postmaster died
> -------------------------------------------------------------
>
>                 Key: HAWQ-1529
>                 URL: https://issues.apache.org/jira/browse/HAWQ-1529
>             Project: Apache HAWQ
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Kuien Liu
>            Assignee: Radar Lei
>
> If I send SIGKILL to postmaster of segment by 'kill -9', then postmaster
> dies, BUT "segment resource manager" and "logger process" are still alive and
> flushing "WARNING" each 30s.
> To my understanding, "logger process" is waiting for "segment resource
> manager", but the resource manager will not detect the alive-status of
> postmaster and continue waiting. Does it make sense? Why not quit in case of
> postmaster gone?
> The call stack of RM when postmaster is killed:
> #0 0x7f19023ccab6 in poll () from /lib64/libc.so.6
> #1 0x00a48c9e in processAllCommFileDescs () at rmcomm_AsyncComm.c:156
> #2 0x00a8ce5e in MainHandlerLoop_RMSEG () at resourcemanager_RMSEG.c:166
> #3 0x00a8cba3 in ResManagerMainSegment2ndPhase () at resourcemanager_RMSEG.c:71
> #4 0x00a8d966 in ResManagerMain (argc=0x3, argv=0x7fffa018b890) at resourcemanager.c:346
> #5 0x00a8db45 in ResManagerProcessStartup () at resourcemanager.c:411
> #6 0x00899b89 in CommenceNormalOperations () at postmaster.c:3673
> #7 0x0089a562 in do_reaper () at postmaster.c:4021
> #8 0x008969bb in ServerLoop () at postmaster.c:2136
> #9 0x00895a78 in PostmasterMain (argc=0xc, argv=0x229a730) at postmaster.c:1454
> #10 0x007b185d in main (argc=0xc, argv=0x229a730) at main.c:226
> #11 0x7f190231e994 in __libc_start_main () from /lib64/libc.so.6
> #12 0x004bde89 in _start ()

--
This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Comment Edited] (HAWQ-1529) "segment resource manager" will NOT exit when postmaster died
[ https://issues.apache.org/jira/browse/HAWQ-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16178608#comment-16178608 ] Kuien Liu edited comment on HAWQ-1529 at 9/25/17 6:52 AM: -- A possible patch looks strange but does work. {code:diff} --- a/src/backend/resourcemanager/resourcemanager_RMSEG.c +++ b/src/backend/resourcemanager/resourcemanager_RMSEG.c @@ -26,6 +26,7 @@ #include "communication/rmcomm_MessageServer.h" #include "communication/rmcomm_RMSEG2RM.h" #include "resourceenforcer/resourceenforcer.h" +#include "storage/pmsignal.h" /* PostmasterIsAlive */ #include "cdb/cdbtmpdir.h" int ResManagerMainSegment2ndPhase(void) @@ -156,7 +157,7 @@ int MainHandlerLoop_RMSEG(void) DRMGlobalInstance->ResourceManagerStartTime = gettime_microsec(); while( DRMGlobalInstance->ResManagerMainKeepRun ) { - if (!PostmasterIsAlive(true)) { + if (0 == PostmasterIsAlive(true)) { DRMGlobalInstance->ResManagerMainKeepRun = false; elog(LOG, "Postmaster is not alive, resource manager exits"); break; {code} was (Author: kuien): A possible patch looks strange but does work. 
{code:c} --- a/src/backend/resourcemanager/resourcemanager_RMSEG.c +++ b/src/backend/resourcemanager/resourcemanager_RMSEG.c @@ -26,6 +26,7 @@ #include "communication/rmcomm_MessageServer.h" #include "communication/rmcomm_RMSEG2RM.h" #include "resourceenforcer/resourceenforcer.h" +#include "storage/pmsignal.h" /* PostmasterIsAlive */ #include "cdb/cdbtmpdir.h" int ResManagerMainSegment2ndPhase(void) @@ -156,7 +157,7 @@ int MainHandlerLoop_RMSEG(void) DRMGlobalInstance->ResourceManagerStartTime = gettime_microsec(); while( DRMGlobalInstance->ResManagerMainKeepRun ) { - if (!PostmasterIsAlive(true)) { + if (0 == PostmasterIsAlive(true)) { DRMGlobalInstance->ResManagerMainKeepRun = false; elog(LOG, "Postmaster is not alive, resource manager exits"); break; {code} > "segment resource manager" will NOT exit when postmaster died > - > > Key: HAWQ-1529 > URL: https://issues.apache.org/jira/browse/HAWQ-1529 > Project: Apache HAWQ > Issue Type: Improvement > Components: Core >Reporter: Kuien Liu >Assignee: Radar Lei > > If I send SIGKILL to postmaster of segment by 'kill -9', then postmaster > dies, BUT "segment resource manager" and "logger process" are still alive and > flushing "WARNING" each 30s. > To my understanding, "logger process" is waiting for "segment resource > manager", but the resource manager will not detect the alive-status of > postmaster and continue waiting. Does it make sense? Why not quit in case of > postmaster gone? 
> The call stack of RM when postmaster is killed: > #0 0x7f19023ccab6 in poll () from /lib64/libc.so.6 > #1 0x00a48c9e in processAllCommFileDescs () at rmcomm_AsyncComm.c:156 > #2 0x00a8ce5e in MainHandlerLoop_RMSEG () at > resourcemanager_RMSEG.c:166 > #3 0x00a8cba3 in ResManagerMainSegment2ndPhase () at > resourcemanager_RMSEG.c:71 > #4 0x00a8d966 in ResManagerMain (argc=0x3, argv=0x7fffa018b890) at > resourcemanager.c:346 > #5 0x00a8db45 in ResManagerProcessStartup () at resourcemanager.c:411 > #6 0x00899b89 in CommenceNormalOperations () at postmaster.c:3673 > #7 0x0089a562 in do_reaper () at postmaster.c:4021 > #8 0x008969bb in ServerLoop () at postmaster.c:2136 > #9 0x00895a78 in PostmasterMain (argc=0xc, argv=0x229a730) at > postmaster.c:1454 > #10 0x007b185d in main (argc=0xc, argv=0x229a730) at main.c:226 > #11 0x7f190231e994 in __libc_start_main () from /lib64/libc.so.6 > #12 0x004bde89 in _start () -- This message was sent by Atlassian JIRA (v6.4.14#64029)