[ https://issues.apache.org/jira/browse/HAWQ-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kuien Liu closed HAWQ-1529.
---------------------------
       Resolution: Fixed
    Fix Version/s: backlog

https://github.com/apache/incubator-hawq/pull/1290

> "segment resource manager" will NOT exit when postmaster died
> -------------------------------------------------------------
>
>                 Key: HAWQ-1529
>                 URL: https://issues.apache.org/jira/browse/HAWQ-1529
>             Project: Apache HAWQ
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Kuien Liu
>            Assignee: Radar Lei
>             Fix For: backlog
>
>
> If I send SIGKILL to the segment's postmaster with 'kill -9', the postmaster
> dies, but the "segment resource manager" and "logger process" stay alive and
> keep emitting a "WARNING" every 30s.
> As I understand it, the "logger process" is waiting for the "segment resource
> manager", but the resource manager does not check whether the postmaster is
> still alive and keeps waiting. Does that make sense? Why not exit once the
> postmaster is gone?
> The call stack of the resource manager (RM) after the postmaster is killed:
> #0  0x00007f19023ccab6 in poll () from /lib64/libc.so.6
> #1  0x0000000000a48c9e in processAllCommFileDescs () at rmcomm_AsyncComm.c:156
> #2  0x0000000000a8ce5e in MainHandlerLoop_RMSEG () at resourcemanager_RMSEG.c:166
> #3  0x0000000000a8cba3 in ResManagerMainSegment2ndPhase () at resourcemanager_RMSEG.c:71
> #4  0x0000000000a8d966 in ResManagerMain (argc=0x3, argv=0x7fffa018b890) at resourcemanager.c:346
> #5  0x0000000000a8db45 in ResManagerProcessStartup () at resourcemanager.c:411
> #6  0x0000000000899b89 in CommenceNormalOperations () at postmaster.c:3673
> #7  0x000000000089a562 in do_reaper () at postmaster.c:4021
> #8  0x00000000008969bb in ServerLoop () at postmaster.c:2136
> #9  0x0000000000895a78 in PostmasterMain (argc=0xc, argv=0x229a730) at postmaster.c:1454
> #10 0x00000000007b185d in main (argc=0xc, argv=0x229a730) at main.c:226
> #11 0x00007f190231e994 in __libc_start_main () from /lib64/libc.so.6
> #12 0x00000000004bde89 in _start ()
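
For illustration only, and not the change made in the linked pull request: the backtrace shows the resource manager sitting in poll(), so one generic way a child process could notice that its parent is gone is to poll with a finite timeout and compare getppid() against the parent PID recorded at startup. The sketch below assumes a POSIX system where an orphaned child is re-parented, so getppid() changes. The names parent_is_alive and poll_until_parent_exits, and the max_rounds parameter, are hypothetical and do not come from the HAWQ source.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns true while the process recorded at startup
 * is still our parent. Once the parent dies, the kernel re-parents the
 * child (typically to init or a subreaper), so getppid() no longer matches.
 */
bool parent_is_alive(pid_t original_parent)
{
    return getppid() == original_parent;
}

/*
 * Hypothetical main loop: wait on the descriptors with a finite timeout and
 * re-check the parent before every wait.
 *
 * Returns  0 when the parent is gone (the caller should clean up and exit),
 *         -1 when poll() fails with an error other than EINTR,
 *          1 when max_rounds iterations complete (negative = no limit).
 */
int poll_until_parent_exits(pid_t original_parent, struct pollfd *fds,
                            nfds_t nfds, int timeout_ms, int max_rounds)
{
    /* A negative timeout would block forever and defeat the liveness check. */
    assert(timeout_ms >= 0);

    for (int round = 0; max_rounds < 0 || round < max_rounds; round++) {
        if (!parent_is_alive(original_parent))
            return 0;

        int ready = poll(fds, nfds, timeout_ms);
        if (ready < 0 && errno != EINTR)
            return -1;

        /* ready > 0: a real loop would service the ready descriptors here. */
    }
    return 1;
}
```

A child would record getppid() once at startup and pass that value in. The timeout bounds how long the process can outlive its parent, at the cost of one extra wakeup per interval when the descriptors are idle.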



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
