Re: [go-nuts] zombie parent scenario with golang

2020-09-11 Thread Uday Kiran Jonnala
Hi  Kurtis,

Thanks for the reply. I gave the C code to show a defunct process 
whose threads are still executing. I do feel a defunct process should 
not hold any associated resources or threads.
This could be an issue with this Linux version; I will ask the Linux 
community why a defunct process still shows threads with resources.

I gave the C example to check whether the Go process may be hitting 
the same situation.

Thanks again for the time. 

Thanks & Regards,
Uday Kiran 

On Thursday, September 10, 2020 at 11:00:57 PM UTC-7 ba...@iitbombay.org 
wrote:

> 1. Looks like *something* in ps reports process/thread state incorrectly. 
> It should not report <defunct> until all the pthreads have exited and the 
> parent has not picked up the status. The runtime will call exit() when the 
> last thread terminates (exit() in turn will call the _exit syscall).
>
> 2. If any thread calls _exit(), the system will clean up everything. It 
> doesn't matter how many threads may  be active. You can see this for 
> yourself if you replaced pthread_exit() in main() with _exit(0).
>
> 3. stacktrace within the kernel mode is irrelevant. You are merely 
> confusing yourself.
>
> 4. Go runtime doesn't use pthread so not sure testing pthread based C 
> program is relevant. Exiting from func main() will kill all goroutines. 
> Copy https://play.golang.org/p/zRfhvfYt_oE locally and see for yourself.
>
> 5. Looking at your original message, it *seems* like a parent is not 
> picking up the child's status. But I can't be sure.
>
> I suspect you are on a wild goose chase. Possibly confused by ps. You may 
> wish to backtrack to whatever you were looking at before this <defunct> 
> came up. Or try explaining what is going on with your go program and what 
> you expect it should do, without stacktrace or C programs etc.
>
> I wouldn't switch to a newer kernel if I were you. When debugging you 
> should keep everything else fixed or else you may end up chasing something 
> different or the symptom may change or disappear.
>
> On Sep 10, 2020, at 10:08 PM, Uday Kiran Jonnala  
> wrote:
>
> Thanks Kurtis for the reply. I understand defunct process mechanism. 
>
> As I mentioned in the initial mail (correct me if I am wrong here): in a 
> process with a main thread and a detached thread created by the main 
> thread, when the main thread exits the process is kept in the defunct state 
> because the created thread is still executing. I was wondering whether we 
> have such a scenario in the Go runtime. That could be the reason I see this 
> thread waiting on a futex and holding the file handles, and why the kernel 
> does not send SIGCHLD to the parent process.
>
> For example, consider the case below:
>
> #include <pthread.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <unistd.h>
>
> /* Detached worker that outlives the main thread. */
> void *thread_function(void *args)
> {
>     printf("This is the new thread! Sleep 100 seconds...\n");
>     sleep(100);
>     printf("Exit from thread\n");
>     pthread_exit(0);
> }
>
> int main(int argc, char **argv)
> {
>     pthread_t thrd;
>     pthread_attr_t attr;
>     int res = 0;
>     res = pthread_attr_init(&attr);
>     res = pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
>     res = pthread_create(&thrd, &attr, thread_function, NULL);
>     res = pthread_attr_destroy(&attr);
>     printf("Main thread. Sleep 5 seconds\n");
>     sleep(5);
>     printf("Exit from main process\n");
>     /* Ends only the main thread; the detached thread keeps running. */
>     pthread_exit(0);
> }
>
> ujonnala@ ~/mycode/go () $ ps -T
>    PID   SPID TTY          TIME CMD
>  43635  43635 pts/29   00:00:00 a.out <defunct>
>  43635  43638 pts/29   00:00:00 a.out
>
> Because the detached thread is still executing, the process is left in the 
> defunct state.
>
> Thanks for checking on this, I will see if we can reproduce my situation 
> on a newer kernel.
>
> Thanks & Regards,
> Uday Kiran
>
> On Thursday, September 10, 2020 at 9:49:06 PM UTC-7 Kurtis Rader wrote:
>
>> On Thu, Sep 10, 2020 at 9:25 PM Uday Kiran Jonnala  
>> wrote:
>>
>>> Thanks for the reply. We are fixing the issue. But the point I wanted to 
>>> bring it up here is the issue of a thread causing the go process to be in 
>>> defunct state.
>>>
>>
>> Any thread can cause the go process to enter the "defunct" state. For 
>> example, by calling os.Exit(), or panic(), or causing a signal to be 
>> delivered that terminates the process (e.g., SIGSEGV).
>>  
>>
>>> My kernel version is 
>>> Linux version 4.14.175-1.nutanix.20200709.el7.x86_64 (dev@ca4b0551898c) 
>>> (gcc version 7.3.1 20180303 (Red Hat 7.3.1-5) (GCC)) #1 SMP Fri Jul 10 
>>> 02:17:54 UTC 2020
>>>
>>
>> Is that the output of `uname -a`? It seems to suggest you're using CentOS 
>> provided by the https://www.nutanix.com/go/linux-on-ahv cloud 
>> environment. So we've established you are using Linux with kernel version 
>> 4.14. A kernel that is now three years old. I don't have anything like it 
>> installed on any of my virtual machines so I can't explore how it handles 
>> defunct processes. But my prior point stands: A "defunct" process is one 
>> that has been terminated but whose parent process has not reaped its exit 
>> 

Re: [go-nuts] zombie parent scenario with golang

2020-09-11 Thread Bakul Shah
1. Looks like *something* in ps reports process/thread state incorrectly. It 
should not report <defunct> until all the pthreads have exited and the parent 
has not picked up the status. The runtime will call exit() when the last thread 
terminates (exit() in turn will call the _exit syscall).

2. If any thread calls _exit(), the system will clean up everything. It doesn't 
matter how many threads may  be active. You can see this for yourself if you 
replaced pthread_exit() in main() with _exit(0).

3. stacktrace within the kernel mode is irrelevant. You are merely confusing 
yourself.

4. Go runtime doesn't use pthread so not sure testing pthread based C program 
is relevant. Exiting from func main() will kill all goroutines. Copy 
https://play.golang.org/p/zRfhvfYt_oE locally and see for yourself.

5. Looking at your original message, it *seems* like a parent is not picking up 
the child's status. But I can't be sure.

I suspect you are on a wild goose chase. Possibly confused by ps. You may wish 
to backtrack to whatever you were looking at before this <defunct> came up. Or 
try explaining what is going on with your go program and what you expect it 
should do, without stacktrace or C programs etc.

I wouldn't switch to a newer kernel if I were you. When debugging you should 
keep everything else fixed or else you may end up chasing something different 
or the symptom may change or disappear.
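
Point 4 above can be sketched in Go. This is a minimal illustration, not code from the thread or from the linked playground: runWorkers is a hypothetical helper. It shows that goroutines are only guaranteed to finish if something waits for them, because returning from func main() ends the process no matter what other goroutines are doing.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// runWorkers (hypothetical helper) starts n goroutines and blocks until all
// of them have finished. It returns how many completed.
func runWorkers(n int) int64 {
	var wg sync.WaitGroup
	var done int64
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			atomic.AddInt64(&done, 1)
		}()
	}
	// Without this Wait, main could return first and the process would exit
	// before some goroutines had run at all.
	wg.Wait()
	return atomic.LoadInt64(&done)
}

func main() {
	fmt.Println("workers finished:", runWorkers(4))
}
```

Removing the wg.Wait() call makes the count unpredictable, which is the behaviour described above: nothing keeps the process alive once main returns.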

> On Sep 10, 2020, at 10:08 PM, Uday Kiran Jonnala  wrote:
> 
> Thanks Kurtis for the reply. I understand defunct process mechanism. 
> 
> As I mentioned in the initial mail (correct me if I am wrong here): in a 
> process with a main thread and a detached thread created by the main 
> thread, when the main thread exits the process is kept in the defunct state 
> because the created thread is still executing. I was wondering whether we 
> have such a scenario in the Go runtime. That could be the reason I see this 
> thread waiting on a futex and holding the file handles, and why the kernel 
> does not send SIGCHLD to the parent process.
>
> For example, consider the case below:
>
> #include <pthread.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <unistd.h>
>
> /* Detached worker that outlives the main thread. */
> void *thread_function(void *args)
> {
>     printf("This is the new thread! Sleep 100 seconds...\n");
>     sleep(100);
>     printf("Exit from thread\n");
>     pthread_exit(0);
> }
>
> int main(int argc, char **argv)
> {
>     pthread_t thrd;
>     pthread_attr_t attr;
>     int res = 0;
>     res = pthread_attr_init(&attr);
>     res = pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
>     res = pthread_create(&thrd, &attr, thread_function, NULL);
>     res = pthread_attr_destroy(&attr);
>     printf("Main thread. Sleep 5 seconds\n");
>     sleep(5);
>     printf("Exit from main process\n");
>     /* Ends only the main thread; the detached thread keeps running. */
>     pthread_exit(0);
> }
>
> ujonnala@ ~/mycode/go () $ ps -T
>    PID   SPID TTY          TIME CMD
>  43635  43635 pts/29   00:00:00 a.out <defunct>
>  43635  43638 pts/29   00:00:00 a.out
>
> Because the detached thread is still executing, the process is left in the 
> defunct state.
> 
> Thanks for checking on this, I will see if we can reproduce my situation on a 
> newer kernel.
> 
> Thanks & Regards,
> Uday Kiran
> 
> On Thursday, September 10, 2020 at 9:49:06 PM UTC-7 Kurtis Rader wrote:
> On Thu, Sep 10, 2020 at 9:25 PM Uday Kiran Jonnala wrote:
> Thanks for the reply. We are fixing the issue. But the point I wanted to 
> bring it up here is the issue of a thread causing the go process to be in 
> defunct state.
> 
> Any thread can cause the go process to enter the "defunct" state. For 
> example, by calling os.Exit(), or panic(), or causing a signal to be 
> delivered that terminates the process (e.g., SIGSEGV).
>  
> My kernel version is 
> Linux version 4.14.175-1.nutanix.20200709.el7.x86_64 (dev@ca4b0551898c) (gcc 
> version 7.3.1 20180303 (Red Hat 7.3.1-5) (GCC)) #1 SMP Fri Jul 10 02:17:54 
> UTC 2020
> 
> Is that the output of `uname -a`? It seems to suggest you're using CentOS 
> provided by the https://www.nutanix.com/go/linux-on-ahv cloud environment. So we've 
> established you are using Linux with kernel version 4.14. A kernel that is 
> now three years old. I don't have anything like it installed on any of my 
> virtual machines so I can't explore how it handles defunct processes. But my 
> prior point stands: A "defunct" process is one that has been terminated but 
> whose parent process has not reaped its exit status. Either that parent 
> process has a bug (the most likely explanation) or your OS has a bug.
> 
> -- 
> Kurtis Rader
> Caretaker of the exceptional canines Junior and Hank
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "golang-nuts" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to golang-nuts+unsubscr...@googlegroups.com 
> .
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/golang-nuts/ad4843e1-f7d1-43ae-8091-579bc61527fdn%40googlegroups.com
>  
> 

Re: [go-nuts] zombie parent scenario with golang

2020-09-10 Thread Kurtis Rader
Your example is a C program. I'm guessing you're using gccgo to link with
equivalent C code. In which case your question has almost nothing to do
with Go. You need to ask the Linux community why your example results in a
defunct process that appears to have a live thread.

I do not believe you "understand defunct process mechanism". Because a
defunct process is one that does not have any executing threads. Yet you
still seem to think that Go, somehow, creates a process that is both
defunct and has an executing thread.
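
One way to check the state being discussed, without relying on how ps formats it, is to read the State line of /proc/<pid>/status, as the output quoted elsewhere in this thread does. The following is a minimal sketch, assuming the Linux status-file layout; processState and isZombie are hypothetical helper names, and reading the actual file (for example with os.ReadFile) is left to the caller.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// processState returns the one-letter state code from the text of a
// /proc/<pid>/status file, or "" if no State line is present.
func processState(status string) string {
	sc := bufio.NewScanner(strings.NewReader(status))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if strings.HasPrefix(line, "State:") {
			fields := strings.Fields(strings.TrimPrefix(line, "State:"))
			if len(fields) > 0 {
				return fields[0]
			}
		}
	}
	return ""
}

// isZombie reports whether the status text describes a defunct process.
func isZombie(status string) bool {
	return processState(status) == "Z"
}

func main() {
	// Sample text modelled on the status output quoted in this thread.
	sample := "Name:\treplicator\nState:\tZ (zombie)\n"
	fmt.Println(isZombie(sample))
}
```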

On Thu, Sep 10, 2020 at 10:08 PM Uday Kiran Jonnala 
wrote:

> Thanks Kurtis for the reply. I understand defunct process mechanism.
>
> As I mentioned in the initial mail (correct me if I am wrong here): in a
> process with a main thread and a detached thread created by the main
> thread, when the main thread exits the process is kept in the defunct state
> because the created thread is still executing. I was wondering whether we
> have such a scenario in the Go runtime. That could be the reason I see this
> thread waiting on a futex and holding the file handles, and why the kernel
> does not send SIGCHLD to the parent process.
>
> For example, consider the case below:
>
> #include <pthread.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <unistd.h>
>
> /* Detached worker that outlives the main thread. */
> void *thread_function(void *args)
> {
>     printf("This is the new thread! Sleep 100 seconds...\n");
>     sleep(100);
>     printf("Exit from thread\n");
>     pthread_exit(0);
> }
>
> int main(int argc, char **argv)
> {
>     pthread_t thrd;
>     pthread_attr_t attr;
>     int res = 0;
>     res = pthread_attr_init(&attr);
>     res = pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
>     res = pthread_create(&thrd, &attr, thread_function, NULL);
>     res = pthread_attr_destroy(&attr);
>     printf("Main thread. Sleep 5 seconds\n");
>     sleep(5);
>     printf("Exit from main process\n");
>     /* Ends only the main thread; the detached thread keeps running. */
>     pthread_exit(0);
> }
>
> ujonnala@ ~/mycode/go () $ ps -T
>    PID   SPID TTY          TIME CMD
>  43635  43635 pts/29   00:00:00 a.out <defunct>
>  43635  43638 pts/29   00:00:00 a.out
>
> Because the detached thread is still executing, the process is left in the
> defunct state.
>
> Thanks for checking on this, I will see if we can reproduce my situation
> on a newer kernel.
>
> Thanks & Regards,
> Uday Kiran
>
> On Thursday, September 10, 2020 at 9:49:06 PM UTC-7 Kurtis Rader wrote:
>
>> On Thu, Sep 10, 2020 at 9:25 PM Uday Kiran Jonnala 
>> wrote:
>>
>>> Thanks for the reply. We are fixing the issue. But the point I wanted to
>>> bring it up here is the issue of a thread causing the go process to be in
>>> defunct state.
>>>
>>
>> Any thread can cause the go process to enter the "defunct" state. For
>> example, by calling os.Exit(), or panic(), or causing a signal to be
>> delivered that terminates the process (e.g., SIGSEGV).
>>
>>
>>> My kernel version is
>>> Linux version 4.14.175-1.nutanix.20200709.el7.x86_64 (dev@ca4b0551898c)
>>> (gcc version 7.3.1 20180303 (Red Hat 7.3.1-5) (GCC)) #1 SMP Fri Jul 10
>>> 02:17:54 UTC 2020
>>>
>>
>> Is that the output of `uname -a`? It seems to suggest you're using CentOS
>> provided by the https://www.nutanix.com/go/linux-on-ahv cloud
>> environment. So we've established you are using Linux with kernel version
>> 4.14. A kernel that is now three years old. I don't have anything like it
>> installed on any of my virtual machines so I can't explore how it handles
>> defunct processes. But my prior point stands: A "defunct" process is one
>> that has been terminated but whose parent process has not reaped its exit
>> status. Either that parent process has a bug (the most likely explanation)
>> or your OS has a bug.
>>
>> --
>> Kurtis Rader
>> Caretaker of the exceptional canines Junior and Hank
>>


-- 
Kurtis Rader
Caretaker of the exceptional canines Junior and Hank


Re: [go-nuts] zombie parent scenario with golang

2020-09-10 Thread Uday Kiran Jonnala
Thanks Kurtis for the reply. I understand defunct process mechanism. 

As I mentioned in the initial mail (correct me if I am wrong here): in a 
process with a main thread and a detached thread created by the main 
thread, when the main thread exits the process is kept in the defunct state 
because the created thread is still executing. I was wondering whether we 
have such a scenario in the Go runtime. That could be the reason I see this 
thread waiting on a futex and holding the file handles, and why the kernel 
does not send SIGCHLD to the parent process.

For example, consider the case below:

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Detached worker that outlives the main thread. */
void *thread_function(void *args)
{
    printf("This is the new thread! Sleep 100 seconds...\n");
    sleep(100);
    printf("Exit from thread\n");
    pthread_exit(0);
}

int main(int argc, char **argv)
{
    pthread_t thrd;
    pthread_attr_t attr;
    int res = 0;
    res = pthread_attr_init(&attr);
    res = pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    res = pthread_create(&thrd, &attr, thread_function, NULL);
    res = pthread_attr_destroy(&attr);
    printf("Main thread. Sleep 5 seconds\n");
    sleep(5);
    printf("Exit from main process\n");
    /* Ends only the main thread; the detached thread keeps running. */
    pthread_exit(0);
}

ujonnala@ ~/mycode/go () $ ps -T
   PID   SPID TTY          TIME CMD
 43635  43635 pts/29   00:00:00 a.out <defunct>
 43635  43638 pts/29   00:00:00 a.out

Because the detached thread is still executing, the process is left in the 
defunct state.

Thanks for checking on this, I will see if we can reproduce my situation on 
a newer kernel.

Thanks & Regards,
Uday Kiran

On Thursday, September 10, 2020 at 9:49:06 PM UTC-7 Kurtis Rader wrote:

> On Thu, Sep 10, 2020 at 9:25 PM Uday Kiran Jonnala  
> wrote:
>
>> Thanks for the reply. We are fixing the issue. But the point I wanted to 
>> bring it up here is the issue of a thread causing the go process to be in 
>> defunct state.
>>
>
> Any thread can cause the go process to enter the "defunct" state. For 
> example, by calling os.Exit(), or panic(), or causing a signal to be 
> delivered that terminates the process (e.g., SIGSEGV).
>  
>
>> My kernel version is 
>> Linux version 4.14.175-1.nutanix.20200709.el7.x86_64 (dev@ca4b0551898c) 
>> (gcc version 7.3.1 20180303 (Red Hat 7.3.1-5) (GCC)) #1 SMP Fri Jul 10 
>> 02:17:54 UTC 2020
>>
>
> Is that the output of `uname -a`? It seems to suggest you're using CentOS 
> provided by the https://www.nutanix.com/go/linux-on-ahv cloud 
> environment. So we've established you are using Linux with kernel version 
> 4.14. A kernel that is now three years old. I don't have anything like it 
> installed on any of my virtual machines so I can't explore how it handles 
> defunct processes. But my prior point stands: A "defunct" process is one 
> that has been terminated but whose parent process has not reaped its exit 
> status. Either that parent process has a bug (the most likely explanation) 
> or your OS has a bug.
>
> -- 
> Kurtis Rader
> Caretaker of the exceptional canines Junior and Hank
>


Re: [go-nuts] zombie parent scenario with golang

2020-09-10 Thread Kurtis Rader
On Thu, Sep 10, 2020 at 9:25 PM Uday Kiran Jonnala 
wrote:

> Thanks for the reply. We are fixing the issue. But the point I wanted to
> bring it up here is the issue of a thread causing the go process to be in
> defunct state.
>

Any thread can cause the go process to enter the "defunct" state. For
example, by calling os.Exit(), or panic(), or causing a signal to be
delivered that terminates the process (e.g., SIGSEGV).


> My kernel version is
> Linux version 4.14.175-1.nutanix.20200709.el7.x86_64 (dev@ca4b0551898c)
> (gcc version 7.3.1 20180303 (Red Hat 7.3.1-5) (GCC)) #1 SMP Fri Jul 10
> 02:17:54 UTC 2020
>

Is that the output of `uname -a`? It seems to suggest you're using CentOS
provided by the https://www.nutanix.com/go/linux-on-ahv cloud environment.
So we've established you are using Linux with kernel version 4.14. A kernel
that is now three years old. I don't have anything like it installed on any
of my virtual machines so I can't explore how it handles defunct processes.
But my prior point stands: A "defunct" process is one that has been
terminated but whose parent process has not reaped its exit status. Either
that parent process has a bug (the most likely explanation) or your OS has
a bug.
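
For reference, the parent's side of this can be sketched in Go. This is only an illustration under stated assumptions, not the parent process from this thread: runAndReap is a hypothetical helper, and the example assumes a POSIX sh is available on PATH. The point is that a child which has exited stays in the process table as a zombie until the parent calls Wait (or one of its variants) to collect the exit status.

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
)

// runAndReap (hypothetical helper) starts a command and then waits for it,
// which collects the child's exit status so the kernel can drop its
// process-table entry. It returns the child's exit code.
func runAndReap(name string, args ...string) (int, error) {
	cmd := exec.Command(name, args...)
	if err := cmd.Start(); err != nil {
		return -1, err
	}
	// If the parent never reaches this call, a child that has already
	// exited is reported as defunct.
	err := cmd.Wait()
	var exitErr *exec.ExitError
	if errors.As(err, &exitErr) {
		return exitErr.ExitCode(), nil
	}
	if err != nil {
		return -1, err
	}
	return 0, nil
}

func main() {
	code, err := runAndReap("sh", "-c", "exit 3")
	fmt.Println(code, err)
}
```

A parent that calls Start but skips Wait is the usual way to end up with the defunct entries discussed here.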

-- 
Kurtis Rader
Caretaker of the exceptional canines Junior and Hank


Re: [go-nuts] zombie parent scenario with golang

2020-09-10 Thread Uday Kiran Jonnala
Hi Ian, Kurtis,

Thanks for the reply. We are fixing the issue. But the point I wanted to 
bring up here is that a thread is causing the Go process to be in the 
defunct state.
My kernel version is 
Linux version 4.14.175-1.nutanix.20200709.el7.x86_64 (dev@ca4b0551898c) 
(gcc version 7.3.1 20180303 (Red Hat 7.3.1-5) (GCC)) #1 SMP Fri Jul 10 
02:17:54 UTC 2020

Thanks & Regards,
Uday Kiran

On Thursday, September 10, 2020 at 6:42:06 PM UTC-7 Ian Lance Taylor wrote:

> On Thu, Sep 10, 2020 at 5:09 PM Kurtis Rader  wrote:
> >
> > A defunct process is a process that has terminated but whose parent 
> process has not called wait() or one of its variants. I don't know why lsof 
> still reports open files. It shouldn't since a dead process should have its 
> resources, such as its file descriptor table, freed by the kernel even if 
> the parent hasn't called wait(). You didn't tell us the details of the OS 
> you're using so I would simply assume it's a quirk of your OS. It might be 
> more productive to look into why your program is panicking at 
> map_faststr.go:275. A likely explanation is you have a race in your program 
> that is causing it to attempt to mutate a map concurrently or you're trying 
> to insert into a nil map.
>
> That's a good point. What OS are you using? I don't think you said.
>
> Ian
>
>
> > On Thu, Sep 10, 2020 at 4:43 PM Uday Kiran Jonnala  
> wrote:
> >>
> >> Hi Ian,
> >>
> >> Again, thanks for the reply. The problem here is that we see the Go
> >> process in the defunct state, and I am sure the parent process did not get
> >> SIGCHLD. Looking deeper, I see a thread in futex_wait_queue_me. If we are
> >> just seeing a leftover stack trace and the Go process actually got killed,
> >> why would I still see the associated fds in the file table, with the fd
> >> table still intact (see the lsof information)?
> >>
> >> The process that panicked and is now in the defunct state is 87548;
> >> checking for threads in this process (87548):
> >>
> >> bash-4.2# cat /proc/87548/status
> >> Name: replicator
> >> State: Z (zombie)
> >>
> >> bash-4.2# ls -Fl /proc/87548/task/87561/fd | grep 606649
> >> l-wx--. 1 root root 64 Aug 25 10:59 1 -> pipe:[606649]
> >> l-wx--. 1 root root 64 Aug 25 10:59 2 -> pipe:[606649]
> >>
> >> Listing the threads
> >>
> >> bash-4.2# ps -aefT | grep 87548
> >> root 87548 87548 87507 0 Aug23 ? 00:00:00 [replicator] 
> >> root 87548 87561 87507 0 Aug23 ? 00:00:00 [replicator] 
> >> root 112448 112448 42566 0 17:13 pts/0 00:00:00 grep 87548
> >>
> >> bash-4.2# lsof | grep 606649
> >> replicato 87548 87561 root 1w FIFO 0,11 0t0 606649 pipe
> >> replicato 87548 87561 root 2w FIFO 0,11 0t0 606649 pipe
> >>
> >> Why does lsof show the entry for the FIFO file of this process?
> >>
> >> So I feel we have a scenario where the thread sleeping in
> >> futex_wait_queue_me is not cleaned up during panic(), so the main thread
> >> exits while the detached thread waiting in futex_wait_queue_me is still
> >> present.
> >>
> >> The main issue is that I am not able to reproduce this, since this Go
> >> process is very big.
> >>
> >> Is there any way to verify this or take it further?
> >>
> >> Thanks & Regards,
> >> Uday Kiran
> >> On Monday, September 7, 2020 at 12:05:05 PM UTC-7 Ian Lance Taylor 
> wrote:
> >>>
> >>> On Mon, Sep 7, 2020 at 12:03 AM Uday Kiran Jonnala  
> wrote:
> >>> >
> >>> > Thanks for the reply. I get the point on zombies, but I do not think the
> >>> > issue here is the parent not reaping the child. It seems the Go process
> >>> > has not finished executing some internal threads (waiting on some futex),
> >>> > which is causing SIGCHLD not to be sent to the parent.
> >>> >
> >>> > The Go process named replicator hit a panic, and I see it went into the
> >>> > zombie state.
> >>> >
> >>> > $ ps -ef | grep replicator
> >>> > root 87548 87507 0 Aug23 ? 00:00:00 [replicator] 
> >>> >
> >>> > Now looking at the tasks within the process
> >>> >
> >>> > I see the stack trace of the threads within the process still stuck 
> on following
> >>> >
> >>> > bash-4.2# cat /proc/87548/task/87561/stack
> >>> > [] futex_wait_queue_me+0xc4/0x120
> >>> > [] futex_wait+0x10a/0x250
> >>> > [] do_futex+0x35e/0x5b0
> >>> > [] SyS_futex+0x13b/0x180
> >>> > [] do_syscall_64+0x79/0x1b0
> >>> > [] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
> >>> > [] 0x
> >>> >
> >>> > From the above, if we are creating some internal threads and the main
> >>> > thread exits due to a panic, leaving some detached threads behind, the
> >>> > process will be in the zombie state until the threads within the process
> >>> > complete.
> >>> >
> >>> > It appears some runaway-thread or hung-state scenario is causing this. I
> >>> > am not able to reproduce it with an explicit panic in the main goroutine
> >>> > while some other goroutine is still executing.
> >>> >
> >>> > Does the above stack trace sound familiar with respect to the internal
> >>> > threads of the Go runtime?
> >>>
> >>> If the process is defunct, then none of the thread stacks matter.
> >>> They are just where the thread happened to be when the process exited.
> >>>
> >>> What is the 

Re: [go-nuts] zombie parent scenario with golang

2020-09-10 Thread Ian Lance Taylor
On Thu, Sep 10, 2020 at 5:09 PM Kurtis Rader  wrote:
>
> A defunct process is a process that has terminated but whose parent process 
> has not called wait() or one of its variants. I don't know why lsof still 
> reports open files. It shouldn't since a dead process should have its 
> resources, such as its file descriptor table, freed by the kernel even if the 
> parent hasn't called wait(). You didn't tell us the details of the OS you're 
> using so I would simply assume it's a quirk of your OS. It might be more 
> productive to look into why your program is panicking at map_faststr.go:275. A 
> likely explanation is you have a race in your program that is causing it to 
> attempt to mutate a map concurrently or you're trying to insert into a nil 
> map.

That's a good point.  What OS are you using?  I don't think you said.

Ian


> On Thu, Sep 10, 2020 at 4:43 PM Uday Kiran Jonnala  
> wrote:
>>
>> Hi Ian,
>>
>> Again, thanks for the reply. The problem here is that we see the Go process 
>> in the defunct state, and I am sure the parent process did not get SIGCHLD. 
>> Looking deeper, I see a thread in futex_wait_queue_me. If we are just 
>> seeing a leftover stack trace and the Go process actually got killed, why 
>> would I still see the associated fds in the file table, with the fd table 
>> still intact (see the lsof information)?
>>
>> The process that panicked and is now in the defunct state is 87548; 
>> checking for threads in this process (87548):
>>
>> bash-4.2# cat /proc/87548/status
>>  Name: replicator
>>  State: Z (zombie)
>>
>> bash-4.2# ls -Fl /proc/87548/task/87561/fd | grep 606649
>> l-wx--. 1 root root 64 Aug 25 10:59 1 -> pipe:[606649]
>> l-wx--. 1 root root 64 Aug 25 10:59 2 -> pipe:[606649]
>>
>> Listing the threads
>>
>> bash-4.2# ps -aefT | grep 87548
>> root 87548 87548 87507 0 Aug23 ? 00:00:00 [replicator] 
>> root 87548 87561 87507 0 Aug23 ? 00:00:00 [replicator] 
>> root 112448 112448 42566 0 17:13 pts/0 00:00:00 grep 87548
>>
>> bash-4.2# lsof | grep 606649
>> replicato  87548  87561 root  1w  FIFO  0,11  0t0  606649 pipe
>> replicato  87548  87561 root  2w  FIFO  0,11  0t0  606649 pipe
>>
>> Why does lsof show the entry for the FIFO file of this process?
>>
>> So I feel we have a scenario where the thread sleeping in 
>> futex_wait_queue_me is not cleaned up during panic(), so the main thread 
>> exits while the detached thread waiting in futex_wait_queue_me is still 
>> present.
>>
>> The main issue is that I am not able to reproduce this, since this Go 
>> process is very big.
>>
>> Is there any way to verify this or take it further?
>>
>> Thanks & Regards,
>> Uday Kiran
>> On Monday, September 7, 2020 at 12:05:05 PM UTC-7 Ian Lance Taylor wrote:
>>>
>>> On Mon, Sep 7, 2020 at 12:03 AM Uday Kiran Jonnala  
>>> wrote:
>>> >
>>> > Thanks for the reply. I get the point on zombies, but I do not think the 
>>> > issue here is the parent not reaping the child. It seems the Go process 
>>> > has not finished executing some internal threads (waiting on some futex), 
>>> > which is causing SIGCHLD not to be sent to the parent.
>>> >
>>> > The Go process named replicator hit a panic, and I see it went into the 
>>> > zombie state.
>>> >
>>> > $ ps -ef | grep replicator
>>> > root 87548 87507 0 Aug23 ? 00:00:00 [replicator] 
>>> >
>>> > Now looking at the tasks within the process
>>> >
>>> > I see the stack trace of the threads within the process still stuck on 
>>> > following
>>> >
>>> > bash-4.2# cat /proc/87548/task/87561/stack
>>> > [] futex_wait_queue_me+0xc4/0x120
>>> > [] futex_wait+0x10a/0x250
>>> > [] do_futex+0x35e/0x5b0
>>> > [] SyS_futex+0x13b/0x180
>>> > [] do_syscall_64+0x79/0x1b0
>>> > [] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
>>> > [] 0x
>>> >
>>> > From the above, if we are creating some internal threads and the main 
>>> > thread exits due to a panic, leaving some detached threads behind, the 
>>> > process will be in the zombie state until the threads within the process 
>>> > complete.
>>> >
>>> > It appears some runaway-thread or hung-state scenario is causing this. I 
>>> > am not able to reproduce it with an explicit panic in the main goroutine 
>>> > while some other goroutine is still executing.
>>> >
>>> > Does the above stack trace sound familiar with respect to the internal 
>>> > threads of the Go runtime?
>>>
>>> If the process is defunct, then none of the thread stacks matter.
>>> They are just where the thread happened to be when the process exited.
>>>
>>> What is the real problem you are seeing?
>>>
>>> Ian
>>>
>>>
>>>
>>>
>>> > On Thursday, August 27, 2020 at 1:43:39 PM UTC-7 Ian Lance Taylor wrote:
>>> >>
>>> >> On Thu, Aug 27, 2020 at 10:01 AM Uday Kiran Jonnala
>>> >>  wrote:
>>> >> >
>>> >> > I have a situation on zombie parent scenario with golang
>>> >> >
>>> >> > A process (in the case replicator) has many goroutines internally
>>> >> >
>>> >> > We hit into panic() and I see the replicator process is in Zombie state
>>> >> >
>>> >> > <<>>>:~$ ps -ef | grep 

Re: [go-nuts] zombie parent scenario with golang

2020-09-10 Thread Kurtis Rader
A defunct process is a process that has terminated but whose parent process
has not called wait() or one of its variants. I don't know why lsof still
reports open files. It shouldn't since a dead process should have its
resources, such as its file descriptor table, freed by the kernel even if
the parent hasn't called wait(). You didn't tell us the details of the OS
you're using so I would simply assume it's a quirk of your OS. It might be
more productive to look into why your program is panicking at
map_faststr.go:275. A likely explanation is you have a race in your program
that is causing it to attempt to mutate a map concurrently or you're trying
to insert into a nil map.
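
Both suspected causes can be illustrated with a short Go sketch. This is not the replicator code, which was never posted; writeToNilMap, safeCounter and countConcurrently are hypothetical names. The first function shows that inserting into a nil map panics. The second part shows one common way to make concurrent updates safe, by guarding the map with a mutex. An unguarded concurrent write is not demonstrated because the runtime reports it as a fatal error that recover() cannot intercept.

```go
package main

import (
	"fmt"
	"sync"
)

// writeToNilMap tries to insert into a nil map and reports whether that
// panicked.
func writeToNilMap() (panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	var m map[string]int // nil: never initialized with make
	m["key"] = 1         // panics: assignment to entry in nil map
	return false
}

// safeCounter guards its map with a mutex so that several goroutines can
// update it at the same time.
type safeCounter struct {
	mu sync.Mutex
	m  map[string]int
}

func (c *safeCounter) inc(key string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.m[key]++
}

// countConcurrently increments one key from n goroutines and returns the
// final value.
func countConcurrently(n int) int {
	c := &safeCounter{m: make(map[string]int)}
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.inc("hits")
		}()
	}
	wg.Wait()
	return c.m["hits"]
}

func main() {
	fmt.Println(writeToNilMap(), countConcurrently(100))
}
```

Running the real program under the race detector (go run -race or go test -race) is the usual way to confirm whether a map is being mutated concurrently.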

On Thu, Sep 10, 2020 at 4:43 PM Uday Kiran Jonnala 
wrote:

> Hi Ian,
>
> Again, thanks for the reply. The problem here is that we see the Go process
> in the defunct state, and I am sure the parent process did not get SIGCHLD.
> Looking deeper, I see a thread in futex_wait_queue_me. If we are just
> seeing a leftover stack trace and the Go process actually got killed, why
> would I still see the associated fds in the file table, with the fd table
> still intact (see the lsof information)?
>
> The process that panicked and is now in the defunct state is 87548;
> checking for threads in this process (87548):
>
> bash-4.2# cat /proc/87548/status
>  Name: replicator
>  State: Z (zombie)
>
> bash-4.2# ls -Fl /proc/87548/task/87561/fd | grep 606649
> l-wx--. 1 root root 64 Aug 25 10:59 1 -> pipe:[606649]
> l-wx--. 1 root root 64 Aug 25 10:59 2 -> pipe:[606649]
>
> Listing the threads
>
> bash-4.2# ps -aefT | grep 87548
> root 87548 87548 87507 0 Aug23 ? 00:00:00 [replicator] 
> root 87548 87561 87507 0 Aug23 ? 00:00:00 [replicator] 
> root 112448 112448 42566 0 17:13 pts/0 00:00:00 grep 87548
>
> bash-4.2# lsof | grep 606649
> replicato  87548  87561 root  1w  FIFO  0,11  0t0  606649 pipe
> replicato  87548  87561 root  2w  FIFO  0,11  0t0  606649 pipe
>
> Why does lsof show the entry for the FIFO file of this process?
>
> So I feel we have a scenario where the thread sleeping in
> futex_wait_queue_me is not cleaned up during panic(), so the main thread
> exits while the detached thread waiting in futex_wait_queue_me is still
> present.
>
> The main issue is that I am not able to reproduce this, since this Go
> process is very big.
>
> Is there any way to verify this or take it further?
>
> Thanks & Regards,
> Uday Kiran
> On Monday, September 7, 2020 at 12:05:05 PM UTC-7 Ian Lance Taylor wrote:
>
>> On Mon, Sep 7, 2020 at 12:03 AM Uday Kiran Jonnala 
>> wrote:
>> >
>> > Thanks for the reply, I get the point on zombie, I do not think the
>> issue here is parent not reaping child, seems like go process has not
>> finished execution of some
>> > internal threads (waiting on some futex) and causing SIGCHILD not to be
>> sent to parent.
>> >
>> > go process named  hit with panic and I see this went into
>> zombie state
>> >
>> > $ ps -ef | grep replicator
>> > root 87548 87507 0 Aug23 ? 00:00:00 [replicator] 
>> >
>> > Now looking at the tasks within the process
>> >
>> > I see the stack trace of the threads within the process still stuck on
>> following
>> >
>> > bash-4.2# cat /proc/87548/task/87561/stack
>> > [] futex_wait_queue_me+0xc4/0x120
>> > [] futex_wait+0x10a/0x250
>> > [] do_futex+0x35e/0x5b0
>> > [] SyS_futex+0x13b/0x180
>> > [] do_syscall_64+0x79/0x1b0
>> > [] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
>> > [] 0x
>> >
>> > From the above example if we are creating some internal threads and
>> main thread is excited due to panic and left some detached threads, process
>> will be in zombie state until the threads
>> > within the process completes.
>> >
>> > It appears there is some run away threads hung state scenario causing
>> this. I am not able to reproduce it with main go routine explict panic and
>> some go routine still executing.
>> >
>> > Does the above stack trace sound familiar wrt internal threads of Go
>> runtime ?
>>
>> If the process is defunct, then none of the thread stacks matter.
>> They are just where the thread happened to be when the process exited.
>>
>> What is the real problem you are seeing?
>>
>> Ian
>>
>>
>>
>>
>> > On Thursday, August 27, 2020 at 1:43:39 PM UTC-7 Ian Lance Taylor
>> wrote:
>> >>
>> >> On Thu, Aug 27, 2020 at 10:01 AM Uday Kiran Jonnala
>> >>  wrote:
>> >> >
>> >> > I have a situation on zombie parent scenario with golang
>> >> >
>> >> > A process (in the case replicator) has many goroutines internally
>> >> >
>> >> > We hit into panic() and I see the replicator process is in Zombie
>> state
>> >> >
>> >> > <<>>>:~$ ps -ef | grep replicator
>> >> >
>> >> > root 87548 87507 0 Aug23 ? 00:00:00 [replicator] 
>> >> >
>> >> >
>> >> >
>> >> > Main go routine (or the supporting P) excited, but panic left the
>> other P thread to be still in executing state (main P could be 87548 and
>> supporting P thread 87561 is still there) in blocked state
>> >> >
>> 

Re: [go-nuts] zombie parent scenario with golang

2020-09-10 Thread Uday Kiran Jonnala
Hi Ian,

Again, thanks for the reply. The problem here is that we see the go process 
in defunct state, and the parent process surely did not get SIGCHLD. Looking 
deeper, I see a thread in futex_wait_queue_me. If we think we are just getting 
the stack trace and the go process actually got killed, why would I still see 
the associated fd's in the file table, with the fd table intact (see the lsof 
information)?

The process in defunct state, which hit the panic, is 87548. Checking the 
threads in this process:

bash-4.2# cat /proc/87548/status 
 Name: replicator 
 State: Z (zombie) 

bash-4.2# ls -Fl /proc/87548/task/87561/fd | grep 606649 
l-wx--. 1 root root 64 Aug 25 10:59 1 -> pipe:[606649] 
l-wx--. 1 root root 64 Aug 25 10:59 2 -> pipe:[606649]  

Listing the threads

bash-4.2# ps -aefT | grep 87548 
root 87548 87548 87507 0 Aug23 ? 00:00:00 [replicator] <defunct>
root 87548 87561 87507 0 Aug23 ? 00:00:00 [replicator] <defunct>
root 112448 112448 42566 0 17:13 pts/0 00:00:00 grep 87548

bash-4.2# lsof | grep 606649 
replicato  87548  87561  root  1w  FIFO  0,11  0t0  606649 pipe
replicato  87548  87561  root  2w  FIFO  0,11  0t0  606649 pipe

Why does lsof show the entry for the FIFO file of this process?

So I feel we have a scenario where the thread sleeping in 
futex_wait_queue_me is not cleaned up during panic(), so the main 
thread exits while a detached thread waiting in 
futex_wait_queue_me is still present.

The main issue is that I am not able to reproduce this, since this go process 
is very big.

Is there any way to verify this or take it further?

Thanks & Regards,
Uday Kiran
On Monday, September 7, 2020 at 12:05:05 PM UTC-7 Ian Lance Taylor wrote:

> On Mon, Sep 7, 2020 at 12:03 AM Uday Kiran Jonnala  
> wrote:
> >
> > Thanks for the reply, I get the point on zombie, I do not think the 
> issue here is parent not reaping child, seems like go process has not 
> finished execution of some
> > internal threads (waiting on some futex) and causing SIGCHILD not to be 
> sent to parent.
> >
> > go process named  hit with panic and I see this went into 
> zombie state
> >
> > $ ps -ef | grep replicator
> > root 87548 87507 0 Aug23 ? 00:00:00 [replicator] 
> >
> > Now looking at the tasks within the process
> >
> > I see the stack trace of the threads within the process still stuck on 
> following
> >
> > bash-4.2# cat /proc/87548/task/87561/stack
> > [] futex_wait_queue_me+0xc4/0x120
> > [] futex_wait+0x10a/0x250
> > [] do_futex+0x35e/0x5b0
> > [] SyS_futex+0x13b/0x180
> > [] do_syscall_64+0x79/0x1b0
> > [] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
> > [] 0x
> >
> > From the above example if we are creating some internal threads and main 
> thread is excited due to panic and left some detached threads, process will 
> be in zombie state until the threads
> > within the process completes.
> >
> > It appears there is some run away threads hung state scenario causing 
> this. I am not able to reproduce it with main go routine explict panic and 
> some go routine still executing.
> >
> > Does the above stack trace sound familiar wrt internal threads of Go 
> runtime ?
>
> If the process is defunct, then none of the thread stacks matter.
> They are just where the thread happened to be when the process exited.
>
> What is the real problem you are seeing?
>
> Ian
>
>
>
>
> > On Thursday, August 27, 2020 at 1:43:39 PM UTC-7 Ian Lance Taylor wrote:
> >>
> >> On Thu, Aug 27, 2020 at 10:01 AM Uday Kiran Jonnala
> >>  wrote:
> >> >
> >> > I have a situation on zombie parent scenario with golang
> >> >
> >> > A process (in the case replicator) has many goroutines internally
> >> >
> >> > We hit into panic() and I see the replicator process is in Zombie 
> state
> >> >
> >> > <<>>>:~$ ps -ef | grep replicator
> >> >
> >> > root 87548 87507 0 Aug23 ? 00:00:00 [replicator] 
> >> >
> >> >
> >> >
> >> > Main go routine (or the supporting P) excited, but panic left the 
> other P thread to be still in executing state (main P could be 87548 and 
> supporting P thread 87561 is still there) in blocked state
> >> >
> >> > bash-4.2# ls -Fl /proc/87548/task/87561/fd | grep 606649l-wx--. 1 
> root root 64 Aug 25 10:59 1 -> pipe:[606649]l-wx--. 1 root root 64 Aug 
> 25 10:59 2 -> pipe:[606649]
> >> >
> >> > Stack trace
> >> >
> >> > bash-4.2# cat /proc/87548/task/87561/stack[] 
> futex_wait_queue_me+0xc4/0x120[] 
> futex_wait+0x10a/0x250[] 
> do_futex+0x35e/0x5b0[] 
> SyS_futex+0x13b/0x180[] 
> do_syscall_64+0x79/0x1b0[] 
> entry_SYSCALL_64_after_hwframe+0x3d/0xa2[] 
> 0x
> >> >
> >> >
> >> >
> >> > We have panic internally from main go routine
> >> >
> >> > fatal error: concurrent map writes
> >> >
> >> > goroutine 666359 [running]:
> >> > runtime.throw(0x101d6ae, 0x15)
> >> > 
> /home/ll/ntnx/toolchain-builds/78ae837ba07c8ef8f0ea782407d8d4626815552b.x86_64/go/src/runtime/panic.go:608
>  
> +0x72 fp=0xc00374b6f0 sp=0xc00374b6c0 

Re: [go-nuts] zombie parent scenario with golang

2020-09-07 Thread Ian Lance Taylor
On Mon, Sep 7, 2020 at 12:03 AM Uday Kiran Jonnala  wrote:
>
> Thanks for the reply, I get the point on zombie, I do not think the issue 
> here is parent not reaping child, seems like go process has not finished 
> execution of some
> internal threads (waiting on some futex) and causing SIGCHILD not to be sent 
> to parent.
>
> go process named  hit with panic and I see this went into zombie 
> state
>
> $ ps -ef | grep replicator
> root  87548  87507  0 Aug23 ?00:00:00 [replicator] 
>
> Now looking at the tasks within the process
>
> I see the stack trace of the threads within the process still stuck on 
> following
>
> bash-4.2# cat /proc/87548/task/87561/stack
> [] futex_wait_queue_me+0xc4/0x120
> [] futex_wait+0x10a/0x250
> [] do_futex+0x35e/0x5b0
> [] SyS_futex+0x13b/0x180
> [] do_syscall_64+0x79/0x1b0
> [] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
> [] 0x
>
> From the above example if we are creating some internal threads and main 
> thread is excited due to panic and left some detached threads, process will 
> be in zombie state until the threads
> within the process completes.
>
> It appears there is some run away threads hung state scenario causing this. I 
> am not able to reproduce it with main go routine explict panic and some go 
> routine still executing.
>
> Does the above stack trace sound familiar wrt internal threads of Go runtime ?

If the process is defunct, then none of the thread stacks matter.
They are just where the thread happened to be when the process exited.
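
As a rough illustration of checking the process state before reading anything into the per-thread stacks, the sketch below parses the "State:" line of a /proc/<pid>/status file. The helper names stateFromStatus and isZombie are made up for this example, and it assumes the Linux procfs layout shown elsewhere in this thread.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// stateFromStatus returns the value of the "State:" line from the text
// of a /proc/<pid>/status file, for example "Z (zombie)".
func stateFromStatus(status string) (string, bool) {
	for _, line := range strings.Split(status, "\n") {
		line = strings.TrimSpace(line)
		if strings.HasPrefix(line, "State:") {
			return strings.TrimSpace(strings.TrimPrefix(line, "State:")), true
		}
	}
	return "", false
}

// isZombie reports whether the status text describes a zombie process.
func isZombie(status string) bool {
	st, ok := stateFromStatus(status)
	return ok && strings.HasPrefix(st, "Z")
}

func main() {
	// Inspect our own status as a harmless demonstration; pass another
	// pid's status file to check a different process.
	data, err := os.ReadFile("/proc/self/status")
	if err != nil {
		fmt.Println("cannot read status:", err)
		return
	}
	fmt.Println("zombie:", isZombie(string(data)))
}
```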

What is the real problem you are seeing?

Ian




> On Thursday, August 27, 2020 at 1:43:39 PM UTC-7 Ian Lance Taylor wrote:
>>
>> On Thu, Aug 27, 2020 at 10:01 AM Uday Kiran Jonnala
>>  wrote:
>> >
>> > I have a situation on zombie parent scenario with golang
>> >
>> > A process (in the case replicator) has many goroutines internally
>> >
>> > We hit into panic() and I see the replicator process is in Zombie state
>> >
>> > <<>>>:~$ ps -ef | grep replicator
>> >
>> > root 87548 87507 0 Aug23 ? 00:00:00 [replicator] 
>> >
>> >
>> >
>> > Main go routine (or the supporting P) excited, but panic left the other P 
>> > thread to be still in executing state (main P could be 87548 and 
>> > supporting P thread 87561 is still there) in blocked state
>> >
>> > bash-4.2# ls -Fl /proc/87548/task/87561/fd | grep 606649l-wx--. 1 root 
>> > root 64 Aug 25 10:59 1 -> pipe:[606649]l-wx--. 1 root root 64 Aug 25 
>> > 10:59 2 -> pipe:[606649]
>> >
>> > Stack trace
>> >
>> > bash-4.2# cat /proc/87548/task/87561/stack[] 
>> > futex_wait_queue_me+0xc4/0x120[] 
>> > futex_wait+0x10a/0x250[] 
>> > do_futex+0x35e/0x5b0[] 
>> > SyS_futex+0x13b/0x180[] 
>> > do_syscall_64+0x79/0x1b0[] 
>> > entry_SYSCALL_64_after_hwframe+0x3d/0xa2[] 
>> > 0x
>> >
>> >
>> >
>> > We have panic internally from main go routine
>> >
>> > fatal error: concurrent map writes
>> >
>> > goroutine 666359 [running]:
>> > runtime.throw(0x101d6ae, 0x15)
>> > /home/ll/ntnx/toolchain-builds/78ae837ba07c8ef8f0ea782407d8d4626815552b.x86_64/go/src/runtime/panic.go:608
>> >  +0x72 fp=0xc00374b6f0 sp=0xc00374b6c0 pc=0x42da62
>> > runtime.mapassign_faststr(0xdb71c0, 0xc00023f5f0, 0xc000aca990, 0x83, 
>> > 0xc0009d03c8)
>> > /home/ll/ntnx/toolchain-builds/78ae837ba07c8ef8f0ea782407d8d4626815552b.x86_64/go/src/runtime/map_faststr.go:275
>> >  +0x3bf fp=0xc00374b758 sp=0xc00374b6f0 pc=0x41527f
>> > github.eng.nutanix.com/xyz/abc/metadata.UpdateRecvInProgressFlag(0xc000aca990,
>> >  0x83, 0x0)
>> >
>> > ...
>> >
>> > goroutine 665516 [chan receive, 2 minutes]:
>> > zeus.(*Leadership).LeaderValue.func1(0xc003d5c120, 0x0, 0xc002e906c0, 
>> > 0x52, 0xc00302ec60, 0x29)
>> > /home/ll/ntnx/main/build/.go/src/zeus/leadership.go:244 +0x34
>> > created by zeus.(*Leadership).LeaderValue
>> > /home/ll/ntnx/main/build/.go/src/zeus/leadership.go:243 +0x277
>> > 2020-08-03 00:35:04 rolled over log file
>> > ERROR: logging before flag.Parse: I0803 00:35:04.426906 196123 
>> > dataset.go:26] initialize zfs linking
>> > ERROR: logging before flag.Parse: I0803 00:35:04.433296 196123 
>> > dataset.go:34] completed zfs linking successfully
>> > I0803 00:35:04.433447 196123 main.go:86] Gflags passed NodeUuid: 
>> > c238e584-0eeb-48bd-b299-2a25b13602f1, External Ip: 10.15.96.163
>> > I0803 00:35:04.433460 196123 main.go:99] Component name using for this 
>> > process : abc-c238e584-0eeb-48bd-b299-2a25b13602f1
>> > I0803 00:35:04.433467 196123 main.go:120] Trying to initialize DB
>> >
>> > If there is panic() from main P thread, as I understand we exit() and 
>> > cleanup all P threads of the process.
>> >
>> > Are we hitting into the following scenario, I did not look into M-P-G 
>> > implantation in detail.
>> >
>> > Example:
>> >
>> > #include 
>> > #include 
>> > #include 
>> > #include 
>> >
>> > void *thread_function(void *args)
>> > {
>> > printf("The is new thread! Sleep 20 seconds...\n");
>> > sleep(100);
>> > printf("Exit from thread\n");
>> > 

Re: [go-nuts] zombie parent scenario with golang

2020-09-07 Thread Uday Kiran Jonnala
Hi Ian,

Thanks for the reply, I get the point on zombies. I do not think the issue 
here is the parent not reaping the child. It seems the go process has not 
finished executing some internal threads (waiting on some futex), which 
causes SIGCHLD not to be sent to the parent.

The go process named replicator hit a panic and I see it went into 
zombie state

$ ps -ef | grep replicator
root  87548  87507  0 Aug23 ?        00:00:00 [replicator] <defunct>

Now looking at the tasks within the process

I see the stack trace of the threads within the process still stuck on the 
following

bash-4.2# cat /proc/87548/task/87561/stack
[] futex_wait_queue_me+0xc4/0x120
[] futex_wait+0x10a/0x250
[] do_futex+0x35e/0x5b0
[] SyS_futex+0x13b/0x180
[] do_syscall_64+0x79/0x1b0
[] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[] 0x  

From the above example, if we create some internal threads and the main 
thread exits due to a panic, leaving some detached threads behind, the process 
will be in zombie state until the threads within the process complete.

It appears there is some hung runaway-thread scenario causing this. 
I am not able to reproduce it with an explicit panic in the main goroutine 
while some other goroutine is still executing.

Does the above stack trace sound familiar wrt internal threads of Go 
runtime ?

Thanks,
Uday

On Thursday, August 27, 2020 at 1:43:39 PM UTC-7 Ian Lance Taylor wrote:

> On Thu, Aug 27, 2020 at 10:01 AM Uday Kiran Jonnala
>  wrote:
> >
> > I have a situation on zombie parent scenario with golang
> >
> > A process (in the case replicator) has many goroutines internally
> >
> > We hit into panic() and I see the replicator process is in Zombie state
> >
> > <<>>>:~$ ps -ef | grep replicator
> >
> > root 87548 87507 0 Aug23 ? 00:00:00 [replicator] 
> >
> >
> >
> > Main go routine (or the supporting P) excited, but panic left the other 
> P thread to be still in executing state (main P could be 87548 and 
> supporting P thread 87561 is still there) in blocked state
> >
> > bash-4.2# ls -Fl /proc/87548/task/87561/fd | grep 606649l-wx--. 1 
> root root 64 Aug 25 10:59 1 -> pipe:[606649]l-wx--. 1 root root 64 Aug 
> 25 10:59 2 -> pipe:[606649]
> >
> > Stack trace
> >
> > bash-4.2# cat /proc/87548/task/87561/stack[] 
> futex_wait_queue_me+0xc4/0x120[] 
> futex_wait+0x10a/0x250[] 
> do_futex+0x35e/0x5b0[] 
> SyS_futex+0x13b/0x180[] 
> do_syscall_64+0x79/0x1b0[] 
> entry_SYSCALL_64_after_hwframe+0x3d/0xa2[] 
> 0x
> >
> >
> >
> > We have panic internally from main go routine
> >
> > fatal error: concurrent map writes
> >
> > goroutine 666359 [running]:
> > runtime.throw(0x101d6ae, 0x15)
> > 
> /home/ll/ntnx/toolchain-builds/78ae837ba07c8ef8f0ea782407d8d4626815552b.x86_64/go/src/runtime/panic.go:608
>  
> +0x72 fp=0xc00374b6f0 sp=0xc00374b6c0 pc=0x42da62
> > runtime.mapassign_faststr(0xdb71c0, 0xc00023f5f0, 0xc000aca990, 0x83, 
> 0xc0009d03c8)
> > 
> /home/ll/ntnx/toolchain-builds/78ae837ba07c8ef8f0ea782407d8d4626815552b.x86_64/go/src/runtime/map_faststr.go:275
>  
> +0x3bf fp=0xc00374b758 sp=0xc00374b6f0 pc=0x41527f
> > 
> github.eng.nutanix.com/xyz/abc/metadata.UpdateRecvInProgressFlag(0xc000aca990,
>  
> 0x83, 0x0)
> >
> > ...
> >
> > goroutine 665516 [chan receive, 2 minutes]:
> > zeus.(*Leadership).LeaderValue.func1(0xc003d5c120, 0x0, 0xc002e906c0, 
> 0x52, 0xc00302ec60, 0x29)
> > /home/ll/ntnx/main/build/.go/src/zeus/leadership.go:244 +0x34
> > created by zeus.(*Leadership).LeaderValue
> > /home/ll/ntnx/main/build/.go/src/zeus/leadership.go:243 +0x277
> > 2020-08-03 00:35:04 rolled over log file
> > ERROR: logging before flag.Parse: I0803 00:35:04.426906 196123 
> dataset.go:26] initialize zfs linking
> > ERROR: logging before flag.Parse: I0803 00:35:04.433296 196123 
> dataset.go:34] completed zfs linking successfully
> > I0803 00:35:04.433447 196123 main.go:86] Gflags passed NodeUuid: 
> c238e584-0eeb-48bd-b299-2a25b13602f1, External Ip: 10.15.96.163
> > I0803 00:35:04.433460 196123 main.go:99] Component name using for this 
> process : abc-c238e584-0eeb-48bd-b299-2a25b13602f1
> > I0803 00:35:04.433467 196123 main.go:120] Trying to initialize DB
> >
> > If there is panic() from main P thread, as I understand we exit() and 
> cleanup all P threads of the process.
> >
> > Are we hitting into the following scenario, I did not look into M-P-G 
> implantation in detail.
> >
> > Example:
> >
> > #include 
> > #include 
> > #include 
> > #include 
> >
> > void *thread_function(void *args)
> > {
> > printf("The is new thread! Sleep 20 seconds...\n");
> > sleep(100);
> > printf("Exit from thread\n");
> > pthread_exit(0);
> > }
> >
> > int main(int argc, char **argv)
> > {
> > pthread_t thrd;
> > pthread_attr_t attr;
> > int res = 0;
> > res = pthread_attr_init();
> > res = pthread_attr_setdetachstate(, PTHREAD_CREATE_DETACHED);
> > res = pthread_create(, , thread_function, NULL);
> > res = pthread_attr_destroy();
> > printf("Main thread. Sleep 5 seconds\n");
> > sleep(5);
> > printf("Exit from main 

Re: [go-nuts] zombie parent scenario with golang

2020-08-27 Thread Ian Lance Taylor
On Thu, Aug 27, 2020 at 10:01 AM Uday Kiran Jonnala
 wrote:
>
> I have a situation on zombie parent scenario with golang
>
>  A process (in the case replicator) has many goroutines internally
>
> We hit into panic() and I see the replicator process is in Zombie state
>
> <<>>>:~$ ps -ef | grep replicator
>
> root  87548  87507  0 Aug23 ?00:00:00 [replicator] 
>
>
>
> Main go routine (or the supporting P) excited, but panic left the other P 
> thread to be still in executing state (main P could be 87548 and supporting P 
> thread 87561 is still there) in blocked state
>
> bash-4.2# ls -Fl /proc/87548/task/87561/fd | grep 606649l-wx--. 1 root 
> root 64 Aug 25 10:59 1 -> pipe:[606649]l-wx--. 1 root root 64 Aug 25 
> 10:59 2 -> pipe:[606649]
>
> Stack trace
>
> bash-4.2# cat /proc/87548/task/87561/stack[] 
> futex_wait_queue_me+0xc4/0x120[] 
> futex_wait+0x10a/0x250[] 
> do_futex+0x35e/0x5b0[] 
> SyS_futex+0x13b/0x180[] 
> do_syscall_64+0x79/0x1b0[] 
> entry_SYSCALL_64_after_hwframe+0x3d/0xa2[] 
> 0x
>
>
>
> We have panic internally from main go routine
>
> fatal error: concurrent map writes
>
> goroutine 666359 [running]:
> runtime.throw(0x101d6ae, 0x15)
> /home/ll/ntnx/toolchain-builds/78ae837ba07c8ef8f0ea782407d8d4626815552b.x86_64/go/src/runtime/panic.go:608
>  +0x72 fp=0xc00374b6f0 sp=0xc00374b6c0 pc=0x42da62
> runtime.mapassign_faststr(0xdb71c0, 0xc00023f5f0, 0xc000aca990, 0x83, 
> 0xc0009d03c8)
> /home/ll/ntnx/toolchain-builds/78ae837ba07c8ef8f0ea782407d8d4626815552b.x86_64/go/src/runtime/map_faststr.go:275
>  +0x3bf fp=0xc00374b758 sp=0xc00374b6f0 pc=0x41527f
> github.eng.nutanix.com/xyz/abc/metadata.UpdateRecvInProgressFlag(0xc000aca990,
>  0x83, 0x0)
>
> ...
>
> goroutine 665516 [chan receive, 2 minutes]:
> zeus.(*Leadership).LeaderValue.func1(0xc003d5c120, 0x0, 0xc002e906c0, 0x52, 
> 0xc00302ec60, 0x29)
> /home/ll/ntnx/main/build/.go/src/zeus/leadership.go:244 +0x34
> created by zeus.(*Leadership).LeaderValue
> /home/ll/ntnx/main/build/.go/src/zeus/leadership.go:243 +0x277
> 2020-08-03 00:35:04 rolled over log file
> ERROR: logging before flag.Parse: I0803 00:35:04.426906 196123 dataset.go:26] 
> initialize zfs linking
> ERROR: logging before flag.Parse: I0803 00:35:04.433296 196123 dataset.go:34] 
> completed zfs linking successfully
> I0803 00:35:04.433447 196123 main.go:86] Gflags passed NodeUuid: 
> c238e584-0eeb-48bd-b299-2a25b13602f1, External Ip: 10.15.96.163
> I0803 00:35:04.433460 196123 main.go:99] Component name using for this 
> process : abc-c238e584-0eeb-48bd-b299-2a25b13602f1
> I0803 00:35:04.433467 196123 main.go:120] Trying to initialize DB
>
>  If there is panic() from main P thread, as I understand we exit() and 
> cleanup all P threads of the process.
>
>  Are we hitting into the following scenario, I did not look into M-P-G 
> implantation in detail.
>
>  Example:
>
> #include 
> #include 
> #include 
> #include 
>
> void *thread_function(void *args)
> {
> printf("The is new thread! Sleep 20 seconds...\n");
> sleep(100);
> printf("Exit from thread\n");
> pthread_exit(0);
> }
>
> int main(int argc, char **argv)
> {
> pthread_t thrd;
> pthread_attr_t attr;
> int res = 0;
> res = pthread_attr_init();
> res = pthread_attr_setdetachstate(, PTHREAD_CREATE_DETACHED);
> res = pthread_create(, , thread_function, NULL);
> res = pthread_attr_destroy();
> printf("Main thread. Sleep 5 seconds\n");
> sleep(5);
> printf("Exit from main process\n");
> pthread_exit(0);
> }
>
> kkk@ ~/mycode/go () $ ./a.out &
> [1] 108418Main thread. Sleep 5 secondsThe is new thread! Sleep 20 seconds...
> kkk@ ~/mycode/go () $
> Exit from main processs
> PID TTY  TIME CMD
> 49313 pts/26   00:00:01 bash108418 pts/26   00:00:00 [a.out] 108449 
> pts/26   00:00:00 ps
>
>  See the main process is  and child is still hanging around
>
> kkk@ ~/mycode/go () $ sudo cat 
> /proc/108418/task/108420/stack[] 
> hrtimer_nanosleep+0xbd/0x1d0[] 
> SyS_nanosleep+0x7e/0x90[] 
> system_call_fastpath+0x16/0x1b[] 
> 0xujonnala@ ~/mycode/go () $ Exit from thread
>
>  Any help in this regard is appreciated.


I think you are misreading something somewhere.  Zombie status is a
feature of a process, not a thread.  It means that the child process
has exited but that the parent process, the one which started the
child process via the fork system call (or, on GNU/Linux, the clone
system call), has not called the wait (or waitpid or wait3 or wait4)
system call to collect its status.

So don't look at threads or P's.  Look at the parent process that
started the process that became a zombie.
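
To make the parent's role concrete, here is a minimal sketch of a Go parent that starts a child and then reaps it. runAndReap is a hypothetical helper name, and the example assumes a Unix-like system with sh on the PATH. Between Start and Wait, a child that has already exited is exactly what ps reports as a zombie; the Wait call is what collects its status and lets the kernel drop the process table entry.

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
)

// runAndReap starts the command, waits for it, and returns its exit code.
// A non-zero exit is reported through the code, not as an error.
func runAndReap(name string, args ...string) (int, error) {
	cmd := exec.Command(name, args...)
	if err := cmd.Start(); err != nil {
		return -1, err
	}
	// If the parent never reached this Wait, an exited child would stay
	// in the process table as a zombie.
	err := cmd.Wait()
	var exitErr *exec.ExitError
	if err != nil && !errors.As(err, &exitErr) {
		return -1, err
	}
	return cmd.ProcessState.ExitCode(), nil
}

func main() {
	code, err := runAndReap("sh", "-c", "exit 3")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("child exit code:", code)
}
```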

Ian

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/golang-nuts/CAOyqgcXCc05jyP6OzKt0vRJ7nUod%3DFT9JTAivU3ACfDHxGg%3Djw%40mail.gmail.com.

[go-nuts] zombie parent scenario with golang

2020-08-27 Thread Uday Kiran Jonnala


I have a situation on zombie parent scenario with golang

 A process (in this case replicator) has many goroutines internally

   1. We hit a panic() and I see the replicator process is in zombie 
   state 

<<>>>:~$ ps -ef | grep replicator
root  87548  87507  0 Aug23 ?        00:00:00 [replicator] <defunct>

 

   2. The main goroutine (or the supporting P) exited, but the panic left the 
   other P thread still present in a blocked state (the main P could be 87548, 
   and the supporting P thread 87561 is still there) 

bash-4.2# ls -Fl /proc/87548/task/87561/fd | grep 606649
l-wx--. 1 root root 64 Aug 25 10:59 1 -> pipe:[606649]
l-wx--. 1 root root 64 Aug 25 10:59 2 -> pipe:[606649]
   
   3. Stack trace 

bash-4.2# cat /proc/87548/task/87561/stack
[] futex_wait_queue_me+0xc4/0x120
[] futex_wait+0x10a/0x250
[] do_futex+0x35e/0x5b0
[] SyS_futex+0x13b/0x180
[] do_syscall_64+0x79/0x1b0
[] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[] 0x

 

   4. We have a panic internally from the main goroutine 

fatal error: concurrent map writes

goroutine 666359 [running]:
runtime.throw(0x101d6ae, 0x15)
/home/ll/ntnx/toolchain-builds/78ae837ba07c8ef8f0ea782407d8d4626815552b.x86_64/go/src/runtime/panic.go:608 +0x72 fp=0xc00374b6f0 sp=0xc00374b6c0 pc=0x42da62
runtime.mapassign_faststr(0xdb71c0, 0xc00023f5f0, 0xc000aca990, 0x83, 0xc0009d03c8)
/home/ll/ntnx/toolchain-builds/78ae837ba07c8ef8f0ea782407d8d4626815552b.x86_64/go/src/runtime/map_faststr.go:275 +0x3bf fp=0xc00374b758 sp=0xc00374b6f0 pc=0x41527f
github.eng.nutanix.com/xyz/abc/metadata.UpdateRecvInProgressFlag(0xc000aca990, 0x83, 0x0)

...

goroutine 665516 [chan receive, 2 minutes]:
zeus.(*Leadership).LeaderValue.func1(0xc003d5c120, 0x0, 0xc002e906c0, 0x52, 0xc00302ec60, 0x29)
/home/ll/ntnx/main/build/.go/src/zeus/leadership.go:244 +0x34
created by zeus.(*Leadership).LeaderValue
/home/ll/ntnx/main/build/.go/src/zeus/leadership.go:243 +0x277
2020-08-03 00:35:04 rolled over log file
ERROR: logging before flag.Parse: I0803 00:35:04.426906 196123 dataset.go:26] initialize zfs linking
ERROR: logging before flag.Parse: I0803 00:35:04.433296 196123 dataset.go:34] completed zfs linking successfully
I0803 00:35:04.433447 196123 main.go:86] Gflags passed NodeUuid: c238e584-0eeb-48bd-b299-2a25b13602f1, External Ip: 10.15.96.163
I0803 00:35:04.433460 196123 main.go:99] Component name using for this process : abc-c238e584-0eeb-48bd-b299-2a25b13602f1
I0803 00:35:04.433467 196123 main.go:120] Trying to initialize DB

 If there is a panic() from the main P thread, as I understand it we exit() and 
clean up all P threads of the process.

 Are we hitting the following scenario? I did not look into the M-P-G 
implementation in detail.

 Example:

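#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Detached worker: outlives the main thread by sleeping. */
void *thread_function(void *args)
{
    /* The message says 20 seconds, but the sleep below is 100 seconds. */
    printf("The is new thread! Sleep 20 seconds...\n");
    sleep(100);
    printf("Exit from thread\n");
    pthread_exit(0);
}

int main(int argc, char **argv)
{
    pthread_t thrd;
    pthread_attr_t attr;
    int res = 0; /* return codes are collected but not checked here */

    res = pthread_attr_init(&attr);
    res = pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    res = pthread_create(&thrd, &attr, thread_function, NULL);
    res = pthread_attr_destroy(&attr);
    printf("Main thread. Sleep 5 seconds\n");
    sleep(5);
    printf("Exit from main process\n");
    /* Ends only the main thread; the detached thread keeps running. */
    pthread_exit(0);
}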

kkk@ ~/mycode/go () $ ./a.out &
[1] 108418
Main thread. Sleep 5 seconds
The is new thread! Sleep 20 seconds...
kkk@ ~/mycode/go () $ 
Exit from main process
PID TTY  TIME CMD 
49313 pts/26   00:00:01 bash
108418 pts/26   00:00:00 [a.out] <defunct>
108449 pts/26   00:00:00 ps

 See, the main process is <defunct> and the child thread is still hanging around

kkk@ ~/mycode/go () $ sudo cat /proc/108418/task/108420/stack
[] hrtimer_nanosleep+0xbd/0x1d0
[] SyS_nanosleep+0x7e/0x90
[] system_call_fastpath+0x16/0x1b
[] 0x
ujonnala@ ~/mycode/go () $ Exit from thread 

 Any help in this regard is appreciated.
