Re: Request for help / ideas to debug issue

2017-03-13 Thread Gustavo Niemeyer
Thanks for your help on this, Michael.

Yes, the fix seems pretty simple, and back-porting it to older Go releases
will ensure people stop getting those super awkward messages at random.

I have created a Trello card with a checklist so we can track all the
distributions that we need to fix, and included you, John, and Zygmunt.



On Sun, Mar 12, 2017 at 5:38 AM, Michael Hudson-Doyle <
michael.hud...@canonical.com> wrote:

> PS: I guess I should back port that Go fix to all supported Go releases?
>
> On 12 March 2017 at 21:37, Michael Hudson-Doyle <
> michael.hud...@canonical.com> wrote:
>
> > Before we get into this, what is the actual problem here? Just the ugly
> > messages?
> >
> > On 11 March 2017 at 02:58, Alfonso Sanchez-Beato <alfonso.sanchez-be...@canonical.com> wrote:
> >
> >> On Fri, Mar 10, 2017 at 10:22 AM, John Lenton <john.len...@canonical.com> wrote:
> >>
> >> > Hello!
> >> >
> >> > We're seeing a weird issue with either go, pthreads, or the kernel. If
> >> > you're knowledgeable about one or more of those things, could you take
> >> > a look? Thank you.
> >> >
> >> > The issue manifests as nasty warnings from the "snap run" command,
> >> > which is also the first step into a snapped app or service. It looks
> >> > like
> >> >
> >> > runtime/cgo: pthread_create failed: Resource temporarily unavailable
> >> >
> >> > a very stripped-down reproducer is http://pastebin.ubuntu.com/24150663/
> >> >
> >> > build that, run it in a loop, and you'll see a bunch of those messages
> >> > (and some weirder ones, but let's take it one step at a time)
> >>
> >
> > Turns out this was fixed in Go 1.8: https://go-review.googlesource.com/c/33894/
> >
> >
> >> > if you comment out the 'import "C"' line the message will change but
> >> > still happen, which makes me think that at least in part this is a Go
> >> > issue (or that we're holding it wrong).
> >>
> >
> > ... but not completely: in the non-cgo case you can (occasionally) still
> > get messages like:
> >
> > runtime: failed to create new OS thread (have 5 already; errno=11)
> > runtime: may need to increase max user processes (ulimit -u)
> > fatal error: newosproc
> >
> > if you comment out the import "C". I guess we should report that upstream.
> >
> >
> >> > Note that the exec does work; the warning seems to come from a
> >> > different thread than the one doing the Exec (the other clue that
> >> > points in this direction is that sometimes the message is truncated).
> >> > You can verify the fact that it does run by changing the /bin/true to
> >> > /bin/echo os.Args[1], but because this issue is obviously a race
> >> > somewhere, this change makes it less likely to happen (from ~10% down
> >> > to ~0.5% of runs, on my machines).
> >> >
> >> > One thing that makes this harder to debug is that strace'ing the
> >> > process hangs (hard, kill -9 of strace to get out) before reproducing
> >> > the issue. This probably means we need to trace it at a lower level,
> >> > and I don't know enough about tracing a process group from inside the
> >> > kernel to be able to do that; what I can find about kernel-level
> >> > tracing is around syscalls or devices.
> >> >
> >> > Ideas?
> >> >
> >>
> >> I found this related thread:
> >>
> >> https://groups.google.com/forum/#!msg/golang-nuts/8gszDBRZh_4/lhROTfN9TxIJ
> >>
> >> <<
> >> I believe this can happen on GNU/Linux if your program uses cgo and if
> >> thread A is in the Go runtime starting up a new thread B while thread
> >> C is execing a program.  The underlying cause is that while one thread
> >> is calling exec the Linux kernel will fail attempts by other threads
> >> to call clone by returning EAGAIN.  (Look for uses of the in_exec
> >> field in the kernel sources.)
> >> >>
> >>
> >
> > Yeah, this seems very accurate. It's also why this appears to be a
> > cosmetic problem only: some thread that isn't calling exec fails, but
> > well, that thread is about to die anyway.
> >
> >
> >> Something like adding a little sleep removes the traces, for instance:
> >>
> >> http://paste.ubuntu.com/24151637/
> >>
> >> where the program sleeps for 1 ms before calling Exec. With smaller
> >> sleeps (say, 20 µs) the issue still happens.
> >>
> >> It looks to me like, right before running main(), Go creates some
> >> threads by calling clone(), and probably hits the race described in
> >> that thread. Since you are calling Exec anyway, I guess the traces are
> >> harmless: you do not need the Go threads. Nonetheless, I think the Go
> >> runtime should retry instead of printing that trace.
> >
> >
> > Cheers,
> > mwh
> >



-- 
gustavo @ http://niemeyer.net


Re: Request for help / ideas to debug issue

2017-03-13 Thread John Lenton
Thank you Alfonso for that link (I didn't find that when I looked, so
kudos), and thank you Michael and Zyga for all your work on this!

On 12 March 2017 at 08:38, Michael Hudson-Doyle <michael.hud...@canonical.com> wrote:
> PS: I guess I should back port that Go fix to all supported Go releases?

that would be very nice. Does that include trusty? And how far can it
reach in Debian?

Zyga, what are our chances of patching Go on other distros?

Michael, WRT the non-cgo case, can you report it upstream? Would you
rather I did?

I'm off to try to find a reproducer for the "oops i lost the suid bit" bug :-D



Re: Request for help / ideas to debug issue

2017-03-12 Thread Michael Hudson-Doyle
PS: I guess I should back port that Go fix to all supported Go releases?

On 12 March 2017 at 21:37, Michael Hudson-Doyle <
michael.hud...@canonical.com> wrote:

> Before we get into this, what is the actual problem here? Just the ugly
> messages?
>
> On 11 March 2017 at 02:58, Alfonso Sanchez-Beato <alfonso.sanchez-be...@canonical.com> wrote:
>
>> On Fri, Mar 10, 2017 at 10:22 AM, John Lenton <john.len...@canonical.com> wrote:
>>
>> > Hello!
>> >
>> > We're seeing a weird issue with either go, pthreads, or the kernel. If
>> > you're knowledgeable about one or more of those things, could you take
>> > a look? Thank you.
>> >
>> > The issue manifests as nasty warnings from the "snap run" command,
>> > which is also the first step into a snapped app or service. It looks
>> > like
>> >
>> > runtime/cgo: pthread_create failed: Resource temporarily unavailable
>> >
>> > a very stripped-down reproducer is http://pastebin.ubuntu.com/24150663/
>> >
>> > build that, run it in a loop, and you'll see a bunch of those messages
>> > (and some weirder ones, but let's take it one step at a time)
>>
>
> Turns out this was fixed in Go 1.8: https://go-review.googlesource.com/c/33894/
>
>
>> > if you comment out the 'import "C"' line the message will change but
>> > still happen, which makes me think that at least in part this is a Go
>> > issue (or that we're holding it wrong).
>>
>
> ... but not completely: in the non-cgo case you can (occasionally) still
> get messages like:
>
> runtime: failed to create new OS thread (have 5 already; errno=11)
> runtime: may need to increase max user processes (ulimit -u)
> fatal error: newosproc
>
> if you comment out the import "C". I guess we should report that upstream.
>
>
>> > Note that the exec does work; the warning seems to come from a
>> > different thread than the one doing the Exec (the other clue that
>> > points in this direction is that sometimes the message is truncated).
>> > You can verify the fact that it does run by changing the /bin/true to
>> > /bin/echo os.Args[1], but because this issue is obviously a race
>> > somewhere, this change makes it less likely to happen (from ~10% down
>> > to ~0.5% of runs, on my machines).
>> >
>> > One thing that makes this harder to debug is that strace'ing the
>> > process hangs (hard, kill -9 of strace to get out) before reproducing
>> > the issue. This probably means we need to trace it at a lower level,
>> > and I don't know enough about tracing a process group from inside the
>> > kernel to be able to do that; what I can find about kernel-level
>> > tracing is around syscalls or devices.
>> >
>> > Ideas?
>> >
>>
>> I found this related thread:
>>
>> https://groups.google.com/forum/#!msg/golang-nuts/8gszDBRZh_4/lhROTfN9TxIJ
>>
>> <<
>> I believe this can happen on GNU/Linux if your program uses cgo and if
>> thread A is in the Go runtime starting up a new thread B while thread
>> C is execing a program.  The underlying cause is that while one thread
>> is calling exec the Linux kernel will fail attempts by other threads
>> to call clone by returning EAGAIN.  (Look for uses of the in_exec
>> field in the kernel sources.)
>> >>
>>
>
> Yeah, this seems very accurate. It's also why this appears to be a
> cosmetic problem only: some thread that isn't calling exec fails, but
> well, that thread is about to die anyway.
>
>
>> Something like adding a little sleep removes the traces, for instance:
>>
>> http://paste.ubuntu.com/24151637/
>>
>> where the program sleeps for 1 ms before calling Exec. With smaller sleeps
>> (say, 20 µs) the issue still happens.
>>
>> It looks to me like, right before running main(), Go creates some threads
>> by calling clone(), and probably hits the race described in that thread.
>> Since you are calling Exec anyway, I guess the traces are harmless: you do
>> not need the Go threads. Nonetheless, I think the Go runtime should retry
>> instead of printing that trace.
>
>
> Cheers,
> mwh
>


Re: Request for help / ideas to debug issue

2017-03-12 Thread Michael Hudson-Doyle
Before we get into this, what is the actual problem here? Just the ugly
messages?

On 11 March 2017 at 02:58, Alfonso Sanchez-Beato <
alfonso.sanchez-be...@canonical.com> wrote:

> On Fri, Mar 10, 2017 at 10:22 AM, John Lenton <john.len...@canonical.com> wrote:
>
> > Hello!
> >
> > We're seeing a weird issue with either go, pthreads, or the kernel. If
> > you're knowledgeable about one or more of those things, could you take
> > a look? Thank you.
> >
> > The issue manifests as nasty warnings from the "snap run" command,
> > which is also the first step into a snapped app or service. It looks
> > like
> >
> > runtime/cgo: pthread_create failed: Resource temporarily unavailable
> >
> > a very stripped-down reproducer is http://pastebin.ubuntu.com/24150663/
> >
> > build that, run it in a loop, and you'll see a bunch of those messages
> > (and some weirder ones, but let's take it one step at a time)
>

Turns out this was fixed in Go 1.8:
https://go-review.googlesource.com/c/33894/


> > if you comment out the 'import "C"' line the message will change but
> > still happen, which makes me think that at least in part this is a Go
> > issue (or that we're holding it wrong).
>

... but not completely: in the non-cgo case you can (occasionally) still get
messages like:

runtime: failed to create new OS thread (have 5 already; errno=11)
runtime: may need to increase max user processes (ulimit -u)
fatal error: newosproc

if you comment out the import "C". I guess we should report that upstream.


> > Note that the exec does work; the warning seems to come from a
> > different thread than the one doing the Exec (the other clue that
> > points in this direction is that sometimes the message is truncated).
> > You can verify the fact that it does run by changing the /bin/true to
> > /bin/echo os.Args[1], but because this issue is obviously a race
> > somewhere, this change makes it less likely to happen (from ~10% down
> > to ~0.5% of runs, on my machines).
> >
> > One thing that makes this harder to debug is that strace'ing the
> > process hangs (hard, kill -9 of strace to get out) before reproducing
> > the issue. This probably means we need to trace it at a lower level,
> > and I don't know enough about tracing a process group from inside the
> > kernel to be able to do that; what I can find about kernel-level
> > tracing is around syscalls or devices.
> >
> > Ideas?
> >
>
> I found this related thread:
>
> https://groups.google.com/forum/#!msg/golang-nuts/8gszDBRZh_4/lhROTfN9TxIJ
>
> <<
> I believe this can happen on GNU/Linux if your program uses cgo and if
> thread A is in the Go runtime starting up a new thread B while thread
> C is execing a program.  The underlying cause is that while one thread
> is calling exec the Linux kernel will fail attempts by other threads
> to call clone by returning EAGAIN.  (Look for uses of the in_exec
> field in the kernel sources.)
> >>
>

Yeah, this seems very accurate. It's also why this appears to be a
cosmetic problem only: some thread that isn't calling exec fails, but well,
that thread is about to die anyway.


> Something like adding a little sleep removes the traces, for instance:
>
> http://paste.ubuntu.com/24151637/
>
> where the program sleeps for 1 ms before calling Exec. With smaller sleeps
> (say, 20 µs) the issue still happens.
>
> It looks to me like, right before running main(), Go creates some threads
> by calling clone(), and probably hits the race described in that thread.
> Since you are calling Exec anyway, I guess the traces are harmless: you do
> not need the Go threads. Nonetheless, I think the Go runtime should retry
> instead of printing that trace.


Cheers,
mwh


Re: Request for help / ideas to debug issue

2017-03-10 Thread Alfonso Sanchez-Beato
On Fri, Mar 10, 2017 at 10:22 AM, John Lenton <john.len...@canonical.com> wrote:

> Hello!
>
> We're seeing a weird issue with either go, pthreads, or the kernel. If
> you're knowledgeable about one or more of those things, could you take
> a look? Thank you.
>
> The issue manifests as nasty warnings from the "snap run" command,
> which is also the first step into a snapped app or service. It looks
> like
>
> runtime/cgo: pthread_create failed: Resource temporarily unavailable
>
> a very stripped-down reproducer is http://pastebin.ubuntu.com/24150663/
>
> build that, run it in a loop, and you'll see a bunch of those messages
> (and some weirder ones, but let's take it one step at a time).
>
> if you comment out the 'import "C"' line the message will change but
> still happen, which makes me think that at least in part this is a Go
> issue (or that we're holding it wrong).
>
> Note that the exec does work; the warning seems to come from a
> different thread than the one doing the Exec (the other clue that
> points in this direction is that sometimes the message is truncated).
> You can verify the fact that it does run by changing the /bin/true to
> /bin/echo os.Args[1], but because this issue is obviously a race
> somewhere, this change makes it less likely to happen (from ~10% down
> to ~0.5% of runs, on my machines).
>
> One thing that makes this harder to debug is that strace'ing the
> process hangs (hard, kill -9 of strace to get out) before reproducing
> the issue. This probably means we need to trace it at a lower level,
> and I don't know enough about tracing a process group from inside the
> kernel to be able to do that; what I can find about kernel-level
> tracing is around syscalls or devices.
>
> Ideas?
>

I found this related thread:

https://groups.google.com/forum/#!msg/golang-nuts/8gszDBRZh_4/lhROTfN9TxIJ

<<
I believe this can happen on GNU/Linux if your program uses cgo and if
thread A is in the Go runtime starting up a new thread B while thread
C is execing a program.  The underlying cause is that while one thread
is calling exec the Linux kernel will fail attempts by other threads
to call clone by returning EAGAIN.  (Look for uses of the in_exec
field in the kernel sources.)
>>
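
For concreteness, a minimal sketch of the kind of reproducer described
above (an approximation only, assuming the pastebin program simply imports
"C" and execs /bin/true, as the report says; the actual paste may differ):

package main

import "C" // cgo; commenting this out reportedly changes the message but not the race

import (
	"os"
	"syscall"
)

func main() {
	// Exec replaces the process image. Per the thread, the EAGAIN warning
	// is printed by a runtime thread still being created while this exec
	// is in flight. Swapping /bin/true for /bin/echo plus an argument
	// lets you confirm the exec itself succeeds.
	if err := syscall.Exec("/bin/true", []string{"true"}, os.Environ()); err != nil {
		panic(err)
	}
}

Build it with go build and run the binary in a tight shell loop; per the
report above, roughly 10% of runs print the pthread_create warning.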

Something like adding a little sleep removes the traces, for instance:

http://paste.ubuntu.com/24151637/

where the program sleeps for 1 ms before calling Exec. With smaller sleeps
(say, 20 µs) the issue still happens.
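
Assuming the paste follows the sketch above, the change amounts to just
this inside main() (plus "time" in the imports); the 1 ms value is the one
described here, not a magic constant:

	// Give the runtime's startup threads a moment to finish cloning
	// before exec'ing; 1 ms hides the race here, 20 µs does not.
	time.Sleep(time.Millisecond)
	if err := syscall.Exec("/bin/true", []string{"true"}, os.Environ()); err != nil {
		panic(err)
	}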

It looks to me like, right before running main(), Go creates some threads
by calling clone(), and probably hits the race described in that thread.
Since you are calling Exec anyway, I guess the traces are harmless: you do
not need the Go threads. Nonetheless, I think the Go runtime should retry
instead of printing that trace.
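
To illustrate the retry idea (a sketch only: the real change would belong
in the runtime's thread-creation path, not in user code, and the retry
count and back-off below are arbitrary illustration values):

package main

import (
	"fmt"
	"syscall"
	"time"
)

// retryOnEAGAIN treats EAGAIN from thread creation as transient (another
// thread may be in the middle of an exec) and retries briefly instead of
// printing a warning and giving up.
func retryOnEAGAIN(createThread func() error) error {
	var err error
	for i := 0; i < 20; i++ {
		if err = createThread(); err != syscall.EAGAIN {
			return err // success, or a genuinely fatal error
		}
		time.Sleep(time.Millisecond) // back off while the exec finishes
	}
	return err
}

func main() {
	// Stand-in for pthread_create/clone: fail twice with EAGAIN, then
	// succeed, to show the retry loop in action.
	attempts := 0
	err := retryOnEAGAIN(func() error {
		attempts++
		if attempts < 3 {
			return syscall.EAGAIN
		}
		return nil
	})
	fmt.Println("attempts:", attempts, "err:", err)
}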




Re: Request for help / ideas to debug issue

2017-03-10 Thread Sergio Schvezov
On Fri, 10 Mar 2017 09:22:45 +0000, John Lenton wrote:
> Hello!
>
> We're seeing a weird issue with either go, pthreads, or the kernel. If
> you're knowledgeable about one or more of those things, could you take
> a look? Thank you.
>
> The issue manifests as nasty warnings from the "snap run" command,
> which is also the first step into a snapped app or service. It looks
> like
>
> runtime/cgo: pthread_create failed: Resource temporarily unavailable
>
> a very stripped-down reproducer is http://pastebin.ubuntu.com/24150663/
>
> build that, run it in a loop, and you'll see a bunch of those messages
> (and some weirder ones, but let's take it one step at a time).
>
> if you comment out the 'import "C"' line the message will change but
> still happen, which makes me think that at least in part this is a Go
> issue (or that we're holding it wrong).
>
> Note that the exec does work; the warning seems to come from a
> different thread than the one doing the Exec (the other clue that
> points in this direction is that sometimes the message is truncated).
> You can verify the fact that it does run by changing the /bin/true to
> /bin/echo os.Args[1], but because this issue is obviously a race
> somewhere, this change makes it less likely to happen (from ~10% down
> to ~0.5% of runs, on my machines).
>
> One thing that makes this harder to debug is that strace'ing the
> process hangs (hard, kill -9 of strace to get out) before reproducing
> the issue. This probably means we need to trace it at a lower level,
> and I don't know enough about tracing a process group from inside the
> kernel to be able to do that; what I can find about kernel-level
> tracing is around syscalls or devices.
>
> Ideas?

Aside from being able to reproduce what you see, I have done no analysis on
this, but may I suggest using bcc[1] instead of strace? There's a snap for
it too ;-)

[1] https://github.com/iovisor/bcc

-- 
Sent using Dekko from my Ubuntu device
