Re: Part 2! Request for help / ideas to debug issue

2017-03-19 Thread Michael Hudson-Doyle
On 14 March 2017 at 23:56, John Lenton  wrote:

> As a followup, I added a mutex around pthread_create, and around the
> exec syscall, and the problem went away. This all in go's runtime; not
> a huge diff but they probably don't want the overhead (and that seems
> reasonable to me).
> Next I'm going to try to find a kernel person to take a look at this.
>

Having looked at the kernel a bit, I'm not sure that a mutex is that crazy
an idea :-) Does it impact performance in any noticeable way? clone is
fairly expensive I guess and Go programs only run ~GOMAXPROCS OS threads
anyway.  Would be interested in what any proper people you can round up
would have to say though.

Cheers,
mwh
-- 
Snapcraft mailing list
Snapcraft@lists.snapcraft.io
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/snapcraft


Re: Part 2! Request for help / ideas to debug issue

2017-03-14 Thread John Lenton
I've filed https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1672819
and https://github.com/golang/go/issues/19546

(the latter more of an FYI than a bug)

On 14 March 2017 at 10:56, John Lenton  wrote:
> As a followup, I added a mutex around pthread_create, and around the
> exec syscall, and the problem went away. This all in go's runtime; not
> a huge diff but they probably don't want the overhead (and that seems
> reasonable to me).
> Next I'm going to try to find a kernel person to take a look at this.
>
> On 13 March 2017 at 23:33, Michael Hudson-Doyle
>  wrote:
>> On 14 March 2017 at 12:21, John Lenton  wrote:
>>
>>> On 13 March 2017 at 21:05, Michael Hudson-Doyle
>>>  wrote:
>>> > If I add a
>>> > time.Sleep(1*time.Millisecond) to a_go.go before the exec, the setuid bit
>>> > is respected every time.
>>>
>>> on my way to bed, I'll give your response a proper read in the
>>> morning, but note that my reproducer causes the issue a lot more
>>> frequently than in "the real world" (snap run calling snap-confine
>>> calling snap-exec), where delays are happening naturally (because the
>>> programs aren't just calling exec, they actually have work to do :-p).
>>> I don't have numbers for how often it happens in snappy; it's a lot
>>> less frequent, but often enough to be annoying when working
>>> interactively (and there are bug reports with these warnings/errors in
>>> lp already).
>>
>>
>> Oh yes, the sleep was just to allow the other threads to settle in the test
>> case program. It's not a solution for the real world at all.
>>
>> Cheers,
>> mwh
>> --
>> Snapcraft mailing list
>> Snapcraft@lists.snapcraft.io
>> Modify settings or unsubscribe at: 
>> https://lists.ubuntu.com/mailman/listinfo/snapcraft

-- 
Snapcraft mailing list
Snapcraft@lists.snapcraft.io
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/snapcraft


Re: Part 2! Request for help / ideas to debug issue

2017-03-13 Thread Michael Hudson-Doyle
On 14 March 2017 at 12:21, John Lenton  wrote:

> On 13 March 2017 at 21:05, Michael Hudson-Doyle
>  wrote:
> > If I add a
> > time.Sleep(1*time.Millisecond) to a_go.go before the exec, the setuid bit
> > is respected every time.
>
> on my way to bed, I'll give your response a proper read in the
> morning, but note that my reproducer causes the issue a lot more
> frequently than in "the real world" (snap run calling snap-confine
> calling snap-exec), where delays are happening naturally (because the
> programs aren't just calling exec, they actually have work to do :-p).
> I don't have numbers for how often it happens in snappy; it's a lot
> less frequent, but often enough to be annoying when working
> interactively (and there are bug reports with these warnings/errors in
> lp already).


Oh yes, the sleep was just to allow the other threads to settle in the test
case program. It's not a solution for the real world at all.

Cheers,
mwh
-- 
Snapcraft mailing list
Snapcraft@lists.snapcraft.io
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/snapcraft


Re: Part 2! Request for help / ideas to debug issue

2017-03-13 Thread Michael Hudson-Doyle
On 14 March 2017 at 01:59, John Lenton  wrote:

> This one is slightly more interesting.
>
> You need 1.8 (or patched <1.8 as per the previous thread) for this one
> to make sense; without it you're just going to get drowned in warning
> messages and not see the real issue.
>
> This one is the real issue :-)
>

Ah _hah_!


> In go, when calling syscall.Exec to a setuid root binary, sometimes
> (about 4% of the times, on my machine, but it's hardware- and
> load-dependent), the exec'ed process will find itself running with
> effective uid different to zero. That is, a setuid root binary will
> find itself running as non-root. As the process that sets up
> confinement is setuid root (in distros where setuid is favoured over
> capabilities), this means the snap app falls on its face.
>
> TODO: check if something similar happens when using caps
>
> This is *probably* a bug in Go, but it only seems to arise when using
> syscall.Exec, which as far as I can tell is unsupported (the whole
> syscall package is unsupported -- not covered by the go1 compatibility
> promise -- and its replacement, golang.org/x/sys/unix, is ominously
> missing Exec).
>
> Having said that, it might be a bug in the kernel ;-)
> And I say this because if you pin the process to a single cpu, the
> issue doesn't arise.
>
> Anyway, code to repro this is at
> https://gist.github.com/chipaca/806c90d96c437444f27f45a83d00a813
>
> on my machine,
>
> $ for i in `seq -w `; do ./a_c; done | wc -l
> 0
> $ for i in `seq -w `; do ./a_go; done | wc -l
> 394
>
> And,
>
> $ for i in `seq -w `; do taskset 2 ./a_go; done | wc -l
> 0
>
> Gnarly!
>

That's pretty exciting.  I bet this is going to have the same underlying
cause as the other bug: something some other thread in the go process is
doing is causing the kernel to ignore the setuid bit. If I add a
time.Sleep(1*time.Millisecond) to a_go.go before the exec, the setuid bit
is respected every time. It doesn't help that setuid is ignored when
tracing or that strace likes to hang when you trace a_go.

I spent a while staring at the kernel source but I don't really have any
idea how this might be happening. It might be this code
https://github.com/torvalds/linux/blob/master/security/commoncap.c#L549-L561,
but I don't know how to be sure (well, without building kernels to do
debugging-via-kprint or whatever).

Cheers,
mwh
-- 
Snapcraft mailing list
Snapcraft@lists.snapcraft.io
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/snapcraft


Re: Request for help / ideas to debug issue

2017-03-13 Thread Gustavo Niemeyer
Thanks for your help on this, Michael.

Yes, the fix seems pretty simple and porting to older Go releases will
ensure people are not getting those super awkward messages randomly.

I have created a Trello card with a checklist so we can track all the
distributions that we need to fix, and included you, John, and Zygmunt.



On Sun, Mar 12, 2017 at 5:38 AM, Michael Hudson-Doyle <
michael.hud...@canonical.com> wrote:

> PS: I guess I should back port that Go fix to all supported Go releases?
>
> On 12 March 2017 at 21:37, Michael Hudson-Doyle <
> michael.hud...@canonical.com> wrote:
>
> > Before we get into this, what is the actual problem here? Just the ugly
> > messages?
> >
> > On 11 March 2017 at 02:58, Alfonso Sanchez-Beato  > canonical.com> wrote:
> >
> >> On Fri, Mar 10, 2017 at 10:22 AM, John Lenton <
> john.len...@canonical.com>
> >> wrote:
> >>
> >> > Hello!
> >> >
> >> > We're seeing a weird issue with either go, pthreads, or the kernel. If
> >> > you're knowledgeable about one or more of those things, could you take
> >> > a look? Thank you.
> >> >
> >> > The issue manifests as nasty warnings from the "snap run" command,
> >> > which is also the first step into a snapped app or service. It looks
> >> > like
> >> >
> >> > runtime/cgo: pthread_create failed: Resource temporarily unavailable
> >> >
> >> > a very stripped-down reproducer is http://pastebin.ubuntu.com/
> 24150663/
> >> >
> >> > build that, run it in a loop, and you'll see a bunch of those messages
> >> > (and some weirder ones, but let's take it one step at a time)
> >>
> >
> > Turns out this was fixed in Go 1.8: https://go-review.
> > googlesource.com/c/33894/
> >
> >
> >> > if you comment out the 'import "C"' line the message will change but
> >> > still happen, which makes me think that at least in part this is a Go
> >> > issue (or that we're holding it wrong).
> >>
> >
> > ... but only in the non-cgo case, you can (occasionally) still get
> > messages like:
> >
> > runtime: failed to create new OS thread (have 5 already; errno=11)
> > runtime: may need to increase max user processes (ulimit -u)
> > fatal error: newosproc
> >
> > if you comment out the import "C". I guess we should report that
> upstream.
> >
> >
> >> > Note that the exec does work; the warning seems to come from a
> >> > different thread than the one doing the Exec (the other clue that
> >> > points in this direction is that sometimes the message is truncated).
> >> > You can verify the fact that it does run by changing the /bin/true to
> >> > /bin/echo os.Args[1], but because this issue is obviously a race
> >> > somewhere, this change makes it less likely to happen (from ~10% down
> >> > to ~.5% of runs, in my machines).
> >> >
> >> > One thing that makes this harder to debug is that strace'ing the
> >> > process hangs (hard, kill -9 of strace to get out) before reproducing
> >> > the issue. This probably means we need to trace it at a lower level,
> >> > and I don't know enough about tracing a process group from inside the
> >> > kernel to be able to do that; what I can find about kernel-level
> >> > tracing is around syscalls or devices.
> >> >
> >> > Ideas?
> >> >
> >>
> >> I found this related thread:
> >>
> >> https://groups.google.com/forum/#!msg/golang-nuts/8gszDBRZh_
> >> 4/lhROTfN9TxIJ
> >>
> >> <<
> >> I believe this can happen on GNU/Linux if your program uses cgo and if
> >> thread A is in the Go runtime starting up a new thread B while thread
> >> C is execing a program.  The underlying cause is that while one thread
> >> is calling exec the Linux kernel will fail attempts by other threads
> >> to call clone by returning EAGAIN.  (Look for uses of the in_exec
> >> field in the kernel sources.)
> >> >>
> >>
> >
> > Yeah, this seems to be very accurate. It's also why it seems this is a
> > cosmetic problem only, some thread not calling exec fails, but well, the
> > thread is about to die anyway.
> >
> >
> >> Something like adding a little sleep removes the traces, for instance:
> >>
> >> http://paste.ubuntu.com/24151637/
> >>
> >> where the program run sleep for 1ms before calling Exec. For smaller
> units
> >> (say, 20 us) the issue still happens.
> >>
> >> It looks to me that right before running main(), go creates some
> threads,
> >> calling clone() and probably getting the race described in the thread.
> As
> >> anyway you are running Exec I guess the traces are harmless, you do not
> >> need the go threads. Nonetheless, I think that the go run time should
> >> retry
> >> instead of printing that trace.
> >
> >
> > Cheers,
> > mwh
> >
> --
> Snapcraft mailing list
> Snapcraft@lists.snapcraft.io
> Modify settings or unsubscribe at: https://lists.ubuntu.com/
> mailman/listinfo/snapcraft
>



-- 
gustavo @ http://niemeyer.net
-- 
Snapcraft mailing list
Snapcraft@lists.snapcraft.io
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/snapcraft


Re: Part 2! Request for help / ideas to debug issue

2017-03-13 Thread Gustavo Niemeyer
Thanks for finding and debugging this John.

I can't see how this one would be in Go. I can't think of any related
action that would disable the setuid bit on the exec syscall.

Perhaps someone from the kernel team can help here?



On Mon, Mar 13, 2017 at 9:59 AM, John Lenton 
wrote:

> This one is slightly more interesting.
>
> You need 1.8 (or patched <1.8 as per the previous thread) for this one
> to make sense; without it you're just going to get drowned in warning
> messages and not see the real issue.
>
> This one is the real issue :-)
>
> In go, when calling syscall.Exec to a setuid root binary, sometimes
> (about 4% of the times, on my machine, but it's hardware- and
> load-dependent), the exec'ed process will find itself running with
> effective uid different to zero. That is, a setuid root binary will
> find itself running as non-root. As the process that sets up
> confinement is setuid root (in distros where setuid is favoured over
> capabilities), this means the snap app falls on its face.
>
> TODO: check if something similar happens when using caps
>
> This is *probably* a bug in Go, but it only seems to arise when using
> syscall.Exec, which as far as I can tell is unsupported (the whole
> syscall package is unsupported -- not covered by the go1 compatibility
> promise -- and its replacement, golang.org/x/sys/unix, is ominously
> missing Exec).
>
> Having said that, it might be a bug in the kernel ;-)
> And I say this because if you pin the process to a single cpu, the
> issue doesn't arise.
>
> Anyway, code to repro this is at
> https://gist.github.com/chipaca/806c90d96c437444f27f45a83d00a813
>
> on my machine,
>
> $ for i in `seq -w `; do ./a_c; done | wc -l
> 0
> $ for i in `seq -w `; do ./a_go; done | wc -l
> 394
>
> And,
>
> $ for i in `seq -w `; do taskset 2 ./a_go; done | wc -l
> 0
>
> Gnarly!
>
> --
> Snapcraft mailing list
> Snapcraft@lists.snapcraft.io
> Modify settings or unsubscribe at: https://lists.ubuntu.com/mailm
> an/listinfo/snapcraft
>



-- 
gustavo @ http://niemeyer.net
-- 
Snapcraft mailing list
Snapcraft@lists.snapcraft.io
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/snapcraft


Part 2! Request for help / ideas to debug issue

2017-03-13 Thread John Lenton
This one is slightly more interesting.

You need 1.8 (or patched <1.8 as per the previous thread) for this one
to make sense; without it you're just going to get drowned in warning
messages and not see the real issue.

This one is the real issue :-)

In go, when calling syscall.Exec to a setuid root binary, sometimes
(about 4% of the times, on my machine, but it's hardware- and
load-dependent), the exec'ed process will find itself running with
effective uid different to zero. That is, a setuid root binary will
find itself running as non-root. As the process that sets up
confinement is setuid root (in distros where setuid is favoured over
capabilities), this means the snap app falls on its face.

TODO: check if something similar happens when using caps

This is *probably* a bug in Go, but it only seems to arise when using
syscall.Exec, which as far as I can tell is unsupported (the whole
syscall package is unsupported -- not covered by the go1 compatibility
promise -- and its replacement, golang.org/x/sys/unix, is ominously
missing Exec).

Having said that, it might be a bug in the kernel ;-)
And I say this because if you pin the process to a single cpu, the
issue doesn't arise.

Anyway, code to repro this is at
https://gist.github.com/chipaca/806c90d96c437444f27f45a83d00a813

on my machine,

$ for i in `seq -w `; do ./a_c; done | wc -l
0
$ for i in `seq -w `; do ./a_go; done | wc -l
394

And,

$ for i in `seq -w `; do taskset 2 ./a_go; done | wc -l
0

Gnarly!

-- 
Snapcraft mailing list
Snapcraft@lists.snapcraft.io
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/snapcraft


Re: Request for help / ideas to debug issue

2017-03-13 Thread John Lenton
Thank you Alfonso for that link (I didn't find that when I looked, so
kudos), and thank you Michael and Zyga for all your work on this!

On 12 March 2017 at 08:38, Michael Hudson-Doyle
 wrote:
> PS: I guess I should back port that Go fix to all supported Go releases?

that would be very nice. Does that include trusty? And how far can it
reach in Debian?

Zyga, what are our chances of patching Go on other distros?

Michael, WRT the non-cgo case, can you report it upstream? Would you
rather I did?

I'm off to try to find a reproducer for the "oops i lost the suid bit" bug :-D

-- 
Snapcraft mailing list
Snapcraft@lists.snapcraft.io
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/snapcraft


Re: Request for help / ideas to debug issue

2017-03-12 Thread Michael Hudson-Doyle
PS: I guess I should back port that Go fix to all supported Go releases?

On 12 March 2017 at 21:37, Michael Hudson-Doyle <
michael.hud...@canonical.com> wrote:

> Before we get into this, what is the actual problem here? Just the ugly
> messages?
>
> On 11 March 2017 at 02:58, Alfonso Sanchez-Beato  canonical.com> wrote:
>
>> On Fri, Mar 10, 2017 at 10:22 AM, John Lenton 
>> wrote:
>>
>> > Hello!
>> >
>> > We're seeing a weird issue with either go, pthreads, or the kernel. If
>> > you're knowledgeable about one or more of those things, could you take
>> > a look? Thank you.
>> >
>> > The issue manifests as nasty warnings from the "snap run" command,
>> > which is also the first step into a snapped app or service. It looks
>> > like
>> >
>> > runtime/cgo: pthread_create failed: Resource temporarily unavailable
>> >
>> > a very stripped-down reproducer is http://pastebin.ubuntu.com/24150663/
>> >
>> > build that, run it in a loop, and you'll see a bunch of those messages
>> > (and some weirder ones, but let's take it one step at a time)
>>
>
> Turns out this was fixed in Go 1.8: https://go-review.
> googlesource.com/c/33894/
>
>
>> > if you comment out the 'import "C"' line the message will change but
>> > still happen, which makes me think that at least in part this is a Go
>> > issue (or that we're holding it wrong).
>>
>
> ... but only in the non-cgo case, you can (occasionally) still get
> messages like:
>
> runtime: failed to create new OS thread (have 5 already; errno=11)
> runtime: may need to increase max user processes (ulimit -u)
> fatal error: newosproc
>
> if you comment out the import "C". I guess we should report that upstream.
>
>
>> > Note that the exec does work; the warning seems to come from a
>> > different thread than the one doing the Exec (the other clue that
>> > points in this direction is that sometimes the message is truncated).
>> > You can verify the fact that it does run by changing the /bin/true to
>> > /bin/echo os.Args[1], but because this issue is obviously a race
>> > somewhere, this change makes it less likely to happen (from ~10% down
>> > to ~.5% of runs, in my machines).
>> >
>> > One thing that makes this harder to debug is that strace'ing the
>> > process hangs (hard, kill -9 of strace to get out) before reproducing
>> > the issue. This probably means we need to trace it at a lower level,
>> > and I don't know enough about tracing a process group from inside the
>> > kernel to be able to do that; what I can find about kernel-level
>> > tracing is around syscalls or devices.
>> >
>> > Ideas?
>> >
>>
>> I found this related thread:
>>
>> https://groups.google.com/forum/#!msg/golang-nuts/8gszDBRZh_
>> 4/lhROTfN9TxIJ
>>
>> <<
>> I believe this can happen on GNU/Linux if your program uses cgo and if
>> thread A is in the Go runtime starting up a new thread B while thread
>> C is execing a program.  The underlying cause is that while one thread
>> is calling exec the Linux kernel will fail attempts by other threads
>> to call clone by returning EAGAIN.  (Look for uses of the in_exec
>> field in the kernel sources.)
>> >>
>>
>
> Yeah, this seems to be very accurate. It's also why it seems this is a
> cosmetic problem only, some thread not calling exec fails, but well, the
> thread is about to die anyway.
>
>
>> Something like adding a little sleep removes the traces, for instance:
>>
>> http://paste.ubuntu.com/24151637/
>>
>> where the program run sleep for 1ms before calling Exec. For smaller units
>> (say, 20 us) the issue still happens.
>>
>> It looks to me that right before running main(), go creates some threads,
>> calling clone() and probably getting the race described in the thread. As
>> anyway you are running Exec I guess the traces are harmless, you do not
>> need the go threads. Nonetheless, I think that the go run time should
>> retry
>> instead of printing that trace.
>
>
> Cheers,
> mwh
>
-- 
Snapcraft mailing list
Snapcraft@lists.snapcraft.io
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/snapcraft


Re: Request for help / ideas to debug issue

2017-03-12 Thread Michael Hudson-Doyle
Before we get into this, what is the actual problem here? Just the ugly
messages?

On 11 March 2017 at 02:58, Alfonso Sanchez-Beato <
alfonso.sanchez-be...@canonical.com> wrote:

> On Fri, Mar 10, 2017 at 10:22 AM, John Lenton 
> wrote:
>
> > Hello!
> >
> > We're seeing a weird issue with either go, pthreads, or the kernel. If
> > you're knowledgeable about one or more of those things, could you take
> > a look? Thank you.
> >
> > The issue manifests as nasty warnings from the "snap run" command,
> > which is also the first step into a snapped app or service. It looks
> > like
> >
> > runtime/cgo: pthread_create failed: Resource temporarily unavailable
> >
> > a very stripped-down reproducer is http://pastebin.ubuntu.com/24150663/
> >
> > build that, run it in a loop, and you'll see a bunch of those messages
> > (and some weirder ones, but let's take it one step at a time)
>

Turns out this was fixed in Go 1.8:
https://go-review.googlesource.com/c/33894/


> > if you comment out the 'import "C"' line the message will change but
> > still happen, which makes me think that at least in part this is a Go
> > issue (or that we're holding it wrong).
>

... but only in the non-cgo case, you can (occasionally) still get messages
like:

runtime: failed to create new OS thread (have 5 already; errno=11)
runtime: may need to increase max user processes (ulimit -u)
fatal error: newosproc

if you comment out the import "C". I guess we should report that upstream.


> > Note that the exec does work; the warning seems to come from a
> > different thread than the one doing the Exec (the other clue that
> > points in this direction is that sometimes the message is truncated).
> > You can verify the fact that it does run by changing the /bin/true to
> > /bin/echo os.Args[1], but because this issue is obviously a race
> > somewhere, this change makes it less likely to happen (from ~10% down
> > to ~.5% of runs, in my machines).
> >
> > One thing that makes this harder to debug is that strace'ing the
> > process hangs (hard, kill -9 of strace to get out) before reproducing
> > the issue. This probably means we need to trace it at a lower level,
> > and I don't know enough about tracing a process group from inside the
> > kernel to be able to do that; what I can find about kernel-level
> > tracing is around syscalls or devices.
> >
> > Ideas?
> >
>
> I found this related thread:
>
> https://groups.google.com/forum/#!msg/golang-nuts/8gszDBRZh_4/lhROTfN9TxIJ
>
> <<
> I believe this can happen on GNU/Linux if your program uses cgo and if
> thread A is in the Go runtime starting up a new thread B while thread
> C is execing a program.  The underlying cause is that while one thread
> is calling exec the Linux kernel will fail attempts by other threads
> to call clone by returning EAGAIN.  (Look for uses of the in_exec
> field in the kernel sources.)
> >>
>

Yeah, this seems to be very accurate. It's also why it seems this is a
cosmetic problem only, some thread not calling exec fails, but well, the
thread is about to die anyway.


> Something like adding a little sleep removes the traces, for instance:
>
> http://paste.ubuntu.com/24151637/
>
> where the program run sleep for 1ms before calling Exec. For smaller units
> (say, 20 us) the issue still happens.
>
> It looks to me that right before running main(), go creates some threads,
> calling clone() and probably getting the race described in the thread. As
> anyway you are running Exec I guess the traces are harmless, you do not
> need the go threads. Nonetheless, I think that the go run time should retry
> instead of printing that trace.


Cheers,
mwh
-- 
Snapcraft mailing list
Snapcraft@lists.snapcraft.io
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/snapcraft


Re: Request for help / ideas to debug issue

2017-03-10 Thread Alfonso Sanchez-Beato
On Fri, Mar 10, 2017 at 10:22 AM, John Lenton 
wrote:

> Hello!
>
> We're seeing a weird issue with either go, pthreads, or the kernel. If
> you're knowledgeable about one or more of those things, could you take
> a look? Thank you.
>
> The issue manifests as nasty warnings from the "snap run" command,
> which is also the first step into a snapped app or service. It looks
> like
>
> runtime/cgo: pthread_create failed: Resource temporarily unavailable
>
> a very stripped-down reproducer is http://pastebin.ubuntu.com/24150663/
>
> build that, run it in a loop, and you'll see a bunch of those messages
> (and some weirder ones, but let's take it one step at a time).
>
> if you comment out the 'import "C"' line the message will change but
> still happen, which makes me think that at least in part this is a Go
> issue (or that we're holding it wrong).
>
> Note that the exec does work; the warning seems to come from a
> different thread than the one doing the Exec (the other clue that
> points in this direction is that sometimes the message is truncated).
> You can verify the fact that it does run by changing the /bin/true to
> /bin/echo os.Args[1], but because this issue is obviously a race
> somewhere, this change makes it less likely to happen (from ~10% down
> to ~.5% of runs, in my machines).
>
> One thing that makes this harder to debug is that strace'ing the
> process hangs (hard, kill -9 of strace to get out) before reproducing
> the issue. This probably means we need to trace it at a lower level,
> and I don't know enough about tracing a process group from inside the
> kernel to be able to do that; what I can find about kernel-level
> tracing is around syscalls or devices.
>
> Ideas?
>

I found this related thread:

https://groups.google.com/forum/#!msg/golang-nuts/8gszDBRZh_4/lhROTfN9TxIJ

<<
I believe this can happen on GNU/Linux if your program uses cgo and if
thread A is in the Go runtime starting up a new thread B while thread
C is execing a program.  The underlying cause is that while one thread
is calling exec the Linux kernel will fail attempts by other threads
to call clone by returning EAGAIN.  (Look for uses of the in_exec
field in the kernel sources.)
>>

Something like adding a little sleep removes the traces, for instance:

http://paste.ubuntu.com/24151637/

where the program run sleep for 1ms before calling Exec. For smaller units
(say, 20 us) the issue still happens.

It looks to me that right before running main(), go creates some threads,
calling clone() and probably getting the race described in the thread. As
anyway you are running Exec I guess the traces are harmless, you do not
need the go threads. Nonetheless, I think that the go run time should retry
instead of printing that trace.


>
> --
> Snapcraft mailing list
> Snapcraft@lists.snapcraft.io
> Modify settings or unsubscribe at: https://lists.ubuntu.com/
> mailman/listinfo/snapcraft
>
-- 
Snapcraft mailing list
Snapcraft@lists.snapcraft.io
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/snapcraft


Re: Request for help / ideas to debug issue

2017-03-10 Thread Sergio Schvezov
On Fri, 10 Mar 2017 09:22:45 +, John Lenton wrote:
> Hello!
>
> We're seeing a weird issue with either go, pthreads, or the kernel. If
> you're knowledgeable about one or more of those things, could you take
> a look? Thank you.
>
> The issue manifests as nasty warnings from the "snap run" command,
> which is also the first step into a snapped app or service. It looks
> like
>
> runtime/cgo: pthread_create failed: Resource temporarily unavailable
>
> a very stripped-down reproducer is http://pastebin.ubuntu.com/24150663/
>
> build that, run it in a loop, and you'll see a bunch of those messages
> (and some weirder ones, but let's take it one step at a time).
>
> if you comment out the 'import "C"' line the message will change but
> still happen, which makes me think that at least in part this is a Go
> issue (or that we're holding it wrong).
>
> Note that the exec does work; the warning seems to come from a
> different thread than the one doing the Exec (the other clue that
> points in this direction is that sometimes the message is truncated).
> You can verify the fact that it does run by changing the /bin/true to
> /bin/echo os.Args[1], but because this issue is obviously a race
> somewhere, this change makes it less likely to happen (from ~10% down
> to ~.5% of runs, in my machines).
>
> One thing that makes this harder to debug is that strace'ing the
> process hangs (hard, kill -9 of strace to get out) before reproducing
> the issue. This probably means we need to trace it at a lower level,
> and I don't know enough about tracing a process group from inside the
> kernel to be able to do that; what I can find about kernel-level
> tracing is around syscalls or devices.
>
> Ideas?

Aside from being able to reproduce what you see I have done no analysis against 
this but may I sugges using bcc[1] instead of strace? There's a snap for it too 
;-)

[1] https://github.com/iovisor/bcc

-- 
Sent using Dekko from my Ubuntu device

-- 
Snapcraft mailing list
Snapcraft@lists.snapcraft.io
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/snapcraft


Request for help / ideas to debug issue

2017-03-10 Thread John Lenton
Hello!

We're seeing a weird issue with either go, pthreads, or the kernel. If
you're knowledgeable about one or more of those things, could you take
a look? Thank you.

The issue manifests as nasty warnings from the "snap run" command,
which is also the first step into a snapped app or service. It looks
like

runtime/cgo: pthread_create failed: Resource temporarily unavailable

a very stripped-down reproducer is http://pastebin.ubuntu.com/24150663/

build that, run it in a loop, and you'll see a bunch of those messages
(and some weirder ones, but let's take it one step at a time).

if you comment out the 'import "C"' line the message will change but
still happen, which makes me think that at least in part this is a Go
issue (or that we're holding it wrong).

Note that the exec does work; the warning seems to come from a
different thread than the one doing the Exec (the other clue that
points in this direction is that sometimes the message is truncated).
You can verify the fact that it does run by changing the /bin/true to
/bin/echo os.Args[1], but because this issue is obviously a race
somewhere, this change makes it less likely to happen (from ~10% down
to ~.5% of runs, in my machines).

One thing that makes this harder to debug is that strace'ing the
process hangs (hard, kill -9 of strace to get out) before reproducing
the issue. This probably means we need to trace it at a lower level,
and I don't know enough about tracing a process group from inside the
kernel to be able to do that; what I can find about kernel-level
tracing is around syscalls or devices.

Ideas?

-- 
Snapcraft mailing list
Snapcraft@lists.snapcraft.io
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/snapcraft