Re: [smartos-discuss] Process death in LX (docker) zone

Nigel Magnay Mon, 11 Apr 2016 05:50:27 -0700

Ok, so I have traced a little further.

Part of the maven build shells out to run git (via the plugin code at [2]);
the last lines of the log show this before the mysterious death:


[INFO] Executing: /bin/sh -c cd
'/home/jenkins/realtime/commons/csw-commons-utils' && 'git' 'rev-parse'
'--verify' 'HEAD'
[INFO] Working directory: /home/jenkins/realtime/commons/csw-commons-utils
[INFO] Storing buildNumber: a20f2793de8f449ff5c638479e73dab4a884936c at
timestamp: 1460376264112

The output shows that the git process returned (the [INFO] line reports the
revision it printed). However, in the dtrace log, I can see this:

937/1: lx_emulate(7fffff08eca0, 231, [8d, 0, 8d, ffffffffffffff90, 3c, e7])
937/1: lx_exit_common(LX_ET_EXIT_GROUP, 141)

Which I'm assuming is the git process returning an error from exit(), and
googling around, 141 seems to indicate a pipe being closed (e.g. [1]): by
shell convention 141 is 128 + 13, and signal 13 on Linux is SIGPIPE.
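
To make the 141 reading above concrete, here is a minimal sketch in Python of
the "128 + signal number" convention that shells use for exit statuses. The
helper name describe_exit_status is made up for illustration, and the sketch
assumes the status really follows that convention; a process can also call
exit(141) directly, so this is a hint rather than proof.

```python
import signal


def describe_exit_status(status: int) -> str:
    """Interpret a shell-style exit status using the 128 + N convention.

    Hypothetical helper: it only assumes that a status above 128 encodes a
    signal number, which is a convention and not a guarantee.
    """
    if status > 128:
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # No signal with that number on this platform.
            return f"status {status}: 128 + {signum}, not a known signal"
        return f"status {status}: consistent with 128 + {signum} ({name})"
    return f"exited normally with status {status}"


if __name__ == "__main__":
    # The two values seen in the dtrace output: 141 (0x8d) and 0.
    print(describe_exit_status(141))
    print(describe_exit_status(0))
```

Run on a Linux box, the first line should name the signal numbered 13, which
matches the hex 8d seen as the first argument in the lx_emulate line.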

Sometimes the build does not die at that point, and the corresponding part
of the log then shows 0 rather than 141:

1264/1: lx_exit_common(LX_ET_EXIT_GROUP, 0)

If I remove the part of the build that calls git, it completes
successfully, so it certainly seems to be related to this.

What I don't really understand is why the child process exiting causes
the parent to mysteriously exit, and why not immediately?
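
For comparison, this is a rough sketch of what a SIGPIPE death ordinarily
looks like from the parent's side: the child is killed, and the parent just
sees a wait status. It is only an illustration and makes no claim about what
happens inside the LX zone. It assumes a `yes` binary is on the PATH, and the
function name is made up.

```python
import signal
import subprocess


def run_child_with_early_close():
    """Start `yes`, read one line, then close the read end of the pipe.

    Hypothetical demo: subprocess restores the default SIGPIPE disposition
    in the child, so the child's next write to the closed pipe kills it.
    """
    child = subprocess.Popen(["yes"], stdout=subprocess.PIPE)
    first_line = child.stdout.readline()
    child.stdout.close()  # reader goes away; the writer now gets SIGPIPE
    status = child.wait()  # the parent keeps running and collects the status
    return first_line, status


if __name__ == "__main__":
    line, status = run_child_with_early_close()
    # subprocess reports death by signal N as a return code of -N;
    # a shell would report the same event as 128 + N.
    print(line, status, 128 + signal.SIGPIPE)
```

In this ordinary case the parent survives and simply observes the status,
which is why the behaviour described above looks unexpected.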


[1] https://bugs.launchpad.net/mksh/+bug/1532621
[2]
https://github.com/mojohaus/buildnumber-maven-plugin/blob/buildnumber-maven-plugin-1.3/src/main/java/org/codehaus/mojo/build/CreateMojo.java





On Mon, Apr 11, 2016 at 10:37 AM, Nigel Magnay <[email protected]>
wrote:

> So uname of the GZ is
> SunOS headnode 5.11 joyent_20160317T000105Z i86pc i386 i86pc
>
> The docker container is
> Linux ff4a058f3f5e 3.13.0 BrandZ virtual linux x86_64 GNU/Linux
>
> I've attached the Dockerfile, though it's not that exciting. I seem to get
> failures quite often when the docker api chooses the HN to provision a node
> rather than a CN, but I'm not totally sure about that. I don't think it's
> running out of memory - vmstat in the GZ shows
>
> I've attached an output from the tail of truss on the GZ for the java
> process (bear in mind I know nothing currently about dtrace, but I will go
> away and try to read up) -  I assume though that I'll see segfaults as a
> natural product of the java memory management.
>
> The only thing that stands out to my eye in the trace is
> vforkx(0)                                       = 65788
>
> which seems a strange return value if I'm reading it correctly?
>
>
> On Sat, Apr 9, 2016 at 11:16 AM, Nigel Magnay <[email protected]>
> wrote:
>
>> I'll dig out the Dockerfile on monday - it's not particularly complex.
>>
>> Basically the (jenkins) build process asks triton to instantiate a
>> (java:8) docker image, then uses ssh to connect and invoke maven. I can
>> connect 'by hand' and also make it die - though it may be that it's only
>> doing this on some nodes.
>>
>> The most likely explanation is I've either misunderstood or misconfigured
>> something. There would seem to be enough free memory and swap on the host
>> whilst it's running, but might there be other limits (processes?) that may
>> have been breached causing it to be killed?
>>
>>
>>
>> On Fri, Apr 8, 2016 at 10:20 PM, Elijah Zupancic <[email protected]>
>> wrote:
>>
>>> Hi Nigel,
>>>
>>> Could you please give us some additional information? Could you tell us
>>> the platform image version? You can find it by doing a uname -a or, if you
>>> are within a docker container, a /native/bin/uname -a
>>>
>>> Also, with regard to the Docker build process, are you using Docker
>>> build with Triton, or are you using Docker build with Linux Docker and then
>>> running the built image on Triton?
>>>
>>> Also, if your Dockerfile isn't sensitive could you please share it with
>>> us? I can attempt to reproduce. Another way to narrow down problems is to
>>> try to run it in the Joyent public cloud and to see if it dies there.
>>>
>>> Thanks,
>>> Elijah
>>>
>>> On Fri, Apr 8, 2016 at 6:11 AM, Nigel Magnay <[email protected]>
>>> wrote:
>>>
>>>> Hi -
>>>>
>>>> I've successfully built a Triton cloud, and am using it to provision
>>>> docker containers to build our java software.
>>>>
>>>> The java container I use is derived from "java:8" docker image.
>>>>
>>>> Mostly it works. However I am getting mysterious failures where the
>>>> build process (maven) simply exits halfway through with no error. I have a
>>>> test system that, once spun up, fails fairly consistently in the same
>>>> place.
>>>>
>>>> dmesg from inside the container yields nothing. Irritatingly, if I do
>>>> 'strace -ff -o me ...', the build continues past the point it was failing.
>>>> If I try to monitor the process from a second connection - sometimes the
>>>> build continues and *that* connection just dies.
>>>>
>>>> I was able to connect briefly with truss from the GZ, but I'm not sure
>>>> it tells me very much.
>>>>
>>>>
>>>> Could I have hit some process accounting limit? I don't think it's
>>>> memory (the process is set to a fairly low java Xmx limit). Where else can
>>>> I look to figure this out?
>>>>
>>>>
>>>
>>>
>>> --
>>> -Elijah
>>>
>>
>>
>



-------------------------------------------
smartos-discuss
Archives: https://www.listbox.com/member/archive/184463/=now
RSS Feed: https://www.listbox.com/member/archive/rss/184463/25769125-55cfbc00
Modify Your Subscription: 
https://www.listbox.com/member/?member_id=25769125&id_secret=25769125-7688e9fb
Powered by Listbox: http://www.listbox.com
