It is possible that you are hitting https://smartos.org/bugview/OS-5311, which I just fixed on Friday. To summarize the problem:
The bug is that the active contract template is only set up on the initial LWP in the process, so any time that LWP forks, the child will be in a different contract, as configured. However, if any other LWP initiates the fork, then that LWP has a NULL active contract template, so the child will be in the same contract.

Due to how we set up the application process for a Docker zone, if the application forks a child, it must be in a different contract. If it's not, then when the child exits, it will cause all of the other processes in the same contract to die and the zone will halt.

Jerry

On Mon, Apr 11, 2016 at 6:49 AM, Nigel Magnay <[email protected]> wrote:

> Ok, so I have traced a little further
>
> Part of the maven build shells out to run git (via the plugin code at [2]);
> the last lines of log show this before the mysterious death:
>
> [INFO] Executing: /bin/sh -c cd
> '/home/jenkins/realtime/commons/csw-commons-utils' && 'git' 'rev-parse'
> '--verify' 'HEAD'
> [INFO] Working directory: /home/jenkins/realtime/commons/csw-commons-utils
> [INFO] Storing buildNumber: a20f2793de8f449ff5c638479e73dab4a884936c at
> timestamp: 1460376264112
>
> The output is showing that the git process returned (the info output
> showing the returned info). However, in the dtrace log, I can see this:
>
> 937/1: lx_emulate(7fffff08eca0, 231, [8d, 0, 8d, ffffffffffffff90, 3c, e7])
> 937/1: lx_exit_common(LX_ET_EXIT_GROUP, 141)
>
> Which I'm assuming is the git process returning an error from exit(), and
> googling around 141 seems to indicate a pipe being closed (e.g: [1])
>
> Sometimes the build does not die at that point - and the corresponding
> part of the log did not show 141, but instead a 0
>
> 1264/1: lx_exit_common(LX_ET_EXIT_GROUP, 0)
>
> If I remove the part of the build that calls git, it completes
> successfully, so it certainly seems to be related to this.
>
> What I don't really understand is why the child process exiting is causing
> the parent to mysteriously exit - but not immediately?
>
> [1] https://bugs.launchpad.net/mksh/+bug/1532621
> [2]
> https://github.com/mojohaus/buildnumber-maven-plugin/blob/buildnumber-maven-plugin-1.3/src/main/java/org/codehaus/mojo/build/CreateMojo.java
>
> On Mon, Apr 11, 2016 at 10:37 AM, Nigel Magnay <[email protected]>
> wrote:
>
>> So uname of the GZ is
>> SunOS headnode 5.11 joyent_20160317T000105Z i86pc i386 i86pc
>>
>> The docker container is
>> Linux ff4a058f3f5e 3.13.0 BrandZ virtual linux x86_64 GNU/Linux
>>
>> I've attached the Dockerfile, though it's not that exciting. I seem to
>> get failures quite often when the docker api chooses the HN to provision a
>> node rather than a CN, but I'm not totally sure about that. I don't think
>> it's running out of memory - vmstat in the GZ shows
>>
>> I've attached an output from the tail of truss on the GZ for the java
>> process (bear in mind I know nothing currently about dtrace, but I will go
>> away and try to read up) - I assume though that I'll see segfaults as a
>> natural product of the java memory management.
>>
>> The only thing that stands out to my eye in the trace is
>> vforkx(0) = 65788
>>
>> which seems a strange return value if I'm reading it correctly?
>>
>> On Sat, Apr 9, 2016 at 11:16 AM, Nigel Magnay <[email protected]>
>> wrote:
>>
>>> I'll dig out the Dockerfile on Monday - it's not particularly complex.
>>>
>>> Basically the (jenkins) build process asks triton to instantiate a
>>> (java:8) docker image, then uses ssh to connect and invoke maven. I can
>>> connect 'by hand' and also make it die - though it may be that it's only
>>> doing this on some nodes.
>>>
>>> The most likely explanation is I've either misunderstood or
>>> misconfigured something. There would seem to be enough free memory and swap
>>> on the host whilst it's running, but might there be other limits
>>> (processes?) that may have been breached causing it to be killed?
>>>
>>> On Fri, Apr 8, 2016 at 10:20 PM, Elijah Zupancic <[email protected]>
>>> wrote:
>>>
>>>> Hi Nigel,
>>>>
>>>> Could you please give us some additional information? Could you tell us
>>>> the platform image version? You can find it by doing a uname -a or, if you
>>>> are within a docker container, a /native/bin/uname -a
>>>>
>>>> Also, with regards to the Docker build process, are you using Docker
>>>> build with Triton or are you using Docker build with Linux Docker and then
>>>> running the built image on Triton?
>>>>
>>>> Also, if your Dockerfile isn't sensitive, could you please share it with
>>>> us? I can attempt to reproduce. Another way to narrow down problems is to
>>>> try to run it in the Joyent public cloud and to see if it dies there.
>>>>
>>>> Thanks,
>>>> Elijah
>>>>
>>>> On Fri, Apr 8, 2016 at 6:11 AM, Nigel Magnay <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi -
>>>>>
>>>>> I've successfully built a Triton cloud, and am using it to provision
>>>>> docker containers to build our java software.
>>>>>
>>>>> The java container I use is derived from "java:8" docker image.
>>>>>
>>>>> Mostly it works. However I am getting mysterious failures where the
>>>>> build process (maven) simply exits half way through with no error. I have
>>>>> a test system that is spun up that, mostly, consistently fails in the same
>>>>> place.
>>>>>
>>>>> dmesg from inside the container yields nothing. Irritatingly, if I do
>>>>> 'strace -ff -o me ...', the build continues past the point it was failing.
>>>>> If I try to monitor the process from a second connection - sometimes the
>>>>> build continues and *that* connection just dies.
>>>>>
>>>>> I was able to connect briefly with truss from the GZ, but I'm not sure
>>>>> it tells me very much.
>>>>>
>>>>> Could I have hit some process accounting limit? I don't think it's
>>>>> memory (the process is set to a fairly low java Xmx limit). Where else can
>>>>> I look to figure this out?
>>>>>
>>>>
>>>> --
>>>> -Elijah
>>>>
>>>
>>
>

-------------------------------------------
smartos-discuss
Archives: https://www.listbox.com/member/archive/184463/=now
RSS Feed: https://www.listbox.com/member/archive/rss/184463/25769125-55cfbc00
Modify Your Subscription: https://www.listbox.com/member/?member_id=25769125&id_secret=25769125-7688e9fb
Powered by Listbox: http://www.listbox.com
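
For reference, the exit status 141 seen in the quoted dtrace output can be decoded with a short sketch. It assumes the common shell convention that a status of 128 + N means a process was terminated by signal N; the helper name describe_exit_status is hypothetical, and signal numbers are platform-specific (13 is SIGPIPE on Linux). It only illustrates the arithmetic behind the "pipe being closed" reading and says nothing about why the zone halted.

```python
import signal


def describe_exit_status(status):
    """Interpret an exit status using the shell convention 128 + N.

    This is only a convention: a program may also call exit() with a
    value above 128 on its own, so treat the result as a hint.
    """
    if status > 128:
        signum = status - 128
        try:
            # Map the number back to a symbolic name on this platform.
            return "terminated by " + signal.Signals(signum).name
        except ValueError:
            return "status above 128, but %d is not a known signal" % signum
    return "exited normally with status %d" % status


# The two statuses that appear in the dtrace output.
print(describe_exit_status(141))
print(describe_exit_status(0))
```

On a Linux system the first call reports SIGPIPE, since 141 - 128 = 13.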
