Ok - I can successfully recreate this on the public cloud - I'll send you
some details separately.

On Mon, Apr 11, 2016 at 9:08 PM, Nigel Magnay <[email protected]>
wrote:

> Sure - I'm going to try and get a test container up on the Joyent cloud to
> at least see if it does the same thing there.
>
> Interestingly, it looks like it's timing related. If I attach a java
> debugger to a debug socket when running the build, it doesn't seem to fail.
>
>
> On Mon, Apr 11, 2016 at 8:28 PM, Elijah Zupancic <[email protected]>
> wrote:
>
>> Hi Nigel,
>>
>> I'm trying to verify this bug. I wasn't able to get your Dockerfile to
>> build because it was missing associated files. Is there a way for you to
>> build the Docker image and push it to Docker hub? If so, could you then try
>> to distill the repro steps down to a single command that I can execute
>> using docker exec?
>>
>> Thanks,
>> Elijah
>>
>> On Mon, Apr 11, 2016 at 7:54 AM, Nigel Magnay <[email protected]>
>> wrote:
>>
>>> Hm - sounds related. I updated the GZ to 20160411T120144Z, but the
>>> docker container still exhibits the behaviour (unless there are other
>>> updates I need to apply?)
>>>
>>> On Mon, Apr 11, 2016 at 2:44 PM, Jerry Jelinek <[email protected]
>>> > wrote:
>>>
>>>> It is possible that you are hitting https://smartos.org/bugview/OS-5311,
>>>> which I just fixed on Friday. To summarize the problem:
>>>>
>>>> The bug is that the active contract template is only set up on the
>>>> initial LWP in the process, so any time that LWP forks, the child will be
>>>> in a different contract, as configured. However, if any other LWP initiates
>>>> the fork, then that LWP has a NULL active contract template, so the child
>>>> will be in the same contract.
>>>>
>>>> Due to how we set up the application process for a Docker zone, if
>>>> the application forks a child, it must be in a different contract. If it's
>>>> not, then when the child exits, it will cause all of the other processes in
>>>> the same contract to die and the zone will halt.
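
A minimal sketch of the two cases described above, for anyone trying to reproduce this outside Maven. It spawns the same short-lived child once from the initial thread and once from a worker thread. This assumes each Python thread is backed by its own LWP, and it is not confirmed to trigger OS-5311; the function names are made up for illustration.

```python
import subprocess
import threading


def run_child():
    """Spawn a short-lived child process and return its exit code."""
    return subprocess.run(["/bin/sh", "-c", "exit 0"]).returncode


def run_child_from_worker_thread():
    """Do the same spawn, but from a thread other than the initial one."""
    result = {}
    worker = threading.Thread(target=lambda: result.update(code=run_child()))
    worker.start()
    worker.join()
    return result["code"]


if __name__ == "__main__":
    print("spawn from initial thread:", run_child())
    print("spawn from worker thread: ", run_child_from_worker_thread())
```

On an unaffected system both calls simply return 0. If the description above applies, only the second case would be expected to take the zone down when the child exits.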
>>>>
>>>> Jerry
>>>>
>>>>
>>>> On Mon, Apr 11, 2016 at 6:49 AM, Nigel Magnay <[email protected]>
>>>> wrote:
>>>>
>>>>> Ok, so I have traced a little further
>>>>>
>>>>> Part of the maven build shells out to run git (via the plugin code at
>>>>> [2]); the last lines of the log show this before the mysterious death:
>>>>>
>>>>> [INFO] Executing: /bin/sh -c cd
>>>>> '/home/jenkins/realtime/commons/csw-commons-utils' && 'git' 'rev-parse'
>>>>> '--verify' 'HEAD'
>>>>> [INFO] Working directory:
>>>>> /home/jenkins/realtime/commons/csw-commons-utils
>>>>> [INFO] Storing buildNumber: a20f2793de8f449ff5c638479e73dab4a884936c
>>>>> at timestamp: 1460376264112
>>>>>
>>>>> The output is showing that the git process returned (the info output
>>>>> showing the returned info). However, in the dtrace log, I can see this:
>>>>>
>>>>> 937/1: lx_emulate(7fffff08eca0, 231, [8d, 0, 8d, ffffffffffffff90, 3c,
>>>>> e7])
>>>>> 937/1: lx_exit_common(LX_ET_EXIT_GROUP, 141)
>>>>>
>>>>> Which I'm assuming is the git process returning an error from exit(),
>>>>> and googling around, 141 seems to indicate a pipe being closed (e.g. [1]).
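
A small sketch of how the 141 can be read, assuming the lx_emulate arguments are printed in hex (0x8d is 141, which matches the lx_exit_common line, and 0xe7 is 231, the exit_group syscall number on x86_64 Linux). By shell convention an exit status above 128 means "terminated by signal (status - 128)". The helper name is hypothetical.

```python
import signal


def describe_exit_status(status):
    """Interpret a shell-style exit status using the 128+N signal convention."""
    if status > 128:
        try:
            return f"killed by {signal.Signals(status - 128).name}"
        except ValueError:
            # Not a known signal number on this platform.
            return f"exit code {status}"
    return f"exit code {status}"


# First syscall argument from the lx_emulate line, as printed in the trace.
status = int("8d", 16)
print(status, describe_exit_status(status))
```

141 - 128 is 13, which is SIGPIPE, so this reading is consistent with a pipe being closed before the writer had finished.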
>>>>>
>>>>> Sometimes the build does not die at that point - and the corresponding
>>>>> part of the log did not show 141, but instead a 0
>>>>>
>>>>> 1264/1: lx_exit_common(LX_ET_EXIT_GROUP, 0)
>>>>>
>>>>> If I remove the part of the build that calls git, it completes
>>>>> successfully, so it certainly seems to be related to this.
>>>>>
>>>>> What I don't really understand is why the child process exiting is
>>>>> causing the parent to mysteriously exit - but not immediately?
>>>>>
>>>>>
>>>>> [1] https://bugs.launchpad.net/mksh/+bug/1532621
>>>>> [2]
>>>>> https://github.com/mojohaus/buildnumber-maven-plugin/blob/buildnumber-maven-plugin-1.3/src/main/java/org/codehaus/mojo/build/CreateMojo.java
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Mon, Apr 11, 2016 at 10:37 AM, Nigel Magnay <[email protected]
>>>>> > wrote:
>>>>>
>>>>>> So uname of the GZ is
>>>>>> SunOS headnode 5.11 joyent_20160317T000105Z i86pc i386 i86pc
>>>>>>
>>>>>> The docker container is
>>>>>> Linux ff4a058f3f5e 3.13.0 BrandZ virtual linux x86_64 GNU/Linux
>>>>>>
>>>>>> I've attached the Dockerfile, though it's not that exciting. I seem
>>>>>> to get failures quite often when the docker api chooses the HN to
>>>>>> provision a node rather than a CN, but I'm not totally sure about that.
>>>>>> I don't think it's running out of memory - vmstat in the GZ shows
>>>>>>
>>>>>> I've attached an output from the tail of truss on the GZ for the java
>>>>>> process (bear in mind I know nothing currently about dtrace, but I will
>>>>>> go away and try to read up) - I assume though that I'll see segfaults as
>>>>>> a natural product of the java memory management.
>>>>>>
>>>>>> The only thing that stands out to my eye in the trace is
>>>>>> vforkx(0)                                       = 65788
>>>>>>
>>>>>> which seems a strange return value if I'm reading it correctly?
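
For what it's worth, a fork-family call returns the child's PID in the parent, so a large positive number is what success looks like; a failure would show up as -1 with an errno. A minimal sketch using plain fork() rather than vforkx(), with a hypothetical function name:

```python
import os


def fork_and_report():
    """Fork once; return (pid seen by the parent, child's exit code)."""
    pid = os.fork()
    if pid == 0:
        # Child: fork returned 0 here. Exit at once, skipping the
        # parent's cleanup handlers.
        os._exit(0)
    # Parent: fork returned the child's PID, an arbitrary positive integer.
    _, status = os.waitpid(pid, 0)
    return pid, os.WEXITSTATUS(status)


if __name__ == "__main__":
    child_pid, exit_code = fork_and_report()
    print(f"fork returned {child_pid} in the parent; child exited {exit_code}")
```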
>>>>>>
>>>>>>
>>>>>> On Sat, Apr 9, 2016 at 11:16 AM, Nigel Magnay <[email protected]
>>>>>> > wrote:
>>>>>>
>>>>>>> I'll dig out the Dockerfile on monday - it's not particularly
>>>>>>> complex.
>>>>>>>
>>>>>>> Basically the (jenkins) build process asks triton to instantiate a
>>>>>>> (java:8) docker image, then uses ssh to connect and invoke maven. I can
>>>>>>> connect 'by hand' and also make it die - though it may be that it's only
>>>>>>> doing this on some nodes.
>>>>>>>
>>>>>>> The most likely explanation is I've either misunderstood or
>>>>>>> misconfigured something. There would seem to be enough free memory and
>>>>>>> swap on the host whilst it's running, but might there be other limits
>>>>>>> (processes?) that may have been breached causing it to be killed?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Apr 8, 2016 at 10:20 PM, Elijah Zupancic <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> Hi Nigel,
>>>>>>>>
>>>>>>>> Could you please give us some additional information? Could you
>>>>>>>> tell us the platform image version? You can find it by running
>>>>>>>> uname -a or, if you are within a docker container, /native/bin/uname -a.
>>>>>>>>
>>>>>>>> Also, with regard to the Docker build process, are you using Docker
>>>>>>>> build with Triton, or are you using Docker build with Linux Docker and
>>>>>>>> then running the built image on Triton?
>>>>>>>>
>>>>>>>> Also, if your Dockerfile isn't sensitive, could you please share it
>>>>>>>> with us? I can attempt to reproduce. Another way to narrow down
>>>>>>>> problems is to try to run it in the Joyent public cloud and see if it
>>>>>>>> dies there.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Elijah
>>>>>>>>
>>>>>>>> On Fri, Apr 8, 2016 at 6:11 AM, Nigel Magnay <
>>>>>>>> [email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hi -
>>>>>>>>>
>>>>>>>>> I've successfully built a Triton cloud, and am using it to
>>>>>>>>> provision docker containers to build our java software.
>>>>>>>>>
>>>>>>>>> The java container I use is derived from "java:8" docker image.
>>>>>>>>>
>>>>>>>>> Mostly it works. However I am getting mysterious failures where
>>>>>>>>> the build process (maven) simply exits half way through with no
>>>>>>>>> error. I have a test system that is spun up that, mostly, consistently
>>>>>>>>> fails in the same place.
>>>>>>>>>
>>>>>>>>> dmesg from inside the container yields nothing. Irritatingly, if I
>>>>>>>>> do 'strace -ff -o me ...', the build continues past the point it was
>>>>>>>>> failing. If I try to monitor the process from a second connection -
>>>>>>>>> sometimes the build continues and *that* connection just dies.
>>>>>>>>>
>>>>>>>>> I was able to connect briefly with truss from the GZ, but I'm not
>>>>>>>>> sure it tells me very much.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Could I have hit some process accounting limit? I don't think it's
>>>>>>>>> memory (the process is set to a fairly low java Xmx limit). Where
>>>>>>>>> else can I look to figure this out?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> -Elijah
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>> *smartos-discuss* | Archives
>>>> <https://www.listbox.com/member/archive/184463/=now>
>>>> <https://www.listbox.com/member/archive/rss/184463/22541698-24d6dc34>
>>>> | Modify <https://www.listbox.com/member/?&;> Your Subscription
>>>> <http://www.listbox.com>
>>>>
>>>
>>>
>>
>>
>> --
>> -Elijah
>>
>
>


