On Mon, Jan 11, 2016 at 1:18 PM, Matt Broadstone <[email protected]> wrote:
> On Mon, Jan 11, 2016 at 1:15 PM, Gordon Sim <[email protected]> wrote:
>
>> On 01/11/2016 05:37 PM, Matt Broadstone wrote:
>>
>>> I'm having trouble tracking down the root cause of a SIGABRT in
>>> qpidd, and was hoping for some advice from the list. Specifically, it
>>> seems that after a period of little to no activity, a large burst of
>>> traffic hits the broker, and the only information we see in the logs
>>> is:
>>>
>>> Jan 11 15:58:33 test-box kernel: [ 652.903997] init: qpidd main process (2239) killed by ABRT signal
>>> Jan 11 15:58:33 test-box kernel: [ 652.911661] init: qpidd main process ended, respawning
>>>
>>> We're running Ubuntu 14.04 (trusty) on this machine, with the packages
>>> from the official qpid PPA. I tried running the services with trace
>>> logging enabled, to no avail (there were no strange packets and no error
>>> messages about failed assertions). Attaching gdb to the process also
>>> turned up nothing relevant, so I'm running out of ideas for what to try
>>> next. AFAICT the only `abort()` present in the codebase is in the
>>> assertion code, which would print something about the assertion failure.
>>>
>>> Any thoughts on what I might try to help resolve this issue?
>>
>> Could it be a memory issue? I.e. the qpidd processes exceeding some
>> memory limit and being killed by the OOM killer?
>
> I thought so at first too, but I believe we would see a kernel message
> related to that if that were the case, not to mention that the server had
> something like 120GB of free RAM at the time. I'm quite willing to test
> that theory more thoroughly if you can recommend a way of doing so.
>
> I wish I could tell you that I have a quick reproducible test case that
> only uses qpid code; unfortunately we don't. It always happens in concert
> with a number of other services. Is it possible that this could be
> triggered from some Proton code without any message being output?

We're currently experiencing this problem in a loop on one of our servers, which has something like 452GB of free memory, so I'm inclined to rule out memory as the root cause.

> Matt
>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [email protected]
>> For additional commands, e-mail: [email protected]
>>
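
For anyone wanting to test the memory theory more concretely, here is a minimal Python sketch, assuming a Linux host. It does two things: reads the peak memory counters (`VmPeak`, `VmHWM`, in kB) from `/proc/<pid>/status` for the running qpidd, and scans a kernel log for OOM-killer lines. The log path `/var/log/kern.log` and the script name are assumptions, and the OOM message wording varies between kernel versions, so the pattern is deliberately loose. It is also worth remembering that the OOM killer terminates processes with SIGKILL, not SIGABRT, so an OOM kill would normally be reported by init with KILL rather than ABRT.

```python
import re
import sys
from typing import Dict, Iterable, List

# Loose match for OOM-killer activity; exact wording differs across kernel versions.
OOM_PATTERN = re.compile(r"oom-killer|out of memory", re.IGNORECASE)


def find_oom_lines(log_lines: Iterable[str]) -> List[str]:
    """Return kernel log lines that look like OOM-killer activity."""
    return [line.rstrip("\n") for line in log_lines if OOM_PATTERN.search(line)]


def parse_memory_fields(status_text: str) -> Dict[str, int]:
    """Extract the Vm* fields (values in kB) from the text of /proc/<pid>/status."""
    fields = {}
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key.startswith("Vm") and value.strip().endswith("kB"):
            fields[key] = int(value.split()[0])
    return fields


def read_process_memory(pid: int) -> Dict[str, int]:
    """Read the Vm* counters for a live process (Linux only)."""
    with open("/proc/%d/status" % pid) as f:
        return parse_memory_fields(f.read())


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python3 check_mem.py <qpidd-pid> [kernel-log-path]
    pid = int(sys.argv[1])
    log_path = sys.argv[2] if len(sys.argv) > 2 else "/var/log/kern.log"  # assumed path
    mem = read_process_memory(pid)
    print("peak RSS (VmHWM): %s kB, peak virtual (VmPeak): %s kB"
          % (mem.get("VmHWM"), mem.get("VmPeak")))
    with open(log_path, errors="replace") as f:
        hits = find_oom_lines(f)
    print("%d OOM-related kernel log line(s)" % len(hits))
    for line in hits:
        print(line)
```

Running it periodically against the qpidd PID while the traffic burst hits would show whether `VmHWM` climbs sharply just before the abort; an empty OOM scan plus a stable peak would further support ruling memory out.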
