We finally found the issue!!!

In include/qpid/dispatch/atomic.h, the "#else" branch is not correctly tested, because in some cases the mutex can be NULL. When "sys_atomic_add" is called and the mutex is NULL, the process dumps core.

I removed the "#elif defined(__GNUC__) || defined(__clang__)" part and ran the code on Red Hat Linux with gcc 4.9, and got the same core as on Solaris.

On Solaris there is an atomic library, so I will add an "#elif" part that uses it and then submit a patch. But I really think you should test the "#else" part. For example, qdr_forward_message_CT is setting in_delivery to 0, and thus the mutex is never initialized. I think there are other paths which might lead to the same error.

dbx qdrouterd core
(dbx) where
current thread: t@2
  [1] mutex_lock_impl(0x0, 0x0, 0x0, 0x0, 0x8dd138, 0x8a1820), at 0xfffffd7ffef44c05
  [2] __mutex_lock(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7ffef44d9b
=>[3] sys_mutex_lock(mutex = (nil)), line 64 in "threading.c"
  [4] sys_atomic_add(ref = 0x8dd138, value = 1U), line 109 in "atomic.h"
  [5] qdr_forward_deliver_CT(core = 0x7ad680, link = 0x8a1820, dlv = 0x8dd120), line 167 in "forwarder.c"
  [6] qdr_forward_closest_CT(core = 0x7ad680, addr = 0x7e3da0, msg = 0x8a4020, in_delivery = (nil), exclude_inprocess = 1U, control = 0), line 394 in "forwarder.c"
  [7] qdr_forward_message_CT(core = 0x7ad680, addr = 0x7e3da0, msg = 0x8a4020, in_delivery = (nil), exclude_inprocess = 1U, control = 0), line 750 in "forwarder.c"
  [8] qdr_send_to_CT(core = 0x7ad680, action = 0x7af220, discard = 0), line 650 in "transfer.c"
  [9] router_core_thread(arg = 0x7ad680), line 83 in "router_core_thread.c"
  [10] _thr_setup(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7ffef4bfbb
  [11] _lwp_start(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7ffef4c1e0

Regards,
Adel

________________________________
From: Adel Boutros <[email protected]>
Sent: Tuesday, January 24, 2017 6:51:09 PM
To: [email protected]
Subject: Re: [Dispatch Router 0.7.0] [SOLARIS] Unit tests failing

The below error
was due to a bug in one of my patches (I updated it).

I still have 2 problems:

* Test Case server_tests.test_user_fd: FAIL: Error while reading
  The error is "Resource temporarily unavailable". I suppose this is an OS-dependent issue with "read", but I am not sure yet.
* qdstat/qdmanage makes the dispatch router crash
  9 out of 10 test failures are related to this error.

Core using Purify:
------------------------------
**** Purify instrumented ./qdrouterd (pid 15668) ****
COR: Fatal core dump:
  * This is occurring while in thread 2:
        mutex_lock_impl [libc.so.1]
        sys_atomic_add [atomic.h:109]
        qdr_forward_deliver_CT [forwarder.c:167]
        qdr_forward_closest_CT [forwarder.c:394]
        qdr_forward_message_CT [forwarder.c:750]
        qdr_send_to_CT [transfer.c:650]
        router_core_thread [router_core_thread.c:83]
        _thr_setup [libc.so.1]
  * Received signal 11 (SIGSEGV - Segmentation Fault)
  * Faulting address = 0x4
  * Signal mask: (SIGSEGV)
  * Pending signals:

Command line
----------------------
qdrouterd -c dispatch.conf

dispatch.conf
------------------
container {
    worker-threads: 4
    containerName: qpid.dispatch.router.10400
}

listener {
    addr: 0.0.0.0
    port: 10400
    role: normal
    saslMechanisms: ANONYMOUS
    requireSsl: no
    authenticatePeer: no
}

router {
    mode: interior
    routerId: router.10400
    helloInterval: 60
    helloMaxAge: 180
}

Regards,
Adel

________________________________
From: Adel Boutros <[email protected]>
Sent: Tuesday, January 24, 2017 10:05:44 AM
To: [email protected]
Subject: Re: [Dispatch Router 0.7.0] [SOLARIS] Unit tests failing

Hello,

I was able to fix most of the errors except for the Logger one, which is preventing the Dispatch Router from starting and thus causing the tests that need a running Dispatch Router to fail.

9: Test command: /python278/bin/python "/build-dir/qpid-dispatch/tests/run.py" "--vg" "unit_tests" "/qpid-dispatch-0.7.0/tests/threads4.conf"
9: Test timeout computed to be: 1500
9: Tue Jan 24 09:52:00 2017 ERROR (error) Python: KeyError: ('output',)
9: Tue Jan 24 09:52:00 2017 ERROR (error) Traceback (most recent call last):
9:   File "/qpid-dispatch-0.7.0/python/qpid_dispatch/management/entity.py", line 60, in __getitem__
9:     return self.attributes[name]
9: KeyError: ('output',)
9:
9: Tue Jan 24 09:52:00 2017 ERROR (error) Python: CError: Python: KeyError: ('output',)
9: Tue Jan 24 09:52:00 2017 ERROR (error) Traceback (most recent call last):
9:   File "/qpid-dispatch-0.7.0/python/qpid_dispatch_internal/management/config.py", line 147, in configure_dispatch
9:     agent.configure(attributes=dict(type="log", module=m))
9:   File "/qpid-dispatch-0.7.0/python/qpid_dispatch_internal/management/agent.py", line 891, in configure
9:     self._create(attributes)
9:   File "/qpid-dispatch-0.7.0/python/qpid_dispatch_internal/management/agent.py", line 859, in _create
9:     pointer = entity.create()
9:   File "/qpid-dispatch-0.7.0/python/qpid_dispatch_internal/management/agent.py", line 287, in create
9:     self._qd.qd_log_entity(self)
9:   File "/qpid-dispatch-0.7.0/python/qpid_dispatch_internal/dispatch.py", line 100, in _errcheck
9:     raise CError(self.qd_error_message())
9: CError: Python: KeyError: ('output',)
9:
9: Config failed: Python: CError: Python: KeyError: ('output',)

Regards,
Adel

________________________________
From: Adel Boutros <[email protected]>
Sent: Monday, January 23, 2017 7:30:30 PM
To: [email protected]
Subject: [Dispatch Router 0.7.0] [SOLARIS] Unit tests failing

Hello,

After succeeding in compiling Dispatch Router on Solaris, I have 16 out of 26 unit tests failing. I was hoping you could give me hints as to what could be causing them. It seems to me there are some errors which keep appearing.
I will keep investigating tomorrow, but I am really counting on getting some assistance to ease my task. You will find the failed tests attached to this mail.

Regards,
Adel
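
For reference, the NULL-mutex crash reported in the newest message of this thread can be illustrated with a small, self-contained sketch. This is not the code from atomic.h: the type demo_atomic_t and the functions demo_atomic_init and demo_atomic_add are hypothetical names made up for illustration, and the guard shown is only one possible way to turn the segfault into a reportable error.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a mutex-backed fallback counter.
 * These names are illustrative; they are NOT the declarations in atomic.h. */
typedef struct {
    uint32_t         value;
    pthread_mutex_t *lock;   /* stays NULL until demo_atomic_init() has run */
} demo_atomic_t;

/* Bind the counter to a caller-owned mutex and set its starting value. */
static void demo_atomic_init(demo_atomic_t *ref, pthread_mutex_t *lock, uint32_t value)
{
    assert(ref != NULL && lock != NULL);
    ref->value = value;
    ref->lock  = lock;
}

/* Add 'value' to the counter and store the previous value in '*old'
 * (if 'old' is not NULL). Returns 0 on success, or -1 when the counter
 * was never initialized, instead of locking a NULL mutex and crashing. */
static int demo_atomic_add(demo_atomic_t *ref, uint32_t value, uint32_t *old)
{
    if (ref == NULL || ref->lock == NULL)
        return -1;

    pthread_mutex_lock(ref->lock);
    if (old != NULL)
        *old = ref->value;
    ref->value += value;
    pthread_mutex_unlock(ref->lock);
    return 0;
}
```

Calling demo_atomic_add on a counter whose lock is still NULL returns -1 and leaves the value untouched; after demo_atomic_init it returns 0 and reports the previous value. Whether a real fix should guard like this, initialize the mutex on every path that creates a delivery, or use a native atomic operation is a design choice this sketch does not settle.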
