That's a good point. I suspect something (extra) funny is going on here. I really want to see line numbers against that stack trace. I'm not sure I believe it's a real trace; there may be some sort of stack corruption happening here.
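For reference, here is a minimal sketch of the kind of debug-only recursion guard suggested further down the thread. It is not the actual object.c code: toy_map_t, toy_map_ensure, toy_map_entry and the broken_ensure flag are hypothetical names invented for illustration, and the limit of 32 is just the figure floated in the thread. The assumption is that the entry lookup retries itself whenever the ensure step reports that the table grew.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define TOY_MAX_DEPTH 32  /* figure floated in the thread; otherwise arbitrary */

/* Hypothetical stand-in for a map; not the real pn_map_t. */
typedef struct {
  size_t capacity;
  size_t size;
  bool broken_ensure;  /* simulates an ensure step that always claims it grew */
} toy_map_t;

/* Returns true if the table had to grow, meaning the caller must retry. */
static bool toy_map_ensure(toy_map_t *map, size_t needed)
{
  if (map->broken_ensure) return true;       /* the suspected failure mode */
  if (needed <= map->capacity) return false;
  while (map->capacity < needed) {
    map->capacity = map->capacity ? map->capacity * 2 : 16;
  }
  return true;
}

/* Returns the number of retries needed, or -1 if the debug guard tripped. */
static int toy_map_entry_at(toy_map_t *map, int depth)
{
  assert(map != NULL);
#ifndef NDEBUG
  /* Debug builds only: stop and dump state instead of overflowing the stack. */
  if (depth > TOY_MAX_DEPTH) {
    fprintf(stderr, "map entry recursed %d times: size=%zu capacity=%zu\n",
            depth, map->size, map->capacity);
    return -1;
  }
#endif
  if (toy_map_ensure(map, map->size + 1)) {
    /* The table grew, so start the lookup over. */
    return toy_map_entry_at(map, depth + 1);
  }
  map->size++;
  return depth;
}

static int toy_map_entry(toy_map_t *map)
{
  return toy_map_entry_at(map, 0);
}
```

With a well-behaved ensure step, the first insert into an empty map returns 1 (one retry after growing) and later inserts return 0. With broken_ensure set, the call returns -1 once the depth passes 32 instead of recursing until the stack runs out. Built with -DNDEBUG the guard disappears, so the broken case would recurse without bound.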
On Wed, Jul 2, 2014 at 10:54 AM, Cliff Jansen <[email protected]> wrote:
> As far as I can tell, pni_map_entry() only allocates space for a
> single additional entry at a time, and hence should never recurse more
> than once. I.e. pni_map_ensure should work the first time and prevent
> further recursion.
>
> On Wed, Jul 2, 2014 at 7:32 AM, Alan Conway <[email protected]> wrote:
> > On Tue, 2014-07-01 at 07:15 -0400, Michael Goulish wrote:
> >> Yes!
> >> Great idea --
> >> I will attempt.
> >
> > I would put #ifndef NDEBUG around this code. We will never test it but
> > someday on a vital production server at our biggest customer, somebody
> > will use a map with 33 levels of nesting. I can guarantee it.
> >
> > It would be even better to do proper loop detection , i.e. check if
> > you've seen the same map before. That would affect performance but I
> > think the effect would be negligible for maps with a "normal" amount of
> > nesting.
> >
> >>
> >> ----- Original Message -----
> >>
> >> [ https://issues.apache.org/jira/browse/PROTON-625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14048701#comment-14048701 ]
> >>
> >> Rafael H. Schloming commented on PROTON-625:
> >> --------------------------------------------
> >>
> >> I think the easiest way to track down this bug would be to put some
> >> sort of detection inside of pni_map_entry and if it recurses more than
> >> some limit, e.g. 32 times or something, then print out a representation
> >> of the maps internal structure. It might also help to use a debug build
> >> so you have line numbers. Is that something you feel comfortable trying?
> >> You should be able to find the relevant code around line 551 of object.c.
> >>
> >> > Biggest Backtrace Ever!
> >> > -----------------------
> >> >
> >> > Key: PROTON-625
> >> > URL: https://issues.apache.org/jira/browse/PROTON-625
> >> > Project: Qpid Proton
> >> > Issue Type: Bug
> >> > Components: proton-c
> >> > Affects Versions: 0.7
> >> > Reporter: michael goulish
> >> >
> >> > I am saving all my stuff so I can repro on demand.
> >> > It doesn't happen every time, but it's about 50%.
> >> > ------------------------------------------
> >> > On one box, I have a dispatch router.
> >> > On the other box, I have 10 clients: 5 Messenger-based receivers, and 5 qpid-messaging-based senders.
> >> > Each client will handle 100 addresses, of the form "mick/0" ... "mick/1" ... & c.
> >> > 100 messages will be sent to each address.
> >> > I start the 5 receivers first. They start OK. Dispatch router happy & stable.
> >> > Wait a few seconds.
> >> > I start the 5 senders, from a bash script.
> >> > The first sender is already sending when the 2nd, 3rd, 4th start.
> >> > After a few of them start,but before all have finished starting, a few seconds into the script, the crash occurs. ( If they all start up successfully, no crash. )
> >> > The crash occurs in the dispatch router.
> >> > Here is the biggest backtrace ever:
> >> > #0 0x0000003cf9879ad1 in _int_malloc (av=0x7f101c000020, bytes=16384) at malloc.c:4383
> >> > #1 0x0000003cf987a911 in __libc_malloc (bytes=16384) at malloc.c:3664
> >> > #2 0x00000039c6c1650a in pni_map_allocate () from /usr/lib64/libqpid-proton.so.2
> >> > #3 0x00000039c6c16a3a in pni_map_ensure () from /usr/lib64/libqpid-proton.so.2
> >> > #4 0x00000039c6c16c45 in pni_map_entry () from /usr/lib64/libqpid-proton.so.2
> >> > #5 0x00000039c6c16c64 in pni_map_entry () from /usr/lib64/libqpid-proton.so.2
> >> > #6 0x00000039c6c16c64 in pni_map_entry () from /usr/lib64/libqpid-proton.so.2
> >> > #7 0x00000039c6c16c64 in pni_map_entry () from /usr/lib64/libqpid-proton.so.2
> >> > #8 0x00000039c6c16c64 in pni_map_entry () from /usr/lib64/libqpid-proton.so.2
> >> > #9 0x00000039c6c16c64 in pni_map_entry () from /usr/lib64/libqpid-proton.so.2
> >> > #10 0x00000039c6c16c64 in pni_map_entry () from /usr/lib64/libqpid-proton.so.2
> >> > #11 0x00000039c6c16c64 in pni_map_entry () from /usr/lib64/libqpid-proton.so.2
> >> > #12 0x00000039c6c16c64 in pni_map_entry () from /usr/lib64/libqpid-proton.so.2
> >> > #13 0x00000039c6c16c64 in pni_map_entry () from /usr/lib64/libqpid-proton.so.2
> >> > #14 0x00000039c6c16c64 in pni_map_entry () from /usr/lib64/libqpid-proton.so.2
> >> > .
> >> > .
> >> > .
> >> > .
> >> > #93549 0x00000039c6c16c64 in pni_map_entry () from /usr/lib64/libqpid-proton.so.2
> >> > #93550 0x00000039c6c16c64 in pni_map_entry () from /usr/lib64/libqpid-proton.so.2
> >> > #93551 0x00000039c6c16c64 in pni_map_entry () from /usr/lib64/libqpid-proton.so.2
> >> > #93552 0x00000039c6c16c64 in pni_map_entry () from /usr/lib64/libqpid-proton.so.2
> >> > #93553 0x00000039c6c16c64 in pni_map_entry () from /usr/lib64/libqpid-proton.so.2
> >> > #93554 0x00000039c6c16c64 in pni_map_entry () from /usr/lib64/libqpid-proton.so.2
> >> > #93555 0x00000039c6c16c64 in pni_map_entry () from /usr/lib64/libqpid-proton.so.2
> >> > #93556 0x00000039c6c16c64 in pni_map_entry () from /usr/lib64/libqpid-proton.so.2
> >> > #93557 0x00000039c6c16c64 in pni_map_entry () from /usr/lib64/libqpid-proton.so.2
> >> > #93558 0x00000039c6c16c64 in pni_map_entry () from /usr/lib64/libqpid-proton.so.2
> >> > #93559 0x00000039c6c16dc0 in pn_map_put () from /usr/lib64/libqpid-proton.so.2
> >> > #93560 0x00000039c6c17226 in pn_hash_put () from /usr/lib64/libqpid-proton.so.2
> >> > #93561 0x00000039c6c2a643 in pn_delivery_map_push () from /usr/lib64/libqpid-proton.so.2
> >> > #93562 0x00000039c6c2c44b in pn_do_transfer () from /usr/lib64/libqpid-proton.so.2
> >> > #93563 0x00000039c6c24385 in pn_dispatch_frame () from /usr/lib64/libqpid-proton.so.2
> >> > #93564 0x00000039c6c2448f in pn_dispatcher_input () from /usr/lib64/libqpid-proton.so.2
> >> > #93565 0x00000039c6c2d68b in pn_input_read_amqp () from /usr/lib64/libqpid-proton.so.2
> >> > #93566 0x00000039c6c3011a in pn_io_layer_input_passthru () from /usr/lib64/libqpid-proton.so.2
> >> > #93567 0x00000039c6c3011a in pn_io_layer_input_passthru () from /usr/lib64/libqpid-proton.so.2
> >> > #93568 0x00000039c6c2d275 in transport_consume () from /usr/lib64/libqpid-proton.so.2
> >> > #93569 0x00000039c6c304cd in pn_transport_process () from /usr/lib64/libqpid-proton.so.2
> >> > #93570 0x00000039c6c3e40c in pn_connector_process () from /usr/lib64/libqpid-proton.so.2
> >> > #93571 0x00007f1060c60460 in process_connector () from /home/mick/dispatch/build/libqpid-dispatch.so.0
> >> > #93572 0x00007f1060c61017 in thread_run () from /home/mick/dispatch/build/libqpid-dispatch.so.0
> >> > #93573 0x0000003cf9c07851 in start_thread (arg=0x7f1052bfd700) at pthread_create.c:301
> >> > #93574 0x0000003cf98e890d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
> >>
> >> --
> >> This message was sent by Atlassian JIRA
> >> (v6.2#6252)
> >
>
