[jira] [Commented] (DISPATCH-358) Intermittent crashes in qdrouterd under load from parallel senders
[ https://issues.apache.org/jira/browse/DISPATCH-358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15729662#comment-15729662 ]

Vishal Sharda commented on DISPATCH-358:

What you said seems to be the case. These crashes have not been observed after upgrading to Proton 0.13.0 and later.

> Intermittent crashes in qdrouterd under load from parallel senders
> ------------------------------------------------------------------
>
> Key: DISPATCH-358
> URL: https://issues.apache.org/jira/browse/DISPATCH-358
> Project: Qpid Dispatch
> Issue Type: Bug
> Components: Routing Engine
> Affects Versions: 0.6.0
> Environment: Debian 8.3, Apache Qpid Proton 0.12.2 for drivers and dependencies; Hardware: 2 CPUs, 15 GB RAM, 60 GB HDD on 2 separate machines
> Reporter: Vishal Sharda
> Assignee: Ted Ross
> Priority: Critical
> Attachments: Crash_EXTERNAL.png, Crash_Java_Router_3.png, Crash_Java_Send.png, Crash_Java_free_qd_connection.png, Crash_Java_same_router.png, Crash_Java_same_router_another.png, Crash_Java_same_router_another_bt.png, Crash_SASL.png, Crash_SASL_2.png, Crash_SR_1.png, Crash_SR_2.png, Crash_bt_double_free_Java_RES_266MB.png, Crash_double_free_Java_RES_266MB.png, Crash_free.png, Crash_sasl_server_done.png, Crash_watch_qdstat.png, Overflow_Error.png
>
> In my setup of two inter-connected routers, with several senders connecting to one router and a few receivers connecting to the other, I see several crashes in the router to which the senders connect. These crashes are intermittent and happen once in every 10 runs or so. I have collected backtraces of all the crashes but do not yet have a test case that reliably reproduces any of them.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Commented] (DISPATCH-337) Huge memory leaks in Qpid Dispatch router
[ https://issues.apache.org/jira/browse/DISPATCH-337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15556243#comment-15556243 ]

Vishal Sharda commented on DISPATCH-337:

We built with the default setting "RelWithDebInfo", which should have included the debug information. CPU usage per thread is very low:

vsharda@millennium-qpid-deploy-lnp-1-5129:~$ top -Hbcd 5 | grep qdrouterd
14213 vsharda  20  0   11132   1624  1492 S  0.0  0.0   0:00.00 grep qdrouterd
25467 qserv    20  0 2524848  2.187g  8420 S  0.0  3.7  19:51.34 qdrouterd -c /x/web/LIVE/switch-dr-network/configurator+
25482 qserv    20  0 2524848  2.187g  8420 S  0.0  3.7   0:39.50 qdrouterd -c /x/web/LIVE/switch-dr-network/configurator+
25493 qserv    20  0 2524848  2.187g  8420 S  0.0  3.7  20:02.09 qdrouterd -c /x/web/LIVE/switch-dr-network/configurator+
25494 qserv    20  0 2524848  2.187g  8420 S  0.0  3.7  19:51.65 qdrouterd -c /x/web/LIVE/switch-dr-network/configurator+
25495 qserv    20  0 2524848  2.187g  8420 S  0.0  3.7  19:55.10 qdrouterd -c /x/web/LIVE/switch-dr-network/configurator+
25493 qserv    20  0 2524848  2.187g  8420 S  0.2  3.7  20:02.10 qdrouterd -c /x/web/LIVE/switch-dr-network/configurator+
[subsequent 5-second samples show the same five qdrouterd threads at or near 0.0 %CPU and 2.187g RES]

The memory footprint is increasing steadily. qdstat failed to get a response within 120 seconds:

vsharda@millennium-qpid-deploy-lnp-2-7131:/$ qdstat -cb 10.25.171.242 -t 120
Timeout: Connection amqp://10.25.171.242:amqp/$management timed out: Opening connection
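The steady RES growth above was read off top. As an alternative, the same number can be sampled programmatically; below is a minimal Linux-only sketch (a hypothetical helper, not part of qdrouterd or qdstat) that reads VmRSS from /proc for a given pid:

```c
#include <stdio.h>

/* Hypothetical helper: return the resident set size of a process in kB
 * by scanning the VmRSS line of /proc/<pid>/status, or -1 on error.
 * Linux-only; this is a diagnostic sketch, not qdrouterd code. */
long rss_kb(int pid)
{
    char path[64], line[256];
    long kb = -1;

    snprintf(path, sizeof path, "/proc/%d/status", pid);
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    while (fgets(line, sizeof line, f)) {
        if (sscanf(line, "VmRSS: %ld kB", &kb) == 1)
            break;  /* found the resident-set line */
    }
    fclose(f);
    return kb;
}
```

Polling this once a minute for the qdrouterd pid and logging the result yields a growth curve without keeping top running.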
[jira] [Commented] (DISPATCH-337) Huge memory leaks in Qpid Dispatch router
[ https://issues.apache.org/jira/browse/DISPATCH-337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15556181#comment-15556181 ]

Vishal Sharda commented on DISPATCH-337:

vsharda@millennium-qpid-deploy-lnp-2-7131:/$ PN_TRACE_FRM=1 qdstat -cb 10.24.170.251
[0xfbf6f0]: -> SASL
[0xfbf6f0]: <- SASL
[0xfbf6f0]:0 <- @sasl-mechanisms(64) [sasl-server-mechanisms=@PN_SYMBOL[:ANONYMOUS]]
[0xfbf6f0]:0 -> @sasl-init(65) [mechanism=:ANONYMOUS, initial-response=b"anonymous@millennium-qpid-deploy-lnp-2-7131"]
[0xfbf6f0]:0 <- @sasl-outcome(68) [code=0]
[0xfbf6f0]: -> AMQP
[0xfbf6f0]:0 -> @open(16) [container-id="2034a069-e072-46f3-ac55-3c76fbb692ca", hostname="10.24.170.251", channel-max=32767]
[0xfbf6f0]: <- AMQP
[0xfbf6f0]:0 <- @open(16) [container-id="Router.A.0", max-frame-size=16384, channel-max=32767, idle-time-out=8000, offered-capabilities=:"ANONYMOUS-RELAY", properties={:product="qpid-dispatch-router", :version="0.6.0"}]
[0xfbf6f0]:0 -> @begin(17) [next-outgoing-id=0, incoming-window=2147483647, outgoing-window=2147483647]
[0xfbf6f0]:0 -> @attach(18) [name="2034a069-e072-46f3-ac55-3c76fbb692ca-$management", handle=0, role=false, snd-settle-mode=2, rcv-settle-mode=0, source=@source(40) [durable=0, timeout=0, dynamic=false], target=@target(41) [address="$management", durable=0, timeout=0, dynamic=false], initial-delivery-count=0]
[0xfbf6f0]:0 <- @begin(17) [remote-channel=0, next-outgoing-id=0, incoming-window=61, outgoing-window=2147483647]
[0xfbf6f0]:0 -> (EMPTY FRAME)
[0xfbf6f0]:0 -> (EMPTY FRAME)
Timeout: Connection amqp://10.24.170.251:amqp/$management timed out: Opening link 2034a069-e072-46f3-ac55-3c76fbb692ca-$management

There are no symbols found while running pstack:

25467: qdrouterd -c /x/web/LIVE/switch-dr-network/configurator/qdrouterd.conf
(No symbols found)
0x7f4197612d3d: (2, 4023a0, 7fff7c15271f, 7fff7c15271f, 4023a0, 401a67) + 800085032b50
0x7f41987163b0: (1, 12f3f60, 30, 31, 7f4184123050, 7f41801207e0) + 527e0
0x10003: (1267c80, 1278990, 1267ca0, 11d8120, 124dd40, 401810) + ffdddcf0
0x7f410004: (12bee70, 7f4195605a50, 12c1a90, 12c1b50, 7f4198b4d580, 4) + 90
0x01183850: (1267c80, 1278990, 1267ca0, 11d8120, 124dd40, 401810) + ffdddcf0
[the last two frames repeat]

> Huge memory leaks in Qpid Dispatch router
> -----------------------------------------
>
> Key: DISPATCH-337
> URL: https://issues.apache.org/jira/browse/DISPATCH-337
> Project: Qpid Dispatch
> Issue Type: Bug
> Affects Versions: 0.6.0
> Environment: Debian 8.3, Apache Qpid Proton 0.12.2 for drivers and dependencies; Hardware: 2 CPUs, 15 GB RAM, 60 GB HDD on 2 separate machines
> Reporter: Vishal Sharda
> Priority: Critical
> Attachments: LNP-1_Huge_memory.png, LNP-1_Leak_starts.png, LNP-1_not_accepting_connections.png, Memory_usage_first_run_no_SSL.png, Memory_usage_subsequent_run_no_SSL.png, Rapid_perm_memory_increase.png, Subsequent_memory_increase.png, Tim-Router-3-huge-memory-usage.png, Tim_Router_3.png, Tim_Routers_3_and_6_further_leaks.png, config1.conf, config2.conf, val2_receiver.txt, val2_sender.txt
>
> Valgrind shows huge memory leaks while running 2 interconnected routers with 2 parallel senders connected to one router and 2 parallel receivers connected to the other router.
> The CRYPTO leak coming from Qpid Proton 0.12.2 is already fixed here: https://issues.apache.org/jira/browse/PROTON-1115
> However, the rest of the leaks are from qdrouterd.
[jira] [Commented] (DISPATCH-337) Huge memory leaks in Qpid Dispatch router
[ https://issues.apache.org/jira/browse/DISPATCH-337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15556144#comment-15556144 ]

Vishal Sharda commented on DISPATCH-337:

1. The router is reachable, and the logs show "Accepting incoming connection ..." from one of the client machines. It is not clear what causes the connection failure.

2. Yes, there are a few TCP connections to the bad router:

vsharda@millennium-qpid-deploy-lnp-1-5129:~$ netstat -at | grep 5670
tcp    0  0 *:5670                   *:*                      LISTEN
tcp    0  0 millennium-qpid-de:5670  userstage114828.c:36758  ESTABLISHED
tcp    0  0 localhost:5670           localhost:33894          ESTABLISHED
tcp    0  0 millennium-qpid-de:5670  userstage118169.c:38132  ESTABLISHED
tcp    0  0 millennium-qpid-de:5670  10.22.99.81:40080        ESTABLISHED
tcp    0  0 millennium-qpid-de:5670  10.22.102.215:50594      ESTABLISHED
tcp6   0  0 localhost:33894          localhost:5670           ESTABLISHED

3. Other routers say that the bad router (Router.A.0) does not exist in the network:

vsharda@millennium-qpid-deploy-lnp-2-7131:/$ qdstat -nv
Routers in the Network
router-id   next-hop  link  cost  neighbors                                    valid-origins
Router.A.1  (self)    -           ['Router.A.2', 'Router.A.3', 'Router.A.4']  []
Router.A.2  -         1     1     ['Router.A.1', 'Router.A.3', 'Router.A.4']  []
Router.A.3  -         2     1     ['Router.A.1', 'Router.A.2', 'Router.A.4']  []
Router.A.4  -         3     1     ['Router.A.1', 'Router.A.2', 'Router.A.3']  []

4. On 2016-10-02 13:28:30 it was 620 MB; on 2016-10-07 13:16:00 it is 2.127 GB.

5. The bad router cannot be queried from the good routers:

vsharda@millennium-qpid-deploy-lnp-2-7131:/$ qdstat -b 10.24.170.251 -c
Timeout: Connection amqp://10.24.170.251:amqp/$management timed out: Opening link 9809b66a-1f2d-4952-92f1-c6c5c8b35680-$management
[jira] [Comment Edited] (DISPATCH-337) Huge memory leaks in Qpid Dispatch router
[ https://issues.apache.org/jira/browse/DISPATCH-337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15543002#comment-15543002 ]

Vishal Sharda edited comment on DISPATCH-337 at 10/3/16 9:34 PM:

One router on a machine (LNP-1) in a cluster of 5 completely connected routers suddenly started growing in memory and stopped accepting incoming connections. We cannot even run qdstat on that router to know its status. This is happening on one machine after we cherry-picked the fixes of DISPATCH-491 and DISPATCH-505 on all 5 machines in this cluster.

was (Author: vsharda):
One router on a machine (LNP-1) in a cluster of 5 completely connected routers suddenly started growing in memory and stopped accepting incoming connections. We cannot even run qdstat on that router to know its status.
[jira] [Updated] (DISPATCH-337) Huge memory leaks in Qpid Dispatch router
[ https://issues.apache.org/jira/browse/DISPATCH-337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vishal Sharda updated DISPATCH-337:
-----------------------------------
Attachment: LNP-1_not_accepting_connections.png
            LNP-1_Leak_starts.png
            LNP-1_Huge_memory.png

One router on a machine (LNP-1) in a cluster of 5 completely connected routers suddenly started growing in memory and stopped accepting incoming connections. We cannot even run qdstat on that router to know its status.
[jira] [Commented] (DISPATCH-337) Huge memory leaks in Qpid Dispatch router
[ https://issues.apache.org/jira/browse/DISPATCH-337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15514526#comment-15514526 ]

Vishal Sharda commented on DISPATCH-337:

Hi Ted,

The testing continues, and it has grown to 167 MB now. There were some undelivered messages, but we killed all the clients and the router memory stayed at 167 MB.

3124 tim  20  0  288432 167348  6088 S  0.7  2.1  114:07.90 qdrouterd

Here are the various qdstat outputs now.

tim@tkuchlein3-linux:~$ qdstat -c
Connections
id    host                  container                             role          dir  security                                authentication
2     10.244.162.114:58234  Router.A.4                            inter-router  in   TLSv1/SSLv3(DHE-RSA-AES256-GCM-SHA384)  anonymous-user
3     10.244.162.117:52588  Router.A.5                            inter-router  in   TLSv1/SSLv3(DHE-RSA-AES256-GCM-SHA384)  anonymous-user
4     10.244.162.176:48724  Router.A.6                            inter-router  in   TLSv1/SSLv3(DHE-RSA-AES256-GCM-SHA384)  anonymous-user
5     10.244.162.231:38920  Router.A.7                            inter-router  in   TLSv1/SSLv3(DHE-RSA-AES256-GCM-SHA384)  anonymous-user
8360  127.0.0.1:57104       69110aed-8f13-4968-8c84-cad3c734a859  normal        in   no-security                             anonymous-user

tim@tkuchlein3-linux:~$ qdstat -m
Types
type                     size   batch  thread-max  total    in-threads  rebal-in   rebal-out
qd_bitmask_t             24     64     128         640      128         56,559     56,567
qd_buffer_t              536    16     32          120,672  118,144     1,799,008  1,799,166
qd_composed_field_t      64     64     128         64       64          0          0
qd_composite_t           112    64     128         128      128         0          0
qd_connection_t          224    64     128         1,024    128         95         109
qd_deferred_call_t       32     64     128         64       64          0          0
qd_field_iterator_t      128    64     128         16,640   13,184      196,986    197,040
qd_hash_handle_t         16     64     128         192      192         0          0
qd_hash_item_t           32     64     128         192      192         0          0
qd_hash_segment_t        24     64     128         64       64          0          0
qd_link_t                48     64     128         1,088    128         101        116
qd_listener_t            32     64     128         64       64          0          0
qd_log_entry_t           2,104  16     32          1,008    1,008       0          0
qd_management_context_t  56     64     128         64       64          0          0
qd_message_content_t     640    16     32          26,928   26,560      104,471    104,494
qd_message_t             128    64     128         27,328   26,624      50,618     50,629
qd_node_t                56     64     128         64       64          0          0
qd_parsed_field_t        80     64     128         7,936    4,928       107,049    107,096
qd_timer_t               56     64     128         128      128         0          0
qd_work_item_t           24     64     128         1,024    128         3,059      3,073
qdpn_connector_t         600    16     32          1,024    32          429        491
qdpn_listener_t          48     64     128         64       64          0          0
qdr_action_t             160    64     128         320      128         501,060    501,063
qdr_address_config_t     56     64     128         64       64          0          0
qdr_address_t            264    16     32          64       32          0          2
qdr_connection_t         216    64     128         1,152    128         112        128
qdr_connection_work_t    56     64     128         192      128         129        130
qdr_delivery_ref_t       24     64     128         448      128         115,473    115,478
qdr_delivery_t           144    64     128         26,560   25,920      5,327      5,337
qdr_error_t              24     64     128         192      128         129        130
qdr_field_t              40     64     128         8,448    8,448       9,175      9,175
qdr_general_work_t       64     64     128         192      128         55,495     55,496
qdr_link_ref_t           24     64     128         2,176    128         271,818    271,850
qdr_link_t               264    16     32          1,072    48          460        524
qdr_node_t               80     64     128         64       64
[jira] [Commented] (DISPATCH-337) Huge memory leaks in Qpid Dispatch router
[ https://issues.apache.org/jira/browse/DISPATCH-337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15514271#comment-15514271 ]

Vishal Sharda commented on DISPATCH-337:

Hi Ted,

Here is the output. This router has grown to nearly 160 MB in memory.

tim@tkuchlein3-linux:~$ qdstat -m
Types
type                     size   batch  thread-max  total    in-threads  rebal-in   rebal-out
qd_bitmask_t             24     64     128         640      192         55,913     55,920
qd_buffer_t              536    16     32          118,624  115,936     1,658,161  1,658,329
qd_composed_field_t      64     64     128         64       64          0          0
qd_composite_t           112    64     128         128      128         0          0
qd_connection_t          224    64     128         1,024    128         95         109
qd_deferred_call_t       32     64     128         64       64          0          0
qd_field_iterator_t      128    64     128         16,192   13,184      169,723    169,770
qd_hash_handle_t         16     64     128         128      128         0          0
qd_hash_item_t           32     64     128         128      128         0          0
qd_hash_segment_t        24     64     128         64       64          0          0
qd_link_t                48     64     128         1,088    128         101        116
qd_listener_t            32     64     128         64       64          0          0
qd_log_entry_t           2,104  16     32          1,008    1,008       0          0
qd_management_context_t  56     64     128         64       64          0          0
qd_message_content_t     640    16     32          26,512   26,032      102,426    102,456
qd_message_t             128    64     128         27,072   26,112      49,998     50,013
qd_node_t                56     64     128         64       64          0          0
qd_parsed_field_t        80     64     128         7,680    4,928       88,639     88,682
qd_timer_t               56     64     128         128      128         0          0
qd_work_item_t           24     64     128         1,024    128         3,059      3,073
qdpn_connector_t         600    16     32          1,024    48          428        489
qdpn_listener_t          48     64     128         64       64          0          0
qdr_action_t             160    64     128         320      128         469,489    469,492
qdr_address_config_t     56     64     128         64       64          0          0
qdr_address_t            264    16     32          64       64          0          0
qdr_connection_t         216    64     128         1,152    192         112        127
qdr_connection_work_t    56     64     128         192      128         128        129
qdr_delivery_ref_t       24     64     128         448      128         102,729    102,734
qdr_delivery_t           144    64     128         26,304   25,344      5,184      5,199
qdr_error_t              24     64     128         192      128         127        128
qdr_field_t              40     64     128         8,448    8,384       9,077      9,078
qdr_general_work_t       64     64     128         192      128         54,906     54,907
qdr_link_ref_t           24     64     128         2,176    256         259,099    259,129
qdr_link_t               264    16     32          1,072    96          457        518
qdr_node_t               80     64     128         64       64          0          0
qdr_query_t              336    16     32          16       16          0          0
qdr_terminus_t           64     64     128         64       64          0          0
qdtm_router_t            16     64     128         64       64          0          0
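Comparing per-type "total" counts between the two qdstat -m dumps in this thread (taken at roughly 160 MB and 167 MB resident) gives a rough lower bound on where the growth is going. A minimal sketch of that arithmetic, with figures copied from the two dumps and sizes in bytes per object:

```c
#include <stdio.h>

/* Sketch: "total" is the number of objects the per-type pool allocator
 * has handed out; if it keeps climbing while the client load is steady,
 * those objects are not being returned.  The figures below are copied
 * from the two qdstat -m dumps in this thread (earlier ~160 MB dump vs
 * later ~167 MB dump); only three of the growing pools are sampled. */
struct pool_delta {
    const char *type;
    long size_bytes;     /* bytes per pooled object       */
    long total_before;   /* "total" in the earlier dump   */
    long total_after;    /* "total" in the later dump     */
};

static const struct pool_delta pools[] = {
    { "qd_buffer_t",          536, 118624, 120672 },
    { "qd_message_content_t", 640,  26512,  26928 },
    { "qdr_delivery_t",       144,  26304,  26560 },
};

/* Print each pool's growth and return the summed growth in bytes. */
long pool_growth_bytes(void)
{
    long grand = 0;
    for (unsigned i = 0; i < sizeof pools / sizeof pools[0]; i++) {
        long delta = pools[i].total_after - pools[i].total_before;
        long bytes = delta * pools[i].size_bytes;
        printf("%-22s +%5ld objects = %8ld bytes\n",
               pools[i].type, delta, bytes);
        grand += bytes;
    }
    return grand;
}
```

These three pools alone account for roughly 1.4 MB of growth between the two samples, dominated by qd_buffer_t; this is only a lower bound, since several other pools also grew.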
[jira] [Commented] (DISPATCH-337) Huge memory leaks in Qpid Dispatch router
[ https://issues.apache.org/jira/browse/DISPATCH-337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15514009#comment-15514009 ]

Vishal Sharda commented on DISPATCH-337:

Hi Ted,

The clients connected to the non-SSL port (5672). Here are the openssl versions used while building:

tim@tkuchlein3-linux:~$ sudo dpkg -l | grep openssl
ii  libgnutls-openssl27:amd64  3.4.10-4ubuntu1.1  amd64  GNU TLS library - OpenSSL wrapper
ii  openssl                    1.0.2g-1ubuntu4.2  amd64  Secure Sockets Layer toolkit - cryptographic utility
[jira] [Updated] (DISPATCH-337) Huge memory leaks in Qpid Dispatch router
[ https://issues.apache.org/jira/browse/DISPATCH-337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vishal Sharda updated DISPATCH-337:
-----------------------------------
Attachment: Tim-Router-3-huge-memory-usage.png
            Tim_Routers_3_and_6_further_leaks.png
            Tim_Router_3.png

We are running a network of 5 routers (0.6.1 and Proton 0.14.0) on 5 bare-metal machines running Ubuntu 16.04.1 LTS. These charts show that the routers leak memory every time clients connect. There are no undelivered/pending messages.
[jira] [Created] (DISPATCH-470) Router terminated by itself while sitting idle
Vishal Sharda created DISPATCH-470:
-----------------------------------
Summary: Router terminated by itself while sitting idle
Key: DISPATCH-470
URL: https://issues.apache.org/jira/browse/DISPATCH-470
Project: Qpid Dispatch
Issue Type: Bug
Components: Routing Engine
Affects Versions: 0.6.1
Environment: Debian 8.3, Apache Qpid Proton 0.13.0 for drivers and dependencies; Hardware: 8 CPUs, 61 GB RAM, 30 GB HDD each on 5 separate machines
Reporter: Vishal Sharda

We are running a network of 5 inter-connected routers, each on a separate host. One of the routers in this network terminated after running successfully for 7+ days. The router was idle (not receiving or sending any messages) when this happened. After analyzing the core dump, it appears to be related to mutex locking in multithreaded code. Here is the full backtrace:

Reading symbols from qpid-dispatch/qdrouterd...(no debugging symbols found)...done.
[New LWP 4082]
[New LWP 4088]
[New LWP 4030]
[New LWP 4089]
[New LWP 4090]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `qdrouterd -c /x/web/LIVE/switch-dr-network/configurator/qdrouterd.conf'.
Program terminated with signal SIGABRT, Aborted.
#0  0x7ffaaf7e6067 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56    ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) up
#1  0x7ffaaf7e7448 in __GI_abort () at abort.c:89
89    abort.c: No such file or directory.
(gdb) up
#2  0x7ffaaf7df266 in __assert_fail_base (fmt=0x7ffaaf918238 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x7ffab0997baf "result == 0", file=file@entry=0x7ffab0997b48 "/home/anhtran/qpid-package/dispatch/qpid-dispatch-0.6.x-Vanilla/src/posix/threading.c", line=line@entry=71, function=function@entry=0x7ffab0997cd1 "sys_mutex_lock") at assert.c:92
92    assert.c: No such file or directory.
(gdb) up
#3  0x7ffaaf7df312 in __GI___assert_fail (assertion=0x7ffab0997baf "result == 0", file=0x7ffab0997b48 "/home/anhtran/qpid-package/dispatch/qpid-dispatch-0.6.x-Vanilla/src/posix/threading.c", line=71, function=0x7ffab0997cd1 "sys_mutex_lock") at assert.c:101
101   in assert.c
(gdb)
#4  0x7ffab09802eb in sys_mutex_lock () from /usr/local/lib/qpid-dispatch/libqpid-dispatch.so
(gdb)
#5  0x7ffab098863a in qdr_forward_deliver_CT () from /usr/local/lib/qpid-dispatch/libqpid-dispatch.so
(gdb)
#6  0x7ffab0989274 in qdr_forward_closest_CT () from /usr/local/lib/qpid-dispatch/libqpid-dispatch.so
(gdb)
#7  0x7ffab098dd98 in ?? () from /usr/local/lib/qpid-dispatch/libqpid-dispatch.so
(gdb)
#8  0x7ffab098b45a in router_core_thread () from /usr/local/lib/qpid-dispatch/libqpid-dispatch.so
(gdb)
#9  0x7ffab04f50a4 in start_thread (arg=0x7ffaad524700) at pthread_create.c:309
309   pthread_create.c: No such file or directory.
(gdb)
#10 0x7ffaaf89987d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
111   ../sysdeps/unix/sysv/linux/x86_64/clone.S: No such file or directory.
(gdb) Initial frame selected; you cannot go up.
(gdb)
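The failing assertion in frame #2 (`result == 0` at threading.c line 71, inside sys_mutex_lock) corresponds to an error-checked locking pattern like the hypothetical re-sketch below (not the actual qpid-dispatch source). pthread_mutex_lock returns non-zero, e.g. EINVAL, when handed a destroyed or corrupted mutex, so an abort here on an idle router can indicate use-after-free of the structure owning the mutex rather than lock contention:

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical sketch of the pattern behind the failing assertion:
 * a wrapper that treats any pthread_mutex_lock failure as fatal.
 * pthread mutex calls return 0 on success and an errno-style code
 * (never errno itself) on failure. */
typedef struct {
    pthread_mutex_t impl;
} sketch_mutex_t;

void sketch_mutex_init(sketch_mutex_t *m)
{
    int result = pthread_mutex_init(&m->impl, 0);
    assert(result == 0);
}

void sketch_mutex_lock(sketch_mutex_t *m)
{
    int result = pthread_mutex_lock(&m->impl);
    assert(result == 0);   /* the "result == 0" assertion seen in frame #2 */
}

void sketch_mutex_unlock(sketch_mutex_t *m)
{
    int result = pthread_mutex_unlock(&m->impl);
    assert(result == 0);
}
```

Against a valid mutex these wrappers succeed silently; the SIGABRT in the core only fires when the underlying pthread call reports an error on the mutex handle itself.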
[jira] [Commented] (DISPATCH-383) Intermittent router crashes when restarting one router in the network with different number of threads
[ https://issues.apache.org/jira/browse/DISPATCH-383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15327955#comment-15327955 ] Vishal Sharda commented on DISPATCH-383: These are intermittent crashes and I do not yet have a test case that can reliably reproduce them. 1. If I restarted R1 with different number of threads, both R2 and R3 crashed with the same backtrace which is attached here. On a later run, I saw crash only in R2. 2. Yes, this could most likely be timing issue with multithreading on. There is no way for us to control/prevent this from occurring again. The steps involved were simple - interrupting the router, editing the configuration file and starting it again. 3. I have not tested this without SSL but the intermittent crashes that I was seeing due to SASL (DISPATCH-358) no longer appear after upgrading to Proton-0.13.0-RC. Hence, I keep 2-way SSL enabled for all inter-router communication during my tests. > Intermittent router crashes when restarting one router in the network with > different number of threads > -- > > Key: DISPATCH-383 > URL: https://issues.apache.org/jira/browse/DISPATCH-383 > Project: Qpid Dispatch > Issue Type: Bug > Components: Routing Engine >Affects Versions: 0.6.0 > Environment: Debian 8.3, Apache Qpid Proton 0.13.0-RC for drivers and > dependencies, Hardware: 2 CPUs, 15 GB RAM, 60 GB HDD each on 3 separate > machines >Reporter: Vishal Sharda >Assignee: Ganesh Murthy >Priority: Critical > Attachments: Crash_route_tables_1.png, Crash_route_tables_2.png, > Crash_route_tables_3.png > > > Network: A network of 3 interior routers built using the latest trunk and > connected to each other using 2-way SSL. > Stopping one router in the network, changing its number of threads in the > configuration file and starting it again to join the network causes > intermittent crash in other routers in the network. 
> I was able to reproduce the crash three times and collect the backtraces > inside gdb (screenshots attached). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
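The crash scenario above hinges on the worker-thread count in the router's configuration. As a reference for reproducing it, a minimal qdrouterd.conf fragment showing the setting that was changed between restarts (the router id and the thread value are illustrative, not taken from the reporter's actual configuration):

```
router {
    mode: interior
    id: R1
    workerThreads: 4    # value changed between restarts in the crash scenario
}
```

Attribute names follow the qdrouterd.conf schema of the 0.6-era router; the crash was triggered by restarting one router with a different workerThreads value while its peers stayed up.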
[jira] [Updated] (DISPATCH-383) Intermittent router crashes when restarting one router in the network with different number of threads
[ https://issues.apache.org/jira/browse/DISPATCH-383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vishal Sharda updated DISPATCH-383:
---
Attachment: Crash_route_tables_3.png
Crash_route_tables_2.png
Crash_route_tables_1.png

Screenshots showing the crash and the backtrace in the routers when doing as described.
[jira] [Created] (DISPATCH-383) Intermittent router crashes when restarting one router in the network with different number of threads
Vishal Sharda created DISPATCH-383:
--
Summary: Intermittent router crashes when restarting one router in the network with different number of threads
Key: DISPATCH-383
URL: https://issues.apache.org/jira/browse/DISPATCH-383
Project: Qpid Dispatch
Issue Type: Bug
Components: Routing Engine
Affects Versions: 0.6.0
Environment: Debian 8.3, Apache Qpid Proton 0.13.0-RC for drivers and dependencies, Hardware: 2 CPUs, 15 GB RAM, 60 GB HDD each on 3 separate machines
Reporter: Vishal Sharda
Priority: Critical

Network: A network of 3 interior routers built using the latest trunk and connected to each other using 2-way SSL.
Stopping one router in the network, changing its number of threads in the configuration file and starting it again to join the network causes intermittent crashes in other routers in the network.
I was able to reproduce the crash three times and collect the backtraces inside gdb (screenshots attached).
[jira] [Updated] (DISPATCH-382) Intermittent router crash when starting 50 receivers/0 senders and doing qdstat
[ https://issues.apache.org/jira/browse/DISPATCH-382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vishal Sharda updated DISPATCH-382:
---
Attachment: val_crash_2.txt
val_crash_1.txt
Crash_in_Valgrind_3.txt
Crash_in_Valgrind_3.png
Crash_in_Valgrind_2.png
Crash_in_Valgrind_1.png

Attached 3 screenshots showing the crash and 3 output files from Valgrind for the corresponding runs. Here is the information about the thread that led to SIGABRT, as reported by Valgrind:

==18841== Thread 2:
==18841== Invalid read of size 4
==18841==    at 0x52F7274: pthread_mutex_lock (pthread_mutex_lock.c:66)
==18841==    by 0x4E648E7: sys_mutex_lock (threading.c:70)
==18841==    by 0x4E70EDC: qdr_forward_deliver_CT (forwarder.c:132)
==18841==    by 0x4E71D4A: qdr_forward_closest_CT (forwarder.c:405)
==18841==    by 0x4E72A96: qdr_forward_message_CT (forwarder.c:707)
==18841==    by 0x4E7C0B9: qdr_send_to_CT (transfer.c:581)
==18841==    by 0x4E76623: router_core_thread (router_core_thread.c:71)
==18841==    by 0x52F50A3: start_thread (pthread_create.c:309)
==18841==    by 0x5F8387C: clone (clone.S:111)
==18841== Address 0xdc42ee0 is 16 bytes inside a block of size 48 free'd
==18841==    at 0x4C28D29: free (vg_replace_malloc.c:530)
==18841==    by 0x4E648CD: sys_mutex_free (threading.c:64)
==18841==    by 0x4E6FBAF: qdr_connection_closed_CT (connections.c:972)
==18841==    by 0x4E76623: router_core_thread (router_core_thread.c:71)
==18841==    by 0x52F50A3: start_thread (pthread_create.c:309)
==18841==    by 0x5F8387C: clone (clone.S:111)
==18841== Block was alloc'd at
==18841==    at 0x4C27C0F: malloc (vg_replace_malloc.c:299)
==18841==    by 0x4E64859: sys_mutex (threading.c:51)
==18841==    by 0x4E6D111: qdr_connection_opened (connections.c:85)
==18841==    by 0x4E7D7CA: AMQP_opened_handler (router_node.c:560)
==18841==    by 0x4E7D837: AMQP_inbound_opened_handler (router_node.c:572)
==18841==    by 0x4E5397D: notify_opened (container.c:261)
==18841==    by 0x4E53A0D: policy_notify_opened (container.c:275)
==18841==    by 0x4E61B3A: qd_policy_amqp_open (policy.c:744)
==18841==    by 0x4E81BC1: invoke_deferred_calls (server.c:720)
==18841==    by 0x4E81CE7: process_connector (server.c:766)
==18841==    by 0x4E827C0: thread_run (server.c:1024)
==18841==    by 0x52F50A3: start_thread (pthread_create.c:309)
==18841==
==18841== Invalid read of size 4
==18841==    at 0x52F2A03: __pthread_mutex_lock_full (pthread_mutex_lock.c:177)
==18841==    by 0x4E648E7: sys_mutex_lock (threading.c:70)
==18841==    by 0x4E70EDC: qdr_forward_deliver_CT (forwarder.c:132)
==18841==    by 0x4E71D4A: qdr_forward_closest_CT (forwarder.c:405)
==18841==    by 0x4E72A96: qdr_forward_message_CT (forwarder.c:707)
==18841==    by 0x4E7C0B9: qdr_send_to_CT (transfer.c:581)
==18841==    by 0x4E76623: router_core_thread (router_core_thread.c:71)
==18841==    by 0x52F50A3: start_thread (pthread_create.c:309)
==18841==    by 0x5F8387C: clone (clone.S:111)
==18841== Address 0xdc42ee0 is 16 bytes inside a block of size 48 free'd
==18841==    at 0x4C28D29: free (vg_replace_malloc.c:530)
==18841==    by 0x4E648CD: sys_mutex_free (threading.c:64)
==18841==    by 0x4E6FBAF: qdr_connection_closed_CT (connections.c:972)
==18841==    by 0x4E76623: router_core_thread (router_core_thread.c:71)
==18841==    by 0x52F50A3: start_thread (pthread_create.c:309)
==18841==    by 0x5F8387C: clone (clone.S:111)
==18841== Block was alloc'd at
==18841==    at 0x4C27C0F: malloc (vg_replace_malloc.c:299)
==18841==    by 0x4E64859: sys_mutex (threading.c:51)
==18841==    by 0x4E6D111: qdr_connection_opened (connections.c:85)
==18841==    by 0x4E7D7CA: AMQP_opened_handler (router_node.c:560)
==18841==    by 0x4E7D837: AMQP_inbound_opened_handler (router_node.c:572)
==18841==    by 0x4E5397D: notify_opened (container.c:261)
==18841==    by 0x4E53A0D: policy_notify_opened (container.c:275)
==18841==    by 0x4E61B3A: qd_policy_amqp_open (policy.c:744)
==18841==    by 0x4E81BC1: invoke_deferred_calls (server.c:720)
==18841==    by 0x4E81CE7: process_connector (server.c:766)
==18841==    by 0x4E827C0: thread_run (server.c:1024)
==18841==    by 0x52F50A3: start_thread (pthread_create.c:309)
==18841==
==18841== Process terminating with default action of signal 6 (SIGABRT)
==18841==    at 0x5ED0067: raise (raise.c:56)
==18841==    by 0x5ED1447: abort (abort.c:89)
==18841==    by 0x5EC9265: __assert_fail_base (assert.c:92)
==18841==    by 0x5EC9311: __assert_fail (assert.c:101)
==18841==    by 0x4E6490F: sys_mutex_lock (threading.c:71)
==18841==    by 0x4E70EDC: qdr_forward_deliver_CT (forwarder.c:132)
==18841==    by 0x4E71D4A: qdr_forward_closest_CT (forwarder.c:405)
==18841==    by 0x4E72A96: qdr_forward_message_CT (forwarder.c:707)
==18841==    by 0x4E7C0B9: qdr_send_to_CT (transfer.c:581)
==18841==    by 0x4E76623: router_core_thread
[jira] [Created] (DISPATCH-382) Intermittent router crash when starting 50 receivers/0 senders and doing qdstat
Vishal Sharda created DISPATCH-382:
--
Summary: Intermittent router crash when starting 50 receivers/0 senders and doing qdstat
Key: DISPATCH-382
URL: https://issues.apache.org/jira/browse/DISPATCH-382
Project: Qpid Dispatch
Issue Type: Bug
Components: Routing Engine
Affects Versions: 0.6.0
Environment: Debian 8.3, Apache Qpid Proton 0.13.0-RC for drivers and dependencies, Hardware: 2 CPUs, 15 GB RAM, 60 GB HDD each on 3 separate machines
Reporter: Vishal Sharda
Priority: Blocker

Network: A network of 3 interior routers built from trunk and connected to each other using 2-way SSL.
We ran a Proton-J Reactor API based client to start 50 receivers and 0 senders on one of the above 3 routers. After that we ran "qdstat -c". This leads to intermittent crashes in the router.
This crash could not be reproduced while running the routers independently or inside gdb. When we run the routers inside Valgrind, the crash is frequent. I was able to reproduce the crash 3 times using Valgrind (screenshots and Valgrind output files are attached).
This intermittent crash becomes permanent in our instrumented build.
[jira] [Updated] (DISPATCH-380) Router stops receiving messages from multiple senders publishing to multiple queues in parallel
[ https://issues.apache.org/jira/browse/DISPATCH-380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vishal Sharda updated DISPATCH-380: --- Attachment: Single_router_testing_results.pdf Attached a file Single_router_testing_results.pdf containing several results of testing a single router. > Router stops receiving messages from multiple senders publishing to multiple > queues in parallel > --- > > Key: DISPATCH-380 > URL: https://issues.apache.org/jira/browse/DISPATCH-380 > Project: Qpid Dispatch > Issue Type: Bug > Components: Routing Engine > Environment: Debian 8.3, Apache Qpid Proton 0.13.0-RC for drivers and > dependencies, Hardware: 2 CPUs, 15 GB RAM, 60 GB HDD each on 3 separate > machines >Reporter: Vishal Sharda >Priority: Critical > Fix For: 0.6.0 > > Attachments: AWS_hung_at_round_figures.png, > AWS_hung_for_4_seconds.png, AWS_uneven_start.png, Senders_1.png, > Senders_2.png, Senders_3_10_Queues.png, Senders_4_10_queues.png, > Single_router_testing_results.pdf, qdstat_wrong_output.png > > > I am running a Java Client against a cluster of 3 interior routers connected > to each other. 2-way SSL is enabled for all the connections. > There were 20 simultaneous queues with 20 senders on each queue and each > sender publishing 1000 messages. All the senders were connected to Router 1. > 20 receivers were connected to Router 2 with 1 receiver receiving from each > queue. > In the first run, router stopped receiving incoming messages after delivering > 386,339 out of 400K "Hello World!" messages. > In the second run, 388,781 messages out of 400K were delivered. > I reduced the number of queues to 10 (halving total number of messages to > 200K) and the issue occurred again. > I ran the Java client on an 8 CPU machine again with 10 queues and the issue > occurred again after delivering just 54K out of 200K messages. > All the senders were hung (still connected) with no messages flowing at all. 
> Connection information from qdstat:
> When the messages are flowing properly and I run "qdstat -c", I see all the
> senders as secure and authenticated.
> After they hang and I run "qdstat -c", it erroneously shows all the clients
> as insecure and unauthenticated.
> Shortly after the clients hang, all the queues are deleted from the router
> network but connections are still shown until I terminate the clients.
> I saw this erroneous situation before also when "qdstat -c" showed some
> senders as secure and authentic but some as insecure/unauthentic.
[jira] [Comment Edited] (DISPATCH-380) Router stops receiving messages from multiple senders publishing to multiple queues in parallel
[ https://issues.apache.org/jira/browse/DISPATCH-380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15325898#comment-15325898 ] Vishal Sharda edited comment on DISPATCH-380 at 6/11/16 3:14 PM:
-
I ran a single instance of Qpid Dispatch Router built on an AWS instance running Ubuntu. This was an interior router with no connections to other routers. I used a Proton-J Messenger API based Java client that uses 2-way SSL.
I started 10 receivers first, with each receiver listening on a different endpoint (total 10 endpoints created). Then I started 20 senders on each endpoint (total 200 senders), each publishing 1000 "Hello World!" messages. A total of 200K messages were to be delivered, but I saw that all the senders hung after only 70K messages were delivered. In a second identical run, they hung after delivery of 67K messages.
The attached screenshots show the following:
1.) AWS_uneven_start.png: Uneven start on different endpoints.
2.) AWS_hung_at_round_figures.png: Just before my senders hung, the number of messages sent/received on each endpoint became a round figure.
3.) AWS_hung_for_4_seconds.png: After 4 seconds, no more messages went through and all the counts remained intact.
Finally, all the endpoints were deleted from the router, but "qdstat -c" still showed connections from my senders, wrongly marked as insecure and unauthenticated.
[jira] [Updated] (DISPATCH-380) Router stops receiving messages from multiple senders publishing to multiple queues in parallel
[ https://issues.apache.org/jira/browse/DISPATCH-380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vishal Sharda updated DISPATCH-380:
---
Attachment: AWS_uneven_start.png
AWS_hung_for_4_seconds.png
AWS_hung_at_round_figures.png
[jira] [Updated] (DISPATCH-380) Router stops receiving messages from multiple senders publishing to multiple queues in parallel
[ https://issues.apache.org/jira/browse/DISPATCH-380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vishal Sharda updated DISPATCH-380:
---
Attachment: qdstat_wrong_output.png
Senders_4_10_queues.png
Senders_3_10_Queues.png
Senders_2.png
Senders_1.png

Screenshots showing how qdstat shows senders connected but insecure and unauthenticated after they hang.
[jira] [Created] (DISPATCH-380) Router stops receiving messages from multiple senders publishing to multiple queues in parallel
Vishal Sharda created DISPATCH-380:
--
Summary: Router stops receiving messages from multiple senders publishing to multiple queues in parallel
Key: DISPATCH-380
URL: https://issues.apache.org/jira/browse/DISPATCH-380
Project: Qpid Dispatch
Issue Type: Bug
Components: Routing Engine
Environment: Debian 8.3, Apache Qpid Proton 0.13.0-RC for drivers and dependencies, Hardware: 2 CPUs, 15 GB RAM, 60 GB HDD each on 3 separate machines
Reporter: Vishal Sharda
Priority: Critical
Fix For: 0.6.0

I am running a Java client against a cluster of 3 interior routers connected to each other. 2-way SSL is enabled for all the connections.
There were 20 simultaneous queues with 20 senders on each queue and each sender publishing 1000 messages. All the senders were connected to Router 1. 20 receivers were connected to Router 2, with 1 receiver receiving from each queue.
In the first run, the router stopped receiving incoming messages after delivering 386,339 out of 400K "Hello World!" messages.
In the second run, 388,781 messages out of 400K were delivered.
I reduced the number of queues to 10 (halving the total number of messages to 200K) and the issue occurred again.
I ran the Java client on an 8 CPU machine again with 10 queues and the issue occurred again after delivering just 54K out of 200K messages.
All the senders were hung (still connected) with no messages flowing at all.
Connection information from qdstat:
When the messages are flowing properly and I run "qdstat -c", I see all the senders as secure and authenticated.
After they hang and I run "qdstat -c", it erroneously shows all the clients as insecure and unauthenticated.
Shortly after the clients hang, all the queues are deleted from the router network but connections are still shown until I terminate the clients.
I saw this erroneous situation before also when "qdstat -c" showed some senders as secure and authenticated but some as insecure/unauthenticated.
[jira] [Updated] (DISPATCH-358) Intermittent crashes in qdrouterd under load from parallel senders
[ https://issues.apache.org/jira/browse/DISPATCH-358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vishal Sharda updated DISPATCH-358:
---
Attachment: Crash_EXTERNAL.png

I started testing again with the latest trunk code that has several bug fixes. I enabled SSL between the routers but not between clients and routers. I got the crash in Crash_EXTERNAL.png while starting/stopping clients/router.
[jira] [Commented] (DISPATCH-371) qdstat and all other clients stopped connecting to interior router
[ https://issues.apache.org/jira/browse/DISPATCH-371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15320787#comment-15320787 ] Vishal Sharda commented on DISPATCH-371:

Hi Ganesh, can I share the same port between inter-router and normal listeners, or do they have to use different ports?

> qdstat and all other clients stopped connecting to interior router
> --
>
> Key: DISPATCH-371
> URL: https://issues.apache.org/jira/browse/DISPATCH-371
> Project: Qpid Dispatch
> Issue Type: Bug
> Components: Router Node
> Affects Versions: 0.6.0
> Environment: Debian 8.3, Apache Qpid Proton 0.12.2 for drivers and
> dependencies, Hardware: 2 CPUs, 15 GB RAM, 60 GB HDD on 3 separate machines
> each
> Reporter: Vishal Sharda
> Priority: Blocker
>
> I just updated my sandbox to pull the latest bug fixes. I see that all the
> routers in my cluster of 3 interior routers have stopped accepting
> connections from my clients as well as qdstat.
> The port is properly open but qdstat shows the following error:
> *
> vsharda@millennium-qpid-untouched-latest-6443:~/apache-qpid-dispatch/my_build$
> sudo netstat -tulpn | grep qdrouterd
> tcp  0  0 0.0.0.0:5672  0.0.0.0:*  LISTEN  18460/qdrouterd
> vsharda@millennium-qpid-untouched-latest-6443:~/apache-qpid-dispatch/my_build$
> qdstat -c
> LinkDetached: sender 427616d4-8253-4bb3-a332-d1940a12f0e3-$management to
> $management closed due to: Condition('qd:connection-role', 'Link attach
> forbidden on inter-router connection')
> **
> If I run the router as standalone, I can see qdstat working fine.
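Regarding the port question above: the error in the report ("Link attach forbidden on inter-router connection") arises because a listener's role applies to every connection accepted on that port, so router-to-router and client traffic need separate listeners on separate ports. A sketch of the relevant qdrouterd.conf fragment (host and port values are illustrative, not taken from the reporter's configuration):

```
listener {
    host: 0.0.0.0
    port: 5672            # normal clients: qdstat, senders, receivers
    role: normal
}

listener {
    host: 0.0.0.0
    port: 5673            # connections from peer routers only
    role: inter-router
}
```

With only an inter-router listener configured, a management client such as qdstat is classified as a router peer and its link attach is rejected, which matches the symptom in this issue.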
[jira] [Created] (DISPATCH-371) qdstat and all other clients stopped connecting to interior router
Vishal Sharda created DISPATCH-371:
--
Summary: qdstat and all other clients stopped connecting to interior router
Key: DISPATCH-371
URL: https://issues.apache.org/jira/browse/DISPATCH-371
Project: Qpid Dispatch
Issue Type: Bug
Components: Router Node
Affects Versions: 0.6.0
Environment: Debian 8.3, Apache Qpid Proton 0.12.2 for drivers and dependencies, Hardware: 2 CPUs, 15 GB RAM, 60 GB HDD on 3 separate machines each
Reporter: Vishal Sharda
Priority: Blocker

I just updated my sandbox to pull the latest bug fixes. I see that all the routers in my cluster of 3 interior routers have stopped accepting connections from my clients as well as qdstat.
The port is properly open but qdstat shows the following error:
*
vsharda@millennium-qpid-untouched-latest-6443:~/apache-qpid-dispatch/my_build$ sudo netstat -tulpn | grep qdrouterd
tcp  0  0 0.0.0.0:5672  0.0.0.0:*  LISTEN  18460/qdrouterd
vsharda@millennium-qpid-untouched-latest-6443:~/apache-qpid-dispatch/my_build$ qdstat -c
LinkDetached: sender 427616d4-8253-4bb3-a332-d1940a12f0e3-$management to $management closed due to: Condition('qd:connection-role', 'Link attach forbidden on inter-router connection')
**
If I run the router as standalone, I can see qdstat working fine.
[jira] [Updated] (DISPATCH-365) Standalone router crashes if an interior router attempts to connect to it
[ https://issues.apache.org/jira/browse/DISPATCH-365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vishal Sharda updated DISPATCH-365:
---
Attachment: config2_nossl.conf
config1_standalone.conf

Configuration files to reproduce the crash in the standalone router.

> Standalone router crashes if an interior router attempts to connect to it
> -
>
> Key: DISPATCH-365
> URL: https://issues.apache.org/jira/browse/DISPATCH-365
> Project: Qpid Dispatch
> Issue Type: Bug
> Components: Router Node
> Affects Versions: 0.6.0
> Environment: Debian 8.3, Apache Qpid Proton 0.12.2 for drivers and
> dependencies, Hardware: 2 CPUs, 15 GB RAM, 60 GB HDD on 2 separate machines
> Reporter: Vishal Sharda
> Priority: Critical
> Attachments: config1_standalone.conf, config2_nossl.conf
>
> I accidentally pointed my interior router to a standalone router. The
> standalone router did not ignore the connection request and crashed. The
> attached config files reproduce the crash.
[jira] [Created] (DISPATCH-365) Standalone router crashes if an interior router attempts to connect to it
Vishal Sharda created DISPATCH-365: -- Summary: Standalone router crashes if an interior router attempts to connect to it Key: DISPATCH-365 URL: https://issues.apache.org/jira/browse/DISPATCH-365 Project: Qpid Dispatch Issue Type: Bug Components: Router Node Affects Versions: 0.6.0 Environment: Debian 8.3, Apache Qpid Proton 0.12.2 for drivers and dependencies, Hardware: 2 CPUs, 15 GB RAM, 60 GB HDD on 2 separate machines Reporter: Vishal Sharda Priority: Critical I accidentally pointed my interior router to a standalone router. The standalone router did not ignore the connection request and crashed. The attached config files reproduce the crash. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
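The misconfiguration described above can be sketched as follows: an inter-router connector pointed at a router running in standalone mode, which by design participates in no router network. This is a hedged illustration, not the reporter's attached config files; attribute names (e.g. `id` vs. `router-id`) differ across Dispatch versions, and the host is a placeholder:

```
# Router 1: standalone, should accept client traffic only
router {
    mode: standalone
    id: Router.A
}

# Router 2: interior, with its inter-router connector mistakenly
# aimed at the standalone router above (placeholder host/port)
connector {
    host: standalone-host.example.com
    port: 5672
    role: inter-router
}
```

The expected behavior would be for the standalone router to reject or ignore the inter-router attach rather than crash.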
[jira] [Commented] (DISPATCH-360) Disallow router with duplicate ID from joining the network
[ https://issues.apache.org/jira/browse/DISPATCH-360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15318872#comment-15318872 ] Vishal Sharda commented on DISPATCH-360: Thanks Ted. I corrected my mistake and will continue testing without SSL. Is it possible to prevent the router with duplicate ID from joining the network? This subtle error can happen easily in large networks of routers. > Disallow router with duplicate ID from joining the network > -- > > Key: DISPATCH-360 > URL: https://issues.apache.org/jira/browse/DISPATCH-360 > Project: Qpid Dispatch > Issue Type: Improvement > Components: Routing Engine >Affects Versions: 0.6.0 > Environment: Debian 8.3, Apache Qpid Proton 0.12.2 for drivers and > dependencies, Hardware: 2 CPUs, 15 GB RAM, 60 GB HDD on 3 separate machines >Reporter: Vishal Sharda >Priority: Minor > Attachments: Crash_bt_no_SSL.png, Crash_bt_no_SSL_2.png, > config1_nossl.conf, config2_nossl.conf, config3_nossl.conf > > > In order to isolate the issues that I am getting with 2-way SSL connections > among routers, I created a cluster of 3 inter-connected routers (R1, R2 and > R3 with R2 connecting to R1 and R3 connecting to both R1 and R2) without any > type of SSL (I had been using just 2 routers so far but our actual cluster > consists of 3 nodes). All connections were insecure as shown in my config > files. > When I tried sending 4 messages using simple_send.py to R1 after starting > simple_recv.py to receive from R2, I saw no messages were sent. > If I stop R3 and reduce the cluster to just two nodes, it works fine. > If I have 2-way SSL connections between all the 3 routers, it again works > fine. > In my more than 20 runs to test this scenario of sending just 4 messages, it > even worked a few times after waiting for very long. In the other two cases > above, I always got the messages instantaneously (there were no other > senders/receivers active). 
> The drivers.tar.gz that I attached in DISPATCH-343 either timed out or > returned with unclear status when trying to send just 4 messages from 1 > sender (connected to R1) to 1 receiver (connected to R2). It showed > successful just once. The behavior is completely non-deterministic. > This basic test working non-deterministically some times and failing most of > the times seemed very weird and I turned to running routers outside gdb but > the results were similar. In the process of stopping/restarting the 3 > routers for testing this scenario, I also got a crash (backtrace attached). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Updated] (DISPATCH-360) Disallow router with duplicate ID from joining the network
[ https://issues.apache.org/jira/browse/DISPATCH-360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vishal Sharda updated DISPATCH-360: --- Priority: Minor (was: Blocker) Issue Type: Improvement (was: Bug) Summary: Disallow router with duplicate ID from joining the network (was: Sender and receiver cannot communicate using a network of 3 inter-connected routers having insecure connections) > Disallow router with duplicate ID from joining the network > -- > > Key: DISPATCH-360 > URL: https://issues.apache.org/jira/browse/DISPATCH-360 > Project: Qpid Dispatch > Issue Type: Improvement > Components: Routing Engine >Affects Versions: 0.6.0 > Environment: Debian 8.3, Apache Qpid Proton 0.12.2 for drivers and > dependencies, Hardware: 2 CPUs, 15 GB RAM, 60 GB HDD on 3 separate machines >Reporter: Vishal Sharda >Priority: Minor > Attachments: Crash_bt_no_SSL.png, Crash_bt_no_SSL_2.png, > config1_nossl.conf, config2_nossl.conf, config3_nossl.conf > > > In order to isolate the issues that I am getting with 2-way SSL connections > among routers, I created a cluster of 3 inter-connected routers (R1, R2 and > R3 with R2 connecting to R1 and R3 connecting to both R1 and R2) without any > type of SSL (I had been using just 2 routers so far but our actual cluster > consists of 3 nodes). All connections were insecure as shown in my config > files. > When I tried sending 4 messages using simple_send.py to R1 after starting > simple_recv.py to receive from R2, I saw no messages were sent. > If I stop R3 and reduce the cluster to just two nodes, it works fine. > If I have 2-way SSL connections between all the 3 routers, it again works > fine. > In my more than 20 runs to test this scenario of sending just 4 messages, it > even worked a few times after waiting for very long. In the other two cases > above, I always got the messages instantaneously (there were no other > senders/receivers active). 
> The drivers.tar.gz that I attached in DISPATCH-343 either timed out or > returned with unclear status when trying to send just 4 messages from 1 > sender (connected to R1) to 1 receiver (connected to R2). It showed > successful just once. The behavior is completely non-deterministic. > This basic test working non-deterministically some times and failing most of > the times seemed very weird and I turned to running routers outside gdb but > the results were similar. In the process of stopping/restarting the 3 > routers for testing this scenario, I also got a crash (backtrace attached). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Comment Edited] (DISPATCH-358) Intermittent crashes in qdrouterd under load from parallel senders
[ https://issues.apache.org/jira/browse/DISPATCH-358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15317880#comment-15317880 ] Vishal Sharda edited comment on DISPATCH-358 at 6/7/16 5:36 AM: Ganesh, Yes, I am able to reproduce the crash without SSL also and filed another bug DISPATCH-360 for the same. I am working on refining my Java client (it currently does SSL always). Until then you can use the drivers.tar.gz from DISPATCH-343 against the three routers connected insecurely as per the config files in DISPATCH-360. You will see the issues with the following very basic test: $ ./recv_no_ssl -n 1 amqp://:/paypal/foo Separate terminal: $ ./send_no_ssl -m 4 -n 1 amqp://:/paypal/foo Be sure to have host3 connected to the two hosts as described and run the test several times. (It will work occasionally and fail most of the times). Then run again with host3 removed. (It will work almost always.) was (Author: vsharda): Ganesh, Yes, I am able to reproduce the crash without SSL also and filed another bug DISPATCH-360 for the same. I am working on refining my Java client (it currently does SSL always). Until then you can use the drivers.tar.gz from DISPATCH-343 against the three routers connected insecurely as per the config files in DISPATCH-360. You will see the issues with the following very basic test: $ ./recv_no_ssl -n 1 amqp://:/paypal/foo Separate terminal: $ ./send_no_ssl -m 4 -n 1 amqp://:/paypal/foo Be sure to have host3 connected to the two hosts as described and run the test several times. (It will work occasionally and fail most of the times). Then run again with host3 removed. (It will work almost always.) 
> Intermittent crashes in qdrouterd under load from parallel senders > -- > > Key: DISPATCH-358 > URL: https://issues.apache.org/jira/browse/DISPATCH-358 > Project: Qpid Dispatch > Issue Type: Bug > Components: Routing Engine >Affects Versions: 0.6.0 > Environment: Debian 8.3, Apache Qpid Proton 0.12.2 for drivers and > dependencies, Hardware: 2 CPUs, 15 GB RAM, 60 GB HDD on 2 separate machines >Reporter: Vishal Sharda >Priority: Critical > Attachments: Crash_Java_Router_3.png, Crash_Java_Send.png, > Crash_Java_free_qd_connection.png, Crash_Java_same_router.png, > Crash_Java_same_router_another.png, Crash_Java_same_router_another_bt.png, > Crash_SASL.png, Crash_SASL_2.png, Crash_SR_1.png, Crash_SR_2.png, > Crash_bt_double_free_Java_RES_266MB.png, > Crash_double_free_Java_RES_266MB.png, Crash_free.png, > Crash_sasl_server_done.png, Crash_watch_qdstat.png, Overflow_Error.png > > > In my setup of two inter-connected routers, several senders connecting to one > router while few receivers connecting to the other router, I see several > crashes in the router to which senders connect. These crashes are > intermittent and happen once in every 10 runs or so. I have collected the > backtraces of all the crashes but do not yet have a test case that can > reliably reproduce any of them. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Updated] (DISPATCH-358) Intermittent crashes in qdrouterd under load from parallel senders
[ https://issues.apache.org/jira/browse/DISPATCH-358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vishal Sharda updated DISPATCH-358: --- Attachment: Overflow_Error.png Overflow error that I got once during testing. > Intermittent crashes in qdrouterd under load from parallel senders > -- > > Key: DISPATCH-358 > URL: https://issues.apache.org/jira/browse/DISPATCH-358 > Project: Qpid Dispatch > Issue Type: Bug > Components: Routing Engine >Affects Versions: 0.6.0 > Environment: Debian 8.3, Apache Qpid Proton 0.12.2 for drivers and > dependencies, Hardware: 2 CPUs, 15 GB RAM, 60 GB HDD on 2 separate machines >Reporter: Vishal Sharda >Priority: Critical > Attachments: Crash_Java_Router_3.png, Crash_Java_Send.png, > Crash_Java_free_qd_connection.png, Crash_Java_same_router.png, > Crash_Java_same_router_another.png, Crash_Java_same_router_another_bt.png, > Crash_SASL.png, Crash_SASL_2.png, Crash_SR_1.png, Crash_SR_2.png, > Crash_bt_double_free_Java_RES_266MB.png, > Crash_double_free_Java_RES_266MB.png, Crash_free.png, > Crash_sasl_server_done.png, Crash_watch_qdstat.png, Overflow_Error.png > > > In my setup of two inter-connected routers, several senders connecting to one > router while few receivers connecting to the other router, I see several > crashes in the router to which senders connect. These crashes are > intermittent and happen once in every 10 runs or so. I have collected the > backtraces of all the crashes but do not yet have a test case that can > reliably reproduce any of them. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Updated] (DISPATCH-358) Intermittent crashes in qdrouterd under load from parallel senders
[ https://issues.apache.org/jira/browse/DISPATCH-358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vishal Sharda updated DISPATCH-358: --- Attachment: Crash_watch_qdstat.png Crash_Java_same_router.png Crash_Java_same_router_another.png Crash_Java_same_router_another_bt.png Crash_Java_Router_3.png Crash_double_free_Java_RES_266MB.png Crash_bt_double_free_Java_RES_266MB.png More crashes that I got during my testing yesterday and today. > Intermittent crashes in qdrouterd under load from parallel senders > -- > > Key: DISPATCH-358 > URL: https://issues.apache.org/jira/browse/DISPATCH-358 > Project: Qpid Dispatch > Issue Type: Bug > Components: Routing Engine >Affects Versions: 0.6.0 > Environment: Debian 8.3, Apache Qpid Proton 0.12.2 for drivers and > dependencies, Hardware: 2 CPUs, 15 GB RAM, 60 GB HDD on 2 separate machines >Reporter: Vishal Sharda >Priority: Critical > Attachments: Crash_Java_Router_3.png, Crash_Java_Send.png, > Crash_Java_free_qd_connection.png, Crash_Java_same_router.png, > Crash_Java_same_router_another.png, Crash_Java_same_router_another_bt.png, > Crash_SASL.png, Crash_SASL_2.png, Crash_SR_1.png, Crash_SR_2.png, > Crash_bt_double_free_Java_RES_266MB.png, > Crash_double_free_Java_RES_266MB.png, Crash_free.png, > Crash_sasl_server_done.png, Crash_watch_qdstat.png > > > In my setup of two inter-connected routers, several senders connecting to one > router while few receivers connecting to the other router, I see several > crashes in the router to which senders connect. These crashes are > intermittent and happen once in every 10 runs or so. I have collected the > backtraces of all the crashes but do not yet have a test case that can > reliably reproduce any of them. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Commented] (DISPATCH-358) Intermittent crashes in qdrouterd under load from parallel senders
[ https://issues.apache.org/jira/browse/DISPATCH-358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15317880#comment-15317880 ] Vishal Sharda commented on DISPATCH-358: Ganesh, Yes, I am able to reproduce the crash without SSL also and filed another bug DISPATCH-360 for the same. I am working on refining my Java client (it currently does SSL always). Until then you can use the drivers.tar.gz from DISPATCH-343 against the three routers connected insecurely as per the config files in DISPATCH-360. You will see the issues with the following very basic test: $ ./recv_no_ssl -n 1 amqp://:/paypal/foo Separate terminal: $ ./send_no_ssl -m 4 -n 1 amqp://:/paypal/foo Be sure to have host3 connected to the two hosts as described and run the test several times. (It will work occasionally and fail most of the times). Then run again with host3 removed. (It will work almost always.) > Intermittent crashes in qdrouterd under load from parallel senders > -- > > Key: DISPATCH-358 > URL: https://issues.apache.org/jira/browse/DISPATCH-358 > Project: Qpid Dispatch > Issue Type: Bug > Components: Routing Engine >Affects Versions: 0.6.0 > Environment: Debian 8.3, Apache Qpid Proton 0.12.2 for drivers and > dependencies, Hardware: 2 CPUs, 15 GB RAM, 60 GB HDD on 2 separate machines >Reporter: Vishal Sharda >Priority: Critical > Attachments: Crash_Java_Send.png, Crash_Java_free_qd_connection.png, > Crash_SASL.png, Crash_SASL_2.png, Crash_SR_1.png, Crash_SR_2.png, > Crash_free.png, Crash_sasl_server_done.png > > > In my setup of two inter-connected routers, several senders connecting to one > router while few receivers connecting to the other router, I see several > crashes in the router to which senders connect. These crashes are > intermittent and happen once in every 10 runs or so. I have collected the > backtraces of all the crashes but do not yet have a test case that can > reliably reproduce any of them. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Updated] (DISPATCH-360) Sender and receiver cannot communicate using a network of 3 inter-connected routers having insecure connections
[ https://issues.apache.org/jira/browse/DISPATCH-360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vishal Sharda updated DISPATCH-360: --- Attachment: Crash_bt_no_SSL_2.png Crash_bt_no_SSL.png Backtrace of the crash observed. > Sender and receiver cannot communicate using a network of 3 inter-connected > routers having insecure connections > --- > > Key: DISPATCH-360 > URL: https://issues.apache.org/jira/browse/DISPATCH-360 > Project: Qpid Dispatch > Issue Type: Bug > Components: Routing Engine >Affects Versions: 0.6.0 > Environment: Debian 8.3, Apache Qpid Proton 0.12.2 for drivers and > dependencies, Hardware: 2 CPUs, 15 GB RAM, 60 GB HDD on 3 separate machines >Reporter: Vishal Sharda >Priority: Blocker > Attachments: Crash_bt_no_SSL.png, Crash_bt_no_SSL_2.png, > config1_nossl.conf, config2_nossl.conf, config3_nossl.conf > > > In order to isolate the issues that I am getting with 2-way SSL connections > among routers, I created a cluster of 3 inter-connected routers (R1, R2 and > R3 with R2 connecting to R1 and R3 connecting to both R1 and R2) without any > type of SSL (I had been using just 2 routers so far but our actual cluster > consists of 3 nodes). All connections were insecure as shown in my config > files. > When I tried sending 4 messages using simple_send.py to R1 after starting > simple_recv.py to receive from R2, I saw no messages were sent. > If I stop R3 and reduce the cluster to just two nodes, it works fine. > If I have 2-way SSL connections between all the 3 routers, it again works > fine. > In my more than 20 runs to test this scenario of sending just 4 messages, it > even worked a few times after waiting for very long. In the other two cases > above, I always got the messages instantaneously (there were no other > senders/receivers active). 
> The drivers.tar.gz that I attached in DISPATCH-343 either timed out or > returned with unclear status when trying to send just 4 messages from 1 > sender (connected to R1) to 1 receiver (connected to R2). It showed > successful just once. The behavior is completely non-deterministic. > This basic test working non-deterministically some times and failing most of > the times seemed very weird and I turned to running routers outside gdb but > the results were similar. In the process of stopping/restarting the 3 > routers for testing this scenario, I also got a crash (backtrace attached). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Updated] (DISPATCH-360) Sender and receiver cannot communicate using a network of 3 inter-connected routers having insecure connections
[ https://issues.apache.org/jira/browse/DISPATCH-360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vishal Sharda updated DISPATCH-360: --- Attachment: config3_nossl.conf config2_nossl.conf config1_nossl.conf The three configuration files that I used for the three routers. > Sender and receiver cannot communicate using a network of 3 inter-connected > routers having insecure connections > --- > > Key: DISPATCH-360 > URL: https://issues.apache.org/jira/browse/DISPATCH-360 > Project: Qpid Dispatch > Issue Type: Bug > Components: Routing Engine >Affects Versions: 0.6.0 > Environment: Debian 8.3, Apache Qpid Proton 0.12.2 for drivers and > dependencies, Hardware: 2 CPUs, 15 GB RAM, 60 GB HDD on 3 separate machines >Reporter: Vishal Sharda >Priority: Blocker > Attachments: config1_nossl.conf, config2_nossl.conf, > config3_nossl.conf > > > In order to isolate the issues that I am getting with 2-way SSL connections > among routers, I created a cluster of 3 inter-connected routers (R1, R2 and > R3 with R2 connecting to R1 and R3 connecting to both R1 and R2) without any > type of SSL (I had been using just 2 routers so far but our actual cluster > consists of 3 nodes). All connections were insecure as shown in my config > files. > When I tried sending 4 messages using simple_send.py to R1 after starting > simple_recv.py to receive from R2, I saw no messages were sent. > If I stop R3 and reduce the cluster to just two nodes, it works fine. > If I have 2-way SSL connections between all the 3 routers, it again works > fine. > In my more than 20 runs to test this scenario of sending just 4 messages, it > even worked a few times after waiting for very long. In the other two cases > above, I always got the messages instantaneously (there were no other > senders/receivers active). 
> The drivers.tar.gz that I attached in DISPATCH-343 either timed out or > returned with unclear status when trying to send just 4 messages from 1 > sender (connected to R1) to 1 receiver (connected to R2). It showed > successful just once. The behavior is completely non-deterministic. > This basic test working non-deterministically some times and failing most of > the times seemed very weird and I turned to running routers outside gdb but > the results were similar. In the process of stopping/restarting the 3 > routers for testing this scenario, I also got a crash (backtrace attached). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Created] (DISPATCH-360) Sender and receiver cannot communicate using a network of 3 inter-connected routers having insecure connections
Vishal Sharda created DISPATCH-360: -- Summary: Sender and receiver cannot communicate using a network of 3 inter-connected routers having insecure connections Key: DISPATCH-360 URL: https://issues.apache.org/jira/browse/DISPATCH-360 Project: Qpid Dispatch Issue Type: Bug Components: Routing Engine Affects Versions: 0.6.0 Environment: Debian 8.3, Apache Qpid Proton 0.12.2 for drivers and dependencies, Hardware: 2 CPUs, 15 GB RAM, 60 GB HDD on 3 separate machines Reporter: Vishal Sharda Priority: Blocker In order to isolate the issues that I am getting with 2-way SSL connections among routers, I created a cluster of 3 inter-connected routers (R1, R2 and R3 with R2 connecting to R1 and R3 connecting to both R1 and R2) without any type of SSL (I had been using just 2 routers so far but our actual cluster consists of 3 nodes). All connections were insecure as shown in my config files. When I tried sending 4 messages using simple_send.py to R1 after starting simple_recv.py to receive from R2, I saw no messages were sent. If I stop R3 and reduce the cluster to just two nodes, it works fine. If I have 2-way SSL connections between all the 3 routers, it again works fine. In my more than 20 runs to test this scenario of sending just 4 messages, it even worked a few times after waiting for very long. In the other two cases above, I always got the messages instantaneously (there were no other senders/receivers active). The drivers.tar.gz that I attached in DISPATCH-343 either timed out or returned with unclear status when trying to send just 4 messages from 1 sender (connected to R1) to 1 receiver (connected to R2). It showed successful just once. The behavior is completely non-deterministic. This basic test working non-deterministically some times and failing most of the times seemed very weird and I turned to running routers outside gdb but the results were similar. 
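The three-node topology described above (R2 connecting to R1, R3 connecting to both R1 and R2) is expressed through connector entries in each router's config. A sketch of what R3's connectors might look like; this is illustrative only, with placeholder hostnames and ports rather than the reporter's attached config3_nossl.conf:

```
# R3: inter-router connectors to both R1 and R2 (placeholder hosts/ports)
connector {
    name: to-R1
    host: r1.example.com
    port: 20001
    role: inter-router
}
connector {
    name: to-R2
    host: r2.example.com
    port: 20002
    role: inter-router
}
```

Removing the second connector (or stopping R3 entirely) reduces the network to the two-node case that the reporter says works reliably.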
In the process of stopping/restarting the 3 routers for testing this scenario, I also got a crash (backtrace attached). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Commented] (DISPATCH-358) Intermittent crashes in qdrouterd under load from parallel senders
[ https://issues.apache.org/jira/browse/DISPATCH-358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15315263#comment-15315263 ] Vishal Sharda commented on DISPATCH-358: Ganesh, the senders/receivers that I am using are identical to those in drivers.tar.gz but with support for 2-way SSL. I have also written equivalent drivers in Java (built on Proton-J Messenger API) with support for 2-way SSL and have begun testing with them. Occasionally, I am also using the ones from drivers.tar.gz (that do not support SSL) in parallel. > Intermittent crashes in qdrouterd under load from parallel senders > -- > > Key: DISPATCH-358 > URL: https://issues.apache.org/jira/browse/DISPATCH-358 > Project: Qpid Dispatch > Issue Type: Bug > Components: Routing Engine >Affects Versions: 0.6.0 > Environment: Debian 8.3, Apache Qpid Proton 0.12.2 for drivers and > dependencies, Hardware: 2 CPUs, 15 GB RAM, 60 GB HDD on 2 separate machines >Reporter: Vishal Sharda >Priority: Critical > Attachments: Crash_Java_Send.png, Crash_Java_free_qd_connection.png, > Crash_SASL.png, Crash_SASL_2.png, Crash_SR_1.png, Crash_SR_2.png, > Crash_free.png, Crash_sasl_server_done.png > > > In my setup of two inter-connected routers, several senders connecting to one > router while few receivers connecting to the other router, I see several > crashes in the router to which senders connect. These crashes are > intermittent and happen once in every 10 runs or so. I have collected the > backtraces of all the crashes but do not yet have a test case that can > reliably reproduce any of them. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Commented] (DISPATCH-358) Intermittent crashes in qdrouterd under load from parallel senders
[ https://issues.apache.org/jira/browse/DISPATCH-358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15315170#comment-15315170 ] Vishal Sharda commented on DISPATCH-358: The common thing about all the fatal scenarios is multithreading getting triggered in router. > Intermittent crashes in qdrouterd under load from parallel senders > -- > > Key: DISPATCH-358 > URL: https://issues.apache.org/jira/browse/DISPATCH-358 > Project: Qpid Dispatch > Issue Type: Bug > Components: Routing Engine >Affects Versions: 0.6.0 > Environment: Debian 8.3, Apache Qpid Proton 0.12.2 for drivers and > dependencies, Hardware: 2 CPUs, 15 GB RAM, 60 GB HDD on 2 separate machines >Reporter: Vishal Sharda >Priority: Critical > Attachments: Crash_Java_Send.png, Crash_Java_free_qd_connection.png, > Crash_SASL.png, Crash_SASL_2.png, Crash_SR_1.png, Crash_SR_2.png, > Crash_free.png, Crash_sasl_server_done.png > > > In my setup of two inter-connected routers, several senders connecting to one > router while few receivers connecting to the other router, I see several > crashes in the router to which senders connect. These crashes are > intermittent and happen once in every 10 runs or so. I have collected the > backtraces of all the crashes but do not yet have a test case that can > reliably reproduce any of them. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Updated] (DISPATCH-358) Intermittent crashes in qdrouterd under load from parallel senders
[ https://issues.apache.org/jira/browse/DISPATCH-358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vishal Sharda updated DISPATCH-358: --- Attachment: Crash_SR_2.png Crash_SR_1.png Crash_SASL.png Crash_sasl_server_done.png Crash_SASL_2.png Crash_Java_Send.png Crash_Java_free_qd_connection.png Crash_free.png Attached screenshots having backtraces of router crashes that happened intermittently during my testing. > Intermittent crashes in qdrouterd under load from parallel senders > -- > > Key: DISPATCH-358 > URL: https://issues.apache.org/jira/browse/DISPATCH-358 > Project: Qpid Dispatch > Issue Type: Bug > Components: Routing Engine >Affects Versions: 0.6.0 > Environment: Debian 8.3, Apache Qpid Proton 0.12.2 for drivers and > dependencies, Hardware: 2 CPUs, 15 GB RAM, 60 GB HDD on 2 separate machines >Reporter: Vishal Sharda >Priority: Critical > Attachments: Crash_Java_Send.png, Crash_Java_free_qd_connection.png, > Crash_SASL.png, Crash_SASL_2.png, Crash_SR_1.png, Crash_SR_2.png, > Crash_free.png, Crash_sasl_server_done.png > > > In my setup of two inter-connected routers, several senders connecting to one > router while few receivers connecting to the other router, I see several > crashes in the router to which senders connect. These crashes are > intermittent and happen once in every 10 runs or so. I have collected the > backtraces of all the crashes but do not yet have a test case that can > reliably reproduce any of them. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Created] (DISPATCH-358) Intermittent crashes in qdrouterd under load from parallel senders
Vishal Sharda created DISPATCH-358: -- Summary: Intermittent crashes in qdrouterd under load from parallel senders Key: DISPATCH-358 URL: https://issues.apache.org/jira/browse/DISPATCH-358 Project: Qpid Dispatch Issue Type: Bug Components: Routing Engine Affects Versions: 0.6.0 Environment: Debian 8.3, Apache Qpid Proton 0.12.2 for drivers and dependencies, Hardware: 2 CPUs, 15 GB RAM, 60 GB HDD on 2 separate machines Reporter: Vishal Sharda Priority: Critical In my setup of two inter-connected routers, several senders connecting to one router while few receivers connecting to the other router, I see several crashes in the router to which senders connect. These crashes are intermittent and happen once in every 10 runs or so. I have collected the backtraces of all the crashes but do not yet have a test case that can reliably reproduce any of them. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Updated] (DISPATCH-337) Huge memory leaks in Qpid Dispatch router
[ https://issues.apache.org/jira/browse/DISPATCH-337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vishal Sharda updated DISPATCH-337: --- Attachment: Memory_usage_subsequent_run_no_SSL.png Memory_usage_first_run_no_SSL.png Attached two files: Memory_usage_first_run_no_SSL.png and Memory_usage_subsequent_run_no_SSL.png I ran another test - two routers connected to each other, several senders connecting to first router and 1 receiver connecting to the second router. All connections were insecure (no SSL at all). I see the same type of memory leaks but the pace of growth in resident memory usage is nearly halved. SSL amplifies the leaks but they occur without SSL also. > Huge memory leaks in Qpid Dispatch router > - > > Key: DISPATCH-337 > URL: https://issues.apache.org/jira/browse/DISPATCH-337 > Project: Qpid Dispatch > Issue Type: Bug >Affects Versions: 0.6.0 > Environment: Debian 8.3, Apache Qpid Proton 0.12.2 for drivers and > dependencies, Hardware: 2 CPUs, 15 GB RAM, 60 GB HDD on 2 separate machines >Reporter: Vishal Sharda >Priority: Critical > Attachments: Memory_usage_first_run_no_SSL.png, > Memory_usage_subsequent_run_no_SSL.png, Rapid_perm_memory_increase.png, > Subsequent_memory_increase.png, config1.conf, config2.conf, > val2_receiver.txt, val2_sender.txt > > > Valgrind shows huge memory leaks while running 2 interconnected routers with > 2 parallel senders connected to the one router and 2 parallel receivers > connected to the other router. > The CRYPTO leak that is coming from Qpid Proton 0.12.2 is already fixed here: > https://issues.apache.org/jira/browse/PROTON-1115 > However, the rest of the leaks are from qdrouterd. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
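Leak reports like the one above hinge on resident memory that never returns to baseline between runs. A small stdlib-only way to track that on Linux is to sample VmRSS from /proc, the same figure top shows in the RES column. `rss_kb` is a hypothetical helper written for this note, not part of Qpid Dispatch:

```python
# Hypothetical helper: sample a process's resident set size (VmRSS) in kB
# by reading /proc/<pid>/status on Linux. Not part of Qpid Dispatch.
import os


def rss_kb(pid):
    """Return the VmRSS of `pid` in kilobytes, or None if unavailable."""
    try:
        with open("/proc/%d/status" % pid) as f:
            for line in f:
                if line.startswith("VmRSS:"):
                    # Line looks like: "VmRSS:     12345 kB"
                    return int(line.split()[1])
    except OSError:
        return None
    return None
```

Sampling qdrouterd's pid before, between, and after identical load runs would make the "memory never became low again" observation quantifiable.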
[jira] [Updated] (DISPATCH-337) Huge memory leaks in Qpid Dispatch router
[ https://issues.apache.org/jira/browse/DISPATCH-337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vishal Sharda updated DISPATCH-337: --- Attachment: Subsequent_memory_increase.png Rapid_perm_memory_increase.png Two files attached: Rapid_perm_memory_increase.png and Subsequent_memory_increase.png We ran two interconnected routers having 2-way SSL connection between them and accepting 2-way SSL connections from clients. We connected several senders to one router and several receivers to the other. Total of 250K messages were sent from one end to the other. We saw rapid increase in memory usage of both the routers. This memory never became low again and a subsequent identical run increased the memory further. Since the routers are never intended to be terminated, such memory leaks on all subsequent connections and data transfers will eventually lead to routers being killed. > Huge memory leaks in Qpid Dispatch router > - > > Key: DISPATCH-337 > URL: https://issues.apache.org/jira/browse/DISPATCH-337 > Project: Qpid Dispatch > Issue Type: Bug >Affects Versions: 0.6.0 > Environment: Debian 8.3, Apache Qpid Proton 0.12.2 for drivers and > dependencies, Hardware: 2 CPUs, 15 GB RAM, 60 GB HDD on 2 separate machines >Reporter: Vishal Sharda >Priority: Critical > Attachments: Rapid_perm_memory_increase.png, > Subsequent_memory_increase.png, config1.conf, config2.conf, > val2_receiver.txt, val2_sender.txt > > > Valgrind shows huge memory leaks while running 2 interconnected routers with > 2 parallel senders connected to the one router and 2 parallel receivers > connected to the other router. > The CRYPTO leak that is coming from Qpid Proton 0.12.2 is already fixed here: > https://issues.apache.org/jira/browse/PROTON-1115 > However, the rest of the leaks are from qdrouterd. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Commented] (DISPATCH-343) Router stops accepting connections after load from parallel senders
[ https://issues.apache.org/jira/browse/DISPATCH-343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15308664#comment-15308664 ] Vishal Sharda commented on DISPATCH-343: The tracker used by the sender is different from the one used by the receiver. Hence, the receiver does get() followed by accept(), and the sender does settle(). We see that even if we do not call settle() at either end, the crash still occurs with 6 senders and 1 receiver. > Router stops accepting connections after load from parallel senders > --- > > Key: DISPATCH-343 > URL: https://issues.apache.org/jira/browse/DISPATCH-343 > Project: Qpid Dispatch > Issue Type: Bug > Components: Routing Engine >Affects Versions: 0.6.0 >Reporter: Vishal Sharda >Assignee: Ted Ross >Priority: Blocker > Fix For: 0.6.0 > > Attachments: Connection_aborted.png, Connection_aborted_1.png, > Crash.png, Crash_10S_2R.png, R1.conf, R2.conf, R3.conf, > Sender_router_crash.png, bt_qd_dealloc.png, bt_qdr_link_cleanup_CT.png, > bt_sasl.png, bt_sys_mutex_lock.png, config1_nossl.conf, config2_nossl.conf, > drivers.tar.gz, resource-limit-exceeded.png > > > We ran 2 parallel senders and 2 receivers with each sender sending 5 > messages. After a while we saw that the router stopped accepting connections > even from qdstat. We saw various errors in the logs (screenshots attached).
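The settlement flow described in that comment (receiver does get() then accept(); sender settles its own tracker independently) can be modeled with a small pure-Python sketch. The class and method names below are hypothetical stand-ins for illustration, not the Proton Messenger API:

```python
# Model of the unsettled-delivery flow from the comment above: the receiver
# accepts each delivery it gets, and the sender settles its own tracker
# separately. Names are hypothetical, not the Proton API.

class Delivery:
    def __init__(self, body):
        self.body = body
        self.disposition = None       # set by the receiver, e.g. "ACCEPTED"
        self.receiver_settled = False
        self.sender_settled = False

class Receiver:
    def __init__(self):
        self.incoming = []

    def get(self):
        # Take the next unread delivery off the incoming queue.
        return self.incoming.pop(0)

    def accept(self, delivery):
        # Record the ACCEPTED outcome; the receiver's side is now settled.
        delivery.disposition = "ACCEPTED"
        delivery.receiver_settled = True

class Sender:
    def __init__(self, receiver):
        self.receiver = receiver
        self.trackers = []            # the sender's own view of each delivery

    def send(self, body):
        d = Delivery(body)
        self.receiver.incoming.append(d)
        self.trackers.append(d)
        return d

    def settle(self, tracker):
        # The sender settles via its own tracker, not the receiver's.
        tracker.sender_settled = True

# Usage: one unsettled delivery, accepted by the receiver, settled by both ends.
recv = Receiver()
send = Sender(recv)
t = send.send("Hello World!")
d = recv.get()
recv.accept(d)
send.settle(t)
print(d.disposition, d.receiver_settled, d.sender_settled)  # → ACCEPTED True True
```

The point of the sketch is that the two trackers are distinct views of the same delivery, which is why skipping settle() at either end changes the settlement state without changing the routing path.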
[jira] [Commented] (DISPATCH-343) Router stops accepting connections after load from parallel senders
[ https://issues.apache.org/jira/browse/DISPATCH-343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15302363#comment-15302363 ] Vishal Sharda commented on DISPATCH-343: Ted, do you have an ETA for the fix? > Router stops accepting connections after load from parallel senders > --- > > Key: DISPATCH-343 > URL: https://issues.apache.org/jira/browse/DISPATCH-343 > Project: Qpid Dispatch > Issue Type: Bug > Components: Routing Engine >Affects Versions: 0.6.0 >Reporter: Vishal Sharda >Assignee: Ted Ross >Priority: Blocker > Fix For: 0.6.0 > > Attachments: Connection_aborted.png, Connection_aborted_1.png, > Crash.png, Crash_10S_2R.png, R1.conf, R2.conf, R3.conf, > Sender_router_crash.png, bt_qd_dealloc.png, bt_qdr_link_cleanup_CT.png, > bt_sasl.png, bt_sys_mutex_lock.png, config1_nossl.conf, config2_nossl.conf, > drivers.tar.gz, resource-limit-exceeded.png > > > We ran 2 parallel senders and 2 receivers with each sender sending 5 > messages. After a while we saw that the router stopped accepting connections > even from qdstat. We saw various errors in the logs (screenshots attached).
[jira] [Updated] (DISPATCH-343) Router stops accepting connections after load from parallel senders
[ https://issues.apache.org/jira/browse/DISPATCH-343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vishal Sharda updated DISPATCH-343: --- Attachment: drivers.tar.gz Ganesh, I am attaching my drivers here (drivers.tar.gz). When you run them against 2 connected routers from your cluster of 3, you should be able to reproduce the crash. Please run make in the extracted folder and run the drivers as follows: ./recv_no_ssl -n 1 amqp://:/examples Separate terminal: ./send_no_ssl -m 10 -n 6 -a amqp://:/examples > Router stops accepting connections after load from parallel senders > --- > > Key: DISPATCH-343 > URL: https://issues.apache.org/jira/browse/DISPATCH-343 > Project: Qpid Dispatch > Issue Type: Bug > Components: Routing Engine >Affects Versions: 0.6.0 >Reporter: Vishal Sharda >Priority: Blocker > Attachments: Connection_aborted.png, Connection_aborted_1.png, > Crash.png, Crash_10S_2R.png, R1.conf, R2.conf, R3.conf, > Sender_router_crash.png, bt_qd_dealloc.png, bt_qdr_link_cleanup_CT.png, > bt_sasl.png, bt_sys_mutex_lock.png, config1_nossl.conf, config2_nossl.conf, > drivers.tar.gz, resource-limit-exceeded.png > > > We ran 2 parallel senders and 2 receivers with each sender sending 5 > messages. After a while we saw that the router stopped accepting connections > even from qdstat. We saw various errors in the logs (screenshots attached).
[jira] [Commented] (DISPATCH-343) Router stops accepting connections after load from parallel senders
[ https://issues.apache.org/jira/browse/DISPATCH-343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15300571#comment-15300571 ] Vishal Sharda commented on DISPATCH-343: Ganesh, can you provide the number of messages per sender that you tested with, and the sender/receiver code that you used? Also, it would help if you could test some scenarios in which senders outnumber receivers. > Router stops accepting connections after load from parallel senders > --- > > Key: DISPATCH-343 > URL: https://issues.apache.org/jira/browse/DISPATCH-343 > Project: Qpid Dispatch > Issue Type: Bug > Components: Routing Engine >Affects Versions: 0.6.0 >Reporter: Vishal Sharda >Priority: Blocker > Attachments: Connection_aborted.png, Connection_aborted_1.png, > Crash.png, Crash_10S_2R.png, R1.conf, R2.conf, R3.conf, > Sender_router_crash.png, bt_qd_dealloc.png, bt_qdr_link_cleanup_CT.png, > bt_sasl.png, bt_sys_mutex_lock.png, config1_nossl.conf, config2_nossl.conf, > resource-limit-exceeded.png > > > We ran 2 parallel senders and 2 receivers with each sender sending 5 > messages. After a while we saw that the router stopped accepting connections > even from qdstat. We saw various errors in the logs (screenshots attached).
[jira] [Commented] (DISPATCH-343) Router stops accepting connections after load from parallel senders
[ https://issues.apache.org/jira/browse/DISPATCH-343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15299218#comment-15299218 ] Vishal Sharda commented on DISPATCH-343: I also hit the crash reported in bt_qd_dealloc.png without SSL. > Router stops accepting connections after load from parallel senders > --- > > Key: DISPATCH-343 > URL: https://issues.apache.org/jira/browse/DISPATCH-343 > Project: Qpid Dispatch > Issue Type: Bug > Components: Routing Engine >Affects Versions: 0.6.0 >Reporter: Vishal Sharda >Priority: Blocker > Attachments: Connection_aborted.png, Connection_aborted_1.png, > Crash.png, Crash_10S_2R.png, R1.conf, R2.conf, R3.conf, > Sender_router_crash.png, bt_qd_dealloc.png, bt_qdr_link_cleanup_CT.png, > bt_sasl.png, bt_sys_mutex_lock.png, config1_nossl.conf, config2_nossl.conf, > resource-limit-exceeded.png > > > We ran 2 parallel senders and 2 receivers with each sender sending 5 > messages. After a while we saw that the router stopped accepting connections > even from qdstat. We saw various errors in the logs (screenshots attached).
[jira] [Updated] (DISPATCH-343) Router stops accepting connections after load from parallel senders
[ https://issues.apache.org/jira/browse/DISPATCH-343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vishal Sharda updated DISPATCH-343: --- Attachment: config2_nossl.conf config1_nossl.conf The two config files used for the no-SSL configuration of the two routers. > Router stops accepting connections after load from parallel senders > --- > > Key: DISPATCH-343 > URL: https://issues.apache.org/jira/browse/DISPATCH-343 > Project: Qpid Dispatch > Issue Type: Bug > Components: Routing Engine >Affects Versions: 0.6.0 >Reporter: Vishal Sharda >Priority: Blocker > Attachments: Connection_aborted.png, Connection_aborted_1.png, > Crash.png, Crash_10S_2R.png, R1.conf, R2.conf, R3.conf, > Sender_router_crash.png, bt_qd_dealloc.png, bt_qdr_link_cleanup_CT.png, > bt_sasl.png, bt_sys_mutex_lock.png, config1_nossl.conf, config2_nossl.conf, > resource-limit-exceeded.png > > > We ran 2 parallel senders and 2 receivers with each sender sending 5 > messages. After a while we saw that the router stopped accepting connections > even from qdstat. We saw various errors in the logs (screenshots attached).
[jira] [Commented] (DISPATCH-343) Router stops accepting connections after load from parallel senders
[ https://issues.apache.org/jira/browse/DISPATCH-343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15299200#comment-15299200 ] Vishal Sharda commented on DISPATCH-343: I have completely removed SSL; the two config files are attached. Still, I hit the crashes that I reported in bt_qdr_link_cleanup_CT.png and bt_sys_mutex_lock.png. > Router stops accepting connections after load from parallel senders > --- > > Key: DISPATCH-343 > URL: https://issues.apache.org/jira/browse/DISPATCH-343 > Project: Qpid Dispatch > Issue Type: Bug > Components: Routing Engine >Affects Versions: 0.6.0 >Reporter: Vishal Sharda >Priority: Blocker > Attachments: Connection_aborted.png, Connection_aborted_1.png, > Crash.png, Crash_10S_2R.png, R1.conf, R2.conf, R3.conf, > Sender_router_crash.png, bt_qd_dealloc.png, bt_qdr_link_cleanup_CT.png, > bt_sasl.png, bt_sys_mutex_lock.png, resource-limit-exceeded.png > > > We ran 2 parallel senders and 2 receivers with each sender sending 5 > messages. After a while we saw that the router stopped accepting connections > even from qdstat. We saw various errors in the logs (screenshots attached).
[jira] [Updated] (DISPATCH-343) Router stops accepting connections after load from parallel senders
[ https://issues.apache.org/jira/browse/DISPATCH-343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vishal Sharda updated DISPATCH-343: --- Attachment: bt_qdr_link_cleanup_CT.png One more backtrace of a crash. > Router stops accepting connections after load from parallel senders > --- > > Key: DISPATCH-343 > URL: https://issues.apache.org/jira/browse/DISPATCH-343 > Project: Qpid Dispatch > Issue Type: Bug > Components: Routing Engine >Affects Versions: 0.6.0 >Reporter: Vishal Sharda >Priority: Blocker > Attachments: Connection_aborted.png, Connection_aborted_1.png, > Crash.png, Crash_10S_2R.png, R1.conf, R2.conf, R3.conf, > Sender_router_crash.png, bt_qd_dealloc.png, bt_qdr_link_cleanup_CT.png, > bt_sasl.png, bt_sys_mutex_lock.png, resource-limit-exceeded.png > > > We ran 2 parallel senders and 2 receivers with each sender sending 5 > messages. After a while we saw that the router stopped accepting connections > even from qdstat. We saw various errors in the logs (screenshots attached).
[jira] [Updated] (DISPATCH-343) Router stops accepting connections after load from parallel senders
[ https://issues.apache.org/jira/browse/DISPATCH-343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vishal Sharda updated DISPATCH-343: --- Attachment: bt_sys_mutex_lock.png bt_sasl.png bt_qd_dealloc.png Screenshots with backtraces from different types of crashes. > Router stops accepting connections after load from parallel senders > --- > > Key: DISPATCH-343 > URL: https://issues.apache.org/jira/browse/DISPATCH-343 > Project: Qpid Dispatch > Issue Type: Bug > Components: Routing Engine >Affects Versions: 0.6.0 >Reporter: Vishal Sharda >Priority: Blocker > Attachments: Connection_aborted.png, Connection_aborted_1.png, > Crash.png, Crash_10S_2R.png, R1.conf, R2.conf, R3.conf, > Sender_router_crash.png, bt_qd_dealloc.png, bt_sasl.png, > bt_sys_mutex_lock.png, resource-limit-exceeded.png > > > We ran 2 parallel senders and 2 receivers with each sender sending 5 > messages. After a while we saw that the router stopped accepting connections > even from qdstat. We saw various errors in the logs (screenshots attached).
[jira] [Commented] (DISPATCH-343) Router stops accepting connections after load from parallel senders
[ https://issues.apache.org/jira/browse/DISPATCH-343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15298571#comment-15298571 ] Vishal Sharda commented on DISPATCH-343: Environment for the tests using 2 latest routers: Debian 8.3, Apache Qpid Proton 0.12.2 for drivers and dependencies, Hardware: 2 CPUs, 15 GB RAM, 60 GB HDD on 2 separate machines. OpenSSL information: vsharda@millennium-qpid-untouched-latest-2-8501:~$ dpkg -l | grep openssl ii libcurl4-openssl-dev:amd64 7.38.0-4+deb8u3 amd64 development files and documentation for libcurl (OpenSSL flavour) ii libgnutls-openssl27:amd64 3.3.8-6+deb8u3 amd64 GNU TLS library - OpenSSL wrapper ii openssl 1.0.1k-3+deb8u5 amd64 Secure Sockets Layer toolkit - cryptographic utility > Router stops accepting connections after load from parallel senders > --- > > Key: DISPATCH-343 > URL: https://issues.apache.org/jira/browse/DISPATCH-343 > Project: Qpid Dispatch > Issue Type: Bug > Components: Routing Engine >Affects Versions: 0.6.0 >Reporter: Vishal Sharda >Priority: Blocker > Attachments: Connection_aborted.png, Connection_aborted_1.png, > Crash.png, Crash_10S_2R.png, R1.conf, R2.conf, R3.conf, > Sender_router_crash.png, resource-limit-exceeded.png > > > We ran 2 parallel senders and 2 receivers with each sender sending 5 > messages. After a while we saw that the router stopped accepting connections > even from qdstat. We saw various errors in the logs (screenshots attached).
[jira] [Commented] (DISPATCH-337) Huge memory leaks in Qpid Dispatch router
[ https://issues.apache.org/jira/browse/DISPATCH-337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15298556#comment-15298556 ] Vishal Sharda commented on DISPATCH-337: There are 18 places where memory is "definitely lost" according to the attached valgrind report (val2_sender.txt). All of these leaks seem to come from the router C code. > Huge memory leaks in Qpid Dispatch router > - > > Key: DISPATCH-337 > URL: https://issues.apache.org/jira/browse/DISPATCH-337 > Project: Qpid Dispatch > Issue Type: Bug >Affects Versions: 0.6.0 > Environment: Debian 8.3, Apache Qpid Proton 0.12.2 for drivers and > dependencies, Hardware: 2 CPUs, 15 GB RAM, 60 GB HDD on 2 separate machines >Reporter: Vishal Sharda >Priority: Critical > Attachments: config1.conf, config2.conf, val2_receiver.txt, > val2_sender.txt > > > Valgrind shows huge memory leaks while running 2 interconnected routers with > 2 parallel senders connected to the one router and 2 parallel receivers > connected to the other router. > The CRYPTO leak that is coming from Qpid Proton 0.12.2 is already fixed here: > https://issues.apache.org/jira/browse/PROTON-1115 > However, the rest of the leaks are from qdrouterd.
[jira] [Updated] (DISPATCH-343) Router stops accepting connections after load from parallel senders
[ https://issues.apache.org/jira/browse/DISPATCH-343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vishal Sharda updated DISPATCH-343: --- Attachment: Crash_10S_2R.png Router crash with 10 senders attached to one router and 2 receivers attached to the second router. > Router stops accepting connections after load from parallel senders > --- > > Key: DISPATCH-343 > URL: https://issues.apache.org/jira/browse/DISPATCH-343 > Project: Qpid Dispatch > Issue Type: Bug > Components: Routing Engine >Affects Versions: 0.6.0 >Reporter: Vishal Sharda >Priority: Blocker > Attachments: Connection_aborted.png, Connection_aborted_1.png, > Crash.png, Crash_10S_2R.png, R1.conf, R2.conf, R3.conf, > Sender_router_crash.png, resource-limit-exceeded.png > > > We ran 2 parallel senders and 2 receivers with each sender sending 5 > messages. After a while we saw that the router stopped accepting connections > even from qdstat. We saw various errors in the logs (screenshots attached).
[jira] [Commented] (DISPATCH-343) Router stops accepting connections after load from parallel senders
[ https://issues.apache.org/jira/browse/DISPATCH-343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15298528#comment-15298528 ] Vishal Sharda commented on DISPATCH-343: I will switch to a Debug build now and try to collect debug information. Until then, here is another crash (Crash_10S_2R.png) that I saw with 10 senders sending to one router and 2 receivers receiving from the other router. This crash is due to an invalid pointer. > Router stops accepting connections after load from parallel senders > --- > > Key: DISPATCH-343 > URL: https://issues.apache.org/jira/browse/DISPATCH-343 > Project: Qpid Dispatch > Issue Type: Bug > Components: Routing Engine >Affects Versions: 0.6.0 >Reporter: Vishal Sharda >Priority: Blocker > Attachments: Connection_aborted.png, Connection_aborted_1.png, > Crash.png, R1.conf, R2.conf, R3.conf, Sender_router_crash.png, > resource-limit-exceeded.png > > > We ran 2 parallel senders and 2 receivers with each sender sending 5 > messages. After a while we saw that the router stopped accepting connections > even from qdstat. We saw various errors in the logs (screenshots attached).
[jira] [Updated] (DISPATCH-343) Router stops accepting connections after load from parallel senders
[ https://issues.apache.org/jira/browse/DISPATCH-343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vishal Sharda updated DISPATCH-343: --- Attachment: Sender_router_crash.png Crash in the router due to an assertion failure. > Router stops accepting connections after load from parallel senders > --- > > Key: DISPATCH-343 > URL: https://issues.apache.org/jira/browse/DISPATCH-343 > Project: Qpid Dispatch > Issue Type: Bug > Components: Routing Engine >Affects Versions: 0.6.0 >Reporter: Vishal Sharda >Priority: Blocker > Attachments: Connection_aborted.png, Connection_aborted_1.png, > Crash.png, R1.conf, R2.conf, R3.conf, Sender_router_crash.png, > resource-limit-exceeded.png > > > We ran 2 parallel senders and 2 receivers with each sender sending 5 > messages. After a while we saw that the router stopped accepting connections > even from qdstat. We saw various errors in the logs (screenshots attached).
[jira] [Commented] (DISPATCH-343) Router stops accepting connections after load from parallel senders
[ https://issues.apache.org/jira/browse/DISPATCH-343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15297247#comment-15297247 ] Vishal Sharda commented on DISPATCH-343: Another test against the two latest routers connected to each other, with 4 parallel senders each sending 50K "Hello World!" messages to one router and 4 parallel receivers receiving from the other router: the receivers received only 64,390 of the 200,000 messages and exited after a timeout. The router to which they were connected seemed fine. On the other hand, all 4 senders started seeing timeouts on their messages. As soon as the senders were killed, the router to which these senders were connected also crashed. The screenshot Sender_router_crash.png is attached. I have seen this assertion failure several times before, at both ends. > Router stops accepting connections after load from parallel senders > --- > > Key: DISPATCH-343 > URL: https://issues.apache.org/jira/browse/DISPATCH-343 > Project: Qpid Dispatch > Issue Type: Bug > Components: Routing Engine >Affects Versions: 0.6.0 >Reporter: Vishal Sharda >Priority: Blocker > Attachments: Connection_aborted.png, Connection_aborted_1.png, > Crash.png, R1.conf, R2.conf, R3.conf, resource-limit-exceeded.png > > > We ran 2 parallel senders and 2 receivers with each sender sending 5 > messages. After a while we saw that the router stopped accepting connections > even from qdstat. We saw various errors in the logs (screenshots attached).
[jira] [Comment Edited] (DISPATCH-343) Router stops accepting connections after load from parallel senders
[ https://issues.apache.org/jira/browse/DISPATCH-343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15297112#comment-15297112 ] Vishal Sharda edited comment on DISPATCH-343 at 5/23/16 10:25 PM: -- I just ran against the two latest routers connected to each other, with 2 parallel senders each sending 50K "Hello World!" messages to one router and 2 parallel receivers receiving from the other router, and saw a crash. The attached screenshot Crash.png shows a double-free error. was (Author: vsharda): I just ran against the two latest routers connected to each other, with 2 parallel senders sending to one router and 2 parallel receivers receiving from the other router, and saw a crash. The attached screenshot Crash.png shows a double-free error. > Router stops accepting connections after load from parallel senders > --- > > Key: DISPATCH-343 > URL: https://issues.apache.org/jira/browse/DISPATCH-343 > Project: Qpid Dispatch > Issue Type: Bug > Components: Routing Engine >Affects Versions: 0.6.0 >Reporter: Vishal Sharda >Priority: Blocker > Attachments: Connection_aborted.png, Connection_aborted_1.png, > Crash.png, R1.conf, R2.conf, R3.conf, resource-limit-exceeded.png > > > We ran 2 parallel senders and 2 receivers with each sender sending 5 > messages. After a while we saw that the router stopped accepting connections > even from qdstat. We saw various errors in the logs (screenshots attached).
[jira] [Commented] (DISPATCH-343) Router stops accepting connections after load from parallel senders
[ https://issues.apache.org/jira/browse/DISPATCH-343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15297112#comment-15297112 ] Vishal Sharda commented on DISPATCH-343: I just ran against the two latest routers connected to each other, with 2 parallel senders sending to one router and 2 parallel receivers receiving from the other router, and saw a crash. The attached screenshot Crash.png shows a double-free error. > Router stops accepting connections after load from parallel senders > --- > > Key: DISPATCH-343 > URL: https://issues.apache.org/jira/browse/DISPATCH-343 > Project: Qpid Dispatch > Issue Type: Bug > Components: Routing Engine >Affects Versions: 0.6.0 >Reporter: Vishal Sharda >Priority: Blocker > Attachments: Connection_aborted.png, Connection_aborted_1.png, > Crash.png, R1.conf, R2.conf, R3.conf, resource-limit-exceeded.png > > > We ran 2 parallel senders and 2 receivers with each sender sending 5 > messages. After a while we saw that the router stopped accepting connections > even from qdstat. We saw various errors in the logs (screenshots attached).
[jira] [Updated] (DISPATCH-343) Router stops accepting connections after load from parallel senders
[ https://issues.apache.org/jira/browse/DISPATCH-343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vishal Sharda updated DISPATCH-343: --- Attachment: Crash.png Screenshot showing a crash in the router to which the 2 receivers were connected. > Router stops accepting connections after load from parallel senders > --- > > Key: DISPATCH-343 > URL: https://issues.apache.org/jira/browse/DISPATCH-343 > Project: Qpid Dispatch > Issue Type: Bug > Components: Routing Engine >Affects Versions: 0.6.0 >Reporter: Vishal Sharda >Priority: Blocker > Attachments: Connection_aborted.png, Connection_aborted_1.png, > Crash.png, R1.conf, R2.conf, R3.conf, resource-limit-exceeded.png > > > We ran 2 parallel senders and 2 receivers with each sender sending 5 > messages. After a while we saw that the router stopped accepting connections > even from qdstat. We saw various errors in the logs (screenshots attached).
[jira] [Commented] (DISPATCH-343) Router stops accepting connections after load from parallel senders
[ https://issues.apache.org/jira/browse/DISPATCH-343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15297053#comment-15297053 ] Vishal Sharda commented on DISPATCH-343: This is how I start the two senders (-n 2 does the fork() inside the code; a UUID is set for each message): time ./send_ssl -c /home/vsharda/protected/switch-dr-network_cert.pem -k /home/vsharda/protected/switch-dr-network_key.pem -p -m 5 -n 2 amqps://guest:guest@10.24.170.251:5671/foo The receivers: time ./recv_ssl -c /home/vsharda/protected/switch-dr-network_cert.pem -k /home/vsharda/protected/switch-dr-network_key.pem -p -n 2 amqps://guest:guest@10.24.170.251:5671/foo > Router stops accepting connections after load from parallel senders > --- > > Key: DISPATCH-343 > URL: https://issues.apache.org/jira/browse/DISPATCH-343 > Project: Qpid Dispatch > Issue Type: Bug > Components: Routing Engine >Affects Versions: 0.6.0 >Reporter: Vishal Sharda >Priority: Blocker > Attachments: Connection_aborted.png, Connection_aborted_1.png, > R1.conf, R2.conf, R3.conf, resource-limit-exceeded.png > > > We ran 2 parallel senders and 2 receivers with each sender sending 5 > messages. After a while we saw that the router stopped accepting connections > even from qdstat. We saw various errors in the logs (screenshots attached).
[jira] [Updated] (DISPATCH-343) Router stops accepting connections after load from parallel senders
[ https://issues.apache.org/jira/browse/DISPATCH-343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vishal Sharda updated DISPATCH-343: --- Attachment: R3.conf R2.conf R1.conf The three configuration files used for the routers. > Router stops accepting connections after load from parallel senders > --- > > Key: DISPATCH-343 > URL: https://issues.apache.org/jira/browse/DISPATCH-343 > Project: Qpid Dispatch > Issue Type: Bug > Components: Routing Engine >Affects Versions: 0.6.0 >Reporter: Vishal Sharda >Priority: Blocker > Attachments: Connection_aborted.png, Connection_aborted_1.png, > R1.conf, R2.conf, R3.conf, resource-limit-exceeded.png > > > We ran 2 parallel senders and 2 receivers with each sender sending 5 > messages. After a while we saw that the router stopped accepting connections > even from qdstat. We saw various errors in the logs (screenshots attached).
[jira] [Commented] (DISPATCH-343) Router stops accepting connections after load from parallel senders
[ https://issues.apache.org/jira/browse/DISPATCH-343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15296625#comment-15296625 ] Vishal Sharda commented on DISPATCH-343: Hi Ted, This test was run with a cluster of 3 routers, R1, R2, and R3, all configured for interior mode. R2 has an inter-router connector to R1, and R3 has two inter-router connectors: one to R1 and the other to R2. All the connectors are 2-way SSL. Our driver is based on the Proton-C Messenger API, and we are now doing unsettled deliveries (we stopped doing pre-settled after your response on DISPATCH-336). We had configured fixedAddress "/" for closest distribution but have noticed that support for fixedAddress "/" is now gone. Hence, all our queues are configured for balanced distribution, which is the default. In our tests, we ran 2 parallel senders sending to R1 and 2 parallel receivers also receiving from R1. All senders/receivers were listening on the same queue. R1 as well as R2 have become completely unresponsive and are not accepting incoming connections, even from qdstat. R1 has grown to around 919 MB resident, R2 to 233 MB, and R3 to 146 MB. All three of them continue to leak memory. Thanks, Vishal > Router stops accepting connections after load from parallel senders > --- > > Key: DISPATCH-343 > URL: https://issues.apache.org/jira/browse/DISPATCH-343 > Project: Qpid Dispatch > Issue Type: Bug > Components: Routing Engine >Affects Versions: 0.6.0 >Reporter: Vishal Sharda >Priority: Blocker > Attachments: Connection_aborted.png, Connection_aborted_1.png, > resource-limit-exceeded.png > > > We ran 2 parallel senders and 2 receivers with each sender sending 5 > messages. After a while we saw that the router stopped accepting connections > even from qdstat. We saw various errors in the logs (screenshots attached).
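The attached R1.conf, R2.conf, and R3.conf are the authoritative configurations. For readers without the attachments, an inter-router connector of the kind described above (e.g. R2 dialing R1) looks roughly like the fragment below; the field names follow later Qpid Dispatch releases and the host/port/profile values are placeholders, so treat this as a sketch rather than the tested configuration:

```
router {
    mode: interior
    id: R2
}

# Outbound inter-router link from R2 to R1 (host/port are placeholders).
connector {
    role: inter-router
    host: r1.example.net
    port: 10002
    sslProfile: inter-router-tls   # the setup above uses 2-way SSL
}
```

R3 would carry two such connector blocks, one per peer, which is what makes the three interior routers a fully connected cluster.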
[jira] [Commented] (DISPATCH-343) Router stops accepting connections after load from parallel senders
[ https://issues.apache.org/jira/browse/DISPATCH-343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15295050#comment-15295050 ] Vishal Sharda commented on DISPATCH-343: Because of the disrupted connections, the total of 100K messages also failed to arrive. > Router stops accepting connections after load from parallel senders > --- > > Key: DISPATCH-343 > URL: https://issues.apache.org/jira/browse/DISPATCH-343 > Project: Qpid Dispatch > Issue Type: Bug > Components: Routing Engine >Affects Versions: 0.6.0 >Reporter: Vishal Sharda >Priority: Blocker > Attachments: Connection_aborted.png, Connection_aborted_1.png, > resource-limit-exceeded.png > > > We ran 2 parallel senders and 2 receivers with each sender sending 5 > messages. After a while we saw that the router stopped accepting connections > even from qdstat. We saw various errors in the logs (screenshots attached).
[jira] [Updated] (DISPATCH-343) Router stops accepting connections after load from parallel senders
[ https://issues.apache.org/jira/browse/DISPATCH-343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vishal Sharda updated DISPATCH-343:
---

Attachment: resource-limit-exceeded.png
            Connection_aborted.png
            Connection_aborted_1.png

Errors in the router after putting load on it from the senders.

> Router stops accepting connections after load from parallel senders
> ---
>
> Key: DISPATCH-343
> URL: https://issues.apache.org/jira/browse/DISPATCH-343
> Project: Qpid Dispatch
> Issue Type: Bug
> Components: Routing Engine
> Affects Versions: 0.6.0
> Reporter: Vishal Sharda
> Priority: Blocker
> Attachments: Connection_aborted.png, Connection_aborted_1.png, resource-limit-exceeded.png
>
> We ran 2 parallel senders and 2 receivers with each sender sending 5 messages. After a while we saw that the router stopped accepting connections even from qdstat. We saw various errors in the logs (screenshots attached).
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Created] (DISPATCH-343) Router stops accepting connections after load from parallel senders
Vishal Sharda created DISPATCH-343:
--

Summary: Router stops accepting connections after load from parallel senders
Key: DISPATCH-343
URL: https://issues.apache.org/jira/browse/DISPATCH-343
Project: Qpid Dispatch
Issue Type: Bug
Components: Routing Engine
Affects Versions: 0.6.0
Reporter: Vishal Sharda
Priority: Blocker

We ran 2 parallel senders and 2 receivers with each sender sending 5 messages. After a while we saw that the router stopped accepting connections even from qdstat. We saw various errors in the logs (screenshots attached).
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Updated] (DISPATCH-337) Huge memory leaks in Qpid Dispatch router
[ https://issues.apache.org/jira/browse/DISPATCH-337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vishal Sharda updated DISPATCH-337:
---

Attachment: val2_sender.txt
            val2_receiver.txt
            config2.conf
            config1.conf

Configuration files for the two routers, and the valgrind output from a debug build of Qpid Dispatch running with 2 senders and 2 receivers.

> Huge memory leaks in Qpid Dispatch router
> -
>
> Key: DISPATCH-337
> URL: https://issues.apache.org/jira/browse/DISPATCH-337
> Project: Qpid Dispatch
> Issue Type: Bug
> Affects Versions: 0.6.0
> Environment: Debian 8.3, Apache Qpid Proton 0.12.2 for drivers and dependencies, Hardware: 2 CPUs, 15 GB RAM, 60 GB HDD on 2 separate machines
> Reporter: Vishal Sharda
> Priority: Critical
> Attachments: config1.conf, config2.conf, val2_receiver.txt, val2_sender.txt
>
> Valgrind shows huge memory leaks while running 2 interconnected routers with 2 parallel senders connected to one router and 2 parallel receivers connected to the other router.
> The CRYPTO leak coming from Qpid Proton 0.12.2 is already fixed here: https://issues.apache.org/jira/browse/PROTON-1115
> However, the rest of the leaks are from qdrouterd.
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
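When comparing valgrind runs like the attached val2_sender.txt and val2_receiver.txt, the number that matters for a leak report is the "definitely lost" total. A small, hypothetical helper (not part of Qpid Dispatch or the reporter's tooling) that sums those bytes from `valgrind --leak-check=full` LEAK SUMMARY output:

```python
import re

# Matches valgrind LEAK SUMMARY lines such as:
#   ==25467==    definitely lost: 1,024 bytes in 8 blocks
LOST_RE = re.compile(r"definitely lost: ([\d,]+) bytes in [\d,]+ blocks")

def definitely_lost(log_text):
    """Total bytes reported as 'definitely lost' across a valgrind log."""
    return sum(int(m.group(1).replace(",", ""))
               for m in LOST_RE.finditer(log_text))

sample = """==25467== LEAK SUMMARY:
==25467==    definitely lost: 1,024 bytes in 8 blocks
==25467==    indirectly lost: 2,048 bytes in 4 blocks
"""
print(definitely_lost(sample))  # 1024
```

This only counts the "definitely lost" category; "indirectly lost" and "still reachable" are deliberately excluded, since they are the noisier categories in long-running daemons like qdrouterd.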
[jira] [Created] (DISPATCH-337) Huge memory leaks in Qpid Dispatch router
Vishal Sharda created DISPATCH-337:
--

Summary: Huge memory leaks in Qpid Dispatch router
Key: DISPATCH-337
URL: https://issues.apache.org/jira/browse/DISPATCH-337
Project: Qpid Dispatch
Issue Type: Bug
Affects Versions: 0.6.0
Environment: Debian 8.3, Apache Qpid Proton 0.12.2 for drivers and dependencies, Hardware: 2 CPUs, 15 GB RAM, 60 GB HDD on 2 separate machines
Reporter: Vishal Sharda
Priority: Critical

Valgrind shows huge memory leaks while running 2 interconnected routers with 2 parallel senders connected to one router and 2 parallel receivers connected to the other router.
The CRYPTO leak coming from Qpid Proton 0.12.2 is already fixed here: https://issues.apache.org/jira/browse/PROTON-1115
However, the rest of the leaks are from qdrouterd.
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Updated] (DISPATCH-336) Very high latency for fire-and-forget sender
[ https://issues.apache.org/jira/browse/DISPATCH-336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vishal Sharda updated DISPATCH-336:
---

Attachment: (was: output.txt)

> Very high latency for fire-and-forget sender
>
> Key: DISPATCH-336
> URL: https://issues.apache.org/jira/browse/DISPATCH-336
> Project: Qpid Dispatch
> Issue Type: Bug
> Components: Routing Engine
> Affects Versions: 0.6.0
> Environment: Debian 8.3, Apache Qpid Proton 0.12.2 for drivers and dependencies, Hardware: 2 CPUs, 15 GB RAM, 60 GB HDD on 2 separate machines
> Reporter: Vishal Sharda
> Priority: Critical
> Attachments: config1.conf, config2.conf, output_1S_1R.txt
>
> We are running two interconnected routers with 1 fire-and-forget sender connected to one router and 1 receiver connected to the other router. We are observing increasing latency for the messages irrespective of the number of messages sent.
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Updated] (DISPATCH-336) Very high latency for fire-and-forget sender
[ https://issues.apache.org/jira/browse/DISPATCH-336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vishal Sharda updated DISPATCH-336:
---

Attachment: output_1S_1R.txt

> Very high latency for fire-and-forget sender
>
> Key: DISPATCH-336
> URL: https://issues.apache.org/jira/browse/DISPATCH-336
> Project: Qpid Dispatch
> Issue Type: Bug
> Components: Routing Engine
> Affects Versions: 0.6.0
> Environment: Debian 8.3, Apache Qpid Proton 0.12.2 for drivers and dependencies, Hardware: 2 CPUs, 15 GB RAM, 60 GB HDD on 2 separate machines
> Reporter: Vishal Sharda
> Priority: Critical
> Attachments: config1.conf, config2.conf, output_1S_1R.txt
>
> We are running two interconnected routers with 1 fire-and-forget sender connected to one router and 1 receiver connected to the other router. We are observing increasing latency for the messages irrespective of the number of messages sent.
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Updated] (DISPATCH-336) Very high latency for fire-and-forget sender
[ https://issues.apache.org/jira/browse/DISPATCH-336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vishal Sharda updated DISPATCH-336:
---

Attachment: output.txt
            config2.conf
            config1.conf

Configuration files for the two routers, and the observed latency for 200K messages sent from the sender to arrive at the receiver (both sender and receiver were running on the same machine).

> Very high latency for fire-and-forget sender
>
> Key: DISPATCH-336
> URL: https://issues.apache.org/jira/browse/DISPATCH-336
> Project: Qpid Dispatch
> Issue Type: Bug
> Components: Routing Engine
> Affects Versions: 0.6.0
> Environment: Debian 8.3, Apache Qpid Proton 0.12.2 for drivers and dependencies, Hardware: 2 CPUs, 15 GB RAM, 60 GB HDD on 2 separate machines
> Reporter: Vishal Sharda
> Priority: Critical
> Attachments: config1.conf, config2.conf, output.txt
>
> We are running two interconnected routers with 1 fire-and-forget sender connected to one router and 1 receiver connected to the other router. We are observing increasing latency for the messages irrespective of the number of messages sent.
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
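The symptom in DISPATCH-336 (per-message latency that keeps climbing no matter how many messages are sent) is easy to quantify when the sender stamps each message with a send timestamp and the receiver logs arrival times. A minimal, hypothetical post-processing sketch, not the reporter's actual tooling, with illustrative integer timestamps:

```python
def latencies(send_times, recv_times):
    """Per-message end-to-end latency; assumes both clocks are on one machine,
    as in the attached test run."""
    return [recv - send for send, recv in zip(send_times, recv_times)]

# A fire-and-forget sender emits quickly; the receiver falls further and
# further behind, so the latency series grows message by message.
send = [0, 1, 2, 3]
recv = [1, 4, 9, 16]
print(latencies(send, recv))  # [1, 3, 7, 13]
```

A flat series here would indicate steady-state throughput; the monotonically growing series is the queueing-buildup signature reported in this issue.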
[jira] [Comment Edited] (DISPATCH-332) Heavy message loss happening with 2 interconnected routers
[ https://issues.apache.org/jira/browse/DISPATCH-332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15280551#comment-15280551 ] Vishal Sharda edited comment on DISPATCH-332 at 5/11/16 6:18 PM:
-

Two router configuration files to reproduce the message loss bug and the output from the receiver. There were two simple_send.py senders running in parallel, each sending 20K messages. The simple_recv.py on the other router, however, received only 1 message - the last one (2) from both the senders.

was (Author: vsharda): Two router configuration files to reproduce the message loss bug and the output from the receiver.

> Heavy message loss happening with 2 interconnected routers
> --
>
> Key: DISPATCH-332
> URL: https://issues.apache.org/jira/browse/DISPATCH-332
> Project: Qpid Dispatch
> Issue Type: Bug
> Components: Routing Engine
> Affects Versions: 0.6.0
> Environment: Debian 8.3, Qpid Proton 0.12.2 for drivers and dependency for Qpid Dispatch, Hardware: 2 CPUs, 15 GB RAM, 30 GB HDD.
> Reporter: Vishal Sharda
> Assignee: Ted Ross
> Priority: Blocker
> Fix For: 0.6.0
>
> Attachments: config1.conf, config2.conf, output.txt
>
> We are running two Dispatch Routers each configured for interior mode and the second router's configuration includes a connector to the first router with inter-router role.
> When we connect one sender to one router and one receiver to the other router both listening to the same queue, we see all messages (20,000 in our test) being transmitted.
> As soon as we start a second sender connected to the same router to which the first sender connects and sending to the same queue, we start seeing heavy message loss. Around 20% of messages are lost with each sender attempting to send 20,000 messages on its own (40,000 in total) and running in parallel with the other sender. The message loss happens regardless of the message size.
> We tried with simple_send.py, simple_recv.py as well as send and recv C executable files from Qpid Proton 0.12.2.
> We even saw a crash in the router with the following message:
> qdrouterd: /home/vsharda/qpid-dispatch/src/posix/threading.c:71: sys_mutex_lock: Assertion `result == 0' failed.
> Aborted
> The message loss was observed with the 0.6.0 SNAPSHOT taken on May 9 as well as the one taken on March 3 before the router core refactoring.
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Updated] (DISPATCH-332) Heavy message loss happening with 2 interconnected routers
[ https://issues.apache.org/jira/browse/DISPATCH-332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vishal Sharda updated DISPATCH-332:
---

Attachment: output.txt
            config2.conf
            config1.conf

Two router configuration files to reproduce the message loss bug and the output from the receiver.

> Heavy message loss happening with 2 interconnected routers
> --
>
> Key: DISPATCH-332
> URL: https://issues.apache.org/jira/browse/DISPATCH-332
> Project: Qpid Dispatch
> Issue Type: Bug
> Components: Routing Engine
> Affects Versions: 0.6.0
> Environment: Debian 8.3, Qpid Proton 0.12.2 for drivers and dependency for Qpid Dispatch, Hardware: 2 CPUs, 15 GB RAM, 30 GB HDD.
> Reporter: Vishal Sharda
> Assignee: Ted Ross
> Priority: Blocker
> Fix For: 0.6.0
>
> Attachments: config1.conf, config2.conf, output.txt
>
> We are running two Dispatch Routers each configured for interior mode and the second router's configuration includes a connector to the first router with inter-router role.
> When we connect one sender to one router and one receiver to the other router both listening to the same queue, we see all messages (20,000 in our test) being transmitted.
> As soon as we start a second sender connected to the same router to which the first sender connects and sending to the same queue, we start seeing heavy message loss. Around 20% of messages are lost with each sender attempting to send 20,000 messages on its own (40,000 in total) and running in parallel with the other sender. The message loss happens regardless of the message size.
> We tried with simple_send.py, simple_recv.py as well as send and recv C executable files from Qpid Proton 0.12.2.
> We even saw a crash in the router with the following message:
> qdrouterd: /home/vsharda/qpid-dispatch/src/posix/threading.c:71: sys_mutex_lock: Assertion `result == 0' failed.
> Aborted
> The message loss was observed with the 0.6.0 SNAPSHOT taken on May 9 as well as the one taken on March 3 before the router core refactoring.
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Commented] (DISPATCH-332) Heavy message loss happening with 2 interconnected routers
[ https://issues.apache.org/jira/browse/DISPATCH-332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15280386#comment-15280386 ] Vishal Sharda commented on DISPATCH-332:

I used the following fixedAddress in both the configuration files.

fixedAddress {
    prefix: /
    fanout: single
    bias: closest
}

The insecure port 5672 was used for all the communication. Everything works fine if the 2 senders and 1 receiver are all attached to the same router, and also if 1 sender and 1 receiver are each connected to one of the two interconnected routers. The issue occurs only when we start a second parallel sender on the same router where one sender is already active. Increasing the number of parallel senders and receivers further increases the percentage of messages lost.

> Heavy message loss happening with 2 interconnected routers
> --
>
> Key: DISPATCH-332
> URL: https://issues.apache.org/jira/browse/DISPATCH-332
> Project: Qpid Dispatch
> Issue Type: Bug
> Components: Routing Engine
> Affects Versions: 0.6.0
> Environment: Debian 8.3, Qpid Proton 0.12.2 for drivers and dependency for Qpid Dispatch, Hardware: 2 CPUs, 15 GB RAM, 30 GB HDD.
> Reporter: Vishal Sharda
> Assignee: Ted Ross
> Priority: Blocker
> Fix For: 0.6.0
>
> We are running two Dispatch Routers each configured for interior mode and the second router's configuration includes a connector to the first router with inter-router role.
> When we connect one sender to one router and one receiver to the other router both listening to the same queue, we see all messages (20,000 in our test) being transmitted.
> As soon as we start a second sender connected to the same router to which the first sender connects and sending to the same queue, we start seeing heavy message loss. Around 20% of messages are lost with each sender attempting to send 20,000 messages on its own (40,000 in total) and running in parallel with the other sender. The message loss happens regardless of the message size.
> We tried with simple_send.py, simple_recv.py as well as send and recv C executable files from Qpid Proton 0.12.2.
> We even saw a crash in the router with the following message:
> qdrouterd: /home/vsharda/qpid-dispatch/src/posix/threading.c:71: sys_mutex_lock: Assertion `result == 0' failed.
> Aborted
> The message loss was observed with the 0.6.0 SNAPSHOT taken on May 9 as well as the one taken on March 3 before the router core refactoring.
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
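The "around 20% lost" figure quoted in this issue is the kind of number that falls out of tallying sent versus received message ids. A hypothetical helper, not the reporter's actual script, assuming each sender stamps messages with a (sender_id, sequence) pair that the receiver records:

```python
def tally_loss(sent, received):
    """Return (lost_count, loss_percent) given collections of message ids."""
    sent, received = set(sent), set(received)
    lost = sent - received
    pct = 100.0 * len(lost) / len(sent) if sent else 0.0
    return len(lost), pct

# Two senders, 20,000 messages each; simulate the receiver missing every
# 5th message from both senders (illustrative data, not the attached output.txt).
sent = {("s1", i) for i in range(20000)} | {("s2", i) for i in range(20000)}
received = {m for m in sent if m[1] % 5 != 0}
lost, pct = tally_loss(sent, received)
print(lost, pct)  # 8000 20.0
```

Stamping ids this way also distinguishes genuine loss from reordering, which raw message counts cannot.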
[jira] [Updated] (DISPATCH-332) Heavy message loss happening with 2 interconnected routers
[ https://issues.apache.org/jira/browse/DISPATCH-332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vishal Sharda updated DISPATCH-332:
---

Environment: Debian 8.3, Qpid Proton 0.12.2 for drivers and dependency for Qpid Dispatch, Hardware: 2 CPUs, 15 GB RAM, 30 GB HDD.
(was: Debian 8.3, Qpid Proton 0.12.2 for drivers and dependency for Qpid Dispatch, Hardware: 2 CUPs, 15 GB RAM, 30 GB HDD.)

> Heavy message loss happening with 2 interconnected routers
> --
>
> Key: DISPATCH-332
> URL: https://issues.apache.org/jira/browse/DISPATCH-332
> Project: Qpid Dispatch
> Issue Type: Bug
> Components: Routing Engine
> Affects Versions: 0.6.0
> Environment: Debian 8.3, Qpid Proton 0.12.2 for drivers and dependency for Qpid Dispatch, Hardware: 2 CPUs, 15 GB RAM, 30 GB HDD.
> Reporter: Vishal Sharda
> Priority: Blocker
>
> We are running two Dispatch Routers each configured for interior mode and the second router's configuration includes a connector to the first router with inter-router role.
> When we connect one sender to one router and one receiver to the other router both listening to the same queue, we see all messages (20,000 in our test) being transmitted.
> As soon as we start a second sender connected to the same router to which the first sender connects and sending to the same queue, we start seeing heavy message loss. Around 20% of messages are lost with each sender attempting to send 20,000 messages on its own (40,000 in total) and running in parallel with the other sender. The message loss happens regardless of the message size.
> We tried with simple_send.py, simple_recv.py as well as send and recv C executable files from Qpid Proton 0.12.2.
> We even saw a crash in the router with the following message:
> qdrouterd: /home/vsharda/qpid-dispatch/src/posix/threading.c:71: sys_mutex_lock: Assertion `result == 0' failed.
> Aborted
> The message loss was observed with the 0.6.0 SNAPSHOT taken on May 9 as well as the one taken on March 3 before the router core refactoring.
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Updated] (DISPATCH-332) Heavy message loss happening with 2 interconnected routers
[ https://issues.apache.org/jira/browse/DISPATCH-332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vishal Sharda updated DISPATCH-332:
---

Description:
We are running two Dispatch Routers each configured for interior mode and the second router's configuration includes a connector to the first router with inter-router role.
When we connect one sender to one router and one receiver to the other router both listening to the same queue, we see all messages (20,000 in our test) being transmitted.
As soon as we start a second sender connected to the same router to which the first sender connects and sending to the same queue, we start seeing heavy message loss. Around 20% of messages are lost with each sender attempting to send 20,000 messages on its own (40,000 in total) and running in parallel with the other sender. The message loss happens regardless of the message size.
We tried with simple_send.py, simple_recv.py as well as send and recv C executable files from Qpid Proton 0.12.2.
We even saw a crash in the router with the following message:
qdrouterd: /home/vsharda/qpid-dispatch/src/posix/threading.c:71: sys_mutex_lock: Assertion `result == 0' failed.
Aborted
The message loss was observed with the 0.6.0 SNAPSHOT taken on May 9 as well as the one taken on March 3 before the router core refactoring.

was:
We are running two Dispatch Routers each configured for interior mode and the second router's configuration includes a connector to the first router.
When we connect one sender to one router and one receiver to the other router both listening to the same queue, we see all messages (20,000 in our test) being transmitted.
As soon as we start a second sender connected to the same router to which the first sender connects and sending to the same queue, we start seeing heavy message loss. Around 20% of messages are lost with each sender attempting to send 20,000 messages on its own (40,000 in total) and running in parallel with the other sender. The message loss happens regardless of the message size.
We tried with simple_send.py, simple_recv.py as well as send and recv C executable files from Qpid Proton 0.12.2.
We even saw a crash in the router with the following message:
qdrouterd: /home/vsharda/qpid-dispatch/src/posix/threading.c:71: sys_mutex_lock: Assertion `result == 0' failed.
Aborted
The message loss was observed with the 0.6.0 SNAPSHOT taken on May 9 as well as the one taken on March 3 before the router core refactoring.

> Heavy message loss happening with 2 interconnected routers
> --
>
> Key: DISPATCH-332
> URL: https://issues.apache.org/jira/browse/DISPATCH-332
> Project: Qpid Dispatch
> Issue Type: Bug
> Components: Routing Engine
> Affects Versions: 0.6.0
> Environment: Debian 8.3, Qpid Proton 0.12.2 for drivers and dependency for Qpid Dispatch, Hardware: 2 CUPs, 15 GB RAM, 30 GB HDD.
> Reporter: Vishal Sharda
> Priority: Blocker
>
> We are running two Dispatch Routers each configured for interior mode and the second router's configuration includes a connector to the first router with inter-router role.
> When we connect one sender to one router and one receiver to the other router both listening to the same queue, we see all messages (20,000 in our test) being transmitted.
> As soon as we start a second sender connected to the same router to which the first sender connects and sending to the same queue, we start seeing heavy message loss. Around 20% of messages are lost with each sender attempting to send 20,000 messages on its own (40,000 in total) and running in parallel with the other sender. The message loss happens regardless of the message size.
> We tried with simple_send.py, simple_recv.py as well as send and recv C executable files from Qpid Proton 0.12.2.
> We even saw a crash in the router with the following message:
> qdrouterd: /home/vsharda/qpid-dispatch/src/posix/threading.c:71: sys_mutex_lock: Assertion `result == 0' failed.
> Aborted
> The message loss was observed with the 0.6.0 SNAPSHOT taken on May 9 as well as the one taken on March 3 before the router core refactoring.
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Updated] (DISPATCH-332) Heavy message loss happening with 2 interconnected routers
[ https://issues.apache.org/jira/browse/DISPATCH-332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vishal Sharda updated DISPATCH-332:
---

Description:
We are running two Dispatch Routers each configured for interior mode and the second router's configuration includes a connector to the first router.
When we connect one sender to one router and one receiver to the other router both listening to the same queue, we see all messages (20,000 in our test) being transmitted.
As soon as we start a second sender connected to the same router to which the first sender connects and sending to the same queue, we start seeing heavy message loss. Around 20% of messages are lost with each sender attempting to send 20,000 messages on its own (40,000 in total) and running in parallel with the other sender. The message loss happens regardless of the message size.
We tried with simple_send.py, simple_recv.py as well as send and recv C executable files from Qpid Proton 0.12.2.
We even saw a crash in the router with the following message:
qdrouterd: /home/vsharda/qpid-dispatch/src/posix/threading.c:71: sys_mutex_lock: Assertion `result == 0' failed.
Aborted
The message loss was observed with the 0.6.0 SNAPSHOT taken on May 9 as well as the one taken on March 3 before the router core refactoring.

was:
We are running two Dispatch Routers each configured for inter-router mode and the second router's configuration includes a connector to the first router.
When we connect one sender to one router and one receiver to the other router both listening to the same queue, we see all messages (20,000 in our test) being transmitted.
As soon as we start a second sender connected to the same router to which the first sender connects and sending to the same queue, we start seeing heavy message loss. Around 20% of messages are lost with each sender attempting to send 20,000 messages on its own (40,000 in total) and running in parallel with the other sender. The message loss happens regardless of the message size.
We tried with simple_send.py, simple_recv.py as well as send and recv C executable files from Qpid Proton 0.12.2.
We even saw a crash in the router with the following message:
qdrouterd: /home/vsharda/qpid-dispatch/src/posix/threading.c:71: sys_mutex_lock: Assertion `result == 0' failed.
Aborted
The message loss was observed with the 0.6.0 SNAPSHOT taken on May 9 as well as the one taken on March 3 before the router core refactoring.

> Heavy message loss happening with 2 interconnected routers
> --
>
> Key: DISPATCH-332
> URL: https://issues.apache.org/jira/browse/DISPATCH-332
> Project: Qpid Dispatch
> Issue Type: Bug
> Components: Routing Engine
> Affects Versions: 0.6.0
> Environment: Debian 8.3, Qpid Proton 0.12.2 for drivers and dependency for Qpid Dispatch, Hardware: 2 CUPs, 15 GB RAM, 30 GB HDD.
> Reporter: Vishal Sharda
> Priority: Blocker
>
> We are running two Dispatch Routers each configured for interior mode and the second router's configuration includes a connector to the first router.
> When we connect one sender to one router and one receiver to the other router both listening to the same queue, we see all messages (20,000 in our test) being transmitted.
> As soon as we start a second sender connected to the same router to which the first sender connects and sending to the same queue, we start seeing heavy message loss. Around 20% of messages are lost with each sender attempting to send 20,000 messages on its own (40,000 in total) and running in parallel with the other sender. The message loss happens regardless of the message size.
> We tried with simple_send.py, simple_recv.py as well as send and recv C executable files from Qpid Proton 0.12.2.
> We even saw a crash in the router with the following message:
> qdrouterd: /home/vsharda/qpid-dispatch/src/posix/threading.c:71: sys_mutex_lock: Assertion `result == 0' failed.
> Aborted
> The message loss was observed with the 0.6.0 SNAPSHOT taken on May 9 as well as the one taken on March 3 before the router core refactoring.
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org
[jira] [Created] (DISPATCH-332) Heavy message loss happening with 2 interconnected routers
Vishal Sharda created DISPATCH-332:
--

Summary: Heavy message loss happening with 2 interconnected routers
Key: DISPATCH-332
URL: https://issues.apache.org/jira/browse/DISPATCH-332
Project: Qpid Dispatch
Issue Type: Bug
Components: Routing Engine
Affects Versions: 0.6.0
Environment: Debian 8.3, Qpid Proton 0.12.2 for drivers and dependency for Qpid Dispatch, Hardware: 2 CUPs, 15 GB RAM, 30 GB HDD.
Reporter: Vishal Sharda
Priority: Blocker

We are running two Dispatch Routers each configured for inter-router mode and the second router's configuration includes a connector to the first router.
When we connect one sender to one router and one receiver to the other router both listening to the same queue, we see all messages (20,000 in our test) being transmitted.
As soon as we start a second sender connected to the same router to which the first sender connects and sending to the same queue, we start seeing heavy message loss. Around 20% of messages are lost with each sender attempting to send 20,000 messages on its own (40,000 in total) and running in parallel with the other sender. The message loss happens regardless of the message size.
We tried with simple_send.py, simple_recv.py as well as send and recv C executable files from Qpid Proton 0.12.2.
We even saw a crash in the router with the following message:
qdrouterd: /home/vsharda/qpid-dispatch/src/posix/threading.c:71: sys_mutex_lock: Assertion `result == 0' failed.
Aborted
The message loss was observed with the 0.6.0 SNAPSHOT taken on May 9 as well as the one taken on March 3 before the router core refactoring.
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org For additional commands, e-mail: dev-h...@qpid.apache.org