[jira] [Commented] (DISPATCH-358) Intermittent crashes in qdrouterd under load from parallel senders

2016-12-07 Thread Vishal Sharda (JIRA)

[ 
https://issues.apache.org/jira/browse/DISPATCH-358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15729662#comment-15729662
 ] 

Vishal Sharda commented on DISPATCH-358:


What you said seems to be the case.  These crashes have not been observed after 
upgrading to Proton 0.13.0 and later.

> Intermittent crashes in qdrouterd under load from parallel senders
> --
>
> Key: DISPATCH-358
> URL: https://issues.apache.org/jira/browse/DISPATCH-358
> Project: Qpid Dispatch
>  Issue Type: Bug
>  Components: Routing Engine
>Affects Versions: 0.6.0
> Environment: Debian 8.3, Apache Qpid Proton 0.12.2 for drivers and 
> dependencies, Hardware: 2 CPUs, 15 GB RAM, 60 GB HDD on 2 separate machines
>Reporter: Vishal Sharda
>Assignee: Ted Ross
>Priority: Critical
> Attachments: Crash_EXTERNAL.png, Crash_Java_Router_3.png, 
> Crash_Java_Send.png, Crash_Java_free_qd_connection.png, 
> Crash_Java_same_router.png, Crash_Java_same_router_another.png, 
> Crash_Java_same_router_another_bt.png, Crash_SASL.png, Crash_SASL_2.png, 
> Crash_SR_1.png, Crash_SR_2.png, Crash_bt_double_free_Java_RES_266MB.png, 
> Crash_double_free_Java_RES_266MB.png, Crash_free.png, 
> Crash_sasl_server_done.png, Crash_watch_qdstat.png, Overflow_Error.png
>
>
> In my setup of two inter-connected routers, several senders connecting to one 
> router while few receivers connecting to the other router, I see several 
> crashes in the router to which senders connect.  These crashes are 
> intermittent and happen once in every 10 runs or so.  I have collected the 
> backtraces of all the crashes but do not yet have a test case that can 
> reliably reproduce any of them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (DISPATCH-337) Huge memory leaks in Qpid Dispatch router

2016-10-07 Thread Vishal Sharda (JIRA)

[ 
https://issues.apache.org/jira/browse/DISPATCH-337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15556243#comment-15556243
 ] 

Vishal Sharda commented on DISPATCH-337:


We built with the default build type, "RelWithDebInfo", which should have 
included debug information.

CPU usage per thread is very low.

vsharda@millennium-qpid-deploy-lnp-1-5129:~$ top -Hbcd 5 | grep qdrouterd
14213 vsharda   20   0   11132   1624   1492 S  0.0  0.0   0:00.00 grep qdrouterd
25467 qserv     20   0 2524848 2.187g   8420 S  0.0  3.7  19:51.34 qdrouterd -c /x/web/LIVE/switch-dr-network/configurator+
25482 qserv     20   0 2524848 2.187g   8420 S  0.0  3.7   0:39.50 qdrouterd -c /x/web/LIVE/switch-dr-network/configurator+
25493 qserv     20   0 2524848 2.187g   8420 S  0.0  3.7  20:02.09 qdrouterd -c /x/web/LIVE/switch-dr-network/configurator+
25494 qserv     20   0 2524848 2.187g   8420 S  0.0  3.7  19:51.65 qdrouterd -c /x/web/LIVE/switch-dr-network/configurator+
25495 qserv     20   0 2524848 2.187g   8420 S  0.0  3.7  19:55.10 qdrouterd -c /x/web/LIVE/switch-dr-network/configurator+
25493 qserv     20   0 2524848 2.187g   8420 S  0.2  3.7  20:02.10 qdrouterd -c /x/web/LIVE/switch-dr-network/configurator+
14213 vsharda   20   0   11136   1624   1492 S  0.0  0.0   0:00.00 grep qdrouterd
25467 qserv     20   0 2524848 2.187g   8420 S  0.0  3.7  19:51.34 qdrouterd -c /x/web/LIVE/switch-dr-network/configurator+
25482 qserv     20   0 2524848 2.187g   8420 S  0.0  3.7   0:39.50 qdrouterd -c /x/web/LIVE/switch-dr-network/configurator+
25494 qserv     20   0 2524848 2.187g   8420 S  0.0  3.7  19:51.65 qdrouterd -c /x/web/LIVE/switch-dr-network/configurator+
25495 qserv     20   0 2524848 2.187g   8420 S  0.0  3.7  19:55.10 qdrouterd -c /x/web/LIVE/switch-dr-network/configurator+
25494 qserv     20   0 2524848 2.187g   8420 S  0.2  3.7  19:51.66 qdrouterd -c /x/web/LIVE/switch-dr-network/configurator+
14213 vsharda   20   0   11136   1624   1492 S  0.0  0.0   0:00.00 grep qdrouterd
25467 qserv     20   0 2524848 2.187g   8420 S  0.0  3.7  19:51.34 qdrouterd -c /x/web/LIVE/switch-dr-network/configurator+
25482 qserv     20   0 2524848 2.187g   8420 S  0.0  3.7   0:39.50 qdrouterd -c /x/web/LIVE/switch-dr-network/configurator+
25493 qserv     20   0 2524848 2.187g   8420 S  0.0  3.7  20:02.10 qdrouterd -c /x/web/LIVE/switch-dr-network/configurator+
25495 qserv     20   0 2524848 2.187g   8420 S  0.0  3.7  19:55.10 qdrouterd -c /x/web/LIVE/switch-dr-network/configurator+
25493 qserv     20   0 2524848 2.187g   8420 S  0.2  3.7  20:02.11 qdrouterd -c /x/web/LIVE/switch-dr-network/configurator+
25495 qserv     20   0 2524848 2.187g   8420 S  0.2  3.7  19:55.11 qdrouterd -c /x/web/LIVE/switch-dr-network/configurator+
14213 vsharda   20   0   11136   1624   1492 S  0.0  0.0   0:00.00 grep qdrouterd
25467 qserv     20   0 2524848 2.187g   8420 S  0.0  3.7  19:51.34 qdrouterd -c /x/web/LIVE/switch-dr-network/configurator+
25482 qserv     20   0 2524848 2.187g   8420 S  0.0  3.7   0:39.50 qdrouterd -c /x/web/LIVE/switch-dr-network/configurator+
25494 qserv     20   0 2524848 2.187g   8420 S  0.0  3.7  19:51.66 qdrouterd -c /x/web/LIVE/switch-dr-network/configurator+
25467 qserv     20   0 2524848 2.187g   8420 S  0.2  3.7  19:51.35 qdrouterd -c /x/web/LIVE/switch-dr-network/configurator+
25494 qserv     20   0 2524848 2.187g   8420 S  0.2  3.7  19:51.67 qdrouterd -c /x/web/LIVE/switch-dr-network/configurator+
14213 vsharda   20   0   11136   1624   1492 S  0.0  0.0   0:00.00 grep qdrouterd
25482 qserv     20   0 2524848 2.187g   8420 S  0.0  3.7   0:39.50 qdrouterd -c /x/web/LIVE/switch-dr-network/configurator+
25493 qserv     20   0 2524848 2.187g   8420 S  0.0  3.7  20:02.11 qdrouterd -c /x/web/LIVE/switch-dr-network/configurator+
25495 qserv     20   0 2524848 2.187g   8420 S  0.0  3.7  19:55.11 qdrouterd -c /x/web/LIVE/switch-dr-network/configurator+
25493 qserv     20   0 2524848 2.187g   8420 S  0.2  3.7  20:02.12 qdrouterd -c /x/web/LIVE/switch-dr-network/configurator+
14213 vsharda   20   0   11136   1624   1492 S  0.0  0.0   0:00.00 grep qdrouterd
25467 qserv     20   0 2524848 2.187g   8420 S  0.0  3.7  19:51.35 qdrouterd -c /x/web/LIVE/switch-dr-network/configurator+
25482 qserv     20   0 2524848 2.187g   8420 S  0.0  3.7   0:39.50 qdrouterd -c /x/web/LIVE/switch-dr-network/configurator+
25494 qserv     20   0 2524848 2.187g   8420 S  0.0  3.7  19:51.67 qdrouterd -c /x/web/LIVE/switch-dr-network/configurator+
25495 qserv     20   0 2524848 2.187g   8420 S  0.0  3.7  19:55.11 qdrouterd -c /x/web/LIVE/switch-dr-network/configurator+
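
For anyone who wants to check the "very low CPU" reading mechanically, here is a minimal sketch in Python. It assumes top's default column order (PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND); the function name parse_top_threads is only illustrative, and the sample is a handful of rows from the output above with the command column shortened.

```python
# A few rows copied from the `top -Hbcd 5` output above (command shortened).
SAMPLE = """\
25467 qserv 20   0 2524848 2.187g   8420 S  0.0  3.7  19:51.34 qdrouterd -c
25482 qserv 20   0 2524848 2.187g   8420 S  0.0  3.7   0:39.50 qdrouterd -c
25493 qserv 20   0 2524848 2.187g   8420 S  0.2  3.7  20:02.10 qdrouterd -c
25494 qserv 20   0 2524848 2.187g   8420 S  0.2  3.7  19:51.66 qdrouterd -c
25495 qserv 20   0 2524848 2.187g   8420 S  0.2  3.7  19:55.11 qdrouterd -c
25467 qserv 20   0 2524848 2.187g   8420 S  0.2  3.7  19:51.35 qdrouterd -c
"""


def parse_top_threads(text):
    """Return {thread_id: highest %CPU seen}, assuming top's default columns."""
    peak = {}
    for line in text.splitlines():
        fields = line.split()
        # Expect at least PID .. COMMAND (12 fields) with a numeric PID first.
        if len(fields) < 12 or not fields[0].isdigit():
            continue
        tid, cpu = int(fields[0]), float(fields[8])
        peak[tid] = max(cpu, peak.get(tid, 0.0))
    return peak


peak_cpu = parse_top_threads(SAMPLE)
for tid, cpu in sorted(peak_cpu.items()):
    print(f"thread {tid}: peak {cpu:.1f}% CPU")
```

Feeding it the full capture instead of the sample only requires joining each wrapped row back onto one line first.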

The memory footprint is increasing steadily.

qdstat failed to get a response within 120 seconds:

vsharda@millennium-qpid-deploy-lnp-2-7131:/$ qdstat -cb 10.25.171.242 -t 120
Timeout: Connection amqp://10.25.171.242:amqp/$management timed out: Opening 
connection




[jira] [Commented] (DISPATCH-337) Huge memory leaks in Qpid Dispatch router

2016-10-07 Thread Vishal Sharda (JIRA)

[ 
https://issues.apache.org/jira/browse/DISPATCH-337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15556181#comment-15556181
 ] 

Vishal Sharda commented on DISPATCH-337:


vsharda@millennium-qpid-deploy-lnp-2-7131:/$ PN_TRACE_FRM=1 qdstat -cb 
10.24.170.251
[0xfbf6f0]:  -> SASL
[0xfbf6f0]:  <- SASL
[0xfbf6f0]:0 <- @sasl-mechanisms(64) 
[sasl-server-mechanisms=@PN_SYMBOL[:ANONYMOUS]]
[0xfbf6f0]:0 -> @sasl-init(65) [mechanism=:ANONYMOUS, 
initial-response=b"anonymous@millennium-qpid-deploy-lnp-2-7131"]
[0xfbf6f0]:0 <- @sasl-outcome(68) [code=0]
[0xfbf6f0]:  -> AMQP
[0xfbf6f0]:0 -> @open(16) [container-id="2034a069-e072-46f3-ac55-3c76fbb692ca", 
hostname="10.24.170.251", channel-max=32767]
[0xfbf6f0]:  <- AMQP
[0xfbf6f0]:0 <- @open(16) [container-id="Router.A.0", max-frame-size=16384, 
channel-max=32767, idle-time-out=8000, offered-capabilities=:"ANONYMOUS-RELAY", 
properties={:product="qpid-dispatch-router", :version="0.6.0"}]
[0xfbf6f0]:0 -> @begin(17) [next-outgoing-id=0, incoming-window=2147483647, 
outgoing-window=2147483647]
[0xfbf6f0]:0 -> @attach(18) 
[name="2034a069-e072-46f3-ac55-3c76fbb692ca-$management", handle=0, role=false, 
snd-settle-mode=2, rcv-settle-mode=0, source=@source(40) [durable=0, timeout=0, 
dynamic=false], target=@target(41) [address="$management", durable=0, 
timeout=0, dynamic=false], initial-delivery-count=0]
[0xfbf6f0]:0 <- @begin(17) [remote-channel=0, next-outgoing-id=0, 
incoming-window=61, outgoing-window=2147483647]
[0xfbf6f0]:0 -> (EMPTY FRAME)
[0xfbf6f0]:0 -> (EMPTY FRAME)
Timeout: Connection amqp://10.24.170.251:amqp/$management timed out: Opening 
link 2034a069-e072-46f3-ac55-3c76fbb692ca-$management
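
One way to read the trace above: in AMQP 1.0 the open, begin and attach performatives are each answered by the same performative from the peer, so listing what the client sent but never got back shows where the handshake stalled. The following Python sketch does that for the AMQP part of the trace only (the SASL exchange is not symmetric, so it is left out). The frame bodies are shortened to "[...]" and the function name unanswered is an illustration, not part of any Qpid tool.

```python
import re

# AMQP frames from the PN_TRACE_FRM output above, wrapped lines rejoined and
# frame bodies shortened to "[...]".
TRACE = """\
[0xfbf6f0]:0 -> @open(16) [...]
[0xfbf6f0]:0 <- @open(16) [...]
[0xfbf6f0]:0 -> @begin(17) [...]
[0xfbf6f0]:0 -> @attach(18) [...]
[0xfbf6f0]:0 <- @begin(17) [...]
[0xfbf6f0]:0 -> (EMPTY FRAME)
[0xfbf6f0]:0 -> (EMPTY FRAME)
"""

FRAME = re.compile(r"^\[0x[0-9a-f]+\]:\d+ (->|<-) @([a-z-]+)\(\d+\)")


def unanswered(trace):
    """Return performatives the client sent that the peer never sent back."""
    sent, received = [], set()
    for line in trace.splitlines():
        match = FRAME.match(line)
        if not match:
            continue  # empty (heartbeat) frames carry no performative
        direction, name = match.groups()
        if direction == "->":
            sent.append(name)
        else:
            received.add(name)
    return [name for name in sent if name not in received]


print(unanswered(TRACE))
```

For this trace the only unanswered performative is the attach, which matches the "Opening link" timeout reported by qdstat.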


pstack finds no symbols:


25467: qdrouterd -c /x/web/LIVE/switch-dr-network/configurator/qdrouterd.conf
(No symbols found)
0x7f4197612d3d:  (2, 4023a0, 7fff7c15271f, 7fff7c15271f, 4023a0, 401a67) + 800085032b50
0x7f41987163b0:  (1, 12f3f60, 30, 31, 7f4184123050, 7f41801207e0) + 527e0
0x10003:  (1267c80, 1278990, 1267ca0, 11d8120, 124dd40, 401810) + ffdddcf0
0x7f410004:  (12bee70, 7f4195605a50, 12c1a90, 12c1b50, 7f4198b4d580, 4) + 90
0x01183850:  (1267c80, 1278990, 1267ca0, 11d8120, 124dd40, 401810) + ffdddcf0
0x7f410004:  (12bee70, 7f4195605a50, 12c1a90, 12c1b50, 7f4198b4d580, 4) + 90
0x01183850:  (1267c80, 1278990, 1267ca0, 11d8120, 124dd40, 401810) + ffdddcf0
0x7f410004:  (12bee70, 7f4195605a50, 12c1a90, 12c1b50, 7f4198b4d580, 4) + 90
0x01183850:  (1267c80, 1278990, 1267ca0, 11d8120, 124dd40, 401810) + ffdddcf0
0x7f410004:  (12bee70, 7f4195605a50, 12c1a90, 12c1b50, 7f4198b4d580, 4) + 90
0x01183850:  (1267c80, 1278990, 1267ca0, 11d8120, 124dd40, 401810) + ffdddcf0


> Huge memory leaks in Qpid Dispatch router
> -
>
> Key: DISPATCH-337
> URL: https://issues.apache.org/jira/browse/DISPATCH-337
> Project: Qpid Dispatch
>  Issue Type: Bug
>Affects Versions: 0.6.0
> Environment: Debian 8.3, Apache Qpid Proton 0.12.2 for drivers and 
> dependencies, Hardware: 2 CPUs, 15 GB RAM, 60 GB HDD on 2 separate machines
>Reporter: Vishal Sharda
>Priority: Critical
> Attachments: LNP-1_Huge_memory.png, LNP-1_Leak_starts.png, 
> LNP-1_not_accepting_connections.png, Memory_usage_first_run_no_SSL.png, 
> Memory_usage_subsequent_run_no_SSL.png, Rapid_perm_memory_increase.png, 
> Subsequent_memory_increase.png, Tim-Router-3-huge-memory-usage.png, 
> Tim_Router_3.png, Tim_Routers_3_and_6_further_leaks.png, config1.conf, 
> config2.conf, val2_receiver.txt, val2_sender.txt
>
>
> Valgrind shows huge memory leaks while running 2 interconnected routers with 
> 2 parallel senders connected to the one router and 2 parallel receivers 
> connected to the other router.
> The CRYPTO leak that is coming from Qpid Proton 0.12.2 is already fixed here:
> https://issues.apache.org/jira/browse/PROTON-1115
> However, the rest of the leaks are from qdrouterd.






[jira] [Commented] (DISPATCH-337) Huge memory leaks in Qpid Dispatch router

2016-10-07 Thread Vishal Sharda (JIRA)

[ 
https://issues.apache.org/jira/browse/DISPATCH-337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15556144#comment-15556144
 ] 

Vishal Sharda commented on DISPATCH-337:


1. The router is reachable: its log shows "Accepting incoming connection ..." 
for one of the client machines.  It is not clear what causes the connection 
failure.

2. Yes, there are a few TCP connections to the bad router:

vsharda@millennium-qpid-deploy-lnp-1-5129:~$ netstat -at | grep 5670
tcp        0      0 *:5670                  *:*                     LISTEN
tcp        0      0 millennium-qpid-de:5670 userstage114828.c:36758 ESTABLISHED
tcp        0      0 localhost:5670          localhost:33894         ESTABLISHED
tcp        0      0 millennium-qpid-de:5670 userstage118169.c:38132 ESTABLISHED
tcp        0      0 millennium-qpid-de:5670 10.22.99.81:40080       ESTABLISHED
tcp        0      0 millennium-qpid-de:5670 10.22.102.215:50594     ESTABLISHED
tcp6       0      0 localhost:33894         localhost:5670          ESTABLISHED

3. Other routers say that the bad router (Router.A.0) does not exist in the 
network:

vsharda@millennium-qpid-deploy-lnp-2-7131:/$ qdstat -nv
Routers in the Network
  router-id   next-hop  link  cost  neighbors                                   valid-origins
  =============================================================================================
  Router.A.1  (self)    -           ['Router.A.2', 'Router.A.3', 'Router.A.4']  []
  Router.A.2  -         1     1     ['Router.A.1', 'Router.A.3', 'Router.A.4']  []
  Router.A.3  -         2     1     ['Router.A.1', 'Router.A.2', 'Router.A.4']  []
  Router.A.4  -         3     1     ['Router.A.1', 'Router.A.2', 'Router.A.3']  []

4. The router's memory usage was 620 MB on 2016-10-02 13:28:30.  On 2016-10-07 
13:16:00, it is 2.127 GB.

5. The bad router cannot be queried from the good routers:

vsharda@millennium-qpid-deploy-lnp-2-7131:/$ qdstat -b 10.24.170.251 -c
Timeout: Connection amqp://10.24.170.251:amqp/$management timed out: Opening 
link 9809b66a-1f2d-4952-92f1-c6c5c8b35680-$management
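
The two readings in item 4 give a rough average growth rate. A small worked calculation in Python follows; treating 1 GB as 1024 MB is an assumption about how the figure was reported, so the result is only indicative.

```python
from datetime import datetime

# Readings from item 4 above. 1 GB = 1024 MB is an assumption about the
# units of the reported figure.
first_time, first_mb = datetime(2016, 10, 2, 13, 28, 30), 620.0
last_time, last_mb = datetime(2016, 10, 7, 13, 16, 0), 2.127 * 1024

elapsed_hours = (last_time - first_time).total_seconds() / 3600
growth_mb_per_hour = (last_mb - first_mb) / elapsed_hours

print(f"elapsed: {elapsed_hours:.1f} h")
print(f"average growth: {growth_mb_per_hour:.1f} MB/h")
```

Under that assumption the average works out to roughly 13 MB per hour over just under five days. It says nothing about whether the growth was steady or happened in bursts.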




[jira] [Comment Edited] (DISPATCH-337) Huge memory leaks in Qpid Dispatch router

2016-10-03 Thread Vishal Sharda (JIRA)

[ 
https://issues.apache.org/jira/browse/DISPATCH-337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15543002#comment-15543002
 ] 

Vishal Sharda edited comment on DISPATCH-337 at 10/3/16 9:34 PM:
-

One router on a machine (LNP-1) in a cluster of 5 completely connected routers 
suddenly started growing in memory and stopped accepting incoming connections.  
We cannot even run qdstat on that router to know its status.

This is happening on one machine after we cherry-picked the fixes of 
DISPATCH-491 and DISPATCH-505 on all 5 machines in this cluster.


was (Author: vsharda):
One router on a machine (LNP-1) in a cluster of 5 completely connected routers 
suddenly started growing in memory and stopped accepting incoming connections.  
We cannot even run qdstat on that router to know its status.




[jira] [Updated] (DISPATCH-337) Huge memory leaks in Qpid Dispatch router

2016-10-03 Thread Vishal Sharda (JIRA)

 [ 
https://issues.apache.org/jira/browse/DISPATCH-337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vishal Sharda updated DISPATCH-337:
---
Attachment: LNP-1_not_accepting_connections.png
LNP-1_Leak_starts.png
LNP-1_Huge_memory.png

One router on a machine (LNP-1) in a cluster of 5 completely connected routers 
suddenly started growing in memory and stopped accepting incoming connections.  
We cannot even run qdstat on that router to know its status.




[jira] [Commented] (DISPATCH-337) Huge memory leaks in Qpid Dispatch router

2016-09-22 Thread Vishal Sharda (JIRA)

[ 
https://issues.apache.org/jira/browse/DISPATCH-337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15514526#comment-15514526
 ] 

Vishal Sharda commented on DISPATCH-337:


Hi Ted,

The testing continues and the router's memory has now grown to 167 MB.  There 
were some undelivered messages, but we killed all the clients.  Router memory 
stayed at 167 MB.

 3124 tim   20   0  288432 167348   6088 S   0.7  2.1 114:07.90 qdrouterd   
   

Here are the various qdstat outputs now.

tim@tkuchlein3-linux:~$ qdstat -c
Connections
  Id    host                  container                             role          dir  security                                authentication
  ==============================================================================================================================================
  2     10.244.162.114:58234  Router.A.4                            inter-router  in   TLSv1/SSLv3(DHE-RSA-AES256-GCM-SHA384)  anonymous-user
  3     10.244.162.117:52588  Router.A.5                            inter-router  in   TLSv1/SSLv3(DHE-RSA-AES256-GCM-SHA384)  anonymous-user
  4     10.244.162.176:48724  Router.A.6                            inter-router  in   TLSv1/SSLv3(DHE-RSA-AES256-GCM-SHA384)  anonymous-user
  5     10.244.162.231:38920  Router.A.7                            inter-router  in   TLSv1/SSLv3(DHE-RSA-AES256-GCM-SHA384)  anonymous-user
  8360  127.0.0.1:57104       69110aed-8f13-4968-8c84-cad3c734a859  normal        in   no-security                             anonymous-user


tim@tkuchlein3-linux:~$ qdstat -m
Types
  type                     size   batch  thread-max  total    in-threads  rebal-in   rebal-out
  ==========================================================================================
  qd_bitmask_t             24     64     128         640      128         56,559     56,567
  qd_buffer_t              536    16     32          120,672  118,144     1,799,008  1,799,166
  qd_composed_field_t      64     64     128         64       64          0          0
  qd_composite_t           112    64     128         128      128         0          0
  qd_connection_t          224    64     128         1,024    128         95         109
  qd_deferred_call_t       32     64     128         64       64          0          0
  qd_field_iterator_t      128    64     128         16,640   13,184      196,986    197,040
  qd_hash_handle_t         16     64     128         192      192         0          0
  qd_hash_item_t           32     64     128         192      192         0          0
  qd_hash_segment_t        24     64     128         64       64          0          0
  qd_link_t                48     64     128         1,088    128         101        116
  qd_listener_t            32     64     128         64       64          0          0
  qd_log_entry_t           2,104  16     32          1,008    1,008       0          0
  qd_management_context_t  56     64     128         64       64          0          0
  qd_message_content_t     640    16     32          26,928   26,560      104,471    104,494
  qd_message_t             128    64     128         27,328   26,624      50,618     50,629
  qd_node_t                56     64     128         64       64          0          0
  qd_parsed_field_t        80     64     128         7,936    4,928       107,049    107,096
  qd_timer_t               56     64     128         128      128         0          0
  qd_work_item_t           24     64     128         1,024    128         3,059      3,073
  qdpn_connector_t         600    16     32          1,024    32          429        491
  qdpn_listener_t          48     64     128         64       64          0          0
  qdr_action_t             160    64     128         320      128         501,060    501,063
  qdr_address_config_t     56     64     128         64       64          0          0
  qdr_address_t            264    16     32          64       32          0          2
  qdr_connection_t         216    64     128         1,152    128         112        128
  qdr_connection_work_t    56     64     128         192      128         129        130
  qdr_delivery_ref_t       24     64     128         448      128         115,473    115,478
  qdr_delivery_t           144    64     128         26,560   25,920      5,327      5,337
  qdr_error_t              24     64     128         192      128         129        130
  qdr_field_t              40     64     128         8,448    8,448       9,175      9,175
  qdr_general_work_t       64     64     128         192      128         55,495     55,496
  qdr_link_ref_t           24     64     128         2,176    128         271,818    271,850
  qdr_link_t               264    16     32          1,072    48          460        524
  qdr_node_t               80     64     128         64       64
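
To see which pools grew between the earlier `qdstat -m` snapshot (the comment reporting nearly 160 MB) and this one (167 MB), the "total" columns can be subtracted. The Python sketch below does that for the six pools whose totals changed most. It assumes "size" is bytes per object and "total" is the number of objects allocated, which is my reading of the column names rather than something stated in this thread.

```python
# type: (size, total in the ~160 MB snapshot, total in the 167 MB snapshot)
# Column meanings (bytes per object, objects allocated) are an assumption.
SNAPSHOTS = {
    "qd_buffer_t": (536, 118_624, 120_672),
    "qd_message_content_t": (640, 26_512, 26_928),
    "qd_field_iterator_t": (128, 16_192, 16_640),
    "qdr_delivery_t": (144, 26_304, 26_560),
    "qd_message_t": (128, 27_072, 27_328),
    "qd_parsed_field_t": (80, 7_680, 7_936),
}


def growth(snapshots):
    """Return {type: (extra objects, extra bytes)} between the two snapshots."""
    result = {}
    for name, (size, before, after) in snapshots.items():
        extra = after - before
        result[name] = (extra, extra * size)
    return result


deltas = growth(SNAPSHOTS)
for name, (objects, nbytes) in sorted(deltas.items(), key=lambda kv: -kv[1][1]):
    print(f"{name:<22} +{objects:>5} objects  +{nbytes:>9,} bytes")
print(f"sum: {sum(b for _, b in deltas.values()):,} bytes")
```

Under those assumptions qd_buffer_t accounts for most of the pool growth between the two snapshots.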

[jira] [Commented] (DISPATCH-337) Huge memory leaks in Qpid Dispatch router

2016-09-22 Thread Vishal Sharda (JIRA)

[ 
https://issues.apache.org/jira/browse/DISPATCH-337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15514271#comment-15514271
 ] 

Vishal Sharda commented on DISPATCH-337:


Hi Ted,

Here is the output.  This router has grown to nearly 160 MB in memory.

tim@tkuchlein3-linux:~$ qdstat -m
Types
  type                     size   batch  thread-max  total    in-threads  rebal-in   rebal-out
  ==========================================================================================
  qd_bitmask_t             24     64     128         640      192         55,913     55,920
  qd_buffer_t              536    16     32          118,624  115,936     1,658,161  1,658,329
  qd_composed_field_t      64     64     128         64       64          0          0
  qd_composite_t           112    64     128         128      128         0          0
  qd_connection_t          224    64     128         1,024    128         95         109
  qd_deferred_call_t       32     64     128         64       64          0          0
  qd_field_iterator_t      128    64     128         16,192   13,184      169,723    169,770
  qd_hash_handle_t         16     64     128         128      128         0          0
  qd_hash_item_t           32     64     128         128      128         0          0
  qd_hash_segment_t        24     64     128         64       64          0          0
  qd_link_t                48     64     128         1,088    128         101        116
  qd_listener_t            32     64     128         64       64          0          0
  qd_log_entry_t           2,104  16     32          1,008    1,008       0          0
  qd_management_context_t  56     64     128         64       64          0          0
  qd_message_content_t     640    16     32          26,512   26,032      102,426    102,456
  qd_message_t             128    64     128         27,072   26,112      49,998     50,013
  qd_node_t                56     64     128         64       64          0          0
  qd_parsed_field_t        80     64     128         7,680    4,928       88,639     88,682
  qd_timer_t               56     64     128         128      128         0          0
  qd_work_item_t           24     64     128         1,024    128         3,059      3,073
  qdpn_connector_t         600    16     32          1,024    48          428        489
  qdpn_listener_t          48     64     128         64       64          0          0
  qdr_action_t             160    64     128         320      128         469,489    469,492
  qdr_address_config_t     56     64     128         64       64          0          0
  qdr_address_t            264    16     32          64       64          0          0
  qdr_connection_t         216    64     128         1,152    192         112        127
  qdr_connection_work_t    56     64     128         192      128         128        129
  qdr_delivery_ref_t       24     64     128         448      128         102,729    102,734
  qdr_delivery_t           144    64     128         26,304   25,344      5,184      5,199
  qdr_error_t              24     64     128         192      128         127        128
  qdr_field_t              40     64     128         8,448    8,384       9,077      9,078
  qdr_general_work_t       64     64     128         192      128         54,906     54,907
  qdr_link_ref_t           24     64     128         2,176    256         259,099    259,129
  qdr_link_t               264    16     32          1,072    96          457        518
  qdr_node_t               80     64     128         64       64          0          0
  qdr_query_t              336    16     32          16       16          0          0
  qdr_terminus_t           64     64     128         64       64          0          0
  qdtm_router_t            16     64     128         64       64          0          0



[jira] [Commented] (DISPATCH-337) Huge memory leaks in Qpid Dispatch router

2016-09-22 Thread Vishal Sharda (JIRA)

[ 
https://issues.apache.org/jira/browse/DISPATCH-337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15514009#comment-15514009
 ] 

Vishal Sharda commented on DISPATCH-337:


Hi Ted,

The clients connected to the non-SSL port (5672).

Here are the openssl versions used while building:

tim@tkuchlein3-linux:~$ sudo dpkg -l | grep openssl
ii  libgnutls-openssl27:amd64  3.4.10-4ubuntu1.1  amd64  GNU TLS library - OpenSSL wrapper
ii  openssl                    1.0.2g-1ubuntu4.2  amd64  Secure Sockets Layer toolkit - cryptographic utility





[jira] [Updated] (DISPATCH-337) Huge memory leaks in Qpid Dispatch router

2016-09-22 Thread Vishal Sharda (JIRA)

 [ 
https://issues.apache.org/jira/browse/DISPATCH-337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vishal Sharda updated DISPATCH-337:
---
Attachment: Tim-Router-3-huge-memory-usage.png
Tim_Routers_3_and_6_further_leaks.png
Tim_Router_3.png

We are running a network of 5 routers (0.6.1 and Proton 0.14.0) on 5 bare metal 
machines running Ubuntu 16.04.1 LTS.  These charts show that the routers leak 
memory every time clients connect.  There are no undelivered or pending 
messages.




[jira] [Created] (DISPATCH-470) Router terminated by itself while sitting idle

2016-08-02 Thread Vishal Sharda (JIRA)
Vishal Sharda created DISPATCH-470:
--

 Summary: Router terminated by itself while sitting idle
 Key: DISPATCH-470
 URL: https://issues.apache.org/jira/browse/DISPATCH-470
 Project: Qpid Dispatch
  Issue Type: Bug
  Components: Routing Engine
Affects Versions: 0.6.1
 Environment: Debian 8.3, Apache Qpid Proton 0.13.0 for drivers and 
dependencies, Hardware: 8 CPUs, 61 GB RAM, 30 GB HDD each on 5 separate machines
Reporter: Vishal Sharda


We are running a network of 5 interconnected routers, each on a separate host.  
One of the routers in this network terminated after running successfully for 
more than 7 days.  The router was idle (not sending or receiving any messages) 
when this happened.  The core dump shows that the abort came from a failed 
assertion ("result == 0") in sys_mutex_lock, called on the router core thread.

Here is the full bt:

Reading symbols from qpid-dispatch/qdrouterd...(no debugging symbols found)...done.
[New LWP 4082]
[New LWP 4088]
[New LWP 4030]
[New LWP 4089]
[New LWP 4090]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `qdrouterd -c /x/web/LIVE/switch-dr-network/configurator/qdrouterd.conf'.
Program terminated with signal SIGABRT, Aborted.
#0  0x7ffaaf7e6067 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56      ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) up
#1  0x7ffaaf7e7448 in __GI_abort () at abort.c:89
89      abort.c: No such file or directory.
(gdb) up
#2  0x7ffaaf7df266 in __assert_fail_base (
    fmt=0x7ffaaf918238 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", 
    assertion=assertion@entry=0x7ffab0997baf "result == 0", 
    file=file@entry=0x7ffab0997b48 "/home/anhtran/qpid-package/dispatch/qpid-dispatch-0.6.x-Vanilla/src/posix/threading.c",
    line=line@entry=71, function=function@entry=0x7ffab0997cd1 "sys_mutex_lock")
    at assert.c:92
92      assert.c: No such file or directory.
(gdb) up
#3  0x7ffaaf7df312 in __GI___assert_fail (assertion=0x7ffab0997baf "result == 0", 
    file=0x7ffab0997b48 "/home/anhtran/qpid-package/dispatch/qpid-dispatch-0.6.x-Vanilla/src/posix/threading.c",
    line=71, function=0x7ffab0997cd1 "sys_mutex_lock") at assert.c:101
101     in assert.c
(gdb) 
#4  0x7ffab09802eb in sys_mutex_lock () from /usr/local/lib/qpid-dispatch/libqpid-dispatch.so
(gdb) 
#5  0x7ffab098863a in qdr_forward_deliver_CT () from /usr/local/lib/qpid-dispatch/libqpid-dispatch.so
(gdb) 
#6  0x7ffab0989274 in qdr_forward_closest_CT () from /usr/local/lib/qpid-dispatch/libqpid-dispatch.so
(gdb) 
#7  0x7ffab098dd98 in ?? () from /usr/local/lib/qpid-dispatch/libqpid-dispatch.so
(gdb) 
#8  0x7ffab098b45a in router_core_thread () from /usr/local/lib/qpid-dispatch/libqpid-dispatch.so
(gdb) 
#9  0x7ffab04f50a4 in start_thread (arg=0x7ffaad524700) at pthread_create.c:309
309     pthread_create.c: No such file or directory.
(gdb) 
#10 0x7ffaaf89987d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
111     ../sysdeps/unix/sysv/linux/x86_64/clone.S: No such file or directory.
(gdb) 
Initial frame selected; you cannot go up.
(gdb) 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (DISPATCH-383) Intermittent router crashes when restarting one router in the network with different number of threads

2016-06-13 Thread Vishal Sharda (JIRA)

[ 
https://issues.apache.org/jira/browse/DISPATCH-383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15327955#comment-15327955
 ] 

Vishal Sharda commented on DISPATCH-383:


These are intermittent crashes and I do not yet have a test case that can 
reliably reproduce them.

1. When I restarted R1 with a different number of threads, both R2 and R3 
crashed with the same backtrace, which is attached here.  On a later run, I saw 
the crash only in R2.

2. Yes, this could most likely be a timing issue with multithreading on.  There 
is no way for us to control or prevent this from occurring again.  The steps 
involved were simple: interrupting the router, editing the configuration file 
and starting it again.

3. I have not tested this without SSL, but the intermittent crashes that I was 
seeing due to SASL (DISPATCH-358) no longer appear after upgrading to 
Proton-0.13.0-RC.  Hence, I keep 2-way SSL enabled for all inter-router 
communication during my tests.


> Intermittent router crashes when restarting one router in the network with 
> different number of threads
> --
>
> Key: DISPATCH-383
> URL: https://issues.apache.org/jira/browse/DISPATCH-383
> Project: Qpid Dispatch
>  Issue Type: Bug
>  Components: Routing Engine
>Affects Versions: 0.6.0
> Environment: Debian 8.3, Apache Qpid Proton 0.13.0-RC for drivers and 
> dependencies, Hardware: 2 CPUs, 15 GB RAM, 60 GB HDD each on 3 separate 
> machines
>Reporter: Vishal Sharda
>Assignee: Ganesh Murthy
>Priority: Critical
> Attachments: Crash_route_tables_1.png, Crash_route_tables_2.png, 
> Crash_route_tables_3.png
>
>
> Network: A network of 3 interior routers built using the latest trunk and 
> connected to each other using 2-way SSL.
> Stopping one router in the network, changing its number of threads in the 
> configuration file and starting it again to join the network causes 
> intermittent crash in other routers in the network.
> I was able to reproduce the crash three times and collect the backtraces 
> inside gdb (screenshots attached).






[jira] [Updated] (DISPATCH-383) Intermittent router crashes when restarting one router in the network with different number of threads

2016-06-12 Thread Vishal Sharda (JIRA)

 [ 
https://issues.apache.org/jira/browse/DISPATCH-383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vishal Sharda updated DISPATCH-383:
---
Attachment: Crash_route_tables_3.png
Crash_route_tables_2.png
Crash_route_tables_1.png

Screenshots showing the crash and the backtrace in the routers when following 
the steps described.

> Intermittent router crashes when restarting one router in the network with 
> different number of threads
> --
>
> Key: DISPATCH-383
> URL: https://issues.apache.org/jira/browse/DISPATCH-383
> Project: Qpid Dispatch
>  Issue Type: Bug
>  Components: Routing Engine
>Affects Versions: 0.6.0
> Environment: Debian 8.3, Apache Qpid Proton 0.13.0-RC for drivers and 
> dependencies, Hardware: 2 CPUs, 15 GB RAM, 60 GB HDD each on 3 separate 
> machines
>Reporter: Vishal Sharda
>Priority: Critical
> Attachments: Crash_route_tables_1.png, Crash_route_tables_2.png, 
> Crash_route_tables_3.png
>
>
> Network: A network of 3 interior routers built using the latest trunk and 
> connected to each other using 2-way SSL.
> Stopping one router in the network, changing its number of threads in the 
> configuration file and starting it again to join the network causes 
> intermittent crash in other routers in the network.
> I was able to reproduce the crash three times and collect the backtraces 
> inside gdb (screenshots attached).






[jira] [Created] (DISPATCH-383) Intermittent router crashes when restarting one router in the network with different number of threads

2016-06-12 Thread Vishal Sharda (JIRA)
Vishal Sharda created DISPATCH-383:
--

 Summary: Intermittent router crashes when restarting one router in 
the network with different number of threads
 Key: DISPATCH-383
 URL: https://issues.apache.org/jira/browse/DISPATCH-383
 Project: Qpid Dispatch
  Issue Type: Bug
  Components: Routing Engine
Affects Versions: 0.6.0
 Environment: Debian 8.3, Apache Qpid Proton 0.13.0-RC for drivers and 
dependencies, Hardware: 2 CPUs, 15 GB RAM, 60 GB HDD each on 3 separate machines
Reporter: Vishal Sharda
Priority: Critical


Network: A network of 3 interior routers built using the latest trunk and 
connected to each other using 2-way SSL.

Stopping one router in the network, changing its number of threads in the 
configuration file and starting it again to join the network causes 
intermittent crash in other routers in the network.

I was able to reproduce the crash three times and collect the backtraces inside 
gdb (screenshots attached).








[jira] [Updated] (DISPATCH-382) Intermittent router crash when starting 50 receivers/0 senders and doing qdstat

2016-06-12 Thread Vishal Sharda (JIRA)

 [ 
https://issues.apache.org/jira/browse/DISPATCH-382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vishal Sharda updated DISPATCH-382:
---
Attachment: val_crash_2.txt
val_crash_1.txt
Crash_in_Valgrind_3.txt
Crash_in_Valgrind_3.png
Crash_in_Valgrind_2.png
Crash_in_Valgrind_1.png

Attached 3 screenshots showing the crash and 3 output files from Valgrind for 
the corresponding runs.

Here is the information about the thread that led to SIGABRT, as reported by 
Valgrind:

==18841== Thread 2:
==18841== Invalid read of size 4
==18841==    at 0x52F7274: pthread_mutex_lock (pthread_mutex_lock.c:66)
==18841==    by 0x4E648E7: sys_mutex_lock (threading.c:70)
==18841==    by 0x4E70EDC: qdr_forward_deliver_CT (forwarder.c:132)
==18841==    by 0x4E71D4A: qdr_forward_closest_CT (forwarder.c:405)
==18841==    by 0x4E72A96: qdr_forward_message_CT (forwarder.c:707)
==18841==    by 0x4E7C0B9: qdr_send_to_CT (transfer.c:581)
==18841==    by 0x4E76623: router_core_thread (router_core_thread.c:71)
==18841==    by 0x52F50A3: start_thread (pthread_create.c:309)
==18841==    by 0x5F8387C: clone (clone.S:111)
==18841==  Address 0xdc42ee0 is 16 bytes inside a block of size 48 free'd
==18841==    at 0x4C28D29: free (vg_replace_malloc.c:530)
==18841==    by 0x4E648CD: sys_mutex_free (threading.c:64)
==18841==    by 0x4E6FBAF: qdr_connection_closed_CT (connections.c:972)
==18841==    by 0x4E76623: router_core_thread (router_core_thread.c:71)
==18841==    by 0x52F50A3: start_thread (pthread_create.c:309)
==18841==    by 0x5F8387C: clone (clone.S:111)
==18841==  Block was alloc'd at
==18841==    at 0x4C27C0F: malloc (vg_replace_malloc.c:299)
==18841==    by 0x4E64859: sys_mutex (threading.c:51)
==18841==    by 0x4E6D111: qdr_connection_opened (connections.c:85)
==18841==    by 0x4E7D7CA: AMQP_opened_handler (router_node.c:560)
==18841==    by 0x4E7D837: AMQP_inbound_opened_handler (router_node.c:572)
==18841==    by 0x4E5397D: notify_opened (container.c:261)
==18841==    by 0x4E53A0D: policy_notify_opened (container.c:275)
==18841==    by 0x4E61B3A: qd_policy_amqp_open (policy.c:744)
==18841==    by 0x4E81BC1: invoke_deferred_calls (server.c:720)
==18841==    by 0x4E81CE7: process_connector (server.c:766)
==18841==    by 0x4E827C0: thread_run (server.c:1024)
==18841==    by 0x52F50A3: start_thread (pthread_create.c:309)
==18841== 
==18841== Invalid read of size 4
==18841==    at 0x52F2A03: __pthread_mutex_lock_full (pthread_mutex_lock.c:177)
==18841==    by 0x4E648E7: sys_mutex_lock (threading.c:70)
==18841==    by 0x4E70EDC: qdr_forward_deliver_CT (forwarder.c:132)
==18841==    by 0x4E71D4A: qdr_forward_closest_CT (forwarder.c:405)
==18841==    by 0x4E72A96: qdr_forward_message_CT (forwarder.c:707)
==18841==    by 0x4E7C0B9: qdr_send_to_CT (transfer.c:581)
==18841==    by 0x4E76623: router_core_thread (router_core_thread.c:71)
==18841==    by 0x52F50A3: start_thread (pthread_create.c:309)
==18841==    by 0x5F8387C: clone (clone.S:111)
==18841==  Address 0xdc42ee0 is 16 bytes inside a block of size 48 free'd
==18841==    at 0x4C28D29: free (vg_replace_malloc.c:530)
==18841==    by 0x4E648CD: sys_mutex_free (threading.c:64)
==18841==    by 0x4E6FBAF: qdr_connection_closed_CT (connections.c:972)
==18841==    by 0x4E76623: router_core_thread (router_core_thread.c:71)
==18841==    by 0x52F50A3: start_thread (pthread_create.c:309)
==18841==    by 0x5F8387C: clone (clone.S:111)
==18841==  Block was alloc'd at
==18841==    at 0x4C27C0F: malloc (vg_replace_malloc.c:299)
==18841==    by 0x4E64859: sys_mutex (threading.c:51)
==18841==    by 0x4E6D111: qdr_connection_opened (connections.c:85)
==18841==    by 0x4E7D7CA: AMQP_opened_handler (router_node.c:560)
==18841==    by 0x4E7D837: AMQP_inbound_opened_handler (router_node.c:572)
==18841==    by 0x4E5397D: notify_opened (container.c:261)
==18841==    by 0x4E53A0D: policy_notify_opened (container.c:275)
==18841==    by 0x4E61B3A: qd_policy_amqp_open (policy.c:744)
==18841==    by 0x4E81BC1: invoke_deferred_calls (server.c:720)
==18841==    by 0x4E81CE7: process_connector (server.c:766)
==18841==    by 0x4E827C0: thread_run (server.c:1024)
==18841==    by 0x52F50A3: start_thread (pthread_create.c:309)
==18841== 
==18841== 
==18841== Process terminating with default action of signal 6 (SIGABRT)
==18841==    at 0x5ED0067: raise (raise.c:56)
==18841==    by 0x5ED1447: abort (abort.c:89)
==18841==    by 0x5EC9265: __assert_fail_base (assert.c:92)
==18841==    by 0x5EC9311: __assert_fail (assert.c:101)
==18841==    by 0x4E6490F: sys_mutex_lock (threading.c:71)
==18841==    by 0x4E70EDC: qdr_forward_deliver_CT (forwarder.c:132)
==18841==    by 0x4E71D4A: qdr_forward_closest_CT (forwarder.c:405)
==18841==    by 0x4E72A96: qdr_forward_message_CT (forwarder.c:707)
==18841==    by 0x4E7C0B9: qdr_send_to_CT (transfer.c:581)
==18841==    by 0x4E76623: router_core_thread 
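
Each "Invalid read" entry in the Valgrind report above carries three stacks: where the memory was accessed, where it was freed, and where it was allocated. The following is a minimal Python sketch, not part of the original report, that splits one such entry into those three stacks and prints the first frame in each that is not a locking or allocation wrapper. The helper names (`split_report`, `first_caller`), the `WRAPPERS` set and the abridged `SAMPLE` text are assumptions made for illustration, and the sketch handles a single entry only.

```python
import re

# Matches Valgrind frame lines such as
#   ==18841==    by 0x4E648E7: sys_mutex_lock (threading.c:70)
# Whitespace after the PID marker is optional so that archive copies with
# collapsed indentation still parse.
FRAME = re.compile(r"^==\d+==\s*(?:at|by) 0x[0-9A-Fa-f]+: (\S+) \(([^)]*)\)")

# Hypothetical list of wrapper functions to skip when looking for the caller
# of interest; chosen by hand from the report for this illustration.
WRAPPERS = {"pthread_mutex_lock", "sys_mutex_lock",
            "free", "sys_mutex_free", "malloc", "sys_mutex"}


def split_report(text):
    """Split one 'Invalid read' entry into access, freed and allocated stacks."""
    stacks = {"access": [], "freed": [], "allocated": []}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if "Invalid read" in line or "Invalid write" in line:
            current = "access"
        elif "free'd" in line:
            current = "freed"
        elif "alloc'd" in line:
            current = "allocated"
        else:
            match = FRAME.match(line)
            if match and current:
                stacks[current].append((match.group(1), match.group(2)))
    return stacks


def first_caller(frames):
    """Return the first (function, location) frame that is not a wrapper."""
    for function, location in frames:
        if function not in WRAPPERS:
            return function, location
    return None


# Abridged copy of the first entry in the report.
SAMPLE = """\
==18841== Invalid read of size 4
==18841==    at 0x52F7274: pthread_mutex_lock (pthread_mutex_lock.c:66)
==18841==    by 0x4E648E7: sys_mutex_lock (threading.c:70)
==18841==    by 0x4E70EDC: qdr_forward_deliver_CT (forwarder.c:132)
==18841==  Address 0xdc42ee0 is 16 bytes inside a block of size 48 free'd
==18841==    at 0x4C28D29: free (vg_replace_malloc.c:530)
==18841==    by 0x4E648CD: sys_mutex_free (threading.c:64)
==18841==    by 0x4E6FBAF: qdr_connection_closed_CT (connections.c:972)
==18841==  Block was alloc'd at
==18841==    at 0x4C27C0F: malloc (vg_replace_malloc.c:299)
==18841==    by 0x4E64859: sys_mutex (threading.c:51)
==18841==    by 0x4E6D111: qdr_connection_opened (connections.c:85)
"""

stacks = split_report(SAMPLE)
for name in ("access", "freed", "allocated"):
    print(name, first_caller(stacks[name]))
```

Read this way, the report says that a mutex created in qdr_connection_opened was freed by qdr_connection_closed_CT and later locked from qdr_forward_deliver_CT, which matches the failed assertion in sys_mutex_lock.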

[jira] [Created] (DISPATCH-382) Intermittent router crash when starting 50 receivers/0 senders and doing qdstat

2016-06-12 Thread Vishal Sharda (JIRA)
Vishal Sharda created DISPATCH-382:
--

 Summary: Intermittent router crash when starting 50 receivers/0 
senders and doing qdstat
 Key: DISPATCH-382
 URL: https://issues.apache.org/jira/browse/DISPATCH-382
 Project: Qpid Dispatch
  Issue Type: Bug
  Components: Routing Engine
Affects Versions: 0.6.0
 Environment: Debian 8.3, Apache Qpid Proton 0.13.0-RC for drivers and 
dependencies, Hardware: 2 CPUs, 15 GB RAM, 60 GB HDD each on 3 separate machines
Reporter: Vishal Sharda
Priority: Blocker


Network: A network of 3 interior routers built from trunk and connected to each 
other using 2-way SSL.

We ran a Proton-J Reactor API based client to start 50 receivers and 0 senders 
on one of the above 3 routers.  After that we ran "qdstat -c".  This leads to 
an intermittent crash in the router.  The crash could not be reproduced while 
running the routers independently or inside gdb.  When we run the routers 
inside Valgrind, the crash is frequent.  I was able to reproduce it 3 times 
using Valgrind (screenshots and Valgrind output files are attached).

In our instrumented build, this intermittent crash occurs on every run.






[jira] [Updated] (DISPATCH-380) Router stops receiving messages from multiple senders publishing to multiple queues in parallel

2016-06-11 Thread Vishal Sharda (JIRA)

 [ 
https://issues.apache.org/jira/browse/DISPATCH-380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vishal Sharda updated DISPATCH-380:
---
Attachment: Single_router_testing_results.pdf

Attached a file Single_router_testing_results.pdf containing several results of 
testing a single router.

> Router stops receiving messages from multiple senders publishing to multiple 
> queues in parallel
> ---
>
> Key: DISPATCH-380
> URL: https://issues.apache.org/jira/browse/DISPATCH-380
> Project: Qpid Dispatch
>  Issue Type: Bug
>  Components: Routing Engine
> Environment: Debian 8.3, Apache Qpid Proton 0.13.0-RC for drivers and 
> dependencies, Hardware: 2 CPUs, 15 GB RAM, 60 GB HDD each on 3 separate 
> machines
>Reporter: Vishal Sharda
>Priority: Critical
> Fix For: 0.6.0
>
> Attachments: AWS_hung_at_round_figures.png, 
> AWS_hung_for_4_seconds.png, AWS_uneven_start.png, Senders_1.png, 
> Senders_2.png, Senders_3_10_Queues.png, Senders_4_10_queues.png, 
> Single_router_testing_results.pdf, qdstat_wrong_output.png
>
>
> I am running a Java Client against a cluster of 3 interior routers connected 
> to each other.  2-way SSL is enabled for all the connections.
> There were 20 simultaneous queues with 20 senders on each queue and each 
> sender publishing 1000 messages.  All the senders were connected to Router 1. 
>  20 receivers were connected to Router 2 with 1 receiver receiving from each 
> queue.
> In the first run, router stopped receiving incoming messages after delivering 
> 386,339 out of 400K "Hello World!" messages.
> In the second run, 388,781 messages out of 400K were delivered.
> I reduced the number of queues to 10 (halving total number of messages to 
> 200K) and the issue occurred again.
> I ran the Java client on an 8 CPU machine again with 10 queues and the issue 
> occurred again after delivering just 54K out of 200K messages.
> All the senders were hung (still connected) with no messages flowing at all.
> Connection information from qdstat:
> When the messages are flowing properly and I run "qdstat -c", I see all the 
> senders as secure and authenticated.
> After they hang and I run "qdstat -c", it erroneously shows all the clients 
> as insecure and unauthenticated.
> Shortly after the clients hang, all the queues are deleted from the router 
> network but connections are still shown until I terminate the clients.
> I saw this erroneous situation before also when "qdstat -c" showed some 
> senders as secure and authentic but some as insecure/unauthentic.






[jira] [Comment Edited] (DISPATCH-380) Router stops receiving messages from multiple senders publishing to multiple queues in parallel

2016-06-11 Thread Vishal Sharda (JIRA)

[ 
https://issues.apache.org/jira/browse/DISPATCH-380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15325898#comment-15325898
 ] 

Vishal Sharda edited comment on DISPATCH-380 at 6/11/16 3:14 PM:
-

I ran a single instance of Qpid Dispatch Router built on an AWS instance 
running Ubuntu.  This was an interior router with no connections to other 
routers. 

I used Proton-J Messenger API based Java client that uses 2-way SSL.

I started 10 receivers first with each receiver listening on a different 
endpoint (total 10 endpoints created).

Then I started 20 senders on each endpoint (total 200 senders) each publishing 
1000 “Hello World!” messages.

A total of 200K messages were to be delivered, but all the senders hung after 
only 70K messages were delivered.  In a second, identical run, they hung after 
delivery of 67K messages.

The attached screenshots show the following:

1.) AWS_uneven_start.png: Uneven start on different endpoints.
2.) AWS_hung_at_round_figures.png: Just before my senders hung, the number of 
messages sent/received on each endpoint became a round figure.
3.) AWS_hung_for_4_seconds.png: After 4 seconds, no more messages went through 
and all the counts remained intact.

Finally, all the endpoints were deleted from the router, but "qdstat -c" still 
showed connections from my senders, wrongly marked as insecure and 
unauthenticated.



was (Author: vsharda):
I ran single instance of ADR already built on an AWS running.  This was an 
interior router with no connections to other routers. 

I used Proton-J Messenger API based Java client that uses 2-way SSL.

I started 10 receivers first with each receiver listening on a different 
endpoint (total 10 endpoints created).

Then I started 20 senders on each endpoint (total 200 senders) each publishing 
1000 “Hello World!” messages.

Total 200K messages were to be delivered but I saw that all the senders hung 
after only 70K messages were delivered.  In the second exact run, they hung 
after delivery of 67K messages.

The attached screenshots show the following:

1.) AWS_uneven_start.png: Uneven start on different endpoints.
2.) AWS_hung_at_round_figures.png: Just before my senders hung, the number of 
messages sent/received on each endpoint became a round figure.
3.) AWS_hung_for_4_seconds.png: After 4 seconds, no more messages went through 
and all the counts remained intact.

Finally, all the endpoints were deleted from the router but qdstat –c still 
showed connections from my senders but wrongly as insecure and inauthentic.


> Router stops receiving messages from multiple senders publishing to multiple 
> queues in parallel
> ---
>
> Key: DISPATCH-380
> URL: https://issues.apache.org/jira/browse/DISPATCH-380
> Project: Qpid Dispatch
>  Issue Type: Bug
>  Components: Routing Engine
> Environment: Debian 8.3, Apache Qpid Proton 0.13.0-RC for drivers and 
> dependencies, Hardware: 2 CPUs, 15 GB RAM, 60 GB HDD each on 3 separate 
> machines
>Reporter: Vishal Sharda
>Priority: Critical
> Fix For: 0.6.0
>
> Attachments: AWS_hung_at_round_figures.png, 
> AWS_hung_for_4_seconds.png, AWS_uneven_start.png, Senders_1.png, 
> Senders_2.png, Senders_3_10_Queues.png, Senders_4_10_queues.png, 
> qdstat_wrong_output.png
>
>
> I am running a Java Client against a cluster of 3 interior routers connected 
> to each other.  2-way SSL is enabled for all the connections.
> There were 20 simultaneous queues with 20 senders on each queue and each 
> sender publishing 1000 messages.  All the senders were connected to Router 1. 
>  20 receivers were connected to Router 2 with 1 receiver receiving from each 
> queue.
> In the first run, router stopped receiving incoming messages after delivering 
> 386,339 out of 400K "Hello World!" messages.
> In the second run, 388,781 messages out of 400K were delivered.
> I reduced the number of queues to 10 (halving total number of messages to 
> 200K) and the issue occurred again.
> I ran the Java client on an 8 CPU machine again with 10 queues and the issue 
> occurred again after delivering just 54K out of 200K messages.
> All the senders were hung (still connected) with no messages flowing at all.
> Connection information from qdstat:
> When the messages are flowing properly and I run "qdstat -c", I see all the 
> senders as secure and authenticated.
> After they hang and I run "qdstat -c", it erroneously shows all the clients 
> as insecure and unauthenticated.
> Shortly after the clients hang, all the queues are deleted from the router 
> network but connections are still shown until I terminate the clients.
> I saw this erroneous situation before also when "qdstat -c" showed some 
> senders as secure and authentic but some as 

[jira] [Updated] (DISPATCH-380) Router stops receiving messages from multiple senders publishing to multiple queues in parallel

2016-06-11 Thread Vishal Sharda (JIRA)

 [ 
https://issues.apache.org/jira/browse/DISPATCH-380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vishal Sharda updated DISPATCH-380:
---
Attachment: AWS_uneven_start.png
AWS_hung_for_4_seconds.png
AWS_hung_at_round_figures.png

I ran a single instance of Qpid Dispatch Router built on an AWS instance.  This 
was an interior router with no connections to other routers. 

I used Proton-J Messenger API based Java client that uses 2-way SSL.

I started 10 receivers first with each receiver listening on a different 
endpoint (total 10 endpoints created).

Then I started 20 senders on each endpoint (total 200 senders) each publishing 
1000 “Hello World!” messages.

A total of 200K messages were to be delivered, but all the senders hung after 
only 70K messages were delivered.  In a second, identical run, they hung after 
delivery of 67K messages.

The attached screenshots show the following:

1.) AWS_uneven_start.png: Uneven start on different endpoints.
2.) AWS_hung_at_round_figures.png: Just before my senders hung, the number of 
messages sent/received on each endpoint became a round figure.
3.) AWS_hung_for_4_seconds.png: After 4 seconds, no more messages went through 
and all the counts remained intact.

Finally, all the endpoints were deleted from the router, but "qdstat -c" still 
showed connections from my senders, wrongly marked as insecure and 
unauthenticated.


> Router stops receiving messages from multiple senders publishing to multiple 
> queues in parallel
> ---
>
> Key: DISPATCH-380
> URL: https://issues.apache.org/jira/browse/DISPATCH-380
> Project: Qpid Dispatch
>  Issue Type: Bug
>  Components: Routing Engine
> Environment: Debian 8.3, Apache Qpid Proton 0.13.0-RC for drivers and 
> dependencies, Hardware: 2 CPUs, 15 GB RAM, 60 GB HDD each on 3 separate 
> machines
>Reporter: Vishal Sharda
>Priority: Critical
> Fix For: 0.6.0
>
> Attachments: AWS_hung_at_round_figures.png, 
> AWS_hung_for_4_seconds.png, AWS_uneven_start.png, Senders_1.png, 
> Senders_2.png, Senders_3_10_Queues.png, Senders_4_10_queues.png, 
> qdstat_wrong_output.png
>
>
> I am running a Java Client against a cluster of 3 interior routers connected 
> to each other.  2-way SSL is enabled for all the connections.
> There were 20 simultaneous queues with 20 senders on each queue and each 
> sender publishing 1000 messages.  All the senders were connected to Router 1. 
>  20 receivers were connected to Router 2 with 1 receiver receiving from each 
> queue.
> In the first run, router stopped receiving incoming messages after delivering 
> 386,339 out of 400K "Hello World!" messages.
> In the second run, 388,781 messages out of 400K were delivered.
> I reduced the number of queues to 10 (halving total number of messages to 
> 200K) and the issue occurred again.
> I ran the Java client on an 8 CPU machine again with 10 queues and the issue 
> occurred again after delivering just 54K out of 200K messages.
> All the senders were hung (still connected) with no messages flowing at all.
> Connection information from qdstat:
> When the messages are flowing properly and I run "qdstat -c", I see all the 
> senders as secure and authenticated.
> After they hang and I run "qdstat -c", it erroneously shows all the clients 
> as insecure and unauthenticated.
> Shortly after the clients hang, all the queues are deleted from the router 
> network but connections are still shown until I terminate the clients.
> I saw this erroneous situation before also when "qdstat -c" showed some 
> senders as secure and authentic but some as insecure/unauthentic.






[jira] [Updated] (DISPATCH-380) Router stops receiving messages from multiple senders publishing to multiple queues in parallel

2016-06-10 Thread Vishal Sharda (JIRA)

 [ 
https://issues.apache.org/jira/browse/DISPATCH-380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vishal Sharda updated DISPATCH-380:
---
Attachment: qdstat_wrong_output.png
Senders_4_10_queues.png
Senders_3_10_Queues.png
Senders_2.png
Senders_1.png

Screenshots showing how qdstat shows senders connected but insecure and 
unauthenticated after they hang.

> Router stops receiving messages from multiple senders publishing to multiple 
> queues in parallel
> ---
>
> Key: DISPATCH-380
> URL: https://issues.apache.org/jira/browse/DISPATCH-380
> Project: Qpid Dispatch
>  Issue Type: Bug
>  Components: Routing Engine
> Environment: Debian 8.3, Apache Qpid Proton 0.13.0-RC for drivers and 
> dependencies, Hardware: 2 CPUs, 15 GB RAM, 60 GB HDD each on 3 separate 
> machines
>Reporter: Vishal Sharda
>Priority: Critical
> Fix For: 0.6.0
>
> Attachments: Senders_1.png, Senders_2.png, Senders_3_10_Queues.png, 
> Senders_4_10_queues.png, qdstat_wrong_output.png
>
>
> I am running a Java Client against a cluster of 3 interior routers connected 
> to each other.  2-way SSL is enabled for all the connections.
> There were 20 simultaneous queues with 20 senders on each queue and each 
> sender publishing 1000 messages.  All the senders were connected to Router 1. 
>  20 receivers were connected to Router 2 with 1 receiver receiving from each 
> queue.
> In the first run, router stopped receiving incoming messages after delivering 
> 386,339 out of 400K "Hello World!" messages.
> In the second run, 388,781 messages out of 400K were delivered.
> I reduced the number of queues to 10 (halving total number of messages to 
> 200K) and the issue occurred again.
> I ran the Java client on an 8 CPU machine again with 10 queues and the issue 
> occurred again after delivering just 54K out of 200K messages.
> All the senders were hung (still connected) with no messages flowing at all.
> Connection information from qdstat:
> When the messages are flowing properly and I run "qdstat -c", I see all the 
> senders as secure and authenticated.
> After they hang and I run "qdstat -c", it erroneously shows all the clients 
> as insecure and unauthenticated.
> Shortly after the clients hang, all the queues are deleted from the router 
> network but connections are still shown until I terminate the clients.
> I saw this erroneous situation before also when "qdstat -c" showed some 
> senders as secure and authentic but some as insecure/unauthentic.






[jira] [Created] (DISPATCH-380) Router stops receiving messages from multiple senders publishing to multiple queues in parallel

2016-06-10 Thread Vishal Sharda (JIRA)
Vishal Sharda created DISPATCH-380:
--

 Summary: Router stops receiving messages from multiple senders 
publishing to multiple queues in parallel
 Key: DISPATCH-380
 URL: https://issues.apache.org/jira/browse/DISPATCH-380
 Project: Qpid Dispatch
  Issue Type: Bug
  Components: Routing Engine
 Environment: Debian 8.3, Apache Qpid Proton 0.13.0-RC for drivers and 
dependencies, Hardware: 2 CPUs, 15 GB RAM, 60 GB HDD each on 3 separate machines
Reporter: Vishal Sharda
Priority: Critical
 Fix For: 0.6.0


I am running a Java Client against a cluster of 3 interior routers connected to 
each other.  2-way SSL is enabled for all the connections.

There were 20 simultaneous queues with 20 senders on each queue and each sender 
publishing 1000 messages.  All the senders were connected to Router 1.  20 
receivers were connected to Router 2 with 1 receiver receiving from each queue.

In the first run, the router stopped receiving incoming messages after 
delivering 386,339 out of 400K "Hello World!" messages.
In the second run, 388,781 messages out of 400K were delivered.

I reduced the number of queues to 10 (halving total number of messages to 200K) 
and the issue occurred again.

I ran the Java client on an 8 CPU machine again with 10 queues and the issue 
occurred again after delivering just 54K out of 200K messages.
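
The expected totals in the runs above follow from queues x senders per queue x messages per sender. As a small worked example, here is a Python sketch that recomputes them and the shortfall in each run. The function names and the `RUNS` table are hypothetical, the delivered counts are copied from this report, and "54K" is treated as exactly 54,000 for illustration.

```python
def expected_total(queues, senders_per_queue, messages_per_sender):
    """Number of messages that should be delivered if every sender finishes."""
    return queues * senders_per_queue * messages_per_sender


# (label, number of queues, messages delivered before the senders hung)
RUNS = [
    ("run 1, 20 queues", 20, 386_339),
    ("run 2, 20 queues", 20, 388_781),
    ("8-CPU client, 10 queues", 10, 54_000),  # reported as "54K"
]


def shortfalls(runs, senders_per_queue=20, messages_per_sender=1000):
    """Return (label, expected, delivered, missing) for each run."""
    rows = []
    for label, queues, delivered in runs:
        total = expected_total(queues, senders_per_queue, messages_per_sender)
        rows.append((label, total, delivered, total - delivered))
    return rows


for label, total, delivered, missing in shortfalls(RUNS):
    share = 100 * delivered / total
    print(f"{label}: {delivered} of {total} delivered, "
          f"{missing} missing ({share:.1f}% delivered)")
```

The two 20-queue runs stall within a few percent of completion, while the 10-queue run from the 8 CPU machine stalls much earlier.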

All the senders were hung (still connected) with no messages flowing at all.

Connection information from qdstat:

When the messages are flowing properly and I run "qdstat -c", I see all the 
senders as secure and authenticated.

After they hang and I run "qdstat -c", it erroneously shows all the clients as 
insecure and unauthenticated.

Shortly after the clients hang, all the queues are deleted from the router 
network but connections are still shown until I terminate the clients.

I have also seen this erroneous state before, when "qdstat -c" showed some 
senders as secure and authenticated but others as insecure/unauthenticated.







[jira] [Updated] (DISPATCH-358) Intermittent crashes in qdrouterd under load from parallel senders

2016-06-08 Thread Vishal Sharda (JIRA)

 [ 
https://issues.apache.org/jira/browse/DISPATCH-358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vishal Sharda updated DISPATCH-358:
---
Attachment: Crash_EXTERNAL.png

I started testing again with the latest trunk code, which has several bug 
fixes.  I enabled SSL between the routers but not between clients and routers.  
I got the crash shown in Crash_EXTERNAL.png while starting/stopping 
clients/router.

> Intermittent crashes in qdrouterd under load from parallel senders
> --
>
> Key: DISPATCH-358
> URL: https://issues.apache.org/jira/browse/DISPATCH-358
> Project: Qpid Dispatch
>  Issue Type: Bug
>  Components: Routing Engine
>Affects Versions: 0.6.0
> Environment: Debian 8.3, Apache Qpid Proton 0.12.2 for drivers and 
> dependencies, Hardware: 2 CPUs, 15 GB RAM, 60 GB HDD on 2 separate machines
>Reporter: Vishal Sharda
>Assignee: Ted Ross
>Priority: Critical
> Attachments: Crash_EXTERNAL.png, Crash_Java_Router_3.png, 
> Crash_Java_Send.png, Crash_Java_free_qd_connection.png, 
> Crash_Java_same_router.png, Crash_Java_same_router_another.png, 
> Crash_Java_same_router_another_bt.png, Crash_SASL.png, Crash_SASL_2.png, 
> Crash_SR_1.png, Crash_SR_2.png, Crash_bt_double_free_Java_RES_266MB.png, 
> Crash_double_free_Java_RES_266MB.png, Crash_free.png, 
> Crash_sasl_server_done.png, Crash_watch_qdstat.png, Overflow_Error.png
>
>
> In my setup of two inter-connected routers, several senders connecting to one 
> router while few receivers connecting to the other router, I see several 
> crashes in the router to which senders connect.  These crashes are 
> intermittent and happen once in every 10 runs or so.  I have collected the 
> backtraces of all the crashes but do not yet have a test case that can 
> reliably reproduce any of them.






[jira] [Commented] (DISPATCH-371) qdstat and all other clients stopped connecting to interior router

2016-06-08 Thread Vishal Sharda (JIRA)

[ 
https://issues.apache.org/jira/browse/DISPATCH-371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15320787#comment-15320787
 ] 

Vishal Sharda commented on DISPATCH-371:


Hi Ganesh, can I share the same port between inter-router and normal listeners, 
or do they have to use different ports?

> qdstat and all other clients stopped connecting to interior router
> --
>
> Key: DISPATCH-371
> URL: https://issues.apache.org/jira/browse/DISPATCH-371
> Project: Qpid Dispatch
>  Issue Type: Bug
>  Components: Router Node
>Affects Versions: 0.6.0
> Environment: Debian 8.3, Apache Qpid Proton 0.12.2 for drivers and 
> dependencies, Hardware: 2 CPUs, 15 GB RAM, 60 GB HDD on 3 separate machines 
> each
>Reporter: Vishal Sharda
>Priority: Blocker
>
> I just updated my sandbox to pull the latest bug fixes.  I see that all the 
> routers in my cluster of 3 interior routers have stopped accepting 
> connections from my clients as well as qdstat.
> The port is properly open but qdstat shows the following error:
> *
> vsharda@millennium-qpid-untouched-latest-6443:~/apache-qpid-dispatch/my_build$ sudo netstat -tulpn | grep qdrouterd
> tcp        0      0 0.0.0.0:5672            0.0.0.0:*               LISTEN      18460/qdrouterd
> vsharda@millennium-qpid-untouched-latest-6443:~/apache-qpid-dispatch/my_build$ qdstat -c
> LinkDetached: sender 427616d4-8253-4bb3-a332-d1940a12f0e3-$management to 
> $management closed due to: Condition('qd:connection-role', 'Link attach 
> forbidden on inter-router connection')
> **
> If I run the router as standalone, I can see qdstat working fine.






[jira] [Created] (DISPATCH-371) qdstat and all other clients stopped connecting to interior router

2016-06-08 Thread Vishal Sharda (JIRA)
Vishal Sharda created DISPATCH-371:
--

 Summary: qdstat and all other clients stopped connecting to 
interior router
 Key: DISPATCH-371
 URL: https://issues.apache.org/jira/browse/DISPATCH-371
 Project: Qpid Dispatch
  Issue Type: Bug
  Components: Router Node
Affects Versions: 0.6.0
 Environment: Debian 8.3, Apache Qpid Proton 0.12.2 for drivers and 
dependencies, Hardware: 2 CPUs, 15 GB RAM, 60 GB HDD on 3 separate machines each
Reporter: Vishal Sharda
Priority: Blocker


I just updated my sandbox to pull the latest bug fixes.  I see that all the 
routers in my cluster of 3 interior routers have stopped accepting connections 
from my clients as well as qdstat.

The port is properly open but qdstat shows the following error:

*

vsharda@millennium-qpid-untouched-latest-6443:~/apache-qpid-dispatch/my_build$ sudo netstat -tulpn | grep qdrouterd
tcp        0      0 0.0.0.0:5672            0.0.0.0:*               LISTEN      18460/qdrouterd
vsharda@millennium-qpid-untouched-latest-6443:~/apache-qpid-dispatch/my_build$ qdstat -c
LinkDetached: sender 427616d4-8253-4bb3-a332-d1940a12f0e3-$management to 
$management closed due to: Condition('qd:connection-role', 'Link attach 
forbidden on inter-router connection')

**

If I run the router as standalone, I can see qdstat working fine.
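
The netstat output above only shows that the TCP port is listening; the qdstat error comes later, at AMQP link attach. A minimal Python sketch of the TCP-level check alone, to make the distinction concrete (the function name `tcp_port_open` and the 2-second timeout are illustrative assumptions, not part of qdstat or the router):

```python
import socket


def tcp_port_open(host, port, timeout_s=2.0):
    """Return True if a TCP connection to (host, port) can be established.

    This only proves that something is listening. It says nothing about
    whether the listener will accept a client link once AMQP is negotiated.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers connection refused, unreachable host and timeouts.
        return False
```

A True result here together with the LinkDetached error from qdstat would be consistent with the connection being treated as an inter-router connection rather than a normal client connection.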







[jira] [Updated] (DISPATCH-365) Standalone router crashes if an interior router attempts to connect to it

2016-06-07 Thread Vishal Sharda (JIRA)

 [ 
https://issues.apache.org/jira/browse/DISPATCH-365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vishal Sharda updated DISPATCH-365:
---
Attachment: config2_nossl.conf
config1_standalone.conf

Configuration files to reproduce the crash in standalone router.

> Standalone router crashes if an interior router attempts to connect to it
> -
>
> Key: DISPATCH-365
> URL: https://issues.apache.org/jira/browse/DISPATCH-365
> Project: Qpid Dispatch
>  Issue Type: Bug
>  Components: Router Node
>Affects Versions: 0.6.0
> Environment: Debian 8.3, Apache Qpid Proton 0.12.2 for drivers and 
> dependencies, Hardware: 2 CPUs, 15 GB RAM, 60 GB HDD on 2 separate machines
>Reporter: Vishal Sharda
>Priority: Critical
> Attachments: config1_standalone.conf, config2_nossl.conf
>
>
> I accidentally pointed my interior router to a standalone router.  The 
> standalone router did not ignore the connection request and crashed.  The 
> attached config files reproduce the crash.






[jira] [Created] (DISPATCH-365) Standalone router crashes if an interior router attempts to connect to it

2016-06-07 Thread Vishal Sharda (JIRA)
Vishal Sharda created DISPATCH-365:
--

 Summary: Standalone router crashes if an interior router attempts 
to connect to it
 Key: DISPATCH-365
 URL: https://issues.apache.org/jira/browse/DISPATCH-365
 Project: Qpid Dispatch
  Issue Type: Bug
  Components: Router Node
Affects Versions: 0.6.0
 Environment: Debian 8.3, Apache Qpid Proton 0.12.2 for drivers and 
dependencies, Hardware: 2 CPUs, 15 GB RAM, 60 GB HDD on 2 separate machines
Reporter: Vishal Sharda
Priority: Critical


I accidentally pointed my interior router to a standalone router.  The 
standalone router did not ignore the connection request and crashed.  The 
attached config files reproduce the crash.






[jira] [Commented] (DISPATCH-360) Disallow router with duplicate ID from joining the network

2016-06-07 Thread Vishal Sharda (JIRA)

[ 
https://issues.apache.org/jira/browse/DISPATCH-360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15318872#comment-15318872
 ] 

Vishal Sharda commented on DISPATCH-360:


Thanks Ted.  I corrected my mistake and will continue testing without SSL.  Is 
it possible to prevent a router with a duplicate ID from joining the network?  
This subtle error can easily happen in large networks of routers.
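
Until the router rejects duplicates itself, a check on the config files before start-up could catch this mistake. The following is only a sketch in Python: it assumes the router ID appears on a line of the form `routerId: <value>` or `id: <value>`, which may not match every Dispatch version, and the function name `find_duplicate_router_ids` is hypothetical.

```python
import re
from collections import defaultdict

# Assumed attribute syntax: a line such as "routerId: R1" or "id: R1".
# Adjust the pattern if your config files name the attribute differently.
_ID_LINE = re.compile(r"^\s*(?:routerId|id)\s*:\s*(\S+)", re.MULTILINE)


def find_duplicate_router_ids(configs):
    """Return {router_id: [config names]} for IDs used by more than one config.

    `configs` maps a config name (for example a file name) to its text.
    Only the first matching ID line in each config is considered.
    """
    seen = defaultdict(list)
    for name, text in configs.items():
        match = _ID_LINE.search(text)
        if match:
            seen[match.group(1)].append(name)
    return {rid: names for rid, names in seen.items() if len(names) > 1}
```

Reading the three config files into a dict keyed by file name and passing it to this function would list any ID shared by two or more routers; an empty result means no duplicates were found under the assumed syntax.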

> Disallow router with duplicate ID from joining the network
> --
>
> Key: DISPATCH-360
> URL: https://issues.apache.org/jira/browse/DISPATCH-360
> Project: Qpid Dispatch
>  Issue Type: Improvement
>  Components: Routing Engine
>Affects Versions: 0.6.0
> Environment: Debian 8.3, Apache Qpid Proton 0.12.2 for drivers and 
> dependencies, Hardware: 2 CPUs, 15 GB RAM, 60 GB HDD on 3 separate machines
>Reporter: Vishal Sharda
>Priority: Minor
> Attachments: Crash_bt_no_SSL.png, Crash_bt_no_SSL_2.png, 
> config1_nossl.conf, config2_nossl.conf, config3_nossl.conf
>
>
> In order to isolate the issues that I am getting with 2-way SSL connections 
> among routers, I created a cluster of 3 inter-connected routers (R1, R2 and 
> R3 with R2 connecting to R1 and R3 connecting to both R1 and R2) without any 
> type of SSL (I had been using just 2 routers so far but our actual cluster 
> consists of 3 nodes).  All connections were insecure as shown in my config 
> files.
> When I tried sending 4 messages using simple_send.py to R1 after starting 
> simple_recv.py to receive from R2, I saw no messages were sent.
> If I stop R3 and reduce the cluster to just two nodes, it works fine.
> If I have 2-way SSL connections between all the 3 routers, it again works 
> fine.
> In my more than 20 runs to test this scenario of sending just 4 messages, it 
> even worked a few times after waiting for very long.  In the other two cases 
> above, I always got the messages instantaneously (there were no other 
> senders/receivers active).
> The drivers.tar.gz that I attached in DISPATCH-343 either timed out or 
> returned with unclear status when trying to send just 4 messages from 1 
> sender (connected to R1) to 1 receiver (connected to R2).  It showed 
> successful just once.  The behavior is completely non-deterministic.
> This basic test working only sometimes and failing most of the time seemed 
> very weird, so I turned to running the routers outside gdb, but 
> the results were similar.  In the process of stopping/restarting the 3 
> routers for testing this scenario, I also got a crash (backtrace attached).






[jira] [Updated] (DISPATCH-360) Disallow router with duplicate ID from joining the network

2016-06-07 Thread Vishal Sharda (JIRA)

 [ 
https://issues.apache.org/jira/browse/DISPATCH-360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vishal Sharda updated DISPATCH-360:
---
  Priority: Minor  (was: Blocker)
Issue Type: Improvement  (was: Bug)
   Summary: Disallow router with duplicate ID from joining the network  
(was: Sender and receiver cannot communicate using a network of 3 
inter-connected routers having insecure connections)

> Disallow router with duplicate ID from joining the network
> --
>
> Key: DISPATCH-360
> URL: https://issues.apache.org/jira/browse/DISPATCH-360
> Project: Qpid Dispatch
>  Issue Type: Improvement
>  Components: Routing Engine
>Affects Versions: 0.6.0
> Environment: Debian 8.3, Apache Qpid Proton 0.12.2 for drivers and 
> dependencies, Hardware: 2 CPUs, 15 GB RAM, 60 GB HDD on 3 separate machines
>Reporter: Vishal Sharda
>Priority: Minor
> Attachments: Crash_bt_no_SSL.png, Crash_bt_no_SSL_2.png, 
> config1_nossl.conf, config2_nossl.conf, config3_nossl.conf
>
>
> In order to isolate the issues that I am getting with 2-way SSL connections 
> among routers, I created a cluster of 3 inter-connected routers (R1, R2 and 
> R3 with R2 connecting to R1 and R3 connecting to both R1 and R2) without any 
> type of SSL (I had been using just 2 routers so far but our actual cluster 
> consists of 3 nodes).  All connections were insecure as shown in my config 
> files.
> When I tried sending 4 messages using simple_send.py to R1 after starting 
> simple_recv.py to receive from R2, I saw no messages were sent.
> If I stop R3 and reduce the cluster to just two nodes, it works fine.
> If I have 2-way SSL connections between all the 3 routers, it again works 
> fine.
> In my more than 20 runs to test this scenario of sending just 4 messages, it 
> even worked a few times after waiting for very long.  In the other two cases 
> above, I always got the messages instantaneously (there were no other 
> senders/receivers active).
> The drivers.tar.gz that I attached in DISPATCH-343 either timed out or 
> returned with unclear status when trying to send just 4 messages from 1 
> sender (connected to R1) to 1 receiver (connected to R2).  It showed 
> successful just once.  The behavior is completely non-deterministic.
> This basic test working only sometimes and failing most of the time seemed 
> very weird, so I turned to running the routers outside gdb, but 
> the results were similar.  In the process of stopping/restarting the 3 
> routers for testing this scenario, I also got a crash (backtrace attached).






[jira] [Comment Edited] (DISPATCH-358) Intermittent crashes in qdrouterd under load from parallel senders

2016-06-06 Thread Vishal Sharda (JIRA)

[ 
https://issues.apache.org/jira/browse/DISPATCH-358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15317880#comment-15317880
 ] 

Vishal Sharda edited comment on DISPATCH-358 at 6/7/16 5:36 AM:


Ganesh,

Yes, I can also reproduce the crash without SSL and have filed another bug, 
DISPATCH-360, for it.  I am working on refining my Java client (it 
currently always uses SSL).  Until then you can use the drivers.tar.gz from 
DISPATCH-343 against the three routers connected insecurely as per the config 
files in DISPATCH-360.

You will see the issues with the following very basic test:

$ ./recv_no_ssl -n 1 amqp://:/paypal/foo

Separate terminal:

$ ./send_no_ssl -m 4 -n 1 amqp://:/paypal/foo

Be sure to have host3 connected to the two hosts as described and run the test 
several times.  (It will work occasionally and fail most of the time.)

Then run again with host3 removed.  (It will work almost always.)




> Intermittent crashes in qdrouterd under load from parallel senders
> --
>
> Key: DISPATCH-358
> URL: https://issues.apache.org/jira/browse/DISPATCH-358
> Project: Qpid Dispatch
>  Issue Type: Bug
>  Components: Routing Engine
>Affects Versions: 0.6.0
> Environment: Debian 8.3, Apache Qpid Proton 0.12.2 for drivers and 
> dependencies, Hardware: 2 CPUs, 15 GB RAM, 60 GB HDD on 2 separate machines
>Reporter: Vishal Sharda
>Priority: Critical
> Attachments: Crash_Java_Router_3.png, Crash_Java_Send.png, 
> Crash_Java_free_qd_connection.png, Crash_Java_same_router.png, 
> Crash_Java_same_router_another.png, Crash_Java_same_router_another_bt.png, 
> Crash_SASL.png, Crash_SASL_2.png, Crash_SR_1.png, Crash_SR_2.png, 
> Crash_bt_double_free_Java_RES_266MB.png, 
> Crash_double_free_Java_RES_266MB.png, Crash_free.png, 
> Crash_sasl_server_done.png, Crash_watch_qdstat.png, Overflow_Error.png
>
>
> In my setup of two inter-connected routers, several senders connecting to one 
> router while few receivers connecting to the other router, I see several 
> crashes in the router to which senders connect.  These crashes are 
> intermittent and happen once in every 10 runs or so.  I have collected the 
> backtraces of all the crashes but do not yet have a test case that can 
> reliably reproduce any of them.






[jira] [Updated] (DISPATCH-358) Intermittent crashes in qdrouterd under load from parallel senders

2016-06-06 Thread Vishal Sharda (JIRA)

 [ 
https://issues.apache.org/jira/browse/DISPATCH-358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vishal Sharda updated DISPATCH-358:
---
Attachment: Overflow_Error.png

Overflow error that I got once during testing.

> Intermittent crashes in qdrouterd under load from parallel senders
> --
>
> Key: DISPATCH-358
> URL: https://issues.apache.org/jira/browse/DISPATCH-358
> Project: Qpid Dispatch
>  Issue Type: Bug
>  Components: Routing Engine
>Affects Versions: 0.6.0
> Environment: Debian 8.3, Apache Qpid Proton 0.12.2 for drivers and 
> dependencies, Hardware: 2 CPUs, 15 GB RAM, 60 GB HDD on 2 separate machines
>Reporter: Vishal Sharda
>Priority: Critical
> Attachments: Crash_Java_Router_3.png, Crash_Java_Send.png, 
> Crash_Java_free_qd_connection.png, Crash_Java_same_router.png, 
> Crash_Java_same_router_another.png, Crash_Java_same_router_another_bt.png, 
> Crash_SASL.png, Crash_SASL_2.png, Crash_SR_1.png, Crash_SR_2.png, 
> Crash_bt_double_free_Java_RES_266MB.png, 
> Crash_double_free_Java_RES_266MB.png, Crash_free.png, 
> Crash_sasl_server_done.png, Crash_watch_qdstat.png, Overflow_Error.png
>
>
> In my setup of two inter-connected routers, several senders connecting to one 
> router while few receivers connecting to the other router, I see several 
> crashes in the router to which senders connect.  These crashes are 
> intermittent and happen once in every 10 runs or so.  I have collected the 
> backtraces of all the crashes but do not yet have a test case that can 
> reliably reproduce any of them.






[jira] [Updated] (DISPATCH-358) Intermittent crashes in qdrouterd under load from parallel senders

2016-06-06 Thread Vishal Sharda (JIRA)

 [ 
https://issues.apache.org/jira/browse/DISPATCH-358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vishal Sharda updated DISPATCH-358:
---
Attachment: Crash_watch_qdstat.png
Crash_Java_same_router.png
Crash_Java_same_router_another.png
Crash_Java_same_router_another_bt.png
Crash_Java_Router_3.png
Crash_double_free_Java_RES_266MB.png
Crash_bt_double_free_Java_RES_266MB.png

More crashes that I got during my testing yesterday and today.

> Intermittent crashes in qdrouterd under load from parallel senders
> --
>
> Key: DISPATCH-358
> URL: https://issues.apache.org/jira/browse/DISPATCH-358
> Project: Qpid Dispatch
>  Issue Type: Bug
>  Components: Routing Engine
>Affects Versions: 0.6.0
> Environment: Debian 8.3, Apache Qpid Proton 0.12.2 for drivers and 
> dependencies, Hardware: 2 CPUs, 15 GB RAM, 60 GB HDD on 2 separate machines
>Reporter: Vishal Sharda
>Priority: Critical
> Attachments: Crash_Java_Router_3.png, Crash_Java_Send.png, 
> Crash_Java_free_qd_connection.png, Crash_Java_same_router.png, 
> Crash_Java_same_router_another.png, Crash_Java_same_router_another_bt.png, 
> Crash_SASL.png, Crash_SASL_2.png, Crash_SR_1.png, Crash_SR_2.png, 
> Crash_bt_double_free_Java_RES_266MB.png, 
> Crash_double_free_Java_RES_266MB.png, Crash_free.png, 
> Crash_sasl_server_done.png, Crash_watch_qdstat.png
>
>
> In my setup of two inter-connected routers, several senders connecting to one 
> router while few receivers connecting to the other router, I see several 
> crashes in the router to which senders connect.  These crashes are 
> intermittent and happen once in every 10 runs or so.  I have collected the 
> backtraces of all the crashes but do not yet have a test case that can 
> reliably reproduce any of them.






[jira] [Commented] (DISPATCH-358) Intermittent crashes in qdrouterd under load from parallel senders

2016-06-06 Thread Vishal Sharda (JIRA)

[ 
https://issues.apache.org/jira/browse/DISPATCH-358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15317880#comment-15317880
 ] 

Vishal Sharda commented on DISPATCH-358:


Ganesh,

Yes, I can also reproduce the crash without SSL and have filed another bug, 
DISPATCH-360, for it.  I am working on refining my Java client (it 
currently always uses SSL).  Until then you can use the drivers.tar.gz from 
DISPATCH-343 against the three routers connected insecurely as per the config 
files in DISPATCH-360.

You will see the issues with the following very basic test:

$ ./recv_no_ssl -n 1 amqp://:/paypal/foo

Separate terminal:

$ ./send_no_ssl -m 4 -n 1 amqp://:/paypal/foo

Be sure to have host3 connected to the two hosts as described and run the test 
several times.  (It will work occasionally and fail most of the time.)

Then run again with host3 removed.  (It will work almost always.)
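
Because the failure is intermittent, it may help to tally the outcome of many runs rather than judge by eye. A minimal sketch, assuming Python 3 is available on the test host; the function name `run_repeatedly`, its parameters and the 30-second default timeout are illustrative and are not part of drivers.tar.gz:

```python
import subprocess
import sys


def run_repeatedly(cmd, runs=20, timeout_s=30.0):
    """Run `cmd` (a list of arguments) `runs` times and tally the outcomes.

    A run counts as "ok" on exit status 0, "failed" on any other exit
    status, and "timeout" if it does not finish within `timeout_s` seconds.
    """
    tally = {"ok": 0, "failed": 0, "timeout": 0}
    for _ in range(runs):
        try:
            result = subprocess.run(
                cmd,
                timeout=timeout_s,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
            )
        except subprocess.TimeoutExpired:
            # subprocess.run kills the child before raising this exception.
            tally["timeout"] += 1
            continue
        tally["ok" if result.returncode == 0 else "failed"] += 1
    return tally
```

For the test above, `cmd` would be the sender command as a list, for example `["./send_no_ssl", "-m", "4", "-n", "1", url]` with `url` set to the sender's router address, while the receiver runs in a separate terminal. Comparing the tallies with host3 connected and with host3 removed gives a rough failure rate for each topology.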


> Intermittent crashes in qdrouterd under load from parallel senders
> --
>
> Key: DISPATCH-358
> URL: https://issues.apache.org/jira/browse/DISPATCH-358
> Project: Qpid Dispatch
>  Issue Type: Bug
>  Components: Routing Engine
>Affects Versions: 0.6.0
> Environment: Debian 8.3, Apache Qpid Proton 0.12.2 for drivers and 
> dependencies, Hardware: 2 CPUs, 15 GB RAM, 60 GB HDD on 2 separate machines
>Reporter: Vishal Sharda
>Priority: Critical
> Attachments: Crash_Java_Send.png, Crash_Java_free_qd_connection.png, 
> Crash_SASL.png, Crash_SASL_2.png, Crash_SR_1.png, Crash_SR_2.png, 
> Crash_free.png, Crash_sasl_server_done.png
>
>
> In my setup of two inter-connected routers, several senders connecting to one 
> router while few receivers connecting to the other router, I see several 
> crashes in the router to which senders connect.  These crashes are 
> intermittent and happen once in every 10 runs or so.  I have collected the 
> backtraces of all the crashes but do not yet have a test case that can 
> reliably reproduce any of them.






[jira] [Updated] (DISPATCH-360) Sender and receiver cannot communicate using a network of 3 inter-connected routers having insecure connections

2016-06-06 Thread Vishal Sharda (JIRA)

 [ 
https://issues.apache.org/jira/browse/DISPATCH-360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vishal Sharda updated DISPATCH-360:
---
Attachment: Crash_bt_no_SSL_2.png
Crash_bt_no_SSL.png

Backtrace of the crash observed.

> Sender and receiver cannot communicate using a network of 3 inter-connected 
> routers having insecure connections
> ---
>
> Key: DISPATCH-360
> URL: https://issues.apache.org/jira/browse/DISPATCH-360
> Project: Qpid Dispatch
>  Issue Type: Bug
>  Components: Routing Engine
>Affects Versions: 0.6.0
> Environment: Debian 8.3, Apache Qpid Proton 0.12.2 for drivers and 
> dependencies, Hardware: 2 CPUs, 15 GB RAM, 60 GB HDD on 3 separate machines
>Reporter: Vishal Sharda
>Priority: Blocker
> Attachments: Crash_bt_no_SSL.png, Crash_bt_no_SSL_2.png, 
> config1_nossl.conf, config2_nossl.conf, config3_nossl.conf
>
>
> In order to isolate the issues that I am getting with 2-way SSL connections 
> among routers, I created a cluster of 3 inter-connected routers (R1, R2 and 
> R3 with R2 connecting to R1 and R3 connecting to both R1 and R2) without any 
> type of SSL (I had been using just 2 routers so far but our actual cluster 
> consists of 3 nodes).  All connections were insecure as shown in my config 
> files.
> When I tried sending 4 messages using simple_send.py to R1 after starting 
> simple_recv.py to receive from R2, I saw no messages were sent.
> If I stop R3 and reduce the cluster to just two nodes, it works fine.
> If I have 2-way SSL connections between all the 3 routers, it again works 
> fine.
> In my more than 20 runs to test this scenario of sending just 4 messages, it 
> even worked a few times after waiting for very long.  In the other two cases 
> above, I always got the messages instantaneously (there were no other 
> senders/receivers active).
> The drivers.tar.gz that I attached in DISPATCH-343 either timed out or 
> returned with unclear status when trying to send just 4 messages from 1 
> sender (connected to R1) to 1 receiver (connected to R2).  It showed 
> successful just once.  The behavior is completely non-deterministic.
> This basic test working only sometimes and failing most of the time seemed 
> very weird, so I turned to running the routers outside gdb, but 
> the results were similar.  In the process of stopping/restarting the 3 
> routers for testing this scenario, I also got a crash (backtrace attached).






[jira] [Updated] (DISPATCH-360) Sender and receiver cannot communicate using a network of 3 inter-connected routers having insecure connections

2016-06-06 Thread Vishal Sharda (JIRA)

 [ 
https://issues.apache.org/jira/browse/DISPATCH-360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vishal Sharda updated DISPATCH-360:
---
Attachment: config3_nossl.conf
config2_nossl.conf
config1_nossl.conf

The three configuration files that I used for the three routers.

> Sender and receiver cannot communicate using a network of 3 inter-connected 
> routers having insecure connections
> ---
>
> Key: DISPATCH-360
> URL: https://issues.apache.org/jira/browse/DISPATCH-360
> Project: Qpid Dispatch
>  Issue Type: Bug
>  Components: Routing Engine
>Affects Versions: 0.6.0
> Environment: Debian 8.3, Apache Qpid Proton 0.12.2 for drivers and 
> dependencies, Hardware: 2 CPUs, 15 GB RAM, 60 GB HDD on 3 separate machines
>Reporter: Vishal Sharda
>Priority: Blocker
> Attachments: config1_nossl.conf, config2_nossl.conf, 
> config3_nossl.conf
>
>
> In order to isolate the issues that I am getting with 2-way SSL connections 
> among routers, I created a cluster of 3 inter-connected routers (R1, R2 and 
> R3 with R2 connecting to R1 and R3 connecting to both R1 and R2) without any 
> type of SSL (I had been using just 2 routers so far but our actual cluster 
> consists of 3 nodes).  All connections were insecure as shown in my config 
> files.
> When I tried sending 4 messages using simple_send.py to R1 after starting 
> simple_recv.py to receive from R2, I saw no messages were sent.
> If I stop R3 and reduce the cluster to just two nodes, it works fine.
> If I have 2-way SSL connections between all the 3 routers, it again works 
> fine.
> In my more than 20 runs to test this scenario of sending just 4 messages, it 
> even worked a few times after waiting for very long.  In the other two cases 
> above, I always got the messages instantaneously (there were no other 
> senders/receivers active).
> The drivers.tar.gz that I attached in DISPATCH-343 either timed out or 
> returned with unclear status when trying to send just 4 messages from 1 
> sender (connected to R1) to 1 receiver (connected to R2).  It showed 
> successful just once.  The behavior is completely non-deterministic.
> This basic test working only sometimes and failing most of the time seemed 
> very weird, so I turned to running the routers outside gdb, but 
> the results were similar.  In the process of stopping/restarting the 3 
> routers for testing this scenario, I also got a crash (backtrace attached).






[jira] [Created] (DISPATCH-360) Sender and receiver cannot communicate using a network of 3 inter-connected routers having insecure connections

2016-06-06 Thread Vishal Sharda (JIRA)
Vishal Sharda created DISPATCH-360:
--

 Summary: Sender and receiver cannot communicate using a network of 
3 inter-connected routers having insecure connections
 Key: DISPATCH-360
 URL: https://issues.apache.org/jira/browse/DISPATCH-360
 Project: Qpid Dispatch
  Issue Type: Bug
  Components: Routing Engine
Affects Versions: 0.6.0
 Environment: Debian 8.3, Apache Qpid Proton 0.12.2 for drivers and 
dependencies, Hardware: 2 CPUs, 15 GB RAM, 60 GB HDD on 3 separate machines
Reporter: Vishal Sharda
Priority: Blocker


In order to isolate the issues that I am getting with 2-way SSL connections 
among routers, I created a cluster of 3 inter-connected routers (R1, R2 and R3 
with R2 connecting to R1 and R3 connecting to both R1 and R2) without any type 
of SSL (I had been using just 2 routers so far but our actual cluster consists 
of 3 nodes).  All connections were insecure as shown in my config files.

When I tried sending 4 messages using simple_send.py to R1 after starting 
simple_recv.py to receive from R2, I saw no messages were sent.

If I stop R3 and reduce the cluster to just two nodes, it works fine.
If I have 2-way SSL connections between all the 3 routers, it again works fine.

In my more than 20 runs to test this scenario of sending just 4 messages, it 
even worked a few times after waiting for very long.  In the other two cases 
above, I always got the messages instantaneously (there were no other 
senders/receivers active).

The drivers.tar.gz that I attached in DISPATCH-343 either timed out or returned 
with unclear status when trying to send just 4 messages from 1 sender 
(connected to R1) to 1 receiver (connected to R2).  It showed successful just 
once.  The behavior is completely non-deterministic.

This basic test working only sometimes and failing most of the time seemed 
very weird, so I turned to running the routers outside gdb, but the 
results were similar.  In the process of stopping/restarting the 3 routers for 
testing this scenario, I also got a crash (backtrace attached).






[jira] [Commented] (DISPATCH-358) Intermittent crashes in qdrouterd under load from parallel senders

2016-06-03 Thread Vishal Sharda (JIRA)

[ 
https://issues.apache.org/jira/browse/DISPATCH-358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15315263#comment-15315263
 ] 

Vishal Sharda commented on DISPATCH-358:


Ganesh, the senders/receivers that I am using are identical to those in 
drivers.tar.gz but with support for 2-way SSL.  I have also written equivalent 
drivers in Java (built on Proton-J Messenger API) with support for 2-way SSL 
and have begun testing with them.  Occasionally, I am also using the ones from 
drivers.tar.gz (that do not support SSL) in parallel.


> Intermittent crashes in qdrouterd under load from parallel senders
> --
>
> Key: DISPATCH-358
> URL: https://issues.apache.org/jira/browse/DISPATCH-358
> Project: Qpid Dispatch
>  Issue Type: Bug
>  Components: Routing Engine
>Affects Versions: 0.6.0
> Environment: Debian 8.3, Apache Qpid Proton 0.12.2 for drivers and 
> dependencies, Hardware: 2 CPUs, 15 GB RAM, 60 GB HDD on 2 separate machines
>Reporter: Vishal Sharda
>Priority: Critical
> Attachments: Crash_Java_Send.png, Crash_Java_free_qd_connection.png, 
> Crash_SASL.png, Crash_SASL_2.png, Crash_SR_1.png, Crash_SR_2.png, 
> Crash_free.png, Crash_sasl_server_done.png
>
>
> In my setup of two inter-connected routers, several senders connecting to one 
> router while few receivers connecting to the other router, I see several 
> crashes in the router to which senders connect.  These crashes are 
> intermittent and happen once in every 10 runs or so.  I have collected the 
> backtraces of all the crashes but do not yet have a test case that can 
> reliably reproduce any of them.






[jira] [Commented] (DISPATCH-358) Intermittent crashes in qdrouterd under load from parallel senders

2016-06-03 Thread Vishal Sharda (JIRA)

[ 
https://issues.apache.org/jira/browse/DISPATCH-358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15315170#comment-15315170
 ] 

Vishal Sharda commented on DISPATCH-358:


What all the fatal scenarios have in common is that multithreading gets 
triggered in the router.

> Intermittent crashes in qdrouterd under load from parallel senders
> --
>
> Key: DISPATCH-358
> URL: https://issues.apache.org/jira/browse/DISPATCH-358
> Project: Qpid Dispatch
>  Issue Type: Bug
>  Components: Routing Engine
>Affects Versions: 0.6.0
> Environment: Debian 8.3, Apache Qpid Proton 0.12.2 for drivers and 
> dependencies, Hardware: 2 CPUs, 15 GB RAM, 60 GB HDD on 2 separate machines
>Reporter: Vishal Sharda
>Priority: Critical
> Attachments: Crash_Java_Send.png, Crash_Java_free_qd_connection.png, 
> Crash_SASL.png, Crash_SASL_2.png, Crash_SR_1.png, Crash_SR_2.png, 
> Crash_free.png, Crash_sasl_server_done.png
>
>
> In my setup of two inter-connected routers, several senders connecting to one 
> router while few receivers connecting to the other router, I see several 
> crashes in the router to which senders connect.  These crashes are 
> intermittent and happen once in every 10 runs or so.  I have collected the 
> backtraces of all the crashes but do not yet have a test case that can 
> reliably reproduce any of them.






[jira] [Updated] (DISPATCH-358) Intermittent crashes in qdrouterd under load from parallel senders

2016-06-03 Thread Vishal Sharda (JIRA)

 [ 
https://issues.apache.org/jira/browse/DISPATCH-358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vishal Sharda updated DISPATCH-358:
---
Attachment: Crash_SR_2.png
Crash_SR_1.png
Crash_SASL.png
Crash_sasl_server_done.png
Crash_SASL_2.png
Crash_Java_Send.png
Crash_Java_free_qd_connection.png
Crash_free.png

Attached screenshots showing backtraces of the router crashes that happened 
intermittently during my testing.

> Intermittent crashes in qdrouterd under load from parallel senders
> --
>
> Key: DISPATCH-358
> URL: https://issues.apache.org/jira/browse/DISPATCH-358
> Project: Qpid Dispatch
>  Issue Type: Bug
>  Components: Routing Engine
>Affects Versions: 0.6.0
> Environment: Debian 8.3, Apache Qpid Proton 0.12.2 for drivers and 
> dependencies, Hardware: 2 CPUs, 15 GB RAM, 60 GB HDD on 2 separate machines
>Reporter: Vishal Sharda
>Priority: Critical
> Attachments: Crash_Java_Send.png, Crash_Java_free_qd_connection.png, 
> Crash_SASL.png, Crash_SASL_2.png, Crash_SR_1.png, Crash_SR_2.png, 
> Crash_free.png, Crash_sasl_server_done.png
>
>
> In my setup of two inter-connected routers, several senders connecting to one 
> router while few receivers connecting to the other router, I see several 
> crashes in the router to which senders connect.  These crashes are 
> intermittent and happen once in every 10 runs or so.  I have collected the 
> backtraces of all the crashes but do not yet have a test case that can 
> reliably reproduce any of them.






[jira] [Created] (DISPATCH-358) Intermittent crashes in qdrouterd under load from parallel senders

2016-06-03 Thread Vishal Sharda (JIRA)
Vishal Sharda created DISPATCH-358:
--

 Summary: Intermittent crashes in qdrouterd under load from 
parallel senders
 Key: DISPATCH-358
 URL: https://issues.apache.org/jira/browse/DISPATCH-358
 Project: Qpid Dispatch
  Issue Type: Bug
  Components: Routing Engine
Affects Versions: 0.6.0
 Environment: Debian 8.3, Apache Qpid Proton 0.12.2 for drivers and 
dependencies, Hardware: 2 CPUs, 15 GB RAM, 60 GB HDD on 2 separate machines
Reporter: Vishal Sharda
Priority: Critical


In my setup of two inter-connected routers, several senders connecting to one 
router while few receivers connecting to the other router, I see several 
crashes in the router to which senders connect.  These crashes are intermittent 
and happen once in every 10 runs or so.  I have collected the backtraces of all 
the crashes but do not yet have a test case that can reliably reproduce any of 
them.







[jira] [Updated] (DISPATCH-337) Huge memory leaks in Qpid Dispatch router

2016-06-03 Thread Vishal Sharda (JIRA)

 [ 
https://issues.apache.org/jira/browse/DISPATCH-337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vishal Sharda updated DISPATCH-337:
---
Attachment: Memory_usage_subsequent_run_no_SSL.png
Memory_usage_first_run_no_SSL.png

Attached two files: Memory_usage_first_run_no_SSL.png and 
Memory_usage_subsequent_run_no_SSL.png

I ran another test - two routers connected to each other, several senders 
connecting to first router and 1 receiver connecting to the second router.  All 
connections were insecure (no SSL at all).  I see the same type of memory leaks 
but the pace of growth in resident memory usage is nearly halved.

SSL amplifies the leaks but they occur without SSL also.


> Huge memory leaks in Qpid Dispatch router
> -
>
> Key: DISPATCH-337
> URL: https://issues.apache.org/jira/browse/DISPATCH-337
> Project: Qpid Dispatch
>  Issue Type: Bug
>Affects Versions: 0.6.0
> Environment: Debian 8.3, Apache Qpid Proton 0.12.2 for drivers and 
> dependencies, Hardware: 2 CPUs, 15 GB RAM, 60 GB HDD on 2 separate machines
>Reporter: Vishal Sharda
>Priority: Critical
> Attachments: Memory_usage_first_run_no_SSL.png, 
> Memory_usage_subsequent_run_no_SSL.png, Rapid_perm_memory_increase.png, 
> Subsequent_memory_increase.png, config1.conf, config2.conf, 
> val2_receiver.txt, val2_sender.txt
>
>
> Valgrind shows huge memory leaks while running 2 interconnected routers with 
> 2 parallel senders connected to the one router and 2 parallel receivers 
> connected to the other router.
> The CRYPTO leak that is coming from Qpid Proton 0.12.2 is already fixed here:
> https://issues.apache.org/jira/browse/PROTON-1115
> However, the rest of the leaks are from qdrouterd.






[jira] [Updated] (DISPATCH-337) Huge memory leaks in Qpid Dispatch router

2016-06-02 Thread Vishal Sharda (JIRA)

 [ 
https://issues.apache.org/jira/browse/DISPATCH-337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vishal Sharda updated DISPATCH-337:
---
Attachment: Subsequent_memory_increase.png
Rapid_perm_memory_increase.png

Two files attached: Rapid_perm_memory_increase.png and 
Subsequent_memory_increase.png

We ran two interconnected routers having a 2-way SSL connection between them and 
accepting 2-way SSL connections from clients.  We connected several senders to 
one router and several receivers to the other.  A total of 250K messages were 
sent from one end to the other.  We saw a rapid increase in the memory usage of 
both routers.  The memory usage never dropped again, and a subsequent identical 
run increased it further.

Since the routers are never intended to be terminated, such memory leaks on all 
subsequent connections and data transfers will eventually lead to routers being 
killed.
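
One way to put numbers on the growth described above is to sample the resident 
memory of each router process before and after every run.  The following is 
only a minimal sketch, assuming a Linux host where /proc/<pid>/status exposes a 
VmRSS line; the helper names are hypothetical and are not part of qdrouterd or 
of the attached drivers.

```python
import re


def parse_vm_rss_kb(status_text):
    """Return the VmRSS value in kB from /proc/<pid>/status text, or None."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def read_rss_kb(pid):
    """Read the current resident set size (kB) of a running process (Linux only)."""
    with open(f"/proc/{pid}/status") as handle:
        return parse_vm_rss_kb(handle.read())


def rss_growth(samples_kb):
    """Return the change in resident memory between consecutive samples."""
    return [later - earlier for earlier, later in zip(samples_kb, samples_kb[1:])]
```

Calling read_rss_kb() with the router's PID after each identical run and 
passing the collected values to rss_growth() shows whether the memory ever 
drops back between runs.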

> Huge memory leaks in Qpid Dispatch router
> -
>
> Key: DISPATCH-337
> URL: https://issues.apache.org/jira/browse/DISPATCH-337
> Project: Qpid Dispatch
>  Issue Type: Bug
>Affects Versions: 0.6.0
> Environment: Debian 8.3, Apache Qpid Proton 0.12.2 for drivers and 
> dependencies, Hardware: 2 CPUs, 15 GB RAM, 60 GB HDD on 2 separate machines
>Reporter: Vishal Sharda
>Priority: Critical
> Attachments: Rapid_perm_memory_increase.png, 
> Subsequent_memory_increase.png, config1.conf, config2.conf, 
> val2_receiver.txt, val2_sender.txt
>
>
> Valgrind shows huge memory leaks while running 2 interconnected routers with 
> 2 parallel senders connected to the one router and 2 parallel receivers 
> connected to the other router.
> The CRYPTO leak that is coming from Qpid Proton 0.12.2 is already fixed here:
> https://issues.apache.org/jira/browse/PROTON-1115
> However, the rest of the leaks are from qdrouterd.






[jira] [Commented] (DISPATCH-343) Router stops accepting connections after load from parallel senders

2016-05-31 Thread Vishal Sharda (JIRA)

[ 
https://issues.apache.org/jira/browse/DISPATCH-343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15308664#comment-15308664
 ] 

Vishal Sharda commented on DISPATCH-343:


The tracker used by the sender is different from the one used by the receiver.  
Hence, the receiver does get() followed by accept() and the sender does 
settle().  We see that even if we do not call settle() at either end, the crash 
still occurs with 6 senders and 1 receiver.


> Router stops accepting connections after load from parallel senders
> ---
>
> Key: DISPATCH-343
> URL: https://issues.apache.org/jira/browse/DISPATCH-343
> Project: Qpid Dispatch
>  Issue Type: Bug
>  Components: Routing Engine
>Affects Versions: 0.6.0
>Reporter: Vishal Sharda
>Assignee: Ted Ross
>Priority: Blocker
> Fix For: 0.6.0
>
> Attachments: Connection_aborted.png, Connection_aborted_1.png, 
> Crash.png, Crash_10S_2R.png, R1.conf, R2.conf, R3.conf, 
> Sender_router_crash.png, bt_qd_dealloc.png, bt_qdr_link_cleanup_CT.png, 
> bt_sasl.png, bt_sys_mutex_lock.png, config1_nossl.conf, config2_nossl.conf, 
> drivers.tar.gz, resource-limit-exceeded.png
>
>
> We ran 2 parallel senders and 2 receivers with each sender sending 5 
> messages.  After a while we saw that router stopped accepting connections 
> even from qdstat.  We saw various errors in the logs (screenshots attached).






[jira] [Commented] (DISPATCH-343) Router stops accepting connections after load from parallel senders

2016-05-26 Thread Vishal Sharda (JIRA)

[ 
https://issues.apache.org/jira/browse/DISPATCH-343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15302363#comment-15302363
 ] 

Vishal Sharda commented on DISPATCH-343:


Ted,

Do you have an ETA of the fix?


> Router stops accepting connections after load from parallel senders
> ---
>
> Key: DISPATCH-343
> URL: https://issues.apache.org/jira/browse/DISPATCH-343
> Project: Qpid Dispatch
>  Issue Type: Bug
>  Components: Routing Engine
>Affects Versions: 0.6.0
>Reporter: Vishal Sharda
>Assignee: Ted Ross
>Priority: Blocker
> Fix For: 0.6.0
>
> Attachments: Connection_aborted.png, Connection_aborted_1.png, 
> Crash.png, Crash_10S_2R.png, R1.conf, R2.conf, R3.conf, 
> Sender_router_crash.png, bt_qd_dealloc.png, bt_qdr_link_cleanup_CT.png, 
> bt_sasl.png, bt_sys_mutex_lock.png, config1_nossl.conf, config2_nossl.conf, 
> drivers.tar.gz, resource-limit-exceeded.png
>
>
> We ran 2 parallel senders and 2 receivers with each sender sending 5 
> messages.  After a while we saw that router stopped accepting connections 
> even from qdstat.  We saw various errors in the logs (screenshots attached).






[jira] [Updated] (DISPATCH-343) Router stops accepting connections after load from parallel senders

2016-05-25 Thread Vishal Sharda (JIRA)

 [ 
https://issues.apache.org/jira/browse/DISPATCH-343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vishal Sharda updated DISPATCH-343:
---
Attachment: drivers.tar.gz

Ganesh,

I am attaching my drivers here (drivers.tar.gz).  When you run them against 2 
connected routers from your cluster of 3, you should be able to reproduce the 
crash.

Please run make in the extracted folder and run the drivers as follows:

./recv_no_ssl -n 1 amqp://:/examples

Separate terminal:

./send_no_ssl -m 10 -n 6 -a amqp://:/examples
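
For repeated runs, the two commands above could be started from one small 
launcher instead of two terminals.  This is only a sketch: the flag values are 
copied unchanged from the commands above (their meaning is defined by the 
drivers in drivers.tar.gz), the function names are hypothetical, and the 
address is a placeholder because the host and port are elided in the original 
commands.

```python
import subprocess

# Hypothetical placeholder: the original commands elide the router host and port.
EXAMPLE_ADDRESS = "amqp://ROUTER_HOST:ROUTER_PORT/examples"


def driver_commands(recv_address, send_address):
    """Build the argument lists for both drivers, with flags exactly as given above."""
    recv_cmd = ["./recv_no_ssl", "-n", "1", recv_address]
    send_cmd = ["./send_no_ssl", "-m", "10", "-n", "6", "-a", send_address]
    return recv_cmd, send_cmd


def run_drivers(recv_address, send_address, cwd="."):
    """Start the receiver, then the sender, and return their exit codes."""
    recv_cmd, send_cmd = driver_commands(recv_address, send_address)
    receiver = subprocess.Popen(recv_cmd, cwd=cwd)  # plays the first terminal
    sender = subprocess.Popen(send_cmd, cwd=cwd)    # plays the separate terminal
    return sender.wait(), receiver.wait()
```

run_drivers() expects cwd to be the extracted folder in which make was run, and 
each address should point at the router that the respective driver connects to.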


> Router stops accepting connections after load from parallel senders
> ---
>
> Key: DISPATCH-343
> URL: https://issues.apache.org/jira/browse/DISPATCH-343
> Project: Qpid Dispatch
>  Issue Type: Bug
>  Components: Routing Engine
>Affects Versions: 0.6.0
>Reporter: Vishal Sharda
>Priority: Blocker
> Attachments: Connection_aborted.png, Connection_aborted_1.png, 
> Crash.png, Crash_10S_2R.png, R1.conf, R2.conf, R3.conf, 
> Sender_router_crash.png, bt_qd_dealloc.png, bt_qdr_link_cleanup_CT.png, 
> bt_sasl.png, bt_sys_mutex_lock.png, config1_nossl.conf, config2_nossl.conf, 
> drivers.tar.gz, resource-limit-exceeded.png
>
>
> We ran 2 parallel senders and 2 receivers with each sender sending 5 
> messages.  After a while we saw that router stopped accepting connections 
> even from qdstat.  We saw various errors in the logs (screenshots attached).






[jira] [Commented] (DISPATCH-343) Router stops accepting connections after load from parallel senders

2016-05-25 Thread Vishal Sharda (JIRA)

[ 
https://issues.apache.org/jira/browse/DISPATCH-343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15300571#comment-15300571
 ] 

Vishal Sharda commented on DISPATCH-343:


Ganesh, can you provide the number of messages per sender that you tested with 
and the sender/receiver code that you used?  Also, it would help if you can 
test some scenarios in which senders outnumber receivers.

> Router stops accepting connections after load from parallel senders
> ---
>
> Key: DISPATCH-343
> URL: https://issues.apache.org/jira/browse/DISPATCH-343
> Project: Qpid Dispatch
>  Issue Type: Bug
>  Components: Routing Engine
>Affects Versions: 0.6.0
>Reporter: Vishal Sharda
>Priority: Blocker
> Attachments: Connection_aborted.png, Connection_aborted_1.png, 
> Crash.png, Crash_10S_2R.png, R1.conf, R2.conf, R3.conf, 
> Sender_router_crash.png, bt_qd_dealloc.png, bt_qdr_link_cleanup_CT.png, 
> bt_sasl.png, bt_sys_mutex_lock.png, config1_nossl.conf, config2_nossl.conf, 
> resource-limit-exceeded.png
>
>
> We ran 2 parallel senders and 2 receivers with each sender sending 5 
> messages.  After a while we saw that router stopped accepting connections 
> even from qdstat.  We saw various errors in the logs (screenshots attached).






[jira] [Commented] (DISPATCH-343) Router stops accepting connections after load from parallel senders

2016-05-24 Thread Vishal Sharda (JIRA)

[ 
https://issues.apache.org/jira/browse/DISPATCH-343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15299218#comment-15299218
 ] 

Vishal Sharda commented on DISPATCH-343:


Also hit the crash reported in bt_qd_dealloc.png without SSL.

> Router stops accepting connections after load from parallel senders
> ---
>
> Key: DISPATCH-343
> URL: https://issues.apache.org/jira/browse/DISPATCH-343
> Project: Qpid Dispatch
>  Issue Type: Bug
>  Components: Routing Engine
>Affects Versions: 0.6.0
>Reporter: Vishal Sharda
>Priority: Blocker
> Attachments: Connection_aborted.png, Connection_aborted_1.png, 
> Crash.png, Crash_10S_2R.png, R1.conf, R2.conf, R3.conf, 
> Sender_router_crash.png, bt_qd_dealloc.png, bt_qdr_link_cleanup_CT.png, 
> bt_sasl.png, bt_sys_mutex_lock.png, config1_nossl.conf, config2_nossl.conf, 
> resource-limit-exceeded.png
>
>
> We ran 2 parallel senders and 2 receivers with each sender sending 5 
> messages.  After a while we saw that router stopped accepting connections 
> even from qdstat.  We saw various errors in the logs (screenshots attached).






[jira] [Updated] (DISPATCH-343) Router stops accepting connections after load from parallel senders

2016-05-24 Thread Vishal Sharda (JIRA)

 [ 
https://issues.apache.org/jira/browse/DISPATCH-343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vishal Sharda updated DISPATCH-343:
---
Attachment: config2_nossl.conf
config1_nossl.conf

The two config files used for no SSL configuration of two routers.

> Router stops accepting connections after load from parallel senders
> ---
>
> Key: DISPATCH-343
> URL: https://issues.apache.org/jira/browse/DISPATCH-343
> Project: Qpid Dispatch
>  Issue Type: Bug
>  Components: Routing Engine
>Affects Versions: 0.6.0
>Reporter: Vishal Sharda
>Priority: Blocker
> Attachments: Connection_aborted.png, Connection_aborted_1.png, 
> Crash.png, Crash_10S_2R.png, R1.conf, R2.conf, R3.conf, 
> Sender_router_crash.png, bt_qd_dealloc.png, bt_qdr_link_cleanup_CT.png, 
> bt_sasl.png, bt_sys_mutex_lock.png, config1_nossl.conf, config2_nossl.conf, 
> resource-limit-exceeded.png
>
>
> We ran 2 parallel senders and 2 receivers with each sender sending 5 
> messages.  After a while we saw that router stopped accepting connections 
> even from qdstat.  We saw various errors in the logs (screenshots attached).






[jira] [Commented] (DISPATCH-343) Router stops accepting connections after load from parallel senders

2016-05-24 Thread Vishal Sharda (JIRA)

[ 
https://issues.apache.org/jira/browse/DISPATCH-343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15299200#comment-15299200
 ] 

Vishal Sharda commented on DISPATCH-343:


I have completely removed SSL.  The two config files are attached.  Still, I 
hit the crashes that I reported in bt_qdr_link_cleanup_CT.png and 
bt_sys_mutex_lock.png.

> Router stops accepting connections after load from parallel senders
> ---
>
> Key: DISPATCH-343
> URL: https://issues.apache.org/jira/browse/DISPATCH-343
> Project: Qpid Dispatch
>  Issue Type: Bug
>  Components: Routing Engine
>Affects Versions: 0.6.0
>Reporter: Vishal Sharda
>Priority: Blocker
> Attachments: Connection_aborted.png, Connection_aborted_1.png, 
> Crash.png, Crash_10S_2R.png, R1.conf, R2.conf, R3.conf, 
> Sender_router_crash.png, bt_qd_dealloc.png, bt_qdr_link_cleanup_CT.png, 
> bt_sasl.png, bt_sys_mutex_lock.png, resource-limit-exceeded.png
>
>
> We ran 2 parallel senders and 2 receivers with each sender sending 5 
> messages.  After a while we saw that router stopped accepting connections 
> even from qdstat.  We saw various errors in the logs (screenshots attached).






[jira] [Updated] (DISPATCH-343) Router stops accepting connections after load from parallel senders

2016-05-24 Thread Vishal Sharda (JIRA)

 [ 
https://issues.apache.org/jira/browse/DISPATCH-343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vishal Sharda updated DISPATCH-343:
---
Attachment: bt_qdr_link_cleanup_CT.png

One more backtrace of a crash.

> Router stops accepting connections after load from parallel senders
> ---
>
> Key: DISPATCH-343
> URL: https://issues.apache.org/jira/browse/DISPATCH-343
> Project: Qpid Dispatch
>  Issue Type: Bug
>  Components: Routing Engine
>Affects Versions: 0.6.0
>Reporter: Vishal Sharda
>Priority: Blocker
> Attachments: Connection_aborted.png, Connection_aborted_1.png, 
> Crash.png, Crash_10S_2R.png, R1.conf, R2.conf, R3.conf, 
> Sender_router_crash.png, bt_qd_dealloc.png, bt_qdr_link_cleanup_CT.png, 
> bt_sasl.png, bt_sys_mutex_lock.png, resource-limit-exceeded.png
>
>
> We ran 2 parallel senders and 2 receivers with each sender sending 5 
> messages.  After a while we saw that router stopped accepting connections 
> even from qdstat.  We saw various errors in the logs (screenshots attached).






[jira] [Updated] (DISPATCH-343) Router stops accepting connections after load from parallel senders

2016-05-24 Thread Vishal Sharda (JIRA)

 [ 
https://issues.apache.org/jira/browse/DISPATCH-343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vishal Sharda updated DISPATCH-343:
---
Attachment: bt_sys_mutex_lock.png
bt_sasl.png
bt_qd_dealloc.png

Screenshots with backtrace under different types of crashes.

> Router stops accepting connections after load from parallel senders
> ---
>
> Key: DISPATCH-343
> URL: https://issues.apache.org/jira/browse/DISPATCH-343
> Project: Qpid Dispatch
>  Issue Type: Bug
>  Components: Routing Engine
>Affects Versions: 0.6.0
>Reporter: Vishal Sharda
>Priority: Blocker
> Attachments: Connection_aborted.png, Connection_aborted_1.png, 
> Crash.png, Crash_10S_2R.png, R1.conf, R2.conf, R3.conf, 
> Sender_router_crash.png, bt_qd_dealloc.png, bt_sasl.png, 
> bt_sys_mutex_lock.png, resource-limit-exceeded.png
>
>
> We ran 2 parallel senders and 2 receivers with each sender sending 5 
> messages.  After a while we saw that router stopped accepting connections 
> even from qdstat.  We saw various errors in the logs (screenshots attached).






[jira] [Commented] (DISPATCH-343) Router stops accepting connections after load from parallel senders

2016-05-24 Thread Vishal Sharda (JIRA)

[ 
https://issues.apache.org/jira/browse/DISPATCH-343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15298571#comment-15298571
 ] 

Vishal Sharda commented on DISPATCH-343:


Environment for the tests using 2 latest routers:

Debian 8.3, Apache Qpid Proton 0.12.2 for drivers and dependencies, Hardware: 2 
CPUs, 15 GB RAM, 60 GB HDD on 2 separate machines.

OpenSSL information:

vsharda@millennium-qpid-untouched-latest-2-8501:~$ dpkg -l | grep openssl
ii  libcurl4-openssl-dev:amd64  7.38.0-4+deb8u3  amd64  development files and documentation for libcurl (OpenSSL flavour)
ii  libgnutls-openssl27:amd64   3.3.8-6+deb8u3   amd64  GNU TLS library - OpenSSL wrapper
ii  openssl                     1.0.1k-3+deb8u5  amd64  Secure Sockets Layer toolkit - cryptographic utility


> Router stops accepting connections after load from parallel senders
> ---
>
> Key: DISPATCH-343
> URL: https://issues.apache.org/jira/browse/DISPATCH-343
> Project: Qpid Dispatch
>  Issue Type: Bug
>  Components: Routing Engine
>Affects Versions: 0.6.0
>Reporter: Vishal Sharda
>Priority: Blocker
> Attachments: Connection_aborted.png, Connection_aborted_1.png, 
> Crash.png, Crash_10S_2R.png, R1.conf, R2.conf, R3.conf, 
> Sender_router_crash.png, resource-limit-exceeded.png
>
>
> We ran 2 parallel senders and 2 receivers with each sender sending 5 
> messages.  After a while we saw that router stopped accepting connections 
> even from qdstat.  We saw various errors in the logs (screenshots attached).






[jira] [Commented] (DISPATCH-337) Huge memory leaks in Qpid Dispatch router

2016-05-24 Thread Vishal Sharda (JIRA)

[ 
https://issues.apache.org/jira/browse/DISPATCH-337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15298556#comment-15298556
 ] 

Vishal Sharda commented on DISPATCH-337:


According to the attached valgrind report (val2_sender.txt), there are 18 
places where memory is "definitely" lost.  All of these leaks seem to be coming 
from router C code.
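
A count like this can be re-derived from the report with a short script.  The 
sketch below assumes the report uses the standard Memcheck leak-record lines 
("... bytes in N blocks are definitely lost in loss record ..."); the function 
name is hypothetical.

```python
import re

# Matches Memcheck leak records such as:
#   ==4242== 48 bytes in 1 blocks are definitely lost in loss record 3 of 10
#   ==4242== 1,024 (512 direct, 512 indirect) bytes in 2 blocks are definitely lost ...
DEFINITELY_LOST = re.compile(
    r"^==\d+==\s+([\d,]+) (?:\([\d,]+ direct, [\d,]+ indirect\) )?"
    r"bytes in [\d,]+ blocks are definitely lost",
    re.MULTILINE,
)


def definitely_lost_summary(report_text):
    """Return (number of 'definitely lost' records, total bytes they report)."""
    sizes = [int(m.group(1).replace(",", "")) for m in DEFINITELY_LOST.finditer(report_text)]
    return len(sizes), sum(sizes)
```

Reading val2_sender.txt into a string and passing it to 
definitely_lost_summary() gives the number of leak records and the bytes they 
cover.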

> Huge memory leaks in Qpid Dispatch router
> -
>
> Key: DISPATCH-337
> URL: https://issues.apache.org/jira/browse/DISPATCH-337
> Project: Qpid Dispatch
>  Issue Type: Bug
>Affects Versions: 0.6.0
> Environment: Debian 8.3, Apache Qpid Proton 0.12.2 for drivers and 
> dependencies, Hardware: 2 CPUs, 15 GB RAM, 60 GB HDD on 2 separate machines
>Reporter: Vishal Sharda
>Priority: Critical
> Attachments: config1.conf, config2.conf, val2_receiver.txt, 
> val2_sender.txt
>
>
> Valgrind shows huge memory leaks while running 2 interconnected routers with 
> 2 parallel senders connected to the one router and 2 parallel receivers 
> connected to the other router.
> The CRYPTO leak that is coming from Qpid Proton 0.12.2 is already fixed here:
> https://issues.apache.org/jira/browse/PROTON-1115
> However, the rest of the leaks are from qdrouterd.






[jira] [Updated] (DISPATCH-343) Router stops accepting connections after load from parallel senders

2016-05-24 Thread Vishal Sharda (JIRA)

 [ 
https://issues.apache.org/jira/browse/DISPATCH-343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vishal Sharda updated DISPATCH-343:
---
Attachment: Crash_10S_2R.png

Router crash with 10 Senders attached to one router and 2 Receivers attached to 
the second router.

> Router stops accepting connections after load from parallel senders
> ---
>
> Key: DISPATCH-343
> URL: https://issues.apache.org/jira/browse/DISPATCH-343
> Project: Qpid Dispatch
>  Issue Type: Bug
>  Components: Routing Engine
>Affects Versions: 0.6.0
>Reporter: Vishal Sharda
>Priority: Blocker
> Attachments: Connection_aborted.png, Connection_aborted_1.png, 
> Crash.png, Crash_10S_2R.png, R1.conf, R2.conf, R3.conf, 
> Sender_router_crash.png, resource-limit-exceeded.png
>
>
> We ran 2 parallel senders and 2 receivers with each sender sending 5 
> messages.  After a while we saw that router stopped accepting connections 
> even from qdstat.  We saw various errors in the logs (screenshots attached).






[jira] [Commented] (DISPATCH-343) Router stops accepting connections after load from parallel senders

2016-05-24 Thread Vishal Sharda (JIRA)

[ 
https://issues.apache.org/jira/browse/DISPATCH-343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15298528#comment-15298528
 ] 

Vishal Sharda commented on DISPATCH-343:


I will switch to a Debug build now and try to collect debug information.

Until then, here is another crash (Crash_10S_2R.png) that I saw with 10 senders 
sending to one router and 2 receivers receiving from the other router.  This 
crash is due to an invalid pointer.


> Router stops accepting connections after load from parallel senders
> ---
>
> Key: DISPATCH-343
> URL: https://issues.apache.org/jira/browse/DISPATCH-343
> Project: Qpid Dispatch
>  Issue Type: Bug
>  Components: Routing Engine
>Affects Versions: 0.6.0
>Reporter: Vishal Sharda
>Priority: Blocker
> Attachments: Connection_aborted.png, Connection_aborted_1.png, 
> Crash.png, R1.conf, R2.conf, R3.conf, Sender_router_crash.png, 
> resource-limit-exceeded.png
>
>
> We ran 2 parallel senders and 2 receivers with each sender sending 5 
> messages.  After a while we saw that router stopped accepting connections 
> even from qdstat.  We saw various errors in the logs (screenshots attached).






[jira] [Updated] (DISPATCH-343) Router stops accepting connections after load from parallel senders

2016-05-23 Thread Vishal Sharda (JIRA)

 [ 
https://issues.apache.org/jira/browse/DISPATCH-343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vishal Sharda updated DISPATCH-343:
---
Attachment: Sender_router_crash.png

Crash in router due to assertion failure.

> Router stops accepting connections after load from parallel senders
> ---
>
> Key: DISPATCH-343
> URL: https://issues.apache.org/jira/browse/DISPATCH-343
> Project: Qpid Dispatch
>  Issue Type: Bug
>  Components: Routing Engine
>Affects Versions: 0.6.0
>Reporter: Vishal Sharda
>Priority: Blocker
> Attachments: Connection_aborted.png, Connection_aborted_1.png, 
> Crash.png, R1.conf, R2.conf, R3.conf, Sender_router_crash.png, 
> resource-limit-exceeded.png
>
>
> We ran 2 parallel senders and 2 receivers with each sender sending 5 
> messages.  After a while we saw that router stopped accepting connections 
> even from qdstat.  We saw various errors in the logs (screenshots attached).






[jira] [Commented] (DISPATCH-343) Router stops accepting connections after load from parallel senders

2016-05-23 Thread Vishal Sharda (JIRA)

[ 
https://issues.apache.org/jira/browse/DISPATCH-343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15297247#comment-15297247
 ] 

Vishal Sharda commented on DISPATCH-343:


Another test against two latest routers connected to each other with 4 parallel 
senders each sending 50K "Hello World!" messages to one router and 4 parallel 
receivers receiving from another router:

Receivers only received 64,390 total messages out of 200,000 and exited after 
timeout.  The router to which they were connected seemed fine.  On the other 
hand, all 4 senders started receiving timeout on their messages.  As soon as 
the senders were killed, the router to which these senders were connected also 
crashed.  The screenshot Sender_router_crash.png is attached.  I have seen this 
assertion failure several times before at both the ends.
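
The shortfall in this run can be checked with a few lines of arithmetic.  The 
figures come from the description above (4 senders, 50K messages each, 64,390 
messages received); the function name is hypothetical.

```python
def delivery_summary(senders, messages_per_sender, received):
    """Return (expected messages, missing messages, fraction delivered)."""
    expected = senders * messages_per_sender
    missing = expected - received
    return expected, missing, received / expected


expected, missing, fraction = delivery_summary(4, 50_000, 64_390)
print(f"expected={expected} missing={missing} delivered={fraction:.1%}")
```

For this run, 135,610 of the 200,000 messages never reached the receivers, so 
only about 32% were delivered before the timeout.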

> Router stops accepting connections after load from parallel senders
> ---
>
> Key: DISPATCH-343
> URL: https://issues.apache.org/jira/browse/DISPATCH-343
> Project: Qpid Dispatch
>  Issue Type: Bug
>  Components: Routing Engine
>Affects Versions: 0.6.0
>Reporter: Vishal Sharda
>Priority: Blocker
> Attachments: Connection_aborted.png, Connection_aborted_1.png, 
> Crash.png, R1.conf, R2.conf, R3.conf, resource-limit-exceeded.png
>
>
> We ran 2 parallel senders and 2 receivers with each sender sending 5 
> messages.  After a while we saw that router stopped accepting connections 
> even from qdstat.  We saw various errors in the logs (screenshots attached).






[jira] [Comment Edited] (DISPATCH-343) Router stops accepting connections after load from parallel senders

2016-05-23 Thread Vishal Sharda (JIRA)

[ 
https://issues.apache.org/jira/browse/DISPATCH-343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15297112#comment-15297112
 ] 

Vishal Sharda edited comment on DISPATCH-343 at 5/23/16 10:25 PM:
--

I just ran against two latest routers connected to each other with 2 parallel 
senders each sending 50K "Hello World!" messages to one router and 2 parallel 
receivers receiving from another router and saw a crash.  The attached 
screenshot Crash.png shows a double free error.


was (Author: vsharda):
I just ran against two latest routers connected to each other with 2 parallel 
senders sending to one router and 2 parallel receivers receiving from another 
router and saw a crash.  The screenshot crash.png is attached showing double 
free error.

> Router stops accepting connections after load from parallel senders
> ---
>
> Key: DISPATCH-343
> URL: https://issues.apache.org/jira/browse/DISPATCH-343
> Project: Qpid Dispatch
>  Issue Type: Bug
>  Components: Routing Engine
>Affects Versions: 0.6.0
>Reporter: Vishal Sharda
>Priority: Blocker
> Attachments: Connection_aborted.png, Connection_aborted_1.png, 
> Crash.png, R1.conf, R2.conf, R3.conf, resource-limit-exceeded.png
>
>
> We ran 2 parallel senders and 2 receivers with each sender sending 5 
> messages.  After a while we saw that router stopped accepting connections 
> even from qdstat.  We saw various errors in the logs (screenshots attached).






[jira] [Commented] (DISPATCH-343) Router stops accepting connections after load from parallel senders

2016-05-23 Thread Vishal Sharda (JIRA)

[ 
https://issues.apache.org/jira/browse/DISPATCH-343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15297112#comment-15297112
 ] 

Vishal Sharda commented on DISPATCH-343:


I just ran against two latest routers connected to each other with 2 parallel 
senders sending to one router and 2 parallel receivers receiving from another 
router and saw a crash.  The attached screenshot Crash.png shows a double free 
error.

> Router stops accepting connections after load from parallel senders
> ---
>
> Key: DISPATCH-343
> URL: https://issues.apache.org/jira/browse/DISPATCH-343
> Project: Qpid Dispatch
>  Issue Type: Bug
>  Components: Routing Engine
>Affects Versions: 0.6.0
>Reporter: Vishal Sharda
>Priority: Blocker
> Attachments: Connection_aborted.png, Connection_aborted_1.png, 
> Crash.png, R1.conf, R2.conf, R3.conf, resource-limit-exceeded.png
>
>
> We ran 2 parallel senders and 2 receivers with each sender sending 5 
> messages.  After a while we saw that router stopped accepting connections 
> even from qdstat.  We saw various errors in the logs (screenshots attached).






[jira] [Updated] (DISPATCH-343) Router stops accepting connections after load from parallel senders

2016-05-23 Thread Vishal Sharda (JIRA)

 [ 
https://issues.apache.org/jira/browse/DISPATCH-343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vishal Sharda updated DISPATCH-343:
---
Attachment: Crash.png

Screenshot showing a crash in the router to which the 2 receivers were connected.

> Router stops accepting connections after load from parallel senders
> ---
>
> Key: DISPATCH-343
> URL: https://issues.apache.org/jira/browse/DISPATCH-343
> Project: Qpid Dispatch
>  Issue Type: Bug
>  Components: Routing Engine
>Affects Versions: 0.6.0
>Reporter: Vishal Sharda
>Priority: Blocker
> Attachments: Connection_aborted.png, Connection_aborted_1.png, 
> Crash.png, R1.conf, R2.conf, R3.conf, resource-limit-exceeded.png
>
>
> We ran 2 parallel senders and 2 receivers with each sender sending 5 
> messages.  After a while we saw that router stopped accepting connections 
> even from qdstat.  We saw various errors in the logs (screenshots attached).






[jira] [Commented] (DISPATCH-343) Router stops accepting connections after load from parallel senders

2016-05-23 Thread Vishal Sharda (JIRA)

[ 
https://issues.apache.org/jira/browse/DISPATCH-343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15297053#comment-15297053
 ] 

Vishal Sharda commented on DISPATCH-343:


This is how I start the two senders (-n 2 makes the code fork() two sender 
processes, and a UUID is set on each message):

time ./send_ssl -c /home/vsharda/protected/switch-dr-network_cert.pem -k 
/home/vsharda/protected/switch-dr-network_key.pem -p  -m 5 -n 2 
amqps://guest:guest@10.24.170.251:5671/foo 

The receivers:

time ./recv_ssl -c /home/vsharda/protected/switch-dr-network_cert.pem -k 
/home/vsharda/protected/switch-dr-network_key.pem -p  -n 2 
amqps://guest:guest@10.24.170.251:5671/foo 
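
For readers who do not have the send_ssl source, the -n 2 behaviour described 
above can be sketched roughly as below.  This is only an illustration in 
Python, not the actual driver: build_messages, run_sender and 
start_parallel_senders are hypothetical names, and the real send call is 
replaced by a stub that just records the UUID of each message.

```python
import multiprocessing
import uuid


def build_messages(count):
    """Return `count` message dicts, each tagged with a fresh UUID."""
    return [{"id": str(uuid.uuid4()), "body": "payload %d" % i} for i in range(count)]


def run_sender(sender_index, count, results):
    """Hypothetical sender body: a real driver would open a connection here."""
    for message in build_messages(count):
        # Stand-in for the real send call; we only record what would be sent.
        results.put((sender_index, message["id"]))


def start_parallel_senders(num_senders, count):
    """Start `num_senders` processes (the role fork() plays for -n) and collect IDs."""
    results = multiprocessing.Queue()
    workers = [
        multiprocessing.Process(target=run_sender, args=(i, count, results))
        for i in range(num_senders)
    ]
    for worker in workers:
        worker.start()
    # Drain the queue before join() so a child never blocks on a full pipe.
    sent = [results.get() for _ in range(num_senders * count)]
    for worker in workers:
        worker.join()
    return sent


if __name__ == "__main__":
    sent = start_parallel_senders(num_senders=2, count=5)
    print(len(sent), "messages,", len({mid for _, mid in sent}), "unique IDs")
```

Tagging every message with its own UUID is what lets the receiving side tell 
the two senders' messages apart and spot duplicates or gaps.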

> Router stops accepting connections after load from parallel senders
> ---
>
> Key: DISPATCH-343
> URL: https://issues.apache.org/jira/browse/DISPATCH-343
> Project: Qpid Dispatch
>  Issue Type: Bug
>  Components: Routing Engine
>Affects Versions: 0.6.0
>Reporter: Vishal Sharda
>Priority: Blocker
> Attachments: Connection_aborted.png, Connection_aborted_1.png, 
> R1.conf, R2.conf, R3.conf, resource-limit-exceeded.png
>
>
> We ran 2 parallel senders and 2 receivers with each sender sending 5 
> messages.  After a while we saw that router stopped accepting connections 
> even from qdstat.  We saw various errors in the logs (screenshots attached).






[jira] [Updated] (DISPATCH-343) Router stops accepting connections after load from parallel senders

2016-05-23 Thread Vishal Sharda (JIRA)

 [ 
https://issues.apache.org/jira/browse/DISPATCH-343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vishal Sharda updated DISPATCH-343:
---
Attachment: R3.conf
R2.conf
R1.conf

The three configuration files used for the routers.

> Router stops accepting connections after load from parallel senders
> ---
>
> Key: DISPATCH-343
> URL: https://issues.apache.org/jira/browse/DISPATCH-343
> Project: Qpid Dispatch
>  Issue Type: Bug
>  Components: Routing Engine
>Affects Versions: 0.6.0
>Reporter: Vishal Sharda
>Priority: Blocker
> Attachments: Connection_aborted.png, Connection_aborted_1.png, 
> R1.conf, R2.conf, R3.conf, resource-limit-exceeded.png
>
>
> We ran 2 parallel senders and 2 receivers with each sender sending 5 
> messages.  After a while we saw that router stopped accepting connections 
> even from qdstat.  We saw various errors in the logs (screenshots attached).






[jira] [Commented] (DISPATCH-343) Router stops accepting connections after load from parallel senders

2016-05-23 Thread Vishal Sharda (JIRA)

[ 
https://issues.apache.org/jira/browse/DISPATCH-343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15296625#comment-15296625
 ] 

Vishal Sharda commented on DISPATCH-343:


Hi Ted,

This test was run with a cluster of 3 routers, R1, R2 and R3, all configured for 
interior mode.  R2 has an inter-router connector to R1, and R3 has two 
inter-router connectors - one to R1 and the other to R2.  All the connectors use 
2-way SSL.
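
To make the topology above easier to check, here is a small Python sketch that 
turns the three connectors into a link map.  It assumes that each inter-router 
connector, once established, gives a usable link in both directions; the 
names CONNECTORS, build_links and is_full_mesh are made up for this 
illustration and are not part of Qpid Dispatch.

```python
# Connectors as described: (router that owns the connector, router it dials).
CONNECTORS = [("R2", "R1"), ("R3", "R1"), ("R3", "R2")]


def build_links(connectors):
    """Turn one-way connector definitions into an undirected adjacency map."""
    links = {}
    for source, target in connectors:
        links.setdefault(source, set()).add(target)
        links.setdefault(target, set()).add(source)
    return links


def is_full_mesh(links):
    """True when every router has a direct link to every other router."""
    routers = set(links)
    return all(links[router] == routers - {router} for router in routers)


if __name__ == "__main__":
    links = build_links(CONNECTORS)
    for router in sorted(links):
        print(router, "->", sorted(links[router]))
    print("full mesh:", is_full_mesh(links))
```

Under that assumption the three connectors form a full mesh, so every router 
has a direct link to both of the others.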

Our driver is based on the Proton-C Messenger API and we are now doing unsettled 
deliveries (we stopped doing pre-settled after your response on DISPATCH-336).  
We had configured fixedAddress "/" for closest distribution but have noticed 
that support for fixedAddress "/" is now gone.  Hence, all our queues are 
configured for balanced distribution, which is the default.

In our tests, we ran 2 parallel senders sending to R1 and 2 parallel receivers 
also receiving from R1.  All senders and receivers were using the same queue.  
R1 as well as R2 have become completely unresponsive and are not accepting 
incoming connections, even from qdstat.

R1 has grown to around 919 MB resident.  Even R2 has grown to 233 MB resident 
and R3 to 146 MB.  All three of them are still leaking memory.
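
Resident sizes like the ones above can be sampled on Linux from the VmRSS line 
of /proc/<pid>/status.  The following Python sketch shows one way to do that 
and to flag steady growth; the function names and the tolerance value are 
assumptions chosen for illustration, not a tool that was used in this test.

```python
import re


def parse_vm_rss_kb(status_text):
    """Extract the VmRSS value (in kB) from /proc/<pid>/status text, or None."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def read_rss_mb(pid):
    """Read the resident set size of `pid` in MB (Linux only)."""
    with open("/proc/%d/status" % pid) as handle:
        rss_kb = parse_vm_rss_kb(handle.read())
    return None if rss_kb is None else rss_kb / 1024.0


def is_growing(samples_mb, tolerance_mb=1.0):
    """True if each sample exceeds the previous one by more than the tolerance."""
    if len(samples_mb) < 2:
        return False
    return all(b - a > tolerance_mb for a, b in zip(samples_mb, samples_mb[1:]))
```

Calling read_rss_mb with the qdrouterd process ID at a fixed interval and 
passing the samples to is_growing gives a crude signal that memory keeps 
climbing while the load is steady.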

Thanks,
Vishal

> Router stops accepting connections after load from parallel senders
> ---
>
> Key: DISPATCH-343
> URL: https://issues.apache.org/jira/browse/DISPATCH-343
> Project: Qpid Dispatch
>  Issue Type: Bug
>  Components: Routing Engine
>Affects Versions: 0.6.0
>Reporter: Vishal Sharda
>Priority: Blocker
> Attachments: Connection_aborted.png, Connection_aborted_1.png, 
> resource-limit-exceeded.png
>
>
> We ran 2 parallel senders and 2 receivers with each sender sending 5 
> messages.  After a while we saw that router stopped accepting connections 
> even from qdstat.  We saw various errors in the logs (screenshots attached).






[jira] [Commented] (DISPATCH-343) Router stops accepting connections after load from parallel senders

2016-05-21 Thread Vishal Sharda (JIRA)

[ 
https://issues.apache.org/jira/browse/DISPATCH-343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15295050#comment-15295050
 ] 

Vishal Sharda commented on DISPATCH-343:


Because of the disrupted connections, the total of 100K messages also failed to 
arrive.

> Router stops accepting connections after load from parallel senders
> ---
>
> Key: DISPATCH-343
> URL: https://issues.apache.org/jira/browse/DISPATCH-343
> Project: Qpid Dispatch
>  Issue Type: Bug
>  Components: Routing Engine
>Affects Versions: 0.6.0
>Reporter: Vishal Sharda
>Priority: Blocker
> Attachments: Connection_aborted.png, Connection_aborted_1.png, 
> resource-limit-exceeded.png
>
>
> We ran 2 parallel senders and 2 receivers with each sender sending 5 
> messages.  After a while we saw that router stopped accepting connections 
> even from qdstat.  We saw various errors in the logs (screenshots attached).






[jira] [Updated] (DISPATCH-343) Router stops accepting connections after load from parallel senders

2016-05-20 Thread Vishal Sharda (JIRA)

 [ 
https://issues.apache.org/jira/browse/DISPATCH-343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vishal Sharda updated DISPATCH-343:
---
Attachment: resource-limit-exceeded.png
Connection_aborted.png
Connection_aborted_1.png

Errors in the router after applying load from the senders.

> Router stops accepting connections after load from parallel senders
> ---
>
> Key: DISPATCH-343
> URL: https://issues.apache.org/jira/browse/DISPATCH-343
> Project: Qpid Dispatch
>  Issue Type: Bug
>  Components: Routing Engine
>Affects Versions: 0.6.0
>Reporter: Vishal Sharda
>Priority: Blocker
> Attachments: Connection_aborted.png, Connection_aborted_1.png, 
> resource-limit-exceeded.png
>
>
> We ran 2 parallel senders and 2 receivers with each sender sending 5 
> messages.  After a while we saw that router stopped accepting connections 
> even from qdstat.  We saw various errors in the logs (screenshots attached).






[jira] [Created] (DISPATCH-343) Router stops accepting connections after load from parallel senders

2016-05-20 Thread Vishal Sharda (JIRA)
Vishal Sharda created DISPATCH-343:
--

 Summary: Router stops accepting connections after load from 
parallel senders
 Key: DISPATCH-343
 URL: https://issues.apache.org/jira/browse/DISPATCH-343
 Project: Qpid Dispatch
  Issue Type: Bug
  Components: Routing Engine
Affects Versions: 0.6.0
Reporter: Vishal Sharda
Priority: Blocker


We ran 2 parallel senders and 2 receivers with each sender sending 5 
messages.  After a while we saw that router stopped accepting connections even 
from qdstat.  We saw various errors in the logs (screenshots attached).






[jira] [Updated] (DISPATCH-337) Huge memory leaks in Qpid Dispatch router

2016-05-13 Thread Vishal Sharda (JIRA)

 [ 
https://issues.apache.org/jira/browse/DISPATCH-337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vishal Sharda updated DISPATCH-337:
---
Attachment: val2_sender.txt
val2_receiver.txt
config2.conf
config1.conf

Configuration files for the two routers and the valgrind output from a debug 
build of Qpid Dispatch running with 2 senders and 2 receivers.

> Huge memory leaks in Qpid Dispatch router
> -
>
> Key: DISPATCH-337
> URL: https://issues.apache.org/jira/browse/DISPATCH-337
> Project: Qpid Dispatch
>  Issue Type: Bug
>Affects Versions: 0.6.0
> Environment: Debian 8.3, Apache Qpid Proton 0.12.2 for drivers and 
> dependencies, Hardware: 2 CPUs, 15 GB RAM, 60 GB HDD on 2 separate machines
>Reporter: Vishal Sharda
>Priority: Critical
> Attachments: config1.conf, config2.conf, val2_receiver.txt, 
> val2_sender.txt
>
>
> Valgrind shows huge memory leaks while running 2 interconnected routers with 
> 2 parallel senders connected to one router and 2 parallel receivers 
> connected to the other router.
> The CRYPTO leak that is coming from Qpid Proton 0.12.2 is already fixed here:
> https://issues.apache.org/jira/browse/PROTON-1115
> However, the rest of the leaks are from qdrouterd.






[jira] [Created] (DISPATCH-337) Huge memory leaks in Qpid Dispatch router

2016-05-13 Thread Vishal Sharda (JIRA)
Vishal Sharda created DISPATCH-337:
--

 Summary: Huge memory leaks in Qpid Dispatch router
 Key: DISPATCH-337
 URL: https://issues.apache.org/jira/browse/DISPATCH-337
 Project: Qpid Dispatch
  Issue Type: Bug
Affects Versions: 0.6.0
 Environment: Debian 8.3, Apache Qpid Proton 0.12.2 for drivers and 
dependencies, Hardware: 2 CPUs, 15 GB RAM, 60 GB HDD on 2 separate machines
Reporter: Vishal Sharda
Priority: Critical


Valgrind shows huge memory leaks while running 2 interconnected routers with 2 
parallel senders connected to one router and 2 parallel receivers connected 
to the other router.

The CRYPTO leak that is coming from Qpid Proton 0.12.2 is already fixed here:

https://issues.apache.org/jira/browse/PROTON-1115

However, the rest of the leaks are from qdrouterd.







[jira] [Updated] (DISPATCH-336) Very high latency for fire-and-forget sender

2016-05-13 Thread Vishal Sharda (JIRA)

 [ 
https://issues.apache.org/jira/browse/DISPATCH-336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vishal Sharda updated DISPATCH-336:
---
Attachment: (was: output.txt)

> Very high latency for fire-and-forget sender
> 
>
> Key: DISPATCH-336
> URL: https://issues.apache.org/jira/browse/DISPATCH-336
> Project: Qpid Dispatch
>  Issue Type: Bug
>  Components: Routing Engine
>Affects Versions: 0.6.0
> Environment: Debian 8.3, Apache Qpid Proton 0.12.2 for drivers and 
> dependencies, Hardware: 2 CPUs, 15 GB RAM, 60 GB HDD on 2 separate machines
>Reporter: Vishal Sharda
>Priority: Critical
> Attachments: config1.conf, config2.conf, output_1S_1R.txt
>
>
> We are running two interconnected routers with 1 fire-and-forget sender 
> connected to 1 router and 1 receiver connected to the other router.  We are 
> observing increasing latency for the messages irrespective of the number of 
> messages sent.






[jira] [Updated] (DISPATCH-336) Very high latency for fire-and-forget sender

2016-05-13 Thread Vishal Sharda (JIRA)

 [ 
https://issues.apache.org/jira/browse/DISPATCH-336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vishal Sharda updated DISPATCH-336:
---
Attachment: output_1S_1R.txt

> Very high latency for fire-and-forget sender
> 
>
> Key: DISPATCH-336
> URL: https://issues.apache.org/jira/browse/DISPATCH-336
> Project: Qpid Dispatch
>  Issue Type: Bug
>  Components: Routing Engine
>Affects Versions: 0.6.0
> Environment: Debian 8.3, Apache Qpid Proton 0.12.2 for drivers and 
> dependencies, Hardware: 2 CPUs, 15 GB RAM, 60 GB HDD on 2 separate machines
>Reporter: Vishal Sharda
>Priority: Critical
> Attachments: config1.conf, config2.conf, output_1S_1R.txt
>
>
> We are running two interconnected routers with 1 fire-and-forget sender 
> connected to 1 router and 1 receiver connected to the other router.  We are 
> observing increasing latency for the messages irrespective of the number of 
> messages sent.






[jira] [Updated] (DISPATCH-336) Very high latency for fire-and-forget sender

2016-05-13 Thread Vishal Sharda (JIRA)

 [ 
https://issues.apache.org/jira/browse/DISPATCH-336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vishal Sharda updated DISPATCH-336:
---
Attachment: output.txt
config2.conf
config1.conf

Configuration files for the two routers and the observed latency for 200K 
messages to arrive from the sender to the receiver (both sender and receiver 
were running on the same machine).
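
Because the sender and receiver ran on the same machine, send and receive 
timestamps come from the same clock and can be subtracted directly.  The 
Python sketch below shows one way such per-message latencies could be computed 
and checked for an upward trend.  The function names are hypothetical and the 
numbers in the demo are invented; they are not taken from the attached output.

```python
def latencies(send_times, receive_times):
    """Per-message latency in seconds; both lists are indexed by message number."""
    return [received - sent for sent, received in zip(send_times, receive_times)]


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def latency_trend(values):
    """Mean latency of the second half minus mean latency of the first half.

    Needs at least two values; a clearly positive result means later messages
    waited longer than earlier ones.
    """
    middle = len(values) // 2
    return mean(values[middle:]) - mean(values[:middle])


if __name__ == "__main__":
    # Invented example: one message sent every 1 ms, one delivered every 2 ms.
    sent = [i * 0.001 for i in range(1000)]
    received = [(i + 1) * 0.002 for i in range(1000)]
    print("trend (s):", latency_trend(latencies(sent, received)))
```

In the invented demo the messages are delivered more slowly than they are 
sent, so each one waits longer than the previous one and the trend is 
positive.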

> Very high latency for fire-and-forget sender
> 
>
> Key: DISPATCH-336
> URL: https://issues.apache.org/jira/browse/DISPATCH-336
> Project: Qpid Dispatch
>  Issue Type: Bug
>  Components: Routing Engine
>Affects Versions: 0.6.0
> Environment: Debian 8.3, Apache Qpid Proton 0.12.2 for drivers and 
> dependencies, Hardware: 2 CPUs, 15 GB RAM, 60 GB HDD on 2 separate machines
>Reporter: Vishal Sharda
>Priority: Critical
> Attachments: config1.conf, config2.conf, output.txt
>
>
> We are running two interconnected routers with 1 fire-and-forget sender 
> connected to 1 router and 1 receiver connected to the other router.  We are 
> observing increasing latency for the messages irrespective of the number of 
> messages sent.






[jira] [Comment Edited] (DISPATCH-332) Heavy message loss happening with 2 interconnected routers

2016-05-11 Thread Vishal Sharda (JIRA)

[ 
https://issues.apache.org/jira/browse/DISPATCH-332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15280551#comment-15280551
 ] 

Vishal Sharda edited comment on DISPATCH-332 at 5/11/16 6:18 PM:
-

Two router configuration files to reproduce the message loss bug and the output 
from the receiver.

There were two simple_send.py senders running in parallel, each sending 20K 
messages.  The simple_recv.py on the other router, however, received only 1 
message - the last one (2) from both the senders.
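
A loss figure for a run like this can be derived by tallying which sequence 
numbers arrived from each sender.  The Python sketch below assumes that every 
message carries a sender ID and a sequence number starting at 1; that 
labelling scheme and the name loss_report are assumptions made for this 
illustration and are not how simple_send.py and simple_recv.py report results.

```python
def loss_report(expected_per_sender, received):
    """Summarise loss from (sender_id, sequence_number) pairs seen by the receiver.

    `expected_per_sender` maps sender_id -> number of messages that sender sent,
    with sequence numbers assumed to run from 1 to that count.
    """
    seen = {sender: set() for sender in expected_per_sender}
    for sender, sequence in received:
        if sender in seen:
            seen[sender].add(sequence)
    # Sequence numbers that each sender sent but the receiver never saw.
    missing = {
        sender: sorted(set(range(1, count + 1)) - seen[sender])
        for sender, count in expected_per_sender.items()
    }
    expected_total = sum(expected_per_sender.values())
    lost = sum(len(gaps) for gaps in missing.values())
    percent = 100.0 * lost / expected_total if expected_total else 0.0
    return {"lost": lost, "lost_percent": percent, "missing": missing}
```

With two senders of 20,000 messages each, expected_per_sender would be 
{"A": 20000, "B": 20000}, and the missing lists show whether the loss is 
spread evenly or concentrated on one sender.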


was (Author: vsharda):
Two router configuration files to reproduce the message loss bug and the output 
from the receiver.


> Heavy message loss happening with 2 interconnected routers
> --
>
> Key: DISPATCH-332
> URL: https://issues.apache.org/jira/browse/DISPATCH-332
> Project: Qpid Dispatch
>  Issue Type: Bug
>  Components: Routing Engine
>Affects Versions: 0.6.0
> Environment: Debian 8.3, Qpid Proton 0.12.2 for drivers and 
> dependency for Qpid Dispatch, Hardware: 2 CPUs, 15 GB RAM, 30 GB HDD.
>Reporter: Vishal Sharda
>Assignee: Ted Ross
>Priority: Blocker
> Fix For: 0.6.0
>
> Attachments: config1.conf, config2.conf, output.txt
>
>
> We are running two Dispatch Routers each configured for interior mode and the 
> second router's configuration includes a connector to the first router with 
> inter-router role.
> When we connect one sender to one router and one receiver to the other router 
> both listening to the same queue, we see all messages (20,000 in our test) 
> being transmitted.
> As soon as we start a second sender connected to the same router to which the 
> first sender connects and sending to the same queue, we start seeing heavy 
> message loss.  Around 20% of messages are lost with each sender attempting to 
> send 20,000 messages on its own (40,000 in total) and running in parallel 
> with the other sender.  The message loss happens regardless of the message 
> size.
> We tried with simple_send.py, simple_recv.py as well as send and recv C 
> executable files from Qpid Proton 0.12.2.
> We even saw a crash in the router with the following message:
> qdrouterd: /home/vsharda/qpid-dispatch/src/posix/threading.c:71: 
> sys_mutex_lock: Assertion `result == 0' failed.
> Aborted
> The message loss was observed with the 0.6.0 SNAPSHOT taken on May 9 as well 
> as the one taken on March 3 before the router core refactoring.






[jira] [Updated] (DISPATCH-332) Heavy message loss happening with 2 interconnected routers

2016-05-11 Thread Vishal Sharda (JIRA)

 [ 
https://issues.apache.org/jira/browse/DISPATCH-332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vishal Sharda updated DISPATCH-332:
---
Attachment: output.txt
config2.conf
config1.conf

Two router configuration files to reproduce the message loss bug and the output 
from the receiver.


> Heavy message loss happening with 2 interconnected routers
> --
>
> Key: DISPATCH-332
> URL: https://issues.apache.org/jira/browse/DISPATCH-332
> Project: Qpid Dispatch
>  Issue Type: Bug
>  Components: Routing Engine
>Affects Versions: 0.6.0
> Environment: Debian 8.3, Qpid Proton 0.12.2 for drivers and 
> dependency for Qpid Dispatch, Hardware: 2 CPUs, 15 GB RAM, 30 GB HDD.
>Reporter: Vishal Sharda
>Assignee: Ted Ross
>Priority: Blocker
> Fix For: 0.6.0
>
> Attachments: config1.conf, config2.conf, output.txt
>
>
> We are running two Dispatch Routers each configured for interior mode and the 
> second router's configuration includes a connector to the first router with 
> inter-router role.
> When we connect one sender to one router and one receiver to the other router 
> both listening to the same queue, we see all messages (20,000 in our test) 
> being transmitted.
> As soon as we start a second sender connected to the same router to which the 
> first sender connects and sending to the same queue, we start seeing heavy 
> message loss.  Around 20% of messages are lost with each sender attempting to 
> send 20,000 messages on its own (40,000 in total) and running in parallel 
> with the other sender.  The message loss happens regardless of the message 
> size.
> We tried with simple_send.py, simple_recv.py as well as send and recv C 
> executable files from Qpid Proton 0.12.2.
> We even saw a crash in the router with the following message:
> qdrouterd: /home/vsharda/qpid-dispatch/src/posix/threading.c:71: 
> sys_mutex_lock: Assertion `result == 0' failed.
> Aborted
> The message loss was observed with the 0.6.0 SNAPSHOT taken on May 9 as well 
> as the one taken on March 3 before the router core refactoring.






[jira] [Commented] (DISPATCH-332) Heavy message loss happening with 2 interconnected routers

2016-05-11 Thread Vishal Sharda (JIRA)

[ 
https://issues.apache.org/jira/browse/DISPATCH-332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15280386#comment-15280386
 ] 

Vishal Sharda commented on DISPATCH-332:


I used the following fixedAddress in both the configuration files.

fixedAddress {
    prefix: /
    fanout: single
    bias: closest
}

Insecure port 5672 was used for all the communication.

Everything works fine if the 2 senders and 1 receiver are all attached to the 
same router, and also if 1 sender and 1 receiver are attached to different 
routers of the interconnected pair.  The issue occurs only when we start a 
second parallel sender on the same router where one sender is already active.

Increasing the number of parallel senders and receivers further increases the 
percentage of messages lost.


> Heavy message loss happening with 2 interconnected routers
> --
>
> Key: DISPATCH-332
> URL: https://issues.apache.org/jira/browse/DISPATCH-332
> Project: Qpid Dispatch
>  Issue Type: Bug
>  Components: Routing Engine
>Affects Versions: 0.6.0
> Environment: Debian 8.3, Qpid Proton 0.12.2 for drivers and 
> dependency for Qpid Dispatch, Hardware: 2 CPUs, 15 GB RAM, 30 GB HDD.
>Reporter: Vishal Sharda
>Assignee: Ted Ross
>Priority: Blocker
> Fix For: 0.6.0
>
>
> We are running two Dispatch Routers each configured for interior mode and the 
> second router's configuration includes a connector to the first router with 
> inter-router role.
> When we connect one sender to one router and one receiver to the other router 
> both listening to the same queue, we see all messages (20,000 in our test) 
> being transmitted.
> As soon as we start a second sender connected to the same router to which the 
> first sender connects and sending to the same queue, we start seeing heavy 
> message loss.  Around 20% of messages are lost with each sender attempting to 
> send 20,000 messages on its own (40,000 in total) and running in parallel 
> with the other sender.  The message loss happens regardless of the message 
> size.
> We tried with simple_send.py, simple_recv.py as well as send and recv C 
> executable files from Qpid Proton 0.12.2.
> We even saw a crash in the router with the following message:
> qdrouterd: /home/vsharda/qpid-dispatch/src/posix/threading.c:71: 
> sys_mutex_lock: Assertion `result == 0' failed.
> Aborted
> The message loss was observed with the 0.6.0 SNAPSHOT taken on May 9 as well 
> as the one taken on March 3 before the router core refactoring.






[jira] [Updated] (DISPATCH-332) Heavy message loss happening with 2 interconnected routers

2016-05-10 Thread Vishal Sharda (JIRA)

 [ 
https://issues.apache.org/jira/browse/DISPATCH-332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vishal Sharda updated DISPATCH-332:
---
Environment: Debian 8.3, Qpid Proton 0.12.2 for drivers and dependency for 
Qpid Dispatch, Hardware: 2 CPUs, 15 GB RAM, 30 GB HDD.  (was: Debian 8.3, Qpid 
Proton 0.12.2 for drivers and dependency for Qpid Dispatch, Hardware: 2 CUPs, 
15 GB RAM, 30 GB HDD.)

> Heavy message loss happening with 2 interconnected routers
> --
>
> Key: DISPATCH-332
> URL: https://issues.apache.org/jira/browse/DISPATCH-332
> Project: Qpid Dispatch
>  Issue Type: Bug
>  Components: Routing Engine
>Affects Versions: 0.6.0
> Environment: Debian 8.3, Qpid Proton 0.12.2 for drivers and 
> dependency for Qpid Dispatch, Hardware: 2 CPUs, 15 GB RAM, 30 GB HDD.
>Reporter: Vishal Sharda
>Priority: Blocker
>
> We are running two Dispatch Routers each configured for interior mode and the 
> second router's configuration includes a connector to the first router with 
> inter-router role.
> When we connect one sender to one router and one receiver to the other router 
> both listening to the same queue, we see all messages (20,000 in our test) 
> being transmitted.
> As soon as we start a second sender connected to the same router to which the 
> first sender connects and sending to the same queue, we start seeing heavy 
> message loss.  Around 20% of messages are lost with each sender attempting to 
> send 20,000 messages on its own (40,000 in total) and running in parallel 
> with the other sender.  The message loss happens regardless of the message 
> size.
> We tried with simple_send.py, simple_recv.py as well as send and recv C 
> executable files from Qpid Proton 0.12.2.
> We even saw a crash in the router with the following message:
> qdrouterd: /home/vsharda/qpid-dispatch/src/posix/threading.c:71: 
> sys_mutex_lock: Assertion `result == 0' failed.
> Aborted
> The message loss was observed with the 0.6.0 SNAPSHOT taken on May 9 as well 
> as the one taken on March 3 before the router core refactoring.






[jira] [Updated] (DISPATCH-332) Heavy message loss happening with 2 interconnected routers

2016-05-10 Thread Vishal Sharda (JIRA)

 [ 
https://issues.apache.org/jira/browse/DISPATCH-332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vishal Sharda updated DISPATCH-332:
---
Description: 
We are running two Dispatch Routers each configured for interior mode and the 
second router's configuration includes a connector to the first router with 
inter-router role.

When we connect one sender to one router and one receiver to the other router 
both listening to the same queue, we see all messages (20,000 in our test) 
being transmitted.

As soon as we start a second sender connected to the same router to which the 
first sender connects and sending to the same queue, we start seeing heavy 
message loss.  Around 20% of messages are lost with each sender attempting to 
send 20,000 messages on its own (40,000 in total) and running in parallel with 
the other sender.  The message loss happens regardless of the message size.

We tried with simple_send.py, simple_recv.py as well as send and recv C 
executable files from Qpid Proton 0.12.2.

We even saw a crash in the router with the following message:

qdrouterd: /home/vsharda/qpid-dispatch/src/posix/threading.c:71: 
sys_mutex_lock: Assertion `result == 0' failed.
Aborted

The message loss was observed with the 0.6.0 SNAPSHOT taken on May 9 as well as 
the one taken on March 3 before the router core refactoring.


  was:
We are running two Dispatch Routers each configured for interior mode and the 
second router's configuration includes a connector to the first router.

When we connect one sender to one router and one receiver to the other router 
both listening to the same queue, we see all messages (20,000 in our test) 
being transmitted.

As soon as we start a second sender connected to the same router to which the 
first sender connects and sending to the same queue, we start seeing heavy 
message loss.  Around 20% of messages are lost with each sender attempting to 
send 20,000 messages on its own (40,000 in total) and running in parallel with 
the other sender.  The message loss happens regardless of the message size.

We tried with simple_send.py, simple_recv.py as well as send and recv C 
executable files from Qpid Proton 0.12.2.

We even saw a crash in the router with the following message:

qdrouterd: /home/vsharda/qpid-dispatch/src/posix/threading.c:71: 
sys_mutex_lock: Assertion `result == 0' failed.
Aborted

The message loss was observed with the 0.6.0 SNAPSHOT taken on May 9 as well as 
the one taken on March 3 before the router core refactoring.



> Heavy message loss happening with 2 interconnected routers
> --
>
> Key: DISPATCH-332
> URL: https://issues.apache.org/jira/browse/DISPATCH-332
> Project: Qpid Dispatch
>  Issue Type: Bug
>  Components: Routing Engine
>Affects Versions: 0.6.0
> Environment: Debian 8.3, Qpid Proton 0.12.2 for drivers and 
> dependency for Qpid Dispatch, Hardware: 2 CUPs, 15 GB RAM, 30 GB HDD.
>Reporter: Vishal Sharda
>Priority: Blocker
>
> We are running two Dispatch Routers each configured for interior mode and the 
> second router's configuration includes a connector to the first router with 
> inter-router role.
> When we connect one sender to one router and one receiver to the other router 
> both listening to the same queue, we see all messages (20,000 in our test) 
> being transmitted.
> As soon as we start a second sender connected to the same router to which the 
> first sender connects and sending to the same queue, we start seeing heavy 
> message loss.  Around 20% of messages are lost with each sender attempting to 
> send 20,000 messages on its own (40,000 in total) and running in parallel 
> with the other sender.  The message loss happens regardless of the message 
> size.
> We tried with simple_send.py, simple_recv.py as well as send and recv C 
> executable files from Qpid Proton 0.12.2.
> We even saw a crash in the router with the following message:
> qdrouterd: /home/vsharda/qpid-dispatch/src/posix/threading.c:71: 
> sys_mutex_lock: Assertion `result == 0' failed.
> Aborted
> The message loss was observed with the 0.6.0 SNAPSHOT taken on May 9 as well 
> as the one taken on March 3 before the router core refactoring.






[jira] [Updated] (DISPATCH-332) Heavy message loss happening with 2 interconnected routers

2016-05-10 Thread Vishal Sharda (JIRA)

 [ 
https://issues.apache.org/jira/browse/DISPATCH-332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vishal Sharda updated DISPATCH-332:
---
Description: 
We are running two Dispatch Routers each configured for interior mode and the 
second router's configuration includes a connector to the first router.

When we connect one sender to one router and one receiver to the other router 
both listening to the same queue, we see all messages (20,000 in our test) 
being transmitted.

As soon as we start a second sender connected to the same router to which the 
first sender connects and sending to the same queue, we start seeing heavy 
message loss.  Around 20% of messages are lost with each sender attempting to 
send 20,000 messages on its own (40,000 in total) and running in parallel with 
the other sender.  The message loss happens regardless of the message size.
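
One way to put a number on the loss described above is to tag every message with a sender name and a sequence number, then compare what the receiver saw against what was sent. The following is a minimal sketch under that assumption; the tagging scheme, the function name loss_report and the simulated data are hypothetical and are not part of simple_send.py, simple_recv.py or the router.

```python
from collections import Counter


def loss_report(sent_per_sender, received_ids):
    """Summarise message loss for a test run.

    sent_per_sender: dict mapping a (hypothetical) sender name to the number
        of messages it sent, assumed to carry sequence numbers 0..n-1.
    received_ids: iterable of (sender, seq) tuples seen by the receiver.
    """
    # Every (sender, seq) pair the receiver should have seen.
    expected = {(s, i) for s, n in sent_per_sender.items() for i in range(n)}
    seen = Counter(received_ids)
    delivered = expected & set(seen)
    missing = expected - delivered
    # Count extra copies separately so duplicates cannot mask losses.
    duplicates = sum(c - 1 for c in seen.values() if c > 1)
    total = len(expected)
    return {
        "sent": total,
        "received": len(delivered),
        "lost": len(missing),
        "loss_pct": 100.0 * len(missing) / total if total else 0.0,
        "duplicates": duplicates,
    }


if __name__ == "__main__":
    # Simulated run: two senders, 20,000 messages each, every fifth message
    # dropped. This is made-up data for illustration, not a measurement.
    sent = {"sender-1": 20000, "sender-2": 20000}
    received = [(s, i) for s, n in sent.items() for i in range(n) if i % 5 != 0]
    print(loss_report(sent, received))
```

With two senders of 20,000 messages each, a loss of around 20% corresponds to roughly 8,000 of the 40,000 messages never reaching the receiver. Keeping the per-sender sequence numbers also shows whether the losses are spread across both senders or concentrated on one.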

We tried with simple_send.py, simple_recv.py as well as send and recv C 
executable files from Qpid Proton 0.12.2.

We even saw a crash in the router with the following message:

qdrouterd: /home/vsharda/qpid-dispatch/src/posix/threading.c:71: 
sys_mutex_lock: Assertion `result == 0' failed.
Aborted

The message loss was observed with the 0.6.0 SNAPSHOT taken on May 9 as well as 
the one taken on March 3 before the router core refactoring.


  was:
We are running two Dispatch Routers each configured for inter-router mode and 
the second router's configuration includes a connector to the first router.

When we connect one sender to one router and one receiver to the other router 
both listening to the same queue, we see all messages (20,000 in our test) 
being transmitted.

As soon as we start a second sender connected to the same router to which the 
first sender connects and sending to the same queue, we start seeing heavy 
message loss.  Around 20% of messages are lost with each sender attempting to 
send 20,000 messages on its own (40,000 in total) and running in parallel with 
the other sender.  The message loss happens regardless of the message size.

We tried with simple_send.py, simple_recv.py as well as send and recv C 
executable files from Qpid Proton 0.12.2.

We even saw a crash in the router with the following message:

qdrouterd: /home/vsharda/qpid-dispatch/src/posix/threading.c:71: 
sys_mutex_lock: Assertion `result == 0' failed.
Aborted

The message loss was observed with the 0.6.0 SNAPSHOT taken on May 9 as well as 
the one taken on March 3 before the router core refactoring.



> Heavy message loss happening with 2 interconnected routers
> --
>
> Key: DISPATCH-332
> URL: https://issues.apache.org/jira/browse/DISPATCH-332
> Project: Qpid Dispatch
>  Issue Type: Bug
>  Components: Routing Engine
>Affects Versions: 0.6.0
> Environment: Debian 8.3, Qpid Proton 0.12.2 for drivers and 
> dependency for Qpid Dispatch, Hardware: 2 CPUs, 15 GB RAM, 30 GB HDD.
>Reporter: Vishal Sharda
>Priority: Blocker
>
> We are running two Dispatch Routers each configured for interior mode and the 
> second router's configuration includes a connector to the first router.
> When we connect one sender to one router and one receiver to the other router 
> both listening to the same queue, we see all messages (20,000 in our test) 
> being transmitted.
> As soon as we start a second sender connected to the same router to which the 
> first sender connects and sending to the same queue, we start seeing heavy 
> message loss.  Around 20% of messages are lost with each sender attempting to 
> send 20,000 messages on its own (40,000 in total) and running in parallel 
> with the other sender.  The message loss happens regardless of the message 
> size.
> We tried with simple_send.py, simple_recv.py as well as send and recv C 
> executable files from Qpid Proton 0.12.2.
> We even saw a crash in the router with the following message:
> qdrouterd: /home/vsharda/qpid-dispatch/src/posix/threading.c:71: 
> sys_mutex_lock: Assertion `result == 0' failed.
> Aborted
> The message loss was observed with the 0.6.0 SNAPSHOT taken on May 9 as well 
> as the one taken on March 3 before the router core refactoring.






[jira] [Created] (DISPATCH-332) Heavy message loss happening with 2 interconnected routers

2016-05-10 Thread Vishal Sharda (JIRA)
Vishal Sharda created DISPATCH-332:
--

 Summary: Heavy message loss happening with 2 interconnected routers
 Key: DISPATCH-332
 URL: https://issues.apache.org/jira/browse/DISPATCH-332
 Project: Qpid Dispatch
  Issue Type: Bug
  Components: Routing Engine
Affects Versions: 0.6.0
 Environment: Debian 8.3, Qpid Proton 0.12.2 for drivers and dependency 
for Qpid Dispatch, Hardware: 2 CPUs, 15 GB RAM, 30 GB HDD.
Reporter: Vishal Sharda
Priority: Blocker


We are running two Dispatch Routers each configured for inter-router mode and 
the second router's configuration includes a connector to the first router.

When we connect one sender to one router and one receiver to the other router 
both listening to the same queue, we see all messages (20,000 in our test) 
being transmitted.

As soon as we start a second sender connected to the same router to which the 
first sender connects and sending to the same queue, we start seeing heavy 
message loss.  Around 20% of messages are lost with each sender attempting to 
send 20,000 messages on its own (40,000 in total) and running in parallel with 
the other sender.  The message loss happens regardless of the message size.

We tried with simple_send.py, simple_recv.py as well as send and recv C 
executable files from Qpid Proton 0.12.2.

We even saw a crash in the router with the following message:

qdrouterd: /home/vsharda/qpid-dispatch/src/posix/threading.c:71: 
sys_mutex_lock: Assertion `result == 0' failed.
Aborted

The message loss was observed with the 0.6.0 SNAPSHOT taken on May 9 as well as 
the one taken on March 3 before the router core refactoring.



