[jira] [Created] (DISPATCH-2255) Investigate enable_mask for removal of malloc

2021-10-01 Thread michael goulish (Jira)
michael goulish created DISPATCH-2255:
-

 Summary: Investigate enable_mask for removal of malloc
 Key: DISPATCH-2255
 URL: https://issues.apache.org/jira/browse/DISPATCH-2255
 Project: Qpid Dispatch
  Issue Type: Improvement
Reporter: michael goulish
Assignee: michael goulish


Find out how often enable_mask() is called in log.c

See if it would be practical to remove the malloc() and free() in it.
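For illustration only, here is a sketch of the kind of change this would enable, assuming enable_mask() currently tokenizes a malloc()'d copy of the level string: the same parse can be done in place. The function shape and the level_bit() helper below are hypothetical, not the actual log.c code.

{noformat}
#include <string.h>
#include <stdint.h>

/* Hypothetical lookup: map a level name of the given length to its bit. */
extern uint32_t level_bit(const char *name, size_t len);

/* Build the enable mask by scanning token boundaries directly, with no
 * malloc()/free() of a working copy of the input string. */
static uint32_t enable_mask_no_alloc(const char *levels)
{
    uint32_t mask = 0;
    const char *p = levels;
    while (p && *p) {
        const char *end = strpbrk(p, ",+");        /* next separator, if any */
        size_t len = end ? (size_t)(end - p) : strlen(p);
        if (len > 0)
            mask |= level_bit(p, len);             /* token is used in place */
        p = end ? end + 1 : NULL;
    }
    return mask;
}
{noformat}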






[jira] [Updated] (DISPATCH-1956) log.c rewrite to reduce locking scope

2021-09-28 Thread michael goulish (Jira)


 [ 
https://issues.apache.org/jira/browse/DISPATCH-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

michael goulish updated DISPATCH-1956:
--
Summary: log.c rewrite to reduce locking scope  (was: Potential deadlock: 
logging lock vs entity cache lock)
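Not the actual patch, but a small self-contained sketch (under generic names) of the locking pattern the new summary points at: do the work that can take other locks, here the entry allocation, which per the trace below can reach the entity-cache lock, before taking the log lock, so the two locks never nest in log.c.

{noformat}
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

typedef struct entry {
    struct entry *next;
    char          text[128];
} entry_t;

static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;
static entry_t        *head = NULL;

void log_append(const char *text)
{
    /* Allocate and fill the entry with no lock held.  In the router this
     * is the step that can reach the entity-cache lock via qd_alloc_init(). */
    entry_t *e = malloc(sizeof(entry_t));
    if (!e)
        return;
    strncpy(e->text, text, sizeof(e->text) - 1);
    e->text[sizeof(e->text) - 1] = '\0';

    /* Hold the log-list lock only for the pointer update. */
    pthread_mutex_lock(&list_lock);
    e->next = head;
    head = e;
    pthread_mutex_unlock(&list_lock);
}
{noformat}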

> log.c rewrite to reduce locking scope
> -
>
> Key: DISPATCH-1956
> URL: https://issues.apache.org/jira/browse/DISPATCH-1956
> Project: Qpid Dispatch
>  Issue Type: Bug
>  Components: Router Node
>Affects Versions: 1.15.0
>Reporter: Ken Giusti
>Assignee: michael goulish
>Priority: Major
>  Labels: deadlock, tsan
> Fix For: 1.18.0
>
> Attachments: tsan.supp
>
>
> {noformat}
> WARNING: ThreadSanitizer: lock-order-inversion (potential deadlock) 
> (pid=1474955) 
>  Cycle in lock order graph: M11 (0x7b1002c0) => M9 (0x7b100240) => 
> M11 
>  
>  Mutex M9 acquired here while holding mutex M11 in main thread: 
>  #0 pthread_mutex_lock  (libtsan.so.0+0x528ac) 
>  #1 sys_mutex_lock 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/posix/threading.c:57 
> (libqpid-dispatch.so+0x8cb7d) 
>  #2 push_event 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/entity_cache.c:63 
> (libqpid-dispatch.so+0x6fa13) 
>  #3 qd_entity_cache_add 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/entity_cache.c:69 
> (libqpid-dispatch.so+0x6fc26) 
>  #4 qd_alloc_init 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/alloc_pool.c:302 
> (libqpid-dispatch.so+0x5878b) 
>  #5 qd_alloc /home/kgiusti/work/dispatch/qpid-dispatch/src/alloc_pool.c:318 
> (libqpid-dispatch.so+0x5878b) 
>  #6 new_qd_log_entry_t /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:61 
> (libqpid-dispatch.so+0x75891) 
>  #7 qd_vlog_impl /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:426 
> (libqpid-dispatch.so+0x76205) 
>  #8 qd_log_impl /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:453 
> (libqpid-dispatch.so+0x76580) 
>  #9 qd_python_log 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/python_embedded.c:547 
> (libqpid-dispatch.so+0x8d1cb) 
>  #10   (libpython3.8.so.1.0+0x12a23b) 
>  #11 main_process 
> /home/kgiusti/work/dispatch/qpid-dispatch/router/src/main.c:95 
> (qdrouterd+0x40281c) 
>  #12 main /home/kgiusti/work/dispatch/qpid-dispatch/router/src/main.c:367 
> (qdrouterd+0x4024fc) 
>  
>  Hint: use TSAN_OPTIONS=second_deadlock_stack=1 to get more informative 
> warning message 
>  
>  Mutex M11 acquired here while holding mutex M9 in main thread: 
>  #0 pthread_mutex_lock  (libtsan.so.0+0x528ac) 
>  #1 sys_mutex_lock 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/posix/threading.c:57 
> (libqpid-dispatch.so+0x8cb7d) 
>  #2 qd_vlog_impl /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:425 
> (libqpid-dispatch.so+0x76200) 
>  #3 qd_log_impl /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:453 
> (libqpid-dispatch.so+0x76580) 
>  #4 qd_python_log 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/python_embedded.c:547 
> (libqpid-dispatch.so+0x8d1cb) 
>  #5   (libpython3.8.so.1.0+0x12a23b) 
>  #6 main_process 
> /home/kgiusti/work/dispatch/qpid-dispatch/router/src/main.c:95 
> (qdrouterd+0x40281c) 
>  #7 main /home/kgiusti/work/dispatch/qpid-dispatch/router/src/main.c:367 
> (qdrouterd+0x4024fc) 
>  
> SUMMARY: ThreadSanitizer: lock-order-inversion (potential deadlock) 
> (/lib64/libtsan.so.0+0x528ac) in __interceptor_pthread_mutex_lock
> {noformat}






[jira] [Assigned] (DISPATCH-1956) Potential deadlock: logging lock vs entity cache lock

2021-09-28 Thread michael goulish (Jira)


 [ 
https://issues.apache.org/jira/browse/DISPATCH-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

michael goulish reassigned DISPATCH-1956:
-

Assignee: michael goulish  (was: Michael Goulish)

> Potential deadlock: logging lock vs entity cache lock
> -
>
> Key: DISPATCH-1956
> URL: https://issues.apache.org/jira/browse/DISPATCH-1956
> Project: Qpid Dispatch
>  Issue Type: Bug
>  Components: Router Node
>Affects Versions: 1.15.0
>Reporter: Ken Giusti
>Assignee: michael goulish
>Priority: Major
>  Labels: deadlock, tsan
> Fix For: 1.18.0
>
> Attachments: tsan.supp
>
>
> {noformat}
> WARNING: ThreadSanitizer: lock-order-inversion (potential deadlock) 
> (pid=1474955) 
>  Cycle in lock order graph: M11 (0x7b1002c0) => M9 (0x7b100240) => 
> M11 
>  
>  Mutex M9 acquired here while holding mutex M11 in main thread: 
>  #0 pthread_mutex_lock  (libtsan.so.0+0x528ac) 
>  #1 sys_mutex_lock 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/posix/threading.c:57 
> (libqpid-dispatch.so+0x8cb7d) 
>  #2 push_event 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/entity_cache.c:63 
> (libqpid-dispatch.so+0x6fa13) 
>  #3 qd_entity_cache_add 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/entity_cache.c:69 
> (libqpid-dispatch.so+0x6fc26) 
>  #4 qd_alloc_init 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/alloc_pool.c:302 
> (libqpid-dispatch.so+0x5878b) 
>  #5 qd_alloc /home/kgiusti/work/dispatch/qpid-dispatch/src/alloc_pool.c:318 
> (libqpid-dispatch.so+0x5878b) 
>  #6 new_qd_log_entry_t /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:61 
> (libqpid-dispatch.so+0x75891) 
>  #7 qd_vlog_impl /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:426 
> (libqpid-dispatch.so+0x76205) 
>  #8 qd_log_impl /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:453 
> (libqpid-dispatch.so+0x76580) 
>  #9 qd_python_log 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/python_embedded.c:547 
> (libqpid-dispatch.so+0x8d1cb) 
>  #10   (libpython3.8.so.1.0+0x12a23b) 
>  #11 main_process 
> /home/kgiusti/work/dispatch/qpid-dispatch/router/src/main.c:95 
> (qdrouterd+0x40281c) 
>  #12 main /home/kgiusti/work/dispatch/qpid-dispatch/router/src/main.c:367 
> (qdrouterd+0x4024fc) 
>  
>  Hint: use TSAN_OPTIONS=second_deadlock_stack=1 to get more informative 
> warning message 
>  
>  Mutex M11 acquired here while holding mutex M9 in main thread: 
>  #0 pthread_mutex_lock  (libtsan.so.0+0x528ac) 
>  #1 sys_mutex_lock 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/posix/threading.c:57 
> (libqpid-dispatch.so+0x8cb7d) 
>  #2 qd_vlog_impl /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:425 
> (libqpid-dispatch.so+0x76200) 
>  #3 qd_log_impl /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:453 
> (libqpid-dispatch.so+0x76580) 
>  #4 qd_python_log 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/python_embedded.c:547 
> (libqpid-dispatch.so+0x8d1cb) 
>  #5   (libpython3.8.so.1.0+0x12a23b) 
>  #6 main_process 
> /home/kgiusti/work/dispatch/qpid-dispatch/router/src/main.c:95 
> (qdrouterd+0x40281c) 
>  #7 main /home/kgiusti/work/dispatch/qpid-dispatch/router/src/main.c:367 
> (qdrouterd+0x4024fc) 
>  
> SUMMARY: ThreadSanitizer: lock-order-inversion (potential deadlock) 
> (/lib64/libtsan.so.0+0x528ac) in __interceptor_pthread_mutex_lock
> {noformat}






[jira] [Closed] (DISPATCH-2173) 30-Mesh Behaving Badly

2021-09-28 Thread michael goulish (Jira)


 [ 
https://issues.apache.org/jira/browse/DISPATCH-2173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

michael goulish closed DISPATCH-2173.
-
Resolution: Won't Fix

It has been pointed out to me that a 30-mesh is not very realistic.

I was forced to admit that this was probably true.

> 30-Mesh Behaving Badly
> --
>
> Key: DISPATCH-2173
> URL: https://issues.apache.org/jira/browse/DISPATCH-2173
> Project: Qpid Dispatch
>  Issue Type: Bug
>  Components: Router Node
>Reporter: michael goulish
>Assignee: michael goulish
>Priority: Major
>
> While testing scale-up of full-mesh networks I encountered some Bad Behavior 
> at 30 nodes. (435 connections.)
> On my first try, 15 of the routers died.
> On my second try, no nodes died – but the network never converged. It 
> consumed all available CPU (32 cores) for three minutes, and the 30 routers 
> printed a combined total of more than 1000 radius calculations to their logs 
> by the time I became wrathful and cast them all into the Bitbucket of Woe.
>  
> For reference, those radius calculations are how I decide that the network 
> has converged – everybody has settled down and agreed on the topology and 
> stopped talking about it. The last thing each router prints to its log is a 
> radius calculation, and then it's done. This may happen multiple times for 
> each router, but when the total number of such prints stops changing – the 
> network has converged.
>  
> For 15 or 20 routers, the number of such prints was 20 or 40 or so. When this 
> test exceeded that by 25x, I decided it was never going to quit.
>  
> ...Now looking at the logs to see if I can figure out what was happening...
>  






[jira] [Commented] (DISPATCH-2252) Document router shutdown process

2021-09-23 Thread michael goulish (Jira)


[ 
https://issues.apache.org/jira/browse/DISPATCH-2252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17419066#comment-17419066
 ] 

michael goulish commented on DISPATCH-2252:
---

...And if I see anything along the way that clearly needs improvement or 
investigation, I will file a Jira for that too.

> Document router shutdown process
> 
>
> Key: DISPATCH-2252
> URL: https://issues.apache.org/jira/browse/DISPATCH-2252
> Project: Qpid Dispatch
>  Issue Type: Improvement
>Reporter: michael goulish
>Assignee: michael goulish
>Priority: Minor
>
> Investigate the router shutdown process in detail, and produce a document in 
> the docs directory.






[jira] [Created] (DISPATCH-2252) Document router shutdown process

2021-09-23 Thread michael goulish (Jira)
michael goulish created DISPATCH-2252:
-

 Summary: Document router shutdown process
 Key: DISPATCH-2252
 URL: https://issues.apache.org/jira/browse/DISPATCH-2252
 Project: Qpid Dispatch
  Issue Type: Improvement
Reporter: michael goulish
Assignee: michael goulish


Investigate the router shutdown process in detail, and produce a document in 
the docs directory.






[jira] [Created] (DISPATCH-2173) 30-Mesh Behaving Badly

2021-06-15 Thread michael goulish (Jira)
michael goulish created DISPATCH-2173:
-

 Summary: 30-Mesh Behaving Badly
 Key: DISPATCH-2173
 URL: https://issues.apache.org/jira/browse/DISPATCH-2173
 Project: Qpid Dispatch
  Issue Type: Bug
  Components: Router Node
Reporter: michael goulish
Assignee: michael goulish


While testing scale-up of full-mesh networks I encountered some Bad Behavior at 
30 nodes. (435 connections.)

On my first try, 15 of the routers died.

On my second try, no nodes died – but the network never converged. It consumed 
all available CPU (32 cores) for three minutes, and the 30 routers printed a 
combined total of more than 1000 radius calculations to their logs by the time 
I became wrathful and cast them all into the Bitbucket of Woe.

 

For reference, those radius calculations are how I decide that the network has 
converged – everybody has settled down and agreed on the topology and stopped 
talking about it. The last thing each router prints to its log is a radius 
calculation, and then it's done. This may happen multiple times for each 
router, but when the total number of such prints stops changing – the network 
has converged.
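As a concrete illustration of that heuristic, here is a hypothetical helper (not part of the test suite) that counts the radius-calculation lines across a set of router logs and reports convergence when the total stops changing between polls. The log-file arguments, the "radius" search string, and the 10-second poll interval are all assumptions.

{noformat}
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Count lines containing "radius" in one router log. */
static int count_radius_lines(const char *path)
{
    FILE *f = fopen(path, "r");
    if (!f)
        return 0;
    char line[1024];
    int n = 0;
    while (fgets(line, sizeof(line), f))
        if (strstr(line, "radius"))
            n++;
    fclose(f);
    return n;
}

int main(int argc, char **argv)
{
    int prev = -1;
    for (;;) {
        int total = 0;
        for (int i = 1; i < argc; i++)      /* one log file per router */
            total += count_radius_lines(argv[i]);
        printf("radius prints: %d\n", total);
        if (total > 0 && total == prev)     /* no change since last poll */
            break;
        prev = total;
        sleep(10);
    }
    printf("converged\n");
    return 0;
}
{noformat}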

 

For 15 or 20 routers, the number of such prints was 20 or 40 or so. When this 
test exceeded that by 25x, I decided it was never going to quit.

 

...Now looking at the logs to see if I can figure out what was happening...

 






[jira] [Assigned] (DISPATCH-2122) Data race on alloc pool descriptor initialization

2021-06-14 Thread michael goulish (Jira)


 [ 
https://issues.apache.org/jira/browse/DISPATCH-2122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

michael goulish reassigned DISPATCH-2122:
-

Assignee: michael goulish  (was: Ken Giusti)

> Data race on alloc pool descriptor initialization
> -
>
> Key: DISPATCH-2122
> URL: https://issues.apache.org/jira/browse/DISPATCH-2122
> Project: Qpid Dispatch
>  Issue Type: Bug
>  Components: Router Node
>Affects Versions: 1.16.0
>Reporter: Ken Giusti
>Assignee: michael goulish
>Priority: Major
>  Labels: race-condition, tsan
> Fix For: 1.17.0
>
>
> 65: WARNING: ThreadSanitizer: data race (pid=566240) 
> 65: Read of size 4 at 0x7f67599ae2c0 by thread T4: 
> 65: #0 qd_alloc 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/alloc_pool.c:324 
> (libqpid-dispatch.so+0x6a1f2) 
> 65: #1 new_qd_link_ref_t 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/container.c:76 
> (libqpid-dispatch.so+0x79ae5) 
> 65: #2 qdr_node_connect_deliveries 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/router_node.c:67 
> (libqpid-dispatch.so+0x121a78) 
> 65: #3 CORE_link_deliver 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/router_node.c:1971 
> (libqpid-dispatch.so+0x127f1c) 
> 65: #4 qdr_link_process_deliveries 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/router_core/transfer.c:178 
> (libqpid-dispatch.so+0x1045c6) 
> 65: #5 CORE_link_push 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/router_node.c:1920 
> (libqpid-dispatch.so+0x127d00) 
> 65: #6 qdr_connection_process 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/router_core/connections.c:414 
> (libqpid-dispatch.so+0xc4bec) 
> 65: #7 AMQP_writable_conn_handler 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/router_node.c:299 
> (libqpid-dispatch.so+0x122d42) 
> 65: #8 writable_handler 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/container.c:395 
> (libqpid-dispatch.so+0x7b2e2) 
> 65: #9 qd_container_handle_event 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/container.c:747 
> (libqpid-dispatch.so+0x7cfd5) 
> 65: #10 handle /home/kgiusti/work/dispatch/qpid-dispatch/src/server.c:1096 
> (libqpid-dispatch.so+0x130537) 
> 65: #11 thread_run 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/server.c:1121 
> (libqpid-dispatch.so+0x13063a) 
> 65: #12 _thread_init 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/posix/threading.c:172 
> (libqpid-dispatch.so+0xad37a) 
> 65: #13   (libtsan.so.0+0x2d33f) 
> 65: 
> 65: Previous write of size 4 at 0x7f67599ae2c0 by thread T2 (mutexes: write 
> M10): 
> 65: #0 qd_alloc_init 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/alloc_pool.c:307 
> (libqpid-dispatch.so+0x6a14b) 
> 65: #1 qd_alloc 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/alloc_pool.c:325 
> (libqpid-dispatch.so+0x6a20b) 
> 65: #2 new_qd_link_ref_t 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/container.c:76 
> (libqpid-dispatch.so+0x79ae5) 
> 65: #3 qdr_node_connect_deliveries 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/router_node.c:67 
> (libqpid-dispatch.so+0x121a78) 
> 65: #4 CORE_link_deliver 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/router_node.c:1971 
> (libqpid-dispatch.so+0x127f1c) 
> 65: #5 qdr_link_process_deliveries 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/router_core/transfer.c:178 
> (libqpid-dispatch.so+0x1045c6) 
> 65: #6 CORE_link_push 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/router_node.c:1920 
> (libqpid-dispatch.so+0x127d00) 
> 65: #7 qdr_connection_process 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/router_core/connections.c:414 
> (libqpid-dispatch.so+0xc4bec) 
> 65: #8 AMQP_writable_conn_handler 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/router_node.c:299 
> (libqpid-dispatch.so+0x122d42) 
> 65: #9 writable_handler 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/container.c:395 
> (libqpid-dispatch.so+0x7b2e2) 
> 65: #10 qd_container_handle_event 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/container.c:747 
> (libqpid-dispatch.so+0x7cfd5) 
> 65: #11 handle /home/kgiusti/work/dispatch/qpid-dispatch/src/server.c:1096 
> (libqpid-dispatch.so+0x130537) 
> 65: #12 thread_run 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/server.c:1121 
> (libqpid-dispatch.so+0x13063a) 
> 65: #13 _thread_init 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/posix/threading.c:172 
> (libqpid-dispatch.so+0xad37a) 
> 65: #14   (libtsan.so.0+0x2d33f)
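For context, the report above looks like the usual lazily-initialized-descriptor shape: the read at alloc_pool.c:324 happens without the mutex that guards the write at alloc_pool.c:307. Below is a self-contained sketch of one race-free form of that pattern; it is illustrative only, not the actual alloc_pool.c code.

{noformat}
#include <pthread.h>
#include <stdatomic.h>

typedef struct {
    atomic_int      initialized;   /* in the racy version this is a plain
                                      field read without the lock */
    pthread_mutex_t lock;
    /* ... other descriptor fields ... */
} pool_desc_t;

static void desc_init_once(pool_desc_t *desc)
{
    /* Fast path: an atomic acquire load is race-free even without the lock. */
    if (atomic_load_explicit(&desc->initialized, memory_order_acquire))
        return;

    pthread_mutex_lock(&desc->lock);
    if (!atomic_load_explicit(&desc->initialized, memory_order_relaxed)) {
        /* ... one-time initialization of the descriptor ... */
        atomic_store_explicit(&desc->initialized, 1, memory_order_release);
    }
    pthread_mutex_unlock(&desc->lock);
}
{noformat}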






[jira] [Commented] (DISPATCH-1956) Potential deadlock: logging lock vs entity cache lock

2021-06-04 Thread michael goulish (Jira)


[ 
https://issues.apache.org/jira/browse/DISPATCH-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17357425#comment-17357425
 ] 

michael goulish commented on DISPATCH-1956:
---

I meant to close my *PR*.  Cripes.

 

> Potential deadlock: logging lock vs entity cache lock
> -
>
> Key: DISPATCH-1956
> URL: https://issues.apache.org/jira/browse/DISPATCH-1956
> Project: Qpid Dispatch
>  Issue Type: Bug
>  Components: Router Node
>Affects Versions: 1.15.0
>Reporter: Ken Giusti
>Assignee: Michael Goulish
>Priority: Major
>  Labels: deadlock, tsan
> Fix For: 1.17.0
>
> Attachments: tsan.supp
>
>
> {noformat}
> WARNING: ThreadSanitizer: lock-order-inversion (potential deadlock) 
> (pid=1474955) 
>  Cycle in lock order graph: M11 (0x7b1002c0) => M9 (0x7b100240) => 
> M11 
>  
>  Mutex M9 acquired here while holding mutex M11 in main thread: 
>  #0 pthread_mutex_lock  (libtsan.so.0+0x528ac) 
>  #1 sys_mutex_lock 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/posix/threading.c:57 
> (libqpid-dispatch.so+0x8cb7d) 
>  #2 push_event 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/entity_cache.c:63 
> (libqpid-dispatch.so+0x6fa13) 
>  #3 qd_entity_cache_add 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/entity_cache.c:69 
> (libqpid-dispatch.so+0x6fc26) 
>  #4 qd_alloc_init 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/alloc_pool.c:302 
> (libqpid-dispatch.so+0x5878b) 
>  #5 qd_alloc /home/kgiusti/work/dispatch/qpid-dispatch/src/alloc_pool.c:318 
> (libqpid-dispatch.so+0x5878b) 
>  #6 new_qd_log_entry_t /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:61 
> (libqpid-dispatch.so+0x75891) 
>  #7 qd_vlog_impl /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:426 
> (libqpid-dispatch.so+0x76205) 
>  #8 qd_log_impl /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:453 
> (libqpid-dispatch.so+0x76580) 
>  #9 qd_python_log 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/python_embedded.c:547 
> (libqpid-dispatch.so+0x8d1cb) 
>  #10   (libpython3.8.so.1.0+0x12a23b) 
>  #11 main_process 
> /home/kgiusti/work/dispatch/qpid-dispatch/router/src/main.c:95 
> (qdrouterd+0x40281c) 
>  #12 main /home/kgiusti/work/dispatch/qpid-dispatch/router/src/main.c:367 
> (qdrouterd+0x4024fc) 
>  
>  Hint: use TSAN_OPTIONS=second_deadlock_stack=1 to get more informative 
> warning message 
>  
>  Mutex M11 acquired here while holding mutex M9 in main thread: 
>  #0 pthread_mutex_lock  (libtsan.so.0+0x528ac) 
>  #1 sys_mutex_lock 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/posix/threading.c:57 
> (libqpid-dispatch.so+0x8cb7d) 
>  #2 qd_vlog_impl /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:425 
> (libqpid-dispatch.so+0x76200) 
>  #3 qd_log_impl /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:453 
> (libqpid-dispatch.so+0x76580) 
>  #4 qd_python_log 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/python_embedded.c:547 
> (libqpid-dispatch.so+0x8d1cb) 
>  #5   (libpython3.8.so.1.0+0x12a23b) 
>  #6 main_process 
> /home/kgiusti/work/dispatch/qpid-dispatch/router/src/main.c:95 
> (qdrouterd+0x40281c) 
>  #7 main /home/kgiusti/work/dispatch/qpid-dispatch/router/src/main.c:367 
> (qdrouterd+0x4024fc) 
>  
> SUMMARY: ThreadSanitizer: lock-order-inversion (potential deadlock) 
> (/lib64/libtsan.so.0+0x528ac) in __interceptor_pthread_mutex_lock
> {noformat}






[jira] [Closed] (DISPATCH-1956) Potential deadlock: logging lock vs entity cache lock

2021-06-04 Thread michael goulish (Jira)


 [ 
https://issues.apache.org/jira/browse/DISPATCH-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

michael goulish closed DISPATCH-1956.
-
Resolution: Fixed

Closing this one in favor of a better one coming shortly.

> Potential deadlock: logging lock vs entity cache lock
> -
>
> Key: DISPATCH-1956
> URL: https://issues.apache.org/jira/browse/DISPATCH-1956
> Project: Qpid Dispatch
>  Issue Type: Bug
>  Components: Router Node
>Affects Versions: 1.15.0
>Reporter: Ken Giusti
>Assignee: Michael Goulish
>Priority: Major
>  Labels: deadlock, tsan
> Fix For: 1.17.0
>
> Attachments: tsan.supp
>
>
> {noformat}
> WARNING: ThreadSanitizer: lock-order-inversion (potential deadlock) 
> (pid=1474955) 
>  Cycle in lock order graph: M11 (0x7b1002c0) => M9 (0x7b100240) => 
> M11 
>  
>  Mutex M9 acquired here while holding mutex M11 in main thread: 
>  #0 pthread_mutex_lock  (libtsan.so.0+0x528ac) 
>  #1 sys_mutex_lock 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/posix/threading.c:57 
> (libqpid-dispatch.so+0x8cb7d) 
>  #2 push_event 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/entity_cache.c:63 
> (libqpid-dispatch.so+0x6fa13) 
>  #3 qd_entity_cache_add 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/entity_cache.c:69 
> (libqpid-dispatch.so+0x6fc26) 
>  #4 qd_alloc_init 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/alloc_pool.c:302 
> (libqpid-dispatch.so+0x5878b) 
>  #5 qd_alloc /home/kgiusti/work/dispatch/qpid-dispatch/src/alloc_pool.c:318 
> (libqpid-dispatch.so+0x5878b) 
>  #6 new_qd_log_entry_t /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:61 
> (libqpid-dispatch.so+0x75891) 
>  #7 qd_vlog_impl /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:426 
> (libqpid-dispatch.so+0x76205) 
>  #8 qd_log_impl /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:453 
> (libqpid-dispatch.so+0x76580) 
>  #9 qd_python_log 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/python_embedded.c:547 
> (libqpid-dispatch.so+0x8d1cb) 
>  #10   (libpython3.8.so.1.0+0x12a23b) 
>  #11 main_process 
> /home/kgiusti/work/dispatch/qpid-dispatch/router/src/main.c:95 
> (qdrouterd+0x40281c) 
>  #12 main /home/kgiusti/work/dispatch/qpid-dispatch/router/src/main.c:367 
> (qdrouterd+0x4024fc) 
>  
>  Hint: use TSAN_OPTIONS=second_deadlock_stack=1 to get more informative 
> warning message 
>  
>  Mutex M11 acquired here while holding mutex M9 in main thread: 
>  #0 pthread_mutex_lock  (libtsan.so.0+0x528ac) 
>  #1 sys_mutex_lock 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/posix/threading.c:57 
> (libqpid-dispatch.so+0x8cb7d) 
>  #2 qd_vlog_impl /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:425 
> (libqpid-dispatch.so+0x76200) 
>  #3 qd_log_impl /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:453 
> (libqpid-dispatch.so+0x76580) 
>  #4 qd_python_log 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/python_embedded.c:547 
> (libqpid-dispatch.so+0x8d1cb) 
>  #5   (libpython3.8.so.1.0+0x12a23b) 
>  #6 main_process 
> /home/kgiusti/work/dispatch/qpid-dispatch/router/src/main.c:95 
> (qdrouterd+0x40281c) 
>  #7 main /home/kgiusti/work/dispatch/qpid-dispatch/router/src/main.c:367 
> (qdrouterd+0x4024fc) 
>  
> SUMMARY: ThreadSanitizer: lock-order-inversion (potential deadlock) 
> (/lib64/libtsan.so.0+0x528ac) in __interceptor_pthread_mutex_lock
> {noformat}






[jira] [Reopened] (DISPATCH-1956) Potential deadlock: logging lock vs entity cache lock

2021-06-04 Thread michael goulish (Jira)


 [ 
https://issues.apache.org/jira/browse/DISPATCH-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

michael goulish reopened DISPATCH-1956:
---

No, wait.  I didn't mean it to say 'fixed'.  Dang.

> Potential deadlock: logging lock vs entity cache lock
> -
>
> Key: DISPATCH-1956
> URL: https://issues.apache.org/jira/browse/DISPATCH-1956
> Project: Qpid Dispatch
>  Issue Type: Bug
>  Components: Router Node
>Affects Versions: 1.15.0
>Reporter: Ken Giusti
>Assignee: Michael Goulish
>Priority: Major
>  Labels: deadlock, tsan
> Fix For: 1.17.0
>
> Attachments: tsan.supp
>
>
> {noformat}
> WARNING: ThreadSanitizer: lock-order-inversion (potential deadlock) 
> (pid=1474955) 
>  Cycle in lock order graph: M11 (0x7b1002c0) => M9 (0x7b100240) => 
> M11 
>  
>  Mutex M9 acquired here while holding mutex M11 in main thread: 
>  #0 pthread_mutex_lock  (libtsan.so.0+0x528ac) 
>  #1 sys_mutex_lock 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/posix/threading.c:57 
> (libqpid-dispatch.so+0x8cb7d) 
>  #2 push_event 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/entity_cache.c:63 
> (libqpid-dispatch.so+0x6fa13) 
>  #3 qd_entity_cache_add 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/entity_cache.c:69 
> (libqpid-dispatch.so+0x6fc26) 
>  #4 qd_alloc_init 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/alloc_pool.c:302 
> (libqpid-dispatch.so+0x5878b) 
>  #5 qd_alloc /home/kgiusti/work/dispatch/qpid-dispatch/src/alloc_pool.c:318 
> (libqpid-dispatch.so+0x5878b) 
>  #6 new_qd_log_entry_t /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:61 
> (libqpid-dispatch.so+0x75891) 
>  #7 qd_vlog_impl /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:426 
> (libqpid-dispatch.so+0x76205) 
>  #8 qd_log_impl /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:453 
> (libqpid-dispatch.so+0x76580) 
>  #9 qd_python_log 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/python_embedded.c:547 
> (libqpid-dispatch.so+0x8d1cb) 
>  #10   (libpython3.8.so.1.0+0x12a23b) 
>  #11 main_process 
> /home/kgiusti/work/dispatch/qpid-dispatch/router/src/main.c:95 
> (qdrouterd+0x40281c) 
>  #12 main /home/kgiusti/work/dispatch/qpid-dispatch/router/src/main.c:367 
> (qdrouterd+0x4024fc) 
>  
>  Hint: use TSAN_OPTIONS=second_deadlock_stack=1 to get more informative 
> warning message 
>  
>  Mutex M11 acquired here while holding mutex M9 in main thread: 
>  #0 pthread_mutex_lock  (libtsan.so.0+0x528ac) 
>  #1 sys_mutex_lock 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/posix/threading.c:57 
> (libqpid-dispatch.so+0x8cb7d) 
>  #2 qd_vlog_impl /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:425 
> (libqpid-dispatch.so+0x76200) 
>  #3 qd_log_impl /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:453 
> (libqpid-dispatch.so+0x76580) 
>  #4 qd_python_log 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/python_embedded.c:547 
> (libqpid-dispatch.so+0x8d1cb) 
>  #5   (libpython3.8.so.1.0+0x12a23b) 
>  #6 main_process 
> /home/kgiusti/work/dispatch/qpid-dispatch/router/src/main.c:95 
> (qdrouterd+0x40281c) 
>  #7 main /home/kgiusti/work/dispatch/qpid-dispatch/router/src/main.c:367 
> (qdrouterd+0x4024fc) 
>  
> SUMMARY: ThreadSanitizer: lock-order-inversion (potential deadlock) 
> (/lib64/libtsan.so.0+0x528ac) in __interceptor_pthread_mutex_lock
> {noformat}






[jira] [Commented] (DISPATCH-1956) Potential deadlock: logging lock vs entity cache lock

2021-06-04 Thread michael goulish (Jira)


[ 
https://issues.apache.org/jira/browse/DISPATCH-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17357401#comment-17357401
 ] 

michael goulish commented on DISPATCH-1956:
---

Hold on – I think I have a much better solution to this.  Need another hour or 
two...

 

 

> Potential deadlock: logging lock vs entity cache lock
> -
>
> Key: DISPATCH-1956
> URL: https://issues.apache.org/jira/browse/DISPATCH-1956
> Project: Qpid Dispatch
>  Issue Type: Bug
>  Components: Router Node
>Affects Versions: 1.15.0
>Reporter: Ken Giusti
>Assignee: Michael Goulish
>Priority: Major
>  Labels: deadlock, tsan
> Fix For: 1.17.0
>
> Attachments: tsan.supp
>
>
> {noformat}
> WARNING: ThreadSanitizer: lock-order-inversion (potential deadlock) 
> (pid=1474955) 
>  Cycle in lock order graph: M11 (0x7b1002c0) => M9 (0x7b100240) => 
> M11 
>  
>  Mutex M9 acquired here while holding mutex M11 in main thread: 
>  #0 pthread_mutex_lock  (libtsan.so.0+0x528ac) 
>  #1 sys_mutex_lock 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/posix/threading.c:57 
> (libqpid-dispatch.so+0x8cb7d) 
>  #2 push_event 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/entity_cache.c:63 
> (libqpid-dispatch.so+0x6fa13) 
>  #3 qd_entity_cache_add 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/entity_cache.c:69 
> (libqpid-dispatch.so+0x6fc26) 
>  #4 qd_alloc_init 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/alloc_pool.c:302 
> (libqpid-dispatch.so+0x5878b) 
>  #5 qd_alloc /home/kgiusti/work/dispatch/qpid-dispatch/src/alloc_pool.c:318 
> (libqpid-dispatch.so+0x5878b) 
>  #6 new_qd_log_entry_t /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:61 
> (libqpid-dispatch.so+0x75891) 
>  #7 qd_vlog_impl /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:426 
> (libqpid-dispatch.so+0x76205) 
>  #8 qd_log_impl /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:453 
> (libqpid-dispatch.so+0x76580) 
>  #9 qd_python_log 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/python_embedded.c:547 
> (libqpid-dispatch.so+0x8d1cb) 
>  #10   (libpython3.8.so.1.0+0x12a23b) 
>  #11 main_process 
> /home/kgiusti/work/dispatch/qpid-dispatch/router/src/main.c:95 
> (qdrouterd+0x40281c) 
>  #12 main /home/kgiusti/work/dispatch/qpid-dispatch/router/src/main.c:367 
> (qdrouterd+0x4024fc) 
>  
>  Hint: use TSAN_OPTIONS=second_deadlock_stack=1 to get more informative 
> warning message 
>  
>  Mutex M11 acquired here while holding mutex M9 in main thread: 
>  #0 pthread_mutex_lock  (libtsan.so.0+0x528ac) 
>  #1 sys_mutex_lock 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/posix/threading.c:57 
> (libqpid-dispatch.so+0x8cb7d) 
>  #2 qd_vlog_impl /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:425 
> (libqpid-dispatch.so+0x76200) 
>  #3 qd_log_impl /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:453 
> (libqpid-dispatch.so+0x76580) 
>  #4 qd_python_log 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/python_embedded.c:547 
> (libqpid-dispatch.so+0x8d1cb) 
>  #5   (libpython3.8.so.1.0+0x12a23b) 
>  #6 main_process 
> /home/kgiusti/work/dispatch/qpid-dispatch/router/src/main.c:95 
> (qdrouterd+0x40281c) 
>  #7 main /home/kgiusti/work/dispatch/qpid-dispatch/router/src/main.c:367 
> (qdrouterd+0x4024fc) 
>  
> SUMMARY: ThreadSanitizer: lock-order-inversion (potential deadlock) 
> (/lib64/libtsan.so.0+0x528ac) in __interceptor_pthread_mutex_lock
> {noformat}






[jira] [Commented] (DISPATCH-1956) Potential deadlock: logging lock vs entity cache lock

2021-06-03 Thread michael goulish (Jira)


[ 
https://issues.apache.org/jira/browse/DISPATCH-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17356529#comment-17356529
 ] 

michael goulish commented on DISPATCH-1956:
---

This might be an improvement in code logic, but it will introduce changes in 
behavior that are not relevant to this PR.  Indeed – when I tried it, I got a 
test failure. 

Any code clean-up like this suggestion should be pursued as part of a separate 
PR just for that purpose. And then we can fix whatever issues it may introduce 
as part of that PR.

> Potential deadlock: logging lock vs entity cache lock
> -
>
> Key: DISPATCH-1956
> URL: https://issues.apache.org/jira/browse/DISPATCH-1956
> Project: Qpid Dispatch
>  Issue Type: Bug
>  Components: Router Node
>Affects Versions: 1.15.0
>Reporter: Ken Giusti
>Assignee: Michael Goulish
>Priority: Major
>  Labels: deadlock, tsan
> Fix For: 1.17.0
>
> Attachments: tsan.supp
>
>
> {noformat}
> WARNING: ThreadSanitizer: lock-order-inversion (potential deadlock) 
> (pid=1474955) 
>  Cycle in lock order graph: M11 (0x7b1002c0) => M9 (0x7b100240) => 
> M11 
>  
>  Mutex M9 acquired here while holding mutex M11 in main thread: 
>  #0 pthread_mutex_lock  (libtsan.so.0+0x528ac) 
>  #1 sys_mutex_lock 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/posix/threading.c:57 
> (libqpid-dispatch.so+0x8cb7d) 
>  #2 push_event 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/entity_cache.c:63 
> (libqpid-dispatch.so+0x6fa13) 
>  #3 qd_entity_cache_add 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/entity_cache.c:69 
> (libqpid-dispatch.so+0x6fc26) 
>  #4 qd_alloc_init 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/alloc_pool.c:302 
> (libqpid-dispatch.so+0x5878b) 
>  #5 qd_alloc /home/kgiusti/work/dispatch/qpid-dispatch/src/alloc_pool.c:318 
> (libqpid-dispatch.so+0x5878b) 
>  #6 new_qd_log_entry_t /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:61 
> (libqpid-dispatch.so+0x75891) 
>  #7 qd_vlog_impl /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:426 
> (libqpid-dispatch.so+0x76205) 
>  #8 qd_log_impl /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:453 
> (libqpid-dispatch.so+0x76580) 
>  #9 qd_python_log 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/python_embedded.c:547 
> (libqpid-dispatch.so+0x8d1cb) 
>  #10   (libpython3.8.so.1.0+0x12a23b) 
>  #11 main_process 
> /home/kgiusti/work/dispatch/qpid-dispatch/router/src/main.c:95 
> (qdrouterd+0x40281c) 
>  #12 main /home/kgiusti/work/dispatch/qpid-dispatch/router/src/main.c:367 
> (qdrouterd+0x4024fc) 
>  
>  Hint: use TSAN_OPTIONS=second_deadlock_stack=1 to get more informative 
> warning message 
>  
>  Mutex M11 acquired here while holding mutex M9 in main thread: 
>  #0 pthread_mutex_lock  (libtsan.so.0+0x528ac) 
>  #1 sys_mutex_lock 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/posix/threading.c:57 
> (libqpid-dispatch.so+0x8cb7d) 
>  #2 qd_vlog_impl /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:425 
> (libqpid-dispatch.so+0x76200) 
>  #3 qd_log_impl /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:453 
> (libqpid-dispatch.so+0x76580) 
>  #4 qd_python_log 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/python_embedded.c:547 
> (libqpid-dispatch.so+0x8d1cb) 
>  #5   (libpython3.8.so.1.0+0x12a23b) 
>  #6 main_process 
> /home/kgiusti/work/dispatch/qpid-dispatch/router/src/main.c:95 
> (qdrouterd+0x40281c) 
>  #7 main /home/kgiusti/work/dispatch/qpid-dispatch/router/src/main.c:367 
> (qdrouterd+0x4024fc) 
>  
> SUMMARY: ThreadSanitizer: lock-order-inversion (potential deadlock) 
> (/lib64/libtsan.so.0+0x528ac) in __interceptor_pthread_mutex_lock
> {noformat}






[jira] [Commented] (DISPATCH-1956) Potential deadlock: logging lock vs entity cache lock

2021-05-24 Thread michael goulish (Jira)


[ 
https://issues.apache.org/jira/browse/DISPATCH-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17350564#comment-17350564
 ] 

michael goulish commented on DISPATCH-1956:
---

Using Ken's reproducer, I cannot see exactly the same BT from latest master. 
But I see many reports of a similar cycle, so I will pick one of those and 
proceed.

 

Here it is:

65: WARNING: ThreadSanitizer: lock-order-inversion (potential deadlock)
65: Cycle in lock order graph: M11
65: 
65: Mutex M9 acquired here while holding mutex M11 in main thread:
65: #0 pthread_mutex_lock  
65: #1 sys_mutex_lock src/posix/threading.c:57
65: #2 push_event src/entity_cache.c:61 
65: #3 qd_entity_cache_add src/entity_cache.c:67
65: #4 qd_log_source_lh src/log.c:373
65: #5 qd_log_source_lh src/log.c:362
65: #6 qd_log_source src/log.c:381 
65: #7 qd_log_initialize src/log.c:516
65: #8 qd_dispatch src/dispatch.c:90 
65: #9 main_process router/src/main.c:92
65: #10 main router/src/main.c:369
65: 
65: Mutex M11 previously acquired by the same thread here:
65: #0 pthread_mutex_lock  
65: #1 sys_mutex_lock src/posix/threading.c:57
65: #2 qd_log_source src/log.c:380 
65: #3 qd_log_initialize src/log.c:516
65: #4 qd_dispatch src/dispatch.c:90 
65: #5 main_process router/src/main.c:92
65: #6 main router/src/main.c:369
65: 
65: Mutex M11 acquired here while holding mutex M9 in main thread:
65: #0 pthread_mutex_lock  
65: #1 sys_mutex_lock src/posix/threading.c:57
65: #2 qd_vlog_impl src/log.c:436
65: #3 qd_log_impl src/log.c:462 
65: #4 qd_python_log src/python_embedded.c:545
65: #5   
65: #6 main_process router/src/main.c:97
65: #7 main router/src/main.c:369
65: 
65: Mutex M9 previously acquired by the same thread here:
65: #0 pthread_mutex_lock  
65: #1 sys_mutex_lock src/posix/threading.c:57 
65: #2 qd_entity_refresh_begin src/entity_cache.c:78
65: #3 ffi_call_unix64  
65: #4 main_process router/src/main.c:97
65: #5 main router/src/main.c:369
65:

 

> Potential deadlock: logging lock vs entity cache lock
> -
>
> Key: DISPATCH-1956
> URL: https://issues.apache.org/jira/browse/DISPATCH-1956
> Project: Qpid Dispatch
>  Issue Type: Bug
>  Components: Router Node
>Affects Versions: 1.15.0
>Reporter: Ken Giusti
>Assignee: Michael Goulish
>Priority: Major
>  Labels: deadlock, tsan
> Fix For: 1.17.0
>
> Attachments: tsan.supp
>
>
> {noformat}
> WARNING: ThreadSanitizer: lock-order-inversion (potential deadlock) 
> (pid=1474955) 
>  Cycle in lock order graph: M11 (0x7b1002c0) => M9 (0x7b100240) => 
> M11 
>  
>  Mutex M9 acquired here while holding mutex M11 in main thread: 
>  #0 pthread_mutex_lock  (libtsan.so.0+0x528ac) 
>  #1 sys_mutex_lock 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/posix/threading.c:57 
> (libqpid-dispatch.so+0x8cb7d) 
>  #2 push_event 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/entity_cache.c:63 
> (libqpid-dispatch.so+0x6fa13) 
>  #3 qd_entity_cache_add 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/entity_cache.c:69 
> (libqpid-dispatch.so+0x6fc26) 
>  #4 qd_alloc_init 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/alloc_pool.c:302 
> (libqpid-dispatch.so+0x5878b) 
>  #5 qd_alloc /home/kgiusti/work/dispatch/qpid-dispatch/src/alloc_pool.c:318 
> (libqpid-dispatch.so+0x5878b) 
>  #6 new_qd_log_entry_t /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:61 
> (libqpid-dispatch.so+0x75891) 
>  #7 qd_vlog_impl /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:426 
> (libqpid-dispatch.so+0x76205) 
>  #8 qd_log_impl /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:453 
> (libqpid-dispatch.so+0x76580) 
>  #9 qd_python_log 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/python_embedded.c:547 
> (libqpid-dispatch.so+0x8d1cb) 
>  #10   (libpython3.8.so.1.0+0x12a23b) 
>  #11 main_process 
> /home/kgiusti/work/dispatch/qpid-dispatch/router/src/main.c:95 
> (qdrouterd+0x40281c) 
>  #12 main /home/kgiusti/work/dispatch/qpid-dispatch/router/src/main.c:367 
> (qdrouterd+0x4024fc) 
>  
>  Hint: use TSAN_OPTIONS=second_deadlock_stack=1 to get more informative 
> warning message 
>  
>  Mutex M11 acquired here while holding mutex M9 in main thread: 
>  #0 pthread_mutex_lock  (libtsan.so.0+0x528ac) 
>  #1 sys_mutex_lock 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/posix/threading.c:57 
> (libqpid-dispatch.so+0x8cb7d) 
>  #2 qd_vlog_impl /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:425 
> (libqpid-dispatch.so+0x76200) 
>  #3 qd_log_impl /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:453 
> (libqpid-dispatch.so+0x76580) 
>  #4 qd_python_log 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/python_embedded.c:547 
> (libqpid-dispatch.so+0x8d1cb) 
>  #5   (libpython3.8.so.1.0+0x12a23b) 
>  #6 main_process 
> 

[jira] [Commented] (DISPATCH-1956) Potential deadlock: logging lock vs entity cache lock

2021-05-19 Thread michael goulish (Jira)


[ 
https://issues.apache.org/jira/browse/DISPATCH-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17347716#comment-17347716
 ] 

michael goulish commented on DISPATCH-1956:
---

Thanks, Ken, that works!

I was commenting out this:

  #deadlock:qd_vlog_impl

I can see it now.

Tally ho!

 

> Potential deadlock: logging lock vs entity cache lock
> -
>
> Key: DISPATCH-1956
> URL: https://issues.apache.org/jira/browse/DISPATCH-1956
> Project: Qpid Dispatch
>  Issue Type: Bug
>  Components: Router Node
>Affects Versions: 1.15.0
>Reporter: Ken Giusti
>Assignee: Michael Goulish
>Priority: Major
>  Labels: deadlock, tsan
> Fix For: 1.17.0
>
> Attachments: tsan.supp
>
>
> {noformat}
> WARNING: ThreadSanitizer: lock-order-inversion (potential deadlock) 
> (pid=1474955) 
>  Cycle in lock order graph: M11 (0x7b1002c0) => M9 (0x7b100240) => 
> M11 
>  
>  Mutex M9 acquired here while holding mutex M11 in main thread: 
>  #0 pthread_mutex_lock  (libtsan.so.0+0x528ac) 
>  #1 sys_mutex_lock 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/posix/threading.c:57 
> (libqpid-dispatch.so+0x8cb7d) 
>  #2 push_event 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/entity_cache.c:63 
> (libqpid-dispatch.so+0x6fa13) 
>  #3 qd_entity_cache_add 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/entity_cache.c:69 
> (libqpid-dispatch.so+0x6fc26) 
>  #4 qd_alloc_init 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/alloc_pool.c:302 
> (libqpid-dispatch.so+0x5878b) 
>  #5 qd_alloc /home/kgiusti/work/dispatch/qpid-dispatch/src/alloc_pool.c:318 
> (libqpid-dispatch.so+0x5878b) 
>  #6 new_qd_log_entry_t /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:61 
> (libqpid-dispatch.so+0x75891) 
>  #7 qd_vlog_impl /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:426 
> (libqpid-dispatch.so+0x76205) 
>  #8 qd_log_impl /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:453 
> (libqpid-dispatch.so+0x76580) 
>  #9 qd_python_log 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/python_embedded.c:547 
> (libqpid-dispatch.so+0x8d1cb) 
>  #10   (libpython3.8.so.1.0+0x12a23b) 
>  #11 main_process 
> /home/kgiusti/work/dispatch/qpid-dispatch/router/src/main.c:95 
> (qdrouterd+0x40281c) 
>  #12 main /home/kgiusti/work/dispatch/qpid-dispatch/router/src/main.c:367 
> (qdrouterd+0x4024fc) 
>  
>  Hint: use TSAN_OPTIONS=second_deadlock_stack=1 to get more informative 
> warning message 
>  
>  Mutex M11 acquired here while holding mutex M9 in main thread: 
>  #0 pthread_mutex_lock  (libtsan.so.0+0x528ac) 
>  #1 sys_mutex_lock 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/posix/threading.c:57 
> (libqpid-dispatch.so+0x8cb7d) 
>  #2 qd_vlog_impl /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:425 
> (libqpid-dispatch.so+0x76200) 
>  #3 qd_log_impl /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:453 
> (libqpid-dispatch.so+0x76580) 
>  #4 qd_python_log 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/python_embedded.c:547 
> (libqpid-dispatch.so+0x8d1cb) 
>  #5   (libpython3.8.so.1.0+0x12a23b) 
>  #6 main_process 
> /home/kgiusti/work/dispatch/qpid-dispatch/router/src/main.c:95 
> (qdrouterd+0x40281c) 
>  #7 main /home/kgiusti/work/dispatch/qpid-dispatch/router/src/main.c:367 
> (qdrouterd+0x4024fc) 
>  
> SUMMARY: ThreadSanitizer: lock-order-inversion (potential deadlock) 
> (/lib64/libtsan.so.0+0x528ac) in __interceptor_pthread_mutex_lock
> {noformat}






[jira] [Commented] (DISPATCH-1956) Potential deadlock: logging lock vs entity cache lock

2021-05-19 Thread michael goulish (Jira)


[ 
https://issues.apache.org/jira/browse/DISPATCH-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17347377#comment-17347377
 ] 

michael goulish commented on DISPATCH-1956:
---

I will try the QE technique, and something I haven't tried before ... running 
multiple ctests at once!    Yow!

Except we're never going to establish the original frequency. It is unknowable. 
Imponderable. Ineffable.

SO I will run the test enough times to support a proof-by-vigorous-handwaving!

 

> Potential deadlock: logging lock vs entity cache lock
> -
>
> Key: DISPATCH-1956
> URL: https://issues.apache.org/jira/browse/DISPATCH-1956
> Project: Qpid Dispatch
>  Issue Type: Bug
>  Components: Router Node
>Affects Versions: 1.15.0
>Reporter: Ken Giusti
>Assignee: Michael Goulish
>Priority: Major
>  Labels: deadlock, tsan
> Fix For: 1.17.0
>
>
> {noformat}
> WARNING: ThreadSanitizer: lock-order-inversion (potential deadlock) 
> (pid=1474955) 
>  Cycle in lock order graph: M11 (0x7b1002c0) => M9 (0x7b100240) => 
> M11 
>  
>  Mutex M9 acquired here while holding mutex M11 in main thread: 
>  #0 pthread_mutex_lock  (libtsan.so.0+0x528ac) 
>  #1 sys_mutex_lock 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/posix/threading.c:57 
> (libqpid-dispatch.so+0x8cb7d) 
>  #2 push_event 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/entity_cache.c:63 
> (libqpid-dispatch.so+0x6fa13) 
>  #3 qd_entity_cache_add 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/entity_cache.c:69 
> (libqpid-dispatch.so+0x6fc26) 
>  #4 qd_alloc_init 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/alloc_pool.c:302 
> (libqpid-dispatch.so+0x5878b) 
>  #5 qd_alloc /home/kgiusti/work/dispatch/qpid-dispatch/src/alloc_pool.c:318 
> (libqpid-dispatch.so+0x5878b) 
>  #6 new_qd_log_entry_t /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:61 
> (libqpid-dispatch.so+0x75891) 
>  #7 qd_vlog_impl /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:426 
> (libqpid-dispatch.so+0x76205) 
>  #8 qd_log_impl /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:453 
> (libqpid-dispatch.so+0x76580) 
>  #9 qd_python_log 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/python_embedded.c:547 
> (libqpid-dispatch.so+0x8d1cb) 
>  #10   (libpython3.8.so.1.0+0x12a23b) 
>  #11 main_process 
> /home/kgiusti/work/dispatch/qpid-dispatch/router/src/main.c:95 
> (qdrouterd+0x40281c) 
>  #12 main /home/kgiusti/work/dispatch/qpid-dispatch/router/src/main.c:367 
> (qdrouterd+0x4024fc) 
>  
>  Hint: use TSAN_OPTIONS=second_deadlock_stack=1 to get more informative 
> warning message 
>  
>  Mutex M11 acquired here while holding mutex M9 in main thread: 
>  #0 pthread_mutex_lock  (libtsan.so.0+0x528ac) 
>  #1 sys_mutex_lock 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/posix/threading.c:57 
> (libqpid-dispatch.so+0x8cb7d) 
>  #2 qd_vlog_impl /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:425 
> (libqpid-dispatch.so+0x76200) 
>  #3 qd_log_impl /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:453 
> (libqpid-dispatch.so+0x76580) 
>  #4 qd_python_log 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/python_embedded.c:547 
> (libqpid-dispatch.so+0x8d1cb) 
>  #5   (libpython3.8.so.1.0+0x12a23b) 
>  #6 main_process 
> /home/kgiusti/work/dispatch/qpid-dispatch/router/src/main.c:95 
> (qdrouterd+0x40281c) 
>  #7 main /home/kgiusti/work/dispatch/qpid-dispatch/router/src/main.c:367 
> (qdrouterd+0x4024fc) 
>  
> SUMMARY: ThreadSanitizer: lock-order-inversion (potential deadlock) 
> (/lib64/libtsan.so.0+0x528ac) in __interceptor_pthread_mutex_lock
> {noformat}






[jira] [Commented] (DISPATCH-1956) Potential deadlock: logging lock vs entity cache lock

2021-05-19 Thread michael goulish (Jira)


[ 
https://issues.apache.org/jira/browse/DISPATCH-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17347365#comment-17347365
 ] 

michael goulish commented on DISPATCH-1956:
---

I am trying to restore my 'mgoulish' RH account, but I need help from someone 
with magical powers.

 

I assumed that if I ran ctest, that would be sufficient. But now that you 
inform me that TSan issues do not reliably manifest, I will run ctest more 
times and see if I can get it to show itself.

But if we don't know how it was observed, nor with what frequency, how will 
we know when it is fixed?

 

 

> Potential deadlock: logging lock vs entity cache lock
> -
>
> Key: DISPATCH-1956
> URL: https://issues.apache.org/jira/browse/DISPATCH-1956
> Project: Qpid Dispatch
>  Issue Type: Bug
>  Components: Router Node
>Affects Versions: 1.15.0
>Reporter: Ken Giusti
>Assignee: Michael Goulish
>Priority: Major
>  Labels: deadlock, tsan
> Fix For: 1.17.0
>
>
> {noformat}
> WARNING: ThreadSanitizer: lock-order-inversion (potential deadlock) 
> (pid=1474955) 
>  Cycle in lock order graph: M11 (0x7b1002c0) => M9 (0x7b100240) => 
> M11 
>  
>  Mutex M9 acquired here while holding mutex M11 in main thread: 
>  #0 pthread_mutex_lock  (libtsan.so.0+0x528ac) 
>  #1 sys_mutex_lock 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/posix/threading.c:57 
> (libqpid-dispatch.so+0x8cb7d) 
>  #2 push_event 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/entity_cache.c:63 
> (libqpid-dispatch.so+0x6fa13) 
>  #3 qd_entity_cache_add 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/entity_cache.c:69 
> (libqpid-dispatch.so+0x6fc26) 
>  #4 qd_alloc_init 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/alloc_pool.c:302 
> (libqpid-dispatch.so+0x5878b) 
>  #5 qd_alloc /home/kgiusti/work/dispatch/qpid-dispatch/src/alloc_pool.c:318 
> (libqpid-dispatch.so+0x5878b) 
>  #6 new_qd_log_entry_t /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:61 
> (libqpid-dispatch.so+0x75891) 
>  #7 qd_vlog_impl /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:426 
> (libqpid-dispatch.so+0x76205) 
>  #8 qd_log_impl /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:453 
> (libqpid-dispatch.so+0x76580) 
>  #9 qd_python_log 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/python_embedded.c:547 
> (libqpid-dispatch.so+0x8d1cb) 
>  #10   (libpython3.8.so.1.0+0x12a23b) 
>  #11 main_process 
> /home/kgiusti/work/dispatch/qpid-dispatch/router/src/main.c:95 
> (qdrouterd+0x40281c) 
>  #12 main /home/kgiusti/work/dispatch/qpid-dispatch/router/src/main.c:367 
> (qdrouterd+0x4024fc) 
>  
>  Hint: use TSAN_OPTIONS=second_deadlock_stack=1 to get more informative 
> warning message 
>  
>  Mutex M11 acquired here while holding mutex M9 in main thread: 
>  #0 pthread_mutex_lock  (libtsan.so.0+0x528ac) 
>  #1 sys_mutex_lock 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/posix/threading.c:57 
> (libqpid-dispatch.so+0x8cb7d) 
>  #2 qd_vlog_impl /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:425 
> (libqpid-dispatch.so+0x76200) 
>  #3 qd_log_impl /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:453 
> (libqpid-dispatch.so+0x76580) 
>  #4 qd_python_log 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/python_embedded.c:547 
> (libqpid-dispatch.so+0x8d1cb) 
>  #5   (libpython3.8.so.1.0+0x12a23b) 
>  #6 main_process 
> /home/kgiusti/work/dispatch/qpid-dispatch/router/src/main.c:95 
> (qdrouterd+0x40281c) 
>  #7 main /home/kgiusti/work/dispatch/qpid-dispatch/router/src/main.c:367 
> (qdrouterd+0x4024fc) 
>  
> SUMMARY: ThreadSanitizer: lock-order-inversion (potential deadlock) 
> (/lib64/libtsan.so.0+0x528ac) in __interceptor_pthread_mutex_lock
> {noformat}






[jira] [Commented] (DISPATCH-1956) Potential deadlock: logging lock vs entity cache lock

2021-05-19 Thread michael goulish (Jira)


[ 
https://issues.apache.org/jira/browse/DISPATCH-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17347330#comment-17347330
 ] 

michael goulish commented on DISPATCH-1956:
---

As of recent code on master, this is gone.

If I unsuppress the following issues:

{{   #race:qd_vlog_impl}}
{{   #deadlock:qd_vlog_impl}}
{{   #race:qd_log_entity}}

...and then run {{ctest -VV}}, I get 2676 mentions of the qd_vlog_impl race 
(yikes!), 6 mentions of the qd_log_entity race, and 0 mentions of this 
qd_vlog_impl deadlock.
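For anyone repeating this, the mechanism is ThreadSanitizer's suppressions file: commenting an entry out lets that report appear again on the next run. A generic illustration follows; the path is a placeholder, and how the suppressions file is actually wired into the test run may differ in this project's build setup.

{noformat}
# In the attached tsan.supp, commenting an entry out (leading '#') lets
# that report show up again when the tests are run:
#race:qd_vlog_impl
#deadlock:qd_vlog_impl
#race:qd_log_entity

# Then point ThreadSanitizer at the file and re-run:
TSAN_OPTIONS="suppressions=/path/to/tsan.supp second_deadlock_stack=1" ctest -VV
{noformat}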

 

I guess this should be closed, but I do not seem to have permission to close it.

I will try to get my account fixed.

> Potential deadlock: logging lock vs entity cache lock
> -
>
> Key: DISPATCH-1956
> URL: https://issues.apache.org/jira/browse/DISPATCH-1956
> Project: Qpid Dispatch
>  Issue Type: Bug
>  Components: Router Node
>Affects Versions: 1.15.0
>Reporter: Ken Giusti
>Assignee: Michael Goulish
>Priority: Major
>  Labels: deadlock, tsan
> Fix For: 1.17.0
>
>
> {noformat}
> WARNING: ThreadSanitizer: lock-order-inversion (potential deadlock) 
> (pid=1474955) 
>  Cycle in lock order graph: M11 (0x7b1002c0) => M9 (0x7b100240) => 
> M11 
>  
>  Mutex M9 acquired here while holding mutex M11 in main thread: 
>  #0 pthread_mutex_lock  (libtsan.so.0+0x528ac) 
>  #1 sys_mutex_lock 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/posix/threading.c:57 
> (libqpid-dispatch.so+0x8cb7d) 
>  #2 push_event 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/entity_cache.c:63 
> (libqpid-dispatch.so+0x6fa13) 
>  #3 qd_entity_cache_add 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/entity_cache.c:69 
> (libqpid-dispatch.so+0x6fc26) 
>  #4 qd_alloc_init 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/alloc_pool.c:302 
> (libqpid-dispatch.so+0x5878b) 
>  #5 qd_alloc /home/kgiusti/work/dispatch/qpid-dispatch/src/alloc_pool.c:318 
> (libqpid-dispatch.so+0x5878b) 
>  #6 new_qd_log_entry_t /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:61 
> (libqpid-dispatch.so+0x75891) 
>  #7 qd_vlog_impl /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:426 
> (libqpid-dispatch.so+0x76205) 
>  #8 qd_log_impl /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:453 
> (libqpid-dispatch.so+0x76580) 
>  #9 qd_python_log 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/python_embedded.c:547 
> (libqpid-dispatch.so+0x8d1cb) 
>  #10   (libpython3.8.so.1.0+0x12a23b) 
>  #11 main_process 
> /home/kgiusti/work/dispatch/qpid-dispatch/router/src/main.c:95 
> (qdrouterd+0x40281c) 
>  #12 main /home/kgiusti/work/dispatch/qpid-dispatch/router/src/main.c:367 
> (qdrouterd+0x4024fc) 
>  
>  Hint: use TSAN_OPTIONS=second_deadlock_stack=1 to get more informative 
> warning message 
>  
>  Mutex M11 acquired here while holding mutex M9 in main thread: 
>  #0 pthread_mutex_lock  (libtsan.so.0+0x528ac) 
>  #1 sys_mutex_lock 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/posix/threading.c:57 
> (libqpid-dispatch.so+0x8cb7d) 
>  #2 qd_vlog_impl /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:425 
> (libqpid-dispatch.so+0x76200) 
>  #3 qd_log_impl /home/kgiusti/work/dispatch/qpid-dispatch/src/log.c:453 
> (libqpid-dispatch.so+0x76580) 
>  #4 qd_python_log 
> /home/kgiusti/work/dispatch/qpid-dispatch/src/python_embedded.c:547 
> (libqpid-dispatch.so+0x8d1cb) 
>  #5   (libpython3.8.so.1.0+0x12a23b) 
>  #6 main_process 
> /home/kgiusti/work/dispatch/qpid-dispatch/router/src/main.c:95 
> (qdrouterd+0x40281c) 
>  #7 main /home/kgiusti/work/dispatch/qpid-dispatch/router/src/main.c:367 
> (qdrouterd+0x4024fc) 
>  
> SUMMARY: ThreadSanitizer: lock-order-inversion (potential deadlock) 
> (/lib64/libtsan.so.0+0x528ac) in __interceptor_pthread_mutex_lock
> {noformat}






[jira] [Closed] (DISPATCH-2088) SEGV in qd_buffer_dec_fanout

2021-04-29 Thread michael goulish (Jira)


 [ 
https://issues.apache.org/jira/browse/DISPATCH-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

michael goulish closed DISPATCH-2088.
-
Resolution: Fixed

Fixed by Chuck's PR:

https://github.com/apache/qpid-dispatch/pull/1174

> SEGV in qd_buffer_dec_fanout
> 
>
> Key: DISPATCH-2088
> URL: https://issues.apache.org/jira/browse/DISPATCH-2088
> Project: Qpid Dispatch
>  Issue Type: Bug
>  Components: Protocol Adaptors
>Reporter: michael goulish
>Assignee: Charles E. Rolke
>Priority: Blocker
> Fix For: 1.16.0
>
>
> *code from 2021-04-26-afternoon*
> {
>   dispatch: (main) 22689e4f95ae1945e61eec814d3ab3e2d4259f04
>   proton: (main) 08b301a97c834e002d41ee852bba1288fe83b936
> }
>  
> *Test*
>  * Doing 1-router TCP throughput testing across high-bandwidth link.
>  * Router has 32 worker threads.
>  * iperf client is using "-P 10" flag, i.e. doing 10 parallel streams.  
>  * Router is sustaining 10+ Gbit/sec during test.
>  * SEGV happens at end of test.
>  
> Here's the backtrace:
>  
> {color:#de350b}#0 sys_atomic_sub (value=1, ref=0x14){color}
> {color:#de350b} at 
> /home/mick/latest/qpid-dispatch/include/qpid/dispatch/atomic.h:48{color}
> {color:#de350b}#1 sys_atomic_dec (ref=0x14){color}
> {color:#de350b} at 
> /home/mick/latest/qpid-dispatch/include/qpid/dispatch/atomic.h:212{color}
> {color:#de350b}#2 qd_buffer_dec_fanout (buf=0x0){color}
> {color:#de350b} at 
> /home/mick/latest/qpid-dispatch/include/qpid/dispatch/buffer.h:177{color}
> {color:#de350b}#3 qd_message_stream_data_release 
> (stream_data=0x7f01b80038c8){color}
> {color:#de350b} at /home/mick/latest/qpid-dispatch/src/message.c:2627{color}
> {color:#de350b}#4 0x7f0237035895 in flush_outgoing_buffs 
> (conn=conn@entry=0x7f0218012a88){color}
> {color:#de350b} at 
> /home/mick/latest/qpid-dispatch/src/adaptors/tcp_adaptor.c:431{color}
> {color:#de350b}#5 0x7f023703905e in free_qdr_tcp_connection 
> (tc=0x7f0218012a88){color}
> {color:#de350b} at 
> /home/mick/latest/qpid-dispatch/src/adaptors/tcp_adaptor.c:455{color}
> {color:#de350b}#6 0x7f023707491d in router_core_thread 
> (arg=0x1e6ccb0){color}
> {color:#de350b} at 
> /home/mick/latest/qpid-dispatch/src/router_core/router_core_thread.c:239{color}
> {color:#de350b}#7 0x7f0236f663f9 in start_thread () from 
> /lib64/libpthread.so.0{color}
> {color:#de350b}#8 0x7f0236be5b53 in clone () from /lib64/libc.so.6{color}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (DISPATCH-2088) SEGV in qd_buffer_dec_fanout

2021-04-28 Thread michael goulish (Jira)


[ 
https://issues.apache.org/jira/browse/DISPATCH-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17334743#comment-17334743
 ] 

michael goulish commented on DISPATCH-2088:
---

Here you go!

 

 


(gdb) thread apply all bt

{color:#172b4d}Thread 33{color} (Thread 0x7fa320ff9640 (LWP 53393)):
#0 0x7fa343d7f6c2 in pthread_cond_wait@@GLIBC_2.3.2 () from 
/lib64/libpthread.so.0
#1 0x7fa343dbdcbb in suspend (ts=0x7fa2fb60, p=0xd46d30) at 
/home/mick/latest/qpid-proton/c/src/proactor/epoll.c:393
#2 next_event_batch (p=0xd46d30, can_block=true) at 
/home/mick/latest/qpid-proton/c/src/proactor/epoll.c:2455
#3 0x7fa343e9cf9f in thread_run (arg=0xb52c00) at 
/home/mick/latest/qpid-dispatch/src/server.c:1105
#4 0x7fa343d793f9 in start_thread () from /lib64/libpthread.so.0
#5 0x7fa3439f8b53 in clone () from /lib64/libc.so.6

{color:#172b4d}Thread 32{color} (Thread 0x7fa2e8ff9640 (LWP 53408)):
#0 0x7fa343d7f6c2 in pthread_cond_wait@@GLIBC_2.3.2 () from 
/lib64/libpthread.so.0
#1 0x7fa343dbdcbb in suspend (ts=0x7fa2a8000b60, p=0xd46d30) at 
/home/mick/latest/qpid-proton/c/src/proactor/epoll.c:393
#2 next_event_batch (p=0xd46d30, can_block=true) at 
/home/mick/latest/qpid-proton/c/src/proactor/epoll.c:2455
#3 0x7fa343e9cf9f in thread_run (arg=0xb52c00) at 
/home/mick/latest/qpid-dispatch/src/server.c:1105
#4 0x7fa343d793f9 in start_thread () from /lib64/libpthread.so.0
#5 0x7fa3439f8b53 in clone () from /lib64/libc.so.6

{color:#172b4d}Thread 31{color} (Thread 0x7fa2e37fe640 (LWP 53409)):
#0 0x7fa343d7f6c2 in pthread_cond_wait@@GLIBC_2.3.2 () from 
/lib64/libpthread.so.0
#1 0x7fa343dbdcbb in suspend (ts=0x7fa2bc000b60, p=0xd46d30) at 
/home/mick/latest/qpid-proton/c/src/proactor/epoll.c:393
#2 next_event_batch (p=0xd46d30, can_block=true) at 
/home/mick/latest/qpid-proton/c/src/proactor/epoll.c:2455
#3 0x7fa343e9cf9f in thread_run (arg=0xb52c00) at 
/home/mick/latest/qpid-dispatch/src/server.c:1105
#4 0x7fa343d793f9 in start_thread () from /lib64/libpthread.so.0
#5 0x7fa3439f8b53 in clone () from /lib64/libc.so.6

Thread 30 (Thread 0x7fa30effd640 (LWP 53396)):
#0 0x7fa343d7f6c2 in pthread_cond_wait@@GLIBC_2.3.2 () from 
/lib64/libpthread.so.0
#1 0x7fa343dbdcbb in suspend (ts=0x7fa30b60, p=0xd46d30) at 
/home/mick/latest/qpid-proton/c/src/proactor/epoll.c:393
#2 next_event_batch (p=0xd46d30, can_block=true) at 
/home/mick/latest/qpid-proton/c/src/proactor/epoll.c:2455
#3 0x7fa343e9cf9f in thread_run (arg=0xb52c00) at 
/home/mick/latest/qpid-dispatch/src/server.c:1105
#4 0x7fa343d793f9 in start_thread () from /lib64/libpthread.so.0
#5 0x7fa3439f8b53 in clone () from /lib64/libc.so.6

Thread 29 (Thread 0x7fa30f7fe640 (LWP 53395)):
#0 0x7fa343d7f6c2 in pthread_cond_wait@@GLIBC_2.3.2 () from 
/lib64/libpthread.so.0
#1 0x7fa343dbdcbb in suspend (ts=0x7fa2fc000b60, p=0xd46d30) at 
/home/mick/latest/qpid-proton/c/src/proactor/epoll.c:393
#2 next_event_batch (p=0xd46d30, can_block=true) at 
/home/mick/latest/qpid-proton/c/src/proactor/epoll.c:2455
#3 0x7fa343e9cf9f in thread_run (arg=0xb52c00) at 
/home/mick/latest/qpid-dispatch/src/server.c:1105
#4 0x7fa343d793f9 in start_thread () from /lib64/libpthread.so.0
#5 0x7fa3439f8b53 in clone () from /lib64/libc.so.6

Thread 28 (Thread 0x7fa30cff9640 (LWP 53400)):
#0 0x7fa343d7f6c2 in pthread_cond_wait@@GLIBC_2.3.2 () from 
/lib64/libpthread.so.0
#1 0x7fa343dbdcbb in suspend (ts=0x7fa2e4000b60, p=0xd46d30) at 
/home/mick/latest/qpid-proton/c/src/proactor/epoll.c:393
#2 next_event_batch (p=0xd46d30, can_block=true) at 
/home/mick/latest/qpid-proton/c/src/proactor/epoll.c:2455
#3 0x7fa343e9cf9f in thread_run (arg=0xb52c00) at 
/home/mick/latest/qpid-dispatch/src/server.c:1105
#4 0x7fa343d793f9 in start_thread () from /lib64/libpthread.so.0
#5 0x7fa3439f8b53 in clone () from /lib64/libc.so.6

*{color:#de350b}Thread 27{color}* (Thread 0x7fa2eb7fe640 (LWP 53403)):
#0 0x7fa343d8350c in send () from /lib64/libpthread.so.0
#1 0x7fa343dbe718 in snd (s=512, b=, fd=25) at 
/home/mick/latest/qpid-proton/c/src/proactor/epoll_raw_connection.c:333
#2 pni_raw_write (send=, set_error=, 
sock=, conn=) at 
/home/mick/latest/qpid-proton/c/src/proactor/raw_connection.c:566
#3 pni_raw_write (send=, set_error=, sock=25, 
conn=0x7fa2dc129cf0) at 
/home/mick/latest/qpid-proton/c/src/proactor/raw_connection.c:554
#4 pni_raw_connection_process (sched_ready=, t=0x7fa2dc129c30) 
at /home/mick/latest/qpid-proton/c/src/proactor/epoll_raw_connection.c:388
#5 process (tsk=0x7fa2dc129c30) at 
/home/mick/latest/qpid-proton/c/src/proactor/epoll.c:2230
#6 next_event_batch (p=, can_block=true) at 

[jira] [Commented] (DISPATCH-2088) SEGV in qd_buffer_dec_fanout

2021-04-28 Thread michael goulish (Jira)


[ 
https://issues.apache.org/jira/browse/DISPATCH-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17334707#comment-17334707
 ] 

michael goulish commented on DISPATCH-2088:
---

 

I cannot repro with Debug build.

400 iterations with no failure.

 

> SEGV in qd_buffer_dec_fanout
> 
>
> Key: DISPATCH-2088
> URL: https://issues.apache.org/jira/browse/DISPATCH-2088
> Project: Qpid Dispatch
>  Issue Type: Bug
>  Components: Protocol Adaptors
>Reporter: michael goulish
>Priority: Major
>
> *code from 2021-04-26-afternoon*
> {
>   dispatch: (main) 22689e4f95ae1945e61eec814d3ab3e2d4259f04
>   proton: (main) 08b301a97c834e002d41ee852bba1288fe83b936
> }
>  
> *Test*
>  * Doing 1-router TCP throughput testing across high-bandwidth link.
>  * Router has 32 worker threads.
>  * iperf client is using "-P 10" flag, i.e. doing 10 parallel streams.  
>  * Router is sustaining 10+ Gbit/sec during test.
>  * SEGV happens at end of test.
>  
> Here's the backtrace:
>  
> {color:#de350b}#0 sys_atomic_sub (value=1, ref=0x14){color}
> {color:#de350b} at 
> /home/mick/latest/qpid-dispatch/include/qpid/dispatch/atomic.h:48{color}
> {color:#de350b}#1 sys_atomic_dec (ref=0x14){color}
> {color:#de350b} at 
> /home/mick/latest/qpid-dispatch/include/qpid/dispatch/atomic.h:212{color}
> {color:#de350b}#2 qd_buffer_dec_fanout (buf=0x0){color}
> {color:#de350b} at 
> /home/mick/latest/qpid-dispatch/include/qpid/dispatch/buffer.h:177{color}
> {color:#de350b}#3 qd_message_stream_data_release 
> (stream_data=0x7f01b80038c8){color}
> {color:#de350b} at /home/mick/latest/qpid-dispatch/src/message.c:2627{color}
> {color:#de350b}#4 0x7f0237035895 in flush_outgoing_buffs 
> (conn=conn@entry=0x7f0218012a88){color}
> {color:#de350b} at 
> /home/mick/latest/qpid-dispatch/src/adaptors/tcp_adaptor.c:431{color}
> {color:#de350b}#5 0x7f023703905e in free_qdr_tcp_connection 
> (tc=0x7f0218012a88){color}
> {color:#de350b} at 
> /home/mick/latest/qpid-dispatch/src/adaptors/tcp_adaptor.c:455{color}
> {color:#de350b}#6 0x7f023707491d in router_core_thread 
> (arg=0x1e6ccb0){color}
> {color:#de350b} at 
> /home/mick/latest/qpid-dispatch/src/router_core/router_core_thread.c:239{color}
> {color:#de350b}#7 0x7f0236f663f9 in start_thread () from 
> /lib64/libpthread.so.0{color}
> {color:#de350b}#8 0x7f0236be5b53 in clone () from /lib64/libc.so.6{color}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (DISPATCH-2088) SEGV in qd_buffer_dec_fanout

2021-04-27 Thread michael goulish (Jira)


[ 
https://issues.apache.org/jira/browse/DISPATCH-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17334451#comment-17334451
 ] 

michael goulish commented on DISPATCH-2088:
---

I'm afraid only the last few lines have anything in them.

 

 

2021-04-28 01:04:03.818860 -0400 ROUTER_CORE (info) [C190][L379] Stuck 
delivery: At least one delivery on this link has been undelivered/unsettled for 
more than 10 seconds
2021-04-28 01:04:03.818868 -0400 ROUTER_CORE (info) [C191][L380] Stuck 
delivery: At least one delivery on this link has been undelivered/unsettled for 
more than 10 seconds
2021-04-28 01:04:03.818877 -0400 ROUTER_CORE (info) [C191][L381] Stuck 
delivery: At least one delivery on this link has been undelivered/unsettled for 
more than 10 seconds
2021-04-28 01:04:03.818882 -0400 ROUTER_CORE (info) [C192][L382] Stuck 
delivery: At least one delivery on this link has been undelivered/unsettled for 
more than 10 seconds
2021-04-28 01:04:03.818893 -0400 ROUTER_CORE (info) [C192][L383] Stuck 
delivery: At least one delivery on this link has been undelivered/unsettled for 
more than 10 seconds
2021-04-28 01:04:03.818905 -0400 ROUTER_CORE (info) [C193][L384] Stuck 
delivery: At least one delivery on this link has been undelivered/unsettled for 
more than 10 seconds
2021-04-28 01:04:03.818913 -0400 ROUTER_CORE (info) [C193][L385] Stuck 
delivery: At least one delivery on this link has been undelivered/unsettled for 
more than 10 seconds
2021-04-28 01:04:03.818926 -0400 ROUTER_CORE (info) [C194][L386] Stuck 
delivery: At least one delivery on this link has been undelivered/unsettled for 
more than 10 seconds
2021-04-28 01:04:03.818931 -0400 ROUTER_CORE (info) [C194][L387] Stuck 
delivery: At least one delivery on this link has been undelivered/unsettled for 
more than 10 seconds
2021-04-28 01:04:03.818944 -0400 ROUTER_CORE (info) [C195][L388] Stuck 
delivery: At least one delivery on this link has been undelivered/unsettled for 
more than 10 seconds
2021-04-28 01:04:03.818949 -0400 ROUTER_CORE (info) [C195][L389] Stuck 
delivery: At least one delivery on this link has been undelivered/unsettled for 
more than 10 seconds
2021-04-28 01:04:03.818957 -0400 ROUTER_CORE (info) [C196][L390] Stuck 
delivery: At least one delivery on this link has been undelivered/unsettled for 
more than 10 seconds
2021-04-28 01:04:03.819046 -0400 ROUTER_CORE (info) [C196][L391] Stuck 
delivery: At least one delivery on this link has been undelivered/unsettled for 
more than 10 seconds
2021-04-28 01:04:03.819052 -0400 ROUTER_CORE (info) [C197][L392] Stuck 
delivery: At least one delivery on this link has been undelivered/unsettled for 
more than 10 seconds
2021-04-28 01:04:03.819059 -0400 ROUTER_CORE (info) [C197][L393] Stuck 
delivery: At least one delivery on this link has been undelivered/unsettled for 
more than 10 seconds
2021-04-28 01:04:03.819074 -0400 ROUTER_CORE (info) [C198][L394] Stuck 
delivery: At least one delivery on this link has been undelivered/unsettled for 
more than 10 seconds
2021-04-28 01:04:03.819081 -0400 ROUTER_CORE (info) [C198][L395] Stuck 
delivery: At least one delivery on this link has been undelivered/unsettled for 
more than 10 seconds
2021-04-28 01:04:03.819087 -0400 ROUTER_CORE (info) [C199][L396] Stuck 
delivery: At least one delivery on this link has been undelivered/unsettled for 
more than 10 seconds
2021-04-28 01:04:03.819096 -0400 ROUTER_CORE (info) [C199][L397] Stuck 
delivery: At least one delivery on this link has been undelivered/unsettled for 
more than 10 seconds
2021-04-28 01:04:34.431844 -0400 TCP_ADAPTOR (info) [C181] 
PN_RAW_CONNECTION_DISCONNECTED connector
2021-04-28 01:04:34.431903 -0400 TCP_ADAPTOR (info) [C180] EOS
2021-04-28 01:04:34.431956 -0400 ROUTER_CORE (info) [C181][L361] Link lost: 
del=1 presett=1 psdrop=0 acc=0 rej=0 rel=0 mod=0 delay1=0 delay10=1 blocked=no
2021-04-28 01:04:34.432011 -0400 ROUTER_CORE (info) [C181][L360] Link lost: 
del=0 presett=0 psdrop=0 acc=0 rej=0 rel=0 mod=0 delay1=0 delay10=1 blocked=no
2021-04-28 01:04:34.432026 -0400 ROUTER_CORE (info) [C181] Connection Closed
2021-04-28 01:04:34.432479 -0400 TCP_ADAPTOR (info) [C183] 
PN_RAW_CONNECTION_DISCONNECTED connector
./r_one_router_Br: line 7: 27584 Segmentation fault (core dumped) qdrouterd 
--config ./Br_1.conf

> SEGV in qd_buffer_dec_fanout
> 
>
> Key: DISPATCH-2088
> URL: https://issues.apache.org/jira/browse/DISPATCH-2088
> Project: Qpid Dispatch
>  Issue Type: Bug
>  Components: Protocol Adaptors
>Reporter: michael goulish
>Priority: Major
>
> *code from 2021-04-26-afternoon*
> {
>   dispatch: (main) 22689e4f95ae1945e61eec814d3ab3e2d4259f04
>   proton: (main) 08b301a97c834e002d41ee852bba1288fe83b936
> }
>  
> *Test*
>  * Doing 1-router TCP throughput testing across high-bandwidth link.
>  * Router has 32 

[jira] [Commented] (DISPATCH-2088) SEGV in qd_buffer_dec_fanout

2021-04-27 Thread michael goulish (Jira)


[ 
https://issues.apache.org/jira/browse/DISPATCH-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1780#comment-1780
 ] 

michael goulish commented on DISPATCH-2088:
---

Apparently it helps if you let the code cool down a while.

I tried it again after a break and it crashed immediately – same backtrace. 
(And with "-P 10" on the iperf client.)

So that is 2 crashes in 42 attempts.

 

Here is my router config file:

router {
    mode: interior
    id: Br
    workerThreads: 32
}

tcpListener {
    host: 10.10.10.1
    port: 9090
    address: throughput
    siteId: my-site
}

tcpConnector {
    host: 10.10.10.1
    port: 8080
    address: throughput
    siteId: my-site
}
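
For reference, the launch command from the crash log above, assuming this 
config is saved as Br_1.conf:

   qdrouterd --config ./Br_1.conf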

 

 

 

> SEGV in qd_buffer_dec_fanout
> 
>
> Key: DISPATCH-2088
> URL: https://issues.apache.org/jira/browse/DISPATCH-2088
> Project: Qpid Dispatch
>  Issue Type: Bug
>  Components: Protocol Adaptors
>Reporter: michael goulish
>Priority: Major
>
> *code from 2021-04-26-afternoon*
> {
>   dispatch: (main) 22689e4f95ae1945e61eec814d3ab3e2d4259f04
>   proton: (main) 08b301a97c834e002d41ee852bba1288fe83b936
> }
>  
> *Test*
>  * Doing 1-router TCP throughput testing across high-bandwidth link.
>  * Router has 32 worker threads.
>  * iperf client is using "-P 10" flag, i.e. doing 10 parallel streams.  
>  * Router is sustaining 10+ Gbit/sec during test.
>  * SEGV happens at end of test.
>  
> Here's the backtrace:
>  
> {color:#de350b}#0 sys_atomic_sub (value=1, ref=0x14){color}
> {color:#de350b} at 
> /home/mick/latest/qpid-dispatch/include/qpid/dispatch/atomic.h:48{color}
> {color:#de350b}#1 sys_atomic_dec (ref=0x14){color}
> {color:#de350b} at 
> /home/mick/latest/qpid-dispatch/include/qpid/dispatch/atomic.h:212{color}
> {color:#de350b}#2 qd_buffer_dec_fanout (buf=0x0){color}
> {color:#de350b} at 
> /home/mick/latest/qpid-dispatch/include/qpid/dispatch/buffer.h:177{color}
> {color:#de350b}#3 qd_message_stream_data_release 
> (stream_data=0x7f01b80038c8){color}
> {color:#de350b} at /home/mick/latest/qpid-dispatch/src/message.c:2627{color}
> {color:#de350b}#4 0x7f0237035895 in flush_outgoing_buffs 
> (conn=conn@entry=0x7f0218012a88){color}
> {color:#de350b} at 
> /home/mick/latest/qpid-dispatch/src/adaptors/tcp_adaptor.c:431{color}
> {color:#de350b}#5 0x7f023703905e in free_qdr_tcp_connection 
> (tc=0x7f0218012a88){color}
> {color:#de350b} at 
> /home/mick/latest/qpid-dispatch/src/adaptors/tcp_adaptor.c:455{color}
> {color:#de350b}#6 0x7f023707491d in router_core_thread 
> (arg=0x1e6ccb0){color}
> {color:#de350b} at 
> /home/mick/latest/qpid-dispatch/src/router_core/router_core_thread.c:239{color}
> {color:#de350b}#7 0x7f0236f663f9 in start_thread () from 
> /lib64/libpthread.so.0{color}
> {color:#de350b}#8 0x7f0236be5b53 in clone () from /lib64/libc.so.6{color}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (DISPATCH-2088) SEGV in qd_buffer_dec_fanout

2021-04-27 Thread michael goulish (Jira)


[ 
https://issues.apache.org/jira/browse/DISPATCH-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17333284#comment-17333284
 ] 

michael goulish commented on DISPATCH-2088:
---

*The iperf commands I used in the test:*

   iperf3 -s -p 8080    # server

   iperf3 -c 10.10.10.1 -p 9090 -t 60 -P 10    # client



( The router's TCP listener was on port 9090, while its TCP connector was on 
8080. )

 

*Reproducibility:*

  Not trivial.

  I reduced the test time to 10 seconds and tried 40 more times – without success. 
10 of those trials were with 100 parallel streams on the iperf sender, and 10 
of them were with 200 parallel streams.
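
A rough sketch of the kind of repro loop those trials amount to; the iteration 
count, stream count, and crash check are illustrative assumptions, not the 
exact harness:

   # Hedged sketch -- counts and the crash check are assumptions.
   for i in $(seq 1 40); do
       iperf3 -c 10.10.10.1 -p 9090 -t 10 -P 100 > /dev/null
       pgrep qdrouterd > /dev/null || { echo "router died on run $i"; break; }
   done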

 

> SEGV in qd_buffer_dec_fanout
> 
>
> Key: DISPATCH-2088
> URL: https://issues.apache.org/jira/browse/DISPATCH-2088
> Project: Qpid Dispatch
>  Issue Type: Bug
>  Components: Protocol Adaptors
>Reporter: michael goulish
>Priority: Major
>
> *code from 2021-04-26-afternoon*
> {
>   dispatch: (main) 22689e4f95ae1945e61eec814d3ab3e2d4259f04
>   proton: (main) 08b301a97c834e002d41ee852bba1288fe83b936
> }
>  
> *Test*
>  * Doing 1-router TCP throughput testing across high-bandwidth link.
>  * Router has 32 worker threads.
>  * iperf client is using "-P 10" flag, i.e. doing 10 parallel streams.  
>  * Router is sustaining 10+ Gbit/sec during test.
>  * SEGV happens at end of test.
>  
> Here's the backtrace:
>  
> {color:#de350b}#0 sys_atomic_sub (value=1, ref=0x14){color}
> {color:#de350b} at 
> /home/mick/latest/qpid-dispatch/include/qpid/dispatch/atomic.h:48{color}
> {color:#de350b}#1 sys_atomic_dec (ref=0x14){color}
> {color:#de350b} at 
> /home/mick/latest/qpid-dispatch/include/qpid/dispatch/atomic.h:212{color}
> {color:#de350b}#2 qd_buffer_dec_fanout (buf=0x0){color}
> {color:#de350b} at 
> /home/mick/latest/qpid-dispatch/include/qpid/dispatch/buffer.h:177{color}
> {color:#de350b}#3 qd_message_stream_data_release 
> (stream_data=0x7f01b80038c8){color}
> {color:#de350b} at /home/mick/latest/qpid-dispatch/src/message.c:2627{color}
> {color:#de350b}#4 0x7f0237035895 in flush_outgoing_buffs 
> (conn=conn@entry=0x7f0218012a88){color}
> {color:#de350b} at 
> /home/mick/latest/qpid-dispatch/src/adaptors/tcp_adaptor.c:431{color}
> {color:#de350b}#5 0x7f023703905e in free_qdr_tcp_connection 
> (tc=0x7f0218012a88){color}
> {color:#de350b} at 
> /home/mick/latest/qpid-dispatch/src/adaptors/tcp_adaptor.c:455{color}
> {color:#de350b}#6 0x7f023707491d in router_core_thread 
> (arg=0x1e6ccb0){color}
> {color:#de350b} at 
> /home/mick/latest/qpid-dispatch/src/router_core/router_core_thread.c:239{color}
> {color:#de350b}#7 0x7f0236f663f9 in start_thread () from 
> /lib64/libpthread.so.0{color}
> {color:#de350b}#8 0x7f0236be5b53 in clone () from /lib64/libc.so.6{color}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Created] (DISPATCH-2088) SEGV in qd_buffer_dec_fanout

2021-04-27 Thread michael goulish (Jira)
michael goulish created DISPATCH-2088:
-

 Summary: SEGV in qd_buffer_dec_fanout
 Key: DISPATCH-2088
 URL: https://issues.apache.org/jira/browse/DISPATCH-2088
 Project: Qpid Dispatch
  Issue Type: Bug
  Components: Protocol Adaptors
Reporter: michael goulish


*code from 2021-04-26-afternoon*

{

  dispatch: (main) 22689e4f95ae1945e61eec814d3ab3e2d4259f04
  proton: (main) 08b301a97c834e002d41ee852bba1288fe83b936

}

 

*Test*
 * Doing 1-router TCP throughput testing across high-bandwidth link.
 * Router has 32 worker threads.
 * iperf client is using "-P 10" flag, i.e. doing 10 parallel streams.  
 * Router is sustaining 10+ Gbit/sec during test.
 * SEGV happens at end of test.

 

Here's the backtrace:

 

{color:#de350b}#0 sys_atomic_sub (value=1, ref=0x14){color}
{color:#de350b} at 
/home/mick/latest/qpid-dispatch/include/qpid/dispatch/atomic.h:48{color}
{color:#de350b}#1 sys_atomic_dec (ref=0x14){color}
{color:#de350b} at 
/home/mick/latest/qpid-dispatch/include/qpid/dispatch/atomic.h:212{color}
{color:#de350b}#2 qd_buffer_dec_fanout (buf=0x0){color}
{color:#de350b} at 
/home/mick/latest/qpid-dispatch/include/qpid/dispatch/buffer.h:177{color}
{color:#de350b}#3 qd_message_stream_data_release 
(stream_data=0x7f01b80038c8){color}
{color:#de350b} at /home/mick/latest/qpid-dispatch/src/message.c:2627{color}
{color:#de350b}#4 0x7f0237035895 in flush_outgoing_buffs 
(conn=conn@entry=0x7f0218012a88){color}
{color:#de350b} at 
/home/mick/latest/qpid-dispatch/src/adaptors/tcp_adaptor.c:431{color}
{color:#de350b}#5 0x7f023703905e in free_qdr_tcp_connection 
(tc=0x7f0218012a88){color}
{color:#de350b} at 
/home/mick/latest/qpid-dispatch/src/adaptors/tcp_adaptor.c:455{color}
{color:#de350b}#6 0x7f023707491d in router_core_thread 
(arg=0x1e6ccb0){color}
{color:#de350b} at 
/home/mick/latest/qpid-dispatch/src/router_core/router_core_thread.c:239{color}
{color:#de350b}#7 0x7f0236f663f9 in start_thread () from 
/lib64/libpthread.so.0{color}
{color:#de350b}#8 0x7f0236be5b53 in clone () from /lib64/libc.so.6{color}

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Comment Edited] (PROTON-2362) c-threaderciser timed out on 32-core machine.

2021-04-01 Thread michael goulish (Jira)


[ 
https://issues.apache.org/jira/browse/PROTON-2362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17313307#comment-17313307
 ] 

michael goulish edited comment on PROTON-2362 at 4/1/21, 4:49 PM:
--

OK, here's the whole list.

64 threads, 30 seconds per run, 50 runs for each feature.

 

With all actions enabled    crash 10
-no-close-connect           crash 12
-no-listen                  crash 0, hang 2
-no-close-listen            NO PROBLEMS
-no-connect                 NO PROBLEMS
-no-close-connect           crash 10, hang 2
-no-wake                    crash 11
-no-timeout                 crash 11
no-cancel-timeout           crash 12

 

 


was (Author: michaelgoulish):
OK, here's the whole list.

64 threads, 30 seconds per run, 50 runs for each feature.

 

{{With all actions enabled crash 10}}
 {{-no-close-connect    crash 12}}

-no-listen                        crash 0 hang 2
 {color:#de350b}{{-no-close-listen NO PROBLEMS}}{color}
 {color:#de350b}{{-no-connect  NO PROBLEMS}}{color}
 {{-no-close-connect    crash 10 hang 2}}
 {{-no-wake crash 11}}
 {{-no-timeout  crash 11}}
 {{no-cancel-timeout    crash 12}}

 

 

> c-threaderciser timed out on 32-core machine.
> -
>
> Key: PROTON-2362
> URL: https://issues.apache.org/jira/browse/PROTON-2362
> Project: Qpid Proton
>  Issue Type: Bug
>  Components: proton-c
>Reporter: michael goulish
>Priority: Major
>
> Using recent master – maybe 3 days old or so – I just ran Proton's ctest, 
> after turning on THREADERCISER.  I ran it on a box with 32 physical cores, 64 
> threads.
>  
> Test number 6 – c-threaderciser – failed with timeout after 1500 seconds.
> ( 1.5e18 femtoseconds. )
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Comment Edited] (PROTON-2362) c-threaderciser timed out on 32-core machine.

2021-04-01 Thread michael goulish (Jira)


[ 
https://issues.apache.org/jira/browse/PROTON-2362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17313307#comment-17313307
 ] 

michael goulish edited comment on PROTON-2362 at 4/1/21, 4:48 PM:
--

OK, here's the whole list.

64 threads, 30 seconds per run, 50 runs for each feature.

 

{{With all actions enabled crash 10}}
 {{-no-close-connect    crash 12}}

-no-listen                        crash 0 hang 2
 {color:#de350b}{{-no-close-listen NO PROBLEMS}}{color}
 {color:#de350b}{{-no-connect  NO PROBLEMS}}{color}
 {{-no-close-connect    crash 10 hang 2}}
 {{-no-wake crash 11}}
 {{-no-timeout  crash 11}}
 {{no-cancel-timeout    crash 12}}

 

 


was (Author: michaelgoulish):
OK, here's the whole list.

64 threads, 30 seconds per run, 50 runs for each feature.

 

{{With all actions enabled crash 10}}
{{-no-close-connect    crash 12}}

{{ -no-listen   crash 0 hang 2}}
{color:#de350b}{{-no-close-listen NO PROBLEMS}}{color}
{color:#de350b}{{-no-connect  NO PROBLEMS}}{color}
{{-no-close-connect    crash 10 hang 2}}
{{-no-wake crash 11}}
{{-no-timeout  crash 11}}
{{no-cancel-timeout    crash 12}}

 

 

> c-threaderciser timed out on 32-core machine.
> -
>
> Key: PROTON-2362
> URL: https://issues.apache.org/jira/browse/PROTON-2362
> Project: Qpid Proton
>  Issue Type: Bug
>  Components: proton-c
>Reporter: michael goulish
>Priority: Major
>
> Using recent master – maybe 3 days old or so – I just ran Proton's ctest, 
> after turning on THREADERCISER.  I ran it on a box with 32 physical cores, 64 
> threads.
>  
> Test number 6 – c-threaderciser – failed with timeout after 1500 seconds.
> ( 1.5e18 femtoseconds. )
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (PROTON-2362) c-threaderciser timed out on 32-core machine.

2021-04-01 Thread michael goulish (Jira)


[ 
https://issues.apache.org/jira/browse/PROTON-2362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17313307#comment-17313307
 ] 

michael goulish commented on PROTON-2362:
-

OK, here's the whole list.

64 threads, 30 seconds per run, 50 runs for each feature.

 

{{With all actions enabled crash 10}}
{{-no-close-connect    crash 12}}

{{ -no-listen   crash 0 hang 2}}
{color:#de350b}{{-no-close-listen NO PROBLEMS}}{color}
{color:#de350b}{{-no-connect  NO PROBLEMS}}{color}
{{-no-close-connect    crash 10 hang 2}}
{{-no-wake crash 11}}
{{-no-timeout  crash 11}}
{{no-cancel-timeout    crash 12}}

 

 

> c-threaderciser timed out on 32-core machine.
> -
>
> Key: PROTON-2362
> URL: https://issues.apache.org/jira/browse/PROTON-2362
> Project: Qpid Proton
>  Issue Type: Bug
>  Components: proton-c
>Reporter: michael goulish
>Priority: Major
>
> Using recent master – maybe 3 days old or so – I just ran Proton's ctest, 
> after turning on THREADERCISER.  I ran it on a box with 32 physical cores, 64 
> threads.
>  
> Test number 6 – c-threaderciser – failed with timeout after 1500 seconds.
> ( 1.5e18 femtoseconds. )
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (PROTON-2362) c-threaderciser timed out on 32-core machine.

2021-04-01 Thread michael goulish (Jira)


[ 
https://issues.apache.org/jira/browse/PROTON-2362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17313151#comment-17313151
 ] 

michael goulish commented on PROTON-2362:
-

I am running batches of 50 threaderciser tests, 64 threads each, turning off 
one feature at a time, and counting failures.
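
For context, a rough sketch of what one such batch looks like; the binary path, 
thread-count flag, and timeout are assumptions – only the -no-* option names 
come from the results below:

   # Hedged sketch -- binary path, thread flag, and timeout are assumptions.
   for i in $(seq 1 50); do
       timeout 60 ./c-threaderciser -threads 64 -no-listen \
           || echo "run $i failed (crash or hang)"
   done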

See if you can spot the case, below, that I feel may be interesting.

 

All Features On   crash: 10

-no-close-connect    crash: 12

-no-listen   crash: 0    hang: 2

-no-close-listen  (y)  :)  (*)(*r)(*g)  *NO PROBLEMS*  (*g)(*r)(*)  :)  (y)

(sorry, I can't figure out how to make the above text blink)

 

 

 

p.s.

    _"Brontosaurus"_ means _"Thunder Lizard"_, a kind of dinosaur.  

  I do not have a dinosaur.

  _"Brontonomicon",_ on the other hand, means _"What the Thunder Said"_    
or   _"Words of the Thunder"_   or possibly   _"The Book of Thunder"_.

  That's what I've got. 

  And when the thunder speaks, the software had better listen.

 

 

> c-threaderciser timed out on 32-core machine.
> -
>
> Key: PROTON-2362
> URL: https://issues.apache.org/jira/browse/PROTON-2362
> Project: Qpid Proton
>  Issue Type: Bug
>  Components: proton-c
>Reporter: michael goulish
>Priority: Major
>
> Using recent master – maybe 3 days old or so – I just ran Proton's ctest, 
> after turning on THREADERCISER.  I ran it on a box with 32 physical cores, 64 
> threads.
>  
> Test number 6 – c-threaderciser – failed with timeout after 1500 seconds.
> ( 1.5e18 femtoseconds. )
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Created] (PROTON-2362) c-threaderciser timed out on 32-core machine.

2021-03-31 Thread michael goulish (Jira)
michael goulish created PROTON-2362:
---

 Summary: c-threaderciser timed out on 32-core machine.
 Key: PROTON-2362
 URL: https://issues.apache.org/jira/browse/PROTON-2362
 Project: Qpid Proton
  Issue Type: Bug
Reporter: michael goulish


Using recent master – maybe 3 days old or so – I just ran Proton's ctest, after 
turning on THREADERCISER.  I ran it on a box with 32 physical cores, 64 threads.

 

Test number 6 – c-threaderciser – failed with timeout after 1500 seconds.

( 1.5e18 femtoseconds. )

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (DISPATCH-2014) Router TCP Adapter crash with high thread count and load

2021-03-31 Thread michael goulish (Jira)


[ 
https://issues.apache.org/jira/browse/DISPATCH-2014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17312627#comment-17312627
 ] 

michael goulish commented on DISPATCH-2014:
---

I just ran Proton's ctest suite with the THREADERCISER turned on – on my box 
with 32 physical cores, 64 'threads'.

Test number 6 – "c-threaderciser" – timed out after 1500 seconds.
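
For context, a sketch of how that run is set up; the CMake option name and 
build steps are assumptions based on the description above:

   # Hedged sketch -- option name and build steps are assumptions.
   cmake -DTHREADERCISER=ON .. && make -j
   ctest -VV -R c-threaderciser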

 

> Router TCP Adapter crash with high thread count and load
> 
>
> Key: DISPATCH-2014
> URL: https://issues.apache.org/jira/browse/DISPATCH-2014
> Project: Qpid Dispatch
>  Issue Type: Bug
>  Components: Protocol Adaptors
>Reporter: michael goulish
>Priority: Major
>
> Using latest proton and dispatch master code as of 3 hours ago.
> Testing router TCP adapter on a machine with 32 cores / 64 threads.
> I gave the router 64 worker threads, then used 'hey' load generator to send 
> it HTTP requests to a TCP listener which router forwarded to Nginx on same 
> machine. 
> Multiple tests with increasing number of parallel senders: 10, 20, 30,...Each 
> sender throttled to 10 messages per second.
> It survived many tests, but crashed around test with 200 senders.
> I believe this is easily repeatable – I will go check that now.
>  
> Here is the thread that crashed:
> {color:#de350b} #0 0x7f33186a0684 in pthread_mutex_lock () from 
> /lib64/libpthread.so.0{color}
> {color:#de350b} #1 0x7f33186e2848 in lock (m=){color}
> {color:#de350b} at 
> /home/mick/latest/qpid-proton/c/src/proactor/epoll-internal.h:326{color}
> {color:#de350b} #2 process (tsk=){color}
> {color:#de350b} at 
> /home/mick/latest/qpid-proton/c/src/proactor/epoll.c:2248{color}
> {color:#de350b} #3 next_event_batch (p=0x10ed970, can_block=true){color}
> {color:#de350b} at 
> /home/mick/latest/qpid-proton/c/src/proactor/epoll.c:2423{color}
> {color:#de350b} #4 0x7f33187c192f in thread_run (arg=0x10f6e40){color}
> {color:#de350b} at /home/mick/latest/qpid-dispatch/src/server.c:1107{color}
> {color:#de350b} #5 0x7f331869e3f9 in start_thread () from 
> /lib64/libpthread.so.0{color}
> {color:#de350b} #6 0x7f33181b2b53 in clone () from /lib64/libc.so.6{color}
>  
> {color:#172b4d}And here are all the threads:{color}
> {color:#de350b}(gdb) thread apply all bt{color}
> {color:#de350b}Thread 65 (Thread 0x7f3244ff9640 (LWP 36500)):{color}
> {color:#de350b}#0 0x7f33186a7ea0 in __lll_lock_wait () from 
> /lib64/libpthread.so.0{color}
> {color:#de350b}#1 0x7f33186a08f5 in pthread_mutex_lock () from 
> /lib64/libpthread.so.0{color}
> {color:#de350b}#2 0x7f33186dfc5f in lock (m=0x10edc90) at 
> /home/mick/latest/qpid-proton/c/src/proactor/epoll-internal.h:326{color}
> {color:#de350b}#3 pni_raw_connection_done (rc=0x10ed3b8) at 
> /home/mick/latest/qpid-proton/c/src/proactor/epoll_raw_connection.c:423{color}
> {color:#de350b}#4 pn_proactor_done (batch=0x10ed970, p=0x10ed970) at 
> /home/mick/latest/qpid-proton/c/src/proactor/epoll.c:2696{color}
> {color:#de350b}#5 pn_proactor_done (p=0x10ed970, 
> batch=batch@entry=0x7f326811a578) at 
> /home/mick/latest/qpid-proton/c/src/proactor/epoll.c:2676{color}
> {color:#de350b}#6 0x7f33187c1a11 in thread_run (arg=0x10f6e40) at 
> /home/mick/latest/qpid-dispatch/src/server.c:1140{color}
> {color:#de350b}#7 0x7f331869e3f9 in start_thread () from 
> /lib64/libpthread.so.0{color}
> {color:#de350b}#8 0x7f33181b2b53 in clone () from /lib64/libc.so.6{color}
> {color:#de350b}Thread 64 (Thread 0x7f327640 (LWP 36481)):{color}
> {color:#de350b}#0 0x7f33186a7ea0 in __lll_lock_wait () from 
> /lib64/libpthread.so.0{color}
> {color:#de350b}#1 0x7f33186a08f5 in pthread_mutex_lock () from 
> /lib64/libpthread.so.0{color}
> {color:#de350b}#2 0x7f33186e2b7e in lock (m=0x10edc90) at 
> /home/mick/latest/qpid-proton/c/src/proactor/epoll-internal.h:326{color}
> {color:#de350b}#3 process (tsk=) at 
> /home/mick/latest/qpid-proton/c/src/proactor/epoll.c:2248{color}
> {color:#de350b}#4 next_event_batch (p=, can_block=true) at 
> /home/mick/latest/qpid-proton/c/src/proactor/epoll.c:2423{color}
> {color:#de350b}#5 0x7f33187c192f in thread_run (arg=0x10f6e40) at 
> /home/mick/latest/qpid-dispatch/src/server.c:1107{color}
> {color:#de350b}#6 0x7f331869e3f9 in start_thread () from 
> /lib64/libpthread.so.0{color}
> {color:#de350b}#7 0x7f33181b2b53 in clone () from /lib64/libc.so.6{color}
> {color:#de350b}Thread 63 (Thread 0x7f322f7fe640 (LWP 36502)):{color}
> {color:#de350b}#0 0x7f33186a7ea0 in __lll_lock_wait () from 
> /lib64/libpthread.so.0{color}
> {color:#de350b}#1 0x7f33186a08f5 in pthread_mutex_lock () from 
> /lib64/libpthread.so.0{color}
> {color:#de350b}#2 0x7f33186dfc5f in lock (m=0x10edc90) at 
> 

[jira] [Commented] (DISPATCH-2014) Router TCP Adapter crash with high thread count and load

2021-03-22 Thread michael goulish (Jira)


[ 
https://issues.apache.org/jira/browse/DISPATCH-2014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17306496#comment-17306496
 ] 

michael goulish commented on DISPATCH-2014:
---

When I used 64 dispatch worker threads and hit it with 200 'hey' senders – each 
test 30 seconds long – it died 3 out of 4 times.  (SEGV)

 

When I went down to 32 dispatch worker threads, it survived 3 out of 3 tests 
with 200 senders, and then 3 out of 3 tests with 500 senders, and then 3 out of 
3 tests with 1000 senders.
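
For reference, a sketch of what one of those 'hey' runs looks like; the URL is 
a placeholder and the flag values are assumptions taken from the test 
description (200 concurrent senders, roughly 10 requests per second each, 
30-second runs):

   # Hedged sketch -- URL is a placeholder; flag values are assumptions.
   hey -z 30s -c 200 -q 10 http://ROUTER_HOST:PORT/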

 

> Router TCP Adapter crash with high thread count and load
> 
>
> Key: DISPATCH-2014
> URL: https://issues.apache.org/jira/browse/DISPATCH-2014
> Project: Qpid Dispatch
>  Issue Type: Bug
>  Components: Protocol Adaptors
>Reporter: michael goulish
>Priority: Major
>
> Using latest proton and dispatch master code as of 3 hours ago.
> Testing router TCP adapter on a machine with 32 cores / 64 threads.
> I gave the router 64 worker threads, then used 'hey' load generator to send 
> it HTTP requests to a TCP listener which router forwarded to Nginx on same 
> machine. 
> Multiple tests with increasing number of parallel senders: 10, 20, 30,...Each 
> sender throttled to 10 messages per second.
> It survived many tests, but crashed around test with 200 senders.
> I believe this is easily repeatable – I will go check that now.
>  
> Here is the thread that crashed:
> {color:#de350b} #0 0x7f33186a0684 in pthread_mutex_lock () from 
> /lib64/libpthread.so.0{color}
> {color:#de350b} #1 0x7f33186e2848 in lock (m=){color}
> {color:#de350b} at 
> /home/mick/latest/qpid-proton/c/src/proactor/epoll-internal.h:326{color}
> {color:#de350b} #2 process (tsk=){color}
> {color:#de350b} at 
> /home/mick/latest/qpid-proton/c/src/proactor/epoll.c:2248{color}
> {color:#de350b} #3 next_event_batch (p=0x10ed970, can_block=true){color}
> {color:#de350b} at 
> /home/mick/latest/qpid-proton/c/src/proactor/epoll.c:2423{color}
> {color:#de350b} #4 0x7f33187c192f in thread_run (arg=0x10f6e40){color}
> {color:#de350b} at /home/mick/latest/qpid-dispatch/src/server.c:1107{color}
> {color:#de350b} #5 0x7f331869e3f9 in start_thread () from 
> /lib64/libpthread.so.0{color}
> {color:#de350b} #6 0x7f33181b2b53 in clone () from /lib64/libc.so.6{color}
>  
> {color:#172b4d}And here are all the threads:{color}
> {color:#de350b}(gdb) thread apply all bt{color}
> {color:#de350b}Thread 65 (Thread 0x7f3244ff9640 (LWP 36500)):{color}
> {color:#de350b}#0 0x7f33186a7ea0 in __lll_lock_wait () from 
> /lib64/libpthread.so.0{color}
> {color:#de350b}#1 0x7f33186a08f5 in pthread_mutex_lock () from 
> /lib64/libpthread.so.0{color}
> {color:#de350b}#2 0x7f33186dfc5f in lock (m=0x10edc90) at 
> /home/mick/latest/qpid-proton/c/src/proactor/epoll-internal.h:326{color}
> {color:#de350b}#3 pni_raw_connection_done (rc=0x10ed3b8) at 
> /home/mick/latest/qpid-proton/c/src/proactor/epoll_raw_connection.c:423{color}
> {color:#de350b}#4 pn_proactor_done (batch=0x10ed970, p=0x10ed970) at 
> /home/mick/latest/qpid-proton/c/src/proactor/epoll.c:2696{color}
> {color:#de350b}#5 pn_proactor_done (p=0x10ed970, 
> batch=batch@entry=0x7f326811a578) at 
> /home/mick/latest/qpid-proton/c/src/proactor/epoll.c:2676{color}
> {color:#de350b}#6 0x7f33187c1a11 in thread_run (arg=0x10f6e40) at 
> /home/mick/latest/qpid-dispatch/src/server.c:1140{color}
> {color:#de350b}#7 0x7f331869e3f9 in start_thread () from 
> /lib64/libpthread.so.0{color}
> {color:#de350b}#8 0x7f33181b2b53 in clone () from /lib64/libc.so.6{color}
> {color:#de350b}Thread 64 (Thread 0x7f327640 (LWP 36481)):{color}
> {color:#de350b}#0 0x7f33186a7ea0 in __lll_lock_wait () from 
> /lib64/libpthread.so.0{color}
> {color:#de350b}#1 0x7f33186a08f5 in pthread_mutex_lock () from 
> /lib64/libpthread.so.0{color}
> {color:#de350b}#2 0x7f33186e2b7e in lock (m=0x10edc90) at 
> /home/mick/latest/qpid-proton/c/src/proactor/epoll-internal.h:326{color}
> {color:#de350b}#3 process (tsk=) at 
> /home/mick/latest/qpid-proton/c/src/proactor/epoll.c:2248{color}
> {color:#de350b}#4 next_event_batch (p=, can_block=true) at 
> /home/mick/latest/qpid-proton/c/src/proactor/epoll.c:2423{color}
> {color:#de350b}#5 0x7f33187c192f in thread_run (arg=0x10f6e40) at 
> /home/mick/latest/qpid-dispatch/src/server.c:1107{color}
> {color:#de350b}#6 0x7f331869e3f9 in start_thread () from 
> /lib64/libpthread.so.0{color}
> {color:#de350b}#7 0x7f33181b2b53 in clone () from /lib64/libc.so.6{color}
> {color:#de350b}Thread 63 (Thread 0x7f322f7fe640 (LWP 36502)):{color}
> {color:#de350b}#0 0x7f33186a7ea0 in __lll_lock_wait () from 
> /lib64/libpthread.so.0{color}
> {color:#de350b}#1 

[jira] [Created] (DISPATCH-2014) Router TCP Adapter crash with high thread count and load

2021-03-22 Thread michael goulish (Jira)
michael goulish created DISPATCH-2014:
-

 Summary: Router TCP Adapter crash with high thread count and load
 Key: DISPATCH-2014
 URL: https://issues.apache.org/jira/browse/DISPATCH-2014
 Project: Qpid Dispatch
  Issue Type: Bug
  Components: Protocol Adaptors
Reporter: michael goulish


Using latest proton and dispatch master code as of 3 hours ago.

Testing router TCP adapter on a machine with 32 cores / 64 threads.

I gave the router 64 worker threads, then used 'hey' load generator to send it 
HTTP requests to a TCP listener which router forwarded to Nginx on same 
machine. 

Multiple tests with increasing number of parallel senders: 10, 20, 30,...Each 
sender throttled to 10 messages per second.

It survived many tests, but crashed around test with 200 senders.

I believe this is easily repeatable – I will go check that now.

 

Here is the thread that crashed:

{color:#de350b} #0 0x7f33186a0684 in pthread_mutex_lock () from 
/lib64/libpthread.so.0{color}
{color:#de350b} #1 0x7f33186e2848 in lock (m=){color}
{color:#de350b} at 
/home/mick/latest/qpid-proton/c/src/proactor/epoll-internal.h:326{color}
{color:#de350b} #2 process (tsk=){color}
{color:#de350b} at 
/home/mick/latest/qpid-proton/c/src/proactor/epoll.c:2248{color}
{color:#de350b} #3 next_event_batch (p=0x10ed970, can_block=true){color}
{color:#de350b} at 
/home/mick/latest/qpid-proton/c/src/proactor/epoll.c:2423{color}
{color:#de350b} #4 0x7f33187c192f in thread_run (arg=0x10f6e40){color}
{color:#de350b} at /home/mick/latest/qpid-dispatch/src/server.c:1107{color}
{color:#de350b} #5 0x7f331869e3f9 in start_thread () from 
/lib64/libpthread.so.0{color}
{color:#de350b} #6 0x7f33181b2b53 in clone () from /lib64/libc.so.6{color}

 

{color:#172b4d}And here are all the threads:{color}


{color:#de350b}(gdb) thread apply all bt{color}

{color:#de350b}Thread 65 (Thread 0x7f3244ff9640 (LWP 36500)):{color}
{color:#de350b}#0 0x7f33186a7ea0 in __lll_lock_wait () from 
/lib64/libpthread.so.0{color}
{color:#de350b}#1 0x7f33186a08f5 in pthread_mutex_lock () from 
/lib64/libpthread.so.0{color}
{color:#de350b}#2 0x7f33186dfc5f in lock (m=0x10edc90) at 
/home/mick/latest/qpid-proton/c/src/proactor/epoll-internal.h:326{color}
{color:#de350b}#3 pni_raw_connection_done (rc=0x10ed3b8) at 
/home/mick/latest/qpid-proton/c/src/proactor/epoll_raw_connection.c:423{color}
{color:#de350b}#4 pn_proactor_done (batch=0x10ed970, p=0x10ed970) at 
/home/mick/latest/qpid-proton/c/src/proactor/epoll.c:2696{color}
{color:#de350b}#5 pn_proactor_done (p=0x10ed970, 
batch=batch@entry=0x7f326811a578) at 
/home/mick/latest/qpid-proton/c/src/proactor/epoll.c:2676{color}
{color:#de350b}#6 0x7f33187c1a11 in thread_run (arg=0x10f6e40) at 
/home/mick/latest/qpid-dispatch/src/server.c:1140{color}
{color:#de350b}#7 0x7f331869e3f9 in start_thread () from 
/lib64/libpthread.so.0{color}
{color:#de350b}#8 0x7f33181b2b53 in clone () from /lib64/libc.so.6{color}

{color:#de350b}Thread 64 (Thread 0x7f327640 (LWP 36481)):{color}
{color:#de350b}#0 0x7f33186a7ea0 in __lll_lock_wait () from 
/lib64/libpthread.so.0{color}
{color:#de350b}#1 0x7f33186a08f5 in pthread_mutex_lock () from 
/lib64/libpthread.so.0{color}
{color:#de350b}#2 0x7f33186e2b7e in lock (m=0x10edc90) at 
/home/mick/latest/qpid-proton/c/src/proactor/epoll-internal.h:326{color}
{color:#de350b}#3 process (tsk=) at 
/home/mick/latest/qpid-proton/c/src/proactor/epoll.c:2248{color}
{color:#de350b}#4 next_event_batch (p=, can_block=true) at 
/home/mick/latest/qpid-proton/c/src/proactor/epoll.c:2423{color}
{color:#de350b}#5 0x7f33187c192f in thread_run (arg=0x10f6e40) at 
/home/mick/latest/qpid-dispatch/src/server.c:1107{color}
{color:#de350b}#6 0x7f331869e3f9 in start_thread () from 
/lib64/libpthread.so.0{color}
{color:#de350b}#7 0x7f33181b2b53 in clone () from /lib64/libc.so.6{color}

{color:#de350b}Thread 63 (Thread 0x7f322f7fe640 (LWP 36502)):{color}
{color:#de350b}#0 0x7f33186a7ea0 in __lll_lock_wait () from 
/lib64/libpthread.so.0{color}
{color:#de350b}#1 0x7f33186a08f5 in pthread_mutex_lock () from 
/lib64/libpthread.so.0{color}
{color:#de350b}#2 0x7f33186dfc5f in lock (m=0x10edc90) at 
/home/mick/latest/qpid-proton/c/src/proactor/epoll-internal.h:326{color}
{color:#de350b}#3 pni_raw_connection_done (rc=0x10ed3b8) at 
/home/mick/latest/qpid-proton/c/src/proactor/epoll_raw_connection.c:423{color}
{color:#de350b}#4 pn_proactor_done (batch=0x10ed970, p=0x10ed970) at 
/home/mick/latest/qpid-proton/c/src/proactor/epoll.c:2696{color}
{color:#de350b}#5 pn_proactor_done (p=0x10ed970, 
batch=batch@entry=0x7f32c8063af8) at 
/home/mick/latest/qpid-proton/c/src/proactor/epoll.c:2676{color}
{color:#de350b}#6 0x7f33187c1a11 in thread_run (arg=0x10f6e40) at 

[jira] [Assigned] (DISPATCH-1368) Link (address) priority is ignored by the second hop router

2019-06-14 Thread michael goulish (JIRA)


 [ 
https://issues.apache.org/jira/browse/DISPATCH-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

michael goulish reassigned DISPATCH-1368:
-

Assignee: michael goulish

> Link (address) priority is ignored by the second hop router
> ---
>
> Key: DISPATCH-1368
> URL: https://issues.apache.org/jira/browse/DISPATCH-1368
> Project: Qpid Dispatch
>  Issue Type: Bug
>  Components: Router Node
>Affects Versions: 1.8.0
>Reporter: Ken Giusti
>Assignee: michael goulish
>Priority: Major
> Fix For: 1.9.0
>
>
> Address-based priority is only enforced on the egress of the first hop router.
> In a 3 router linear network:
> Sender --> Router A --> Router B --> Router C --> Receiver
> Message delivery is properly sent via the inter-router links between Router A 
> and Router B.
> However, those messages are all forwarded on the default priority (4) between 
> router B and C.
> [C --> Receiver is fine - priority doesn't apply to egress endpoint links]
> The expectation is that the message priority is honored across all 
> inter-router links.
> [Reproducer|https://github.com/kgiusti/dispatch/tree/DISPATCH-1368-reproducer]
> Build the router, then run the priority test (ctest -VV -R priority).
> Then grep for "DELIVERIES" in the log files:
>  grep "DELIVERIES" 
> tests/system_test.dir/system_tests_priority/CongestionTests/setUpClass/*.log
> tests/system_test.dir/system_tests_priority/CongestionTests/setUpClass/A.log:2019-06-14
>  11:10:00.324389 -0400 ROUTER (error) DELIVERIES PER PRIORITY: 9=20 8=0 7=28 
> 6=0 5=0 4(default)=21 3=0 2=12 1=0 0=343 
> (/home/kgiusti/work/dispatch/qpid-dispatch/src/router_core/router_core_thread.c:188)
> tests/system_test.dir/system_tests_priority/CongestionTests/setUpClass/B.log:2019-06-14
>  11:10:00.302570 -0400 ROUTER (error) DELIVERIES PER PRIORITY: 9=0 8=0 7=0 
> 6=0 5=0 4(default)=172 3=0 2=0 1=0 0=286 
> (/home/kgiusti/work/dispatch/qpid-dispatch/src/router_core/router_core_thread.c:188)
> ...
> Notice the counts on A (tx to B) - these are correct.
> On B all msgs are sent priority 4 (default) to C - this is wrong.
>  
>  
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Created] (PROTON-2046) pn_connection_set_container should check for null or empty string

2019-05-13 Thread michael goulish (JIRA)
michael goulish created PROTON-2046:
---

 Summary: pn_connection_set_container should check for null or 
empty string
 Key: PROTON-2046
 URL: https://issues.apache.org/jira/browse/PROTON-2046
 Project: Qpid Proton
  Issue Type: Bug
  Components: proton-c
Reporter: michael goulish


pn_connection_set_container() makes no checks of the ID string that gets passed 
in. This value is expected to be unique, so it should probably check for NULL 
and empty-string.

I was passing in empty strings and it was cheerfully accepting them.

 

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (DISPATCH-1309) Various crashes in 1.6 release

2019-04-04 Thread michael goulish (JIRA)


[ 
https://issues.apache.org/jira/browse/DISPATCH-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16809913#comment-16809913
 ] 

michael goulish commented on DISPATCH-1309:
---

Chuck –

Are you sure you mean "5672"?

More normal for the console would be "5673".

I could not get mine to crash with 50 repetitions of { connect + disconnect }, 
using 5673 – with one router or my whole Death Star network.

When I tried it with 5672, I could not get it to connect at all.

 

 

 

 

> Various crashes in 1.6 release
> --
>
> Key: DISPATCH-1309
> URL: https://issues.apache.org/jira/browse/DISPATCH-1309
> Project: Qpid Dispatch
>  Issue Type: Bug
>Affects Versions: 1.6.0
> Environment: System 'unused':(
> Fedora 5.0.3-200.fc29.x86_64,
> Python 2.7.15,
> Proton master @ eab1f.
> System 'taj':(
> Fedora 4.18.16-200.fc28.x86_64,
> Python 3.6.6,
> Proton master @ 68b38
>Reporter: Chuck Rolke
>Priority: Major
> Attachments: DISPATCH-1309-backtraces.txt, 
> DISPATCH-1309-gen_configs_linear.py
>
>
> qpid-dispatch master @ 51244, which is very close to the 1.6 release, has 
> various crashes.
> The test network is 12 routers spread over two systems. (Configuration 
> generator to be attached.) Four interior routers are in linear arrangement 
> with A and C on one system ('unused'), and B and D on the other system 
> ('taj'). Each system then attaches four edge routers, one to each interior 
> router.
> Running lightweight tests, like proton cpp simple_send and simple_recv to 
> ports on INTA and INTB interior routers leads to a crash on INTC. The crashes 
> typically look like reuse of structures after they have been freed (addresses 
> are 0x). Other crashes hint of general memory corruption 
> (crashes in malloc.c).
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (DISPATCH-1309) Various crashes in 1.6 release

2019-04-04 Thread michael goulish (JIRA)


[ 
https://issues.apache.org/jira/browse/DISPATCH-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16809806#comment-16809806
 ] 

michael goulish commented on DISPATCH-1309:
---

Yee hah!

Chuck's comment reminded me – I believe I have also seen crashes *only* when 
the console was attached.

Furthermore, I think I have seen crashes, maybe not *only* but *more often*, 
when I was *shutting down* a console *while* the network was still running.

 

I tried that just now – with 1.6 code.  I had to start, stop, and restart the 
console 11 times, but then it happened. Boom. With this core:

 

#0 pn_collector_put (collector=0x4242424242424242, 
 clazz=0x7f0e99c38520 , context=0x0,
 type=type@entry=PN_CONNECTION_WAKE)
 at /home/mick/latest/qpid-proton-0.26.0/c/src/core/event.c:134
#1 0x7f0e99ca6258 in http_thread_run (v=0x2036850)
 at /home/mick/latest/qpid-dispatch-1.6.0/src/http-libwebsockets.c:731
#2 0x7f0e995df50b in start_thread () from /lib64/libpthread.so.0
#3 0x7f0e988a338f in clone () from /lib64/libc.so.6

 

Which is one I have seen before.

Now I have *some hope* of getting some kind of baseline, based on the number of 
crashes per console stop-and-restart, so that I can do some kind of vivisection 
of the code.

 

 

 

 

 

 

 

> Various crashes in 1.6 release
> --
>
> Key: DISPATCH-1309
> URL: https://issues.apache.org/jira/browse/DISPATCH-1309
> Project: Qpid Dispatch
>  Issue Type: Bug
>Affects Versions: 1.6.0
> Environment: System 'unused':(
> Fedora 5.0.3-200.fc29.x86_64,
> Python 2.7.15,
> Proton master @ eab1f.
> System 'taj':(
> Fedora 4.18.16-200.fc28.x86_64,
> Python 3.6.6,
> Proton master @ 68b38
>Reporter: Chuck Rolke
>Priority: Major
> Attachments: DISPATCH-1309-backtraces.txt, 
> DISPATCH-1309-gen_configs_linear.py
>
>
> qpid-dispatch master @ 51244, which is very close to the 1.6 release, has 
> various crashes.
> The test network is 12 routers spread over two systems. (Configuration 
> generator to be attached.) Four interior routers are in linear arrangement 
> with A and C on one system ('unused'), and B and D on the other system 
> ('taj'). Each system then attaches four edge routers, one to each interior 
> router.
> Running lightweight tests, like proton cpp simple_send and simple_recv to 
> ports on INTA and INTB interior routers leads to a crash on INTC. The crashes 
> typically look like reuse of structures after they have been freed (addresses 
> are 0x). Other crashes hint of general memory corruption 
> (crashes in malloc.c).
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (DISPATCH-1309) Various crashes in 1.6 release

2019-04-02 Thread michael goulish (JIRA)


[ 
https://issues.apache.org/jira/browse/DISPATCH-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16808055#comment-16808055
 ] 

michael goulish commented on DISPATCH-1309:
---

And since the above comment I have not been able to get another crash

:(

 

 

> Various crashes in 1.6 release
> --
>
> Key: DISPATCH-1309
> URL: https://issues.apache.org/jira/browse/DISPATCH-1309
> Project: Qpid Dispatch
>  Issue Type: Bug
>Affects Versions: 1.6.0
> Environment: System 'unused':(
> Fedora 5.0.3-200.fc29.x86_64,
> Python 2.7.15,
> Proton master @ eab1f.
> System 'taj':(
> Fedora 4.18.16-200.fc28.x86_64,
> Python 3.6.6,
> Proton master @ 68b38
>Reporter: Chuck Rolke
>Priority: Major
> Attachments: DISPATCH-1309-backtraces.txt, 
> DISPATCH-1309-gen_configs_linear.py
>
>
> qpid-dispatch master @ 51244, which is very close to the 1.6 release, has 
> various crashes.
> The test network is 12 routers spread over two systems. (Configuration 
> generator to be attached.) Four interior routers are in linear arrangement 
> with A and C on one system ('unused'), and B and D on the other system 
> ('taj'). Each system then attaches four edge routers, one to each interior 
> router.
> Running lightweight tests, like proton cpp simple_send and simple_recv to 
> ports on INTA and INTB interior routers leads to a crash on INTC. The crashes 
> typically look like reuse of structures after they have been freed (addresses 
> are 0x). Other crashes hint of general memory corruption 
> (crashes in malloc.c).
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (DISPATCH-1309) Various crashes in 1.6 release

2019-04-02 Thread michael goulish (JIRA)


[ 
https://issues.apache.org/jira/browse/DISPATCH-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16808007#comment-16808007
 ] 

michael goulish commented on DISPATCH-1309:
---

OK! I thought Mercury might help reproduce this more easily, and ... it did.

I made a 13-router star-shaped network  ( the Death Star ) – 12 routers in a 
circle and one at the center.

There was 1 receiver at every router on the circle, each hoping for 1 million 
messages, and 1 sender at the center router trying to make all the receivers happy.

 

It ran for a good amount of time – I could see the traffic turning all the 
links green using the console – and then 7 routers crashed all at once, 
generating 5 different types of core files.

 

Which follow.

 

##
 # Type 1
 ##

#0 0x7f230750 in raise () from /lib64/libc.so.6
#1 0x7f231d31 in abort () from /lib64/libc.so.6
#2 0x7f23bbba905a in __assert_fail_base () from /lib64/libc.so.6
#3 0x7f23bbba90d2 in __assert_fail () from /lib64/libc.so.6
#4 0x7f23bc9b8e6f in __pthread_tpp_change_priority () from 
/lib64/libpthread.so.0
#5 0x7f23bc9af8fb in __pthread_mutex_lock_full () from 
/lib64/libpthread.so.0
#6 0x7f23bd044309 in qdra_config_address_create_CT (core=0x7f23a805e0d8,
 name=, query=0x7f23a00307d8, in_body=)
 at 
/home/mick/latest/qpid-dispatch-1.6.0/src/router_core/agent_config_address.c:446
#7 0x in ?? ()

in qdra_config_address_create_CT
 (gdb) list
 441 addr->priority = priority;
 442 pattern = 0;
 443
 444 qd_iterator_reset_view(iter, ITER_VIEW_ALL);
 445 qd_parse_tree_add_pattern(core->addr_parse_tree, iter, addr);
 446 DEQ_INSERT_TAIL(core->addr_config, addr);
 447
 448 //
 449 // Compose the result map for the response.
 450 //

 

##
 # Type 2
 ##

#0 connection_wake (conn=)
 at /home/mick/latest/qpid-dispatch-1.6.0/src/remote_sasl.c:241
 #1 0x7f7cef4884cb in pni_sasl_impl_free (transport=0x7f7cd4015180)
 at /home/mick/latest/qpid-proton-0.26.0/c/src/sasl/sasl.c:181
 #2 pn_sasl_free (transport=0x7f7cd4015180)
 at /home/mick/latest/qpid-proton-0.26.0/c/src/sasl/sasl.c:764
 #3 0x7f7cef480b90 in pn_transport_finalize (object=0x7f7cd4015180)
 at /home/mick/latest/qpid-proton-0.26.0/c/src/core/transport.c:665
 #4 0x7f7cef472a99 in pn_class_decref (clazz=0x7f7cef69aca0 ,
 clazz@entry=0x7f7cef69a520 , object=0x7f7cd4015180)
 at /home/mick/latest/qpid-proton-0.26.0/c/src/core/object/object.c:95
 #5 0x7f7cef472cbf in pn_decref (object=)
 at /home/mick/latest/qpid-proton-0.26.0/c/src/core/object/object.c:253
 #6 0x7f7cef480851 in pn_transport_free (transport=)
 at /home/mick/latest/qpid-proton-0.26.0/c/src/core/transport.c:644
 #7 0x7f7cef47b994 in pn_connection_driver_destroy 
(d=d@entry=0x7f7cd4014d98)
 at /home/mick/latest/qpid-proton-0.26.0/c/src/core/connection_driver.c:94
 #8 0x7f7cef25b604 in pconnection_final_free (pc=0x7f7cd40147f0)
 at /home/mick/latest/qpid-proton-0.26.0/c/src/proactor/epoll.c:889
 #9 0x7f7cef25c4fc in pconnection_cleanup (pc=)
 at /home/mick/latest/qpid-proton-0.26.0/c/src/proactor/epoll.c:905
 #10 0x7f7cef25d295 in pconnection_process (pc=0x7f7cd40147f0, 
events=,
 timeout=timeout@entry=false, topup=false, is_io_2=)
 at /home/mick/latest/qpid-proton-0.26.0/c/src/proactor/epoll.c:1273
 #11 0x7f7cef25dd03 in proactor_do_epoll (p=0x1ee9600, 
can_block=can_block@entry=true)
 at /home/mick/latest/qpid-proton-0.26.0/c/src/proactor/epoll.c:2139
 #12 0x7f7cef25ef2a in pn_proactor_wait (p=)
 at /home/mick/latest/qpid-proton-0.26.0/c/src/proactor/epoll.c:2157
 #13 0x7f7cef7057af in thread_run (arg=0x1db7960)
 at /home/mick/latest/qpid-dispatch-1.6.0/src/server.c:994
 #14 0x7f7cef04150b in start_thread () from /lib64/libpthread.so.0
 #15 0x7f7cee30538f in clone () from /lib64/libc.so.6

 

##
 # Type 3
 ##


 #0 qd_hash_internal_retrieve_with_hash (hash=,
 key=key@entry=0x7f140c097ad8, h=, h=)
 at /home/mick/latest/qpid-dispatch-1.6.0/src/hash.c:204
#1 0x7f1432401a15 in qd_hash_internal_retrieve (key=0x7f140c097ad8, 
h=0x7f141c000bc0)
 at /home/mick/latest/qpid-dispatch-1.6.0/src/hash.c:219
#2 qd_hash_retrieve (h=0x7f141c000bc0, key=key@entry=0x7f140c097ad8,
 val=val@entry=0x7ffe6c6ac638) at 
/home/mick/latest/qpid-dispatch-1.6.0/src/hash.c:270
#3 0x7f14324312e6 in qdr_lookup_terminus_address_CT (core=0xb656c0,
 dir=, conn=conn@entry=0x7f140c076798, terminus=0x7f140c086258,
 link_route=link_route@entry=0x7ffe6c6ac77d,
 unavailable=unavailable@entry=0x7ffe6c6ac77e, core_endpoint=0x7ffe6c6ac77f,
 accept_dynamic=true, 

[jira] [Closed] (DISPATCH-1280) http against https enabled listener causes segfault

2019-03-22 Thread michael goulish (JIRA)


 [ 
https://issues.apache.org/jira/browse/DISPATCH-1280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

michael goulish closed DISPATCH-1280.
-
Resolution: Fixed

> http against https enabled listener causes segfault
> ---
>
> Key: DISPATCH-1280
> URL: https://issues.apache.org/jira/browse/DISPATCH-1280
> Project: Qpid Dispatch
>  Issue Type: Bug
>Reporter: Gordon Sim
>Assignee: michael goulish
>Priority: Major
>
> If you have a listener with http enabled, an ssl profile referenced, but 
> requireSsl set to false, and then try to access it over plain http, you get a 
> segfault in libwebsockets if using version 3.0.1-2. Downgrading to 2.4.2 of 
> libwebsockets fixes this.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (DISPATCH-1280) http against https enabled listener causes segfault

2019-03-22 Thread michael goulish (JIRA)


[ 
https://issues.apache.org/jira/browse/DISPATCH-1280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16798961#comment-16798961
 ] 

michael goulish commented on DISPATCH-1280:
---

The LWS developer pushed a patch. I got through 100 iterations of my reproducer 
on master with no crash.  (I could not run enough iterations before to get a real 
baseline, but I did get one crash in the first 20 tries.)

I think it's a dead bug.

 

> http against https enabled listener causes segfault
> ---
>
> Key: DISPATCH-1280
> URL: https://issues.apache.org/jira/browse/DISPATCH-1280
> Project: Qpid Dispatch
>  Issue Type: Bug
>Reporter: Gordon Sim
>Assignee: michael goulish
>Priority: Major
>
> If you have a listener with http enabled, an ssl profile referenced, but 
> requireSsl set to false, and then try to access it over plain http, you get a 
> segfault in libwebsockets if using version 3.0.1-2. Downgrading to 2.4.2 of 
> libwebsockets fixes this.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Comment Edited] (DISPATCH-1280) http against https enabled listener causes segfault

2019-03-21 Thread michael goulish (JIRA)


[ 
https://issues.apache.org/jira/browse/DISPATCH-1280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16798302#comment-16798302
 ] 

michael goulish edited comment on DISPATCH-1280 at 3/21/19 5:53 PM:


Well, kinda.

I saw one crash using LWS latest master, and then I tried 20 more times and all 
I got was this error message:

NOTICE: lws_server_socket_service_ssl: client did not send a valid tls hello 
(default vhost default)

(On LWS version 3.0.1 the crash happens every time.)

But!  The one crash I did see had basically identical backtrace as in version 
3.0.1. (See previous comment.)

I raised an issue with LWS:

    [https://github.com/warmcat/libwebsockets/issues/1527]

 

 


was (Author: mgoulish):
Well, kinda.

I saw one crash using LWS latest master, and then I tried 20 more times and all 
I got was this error message:

NOTICE: lws_server_socket_service_ssl: client did not send a valid tls hello 
(default vhost default)

 

But!  The one crash I did see had basically identical backtrace as in version 
3.0.1. (See previous comment.)

I raised an issue with LWS:

    https://github.com/warmcat/libwebsockets/issues/1527

 

 

> http against https enabled listener causes segfault
> ---
>
> Key: DISPATCH-1280
> URL: https://issues.apache.org/jira/browse/DISPATCH-1280
> Project: Qpid Dispatch
>  Issue Type: Bug
>Reporter: Gordon Sim
>Assignee: michael goulish
>Priority: Major
>
> If you have a listener with http enabled, an ssl profile referenced, but 
> requireSsl set to false, and then try to access it over plain http, you get a 
> segfault in libwebsockets if using version 3.0.1-2. Downgrading to 2.4.2 of 
> libwebsockets fixes this.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (DISPATCH-1280) http against https enabled listener causes segfault

2019-03-21 Thread michael goulish (JIRA)


[ 
https://issues.apache.org/jira/browse/DISPATCH-1280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16798302#comment-16798302
 ] 

michael goulish commented on DISPATCH-1280:
---

Well, kinda.

I saw one crash using LWS latest master, and then I tried 20 more times and all 
I got was this error message:

NOTICE: lws_server_socket_service_ssl: client did not send a valid tls hello 
(default vhost default)

 

But!  The one crash I did see had basically identical backtrace as in version 
3.0.1. (See previous comment.)

I raised an issue with LWS:

    https://github.com/warmcat/libwebsockets/issues/1527

 

 

> http against https enabled listener causes segfault
> ---
>
> Key: DISPATCH-1280
> URL: https://issues.apache.org/jira/browse/DISPATCH-1280
> Project: Qpid Dispatch
>  Issue Type: Bug
>Reporter: Gordon Sim
>Assignee: michael goulish
>Priority: Major
>
> If you have a listener with http enabled, an ssl profile referenced, but 
> requireSsl set to false, and then try to access it over plain http, you get a 
> segfault in libwebsockets if using version 3.0.1-2. Downgrading to 2.4.2 of 
> libwebsockets fixes this.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (DISPATCH-1280) http against https enabled listener causes segfault

2019-03-20 Thread michael goulish (JIRA)


[ 
https://issues.apache.org/jira/browse/DISPATCH-1280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16797619#comment-16797619
 ] 

michael goulish commented on DISPATCH-1280:
---

reproduced with simple example.

What I did:

  1. from the lws code tree for 3.0.1 (7 Sep 2018, 
fb31602ff9aeb88267fb8132d48df31195782ae5) use the example 
minimal-examples/http-server/minimal-http-server-tls.

  2. Alter the .c file this way (a fuller sketch appears at the end of this 
comment):

     info.options = LWS_SERVER_OPTION_DO_SSL_GLOBAL_INIT |
                    LWS_SERVER_OPTION_ALLOW_NON_SSL_ON_SSL_PORT;

3. build and run it. It listens on https://localhost:7681

4. In browser, do this request:  http://localhost:7681/index.html

big bada boom.

 

#0 0x7f63281fff60 in SSL_get0_alpn_selected () from /lib64/libssl.so.1.1
#1 0x7f632880ea17 in lws_tls_server_conn_alpn () from 
/usr/local/lib/libwebsockets.so.13
#2 0x7f632880ee98 in lws_server_socket_service_ssl () from 
/usr/local/lib/libwebsockets.so.13
#3 0x7f632880d1ad in rops_handle_POLLIN_listen () from 
/usr/local/lib/libwebsockets.so.13
#4 0x7f6328800389 in lws_service_fd_tsi () from 
/usr/local/lib/libwebsockets.so.13
#5 0x7f6328816ce7 in _lws_plat_service_tsi.part.1 () from 
/usr/local/lib/libwebsockets.so.13
#6 0x7f6328800455 in lws_service () from /usr/local/lib/libwebsockets.so.13
#7 0x00400965 in main (argc=1, argv=0x7fff71638b68) at 
minimal-http-server-tls.c:87

 

Next I will see if this still happens with latest code.
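
For reference, here is roughly what the altered context setup from step 2 looks
like in the minimal example. This is an illustrative sketch only (port and
certificate filenames are placeholders, not the exact diff); the crash trigger
is the ALLOW_NON_SSL_ON_SSL_PORT option combined with a plain-http request:

#include <libwebsockets.h>
#include <string.h>

static struct lws_context *start_context(void)
{
    struct lws_context_creation_info info;
    memset(&info, 0, sizeof info);

    info.port = 7681;
    info.ssl_cert_filepath        = "server.cert";   /* placeholder paths */
    info.ssl_private_key_filepath = "server.key";

    /* The change from step 2: also accept plain HTTP on the TLS port. */
    info.options = LWS_SERVER_OPTION_DO_SSL_GLOBAL_INIT |
                   LWS_SERVER_OPTION_ALLOW_NON_SSL_ON_SSL_PORT;

    return lws_create_context(&info);
}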

 

> http against https enabled listener causes segfault
> ---
>
> Key: DISPATCH-1280
> URL: https://issues.apache.org/jira/browse/DISPATCH-1280
> Project: Qpid Dispatch
>  Issue Type: Bug
>Reporter: Gordon Sim
>Assignee: michael goulish
>Priority: Major
>
> If you have a listener with http enabled, an ssl profile referenced, but 
> requireSsl set to false, and then try to access it over plain http, you get a 
> segfault in libwebsockets if using version 3.0.1-2. Downgrading to 2.4.2 of 
> libwebsockets fixes this.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (DISPATCH-1280) http against https enabled listener causes segfault

2019-03-20 Thread michael goulish (JIRA)


[ 
https://issues.apache.org/jira/browse/DISPATCH-1280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16797575#comment-16797575
 ] 

michael goulish commented on DISPATCH-1280:
---

Looked at closed issues back to release date of v2.4.2  (8 March 2018).

Nothing looks like the issue we are seeing.

Closed issues are here:

https://github.com/warmcat/libwebsockets/issues?page=11=is%3Aissue+is%3Aclosed

> http against https enabled listener causes segfault
> ---
>
> Key: DISPATCH-1280
> URL: https://issues.apache.org/jira/browse/DISPATCH-1280
> Project: Qpid Dispatch
>  Issue Type: Bug
>Reporter: Gordon Sim
>Assignee: michael goulish
>Priority: Major
>
> If you have a listener with http enabled, an ssl profile referenced, but 
> requireSsl set to false, and then try to access it over plain http, you get a 
> segfault in libwebsockets if using version 3.0.1-2. Downgrading to 2.4.2 of 
> libwebsockets fixes this.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (DISPATCH-1280) http against https enabled listener causes segfault

2019-03-20 Thread michael goulish (JIRA)


[ 
https://issues.apache.org/jira/browse/DISPATCH-1280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16797391#comment-16797391
 ] 

michael goulish commented on DISPATCH-1280:
---

It sounds like this happens all the time. Is that true? Not a rare occurrence? 

 

 

> http against https enabled listener causes segfault
> ---
>
> Key: DISPATCH-1280
> URL: https://issues.apache.org/jira/browse/DISPATCH-1280
> Project: Qpid Dispatch
>  Issue Type: Bug
>Reporter: Gordon Sim
>Assignee: michael goulish
>Priority: Major
>
> If you have a listener with http enabled, an ssl profile referenced, but 
> requireSsl set to false, and then try to access it over plain http, you get a 
> segfault in libwebsockets if using version 3.0.1-2. Downgrading to 2.4.2 of 
> libwebsockets fixes this.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Assigned] (DISPATCH-1280) http against https enabled listener causes segfault

2019-03-18 Thread michael goulish (JIRA)


 [ 
https://issues.apache.org/jira/browse/DISPATCH-1280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

michael goulish reassigned DISPATCH-1280:
-

Assignee: michael goulish

> http against https enabled listener causes segfault
> ---
>
> Key: DISPATCH-1280
> URL: https://issues.apache.org/jira/browse/DISPATCH-1280
> Project: Qpid Dispatch
>  Issue Type: Bug
>Reporter: Gordon Sim
>Assignee: michael goulish
>Priority: Major
>
> If you have a listener with http enabled, an ssl profile referenced, but 
> requireSsl set to false, and then try to access it over plain http, you get a 
> segfault in libwebsockets if using version 3.0.1-2. Downgrading to 2.4.2 of 
> libwebsockets fixes this.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Created] (DISPATCH-1215) several memory leaks in edge-router soak test

2018-12-07 Thread michael goulish (JIRA)
michael goulish created DISPATCH-1215:
-

 Summary: several memory leaks in edge-router soak test
 Key: DISPATCH-1215
 URL: https://issues.apache.org/jira/browse/DISPATCH-1215
 Project: Qpid Dispatch
  Issue Type: Bug
Reporter: michael goulish


Using recent master code trees (dispatch and proton)...

The test sets up a simple 3-linear router network, A-B-C, and attaches 100 edge 
routers to A. It then kills one edge router, replaces it, and repeats that 
kill-and-replace operation 50 times. (At which point I manually killed router 
A.)

Router A was running under valgrind, and produced the following output:
 
[mick@colossus ~]$ /usr/bin/valgrind --leak-check=full 
--show-leak-kinds=definite --trace-children=yes 
--suppressions=/home/mick/latest/qpid-dispatch/tests/valgrind.supp 
/home/mick/latest/install/dispatch/sbin/qdrouterd  --config 
/home/mick/mercury/results/test_03/2018_12_06/config/A.conf -I 
/home/mick/latest/install/dispatch/lib/qpid-dispatch/python
==9409== Memcheck, a memory error detector
==9409== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==9409== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==9409== Command: /home/mick/latest/install/dispatch/sbin/qdrouterd --config 
/home/mick/mercury/results/test_03/2018_12_06/config/A.conf -I 
/home/mick/latest/install/dispatch/lib/qpid-dispatch/python
==9409==
^C==9409==
==9409== Process terminating with default action of signal 2 (SIGINT)
==9409==    at 0x61C0A37: kill (in /usr/lib64/libc-2.26.so)
==9409==    by 0x401636: main (main.c:367)
==9409==
==9409== HEAP SUMMARY:
==9409== in use at exit: 6,933,690 bytes in 41,903 blocks
==9409==   total heap usage: 669,024 allocs, 627,121 frees, 92,449,020 bytes 
allocated
==9409==
==9409== *8,640 (480 direct, 8,160 indirect) bytes in 20 blocks are definitely 
lost in loss record 4,229 of 4,323*
==9409==    at 0x4C2CB6B: malloc (vg_replace_malloc.c:299)
==9409==    by 0x4E7D336: qdr_error_from_pn (error.c:37)
==9409==    by 0x4E905D7: AMQP_link_detach_handler (router_node.c:822)
==9409==    by 0x4E60A6C: close_links (container.c:298)
==9409==    by 0x4E6109F: close_handler (container.c:311)
==9409==    by 0x4E6109F: qd_container_handle_event (container.c:639)
==9409==    by 0x4E93971: handle (server.c:985)
==9409==    by 0x4E944C8: thread_run (server.c:1010)
==9409==    by 0x4E947CF: qd_server_run (server.c:1284)
==9409==    by 0x40186E: main_process (main.c:112)
==9409==    by 0x401636: main (main.c:367)
==9409==
==9409== *14,256 (792 direct, 13,464 indirect) bytes in 33 blocks are 
definitely lost in loss record 4,261 of 4,323*
==9409==    at 0x4C2CB6B: malloc (vg_replace_malloc.c:299)
==9409==    by 0x4E7D336: qdr_error_from_pn (error.c:37)
==9409==    by 0x4E905D7: AMQP_link_detach_handler (router_node.c:822)
==9409==    by 0x4E60A6C: close_links (container.c:298)
==9409==    by 0x4E6109F: close_handler (container.c:311)
==9409==    by 0x4E6109F: qd_container_handle_event (container.c:639)
==9409==    by 0x4E93971: handle (server.c:985)
==9409==    by 0x4E944C8: thread_run (server.c:1010)
==9409==    by 0x550150A: start_thread (in /usr/lib64/libpthread-2.26.so)
==9409==    by 0x628138E: clone (in /usr/lib64/libc-2.26.so)
==9409==
==9409== *575,713 (24 direct, 575,689 indirect) bytes in 1 blocks are 
definitely lost in loss record 4,321 of 4,323*
==9409==    at 0x4C2CB6B: malloc (vg_replace_malloc.c:299)
==9409==    by 0x4E83FCA: qdr_add_link_ref (router_core.c:518)
==9409==    by 0x4E7A3BF: qdr_link_inbound_first_attach_CT (connections.c:1517)
==9409==    by 0x4E8484B: router_core_thread (router_core_thread.c:116)
==9409==    by 0x550150A: start_thread (in /usr/lib64/libpthread-2.26.so)
==9409==    by 0x628138E: clone (in /usr/lib64/libc-2.26.so)
==9409==
==9409== LEAK SUMMARY:
==9409==    definitely lost: 1,296 bytes in 54 blocks
==9409==    indirectly lost: 597,313 bytes in 3,096 blocks
==9409==  possibly lost: 1,473,248 bytes in 6,538 blocks
==9409==    still reachable: 4,861,833 bytes in 32,215 blocks
==9409== suppressed: 0 bytes in 0 blocks
==9409== Reachable blocks (those to which a pointer was found) are not shown.
==9409== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==9409==
==9409== For counts of detected and suppressed errors, rerun with: -v
==9409== ERROR SUMMARY: 1040 errors from 1040 contexts (suppressed: 0 from 0)
[mick@colossus ~]$
 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Created] (DISPATCH-1155) dueling httpRootDirs

2018-10-24 Thread michael goulish (JIRA)
michael goulish created DISPATCH-1155:
-

 Summary: dueling httpRootDirs
 Key: DISPATCH-1155
 URL: https://issues.apache.org/jira/browse/DISPATCH-1155
 Project: Qpid Dispatch
  Issue Type: Bug
Reporter: michael goulish
Assignee: michael goulish


The new version of qpid-dispatch-router uses 
"/usr/share/qpid-dispatch/console/stand-alone" as the default httpRootDir. But 
when the new qpid-dispatch-console package is installed, the pages are available 
at "/usr/share/qpid-dispatch/console".

This forces the user to define httpRootDir on the listener to bypass this issue.
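
For illustration, the workaround is a listener stanza along these lines (the
attribute names follow the qdrouterd listener entity; the host and port values
here are just examples):

listener {
    host: 0.0.0.0
    port: 8672
    http: yes
    httpRootDir: /usr/share/qpid-dispatch/console
}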

Ted suggests this fix:

 
Remove the default behavior for httpRootDir. If it is not specified in the 
configuration for a listener, then HTTP requests shall be rejected on 
connections to that listener. Such a listener would only be usable for AMQP 
over websockets.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (DISPATCH-959) Rate limiting policy

2018-10-18 Thread michael goulish (JIRA)


[ 
https://issues.apache.org/jira/browse/DISPATCH-959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16655668#comment-16655668
 ] 

michael goulish commented on DISPATCH-959:
--

This is not a bug, it's a new feature.

> Rate limiting policy
> 
>
> Key: DISPATCH-959
> URL: https://issues.apache.org/jira/browse/DISPATCH-959
> Project: Qpid Dispatch
>  Issue Type: Bug
>  Components: Policy Engine, Routing Engine
>Affects Versions: 1.0.1
>Reporter: Chuck Rolke
>Priority: Major
> Fix For: Backlog
>
>
> Router administrators would like rate-limiting policies to allow different 
> classes of users. A network-rate limit similar to how home cable networks are 
> provisioned for bandwidth is a classic model and is being considered as the 
> first choice.
> A message-per-second limit might be easier to enforce. But a single user 
> message may have a large data section, or have a small data section but have 
> huge message annotations. Thus a user might consume a lot of network 
> bandwidth with only a few messages.
> It is still unclear at what level the rate limiting should be applied. 
> Choices are:
>  * Per vhost
>  * Per vhost connection
>  * Per vhost user



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Closed] (DISPATCH-1139) support prioritized addresses

2018-10-18 Thread michael goulish (JIRA)


 [ 
https://issues.apache.org/jira/browse/DISPATCH-1139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

michael goulish closed DISPATCH-1139.
-
Resolution: Implemented

> support prioritized addresses
> -
>
> Key: DISPATCH-1139
> URL: https://issues.apache.org/jira/browse/DISPATCH-1139
> Project: Qpid Dispatch
>  Issue Type: New Feature
>  Components: Router Node, Routing Engine, Tests
>Reporter: michael goulish
>Assignee: michael goulish
>Priority: Major
>
> Support a new field in the address descriptor in router configuration files 
> that will assign a priority to the address.
> Any message that does not have an intrinsic priority already assigned will 
> inherit the priority of the address to which it is sent.  If no priority is 
> explicitly assigned to an address, then it will be assigned the default 
> priority.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Resolved] (DISPATCH-1140) tests for message priority

2018-10-12 Thread michael goulish (JIRA)


 [ 
https://issues.apache.org/jira/browse/DISPATCH-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

michael goulish resolved DISPATCH-1140.
---
Resolution: Duplicate

Sorry – I should have just included this with DISPATCH-1139.

When I PR that one, it will have a test that looks at both message and address 
priority.

 

 

> tests for message priority
> --
>
> Key: DISPATCH-1140
> URL: https://issues.apache.org/jira/browse/DISPATCH-1140
> Project: Qpid Dispatch
>  Issue Type: New Feature
>Reporter: michael goulish
>Assignee: michael goulish
>Priority: Major
>
> The message priority code recently checked in ( in DISPATCH-1096 ) should 
> have at least the following two tests:
>  
>  # Make a two-router network, A and B. Send messages from A to B, confirm 
> that they arrive, then kill and restart B and send and confirm more messages. 
> Do this test  once with B connecting to A, and once with A connecting to B.
>  # Two-router network again. Send some messages from A to B (i.e. sender 
> attached to A, rcvr to B) – sending at least one message of each priority.   
> ( 0 - 9, inclusive ). Send management commands to A to see how many outgoing 
> inter-router links had message traffic go over them. The number should be 10.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Created] (DISPATCH-1140) tests for message priority

2018-10-05 Thread michael goulish (JIRA)
michael goulish created DISPATCH-1140:
-

 Summary: tests for message priority
 Key: DISPATCH-1140
 URL: https://issues.apache.org/jira/browse/DISPATCH-1140
 Project: Qpid Dispatch
  Issue Type: New Feature
Reporter: michael goulish
Assignee: michael goulish


The message priority code recently checked in ( in DISPATCH-1096 ) should have 
at least the following two tests:

 
 # Make a two-router network, A and B. Send messages from A to B, confirm that 
they arrive, then kill and restart B and send and confirm more messages. Do 
this test  once with B connecting to A, and once with A connecting to B.
 # Two-router network again. Send some messages from A to B (i.e. sender 
attached to A, rcvr to B) – sending at least one message of each priority.   ( 
0 - 9, inclusive ). Send management commands to A to see how many outgoing 
inter-router links had message traffic go over them. The number should be 10.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Resolved] (DISPATCH-1096) support AMQP prioritized messages

2018-10-05 Thread michael goulish (JIRA)


 [ 
https://issues.apache.org/jira/browse/DISPATCH-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

michael goulish resolved DISPATCH-1096.
---
Resolution: Implemented

I will open a separate Jira for tests that this code needs.

> support AMQP prioritized messages
> -
>
> Key: DISPATCH-1096
> URL: https://issues.apache.org/jira/browse/DISPATCH-1096
> Project: Qpid Dispatch
>  Issue Type: New Feature
>Reporter: michael goulish
>Assignee: michael goulish
>Priority: Major
> Fix For: 1.4.0
>
>
> Detect priority info from message header in the router code.
> Create separate inter-router links for the various priorities.
> Per connection (i.e. not globally across the router) service high-priority 
> inter-router links before low priority links.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Closed] (PROTON-1949) no message header if priority == default

2018-10-05 Thread michael goulish (JIRA)


 [ 
https://issues.apache.org/jira/browse/PROTON-1949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

michael goulish closed PROTON-1949.
---
Resolution: Not A Problem

We have found a nice workaround for this (probably better, actually) and do not 
need proton to change anything.

 

> no message header if priority == default
> 
>
> Key: PROTON-1949
> URL: https://issues.apache.org/jira/browse/PROTON-1949
> Project: Qpid Proton
>  Issue Type: Bug
>Reporter: michael goulish
>Priority: Major
>
> Proton does not send a message header if there would be nothing in it but the 
> priority field, and if the priority was set to the default value (4). 
> At the router level, we are allowing the user to set priorities on addresses. 
> Those priorities will be given to any message sent to that address if the 
> message otherwise had no priority set.
> So - we need to be able to distinguish between messages that were assigned 
> the default priority, and messages in which the priority was left undefined.
> We would like proton to send the priority field in the message header if the 
> user sets any priority. Then we will be able to interpret no header, or no 
> priority field in the header as "no priority was assigned".
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (PROTON-1949) no message header if priority == default

2018-10-05 Thread michael goulish (JIRA)


[ 
https://issues.apache.org/jira/browse/PROTON-1949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16640115#comment-16640115
 ] 

michael goulish commented on PROTON-1949:
-

Nolo contendere.

We have decided that it is better to give precedence to the address's priority, 
which means that we do not need an ability in the message to express _no value_.

I will close this as not-a-bug.

 

 

> no message header if priority == default
> 
>
> Key: PROTON-1949
> URL: https://issues.apache.org/jira/browse/PROTON-1949
> Project: Qpid Proton
>  Issue Type: Bug
>Reporter: michael goulish
>Priority: Major
>
> Proton does not send a message header if there would be nothing in it but the 
> priority field, and if the priority was set to the default value (4). 
> At the router level, we are allowing the user to set priorities on addresses. 
> Those priorities will be given to any message sent to that address if the 
> message otherwise had no priority set.
> So - we need to be able to distinguish between messages that were assigned 
> the default priority, and messages in which the priority was left undefined.
> We would like proton to send the priority field in the message header if the 
> user sets any priority. Then we will be able to interpret no header, or no 
> priority field in the header as "no priority was assigned".
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Created] (DISPATCH-1139) support prioritized addresses

2018-10-04 Thread michael goulish (JIRA)
michael goulish created DISPATCH-1139:
-

 Summary: support prioritized addresses
 Key: DISPATCH-1139
 URL: https://issues.apache.org/jira/browse/DISPATCH-1139
 Project: Qpid Dispatch
  Issue Type: New Feature
  Components: Router Node, Routing Engine, Tests
Reporter: michael goulish
Assignee: michael goulish


Support a new field in the address descriptor in router configuration files 
that will assign a priority to the address.

Any message that does not have an intrinsic priority already assigned will 
inherit the priority of the address to which it is sent.  If no priority is 
explicitly assigned to an address, then it will be assigned the default 
priority.
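
As a sketch of what such an entry might look like in a router configuration
file (the attribute name "priority" and the prefix are illustrative, pending
the actual schema change):

address {
    prefix: important.work
    priority: 7
}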

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (DISPATCH-1126) ERROR Attempt to attach too many inter-router links for priority sheaf.

2018-10-04 Thread michael goulish (JIRA)


[ 
https://issues.apache.org/jira/browse/DISPATCH-1126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16638896#comment-16638896
 ] 

michael goulish commented on DISPATCH-1126:
---

pending fix for this in PR 384

> ERROR Attempt to attach too many inter-router links for priority sheaf.
> ---
>
> Key: DISPATCH-1126
> URL: https://issues.apache.org/jira/browse/DISPATCH-1126
> Project: Qpid Dispatch
>  Issue Type: Bug
>  Components: Router Node
>Affects Versions: 1.3.0
> Environment: Fedora 28
>  * Three router network in linear arrangement A - B - C.
>  * B has a listener; A and C connect to it
>  
>Reporter: Chuck Rolke
>Assignee: michael goulish
>Priority: Major
> Attachments: taj-GRN.log
>
>
> Some state probably not cleaned up when router connections are lost. 10 
> messages
>     (error) Attempt to attach too many inter-router links for priority sheaf.
> appear when routers reconnect.
> Start the network. Then kill routers A and C and restart them. Router B 
> prints the messages.
> Log file attached



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Created] (PROTON-1949) no message header if priority == default

2018-10-04 Thread michael goulish (JIRA)
michael goulish created PROTON-1949:
---

 Summary: no message header if priority == default
 Key: PROTON-1949
 URL: https://issues.apache.org/jira/browse/PROTON-1949
 Project: Qpid Proton
  Issue Type: Bug
Reporter: michael goulish


Proton does not send a message header if there would be nothing in it but the 
priority field, and if the priority was set to the default value (4). 

At the router level, we are allowing the user to set priorities on addresses. 
Those priorities will be given to any message sent to that address if the 
message otherwise had no priority set.

So - we need to be able to distinguish between messages that were assigned the 
default priority, and messages in which the priority was left undefined.

We would like proton to send the priority field in the message header if the 
user sets any priority. Then we will be able to interpret no header, or no 
priority field in the header as "no priority was assigned".
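
For illustration, from the sending side (these are the standard proton-c
message calls; the value 7 is just an example):

#include <proton/message.h>

void set_msg_priority(pn_message_t *msg)
{
    /* An explicit non-default priority (e.g. 7) is carried in the header's
       priority field.  Setting 4, the default, may currently result in no
       header being emitted at all, which is what this issue is about. */
    pn_message_set_priority(msg, 7);
}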

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Created] (DISPATCH-1135) Router A leaks memory when router B killed and restarted.

2018-10-01 Thread michael goulish (JIRA)
michael goulish created DISPATCH-1135:
-

 Summary: Router A leaks memory when router B killed and restarted.
 Key: DISPATCH-1135
 URL: https://issues.apache.org/jira/browse/DISPATCH-1135
 Project: Qpid Dispatch
  Issue Type: Bug
Reporter: michael goulish


I set up a 2-node router network, with B connecting to A.

No clients.

I repeatedly killed and restarted B, giving 3 seconds after each kill and 
after each restart for the network to settle down. This was repeated 100 times. 
The same router A ran for the duration of the test.

The 'ps' program, run repeatedly on router A, indicated that it was leaking 
about 82 KB per kill-and-restart.  Using 'qdstat -m' on A after each 
kill-and-restart showed the following difference between iteration 1 and 
iteration 100.  (Note that this shows growth of only 44 KB per iteration.)

 

As far as I looked into the past (about 1 year) I saw similar behavior.

 

In the chart below, the first column "size" is the number of bytes in a single 
struct of that type.

"In-threads" means how many of each struct are currently being used.

 

Note that, although there are no clients, the routers will be sending some 
messages to each other.

 

 

type                     size   in-threads   in-threads   item     byte
                                (test 1)     (test 100)   growth   growth
========================================================================
qd_buffer_t               536       256         2944       2688   1440768
qd_message_content_t     1056       128         1216       1088   1148928
qd_iterator_t             160       448         7488       7040   1126400
qd_parsed_field_t          88       256         2880       2624    230912
qdr_delivery_t            248       256         1152        896    222208
qd_message_t              160       256         1088        832    133120
qd_connection_t          2320        32           64         32     74240
qdr_general_work_t         64        64          448        384     24576
qdr_link_t                360       192          256         64     23040
qd_bitmask_t               24       192         1088        896     21504
qdr_connection_work_t      48        64          384        320     15360
qdr_link_work_t            48        64          384        320     15360
qd_link_t                  96       128          256        128     12288
qdr_link_ref_t             24        64          448        384      9216
qd_parsed_turbo_t          64       128          256        128      8192
qd_link_ref_t              24        64          256        192      4608
qdr_error_t                24        64          256        192      4608
qd_deferred_call_t         32        64          192        128      4096
qdr_terminus_t             64       192          256         64      4096
qdr_delivery_ref_t         24        64          128         64      1536

 

( All other structs have zero growth. (Or, in one case, less.) )

 

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Assigned] (DISPATCH-1126) ERROR Attempt to attach too many inter-router links for priority sheaf.

2018-09-24 Thread michael goulish (JIRA)


 [ 
https://issues.apache.org/jira/browse/DISPATCH-1126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

michael goulish reassigned DISPATCH-1126:
-

Assignee: michael goulish

> ERROR Attempt to attach too many inter-router links for priority sheaf.
> ---
>
> Key: DISPATCH-1126
> URL: https://issues.apache.org/jira/browse/DISPATCH-1126
> Project: Qpid Dispatch
>  Issue Type: Bug
>  Components: Router Node
>Affects Versions: 1.3.0
> Environment: Fedora 28
>  * Three router network in linear arrangement A - B - C.
>  * B has a listener; A and C connect to it
>  
>Reporter: Chuck Rolke
>Assignee: michael goulish
>Priority: Major
> Attachments: taj-GRN.log
>
>
> Some state probably not cleaned up when router connections are lost. 10 
> messages
>     (error) Attempt to attach too many inter-router links for priority sheaf.
> appear when routers reconnect.
> Start the network. Then kill routers A and C and restart them. Router B 
> prints the messages.
> Log file attached



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (DISPATCH-1096) support AMQP prioritized messages

2018-09-19 Thread michael goulish (JIRA)


[ 
https://issues.apache.org/jira/browse/DISPATCH-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16621117#comment-16621117
 ] 

michael goulish commented on DISPATCH-1096:
---

The priority code should make messages default to priority 4 when there is no 
priority in the header, or no header at all in the message.

The proton library leaves out the message header (well, makes it an empty list) 
if there would otherwise be nothing but a default priority value in there.

> support AMQP prioritized messages
> -
>
> Key: DISPATCH-1096
> URL: https://issues.apache.org/jira/browse/DISPATCH-1096
> Project: Qpid Dispatch
>  Issue Type: New Feature
>Reporter: michael goulish
>Assignee: michael goulish
>Priority: Major
> Fix For: 1.4.0
>
>
> Detect priority info from message header in the router code.
> Create separate inter-router links for the various priorities.
> Per connection (i.e. not globally across the router) service high-priority 
> inter-router links before low priority links.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Created] (DISPATCH-1096) support AMQP prioritized messages

2018-08-06 Thread michael goulish (JIRA)
michael goulish created DISPATCH-1096:
-

 Summary: support AMQP prioritized messages
 Key: DISPATCH-1096
 URL: https://issues.apache.org/jira/browse/DISPATCH-1096
 Project: Qpid Dispatch
  Issue Type: New Feature
Reporter: michael goulish
Assignee: michael goulish


Detect priority info from message header in the router code.

Create separate inter-router links for the various priorities.

Per connection (i.e. not globally across the router) service high-priority 
inter-router links before low priority links.
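
A minimal sketch of the "service high-priority links first" idea (this is an
illustration of the approach, not the dispatch implementation; process_link is
a hypothetical stand-in for the real link-servicing routine):

#include <stddef.h>

#define N_PRIORITIES 10              /* AMQP priorities 0..9 */

typedef struct link_t link_t;

/* hypothetical stand-in for the real per-link work routine */
static void process_link(link_t *link) { (void) link; }

typedef struct {
    link_t *links[N_PRIORITIES];     /* one inter-router link per priority */
} priority_sheaf_t;

/* Per connection: walk the sheaf from highest priority down, so higher
   priority links are always serviced before lower priority ones. */
static void service_connection(priority_sheaf_t *sheaf)
{
    for (int p = N_PRIORITIES - 1; p >= 0; p--)
        if (sheaf->links[p])
            process_link(sheaf->links[p]);
}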



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Created] (DISPATCH-873) new routes calculated wrongly after connector deletion

2017-11-09 Thread michael goulish (JIRA)
michael goulish created DISPATCH-873:


 Summary: new routes calculated wrongly after connector deletion
 Key: DISPATCH-873
 URL: https://issues.apache.org/jira/browse/DISPATCH-873
 Project: Qpid Dispatch
  Issue Type: Bug
  Components: Routing Engine
Affects Versions: 1.0.0
Reporter: michael goulish
Priority: Blocker
 Fix For: 1.0.0


I have a 3-mesh network with nodes A, B, C.
B-->A cost is 10
C-->A cost is 10
B-->C cost is 100.

Initial route from B to C is calculated correctly as B,A,C : cost == 20.

But after I used qdmanage to delete the connector from B to A, I get no further 
messages delivered from B to C.
Using qdstat to look at routing table, it looks wrong:

Both B and C think they can only get to each other by going through A.  But 
there is now no route that way, because B-->A has been deleted.  They should be 
using the direct connection B-->C. Yet they both calculate the cost 
correctly as 100.



===
A  
===
Routers in the Network
router-id  next-hop  link  ver  cost  neighbors   valid-origins
A  (self)- 1  ['C']   []
B  C - 1110   ['A', 'C']  []
C  - 1 110['A', 'B']  ['B']
===
B  
===
Routers in the Network
router-id  next-hop  link  ver  cost  neighbors  valid-origins
B  (self)- 1  ['C']  []
C  A - 1100   [] []
===
C  
===
Routers in the Network
router-id  next-hop  link  ver  cost  neighbors   valid-origins
A  - 0 110['C']   []
B  A - 1100   ['A', 'C']  ['A']
C  (self)- 1  ['A', 'B']  []





--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Created] (DISPATCH-870) connection improperly reopened from closed connector

2017-11-03 Thread michael goulish (JIRA)
michael goulish created DISPATCH-870:


 Summary: connection improperly reopened from closed connector
 Key: DISPATCH-870
 URL: https://issues.apache.org/jira/browse/DISPATCH-870
 Project: Qpid Dispatch
  Issue Type: Bug
  Components: Routing Engine
Affects Versions: 1.0.0
Reporter: michael goulish
Priority: Major


I have a 3-mesh router network, ABC, and I am sending messages from B to C.  
The route being used is B,A,C -- because I have configured it to be cheaper 
than B,C .

I use the management interface to kill the connector from C to A.  For the next 
two seconds my messages are released. I use another management call to confirm 
that the connector has really been removed. ( I also see it happening in the C 
code, at fn qd_connection_manager_delete_connector()  .   )

What We Expect: the network should re-route to start sending these messages on 
the route B,C -- because that is now the only route available.

What We Observe: after 2 seconds, the function try_open_lh() is called.  It 
reopens the connection from C to A even though the connector has been removed.





--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Closed] (PROTON-1408) long-lived connections suffer large performance hit after many messages

2017-04-11 Thread michael goulish (JIRA)

 [ 
https://issues.apache.org/jira/browse/PROTON-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

michael goulish closed PROTON-1408.
---
   Resolution: Fixed
Fix Version/s: 0.18.0

Fixed with checkin d22f124b0534983f6557850e48f13317ec6df0e5

> long-lived connections suffer large performance hit after many messages
> ---
>
> Key: PROTON-1408
> URL: https://issues.apache.org/jira/browse/PROTON-1408
> Project: Qpid Proton
>  Issue Type: Bug
>  Components: proton-c
>Reporter: michael goulish
>Assignee: michael goulish
> Fix For: 0.18.0
>
> Attachments: jira_proton_1408_reproducer.tar.gz
>
>
> In long-running soak tests, in which connections are never taken down, I am 
> seeing a sudden & severe performance degradation when the number of messages 
> over the connection reaches about 6.4 billion.  
> This is happening in tests with two senders, two receivers & one router 
> intermediating.  
> I have tried C libUV clients as well as CPP clients.  Behavior is not 
> identical, but I see sudden performance drop, ie. 8x throughput decrease or 
> worse, in both cases.
> Alan / Ted / Ken see an issue in use of improper comparison logic in 
> pn_do_disposition(), in transport.c  . I am trying to prove this now.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Updated] (PROTON-1408) long-lived connections suffer large performance hit after many messages

2017-03-15 Thread michael goulish (JIRA)

 [ 
https://issues.apache.org/jira/browse/PROTON-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

michael goulish updated PROTON-1408:

Attachment: jira_proton_1408_reproducer.tar.gz

Everything you need in a tidy little package.
I have 10 out of 10 reproductions with this.


> long-lived connections suffer large performance hit after many messages
> ---
>
> Key: PROTON-1408
> URL: https://issues.apache.org/jira/browse/PROTON-1408
> Project: Qpid Proton
>  Issue Type: Bug
>  Components: proton-c
>Reporter: michael goulish
>Assignee: Alan Conway
> Attachments: jira_proton_1408_reproducer.tar.gz
>
>
> In long-running soak tests, in which connections are never taken down, I am 
> seeing a sudden & severe performance degradation when the number of messages 
> over the connection reaches about 6.4 billion.  
> This is happening in tests with two senders, two receivers & one router 
> intermediating.  
> I have tried C libUV clients as well as CPP clients.  Behavior is not 
> identical, but I see sudden performance drop, ie. 8x throughput decrease or 
> worse, in both cases.
> Alan / Ted / Ken see an issue in use of improper comparison logic in 
> pn_do_disposition(), in transport.c  . I am trying to prove this now.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (PROTON-1408) long-lived connections suffer large performance hit after many messages

2017-03-15 Thread michael goulish (JIRA)

[ 
https://issues.apache.org/jira/browse/PROTON-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15926821#comment-15926821
 ] 

michael goulish commented on PROTON-1408:
-

I can now reproduce the problem 100% of the time, and after just a couple of 
minutes instead of the 9 or 27 hours it took initially.
This is done by:
  1. storing deliveries in the receiver and only acking when I get 100,000
  2. Altering proton code so that the first outgoing ID it uses is already 
close to 2^31 - 1

I am now packaging up all my stuff for the reproducer.
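
For anyone following along, here is a minimal illustration of the class of bug
(this is the general wraparound-safe comparison technique, not the actual
proton patch): a plain "<" on 32-bit delivery ids misbehaves once the ids
approach 2^31, while a modular (serial-arithmetic) comparison keeps working:

#include <stdint.h>
#include <stdbool.h>

/* True if sequence number a precedes b, assuming the two ids are within
   2^31 of each other; the signed cast makes the subtraction wrap correctly. */
static inline bool seq_precedes(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) < 0;
}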


> long-lived connections suffer large performance hit after many messages
> ---
>
> Key: PROTON-1408
> URL: https://issues.apache.org/jira/browse/PROTON-1408
> Project: Qpid Proton
>  Issue Type: Bug
>  Components: proton-c
>Reporter: michael goulish
>Assignee: Alan Conway
>
> In long-running soak tests, in which connections are never taken down, I am 
> seeing a sudden & severe performance degradation when the number of messages 
> over the connection reaches about 6.4 billion.  
> This is happening in tests with two senders, two receivers & one router 
> intermediating.  
> I have tried C libUV clients as well as CPP clients.  Behavior is not 
> identical, but I see sudden performance drop, ie. 8x throughput decrease or 
> worse, in both cases.
> Alan / Ted / Ken see an issue in use of improper comparison logic in 
> pn_do_disposition(), in transport.c  . I am trying to prove this now.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (PROTON-1408) long-lived connections suffer large performance hit after many messages

2017-03-01 Thread michael goulish (JIRA)

[ 
https://issues.apache.org/jira/browse/PROTON-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15890860#comment-15890860
 ] 

michael goulish commented on PROTON-1408:
-

Using proton and dispatch code from 17 Feb 2017, I am running 5 simultaneous 
tests on a large machine, each with 1 router, 2 senders, 2 receivers.
So far I have no reproduction of the slow-down.  All the senders have gone 
beyond 8 billion messages with no slowdown at all.
OS is RHEL 7.2 .





> long-lived connections suffer large performance hit after many messages
> ---
>
> Key: PROTON-1408
> URL: https://issues.apache.org/jira/browse/PROTON-1408
> Project: Qpid Proton
>  Issue Type: Bug
>  Components: proton-c
>Reporter: michael goulish
>Assignee: Alan Conway
>
> In long-running soak tests, in which connections are never taken down, I am 
> seeing a sudden & severe performance degradation when the number of messages 
> over the connection reaches about 6.4 billion.  
> This is happening in tests with two senders, two receivers & one router 
> intermediating.  
> I have tried C libUV clients as well as CPP clients.  Behavior is not 
> identical, but I see sudden performance drop, ie. 8x throughput decrease or 
> worse, in both cases.
> Alan / Ted / Ken see an issue in use of improper comparison logic in 
> pn_do_disposition(), in transport.c  . I am trying to prove this now.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Created] (PROTON-1408) long-lived connections suffer large performance hit after many messages

2017-02-17 Thread michael goulish (JIRA)
michael goulish created PROTON-1408:
---

 Summary: long-lived connections suffer large performance hit after 
many messages
 Key: PROTON-1408
 URL: https://issues.apache.org/jira/browse/PROTON-1408
 Project: Qpid Proton
  Issue Type: Bug
  Components: proton-c
Reporter: michael goulish


In long-running soak tests, in which connections are never taken down, I am 
seeing a sudden & severe performance degradation when the number of messages 
over the connection reaches about 6.4 billion.  

This is happening in tests with two senders, two receivers & one router 
intermediating.  

I have tried C libUV clients as well as CPP clients.  Behavior is not 
identical, but I see sudden performance drop, ie. 8x throughput decrease or 
worse, in both cases.

Alan / Ted / Ken see an issue in use of improper comparison logic in 
pn_do_disposition(), in transport.c  . I am trying to prove this now.




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Created] (DISPATCH-372) qdstat should have a timeout command line argument

2016-06-08 Thread michael goulish (JIRA)
michael goulish created DISPATCH-372:


 Summary: qdstat should have a timeout command line argument
 Key: DISPATCH-372
 URL: https://issues.apache.org/jira/browse/DISPATCH-372
 Project: Qpid Dispatch
  Issue Type: Improvement
Reporter: michael goulish


qdstat should have a timeout command line argument.
but -- it doesn't.

sometimes when the router is busy, it is helpful to allow a longer timeout.





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (DISPATCH-369) investigate excursions in memory usage

2016-06-08 Thread michael goulish (JIRA)

[ 
https://issues.apache.org/jira/browse/DISPATCH-369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15321466#comment-15321466
 ] 

michael goulish commented on DISPATCH-369:
--

...and without anything interesting showing up in the output from 'qdstat -m'.



> investigate excursions in memory usage
> --
>
> Key: DISPATCH-369
> URL: https://issues.apache.org/jira/browse/DISPATCH-369
> Project: Qpid Dispatch
>  Issue Type: Bug
>  Components: Router Node
>Affects Versions: 0.6.0
>Reporter: michael goulish
>Assignee: michael goulish
> Attachments: n_senders_vs_MEM_three_trials.jpg
>
>
> I don't know if this is a bug or not.  I'm Jirifying it as a way of 
> remembering an interesting behavior that my testing has shown, so that I can 
> continue developing the testing and  come back to this later.
> ...
> While measuring router memory usage under varying message rate and number of 
> senders -- when I run the same test multiple times, I am occasionally (about 
> 1 in 4 times or so) seeing a test in which memory usage is much higher than 
> the others.
> For example:
>   In this test:
>   {
> straight-through topology ( 1 sender --> 1 address --> 1 receiver )
> 200 senders
> 200 messages per second
> 100 bytes per message
>   }
> I record router memory usage at the point when all receivers are just hitting 
> 10,000 messages.   (This is because it grows -- see previous JIRA.)
> In three iterations I get the following memory usage:
>66 MB
>63 MB
>   181 MB
> Something similar, but less drastic, happened occasionally at lower levels in 
> the test.  
> In this case, this is a tripling of memory usage for the same scenario.  I 
> doubt that this is the result of slightly  different timing in a block 
> allocation of data structures.  What just happened?
> Start by investigating with "qdstat -m"  and see if that shows some or all of 
> the difference.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (DISPATCH-369) investigate excursions in memory usage

2016-06-08 Thread michael goulish (JIRA)

[ 
https://issues.apache.org/jira/browse/DISPATCH-369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15321448#comment-15321448
 ] 

michael goulish commented on DISPATCH-369:
--

I rebuilt dispatch without the memory pooling feature, expecting that this 
would make the memory blow-ups go away.  It did not!  On the 7th run of my 
test, I saw memory go from 60 MB  (Resident Set Size) to 480 MB between one 
printout of 'top' and the next.  (3 seconds)  -- same behavior I was seeing 
with memory pooling enabled.

> investigate excursions in memory usage
> --
>
> Key: DISPATCH-369
> URL: https://issues.apache.org/jira/browse/DISPATCH-369
> Project: Qpid Dispatch
>  Issue Type: Bug
>  Components: Router Node
>Affects Versions: 0.6.0
>Reporter: michael goulish
>Assignee: michael goulish
> Attachments: n_senders_vs_MEM_three_trials.jpg
>
>
> I don't know if this is a bug or not.  I'm Jirifying it as a way of 
> remembering an interesting behavior that my testing has shown, so that I can 
> continue developing the testing and  come back to this later.
> ...
> While measuring router memory usage under varying message rate and number of 
> senders -- when I run the same test multiple times, I am occasionally (about 
> 1 in 4 times or so) seeing a test in which memory usage is much higher than 
> the others.
> For example:
>   In this test:
>   {
> straight-through topology ( 1 sender --> 1 address --> 1 receiver )
> 200 senders
> 200 messages per second
> 100 bytes per message
>   }
> I record router memory usage at the point when all receivers are just hitting 
> 10,000 messages.   (This is because it grows -- see previous JIRA.)
> In three iterations I get the following memory usage:
>66 MB
>63 MB
>   181 MB
> Something similar, but less drastic, happened occasionally at lower levels in 
> the test.  
> In this case, this is a tripling of memory usage for the same scenario.  I 
> doubt that this is the result of slightly  different timing in a block 
> allocation of data structures.  What just happened?
> Start by investigating with "qdstat -m"  and see if that shows some or all of 
> the difference.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Updated] (DISPATCH-369) investigate excursions in memory usage

2016-06-07 Thread michael goulish (JIRA)

 [ 
https://issues.apache.org/jira/browse/DISPATCH-369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

michael goulish updated DISPATCH-369:
-
Attachment: n_senders_vs_MEM_three_trials.jpg

Results of repeating each test three times, showing occasional excursions in 
memory usage.



> investigate excursions in memory usage
> --
>
> Key: DISPATCH-369
> URL: https://issues.apache.org/jira/browse/DISPATCH-369
> Project: Qpid Dispatch
>  Issue Type: Bug
>  Components: Router Node
>Affects Versions: 0.6.0
>Reporter: michael goulish
>Assignee: michael goulish
> Attachments: n_senders_vs_MEM_three_trials.jpg
>
>
> I don't know if this is a bug or not.  I'm Jirifying it as a way of 
> remembering an interesting behavior that my testing has shown, so that I can 
> continue developing the testing and  come back to this later.
> ...
> While measuring router memory usage under varying message rate and number of 
> senders -- when I run the same test multiple times, I am occasionally (about 
> 1 in 4 times or so) seeing a test in which memory usage is much higher than 
> the others.
> For example:
>   In this test:
>   {
> straight-through topology ( 1 sender --> 1 address --> 1 receiver )
> 200 senders
> 200 messages per second
> 100 bytes per message
>   }
> I record router memory usage at the point when all receivers are just hitting 
> 10,000 messages.   (This is because it grows -- see previous JIRA.)
> In three iterations I get the following memory usage:
>66 MB
>63 MB
>   181 MB
> Something similar, but less drastic, happened occasionally at lower levels in 
> the test.  
> In this case, this is a tripling of memory usage for the same scenario.  I 
> doubt that this is the result of slightly  different timing in a block 
> allocation of data structures.  What just happened?
> Start by investigating with "qdstat -m"  and see if that shows some or all of 
> the difference.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Created] (DISPATCH-369) investigate excursions in memory usage

2016-06-07 Thread michael goulish (JIRA)
michael goulish created DISPATCH-369:


 Summary: investigate excursions in memory usage
 Key: DISPATCH-369
 URL: https://issues.apache.org/jira/browse/DISPATCH-369
 Project: Qpid Dispatch
  Issue Type: Bug
  Components: Router Node
Affects Versions: 0.6.0
Reporter: michael goulish
Assignee: michael goulish


I don't know if this is a bug or not.  I'm Jirifying it as a way of remembering 
an interesting behavior that my testing has shown, so that I can continue 
developing the testing and  come back to this later.

...


While measuring router memory usage under varying message rate and number of 
senders -- when I run the same test multiple times, I am occasionally (about 1 
in 4 times or so) seeing a test in which memory usage is much higher than the 
others.

For example:
  In this test:
  {
straight-through topology ( 1 sender --> 1 address --> 1 receiver )
200 senders
200 messages per second
100 bytes per message
  }

I record router memory usage at the point when all receivers are just hitting 
10,000 messages.   (This is because it grows -- see previous JIRA.)

In three iterations I get the following memory usage:

   66 MB
   63 MB
  181 MB

Something similar, but less drastic, happened occasionally at lower levels in 
the test.  

In this case, this is a tripling of memory usage for the same scenario.  I 
doubt that this is the result of slightly  different timing in a block 
allocation of data structures.  What just happened?

Start by investigating with "qdstat -m"  and see if that shows some or all of 
the difference.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Created] (DISPATCH-344) memory growth after repeated calls from qdstat -m

2016-05-24 Thread michael goulish (JIRA)
michael goulish created DISPATCH-344:


 Summary: memory growth after repeated calls from qdstat -m
 Key: DISPATCH-344
 URL: https://issues.apache.org/jira/browse/DISPATCH-344
 Project: Qpid Dispatch
  Issue Type: Bug
  Components: Routing Engine
Affects Versions: 0.6.0
Reporter: michael goulish


0. version of dispatch code is   0.6.0 RC3
1. bring up a router
2. do not attach any clients, except...
3. ...repeatedly invoke qdstat -m on the router 

result:

After 1000 calls from "qdstat -m", top shows that router memory has grown by 
4947968 bytes.  The output from "qdstat -m" accounts for about 63% of that, or 
3144448 bytes.

Here are the data types that increased, according to qdstat, ordered from 
largest to smallest.



Um.   This table looked really nice when it was in a fixed-width font.




  type                    size    total    total   increase   increase
                                 before    after    structs      bytes
  ====================================================================
  qd_log_entry_t          2104      112     1040        928    1952512
  qd_buffer_t              536       80     1120       1040     557440
  qd_field_iterator_t      128      192     1280       1088     139264
  qdr_delivery_t           136       64      512        448      60928
  qdr_connection_t         216       64      320        256      55296
  qdr_field_t               40      192     1280       1088      43520
  qd_connection_t          224       64      256        192      43008
  qd_message_content_t     640       16       80         64      40960
  qd_message_t             128      192      512        320      40960
  qdpn_connector_t         600       16       64         48      28800
  qdr_general_work_t        64       64      512        448      28672
  qdr_connection_work_t     56       64      512        448      25088
  qd_composite_t           112       64      256        192      21504
  qdr_link_t               264       16       80         64      16896
  qd_composed_field_t       64       64      256        192      12288
  qdr_terminus_t            64       64      256        192      12288
  qdr_delivery_ref_t        24       64      512        448      10752
  qdr_link_ref_t            24       64      512        448      10752
  qd_parsed_field_t         80      128      256        128      10240
  qdr_action_t             160      256      320         64      10240
  qd_link_t                 48       64      256        192       9216
  qdr_error_t               24        0      320        320       7680
  qd_deferred_call_t        32       64      256        192       6144


grand total increase from qdstat:  3144448
grand total increase from top:   4947968



Here is the script I used
This input window is breaking some lines.   >:-(   


#! /bin/bash

echo "NOTE:  router should already be running."

INSTALL_ROOT=${SHACKLETON_ROOT}/install
PROTON_INSTALL_DIR=${INSTALL_ROOT}/proton
DISPATCH_INSTALL_DIR=${INSTALL_ROOT}/dispatch

QDSTAT=${DISPATCH_INSTALL_DIR}/bin/qdstat

export LD_LIBRARY_PATH=${DISPATCH_INSTALL_DIR}/lib64:${PROTON_INSTALL_DIR}/lib64
export PYTHONPATH=${DISPATCH_INSTALL_DIR}/lib/qpid-dispatch/python:${DISPATCH_INSTALL_DIR}/lib/python2.7/site-packages:${PROTON_INSTALL_DIR}/lib64/proton/bindings/python

ROUTER_PID=`ps -aef | grep qdrouterd | grep -v grep | awk '{print $2}'`

count=1
while [ $count -lt 1001 ]
do
  echo "==="
  echo "TEST $count"
  echo "==="
  count=$(( $count + 1 ))

  top -b -n 1 -p ${ROUTER_PID}

  ${QDSTAT} -m

  sleep 3
done




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (PROTON-992) Proton's use of Cyrus SASL is not thread-safe.

2016-04-29 Thread michael goulish (JIRA)

[ 
https://issues.apache.org/jira/browse/PROTON-992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15264603#comment-15264603
 ] 

michael goulish commented on PROTON-992:


Dispatch is not yet immune to this issue.
Also, I think Proton needs to let the application handle initialization and 
shutdown of Cyrus SASL.

I made a test that brings up a 6-router network, and randomly kills and 
restarts routers.
I get a router core, usually within 5 iterations, because of this issue.

Here is how I fixed it:

  1. Let dispatch code call sasl_client_init() and sasl_server_init() at the 
top of qd_server_run(), and remove these calls from Proton.  If Proton keeps 
these calls to itself, it cannot prevent two threads from simultaneously 
getting into sasl_*_init().  SegV City.

  2. Prevent proton from calling sasl_{client,server}_done(), in 
pni_sasl_impl_free().   Being thread-agnostic, Proton cannot possibly know when 
it's safe to dispose of the sasl object, which is being used by many threads.   
Both of those Cyrus calls affect global state by NULLing out a global pointer 
that stores the mechanisms string.

With these changes, my test has now run to 400 iterations with no crash.
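
(Illustration only -- not the committed patch.)  A minimal sketch of item 1, 
assuming the init calls are made from the single main thread at the top of 
qd_server_run() before any worker threads start; qd_sasl_startup() and the 
"qdrouterd" appname are hypothetical:

#include <sasl/sasl.h>
#include <stdio.h>

/* Hypothetical helper, called once from the main thread at the top of
 * qd_server_run(), before any worker threads exist, so no locking is
 * needed around the Cyrus init calls. */
static int qd_sasl_startup(void)
{
    if (sasl_client_init(NULL) != SASL_OK) {
        fprintf(stderr, "sasl_client_init failed\n");
        return -1;
    }
    if (sasl_server_init(NULL, "qdrouterd") != SASL_OK) {
        fprintf(stderr, "sasl_server_init failed\n");
        return -1;
    }
    return 0;
}

Proton would then skip its own sasl_client_init()/sasl_server_init() and 
sasl_{client,server}_done() calls, per items 1 and 2 above.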



> Proton's use of Cyrus SASL is not thread-safe.
> --
>
> Key: PROTON-992
> URL: https://issues.apache.org/jira/browse/PROTON-992
> Project: Qpid Proton
>  Issue Type: Bug
>  Components: proton-c
>Affects Versions: 0.10
>Reporter: michael goulish
>Assignee: Andrew Stitcher
>Priority: Critical
>
> Documentation for the Cyrus SASL library says that the library is believed to 
> be thread-safe only if the code that uses it meets several requirements.
> The requirements are:
> * you supply mutex functions (see sasl_set_mutex())
> * you make no libsasl calls until sasl_client/server_init() completes
> * no libsasl calls are made after sasl_done() is begun
> * when using GSSAPI, you use a thread-safe GSS / Kerberos 5 library.
> It says explicitly that that sasl_set* calls are not thread safe, since they 
> set global state.
> The proton library makes calls to sasl_set* functions in :
>   pni_init_client()
>   pni_init_server(), and
>   pni_process_init()
> Since those are internal functions, there is no way for code that uses Proton 
> to lock around those calls.
> I think proton needs a new API call to let applications call 
> sasl_set_mutex().  Or something.
> We probably also need other protections to meet the other requirements 
> specified in the Cyrus documentation (and quoted above).
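
(Not Proton or Dispatch code -- just an illustration of the first requirement 
quoted above, supplying mutex callbacks via sasl_set_mutex().  The callback 
shapes come from <sasl/sasl.h>; the pthread wrappers are an assumption.)

#include <sasl/sasl.h>
#include <pthread.h>
#include <stdlib.h>

/* pthread-backed mutex callbacks for Cyrus SASL (illustrative only). */
static void *mutex_alloc(void)
{
    pthread_mutex_t *m = malloc(sizeof(*m));
    if (m) pthread_mutex_init(m, NULL);
    return m;
}

static int mutex_lock(void *m)   { return pthread_mutex_lock((pthread_mutex_t *)m)   ? SASL_FAIL : SASL_OK; }
static int mutex_unlock(void *m) { return pthread_mutex_unlock((pthread_mutex_t *)m) ? SASL_FAIL : SASL_OK; }

static void mutex_free(void *m)
{
    pthread_mutex_destroy((pthread_mutex_t *)m);
    free(m);
}

void install_sasl_mutexes(void)
{
    /* Must run before sasl_client_init() / sasl_server_init(). */
    sasl_set_mutex(mutex_alloc, mutex_lock, mutex_unlock, mutex_free);
}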



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (DISPATCH-296) segfault on router startup

2016-04-27 Thread michael goulish (JIRA)

[ 
https://issues.apache.org/jira/browse/DISPATCH-296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15260774#comment-15260774
 ] 

michael goulish commented on DISPATCH-296:
--

I have also seen this crash, with same frequency Gordon is describing.
In my case, I have a network of 6 routers.  I repeatedly kill one and replace 
it.
After a few such kills and restarts, I see this crash.

After instrumenting the Cyrus SASL code, I see a bad situation just before the 
crash:
two threads from same process both inside the Cyrus fn sasl_client_init()  
within a
few microseconds of each other.  

The Cyrus SASL code for the fn sasl_client_init() has a little logic to try and 
protect 
against multiple calls to the function -- but it will not work in a 
multi-threaded 
environment except by luck.

MDEBUG proton called sasl_client_init.  PID 28668 TID 7f1ac85a01c0  TIME 
1461781160.774368  <- different threads in same fn 7 usec apart  
MDEBUG proton called sasl_client_init.  PID 28668 TID 7f1abaca1700  TIME 
1461781160.774375  <- just before crash in sasl_dispose
MDEBUG proton calling sasl_dispose.   PID 28668 TID  7f1ac85a01c0  TIME 
1461781160.77
MDEBUG proton calling sasl_dispose.   PID 28668  TID 7f1abaca1700  TIME 
1461781160.774532
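
(Rough illustration, not the actual Cyrus source.)  The guard in question 
amounts to an unlocked "already initialized?" flag, which two threads can 
both see as clear; serializing the call, for example with pthread_once(), 
removes the race:

#include <pthread.h>
#include <sasl/sasl.h>

static pthread_once_t sasl_once = PTHREAD_ONCE_INIT;

static void do_sasl_init(void)
{
    /* Runs exactly once, no matter how many threads race to get here. */
    sasl_client_init(NULL);
}

void safe_sasl_client_init(void)
{
    pthread_once(&sasl_once, do_sasl_init);
}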




> segfault on router startup
> --
>
> Key: DISPATCH-296
> URL: https://issues.apache.org/jira/browse/DISPATCH-296
> Project: Qpid Dispatch
>  Issue Type: Bug
>  Components: Container
>Affects Versions: 0.6
>Reporter: Gordon Sim
> Attachments: multiconnect.conf
>
>
> Starting up a router with a couple of connectors (connecting to qpidd 
> instances in my case), the router occasionally (maybe one in five) crashes 
> with a segfault.
> {noformat}
> (gdb) bt
> #0  0x7629c76e in sasl_client_add_plugin () from /lib64/libsasl2.so.3
> #1  0x7629cf58 in sasl_client_init () from /lib64/libsasl2.so.3
> #2  0x7796ecff in pni_init_client 
> (transport=transport@entry=0x7fffdc008fc0) at 
> /home/gordon/projects/proton/proton-c/src/sasl/cyrus_sasl.c:115
> #3  0x7796e87e in pn_do_mechanisms (transport=0x7fffdc008fc0, 
> frame_type=, channel=, args=, 
> payload=)
> at /home/gordon/projects/proton/proton-c/src/sasl/sasl.c:703
> #4  0x77959b26 in pni_dispatch_action (payload=0x7fffe96f2360, 
> args=0x7fffdc0091c0, channel=0, frame_type=1 '\001', lcode=, 
> transport=0x7fffdc008fc0)
> at /home/gordon/projects/proton/proton-c/src/dispatcher/dispatcher.c:74
> #5  pni_dispatch_frame (args=0x7fffdc0091c0, transport=0x7fffdc008fc0, 
> frame=...) at 
> /home/gordon/projects/proton/proton-c/src/dispatcher/dispatcher.c:116
> #6  pn_dispatcher_input (transport=0x7fffdc008fc0, bytes=0x7fffdc00f358 "", 
> available=0, batch=false, halt=0x7fffdc009144) at 
> /home/gordon/projects/proton/proton-c/src/dispatcher/dispatcher.c:135
> #7  0x7795fbba in transport_consume 
> (transport=transport@entry=0x7fffdc008fc0) at 
> /home/gordon/projects/proton/proton-c/src/transport/transport.c:1751
> #8  0x779630d2 in pn_transport_process 
> (transport=transport@entry=0x7fffdc008fc0, size=) at 
> /home/gordon/projects/proton/proton-c/src/transport/transport.c:2860
> #9  0x77bb08e3 in qdpn_connector_process (c=0x7fffdc0068c0) at 
> /home/gordon/projects/dispatch/src/posix/driver.c:761
> #10 0x77bc3a91 in process_connector (cxtr=0x7fffdc0068c0, 
> qd_server=0x702b50) at /home/gordon/projects/dispatch/src/server.c:683
> #11 thread_run (arg=0x87b9b0) at 
> /home/gordon/projects/dispatch/src/server.c:958
> #12 0x7772660a in start_thread () from /lib64/libpthread.so.0
> #13 0x76c8ba4d in clone () from /lib64/libc.so.6
> {noformat}
> other threads:
> {noformat}
> (gdb) thread 1
> [Switching to thread 1 (Thread 0x77fd1180 (LWP 19319))]
> #0  0x7772e89d in __lll_lock_wait () from /lib64/libpthread.so.0
> (gdb) bt
> #0  0x7772e89d in __lll_lock_wait () from /lib64/libpthread.so.0
> #1  0x777289cd in pthread_mutex_lock () from /lib64/libpthread.so.0
> #2  0x77bb1239 in sys_mutex_lock (mutex=0x702da0) at 
> /home/gordon/projects/dispatch/src/posix/threading.c:70
> #3  0x77bc4723 in qd_timer (qd=qd@entry=0x604240, 
> cb=cb@entry=0x77bc11b0 , context=context@entry=0x702b50) at 
> /home/gordon/projects/dispatch/src/timer.c:89
> #4  0x77bc3f33 in qd_server_run (qd=0x604240) at 
> /home/gordon/projects/dispatch/src/server.c:1349
> #5  0x00401ac7 in main_process 
> (config_path=config_path@entry=0x7fffe090 
> "./etc/qpid-dispatch/multiconnect.conf", 
> python_pkgdir=python_pkgdir@entry=0x402468 
> "/home/gordon/projects/dispatch/installs/master/lib/qpid-dispatch/python", 
> fd=fd@entry=2) at /home/gordon/projects/dispatch/router/src/main.c:135
> #6  0x004017b7 in main (argc=3, 

[jira] [Created] (DISPATCH-210) try an epoll-based driver ...

2016-01-29 Thread michael goulish (JIRA)
michael goulish created DISPATCH-210:


 Summary: try an epoll-based driver ...
 Key: DISPATCH-210
 URL: https://issues.apache.org/jira/browse/DISPATCH-210
 Project: Qpid Dispatch
  Issue Type: Improvement
Reporter: michael goulish



...to improve scalability to large numbers of attached messaging apps.
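
(Sketch only, not Dispatch code.)  The core of an epoll-based driver loop, 
for contrast with poll(): readiness is reported per event, so the cost of 
each wakeup is proportional to the number of ready connections rather than 
the number of attached clients.  The listen_fd parameter and the printf 
stand-in for event handling are assumptions:

#include <sys/epoll.h>
#include <stdio.h>

#define MAX_EVENTS 64

void driver_loop(int listen_fd)
{
    int efd = epoll_create1(0);

    struct epoll_event ev;
    ev.events  = EPOLLIN;
    ev.data.fd = listen_fd;
    epoll_ctl(efd, EPOLL_CTL_ADD, listen_fd, &ev);

    struct epoll_event ready[MAX_EVENTS];
    for (;;) {
        /* Only ready fds come back; idle connections cost nothing here. */
        int n = epoll_wait(efd, ready, MAX_EVENTS, -1);
        for (int i = 0; i < n; i++)
            printf("fd %d is ready\n", ready[i].data.fd);  /* dispatch the event */
    }
}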





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Created] (DISPATCH-157) add sasl tests to dispatch unit tests

2015-08-26 Thread michael goulish (JIRA)
michael goulish created DISPATCH-157:


 Summary: add sasl tests to dispatch unit tests
 Key: DISPATCH-157
 URL: https://issues.apache.org/jira/browse/DISPATCH-157
 Project: Qpid Dispatch
  Issue Type: Improvement
  Components: Tests
Affects Versions: 0.5
Reporter: michael goulish
Assignee: michael goulish
 Fix For: 0.5


Add a complete set of sasl tests to the Dispatch unit test framework.
ensure correct behavior for cross-product of 

   authenticatePeer  := { no, yes, insecureOk }
  x
   saslMechanisms:= { NONE, PLAIN, DIGEST-MD5, CRAM-MD5, GSSAPI, SRP }



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Created] (DISPATCH-139) adapt to proton changes to avoid crashes under high session count

2015-05-19 Thread michael goulish (JIRA)
michael goulish created DISPATCH-139:


 Summary: adapt to proton changes to avoid crashes under high 
session count
 Key: DISPATCH-139
 URL: https://issues.apache.org/jira/browse/DISPATCH-139
 Project: Qpid Dispatch
  Issue Type: Improvement
Reporter: michael goulish


With a high session count ( e.g. 2^15 ) I implemented some changes in 
proton library code to avoid crashing in the library.  ( that was PROTON-864 )
But that means that the library code will sometimes return 0 rather than 
crashing.

Alter dispatch code to Do the Right Thing in case of these new null return 
values.
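
(Hypothetical sketch -- the real Dispatch call sites are not shown in this 
issue.)  "The Right Thing" amounts to a null check on the proton calls that 
can now return 0, for example pn_session():

#include <proton/connection.h>
#include <proton/session.h>
#include <stdio.h>

/* Returns 0 on success, -1 if proton refused to create another session. */
int open_session_checked(pn_connection_t *conn, pn_session_t **out)
{
    pn_session_t *ssn = pn_session(conn);
    if (!ssn) {
        fprintf(stderr, "session limit reached; cannot open another session\n");
        *out = NULL;
        return -1;
    }
    pn_session_open(ssn);
    *out = ssn;
    return 0;
}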



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Assigned] (DISPATCH-139) adapt to proton changes to avoid crashes under high session count

2015-05-19 Thread michael goulish (JIRA)

 [ 
https://issues.apache.org/jira/browse/DISPATCH-139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

michael goulish reassigned DISPATCH-139:


Assignee: michael goulish

 adapt to proton changes to avoid crashes under high session count
 -

 Key: DISPATCH-139
 URL: https://issues.apache.org/jira/browse/DISPATCH-139
 Project: Qpid Dispatch
  Issue Type: Improvement
Reporter: michael goulish
Assignee: michael goulish

 With a high session count ( e.g. 2^15 ) I implemented some changes in 
 proton library code to avoid crashing in the library.  ( that was PROTON-864 )
 But that means that the library code will sometimes return 0 rather than 
 crashing.
 Alter dispatch code to Do the Right Thing in case of these new null return 
 values.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Created] (DISPATCH-140) adapt to proton changes for large number of link-handles

2015-05-19 Thread michael goulish (JIRA)
michael goulish created DISPATCH-140:


 Summary: adapt to proton changes for large number of link-handles
 Key: DISPATCH-140
 URL: https://issues.apache.org/jira/browse/DISPATCH-140
 Project: Qpid Dispatch
  Issue Type: Improvement
Reporter: michael goulish
Assignee: michael goulish


For PROTON-886 I will be changing proton library code to honor handle-max when 
large numbers of links are created.   There will probably be instances of 
proton library fns returning null.

Make Dispatch changes to account for the proton changes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (DISPATCH-117) SEG Fault when outgoing SSL connections fail

2015-02-24 Thread michael goulish (JIRA)

[ 
https://issues.apache.org/jira/browse/DISPATCH-117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14334815#comment-14334815
 ] 

michael goulish commented on DISPATCH-117:
--

This checkin also fixed a rare crash I was seeing in my 'topologist' testing 
(killing and restarting routers) -- which happened even when SSL was not 
involved.

 SEG Fault when outgoing SSL connections fail
 

 Key: DISPATCH-117
 URL: https://issues.apache.org/jira/browse/DISPATCH-117
 Project: Qpid Dispatch
  Issue Type: Bug
  Components: Container
Affects Versions: 0.3
Reporter: Ted Ross
Assignee: Ted Ross
Priority: Critical
 Fix For: 0.4


 Hat tip: Ken Giusti for isolating this bug
 When using SSL for outgoing connectors, a crash may occur when the connection 
 fails.
 There is a race condition whereby a second thread can interfere with an 
 outgoing connector before the cxtr_try_open function has completed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Created] (DISPATCH-113) expose NodeTracker::last_topology_change in management

2015-02-17 Thread michael goulish (JIRA)
michael goulish created DISPATCH-113:


 Summary: expose NodeTracker::last_topology_change in management 
 Key: DISPATCH-113
 URL: https://issues.apache.org/jira/browse/DISPATCH-113
 Project: Qpid Dispatch
  Issue Type: Improvement
  Components: Router Node
Affects Versions: 0.4
Reporter: michael goulish
Priority: Minor


NodeTracker is already keeping track of the last time it saw a topology change.
I would like to expose that number to management so I can read it from my 
testing program and directly measure how long it takes the network to settle 
down after a topological change.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Resolved] (DISPATCH-106) pn link corruption after router restart

2015-02-05 Thread michael goulish (JIRA)

 [ 
https://issues.apache.org/jira/browse/DISPATCH-106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

michael goulish resolved DISPATCH-106.
--
Resolution: Fixed

Committed revision 1657604.


 pn link corruption after router restart
 ---

 Key: DISPATCH-106
 URL: https://issues.apache.org/jira/browse/DISPATCH-106
 Project: Qpid Dispatch
  Issue Type: Bug
  Components: Router Node
Affects Versions: 0.3
Reporter: michael goulish
 Fix For: 0.4


 With the standard 6-node demo network,  (A-D, X, Y)  after killing and 
 restarting node Y, I see a bad link on router D -- which causes D to crash.
 Here is sequence of events from logs of routers and the topologist testing 
 program:
   01:05:05.367 Killing router Y, pid 20074
   01:05:05.367 Sleeping 30 seconds
   01:05:35.367 Restarting router Y, pid 20120
   01:05:38 Router D : last valid origins post to its log file :
Node QDR.C valid origins: []
   01:05:46 Router D posts to its log file:
Exited Router Flux Mode
   01:06:05.368 checking for crash after node bounce
( no crash detected )
   01:06:17 last post to router D log file
ROUTER_LS (trace) RCVD: RA(id=QDR.X area=0 inst=1422165872 
 ls_seq=2 mobile_seq=0)
   01:06:35.369 second check for crash. (none detected)
   01:06:35.370 getting topology
( Node D fails to respond.  PID 20072 )
( core file, timestamped 01:06 )
   here is backtrace from router D's core file
   {
 #0  pn_string_get (string=0xfdfdfdfdbabecafe) at 
 /home/mick/rh-qpid-proton/proton-c/src/object/string.c:120
 #1  0x7ff73fa8e752 in qd_router_link_name (link=0x7ff72800b2d0) at 
 /home/mick/dispatch/src/router_agent.c:112
 #2  0x7ff73fa8e7dd in qd_entity_refresh_router_link 
 (entity=0x7ff7300c9b50, impl=0x7ff72800b2d0)
 at /home/mick/dispatch/src/router_agent.c:120
 #3  0x003e40805d8c in ffi_call_unix64 () from /lib64/libffi.so.6
 #4  0x003e408056bc in ffi_call () from /lib64/libffi.so.6
 #5  0x7ff737d2dc8b in _ctypes_callproc () from 
 /usr/lib64/python2.7/lib-dynload/_ctypes.so
 #6  0x7ff737d27a85 in PyCFuncPtr_call () from 
 /usr/lib64/python2.7/lib-dynload/_ctypes.so
 #7  0x0036df44a0d3 in PyObject_Call () from /lib64/libpython2.7.so.1.0
 #8  0x0036df4de37c in PyEval_EvalFrameEx () from 
 /lib64/libpython2.7.so.1.0
 #9  0x0036df4e21dd in PyEval_EvalCodeEx () from 
 /lib64/libpython2.7.so.1.0
 #10 0x0036df4e088f in PyEval_EvalFrameEx () from 
 /lib64/libpython2.7.so.1.0
 #11 0x0036df4e21dd in PyEval_EvalCodeEx () from 
 /lib64/libpython2.7.so.1.0
 #12 0x0036df4e088f in PyEval_EvalFrameEx () from 
 /lib64/libpython2.7.so.1.0
 #13 0x0036df4e21dd in PyEval_EvalCodeEx () from 
 /lib64/libpython2.7.so.1.0
 #14 0x0036df46f0d8 in ?? () from /lib64/libpython2.7.so.1.0
 #15 0x0036df44a0d3 in PyObject_Call () from /lib64/libpython2.7.so.1.0
 #16 0x0036df4590c5 in ?? () from /lib64/libpython2.7.so.1.0
 #17 0x0036df44a0d3 in PyObject_Call () from /lib64/libpython2.7.so.1.0
 #18 0x0036df44a1b5 in ?? () from /lib64/libpython2.7.so.1.0
 #19 0x0036df44a29e in PyObject_CallFunction () from 
 /lib64/libpython2.7.so.1.0
 #20 0x7ff73fa8d77f in qd_io_rx_handler (context=0x7ff736321e68, 
 msg=0x7ff728019bd0, link_id=0
 at /home/mick/dispatch/src/python_embedded.c:519
 #21 0x7ff73fa92533 in router_rx_handler (context=0x1db5fd0, 
 link=0x7ff730008710, delivery=0x7ff73004cc50)
 at /home/mick/dispatch/src/router_node.c:922
 #22 0x7ff73fa7fa16 in do_receive (pnd=0x1e359a0) at 
 /home/mick/dispatch/src/container.c:221
 #23 0x7ff73fa7fea3 in process_handler (container=0x1dbd6f0, 
 unused=0x1e0a050, qd_conn=0x1e2c6a0)
 at /home/mick/dispatch/src/container.c:362
 #24 0x7ff73fa80135 in handler (handler_context=0x1dbd6f0, 
 conn_context=0x1e0a050, event=QD_CONN_EVENT_PROCESS,
 qd_conn=0x1e2c6a0) at /home/mick/dispatch/src/container.c:438
 #25 0x7ff73fa98346 in process_connector (qd_server=0x1d78460, 
 cxtr=0x1e1b9b0)
 at /home/mick/dispatch/src/server.c:322
 #26 0x7ff73fa98c1f in thread_run (arg=0x1d70d30) at 
 /home/mick/dispatch/src/server.c:546
 #27 0x003e3dc07ee5 in start_thread () from /lib64/libpthread.so.0
 ...
 }
   Let's go up to qd_router_link_name
   at /home/mick/dispatch/src/router_agent.c:112
   (gdb) print * link
 $1 =
 {
   prev = 0x7ff72800b210,
   next = 0x7ff72800b390,
   mask_bit = 3,
   link_type = QD_LINK_ROUTER,
   link_direction = QD_OUTGOING,
   owning_addr = 0x1d7d6c0,
   waypoint = 0x0,
   link = 

[jira] [Commented] (DISPATCH-106) pn link corruption after router restart

2015-02-03 Thread michael goulish (JIRA)

[ 
https://issues.apache.org/jira/browse/DISPATCH-106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14303225#comment-14303225
 ] 

michael goulish commented on DISPATCH-106:
--

In server.c, the function thread_run() has this code:

if (qdpn_connector_failed(cxtr))
qdpn_connector_close(cxtr);
else
work_done = process_connector(qd_server, cxtr);

By removing the else we got my test to go to 148 iterations before failing.  
And the crash is much different from what I have been seeing.
Before this change, the test almost always failed no later than iteration 3.  
So -- bug fixed.

why:

Because when the connector has failed, there are still some events on it that 
need to be processed.  When they get processed, the links associated with this 
connection get cleaned up properly.  If you don't do this final processing of 
events on the dead connector, the dispatch code will still have dead links 
sitting around pointing to some memory that will (usually) get freed by proton. 
 Boom.
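
A minimal sketch of the change described above, based on the thread_run() 
snippet quoted earlier (surrounding loop and declarations omitted):

if (qdpn_connector_failed(cxtr))
    qdpn_connector_close(cxtr);

/* No 'else': always process the connector, even a failed one, so its
 * remaining events are drained and the associated links are torn down
 * before proton frees their memory. */
work_done = process_connector(qd_server, cxtr);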



 pn link corruption after router restart
 ---

 Key: DISPATCH-106
 URL: https://issues.apache.org/jira/browse/DISPATCH-106
 Project: Qpid Dispatch
  Issue Type: Bug
  Components: Router Node
Affects Versions: 0.3
Reporter: michael goulish
 Fix For: 0.4


 With the standard 6-node demo network,  (A-D, X, Y)  after killing and 
 restarting node Y, I see a bad link on router D -- which causes D to crash.
 Here is sequence of events from logs of routers and the topologist testing 
 program:
   01:05:05.367 Killing router Y, pid 20074
   01:05:05.367 Sleeping 30 seconds
   01:05:35.367 Restarting router Y, pid 20120
   01:05:38 Router D : last valid origins post to its log file :
Node QDR.C valid origins: []
   01:05:46 Router D posts to its log file:
Exited Router Flux Mode
   01:06:05.368 checking for crash after node bounce
( no crash detected )
   01:06:17 last post to router D log file
ROUTER_LS (trace) RCVD: RA(id=QDR.X area=0 inst=1422165872 
 ls_seq=2 mobile_seq=0)
   01:06:35.369 second check for crash. (none detected)
   01:06:35.370 getting topology
( Node D fails to respond.  PID 20072 )
( core file, timestamped 01:06 )
   here is backtrace from router D's core file
   {
 #0  pn_string_get (string=0xfdfdfdfdbabecafe) at 
 /home/mick/rh-qpid-proton/proton-c/src/object/string.c:120
 #1  0x7ff73fa8e752 in qd_router_link_name (link=0x7ff72800b2d0) at 
 /home/mick/dispatch/src/router_agent.c:112
 #2  0x7ff73fa8e7dd in qd_entity_refresh_router_link 
 (entity=0x7ff7300c9b50, impl=0x7ff72800b2d0)
 at /home/mick/dispatch/src/router_agent.c:120
 #3  0x003e40805d8c in ffi_call_unix64 () from /lib64/libffi.so.6
 #4  0x003e408056bc in ffi_call () from /lib64/libffi.so.6
 #5  0x7ff737d2dc8b in _ctypes_callproc () from 
 /usr/lib64/python2.7/lib-dynload/_ctypes.so
 #6  0x7ff737d27a85 in PyCFuncPtr_call () from 
 /usr/lib64/python2.7/lib-dynload/_ctypes.so
 #7  0x0036df44a0d3 in PyObject_Call () from /lib64/libpython2.7.so.1.0
 #8  0x0036df4de37c in PyEval_EvalFrameEx () from 
 /lib64/libpython2.7.so.1.0
 #9  0x0036df4e21dd in PyEval_EvalCodeEx () from 
 /lib64/libpython2.7.so.1.0
 #10 0x0036df4e088f in PyEval_EvalFrameEx () from 
 /lib64/libpython2.7.so.1.0
 #11 0x0036df4e21dd in PyEval_EvalCodeEx () from 
 /lib64/libpython2.7.so.1.0
 #12 0x0036df4e088f in PyEval_EvalFrameEx () from 
 /lib64/libpython2.7.so.1.0
 #13 0x0036df4e21dd in PyEval_EvalCodeEx () from 
 /lib64/libpython2.7.so.1.0
 #14 0x0036df46f0d8 in ?? () from /lib64/libpython2.7.so.1.0
 #15 0x0036df44a0d3 in PyObject_Call () from /lib64/libpython2.7.so.1.0
 #16 0x0036df4590c5 in ?? () from /lib64/libpython2.7.so.1.0
 #17 0x0036df44a0d3 in PyObject_Call () from /lib64/libpython2.7.so.1.0
 #18 0x0036df44a1b5 in ?? () from /lib64/libpython2.7.so.1.0
 #19 0x0036df44a29e in PyObject_CallFunction () from 
 /lib64/libpython2.7.so.1.0
 #20 0x7ff73fa8d77f in qd_io_rx_handler (context=0x7ff736321e68, 
 msg=0x7ff728019bd0, link_id=0
 at /home/mick/dispatch/src/python_embedded.c:519
 #21 0x7ff73fa92533 in router_rx_handler (context=0x1db5fd0, 
 link=0x7ff730008710, delivery=0x7ff73004cc50)
 at /home/mick/dispatch/src/router_node.c:922
 #22 0x7ff73fa7fa16 in do_receive (pnd=0x1e359a0) at 
 /home/mick/dispatch/src/container.c:221
 #23 0x7ff73fa7fea3 in process_handler (container=0x1dbd6f0, 
 unused=0x1e0a050, qd_conn=0x1e2c6a0)
 at /home/mick/dispatch/src/container.c:362
 #24 0x7ff73fa80135 in 

[jira] [Created] (DISPATCH-106) pn link corruption after router restart

2015-01-26 Thread michael goulish (JIRA)
michael goulish created DISPATCH-106:


 Summary: pn link corruption after router restart
 Key: DISPATCH-106
 URL: https://issues.apache.org/jira/browse/DISPATCH-106
 Project: Qpid Dispatch
  Issue Type: Bug
  Components: Router Node
Affects Versions: 0.4
Reporter: michael goulish


With the standard 6-node demo network,  (A-D, X, Y)  after killing and 
restarting node Y, I see a bad link on router D -- which causes D to crash.


Here is sequence of events from logs of routers and the topologist testing 
program:

  01:05:05.367 Killing router Y, pid 20074


  01:05:05.367 Sleeping 30 seconds


  01:05:35.367 Restarting router Y, pid 20120


  01:05:38 Router D : last valid origins post to its log file :
   Node QDR.C valid origins: []


  01:05:46 Router D posts to its log file:
   Exited Router Flux Mode


  01:06:05.368 checking for crash after node bounce
   ( no crash detected )


  01:06:17 last post to router D log file
   ROUTER_LS (trace) RCVD: RA(id=QDR.X area=0 inst=1422165872 
ls_seq=2 mobile_seq=0)


  01:06:35.369 second check for crash. (none detected)


  01:06:35.370 getting topology
   ( Node D fails to respond.  PID 20072 )
   ( core file, timestamped 01:06 )




  here is backtrace from router D's core file
  {
#0  pn_string_get (string=0xfdfdfdfdbabecafe) at 
/home/mick/rh-qpid-proton/proton-c/src/object/string.c:120
#1  0x7ff73fa8e752 in qd_router_link_name (link=0x7ff72800b2d0) at 
/home/mick/dispatch/src/router_agent.c:112
#2  0x7ff73fa8e7dd in qd_entity_refresh_router_link 
(entity=0x7ff7300c9b50, impl=0x7ff72800b2d0)
at /home/mick/dispatch/src/router_agent.c:120
#3  0x003e40805d8c in ffi_call_unix64 () from /lib64/libffi.so.6
#4  0x003e408056bc in ffi_call () from /lib64/libffi.so.6
#5  0x7ff737d2dc8b in _ctypes_callproc () from 
/usr/lib64/python2.7/lib-dynload/_ctypes.so
#6  0x7ff737d27a85 in PyCFuncPtr_call () from 
/usr/lib64/python2.7/lib-dynload/_ctypes.so
#7  0x0036df44a0d3 in PyObject_Call () from /lib64/libpython2.7.so.1.0
#8  0x0036df4de37c in PyEval_EvalFrameEx () from 
/lib64/libpython2.7.so.1.0
#9  0x0036df4e21dd in PyEval_EvalCodeEx () from 
/lib64/libpython2.7.so.1.0
#10 0x0036df4e088f in PyEval_EvalFrameEx () from 
/lib64/libpython2.7.so.1.0
#11 0x0036df4e21dd in PyEval_EvalCodeEx () from 
/lib64/libpython2.7.so.1.0
#12 0x0036df4e088f in PyEval_EvalFrameEx () from 
/lib64/libpython2.7.so.1.0
#13 0x0036df4e21dd in PyEval_EvalCodeEx () from 
/lib64/libpython2.7.so.1.0
#14 0x0036df46f0d8 in ?? () from /lib64/libpython2.7.so.1.0
#15 0x0036df44a0d3 in PyObject_Call () from /lib64/libpython2.7.so.1.0
#16 0x0036df4590c5 in ?? () from /lib64/libpython2.7.so.1.0
#17 0x0036df44a0d3 in PyObject_Call () from /lib64/libpython2.7.so.1.0
#18 0x0036df44a1b5 in ?? () from /lib64/libpython2.7.so.1.0
#19 0x0036df44a29e in PyObject_CallFunction () from 
/lib64/libpython2.7.so.1.0
#20 0x7ff73fa8d77f in qd_io_rx_handler (context=0x7ff736321e68, 
msg=0x7ff728019bd0, link_id=0)
at /home/mick/dispatch/src/python_embedded.c:519
#21 0x7ff73fa92533 in router_rx_handler (context=0x1db5fd0, 
link=0x7ff730008710, delivery=0x7ff73004cc50)
at /home/mick/dispatch/src/router_node.c:922
#22 0x7ff73fa7fa16 in do_receive (pnd=0x1e359a0) at 
/home/mick/dispatch/src/container.c:221
#23 0x7ff73fa7fea3 in process_handler (container=0x1dbd6f0, 
unused=0x1e0a050, qd_conn=0x1e2c6a0)
at /home/mick/dispatch/src/container.c:362
#24 0x7ff73fa80135 in handler (handler_context=0x1dbd6f0, 
conn_context=0x1e0a050, event=QD_CONN_EVENT_PROCESS,
qd_conn=0x1e2c6a0) at /home/mick/dispatch/src/container.c:438
#25 0x7ff73fa98346 in process_connector (qd_server=0x1d78460, 
cxtr=0x1e1b9b0)
at /home/mick/dispatch/src/server.c:322
#26 0x7ff73fa98c1f in thread_run (arg=0x1d70d30) at 
/home/mick/dispatch/src/server.c:546
#27 0x003e3dc07ee5 in start_thread () from /lib64/libpthread.so.0
...
}



  Let's go up to qd_router_link_name
  at /home/mick/dispatch/src/router_agent.c:112

  (gdb) print * link
$1 =
{
  prev = 0x7ff72800b210,
  next = 0x7ff72800b390,
  mask_bit = 3,
  link_type = QD_LINK_ROUTER,
  link_direction = QD_OUTGOING,
  owning_addr = 0x1d7d6c0,
  waypoint = 0x0,
  link = 0x7ff7280099d0,
  connected_link = 0x0,
  ref = 0x7ff72800f350,
  target = 0x0,
  event_fifo =
  {
head = 0x0,
tail = 0x0,
scratch = 0x0,
size = 0
  },
  msg_fifo =
  {
head = 

[jira] [Updated] (DISPATCH-106) pn link corruption after router restart

2015-01-26 Thread michael goulish (JIRA)

 [ 
https://issues.apache.org/jira/browse/DISPATCH-106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

michael goulish updated DISPATCH-106:
-
Description: 
With the standard 6-node demo network,  (A-D, X, Y)  after killing and 
restarting node Y, I see a bad link on router D -- which causes D to crash.


Here is sequence of events from logs of routers and the topologist testing 
program:

  01:05:05.367 Killing router Y, pid 20074


  01:05:05.367 Sleeping 30 seconds


  01:05:35.367 Restarting router Y, pid 20120


  01:05:38 Router D : last valid origins post to its log file :
   Node QDR.C valid origins: []


  01:05:46 Router D posts to its log file:
   Exited Router Flux Mode


  01:06:05.368 checking for crash after node bounce
   ( no crash detected )


  01:06:17 last post to router D log file
   ROUTER_LS (trace) RCVD: RA(id=QDR.X area=0 inst=1422165872 
ls_seq=2 mobile_seq=0)


  01:06:35.369 second check for crash. (none detected)


  01:06:35.370 getting topology
   ( Node D fails to respond.  PID 20072 )
   ( core file, timestamped 01:06 )




  here is backtrace from router D's core file
  {
#0  pn_string_get (string=0xfdfdfdfdbabecafe) at 
/home/mick/rh-qpid-proton/proton-c/src/object/string.c:120

#1  0x7ff73fa8e752 in qd_router_link_name (link=0x7ff72800b2d0) at 
/home/mick/dispatch/src/router_agent.c:112

#2  0x7ff73fa8e7dd in qd_entity_refresh_router_link 
(entity=0x7ff7300c9b50, impl=0x7ff72800b2d0)
at /home/mick/dispatch/src/router_agent.c:120

#3  0x003e40805d8c in ffi_call_unix64 () from /lib64/libffi.so.6

#4  0x003e408056bc in ffi_call () from /lib64/libffi.so.6

#5  0x7ff737d2dc8b in _ctypes_callproc () from 
/usr/lib64/python2.7/lib-dynload/_ctypes.so

#6  0x7ff737d27a85 in PyCFuncPtr_call () from 
/usr/lib64/python2.7/lib-dynload/_ctypes.so

#7  0x0036df44a0d3 in PyObject_Call () from /lib64/libpython2.7.so.1.0

#8  0x0036df4de37c in PyEval_EvalFrameEx () from 
/lib64/libpython2.7.so.1.0

#9  0x0036df4e21dd in PyEval_EvalCodeEx () from 
/lib64/libpython2.7.so.1.0

#10 0x0036df4e088f in PyEval_EvalFrameEx () from 
/lib64/libpython2.7.so.1.0

#11 0x0036df4e21dd in PyEval_EvalCodeEx () from 
/lib64/libpython2.7.so.1.0

#12 0x0036df4e088f in PyEval_EvalFrameEx () from 
/lib64/libpython2.7.so.1.0

#13 0x0036df4e21dd in PyEval_EvalCodeEx () from 
/lib64/libpython2.7.so.1.0

#14 0x0036df46f0d8 in ?? () from /lib64/libpython2.7.so.1.0

#15 0x0036df44a0d3 in PyObject_Call () from /lib64/libpython2.7.so.1.0
#16 0x0036df4590c5 in ?? () from /lib64/libpython2.7.so.1.0

#17 0x0036df44a0d3 in PyObject_Call () from /lib64/libpython2.7.so.1.0

#18 0x0036df44a1b5 in ?? () from /lib64/libpython2.7.so.1.0

#19 0x0036df44a29e in PyObject_CallFunction () from 
/lib64/libpython2.7.so.1.0

#20 0x7ff73fa8d77f in qd_io_rx_handler (context=0x7ff736321e68, 
msg=0x7ff728019bd0, link_id=0
at /home/mick/dispatch/src/python_embedded.c:519

#21 0x7ff73fa92533 in router_rx_handler (context=0x1db5fd0, 
link=0x7ff730008710, delivery=0x7ff73004cc50)
at /home/mick/dispatch/src/router_node.c:922

#22 0x7ff73fa7fa16 in do_receive (pnd=0x1e359a0) at 
/home/mick/dispatch/src/container.c:221

#23 0x7ff73fa7fea3 in process_handler (container=0x1dbd6f0, 
unused=0x1e0a050, qd_conn=0x1e2c6a0)
at /home/mick/dispatch/src/container.c:362

#24 0x7ff73fa80135 in handler (handler_context=0x1dbd6f0, 
conn_context=0x1e0a050, event=QD_CONN_EVENT_PROCESS,
qd_conn=0x1e2c6a0) at /home/mick/dispatch/src/container.c:438

#25 0x7ff73fa98346 in process_connector (qd_server=0x1d78460, 
cxtr=0x1e1b9b0)
at /home/mick/dispatch/src/server.c:322

#26 0x7ff73fa98c1f in thread_run (arg=0x1d70d30) at 
/home/mick/dispatch/src/server.c:546

#27 0x003e3dc07ee5 in start_thread () from /lib64/libpthread.so.0
...
}



  Let's go up to qd_router_link_name
  at /home/mick/dispatch/src/router_agent.c:112

  (gdb) print * link
$1 =
{
  prev = 0x7ff72800b210,
  next = 0x7ff72800b390,
  mask_bit = 3,
  link_type = QD_LINK_ROUTER,
  link_direction = QD_OUTGOING,
  owning_addr = 0x1d7d6c0,
  waypoint = 0x0,
  link = 0x7ff7280099d0,
  connected_link = 0x0,
  ref = 0x7ff72800f350,
  target = 0x0,
  event_fifo =
  {
head = 0x0,
tail = 0x0,
scratch = 0x0,
size = 0
  },
  msg_fifo =
  {
head = 0x7ff73003c230,
tail = 0x7ff73003bb70,
scratch = 0x7ff73003b9f0,
size = 102
  }
}


  (gdb) print * (link->link)

[jira] [Resolved] (DISPATCH-64) slow or sporadic memory leak

2014-10-14 Thread michael goulish (JIRA)

 [ 
https://issues.apache.org/jira/browse/DISPATCH-64?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

michael goulish resolved DISPATCH-64.
-
Resolution: Fixed

This was traced to a problem in proton, which Rafi fixed.

 slow or sporadic memory leak
 

 Key: DISPATCH-64
 URL: https://issues.apache.org/jira/browse/DISPATCH-64
 Project: Qpid Dispatch
  Issue Type: Bug
Reporter: michael goulish

 In long-term soak tests, I am seeing router mem grow by 1 megabyte every 4 or 
 5 minutes.
 Test setup
 ===
   1. single router on one box
   2. 10 senders, 10 receivers on separate box.
   3. each client handles 100 unique addresses.
   4. while test is running, I run 'top' in a loop to see router memory usage 
 (resident set size).  I also run qdstat -m in a loop, to see router's 
 report on usage of various data structures.
   5. clients all have single connection for duration of test.
   6. clients start once at beginning of test and do not stop until end.  No 
 new clients are started after the beginning.
   7. no clients failed during the test.
   8. no new addresses were added after test startup.
 Observations
 =
 1. During a 64 minute period which started at least 15 minutes after the 
 beginning of the test,  memory usage (resident set size) as measured by 'top' 
 grew from 96 to 109 megabytes.
 2. Some of the data types reported by 'qdstat -m'  increased.  Here is the 
 list:  (using numbers from the 'total' column of qdstat report. )
 qd_connection_t        832 --   896
 qd_hash_handle_t      1408 --  1600
 qd_hash_item_t        1408 --  1600
 qd_link_t             1536 --  1664
 qd_log_entry_t        1152 --  1216
 qd_message_content_t 10256 -- 10272
 qd_parsed_field_t      448 --  1024
 qd_router_link_ref_t  1408 --  1600
 qd_router_link_t      1536 --  1664
 3. The data structures that increased did *not* increase smoothly.  For 
 example, qd_hash_handle_t and qd_hash_item_t remained constant for 6 minutes 
 before increasing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Created] (DISPATCH-64) slow or sporadic memory leak

2014-07-24 Thread michael goulish (JIRA)
michael goulish created DISPATCH-64:
---

 Summary: slow or sporadic memory leak
 Key: DISPATCH-64
 URL: https://issues.apache.org/jira/browse/DISPATCH-64
 Project: Qpid Dispatch
  Issue Type: Bug
Reporter: michael goulish


In long-term soak tests, I am seeing router mem grow by 1 megabyte every 4 or 5 
minutes.


Test setup
===
  1. single router on one box
  2. 10 senders, 10 receivers on separate box.
  3. each client handles 100 unique addresses.
  4. while test is running, I run 'top' in a loop to see router memory usage 
(resident set size).  I also run qdstat -m in a loop, to see router's report 
on usage of various data structures.
  5. clients all have single connection for duration of test.
  6. clients start once at beginning of test and do not stop until end.  No new 
clients are started after the beginning.
  7. no clients failed during the test.
  8. no new addresses were added after test startup.


Observations
=
1. During a 64 minute period which started at least 15 minutes after the 
beginning of the test,  memory usage (resident set size) as measured by 'top' 
grew from 96 to 109 megabytes.

2. Some of the data types reported by 'qdstat -m'  increased.  Here is the 
list:  (using numbers from the 'total' column of qdstat report. )

qd_connection_t        832 --   896
qd_hash_handle_t      1408 --  1600
qd_hash_item_t        1408 --  1600
qd_link_t             1536 --  1664
qd_log_entry_t        1152 --  1216
qd_message_content_t 10256 -- 10272
qd_parsed_field_t      448 --  1024
qd_router_link_ref_t  1408 --  1600
qd_router_link_t      1536 --  1664


3. The data structures that increased did *not* increase smoothly.  For 
example, qd_hash_handle_t and qd_hash_item_t remained constant for 6 minutes 
before increasing.





--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (DISPATCH-64) slow or sporadic memory leak

2014-07-24 Thread michael goulish (JIRA)

[ 
https://issues.apache.org/jira/browse/DISPATCH-64?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14073643#comment-14073643
 ] 

michael goulish commented on DISPATCH-64:
-

Yes, RSS is increasing much more smoothly -- 1 MB every 4 or 5 minutes.
I have 16 threads in the router.

 slow or sporadic memory leak
 

 Key: DISPATCH-64
 URL: https://issues.apache.org/jira/browse/DISPATCH-64
 Project: Qpid Dispatch
  Issue Type: Bug
Reporter: michael goulish

 In long-term soak tests, I am seeing router mem grow by 1 megabyte every 4 or 
 5 minutes.
 Test setup
 ===
   1. single router on one box
   2. 10 senders, 10 receivers on separate box.
   3. each client handles 100 unique addresses.
   4. while test is running, I run 'top' in a loop to see router memory usage 
 (resident set size).  I also run qdstat -m in a loop, to see router's 
 report on usage of various data structures.
   5. clients all have single connection for duration of test.
   6. clients start once at beginning of test and do not stop until end.  No 
 new clients are started after the beginning.
   7. no clients failed during the test.
   8. no new addresses were added after test startup.
 Observations
 =
 1. During a 64 minute period which started at least 15 minutes after the 
 beginning of the test,  memory usage (resident set size) as measured by 'top' 
 grew from 96 to 109 megabytes.
 2. Some of the data types reported by 'qdstat -m'  increased.  Here is the 
 list:  (using numbers from the 'total' column of qdstat report. )
 qd_connection_t        832 --   896
 qd_hash_handle_t      1408 --  1600
 qd_hash_item_t        1408 --  1600
 qd_link_t             1536 --  1664
 qd_log_entry_t        1152 --  1216
 qd_message_content_t 10256 -- 10272
 qd_parsed_field_t      448 --  1024
 qd_router_link_ref_t  1408 --  1600
 qd_router_link_t      1536 --  1664
 3. The data structures that increased did *not* increase smoothly.  For 
 example, qd_hash_handle_t and qd_hash_item_t remained constant for 6 minutes 
 before increasing.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Closed] (QPID-5910) Throughput regression relative to 0.14

2014-07-22 Thread michael goulish (JIRA)

 [ 
https://issues.apache.org/jira/browse/QPID-5910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

michael goulish closed QPID-5910.
-

Resolution: Fixed

fixed in rev 1612559

 Throughput regression relative to 0.14
 --

 Key: QPID-5910
 URL: https://issues.apache.org/jira/browse/QPID-5910
 Project: Qpid
  Issue Type: Bug
Affects Versions: 0.22
Reporter: michael goulish
Assignee: michael goulish
 Fix For: 0.29


 If you use qpid-latency-test, hold message size constant, and gradually 
 increase the sending rate (in several tests) you will sooner or later reach a 
 point at which the messaging system's ability to handle throughput saturates. 
  When that happens, latency will go sky-high.  (I have producer flow-control 
 turned off to be able to compare vs. older code.)
 The latest code reaches throughput saturation significantly earlier than 
 older code.  (i.e. at a lower sending rate.)
 Also, using 'perf' to help analyze the code, recent code is executing 
 significantly fewer instructions per second than older code.
 This probably indicates that some parts of the code are spending too much 
 time *while a lock is held* -- thus preventing other threads from fulfilling 
 their destiny, and having an effect on overall throughput.
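
(Illustration of the general pattern only -- not the actual fix committed in 
rev 1612559; the work-item type and handler below are made up.)  The usual 
remedy is to shrink the critical section: grab what you need under the lock, 
then do the expensive work after unlocking:

#include <pthread.h>
#include <stdio.h>

/* Hypothetical work item -- this is not code from the qpid broker. */
typedef struct item { struct item *next; int payload; } item_t;

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static item_t *shared_head = NULL;

/* Stand-in for the slow per-item work. */
static void process(item_t *i) { printf("%d\n", i->payload); }

void drain(void)
{
    /* Hold the lock only long enough to unlink the whole list... */
    pthread_mutex_lock(&lock);
    item_t *batch = shared_head;
    shared_head = NULL;
    pthread_mutex_unlock(&lock);

    /* ...then do the expensive work with the lock released, so other
     * threads are not stalled behind this one. */
    while (batch) {
        item_t *next = batch->next;
        process(batch);
        batch = next;
    }
}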



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Closed] (QPID-5734) message loss in qpid client

2014-04-30 Thread michael goulish (JIRA)

 [ 
https://issues.apache.org/jira/browse/QPID-5734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

michael goulish closed QPID-5734.
-

Resolution: Fixed

This is superseded by QPID-5737, which has been fixed.

 message loss in qpid client
 ---

 Key: QPID-5734
 URL: https://issues.apache.org/jira/browse/QPID-5734
 Project: Qpid
  Issue Type: Bug
Reporter: michael goulish

 using latest qpid code as of 25 Apr 2014.
 In my qpid-messaging client, I do not ask for unreliable link:
 std::string sender_address = x;
Sender sender = session.createSender ( sender_address );
 I call sender.send() 1000 times, each time to a different address.
 The call returns, apparently successful every time -- no throws or anything 
 --   but my receivers do not get all messages.
 The messages are going through a dispatch router -- but I have now 
 successfully traced the qpid-messaging sender, and I see that the missing 
 messages are simply never transferred out of the sender -- so they never get 
 to the router.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (QPID-5733) qpid-messaging client does not honor settle-without-accept

2014-04-29 Thread michael goulish (JIRA)

[ 
https://issues.apache.org/jira/browse/QPID-5733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13984443#comment-13984443
 ] 

michael goulish commented on QPID-5733:
---

forgot to add -- this is with latest qpid trunk as of morning (EDT) 25 Apr 2014

 qpid-messaging client does not honor settle-without-accept
 --

 Key: QPID-5733
 URL: https://issues.apache.org/jira/browse/QPID-5733
 Project: Qpid
  Issue Type: Bug
  Components: C++ Client
Reporter: michael goulish

 I have a qpid-messaging based sender, and a proton messenger based receiver, 
 with a dispatch router in the middle.
 In my sender, if I do this:   
   sender.send ( msg, 1 )
 the sender locks up immediately and hangs.
 With tracing, I see that it is getting back a disposition frame for this 
 message with settled=true -- but there is no explicit accept.  That's when it 
 locks up.
 If I alter my proton messenger based receiver to explicitly accept the 
 message, then the test runs to completion with no problem. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Created] (QPID-5733) qpid-messaging client does not honor settle-without-accept

2014-04-29 Thread michael goulish (JIRA)
michael goulish created QPID-5733:
-

 Summary: qpid-messaging client does not honor settle-without-accept
 Key: QPID-5733
 URL: https://issues.apache.org/jira/browse/QPID-5733
 Project: Qpid
  Issue Type: Bug
  Components: C++ Client
Reporter: michael goulish



I have a qpid-messaging based sender, and a proton messenger based receiver, 
with a dispatch router in the middle.

In my sender, if I do this:   
  sender.send ( msg, 1 )
the sender locks up immediately and hangs.

With tracing, I see that it is getting back a disposition frame for this 
message with settled=true -- but there is no explicit accept.  That's when it 
locks up.

If I alter my proton messenger based receiver to explicitly accept the message, 
then the test runs to completion with no problem. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Created] (QPID-5734) message loss in qpid client

2014-04-29 Thread michael goulish (JIRA)
michael goulish created QPID-5734:
-

 Summary: message loss in qpid client
 Key: QPID-5734
 URL: https://issues.apache.org/jira/browse/QPID-5734
 Project: Qpid
  Issue Type: Bug
Reporter: michael goulish


using latest qpid code as of 25 Apr 2014.

In my qpid-messaging client, I do not ask for unreliable link:

std::string sender_address = x;
   Sender sender = session.createSender ( sender_address );

I call sender.send() 1000 times, each time to a different address.
The call returns, apparently successful every time -- no throws or anything --  
 but my receivers do not get all messages.

The messages are going through a dispatch router -- but I have now successfully 
traced the qpid-messaging sender, and I see that the missing messages are 
simply never transferred out of the sender -- so they never get to the router.





--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Commented] (QPID-5734) message loss in qpid client

2014-04-29 Thread michael goulish (JIRA)

[ 
https://issues.apache.org/jira/browse/QPID-5734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13984606#comment-13984606
 ] 

michael goulish commented on QPID-5734:
---

The receivers are proton messenger based.
I can reproduce this behavior whether or not the senders explicitly accept the 
messages.


 message loss in qpid client
 ---

 Key: QPID-5734
 URL: https://issues.apache.org/jira/browse/QPID-5734
 Project: Qpid
  Issue Type: Bug
Reporter: michael goulish

 using latest qpid code as of 25 Apr 2014.
 In my qpid-messaging client, I do not ask for unreliable link:
 std::string sender_address = x;
Sender sender = session.createSender ( sender_address );
 I call sender.send() 1000 times, each time to a different address.
 The call returns, apparently successful every time -- no throws or anything 
 --   but my receivers do not get all messages.
 The messages are going through a dispatch router -- but I have now 
 successfully traced the qpid-messaging sender, and I see that the missing 
 messages are simply never transferred out of the sender -- so they never get 
 to the router.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Created] (DISPATCH-45) starting clients too rapidly causes connection failures

2014-04-16 Thread michael goulish (JIRA)
michael goulish created DISPATCH-45:
---

 Summary: starting clients too rapidly causes connection failures
 Key: DISPATCH-45
 URL: https://issues.apache.org/jira/browse/DISPATCH-45
 Project: Qpid Dispatch
  Issue Type: Bug
  Components: Router Node
Affects Versions: 0.2
Reporter: michael goulish


I don't know if this should be a code change, or an extra warning issued by the 
router, or just a Note To Users of some kind, but I'm putting it here so as not 
to lose track of it.

If I start too many clients too rapidly, all trying to connect to the same 
router, some of them will fail.  My clients are very simple, not attempting any 
retries.  

When this shows up, it looks like an error in the client, and users will 
probably hunt around for the cause.  It can be avoided by simply putting 
occasional pauses in my client-launching script.  Looks like some kind of 
backlog problem.




--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org



[jira] [Created] (DISPATCH-32) Undeliverable messages should get released.

2014-02-28 Thread michael goulish (JIRA)
michael goulish created DISPATCH-32:
---

 Summary: Undeliverable messages should get released.
 Key: DISPATCH-32
 URL: https://issues.apache.org/jira/browse/DISPATCH-32
 Project: Qpid Dispatch
  Issue Type: Bug
  Components: Router Node
Affects Versions: 0.2
 Environment: cold, snowy.
Reporter: michael goulish


I have a test in which I make a 6-router network, then repeatedly kill and 
restart nodes.

To determine when the network is ready to rock, I send messages to each node 
that I expect to find in the network.  All messages are sent through the one 
node that I am connected to.

At first, some of those messages are undeliverable.  This is expected, since I 
just deliberately messed up the network.

The problem is that, for those undeliverables, I never get back any kind of 
disposition.  For the good ones, I get 'settled'.  For the undeliverable ones, 
I get nothing.

This means that I cannot close my session.
If I created the sender on it this way:
   Sender sender = session.createSender(mgmt);
then it will not close.

I can work around the problem by creating the sender this way:
  Sender sender = session.createSender(mgmt; {link:{reliability:unreliable}});

...but we should still get back dispositions for all messages.




--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org


