Re: Testing failover on dispatcher/java-broker cluster

2016-09-30 Thread Adel Boutros
Hello Ted,


I confirm all my tests are GREEN at head of 0.6.x branch.


For reference:

Qpid Java Broker: 6.0.4

Qpid Proton: 0.12.2

Compiler: gcc 4.9.1

OS: Linux Red Hat


Regards,

Adel


From: Adel Boutros <adelbout...@live.com>
Sent: Friday, September 30, 2016 3:07:56 PM
To: users@qpid.apache.org
Subject: Re: Testing failover on dispatcher/java-broker cluster

Great!


I have synched your changes and we will run my tests.

I will get back to you with the results as soon as possible.


Regards,

Adel


From: Ted Ross <tr...@redhat.com>
Sent: Friday, September 30, 2016 2:39:51 PM
To: users@qpid.apache.org
Subject: Re: Testing failover on dispatcher/java-broker cluster

Done.  I've pushed the four cherry-picked commits to the 0.6.x branch if
you'd like to give it a go.

-Ted

On 09/30/2016 05:47 AM, Adel Boutros wrote:
> Hello Ted,
>
>
> Following discussions here 
> (http://qpid.2158936.n2.nabble.com/Dispatch-router-0-6-1-Configuration-bugs-td7651334.html),
>  can DISPATCH-500 be included in the minor release?
>
>
> PS: It still hasn't solved my below issue but I will continue the analysis on 
> the other thread
>
>
> Regards,
>
> Adel
>
>
>
> 
> From: Adel Boutros <adelbout...@live.com>
> Sent: Thursday, September 29, 2016 5:01:45 PM
> To: users@qpid.apache.org
> Subject: Re: Testing failover on dispatcher/java-broker cluster
>
> I would expect what you have described; however, it doesn't seem to be the case.
>
>
> delete/recreate mobile address:
>
> qdmanage -b amqp://localhost:10501 delete --type=address --name haProxy.queue.addr
> qdmanage -b amqp://localhost:10501 create --type=address prefix=haProxy.queue waypoint=true name=haProxy.queue.addr
>
> The stats remain at a positive value (10 10). If I restart the dispatchers 
> without the inter-router connection, I don't have the issue.
>
> Router Addresses
>   class   addr           phs  distrib   in-proc  local  remote  cntnr  in  out  thru  to-proc  from-proc
>   ======================================================================================================
>   mobile  haProxy.queue  1    balanced  0        0      0       0      0   0    0     0        0
>   mobile  haProxy.queue  0    balanced  0        1      0       0      10  10   0     0        0
>
>
> Adel
>
> ____
> From: Ted Ross <tr...@redhat.com>
> Sent: Thursday, September 29, 2016 4:55 PM
> To: users@qpid.apache.org
> Subject: Re: Testing failover on dispatcher/java-broker cluster
>
>
>
> On 09/29/2016 10:47 AM, Adel Boutros wrote:
>> They seem fair enough and quite related.
>>
>>
>> As a side note, I have a bug with the dispatch router 0.6.1 but I haven't 
>> submitted it yet because I haven't reduced the test case yet.
>>
>> In summary: I connect 2 dispatchers (inter-router) and then delete the 
>> "inter-router" connector/listener. If I then delete and recreate a mobile 
>> address which has received a message on one of the dispatchers, the "in" 
>> and "out" stats do not reset to 0 in "qdstat -a" but remain at the old 
>> values. However, they reset correctly on the other router.
>
> What exactly do you mean by "delete and recreate a mobile address"?
>
> If an address is removed from the table, the next time it appears, a new
> record will be created for that address.  The new record will have
> zeroed statistics.  What behavior are you expecting?
>
>>
>>
>> Have you encountered something similar? Once I have a reduced test case, I 
>> will post it in a different thread of course.
>>
>>
>> Regards,
>>
>> Adel
>>
>> 
>> From: Ted Ross <tr...@redhat.com>
>> Sent: Thursday, September 29, 2016 4:38:26 PM
>> To: users@qpid.apache.org
>> Subject: Re: Testing failover on dispatcher/java-broker cluster
>>
>> Sorry, those Jira numbers and descriptions are mismatched.  Here's the
>> correct list:
>>
>> - DISPATCH-496 - Activation of an autolink does not result i

Re: Testing failover on dispatcher/java-broker cluster

2016-09-30 Thread Adel Boutros
Great!


I have synched your changes and we will run my tests.

I will get back to you with the results as soon as possible.


Regards,

Adel


From: Ted Ross <tr...@redhat.com>
Sent: Friday, September 30, 2016 2:39:51 PM
To: users@qpid.apache.org
Subject: Re: Testing failover on dispatcher/java-broker cluster

Done.  I've pushed the four cherry-picked commits to the 0.6.x branch if
you'd like to give it a go.

-Ted

On 09/30/2016 05:47 AM, Adel Boutros wrote:
> Hello Ted,
>
>
> Following discussions here 
> (http://qpid.2158936.n2.nabble.com/Dispatch-router-0-6-1-Configuration-bugs-td7651334.html),
>  can DISPATCH-500 be included in the minor release?
>
>
> PS: It still hasn't solved my below issue but I will continue the analysis on 
> the other thread
>
>
> Regards,
>
> Adel
>
>
>
> 
> From: Adel Boutros <adelbout...@live.com>
> Sent: Thursday, September 29, 2016 5:01:45 PM
> To: users@qpid.apache.org
> Subject: Re: Testing failover on dispatcher/java-broker cluster
>
> I would expect what you have described; however, it doesn't seem to be the case.
>
>
> delete/recreate mobile address:
>
> qdmanage -b amqp://localhost:10501 delete --type=address --name haProxy.queue.addr
> qdmanage -b amqp://localhost:10501 create --type=address prefix=haProxy.queue waypoint=true name=haProxy.queue.addr
>
> The stats remain at a positive value (10 10). If I restart the dispatchers 
> without the inter-router connection, I don't have the issue.
>
> Router Addresses
>   class   addr           phs  distrib   in-proc  local  remote  cntnr  in  out  thru  to-proc  from-proc
>   ======================================================================================================
>   mobile  haProxy.queue  1    balanced  0        0      0       0      0   0    0     0        0
>   mobile  haProxy.queue  0    balanced  0        1      0       0      10  10   0     0        0
>
>
> Adel
>
> ____________
> From: Ted Ross <tr...@redhat.com>
> Sent: Thursday, September 29, 2016 4:55 PM
> To: users@qpid.apache.org
> Subject: Re: Testing failover on dispatcher/java-broker cluster
>
>
>
> On 09/29/2016 10:47 AM, Adel Boutros wrote:
>> They seem fair enough and quite related.
>>
>>
>> As a side note, I have a bug with the dispatch router 0.6.1 but I haven't 
>> submitted it yet because I haven't reduced the test case yet.
>>
>> In summary: I connect 2 dispatchers (inter-router) and then delete the 
>> "inter-router" connector/listener. If I then delete and recreate a mobile 
>> address which has received a message on one of the dispatchers, the "in" 
>> and "out" stats do not reset to 0 in "qdstat -a" but remain at the old 
>> values. However, they reset correctly on the other router.
>
> What exactly do you mean by "delete and recreate a mobile address"?
>
> If an address is removed from the table, the next time it appears, a new
> record will be created for that address.  The new record will have
> zeroed statistics.  What behavior are you expecting?
>
>>
>>
>> Have you encountered something similar? Once I have a reduced test case, I 
>> will post it in a different thread of course.
>>
>>
>> Regards,
>>
>> Adel
>>
>> 
>> From: Ted Ross <tr...@redhat.com>
>> Sent: Thursday, September 29, 2016 4:38:26 PM
>> To: users@qpid.apache.org
>> Subject: Re: Testing failover on dispatcher/java-broker cluster
>>
>> Sorry, those Jira numbers and descriptions are mismatched.  Here's the
>> correct list:
>>
>> - DISPATCH-496 - Activation of an autolink does not result in issuing
>>  credit to a blocked sender
>> - DISPATCH-505 - Eventual loss of credit on inter-router control
>>  links when the topology changes
>> - DISPATCH-523 - Topology changes can cause in-flight deliveries to
>>  be stuck in the ingress router
>>
>>
>> On 09/29/2016 10:35 AM

Re: Testing failover on dispatcher/java-broker cluster

2016-09-30 Thread Ted Ross
Done.  I've pushed the four cherry-picked commits to the 0.6.x branch if 
you'd like to give it a go.


-Ted

On 09/30/2016 05:47 AM, Adel Boutros wrote:

Hello Ted,


Following discussions here 
(http://qpid.2158936.n2.nabble.com/Dispatch-router-0-6-1-Configuration-bugs-td7651334.html),
 can DISPATCH-500 be included in the minor release?


PS: It still hasn't solved my below issue but I will continue the analysis on 
the other thread


Regards,

Adel




From: Adel Boutros <adelbout...@live.com>
Sent: Thursday, September 29, 2016 5:01:45 PM
To: users@qpid.apache.org
Subject: Re: Testing failover on dispatcher/java-broker cluster

I would expect what you have described; however, it doesn't seem to be the case.


delete/recreate mobile address:

qdmanage -b amqp://localhost:10501 delete --type=address --name haProxy.queue.addr
qdmanage -b amqp://localhost:10501 create --type=address prefix=haProxy.queue waypoint=true name=haProxy.queue.addr

The stats remain at a positive value (10 10). If I restart the dispatchers 
without the inter-router connection, I don't have the issue.

Router Addresses
  class   addr           phs  distrib   in-proc  local  remote  cntnr  in  out  thru  to-proc  from-proc
  ======================================================================================================
  mobile  haProxy.queue  1    balanced  0        0      0       0      0   0    0     0        0
  mobile  haProxy.queue  0    balanced  0        1      0       0      10  10   0     0        0
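
The check described above (are the "in" and "out" counters back at 0 after the address is recreated?) could be scripted. The following is only a minimal sketch in Python. It assumes the "qdstat -a" output has already been captured as text and uses the 13-column layout quoted in this thread; COLUMNS, SAMPLE, parse_address_rows and stale_counters are hypothetical names used for illustration and are not part of any Qpid tool.

```python
# Column names as they appear in the "Router Addresses" table in this thread.
COLUMNS = ["class", "addr", "phs", "distrib", "in-proc", "local", "remote",
           "cntnr", "in", "out", "thru", "to-proc", "from-proc"]

# Sample text copied from the thread (assumed layout, whitespace separated).
SAMPLE = """\
Router Addresses
  class   addr           phs  distrib   in-proc  local  remote  cntnr  in  out  thru  to-proc  from-proc
  ======================================================================================================
  mobile  haProxy.queue  1    balanced  0        0      0       0      0   0    0     0        0
  mobile  haProxy.queue  0    balanced  0        1      0       0      10  10   0     0        0
"""


def parse_address_rows(text):
    """Return one dict per data row; title, header and separator lines are skipped."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows have exactly one token per column and do not start with "class".
        if len(fields) != len(COLUMNS) or fields[0] == "class":
            continue
        rows.append(dict(zip(COLUMNS, fields)))
    return rows


def stale_counters(rows, addr):
    """List (phase, in, out) for rows of `addr` whose in/out counters are not 0."""
    return [
        (row["phs"], int(row["in"]), int(row["out"]))
        for row in rows
        if row["addr"] == addr and (int(row["in"]) or int(row["out"]))
    ]


if __name__ == "__main__":
    print(stale_counters(parse_address_rows(SAMPLE), "haProxy.queue"))
```

An empty result for an address that has just been deleted and recreated would match the behaviour Ted describes (a new record with zeroed statistics); a non-empty result corresponds to the stale values reported here.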


Adel


From: Ted Ross <tr...@redhat.com>
Sent: Thursday, September 29, 2016 4:55 PM
To: users@qpid.apache.org
Subject: Re: Testing failover on dispatcher/java-broker cluster



On 09/29/2016 10:47 AM, Adel Boutros wrote:

They seem fair enough and quite related.


As a side note, I have a bug with the dispatch router 0.6.1 but I haven't 
submitted it yet because I haven't reduced the test case yet.

In summary: I connect 2 dispatchers (inter-router) and then delete the 
"inter-router" connector/listener. If I then delete and recreate a mobile 
address which has received a message on one of the dispatchers, the "in" 
and "out" stats do not reset to 0 in "qdstat -a" but remain at the old 
values. However, they reset correctly on the other router.


What exactly do you mean by "delete and recreate a mobile address"?

If an address is removed from the table, the next time it appears, a new
record will be created for that address.  The new record will have
zeroed statistics.  What behavior are you expecting?




Have you encountered something similar? Once I have a reduced test case, I will 
post it in a different thread of course.


Regards,

Adel


From: Ted Ross <tr...@redhat.com>
Sent: Thursday, September 29, 2016 4:38:26 PM
To: users@qpid.apache.org
Subject: Re: Testing failover on dispatcher/java-broker cluster

Sorry, those Jira numbers and descriptions are mismatched.  Here's the
correct list:

- DISPATCH-496 - Activation of an autolink does not result in issuing
 credit to a blocked sender
- DISPATCH-505 - Eventual loss of credit on inter-router control
 links when the topology changes
- DISPATCH-523 - Topology changes can cause in-flight deliveries to
 be stuck in the ingress router


On 09/29/2016 10:35 AM, Ted Ross wrote:


On 09/24/2016 05:32 AM, Adel Boutros wrote:

We are indeed in favor of a minor release as long as the latest
version is still 0.6.x and we are willing to re-launch our tests and
give feedback on the release candidate once provided (It shouldn't
take us more than a day to compile and test).
Do you have a list of fixes in mind?


I've identified three fixes that look like good candidates for 0.6.2:

  - DISPATCH-496 - Topology changes can cause in-flight deliveries to
   be stuck in the ingress router
  - DISPATCH-505 - Eventual loss of credit on inter-router control
   links when the topology changes
  - DISPATCH-523 - Activation of an autolink does not result in issuing
   credit to a blocked sender

These are all stability-related issues.

Thoughts?

-Ted


Regards, Adel


Subject: Re: Testing failover on dispatcher/java-broker cluster
To: users@qpid.apache.org
From: tr...@redhat.com
Date: Fri, 23 Sep 2016 17:23:57 -0400

Hi Adel,

A minor release is 

Re: Testing failover on dispatcher/java-broker cluster

2016-09-30 Thread Adel Boutros
Hello Ted,


Following discussions here 
(http://qpid.2158936.n2.nabble.com/Dispatch-router-0-6-1-Configuration-bugs-td7651334.html),
 can DISPATCH-500 be included in the minor release?


PS: It still hasn't solved my below issue but I will continue the analysis on 
the other thread


Regards,

Adel




From: Adel Boutros <adelbout...@live.com>
Sent: Thursday, September 29, 2016 5:01:45 PM
To: users@qpid.apache.org
Subject: Re: Testing failover on dispatcher/java-broker cluster

I would expect what you have described; however, it doesn't seem to be the case.


delete/recreate mobile address:

qdmanage -b amqp://localhost:10501 delete --type=address --name haProxy.queue.addr
qdmanage -b amqp://localhost:10501 create --type=address prefix=haProxy.queue waypoint=true name=haProxy.queue.addr

The stats remain at a positive value (10 10). If I restart the dispatchers 
without the inter-router connection, I don't have the issue.

Router Addresses
  class   addr           phs  distrib   in-proc  local  remote  cntnr  in  out  thru  to-proc  from-proc
  ======================================================================================================
  mobile  haProxy.queue  1    balanced  0        0      0       0      0   0    0     0        0
  mobile  haProxy.queue  0    balanced  0        1      0       0      10  10   0     0        0


Adel


From: Ted Ross <tr...@redhat.com>
Sent: Thursday, September 29, 2016 4:55 PM
To: users@qpid.apache.org
Subject: Re: Testing failover on dispatcher/java-broker cluster



On 09/29/2016 10:47 AM, Adel Boutros wrote:
> They seem fair enough and quite related.
>
>
> As a side note, I have a bug with the dispatch router 0.6.1 but I haven't 
> submitted it yet because I haven't reduced the test case yet.
>
> In summary: I connect 2 dispatchers (inter-router) and then delete the 
> "inter-router" connector/listener. If I then delete and recreate a mobile 
> address which has received a message on one of the dispatchers, the "in" 
> and "out" stats do not reset to 0 in "qdstat -a" but remain at the old 
> values. However, they reset correctly on the other router.

What exactly do you mean by "delete and recreate a mobile address"?

If an address is removed from the table, the next time it appears, a new
record will be created for that address.  The new record will have
zeroed statistics.  What behavior are you expecting?

>
>
> Have you encountered something similar? Once I have a reduced test case, I 
> will post it in a different thread of course.
>
>
> Regards,
>
> Adel
>
> ________
> From: Ted Ross <tr...@redhat.com>
> Sent: Thursday, September 29, 2016 4:38:26 PM
> To: users@qpid.apache.org
> Subject: Re: Testing failover on dispatcher/java-broker cluster
>
> Sorry, those Jira numbers and descriptions are mismatched.  Here's the
> correct list:
>
> - DISPATCH-496 - Activation of an autolink does not result in issuing
>  credit to a blocked sender
> - DISPATCH-505 - Eventual loss of credit on inter-router control
>  links when the topology changes
> - DISPATCH-523 - Topology changes can cause in-flight deliveries to
>  be stuck in the ingress router
>
>
> On 09/29/2016 10:35 AM, Ted Ross wrote:
>>
>> On 09/24/2016 05:32 AM, Adel Boutros wrote:
>>> We are indeed in favor of a minor release as long as the latest
>>> version is still 0.6.x and we are willing to re-launch our tests and
>>> give feedback on the release candidate once provided (It shouldn't
>>> take us more than a day to compile and test).
>>> Do you have a list of fixes in mind?
>>
>> I've identified three fixes that look like good candidates for 0.6.2:
>>
>>   - DISPATCH-496 - Topology changes can cause in-flight deliveries to
>>     be stuck in the ingress router
>>   - DISPATCH-505 - Eventual loss of credit on inter-router control
>>     links when the topology changes
>>   - DISPATCH-523 - Activation of an autolink does not result in issuing
>>     credit to a blocked sender
>>
>> These are all stability-related issues.
>>
>> 

Re: Testing failover on dispatcher/java-broker cluster

2016-09-29 Thread Ted Ross



On 09/29/2016 10:47 AM, Adel Boutros wrote:

They seem fair enough and quite related.


As a side note, I have a bug with the dispatch router 0.6.1 but I haven't 
submitted it yet because I haven't reduced the test case yet.

In summary: I connect 2 dispatchers (inter-router) and then delete the 
"inter-router" connector/listener. If I then delete and recreate a mobile 
address which has received a message on one of the dispatchers, the "in" 
and "out" stats do not reset to 0 in "qdstat -a" but remain at the old 
values. However, they reset correctly on the other router.


What exactly do you mean by "delete and recreate a mobile address"?

If an address is removed from the table, the next time it appears, a new 
record will be created for that address.  The new record will have 
zeroed statistics.  What behavior are you expecting?





Have you encountered something similar? Once I have a reduced test case, I will 
post it in a different thread of course.


Regards,

Adel


From: Ted Ross <tr...@redhat.com>
Sent: Thursday, September 29, 2016 4:38:26 PM
To: users@qpid.apache.org
Subject: Re: Testing failover on dispatcher/java-broker cluster

Sorry, those Jira numbers and descriptions are mismatched.  Here's the
correct list:

- DISPATCH-496 - Activation of an autolink does not result in issuing
 credit to a blocked sender
- DISPATCH-505 - Eventual loss of credit on inter-router control
 links when the topology changes
- DISPATCH-523 - Topology changes can cause in-flight deliveries to
 be stuck in the ingress router


On 09/29/2016 10:35 AM, Ted Ross wrote:


On 09/24/2016 05:32 AM, Adel Boutros wrote:

We are indeed in favor of a minor release as long as the latest
version is still 0.6.x and we are willing to re-launch our tests and
give feedback on the release candidate once provided (It shouldn't
take us more than a day to compile and test).
Do you have a list of fixes in mind?


I've identified three fixes that look like good candidates for 0.6.2:

  - DISPATCH-496 - Topology changes can cause in-flight deliveries to
   be stuck in the ingress router
  - DISPATCH-505 - Eventual loss of credit on inter-router control
   links when the topology changes
  - DISPATCH-523 - Activation of an autolink does not result in issuing
   credit to a blocked sender

These are all stability-related issues.

Thoughts?

-Ted


Regards, Adel


Subject: Re: Testing failover on dispatcher/java-broker cluster
To: users@qpid.apache.org
From: tr...@redhat.com
Date: Fri, 23 Sep 2016 17:23:57 -0400

Hi Adel,

A minor release is always possible.  It's up to us, the community, to
decide whether and when to produce one.  I'm in favor of releasing an
0.6.2 with some small backports to fix bugs for users that want to stay
on Proton 0.12.

-Ted

On 09/23/2016 09:44 AM, Adel Boutros wrote:

Hello Ted,
Did you happen to have the time to check if a minor release is
possible?
Regards, Adel


From: adelbout...@live.com
To: users@qpid.apache.org
Subject: RE: Testing failover on dispatcher/java-broker cluster
Date: Tue, 20 Sep 2016 15:13:03 +0200

Hello Ted,

I confirm the fix solved the issue.

Would it be possible to do a 0.6.2 release? We cannot compile newer
versions of Proton (We currently use 0.12.2) due to lack of
resources from our side and we really need this fix for our tests.

Regards,
Adel


Subject: Re: Testing failover on dispatcher/java-broker cluster
To: users@qpid.apache.org
From: tr...@redhat.com
Date: Mon, 19 Sep 2016 12:18:23 -0400

Hi Adel,

It's a one-liner and it applies cleanly to the 0.6.x branch.

https://git-wip-us.apache.org/repos/asf?p=qpid-dispatch.git;h=41b7407

-Ted


On 09/19/2016 11:41 AM, Adel Boutros wrote:

Hello Ted,

Antoine is on vacation so I will be taking over this task.

Does this fix have any dependencies? We would like to apply it on
0.6.1 without other fixes because it seems the master branch
requires proton 0.13.0 minimum whereas we have currently 0.12.2
and we cannot upgrade at the time being.

Regards,
Adel


Subject: Re: Testing failover on dispatcher/java-broker cluster
To: users@qpid.apache.org
From: tr...@redhat.com
Date: Fri, 16 Sep 2016 16:53:05 -0400

Antoine,

I think I know what that problem is.  I believe you've stumbled upon
this issue:

https://issues.apache.org/jira/browse/DISPATCH-496

Your second delivery, the one resulting in a timeout, is causing the
inbound link to be blocked (i.e. it has undelivered messages).  When the
broker reattaches, the blocked links are supposed to become unblocked
but they don't in the case of auto-links.

This has been fixed on the master branch if you'd like to try applying
the patch.
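
To make the symptom above concrete, here is a toy model in Python. It is only an illustrative sketch of the behaviour described in this message, not the router's code: BlockedSender, on_destination_reattached, and the via_autolink and patched flags are all hypothetical names, and the single unit of credit is an arbitrary choice.

```python
from dataclasses import dataclass


@dataclass
class BlockedSender:
    """Toy stand-in for a client sender whose delivery is stuck at the router."""
    credit: int = 0        # credit currently granted to the sender
    undelivered: int = 1   # deliveries the router has not forwarded yet

    @property
    def blocked(self) -> bool:
        # Blocked here means: something is waiting and no credit is available.
        return self.credit == 0 and self.undelivered > 0


def on_destination_reattached(sender: BlockedSender, via_autolink: bool, patched: bool) -> None:
    """Model what happens to a blocked sender when the broker comes back.

    With patched=False and via_autolink=True the function mimics the reported
    symptom: the auto-link is activated but no credit reaches the sender.
    """
    if via_autolink and not patched:
        return  # reported behaviour: the sender stays blocked
    sender.credit += 1  # expected behaviour: credit is issued and the sender can resume


if __name__ == "__main__":
    sender = BlockedSender()
    on_destination_reattached(sender, via_autolink=True, patched=False)
    print("blocked without the fix:", sender.blocked)
    on_destination_reattached(sender, via_autolink=True, patched=True)
    print("blocked with the fix:", sender.blocked)
```

In this model the unpatched auto-link case leaves the sender blocked after the broker returns, which is the situation where a later send times out waiting for the disposition.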

-Ted

On 09/15/2016 04:56 AM, Antoine Chevin wrote:

Hi Ted,

You’re right, the connection close looked strange before
stopping of

Re: Testing failover on dispatcher/java-broker cluster

2016-09-29 Thread Adel Boutros
They seem fair enough and quite related.


As a side note, I have a bug with the dispatch router 0.6.1 but I haven't 
submitted it yet because I haven't reduced the test case yet.

In summary: I connect 2 dispatchers (inter-router) and then delete the 
"inter-router" connector/listener. If I then delete and recreate a mobile 
address which has received a message on one of the dispatchers, the "in" 
and "out" stats do not reset to 0 in "qdstat -a" but remain at the old 
values. However, they reset correctly on the other router.


Have you encountered something similar? Once I have a reduced test case, I will 
post it in a different thread of course.


Regards,

Adel


From: Ted Ross <tr...@redhat.com>
Sent: Thursday, September 29, 2016 4:38:26 PM
To: users@qpid.apache.org
Subject: Re: Testing failover on dispatcher/java-broker cluster

Sorry, those Jira numbers and descriptions are mismatched.  Here's the
correct list:

- DISPATCH-496 - Activation of an autolink does not result in issuing
 credit to a blocked sender
- DISPATCH-505 - Eventual loss of credit on inter-router control
 links when the topology changes
- DISPATCH-523 - Topology changes can cause in-flight deliveries to
 be stuck in the ingress router


On 09/29/2016 10:35 AM, Ted Ross wrote:
>
> On 09/24/2016 05:32 AM, Adel Boutros wrote:
>> We are indeed in favor of a minor release as long as the latest
>> version is still 0.6.x and we are willing to re-launch our tests and
>> give feedback on the release candidate once provided (It shouldn't
>> take us more than a day to compile and test).
>> Do you have a list of fixes in mind?
>
> I've identified three fixes that look like good candidates for 0.6.2:
>
>   - DISPATCH-496 - Topology changes can cause in-flight deliveries to
>     be stuck in the ingress router
>   - DISPATCH-505 - Eventual loss of credit on inter-router control
>     links when the topology changes
>   - DISPATCH-523 - Activation of an autolink does not result in issuing
>     credit to a blocked sender
>
> These are all stability-related issues.
>
> Thoughts?
>
> -Ted
>
>> Regards, Adel
>>
>>> Subject: Re: Testing failover on dispatcher/java-broker cluster
>>> To: users@qpid.apache.org
>>> From: tr...@redhat.com
>>> Date: Fri, 23 Sep 2016 17:23:57 -0400
>>>
>>> Hi Adel,
>>>
>>> A minor release is always possible.  It's up to us, the community, to
>>> decide whether and when to produce one.  I'm in favor of releasing an
>>> 0.6.2 with some small backports to fix bugs for users that want to stay
>>> on Proton 0.12.
>>>
>>> -Ted
>>>
>>> On 09/23/2016 09:44 AM, Adel Boutros wrote:
>>>> Hello Ted,
>>>> Did you happen to have the time to check if a minor release is
>>>> possible?
>>>> Regards, Adel
>>>>
>>>>> From: adelbout...@live.com
>>>>> To: users@qpid.apache.org
>>>>> Subject: RE: Testing failover on dispatcher/java-broker cluster
>>>>> Date: Tue, 20 Sep 2016 15:13:03 +0200
>>>>>
>>>>> Hello Ted,
>>>>>
>>>>> I confirm the fix solved the issue.
>>>>>
>>>>> Would it be possible to do a 0.6.2 release? We cannot compile newer
>>>>> versions of Proton (We currently use 0.12.2) due to lack of
>>>>> resources from our side and we really need this fix for our tests.
>>>>>
>>>>> Regards,
>>>>> Adel
>>>>>
>>>>>> Subject: Re: Testing failover on dispatcher/java-broker cluster
>>>>>> To: users@qpid.apache.org
>>>>>> From: tr...@redhat.com
>>>>>> Date: Mon, 19 Sep 2016 12:18:23 -0400
>>>>>>
>>>>>> Hi Adel,
>>>>>>
>>>>>> It's a one-liner and it applies cleanly to the 0.6.x branch.
>>>>>>
>>>>>> https://git-wip-us.apache.org/repos/asf?p=qpid-dispatch.git;h=41b7407
>>>>>>
>>>>>> -Ted
>>>>>>
>>>>>>
>>>>>> On 09/19/2016 11:41 AM, Adel Boutros wrote:
>>>>>>> Hello Ted,
>>>>>>>
>>>>>>> Antoine is on vacation so I will be taking over this task.
>>>>>>>
>>>>>>> Does this fix have any dependencies? We would like to apply it on
>>

Re: Testing failover on dispatcher/java-broker cluster

2016-09-29 Thread Ted Ross
Sorry, those Jira numbers and descriptions are mismatched.  Here's the 
correct list:


   - DISPATCH-496 - Activation of an autolink does not result in issuing
     credit to a blocked sender
   - DISPATCH-505 - Eventual loss of credit on inter-router control
     links when the topology changes
   - DISPATCH-523 - Topology changes can cause in-flight deliveries to
     be stuck in the ingress router


On 09/29/2016 10:35 AM, Ted Ross wrote:


On 09/24/2016 05:32 AM, Adel Boutros wrote:

We are indeed in favor of a minor release as long as the latest
version is still 0.6.x and we are willing to re-launch our tests and
give feedback on the release candidate once provided (It shouldn't
take us more than a day to compile and test).
Do you have a list of fixes in mind?


I've identified three fixes that look like good candidates for 0.6.2:

  - DISPATCH-496 - Topology changes can cause in-flight deliveries to
   be stuck in the ingress router
  - DISPATCH-505 - Eventual loss of credit on inter-router control
   links when the topology changes
  - DISPATCH-523 - Activation of an autolink does not result in issuing
   credit to a blocked sender

These are all stability-related issues.

Thoughts?

-Ted


Regards, Adel


Subject: Re: Testing failover on dispatcher/java-broker cluster
To: users@qpid.apache.org
From: tr...@redhat.com
Date: Fri, 23 Sep 2016 17:23:57 -0400

Hi Adel,

A minor release is always possible.  It's up to us, the community, to
decide whether and when to produce one.  I'm in favor of releasing an
0.6.2 with some small backports to fix bugs for users that want to stay
on Proton 0.12.

-Ted

On 09/23/2016 09:44 AM, Adel Boutros wrote:

Hello Ted,
Did you happen to have the time to check if a minor release is
possible?
Regards, Adel


From: adelbout...@live.com
To: users@qpid.apache.org
Subject: RE: Testing failover on dispatcher/java-broker cluster
Date: Tue, 20 Sep 2016 15:13:03 +0200

Hello Ted,

I confirm the fix solved the issue.

Would it be possible to do a 0.6.2 release? We cannot compile newer
versions of Proton (We currently use 0.12.2) due to lack of
resources from our side and we really need this fix for our tests.

Regards,
Adel


Subject: Re: Testing failover on dispatcher/java-broker cluster
To: users@qpid.apache.org
From: tr...@redhat.com
Date: Mon, 19 Sep 2016 12:18:23 -0400

Hi Adel,

It's a one-liner and it applies cleanly to the 0.6.x branch.

https://git-wip-us.apache.org/repos/asf?p=qpid-dispatch.git;h=41b7407

-Ted


On 09/19/2016 11:41 AM, Adel Boutros wrote:

Hello Ted,

Antoine is on vacation so I will be taking over this task.

Does this fix have any dependencies? We would like to apply it on
0.6.1 without other fixes because it seems the master branch
requires proton 0.13.0 minimum whereas we have currently 0.12.2
and we cannot upgrade at the time being.

Regards,
Adel


Subject: Re: Testing failover on dispatcher/java-broker cluster
To: users@qpid.apache.org
From: tr...@redhat.com
Date: Fri, 16 Sep 2016 16:53:05 -0400

Antoine,

I think I know what that problem is.  I believe you've stumbled upon
this issue:

https://issues.apache.org/jira/browse/DISPATCH-496

Your second delivery, the one resulting in a timeout, is causing the
inbound link to be blocked (i.e. it has undelivered messages).  When the
broker reattaches, the blocked links are supposed to become unblocked
but they don't in the case of auto-links.

This has been fixed on the master branch if you'd like to try applying
the patch.

-Ted

On 09/15/2016 04:56 AM, Antoine Chevin wrote:

Hi Ted,

You’re right, the connection close looked strange before stopping of the
broker. I manually added the annotation (# stopping the broker) and was
wrong about the position of this one. I replayed the test and the
connection close happens *after* the broker stop. I assume it is the broker
that initiates it.

I found something interesting. In my test, I always sent a message when the
broker is down, expecting to get a JmsSendTimedOutException (waiting for
the disposition frame). I assumed this was harmless. But it turns out this
is not. When I don’t do that, I can send a message after the broker
restart. So to sum up the experiment I did:

* I use Wireshark between the JMS client and the dispatcher. *

1)  Using JMS I establish a connection to the dispatcher and create a
    message producer (Wireshark: connection open -> attach)
2)  I’m able to send a message to the broker through the dispatcher
    (Wireshark: transfer -> disposition)
3)  I stop the broker
4)  With the same link, I send a message and I get a
    JmsSendTimedOutException (waiting for the disposition frame)
    (Wireshark: transfer)
5)  I restart the broker
6)  With the same link, I try to send a message and I get a
    JmsSendTimedOutException for the same reason (waiting for the
    disposition frame) (Wireshark: transfer)

If I skip step (4), I cannot rep

Re: Testing failover on dispatcher/java-broker cluster

2016-09-29 Thread Ted Ross


On 09/24/2016 05:32 AM, Adel Boutros wrote:

We are indeed in favor of a minor release as long as the latest version is 
still 0.6.x and we are willing to re-launch our tests and give feedback on the 
release candidate once provided (It shouldn't take us more than a day to 
compile and test).
Do you have a list of fixes in mind?


I've identified three fixes that look like good candidates for 0.6.2:

  - DISPATCH-496 - Topology changes can cause in-flight deliveries to
   be stuck in the ingress router
  - DISPATCH-505 - Eventual loss of credit on inter-router control
   links when the topology changes
  - DISPATCH-523 - Activation of an autolink does not result in issuing
   credit to a blocked sender

These are all stability-related issues.

Thoughts?

-Ted


Regards, Adel


Subject: Re: Testing failover on dispatcher/java-broker cluster
To: users@qpid.apache.org
From: tr...@redhat.com
Date: Fri, 23 Sep 2016 17:23:57 -0400

Hi Adel,

A minor release is always possible.  It's up to us, the community, to
decide whether and when to produce one.  I'm in favor of releasing an
0.6.2 with some small backports to fix bugs for users that want to stay
on Proton 0.12.

-Ted

On 09/23/2016 09:44 AM, Adel Boutros wrote:

Hello Ted,
Did you happen to have the time to check if a minor release is possible?
Regards,Adel


From: adelbout...@live.com
To: users@qpid.apache.org
Subject: RE: Testing failover on dispatcher/java-broker cluster
Date: Tue, 20 Sep 2016 15:13:03 +0200

Hello Ted,

I confirm the fix solved the issue.

Would it be possible to do a 0.6.2 release? We cannot compile newer versions of 
Proton (We currently use 0.12.2) due to lack of resources from our side and we 
really need this fix for our tests.

Regards,
Adel


Subject: Re: Testing failover on dispatcher/java-broker cluster
To: users@qpid.apache.org
From: tr...@redhat.com
Date: Mon, 19 Sep 2016 12:18:23 -0400

Hi Adel,

It's a one-liner and it applies cleanly to the 0.6.x branch.

https://git-wip-us.apache.org/repos/asf?p=qpid-dispatch.git;h=41b7407

-Ted


On 09/19/2016 11:41 AM, Adel Boutros wrote:

Hello Ted,

Antoine is on vacation so I will be taking over this task.

Does this fix have any dependencies? We would like to apply it on 0.6.1 without 
other fixes because it seems the master branch requires proton 0.13.0 minimum 
whereas we have currently 0.12.2 and we cannot upgrade at the time being.

Regards,
Adel


Subject: Re: Testing failover on dispatcher/java-broker cluster
To: users@qpid.apache.org
From: tr...@redhat.com
Date: Fri, 16 Sep 2016 16:53:05 -0400

Antoine,

I think I know what that problem is.  I belileve you've stumbled upon
this issue:

https://issues.apache.org/jira/browse/DISPATCH-496

Your second delivery, the one resulting in a timeout, is causing the
inbound link to be blocked (i.e. it has undelivered messages).  When the
broker reattaches, the blocked links are supposed to become unblocked
but they don't in the case of auto-links.

This has been fixed on the master branch if you'd like to try applying
the patch.

-Ted

On 09/15/2016 04:56 AM, Antoine Chevin wrote:

Hi Ted,

You’re right, the connection close looked strange before stopping of the
broker. I manually added the annotation (# stopping the broker) and was
wrong about the position of this one. I replayed the test and the
connection close happens *after* the broker stop. I assume it is the broker
that initiates it.

I found something interesting. In my test, I always sent a message when the
broker is down, expecting to get a JmsSendTimedOutException (waiting for
the disposition frame). I assumed this was harmless. But it turns out this
is not. When I don’t do that, I can send a message after the broker
restart. So to sum up the experiment I did:

* I use Wireshark between the JMS client and the dispatcher. *

1)  Using JMS I establish a connection to the dispatcher and create a
message producer (Wireshark: connection open -> attach)
2)  I’m able to send a message to the broker through the dispatcher (
Wireshark: transfer -> disposition)
3)  I stop the broker
4)  With the same link, I send a message and I get a
JmsSendTimedOutException (waiting for the disposition frame) (Wireshark:
transfer)
5)  I restart the broker
6)  With the same link, I try to send a message and I get a
JmsSendTimedOutException for the same reason (waiting for the disposition
frame) (Wireshark: transfer)

If I skip step (4), I cannot reproduce step (6) and my messages arrive
(Wireshark: transfer -> disposition) to the restarted broker.

I hope it makes it clearer for you. Sorry for my rookie mistakes :-).

Note: My colleague and I ran a small experiment to identify if the problem
comes from JMS or the AMQP protocol. He changed the code of the java broker
to not send the disposition frame one time out of two.

We got these results:

* I use Wireshark between the JMS client and the patched broker. *

1) Us

RE: Testing failover on dispatcher/java-broker cluster

2016-09-24 Thread Adel Boutros
We are indeed in favor of a minor release as long as the latest version is
still 0.6.x. We are willing to re-run our tests and give feedback on the
release candidate once it is provided (it shouldn't take us more than a day
to compile and test).
Do you have a list of fixes in mind?

Regards,
Adel


Re: Testing failover on dispatcher/java-broker cluster

2016-09-23 Thread Ted Ross

Hi Adel,

A minor release is always possible.  It's up to us, the community, to 
decide whether and when to produce one.  I'm in favor of releasing an 
0.6.2 with some small backports to fix bugs for users that want to stay 
on Proton 0.12.


-Ted


-
To unsubscribe, e-mail: users-unsubscr...@qpid.apache.org
For additional commands, e-mail: users-h...@qpid.apache.org







RE: Testing failover on dispatcher/java-broker cluster

2016-09-23 Thread Adel Boutros
Hello Ted,
Did you happen to have the time to check if a minor release is possible?

Regards,
Adel


RE: Testing failover on dispatcher/java-broker cluster

2016-09-20 Thread Adel Boutros
Hello Ted,

I confirm the fix solved the issue.

Would it be possible to do a 0.6.2 release? We cannot compile newer versions of
Proton (we currently use 0.12.2) due to a lack of resources on our side, and we
really need this fix for our tests.

Regards,
Adel


Re: Testing failover on dispatcher/java-broker cluster

2016-09-19 Thread Ted Ross

Hi Adel,

It's a one-liner and it applies cleanly to the 0.6.x branch.

https://git-wip-us.apache.org/repos/asf?p=qpid-dispatch.git;h=41b7407

-Ted


On 09/19/2016 11:41 AM, Adel Boutros wrote:

Hello Ted,

Antoine is on vacation so I will be taking over this task.

Does this fix have any dependencies? We would like to apply it on 0.6.1 without
other fixes, because it seems the master branch requires Proton 0.13.0 at
minimum, whereas we currently have 0.12.2 and cannot upgrade for the time being.

Regards,
Adel





Re: Testing failover on dispatcher/java-broker cluster

2016-09-16 Thread Ted Ross

Antoine,

I think I know what that problem is.  I believe you've stumbled upon
this issue:


https://issues.apache.org/jira/browse/DISPATCH-496

Your second delivery, the one resulting in a timeout, is causing the 
inbound link to be blocked (i.e. it has undelivered messages).  When the 
broker reattaches, the blocked links are supposed to become unblocked 
but they don't in the case of auto-links.
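
As a rough illustration of the blocked-link behaviour described above, here is a toy Python model. It is only a sketch and not the router's code: the names ToyRouter, InboundLink, unblock_on_reattach and run_scenario are all hypothetical, and forwarding the stuck message on reattach is an assumption made to keep the model small. It replays the send / stop broker / send / restart broker / send sequence from Antoine's report.

```python
from dataclasses import dataclass, field


@dataclass
class InboundLink:
    """Toy stand-in for a client-facing link on the router (hypothetical)."""
    undelivered: list = field(default_factory=list)

    @property
    def blocked(self) -> bool:
        # The link is blocked while it still holds messages it could not forward.
        return bool(self.undelivered)


class ToyRouter:
    """Hypothetical model; 'unblock_on_reattach' stands in for the fix."""

    def __init__(self, unblock_on_reattach: bool):
        self.unblock_on_reattach = unblock_on_reattach
        self.broker_attached = True
        self.link = InboundLink()
        self.forwarded = []

    def send(self, message: str) -> bool:
        """Return True if the sender would see a disposition, False on timeout."""
        if self.link.blocked or not self.broker_attached:
            self.link.undelivered.append(message)
            return False
        self.forwarded.append(message)
        return True

    def stop_broker(self) -> None:
        self.broker_attached = False

    def restart_broker(self) -> None:
        self.broker_attached = True
        if self.unblock_on_reattach:
            # Assumption: the toy simply forwards whatever was stuck on the link.
            self.forwarded.extend(self.link.undelivered)
            self.link.undelivered.clear()


def run_scenario(unblock_on_reattach: bool, send_while_down: bool) -> list:
    """Replay the reported test; each entry is True (ack) or False (timeout)."""
    router = ToyRouter(unblock_on_reattach)
    results = [router.send("m1")]           # send while the broker is up
    router.stop_broker()
    if send_while_down:
        results.append(router.send("m2"))   # the send that times out
    router.restart_broker()
    results.append(router.send("m3"))       # send after the restart
    return results


if __name__ == "__main__":
    print(run_scenario(unblock_on_reattach=False, send_while_down=True))
    print(run_scenario(unblock_on_reattach=False, send_while_down=False))
    print(run_scenario(unblock_on_reattach=True, send_while_down=True))
```

In this model the last send fails only when a message was sent while the broker was down and the link is never unblocked on reattach, which matches the reported observation that skipping the send during the outage avoids the problem.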


This has been fixed on the master branch if you'd like to try applying 
the patch.


-Ted




Re: Testing failover on dispatcher/java-broker cluster

2016-09-16 Thread Antoine Chevin
Hi Ted,

Do you have any insights into that problem?

Thanks,
Antoine



Re: Testing failover on dispatcher/java-broker cluster

2016-09-15 Thread Antoine Chevin
Hi Ted,

You’re right, the connection close looked strange because it appeared before
the broker was stopped. I had added the annotation (# stopping the broker)
manually and put it in the wrong position. I replayed the test and the
connection close happens *after* the broker stop. I assume it is the broker
that initiates it.

I found something interesting. In my test, I always sent a message while the
broker was down, expecting to get a JmsSendTimedOutException (waiting for
the disposition frame). I assumed this was harmless, but it turns out it is
not. When I don’t do that, I can send a message after the broker
restart. So to sum up the experiment I did:

* I use Wireshark between the JMS client and the dispatcher. *

1)  Using JMS I establish a connection to the dispatcher and create a
message producer (Wireshark: connection open -> attach)
2)  I’m able to send a message to the broker through the dispatcher
(Wireshark: transfer -> disposition)
3)  I stop the broker
4)  With the same link, I send a message and I get a
JmsSendTimedOutException (waiting for the disposition frame) (Wireshark:
transfer)
5)  I restart the broker
6)  With the same link, I try to send a message and I get a
JmsSendTimedOutException for the same reason (waiting for the disposition
frame) (Wireshark: transfer)

If I skip step (4), I cannot reproduce step (6) and my messages arrive
(Wireshark: transfer -> disposition) to the restarted broker.

I hope it makes it clearer for you. Sorry for my rookie mistakes :-).

Note: My colleague and I ran a small experiment to identify whether the problem
comes from JMS or the AMQP protocol. He changed the code of the Java broker
so that it does not send the disposition frame one time out of two.

We got these results:

* I use Wireshark between the JMS client and the patched broker. *

1) Using JMS I establish a connection to the patched broker and create a
message producer (Wireshark: connection open -> attach)
2)  I send a message to the broker and it replies with the disposition
frame (Wireshark: transfer -> disposition)
3) I send a message to the broker which drops the disposition frame. I get
a send timeout in JMS (Wireshark: transfer)
4)  I send a message to the broker and it replies with the disposition frame
(Wireshark: transfer -> disposition). It works fine.

This suggests that JMS and the broker recover from a missed disposition on
their own, so we assume that something is going on in the dispatcher.


Thanks,
Antoine


Re: Testing failover on dispatcher/java-broker cluster

2016-09-14 Thread Ted Ross

Hi Antoine,

In the broker traces, I see connection shutdown after the transfer but 
before you shut down the broker.  Do you know what is happening there? 
What was the disposition of the delivery?


-Ted

On 09/14/2016 09:12 AM, Antoine Chevin wrote:

Hello Qpid community,



I’m testing the resilience of a dispatcher/broker infrastructure and I
noticed the following behavior:



I run a test with one JMS client connected to a dispatcher, which is
connected to a broker.



1)  Using JMS I establish a connection to the dispatcher and create a
message producer

2)  I’m able to send a message to the broker through the dispatcher

3)  I stop and restart the broker

4)  I cannot send any messages using the message producer I created
before.

5)  If I recreate a MessageProducer (new AMQP link), the message
reaches the broker
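
Step 5 hints at a possible client-side workaround: drop the producer after a
send timeout and create a new one. Below is a minimal Java sketch of that
idea. Sender and SendTimeout are hypothetical stand-ins for a JMS
MessageProducer and a send-timeout exception, not Qpid JMS types, and the
"first sender is stuck" behaviour in main is simulated.

```java
import java.util.function.Supplier;

// Sketch only: Sender and SendTimeout are hypothetical stand-ins for a JMS
// MessageProducer and a send-timeout exception; they are not Qpid JMS types.
class RecreateProducerSketch {

    static class SendTimeout extends Exception {
        SendTimeout(String message) { super(message); }
    }

    interface Sender {
        void send(String body) throws SendTimeout;
        void close();
    }

    private final Supplier<Sender> factory; // assumed to create a fresh producer
    private Sender current;
    int recreations = 0;

    RecreateProducerSketch(Supplier<Sender> factory) {
        this.factory = factory;
        this.current = factory.get();
    }

    // Tries the existing sender first. On a timeout it discards that sender,
    // creates a new one (a new AMQP link in the scenario above) and retries
    // exactly once. The first attempt may still have been delivered, so the
    // retry can produce a duplicate.
    void send(String body) throws SendTimeout {
        try {
            current.send(body);
        } catch (SendTimeout first) {
            current.close();
            current = factory.get();
            recreations++;
            current.send(body);
        }
    }

    public static void main(String[] args) throws SendTimeout {
        int[] created = {0};
        RecreateProducerSketch producer = new RecreateProducerSketch(() -> {
            int id = ++created[0];
            return new Sender() {
                public void send(String body) throws SendTimeout {
                    // The first sender models the stuck link; later ones work.
                    if (id == 1) throw new SendTimeout("no disposition received");
                    System.out.println("sender " + id + " delivered: " + body);
                }
                public void close() { }
            };
        });
        producer.send("hello");
    }
}
```

In a real client the supplier would wrap something like
session.createProducer(destination). This only works around the symptom; it
does not explain why the original link stays blocked.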



In the failing step 4, Wireshark shows that the dispatcher does not forward
any message to the broker, so I deduced that the broker is not responsible
for this behavior.



*Is this expected behavior? What can I change in the dispatcher/JMS
configuration to avoid the failure?*



You can find attached the Wireshark logs I produced from this experiment:



-  JMS – dispatcher – reuse sender: logs between JMS and the
dispatcher when I reuse the message producer after the restart

-  JMS – dispatcher – new sender: logs between JMS and the
dispatcher when I create a new message producer after the restart

-  dispatcher – broker – reuse sender: logs between the dispatcher
and the broker, I reuse the message producer

-  dispatcher – broker – new sender: logs between the dispatcher
and the broker when I create a new message producer



I’m using qpid-dispatch 0.6.0, JMS 0.9.0 and qpid-java-broker 6.0.1.



Thanks,

Best regards,

Antoine




-
To unsubscribe, e-mail: users-unsubscr...@qpid.apache.org
For additional commands, e-mail: users-h...@qpid.apache.org


