Re: What is our strategy for errata and/or hot fixes ?
Hi Andrew, Thanks on the CI setup, I think a separate list would be good for the traffic it generates. Be really good to have a shared view of the tests running etc for the project. Cheers, Marnie On Tue, Apr 19, 2011 at 10:14 PM, Andrew Kennedy andrewinternatio...@gmail.com wrote: On 19 April 2011 17:55, Marnie McCormack marnie.mccorm...@googlemail.com wrote: I think regular, on schedule releases are key for an Apache project - without them we are seen as less credible than other OS projects in the messaging space. We have historically really struggled to get our releases out on time/as planned and across the ?5 years we've been going we have made (I think) 9 releases including those from incubator. We need to get faster and not be holding up the greater number of fixes for the last couple in each release. Only if we get quicker can we more rapidly evolve the product Apache users actually get to use :-) We also should really discuss testing as a group - we currently have a non-defined release gateway for testing, which lets us down somewhat. Just my tuppence - based on reading non-Apache Qpid people's assessments of messaging products in the OS space. +1, see below for update regarding automated testing during CI. I'd like to hear some thoughts about this from the community. My personal preference is to do an errata release for QPID-3214 QPID-3216 (and any other serious issues like that). What do you guys think ? I tend to view errata/hotfix as services that downstream support providers handle. You can use up a lot of time and effort going down that road. Errata and hotfix releases are a red herring. If we get our continuous integration process right, we should be able to provide snapshot nightly releases from trunk, which will give people the fixes they require. If there is a glaring security hole found in the broker then that may warrant an off-cycle release, but keeping to a schedule of releases is a better plan. If QPID-3214 and 3216 are serious issues I would prefer to hold 0.10 for the fixes. The word deadlock in the jira subject seems serious, but I'm wondering how likely it is a user will hit it if it hasn't been found before. Fair point, but I would need to investigate further, maybe Rajith has more information? These issues were apparently found when he ran the tests under the C++ profile, but they didn't show up under any of the Java profile system test runs I did when I was made the change that seems to have caused QPID-3214. Also, I believe that this issue occurs during QueueBrowser failover, and the tests for this are, in fact, disabled for Java profiles. The code involved is quite brittle, and I know there are a few more threading and locking issues either there or nearby. These intermittent issues are always difficult to diagnose, so it's good that we have a way to reproduce some of them now. I'd personally mark these as blockers for the next release, and make sure that we have better test coverage during 0.12 development. I know Rajith is looking at this, and I will as well, to determine if there *is* a quick fix possible. Similar statements hold for QPID-3216, as well ;( On that subject, the Java continuous integration is running happily on the ASF Hudson instances, however it does *not* run the C++ builds or tests, since we cannot get the right libraries installed across the whole Hudson cluster. Fortunately, my employer, Cloudsoft, has kindly agreed to provide some Amazon EC2 instances for the project. I have installed Hudson on one currently, and have got the C++ and Java system test suites running. Once I get a few issues ironed out, and also start the clustered system tests running as well, we should have a much better view of the test coverage, and these sorts of problems are less likely to occur. If interested, lok at this temporary URL (I will sort out a static IP and DNS shortly) below: http://ec2-46-137-19-127.eu-west-1.compute.amazonaws.com:8080/ Once the CI system is stable, is everyone happy for me to point the e-mails it gereates at this list, or do you think we need another, bu...@qpid.apache.org perhaps? Cheers, Andrew. -- -- andrew d kennedy ? cloudsoft monterey : http://www.cloudsoftcorp.com/ ; - Apache Qpid - AMQP Messaging Implementation Project: http://qpid.apache.org Use/Interact: mailto:dev-subscr...@qpid.apache.org
[jira] [Created] (QPID-3219) some logic used to find current/available messages may return a QueueEntry currently in the dequeued state.
some logic used to find current/available messages may return a QueueEntry currently in the dequeued state. --- Key: QPID-3219 URL: https://issues.apache.org/jira/browse/QPID-3219 Project: Qpid Issue Type: Bug Components: Java Broker Affects Versions: 0.8, 0.7, 0.6, 0.5, 0.9, 0.10 Reporter: Robbie Gemmell Fix For: 0.11 Some logic used to find current/available messages may incorrectly gather/return a QueueEntry in the dequeued state. The QueueEntry state model has an intermediate 'dequeued' state that the entry will enter before being disposed of and moved to the 'deleted' state. At present, most logic dealing with QueueEntry presence/availability checks only that they aren't deleted or that they can be acquired (typically both are done in the latter case, but checking the deletion is unnecessary in this case since the acquire attempt would fail if it were deleted). There is no intended path for a queue entry to transition from being dequeued to anything other than the deleted state, and so any logic inspecting for current/available messages should also disregard entries in the dequeued state. Areas known to be affected by the issues include the various SimpleAMQQueue#'getMessagesOnTheQueue' type operations in used for tasks such as viewing / copying / moving messages via JMX, and also the QueueEntryImpl#getNext method used to traverse the QueueEntryList during delivery operations. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - Apache Qpid - AMQP Messaging Implementation Project: http://qpid.apache.org Use/Interact: mailto:dev-subscr...@qpid.apache.org
Re: [VOTE] Release 0.10
All, I've had a look at the details and I'd like to propose the release go ahead now, rather than be delayed any further. The reasoning is that the Java items highlighted are not strictly regressions (existing largely in Qpid 0.8) and more importantly the fixes for them, and associated issues, are likely best measured in weeks and 0.12 seems a realistic target for them. If they are fixed earlier, we could consider bringing forward the schedule for 0.12. Thoughts ? Regards, Marnie On Tue, Apr 19, 2011 at 4:22 PM, Rajith Attapattu rajit...@gmail.comwrote: We do have two serious regressions in QPID-3214 QPID-3216 But I don't think we should stop the release as I don't believe we can fix them quickly and safely enough to make this release viable. We are trying very hard to release often and I don't want to jeopardize that. Since we do have another release coming in another 2 months, there is no point delaying this release any further. Therefore IMO we should release-note these issues and go ahead. Another option is to do an errata just for the java client (or any other component with a similar problem) as soon as we are confident about the fixes. Regards, Rajith On Tue, Apr 19, 2011 at 7:36 AM, Robbie Gemmell robbie.gemm...@gmail.com wrote: +1 Robbie On 14 April 2011 15:39, Justin Ross jr...@redhat.com wrote: Hello, everyone. The blocker issue raised earlier this week has been resolved. There's more information, including release notes, at the release page[1]. The proposed final distribution of Qpid 0.10 is available from the link below. It's from revision 1091571 of the 0.10 branch. http://people.apache.org/~astitcher/dist/qpid-0.10/ Andrew has graciously generated and signed this distro. It therefore meets the requirements for a vote. If you favor releasing this distribution as Qpid 0.10, vote +1. If you are aware of problems that ought to prevent this distribution from being released, vote -1. Thanks, Justin --- [1] https://cwiki.apache.org/confluence/display/qpid/0.10+Release - Apache Qpid - AMQP Messaging Implementation Project: http://qpid.apache.org Use/Interact: mailto:dev-subscr...@qpid.apache.org - Apache Qpid - AMQP Messaging Implementation Project: http://qpid.apache.org Use/Interact: mailto:dev-subscr...@qpid.apache.org
[jira] [Resolved] (QPID-3213) enable publishing of Maven artifacts for the client to the ASF Nexus instance
[ https://issues.apache.org/jira/browse/QPID-3213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Keith Wall resolved QPID-3213. -- Resolution: Fixed Hi Robbie. No review comments from me. Thanks Keith enable publishing of Maven artifacts for the client to the ASF Nexus instance - Key: QPID-3213 URL: https://issues.apache.org/jira/browse/QPID-3213 Project: Qpid Issue Type: Task Components: Ant Build System Reporter: Robbie Gemmell Assignee: Keith Wall Fix For: 0.11 Maven artifacts have been produced for the client as part of the 0.10 release. These need to be published to the ASF's Nexus instance, so support should be added for uploading the files to a staging repository so that the artifacts can be released. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - Apache Qpid - AMQP Messaging Implementation Project: http://qpid.apache.org Use/Interact: mailto:dev-subscr...@qpid.apache.org
[jira] [Updated] (QPID-2498) Upgrade Mina to 1.1.7
[ https://issues.apache.org/jira/browse/QPID-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robbie Gemmell updated QPID-2498: - Fix Version/s: (was: 0.9) 0.11 Upgrade Mina to 1.1.7 - Key: QPID-2498 URL: https://issues.apache.org/jira/browse/QPID-2498 Project: Qpid Issue Type: Improvement Components: Java Broker, Java Client Affects Versions: 0.6 Reporter: Emmanuel Bourg Assignee: Andrew Kennedy Fix For: 0.11 Attachments: mina-update.patch, mina-upgrade.patch Upgrading Mina to 1.1.7 would allow the removal of backport-util-concurrent.jar from the dependencies. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - Apache Qpid - AMQP Messaging Implementation Project: http://qpid.apache.org Use/Interact: mailto:dev-subscr...@qpid.apache.org
[jira] [Updated] (QPID-2476) Complete ACL implementation for 0-10 code path
[ https://issues.apache.org/jira/browse/QPID-2476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robbie Gemmell updated QPID-2476: - Fix Version/s: (was: 0.9) 0.11 Complete ACL implementation for 0-10 code path -- Key: QPID-2476 URL: https://issues.apache.org/jira/browse/QPID-2476 Project: Qpid Issue Type: New Feature Components: Java Broker Affects Versions: 0.7, 0.9 Reporter: Andrew Kennedy Labels: qpid, security Fix For: 0.11 Attachments: acl.txt, method-considered-harmful.txt, method-redux.txt Original Estimate: 336h Remaining Estimate: 336h Complete ACL implementation for 0-10 code path, providing an ACLv2 implementation that covers the following features/requirements: - Best practice security design - Support for roles/groups - Appropriate for standard stores for authorisation credentials (e.g. LDAP, Kerberos) - Expressable as XML - Easy to store/backup/extract ACL config - Exception handling catching at point of ACL application and return to client via Connection ExceptionListener with correct error code, log failure in broker - No significant performance cost on publish, permissions to be cached - Security handled at correct level of abstraction internally - Interoperability with existing ACLv2 -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - Apache Qpid - AMQP Messaging Implementation Project: http://qpid.apache.org Use/Interact: mailto:dev-subscr...@qpid.apache.org
RE: [VOTE] Release 0.10
That seems reasonable to me, Marnie. -Steve -Original Message- From: Marnie McCormack [mailto:marnie.mccorm...@googlemail.com] Sent: Wednesday, April 20, 2011 5:37 AM To: dev@qpid.apache.org Subject: Re: [VOTE] Release 0.10 All, I've had a look at the details and I'd like to propose the release go ahead now, rather than be delayed any further. The reasoning is that the Java items highlighted are not strictly regressions (existing largely in Qpid 0.8) and more importantly the fixes for them, and associated issues, are likely best measured in weeks and 0.12 seems a realistic target for them. If they are fixed earlier, we could consider bringing forward the schedule for 0.12. Thoughts ? Regards, Marnie On Tue, Apr 19, 2011 at 4:22 PM, Rajith Attapattu rajit...@gmail.comwrote: We do have two serious regressions in QPID-3214 QPID-3216 But I don't think we should stop the release as I don't believe we can fix them quickly and safely enough to make this release viable. We are trying very hard to release often and I don't want to jeopardize that. Since we do have another release coming in another 2 months, there is no point delaying this release any further. Therefore IMO we should release-note these issues and go ahead. Another option is to do an errata just for the java client (or any other component with a similar problem) as soon as we are confident about the fixes. Regards, Rajith On Tue, Apr 19, 2011 at 7:36 AM, Robbie Gemmell robbie.gemm...@gmail.com wrote: +1 Robbie On 14 April 2011 15:39, Justin Ross jr...@redhat.com wrote: Hello, everyone. The blocker issue raised earlier this week has been resolved. There's more information, including release notes, at the release page[1]. The proposed final distribution of Qpid 0.10 is available from the link below. It's from revision 1091571 of the 0.10 branch. http://people.apache.org/~astitcher/dist/qpid-0.10/ Andrew has graciously generated and signed this distro. It therefore meets the requirements for a vote. If you favor releasing this distribution as Qpid 0.10, vote +1. If you are aware of problems that ought to prevent this distribution from being released, vote -1. Thanks, Justin --- [1] https://cwiki.apache.org/confluence/display/qpid/0.10+Release --- -- Apache Qpid - AMQP Messaging Implementation Project: http://qpid.apache.org Use/Interact: mailto:dev-subscr...@qpid.apache.org - Apache Qpid - AMQP Messaging Implementation Project: http://qpid.apache.org Use/Interact: mailto:dev-subscr...@qpid.apache.org - Apache Qpid - AMQP Messaging Implementation Project: http://qpid.apache.org Use/Interact: mailto:dev-subscr...@qpid.apache.org
[jira] [Updated] (QPID-3214) Deadlock between the failover mutex (in AMQConnection.java) and the current_exception_lock (in AMQSession.java)
[ https://issues.apache.org/jira/browse/QPID-3214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gordon Sim updated QPID-3214: - Attachment: QPID-3214.patch Suggested fix. This simply takes the call to connection.exceptionReceived() outside the scope of the current_exception_lock in the session. Holding the session lock over that call would not in any case prevent other sessions calling in on the connection, so its not clear what value there is in doing so. Deadlock between the failover mutex (in AMQConnection.java) and the current_exception_lock (in AMQSession.java) --- Key: QPID-3214 URL: https://issues.apache.org/jira/browse/QPID-3214 Project: Qpid Issue Type: Bug Components: Java Client Affects Versions: 0.10 Reporter: Rajith Attapattu Assignee: Rajith Attapattu Priority: Critical Fix For: 0.11 Attachments: QPID-3214.patch As per the following thread dump you can clearly see the deadlock between the failover mutex in AMQConnection.java and the current_exception_lock in AMQSession.java This is a regression and was introduced in rev 985262 Found one Java-level deadlock: = IoReceiver - localhost/127.0.0.1:15672: waiting to lock monitor 0x002ac2ea3b70 (object 0x002ab70156b0, a java.lang.Object), which is held by main main: waiting to lock monitor 0x002ac28db1b8 (object 0x002ab7048d70, a java.lang.Object), which is held by IoReceiver - localhost/127.0.0.1:15672 Java stack information for the threads listed above: === IoReceiver - localhost/127.0.0.1:15672: at org.apache.qpid.client.AMQConnection.exceptionReceived(AMQConnection.java:1297) - waiting to lock0x002ab70156b0 (a java.lang.Object) at org.apache.qpid.client.AMQSession_0_10.setCurrentException(AMQSession_0_10.java:1033) - locked0x002ab7048d70 (a java.lang.Object) at org.apache.qpid.client.AMQSession_0_10.exception(AMQSession_0_10.java:913) at org.apache.qpid.transport.SessionDelegate.executionException(SessionDelegate.java:156) at org.apache.qpid.transport.SessionDelegate.executionException(SessionDelegate.java:32) at org.apache.qpid.transport.ExecutionException.dispatch(ExecutionException.java:112) at org.apache.qpid.transport.SessionDelegate.command(SessionDelegate.java:50) at org.apache.qpid.transport.SessionDelegate.command(SessionDelegate.java:32) at org.apache.qpid.transport.Method.delegate(Method.java:159) at org.apache.qpid.transport.Session.received(Session.java:528) at org.apache.qpid.transport.Connection.dispatch(Connection.java:404) at org.apache.qpid.transport.ConnectionDelegate.handle(ConnectionDelegate.java:64) at org.apache.qpid.transport.ConnectionDelegate.handle(ConnectionDelegate.java:40) at org.apache.qpid.transport.MethodDelegate.executionException(MethodDelegate.java:110) at org.apache.qpid.transport.ExecutionException.dispatch(ExecutionException.java:112) at org.apache.qpid.transport.ConnectionDelegate.command(ConnectionDelegate.java:54) at org.apache.qpid.transport.ConnectionDelegate.command(ConnectionDelegate.java:40) at org.apache.qpid.transport.Method.delegate(Method.java:159) at org.apache.qpid.transport.Connection.received(Connection.java:369) at org.apache.qpid.transport.Connection.received(Connection.java:59) at org.apache.qpid.transport.network.Assembler.emit(Assembler.java:95) at org.apache.qpid.transport.network.Assembler.assemble(Assembler.java:196) at org.apache.qpid.transport.network.Assembler.frame(Assembler.java:129) at org.apache.qpid.transport.network.Frame.delegate(Frame.java:133) at org.apache.qpid.transport.network.Assembler.received(Assembler.java:100) at org.apache.qpid.transport.network.Assembler.received(Assembler.java:42) at org.apache.qpid.transport.network.InputHandler.next(InputHandler.java:187) at org.apache.qpid.transport.network.InputHandler.received(InputHandler.java:103) at org.apache.qpid.transport.network.InputHandler.received(InputHandler.java:42) at org.apache.qpid.transport.network.io.IoReceiver.run(IoReceiver.java:128) at java.lang.Thread.run(Thread.java:619) main: at org.apache.qpid.client.AMQSession_0_10.setCurrentException(AMQSession_0_10.java:1025) - waiting to lock0x002ab7048d70 (a java.lang.Object) at org.apache.qpid.client.BasicMessageConsumer_0_10.sendCancel(BasicMessageConsumer_0_10.java:193) at
Re: [VOTE] Release 0.10
On 04/20/2011 10:36 AM, Marnie McCormack wrote: I've had a look at the details and I'd like to propose the release go ahead now, rather than be delayed any further. The reasoning is that the Java items highlighted are not strictly regressions (existing largely in Qpid 0.8) and more importantly the fixes for them, and associated issues, are likely best measured in weeks and 0.12 seems a realistic target for them. QPID-3216 was apparently caused by r1092510, itself a fix to QPID-3207 made last Thursday, which isn't even on the 0-10 branch as far as I can see. That being the case I don't think we need to consider blocking for this one (unless QPID-3207 was considered a blocker). That leaves QPID-3214 which was apparently caused by r985262 made on Aug 13 2010. As Marnie points out we didn't branch for 0.8 until Nov 7 2010, r1032358, so this bug exists in 0.8 and is not *in itself* a regression. That particular deadlock looks like it may be triggered when the cancellation of a subscription occurs. It *could* therefore be more visible following the changes made to the c++ broker for QPID-2324, which was fixed for 0.10 (throw 404 if client attempts to close a non-existent subscription). I tend to agree with Marnie. If the fix looks like it may take some time (or if there is any doubt or risk associated with it), it may be best to proceed with the 0.10 release without this change. - Apache Qpid - AMQP Messaging Implementation Project: http://qpid.apache.org Use/Interact: mailto:dev-subscr...@qpid.apache.org
[jira] [Commented] (QPID-3214) Deadlock between the failover mutex (in AMQConnection.java) and the current_exception_lock (in AMQSession.java)
[ https://issues.apache.org/jira/browse/QPID-3214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13022110#comment-13022110 ] Gordon Sim commented on QPID-3214: -- AMQSession_0_10.java seems to be duplicating in part what the transport.Session is doing (i.e. holding exception). In the trace above the threads are each trying to set the current exception on the session to the same thing I believe. My suggested 'fix' btw is a workaround of the deadlock only. It doesn't try to untangle the code. Deadlock between the failover mutex (in AMQConnection.java) and the current_exception_lock (in AMQSession.java) --- Key: QPID-3214 URL: https://issues.apache.org/jira/browse/QPID-3214 Project: Qpid Issue Type: Bug Components: Java Client Affects Versions: 0.10 Reporter: Rajith Attapattu Assignee: Rajith Attapattu Priority: Critical Fix For: 0.11 Attachments: QPID-3214.patch As per the following thread dump you can clearly see the deadlock between the failover mutex in AMQConnection.java and the current_exception_lock in AMQSession.java This is a regression and was introduced in rev 985262 Found one Java-level deadlock: = IoReceiver - localhost/127.0.0.1:15672: waiting to lock monitor 0x002ac2ea3b70 (object 0x002ab70156b0, a java.lang.Object), which is held by main main: waiting to lock monitor 0x002ac28db1b8 (object 0x002ab7048d70, a java.lang.Object), which is held by IoReceiver - localhost/127.0.0.1:15672 Java stack information for the threads listed above: === IoReceiver - localhost/127.0.0.1:15672: at org.apache.qpid.client.AMQConnection.exceptionReceived(AMQConnection.java:1297) - waiting to lock0x002ab70156b0 (a java.lang.Object) at org.apache.qpid.client.AMQSession_0_10.setCurrentException(AMQSession_0_10.java:1033) - locked0x002ab7048d70 (a java.lang.Object) at org.apache.qpid.client.AMQSession_0_10.exception(AMQSession_0_10.java:913) at org.apache.qpid.transport.SessionDelegate.executionException(SessionDelegate.java:156) at org.apache.qpid.transport.SessionDelegate.executionException(SessionDelegate.java:32) at org.apache.qpid.transport.ExecutionException.dispatch(ExecutionException.java:112) at org.apache.qpid.transport.SessionDelegate.command(SessionDelegate.java:50) at org.apache.qpid.transport.SessionDelegate.command(SessionDelegate.java:32) at org.apache.qpid.transport.Method.delegate(Method.java:159) at org.apache.qpid.transport.Session.received(Session.java:528) at org.apache.qpid.transport.Connection.dispatch(Connection.java:404) at org.apache.qpid.transport.ConnectionDelegate.handle(ConnectionDelegate.java:64) at org.apache.qpid.transport.ConnectionDelegate.handle(ConnectionDelegate.java:40) at org.apache.qpid.transport.MethodDelegate.executionException(MethodDelegate.java:110) at org.apache.qpid.transport.ExecutionException.dispatch(ExecutionException.java:112) at org.apache.qpid.transport.ConnectionDelegate.command(ConnectionDelegate.java:54) at org.apache.qpid.transport.ConnectionDelegate.command(ConnectionDelegate.java:40) at org.apache.qpid.transport.Method.delegate(Method.java:159) at org.apache.qpid.transport.Connection.received(Connection.java:369) at org.apache.qpid.transport.Connection.received(Connection.java:59) at org.apache.qpid.transport.network.Assembler.emit(Assembler.java:95) at org.apache.qpid.transport.network.Assembler.assemble(Assembler.java:196) at org.apache.qpid.transport.network.Assembler.frame(Assembler.java:129) at org.apache.qpid.transport.network.Frame.delegate(Frame.java:133) at org.apache.qpid.transport.network.Assembler.received(Assembler.java:100) at org.apache.qpid.transport.network.Assembler.received(Assembler.java:42) at org.apache.qpid.transport.network.InputHandler.next(InputHandler.java:187) at org.apache.qpid.transport.network.InputHandler.received(InputHandler.java:103) at org.apache.qpid.transport.network.InputHandler.received(InputHandler.java:42) at org.apache.qpid.transport.network.io.IoReceiver.run(IoReceiver.java:128) at java.lang.Thread.run(Thread.java:619) main: at org.apache.qpid.client.AMQSession_0_10.setCurrentException(AMQSession_0_10.java:1025) - waiting to lock0x002ab7048d70 (a java.lang.Object) at org.apache.qpid.client.BasicMessageConsumer_0_10.sendCancel(BasicMessageConsumer_0_10.java:193) at
Re: [VOTE] Release 0.10
Marnie, that sounds quite reasonable to me. Justin On Wed, 20 Apr 2011, Marnie McCormack wrote: All, I've had a look at the details and I'd like to propose the release go ahead now, rather than be delayed any further. The reasoning is that the Java items highlighted are not strictly regressions (existing largely in Qpid 0.8) and more importantly the fixes for them, and associated issues, are likely best measured in weeks and 0.12 seems a realistic target for them. If they are fixed earlier, we could consider bringing forward the schedule for 0.12. Thoughts ? Regards, Marnie On Tue, Apr 19, 2011 at 4:22 PM, Rajith Attapattu rajit...@gmail.comwrote: We do have two serious regressions in QPID-3214 QPID-3216 But I don't think we should stop the release as I don't believe we can fix them quickly and safely enough to make this release viable. We are trying very hard to release often and I don't want to jeopardize that. Since we do have another release coming in another 2 months, there is no point delaying this release any further. Therefore IMO we should release-note these issues and go ahead. Another option is to do an errata just for the java client (or any other component with a similar problem) as soon as we are confident about the fixes. Regards, Rajith On Tue, Apr 19, 2011 at 7:36 AM, Robbie Gemmell robbie.gemm...@gmail.com wrote: +1 Robbie On 14 April 2011 15:39, Justin Ross jr...@redhat.com wrote: Hello, everyone. The blocker issue raised earlier this week has been resolved. There's more information, including release notes, at the release page[1]. The proposed final distribution of Qpid 0.10 is available from the link below. It's from revision 1091571 of the 0.10 branch. http://people.apache.org/~astitcher/dist/qpid-0.10/ Andrew has graciously generated and signed this distro. It therefore meets the requirements for a vote. If you favor releasing this distribution as Qpid 0.10, vote +1. If you are aware of problems that ought to prevent this distribution from being released, vote -1. Thanks, Justin --- [1] https://cwiki.apache.org/confluence/display/qpid/0.10+Release - Apache Qpid - AMQP Messaging Implementation Project: http://qpid.apache.org Use/Interact: mailto:dev-subscr...@qpid.apache.org - Apache Qpid - AMQP Messaging Implementation Project: http://qpid.apache.org Use/Interact: mailto:dev-subscr...@qpid.apache.org - Apache Qpid - AMQP Messaging Implementation Project: http://qpid.apache.org Use/Interact: mailto:dev-subscr...@qpid.apache.org
[jira] [Commented] (QPID-3215) cached exchange reference can cause cluster inconsistencies if exchange is deleted/recreated
[ https://issues.apache.org/jira/browse/QPID-3215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13022120#comment-13022120 ] Alan Conway commented on QPID-3215: --- Review https://reviews.apache.org/r/623/ cached exchange reference can cause cluster inconsistencies if exchange is deleted/recreated Key: QPID-3215 URL: https://issues.apache.org/jira/browse/QPID-3215 Project: Qpid Issue Type: Bug Components: C++ Broker, C++ Clustering Affects Versions: 0.10 Reporter: Alan Conway Assignee: Alan Conway Fix For: 0.11 Description of problem: SemanticState::route() uses a simple cache variable to avoid looking up the exchange for every message. However if the exchange in question is deleted, even if then recreated, this can cause inconsistencies in a cluster. Version-Release number of selected component (if applicable): 1.3 How reproducible: 100% (Quite a contrived example though) Steps to Reproduce: 1. start one cluster node 2. create an exchange, a queue and a binding between them qpid-config add exchange topic x qpid-config add queue q qpid-config bind x q k 3. start a session and send a message to the exchange with the relevant key (leave session running) qpid-send --content-stdin --address x/k then enter a few lines to send some messages 4. start a new cluster node 5. delete and recreate the exchange, this time add in a different binding qpid-config del exchange x qpid-config add exchange topic x qpid-config add queue q2 qpid-config bind x q2 k 6. send some more messages on the session from 3. with same exchange and key (i.e. type in some more messages if using qpid-send as suggested) now have an inconsistency where the second node has some messages in q2 and some (though fewer than first node) in q1, whereas for first node all the messages are in q1 7. qpid-receive --address 'q2; {mode: browse}' --broker localhost:5673 --capacity 1 (assuming second node is 5673) Actual results: First node shutsdown with inconsistent error Expected results: No inconsistency, should be able to run the command in 7 against q or q2 on either node and see the same results. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - Apache Qpid - AMQP Messaging Implementation Project: http://qpid.apache.org Use/Interact: mailto:dev-subscr...@qpid.apache.org
Re: [VOTE] Release 0.10
While the root cause for both these items have been present in 0.8 (and perhaps before for QPID-3216) these issues are *more likely* to happen in the current release than in 0.8 In that sense they are regressions, and certainly from a users pov of they are. I think recent changes in the client and broker may have made these issues happen more likely. While r1092510 may have caused QPID-3216 to happen more readily there may be other triggers that can cause this as well. (Also please note that r1092510 is actually the correct behaviour and if thats causing a deadlock then it's a concern.) Same can be said for QPID-3214. The fact remains that we do have deadlocks lying around in the code and they have a better chance of happening with 0.10 ! Having said that I agree with Marnie that we should not be holding up the release, as fixing these issues safely may not be possible within the next week or so. (We need to release note both these issues, if we go ahead with the release.) Regards, Rajith On Wed, Apr 20, 2011 at 9:14 AM, Gordon Sim g...@redhat.com wrote: On 04/20/2011 10:36 AM, Marnie McCormack wrote: I've had a look at the details and I'd like to propose the release go ahead now, rather than be delayed any further. The reasoning is that the Java items highlighted are not strictly regressions (existing largely in Qpid 0.8) and more importantly the fixes for them, and associated issues, are likely best measured in weeks and 0.12 seems a realistic target for them. QPID-3216 was apparently caused by r1092510, itself a fix to QPID-3207 made last Thursday, which isn't even on the 0-10 branch as far as I can see. That being the case I don't think we need to consider blocking for this one (unless QPID-3207 was considered a blocker). That leaves QPID-3214 which was apparently caused by r985262 made on Aug 13 2010. As Marnie points out we didn't branch for 0.8 until Nov 7 2010, r1032358, so this bug exists in 0.8 and is not *in itself* a regression. That particular deadlock looks like it may be triggered when the cancellation of a subscription occurs. It *could* therefore be more visible following the changes made to the c++ broker for QPID-2324, which was fixed for 0.10 (throw 404 if client attempts to close a non-existent subscription). I tend to agree with Marnie. If the fix looks like it may take some time (or if there is any doubt or risk associated with it), it may be best to proceed with the 0.10 release without this change. - Apache Qpid - AMQP Messaging Implementation Project: http://qpid.apache.org Use/Interact: mailto:dev-subscr...@qpid.apache.org - Apache Qpid - AMQP Messaging Implementation Project: http://qpid.apache.org Use/Interact: mailto:dev-subscr...@qpid.apache.org
Re: Proposed fixes for 0.10 (was Re: What is our strategy for errata and/or hot fixes ?)
On 04/19/2011 02:31 PM, Alan Conway wrote: On 04/19/2011 12:26 PM, Rajith Attapattu wrote: Based on the comments so far it seems that delaying the 0.10 release is probably a better idea. I agree with Carl and Steve that erratas can be time consuming and may actually be detrimental to our overall goals. For now lets hold on to the release a bit and see what needs to be done to fix QPID-3214 QPID-3216. Based on that we can make a call to just release note it and fix it in 0.12 or whether we can safely fix them for 0.10 If we are holding up 0.10 release I'd like to propose these two fixes: QPID-3215 cached exchange reference can cause cluster inconsistencies if exchange is deleted/recreated QPID-3208 Exchanges make best effort to route messages if there is an error. They are not regressions but both can cause cluster brokers to shut down (and have for at least one user.) The fixes are low risk and already on the trunk. I'd like to add this trivial fix: QPID-3217 Exchanges with IVE option cause cluster inconsistencies in updatees All 3 are on trunk and have had code review. - Apache Qpid - AMQP Messaging Implementation Project: http://qpid.apache.org Use/Interact: mailto:dev-subscr...@qpid.apache.org
[jira] [Commented] (QPID-3217) Exchanges with IVE option cause cluster inconsistencies in updatees
[ https://issues.apache.org/jira/browse/QPID-3217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13022126#comment-13022126 ] Gordon Sim commented on QPID-3217: -- This change would be fine for 0.10; it adds no risk and makes the fact that the IVE feature doesn't work in a cluster explicit. Exchanges with IVE option cause cluster inconsistencies in updatees --- Key: QPID-3217 URL: https://issues.apache.org/jira/browse/QPID-3217 Project: Qpid Issue Type: Bug Components: C++ Clustering Affects Versions: 0.10 Reporter: Alan Conway Assignee: Alan Conway Fix For: 0.11 Steps to Reproduce: 1. start a cluster node 2. create an exchange with the IVE option qpid-config add exchange topic my-topic-exchange --ive 3. send a message to that exchange spout --content abc my-topic-exchange/my-key 4. start another node in the cluster 5. create a queue and bind it to the exchange created in 2. using the key with which the message was sent in 3. qpid-config add queue my-queue qpid-config bind my-topic-exchange my-queue my-key (queue is now inconsistent; on the first node it has a message, on the second it has none) 6. run ./src/tests/qpid-receive --capacity 1 --address 'my-queue; {mode: browse}' against the first node Actual results: second node shutsdown with something like: 2011-04-14 14:33:04 error Execution exception: invalid-argument: anonymous.35276a61-4f3a-46a9-a070-e88c6c6ac01f: confirmed (2+0) but only sent (1+0) (../../src/qpid/SessionState.cpp:154) 2011-04-14 14:33:04 critical cluster(192.168.0.3:9532 READY/error) local error 832 did not occur on member 192.168.0.3:9482: invalid-argument: anonymous.35276a61-4f3a-46a9-a070-e88c6c6ac01f: confirmed (2+0) but only sent (1+0) (../../src/qpid/SessionState.cpp:154) Expected results: no shutdown and the command in 6 can be repeated against wither node with the same results Additional info: Root of the problem is that exchange with IVE holds extra state in the form of the 'last message', and this is not transferred to new members. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - Apache Qpid - AMQP Messaging Implementation Project: http://qpid.apache.org Use/Interact: mailto:dev-subscr...@qpid.apache.org
Re: Proposed fixes for 0.10 (was Re: What is our strategy for errata and/or hot fixes ?)
On Wed, 20 Apr 2011, Alan Conway wrote: On 04/19/2011 02:31 PM, Alan Conway wrote: If we are holding up 0.10 release I'd like to propose these two fixes: QPID-3215 cached exchange reference can cause cluster inconsistencies if exchange is deleted/recreated QPID-3208 Exchanges make best effort to route messages if there is an error. They are not regressions but both can cause cluster brokers to shut down (and have for at least one user.) The fixes are low risk and already on the trunk. I'd like to add this trivial fix: QPID-3217 Exchanges with IVE option cause cluster inconsistencies in updatees All 3 are on trunk and have had code review. Hi, Alan. It's looking more and more like we will not extend 0.10 and not produce a new RC, so these will probably need to wait for 0.12. Justin - Apache Qpid - AMQP Messaging Implementation Project: http://qpid.apache.org Use/Interact: mailto:dev-subscr...@qpid.apache.org
[jira] [Created] (QPID-3220) Specifying connection url option failover='singlebroker' causes the wrong failover policy to be used
Specifying connection url option failover='singlebroker' causes the wrong failover policy to be used Key: QPID-3220 URL: https://issues.apache.org/jira/browse/QPID-3220 Project: Qpid Issue Type: Bug Components: Java Client Affects Versions: 0.5, 0.10 Reporter: Keith Wall Priority: Minor If the user specifies the option failover='singlebroker' in the connection URL, an coding error in FailoverPolicy.java means that the client actually elects to use FailoverRoundRobinServers policy rather than FailoverSingleServer. For example, the following URL will cause the client to use the round-robin policy. {code} amqp://guest:guest@clientid/test?brokerlist='tcp://localhost:5672?retries='50'connectdelay='5000''failover='singlebroker' {code} Furthermore, as the logic FailoverRoundRobinServers.getNextBrokerDetails means that if the roundrobin policy is used with a brokerlist containing *one broker*, connectdelay is never applied, this defect also means that if a user passes failover='singlebroker' with a broker list containing one entry, connectdelay is ignored. This can give the appearance that server reconnection is broken as the client can spin through thousands of retries before then broker has chance to restart. To workaround this issue, the user can omit the failover option from the URL and let the defaulting within FailoverPolicy class choose FailoverSingleServer if the brokerlist contains one entry, or FailoverRoundRobinServers if the list contains more than one. Client will default to singlebroker policy: {code} amqp://guest:guest@clientid/test?brokerlist='tcp://localhost:5672' {code} Client will default to roundrobin policy: {code} amqp://guest:guest@clientid/test?brokerlist='tcp://localhost:5672;tcp://localhost:5672' {code} -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - Apache Qpid - AMQP Messaging Implementation Project: http://qpid.apache.org Use/Interact: mailto:dev-subscr...@qpid.apache.org
Re: [VOTE] Release 0.10
On 04/20/2011 02:38 PM, Rajith Attapattu wrote: While the root cause for both these items have been present in 0.8 (and perhaps before for QPID-3216) these issues are *more likely* to happen in the current release than in 0.8 In that sense they are regressions, and certainly from a users pov of they are. What is that based on? The fact that we've seen these test failures? Or identification of specific changes first included in 0.10 that make this worse? I think recent changes in the client and broker may have made these issues happen more likely. While r1092510 may have caused QPID-3216 to happen more readily there may be other triggers that can cause this as well. (Also please note that r1092510 is actually the correct behaviour and if thats causing a deadlock then it's a concern.) The point is r1092510 is not included in the current 0.10 release candidate. Same can be said for QPID-3214. The fact remains that we do have deadlocks lying around in the code and they have a better chance of happening with 0.10 ! Again, why do 'they have a better chance of happening with 0.10'? I'm not saying it is not true, and I'm not disagreeing the current locking 'strategy' seems very prone to deadlocks. I would just like to see a more concrete demonstration that there is regression. - Apache Qpid - AMQP Messaging Implementation Project: http://qpid.apache.org Use/Interact: mailto:dev-subscr...@qpid.apache.org
[jira] [Updated] (QPID-3220) Specifying connection url option failover='singlebroker' causes the wrong failover policy to be used
[ https://issues.apache.org/jira/browse/QPID-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Keith Wall updated QPID-3220: - Description: If the user specifies the option failover='singlebroker' in the connection URL, a coding error in FailoverPolicy.java means that the client actually elects to use FailoverRoundRobinServers policy rather than FailoverSingleServer. For example, the following URL will cause the client to use the round-robin policy. {code} amqp://guest:guest@clientid/test?brokerlist='tcp://localhost:5672?retries='50'connectdelay='5000''failover='singlebroker' {code} Furthermore, as the logic FailoverRoundRobinServers.getNextBrokerDetails means that if the roundrobin policy is used with a brokerlist containing *one broker*, connectdelay is never applied, this defect also means that if a user passes failover='singlebroker' with a broker list containing one entry, connectdelay is ignored. This can give the appearance that server reconnection is broken as the client can spin through thousands of retries before then broker has chance to restart. To workaround this issue, the user can omit the failover option from the URL and let the defaulting within FailoverPolicy class choose FailoverSingleServer if the brokerlist contains one entry, or FailoverRoundRobinServers if the list contains more than one. Client will default to singlebroker policy: {code} amqp://guest:guest@clientid/test?brokerlist='tcp://localhost:5672' {code} Client will default to roundrobin policy: {code} amqp://guest:guest@clientid/test?brokerlist='tcp://localhost:5672;tcp://localhost:5672' {code} was: If the user specifies the option failover='singlebroker' in the connection URL, an coding error in FailoverPolicy.java means that the client actually elects to use FailoverRoundRobinServers policy rather than FailoverSingleServer. For example, the following URL will cause the client to use the round-robin policy. {code} amqp://guest:guest@clientid/test?brokerlist='tcp://localhost:5672?retries='50'connectdelay='5000''failover='singlebroker' {code} Furthermore, as the logic FailoverRoundRobinServers.getNextBrokerDetails means that if the roundrobin policy is used with a brokerlist containing *one broker*, connectdelay is never applied, this defect also means that if a user passes failover='singlebroker' with a broker list containing one entry, connectdelay is ignored. This can give the appearance that server reconnection is broken as the client can spin through thousands of retries before then broker has chance to restart. To workaround this issue, the user can omit the failover option from the URL and let the defaulting within FailoverPolicy class choose FailoverSingleServer if the brokerlist contains one entry, or FailoverRoundRobinServers if the list contains more than one. Client will default to singlebroker policy: {code} amqp://guest:guest@clientid/test?brokerlist='tcp://localhost:5672' {code} Client will default to roundrobin policy: {code} amqp://guest:guest@clientid/test?brokerlist='tcp://localhost:5672;tcp://localhost:5672' {code} Specifying connection url option failover='singlebroker' causes the wrong failover policy to be used Key: QPID-3220 URL: https://issues.apache.org/jira/browse/QPID-3220 Project: Qpid Issue Type: Bug Components: Java Client Affects Versions: 0.5, 0.10 Reporter: Keith Wall Priority: Minor If the user specifies the option failover='singlebroker' in the connection URL, a coding error in FailoverPolicy.java means that the client actually elects to use FailoverRoundRobinServers policy rather than FailoverSingleServer. For example, the following URL will cause the client to use the round-robin policy. {code} amqp://guest:guest@clientid/test?brokerlist='tcp://localhost:5672?retries='50'connectdelay='5000''failover='singlebroker' {code} Furthermore, as the logic FailoverRoundRobinServers.getNextBrokerDetails means that if the roundrobin policy is used with a brokerlist containing *one broker*, connectdelay is never applied, this defect also means that if a user passes failover='singlebroker' with a broker list containing one entry, connectdelay is ignored. This can give the appearance that server reconnection is broken as the client can spin through thousands of retries before then broker has chance to restart. To workaround this issue, the user can omit the failover option from the URL and let the defaulting within FailoverPolicy class choose FailoverSingleServer if the brokerlist contains one entry, or FailoverRoundRobinServers if the list contains more than one. Client will default to singlebroker policy: {code}
[jira] [Commented] (QPID-3214) Deadlock between the failover mutex (in AMQConnection.java) and the current_exception_lock (in AMQSession.java)
[ https://issues.apache.org/jira/browse/QPID-3214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13022150#comment-13022150 ] Rajith Attapattu commented on QPID-3214: I am actually keen to fix the exception handling part rather than addressing the deadlock directly. I agree with Gordon that the exception handling code could be improved a bit. I am hoping to work out a patch soon. Deadlock between the failover mutex (in AMQConnection.java) and the current_exception_lock (in AMQSession.java) --- Key: QPID-3214 URL: https://issues.apache.org/jira/browse/QPID-3214 Project: Qpid Issue Type: Bug Components: Java Client Affects Versions: 0.10 Reporter: Rajith Attapattu Assignee: Rajith Attapattu Priority: Critical Fix For: 0.11 Attachments: QPID-3214.patch As per the following thread dump you can clearly see the deadlock between the failover mutex in AMQConnection.java and the current_exception_lock in AMQSession.java This is a regression and was introduced in rev 985262 Found one Java-level deadlock: = IoReceiver - localhost/127.0.0.1:15672: waiting to lock monitor 0x002ac2ea3b70 (object 0x002ab70156b0, a java.lang.Object), which is held by main main: waiting to lock monitor 0x002ac28db1b8 (object 0x002ab7048d70, a java.lang.Object), which is held by IoReceiver - localhost/127.0.0.1:15672 Java stack information for the threads listed above: === IoReceiver - localhost/127.0.0.1:15672: at org.apache.qpid.client.AMQConnection.exceptionReceived(AMQConnection.java:1297) - waiting to lock0x002ab70156b0 (a java.lang.Object) at org.apache.qpid.client.AMQSession_0_10.setCurrentException(AMQSession_0_10.java:1033) - locked0x002ab7048d70 (a java.lang.Object) at org.apache.qpid.client.AMQSession_0_10.exception(AMQSession_0_10.java:913) at org.apache.qpid.transport.SessionDelegate.executionException(SessionDelegate.java:156) at org.apache.qpid.transport.SessionDelegate.executionException(SessionDelegate.java:32) at org.apache.qpid.transport.ExecutionException.dispatch(ExecutionException.java:112) at org.apache.qpid.transport.SessionDelegate.command(SessionDelegate.java:50) at org.apache.qpid.transport.SessionDelegate.command(SessionDelegate.java:32) at org.apache.qpid.transport.Method.delegate(Method.java:159) at org.apache.qpid.transport.Session.received(Session.java:528) at org.apache.qpid.transport.Connection.dispatch(Connection.java:404) at org.apache.qpid.transport.ConnectionDelegate.handle(ConnectionDelegate.java:64) at org.apache.qpid.transport.ConnectionDelegate.handle(ConnectionDelegate.java:40) at org.apache.qpid.transport.MethodDelegate.executionException(MethodDelegate.java:110) at org.apache.qpid.transport.ExecutionException.dispatch(ExecutionException.java:112) at org.apache.qpid.transport.ConnectionDelegate.command(ConnectionDelegate.java:54) at org.apache.qpid.transport.ConnectionDelegate.command(ConnectionDelegate.java:40) at org.apache.qpid.transport.Method.delegate(Method.java:159) at org.apache.qpid.transport.Connection.received(Connection.java:369) at org.apache.qpid.transport.Connection.received(Connection.java:59) at org.apache.qpid.transport.network.Assembler.emit(Assembler.java:95) at org.apache.qpid.transport.network.Assembler.assemble(Assembler.java:196) at org.apache.qpid.transport.network.Assembler.frame(Assembler.java:129) at org.apache.qpid.transport.network.Frame.delegate(Frame.java:133) at org.apache.qpid.transport.network.Assembler.received(Assembler.java:100) at org.apache.qpid.transport.network.Assembler.received(Assembler.java:42) at org.apache.qpid.transport.network.InputHandler.next(InputHandler.java:187) at org.apache.qpid.transport.network.InputHandler.received(InputHandler.java:103) at org.apache.qpid.transport.network.InputHandler.received(InputHandler.java:42) at org.apache.qpid.transport.network.io.IoReceiver.run(IoReceiver.java:128) at java.lang.Thread.run(Thread.java:619) main: at org.apache.qpid.client.AMQSession_0_10.setCurrentException(AMQSession_0_10.java:1025) - waiting to lock0x002ab7048d70 (a java.lang.Object) at org.apache.qpid.client.BasicMessageConsumer_0_10.sendCancel(BasicMessageConsumer_0_10.java:193) at org.apache.qpid.client.BasicMessageConsumer.close(BasicMessageConsumer.java:573) -
[jira] [Created] (QPID-3221) Fix bug in Cluster::timerDrop - calling wrong function
Fix bug in Cluster::timerDrop - calling wrong function -- Key: QPID-3221 URL: https://issues.apache.org/jira/browse/QPID-3221 Project: Qpid Issue Type: Bug Components: C++ Clustering Affects Versions: 0.10 Reporter: Alan Conway Assignee: Alan Conway Fix For: 0.10 Fix obvious bug in Cluster::timerDrop, it calls the wrong function. On one occassion this has been seen to cause a cluster broker to go into an infinite loop sending itself timer-drop messages. There's no know way to reproduce but by inspection of the code the problem and the fix are obvious and trivial. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - Apache Qpid - AMQP Messaging Implementation Project: http://qpid.apache.org Use/Interact: mailto:dev-subscr...@qpid.apache.org
[jira] [Commented] (QPID-3221) Fix bug in Cluster::timerDrop - calling wrong function
[ https://issues.apache.org/jira/browse/QPID-3221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13022153#comment-13022153 ] Alan Conway commented on QPID-3221: --- Fixed on trunk r1089953 Fix bug in Cluster::timerDrop - calling wrong function -- Key: QPID-3221 URL: https://issues.apache.org/jira/browse/QPID-3221 Project: Qpid Issue Type: Bug Components: C++ Clustering Affects Versions: 0.10 Reporter: Alan Conway Assignee: Alan Conway Fix For: 0.10 Fix obvious bug in Cluster::timerDrop, it calls the wrong function. On one occassion this has been seen to cause a cluster broker to go into an infinite loop sending itself timer-drop messages. There's no know way to reproduce but by inspection of the code the problem and the fix are obvious and trivial. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - Apache Qpid - AMQP Messaging Implementation Project: http://qpid.apache.org Use/Interact: mailto:dev-subscr...@qpid.apache.org
[jira] [Resolved] (QPID-3221) Fix bug in Cluster::timerDrop - calling wrong function
[ https://issues.apache.org/jira/browse/QPID-3221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Conway resolved QPID-3221. --- Resolution: Fixed Fix bug in Cluster::timerDrop - calling wrong function -- Key: QPID-3221 URL: https://issues.apache.org/jira/browse/QPID-3221 Project: Qpid Issue Type: Bug Components: C++ Clustering Affects Versions: 0.10 Reporter: Alan Conway Assignee: Alan Conway Fix For: 0.10 Fix obvious bug in Cluster::timerDrop, it calls the wrong function. On one occassion this has been seen to cause a cluster broker to go into an infinite loop sending itself timer-drop messages. There's no know way to reproduce but by inspection of the code the problem and the fix are obvious and trivial. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - Apache Qpid - AMQP Messaging Implementation Project: http://qpid.apache.org Use/Interact: mailto:dev-subscr...@qpid.apache.org
Re: Proposed fixes for 0.10 (was Re: What is our strategy for errata and/or hot fixes ?)
On 04/20/2011 09:43 AM, Alan Conway wrote: On 04/19/2011 02:31 PM, Alan Conway wrote: On 04/19/2011 12:26 PM, Rajith Attapattu wrote: Based on the comments so far it seems that delaying the 0.10 release is probably a better idea. I agree with Carl and Steve that erratas can be time consuming and may actually be detrimental to our overall goals. For now lets hold on to the release a bit and see what needs to be done to fix QPID-3214 QPID-3216. Based on that we can make a call to just release note it and fix it in 0.12 or whether we can safely fix them for 0.10 If we are holding up 0.10 release I'd like to propose these two fixes: QPID-3215 cached exchange reference can cause cluster inconsistencies if exchange is deleted/recreated QPID-3208 Exchanges make best effort to route messages if there is an error. They are not regressions but both can cause cluster brokers to shut down (and have for at least one user.) The fixes are low risk and already on the trunk. I'd like to add this trivial fix: QPID-3217 Exchanges with IVE option cause cluster inconsistencies in updatees All 3 are on trunk and have had code review. I'd like to add 2 more - this is the final list: QPID-3208 Exchanges make best effort to route messages if there is an error. QPID-3215 cached exchange reference can cause cluster inconsistencies if exchange is deleted/recreated (2) QPID-3217 Exchanges with IVE option cause cluster inconsistencies in updatees QPID-3221 Fix bug in Cluster::timerDrop - calling wrong function QPID-3202: Clustered broker shuts down with unknown connection error - Apache Qpid - AMQP Messaging Implementation Project: http://qpid.apache.org Use/Interact: mailto:dev-subscr...@qpid.apache.org
[jira] [Created] (QPID-3222) Potentially TTL Overflow
Potentially TTL Overflow Key: QPID-3222 URL: https://issues.apache.org/jira/browse/QPID-3222 Project: Qpid Issue Type: Bug Components: C++ Broker Reporter: Jerome Ajot When a message TTL is set by the client to Duration::FOREVER, the message is not reachable anymore. Every addition and multiplication to the TTL/Expiration Time should check to avoid uint64_t overflow. For example: broker/Message.cpp: Message::setTimestamp() overflows the uint64_t. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - Apache Qpid - AMQP Messaging Implementation Project: http://qpid.apache.org Use/Interact: mailto:dev-subscr...@qpid.apache.org
[jira] [Assigned] (QPID-3222) Potentially TTL Overflow
[ https://issues.apache.org/jira/browse/QPID-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gordon Sim reassigned QPID-3222: Assignee: Gordon Sim Potentially TTL Overflow Key: QPID-3222 URL: https://issues.apache.org/jira/browse/QPID-3222 Project: Qpid Issue Type: Bug Components: C++ Broker Reporter: Jerome Ajot Assignee: Gordon Sim Labels: ttl When a message TTL is set by the client to Duration::FOREVER, the message is not reachable anymore. Every addition and multiplication to the TTL/Expiration Time should check to avoid uint64_t overflow. For example: broker/Message.cpp: Message::setTimestamp() overflows the uint64_t. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - Apache Qpid - AMQP Messaging Implementation Project: http://qpid.apache.org Use/Interact: mailto:dev-subscr...@qpid.apache.org
[jira] [Created] (QPID-3223) Setting TTL to 0 internally set it forever
Setting TTL to 0 internally set it forever -- Key: QPID-3223 URL: https://issues.apache.org/jira/browse/QPID-3223 Project: Qpid Issue Type: Bug Components: C++ Client Reporter: Jerome Ajot Priority: Minor Setting the Message TTL is 0 via the Client API, sets it internally to never expired. The AMQP spec (0.10 and 1.0) does not define zero as a special value. An easy way to fix this will be to define TTL as a boost::optionalboost::uint64_t, that way it can be optional/never expire and have 0 as valid value. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - Apache Qpid - AMQP Messaging Implementation Project: http://qpid.apache.org Use/Interact: mailto:dev-subscr...@qpid.apache.org
[jira] [Resolved] (QPID-3192) .NET Binding for Messaging classes are missing intrinsic copy contructors
[ https://issues.apache.org/jira/browse/QPID-3192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chuck Rolke resolved QPID-3192. --- Resolution: Fixed New copy constructors work in cpp/clr. .NET Binding for Messaging classes are missing intrinsic copy contructors - Key: QPID-3192 URL: https://issues.apache.org/jira/browse/QPID-3192 Project: Qpid Issue Type: Bug Reporter: Chuck Rolke Assignee: Chuck Rolke Fix For: 0.10 Apparent copy constructors for each class are defined using 'T(const T ^)' but these are ordinary contructors. Intrinsic copy constructors use 'T(const T %)'. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - Apache Qpid - AMQP Messaging Implementation Project: http://qpid.apache.org Use/Interact: mailto:dev-subscr...@qpid.apache.org
Re: [VOTE] Release 0.10
On Wed, Apr 20, 2011 at 10:12 AM, Gordon Sim g...@redhat.com wrote: On 04/20/2011 02:38 PM, Rajith Attapattu wrote: While the root cause for both these items have been present in 0.8 (and perhaps before for QPID-3216) these issues are *more likely* to happen in the current release than in 0.8 In that sense they are regressions, and certainly from a users pov of they are. What is that based on? The fact that we've seen these test failures? Or identification of specific changes first included in 0.10 that make this worse? All commits related to QPID-3214 are in 0.10 and in 0.8 as well (I stand to be corrected on this). But somehow this became more visible now. My suspicion is that some other changes in that time frame has made this issue more likely. Unfortunately I am unable to pin point what exactly those changes are. The fact that our tests are failing is not helping either. I think recent changes in the client and broker may have made these issues happen more likely. While r1092510 may have caused QPID-3216 to happen more readily there may be other triggers that can cause this as well. (Also please note that r1092510 is actually the correct behaviour and if thats causing a deadlock then it's a concern.) The point is r1092510 is not included in the current 0.10 release candidate. Correct and one reason why it wasn't was bcos I wasn't sure about it's consequences and nobody seems to know why the existing code was done that way. However that is just one code path that caused this deadlock, there can be others and bcos of test failures we are not sure. Perhaps I am a bit pessimistic here, but then it's always better be safer than sorry. Same can be said for QPID-3214. The fact remains that we do have deadlocks lying around in the code and they have a better chance of happening with 0.10 ! Again, why do 'they have a better chance of happening with 0.10'? I'm not saying it is not true, and I'm not disagreeing the current locking 'strategy' seems very prone to deadlocks. I would just like to see a more concrete demonstration that there is regression. Unfortunately I am unable to pin point to certain commits to backup my assertions. That is one reason why I didn't want to hold up the release and didn't make any of the two JIRA's as blockers. But it's still makes me a bit uncomfortable. Rajith - Apache Qpid - AMQP Messaging Implementation Project: http://qpid.apache.org Use/Interact: mailto:dev-subscr...@qpid.apache.org - Apache Qpid - AMQP Messaging Implementation Project: http://qpid.apache.org Use/Interact: mailto:dev-subscr...@qpid.apache.org
Review Request: QPID-3222: Fix ttl overflow
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/627/ --- Review request for qpid, Andrew Stitcher, Alan Conway, Chug Rolke, and Steve Huston. Summary --- Fixes ttl overflow on the broker. Added equality operator for qpid::messaging::Duration (needed it in test and seemed generally valuable). This addresses bug QPID-3222. https://issues.apache.org/jira/browse/QPID-3222 Diffs - /trunk/qpid/cpp/include/qpid/messaging/Duration.h 1090157 /trunk/qpid/cpp/src/qpid/broker/Message.cpp 1090157 /trunk/qpid/cpp/src/qpid/messaging/Duration.cpp 1090157 /trunk/qpid/cpp/src/tests/MessagingSessionTests.cpp 1090157 Diff: https://reviews.apache.org/r/627/diff Testing --- New test added, make check passes. Thanks, Gordon
[jira] [Commented] (QPID-3222) Potentially TTL Overflow
[ https://issues.apache.org/jira/browse/QPID-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13022214#comment-13022214 ] jirapos...@reviews.apache.org commented on QPID-3222: - --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/627/ --- Review request for qpid, Andrew Stitcher, Alan Conway, Chug Rolke, and Steve Huston. Summary --- Fixes ttl overflow on the broker. Added equality operator for qpid::messaging::Duration (needed it in test and seemed generally valuable). This addresses bug QPID-3222. https://issues.apache.org/jira/browse/QPID-3222 Diffs - /trunk/qpid/cpp/include/qpid/messaging/Duration.h 1090157 /trunk/qpid/cpp/src/qpid/broker/Message.cpp 1090157 /trunk/qpid/cpp/src/qpid/messaging/Duration.cpp 1090157 /trunk/qpid/cpp/src/tests/MessagingSessionTests.cpp 1090157 Diff: https://reviews.apache.org/r/627/diff Testing --- New test added, make check passes. Thanks, Gordon Potentially TTL Overflow Key: QPID-3222 URL: https://issues.apache.org/jira/browse/QPID-3222 Project: Qpid Issue Type: Bug Components: C++ Broker Reporter: Jerome Ajot Assignee: Gordon Sim Labels: ttl When a message TTL is set by the client to Duration::FOREVER, the message is not reachable anymore. Every addition and multiplication to the TTL/Expiration Time should check to avoid uint64_t overflow. For example: broker/Message.cpp: Message::setTimestamp() overflows the uint64_t. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - Apache Qpid - AMQP Messaging Implementation Project: http://qpid.apache.org Use/Interact: mailto:dev-subscr...@qpid.apache.org
Re: Review Request: QPID-3222: Fix ttl overflow
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/627/#review501 --- Ship it! The code looks OK, but do we really need to worry about people setting a TTL of more than 17 billion years? - Alan On 2011-04-20 16:47:00, Gordon Sim wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/627/ --- (Updated 2011-04-20 16:47:00) Review request for qpid, Andrew Stitcher, Alan Conway, Chug Rolke, and Steve Huston. Summary --- Fixes ttl overflow on the broker. Added equality operator for qpid::messaging::Duration (needed it in test and seemed generally valuable). This addresses bug QPID-3222. https://issues.apache.org/jira/browse/QPID-3222 Diffs - /trunk/qpid/cpp/include/qpid/messaging/Duration.h 1090157 /trunk/qpid/cpp/src/qpid/broker/Message.cpp 1090157 /trunk/qpid/cpp/src/qpid/messaging/Duration.cpp 1090157 /trunk/qpid/cpp/src/tests/MessagingSessionTests.cpp 1090157 Diff: https://reviews.apache.org/r/627/diff Testing --- New test added, make check passes. Thanks, Gordon
[jira] [Commented] (QPID-3222) Potentially TTL Overflow
[ https://issues.apache.org/jira/browse/QPID-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1301#comment-1301 ] jirapos...@reviews.apache.org commented on QPID-3222: - --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/627/#review501 --- Ship it! The code looks OK, but do we really need to worry about people setting a TTL of more than 17 billion years? - Alan On 2011-04-20 16:47:00, Gordon Sim wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/627/ bq. --- bq. bq. (Updated 2011-04-20 16:47:00) bq. bq. bq. Review request for qpid, Andrew Stitcher, Alan Conway, Chug Rolke, and Steve Huston. bq. bq. bq. Summary bq. --- bq. bq. Fixes ttl overflow on the broker. Added equality operator for qpid::messaging::Duration (needed it in test and seemed generally valuable). bq. bq. bq. This addresses bug QPID-3222. bq. https://issues.apache.org/jira/browse/QPID-3222 bq. bq. bq. Diffs bq. - bq. bq./trunk/qpid/cpp/include/qpid/messaging/Duration.h 1090157 bq./trunk/qpid/cpp/src/qpid/broker/Message.cpp 1090157 bq./trunk/qpid/cpp/src/qpid/messaging/Duration.cpp 1090157 bq./trunk/qpid/cpp/src/tests/MessagingSessionTests.cpp 1090157 bq. bq. Diff: https://reviews.apache.org/r/627/diff bq. bq. bq. Testing bq. --- bq. bq. New test added, make check passes. bq. bq. bq. Thanks, bq. bq. Gordon bq. bq. Potentially TTL Overflow Key: QPID-3222 URL: https://issues.apache.org/jira/browse/QPID-3222 Project: Qpid Issue Type: Bug Components: C++ Broker Reporter: Jerome Ajot Assignee: Gordon Sim Labels: ttl When a message TTL is set by the client to Duration::FOREVER, the message is not reachable anymore. Every addition and multiplication to the TTL/Expiration Time should check to avoid uint64_t overflow. For example: broker/Message.cpp: Message::setTimestamp() overflows the uint64_t. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - Apache Qpid - AMQP Messaging Implementation Project: http://qpid.apache.org Use/Interact: mailto:dev-subscr...@qpid.apache.org
Re: Review Request: QPID-3222: Fix ttl overflow
On 2011-04-20 16:59:07, Alan Conway wrote: The code looks OK, but do we really need to worry about people setting a TTL of more than 17 billion years? Actually looking at the BZ, the real issue is giving FOREVER the correct special treatment at each step, it's not really an overflow issue at all. - Alan --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/627/#review501 --- On 2011-04-20 16:47:00, Gordon Sim wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/627/ --- (Updated 2011-04-20 16:47:00) Review request for qpid, Andrew Stitcher, Alan Conway, Chug Rolke, and Steve Huston. Summary --- Fixes ttl overflow on the broker. Added equality operator for qpid::messaging::Duration (needed it in test and seemed generally valuable). This addresses bug QPID-3222. https://issues.apache.org/jira/browse/QPID-3222 Diffs - /trunk/qpid/cpp/include/qpid/messaging/Duration.h 1090157 /trunk/qpid/cpp/src/qpid/broker/Message.cpp 1090157 /trunk/qpid/cpp/src/qpid/messaging/Duration.cpp 1090157 /trunk/qpid/cpp/src/tests/MessagingSessionTests.cpp 1090157 Diff: https://reviews.apache.org/r/627/diff Testing --- New test added, make check passes. Thanks, Gordon
[jira] [Commented] (QPID-3222) Potentially TTL Overflow
[ https://issues.apache.org/jira/browse/QPID-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1308#comment-1308 ] jirapos...@reviews.apache.org commented on QPID-3222: - bq. On 2011-04-20 16:59:07, Alan Conway wrote: bq. The code looks OK, but do we really need to worry about people setting a TTL of more than 17 billion years? Actually looking at the BZ, the real issue is giving FOREVER the correct special treatment at each step, it's not really an overflow issue at all. - Alan --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/627/#review501 --- On 2011-04-20 16:47:00, Gordon Sim wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/627/ bq. --- bq. bq. (Updated 2011-04-20 16:47:00) bq. bq. bq. Review request for qpid, Andrew Stitcher, Alan Conway, Chug Rolke, and Steve Huston. bq. bq. bq. Summary bq. --- bq. bq. Fixes ttl overflow on the broker. Added equality operator for qpid::messaging::Duration (needed it in test and seemed generally valuable). bq. bq. bq. This addresses bug QPID-3222. bq. https://issues.apache.org/jira/browse/QPID-3222 bq. bq. bq. Diffs bq. - bq. bq./trunk/qpid/cpp/include/qpid/messaging/Duration.h 1090157 bq./trunk/qpid/cpp/src/qpid/broker/Message.cpp 1090157 bq./trunk/qpid/cpp/src/qpid/messaging/Duration.cpp 1090157 bq./trunk/qpid/cpp/src/tests/MessagingSessionTests.cpp 1090157 bq. bq. Diff: https://reviews.apache.org/r/627/diff bq. bq. bq. Testing bq. --- bq. bq. New test added, make check passes. bq. bq. bq. Thanks, bq. bq. Gordon bq. bq. Potentially TTL Overflow Key: QPID-3222 URL: https://issues.apache.org/jira/browse/QPID-3222 Project: Qpid Issue Type: Bug Components: C++ Broker Reporter: Jerome Ajot Assignee: Gordon Sim Labels: ttl When a message TTL is set by the client to Duration::FOREVER, the message is not reachable anymore. Every addition and multiplication to the TTL/Expiration Time should check to avoid uint64_t overflow. For example: broker/Message.cpp: Message::setTimestamp() overflows the uint64_t. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - Apache Qpid - AMQP Messaging Implementation Project: http://qpid.apache.org Use/Interact: mailto:dev-subscr...@qpid.apache.org
Re: Review Request: QPID-3222: Fix ttl overflow
On 2011-04-20 16:59:07, Alan Conway wrote: The code looks OK, but do we really need to worry about people setting a TTL of more than 17 billion years? Alan Conway wrote: Actually looking at the BZ, the real issue is giving FOREVER the correct special treatment at each step, it's not really an overflow issue at all. FOREVER is indeed the most important case. However as the value on the wire is uint64 (and is in ms), we need to make sure that no positive value that is sent is turned into a negative value and thus immediately marks the messages as expired. It is indeed very unlikely that anyone would have a real need to use anything other than FOREVER that would cause this, but other values could trigger the same thing, so why not fix the overflow entirely. - Gordon --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/627/#review501 --- On 2011-04-20 16:47:00, Gordon Sim wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/627/ --- (Updated 2011-04-20 16:47:00) Review request for qpid, Andrew Stitcher, Alan Conway, Chug Rolke, and Steve Huston. Summary --- Fixes ttl overflow on the broker. Added equality operator for qpid::messaging::Duration (needed it in test and seemed generally valuable). This addresses bug QPID-3222. https://issues.apache.org/jira/browse/QPID-3222 Diffs - /trunk/qpid/cpp/include/qpid/messaging/Duration.h 1090157 /trunk/qpid/cpp/src/qpid/broker/Message.cpp 1090157 /trunk/qpid/cpp/src/qpid/messaging/Duration.cpp 1090157 /trunk/qpid/cpp/src/tests/MessagingSessionTests.cpp 1090157 Diff: https://reviews.apache.org/r/627/diff Testing --- New test added, make check passes. Thanks, Gordon
Re: Review Request: QPID-3222: Fix ttl overflow
On 04/20/2011 01:14 PM, Gordon Sim wrote: On 2011-04-20 16:59:07, Alan Conway wrote: The code looks OK, but do we really need to worry about people setting a TTL of more than 17 billion years? Alan Conway wrote: Actually looking at the BZ, the real issue is giving FOREVER the correct special treatment at each step, it's not really an overflow issue at all. FOREVER is indeed the most important case. However as the value on the wire is uint64 (and is in ms), we need to make sure that no positive value that is sent is turned into a negative value and thus immediately marks the messages as expired. It is indeed very unlikely that anyone would have a real need to use anything other than FOREVER that would cause this, but other values could trigger the same thing, so why not fix the overflow entirely. Agreed, it does no harm. - Apache Qpid - AMQP Messaging Implementation Project: http://qpid.apache.org Use/Interact: mailto:dev-subscr...@qpid.apache.org
[jira] [Commented] (QPID-3222) Potentially TTL Overflow
[ https://issues.apache.org/jira/browse/QPID-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13022234#comment-13022234 ] jirapos...@reviews.apache.org commented on QPID-3222: - bq. On 2011-04-20 16:59:07, Alan Conway wrote: bq. The code looks OK, but do we really need to worry about people setting a TTL of more than 17 billion years? bq. bq. Alan Conway wrote: bq. Actually looking at the BZ, the real issue is giving FOREVER the correct special treatment at each step, it's not really an overflow issue at all. FOREVER is indeed the most important case. However as the value on the wire is uint64 (and is in ms), we need to make sure that no positive value that is sent is turned into a negative value and thus immediately marks the messages as expired. It is indeed very unlikely that anyone would have a real need to use anything other than FOREVER that would cause this, but other values could trigger the same thing, so why not fix the overflow entirely. - Gordon --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/627/#review501 --- On 2011-04-20 16:47:00, Gordon Sim wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/627/ bq. --- bq. bq. (Updated 2011-04-20 16:47:00) bq. bq. bq. Review request for qpid, Andrew Stitcher, Alan Conway, Chug Rolke, and Steve Huston. bq. bq. bq. Summary bq. --- bq. bq. Fixes ttl overflow on the broker. Added equality operator for qpid::messaging::Duration (needed it in test and seemed generally valuable). bq. bq. bq. This addresses bug QPID-3222. bq. https://issues.apache.org/jira/browse/QPID-3222 bq. bq. bq. Diffs bq. - bq. bq./trunk/qpid/cpp/include/qpid/messaging/Duration.h 1090157 bq./trunk/qpid/cpp/src/qpid/broker/Message.cpp 1090157 bq./trunk/qpid/cpp/src/qpid/messaging/Duration.cpp 1090157 bq./trunk/qpid/cpp/src/tests/MessagingSessionTests.cpp 1090157 bq. bq. Diff: https://reviews.apache.org/r/627/diff bq. bq. bq. Testing bq. --- bq. bq. New test added, make check passes. bq. bq. bq. Thanks, bq. bq. Gordon bq. bq. Potentially TTL Overflow Key: QPID-3222 URL: https://issues.apache.org/jira/browse/QPID-3222 Project: Qpid Issue Type: Bug Components: C++ Broker Reporter: Jerome Ajot Assignee: Gordon Sim Labels: ttl When a message TTL is set by the client to Duration::FOREVER, the message is not reachable anymore. Every addition and multiplication to the TTL/Expiration Time should check to avoid uint64_t overflow. For example: broker/Message.cpp: Message::setTimestamp() overflows the uint64_t. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - Apache Qpid - AMQP Messaging Implementation Project: http://qpid.apache.org Use/Interact: mailto:dev-subscr...@qpid.apache.org
Re: Review Request: QPID-3222: Fix ttl overflow
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/627/#review506 --- 1. To be complete the equality tests must propagate to the .NET binding as well. See patch below. 2. This patch changes the API/ABI a little does it not? For the .NET case assume you have Duration A(100) and Duration B(100). Before this patch A==B is false and after this patch A==B is true. How can we sell that? -Chuck Index: qpid/cpp/bindings/qpid/dotnet/src/Duration.h === --- qpid/cpp/bindings/qpid/dotnet/src/Duration.h(revision 1089977) +++ qpid/cpp/bindings/qpid/dotnet/src/Duration.h(working copy) @@ -81,8 +81,18 @@ Duration ^ result = gcnew Duration(multiplier * dur-Milliseconds); return result; } - }; +static bool operator == (Duration ^ a, Duration ^ b) +{ +return a-Milliseconds == b-Milliseconds; +} + +static bool operator != (Duration ^ a, Duration ^ b) +{ +return a-Milliseconds != b-Milliseconds; +} +}; + public ref class DurationConstants sealed { private: - Chug On 2011-04-20 16:47:00, Gordon Sim wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/627/ --- (Updated 2011-04-20 16:47:00) Review request for qpid, Andrew Stitcher, Alan Conway, Chug Rolke, and Steve Huston. Summary --- Fixes ttl overflow on the broker. Added equality operator for qpid::messaging::Duration (needed it in test and seemed generally valuable). This addresses bug QPID-3222. https://issues.apache.org/jira/browse/QPID-3222 Diffs - /trunk/qpid/cpp/include/qpid/messaging/Duration.h 1090157 /trunk/qpid/cpp/src/qpid/broker/Message.cpp 1090157 /trunk/qpid/cpp/src/qpid/messaging/Duration.cpp 1090157 /trunk/qpid/cpp/src/tests/MessagingSessionTests.cpp 1090157 Diff: https://reviews.apache.org/r/627/diff Testing --- New test added, make check passes. Thanks, Gordon
[jira] [Commented] (QPID-3222) Potentially TTL Overflow
[ https://issues.apache.org/jira/browse/QPID-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13022265#comment-13022265 ] jirapos...@reviews.apache.org commented on QPID-3222: - --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/627/#review506 --- 1. To be complete the equality tests must propagate to the .NET binding as well. See patch below. 2. This patch changes the API/ABI a little does it not? For the .NET case assume you have Duration A(100) and Duration B(100). Before this patch A==B is false and after this patch A==B is true. How can we sell that? -Chuck Index: qpid/cpp/bindings/qpid/dotnet/src/Duration.h === --- qpid/cpp/bindings/qpid/dotnet/src/Duration.h(revision 1089977) +++ qpid/cpp/bindings/qpid/dotnet/src/Duration.h(working copy) @@ -81,8 +81,18 @@ Duration ^ result = gcnew Duration(multiplier * dur-Milliseconds); return result; } - }; +static bool operator == (Duration ^ a, Duration ^ b) +{ +return a-Milliseconds == b-Milliseconds; +} + +static bool operator != (Duration ^ a, Duration ^ b) +{ +return a-Milliseconds != b-Milliseconds; +} +}; + public ref class DurationConstants sealed { private: - Chug On 2011-04-20 16:47:00, Gordon Sim wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/627/ bq. --- bq. bq. (Updated 2011-04-20 16:47:00) bq. bq. bq. Review request for qpid, Andrew Stitcher, Alan Conway, Chug Rolke, and Steve Huston. bq. bq. bq. Summary bq. --- bq. bq. Fixes ttl overflow on the broker. Added equality operator for qpid::messaging::Duration (needed it in test and seemed generally valuable). bq. bq. bq. This addresses bug QPID-3222. bq. https://issues.apache.org/jira/browse/QPID-3222 bq. bq. bq. Diffs bq. - bq. bq./trunk/qpid/cpp/include/qpid/messaging/Duration.h 1090157 bq./trunk/qpid/cpp/src/qpid/broker/Message.cpp 1090157 bq./trunk/qpid/cpp/src/qpid/messaging/Duration.cpp 1090157 bq./trunk/qpid/cpp/src/tests/MessagingSessionTests.cpp 1090157 bq. bq. Diff: https://reviews.apache.org/r/627/diff bq. bq. bq. Testing bq. --- bq. bq. New test added, make check passes. bq. bq. bq. Thanks, bq. bq. Gordon bq. bq. Potentially TTL Overflow Key: QPID-3222 URL: https://issues.apache.org/jira/browse/QPID-3222 Project: Qpid Issue Type: Bug Components: C++ Broker Reporter: Jerome Ajot Assignee: Gordon Sim Labels: ttl When a message TTL is set by the client to Duration::FOREVER, the message is not reachable anymore. Every addition and multiplication to the TTL/Expiration Time should check to avoid uint64_t overflow. For example: broker/Message.cpp: Message::setTimestamp() overflows the uint64_t. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - Apache Qpid - AMQP Messaging Implementation Project: http://qpid.apache.org Use/Interact: mailto:dev-subscr...@qpid.apache.org
Re: Review Request: QPID-3222: Fix ttl overflow
On 2011-04-20 18:04:37, Chug Rolke wrote: 1. To be complete the equality tests must propagate to the .NET binding as well. See patch below. 2. This patch changes the API/ABI a little does it not? For the .NET case assume you have Duration A(100) and Duration B(100). Before this patch A==B is false and after this patch A==B is true. How can we sell that? -Chuck Index: qpid/cpp/bindings/qpid/dotnet/src/Duration.h === --- qpid/cpp/bindings/qpid/dotnet/src/Duration.h(revision 1089977) +++ qpid/cpp/bindings/qpid/dotnet/src/Duration.h(working copy) @@ -81,8 +81,18 @@ Duration ^ result = gcnew Duration(multiplier * dur-Milliseconds); return result; } - }; +static bool operator == (Duration ^ a, Duration ^ b) +{ +return a-Milliseconds == b-Milliseconds; +} + +static bool operator != (Duration ^ a, Duration ^ b) +{ +return a-Milliseconds != b-Milliseconds; +} +}; + public ref class DurationConstants sealed { private: As far as c++ API/ABI goes this is purely additive (it is a change but can't break anything). Prior to this change comparing two durations for equality would fail to compile due to the lack of equality operator. I would expect though that the semantic change from a .NET perspective would not only be acceptable but actually desired, no? - Gordon --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/627/#review506 --- On 2011-04-20 16:47:00, Gordon Sim wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/627/ --- (Updated 2011-04-20 16:47:00) Review request for qpid, Andrew Stitcher, Alan Conway, Chug Rolke, and Steve Huston. Summary --- Fixes ttl overflow on the broker. Added equality operator for qpid::messaging::Duration (needed it in test and seemed generally valuable). This addresses bug QPID-3222. https://issues.apache.org/jira/browse/QPID-3222 Diffs - /trunk/qpid/cpp/include/qpid/messaging/Duration.h 1090157 /trunk/qpid/cpp/src/qpid/broker/Message.cpp 1090157 /trunk/qpid/cpp/src/qpid/messaging/Duration.cpp 1090157 /trunk/qpid/cpp/src/tests/MessagingSessionTests.cpp 1090157 Diff: https://reviews.apache.org/r/627/diff Testing --- New test added, make check passes. Thanks, Gordon
[jira] [Commented] (QPID-3222) Potentially TTL Overflow
[ https://issues.apache.org/jira/browse/QPID-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13022282#comment-13022282 ] jirapos...@reviews.apache.org commented on QPID-3222: - bq. On 2011-04-20 18:04:37, Chug Rolke wrote: bq. 1. To be complete the equality tests must propagate to the .NET binding as well. See patch below. bq. bq. 2. This patch changes the API/ABI a little does it not? bq. For the .NET case assume you have Duration A(100) and Duration B(100). bq. Before this patch A==B is false and after this patch A==B is true. bq. How can we sell that? bq. bq. -Chuck bq. bq. bq. Index: qpid/cpp/bindings/qpid/dotnet/src/Duration.h bq. === bq. --- qpid/cpp/bindings/qpid/dotnet/src/Duration.h (revision 1089977) bq. +++ qpid/cpp/bindings/qpid/dotnet/src/Duration.h (working copy) bq. @@ -81,8 +81,18 @@ bq.Duration ^ result = gcnew Duration(multiplier * dur-Milliseconds); bq.return result; bq.} bq. -}; bq. bq. +static bool operator == (Duration ^ a, Duration ^ b) bq. +{ bq. +return a-Milliseconds == b-Milliseconds; bq. +} bq. + bq. +static bool operator != (Duration ^ a, Duration ^ b) bq. +{ bq. +return a-Milliseconds != b-Milliseconds; bq. +} bq. +}; bq. + bq.public ref class DurationConstants sealed bq.{ bq.private: bq. As far as c++ API/ABI goes this is purely additive (it is a change but can't break anything). Prior to this change comparing two durations for equality would fail to compile due to the lack of equality operator. I would expect though that the semantic change from a .NET perspective would not only be acceptable but actually desired, no? - Gordon --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/627/#review506 --- On 2011-04-20 16:47:00, Gordon Sim wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/627/ bq. --- bq. bq. (Updated 2011-04-20 16:47:00) bq. bq. bq. Review request for qpid, Andrew Stitcher, Alan Conway, Chug Rolke, and Steve Huston. bq. bq. bq. Summary bq. --- bq. bq. Fixes ttl overflow on the broker. Added equality operator for qpid::messaging::Duration (needed it in test and seemed generally valuable). bq. bq. bq. This addresses bug QPID-3222. bq. https://issues.apache.org/jira/browse/QPID-3222 bq. bq. bq. Diffs bq. - bq. bq./trunk/qpid/cpp/include/qpid/messaging/Duration.h 1090157 bq./trunk/qpid/cpp/src/qpid/broker/Message.cpp 1090157 bq./trunk/qpid/cpp/src/qpid/messaging/Duration.cpp 1090157 bq./trunk/qpid/cpp/src/tests/MessagingSessionTests.cpp 1090157 bq. bq. Diff: https://reviews.apache.org/r/627/diff bq. bq. bq. Testing bq. --- bq. bq. New test added, make check passes. bq. bq. bq. Thanks, bq. bq. Gordon bq. bq. Potentially TTL Overflow Key: QPID-3222 URL: https://issues.apache.org/jira/browse/QPID-3222 Project: Qpid Issue Type: Bug Components: C++ Broker Reporter: Jerome Ajot Assignee: Gordon Sim Labels: ttl When a message TTL is set by the client to Duration::FOREVER, the message is not reachable anymore. Every addition and multiplication to the TTL/Expiration Time should check to avoid uint64_t overflow. For example: broker/Message.cpp: Message::setTimestamp() overflows the uint64_t. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - Apache Qpid - AMQP Messaging Implementation Project: http://qpid.apache.org Use/Interact: mailto:dev-subscr...@qpid.apache.org
Re: Review Request: QPID-3222: Fix ttl overflow
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/627/#review510 --- Ship it! Agreed. I'll patch the .NET file if you don't. - Chug On 2011-04-20 16:47:00, Gordon Sim wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/627/ --- (Updated 2011-04-20 16:47:00) Review request for qpid, Andrew Stitcher, Alan Conway, Chug Rolke, and Steve Huston. Summary --- Fixes ttl overflow on the broker. Added equality operator for qpid::messaging::Duration (needed it in test and seemed generally valuable). This addresses bug QPID-3222. https://issues.apache.org/jira/browse/QPID-3222 Diffs - /trunk/qpid/cpp/include/qpid/messaging/Duration.h 1090157 /trunk/qpid/cpp/src/qpid/broker/Message.cpp 1090157 /trunk/qpid/cpp/src/qpid/messaging/Duration.cpp 1090157 /trunk/qpid/cpp/src/tests/MessagingSessionTests.cpp 1090157 Diff: https://reviews.apache.org/r/627/diff Testing --- New test added, make check passes. Thanks, Gordon
[jira] [Commented] (QPID-3185) Wrong help text for qpid-config
[ https://issues.apache.org/jira/browse/QPID-3185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13022321#comment-13022321 ] Jonathan Robie commented on QPID-3185: -- Further fixes have been made; see http://svn.apache.org/viewvc?view=revisionrevision=1094727 for the current version. Wrong help text for qpid-config --- Key: QPID-3185 URL: https://issues.apache.org/jira/browse/QPID-3185 Project: Qpid Issue Type: Bug Components: python tools Affects Versions: 0.10 Reporter: Jonathan Robie Fix For: 0.10 Help text for qpid-config is wrong for --max-queue-size, --max-queue-count. This is fixed in 1087706: http://svn.apache.org/viewvc?rev=1087706view=rev -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - Apache Qpid - AMQP Messaging Implementation Project: http://qpid.apache.org Use/Interact: mailto:dev-subscr...@qpid.apache.org
[jira] [Commented] (QPID-3222) Potentially TTL Overflow
[ https://issues.apache.org/jira/browse/QPID-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13022322#comment-13022322 ] jirapos...@reviews.apache.org commented on QPID-3222: - --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/627/#review510 --- Ship it! Agreed. I'll patch the .NET file if you don't. - Chug On 2011-04-20 16:47:00, Gordon Sim wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/627/ bq. --- bq. bq. (Updated 2011-04-20 16:47:00) bq. bq. bq. Review request for qpid, Andrew Stitcher, Alan Conway, Chug Rolke, and Steve Huston. bq. bq. bq. Summary bq. --- bq. bq. Fixes ttl overflow on the broker. Added equality operator for qpid::messaging::Duration (needed it in test and seemed generally valuable). bq. bq. bq. This addresses bug QPID-3222. bq. https://issues.apache.org/jira/browse/QPID-3222 bq. bq. bq. Diffs bq. - bq. bq./trunk/qpid/cpp/include/qpid/messaging/Duration.h 1090157 bq./trunk/qpid/cpp/src/qpid/broker/Message.cpp 1090157 bq./trunk/qpid/cpp/src/qpid/messaging/Duration.cpp 1090157 bq./trunk/qpid/cpp/src/tests/MessagingSessionTests.cpp 1090157 bq. bq. Diff: https://reviews.apache.org/r/627/diff bq. bq. bq. Testing bq. --- bq. bq. New test added, make check passes. bq. bq. bq. Thanks, bq. bq. Gordon bq. bq. Potentially TTL Overflow Key: QPID-3222 URL: https://issues.apache.org/jira/browse/QPID-3222 Project: Qpid Issue Type: Bug Components: C++ Broker Reporter: Jerome Ajot Assignee: Gordon Sim Labels: ttl When a message TTL is set by the client to Duration::FOREVER, the message is not reachable anymore. Every addition and multiplication to the TTL/Expiration Time should check to avoid uint64_t overflow. For example: broker/Message.cpp: Message::setTimestamp() overflows the uint64_t. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - Apache Qpid - AMQP Messaging Implementation Project: http://qpid.apache.org Use/Interact: mailto:dev-subscr...@qpid.apache.org
[jira] [Commented] (QPID-3214) Deadlock between the failover mutex (in AMQConnection.java) and the current_exception_lock (in AMQSession.java)
[ https://issues.apache.org/jira/browse/QPID-3214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13022341#comment-13022341 ] Rajith Attapattu commented on QPID-3214: After looking at the code, I believe the exception handling is not exactly correct. 1. For starters we shouldn't really notify the connection's exception listener when there is a session level error. If we avoid that we can certainly get rid of the deadlock. The JMS API doc clearly states, If a JMS provider detects a serious problem with a Connection object, it informs the Connection object's ExceptionListener, if one has been registered. It does this by calling the listener's onException method, passing it a JMSException argument describing the problem. So certainly execution exceptions which only invalidates the session should not be notified via the exception listener. 2. Going further if one looks at the AMQConnection.java exceptionReceived() method, it again looks wrong to me. It seems that an execution exception can in fact close the underlying connection (and the rest of the sessions) due to the following piece of logic. if (hardError(cause)) { closer = (!_closed.getAndSet(true)) || closer; { _logger.info(Closing AMQConnection due to : + cause); } } Digging further for any AMQException hardError method will return true. Therefore we need to rework that piece of code. 3. The 0-10 code client (if using the Java IO transport) has a separate code path to notify the Connection if a connection level error is there. Therefore we definitely do not need to notify anything from the session. I believe the MINA transport notifies the IOException etc via the ProtocolSession, perhaps the reason for this code in the first place. For now I believe we need to do two things. 1. Remove the following lines from AMQSession_0_10.setCurrentException(SessionException se) method AMQException amqe = new AMQException(AMQConstant.getConstant(code), se.getMessage(), se.getCause()); _connection.exceptionReceived(amqe); 2. Fix the AMQConnection.isHardError method and the AMQException.isHardError method. Deadlock between the failover mutex (in AMQConnection.java) and the current_exception_lock (in AMQSession.java) --- Key: QPID-3214 URL: https://issues.apache.org/jira/browse/QPID-3214 Project: Qpid Issue Type: Bug Components: Java Client Affects Versions: 0.10 Reporter: Rajith Attapattu Assignee: Rajith Attapattu Priority: Critical Fix For: 0.11 Attachments: QPID-3214.patch As per the following thread dump you can clearly see the deadlock between the failover mutex in AMQConnection.java and the current_exception_lock in AMQSession.java This is a regression and was introduced in rev 985262 Found one Java-level deadlock: = IoReceiver - localhost/127.0.0.1:15672: waiting to lock monitor 0x002ac2ea3b70 (object 0x002ab70156b0, a java.lang.Object), which is held by main main: waiting to lock monitor 0x002ac28db1b8 (object 0x002ab7048d70, a java.lang.Object), which is held by IoReceiver - localhost/127.0.0.1:15672 Java stack information for the threads listed above: === IoReceiver - localhost/127.0.0.1:15672: at org.apache.qpid.client.AMQConnection.exceptionReceived(AMQConnection.java:1297) - waiting to lock0x002ab70156b0 (a java.lang.Object) at org.apache.qpid.client.AMQSession_0_10.setCurrentException(AMQSession_0_10.java:1033) - locked0x002ab7048d70 (a java.lang.Object) at org.apache.qpid.client.AMQSession_0_10.exception(AMQSession_0_10.java:913) at org.apache.qpid.transport.SessionDelegate.executionException(SessionDelegate.java:156) at org.apache.qpid.transport.SessionDelegate.executionException(SessionDelegate.java:32) at org.apache.qpid.transport.ExecutionException.dispatch(ExecutionException.java:112) at org.apache.qpid.transport.SessionDelegate.command(SessionDelegate.java:50) at org.apache.qpid.transport.SessionDelegate.command(SessionDelegate.java:32) at org.apache.qpid.transport.Method.delegate(Method.java:159) at org.apache.qpid.transport.Session.received(Session.java:528) at org.apache.qpid.transport.Connection.dispatch(Connection.java:404) at org.apache.qpid.transport.ConnectionDelegate.handle(ConnectionDelegate.java:64) at org.apache.qpid.transport.ConnectionDelegate.handle(ConnectionDelegate.java:40) at
[jira] [Issue Comment Edited] (QPID-3185) Wrong help text for qpid-config
[ https://issues.apache.org/jira/browse/QPID-3185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13022321#comment-13022321 ] Jonathan Robie edited comment on QPID-3185 at 4/20/11 8:06 PM: --- Further fixes have been made; see http://svn.apache.org/viewvc?view=revisionrevision=1094727 for the current version. This includes: 1. Fixing the text for --broker-addr, which had the text for the wrong option 2. Fixing the name of --force-if-used, which had been mistakenly renamed --force-if-not-used was (Author: jonathan.robie): Further fixes have been made; see http://svn.apache.org/viewvc?view=revisionrevision=1094727 for the current version. Wrong help text for qpid-config --- Key: QPID-3185 URL: https://issues.apache.org/jira/browse/QPID-3185 Project: Qpid Issue Type: Bug Components: python tools Affects Versions: 0.10 Reporter: Jonathan Robie Fix For: 0.10 Help text for qpid-config is wrong for --max-queue-size, --max-queue-count. This is fixed in 1087706: http://svn.apache.org/viewvc?rev=1087706view=rev -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - Apache Qpid - AMQP Messaging Implementation Project: http://qpid.apache.org Use/Interact: mailto:dev-subscr...@qpid.apache.org
Re: Proposed fixes for 0.10 (was Re: What is our strategy for errata and/or hot fixes ?)
I have one change for this list: QPID-3185, see the second comment qpid-config has two fixes on trunk that are not in the 0.10 release candidate: 1. Fixing the text for --broker-addr, which had the text for the wrong option 2. Fixing the name of --force-if-used, which had been mistakenly renamed --force-if-not-used Jonathan - Apache Qpid - AMQP Messaging Implementation Project: http://qpid.apache.org Use/Interact: mailto:dev-subscr...@qpid.apache.org
[jira] [Issue Comment Edited] (QPID-3214) Deadlock between the failover mutex (in AMQConnection.java) and the current_exception_lock (in AMQSession.java)
[ https://issues.apache.org/jira/browse/QPID-3214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13022341#comment-13022341 ] Rajith Attapattu edited comment on QPID-3214 at 4/20/11 8:34 PM: - After looking at the code, I believe the exception handling is not exactly correct. 1. For starters we shouldn't really notify the connection's exception listener when there is a session level error. If we avoid that we can certainly get rid of the deadlock. The JMS API doc clearly states, If a JMS provider detects a serious problem with a Connection object, it informs the Connection object's ExceptionListener, if one has been registered. It does this by calling the listener's onException method, passing it a JMSException argument describing the problem. So certainly execution exceptions which only invalidates the session should not be notified via the exception listener. 2. Going further if one looks at the AMQConnection.java exceptionReceived() method, it again looks wrong to me. It seems that an execution exception can in fact close the underlying connection (and the rest of the sessions) due to the following piece of logic. if (hardError(cause)) { closer = (!_closed.getAndSet(true)) || closer; { _logger.info(Closing AMQConnection due to : + cause); } } Digging further for any AMQException hardError method will return true. Therefore we need to rework that piece of code. 3. The 0-10 code client (if using the Java IO transport) has a separate code path to notify the Connection if a connection level error is there. Therefore we definitely do not need to notify anything from the session. I believe the MINA transport notifies the IOException etc via the ProtocolSession, perhaps the reason for this code in the first place. For now I believe we need to do two things. 1. Remove the following line from AMQSession_0_10.setCurrentException(SessionException se) method _connection.exceptionReceived(amqe); 2. Fix the AMQConnection.isHardError method and the AMQException.isHardError method. was (Author: rajith): After looking at the code, I believe the exception handling is not exactly correct. 1. For starters we shouldn't really notify the connection's exception listener when there is a session level error. If we avoid that we can certainly get rid of the deadlock. The JMS API doc clearly states, If a JMS provider detects a serious problem with a Connection object, it informs the Connection object's ExceptionListener, if one has been registered. It does this by calling the listener's onException method, passing it a JMSException argument describing the problem. So certainly execution exceptions which only invalidates the session should not be notified via the exception listener. 2. Going further if one looks at the AMQConnection.java exceptionReceived() method, it again looks wrong to me. It seems that an execution exception can in fact close the underlying connection (and the rest of the sessions) due to the following piece of logic. if (hardError(cause)) { closer = (!_closed.getAndSet(true)) || closer; { _logger.info(Closing AMQConnection due to : + cause); } } Digging further for any AMQException hardError method will return true. Therefore we need to rework that piece of code. 3. The 0-10 code client (if using the Java IO transport) has a separate code path to notify the Connection if a connection level error is there. Therefore we definitely do not need to notify anything from the session. I believe the MINA transport notifies the IOException etc via the ProtocolSession, perhaps the reason for this code in the first place. For now I believe we need to do two things. 1. Remove the following lines from AMQSession_0_10.setCurrentException(SessionException se) method AMQException amqe = new AMQException(AMQConstant.getConstant(code), se.getMessage(), se.getCause()); _connection.exceptionReceived(amqe); 2. Fix the AMQConnection.isHardError method and the AMQException.isHardError method. Deadlock between the failover mutex (in AMQConnection.java) and the current_exception_lock (in AMQSession.java) --- Key: QPID-3214 URL: https://issues.apache.org/jira/browse/QPID-3214 Project: Qpid Issue Type: Bug Components: Java Client Affects Versions: 0.10 Reporter: Rajith Attapattu Assignee: Rajith Attapattu Priority: Critical Fix For: 0.11 Attachments: QPID-3214.patch As per the following thread dump you can clearly see the deadlock between the failover mutex in AMQConnection.java and the
Review Request: QPID-3216 - suggested fix for the deadlock issue.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/642/ --- Review request for qpid. Summary --- Suggested fix for https://issues.apache.org/jira/browse/QPID-3216 Please note we may also need to fix the exceptionReceived method in AMQConnection.java for reasons outlined in the above JIRA. Diffs - http://svn.apache.org/repos/asf/qpid/trunk/qpid/java/client/src/main/java/org/apache/qpid/client/AMQSession_0_10.java 1095508 Diff: https://reviews.apache.org/r/642/diff Testing --- Thanks, rajith
[jira] [Commented] (QPID-3214) Deadlock between the failover mutex (in AMQConnection.java) and the current_exception_lock (in AMQSession.java)
[ https://issues.apache.org/jira/browse/QPID-3214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13022399#comment-13022399 ] Rajith Attapattu commented on QPID-3214: I used review board to post the patch just to get the hang of it. https://reviews.apache.org/r/642/ Deadlock between the failover mutex (in AMQConnection.java) and the current_exception_lock (in AMQSession.java) --- Key: QPID-3214 URL: https://issues.apache.org/jira/browse/QPID-3214 Project: Qpid Issue Type: Bug Components: Java Client Affects Versions: 0.10 Reporter: Rajith Attapattu Assignee: Rajith Attapattu Priority: Critical Fix For: 0.11 Attachments: QPID-3214.patch As per the following thread dump you can clearly see the deadlock between the failover mutex in AMQConnection.java and the current_exception_lock in AMQSession.java This is a regression and was introduced in rev 985262 Found one Java-level deadlock: = IoReceiver - localhost/127.0.0.1:15672: waiting to lock monitor 0x002ac2ea3b70 (object 0x002ab70156b0, a java.lang.Object), which is held by main main: waiting to lock monitor 0x002ac28db1b8 (object 0x002ab7048d70, a java.lang.Object), which is held by IoReceiver - localhost/127.0.0.1:15672 Java stack information for the threads listed above: === IoReceiver - localhost/127.0.0.1:15672: at org.apache.qpid.client.AMQConnection.exceptionReceived(AMQConnection.java:1297) - waiting to lock0x002ab70156b0 (a java.lang.Object) at org.apache.qpid.client.AMQSession_0_10.setCurrentException(AMQSession_0_10.java:1033) - locked0x002ab7048d70 (a java.lang.Object) at org.apache.qpid.client.AMQSession_0_10.exception(AMQSession_0_10.java:913) at org.apache.qpid.transport.SessionDelegate.executionException(SessionDelegate.java:156) at org.apache.qpid.transport.SessionDelegate.executionException(SessionDelegate.java:32) at org.apache.qpid.transport.ExecutionException.dispatch(ExecutionException.java:112) at org.apache.qpid.transport.SessionDelegate.command(SessionDelegate.java:50) at org.apache.qpid.transport.SessionDelegate.command(SessionDelegate.java:32) at org.apache.qpid.transport.Method.delegate(Method.java:159) at org.apache.qpid.transport.Session.received(Session.java:528) at org.apache.qpid.transport.Connection.dispatch(Connection.java:404) at org.apache.qpid.transport.ConnectionDelegate.handle(ConnectionDelegate.java:64) at org.apache.qpid.transport.ConnectionDelegate.handle(ConnectionDelegate.java:40) at org.apache.qpid.transport.MethodDelegate.executionException(MethodDelegate.java:110) at org.apache.qpid.transport.ExecutionException.dispatch(ExecutionException.java:112) at org.apache.qpid.transport.ConnectionDelegate.command(ConnectionDelegate.java:54) at org.apache.qpid.transport.ConnectionDelegate.command(ConnectionDelegate.java:40) at org.apache.qpid.transport.Method.delegate(Method.java:159) at org.apache.qpid.transport.Connection.received(Connection.java:369) at org.apache.qpid.transport.Connection.received(Connection.java:59) at org.apache.qpid.transport.network.Assembler.emit(Assembler.java:95) at org.apache.qpid.transport.network.Assembler.assemble(Assembler.java:196) at org.apache.qpid.transport.network.Assembler.frame(Assembler.java:129) at org.apache.qpid.transport.network.Frame.delegate(Frame.java:133) at org.apache.qpid.transport.network.Assembler.received(Assembler.java:100) at org.apache.qpid.transport.network.Assembler.received(Assembler.java:42) at org.apache.qpid.transport.network.InputHandler.next(InputHandler.java:187) at org.apache.qpid.transport.network.InputHandler.received(InputHandler.java:103) at org.apache.qpid.transport.network.InputHandler.received(InputHandler.java:42) at org.apache.qpid.transport.network.io.IoReceiver.run(IoReceiver.java:128) at java.lang.Thread.run(Thread.java:619) main: at org.apache.qpid.client.AMQSession_0_10.setCurrentException(AMQSession_0_10.java:1025) - waiting to lock0x002ab7048d70 (a java.lang.Object) at org.apache.qpid.client.BasicMessageConsumer_0_10.sendCancel(BasicMessageConsumer_0_10.java:193) at org.apache.qpid.client.BasicMessageConsumer.close(BasicMessageConsumer.java:573) - locked0x002ab70156b0 (a java.lang.Object) at org.apache.qpid.client.BasicMessageConsumer.close(BasicMessageConsumer.java:535)