Oh, our messages overlapped... Your questions:
"... doing this queue to queue work using one or two ActiveMQ brokers?"
=> One broker.

"... you may want to try camel-sims"
=> I guess you mean Camel sJms; that's the closest match I found in the list of Camel components on GitHub :-) I had never heard of it (OK, it has only existed since 2.11), but I will have a look at it.

"If you’d be using XA in the real world..."
=> No, we don't use XA.

But your hint with "transacted = false" works! I was able to run the "standard" version 5 times in a row, and it was successful every time.

Currently it looks like one has to either define the whole Tx stuff and set transacted = false, OR use the simple config with transacted = true. At least I learned that I have no real understanding of what this flag does. And I claim that most examples use transacted = true even when they define a Tx manager etc.

Regards
Stephan

On Mon, Feb 8, 2016 at 4:59 PM, Stephan Burkard <sburk...@gmail.com> wrote:

> Hi Quinn
>
> Here is the new version of my test project, which uses an embedded ActiveMQ
> broker. Since the AMQ libs are version 5.9.0 (standard edition), there is
> no longer a special Red Hat version.
>
> There is a new BrokerManagementExecutor that is configured to stop the
> broker 5 seconds after the test starts. On my machine the broker shutdown
> happens after about 400 messages have been sent. A few seconds after the
> broker is stopped, it starts again and the test finishes. I guess the
> automagical broker restart is due to the vm transport being used.
>
> Results on my machine:
>
> 1. The "noTxManager" version still never fails, so it never loses a
>    message between the queues. It often misses one or two messages, but
>    always on both queues. So these messages could not be sent by the
>    client, but all messages that arrived at the first queue also arrived
>    at the second queue.
>
> 2. The "standard" version fails on almost every attempt; it mostly loses
>    one or two messages between the queues. So these messages arrived at
>    the first queue but not at the second one.
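Stephan's Feb 8 version moves the test onto an embedded broker reached over the vm transport. The Spring XML fragment below is a minimal sketch of that kind of setup, not his actual project: the broker name `testBroker` and the bean id `jmsConnectionFactory` are hypothetical, and it assumes the ActiveMQ core schema is declared in the enclosing `<beans>` element (with xbean-spring on the classpath). By default the vm transport creates a broker on demand when a client connects (`create=true`), which would be consistent with his guess about the automatic restart; `create=false` switches that off.

```xml
<!-- Hypothetical embedded broker named "testBroker".
     persistent="true": an in-memory broker would drop every queued message
     when it is stopped, which would mask the loss this test is looking for. -->
<broker xmlns="http://activemq.apache.org/schema/core"
        brokerName="testBroker"
        persistent="true"
        useJmx="false"/>

<bean id="jmsConnectionFactory" class="org.apache.activemq.ActiveMQConnectionFactory">
  <!-- vm:// connects in-JVM to the broker named testBroker.
       create=false: never auto-create a fresh broker if none is running. -->
  <property name="brokerURL" value="vm://testBroker?create=false"/>
</bean>
```

Leaving `create` at its default is what lets a reconnecting client bring up a new broker on its own; for a test that stops and restarts the broker deliberately, disabling it keeps the restart under the test's control.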
>
> Since the embedded broker is stopped at the end of the test, I added an
> additional Camel route that consumes the default DLQ. I added the messages
> that arrived at the DLQ to the test summary output, but on my machine the
> DLQ never received any messages.
>
> I hope you can reproduce the problem more easily now.
>
> Regards
> Stephan
>
>
> On Sat, Feb 6, 2016 at 10:32 PM, Stephan Burkard <sburk...@gmail.com>
> wrote:
>
>> Hi Quinn
>>
>> I don't think you need to match my broker version exactly. I first
>> discovered this issue on ActiveMQ 5.9.0 standard edition, and I guess
>> that every broker version suffers from it. I really don't think it is an
>> ActiveMQ problem; according to Red Hat, it is a Spring JMS problem.
>>
>> No, I never tried to use an embedded broker, probably because I was using
>> remote brokers when I discovered the problem during master-slave failover
>> tests. I will try to rewrite the test project to use an embedded broker
>> that can be stopped and started as part of the test.
>>
>> Yes, that's what I meant: the remote broker increases the probability
>> that the issue shows up. If Red Hat's analysis is correct, it really is a
>> timing issue. You can also increase the chance of hitting it by producing
>> even more messages per second. That increases the probability that a
>> message falls into the problematic time slice where the consumer has
>> committed but the producer has not.
>>
>> Yes, that's right. I start the test, and when I see lots of console output
>> I hit Enter on the console where the broker's stop command is waiting.
>> Then I wait about 5 to 10 seconds and start the broker again. The test
>> reconnects and continues.
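The extra DLQ check Stephan describes in his Feb 8 message could look roughly like the Camel Spring XML route below. It is a sketch under assumptions: a JMS component registered under the id `activemq`, and ActiveMQ's default shared dead letter queue `ActiveMQ.DLQ`; the route id and the `dlq` log category are hypothetical.

```xml
<camelContext xmlns="http://camel.apache.org/schema/spring">
  <!-- Drain ActiveMQ's default dead letter queue so that any message the
       broker dead-lettered (e.g. after too many redeliveries) is logged
       and can be counted, instead of sitting there unseen -->
  <route id="dlqMonitor">
    <from uri="activemq:queue:ActiveMQ.DLQ"/>
    <to uri="log:dlq?level=WARN&amp;showHeaders=true"/>
  </route>
</camelContext>
```

An empty DLQ alongside a nonzero "Lost Messages between Queues" count, as in Stephan's runs, suggests the missing messages were not dead-lettered but never reached the output queue at all.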
>>
>> Regards
>> Stephan
>>
>>
>> On Fri, Feb 5, 2016 at 7:40 PM, Quinn Stevenson <
>> qu...@pronoia-solutions.com> wrote:
>>
>>> Stephan -
>>>
>>> I’ll get a broker running and try to match your version - I think I can
>>> get it from one of my customers who’s running Fuse 6.2.
>>>
>>> While I do that - have you considered trying to reproduce this using an
>>> embedded broker that the test could control? It would make it much easier
>>> to reproduce.
>>>
>>> I don’t think running the broker locally vs. remotely should increase the
>>> probability of losing messages - we shouldn’t lose any as long as the
>>> configuration is correct. It may increase the probability of an issue, but
>>> we shouldn’t lose messages.
>>>
>>> Also, just to confirm - when you’re testing this you are
>>> stopping/starting the broker in the middle of the test, not killing and
>>> restarting the broker - correct?
>>>
>>>
>>> > On Feb 5, 2016, at 12:37 AM, Stephan Burkard <sburk...@gmail.com> wrote:
>>> >
>>> > Hi Quinn
>>> >
>>> > I just tested the POM changes you posted, and the second run failed
>>> > (without the failover URL). I then tested with the failover URL, and the
>>> > third attempt failed.
>>> >
>>> > The latter is no big surprise, since I discovered the problem during
>>> > failover tests in a master-slave config. I then reduced the setup to a
>>> > single-broker environment, and the problem was still there.
>>> >
>>> > My test broker is apache-activemq-5.11.0.redhat-620133, a patched Red Hat
>>> > version of AMQ 5.11. Like you, I don't change the AMQ version number in
>>> > the POM; I just use a newer broker than the library version. My broker
>>> > runs on a different machine from the test. Perhaps this increases the
>>> > probability of losing a message?
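Stephan first saw the loss during failover tests against a master/slave pair, and Quinn always connects through a failover URL. A sketch of a pooled connection factory for such a pair follows; the host names `master-host` and `slave-host`, the bean ids and the pool size of 8 are hypothetical values, not taken from either project.

```xml
<bean id="jmsConnectionFactory" class="org.apache.activemq.ActiveMQConnectionFactory">
  <!-- Failover transport over a hypothetical master/slave pair.
       randomize=false: try the URIs in the listed order (master first). -->
  <property name="brokerURL"
            value="failover:(tcp://master-host:61616,tcp://slave-host:61616)?randomize=false"/>
</bean>

<!-- Pool connections so that Camel's JmsTemplate does not open a new
     connection for every send -->
<bean id="pooledConnectionFactory" class="org.apache.activemq.pool.PooledConnectionFactory"
      init-method="start" destroy-method="stop">
  <property name="maxConnections" value="8"/>
  <property name="connectionFactory" ref="jmsConnectionFactory"/>
</bean>
```

With the failover transport the client reconnects transparently after a broker stop or a master/slave switch, which is why Quinn's test keeps running across a restart while a plain `tcp://` URL may just stop.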
>>> >
>>> > Regards
>>> > Stephan
>>> >
>>> >
>>> > On Thu, Feb 4, 2016 at 7:06 PM, Quinn Stevenson <
>>> > qu...@pronoia-solutions.com> wrote:
>>> >
>>> >> I tested this with a 5.9.0 broker and I am seeing messages dropped with
>>> >> the TxText, but I still have to use the failover URL or the test just
>>> >> stops after the broker is restarted.
>>> >>
>>> >> I don’t have a 5.9.1 broker to test with, so I don’t know if that would
>>> >> help, but the next oldest broker I have is 5.10.1, and it seems to be
>>> >> working with that broker.
>>> >>
>>> >> NOTE: I’m not changing the activemq-version in the POM when I change the
>>> >> broker version - I’m just starting a different broker (locally) on the
>>> >> same port.
>>> >>
>>> >>
>>> >>> On Feb 4, 2016, at 10:41 AM, Quinn Stevenson <
>>> >>> qu...@pronoia-solutions.com> wrote:
>>> >>>
>>> >>> I still can’t make either test drop messages between the input and the
>>> >>> output queue with the POM changes I sent, but I did find one difference
>>> >>> between what you’ve done and what I normally do that changes the output
>>> >>> I’m seeing - I always use a failover URL:
>>> >>>
>>> >>> <property name="brokerURL"
>>> >>>   value="failover:(tcp://localhost:61616?wireFormat.tightEncodingEnabled=false)"/>
>>> >>>
>>> >>> My test broker is v5.10.1 as well - I’ll see if it makes any difference
>>> >>> with 5.9.0.
>>> >>>
>>> >>>
>>> >>>
>>> >>>> On Feb 4, 2016, at 9:52 AM, Quinn Stevenson <
>>> >>>> qu...@pronoia-solutions.com> wrote:
>>> >>>>
>>> >>>> It is strange - I’m trying to compare what you have in the “standard”
>>> >>>> version to what I did before.
>>> >>>> We tested our configs pretty heavily under all sorts of strange
>>> >>>> conditions to verify we weren’t losing messages, but we were using
>>> >>>> newer versions of Camel and ActiveMQ.
>>> >>>>
>>> >>>> So we’re on the same page - can you try your tests again with POM
>>> >>>> dependencies that look something like this?
>>> >>>>
>>> >>>> <properties>
>>> >>>>   <camel-version>2.12.5</camel-version>
>>> >>>>   <activemq-version>5.9.0</activemq-version>
>>> >>>> </properties>
>>> >>>>
>>> >>>> <dependencies>
>>> >>>>   <dependency>
>>> >>>>     <groupId>org.apache.activemq</groupId>
>>> >>>>     <artifactId>activemq-all</artifactId>
>>> >>>>     <version>${activemq-version}</version>
>>> >>>>   </dependency>
>>> >>>>   <dependency>
>>> >>>>     <groupId>org.apache.activemq</groupId>
>>> >>>>     <artifactId>activemq-pool</artifactId>
>>> >>>>     <version>${activemq-version}</version>
>>> >>>>   </dependency>
>>> >>>>
>>> >>>>   <dependency>
>>> >>>>     <groupId>org.apache.camel</groupId>
>>> >>>>     <artifactId>camel-spring</artifactId>
>>> >>>>     <version>${camel-version}</version>
>>> >>>>   </dependency>
>>> >>>>   <dependency>
>>> >>>>     <groupId>org.apache.camel</groupId>
>>> >>>>     <artifactId>camel-jms</artifactId>
>>> >>>>     <version>${camel-version}</version>
>>> >>>>   </dependency>
>>> >>>>
>>> >>>>   <dependency>
>>> >>>>     <groupId>org.apache.camel</groupId>
>>> >>>>     <artifactId>camel-test-spring</artifactId>
>>> >>>>     <version>${camel-version}</version>
>>> >>>>     <scope>test</scope>
>>> >>>>   </dependency>
>>> >>>>
>>> >>>>   <dependency>
>>> >>>>     <groupId>commons-collections</groupId>
>>> >>>>     <artifactId>commons-collections</artifactId>
>>> >>>>     <version>3.2.1</version>
>>> >>>>     <scope>test</scope>
>>> >>>>   </dependency>
>>> >>>>   <dependency>
>>> >>>>     <groupId>org.hamcrest</groupId>
>>> >>>>     <artifactId>hamcrest-integration</artifactId>
>>> >>>>     <version>1.3</version>
>>> >>>>     <scope>test</scope>
>>> >>>>   </dependency>
>>> >>>>
>>> >>>> </dependencies>
>>> >>>>
>>> >>>>
>>> >>>>
>>> >>>>> On Feb 4, 2016, at 9:49 AM, Stephan Burkard <sburk...@gmail.com> wrote:
>>> >>>>>
>>> >>>>> Hi Quinn
>>> >>>>>
>>> >>>>> The "standard" version is the big mystery. As I stated in my first post,
>>> >>>>> a Red Hat engineer analysed a similar project (with less bookkeeping and
>>> >>>>> logging) and concluded that as soon as a transaction manager is
>>> >>>>> explicitly defined, the Spring JmsTemplate (which Camel uses under the
>>> >>>>> hood) creates two of them, whether by bug, by accident or just through
>>> >>>>> strange behaviour.
>>> >>>>>
>>> >>>>> This conclusion was quite surprising, since it means that all our
>>> >>>>> Camel JMS projects are theoretically exposed to message loss.
>>> >>>>>
>>> >>>>> The "no-tx" version should definitely be OK; see also CAMEL-5055 for the
>>> >>>>> "lazyCreateTransactionManager" flag. The JMS transaction manager is not
>>> >>>>> defined, but one is created implicitly because of "transacted = true".
>>> >>>>>
>>> >>>>> The two "flaws" you mentioned may be an issue. It would be somewhat
>>> >>>>> reassuring if the flaw were in my project.
>>> >>>>>
>>> >>>>> Regards
>>> >>>>> Stephan
>>> >>>>>
>>> >>>>>
>>> >>>>>
>>> >>>>> On Thu, Feb 4, 2016 at 4:44 PM, Quinn Stevenson <
>>> >>>>> qu...@pronoia-solutions.com> wrote:
>>> >>>>>
>>> >>>>>> I’m still going through the project, but the first couple of things
>>> >>>>>> that jump out at me are that you have two Spring versions - the one you
>>> >>>>>> explicitly put in your POM (3.2.8.RELEASE) and the one pulled in by
>>> >>>>>> camel-spring (3.2.11.RELEASE). Also, camel-spring should be included in
>>> >>>>>> the POM since you’re using Spring routes.
>>> >>>>>> I’m not sure if that’s enough to cause issues or not.
>>> >>>>>>
>>> >>>>>> I believe what’s going on with the “no-tx” version is that you’re
>>> >>>>>> actually using JMS transactions, since you still have transacted set to
>>> >>>>>> true in the JmsConfiguration.
>>> >>>>>>
>>> >>>>>> I’m not sure what’s going on with the “standard” version - it looks
>>> >>>>>> similar to some XA stuff I’ve set up before (because I had multiple
>>> >>>>>> brokers involved), except I had to use XA connection factories.
>>> >>>>>>
>>> >>>>>>
>>> >>>>>>
>>> >>>>>>> On Feb 3, 2016, at 3:12 PM, Stephan Burkard <sburk...@gmail.com> wrote:
>>> >>>>>>>
>>> >>>>>>> Yes, same broker. There is only one ActiveMQ connection config in the
>>> >>>>>>> project.
>>> >>>>>>>
>>> >>>>>>>
>>> >>>>>>>
>>> >>>>>>> On Wed, Feb 3, 2016 at 8:00 PM, Quinn Stevenson <
>>> >>>>>>> qu...@pronoia-solutions.com> wrote:
>>> >>>>>>>
>>> >>>>>>>> Are both the source and destination queues hosted by the same
>>> >>>>>>>> ActiveMQ broker?
>>> >>>>>>>>
>>> >>>>>>>>
>>> >>>>>>>>
>>> >>>>>>>>> On Feb 3, 2016, at 8:21 AM, Stephan Burkard <sburk...@gmail.com> wrote:
>>> >>>>>>>>>
>>> >>>>>>>>> Hi
>>> >>>>>>>>>
>>> >>>>>>>>> I have built a small Maven project (attached) to demonstrate a JMS
>>> >>>>>>>>> transaction problem in Camel routes under certain load conditions.
>>> >>>>>>>>> In fact, I am losing messages between two queues.
>>> >>>>>>>>>
>>> >>>>>>>>> The project contains two different flavours of the same test. One of
>>> >>>>>>>>> them suffers from the problem; according to my tests, the other does
>>> >>>>>>>>> not.
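Quinn's reading of the "noTxManager" flavour is that it still uses local JMS transactions because `transacted` is true in the `JmsConfiguration`. The fragment below sketches what such a configuration might look like; the bean ids and the `pooledConnectionFactory` reference are assumptions, not Stephan's actual beans. Per the Camel JMS documentation, `lazyCreateTransactionManager` (default `true`, see CAMEL-5055) controls whether Camel creates a `JmsTransactionManager` by itself when `transacted=true` and no transaction manager is injected.

```xml
<bean id="jmsConfig" class="org.apache.camel.component.jms.JmsConfiguration">
  <property name="connectionFactory" ref="pooledConnectionFactory"/>
  <!-- Transacted JMS sessions: consumption is only committed
       when the session commits -->
  <property name="transacted" value="true"/>
  <!-- Do not let Camel create a JmsTransactionManager behind the scenes -->
  <property name="lazyCreateTransactionManager" value="false"/>
</bean>

<!-- Hypothetical component id "activemq", used in endpoint URIs such as activemq:queue:input -->
<bean id="activemq" class="org.apache.camel.component.jms.JmsComponent">
  <property name="configuration" ref="jmsConfig"/>
</bean>
```

No Spring transaction manager, transaction policy or `<transacted/>` element appears here, which matches the original post's description of this flavour.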
>>> >>>>>>>>>
>>> >>>>>>>>>
>>> >>>>>>>>> *** What does the test case do?
>>> >>>>>>>>> 1. It produces 1000 messages (100/s) and sends them to an "input" queue.
>>> >>>>>>>>> 2. It sends the messages from the "input" queue to an "output" queue.
>>> >>>>>>>>> 3. Finally, it consumes the messages from the "output" queue to count
>>> >>>>>>>>>    them.
>>> >>>>>>>>>
>>> >>>>>>>>>
>>> >>>>>>>>> *** What is the difference between the two test flavours?
>>> >>>>>>>>> - There is a "standard" flavour that suffers from the problem
>>> >>>>>>>>> - And there is a "noTxManager" flavour that does not seem to have
>>> >>>>>>>>>   the problem
>>> >>>>>>>>> - The "standard" flavour is a well-known Camel/ActiveMQ configuration
>>> >>>>>>>>>   - with a Spring transaction manager
>>> >>>>>>>>>   - with a Spring transaction policy
>>> >>>>>>>>>   - with a "transacted" flag in the Camel routes
>>> >>>>>>>>> - The "noTxManager" flavour is a "simple" configuration
>>> >>>>>>>>>   - no Spring transaction manager
>>> >>>>>>>>>   - no Spring transaction policy
>>> >>>>>>>>>   - no "transacted" flag in the Camel routes
>>> >>>>>>>>>   - BUT: "lazyCreateTransactionManager" = false (so the routes are
>>> >>>>>>>>>     transacted too)
>>> >>>>>>>>>
>>> >>>>>>>>>
>>> >>>>>>>>> *** How do you run the test cases?
>>> >>>>>>>>> 1. Replace "[yourBrokerHost]" with the hostname of your ActiveMQ broker
>>> >>>>>>>>> 2. Run the test case as a JUnit test
>>> >>>>>>>>> 3. When you see lots of console messages saying that messages are
>>> >>>>>>>>>    being sent, stop your ActiveMQ broker (do not kill -9 it, just shut
>>> >>>>>>>>>    it down normally)
>>> >>>>>>>>> 4. Exceptions are thrown on the console output
>>> >>>>>>>>> 5. After a few seconds, start your broker again
>>> >>>>>>>>> 6. The test finishes normally and, after a few seconds, dumps a
>>> >>>>>>>>>    bookkeeping summary to the console
>>> >>>>>>>>>
>>> >>>>>>>>>
>>> >>>>>>>>> *** How do you interpret the results?
>>> >>>>>>>>> - When the test is successful, no message is lost. You can run the
You can run >>> the >>> >> test >>> >>>>>>>> without broker shutdown/startup and it will obviously always be >>> >>>>>> successful. >>> >>>>>>>>> - When the test fails, one or more messages are lost between >>> queue >>> >>>>>>>> "input" and "output". In my tests I was not able to run the >>> >> "standard" >>> >>>>>>>> flavour three times in a row successfully. About every second >>> run >>> >>>>>> failed. >>> >>>>>>>> In contrast, the "noTxManager" flavour never failed in my tests. >>> >>>>>>>>> >>> >>>>>>>>> The book keeping for a failed test looks like the following. In >>> >> this >>> >>>>>>>> example Message number 281 is arrived at the input queue but >>> not at >>> >> the >>> >>>>>>>> output queue. So it is lost. >>> >>>>>>>>> >>> >>>>>>>>> Messages created by Client: 1000 >>> >>>>>>>>> Client Exceptions during send: 0 [] >>> >>>>>>>>> >>> >>>>>>>>> Messages received at input queue: 993 >>> >>>>>>>>> Missing Messages at input queue: 7 >>> >> [282,283,284,285,286,287,288] >>> >>>>>>>>> Duplicate Messages at input queue: 0 [] >>> >>>>>>>>> >>> >>>>>>>>> Messages received at output queue: 992 >>> >>>>>>>>> Missing Messages at output queue: 8 >>> >>>>>> [281,282,283,284,285,286,287,288] >>> >>>>>>>>> Duplicate Messages at output queue: 0 [] >>> >>>>>>>>> >>> >>>>>>>>> Lost Messages between Queues: 1 [281] >>> >>>>>>>>> >>> >>>>>>>>> >>> >>>>>>>>> *** What is the problem? >>> >>>>>>>>> A Redhat engineer tracked the problem down to a Spring JMS >>> template >>> >>>>>>>> behaviour that is kind of strange. If a Spring transaction >>> manager >>> >> is >>> >>>>>>>> defined in the config, it will end up with two of them. >>> Therefore >>> >> the >>> >>>>>> small >>> >>>>>>>> time range where messages can get lost that arises only when you >>> >> have a >>> >>>>>>>> certain load. >>> >>>>>>>>> >>> >>>>>>>>> >>> >>>>>>>>> *** So, what is my question? 
>>> >>>>>>>>> - Does this really mean that it is unsafe to use the "standard"
>>> >>>>>>>>>   flavour of configuration?
>>> >>>>>>>>> - Is there another config with a TxManager etc. that works correctly?
>>> >>>>>>>>> - What are the limits of the "noTxManager" config? When is it not
>>> >>>>>>>>>   sufficient?
>>> >>>>>>>>>
>>> >>>>>>>>> Regards
>>> >>>>>>>>> Stephan
>>> >>>>>>>>>
>>> >>>>>>>>> <CamelAmqTxTest.zip>
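In his latest reply, Stephan reports that the "standard" flavour passed five runs in a row once `transacted` was set to false while the transaction manager and policy were kept. The Spring XML below is a sketch of that combination, not his project's file: it assumes a pooled connection factory bean named `pooledConnectionFactory`, uses the queue names `input` and `output` from the test description, and all bean and route ids are hypothetical.

```xml
<!-- Spring-managed local JMS transactions -->
<bean id="jmsTransactionManager" class="org.springframework.jms.connection.JmsTransactionManager">
  <property name="connectionFactory" ref="pooledConnectionFactory"/>
</bean>

<!-- Policy referenced by <transacted ref="..."/> in the route -->
<bean id="PROPAGATION_REQUIRED" class="org.apache.camel.spring.spi.SpringTransactionPolicy">
  <property name="transactionManager" ref="jmsTransactionManager"/>
  <property name="propagationBehaviorName" value="PROPAGATION_REQUIRED"/>
</bean>

<bean id="jmsConfig" class="org.apache.camel.component.jms.JmsConfiguration">
  <!-- Same connection factory instance as the transaction manager -->
  <property name="connectionFactory" ref="pooledConnectionFactory"/>
  <property name="transactionManager" ref="jmsTransactionManager"/>
  <!-- The change Stephan reports as making this flavour pass -->
  <property name="transacted" value="false"/>
</bean>

<bean id="activemq" class="org.apache.camel.component.jms.JmsComponent">
  <property name="configuration" ref="jmsConfig"/>
</bean>

<camelContext xmlns="http://camel.apache.org/schema/spring">
  <route id="inputToOutput">
    <from uri="activemq:queue:input"/>
    <!-- Run the exchange inside the Spring-managed JMS transaction -->
    <transacted ref="PROPAGATION_REQUIRED"/>
    <to uri="activemq:queue:output"/>
  </route>
</camelContext>
```

Both the `JmsTransactionManager` and the `JmsConfiguration` point at the same connection factory bean on purpose: Spring's `JmsTemplate` only joins a transaction started by `JmsTransactionManager` when it looks up the session through the same `ConnectionFactory` instance. Five passing runs are encouraging, but they do not by themselves show that the timing window is closed.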