Re: Transaction problem with Camel, ActiveMQ and Spring JMS

Stephan Burkard Mon, 08 Feb 2016 08:30:58 -0800

Oh, our messages overlapped...

Your questions:


"... doing this queue to queue work using one or two ActiveMQ brokers?"
=> One broker

"... you may want to try camel-sims"
=> I guess you mean Camel sJms; that's the closest match I found in the
list of Camel components on GitHub :-) I had never heard of it (OK, it has
only been around since 2.11), but I will have a look at it.

"If you’d be using XA in the real world..."
=> No, we don't use XA


But your hint with "transacted = false" works! I was able to run the
"standard" version 5 times in a row and it was always successful.

Currently it looks like one has to either define the whole Tx setup and set
transacted = false, OR use the simple config with transacted = true.
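The combination that passed five runs in a row can be sketched in Spring XML. This is a hedged sketch, not the attached project's config: the bean ids `jmsConfig`, `activemq`, `jmsConnectionFactory` and `jmsTransactionManager` are assumptions, and the connection factory and the Spring `JmsTransactionManager` are taken to be defined elsewhere, as in the usual "standard" setup with a `<transacted/>` route.

```xml
<!-- Hypothetical bean ids; jmsConnectionFactory and jmsTransactionManager
     are assumed to be defined elsewhere in the same Spring context -->
<bean id="jmsConfig" class="org.apache.camel.component.jms.JmsConfiguration">
  <property name="connectionFactory" ref="jmsConnectionFactory"/>
  <!-- Explicit Spring transaction manager, as in the "standard" flavour -->
  <property name="transactionManager" ref="jmsTransactionManager"/>
  <!-- The flag change reported in this thread to stop the message loss -->
  <property name="transacted" value="false"/>
</bean>

<bean id="activemq" class="org.apache.activemq.camel.component.ActiveMQComponent">
  <property name="configuration" ref="jmsConfig"/>
</bean>
```

Only five runs were tried, so this is an observation rather than proof that the loss window is closed.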

At least I learned that I have no real understanding of what this flag
does. And I would claim that most examples use transacted = true even when
they define a Tx manager etc.

Regards
Stephan





On Mon, Feb 8, 2016 at 4:59 PM, Stephan Burkard <sburk...@gmail.com> wrote:

> Hi Quinn
>
> Here is the new version of my test project, which uses an embedded
> ActiveMQ broker. Since the AMQ libs are version 5.9.0 (standard edition),
> the special Redhat broker version is no longer involved.
>
> There is a new BrokerManagementExecutor that is configured to stop the
> broker 5 seconds after the test starts. On my machine the broker shutdown
> happens after about 400 messages are sent. Some seconds after the broker is
> stopped, it starts again and the test finishes. I guess the automagical
> broker restart is due to the vm-transport used.
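An embedded broker of this kind can be wired in Spring XML roughly as below. This is a sketch under assumptions: the broker name `testBroker`, the data directory and the bean ids are hypothetical, and the test project's `BrokerManagementExecutor` is not shown. The vm transport's `create=false` option stops the client from silently starting a fresh broker while `testBroker` is down, so any restart has to come from the test itself; wrapping the URL in `failover:` should let the client reconnect once it does.

```xml
<!-- Hypothetical embedded broker for the test; name and directory are assumptions -->
<bean id="broker" class="org.apache.activemq.broker.BrokerService"
      init-method="start" destroy-method="stop">
  <property name="brokerName" value="testBroker"/>
  <!-- Keep persistence on so queued messages survive a stop/start -->
  <property name="persistent" value="true"/>
  <property name="dataDirectory" value="target/activemq-data"/>
  <property name="useJmx" value="false"/>
</bean>

<!-- vm:// connects inside the JVM; create=false disables auto-creating a broker,
     failover:(...) makes the client reconnect after the broker comes back -->
<bean id="jmsConnectionFactory" class="org.apache.activemq.ActiveMQConnectionFactory"
      depends-on="broker">
  <property name="brokerURL" value="failover:(vm://testBroker?create=false)"/>
</bean>
```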
>
> Results on my machine:
>
> 1. The "noTxManager" version still never fails, so it never loses a
> message between the queues. It often misses one or two messages, but
> always on both queues: these messages could not be sent by the client,
> but every message that arrived at the first queue also arrived at the
> second queue.
>
> 2. The "standard" version fails on almost every attempt; it mostly loses
> one or two messages between the queues. So these messages arrived at the
> first queue but not at the second one.
>
> Since the embedded broker is stopped at the end of the test, I added an
> additional Camel route that consumes the default DLQ and included the
> messages that arrived there in the test summary output. However, on my
> machine the DLQ never contained any messages.
>
> I hope you can now reproduce the problem more easily.
>
> Regards
> Stephan
>
>
>
>
>
>
> On Sat, Feb 6, 2016 at 10:32 PM, Stephan Burkard <sburk...@gmail.com> wrote:
>
>> Hi Quinn
>>
>> I don't think you need to match my broker version exactly. I first
>> discovered this issue on ActiveMQ 5.9.0 standard edition, and I guess
>> that every broker version suffers from it. I really don't think it is an
>> ActiveMQ problem; according to Redhat, it is a Spring JMS problem.
>>
>> No, I never tried to use an embedded broker, probably because I was using
>> remote brokers when I discovered the problem during master-slave failover
>> tests. I will try to rewrite the test project to use an embedded broker
>> that can be stopped and started as part of the test.
>>
>> Yes, that's what I meant: the remote broker increases the probability
>> that the issue shows up. If Redhat's analysis is correct, it really is a
>> timing issue. You can also increase the chance of hitting it by producing
>> even more messages per second. That increases the probability that a
>> message falls into the problematic time slice where the consumer has
>> committed but the producer has not.
>>
>> Yes, that's right. I start the test, and when I see lots of console output
>> I hit Enter in the console where the broker's stop command was waiting.
>> Then I wait about 5 to 10 seconds and start the broker again. The test
>> reconnects and continues.
>>
>> Regards
>> Stephan
>>
>>
>>
>>
>>
>> On Fri, Feb 5, 2016 at 7:40 PM, Quinn Stevenson <qu...@pronoia-solutions.com> wrote:
>>
>>> Stephan -
>>>
>>> I’ll get a broker running and try to match your version - I think I can
>>> get it from one of my customers who’s running Fuse 6.2.
>>>
>>> While I do that - have you considered trying to reproduce this using an
>>> embedded broker that the test could control?  It would make it much easier
>>> to reproduce.
>>>
>>> I don’t think running the broker locally vs remotely should increase the
>>> probability of losing messages - we shouldn’t lose any as long as the
>>> configuration is correct.  It may increase the probability of an issue,
>>> but we shouldn’t lose messages.
>>>
>>> Also, just to confirm - when you’re testing this you are
>>> stopping/starting the broker in the middle of the test, not killing and
>>> restarting the broker - correct?
>>>
>>>
>>> > On Feb 5, 2016, at 12:37 AM, Stephan Burkard <sburk...@gmail.com> wrote:
>>> >
>>> > Hi Quinn
>>> >
>>> > I just tested the POM changes you posted and the second run failed
>>> > (without failover-URL). I then tested with the failover-URL and the
>>> > third attempt failed.
>>> >
>>> > The latter is no big surprise since I discovered the problem during
>>> > failover tests in a master-slave-config. I then reduced the setup to a
>>> > single broker environment and it was still there.
>>> >
>>> > My test broker is apache-activemq-5.11.0.redhat-620133, a patched Redhat
>>> > version of AMQ 5.11. Like you, I don't change the AMQ version number in
>>> > the POM; I just use a newer broker than the library version. My broker
>>> > runs on a different machine than the test. Perhaps this increases the
>>> > probability of losing a message?
>>> >
>>> > Regards
>>> > Stephan
>>> >
>>> >
>>> >
>>> >
>>> > On Thu, Feb 4, 2016 at 7:06 PM, Quinn Stevenson <qu...@pronoia-solutions.com> wrote:
>>> >
>>> >> I tested this with a 5.9.0 broker and I am seeing messages dropped with
>>> >> the TxTest, but I still have to use the failover URL or the test just
>>> >> stops after the broker is restarted.
>>> >>
>>> >> I don’t have a 5.9.1 broker to test with, so I don’t know if that would
>>> >> help, but the next oldest broker I have is 5.10.1, and it seems to be
>>> >> working with that broker.
>>> >>
>>> >> NOTE:  I’m not changing the activemq-version in the POM when I change the
>>> >> broker version - I’m just starting a different broker (locally) on the
>>> >> same port.
>>> >>
>>> >>
>>> >>> On Feb 4, 2016, at 10:41 AM, Quinn Stevenson <qu...@pronoia-solutions.com> wrote:
>>> >>>
>>> >>> I still can’t make either test drop messages between the input and the
>>> >>> output queue with the POM changes I sent, but I did find one difference
>>> >>> between what you’ve done and what I normally do that changes the output
>>> >>> I’m seeing - I always use a failover URL:
>>> >>>
>>> >>> <property name="brokerURL"
>>> >>>           value="failover:(tcp://localhost:61616?wireFormat.tightEncodingEnabled=false)"/>
>>> >>>
>>> >>> My test broker is v 5.10.1 as well - I’ll see if it makes any difference
>>> >>> with 5.9.0.
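For context, a failover URL like the one above normally sits on the ActiveMQ connection factory, often behind a pool. A sketch, assuming the `activemq-pool` dependency from the POM in this thread; the bean ids and the `maxConnections` value are illustrative and not taken from either test project.

```xml
<!-- Plain ActiveMQ factory carrying the failover URL quoted above -->
<bean id="amqConnectionFactory" class="org.apache.activemq.ActiveMQConnectionFactory">
  <!-- failover: the client reconnects when the broker is stopped and restarted -->
  <property name="brokerURL"
            value="failover:(tcp://localhost:61616?wireFormat.tightEncodingEnabled=false)"/>
</bean>

<!-- Pooled wrapper (activemq-pool); maxConnections is an illustrative value -->
<bean id="jmsConnectionFactory" class="org.apache.activemq.pool.PooledConnectionFactory"
      init-method="start" destroy-method="stop">
  <property name="maxConnections" value="8"/>
  <property name="connectionFactory" ref="amqConnectionFactory"/>
</bean>
```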
>>> >>>
>>> >>>
>>> >>>
>>> >>>> On Feb 4, 2016, at 9:52 AM, Quinn Stevenson <qu...@pronoia-solutions.com> wrote:
>>> >>>>
>>> >>>> It is strange - I’m trying to compare what you have in the “standard”
>>> >>>> version to what I did before.  We tested our configs pretty heavily
>>> >>>> under all sorts of strange conditions to verify we weren’t losing
>>> >>>> messages, but we were using newer versions of Camel and ActiveMQ.
>>> >>>>
>>> >>>> So we’re on the same page - can you try your tests again with POM
>>> >>>> dependencies that look something like this?
>>> >>>>
>>> >>>> <properties>
>>> >>>>    <camel-version>2.12.5</camel-version>
>>> >>>>    <activemq-version>5.9.0</activemq-version>
>>> >>>> </properties>
>>> >>>>
>>> >>>> <dependencies>
>>> >>>>    <dependency>
>>> >>>>        <groupId>org.apache.activemq</groupId>
>>> >>>>        <artifactId>activemq-all</artifactId>
>>> >>>>        <version>${activemq-version}</version>
>>> >>>>    </dependency>
>>> >>>>    <dependency>
>>> >>>>        <groupId>org.apache.activemq</groupId>
>>> >>>>        <artifactId>activemq-pool</artifactId>
>>> >>>>        <version>${activemq-version}</version>
>>> >>>>    </dependency>
>>> >>>>
>>> >>>>    <dependency>
>>> >>>>        <groupId>org.apache.camel</groupId>
>>> >>>>        <artifactId>camel-spring</artifactId>
>>> >>>>        <version>${camel-version}</version>
>>> >>>>    </dependency>
>>> >>>>    <dependency>
>>> >>>>        <groupId>org.apache.camel</groupId>
>>> >>>>        <artifactId>camel-jms</artifactId>
>>> >>>>        <version>${camel-version}</version>
>>> >>>>    </dependency>
>>> >>>>
>>> >>>>    <dependency>
>>> >>>>        <groupId>org.apache.camel</groupId>
>>> >>>>        <artifactId>camel-test-spring</artifactId>
>>> >>>>        <version>${camel-version}</version>
>>> >>>>        <scope>test</scope>
>>> >>>>    </dependency>
>>> >>>>
>>> >>>>    <dependency>
>>> >>>>        <groupId>commons-collections</groupId>
>>> >>>>        <artifactId>commons-collections</artifactId>
>>> >>>>        <version>3.2.1</version>
>>> >>>>        <scope>test</scope>
>>> >>>>    </dependency>
>>> >>>>    <dependency>
>>> >>>>        <groupId>org.hamcrest</groupId>
>>> >>>>        <artifactId>hamcrest-integration</artifactId>
>>> >>>>        <version>1.3</version>
>>> >>>>        <scope>test</scope>
>>> >>>>    </dependency>
>>> >>>>
>>> >>>> </dependencies>
>>> >>>>
>>> >>>>
>>> >>>>
>>> >>>>> On Feb 4, 2016, at 9:49 AM, Stephan Burkard <sburk...@gmail.com> wrote:
>>> >>>>>
>>> >>>>> Hi Quinn
>>> >>>>>
>>> >>>>> The "standard" version is the big mystery. As I stated in my first
>>> >>>>> post, a Redhat engineer analysed a similar project (with less
>>> >>>>> bookkeeping and logging) and concluded that as soon as a transaction
>>> >>>>> manager is explicitly defined, the Spring JMS Template (which Camel
>>> >>>>> uses under the hood) creates two of them, whether by bug, by accident
>>> >>>>> or just by strange behaviour.
>>> >>>>>
>>> >>>>> This conclusion was quite surprising, since it means that all our
>>> >>>>> Camel-JMS projects could theoretically suffer from message loss.
>>> >>>>>
>>> >>>>> The "no-tx" version should definitely be OK; see also CAMEL-5055 for
>>> >>>>> the "lazyCreateTransactionManager" flag. The JMS transaction manager
>>> >>>>> may not be defined, but it creates one implicitly because of
>>> >>>>> "transacted = true".
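A sketch of the "no-tx" (noTxManager) flavour as described in this thread: no transaction manager or policy bean, `transacted = true` on the JMS configuration, and `lazyCreateTransactionManager = false`. The bean ids, the `jmsConnectionFactory` reference and the queue names `input`/`output` are assumptions, not the attached project's files.

```xml
<!-- Hypothetical "noTxManager" setup; jmsConnectionFactory is defined elsewhere -->
<bean id="jmsConfig" class="org.apache.camel.component.jms.JmsConfiguration">
  <property name="connectionFactory" ref="jmsConnectionFactory"/>
  <!-- JMS sessions are transacted -->
  <property name="transacted" value="true"/>
  <!-- Controls whether Camel creates a JmsTransactionManager by itself
       when transacted=true and none is configured (see CAMEL-5055) -->
  <property name="lazyCreateTransactionManager" value="false"/>
</bean>

<bean id="activemq" class="org.apache.activemq.camel.component.ActiveMQComponent">
  <property name="configuration" ref="jmsConfig"/>
</bean>

<camelContext xmlns="http://camel.apache.org/schema/spring">
  <!-- No <transacted/> element and no Spring transaction policy in the route -->
  <route>
    <from uri="activemq:queue:input"/>
    <to uri="activemq:queue:output"/>
  </route>
</camelContext>
```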
>>> >>>>>
>>> >>>>> The two "flaws" you mentioned may well be an issue. It would be
>>> >>>>> somewhat reassuring if the flaw were in my project.
>>> >>>>>
>>> >>>>> Regards
>>> >>>>> Stephan
>>> >>>>>
>>> >>>>>
>>> >>>>>
>>> >>>>> On Thu, Feb 4, 2016 at 4:44 PM, Quinn Stevenson <qu...@pronoia-solutions.com> wrote:
>>> >>>>>
>>> >>>>>> I’m still going through the project, but the first couple of things
>>> >>>>>> that jump out at me are that you have two Spring versions - the one
>>> >>>>>> you explicitly put in your POM (3.2.8.RELEASE) and the one pulled in
>>> >>>>>> by camel-spring (3.2.11.RELEASE).  Also, camel-spring should be
>>> >>>>>> included in the POM since you’re using Spring routes.  I’m not sure if
>>> >>>>>> that’s enough to cause issues or not.
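One way to get rid of the second Spring version is to pin all Spring artifacts to one version in `dependencyManagement`. A sketch under assumptions: the chosen version (3.2.11.RELEASE, the one camel-spring is said to pull in) and the list of Spring artifacts are illustrative, since the project's explicit Spring dependencies are not shown in this thread.

```xml
<properties>
  <!-- Assumption: align with the Spring version camel-spring brings in -->
  <spring-version>3.2.11.RELEASE</spring-version>
</properties>

<dependencyManagement>
  <dependencies>
    <!-- Every Spring module used by the project should be listed here -->
    <dependency>
      <groupId>org.springframework</groupId>
      <artifactId>spring-jms</artifactId>
      <version>${spring-version}</version>
    </dependency>
    <dependency>
      <groupId>org.springframework</groupId>
      <artifactId>spring-tx</artifactId>
      <version>${spring-version}</version>
    </dependency>
    <dependency>
      <groupId>org.springframework</groupId>
      <artifactId>spring-context</artifactId>
      <version>${spring-version}</version>
    </dependency>
  </dependencies>
</dependencyManagement>
```

Running `mvn dependency:tree -Dincludes=org.springframework` before and after should show whether only one Spring version remains.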
>>> >>>>>>
>>> >>>>>> I believe what’s going on with the “no-tx” version is that you’re
>>> >>>>>> actually using JMS transactions, since you still have transacted set
>>> >>>>>> to true in the JmsConfiguration.
>>> >>>>>>
>>> >>>>>> I’m not sure what’s going on with the “standard” version - it looks
>>> >>>>>> similar to some XA stuff I’ve set up before (because I had multiple
>>> >>>>>> brokers involved), except I had to use XA Connection Factories.
>>> >>>>>>
>>> >>>>>>
>>> >>>>>>
>>> >>>>>>> On Feb 3, 2016, at 3:12 PM, Stephan Burkard <sburk...@gmail.com> wrote:
>>> >>>>>>>
>>> >>>>>>> Yes, same broker. There is only one ActiveMQ connection config in
>>> >>>>>>> the project.
>>> >>>>>>>
>>> >>>>>>>
>>> >>>>>>>
>>> >>>>>>> On Wed, Feb 3, 2016 at 8:00 PM, Quinn Stevenson <qu...@pronoia-solutions.com> wrote:
>>> >>>>>>>
>>> >>>>>>>> Are both the source and destination queues hosted by the same
>>> >>>>>>>> ActiveMQ broker?
>>> >>>>>>>>
>>> >>>>>>>>
>>> >>>>>>>>
>>> >>>>>>>>> On Feb 3, 2016, at 8:21 AM, Stephan Burkard <sburk...@gmail.com> wrote:
>>> >>>>>>>>>
>>> >>>>>>>>> Hi
>>> >>>>>>>>>
>>> >>>>>>>>> I have built a small Maven project (attached) to demonstrate a JMS
>>> >>>>>>>>> transaction problem in Camel routes under certain load conditions.
>>> >>>>>>>>> In fact, I am losing messages between two queues.
>>> >>>>>>>>>
>>> >>>>>>>>> The project contains two different flavours of the same test. One
>>> >>>>>>>>> of them suffers from the problem; the other (according to my tests)
>>> >>>>>>>>> does not.
>>> >>>>>>>>>
>>> >>>>>>>>>
>>> >>>>>>>>> *** What does the testcase do?
>>> >>>>>>>>> 1. Produces 1000 messages (100/s) and sends them to an "input" queue.
>>> >>>>>>>>> 2. Sends the messages from the "input" queue to an "output" queue.
>>> >>>>>>>>> 3. Finally consumes the messages from the "output" queue to count them.
>>> >>>>>>>>>
>>> >>>>>>>>>
>>> >>>>>>>>> *** What is the difference between the two test flavours?
>>> >>>>>>>>> - There is a "standard" flavour that suffers from the problem
>>> >>>>>>>>> - And there is a "noTxManager" flavour that seems not to have the
>>> >>>>>>>>>   problem
>>> >>>>>>>>> - The "standard" flavour is a well-known Camel/ActiveMQ configuration
>>> >>>>>>>>>   - with a Spring transaction manager
>>> >>>>>>>>>   - with a Spring transaction policy
>>> >>>>>>>>>   - with a "transacted" flag in Camel routes
>>> >>>>>>>>> - The "noTxManager" flavour is a "simple" configuration
>>> >>>>>>>>>   - no Spring transaction manager
>>> >>>>>>>>>   - no Spring transaction policy
>>> >>>>>>>>>   - no "transacted" flag in Camel routes
>>> >>>>>>>>>   - BUT: "lazyCreateTransactionManager" = false (so routes are
>>> >>>>>>>>>     transacted too)
>>> >>>>>>>>>
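For readers without the attachment, the "standard" flavour described above corresponds roughly to the following Spring XML. This is a reconstruction under assumptions, not the contents of CamelAmqTxTest.zip: the bean ids, the policy name `PROPAGATION_REQUIRED` and the queue names are hypothetical, and `jmsConnectionFactory` stands for whatever ActiveMQ connection factory the project defines.

```xml
<!-- Spring transaction manager for the JMS connection factory -->
<bean id="jmsTransactionManager" class="org.springframework.jms.connection.JmsTransactionManager">
  <property name="connectionFactory" ref="jmsConnectionFactory"/>
</bean>

<!-- Spring transaction policy, referenced from the route below -->
<bean id="PROPAGATION_REQUIRED" class="org.apache.camel.spring.spi.SpringTransactionPolicy">
  <property name="transactionManager" ref="jmsTransactionManager"/>
  <property name="propagationBehaviorName" value="PROPAGATION_REQUIRED"/>
</bean>

<bean id="jmsConfig" class="org.apache.camel.component.jms.JmsConfiguration">
  <property name="connectionFactory" ref="jmsConnectionFactory"/>
  <property name="transactionManager" ref="jmsTransactionManager"/>
  <property name="transacted" value="true"/>
</bean>

<bean id="activemq" class="org.apache.activemq.camel.component.ActiveMQComponent">
  <property name="configuration" ref="jmsConfig"/>
</bean>

<camelContext xmlns="http://camel.apache.org/schema/spring">
  <!-- "input" to "output", with the transacted flag in the route -->
  <route>
    <from uri="activemq:queue:input"/>
    <transacted ref="PROPAGATION_REQUIRED"/>
    <to uri="activemq:queue:output"/>
  </route>
</camelContext>
```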
>>> >>>>>>>>>
>>> >>>>>>>>> *** How to run the testcases?
>>> >>>>>>>>> 1. Replace "[yourBrokerHost]" with the hostname of your ActiveMQ broker
>>> >>>>>>>>> 2. Run the testcase as a JUnit test
>>> >>>>>>>>> 3. When you see lots of console messages saying that messages are
>>> >>>>>>>>>    sent, stop your ActiveMQ broker (do not kill -9 it, just shut it
>>> >>>>>>>>>    down normally)
>>> >>>>>>>>> 4. Exceptions appear in the console output
>>> >>>>>>>>> 5. After some seconds, start your broker again
>>> >>>>>>>>> 6. The test finishes normally and, after some seconds, dumps a
>>> >>>>>>>>>    bookkeeping summary on the console
>>> >>>>>>>>>
>>> >>>>>>>>>
>>> >>>>>>>>> *** How to interpret the results?
>>> >>>>>>>>> - When the test is successful, no message is lost. If you run the
>>> >>>>>>>>>   test without a broker shutdown/startup, it will obviously always
>>> >>>>>>>>>   succeed.
>>> >>>>>>>>> - When the test fails, one or more messages are lost between queue
>>> >>>>>>>>>   "input" and "output". In my tests I was not able to run the
>>> >>>>>>>>>   "standard" flavour successfully three times in a row; about every
>>> >>>>>>>>>   second run failed. In contrast, the "noTxManager" flavour never
>>> >>>>>>>>>   failed in my tests.
>>> >>>>>>>>>
>>> >>>>>>>>> The bookkeeping for a failed test looks like the following. In this
>>> >>>>>>>>> example, message number 281 arrived at the input queue but not at
>>> >>>>>>>>> the output queue, so it is lost.
>>> >>>>>>>>>
>>> >>>>>>>>> Messages created by Client:          1000
>>> >>>>>>>>> Client Exceptions during send:       0 []
>>> >>>>>>>>>
>>> >>>>>>>>> Messages received at input queue:    993
>>> >>>>>>>>> Missing Messages at input queue:     7 [282,283,284,285,286,287,288]
>>> >>>>>>>>> Duplicate Messages at input queue:   0 []
>>> >>>>>>>>>
>>> >>>>>>>>> Messages received at output queue:   992
>>> >>>>>>>>> Missing Messages at output queue:    8 [281,282,283,284,285,286,287,288]
>>> >>>>>>>>> Duplicate Messages at output queue:  0 []
>>> >>>>>>>>>
>>> >>>>>>>>> Lost Messages between Queues:        1 [281]
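The bookkeeping lines are related by simple set arithmetic. Writing $C$ for the 1000 message numbers created by the client, $I$ and $O$ for those received at the input and output queues, and $M_{\text{in}}$, $M_{\text{out}}$ for the missing lists (notation introduced here, not used by the test project), the lost messages $L$ are those that reached the input queue but never the output queue:

```latex
\begin{aligned}
M_{\text{in}}  &= C \setminus I = \{282,\dots,288\}, & |I| &= 1000 - 7 = 993,\\
M_{\text{out}} &= C \setminus O = \{281,\dots,288\}, & |O| &= 1000 - 8 = 992,\\
L &= I \setminus O = M_{\text{out}} \setminus M_{\text{in}} = \{281\}, & |L| &= |M_{\text{out}}| - |M_{\text{in}}| = 8 - 7 = 1.
\end{aligned}
```

The count reduces to 8 - 7 because, in this run, every message missing at the input queue is also missing at the output queue ($M_{\text{in}} \subseteq M_{\text{out}}$) and there are no duplicates.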
>>> >>>>>>>>>
>>> >>>>>>>>>
>>> >>>>>>>>> *** What is the problem?
>>> >>>>>>>>> A Redhat engineer tracked the problem down to a somewhat strange
>>> >>>>>>>>> Spring JMS template behaviour: if a Spring transaction manager is
>>> >>>>>>>>> defined in the config, you end up with two of them. This opens a
>>> >>>>>>>>> small time window in which messages can get lost, which only shows
>>> >>>>>>>>> up under a certain load.
>>> >>>>>>>>>
>>> >>>>>>>>>
>>> >>>>>>>>> *** So, what is my question?
>>> >>>>>>>>> - Does this really mean that it is unsafe to use the "standard"
>>> >>>>>>>>>   flavour of configuration?
>>> >>>>>>>>> - Is there another config with a TxManager etc. that works correctly?
>>> >>>>>>>>> - What are the limits of the "noTxManager" config? When is it not
>>> >>>>>>>>>   sufficient?
>>> >>>>>>>>>
>>> >>>>>>>>> Regards
>>> >>>>>>>>> Stephan
>>> >>>>>>>>>
>>> >>>>>>>>>
>>> >>>>>>>>>
>>> >>>>>>>>>
>>> >>>>>>>>> <CamelAmqTxTest.zip>
>>>
>>>
>>
>
