Re: Transaction problem with Camel, ActiveMQ and Spring JMS

Quinn Stevenson Mon, 08 Feb 2016 08:42:04 -0800

Good to hear ;-)

Yeah - I meant camel-sjms - I think autocorrect got me on that one - sorry.  I
like the way camel-sjms does its internal pooling - I don’t need to create
pooled connection factories anymore.  You can also just add “transacted=true”
to the URI and you’re using JMS Session transactions - no other config needed.
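
A rough sketch of what that could look like in Spring XML. The bean ids, queue
names and broker URL are placeholders made up for illustration, and whether the
producer endpoint needs the flag as well may depend on the camel-sjms version,
so treat this as an assumption to check against the component docs:

```xml
<!-- Sketch only: bean ids, queue names and the broker URL are placeholders -->
<bean id="connectionFactory" class="org.apache.activemq.ActiveMQConnectionFactory">
  <property name="brokerURL" value="tcp://localhost:61616"/>
</bean>

<!-- Plain (unpooled) connection factory; camel-sjms handles pooling internally -->
<bean id="sjms" class="org.apache.camel.component.sjms.SjmsComponent">
  <property name="connectionFactory" ref="connectionFactory"/>
</bean>

<camelContext xmlns="http://camel.apache.org/schema/spring">
  <route>
    <!-- transacted=true switches the endpoint to JMS Session transactions -->
    <from uri="sjms:queue:input?transacted=true"/>
    <to uri="sjms:queue:output?transacted=true"/>
  </route>
</camelContext>
```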


I have a guess as to what’s going on with the tests - the “transacted”
attribute enables/disables JMS Session transactions, which is why the example
named “noTxManager” worked.  It’s true that it wasn’t using a transaction
manager, but it didn’t need one - JMS Session transactions don’t require an
external transaction manager.
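
For reference, a minimal sketch of a camel-jms setup along the lines of the
“noTxManager” flavour described in this thread: no transaction manager bean,
only the “transacted” flag. The bean ids and broker URL are hypothetical; the
actual test project may be wired differently:

```xml
<!-- Sketch only: bean ids and the broker URL are placeholders -->
<bean id="amqConnectionFactory" class="org.apache.activemq.ActiveMQConnectionFactory">
  <property name="brokerURL" value="tcp://localhost:61616"/>
</bean>

<bean id="pooledConnectionFactory" class="org.apache.activemq.pool.PooledConnectionFactory">
  <property name="connectionFactory" ref="amqConnectionFactory"/>
</bean>

<bean id="jmsConfig" class="org.apache.camel.component.jms.JmsConfiguration">
  <property name="connectionFactory" ref="pooledConnectionFactory"/>
  <!-- JMS Session transactions only; no transactionManager property is set -->
  <property name="transacted" value="true"/>
  <!-- Value taken from the "noTxManager" description quoted below -->
  <property name="lazyCreateTransactionManager" value="false"/>
</bean>

<bean id="activemq" class="org.apache.activemq.camel.component.ActiveMQComponent">
  <property name="configuration" ref="jmsConfig"/>
</bean>
```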

In the second example, we were trying to use two types of transactions at the
same time - transactions managed by the Spring JMS transaction manager and JMS
Session transactions.  Somewhere in the Spring wiring, something got crossed
and we wound up without good transactional behavior.  The details of why are
beyond me here - somebody more knowledgeable about Spring JMS will have to
answer that.
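
A sketch of the variant that passed in the tests reported in this thread: the
full Spring transaction setup, but with “transacted” set to false so only one
kind of transaction is in play. This reflects the thread’s test results only,
not a verified recommendation; the bean ids, queue names and the referenced
“pooledConnectionFactory” bean are assumptions:

```xml
<!-- Sketch only: bean ids and queue names are placeholders; -->
<!-- "pooledConnectionFactory" is assumed to be defined elsewhere -->
<bean id="jmsTransactionManager"
      class="org.springframework.jms.connection.JmsTransactionManager">
  <property name="connectionFactory" ref="pooledConnectionFactory"/>
</bean>

<bean id="requiredPolicy" class="org.apache.camel.spring.spi.SpringTransactionPolicy">
  <property name="transactionManager" ref="jmsTransactionManager"/>
  <property name="propagationBehaviorName" value="PROPAGATION_REQUIRED"/>
</bean>

<bean id="jmsConfig" class="org.apache.camel.component.jms.JmsConfiguration">
  <property name="connectionFactory" ref="pooledConnectionFactory"/>
  <property name="transactionManager" ref="jmsTransactionManager"/>
  <!-- Spring-managed transactions only: JMS Session transactions switched off -->
  <property name="transacted" value="false"/>
</bean>

<bean id="activemq" class="org.apache.activemq.camel.component.ActiveMQComponent">
  <property name="configuration" ref="jmsConfig"/>
</bean>

<camelContext xmlns="http://camel.apache.org/schema/spring">
  <route>
    <from uri="activemq:queue:input"/>
    <!-- Route-level transaction policy backed by the Spring transaction manager -->
    <transacted ref="requiredPolicy"/>
    <to uri="activemq:queue:output"/>
  </route>
</camelContext>
```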

Anyway - I’m glad it’s working now.  

Also, there’s an ActiveMQ JUnit rule in 5.13.1 that would make this testing
easier (shameless plug - I contributed the rule ;-) ).  It should pick up the
ActiveMQ broker jars from the test class path, so it should work with versions
of the broker other than 5.13.1.
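
A sketch of the test-scoped dependency that would pull the rule in. The Maven
coordinates are given from memory (the activemq-junit module under the
org.apache.activemq.tooling group) and should be checked against Maven Central
before use:

```xml
<!-- Sketch only: verify the coordinates before relying on them -->
<dependency>
    <groupId>org.apache.activemq.tooling</groupId>
    <artifactId>activemq-junit</artifactId>
    <version>5.13.1</version>
    <scope>test</scope>
</dependency>
```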

> On Feb 8, 2016, at 9:30 AM, Stephan Burkard <sburk...@gmail.com> wrote:
> 
> Oh, our messages overlapped...
> 
> Your questions:
> 
> "... doing this queue to queue work using one or two ActiveMQ brokers?"
> => One broker
> 
> "... you may want to try camel-sims"
> => I guess you mean Camel sJms, that's the closest match I found in the
> list of Camel components on GitHub :-) Never heard (ok, it is only since
> 2.11), but I will have a look at it.
> 
> "If you’d be using XA in the real world..."
> => No, we don't use XA
> 
> 
> But your hint with "transacted = false" works! I was able to run the
> "standard" version 5 times in a row and it was always successful.
> 
> Currently it looks like either one has to define the whole Tx stuff and set
> transacted = false OR to use the simple config with transacted = true.
> 
> At least I learned that I have no real understanding of what this flag does.
> And I claim that most examples use transacted = true even when they define
> a Tx manager etc.
> 
> Regards
> Stephan
> 
> 
> 
> 
> 
> On Mon, Feb 8, 2016 at 4:59 PM, Stephan Burkard <sburk...@gmail.com> wrote:
> 
>> Hi Quinn
>> 
>> Here is the new version of my test project that uses an embedded ActiveMQ
>> broker. Since the AMQ libs are of version 5.9.0 (standard edition) there is
>> no more special Redhat version.
>> 
>> There is a new BrokerManagementExecutor that is configured to stop the
>> broker 5 seconds after the test starts. On my machine the broker shutdown
>> happens after about 400 messages are sent. Some seconds after the broker is
>> stopped, it starts again and the test finishes. I guess the automagical
>> broker restart is due to the vm-transport used.
>> 
>> Results on my machine:
>> 
>> 1. The "noTxManager" version still never fails, so it never loses a
>> message between the queues. It often misses one or two messages, but always
>> on both queues. So these messages could not be sent from the client, but
>> all messages that arrived at the first queue also arrive at the second
>> queue.
>> 
>> 2. The "standard" version fails on almost every attempt; it mostly loses
>> one or two messages between the queues. So these messages arrived at the
>> first queue but not at the second one.
>> 
>> Since the embedded broker is stopped at the end of the test, I added an
>> additional Camel route that consumes the default DLQ. I added the messages
>> arrived at the DLQ in the test summary output. But on my machine I never
>> had messages in it.
>> 
>> I hope you can reproduce the problem now more easily.
>> 
>> Regards
>> Stephan
>> 
>> 
>> 
>> 
>> 
>> 
>> On Sat, Feb 6, 2016 at 10:32 PM, Stephan Burkard <sburk...@gmail.com>
>> wrote:
>> 
>>> Hi Quinn
>>> 
>>> I don't think that you need to match exactly my broker version. I had
>>> first discovered this issue on ActiveMQ 5.9.0 standard edition. I guess
>>> that simply every broker version suffers from this. I really don't think it
>>> is an ActiveMQ problem. It is according to Redhat a Spring JMS problem.
>>> 
>>> No, I never tried to use an embedded broker. Probably because I used
>>> remote brokers when I discovered the problem during Master-Slave failover
>>> tests. I will try to rewrite the test project to use an embedded broker
>>> that can be stopped and started as part of the test.
>>> 
>>> Yes, that's what I meant: that the remote broker increases the
>>> probability of showing the issue. If the analysis by Redhat is
>>> correct, it is really a timing issue. You can also increase the chance of
>>> the issue if you produce even more messages per second. That increases the
>>> probability that a message falls just into the problematic time slice where
>>> the consumer has committed but the producer has not.
>>> 
>>> Yes, that's right. I start the test and when I see lots of console output
>>> I hit enter on the console where the stop command of the broker has waited.
>>> Then I wait about 5 to 10 seconds and then I start the broker again. The
>>> test reconnects and continues.
>>> 
>>> Regards
>>> Stephan
>>> 
>>> 
>>> 
>>> 
>>> 
>>> On Fri, Feb 5, 2016 at 7:40 PM, Quinn Stevenson <
>>> qu...@pronoia-solutions.com> wrote:
>>> 
>>>> Stephan -
>>>> 
>>>> I’ll get a broker running and try to match your version - I think I can
>>>> get it from one of my customers who’s running Fuse 6.2.
>>>> 
>>>> While I do that - have you considered trying to reproduce this using an
>>>> embedded broker that the test could control?  It would make it much easier
>>>> to reproduce.
>>>> 
>>>> I don’t think running the broker locally vs remotely should increase any
>>>> probability of losing messages - we shouldn’t lose any as long as the
>>>> configuration is correct.  It may increase the probability of an issue, but
>>>> we shouldn’t lose messages.
>>>> 
>>>> Also, just to confirm - when you’re testing this you are
>>>> stopping/starting the broker in the middle of the test, not killing and
>>>> restarting the broker - correct?
>>>> 
>>>> 
>>>>> On Feb 5, 2016, at 12:37 AM, Stephan Burkard <sburk...@gmail.com>
>>>> wrote:
>>>>> 
>>>>> Hi Quinn
>>>>> 
>>>>> I just tested the POM changes you posted and the second run failed
>>>> (without
>>>>> failover-URL). I then tested with the failover-URL and the third
>>>> attempt
>>>>> failed.
>>>>> 
>>>>> The latter is no big surprise since I discovered the problem during
>>>>> failover tests in a master-slave-config. I then reduced the setup to a
>>>>> single broker environment and it was still there.
>>>>> 
>>>>> My test broker is apache-activemq-5.11.0.redhat-620133, a patched
>>>> Redhat
>>>>> version of AMQ 5.11. As you, I also don't change the AMQ version
>>>> number in
>>>>> the POM, I just use a newer broker than the library version. My broker
>>>> runs
>>>>> on another machine than the test. Perhaps this increases the
>>>> probability of
>>>>> losing a message?
>>>>> 
>>>>> Regards
>>>>> Stephan
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> On Thu, Feb 4, 2016 at 7:06 PM, Quinn Stevenson <
>>>>> qu...@pronoia-solutions.com> wrote:
>>>>> 
>>>>>> I tested this with a 5.9.0 broker and I am seeing messages dropped
>>>> with
>>>>>> the TxText, but I still have to use the failover URL or the test just
>>>> stops
>>>>>> after the broker is restarted.
>>>>>> 
>>>>>> I don’t have a 5.9.1 broker to test with, so I don’t know if that
>>>> would
>>>>>> help, but the next oldest broker I have is 5.10.1, and it seems to be
>>>>>> working with that broker.
>>>>>> 
>>>>>> NOTE:  I’m not changing the activemq-version in the POM when I change
>>>> the
>>>>>> broker version - I’m just starting a different broker (locally) on
>>>> the same
>>>>>> port.
>>>>>> 
>>>>>> 
>>>>>>> On Feb 4, 2016, at 10:41 AM, Quinn Stevenson <
>>>>>> qu...@pronoia-solutions.com> wrote:
>>>>>>> 
>>>>>>> I still can’t make either test drop messages between the input and
>>>> the
>>>>>> output queue with the POM changes I sent, but I did find one
>>>> difference
>>>>>> between what you’ve done and what I normally do that changes the
>>>> output I’m
>>>>>> seeing - I always use a failover URL
>>>>>>> 
>>>>>>> <property name="brokerURL"
>>>>>>> value="failover:(tcp://localhost:61616?wireFormat.tightEncodingEnabled=false)"/>
>>>>>>> 
>>>>>>> My test broker is v 5.10.1 as well - I’ll see if it makes any
>>>> difference
>>>>>> with 5.9.0
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>>> On Feb 4, 2016, at 9:52 AM, Quinn Stevenson <
>>>>>>>> qu...@pronoia-solutions.com> wrote:
>>>>>>>> 
>>>>>>>> It is strange - I’m trying to compare what you have in the “standard”
>>>>>>>> version to what I did before.  We tested our configs pretty heavily under
>>>>>>>> all sorts of strange conditions to verify we weren’t losing messages, but
>>>>>>>> we were using newer versions of Camel and ActiveMQ.
>>>>>>>> 
>>>>>>>> So we’re on the same page - can you try your tests again with POM
>>>>>> dependencies that look something like this?
>>>>>>>> 
>>>>>>>> <properties>
>>>>>>>>   <camel-version>2.12.5</camel-version>
>>>>>>>>   <activemq-version>5.9.0</activemq-version>
>>>>>>>> </properties>
>>>>>>>> 
>>>>>>>> <dependencies>
>>>>>>>>   <dependency>
>>>>>>>>       <groupId>org.apache.activemq</groupId>
>>>>>>>>       <artifactId>activemq-all</artifactId>
>>>>>>>>       <version>${activemq-version}</version>
>>>>>>>>   </dependency>
>>>>>>>>   <dependency>
>>>>>>>>       <groupId>org.apache.activemq</groupId>
>>>>>>>>       <artifactId>activemq-pool</artifactId>
>>>>>>>>       <version>${activemq-version}</version>
>>>>>>>>   </dependency>
>>>>>>>> 
>>>>>>>>   <dependency>
>>>>>>>>       <groupId>org.apache.camel</groupId>
>>>>>>>>       <artifactId>camel-spring</artifactId>
>>>>>>>>       <version>${camel-version}</version>
>>>>>>>>   </dependency>
>>>>>>>>   <dependency>
>>>>>>>>       <groupId>org.apache.camel</groupId>
>>>>>>>>       <artifactId>camel-jms</artifactId>
>>>>>>>>       <version>${camel-version}</version>
>>>>>>>>   </dependency>
>>>>>>>> 
>>>>>>>>   <dependency>
>>>>>>>>       <groupId>org.apache.camel</groupId>
>>>>>>>>       <artifactId>camel-test-spring</artifactId>
>>>>>>>>       <version>${camel-version}</version>
>>>>>>>>       <scope>test</scope>
>>>>>>>>   </dependency>
>>>>>>>> 
>>>>>>>>   <dependency>
>>>>>>>>       <groupId>commons-collections</groupId>
>>>>>>>>       <artifactId>commons-collections</artifactId>
>>>>>>>>       <version>3.2.1</version>
>>>>>>>>       <scope>test</scope>
>>>>>>>>   </dependency>
>>>>>>>>   <dependency>
>>>>>>>>       <groupId>org.hamcrest</groupId>
>>>>>>>>       <artifactId>hamcrest-integration</artifactId>
>>>>>>>>       <version>1.3</version>
>>>>>>>>       <scope>test</scope>
>>>>>>>>   </dependency>
>>>>>>>> 
>>>>>>>> </dependencies>
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> On Feb 4, 2016, at 9:49 AM, Stephan Burkard <sburk...@gmail.com> wrote:
>>>>>>>>> 
>>>>>>>>> Hi Quinn
>>>>>>>>> 
>>>>>>>>> The "standard" version is the big mystery. As I stated in my first
>>>>>> post, a
>>>>>>>>> Redhat engineer analysed a similar project (with less book-keeping
>>>> and
>>>>>>>>> logging stuff) and his conclusion was that as soon as a transaction
>>>>>> manager
>>>>>>>>> is explicitly defined, Spring JMS Template (that is used by Camel
>>>>>> under the
>>>>>>>>> hood) creates two of them by bug, by accident or just by strange
>>>>>> behaviour.
>>>>>>>>> 
>>>>>>>>> This conclusion was quite surprising since that meant that all our
>>>>>>>>> Camel-JMS projects are theoretically suffering from message loss.
>>>>>>>>> 
>>>>>>>>> The "no-tx" version should definitely be OK, see also CAMEL-5055
>>>> for
>>>>>> the "
>>>>>>>>> lazyCreateTransactionManager" flag. The JMS transaction manager may
>>>>>> not be
>>>>>>>>> defined but it creates one implicitly because of "transacted =
>>>> true".
>>>>>>>>> 
>>>>>>>>> The two "flaws" you mentioned are perhaps an issue. It would be
>>>> somehow
>>>>>>>>> calming if it is my project who has a flaw.
>>>>>>>>> 
>>>>>>>>> Regards
>>>>>>>>> Stephan
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On Thu, Feb 4, 2016 at 4:44 PM, Quinn Stevenson <
>>>>>>>>> qu...@pronoia-solutions.com> wrote:
>>>>>>>>> 
>>>>>>>>>> I’m still going through the project, but the first couple of
>>>> things
>>>>>> that
>>>>>>>>>> jump out at me are you have two Spring versions - the one you
>>>>>> explicitly
>>>>>>>>>> put in your POM (3.2.8.RELEASE) and the one pulled in by
>>>> camel-spring
>>>>>>>>>> (3.2.11.RELEASE).  Also, camel-spring should be included in the
>>>> POM
>>>>>> since
>>>>>>>>>> you’re using Spring routes.  I’m not sure if that’s enough to
>>>> cause
>>>>>> issues
>>>>>>>>>> or not.
>>>>>>>>>> 
>>>>>>>>>> I believe what’s going on with the “no-tx” version is you’re
>>>> actually
>>>>>>>>>> using JMS transactions since you still have transacted set to
>>>> true in
>>>>>> the
>>>>>>>>>> JmsConfiguration.
>>>>>>>>>> 
>>>>>>>>>> I’m not sure what’s going on with the “standard” version - it looks
>>>>>>>>>> similar to some XA stuff I’ve set up before (because I had multiple
>>>>>>>>>> brokers involved) except I had to use XA Connection Factories.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>> On Feb 3, 2016, at 3:12 PM, Stephan Burkard <sburk...@gmail.com> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> Yes, same broker. There is only one ActiveMQ connection config
>>>> in the
>>>>>>>>>>> project.
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> On Wed, Feb 3, 2016 at 8:00 PM, Quinn Stevenson <
>>>>>>>>>>> qu...@pronoia-solutions.com> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> Are both the source and destination queues hosted by the same
>>>>>> ActiveMQ
>>>>>>>>>>>> broker?
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>>> On Feb 3, 2016, at 8:21 AM, Stephan Burkard <sburk...@gmail.com> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Hi
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I have built a small Maven project (attached) to demonstrate a
>>>> JMS
>>>>>>>>>>>> transaction problem in Camel routes under certain load
>>>> conditions.
>>>>>> In
>>>>>>>>>> fact
>>>>>>>>>>>> I am losing messages between two queues.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> The project contains two different flavours of the same test. One of
>>>>>>>>>>>>> them suffers from the problem, the other (according to my tests) does not.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> *** What does the testcase do?
>>>>>>>>>>>>> 1. Produces 1000 messages (100/s) and sends them to an "input"
>>>>>> queue.
>>>>>>>>>>>>> 2. Sends the messages from the "input" queue to an "output"
>>>> queue.
>>>>>>>>>>>>> 3. Finally consumes the messages from the "output" queue to
>>>> count
>>>>>> them.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> *** What is the difference between the two test flavours?
>>>>>>>>>>>>> - There is a "standard" flavour that suffers from the problem
>>>>>>>>>>>>> - And there is a "noTxManager" flavour that seems to not have the problem
>>>>>>>>>>>>> - The "standard" flavour is kind of a well known Camel/ActiveMQ configuration
>>>>>>>>>>>>>   - with a Spring transaction manager
>>>>>>>>>>>>>   - with a Spring transaction policy
>>>>>>>>>>>>>   - with a "transacted" flag in Camel routes
>>>>>>>>>>>>> - The "noTxManager" flavour is a "simple" configuration
>>>>>>>>>>>>>   - no Spring transaction manager
>>>>>>>>>>>>>   - no Spring transaction policy
>>>>>>>>>>>>>   - no "transacted" flag in Camel routes
>>>>>>>>>>>>>   - BUT: "lazyCreateTransactionManager" = false (so routes are transacted too)
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> *** How to run the testcases?
>>>>>>>>>>>>> 1. Replace "[yourBrokerHost]" with the hostname of your
>>>> ActiveMQ
>>>>>> broker
>>>>>>>>>>>>> 2. Run the testcase as JUnit test
>>>>>>>>>>>>> 3. When you see lots of console messages that messages are sent, stop
>>>>>>>>>>>>> your ActiveMQ broker (do not kill -9 it, just shut it down normally)
>>>>>>>>>>>>> 4. Exceptions are thrown on the console output
>>>>>>>>>>>>> 5. After some seconds start your broker again
>>>>>>>>>>>>> 6. The test finishes normally and after some seconds dumps the book
>>>>>>>>>>>>> keeping on the console
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> *** How to interpret the results?
>>>>>>>>>>>>> - When the test is successful, no message is lost. You can run
>>>> the
>>>>>> test
>>>>>>>>>>>> without broker shutdown/startup and it will obviously always be
>>>>>>>>>> successful.
>>>>>>>>>>>>> - When the test fails, one or more messages are lost between
>>>> queue
>>>>>>>>>>>> "input" and "output". In my tests I was not able to run the
>>>>>> "standard"
>>>>>>>>>>>> flavour three times in a row successfully. About every second
>>>> run
>>>>>>>>>> failed.
>>>>>>>>>>>> In contrast, the "noTxManager" flavour never failed in my tests.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> The book keeping for a failed test looks like the following. In this
>>>>>>>>>>>>> example message number 281 arrived at the input queue but not at the
>>>>>>>>>>>>> output queue. So it is lost.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Messages created by Client:          1000
>>>>>>>>>>>>> Client Exceptions during send:       0 []
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Messages received at input queue:    993
>>>>>>>>>>>>> Missing Messages at input queue:     7 [282,283,284,285,286,287,288]
>>>>>>>>>>>>> Duplicate Messages at input queue:   0 []
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Messages received at output queue:   992
>>>>>>>>>>>>> Missing Messages at output queue:    8 [281,282,283,284,285,286,287,288]
>>>>>>>>>>>>> Duplicate Messages at output queue:  0 []
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Lost Messages between Queues:        1 [281]
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> *** What is the problem?
>>>>>>>>>>>>> A Redhat engineer tracked the problem down to a Spring JMS template
>>>>>>>>>>>>> behaviour that is kind of strange. If a Spring transaction manager is
>>>>>>>>>>>>> defined in the config, it will end up with two of them. This opens a
>>>>>>>>>>>>> small time window in which messages can get lost, and it only shows
>>>>>>>>>>>>> up under a certain load.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> *** So, what is my question?
>>>>>>>>>>>>> - Does this really mean that it is unsafe to use the "standard"
>>>>>> flavour
>>>>>>>>>>>> of configuration?
>>>>>>>>>>>>> - Is there another config with TxManager etc that works
>>>> correctly?
>>>>>>>>>>>>> - What are the limits of the "noTxManager" config? When is it not sufficient?
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Regards
>>>>>>>>>>>>> Stephan
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> <CamelAmqTxTest.zip>
>>>> 
>>>> 
>>> 
>>
