Re: TextMessage.getText returning null

2018-07-11 Thread Arthur Naseef
If concurrent access can be reproduced in a dev environment, it is possible
- with temporary code changes - to determine what two threads were
accessing the message at the same time.

An effective means to track this down is to record Exception objects (and
hence stack traces) at each point accessing the fields of concern.  Then,
when the problem is detected, dump the stack traces of those exceptions.

Also - on the "saving space" topic - is the text field actually storing a
copy of the bytes, or is it backed by the same memory used by the byte
array?  I'd be very curious to see that.  I'm also curious about the "isn't
tracked" assertion.

Hope this helps.

Art

On Thu, May 31, 2018 at 4:56 PM, Christopher Shannon <
christopher.l.shan...@gmail.com> wrote:

> Art,
>
> Concurrent store and dispatch should be off by default for topics (which I
> believe the test case is using).  Also, it wouldn't hit the store since
> it's a topic and not a durable subscription so should just be in memory.
> That's why this issue is weird as the messages shouldn't be concurrently
> touched by any thread since they get copied.  (but there still seems to be
> something going on that i am missing off the top of my head involving
> multiple threads touching the message)
>
> Clearing the text field is important because it prevents memory waste.  The
> explanation is explained more in https://issues.apache.org/
> jira/browse/AMQ-5857 If you don't clear it then extra memory is used that
> isn't tracked and can lead to OOM errors (it's why the
> reduceMemoryFootprintFlag exists in the broker).  So I think that change is
> probably ok but a side effect is causing this other issue to pop up.
>
> When I was doing some testing today, I tried a couple variations of
> synchronization (using volatile, adding mutexes in places, etc) and some
> worked and some didn't.  One thing that was interesting was that when I
> synchronized on the copy() method of ActiveMQText message the issue was
> fixed.  That is weird however because we shouldn't need to sync on copy.
> So either there is a problem with copy() itself or adding the
> synchronization is throwing off the timing just enough to hide the real
> issue.
>
> Ultimately I think adding synchronization in a test (like what you did)
> will be useful to track down the problem since I think it's a race
> condition issue but as a permanently fix I would avoid it.  The reason is
> that it shouldn't be needed...the objects are designed to only be accessed
> by one thread at a time and copied if necessary which makes sense to me.
> However if adding synchronization does fix the issue then hopefully it will
> at least be a clue as to what the real issue is.
>
> Chris
>
> On Thu, May 31, 2018 at 5:41 PM, Arthur Naseef  wrote:
>
> > Ahh, I bet that is a result of using the VM transport.  The broker is
> > likely doing the "beforeMarshal()" call that clears the text just before
> > the client is calling "getText()" on the same message - or just before
> > "copy()" is called to make a copy of the message.
> >
> > Yeah, looking at the code carefully - the storeContent() method is
> handling
> > "text != null" with "content = null" by converting text to a byte
> sequence
> > (
> > org.apache.activemq.util.ByteSequence) and stuffing that into "content".
> >
> > If all this happens concurrent with a copy() or a call to getText(), then
> > it's entirely possible to get this race condition.  So, disabling
> > "concurrent store and dispatch" could solve this problem.  Also, using
> > critical sections in "storeContentAndClear()" as well as "getText()"
> might
> > eliminate the race condition.
> >
> > With all that said, I don't see the big win of clearing the text field in
> > storeContentAndClear().
> >
> > @Chris - can you try the patch here and see if it eliminates the problem?
> > https://gist.github.com/artnaseef/d518c0431a34cd3d49e8d6e045a719e2
> >
> > I may have missed some paths that access both "text" and "content", but I
> > think I got the important ones.  If that patch works, that increases
> > confidence that you found the right race condition.
> >
> > Art
> >
> >
> >
> > On Thu, May 31, 2018 at 12:55 PM, Christopher Shannon <
> > christopher.l.shan...@gmail.com> wrote:
> >
> > > Found the culprit: Seems to be related to
> > > https://issues.apache.org/jira/browse/AMQ-5857
> > >
> > > Specifically, this commit:
> > > https://git-wip-us.apache.org/repos/asf?p=activemq.git;a=
> > > blobdiff;f=activemq-client/src/main/java/org/apache/activemq/command/
> > > ActiveMQTextMessage.java;h=cca09be21bfb877213a4a506415929
> f1f81a1902;hp=
> > > 347db78da559bf57b049574c96cd3a8e08681dae;hb=84ec047;hpb=
> > > de86f473f776fe3a8ca32d7f929ecec7993c458b
> > >
> > > Changing the storeContentAndClear() call back to storeContent() fixes
> it,
> > > however it was changed for a reason to prevent memory waste so need to
> > see
> > > if/what a permanent fix is
> > >
> > >
> > > On Thu, May 31, 2018 at 3:54 PM, 

Re: TextMessage.getText returning null

2018-05-31 Thread Christopher Shannon
Art,

Concurrent store and dispatch should be off by default for topics (which I
believe the test case is using).  Also, it wouldn't hit the store since
it's a topic and not a durable subscription so should just be in memory.
That's why this issue is weird as the messages shouldn't be concurrently
touched by any thread since they get copied.  (but there still seems to be
something going on that i am missing off the top of my head involving
multiple threads touching the message)

Clearing the text field is important because it prevents memory waste.  The
explanation is explained more in https://issues.apache.org/
jira/browse/AMQ-5857 If you don't clear it then extra memory is used that
isn't tracked and can lead to OOM errors (it's why the
reduceMemoryFootprintFlag exists in the broker).  So I think that change is
probably ok but a side effect is causing this other issue to pop up.

When I was doing some testing today, I tried a couple variations of
synchronization (using volatile, adding mutexes in places, etc) and some
worked and some didn't.  One thing that was interesting was that when I
synchronized on the copy() method of ActiveMQText message the issue was
fixed.  That is weird however because we shouldn't need to sync on copy.
So either there is a problem with copy() itself or adding the
synchronization is throwing off the timing just enough to hide the real
issue.

Ultimately I think adding synchronization in a test (like what you did)
will be useful to track down the problem since I think it's a race
condition issue but as a permanently fix I would avoid it.  The reason is
that it shouldn't be needed...the objects are designed to only be accessed
by one thread at a time and copied if necessary which makes sense to me.
However if adding synchronization does fix the issue then hopefully it will
at least be a clue as to what the real issue is.

Chris

On Thu, May 31, 2018 at 5:41 PM, Arthur Naseef  wrote:

> Ahh, I bet that is a result of using the VM transport.  The broker is
> likely doing the "beforeMarshal()" call that clears the text just before
> the client is calling "getText()" on the same message - or just before
> "copy()" is called to make a copy of the message.
>
> Yeah, looking at the code carefully - the storeContent() method is handling
> "text != null" with "content = null" by converting text to a byte sequence
> (
> org.apache.activemq.util.ByteSequence) and stuffing that into "content".
>
> If all this happens concurrent with a copy() or a call to getText(), then
> it's entirely possible to get this race condition.  So, disabling
> "concurrent store and dispatch" could solve this problem.  Also, using
> critical sections in "storeContentAndClear()" as well as "getText()" might
> eliminate the race condition.
>
> With all that said, I don't see the big win of clearing the text field in
> storeContentAndClear().
>
> @Chris - can you try the patch here and see if it eliminates the problem?
> https://gist.github.com/artnaseef/d518c0431a34cd3d49e8d6e045a719e2
>
> I may have missed some paths that access both "text" and "content", but I
> think I got the important ones.  If that patch works, that increases
> confidence that you found the right race condition.
>
> Art
>
>
>
> On Thu, May 31, 2018 at 12:55 PM, Christopher Shannon <
> christopher.l.shan...@gmail.com> wrote:
>
> > Found the culprit: Seems to be related to
> > https://issues.apache.org/jira/browse/AMQ-5857
> >
> > Specifically, this commit:
> > https://git-wip-us.apache.org/repos/asf?p=activemq.git;a=
> > blobdiff;f=activemq-client/src/main/java/org/apache/activemq/command/
> > ActiveMQTextMessage.java;h=cca09be21bfb877213a4a506415929f1f81a1902;hp=
> > 347db78da559bf57b049574c96cd3a8e08681dae;hb=84ec047;hpb=
> > de86f473f776fe3a8ca32d7f929ecec7993c458b
> >
> > Changing the storeContentAndClear() call back to storeContent() fixes it,
> > however it was changed for a reason to prevent memory waste so need to
> see
> > if/what a permanent fix is
> >
> >
> > On Thu, May 31, 2018 at 3:54 PM, codingismy11to7 
> > wrote:
> >
> > > Ok, thank you...my thought also was that performance impact of using
> tcp
> > > wouldn't be enough to hurt us in any substantial way, even though it
> > > *feels*
> > > really wasteful to be marshaling the message and using TCP when we're
> in
> > > the
> > > same JVM.
> > >
> > > In response to your first message, I changed the transport from nio://
> to
> > > tcp:// on all the non-vm threads and the behavior still reproduces.
> Using
> > > Java and raw AMQ libs is a much tougher problem...which I can do...but
> I
> > > probably won't be able to get to that for a few days.
> > >
> > >
> > >
> > > --
> > > Sent from: http://activemq.2283324.n4.nabble.com/ActiveMQ-Dev-
> > > f2368404.html
> > >
> >
>


Re: TextMessage.getText returning null

2018-05-31 Thread Arthur Naseef
Ahh, I bet that is a result of using the VM transport.  The broker is
likely doing the "beforeMarshal()" call that clears the text just before
the client is calling "getText()" on the same message - or just before
"copy()" is called to make a copy of the message.

Yeah, looking at the code carefully - the storeContent() method is handling
"text != null" with "content = null" by converting text to a byte sequence (
org.apache.activemq.util.ByteSequence) and stuffing that into "content".

If all this happens concurrent with a copy() or a call to getText(), then
it's entirely possible to get this race condition.  So, disabling
"concurrent store and dispatch" could solve this problem.  Also, using
critical sections in "storeContentAndClear()" as well as "getText()" might
eliminate the race condition.

With all that said, I don't see the big win of clearing the text field in
storeContentAndClear().

@Chris - can you try the patch here and see if it eliminates the problem?
https://gist.github.com/artnaseef/d518c0431a34cd3d49e8d6e045a719e2

I may have missed some paths that access both "text" and "content", but I
think I got the important ones.  If that patch works, that increases
confidence that you found the right race condition.

Art



On Thu, May 31, 2018 at 12:55 PM, Christopher Shannon <
christopher.l.shan...@gmail.com> wrote:

> Found the culprit: Seems to be related to
> https://issues.apache.org/jira/browse/AMQ-5857
>
> Specifically, this commit:
> https://git-wip-us.apache.org/repos/asf?p=activemq.git;a=
> blobdiff;f=activemq-client/src/main/java/org/apache/activemq/command/
> ActiveMQTextMessage.java;h=cca09be21bfb877213a4a506415929f1f81a1902;hp=
> 347db78da559bf57b049574c96cd3a8e08681dae;hb=84ec047;hpb=
> de86f473f776fe3a8ca32d7f929ecec7993c458b
>
> Changing the storeContentAndClear() call back to storeContent() fixes it,
> however it was changed for a reason to prevent memory waste so need to see
> if/what a permanent fix is
>
>
> On Thu, May 31, 2018 at 3:54 PM, codingismy11to7 
> wrote:
>
> > Ok, thank you...my thought also was that performance impact of using tcp
> > wouldn't be enough to hurt us in any substantial way, even though it
> > *feels*
> > really wasteful to be marshaling the message and using TCP when we're in
> > the
> > same JVM.
> >
> > In response to your first message, I changed the transport from nio:// to
> > tcp:// on all the non-vm threads and the behavior still reproduces. Using
> > Java and raw AMQ libs is a much tougher problem...which I can do...but I
> > probably won't be able to get to that for a few days.
> >
> >
> >
> > --
> > Sent from: http://activemq.2283324.n4.nabble.com/ActiveMQ-Dev-
> > f2368404.html
> >
>


Re: TextMessage.getText returning null

2018-05-31 Thread Christopher Shannon
Yeah, TCP should be fine.  Even if this gets fixed it will take time before
a new release is done anyways so you'll need to use the TCP or NIO for a
bit.

On Thu, May 31, 2018 at 3:55 PM, Christopher Shannon <
christopher.l.shan...@gmail.com> wrote:

> Found the culprit: Seems to be related to https://issues.apache.org/
> jira/browse/AMQ-5857
>
> Specifically, this commit: https://git-wip-us.apache.org/repos/asf?p=
> activemq.git;a=blobdiff;f=activemq-client/src/main/java/
> org/apache/activemq/command/ActiveMQTextMessage.java;h=
> cca09be21bfb877213a4a506415929f1f81a1902;hp=347db78da559bf57b049574c96cd3a
> 8e08681dae;hb=84ec047;hpb=de86f473f776fe3a8ca32d7f929ecec7993c458b
>
> Changing the storeContentAndClear() call back to storeContent() fixes it,
> however it was changed for a reason to prevent memory waste so need to see
> if/what a permanent fix is
>
>
> On Thu, May 31, 2018 at 3:54 PM, codingismy11to7 
> wrote:
>
>> Ok, thank you...my thought also was that performance impact of using tcp
>> wouldn't be enough to hurt us in any substantial way, even though it
>> *feels*
>> really wasteful to be marshaling the message and using TCP when we're in
>> the
>> same JVM.
>>
>> In response to your first message, I changed the transport from nio:// to
>> tcp:// on all the non-vm threads and the behavior still reproduces. Using
>> Java and raw AMQ libs is a much tougher problem...which I can do...but I
>> probably won't be able to get to that for a few days.
>>
>>
>>
>> --
>> Sent from: http://activemq.2283324.n4.nabble.com/ActiveMQ-Dev-f2368404.
>> html
>>
>
>


Re: TextMessage.getText returning null

2018-05-31 Thread Christopher Shannon
Found the culprit: Seems to be related to
https://issues.apache.org/jira/browse/AMQ-5857

Specifically, this commit:
https://git-wip-us.apache.org/repos/asf?p=activemq.git;a=blobdiff;f=activemq-client/src/main/java/org/apache/activemq/command/ActiveMQTextMessage.java;h=cca09be21bfb877213a4a506415929f1f81a1902;hp=347db78da559bf57b049574c96cd3a8e08681dae;hb=84ec047;hpb=de86f473f776fe3a8ca32d7f929ecec7993c458b

Changing the storeContentAndClear() call back to storeContent() fixes it,
however it was changed for a reason to prevent memory waste so need to see
if/what a permanent fix is


On Thu, May 31, 2018 at 3:54 PM, codingismy11to7 
wrote:

> Ok, thank you...my thought also was that performance impact of using tcp
> wouldn't be enough to hurt us in any substantial way, even though it
> *feels*
> really wasteful to be marshaling the message and using TCP when we're in
> the
> same JVM.
>
> In response to your first message, I changed the transport from nio:// to
> tcp:// on all the non-vm threads and the behavior still reproduces. Using
> Java and raw AMQ libs is a much tougher problem...which I can do...but I
> probably won't be able to get to that for a few days.
>
>
>
> --
> Sent from: http://activemq.2283324.n4.nabble.com/ActiveMQ-Dev-
> f2368404.html
>


Re: TextMessage.getText returning null

2018-05-31 Thread codingismy11to7
Ok, thank you...my thought also was that performance impact of using tcp
wouldn't be enough to hurt us in any substantial way, even though it *feels*
really wasteful to be marshaling the message and using TCP when we're in the
same JVM.

In response to your first message, I changed the transport from nio:// to
tcp:// on all the non-vm threads and the behavior still reproduces. Using
Java and raw AMQ libs is a much tougher problem...which I can do...but I
probably won't be able to get to that for a few days.



--
Sent from: http://activemq.2283324.n4.nabble.com/ActiveMQ-Dev-f2368404.html


Re: TextMessage.getText returning null

2018-05-31 Thread Christopher Shannon
I did some quick testing and it looks like the first version it breaks is
5.12.0.  It seemed to work ok in 5.11.3.

5.12.0 was a big release though with over 200 issues so will still need to
narrow it down more.

In the meantime I would say if the TCP transport works then go with that, I
doubt you will notice much if any real performance impact.

On Thu, May 31, 2018 at 3:30 PM, codingismy11to7 
wrote:

> The current workaround we've been testing is to change to the TCP transport
> from the VM transport on the code that runs inside the broker process
> (where
> we want to guarantee that some queues or topics are always are being dealt
> with as long as ActiveMQ is running). That seems like it would be a pretty
> big inefficiency, but so far it looks to have eliminated the null bodies.
> If
> it's recommended that the VM transport should never be used, then we can
> deal with that. I'm not going to rewrite all of our unit tests to not use
> VM
> transport, but the odds of any of them failing because of this seems low.
>
>
>
> --
> Sent from: http://activemq.2283324.n4.nabble.com/ActiveMQ-Dev-
> f2368404.html
>


Re: TextMessage.getText returning null

2018-05-31 Thread codingismy11to7
The current workaround we've been testing is to change to the TCP transport
from the VM transport on the code that runs inside the broker process (where
we want to guarantee that some queues or topics are always are being dealt
with as long as ActiveMQ is running). That seems like it would be a pretty
big inefficiency, but so far it looks to have eliminated the null bodies. If
it's recommended that the VM transport should never be used, then we can
deal with that. I'm not going to rewrite all of our unit tests to not use VM
transport, but the odds of any of them failing because of this seems low.



--
Sent from: http://activemq.2283324.n4.nabble.com/ActiveMQ-Dev-f2368404.html


Re: TextMessage.getText returning null

2018-05-31 Thread Christopher Shannon
A lot of changes happened between 5.10 and 5.15.  Knowing the first version
that it broke would be helpful to narrow down the change that broke it.
Even better would be to use 'git bisect' and find the exact commit that
introduced the issue.

On Thu, May 31, 2018 at 3:14 PM, codingismy11to7 
wrote:

> I put some text in the (very long) README in the reproduction project that
> kind of addresses this, but I didn't really specify that yes - I did throw
> in a debug library of the wrapper that printed out when the getText() call
> returned null at the call site, and it absolutely is returning null (the
> wrapper takes the value and puts it into an immutable object that cannot be
> modified...barring reflection hacks that aren't happening).
>
> The library is basically doing :
>
> consumer.addMessageListener(new MessageListener {
>   def onMessage(message: Message) = {
> try {
>   // extract message body and properties, getText() if it's a
> TextMessage, copy fields into
>   //   an immutable data structure, hand that off, never touch original
> message again
>   } //... catch elided
> })
>
> As another datapoint, our code didn't change between the upgrade from AMQ
> 5.10 to 5.15, this problem appeared, and between those two versions is when
> (I believe) the change happened where ActiveMQTextMessage.text was set to
> null during (before? after?) marshaling.
>
>
>
>
> --
> Sent from: http://activemq.2283324.n4.nabble.com/ActiveMQ-Dev-
> f2368404.html
>


Re: TextMessage.getText returning null

2018-05-31 Thread codingismy11to7
I put some text in the (very long) README in the reproduction project that
kind of addresses this, but I didn't really specify that yes - I did throw
in a debug library of the wrapper that printed out when the getText() call
returned null at the call site, and it absolutely is returning null (the
wrapper takes the value and puts it into an immutable object that cannot be
modified...barring reflection hacks that aren't happening).

The library is basically doing :

consumer.addMessageListener(new MessageListener {
  def onMessage(message: Message) = {
try {
  // extract message body and properties, getText() if it's a
TextMessage, copy fields into
  //   an immutable data structure, hand that off, never touch original
message again
  } //... catch elided
})

As another datapoint, our code didn't change between the upgrade from AMQ
5.10 to 5.15, this problem appeared, and between those two versions is when
(I believe) the change happened where ActiveMQTextMessage.text was set to
null during (before? after?) marshaling.




--
Sent from: http://activemq.2283324.n4.nabble.com/ActiveMQ-Dev-f2368404.html


Re: TextMessage.getText returning null

2018-05-31 Thread Arthur Naseef
Another thing I didn't dig into earlier - I generally avoid the VM
transport.  At some point in the past, I ran into a deadlock with the VM
transport that doesn't exist with the other transports because the VM
transport performs an operation synchronously that the interface explicitly
defines as asynchronous.  And given that the savings from the VM transport
are unlikely to be significant in most uses of ActiveMQ (with the broker
being in a separate JVM or across a network), the motivation to fix this
problem has been very low.

Anyway, that's why I recommend testing without the VM transport.

Art


On Thu, May 31, 2018 at 8:09 AM, Christopher Shannon <
christopher.l.shan...@gmail.com> wrote:

> Steven,
>
> I think you might need to do some more debugging to try and pinpoint the
> exact point where the body is null unless others have more time to look at
> it.  As Art said, check different points where the body could be null.
>
> I originally thought I had found the issue when I tested synchronizing on
> the getText() method just to see if that would fix it but I forgot that the
> VM transport and other parts of the broker actually copy the message so two
> threads shouldn't be accessing getText() at the same time and your
> consumers should each get their own copy of the message.
>
> On Thu, May 31, 2018 at 9:50 AM, Christopher Shannon <
> christopher.l.shan...@gmail.com> wrote:
>
> > I forgot that when using the VM transport the message is supposed to be
> > copied on dispatch inside ActiveMQConnection so this may not be the exact
> > issue, I need to look at it a bit more.
> >
> >
> > On Thu, May 31, 2018 at 9:41 AM, Christopher Shannon <
> > christopher.l.shan...@gmail.com> wrote:
> >
> >> The issue is when using the VM transport and the getText() method has to
> >> unmarshall data back into text from the byte sequence.  This happens
> >> because you go from the NIO to VM transport.
> >>
> >> The main problem is you have multiple threads (3 consumers) at the same
> >> time calling getText() on the text message and there is a race condition
> >> where it will set the content to null during it.
> >>
> >> I created a very simple test case to show the issue and created a jira
> >> https://issues.apache.org/jira/browse/AMQ-6981
> >>
> >> On Wed, May 30, 2018 at 6:06 PM, Arthur Naseef  wrote:
> >>
> >>> Try turning the broker and client logging up to trace and see what the
> >>> logging shows for the message content.
> >>>
> >>> Also, try the following:
> >>> - Same test with a Java producer (using plain ActiveMQ libs) instead of
> >>> the
> >>> Scala producer
> >>> - Same test with a Java consumer (using plain ActiveMQ libs) instead
> of a
> >>> Scala consumer
> >>> - Use the "tcp" transport instead of the "nio" transport
> >>>
> >>> Anyway, the goal I would say is to trace the problem to the point of
> >>> hand-off between the broker and client.  For example, the following
> >>> questions come to mind (for the problematic messages):
> >>> - Is the broker receiving the body over the wire (i.e. tcp connection)?
> >>> - Is the broker sending the body over the wire?
> >>> - Is the client library receiving the body from the application?
> >>> - Is the client library sending the body to the application?
> >>>
> >>> Hope this helps.
> >>>
> >>> Art
> >>>
> >>>
> >>> On Wed, May 30, 2018 at 2:41 PM, codingismy11to7 <
> ste...@codemettle.com>
> >>> wrote:
> >>>
> >>> > We recently updated from (ancient) ActiveMQ 5.10.x to 5.15.4. I've
> >>> noticed
> >>> > that text messages are now sporadically returning null as their
> message
> >>> > bodies when they were sent with bodies.
> >>> >
> >>> > After failing to make a unit test showing this error, I created a
> >>> project
> >>> > that reproduces part of what our software is doing, which reliably
> >>> > replicates the race condition (my last run got null message bodies on
> >>> > ~0.03%
> >>> > of messages). I was able to vastly simplify what our software is
> doing
> >>> and
> >>> > still reproduce, but when I tried to simplify further the issue
> stopped
> >>> > appearing.
> >>> >
> >>> > Reproduction and full explanation:
> >>> > https://github.com/CodeMettle/amq_null_body_repro
> >>> >
> >>> > Since I don't have a self-contained unit test I was hesitant to open
> an
> >>> > issue directly and decided to send a message here instead.
> >>> >
> >>> > I'm a Scala/Akka developer, so I used the tools I was comfortable
> with
> >>> to
> >>> > make this reproduction without writing a lot of code, but I'm sure
> >>> somebody
> >>> > more comfortable in Java and the ActiveMQ API could reproduce in a
> unit
> >>> > test. I'm assuming it has something to do with multiple Sessions
> >>> consuming
> >>> > the same topic and calling TextMessage.getText at the same time.
> >>> >
> >>> > Any suggestions of what to do next?
> >>> >
> >>> > Thanks
> >>> >
> >>> >
> >>> >
> >>> > --
> >>> > Sent from: http://activemq.2283324.n4.nabble.com/ActiveMQ-Dev-
> >>> > f2368404.html
> >>> >

Re: TextMessage.getText returning null

2018-05-31 Thread Christopher Shannon
Steven,

I think you might need to do some more debugging to try and pinpoint the
exact point where the body is null unless others have more time to look at
it.  As Art said, check different points where the body could be null.

I originally thought I had found the issue when I tested synchronizing on
the getText() method just to see if that would fix it but I forgot that the
VM transport and other parts of the broker actually copy the message so two
threads shouldn't be accessing getText() at the same time and your
consumers should each get their own copy of the message.

On Thu, May 31, 2018 at 9:50 AM, Christopher Shannon <
christopher.l.shan...@gmail.com> wrote:

> I forgot that when using the VM transport the message is supposed to be
> copied on dispatch inside ActiveMQConnection so this may not be the exact
> issue, I need to look at it a bit more.
>
>
> On Thu, May 31, 2018 at 9:41 AM, Christopher Shannon <
> christopher.l.shan...@gmail.com> wrote:
>
>> The issue is when using the VM transport and the getText() method has to
>> unmarshall data back into text from the byte sequence.  This happens
>> because you go from the NIO to VM transport.
>>
>> The main problem is you have multiple threads (3 consumers) at the same
>> time calling getText() on the text message and there is a race condition
>> where it will set the content to null during it.
>>
>> I created a very simple test case to show the issue and created a jira
>> https://issues.apache.org/jira/browse/AMQ-6981
>>
>> On Wed, May 30, 2018 at 6:06 PM, Arthur Naseef  wrote:
>>
>>> Try turning the broker and client logging up to trace and see what the
>>> logging shows for the message content.
>>>
>>> Also, try the following:
>>> - Same test with a Java producer (using plain ActiveMQ libs) instead of
>>> the
>>> Scala producer
>>> - Same test with a Java consumer (using plain ActiveMQ libs) instead of a
>>> Scala consumer
>>> - Use the "tcp" transport instead of the "nio" transport
>>>
>>> Anyway, the goal I would say is to trace the problem to the point of
>>> hand-off between the broker and client.  For example, the following
>>> questions come to mind (for the problematic messages):
>>> - Is the broker receiving the body over the wire (i.e. tcp connection)?
>>> - Is the broker sending the body over the wire?
>>> - Is the client library receiving the body from the application?
>>> - Is the client library sending the body to the application?
>>>
>>> Hope this helps.
>>>
>>> Art
>>>
>>>
>>> On Wed, May 30, 2018 at 2:41 PM, codingismy11to7 
>>> wrote:
>>>
>>> > We recently updated from (ancient) ActiveMQ 5.10.x to 5.15.4. I've
>>> noticed
>>> > that text messages are now sporadically returning null as their message
>>> > bodies when they were sent with bodies.
>>> >
>>> > After failing to make a unit test showing this error, I created a
>>> project
>>> > that reproduces part of what our software is doing, which reliably
>>> > replicates the race condition (my last run got null message bodies on
>>> > ~0.03%
>>> > of messages). I was able to vastly simplify what our software is doing
>>> and
>>> > still reproduce, but when I tried to simplify further the issue stopped
>>> > appearing.
>>> >
>>> > Reproduction and full explanation:
>>> > https://github.com/CodeMettle/amq_null_body_repro
>>> >
>>> > Since I don't have a self-contained unit test I was hesitant to open an
>>> > issue directly and decided to send a message here instead.
>>> >
>>> > I'm a Scala/Akka developer, so I used the tools I was comfortable with
>>> to
>>> > make this reproduction without writing a lot of code, but I'm sure
>>> somebody
>>> > more comfortable in Java and the ActiveMQ API could reproduce in a unit
>>> > test. I'm assuming it has something to do with multiple Sessions
>>> consuming
>>> > the same topic and calling TextMessage.getText at the same time.
>>> >
>>> > Any suggestions of what to do next?
>>> >
>>> > Thanks
>>> >
>>> >
>>> >
>>> > --
>>> > Sent from: http://activemq.2283324.n4.nabble.com/ActiveMQ-Dev-
>>> > f2368404.html
>>> >
>>>
>>
>>
>


Re: TextMessage.getText returning null

2018-05-31 Thread Christopher Shannon
I forgot that when using the VM transport the message is supposed to be
copied on dispatch inside ActiveMQConnection so this may not be the exact
issue, I need to look at it a bit more.

On Thu, May 31, 2018 at 9:41 AM, Christopher Shannon <
christopher.l.shan...@gmail.com> wrote:

> The issue is when using the VM transport and the getText() method has to
> unmarshall data back into text from the byte sequence.  This happens
> because you go from the NIO to VM transport.
>
> The main problem is you have multiple threads (3 consumers) at the same
> time calling getText() on the text message and there is a race condition
> where it will set the content to null during it.
>
> I created a very simple test case to show the issue and created a jira
> https://issues.apache.org/jira/browse/AMQ-6981
>
> On Wed, May 30, 2018 at 6:06 PM, Arthur Naseef  wrote:
>
>> Try turning the broker and client logging up to trace and see what the
>> logging shows for the message content.
>>
>> Also, try the following:
>> - Same test with a Java producer (using plain ActiveMQ libs) instead of
>> the
>> Scala producer
>> - Same test with a Java consumer (using plain ActiveMQ libs) instead of a
>> Scala consumer
>> - Use the "tcp" transport instead of the "nio" transport
>>
>> Anyway, the goal I would say is to trace the problem to the point of
>> hand-off between the broker and client.  For example, the following
>> questions come to mind (for the problematic messages):
>> - Is the broker receiving the body over the wire (i.e. tcp connection)?
>> - Is the broker sending the body over the wire?
>> - Is the client library receiving the body from the application?
>> - Is the client library sending the body to the application?
>>
>> Hope this helps.
>>
>> Art
>>
>>
>> On Wed, May 30, 2018 at 2:41 PM, codingismy11to7 
>> wrote:
>>
>> > We recently updated from (ancient) ActiveMQ 5.10.x to 5.15.4. I've
>> noticed
>> > that text messages are now sporadically returning null as their message
>> > bodies when they were sent with bodies.
>> >
>> > After failing to make a unit test showing this error, I created a
>> project
>> > that reproduces part of what our software is doing, which reliably
>> > replicates the race condition (my last run got null message bodies on
>> > ~0.03%
>> > of messages). I was able to vastly simplify what our software is doing
>> and
>> > still reproduce, but when I tried to simplify further the issue stopped
>> > appearing.
>> >
>> > Reproduction and full explanation:
>> > https://github.com/CodeMettle/amq_null_body_repro
>> >
>> > Since I don't have a self-contained unit test I was hesitant to open an
>> > issue directly and decided to send a message here instead.
>> >
>> > I'm a Scala/Akka developer, so I used the tools I was comfortable with
>> to
>> > make this reproduction without writing a lot of code, but I'm sure
>> somebody
>> > more comfortable in Java and the ActiveMQ API could reproduce in a unit
>> > test. I'm assuming it has something to do with multiple Sessions
>> consuming
>> > the same topic and calling TextMessage.getText at the same time.
>> >
>> > Any suggestions of what to do next?
>> >
>> > Thanks
>> >
>> >
>> >
>> > --
>> > Sent from: http://activemq.2283324.n4.nabble.com/ActiveMQ-Dev-
>> > f2368404.html
>> >
>>
>
>


Re: TextMessage.getText returning null

2018-05-31 Thread Christopher Shannon
The issue is when using the VM transport and the getText() method has to
unmarshall data back into text from the byte sequence.  This happens
because you go from the NIO to VM transport.

The main problem is you have multiple threads (3 consumers) at the same
time calling getText() on the text message and there is a race condition
where it will set the content to null during it.

I created a very simple test case to show the issue and created a jira
https://issues.apache.org/jira/browse/AMQ-6981

On Wed, May 30, 2018 at 6:06 PM, Arthur Naseef  wrote:

> Try turning the broker and client logging up to trace and see what the
> logging shows for the message content.
>
> Also, try the following:
> - Same test with a Java producer (using plain ActiveMQ libs) instead of the
> Scala producer
> - Same test with a Java consumer (using plain ActiveMQ libs) instead of a
> Scala consumer
> - Use the "tcp" transport instead of the "nio" transport
>
> Anyway, the goal I would say is to trace the problem to the point of
> hand-off between the broker and client.  For example, the following
> questions come to mind (for the problematic messages):
> - Is the broker receiving the body over the wire (i.e. tcp connection)?
> - Is the broker sending the body over the wire?
> - Is the client library receiving the body from the application?
> - Is the client library sending the body to the application?
>
> Hope this helps.
>
> Art
>
>
> On Wed, May 30, 2018 at 2:41 PM, codingismy11to7 
> wrote:
>
> > We recently updated from (ancient) ActiveMQ 5.10.x to 5.15.4. I've
> noticed
> > that text messages are now sporadically returning null as their message
> > bodies when they were sent with bodies.
> >
> > After failing to make a unit test showing this error, I created a project
> > that reproduces part of what our software is doing, which reliably
> > replicates the race condition (my last run got null message bodies on
> > ~0.03%
> > of messages). I was able to vastly simplify what our software is doing
> and
> > still reproduce, but when I tried to simplify further the issue stopped
> > appearing.
> >
> > Reproduction and full explanation:
> > https://github.com/CodeMettle/amq_null_body_repro
> >
> > Since I don't have a self-contained unit test I was hesitant to open an
> > issue directly and decided to send a message here instead.
> >
> > I'm a Scala/Akka developer, so I used the tools I was comfortable with to
> > make this reproduction without writing a lot of code, but I'm sure
> somebody
> > more comfortable in Java and the ActiveMQ API could reproduce in a unit
> > test. I'm assuming it has something to do with multiple Sessions
> consuming
> > the same topic and calling TextMessage.getText at the same time.
> >
> > Any suggestions of what to do next?
> >
> > Thanks
> >
> >
> >
> > --
> > Sent from: http://activemq.2283324.n4.nabble.com/ActiveMQ-Dev-
> > f2368404.html
> >
>


Re: TextMessage.getText returning null

2018-05-30 Thread Arthur Naseef
Try turning the broker and client logging up to trace and see what the
logging shows for the message content.

Also, try the following:
- Same test with a Java producer (using plain ActiveMQ libs) instead of the
Scala producer
- Same test with a Java consumer (using plain ActiveMQ libs) instead of a
Scala consumer
- Use the "tcp" transport instead of the "nio" transport

Anyway, the goal I would say is to trace the problem to the point of
hand-off between the broker and client.  For example, the following
questions come to mind (for the problematic messages):
- Is the broker receiving the body over the wire (i.e. tcp connection)?
- Is the broker sending the body over the wire?
- Is the client library receiving the body from the application?
- Is the client library sending the body to the application?

Hope this helps.

Art


On Wed, May 30, 2018 at 2:41 PM, codingismy11to7 
wrote:

> We recently updated from (ancient) ActiveMQ 5.10.x to 5.15.4. I've noticed
> that text messages are now sporadically returning null as their message
> bodies when they were sent with bodies.
>
> After failing to make a unit test showing this error, I created a project
> that reproduces part of what our software is doing, which reliably
> replicates the race condition (my last run got null message bodies on
> ~0.03%
> of messages). I was able to vastly simplify what our software is doing and
> still reproduce, but when I tried to simplify further the issue stopped
> appearing.
>
> Reproduction and full explanation:
> https://github.com/CodeMettle/amq_null_body_repro
>
> Since I don't have a self-contained unit test I was hesitant to open an
> issue directly and decided to send a message here instead.
>
> I'm a Scala/Akka developer, so I used the tools I was comfortable with to
> make this reproduction without writing a lot of code, but I'm sure somebody
> more comfortable in Java and the ActiveMQ API could reproduce in a unit
> test. I'm assuming it has something to do with multiple Sessions consuming
> the same topic and calling TextMessage.getText at the same time.
>
> Any suggestions of what to do next?
>
> Thanks
>
>
>
> --
> Sent from: http://activemq.2283324.n4.nabble.com/ActiveMQ-Dev-
> f2368404.html
>