Some of my users are attempting a pattern to deduplicate messages based on
a time window instead of a fixed amount of space (a duplicate ID cache).

So far the concept has been working very well. They send their AMQP
messages (qpid-jms-client) into a Last Value Queue with an appropriate
identifier in the _AMQ_LVQ_NAME property. They also set a TimeToLive on the
message that is essentially the lag they will allow while they wait for
possible duplicates. If any duplicates come in, the Last Value Queue
behavior replaces the older message with the newer message until the
expiration. Expired messages are delivered to the preconfigured expiry
queue where their application is listening. This is not perfect, but it's
not intended to be. It's just intended to reduce unnecessary processing,
and they understand this is not a guarantee. It really helps with a system
that produces messages in flurries of "notifications" about the same
assetID over and over again.
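
To make the intended behavior concrete, here is a toy in-memory model of the
pattern in plain Java. It is not broker or client code: the class and method
names are made up for illustration, the map key stands in for _AMQ_LVQ_NAME,
and it assumes each replacement message carries its own TimeToLive counted
from the moment it was sent.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model of "LVQ + TTL + expiry queue"; not Artemis code.
final class LvqWindowSketch {

    // The single message currently held for one last-value key.
    private static final class Pending {
        final String body;
        final long expiresAtMillis;

        Pending(String body, long expiresAtMillis) {
            this.body = body;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Pending> lastValues = new LinkedHashMap<>();
    private final long ttlMillis;

    LvqWindowSketch(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    // A newer message with the same key replaces the older one.
    // Assumption: the newer message has its own TTL, starting at send time.
    void send(String lvqKey, String body, long nowMillis) {
        lastValues.put(lvqKey, new Pending(body, nowMillis + ttlMillis));
    }

    // Removes every entry whose TTL has elapsed and returns the bodies,
    // standing in for delivery to the expiry queue.
    List<String> expire(long nowMillis) {
        List<String> expired = new ArrayList<>();
        lastValues.entrySet().removeIf(entry -> {
            if (entry.getValue().expiresAtMillis <= nowMillis) {
                expired.add(entry.getValue().body);
                return true;
            }
            return false;
        });
        return expired;
    }

    public static void main(String[] args) {
        LvqWindowSketch queue = new LvqWindowSketch(5000);
        queue.send("asset-1", "notification 1", 0);
        queue.send("asset-1", "notification 2", 1000); // replaces notification 1
        System.out.println(queue.expire(5000)); // nothing has expired yet
        System.out.println(queue.expire(6000)); // only the latest one comes out
    }
}
```

In this model a flurry of notifications about one assetID collapses into a
single message on the expiry side, which is the effect the users are after.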

BUT where we are seeing a problem is when we are consuming from the queue
used to hold expired messages, we throw some exception, and the message
needs to be redelivered. The first time or two the message is redelivered,
it is delivered OK. But when the JMSXDeliveryCount is about 3 or 4 (we use
redelivery delay and multipliers to spread these out), our qpid-jms-client
stops being able to read the messages.
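
For a sense of the timing involved, here is a rough sketch of how redelivery
delays spread out, assuming the delay grows geometrically with each failed
delivery and is capped at a maximum. The class name and the values (1000 ms
base delay, multiplier 2.0, 10000 ms cap) are illustrative only and are not
taken from any real configuration.

```java
// Hypothetical helper; models a delay that is multiplied after each failure.
final class RedeliveryScheduleSketch {

    // Delay applied before the redelivery that follows the given number of
    // failed deliveries (1 = first failure), capped at maxDelayMillis.
    static long delayMillis(int failedDeliveries, long baseDelayMillis,
                            double multiplier, long maxDelayMillis) {
        double delay = baseDelayMillis * Math.pow(multiplier, failedDeliveries - 1);
        return (long) Math.min(delay, (double) maxDelayMillis);
    }

    public static void main(String[] args) {
        for (int failures = 1; failures <= 5; failures++) {
            System.out.println("after failure " + failures + ": wait "
                    + delayMillis(failures, 1000, 2.0, 10000) + " ms");
        }
    }
}
```

With values like these, the third or fourth delivery attempt happens several
seconds after the first one.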

We were only able to reproduce this when an AMQP message expired onto the
queue (expired from an LVQ, in case that is relevant). If we place the
message directly on a queue and test different exception and redelivery
scenarios, we cannot reproduce this behavior.

I enabled the qpid-jms-client frame logging (via the environment variable
PN_TRACE_FRM=true) and saw that, in the situation where the client code
cannot access the payload, the broker WAS still sending the payload. So I
thought it was some odd issue with the client. The Apache Qpid team
responded that the issue seems to be that the broker starts to send some
ill-formed payloads in this scenario. I don't want to repeat the stack
traces and their response; you can read those here:

https://lists.apache.org/thread.html/b1fd9c09a1f66f5529601a8651fbb96585c011b22bbd84e07c4f23b1@%3Cusers.qpid.apache.org%3E

Would it be helpful if I tested whether this happens when there is no LVQ
involved? I could have a message in a non-LVQ expire to another queue and
see if redeliveries over there get messed up after a few attempts. For the
record, this is AMQP for both producing and consuming. I do notice that
messages waiting in the expiry queue have many more headers than messages
sent directly to a queue from client code. They seem to be headers full of
information about the message as it left the previous queue. I tried to
send a message directly to the expiry queue with all these headers to
determine whether the existence of one of them specifically triggers the
malformed frame, but I was not able to set all of those headers. The
JMSXDeliveryCount (type Long) was the one that the client would not let me
set, and as a result I could not test. For clarity, though, I don't know
that the issue is caused by a header; that is just the difference I saw
between messages delivered to the queue by client code and messages
expiring from one queue to another.

Please look over the linked thread on the Qpid list and let me know if you
know why a message transfer frame would become malformed after a few failed
deliveries, only if the message expired onto the current queue.

thanks so much
