Re: Messages replayed

Tecno Brain Mon, 25 Apr 2022 14:16:23 -0700

I observed the same problem again, the cursor of a topic got reset about an
hour and 15 minutes.
The queue went from zero to about 2,000 elements in a second without any
application adding messages. All messages that had occurred during that
time were replayed.
(My application, as you can see, does not generate a lot of messages for
this topic).


I need to understand how the cursors are stored.

Can anyone point me to any documentation or code?

I could not find any exceptions in the log for that topic, but I did found
around the same time the following exception in a different topic. Any help
is appreciated.

"Apr 25, 2022 @ 12:18:32.509","19:18:32.509
[BookKeeperClientWorker-OrderedExecutor-0-0] WARN
org.apache.bookkeeper.client.BookieWatcherImpl - New ensemble:
[pulsar-bookie-0.pulsar-bookie.pulsar.svc.cluster.local:3181,
pulsar-bookie-3.pulsar-bookie.pulsar.svc.cluster.local:3181,
pulsar-bookie-2.pulsar-bookie.pulsar.svc.cluster.local:3181] is not
adhering to Placement Policy. quarantinedBookies: []"
"Apr 25, 2022 @ 12:18:31.958","19:18:31.958 [pulsar-io-4-3] WARN
org.apache.bookkeeper.client.BookieWatcherImpl - New ensemble:
[pulsar-bookie-3.pulsar-bookie.pulsar.svc.cluster.local:3181,
pulsar-bookie-0.pulsar-bookie.pulsar.svc.cluster.local:3181,
pulsar-bookie-4.pulsar-bookie.pulsar.svc.cluster.local:3181] is not
adhering to Placement Policy. quarantinedBookies: []"
"Apr 25, 2022 @ 12:18:31.921","java.lang.NullPointerException: null"
"Apr 25, 2022 @ 12:18:31.921"," at
org.apache.pulsar.broker.service.BacklogQuotaManager.dropBacklogForTimeLimit(BacklogQuotaManager.java:247)
~[io.streamnative-pulsar-broker-2.8.2.0.jar:2.8.2.0]"
"Apr 25, 2022 @ 12:18:31.921"," at
org.apache.pulsar.broker.service.BrokerService.lambda$forEachTopic$71(BrokerService.java:1684)
~[io.streamnative-pulsar-broker-2.8.2.0.jar:2.8.2.0]"
"Apr 25, 2022 @ 12:18:31.921"," at
org.apache.pulsar.common.util.collections.ConcurrentOpenHashMap$Section.forEach(ConcurrentOpenHashMap.java:413)
~[io.streamnative-pulsar-common-2.8.2.0.jar:2.8.2.0]"
"Apr 25, 2022 @ 12:18:31.921"," at
org.apache.pulsar.common.util.collections.ConcurrentOpenHashMap.forEach(ConcurrentOpenHashMap.java:185)
~[io.streamnative-pulsar-common-2.8.2.0.jar:2.8.2.0]"
"Apr 25, 2022 @ 12:18:31.921"," at
org.apache.pulsar.broker.service.BrokerService.monitorBacklogQuota(BrokerService.java:1693)
~[io.streamnative-pulsar-broker-2.8.2.0.jar:2.8.2.0]"
"Apr 25, 2022 @ 12:18:31.921"," at
org.apache.bookkeeper.mledger.util.SafeRun$1.safeRun(SafeRun.java:32)
[io.streamnative-managed-ledger-2.8.2.0.jar:2.8.2.0]"
"Apr 25, 2022 @ 12:18:31.921"," at
org.apache.bookkeeper.common.util.SafeRunnable.run(SafeRunnable.java:36)
[org.apache.bookkeeper-bookkeeper-common-4.14.3.jar:4.14.3]"
"Apr 25, 2022 @ 12:18:31.921"," at
java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
[?:?]"
"Apr 25, 2022 @ 12:18:31.921"," at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
[?:?]"
"Apr 25, 2022 @ 12:18:31.921","19:18:31.906
[pulsar-backlog-quota-checker-35-1] ERROR
org.apache.pulsar.broker.service.BacklogQuotaManager -
[persistent://platform/system/merge-response-publish-odd-partition-2]
Error resetting cursor for slowest consumer
[platform_merge-response-publish-odd_queue]"
"Apr 25, 2022 @ 12:18:31.921"," at
org.apache.pulsar.broker.service.BacklogQuotaManager.handleExceededBacklogQuota(BacklogQuotaManager.java:129)
~[io.streamnative-pulsar-broker-2.8.2.0.jar:2.8.2.0]"
"Apr 25, 2022 @ 12:18:31.921"," at
org.apache.pulsar.broker.service.BrokerService.lambda$monitorBacklogQuota$72(BrokerService.java:1700)
~[io.streamnative-pulsar-broker-2.8.2.0.jar:2.8.2.0]"
"Apr 25, 2022 @ 12:18:31.921"," at
java.util.Optional.ifPresent(Optional.java:183) ~[?:?]"
"Apr 25, 2022 @ 12:18:31.921"," at
org.apache.pulsar.broker.service.BrokerService.forEachTopic(BrokerService.java:1682)
~[io.streamnative-pulsar-broker-2.8.2.0.jar:2.8.2.0]"
"Apr 25, 2022 @ 12:18:31.921"," at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
[?:?]"
"Apr 25, 2022 @ 12:18:31.921"," at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
[?:?]"
"Apr 25, 2022 @ 12:18:31.921"," at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
[?:?]"
"Apr 25, 2022 @ 12:18:31.921"," at
io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
[io.netty-netty-common-4.1.72.Final.jar:4.1.72.Final]"
"Apr 25, 2022 @ 12:18:31.921"," at java.lang.Thread.run(Thread.java:829) [?:?]"
"Apr 25, 2022 @ 12:18:31.888","19:18:31.888
[BookKeeperClientWorker-OrderedExecutor-0-0] WARN
org.apache.bookkeeper.client.BookieWatcherImpl - New ensemble:
[pulsar-bookie-3.pulsar-bookie.pulsar.svc.cluster.local

Thanks




On Fri, Mar 18, 2022 at 9:15 PM PengHui Li <peng...@apache.org> wrote:

> It means you have other reset cursor options when doing the ledger trimming
>
> On Sat, Mar 19, 2022 at 5:52 AM Tecno Brain <cerebrotecnolog...@gmail.com>
> wrote:
>
>> I found this in the logs:
>> *    Failed to mark delete while trimming data ledgers: Reset cursor in
>> progress - unable to mark delete position 18377:-1*
>> but although the messages indicate different topics, the time matches
>> when we started to see messages replayed.
>>
>> What causes that message? What are the consequences of it?
>>
>> On Fri, Mar 18, 2022 at 1:31 PM Tecno Brain <cerebrotecnolog...@gmail.com>
>> wrote:
>>
>>> I wonder what would I have seen in the logs if someone had done
>>> something like this:
>>>
>>> bin/pulsar-admin persistent reset-cursor --time 3h 
>>> persistent://tenant/namespace/topic
>>>
>>>
>>> On Thu, Mar 17, 2022 at 1:54 PM Tecno Brain <
>>> cerebrotecnolog...@gmail.com> wrote:
>>>
>>>> Hi Penghui,
>>>>    No, we are not seeing messages "disappear" because of TTL.
>>>>   What we observed is that messages from 3 out of 20  topics were
>>>> reprocessed.
>>>>   We initially thought that the messages were written again into the
>>>> topic by our own applications, but we did not find any evidence of that
>>>> happening. Our applications logs are pretty good and we would have found
>>>> some evidence.
>>>>   We couldn't find the reason.
>>>>   Our hypothesis was  that the cursor was lost and I was trying to find
>>>> a way to verify that hypothesis through the Pulsar logs...looking if we had
>>>> lost a broker or a bookie.
>>>>   Initially, I thought that perhaps those 3 topics belonged to the same
>>>> "bundle" and whenever the broker changed, the cursor was lost.
>>>>   But Pulsar stores the cursor in the bookies, not the broker....so, a
>>>> broker change shouldn't affect the cursor for the subscription (a shared
>>>> subscription)
>>>>   We haven't observed the same issue again.
>>>>
>>>>
>>>>
>>>> On Sat, Mar 12, 2022 at 7:21 AM PengHui Li <peng...@apache.org> wrote:
>>>>
>>>>> If you have TTL, the messages will be expired.
>>>>> You mean "cursor was lost", how do you verify this?
>>>>> to list subscriptions or not able to consume the message?
>>>>> If it is the latter, I think it should be related to the message TTL.
>>>>>
>>>>> And how about the "brokerDeleteInactiveTopicsMode" in your broker
>>>>> settings?
>>>>> If you are using "delete_when_subscriptions_caught_up", after all the
>>>>> message
>>>>> been expired, the topic will be deleted automatically by default.
>>>>>
>>>>> Penghui
>>>>>
>>>>> On Sat, Mar 12, 2022 at 5:28 AM Tecno Brain <
>>>>> cerebrotecnolog...@gmail.com> wrote:
>>>>>
>>>>>> We do have a backlog quota and messageTTL
>>>>>>
>>>>>> Our namespace is configured as follows:
>>>>>>
>>>>>> Backlog quota:
>>>>>>
>>>>>>
>>>>>> *"message_age    BacklogQuotaImpl(limit=-1, limitSize=-1,
>>>>>> limitTime=180000, policy=consumer_backlog_eviction)"*
>>>>>> Retention:
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> *{  "retentionTimeInMinutes" : 0,  "retentionSizeInMB" : 0}*
>>>>>>
>>>>>> Message TTL:
>>>>>>
>>>>>> *172800*
>>>>>>
>>>>>> All topics are partitioned.
>>>>>>
>>>>>> *{
>>>>>>   "partitions" : 3
>>>>>> }*
>>>>>>
>>>>>>
>>>>>> On Thu, Mar 10, 2022 at 11:24 PM PengHui Li <codelipeng...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Have you changed the backlog quota policy or enabled the message TTL?
>>>>>>> Pulsar will not remove any cursors or skip messages by default.
>>>>>>>
>>>>>>> Penghui
>>>>>>> On Mar 11, 2022, 1:25 PM +0800, Tecno Brain <
>>>>>>> cerebrotecnolog...@gmail.com>, wrote:
>>>>>>>
>>>>>>> Hello,
>>>>>>>  I have an application using Pulsar just as JMS (we get single
>>>>>>> messages, acknowledge them when we are done processing it)
>>>>>>>  The entire system, composed of several types of apps, uses about 40
>>>>>>> different topics.
>>>>>>>
>>>>>>>  Yesterday, an application that subscribes to about 20 queues,
>>>>>>> suddenly was inundated with thousands of messages from two of the queues
>>>>>>> but I could not track those messages to an application producing them. 
>>>>>>> We
>>>>>>> found that the messages were *duplicates*.
>>>>>>>   So it seems that the cursor to these two topics was lost, and
>>>>>>> messages from 3 hours earlier were consumed again.  I found the 
>>>>>>> following
>>>>>>> paragraph :
>>>>>>>
>>>>>>>  "Each subscription stores a cursor. The cursor is the current
>>>>>>> offset in the log. Subscriptions store their cursor in BookKeeper in
>>>>>>> ledgers. This makes cursor tracking scalable just like topics."
>>>>>>>  (
>>>>>>> https://jack-vanlightly.com/blog/2018/10/2/understanding-how-apache-pulsar-works
>>>>>>> )
>>>>>>>
>>>>>>> My guess is that the cursor was lost.
>>>>>>> How could I verify that this was the case? I can't find anything
>>>>>>> relevant in the logs.
>>>>>>>
>>>>>>> The only message I found occurring around the same time is
>>>>>>>
>>>>>>> New ensemble:
>>>>>>> [pulsar-bookie-2.pulsar-bookie.pulsar.svc.cluster.local:3181,
>>>>>>> pulsar-bookie-0.pulsar-bookie.pulsar.svc.cluster.local:3181,
>>>>>>> pulsar-bookie-3.pulsar-bookie.pulsar.svc.cluster.local:3181] is not
>>>>>>> adhering to Placement Policy
>>>>>>>
>>>>>>> Any pointers are appreciated.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>

Re: Messages replayed

Reply via email to