I observed the same problem again, the cursor of a topic got reset about an hour and 15 minutes. The queue went from zero to about 2,000 elements in a second without any application adding messages. All messages that had occurred during that time were replayed. (My application, as you can see, does not generate a lot of messages for this topic).
I need to understand how the cursors are stored. Can anyone point me to any documentation or code? I could not find any exceptions in the log for that topic, but I did found around the same time the following exception in a different topic. Any help is appreciated. "Apr 25, 2022 @ 12:18:32.509","19:18:32.509 [BookKeeperClientWorker-OrderedExecutor-0-0] WARN org.apache.bookkeeper.client.BookieWatcherImpl - New ensemble: [pulsar-bookie-0.pulsar-bookie.pulsar.svc.cluster.local:3181, pulsar-bookie-3.pulsar-bookie.pulsar.svc.cluster.local:3181, pulsar-bookie-2.pulsar-bookie.pulsar.svc.cluster.local:3181] is not adhering to Placement Policy. quarantinedBookies: []" "Apr 25, 2022 @ 12:18:31.958","19:18:31.958 [pulsar-io-4-3] WARN org.apache.bookkeeper.client.BookieWatcherImpl - New ensemble: [pulsar-bookie-3.pulsar-bookie.pulsar.svc.cluster.local:3181, pulsar-bookie-0.pulsar-bookie.pulsar.svc.cluster.local:3181, pulsar-bookie-4.pulsar-bookie.pulsar.svc.cluster.local:3181] is not adhering to Placement Policy. quarantinedBookies: []" "Apr 25, 2022 @ 12:18:31.921","java.lang.NullPointerException: null" "Apr 25, 2022 @ 12:18:31.921"," at org.apache.pulsar.broker.service.BacklogQuotaManager.dropBacklogForTimeLimit(BacklogQuotaManager.java:247) ~[io.streamnative-pulsar-broker-2.8.2.0.jar:2.8.2.0]" "Apr 25, 2022 @ 12:18:31.921"," at org.apache.pulsar.broker.service.BrokerService.lambda$forEachTopic$71(BrokerService.java:1684) ~[io.streamnative-pulsar-broker-2.8.2.0.jar:2.8.2.0]" "Apr 25, 2022 @ 12:18:31.921"," at org.apache.pulsar.common.util.collections.ConcurrentOpenHashMap$Section.forEach(ConcurrentOpenHashMap.java:413) ~[io.streamnative-pulsar-common-2.8.2.0.jar:2.8.2.0]" "Apr 25, 2022 @ 12:18:31.921"," at org.apache.pulsar.common.util.collections.ConcurrentOpenHashMap.forEach(ConcurrentOpenHashMap.java:185) ~[io.streamnative-pulsar-common-2.8.2.0.jar:2.8.2.0]" "Apr 25, 2022 @ 12:18:31.921"," at org.apache.pulsar.broker.service.BrokerService.monitorBacklogQuota(BrokerService.java:1693) ~[io.streamnative-pulsar-broker-2.8.2.0.jar:2.8.2.0]" "Apr 25, 2022 @ 12:18:31.921"," at org.apache.bookkeeper.mledger.util.SafeRun$1.safeRun(SafeRun.java:32) [io.streamnative-managed-ledger-2.8.2.0.jar:2.8.2.0]" "Apr 25, 2022 @ 12:18:31.921"," at org.apache.bookkeeper.common.util.SafeRunnable.run(SafeRunnable.java:36) [org.apache.bookkeeper-bookkeeper-common-4.14.3.jar:4.14.3]" "Apr 25, 2022 @ 12:18:31.921"," at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305) [?:?]" "Apr 25, 2022 @ 12:18:31.921"," at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]" "Apr 25, 2022 @ 12:18:31.921","19:18:31.906 [pulsar-backlog-quota-checker-35-1] ERROR org.apache.pulsar.broker.service.BacklogQuotaManager - [persistent://platform/system/merge-response-publish-odd-partition-2] Error resetting cursor for slowest consumer [platform_merge-response-publish-odd_queue]" "Apr 25, 2022 @ 12:18:31.921"," at org.apache.pulsar.broker.service.BacklogQuotaManager.handleExceededBacklogQuota(BacklogQuotaManager.java:129) ~[io.streamnative-pulsar-broker-2.8.2.0.jar:2.8.2.0]" "Apr 25, 2022 @ 12:18:31.921"," at org.apache.pulsar.broker.service.BrokerService.lambda$monitorBacklogQuota$72(BrokerService.java:1700) ~[io.streamnative-pulsar-broker-2.8.2.0.jar:2.8.2.0]" "Apr 25, 2022 @ 12:18:31.921"," at java.util.Optional.ifPresent(Optional.java:183) ~[?:?]" "Apr 25, 2022 @ 12:18:31.921"," at org.apache.pulsar.broker.service.BrokerService.forEachTopic(BrokerService.java:1682) ~[io.streamnative-pulsar-broker-2.8.2.0.jar:2.8.2.0]" "Apr 25, 2022 @ 12:18:31.921"," at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305) [?:?]" "Apr 25, 2022 @ 12:18:31.921"," at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]" "Apr 25, 2022 @ 12:18:31.921"," at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]" "Apr 25, 2022 @ 12:18:31.921"," at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [io.netty-netty-common-4.1.72.Final.jar:4.1.72.Final]" "Apr 25, 2022 @ 12:18:31.921"," at java.lang.Thread.run(Thread.java:829) [?:?]" "Apr 25, 2022 @ 12:18:31.888","19:18:31.888 [BookKeeperClientWorker-OrderedExecutor-0-0] WARN org.apache.bookkeeper.client.BookieWatcherImpl - New ensemble: [pulsar-bookie-3.pulsar-bookie.pulsar.svc.cluster.local Thanks On Fri, Mar 18, 2022 at 9:15 PM PengHui Li <peng...@apache.org> wrote: > It means you have other reset cursor options when doing the ledger trimming > > On Sat, Mar 19, 2022 at 5:52 AM Tecno Brain <cerebrotecnolog...@gmail.com> > wrote: > >> I found this in the logs: >> * Failed to mark delete while trimming data ledgers: Reset cursor in >> progress - unable to mark delete position 18377:-1* >> but although the messages indicate different topics, the time matches >> when we started to see messages replayed. >> >> What causes that message? What are the consequences of it? >> >> On Fri, Mar 18, 2022 at 1:31 PM Tecno Brain <cerebrotecnolog...@gmail.com> >> wrote: >> >>> I wonder what would I have seen in the logs if someone had done >>> something like this: >>> >>> bin/pulsar-admin persistent reset-cursor --time 3h >>> persistent://tenant/namespace/topic >>> >>> >>> On Thu, Mar 17, 2022 at 1:54 PM Tecno Brain < >>> cerebrotecnolog...@gmail.com> wrote: >>> >>>> Hi Penghui, >>>> No, we are not seeing messages "disappear" because of TTL. >>>> What we observed is that messages from 3 out of 20 topics were >>>> reprocessed. >>>> We initially thought that the messages were written again into the >>>> topic by our own applications, but we did not find any evidence of that >>>> happening. Our applications logs are pretty good and we would have found >>>> some evidence. >>>> We couldn't find the reason. >>>> Our hypothesis was that the cursor was lost and I was trying to find >>>> a way to verify that hypothesis through the Pulsar logs...looking if we had >>>> lost a broker or a bookie. >>>> Initially, I thought that perhaps those 3 topics belonged to the same >>>> "bundle" and whenever the broker changed, the cursor was lost. >>>> But Pulsar stores the cursor in the bookies, not the broker....so, a >>>> broker change shouldn't affect the cursor for the subscription (a shared >>>> subscription) >>>> We haven't observed the same issue again. >>>> >>>> >>>> >>>> On Sat, Mar 12, 2022 at 7:21 AM PengHui Li <peng...@apache.org> wrote: >>>> >>>>> If you have TTL, the messages will be expired. >>>>> You mean "cursor was lost", how do you verify this? >>>>> to list subscriptions or not able to consume the message? >>>>> If it is the latter, I think it should be related to the message TTL. >>>>> >>>>> And how about the "brokerDeleteInactiveTopicsMode" in your broker >>>>> settings? >>>>> If you are using "delete_when_subscriptions_caught_up", after all the >>>>> message >>>>> been expired, the topic will be deleted automatically by default. >>>>> >>>>> Penghui >>>>> >>>>> On Sat, Mar 12, 2022 at 5:28 AM Tecno Brain < >>>>> cerebrotecnolog...@gmail.com> wrote: >>>>> >>>>>> We do have a backlog quota and messageTTL >>>>>> >>>>>> Our namespace is configured as follows: >>>>>> >>>>>> Backlog quota: >>>>>> >>>>>> >>>>>> *"message_age BacklogQuotaImpl(limit=-1, limitSize=-1, >>>>>> limitTime=180000, policy=consumer_backlog_eviction)"* >>>>>> Retention: >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> *{ "retentionTimeInMinutes" : 0, "retentionSizeInMB" : 0}* >>>>>> >>>>>> Message TTL: >>>>>> >>>>>> *172800* >>>>>> >>>>>> All topics are partitioned. >>>>>> >>>>>> *{ >>>>>> "partitions" : 3 >>>>>> }* >>>>>> >>>>>> >>>>>> On Thu, Mar 10, 2022 at 11:24 PM PengHui Li <codelipeng...@gmail.com> >>>>>> wrote: >>>>>> >>>>>>> Have you changed the backlog quota policy or enabled the message TTL? >>>>>>> Pulsar will not remove any cursors or skip messages by default. >>>>>>> >>>>>>> Penghui >>>>>>> On Mar 11, 2022, 1:25 PM +0800, Tecno Brain < >>>>>>> cerebrotecnolog...@gmail.com>, wrote: >>>>>>> >>>>>>> Hello, >>>>>>> I have an application using Pulsar just as JMS (we get single >>>>>>> messages, acknowledge them when we are done processing it) >>>>>>> The entire system, composed of several types of apps, uses about 40 >>>>>>> different topics. >>>>>>> >>>>>>> Yesterday, an application that subscribes to about 20 queues, >>>>>>> suddenly was inundated with thousands of messages from two of the queues >>>>>>> but I could not track those messages to an application producing them. >>>>>>> We >>>>>>> found that the messages were *duplicates*. >>>>>>> So it seems that the cursor to these two topics was lost, and >>>>>>> messages from 3 hours earlier were consumed again. I found the >>>>>>> following >>>>>>> paragraph : >>>>>>> >>>>>>> "Each subscription stores a cursor. The cursor is the current >>>>>>> offset in the log. Subscriptions store their cursor in BookKeeper in >>>>>>> ledgers. This makes cursor tracking scalable just like topics." >>>>>>> ( >>>>>>> https://jack-vanlightly.com/blog/2018/10/2/understanding-how-apache-pulsar-works >>>>>>> ) >>>>>>> >>>>>>> My guess is that the cursor was lost. >>>>>>> How could I verify that this was the case? I can't find anything >>>>>>> relevant in the logs. >>>>>>> >>>>>>> The only message I found occurring around the same time is >>>>>>> >>>>>>> New ensemble: >>>>>>> [pulsar-bookie-2.pulsar-bookie.pulsar.svc.cluster.local:3181, >>>>>>> pulsar-bookie-0.pulsar-bookie.pulsar.svc.cluster.local:3181, >>>>>>> pulsar-bookie-3.pulsar-bookie.pulsar.svc.cluster.local:3181] is not >>>>>>> adhering to Placement Policy >>>>>>> >>>>>>> Any pointers are appreciated. >>>>>>> >>>>>>> >>>>>>> >>>>>>>