2020-10-18 13:13:16 UTC - Yuval Kovler: @Yuval Kovler has joined the channel
----
2020-10-18 18:59:50 UTC - Rattanjot Singh: Is there a way to list proxies, the 
same way we list brokers?
```pulsar-admin brokers list use```
----
2020-10-19 06:20:54 UTC - Lari Hotari: @hangc the change looks simple and 
effectively prevents the infinite loop. It will take some time for me to 
confirm in the real environment.

I started looking more into the reason why the state becomes invalid in the 
first place. There seem to be quite a few past issues where a race condition 
in updating readPosition has been the cause. For example 
<https://github.com/apache/pulsar/pull/1478> , 
<https://github.com/apache/pulsar/pull/3015> & 
<https://github.com/apache/pulsar/pull/287> .
I also noticed <https://github.com/apache/pulsar/pull/6606> which adds 
READ_POSITION_UPDATER for readPosition in ManagedCursorImpl .

It seems that ManagedCursorImpl.readPosition could only get out of sync from 
OpReadEntry.readPosition if ManagedCursorImpl.readPosition gets updated after 
the OpReadEntry has been created since OpReadEntry's readPosition gets 
initialized from ManagedCursorImpl.readPosition.
The race condition seems to happen in this code in the setAcknowledgePosition 
method:
<https://github.com/apache/pulsar/blob/825fdd4222dd65ef3099f1a975a1555226297379/managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedCursorImpl.java#L1512-L1523>
In other locations, whenever the readPosition field is modified, the update is 
made while holding the lock. In this location, there is no lock.
However, <https://github.com/apache/pulsar/pull/6606> introduced another 
mechanism for handling race conditions. So there are 2 ways to handle race 
conditions for the readPosition field: ManagedCursorImpl.lock.writeLock() and 
ManagedCursorImpl.READ_POSITION_UPDATER .
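
To illustrate the two mechanisms, here is a minimal sketch. It is not the actual ManagedCursorImpl code: the class CursorSketch, its method names, and the use of a Long as the position type are all assumptions made up for this example. It only shows the general shape of a write-lock guarded update next to an AtomicReferenceFieldUpdater based update.

```java
import java.util.concurrent.atomic.AtomicReferenceFieldUpdater;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for a cursor; NOT the real ManagedCursorImpl.
class CursorSketch {
    // Assumption: a position is modelled as a Long to keep the sketch short.
    private static final AtomicReferenceFieldUpdater<CursorSketch, Long> READ_POSITION_UPDATER =
            AtomicReferenceFieldUpdater.newUpdater(CursorSketch.class, Long.class, "readPosition");

    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    // Must be volatile for AtomicReferenceFieldUpdater to work.
    private volatile Long readPosition = 0L;

    // Mechanism 1: the check-then-set runs while holding the write lock.
    void advanceWithLock(long newPosition) {
        lock.writeLock().lock();
        try {
            if (newPosition > readPosition) {
                readPosition = newPosition;
            }
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Mechanism 2: lock-free compare-and-set loop via the field updater.
    void advanceWithUpdater(long newPosition) {
        READ_POSITION_UPDATER.updateAndGet(this,
                current -> newPosition > current ? newPosition : current);
    }

    long getReadPosition() {
        return readPosition;
    }
}
```

One thing the sketch makes visible: the updater never takes the lock, so a thread in advanceWithLock and a thread in advanceWithUpdater do not exclude each other. Each mechanism only protects against other callers using the same mechanism.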

What would be the way to fix the root cause, the race condition in updating the 
readPosition field?
----
2020-10-19 06:40:56 UTC - Lari Hotari: I created a separate issue to handle the 
root cause: <https://github.com/apache/pulsar/issues/8293>
----
2020-10-19 06:56:03 UTC - Johannes Wienke: Thanks for the replies. Getting that 
WIP into production would definitely help us. We'd still like to use validation 
if possible.
----