Alexandre Dupriez created KAFKA-15678:
-----------------------------------------

             Summary: [Tiered Storage] Stall remote reads with long-spanning 
transactions
                 Key: KAFKA-15678
                 URL: https://issues.apache.org/jira/browse/KAFKA-15678
             Project: Kafka
          Issue Type: Bug
          Components: Tiered-Storage
    Affects Versions: 3.6.0
            Reporter: Alexandre Dupriez


Hi team,

I am facing an issue on the remote data path for uncommitted reads.

As mentioned in [the original 
PR|https://github.com/apache/kafka/pull/13535#discussion_r1166887367], if a 
transaction spans over a long sequence of segments, the time taken to retrieve 
the producer snapshots from the remote storage can, in the worst case, become 
redhibitory and block the reads if it consistently exceed the deadline of fetch 
requests ({{{}fetch.max.wait.ms{}}}).

Essentially, the method used to compute the uncommitted records to return have 
an asymptotic complexity proportional to the number of segments in the log. 
That is not a problem with local storage since the constant factor to traverse 
the files is small enough, but that is not the case with a remote storage which 
exhibits higher read latency. An aggravating factor was the lock contention in 
the remote index cache which was then mitigated by KAFKA-15084. But 
unfortunately, despite the improvements observed without said contention, the 
algorithmic complexity of the current method used to compute uncommitted 
records can always defeat any optimisation made on the remote read path.

Maybe we could start thinking (if not already) about a different construct 
which would reduce that complexity to O(1) - i.e. to make the computation 
independent from the number of segments and irrespective of the transaction 
spans.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to