Re: [PR] CASSANALYTICS-124: Commitlog reading not progressing in CDC due to incorrect CommitLogReader.isFullyRead [cassandra-analytics]

2026-03-09 Thread via GitHub


jyothsnakonisa merged PR #176:
URL: https://github.com/apache/cassandra-analytics/pull/176


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] CASSANALYTICS-124: Commitlog reading not progressing in CDC due to incorrect CommitLogReader.isFullyRead [cassandra-analytics]

2026-03-06 Thread via GitHub


jyothsnakonisa commented on PR #176:
URL: 
https://github.com/apache/cassandra-analytics/pull/176#issuecomment-4014533608

   Realized that I can verify the fix I made in the existing test in 
BufferingCommitLogReaderTests. Hence reverted additional tests that I have 
added earlier to avoid complexity


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] CASSANALYTICS-124: Commitlog reading not progressing in CDC due to incorrect CommitLogReader.isFullyRead [cassandra-analytics]

2026-03-06 Thread via GitHub


jyothsnakonisa commented on code in PR #176:
URL: 
https://github.com/apache/cassandra-analytics/pull/176#discussion_r2898197366


##
cassandra-four-zero-bridge/src/main/java/org/apache/cassandra/db/commitlog/BufferingCommitLogReader.java:
##
@@ -427,6 +437,7 @@ private void readSection(FileDataInput reader,
 if (serializedSize == LEGACY_END_OF_SEGMENT_MARKER)
 {
 logger.trace("Encountered end of segment marker at", 
"position", reader.getFilePointer());
+this.position = (int) log.maxOffset();

Review Comment:
   Thanks for flagging this. This code gets executed only for legacy commit 
logs (2.x version). I added a test to test this 
`testPositionAtMaxOffsetInCommitlogsWithPaddedZeros`, but realized that I can't 
produce a 2.x version commit log easily hence removed the test.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] CASSANALYTICS-124: Commitlog reading not progressing in CDC due to incorrect CommitLogReader.isFullyRead [cassandra-analytics]

2026-03-05 Thread via GitHub


jyothsnakonisa commented on code in PR #176:
URL: 
https://github.com/apache/cassandra-analytics/pull/176#discussion_r2892910811


##
cassandra-four-zero-bridge/src/main/java/org/apache/cassandra/db/commitlog/BufferingCommitLogReader.java:
##
@@ -284,6 +284,9 @@ private void readCommitLogSegment() throws IOException
 {
 stats.commitLogBytesSkippedOnRead(startMarker.position() - 
reader.getFilePointer());
 segmentReader.seek(startMarker.position());
+// When starting from an offset, position must be initialized 
to startMarker.position()

Review Comment:
   I agree, we can note this down and do the refactoring in another patch later.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] CASSANALYTICS-124: Commitlog reading not progressing in CDC due to incorrect CommitLogReader.isFullyRead [cassandra-analytics]

2026-03-05 Thread via GitHub


jmckenzie-dev commented on code in PR #176:
URL: 
https://github.com/apache/cassandra-analytics/pull/176#discussion_r2892788908


##
cassandra-four-zero-bridge/src/main/java/org/apache/cassandra/db/commitlog/BufferingCommitLogReader.java:
##
@@ -309,6 +312,13 @@ private void readCommitLogSegment() throws IOException
 break;
 }
 }
+
+// If the segment reader iterator completes reading commitlog with 
padded zeros, set the position
+// as maxOffset to mark completion of reading commitlog

Review Comment:
   This comment confused me a bit and sent me down a rabbit hole. wdyt about 
this instead? Is it accurate?
   ```
   // If the loop finished naturally (iterator exhausted) without hitting an 
error or limit,
   // ensure the position reflects the end of the file. If we aborted early due 
to an error
   // or mutation limit, 'this.position' remains at the last valid read offset.
   ```



##
cassandra-analytics-cdc/src/test/java/org/apache/cassandra/db/commitlog/BufferingCommitLogReaderTests.java:
##
@@ -112,18 +94,76 @@ public void testReaderSeek()
 prevMarker = marker;
 
 if (marker.equals(allMarkers.get(allMarkers.size() - 1)))
-{
-// last marker should return 0 updates
-// and be at the end of the file
 assertThat(result).isEmpty();
-}
 else
-{
 assertThat(result).isNotEmpty();
-}
 }
 }
 
+@Test
+public void testPositionAtMaxOffsetInCommitlogsWithPaddedZeros()
+{
+CommitLog log = writeAndSync(100, null);
+// Seek to maxOffset so the loop runs 0 iterations (natural 
exhaustion);
+// Fix 1 sets position = startMarker.position, Fix 2 sets position = 
maxOffset on clean exit.
+Marker maxOffsetMarker = log.markerAt(log.segmentId(), (int) 
log.maxOffset());
+try (BufferingCommitLogReader reader = new 
BufferingCommitLogReader(log, maxOffsetMarker, CdcStats.STUB, null))
+{
+CommitLogReader.Result result = reader.result();
+assertThat(reader.position()).isEqualTo((int) log.maxOffset());
+assertThat(result.isFullyRead()).isTrue();
+}
+}
+
+@Test
+public void testPositionAtMaxOffsetWhenSeekingToEnd()

Review Comment:
   Excepting the `assertThat(result.updates()).isEmpty();` check, this test is 
identical to `testPositionAtMaxOffsetInCommitlogsWithPaddedZeros()`. Maybe we 
keep this one and just leave a javadoc breadcrumb to the JIRA ticket here and 
the 2 cases this test covers?



##
cassandra-four-zero-bridge/src/main/java/org/apache/cassandra/db/commitlog/BufferingCommitLogReader.java:
##
@@ -427,6 +437,7 @@ private void readSection(FileDataInput reader,
 if (serializedSize == LEGACY_END_OF_SEGMENT_MARKER)
 {
 logger.trace("Encountered end of segment marker at", 
"position", reader.getFilePointer());
+this.position = (int) log.maxOffset();

Review Comment:
   Do we have a unit test that exercises this case of "hit 
`LEGACY_END_OF_SEGMENT_MARKER`?



##
cassandra-four-zero-bridge/src/main/java/org/apache/cassandra/db/commitlog/BufferingCommitLogReader.java:
##
@@ -284,6 +284,9 @@ private void readCommitLogSegment() throws IOException
 {
 stats.commitLogBytesSkippedOnRead(startMarker.position() - 
reader.getFilePointer());
 segmentReader.seek(startMarker.position());
+// When starting from an offset, position must be initialized 
to startMarker.position()

Review Comment:
   Just leaving a note so we're aligned (no change requested). The relationship 
between reader, startMarker, and this.position is something we should consider 
structurally relating in the future. If there's an expectation that these 
numeric sentinels are linked, wrapping them in code that enforces those 
invariants would help prevent gaps and allow us to refactor more safely.
   
   Again - nothing to do here. ;) Just my first thoughts on looking at this 
code in detail. I think this is worth considering for a broader refactor; this 
BufferingCommitLogReader seems like a leaky abstraction on top of the regular 
commit log reader that's forcing us to do a lot of heavy lifting and exposing 
us to a lot of potential pain if we change it too much. 😬



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For add