Copilot commented on code in PR #16765:
URL: https://github.com/apache/iotdb/pull/16765#discussion_r2554517146
##########
iotdb-core/datanode/src/main/java/org/apache/iotdb/db/utils/datastructure/AlignedTVList.java:
##########
@@ -2172,7 +2184,7 @@ public void encodeBatch(IChunkWriter chunkWriter,
BatchEncodeInfo encodeInfo, lo
encodeInfo.pointNumInChunk++;
} else {
if (Objects.isNull(timeDuplicateInfo)) {
- timeDuplicateInfo = new BitMap(rows);
+ timeDuplicateInfo = new LazyBitMap(index,
maxRowCountOfCurrentBatch, rows - 1);
Review Comment:
The LazyBitMap is initialized with `index` as the startPosition, but later
at line 2206, the code iterates from `startIndex` (saved at line 2160) to check
`isMarked(sortedRowIndex)` at line 2220. If the first duplicate is found after
some iterations, `sortedRowIndex` could be less than `index`, causing incorrect
behavior. The first parameter should be `startIndex` instead of `index` to
align with the iteration range at line 2206.
Example: If startIndex=0 and the first duplicate is at index=50,
LazyBitMap(50,...) is created, but isMarked(0) through isMarked(49) will be
called, which are below the startPosition.
```suggestion
timeDuplicateInfo = new LazyBitMap(startIndex,
maxRowCountOfCurrentBatch, rows - 1);
```
##########
iotdb-core/datanode/src/test/java/org/apache/iotdb/db/storageengine/dataregion/memtable/AlignedTVListIteratorTest.java:
##########
@@ -952,4 +955,50 @@ private void testSkipTimeRange(
}
Assert.assertEquals(expectedTimestamps, resultTimestamps);
}
+
+ @Test
+ public void testEncodeBatch() {
+ testEncodeBatch(largeSingleTvListMap, 400000);
+ testEncodeBatch(largeOrderedMultiTvListMap, 400000);
+ testEncodeBatch(largeMergeSortMultiTvListMap, 400000);
+ }
+
+ private void testEncodeBatch(Map<TVList, Integer> tvListMap, long
expectedCount) {
+ AlignedChunkWriterImpl alignedChunkWriter = new
AlignedChunkWriterImpl(getMeasurementSchema());
+ List<Integer> columnIdxList = Arrays.asList(0, 1, 2);
+ IMeasurementSchema measurementSchema = getMeasurementSchema();
+ AlignedReadOnlyMemChunk chunk =
+ new AlignedReadOnlyMemChunk(
+ fragmentInstanceContext,
+ columnIdxList,
+ measurementSchema,
+ tvListMap,
+ Collections.emptyList(),
+ Arrays.asList(
+ Collections.emptyList(), Collections.emptyList(),
Collections.emptyList()));
+ chunk.sortTvLists();
+ chunk.initChunkMetaFromTVListsWithFakeStatistics();
+ MemPointIterator memPointIterator =
chunk.createMemPointIterator(Ordering.ASC, null);
+ BatchEncodeInfo encodeInfo =
+ new BatchEncodeInfo(
+ 0, 0, 0, 10000, 100000,
IoTDBDescriptor.getInstance().getConfig().getTargetChunkSize());
+ long[] times = new long[10000];
+ long count = 0;
+ while (memPointIterator.hasNextBatch()) {
+ memPointIterator.encodeBatch(alignedChunkWriter, encodeInfo, times);
+ if (encodeInfo.pointNumInPage >= encodeInfo.maxNumberOfPointsInPage) {
+ alignedChunkWriter.write(times, encodeInfo.pointNumInPage, 0);
+ encodeInfo.pointNumInPage = 0;
+ }
+
+ if (encodeInfo.pointNumInChunk >= encodeInfo.maxNumberOfPointsInChunk) {
+ alignedChunkWriter.sealCurrentPage();
+ alignedChunkWriter.clearPageWriter();
+ count +=
alignedChunkWriter.getTimeChunkWriter().getStatistics().getCount();
+ alignedChunkWriter = new
AlignedChunkWriterImpl(getMeasurementSchema());
+ encodeInfo.reset();
+ }
+ }
Review Comment:
The test only counts data from sealed chunks (line 997), but doesn't count
data remaining in the final unsealed chunk after the loop completes. This could
lead to incorrect count assertions if the data doesn't perfectly align with
chunk boundaries.
Consider adding after the loop:
```java
if (encodeInfo.pointNumInChunk > 0 || encodeInfo.pointNumInPage > 0) {
if (encodeInfo.pointNumInPage > 0) {
alignedChunkWriter.write(times, encodeInfo.pointNumInPage, 0);
}
alignedChunkWriter.sealCurrentPage();
count +=
alignedChunkWriter.getTimeChunkWriter().getStatistics().getCount();
}
```
```suggestion
}
// Handle remaining data in the final unsealed chunk
if (encodeInfo.pointNumInChunk > 0 || encodeInfo.pointNumInPage > 0) {
if (encodeInfo.pointNumInPage > 0) {
alignedChunkWriter.write(times, encodeInfo.pointNumInPage, 0);
}
alignedChunkWriter.sealCurrentPage();
count +=
alignedChunkWriter.getTimeChunkWriter().getStatistics().getCount();
}
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]