Copilot commented on code in PR #16765:
URL: https://github.com/apache/iotdb/pull/16765#discussion_r2554517146


##########
iotdb-core/datanode/src/main/java/org/apache/iotdb/db/utils/datastructure/AlignedTVList.java:
##########
@@ -2172,7 +2184,7 @@ public void encodeBatch(IChunkWriter chunkWriter, 
BatchEncodeInfo encodeInfo, lo
           encodeInfo.pointNumInChunk++;
         } else {
           if (Objects.isNull(timeDuplicateInfo)) {
-            timeDuplicateInfo = new BitMap(rows);
+            timeDuplicateInfo = new LazyBitMap(index, 
maxRowCountOfCurrentBatch, rows - 1);

Review Comment:
   The LazyBitMap is initialized with `index` as the startPosition, but later 
at line 2206, the code iterates from `startIndex` (saved at line 2160) to check 
`isMarked(sortedRowIndex)` at line 2220. If the first duplicate is found after 
some iterations, `sortedRowIndex` could be less than `index`, causing incorrect 
behavior. The first parameter should be `startIndex` instead of `index` to 
align with the iteration range at line 2206.
   
   Example: If startIndex=0 and the first duplicate is at index=50, 
LazyBitMap(50,...) is created, but isMarked(0) through isMarked(49) will be 
called, which are below the startPosition.
   ```suggestion
               timeDuplicateInfo = new LazyBitMap(startIndex, 
maxRowCountOfCurrentBatch, rows - 1);
   ```



##########
iotdb-core/datanode/src/test/java/org/apache/iotdb/db/storageengine/dataregion/memtable/AlignedTVListIteratorTest.java:
##########
@@ -952,4 +955,50 @@ private void testSkipTimeRange(
     }
     Assert.assertEquals(expectedTimestamps, resultTimestamps);
   }
+
+  @Test
+  public void testEncodeBatch() {
+    testEncodeBatch(largeSingleTvListMap, 400000);
+    testEncodeBatch(largeOrderedMultiTvListMap, 400000);
+    testEncodeBatch(largeMergeSortMultiTvListMap, 400000);
+  }
+
+  private void testEncodeBatch(Map<TVList, Integer> tvListMap, long 
expectedCount) {
+    AlignedChunkWriterImpl alignedChunkWriter = new 
AlignedChunkWriterImpl(getMeasurementSchema());
+    List<Integer> columnIdxList = Arrays.asList(0, 1, 2);
+    IMeasurementSchema measurementSchema = getMeasurementSchema();
+    AlignedReadOnlyMemChunk chunk =
+        new AlignedReadOnlyMemChunk(
+            fragmentInstanceContext,
+            columnIdxList,
+            measurementSchema,
+            tvListMap,
+            Collections.emptyList(),
+            Arrays.asList(
+                Collections.emptyList(), Collections.emptyList(), 
Collections.emptyList()));
+    chunk.sortTvLists();
+    chunk.initChunkMetaFromTVListsWithFakeStatistics();
+    MemPointIterator memPointIterator = 
chunk.createMemPointIterator(Ordering.ASC, null);
+    BatchEncodeInfo encodeInfo =
+        new BatchEncodeInfo(
+            0, 0, 0, 10000, 100000, 
IoTDBDescriptor.getInstance().getConfig().getTargetChunkSize());
+    long[] times = new long[10000];
+    long count = 0;
+    while (memPointIterator.hasNextBatch()) {
+      memPointIterator.encodeBatch(alignedChunkWriter, encodeInfo, times);
+      if (encodeInfo.pointNumInPage >= encodeInfo.maxNumberOfPointsInPage) {
+        alignedChunkWriter.write(times, encodeInfo.pointNumInPage, 0);
+        encodeInfo.pointNumInPage = 0;
+      }
+
+      if (encodeInfo.pointNumInChunk >= encodeInfo.maxNumberOfPointsInChunk) {
+        alignedChunkWriter.sealCurrentPage();
+        alignedChunkWriter.clearPageWriter();
+        count += 
alignedChunkWriter.getTimeChunkWriter().getStatistics().getCount();
+        alignedChunkWriter = new 
AlignedChunkWriterImpl(getMeasurementSchema());
+        encodeInfo.reset();
+      }
+    }

Review Comment:
   The test only counts data from sealed chunks (line 997), but doesn't count 
data remaining in the final unsealed chunk after the loop completes. This could 
lead to incorrect count assertions if the data doesn't perfectly align with 
chunk boundaries.
   
   Consider adding after the loop:
   ```java
   if (encodeInfo.pointNumInChunk > 0 || encodeInfo.pointNumInPage > 0) {
     if (encodeInfo.pointNumInPage > 0) {
       alignedChunkWriter.write(times, encodeInfo.pointNumInPage, 0);
     }
     alignedChunkWriter.sealCurrentPage();
     count += 
alignedChunkWriter.getTimeChunkWriter().getStatistics().getCount();
   }
   ```
   ```suggestion
       }
       // Handle remaining data in the final unsealed chunk
       if (encodeInfo.pointNumInChunk > 0 || encodeInfo.pointNumInPage > 0) {
         if (encodeInfo.pointNumInPage > 0) {
           alignedChunkWriter.write(times, encodeInfo.pointNumInPage, 0);
         }
         alignedChunkWriter.sealCurrentPage();
         count += 
alignedChunkWriter.getTimeChunkWriter().getStatistics().getCount();
       }
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to