JackieTien97 commented on code in PR #17664:
URL: https://github.com/apache/iotdb/pull/17664#discussion_r3285239990
##########
iotdb-client/isession/src/main/java/org/apache/iotdb/isession/SessionDataSet.java:
##########
@@ -153,45 +153,42 @@ public boolean hasNext() throws
StatementExecutionException, IoTDBConnectionExce
}
private RowRecord constructRowRecordFromValueArray() throws
StatementExecutionException {
- List<Field> outFields = new ArrayList<>();
- for (int i = ioTDBRpcDataSet.getValueColumnStartIndex();
- i < ioTDBRpcDataSet.getColumnSize();
- i++) {
+ int valueColumnStartIndex = ioTDBRpcDataSet.getValueColumnStartIndex();
+ int columnSize = ioTDBRpcDataSet.getColumnSize();
+ List<Field> outFields = new ArrayList<>(columnSize -
valueColumnStartIndex);
+ for (int columnIndex = valueColumnStartIndex + 1; columnIndex <=
columnSize; columnIndex++) {
Review Comment:
Same index-base change as the C++ client: the loop is now
`valueColumnStartIndex + 1 .. columnSize` (inclusive) feeding 1-based
`isNull(columnIndex)` / `getDataType(columnIndex)` / `getXxx(columnIndex)`.
Please confirm these index-based getters are 1-based relative to the
column-name list (i.e. `columnIndex == oldNameListIndex + 1`). An off-by-one is
silent here — it reads an adjacent column rather than failing. A unit test
exercising a multi-column row would lock this down.
##########
iotdb-client/client-cpp/src/main/SessionDataSet.cpp:
##########
@@ -246,38 +246,39 @@ const std::vector<std::string>&
SessionDataSet::DataIterator::getColumnTypeList(
shared_ptr<RowRecord> SessionDataSet::constructRowRecordFromValueArray() {
std::vector<Field> outFields;
- for (int i = iotdbRpcDataSet_->getValueColumnStartIndex(); i <
iotdbRpcDataSet_->getColumnSize();
- i++) {
+ const int32_t valueColumnStartIndex =
iotdbRpcDataSet_->getValueColumnStartIndex();
+ const int32_t columnSize = iotdbRpcDataSet_->getColumnSize();
+ outFields.reserve(columnSize - valueColumnStartIndex);
+ for (int32_t columnIndex = valueColumnStartIndex + 1; columnIndex <=
columnSize; ++columnIndex) {
Review Comment:
The loop changed from 0-based name-list indexing (`[valueColumnStartIndex,
columnSize)`) to 1-based index access (`[valueColumnStartIndex + 1,
columnSize]`). This is only correct if `isNullByIndex` / `getXxxByIndex` are
1-based such that `columnIndex == oldNameListIndex + 1`. Please double-check
this against `IoTDBRpcDataSet` — an off-by-one here silently returns the wrong
column's value with no error.
Also note `getBooleanByIndex(...).value()` will throw `bad_optional_access`
if the optional is empty. It is guarded by `isNullByIndex` here, so it should
be fine, but the `isNull` guard and the optional must agree on null semantics.
##########
iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/plan/relational/metadata/fetcher/cache/TableDeviceSchemaCache.java:
##########
@@ -482,7 +482,7 @@ void invalidateLastCache(final PartialPath devicePath,
final String measurement)
},
cachedDeviceID -> {
try {
- return new PartialPath(cachedDeviceID).matchFullPath(devicePath);
+ return devicePath.matchFullPath(cachedDeviceID);
Review Comment:
This swaps the `matchFullPath` direction: previously `new
PartialPath(cachedDeviceID).matchFullPath(devicePath)` (cached entry as the
pattern), now `devicePath.matchFullPath(cachedDeviceID)` (devicePath as the
pattern). The new direction is correct and consistent with `invalidateCache`
below.
Note this also changes behavior when `devicePath` contains wildcards: the
old form would have failed to match (a concrete path cannot match a wildcard
path when used as the pattern), so entries that should have been invalidated
were left stale. If that is an intended bug fix, great — worth calling out in
the PR description and covering with a test.
##########
iotdb-core/datanode/src/main/java/org/apache/iotdb/db/utils/ModificationUtils.java:
##########
@@ -82,7 +82,11 @@ public static void modifyChunkMetaData(
if (range.contains(metaData.getStartTime(),
metaData.getEndTime())) {
return true;
} else {
- if (range.overlaps(new TimeRange(metaData.getStartTime(),
metaData.getEndTime()))) {
+ if (overlap(
Review Comment:
Replacing `range.overlaps(new TimeRange(start, end))` with `overlap(start,
end, range.getMin(), range.getMax())` avoids the allocation — good. Please make
sure `overlap(...)` uses exactly the same boundary semantics as
`TimeRange.overlaps` (inclusive endpoints). A subtle inclusive/exclusive
mismatch here changes whether a deletion is considered to touch a chunk, which
directly affects query correctness.
##########
iotdb-client/session/src/main/java/org/apache/iotdb/session/Session.java:
##########
@@ -1865,10 +1815,12 @@ private void
filterNullValueAndMeasurementWithStringType(
*/
private boolean filterNullValueAndMeasurementWithStringType(
List<String> valuesList, String deviceId, List<String> measurementsList)
{
- Map<String, Object> nullMap = new HashMap<>();
+ Map<String, Object> nullMap = logger.isInfoEnabled() ? new HashMap<>() :
null;
Review Comment:
Same as the other `nullMap` change above — `nullMap` is `null` when info
logging is disabled. Confirm the code after this loop guards every `nullMap`
access (null check, or `logger.isInfoEnabled()` short-circuit before
`nullMap.isEmpty()`).
##########
iotdb-client/session/src/main/java/org/apache/iotdb/session/Session.java:
##########
@@ -1817,10 +1765,12 @@ private boolean filterNullValueAndMeasurement(
List<String> measurementsList,
List<TSDataType> types,
List<Object> valuesList) {
- Map<String, Object> nullMap = new HashMap<>();
+ Map<String, Object> nullMap = logger.isInfoEnabled() ? new HashMap<>() :
null;
Review Comment:
`nullMap` is now `null` whenever info logging is disabled. The loop body is
correctly guarded, but the code after this loop (not in the diff) presumably
consumes `nullMap` for logging — e.g. `if (!nullMap.isEmpty())
logger.info(...)`. If that consumer dereferences `nullMap` without a null check
(or without short-circuiting on `logger.isInfoEnabled()` first), this throws an
NPE when info logging is off. Please verify every subsequent use of `nullMap`
in this method is null-safe.
##########
iotdb-client/service-rpc/src/main/java/org/apache/iotdb/rpc/IoTDBJDBCDataSet.java:
##########
@@ -321,6 +311,31 @@ public IoTDBJDBCDataSet(
this.emptyResultSet = (queryDataSet == null ||
!queryDataSet.time.hasRemaining());
}
+ private static int getDeduplicatedColumnSize(Map<String, Integer>
columnNameIndex) {
Review Comment:
`getDeduplicatedColumnSize` now returns `max(index) + 1` instead of the
previous `distinct().count()`. These are equal only when the dedup indices are
contiguous from 0 — which the old `columnTypeDeduplicatedList.set(index, ...)`
already required (a gap would have thrown `IndexOutOfBoundsException`), so in
practice this is equivalent and more robust. Just confirm no downstream
consumer of `columnTypeDeduplicatedList` relies on `size()` being exactly the
distinct count, or assumes no `null` entries.
##########
iotdb-core/datanode/src/main/java/org/apache/iotdb/db/schemaengine/SchemaEngine.java:
##########
@@ -371,11 +371,13 @@ public int getSchemaRegionNumber() {
public Map<Integer, Long> countDeviceNumBySchemaRegion(final List<Integer>
schemaIds) {
final Map<Integer, Long> deviceNum = new HashMap<>();
+ final java.util.Collection<Integer> targetSchemaIds =
Review Comment:
Use regular imports for `java.util.Collection` / `java.util.HashSet` rather
than inline fully-qualified names — the rest of the file uses imports and this
is inconsistent (and may trip checkstyle). The same pattern appears in
`StorageEngine.getDiskSizeByDataRegion`. The `size() > 1` guard to avoid
allocating a `Set` for the trivial case is a reasonable micro-optimization.
##########
iotdb-core/datanode/src/main/java/org/apache/iotdb/db/utils/QueryDataSetUtils.java:
##########
@@ -572,40 +558,30 @@ private static void fillRemainingBitMap(
}
private static void fillTimeColumn(
- int rowCount, ByteArrayOutputStream[] byteArrayOutputStreams,
TSQueryDataSet tsQueryDataSet) {
- // calculate the time buffer size
- int timeOccupation = rowCount * 8;
- ByteBuffer timeBuffer = ByteBuffer.allocate(timeOccupation);
- timeBuffer.put(byteArrayOutputStreams[0].toByteArray());
- timeBuffer.flip();
- tsQueryDataSet.setTime(timeBuffer);
+ int rowCount, PublicBAOS[] byteArrayOutputStreams, TSQueryDataSet
tsQueryDataSet) {
Review Comment:
After the refactor `rowCount` is no longer used in `fillTimeColumn` (the old
`rowCount * 8` allocation is gone). The same applies to `rowCount` in
`fillValueColumnsAndBitMaps` (line 566), where `valueOccupation` is now only
read for `.length`. Please drop the now-unused parameters — they will otherwise
trip checkstyle/IDE warnings and are misleading to readers. `columnNum` can be
derived as `byteArrayOutputStreams.length / 2` if you want to remove
`valueOccupation` entirely.
##########
iotdb-core/datanode/src/main/java/org/apache/iotdb/db/storageengine/dataregion/tsfile/TsFileManager.java:
##########
@@ -223,10 +216,18 @@ public void removeAll(List<TsFileResource>
tsFileResourceList, boolean sequence)
writeLock("removeAll");
try {
for (TsFileResource resource : tsFileResourceList) {
- remove(resource, sequence);
+ removeFromPartitionFileList(resource, sequence);
}
} finally {
- writeLock("removeAll");
+ writeUnlock();
+ }
+ }
+
+ private void removeFromPartitionFileList(TsFileResource tsFileResource,
boolean sequence) {
+ Map<Long, TsFileResourceList> selectedMap = sequence ? sequenceFiles :
unsequenceFiles;
+ TsFileResourceList tsFileResources =
selectedMap.get(tsFileResource.getTimePartition());
+ if (tsFileResources != null && tsFileResources.remove(tsFileResource)) {
Review Comment:
`removeFromPartitionFileList` now looks up only the partition
`tsFileResource.getTimePartition()` instead of scanning all partitions — a good
O(1) improvement. It assumes a resource is always stored in the list keyed by
its own `getTimePartition()`. That should hold as an invariant; just confirm
there is no path where a resource's time partition can change after insertion,
otherwise `remove` would silently fail to remove it (the old full scan would
still have found it).
##########
iotdb-core/datanode/src/main/java/org/apache/iotdb/db/storageengine/dataregion/read/filescan/impl/UnclosedFileScanHandleImpl.java:
##########
@@ -48,6 +51,8 @@ public class UnclosedFileScanHandleImpl implements
IFileScanHandle {
private final TsFileResource tsFileResource;
private final Map<IDeviceID, Map<String, List<IChunkMetadata>>>
deviceToChunkMetadataMap;
private final Map<IDeviceID, Map<String, List<IChunkHandle>>>
deviceToMemChunkHandleMap;
+ private final Map<IDeviceID, List<TimeRange>> deviceToDeletionRanges;
+ private final Map<IDeviceID, Map<String, List<TimeRange>>>
deviceToTimeSeriesDeletionRanges;
Review Comment:
These two new caches are plain `HashMap`s, mutated lazily by
`isDeviceTimeDeleted` / `isTimeSeriesTimeDeleted` (`put` and
`computeIfAbsent`). If a single `UnclosedFileScanHandleImpl` can be queried
from multiple threads concurrently — common in the scan/query path — these
unsynchronized mutations can corrupt the map or lose updates. The PR's
"concurrent read" checkbox is unchecked. Please confirm single-threaded access,
or switch to `ConcurrentHashMap`.
Separately: `appendDeletionRanges` adds `TimeRange` references straight from
`chunkMetadata.getDeleteIntervalList()` and then `sortAndMerge` runs over them.
Please confirm `sortAndMerge` does not mutate the shared `TimeRange` instances
in place, otherwise it could corrupt the chunk metadata's delete intervals.
##########
iotdb-core/datanode/src/main/java/org/apache/iotdb/db/storageengine/dataregion/read/filescan/impl/ClosedFileScanHandleImpl.java:
##########
@@ -87,10 +87,18 @@ public boolean isDeviceTimeDeleted(IDeviceID deviceID, long
timestamp)
curFileModEntries != null
? curFileModEntries
: queryContext.loadAllModificationsFromDisk(tsFileResource);
- List<ModEntry> modifications =
queryContext.getPathModifications(curFileModEntries, deviceID);
List<TimeRange> timeRangeList =
-
modifications.stream().map(ModEntry::getTimeRange).collect(Collectors.toList());
- return ModificationUtils.isPointDeletedWithoutOrderedRange(timestamp,
timeRangeList);
+
getMergedTimeRanges(queryContext.getPathModifications(curFileModEntries,
deviceID));
Review Comment:
`isDeviceTimeDeleted` recomputes `getPathModifications` and now also runs
`sortAndMerge` on every call, with no caching — whereas
`isTimeSeriesTimeDeleted` just below caches via `deviceToModifications`, and
`UnclosedFileScanHandleImpl` caches both device- and series-level ranges. If
`isDeviceTimeDeleted` is called per-timestamp, this is a small regression
versus the old `isPointDeletedWithoutOrderedRange` (an extra sort on each
call). Consider caching the device-level merged ranges here too, for
consistency.
##########
iotdb-core/datanode/src/main/java/org/apache/iotdb/db/storageengine/StorageEngine.java:
##########
@@ -1116,12 +1116,14 @@ private void
stopTimedServiceAndThrow(ScheduledExecutorService pool, String pool
public void getDiskSizeByDataRegion(
Map<Integer, Long> dataRegionDisk, List<Integer> dataRegionIds) {
- dataRegionMap.forEach(
- (dataRegionId, dataRegion) -> {
- if (dataRegionIds.contains(dataRegionId.getId())) {
- dataRegionDisk.put(dataRegionId.getId(),
dataRegion.countRegionDiskSize());
- }
- });
+ final java.util.Collection<Integer> targetDataRegionIds =
Review Comment:
Same fully-qualified-name nit as `SchemaEngine` — please import `Collection`
/ `HashSet`. Logic-wise the rewrite (iterate the requested ids and look each
up) is fine and equivalent: duplicates in `dataRegionIds` are handled by the
`Set` when `size > 1`, and `dataRegionDisk.put` is idempotent regardless.
Switching from a full-map scan to direct `get` is a good improvement.
##########
iotdb-core/datanode/src/main/java/org/apache/iotdb/db/utils/QueryDataSetUtils.java:
##########
@@ -572,40 +558,30 @@ private static void fillRemainingBitMap(
}
private static void fillTimeColumn(
- int rowCount, ByteArrayOutputStream[] byteArrayOutputStreams,
TSQueryDataSet tsQueryDataSet) {
- // calculate the time buffer size
- int timeOccupation = rowCount * 8;
- ByteBuffer timeBuffer = ByteBuffer.allocate(timeOccupation);
- timeBuffer.put(byteArrayOutputStreams[0].toByteArray());
- timeBuffer.flip();
- tsQueryDataSet.setTime(timeBuffer);
+ int rowCount, PublicBAOS[] byteArrayOutputStreams, TSQueryDataSet
tsQueryDataSet) {
+ tsQueryDataSet.setTime(wrapBuffer(byteArrayOutputStreams[0]));
}
private static void fillValueColumnsAndBitMaps(
int rowCount,
- ByteArrayOutputStream[] byteArrayOutputStreams,
+ PublicBAOS[] byteArrayOutputStreams,
int[] valueOccupation,
TSQueryDataSet tsQueryDataSet) {
- // calculate the bitmap buffer size
- int bitmapOccupation = (rowCount + 7) / 8;
-
- List<ByteBuffer> bitmapList = new LinkedList<>();
- List<ByteBuffer> valueList = new LinkedList<>();
+ int columnNum = valueOccupation.length;
+ List<ByteBuffer> bitmapList = new ArrayList<>(columnNum);
+ List<ByteBuffer> valueList = new ArrayList<>(columnNum);
for (int i = 1; i < byteArrayOutputStreams.length; i += 2) {
- ByteBuffer valueBuffer = ByteBuffer.allocate(valueOccupation[(i - 1) /
2]);
- valueBuffer.put(byteArrayOutputStreams[i].toByteArray());
- valueBuffer.flip();
- valueList.add(valueBuffer);
-
- ByteBuffer bitmapBuffer = ByteBuffer.allocate(bitmapOccupation);
- bitmapBuffer.put(byteArrayOutputStreams[i + 1].toByteArray());
- bitmapBuffer.flip();
- bitmapList.add(bitmapBuffer);
+ valueList.add(wrapBuffer(byteArrayOutputStreams[i]));
+ bitmapList.add(wrapBuffer(byteArrayOutputStreams[i + 1]));
}
tsQueryDataSet.setBitmapList(bitmapList);
tsQueryDataSet.setValueList(valueList);
}
+ private static ByteBuffer wrapBuffer(PublicBAOS outputStream) {
Review Comment:
Nice zero-copy win — wrapping the `PublicBAOS` internal buffer avoids the
`toByteArray()` + `allocate` + `put` double copy. One invariant to keep: the
returned `ByteBuffer` aliases the stream's internal array, so nothing must
write to these `PublicBAOS` / `DataOutputStream` instances after `wrapBuffer`
is called. That holds in the current flow (wrapping happens last, right before
return), just keep it in mind if this code is extended later.
##########
iotdb-core/datanode/src/main/java/org/apache/iotdb/db/storageengine/dataregion/tsfile/TsFileManager.java:
##########
@@ -223,10 +216,18 @@ public void removeAll(List<TsFileResource>
tsFileResourceList, boolean sequence)
writeLock("removeAll");
try {
for (TsFileResource resource : tsFileResourceList) {
- remove(resource, sequence);
+ removeFromPartitionFileList(resource, sequence);
}
} finally {
- writeLock("removeAll");
+ writeUnlock();
+ }
Review Comment:
Good catch — the original `removeAll` `finally` block called
`writeLock("removeAll")` instead of `writeUnlock()`, which acquired the write
lock a second time and never released it (a write-lock count leak on every
`removeAll`). This fix is correct and important; worth highlighting explicitly
in the PR description since it is a real bug fix, not just an optimization.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]