Caideyipi commented on PR #16606:
URL: https://github.com/apache/iotdb/pull/16606#issuecomment-4516721688
I found a few correctness risks around the negative metadata cache. The
optimization itself makes sense, but the cache
entry currently loses some important context and clear() changes the cache
invalidation semantics.
Inline comments
1. On TimeSeriesMetadataCache#get, around the placeholder insertion:
This negative cache entry does not preserve the `ignoreNotExists`
semantics.
If one read uses `ignoreNotExists=true` and the device is missing, this
code may cache the placeholder. A later read
for the same key with `ignoreNotExists=false` will hit the placeholder and
return `null`, while the original
`reader.readTimeseriesMetadata(..., false, ...)` path should throw an
`IOException` for a missing device.
Can we only cache the placeholder when we are sure the device exists and
only the measurement is missing, or avoid
caching negative results produced under `ignoreNotExists=true`?
2. On TimeSeriesMetadataCache#clear:
This changes the semantics of `clear()` from invalidating the existing
Caffeine cache to replacing the `lruCache`
reference.
`clear()` is used by runtime paths such as cache clearing and
snapshot/data-region replacement. With the new
implementation, concurrent readers may still be operating on the old cache
instance and can continue to read stale
metadata after `clear()` returns. Also, `lruCache` is not `volatile`, so
the replacement does not have a clear
visibility guarantee.
I think we should keep `lruCache` final and continue using
`invalidateAll()` + `cleanUp()` here.
3. On the placeholder put after the synchronized block:
The negative-cache `put` happens outside the per-device/file synchronized
block.
For an existing metadata entry, the cache is populated inside the lock, so
concurrent requests for the same missing
key avoid duplicate IO after the first load. For the placeholder path,
waiting threads can enter the lock before the
first thread inserts the placeholder, miss the cache again, and repeat the
same disk read.
Can we insert the placeholder inside the same critical section that
performs the disk lookup?
4. On DataNodeMemoryConfig default value:
Should this fallback use `isMayCacheNonExistSeries()` instead of
`isMetaDataCacheEnable()`?
The field default is controlled by `mayCacheNonExistSeries`, so using
`metaDataCacheEnable` here makes the new option
implicitly depend on a different setting when the property is absent.
Test coverage comment
The added test is ignored as a performance test, so CI will not cover the
new behavior. It would be useful to add a
small non-ignored unit test for the negative cache semantics, especially
the `ignoreNotExists=true` followed by
`ignoreNotExists=false` case.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]