Re: [PR] HDDS-14155. Replace ByteArrayCodec with CodecBufferCodec in OmTableInsightTask [ozone]
github-actions[bot] commented on PR #9529: URL: https://github.com/apache/ozone/pull/9529#issuecomment-3863096252 Thank you for your contribution. This PR is being closed due to inactivity. If needed, feel free to reopen it. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] HDDS-14155. Replace ByteArrayCodec with CodecBufferCodec in OmTableInsightTask [ozone]
github-actions[bot] closed pull request #9529: HDDS-14155. Replace ByteArrayCodec with CodecBufferCodec in OmTableInsightTask URL: https://github.com/apache/ozone/pull/9529 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] HDDS-14155. Replace ByteArrayCodec with CodecBufferCodec in OmTableInsightTask [ozone]
github-actions[bot] commented on PR #9529: URL: https://github.com/apache/ozone/pull/9529#issuecomment-3821055228 This PR has been marked as stale due to 21 days of inactivity. Please comment or remove the stale label to keep it open. Otherwise, it will be automatically closed in 7 days. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] HDDS-14155. Replace ByteArrayCodec with CodecBufferCodec in OmTableInsightTask [ozone]
navinko commented on PR #9529: URL: https://github.com/apache/ozone/pull/9529#issuecomment-3724911340 > @navinko Can you raise a separate PR for removing key collection in memory and separate out this with byteArrayCodec replacement change? Lets get that change merge in before the Codec change sure . -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] HDDS-14155. Replace ByteArrayCodec with CodecBufferCodec in OmTableInsightTask [ozone]
navinko commented on PR #9529: URL: https://github.com/apache/ozone/pull/9529#issuecomment-3720495253 Thanks @adoroszlai @swamirishi @szetszwo The suggested changed has been updated and needs your review . -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] HDDS-14155. Replace ByteArrayCodec with CodecBufferCodec in OmTableInsightTask [ozone]
navinko commented on PR #9529: URL: https://github.com/apache/ozone/pull/9529#issuecomment-3696664213 > @navinko Thanks for the patch I have added a comment to remove the ByteArrayCodec from the parallel operation as well. Please check and lmk if you agree with it. Thanks @swamirishi -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] HDDS-14155. Replace ByteArrayCodec with CodecBufferCodec in OmTableInsightTask [ozone]
navinko commented on PR #9529: URL: https://github.com/apache/ozone/pull/9529#issuecomment-3696659380 > > ple worker and they do key comparison like key.compareTo(startKey) , since strings are comprable. > > @navinko > > CodecBufferCodec can be also used in parallel function. As long as one table iterator is used by only one thread we can use CodecBufferCodec there. We can make the value use CodecBufferCodec it would inturn lead to using the RDBStoreCodecBufferIterator otherwise currently the parallel iterator would use RBDStoreByteArrayIterator which is going to be unoptimal. For this we need to get rid of the batching logic which I have seen during my previous benchmark is unnecessary since we would not be blocked on IO ever from rocksdb. Let us get rid of this valueExecutor logic > > https://github.com/apache/ozone/blob/ad891ec2e44d191829feeafc2832f0ae594baf1c/hadoop-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/tasks/util/ParallelTableIteratorOperation.java#L83-L85 Thanks @swamirishi for reviewing and providing the suggested changes . I have added the changes with new test case . -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] HDDS-14155. Replace ByteArrayCodec with CodecBufferCodec in OmTableInsightTask [ozone]
swamirishi commented on PR #9529:
URL: https://github.com/apache/ozone/pull/9529#issuecomment-3679663832
This should ideally work
```
Index:
hadoop-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/tasks/util/ParallelTableIteratorOperation.java
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===
diff --git
a/hadoop-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/tasks/util/ParallelTableIteratorOperation.java
b/hadoop-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/tasks/util/ParallelTableIteratorOperation.java
---
a/hadoop-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/tasks/util/ParallelTableIteratorOperation.java
(revision 13fd2c6028a63d55f211bff5c3e22355dac9dcdd)
+++
b/hadoop-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/tasks/util/ParallelTableIteratorOperation.java
(date 1766358462652)
@@ -56,37 +56,26 @@
// Thread Pools
private final ExecutorService iteratorExecutor; // 5
- private final ExecutorService valueExecutors; // 20
-
- private final int maxNumberOfVals;
+ private final long maxNumberOfVals;
private final OMMetadataManager metadataManager;
private final int maxIteratorTasks;
- private final int maxWorkerTasks;
private final long logCountThreshold;
private static final Logger LOG =
LoggerFactory.getLogger(ParallelTableIteratorOperation.class);
public ParallelTableIteratorOperation(OMMetadataManager metadataManager,
Table table, Codec keyCodec,
-int iteratorCount, int workerCount,
int maxNumberOfValsInMemory,
-long logThreshold) {
+int iteratorCount, long
logThreshold) {
this.table = table;
this.keyCodec = keyCodec;
this.metadataManager = metadataManager;
this.maxIteratorTasks = 2 * iteratorCount; // Allow up to 10 pending
iterator tasks
-this.maxWorkerTasks = workerCount * 2; // Allow up to 40 pending
worker tasks
// Create team of 5 iterator threads with UNLIMITED queue
// LinkedBlockingQueue() with no size = can hold infinite pending tasks
this.iteratorExecutor = new ThreadPoolExecutor(iteratorCount,
iteratorCount, 1, TimeUnit.MINUTES,
new LinkedBlockingQueue<>());
-
-// Create team of 20 worker threads with UNLIMITED queue
-this.valueExecutors = new ThreadPoolExecutor(workerCount, workerCount,
1, TimeUnit.MINUTES,
-new LinkedBlockingQueue<>());
-
-// Calculate batch size per worker (e.g., 2000 / 20 = 100 keys per
batch per worker)
-this.maxNumberOfVals = Math.max(10, maxNumberOfValsInMemory /
(workerCount));
this.logCountThreshold = logThreshold;
+this.maxNumberOfVals = Math.max(1000, this.logCountThreshold /
(iteratorCount));
}
private List getBounds(K startKey, K endKey) throws IOException {
@@ -166,9 +155,6 @@
// Queue to track iterator threads (5 threads creating work)
Queue> iterFutures = new LinkedList<>();
-// Queue to track worker threads (20 threads doing work)
-Queue> workerFutures = new ConcurrentLinkedQueue<>();
-
AtomicLong keyCounter = new AtomicLong();
AtomicLong prevLogCounter = new AtomicLong();
Object logLock = new Object();
@@ -190,75 +176,42 @@
iterFutures.add(iteratorExecutor.submit(() -> {
try (TableIterator> iter =
table.iterator()) {
iter.seek(beg);
+ int count = 0;
while (iter.hasNext()) {
-List> keyValues = new ArrayList<>();
-boolean reachedEnd = false;
-while (iter.hasNext()) {
- Table.KeyValue kv = iter.next();
- K key = kv.getKey();
-
- // Check if key is within this segment's range
- boolean withinBounds;
- if (inclusive) {
-// Last segment: include everything from beg onwards (or
until endKey if specified)
-withinBounds = (endKey == null || key.compareTo(endKey) <=
0);
- } else {
-// Middle segment: include keys in range [beg, end)
-withinBounds = key.compareTo(end) < 0;
- }
-
- if (withinBounds) {
-keyValues.add(kv);
- } else {
-reachedEnd = true;
-break;
- }
-
- // If batch is full (2000 keys), stop collecting
- if (keyValues.size() >= maxNumberOfVals) {
-break;
- }
+Table.KeyValue kv = iter.next();
Re: [PR] HDDS-14155. Replace ByteArrayCodec with CodecBufferCodec in OmTableInsightTask [ozone]
swamirishi commented on PR #9529: URL: https://github.com/apache/ozone/pull/9529#issuecomment-3679632733 > ple worker and they do key comparison like key.compareTo(startKey) , since strings are comprable. @navinko CodecBufferCodec can be also used in parallel function. As long as one table iterator is used by only one thread we can use CodecBufferCodec there. We can make the value use CodecBufferCodec it would inturn lead to using the RDBStoreCodecBufferIterator otherwise currently the parallel iterator would use RBDStoreByteArrayIterator which is going to be unoptimal. For this we need to get rid of the batching logic which I have seen during my previous benchmark is unnecessary since we would not be blocked on IO ever from rocksdb. Let us get rid of this valueExecutor logic https://github.com/apache/ozone/blob/ad891ec2e44d191829feeafc2832f0ae594baf1c/hadoop-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/tasks/util/ParallelTableIteratorOperation.java#L83-L85 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] HDDS-14155. Replace ByteArrayCodec with CodecBufferCodec in OmTableInsightTask [ozone]
szetszwo commented on PR #9529: URL: https://github.com/apache/ozone/pull/9529#issuecomment-3679474315 @swamirishi , it would be great if you could review this. Thank you in advance. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] HDDS-14155. Replace ByteArrayCodec with CodecBufferCodec in OmTableInsightTask [ozone]
navinko commented on PR #9529: URL: https://github.com/apache/ozone/pull/9529#issuecomment-3677757682 > > The CodecBufferCodec implementation is not ThreadSafe implementation hence we can not used it for processTableInParallel > > Is the `ByteArrayCodec` thread-safe? Yes 'ByteArrayCodec' is thread safe . My bad , I am not very sure on other implementation details for CodecBufferCodec but looks like the Codec implementation CodecBufferCodec reuse CodecBuffer instances so once CodecBufferCodec will be used as values across multiple thread the CodeBuffer can be modified by other thread in ParallelTableIteratorOperation. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] HDDS-14155. Replace ByteArrayCodec with CodecBufferCodec in OmTableInsightTask [ozone]
adoroszlai commented on PR #9529: URL: https://github.com/apache/ozone/pull/9529#issuecomment-3676634757 > The CodecBufferCodec implementation is not ThreadSafe implementation hence we can not used it for processTableInParallel Is the `ByteArrayCodec` thread-safe? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] HDDS-14155. Replace ByteArrayCodec with CodecBufferCodec in OmTableInsightTask [ozone]
navinko commented on PR #9529: URL: https://github.com/apache/ozone/pull/9529#issuecomment-3676544224 > ```java > try (ParallelTableIteratorOperation parallelIter = new ParallelTableIteratorOperation<>( > ``` Please correct me if my understanding is wrong . The CodecBufferCodec implementation is not ThreadSafe implementation hence we can not used it for processTableInParallel . https://github.com/navinko/ozone/blob/1c817b0460f1e66e75aef0b8f5fde63c691f2a65/hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/utils/db/CodecBufferCodec.java#L34 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] HDDS-14155. Replace ByteArrayCodec with CodecBufferCodec in OmTableInsightTask [ozone]
adoroszlai commented on PR #9529: URL: https://github.com/apache/ozone/pull/9529#issuecomment-3675966979 > String as Key are comparable but CodecBuffer not. I meant replacing `ByteArrayCodec`, not `StringCodec`. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] HDDS-14155. Replace ByteArrayCodec with CodecBufferCodec in OmTableInsightTask [ozone]
navinko commented on PR #9529: URL: https://github.com/apache/ozone/pull/9529#issuecomment-3675877806 Thanks @adoroszlai for reviewing > Can it be replaced in `processTableInParallel`, too? I see the processTableInParallel performs many background operation to make it work in parallel which requires key to be Comparable. https://github.com/navinko/ozone/blob/1c817b0460f1e66e75aef0b8f5fde63c691f2a65/hadoop-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/tasks/util/ParallelTableIteratorOperation.java#L128 String as Key are comparable but CodeBuffer not. https://github.com/navinko/ozone/blob/1c817b0460f1e66e75aef0b8f5fde63c691f2a65/hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/utils/db/CodecBuffer.java#L53 --> processTableSequentially Works for non-string keys , the keys are just byte array some binary data and with TableIterator we are again iterating over these keys , no operations being performed like comparison of two binary row data , there is no range splitting. Hence CodecBuffer works , it is not implementing Comparable. Table table = omMetadataManager.getStore() .getTable(tableName, ByteArrayCodec.get(), ByteArrayCodec.get(), TableCache.CacheType.NO_CACHE); --> processTableInParallel , The keys are string which are comparable and under ParallelTableIteratorOperation.java we see there are operations like splitting the ranges for multiple worker and they do key comparison like key.compareTo(startKey) , since strings are comprable. Table table = omMetadataManager.getStore() .getTable(tableName, StringCodec.get(), ByteArrayCodec.get(), TableCache.CacheType.NO_CACHE); This seems to be the reason there is separate flow for sequential vs parallel processing and CodecBuffer should not be used existing implementation of processTableSequentially. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
