Re: [PR] HDDS-14155. Replace ByteArrayCodec with CodecBufferCodec in OmTableInsightTask [ozone]

2026-02-06 Thread via GitHub


github-actions[bot] commented on PR #9529:
URL: https://github.com/apache/ozone/pull/9529#issuecomment-3863096252

   Thank you for your contribution. This PR is being closed due to inactivity. 
If needed, feel free to reopen it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] HDDS-14155. Replace ByteArrayCodec with CodecBufferCodec in OmTableInsightTask [ozone]

2026-02-06 Thread via GitHub


github-actions[bot] closed pull request #9529: HDDS-14155. Replace 
ByteArrayCodec with CodecBufferCodec in OmTableInsightTask
URL: https://github.com/apache/ozone/pull/9529


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] HDDS-14155. Replace ByteArrayCodec with CodecBufferCodec in OmTableInsightTask [ozone]

2026-01-29 Thread via GitHub


github-actions[bot] commented on PR #9529:
URL: https://github.com/apache/ozone/pull/9529#issuecomment-3821055228

   This PR has been marked as stale due to 21 days of inactivity. Please 
comment or remove the stale label to keep it open. Otherwise, it will be 
automatically closed in 7 days.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] HDDS-14155. Replace ByteArrayCodec with CodecBufferCodec in OmTableInsightTask [ozone]

2026-01-08 Thread via GitHub


navinko commented on PR #9529:
URL: https://github.com/apache/ozone/pull/9529#issuecomment-3724911340

   > @navinko Can you raise a separate PR for removing key collection in memory 
and separate out this with byteArrayCodec replacement change? Lets get that 
change merge in before the Codec change
   
   sure .


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] HDDS-14155. Replace ByteArrayCodec with CodecBufferCodec in OmTableInsightTask [ozone]

2026-01-07 Thread via GitHub


navinko commented on PR #9529:
URL: https://github.com/apache/ozone/pull/9529#issuecomment-3720495253

   Thanks @adoroszlai @swamirishi @szetszwo 
   The suggested changed has been updated and needs your review .


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] HDDS-14155. Replace ByteArrayCodec with CodecBufferCodec in OmTableInsightTask [ozone]

2025-12-29 Thread via GitHub


navinko commented on PR #9529:
URL: https://github.com/apache/ozone/pull/9529#issuecomment-3696664213

   > @navinko Thanks for the patch I have added a comment to remove the 
ByteArrayCodec from the parallel operation as well. Please check and lmk if you 
agree with it.
   
   Thanks @swamirishi


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] HDDS-14155. Replace ByteArrayCodec with CodecBufferCodec in OmTableInsightTask [ozone]

2025-12-29 Thread via GitHub


navinko commented on PR #9529:
URL: https://github.com/apache/ozone/pull/9529#issuecomment-3696659380

   > > ple worker and they do key comparison like key.compareTo(startKey) , 
since strings are comprable.
   > > @navinko
   > 
   > CodecBufferCodec can be also used in parallel function. As long as one 
table iterator is used by only one thread we can use CodecBufferCodec there. We 
can make the value use CodecBufferCodec it would inturn lead to using the 
RDBStoreCodecBufferIterator otherwise currently the parallel iterator would use 
RBDStoreByteArrayIterator which is going to be unoptimal. For this we need to 
get rid of the batching logic which I have seen during my previous benchmark is 
unnecessary since we would not be blocked on IO ever from rocksdb. Let us get 
rid of this valueExecutor logic
   > 
   > 
https://github.com/apache/ozone/blob/ad891ec2e44d191829feeafc2832f0ae594baf1c/hadoop-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/tasks/util/ParallelTableIteratorOperation.java#L83-L85
   
   Thanks @swamirishi for reviewing and providing the suggested changes . I 
have added the changes with new test case .


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] HDDS-14155. Replace ByteArrayCodec with CodecBufferCodec in OmTableInsightTask [ozone]

2025-12-21 Thread via GitHub


swamirishi commented on PR #9529:
URL: https://github.com/apache/ozone/pull/9529#issuecomment-3679663832

   This should ideally work
   ```
   Index: 
hadoop-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/tasks/util/ParallelTableIteratorOperation.java
   IDEA additional info:
   Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
   <+>UTF-8
   ===
   diff --git 
a/hadoop-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/tasks/util/ParallelTableIteratorOperation.java
 
b/hadoop-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/tasks/util/ParallelTableIteratorOperation.java
   --- 
a/hadoop-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/tasks/util/ParallelTableIteratorOperation.java
  (revision 13fd2c6028a63d55f211bff5c3e22355dac9dcdd)
   +++ 
b/hadoop-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/tasks/util/ParallelTableIteratorOperation.java
  (date 1766358462652)
   @@ -56,37 +56,26 @@

  // Thread Pools
  private final ExecutorService iteratorExecutor; // 5
   -  private final ExecutorService valueExecutors; // 20
   -
   -  private final int maxNumberOfVals;
   +  private final long maxNumberOfVals;
  private final OMMetadataManager metadataManager;
  private final int maxIteratorTasks;
   -  private final int maxWorkerTasks;
  private final long logCountThreshold;

  private static final Logger LOG = 
LoggerFactory.getLogger(ParallelTableIteratorOperation.class);

  public ParallelTableIteratorOperation(OMMetadataManager metadataManager, 
Table table, Codec keyCodec,
   -int iteratorCount, int workerCount, 
int maxNumberOfValsInMemory,
   -long logThreshold) {
   +int iteratorCount, long 
logThreshold) {
this.table = table;
this.keyCodec = keyCodec;
this.metadataManager = metadataManager;
this.maxIteratorTasks = 2 * iteratorCount;  // Allow up to 10 pending 
iterator tasks
   -this.maxWorkerTasks = workerCount * 2;  // Allow up to 40 pending 
worker tasks

// Create team of 5 iterator threads with UNLIMITED queue
// LinkedBlockingQueue() with no size = can hold infinite pending tasks
this.iteratorExecutor = new ThreadPoolExecutor(iteratorCount, 
iteratorCount, 1, TimeUnit.MINUTES,
new LinkedBlockingQueue<>());
   -
   -// Create team of 20 worker threads with UNLIMITED queue
   -this.valueExecutors = new ThreadPoolExecutor(workerCount, workerCount, 
1, TimeUnit.MINUTES,
   -new LinkedBlockingQueue<>());
   -
   -// Calculate batch size per worker (e.g., 2000 / 20 = 100 keys per 
batch per worker)
   -this.maxNumberOfVals = Math.max(10, maxNumberOfValsInMemory / 
(workerCount));
this.logCountThreshold = logThreshold;
   +this.maxNumberOfVals = Math.max(1000, this.logCountThreshold / 
(iteratorCount));
  }

  private List getBounds(K startKey, K endKey) throws IOException {
   @@ -166,9 +155,6 @@
// Queue to track iterator threads (5 threads creating work)
Queue> iterFutures = new LinkedList<>();

   -// Queue to track worker threads (20 threads doing work)
   -Queue> workerFutures = new ConcurrentLinkedQueue<>();
   -
AtomicLong keyCounter = new AtomicLong();
AtomicLong prevLogCounter = new AtomicLong();
Object logLock = new Object();
   @@ -190,75 +176,42 @@
  iterFutures.add(iteratorExecutor.submit(() -> {
try (TableIterator> iter  = 
table.iterator()) {
  iter.seek(beg);
   +  int count = 0;
  while (iter.hasNext()) {
   -List> keyValues = new ArrayList<>();
   -boolean reachedEnd = false;
   -while (iter.hasNext()) {
   -  Table.KeyValue kv = iter.next();
   -  K key = kv.getKey();
   -  
   -  // Check if key is within this segment's range
   -  boolean withinBounds;
   -  if (inclusive) {
   -// Last segment: include everything from beg onwards (or 
until endKey if specified)
   -withinBounds = (endKey == null || key.compareTo(endKey) <= 
0);
   -  } else {
   -// Middle segment: include keys in range [beg, end)
   -withinBounds = key.compareTo(end) < 0;
   -  }
   -  
   -  if (withinBounds) {
   -keyValues.add(kv);
   -  } else {
   -reachedEnd = true;
   -break;
   -  }
   -
   -  // If batch is full (2000 keys), stop collecting
   -  if (keyValues.size() >= maxNumberOfVals) {
   -break;
   -  }
   +Table.KeyValue kv = iter.next();
 

Re: [PR] HDDS-14155. Replace ByteArrayCodec with CodecBufferCodec in OmTableInsightTask [ozone]

2025-12-21 Thread via GitHub


swamirishi commented on PR #9529:
URL: https://github.com/apache/ozone/pull/9529#issuecomment-3679632733

   > ple worker and they do key comparison like key.compareTo(startKey) , since 
strings are comprable.
   @navinko 
   CodecBufferCodec can be also used in parallel function. As long as one table 
iterator is used by only one thread we can use CodecBufferCodec there. We can 
make the value use CodecBufferCodec it would inturn lead to using the 
RDBStoreCodecBufferIterator otherwise currently the parallel iterator would use 
RBDStoreByteArrayIterator which is going to be unoptimal. For this we need to 
get rid of the batching logic which I have seen during my previous benchmark is 
unnecessary since we would not be blocked on IO ever from rocksdb. Let us get 
rid of this valueExecutor logic
   
https://github.com/apache/ozone/blob/ad891ec2e44d191829feeafc2832f0ae594baf1c/hadoop-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/tasks/util/ParallelTableIteratorOperation.java#L83-L85


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] HDDS-14155. Replace ByteArrayCodec with CodecBufferCodec in OmTableInsightTask [ozone]

2025-12-21 Thread via GitHub


szetszwo commented on PR #9529:
URL: https://github.com/apache/ozone/pull/9529#issuecomment-3679474315

   @swamirishi , it would be great if you could review this.  Thank you in 
advance.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] HDDS-14155. Replace ByteArrayCodec with CodecBufferCodec in OmTableInsightTask [ozone]

2025-12-20 Thread via GitHub


navinko commented on PR #9529:
URL: https://github.com/apache/ozone/pull/9529#issuecomment-3677757682

   > > The CodecBufferCodec implementation is not ThreadSafe implementation 
hence we can not used it for processTableInParallel
   > 
   > Is the `ByteArrayCodec` thread-safe?
   
   Yes 'ByteArrayCodec' is thread safe  . 
   
   My bad , I am not very sure on other implementation details for 
CodecBufferCodec but looks like the Codec implementation CodecBufferCodec reuse 
CodecBuffer instances so once CodecBufferCodec will be used as values across 
multiple thread the CodeBuffer can be modified by other thread in 
ParallelTableIteratorOperation. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] HDDS-14155. Replace ByteArrayCodec with CodecBufferCodec in OmTableInsightTask [ozone]

2025-12-19 Thread via GitHub


adoroszlai commented on PR #9529:
URL: https://github.com/apache/ozone/pull/9529#issuecomment-3676634757

   > The CodecBufferCodec implementation is not ThreadSafe implementation hence 
we can not used it for processTableInParallel
   
   Is the `ByteArrayCodec` thread-safe?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] HDDS-14155. Replace ByteArrayCodec with CodecBufferCodec in OmTableInsightTask [ozone]

2025-12-19 Thread via GitHub


navinko commented on PR #9529:
URL: https://github.com/apache/ozone/pull/9529#issuecomment-3676544224

   > ```java
   > try (ParallelTableIteratorOperation parallelIter = 
new ParallelTableIteratorOperation<>(
   > ```
   Please correct me if my understanding is wrong .
   The CodecBufferCodec implementation is not ThreadSafe implementation hence 
we can not used it for processTableInParallel .
   
https://github.com/navinko/ozone/blob/1c817b0460f1e66e75aef0b8f5fde63c691f2a65/hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/utils/db/CodecBufferCodec.java#L34
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] HDDS-14155. Replace ByteArrayCodec with CodecBufferCodec in OmTableInsightTask [ozone]

2025-12-19 Thread via GitHub


adoroszlai commented on PR #9529:
URL: https://github.com/apache/ozone/pull/9529#issuecomment-3675966979

   > String as Key are comparable but CodecBuffer not. 
   
   I meant replacing `ByteArrayCodec`, not `StringCodec`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] HDDS-14155. Replace ByteArrayCodec with CodecBufferCodec in OmTableInsightTask [ozone]

2025-12-19 Thread via GitHub


navinko commented on PR #9529:
URL: https://github.com/apache/ozone/pull/9529#issuecomment-3675877806

   Thanks @adoroszlai for reviewing 
   
   > Can it be replaced in `processTableInParallel`, too?
   
   I see the processTableInParallel performs many background operation to make 
it work in parallel which requires key to be Comparable.
   
https://github.com/navinko/ozone/blob/1c817b0460f1e66e75aef0b8f5fde63c691f2a65/hadoop-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/tasks/util/ParallelTableIteratorOperation.java#L128
   
   String as Key are comparable but CodeBuffer not.
   
https://github.com/navinko/ozone/blob/1c817b0460f1e66e75aef0b8f5fde63c691f2a65/hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/utils/db/CodecBuffer.java#L53
   
   --> processTableSequentially 
   Works for non-string keys , the keys are just byte array some binary data 
and with TableIterator we are again iterating over these keys , no operations 
being performed like comparison of two binary row data , there is no range 
splitting. Hence CodecBuffer works , it is not implementing Comparable.
   
   Table table = omMetadataManager.getStore()
   .getTable(tableName, ByteArrayCodec.get(), ByteArrayCodec.get(), 
TableCache.CacheType.NO_CACHE);
   
   --> processTableInParallel  , The keys are string which are comparable  and 
under ParallelTableIteratorOperation.java we see there are operations like 
splitting the ranges for multiple worker and they do key comparison like 
key.compareTo(startKey) , since strings are comprable.
 
 Table table = omMetadataManager.getStore()
   .getTable(tableName, StringCodec.get(), ByteArrayCodec.get(), 
TableCache.CacheType.NO_CACHE);
  
   This seems to be the reason there is separate flow for sequential vs 
parallel processing and CodecBuffer should not be used existing implementation 
of processTableSequentially.   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]