Hi Jacob,

I've been facing a somewhat similar issue recently. The following
options come to mind:

- Try the BulkDeleteEndpoint coprocessor in hbase-examples:
https://github.com/apache/hbase/blob/master/hbase-examples/src/main/java/org/apache/hadoop/hbase/coprocessor/example/BulkDeleteEndpoint.java

- Write your own MapReduce job to execute the deletes in a distributed
fashion.

- I created a modified version of CopyTable that accepts a
ColumnValueFilter to filter the rows copied from a snapshot to a new
table (bulkload mode). I haven't open-sourced it yet, but it could be
useful for others.
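If MapReduce feels like overkill, even a plain client-side pass that scans
row keys only (KeyOnlyFilter + FirstKeyOnlyFilter) and issues Deletes in
batches is already much faster than deleting one row per call. A rough
sketch below; the table name and batch size are placeholders, and the
actual HBase client calls are shown as a comment so the batching helper
stays self-contained:

```java
import java.util.ArrayList;
import java.util.List;

// Rough sketch of a client-side bulk delete: scan row keys only, then
// issue Deletes in batches instead of one at a time. The HBase calls are
// shown in the comment below; the batching helper itself is plain Java.
public class BulkDeleteSketch {

    /** Split a list of row keys into batches of at most batchSize. */
    static List<List<byte[]>> partition(List<byte[]> rowKeys, int batchSize) {
        List<List<byte[]>> batches = new ArrayList<>();
        for (int i = 0; i < rowKeys.size(); i += batchSize) {
            int end = Math.min(i + batchSize, rowKeys.size());
            batches.add(new ArrayList<>(rowKeys.subList(i, end)));
        }
        return batches;
    }

    /*
     * With hbase-client on the classpath, the delete loop looks roughly
     * like this ("mytable" and the batch size of 1000 are placeholders):
     *
     *   Scan scan = new Scan();
     *   scan.setFilter(new FilterList(
     *       new FirstKeyOnlyFilter(), new KeyOnlyFilter()));
     *   try (Table table = connection.getTable(TableName.valueOf("mytable"));
     *        ResultScanner scanner = table.getScanner(scan)) {
     *       List<Delete> batch = new ArrayList<>();
     *       for (Result result : scanner) {
     *           batch.add(new Delete(result.getRow()));
     *           if (batch.size() >= 1000) {
     *               table.delete(batch);   // one batched RPC round
     *               batch.clear();
     *           }
     *       }
     *       if (!batch.isEmpty()) {
     *           table.delete(batch);
     *       }
     *   }
     */
}
```

Batching via Table.delete(List&lt;Delete&gt;) saves the per-row RPC
round-trip, which is usually where the time goes with script-driven
single-row deletes.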

Regards,
Andor



On Wed, 2025-01-08 at 18:56 +0530, Joice Jacob wrote:
> Hello HBase Community,
> 
> 
> We have a requirement for bulk deletion in HBase using full row keys.
> Our table contains a large amount of data, and since we are unsure of
> the content within the table, we cannot pass the complete row keys
> directly. Typically, our scripts delete each row individually, which
> results in significant processing time.
> 
> 
> Additionally, we use TTL (time-to-live) for some of the data in the
> table. However, given the size of the tables, it is not feasible for
> us to maintain a list of all row keys. While normal deletion through
> sqlline (e.g., using WHERE offerid=value) works, it is impacting
> query performance and causing delays.
> 
> 
> Could anyone suggest a workaround, an alternative approach, or best
> practices for efficiently handling bulk deletions in HBase?
> 
> 
> Thank you
