Hi Jacob, I've been facing a somewhat similar issue recently. A few options come to mind:
- Try the BulkDeleteEndpoint coprocessor in hbase-examples:
  https://github.com/apache/hbase/blob/master/hbase-examples/src/main/java/org/apache/hadoop/hbase/coprocessor/example/BulkDeleteEndpoint.java
- Write a MapReduce job to execute the deletes in a distributed fashion.
- I created a modified version of CopyTable that accepts a ColumnValueFilter to filter the rows copied from a snapshot to a new table (bulkload mode). I haven't open-sourced it yet, but it could be useful to others.

Regards,
Andor

On Wed, 2025-01-08 at 18:56 +0530, Joice Jacob wrote:
> Hello HBase Community,
>
> We are facing a requirement for bulk deletion in HBase using full row
> keys. Our table contains a large amount of data, and since we are
> unsure of the content within the table, we cannot pass the complete
> row keys directly. Typically, in our scripts, each row is deleted
> individually, which results in significant processing time.
>
> Additionally, we are using TTL (Time-to-Live) for some of the data in
> the table. However, due to the size of the tables, it is not feasible
> for us to maintain a list of all row keys. While normal deletion
> through sqlline (e.g., using WHERE offerid=value) works, it impacts
> query performance and causes delays.
>
> Could anyone suggest a workaround, an alternative approach, or best
> practices for efficiently handling bulk deletions in HBase?
>
> Thank you
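P.S. If you do end up deleting client-side, batching the Delete mutations already helps a lot compared to one RPC per row. Below is a minimal sketch of that idea; the partitioning helper is plain Java, and the commented section shows how the batches would be flushed with the standard HBase 2.x client API (Connection, Table, Delete). The table name and batch size are illustrative, not from your setup:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: group full row keys into batches so each Table.delete(List<Delete>)
// call covers many rows in one RPC instead of one RPC per row.
public class BatchedDeletes {

    // Split the row keys into consecutive chunks of at most batchSize.
    static List<List<byte[]>> partition(List<byte[]> rowKeys, int batchSize) {
        List<List<byte[]>> batches = new ArrayList<>();
        for (int i = 0; i < rowKeys.size(); i += batchSize) {
            batches.add(new ArrayList<>(
                rowKeys.subList(i, Math.min(i + batchSize, rowKeys.size()))));
        }
        return batches;
    }

    /*
     * Against a live cluster the batches would be flushed like this
     * (standard HBase client API; "my_table" is a placeholder):
     *
     *   try (Connection conn = ConnectionFactory.createConnection(conf);
     *        Table table = conn.getTable(TableName.valueOf("my_table"))) {
     *       for (List<byte[]> batch : partition(rowKeys, 1000)) {
     *           List<Delete> deletes = new ArrayList<>();
     *           for (byte[] key : batch) {
     *               deletes.add(new Delete(key));
     *           }
     *           table.delete(deletes);   // one RPC per batch
     *       }
     *   }
     */
}
```

Note this still moves every row key through the client; the coprocessor or MapReduce options above avoid that round trip entirely, which matters most at your table sizes.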