On Wed, Oct 13, 2010 at 9:25 AM, Jeff Zhang <[email protected]> wrote:
> Hi all,
>
> Since HBase has bulk import, could hbase delete a whole region ?
> Currently I have to do a scan operation then get the row id and invoke
> delete operation for each row id, this inefficiency. And internally,
> one region is just some hdfs files, so invoke some hdfs file deletion
> is more efficiency.
>
> My initial idea is that before deletion of region, the region should
> first be frozen (flush MemStore to disk, and inhibit any put operation
> into this region). Then invoke a delete operation on the region, only
> some hdfs file operation is needed. Not sure whether this is possible
> and on the roadmap of hbase ?
>

This could be a useful feature. I think you could script it easily enough in TRUNK (I think you need TRUNK because there you can ask whether a region is closed, which matters since there is currently no synchronous close of regions).

1. Close the region (see the shell for how to send a close message, or look at the HBaseAdmin API doc).
2. While it's closing, you may have to disable the region in .META. (see the bin/*.rb scripts for how to mangle .META.). IIRC this may not be necessary in TRUNK (in 0.20.x it is necessary, to prevent the region being opened elsewhere when the regionserver reports a successful close).
3. Check the regionserver periodically for the closing region. When it's no longer mentioned in the online regions, you know it's closed.
4. Close the region that falls just after the one you just closed (same offlining trick as above). Fix up .META. to extend the key scope of this region so that it covers the region just closed, by making its startkey that of the region just closed.
5. Re-enable the region (in 0.20.x this would mean flipping the region to be enabled again; in TRUNK you might have to explicitly open it on a regionserver, I would have to check).

If you want to work on the above, open a JIRA and I'll help you out.

St.Ack
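To make steps 3 and 4 concrete, here is a rough sketch in plain Python. It does not touch the HBase API: `Region` is a toy stand-in for a .META. row, and `list_online_regions` is a hypothetical callable you would have to back with whatever the regionserver actually exposes. It only illustrates the polling loop and the startkey fixup, under the assumption that regions are kept sorted by start key.

```python
import time
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Region:
    """Toy stand-in for a .META. row: the region covers [start_key, end_key)."""
    name: str
    start_key: bytes
    end_key: bytes  # b"" is used here to mean "no bound"


def wait_until_closed(region_name, list_online_regions,
                      poll_seconds=1.0, max_polls=60):
    """Step 3: poll until the region is no longer listed as online.

    list_online_regions is a hypothetical callable returning the names of
    the regions currently online. Returns False if we give up.
    """
    for _ in range(max_polls):
        if region_name not in list_online_regions():
            return True
        time.sleep(poll_seconds)
    return False


def drop_region(regions, doomed_name):
    """Step 4: remove one region and extend its successor over the gap.

    regions must be sorted by start_key. Returns a new list; the input
    list is left unchanged.
    """
    idx = next(i for i, r in enumerate(regions) if r.name == doomed_name)
    doomed = regions[idx]
    remaining = regions[:idx] + regions[idx + 1:]
    if idx >= len(remaining):
        # The steps above only describe extending the *following* region,
        # so the last-region case is deliberately left unhandled here.
        raise ValueError("no following region to extend")
    # The successor now sits at idx; give it the closed region's startkey.
    remaining[idx] = replace(remaining[idx], start_key=doomed.start_key)
    return remaining
```

For example, with regions `a` = [b"", b"g"), `b` = [b"g", b"p") and `c` = [b"p", b""), dropping `b` leaves `a` untouched and turns `c` into [b"g", b""). In a real script the same change would be written back to .META. while both regions are offline.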
