Hi,

Continuing to test HBase's suitability for a high-ingest-rate environment, I've run into a new stumbling block, likely due to my inexperience with HBase.
We want to keep and purge records on a time basis: i.e., when a record is older than, say, 24 hours, we want to purge it from the database.

The problem I am encountering is that the only way I've found to delete records, using an arbitrary row id that is strongly ordered over time, is to scan for rows from a lower bound to an upper bound, then build an array of Delete objects: for each Result in the ResultScanner, add new Delete(Result.getRow()) to the array. This method is far too slow to keep up with our ingest rate; iterating over the Results in the ResultScanner is the bottleneck, even though the Scan is limited to a single small column in the column family.

The obvious but naive solution is to use a sequential row id, so that the lower and upper bounds are known. This would allow building the array of Delete objects without a scan step. The problem with this approach is how to guarantee a sequential, non-colliding row id across more than one Put'ing process, and to do so efficiently. As it happens, I can do this, but given the details of my operational requirements, it's not a simple thing to do.

So I was hoping that I had just missed something. The ideal would be a Delete object that takes row id bounds the same way Scan does, so that all the work is done on the server side. Does this exist somewhere? Or is there some other way to skin this cat?

Thanks,
Thomas Downing
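
P.S. To make the sequential-id variant concrete, here is a rough sketch in plain Java of building the row keys for a known range without a scan step. It is only an illustration under assumptions: row ids are non-negative longs encoded as 8 big-endian bytes, and the names SequentialRowKeys, rowKey and keysInRange are made up for this example. The HBase calls appear only in comments.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: not part of HBase, names chosen for illustration only.
class SequentialRowKeys {

    // Encode the id as 8 big-endian bytes. For non-negative ids, unsigned
    // lexicographic byte order (the order HBase sorts row keys in) then
    // matches numeric order.
    public static byte[] rowKey(long id) {
        return ByteBuffer.allocate(Long.BYTES).putLong(id).array();
    }

    // Build every row key in [lowerInclusive, upperExclusive) directly from
    // the known bounds, with no round trip to the region servers.
    public static List<byte[]> keysInRange(long lowerInclusive, long upperExclusive) {
        List<byte[]> keys = new ArrayList<>();
        for (long id = lowerInclusive; id < upperExclusive; id++) {
            keys.add(rowKey(id));
        }
        return keys;
    }

    public static void main(String[] args) {
        List<byte[]> keys = keysInRange(1000L, 1005L);
        // With the HBase client on the classpath, each key would be wrapped
        // as new Delete(key) and the resulting list handed to the table's
        // batch delete call.
        System.out.println(keys.size() + " row keys built without a scan");
    }
}
```

This removes the scan, but it still sends one Delete per row from the client, and it does nothing about the hard part, which is allocating the ids without collisions across several writers.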