I learned Friday that our fellas on the frontend are using an HBase table to do simple queuing. They insert items to be processed by distributed processes, and when a process is done with the work, it removes the processed element from the table. They are queuing, processing, and removing millions of items a day. Elements are added at the end of the queue (FIFO).
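As a rough illustration of the pattern (not the frontend team's actual code), here is a minimal in-memory sketch of a FIFO queue over a key-sorted table. It assumes a time-based row key. HBase orders rows lexicographically by key bytes, so the sketch zero-pads the timestamp to a fixed width; otherwise "999" would sort after "1000". The names `make_row_key` and `ToyQueueTable` and the key layout are made up for the example.

```python
import itertools
import time

# Process-wide counter used to break ties between items added in the same millisecond.
_seq = itertools.count()


def make_row_key(now_ms=None):
    """Build a fixed-width, time-ordered key: 13-digit millis, dash, 6-digit sequence.

    Hypothetical layout for illustration. The sequence wraps at one million, so
    ordering within a single millisecond is only preserved below that rate.
    """
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    return "%013d-%06d" % (now_ms, next(_seq) % 1_000_000)


class ToyQueueTable:
    """In-memory stand-in for a table whose rows are read back in key order."""

    def __init__(self):
        self.rows = {}

    def enqueue(self, payload, now_ms=None):
        key = make_row_key(now_ms)
        self.rows[key] = payload
        return key

    def peek_oldest(self, limit=1):
        # Same idea as a scan from the start of the table: smallest keys first.
        return [(k, self.rows[k]) for k in sorted(self.rows)[:limit]]

    def remove(self, key):
        # Called once the item has been processed.
        self.rows.pop(key, None)
```

A worker would call `peek_oldest()`, process what it gets, then `remove()` the key. In a real table the remove is a delete marker that lingers until a major compaction, which the toy does not model.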
The issue to avoid: over time, especially when it had been a while between major compactions, latency was going up. Turns out the table had been splitting when the queue backed up. A scan for new stuff to process then had to first traverse regions that had nought in them (the key was time-based, and the tail of the table had moved on past these first regions). This traversal, especially with no major compaction and so lots of deletes to process, was taking time to get to the first row.

To fix, we rid the table of its empty regions and made it so the table would no longer split, so there is only ever one region in it. This should make it so we don't end up with empty regions to skip through before we get to the first element in the table (we still need major compaction running on a somewhat regular basis to temper latencies). Will report back to the list if we find otherwise.

Do not use locks. They don't scale. Maybe update a cell when a task is taken out for processing. If too much time elapses since the last update, maybe give it out again?

St.Ack

On Sat, Jul 16, 2011 at 9:38 AM, Jack Levin <[email protected]> wrote:
> Hello, we are thinking about using Hbase table as a simple queue which
> will dispatch the work for a mapreduce job, as well as real time
> fetching of data to present to end user. In simple terms, suppose you
> had a data source table and a queue table. The queue table has a
> smaller set of Rows that point to Values which in turn point to
> Perma-set table, which has large collection of Rows. (so Queue{Row,
> Value} -> Perma-Set {Row, Value}). Or Q-Value -> P-Row. Our Goal is
> to look up which Rows to retrieve from the Perma-Set table by looking
> through the Queue. Once the lookup into the Queue is done, the Row
> from the Queue must be deleted to avoid the same process of Perma-Set
> lookup being done twice; We expect many concurrent lookups to happen, so
> I assume the first thing we need to do is to have the client that does
> the work acquire a lock on the Queue Row, process the work, then
> Remove the Queue Row.
>
> Has anyone done something similar before? Any gotchas we should be aware of?
>
> Thanks.
>
> -Jack
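
On the "update a cell instead of locking" idea, a minimal in-memory sketch of what I have in mind follows. It is a toy, not a tested HBase implementation: `LeaseQueue`, `try_claim`, and the 60-second default are made up for illustration. The `threading.Lock` here only makes the toy's check-then-write step atomic; against a real table, an atomic conditional write (for example the client's `checkAndPut`) would play that role. That mapping is an assumption on my part.

```python
import threading
import time


class LeaseQueue:
    """Toy model: each queue row carries a 'claimed at' timestamp instead of a lock."""

    def __init__(self, lease_seconds=60.0):
        self.lease_seconds = lease_seconds
        self._claimed_at = {}  # row key -> time of last claim, or None if never claimed
        # Stands in for the store's atomic check-and-write on a single row.
        self._mutex = threading.Lock()

    def add(self, key):
        with self._mutex:
            self._claimed_at[key] = None

    def try_claim(self, key, now=None):
        """Return True if the caller now owns the task, False otherwise."""
        now = time.time() if now is None else now
        with self._mutex:
            if key not in self._claimed_at:
                return False  # already finished and removed
            last = self._claimed_at[key]
            if last is not None and now - last < self.lease_seconds:
                return False  # someone else took it recently
            # Unclaimed, or the previous claim went stale: hand it out (again).
            self._claimed_at[key] = now
            return True

    def finish(self, key):
        """Remove the row once the work is done."""
        with self._mutex:
            self._claimed_at.pop(key, None)
```

A worker that dies mid-task simply never calls `finish`, and the row is handed out again once the lease runs out. The price is that a slow worker's task can be processed twice, so the work should be safe to repeat.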
