I'm not sure how they are doing this, but just a quick thought... You can increase the max file size to 1-2 GB, as an example, and then run major compactions on a regular basis to clean up rows deleted from the queue. This will stop the table from splitting.
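A rough sketch of the above in the HBase shell (the table name 'queue_table' is made up, and 2147483648 bytes, ~2 GB, is purely an illustrative threshold; older shells may require disabling the table before altering it):

```
hbase> alter 'queue_table', METHOD => 'table_att', MAX_FILESIZE => '2147483648'
hbase> major_compact 'queue_table'    # run periodically, e.g. from cron
```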
The assumption is that your MAX_FILESIZE is much larger than your anticipated queue size.

HTH
-Mike

> Date: Tue, 19 Jul 2011 11:26:47 -0400
> From: deinspan...@mozilla.com
> To: user@hbase.apache.org
> Subject: Re: hbase table as a queue.
>
> We use a queue table like this too and ran into the same problem. How
> did you configure it such that it never splits?
>
> -Daniel
>
> On 7/16/11 4:24 PM, Stack wrote:
> > I learned Friday that our fellas on the frontend are using an HBase
> > table to do simple queuing. They insert stuff to be processed by
> > distributed processes, and when processes are done with the work,
> > they'll remove the processed element from the HBase table. They are
> > queuing, processing, and removing millions of items a day. Elements
> > were added at the end of the queue (FIFO).
> >
> > The issue to avoid was that over time, especially if a while between
> > major compactions, latency was going up. It turns out the table had
> > been splitting when the queue backed up. A scan for new stuff to
> > process then had to first traverse regions that had nought in them
> > (the key was time-based, and the tail of the table had moved on past
> > these first regions). This traversal, especially with no major
> > compaction so lots of deletes to process, was taking time to get to
> > the first row.
> >
> > To fix it, we rid the table of its empty regions and made it so the
> > table would no longer split, so there is only ever one region in it.
> > This should make it so we don't end up with empty regions to skip
> > through before we get to the first element in the table (we need the
> > major compaction running on a somewhat regular basis to temper
> > latencies). Will report back to the list if we find otherwise.
> >
> > Do not use locks. They don't scale. Maybe update a cell when a task
> > is taken out for processing. If too much time elapses since the last
> > update, maybe give it out again?
> >
> > St.Ack
> >
> > On Sat, Jul 16, 2011 at 9:38 AM, Jack Levin <magn...@gmail.com> wrote:
> >> Hello, we are thinking about using an HBase table as a simple queue
> >> which will dispatch the work for a mapreduce job, as well as
> >> real-time fetching of data to present to the end user. In simple
> >> terms, suppose you had a data source table and a queue table. The
> >> queue table has a smaller set of rows that point to values which in
> >> turn point to a Perma-Set table, which has a large collection of
> >> rows. (So Queue {Row, Value} -> Perma-Set {Row, Value}, or Q-Value
> >> -> P-Row.) Our goal is to look up which rows to retrieve from the
> >> Perma-Set table by looking through the queue. Once the lookup into
> >> the queue is done, the row from the queue must be deleted to avoid
> >> the same Perma-Set lookup being done twice. We expect many
> >> concurrent lookups to happen, so I assume the first thing we need to
> >> do is have the client that does the work acquire a lock on the queue
> >> row, process the work, then remove the queue row.
> >>
> >> Has anyone done something similar before? Any gotchas we should be
> >> aware of?
> >>
> >> Thanks.
> >>
> >> -Jack
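Stack's lock-free suggestion (stamp a cell when a task is handed out, and re-issue the task if the lease goes stale) can be sketched as follows. This is a minimal in-memory Python simulation, not HBase client code: the dict stands in for the queue table, and the marked claim step is where an atomic check-and-put on the lease cell would go against HBase. All names (try_claim, LEASE_SECONDS, etc.) are made up for illustration.

```python
import time

# In-memory stand-in for the queue table: row key -> {"payload", "lease_ts"}.
# In HBase this would be one table; the claim step below would be an atomic
# check-and-put on the lease cell so two workers can't both take the task.
queue = {}
LEASE_SECONDS = 300  # hypothetical lease duration before a task is re-issued

def enqueue(row_key, payload):
    """Add a task with no lease stamp (unclaimed)."""
    queue[row_key] = {"payload": payload, "lease_ts": None}

def try_claim(row_key, now=None):
    """Claim a task by stamping a lease timestamp, but only if the task is
    unclaimed or its previous lease has expired (e.g. a worker crashed)."""
    now = time.time() if now is None else now
    entry = queue.get(row_key)
    if entry is None:
        return None  # already processed and deleted
    ts = entry["lease_ts"]
    if ts is None or now - ts > LEASE_SECONDS:
        entry["lease_ts"] = now  # in HBase: atomic check-and-put here
        return entry["payload"]
    return None  # another worker holds a live lease

def complete(row_key):
    """Work finished: remove the row, as the frontend processes do."""
    queue.pop(row_key, None)  # in HBase: a Delete on the row
```

A worker loop would call try_claim, process the payload on success, then call complete; a task whose worker dies is simply claimed again once its lease ages out.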