I'm not sure how they are doing this, but just a quick thought...

You can increase MAX_FILESIZE to 1-2GB, for example, and then run major 
compactions on a regular basis to clean up rows deleted from the queue.
This will stop the table from splitting.

The assumption is that your MAX_FILESIZE is much larger than your anticipated 
queue size.
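
For example, something along these lines with the Java client (the table 
name "queue" and the 2GB figure are just placeholders; setMaxFileSize and 
majorCompact are the relevant HTableDescriptor/HBaseAdmin calls):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.HTableDescriptor;
  import org.apache.hadoop.hbase.client.HBaseAdmin;

  public class TuneQueueTable {
    public static void main(String[] args) throws Exception {
      Configuration conf = HBaseConfiguration.create();
      HBaseAdmin admin = new HBaseAdmin(conf);

      // Raise MAX_FILESIZE well above the expected queue size so the
      // single region never crosses the split threshold.
      HTableDescriptor desc = admin.getTableDescriptor("queue".getBytes());
      desc.setMaxFileSize(2L * 1024 * 1024 * 1024); // e.g. 2GB
      admin.disableTable("queue");
      admin.modifyTable("queue".getBytes(), desc);
      admin.enableTable("queue");

      // Kick a major compaction to clear the delete tombstones; run
      // this from cron (or similar) on a regular schedule.
      admin.majorCompact("queue");
    }
  }

You can also set MAX_FILESIZE from the shell's alter command if you'd 
rather not do it in code.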

HTH

-Mike

> Date: Tue, 19 Jul 2011 11:26:47 -0400
> From: deinspan...@mozilla.com
> To: user@hbase.apache.org
> Subject: Re: hbase table as a queue.
> 
> We use a queue table like this too and ran into the same problem.  How 
> did you configure it such that it never splits?
> 
> -Daniel
> 
> On 7/16/11 4:24 PM, Stack wrote:
> > I learned Friday that our fellas on the frontend are using an hbase
> > table to do simple queuing.  They insert stuff to be processed by
> > distributed processes, and when the processes are done with the work,
> > they'll remove the processed element from the hbase table.  They are
> > queuing, processing, and removing millions of items a day.  Elements
> > were added at the end of the queue (FIFO).
> >
> > The issue to avoid was that, especially if a while had passed between
> > major compactions, the latency was going up over time.  Turns out, the
> > table had been splitting when the queue backed up.  A scan for new
> > stuff to process then had to first traverse regions that had nought in
> > them (the key was time-based, and the tail of the table had moved on
> > past these first regions).  This traversal, especially if no major
> > compaction had run recently and there were lots of deletes to process,
> > was taking time to get to the first row.
> >
> > To fix it, we rid the table of its empty regions and made it so the
> > table would no longer split, so there is only ever one region in it.
> > This should mean we don't end up with empty regions to skip through
> > before we get to the first element in the table (we need major
> > compaction running on a somewhat regular basis to temper latencies).
> > Will report back to the list if we find otherwise.
> >
> > Do not use locks; they don't scale.  Maybe update a cell when a task
> > is taken out for processing, and if too much time elapses since the
> > last update, give it out again?
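> >
> > One way to make that claim atomic is checkAndPut -- rough sketch only,
> > the family/qualifier names and workerId format below are made up:
> >
> >   import java.io.IOException;
> >   import org.apache.hadoop.hbase.client.Delete;
> >   import org.apache.hadoop.hbase.client.HTable;
> >   import org.apache.hadoop.hbase.client.Put;
> >   import org.apache.hadoop.hbase.util.Bytes;
> >
> >   public class TaskClaim {
> >     static final byte[] FAM = Bytes.toBytes("f");
> >     static final byte[] OWNER = Bytes.toBytes("owner");
> >
> >     // Returns true only for the one worker whose put wins; the owner
> >     // cell must still be empty (null) for the claim to succeed.
> >     public static boolean claim(HTable queue, byte[] taskRow, String workerId)
> >         throws IOException {
> >       Put p = new Put(taskRow);
> >       p.add(FAM, OWNER,
> >           Bytes.toBytes(workerId + "@" + System.currentTimeMillis()));
> >       return queue.checkAndPut(taskRow, FAM, OWNER, null, p);
> >     }
> >
> >     // The winner processes the task, then removes the row.
> >     public static void done(HTable queue, byte[] taskRow) throws IOException {
> >       queue.delete(new Delete(taskRow));
> >     }
> >   }
> >
> > A periodic scan can then re-issue any row whose owner timestamp is
> > older than the lease interval.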
> >
> > St.Ack
> >
> > On Sat, Jul 16, 2011 at 9:38 AM, Jack Levin<magn...@gmail.com>  wrote:
> >> Hello, we are thinking about using an HBase table as a simple queue
> >> which will dispatch the work for a mapreduce job, as well as real-time
> >> fetching of data to present to the end user.  In simple terms, suppose
> >> you had a data source table and a queue table.  The queue table has a
> >> smaller set of Rows that point to Values, which in turn point to the
> >> Perma-Set table, which has a large collection of Rows.  (So Queue {Row,
> >> Value} ->  Perma-Set {Row, Value}, or Q-Value ->  P-Row.)  Our goal is
> >> to look up which Rows to retrieve from the Perma-Set table by looking
> >> through the Queue.  Once the lookup into the Queue is done, the Row
> >> from the Queue must be deleted to avoid the same Perma-Set lookup
> >> being done twice.  We expect many concurrent lookups to happen, so
> >> I assume the first thing we need is for the client that does the
> >> work to acquire a lock on the Queue Row, process the work, then
> >> remove the Queue Row.
> >>
> >> Has anyone done something similar before?  Any gotchas we should be
> >> aware of?
> >>
> >> Thanks.
> >>
> >> -Jack
> >>