Yes, you do have to worry about efficiency. Unless the rowkeys order your rows 
by update date, the server has to scan the entire table. Your filter keeps it 
from sending all of those rows to the client, but it still has to read them 
from disk and merge them with the rows in memory. For a big table that will 
likely not even be feasible (and if it's not a *big* table, it probably 
shouldn't be in HBase).

The fundamental thing to note here is that there's no "magic": HBase stores 
records sorted in exactly one order, by rowkey. If what you want can't be 
found efficiently using that ordering, you'll be scanning the whole table. 
Relational DBs do that too, but they also have indexes that let you get at 
things quickly in some other sort order.
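
To make that concrete, here is a rough sketch of the reverse-timestamp rowkey 
idea from the book section Doug linked. It is plain Java with no HBase client 
calls: a TreeMap sorted by unsigned byte order stands in for the table, and 
the names rowKey, newTable and mostRecent are made up for illustration. The 
id suffix on the key is also an assumption, added only to keep keys unique.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ReverseTimestampKeys {

    // Hypothetical key layout: 8-byte big-endian (Long.MAX_VALUE - updateMillis),
    // followed by an id suffix so two rows with the same timestamp don't collide.
    static byte[] rowKey(long updateMillis, String id) {
        byte[] suffix = id.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(Long.BYTES + suffix.length)
                .putLong(Long.MAX_VALUE - updateMillis)
                .put(suffix)
                .array();
    }

    // Stand-in for a table: rows kept sorted by unsigned lexicographic byte
    // order of the key, which is how HBase orders rowkeys.
    static TreeMap<byte[], String> newTable() {
        Comparator<byte[]> unsignedLex = Arrays::compareUnsigned;
        return new TreeMap<>(unsignedLex);
    }

    // Reads rows in key order and stops as soon as n have been collected,
    // so only the first n rows are ever touched.
    static List<String> mostRecent(TreeMap<byte[], String> table, int n) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<byte[], String> e : table.entrySet()) {
            if (out.size() >= n) {
                break;
            }
            out.add(e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        TreeMap<byte[], String> table = newTable();
        table.put(rowKey(1000L, "a"), "oldest");
        table.put(rowKey(3000L, "c"), "newest");
        table.put(rowKey(2000L, "b"), "middle");
        System.out.println(mostRecent(table, 2)); // prints [newest, middle]
    }
}
```

With keys built this way the newest rows sort first, so "most recent N" 
becomes "first N rows of the scan" and nothing past them has to be read. 
Against a real table the same idea would be a Scan from the start of the 
table that you stop iterating after N results.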

Ian

On Mar 2, 2012, at 3:42 PM, Peter Wolf wrote:


Ah ha!  So the row key orders the results, and I just do an unbounded Scan
and stop after N iterations.

Like this...

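       Scan scan = new Scan();
       Filter filter = new SingleColumnValueFilter(...);
       scan.setFilter(filter);
       ResultScanner scanner = hTable.getScanner(scan);
       try {
           Iterator<Result> it = scanner.iterator();
           // Stop after 1000 results, or earlier if the scan runs out.
           for (int i = 0; i < 1000 && it.hasNext(); i++) {
               Result result = it.next();
               // ... do stuff with result...
           }
       } finally {
           scanner.close();  // release the scanner's server-side resources
       }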

Do I have to worry about efficiency?  Is the Server madly retrieving
rows, in the background, that the Client will never use?

Thanks
P



On 3/2/12 4:31 PM, Doug Meil wrote:
Hi there-

Take a look at this section of the book...

http://hbase.apache.org/book.html#reverse.timestamp




On 3/2/12 4:02 PM, "Peter Wolf" wrote:

Hello all,

I want to retrieve the most recent N rows from a table, with some column
qualifiers.

I can't find a Filter, or anything obvious in my books, or via Google.

What is the idiom for doing this?

Thanks
Peter



