Assuming your rowkey doesn't somehow encode the time that row was
created (in which case you can simply do a scan), things get a bit
more interesting.

The 'easiest' approach is probably to Scan, but use a custom filter
that only allows in 'recent' rows based on their timestamp (see the
TimestampsFilter for an example of how to do this - it isn't exactly
what you need, but should show you how) so that you expect at least N
rows to match. Then, if your scan matched at least N rows, you can
sort and take the top N client-side. If your scan retrieved fewer than
N rows, you'll have to go back and do another scan with a different
timestamp filter and aggregate/sort the results from the multiple
scans.
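A rough sketch of that retry loop, with an in-memory map standing in
for the table and the filtered scan (in real HBase this would be a
Scan with your timestamp filter, or Scan.setTimeRange; the doubling
window size here is just an illustrative choice):

```java
import java.util.*;
import java.util.stream.*;

class RecentScan {
    // table: rowkey -> write timestamp (stand-in for the real 'd' table).
    // Returns the rowkeys of the N most recent rows, widening the scan
    // window until at least N rows match (or the window covers everything).
    static List<String> lastN(Map<String, Long> table, int n, long now, long initialWindowMs) {
        long window = initialWindowMs;
        while (true) {
            final long cutoff = now - window;
            // "Scan" with a timestamp filter: keep only rows newer than the cutoff.
            List<Map.Entry<String, Long>> recent = table.entrySet().stream()
                .filter(e -> e.getValue() >= cutoff)
                .collect(Collectors.toList());
            if (recent.size() >= n || window >= now) {
                // Enough matches (or nothing left to widen): sort newest-first
                // client-side and take the top N.
                recent.sort((a, b) -> Long.compare(b.getValue(), a.getValue()));
                return recent.stream().limit(n).map(Map.Entry::getKey)
                             .collect(Collectors.toList());
            }
            window *= 2;  // too few rows: widen the window and scan again
        }
    }
}
```

With multiple real scans you'd accumulate the results of each pass and
sort the union the same way.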

The more efficient approach might be to create a second table as a
'recency' index. Let's pretend your data table is called 'd'. Then,
you'd create a second table called 'dri' (data recency index). Every
time you insert a row into 'd' with a rowkey of 'r', you also insert a
row into 'dri' with a rowkey of the current timestamp, and only one
column (say, called 'dr') with a value of 'r'. Then, when you want to
retrieve the last N rows, you can look at the last N rows in the dri
table, and GET the rows from the 'd' table with row keys matching the
column values in 'dr'. You can automate some of this with coprocessors
too.
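Sketched below with in-memory maps standing in for the two tables. One
assumption beyond the scheme above: the index key appends the data
rowkey to the timestamp, so two rows written in the same millisecond
don't clobber each other.

```java
import java.util.*;

class RecencyIndex {
    // Stand-ins for the 'd' data table and the 'dri' index table.
    final Map<String, String> d = new HashMap<>();            // rowkey -> value
    final NavigableMap<String, String> dri = new TreeMap<>(); // ts-based key -> 'dr' column = data rowkey

    // Dual write: every put into 'd' also writes an index row into 'dri'.
    // Zero-padding the timestamp keeps lexicographic order == numeric order.
    void put(String rowkey, String value, long ts) {
        d.put(rowkey, value);
        dri.put(String.format("%013d/%s", ts, rowkey), rowkey);
    }

    // Last N: walk the tail of 'dri' newest-first, then GET each matching
    // row from 'd' by the rowkey stored in the 'dr' column.
    List<String> lastN(int n) {
        List<String> out = new ArrayList<>();
        for (String rowkey : dri.descendingMap().values()) {
            if (out.size() == n) break;
            out.add(d.get(rowkey));
        }
        return out;
    }
}
```

In real HBase the dual write is the part a coprocessor could take over,
and note that a monotonically increasing index rowkey concentrates
writes on one region, which may or may not matter for your load.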

Of course, the easiest way is to simply make the most significant bits
of your rowkey in your actual data be a timestamp, but I don't know if
your schema would allow that.
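For illustration, one common form of this is to prefix the rowkey with
the *reversed* timestamp (Long.MAX_VALUE minus the timestamp,
zero-padded) so the newest rows sort first and a plain Scan from the
start of the table returns the most recent N; the key layout here is
just one possible choice:

```java
class ReverseTsKey {
    // Prefix the rowkey with the reversed, zero-padded timestamp so that
    // lexicographic (HBase rowkey) order puts the newest rows first.
    static String rowkey(long ts, String id) {
        return String.format("%019d/%s", Long.MAX_VALUE - ts, id);
    }
}
```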

-Shaneal


On Fri, Mar 2, 2012 at 1:02 PM, Peter Wolf <[email protected]> wrote:
> Hello all,
>
> I want to retrieve the most recent N rows from a table, with some column
> qualifiers.
>
> I can't find a Filter, or anything obvious in my books, or via Google.
>
> What is the idiom for doing this?
>
> Thanks
> Peter
