Assuming your rowkey doesn't somehow encode the time that row was created (in which case you can simply do a scan), things get a bit more interesting.
The 'easiest' approach is probably to Scan, but use a custom filter that only lets 'recent' rows through based on their timestamps (see TimestampsFilter for an example of how to write one; it isn't exactly what you need, but it shows the pattern), choosing the time window so that you expect at least N rows to match. If your scan matched at least N rows, you can sort and take the top N client-side. If it retrieved fewer than N rows, you'll have to go back and do another scan with a wider timestamp filter and aggregate/sort the results from the multiple scans.

A more efficient approach might be to create a second table as a 'recency' index. Let's pretend your data table is called 'd'. You'd then create a second table called 'dri' (data recency index). Every time you insert a row into 'd' with a rowkey of 'r', you also insert a row into 'dri' with a rowkey of the current timestamp and a single column (say, called 'dr') whose value is 'r'. Then, when you want to retrieve the last N rows, you look at the last N rows in the 'dri' table and GET the rows from 'd' whose rowkeys match the column values in 'dr'. You can automate some of this with coprocessors, too.

Of course, the easiest way is to simply make the most significant bits of the rowkey in your actual data a timestamp, but I don't know whether your schema would allow that.

-Shaneal

On Fri, Mar 2, 2012 at 1:02 PM, Peter Wolf <[email protected]> wrote:
> Hello all,
>
> I want to retrieve the most recent N rows from a table, with some column
> qualifiers.
>
> I can't find a Filter, or anything obvious in my books, or via Google.
>
> What is the idiom for doing this?
>
> Thanks
> Peter
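P.S. The scan-then-widen approach above can be sketched like this. This is a minimal illustration in Python, with a plain dict standing in for the HBase table and cell timestamps kept alongside the values; the function names and the fixed-step widening strategy are my own for illustration, not anything in the HBase API (the real thing would use Scan with a timestamp filter in the Java client):

```python
def recent_n(table, n, cutoff_ts):
    """One 'scan': keep rows whose timestamp >= cutoff_ts, sort client-side.

    'table' maps rowkey -> (timestamp, value). Returns the top-n rowkeys
    newest-first, or None if fewer than n rows matched the window.
    """
    matched = [(ts, key) for key, (ts, _) in table.items() if ts >= cutoff_ts]
    if len(matched) < n:
        return None  # too few rows: caller must widen the window and rescan
    matched.sort(reverse=True)  # newest first
    return [key for _, key in matched[:n]]


def most_recent_n(table, n, start_cutoff, step):
    """Widen the time window by 'step' until at least n rows match."""
    cutoff = start_cutoff
    while cutoff > 0:
        result = recent_n(table, n, cutoff)
        if result is not None:
            return result
        cutoff -= step  # look further back in time and scan again
    # the whole table has fewer than n rows: just return everything, sorted
    return sorted(table, key=lambda k: table[k][0], reverse=True)[:n]
```

Note the retry cost: each widened window rescans rows you already saw, which is exactly why the index-table approach below it can be worth the extra write.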
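P.P.S. The 'dri' recency-index idea can be sketched the same way. One detail worth making explicit: HBase scans return rowkeys in ascending byte order, so a common trick is to use a *reverse* timestamp (Long.MAX_VALUE minus the timestamp) as the index rowkey, which makes the newest rows sort first. Again, dicts stand in for the 'd' and 'dri' tables and the helper functions are illustrative, not HBase API:

```python
MAX_TS = 2**63 - 1  # plays the role of Long.MAX_VALUE in the Java client


def put(d, dri, rowkey, value, ts):
    """Insert into the data table 'd' and mirror into the index table 'dri'."""
    d[rowkey] = value
    # Reverse timestamp: the newest insert gets the smallest index key.
    # A real schema would append the rowkey to this key so two inserts
    # in the same millisecond don't collide.
    dri[MAX_TS - ts] = rowkey


def last_n(d, dri, n):
    """Scan the first n index rows (newest first), then GET each data row."""
    keys = [dri[k] for k in sorted(dri)[:n]]  # ascending order = newest first
    return [(k, d[k]) for k in keys]
```

Usage: after inserting rows with timestamps 100, 300, 200, `last_n(d, dri, 2)` returns the rows written at 300 and 200, newest first, without ever scanning the data table itself.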
