There is little to no underlying efficiency. It's basically a full table/region scan with a filter that discards the uninteresting values. We do have various timestamp filtering techniques that avoid reading from files, e.g. if you specify a time range [100,200) and an HFile only contains [0,50), we won't include that file. So in your case this might help. Compactions merge files, and thus their timestamp ranges, together, so you'll lose some of that efficiency, assuming you could otherwise have run a query involving only the most recent HFiles.
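
To make the file-skipping idea concrete, here is a rough standalone sketch in plain Java. It is not HBase's actual code: the class and method names (TimeRangePruning, Range, filesToRead) are made up for illustration, and both the query range and each file's range are treated as half-open [min, max) intervals, following the notation above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: hypothetical names, not HBase classes.
final class TimeRangePruning {

    /** Half-open timestamp interval [min, max). */
    static final class Range {
        final long min;
        final long max;

        Range(long min, long max) {
            this.min = min;
            this.max = max;
        }

        /** True if the two half-open intervals share at least one timestamp. */
        boolean overlaps(Range other) {
            return min < other.max && other.min < max;
        }

        /** Range covered by a file produced by compacting two files together. */
        Range merge(Range other) {
            return new Range(Math.min(min, other.min), Math.max(max, other.max));
        }
    }

    /** Keeps only the files whose timestamp range overlaps the query range. */
    static List<Range> filesToRead(List<Range> files, Range query) {
        List<Range> selected = new ArrayList<>();
        for (Range file : files) {
            if (file.overlaps(query)) {
                selected.add(file);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        Range query = new Range(100, 200);
        Range oldFile = new Range(0, 50);
        Range newFile = new Range(150, 180);

        // Before compaction: the old file can be skipped without reading it.
        System.out.println(filesToRead(Arrays.asList(oldFile, newFile), query).size());

        // After compaction: one file spanning [0,180) that must be read in full.
        Range compacted = oldFile.merge(newFile);
        System.out.println(filesToRead(Arrays.asList(compacted), query).size());
    }
}
```

The second half of main shows the compaction effect: once the two files are merged, the single resulting range overlaps the query, so the old data can no longer be skipped at the file level.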

On Fri, Feb 18, 2011 at 10:02 AM, Jason Rutherglen <[email protected]> wrote:
> Thanks Ted! Is there some underlying efficiency to this, or will it
> be scanning all of the rows underneath?
>
> On Fri, Feb 18, 2011 at 7:47 AM, Ted Yu <[email protected]> wrote:
>> From Scan.java:
>> * To only retrieve columns within a specific range of version timestamps,
>> * execute {@link #setTimeRange(long, long) setTimeRange}.
>>
>> On Fri, Feb 18, 2011 at 6:48 AM, Jason Rutherglen <
>> [email protected]> wrote:
>>
>>> For search integration we need to, on server reboot scan over key
>>> values since the last Lucene commit, and add them to the index. Is
>>> there an efficient way to do this?
>>>
>>
>
