Ryan, thanks. I think a full scan will be fine since it's a one-time event on startup/recovery, and I am curious either way.
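
For reference, here is a minimal sketch of the file-skipping check Ryan describes below, assuming half-open [min, max) timestamp ranges. The class and method names (TimeRangeSketch, FileRange, filesToRead) are hypothetical and are not HBase's actual code. On the client side the query range would be supplied through the Scan.setTimeRange(long, long) call Ted quotes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: these names are illustrative and are not HBase classes.
class TimeRangeSketch {

    // Timestamp span covered by one store file, half-open: [minTs, maxTs).
    static final class FileRange {
        final String name;
        final long minTs;
        final long maxTs;

        FileRange(String name, long minTs, long maxTs) {
            this.name = name;
            this.minTs = minTs;
            this.maxTs = maxTs;
        }
    }

    // Two half-open ranges overlap when each one starts before the other ends.
    static boolean overlaps(long qMin, long qMax, long fMin, long fMax) {
        return qMin < fMax && fMin < qMax;
    }

    // Keep only the files whose timestamp span intersects the query range.
    static List<String> filesToRead(List<FileRange> files, long qMin, long qMax) {
        List<String> selected = new ArrayList<>();
        for (FileRange f : files) {
            if (overlaps(qMin, qMax, f.minTs, f.maxTs)) {
                selected.add(f.name);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<FileRange> files = new ArrayList<>();
        files.add(new FileRange("older", 0, 50));     // entirely before the query range
        files.add(new FileRange("recent", 150, 300)); // intersects the query range
        System.out.println(filesToRead(files, 100, 200)); // prints [recent]
    }
}
```

If a compaction merged the two files into one covering [0,300), that single file would overlap [100,200) and have to be read in full, which is the loss of efficiency Ryan mentions.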

On Fri, Feb 18, 2011 at 10:08 AM, Ryan Rawson <[email protected]> wrote:
> There is minimal/no underlying efficiency. It's basically a full
> table/region scan with a filter to discard the uninteresting values.
> We have various timestamp filtering techniques to avoid reading from
> files, eg: if you specify a time range [100,200) and a hfile only
> contains [0,50) we'll not include the file. So perhaps in your case
> this might help. Compactions will merge files and thus timestamp
> ranges together, and you'll lose some efficiency, assuming you COULD
> have done a query involving only the most recent HFiles.
>
> On Fri, Feb 18, 2011 at 10:02 AM, Jason Rutherglen
> <[email protected]> wrote:
>> Thanks Ted! Is there some underlying efficiency to this, or will it
>> be scanning all of the rows underneath?
>>
>> On Fri, Feb 18, 2011 at 7:47 AM, Ted Yu <[email protected]> wrote:
>>> From Scan.java:
>>> * To only retrieve columns within a specific range of version timestamps,
>>> * execute {@link #setTimeRange(long, long) setTimeRange}.
>>>
>>> On Fri, Feb 18, 2011 at 6:48 AM, Jason Rutherglen <
>>> [email protected]> wrote:
>>>
>>>> For search integration we need to, on server reboot scan over key
>>>> values since the last Lucene commit, and add them to the index. Is
>>>> there an efficient way to do this?
