Hi,

For the filtering part, every HFile is associated with a set of metadata, and this metadata includes the time range of the data in the file. So if there is no overlap between the time range you want and the time range of the HFile, the HFile is skipped entirely.
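A minimal sketch of that overlap check, to make the idea concrete. This is not the HBase source: the class, the `FileMeta` type, and the method names are hypothetical, and it assumes the requested range is half-open, [min, max), while each file records the smallest and largest timestamp it contains.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not HBase code: all names here are illustrative only.
class TimeRangeSkipSketch {

    // Stand-in for per-file metadata: smallest and largest cell timestamp in the file.
    static final class FileMeta {
        final String name;
        final long minTs; // inclusive
        final long maxTs; // inclusive

        FileMeta(String name, long minTs, long maxTs) {
            this.name = name;
            this.minTs = minTs;
            this.maxTs = maxTs;
        }
    }

    // True when the file may hold cells inside the requested range [queryMin, queryMax).
    static boolean overlaps(FileMeta f, long queryMin, long queryMax) {
        return f.minTs < queryMax && f.maxTs >= queryMin;
    }

    // Keep only the files worth opening; the rest are skipped without reading their data.
    static List<String> selectFiles(List<FileMeta> files, long queryMin, long queryMax) {
        List<String> selected = new ArrayList<>();
        for (FileMeta f : files) {
            if (overlaps(f, queryMin, queryMax)) {
                selected.add(f.name);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<FileMeta> files = new ArrayList<>();
        files.add(new FileMeta("flush-1", 100L, 199L));
        files.add(new FileMeta("flush-2", 200L, 299L));
        files.add(new FileMeta("flush-3", 300L, 399L));
        // Request [250, 320): flush-1 ends before 250, so it is never opened.
        System.out.println(selectFiles(files, 250L, 320L));
    }
}
```

In this sketch, a single file whose timestamps span the whole write history overlaps every requested range, so the metadata check alone cannot skip it; the check only helps when files cover distinct slices of time.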
This work is done in StoreScanner#selectScannersFrom.

Cheers,

N.

On Sat, Apr 14, 2012 at 5:11 PM, Doug Meil <[email protected]> wrote:

> Hi there-
>
> With respect to:
>
> "* Does it need to hit every memstore and HFile to determine if there
> is data available? And if so does it need to do a full scan of that file to
> determine the records qualifying to the timerange, since keys are stored
> lexicographically?"
>
> And...
>
> "Using "scan 'table', {TIMERANGE => [t, t+x]}" :"
> See...
>
> http://hbase.apache.org/book.html#regions.arch
> 8.7.5.4. KeyValue
>
> The timestamp is an attribute of the KeyValue, but unless you perform a
> restriction using start/stop row it has to process every row.
>
> Major compactions don't change this fact, they just change the number of
> HFiles that have to get processed.
>
> On 4/14/12 10:38 AM, "Rob Verkuylen" <[email protected]> wrote:
>
> >I'm trying to find a definitive answer to the question if scans on
> >timerange alone will scale when you use uniformly distributed keys like
> >UUIDs.
> >
> >Since the keys are randomly generated that would mean the keys will be
> >spread out over all RegionServers, Regions and HFiles. In theory, assuming
> >enough writes, that would mean that every HFile will contain the entire
> >timerange of writes.
> >
> >Now before a major compaction, data is in the memstores and (non
> >max.filesize) flushed&merged HFiles. I can imagine that a scan using a
> >TIMERANGE can quickly serve from memstores and the smaller files, but how
> >does it perform after a major compaction?
> >
> >Using "scan 'table', {TIMERANGE => [t, t+x]}" :
> >* How does HBase handle this query in this case(UUIDs)?
> >* Does it need to hit every memstore and HFile to determine if there is
> >data available? And if so does it need to do a full scan of that file to
> >determine the records qualifying to the timerange, since keys are stored
> >lexicographically?
> >
> >I've run some tests on 300+ region tables, on month old data(so after
> >major compaction) and performance/response seems fairly quick. But I'm
> >trying to understand why that is, because hitting every HFile on every
> >region seems to be ineffective. Lars' book figure 9-3 seems to indicate
> >this as well, but can't seem to get the answer from the book or anywhere
> >else.
> >
> >Thnx, Rob
