As far as I understand, sequential keys combined with a timerange scan give the best read performance possible, because of the HFile timerange metadata, just as N indicates. Maybe adding Bloom filters can improve performance further.
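For reference, this is roughly what I mean for the sequential-key case. Just a sketch against the Java client; the table name, key format and timestamps below are made up:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class SequentialKeyTimeRangeScan {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    // Made-up table whose row key is the 8-byte big-endian write timestamp.
    HTable table = new HTable(conf, "events_seq");
    long t = System.currentTimeMillis() - 3600 * 1000L;  // e.g. the last hour
    long now = System.currentTimeMillis();

    Scan scan = new Scan();
    // With monotonically increasing keys, the row range alone already limits
    // the scan to the regions that can hold recent data...
    scan.setStartRow(Bytes.toBytes(t));
    scan.setStopRow(Bytes.toBytes(now));
    // ...and the timerange lets HFiles whose metadata shows no overlap with
    // [t, now) be skipped entirely.
    scan.setTimeRange(t, now);

    ResultScanner scanner = table.getScanner(scan);
    try {
      for (Result r : scanner) {
        // process r
      }
    } finally {
      scanner.close();
      table.close();
    }
  }
}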
Still, in my case with random keys I get a quick (sub-second) response from my earlier scan example. Does HBase keep all the HFile metadata in memory? I can't imagine it starts hitting hundreds, potentially thousands of HFiles, reading their metadata, fully scanning the files and then returning rows. Does it?

I'm trying to see whether I can use this type of scan as a polling mechanism for returning all rows written since time X, given that I'm using random keys (a rough sketch of the poll loop I have in mind is at the bottom of this mail). Before major compaction I can see this working quite well, but I don't know whether it will still work at scale after major compactions have taken place.

On Sat, Apr 14, 2012 at 8:04 PM, Doug Meil <[email protected]> wrote:

>
> Thanks N! That's a good point. I'll update the RefGuide with that.
>
> So if the data is evenly distributed (and evenly old per HFile) you still
> have the same problem, but it's conceivable that could not be the case.
> This is a case where monotonically increasing keys would actually help you.
>
>
>
> On 4/14/12 11:57 AM, "N Keywal" <[email protected]> wrote:
>
> >Hi,
> >
> >For the filtering part, every HFile is associated with a set of metadata.
> >This metadata includes the timerange. So if there is no overlap between
> >the time range you want and the time range of the store, the HFile is
> >totally skipped.
> >
> >This work is done in StoreScanner#selectScannersFrom
> >
> >Cheers,
> >
> >N.
> >
> >
> >On Sat, Apr 14, 2012 at 5:11 PM, Doug Meil
> ><[email protected]> wrote:
> >
> >> Hi there-
> >>
> >> With respect to:
> >>
> >> "* Does it need to hit every memstore and HFile to determine if there
> >> is data available? And if so does it need to do a full scan of that
> >> file to determine the records qualifying to the timerange, since keys
> >> are stored lexicographically?"
> >>
> >> And...
> >>
> >> "Using "scan 'table', {TIMERANGE => [t, t+x]}" :"
> >> See...
> >>
> >> http://hbase.apache.org/book.html#regions.arch
> >> 8.7.5.4. KeyValue
> >>
> >> The timestamp is an attribute of the KeyValue, but unless you perform a
> >> restriction using start/stop row it has to process every row.
> >>
> >> Major compactions don't change this fact, they just change the number
> >> of HFiles that have to get processed.
> >>
> >>
> >> On 4/14/12 10:38 AM, "Rob Verkuylen" <[email protected]> wrote:
> >>
> >> >I'm trying to find a definitive answer to the question of whether
> >> >scans on a timerange alone will scale when you use uniformly
> >> >distributed keys like UUIDs.
> >> >
> >> >Since the keys are randomly generated, that would mean the keys will
> >> >be spread out over all RegionServers, Regions and HFiles. In theory,
> >> >assuming enough writes, that would mean that every HFile will contain
> >> >the entire timerange of writes.
> >> >
> >> >Now, before a major compaction, data is in the memstores and the (non
> >> >max.filesize) flushed & merged HFiles. I can imagine that a scan using
> >> >a TIMERANGE can quickly serve from the memstores and the smaller
> >> >files, but how does it perform after a major compaction?
> >> >
> >> >Using "scan 'table', {TIMERANGE => [t, t+x]}" :
> >> >* How does HBase handle this query in this case (UUIDs)?
> >> >* Does it need to hit every memstore and HFile to determine if there
> >> >is data available? And if so, does it need to do a full scan of that
> >> >file to determine the records qualifying to the timerange, since keys
> >> >are stored lexicographically?
> >> >
> >> >I've run some tests on 300+ region tables, on month-old data (so after
> >> >major compaction) and performance/response seems fairly quick. But I'm
> >> >trying to understand why that is, because hitting every HFile on every
> >> >region seems inefficient. Lars' book figure 9-3 seems to indicate this
> >> >as well, but I can't seem to get the answer from the book or anywhere
> >> >else.
> >> >
> >> >Thnx, Rob
> >>
> >>
> >
>
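The poll loop I have in mind (the sketch mentioned above; the table name, caching value and poll interval are made up, and with random keys there is no start/stop row to restrict the scan):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;

public class TimeRangePoller {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "events");  // made-up table with UUID row keys
    long lastPoll = System.currentTimeMillis();

    while (true) {
      long now = System.currentTimeMillis();

      // No start/stop row is possible with random keys, so the scan touches
      // every region; the hope is that the per-HFile timerange metadata lets
      // most store files be skipped without reading them.
      Scan scan = new Scan();
      scan.setTimeRange(lastPoll, now);  // rows written in [lastPoll, now)
      scan.setCaching(1000);             // fewer RPCs per poll

      ResultScanner scanner = table.getScanner(scan);
      try {
        for (Result r : scanner) {
          // process the newly written row
        }
      } finally {
        scanner.close();
      }

      lastPoll = now;
      Thread.sleep(10000);               // poll every 10 seconds, for example
    }
  }
}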
