Hi there-
With respect to:
"* Does it need to hit every memstore and HFile to determine if there
isdata available? And if so does it need to do a full scan of that file to
determine the records qualifying to the timerange, since keys are stored
lexicographically?"
And...
"Using "scan 'table', {TIMERANGE => [t, t+x]}" :"
See...
http://hbase.apache.org/book.html#regions.arch
8.7.5.4. KeyValue
The timestamp is an attribute of the KeyValue, but unless you perform a
restriction using a start/stop row, it has to process every row.
Major compactions don't change this fact; they just change the number of
HFiles that have to be processed.
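
For what it's worth, here is a rough Java sketch of that point using the
old-style HTable client API. The table name, row keys, and time window are
made-up placeholders, so treat it as an illustration rather than a recipe:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class TimeRangeScanSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "table");            // placeholder table name

    long t = System.currentTimeMillis() - 3600 * 1000L;  // placeholder window
    long x = 3600 * 1000L;

    // Equivalent of: scan 'table', {TIMERANGE => [t, t+x]}
    // The time range only filters KeyValues as they are read; it does not
    // narrow the row range, so with UUID keys the scan still has to process
    // every row in every region.
    Scan timeOnly = new Scan();
    timeOnly.setTimeRange(t, t + x);

    // A start/stop row restriction is what actually bounds the work,
    // because HFiles are sorted by row key, not by timestamp.
    Scan bounded = new Scan(Bytes.toBytes("start"), Bytes.toBytes("stop")); // placeholder keys
    bounded.setTimeRange(t, t + x);

    ResultScanner scanner = table.getScanner(bounded);
    try {
      for (Result r : scanner) {
        System.out.println(r);
      }
    } finally {
      scanner.close();
      table.close();
    }
  }
}

In other words, TIMERANGE restricts what comes back, not which rows get read.
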
On 4/14/12 10:38 AM, "Rob Verkuylen" <[email protected]> wrote:
>I'm trying to find a definitive answer to the question of whether scans on
>a timerange alone will scale when you use uniformly distributed keys like
>UUIDs.
>
>Since the keys are randomly generated, they will be spread out over all
>RegionServers, Regions and HFiles. In theory, assuming enough writes, that
>would mean that every HFile will contain the entire timerange of writes.
>
>Now, before a major compaction, data is in the memstores and in the
>(smaller than max.filesize) flushed & merged HFiles. I can imagine that a
>scan using a TIMERANGE can be served quickly from the memstores and the
>smaller files, but how does it perform after a major compaction?
>
>Using "scan 'table', {TIMERANGE => [t, t+x]}" :
>* How does HBase handle this query in this case(UUIDs)?
>* Does it need to hit every memstore and HFile to determine if there is
>data available? And if so does it need to do a full scan of that file to
>determine the records qualifying to the timerange, since keys are stored
>lexicographically?
>
>I've run some tests on 300+ region tables, on month-old data (so after
>major compaction), and performance/response seems fairly quick. But I'm
>trying to understand why that is, because hitting every HFile on every
>region seems inefficient. Lars' book (figure 9-3) seems to indicate this as
>well, but I can't seem to get the answer from the book or anywhere else.
>
>Thnx, Rob