I believe it's the same amount of work.

On Wed, Dec 14, 2011 at 3:37 PM, Stuart Smith <[email protected]> wrote:
> Ah. Thanks for clarifying my wrong answer.. !
>
> The only time I had to deal with timestamps I had to go through the thrift
> API ...
> Never noticed the setTimeRange in the Scan() java API :)
>
> So now I'm curious.. If I use this and it can't skip HFiles.. is there any
> performance gain from doing this vs doing it client side?
> Or is it basically the same amount of work - a full scan checking &
> skipping timestamps.. ?
>
> Take care,
> -stu
>
> ________________________________
> From: Carson Hoffacker <[email protected]>
> To: [email protected]; Stuart Smith <[email protected]>
> Sent: Wednesday, December 14, 2011 10:29 AM
> Subject: Re: Questions on timestamps, insights on how timerange/timestamp
> filter are processed?
>
> The timerange scan is able to leverage metadata in each of the HFiles. Each
> HFile should store information about the timerange associated with the data
> within the HFile. If the timerange associated with the HFile does not
> overlap the timerange you are interested in, that HFile will be skipped
> completely. This can significantly increase scan performance.
>
> However, when these files get compacted and the data is merged into a
> smaller number of files, the time range associated with each file
> increases. I don't think it works this way out of the box, but I believe
> you can be smart about how you manage compactions over time to get the
> behavior that you want. You could have compactions compact all the data
> from January 2011 into a single file, and then compact all the data from
> February 2011 into a different file.
>
> -Carson
>
> On Wed, Dec 14, 2011 at 9:39 AM, Stuart Smith <[email protected]> wrote:
>
> > Hello Thomas,
> >
> > Someone here could probably provide more help, but to start you off,
> > the only way I've filtered timestamps is to do a scan, and just filter
> > out rows one by one. This definitely sounds like something coprocessors
> > could help with, but I don't really understand those yet, so someone
> > else will have to step up.. or you can really dig into the documentation
> > about them (AFAIK, it's a little bit of custom code that runs on the
> > regionservers that can pre-process your gets.. but don't quote me on
> > that!).
> >
> > But I can say that a major compaction should not affect them - I've never
> > seen it happen, and if it does, I believe that's a bug.
> >
> > Take care,
> > -stu
> >
> > ________________________________
> > From: Steinmaurer Thomas <[email protected]>
> > To: [email protected]
> > Sent: Wednesday, December 14, 2011 12:38 AM
> > Subject: Questions on timestamps, insights on how timerange/timestamp
> > filter are processed?
> >
> > Hello,
> >
> > can anybody share some insights on how timerange/timestamp filters are
> > processed?
> >
> > Basically we intend to use timerange/timestamp filters to process rather
> > new data from an insertion timestamp POV
> >
> > - How does the process of skipping records and/or regions work, if one
> > uses timerange filters?
> > - I also wonder, do timestamps change when e.g. running a major
> > compaction?
> > - If data grows over the years, is there any chance that regions with
> > "older" rows keep "stable" in a way, that they can be skipped very
> > quickly when querying data with a timerange filter of e.g. the last
> > three years?
> >
> > Thanks,
> > Thomas
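
For illustration, here is a minimal sketch of the file-skipping idea Carson describes above. It is a toy model and not HBase code: the names `TimeRangeSkipSketch`, `FileMeta`, `mustRead` and `filesToRead` are hypothetical, timestamps are plain day numbers, and the half-open range [from, to) is an assumption chosen to mirror how Scan.setTimeRange(minStamp, maxStamp) is documented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model, not HBase code: each store file carries the smallest
// and largest cell timestamp it contains, as described in the thread.
final class TimeRangeSkipSketch {

    static final class FileMeta {
        final String name;
        final long minTs; // smallest cell timestamp in the file (inclusive)
        final long maxTs; // largest cell timestamp in the file (inclusive)

        FileMeta(String name, long minTs, long maxTs) {
            this.name = name;
            this.minTs = minTs;
            this.maxTs = maxTs;
        }
    }

    // A file has to be read only if its [minTs, maxTs] interval overlaps the
    // requested half-open range [from, to); otherwise it can be skipped.
    static boolean mustRead(FileMeta f, long from, long to) {
        return f.maxTs >= from && f.minTs < to;
    }

    // Returns the names of the files a time-range scan could not skip.
    static List<String> filesToRead(List<FileMeta> files, long from, long to) {
        List<String> out = new ArrayList<>();
        for (FileMeta f : files) {
            if (mustRead(f, from, to)) {
                out.add(f.name);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // One file per month (timestamps are day-of-year numbers here).
        List<FileMeta> perMonth = Arrays.asList(
            new FileMeta("jan", 0, 30),
            new FileMeta("feb", 31, 58),
            new FileMeta("mar", 59, 89));
        // The same data after everything was compacted into a single file.
        List<FileMeta> compacted = Arrays.asList(new FileMeta("all", 0, 89));

        System.out.println(filesToRead(perMonth, 59, 90));
        System.out.println(filesToRead(compacted, 59, 90));
    }
}
```

With per-month files, a query for March only has to open "mar". After everything is merged into one file, the file's range covers the whole quarter and nothing can be skipped, which is the case Stuart asks about: every cell then has to be checked against the time range.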
