That is an interesting comment. How would you enforce this in practice? Can you give more details?
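Carson's month-by-month idea (quoted below) can be illustrated outside HBase with a small simulation. This is a minimal sketch under stated assumptions, not HBase's compaction API. `FileMeta`, `compactAll`, `compactByMonth` and `ms` are hypothetical names. Each "file" is modeled only as a list of cell timestamps, so rows, columns, versions and deletes are ignored, and months are computed in UTC. The sketch contrasts two approaches. Merging everything into one file produces a time range that spans all the data. Writing one output file per calendar month keeps each file's range narrow.

```java
import java.time.Instant;
import java.time.YearMonth;
import java.time.ZoneOffset;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MonthlyCompactionSketch {

    /** Hypothetical stand-in for an HFile: its cell timestamps plus the min/max time range. */
    record FileMeta(String name, long minTs, long maxTs, List<Long> cellTimestamps) {
        static FileMeta of(String name, List<Long> ts) {
            long min = ts.stream().mapToLong(Long::longValue).min().orElseThrow();
            long max = ts.stream().mapToLong(Long::longValue).max().orElseThrow();
            return new FileMeta(name, min, max, List.copyOf(ts));
        }
    }

    /** Plain compaction: merge every cell into one file, so its time range covers everything. */
    static FileMeta compactAll(List<FileMeta> inputs) {
        List<Long> all = new ArrayList<>();
        for (FileMeta f : inputs) {
            all.addAll(f.cellTimestamps());
        }
        return FileMeta.of("all", all);
    }

    /** Month-bucketed compaction: one output file per calendar month (UTC) of the cell timestamps. */
    static List<FileMeta> compactByMonth(List<FileMeta> inputs) {
        Map<YearMonth, List<Long>> buckets = new TreeMap<>(); // sorted by month
        for (FileMeta f : inputs) {
            for (long ts : f.cellTimestamps()) {
                YearMonth month = YearMonth.from(Instant.ofEpochMilli(ts).atZone(ZoneOffset.UTC));
                buckets.computeIfAbsent(month, m -> new ArrayList<>()).add(ts);
            }
        }
        List<FileMeta> out = new ArrayList<>();
        buckets.forEach((month, ts) -> out.add(FileMeta.of(month.toString(), ts)));
        return out;
    }

    /** Parses an ISO-8601 instant such as "2011-01-03T10:00:00Z" into epoch millis. */
    static long ms(String isoInstant) {
        return Instant.parse(isoInstant).toEpochMilli();
    }

    public static void main(String[] args) {
        // Two flushed files whose cells straddle January and February 2011.
        List<FileMeta> flushed = List.of(
            FileMeta.of("f1", List.of(ms("2011-01-03T10:00:00Z"), ms("2011-02-01T08:00:00Z"))),
            FileMeta.of("f2", List.of(ms("2011-01-20T12:00:00Z"), ms("2011-02-14T09:30:00Z"))));

        FileMeta merged = compactAll(flushed);
        System.out.println(merged.name() + " " + Instant.ofEpochMilli(merged.minTs())
            + " .. " + Instant.ofEpochMilli(merged.maxTs()));
        for (FileMeta f : compactByMonth(flushed)) {
            System.out.println(f.name() + " " + Instant.ofEpochMilli(f.minTs())
                + " .. " + Instant.ofEpochMilli(f.maxTs()));
        }
    }
}
```

Here the merged file spans 2011-01-03 to 2011-02-14, while the `2011-01` and `2011-02` files have non-overlapping ranges. Doing this inside HBase itself would need custom compaction behavior, which this sketch does not model.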
On Wed, Dec 14, 2011 at 10:29 AM, Carson Hoffacker <[email protected]> wrote:

> The timerange scan is able to leverage metadata in each of the HFiles. Each
> HFile should store information about the timerange of the data within it.
> If the timerange associated with an HFile does not overlap the timerange
> you are interested in, that HFile will be skipped completely. This can
> significantly increase scan performance.
>
> However, when these files get compacted and the data is merged into a
> smaller number of files, the timerange associated with each file
> widens. I don't think it works this way out of the box, but I believe
> you can be smart about how you manage compactions over time to get the
> behavior that you want. You could have compactions compact all the data
> from January 2011 into a single file, and then compact all the data from
> February 2011 into a different file.
>
> -Carson
>
> On Wed, Dec 14, 2011 at 9:39 AM, Stuart Smith <[email protected]> wrote:
>
>> Hello Thomas,
>>
>> Someone here could probably provide more help, but to start you off,
>> the only way I've filtered timestamps is to do a scan and just filter out
>> rows one by one. This definitely sounds like something coprocessors could
>> help with, but I don't really understand those yet, so someone else will
>> have to step up... or you can really dig into the documentation about them
>> (AFAIK, it's a little bit of custom code that runs on the regionservers
>> and can pre-process your gets... but don't quote me on that!).
>>
>> But I can say that a major compaction should not change the timestamps.
>> I've never seen it happen, and if it does, I believe that's a bug.
>>
>> Take care,
>> -stu
>>
>> ________________________________
>> From: Steinmaurer Thomas <[email protected]>
>> To: [email protected]
>> Sent: Wednesday, December 14, 2011 12:38 AM
>> Subject: Questions on timestamps, insights on how timerange/timestamp
>> filter are processed?
>>
>> Hello,
>>
>> Can anybody share some insights on how timerange/timestamp filters are
>> processed?
>>
>> Basically, we intend to use timerange/timestamp filters to process
>> relatively new data, from an insertion-timestamp point of view.
>>
>> - How does skipping records and/or regions work if one uses timerange
>>   filters?
>> - I also wonder: do timestamps change when, e.g., running a major
>>   compaction?
>> - If data grows over the years, is there any chance that regions with
>>   "older" rows stay "stable" in such a way that they can be skipped very
>>   quickly when querying data with a timerange filter of, e.g., the last
>>   three yours?
>>
>> Thanks,
>> Thomas
>>
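Carson's explanation of how a timerange scan can skip whole HFiles can be illustrated with a small simulation. This is a minimal sketch, not HBase code. `FileRange`, `mayContain` and `filesToRead` are hypothetical names. The sketch assumes each file records the inclusive min and max timestamp of its cells. It also assumes the scan range is half-open, `[scanMin, scanMax)`, like the `[minStamp, maxStamp)` range of the HBase client's `Scan.setTimeRange(minStamp, maxStamp)`. Real HBase applies such a check per store file within each region, alongside other logic not modeled here.

```java
import java.util.ArrayList;
import java.util.List;

public class TimeRangeSkipSketch {

    /** Hypothetical per-file metadata: inclusive min/max cell timestamps. */
    record FileRange(String name, long minTs, long maxTs) {}

    /**
     * True if the file may hold cells inside the half-open scan range [scanMin, scanMax).
     * A file whose whole range lies outside the scan range can be skipped without reading it.
     */
    static boolean mayContain(FileRange f, long scanMin, long scanMax) {
        return f.maxTs() >= scanMin && f.minTs() < scanMax;
    }

    /** Returns the names of the files a time-range scan would still have to read. */
    static List<String> filesToRead(List<FileRange> files, long scanMin, long scanMax) {
        List<String> read = new ArrayList<>();
        for (FileRange f : files) {
            if (mayContain(f, scanMin, scanMax)) {
                read.add(f.name());
            }
        }
        return read;
    }

    public static void main(String[] args) {
        // Timestamps simplified to small integers for readability.
        List<FileRange> perMonth = List.of(
            new FileRange("jan", 100, 199),
            new FileRange("feb", 200, 299),
            new FileRange("mar", 300, 399));
        List<FileRange> mergedIntoOne = List.of(new FileRange("all", 100, 399));

        // Query only recent data: [250, 400).
        System.out.println(filesToRead(perMonth, 250, 400));      // [feb, mar]
        System.out.println(filesToRead(mergedIntoOne, 250, 400)); // [all]
    }
}
```

With narrow per-month files, the `jan` file is skipped for a recent-data query. The single merged file can never be skipped, because its range overlaps almost any query. This is the effect Carson describes when compaction widens each file's timerange.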
