Re: Questions on timestamps, insights on how timerange/timestamp filter are processed?

Sam Seigal Wed, 14 Dec 2011 11:35:23 -0800

That is an interesting comment. How would you enforce this in practice?
Can you give more details?


On Wed, Dec 14, 2011 at 10:29 AM, Carson Hoffacker <[email protected]> wrote:
> The timerange scan is able to leverage metadata in each of the HFiles. Each
> HFile stores the time range (oldest and newest timestamp) of the data it
> contains. If the time range of an HFile does not overlap the time range
> you are interested in, that HFile will be skipped completely. This can
> significantly increase scan performance.
>
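The file-skipping idea in the paragraph above can be modelled in a few lines. This is a toy simulation, not HBase code: `HFileModel`, `overlaps`, and `timerange_scan` are hypothetical names, and it assumes, as Carson describes, that each file records the oldest and newest timestamp of its cells. The query interval is half-open, `[start, end)`, matching how the HBase Java client's `Scan.setTimeRange(minStamp, maxStamp)` treats its bounds.

```python
from dataclasses import dataclass


@dataclass
class HFileModel:
    """Toy stand-in for an HFile (hypothetical): a name plus its cells.

    Each cell is a (row, timestamp) tuple. min_ts/max_ts play the role of
    the time-range metadata recorded with the file; files are assumed to
    be non-empty.
    """
    name: str
    cells: list

    @property
    def min_ts(self) -> int:
        return min(ts for _, ts in self.cells)

    @property
    def max_ts(self) -> int:
        return max(ts for _, ts in self.cells)


def overlaps(f: HFileModel, start: int, end: int) -> bool:
    """True if the file's [min_ts, max_ts] intersects the query [start, end)."""
    return f.min_ts < end and f.max_ts >= start


def timerange_scan(files, start, end):
    """Return (matching cells, names of files actually opened)."""
    hits, opened = [], []
    for f in files:
        if not overlaps(f, start, end):
            continue  # decided from metadata alone; none of its cells are read
        opened.append(f.name)
        hits.extend(c for c in f.cells if start <= c[1] < end)
    return hits, opened


jan = HFileModel("jan", [("row1", 100), ("row2", 150)])
feb = HFileModel("feb", [("row3", 200), ("row4", 250)])
hits, opened = timerange_scan([jan, feb], 200, 300)
print(opened)  # ['feb']
print(hits)    # [('row3', 200), ('row4', 250)]
```

The "jan" file is rejected by comparing two numbers, without reading any of its cells; only files whose range overlaps the query pay the cost of a per-cell check.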
> However, when these files get compacted and the data is merged into a
> smaller number of files, the time range covered by each file widens, so
> fewer files can be skipped. I don't think it works this way out of the box,
> but I believe you can be smart about how you manage compactions over time
> to get the behavior that you want. For example, you could have compactions
> put all the data from January 2011 into a single file, and all the data
> from February 2011 into a different file.
>
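Carson's point that compaction widens each file's time range, and his month-by-month alternative, can be illustrated with another toy model (plain Python, not HBase code). `compact_all` stands in for a compaction that merges everything into one file; `compact_by_bucket` and `month` are hypothetical, since, as Carson says, this is not the default behavior. The model carries every cell's timestamp through the merge unchanged, in line with Stuart's observation in the quoted thread; it ignores the deleted, expired, or excess-version cells a real major compaction may discard.

```python
from collections import defaultdict


def file_range(cells):
    """(oldest, newest) timestamp a file's metadata would record for its cells."""
    stamps = [ts for _, ts in cells]
    return min(stamps), max(stamps)


def compact_all(files):
    """Merge every file into one; each cell keeps its original timestamp."""
    return [sorted(cell for f in files for cell in f)]


def compact_by_bucket(files, bucket_of):
    """Hypothetical policy: merge cells only with others in the same time bucket."""
    buckets = defaultdict(list)
    for f in files:
        for cell in f:
            buckets[bucket_of(cell[1])].append(cell)
    return [sorted(cells) for _, cells in sorted(buckets.items())]


def cells_scanned(files, start, end):
    """Cells a [start, end) time-range scan must read: all cells of every
    file whose recorded range overlaps the query; other files are skipped."""
    total = 0
    for f in files:
        oldest, newest = file_range(f)
        if oldest < end and newest >= start:
            total += len(f)
    return total


def month(ts):
    """Hypothetical bucketing: treat each 100-tick window as one 'month'."""
    return ts // 100


# Four flushed files, one cell each; cells are (row, timestamp) tuples.
flushes = [[("r1", 100)], [("r2", 150)], [("r3", 200)], [("r4", 250)]]

merged = compact_all(flushes)
bucketed = compact_by_bucket(flushes, month)
print(cells_scanned(merged, 200, 300))    # 4: one wide file, nothing skipped
print(cells_scanned(bucketed, 200, 300))  # 2: the 100..150 file is skipped
```

Merging everything yields one file spanning 100..250, so a query for recent data must read all of it; bucketing by time keeps the older file's range narrow enough to be skipped.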
> -Carson
>
> On Wed, Dec 14, 2011 at 9:39 AM, Stuart Smith <[email protected]> wrote:
>
>> Hello Thomas,
>>
>>    Someone here could probably provide more help, but to start you off,
>> the only way I've filtered timestamps is to do a scan and just filter out
>> rows one by one. This definitely sounds like something coprocessors could
>> help with, but I don't really understand those yet, so someone else will
>> have to step up, or you can really dig into the documentation about them
>> (AFAIK, it's a little bit of custom code that runs on the regionservers
>> and can pre-process your gets, but don't quote me on that!).
>>
>> But I can say that a major compaction should not change the timestamps;
>> I've never seen it happen, and if it does, I believe that's a bug.
>>
>> Take care,
>>   -stu
>>
>>
>>
>> ________________________________
>>  From: Steinmaurer Thomas <[email protected]>
>> To: [email protected]
>> Sent: Wednesday, December 14, 2011 12:38 AM
>> Subject: Questions on timestamps, insights on how timerange/timestamp
>> filter are processed?
>>
>> Hello,
>>
>> Can anybody share some insights on how timerange/timestamp filters are
>> processed?
>>
>> Basically, we intend to use timerange/timestamp filters to process fairly
>> recent data, based on the insertion timestamp.
>>
>> - How does the process of skipping records and/or regions work if one
>> uses timerange filters?
>> - I also wonder: do timestamps change when, e.g., running a major
>> compaction?
>> - If data grows over the years, is there any chance that regions with
>> "older" rows stay "stable", so that they can be skipped very
>> quickly when querying data with a timerange filter of, e.g., the last
>> three hours?
>>
>> Thanks,
>> Thomas
>>
