Scans work on startRow/stopRow... http://hbase.apache.org/book.html#scan
... you can also select by timestamp *within the startRow/stopRow selection*, but this isn't intended to quickly select rows by timestamp irrespective of their keys.

On 12/1/11 9:03 AM, "Srikanth P. Shreenivas" <[email protected]> wrote:

>So, will it be safe to assume that Scan queries with TimeRange will
>perform well and will read only the necessary portions of the tables
>instead of doing a full table scan?
>
>I have run into a situation wherein I would like to find all rows
>that got created/updated during a time range.
>I was hoping that I could do a time range scan.
>
>Regards,
>Srikanth
>
>-----Original Message-----
>From: Stuti Awasthi [mailto:[email protected]]
>Sent: Monday, October 10, 2011 3:44 PM
>To: [email protected]
>Subject: RE: Performance characteristics of scans using timestamp as the filter
>
>Yes, it's true.
>Your cluster time should be in sync for reliable functioning.
>
>-----Original Message-----
>From: Steinmaurer Thomas [mailto:[email protected]]
>Sent: Monday, October 10, 2011 3:04 PM
>To: [email protected]
>Subject: RE: Performance characteristics of scans using timestamp as the filter
>
>Isn't a synchronized time across all nodes a general requirement for
>running the cluster reliably?
>
>Regards,
>Thomas
>
>-----Original Message-----
>From: Stuti Awasthi [mailto:[email protected]]
>Sent: Monday, October 10, 2011 11:18 AM
>To: [email protected]
>Subject: RE: Performance characteristics of scans using timestamp as the filter
>
>Steinmaurer,
>
>I have done a little POC with TimeRange scans and it worked fine for me.
>Another thing to note: the time should be the same on all machines of
>your HBase cluster.
>
>-----Original Message-----
>From: Steinmaurer Thomas [mailto:[email protected]]
>Sent: Monday, October 10, 2011 2:32 PM
>To: [email protected]
>Subject: RE: Performance characteristics of scans using timestamp as the filter
>
>Hello,
>
>others have stated that one shouldn't try to use timestamps, although I
>haven't figured out why?
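To make the point above concrete, here is a self-contained plain-Java sketch (no HBase dependency; the table contents, row keys, and timestamps are invented for illustration) of what a Scan bounded by startRow/stopRow plus a TimeRange does: the key range decides which rows are visited, while the time range only filters within it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class TimeRangeScanSketch {
    // Toy "table": row key -> timestamp of the row's latest cell.
    static final TreeMap<String, Long> TABLE = new TreeMap<>();
    static {
        TABLE.put("row-a", 100L);
        TABLE.put("row-b", 205L);
        TABLE.put("row-c", 310L);
    }

    // Mimics a Scan with startRow/stopRow plus setTimeRange(min, max):
    // the key range bounds what is read; the time range merely filters
    // what those reads return. Every row in [startRow, stopRow) is still
    // touched, which is why a time range alone cannot substitute for a
    // time-based row key when you need fast selection by time.
    static List<String> scan(String startRow, String stopRow, long minStamp, long maxStamp) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, Long> e : TABLE.subMap(startRow, stopRow).entrySet()) {
            long ts = e.getValue();
            if (ts >= minStamp && ts < maxStamp) {  // half-open, like HBase's TimeRange
                hits.add(e.getKey());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Only row-b matches [200, 300), but row-a and row-c were still visited.
        System.out.println(scan("row-a", "row-z", 200L, 300L)); // prints [row-b]
    }
}
```

The real client-side analogue is Scan.setTimeRange(minStamp, maxStamp) combined with the start/stop row settings; the region servers still walk the whole key range, filtering cells by timestamp as they go.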
>If it's reliability, meaning rows are omitted even though they should
>be included in a timerange-based scan, then this might be a good
>argument. ;-)
>
>One thing to note is that the timestamp AFAIK changes when you update a
>row even if the cell values didn't change.
>
>Regards,
>Thomas
>
>-----Original Message-----
>From: Stuti Awasthi [mailto:[email protected]]
>Sent: Monday, October 10, 2011 10:07 AM
>To: [email protected]
>Subject: RE: Performance characteristics of scans using timestamp as the filter
>
>Hi Saurabh,
>
>AFAIK you can also scan on the basis of a timestamp range. This can get
>you the data updated in that timestamp range. You do not need to keep
>the timestamp in your row key.
>
>-----Original Message-----
>From: [email protected] [mailto:[email protected]] On Behalf Of Sam Seigal
>Sent: Monday, October 10, 2011 1:20 PM
>To: [email protected]
>Subject: Re: Performance characteristics of scans using timestamp as the filter
>
>Is it possible to do incremental processing in a more efficient manner
>without putting the timestamp in the leading part of the row key, i.e.
>process data that came in within the last hour / 2 hours etc.? I can't
>seem to find a good answer to this question myself.
>
>On Mon, Oct 10, 2011 at 12:09 AM, Steinmaurer Thomas <
>[email protected]> wrote:
>
>> Leif,
>>
>> we are pretty much in the same boat with a custom timestamp at the end
>> of a three-part rowkey, so basically we end up with reading all data
>> when processing daily batches. Besides performance aspects, have you
>> seen that using internal timestamps for scans etc. works reliably?
>>
>> Or did you come up with another solution to your problem?
>>
>> Thanks,
>> Thomas
>>
>> -----Original Message-----
>> From: Leif Wickland [mailto:[email protected]]
>> Sent: Friday, September 9, 2011 8:33 PM
>> To: [email protected]
>> Subject: Performance characteristics of scans using timestamp as the filter
>>
>> (Apologies if this has been answered before.
>> I couldn't find anything
>> in the archives quite along these lines.)
>>
>> I have a process which writes to HBase as new data arrives. I'd like
>> to run a MapReduce periodically, say daily, that takes the new items
>> as input. A naive approach would use a scan which grabs all of the
>> rows that have a timestamp in a specified interval as the input to a
>> MapReduce. I tested a scenario like that with 10s of GB of data and
>> it seemed to perform OK. Should I expect that approach to continue to
>> perform reasonably well when I have TBs of data?
>>
>> From what I understand of the HBase architecture, I don't see a reason
>> that the scan approach would continue to perform well as the data
>> grows. It seems like I may have to keep a log of modified keys and use
>> that as the MapReduce input instead.
>>
>> Thanks,
>>
>> Leif Wickland
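Leif's closing idea, keeping a log of modified keys and using it as the MapReduce input, can be sketched with a time-bucketed secondary table. The following is plain Java with a TreeMap standing in for that log table; the key layout, bucket size, and all names are assumptions for illustration, not anything prescribed in the thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

public class ChangeLogSketch {
    // Hypothetical secondary "change log" table: row key = zero-padded hour
    // bucket + '|' + original row key. Writers add one entry here alongside
    // every put to the main table.
    static final TreeMap<String, String> CHANGE_LOG = new TreeMap<>();

    static void recordChange(long epochMillis, String mainRowKey) {
        long hourBucket = epochMillis / 3_600_000L;  // ms per hour
        CHANGE_LOG.put(String.format("%013d|%s", hourBucket, mainRowKey), mainRowKey);
    }

    // The incremental job then reads one contiguous key range of the log
    // (a plain startRow/stopRow scan) instead of scanning the main table,
    // so its cost tracks the volume of recent changes, not total data size.
    static List<String> modifiedInHour(long hourBucket) {
        String start = String.format("%013d|", hourBucket);
        String stop = String.format("%013d|", hourBucket + 1);
        return new ArrayList<>(CHANGE_LOG.subMap(start, stop).values());
    }

    public static void main(String[] args) {
        recordChange(500L, "order-17");        // falls in hour bucket 0
        recordChange(3_600_000L, "order-42");  // hour bucket 1
        recordChange(3_700_000L, "user-9");    // hour bucket 1
        System.out.println(modifiedInHour(1)); // prints [order-42, user-9]
    }
}
```

Because the hour bucket leads the key, writes within the same hour concentrate on one region; in a real deployment that hotspotting is the usual trade-off of any time-leading key and is typically mitigated by salting or pre-splitting.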
