So, is it safe to assume that Scan queries with a TimeRange will perform 
well and will read only the necessary portions of the tables instead of doing a 
full table scan?

I have run into a situation where I would like to find all rows that were 
created/updated during a time range.
I was hoping that I could do a time range scan.
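
To make it concrete, the kind of scan I have in mind is roughly the sketch 
below, written against the plain HBase client API (the table name and the 
one-hour window are just placeholders, not our actual setup):

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class TimeRangeScanSketch {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "mytable");   // placeholder table name

        long now = System.currentTimeMillis();
        long oneHourAgo = now - 60L * 60L * 1000L;

        Scan scan = new Scan();
        // Only return cells whose internal timestamp falls in
        // [oneHourAgo, now); min is inclusive, max is exclusive.
        scan.setTimeRange(oneHourAgo, now);

        ResultScanner scanner = table.getScanner(scan);
        try {
            for (Result result : scanner) {
                System.out.println("row: " + Bytes.toString(result.getRow()));
            }
        } finally {
            scanner.close();
            table.close();
        }
    }
}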

Regards,
Srikanth



-----Original Message-----
From: Stuti Awasthi [mailto:[email protected]]
Sent: Monday, October 10, 2011 3:44 PM
To: [email protected]
Subject: RE: Performance characteristics of scans using timestamp as the filter

Yes, that's true.
Your cluster time should be in sync for reliable functioning.

-----Original Message-----
From: Steinmaurer Thomas [mailto:[email protected]]
Sent: Monday, October 10, 2011 3:04 PM
To: [email protected]
Subject: RE: Performance characteristics of scans using timestamp as the filter

Isn't synchronized time across all nodes a general requirement for running the 
cluster reliably?

Regards,
Thomas

-----Original Message-----
From: Stuti Awasthi [mailto:[email protected]]
Sent: Monday, 10 October 2011 11:18
To: [email protected]
Subject: RE: Performance characteristics of scans using timestamp as the filter

Steinmaurer,

I have done a little POC with TimeRange scans and it worked fine for me.
Another thing to note is that the time should be the same on all machines of 
your HBase cluster.

-----Original Message-----
From: Steinmaurer Thomas [mailto:[email protected]]
Sent: Monday, October 10, 2011 2:32 PM
To: [email protected]
Subject: RE: Performance characteristics of scans using timestamp as the filter

Hello,

others have stated that one shouldn't try to use timestamps, although I haven't 
figured out why. If it's reliability, meaning rows are omitted even though they 
should be included in a timerange-based scan, then that would be a good 
argument. ;-)

One thing to note is that, AFAIK, the timestamp changes when you update a row 
even if the cell values didn't change.

Regards,
Thomas

-----Original Message-----
From: Stuti Awasthi [mailto:[email protected]]
Sent: Monday, 10 October 2011 10:07
To: [email protected]
Subject: RE: Performance characteristics of scans using timestamp as the filter

Hi Saurabh,

AFAIK you can also scan on the basis of a timestamp range. This can give you 
the data updated within that timestamp range. You do not need to keep the 
timestamp in your row key.

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Sam 
Seigal
Sent: Monday, October 10, 2011 1:20 PM
To: [email protected]
Subject: Re: Performance characteristics of scans using timestamp as the filter

Is it possible to do incremental processing in a more efficient manner without 
putting the timestamp in the leading part of the row key, i.e. process data 
that came in within the last hour / 2 hours etc.? I can't seem to find a good 
answer to this question myself.
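
For concreteness, what I'm imagining is something along the lines of the 
sketch below, which feeds a time-range scan into a MapReduce job via 
TableMapReduceUtil (the table name, the one-hour window, and the trivial 
mapper are placeholders, not an actual job I'm running):

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class HourlyScanJob {

    // Mapper only sees rows with cells inside the scan's time range.
    static class RecentRowMapper extends TableMapper<Text, LongWritable> {
        @Override
        protected void map(ImmutableBytesWritable row, Result value, Context context)
                throws IOException, InterruptedException {
            context.write(new Text(row.get()), new LongWritable(1L));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = new Job(conf, "hourly-incremental-scan");
        job.setJarByClass(HourlyScanJob.class);

        long now = System.currentTimeMillis();
        long oneHourAgo = now - 60L * 60L * 1000L;

        Scan scan = new Scan();
        scan.setCaching(500);          // larger batches per RPC for MR scans
        scan.setCacheBlocks(false);    // don't pollute the block cache
        scan.setTimeRange(oneHourAgo, now);

        // "mytable" is a placeholder; replace with the real table name.
        TableMapReduceUtil.initTableMapperJob(
                "mytable", scan, RecentRowMapper.class,
                Text.class, LongWritable.class, job);

        job.setOutputFormatClass(NullOutputFormat.class);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}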

On Mon, Oct 10, 2011 at 12:09 AM, Steinmaurer Thomas < 
[email protected]> wrote:

> Leif,
>
> we are pretty much in the same boat with a custom timestamp at the end
> of a three-part rowkey, so basically we end up reading all data when
> processing daily batches. Besides performance aspects, have you seen
> that using internal timestamps for scans etc. works reliably?
>
> Or did you come up with another solution to your problem?
>
> Thanks,
> Thomas
>
> -----Original Message-----
> From: Leif Wickland [mailto:[email protected]]
> Sent: Friday, 09 September 2011 20:33
> To: [email protected]
> Subject: Performance characteristics of scans using timestamp as the
> filter
>
> (Apologies if this has been answered before.  I couldn't find anything
> in the archives quite along these lines.)
>
> I have a process which writes to HBase as new data arrives.  I'd like
> to run a map-reduce periodically, say daily, that takes the new items
> as input.  A naive approach would use a scan which grabs all of the
> rows that have a timestamp in a specified interval as the input to a
> MapReduce.  I tested a scenario like that with 10s of GB of data and
> it seemed to perform OK.  Should I expect that approach to continue
> to perform reasonably well when I have TBs of data?
>
> From what I understand of the HBase architecture, I don't see a reason
> that the scan approach would continue to perform well as the data
> grows.  It seems like I may have to keep a log of modified keys and
> use that as the map-reduce input, instead.
>
> Thanks,
>
> Leif Wickland
>
