Hi Saurabh, AFAIK you can also scan on the basis of a timestamp range. This will return the data updated within that timestamp range, so you do not need to keep the timestamp in your row key.
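Something along these lines should work (a minimal, untested sketch against the 0.90-era Java client; the table name "events" is a placeholder):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;

public class TimeRangeScan {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "events"); // placeholder table name

        long now = System.currentTimeMillis();
        Scan scan = new Scan();
        // Return only cells whose internal timestamp falls in
        // [now - 1 hour, now); no timestamp in the row key required.
        scan.setTimeRange(now - 3600 * 1000L, now);
        scan.setCaching(500);        // fewer RPCs for a large scan
        scan.setCacheBlocks(false);  // don't pollute the block cache

        ResultScanner scanner = table.getScanner(scan);
        try {
            for (Result r : scanner) {
                // process r ...
            }
        } finally {
            scanner.close();
            table.close();
        }
    }
}

Note that the scan still has to touch every region; HBase can skip store files whose recorded time range falls entirely outside the scan's range, but as the thread below discusses, that is not guaranteed to scale as well as a time-leading row key or an explicit change log.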
-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Sam Seigal
Sent: Monday, October 10, 2011 1:20 PM
To: [email protected]
Subject: Re: Performance characteristics of scans using timestamp as the filter

Is it possible to do incremental processing in a more efficient manner without putting the timestamp in the leading part of the row key, i.e. process data that came in within the last hour / 2 hours etc.? I can't seem to find a good answer to this question myself.

On Mon, Oct 10, 2011 at 12:09 AM, Steinmaurer Thomas <[email protected]> wrote:

> Leif,
>
> we are pretty much in the same boat with a custom timestamp at the end
> of a three-part rowkey, so basically we end up with reading all data
> when processing daily batches. Besides the performance aspects, have
> you found that using internal timestamps for scans etc. works reliably?
>
> Or did you come up with another solution to your problem?
>
> Thanks,
> Thomas
>
> -----Original Message-----
> From: Leif Wickland [mailto:[email protected]]
> Sent: Friday, 09. September 2011 20:33
> To: [email protected]
> Subject: Performance characteristics of scans using timestamp as the filter
>
> (Apologies if this has been answered before. I couldn't find anything
> in the archives quite along these lines.)
>
> I have a process which writes to HBase as new data arrives. I'd like
> to run a map-reduce periodically, say daily, that takes the new items
> as input. A naive approach would use a scan which grabs all of the
> rows that have a timestamp in a specified interval as the input to a
> MapReduce. I tested a scenario like that with 10s of GB of data and it
> seemed to perform OK. Should I expect that approach to continue to
> perform reasonably well when I have TBs of data?
>
> From what I understand of the HBase architecture, I don't see a reason
> that the scan approach would continue to perform well as the data
> grows. It seems like I may have to keep a log of modified keys and
> use that as the map-reduce input, instead.
>
> Thanks,
>
> Leif Wickland
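For reference, here is a rough, untested sketch of the "log of modified keys" alternative Leif mentions: alongside each data write, record the row key in a changelog table keyed by a coarse time bucket, so the periodic job only scans that bucket. The table names ("events", "events_changelog") and column families ("d", "k") are placeholders.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class ChangelogWriter {
    private final HTable data;
    private final HTable changelog;

    public ChangelogWriter(Configuration conf) throws Exception {
        data = new HTable(conf, "events");
        changelog = new HTable(conf, "events_changelog");
    }

    public void write(byte[] rowKey, byte[] value) throws Exception {
        Put p = new Put(rowKey);
        p.add(Bytes.toBytes("d"), Bytes.toBytes("v"), value);
        data.put(p);

        // Changelog key: hour bucket + original row key, so a single
        // prefix scan over "events_changelog" yields exactly the keys
        // touched in that hour, regardless of the total table size.
        long hourBucket = System.currentTimeMillis() / (3600 * 1000L);
        byte[] logKey = Bytes.add(Bytes.toBytes(hourBucket), rowKey);
        Put logPut = new Put(logKey);
        logPut.add(Bytes.toBytes("k"), Bytes.toBytes("r"), rowKey);
        changelog.put(logPut);
    }
}

The trade-off is a second write per update, but the incremental job's input cost then grows with the volume of recent changes rather than with the full table.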
