Hi Saurabh, AFAIK you can also scan on the basis of a timestamp range. This will return the data updated within that timestamp range, so you do not need to keep the timestamp in your row key.
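Something along these lines should work (a minimal, untested sketch against the 0.90-era Java client; the table name "events" is a placeholder):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;

public class TimeRangeScan {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "events"); // placeholder table name

        long now = System.currentTimeMillis();
        Scan scan = new Scan();
        // Return only cells whose internal timestamp falls in
        // [now - 1 hour, now); no timestamp in the row key required.
        scan.setTimeRange(now - 3600 * 1000L, now);
        scan.setCaching(500);        // fewer RPCs for a large scan
        scan.setCacheBlocks(false);  // don't pollute the block cache

        ResultScanner scanner = table.getScanner(scan);
        try {
            for (Result r : scanner) {
                // process r ...
            }
        } finally {
            scanner.close();
            table.close();
        }
    }
}

Note that the scan still has to touch every region; HBase can skip store files whose recorded time range falls entirely outside the scan's range, but as the thread below discusses, that is not guaranteed to scale as well as a time-leading row key or an explicit change log.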
-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Sam Seigal
Sent: Monday, October 10, 2011 1:20 PM
To: [email protected]
Subject: Re: Performance characteristics of scans using timestamp as the filter

Is it possible to do incremental processing in a more efficient manner without putting the timestamp in the leading part of the row key, i.e. process data that came in within the last hour / 2 hours etc.? I can't seem to find a good answer to this question myself.

On Mon, Oct 10, 2011 at 12:09 AM, Steinmaurer Thomas <[email protected]> wrote:

> Leif,
>
> we are pretty much in the same boat with a custom timestamp at the end
> of a three-part rowkey, so basically we end up with reading all data
> when processing daily batches. Besides the performance aspects, have
> you found that using internal timestamps for scans etc. works reliably?
>
> Or did you come up with another solution to your problem?
>
> Thanks,
> Thomas
>
> -----Original Message-----
> From: Leif Wickland [mailto:[email protected]]
> Sent: Friday, 09. September 2011 20:33
> To: [email protected]
> Subject: Performance characteristics of scans using timestamp as the filter
>
> (Apologies if this has been answered before. I couldn't find anything
> in the archives quite along these lines.)
>
> I have a process which writes to HBase as new data arrives. I'd like
> to run a map-reduce periodically, say daily, that takes the new items
> as input. A naive approach would use a scan which grabs all of the
> rows that have a timestamp in a specified interval as the input to a
> MapReduce. I tested a scenario like that with 10s of GB of data and it
> seemed to perform OK. Should I expect that approach to continue to
> perform reasonably well when I have TBs of data?
>
> From what I understand of the HBase architecture, I don't see a reason
> that the scan approach would continue to perform well as the data
> grows. It seems like I may have to keep a log of modified keys and
> use that as the map-reduce input, instead.
>
> Thanks,
>
> Leif Wickland
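For reference, here is a rough, untested sketch of the "log of modified keys" alternative Leif mentions: alongside each data write, record the row key in a changelog table keyed by a coarse time bucket, so the periodic job only scans that bucket. The table names ("events", "events_changelog") and column families ("d", "k") are placeholders.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class ChangelogWriter {
    private final HTable data;
    private final HTable changelog;

    public ChangelogWriter(Configuration conf) throws Exception {
        data = new HTable(conf, "events");
        changelog = new HTable(conf, "events_changelog");
    }

    public void write(byte[] rowKey, byte[] value) throws Exception {
        Put p = new Put(rowKey);
        p.add(Bytes.toBytes("d"), Bytes.toBytes("v"), value);
        data.put(p);

        // Changelog key: hour bucket + original row key, so a single
        // prefix scan over "events_changelog" yields exactly the keys
        // touched in that hour, regardless of the total table size.
        long hourBucket = System.currentTimeMillis() / (3600 * 1000L);
        byte[] logKey = Bytes.add(Bytes.toBytes(hourBucket), rowKey);
        Put logPut = new Put(logKey);
        logPut.add(Bytes.toBytes("k"), Bytes.toBytes("r"), rowKey);
        changelog.put(logPut);
    }
}

The trade-off is a second write per update, but the incremental job's input cost then grows with the volume of recent changes rather than with the full table.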
