Yes, it's true. The clocks on all nodes of your cluster should be in sync for it to function reliably.

-----Original Message-----
From: Steinmaurer Thomas [mailto:[email protected]]
Sent: Monday, October 10, 2011 3:04 PM
To: [email protected]
Subject: RE: Performance characteristics of scans using timestamp as the filter

Isn't a synchronized time across all nodes a general requirement for running the cluster reliably?

Regards,
Thomas

-----Original Message-----
From: Stuti Awasthi [mailto:[email protected]]
Sent: Monday, 10 October 2011 11:18
To: [email protected]
Subject: RE: Performance characteristics of scans using timestamp as the filter

Steinmaurer,

I have done a little POC with a time-range scan and it worked fine for me. Another thing to note is that the time should be the same on all machines of your HBase cluster.

-----Original Message-----
From: Steinmaurer Thomas [mailto:[email protected]]
Sent: Monday, October 10, 2011 2:32 PM
To: [email protected]
Subject: RE: Performance characteristics of scans using timestamp as the filter

Hello,

others have stated that one shouldn't try to use timestamps, although I haven't figured out why. If it's reliability, meaning that rows are omitted even though they should be included in a time-range-based scan, then this might be a good argument. ;-)

One thing is that the timestamp, AFAIK, changes when you update a row even if the cell values didn't change.

Regards,
Thomas

-----Original Message-----
From: Stuti Awasthi [mailto:[email protected]]
Sent: Monday, 10 October 2011 10:07
To: [email protected]
Subject: RE: Performance characteristics of scans using timestamp as the filter

Hi Saurabh,

AFAIK you can also scan on the basis of a timestamp range. This gives you the data updated within that timestamp range. You do not need to keep the timestamp in your row key.
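
The timestamp-range scan described above takes a lower and an upper bound in milliseconds since the epoch, which is how HBase stamps cells by default. Below is a minimal Python sketch of computing such a window for "the last N hours". The helper names `time_window_ms` and `in_window` and the `skew_margin_ms` parameter are hypothetical, introduced only for illustration. The resulting bounds are the kind of values one would pass to the Java client's `Scan.setTimeRange(minStamp, maxStamp)`, where the lower bound is inclusive and the upper bound is exclusive.

```python
from datetime import datetime, timezone


def time_window_ms(end, hours, skew_margin_ms=0):
    """Return (min_stamp, max_stamp) in epoch milliseconds.

    The window covers the `hours` before `end`. min_stamp is inclusive and
    max_stamp is exclusive. skew_margin_ms is a hypothetical knob that widens
    the lower bound to tolerate some clock drift between nodes.
    """
    if end.tzinfo is None:
        raise ValueError("end must be timezone-aware")
    max_stamp = int(end.timestamp() * 1000)
    min_stamp = max_stamp - hours * 3600 * 1000 - skew_margin_ms
    return min_stamp, max_stamp


def in_window(cell_ts_ms, window):
    """True if a cell timestamp falls in the half-open window [min, max)."""
    min_stamp, max_stamp = window
    return min_stamp <= cell_ts_ms < max_stamp


if __name__ == "__main__":
    end = datetime(2011, 10, 10, 12, 0, tzinfo=timezone.utc)
    window = time_window_ms(end, hours=1)
    print(window)
    # A cell stamped exactly at the upper bound is left for the next batch.
    print(in_window(window[1], window))
```

Because the window is half-open, consecutive batches that reuse the previous upper bound as the next lower bound neither skip nor double-count a cell. Widening the lower bound with a margin is one possible way to trade a few duplicates for fewer missed rows when node clocks are not perfectly in sync; it is an assumption of this sketch, not something the thread prescribes.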

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Sam Seigal
Sent: Monday, October 10, 2011 1:20 PM
To: [email protected]
Subject: Re: Performance characteristics of scans using timestamp as the filter

Is it possible to do incremental processing more efficiently without putting the timestamp in the leading part of the row key, i.e. to process only the data that arrived within the last hour, two hours, etc.? I can't seem to find a good answer to this question myself.

On Mon, Oct 10, 2011 at 12:09 AM, Steinmaurer Thomas <[email protected]> wrote:

> Leif,
>
> we are pretty much in the same boat with a custom timestamp at the end
> of a three-part rowkey, so basically we end up reading all data
> when processing daily batches. Besides performance aspects, have you
> seen that using internal timestamps for scans etc. works reliably?
>
> Or did you come up with another solution to your problem?
>
> Thanks,
> Thomas
>
> -----Original Message-----
> From: Leif Wickland [mailto:[email protected]]
> Sent: Friday, 09 September 2011 20:33
> To: [email protected]
> Subject: Performance characteristics of scans using timestamp as the
> filter
>
> (Apologies if this has been answered before. I couldn't find anything
> in the archives quite along these lines.)
>
> I have a process which writes to HBase as new data arrives. I'd like
> to run a map-reduce periodically, say daily, that takes the new items
> as input. A naive approach would use a scan which grabs all of the rows
> that have a timestamp in a specified interval as the input to a
> MapReduce. I tested a scenario like that with 10s of GB of data and it
> seemed to perform OK. Should I expect that approach to continue to
> perform reasonably well when I have TBs of data?
>
> From what I understand of the HBase architecture, I don't see a reason
> that the scan approach would continue to perform well as the data
> grows. It seems like I may have to keep a log of modified keys and
> use that as the map-reduce input, instead.
>
> Thanks,
>
> Leif Wickland
