Scans work on startRow/stopRow... http://hbase.apache.org/book.html#scan
... you can also select by timestamp *within the startRow/stopRow selection*, but this isn't intended to quickly select rows by timestamp irrespective of their keys.

On 12/1/11 9:03 AM, "Srikanth P. Shreenivas" <[email protected]> wrote:

>So, will it be safe to assume that Scan queries with TimeRange will
>perform well and will read only the necessary portions of the tables
>instead of doing a full table scan?
>
>I have run into a situation wherein I would like to find all rows
>that got created/updated during a time range.
>I was hoping that I could do a time range scan.
>
>Regards,
>Srikanth
>
>-----Original Message-----
>From: Stuti Awasthi [mailto:[email protected]]
>Sent: Monday, October 10, 2011 3:44 PM
>To: [email protected]
>Subject: RE: Performance characteristics of scans using timestamp as the filter
>
>Yes, it's true.
>Your cluster time should be in sync for reliable functioning.
>
>-----Original Message-----
>From: Steinmaurer Thomas [mailto:[email protected]]
>Sent: Monday, October 10, 2011 3:04 PM
>To: [email protected]
>Subject: RE: Performance characteristics of scans using timestamp as the filter
>
>Isn't a synchronized time across all nodes a general requirement for
>running the cluster reliably?
>
>Regards,
>Thomas
>
>-----Original Message-----
>From: Stuti Awasthi [mailto:[email protected]]
>Sent: Monday, October 10, 2011 11:18 AM
>To: [email protected]
>Subject: RE: Performance characteristics of scans using timestamp as the filter
>
>Steinmaurer,
>
>I have done a little POC with TimeRange scans and it worked fine for me.
>Another thing to note: the time should be the same on all machines of
>your HBase cluster.
>
>-----Original Message-----
>From: Steinmaurer Thomas [mailto:[email protected]]
>Sent: Monday, October 10, 2011 2:32 PM
>To: [email protected]
>Subject: RE: Performance characteristics of scans using timestamp as the filter
>
>Hello,
>
>others have stated that one shouldn't try to use timestamps, although I
>haven't figured out why?
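To make the point above concrete, here is a self-contained plain-Java sketch (no HBase dependency; the table contents, row keys, and timestamps are invented for illustration) of what a Scan bounded by startRow/stopRow plus a TimeRange does: the key range decides which rows are visited, while the time range only filters within it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class TimeRangeScanSketch {
    // Toy "table": row key -> timestamp of the row's latest cell.
    static final TreeMap<String, Long> TABLE = new TreeMap<>();
    static {
        TABLE.put("row-a", 100L);
        TABLE.put("row-b", 205L);
        TABLE.put("row-c", 310L);
    }

    // Mimics a Scan with startRow/stopRow plus setTimeRange(min, max):
    // the key range bounds what is read; the time range merely filters
    // what those reads return. Every row in [startRow, stopRow) is still
    // touched, which is why a time range alone cannot substitute for a
    // time-based row key when you need fast selection by time.
    static List<String> scan(String startRow, String stopRow, long minStamp, long maxStamp) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, Long> e : TABLE.subMap(startRow, stopRow).entrySet()) {
            long ts = e.getValue();
            if (ts >= minStamp && ts < maxStamp) {  // half-open, like HBase's TimeRange
                hits.add(e.getKey());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Only row-b matches [200, 300), but row-a and row-c were still visited.
        System.out.println(scan("row-a", "row-z", 200L, 300L)); // prints [row-b]
    }
}
```

The real client-side analogue is Scan.setTimeRange(minStamp, maxStamp) combined with the start/stop row settings; the region servers still walk the whole key range, filtering cells by timestamp as they go.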
>If it's reliability, meaning rows are omitted even though they should
>be included in a timerange-based scan, then this might be a good
>argument. ;-)
>
>One thing to note is that the timestamp AFAIK changes when you update a
>row even if the cell values didn't change.
>
>Regards,
>Thomas
>
>-----Original Message-----
>From: Stuti Awasthi [mailto:[email protected]]
>Sent: Monday, October 10, 2011 10:07 AM
>To: [email protected]
>Subject: RE: Performance characteristics of scans using timestamp as the filter
>
>Hi Saurabh,
>
>AFAIK you can also scan on the basis of a timestamp range. This can get
>you the data updated in that timestamp range. You do not need to keep
>the timestamp in your row key.
>
>-----Original Message-----
>From: [email protected] [mailto:[email protected]] On Behalf Of Sam Seigal
>Sent: Monday, October 10, 2011 1:20 PM
>To: [email protected]
>Subject: Re: Performance characteristics of scans using timestamp as the filter
>
>Is it possible to do incremental processing in a more efficient manner
>without putting the timestamp in the leading part of the row key, i.e.
>process data that came in within the last hour / 2 hours etc.? I can't
>seem to find a good answer to this question myself.
>
>On Mon, Oct 10, 2011 at 12:09 AM, Steinmaurer Thomas <
>[email protected]> wrote:
>
>> Leif,
>>
>> we are pretty much in the same boat with a custom timestamp at the end
>> of a three-part rowkey, so basically we end up with reading all data
>> when processing daily batches. Besides performance aspects, have you
>> seen that using internal timestamps for scans etc. works reliably?
>>
>> Or did you come up with another solution to your problem?
>>
>> Thanks,
>> Thomas
>>
>> -----Original Message-----
>> From: Leif Wickland [mailto:[email protected]]
>> Sent: Friday, September 9, 2011 8:33 PM
>> To: [email protected]
>> Subject: Performance characteristics of scans using timestamp as the filter
>>
>> (Apologies if this has been answered before.
>> I couldn't find anything
>> in the archives quite along these lines.)
>>
>> I have a process which writes to HBase as new data arrives. I'd like
>> to run a MapReduce periodically, say daily, that takes the new items
>> as input. A naive approach would use a scan which grabs all of the
>> rows that have a timestamp in a specified interval as the input to a
>> MapReduce. I tested a scenario like that with 10s of GB of data and
>> it seemed to perform OK. Should I expect that approach to continue to
>> perform reasonably well when I have TBs of data?
>>
>> From what I understand of the HBase architecture, I don't see a reason
>> that the scan approach would continue to perform well as the data
>> grows. It seems like I may have to keep a log of modified keys and use
>> that as the MapReduce input instead.
>>
>> Thanks,
>>
>> Leif Wickland
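Leif's closing idea, keeping a log of modified keys and using it as the MapReduce input, can be sketched with a time-bucketed secondary table. The following is plain Java with a TreeMap standing in for that log table; the key layout, bucket size, and all names are assumptions for illustration, not anything prescribed in the thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

public class ChangeLogSketch {
    // Hypothetical secondary "change log" table: row key = zero-padded hour
    // bucket + '|' + original row key. Writers add one entry here alongside
    // every put to the main table.
    static final TreeMap<String, String> CHANGE_LOG = new TreeMap<>();

    static void recordChange(long epochMillis, String mainRowKey) {
        long hourBucket = epochMillis / 3_600_000L;  // ms per hour
        CHANGE_LOG.put(String.format("%013d|%s", hourBucket, mainRowKey), mainRowKey);
    }

    // The incremental job then reads one contiguous key range of the log
    // (a plain startRow/stopRow scan) instead of scanning the main table,
    // so its cost tracks the volume of recent changes, not total data size.
    static List<String> modifiedInHour(long hourBucket) {
        String start = String.format("%013d|", hourBucket);
        String stop = String.format("%013d|", hourBucket + 1);
        return new ArrayList<>(CHANGE_LOG.subMap(start, stop).values());
    }

    public static void main(String[] args) {
        recordChange(500L, "order-17");        // falls in hour bucket 0
        recordChange(3_600_000L, "order-42");  // hour bucket 1
        recordChange(3_700_000L, "user-9");    // hour bucket 1
        System.out.println(modifiedInHour(1)); // prints [order-42, user-9]
    }
}
```

Because the hour bucket leads the key, writes within the same hour concentrate on one region; in a real deployment that hotspotting is the usual trade-off of any time-leading key and is typically mitigated by salting or pre-splitting.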
