Scans are serial. To use DB parlance, consider a Scan + filter the moral equivalent of a "SELECT * FROM <table> WHERE col='val'" with no index, so a full table scan is engaged.
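As a rough illustration of the difference (a minimal sketch using the HBase Java client; the table name "activities", the regex, and the key prefixes are made up for this example), compare a filter-only Scan, which reads every row, with a Scan bounded by start/stop rows, which only touches the requested key range:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.filter.CompareFilter;
    import org.apache.hadoop.hbase.filter.RegexStringComparator;
    import org.apache.hadoop.hbase.filter.RowFilter;
    import org.apache.hadoop.hbase.util.Bytes;

    public class ScanComparison {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "activities"); // hypothetical table name

        // 1) Filter-only scan: every row in the table is read and tested
        //    against the regex -- the "no index, full table scan" case.
        Scan filtered = new Scan();
        filtered.setFilter(new RowFilter(CompareFilter.CompareOp.EQUAL,
            new RegexStringComparator(".*/item42/.*"))); // made-up item id

        // 2) Range-restricted scan: only rows between the start and stop keys
        //    are read, so only a small slice of the table is touched.
        Scan bounded = new Scan();
        bounded.setStartRow(Bytes.toBytes("20110401/")); // made-up key prefix
        bounded.setStopRow(Bytes.toBytes("20110501/"));
        bounded.setCaching(100); // fetch rows in batches to cut down on RPCs

        for (Scan scan : new Scan[] { filtered, bounded }) {
          ResultScanner scanner = table.getScanner(scan);
          try {
            for (Result r : scanner) {
              // process r
            }
          } finally {
            scanner.close();
          }
        }
        table.close();
      }
    }
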
The typical ways to address these performance issues are:
- arrange your data using the primary key so you can scan the smallest possible portion of the table.
- use another table as an index. Unfortunately HBase doesn't help you here.

-ryan

On Wed, May 11, 2011 at 11:12 PM, Connolly Juhani <[email protected]> wrote:
> By naming rows from the timestamp, the rowids are all going to be sequential
> when inserting, so all new inserts will go into the same region. When
> checking the last 30 days you will also be reading from the same region
> where all the writing is happening, i.e. the one that is already busy writing
> the edit log for all those entries. You might want to consider an
> alternative method of naming your rows that would result in more distributed
> reading/writing.
> However, since you are naming rows by timestamps, you should be able to
> restrict the scan by a start and end date. You are doing this, right? If
> you're not, you are scanning every row in the table when you only need the
> rows from end-start.
>
> Someone may need to correct me, but based on my memory of the implementation,
> scans are entirely sequential, so region a gets scanned, then b, then c. You
> could speed this up by scanning multiple regions in parallel processes and
> merging the results.
>
> On 12 May 2011 14:36, Himanish Kushary <[email protected]> wrote:
>
>> Hi,
>>
>> We have a table split across multiple regions (approx. 50-60 regions for a
>> 64 MB split size) with a rowid schema of
>> [ReverseTimestamp/itemtimestamp/customerid/itemid]. This stores the
>> activities for an item for a customer. We have lots of data for lots of
>> items for a customer in this table.
>>
>> When we try to look up activities for an item for the last 30 days from this
>> table, we use a Scan with RowFilter and RegexComparator. The scan
>> takes a lot of time (almost 15-20 secs) to get us the activities for an
>> item.
>>
>> We are hooked up to HBase tables directly from a web application, so this
>> response time of around 20 secs is unacceptable. We also noticed that
>> whenever we do any scan kind of operation it is never in acceptable ranges
>> for a web application.
>>
>> Are we doing something wrong? If HBase scans are so slow then it would be
>> really hard to hook them up directly to any web application.
>>
>> Could somebody please suggest how to improve this, or some other
>> options (design, architectural) to remedy this kind of issue when dealing
>> with lots of data.
>>
>> Note: We have tried setCaching and SingleColumnValueFilter to no
>> significant effect.
>>
>> ---------------------------
>> Thanks & Regards
>> Himanish
>>
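Along the lines of Juhani's suggestion above about scanning regions in parallel and merging the results, a rough sketch with the Java client follows (the table name "activities" and the pool size are assumptions, not anything from the thread; each task opens its own HTable because HTable is not thread-safe):

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.Callable;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.util.Pair;

    public class ParallelRegionScan {
      public static void main(String[] args) throws Exception {
        final Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "activities"); // hypothetical table name

        // One (startKey, endKey) pair per region of the table.
        Pair<byte[][], byte[][]> keys = table.getStartEndKeys();
        table.close();

        ExecutorService pool = Executors.newFixedThreadPool(8); // made-up size
        List<Future<List<Result>>> futures = new ArrayList<Future<List<Result>>>();

        for (int i = 0; i < keys.getFirst().length; i++) {
          final byte[] start = keys.getFirst()[i];
          final byte[] stop = keys.getSecond()[i];
          futures.add(pool.submit(new Callable<List<Result>>() {
            public List<Result> call() throws Exception {
              // HTable is not thread-safe, so each task uses its own instance.
              HTable t = new HTable(conf, "activities");
              ResultScanner scanner = t.getScanner(new Scan(start, stop));
              List<Result> rows = new ArrayList<Result>();
              try {
                for (Result r : scanner) {
                  rows.add(r); // or apply per-row logic here
                }
              } finally {
                scanner.close();
                t.close();
              }
              return rows;
            }
          }));
        }

        // Merge the per-region results back in region (i.e. key) order.
        List<Result> merged = new ArrayList<Result>();
        for (Future<List<Result>> f : futures) {
          merged.addAll(f.get());
        }
        pool.shutdown();
      }
    }

In practice you would also combine this with start/stop rows per region so each task only reads the slice of its region that falls in the wanted time range, rather than the whole region.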
