By naming rows by timestamp, the rowids will all be sequential at insert time, so every new insert goes to the same region. When you then check the last 30 days, you are also reading from that same region, i.e. the one that is already busy writing the edit log for all those entries. You might want to consider an alternative row naming scheme that spreads reads and writes more evenly across regions. That said, since you are naming rows by timestamp, you should be able to restrict the scan with a start row and a stop row. You are doing this, right? If you're not, you are scanning every row in the table when you only need the rows between start and end.
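As a rough sketch of the bounded scan (not from the original thread): this assumes the 0.90-era Java client and a row key whose prefix is a reverse timestamp (Long.MAX_VALUE - millis), so newer rows sort first. The table name "activities" and the raw-long key encoding are illustrative; the real key in this thread is a composite.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.util.Bytes;

    public class BoundedScanExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            HTable table = new HTable(conf, "activities"); // illustrative table name

            // Reverse timestamps mean *newer* rows sort *first*, so the
            // start row comes from "now" and the stop row from 30 days ago.
            long now = System.currentTimeMillis();
            long thirtyDaysAgo = now - 30L * 24 * 60 * 60 * 1000;
            byte[] startRow = Bytes.toBytes(Long.MAX_VALUE - now);           // newest
            byte[] stopRow  = Bytes.toBytes(Long.MAX_VALUE - thirtyDaysAgo); // oldest

            // The scan only touches rows in [startRow, stopRow), instead of
            // pulling every row through a RowFilter/RegexComparator.
            Scan scan = new Scan(startRow, stopRow);
            scan.setCaching(500); // fetch rows in batches to cut RPC round trips

            ResultScanner scanner = table.getScanner(scan);
            try {
                for (Result r : scanner) {
                    // process r ...
                }
            } finally {
                scanner.close();
                table.close();
            }
        }
    }

The key point is that the start/stop rows let the client skip entire regions, whereas a filter is still evaluated against every row the scan walks over.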
Someone may need to correct me, but based on my memory of the implementation, scans are entirely sequential: region a gets scanned, then b, then c. You could speed this up by scanning multiple regions in parallel and merging the results; see the sketch after the quoted message below.

On 12 May 2011 14:36, Himanish Kushary <[email protected]> wrote:
> Hi,
>
> We have a table split across multiple regions (approx 50-60 regions for a 64 MB
> split size) with the rowid schema
> [ReverseTimestamp/itemtimestamp/customerid/itemid]. This stores the
> activities for an item for a customer. We have lots of data for lots of items
> for a customer in this table.
>
> When we try to look up the activities for an item for the last 30 days from this
> table, we use a Scan with a RowFilter and RegexComparator. The scan
> takes a long time (almost 15-20 secs) to return the activities for an
> item.
>
> We are hooked up to HBase tables directly from a web application, so a
> response time of around 20 secs is unacceptable. We also noticed that
> whenever we do any scan-type operation, the latency is never in an acceptable
> range for a web application.
>
> Are we doing something wrong? If HBase scans are this slow, it would be
> really hard to hook HBase up directly to any web application.
>
> Could somebody please suggest how to improve this, or some other
> options (design, architectural) to remedy this kind of issue when dealing
> with a lot of data.
>
> Note: We have tried setCaching and SingleColumnValueFilter to no
> significant effect.
>
> ---------------------------
> Thanks & Regards
> Himanish
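As a rough sketch of the parallel idea above (not from the original thread): split the 30-day window into a few key sub-ranges and scan them concurrently, merging the results at the end. This again assumes the 0.90-era client API; the table name "activities", the slice count, and the raw-long reverse-timestamp key encoding are illustrative assumptions.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.Callable;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.util.Bytes;

    public class ParallelScanExample {
        // Scan one key sub-range and collect the rows it contains.
        static List<Result> scanRange(Configuration conf, byte[] start,
                                      byte[] stop) throws Exception {
            HTable table = new HTable(conf, "activities");
            List<Result> rows = new ArrayList<Result>();
            ResultScanner scanner = table.getScanner(new Scan(start, stop));
            try {
                for (Result r : scanner) rows.add(r);
            } finally {
                scanner.close();
                table.close();
            }
            return rows;
        }

        public static void main(String[] args) throws Exception {
            final Configuration conf = HBaseConfiguration.create();
            long now = System.currentTimeMillis();
            long windowMs = 30L * 24 * 60 * 60 * 1000;
            int slices = 4; // number of concurrent scanners; tune to the cluster

            ExecutorService pool = Executors.newFixedThreadPool(slices);
            List<Future<List<Result>>> futures = new ArrayList<Future<List<Result>>>();

            // Split the 30-day window into equal time slices; with reverse
            // timestamps each slice maps to a contiguous key range.
            for (int i = 0; i < slices; i++) {
                long sliceNew = now - (windowMs * i) / slices;
                long sliceOld = now - (windowMs * (i + 1)) / slices;
                final byte[] start = Bytes.toBytes(Long.MAX_VALUE - sliceNew);
                final byte[] stop = Bytes.toBytes(Long.MAX_VALUE - sliceOld);
                futures.add(pool.submit(new Callable<List<Result>>() {
                    public List<Result> call() throws Exception {
                        return scanRange(conf, start, stop);
                    }
                }));
            }

            List<Result> merged = new ArrayList<Result>();
            for (Future<List<Result>> f : futures) merged.addAll(f.get());
            pool.shutdown();
            // merged now holds the rows from all slices, newest slice first.
        }
    }

Whether this helps depends on how many distinct regions the time slices actually land on; if the whole window sits in one region, the parallel scanners just contend with each other.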
