Scans are serial. To use DB parlance, consider a Scan + filter the moral equivalent of a "SELECT * FROM <table> WHERE col='val'" with no index, so a full table scan is engaged.
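As a rough illustration of the difference (a minimal sketch using the HBase Java client; the table name "activities", the regex, and the key prefixes are made up for this example), compare a filter-only Scan, which reads every row, with a Scan bounded by start/stop rows, which only touches the requested key range:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.filter.CompareFilter;
    import org.apache.hadoop.hbase.filter.RegexStringComparator;
    import org.apache.hadoop.hbase.filter.RowFilter;
    import org.apache.hadoop.hbase.util.Bytes;

    public class ScanComparison {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "activities"); // hypothetical table name

        // 1) Filter-only scan: every row in the table is read and tested
        //    against the regex -- the "no index, full table scan" case.
        Scan filtered = new Scan();
        filtered.setFilter(new RowFilter(CompareFilter.CompareOp.EQUAL,
            new RegexStringComparator(".*/item42/.*"))); // made-up item id

        // 2) Range-restricted scan: only rows between the start and stop keys
        //    are read, so only a small slice of the table is touched.
        Scan bounded = new Scan();
        bounded.setStartRow(Bytes.toBytes("20110401/")); // made-up key prefix
        bounded.setStopRow(Bytes.toBytes("20110501/"));
        bounded.setCaching(100); // fetch rows in batches to cut down on RPCs

        for (Scan scan : new Scan[] { filtered, bounded }) {
          ResultScanner scanner = table.getScanner(scan);
          try {
            for (Result r : scanner) {
              // process r
            }
          } finally {
            scanner.close();
          }
        }
        table.close();
      }
    }
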
The typical ways to address these performance issues are:
- arrange your data using the primary key so you can scan the smallest possible portion of the table.
- use another table as an index. Unfortunately HBase doesn't help you here.

-ryan

On Wed, May 11, 2011 at 11:12 PM, Connolly Juhani <[email protected]> wrote:
> By naming rows from the timestamp, the rowids are all going to be sequential
> when inserting, so all new inserts will go into the same region. When
> checking the last 30 days you will also be reading from the same region
> where all the writing is happening, i.e. the one that is already busy writing
> the edit log for all those entries. You might want to consider an
> alternative method of naming your rows that would result in more distributed
> reading/writing.
> However, since you are naming rows by timestamps, you should be able to
> restrict the scan by a start and end date. You are doing this, right? If
> you're not, you are scanning every row in the table when you only need the
> rows from end-start.
>
> Someone may need to correct me, but based on my memory of the implementation,
> scans are entirely sequential, so region a gets scanned, then b, then c. You
> could speed this up by scanning multiple regions in parallel processes and
> merging the results.
>
> On 12 May 2011 14:36, Himanish Kushary <[email protected]> wrote:
>
>> Hi,
>>
>> We have a table split across multiple regions (approx. 50-60 regions for a
>> 64 MB split size) with a rowid schema of
>> [ReverseTimestamp/itemtimestamp/customerid/itemid]. This stores the
>> activities for an item for a customer. We have lots of data for lots of
>> items for a customer in this table.
>>
>> When we try to look up activities for an item for the last 30 days from this
>> table, we use a Scan with RowFilter and RegexComparator. The scan
>> takes a lot of time (almost 15-20 secs) to get us the activities for an
>> item.
>>
>> We are hooked up to HBase tables directly from a web application, so this
>> response time of around 20 secs is unacceptable. We also noticed that
>> whenever we do any scan kind of operation it is never in acceptable ranges
>> for a web application.
>>
>> Are we doing something wrong? If HBase scans are so slow then it would be
>> really hard to hook them up directly to any web application.
>>
>> Could somebody please suggest how to improve this, or some other
>> options (design, architectural) to remedy this kind of issue when dealing
>> with lots of data.
>>
>> Note: We have tried setCaching and SingleColumnValueFilter to no
>> significant effect.
>>
>> ---------------------------
>> Thanks & Regards
>> Himanish
>>
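Along the lines of Juhani's suggestion above about scanning regions in parallel and merging the results, a rough sketch with the Java client follows (the table name "activities" and the pool size are assumptions, not anything from the thread; each task opens its own HTable because HTable is not thread-safe):

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.Callable;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.util.Pair;

    public class ParallelRegionScan {
      public static void main(String[] args) throws Exception {
        final Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "activities"); // hypothetical table name

        // One (startKey, endKey) pair per region of the table.
        Pair<byte[][], byte[][]> keys = table.getStartEndKeys();
        table.close();

        ExecutorService pool = Executors.newFixedThreadPool(8); // made-up size
        List<Future<List<Result>>> futures = new ArrayList<Future<List<Result>>>();

        for (int i = 0; i < keys.getFirst().length; i++) {
          final byte[] start = keys.getFirst()[i];
          final byte[] stop = keys.getSecond()[i];
          futures.add(pool.submit(new Callable<List<Result>>() {
            public List<Result> call() throws Exception {
              // HTable is not thread-safe, so each task uses its own instance.
              HTable t = new HTable(conf, "activities");
              ResultScanner scanner = t.getScanner(new Scan(start, stop));
              List<Result> rows = new ArrayList<Result>();
              try {
                for (Result r : scanner) {
                  rows.add(r); // or apply per-row logic here
                }
              } finally {
                scanner.close();
                t.close();
              }
              return rows;
            }
          }));
        }

        // Merge the per-region results back in region (i.e. key) order.
        List<Result> merged = new ArrayList<Result>();
        for (Future<List<Result>> f : futures) {
          merged.addAll(f.get());
        }
        pool.shutdown();
      }
    }

In practice you would also combine this with start/stop rows per region so each task only reads the slice of its region that falls in the wanted time range, rather than the whole region.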
