Thanks. Following the documentation at the link you posted (section 13.6.5), I 
have also applied those filters.

On 01.05.2011 20:44, Doug Meil wrote:
Another thing: be careful about which CFs/attributes you include in the Scan.  
If you add a column family (scan.addFamily), it will pull *all* the attributes 
of that column family.  If you only care about a row count, pick only one very 
small attribute from the row.


-----Original Message-----
From: Wojciech Langiewicz [mailto:[email protected]]
Sent: Sunday, May 01, 2011 2:12 PM
To: [email protected]
Subject: Re: Row count without iterating over ResultScanner?

Yes, I was using the default caching. Setting this value to a few thousand made 
a significant difference in performance; I'll experiment more with this option.

Right now I want to stay away from MR, mainly because of cluster warm-up time, 
and because I want results in near real time (a few seconds at most).

Thanks for the tip on caching!

On 01.05.2011 19:55, Doug Meil wrote:
What caching value are you using on the scan?  If you aren't setting this, it's 
probably using the default, which is 1, and that is slow.
http://hbase.apache.org/book.html#d379e3504
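
To see why a caching value of 1 hurts, here is a back-of-the-envelope sketch in 
plain Java (no HBase dependency). It is only a rough model: it assumes every 
scanner round trip returns up to `caching` rows and ignores the calls that open 
and close the scanner. The class name, method name, and the table size are made 
up for illustration. In the real client the value is set with 
Scan.setCaching(int).

```java
// Rough model only (assumption): each scanner RPC returns up to `caching` rows;
// the extra calls that open and close the scanner are ignored.
final class ScanRoundTrips {

    // Number of next() round trips needed to fetch `rows` rows.
    static long roundTrips(long rows, int caching) {
        if (rows < 0 || caching < 1) {
            throw new IllegalArgumentException("rows must be >= 0 and caching >= 1");
        }
        return (rows + caching - 1) / caching; // ceiling division
    }

    public static void main(String[] args) {
        long rows = 1_000_000L; // hypothetical table size
        for (int caching : new int[] {1, 100, 1000, 5000}) {
            System.out.println("caching=" + caching + " -> "
                    + roundTrips(rows, caching) + " round trips");
        }
    }
}
```

Under this model, a million rows cost a million round trips at caching 1 and 
about a thousand at caching 1000. The trade-off is that each batch is held in 
client memory, so very large values are not free.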

Re:  "I would like to use HBase API, not MR job (because this cluster only has HDFS 
and HBase installed)."

For Very Large tables you want to start using an MR job for this.


-----Original Message-----
From: Wojciech Langiewicz [mailto:[email protected]]
Sent: Sunday, May 01, 2011 9:44 AM
To: [email protected]
Subject: Row count without iterating over ResultScanner?

Hi,
I would like to know if there's a way to quickly count the number of rows in a 
scan result.
Right now I'm iterating over the ResultScanner like this:
int count = 0;
for (Result rr = scanner.next(); rr != null; rr = scanner.next()) {
        ++count;
}
But with the number of rows reaching millions, this takes a while.
I tried to find something in the documentation, but I didn't find anything.
I would like to use HBase API, not MR job (because this cluster only has HDFS 
and HBase installed).

Thanks for all help.

--
Wojciech Langiewicz

