Another thing: be careful about which CFs/attributes you include in the Scan. If you add a whole column family (scan.addFamily), it will pull *all* the attributes of that column family. If you only care about a row count, select just one very small attribute from each row.
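The two tips in this thread (raise the scan caching, scan only one small column) can be sketched roughly as below, against the 2011-era HBase client API. This is only a sketch: the table name "t1", family "cf", and qualifier "q" are placeholders, and the FirstKeyOnlyFilter (which returns only the first KeyValue of each row) is an extra trick not mentioned in the thread. It needs a running HBase cluster to actually execute.

```java
import java.io.IOException;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class NarrowRowCount {
    public static void main(String[] args) throws IOException {
        // "t1", "cf", "q" are hypothetical names for illustration only.
        HTable table = new HTable(HBaseConfiguration.create(), "t1");
        Scan scan = new Scan();
        // Fetch many rows per RPC instead of the default of 1.
        scan.setCaching(5000);
        // Pull one small attribute, not the whole column family.
        scan.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("q"));
        // Return only the first KeyValue of each matching row.
        scan.setFilter(new FirstKeyOnlyFilter());
        ResultScanner scanner = table.getScanner(scan);
        int count = 0;
        try {
            for (Result rr = scanner.next(); rr != null; rr = scanner.next()) {
                ++count;
            }
        } finally {
            scanner.close();
        }
        System.out.println("rows: " + count);
        table.close();
    }
}
```

The counting loop itself is unchanged from the original question; only the Scan configuration differs, which is where the time goes for million-row tables.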
-----Original Message-----
From: Wojciech Langiewicz [mailto:[email protected]]
Sent: Sunday, May 01, 2011 2:12 PM
To: [email protected]
Subject: Re: Row count without iterating over ResultScanner?

Yes, I was using the default caching; setting this value to a few thousand made a significant difference in performance. I'll experiment more with this option. Right now I want to stay away from MR, mainly because of cluster warm-up time, and I want to get results in almost real time (a few seconds max).

Thanks for the tip on caching!

On 01.05.2011 19:55, Doug Meil wrote:
> What caching value are you using on the scan? If you aren't setting this,
> it's probably using the default - which is 1. Which is slow.
> http://hbase.apache.org/book.html#d379e3504
>
> Re: "I would like to use HBase API, not MR job (because this cluster only
> has HDFS and HBase installed)."
>
> For very large tables you want to start using an MR job for this.
>
>
> -----Original Message-----
> From: Wojciech Langiewicz [mailto:[email protected]]
> Sent: Sunday, May 01, 2011 9:44 AM
> To: [email protected]
> Subject: Row count without iterating over ResultScanner?
>
> Hi,
> I would like to know if there is a way to quickly count the number of rows
> in a scan result. Right now I'm iterating over the ResultScanner like this:
>
> int count = 0;
> for (Result rr = scanner.next(); rr != null; rr = scanner.next()) {
>     ++count;
> }
>
> But with the number of rows reaching millions, this takes a while. I tried
> to find something in the documentation, but I didn't find anything. I would
> like to use HBase API, not MR job (because this cluster only has HDFS and
> HBase installed).
>
> Thanks for all the help.
>
> --
> Wojciech Langiewicz
