Re: Row count without iterating over ResultScanner?

Wojciech Langiewicz Sun, 01 May 2011 11:51:38 -0700

Thanks, that's great. But first I have to update HBase and read some documentation, so I'll let you know in a while how that works for me.


On 01.05.2011 20:42, Himanshu Vashishtha wrote:

Yes, you can define your scan object at the client side and pass it to
AggregationClient.rowCount. You can refer to the AggregationClient javadoc and
the associated TestAggregateProtocol test methods to get an idea.


Thanks,
Himanshu
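
A rough illustration of why the server-side count suggested above helps: each region counts its own matching rows and returns a single number, and the client only adds up the partial counts, so no row data crosses the network. The sketch below is a toy model in plain Java and is not HBase code. The class RegionRowCountSketch, the nested Region type, and the Predicate standing in for a scan's filter are all hypothetical names invented for this example; the real client API should be taken from the javadoc mentioned above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

public class RegionRowCountSketch {

    /** Hypothetical stand-in for one region: it holds a slice of the table's row keys. */
    public static final class Region {
        private final List<String> rowKeys;

        public Region(String... rowKeys) {
            this.rowKeys = Arrays.asList(rowKeys);
        }

        /** Models the server side: count matching rows locally, return only the number. */
        public long countRows(Predicate<String> scanFilter) {
            long n = 0;
            for (String key : rowKeys) {
                if (scanFilter.test(key)) {
                    n++;
                }
            }
            return n;
        }
    }

    /** Models the client side: sum one partial count per region. */
    public static long rowCount(List<Region> regions, Predicate<String> scanFilter) {
        long total = 0;
        for (Region region : regions) {
            total += region.countRows(scanFilter);
        }
        return total;
    }

    public static void main(String[] args) {
        List<Region> table = Arrays.asList(
                new Region("a1", "a2", "b1"),
                new Region("b2", "c1"),
                new Region("c2", "c3", "d1"));

        // Count every row, then only rows whose key starts with "c".
        System.out.println(rowCount(table, key -> true));                // prints 8
        System.out.println(rowCount(table, key -> key.startsWith("c"))); // prints 3
    }
}
```

The client receives one long per region instead of one Result per row, which is the property that makes this approach attractive for tables with millions of rows.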

On Sun, May 1, 2011 at 12:29 PM, Wojciech Langiewicz
<[email protected]> wrote:

Hi,

On 01.05.2011 20:03, Himanshu Vashishtha wrote:

If you are interested in the row count only (and do not want to fetch the
table rows to your client side), you can also try out
https://issues.apache.org/jira/browse/HBASE-1512.


Yes, I only want to count rows and apply filters or select columns.
Do those aggregate functions also support filters?


PS: Which version are you on? The above patch is in main trunk as of now, so
to use it you would have to check out the code and build it.


I'm using the version from CDH3, which is 0.90.1-cdh3u0, but I'm not bound to
this version.

Coprocessors with aggregate functions seem to be the thing I need. Thanks!
--
Wojciech Langiewicz


Thanks,
Himanshu


On Sun, May 1, 2011 at 11:55 AM, Doug Meil <[email protected]> wrote:


What caching value are you using on the scan?  If you aren't setting this,
it's probably using the default - which is 1.  Which is slow.
http://hbase.apache.org/book.html#d379e3504
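
To put a number on the caching point above: if each fetch from the server returns at most `caching` rows, then streaming N rows takes roughly ceil(N / caching) fetches. The sketch below is plain arithmetic, not HBase code. The class ScanCachingEstimate and the method roundTrips are hypothetical names, and the model ignores any extra call a real scanner makes to detect the end of the scan. On the real client the value is set on the scan object (Scan.setCaching).

```java
public class ScanCachingEstimate {

    /**
     * Hypothetical helper: number of fetches needed to stream `rows` rows
     * when each fetch returns at most `caching` rows (ceiling division).
     */
    public static long roundTrips(long rows, int caching) {
        if (rows < 0 || caching < 1) {
            throw new IllegalArgumentException("rows must be >= 0 and caching >= 1");
        }
        return (rows + caching - 1) / caching;
    }

    public static void main(String[] args) {
        long rows = 5000000L; // assumed table size, in the "millions" range from the question
        for (int caching : new int[] {1, 100, 1000}) {
            System.out.println("caching=" + caching + " -> "
                    + roundTrips(rows, caching) + " fetches");
        }
    }
}
```

For 5,000,000 rows this prints 5000000, 50000, and 5000 fetches respectively. Larger caching values cost more client memory per fetch, so the right value depends on row size.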

Re:  "I would like to use HBase API, not MR job (because this cluster only
has HDFS and HBase installed)."

For Very Large tables you want to start using an MR job for this.


-----Original Message-----
From: Wojciech Langiewicz [mailto:[email protected]]
Sent: Sunday, May 01, 2011 9:44 AM
To: [email protected]
Subject: Row count without iterating over ResultScanner?

Hi,
I would like to know if there's a way to quickly count the number of rows in
a scan result.
Right now I'm iterating over ResultScanner like this:
int count = 0;
for (Result rr = scanner.next(); rr != null; rr = scanner.next()) {
        ++count;
}
But with the number of rows reaching millions, this takes a while.
I tried to find something in the documentation, but I didn't find anything.
I would like to use HBase API, not MR job (because this cluster only has
HDFS and HBase installed).

Thanks for all help.

--
Wojciech Langiewicz
