Thanks, James!

Can you point me to some instructions or some syntax for setting those timeout 
values in Java code?

I’ve seen lots of information about setting them on the classpath when 
launching different query clients. But how do I set a high timeout so that I 
can do a large query via Java/JDBC?

Thanks!

From: James Taylor [mailto:[email protected]]
Sent: Friday, June 26, 2015 2:14 PM
To: [email protected]
Subject: Re: How to count table rows from Java?

Zach,
I wouldn't at all say that doing a count(*) is not recommended. It's important 
to know that 1) this requires a full table scan and 2) Phoenix runs it 
synchronously, so you'll need to set the timeouts high enough for it to 
complete. Phoenix will be much faster than running an MR job, but the MR job 
runs asynchronously.
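
As a rough sketch of raising the timeouts from Java/JDBC: Phoenix accepts 
client-side settings as connection Properties, so the timeout keys can be 
passed to DriverManager.getConnection. The class and method names here are 
made up for illustration, the 10-minute value is arbitrary, and the choice of 
which keys to raise (phoenix.query.timeoutMs, hbase.rpc.timeout, 
hbase.client.scanner.timeout.period) is an assumption about what a long scan 
typically needs. Some of these may also have to be raised in hbase-site.xml on 
the servers.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Properties;

// Hypothetical helper class; not part of Phoenix.
class PhoenixCount {

    /** Builds connection properties that set the client-side timeouts to timeoutMs. */
    static Properties timeoutProps(long timeoutMs) {
        Properties props = new Properties();
        String ms = Long.toString(timeoutMs);
        // Assumption: these are the keys that matter for a long-running scan.
        props.setProperty("phoenix.query.timeoutMs", ms);
        props.setProperty("hbase.rpc.timeout", ms);
        props.setProperty("hbase.client.scanner.timeout.period", ms);
        return props;
    }

    /** Runs a synchronous COUNT(*); needs the Phoenix JDBC driver on the classpath. */
    static long countRows(String jdbcUrl, String table, long timeoutMs) throws SQLException {
        // The table name is concatenated into the SQL, so it must come from a trusted source.
        try (Connection conn = DriverManager.getConnection(jdbcUrl, timeoutProps(timeoutMs));
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT COUNT(*) FROM " + table)) {
            rs.next();
            return rs.getLong(1);
        }
    }

    public static void main(String[] args) throws SQLException {
        if (args.length < 2) {
            System.err.println("usage: PhoenixCount <jdbc-url> <table>");
            return;
        }
        // 600000 ms = 10 minutes, an arbitrary example value.
        System.out.println(countRows(args[0], args[1], 600_000L));
    }
}
```

The JDBC URL would be something of the form jdbc:phoenix:<zookeeper quorum>, 
with the value depending on the cluster.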

To ensure your stats are up to date, run a major compaction on your table as 
that updates the stats. Also, if you have wide rows, consider using multiple 
column families so that the count(*) doesn't have to traverse all of your data.

Thanks,
James

On Friday, June 26, 2015, Nick Dimiduk <[email protected]> wrote:
RowCounter is a MapReduce program. After the program completes execution of the 
job, it returns information about that job, including job counters. RowCounter 
includes its counts in the job counters, so they're easily accessed 
programmatically from the returned object. It's not a ResultSet, but it should 
work nonetheless.
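
One way to get at that count from Java without depending on the HBase job 
classes directly is to launch RowCounter as a subprocess and pick the counter 
out of the job's console output. This is a different technique from reading 
the counters off the returned job object, and it is only a sketch: it assumes 
the hbase launcher script is on the PATH and that the job prints its counter 
on a line of the form ROWS=<n>. The class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.OptionalLong;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; not part of HBase.
class RowCounterRunner {

    // Assumption: the counter is printed on its own line as "ROWS=<n>".
    private static final Pattern ROWS = Pattern.compile("^\\s*ROWS=(\\d+)\\s*$");

    /** Returns the row count if the line is a ROWS counter line, otherwise empty. */
    static OptionalLong parseRows(String line) {
        Matcher m = ROWS.matcher(line);
        return m.matches() ? OptionalLong.of(Long.parseLong(m.group(1))) : OptionalLong.empty();
    }

    /** Runs RowCounter for the table and returns the last ROWS value seen, if any. */
    static OptionalLong countRows(String table) throws IOException, InterruptedException {
        Process proc = new ProcessBuilder(
                "hbase", "org.apache.hadoop.hbase.mapreduce.RowCounter", table)
                .redirectErrorStream(true) // merge stderr into stdout so one reader sees everything
                .start();
        OptionalLong rows = OptionalLong.empty();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(proc.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                OptionalLong parsed = parseRows(line);
                if (parsed.isPresent()) {
                    rows = parsed;
                }
            }
        }
        // Treat a non-zero exit code as "no reliable count".
        return proc.waitFor() == 0 ? rows : OptionalLong.empty();
    }
}
```

A nightly metrics job could call RowCounterRunner.countRows("my_table") and 
store the value when it is present.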

On Friday, June 26, 2015, Riesland, Zack <[email protected]> wrote:
I wrote a Java program that runs nightly and collects metrics about our Hive 
tables.

I would like to include HBase tables in this as well.

Since select count(*) is slow and not recommended on Phoenix, what are my 
alternatives from Java?

Is there a way to call org.apache.hadoop.hbase.mapreduce.RowCounter from Java 
and get results in some kind of result set?

Thanks for any info!

