RE: Phoenix table scan performance

Brady, John Mon, 09 Mar 2015 10:16:34 -0700

Hi Yohan,

Apologies, I don’t have an answer to your question.


Could I ask a separate question please? Is your cluster on AWS?

I have Apache Phoenix installed on a 5-node cluster with 3 ZooKeeper nodes on 
AWS, also using Phoenix 4.2 with HBase 0.98.6 from CDH 5.3.2. I put the Phoenix 
server and client JARs on the HBase classpath on all nodes and restarted the 
cluster. The Phoenix command line works on the cluster, and a JDBC app run on 
the cluster returns data.

The problem is that I can’t run a JDBC app outside the cluster.
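
For reference, a minimal Java sketch of the client side. The Phoenix JDBC URL 
names the ZooKeeper quorum (optionally followed by port and HBase root znode), 
so those hostnames, and the regionserver hostnames the HBase client learns 
from ZooKeeper, must be resolvable and reachable from wherever the app runs. 
The hostnames and the `/hbase` root znode are assumptions for illustration, 
not values from this cluster.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.List;

/** Sketch of opening a Phoenix JDBC connection from a machine outside the cluster. */
final class PhoenixConnect {

    /** Builds a URL of the form jdbc:phoenix:<host1,host2,...>:<port>:<root znode>. */
    static String buildUrl(List<String> zkHosts, int zkPort, String rootZnode) {
        return "jdbc:phoenix:" + String.join(",", zkHosts) + ":" + zkPort + ":" + rootZnode;
    }

    public static void main(String[] args) throws SQLException {
        if (args.length == 0) {
            System.err.println("usage: PhoenixConnect <zk-host>[,<zk-host>...]");
            return;
        }
        // Assumed defaults: ZooKeeper client port 2181 and HBase root znode /hbase.
        String url = buildUrl(List.of(args[0].split(",")), 2181, "/hbase");
        // Needs the Phoenix client JAR on the classpath so DriverManager can find the driver.
        try (Connection conn = DriverManager.getConnection(url)) {
            System.out.println("Connected to " + url + ", open: " + !conn.isClosed());
        }
    }
}
```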

I've read at the link below that there is an issue on AWS where internal and 
external IPs get confused and ZooKeeper can't connect to HBase properly. Did 
you have this problem?

http://stackoverflow.com/questions/28676561/apache-phoenix-jdbc-connection-zookeeper-error

As suggested in the link, I solved this by creating aliases in /etc/hosts on 
the machines in the cluster pointing at the internal IP addresses, and then, on 
my local desktop, using the same aliases but pointing them at the external IPs. 
I then altered my cluster setup to use the aliases everywhere instead of IP 
addresses, and I could run the app on my local machine. But modifying the 
Cloudera config files to point to aliases on the servers ultimately breaks 
Cloudera, so it isn't a viable long-term solution.
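
A small diagnostic sketch for checking the /etc/hosts alias approach: it prints 
what each name resolves to on the current machine. Run on a cluster node and 
on the desktop, the same alias should map to a private (RFC 1918, which Java 
reports as "site-local") address on the former and an external address on the 
latter. The alias names are whatever you put in /etc/hosts; none are assumed.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

/** Prints how each hostname or alias resolves on this machine. */
final class AliasCheck {

    /** Describes one name: its resolved address and whether that address is private. */
    static String describe(String host) {
        try {
            InetAddress addr = InetAddress.getByName(host);
            // For IPv4, isSiteLocalAddress() is true for 10/8, 172.16/12 and 192.168/16.
            String scope = addr.isSiteLocalAddress() ? "private" : "not private";
            return host + " -> " + addr.getHostAddress() + " (" + scope + ")";
        } catch (UnknownHostException e) {
            return host + " -> unresolved";
        }
    }

    public static void main(String[] args) {
        // Pass the aliases used in /etc/hosts as arguments.
        for (String host : args) {
            System.out.println(describe(host));
        }
    }
}
```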

Thanks
John



From: Yohan Bismuth [mailto:yohan.bismu...@gmail.com]
Sent: Monday, March 09, 2015 5:02 PM
To: user@phoenix.apache.org
Subject: Phoenix table scan performance

Hello,
we're currently using Phoenix 4.2 with HBase 0.98.6 from CDH 5.3.2 on our 
cluster, and we're experiencing some performance issues.

What we need to do is a full table scan over 1 billion rows. We've got 50 
regionservers and approximately 1000 regions of 1 GB, evenly distributed across 
these regionservers (~20 regions per regionserver). Each node has 14 disks and 
12 cores.

A simple "SELECT COUNT(1) FROM table" currently takes 400 to 500 seconds.
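
A sketch for reproducing this timing from a JDBC client rather than the Phoenix 
command line. The JDBC URL and table name are passed on the command line; the 
identifier check in `countSql` is an illustrative guard added here, not 
something Phoenix requires.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

/** Times a full-table COUNT(1) through JDBC. */
final class CountTimer {

    /** Builds the count query; rejects anything that is not a plain (optionally schema-qualified) name. */
    static String countSql(String table) {
        if (!table.matches("[A-Za-z_][A-Za-z0-9_.]*")) {
            throw new IllegalArgumentException("unexpected table name: " + table);
        }
        return "SELECT COUNT(1) FROM " + table;
    }

    /** Runs the count and returns {rowCount, elapsedMillis}. */
    static long[] timeCount(Connection conn, String table) throws SQLException {
        long start = System.nanoTime();
        try (Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(countSql(table))) {
            rs.next(); // COUNT(1) always yields exactly one row
            long rows = rs.getLong(1);
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            return new long[] {rows, elapsedMs};
        }
    }

    public static void main(String[] args) throws SQLException {
        if (args.length != 2) {
            System.err.println("usage: CountTimer <jdbc-url> <table>");
            return;
        }
        try (Connection conn = DriverManager.getConnection(args[0])) {
            long[] result = timeCount(conn, args[1]);
            System.out.printf("%d rows in %.1f s%n", result[0], result[1] / 1000.0);
        }
    }
}
```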

We noticed that a range scan over 2 regions located on 2 different 
regionservers seems to be done in parallel (taking 15 to 20 s), but a range 
scan over 2 regions on a single regionserver takes twice as long (about 30 to 
40 s). We see the same result with more than 2 regions.

Could this mean that parallelization is done at the regionserver level but not 
at the region level? In that case 400 to 500 seconds seems plausible with 20 to 
25 regions per regionserver. We expected the regions of a single regionserver 
to be scanned in parallel. Is this normal behavior, or are we doing something 
wrong?
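
The arithmetic behind that estimate, as a back-of-the-envelope sketch. It 
assumes (the hypothesis in the question, not documented Phoenix or HBase 
behavior) that regionservers work in parallel while each one scans its own 
regions one after another, at the 15 to 20 s per region implied by the 
2-region measurements.

```java
/** Wall-clock estimate if each regionserver scans its regions serially. */
final class ScanEstimate {

    /** The busiest server sets the total; with an even spread it holds ceil(regions / servers). */
    static double serialPerServerSeconds(int regions, int regionServers, double secondsPerRegion) {
        int regionsPerServer = (regions + regionServers - 1) / regionServers;
        return regionsPerServer * secondsPerRegion;
    }

    public static void main(String[] args) {
        // Figures from the question: ~1000 regions, 50 regionservers, 15 to 20 s per region.
        for (double perRegion : new double[] {15.0, 20.0}) {
            System.out.printf("%.0f s/region -> ~%.0f s total%n",
                    perRegion, serialPerServerSeconds(1000, 50, perRegion));
        }
    }
}
```

With 20 regions per regionserver this gives 300 to 400 s, and at 25 regions and 
20 s per region it reaches 500 s, close to the observed 400 to 500 s. If every 
region were scanned concurrently, the total would instead approach the 
single-region time of 15 to 20 s.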

Thanks for your help
-------------------------------------------------------------
Intel Ireland Limited (Branch)
Collinstown Industrial Park, Leixlip, County Kildare, Ireland
Registered Number: E902934