I've been facing this issue for a long time, so I'm pretty sure a major compaction has already occurred. Running your query returns 27006.
I have run UPDATE STATISTICS on my table, but this didn't solve my problem. If I understand correctly, though, these guideposts are used to parallelize the scan within a region, not across regions of the same regionserver, aren't they?

On Mon, Mar 9, 2015 at 6:45 PM, James Taylor <jamestay...@apache.org> wrote:

> Hi Yohan,
> Have you done a major compaction on your table and are stats generated
> for your table? You can run this to confirm:
> SELECT sum(guide_posts_count) from SYSTEM.STATS where
> physical_name=<your full table name>;
>
> Phoenix does intra-region parallelization based on these guideposts as
> described briefly here:
> http://phoenix.apache.org/update_statistics.html
>
> Thanks,
> James
>
> On Mon, Mar 9, 2015 at 10:35 AM, Jerry <chiling...@gmail.com> wrote:
> > Hi Yohan,
> >
> > I think your observation is correct. A scan in HBase is sequential by
> > default unless you use something like HBASE-10502.
> >
> > Best Regards,
> >
> > Jerry
> >
> > Sent from my iPad
> >
> > On Mar 9, 2015, at 1:01 PM, Yohan Bismuth <yohan.bismu...@gmail.com> wrote:
> >
> > Hello,
> > we're currently using Phoenix 4.2 with HBase 0.98.6 from CDH5.3.2 on our
> > cluster and we're experiencing some perf issues.
> >
> > What we need to do is a full table scan over 1 billion rows. We've got 50
> > regionservers and approximately 1000 regions of 1Gb equally distributed on
> > these rs (which means ~20 regions per rs). Each node has 14 disks and 12
> > cores.
> >
> > A simple "Select count(1) from table" is currently taking 400~500 sec.
> >
> > We noticed that a range scan over 2 regions located on 2 different rs seems
> > to be done in parallel (taking 15~20 sec) but a range scan over 2 regions of
> > a single rs is taking twice this time (about 30~40 sec). We experience the
> > same result with more than 2 regions.
> >
> > Could this mean that parallelization is done at a regionserver level but not
> > at a region level? In this case 400~500 seconds seems legit with 20~25 regions
> > per rs. We expected regions of a single rs to be scanned in parallel; is
> > this normal behavior or are we doing something wrong?
> >
> > Thanks for your help
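
As a rough sanity check on the numbers in this thread, here is a minimal Python sketch of the arithmetic. It assumes the hypothesis from the original message (regions on one regionserver are scanned one after another while regionservers work in parallel), which is not a confirmed description of Phoenix internals. The function name is made up for illustration.

```python
# Back-of-the-envelope check of the "sequential per regionserver" hypothesis.
# Assumption: each regionserver scans its regions serially, and all
# regionservers run in parallel, so wall-clock time is driven by one rs.

def sequential_scan_estimate(regions_per_rs, secs_per_region):
    """Estimated wall-clock seconds if one regionserver scans its regions serially."""
    return regions_per_rs * secs_per_region

# Figures quoted in the thread: 20~25 regions per rs, 15~20 s per region.
low = sequential_scan_estimate(20, 15)
high = sequential_scan_estimate(25, 20)
print(f"estimated full scan: {low}~{high} s")

# Guideposts per region, using the 27006 result and ~1000 regions.
guideposts_per_region = 27006 / 1000
print(f"guideposts per region: ~{guideposts_per_region:.0f}")
```

The estimated range brackets the observed 400~500 sec, which is consistent with the sequential hypothesis but does not prove it.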