Hi Yohan,

I think your observation is correct. A scan in HBase is sequential by default unless you use something like HBASE-10502.
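As a rough sanity check of the numbers in your message, here is a minimal sketch. It assumes that regions on the same regionserver are scanned strictly one after another, that different regionservers work in parallel, and that each region takes the 15~20 sec you measured. The function name is only for illustration.

```python
# Back-of-the-envelope model (an assumption, not a measurement):
# regions hosted on one regionserver are scanned sequentially, while
# separate regionservers proceed in parallel, so the wall-clock time is
# roughly (regions per regionserver) * (seconds per region).

def estimated_scan_seconds(regions_per_rs, seconds_per_region):
    """Estimate wall-clock time for a full scan under the sequential model."""
    return regions_per_rs * seconds_per_region

# Figures taken from the quoted message: 20~25 regions per regionserver,
# 15~20 sec per region.
low = estimated_scan_seconds(20, 15)
high = estimated_scan_seconds(25, 20)
print(f"estimated full scan: {low}-{high} s")  # estimated full scan: 300-500 s
```

The 400~500 sec you observe falls inside that range, which fits the sequential-per-regionserver explanation.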

Best Regards,

Jerry

Sent from my iPad

> On Mar 9, 2015, at 1:01 PM, Yohan Bismuth <yohan.bismu...@gmail.com> wrote:
>
> Hello,
> we're currently using Phoenix 4.2 with HBase 0.98.6 from CDH5.3.2 on our
> cluster and we're experiencing some perf issues.
>
> What we need to do is a full table scan over 1 billion rows. We've got 50
> regionservers and approximately 1000 regions of 1 GB equally distributed on
> these rs (which means ~20 regions per rs). Each node has 14 disks and 12
> cores.
>
> A simple "Select count(1) from table" is currently taking 400~500 sec.
>
> We noticed that a range scan over 2 regions located on 2 different rs seems
> to be done in parallel (taking 15~20 sec) but a range scan over 2 regions of
> a single rs is taking twice this time (about 30~40 sec). We experience the
> same result with more than 2 regions.
>
> Could this mean that parallelization is done at a regionserver level but not
> at a region level? In this case 400~500 seconds seems legit with 20~25 regions
> per rs. We expected regions of a single rs to be scanned in parallel. Is this
> normal behavior or are we doing something wrong?
>
> Thanks for your help