From what I've seen, we're mostly idle during scans.

On Mon, Mar 9, 2015 at 6:11 PM, Mujtaba Chohan <mujt...@apache.org> wrote:
> During your scan with data on a single region server (RS), do you see the RS
> blocked on disk I/O due to heavy reads, or the CPU at 100% utilization? If
> that is the case, then having the data distributed across 2 RSs would
> effectively cut the time in half.
>
> On Mon, Mar 9, 2015 at 10:01 AM, Yohan Bismuth <yohan.bismu...@gmail.com>
> wrote:
>
>> Hello,
>> we're currently using Phoenix 4.2 with HBase 0.98.6 from CDH 5.3.2 on our
>> cluster, and we're experiencing some performance issues.
>>
>> What we need to do is a full table scan over 1 billion rows. We've got 50
>> region servers and approximately 1000 regions of 1 GB, equally distributed
>> across these RSs (which means ~20 regions per RS). Each node has 14 disks
>> and 12 cores.
>>
>> A simple "Select count(1) from table" currently takes 400~500 sec.
>>
>> We noticed that a range scan over 2 regions located on 2 different RSs
>> seems to run in parallel (taking 15~20 sec), but a range scan over 2
>> regions on a single RS takes twice as long (about 30~40 sec). We see the
>> same result with more than 2 regions.
>>
>> *Could this mean that parallelization is done at the region server level
>> but not at the region level*? In that case, 400~500 seconds seems
>> legitimate with 20~25 regions per RS. We expected the regions of a single
>> RS to be scanned in parallel; is this normal behavior, or are we doing
>> something wrong?
>>
>> Thanks for your help
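A rough back-of-the-envelope check of the reasoning in the question, as a sketch only. It assumes each region takes the 15~20 sec observed for a single-region scan and that a region server scans its regions one after another. The "12-way" figure is a hypothetical ideal (one concurrent scan per core, no disk or CPU contention), not a description of what Phoenix or HBase actually does; the helper names are made up for illustration.

```python
import math

# Approximate figures quoted in the thread.
REGIONS_PER_RS = (20, 25)      # "20~25 regions per rs"
SECONDS_PER_REGION = (15, 20)  # observed range scan of one region
CORES_PER_NODE = 12

def serial_estimate(regions_per_rs, sec_per_region):
    """Wall-clock time if a region server scans its regions one at a time."""
    return regions_per_rs * sec_per_region

def parallel_estimate(regions_per_rs, sec_per_region, concurrent_scans):
    """Hypothetical ideal: up to `concurrent_scans` regions per server are
    scanned at once, in "waves", ignoring disk and CPU contention."""
    waves = math.ceil(regions_per_rs / concurrent_scans)
    return waves * sec_per_region

for regions in REGIONS_PER_RS:
    for sec in SECONDS_PER_REGION:
        print(f"{regions} regions @ {sec}s: "
              f"serial={serial_estimate(regions, sec)}s, "
              f"{CORES_PER_NODE}-way={parallel_estimate(regions, sec, CORES_PER_NODE)}s")
```

The serial estimate spans 300~500 sec, which brackets the observed 400~500 sec for the full count, while the idealized per-region parallel estimate stays under a minute. That gap is what makes the "parallel per RS, serial per region" hypothesis plausible, though it does not rule out an I/O or CPU bottleneck on the region server.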