I assume the partition boundaries don't align with region boundaries,
right?

Meaning some partitions would cross region boundaries.
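
One way to check this is to count how many regions each partition's key
range overlaps. Below is a minimal sketch, not your actual code: it assumes
row keys can be treated as longs (real HBase keys are byte arrays), the
region start keys are made-up values, and the class and method names are
hypothetical. On a 1.0+ client the real start keys should be available from
RegionLocator.getStartKeys().

```java
import java.util.Arrays;

public class PartitionRegionOverlap {

    /** Index of the region holding key: the last region whose start key is <= key. */
    static int regionIndex(long[] regionStarts, long key) {
        int pos = Arrays.binarySearch(regionStarts, key);
        // A miss returns -(insertionPoint) - 1; the owning region sits just before it.
        return pos >= 0 ? pos : -pos - 2;
    }

    /** Number of regions touched by the half-open key range [start, end). */
    static int regionsTouched(long[] regionStarts, long start, long end) {
        return regionIndex(regionStarts, end - 1) - regionIndex(regionStarts, start) + 1;
    }

    public static void main(String[] args) {
        long[] regionStarts = {0, 100, 200, 300}; // made-up region start keys, sorted
        long keySpaceEnd = 400;                    // exclusive upper bound of the key space
        int partitions = 3;
        long slice = keySpaceEnd / partitions;

        for (int p = 0; p < partitions; p++) {
            long start = p * slice;
            // The last partition absorbs the remainder of the integer division.
            long end = (p == partitions - 1) ? keySpaceEnd : start + slice;
            System.out.printf("partition %d [%d, %d) touches %d region(s)%n",
                    p, start, end, regionsTouched(regionStarts, start, end));
        }
    }
}
```

If the slow partitions turn out to be the ones spanning more regions, that
would point to the boundary mismatch; if not, the cause is somewhere else.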

Which HBase release do you use?

Thanks

On Thu, Mar 24, 2016 at 4:45 PM, James Johansville <
[email protected]> wrote:

> Hello all,
>
> So, I wrote a Java application for HBase that does a partitioned full-table
> scan according to a set number of partitions. For example, if there are 20
> partitions specified, then 20 separate full scans are launched that cover
> an equal slice of the row identifier range.
>
> The rows are uniformly distributed throughout the RegionServers. I
> confirmed this through the hbase shell. I have only one column family, and
> each row has the same number of column qualifiers.
>
> My problem is that the individual scan performance is wildly inconsistent
> even though each scan fetches roughly the same number of rows. This
> inconsistency appears to be random with respect to hosts, regionservers,
> partitions, and CPU cores. I am the only user of the fleet and am not
> running any other concurrent HBase operation.
>
> I started measuring from the beginning of the scan and stopped measuring
> after the scan was completed. I am not doing any logic with the results,
> just scanning them.
>
> For ~230K rows fetched per scan, I am getting anywhere from 4 seconds to
> 100+ seconds. This seems a little too bouncy for me. Does anyone have any
> insight? By comparison, a similar utility I wrote to upsert to
> regionservers was very consistent in ops/sec and I had no issues with it.
>
> Using 13 partitions on a machine that has 32 CPU cores and 16 GB heap, I
> see anywhere from 3K to 82K ops/sec. Here's an example of log output I
> saved that used 130 partitions.
>
> total # partitions:130; partition id:47; rows:232730 elapsed_sec:6.401
> ops/sec:36358.38150289017
> total # partitions:130; partition id:100; rows:206890 elapsed_sec:6.636
> ops/sec:31176.91380349608
> total # partitions:130; partition id:63; rows:233437 elapsed_sec:7.586
> ops/sec:30772.08014764039
> total # partitions:130; partition id:9; rows:232585 elapsed_sec:32.985
> ops/sec:7051.235410034865
> total # partitions:130; partition id:19; rows:234192 elapsed_sec:38.733
> ops/sec:6046.3170939508955
> total # partitions:130; partition id:1; rows:232860 elapsed_sec:48.479
> ops/sec:4803.316900101075
> total # partitions:130; partition id:125; rows:205334 elapsed_sec:41.911
> ops/sec:4899.286583474505
> total # partitions:130; partition id:123; rows:206622 elapsed_sec:42.281
> ops/sec:4886.875901705258
> total # partitions:130; partition id:54; rows:232811 elapsed_sec:49.083
> ops/sec:4743.210480206996
>
> I use setCacheBlocks(false), setCaching(5000).  Does anyone have any
> insight into how I can make the read performance more consistent?
>
> Thanks!
>
