On Fri, Mar 25, 2016 at 3:50 AM, Ted Yu <[email protected]> wrote:

> James:
> Another experiment you can do is to enable region replica - HBASE-10070.
>
> This would bring down the read variance greatly.

Suggest you NOT do this, James.
Let's figure out your issue as-is rather than compound it by adding yet
more moving parts.

St.Ack

> On Mar 25, 2016, at 2:41 AM, Nicolas Liochon <[email protected]> wrote:
>
> The read path is much more complex than the write one, so the response
> time has much more variance. The gap is so wide here that I would bet on
> Ted's or Stack's points, but here are a few other sources of variance:
>
> - HBase cache: as Anoop said, maybe the data is already in the HBase
>   cache (setCacheBlocks(false) means "don't add blocks to the cache",
>   not "don't use the cache").
> - OS cache: if the data is not in the HBase cache, it may be in the
>   operating system cache (for example if you run the test multiple
>   times).
> - Data locality: if you're lucky, the data is local to the region
>   server. If you're not, the reads need an extra network hop.
> - Number of stores: more hfiles/stores per region => slower reads.
> - Number of versions and so on (a sub-case of the previous one): if the
>   rows have been updated multiple times and compaction has not run yet,
>   you will read much more data.
> - (Another sub-case): the data has not been flushed yet and is available
>   in the memstore => fast read.
>
> None of these points has any importance for the write path. Basically,
> the write variance says nothing about the variance you will get on the
> reads.
>
> IIRC, locality and the number of stores are visible in the HBase UI.
> Doing a table flush and then running a major compaction generally helps
> to stabilize response time when you do a test. But it should not explain
> the x25 you're seeing; there is something else somewhere else. I don't
> get the regionserver boundaries you're mentioning: there is no boundary
> between regionservers. A regionserver can host A->D and M->S while
> another hosts D->M and S->Z, for example.
>
>> On Fri, Mar 25, 2016 at 6:51 AM, Anoop John <[email protected]> wrote:
>>
>> I see you set cacheBlocks to false on the Scan. By any chance, on some
>> other RS(s), is the data you are looking for already in cache (from a
>> previous scan, or via cache-on-write)? And there are no concurrent
>> writes anyway, right? This much difference in time! One possibility is
>> blocks being available or not available in cache.
>>
>> -Anoop-
>>
>>> On Fri, Mar 25, 2016 at 11:04 AM, Stack <[email protected]> wrote:
>>> On Thu, Mar 24, 2016 at 4:45 PM, James Johansville <
>>> [email protected]> wrote:
>>>
>>>> Hello all,
>>>>
>>>> So, I wrote a Java application for HBase that does a partitioned
>>>> full-table scan according to a set number of partitions. For example,
>>>> if there are 20 partitions specified, then 20 separate full scans are
>>>> launched that each cover an equal slice of the row identifier range.
>>>>
>>>> The rows are uniformly distributed throughout the RegionServers.
>>>
>>> How many RegionServers? How many Regions? Are Regions evenly
>>> distributed across the servers? If you put all partitions on one
>>> machine and then run your client, do the timings even out?
>>>
>>> The disparity seems really wide.
>>>
>>> St.Ack
>>>
>>>> I confirmed this through the hbase shell. I have only one column
>>>> family, and each row has the same number of column qualifiers.
>>>> My problem is that the individual scan performance is wildly
>>>> inconsistent even though the scans fetch approximately similar
>>>> numbers of rows. The inconsistency appears to be random with respect
>>>> to hosts, regionservers, partitions, and CPU cores. I am the only
>>>> user of the fleet and am not running any other concurrent HBase
>>>> operation.
>>>>
>>>> I started measuring at the beginning of each scan and stopped
>>>> measuring after the scan completed. I am not doing any logic with
>>>> the results, just scanning them.
>>>>
>>>> For ~230K rows fetched per scan, I am getting anywhere from 4
>>>> seconds to 100+ seconds. This seems a little too bouncy for me. Does
>>>> anyone have any insight? By comparison, a similar utility I wrote to
>>>> upsert to regionservers was very consistent in ops/sec, and I had no
>>>> issues with it.
>>>>
>>>> Using 13 partitions on a machine that has 32 CPU cores and a 16 GB
>>>> heap, I see anywhere between 3K ops/sec and 82K ops/sec. Here's an
>>>> example of log output I saved that used 130 partitions:
>>>>
>>>> total # partitions:130; partition id:47; rows:232730 elapsed_sec:6.401 ops/sec:36358.38150289017
>>>> total # partitions:130; partition id:100; rows:206890 elapsed_sec:6.636 ops/sec:31176.91380349608
>>>> total # partitions:130; partition id:63; rows:233437 elapsed_sec:7.586 ops/sec:30772.08014764039
>>>> total # partitions:130; partition id:9; rows:232585 elapsed_sec:32.985 ops/sec:7051.235410034865
>>>> total # partitions:130; partition id:19; rows:234192 elapsed_sec:38.733 ops/sec:6046.3170939508955
>>>> total # partitions:130; partition id:1; rows:232860 elapsed_sec:48.479 ops/sec:4803.316900101075
>>>> total # partitions:130; partition id:125; rows:205334 elapsed_sec:41.911 ops/sec:4899.286583474505
>>>> total # partitions:130; partition id:123; rows:206622 elapsed_sec:42.281 ops/sec:4886.875901705258
>>>> total # partitions:130; partition id:54; rows:232811 elapsed_sec:49.083 ops/sec:4743.210480206996
>>>>
>>>> I use setCacheBlocks(false) and setCaching(5000). Does anyone have
>>>> any insight into how I can make the read performance more
>>>> consistent?
>>>>
>>>> Thanks!
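
---

For reference, below is a minimal sketch of the kind of partitioned scan
James describes, written against the HBase 1.x Java client. The table name
"mytable" and the single-byte key-prefix split are illustrative assumptions
(the thread does not show the actual key scheme); it uses the same
setCacheBlocks(false)/setCaching(5000) settings and the same per-partition
rows/elapsed_sec/ops-per-sec logging as the original post.

import java.io.IOException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;

public class PartitionedScan {

  public static void main(String[] args) throws Exception {
    final int partitions = 20;                            // number of parallel slices
    final TableName table = TableName.valueOf("mytable"); // hypothetical table name

    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf)) {
      ExecutorService pool = Executors.newFixedThreadPool(partitions);
      for (int p = 0; p < partitions; p++) {
        final int id = p;
        pool.submit(() -> scanPartition(conn, table, id, partitions));
      }
      pool.shutdown();
      pool.awaitTermination(1, TimeUnit.HOURS);
    }
  }

  // Scans one equal slice of the key space. Assumes row keys are
  // uniformly distributed over the full range of the first key byte;
  // a real key scheme needs its own split logic.
  static void scanPartition(Connection conn, TableName table, int id, int partitions) {
    byte[] start = (id == 0)
        ? new byte[0] // first slice starts at the beginning of the table
        : new byte[] { (byte) ((256 * id) / partitions) };
    byte[] stop = (id == partitions - 1)
        ? null        // last slice runs to the end of the table
        : new byte[] { (byte) ((256 * (id + 1)) / partitions) };

    Scan scan = new Scan();
    scan.setStartRow(start);
    if (stop != null) {
      scan.setStopRow(stop);
    }
    scan.setCacheBlocks(false); // don't ADD blocks to the block cache;
                                // reads still hit the cache if blocks are already there
    scan.setCaching(5000);      // rows fetched per RPC, as in the original post

    long rows = 0;
    long t0 = System.nanoTime();
    try (Table t = conn.getTable(table);
         ResultScanner scanner = t.getScanner(scan)) {
      for (Result r : scanner) {
        rows++; // no per-row logic, just drain the scanner
      }
    } catch (IOException e) {
      e.printStackTrace();
      return;
    }
    double sec = (System.nanoTime() - t0) / 1e9;
    System.out.printf(
        "total # partitions:%d; partition id:%d; rows:%d elapsed_sec:%.3f ops/sec:%.2f%n",
        partitions, id, rows, sec, rows / sec);
  }
}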

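And a small sketch of Nicolas's flush-then-major-compact suggestion via the
Admin API, under the same assumptions (HBase 1.x client, hypothetical table
name). Running flush 'mytable' and major_compact 'mytable' from the hbase
shell is equivalent.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class FlushAndCompact {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    TableName table = TableName.valueOf("mytable"); // hypothetical table name

    try (Connection conn = ConnectionFactory.createConnection(conf);
         Admin admin = conn.getAdmin()) {
      // Write memstore contents out to hfiles so every partition's reads
      // go through the same on-disk path, instead of a mix of memstore
      // and hfile reads.
      admin.flush(table);

      // Rewrite each store's hfiles into one, dropping stale versions.
      // Note this call is asynchronous: it returns once the request is
      // submitted, so wait for compaction to finish (e.g. watch the
      // region server UI) before re-running the benchmark.
      admin.majorCompact(table);
    }
  }
}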