James: Another experiment you can try is enabling region replicas (HBASE-10070).
This would bring down the read variance greatly.

> On Mar 25, 2016, at 2:41 AM, Nicolas Liochon <[email protected]> wrote:
>
> The read path is much more complex than the write one, so the response time
> has much more variance.
> The gap is so wide here that I would bet on Ted's or Stack's points, but
> here are a few other sources of variance:
> - hbase cache: as Anoop said, maybe the data is already in the hbase cache
> (setCacheBlocks(false) means "don't add blocks to the cache", not "don't
> use the cache")
> - OS cache: and if the data is not in the HBase cache, maybe it is in the
> operating system cache (for example if you run the test multiple times)
> - data locality: if you're lucky the data is local to the region server. If
> you're not, the reads need an extra network hop.
> - number of stores: more hfiles/stores per region => slower reads.
> - number of versions and so on: sub case of the previous one: if the rows
> have been updated multiple times and the compaction has not run yet, you
> will read much more data.
> - (another subcase): the data has not been flushed yet and is available in
> the memstore => fast read.
>
> None of these points has any importance for the write path. Basically
> the write variance says nothing about the variance you will get on the
> reads.
>
> IIRC, locality and number of stores are visible in the HBase UI. Doing a table
> flush and then running a major compaction generally helps to stabilize
> response time when you do a test. But it should not explain the x25 you're
> seeing, there is something else somewhere else. I don't get the
> regionserver boundaries you're mentioning: there is no boundary between
> regionservers. A regionserver can host A->D and M->S while another hosts
> D->M and S->Z for example.
>
>> On Fri, Mar 25, 2016 at 6:51 AM, Anoop John <[email protected]> wrote:
>>
>> I see you set cacheBlocks to be false on the Scan. By any chance on
>> some other RS(s), the data you are looking for is already in cache?
>> (Any previous scan or by cache on write) And there are no concurrent
>> writes anyway, right? This much difference in time! One
>> possibility is blocks available or not available in cache.
>>
>> -Anoop-
>>
>>> On Fri, Mar 25, 2016 at 11:04 AM, Stack <[email protected]> wrote:
>>> On Thu, Mar 24, 2016 at 4:45 PM, James Johansville <[email protected]> wrote:
>>>
>>>> Hello all,
>>>>
>>>> So, I wrote a Java application for HBase that does a partitioned full-table
>>>> scan according to a set number of partitions. For example, if there are 20
>>>> partitions specified, then 20 separate full scans are launched that cover
>>>> an equal slice of the row identifier range.
>>>>
>>>> The rows are uniformly distributed throughout the RegionServers.
>>>
>>> How many RegionServers? How many Regions? Are Regions evenly distributed
>>> across the servers? If you put all partitions on one machine and then run
>>> your client, do the timings even out?
>>>
>>> The disparity seems really wide.
>>>
>>> St.Ack
>>>
>>>> I
>>>> confirmed this through the hbase shell. I have only one column family, and
>>>> each row has the same number of column qualifiers.
>>>>
>>>> My problem is that the individual scan performance is wildly inconsistent
>>>> even though they fetch approximately a similar number of rows. This
>>>> inconsistency appears to be random with respect to hosts or regionservers
>>>> or partitions or CPU cores. I am the only user of the fleet and not running
>>>> any other concurrent HBase operation.
>>>>
>>>> I started measuring from the beginning of the scan and stopped measuring
>>>> after the scan was completed. I am not doing any logic with the results,
>>>> just scanning them.
>>>>
>>>> For ~230K rows fetched per scan, I am getting anywhere from 4 seconds to
>>>> 100+ seconds. This seems a little too bouncy for me. Does anyone have any
>>>> insight? By comparison, a similar utility I wrote to upsert to
>>>> regionservers was very consistent in ops/sec and I had no issues with it.
>>>>
>>>> Using 13 partitions on a machine that has 32 CPU cores and 16 GB heap, I
>>>> see anywhere between 3K ops/sec to 82K ops/sec. Here's an example of log
>>>> output I saved that used 130 partitions.
>>>>
>>>> total # partitions:130; partition id:47; rows:232730 elapsed_sec:6.401
>>>> ops/sec:36358.38150289017
>>>> total # partitions:130; partition id:100; rows:206890 elapsed_sec:6.636
>>>> ops/sec:31176.91380349608
>>>> total # partitions:130; partition id:63; rows:233437 elapsed_sec:7.586
>>>> ops/sec:30772.08014764039
>>>> total # partitions:130; partition id:9; rows:232585 elapsed_sec:32.985
>>>> ops/sec:7051.235410034865
>>>> total # partitions:130; partition id:19; rows:234192 elapsed_sec:38.733
>>>> ops/sec:6046.3170939508955
>>>> total # partitions:130; partition id:1; rows:232860 elapsed_sec:48.479
>>>> ops/sec:4803.316900101075
>>>> total # partitions:130; partition id:125; rows:205334 elapsed_sec:41.911
>>>> ops/sec:4899.286583474505
>>>> total # partitions:130; partition id:123; rows:206622 elapsed_sec:42.281
>>>> ops/sec:4886.875901705258
>>>> total # partitions:130; partition id:54; rows:232811 elapsed_sec:49.083
>>>> ops/sec:4743.210480206996
>>>>
>>>> I use setCacheBlocks(false), setCaching(5000). Does anyone have any
>>>> insight into how I can make the read performance more consistent?
>>>>
>>>> Thanks!
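
For anyone who wants to put a number on the spread in the log excerpt quoted above, here is a minimal sketch in plain Java (no HBase dependency). It parses the nine sample records, recomputes rows / elapsed_sec for each, and reports the ratio between the fastest and slowest partition. The class name ScanLogStats and its helper methods are made up for this illustration; they are not part of James's utility or of HBase, and the regex assumes each record has been rejoined onto a single line.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ScanLogStats {
    // One record of the quoted log, after removing quote markers and line wraps.
    private static final Pattern RECORD = Pattern.compile(
            "partition id:(\\d+); rows:(\\d+) elapsed_sec:([0-9.]+)\\s+ops/sec:([0-9.]+)");

    // The nine sample records copied from the original message.
    static final String SAMPLE = String.join("\n",
            "total # partitions:130; partition id:47; rows:232730 elapsed_sec:6.401 ops/sec:36358.38150289017",
            "total # partitions:130; partition id:100; rows:206890 elapsed_sec:6.636 ops/sec:31176.91380349608",
            "total # partitions:130; partition id:63; rows:233437 elapsed_sec:7.586 ops/sec:30772.08014764039",
            "total # partitions:130; partition id:9; rows:232585 elapsed_sec:32.985 ops/sec:7051.235410034865",
            "total # partitions:130; partition id:19; rows:234192 elapsed_sec:38.733 ops/sec:6046.3170939508955",
            "total # partitions:130; partition id:1; rows:232860 elapsed_sec:48.479 ops/sec:4803.316900101075",
            "total # partitions:130; partition id:125; rows:205334 elapsed_sec:41.911 ops/sec:4899.286583474505",
            "total # partitions:130; partition id:123; rows:206622 elapsed_sec:42.281 ops/sec:4886.875901705258",
            "total # partitions:130; partition id:54; rows:232811 elapsed_sec:49.083 ops/sec:4743.210480206996");

    /** Parses each record into {partitionId, rows, elapsedSec, loggedOpsPerSec}. */
    static List<double[]> parse(String log) {
        List<double[]> out = new ArrayList<>();
        Matcher m = RECORD.matcher(log);
        while (m.find()) {
            out.add(new double[] {
                    Double.parseDouble(m.group(1)),
                    Double.parseDouble(m.group(2)),
                    Double.parseDouble(m.group(3)),
                    Double.parseDouble(m.group(4))});
        }
        return out;
    }

    /** Ratio between the fastest and slowest recomputed throughput (rows / elapsed seconds). */
    static double spread(List<double[]> records) {
        double min = Double.POSITIVE_INFINITY;
        double max = 0.0;
        for (double[] r : records) {
            double ops = r[1] / r[2];
            min = Math.min(min, ops);
            max = Math.max(max, ops);
        }
        return max / min;
    }

    public static void main(String[] args) {
        List<double[]> records = parse(SAMPLE);
        for (double[] r : records) {
            System.out.printf("partition %3d: %8.1f rows/sec (logged %8.1f)%n",
                    (int) r[0], r[1] / r[2], r[3]);
        }
        System.out.printf("fastest/slowest = %.2fx%n", spread(records));
    }
}
```

On these nine records the fastest/slowest ratio comes out at roughly 7.7x (6.4 s versus 49 s for a similar row count). That is smaller than the 4 s to 100+ s range James mentions for the full run, since this is only the saved excerpt.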
