This is basically a read bug/performance problem. The execution path taken once the scanner cache is exhausted does not match the initial execution path (or its performance). Can anyone shed light on this? Were there any changes in 0.94 that could have introduced it (we have not tested other versions)? Any help or advice would be appreciated. As it stands, we are looking at having to reverse engineer every aspect of a read, in both the HBase client and server components, in order to find and fix this.
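
To make the measurement concrete, here is a minimal sketch of the kind of timing loop being described (the table name, start row, and column layout are placeholders rather than our actual schema, and the 0.94-era Java client API is assumed; the same pattern is driven from the Python Thrift client as well):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class ScanTiming {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "mytable");        // placeholder table name
    try {
      Scan scan = new Scan();
      scan.setStartRow(Bytes.toBytes("Row1"));         // placeholder start row
      scan.setCaching(1);                              // one row per RPC, as in our tests
      ResultScanner scanner = table.getScanner(scan);
      try {
        while (true) {
          long start = System.currentTimeMillis();
          Result r = scanner.next();                   // with caching = 1, each next() is a round trip
          long elapsed = System.currentTimeMillis() - start;
          if (r == null) {
            break;
          }
          System.out.println(Bytes.toString(r.getRow()) + ": " + elapsed + " ms");
        }
      } finally {
        scanner.close();
      }
    } finally {
      table.close();
    }
  }
}

With a loop like this, only the first next() (the startRow) comes back in ~12 ms; every subsequent next() takes ~42 ms, exactly as in the scenarios quoted below. Raising the caching (e.g. scan.setCaching(10)) brings the whole set back in ~20 ms.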
One additional lead is that not all rows behave like this. Only certain small rows seem to do this consistently; most of our rows are larger and do not show this behavior.

Thanks.

On Tue, Aug 28, 2012 at 4:35 PM, Jay T <[email protected]> wrote:

> We are running HBase 0.94 with Hadoop 1.0.3. We use Python Thrift to
> talk to HBase.
>
> We are experiencing strange behavior when scanning specific rows from
> HBase (caching is always set to 1 in the scanner). Each of these rows is
> retrieved in ~12 ms if it is the startRow of the scanner. However, if it
> is somewhere in between, it takes ~42 ms to read.
>
> Assuming Row1, Row2, Row3, Row4, ... Row10 are the row keys:
>
> Scenario 1 (scanner starts from Row1):
> ======================================
> Row 1: 12 ms
> Row 2: 42 ms
> Row 3: 42 ms
> ...
>
> Scenario 2 (scanner starts from Row2):
> ======================================
> Row 2: 12 ms
> Row 3: 42 ms
> Row 4: 42 ms
>
> Scenario 3 (scanner starts from Row3):
> ======================================
> Row 3: 12 ms
> Row 4: 42 ms
> Row 5: 42 ms
>
> You can see that Row 1, Row 2, and Row 3 each take ~12 ms when they are
> the startRow of the scanner. However, their performance degrades when
> they are part of a "next" call to the scanner (caching = 1).
>
> This behavior is seen with both the Python Thrift client and the Java
> API.
>
> When the scan caching is increased to, say, 10, then all the rows can be
> retrieved in 20 ms. I understand that setting a higher caching size
> reduces the number of RPC calls. However, there seems to be something
> else at play.
>
> I added log statements to ServerCallable.java, HRegionServer.java, and
> many other files to figure out where the time is lost:
>
> 2012-08-24 18:28:43,147 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: CurrentScanResultSize =
> 34976
>
> 2012-08-24 06:28:43 188 INFO client.ScannerCallable: After client calls
> server
>
> From the above two log statements I cannot figure out where the 41 ms is
> being spent (188 - 147).
>
> Can someone explain to me what is going on after HRegionServer.next
> finishes executing and control goes back to the ScannerCallable.java
> statement:
>
> rrs = server.next(scannerId, caching);
>
> I would greatly appreciate it if someone could help me understand where
> the time might be getting spent.
>
> Thanks,
>
> Jay
