On Thu, Jan 19, 2012 at 12:57 AM, T Vinod Gupta <[email protected]> wrote:
> The stack traces are from the client.
> hbase version - 0.90.3-cdh3u1, r, Mon Jul 18 08:23:50 PDT 2011
>
> java version "1.6.0_20"
> OpenJDK Runtime Environment (IcedTea6 1.9.8)
> (amazon-52.1.9.8.36.amzn1-x86_64)
> OpenJDK 64-Bit Server VM (build 19.0-b09, mixed mode)
>

The stack traces never move on from their current locations no matter
how often you thread dump? Have you tried a more recent Oracle JVM?

St.Ack

>
> So I'll tell you the symptoms.. My writes are pretty smooth and don't face
> problems.. this datapath typically involves row gets (with no filters and
> such) and row puts and flushes in multiple threads (about 11 threads per
> process and I have 2 such processes). It's the bursts in my read codepath
> that cause these symptoms. In the read codepath, I am doing scans (with
> start/end row and regex qualifier filter) and row/column gets in multiple
> threads (about 11 threads per process, 2 such processes). Whenever there is
> a burst, I see this happening. Other accompanying events are the region
> server shutting down since the master thinks it is offline. Sometimes,
> around this sequence of events, I see long gc pauses (5-40 sec range). I'm
> following the gc recommendations of the hbase book. There are 33 regions
> currently, max region file size set to 2GB, 3GB heap given to RS and master
> each.
>
> thanks
>
> On Wed, Jan 18, 2012 at 8:55 PM, Stack <[email protected]> wrote:
>
> > On Wed, Jan 18, 2012 at 10:37 AM, T Vinod Gupta <[email protected]> wrote:
> > > Has anyone seen this kind of behavior? I am wondering what is triggering
> > > it and how I can resolve it.
> > > I have a multithreaded app which is reading stuff out of hbase. I recently
> > > made a change to switch from using Get for each row I am interested in to
> > > using Scan with row key ranges.. What I see is that the process will pause
> > > forever and start taking up a lot of cpu (100% on a 4 core machine - so
> > > effectively 25%).
> > > all stacks look like this -
> > >
> >
> > In the client or over on the regionserver (below seems to be from
> > client)? What version of hbase? What version of jvm?
> > St.Ack
> >
> > > "pool-1-thread-13" prio=10 tid=0x00007fa094013000 nid=0x56b7 waiting for monitor entry [0x00007fa0e4c56000]
> > >    java.lang.Thread.State: BLOCKED (on object monitor)
> > >         at java.util.TreeMap.put(TreeMap.java:571)
> > >         at org.apache.hadoop.hbase.client.Result.getNoVersionMap(Result.java:360)
> > >
> > > "pool-1-thread-11" prio=10 tid=0x00007fa0ec284000 nid=0x1c7 in Object.wait() [0x00007fa0e5969000]
> > >    java.lang.Thread.State: WAITING (on object monitor)
> > >         at java.lang.Object.wait(Native Method)
> > >         - waiting on <0x00000000f026f3b0> (a org.apache.hadoop.hbase.ipc.HBaseClient$Call)
> > >         at java.lang.Object.wait(Object.java:502)
> > >         at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:757)
> > >         - locked <0x00000000f026f3b0> (a org.apache.hadoop.hbase.ipc.HBaseClient$Call)
> > >         at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:257)
> > >         at $Proxy4.get(Unknown Source)
> > >         at org.apache.hadoop.hbase.client.HTable$4.call(HTable.java:549)
> > >         at org.apache.hadoop.hbase.client.HTable$4.call(HTable.java:547)
> > >         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithRetries(HConnectionManager.java:1000)
> > >         at org.apache.hadoop.hbase.client.HTable.get(HTable.java:546)
> > >
> > > "pool-1-thread-10" prio=10 tid=0x00007fa0ec282000 nid=0x1c6 waiting for monitor entry [0x00007fa0e5a6a000]
> > >    java.lang.Thread.State: BLOCKED (on object monitor)
> > >         at org.apache.hadoop.hbase.KeyValue.split(KeyValue.java:1048)
> > >         at org.apache.hadoop.hbase.client.Result.getMap(Result.java:309)
> > >         at org.apache.hadoop.hbase.client.Result.getNoVersionMap(Result.java:345)
> > >
> > > "pool-1-thread-6" prio=10 tid=0x00007fa0ec21f000 nid=0x1c2 in Object.wait() [0x00007fa0e5e6d000]
> > >    java.lang.Thread.State: WAITING (on object monitor)
> > >         at java.lang.Object.wait(Native Method)
> > >         - waiting on <0x00000000f0766b10> (a org.apache.hadoop.hbase.ipc.HBaseClient$Call)
> > >         at java.lang.Object.wait(Object.java:502)
> > >         at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:757)
> > >         - locked <0x00000000f0766b10> (a org.apache.hadoop.hbase.ipc.HBaseClient$Call)
> > >         at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:257)
> > >         at $Proxy4.next(Unknown Source)
> > >         at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:79)
> > >         at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:38)
> > >         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithRetries(HConnectionManager.java:1000)
> > >         at org.apache.hadoop.hbase.client.HTable$ClientScanner.next(HTable.java:1100)
> > >         at org.apache.hadoop.hbase.client.HTable$ClientScanner$1.hasNext(HTable.java:1210)
> > >
> > >
> > > The only thing I observed in the RS logs is there were some compactions
> > > going on. And the GC logs did show some pauses as high as 5sec. but it is
> > > past that state but the app still doesn't make progress.. any insights?
> > >
> > > thanks
> > >
