Hi,

We are using Phoenix 4 and HBase 0.98. We have just started loading HBase with 300 million records per hour via the Phoenix MR bulk loader. After around 8-9 hours (with 2.5 billion records in HBase), CPU utilization on all of our region servers is at almost 100%, making the cluster unusable. All of our queries are now hanging and we are not getting any response.

I took a thread dump of one of the region servers (attached) and found the following with respect to Phoenix. A similar issue seems to have been raised by someone else in https://issues.apache.org/jira/browse/PHOENIX-1081 . I am not sure whether it is exactly the same, but it looks similar. Could you please look into this and let us know the cause?

Thread 73953: (state = IN_JAVA)
 - org.apache.phoenix.filter.SkipScanFilter.navigate(byte[], int, int, org.apache.phoenix.filter.SkipScanFilter$Terminate) @bci=630, line=341 (Compiled frame; information may be imprecise)
 - org.apache.phoenix.filter.SkipScanFilter.filterKeyValue(org.apache.hadoop.hbase.Cell) @bci=22, line=116 (Compiled frame)
 - org.apache.hadoop.hbase.regionserver.ScanQueryMatcher.match(org.apache.hadoop.hbase.KeyValue) @bci=594, line=392 (Compiled frame)
 - org.apache.hadoop.hbase.regionserver.StoreScanner.next(java.util.List, int) @bci=240, line=469 (Compiled frame)
 - org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(java.util.List, int) @bci=20, line=140 (Compiled frame)
 - org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(java.util.List, org.apache.hadoop.hbase.regionserver.KeyValueHeap, int, byte[], int, short) @bci=10, line=3848 (Compiled frame)
 - org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(java.util.List, int) @bci=253, line=3928 (Compiled frame)
 - org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(java.util.List, int) @bci=12, line=3796 (Compiled frame)
 - org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(java.util.List) @bci=6, line=3787 (Compiled frame)
 - org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver.scanUnordered(org.apache.hadoop.hbase.coprocessor.ObserverContext, org.apache.hadoop.hbase.client.Scan, org.apache.hadoop.hbase.regionserver.RegionScanner, java.util.List, org.apache.phoenix.expression.aggregator.ServerAggregators, long) @bci=230, line=386 (Compiled frame)
 - org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver.doPostScannerOpen(org.apache.hadoop.hbase.coprocessor.ObserverContext, org.apache.hadoop.hbase.client.Scan, org.apache.hadoop.hbase.regionserver.RegionScanner) @bci=184, line=133 (Interpreted frame)
 - org.apache.phoenix.coprocessor.BaseScannerRegionObserver.postScannerOpen(org.apache.hadoop.hbase.coprocessor.ObserverContext, org.apache.hadoop.hbase.client.Scan, org.apache.hadoop.hbase.regionserver.RegionScanner) @bci=4, line=66 (Interpreted frame)
 - org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.postScannerOpen(org.apache.hadoop.hbase.client.Scan, org.apache.hadoop.hbase.regionserver.RegionScanner) @bci=86, line=1663 (Compiled frame)
 - org.apache.hadoop.hbase.regionserver.HRegionServer.scan(com.google.protobuf.RpcController, org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ScanRequest) @bci=459, line=3093 (Compiled frame)
 - org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(com.google.protobuf.Descriptors$MethodDescriptor, com.google.protobuf.RpcController, com.google.protobuf.Message) @bci=103, line=28861 (Interpreted frame)
 - org.apache.hadoop.hbase.ipc.RpcServer.call(com.google.protobuf.BlockingService, com.google.protobuf.Descriptors$MethodDescriptor, com.google.protobuf.Message, org.apache.hadoop.hbase.CellScanner, long, org.apache.hadoop.hbase.monitoring.MonitoredRPCHandler) @bci=59, line=2008 (Interpreted frame)
 - org.apache.hadoop.hbase.ipc.CallRunner.run() @bci=257, line=92 (Interpreted frame)
 - org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(java.util.concurrent.BlockingQueue) @bci=18, line=160 (Interpreted frame)
 - org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(org.apache.hadoop.hbase.ipc.SimpleRpcScheduler, java.util.concurrent.BlockingQueue) @bci=2, line=38 (Interpreted frame)
 - org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run() @bci=8, line=110 (Interpreted frame)
 - java.lang.Thread.run() @bci=11, line=744 (Interpreted frame)

Thanks and Regards,
Gagan Agrawal
