Thanks for the context, Arun. For what it's worth, I greatly increased the batch size (from the default of 1,000 rows to 500,000), which I believe reduced contention on the lock and allowed ingest to catch up.
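For the archives, the change amounts to something like the sketch below. This is an illustration only: it assumes the stock Phoenix 4.x MapReduce setup, where PhoenixConfigurationUtil.setBatchSize() controls how many rows PhoenixRecordWriter buffers before committing; the class and job names are placeholders, and the helper is worth double-checking against your version.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.phoenix.mapreduce.util.PhoenixConfigurationUtil;

public class IngestJobSetup {
    public static Job newIngestJob() throws Exception {
        Configuration conf = new Configuration();
        // Commit every 500,000 rows instead of the default 1,000, so the
        // MetaDataEndpoint round trip that MutationState.validate() makes
        // on each commit happens 500x less often.
        PhoenixConfigurationUtil.setBatchSize(conf, 500000L);
        return Job.getInstance(conf, "phoenix-ingest");
    }
}

For anyone who lands on this thread later: the UPDATE_CACHE_FREQUENCY workaround Arun mentions below would look roughly like the following once you are on a release that supports it (the property ships in Phoenix 4.7+; the ZooKeeper quorum and table name here are placeholders).

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class CacheFrequencyWorkaround {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zk-host");
             Statement stmt = conn.createStatement()) {
            // Let clients trust cached table metadata for 15 minutes rather
            // than calling MetaDataEndpoint.getTable() before every commit.
            stmt.execute("ALTER TABLE MY_SCHEMA.MY_TABLE"
                    + " SET UPDATE_CACHE_FREQUENCY = 900000");
        }
    }
}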
On Tue, Feb 16, 2016 at 9:14 PM, Thangamani, Arun <[email protected]> wrote:

> Sorry I had pressed Control + Enter a little earlier than I wanted to;
> corrections inline. Thanks
>
> From: "Thangamani, Arun" <[email protected]>
> Reply-To: "[email protected]" <[email protected]>
> Date: Tuesday, February 16, 2016 at 8:38 PM
> To: "[email protected]" <[email protected]>
> Cc: James Taylor <[email protected]>, Lars Hofhansl <[email protected]>
> Subject: Re: Write path blocked by MetaDataEndpoint acquiring region lock
>
> Hey Nick,
>
> It looks like your table is not being found in the metadata cache. When it
> isn't found there, we end up rebuilding the metadata from both the
> SYSTEM.CATALOG and SYSTEM.STATS tables. The rebuilding process is a scan
> on both tables.
>
> So we end up going to ZooKeeper to find the region and then executing the
> scan on it. That is an expensive operation, which explains the calls to a
> specific region server and to ZooKeeper.
>
> A similar thing happens in PHOENIX-2607, but that one is specifically
> related to timestamps: if the client timestamp is less than or equal to
> the stats timestamp of the table, we end up repeating the rebuilding
> process above for every batch of inserts from PhoenixMapReduceUtil (the
> default batch size is 1,000 rows). I don't believe you have a timestamp
> issue, but it looks like you are seeing the same repeated lookups and
> rebuild scans that occur in PHOENIX-2607.
>
> James has suggested a workaround for the metadata cache refresh: set the
> UPDATE_CACHE_FREQUENCY property when defining the table. It will probably
> help (we are trying it out for the timestamp issue).
>
> Hope this helps.
>
> Thanks
> Arun
>
> Please look at the following lines in MetaDataEndpointImpl:
>
> private PTable doGetTable(byte[] key, long clientTimeStamp, RowLock rowLock)
>         throws IOException, SQLException {
>     ImmutableBytesPtr cacheKey = new ImmutableBytesPtr(key);
>     Cache<ImmutableBytesPtr, PMetaDataEntity> metaDataCache =
>             GlobalCache.getInstance(this.env).getMetaDataCache();
>     PTable table = (PTable) metaDataCache.getIfPresent(cacheKey);
>     // We only cache the latest, so we'll end up building the table with
>     // every call if the client connection has specified an SCN.
>     // TODO: If we indicate to the client that we're returning an older
>     // version, but there's a newer version available, the client can
>     // safely not call this, since we only allow modifications to the latest.
>     if (table != null && table.getTimeStamp() < clientTimeStamp) {
>         // Table on client is up-to-date with table on server, so just return
>         if (isTableDeleted(table)) {
>             return null;
>         }
>         return table;
>     }
>     // Ask Lars about the expense of this call - if we don't take the lock,
>     // we still won't get partial results
>     // get the co-processor environment
>     // TODO: check that key is within region.getStartKey() and
>     // region.getEndKey() and return special code to force client to lookup
>     // region from meta.
>     Region region = env.getRegion();
>     /*
>      * Lock directly on key, though it may be an index table. This will just
>      * prevent a table from getting rebuilt too often.
>      */
>     final boolean wasLocked = (rowLock != null);
>     if (!wasLocked) {
>         rowLock = region.getRowLock(key, true);
>         if (rowLock == null) {
>             throw new IOException("Failed to acquire lock on "
>                     + Bytes.toStringBinary(key));
>         }
>     }
>     try {
>         // Try cache again in case we were waiting on a lock
>         table = (PTable) metaDataCache.getIfPresent(cacheKey);
>         // We only cache the latest, so we'll end up building the table with
>         // every call if the client connection has specified an SCN.
>         // TODO: If we indicate to the client that we're returning an older
>         // version, but there's a newer version available, the client can
>         // safely not call this, since we only allow modifications to the
>         // latest.
>         if (table != null && table.getTimeStamp() < clientTimeStamp) {
>             // Table on client is up-to-date with table on server, so just
>             // return
>             if (isTableDeleted(table)) {
>                 return null;
>             }
>             return table;
>         }
>         // Query for the latest table first, since it's not cached
>         table = buildTable(key, cacheKey, region, HConstants.LATEST_TIMESTAMP);
>         if (table != null && table.getTimeStamp() < clientTimeStamp) {
>             return table;
>         }
>         // Otherwise, query for an older version of the table - it won't be
>         // cached
>         return buildTable(key, cacheKey, region, clientTimeStamp);
>     } finally {
>         if (!wasLocked) rowLock.release();
>     }
> }
>
> From: Nick Dimiduk <[email protected]>
> Reply-To: "[email protected]" <[email protected]>
> Date: Tuesday, February 16, 2016 at 5:58 PM
> To: "[email protected]" <[email protected]>
> Cc: James Taylor <[email protected]>, Lars Hofhansl <[email protected]>
> Subject: Write path blocked by MetaDataEndpoint acquiring region lock
>
> Hello,
>
> I have a high-throughput ingest pipeline that's seized up. My ingest
> application ultimately crashes with the stack trace below [0].
>
> Independently, I noticed that the RPC call time of one of the machines was
> significantly higher than the others (95th percentile at multiple seconds
> vs. tens of ms). I grabbed the RS log and a couple of jstacks from the
> process. In the logs I see handler threads creating ZK connections
> excessively (~50 INFO lines per second). The jstacks show handler threads
> parked while taking region row locks, calling HRegion.getRowLockInternal()
> via MetaDataEndpointImpl.doGetTable() [1]. The one handler thread I see in
> the same MetaDataEndpointImpl area but not under the lock appears to be
> making an RPC to read the statistics table [2].
>
> I believe these two occurrences are related.
>
> My working theory is that the metaDataCache object is performing poorly
> for some reason. This results in excessive metadata lookups, some of which
> appear to require making an RPC call while holding the lock.
>
> What can I do to relieve pressure on this row lock? Looking at the code
> around the lock, it appears to be populating a connection-level cache of
> schema metadata. The host with the high RPC call time is hosting the
> SYSTEM.CATALOG table region. I see some configuration settings that may be
> related (i.e., phoenix.coprocessor.maxMetaDataCacheSize), but I see no way
> to get debug information about cache size or evictions from this Guava
> cache instance.
>
> I'll be digging into this further, but I appreciate any pointers you may
> have.
>
> BTW, this is Phoenix 4.6.0 + HBase 1.1.2.
>
> Thanks a lot,
> -n
>
> [0]: client-side stack
>
> Caused by: org.apache.phoenix.exception.PhoenixIOException: Interrupted
> calling coprocessor service
> org.apache.phoenix.coprocessor.generated.MetaDataProtos$MetaDataService
> for row \x00<schema>\x00<user_table>
>     at org.apache.phoenix.util.ServerUtil.parseServerException(ServerUtil.java:108)
>     at org.apache.phoenix.query.ConnectionQueryServicesImpl.metaDataCoprocessorExec(ConnectionQueryServicesImpl.java:1053)
>     at org.apache.phoenix.query.ConnectionQueryServicesImpl.metaDataCoprocessorExec(ConnectionQueryServicesImpl.java:1016)
>     at org.apache.phoenix.query.ConnectionQueryServicesImpl.getTable(ConnectionQueryServicesImpl.java:1289)
>     at org.apache.phoenix.schema.MetaDataClient.updateCache(MetaDataClient.java:446)
>     at org.apache.phoenix.schema.MetaDataClient.updateCache(MetaDataClient.java:389)
>     at org.apache.phoenix.schema.MetaDataClient.updateCache(MetaDataClient.java:385)
>     at org.apache.phoenix.execute.MutationState.validate(MutationState.java:369)
>     at org.apache.phoenix.execute.MutationState.commit(MutationState.java:417)
>     at org.apache.phoenix.jdbc.PhoenixConnection$3.call(PhoenixConnection.java:482)
>     at org.apache.phoenix.jdbc.PhoenixConnection$3.call(PhoenixConnection.java:479)
>     at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
>     at org.apache.phoenix.jdbc.PhoenixConnection.commit(PhoenixConnection.java:479)
>     at org.apache.phoenix.mapreduce.PhoenixRecordWriter.write(PhoenixRecordWriter.java:84)
>
> [1]: rs handlers blocked stack
>
> "B.defaultRpcServer.handler=48,queue=3,port=16020" #91 daemon prio=5
> os_prio=0 tid=0x00007ff4ce458000 nid=0xca0a waiting on condition
> [0x00007ff47a607000]
>    java.lang.Thread.State: TIMED_WAITING (parking)
>     at sun.misc.Unsafe.park(Native Method)
>     - parking to wait for <0x0000000708ef59c0> (a java.util.concurrent.CountDownLatch$Sync)
>     at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
>     at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
>     at org.apache.hadoop.hbase.regionserver.HRegion.getRowLockInternal(HRegion.java:5047)
>     at org.apache.hadoop.hbase.regionserver.HRegion.getRowLock(HRegion.java:5013)
>     at org.apache.phoenix.coprocessor.MetaDataEndpointImpl.doGetTable(MetaDataEndpointImpl.java:2397)
>     at org.apache.phoenix.coprocessor.MetaDataEndpointImpl.doGetTable(MetaDataEndpointImpl.java:2365)
>     at org.apache.phoenix.coprocessor.MetaDataEndpointImpl.getTable(MetaDataEndpointImpl.java:440)
>     at org.apache.phoenix.coprocessor.generated.MetaDataProtos$MetaDataService.callMethod(MetaDataProtos.java:11609)
>     at org.apache.hadoop.hbase.regionserver.HRegion.execService(HRegion.java:7435)
>     at org.apache.hadoop.hbase.regionserver.RSRpcServices.execServiceOnRegion(RSRpcServices.java:1875)
>     at org.apache.hadoop.hbase.regionserver.RSRpcServices.execService(RSRpcServices.java:1857)
>     at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32209)
>     at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2114)
>     at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:101)
>     at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
>     at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
>     at java.lang.Thread.run(Thread.java:745)
>
> [2]: rs handler stats rpc stack
>
> "B.defaultRpcServer.handler=19,queue=4,port=16020" #62 daemon prio=5
> os_prio=0 tid=0x00007ff4ce420000 nid=0xc9ea in Object.wait()
> [0x00007ff47c323000]
>    java.lang.Thread.State: TIMED_WAITING (on object monitor)
>     at java.lang.Object.wait(Native Method)
>     at java.lang.Object.wait(Object.java:460)
>     at java.util.concurrent.TimeUnit.timedWait(TimeUnit.java:348)
>     at org.apache.hadoop.hbase.client.ResultBoundedCompletionService.poll(ResultBoundedCompletionService.java:155)
>     - locked <0x00000007a3f5e030> (a [Lorg.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture;)
>     at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:168)
>     at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:59)
>     at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)
>     at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:320)
>     at org.apache.hadoop.hbase.client.ClientScanner.loadCache(ClientScanner.java:403)
>     at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:364)
>     at org.apache.phoenix.schema.stats.StatisticsUtil.readStatistics(StatisticsUtil.java:99)
>     at org.apache.phoenix.coprocessor.MetaDataEndpointImpl.getTable(MetaDataEndpointImpl.java:836)
>     at org.apache.phoenix.coprocessor.MetaDataEndpointImpl.buildTable(MetaDataEndpointImpl.java:472)
>     at org.apache.phoenix.coprocessor.MetaDataEndpointImpl.doGetTable(MetaDataEndpointImpl.java:2418)
>     at org.apache.phoenix.coprocessor.MetaDataEndpointImpl.doGetTable(MetaDataEndpointImpl.java:2365)
>     at org.apache.phoenix.coprocessor.MetaDataEndpointImpl.getTable(MetaDataEndpointImpl.java:440)
>     at org.apache.phoenix.coprocessor.generated.MetaDataProtos$MetaDataService.callMethod(MetaDataProtos.java:11609)
>     at org.apache.hadoop.hbase.regionserver.HRegion.execService(HRegion.java:7435)
>     at org.apache.hadoop.hbase.regionserver.RSRpcServices.execServiceOnRegion(RSRpcServices.java:1875)
>     at org.apache.hadoop.hbase.regionserver.RSRpcServices.execService(RSRpcServices.java:1857)
>     at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32209)
>     at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2114)
>     at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:101)
>     at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
>     at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
>     at java.lang.Thread.run(Thread.java:745)
