Spinning back around here, it seems my configuration change helped, but hasn't solved the problem. Jobs are no longer dying from RPC timeouts but I still see significant RPC latency spikes associated with SYSTEM.CATALOG. Hopefully I can make time to investigate further next week.
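When I pick this back up, the kind of visibility I keep wishing for is roughly the following. The metaDataCache in GlobalCache is a Guava cache, so if it were built with recordStats() we could log hit rates and eviction counts directly. This is only an untested sketch with stand-in key/value types (not the actual Phoenix code); the class and key names below are made up for illustration:

    import java.nio.charset.StandardCharsets;

    import com.google.common.cache.Cache;
    import com.google.common.cache.CacheBuilder;
    import com.google.common.cache.CacheStats;
    import com.google.common.cache.Weigher;

    // Untested sketch only. Key/value types are stand-ins for the real
    // ImmutableBytesPtr / PMetaDataEntity pair used by GlobalCache.
    public class MetaDataCacheStatsSketch {
        public static void main(String[] args) {
            long maxWeightBytes = 100L * 1024 * 1024; // e.g. the 100mb I bumped maxMetaDataCacheSize to

            Cache<String, byte[]> metaDataCache = CacheBuilder.newBuilder()
                    .maximumWeight(maxWeightBytes)
                    .weigher(new Weigher<String, byte[]>() {
                        @Override
                        public int weigh(String key, byte[] value) {
                            return value.length; // weigh entries by their estimated size in bytes
                        }
                    })
                    .recordStats() // without this, Cache.stats() reports all zeros
                    .build();

            metaDataCache.put("SCHEMA.USER_TABLE", "serialized PTable + stats".getBytes(StandardCharsets.UTF_8));
            metaDataCache.getIfPresent("SCHEMA.USER_TABLE");  // hit
            metaDataCache.getIfPresent("SCHEMA.OTHER_TABLE"); // miss

            // The kind of line I'd like the coprocessor to emit periodically, or per eviction burst:
            CacheStats stats = metaDataCache.stats();
            System.out.println("metaDataCache entries=" + metaDataCache.size()
                    + " hitRate=" + stats.hitRate()
                    + " evictionCount=" + stats.evictionCount());
        }
    }

Even one periodic log line like that from MetaDataEndpointImpl would have saved a lot of guessing this week.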
@Arun did you gain any more insight into these symptoms on your side?

On Mon, Feb 29, 2016 at 5:03 PM, Nick Dimiduk <[email protected]> wrote:

>> Is 1000 a good default?
>
> I'm sure it depends a lot on one's workload.
>
> I added some debug logging around the metaDataCache and acquisition of the rowlock. Checking into the one host with excessive RPC call time, I do indeed see MetaDataEndpointImpl logging cache evictions happening frequently. Looks like the estimatedSize of the stats for one of my tables is pushing 11mb and another table is not far behind. I bumped the value of phoenix.coprocessor.maxMetaDataCacheSize to 100mb and will let that soak for a couple of days.
>
> Let's get in some extra debug logging folks can enable to see what's going on in there; there's currently no visibility (stats or logging) around this cache. Maybe stats would be better? Better still would be a cache that can dynamically resize to accommodate increasing table (stats) sizes and/or an increasing number of tables. I also wonder if it's worth pinning SYSTEM.CATALOG and SYSTEM.STATS to the same host and short-circuiting the RPC call between them. At the very least, let's cache the HBase Connection object so we're not recreating it with each stats table lookup.
>
> Thanks,
> Nick
>
>> On Wed, Feb 17, 2016 at 9:42 AM, Nick Dimiduk <[email protected]> wrote:
>>
>>> Thanks for the context Arun.
>>>
>>> For what it's worth, I greatly increased the batch size (from the default 1,000 to 500,000), which I believe reduced contention on the lock and allowed ingest to catch up.
>>>
>>> On Tue, Feb 16, 2016 at 9:14 PM, Thangamani, Arun <[email protected]> wrote:
>>>
>>>> Sorry, I had pressed Control + Enter a little earlier than I wanted to; corrections inline. Thanks
>>>>
>>>> From: "Thangamani, Arun" <[email protected]>
>>>> Reply-To: "[email protected]" <[email protected]>
>>>> Date: Tuesday, February 16, 2016 at 8:38 PM
>>>> To: "[email protected]" <[email protected]>
>>>> Cc: James Taylor <[email protected]>, Lars Hofhansl <[email protected]>
>>>> Subject: Re: Write path blocked by MetaDataEndpoint acquiring region lock
>>>>
>>>> Hey Nick,
>>>>
>>>> Looks like you are failing to find your table in the metadata cache. If we don't find it in the metadata cache, we end up rebuilding the metadata from both the SYSTEM.CATALOG and SYSTEM.STATS tables. The rebuilding process for the metadata is a scan on both tables.
>>>>
>>>> So we will end up going to ZooKeeper to find the region and execute the scan on the region; it is an expensive operation, which explains the calls to a specific region server and to ZooKeeper.
>>>>
>>>> A similar thing happens in PHOENIX-2607, but that is specifically related to timestamps: if the client timestamp is less than or equal to the stats timestamp of the table, we will end up doing the above rebuilding process repeatedly for every batch of inserts from PhoenixMapReduceUtil (the default batch size is 1,000 rows). I don't believe you have a timestamp issue, but it looks like you have the same repeated lookups and related rebuilding scans that happen in PHOENIX-2607.
>>>>
>>>> James has suggested a workaround for the metadata cache refresh using the UPDATE_CACHE_FREQUENCY property while defining the table, and it will probably help (we are trying it out for the timestamp issue).
>>>>
>>>> Hope this helps.
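>>>> For clarity, the DDL-level workaround James is suggesting looks roughly like this from JDBC. This is only a sketch: the quorum, schema, table, and frequency below are made up, and it requires a Phoenix release new enough to support the UPDATE_CACHE_FREQUENCY property.
>>>>
>>>> import java.sql.Connection;
>>>> import java.sql.DriverManager;
>>>> import java.sql.Statement;
>>>>
>>>> public class UpdateCacheFrequencySketch {
>>>>     public static void main(String[] args) throws Exception {
>>>>         // Made-up quorum, schema, and table names; adjust for your cluster.
>>>>         try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zk-host");
>>>>              Statement stmt = conn.createStatement()) {
>>>>             // Let clients reuse their cached table metadata for 15 minutes instead of
>>>>             // going back to SYSTEM.CATALOG on every statement/commit.
>>>>             stmt.execute("CREATE TABLE IF NOT EXISTS my_schema.events ("
>>>>                     + " id VARCHAR PRIMARY KEY, val VARCHAR)"
>>>>                     + " UPDATE_CACHE_FREQUENCY = 900000");
>>>>             // For a table that already exists:
>>>>             stmt.execute("ALTER TABLE my_schema.events SET UPDATE_CACHE_FREQUENCY = 900000");
>>>>         }
>>>>     }
>>>> }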
>>>>
>>>> Thanks
>>>> Arun
>>>>
>>>> Please look at the following lines in MetaDataEndpointImpl:
>>>>
>>>> private PTable doGetTable(byte[] key, long clientTimeStamp, RowLock rowLock) throws IOException, SQLException {
>>>>     ImmutableBytesPtr cacheKey = new ImmutableBytesPtr(key);
>>>>     Cache<ImmutableBytesPtr, PMetaDataEntity> metaDataCache =
>>>>             GlobalCache.getInstance(this.env).getMetaDataCache();
>>>>     PTable table = (PTable)metaDataCache.getIfPresent(cacheKey);
>>>>     // We only cache the latest, so we'll end up building the table with every call if the
>>>>     // client connection has specified an SCN.
>>>>     // TODO: If we indicate to the client that we're returning an older version, but there's a
>>>>     // newer version available, the client
>>>>     // can safely not call this, since we only allow modifications to the latest.
>>>>     if (table != null && table.getTimeStamp() < clientTimeStamp) {
>>>>         // Table on client is up-to-date with table on server, so just return
>>>>         if (isTableDeleted(table)) {
>>>>             return null;
>>>>         }
>>>>         return table;
>>>>     }
>>>>     // Ask Lars about the expense of this call - if we don't take the lock, we still won't get
>>>>     // partial results
>>>>     // get the co-processor environment
>>>>     // TODO: check that key is within region.getStartKey() and region.getEndKey()
>>>>     // and return special code to force client to lookup region from meta.
>>>>     Region region = env.getRegion();
>>>>     /*
>>>>      * Lock directly on key, though it may be an index table. This will just prevent a table
>>>>      * from getting rebuilt too often.
>>>>      */
>>>>     final boolean wasLocked = (rowLock != null);
>>>>     if (!wasLocked) {
>>>>         rowLock = region.getRowLock(key, true);
>>>>         if (rowLock == null) {
>>>>             throw new IOException("Failed to acquire lock on " + Bytes.toStringBinary(key));
>>>>         }
>>>>     }
>>>>     try {
>>>>         // Try cache again in case we were waiting on a lock
>>>>         table = (PTable)metaDataCache.getIfPresent(cacheKey);
>>>>         // We only cache the latest, so we'll end up building the table with every call if the
>>>>         // client connection has specified an SCN.
>>>>         // TODO: If we indicate to the client that we're returning an older version, but there's
>>>>         // a newer version available, the client
>>>>         // can safely not call this, since we only allow modifications to the latest.
>>>>         if (table != null && table.getTimeStamp() < clientTimeStamp) {
>>>>             // Table on client is up-to-date with table on server, so just return
>>>>             if (isTableDeleted(table)) {
>>>>                 return null;
>>>>             }
>>>>             return table;
>>>>         }
>>>>         // Query for the latest table first, since it's not cached
>>>>         table = buildTable(key, cacheKey, region, HConstants.LATEST_TIMESTAMP);
>>>>         if (table != null && table.getTimeStamp() < clientTimeStamp) {
>>>>             return table;
>>>>         }
>>>>         // Otherwise, query for an older version of the table - it won't be cached
>>>>         return buildTable(key, cacheKey, region, clientTimeStamp);
>>>>     } finally {
>>>>         if (!wasLocked) rowLock.release();
>>>>     }
>>>> }
>>>>
>>>> From: Nick Dimiduk <[email protected]>
>>>> Reply-To: "[email protected]" <[email protected]>
>>>> Date: Tuesday, February 16, 2016 at 5:58 PM
>>>> To: "[email protected]" <[email protected]>
>>>> Cc: James Taylor <[email protected]>, Lars Hofhansl <[email protected]>
>>>> Subject: Write path blocked by MetaDataEndpoint acquiring region lock
>>>>
>>>> Hello,
>>>>
>>>> I have a high throughput ingest pipeline that's seized up.
>>>> My ingest application ultimately crashes with the following stack trace [0].
>>>>
>>>> Independently, I noticed that the RPC call time of one of the machines was significantly higher than the others (95th percentile at multiple seconds vs. tens of ms). I grabbed the RS log and a couple of jstacks from the process. In the logs I see handler threads creating ZK connections excessively (~50 INFO lines per second). The jstacks show handler threads parked while taking region row locks, calling HRegion.getRowLockInternal() via MetaDataEndpointImpl.doGetTable() [1]. The one handler thread I see that's in the same MetaDataEndpointImpl area but not under lock appears to be making an RPC to read the statistics table [2].
>>>>
>>>> I believe these two occurrences are related.
>>>>
>>>> My working theory is that the metaDataCache object is performing poorly for some reason. This results in excessive metadata lookups, some of which appear to require making an RPC call while under lock.
>>>>
>>>> What can I do to relieve pressure on this rowlock? Looking at the code around the lock, it looks like it's populating a connection-level cache of schema metadata. The host with the high RPC call time is hosting the SYSTEM.CATALOG table region. I see some configuration settings that may be related (e.g., phoenix.coprocessor.maxMetaDataCacheSize), but I see no way to get debug information about cache size or evictions from this Guava cache instance.
>>>>
>>>> I'll be digging into this further, but I appreciate any pointers you may have.
>>>>
>>>> BTW, this is Phoenix 4.6.0 + HBase 1.1.2.
>>>>
>>>> Thanks a lot,
>>>> -n
>>>>
>>>> [0]: client-side stack
>>>> Caused by: org.apache.phoenix.exception.PhoenixIOException: Interrupted calling coprocessor service org.apache.phoenix.coprocessor.generated.MetaDataProtos$MetaDataService for row \x00<schema>\x00<user_table>
>>>>     at org.apache.phoenix.util.ServerUtil.parseServerException(ServerUtil.java:108)
>>>>     at org.apache.phoenix.query.ConnectionQueryServicesImpl.metaDataCoprocessorExec(ConnectionQueryServicesImpl.java:1053)
>>>>     at org.apache.phoenix.query.ConnectionQueryServicesImpl.metaDataCoprocessorExec(ConnectionQueryServicesImpl.java:1016)
>>>>     at org.apache.phoenix.query.ConnectionQueryServicesImpl.getTable(ConnectionQueryServicesImpl.java:1289)
>>>>     at org.apache.phoenix.schema.MetaDataClient.updateCache(MetaDataClient.java:446)
>>>>     at org.apache.phoenix.schema.MetaDataClient.updateCache(MetaDataClient.java:389)
>>>>     at org.apache.phoenix.schema.MetaDataClient.updateCache(MetaDataClient.java:385)
>>>>     at org.apache.phoenix.execute.MutationState.validate(MutationState.java:369)
>>>>     at org.apache.phoenix.execute.MutationState.commit(MutationState.java:417)
>>>>     at org.apache.phoenix.jdbc.PhoenixConnection$3.call(PhoenixConnection.java:482)
>>>>     at org.apache.phoenix.jdbc.PhoenixConnection$3.call(PhoenixConnection.java:479)
>>>>     at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
>>>>     at org.apache.phoenix.jdbc.PhoenixConnection.commit(PhoenixConnection.java:479)
>>>>     at org.apache.phoenix.mapreduce.PhoenixRecordWriter.write(PhoenixRecordWriter.java:84)
>>>>
>>>> [1]: rs handlers blocked stack
>>>> "B.defaultRpcServer.handler=48,queue=3,port=16020" #91 daemon prio=5 os_prio=0 tid=0x00007ff4ce458000 nid=0xca0a waiting on condition [0x00007ff47a607000]
>>>>    java.lang.Thread.State: TIMED_WAITING (parking)
>>>>     at sun.misc.Unsafe.park(Native Method)
>>>>     - parking to wait for <0x0000000708ef59c0> (a java.util.concurrent.CountDownLatch$Sync)
>>>>     at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
>>>>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
>>>>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
>>>>     at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
>>>>     at org.apache.hadoop.hbase.regionserver.HRegion.getRowLockInternal(HRegion.java:5047)
>>>>     at org.apache.hadoop.hbase.regionserver.HRegion.getRowLock(HRegion.java:5013)
>>>>     at org.apache.phoenix.coprocessor.MetaDataEndpointImpl.doGetTable(MetaDataEndpointImpl.java:2397)
>>>>     at org.apache.phoenix.coprocessor.MetaDataEndpointImpl.doGetTable(MetaDataEndpointImpl.java:2365)
>>>>     at org.apache.phoenix.coprocessor.MetaDataEndpointImpl.getTable(MetaDataEndpointImpl.java:440)
>>>>     at org.apache.phoenix.coprocessor.generated.MetaDataProtos$MetaDataService.callMethod(MetaDataProtos.java:11609)
>>>>     at org.apache.hadoop.hbase.regionserver.HRegion.execService(HRegion.java:7435)
>>>>     at org.apache.hadoop.hbase.regionserver.RSRpcServices.execServiceOnRegion(RSRpcServices.java:1875)
>>>>     at org.apache.hadoop.hbase.regionserver.RSRpcServices.execService(RSRpcServices.java:1857)
>>>>     at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32209)
>>>>     at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2114)
>>>>     at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:101)
>>>>     at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
>>>>     at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
>>>>     at java.lang.Thread.run(Thread.java:745)
>>>>
>>>> [2]: rs handler stats rpc stack
>>>> "B.defaultRpcServer.handler=19,queue=4,port=16020" #62 daemon prio=5 os_prio=0 tid=0x00007ff4ce420000 nid=0xc9ea in Object.wait() [0x00007ff47c323000]
>>>>    java.lang.Thread.State: TIMED_WAITING (on object monitor)
>>>>     at java.lang.Object.wait(Native Method)
>>>>     at java.lang.Object.wait(Object.java:460)
>>>>     at java.util.concurrent.TimeUnit.timedWait(TimeUnit.java:348)
>>>>     at org.apache.hadoop.hbase.client.ResultBoundedCompletionService.poll(ResultBoundedCompletionService.java:155)
>>>>     - locked <0x00000007a3f5e030> (a [Lorg.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture;)
>>>>     at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:168)
>>>>     at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:59)
>>>>     at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)
>>>>     at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:320)
>>>>     at org.apache.hadoop.hbase.client.ClientScanner.loadCache(ClientScanner.java:403)
>>>>     at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:364)
>>>>     at org.apache.phoenix.schema.stats.StatisticsUtil.readStatistics(StatisticsUtil.java:99)
>>>>     at org.apache.phoenix.coprocessor.MetaDataEndpointImpl.getTable(MetaDataEndpointImpl.java:836)
>>>>     at org.apache.phoenix.coprocessor.MetaDataEndpointImpl.buildTable(MetaDataEndpointImpl.java:472)
>>>>     at org.apache.phoenix.coprocessor.MetaDataEndpointImpl.doGetTable(MetaDataEndpointImpl.java:2418)
>>>>     at org.apache.phoenix.coprocessor.MetaDataEndpointImpl.doGetTable(MetaDataEndpointImpl.java:2365)
>>>>     at org.apache.phoenix.coprocessor.MetaDataEndpointImpl.getTable(MetaDataEndpointImpl.java:440)
>>>>     at org.apache.phoenix.coprocessor.generated.MetaDataProtos$MetaDataService.callMethod(MetaDataProtos.java:11609)
>>>>     at org.apache.hadoop.hbase.regionserver.HRegion.execService(HRegion.java:7435)
>>>>     at org.apache.hadoop.hbase.regionserver.RSRpcServices.execServiceOnRegion(RSRpcServices.java:1875)
>>>>     at org.apache.hadoop.hbase.regionserver.RSRpcServices.execService(RSRpcServices.java:1857)
>>>>     at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32209)
>>>>     at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2114)
>>>>     at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:101)
>>>>     at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
>>>>     at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
>>>>     at java.lang.Thread.run(Thread.java:745)
>>>>
>>
>> --
>> Best regards,
>>
>>   - Andy
>>
>> Problems worthy of attack prove their worth by hitting back. - Piet Hein
>> (via Tom White)
