Thanks for the context, Arun. For what it's worth, I greatly increased the batch size (from the default of 1,000 rows to 500,000), which I believe reduced contention on the lock and allowed ingest to catch up.
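For the archives, the change amounts to something like the sketch below. This is an illustration only: it assumes the stock Phoenix 4.x MapReduce setup, where PhoenixConfigurationUtil.setBatchSize() controls how many rows PhoenixRecordWriter buffers before committing; the class and job names are placeholders, and the helper is worth double-checking against your version.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.phoenix.mapreduce.util.PhoenixConfigurationUtil;

public class IngestJobSetup {
    public static Job newIngestJob() throws Exception {
        Configuration conf = new Configuration();
        // Commit every 500,000 rows instead of the default 1,000, so the
        // MetaDataEndpoint round trip that MutationState.validate() makes
        // on each commit happens 500x less often.
        PhoenixConfigurationUtil.setBatchSize(conf, 500000L);
        return Job.getInstance(conf, "phoenix-ingest");
    }
}

For anyone who lands on this thread later: the UPDATE_CACHE_FREQUENCY workaround Arun mentions below would look roughly like the following once you are on a release that supports it (the property ships in Phoenix 4.7+; the ZooKeeper quorum and table name here are placeholders).

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class CacheFrequencyWorkaround {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zk-host");
             Statement stmt = conn.createStatement()) {
            // Let clients trust cached table metadata for 15 minutes rather
            // than calling MetaDataEndpoint.getTable() before every commit.
            stmt.execute("ALTER TABLE MY_SCHEMA.MY_TABLE"
                    + " SET UPDATE_CACHE_FREQUENCY = 900000");
        }
    }
}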
On Tue, Feb 16, 2016 at 9:14 PM, Thangamani, Arun <[email protected]> wrote:

> Sorry I had pressed Control + Enter a little earlier than I wanted to;
> corrections inline. Thanks
>
> From: "Thangamani, Arun" <[email protected]>
> Reply-To: "[email protected]" <[email protected]>
> Date: Tuesday, February 16, 2016 at 8:38 PM
> To: "[email protected]" <[email protected]>
> Cc: James Taylor <[email protected]>, Lars Hofhansl <[email protected]>
> Subject: Re: Write path blocked by MetaDataEndpoint acquiring region lock
>
> Hey Nick,
>
> It looks like your table is not being found in the metadata cache. When it
> isn't found there, we end up rebuilding the metadata from both the
> SYSTEM.CATALOG and SYSTEM.STATS tables. The rebuilding process is a scan
> on both tables.
>
> So we end up going to ZooKeeper to find the region and then executing the
> scan on it. That is an expensive operation, which explains the calls to a
> specific region server and to ZooKeeper.
>
> A similar thing happens in PHOENIX-2607, but that one is specifically
> related to timestamps: if the client timestamp is less than or equal to
> the stats timestamp of the table, we end up repeating the rebuilding
> process above for every batch of inserts from PhoenixMapReduceUtil (the
> default batch size is 1,000 rows). I don't believe you have a timestamp
> issue, but it looks like you are seeing the same repeated lookups and
> rebuild scans that occur in PHOENIX-2607.
>
> James has suggested a workaround for the metadata cache refresh: set the
> UPDATE_CACHE_FREQUENCY property when defining the table. It will probably
> help (we are trying it out for the timestamp issue).
>
> Hope this helps.
>
> Thanks
> Arun
>
> Please look at the following lines in MetaDataEndpointImpl:
>
> private PTable doGetTable(byte[] key, long clientTimeStamp, RowLock rowLock)
>         throws IOException, SQLException {
>     ImmutableBytesPtr cacheKey = new ImmutableBytesPtr(key);
>     Cache<ImmutableBytesPtr, PMetaDataEntity> metaDataCache =
>             GlobalCache.getInstance(this.env).getMetaDataCache();
>     PTable table = (PTable) metaDataCache.getIfPresent(cacheKey);
>     // We only cache the latest, so we'll end up building the table with
>     // every call if the client connection has specified an SCN.
>     // TODO: If we indicate to the client that we're returning an older
>     // version, but there's a newer version available, the client can
>     // safely not call this, since we only allow modifications to the latest.
>     if (table != null && table.getTimeStamp() < clientTimeStamp) {
>         // Table on client is up-to-date with table on server, so just return
>         if (isTableDeleted(table)) {
>             return null;
>         }
>         return table;
>     }
>     // Ask Lars about the expense of this call - if we don't take the lock,
>     // we still won't get partial results
>     // get the co-processor environment
>     // TODO: check that key is within region.getStartKey() and
>     // region.getEndKey() and return special code to force client to lookup
>     // region from meta.
>     Region region = env.getRegion();
>     /*
>      * Lock directly on key, though it may be an index table. This will just
>      * prevent a table from getting rebuilt too often.
>      */
>     final boolean wasLocked = (rowLock != null);
>     if (!wasLocked) {
>         rowLock = region.getRowLock(key, true);
>         if (rowLock == null) {
>             throw new IOException("Failed to acquire lock on "
>                     + Bytes.toStringBinary(key));
>         }
>     }
>     try {
>         // Try cache again in case we were waiting on a lock
>         table = (PTable) metaDataCache.getIfPresent(cacheKey);
>         // We only cache the latest, so we'll end up building the table with
>         // every call if the client connection has specified an SCN.
>         // TODO: If we indicate to the client that we're returning an older
>         // version, but there's a newer version available, the client can
>         // safely not call this, since we only allow modifications to the
>         // latest.
>         if (table != null && table.getTimeStamp() < clientTimeStamp) {
>             // Table on client is up-to-date with table on server, so just
>             // return
>             if (isTableDeleted(table)) {
>                 return null;
>             }
>             return table;
>         }
>         // Query for the latest table first, since it's not cached
>         table = buildTable(key, cacheKey, region, HConstants.LATEST_TIMESTAMP);
>         if (table != null && table.getTimeStamp() < clientTimeStamp) {
>             return table;
>         }
>         // Otherwise, query for an older version of the table - it won't be
>         // cached
>         return buildTable(key, cacheKey, region, clientTimeStamp);
>     } finally {
>         if (!wasLocked) rowLock.release();
>     }
> }
>
> From: Nick Dimiduk <[email protected]>
> Reply-To: "[email protected]" <[email protected]>
> Date: Tuesday, February 16, 2016 at 5:58 PM
> To: "[email protected]" <[email protected]>
> Cc: James Taylor <[email protected]>, Lars Hofhansl <[email protected]>
> Subject: Write path blocked by MetaDataEndpoint acquiring region lock
>
> Hello,
>
> I have a high-throughput ingest pipeline that's seized up. My ingest
> application ultimately crashes with the stack trace below [0].
>
> Independently, I noticed that the RPC call time of one of the machines was
> significantly higher than the others (95th percentile at multiple seconds
> vs. tens of ms). I grabbed the RS log and a couple of jstacks from the
> process. In the logs I see handler threads creating ZK connections
> excessively (~50 INFO lines per second). The jstacks show handler threads
> parked while taking region row locks, calling HRegion.getRowLockInternal()
> via MetaDataEndpointImpl.doGetTable() [1]. The one handler thread I see in
> the same MetaDataEndpointImpl area but not under the lock appears to be
> making an RPC to read the statistics table [2].
>
> I believe these two occurrences are related.
>
> My working theory is that the metaDataCache object is performing poorly
> for some reason. This results in excessive metadata lookups, some of which
> appear to require making an RPC call while holding the lock.
>
> What can I do to relieve pressure on this row lock? Looking at the code
> around the lock, it appears to be populating a connection-level cache of
> schema metadata. The host with the high RPC call time is hosting the
> SYSTEM.CATALOG table region. I see some configuration settings that may be
> related (i.e., phoenix.coprocessor.maxMetaDataCacheSize), but I see no way
> to get debug information about cache size or evictions from this Guava
> cache instance.
>
> I'll be digging into this further, but I appreciate any pointers you may
> have.
>
> BTW, this is Phoenix 4.6.0 + HBase 1.1.2.
>
> Thanks a lot,
> -n
>
> [0]: client-side stack
>
> Caused by: org.apache.phoenix.exception.PhoenixIOException: Interrupted
> calling coprocessor service
> org.apache.phoenix.coprocessor.generated.MetaDataProtos$MetaDataService
> for row \x00<schema>\x00<user_table>
>     at org.apache.phoenix.util.ServerUtil.parseServerException(ServerUtil.java:108)
>     at org.apache.phoenix.query.ConnectionQueryServicesImpl.metaDataCoprocessorExec(ConnectionQueryServicesImpl.java:1053)
>     at org.apache.phoenix.query.ConnectionQueryServicesImpl.metaDataCoprocessorExec(ConnectionQueryServicesImpl.java:1016)
>     at org.apache.phoenix.query.ConnectionQueryServicesImpl.getTable(ConnectionQueryServicesImpl.java:1289)
>     at org.apache.phoenix.schema.MetaDataClient.updateCache(MetaDataClient.java:446)
>     at org.apache.phoenix.schema.MetaDataClient.updateCache(MetaDataClient.java:389)
>     at org.apache.phoenix.schema.MetaDataClient.updateCache(MetaDataClient.java:385)
>     at org.apache.phoenix.execute.MutationState.validate(MutationState.java:369)
>     at org.apache.phoenix.execute.MutationState.commit(MutationState.java:417)
>     at org.apache.phoenix.jdbc.PhoenixConnection$3.call(PhoenixConnection.java:482)
>     at org.apache.phoenix.jdbc.PhoenixConnection$3.call(PhoenixConnection.java:479)
>     at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
>     at org.apache.phoenix.jdbc.PhoenixConnection.commit(PhoenixConnection.java:479)
>     at org.apache.phoenix.mapreduce.PhoenixRecordWriter.write(PhoenixRecordWriter.java:84)
>
> [1]: rs handlers blocked stack
>
> "B.defaultRpcServer.handler=48,queue=3,port=16020" #91 daemon prio=5
> os_prio=0 tid=0x00007ff4ce458000 nid=0xca0a waiting on condition
> [0x00007ff47a607000]
>    java.lang.Thread.State: TIMED_WAITING (parking)
>     at sun.misc.Unsafe.park(Native Method)
>     - parking to wait for <0x0000000708ef59c0> (a java.util.concurrent.CountDownLatch$Sync)
>     at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
>     at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
>     at org.apache.hadoop.hbase.regionserver.HRegion.getRowLockInternal(HRegion.java:5047)
>     at org.apache.hadoop.hbase.regionserver.HRegion.getRowLock(HRegion.java:5013)
>     at org.apache.phoenix.coprocessor.MetaDataEndpointImpl.doGetTable(MetaDataEndpointImpl.java:2397)
>     at org.apache.phoenix.coprocessor.MetaDataEndpointImpl.doGetTable(MetaDataEndpointImpl.java:2365)
>     at org.apache.phoenix.coprocessor.MetaDataEndpointImpl.getTable(MetaDataEndpointImpl.java:440)
>     at org.apache.phoenix.coprocessor.generated.MetaDataProtos$MetaDataService.callMethod(MetaDataProtos.java:11609)
>     at org.apache.hadoop.hbase.regionserver.HRegion.execService(HRegion.java:7435)
>     at org.apache.hadoop.hbase.regionserver.RSRpcServices.execServiceOnRegion(RSRpcServices.java:1875)
>     at org.apache.hadoop.hbase.regionserver.RSRpcServices.execService(RSRpcServices.java:1857)
>     at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32209)
>     at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2114)
>     at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:101)
>     at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
>     at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
>     at java.lang.Thread.run(Thread.java:745)
>
> [2]: rs handler stats rpc stack
>
> "B.defaultRpcServer.handler=19,queue=4,port=16020" #62 daemon prio=5
> os_prio=0 tid=0x00007ff4ce420000 nid=0xc9ea in Object.wait()
> [0x00007ff47c323000]
>    java.lang.Thread.State: TIMED_WAITING (on object monitor)
>     at java.lang.Object.wait(Native Method)
>     at java.lang.Object.wait(Object.java:460)
>     at java.util.concurrent.TimeUnit.timedWait(TimeUnit.java:348)
>     at org.apache.hadoop.hbase.client.ResultBoundedCompletionService.poll(ResultBoundedCompletionService.java:155)
>     - locked <0x00000007a3f5e030> (a [Lorg.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture;)
>     at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:168)
>     at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:59)
>     at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)
>     at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:320)
>     at org.apache.hadoop.hbase.client.ClientScanner.loadCache(ClientScanner.java:403)
>     at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:364)
>     at org.apache.phoenix.schema.stats.StatisticsUtil.readStatistics(StatisticsUtil.java:99)
>     at org.apache.phoenix.coprocessor.MetaDataEndpointImpl.getTable(MetaDataEndpointImpl.java:836)
>     at org.apache.phoenix.coprocessor.MetaDataEndpointImpl.buildTable(MetaDataEndpointImpl.java:472)
>     at org.apache.phoenix.coprocessor.MetaDataEndpointImpl.doGetTable(MetaDataEndpointImpl.java:2418)
>     at org.apache.phoenix.coprocessor.MetaDataEndpointImpl.doGetTable(MetaDataEndpointImpl.java:2365)
>     at org.apache.phoenix.coprocessor.MetaDataEndpointImpl.getTable(MetaDataEndpointImpl.java:440)
>     at org.apache.phoenix.coprocessor.generated.MetaDataProtos$MetaDataService.callMethod(MetaDataProtos.java:11609)
>     at org.apache.hadoop.hbase.regionserver.HRegion.execService(HRegion.java:7435)
>     at org.apache.hadoop.hbase.regionserver.RSRpcServices.execServiceOnRegion(RSRpcServices.java:1875)
>     at org.apache.hadoop.hbase.regionserver.RSRpcServices.execService(RSRpcServices.java:1857)
>     at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32209)
>     at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2114)
>     at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:101)
>     at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
>     at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
>     at java.lang.Thread.run(Thread.java:745)
