Re: Filters with TimeRange (should get executed only in regions matching TimeRange)
Can you tell us more about your row key design? Thanks On Tue, Jul 11, 2017 at 3:03 PM, Veerraju Tadimeti wrote: > hi, > > If I implement a filter, it does a full range scan. Is there a way to > implement a Filter (with TimeRange, startRow and stopRow) without doing a > full range scan? > > Basically, if we pass a TimeRange to a scan, it won't do a full range scan. I > want to write a filter that gets executed only in the regions which fall under > the TimeRange. > > Thank you in advance. > > > Thanks, > Raju, > (972)273-0155. >
Re: Difference between ResultScanner and initTableMapperJob
I got a timeout when trying to search for this row (185_) and for a different row (20_):

hbase(main):016:0> scan 'server_based_data', {FILTER => "(PrefixFilter ('20'))", COLUMNS => 'raw_data:top', TIMERANGE => [149920560, 149920620]}
ROW    COLUMN+CELL
ERROR: Call id=7856, waitTime=120001, operationTimeout=12 expired.

I tried to increase the timeout, but even after waiting over 1 hr, it still hasn't come back.

hbase(main):017:0> @shell.hbase.configuration.setInt("hbase.client.scanner.timeout.period", 24)
hbase(main):018:0> scan 'server_based_data', {FILTER => "(PrefixFilter ('20_'))", COLUMNS => 'raw_data:top', TIMERANGE => [149920560, 149920620]}
ROW    COLUMN+CELL
(Still no output after waiting over 1 hr)

I also checked other failed/killed mappers. These are a small sample of the "bad" rowkeys. These deleted rowkeys show up with all sorts of hashes, so scanning a row after a "bad" rowkey won't tell us much, since these bad row keys occur across all sorts of rows/hashes.

2017-07-07 20:25:59,640 INFO [main] org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl: recovered from org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=36, exceptions: Fri Jul 07 20:25:59 PDT 2017, null, java.net.SocketTimeoutException: callTimeout=4, callDuration=40306: row '145_app129023.lhr1.mydomain.com_1482214200' on table 'server_based_data' at region=server_based_data,145_app129023.lhr1.mydomain.com_1482214200,1483679406846.fbc6c1e473b944fcf1eedd03a3b8e2ec., hostname=hslave35139.ams9.mydomain.com,60020,1483577331446, seqNum=8165882

2017-07-07 20:29:22,280 INFO [main] org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl: recovered from org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=36, exceptions: Fri Jul 07 20:29:22 PDT 2017, null, java.net.SocketTimeoutException: callTimeout=4, callDuration=40303: row '162_app128162.sjc4.mydomain.com_1485642420' on table 'server_based_data' at region=server_based_data,162_app128162.sjc4.mydomain.com_1485642420,1485969672759.37985ed5325cf4afb4bd54afa25728e9., hostname=hslave35150.ams9.mydomain.com,60020,1483579082784, seqNum=5489984

2017-07-07 20:28:52,216 INFO [main] org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl: recovered from org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=36, exceptions: Fri Jul 07 20:28:52 PDT 2017, null, java.net.SocketTimeoutException: callTimeout=4, callDuration=40304: row '41_db160190.iad3.mydomain.com_1486067940' on table 'server_based_data' at region=server_based_data,41_db160190.iad3.mydomain.com_1486067940,1487094006943.f67c3b9836107bdbe6a533e2829c509a., hostname=hslave35150.ams9.mydomain.com,60020,1483579082784, seqNum=5423139

On Tue, Jul 11, 2017 at 2:12 PM, Ted Yu wrote: > bq. it can find 0 rows in less than 1 sec > > What if you perform a scan with start row lower than the deleted key, can > you reproduce the hanging scan ? > > Cheers > > On Tue, Jul 11, 2017 at 1:55 PM, S L wrote: > > > Same error as from the hadoop job output I initially posted. > > > > SocketTimeoutException/RetriesExhaustedException is caused by a key that > > should be deleted/expired. > > > > row '184_app128057.syd2.mydomain.com_1485646620'. > > > > The funny thing is when I execute a "get 'tablename', 'rowkey'" from "hbase > > shell", it can find 0 rows in less than 1 sec. It seems like the > > initTableMapperJob method is sitting there for 40 sec trying to reach this > > non-existent key for some reason.
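Worth noting about the shell scans above: PrefixFilter by itself does not set a start row, so the scan walks the table from the first region until the prefix range is passed, and can stall on an unrelated "bad" region long before any '20_' row is reached. A minimal Java sketch of a prefix-bounded scan, assuming the 1.x client API (Scan#setRowPrefixFilter translates the prefix into start/stop rows; the time range values below are illustrative):

    Scan scan = new Scan();
    scan.setRowPrefixFilter(Bytes.toBytes("20_"));      // bound the scan to rows starting with 20_
    scan.addColumn(Bytes.toBytes("raw_data"), Bytes.toBytes("top"));
    scan.setTimeRange(1499205600000L, 1499206200000L);  // illustrative window in epoch millis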
> > > > > > 2017-07-07 20:28:19,974 INFO [main] org.apache.hadoop.mapred.MapTask: > > bufstart = 0; bufvoid = 268435456 > > > > 2017-07-07 20:28:19,974 INFO [main] org.apache.hadoop.mapred.MapTask: > > kvstart = 67108860; length = 16777216 > > > > 2017-07-07 20:28:19,980 INFO [main] org.apache.hadoop.mapred.MapTask: > Map > > output collector class = org.apache.hadoop.mapred. > MapTask$MapOutputBuffer > > > > 2017-07-07 20:29:25,248 INFO [main] > > org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl: recovered from > > org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after > > attempts=36, exceptions: > > > > Fri Jul 07 20:29:25 PDT 2017, null, java.net.SocketTimeoutException: > > callTimeout=4, callDuration=40314: row > > '184_app128057.syd2.mydomain.com_1485646620' on table > 'server_based_data' > > at > > region=server_based_data,184_app128057.syd2.mydomain.com_ > > 1485646620,1486597623524.37ccf993b84fd15b24c0c4efbb95b7f5., > > hostname=hslave35120.ams9.mydomain.com,60020,1498245230342, > seqNum=9247698 > > > > > > > >at > > org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadRepli > > cas.throwEnrichedException(RpcRetryingCallerWithReadReplicas.java:276) > > > >at > >
Filters with TimeRange (should get executed only in regions matching TimeRange)
hi, If I implement a filter, it does a full range scan. Is there a way to implement a Filter (with TimeRange, startRow and stopRow) without doing a full range scan? Basically, if we pass a TimeRange to a scan, it won't do a full range scan. I want to write a filter that gets executed only in the regions which fall under the TimeRange. Thank you in advance. Thanks, Raju, (972)273-0155.
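A minimal sketch of the idea, assuming the standard 1.x client API (the rowkey bounds, timestamps, and filter name are illustrative): region selection happens by rowkey only, so startRow/stopRow are what keep a scan, and therefore its filter, off unrelated regions, while setTimeRange lets each region skip store files whose time-range metadata falls entirely outside the window:

    Scan scan = new Scan();
    scan.setStartRow(Bytes.toBytes("100_"));            // illustrative rowkey bounds
    scan.setStopRow(Bytes.toBytes("101_"));
    scan.setTimeRange(1499205600000L, 1499206200000L);  // illustrative window (ms); prunes HFiles
    scan.setFilter(new MyTimeRangeFilter());            // hypothetical custom filter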
Re: scope of RegionCoprocessorEnvironment sharedData
Can I load a coprocessor dynamically for a scan operation? It should not be loaded for another scan operation if not intended. Btw, I invoke the scan from Hive. Sent from my iPhone > On Jul 11, 2017, at 4:15 PM, Veerraju Tadimeti wrote: > > hi, > > Hi John, > > Thanks for the reply. > > I implemented #2 (another way) in ur above post: > > > > I debugged the logs: in PostScannerOpen(), the regionScanner method parameter > object is null > > Also, in preScannerOpen(), I returned return super.preScannerOpen(e, scan, > new DelegateRegionScanner(s)); > in postScannerNext(), the internalScanner object is > org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl > > #1 way (put scanner in local map) - may not be possible, because if two different > scan operations, with and without the attribute, hit at the same time, how can we > differentiate in postScannerNext? > > > Thanks, > Raju, > (972)273-0155. > >> On Tue, Jul 11, 2017 at 8:05 AM, Anoop John wrote: >> Ya. It is the same RegionScanner impl in use only being passed. Ya >> the param type should have been RegionScanner I guess. We made that >> mistake! >> -Anoop- >> >> On Mon, Jul 10, 2017 at 8:37 PM, Ted Yu wrote: >> > The tricky part is that postScannerNext() passes an InternalScanner parameter >> > instead of RegionScanner. >> > >> > FYI >> > >> > On Sun, Jul 9, 2017 at 10:57 PM, Anoop John wrote: >> > >> >> Ya as Ted said, u are not getting the Scan object in the postScannerNext >> >> and so can not make use of the attribute in Scan within this hook. >> >> Just setting the sharedData variable will cause issues with concurrent >> >> scans. (As u imagine) >> >> >> >> So I can think of solving this in 2 possible ways. (May be more ways >> >> possible) >> >> >> >> 1. U keep a Map within ur CP impl. You implement postScannerOpen >> >> where u will get the ref to the Scanner being created as well as the Scan. >> >> If the Scan is having the attribute, keep that scanner within ur Map. >> >> During postScannerNext check if the coming in scanner is there in ur >> >> Map. If so that means this is the one where u can do the action. >> >> Also don't forget to implement postScannerClose and remove that scanner >> >> from the Map. Here u might have some perf penalty as u have to add >> >> and get from the Map which has to be a concurrent map too. >> >> >> >> Another way >> >> >> >> 2. Create a custom scanner implementing RegionScanner. The new one >> >> has to take an original RegionScanner and just delegate the calls. In >> >> postScannerOpen, u will get the original scanner being created and u >> >> can just wrap it with ur new scanner object. (If the Scan object is >> >> having the required attribute).. In postScannerNext() u can check for ur >> >> own RegionScanner type and if so u can do the action. >> >> >> >> >> >> -Anoop- >> >> >> >> >> >> On Sat, Jul 8, 2017 at 9:13 PM, Ted Yu wrote: >> >> > if (canUseGetOperation(e)) { >> >> > //logic goes here >> >> > >> >> > Does your Get target the same region being scanned ? >> >> > If not, issuing the Get is not advised since the other region may be >> >> hosted >> >> > on a different region server. >> >> > >> >> > Cheers >> >> > >> >> > On Thu, Jul 6, 2017 at 7:14 AM, Veerraju Tadimeti >> >> wrote: >> >> > >> >> >> hi, >> >> >> >> >> >> I have a few questions regarding the scope of *RegionCoprocessorEnvironment* >> >> >> sharedData.
>> >> >> >> >> >> >> >>- *Is the sharedData map shared across all instances simultaneously >> ?* >> >> - I am putting a variable in sharedData in preScannerOpen() >> based on the >> >> scan attribute, >> >> - check that the variable exists in postScannerNext(), then apply the >> logic, >> >> - remove the variable in postScannerClose(). >> >> - If data is in multiple regions, when one coprocessor removes the >> >> variable in postScannerClose(), will the variable be NULL for >> another >> >> region coprocessor in postScannerNext() ? >> >> >> >> >> >>- *Is the sharedData map shared across all the client request >> >>operations ?* >> >> >> >> If a variable is set in sharedData for one client operation (say SCAN), >> will >> >> the variable be available for another client operation (a new SCAN) ? >> >> >> >> >> >>- *Will the variables be garbage collected even if we don't >> >> implement >> >>the (remove variables in sharedData) postScannerClose() method ?* >> >> >> >> >> >> Please find below the logic that I am using currently >> >> *CODE: * >> >> >> >> public RegionScanner >> >> *preScannerOpen*(ObserverContext<RegionCoprocessorEnvironment> >> >> e, Scan scan, RegionScanner s) throws IOException { >> >> byte[] useGetInPostScannerNext = scan.getAttribute( >> >> USE_GET_OPERATION_IN_POST_SCANNER_NEXT); >> >> String
Re: scope of RegionCoprocessorEnvironment sharedData
hi, Hi John, Thanks for the reply. I implemented #2 (*another way*) in ur above post: I debugged the logs: in PostScannerOpen(), the regionScanner method parameter object is null. Also, in preScannerOpen(), I returned *return super.preScannerOpen(e, scan, new DelegateRegionScanner(s)); * In postScannerNext(), the internalScanner object is *org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl*. #1 way (put scanner in local map) - may not be possible, because if two different scan operations, with and without the attribute, hit at the same time, how can we differentiate in postScannerNext? Thanks, Raju, (972)273-0155. On Tue, Jul 11, 2017 at 8:05 AM, Anoop John wrote: > Ya. It is the same RegionScanner impl in use only being passed. Ya > the param type should have been RegionScanner I guess. We made that > mistake! > -Anoop- > > On Mon, Jul 10, 2017 at 8:37 PM, Ted Yu wrote: > > The tricky part is that postScannerNext() passes an InternalScanner > parameter > > instead of RegionScanner. > > > > FYI > > > > On Sun, Jul 9, 2017 at 10:57 PM, Anoop John > wrote: > > > >> Ya as Ted said, u are not getting the Scan object in the postScannerNext > >> and so can not make use of the attribute in Scan within this hook. > >> Just setting the sharedData variable will cause issues with concurrent > >> scans. (As u imagine) > >> > >> So I can think of solving this in 2 possible ways. (May be more ways > >> possible) > >> > >> 1. U keep a Map within ur CP impl. You implement postScannerOpen > >> where u will get the ref to the Scanner being created as well as the Scan. > >> If the Scan is having the attribute, keep that scanner within ur Map. > >> During postScannerNext check if the coming in scanner is there in ur > >> Map. If so that means this is the one where u can do the action. > >> Also don't forget to implement postScannerClose and remove that scanner > >> from the Map. Here u might have some perf penalty as u have to add > >> and get from the Map which has to be a concurrent map too. > >> > >> Another way > >> > >> 2. Create a custom scanner implementing RegionScanner. The new one > >> has to take an original RegionScanner and just delegate the calls. In > >> postScannerOpen, u will get the original scanner being created and u > >> can just wrap it with ur new scanner object. (If the Scan object is > >> having the required attribute).. In postScannerNext() u can check for ur > >> own RegionScanner type and if so u can do the action. > >> > >> > >> -Anoop- > >> > >> > >> On Sat, Jul 8, 2017 at 9:13 PM, Ted Yu wrote: > >> > if (canUseGetOperation(e)) { > >> > //logic goes here > >> > > >> > Does your Get target the same region being scanned ? > >> > If not, issuing the Get is not advised since the other region may be > >> hosted > >> > on a different region server. > >> > > >> > Cheers > >> > > >> > On Thu, Jul 6, 2017 at 7:14 AM, Veerraju Tadimeti > >> wrote: > >> > > >> >> hi, > >> >> > >> >> I have a few questions regarding the scope of > *RegionCoprocessorEnvironment* > >> >> sharedData. > >> >> > >> >> > >> >> > >> >>- *Is the sharedData map shared across all instances > simultaneously > >> ?* > >> >> - I am putting a variable in sharedData in preScannerOpen() > >> based on the > >> >> scan attribute, > >> >> - check that the variable exists in postScannerNext(), then apply the > >> logic, > >> >> - remove the variable in postScannerClose(). > >> >> - If data is in multiple regions, when one coprocessor removes the > >> >> variable in postScannerClose(), will the variable be NULL for > >> another > >> >> region coprocessor in postScannerNext() ?
> >> >> > >> >> > >> >>- *Is the sharedData map shared across all the client request > >> >>operations ?* > >> >> > >> >> If a variable is set in sharedData for one client operation (say > SCAN), > >> will > >> >> the variable be available for another client operation (a new SCAN) ? > >> >> > >> >> > >> >>- *Will the variables be garbage collected even if we don't > implement > >> >>the (remove variables in sharedData) postScannerClose() method ?* > >> >> > >> >> > >> >> Please find below the logic that I am using currently > >> >> *CODE: * > >> >> > >> >> public RegionScanner > >> >> *preScannerOpen*(ObserverContext<RegionCoprocessorEnvironment> > >> >> e, Scan scan, RegionScanner s) throws IOException { > >> >> byte[] useGetInPostScannerNext = scan.getAttribute( > >> >> USE_GET_OPERATION_IN_POST_SCANNER_NEXT); > >> >> String useGetInPostScannerNextStr = Bytes.toString( > >> >> useGetInPostScannerNext); > >> >> if (Boolean.parseBoolean(useGetInPostScannerNextStr)) { > >> >> e.getEnvironment().getSharedData().put( > >> >> USE_GET_OPERATION_IN_POST_SCANNER_NEXT, useGetInPostScannerNextStr); > >> >> } > >> >> return super.preScannerOpen(e, scan, s); > >> >> } > >> >> > >> >> @Override > >>
Re: Difference between ResultScanner and initTableMapperJob
bq. it can find 0 rows in less than 1 sec What if you perform a scan with start row lower than the deleted key, can you reproduce the hanging scan ? Cheers On Tue, Jul 11, 2017 at 1:55 PM, S L wrote: > Same error as from the hadoop job output I initially posted. > > SocketTimeoutException/RetriesExhaustedException is caused by a key that > should be deleted/expired. > > row '184_app128057.syd2.mydomain.com_1485646620'. > > The funny thing is when I execute a "get 'tablename', 'rowkey'" from "hbase > shell", it can find 0 rows in less than 1 sec. It seems like the > initTableMapperJob method is sitting there for 40 sec trying to reach this > non-existent key for some reason. > > > 2017-07-07 20:28:19,974 INFO [main] org.apache.hadoop.mapred.MapTask: > bufstart = 0; bufvoid = 268435456 > > 2017-07-07 20:28:19,974 INFO [main] org.apache.hadoop.mapred.MapTask: > kvstart = 67108860; length = 16777216 > > 2017-07-07 20:28:19,980 INFO [main] org.apache.hadoop.mapred.MapTask: Map > output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer > > 2017-07-07 20:29:25,248 INFO [main] > org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl: recovered from > org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after > attempts=36, exceptions: > > Fri Jul 07 20:29:25 PDT 2017, null, java.net.SocketTimeoutException: > callTimeout=4, callDuration=40314: row > '184_app128057.syd2.mydomain.com_1485646620' on table 'server_based_data' > at > region=server_based_data,184_app128057.syd2.mydomain.com_ > 1485646620,1486597623524.37ccf993b84fd15b24c0c4efbb95b7f5., > hostname=hslave35120.ams9.mydomain.com,60020,1498245230342, seqNum=9247698 > > > >at > org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadRepli > cas.throwEnrichedException(RpcRetryingCallerWithReadReplicas.java:276) > >at > org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call( > ScannerCallableWithReplicas.java:207) > >at > org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call( > ScannerCallableWithReplicas.java:60) > >at > org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries( > RpcRetryingCaller.java:200) > >at > org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:320) > >at > org.apache.hadoop.hbase.client.ClientScanner.loadCache(ClientScanner.java: > 403) > >at > org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:364) > >at > org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl.nextKeyValue( > TableRecordReaderImpl.java:222) > >at > org.apache.hadoop.hbase.mapreduce.TableRecordReader.nextKeyValue( > TableRecordReader.java:147) > >at > org.apache.hadoop.hbase.mapreduce.TableInputFormatBase$1.nextKeyValue( > TableInputFormatBase.java:216) > >at > org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader. > nextKeyValue(MapTask.java:556) > >at > org.apache.hadoop.mapreduce.task.MapContextImpl. > nextKeyValue(MapContextImpl.java:80) > >at > org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.
> nextKeyValue(WrappedMapper.java:91) > >at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144) > >at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787) > >at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) > >at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164) > >at java.security.AccessController.doPrivileged(Native Method) > >at javax.security.auth.Subject.doAs(Subject.java:415) > >at > org.apache.hadoop.security.UserGroupInformation.doAs( > UserGroupInformation.java:1693) > >at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) > > Caused by: java.net.SocketTimeoutException: callTimeout=4, > callDuration=40314: row '184_app128057.syd2.mydomain.com_1485646620' on > table 'server_based_data' at > region=server_based_data,184_app128057.syd2.mydomain.com_ > 1485646620,1486597623524.37ccf993b84fd15b24c0c4efbb95b7f5., > hostname=hslave35120.ams9.mydomain.com,60020,1498245230342, seqNum=9247698 > >at > org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries( > RpcRetryingCaller.java:159) > >at > org.apache.hadoop.hbase.client.ResultBoundedCompletionService > $QueueingFuture.run(ResultBoundedCompletionService.java:65) > >at > java.util.concurrent.ThreadPoolExecutor.runWorker( > ThreadPoolExecutor.java:1145) > >at > java.util.concurrent.ThreadPoolExecutor$Worker.run( > ThreadPoolExecutor.java:615) > >at java.lang.Thread.run(Thread.java:745) > > Caused by: java.io.IOException: Call to > hslave35120.ams9.mydomain.com/10.216.35.120:60020 failed on local > exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=2, > waitTime=40001, operationTimeout=4 expired. > >at > org.apache.hadoop.hbase.ipc.AbstractRpcClient.wrapException( > AbstractRpcClient.java:291) > >at > org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1272) > >at >
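A sketch of the diagnostic Ted suggests, in the same style as the Java scan code later in this thread (the start/stop keys are illustrative, chosen to bracket the deleted key):

    Scan scan = new Scan();
    scan.setStartRow(Bytes.toBytes("184_app128057.syd2.mydomain.com_1485646600"));
    scan.setStopRow(Bytes.toBytes("184_app128057.syd2.mydomain.com_1485646700"));
    // If this small bounded scan also hangs, the problem is in crossing that
    // region boundary rather than in the (absent) row itself.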
Re: Difference between ResultScanner and initTableMapperJob
Same error as from the hadoop job output I initially posted. SocketTimeoutException/RetriesExhaustedException is caused by a key that should be deleted/expired. row '184_app128057.syd2.mydomain.com_1485646620'. The funny thing is when I execute a "get 'tablename', 'rowkey'" from "hbase shell", it can find 0 rows in less than 1 sec. It seems like the initTableMapperJob method is sitting there for 40 sec trying to reach this non-existent key for some reason. 2017-07-07 20:28:19,974 INFO [main] org.apache.hadoop.mapred.MapTask: bufstart = 0; bufvoid = 268435456 2017-07-07 20:28:19,974 INFO [main] org.apache.hadoop.mapred.MapTask: kvstart = 67108860; length = 16777216 2017-07-07 20:28:19,980 INFO [main] org.apache.hadoop.mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer 2017-07-07 20:29:25,248 INFO [main] org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl: recovered from org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=36, exceptions: Fri Jul 07 20:29:25 PDT 2017, null, java.net.SocketTimeoutException: callTimeout=4, callDuration=40314: row '184_app128057.syd2.mydomain.com_1485646620' on table 'server_based_data' at region=server_based_data,184_app128057.syd2.mydomain.com_1485646620,1486597623524.37ccf993b84fd15b24c0c4efbb95b7f5., hostname=hslave35120.ams9.mydomain.com,60020,1498245230342, seqNum=9247698 at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.throwEnrichedException(RpcRetryingCallerWithReadReplicas.java:276) at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:207) at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:60) at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200) at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:320) at org.apache.hadoop.hbase.client.ClientScanner.loadCache(ClientScanner.java:403) at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:364) at org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl.nextKeyValue(TableRecordReaderImpl.java:222) at org.apache.hadoop.hbase.mapreduce.TableRecordReader.nextKeyValue(TableRecordReader.java:147) at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase$1.nextKeyValue(TableInputFormatBase.java:216) at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:556) at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80) at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91) at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) Caused by: java.net.SocketTimeoutException: callTimeout=4, callDuration=40314: row '184_app128057.syd2.mydomain.com_1485646620' on table 'server_based_data' at region=server_based_data,184_app128057.syd2.mydomain.com_1485646620,1486597623524.37ccf993b84fd15b24c0c4efbb95b7f5., hostname=hslave35120.ams9.mydomain.com,60020,1498245230342, seqNum=9247698 at 
org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:159) at org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture.run(ResultBoundedCompletionService.java:65) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: java.io.IOException: Call to hslave35120.ams9.mydomain.com/10.216.35.120:60020 failed on local exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=2, waitTime=40001, operationTimeout=4 expired. at org.apache.hadoop.hbase.ipc.AbstractRpcClient.wrapException(AbstractRpcClient.java:291) at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1272) at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:226) at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:331) at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.scan(ClientProtos.java:34094) at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:219) at
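The shell get above can be reproduced in Java; a minimal sketch, assuming the same 1.x client API used elsewhere in this thread:

    Connection conn = ConnectionFactory.createConnection(conf);
    Table table = conn.getTable(TableName.valueOf("server_based_data"));
    Get get = new Get(Bytes.toBytes("184_app128057.syd2.mydomain.com_1485646620"));
    Result result = table.get(get);
    System.out.println("found=" + !result.isEmpty()); // returns quickly with no row,
                                                      // while the MapReduce scan stalls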
Re: Difference between ResultScanner and initTableMapperJob
Can you take a look at the server log on hslave35150.ams9.mydomain.com around 17/07/07 20:23:31 ? See if there is some clue in the log. On Tue, Jul 11, 2017 at 12:18 PM, S L wrote: > In case I forgot to say, the keys that the log shows as causing the > RetriesExhaustedException should be deleted/gone from the table due to the > TTL being exceeded. > > Fri Jul 07 20:23:26 PDT 2017, null, java.net.SocketTimeoutException: > callTimeout=4, callDuration=40303: row > '41_db160190.iad3.mydomain.com_1486067940' on table 'server_based_data' at > region=server_based_data,41_db160190.iad3.mydomain.com_ > 1486067940,1487094006943.f67c3b9836107bdbe6a533e2829c509a., > hostname=hslave35150.ams9.mydomain.com,60020,1483579082784, seqNum=5423139 > > The timestamp here is from Feb 2, 2017. My TTL is 30 days. Since I ran > the job on July 7, 2017, Feb 2017 is way past the 30 day TTL. > > describe 'server_based_data' > > Table server_based_data is ENABLED > > > server_based_data > > > COLUMN FAMILIES DESCRIPTION > > > {NAME => 'raw_data', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', > > REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'SNAPPY', MIN_VERSIONS > > => '0', TTL => '2592000 SECONDS (30 DAYS)', KEEP_DELETED_CELLS => 'FALSE', > BLOCKSIZE > > => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'} > > > 1 row(s) in 0.5180 seconds > > On Tue, Jul 11, 2017 at 12:11 PM, S L wrote: > > > Sorry for not being clear. I tried with both versions, first 1.0.1, then > > 1.2.0-cdh5.7.2. We are currently running on Cloudera 5.7.2, which is why I > used > > that version of the jar. > > > > I had set the timeout to be as short as 30 sec and as long as 2 min but I > > was still running into the problem. When setting the timeout to 2 min, > the > > job took almost 50 min to "complete". Complete is in quotes because it > > fails (see pastebin below) > > > > Here's a copy of the hadoop output logs via pastebin. The log is 11000 > > lines so I just pasted up to the first couple exceptions and then pasted > > the end where it jumps from 80% maps to 100% and from 21% reduce to 100% > > because Yarn or something killed it. > > > > https://pastebin.com/KwriyPn6 > > http://imgur.com/a/ouPZ5 - screenshot from failed mapreduce job from > > cloudera manager/Yarn > > > > > > > > On Mon, Jul 10, 2017 at 8:50 PM, Ted Yu wrote: > > > >> bq. for hbase-client/hbase-server version 1.0.1 and 1.2.0-cdh5.7.2. > >> > >> You mean the error occurred for both versions or, client is on 1.0.1 and > >> server is on 1.2.0 ? > >> > >> There should be more to the RetriesExhaustedException. > >> Can you pastebin the full stack trace ? > >> > >> Cheers > >> > >> On Mon, Jul 10, 2017 at 2:21 PM, S L wrote: > >> > >> > I hope someone can tell me what the difference between these two API > >> calls > >> > is. I'm getting weird results between the two of them. This is > >> happening > >> > for hbase-client/hbase-server version 1.0.1 and 1.2.0-cdh5.7.2. > >> > > >> > First off, my rowkeys are in the format hash_name_timestamp > >> > e.g. 100_servername_1234567890. The hbase table has a TTL of 30 days > so > >> > things older than 30 days should disappear after compaction. > >> > > >> > The following is code for using ResultScanner. It doesn't use > >> MapReduce so > >> > it takes a very long time to complete. I can't run my job this way > >> because > >> > it takes too long. However, for debugging purposes, I don't have any > >> > problems with this method.
It lists all keys for the specified time > >> range, > >> > which look valid to me since all the timestamps of the returned keys > are > >> > within the past 30 days and within the specified time range: > >> > > >> > Scan scan = new Scan(); > >> > scan.addColumn(Bytes.toBytes("raw_data"), > Bytes.toBytes(fileType)); > >> > scan.setCaching(500); > >> > scan.setCacheBlocks(false); > >> > scan.setTimeRange(start, end); > >> > > >> > Connection fConnection = ConnectionFactory.createConnection(conf); > >> > Table table = fConnection.getTable(TableName.valueOf(tableName)); > >> > ResultScanner scanner = table.getScanner(scan); > >> > for (Result result = scanner.next(); result != null; result = > >> > scanner.next()) { > >> > System.out.println("Found row: " + > Bytes.toString(result.getRow() > >> > )); > >> > } > >> > > >> > > >> > The following code doesn't work but it uses MapReduce, which runs way > >> faster > >> > than using the ResultScanner way, since it divides things up into 1200 > >> > maps. The problem is I'm getting rowkeys that should have disappeared > >> due > >> > to TTL expiring: > >> > > >> > Scan scan = new Scan(); > >> > scan.addColumn(Bytes.toBytes("raw_data"), > Bytes.toBytes(fileType)); > >> > scan.setCaching(500); > >> > scan.setCacheBlocks(false); > >> > scan.setTimeRange(start, end); > >> >
Re: Missing data in snapshot - possible flush timing issue?
Jacob: Do you mind updating this thread on whether you saw any unexpected behavior after applying the patch ? Thanks On Wed, May 24, 2017 at 9:04 AM, LeBlanc, Jacob wrote: > Will do. I'll build off 1.1.4 with the patch, apply it to the region > servers, and capture logs and let you know if I see the exception occur. > > --Jacob > > -Original Message- > From: Ted Yu [mailto:yuzhih...@gmail.com] > Sent: Wednesday, May 24, 2017 11:57 AM > To: user@hbase.apache.org > Subject: Re: Missing data in snapshot - possible flush timing issue? > > I attached a tentative fix to HBASE-18099. > > If you have the bandwidth, you can try it out. > > On Wed, May 24, 2017 at 8:53 AM, LeBlanc, Jacob > wrote: > > > Great! I see the JIRA bug you just opened. I'll enable debug logging > > on FlushSnapshotSubprocedure and HRegion on the region servers in the > > cluster to see if I can capture log messages as better evidence. Since > > it's a timing issue I'm not sure when we might see it again, but I'll > > keep an eye out. > > > > Thanks so much for your help, > > > > --Jacob > > > > -Original Message- > > From: Ted Yu [mailto:yuzhih...@gmail.com] > > Sent: Wednesday, May 24, 2017 11:29 AM > > To: user@hbase.apache.org > > Subject: Re: Missing data in snapshot - possible flush timing issue? > > > > In FlushSnapshotSubprocedure (running on the region server), there is a > > debug > > log: > > > > LOG.debug("... Flush Snapshotting region " + > > region.toString() + " > > completed."); > > > > If you enable debug logging, we would know whether the underlying region > > is considered to have completed the flush. > > > > Higher up in the call() method there is this: > > > > region.flush(true); > > > > The return value is not checked. > > > > In HRegion#flushcache(), Result.CANNOT_FLUSH may be returned due to: > > > > String msg = "Not flushing since " > > > > + (writestate.flushing ? "already flushing" > > > > : "writes not enabled"); > > > > The above seems to correlate with your description. > > > > Let me log a JIRA referring to this thread. > > > > On Wed, May 24, 2017 at 8:08 AM, LeBlanc, Jacob > > > > wrote: > > > > > Thanks for looking Ted! > > > > > > My understanding of the log messages is that the last line of the > > > pastebin is the end of the flush of the memstore for the region > > > where we missed data, but that line is tagged with "[MemStoreFlusher.1]" > > > whereas the other regions that were getting flushed as part of the > > > snapshot are tagged with "[rs( > > > a1-qa-hbr31416d.lab.lynx-connected.com > > ,16020,1494432106955)-snapshot-pool81-thread-1]". > > > With only a superficial understanding, it seems like the flush of > > > that region where messages were tagged with "[MemStoreFlusher.1]", > > > while happening at the same time, wasn't really part of the snapshot > > > process. For example, line 3 in the pastebin shows the flush of one > > > region starting and tagged with snapshot-pool81-thread-1, line 4 > > > shows the flush starting for the region we missed data and tagged > > > with MemStoreFlusher.1, and line 5 continues with the flush of the > > > region as part of the snapshot. So it definitely looks like multiple > > > flushes were occurring at the same time whereas elsewhere in the > > > logs it seems like the flushes are always done sequentially as part > > > of the snapshot.
So I came to the theory that perhaps there is a timing > > > issue where the flushed data for a region is missed as part of a > > > snapshot because the flush is occurring on another thread as part of > > > normal, periodic > > flushing of memstore. > > > > > > The last line I see in the full region server log that has anything > > > to do with the snapshot is line 11 in the pastebin at 2017-05-12 > > > 02:06:05,577 where it's processing events from zookeeper. Again with > > > only a superficial understanding, I was assuming this had something > > > to do with the master signaling that the snapshot was complete. > > > We'll be sure to capture the master log next time. > > > > > > And thanks for also checking JIRA for me. If there is a bug here it > > > seems as though we don't have an option to upgrade to fix it and > > > we'll have to plan on coding around it for now. > > > > > > Thanks, > > > > > > --Jacob > > > > > > -Original Message- > > > From: Ted Yu [mailto:yuzhih...@gmail.com] > > > Sent: Wednesday, May 24, 2017 8:47 AM > > > To: user@hbase.apache.org > > > Subject: Re: Missing data in snapshot - possible flush timing issue? > > > > > > bq. the snapshot finishes before the flush of that last region > > > finishes > > > > > > According to the last line in the pastebin, flush finished at > > > 2017-05-12 > > > 02:06:06,063 > > > Did you find something in master log which indicated that snapshot > > > finished before the above time ? > > > > > > I went thru snapshot bug fixes in branch-1.1 backward
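For reference, a sketch of the debug logging mentioned above, assuming the stock log4j.properties on the region servers (the package names are taken from the HBase 1.1 source and should be verified against the deployed version):

    log4j.logger.org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure=DEBUG
    log4j.logger.org.apache.hadoop.hbase.regionserver.HRegion=DEBUG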
Re: Difference between ResultScanner and initTableMapperJob
Sorry for not being clear. I tried with both versions, first 1.0.1, then 1.2.0-cdh5.7.2. We are currently running on Cloudera 5.7.2, which is why I used that version of the jar. I had set the timeout to be as short as 30 sec and as long as 2 min but I was still running into the problem. When setting the timeout to 2 min, the job took almost 50 min to "complete". Complete is in quotes because it fails (see pastebin below). Here's a copy of the hadoop output logs via pastebin. The log is 11000 lines so I just pasted up to the first couple exceptions and then pasted the end where it jumps from 80% maps to 100% and from 21% reduce to 100% because Yarn or something killed it. https://pastebin.com/KwriyPn6 http://imgur.com/a/ouPZ5 - screenshot from failed mapreduce job from cloudera manager/Yarn On Mon, Jul 10, 2017 at 8:50 PM, Ted Yu wrote: > bq. for hbase-client/hbase-server version 1.0.1 and 1.2.0-cdh5.7.2. > > You mean the error occurred for both versions or, client is on 1.0.1 and > server is on 1.2.0 ? > > There should be more to the RetriesExhaustedException. > Can you pastebin the full stack trace ? > > Cheers > > On Mon, Jul 10, 2017 at 2:21 PM, S L wrote: > > > I hope someone can tell me what the difference between these two API > calls > > is. I'm getting weird results between the two of them. This is > happening > > for hbase-client/hbase-server version 1.0.1 and 1.2.0-cdh5.7.2. > > > > First off, my rowkeys are in the format hash_name_timestamp > > e.g. 100_servername_1234567890. The hbase table has a TTL of 30 days so > > things older than 30 days should disappear after compaction. > > > > The following is code for using ResultScanner. It doesn't use MapReduce > so > > it takes a very long time to complete. I can't run my job this way > because > > it takes too long. However, for debugging purposes, I don't have any > > problems with this method. It lists all keys for the specified time > range, > > which look valid to me since all the timestamps of the returned keys are > > within the past 30 days and within the specified time range: > > > > Scan scan = new Scan(); > > scan.addColumn(Bytes.toBytes("raw_data"), Bytes.toBytes(fileType)); > > scan.setCaching(500); > > scan.setCacheBlocks(false); > > scan.setTimeRange(start, end); > > > > Connection fConnection = ConnectionFactory.createConnection(conf); > > Table table = fConnection.getTable(TableName.valueOf(tableName)); > > ResultScanner scanner = table.getScanner(scan); > > for (Result result = scanner.next(); result != null; result = > > scanner.next()) { > > System.out.println("Found row: " + Bytes.toString(result.getRow() > > )); > > } > > > > > > The following code doesn't work but it uses MapReduce, which runs way faster > > than using the ResultScanner way, since it divides things up into 1200 > > maps. The problem is I'm getting rowkeys that should have disappeared > due > > to TTL expiring: > > > > Scan scan = new Scan(); > > scan.addColumn(Bytes.toBytes("raw_data"), Bytes.toBytes(fileType)); > > scan.setCaching(500); > > scan.setCacheBlocks(false); > > scan.setTimeRange(start, end); > > TableMapReduceUtil.initTableMapperJob(tableName, scan, MTTRMapper.class, > > Text.class, IntWritable.class, job); > > > > Here is the error that I get, which eventually kills the whole MR job later > > because over 25% of the mappers failed.
> > > > > Error: org.apache.hadoop.hbase.client.RetriesExhaustedException: > > > Failed after attempts=36, exceptions: Wed Jun 28 13:46:57 PDT 2017, > > > null, java.net.SocketTimeoutException: callTimeout=12, > > > callDuration=120301: row '65_app129041.iad1.mydomain.com_1476641940' > > > on table 'server_based_data' at region=server_based_data > > > > I'll try to study the code for the hbase-client and hbase-server jars but > > hopefully someone will know offhand what the difference between the methods > > is and what is causing the initTableMapperJob call to fail. > >
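A sketch of one way to experiment with the client-side timeouts for the MapReduce scan, using standard HBase client settings on the job configuration (values are illustrative, not recommendations; the default of 35 retries is what produces attempts=36 in the logs above):

    Configuration conf = HBaseConfiguration.create();
    conf.setInt("hbase.client.scanner.timeout.period", 120000); // scanner RPC timeout, ms
    conf.setInt("hbase.rpc.timeout", 120000);                   // general RPC timeout, ms
    conf.setInt("hbase.client.retries.number", 5);              // fail faster than the default 35
    Job job = Job.getInstance(conf, "mttr-job");                // hypothetical job name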
Re: Difference between ResultScanner and initTableMapperJob
In case I forgot to say, the keys that the log shows as causing the RetriesExhaustedException should be deleted/gone from the table due to the TTL being exceeded.

Fri Jul 07 20:23:26 PDT 2017, null, java.net.SocketTimeoutException: callTimeout=4, callDuration=40303: row '41_db160190.iad3.mydomain.com_1486067940' on table 'server_based_data' at region=server_based_data,41_db160190.iad3.mydomain.com_1486067940,1487094006943.f67c3b9836107bdbe6a533e2829c509a., hostname=hslave35150.ams9.mydomain.com,60020,1483579082784, seqNum=5423139

The timestamp here is from Feb 2, 2017. My TTL is 30 days. Since I ran the job on July 7, 2017, Feb 2017 is way past the 30 day TTL.

describe 'server_based_data'
Table server_based_data is ENABLED
server_based_data
COLUMN FAMILIES DESCRIPTION
{NAME => 'raw_data', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'SNAPPY', MIN_VERSIONS => '0', TTL => '2592000 SECONDS (30 DAYS)', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
1 row(s) in 0.5180 seconds

On Tue, Jul 11, 2017 at 12:11 PM, S L wrote: > Sorry for not being clear. I tried with both versions, first 1.0.1, then > 1.2.0-cdh5.7.2. We are currently running on Cloudera 5.7.2, which is why I used > that version of the jar. > > I had set the timeout to be as short as 30 sec and as long as 2 min but I > was still running into the problem. When setting the timeout to 2 min, the > job took almost 50 min to "complete". Complete is in quotes because it > fails (see pastebin below) > > Here's a copy of the hadoop output logs via pastebin. The log is 11000 > lines so I just pasted up to the first couple exceptions and then pasted > the end where it jumps from 80% maps to 100% and from 21% reduce to 100% > because Yarn or something killed it. > > https://pastebin.com/KwriyPn6 > http://imgur.com/a/ouPZ5 - screenshot from failed mapreduce job from > cloudera manager/Yarn > > > > On Mon, Jul 10, 2017 at 8:50 PM, Ted Yu wrote: > >> bq. for hbase-client/hbase-server version 1.0.1 and 1.2.0-cdh5.7.2. >> >> You mean the error occurred for both versions or, client is on 1.0.1 and >> server is on 1.2.0 ? >> >> There should be more to the RetriesExhaustedException. >> Can you pastebin the full stack trace ? >> >> Cheers >> >> On Mon, Jul 10, 2017 at 2:21 PM, S L wrote: >> >> > I hope someone can tell me what the difference between these two API >> calls >> > is. I'm getting weird results between the two of them. This is >> happening >> > for hbase-client/hbase-server version 1.0.1 and 1.2.0-cdh5.7.2. >> > >> > First off, my rowkeys are in the format hash_name_timestamp >> > e.g. 100_servername_1234567890. The hbase table has a TTL of 30 days so >> > things older than 30 days should disappear after compaction. >> > >> > The following is code for using ResultScanner. It doesn't use >> MapReduce so >> > it takes a very long time to complete. I can't run my job this way >> because >> > it takes too long. However, for debugging purposes, I don't have any >> > problems with this method.
It lists all keys for the specified time >> range, >> > which look valid to me since all the timestamps of the returned keys are >> > within the past 30 days and within the specified time range: >> > >> > Scan scan = new Scan(); >> > scan.addColumn(Bytes.toBytes("raw_data"), Bytes.toBytes(fileType)); >> > scan.setCaching(500); >> > scan.setCacheBlocks(false); >> > scan.setTimeRange(start, end); >> > >> > Connection fConnection = ConnectionFactory.createConnection(conf); >> > Table table = fConnection.getTable(TableName.valueOf(tableName)); >> > ResultScanner scanner = table.getScanner(scan); >> > for (Result result = scanner.next(); result != null; result = >> > scanner.next()) { >> > System.out.println("Found row: " + Bytes.toString(result.getRow() >> > )); >> > } >> > >> > >> > The following code doesn't work but it uses MapReduce, which runs way >> faster >> > than using the ResultScanner way, since it divides things up into 1200 >> > maps. The problem is I'm getting rowkeys that should have disappeared >> due >> > to TTL expiring: >> > >> > Scan scan = new Scan(); >> > scan.addColumn(Bytes.toBytes("raw_data"), Bytes.toBytes(fileType)); >> > scan.setCaching(500); >> > scan.setCacheBlocks(false); >> > scan.setTimeRange(start, end); >> > TableMapReduceUtil.initTableMapperJob(tableName, scan, >> MTTRMapper.class, >> > Text.class, IntWritable.class, job); >> > >> > Here is the error that I get, which eventually kills the whole MR job >> later >> > because over 25% of the mappers failed. >> > >> > > Error: org.apache.hadoop.hbase.client.RetriesExhaustedException: >> > > Failed after attempts=36, exceptions: Wed Jun 28 13:46:57 PDT 2017, >> > > null, java.net.SocketTimeoutException:
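The TTL arithmetic can be checked directly from the rowkey, since the hash_name_timestamp format described earlier ends in epoch seconds; a small sketch:

    String rowKey = "41_db160190.iad3.mydomain.com_1486067940";
    long epochSec = Long.parseLong(rowKey.substring(rowKey.lastIndexOf('_') + 1));
    System.out.println(new java.util.Date(epochSec * 1000L)); // a date of Feb 2, 2017 (UTC)
    long ttlSec = 2592000L;                                   // 30 days, from the table descriptor
    boolean pastTtl = System.currentTimeMillis() / 1000L - epochSec > ttlSec;
    System.out.println("past TTL: " + pastTtl);               // true for a July 2017 run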
How to get a list of running tasks in hbase shell?
I sent this earlier in another thread. Thought I'd create its own to get an answer. Thanks.

How do you get an instance of TaskMonitor in JRuby (bin/hbase shell)? I tried the following and it didn't result in anything:

taskmonitor = org.apache.hadoop.hbase.monitoring.TaskMonitor.get
taskmonitor.get_tasks.each do |task|
  printf("%s\r\n", task.to_string)
end
exit

I was trying to mimic the following code in "hbase-server/src/main/jamon/org/apache/hadoop/hbase/tmpl/common/TaskMonitorTmpl.jamon":

TaskMonitor taskMonitor = TaskMonitor.get();
...
List<MonitoredTask> tasks = taskMonitor.getTasks();
Re: scope of RegionCoprocessorEnvironment sharedData
Ya. It is the same RegionScanner impl in use only being passed. Ya the param type should have been RegionScanner I guess. We made that mistake! -Anoop- On Mon, Jul 10, 2017 at 8:37 PM, Ted Yu wrote: > The tricky part is that postScannerNext() passes an InternalScanner parameter > instead of RegionScanner. > > FYI > > On Sun, Jul 9, 2017 at 10:57 PM, Anoop John wrote: > >> Ya as Ted said, u are not getting the Scan object in the postScannerNext >> and so can not make use of the attribute in Scan within this hook. >> Just setting the sharedData variable will cause issues with concurrent >> scans. (As u imagine) >> >> So I can think of solving this in 2 possible ways. (May be more ways >> possible) >> >> 1. U keep a Map within ur CP impl. You implement postScannerOpen >> where u will get the ref to the Scanner being created as well as the Scan. >> If the Scan is having the attribute, keep that scanner within ur Map. >> During postScannerNext check if the coming in scanner is there in ur >> Map. If so that means this is the one where u can do the action. >> Also don't forget to implement postScannerClose and remove that scanner >> from the Map. Here u might have some perf penalty as u have to add >> and get from the Map which has to be a concurrent map too. >> >> Another way >> >> 2. Create a custom scanner implementing RegionScanner. The new one >> has to take an original RegionScanner and just delegate the calls. In >> postScannerOpen, u will get the original scanner being created and u >> can just wrap it with ur new scanner object. (If the Scan object is >> having the required attribute).. In postScannerNext() u can check for ur >> own RegionScanner type and if so u can do the action. >> >> >> -Anoop- >> >> >> On Sat, Jul 8, 2017 at 9:13 PM, Ted Yu wrote: >> > if (canUseGetOperation(e)) { >> > //logic goes here >> > >> > Does your Get target the same region being scanned ? >> > If not, issuing the Get is not advised since the other region may be >> hosted >> > on a different region server. >> > >> > Cheers >> > >> > On Thu, Jul 6, 2017 at 7:14 AM, Veerraju Tadimeti >> wrote: >> > >> >> hi, >> >> >> >> I have a few questions regarding the scope of *RegionCoprocessorEnvironment* >> >> sharedData. >> >> >> >> >> >> >> >>- *Is the sharedData map shared across all instances simultaneously >> ?* >> >> - I am putting a variable in sharedData in preScannerOpen() >> based on the >> >> scan attribute, >> >> - check that the variable exists in postScannerNext(), then apply the >> logic, >> >> - remove the variable in postScannerClose(). >> >> - If data is in multiple regions, when one coprocessor removes the >> >> variable in postScannerClose(), will the variable be NULL for >> another >> >> region coprocessor in postScannerNext() ? >> >> >> >> >> >>- *Is the sharedData map shared across all the client request >> >>operations ?* >> >> >> >> If a variable is set in sharedData for one client operation (say SCAN), >> will >> >> the variable be available for another client operation (a new SCAN) ?
>> >> >> >> >> >>- *Will the variables be garbage collected even if we don't implement >> >>the (remove variables in sharedData) postScannerClose() method ?* >> >> >> >> >> >> Please find below the logic that I am using currently >> >> *CODE: * >> >> >> >> public RegionScanner >> >> *preScannerOpen*(ObserverContext<RegionCoprocessorEnvironment> >> >> e, Scan scan, RegionScanner s) throws IOException { >> >> byte[] useGetInPostScannerNext = scan.getAttribute( >> >> USE_GET_OPERATION_IN_POST_SCANNER_NEXT); >> >> String useGetInPostScannerNextStr = Bytes.toString( >> >> useGetInPostScannerNext); >> >> if (Boolean.parseBoolean(useGetInPostScannerNextStr)) { >> >> e.getEnvironment().getSharedData().put( >> >> USE_GET_OPERATION_IN_POST_SCANNER_NEXT, useGetInPostScannerNextStr); >> >> } >> >> return super.preScannerOpen(e, scan, s); >> >> } >> >> >> >> @Override >> >> public boolean *postScannerNext*(final >> >> ObserverContext<RegionCoprocessorEnvironment> >> >> e, >> >> final InternalScanner s, final List<Result> results, final >> int >> >> limit, >> >> final boolean hasMore) throws IOException { >> >> try { >> >> >> >> if (canUseGetOperation(e)) { >> >> >> >> //logic goes here >> >> } >> >> } catch (Exception ex) { >> >> logger.error("Exception in postScannerNext ", ex); >> >> throw new IOException(ex); >> >> } >> >> return hasMore; >> >> } >> >> >> >> @Override >> >> public void >> >> *postScannerClose*(ObserverContext<RegionCoprocessorEnvironment> >> >> e, InternalScanner s) throws IOException { >> >> if (canUseGetOperation(e)) { >> >> e.getEnvironment().getSharedData().remove( >> >> USE_GET_OPERATION_IN_POST_SCANNER_NEXT); >> >> } >> >> super.postScannerClose(e, s); >> >>
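A minimal, self-contained sketch of option #1 from Anoop's reply (the scanner-keyed concurrent map); the class name and attribute constant are illustrative, and the hook signatures follow the 1.x RegionObserver API used in the code above:

    import java.io.IOException;
    import java.util.Collections;
    import java.util.List;
    import java.util.Set;
    import java.util.concurrent.ConcurrentHashMap;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
    import org.apache.hadoop.hbase.coprocessor.ObserverContext;
    import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
    import org.apache.hadoop.hbase.regionserver.InternalScanner;
    import org.apache.hadoop.hbase.regionserver.RegionScanner;
    import org.apache.hadoop.hbase.util.Bytes;

    public class FlaggedScanObserver extends BaseRegionObserver {
      private static final String ATTR = "USE_GET_OPERATION_IN_POST_SCANNER_NEXT";
      // Scanners whose Scan carried the attribute; concurrent scans require a concurrent set.
      private final Set<InternalScanner> flagged =
          Collections.newSetFromMap(new ConcurrentHashMap<InternalScanner, Boolean>());

      @Override
      public RegionScanner postScannerOpen(ObserverContext<RegionCoprocessorEnvironment> e,
          Scan scan, RegionScanner s) throws IOException {
        if (Boolean.parseBoolean(Bytes.toString(scan.getAttribute(ATTR)))) {
          flagged.add(s); // remember this particular scanner instead of setting a shared flag
        }
        return s;
      }

      @Override
      public boolean postScannerNext(ObserverContext<RegionCoprocessorEnvironment> e,
          InternalScanner s, List<Result> results, int limit, boolean hasMore) throws IOException {
        if (flagged.contains(s)) {
          // per-scan logic goes here, e.g. the Get-based lookup discussed above
        }
        return hasMore;
      }

      @Override
      public void postScannerClose(ObserverContext<RegionCoprocessorEnvironment> e,
          InternalScanner s) throws IOException {
        flagged.remove(s); // avoid leaking entries for closed scanners
      }
    }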