Re: Filters with TimeRange (should get executed only in regions matching TimeRange)

2017-07-11 Thread Ted Yu
Can you tell us more about your row key design ?

Thanks

On Tue, Jul 11, 2017 at 3:03 PM, Veerraju Tadimeti 
wrote:

> hi,
>
> If I implement a filter, it does full range scan.  Is there a way to
> implement a Filter ( with TimeRange, startRow  and stopRow ) without doing
> Full range scan.
>
> Basically, if we pass TimeRange to scan, it wont do Full range scan.  I
> want write a filter to get executed only in the regions which fall under
> TimeRange.
>
> Thank you in advance.
>
>
> Thanks,
> Raju,
> (972)273-0155.
>


Re: Difference between ResultScanner and initTableMapperJob

2017-07-11 Thread S L
I got a timeout when trying to search for this row (185_) and for a
different row (20_):

hbase(main):016:0> scan 'server_based_data', {FILTER => "(PrefixFilter
('20'))", COLUMNS => 'raw_data:top', TIMERANGE => [149920560,
149920620]}

ROW                                COLUMN+CELL




ERROR: Call id=7856, waitTime=120001, operationTimeout=12 expired.

I tried increasing the timeout, but now, after waiting over 1 hr, it still
hasn't come back.

hbase(main):017:0>
@shell.hbase.configuration.setInt("hbase.client.scanner.timeout.period",
24)

hbase(main):018:0> scan 'server_based_data', {FILTER => "(PrefixFilter
('20_'))", COLUMNS => 'raw_data:top', TIMERANGE => [149920560,
149920620]}

ROW                                COLUMN+CELL




(Still no output and waiting over 1 hr)
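
(A side note on the override above: as far as I know, changing
hbase.client.scanner.timeout.period on the shell's Configuration after the
connection has already been created may not take effect. An alternative is a
client-side hbase-site.xml entry; the value is in milliseconds and 240000 is
just an example:)

    <property>
      <name>hbase.client.scanner.timeout.period</name>
      <value>240000</value>
    </property>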


I also checked other failed/killed mappers.  These are a small sample of
"bad" rowkeys.  These deleted rowkeys show up with all sorts of hashes, so
scanning a row after a "bad" rowkey won't tell us much, since these bad
rowkeys occur across all sorts of rows/hashes.

2017-07-07 20:25:59,640 INFO [main]
org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl: recovered from
org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after
attempts=36, exceptions:

Fri Jul 07 20:25:59 PDT 2017, null, java.net.SocketTimeoutException:
callTimeout=4, callDuration=40306: row
'145_app129023.lhr1.mydomain.com_1482214200' on table 'server_based_data'
at
region=server_based_data,145_app129023.lhr1.mydomain.com_1482214200,1483679406846.fbc6c1e473b944fcf1eedd03a3b8e2ec.,
hostname=hslave35139.ams9.mydomain.com,60020,1483577331446, seqNum=8165882





2017-07-07 20:29:22,280 INFO [main]
org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl: recovered from
org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after
attempts=36, exceptions:

Fri Jul 07 20:29:22 PDT 2017, null, java.net.SocketTimeoutException:
callTimeout=4, callDuration=40303: row
'162_app128162.sjc4.mydomain.com_1485642420' on table 'server_based_data'
at
region=server_based_data,162_app128162.sjc4.mydomain.com_1485642420,1485969672759.37985ed5325cf4afb4bd54afa25728e9.,
hostname=hslave35150.ams9.mydomain.com,60020,1483579082784, seqNum=5489984





2017-07-07 20:28:52,216 INFO [main]
org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl: recovered from
org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after
attempts=36, exceptions:

Fri Jul 07 20:28:52 PDT 2017, null, java.net.SocketTimeoutException:
callTimeout=4, callDuration=40304: row
'41_db160190.iad3.mydomain.com_1486067940' on table 'server_based_data' at
region=server_based_data,41_db160190.iad3.mydomain.com_1486067940,1487094006943.f67c3b9836107bdbe6a533e2829c509a.,
hostname=hslave35150.ams9.mydomain.com,60020,1483579082784, seqNum=5423139





On Tue, Jul 11, 2017 at 2:12 PM, Ted Yu  wrote:

> bq. it can find 0 rows in less than 1 sec
>
> What if you perform a scan with start row lower than the deleted key, can
> you reproduce the hanging scan ?
>
> Cheers
>
> On Tue, Jul 11, 2017 at 1:55 PM, S L  wrote:
>
> > Same error as from the hadoop job output I initially posted.
> >
> > SocketTimeoutException/RetriesExhaustedException is caused by a key that
> > should be deleted/expired.
> >
> > row '184_app128057.syd2.mydomain.com_1485646620'.
> >
> > The funny thing is when I execute a "get 'tablename', 'rowkey'" from
> "hbase
> > shell", it can find 0 rows in less than 1 sec.  It seems like the
> > initTableMapperJob method is sitting there for 40 sec trying to reach
> this
> > non-existent key for some reason.
> >
> >
> > 2017-07-07 20:28:19,974 INFO [main] org.apache.hadoop.mapred.MapTask:
> > bufstart = 0; bufvoid = 268435456
> >
> > 2017-07-07 20:28:19,974 INFO [main] org.apache.hadoop.mapred.MapTask:
> > kvstart = 67108860; length = 16777216
> >
> > 2017-07-07 20:28:19,980 INFO [main] org.apache.hadoop.mapred.MapTask:
> Map
> > output collector class = org.apache.hadoop.mapred.
> MapTask$MapOutputBuffer
> >
> > 2017-07-07 20:29:25,248 INFO [main]
> > org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl: recovered from
> > org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after
> > attempts=36, exceptions:
> >
> > Fri Jul 07 20:29:25 PDT 2017, null, java.net.SocketTimeoutException:
> > callTimeout=4, callDuration=40314: row
> > '184_app128057.syd2.mydomain.com_1485646620' on table
> 'server_based_data'
> > at
> > region=server_based_data,184_app128057.syd2.mydomain.com_
> > 1485646620,1486597623524.37ccf993b84fd15b24c0c4efbb95b7f5.,
> > hostname=hslave35120.ams9.mydomain.com,60020,1498245230342,
> seqNum=9247698
> >
> >
> >
> >at
> > org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadRepli
> > cas.throwEnrichedException(RpcRetryingCallerWithReadReplicas.java:276)
> >
> >at
> > 

Filters with TimeRange (should get executed only in regions matching TimeRange)

2017-07-11 Thread Veerraju Tadimeti
hi,

If I implement a filter, it does a full range scan.  Is there a way to
implement a Filter (with TimeRange, startRow, and stopRow) without doing a
full range scan?

Basically, if we pass a TimeRange to the scan, it won't do a full range scan.
I want to write a filter that gets executed only in the regions which fall
under the TimeRange.
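
(For illustration, a rough sketch of combining a row range and a TimeRange on
one scan; the column, start/stop keys, time bounds, and filter below are
placeholders.  As far as I understand, only startRow/stopRow limit which
regions are touched, while the TimeRange lets each region skip store
files/cells outside the range, and any Filter runs on top of both:)

    Scan scan = new Scan();
    scan.addColumn(Bytes.toBytes("raw_data"), Bytes.toBytes("top"));  // placeholder column
    scan.setStartRow(Bytes.toBytes("100_"));                          // hypothetical start key
    scan.setStopRow(Bytes.toBytes("101_"));                           // hypothetical stop key (exclusive)
    long startMillis = 1499205600000L, endMillis = 1499206200000L;    // example epoch-millis bounds
    scan.setTimeRange(startMillis, endMillis);                        // prunes cells/store files outside the range
    scan.setFilter(new KeyOnlyFilter());                              // example filter, applied after the above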

Thank you in advance.


Thanks,
Raju,
(972)273-0155.


Re: scope of RegionCoprocessorEnvironment sharedData

2017-07-11 Thread Veerraju Tadimeti
Can I load the coprocessor dynamically for a scan operation? It should not be
loaded for another scan operation if not intended.
BTW, I invoke the scan from Hive.

Sent from my iPhone

> On Jul 11, 2017, at 4:15 PM, Veerraju Tadimeti  wrote:
> 
> hi,
> 
> Hi John,
> 
> Thanks for the reply.
> 
> I implemented #2 (another way) in ur above post:
> 
> 
> 
> i debug the logs  : in PostScannerOpen() , regionScanner method parameter 
> object is null
> 
> Also, in preScannerOpen() , i returned return super.preScannerOpen(e, scan, 
> new DelegateRegionScanner(s)); 
> in postScannerNext() , internalScanner object is 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl
> 
> #1 way (Put scanner in local map) - may not be possible, cos if two different 
> scan operation with and without attribute hits at the same time, how can we 
> differentiate in postScannerNext. 
> 
> 
> Thanks,
> Raju,
> (972)273-0155.
> 
>> On Tue, Jul 11, 2017 at 8:05 AM, Anoop John  wrote:
>> Ya. It is the same RegionScanner impl in use only being passed.  Ya
>> the param type should have been RegionScanner  I guess. We made that
>> mistake!
>> -Anoop-
>> 
>> On Mon, Jul 10, 2017 at 8:37 PM, Ted Yu  wrote:
>> > The tricky part is that postScannerNext() passes InternalScanner parameter
>> > instead of RegionScanner.
>> >
>> > FYI
>> >
>> > On Sun, Jul 9, 2017 at 10:57 PM, Anoop John  wrote:
>> >
>> >> Ya as Ted said, u are not getting Scan object in the postScannerNext
>> >> and so can not make use of the attribute in Scan within this hook.
>> >> Just setting the sharedData variable will cause issue with concurrent
>> >> scans. (As u imagine)
>> >>
>> >> So I can think of solving this in 2 possible ways. (May be more ways
>> >> possible)
>> >>
>> >> 1.  U keep a Map within ur CP impl.  You implement postScannerOpen
>> >> where u will get the ref to Scanner been created as well as the Scan.
>> >> If the Scan is having attribute, keep that scanner within ur Map.
>> >> During postScannerNext  check if the coming in scanner is there in ur
>> >> Map. If so that means this is the one where u can do the action.
>> >> Also dont forget to implement postScannerClose and remove that scanner
>> >> from the Map.   Here u might have some perf penalty as u have to add
>> >> and get from Map which has to be a concurrent map too.
>> >>
>> >> Another way
>> >>
>> >> 2. Create a custom scanner implementing RegionScanner.   The new one
>> >> has to take an original Region Scanner and just delegate the calls. On
>> >> postScannerOpen, u will get the original scanner been created and u
>> >> can just wrap it with ur new scanner object. ( If the Scan object is
>> >> having required attribute)..  In postScannerNext() u can check for ur
>> >> own RegionScanner type and if so u can do action.
>> >>
>> >>
>> >> -Anoop-
>> >>
>> >>
>> >> On Sat, Jul 8, 2017 at 9:13 PM, Ted Yu  wrote:
>> >> > if (canUseGetOperation(e)) {
>> >> >//logic goes here
>> >> >
>> >> > Does your Get target the same region being scanned ?
>> >> > If not, issuing the Get is not advised since the other region may be
>> >> hosted
>> >> > on different region server.
>> >> >
>> >> > Cheers
>> >> >
>> >> > On Thu, Jul 6, 2017 at 7:14 AM, Veerraju Tadimeti 
>> >> wrote:
>> >> >
>> >> >> hi,
>> >> >>
>> >> >> I have few questions regarding scope of *RegionCoprocessorEnvironment*
>> >> >>  sharedData.
>> >> >>
>> >> >>
>> >> >>
>> >> >>- *Is sharedData map is shared accross all instances simultaneously
>> >> ?*
>> >> >>   -  I am putting a variable in sharedData in preScannerOpen()
>> >> based on
>> >> >>   scan attribute,
>> >> >>   - check that variable exists in postScannerNext() then apply
>> >> logic,
>> >> >>   - remove the variable postScannerClose().
>> >> >>   - If data is in multiple regions, when one coprocessor removes
>> >> >>   variable in postScannerClose(), will the variable is NULL for
>> >> another
>> >> >>   region coprocessor in postScannerNext() ?
>> >> >>
>> >> >>
>> >> >>- *Is sharedData map is shared across all the client request
>> >> >>operations ?*
>> >> >>
>> >> >> If a variable is set in sharedData for one client operation(say SCAN),
>> >> will
>> >> >> the variable is available for another client operation(new SCAN) ?
>> >> >>
>> >> >>
>> >> >>-  *Will the variables be garbage collected even if we dont 
>> >> >> implement
>> >> >>(removed variables in sharedData) postScannerClose() method*
>> >> >>
>> >> >>
>> >> >> Please find below the logic that I am using currently
>> >> >> *CODE: *
>> >> >>
>> >> >> public RegionScanner
>> >> >> *preScannerOpen*(ObserverContext
>> >> >> e, Scan scan, RegionScanner s) throws IOException {
>> >> >> byte[] useGetInPostScannerNext = scan.getAttribute(USE_GET_
>> >> >> OPERATION_IN_POST_SCANNER_NEXT);
>> >> >> String 

Re: scope of RegionCoprocessorEnvironment sharedData

2017-07-11 Thread Veerraju Tadimeti
hi,

Hi John,

Thanks for the reply.

I implemented #2 (*another way*) from your post above.

Debugging the logs: in postScannerOpen(), the RegionScanner method parameter
object is null.

Also, in preScannerOpen() I returned *return super.preScannerOpen(e,
scan, new DelegateRegionScanner(s));*
In postScannerNext(), the InternalScanner object is
*org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl*.

Way #1 (put the scanner in a local map) may not be possible, because if two
different scan operations, one with and one without the attribute, hit at the
same time, how can we differentiate them in postScannerNext()?
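
(For reference, a rough sketch of approach #2 assuming HBase 1.1/1.2-era hook
signatures; DelegateRegionScanner is just a forwarding wrapper used as a marker
type, and the exact RegionScanner method list can differ slightly between
versions, so treat this as an outline rather than the definitive implementation:)

    // imports from org.apache.hadoop.hbase.*, o.a.h.hbase.client.*,
    // o.a.h.hbase.coprocessor.*, o.a.h.hbase.regionserver.*, and
    // o.a.h.hbase.util.Bytes omitted for brevity
    public class DelegateRegionScanner implements RegionScanner {
        private final RegionScanner delegate;
        public DelegateRegionScanner(RegionScanner delegate) { this.delegate = delegate; }
        // every method simply forwards to the wrapped scanner
        public HRegionInfo getRegionInfo() { return delegate.getRegionInfo(); }
        public boolean isFilterDone() throws IOException { return delegate.isFilterDone(); }
        public boolean reseek(byte[] row) throws IOException { return delegate.reseek(row); }
        public long getMaxResultSize() { return delegate.getMaxResultSize(); }
        public long getMvccReadPoint() { return delegate.getMvccReadPoint(); }
        public int getBatch() { return delegate.getBatch(); }
        public boolean nextRaw(List<Cell> result) throws IOException { return delegate.nextRaw(result); }
        public boolean nextRaw(List<Cell> result, ScannerContext ctx) throws IOException { return delegate.nextRaw(result, ctx); }
        public boolean next(List<Cell> result) throws IOException { return delegate.next(result); }
        public boolean next(List<Cell> result, ScannerContext ctx) throws IOException { return delegate.next(result, ctx); }
        public void close() throws IOException { delegate.close(); }
    }

    // In the observer: wrap only scanners whose Scan carries the attribute
    // (postScannerOpen receives the created scanner; in preScannerOpen it is
    // typically still null), then recognize them again by type later.
    @Override
    public RegionScanner postScannerOpen(ObserverContext<RegionCoprocessorEnvironment> e,
            Scan scan, RegionScanner s) throws IOException {
        if (Boolean.parseBoolean(Bytes.toString(
                scan.getAttribute(USE_GET_OPERATION_IN_POST_SCANNER_NEXT)))) {
            return new DelegateRegionScanner(s);
        }
        return s;
    }

    @Override
    public boolean postScannerNext(ObserverContext<RegionCoprocessorEnvironment> e,
            InternalScanner s, List<Result> results, int limit, boolean hasMore) throws IOException {
        if (s instanceof DelegateRegionScanner) {
            // scan-specific logic goes here
        }
        return hasMore;
    }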


Thanks,
Raju,
(972)273-0155.

On Tue, Jul 11, 2017 at 8:05 AM, Anoop John  wrote:

> Ya. It is the same RegionScanner impl in use only being passed.  Ya
> the param type should have been RegionScanner  I guess. We made that
> mistake!
> -Anoop-
>
> On Mon, Jul 10, 2017 at 8:37 PM, Ted Yu  wrote:
> > The tricky part is that postScannerNext() passes InternalScanner
> parameter
> > instead of RegionScanner.
> >
> > FYI
> >
> > On Sun, Jul 9, 2017 at 10:57 PM, Anoop John 
> wrote:
> >
> >> Ya as Ted said, u are not getting Scan object in the postScannerNext
> >> and so can not make use of the attribute in Scan within this hook.
> >> Just setting the sharedData variable will cause issue with concurrent
> >> scans. (As u imagine)
> >>
> >> So I can think of solving this in 2 possible ways. (May be more ways
> >> possible)
> >>
> >> 1.  U keep a Map within ur CP impl.  You implement postScannerOpen
> >> where u will get the ref to Scanner been created as well as the Scan.
> >> If the Scan is having attribute, keep that scanner within ur Map.
> >> During postScannerNext  check if the coming in scanner is there in ur
> >> Map. If so that means this is the one where u can do the action.
> >> Also dont forget to implement postScannerClose and remove that scanner
> >> from the Map.   Here u might have some perf penalty as u have to add
> >> and get from Map which has to be a concurrent map too.
> >>
> >> Another way
> >>
> >> 2. Create a custom scanner implementing RegionScanner.   The new one
> >> has to take an original Region Scanner and just delegate the calls. On
> >> postScannerOpen, u will get the original scanner been created and u
> >> can just wrap it with ur new scanner object. ( If the Scan object is
> >> having required attribute)..  In postScannerNext() u can check for ur
> >> own RegionScanner type and if so u can do action.
> >>
> >>
> >> -Anoop-
> >>
> >>
> >> On Sat, Jul 8, 2017 at 9:13 PM, Ted Yu  wrote:
> >> > if (canUseGetOperation(e)) {
> >> >//logic goes here
> >> >
> >> > Does your Get target the same region being scanned ?
> >> > If not, issuing the Get is not advised since the other region may be
> >> hosted
> >> > on different region server.
> >> >
> >> > Cheers
> >> >
> >> > On Thu, Jul 6, 2017 at 7:14 AM, Veerraju Tadimeti 
> >> wrote:
> >> >
> >> >> hi,
> >> >>
> >> >> I have few questions regarding scope of
> *RegionCoprocessorEnvironment*
> >> >>  sharedData.
> >> >>
> >> >>
> >> >>
> >> >>- *Is sharedData map is shared accross all instances
> simultaneously
> >> ?*
> >> >>   -  I am putting a variable in sharedData in preScannerOpen()
> >> based on
> >> >>   scan attribute,
> >> >>   - check that variable exists in postScannerNext() then apply
> >> logic,
> >> >>   - remove the variable postScannerClose().
> >> >>   - If data is in multiple regions, when one coprocessor removes
> >> >>   variable in postScannerClose(), will the variable is NULL for
> >> another
> >> >>   region coprocessor in postScannerNext() ?
> >> >>
> >> >>
> >> >>- *Is sharedData map is shared across all the client request
> >> >>operations ?*
> >> >>
> >> >> If a variable is set in sharedData for one client operation(say
> SCAN),
> >> will
> >> >> the variable is available for another client operation(new SCAN) ?
> >> >>
> >> >>
> >> >>-  *Will the variables be garbage collected even if we dont
> implement
> >> >>(removed variables in sharedData) postScannerClose() method*
> >> >>
> >> >>
> >> >> Please find below the logic that I am using currently
> >> >> *CODE: *
> >> >>
> >> >> public RegionScanner
> >> >> *preScannerOpen*(ObserverContext
> >> >> e, Scan scan, RegionScanner s) throws IOException {
> >> >> byte[] useGetInPostScannerNext = scan.getAttribute(USE_GET_
> >> >> OPERATION_IN_POST_SCANNER_NEXT);
> >> >> String useGetInPostScannerNextStr = Bytes.toString(
> >> >> useGetInPostScannerNext);
> >> >> if (Boolean.parseBoolean(useGetInPostScannerNextStr)) {
> >> >> e.getEnvironment().getSharedData().put(USE_GET_
> >> >> OPERATION_IN_POST_SCANNER_NEXT, useGetInPostScannerNextStr);
> >> >> }
> >> >> return super.preScannerOpen(e, scan, s);
> >> >> }
> >> >>
> >> >> @Override
> >> 

Re: Difference between ResultScanner and initTableMapperJob

2017-07-11 Thread Ted Yu
bq. it can find 0 rows in less than 1 sec

What if you perform a scan with start row lower than the deleted key, can
you reproduce the hanging scan ?

Cheers

On Tue, Jul 11, 2017 at 1:55 PM, S L  wrote:

> Same error as from the hadoop job output I initially posted.
>
> SocketTimeoutException/RetriesExhaustedException is caused by a key that
> should be deleted/expired.
>
> row '184_app128057.syd2.mydomain.com_1485646620'.
>
> The funny thing is when I execute a "get 'tablename', 'rowkey'" from "hbase
> shell", it can find 0 rows in less than 1 sec.  It seems like the
> initTableMapperJob method is sitting there for 40 sec trying to reach this
> non-existent key for some reason.
>
>
> 2017-07-07 20:28:19,974 INFO [main] org.apache.hadoop.mapred.MapTask:
> bufstart = 0; bufvoid = 268435456
>
> 2017-07-07 20:28:19,974 INFO [main] org.apache.hadoop.mapred.MapTask:
> kvstart = 67108860; length = 16777216
>
> 2017-07-07 20:28:19,980 INFO [main] org.apache.hadoop.mapred.MapTask: Map
> output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
>
> 2017-07-07 20:29:25,248 INFO [main]
> org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl: recovered from
> org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after
> attempts=36, exceptions:
>
> Fri Jul 07 20:29:25 PDT 2017, null, java.net.SocketTimeoutException:
> callTimeout=4, callDuration=40314: row
> '184_app128057.syd2.mydomain.com_1485646620' on table 'server_based_data'
> at
> region=server_based_data,184_app128057.syd2.mydomain.com_
> 1485646620,1486597623524.37ccf993b84fd15b24c0c4efbb95b7f5.,
> hostname=hslave35120.ams9.mydomain.com,60020,1498245230342, seqNum=9247698
>
>
>
>at
> org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadRepli
> cas.throwEnrichedException(RpcRetryingCallerWithReadReplicas.java:276)
>
>at
> org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(
> ScannerCallableWithReplicas.java:207)
>
>at
> org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(
> ScannerCallableWithReplicas.java:60)
>
>at
> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(
> RpcRetryingCaller.java:200)
>
>at
> org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:320)
>
>at
> org.apache.hadoop.hbase.client.ClientScanner.loadCache(ClientScanner.java:
> 403)
>
>at
> org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:364)
>
>at
> org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl.nextKeyValue(
> TableRecordReaderImpl.java:222)
>
>at
> org.apache.hadoop.hbase.mapreduce.TableRecordReader.nextKeyValue(
> TableRecordReader.java:147)
>
>at
> org.apache.hadoop.hbase.mapreduce.TableInputFormatBase$1.nextKeyValue(
> TableInputFormatBase.java:216)
>
>at
> org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.
> nextKeyValue(MapTask.java:556)
>
>at
> org.apache.hadoop.mapreduce.task.MapContextImpl.
> nextKeyValue(MapContextImpl.java:80)
>
>at
> org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.
> nextKeyValue(WrappedMapper.java:91)
>
>at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>
>at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
>
>at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
>
>at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
>
>at java.security.AccessController.doPrivileged(Native Method)
>
>at javax.security.auth.Subject.doAs(Subject.java:415)
>
>at
> org.apache.hadoop.security.UserGroupInformation.doAs(
> UserGroupInformation.java:1693)
>
>at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
>
> Caused by: java.net.SocketTimeoutException: callTimeout=4,
> callDuration=40314: row '184_app128057.syd2.mydomain.com_1485646620' on
> table 'server_based_data' at
> region=server_based_data,184_app128057.syd2.mydomain.com_
> 1485646620,1486597623524.37ccf993b84fd15b24c0c4efbb95b7f5.,
> hostname=hslave35120.ams9.mydomain.com,60020,1498245230342, seqNum=9247698
>
>at
> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(
> RpcRetryingCaller.java:159)
>
>at
> org.apache.hadoop.hbase.client.ResultBoundedCompletionService
> $QueueingFuture.run(ResultBoundedCompletionService.java:65)
>
>at
> java.util.concurrent.ThreadPoolExecutor.runWorker(
> ThreadPoolExecutor.java:1145)
>
>at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(
> ThreadPoolExecutor.java:615)
>
>at java.lang.Thread.run(Thread.java:745)
>
> Caused by: java.io.IOException: Call to
> hslave35120.ams9.mydomain.com/10.216.35.120:60020 failed on local
> exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=2,
> waitTime=40001, operationTimeout=4 expired.
>
>at
> org.apache.hadoop.hbase.ipc.AbstractRpcClient.wrapException(
> AbstractRpcClient.java:291)
>
>at
> org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1272)
>
>at
> 

Re: Difference between ResultScanner and initTableMapperJob

2017-07-11 Thread S L
Same error as from the hadoop job output I initially posted.

SocketTimeoutException/RetriesExhaustedException is caused by a key that
should be deleted/expired.

row '184_app128057.syd2.mydomain.com_1485646620'.

The funny thing is when I execute a "get 'tablename', 'rowkey'" from "hbase
shell", it can find 0 rows in less than 1 sec.  It seems like the
initTableMapperJob method is sitting there for 40 sec trying to reach this
non-existent key for some reason.


2017-07-07 20:28:19,974 INFO [main] org.apache.hadoop.mapred.MapTask:
bufstart = 0; bufvoid = 268435456

2017-07-07 20:28:19,974 INFO [main] org.apache.hadoop.mapred.MapTask:
kvstart = 67108860; length = 16777216

2017-07-07 20:28:19,980 INFO [main] org.apache.hadoop.mapred.MapTask: Map
output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer

2017-07-07 20:29:25,248 INFO [main]
org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl: recovered from
org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after
attempts=36, exceptions:

Fri Jul 07 20:29:25 PDT 2017, null, java.net.SocketTimeoutException:
callTimeout=4, callDuration=40314: row
'184_app128057.syd2.mydomain.com_1485646620' on table 'server_based_data'
at
region=server_based_data,184_app128057.syd2.mydomain.com_1485646620,1486597623524.37ccf993b84fd15b24c0c4efbb95b7f5.,
hostname=hslave35120.ams9.mydomain.com,60020,1498245230342, seqNum=9247698



   at
org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.throwEnrichedException(RpcRetryingCallerWithReadReplicas.java:276)

   at
org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:207)

   at
org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:60)

   at
org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)

   at
org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:320)

   at
org.apache.hadoop.hbase.client.ClientScanner.loadCache(ClientScanner.java:403)

   at
org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:364)

   at
org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl.nextKeyValue(TableRecordReaderImpl.java:222)

   at
org.apache.hadoop.hbase.mapreduce.TableRecordReader.nextKeyValue(TableRecordReader.java:147)

   at
org.apache.hadoop.hbase.mapreduce.TableInputFormatBase$1.nextKeyValue(TableInputFormatBase.java:216)

   at
org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:556)

   at
org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)

   at
org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)

   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)

   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)

   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)

   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)

   at java.security.AccessController.doPrivileged(Native Method)

   at javax.security.auth.Subject.doAs(Subject.java:415)

   at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)

   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

Caused by: java.net.SocketTimeoutException: callTimeout=4,
callDuration=40314: row '184_app128057.syd2.mydomain.com_1485646620' on
table 'server_based_data' at
region=server_based_data,184_app128057.syd2.mydomain.com_1485646620,1486597623524.37ccf993b84fd15b24c0c4efbb95b7f5.,
hostname=hslave35120.ams9.mydomain.com,60020,1498245230342, seqNum=9247698

   at
org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:159)

   at
org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture.run(ResultBoundedCompletionService.java:65)

   at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)

   at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)

   at java.lang.Thread.run(Thread.java:745)

Caused by: java.io.IOException: Call to
hslave35120.ams9.mydomain.com/10.216.35.120:60020 failed on local
exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=2,
waitTime=40001, operationTimeout=4 expired.

   at
org.apache.hadoop.hbase.ipc.AbstractRpcClient.wrapException(AbstractRpcClient.java:291)

   at
org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1272)

   at
org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:226)

   at
org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:331)

   at
org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.scan(ClientProtos.java:34094)

   at
org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:219)

   at

Re: Difference between ResultScanner and initTableMapperJob

2017-07-11 Thread Ted Yu
Can you take a look at the server log on hslave35150.ams9.mydomain.com
around 17/07/07 20:23:31 ?

See if there is some clue in the log.

On Tue, Jul 11, 2017 at 12:18 PM, S L  wrote:

> If I forgot to say, the keys that the log shows is causing the
> RetriesExhaustedException should be deleted/gone from the table due to the
> TTL being exceeded.
>
> Fri Jul 07 20:23:26 PDT 2017, null, java.net.SocketTimeoutException:
> callTimeout=4, callDuration=40303: row
> '41_db160190.iad3.mydomain.com_1486067940' on table 'server_based_data' at
> region=server_based_data,41_db160190.iad3.mydomain.com_
> 1486067940,1487094006943.f67c3b9836107bdbe6a533e2829c509a.,
> hostname=hslave35150.ams9.mydomain.com,60020,1483579082784, seqNum=5423139
>
> The timestamp here is from Feb 2, 2017.  My TTL is 30 days.  Since I ran
> the job on July 7, 2017, Feb 2017 is way past the 30 day TTL
>
> describe 'server_based_data'
>
> Table server_based_data is ENABLED
>
>
> server_based_data
>
>
> COLUMN FAMILIES DESCRIPTION
>
>
> {NAME => 'raw_data', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW',
> REPLIC
>
> ATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'SNAPPY', MIN_VERSIONS
> => '0
>
> ', TTL => '2592000 SECONDS (30 DAYS)', KEEP_DELETED_CELLS => 'FALSE',
> BLOCKSIZE
>
> => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
>
>
> 1 row(s) in 0.5180 seconds
>
> On Tue, Jul 11, 2017 at 12:11 PM, S L  wrote:
>
> > Sorry for not being clear.  I tried with both versions, first 1.0.1, then
> > 1,2-cdh5.7.2.  We are currently running on Cloudera 5.7.2, thus why I
> used
> > that version of the jar.
> >
> > I had set the timeout to be as short as 30 sec and as long as 2 min but I
> > was still running into the problem.  When setting the timeout to 2 min,
> the
> > job took almost 50 min to "complete".  Complete is in quotes because it
> > fails (see pastebin below)
> >
> > Here's a copy of the hadoop output logs via pastebin.  The log is 11000
> > lines so I just pasted up to the first couple exceptions and then pasted
> > the end where it jumps from 80% maps to 100% and from 21% reduce to 100%
> > because Yarn or something killed it.
> >
> > https://pastebin.com/KwriyPn6
> > http://imgur.com/a/ouPZ5 - screenshot from failed mapreduce job from
> > cloudera manager/Yarn
> >
> >
> >
> > On Mon, Jul 10, 2017 at 8:50 PM, Ted Yu  wrote:
> >
> >> bq. for hbase-client/hbase-server version 1.0.1 and 1.2.0-cdh5.7.2.
> >>
> >> You mean the error occurred for both versions or, client is on 1.0.1 and
> >> server is on 1.2.0 ?
> >>
> >> There should be more to the RetriesExhaustedException.
> >> Can you pastebin the full stack trace ?
> >>
> >> Cheers
> >>
> >> On Mon, Jul 10, 2017 at 2:21 PM, S L  wrote:
> >>
> >> > I hope someone can tell me what the difference between these two API
> >> calls
> >> > are.  I'm getting weird results between the two of them.  This is
> >> happening
> >> > for hbase-client/hbase-server version 1.0.1 and 1.2.0-cdh5.7.2.
> >> >
> >> > First off, my rowkeys are in the format hash_name_timestamp
> >> > e.g. 100_servername_1234567890.  The hbase table has a TTL of 30 days
> so
> >> > things older than 30 days should disappear after compaction.
> >> >
> >> > The following is code for using ResultScanner.  It doesn't use
> >> MapReduce so
> >> > it takes a very long time to complete.  I can't run my job this way
> >> because
> >> > it takes too long.  However, for debugging purposes, I don't have any
> >> > problems with this method.  It lists all keys for the specified time
> >> range,
> >> > which look valid to me since all the timestamps of the returned keys
> are
> >> > within the past 30 days and within the specified time range:
> >> >
> >> > Scan scan = new Scan();
> >> > scan.addColumn(Bytes.toBytes("raw_data"),
> Bytes.toBytes(fileType));
> >> > scan.setCaching(500);
> >> > scan.setCacheBlocks(false);
> >> > scan.setTimeRange(start, end);
> >> >
> >> > Connection fConnection = ConnectionFactory.
> createConnection(conf);
> >> > Table table = fConnection.getTable(TableName.valueOf(tableName));
> >> > ResultScanner scanner = table.getScanner(scan);
> >> > for (Result result = scanner.next(); result != null; result =
> >> > scanner.next()) {
> >> >System.out.println("Found row: " +
> Bytes.toString(result.getRow()
> >> > ));
> >> > }
> >> >
> >> >
> >> > The follow code doesn't work but it uses MapReduce, which runs way
> >> faster
> >> > than using the ResultScanner way, since it divides things up into 1200
> >> > maps.  The problem is I'm getting rowkeys that should have disappeared
> >> due
> >> > to TTL expiring:
> >> >
> >> > Scan scan = new Scan();
> >> > scan.addColumn(Bytes.toBytes("raw_data"),
> Bytes.toBytes(fileType));
> >> > scan.setCaching(500);
> >> > scan.setCacheBlocks(false);
> >> > scan.setTimeRange(start, end);
> >> > 

Re: Missing data in snapshot - possible flush timing issue?

2017-07-11 Thread Ted Yu
Jacob:
Do you mind updating this thread on whether you saw any unexpected behavior
after applying the patch ?

Thanks

On Wed, May 24, 2017 at 9:04 AM, LeBlanc, Jacob 
wrote:

> Will do. I'll build off 1.1.4 with the patch, apply it to the region
> servers, and capture logs and let you know if I see the exception occur.
>
> --Jacob
>
> -Original Message-
> From: Ted Yu [mailto:yuzhih...@gmail.com]
> Sent: Wednesday, May 24, 2017 11:57 AM
> To: user@hbase.apache.org
> Subject: Re: Missing data in snapshot - possible flush timing issue?
>
> I attached tentative fix to HBASE-18099.
>
> If you have a bandwidth, you can try it out.
>
> On Wed, May 24, 2017 at 8:53 AM, LeBlanc, Jacob 
> wrote:
>
> > Great! I see the JIRA bug you just opened. I'll enable debug logging
> > on FlushSnapshotSubprocedure and HRegion on the region servers in the
> > cluster to see if I can capture log messages as better evidence. Since
> > it's a timing issue I'm not sure when we might see it again, but I'll
> > keep an eye out.
> >
> > Thanks so much for your help,
> >
> > --Jacob
> >
> > -Original Message-
> > From: Ted Yu [mailto:yuzhih...@gmail.com]
> > Sent: Wednesday, May 24, 2017 11:29 AM
> > To: user@hbase.apache.org
> > Subject: Re: Missing data in snapshot - possible flush timing issue?
> >
> > In FlushSnapshotSubprocedure (running on region server), there is
> > debug
> > log:
> >
> >   LOG.debug("... Flush Snapshotting region " +
> > region.toString() + "
> > completed.");
> >
> > If you enable debug log, we would know whether the underlying region
> > is considered having completed the flush.
> >
> > Higher in call() method there is this:
> >
> >   region.flush(true);
> >
> > The return value is not checked.
> >
> > In HRegion#flushcache(), Result.CANNOT_FLUSH may be returned due to:
> >
> >   String msg = "Not flushing since "
> >
> >   + (writestate.flushing ? "already flushing"
> >
> >   : "writes not enabled");
> >
> > The above seems to correlate with your description.
> >
> > Let me log a JIRA referring to this thread.
> >
> > On Wed, May 24, 2017 at 8:08 AM, LeBlanc, Jacob
> > 
> > wrote:
> >
> > > Thanks for looking Ted!
> > >
> > > My understanding of the log messages is that the last line of the
> > > pastebin is the end of the flush of the memstore for the region
> > > where we missed data, but that line is tagged with
> "[MemStoreFlusher.1]"
> > > whereas the other regions that were getting flushed as part of
> > > snapshot are tagged with "[rs(
> > > a1-qa-hbr31416d.lab.lynx-connected.com
> > ,16020,1494432106955)-snapshot-pool81-thread-1]".
> > > With only a superficial understanding, it seems like the flush of
> > > that region where messages were tagged with "[MemStoreFlusher.1]",
> > > while happening at the same time, wasn't really part of the snapshot
> > > process. For example, line 3 in the pastebin shows the flush of one
> > > region starting and tagged with snapshot-pool81-thread-1, line 4
> > > shows the flush starting for the region we missed data and tagged
> > > with MemStoreFlusher.1, and line 5 continues with the flush of
> > > region as part of snapshot. So it definitely looks like multiple
> > > flushes were occurring at the same time whereas elsewhere in the
> > > logs it seems like the flushes are always done sequentially as part
> > > of snapshot. So I came to the theory that perhaps there is a timing
> > > issue where the flushed data for a region is missed as part of a
> > > snapshot because the flush is occurring on another thread as part of
> > > normal, periodic
> > flushing of memstore.
> > >
> > > The last line I see in the full region server log that has anything
> > > to do with the snapshot is line 11 in the pastebin at 2017-05-12
> > > 02:06:05,577 where it's processing events from zookeeper. Again with
> > > only a superficial understanding, I was assuming this had something
> > > to do with the master signaling that the snapshot was complete.
> > > We'll be sure to capture the master log next time.
> > >
> > > And thanks for also checking JIRA for me. If there is a bug here it
> > > seems as though we don't have an option to upgrade to fix it and
> > > we'll have to plan on coding around it for now.
> > >
> > > Thanks,
> > >
> > > --Jacob
> > >
> > > -Original Message-
> > > From: Ted Yu [mailto:yuzhih...@gmail.com]
> > > Sent: Wednesday, May 24, 2017 8:47 AM
> > > To: user@hbase.apache.org
> > > Subject: Re: Missing data in snapshot - possible flush timing issue?
> > >
> > > bq. the snapshot finishes before the flush of that last region
> > > finishes
> > >
> > > According to the last line in the pastebin, flush finished at
> > > 2017-05-12
> > > 02:06:06,063
> > > Did you find something in master log which indicated that snapshot
> > > finished before the above time ?
> > >
> > > I went thru snapshot bug fixes in branch-1.1 backward 

Re: Difference between ResultScanner and initTableMapperJob

2017-07-11 Thread S L
Sorry for not being clear.  I tried with both versions, first 1.0.1, then
1.2.0-cdh5.7.2.  We are currently running on Cloudera 5.7.2, which is why I
used that version of the jar.

I had set the timeout to be as short as 30 sec and as long as 2 min, but I
was still running into the problem.  When setting the timeout to 2 min, the
job took almost 50 min to "complete".  Complete is in quotes because it
fails (see pastebin below).

Here's a copy of the hadoop output logs via pastebin.  The log is ~11,000
lines, so I just pasted up to the first couple of exceptions and then the end,
where it jumps from 80% maps to 100% and from 21% reduce to 100% because
YARN or something killed it.

https://pastebin.com/KwriyPn6
http://imgur.com/a/ouPZ5 - screenshot of the failed mapreduce job from
Cloudera Manager/YARN



On Mon, Jul 10, 2017 at 8:50 PM, Ted Yu  wrote:

> bq. for hbase-client/hbase-server version 1.0.1 and 1.2.0-cdh5.7.2.
>
> You mean the error occurred for both versions or, client is on 1.0.1 and
> server is on 1.2.0 ?
>
> There should be more to the RetriesExhaustedException.
> Can you pastebin the full stack trace ?
>
> Cheers
>
> On Mon, Jul 10, 2017 at 2:21 PM, S L  wrote:
>
> > I hope someone can tell me what the difference between these two API
> calls
> > are.  I'm getting weird results between the two of them.  This is
> happening
> > for hbase-client/hbase-server version 1.0.1 and 1.2.0-cdh5.7.2.
> >
> > First off, my rowkeys are in the format hash_name_timestamp
> > e.g. 100_servername_1234567890.  The hbase table has a TTL of 30 days so
> > things older than 30 days should disappear after compaction.
> >
> > The following is code for using ResultScanner.  It doesn't use MapReduce
> so
> > it takes a very long time to complete.  I can't run my job this way
> because
> > it takes too long.  However, for debugging purposes, I don't have any
> > problems with this method.  It lists all keys for the specified time
> range,
> > which look valid to me since all the timestamps of the returned keys are
> > within the past 30 days and within the specified time range:
> >
> > Scan scan = new Scan();
> > scan.addColumn(Bytes.toBytes("raw_data"), Bytes.toBytes(fileType));
> > scan.setCaching(500);
> > scan.setCacheBlocks(false);
> > scan.setTimeRange(start, end);
> >
> > Connection fConnection = ConnectionFactory.createConnection(conf);
> > Table table = fConnection.getTable(TableName.valueOf(tableName));
> > ResultScanner scanner = table.getScanner(scan);
> > for (Result result = scanner.next(); result != null; result =
> > scanner.next()) {
> >System.out.println("Found row: " + Bytes.toString(result.getRow()
> > ));
> > }
> >
> >
> > The follow code doesn't work but it uses MapReduce, which runs way faster
> > than using the ResultScanner way, since it divides things up into 1200
> > maps.  The problem is I'm getting rowkeys that should have disappeared
> due
> > to TTL expiring:
> >
> > Scan scan = new Scan();
> > scan.addColumn(Bytes.toBytes("raw_data"), Bytes.toBytes(fileType));
> > scan.setCaching(500);
> > scan.setCacheBlocks(false);
> > scan.setTimeRange(start, end);
> > TableMapReduceUtil.initTableMapperJob(tableName, scan, MTTRMapper.class,
> > Text.class, IntWritable.class, job);
> >
> > Here is the error that I get, which eventually kills the whole MR job
> later
> > because over 25% of the mappers failed.
> >
> > > Error: org.apache.hadoop.hbase.client.RetriesExhaustedException:
> > > Failed after attempts=36, exceptions: Wed Jun 28 13:46:57 PDT 2017,
> > > null, java.net.SocketTimeoutException: callTimeout=12,
> > > callDuration=120301: row '65_app129041.iad1.mydomain.com_1476641940'
> > > on table 'server_based_data' at region=server_based_data
> >
> > I'll try to study the code for the hbase-client and hbase-server jars but
> > hopefully someone will know offhand what the difference between the
> methods
> > are and what is causing the initTableMapperJob call to fail.
> >
>


Re: Difference between ResultScanner and initTableMapperJob

2017-07-11 Thread S L
In case I forgot to say it: the keys that the log shows as causing the
RetriesExhaustedException should be deleted/gone from the table because the
TTL has been exceeded.

Fri Jul 07 20:23:26 PDT 2017, null, java.net.SocketTimeoutException:
callTimeout=4, callDuration=40303: row
'41_db160190.iad3.mydomain.com_1486067940' on table 'server_based_data' at
region=server_based_data,41_db160190.iad3.mydomain.com_1486067940,1487094006943.f67c3b9836107bdbe6a533e2829c509a.,
hostname=hslave35150.ams9.mydomain.com,60020,1483579082784, seqNum=5423139

The timestamp here is from Feb 2, 2017.  My TTL is 30 days.  Since I ran
the job on July 7, 2017, Feb 2017 is way past the 30-day TTL.

describe 'server_based_data'

Table server_based_data is ENABLED
server_based_data
COLUMN FAMILIES DESCRIPTION
{NAME => 'raw_data', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW',
 REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'SNAPPY',
 MIN_VERSIONS => '0', TTL => '2592000 SECONDS (30 DAYS)',
 KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false',
 BLOCKCACHE => 'true'}
1 row(s) in 0.5180 seconds
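
(One way to poke at what is physically still there for one of those rowkeys:
a RAW scan shows delete markers and extra versions that a normal scan hides,
which may help tell "already compacted away" apart from "expired but still
present"; syntax sketch only, using one of the rowkeys from the log above:)

    hbase(main):001:0> scan 'server_based_data', {STARTROW => '41_db160190.iad3.mydomain.com_1486067940', LIMIT => 1, RAW => true, VERSIONS => 10}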

On Tue, Jul 11, 2017 at 12:11 PM, S L  wrote:

> Sorry for not being clear.  I tried with both versions, first 1.0.1, then
> 1,2-cdh5.7.2.  We are currently running on Cloudera 5.7.2, thus why I used
> that version of the jar.
>
> I had set the timeout to be as short as 30 sec and as long as 2 min but I
> was still running into the problem.  When setting the timeout to 2 min, the
> job took almost 50 min to "complete".  Complete is in quotes because it
> fails (see pastebin below)
>
> Here's a copy of the hadoop output logs via pastebin.  The log is 11000
> lines so I just pasted up to the first couple exceptions and then pasted
> the end where it jumps from 80% maps to 100% and from 21% reduce to 100%
> because Yarn or something killed it.
>
> https://pastebin.com/KwriyPn6
> http://imgur.com/a/ouPZ5 - screenshot from failed mapreduce job from
> cloudera manager/Yarn
>
>
>
> On Mon, Jul 10, 2017 at 8:50 PM, Ted Yu  wrote:
>
>> bq. for hbase-client/hbase-server version 1.0.1 and 1.2.0-cdh5.7.2.
>>
>> You mean the error occurred for both versions or, client is on 1.0.1 and
>> server is on 1.2.0 ?
>>
>> There should be more to the RetriesExhaustedException.
>> Can you pastebin the full stack trace ?
>>
>> Cheers
>>
>> On Mon, Jul 10, 2017 at 2:21 PM, S L  wrote:
>>
>> > I hope someone can tell me what the difference between these two API
>> calls
>> > are.  I'm getting weird results between the two of them.  This is
>> happening
>> > for hbase-client/hbase-server version 1.0.1 and 1.2.0-cdh5.7.2.
>> >
>> > First off, my rowkeys are in the format hash_name_timestamp
>> > e.g. 100_servername_1234567890.  The hbase table has a TTL of 30 days so
>> > things older than 30 days should disappear after compaction.
>> >
>> > The following is code for using ResultScanner.  It doesn't use
>> MapReduce so
>> > it takes a very long time to complete.  I can't run my job this way
>> because
>> > it takes too long.  However, for debugging purposes, I don't have any
>> > problems with this method.  It lists all keys for the specified time
>> range,
>> > which look valid to me since all the timestamps of the returned keys are
>> > within the past 30 days and within the specified time range:
>> >
>> > Scan scan = new Scan();
>> > scan.addColumn(Bytes.toBytes("raw_data"), Bytes.toBytes(fileType));
>> > scan.setCaching(500);
>> > scan.setCacheBlocks(false);
>> > scan.setTimeRange(start, end);
>> >
>> > Connection fConnection = ConnectionFactory.createConnection(conf);
>> > Table table = fConnection.getTable(TableName.valueOf(tableName));
>> > ResultScanner scanner = table.getScanner(scan);
>> > for (Result result = scanner.next(); result != null; result =
>> > scanner.next()) {
>> >System.out.println("Found row: " + Bytes.toString(result.getRow()
>> > ));
>> > }
>> >
>> >
>> > The follow code doesn't work but it uses MapReduce, which runs way
>> faster
>> > than using the ResultScanner way, since it divides things up into 1200
>> > maps.  The problem is I'm getting rowkeys that should have disappeared
>> due
>> > to TTL expiring:
>> >
>> > Scan scan = new Scan();
>> > scan.addColumn(Bytes.toBytes("raw_data"), Bytes.toBytes(fileType));
>> > scan.setCaching(500);
>> > scan.setCacheBlocks(false);
>> > scan.setTimeRange(start, end);
>> > TableMapReduceUtil.initTableMapperJob(tableName, scan,
>> MTTRMapper.class,
>> > Text.class, IntWritable.class, job);
>> >
>> > Here is the error that I get, which eventually kills the whole MR job
>> later
>> > because over 25% of the mappers failed.
>> >
>> > > Error: org.apache.hadoop.hbase.client.RetriesExhaustedException:
>> > > Failed after attempts=36, exceptions: Wed Jun 28 13:46:57 PDT 2017,
>> > > null, java.net.SocketTimeoutException: 

How to get a list of running tasks in hbase shell?

2017-07-11 Thread jeff saremi
I sent this earlier in another thread; thought I'd give it its own thread to get an
answer. Thanks.


How do you get an instance of TaskMonitor in JRuby (bin/hbase shell)?
I tried the following, but it didn't produce anything:

-
taskmonitor = org.apache.hadoop.hbase.monitoring.TaskMonitor.get
taskmonitor.get_tasks.each do |task|
  printf("%s\r\n", task.to_string)
end
exit
-

I was trying to mimic the following code in 
"hbase-server/src/main/jamon/org/apache/hadoop/hbase/tmpl/common/TaskMonitorTmpl.jamon"


TaskMonitor taskMonitor = TaskMonitor.get();
...
List<MonitoredTask> tasks = taskMonitor.getTasks();
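
(A guess at what is going on: TaskMonitor.get() appears to be a per-JVM
singleton, so calling it from the shell's client JVM would only show tasks
created inside that JVM, not the tasks running on the master or region
servers, which would explain the empty result.  Inside a server JVM, the
equivalent JRuby would be something like the following; the MonitoredTask
getter names are from the 1.x API:)

    task_monitor = org.apache.hadoop.hbase.monitoring.TaskMonitor.get()
    task_monitor.get_tasks.each do |task|
      # description, status, and state of each monitored task
      printf("%s: %s (%s)\r\n", task.get_description, task.get_status, task.get_state)
    end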



Re: scope of RegionCoprocessorEnvironment sharedData

2017-07-11 Thread Anoop John
Ya. It is the same RegionScanner impl in use only being passed.  Ya
the param type should have been RegionScanner  I guess. We made that
mistake!
-Anoop-

On Mon, Jul 10, 2017 at 8:37 PM, Ted Yu  wrote:
> The tricky part is that postScannerNext() passes InternalScanner parameter
> instead of RegionScanner.
>
> FYI
>
> On Sun, Jul 9, 2017 at 10:57 PM, Anoop John  wrote:
>
>> Ya as Ted said, u are not getting Scan object in the postScannerNext
>> and so can not make use of the attribute in Scan within this hook.
>> Just setting the sharedData variable will cause issue with concurrent
>> scans. (As u imagine)
>>
>> So I can think of solving this in 2 possible ways. (May be more ways
>> possible)
>>
>> 1.  U keep a Map within ur CP impl.  You implement postScannerOpen
>> where u will get the ref to Scanner been created as well as the Scan.
>> If the Scan is having attribute, keep that scanner within ur Map.
>> During postScannerNext  check if the coming in scanner is there in ur
>> Map. If so that means this is the one where u can do the action.
>> Also dont forget to implement postScannerClose and remove that scanner
>> from the Map.   Here u might have some perf penalty as u have to add
>> and get from Map which has to be a concurrent map too.
>>
>> Another way
>>
>> 2. Create a custom scanner implementing RegionScanner.   The new one
>> has to take an original Region Scanner and just delegate the calls. On
>> postScannerOpen, u will get the original scanner been created and u
>> can just wrap it with ur new scanner object. ( If the Scan object is
>> having required attribute)..  In postScannerNext() u can check for ur
>> own RegionScanner type and if so u can do action.
>>
>>
>> -Anoop-
>>
>>
>> On Sat, Jul 8, 2017 at 9:13 PM, Ted Yu  wrote:
>> > if (canUseGetOperation(e)) {
>> >//logic goes here
>> >
>> > Does your Get target the same region being scanned ?
>> > If not, issuing the Get is not advised since the other region may be
>> hosted
>> > on different region server.
>> >
>> > Cheers
>> >
>> > On Thu, Jul 6, 2017 at 7:14 AM, Veerraju Tadimeti 
>> wrote:
>> >
>> >> hi,
>> >>
>> >> I have few questions regarding scope of *RegionCoprocessorEnvironment*
>> >>  sharedData.
>> >>
>> >>
>> >>
>> >>- *Is sharedData map is shared accross all instances simultaneously
>> ?*
>> >>   -  I am putting a variable in sharedData in preScannerOpen()
>> based on
>> >>   scan attribute,
>> >>   - check that variable exists in postScannerNext() then apply
>> logic,
>> >>   - remove the variable postScannerClose().
>> >>   - If data is in multiple regions, when one coprocessor removes
>> >>   variable in postScannerClose(), will the variable is NULL for
>> another
>> >>   region coprocessor in postScannerNext() ?
>> >>
>> >>
>> >>- *Is sharedData map is shared across all the client request
>> >>operations ?*
>> >>
>> >> If a variable is set in sharedData for one client operation(say SCAN),
>> will
>> >> the variable is available for another client operation(new SCAN) ?
>> >>
>> >>
>> >>-  *Will the variables be garbage collected even if we dont implement
>> >>(removed variables in sharedData) postScannerClose() method*
>> >>
>> >>
>> >> Please find below the logic that I am using currently
>> >> *CODE: *
>> >>
>> >> public RegionScanner
>> >> *preScannerOpen*(ObserverContext
>> >> e, Scan scan, RegionScanner s) throws IOException {
>> >> byte[] useGetInPostScannerNext = scan.getAttribute(USE_GET_
>> >> OPERATION_IN_POST_SCANNER_NEXT);
>> >> String useGetInPostScannerNextStr = Bytes.toString(
>> >> useGetInPostScannerNext);
>> >> if (Boolean.parseBoolean(useGetInPostScannerNextStr)) {
>> >> e.getEnvironment().getSharedData().put(USE_GET_
>> >> OPERATION_IN_POST_SCANNER_NEXT, useGetInPostScannerNextStr);
>> >> }
>> >> return super.preScannerOpen(e, scan, s);
>> >> }
>> >>
>> >> @Override
>> >> public boolean *postScannerNext*(final
>> >> ObserverContext
>> >> e,
>> >> final InternalScanner s, final List results, final
>> int
>> >> limit,
>> >> final boolean hasMore) throws IOException {
>> >> try {
>> >>
>> >> if (canUseGetOperation(e)) {
>> >>
>> >>//logic goes here
>> >> }
>> >> } catch (Exception ex) {
>> >> logger.error("Exception in postScannerNext ", ex);
>> >> throw new IOException(ex);
>> >> }
>> >> return hasMore;
>> >> }
>> >>
>> >> @Override
>> >> public void
>> >> *postScannerClose*(ObserverContext
>> >> e, InternalScanner s) throws IOException {
>> >> if (canUseGetOperation(e)) {
>> >> e.getEnvironment().getSharedData().remove(USE_
>> >> GET_OPERATION_IN_POST_SCANNER_NEXT);
>> >> }
>> >> super.postScannerClose(e, s);
>> >>