Re: Difference between ResultScanner and initTableMapperJob

2017-07-11 Thread S L
I got a timeout when trying to scan for this row prefix (185_) and for a
different prefix (20_):

hbase(main):016:0> scan 'server_based_data', {FILTER => "(PrefixFilter
('20'))", COLUMNS => 'raw_data:top', TIMERANGE => [149920560,
149920620]}

ROW                                COLUMN+CELL




ERROR: Call id=7856, waitTime=120001, operationTimeout=120000 expired.

I tried to increase the timeout but now after waiting over 1 hr, it still
hasn't come back.

hbase(main):017:0>
@shell.hbase.configuration.setInt("hbase.client.scanner.timeout.period",
24)

hbase(main):018:0> scan 'server_based_data', {FILTER => "(PrefixFilter
('20_'))", COLUMNS => 'raw_data:top', TIMERANGE => [149920560,
149920620]}

ROW                                COLUMN+CELL




(Still no output and waiting over 1 hr)
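
For what it's worth, the same probe can be written as a small standalone Java
client so the scanner timeout and retry count can be set per run and the scan
fails fast instead of hanging.  This is only a sketch: the timeout/retry values
and the millisecond time range are placeholders, not what I actually use.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.filter.PrefixFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class PrefixProbe {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // Client-side knobs so a stuck region fails the call quickly.
        conf.setInt("hbase.client.scanner.timeout.period", 60000); // ms per scanner call
        conf.setInt("hbase.rpc.timeout", 60000);
        conf.setInt("hbase.client.retries.number", 2);             // instead of 36 attempts

        Scan scan = new Scan();
        scan.setFilter(new PrefixFilter(Bytes.toBytes("20_")));
        scan.addColumn(Bytes.toBytes("raw_data"), Bytes.toBytes("top"));
        scan.setTimeRange(1499205600000L, 1499206200000L);          // placeholder range
        scan.setCaching(100);

        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("server_based_data"));
             ResultScanner scanner = table.getScanner(scan)) {
            for (Result r : scanner) {
                System.out.println(Bytes.toString(r.getRow()));
            }
        }
    }
}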


I also checked other failed/killed mappers.  Below is a small sample of the
"bad" rowkeys.  These deleted rowkeys show up under all sorts of hash prefixes,
so scanning a row after one particular "bad" rowkey won't tell us much, since
the bad rowkeys occur across all sorts of rows/hashes.

2017-07-07 20:25:59,640 INFO [main]
org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl: recovered from
org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after
attempts=36, exceptions:

Fri Jul 07 20:25:59 PDT 2017, null, java.net.SocketTimeoutException:
callTimeout=40000, callDuration=40306: row
'145_app129023.lhr1.mydomain.com_1482214200' on table 'server_based_data'
at
region=server_based_data,145_app129023.lhr1.mydomain.com_1482214200,1483679406846.fbc6c1e473b944fcf1eedd03a3b8e2ec.,
hostname=hslave35139.ams9.mydomain.com,60020,1483577331446, seqNum=8165882





2017-07-07 20:29:22,280 INFO [main]
org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl: recovered from
org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after
attempts=36, exceptions:

Fri Jul 07 20:29:22 PDT 2017, null, java.net.SocketTimeoutException:
callTimeout=40000, callDuration=40303: row
'162_app128162.sjc4.mydomain.com_1485642420' on table 'server_based_data'
at
region=server_based_data,162_app128162.sjc4.mydomain.com_1485642420,1485969672759.37985ed5325cf4afb4bd54afa25728e9.,
hostname=hslave35150.ams9.mydomain.com,60020,1483579082784, seqNum=5489984





2017-07-07 20:28:52,216 INFO [main]
org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl: recovered from
org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after
attempts=36, exceptions:

Fri Jul 07 20:28:52 PDT 2017, null, java.net.SocketTimeoutException:
callTimeout=40000, callDuration=40304: row
'41_db160190.iad3.mydomain.com_1486067940' on table 'server_based_data' at
region=server_based_data,41_db160190.iad3.mydomain.com_1486067940,1487094006943.f67c3b9836107bdbe6a533e2829c509a.,
hostname=hslave35150.ams9.mydomain.com,60020,1483579082784, seqNum=5423139
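
To figure out which RegionServer to look at for one of these keys, something
like the following sketch can be used (only the table name and a rowkey from
the log above are taken from this thread; everything else is boilerplate):

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HRegionLocation;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.RegionLocator;
import org.apache.hadoop.hbase.util.Bytes;

public class LocateBadRow {
    public static void main(String[] args) throws Exception {
        // Look up which region/RegionServer currently hosts one of the "bad"
        // rowkeys from the mapper logs, so the matching server log can be checked.
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             RegionLocator locator = conn.getRegionLocator(TableName.valueOf("server_based_data"))) {
            HRegionLocation loc = locator.getRegionLocation(
                    Bytes.toBytes("145_app129023.lhr1.mydomain.com_1482214200"), true); // true = bypass cache
            System.out.println("region = " + loc.getRegionInfo().getRegionNameAsString());
            System.out.println("server = " + loc.getHostnamePort());
        }
    }
}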





On Tue, Jul 11, 2017 at 2:12 PM, Ted Yu  wrote:

> bq. it can find 0 rows in less than 1 sec
>
> What if you perform a scan with start row lower than the deleted key, can
> you reproduce the hanging scan ?
>
> Cheers
>
> On Tue, Jul 11, 2017 at 1:55 PM, S L  wrote:
>
> > Same error as from the hadoop job output I initially posted.
> >
> > SocketTimeoutException/RetriesExhaustedException is caused by a key that
> > should be deleted/expired.
> >
> > row '184_app128057.syd2.mydomain.com_1485646620'.
> >
> > The funny thing is when I execute a "get 'tablename', 'rowkey'" from
> "hbase
> > shell", it can find 0 rows in less than 1 sec.  It seems like the
> > initTableMapperJob method is sitting there for 40 sec trying to reach
> this
> > non-existent key for some reason.
> >
> >
> > 2017-07-07 20:28:19,974 INFO [main] org.apache.hadoop.mapred.MapTask:
> > bufstart = 0; bufvoid = 268435456
> >
> > 2017-07-07 20:28:19,974 INFO [main] org.apache.hadoop.mapred.MapTask:
> > kvstart = 67108860; length = 16777216
> >
> > 2017-07-07 20:28:19,980 INFO [main] org.apache.hadoop.mapred.MapTask:
> Map
> > output collector class = org.apache.hadoop.mapred.
> MapTask$MapOutputBuffer
> >
> > 2017-07-07 20:29:25,248 INFO [main]
> > org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl: recovered from
> > org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after
> > attempts=36, exceptions:
> >
> > Fri Jul 07 20:29:25 PDT 2017, null, java.net.SocketTimeoutException:
> > callTimeout=4, callDuration=40314: row
> > '184_app128057.syd2.mydomain.com_1485646620' on table
> 'server_based_data'
> > at
> > region=server_based_data,184_app128057.syd2.mydomain.com_
> > 1485646620,1486597623524.37ccf993b84fd15b24c0c4efbb95b7f5.,
> > hostname=hslave35120.ams9.mydomain.com,60020,1498245230342,
> seqNum=9247698
> >
> >
> >
> >at
> > org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadRepli
> > cas.throwEnrichedException(RpcRetryingCallerWithReadReplicas.java:276)
> >
> >at
> > 

Re: Difference between ResultScanner and initTableMapperJob

2017-07-11 Thread Ted Yu
bq. it can find 0 rows in less than 1 sec

If you perform a scan with a start row lower than the deleted key, can
you reproduce the hanging scan?
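
Something like this minimal sketch (run outside MapReduce; the start/stop rows
just bracket the deleted key from your log, and the surrounding
connection/table setup is assumed to be the same as in your ResultScanner
snippet):

Scan scan = new Scan();
scan.setStartRow(Bytes.toBytes("184_app128057"));                             // sorts just below the deleted key
scan.setStopRow(Bytes.toBytes("184_app128057.syd2.mydomain.com_1485646621")); // just past it
scan.setCaching(10);
try (ResultScanner scanner = table.getScanner(scan)) {
    for (Result r : scanner) {
        System.out.println(Bytes.toString(r.getRow()));
    }
}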

Cheers

On Tue, Jul 11, 2017 at 1:55 PM, S L  wrote:

> Same error as from the hadoop job output I initially posted.
>
> SocketTimeoutException/RetriesExhaustedException is caused by a key that
> should be deleted/expired.
>
> row '184_app128057.syd2.mydomain.com_1485646620'.
>
> The funny thing is when I execute a "get 'tablename', 'rowkey'" from "hbase
> shell", it can find 0 rows in less than 1 sec.  It seems like the
> initTableMapperJob method is sitting there for 40 sec trying to reach this
> non-existent key for some reason.
>
>
> 2017-07-07 20:28:19,974 INFO [main] org.apache.hadoop.mapred.MapTask:
> bufstart = 0; bufvoid = 268435456
>
> 2017-07-07 20:28:19,974 INFO [main] org.apache.hadoop.mapred.MapTask:
> kvstart = 67108860; length = 16777216
>
> 2017-07-07 20:28:19,980 INFO [main] org.apache.hadoop.mapred.MapTask: Map
> output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
>
> 2017-07-07 20:29:25,248 INFO [main]
> org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl: recovered from
> org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after
> attempts=36, exceptions:
>
> Fri Jul 07 20:29:25 PDT 2017, null, java.net.SocketTimeoutException:
> callTimeout=4, callDuration=40314: row
> '184_app128057.syd2.mydomain.com_1485646620' on table 'server_based_data'
> at
> region=server_based_data,184_app128057.syd2.mydomain.com_
> 1485646620,1486597623524.37ccf993b84fd15b24c0c4efbb95b7f5.,
> hostname=hslave35120.ams9.mydomain.com,60020,1498245230342, seqNum=9247698
>
>
>
>at
> org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadRepli
> cas.throwEnrichedException(RpcRetryingCallerWithReadReplicas.java:276)
>
>at
> org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(
> ScannerCallableWithReplicas.java:207)
>
>at
> org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(
> ScannerCallableWithReplicas.java:60)
>
>at
> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(
> RpcRetryingCaller.java:200)
>
>at
> org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:320)
>
>at
> org.apache.hadoop.hbase.client.ClientScanner.loadCache(ClientScanner.java:
> 403)
>
>at
> org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:364)
>
>at
> org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl.nextKeyValue(
> TableRecordReaderImpl.java:222)
>
>at
> org.apache.hadoop.hbase.mapreduce.TableRecordReader.nextKeyValue(
> TableRecordReader.java:147)
>
>at
> org.apache.hadoop.hbase.mapreduce.TableInputFormatBase$1.nextKeyValue(
> TableInputFormatBase.java:216)
>
>at
> org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.
> nextKeyValue(MapTask.java:556)
>
>at
> org.apache.hadoop.mapreduce.task.MapContextImpl.
> nextKeyValue(MapContextImpl.java:80)
>
>at
> org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.
> nextKeyValue(WrappedMapper.java:91)
>
>at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>
>at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
>
>at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
>
>at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
>
>at java.security.AccessController.doPrivileged(Native Method)
>
>at javax.security.auth.Subject.doAs(Subject.java:415)
>
>at
> org.apache.hadoop.security.UserGroupInformation.doAs(
> UserGroupInformation.java:1693)
>
>at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
>
> Caused by: java.net.SocketTimeoutException: callTimeout=4,
> callDuration=40314: row '184_app128057.syd2.mydomain.com_1485646620' on
> table 'server_based_data' at
> region=server_based_data,184_app128057.syd2.mydomain.com_
> 1485646620,1486597623524.37ccf993b84fd15b24c0c4efbb95b7f5.,
> hostname=hslave35120.ams9.mydomain.com,60020,1498245230342, seqNum=9247698
>
>at
> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(
> RpcRetryingCaller.java:159)
>
>at
> org.apache.hadoop.hbase.client.ResultBoundedCompletionService
> $QueueingFuture.run(ResultBoundedCompletionService.java:65)
>
>at
> java.util.concurrent.ThreadPoolExecutor.runWorker(
> ThreadPoolExecutor.java:1145)
>
>at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(
> ThreadPoolExecutor.java:615)
>
>at java.lang.Thread.run(Thread.java:745)
>
> Caused by: java.io.IOException: Call to
> hslave35120.ams9.mydomain.com/10.216.35.120:60020 failed on local
> exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=2,
> waitTime=40001, operationTimeout=4 expired.
>
>at
> org.apache.hadoop.hbase.ipc.AbstractRpcClient.wrapException(
> AbstractRpcClient.java:291)
>
>at
> org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1272)
>
>at
> 

Re: Difference between ResultScanner and initTableMapperJob

2017-07-11 Thread S L
Same error as in the hadoop job output I initially posted.

The SocketTimeoutException/RetriesExhaustedException is caused by a key that
should already be deleted/expired:

row '184_app128057.syd2.mydomain.com_1485646620'.

The funny thing is that when I execute a "get 'tablename', 'rowkey'" from the
hbase shell, it finds 0 rows in less than 1 sec.  It seems like the scan set
up by initTableMapperJob sits there for 40 sec trying to read this
non-existent key for some reason.
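
Translated to the Java API, that quick check looks roughly like this (a sketch
only, reusing the open Table handle from my ResultScanner snippet below).  A
Get seeks straight to the one row, which may be why it returns instantly while
the scanner, which has to read through everything between its current position
and the next live row, runs into the timeout:

Get get = new Get(Bytes.toBytes("184_app128057.syd2.mydomain.com_1485646620"));
get.addColumn(Bytes.toBytes("raw_data"), Bytes.toBytes("top"));
Result result = table.get(get);
System.out.println("empty = " + result.isEmpty());   // prints "empty = true" when the row is gone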


2017-07-07 20:28:19,974 INFO [main] org.apache.hadoop.mapred.MapTask:
bufstart = 0; bufvoid = 268435456

2017-07-07 20:28:19,974 INFO [main] org.apache.hadoop.mapred.MapTask:
kvstart = 67108860; length = 16777216

2017-07-07 20:28:19,980 INFO [main] org.apache.hadoop.mapred.MapTask: Map
output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer

2017-07-07 20:29:25,248 INFO [main]
org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl: recovered from
org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after
attempts=36, exceptions:

Fri Jul 07 20:29:25 PDT 2017, null, java.net.SocketTimeoutException:
callTimeout=40000, callDuration=40314: row
'184_app128057.syd2.mydomain.com_1485646620' on table 'server_based_data'
at
region=server_based_data,184_app128057.syd2.mydomain.com_1485646620,1486597623524.37ccf993b84fd15b24c0c4efbb95b7f5.,
hostname=hslave35120.ams9.mydomain.com,60020,1498245230342, seqNum=9247698



   at
org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.throwEnrichedException(RpcRetryingCallerWithReadReplicas.java:276)

   at
org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:207)

   at
org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:60)

   at
org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)

   at
org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:320)

   at
org.apache.hadoop.hbase.client.ClientScanner.loadCache(ClientScanner.java:403)

   at
org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:364)

   at
org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl.nextKeyValue(TableRecordReaderImpl.java:222)

   at
org.apache.hadoop.hbase.mapreduce.TableRecordReader.nextKeyValue(TableRecordReader.java:147)

   at
org.apache.hadoop.hbase.mapreduce.TableInputFormatBase$1.nextKeyValue(TableInputFormatBase.java:216)

   at
org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:556)

   at
org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)

   at
org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)

   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)

   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)

   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)

   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)

   at java.security.AccessController.doPrivileged(Native Method)

   at javax.security.auth.Subject.doAs(Subject.java:415)

   at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)

   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

Caused by: java.net.SocketTimeoutException: callTimeout=40000,
callDuration=40314: row '184_app128057.syd2.mydomain.com_1485646620' on
table 'server_based_data' at
region=server_based_data,184_app128057.syd2.mydomain.com_1485646620,1486597623524.37ccf993b84fd15b24c0c4efbb95b7f5.,
hostname=hslave35120.ams9.mydomain.com,60020,1498245230342, seqNum=9247698

   at
org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:159)

   at
org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture.run(ResultBoundedCompletionService.java:65)

   at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)

   at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)

   at java.lang.Thread.run(Thread.java:745)

Caused by: java.io.IOException: Call to
hslave35120.ams9.mydomain.com/10.216.35.120:60020 failed on local
exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=2,
waitTime=40001, operationTimeout=40000 expired.

   at
org.apache.hadoop.hbase.ipc.AbstractRpcClient.wrapException(AbstractRpcClient.java:291)

   at
org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1272)

   at
org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:226)

   at
org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:331)

   at
org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.scan(ClientProtos.java:34094)

   at
org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:219)

   at

Re: Difference between ResultScanner and initTableMapperJob

2017-07-11 Thread Ted Yu
Can you take a look at the server log on hslave35150.ams9.mydomain.com
around 17/07/07 20:23:31?

See if there is some clue in the log.

On Tue, Jul 11, 2017 at 12:18 PM, S L  wrote:

> If I forgot to say, the keys that the log shows is causing the
> RetriesExhaustedException should be deleted/gone from the table due to the
> TTL being exceeded.
>
> Fri Jul 07 20:23:26 PDT 2017, null, java.net.SocketTimeoutException:
> callTimeout=4, callDuration=40303: row
> '41_db160190.iad3.mydomain.com_1486067940' on table 'server_based_data' at
> region=server_based_data,41_db160190.iad3.mydomain.com_
> 1486067940,1487094006943.f67c3b9836107bdbe6a533e2829c509a.,
> hostname=hslave35150.ams9.mydomain.com,60020,1483579082784, seqNum=5423139
>
> The timestamp here is from Feb 2, 2017.  My TTL is 30 days.  Since I ran
> the job on July 7, 2017, Feb 2017 is way past the 30 day TTL
>
> describe 'server_based_data'
>
> Table server_based_data is ENABLED
>
>
> server_based_data
>
>
> COLUMN FAMILIES DESCRIPTION
>
>
> {NAME => 'raw_data', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW',
> REPLIC
>
> ATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'SNAPPY', MIN_VERSIONS
> => '0
>
> ', TTL => '2592000 SECONDS (30 DAYS)', KEEP_DELETED_CELLS => 'FALSE',
> BLOCKSIZE
>
> => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
>
>
> 1 row(s) in 0.5180 seconds
>
> On Tue, Jul 11, 2017 at 12:11 PM, S L  wrote:
>
> > Sorry for not being clear.  I tried with both versions, first 1.0.1, then
> > 1,2-cdh5.7.2.  We are currently running on Cloudera 5.7.2, thus why I
> used
> > that version of the jar.
> >
> > I had set the timeout to be as short as 30 sec and as long as 2 min but I
> > was still running into the problem.  When setting the timeout to 2 min,
> the
> > job took almost 50 min to "complete".  Complete is in quotes because it
> > fails (see pastebin below)
> >
> > Here's a copy of the hadoop output logs via pastebin.  The log is 11000
> > lines so I just pasted up to the first couple exceptions and then pasted
> > the end where it jumps from 80% maps to 100% and from 21% reduce to 100%
> > because Yarn or something killed it.
> >
> > https://pastebin.com/KwriyPn6
> > http://imgur.com/a/ouPZ5 - screenshot from failed mapreduce job from
> > cloudera manager/Yarn
> >
> >
> >
> > On Mon, Jul 10, 2017 at 8:50 PM, Ted Yu  wrote:
> >
> >> bq. for hbase-client/hbase-server version 1.0.1 and 1.2.0-cdh5.7.2.
> >>
> >> You mean the error occurred for both versions or, client is on 1.0.1 and
> >> server is on 1.2.0 ?
> >>
> >> There should be more to the RetriesExhaustedException.
> >> Can you pastebin the full stack trace ?
> >>
> >> Cheers
> >>
> >> On Mon, Jul 10, 2017 at 2:21 PM, S L  wrote:
> >>
> >> > I hope someone can tell me what the difference between these two API
> >> calls
> >> > are.  I'm getting weird results between the two of them.  This is
> >> happening
> >> > for hbase-client/hbase-server version 1.0.1 and 1.2.0-cdh5.7.2.
> >> >
> >> > First off, my rowkeys are in the format hash_name_timestamp
> >> > e.g. 100_servername_1234567890.  The hbase table has a TTL of 30 days
> so
> >> > things older than 30 days should disappear after compaction.
> >> >
> >> > The following is code for using ResultScanner.  It doesn't use
> >> MapReduce so
> >> > it takes a very long time to complete.  I can't run my job this way
> >> because
> >> > it takes too long.  However, for debugging purposes, I don't have any
> >> > problems with this method.  It lists all keys for the specified time
> >> range,
> >> > which look valid to me since all the timestamps of the returned keys
> are
> >> > within the past 30 days and within the specified time range:
> >> >
> >> > Scan scan = new Scan();
> >> > scan.addColumn(Bytes.toBytes("raw_data"),
> Bytes.toBytes(fileType));
> >> > scan.setCaching(500);
> >> > scan.setCacheBlocks(false);
> >> > scan.setTimeRange(start, end);
> >> >
> >> > Connection fConnection = ConnectionFactory.
> createConnection(conf);
> >> > Table table = fConnection.getTable(TableName.valueOf(tableName));
> >> > ResultScanner scanner = table.getScanner(scan);
> >> > for (Result result = scanner.next(); result != null; result =
> >> > scanner.next()) {
> >> >System.out.println("Found row: " +
> Bytes.toString(result.getRow()
> >> > ));
> >> > }
> >> >
> >> >
> >> > The follow code doesn't work but it uses MapReduce, which runs way
> >> faster
> >> > than using the ResultScanner way, since it divides things up into 1200
> >> > maps.  The problem is I'm getting rowkeys that should have disappeared
> >> due
> >> > to TTL expiring:
> >> >
> >> > Scan scan = new Scan();
> >> > scan.addColumn(Bytes.toBytes("raw_data"),
> Bytes.toBytes(fileType));
> >> > scan.setCaching(500);
> >> > scan.setCacheBlocks(false);
> >> > scan.setTimeRange(start, end);
> >> > 

Re: Difference between ResultScanner and initTableMapperJob

2017-07-11 Thread S L
Sorry for not being clear.  I tried with both versions, first 1.0.1, then
1.2.0-cdh5.7.2.  We are currently running on Cloudera 5.7.2, which is why I
used that version of the jar.

I had set the timeout to be as short as 30 sec and as long as 2 min, but I
was still running into the problem.  When setting the timeout to 2 min, the
job took almost 50 min to "complete".  Complete is in quotes because it
fails (see pastebin below).

Here's a copy of the hadoop output logs via pastebin.  The log is 11000
lines, so I just pasted up to the first couple of exceptions and then pasted
the end, where it jumps from 80% maps to 100% and from 21% reduce to 100%
because Yarn or something killed it.

https://pastebin.com/KwriyPn6
http://imgur.com/a/ouPZ5 - screenshot of the failed mapreduce job from
Cloudera Manager/Yarn



On Mon, Jul 10, 2017 at 8:50 PM, Ted Yu  wrote:

> bq. for hbase-client/hbase-server version 1.0.1 and 1.2.0-cdh5.7.2.
>
> You mean the error occurred for both versions or, client is on 1.0.1 and
> server is on 1.2.0 ?
>
> There should be more to the RetriesExhaustedException.
> Can you pastebin the full stack trace ?
>
> Cheers
>
> On Mon, Jul 10, 2017 at 2:21 PM, S L  wrote:
>
> > I hope someone can tell me what the difference between these two API
> calls
> > are.  I'm getting weird results between the two of them.  This is
> happening
> > for hbase-client/hbase-server version 1.0.1 and 1.2.0-cdh5.7.2.
> >
> > First off, my rowkeys are in the format hash_name_timestamp
> > e.g. 100_servername_1234567890.  The hbase table has a TTL of 30 days so
> > things older than 30 days should disappear after compaction.
> >
> > The following is code for using ResultScanner.  It doesn't use MapReduce
> so
> > it takes a very long time to complete.  I can't run my job this way
> because
> > it takes too long.  However, for debugging purposes, I don't have any
> > problems with this method.  It lists all keys for the specified time
> range,
> > which look valid to me since all the timestamps of the returned keys are
> > within the past 30 days and within the specified time range:
> >
> > Scan scan = new Scan();
> > scan.addColumn(Bytes.toBytes("raw_data"), Bytes.toBytes(fileType));
> > scan.setCaching(500);
> > scan.setCacheBlocks(false);
> > scan.setTimeRange(start, end);
> >
> > Connection fConnection = ConnectionFactory.createConnection(conf);
> > Table table = fConnection.getTable(TableName.valueOf(tableName));
> > ResultScanner scanner = table.getScanner(scan);
> > for (Result result = scanner.next(); result != null; result =
> > scanner.next()) {
> >System.out.println("Found row: " + Bytes.toString(result.getRow()
> > ));
> > }
> >
> >
> > The follow code doesn't work but it uses MapReduce, which runs way faster
> > than using the ResultScanner way, since it divides things up into 1200
> > maps.  The problem is I'm getting rowkeys that should have disappeared
> due
> > to TTL expiring:
> >
> > Scan scan = new Scan();
> > scan.addColumn(Bytes.toBytes("raw_data"), Bytes.toBytes(fileType));
> > scan.setCaching(500);
> > scan.setCacheBlocks(false);
> > scan.setTimeRange(start, end);
> > TableMapReduceUtil.initTableMapperJob(tableName, scan, MTTRMapper.class,
> > Text.class, IntWritable.class, job);
> >
> > Here is the error that I get, which eventually kills the whole MR job
> later
> > because over 25% of the mappers failed.
> >
> > > Error: org.apache.hadoop.hbase.client.RetriesExhaustedException:
> > > Failed after attempts=36, exceptions: Wed Jun 28 13:46:57 PDT 2017,
> > > null, java.net.SocketTimeoutException: callTimeout=12,
> > > callDuration=120301: row '65_app129041.iad1.mydomain.com_1476641940'
> > > on table 'server_based_data' at region=server_based_data
> >
> > I'll try to study the code for the hbase-client and hbase-server jars but
> > hopefully someone will know offhand what the difference between the
> methods
> > are and what is causing the initTableMapperJob call to fail.
> >
>


Re: Difference between ResultScanner and initTableMapperJob

2017-07-11 Thread S L
In case I forgot to say it: the keys that the log shows causing the
RetriesExhaustedException should already be deleted/gone from the table
because the TTL has been exceeded.

Fri Jul 07 20:23:26 PDT 2017, null, java.net.SocketTimeoutException:
callTimeout=40000, callDuration=40303: row
'41_db160190.iad3.mydomain.com_1486067940' on table 'server_based_data' at
region=server_based_data,41_db160190.iad3.mydomain.com_1486067940,1487094006943.f67c3b9836107bdbe6a533e2829c509a.,
hostname=hslave35150.ams9.mydomain.com,60020,1483579082784, seqNum=5423139

The timestamp here is from Feb 2, 2017.  My TTL is 30 days.  Since I ran
the job on July 7, 2017, Feb 2017 is way past the 30-day TTL.

describe 'server_based_data'

Table server_based_data is ENABLED

server_based_data

COLUMN FAMILIES DESCRIPTION

{NAME => 'raw_data', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW',
REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'SNAPPY',
MIN_VERSIONS => '0', TTL => '2592000 SECONDS (30 DAYS)',
KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false',
BLOCKCACHE => 'true'}

1 row(s) in 0.5180 seconds
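
If I understand the TTL handling correctly, expired cells stop being returned
right away but are only physically removed when a compaction rewrites the
store files, so one thing to try is forcing a major compaction and re-running
the job.  A minimal sketch of doing that from the Admin API (only the table
name is taken from above):

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class ForceMajorCompaction {
    public static void main(String[] args) throws Exception {
        // Ask the RegionServers to major-compact the table; expired cells are
        // physically dropped when the store files are rewritten.  The call only
        // queues the request and returns immediately.
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Admin admin = conn.getAdmin()) {
            admin.majorCompact(TableName.valueOf("server_based_data"));
        }
    }
}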

On Tue, Jul 11, 2017 at 12:11 PM, S L  wrote:

> Sorry for not being clear.  I tried with both versions, first 1.0.1, then
> 1,2-cdh5.7.2.  We are currently running on Cloudera 5.7.2, thus why I used
> that version of the jar.
>
> I had set the timeout to be as short as 30 sec and as long as 2 min but I
> was still running into the problem.  When setting the timeout to 2 min, the
> job took almost 50 min to "complete".  Complete is in quotes because it
> fails (see pastebin below)
>
> Here's a copy of the hadoop output logs via pastebin.  The log is 11000
> lines so I just pasted up to the first couple exceptions and then pasted
> the end where it jumps from 80% maps to 100% and from 21% reduce to 100%
> because Yarn or something killed it.
>
> https://pastebin.com/KwriyPn6
> http://imgur.com/a/ouPZ5 - screenshot from failed mapreduce job from
> cloudera manager/Yarn
>
>
>
> On Mon, Jul 10, 2017 at 8:50 PM, Ted Yu  wrote:
>
>> bq. for hbase-client/hbase-server version 1.0.1 and 1.2.0-cdh5.7.2.
>>
>> You mean the error occurred for both versions or, client is on 1.0.1 and
>> server is on 1.2.0 ?
>>
>> There should be more to the RetriesExhaustedException.
>> Can you pastebin the full stack trace ?
>>
>> Cheers
>>
>> On Mon, Jul 10, 2017 at 2:21 PM, S L  wrote:
>>
>> > I hope someone can tell me what the difference between these two API
>> calls
>> > are.  I'm getting weird results between the two of them.  This is
>> happening
>> > for hbase-client/hbase-server version 1.0.1 and 1.2.0-cdh5.7.2.
>> >
>> > First off, my rowkeys are in the format hash_name_timestamp
>> > e.g. 100_servername_1234567890.  The hbase table has a TTL of 30 days so
>> > things older than 30 days should disappear after compaction.
>> >
>> > The following is code for using ResultScanner.  It doesn't use
>> MapReduce so
>> > it takes a very long time to complete.  I can't run my job this way
>> because
>> > it takes too long.  However, for debugging purposes, I don't have any
>> > problems with this method.  It lists all keys for the specified time
>> range,
>> > which look valid to me since all the timestamps of the returned keys are
>> > within the past 30 days and within the specified time range:
>> >
>> > Scan scan = new Scan();
>> > scan.addColumn(Bytes.toBytes("raw_data"), Bytes.toBytes(fileType));
>> > scan.setCaching(500);
>> > scan.setCacheBlocks(false);
>> > scan.setTimeRange(start, end);
>> >
>> > Connection fConnection = ConnectionFactory.createConnection(conf);
>> > Table table = fConnection.getTable(TableName.valueOf(tableName));
>> > ResultScanner scanner = table.getScanner(scan);
>> > for (Result result = scanner.next(); result != null; result =
>> > scanner.next()) {
>> >System.out.println("Found row: " + Bytes.toString(result.getRow()
>> > ));
>> > }
>> >
>> >
>> > The follow code doesn't work but it uses MapReduce, which runs way
>> faster
>> > than using the ResultScanner way, since it divides things up into 1200
>> > maps.  The problem is I'm getting rowkeys that should have disappeared
>> due
>> > to TTL expiring:
>> >
>> > Scan scan = new Scan();
>> > scan.addColumn(Bytes.toBytes("raw_data"), Bytes.toBytes(fileType));
>> > scan.setCaching(500);
>> > scan.setCacheBlocks(false);
>> > scan.setTimeRange(start, end);
>> > TableMapReduceUtil.initTableMapperJob(tableName, scan,
>> MTTRMapper.class,
>> > Text.class, IntWritable.class, job);
>> >
>> > Here is the error that I get, which eventually kills the whole MR job
>> later
>> > because over 25% of the mappers failed.
>> >
>> > > Error: org.apache.hadoop.hbase.client.RetriesExhaustedException:
>> > > Failed after attempts=36, exceptions: Wed Jun 28 13:46:57 PDT 2017,
>> > > null, java.net.SocketTimeoutException: 

Re: Difference between ResultScanner and initTableMapperJob

2017-07-10 Thread Allan Yang
You should pay more attention to the RetriesExhaustedException.
As you have already found out, the important difference between a normal scan
and a MapReduce job is that the MapReduce job ends up running many concurrent
tasks (1200 maps, as you pointed out).  Maybe the 1200 tasks running in
parallel create a big load on the client/server and result in the
retries-exhausted exception.  Anyway, we still need more of the log here to
locate the root problem.
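
If the load theory holds, two knobs that might help are sketched below (the
mapreduce.job.running.map.limit property only exists in newer MapReduce
releases, so treat it as an assumption to verify for your Hadoop version;
unknown properties are simply ignored):

// On the job's Configuration, before calling initTableMapperJob:
Configuration conf = job.getConfiguration();
// Cap how many of the ~1200 map tasks run at the same time, so the
// RegionServers are not hit by hundreds of concurrent scanners.
conf.setInt("mapreduce.job.running.map.limit", 100);
// Smaller caching means each scanner RPC does less work per call and is less
// likely to run into the RPC timeout.
scan.setCaching(100);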

Best Regards
Allan Yang

2017-07-11 11:50 GMT+08:00 Ted Yu :

> bq. for hbase-client/hbase-server version 1.0.1 and 1.2.0-cdh5.7.2.
>
> You mean the error occurred for both versions or, client is on 1.0.1 and
> server is on 1.2.0 ?
>
> There should be more to the RetriesExhaustedException.
> Can you pastebin the full stack trace ?
>
> Cheers
>
> On Mon, Jul 10, 2017 at 2:21 PM, S L  wrote:
>
> > I hope someone can tell me what the difference between these two API
> calls
> > are.  I'm getting weird results between the two of them.  This is
> happening
> > for hbase-client/hbase-server version 1.0.1 and 1.2.0-cdh5.7.2.
> >
> > First off, my rowkeys are in the format hash_name_timestamp
> > e.g. 100_servername_1234567890.  The hbase table has a TTL of 30 days so
> > things older than 30 days should disappear after compaction.
> >
> > The following is code for using ResultScanner.  It doesn't use MapReduce
> so
> > it takes a very long time to complete.  I can't run my job this way
> because
> > it takes too long.  However, for debugging purposes, I don't have any
> > problems with this method.  It lists all keys for the specified time
> range,
> > which look valid to me since all the timestamps of the returned keys are
> > within the past 30 days and within the specified time range:
> >
> > Scan scan = new Scan();
> > scan.addColumn(Bytes.toBytes("raw_data"), Bytes.toBytes(fileType));
> > scan.setCaching(500);
> > scan.setCacheBlocks(false);
> > scan.setTimeRange(start, end);
> >
> > Connection fConnection = ConnectionFactory.createConnection(conf);
> > Table table = fConnection.getTable(TableName.valueOf(tableName));
> > ResultScanner scanner = table.getScanner(scan);
> > for (Result result = scanner.next(); result != null; result =
> > scanner.next()) {
> >System.out.println("Found row: " + Bytes.toString(result.getRow()
> > ));
> > }
> >
> >
> > The follow code doesn't work but it uses MapReduce, which runs way faster
> > than using the ResultScanner way, since it divides things up into 1200
> > maps.  The problem is I'm getting rowkeys that should have disappeared
> due
> > to TTL expiring:
> >
> > Scan scan = new Scan();
> > scan.addColumn(Bytes.toBytes("raw_data"), Bytes.toBytes(fileType));
> > scan.setCaching(500);
> > scan.setCacheBlocks(false);
> > scan.setTimeRange(start, end);
> > TableMapReduceUtil.initTableMapperJob(tableName, scan, MTTRMapper.class,
> > Text.class, IntWritable.class, job);
> >
> > Here is the error that I get, which eventually kills the whole MR job
> later
> > because over 25% of the mappers failed.
> >
> > > Error: org.apache.hadoop.hbase.client.RetriesExhaustedException:
> > > Failed after attempts=36, exceptions: Wed Jun 28 13:46:57 PDT 2017,
> > > null, java.net.SocketTimeoutException: callTimeout=12,
> > > callDuration=120301: row '65_app129041.iad1.mydomain.com_1476641940'
> > > on table 'server_based_data' at region=server_based_data
> >
> > I'll try to study the code for the hbase-client and hbase-server jars but
> > hopefully someone will know offhand what the difference between the
> methods
> > are and what is causing the initTableMapperJob call to fail.
> >
>


Re: Difference between ResultScanner and initTableMapperJob

2017-07-10 Thread Ted Yu
bq. for hbase-client/hbase-server version 1.0.1 and 1.2.0-cdh5.7.2.

Do you mean the error occurred for both versions, or that the client is on
1.0.1 and the server is on 1.2.0?

There should be more to the RetriesExhaustedException.
Can you pastebin the full stack trace?

Cheers

On Mon, Jul 10, 2017 at 2:21 PM, S L  wrote:

> I hope someone can tell me what the difference between these two API calls
> are.  I'm getting weird results between the two of them.  This is happening
> for hbase-client/hbase-server version 1.0.1 and 1.2.0-cdh5.7.2.
>
> First off, my rowkeys are in the format hash_name_timestamp
> e.g. 100_servername_1234567890.  The hbase table has a TTL of 30 days so
> things older than 30 days should disappear after compaction.
>
> The following is code for using ResultScanner.  It doesn't use MapReduce so
> it takes a very long time to complete.  I can't run my job this way because
> it takes too long.  However, for debugging purposes, I don't have any
> problems with this method.  It lists all keys for the specified time range,
> which look valid to me since all the timestamps of the returned keys are
> within the past 30 days and within the specified time range:
>
> Scan scan = new Scan();
> scan.addColumn(Bytes.toBytes("raw_data"), Bytes.toBytes(fileType));
> scan.setCaching(500);
> scan.setCacheBlocks(false);
> scan.setTimeRange(start, end);
>
> Connection fConnection = ConnectionFactory.createConnection(conf);
> Table table = fConnection.getTable(TableName.valueOf(tableName));
> ResultScanner scanner = table.getScanner(scan);
> for (Result result = scanner.next(); result != null; result =
> scanner.next()) {
>System.out.println("Found row: " + Bytes.toString(result.getRow()
> ));
> }
>
>
> The follow code doesn't work but it uses MapReduce, which runs way faster
> than using the ResultScanner way, since it divides things up into 1200
> maps.  The problem is I'm getting rowkeys that should have disappeared due
> to TTL expiring:
>
> Scan scan = new Scan();
> scan.addColumn(Bytes.toBytes("raw_data"), Bytes.toBytes(fileType));
> scan.setCaching(500);
> scan.setCacheBlocks(false);
> scan.setTimeRange(start, end);
> TableMapReduceUtil.initTableMapperJob(tableName, scan, MTTRMapper.class,
> Text.class, IntWritable.class, job);
>
> Here is the error that I get, which eventually kills the whole MR job later
> because over 25% of the mappers failed.
>
> > Error: org.apache.hadoop.hbase.client.RetriesExhaustedException:
> > Failed after attempts=36, exceptions: Wed Jun 28 13:46:57 PDT 2017,
> > null, java.net.SocketTimeoutException: callTimeout=12,
> > callDuration=120301: row '65_app129041.iad1.mydomain.com_1476641940'
> > on table 'server_based_data' at region=server_based_data
>
> I'll try to study the code for the hbase-client and hbase-server jars but
> hopefully someone will know offhand what the difference between the methods
> are and what is causing the initTableMapperJob call to fail.
>


Difference between ResultScanner and initTableMapperJob

2017-07-10 Thread S L
I hope someone can tell me what the difference between these two API calls
is.  I'm getting weird results with the two of them.  This is happening with
hbase-client/hbase-server versions 1.0.1 and 1.2.0-cdh5.7.2.

First off, my rowkeys are in the format hash_name_timestamp,
e.g. 100_servername_1234567890.  The hbase table has a TTL of 30 days, so
things older than 30 days should disappear after compaction.

The following is the code that uses ResultScanner.  It doesn't use MapReduce,
so it takes a very long time to complete.  I can't run my job this way because
it takes too long.  However, for debugging purposes, I don't have any problems
with this method.  It lists all keys for the specified time range, and they
look valid to me: all the timestamps of the returned keys are within the past
30 days and within the specified time range:

Scan scan = new Scan();
// Restrict the scan to the single column and time window we care about.
scan.addColumn(Bytes.toBytes("raw_data"), Bytes.toBytes(fileType));
scan.setCaching(500);
scan.setCacheBlocks(false);
scan.setTimeRange(start, end);

Connection fConnection = ConnectionFactory.createConnection(conf);
Table table = fConnection.getTable(TableName.valueOf(tableName));
ResultScanner scanner = table.getScanner(scan);
for (Result result = scanner.next(); result != null; result = scanner.next()) {
    System.out.println("Found row: " + Bytes.toString(result.getRow()));
}
// Clean up the scanner and connection when done.
scanner.close();
table.close();
fConnection.close();


The following code doesn't work, but it uses MapReduce, which runs way faster
than the ResultScanner approach since it divides things up into 1200 maps.
The problem is that I'm getting rowkeys that should have disappeared due to
the TTL expiring:

Scan scan = new Scan();
scan.addColumn(Bytes.toBytes("raw_data"), Bytes.toBytes(fileType));
scan.setCaching(500);
scan.setCacheBlocks(false);
scan.setTimeRange(start, end);
TableMapReduceUtil.initTableMapperJob(tableName, scan, MTTRMapper.class,
Text.class, IntWritable.class, job);
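
For completeness, the driver around that snippet looks roughly like this (a
sketch only; the job name is made up and the timeout/retry values are
illustrative).  As far as I understand, the mappers' scanners pick up their
client settings from the Configuration the Job was created with, so that is
where any timeout or retry changes have to go:

Configuration conf = HBaseConfiguration.create();
conf.setInt("hbase.client.scanner.timeout.period", 120000); // ms per scanner call
conf.setInt("hbase.rpc.timeout", 120000);
conf.setInt("hbase.client.retries.number", 3);              // give up sooner than 36 attempts
Job job = Job.getInstance(conf, "server_based_data_scan");
job.setJarByClass(MTTRMapper.class);

// ... build the Scan exactly as above, then:
TableMapReduceUtil.initTableMapperJob(tableName, scan, MTTRMapper.class,
        Text.class, IntWritable.class, job);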

Here is the error that I get; it eventually kills the whole MR job because
over 25% of the mappers failed.

> Error: org.apache.hadoop.hbase.client.RetriesExhaustedException:
> Failed after attempts=36, exceptions: Wed Jun 28 13:46:57 PDT 2017,
> null, java.net.SocketTimeoutException: callTimeout=120000,
> callDuration=120301: row '65_app129041.iad1.mydomain.com_1476641940'
> on table 'server_based_data' at region=server_based_data

I'll try to study the code in the hbase-client and hbase-server jars, but
hopefully someone will know offhand what the difference between the methods
is and what is causing the initTableMapperJob call to fail.