After a while, this error also appears:

19/05/28 11:11:18 ERROR executor.Executor: Exception in task 35.1 in stage 0.0 (TID 265)
org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 122 actions: my_table: 122 times,
    at org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.makeException(AsyncProcess.java:258)
    at org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.access$2000(AsyncProcess.java:238)
    at org.apache.hadoop.hbase.client.AsyncProcess.waitForAllPreviousOpsAndReset(AsyncProcess.java:1810)
    at org.apache.hadoop.hbase.client.BufferedMutatorImpl.backgroundFlushCommits(BufferedMutatorImpl.java:240)
    at org.apache.hadoop.hbase.client.BufferedMutatorImpl.mutate(BufferedMutatorImpl.java:146)
    at org.apache.hadoop.hbase.client.HTable.put(HTable.java:1092)
    at example.bigdata.v360.dsl.UpsertDsl$$anonfun$writeToHBase$1.apply(UpsertDsl.scala:25)
    at example.bigdata.v360.dsl.UpsertDsl$$anonfun$writeToHBase$1.apply(UpsertDsl.scala:19)
    at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:929)
    at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:929)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2067)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2067)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:109)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
19/05/28 11:11:18 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 369

On Tue, 28 May 2019 at 12:12, Guillermo Ortiz Fernández (<guillermo.ortiz.f...@gmail.com>) wrote:

> I'm running a load process into HBase with Spark (around 150M records).
> At the end of the process there are a lot of failed tasks.
>
> I get this error:
>
> 19/05/28 11:02:31 ERROR client.AsyncProcess: Failed to get region location
> org.apache.hadoop.hbase.TableNotFoundException: my_table
>     at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegionInMeta(ConnectionManager.java:1417)
>     at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1211)
>     at org.apache.hadoop.hbase.client.AsyncProcess.submit(AsyncProcess.java:410)
>     at org.apache.hadoop.hbase.client.AsyncProcess.submit(AsyncProcess.java:359)
>     at org.apache.hadoop.hbase.client.BufferedMutatorImpl.backgroundFlushCommits(BufferedMutatorImpl.java:238)
>     at org.apache.hadoop.hbase.client.BufferedMutatorImpl.mutate(BufferedMutatorImpl.java:146)
>     at org.apache.hadoop.hbase.client.HTable.put(HTable.java:1092)
>     at example.bigdata.v360.dsl.UpsertDsl$$anonfun$writeToHBase$1.apply(UpsertDsl.scala:25)
>     at example.bigdata.v360.dsl.UpsertDsl$$anonfun$writeToHBase$1.apply(UpsertDsl.scala:19)
>     at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:929)
>     at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:929)
>     at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2067)
>     at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2067)
>     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
>     at org.apache.spark.scheduler.Task.run(Task.scala:109)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
>
> When I run a scan from the hbase shell, it works. What could be the
> reason? I'm not sure whether this is an HBase error or a Spark error.