0.12.1 packages HBase 0.98.5-hadoop2 in the storage driver assembly.
Looking at the Git history, it has not changed in a while.
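
One quick check: verify the class actually made it into the assembly that
pio ships to Spark. A minimal sketch (the jar name and path are assumptions
from a default 0.12.1 install; adjust to wherever your storage driver
assembly lives):

    import java.util.jar.JarFile
    import scala.collection.JavaConverters._

    // Hypothetical path to the HBase storage driver assembly.
    val jar = new JarFile("assembly/spark/pio-data-hbase-assembly-0.12.1.jar")

    // An empty result means the class is not packaged in this jar at all.
    jar.entries.asScala
      .map(_.getName)
      .filter(_.contains("ProtobufUtil"))
      .foreach(println)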

Do you have the exact classpath that has gone into your Spark cluster?
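
One way to capture it is to open a spark-shell against the same master and
probe the executors directly. A rough sketch (the parallelism is arbitrary;
the point is to run the check on the remote executors, not the driver):

    // Run inside spark-shell --master spark://<your-master>:7077
    // Driver-side classpath:
    println(System.getProperty("java.class.path"))

    // Executor-side check: can each executor load the failing class?
    val probe = sc.parallelize(1 to 8, 8).map { _ =>
      val host = java.net.InetAddress.getLocalHost.getHostName
      val loaded =
        try { Class.forName("org.apache.hadoop.hbase.protobuf.ProtobufUtil"); "ok" }
        catch { case t: Throwable => t.toString }
      (host, loaded)
    }.collect()
    probe.foreach(println)

If the executors report the same error here, the jars are simply not
reaching them.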

On Wed, May 23, 2018 at 1:30 PM, Pat Ferrel <p...@actionml.com> wrote:

> A source build did not fix the problem. Has anyone run PIO 0.12.1 on a
> Spark cluster? The issue seems to be how to get the right classes to
> Spark so it can connect to HBase:
>
> [ERROR] [TransportRequestHandler] Error while invoking
> RpcHandler#receive() for one-way message.
> [ERROR] [TransportRequestHandler] Error while invoking
> RpcHandler#receive() for one-way message.
> Exception in thread "main" org.apache.spark.SparkException: Job aborted
> due to stage failure: Task 4 in stage 0.0 failed 4 times, most recent
> failure: Lost task 4.3 in stage 0.0 (TID 18, 10.68.9.147, executor 0):
> java.lang.NoClassDefFoundError: Could not initialize class
> org.apache.hadoop.hbase.protobuf.ProtobufUtil
>     at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.convertStringToScan(TableMapReduceUtil.java:521)
>     at org.apache.hadoop.hbase.mapreduce.TableInputFormat.setConf(TableInputFormat.java:110)
>     at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:170)
>     at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:134)
>     at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:69)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
>     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
>
> Now that we have these pluggable DBs, did I miss something? This works
> with master=local but not with a remote Spark master.
>
> I’ve passed hbase-client in the --jars part of spark-submit and it still
> fails. What am I missing?
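>
> Is the fix perhaps to ship the whole HBase client stack rather than just
> hbase-client? The "Could not initialize class" suggests ProtobufUtil's
> static init is failing, and it touches the generated protobuf classes in
> hbase-protocol. A sketch of the programmatic equivalent of --jars (the
> jar paths and versions below are guesses from a stock HBase 0.98.5
> install):
>
>     import org.apache.spark.{SparkConf, SparkContext}
>
>     val conf = new SparkConf()
>       .setAppName("hbase-read")
>       // setJars ships these to every executor, like spark-submit --jars.
>       .setJars(Seq(
>         "/opt/hbase/lib/hbase-client-0.98.5-hadoop2.jar",
>         "/opt/hbase/lib/hbase-common-0.98.5-hadoop2.jar",
>         "/opt/hbase/lib/hbase-protocol-0.98.5-hadoop2.jar",
>         "/opt/hbase/lib/protobuf-java-2.5.0.jar"))
>     val sc = new SparkContext(conf)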
>
>
> From: Pat Ferrel <p...@actionml.com>
> Reply: Pat Ferrel <p...@actionml.com>
> Date: May 23, 2018 at 8:57:32 AM
> To: user@predictionio.apache.org
> Subject: Spark cluster error
>
> The same CLI works using a local Spark master but fails using a remote
> master for a cluster, due to a missing class def for protobuf used in
> HBase. We are using the binary dist 0.12.1. Is this known? Is there a
> workaround?
>
> We are now trying a source build in the hope that the class will be put
> into the assembly passed to Spark. The reasoning is that the remote
> executors don't have the HBase classes on their classpath, but a local
> executor does, due to some local classpath. If the source-built assembly
> does not have these classes, we will have the same problem: how to get
> protobuf to the executors.
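>
> The other route we are considering is not shipping jars at all, and
> instead pointing the executors at an HBase install that already exists
> on every worker node. A sketch (the path is an assumption, and the jars
> must be present at the same location on every worker):
>
>     import org.apache.spark.SparkConf
>
>     val conf = new SparkConf()
>       // Adds these entries to each executor's classpath; nothing is
>       // shipped, so the files must already exist on every worker.
>       .set("spark.executor.extraClassPath", "/opt/hbase/lib/*")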
>
> Has anyone seen this?
>
>
