0.12.1 packages HBase 0.98.5-hadoop2 in the storage driver assembly. Looking at the Git history, it has not changed in a while.
Do you have the exact classpath that has gone into your Spark cluster?

On Wed, May 23, 2018 at 1:30 PM, Pat Ferrel <p...@actionml.com> wrote:

> A source build did not fix the problem. Has anyone run PIO 0.12.1 on a
> Spark cluster? The issue seems to be how to pass the correct code to Spark
> to connect to HBase:
>
> [ERROR] [TransportRequestHandler] Error while invoking RpcHandler#receive() for one-way message.
> [ERROR] [TransportRequestHandler] Error while invoking RpcHandler#receive() for one-way message.
> Exception in thread "main" org.apache.spark.SparkException: Job aborted
> due to stage failure: Task 4 in stage 0.0 failed 4 times, most recent
> failure: Lost task 4.3 in stage 0.0 (TID 18, 10.68.9.147, executor 0):
> java.lang.NoClassDefFoundError: Could not initialize class
> org.apache.hadoop.hbase.protobuf.ProtobufUtil
>         at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.convertStringToScan(TableMapReduceUtil.java:521)
>         at org.apache.hadoop.hbase.mapreduce.TableInputFormat.setConf(TableInputFormat.java:110)
>         at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:170)
>         at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:134)
>         at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:69)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
>         at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
>
> Now that we have these pluggable DBs, did I miss something? This works with
> master=local but not with a remote Spark master.
>
> I’ve passed the hbase-client jar in the --jars part of spark-submit and it still
> fails. What am I missing?
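One way to rule out a missing-jar problem is to pass the full set of HBase client jars rather than hbase-client alone: the "Could not initialize class ProtobufUtil" error suggests a failed static initializer, which typically means the generated protobuf classes (hbase-protocol, protobuf-java) never reached the executors. A sketch, assuming a standard HBase install directory and the 0.98.5-hadoop2 versions mentioned above; the paths and jar versions are hypothetical and should be adjusted to the actual cluster:

```shell
# Hypothetical layout: adjust HBASE_LIBS to wherever the HBase jars live.
# ProtobufUtil is in hbase-client, but its static init needs the generated
# protobuf classes from hbase-protocol plus protobuf-java, so ship all of them.
HBASE_LIBS=/opt/hbase/lib

pio train -- \
  --master spark://your-spark-master:7077 \
  --jars "$HBASE_LIBS/hbase-client-0.98.5-hadoop2.jar,\
$HBASE_LIBS/hbase-common-0.98.5-hadoop2.jar,\
$HBASE_LIBS/hbase-protocol-0.98.5-hadoop2.jar,\
$HBASE_LIBS/hbase-server-0.98.5-hadoop2.jar,\
$HBASE_LIBS/protobuf-java-2.5.0.jar"
```

Everything after `--` in `pio train` is forwarded to spark-submit, so the same `--jars` list could also be tried with a bare spark-submit to isolate whether PIO's driver assembly is the variable.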
>
> From: Pat Ferrel <p...@actionml.com>
> Reply: Pat Ferrel <p...@actionml.com>
> Date: May 23, 2018 at 8:57:32 AM
> To: user@predictionio.apache.org
> Subject: Spark cluster error
>
> The same CLI works using a local Spark master, but fails using a remote master
> for a cluster due to a missing class def for protobuf used in HBase. We are
> using the binary dist 0.12.1. Is this known? Is there a workaround?
>
> We are now trying a source build in the hope that the class will be put in the
> assembly passed to Spark. The reasoning is that the remote executors don’t
> contain the HBase classes, but when you run a local executor it does, due to
> some local classpath. If the source-built assembly does not have these
> classes, we will have the same problem: namely, how to get protobuf to the
> executors.
>
> Has anyone seen this?
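If the jars cannot reach the executors via `--jars`, an alternative is to put them on every worker's classpath directly through Spark's configuration, which also explains the local/remote asymmetry described above: a local executor inherits the driver machine's classpath, while remote executors only see what the cluster config gives them. A sketch, assuming the HBase lib directory exists at the same (hypothetical) path on every worker node:

```
# spark-defaults.conf on the cluster (hypothetical paths)
spark.executor.extraClassPath  /opt/hbase/lib/*
spark.driver.extraClassPath    /opt/hbase/lib/*
```

These are plain JVM classpath entries, so the `*` wildcard picks up every jar in the directory; the trade-off versus `--jars` is that the files must already be present on each node.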