I recall that at one point Spark switched to a per-thread classpath so that
each job would have its own isolated classpath. That was probably around
Spark 1.5, though, so it is not likely the exact same case here. Which version
of Spark did you upgrade from, and which did you upgrade to?

On Tue, May 29, 2018 at 2:39 PM Pat Ferrel <[email protected]> wrote:

> BTW, the way we worked around this was to scale up the driver machine to
> handle the executors too, et voilà. Everything worked, but our normal
> strategy of using remote Spark is now somehow broken. We upgraded everything
> to the latest stable versions and may have messed up some config. So we're
> not sure where the problem is, just looking for a clue we haven’t already
> thought of.
>
>
> From: Pat Ferrel <[email protected]>
> Reply: [email protected]
> Date: May 29, 2018 at 2:14:23 PM
> To: Donald Szeto <[email protected]>, [email protected]
>
> Subject:  Re: Spark cluster error
>
> Yes, the spark-submit --jars list is where we started looking for the missing
> class. The class isn’t found on the remote executor, so we looked in the
> jars actually downloaded into the executor’s work dir. The PIO assembly
> jars are there and do have the classes. That dir would be in the classpath of
> the executor, right? Not sure what you are asking.
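>
> To be concrete, this is roughly the kind of check we ran on a worker machine
> (the app id and jar name below are placeholders, not our exact values):
>
>     # inside the executor's work dir for the app on a worker machine
>     cd $SPARK_HOME/work/app-20180529000000-0000/0
>     ls *.jar
>     # confirm the class really is inside the downloaded PIO HBase assembly
>     unzip -l pio-data-hbase-assembly-*.jar | grep ProtobufUtil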
>
> Are you asking about the SPARK_CLASSPATH in spark-env.sh? The default
> should include the work subdir for the job, I believe, and it can only be
> added to, so we couldn’t have messed that up as long as it points first to
> the work/job-number dir, right?
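>
> For what it's worth, the kind of setting we're talking about looks roughly
> like this (paths are made up, and on recent Spark the extraClassPath
> properties are the supported route rather than the deprecated
> SPARK_CLASSPATH):
>
>     # spark-env.sh on each worker (older, deprecated style)
>     export SPARK_CLASSPATH="/opt/hbase/lib/*:$SPARK_CLASSPATH"
>
>     # spark-defaults.conf (current style)
>     spark.executor.extraClassPath  /opt/hbase/lib/*
>     spark.driver.extraClassPath    /opt/hbase/lib/*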
>
> I guess the root of my question is: how can the jars be downloaded to the
> executor’s work dir and yet the classes we know are in those jars still not
> be found?
>
>
> From: Donald Szeto <[email protected]>
> Reply: [email protected]
> Date: May 29, 2018 at 1:27:03 PM
> To: [email protected]
> Subject:  Re: Spark cluster error
>
> Sorry, what I meant was the actual spark-submit command that PIO was
> using. It should be in the log.
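>
> Something along these lines should surface it (the log file name and location
> depend on how PIO is set up, so adjust the path):
>
>     # pio.log is usually written in the directory where the pio command ran
>     grep -B1 -A5 "spark-submit" pio.log | tail -40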
>
> What Spark version was that? I recall classpath issues with certain
> versions of Spark.
>
> On Thu, May 24, 2018 at 4:52 PM, Pat Ferrel <[email protected]> wrote:
>
>> Thanks Donald,
>>
>> We have:
>>
>>    - built pio with hbase 1.4.3, which is what we have deployed
>>    - verified that the `ProtobufUtil` class is in the pio hbase assembly
>>    - verified the assembly is passed in --jars to spark-submit
>>    - verified that the executors receive and store the assemblies in the
>>    FS work dir on the worker machines
>>    - verified that hashes match the original assembly, so the class is
>>    being received by every executor (a rough sketch of that check is below)
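>>
>> The hash check was along these lines (paths and jar names are illustrative):
>>
>>     # on the machine running pio
>>     md5sum $PIO_HOME/lib/spark/pio-data-hbase-assembly-*.jar
>>     # on each worker, against the copy Spark downloaded into the work dir
>>     md5sum $SPARK_HOME/work/app-*/*/pio-data-hbase-assembly-*.jar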
>>
>> However the executor is unable to find the class.
>>
>> This seems just short of impossible, yet it is clearly happening. How can the
>> executor deserialize the code but not find it later?
>>
>> Not sure what you mean by the classpath going into the cluster. The class
>> whose def is not found does seem to be in the PIO 0.12.1 HBase assembly;
>> isn’t that where it should come from?
>>
>> Thanks again
>> p
>>
>>
>> From: Donald Szeto <[email protected]>
>> Reply: [email protected]
>> Date: May 24, 2018 at 2:10:24 PM
>> To: [email protected]
>> Subject:  Re: Spark cluster error
>>
>> 0.12.1 packages HBase 0.98.5-hadoop2 in the storage driver assembly.
>> Looking at the Git history, it has not changed in a while.
>>
>> Do you have the exact classpath that has gone into your Spark cluster?
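>>
>> If it helps, one way to see what the executors actually got (a sketch,
>> assuming a standalone worker with default log locations):
>>
>>     # the standalone worker log prints the executor's full launch command,
>>     # including its -cp entries
>>     grep -A2 "Launch command" $SPARK_HOME/logs/spark-*Worker*.out
>>     # or inspect a running executor process directly
>>     ps aux | grep CoarseGrainedExecutorBackend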
>>
>> On Wed, May 23, 2018 at 1:30 PM, Pat Ferrel <[email protected]> wrote:
>>
>>> A source build did not fix the problem. Has anyone run PIO 0.12.1 on a
>>> Spark cluster? The issue seems to be how to pass Spark the code it needs
>>> to connect to HBase:
>>>
>>> [ERROR] [TransportRequestHandler] Error while invoking
>>> RpcHandler#receive() for one-way message.
>>> [ERROR] [TransportRequestHandler] Error while invoking
>>> RpcHandler#receive() for one-way message.
>>> Exception in thread "main" org.apache.spark.SparkException: Job aborted
>>> due to stage failure: Task 4 in stage 0.0 failed 4 times, most recent
>>> failure: Lost task 4.3 in stage 0.0 (TID 18, 10.68.9.147, executor 0):
>>> java.lang.NoClassDefFoundError: Could not initialize class
>>> org.apache.hadoop.hbase.protobuf.ProtobufUtil
>>>     at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.convertStringToScan(TableMapReduceUtil.java:521)
>>>     at org.apache.hadoop.hbase.mapreduce.TableInputFormat.setConf(TableInputFormat.java:110)
>>>     at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:170)
>>>     at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:134)
>>>     at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:69)
>>>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
>>>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
>>>     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>>>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
>>>
>>> Now that we have these pluggable DBs, did I miss something? This works
>>> with master=local but not with a remote Spark master.
>>>
>>> I’ve passed the hbase-client jar in the --jars part of spark-submit and it
>>> still fails. What am I missing?
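>>>
>>> For context, the submit we end up with is roughly of this shape (jar names,
>>> paths, and the master URL are illustrative; the real command is generated
>>> by pio train):
>>>
>>>     spark-submit \
>>>       --master spark://our-spark-master:7077 \
>>>       --jars /opt/pio/lib/spark/pio-data-hbase-assembly-0.12.1.jar,/opt/hbase/lib/hbase-client-1.4.3.jar \
>>>       --class org.apache.predictionio.workflow.CreateWorkflow \
>>>       /opt/pio/lib/pio-assembly-0.12.1.jar ...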
>>>
>>>
>>> From: Pat Ferrel <[email protected]>
>>> Reply: Pat Ferrel <[email protected]>
>>> Date: May 23, 2018 at 8:57:32 AM
>>> To: [email protected]
>>> Subject:  Spark cluster error
>>>
>>> The same CLI works using a local Spark master, but fails using a remote
>>> master for a cluster due to a missing class def for the protobuf code used
>>> by HBase. We are using the binary dist 0.12.1. Is this known? Is there a
>>> workaround?
>>>
>>> We are now trying a source build in the hope that the class will be put in
>>> the assembly passed to Spark. The reasoning is that the remote executors
>>> don’t have the HBase classes on their classpath, but a local executor does,
>>> due to some local classpath. If the source-built assembly does not have
>>> these classes, we will have the same problem: namely, how to get protobuf
>>> to the executors.
>>>
>>> Has anyone seen this?
>>>
>>>
>>
>
