This looks like a deployment or dependency issue. Please check the
following:
1. That the unmodified Spark jars are not on the classpath (e.g. jars that
already existed on the cluster or were pulled in by other packages); see
the sketch below for one way to check this at runtime.
2. That the modified jars were actually deployed to both the master and the
slave nodes.
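
As a quick way to verify which jars are actually being used, here is a
minimal sketch, assuming a spark-shell session on the cluster (the class
name is taken from your stack trace; adjust it if yours differs):

  // Driver side: print which jar the ALS class is loaded from.
  val driverLoc = Class.forName("org.apache.spark.ml.recommendation.ALS")
    .getProtectionDomain.getCodeSource.getLocation
  println(driverLoc)

  // Executor side: the ClassNotFoundException is thrown on a worker, so
  // check the same thing inside a task running on every executor.
  sc.parallelize(1 to 100, sc.defaultParallelism).map { _ =>
    Class.forName("org.apache.spark.ml.recommendation.ALS")
      .getProtectionDomain.getCodeSource.getLocation.toString
  }.distinct().collect().foreach(println)

If the executors report a different location than the driver (for example
a jar under the cluster's stock Spark installation), then your modified
classes are not the ones being loaded there, which would explain the
ClassNotFoundException.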

On Tue, Jul 5, 2016 at 12:29 PM Alger Remirata <abremirat...@gmail.com>
wrote:

> Hi all,
>
> First of all, we would like to thank you for developing Spark. It helps
> us a lot with our data science tasks.
>
> I have a question. We have built a customized Spark using the following
> command:
> mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -Phive -Phive-thriftserver
> -DskipTests clean package.
>
> In the custom Spark we built, we added a new Scala file (package) called
> StandardNNLS, but we got an error saying:
>
> Name: org.apache.spark.SparkException
> Message: Job aborted due to stage failure: Task 21 in stage 34.0 failed 4
> times, most recent failure: Lost task 21.3 in stage 34.0 (TID 2547,
> 192.168.60.115): java.lang.ClassNotFoundException:
> org.apache.spark.ml.recommendation.ALS$StandardNNLSSolver
>
> StandardNNLSolver is defined in another Scala file called
> StandardNNLS.scala, since we replaced the original NNLS solver with
> StandardNNLS. Do you have any idea about the error? Is there a config
> file we need to edit to add the classpath? Even if we insert the added
> code directly into ALS.scala instead of creating another file like
> StandardNNLS.scala, the inserted code is not recognized; it still fails
> with a ClassNotFoundException.
>
> However, when we run this on our local machine rather than on the Hadoop
> cluster, it works. We don't know whether the error is because we are
> using mvn to build the custom Spark, or whether it has something to do
> with communicating with the Hadoop cluster.
>
> We would like to ask for your ideas on how to solve this problem. We
> could create a separate package that does not depend on Apache Spark,
> but that is much slower. As of now, we are still learning Scala and
> Spark, and using the Apache Spark utilities makes the code faster.
> However, if we build a separate package that does not depend on Apache
> Spark, we have to re-implement the utilities that are private in Apache
> Spark. So it is better to use Apache Spark and insert the code that we
> need.
>
> Thanks,
>
> Alger
>
