Re: graphframes on cluster

2017-09-22 Thread Imran Rajjad
Sorry for posting without complete information.

I am connecting to the Spark cluster with the driver program running as the
backend of a web application, which is intended to listen for job progress
and do some other work. Below is how I am connecting to the cluster:

sparkConf = new SparkConf().setAppName("isolated test")
    .setMaster("spark://master:7077")
    .set("spark.executor.memory", "6g")
    .set("spark.driver.memory", "6g")
    .set("spark.driver.maxResultSize", "2g")
    .set("spark.executor.extraJavaOptions", "-Xmx8g")
    .set("spark.jars.packages", "graphframes:graphframes:0.5.0-spark2.1-s_2.11")
    .set("spark.jars", "/home/usr/jobs.jar"); // shared location on the Linux
                                              // machines; contains the required Java classes

The crash occurs at:

gFrame.connectedComponents().setBroadcastThreshold(2).run();

with the exception:

Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 5, 10.112.29.80): java.lang.ClassCastException: cannot assign instance of scala.collection.immutable.List$SerializationProxy to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of type scala.collection.Seq in instance of org.apache.spark.rdd.MapPartitionsRDD
    at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2133)
    at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1305)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2024)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1942)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1808)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2018)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1942)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1808)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:373)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
    at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:71)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47)
    at org.apache.spark.scheduler.Task.run(Task.scala:86)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

After googling around, this appears to be related to dependencies, but I
don't have many dependencies apart from a few POJOs, which have already been
included through the context (the jobs.jar above).
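
One thing I plan to try, in case spark.jars.packages is ignored when set
programmatically (as far as I know it is normally resolved by spark-submit
rather than by the driver process itself), is shipping the graphframes jar to
the executors explicitly. A minimal sketch, with a placeholder path for a
locally downloaded copy of the package; I have not confirmed this fixes it:

sparkConf = new SparkConf().setAppName("isolated test")
    .setMaster("spark://master:7077")
    // Ship the application classes and the graphframes jar to every
    // executor explicitly instead of relying on package resolution.
    // The graphframes jar path below is hypothetical.
    .setJars(new String[] {
        "/home/usr/jobs.jar",
        "/home/usr/lib/graphframes-0.5.0-spark2.1-s_2.11.jar"
    });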

regards,
Imran






-- 
I.R


Re: graphframes on cluster

2017-09-20 Thread Felix Cheung
Could you include the code where it fails?
Generally the best way to use GraphFrames is to use the --packages option
with the spark-submit command.
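
For example, a minimal sketch (the master URL, main class, and jar path below
are placeholders for your own values):

spark-submit \
  --master spark://master:7077 \
  --packages graphframes:graphframes:0.5.0-spark2.1-s_2.11 \
  --class com.example.YourDriverApp \
  /home/usr/jobs.jar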




graphframes on cluster

2017-09-20 Thread Imran Rajjad
Trying to run GraphFrames on a Spark cluster. Do I need to include the
package in the Spark context settings, or is only the driver program supposed
to have the GraphFrames libraries on its classpath? Currently the job crashes
when an action is invoked on GraphFrame classes.

regards,
Imran

-- 
I.R