Hello Abhi, I did try that and it did not work. And Eugen, yes, I am assembling the argonaut libraries in the fat jar. So how did you overcome this problem?
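For reference, the fat jar is built with sbt-assembly, along the lines of the sketch below. Since I can't share the real build, the plugin and library versions and the merge strategy here are stand-ins, not our exact configuration:

// project/plugins.sbt
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.11.2")

// build.sbt
import AssemblyKeys._

assemblySettings

scalaVersion := "2.10.4"

libraryDependencies ++= Seq(
  // Spark is "provided": the cluster supplies it, so it stays out of the fat jar.
  "org.apache.spark" %% "spark-core" % "1.0.0" % "provided",
  // argonaut is bundled, since the executors need it to parse the JSON.
  "io.argonaut" %% "argonaut" % "6.0.4"
)

// Resolve duplicate files instead of failing the merge.
mergeStrategy in assembly := {
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  case _ => MergeStrategy.first
}

The assembled jar should land under target/scala-2.10/, and that is what jarsPath points at.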
Shivani

On Fri, Jun 20, 2014 at 1:59 AM, Eugen Cepoi <cepoi.eu...@gmail.com> wrote:
>
> On Jun 20, 2014 at 01:46, "Shivani Rao" <raoshiv...@gmail.com> wrote:
> >
> > Hello Andrew,
> >
> > I wish I could share the code, but for proprietary reasons I can't. I can give some idea, though, of what I am trying to do. The job reads a file and processes each of its lines. I am not doing anything intense in the "processLogs" function.
> >
> > import argonaut._
> > import argonaut.Argonaut._
> >
> > /* All of these case classes are created from JSON strings extracted
> >  * from the line in the processLogs() function.
> >  */
> > case class struct1…
> > case class struct2…
> > case class value1(struct1, struct2)
> >
> > def processLogs(line: String): Option[(key1, value1)] = {
> >   …
> > }
> >
> > def run(sparkMaster: String, appName: String, executorMemory: String, jarsPath: Seq[String]) {
> >   val sparkConf = new SparkConf()
> >   sparkConf.setMaster(sparkMaster)
> >   sparkConf.setAppName(appName)
> >   sparkConf.set("spark.executor.memory", executorMemory)
> >   sparkConf.setJars(jarsPath) // This includes all the relevant jars.
> >   val sc = new SparkContext(sparkConf)
> >   val rawLogs = sc.textFile("hdfs://<my-hadoop-namenode>:8020/myfile.txt")
> >   rawLogs.saveAsTextFile("hdfs://<my-hadoop-namenode>:8020/writebackForTesting")
> >   rawLogs.flatMap(processLogs).saveAsTextFile("hdfs://<my-hadoop-namenode>:8020/outfile.txt")
> > }
> >
> > In "local" mode the code runs just fine; in cluster mode it fails with the error I pasted above. In cluster mode, even writing back the file we just read fails:
> > rawLogs.saveAsTextFile("hdfs://<my-hadoop-namenode>:8020/writebackForTesting")
> >
> > I still believe this is a ClassNotFound error in disguise.
>
> Indeed you are right, this can be the reason. I had similar errors when defining case classes in the shell and trying to use them in the RDDs. Are you shading argonaut in the fat jar?
>
> > Thanks,
> > Shivani
> >
> > On Wed, Jun 18, 2014 at 2:49 PM, Andrew Ash <and...@andrewash.com> wrote:
> >>
> >> Wait, so the file only has four lines and the job is running out of heap space? Can you share the code you're running that does the processing? I'd guess that you're doing some intense processing on every line, because just writing parsed case classes back to disk sounds very lightweight.
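> >>
> >> As a sanity check, it might be worth running a stripped-down job that touches none of your own classes. A minimal sketch (the master URL and paths here are placeholders, not from your setup):
> >>
> >> import org.apache.spark.{SparkConf, SparkContext}
> >>
> >> object MinimalRepro {
> >>   def main(args: Array[String]) {
> >>     val conf = new SparkConf()
> >>       .setMaster("spark://<master-host>:7077") // placeholder master URL
> >>       .setAppName("minimal-repro")
> >>     val sc = new SparkContext(conf)
> >>     // Read the file and write it straight back; no user-defined classes
> >>     // are shipped, so a missing class in the jar cannot be the cause here.
> >>     val lines = sc.textFile("hdfs://<namenode>:8020/myfile.txt")
> >>     lines.saveAsTextFile("hdfs://<namenode>:8020/roundtrip")
> >>     sc.stop()
> >>   }
> >> }
> >>
> >> If even that fails on the cluster, the problem is in the environment (classpath or memory configuration) rather than in your parsing code.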
> >>
> >> On Wed, Jun 18, 2014 at 5:17 PM, Shivani Rao <raoshiv...@gmail.com> wrote:
> >>>
> >>> I am trying to process a file that contains 4 log lines (not very long) and then write my parsed-out case classes to a destination folder, and I get the following error:
> >>>
> >>> java.lang.OutOfMemoryError: Java heap space
> >>>     at org.apache.hadoop.io.WritableUtils.readCompressedStringArray(WritableUtils.java:183)
> >>>     at org.apache.hadoop.conf.Configuration.readFields(Configuration.java:2244)
> >>>     at org.apache.hadoop.io.ObjectWritable.readObject(ObjectWritable.java:280)
> >>>     at org.apache.hadoop.io.ObjectWritable.readFields(ObjectWritable.java:75)
> >>>     at org.apache.spark.SerializableWritable.readObject(SerializableWritable.scala:39)
> >>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> >>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >>>     at java.lang.reflect.Method.invoke(Method.java:597)
> >>>     at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:974)
> >>>     at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1848)
> >>>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1752)
> >>>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1328)
> >>>     at java.io.ObjectInputStream.readObject(ObjectInputStream.java:350)
> >>>     at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:40)
> >>>     at org.apache.spark.broadcast.HttpBroadcast$.read(HttpBroadcast.scala:165)
> >>>     at org.apache.spark.broadcast.HttpBroadcast.readObject(HttpBroadcast.scala:56)
> >>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> >>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >>>     at java.lang.reflect.Method.invoke(Method.java:597)
> >>>     at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:974)
> >>>     at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1848)
> >>>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1752)
> >>>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1328)
> >>>     at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1946)
> >>>     at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1870)
> >>>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1752)
> >>>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1328)
> >>>     at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1946)
> >>>     at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1870)
> >>>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1752)
> >>>
> >>> Sadly, several folks have faced this error while trying to execute Spark jobs, and there are various proposed solutions, none of which work for me:
> >>>
> >>> a) I tried changing the number of partitions in my RDD by using coalesce(8), as suggested in http://apache-spark-user-list.1001560.n3.nabble.com/Spark-1-0-0-java-lang-outOfMemoryError-Java-Heap-Space-td7735.html#a7736, and the error persisted (sketched below).
> >>>
> >>> b) I tried changing SPARK_WORKER_MEM=2g and SPARK_EXECUTOR_MEMORY=10g, and neither worked.
> >>>
> >>> c) I strongly suspect there is a classpath error (http://apache-spark-user-list.1001560.n3.nabble.com/how-to-set-spark-executor-memory-and-heap-size-td4719.html), mainly because the call stack is repetitive. Maybe the OOM error is a disguise?
> >>>
> >>> d) I checked that I am not out of disk space and that I do not have too many open files (sudo ls /proc/<spark_master_process_id>/fd | wc -l is well below ulimit -n).
> >>>
> >>> I am also noticing multiple rounds of reflection, to find the right "class" I guess, so it could be a ClassNotFound error disguising itself as a memory error.
> >>>
> >>> Here are other threads reporting the same situation, none of which have been resolved so far:
> >>>
> >>> http://apache-spark-user-list.1001560.n3.nabble.com/no-response-in-spark-web-UI-td4633.html
> >>> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-program-thows-OutOfMemoryError-td4268.html
> >>>
> >>> Any help is greatly appreciated. I am especially calling out to the creators of Spark and the Databricks folks. This seems like a "known bug" waiting to happen.
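> >>>
> >>> For completeness, attempt (a) was along these lines (a sketch; the paths are placeholders, and processLogs is the parsing function I cannot share):
> >>>
> >>> // Same pipeline as before, but collapsed into 8 partitions before processing.
> >>> val rawLogs = sc.textFile("hdfs://<my-hadoop-namenode>:8020/myfile.txt").coalesce(8)
> >>> rawLogs.flatMap(processLogs).saveAsTextFile("hdfs://<my-hadoop-namenode>:8020/outfile.txt")
> >>>
> >>> The OOM was identical with and without the coalesce(8).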
> >>>
> >>> Thanks,
> >>> Shivani
> >>>
> >>> --
> >>> Software Engineer
> >>> Analytics Engineering Team @ Box
> >>> Mountain View, CA
> >
> > --
> > Software Engineer
> > Analytics Engineering Team @ Box
> > Mountain View, CA

--
Software Engineer
Analytics Engineering Team @ Box
Mountain View, CA