[Apache Spark][Streaming Job][Checkpoint] Spark job failed on checkpoint recovery with "batch not found" error
Hi, we have a Spark 2.4 job that fails on checkpoint recovery every few hours with the following error (from the driver log):

driver spark-kubernetes-driver ERROR 20:38:51 ERROR MicroBatchExecution: Query impressionUpdate [id = 54614900-4145-4d60-8156-9746ffc13d1f, runId = 3637c2f3-49b6-40c2-b6d0-7edb28361c5d] terminated with error
java.lang.IllegalStateException: batch 946 doesn't exist
at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$constructNextBatch$1$$anonfun$apply$mcZ$sp$3.apply$mcV$sp(MicroBatchExecution.scala:406)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$constructNextBatch$1$$anonfun$apply$mcZ$sp$3.apply(MicroBatchExecution.scala:381)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$constructNextBatch$1$$anonfun$apply$mcZ$sp$3.apply(MicroBatchExecution.scala:381)
at org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:351)
at org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:58)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$constructNextBatch$1.apply$mcZ$sp(MicroBatchExecution.scala:381)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$constructNextBatch$1.apply(MicroBatchExecution.scala:337)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$constructNextBatch$1.apply(MicroBatchExecution.scala:337)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution.withProgressLocked(MicroBatchExecution.scala:557)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution.org$apache$spark$sql$execution$streaming$MicroBatchExecution$$constructNextBatch(MicroBatchExecution.scala:337)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply$mcV$sp(MicroBatchExecution.scala:183)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply(MicroBatchExecution.scala:166)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply(MicroBatchExecution.scala:166)
at org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:351)
at org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:58)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1.apply$mcZ$sp(MicroBatchExecution.scala:166)
at org.apache.spark.sql.execution.streaming.ProcessingTimeExecutor.execute(TriggerExecutor.scala:56)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution.runActivatedStream(MicroBatchExecution.scala:160)
at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:281)
at org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:193)

The executor logs show this error:

ERROR CoarseGrainedExecutorBackend: RECEIVED SIGNAL TERM

How should I fix this?
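For reference: this IllegalStateException on recovery generally means the offset/commit log under the checkpoint directory no longer contains the batch the query was about to re-run, for example because the checkpoint lives on non-durable (pod-local) storage or its files were partially deleted between restarts. Below is a minimal Structured Streaming sketch showing a durable, shared checkpoint location; the Kafka source, topic, broker, and paths are placeholders, not taken from the original job.

// A minimal sketch, assuming a Kafka source; all names and paths are
// hypothetical. The key point is the checkpointLocation option.
val stream = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "kafka01:9092")
  .option("subscribe", "impressions")
  .load()

stream.writeStream
  .format("parquet")
  .option("path", "hdfs:///data/impressions")
  // Keep the checkpoint on durable shared storage (HDFS/S3), never on a
  // pod's local disk: losing offsets/ or commits/ files between restarts
  // is a common cause of "batch N doesn't exist" during recovery.
  .option("checkpointLocation", "hdfs:///checkpoints/impressionUpdate")
  .start()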
Spark Job Failed with FileNotFoundException
I have a Spark cluster consisting of 5 nodes, and a Spark job that should process the files in a directory and send their contents to Kafka. I submit the job using the following command:

bin$ ./spark-submit --total-executor-cores 20 --executor-memory 5G --class org.css.java.FileMigration.FileSparkMigrator --master spark://spark-master:7077 /home/me/FileMigrator-0.1.1-jar-with-dependencies.jar /home/me/shared kafka01,kafka02,kafka03,kafka04,kafka05 kafka_topic

The directory /home/me/shared is mounted on all 5 nodes, but when I submit the job I get the following exception:

java.io.FileNotFoundException: File file:/home/me/shared/input_1.txt does not exist
at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:534)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:747)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:524)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:409)
at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:140)
at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:341)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:766)
at org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:108)
at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67)
at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:239)
at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:216)
at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:101)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:88)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

After some tries I ran into another odd behavior: when I submit the job while the directory contains 1 or 2 files, the same exception is thrown on the driver machine, but the job continues and the files are processed successfully. Once I add another file, the exception is thrown and the job fails.
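For reference: with a file:/ path, the driver and every executor each resolve the path against their own filesystem, so any node where the mount lags or a file is missing will throw exactly this FileNotFoundException. A common workaround is to stage the input on a distributed filesystem first. A minimal sketch, assuming the files are copied into HDFS (e.g. hdfs dfs -put /home/me/shared /staging/shared); the paths and producer wiring mirror the question but are illustrative, not the original job's code:

import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

// Read from HDFS so all executors see the same files.
val lines = sc.textFile("hdfs:///staging/shared")
lines.foreachPartition { records =>
  val props = new Properties()
  props.put("bootstrap.servers", "kafka01:9092,kafka02:9092,kafka03:9092")
  props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
  props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
  // One producer per partition, created on the executor (producers are
  // not serializable, so they cannot be created on the driver).
  val producer = new KafkaProducer[String, String](props)
  records.foreach(line => producer.send(new ProducerRecord[String, String]("kafka_topic", line)))
  producer.close()
}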
Re: Spark job failed
Have you considered posting on the vendor's forum? FYI

On Mon, Sep 14, 2015 at 6:09 AM, Renu Yadav wrote:
>
> -- Forwarded message --
> From: Renu Yadav
> Date: Mon, Sep 14, 2015 at 4:51 PM
> Subject: Spark job failed
> To: d...@spark.apache.org
>
> I am getting the error below while running a Spark job:
>
> storage.DiskBlockObjectWriter: Uncaught exception while reverting partial writes to file /data/vol5/hadoop/yarn/local/usercache/renu_yadav/appcache/application_1438196554863_31545/spark-4686a622-82be-418e-a8b0-1653458bc8cb/22/temp_shuffle_8c437ba7-55d2-4520-80ec-adcfe932b3bd
> java.io.FileNotFoundException: /data/vol5/hadoop/yarn/local/usercache/renu_yadav/appcache/application_1438196554863_31545/spark-4686a622-82be-418e-a8b0-1653458bc8cb/22/temp_shuffle_8c437ba7-55d2-4520-80ec-adcfe932b3bd (No such file or directory)
>
> I am processing 1.3 TB of data. The transformations are:
> read from Hadoop -> map(key/value).coalesce(2000).groupByKey,
> then sort each group by server_ts and select the most recent record,
> then save the data to Parquet.
>
> The command is:
>
> spark-submit --class com.test.Myapp --master yarn-cluster --driver-memory 16g --executor-memory 20g --executor-cores 5 --num-executors 150 --files /home/renu_yadav/fmyapp/hive-site.xml --conf spark.yarn.preserve.staging.files=true --conf spark.shuffle.memoryFraction=0.6 --conf spark.storage.memoryFraction=0.1 --conf SPARK_SUBMIT_OPTS="-XX:MaxPermSize=768m" --conf SPARK_SUBMIT_OPTS="-XX:MaxPermSize=768m" --conf spark.akka.timeout=40 --conf spark.locality.wait=10 --conf spark.yarn.executor.memoryOverhead=8000 --conf SPARK_JAVA_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps" --conf spark.reducer.maxMbInFlight=96 --conf spark.shuffle.file.buffer.kb=64 --conf spark.core.connection.ack.wait.timeout=120 --jars /usr/hdp/2.2.6.0-2800/hive/lib/datanucleus-api-jdo-3.2.6.jar,/usr/hdp/2.2.6.0-2800/hive/lib/datanucleus-core-3.2.10.jar,/usr/hdp/2.2.6.0-2800/hive/lib/datanucleus-rdbms-3.2.9.jar myapp_2.10-1.0.jar
>
> Cluster configuration:
> 20 nodes
> 32 cores per node
> 125 GB RAM per node
>
> Please help.
>
> Thanks & Regards,
> Renu Yadav
Fwd: Spark job failed
-- Forwarded message --
From: Renu Yadav
Date: Mon, Sep 14, 2015 at 4:51 PM
Subject: Spark job failed
To: d...@spark.apache.org

I am getting the error below while running a Spark job:

storage.DiskBlockObjectWriter: Uncaught exception while reverting partial writes to file /data/vol5/hadoop/yarn/local/usercache/renu_yadav/appcache/application_1438196554863_31545/spark-4686a622-82be-418e-a8b0-1653458bc8cb/22/temp_shuffle_8c437ba7-55d2-4520-80ec-adcfe932b3bd
java.io.FileNotFoundException: /data/vol5/hadoop/yarn/local/usercache/renu_yadav/appcache/application_1438196554863_31545/spark-4686a622-82be-418e-a8b0-1653458bc8cb/22/temp_shuffle_8c437ba7-55d2-4520-80ec-adcfe932b3bd (No such file or directory)

I am processing 1.3 TB of data. The transformations are:
read from Hadoop -> map(key/value).coalesce(2000).groupByKey,
then sort each group by server_ts and select the most recent record,
then save the data to Parquet.

The command is:

spark-submit --class com.test.Myapp --master yarn-cluster --driver-memory 16g --executor-memory 20g --executor-cores 5 --num-executors 150 --files /home/renu_yadav/fmyapp/hive-site.xml --conf spark.yarn.preserve.staging.files=true --conf spark.shuffle.memoryFraction=0.6 --conf spark.storage.memoryFraction=0.1 --conf SPARK_SUBMIT_OPTS="-XX:MaxPermSize=768m" --conf SPARK_SUBMIT_OPTS="-XX:MaxPermSize=768m" --conf spark.akka.timeout=40 --conf spark.locality.wait=10 --conf spark.yarn.executor.memoryOverhead=8000 --conf SPARK_JAVA_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps" --conf spark.reducer.maxMbInFlight=96 --conf spark.shuffle.file.buffer.kb=64 --conf spark.core.connection.ack.wait.timeout=120 --jars /usr/hdp/2.2.6.0-2800/hive/lib/datanucleus-api-jdo-3.2.6.jar,/usr/hdp/2.2.6.0-2800/hive/lib/datanucleus-core-3.2.10.jar,/usr/hdp/2.2.6.0-2800/hive/lib/datanucleus-rdbms-3.2.9.jar myapp_2.10-1.0.jar

Cluster configuration:
20 nodes
32 cores per node
125 GB RAM per node

Please help.

Thanks & Regards,
Renu Yadav
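For reference: missing temp_shuffle files while "reverting partial writes" usually accompany executors dying mid-shuffle, and groupByKey on 1.3 TB is a likely trigger because it materializes every group in memory before the sort. Since the job only needs the most recent record per key, a reduceByKey can replace the groupByKey-then-sort entirely. A minimal sketch, where Event and its serverTs field are hypothetical stand-ins for the job's actual value type:

import org.apache.spark.rdd.RDD

case class Event(serverTs: Long, payload: String)

def mostRecentPerKey(keyed: RDD[(String, Event)]): RDD[(String, Event)] =
  // reduceByKey combines map-side, so at most one record per key per
  // partition crosses the shuffle, instead of whole groups as with
  // groupByKey followed by a sort.
  keyed.reduceByKey((a, b) => if (a.serverTs >= b.serverTs) a else b)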
Re: Spark Job Failed (Executor Lost & then FS closed)
Can you look more into the worker logs and see what's going on? It looks like a memory issue (GC overhead etc.; you need to look in the worker logs).

Thanks
Best Regards

On Fri, Aug 7, 2015 at 3:21 AM, ÐΞ€ρ@Ҝ (๏̯͡๏) wrote:
> Re-attaching the images.
>
> On Thu, Aug 6, 2015 at 2:50 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) wrote:
>
>> Code:
>> import java.text.SimpleDateFormat
>> import java.util.Calendar
>> import java.sql.Date
>> import org.apache.spark.storage.StorageLevel
>>
>> def extract(array: Array[String], index: Integer) = {
>>   if (index < array.length) {
>>     array(index).replaceAll("\"", "")
>>   } else {
>>     ""
>>   }
>> }
>>
>> case class GuidSess(
>>   guid: String,
>>   sessionKey: String,
>>   sessionStartDate: String,
>>   siteId: String,
>>   eventCount: String,
>>   browser: String,
>>   browserVersion: String,
>>   operatingSystem: String,
>>   experimentChannel: String,
>>   deviceName: String)
>>
>> val rowStructText = sc.textFile("/user/zeppelin/guidsess/2015/08/05/part-m-1.gz")
>> val guidSessRDD = rowStructText.filter(s => s.length != 1).map(s => s.split(",")).map {
>>   s =>
>>     GuidSess(extract(s, 0), extract(s, 1), extract(s, 2), extract(s, 3), extract(s, 4),
>>       extract(s, 5), extract(s, 6), extract(s, 7), extract(s, 8), extract(s, 9))
>> }
>>
>> val guidSessDF = guidSessRDD.toDF()
>> guidSessDF.registerTempTable("guidsess")
>>
>> Once the temp table is created, I wrote this query:
>>
>> select siteid, count(distinct guid) total_visitor, count(sessionKey) as total_visits
>> from guidsess
>> group by siteid
>>
>> Metrics:
>> Data size: 170 MB
>> Spark version: 1.3.1
>> YARN: 2.7.x
>>
>> Timeline: there is 1 job, with 2 stages of 1 task each.
>>
>> 1st stage: mapPartitions
>> [image: Inline image 1]
>>
>> The 1st task of the 1st stage started to fail, and a second attempt started. The first attempt failed with "Executor LOST", but when I go to the YARN ResourceManager and look at that particular host, I see it is running fine.
>>
>> Attempt #1
>> [image: Inline image 2]
>>
>> Attempt #2: executor LOST again
>> [image: Inline image 3]
>>
>> Attempts 3 & 4
>> [image: Inline image 4]
>>
>> 2nd stage runJob: SKIPPED
>> [image: Inline image 5]
>>
>> Any suggestions?
>>
>> --
>> Deepak
Spark Job Failed (Executor Lost & then FS closed)
Code:

import java.text.SimpleDateFormat
import java.util.Calendar
import java.sql.Date
import org.apache.spark.storage.StorageLevel

def extract(array: Array[String], index: Integer) = {
  if (index < array.length) {
    array(index).replaceAll("\"", "")
  } else {
    ""
  }
}

case class GuidSess(
  guid: String,
  sessionKey: String,
  sessionStartDate: String,
  siteId: String,
  eventCount: String,
  browser: String,
  browserVersion: String,
  operatingSystem: String,
  experimentChannel: String,
  deviceName: String)

val rowStructText = sc.textFile("/user/zeppelin/guidsess/2015/08/05/part-m-1.gz")
val guidSessRDD = rowStructText.filter(s => s.length != 1).map(s => s.split(",")).map {
  s =>
    GuidSess(extract(s, 0), extract(s, 1), extract(s, 2), extract(s, 3), extract(s, 4),
      extract(s, 5), extract(s, 6), extract(s, 7), extract(s, 8), extract(s, 9))
}

val guidSessDF = guidSessRDD.toDF()
guidSessDF.registerTempTable("guidsess")

Once the temp table is created, I wrote this query:

select siteid, count(distinct guid) total_visitor, count(sessionKey) as total_visits
from guidsess
group by siteid

Metrics:
Data size: 170 MB
Spark version: 1.3.1
YARN: 2.7.x

Timeline: there is 1 job, with 2 stages of 1 task each.

1st stage: mapPartitions
[image: Inline image 1]

The 1st task of the 1st stage started to fail, and a second attempt started. The first attempt failed with "Executor LOST", but when I go to the YARN ResourceManager and look at that particular host, I see it is running fine.

Attempt #1
[image: Inline image 2]

Attempt #2: executor LOST again
[image: Inline image 3]

Attempts 3 & 4
[image: Inline image 4]

2nd stage runJob: SKIPPED
[image: Inline image 5]

Any suggestions?

--
Deepak
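For reference: part-m-1.gz is a single gzip file, and gzip is not splittable, so sc.textFile reads it as one partition and all 170 MB are parsed and aggregated by a single task, which can push one executor hard enough to be lost. A minimal sketch of spreading the work after the read; the partition count is an arbitrary illustrative value to tune, not a recommendation from this thread:

// Repartition right after reading the non-splittable .gz file so the
// parsing/aggregation downstream runs across executors instead of in
// the single task that decompressed the file.
val rowStructText = sc
  .textFile("/user/zeppelin/guidsess/2015/08/05/part-m-1.gz")
  .repartition(40)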
Re: Spark Job Failed - Class not serializable
You'll definitely want to use a Kryo-based serializer for Avro. We have a Kryo-based serializer that wraps Avro's efficient serializer here.

Frank Austin Nothaft
fnoth...@berkeley.edu
fnoth...@eecs.berkeley.edu
202-340-0466

On Apr 3, 2015, at 5:41 AM, Akhil Das wrote:

> Because it's throwing serialization exceptions, and Kryo is a serializer to serialize your objects.
>
> Thanks
> Best Regards
>
> On Fri, Apr 3, 2015 at 5:37 PM, Deepak Jain wrote:
> I meant that I did not have to use Kryo before. Why will Kryo help fix this issue now?
>
> Sent from my iPhone
>
> On 03-Apr-2015, at 5:36 pm, Deepak Jain wrote:
>
>> I was able to write a record that extends SpecificRecord (Avro); that class was not auto-generated. Do we need to do something extra for auto-generated classes?
>>
>> Sent from my iPhone
>>
>> On 03-Apr-2015, at 5:06 pm, Akhil Das wrote:
>>
>>> This thread might give you some insights:
>>> http://mail-archives.apache.org/mod_mbox/incubator-spark-user/201311.mbox/%3CCA+WVT8WXbEHac=N0GWxj-s9gqOkgG0VRL5B=ovjwexqm8ev...@mail.gmail.com%3E
>>>
>>> Thanks
>>> Best Regards
>>>
>>> On Fri, Apr 3, 2015 at 3:53 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) wrote:
>>> My Spark Job failed with
>>>
>>> 15/04/03 03:15:36 INFO scheduler.DAGScheduler: Job 0 failed: saveAsNewAPIHadoopFile at AbstractInputHelper.scala:103, took 2.480175 s
>>> 15/04/03 03:15:36 ERROR yarn.ApplicationMaster: User class threw exception: Job aborted due to stage failure: Task 0.0 in stage 2.0 (TID 0) had a not serializable result: com.ebay.ep.poc.spark.reporting.process.model.dw.SpsLevelMetricSum
>>> Serialization stack:
>>> - object not serializable (class: com.ebay.ep.poc.spark.reporting.process.model.dw.SpsLevelMetricSum, value: {"userId": 0, "spsPrgrmId": 0, "spsSlrLevelCd": 0, "spsSlrLevelSumStartDt": null, "spsSlrLevelSumEndDt": null, "currPsLvlId": null})
>>> - field (class: scala.Tuple2, name: _2, type: class java.lang.Object)
>>> - object (class scala.Tuple2, (0,{"userId": 0, "spsPrgrmId": 0, "spsSlrLevelCd": 0, "spsSlrLevelSumStartDt": null, "spsSlrLevelSumEndDt": null, "currPsLvlId": null}))
>>> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0.0 in stage 2.0 (TID 0) had a not serializable result: com.ebay.ep.poc.spark.reporting.process.model.dw.SpsLevelMetricSum
>>> Serialization stack:
>>> - object not serializable (class: com.ebay.ep.poc.spark.reporting.process.model.dw.SpsLevelMetricSum, value: {"userId": 0, "spsPrgrmId": 0, "spsSlrLevelCd": 0, "spsSlrLevelSumStartDt": null, "spsSlrLevelSumEndDt": null, "currPsLvlId": null})
>>> - field (class: scala.Tuple2, name: _2, type: class java.lang.Object)
>>> - object (class scala.Tuple2, (0,{"userId": 0, "spsPrgrmId": 0, "spsSlrLevelCd": 0, "spsSlrLevelSumStartDt": null, "spsSlrLevelSumEndDt": null, "currPsLvlId": null}))
>>> at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1203)
>>> at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1192)
>>> at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1191)
>>> at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>>> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>>> at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1191)
>>> at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
>>> at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
>>> at scala.Option.foreach(Option.scala:236)
>>> at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:693)
>>> at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1393)
>>> at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1354)
>>> at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
>>> 15/04/03 03:15:36 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User class threw exception: Job aborted due to stage failure: Task 0.0 in stage 2.0 (TID 0) had a not serializable result: com.ebay.ep.poc.spark.reporting.process.model.dw.SpsLevelMetricSum
>>> Serialization stack:
>>>
>>> com.ebay.ep.poc.spark.reporting.process.model.dw.SpsLevelMetricSum is auto-generated from an Avro schema using the avro-generate-sources Maven plugin.
>>>
>>> package com.ebay.ep.poc.spark.reporting.process.model.dw;
>>>
>>> @SuppressWarnings("all")
>>> @org.apache.avro.specific.AvroGenerated
>>> public class SpsLevelMetricSum extends org.apache.avro.specific.SpecificRecordBase implements org.apache.avro.specific.SpecificRecord {
>>> ...
>>> ...
>>> }
>>>
>>> Can anyone suggest how to fix this?
>>>
>>> --
>>> Deepak
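For reference, a minimal sketch of switching a job to Kryo and registering the generated Avro class. This uses Spark's generic Kryo registration, not the wrapper serializer Frank mentions (whose API isn't shown in this thread); complex Avro types can still trip up generic Kryo, which is why a dedicated Avro-aware Kryo serializer is preferable:

import org.apache.spark.SparkConf
import com.ebay.ep.poc.spark.reporting.process.model.dw.SpsLevelMetricSum

val conf = new SparkConf()
  // Kryo replaces Java serialization for shuffled data and task results,
  // so objects no longer need to implement java.io.Serializable.
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  // Registering the class lets Kryo write a compact class id instead of
  // the full class name with every record.
  .registerKryoClasses(Array(classOf[SpsLevelMetricSum]))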
Re: Spark Job Failed - Class not serializable
Because it's throwing serialization exceptions, and Kryo is a serializer to serialize your objects.

Thanks
Best Regards

On Fri, Apr 3, 2015 at 5:37 PM, Deepak Jain wrote:

> I meant that I did not have to use Kryo before. Why will Kryo help fix this issue now?
>
> Sent from my iPhone
>
> On 03-Apr-2015, at 5:36 pm, Deepak Jain wrote:
>
> I was able to write a record that extends SpecificRecord (Avro); that class was not auto-generated. Do we need to do something extra for auto-generated classes?
>
> Sent from my iPhone
>
> On 03-Apr-2015, at 5:06 pm, Akhil Das wrote:
>
> This thread might give you some insights:
> http://mail-archives.apache.org/mod_mbox/incubator-spark-user/201311.mbox/%3CCA+WVT8WXbEHac=N0GWxj-s9gqOkgG0VRL5B=ovjwexqm8ev...@mail.gmail.com%3E
>
> Thanks
> Best Regards
>
> On Fri, Apr 3, 2015 at 3:53 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) wrote:
>
>> My Spark Job failed with
>>
>> 15/04/03 03:15:36 INFO scheduler.DAGScheduler: Job 0 failed: saveAsNewAPIHadoopFile at AbstractInputHelper.scala:103, took 2.480175 s
>> 15/04/03 03:15:36 ERROR yarn.ApplicationMaster: User class threw exception: Job aborted due to stage failure: Task 0.0 in stage 2.0 (TID 0) had a not serializable result: com.ebay.ep.poc.spark.reporting.process.model.dw.SpsLevelMetricSum
>> Serialization stack:
>> - object not serializable (class: com.ebay.ep.poc.spark.reporting.process.model.dw.SpsLevelMetricSum, value: {"userId": 0, "spsPrgrmId": 0, "spsSlrLevelCd": 0, "spsSlrLevelSumStartDt": null, "spsSlrLevelSumEndDt": null, "currPsLvlId": null})
>> - field (class: scala.Tuple2, name: _2, type: class java.lang.Object)
>> - object (class scala.Tuple2, (0,{"userId": 0, "spsPrgrmId": 0, "spsSlrLevelCd": 0, "spsSlrLevelSumStartDt": null, "spsSlrLevelSumEndDt": null, "currPsLvlId": null}))
>> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0.0 in stage 2.0 (TID 0) had a not serializable result: com.ebay.ep.poc.spark.reporting.process.model.dw.SpsLevelMetricSum
>> Serialization stack:
>> - object not serializable (class: com.ebay.ep.poc.spark.reporting.process.model.dw.SpsLevelMetricSum, value: {"userId": 0, "spsPrgrmId": 0, "spsSlrLevelCd": 0, "spsSlrLevelSumStartDt": null, "spsSlrLevelSumEndDt": null, "currPsLvlId": null})
>> - field (class: scala.Tuple2, name: _2, type: class java.lang.Object)
>> - object (class scala.Tuple2, (0,{"userId": 0, "spsPrgrmId": 0, "spsSlrLevelCd": 0, "spsSlrLevelSumStartDt": null, "spsSlrLevelSumEndDt": null, "currPsLvlId": null}))
>> at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1203)
>> at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1192)
>> at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1191)
>> at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>> at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1191)
>> at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
>> at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
>> at scala.Option.foreach(Option.scala:236)
>> at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:693)
>> at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1393)
>> at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1354)
>> at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
>> 15/04/03 03:15:36 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User class threw exception: Job aborted due to stage failure: Task 0.0 in stage 2.0 (TID 0) had a not serializable result: com.ebay.ep.poc.spark.reporting.process.model.dw.SpsLevelMetricSum
>> Serialization stack:
>>
>> com.ebay.ep.poc.spark.reporting.process.model.dw.SpsLevelMetricSum is auto-generated from an Avro schema using the avro-generate-sources Maven plugin.
>>
>> package com.ebay.ep.poc.spark.reporting.process.model.dw;
>>
>> @SuppressWarnings("all")
>> @org.apache.avro.specific.AvroGenerated
>> public class SpsLevelMetricSum extends org.apache.avro.specific.SpecificRecordBase implements org.apache.avro.specific.SpecificRecord {
>> ...
>> ...
>> }
>>
>> Can anyone suggest how to fix this?
>>
>> --
>> Deepak
Re: Spark Job Failed - Class not serializable
I meant that I did not have to use Kryo before. Why will Kryo help fix this issue now?

Sent from my iPhone

> On 03-Apr-2015, at 5:36 pm, Deepak Jain wrote:
>
> I was able to write a record that extends SpecificRecord (Avro); that class was not auto-generated. Do we need to do something extra for auto-generated classes?
>
> Sent from my iPhone
>
>> On 03-Apr-2015, at 5:06 pm, Akhil Das wrote:
>>
>> This thread might give you some insights:
>> http://mail-archives.apache.org/mod_mbox/incubator-spark-user/201311.mbox/%3CCA+WVT8WXbEHac=N0GWxj-s9gqOkgG0VRL5B=ovjwexqm8ev...@mail.gmail.com%3E
>>
>> Thanks
>> Best Regards
>>
>>> On Fri, Apr 3, 2015 at 3:53 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) wrote:
>>> My Spark Job failed with
>>>
>>> 15/04/03 03:15:36 INFO scheduler.DAGScheduler: Job 0 failed: saveAsNewAPIHadoopFile at AbstractInputHelper.scala:103, took 2.480175 s
>>> 15/04/03 03:15:36 ERROR yarn.ApplicationMaster: User class threw exception: Job aborted due to stage failure: Task 0.0 in stage 2.0 (TID 0) had a not serializable result: com.ebay.ep.poc.spark.reporting.process.model.dw.SpsLevelMetricSum
>>> Serialization stack:
>>> - object not serializable (class: com.ebay.ep.poc.spark.reporting.process.model.dw.SpsLevelMetricSum, value: {"userId": 0, "spsPrgrmId": 0, "spsSlrLevelCd": 0, "spsSlrLevelSumStartDt": null, "spsSlrLevelSumEndDt": null, "currPsLvlId": null})
>>> - field (class: scala.Tuple2, name: _2, type: class java.lang.Object)
>>> - object (class scala.Tuple2, (0,{"userId": 0, "spsPrgrmId": 0, "spsSlrLevelCd": 0, "spsSlrLevelSumStartDt": null, "spsSlrLevelSumEndDt": null, "currPsLvlId": null}))
>>> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0.0 in stage 2.0 (TID 0) had a not serializable result: com.ebay.ep.poc.spark.reporting.process.model.dw.SpsLevelMetricSum
>>> Serialization stack:
>>> - object not serializable (class: com.ebay.ep.poc.spark.reporting.process.model.dw.SpsLevelMetricSum, value: {"userId": 0, "spsPrgrmId": 0, "spsSlrLevelCd": 0, "spsSlrLevelSumStartDt": null, "spsSlrLevelSumEndDt": null, "currPsLvlId": null})
>>> - field (class: scala.Tuple2, name: _2, type: class java.lang.Object)
>>> - object (class scala.Tuple2, (0,{"userId": 0, "spsPrgrmId": 0, "spsSlrLevelCd": 0, "spsSlrLevelSumStartDt": null, "spsSlrLevelSumEndDt": null, "currPsLvlId": null}))
>>> at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1203)
>>> at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1192)
>>> at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1191)
>>> at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>>> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>>> at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1191)
>>> at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
>>> at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
>>> at scala.Option.foreach(Option.scala:236)
>>> at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:693)
>>> at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1393)
>>> at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1354)
>>> at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
>>> 15/04/03 03:15:36 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User class threw exception: Job aborted due to stage failure: Task 0.0 in stage 2.0 (TID 0) had a not serializable result: com.ebay.ep.poc.spark.reporting.process.model.dw.SpsLevelMetricSum
>>> Serialization stack:
>>>
>>> com.ebay.ep.poc.spark.reporting.process.model.dw.SpsLevelMetricSum is auto-generated from an Avro schema using the avro-generate-sources Maven plugin.
>>>
>>> package com.ebay.ep.poc.spark.reporting.process.model.dw;
>>>
>>> @SuppressWarnings("all")
>>> @org.apache.avro.specific.AvroGenerated
>>> public class SpsLevelMetricSum extends org.apache.avro.specific.SpecificRecordBase implements org.apache.avro.specific.SpecificRecord {
>>>
>>> ...
>>> ...
>>> }
>>>
>>> Can anyone suggest how to fix this?
>>>
>>> --
>>> Deepak
Re: Spark Job Failed - Class not serializable
I was able to write a record that extends SpecificRecord (Avro); that class was not auto-generated. Do we need to do something extra for auto-generated classes?

Sent from my iPhone

> On 03-Apr-2015, at 5:06 pm, Akhil Das wrote:
>
> This thread might give you some insights:
> http://mail-archives.apache.org/mod_mbox/incubator-spark-user/201311.mbox/%3CCA+WVT8WXbEHac=N0GWxj-s9gqOkgG0VRL5B=ovjwexqm8ev...@mail.gmail.com%3E
>
> Thanks
> Best Regards
>
>> On Fri, Apr 3, 2015 at 3:53 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) wrote:
>> My Spark Job failed with
>>
>> 15/04/03 03:15:36 INFO scheduler.DAGScheduler: Job 0 failed: saveAsNewAPIHadoopFile at AbstractInputHelper.scala:103, took 2.480175 s
>> 15/04/03 03:15:36 ERROR yarn.ApplicationMaster: User class threw exception: Job aborted due to stage failure: Task 0.0 in stage 2.0 (TID 0) had a not serializable result: com.ebay.ep.poc.spark.reporting.process.model.dw.SpsLevelMetricSum
>> Serialization stack:
>> - object not serializable (class: com.ebay.ep.poc.spark.reporting.process.model.dw.SpsLevelMetricSum, value: {"userId": 0, "spsPrgrmId": 0, "spsSlrLevelCd": 0, "spsSlrLevelSumStartDt": null, "spsSlrLevelSumEndDt": null, "currPsLvlId": null})
>> - field (class: scala.Tuple2, name: _2, type: class java.lang.Object)
>> - object (class scala.Tuple2, (0,{"userId": 0, "spsPrgrmId": 0, "spsSlrLevelCd": 0, "spsSlrLevelSumStartDt": null, "spsSlrLevelSumEndDt": null, "currPsLvlId": null}))
>> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0.0 in stage 2.0 (TID 0) had a not serializable result: com.ebay.ep.poc.spark.reporting.process.model.dw.SpsLevelMetricSum
>> Serialization stack:
>> - object not serializable (class: com.ebay.ep.poc.spark.reporting.process.model.dw.SpsLevelMetricSum, value: {"userId": 0, "spsPrgrmId": 0, "spsSlrLevelCd": 0, "spsSlrLevelSumStartDt": null, "spsSlrLevelSumEndDt": null, "currPsLvlId": null})
>> - field (class: scala.Tuple2, name: _2, type: class java.lang.Object)
>> - object (class scala.Tuple2, (0,{"userId": 0, "spsPrgrmId": 0, "spsSlrLevelCd": 0, "spsSlrLevelSumStartDt": null, "spsSlrLevelSumEndDt": null, "currPsLvlId": null}))
>> at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1203)
>> at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1192)
>> at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1191)
>> at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>> at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1191)
>> at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
>> at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
>> at scala.Option.foreach(Option.scala:236)
>> at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:693)
>> at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1393)
>> at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1354)
>> at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
>> 15/04/03 03:15:36 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User class threw exception: Job aborted due to stage failure: Task 0.0 in stage 2.0 (TID 0) had a not serializable result: com.ebay.ep.poc.spark.reporting.process.model.dw.SpsLevelMetricSum
>> Serialization stack:
>>
>> com.ebay.ep.poc.spark.reporting.process.model.dw.SpsLevelMetricSum is auto-generated from an Avro schema using the avro-generate-sources Maven plugin.
>>
>> package com.ebay.ep.poc.spark.reporting.process.model.dw;
>>
>> @SuppressWarnings("all")
>> @org.apache.avro.specific.AvroGenerated
>> public class SpsLevelMetricSum extends org.apache.avro.specific.SpecificRecordBase implements org.apache.avro.specific.SpecificRecord {
>>
>> ...
>> ...
>> }
>>
>> Can anyone suggest how to fix this?
>>
>> --
>> Deepak
Re: Spark Job Failed - Class not serializable
This thread might give you some insights:
http://mail-archives.apache.org/mod_mbox/incubator-spark-user/201311.mbox/%3CCA+WVT8WXbEHac=N0GWxj-s9gqOkgG0VRL5B=ovjwexqm8ev...@mail.gmail.com%3E

Thanks
Best Regards

On Fri, Apr 3, 2015 at 3:53 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) wrote:

> My Spark Job failed with
>
> 15/04/03 03:15:36 INFO scheduler.DAGScheduler: Job 0 failed: saveAsNewAPIHadoopFile at AbstractInputHelper.scala:103, took 2.480175 s
> 15/04/03 03:15:36 ERROR yarn.ApplicationMaster: User class threw exception: Job aborted due to stage failure: Task 0.0 in stage 2.0 (TID 0) had a not serializable result: com.ebay.ep.poc.spark.reporting.process.model.dw.SpsLevelMetricSum
> Serialization stack:
> - object not serializable (class: com.ebay.ep.poc.spark.reporting.process.model.dw.SpsLevelMetricSum, value: {"userId": 0, "spsPrgrmId": 0, "spsSlrLevelCd": 0, "spsSlrLevelSumStartDt": null, "spsSlrLevelSumEndDt": null, "currPsLvlId": null})
> - field (class: scala.Tuple2, name: _2, type: class java.lang.Object)
> - object (class scala.Tuple2, (0,{"userId": 0, "spsPrgrmId": 0, "spsSlrLevelCd": 0, "spsSlrLevelSumStartDt": null, "spsSlrLevelSumEndDt": null, "currPsLvlId": null}))
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0.0 in stage 2.0 (TID 0) had a not serializable result: com.ebay.ep.poc.spark.reporting.process.model.dw.SpsLevelMetricSum
> Serialization stack:
> - object not serializable (class: com.ebay.ep.poc.spark.reporting.process.model.dw.SpsLevelMetricSum, value: {"userId": 0, "spsPrgrmId": 0, "spsSlrLevelCd": 0, "spsSlrLevelSumStartDt": null, "spsSlrLevelSumEndDt": null, "currPsLvlId": null})
> - field (class: scala.Tuple2, name: _2, type: class java.lang.Object)
> - object (class scala.Tuple2, (0,{"userId": 0, "spsPrgrmId": 0, "spsSlrLevelCd": 0, "spsSlrLevelSumStartDt": null, "spsSlrLevelSumEndDt": null, "currPsLvlId": null}))
> at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1203)
> at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1192)
> at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1191)
> at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
> at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1191)
> at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
> at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
> at scala.Option.foreach(Option.scala:236)
> at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:693)
> at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1393)
> at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1354)
> at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
> 15/04/03 03:15:36 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User class threw exception: Job aborted due to stage failure: Task 0.0 in stage 2.0 (TID 0) had a not serializable result: com.ebay.ep.poc.spark.reporting.process.model.dw.SpsLevelMetricSum
> Serialization stack:
>
> com.ebay.ep.poc.spark.reporting.process.model.dw.SpsLevelMetricSum is auto-generated from an Avro schema using the avro-generate-sources Maven plugin.
>
> package com.ebay.ep.poc.spark.reporting.process.model.dw;
>
> @SuppressWarnings("all")
> @org.apache.avro.specific.AvroGenerated
> public class SpsLevelMetricSum extends org.apache.avro.specific.SpecificRecordBase implements org.apache.avro.specific.SpecificRecord {
> ...
> ...
> }
>
> Can anyone suggest how to fix this?
>
> --
> Deepak
Spark Job Failed - Class not serializable
My Spark Job failed with

15/04/03 03:15:36 INFO scheduler.DAGScheduler: Job 0 failed: saveAsNewAPIHadoopFile at AbstractInputHelper.scala:103, took 2.480175 s
15/04/03 03:15:36 ERROR yarn.ApplicationMaster: User class threw exception: Job aborted due to stage failure: Task 0.0 in stage 2.0 (TID 0) had a not serializable result: com.ebay.ep.poc.spark.reporting.process.model.dw.SpsLevelMetricSum
Serialization stack:
- object not serializable (class: com.ebay.ep.poc.spark.reporting.process.model.dw.SpsLevelMetricSum, value: {"userId": 0, "spsPrgrmId": 0, "spsSlrLevelCd": 0, "spsSlrLevelSumStartDt": null, "spsSlrLevelSumEndDt": null, "currPsLvlId": null})
- field (class: scala.Tuple2, name: _2, type: class java.lang.Object)
- object (class scala.Tuple2, (0,{"userId": 0, "spsPrgrmId": 0, "spsSlrLevelCd": 0, "spsSlrLevelSumStartDt": null, "spsSlrLevelSumEndDt": null, "currPsLvlId": null}))
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0.0 in stage 2.0 (TID 0) had a not serializable result: com.ebay.ep.poc.spark.reporting.process.model.dw.SpsLevelMetricSum
Serialization stack:
- object not serializable (class: com.ebay.ep.poc.spark.reporting.process.model.dw.SpsLevelMetricSum, value: {"userId": 0, "spsPrgrmId": 0, "spsSlrLevelCd": 0, "spsSlrLevelSumStartDt": null, "spsSlrLevelSumEndDt": null, "currPsLvlId": null})
- field (class: scala.Tuple2, name: _2, type: class java.lang.Object)
- object (class scala.Tuple2, (0,{"userId": 0, "spsPrgrmId": 0, "spsSlrLevelCd": 0, "spsSlrLevelSumStartDt": null, "spsSlrLevelSumEndDt": null, "currPsLvlId": null}))
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1203)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1192)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1191)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1191)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
at scala.Option.foreach(Option.scala:236)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:693)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1393)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1354)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
15/04/03 03:15:36 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User class threw exception: Job aborted due to stage failure: Task 0.0 in stage 2.0 (TID 0) had a not serializable result: com.ebay.ep.poc.spark.reporting.process.model.dw.SpsLevelMetricSum
Serialization stack:

com.ebay.ep.poc.spark.reporting.process.model.dw.SpsLevelMetricSum is auto-generated from an Avro schema using the avro-generate-sources Maven plugin.

package com.ebay.ep.poc.spark.reporting.process.model.dw;

@SuppressWarnings("all")
@org.apache.avro.specific.AvroGenerated
public class SpsLevelMetricSum extends org.apache.avro.specific.SpecificRecordBase implements org.apache.avro.specific.SpecificRecord {
...
...
}

Can anyone suggest how to fix this?

--
Deepak
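For reference: generated Avro SpecificRecord classes are not java.io.Serializable, so they fail with Spark's default Java serializer whenever they are returned as a task result or shuffled. Besides the Kryo route discussed above, a sketch of the other common workaround is to map each Avro record into a plain Scala case class (which is serializable) before it leaves the stage. The RDD name and getter names below are hypothetical stand-ins for the job's actual types:

// Hypothetical serializable projection of the Avro record.
case class SpsLevelMetric(userId: Int, spsPrgrmId: Int, spsSlrLevelCd: Int)

// avroPairs: RDD[(Int, SpsLevelMetricSum)] produced earlier in the job
// (hypothetical); convert before saveAsNewAPIHadoopFile or any shuffle.
val plainPairs = avroPairs.map { case (key, rec) =>
  (key, SpsLevelMetric(rec.getUserId, rec.getSpsPrgrmId, rec.getSpsSlrLevelCd))
}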