Hi, I’m seeing strange, seemingly random errors when running unit tests for my Spark jobs. In this particular case I’m using Spark SQL to read and write Parquet files, and one error that keeps coming up is this:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 19 in stage 6.0 failed 1 times, most recent failure: Lost task 19.0 in stage 6.0 (TID 223, localhost): java.io.IOException: PARSING_ERROR(2)
    org.xerial.snappy.SnappyNative.throw_error(SnappyNative.java:78)
    org.xerial.snappy.SnappyNative.uncompressedLength(Native Method)
    org.xerial.snappy.Snappy.uncompressedLength(Snappy.java:545)

The only way I can prevent this is to use isolated Specs tests that each create a fresh SparkContext that is not shared between tests (and there can only be a single SparkContext per test anyway), and to use the standard SQLContext instead of HiveContext; a rough sketch of that setup is at the end of this mail, below the full trace. It does not seem to have anything to do with the actual files that I also create during the test run with SQLContext.saveAsParquetFile.

Cheers - Marius

PS: The full trace:

org.apache.spark.SparkException: Job aborted due to stage failure: Task 19 in stage 6.0 failed 1 times, most recent failure: Lost task 19.0 in stage 6.0 (TID 223, localhost): java.io.IOException: PARSING_ERROR(2)
    org.xerial.snappy.SnappyNative.throw_error(SnappyNative.java:78)
    org.xerial.snappy.SnappyNative.uncompressedLength(Native Method)
    org.xerial.snappy.Snappy.uncompressedLength(Snappy.java:545)
    org.xerial.snappy.SnappyInputStream.readFully(SnappyInputStream.java:125)
    org.xerial.snappy.SnappyInputStream.readHeader(SnappyInputStream.java:88)
    org.xerial.snappy.SnappyInputStream.<init>(SnappyInputStream.java:58)
    org.apache.spark.io.SnappyCompressionCodec.compressedInputStream(CompressionCodec.scala:128)
    org.apache.spark.broadcast.TorrentBroadcast$.unBlockifyObject(TorrentBroadcast.scala:232)
    org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readObject$1.apply$mcV$sp(TorrentBroadcast.scala:169)
    org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:927)
    org.apache.spark.broadcast.TorrentBroadcast.readObject(TorrentBroadcast.scala:155)
    sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
    sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    java.lang.reflect.Method.invoke(Method.java:606)
    java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
    java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
    java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
    java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
    java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
    java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
    java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
    java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
    java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
    org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:62)
    org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:87)
    org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:160)
    java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    java.lang.Thread.run(Thread.java:745)

Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1185) ~[spark-core_2.10-1.1.1.jar:1.1.1]
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1174) ~[spark-core_2.10-1.1.1.jar:1.1.1]
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1173) ~[spark-core_2.10-1.1.1.jar:1.1.1]
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) ~[scala-library.jar:na]
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) ~[scala-library.jar:na]
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1173) ~[spark-core_2.10-1.1.1.jar:1.1.1]
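PPS: In case it helps to see what I mean by "isolated", this is roughly the test setup that works for me. Just a sketch, assuming specs2 and Spark 1.1.x; the spec, record, and path names are made up:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.specs2.mutable.Specification

// Top-level case class so Spark SQL's reflection-based schema inference can find it
case class Record(id: Int, name: String)

class ParquetRoundTripSpec extends Specification {
  sequential // only one SparkContext may be live in the JVM at a time

  // Fresh context for this spec alone, never shared with other specs
  val sc = new SparkContext(
    new SparkConf().setMaster("local[2]").setAppName("ParquetRoundTripSpec"))
  // Plain SQLContext - with HiveContext the Snappy error comes back for me
  val sqlContext = new SQLContext(sc)
  import sqlContext.createSchemaRDD

  "Parquet round trip" should {
    "read back what saveAsParquetFile wrote" in {
      val path = s"target/parquet-test-${System.nanoTime()}" // made-up temp location
      sc.parallelize(Seq(Record(1, "a"), Record(2, "b"))).saveAsParquetFile(path)
      sqlContext.parquetFile(path).count() must_== 2
    }
  }

  // Tear the context down once the whole spec has run
  step(sc.stop())
}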