Hi, I'm facing a very strange error that occurs halfway through long-running Spark SQL jobs:
18/01/12 22:14:30 ERROR Utils: Aborting task
java.io.EOFException: reached end of stream after reading 0 bytes; 96 bytes expected
    at org.spark_project.guava.io.ByteStreams.readFully(ByteStreams.java:735)
    at org.apache.spark.sql.execution.UnsafeRowSerializerInstance$$anon$2$$anon$3.next(UnsafeRowSerializer.scala:127)
    at org.apache.spark.sql.execution.UnsafeRowSerializerInstance$$anon$2$$anon$3.next(UnsafeRowSerializer.scala:110)
    at scala.collection.Iterator$$anon$12.next(Iterator.scala:444)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
    at org.apache.spark.util.CompletionIterator.next(CompletionIterator.scala:30)
    at org.apache.spark.InterruptibleIterator.next(InterruptibleIterator.scala:40)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.sort_addToSorter$(Unknown Source)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    (...)

Since I get this in several jobs, I wonder whether it might be a problem at the communication layer. Has anyone faced a similar problem?

It always happens in a job that shuffles about 200 GB and then reads it back in partitions of ~64 MB for a groupBy. What is also odd is that it only fails after more than 1000 partitions have been processed (16 cores on one node). I even tried changing the spark.shuffle.file.buffer setting, but that only seems to shift the point at which the error occurs. A rough sketch of the job shape is below my signature.

I would really appreciate some hints on what this could be, what to try or test, and how to debug it, as I feel pretty much blocked here.

Thanks in advance,
Fernando
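
For reference, the failing stage is roughly shaped like the sketch below. The paths, table, and column names are placeholders rather than the actual code; the partition count and buffer setting reflect what I described above (~200 GB shuffled into ~64 MB partitions):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder()
  .appName("groupBy-shuffle-repro")
  // The buffer size I experimented with; changing it only moves the failure point.
  .config("spark.shuffle.file.buffer", "1m")
  // ~200 GB shuffled into roughly 64 MB partitions -> ~3200 shuffle partitions.
  .config("spark.sql.shuffle.partitions", "3200")
  .getOrCreate()

// Hypothetical input path; the real input is ~200 GB.
val events = spark.read.parquet("/data/events")

val aggregated = events
  .groupBy("customer_id")                          // triggers the ~200 GB shuffle
  .agg(count("*").as("n_events"), sum("amount").as("total_amount"))

// Hypothetical output path; the task aborts partway through this stage.
aggregated.write.parquet("/data/aggregated")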