Why are task results large in this case?

ankits Wed, 04 Feb 2015 13:09:07 -0800

I am running a job, part of which adds some "null" values to the rows of
a SchemaRDD. The job fails with "Total size of serialized results of 2692
tasks (1024.1 MB) is bigger than spark.driver.maxResultSize (1024.0 MB)".
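
For context, the limit named in the error is the spark.driver.maxResultSize
setting. A minimal sketch of raising it follows, assuming a value of "2g" and
an application name that are both chosen only for illustration. This would
work around the error rather than explain why the results are large, and it
assumes the master is supplied by spark-submit.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object RaiseResultSizeLimit {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("null-padding-job")          // hypothetical application name
      .set("spark.driver.maxResultSize", "2g") // assumption: arbitrary illustrative value
    val sc = new SparkContext(conf)
    // ... run the job here ...
    sc.stop()
  }
}
```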


This is the code:

val in = sqc.parquetFile(...)
..
val presentColProj: SchemaRDD = in.select(symbolList : _*)

val nullSeq: Broadcast[Seq[_]] =
  sc.broadcast(Seq.fill(missingColNames.size)(null))

val nullPaddedProj: RDD[Row] = presentColProj.map { row =>
  Row.fromSeq(Row.unapplySeq(row).get ++ nullSeq.value)
}

..

sqc.applySchema(nullPaddedProj, newSchema)

I believe it is failing on the map. Is the serialized result large because
of the rows produced in the map? Is there a better way to add some null
columns to a SchemaRDD? Any insight would be appreciated.
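
To isolate what the map does to each row, here is a plain-Scala sketch of the
padding step with no Spark dependency. The helper name padWithNulls and the
sample values are hypothetical and only stand in for a row and for
missingColNames.size.

```scala
object NullPadding {
  // Appends `missingCount` nulls to the values of one row.
  // `padWithNulls` is a hypothetical helper, not a Spark API.
  def padWithNulls(row: Seq[Any], missingCount: Int): Seq[Any] =
    row ++ Seq.fill[Any](missingCount)(null)

  def main(args: Array[String]): Unit = {
    // Made-up sample row with two present columns and two missing ones.
    val padded = padWithNulls(Seq("AAPL", 101.5), 2)
    println(padded.length)
  }
}
```

Each output row is only missingCount values longer than its input row, so the
padding itself adds a fixed, small amount per row.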




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Why-are-task-results-large-in-this-case-tp21503.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
