I am running a job that, as one of its steps, appends "null" values to the
rows of a SchemaRDD. The job fails with "Total size of serialized results of
2692 tasks (1024.1 MB) is bigger than spark.driver.maxResultSize (1024.0 MB)".
This is the code:
    val in = sqc.parquetFile(...)
    ..
    val presentColProj: SchemaRDD = in.select(symbolList : _*)
    val nullSeq: Broadcast[Seq[_]] =
      sc.broadcast(Seq.fill(missingColNames.size)(null))
    val nullPaddedProj: RDD[Row] = presentColProj.map { row =>
      Row.fromSeq(Row.unapplySeq(row).get ++ nullSeq.value)
    }
    ..
    sqc.applySchema(nullPaddedProj, newSchema)
I believe it is failing on the map. Is the serialized result large because of
the rows produced in the map? Is there a better way to add null columns to a
SchemaRDD? Any insight would be appreciated.
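
To make the intent concrete, here is a minimal sketch of what the map is meant
to do to each row, written against plain Scala sequences rather than Spark
rows so it runs on its own. The helper name padWithNulls, the column names and
the sample values are made up for illustration and are not taken from the job.

```scala
// Plain-Scala sketch of the per-row padding; no Spark dependency.
// padWithNulls and all sample values below are hypothetical.
object NullPadSketch {
  // Append one null per missing column to the values of a single row.
  def padWithNulls(values: Seq[Any], missingCount: Int): Seq[Any] =
    values ++ Seq.fill(missingCount)(null)

  def main(args: Array[String]): Unit = {
    val missingColNames = Seq("colC", "colD") // hypothetical column names
    val row: Seq[Any] = Seq("AAPL", 101.5)    // hypothetical row values

    val padded = padWithNulls(row, missingColNames.size)

    // The padded row keeps the original values and gains one null per
    // missing column, all at the end.
    assert(padded.length == row.length + missingColNames.size)
    assert(padded.take(row.length) == row)
    assert(padded.drop(row.length).forall(_ == null))
    println(padded)
  }
}
```

Each output row is therefore only the input row plus a few nulls, which is why
I am surprised that the results sent back to the driver are this large.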