I am running a job, part of which is to add some "null" values to the rows of a SchemaRDD. The job fails with "Total size of serialized results of 2692 tasks (1024.1 MB) is bigger than spark.driver.maxResultSize (1024.0 MB)".
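
(For what it's worth, I know I could just raise the limit, e.g. with something like the untested sketch below, but I'd rather understand why the results are so large in the first place. spark.driver.maxResultSize is the actual config key; the app name and the "2g" value are just placeholders.)

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("null-padding-job")                // hypothetical name
      .set("spark.driver.maxResultSize", "2g")       // default cap is 1g
    val sc = new SparkContext(conf)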
This is the code:

    val in = sqc.parquetFile(...)
    ..
    val presentColProj: SchemaRDD = in.select(symbolList : _*)
    // one null per missing column, shared via a broadcast variable
    val nullSeq: Broadcast[Seq[_]] = sc.broadcast(Seq.fill(missingColNames.size)(null))
    val nullPaddedProj: RDD[Row] = presentColProj.map { row =>
      // append the nulls to each row's existing values
      Row.fromSeq(Row.unapplySeq(row).get ++ nullSeq.value)
    }
    ..
    sqc.applySchema(nullPaddedProj, newSchema)

I believe it is failing on the map. Is the serialized result size so large because of the Rows produced in the map? Is there a better way to add null columns to a SchemaRDD? Any insight would be appreciated.
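
One alternative I have been considering, in case it helps frame the question: register the SchemaRDD as a temp table and pad the null columns in SQL instead of mapping over Rows. This is an untested sketch; it assumes the missing columns are strings, and "present_cols" / "extra_col" are made-up names standing in for my real schema.

    // sqc: SQLContext, in: SchemaRDD, as above
    in.registerTempTable("present_cols")
    // CAST(NULL AS STRING) gives the null column an explicit type;
    // repeat one CAST ... AS ... per missing column.
    val padded: SchemaRDD = sqc.sql(
      "SELECT *, CAST(NULL AS STRING) AS extra_col FROM present_cols")

My hope is that keeping the padding inside Spark SQL would avoid building the intermediate RDD[Row] entirely, but I don't know whether it changes the task result sizes.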