Matt Cheah created SPARK-6044:
---------------------------------

             Summary: RDD.aggregate() should not use the closure serializer on the zero value
                 Key: SPARK-6044
                 URL: https://issues.apache.org/jira/browse/SPARK-6044
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 1.3.0
            Reporter: Matt Cheah
             Fix For: 1.4.0
PairRDDFunctions.aggregateByKey() correctly uses SparkEnv.get.serializer.newInstance() to serialize the zero value. It seems this logic is not mirrored in RDD.aggregate(), which computes the aggregation and returns the result directly at the driver. We should change RDD.aggregate() to make this consistent; I ran into serialization errors because I was expecting RDD.aggregate() to Kryo-serialize the zero value.
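As an illustration, here is a minimal sketch of the zero-value round-trip pattern that aggregateByKey() already uses and that RDD.aggregate() could adopt. The helper name cloneZeroValue is hypothetical and not part of Spark's API; this is a sketch of the technique, not the actual patch.

{code:scala}
import java.nio.ByteBuffer

import scala.reflect.ClassTag

import org.apache.spark.SparkEnv

object ZeroValueCloning {
  // Round-trip the zero value through the configured SparkEnv serializer
  // (e.g. Kryo when spark.serializer is set to KryoSerializer), mirroring
  // the pattern in PairRDDFunctions.aggregateByKey(). RDD.aggregate() could
  // clone its zero value this way instead of using the closure serializer.
  def cloneZeroValue[U: ClassTag](zeroValue: U): U = {
    val ser = SparkEnv.get.serializer.newInstance()
    val zeroBuffer = ser.serialize(zeroValue)
    val zeroArray = new Array[Byte](zeroBuffer.limit)
    zeroBuffer.get(zeroArray)
    // Deserialize a fresh copy so the caller gets an independent instance.
    ser.deserialize[U](ByteBuffer.wrap(zeroArray))
  }
}
{code}

Using the configured serializer here means a zero value that is only Kryo-serializable (not java.io.Serializable) would work with aggregate() exactly as it already does with aggregateByKey().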