Hello all, I have a really strange thing going on.
I have a test data set with 500K lines in a gzipped csv file. I have an array of "column processors," one for each column in the dataset. A Processor tracks aggregate state and has a method "process(v : String)". I'm calling:

    val processors: Array[Processors] = ....
    sc.textFile(gzippedFileName).aggregate(processors,
      { (curState, row) =>
        row.split(",", -1).zipWithIndex.foreach({ v =>
          curState(v._2).process(v._1)
        })
        curState
      }
      ....)

If the class definition for the Processors is in the same file as the driver, it runs in ~23 seconds. If I move the classes to a separate file in the same package without ANY OTHER CHANGES, it goes to ~35 seconds.

This doesn't make any sense to me. I can't even understand how the compiled class files could be any different in either case.

Does anyone have an explanation for why this might be?
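To make the pattern described above concrete, here is a minimal, self-contained sketch. It is not the actual code: the `Processor` trait, `CountingProcessor`, the `combOp` and the sample rows are all hypothetical stand-ins, and a plain `foldLeft` over an in-memory `Seq` replaces the Spark call so it runs without a cluster. On a real RDD the same two functions would be passed in curried form, as `aggregate(zeroValue)(seqOp, combOp)`.

```scala
// Hypothetical stand-in for the column processors described in the post.
trait Processor extends Serializable {
  def process(v: String): Unit
}

// Assumed example processor: counts the non-empty values seen in one column.
final class CountingProcessor extends Processor {
  var count: Long = 0L
  def process(v: String): Unit = if (v.nonEmpty) count += 1
}

object AggregateSketch {
  // Same shape as the seqOp in the post: hand each field to its column's processor.
  def seqOp(curState: Array[CountingProcessor], row: String): Array[CountingProcessor] = {
    row.split(",", -1).zipWithIndex.foreach { case (v, i) => curState(i).process(v) }
    curState
  }

  // Assumed combOp (elided in the post): merge two partial states by summing counts.
  def combOp(a: Array[CountingProcessor], b: Array[CountingProcessor]): Array[CountingProcessor] = {
    a.indices.foreach(i => a(i).count += b(i).count)
    a
  }

  def main(args: Array[String]): Unit = {
    val rows = Seq("a,b,", "c,,d", ",,")           // made-up sample data
    val processors = Array.fill(3)(new CountingProcessor)
    // Local stand-in for: sc.textFile(path).aggregate(processors)(seqOp, combOp)
    val result = rows.foldLeft(processors)(seqOp)
    println(result.map(_.count).mkString(","))
  }
}
```

Because the processors are mutated in place, timing comparisons between the two class layouts are probably only meaningful when each run starts from freshly constructed processors and is repeated several times, so that JVM warm-up does not dominate the difference.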