In my test data, I have a JavaRDD containing a single String (the size of this RDD is 1).
On a 3-node YARN cluster, the mapToPair function on this RDD sends the same input String to 2 different nodes; the container logs on both nodes show the same string as input. Overriding the default partition count with JavaRDD<String> input = sparkContext.textFile(hdfsPath, 0); didn't change anything, and the same input string is still processed twice.

Is there a way to make sure that each string in an RDD is processed exactly once?

Thanks,
Neera
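For reference, here is roughly what I am doing, with one change I tried since writing the above: as far as I understand, the second argument to textFile is only a *minimum* partition hint, so passing 0 is effectively ignored; forcing everything into one partition with coalesce(1) is a sketch of the workaround I'm experimenting with (sparkContext and hdfsPath are placeholders for my actual context and path):

```java
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import scala.Tuple2;

// minPartitions is a lower-bound hint, so 1 (not 0) is the smallest
// meaningful value; coalesce(1) then forces a single partition, i.e.
// a single task per action on this RDD.
JavaRDD<String> input = sparkContext.textFile(hdfsPath, 1).coalesce(1);

// Cache so that running more than one action does not recompute the
// RDD (and re-run mapToPair) from scratch.
input.cache();

JavaPairRDD<String, Integer> pairs =
    input.mapToPair(s -> new Tuple2<>(s, s.length()));
```

Even with a single partition, I gather a task can still legitimately run more than once, e.g. on task retry after a failure, or via speculative execution if spark.speculation is enabled, so perhaps that is what I'm seeing in the container logs?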