In my test data, I have a JavaRDD containing a single String (the size of
this RDD is 1).

On a 3-node YARN cluster, the mapToPair function on this RDD sends the same
input String to 2 different nodes; the container logs on both nodes show
the same string as input.

Overriding the default partition count with

JavaRDD<String> input = sparkContext.textFile(hdfsPath, 0);

didn't change anything; the same input string is still processed twice.
Is there a way to make sure that each string in an RDD is processed exactly
once?
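
For reference, a sketch of the same load forced down to a single partition
(assuming the standard JavaRDD.coalesce API; the second argument to
textFile is minPartitions, a lower bound rather than a cap, so passing 0
cannot reduce the partition count):

// Sketch, not a confirmed fix: collapse the input to one partition so the
// single record belongs to exactly one task (standard Spark Java API).
JavaRDD<String> input = sparkContext.textFile(hdfsPath).coalesce(1);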

Thanks,
Neera 



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/spark-on-yarn-tp23230.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

