I would like to know the following: what is a partition, how does it work, and how is it different from a Hadoop partition?
For example:

>>> sc.parallelize([1,2,3,4]).map(lambda x: (x,x)).partitionBy(2).glom().collect()
[[(2,2), (4,4)], [(1,1), (3,3)]]

From this we get 2 partitions, but what does that mean? How do they reside in memory in the cluster?

I am sorry for such a simple question, but I couldn't find any specific information about what happens underneath partitioning.

Thank you,
Joe
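
For reference, the grouping in the output above is consistent with plain hash partitioning. The following is only a minimal pure-Python sketch, assuming each pair is assigned to partition hash(key) % num_partitions; the function name hash_partition is hypothetical and not part of Spark, and the sketch says nothing about where partitions actually live in the cluster.

```python
# Pure-Python sketch of hash partitioning; no Spark required.
# Assumption: a pair goes to partition hash(key) % num_partitions.
# hash_partition is a hypothetical helper, not a Spark API.

def hash_partition(pairs, num_partitions):
    """Group (key, value) pairs into num_partitions lists by key hash."""
    partitions = [[] for _ in range(num_partitions)]
    for key, value in pairs:
        # Small integers hash to themselves in CPython, so even keys
        # land in partition 0 and odd keys in partition 1 when
        # num_partitions is 2.
        partitions[hash(key) % num_partitions].append((key, value))
    return partitions

pairs = [(x, x) for x in [1, 2, 3, 4]]
result = hash_partition(pairs, 2)
print(result)  # [[(2, 2), (4, 4)], [(1, 1), (3, 3)]]
```

Each inner list here stands in for one partition, which is roughly what glom() makes visible by turning every partition into a list.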