I've found that this was already reported a year ago: RDD zip() gives wrong
results when the number of partitions does not evenly divide the total
number of elements in the RDD:
https://groups.google.com/forum/#!msg/spark-users/demrmjHFnoc/Ek3ijiXHr2MJ
I will file a JIRA ticket as soon as the ASF JIRA system lets me reset my
password.
On Sunday, May 11, 2014 4:40 AM, Michael Malak wrote:
Is this a bug?
scala> sc.parallelize(1 to 2,4).zip(sc.parallelize(11 to 12,4)).collect
res0: Array[(Int, Int)] = Array((1,11), (2,12))
scala> sc.parallelize(1L to 2L,4).zip(sc.parallelize(11 to 12,4)).collect
res1: Array[(Long, Int)] = Array((2,11))
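
One possible way to read the two results above is with a toy model in plain Scala (no Spark involved). RDD zip() is documented to assume that both RDDs have the same number of partitions and the same number of elements in each partition, so the sketch below zips partition by partition. The partition layouts and the names ZipModel, zipPartitions, spread, packed and other are hypothetical, chosen only to reproduce the output shown; they are not taken from Spark's source.

```scala
// Toy model of a partition-wise zip using plain Scala collections.
// This is a sketch under assumptions, not Spark's actual implementation.
object ZipModel {
  // Zip two partitioned datasets one partition at a time.
  // Scala's zip truncates to the shorter side, so any element without a
  // partner in the matching partition is dropped without an error.
  def zipPartitions[A, B](left: Seq[Seq[A]], right: Seq[Seq[B]]): Seq[(A, B)] = {
    require(left.length == right.length, "both sides need the same number of partitions")
    left.zip(right).flatMap { case (l, r) => l.zip(r) }
  }

  // HYPOTHETICAL layouts of two elements over four partitions.
  // "spread" and "other" place their elements in partitions 1 and 3;
  // "packed" places them in partitions 0 and 1.
  val spread: Seq[Seq[Int]]  = Seq(Seq(), Seq(1), Seq(), Seq(2))
  val packed: Seq[Seq[Long]] = Seq(Seq(1L), Seq(2L), Seq(), Seq())
  val other: Seq[Seq[Int]]   = Seq(Seq(), Seq(11), Seq(), Seq(12))

  def main(args: Array[String]): Unit = {
    // Matching layouts: both pairs survive, (1,11) and (2,12).
    println(zipPartitions(spread, other))
    // Mismatched layouts: only the pair (2,11) survives.
    println(zipPartitions(packed, other))
  }
}
```

If the Int and Long ranges really are split across partitions differently, that would be consistent with res0 and res1 above. In a real Spark shell, rdd.glom().collect() shows the actual contents of each partition and could confirm or rule this out.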