I get the exception "Can't zip RDDs with unequal numbers of partitions" when I apply any action (reduce, collect) to a dataset created by zipping two datasets of 10 million entries each. The problem occurs regardless of the number of partitions I specify, and also when I let Spark create the partitions itself.
Interestingly enough, I have no problem zipping datasets of 1 and 2.5 million entries. A similar problem was reported on this board with 0.8, but I don't remember whether it was fixed. Any ideas? Any workarounds? I'd appreciate any help.