Hi Xiangrui,
Thanks for your reply. This makes sense, and I should have looked at the
docs. Indeed, zipping before saveAsFile did the trick.
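
For the archives, roughly what the fix looked like on my side (the paths,
names, and exact save method here are illustrative; sc is the SparkContext):

  // Before (failed): the two RDDs were saved and reloaded separately, so
  // their partitioning no longer lined up when zipped.
  // After (works): zip while one RDD is still a mapped view of the other,
  // then save the zipped result.
  val lines  = sc.textFile("input.txt")      // hypothetical input path
  val lens   = lines.map(_.length)           // mapped RDD of `lines`
  val paired = lines.zip(lens)               // safe: identical partitioning
  paired.saveAsTextFile("zipped_out")        // hypothetical output path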


-----Original Message-----
From: Xiangrui Meng [mailto:men...@gmail.com]
Sent: Tuesday, April 01, 2014 11:43 PM
To: user@spark.apache.org
Cc: u...@spark.incubator.apache.org
Subject: Re: Issue with zip and partitions

> From the API docs: "Zips this RDD with another one, returning key-value pairs
> with the first element in each RDD, second element in each RDD, etc. Assumes
> that the two RDDs have the *same number of partitions* and the *same number
> of elements in each partition* (e.g. one was made through a map on the other)."

Basically, one RDD should be a mapped RDD of the other, or both RDDs should be
mapped RDDs of the same RDD.
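
A minimal sketch of the safe and unsafe cases (Scala; sc is an assumed
SparkContext, and the sizes are illustrative):

  // Safe: `doubled` is a mapped RDD of `nums`, so both RDDs have the same
  // number of partitions and the same element count in each partition.
  val nums    = sc.parallelize(1 to 1000000, 8)
  val doubled = nums.map(_ * 2)
  nums.zip(doubled).count()        // fine

  // Unsafe: `filter` keeps the partition count but changes the element
  // count per partition, so an action on the zipped RDD throws a
  // SparkException at runtime.
  val evens = nums.filter(_ % 2 == 0)
  nums.zip(evens).count()          // fails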

Btw, your message says "Dell - Internal Use - Confidential"...

Best,
Xiangrui

On Tue, Apr 1, 2014 at 7:27 PM,  <patrick_nico...@dell.com> wrote:
> Dell - Internal Use - Confidential
>
> I got an exception "Can't zip RDDs with unequal numbers of partitions"
> when I apply any action (reduce, collect) to a dataset created by
> zipping two datasets of 10 million entries each. The problem occurs
> whether I set the number of partitions explicitly or let Spark create
> them.
>
>
>
> Interestingly enough, I do not have this problem when zipping datasets
> of 1 and 2.5 million entries.
>
> A similar problem was reported on this board with 0.8, but I don't
> remember if it was fixed.
>
>
>
> Any ideas? Any workarounds?
>
>
>
> I'd appreciate any help.
