It seems like a bug; could you file a JIRA for it? (Please also post a way to reproduce it.)
On Fri, Apr 1, 2016 at 11:08 AM, Sergey <ser...@gmail.com> wrote:
> Hi!
>
> I'm on Spark 1.6.1 in local mode on Windows.
>
> I have an issue with zipping two RDDs of __equal__ size and an __equal__
> number of partitions (I also tried repartitioning both RDDs to one
> partition).
> I get this exception when I do rdd1.zip(rdd2).count():
>
>   File "c:\spark\python\lib\pyspark.zip\pyspark\worker.py", line 111, in main
>   File "c:\spark\python\lib\pyspark.zip\pyspark\worker.py", line 106, in process
>   File "c:\spark\python\lib\pyspark.zip\pyspark\serializers.py", line 263, in dump_stream
>     vs = list(itertools.islice(iterator, batch))
>   File "c:\spark\python\pyspark\rddsampler.py", line 95, in func
>     for obj in iterator:
>   File "c:\spark\python\lib\pyspark.zip\pyspark\serializers.py", line 322, in load_stream
>     " in pair: (%d, %d)" % (len(keys), len(vals)))
> ValueError: Can not deserialize RDD with different number of items in pair: (256, 512)

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org