Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/19226#discussion_r138790747
--- Diff: python/pyspark/serializers.py ---
@@ -343,9 +346,6 @@ def _load_stream_without_unbatching(self, stream):
         key_batch_stream = self.key_ser._load_stream_without_unbatching(stream)
         val_batch_stream = self.val_ser._load_stream_without_unbatching(stream)
         for (key_batch, val_batch) in zip(key_batch_stream, val_batch_stream):
-            if len(key_batch) != len(val_batch):
-                raise ValueError("Can not deserialize PairRDD with different number of items"
-                                 " in batches: (%d, %d)" % (len(key_batch), len(val_batch)))
             # for correctness with repeated cartesian/zip this must be returned as one batch
             yield zip(key_batch, val_batch)
--- End diff --
How about returning this batch as a list (and as described in the doc)?
---