Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/19226#discussion_r139112030
--- Diff: python/pyspark/serializers.py ---
@@ -343,6 +343,8 @@ def _load_stream_without_unbatching(self, stream):
        key_batch_stream = self.key_ser._load_stream_without_unbatching(stream)
        val_batch_stream = self.val_ser._load_stream_without_unbatching(stream)
        for (key_batch, val_batch) in zip(key_batch_stream, val_batch_stream):
+            key_batch = key_batch if hasattr(key_batch, '__len__') else list(key_batch)
--- End diff ---
Could we add a small comment that this is required because `_load_stream_without_unbatching` could return an iterator of iterators in this case?
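A minimal standalone sketch (not Spark code; the stream and helper names below are hypothetical) of why the `hasattr(key_batch, '__len__')` guard matters: when the inner deserializer yields lazy iterators rather than lists, the batch has no length, so it must be materialized before anything like `len(key_batch)` can be checked.

```python
def batches_as_iterators(stream):
    # Hypothetical stand-in for a deserializer that yields lazy
    # generators instead of concrete lists (an "iterator of iterators").
    for batch in stream:
        yield (x for x in batch)

stream = [[1, 2, 3], [4, 5]]

for key_batch in batches_as_iterators(stream):
    # A generator has no __len__, so len(key_batch) would raise TypeError.
    assert not hasattr(key_batch, '__len__')
    # Same guard as in the diff: materialize only when length is unknown.
    key_batch = key_batch if hasattr(key_batch, '__len__') else list(key_batch)
    assert hasattr(key_batch, '__len__')
    print(len(key_batch))
```

If the guard sees a batch that already supports `len()` (e.g. a list), it is passed through unchanged, so already-materialized batches are not copied needlessly.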
---