GitHub user viirya commented on a diff in the pull request:
https://github.com/apache/spark/pull/19226#discussion_r139281251
--- Diff: python/pyspark/serializers.py ---
@@ -343,6 +343,9 @@ def _load_stream_without_unbatching(self, stream):
         key_batch_stream = self.key_ser._load_stream_without_unbatching(stream)
         val_batch_stream = self.val_ser._load_stream_without_unbatching(stream)
         for (key_batch, val_batch) in zip(key_batch_stream, val_batch_stream):
+            # the batch is an iterable, we need to check lengths so we convert to list if needed.
--- End diff --
nit: For double-zipped RDDs, the batches can be iterators from another
PairDeserializer, instead of lists. We need to convert them to lists if needed.