Github user ueshin commented on a diff in the pull request:
https://github.com/apache/spark/pull/20295#discussion_r162912985
--- Diff: python/pyspark/serializers.py ---
@@ -267,13 +267,13 @@ def load_stream(self, stream):
"""
Deserialize ArrowRecordBatches to an Arrow table and return as a
list of pandas.Series.
"""
- from pyspark.sql.types import _check_dataframe_localize_timestamps
+ from pyspark.sql.types import _check_series_localize_timestamps
import pyarrow as pa
reader = pa.open_stream(stream)
for batch in reader:
# NOTE: changed from pa.Columns.to_pandas, timezone issue in
conversion fixed in 0.7.1
- pdf = _check_dataframe_localize_timestamps(batch.to_pandas(),
self._timezone)
- yield [c for _, c in pdf.iteritems()]
+ yield [_check_series_localize_timestamps(c.to_pandas(),
self._timezone)
+ for c in pa.Table.from_batches([batch]).itercolumns()]
--- End diff --
Maybe we can remove the comment above (`# NOTE: ...`) ?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]