BryanCutler commented on a change in pull request #30393:
URL: https://github.com/apache/spark/pull/30393#discussion_r525846522
##########
File path: python/pyspark/sql/pandas/types.py
##########
@@ -306,3 +322,23 @@ def _check_series_convert_timestamps_tz_local(s, timezone):
`pandas.Series` where if it is a timestamp, has been converted to
tz-naive
"""
return _check_series_convert_timestamps_localize(s, timezone, None)
+
+
+def _convert_map_items_to_dict(s):
Review comment:
Note: these conversion functions exist because pyarrow expects map items
as a list of (key, value) pairs, and also produces that format when converting to
Pandas. The reason is that the Arrow spec allows duplicate keys within a
row and doesn't say exactly how they should be handled. By adding these
conversions, we match the non-arrow behavior for maps, which uses a dictionary
as input/output.
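
A minimal sketch of what these helpers could look like, assuming map values arrive
from pyarrow as lists of (key, value) tuples and are exposed to users as Python dicts
(the function names follow the file under review, but the bodies here are illustrative,
not the exact code in the PR):

```python
import pandas as pd


def _convert_map_items_to_dict(s):
    # Sketch: turn a Series of [(key, value), ...] items (pyarrow's map
    # representation) into a Series of Python dicts. Duplicate keys collapse
    # to the last value, which is the ambiguity the comment above refers to.
    return s.apply(lambda m: None if m is None else {k: v for k, v in m})


def _convert_dict_to_map_items(s):
    # Sketch: turn a Series of Python dicts back into [(key, value), ...]
    # items so pyarrow can build a map column from them.
    return s.apply(lambda d: None if d is None else list(d.items()))


# Example round trip
s = pd.Series([[("a", 1), ("b", 2)], None])
as_dicts = _convert_map_items_to_dict(s)        # {"a": 1, "b": 2}, None
back = _convert_dict_to_map_items(as_dicts)     # [("a", 1), ("b", 2)], None
```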