itholic commented on a change in pull request #33214:
URL: https://github.com/apache/spark/pull/33214#discussion_r664993005
##########
File path: python/pyspark/sql/types.py
##########
@@ -1020,14 +1020,22 @@ def _infer_type(obj):
         return dataType()
     if isinstance(obj, dict):
-        for key, value in obj.items():
-            if key is not None and value is not None:
-                return MapType(_infer_type(key), _infer_type(value), True)
-        return MapType(NullType(), NullType(), True)
+        if infer_dict_as_struct:
+            struct = StructType()
+            for key, value in obj.items():
+                if key is not None and value is not None:
+                    struct.add(key, _infer_type(value, infer_dict_as_struct), True)
+            return struct
+        else:
+            for key, value in obj.items():
+                if key is not None and value is not None:
+                    return MapType(_infer_type(key, infer_dict_as_struct),
+                                   _infer_type(value, infer_dict_as_struct), True)
+            return MapType(NullType(), NullType(), True)
Review comment:
Thanks for the comment! :)
Actually, the PySpark merging logic only handles null cases (as called out here) at
https://github.com/apache/spark/blob/52a9a70fa3e5b720b41e2ff4e9177a5d201b471f/python/pyspark/sql/types.py#L1096-L1133
It actually fails for mismatched types (unlike JSON or CSV type inference).
I am not sure what the ideal behavior for the null case pointed out here is,
though.
Let me separate it from this PR in any event, if you're fine with that.
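For context, a minimal standalone sketch of what the diff's `infer_dict_as_struct` flag changes. This is not PySpark's actual implementation (the real one returns `StructType`/`MapType` objects and handles many more Python types); the string rendering and the reduced set of handled types here are simplifications for illustration:

```python
# Simplified mimic of _infer_type's dict handling from the diff above.
# With infer_dict_as_struct=False (the default), a dict becomes a map whose
# key/value types come from the first non-null pair; with it set to True,
# the dict becomes a struct with one field per key, so values may differ in type.

def infer_type(obj, infer_dict_as_struct=False):
    """Return a string description of the inferred type (illustrative only)."""
    if isinstance(obj, bool):  # check bool before int: bool is an int subclass
        return "boolean"
    if isinstance(obj, int):
        return "bigint"
    if isinstance(obj, str):
        return "string"
    if isinstance(obj, dict):
        if infer_dict_as_struct:
            # One struct field per key; each value may have a different type.
            fields = ", ".join(
                f"{k}: {infer_type(v, infer_dict_as_struct)}"
                for k, v in obj.items()
                if k is not None and v is not None
            )
            return f"struct<{fields}>"
        # Map: the first non-null key/value pair decides both types.
        for k, v in obj.items():
            if k is not None and v is not None:
                return (f"map<{infer_type(k, infer_dict_as_struct)}, "
                        f"{infer_type(v, infer_dict_as_struct)}>")
        return "map<null, null>"
    return "unknown"

row = {"id": 1, "name": "a"}
print(infer_type(row))                             # map<string, bigint>
print(infer_type(row, infer_dict_as_struct=True))  # struct<id: bigint, name: string>
```

This also shows why the struct path can sidestep the merge failure for mismatched value types: each key keeps its own type instead of all values being forced into one map value type.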
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]