itholic commented on a change in pull request #33214:
URL: https://github.com/apache/spark/pull/33214#discussion_r664993005
##########
File path: python/pyspark/sql/types.py
##########
@@ -1020,14 +1020,22 @@ def _infer_type(obj):
         return dataType()
     if isinstance(obj, dict):
-        for key, value in obj.items():
-            if key is not None and value is not None:
-                return MapType(_infer_type(key), _infer_type(value), True)
-        return MapType(NullType(), NullType(), True)
+        if infer_dict_as_struct:
+            struct = StructType()
+            for key, value in obj.items():
+                if key is not None and value is not None:
+                    struct.add(key, _infer_type(value, infer_dict_as_struct), True)
+            return struct
+        else:
+            for key, value in obj.items():
+                if key is not None and value is not None:
+                    return MapType(_infer_type(key, infer_dict_as_struct),
+                                   _infer_type(value, infer_dict_as_struct), True)
+            return MapType(NullType(), NullType(), True)
Review comment:
Thanks for the comment! :)
Actually, the PySpark merging logic only handles null cases (as called out here) at
https://github.com/apache/spark/blob/52a9a70fa3e5b720b41e2ff4e9177a5d201b471f/python/pyspark/sql/types.py#L1096-L1133
It actually fails for mismatched types (unlike JSON or CSV type inference).
I am not sure what the ideal behavior for the null case pointed out here is,
though.
Let me separate it from this PR in any event, if you're fine with that.
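For context, a minimal standalone sketch of what the diff's `infer_dict_as_struct` flag changes. This is not PySpark's actual implementation (the real one returns `StructType`/`MapType` objects and handles many more Python types); the string rendering and the reduced set of handled types here are simplifications for illustration:

```python
# Simplified mimic of _infer_type's dict handling from the diff above.
# With infer_dict_as_struct=False (the default), a dict becomes a map whose
# key/value types come from the first non-null pair; with it set to True,
# the dict becomes a struct with one field per key, so values may differ in type.

def infer_type(obj, infer_dict_as_struct=False):
    """Return a string description of the inferred type (illustrative only)."""
    if isinstance(obj, bool):  # check bool before int: bool is an int subclass
        return "boolean"
    if isinstance(obj, int):
        return "bigint"
    if isinstance(obj, str):
        return "string"
    if isinstance(obj, dict):
        if infer_dict_as_struct:
            # One struct field per key; each value may have a different type.
            fields = ", ".join(
                f"{k}: {infer_type(v, infer_dict_as_struct)}"
                for k, v in obj.items()
                if k is not None and v is not None
            )
            return f"struct<{fields}>"
        # Map: the first non-null key/value pair decides both types.
        for k, v in obj.items():
            if k is not None and v is not None:
                return (f"map<{infer_type(k, infer_dict_as_struct)}, "
                        f"{infer_type(v, infer_dict_as_struct)}>")
        return "map<null, null>"
    return "unknown"

row = {"id": 1, "name": "a"}
print(infer_type(row))                             # map<string, bigint>
print(infer_type(row, infer_dict_as_struct=True))  # struct<id: bigint, name: string>
```

This also shows why the struct path can sidestep the merge failure for mismatched value types: each key keeps its own type instead of all values being forced into one map value type.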
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]