srowen commented on code in PR #36545:
URL: https://github.com/apache/spark/pull/36545#discussion_r875104045


##########
python/pyspark/sql/session.py:
##########
@@ -570,10 +570,20 @@ def _inferSchemaFromList(
         if not data:
             raise ValueError("can not infer schema from empty dataset")
         infer_dict_as_struct = self._jconf.inferDictAsStruct()
+        infer_array_from_first_element = self._jconf.legacyInferArrayTypeFromFirstElement()

Review Comment:
   I see it fixes some cases, by somewhat 'accidentally' inferring a correctly
widened type. But it also causes some common cases that 'accidentally' work now
to start failing (your example in this thread). The half-measure doesn't feel
worth it, as it needs a whole new flag. Is it hard to just implement logic to
find the closest common type across all elements? That code surely already
exists in the codebase.
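
   To illustrate the distinction (a minimal sketch, not Spark's actual
implementation — the type names and widening table here are simplified
assumptions): inferring an array's element type from only the first element
versus merging over all elements to find the closest common type.

   ```python
   from functools import reduce

   def infer_element_type(value):
       # Map a Python value to a simplified type name (bool before int,
       # since bool is a subclass of int in Python).
       if isinstance(value, bool):
           return "boolean"
       if isinstance(value, int):
           return "long"
       if isinstance(value, float):
           return "double"
       if isinstance(value, str):
           return "string"
       return "unknown"

   # Simplified widening rule: long widens to double; any other
   # mismatched pair falls back to string.
   _WIDEN = {frozenset({"long", "double"}): "double"}

   def merge_types(t1, t2):
       if t1 == t2:
           return t1
       return _WIDEN.get(frozenset({t1, t2}), "string")

   def infer_array_type_from_first(arr):
       # Legacy-style behavior: only the first element decides the type.
       return infer_element_type(arr[0])

   def infer_array_type_merged(arr):
       # Closest-common-type behavior: fold merge_types over all elements.
       return reduce(merge_types, (infer_element_type(v) for v in arr))

   # [1, 2.0]: first-element inference yields "long", silently narrowing
   # the second element; merging all elements widens to "double".
   ```

   The merged variant is the kind of "find the closest type for everything"
logic the comment asks about; it makes inference order-insensitive instead of
depending on which element happens to come first.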



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

