physinet commented on code in PR #36545:
URL: https://github.com/apache/spark/pull/36545#discussion_r874983582


##########
python/pyspark/sql/session.py:
##########
@@ -570,10 +570,20 @@ def _inferSchemaFromList(
         if not data:
             raise ValueError("can not infer schema from empty dataset")
         infer_dict_as_struct = self._jconf.inferDictAsStruct()
+        infer_array_from_first_element = self._jconf.legacyInferArrayTypeFromFirstElement()

Review Comment:
   Previously, a Python list could contain mixed types, as long as every 
element could be cast to the element type inferred from the list's first 
element:
   ```python
   >>> df = spark.createDataFrame([{"a": ["1", 2]}])
   >>> df.show()
   +------+
   |     a|
   +------+
   |[1, 2]|
   +------+
   >>> df.schema
   StructType(List(StructField(a,ArrayType(StringType,true),true)))
   ```
   With this change, creating the DataFrame causes an error:
   ```python
   >>> df = spark.createDataFrame([{"a": ["1", 2]}])
   ...
   TypeError: Unable to infer the type of the field a.
   ```
   Because this breaks code that previously worked, I think it makes sense to 
make the behavior configurable.
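
To make the difference concrete, here is a minimal pure-Python sketch of the two strategies. It is a toy model and not Spark's implementation: the function name `infer_element_type` and its `from_first_element` flag are hypothetical stand-ins for the legacy switch read in the diff above, and Python type names stand in for Spark data types.

```python
def infer_element_type(values, from_first_element):
    """Toy model: return the Python type name chosen for a list's elements.

    `from_first_element` is a hypothetical stand-in for the legacy flag.
    """
    if not values:
        raise ValueError("can not infer element type from an empty list")
    if from_first_element:
        # Legacy-style: only the first element is inspected; later elements
        # are assumed to be castable to this type.
        return type(values[0]).__name__
    # Stricter style: every element is inspected and the types must agree.
    names = {type(v).__name__ for v in values}
    if len(names) > 1:
        raise TypeError(
            f"Unable to infer a single element type from {sorted(names)}"
        )
    return names.pop()
```

With the flag on, `infer_element_type(["1", 2], True)` picks the string type from the first element, matching the first example above; with the flag off, the same input raises `TypeError`, matching the second.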



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]