HyukjinKwon opened a new pull request, #46547:
URL: https://github.com/apache/spark/pull/46547

   ### What changes were proposed in this pull request?
   
   This is similar to https://github.com/apache/spark/pull/36545. This PR proposes to infer map types from all key-value pairs instead of only the first pair.
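
The difference between the two strategies can be sketched in plain Python. This is a simplified illustration, not Spark's actual inference code: the helper names (`infer_scalar_type`, `merge_types`, and the two `infer_value_type_*` functions) are hypothetical, the type names are plain strings, and the fallback to `"string"` for conflicting types is an assumption chosen to match the output shown in the examples in this PR.

```python
from functools import reduce
from typing import Any, Mapping


def infer_scalar_type(value: Any) -> str:
    """Return a simplified type name for a Python scalar (hypothetical helper)."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # bool must be checked before int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    raise TypeError(f"unsupported value type: {type(value).__name__}")


def merge_types(left: str, right: str) -> str:
    """Merge two inferred types; conflicts fall back to string (assumption)."""
    if left == right:
        return left
    if left == "null":
        return right
    if right == "null":
        return left
    return "string"


def infer_value_type_from_first_pair(mapping: Mapping[Any, Any]) -> str:
    """Legacy-style inference: only the first non-null value decides the type."""
    for value in mapping.values():
        if value is not None:
            return infer_scalar_type(value)
    return "null"


def infer_value_type_from_all_pairs(mapping: Mapping[Any, Any]) -> str:
    """Proposed-style inference: merge the types of every value in the map."""
    types = (infer_scalar_type(v) for v in mapping.values())
    return reduce(merge_types, types, "null")


sample = {"payment": 200.5, "name": "A"}
print(infer_value_type_from_first_pair(sample))  # double
print(infer_value_type_from_all_pairs(sample))   # string
```

With first-pair inference the map's value type is decided by `200.5` alone, so the string `"A"` does not fit that type; merging over all pairs yields a type that can hold both values.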
   
   ### Why are the changes needed?
   
   To make the type inference behavior consistent.
   
   ### Does this PR introduce _any_ user-facing change?
   
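   Yes. See the examples below: in each, the first call shows the new default behavior, and the second shows the legacy behavior with `spark.sql.pyspark.legacy.inferMapTypeFromFirstPair.enabled` set to `True`.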
   
   **Without Spark Connect:**
   
   ```python
   >>> spark.createDataFrame([{"outer": {"payment": 200.5, "name": "A"}}]).collect()
   [Row(outer={'name': 'A', 'payment': '200.5'})]
   >>> spark.conf.set("spark.sql.pyspark.legacy.inferMapTypeFromFirstPair.enabled", True)
   >>> spark.createDataFrame([{"outer": {"payment": 200.5, "name": "A"}}]).collect()
   [Row(outer={'name': None, 'payment': 200.5})]
   ```
   
   **With Spark Connect:**
   
   ```python
   >>> spark.createDataFrame([{"outer": {"payment": 200.5, "name": "A"}}]).collect()
   [Row(outer={'payment': '200.5', 'name': 'A'})]
   >>> spark.conf.set("spark.sql.pyspark.legacy.inferMapTypeFromFirstPair.enabled", True)
   >>> spark.createDataFrame([{"outer": {"payment": 200.5, "name": "A"}}]).collect()
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
     File "/Users/hyukjin.kwon/workspace/forked/spark/python/pyspark/sql/connect/session.py", line 635, in createDataFrame
       _table = LocalDataToArrowConversion.convert(_data, _schema)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File "/Users/hyukjin.kwon/workspace/forked/spark/python/pyspark/sql/connect/conversion.py", line 378, in convert
       return pa.Table.from_arrays(pylist, schema=pa_schema)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File "pyarrow/table.pxi", line 3974, in pyarrow.lib.Table.from_arrays
     File "pyarrow/table.pxi", line 1464, in pyarrow.lib._sanitize_arrays
     File "pyarrow/array.pxi", line 373, in pyarrow.lib.asarray
     File "pyarrow/array.pxi", line 343, in pyarrow.lib.array
     File "pyarrow/array.pxi", line 42, in pyarrow.lib._sequence_to_array
     File "pyarrow/error.pxi", line 154, in pyarrow.lib.pyarrow_internal_check_status
     File "pyarrow/error.pxi", line 91, in pyarrow.lib.check_status
   pyarrow.lib.ArrowInvalid: Could not convert 'A' with type str: tried to convert to double
   ```
   
   ### How was this patch tested?
   
   Unit tests were added.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   No.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

