HyukjinKwon opened a new pull request, #46547: URL: https://github.com/apache/spark/pull/46547
### What changes were proposed in this pull request?

This is similar to https://github.com/apache/spark/pull/36545. This PR proposes to infer map types from all pairs instead of only the first pair.

### Why are the changes needed?

To have consistent behaviour.

### Does this PR introduce _any_ user-facing change?

Yes. See below.

**Without Spark Connect:**

```python
>>> spark.createDataFrame([{"outer": {"payment": 200.5, "name": "A"}}]).collect()
[Row(outer={'name': 'A', 'payment': '200.5'})]
>>> spark.conf.set("spark.sql.pyspark.legacy.inferMapTypeFromFirstPair.enabled", True)
>>> spark.createDataFrame([{"outer": {"payment": 200.5, "name": "A"}}]).collect()
[Row(outer={'name': None, 'payment': 200.5})]
```

**With Spark Connect:**

```python
>>> spark.createDataFrame([{"outer": {"payment": 200.5, "name": "A"}}]).collect()
[Row(outer={'payment': '200.5', 'name': 'A'})]
>>> spark.conf.set("spark.sql.pyspark.legacy.inferMapTypeFromFirstPair.enabled", True)
>>> spark.createDataFrame([{"outer": {"payment": 200.5, "name": "A"}}]).collect()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/hyukjin.kwon/workspace/forked/spark/python/pyspark/sql/connect/session.py", line 635, in createDataFrame
    _table = LocalDataToArrowConversion.convert(_data, _schema)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/hyukjin.kwon/workspace/forked/spark/python/pyspark/sql/connect/conversion.py", line 378, in convert
    return pa.Table.from_arrays(pylist, schema=pa_schema)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "pyarrow/table.pxi", line 3974, in pyarrow.lib.Table.from_arrays
  File "pyarrow/table.pxi", line 1464, in pyarrow.lib._sanitize_arrays
  File "pyarrow/array.pxi", line 373, in pyarrow.lib.asarray
  File "pyarrow/array.pxi", line 343, in pyarrow.lib.array
  File "pyarrow/array.pxi", line 42, in pyarrow.lib._sequence_to_array
  File "pyarrow/error.pxi", line 154, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 91, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Could not convert 'A' with type str: tried to convert to double
```

### How was this patch tested?

Unit tests added.

### Was this patch authored or co-authored using generative AI tooling?

No.
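
For readers who want the difference between the two inference modes spelled out, here is a rough plain-Python sketch. It is not the PySpark implementation: the helper names (`infer_value_type_first_pair`, `infer_value_type_all_pairs`, `merge_types`) are hypothetical, types are represented by their Python names, and the fallback to string for mixed value types is an assumption chosen to match the output shown above.

```python
def infer_value_type_first_pair(d):
    """Legacy-style sketch: use only the first non-None value."""
    for value in d.values():
        if value is not None:
            return type(value).__name__
    return "null"


def merge_types(a, b):
    """Combine two inferred type names into one."""
    if a == b:
        return a
    if a == "null":
        return b
    if b == "null":
        return a
    # Assumption: mixed value types fall back to string, which is
    # consistent with 200.5 being returned as '200.5' in the example.
    return "str"


def infer_value_type_all_pairs(d):
    """New-style sketch: merge the types of every value in the map."""
    result = "null"
    for value in d.values():
        value_type = "null" if value is None else type(value).__name__
        result = merge_types(result, value_type)
    return result


outer = {"payment": 200.5, "name": "A"}
print(infer_value_type_first_pair(outer))  # float
print(infer_value_type_all_pairs(outer))   # str
```

With first-pair inference the map value type is fixed by `payment` alone, so `"A"` no longer fits the inferred type. That matches the `None` value without Spark Connect and the `ArrowInvalid` error with Spark Connect shown above.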
