zhengruifeng commented on PR #38979: URL: https://github.com/apache/spark/pull/38979#issuecomment-1343806000
Difference in casting: this PR leverages `Dataset.to(schema)` to cast datatypes, which differs from PySpark's approach, which relies on [the `_acceptable_types` list](https://github.com/apache/spark/blob/master/python/pyspark/sql/types.py#L1755-L1775).

`createDataFrame([[1, 2, 3, 4]], schema="col1 int, col2 int, col3 int, col4 double")` runs successfully in Connect, while it fails in PySpark:

```
Traceback (most recent call last):
  File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/tests/connect/test_connect_basic.py", line 299, in test_with_local_list
    self.spark.createDataFrame([[1, 2, 3, 4]], schema="col1 int, col2 int, col3 int, col4 double")
  File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/session.py", line 1164, in createDataFrame
    return self._create_dataframe(
  File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/session.py", line 1206, in _create_dataframe
    rdd, struct = self._createFromLocal(map(prepare, data), schema)
  File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/session.py", line 850, in _createFromLocal
    data = list(data)
  File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/session.py", line 1180, in prepare
    verify_func(obj)
  File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/types.py", line 2003, in verify
    verify_value(obj)
  File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/types.py", line 1981, in verify_struct
    verifier(v)
  File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/types.py", line 2003, in verify
    verify_value(obj)
  File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/types.py", line 1997, in verify_default
    verify_acceptable_types(obj)
  File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/types.py", line 1873, in verify_acceptable_types
    raise TypeError(
TypeError: field col4: DoubleType() can not accept object 4 in type <class 'int'>
```
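To make the contrast concrete, here is a minimal pure-Python sketch of the two behaviours. It is a simplified model, not the code in `types.py` or in this PR: the names `ACCEPTABLE_TYPES`, `CASTS`, `verify_strict`, and `cast_lenient` are hypothetical, and only `int` and `double` columns are modelled.

```python
# Simplified, hypothetical model of the two behaviours discussed above.
# It does not import or reproduce PySpark; all names here are illustrative.

# Strict approach: each column type lists the Python types it accepts.
ACCEPTABLE_TYPES = {"int": (int,), "double": (float,)}

# Lenient approach: each column type has a conversion function.
CASTS = {"int": int, "double": float}


def verify_strict(row, schema):
    """Reject any value whose exact Python type is not listed for its column."""
    for value, (name, dtype) in zip(row, schema):
        if type(value) not in ACCEPTABLE_TYPES[dtype]:
            raise TypeError(
                f"field {name}: {dtype} can not accept object {value!r} "
                f"in type {type(value)}"
            )
    return list(row)


def cast_lenient(row, schema):
    """Convert every value to the Python type matching its column."""
    return [CASTS[dtype](value) for value, (_, dtype) in zip(row, schema)]


schema = [("col1", "int"), ("col2", "int"), ("col3", "int"), ("col4", "double")]
row = [1, 2, 3, 4]

# The lenient path converts the last value to a float.
casted = cast_lenient(row, schema)

# The strict path refuses the int supplied for the double column.
try:
    verify_strict(row, schema)
    strict_error = None
except TypeError as exc:
    strict_error = str(exc)
```

In this model the same row is accepted by the cast-based path and rejected by the verification-based path, which mirrors the Connect versus PySpark difference reported above.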
