zhengruifeng commented on PR #38979:
URL: https://github.com/apache/spark/pull/38979#issuecomment-1343806000

   Difference in casting:
   this PR uses `Dataset.to(schema)` to cast data types. PySpark takes a very different approach: it verifies each value's Python type against [the `_acceptable_types` list](https://github.com/apache/spark/blob/master/python/pyspark/sql/types.py#L1755-L1775).
   
   `createDataFrame([[1, 2, 3, 4]], schema="col1 int, col2 int, col3 int, col4 double")` runs successfully in Connect, but fails in PySpark:
   
   
   ```
   Traceback (most recent call last):
     File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/tests/connect/test_connect_basic.py", line 299, in test_with_local_list
       self.spark.createDataFrame([[1, 2, 3, 4]], schema="col1 int, col2 int, col3 int, col4 double")
     File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/session.py", line 1164, in createDataFrame
       return self._create_dataframe(
     File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/session.py", line 1206, in _create_dataframe
       rdd, struct = self._createFromLocal(map(prepare, data), schema)
     File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/session.py", line 850, in _createFromLocal
       data = list(data)
     File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/session.py", line 1180, in prepare
       verify_func(obj)
     File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/types.py", line 2003, in verify
       verify_value(obj)
     File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/types.py", line 1981, in verify_struct
       verifier(v)
     File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/types.py", line 2003, in verify
       verify_value(obj)
     File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/types.py", line 1997, in verify_default
       verify_acceptable_types(obj)
     File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/types.py", line 1873, in verify_acceptable_types
       raise TypeError(
   TypeError: field col4: DoubleType() can not accept object 4 in type <class 'int'>
   ```
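
For illustration, here is a minimal pure-Python sketch of the kind of check that seems to produce the error above. It is not the PySpark source: `ACCEPTABLE_TYPES`, `verify_acceptable` and `verify_row` are hypothetical names, and the table only covers three types. It assumes the check compares the exact Python type of each value against a per-type tuple, so an `int` is rejected for a `double` column and is never cast.

```python
# Simplified, hypothetical stand-in for PySpark's per-type acceptance check.
# The real table is `_acceptable_types` in pyspark/sql/types.py; the names
# and entries below are illustrative only.
ACCEPTABLE_TYPES = {
    "int": (int,),
    "double": (float,),
    "string": (str,),
}


def verify_acceptable(field, type_name, value):
    """Raise TypeError unless the exact Python type of `value` is listed."""
    if value is not None and type(value) not in ACCEPTABLE_TYPES[type_name]:
        raise TypeError(
            f"field {field}: {type_name} can not accept object {value!r} "
            f"in type {type(value)}"
        )


def verify_row(row, schema):
    """Check each value of `row` against a list of (field, type_name) pairs."""
    for value, (field, type_name) in zip(row, schema):
        verify_acceptable(field, type_name, value)


schema = [("col1", "int"), ("col2", "int"), ("col3", "int"), ("col4", "double")]

# Passes: 4.0 is a float, which the sketch lists for "double".
verify_row([1, 2, 3, 4.0], schema)

# Fails: 4 is an int, and the check does not cast it to float.
try:
    verify_row([1, 2, 3, 4], schema)
except TypeError as exc:
    error_message = str(exc)
```

Under this assumption, passing `4.0` instead of `4` for `col4` would satisfy the PySpark-style check, whereas a cast-based approach such as the one in this PR accepts both.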
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

