Github user icexelloss commented on the issue:
https://github.com/apache/spark/pull/20163
I ran some experiments:
```
import datetime

from pyspark.sql.functions import udf, lit
from pyspark.sql.types import DateType, TimestampType

py_date = udf(datetime.date, DateType())
py_timestamp = udf(datetime.datetime, TimestampType())
```
These work correctly:
```
spark.range(1).select(py_date(lit(2017), lit(10), lit(30))).show()
spark.range(1).select(py_timestamp(lit(2017), lit(10), lit(30))).show()
```
Result:
```
+------------------+
|date(2017, 10, 30)|
+------------------+
| 2017-10-30|
+------------------+
+----------------------+
|datetime(2017, 10, 30)|
+----------------------+
| 2017-10-30 00:00:00|
+----------------------+
```
The change this PR proposes seems to coerce Python
`datetime.datetime` and `datetime.date` values to the Python string
representation rather than the Java one. We could call `str` on the return
value of the Python udf when the declared return type is a string type to get
the Python string representation, but this probably needs a microbenchmark to
assess the performance implications.
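A minimal sketch of what that coercion could look like, assuming a hypothetical wrapper applied at udf-creation time (the name `wrap_string_udf` and the surrounding logic are illustrative, not the actual code path in the PR):
```
import datetime

from pyspark.sql.types import StringType

def wrap_string_udf(f, return_type):
    """Illustrative only: coerce date/datetime results to their Python
    string representation when the declared return type is StringType."""
    if not isinstance(return_type, StringType):
        return f

    def wrapped(*args):
        result = f(*args)
        # str(datetime.date(2017, 10, 30))     -> '2017-10-30'
        # str(datetime.datetime(2017, 10, 30)) -> '2017-10-30 00:00:00'
        if isinstance(result, (datetime.date, datetime.datetime)):
            return str(result)
        return result

    return wrapped
```
Wrapping once at creation time keeps the per-row overhead to a single `isinstance` check, which is the cost a microbenchmark would need to measure.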