Github user icexelloss commented on the issue:
https://github.com/apache/spark/pull/20163
I ran some experiments:
```
import datetime

from pyspark.sql.functions import udf, lit
from pyspark.sql.types import DateType, TimestampType

py_date = udf(datetime.date, DateType())
py_timestamp = udf(datetime.datetime, TimestampType())
```
These work correctly:
```
spark.range(1).select(py_date(lit(2017), lit(10), lit(30))).show()
spark.range(1).select(py_timestamp(lit(2017), lit(10), lit(30))).show()
```
Result:
```
+------------------+
|date(2017, 10, 30)|
+------------------+
| 2017-10-30|
+------------------+
+----------------------+
|datetime(2017, 10, 30)|
+----------------------+
| 2017-10-30 00:00:00|
+----------------------+
```
The change this PR proposes seems to coerce Python
`datetime.datetime` and `datetime.date` values to the Python string
representation rather than the Java one. We could call `str` on the return
value of the Python udf when the declared return type is a string type to get
the Python string representation, but this probably needs a microbenchmark to
assess the performance implications.
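A minimal sketch of what that coercion could look like, assuming a hypothetical wrapper applied at udf-creation time (the name `wrap_string_udf` and the surrounding logic are illustrative, not the actual code path in the PR):
```
import datetime

from pyspark.sql.types import StringType

def wrap_string_udf(f, return_type):
    """Illustrative only: coerce date/datetime results to their Python
    string representation when the declared return type is StringType."""
    if not isinstance(return_type, StringType):
        return f

    def wrapped(*args):
        result = f(*args)
        # str(datetime.date(2017, 10, 30))     -> '2017-10-30'
        # str(datetime.datetime(2017, 10, 30)) -> '2017-10-30 00:00:00'
        if isinstance(result, (datetime.date, datetime.datetime)):
            return str(result)
        return result

    return wrapped
```
Wrapping once at creation time keeps the per-row overhead to a single `isinstance` check, which is the cost a microbenchmark would need to measure.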