Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20163#discussion_r160017637
  
    --- Diff: python/pyspark/sql/udf.py ---
    @@ -26,6 +26,28 @@
     
     
     def _wrap_function(sc, func, returnType):
    +    def coerce_to_str(v):
    +        import datetime
    +        if type(v) == datetime.date or type(v) == datetime.datetime:
    +            return str(v)
    +        else:
    +            return v
    +
    +    # Pyrolite will unpickle both Python datetime.date and datetime.datetime objects
    +    # into java.util.Calendar objects, so the type information on the Python side is lost.
    +    # This is problematic when Spark SQL needs to cast such objects into Spark SQL string type,
    +    # because the format of the string should be different, depending on the type of the input
    +    # object. So for those two specific types we eagerly convert them to string here, where the
    +    # Python type information is still intact.
    +    if returnType == StringType():
    --- End diff ---
    
    I have a question: why do we need to handle this type conversion here? If we expect a correctly formatted string, isn't it more reasonable for the user to convert the date/datetime to a string inside the UDF itself, instead of adding this conversion implicitly?
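    
    For illustration, a minimal sketch of the alternative I mean: the UDF returns a string explicitly, so no implicit coercion in `_wrap_function` is needed (the function and column names here are hypothetical, not part of this PR):
    
    ```python
    import datetime
    
    from pyspark.sql.functions import udf
    from pyspark.sql.types import StringType
    
    # Hypothetical UDF: the conversion to string happens inside the UDF body,
    # so the datetime object never needs to be coerced implicitly on return.
    @udf(returnType=StringType())
    def format_event_date(days_since_epoch):
        d = datetime.date(1970, 1, 1) + datetime.timedelta(days=days_since_epoch)
        return str(d)  # e.g. '2018-01-05'
    
    # Usage (assuming a DataFrame `df` with an integer column `days_since_epoch`):
    # df.withColumn("event_date_str", format_event_date(df["days_since_epoch"]))
    ```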


---
