Github user viirya commented on a diff in the pull request:
https://github.com/apache/spark/pull/20163#discussion_r160017637
--- Diff: python/pyspark/sql/udf.py ---
@@ -26,6 +26,28 @@
def _wrap_function(sc, func, returnType):
+    def coerce_to_str(v):
+        import datetime
+        if type(v) == datetime.date or type(v) == datetime.datetime:
+            return str(v)
+        else:
+            return v
+
+    # Pyrolite will unpickle both Python datetime.date and datetime.datetime objects
+    # into java.util.Calendar objects, so the type information on the Python side is lost.
+    # This is problematic when Spark SQL needs to cast such objects into Spark SQL string type,
+    # because the format of the string should be different, depending on the type of the input
+    # object. So for those two specific types we eagerly convert them to string here, where the
+    # Python type information is still intact.
+    if returnType == StringType():
--- End diff --
I have a question: why do we need to handle this type conversion here? If we expect a
particular string format, isn't it more reasonable to convert the date/datetime to a
string inside the udf itself, instead of adding this conversion implicitly?
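For illustration only, a minimal sketch of the explicit alternative being suggested
(the udf name and column usage below are hypothetical, not part of this PR):

```python
import datetime
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

# Hypothetical sketch: the udf author converts the date/datetime to a string
# before returning it, so no implicit coercion is needed in _wrap_function.
@udf(returnType=StringType())
def today_as_string():
    # str() on datetime.date yields 'YYYY-MM-DD'; on datetime.datetime it also
    # includes the time part, so the udf author controls which format is produced.
    return str(datetime.date.today())

# Usage (column name made up for illustration):
# df.withColumn("today", today_as_string())
```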
---