GitHub user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22655#discussion_r223218561
--- Diff: python/pyspark/sql/functions.py ---
@@ -2733,6 +2733,33 @@ def udf(f=None, returnType=StringType()):
| 8| JOHN DOE| 22|
+----------+--------------+------------+
"""
+
+ # The following table shows most of Python data and SQL type conversions in normal UDFs that
+ # are not yet visible to the user. Some of behaviors are buggy and might be changed in the near
+ # future. The table might have to be eventually documented externally.
+ # Please see SPARK-25666's PR to see the codes in order to generate the table below.
+ #
+ # +-----------------------------+--------------+----------+------+-------+------+----------+--------------------+-----------------------------+----------+----------------------+---------+--------------------+--------------+----------+--------------+-------------+-------------+  # noqa
+ # |SQL Type \ Python Value(Type)|None(NoneType)|True(bool)|1(int)|1(long)|a(str)|a(unicode)|    1970-01-01(date)|1970-01-01 00:00:00(datetime)|1.0(float)|array('i', [1])(array)|[1](list)|         (1,)(tuple)|ABC(bytearray)|1(Decimal)|{'a': 1}(dict)|Row(a=1)(Row)|Row(a=1)(Row)|  # noqa
+ # +-----------------------------+--------------+----------+------+-------+------+----------+--------------------+-----------------------------+----------+----------------------+---------+--------------------+--------------+----------+--------------+-------------+-------------+  # noqa
+ # |                         null|          None|      None|  None|   None|  None|      None|                None|                         None|      None|                  None|     None|                None|          None|      None|          None|            X|            X|  # noqa
+ # |                      boolean|          None|      True|  None|   None|  None|      None|                None|                         None|      None|                  None|     None|                None|          None|      None|          None|            X|            X|  # noqa
+ # |                      tinyint|          None|      None|     1|      1|  None|      None|                None|                         None|      None|                  None|     None|                None|          None|      None|          None|            X|            X|  # noqa
+ # |                     smallint|          None|      None|     1|      1|  None|      None|                None|                         None|      None|                  None|     None|                None|          None|      None|          None|            X|            X|  # noqa
+ # |                          int|          None|      None|     1|      1|  None|      None|                None|                         None|      None|                  None|     None|                None|          None|      None|          None|            X|            X|  # noqa
+ # |                       bigint|          None|      None|     1|      1|  None|      None|                None|                         None|      None|                  None|     None|                None|          None|      None|          None|            X|            X|  # noqa
+ # |                       string|          None|      true|     1|      1|     a|         a|java.util.Gregori...|         java.util.Gregori...|       1.0|           [I@7f1970e1|      [1]|[Ljava.lang.Objec...|   [B@284838a9|         1|         {a=1}|            X|            X|  # noqa
--- End diff --
Hmmm... I see the type is not clear here. Let me think about this a bit more.
`[B@284838a9` is quite buggy behaviour, and we should fix it. That is why I was thinking of documenting this internally, as a code comment: we already spent a lot of time figuring out how each case works individually (at https://github.com/apache/spark/pull/20163).
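For readers puzzling over the `string` row: several cells look like the JVM's default `java.lang.Object.toString()` output, i.e. `getClass().getName() + "@" + Integer.toHexString(hashCode())`, where `[B`, `[I` and `[Ljava.lang.Object;` are the JVM class names of `byte[]`, `int[]` and `Object[]`. Cells longer than 20 characters are then cut by `show()`'s default truncation (first 17 characters plus `...`), which is presumably why the date/datetime cells read `java.util.Gregori...`. The Python sketch below mimics both steps, assuming the JVM side simply calls `toString()` on the converted object. `JVM_ARRAY_CLASS_NAMES`, `java_default_to_string` and `show_truncate` are hypothetical helpers, not PySpark APIs, and `0x1b2c3d4` is an arbitrary hash code.

```python
# Hypothetical sketch (not PySpark code): reproduce how a JVM object that
# reached a StringType column could be rendered in the table above,
# ASSUMING the JVM side just calls toString() on the converted object.

# JVM class names (what Class.getName() returns) for a few array types.
JVM_ARRAY_CLASS_NAMES = {
    "byte[]": "[B",
    "int[]": "[I",
    "Object[]": "[Ljava.lang.Object;",
}


def java_default_to_string(class_name: str, hash_code: int) -> str:
    """Mimic java.lang.Object.toString(): class name + '@' + hex hash code.

    Integer.toHexString treats the int as unsigned 32-bit, hence the mask.
    """
    return f"{class_name}@{hash_code & 0xFFFFFFFF:x}"


def show_truncate(cell: str, truncate: int = 20) -> str:
    """Mimic DataFrame.show()'s default truncation of long cell values."""
    if len(cell) <= truncate:
        return cell
    if truncate < 4:
        # No room for an ellipsis: just cut the string.
        return cell[:truncate]
    return cell[: truncate - 3] + "..."


if __name__ == "__main__":
    # 0x284838a9 is the hash code that happened to appear in the table.
    as_bytes = java_default_to_string(JVM_ARRAY_CLASS_NAMES["byte[]"], 0x284838A9)
    print(show_truncate(as_bytes))  # [B@284838a9
    # Arbitrary hash code; the result is long enough to be truncated.
    as_objects = java_default_to_string(JVM_ARRAY_CLASS_NAMES["Object[]"], 0x1B2C3D4)
    print(show_truncate(as_objects))  # [Ljava.lang.Objec...
```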