GitHub user maropu opened a pull request:
https://github.com/apache/spark/pull/20246
[SPARK-23054][SQL] Fix incorrect results of casting UserDefinedType to String
## What changes were proposed in this pull request?
This PR fixes incorrect results when casting `UserDefinedType` values to strings. Before this change, the cast produced the following:
```
>>> from pyspark.ml.classification import MultilayerPerceptronClassifier
>>> from pyspark.ml.linalg import Vectors
>>> df = spark.createDataFrame([(0.0, Vectors.dense([0.0, 0.0])), (1.0, Vectors.dense([0.0, 1.0]))], ["label", "features"])
>>> df.selectExpr("CAST(features AS STRING)").show(truncate = False)
+-------------------------------------------+
|features                                   |
+-------------------------------------------+
|[6,1,0,0,2800000020,2,0,0,0]               |
|[6,1,0,0,2800000020,2,0,0,3ff0000000000000]|
+-------------------------------------------+
```
With this PR, the same query returns:
```
+---------+
|features |
+---------+
|[0.0,0.0]|
|[0.0,1.0]|
+---------+
```
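As a side note for readers comparing the two outputs: the last word in the second row of the old output, `3ff0000000000000`, matches the raw IEEE 754 bit pattern of the double `1.0`, and the trailing `0` in the first row matches `0.0`. This suggests (as an interpretation, not something stated in the patch) that the old cast printed the internal binary row word by word instead of the vector's values. The helper `double_to_hex_bits` below is a hypothetical name used only for this check in plain Python; it is not part of Spark or of this change.

```python
import struct


def double_to_hex_bits(value: float) -> str:
    """Return the 64-bit IEEE 754 pattern of a double as hex, without leading zeros."""
    # Pack the float as a big-endian double, then reinterpret the same
    # 8 bytes as an unsigned 64-bit integer.
    (bits,) = struct.unpack(">Q", struct.pack(">d", value))
    return format(bits, "x")


print(double_to_hex_bits(1.0))  # 3ff0000000000000
print(double_to_hex_bits(0.0))  # 0
```

Both printed values line up with the final entries of the two rows in the old output, which is consistent with the fixed output showing `[0.0,0.0]` and `[0.0,1.0]`.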
## How was this patch tested?
Added tests in `UserDefinedTypeSuite`.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/maropu/spark SPARK-23054
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/20246.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #20246
----
commit 137d85f23fa8d0e45144db89666f4c9083d14100
Author: Takeshi Yamamuro <yamamuro@...>
Date: 2018-01-12T02:45:42Z
Cast user-defined data into strings
----