Github user maropu commented on a diff in the pull request:
https://github.com/apache/spark/pull/20214#discussion_r161131086
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -237,13 +237,18 @@ class Dataset[T] private[sql](
   private[sql] def showString(
       _numRows: Int, truncate: Int = 20, vertical: Boolean = false): String = {
     val numRows = _numRows.max(0).min(Int.MaxValue - 1)
-    val takeResult = toDF().take(numRows + 1)
+    val newDf = toDF()
+    val castExprs = newDf.schema.map { f => f.dataType match {
+      // Since binary types in top-level schema fields have a specific format to print,
+      // so we do not cast them to strings here.
+      case BinaryType => s"`${f.name}`"
+      case _: UserDefinedType[_] => s"`${f.name}`"
--- End diff ---
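(Side note on the comment in the diff above: top-level binary columns are left alone because showString already renders them in a dedicated hex format, while a cast to string would reinterpret the raw bytes as UTF-8 text. A rough illustration of the difference, not the exact Spark code:)
```
// How a binary cell is (roughly) rendered for show(): hex bytes in brackets.
val bytes: Array[Byte] = Array(0x01, 0x02, 0x41)
val hexRendering = bytes.map("%02X".format(_)).mkString("[", " ", "]")
// hexRendering == "[01 02 41]"

// Casting the same bytes to a string instead decodes them as UTF-8,
// which yields unprintable control characters followed by "A" here.
val utf8Rendering = new String(bytes, java.nio.charset.StandardCharsets.UTF_8)
```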
I added this entry to pass the existing PySpark tests, but we still hit weird
behaviour when casting user-defined types to strings:
```
>>> from pyspark.ml.classification import MultilayerPerceptronClassifier
>>> from pyspark.ml.linalg import Vectors
>>> df = spark.createDataFrame([(0.0, Vectors.dense([0.0, 0.0])), (1.0, Vectors.dense([0.0, 1.0]))], ["label", "features"])
>>> df.selectExpr("CAST(features AS STRING)").show(truncate = False)
+-------------------------------------------+
|features |
+-------------------------------------------+
|[6,1,0,0,2800000020,2,0,0,0] |
|[6,1,0,0,2800000020,2,0,0,3ff0000000000000]|
+-------------------------------------------+
```
This cast exposes the internal data structure of user-defined types.
I tried to fix this, but I don't think we easily can: in the codegen path,
Spark has no way to convert the internal data into the user-defined string
representation;
https://github.com/apache/spark/compare/master...maropu:CastUDTtoString#diff-258b71121d8d168e4d53cb5b6dc53ffeR844
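For reference, outside of codegen the conversion itself would be simple because the UDT instance is at hand; a rough sketch (illustrative names, not the code in the branch above):
```
import org.apache.spark.sql.types.UserDefinedType

// Illustrative sketch: map the Catalyst internal value back to the user-level
// object via the UDT and rely on its toString, which is the representation
// users expect to see (e.g. "[0.0,1.0]" for an ml Vector).
def udtValueToString(udt: UserDefinedType[_], internalValue: Any): String = {
  if (internalValue == null) null
  else udt.deserialize(internalValue).toString
}
```
The problem is doing the equivalent in generated code, where Cast would have to call into each concrete UDT's deserialize.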
WDYT? cc: @cloud-fan @ueshin
---