Github user maropu commented on a diff in the pull request:
https://github.com/apache/spark/pull/20214#discussion_r161131086
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -237,13 +237,18 @@ class Dataset[T] private[sql](
   private[sql] def showString(
       _numRows: Int, truncate: Int = 20, vertical: Boolean = false): String = {
     val numRows = _numRows.max(0).min(Int.MaxValue - 1)
-    val takeResult = toDF().take(numRows + 1)
+    val newDf = toDF()
+    val castExprs = newDf.schema.map { f => f.dataType match {
+      // Since binary types in top-level schema fields have a specific format to print,
+      // so we do not cast them to strings here.
+      case BinaryType => s"`${f.name}`"
+      case _: UserDefinedType[_] => s"`${f.name}`"
--- End diff ---
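(Side note on the comment in the diff above: top-level binary columns are left alone because showString already renders them in a dedicated hex format, while a cast to string would reinterpret the raw bytes as UTF-8 text. A rough illustration of the difference, not the exact Spark code:)
```
// How a binary cell is (roughly) rendered for show(): hex bytes in brackets.
val bytes: Array[Byte] = Array(0x01, 0x02, 0x41)
val hexRendering = bytes.map("%02X".format(_)).mkString("[", " ", "]")
// hexRendering == "[01 02 41]"

// Casting the same bytes to a string instead decodes them as UTF-8,
// which yields unprintable control characters followed by "A" here.
val utf8Rendering = new String(bytes, java.nio.charset.StandardCharsets.UTF_8)
```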
I added this entry to pass the existing PySpark tests, but we still hit weird
behaviour when casting user-defined types to strings:
```
>>> from pyspark.ml.classification import MultilayerPerceptronClassifier
>>> from pyspark.ml.linalg import Vectors
>>> df = spark.createDataFrame([(0.0, Vectors.dense([0.0, 0.0])), (1.0, Vectors.dense([0.0, 1.0]))], ["label", "features"])
>>> df.selectExpr("CAST(features AS STRING)").show(truncate = False)
+-------------------------------------------+
|features |
+-------------------------------------------+
|[6,1,0,0,2800000020,2,0,0,0] |
|[6,1,0,0,2800000020,2,0,0,3ff0000000000000]|
+-------------------------------------------+
```
This cast exposes the internal data structure of user-defined types.
I tried to fix this, but I don't think we easily can: in the codegen path,
Spark has no way to convert the internal data into the user-defined string
representation;
https://github.com/apache/spark/compare/master...maropu:CastUDTtoString#diff-258b71121d8d168e4d53cb5b6dc53ffeR844
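For reference, outside of codegen the conversion itself would be simple because the UDT instance is at hand; a rough sketch (illustrative names, not the code in the branch above):
```
import org.apache.spark.sql.types.UserDefinedType

// Illustrative sketch: map the Catalyst internal value back to the user-level
// object via the UDT and rely on its toString, which is the representation
// users expect to see (e.g. "[0.0,1.0]" for an ml Vector).
def udtValueToString(udt: UserDefinedType[_], internalValue: Any): String = {
  if (internalValue == null) null
  else udt.deserialize(internalValue).toString
}
```
The problem is doing the equivalent in generated code, where Cast would have to call into each concrete UDT's deserialize.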
WDYT? cc: @cloud-fan @ueshin
---