Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/21732
  
    Yeah. The encoding rules regarding `Option` are:
    
    1. For non `Product` types, no change is made.
    ```
    Option[Int] in normal encoder -> a nullable int column
    Option[Int] in agg encoder -> a nullable int column
    Option[Int] in tupled encoder -> a nullable int column
    ```
    
    E.g.,
    
    ```scala
    val df = Seq(Some(1), Some(2), None).toDF()
    df.printSchema
    
    root
     |-- value: integer (nullable = true)
    
    val df2 = Seq((1, Some(1)), (2, Some(2)), (3, None)).toDF()
    df2.printSchema
    
    root
     |-- _1: integer (nullable = false)
     |-- _2: integer (nullable = true)
    
    ```
    
    
    2. For `Product` types, we don't support it at top level and aggregator for 
now. After this change, we can support it in normal encoder and aggregator.
    
    ```
    Option[Product] in normal encoder => a nullable struct column
    ```
    
    E.g.,
    
    ```scala
    val df3 = Seq(Some(1 -> "a"), Some(2 -> "b"), None).toDF()
    df3.printSchema
    
    root
     |-- value: struct (nullable = true)
     |    |-- _1: integer (nullable = false)
     |    |-- _2: string (nullable = true)
    ```
    
    ```
    Option[Product] in agg encoder => a nullable struct column // topLevel = 
false
    Option[Product] in agg encoder => a nullable struct column nested in a 
struct column // topLevel = true
    ```
    
    ```scala
    val df4 = Seq(
        OptionBooleanIntData("bob", Some((true, 1))),
        OptionBooleanIntData("bob", Some((false, 2))),
        OptionBooleanIntData("bob", None)).toDF()
    
    val group = df4
        .groupBy("name")
        .agg(OptionBooleanIntAggregator("isGood").toColumn.alias("isGood"))
    group.printSchema
    
    // topLevel = false
    root                                                  
     |-- name: string (nullable = true)                          
     |-- isGood: struct (nullable = true)
     |    |-- _1: boolean (nullable = false)
     |    |-- _2: integer (nullable = false)
    
    // topLevel = true
    root                    
     |-- name: string (nullable = true)                   
     |-- isGood: struct (nullable = true)                        
     |    |-- value: struct (nullable = true)
     |    |    |-- _1: boolean (nullable = false)
     |    |    |-- _2: integer (nullable = false)
    ```
    
    3. For `Option[Product]` in tupled encoder, no change is made too:
    
    ```scala
    val df5 = Seq((1, Some(1 -> "a")), (2, Some(2 -> "b")), (3, None)).toDF()
    df5.printSchema
    
    root
     |-- _1: integer (nullable = false)
     |-- _2: struct (nullable = true)
     |    |-- _1: integer (nullable = false)
     |    |-- _2: string (nullable = true)
    
    ```


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

Reply via email to