Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21944#discussion_r207182155
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
    @@ -1367,6 +1367,22 @@ class Dataset[T] private[sql](
         }: _*)
       }
     
    +  /**
    +   * Casts all the values of the current Dataset following the types of a specific StructType.
    +   * This method also works with nested StructTypes.
    +   *
    +   *  @group typedrel
    +   *  @since 2.4.0
    +   */
    +  def castBySchema(schema: StructType): DataFrame = {
    +    assert(schema.fields.map(_.name).toList.sameElements(this.schema.fields.map(_.name).toList),
    +      "schema should have the same fields as the original schema")
    +
    +    selectExpr(schema.map(
    --- End diff ---
    
    Most one-liner APIs should have been considered more carefully before being added - this doesn't mean we are okay with adding more one-liner APIs.
    
    You can just define it on the application side and use it there; it's pretty easy and simple, and you can answer questions about casting that way. I don't think it's worth adding, since there are already too many public APIs.
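    
    For reference, a minimal application-side sketch of the same idea (the standalone helper name, the `require` check, and the use of `select` instead of the PR's `selectExpr` are illustrative choices, not the PR's exact code):
    
        import org.apache.spark.sql.DataFrame
        import org.apache.spark.sql.types.StructType
    
        // Hypothetical helper: cast each column of `df` to the type declared
        // in `schema`, assuming both have the same field names in the same order.
        def castBySchema(df: DataFrame, schema: StructType): DataFrame = {
          require(schema.fields.map(_.name).sameElements(df.schema.fields.map(_.name)),
            "schema should have the same fields as the original schema")
          // Column.cast accepts any DataType, so nested StructTypes work as well.
          df.select(schema.fields.map(f => df.col(f.name).cast(f.dataType)): _*)
        }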

