Github user ssimeonov commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21840#discussion_r204245778

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Column.scala ---
    @@ -1234,6 +1234,8 @@ class Column(val expr: Expression) extends Logging {
        */
      def over(): Column = over(Window.spec)

    +  def copy(field: String, value: Column): Column = withExpr(StructCopy(expr, field, value.expr))
    --- End diff --

Some things to consider about the API:

- How is custom metadata associated with the updated field?
- How can a field be deleted?
- How can a field be added?
- When a field is added, where does it go in the schema? The only logical place is at the end, but that may not be what's desired in some cases.

Simply for discussion purposes (overloaded methods are not shown):

```scala
class Column(val expr: Expression) extends Logging {
  // ...

  // matches Dataset.schema semantics; errors on non-struct columns
  def schema: StructType

  // matches Dataset.select() semantics; errors on non-struct columns
  // '* support allows multiple new fields to be added easily, saving cumbersome repeated withColumn() calls
  def select(cols: Column*): Column

  // matches Dataset.withColumn() semantics of add or replace
  def withColumn(colName: String, col: Column): Column

  // matches Dataset.drop() semantics
  def drop(colName: String): Column
}
```

The benefit of the above API is that it unifies manipulating top-level and nested columns, which I would argue is very desirable. The addition of `schema` and `select()` allows for nested field reordering, casting, etc., which is important in data exchange scenarios where field position matters.

/cc @rxin
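For illustration only, here is a minimal usage sketch. The proposed `Column.withColumn` / `drop` / `select` calls are shown in comments because they do not exist in Spark today; the DataFrame `df`, the struct column `address`, and its fields are made-up examples. The uncommented code is ordinary Spark 2.x and shows the manual struct rebuild that the proposal would avoid.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

// Hypothetical example data: a struct column `address` with fields `street` and `zip`.
val df = Seq(("a1", "Main St", "02134")).toDF("id", "street", "zip")
  .select($"id", struct($"street", $"zip").as("address"))

// With the sketched API (hypothetical, not existing Spark), nested edits would mirror Dataset semantics:
// df.withColumn("address", $"address".withColumn("zip", lit("94105")))          // add or replace a field
// df.withColumn("address", $"address".drop("street"))                           // delete a field
// df.withColumn("address", $"address".select($"*", lit("USA").as("country")))   // add fields / reorder / cast

// Today, the same single-field update requires rebuilding the whole struct by hand:
val updated = df.withColumn(
  "address",
  struct($"address.street".as("street"), lit("94105").as("zip"))
)
updated.printSchema()
```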