nvander1 opened a new pull request #24232: [SPARK-27297] [SQL] Add higher order 
functions to scala API
URL: https://github.com/apache/spark/pull/24232
 
 
   ## What changes were proposed in this pull request?
   
   There is currently no existing Scala API equivalent for the higher order 
functions introduced in Spark 2.4.0.
    * transform
    * aggregate
    * filter
    * exists
    * zip_with
    * map_zip_with
    * map_filter
    * transform_values
    * transform_keys
   
   Equivalent column based functions should be added to the Scala API for 
org.apache.spark.sql.functions with the following signatures:
   
    
   ```scala
   def transform(column: Column, f: Column => Column): Column = ???
   
   def transform(column: Column, f: (Column, Column) => Column): Column = ???
   
   def exists(column: Column, f: Column => Column): Column = ???
   
   def filter(column: Column, f: Column => Column): Column = ???
   
   def aggregate(
   expr: Column,
   zero: Column,
   merge: (Column, Column) => Column,
   finish: Column => Column): Column = ???
   
   def aggregate(
   expr: Column,
   zero: Column,
   merge: (Column, Column) => Column): Column = ???
   
   def zip_with(
   left: Column,
   right: Column,
   f: (Column, Column) => Column): Column = ???
   
   def transform_keys(expr: Column, f: (Column, Column) => Column): Column = ???
   
   def transform_values(expr: Column, f: (Column, Column) => Column): Column = 
???
   
   def map_filter(expr: Column, f: (Column, Column) => Column): Column = ???
   
   def map_zip_with(left: Column, right: Column, f: (Column, Column, Column) => 
Column): Column = ???
   ```
   
   ## How was this patch tested?
   
   I've mimicked the existing tests for the higher order functions in 
`org.apache.spark.sql.DataFrameFunctionsSuite` that use `expr` to test the 
higher order functions.
   
   As an example of an existing test: 
   ```scala
     test("map_zip_with function - map of primitive types") {
       val df = Seq(
         (Map(8 -> 6L, 3 -> 5L, 6 -> 2L), Map[Integer, Integer]((6, 4), (8, 2), 
(3, 2))),
         (Map(10 -> 6L, 8 -> 3L), Map[Integer, Integer]((8, 4), (4, null))),
         (Map.empty[Int, Long], Map[Integer, Integer]((5, 1))),
         (Map(5 -> 1L), null)
       ).toDF("m1", "m2")
   
       checkAnswer(df.selectExpr("map_zip_with(m1, m2, (k, v1, v2) -> k == v1 + 
v2)"),
         Seq(
           Row(Map(8 -> true, 3 -> false, 6 -> true)),
           Row(Map(10 -> null, 8 -> false, 4 -> null)),
           Row(Map(5 -> null)),
           Row(null)))
   }
   ```
   
   I've added this test that performs the same logic, but with the new column 
based API I've added.
   ```scala
       checkAnswer(df.select(map_zip_with(df("m1"), df("m2"), (k, v1, v2) => k 
=== v1 + v2)),
         Seq(
           Row(Map(8 -> true, 3 -> false, 6 -> true)),
           Row(Map(10 -> null, 8 -> false, 4 -> null)),
           Row(Map(5 -> null)),
           Row(null)))
   ```

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to