[spark] branch branch-3.0 updated: [SPARK-31319][SQL][DOCS] Document UDFs/UDAFs in SQL Reference

srowen Sun, 12 Apr 2020 21:40:45 -0700

This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git



The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new 8f7a1a9  [SPARK-31319][SQL][DOCS] Document UDFs/UDAFs in SQL Reference
8f7a1a9 is described below

commit 8f7a1a9ce1edc2c5ad64fd65f8ee98480bc5829e
Author: Huaxin Gao <huax...@us.ibm.com>
AuthorDate: Sun Apr 12 23:38:17 2020 -0500

    [SPARK-31319][SQL][DOCS] Document UDFs/UDAFs in SQL Reference
    
    ### What changes were proposed in this pull request?
    Document UDF in SQL Reference
    
    ### Why are the changes needed?
    To make SQL Reference complete.
    
    ### Does this PR introduce any user-facing change?
    Yes. Here are the new pages:
    <img width="1050" alt="Screen Shot 2020-04-09 at 5 06 42 PM" src="https://user-images.githubusercontent.com/13592258/78950977-585dc200-7a85-11ea-875c-ce14c3795e0f.png">
    
    <img width="1049" alt="Screen Shot 2020-04-09 at 5 07 06 PM" src="https://user-images.githubusercontent.com/13592258/78950979-5b58b280-7a85-11ea-81f3-bd5d91bd07e3.png">
    
    <img width="1049" alt="Screen Shot 2020-04-09 at 5 07 26 PM" src="https://user-images.githubusercontent.com/13592258/78950985-5e53a300-7a85-11ea-86be-f63152c1501b.png">
    
    <img width="1051" alt="Screen Shot 2020-04-09 at 5 07 54 PM" src="https://user-images.githubusercontent.com/13592258/78950991-63185700-7a85-11ea-9379-8da46cfc434c.png">
    
    <img width="1060" alt="Screen Shot 2020-04-09 at 5 08 17 PM" src="https://user-images.githubusercontent.com/13592258/78950994-657ab100-7a85-11ea-8b34-d2c87f94b03b.png">
    
    <img width="1050" alt="Screen Shot 2020-04-09 at 5 09 27 PM" src="https://user-images.githubusercontent.com/13592258/78951001-6875a180-7a85-11ea-874e-8abd14a3d3d3.png">
    
    <img width="1060" alt="Screen Shot 2020-04-09 at 5 10 00 PM" src="https://user-images.githubusercontent.com/13592258/78951005-6f041900-7a85-11ea-9e57-520eb8db59ec.png">
    
    <img width="1049" alt="Screen Shot 2020-04-09 at 5 11 10 PM" src="https://user-images.githubusercontent.com/13592258/78951014-73303680-7a85-11ea-93ab-32d68d2e2d59.png">
    
    <img width="1050" alt="Screen Shot 2020-04-09 at 5 11 41 PM" src="https://user-images.githubusercontent.com/13592258/78951019-75929080-7a85-11ea-9d3b-600e8e157c05.png">
    
    <img width="1050" alt="Screen Shot 2020-04-09 at 5 16 22 PM" src="https://user-images.githubusercontent.com/13592258/78951137-dfab3580-7a85-11ea-8512-c6b660aa271e.png">
    
    <img width="1050" alt="Screen Shot 2020-04-09 at 5 22 15 PM" src="https://user-images.githubusercontent.com/13592258/78951466-22214200-7a87-11ea-93dd-6e36492421f1.png">
    
    <img width="1049" alt="Screen Shot 2020-04-09 at 5 22 46 PM" src="https://user-images.githubusercontent.com/13592258/78951469-24839c00-7a87-11ea-93a9-fe30d689adbd.png">
    
    <img width="1050" alt="Screen Shot 2020-04-09 at 5 23 08 PM" src="https://user-images.githubusercontent.com/13592258/78951472-26e5f600-7a87-11ea-84db-087a3528aa53.png">
    
    <img width="1050" alt="Screen Shot 2020-04-09 at 5 23 34 PM" src="https://user-images.githubusercontent.com/13592258/78951474-29e0e680-7a87-11ea-8be4-2a5be1bc3788.png">
    
    <img width="1049" alt="Screen Shot 2020-04-09 at 5 23 57 PM" src="https://user-images.githubusercontent.com/13592258/78951481-2cdbd700-7a87-11ea-8894-0a39abf54a3b.png">
    
    <img width="1050" alt="Screen Shot 2020-04-09 at 5 24 15 PM" src="https://user-images.githubusercontent.com/13592258/78951483-2f3e3100-7a87-11ea-8845-ffebf89d7898.png">
    
    ### How was this patch tested?
    Manually build and check
    
    Closes #28087 from huaxingao/udf.
    
    Authored-by: Huaxin Gao <huax...@us.ibm.com>
    Signed-off-by: Sean Owen <sro...@gmail.com>
    (cherry picked from commit 3bbd80dbc3c30d05aa01d8369e1ff90a41850d26)
    Signed-off-by: Sean Owen <sro...@gmail.com>
---
 docs/sql-getting-started.md                        | 29 +------
 docs/sql-ref-functions-udf-aggregate.md            | 89 ++++++++++++++++++--
 docs/sql-ref-functions-udf-scalar.md               | 49 +++++++++--
 .../spark/examples/sql/JavaUserDefinedScalar.java  | 98 ++++++++++++++++++++++
 .../spark/examples/sql/UserDefinedScalar.scala     | 80 ++++++++++++++++++
 5 files changed, 309 insertions(+), 36 deletions(-)
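
The new UDAF page in this commit documents the `zero`, `reduce`, `merge` and `finish` methods of `Aggregator[-IN, BUF, OUT]`. As a rough illustration of how those four methods fit together, here is a minimal plain-Java sketch that does not use Spark at all. `SimpleAggregator`, `Average`, `MY_AVERAGE` and `aggregate` are hypothetical names invented for this note, the encoder methods are left out, and the driver loop is only an assumption about how partial results from separate partitions might be combined.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the documented contract; this is NOT Spark's Aggregator class.
public class AggregatorSketch {

  // Mirrors the four value-level methods listed for Aggregator[-IN, BUF, OUT].
  public interface SimpleAggregator<IN, BUF, OUT> {
    BUF zero();                  // initial intermediate value
    BUF reduce(BUF b, IN a);     // fold one input value into the buffer
    BUF merge(BUF b1, BUF b2);   // combine two intermediate values
    OUT finish(BUF reduction);   // turn the final buffer into the output
  }

  // Intermediate buffer for an average: running sum and count.
  public static final class Average {
    long sum;
    long count;

    Average(long sum, long count) {
      this.sum = sum;
      this.count = count;
    }
  }

  public static final SimpleAggregator<Long, Average, Double> MY_AVERAGE =
      new SimpleAggregator<Long, Average, Double>() {
        public Average zero() {
          return new Average(0L, 0L);
        }

        // Modifies and returns b instead of allocating a new buffer.
        public Average reduce(Average b, Long a) {
          b.sum += a;
          b.count += 1;
          return b;
        }

        public Average merge(Average b1, Average b2) {
          b1.sum += b2.sum;
          b1.count += b2.count;
          return b1;
        }

        public Double finish(Average reduction) {
          return ((double) reduction.sum) / reduction.count;
        }
      };

  // Reduces each partition independently, merges the partial buffers, then finishes.
  public static <IN, BUF, OUT> OUT aggregate(
      SimpleAggregator<IN, BUF, OUT> agg, List<List<IN>> partitions) {
    BUF merged = agg.zero();
    for (List<IN> partition : partitions) {
      BUF partial = agg.zero();
      for (IN value : partition) {
        partial = agg.reduce(partial, value);
      }
      merged = agg.merge(merged, partial);
    }
    return agg.finish(merged);
  }

  public static void main(String[] args) {
    List<List<Long>> partitions = Arrays.asList(
        Arrays.asList(3000L, 4500L),
        Arrays.asList(3500L, 4000L));
    System.out.println(aggregate(MY_AVERAGE, partitions)); // prints 3750.0
  }
}
```

The sketch keeps `reduce` and `merge` mutating their first argument, which matches the performance note in the documented `reduce` description; a real Spark aggregator would additionally supply `bufferEncoder` and `outputEncoder`.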

diff --git a/docs/sql-getting-started.md b/docs/sql-getting-started.md
index 9df0f76..fc0a5d0 100644
--- a/docs/sql-getting-started.md
+++ b/docs/sql-getting-started.md
@@ -356,31 +356,8 @@ aggregations such as `count()`, `countDistinct()`, `avg()`, `max()`, `min()`, et
 While those functions are designed for DataFrames, Spark SQL also has type-safe versions for some of them in
 [Scala](api/scala/org/apache/spark/sql/expressions/scalalang/typed$.html) and
 [Java](api/java/org/apache/spark/sql/expressions/javalang/typed.html) to work with strongly typed Datasets.
-Moreover, users are not limited to the predefined aggregate functions and can create their own.
+Moreover, users are not limited to the predefined aggregate functions and can create their own. For more details
+about user defined aggregate functions, please refer to the documentation of
+[User Defined Aggregate Functions](sql-ref-functions-udf-aggregate.html).
 
-### Type-Safe User-Defined Aggregate Functions
 
-User-defined aggregations for strongly typed Datasets revolve around the [Aggregator](api/scala/org/apache/spark/sql/expressions/Aggregator.html) abstract class.
-For example, a type-safe user-defined average can look like:
-
-<div class="codetabs">
-<div data-lang="scala"  markdown="1">
-{% include_example typed_custom_aggregation scala/org/apache/spark/examples/sql/UserDefinedTypedAggregation.scala%}
-</div>
-<div data-lang="java"  markdown="1">
-{% include_example typed_custom_aggregation java/org/apache/spark/examples/sql/JavaUserDefinedTypedAggregation.java%}
-</div>
-</div>
-
-### Untyped User-Defined Aggregate Functions
-Typed aggregations, as described above, may also be registered as untyped aggregating UDFs for use with DataFrames.
-For example, a user-defined average for untyped DataFrames can look like:
-
-<div class="codetabs">
-<div data-lang="scala"  markdown="1">
-{% include_example untyped_custom_aggregation scala/org/apache/spark/examples/sql/UserDefinedUntypedAggregation.scala%}
-</div>
-<div data-lang="java"  markdown="1">
-{% include_example untyped_custom_aggregation java/org/apache/spark/examples/sql/JavaUserDefinedUntypedAggregation.java%}
-</div>
-</div>
diff --git a/docs/sql-ref-functions-udf-aggregate.md b/docs/sql-ref-functions-udf-aggregate.md
index 49c7b58..3d8a64e 100644
--- a/docs/sql-ref-functions-udf-aggregate.md
+++ b/docs/sql-ref-functions-udf-aggregate.md
@@ -1,7 +1,7 @@
 ---
 layout: global
-title: User defined Aggregate Functions (UDAF)
-displayTitle: User defined Aggregate Functions (UDAF)
+title: User Defined Aggregate Functions (UDAFs)
+displayTitle: User Defined Aggregate Functions (UDAFs)
 license: |
   Licensed to the Apache Software Foundation (ASF) under one or more
   contributor license agreements.  See the NOTICE file distributed with
@@ -9,9 +9,9 @@ license: |
   The ASF licenses this file to You under the Apache License, Version 2.0
   (the "License"); you may not use this file except in compliance with
   the License.  You may obtain a copy of the License at
- 
+
      http://www.apache.org/licenses/LICENSE-2.0
- 
+
   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
@@ -19,4 +19,83 @@ license: |
   limitations under the License.
 ---
 
-**This page is under construction**
+### Description
+
+User-Defined Aggregate Functions (UDAFs) are user-programmable routines that act on multiple rows at once and return a single aggregated value as a result. This documentation lists the classes that are required for creating and registering UDAFs. It also contains examples that demonstrate how to define and register UDAFs in Scala and invoke them in Spark SQL.
+
+### Aggregator[-IN, BUF, OUT]
+
+A base class for user-defined aggregations, which can be used in Dataset operations to take all of the elements of a group and reduce them to a single value.
+
+ * IN - The input type for the aggregation.
+ * BUF - The type of the intermediate value of the reduction.
+ * OUT - The type of the final output result.
+
+<dl>
+  <dt><code><em>bufferEncoder: Encoder[BUF]</em></code></dt>
+  <dd>
+    Specifies the Encoder for the intermediate value type.
+  </dd>
+</dl>
+<dl>
+  <dt><code><em>finish(reduction: BUF): OUT</em></code></dt>
+  <dd>
+    Transform the output of the reduction.
+  </dd>
+</dl>
+<dl>
+  <dt><code><em>merge(b1: BUF, b2: BUF): BUF</em></code></dt>
+  <dd>
+    Merge two intermediate values.
+  </dd>
+</dl>
+<dl>
+  <dt><code><em>outputEncoder: Encoder[OUT]</em></code></dt>
+  <dd>
+    Specifies the Encoder for the final output value type.
+  </dd>
+</dl>
+<dl>
+  <dt><code><em>reduce(b: BUF, a: IN): BUF</em></code></dt>
+  <dd>
+     Aggregate input value <code>a</code> into current intermediate value. For performance, the function may modify <code>b</code> and return it instead of constructing a new object for <code>b</code>.
+  </dd>
+</dl>
+<dl>
+  <dt><code><em>zero: BUF</em></code></dt>
+  <dd>
+    The initial value of the intermediate result for this aggregation.
+  </dd>
+</dl>
+
+### Examples
+
+#### Type-Safe User-Defined Aggregate Functions
+
+User-defined aggregations for strongly typed Datasets revolve around the [Aggregator](api/scala/org/apache/spark/sql/expressions/Aggregator.html) abstract class.
+For example, a type-safe user-defined average can look like:
+<div class="codetabs">
+<div data-lang="scala"  markdown="1">
+  {% include_example typed_custom_aggregation scala/org/apache/spark/examples/sql/UserDefinedTypedAggregation.scala%}
+</div>
+<div data-lang="java"  markdown="1">
+  {% include_example typed_custom_aggregation java/org/apache/spark/examples/sql/JavaUserDefinedTypedAggregation.java%}
+</div>
+</div>
+
+#### Untyped User-Defined Aggregate Functions
+
+Typed aggregations, as described above, may also be registered as untyped aggregating UDFs for use with DataFrames.
+For example, a user-defined average for untyped DataFrames can look like:
+<div class="codetabs">
+<div data-lang="scala"  markdown="1">
+  {% include_example untyped_custom_aggregation scala/org/apache/spark/examples/sql/UserDefinedUntypedAggregation.scala%}
+</div>
+<div data-lang="java"  markdown="1">
+  {% include_example untyped_custom_aggregation java/org/apache/spark/examples/sql/JavaUserDefinedUntypedAggregation.java%}
+</div>
+</div>
+
+### Related Statements
+ * [Scalar User Defined Functions (UDFs)](sql-ref-functions-udf-scalar.html)
+ * [Integration with Hive UDFs/UDAFs/UDTFs](sql-ref-functions-udf-hive.html)
diff --git a/docs/sql-ref-functions-udf-scalar.md b/docs/sql-ref-functions-udf-scalar.md
index cee135b..2cb25f2 100644
--- a/docs/sql-ref-functions-udf-scalar.md
+++ b/docs/sql-ref-functions-udf-scalar.md
@@ -1,7 +1,7 @@
 ---
 layout: global
-title: User defined Scalar Functions (UDF)
-displayTitle: User defined Scalar Functions (UDF)
+title: Scalar User Defined Functions (UDFs)
+displayTitle: Scalar User Defined Functions (UDFs)
 license: |
   Licensed to the Apache Software Foundation (ASF) under one or more
   contributor license agreements.  See the NOTICE file distributed with
@@ -9,9 +9,9 @@ license: |
   The ASF licenses this file to You under the Apache License, Version 2.0
   (the "License"); you may not use this file except in compliance with
   the License.  You may obtain a copy of the License at
- 
+
      http://www.apache.org/licenses/LICENSE-2.0
- 
+
   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
@@ -19,4 +19,43 @@ license: |
   limitations under the License.
 ---
 
-**This page is under construction**
+### Description
+
+User-Defined Functions (UDFs) are user-programmable routines that act on one row. This documentation lists the classes that are required for creating and registering UDFs. It also contains examples that demonstrate how to define and register UDFs and invoke them in Spark SQL.
+
+### UserDefinedFunction
+
+To define the properties of a user-defined function, the user can use some of the methods defined in this class.
+<dl>
+  <dt><code><em>asNonNullable(): UserDefinedFunction</em></code></dt>
+  <dd>
+    Updates UserDefinedFunction to non-nullable.
+  </dd>
+</dl>
+<dl>
+  <dt><code><em>asNondeterministic(): UserDefinedFunction</em></code></dt>
+  <dd>
+    Updates UserDefinedFunction to nondeterministic.
+  </dd>
+</dl>
+<dl>
+  <dt><code><em>withName(name: String): UserDefinedFunction</em></code></dt>
+  <dd>
+    Updates UserDefinedFunction with a given name.
+  </dd>
+</dl>
+
+### Examples
+
+<div class="codetabs">
+<div data-lang="scala"  markdown="1">
+{% include_example udf_scalar scala/org/apache/spark/examples/sql/UserDefinedScalar.scala%}
+</div>
+<div data-lang="java"  markdown="1">
+  {% include_example udf_scalar java/org/apache/spark/examples/sql/JavaUserDefinedScalar.java%}
+</div>
+</div>
+
+### Related Statements
+ * [User Defined Aggregate Functions (UDAFs)](sql-ref-functions-udf-aggregate.html)
+ * [Integration with Hive UDFs/UDAFs/UDTFs](sql-ref-functions-udf-hive.html)
diff --git a/examples/src/main/java/org/apache/spark/examples/sql/JavaUserDefinedScalar.java b/examples/src/main/java/org/apache/spark/examples/sql/JavaUserDefinedScalar.java
new file mode 100644
index 0000000..e5e6988
--- /dev/null
+++ b/examples/src/main/java/org/apache/spark/examples/sql/JavaUserDefinedScalar.java
@@ -0,0 +1,98 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.examples.sql;
+
+// $example on:udf_scalar$
+import org.apache.spark.sql.*;
+import org.apache.spark.sql.api.java.UDF1;
+import org.apache.spark.sql.expressions.UserDefinedFunction;
+import static org.apache.spark.sql.functions.udf;
+import org.apache.spark.sql.types.DataTypes;
+// $example off:udf_scalar$
+
+public class JavaUserDefinedScalar {
+
+  public static void main(String[] args) {
+
+    // $example on:udf_scalar$
+    SparkSession spark = SparkSession
+      .builder()
+      .appName("Java Spark SQL UDF scalar example")
+      .getOrCreate();
+
+    // Define and register a zero-argument non-deterministic UDF
+    // UDF is deterministic by default, i.e. produces the same result for the same input.
+    UserDefinedFunction random = udf(
+      () -> Math.random(), DataTypes.DoubleType
+    );
+    // asNondeterministic() returns the updated function, so register the returned value
+    spark.udf().register("random", random.asNondeterministic());
+    spark.sql("SELECT random()").show();
+    // +-------+
+    // |UDF()  |
+    // +-------+
+    // |xxxxxxx|
+    // +-------+
+
+    // Define and register a one-argument UDF
+    spark.udf().register("plusOne", new UDF1<Integer, Integer>() {
+      @Override
+      public Integer call(Integer x) {
+        return x + 1;
+      }
+    }, DataTypes.IntegerType);
+    spark.sql("SELECT plusOne(5)").show();
+    // +----------+
+    // |plusOne(5)|
+    // +----------+
+    // |         6|
+    // +----------+
+
+    // Define and register a two-argument UDF
+    UserDefinedFunction strLen = udf(
+      (String s, Integer x) -> s.length() + x, DataTypes.IntegerType
+    );
+    spark.udf().register("strLen", strLen);
+    spark.sql("SELECT strLen('test', 1)").show();
+    // +------------+
+    // |UDF(test, 1)|
+    // +------------+
+    // |           5|
+    // +------------+
+
+    // UDF in a WHERE clause
+    spark.udf().register("oneArgFilter", new UDF1<Long, Boolean>() {
+      @Override
+      public Boolean call(Long x) {
+        return  x > 5;
+      }
+    }, DataTypes.BooleanType);
+    spark.range(1, 10).createOrReplaceTempView("test");
+    spark.sql("SELECT * FROM test WHERE oneArgFilter(id)").show();
+    // +---+
+    // | id|
+    // +---+
+    // |  6|
+    // |  7|
+    // |  8|
+    // |  9|
+    // +---+
+
+    // $example off:udf_scalar$
+    spark.stop();
+  }
+}
diff --git a/examples/src/main/scala/org/apache/spark/examples/sql/UserDefinedScalar.scala b/examples/src/main/scala/org/apache/spark/examples/sql/UserDefinedScalar.scala
new file mode 100644
index 0000000..25218a6
--- /dev/null
+++ b/examples/src/main/scala/org/apache/spark/examples/sql/UserDefinedScalar.scala
@@ -0,0 +1,80 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.examples.sql
+
+// $example on:udf_scalar$
+import org.apache.spark.sql.SparkSession
+import org.apache.spark.sql.functions.udf
+// $example off:udf_scalar$
+
+object UserDefinedScalar {
+
+  def main(args: Array[String]): Unit = {
+    // $example on:udf_scalar$
+    val spark = SparkSession
+      .builder()
+      .appName("Spark SQL UDF scalar example")
+      .getOrCreate()
+
+    // Define and register a zero-argument non-deterministic UDF
+    // UDF is deterministic by default, i.e. produces the same result for the same input.
+    val random = udf(() => Math.random())
+    spark.udf.register("random", random.asNondeterministic())
+    spark.sql("SELECT random()").show()
+    // +-------+
+    // |UDF()  |
+    // +-------+
+    // |xxxxxxx|
+    // +-------+
+
+    // Define and register a one-argument UDF
+    val plusOne = udf((x: Int) => x + 1)
+    spark.udf.register("plusOne", plusOne)
+    spark.sql("SELECT plusOne(5)").show()
+    // +------+
+    // |UDF(5)|
+    // +------+
+    // |     6|
+    // +------+
+
+    // Define a two-argument UDF and register it with Spark in one step
+    spark.udf.register("strLenScala", (_: String).length + (_: Int))
+    spark.sql("SELECT strLenScala('test', 1)").show()
+    // +--------------------+
+    // |strLenScala(test, 1)|
+    // +--------------------+
+    // |                   5|
+    // +--------------------+
+
+    // UDF in a WHERE clause
+    spark.udf.register("oneArgFilter", (n: Int) => { n > 5 })
+    spark.range(1, 10).createOrReplaceTempView("test")
+    spark.sql("SELECT * FROM test WHERE oneArgFilter(id)").show()
+    // +---+
+    // | id|
+    // +---+
+    // |  6|
+    // |  7|
+    // |  8|
+    // |  9|
+    // +---+
+
+    // $example off:udf_scalar$
+
+    spark.stop()
+  }
+}
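
The scalar UDF page documents `asNondeterministic(): UserDefinedFunction` as a method that returns a function object. The plain-Java sketch below shows why the returned value is the one worth registering, under the assumption that such a method hands back an updated copy and leaves the receiver untouched. `Udf` is a hypothetical stand-in written for this note and does not reproduce Spark's implementation.

```java
// Hypothetical stand-in; this is NOT Spark's UserDefinedFunction class.
public class CopyOnUpdateSketch {

  // Immutable function descriptor: updates produce a new instance.
  public static final class Udf {
    private final String name;
    private final boolean deterministic;

    public Udf(String name, boolean deterministic) {
      this.name = name;
      this.deterministic = deterministic;
    }

    // Assumed behavior: returns an updated copy, the receiver is left unchanged.
    public Udf asNondeterministic() {
      return deterministic ? new Udf(name, false) : this;
    }

    public boolean isDeterministic() {
      return deterministic;
    }
  }

  public static void main(String[] args) {
    Udf random = new Udf("random", true);

    // The result is discarded here, so 'random' still reports deterministic.
    random.asNondeterministic();
    System.out.println(random.isDeterministic()); // prints true

    // Keeping the returned value gives the nondeterministic variant.
    Udf updated = random.asNondeterministic();
    System.out.println(updated.isDeterministic()); // prints false
  }
}
```

Under that assumption, chaining the call into the registration, as the Scala example in this commit does with `random.asNondeterministic()`, avoids the discarded-result pitfall.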


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
