spark git commit: [SPARK-17854][SQL] rand/randn allows null/long as input seed

2016-11-06 Thread srowen

Repository: spark
Updated Branches:
  refs/heads/master 23ce0d1e9 -> 340f09d10


[SPARK-17854][SQL] rand/randn allows null/long as input seed

## What changes were proposed in this pull request?

This PR proposes that `rand`/`randn` accept `null` as the input seed in Scala/SQL and 
`LongType` as the input seed in SQL. A `null` seed is treated as `0`.

So, this PR includes both changes below:
- `null` support

  It seems MySQL also accepts this.

  ``` sql
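  mysql> select rand(0);
  +---------------------+
  | rand(0)             |
  +---------------------+
  | 0.15522042769493574 |
  +---------------------+
  1 row in set (0.00 sec)

  mysql> select rand(NULL);
  +---------------------+
  | rand(NULL)          |
  +---------------------+
  | 0.15522042769493574 |
  +---------------------+
  1 row in set (0.00 sec)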
  ```

  Hive does as well, according to [HIVE-14694](https://issues.apache.org/jira/browse/HIVE-14694).

  So the code below:

  ``` scala
  spark.range(1).selectExpr("rand(null)").show()
  ```

  prints:

  **Before**

  ```
  Input argument to rand must be an integer literal.;; line 1 pos 0
  org.apache.spark.sql.AnalysisException: Input argument to rand must be an integer literal.;; line 1 pos 0
  at org.apache.spark.sql.catalyst.analysis.FunctionRegistry$$anonfun$5.apply(FunctionRegistry.scala:465)
  at org.apache.spark.sql.catalyst.analysis.FunctionRegistry$$anonfun$5.apply(FunctionRegistry.scala:444)
  ```

  **After**

  ```
  +-----------------------+
  |rand(CAST(NULL AS INT))|
  +-----------------------+
  |    0.13385709732307427|
  +-----------------------+
  ```
- `LongType` support in SQL.

  In addition, it makes the functions accept `LongType` consistently 
across Scala and SQL.

  In more detail, the code below:

  ``` scala
  spark.range(1).select(rand(1), rand(1L)).show()
  spark.range(1).selectExpr("rand(1)", "rand(1L)").show()
  ```

  prints:

  **Before**

  ```
  +------------------+------------------+
  |           rand(1)|           rand(1)|
  +------------------+------------------+
  |0.2630967864682161|0.2630967864682161|
  +------------------+------------------+

  Input argument to rand must be an integer literal.;; line 1 pos 0
  org.apache.spark.sql.AnalysisException: Input argument to rand must be an integer literal.;; line 1 pos 0
  at org.apache.spark.sql.catalyst.analysis.FunctionRegistry$$anonfun$5.apply(FunctionRegistry.scala:465)
  at
  ```

  **After**

  ```
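  +------------------+------------------+
  |           rand(1)|           rand(1)|
  +------------------+------------------+
  |0.2630967864682161|0.2630967864682161|
  +------------------+------------------+

  +------------------+------------------+
  |           rand(1)|           rand(1)|
  +------------------+------------------+
  |0.2630967864682161|0.2630967864682161|
  +------------------+------------------+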
  ```
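
As a rough illustration of the seed handling described above, the sketch below models the accepted seed forms in plain Scala. It is not the code from this patch: `SeedArg`, `IntSeed`, `LongSeed`, `NullSeed`, and `SeedSketch` are hypothetical names, and `scala.util.Random` stands in for Spark's own generator, so the drawn values will differ from the outputs shown above.

```scala
import scala.util.Random

// Hypothetical stand-ins for the seed literal forms; these are not Spark classes.
sealed trait SeedArg
case class IntSeed(value: Int) extends SeedArg
case class LongSeed(value: Long) extends SeedArg
case object NullSeed extends SeedArg

object SeedSketch {
  // Normalize every accepted form to a Long seed.
  // A null seed maps to 0, as stated in the description above.
  def normalize(arg: SeedArg): Long = arg match {
    case IntSeed(v)  => v.toLong
    case LongSeed(v) => v
    case NullSeed    => 0L
  }

  // With a deterministic generator, equal normalized seeds give equal first draws.
  // scala.util.Random is an assumption used only for illustration.
  def firstDraw(arg: SeedArg): Double =
    new Random(normalize(arg)).nextDouble()
}
```

Under these assumptions, `SeedSketch.firstDraw(IntSeed(1))` and `SeedSketch.firstDraw(LongSeed(1L))` return the same value, and `NullSeed` behaves like `IntSeed(0)`.
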
## How was this patch tested?

Unit tests in `DataFrameSuite.scala` and `RandomSuite.scala`.

Author: hyukjinkwon 

Closes #15432 from HyukjinKwon/SPARK-17854.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/340f09d1
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/340f09d1
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/340f09d1

Branch: refs/heads/master
Commit: 340f09d100cb669bc6795f085aac6fa05630a076
Parents: 23ce0d1
Author: hyukjinkwon 
Authored: Sun Nov 6 14:11:37 2016 +
Committer: Sean Owen 
Committed: Sun Nov 6 14:11:37 2016 +

--
 .../expressions/randomExpressions.scala | 50 +++-
 .../sql/catalyst/expressions/RandomSuite.scala  |  6 ++
 .../test/resources/sql-tests/inputs/random.sql  | 17 
 .../resources/sql-tests/results/random.sql.out  | 84 
 4 files changed, 135 insertions(+), 22 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/340f09d1/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/randomExpressions.scala
--
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/randomExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/randomExpressions.scala
index a331a55..1d7a3c7 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/randomExpressions.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/randomExpressions.scala
@@ -17,11 +17,10 @@
 
 package org.apache.spark.sql.catalyst.expressions
 
-import org.apache.spark.TaskContext
 import org.apache.spark.sql.AnalysisException
 import org.apache.spark.sql.catalyst.InternalRow
 import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext,

spark git commit: [SPARK-17854][SQL] rand/randn allows null/long as input seed

2016-11-06 Thread srowen

Repository: spark
Updated Branches:
  refs/heads/branch-2.1 c42301f1e -> dcbf3fd4b


[SPARK-17854][SQL] rand/randn allows null/long as input seed

## What changes were proposed in this pull request?

This PR proposes that `rand`/`randn` accept `null` as the input seed in Scala/SQL and 
`LongType` as the input seed in SQL. A `null` seed is treated as `0`.

So, this PR includes both changes below:
- `null` support

  It seems MySQL also accepts this.

  ``` sql
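  mysql> select rand(0);
  +---------------------+
  | rand(0)             |
  +---------------------+
  | 0.15522042769493574 |
  +---------------------+
  1 row in set (0.00 sec)

  mysql> select rand(NULL);
  +---------------------+
  | rand(NULL)          |
  +---------------------+
  | 0.15522042769493574 |
  +---------------------+
  1 row in set (0.00 sec)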
  ```

  Hive does as well, according to [HIVE-14694](https://issues.apache.org/jira/browse/HIVE-14694).

  So the code below:

  ``` scala
  spark.range(1).selectExpr("rand(null)").show()
  ```

  prints:

  **Before**

  ```
  Input argument to rand must be an integer literal.;; line 1 pos 0
  org.apache.spark.sql.AnalysisException: Input argument to rand must be an integer literal.;; line 1 pos 0
  at org.apache.spark.sql.catalyst.analysis.FunctionRegistry$$anonfun$5.apply(FunctionRegistry.scala:465)
  at org.apache.spark.sql.catalyst.analysis.FunctionRegistry$$anonfun$5.apply(FunctionRegistry.scala:444)
  ```

  **After**

  ```
  +-----------------------+
  |rand(CAST(NULL AS INT))|
  +-----------------------+
  |    0.13385709732307427|
  +-----------------------+
  ```
- `LongType` support in SQL.

  In addition, it makes the functions accept `LongType` consistently 
across Scala and SQL.

  In more detail, the code below:

  ``` scala
  spark.range(1).select(rand(1), rand(1L)).show()
  spark.range(1).selectExpr("rand(1)", "rand(1L)").show()
  ```

  prints:

  **Before**

  ```
  +------------------+------------------+
  |           rand(1)|           rand(1)|
  +------------------+------------------+
  |0.2630967864682161|0.2630967864682161|
  +------------------+------------------+

  Input argument to rand must be an integer literal.;; line 1 pos 0
  org.apache.spark.sql.AnalysisException: Input argument to rand must be an integer literal.;; line 1 pos 0
  at org.apache.spark.sql.catalyst.analysis.FunctionRegistry$$anonfun$5.apply(FunctionRegistry.scala:465)
  at
  ```

  **After**

  ```
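  +------------------+------------------+
  |           rand(1)|           rand(1)|
  +------------------+------------------+
  |0.2630967864682161|0.2630967864682161|
  +------------------+------------------+

  +------------------+------------------+
  |           rand(1)|           rand(1)|
  +------------------+------------------+
  |0.2630967864682161|0.2630967864682161|
  +------------------+------------------+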
  ```
## How was this patch tested?

Unit tests in `DataFrameSuite.scala` and `RandomSuite.scala`.

Author: hyukjinkwon 

Closes #15432 from HyukjinKwon/SPARK-17854.

(cherry picked from commit 340f09d100cb669bc6795f085aac6fa05630a076)
Signed-off-by: Sean Owen 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/dcbf3fd4
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/dcbf3fd4
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/dcbf3fd4

Branch: refs/heads/branch-2.1
Commit: dcbf3fd4bd42059aed9c966d4f0cdf58815eb802
Parents: c42301f
Author: hyukjinkwon 
Authored: Sun Nov 6 14:11:37 2016 +
Committer: Sean Owen 
Committed: Sun Nov 6 14:11:47 2016 +

--
 .../expressions/randomExpressions.scala | 50 +++-
 .../sql/catalyst/expressions/RandomSuite.scala  |  6 ++
 .../test/resources/sql-tests/inputs/random.sql  | 17 
 .../resources/sql-tests/results/random.sql.out  | 84 
 4 files changed, 135 insertions(+), 22 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/dcbf3fd4/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/randomExpressions.scala
--
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/randomExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/randomExpressions.scala
index a331a55..1d7a3c7 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/randomExpressions.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/randomExpressions.scala
@@ -17,11 +17,10 @@
 
 package org.apache.spark.sql.catalyst.expressions
 
-import org.apache.spark.TaskContext
 import org.apache.spark.sql.AnalysisException
 import
