date:20190717

[GitHub] [spark] AmplabJenkins removed a comment on issue #25188: spark1

2019-07-17 Thread GitBox

AmplabJenkins removed a comment on issue #25188: spark1
URL: https://github.com/apache/spark/pull/25188#issuecomment-512676002
 
 
   Can one of the admins verify this patch?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on issue #25188: spark1

2019-07-17 Thread GitBox

AmplabJenkins commented on issue #25188: spark1
URL: https://github.com/apache/spark/pull/25188#issuecomment-512676165
 
 
   Can one of the admins verify this patch?



[GitHub] [spark] AmplabJenkins commented on issue #25188: spark1

2019-07-17 Thread GitBox

AmplabJenkins commented on issue #25188: spark1
URL: https://github.com/apache/spark/pull/25188#issuecomment-512676002
 
 
   Can one of the admins verify this patch?



[GitHub] [spark] talendinbox opened a new pull request #25188: spark1

2019-07-17 Thread GitBox

talendinbox opened a new pull request #25188: spark1
URL: https://github.com/apache/spark/pull/25188
 
 
   ## What changes were proposed in this pull request?
   
   (Please fill in changes proposed in this fix)
   
   ## How was this patch tested?
   
   (Please explain how this patch was tested. E.g. unit tests, integration 
tests, manual tests)
   (If this patch involves UI changes, please attach a screenshot; otherwise, 
remove this)
   
   Please review https://spark.apache.org/contributing.html before opening a 
pull request.
   



[GitHub] [spark] imback82 commented on a change in pull request #25119: [SPARK-28283][SQL][PYTHON][TESTS] Convert and port 'intersect-all.sql' into UDF test base

2019-07-17 Thread GitBox

imback82 commented on a change in pull request #25119: 
[SPARK-28283][SQL][PYTHON][TESTS] Convert and port 'intersect-all.sql' into UDF 
test base
URL: https://github.com/apache/spark/pull/25119#discussion_r304743751
 
 

 ##
 File path: 
sql/core/src/test/resources/sql-tests/inputs/udf/udf-intersect-all.sql
 ##
 @@ -0,0 +1,164 @@
+-- This test file was converted from intersect-all.sql.
+-- Note that currently registered UDF returns a string. So there are some differences, for instance
+-- in string cast within UDF in Scala and Python.
+
+CREATE TEMPORARY VIEW tab1 AS SELECT * FROM VALUES
+(1, 2), 
+(1, 2),
+(1, 3),
+(1, 3),
+(2, 3),
+(null, null),
+(null, null)
+AS tab1(k, v);
+CREATE TEMPORARY VIEW tab2 AS SELECT * FROM VALUES
+(1, 2), 
+(1, 2), 
+(2, 3),
+(3, 4),
+(null, null),
+(null, null)
+AS tab2(k, v);
+
+-- Basic INTERSECT ALL
+SELECT * FROM tab1
 
 Review comment:
   Yes, will do.



[GitHub] [spark] imback82 commented on a change in pull request #25119: [SPARK-28283][SQL][PYTHON][TESTS] Convert and port 'intersect-all.sql' into UDF test base

2019-07-17 Thread GitBox

imback82 commented on a change in pull request #25119: 
[SPARK-28283][SQL][PYTHON][TESTS] Convert and port 'intersect-all.sql' into UDF 
test base
URL: https://github.com/apache/spark/pull/25119#discussion_r304743647
 
 

 ##
 File path: 
sql/core/src/test/resources/sql-tests/inputs/udf/udf-intersect-all.sql
 ##
 @@ -0,0 +1,164 @@
+-- This test file was converted from intersect-all.sql.
+-- Note that currently registered UDF returns a string. So there are some differences, for instance
+-- in string cast within UDF in Scala and Python.
+
+CREATE TEMPORARY VIEW tab1 AS SELECT * FROM VALUES
+(1, 2), 
+(1, 2),
+(1, 3),
+(1, 3),
+(2, 3),
+(null, null),
+(null, null)
+AS tab1(k, v);
+CREATE TEMPORARY VIEW tab2 AS SELECT * FROM VALUES
+(1, 2), 
+(1, 2), 
+(2, 3),
+(3, 4),
+(null, null),
+(null, null)
+AS tab2(k, v);
+
+-- Basic INTERSECT ALL
+SELECT * FROM tab1
+INTERSECT ALL
+SELECT * FROM tab2;
+
+-- INTERSECT ALL same table in both branches
+SELECT * FROM tab1
+INTERSECT ALL
+SELECT * FROM tab1 WHERE udf(k) = 1;
+
+-- Empty left relation
+SELECT * FROM tab1 WHERE k > udf(2)
+INTERSECT ALL
+SELECT * FROM tab2;
+
+-- Empty right relation
+SELECT * FROM tab1
+INTERSECT ALL
+SELECT * FROM tab2 WHERE CAST(udf(k) AS BIGINT) > CAST(udf(3) AS BIGINT);
+
+-- Type Coerced INTERSECT ALL
+SELECT * FROM tab1
+INTERSECT ALL
+SELECT CAST(udf(1) AS BIGINT), CAST(udf(2) AS BIGINT);
+
+-- Error as types of the two sides are not compatible
+SELECT * FROM tab1
+INTERSECT ALL
+SELECT array(1), udf(2);
+
+-- Mismatch on number of columns across both branches
+SELECT udf(k) FROM tab1
+INTERSECT ALL
+SELECT udf(k), udf(v) FROM tab2;
+
+-- Basic
+SELECT * FROM tab2
+INTERSECT ALL
+SELECT * FROM tab1
+INTERSECT ALL
+SELECT * FROM tab2;
+
+-- Chain of different set operations
+SELECT * FROM tab1
+EXCEPT
+SELECT * FROM tab2
+UNION ALL
+SELECT * FROM tab1
+INTERSECT ALL
+SELECT * FROM tab2
+;
+
+-- Chain of different set operations
+SELECT * FROM tab1
+EXCEPT
+SELECT * FROM tab2
+EXCEPT
+SELECT * FROM tab1
+INTERSECT ALL
+SELECT * FROM tab2
+;
+
+-- test use parenthesis to control order of evaluation
+(
+  (
+    (
+      SELECT * FROM tab1
+      EXCEPT
+      SELECT * FROM tab2
+    )
+    EXCEPT
+    SELECT * FROM tab1
+  )
+  INTERSECT ALL
+  SELECT * FROM tab2
+)
+;
+
+-- Join under intersect all
+SELECT *
+FROM   (SELECT udf(tab1.k),
+               udf(tab2.v)
+        FROM   tab1
+               JOIN tab2
+                 ON CAST(udf(tab1.k) AS BIGINT) = CAST(udf(tab2.k) AS BIGINT))
 
 Review comment:
   Reverted.
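
   For anyone reading the quoted test file without a Spark session at hand, the multiset behaviour that `INTERSECT ALL` is expected to show can be sketched in plain Python. This is only an illustrative model, not Spark code: `intersect_all` is a hypothetical helper, the rows are copied from the `tab1` and `tab2` views in the diff above, and `None` stands in for SQL NULL (set operations treat two NULLs as matching, which `Counter` keys happen to mirror).

```python
from collections import Counter


def intersect_all(left, right):
    """Model INTERSECT ALL: each row is kept min(count_left, count_right) times."""
    return Counter(left) & Counter(right)


# Rows copied from the tab1 / tab2 views in the quoted test file.
tab1 = [(1, 2), (1, 2), (1, 3), (1, 3), (2, 3), (None, None), (None, None)]
tab2 = [(1, 2), (1, 2), (2, 3), (3, 4), (None, None), (None, None)]

result = intersect_all(tab1, tab2)
for row, count in result.items():
    print(row, "x", count)
```

   Under this model, duplicates survive only as often as they occur on both sides, so `(1, 3)` and `(3, 4)` drop out entirely.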



[GitHub] [spark] AmplabJenkins removed a comment on issue #25090: [SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF test base

2019-07-17 Thread GitBox

AmplabJenkins removed a comment on issue #25090: 
[SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF 
test base
URL: https://github.com/apache/spark/pull/25090#issuecomment-512672985
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/12940/
   Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #25090: [SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF test base

2019-07-17 Thread GitBox

AmplabJenkins removed a comment on issue #25090: 
[SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF 
test base
URL: https://github.com/apache/spark/pull/25090#issuecomment-512672978
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #25090: [SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF test base

2019-07-17 Thread GitBox

AmplabJenkins commented on issue #25090: [SPARK-28278][SQL][PYTHON][TESTS] 
Convert and port 'except-all.sql' into UDF test base
URL: https://github.com/apache/spark/pull/25090#issuecomment-512672985
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/12940/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #25090: [SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF test base

2019-07-17 Thread GitBox

AmplabJenkins commented on issue #25090: [SPARK-28278][SQL][PYTHON][TESTS] 
Convert and port 'except-all.sql' into UDF test base
URL: https://github.com/apache/spark/pull/25090#issuecomment-512672978
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] SparkQA commented on issue #25090: [SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF test base

2019-07-17 Thread GitBox

SparkQA commented on issue #25090: [SPARK-28278][SQL][PYTHON][TESTS] Convert 
and port 'except-all.sql' into UDF test base
URL: https://github.com/apache/spark/pull/25090#issuecomment-512671797
 
 
   **[Test build #107825 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107825/testReport)**
 for PR 25090 at commit 
[`2c8cc19`](https://github.com/apache/spark/commit/2c8cc194fb6552cebe6cd1333cb88374c4a156a8).



[GitHub] [spark] imback82 commented on issue #25090: [SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF test base

2019-07-17 Thread GitBox

imback82 commented on issue #25090: [SPARK-28278][SQL][PYTHON][TESTS] Convert 
and port 'except-all.sql' into UDF test base
URL: https://github.com/apache/spark/pull/25090#issuecomment-512671629
 
 
   @HyukjinKwon, I think I addressed all your comments. Please re-review this. 
Thanks!



[GitHub] [spark] AmplabJenkins removed a comment on issue #25007: [SPARK-28209][CORE][SHUFFLE] Proposed new shuffle writer API

2019-07-17 Thread GitBox

AmplabJenkins removed a comment on issue #25007: [SPARK-28209][CORE][SHUFFLE] 
Proposed new shuffle writer API 
URL: https://github.com/apache/spark/pull/25007#issuecomment-512671341
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/107810/
   Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #25007: [SPARK-28209][CORE][SHUFFLE] Proposed new shuffle writer API

2019-07-17 Thread GitBox

AmplabJenkins removed a comment on issue #25007: [SPARK-28209][CORE][SHUFFLE] 
Proposed new shuffle writer API 
URL: https://github.com/apache/spark/pull/25007#issuecomment-512671334
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #25007: [SPARK-28209][CORE][SHUFFLE] Proposed new shuffle writer API

2019-07-17 Thread GitBox

AmplabJenkins commented on issue #25007: [SPARK-28209][CORE][SHUFFLE] Proposed 
new shuffle writer API 
URL: https://github.com/apache/spark/pull/25007#issuecomment-512671341
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/107810/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #25007: [SPARK-28209][CORE][SHUFFLE] Proposed new shuffle writer API

2019-07-17 Thread GitBox

AmplabJenkins commented on issue #25007: [SPARK-28209][CORE][SHUFFLE] Proposed 
new shuffle writer API 
URL: https://github.com/apache/spark/pull/25007#issuecomment-512671334
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #25090: [SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF test base

2019-07-17 Thread GitBox

AmplabJenkins removed a comment on issue #25090: 
[SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF 
test base
URL: https://github.com/apache/spark/pull/25090#issuecomment-512670846
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/107819/
   Test FAILed.



[GitHub] [spark] SparkQA removed a comment on issue #25007: [SPARK-28209][CORE][SHUFFLE] Proposed new shuffle writer API

2019-07-17 Thread GitBox

SparkQA removed a comment on issue #25007: [SPARK-28209][CORE][SHUFFLE] 
Proposed new shuffle writer API 
URL: https://github.com/apache/spark/pull/25007#issuecomment-512648513
 
 
   **[Test build #107810 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107810/testReport)**
 for PR 25007 at commit 
[`9f597dd`](https://github.com/apache/spark/commit/9f597dd726aba08642c4329534e5ae12ffa6fbe9).



[GitHub] [spark] AmplabJenkins removed a comment on issue #25090: [SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF test base

2019-07-17 Thread GitBox

AmplabJenkins removed a comment on issue #25090: 
[SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF 
test base
URL: https://github.com/apache/spark/pull/25090#issuecomment-512670838
 
 
   Merged build finished. Test FAILed.



[GitHub] [spark] SparkQA commented on issue #25007: [SPARK-28209][CORE][SHUFFLE] Proposed new shuffle writer API

2019-07-17 Thread GitBox

SparkQA commented on issue #25007: [SPARK-28209][CORE][SHUFFLE] Proposed new 
shuffle writer API 
URL: https://github.com/apache/spark/pull/25007#issuecomment-512670975
 
 
   **[Test build #107810 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107810/testReport)**
 for PR 25007 at commit 
[`9f597dd`](https://github.com/apache/spark/commit/9f597dd726aba08642c4329534e5ae12ffa6fbe9).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.



[GitHub] [spark] SparkQA removed a comment on issue #25090: [SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF test base

2019-07-17 Thread GitBox

SparkQA removed a comment on issue #25090: [SPARK-28278][SQL][PYTHON][TESTS] 
Convert and port 'except-all.sql' into UDF test base
URL: https://github.com/apache/spark/pull/25090#issuecomment-512656638
 
 
   **[Test build #107819 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107819/testReport)**
 for PR 25090 at commit 
[`a09df4b`](https://github.com/apache/spark/commit/a09df4b5dc90c93f05d68fd6695ccb2de663895c).



[GitHub] [spark] AmplabJenkins commented on issue #25090: [SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF test base

2019-07-17 Thread GitBox

AmplabJenkins commented on issue #25090: [SPARK-28278][SQL][PYTHON][TESTS] 
Convert and port 'except-all.sql' into UDF test base
URL: https://github.com/apache/spark/pull/25090#issuecomment-512670846
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/107819/
   Test FAILed.



[GitHub] [spark] AmplabJenkins commented on issue #25090: [SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF test base

2019-07-17 Thread GitBox

AmplabJenkins commented on issue #25090: [SPARK-28278][SQL][PYTHON][TESTS] 
Convert and port 'except-all.sql' into UDF test base
URL: https://github.com/apache/spark/pull/25090#issuecomment-512670838
 
 
   Merged build finished. Test FAILed.



[GitHub] [spark] SparkQA commented on issue #25090: [SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF test base

2019-07-17 Thread GitBox

SparkQA commented on issue #25090: [SPARK-28278][SQL][PYTHON][TESTS] Convert 
and port 'except-all.sql' into UDF test base
URL: https://github.com/apache/spark/pull/25090#issuecomment-512670717
 
 
   **[Test build #107819 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107819/testReport)**
 for PR 25090 at commit 
[`a09df4b`](https://github.com/apache/spark/commit/a09df4b5dc90c93f05d68fd6695ccb2de663895c).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.



[GitHub] [spark] imback82 commented on a change in pull request #25090: [SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF test base

2019-07-17 Thread GitBox

imback82 commented on a change in pull request #25090: 
[SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF 
test base
URL: https://github.com/apache/spark/pull/25090#discussion_r304738907
 
 

 ##
 File path: sql/core/src/test/resources/sql-tests/inputs/udf/udf-except-all.sql
 ##
 @@ -0,0 +1,166 @@
+-- This test file was converted from except-all.sql.
+-- Note that currently registered UDF returns a string. So there are some differences, for instance
+-- in string cast within UDF in Scala and Python.
+
+CREATE TEMPORARY VIEW tab1 AS SELECT * FROM VALUES
+(0), (1), (2), (2), (2), (2), (3), (null), (null) AS tab1(c1);
+CREATE TEMPORARY VIEW tab2 AS SELECT * FROM VALUES
+(1), (2), (2), (3), (5), (5), (null) AS tab2(c1);
+CREATE TEMPORARY VIEW tab3 AS SELECT * FROM VALUES
+(1, 2), 
+(1, 2),
+(1, 3),
+(2, 3),
+(2, 2)
+AS tab3(k, v);
+CREATE TEMPORARY VIEW tab4 AS SELECT * FROM VALUES
+(1, 2), 
+(2, 3),
+(2, 2),
+(2, 2),
+(2, 20)
+AS tab4(k, v);
+
+-- Basic EXCEPT ALL
+SELECT * FROM tab1
+EXCEPT ALL
+SELECT * FROM tab2;
+
+-- MINUS ALL (synonym for EXCEPT)
+SELECT * FROM tab1
+MINUS ALL
+SELECT * FROM tab2;
+
+-- EXCEPT ALL same table in both branches
+-- Note that there will be one less NULL in the result compared to the non-udf result
+-- because udf converts null to a string "null".
+SELECT * FROM tab1
+EXCEPT ALL
+SELECT * FROM tab2 WHERE udf(c1) IS NOT NULL;
+
+-- Empty left relation
+SELECT * FROM tab1 WHERE udf(c1) > 5
+EXCEPT ALL
+SELECT * FROM tab2;
+
+-- Empty right relation
+SELECT * FROM tab1
+EXCEPT ALL
+SELECT * FROM tab2 WHERE c1 > udf(6);
+
+-- Type Coerced ExceptAll
+SELECT * FROM tab1
+EXCEPT ALL
+SELECT CAST(udf(1) AS BIGINT);
+
+-- Error as types of the two sides are not compatible
+SELECT * FROM tab1
+EXCEPT ALL
+SELECT array(1);
+
+-- Basic
+SELECT * FROM tab3
+EXCEPT ALL
+SELECT * FROM tab4;
+
+-- Basic
+SELECT * FROM tab4
+EXCEPT ALL
+SELECT * FROM tab3;
+
+-- EXCEPT ALL + INTERSECT
+SELECT * FROM tab4
+EXCEPT ALL
+SELECT * FROM tab3
+INTERSECT DISTINCT
+SELECT * FROM tab4;
+
+-- EXCEPT ALL + EXCEPT
+SELECT * FROM tab4
+EXCEPT ALL
+SELECT * FROM tab3
+EXCEPT DISTINCT
+SELECT * FROM tab4;
+
+-- Chain of set operations
+SELECT * FROM tab3
+EXCEPT ALL
+SELECT * FROM tab4
+UNION ALL
+SELECT * FROM tab3
+EXCEPT DISTINCT
+SELECT * FROM tab4;
+
+-- Mismatch on number of columns across both branches
+SELECT k FROM tab3
+EXCEPT ALL
+SELECT k, v FROM tab4;
+
+-- Chain of set operations
+SELECT * FROM tab3
+EXCEPT ALL
+SELECT * FROM tab4
+UNION
+SELECT * FROM tab3
+EXCEPT DISTINCT
+SELECT * FROM tab4;
+
+-- Using MINUS ALL
+SELECT * FROM tab3
+MINUS ALL
+SELECT * FROM tab4
+UNION
+SELECT * FROM tab3
+MINUS DISTINCT
+SELECT * FROM tab4;
+
+-- Chain of set operations
+SELECT * FROM tab3
+EXCEPT ALL
+SELECT * FROM tab4
+EXCEPT DISTINCT
+SELECT * FROM tab3
+EXCEPT DISTINCT
+SELECT * FROM tab4;
+
+-- Join under except all. Should produce empty resultset since both left and right sets
+-- are same.
+SELECT *
+FROM   (SELECT udf(tab3.k),
+               udf(tab4.v)
+        FROM   tab3
+               JOIN tab4
+                 ON udf(tab3.k) = udf(tab4.k))
 
 Review comment:
   Yes, this can be done now with your `udf` fix. :) 
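
   As a reading aid for the quoted test file, here is a small Python sketch of `EXCEPT ALL` as a multiset difference, including the NULL note from the file's own comment. It is a model only and does not run Spark: `except_all` and `udf_as_string` are hypothetical names, the rows come from the `tab1` and `tab2` views in the diff, `None` stands in for SQL NULL, and the "null"-string behaviour is taken from the comment in the test file rather than verified against the actual UDF.

```python
from collections import Counter


def except_all(left, right):
    """Model EXCEPT ALL: each right row cancels at most one matching left row."""
    return Counter(left) - Counter(right)


def udf_as_string(value):
    # Hypothetical stand-in for the registered test UDF. Per the quoted
    # comment, it is assumed to turn NULL into the string "null".
    return "null" if value is None else str(value)


# Single-column rows copied from the tab1 / tab2 views in the quoted test file.
tab1 = [0, 1, 2, 2, 2, 2, 3, None, None]
tab2 = [1, 2, 2, 3, 5, 5, None]

basic = except_all(tab1, tab2)

# Non-udf filter: "c1 IS NOT NULL" drops the NULL row from tab2.
plain_filter = except_all(tab1, [c for c in tab2 if c is not None])

# udf filter: udf(c1) is the string "null", which is not NULL, so the row stays.
udf_filter = except_all(tab1, [c for c in tab2 if udf_as_string(c) is not None])

print("NULLs left, plain filter:", plain_filter[None])
print("NULLs left, udf filter:  ", udf_filter[None])
```

   In this model the udf variant keeps the NULL row on the right-hand side, so it cancels one more NULL on the left than the plain filter does.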



[GitHub] [spark] HyukjinKwon commented on a change in pull request #25184: [SPARK-28431]Fix CSV datasource throw com.univocity.parsers.common.TextParsingException with large size message

2019-07-17 Thread GitBox

HyukjinKwon commented on a change in pull request #25184: [SPARK-28431]Fix CSV 
datasource throw com.univocity.parsers.common.TextParsingException with large 
size message
URL: https://github.com/apache/spark/pull/25184#discussion_r304737449
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityParser.scala
 ##
 @@ -346,4 +348,23 @@ private[sql] object UnivocityParser {
   parser.options.columnNameOfCorruptRecord)
 filteredLines.flatMap(safeParser.parse)
   }
+
+  def limitParserErrorContentLength[T](f: () => T): T = {
+    try {
+      f()
+    } catch {
+      case e: TextParsingException =>
+        e.setErrorContentLength(SQLConf.get.getConf(
 
 Review comment:
   @WeichenXu123, seems `setErrorContentLength` can be set in `CSVOptions`'s 
parser and writer settings (see 
https://github.com/uniVocity/univocity-parsers/blob/f616d151b48150bc9cb98943f9b6f8353b704359/src/test/java/com/univocity/parsers/common/DataProcessingExceptionTest.java)
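
   To illustrate the suggestion above (configure the limit once in the settings, rather than catching the exception and adjusting it afterwards), here is a rough Python model. It does not call univocity: `ParserSettings`, `error_content_length`, and `describe_error` are hypothetical names, and the convention that a negative value means "no limit" is an assumption made for this sketch only.

```python
from dataclasses import dataclass


@dataclass
class ParserSettings:
    # Hypothetical stand-in for parser/writer settings; not the univocity API.
    # Assumed convention for this sketch: a negative value means "no limit".
    error_content_length: int = -1


def describe_error(message: str, content: str, settings: ParserSettings) -> str:
    """Build an error message whose quoted content respects the configured limit."""
    limit = settings.error_content_length
    shown = content if limit < 0 else content[:limit]
    return f"{message}: {shown!r}"


huge_row = "x" * 1_000_000

# The limit is set once, up front, so every error built from these settings
# is bounded without any try/catch at the call sites.
bounded = describe_error("cannot parse row", huge_row, ParserSettings(error_content_length=20))
unbounded = describe_error("cannot parse row", huge_row, ParserSettings())

print(len(bounded), len(unbounded))
```

   The point of the design is that callers never have to wrap parsing in a helper just to shrink the message.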
   
   
   



[GitHub] [spark] imback82 commented on a change in pull request #25090: [SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF test base

2019-07-17 Thread GitBox

imback82 commented on a change in pull request #25090: 
[SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF 
test base
URL: https://github.com/apache/spark/pull/25090#discussion_r304735783
 
 

 ##
 File path: sql/core/src/test/resources/sql-tests/inputs/udf/udf-except-all.sql
 ##
 @@ -0,0 +1,166 @@
+-- This test file was converted from except-all.sql.
+-- Note that currently registered UDF returns a string. So there are some differences, for instance
+-- in string cast within UDF in Scala and Python.
+
+CREATE TEMPORARY VIEW tab1 AS SELECT * FROM VALUES
+(0), (1), (2), (2), (2), (2), (3), (null), (null) AS tab1(c1);
+CREATE TEMPORARY VIEW tab2 AS SELECT * FROM VALUES
+(1), (2), (2), (3), (5), (5), (null) AS tab2(c1);
+CREATE TEMPORARY VIEW tab3 AS SELECT * FROM VALUES
+(1, 2), 
+(1, 2),
+(1, 3),
+(2, 3),
+(2, 2)
+AS tab3(k, v);
+CREATE TEMPORARY VIEW tab4 AS SELECT * FROM VALUES
+(1, 2), 
+(2, 3),
+(2, 2),
+(2, 2),
+(2, 20)
+AS tab4(k, v);
+
+-- Basic EXCEPT ALL
+SELECT * FROM tab1
 
 Review comment:
   Done.



[GitHub] [spark] imback82 commented on a change in pull request #25090: [SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF test base

2019-07-17 Thread GitBox

imback82 commented on a change in pull request #25090: 
[SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF 
test base
URL: https://github.com/apache/spark/pull/25090#discussion_r304735863
 
 

 ##
 File path: sql/core/src/test/resources/sql-tests/inputs/udf/udf-except-all.sql
 ##
 @@ -0,0 +1,166 @@
+-- This test file was converted from except-all.sql.
+-- Note that currently registered UDF returns a string. So there are some differences, for instance
+-- in string cast within UDF in Scala and Python.
+
+CREATE TEMPORARY VIEW tab1 AS SELECT * FROM VALUES
+(0), (1), (2), (2), (2), (2), (3), (null), (null) AS tab1(c1);
+CREATE TEMPORARY VIEW tab2 AS SELECT * FROM VALUES
+(1), (2), (2), (3), (5), (5), (null) AS tab2(c1);
+CREATE TEMPORARY VIEW tab3 AS SELECT * FROM VALUES
+(1, 2), 
+(1, 2),
+(1, 3),
+(2, 3),
+(2, 2)
+AS tab3(k, v);
+CREATE TEMPORARY VIEW tab4 AS SELECT * FROM VALUES
+(1, 2), 
+(2, 3),
+(2, 2),
+(2, 2),
+(2, 20)
+AS tab4(k, v);
+
+-- Basic EXCEPT ALL
+SELECT * FROM tab1
+EXCEPT ALL
+SELECT * FROM tab2;
+
+-- MINUS ALL (synonym for EXCEPT)
+SELECT * FROM tab1
+MINUS ALL
+SELECT * FROM tab2;
+
+-- EXCEPT ALL same table in both branches
+-- Note that there will one less NULL in the result compared to the non-udf result
+-- because udf converts null to a string "null".
+SELECT * FROM tab1
+EXCEPT ALL
+SELECT * FROM tab2 WHERE udf(c1) IS NOT NULL;
+
+-- Empty left relation
+SELECT * FROM tab1 WHERE udf(c1) > 5
+EXCEPT ALL
+SELECT * FROM tab2;
+
+-- Empty right relation
+SELECT * FROM tab1
+EXCEPT ALL
+SELECT * FROM tab2 WHERE c1 > udf(6);
+
+-- Type Coerced ExceptAll
+SELECT * FROM tab1
+EXCEPT ALL
+SELECT CAST(udf(1) AS BIGINT);
+
+-- Error as types of two side are not compatible
+SELECT * FROM tab1
+EXCEPT ALL
+SELECT array(1);
+
+-- Basic
+SELECT * FROM tab3
+EXCEPT ALL
+SELECT * FROM tab4;
+
+-- Basic
+SELECT * FROM tab4
+EXCEPT ALL
+SELECT * FROM tab3;
+
+-- EXCEPT ALL + INTERSECT
+SELECT * FROM tab4
 
 Review comment:
   Done.



[GitHub] [spark] HyukjinKwon commented on issue #25133: [SPARK-28365][ML] Fallback locale to en_US in StopWordsRemover if system default locale isn't in available locales in JVM

2019-07-17 Thread GitBox

HyukjinKwon commented on issue #25133: [SPARK-28365][ML] Fallback locale to 
en_US in StopWordsRemover if system default locale isn't in available locales 
in JVM
URL: https://github.com/apache/spark/pull/25133#issuecomment-512665659
 
 
   I am not sure. The change here doesn't appear to affect the default locale in 
the JVM, only the one used in `StopWordsRemover`.
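
The fallback described in the pull request title can be sketched in a few lines of Python. This is only an illustration of the idea, not Spark's code: `resolve_locale`, its parameters, and the sample locale names are all hypothetical. The choice is local to the caller and leaves any process-wide default untouched.

```python
def resolve_locale(system_default, available_locales, fallback="en_US"):
    """Return the system default if it is available, otherwise the fallback.

    Hypothetical helper: the names and the set-based lookup are assumptions
    made for this sketch, not the StopWordsRemover implementation.
    """
    if system_default in available_locales:
        return system_default
    return fallback


# Sample data for illustration only.
available = {"en_US", "fr_FR", "ko_KR"}
print(resolve_locale("ko_KR", available))  # default is available, so it is kept
print(resolve_locale("xx_YY", available))  # default is missing, so en_US is used
```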



[GitHub] [spark] HyukjinKwon commented on issue #25130: [SPARK-28359][SQL][PYTHON][TESTS] Make integrated UDF tests robust by making UDFs (virtually) no-op

2019-07-17 Thread GitBox

HyukjinKwon commented on issue #25130: [SPARK-28359][SQL][PYTHON][TESTS] Make 
integrated UDF tests robust by making UDFs (virtually) no-op
URL: https://github.com/apache/spark/pull/25130#issuecomment-512665250
 
 
   Okay. The JDK 11 test, SBT, and Maven builds all look fine.



[GitHub] [spark] HyukjinKwon commented on a change in pull request #25090: [SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF test base

2019-07-17 Thread GitBox

HyukjinKwon commented on a change in pull request #25090: 
[SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF 
test base
URL: https://github.com/apache/spark/pull/25090#discussion_r304734548
 
 

 ##
 File path: sql/core/src/test/resources/sql-tests/inputs/udf/udf-except-all.sql
 ##
 @@ -0,0 +1,166 @@
+-- This test file was converted from except-all.sql.
+-- Note that currently registered UDF returns a string. So there are some differences, for instance
+-- in string cast within UDF in Scala and Python.
+
+CREATE TEMPORARY VIEW tab1 AS SELECT * FROM VALUES
+(0), (1), (2), (2), (2), (2), (3), (null), (null) AS tab1(c1);
+CREATE TEMPORARY VIEW tab2 AS SELECT * FROM VALUES
+(1), (2), (2), (3), (5), (5), (null) AS tab2(c1);
+CREATE TEMPORARY VIEW tab3 AS SELECT * FROM VALUES
+(1, 2), 
+(1, 2),
+(1, 3),
+(2, 3),
+(2, 2)
+AS tab3(k, v);
+CREATE TEMPORARY VIEW tab4 AS SELECT * FROM VALUES
+(1, 2), 
+(2, 3),
+(2, 2),
+(2, 2),
+(2, 20)
+AS tab4(k, v);
+
+-- Basic EXCEPT ALL
+SELECT * FROM tab1
+EXCEPT ALL
+SELECT * FROM tab2;
+
+-- MINUS ALL (synonym for EXCEPT)
+SELECT * FROM tab1
+MINUS ALL
+SELECT * FROM tab2;
+
+-- EXCEPT ALL same table in both branches
+-- Note that there will one less NULL in the result compared to the non-udf result
+-- because udf converts null to a string "null".
+SELECT * FROM tab1
+EXCEPT ALL
+SELECT * FROM tab2 WHERE udf(c1) IS NOT NULL;
+
+-- Empty left relation
+SELECT * FROM tab1 WHERE udf(c1) > 5
+EXCEPT ALL
+SELECT * FROM tab2;
+
+-- Empty right relation
+SELECT * FROM tab1
+EXCEPT ALL
+SELECT * FROM tab2 WHERE c1 > udf(6);
+
+-- Type Coerced ExceptAll
+SELECT * FROM tab1
+EXCEPT ALL
+SELECT CAST(udf(1) AS BIGINT);
+
+-- Error as types of two side are not compatible
+SELECT * FROM tab1
+EXCEPT ALL
+SELECT array(1);
 
 Review comment:
   Oh, yes. Complex types cannot be supported via udf for now. I forgot. Yes, 
let's not do it for now and just replace `*` with the UDF.



[GitHub] [spark] bersprockets commented on issue #25068: [SPARK-28156][SQL][BACKPORT-2.4] Self-join should not miss cached view

2019-07-17 Thread GitBox

bersprockets commented on issue #25068: [SPARK-28156][SQL][BACKPORT-2.4] 
Self-join should not miss cached view
URL: https://github.com/apache/spark/pull/25068#issuecomment-512664402
 
 
   I did the following:
   
   - replaced `!v.sameOutput(child)` with `output != child.output`
   - replaced `!Cast.canUpCast` with `Cast.mayTruncate`
   
   In the process, I broke a test in SQLViewSuite. I will hunt down the cause 
tomorrow and hopefully post the changes.



[GitHub] [spark] imback82 commented on a change in pull request #25090: [SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF test base

2019-07-17 Thread GitBox

imback82 commented on a change in pull request #25090: 
[SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF 
test base
URL: https://github.com/apache/spark/pull/25090#discussion_r304733255
 
 

 ##
 File path: sql/core/src/test/resources/sql-tests/inputs/udf/udf-except-all.sql
 ##
 @@ -0,0 +1,166 @@
+-- This test file was converted from except-all.sql.
+-- Note that currently registered UDF returns a string. So there are some differences, for instance
+-- in string cast within UDF in Scala and Python.
+
+CREATE TEMPORARY VIEW tab1 AS SELECT * FROM VALUES
+(0), (1), (2), (2), (2), (2), (3), (null), (null) AS tab1(c1);
+CREATE TEMPORARY VIEW tab2 AS SELECT * FROM VALUES
+(1), (2), (2), (3), (5), (5), (null) AS tab2(c1);
+CREATE TEMPORARY VIEW tab3 AS SELECT * FROM VALUES
+(1, 2), 
+(1, 2),
+(1, 3),
+(2, 3),
+(2, 2)
+AS tab3(k, v);
+CREATE TEMPORARY VIEW tab4 AS SELECT * FROM VALUES
+(1, 2), 
+(2, 3),
+(2, 2),
+(2, 2),
+(2, 20)
+AS tab4(k, v);
+
+-- Basic EXCEPT ALL
+SELECT * FROM tab1
+EXCEPT ALL
+SELECT * FROM tab2;
+
+-- MINUS ALL (synonym for EXCEPT)
+SELECT * FROM tab1
+MINUS ALL
+SELECT * FROM tab2;
+
+-- EXCEPT ALL same table in both branches
+-- Note that there will one less NULL in the result compared to the non-udf result
+-- because udf converts null to a string "null".
+SELECT * FROM tab1
+EXCEPT ALL
+SELECT * FROM tab2 WHERE udf(c1) IS NOT NULL;
+
+-- Empty left relation
+SELECT * FROM tab1 WHERE udf(c1) > 5
+EXCEPT ALL
+SELECT * FROM tab2;
+
+-- Empty right relation
+SELECT * FROM tab1
+EXCEPT ALL
+SELECT * FROM tab2 WHERE c1 > udf(6);
+
+-- Type Coerced ExceptAll
+SELECT * FROM tab1
+EXCEPT ALL
+SELECT CAST(udf(1) AS BIGINT);
+
+-- Error as types of two side are not compatible
+SELECT * FROM tab1
+EXCEPT ALL
+SELECT array(1);
 
 Review comment:
   Actually, changing to `udf(array(1))` gives the following message:
   ```
   cannot resolve 'udf(cast(array(1) as string))' due to data type mismatch: cannot cast string to array; line 3 pos 7
   ```
   
   The expected message is:
   ```
   ExceptAll can only be performed on tables with the compatible column types. array <> int at the first column of the second table;
   ```
   
   @HyukjinKwon do you still want this change?



[GitHub] [spark] HyukjinKwon commented on a change in pull request #25090: [SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF test base

2019-07-17 Thread GitBox

HyukjinKwon commented on a change in pull request #25090: 
[SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF 
test base
URL: https://github.com/apache/spark/pull/25090#discussion_r304732964
 
 

 ##
 File path: sql/core/src/test/resources/sql-tests/inputs/udf/udf-except-all.sql
 ##
 @@ -0,0 +1,166 @@
+-- This test file was converted from except-all.sql.
+-- Note that currently registered UDF returns a string. So there are some differences, for instance
+-- in string cast within UDF in Scala and Python.
+
+CREATE TEMPORARY VIEW tab1 AS SELECT * FROM VALUES
+(0), (1), (2), (2), (2), (2), (3), (null), (null) AS tab1(c1);
+CREATE TEMPORARY VIEW tab2 AS SELECT * FROM VALUES
+(1), (2), (2), (3), (5), (5), (null) AS tab2(c1);
+CREATE TEMPORARY VIEW tab3 AS SELECT * FROM VALUES
+(1, 2), 
+(1, 2),
+(1, 3),
+(2, 3),
+(2, 2)
+AS tab3(k, v);
+CREATE TEMPORARY VIEW tab4 AS SELECT * FROM VALUES
+(1, 2), 
+(2, 3),
+(2, 2),
+(2, 2),
+(2, 20)
+AS tab4(k, v);
+
+-- Basic EXCEPT ALL
+SELECT * FROM tab1
 
 Review comment:
   Yea, otherwise it would just duplicate the tests in the original files.



[GitHub] [spark] HyukjinKwon commented on a change in pull request #25090: [SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF test base

2019-07-17 Thread GitBox

HyukjinKwon commented on a change in pull request #25090: 
[SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF 
test base
URL: https://github.com/apache/spark/pull/25090#discussion_r304733009
 
 

 ##
 File path: sql/core/src/test/resources/sql-tests/inputs/udf/udf-except-all.sql
 ##
 @@ -0,0 +1,166 @@
+-- This test file was converted from except-all.sql.
+-- Note that currently registered UDF returns a string. So there are some differences, for instance
+-- in string cast within UDF in Scala and Python.
+
+CREATE TEMPORARY VIEW tab1 AS SELECT * FROM VALUES
+(0), (1), (2), (2), (2), (2), (3), (null), (null) AS tab1(c1);
+CREATE TEMPORARY VIEW tab2 AS SELECT * FROM VALUES
+(1), (2), (2), (3), (5), (5), (null) AS tab2(c1);
+CREATE TEMPORARY VIEW tab3 AS SELECT * FROM VALUES
+(1, 2), 
+(1, 2),
+(1, 3),
+(2, 3),
+(2, 2)
+AS tab3(k, v);
+CREATE TEMPORARY VIEW tab4 AS SELECT * FROM VALUES
+(1, 2), 
+(2, 3),
+(2, 2),
+(2, 2),
+(2, 20)
+AS tab4(k, v);
+
+-- Basic EXCEPT ALL
+SELECT * FROM tab1
+EXCEPT ALL
+SELECT * FROM tab2;
+
+-- MINUS ALL (synonym for EXCEPT)
+SELECT * FROM tab1
+MINUS ALL
+SELECT * FROM tab2;
+
+-- EXCEPT ALL same table in both branches
+-- Note that there will one less NULL in the result compared to the non-udf result
+-- because udf converts null to a string "null".
+SELECT * FROM tab1
+EXCEPT ALL
+SELECT * FROM tab2 WHERE udf(c1) IS NOT NULL;
+
+-- Empty left relation
+SELECT * FROM tab1 WHERE udf(c1) > 5
+EXCEPT ALL
+SELECT * FROM tab2;
+
+-- Empty right relation
+SELECT * FROM tab1
+EXCEPT ALL
+SELECT * FROM tab2 WHERE c1 > udf(6);
+
+-- Type Coerced ExceptAll
+SELECT * FROM tab1
+EXCEPT ALL
+SELECT CAST(udf(1) AS BIGINT);
+
+-- Error as types of two side are not compatible
+SELECT * FROM tab1
+EXCEPT ALL
+SELECT array(1);
+
+-- Basic
+SELECT * FROM tab3
+EXCEPT ALL
+SELECT * FROM tab4;
+
+-- Basic
+SELECT * FROM tab4
+EXCEPT ALL
+SELECT * FROM tab3;
+
+-- EXCEPT ALL + INTERSECT
+SELECT * FROM tab4
 
 Review comment:
   Yea, let's replace it.



[GitHub] [spark] HyukjinKwon commented on issue #25183: [SPARK-28430][UI] Fix stage table rendering when some tasks' metrics are missing

2019-07-17 Thread GitBox

HyukjinKwon commented on issue #25183: [SPARK-28430][UI] Fix stage table 
rendering when some tasks' metrics are missing
URL: https://github.com/apache/spark/pull/25183#issuecomment-512662741
 
 
   Looks fine, but a patch that involves UI changes usually attaches a 
screenshot taken after the fix.



[GitHub] [spark] imback82 commented on a change in pull request #25090: [SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF test base

2019-07-17 Thread GitBox

imback82 commented on a change in pull request #25090: 
[SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF 
test base
URL: https://github.com/apache/spark/pull/25090#discussion_r304732716
 
 

 ##
 File path: sql/core/src/test/resources/sql-tests/inputs/udf/udf-except-all.sql
 ##
 @@ -0,0 +1,166 @@
+-- This test file was converted from except-all.sql.
+-- Note that currently registered UDF returns a string. So there are some differences, for instance
+-- in string cast within UDF in Scala and Python.
+
+CREATE TEMPORARY VIEW tab1 AS SELECT * FROM VALUES
+(0), (1), (2), (2), (2), (2), (3), (null), (null) AS tab1(c1);
+CREATE TEMPORARY VIEW tab2 AS SELECT * FROM VALUES
+(1), (2), (2), (3), (5), (5), (null) AS tab2(c1);
+CREATE TEMPORARY VIEW tab3 AS SELECT * FROM VALUES
+(1, 2), 
+(1, 2),
+(1, 3),
+(2, 3),
+(2, 2)
+AS tab3(k, v);
+CREATE TEMPORARY VIEW tab4 AS SELECT * FROM VALUES
+(1, 2), 
+(2, 3),
+(2, 2),
+(2, 2),
+(2, 20)
+AS tab4(k, v);
+
+-- Basic EXCEPT ALL
+SELECT * FROM tab1
+EXCEPT ALL
+SELECT * FROM tab2;
+
+-- MINUS ALL (synonym for EXCEPT)
+SELECT * FROM tab1
+MINUS ALL
+SELECT * FROM tab2;
+
+-- EXCEPT ALL same table in both branches
+-- Note that there will one less NULL in the result compared to the non-udf result
+-- because udf converts null to a string "null".
+SELECT * FROM tab1
+EXCEPT ALL
+SELECT * FROM tab2 WHERE udf(c1) IS NOT NULL;
+
+-- Empty left relation
+SELECT * FROM tab1 WHERE udf(c1) > 5
+EXCEPT ALL
+SELECT * FROM tab2;
+
+-- Empty right relation
+SELECT * FROM tab1
+EXCEPT ALL
+SELECT * FROM tab2 WHERE c1 > udf(6);
+
+-- Type Coerced ExceptAll
+SELECT * FROM tab1
+EXCEPT ALL
+SELECT CAST(udf(1) AS BIGINT);
+
+-- Error as types of two side are not compatible
+SELECT * FROM tab1
+EXCEPT ALL
+SELECT array(1);
+
+-- Basic
+SELECT * FROM tab3
+EXCEPT ALL
+SELECT * FROM tab4;
+
+-- Basic
+SELECT * FROM tab4
+EXCEPT ALL
+SELECT * FROM tab3;
+
+-- EXCEPT ALL + INTERSECT
+SELECT * FROM tab4
 
 Review comment:
   @HyukjinKwon, I can replace * with udf(c1). Did you want something else?



[GitHub] [spark] HyukjinKwon closed pull request #25175: [SPARK-28411][PYTHON][SQL] InsertInto with overwrite is not honored

2019-07-17 Thread GitBox

HyukjinKwon closed pull request #25175: [SPARK-28411][PYTHON][SQL] InsertInto 
with overwrite is not honored
URL: https://github.com/apache/spark/pull/25175
 
 
   



[GitHub] [spark] HyukjinKwon closed pull request #25182: [SPARK-27609][PYTHON] Convert values of function options to strings

2019-07-17 Thread GitBox

HyukjinKwon closed pull request #25182: [SPARK-27609][PYTHON] Convert values of 
function options to strings
URL: https://github.com/apache/spark/pull/25182
 
 
   



[GitHub] [spark] imback82 commented on a change in pull request #25090: [SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF test base

2019-07-17 Thread GitBox

imback82 commented on a change in pull request #25090: 
[SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF 
test base
URL: https://github.com/apache/spark/pull/25090#discussion_r304732055
 
 

 ##
 File path: sql/core/src/test/resources/sql-tests/inputs/udf/udf-except-all.sql
 ##
 @@ -0,0 +1,166 @@
+-- This test file was converted from except-all.sql.
+-- Note that currently registered UDF returns a string. So there are some differences, for instance
+-- in string cast within UDF in Scala and Python.
+
+CREATE TEMPORARY VIEW tab1 AS SELECT * FROM VALUES
+(0), (1), (2), (2), (2), (2), (3), (null), (null) AS tab1(c1);
+CREATE TEMPORARY VIEW tab2 AS SELECT * FROM VALUES
+(1), (2), (2), (3), (5), (5), (null) AS tab2(c1);
+CREATE TEMPORARY VIEW tab3 AS SELECT * FROM VALUES
+(1, 2), 
+(1, 2),
+(1, 3),
+(2, 3),
+(2, 2)
+AS tab3(k, v);
+CREATE TEMPORARY VIEW tab4 AS SELECT * FROM VALUES
+(1, 2), 
+(2, 3),
+(2, 2),
+(2, 2),
+(2, 20)
+AS tab4(k, v);
+
+-- Basic EXCEPT ALL
+SELECT * FROM tab1
+EXCEPT ALL
+SELECT * FROM tab2;
+
+-- MINUS ALL (synonym for EXCEPT)
+SELECT * FROM tab1
+MINUS ALL
+SELECT * FROM tab2;
+
+-- EXCEPT ALL same table in both branches
+-- Note that there will one less NULL in the result compared to the non-udf result
+-- because udf converts null to a string "null".
+SELECT * FROM tab1
+EXCEPT ALL
+SELECT * FROM tab2 WHERE udf(c1) IS NOT NULL;
+
+-- Empty left relation
+SELECT * FROM tab1 WHERE udf(c1) > 5
+EXCEPT ALL
+SELECT * FROM tab2;
+
+-- Empty right relation
+SELECT * FROM tab1
+EXCEPT ALL
+SELECT * FROM tab2 WHERE c1 > udf(6);
 
 Review comment:
   Changed.



[GitHub] [spark] imback82 commented on a change in pull request #25090: [SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF test base

2019-07-17 Thread GitBox

imback82 commented on a change in pull request #25090: 
[SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF 
test base
URL: https://github.com/apache/spark/pull/25090#discussion_r304732075
 
 

 ##
 File path: sql/core/src/test/resources/sql-tests/inputs/udf/udf-except-all.sql
 ##
 @@ -0,0 +1,166 @@
+-- This test file was converted from except-all.sql.
+-- Note that currently registered UDF returns a string. So there are some differences, for instance
+-- in string cast within UDF in Scala and Python.
+
+CREATE TEMPORARY VIEW tab1 AS SELECT * FROM VALUES
+(0), (1), (2), (2), (2), (2), (3), (null), (null) AS tab1(c1);
+CREATE TEMPORARY VIEW tab2 AS SELECT * FROM VALUES
+(1), (2), (2), (3), (5), (5), (null) AS tab2(c1);
+CREATE TEMPORARY VIEW tab3 AS SELECT * FROM VALUES
+(1, 2), 
+(1, 2),
+(1, 3),
+(2, 3),
+(2, 2)
+AS tab3(k, v);
+CREATE TEMPORARY VIEW tab4 AS SELECT * FROM VALUES
+(1, 2), 
+(2, 3),
+(2, 2),
+(2, 2),
+(2, 20)
+AS tab4(k, v);
+
+-- Basic EXCEPT ALL
+SELECT * FROM tab1
+EXCEPT ALL
+SELECT * FROM tab2;
+
+-- MINUS ALL (synonym for EXCEPT)
+SELECT * FROM tab1
+MINUS ALL
+SELECT * FROM tab2;
+
+-- EXCEPT ALL same table in both branches
+-- Note that there will one less NULL in the result compared to the non-udf result
+-- because udf converts null to a string "null".
+SELECT * FROM tab1
+EXCEPT ALL
+SELECT * FROM tab2 WHERE udf(c1) IS NOT NULL;
+
+-- Empty left relation
+SELECT * FROM tab1 WHERE udf(c1) > 5
+EXCEPT ALL
+SELECT * FROM tab2;
+
+-- Empty right relation
+SELECT * FROM tab1
+EXCEPT ALL
+SELECT * FROM tab2 WHERE c1 > udf(6);
+
+-- Type Coerced ExceptAll
+SELECT * FROM tab1
+EXCEPT ALL
+SELECT CAST(udf(1) AS BIGINT);
+
+-- Error as types of two side are not compatible
+SELECT * FROM tab1
+EXCEPT ALL
+SELECT array(1);
 
 Review comment:
   Changed.



[GitHub] [spark] imback82 commented on a change in pull request #25090: [SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF test base

2019-07-17 Thread GitBox

imback82 commented on a change in pull request #25090: 
[SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF 
test base
URL: https://github.com/apache/spark/pull/25090#discussion_r304731897
 
 

 ##
 File path: sql/core/src/test/resources/sql-tests/inputs/udf/udf-except-all.sql
 ##
 @@ -0,0 +1,166 @@
+-- This test file was converted from except-all.sql.
+-- Note that currently registered UDF returns a string. So there are some differences, for instance
+-- in string cast within UDF in Scala and Python.
+
+CREATE TEMPORARY VIEW tab1 AS SELECT * FROM VALUES
+(0), (1), (2), (2), (2), (2), (3), (null), (null) AS tab1(c1);
+CREATE TEMPORARY VIEW tab2 AS SELECT * FROM VALUES
+(1), (2), (2), (3), (5), (5), (null) AS tab2(c1);
+CREATE TEMPORARY VIEW tab3 AS SELECT * FROM VALUES
+(1, 2), 
+(1, 2),
+(1, 3),
+(2, 3),
+(2, 2)
+AS tab3(k, v);
+CREATE TEMPORARY VIEW tab4 AS SELECT * FROM VALUES
+(1, 2), 
+(2, 3),
+(2, 2),
+(2, 2),
+(2, 20)
+AS tab4(k, v);
+
+-- Basic EXCEPT ALL
+SELECT * FROM tab1
 
 Review comment:
   @HyukjinKwon, do you want this in all instances below?



[GitHub] [spark] HyukjinKwon commented on issue #25175: [SPARK-28411][PYTHON][SQL] InsertInto with overwrite is not honored

2019-07-17 Thread GitBox

HyukjinKwon commented on issue #25175: [SPARK-28411][PYTHON][SQL] InsertInto 
with overwrite is not honored
URL: https://github.com/apache/spark/pull/25175#issuecomment-512661717
 
 
   Merged to master.



[GitHub] [spark] HyukjinKwon commented on issue #25182: [SPARK-27609][PYTHON] Convert values of function options to strings

2019-07-17 Thread GitBox

HyukjinKwon commented on issue #25182: [SPARK-27609][PYTHON] Convert values of 
function options to strings
URL: https://github.com/apache/spark/pull/25182#issuecomment-512661568
 
 
   Merged to master.



[GitHub] [spark] imback82 commented on a change in pull request #25090: [SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF test base

2019-07-17 Thread GitBox

imback82 commented on a change in pull request #25090: 
[SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF 
test base
URL: https://github.com/apache/spark/pull/25090#discussion_r304731560
 
 

 ##
 File path: sql/core/src/test/resources/sql-tests/results/udf/udf-except-all.sql.out
 ##
 @@ -0,0 +1,345 @@
+-- Automatically generated by SQLQueryTestSuite
+-- Number of queries: 27
+
+
+-- !query 0
+CREATE TEMPORARY VIEW tab1 AS SELECT * FROM VALUES
+(0), (1), (2), (2), (2), (2), (3), (null), (null) AS tab1(c1)
+-- !query 0 schema
+struct<>
+-- !query 0 output
+
+
+
+-- !query 1
+CREATE TEMPORARY VIEW tab2 AS SELECT * FROM VALUES
+(1), (2), (2), (3), (5), (5), (null) AS tab2(c1)
+-- !query 1 schema
+struct<>
+-- !query 1 output
+
+
+
+-- !query 2
+CREATE TEMPORARY VIEW tab3 AS SELECT * FROM VALUES
+(1, 2), 
+(1, 2),
+(1, 3),
+(2, 3),
+(2, 2)
+AS tab3(k, v)
+-- !query 2 schema
+struct<>
+-- !query 2 output
+
+
+
+-- !query 3
+CREATE TEMPORARY VIEW tab4 AS SELECT * FROM VALUES
+(1, 2), 
+(2, 3),
+(2, 2),
+(2, 2),
+(2, 20)
+AS tab4(k, v)
+-- !query 3 schema
+struct<>
+-- !query 3 output
+
+
+
+-- !query 4
+SELECT * FROM tab1
+EXCEPT ALL
+SELECT * FROM tab2
+-- !query 4 schema
+struct
+-- !query 4 output
+0
+2
+2
+NULL
+
+
+-- !query 5
+SELECT * FROM tab1
+MINUS ALL
+SELECT * FROM tab2
+-- !query 5 schema
+struct
+-- !query 5 output
+0
+2
+2
+NULL
+
+
+-- !query 6
+SELECT * FROM tab1
+EXCEPT ALL
+SELECT * FROM tab2 WHERE udf(c1) IS NOT NULL
+-- !query 6 schema
+struct
+-- !query 6 output
+0
+2
+2
+NULL
 
 Review comment:
   Reverted the comment since now it returns the correct result with your 
changes :)
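
For readers following the NULL count in this thread, here is a minimal Python sketch of the multiset semantics involved. It is a model under stated assumptions, not Spark's implementation: it assumes EXCEPT ALL cancels one left-hand row per equal right-hand row and treats NULLs as equal to each other. The helpers `except_all`, `string_udf`, `noop_udf`, and `query6` are hypothetical names introduced only for this illustration; `string_udf` models the note in the test file that the UDF turns null into the string "null".

```python
from collections import Counter

# Data copied from the tab1 and tab2 views in the test file (None stands for NULL).
tab1 = [0, 1, 2, 2, 2, 2, 3, None, None]
tab2 = [1, 2, 2, 3, 5, 5, None]


def except_all(left, right):
    """Multiset difference: each right row cancels at most one equal left row."""
    remaining = Counter(left) - Counter(right)
    # Sort with None last so the output order is deterministic.
    return sorted(remaining.elements(), key=lambda v: (v is None, 0 if v is None else v))


def string_udf(value):
    # Assumed model of a UDF that stringifies its input, so NULL becomes "null".
    return "null" if value is None else str(value)


def noop_udf(value):
    # Assumed model of a (virtually) no-op UDF that passes NULL through.
    return value


def query6(udf):
    # SELECT * FROM tab1 EXCEPT ALL SELECT * FROM tab2 WHERE udf(c1) IS NOT NULL
    right = [v for v in tab2 if udf(v) is not None]
    return except_all(tab1, right)


print(except_all(tab1, tab2))  # [0, 2, 2, None]
print(query6(string_udf))      # [0, 2, 2, None]
print(query6(noop_udf))        # [0, 2, 2, None, None]
```

In this model the string-returning UDF keeps the NULL row on the right-hand side, which removes one NULL from the result; a pass-through UDF filters that row out and leaves both NULLs.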



[GitHub] [spark] AmplabJenkins removed a comment on issue #25160: [SPARK-28399][ML] implement RobustScaler

2019-07-17 Thread GitBox

AmplabJenkins removed a comment on issue #25160: [SPARK-28399][ML] implement 
RobustScaler
URL: https://github.com/apache/spark/pull/25160#issuecomment-512661306
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/107811/
   Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #25160: [SPARK-28399][ML] implement RobustScaler

2019-07-17 Thread GitBox

AmplabJenkins removed a comment on issue #25160: [SPARK-28399][ML] implement 
RobustScaler
URL: https://github.com/apache/spark/pull/25160#issuecomment-512661303
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #25160: [SPARK-28399][ML] implement RobustScaler

2019-07-17 Thread GitBox

AmplabJenkins commented on issue #25160: [SPARK-28399][ML] implement 
RobustScaler
URL: https://github.com/apache/spark/pull/25160#issuecomment-512661306
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/107811/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #25160: [SPARK-28399][ML] implement RobustScaler

2019-07-17 Thread GitBox

AmplabJenkins commented on issue #25160: [SPARK-28399][ML] implement 
RobustScaler
URL: https://github.com/apache/spark/pull/25160#issuecomment-512661303
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] SparkQA removed a comment on issue #25160: [SPARK-28399][ML] implement RobustScaler

2019-07-17 Thread GitBox

SparkQA removed a comment on issue #25160: [SPARK-28399][ML] implement 
RobustScaler
URL: https://github.com/apache/spark/pull/25160#issuecomment-512649817
 
 
   **[Test build #107811 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107811/testReport)**
 for PR 25160 at commit 
[`a196c09`](https://github.com/apache/spark/commit/a196c09bdc1a94a4f98da1328d29815bb993140b).



[GitHub] [spark] SparkQA commented on issue #25160: [SPARK-28399][ML] implement RobustScaler

2019-07-17 Thread GitBox

SparkQA commented on issue #25160: [SPARK-28399][ML] implement RobustScaler
URL: https://github.com/apache/spark/pull/25160#issuecomment-512661053
 
 
   **[Test build #107811 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107811/testReport)**
 for PR 25160 at commit 
[`a196c09`](https://github.com/apache/spark/commit/a196c09bdc1a94a4f98da1328d29815bb993140b).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.



[GitHub] [spark] HyukjinKwon commented on issue #25168: [SPARK-28276][SQL][PYTHON][TEST] Convert and port 'cross-join.sql' into UDF test base

2019-07-17 Thread GitBox

HyukjinKwon commented on issue #25168: [SPARK-28276][SQL][PYTHON][TEST] Convert 
and port 'cross-join.sql' into UDF test base
URL: https://github.com/apache/spark/pull/25168#issuecomment-512660918
 
 
   BTW, @viirya, please feel free to review those PRs when you have some time, 
since you know that code pretty well too.



[GitHub] [spark] AmplabJenkins removed a comment on issue #24798: [SPARK-27724][SQL] Implement REPLACE TABLE and REPLACE TABLE AS SELECT with V2

2019-07-17 Thread GitBox

AmplabJenkins removed a comment on issue #24798: [SPARK-27724][SQL] Implement 
REPLACE TABLE and REPLACE TABLE AS SELECT with V2
URL: https://github.com/apache/spark/pull/24798#issuecomment-512660656
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/107806/
   Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #24798: [SPARK-27724][SQL] Implement REPLACE TABLE and REPLACE TABLE AS SELECT with V2

2019-07-17 Thread GitBox

AmplabJenkins removed a comment on issue #24798: [SPARK-27724][SQL] Implement 
REPLACE TABLE and REPLACE TABLE AS SELECT with V2
URL: https://github.com/apache/spark/pull/24798#issuecomment-512660654
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] SparkQA commented on issue #25180: [SPARK-28423][SQL] merge Scan and Batch/Stream

2019-07-17 Thread GitBox

SparkQA commented on issue #25180: [SPARK-28423][SQL] merge Scan and 
Batch/Stream
URL: https://github.com/apache/spark/pull/25180#issuecomment-512660752
 
 
   **[Test build #107824 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107824/testReport)**
 for PR 25180 at commit 
[`878eaa5`](https://github.com/apache/spark/commit/878eaa520dafa109d1682388c601e4f0b43916ee).



[GitHub] [spark] AmplabJenkins commented on issue #24798: [SPARK-27724][SQL] Implement REPLACE TABLE and REPLACE TABLE AS SELECT with V2

2019-07-17 Thread GitBox

AmplabJenkins commented on issue #24798: [SPARK-27724][SQL] Implement REPLACE 
TABLE and REPLACE TABLE AS SELECT with V2
URL: https://github.com/apache/spark/pull/24798#issuecomment-512660654
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #24798: [SPARK-27724][SQL] Implement REPLACE TABLE and REPLACE TABLE AS SELECT with V2

2019-07-17 Thread GitBox

AmplabJenkins commented on issue #24798: [SPARK-27724][SQL] Implement REPLACE 
TABLE and REPLACE TABLE AS SELECT with V2
URL: https://github.com/apache/spark/pull/24798#issuecomment-512660656
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/107806/
   Test PASSed.



[GitHub] [spark] HyukjinKwon commented on issue #25127: [SPARK-28284][SQL][PYTHON][TESTS] Convert and port 'join-empty-relation.sql' into UDF test base

2019-07-17 Thread GitBox

HyukjinKwon commented on issue #25127: [SPARK-28284][SQL][PYTHON][TESTS] 
Convert and port 'join-empty-relation.sql' into UDF test base
URL: https://github.com/apache/spark/pull/25127#issuecomment-512660638
 
 
   Looks fine in general.



[GitHub] [spark] HyukjinKwon commented on a change in pull request #25127: [SPARK-28284][SQL][PYTHON][TESTS] Convert and port 'join-empty-relation.sql' into UDF test base

2019-07-17 Thread GitBox

HyukjinKwon commented on a change in pull request #25127: 
[SPARK-28284][SQL][PYTHON][TESTS] Convert and port 'join-empty-relation.sql' 
into UDF test base
URL: https://github.com/apache/spark/pull/25127#discussion_r304730735
 
 

 ##
 File path: 
sql/core/src/test/resources/sql-tests/inputs/udf/udf-join-empty-relation.sql
 ##
 @@ -0,0 +1,37 @@
+-- List of configuration the test suite is run against:
+--SET spark.sql.autoBroadcastJoinThreshold=10485760
+--SET spark.sql.autoBroadcastJoinThreshold=-1,spark.sql.join.preferSortMergeJoin=true
+--SET spark.sql.autoBroadcastJoinThreshold=-1,spark.sql.join.preferSortMergeJoin=false
+
+-- This test file was converted from join-empty-relation.sql.
+-- Note that currently registered UDF returns a string. So there are some differences, for instance
+-- in string cast within UDF in Scala and Python.
+
+CREATE TEMPORARY VIEW t1 AS SELECT * FROM VALUES (1) AS GROUPING(a);
+CREATE TEMPORARY VIEW t2 AS SELECT * FROM VALUES (1) AS GROUPING(a);
+
+CREATE TEMPORARY VIEW empty_table as SELECT a FROM t2 WHERE false;
+
+SELECT udf(t1.a), udf(empty_table.a) FROM t1 INNER JOIN empty_table ON (udf(t1.a) = udf(empty_table.a));
 
 Review comment:
   Likewise, we can test the UDFs like `udf(udf(t1.a))` or 
`udf(udf(empty_table.a) = udf(t1.a))`. Let's add such combinations as well.
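
As a rough illustration of the nested combinations suggested above, the sketch below uses Python's built-in `sqlite3` module rather than Spark. The Python function `udf` is a hypothetical stand-in that returns its input as a string, mirroring the note in the test file that the registered UDF returns a string. The views are recreated with SQLite syntax, so this only approximates how the Spark test would behave.

```python
import sqlite3


def udf(value):
    # Hypothetical stand-in for the test suite's UDF (assumption):
    # it returns its input as a string and passes NULL through.
    return None if value is None else str(value)


conn = sqlite3.connect(":memory:")
conn.create_function("udf", 1, udf)

# SQLite equivalents of the views defined in the quoted test file.
conn.executescript("""
    CREATE TEMP VIEW t1 AS SELECT 1 AS a;
    CREATE TEMP VIEW t2 AS SELECT 1 AS a;
    CREATE TEMP VIEW empty_table AS SELECT a FROM t2 WHERE 0;
""")

# Nested UDF in the projection.
nested_projection = conn.execute(
    "SELECT udf(udf(t1.a)), udf(empty_table.a) "
    "FROM t1 INNER JOIN empty_table ON udf(t1.a) = udf(empty_table.a)"
).fetchall()

# UDF wrapped around the whole join predicate.
nested_predicate = conn.execute(
    "SELECT udf(t1.a), udf(empty_table.a) "
    "FROM t1 INNER JOIN empty_table ON udf(udf(empty_table.a) = udf(t1.a))"
).fetchall()

# Non-empty join, so the nested call is actually evaluated on a row.
nested_non_empty = conn.execute(
    "SELECT udf(udf(t1.a)) FROM t1 INNER JOIN t2 ON udf(t1.a) = udf(t2.a)"
).fetchall()

print(nested_projection, nested_predicate, nested_non_empty)
```

Joining against `empty_table` yields no rows however the UDFs are nested, so the join of `t1` with `t2` is included only to show the nested call being applied to real data.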



[GitHub] [spark] AmplabJenkins removed a comment on issue #25180: [SPARK-28423][SQL] merge Scan and Batch/Stream

2019-07-17 Thread GitBox

AmplabJenkins removed a comment on issue #25180: [SPARK-28423][SQL] merge Scan 
and Batch/Stream
URL: https://github.com/apache/spark/pull/25180#issuecomment-512660377
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/12939/
   Test PASSed.



[GitHub] [spark] SparkQA removed a comment on issue #24798: [SPARK-27724][SQL] Implement REPLACE TABLE and REPLACE TABLE AS SELECT with V2

2019-07-17 Thread GitBox

SparkQA removed a comment on issue #24798: [SPARK-27724][SQL] Implement REPLACE 
TABLE and REPLACE TABLE AS SELECT with V2
URL: https://github.com/apache/spark/pull/24798#issuecomment-512625863
 
 
   **[Test build #107806 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107806/testReport)**
 for PR 24798 at commit 
[`be04476`](https://github.com/apache/spark/commit/be04476e968bd5cb5722c3b5a208b8430d78b1b9).



[GitHub] [spark] AmplabJenkins removed a comment on issue #25180: [SPARK-28423][SQL] merge Scan and Batch/Stream

2019-07-17 Thread GitBox

AmplabJenkins removed a comment on issue #25180: [SPARK-28423][SQL] merge Scan 
and Batch/Stream
URL: https://github.com/apache/spark/pull/25180#issuecomment-512660375
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] HyukjinKwon commented on issue #25124: [SPARK-28282][SQL][PYTHON][TESTS] Convert and port 'inline-table.sql' into UDF test base

2019-07-17 Thread GitBox

HyukjinKwon commented on issue #25124: [SPARK-28282][SQL][PYTHON][TESTS] 
Convert and port 'inline-table.sql' into UDF test base
URL: https://github.com/apache/spark/pull/25124#issuecomment-512660401
 
 
   Looks fine in general otherwise.



[GitHub] [spark] AmplabJenkins commented on issue #25180: [SPARK-28423][SQL] merge Scan and Batch/Stream

2019-07-17 Thread GitBox

AmplabJenkins commented on issue #25180: [SPARK-28423][SQL] merge Scan and 
Batch/Stream
URL: https://github.com/apache/spark/pull/25180#issuecomment-512660377
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/12939/
   Test PASSed.



[GitHub] [spark] SparkQA commented on issue #24798: [SPARK-27724][SQL] Implement REPLACE TABLE and REPLACE TABLE AS SELECT with V2

2019-07-17 Thread GitBox

SparkQA commented on issue #24798: [SPARK-27724][SQL] Implement REPLACE TABLE 
and REPLACE TABLE AS SELECT with V2
URL: https://github.com/apache/spark/pull/24798#issuecomment-512660320
 
 
   **[Test build #107806 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107806/testReport)**
 for PR 24798 at commit 
[`be04476`](https://github.com/apache/spark/commit/be04476e968bd5cb5722c3b5a208b8430d78b1b9).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.



[GitHub] [spark] AmplabJenkins commented on issue #25180: [SPARK-28423][SQL] merge Scan and Batch/Stream

2019-07-17 Thread GitBox

AmplabJenkins commented on issue #25180: [SPARK-28423][SQL] merge Scan and 
Batch/Stream
URL: https://github.com/apache/spark/pull/25180#issuecomment-512660375
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] HyukjinKwon commented on a change in pull request #25124: [SPARK-28282][SQL][PYTHON][TESTS] Convert and port 'inline-table.sql' into UDF test base

2019-07-17 Thread GitBox

HyukjinKwon commented on a change in pull request #25124: 
[SPARK-28282][SQL][PYTHON][TESTS] Convert and port 'inline-table.sql' into UDF 
test base
URL: https://github.com/apache/spark/pull/25124#discussion_r304730520
 
 

 ##
 File path: 
sql/core/src/test/resources/sql-tests/inputs/udf/udf-inline-table.sql
 ##
 @@ -0,0 +1,54 @@
+-- This test file was converted from intersect-all.sql.
+-- Note that currently registered UDF returns a string. So there are some differences, for instance
+-- in string cast within UDF in Scala and Python.
+
+-- single row, without table and column alias
+select * from values ("one", 1);
+
+-- single row, without column alias
+select * from values ("one", 1) as data;
+
+-- single row
+select udf(a), b from values ("one", 1) as data(a, b);
+
+-- single column multiple rows
+select udf(a) from values 1, 2, 3 as data(a);
+
+-- three rows
+select udf(a), b from values ("one", 1), ("two", 2), ("three", null) as data(a, b);
+
+-- null type
+select udf(a), b from values ("one", null), ("two", null) as data(a, b);
+
+-- int and long coercion
+select udf(a), b from values ("one", 1), ("two", 2L) as data(a, b);
+
+-- foldable expressions
+select udf(a), udf(b) from values ("one", 1 + 0), ("two", 1 + 3L) as data(a, b);
+
+-- complex types
+select udf(a), b from values ("one", array(0, 1)), ("two", array(2, 3)) as data(a, b);
+
+-- decimal and double coercion
+select udf(a), b from values ("one", 2.0), ("two", 3.0D) as data(a, b);
+
+-- error reporting: nondeterministic function rand
+select udf(a), b from values ("one", rand(5)), ("two", 3.0D) as data(a, b);
+
+-- error reporting: different number of columns
+select udf(a), udf(b) from values ("one", 2.0), ("two") as data(a, b);
+
+-- error reporting: types that are incompatible
+select udf(a), udf(b) from values ("one", array(0, 1)), ("two", struct(1, 2)) as data(a, b);
+
+-- error reporting: number aliases different from number data values
+select udf(a), udf(b) from values ("one"), ("two") as data(a, b);
+
+-- error reporting: unresolved expression
+select udf(a), udf(b) from values ("one", random_not_exist_func(1)), ("two", 2) as data(a, b);
+
+-- error reporting: aggregate expression
+select udf(a), udf(b) from values ("one", count(1)), ("two", 2) as data(a, b);
+
+-- string to timestamp
 
 Review comment:
   Let's add `udf` to all tests. Otherwise, it just duplicates the original file.



[GitHub] [spark] HyukjinKwon commented on a change in pull request #25124: [SPARK-28282][SQL][PYTHON][TESTS] Convert and port 'inline-table.sql' into UDF test base

2019-07-17 Thread GitBox

HyukjinKwon commented on a change in pull request #25124: 
[SPARK-28282][SQL][PYTHON][TESTS] Convert and port 'inline-table.sql' into UDF 
test base
URL: https://github.com/apache/spark/pull/25124#discussion_r304730452
 
 

 ##
 File path: 
sql/core/src/test/resources/sql-tests/inputs/udf/udf-inline-table.sql
 ##
 @@ -0,0 +1,54 @@
+-- This test file was converted from intersect-all.sql.
+-- Note that currently registered UDF returns a string. So there are some differences, for instance
+-- in string cast within UDF in Scala and Python.
+
+-- single row, without table and column alias
+select * from values ("one", 1);
+
+-- single row, without column alias
+select * from values ("one", 1) as data;
+
+-- single row
+select udf(a), b from values ("one", 1) as data(a, b);
+
+-- single column multiple rows
+select udf(a) from values 1, 2, 3 as data(a);
+
+-- three rows
+select udf(a), b from values ("one", 1), ("two", 2), ("three", null) as data(a, b);
+
+-- null type
+select udf(a), b from values ("one", null), ("two", null) as data(a, b);
+
+-- int and long coercion
+select udf(a), b from values ("one", 1), ("two", 2L) as data(a, b);
+
+-- foldable expressions
+select udf(a), udf(b) from values ("one", 1 + 0), ("two", 1 + 3L) as data(a, b);
 
 Review comment:
   I would test `udf(udf(a))` too



[GitHub] [spark] HyukjinKwon commented on a change in pull request #25124: [SPARK-28282][SQL][PYTHON][TESTS] Convert and port 'inline-table.sql' into UDF test base

2019-07-17 Thread GitBox

HyukjinKwon commented on a change in pull request #25124: 
[SPARK-28282][SQL][PYTHON][TESTS] Convert and port 'inline-table.sql' into UDF 
test base
URL: https://github.com/apache/spark/pull/25124#discussion_r304730380
 
 

 ##
 File path: 
sql/core/src/test/resources/sql-tests/inputs/udf/udf-inline-table.sql
 ##
 @@ -0,0 +1,54 @@
+-- This test file was converted from intersect-all.sql.
+-- Note that currently registered UDF returns a string. So there are some differences, for instance
+-- in string cast within UDF in Scala and Python.
+
+-- single row, without table and column alias
+select * from values ("one", 1);
+
+-- single row, without column alias
+select * from values ("one", 1) as data;
+
+-- single row
+select udf(a), b from values ("one", 1) as data(a, b);
 
 Review comment:
   See `udf-aggregates_part1.sql` to check how I commented them.



[GitHub] [spark] HyukjinKwon commented on a change in pull request #25124: [SPARK-28282][SQL][PYTHON][TESTS] Convert and port 'inline-table.sql' into UDF test base

2019-07-17 Thread GitBox

HyukjinKwon commented on a change in pull request #25124: 
[SPARK-28282][SQL][PYTHON][TESTS] Convert and port 'inline-table.sql' into UDF 
test base
URL: https://github.com/apache/spark/pull/25124#discussion_r304730321
 
 

 ##
 File path: 
sql/core/src/test/resources/sql-tests/inputs/udf/udf-inline-table.sql
 ##
 @@ -0,0 +1,54 @@
+-- This test file was converted from intersect-all.sql.
+-- Note that currently registered UDF returns a string. So there are some differences, for instance
+-- in string cast within UDF in Scala and Python.
+
+-- single row, without table and column alias
+select * from values ("one", 1);
+
+-- single row, without column alias
+select * from values ("one", 1) as data;
+
+-- single row
+select udf(a), b from values ("one", 1) as data(a, b);
 
 Review comment:
   I think `values ("one", udf(1))` is not allowed as of SPARK-28291. We can 
add that test here and comment it with a link to the `SPARK-28291` JIRA.



[GitHub] [spark] HyukjinKwon commented on a change in pull request #25124: [SPARK-28282][SQL][PYTHON][TESTS] Convert and port 'inline-table.sql' into UDF test base

2019-07-17 Thread GitBox

HyukjinKwon commented on a change in pull request #25124: 
[SPARK-28282][SQL][PYTHON][TESTS] Convert and port 'inline-table.sql' into UDF 
test base
URL: https://github.com/apache/spark/pull/25124#discussion_r304730144
 
 

 ##
 File path: 
sql/core/src/test/resources/sql-tests/inputs/udf/udf-inline-table.sql
 ##
 @@ -0,0 +1,54 @@
+-- This test file was converted from intersect-all.sql.
+-- Note that currently registered UDF returns a string. So there are some differences, for instance
+-- in string cast within UDF in Scala and Python.
+
+-- single row, without table and column alias
+select * from values ("one", 1);
+
+-- single row, without column alias
+select * from values ("one", 1) as data;
 
 Review comment:
   Let's explicitly test the UDF here instead of `*`.



[GitHub] [spark] HyukjinKwon commented on issue #25122: [SPARK-28286][SQL][PYTHON][TESTS][WIP] Convert and port 'pivot.sql' into UDF test base

2019-07-17 Thread GitBox

HyukjinKwon commented on issue #25122: [SPARK-28286][SQL][PYTHON][TESTS][WIP] 
Convert and port 'pivot.sql' into UDF test base
URL: https://github.com/apache/spark/pull/25122#issuecomment-512659898
 
 
   Looks fine otherwise if the tests pass. I will take another look before 
merging it in.



[GitHub] [spark] HyukjinKwon commented on a change in pull request #25122: [SPARK-28286][SQL][PYTHON][TESTS][WIP] Convert and port 'pivot.sql' into UDF test base

2019-07-17 Thread GitBox

HyukjinKwon commented on a change in pull request #25122: 
[SPARK-28286][SQL][PYTHON][TESTS][WIP] Convert and port 'pivot.sql' into UDF 
test base
URL: https://github.com/apache/spark/pull/25122#discussion_r304729849
 
 

 ##
 File path: sql/core/src/test/resources/sql-tests/inputs/udf/udf-pivot.sql
 ##
 @@ -0,0 +1,317 @@
+-- This test file was converted from pivot.sql.
+
+-- Note that currently registered UDF returns a string. So there are some differences, for instance
+-- in string cast within UDF in Scala and Python.
+
+--Note some test cases have been commented as the current integrated UDFs cannot handle complex types
+
+create temporary view courseSales as select * from values
+  ("dotNET", 2012, 1),
+  ("Java", 2012, 2),
+  ("dotNET", 2012, 5000),
+  ("dotNET", 2013, 48000),
+  ("Java", 2013, 3)
+  as courseSales(course, year, earnings);
+
+create temporary view years as select * from values
+  (2012, 1),
+  (2013, 2)
+  as years(y, s);
+
+create temporary view yearsWithComplexTypes as select * from values
+  (2012, array(1, 1), map('1', 1), struct(1, 'a')),
+  (2013, array(2, 2), map('2', 2), struct(2, 'b'))
+  as yearsWithComplexTypes(y, a, m, s);
+
+-- pivot courses
+SELECT * FROM (
+  SELECT udf(year), course, earnings FROM courseSales
+)
+PIVOT (
+  udf(sum(earnings))
+  FOR course IN ('dotNET', 'Java')
+);
+
+-- pivot years with no subquery
+SELECT * FROM courseSales
+PIVOT (
+  udf(sum(earnings))
+  FOR year IN (2012, 2013)
+);
+
+-- pivot courses with multiple aggregations
+SELECT * FROM (
+  SELECT year, course, earnings FROM courseSales
+)
+PIVOT (
+  udf(sum(earnings)), udf(avg(earnings))
+  FOR course IN ('dotNET', 'Java')
+);
+
+-- pivot with no group by column
+SELECT * FROM (
+  SELECT udf(course) as course, earnings FROM courseSales
+)
+PIVOT (
+  udf(sum(earnings))
+  FOR course IN ('dotNET', 'Java')
+);
+
+-- pivot with no group by column and with multiple aggregations on different columns
+SELECT * FROM (
+  SELECT year, course, earnings FROM courseSales
+)
+PIVOT (
+  udf(sum(earnings)), udf(min(year))
 
 Review comment:
   In general, we can also try the `udf(sum(udf(earnings)))` combination throughout this file.
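
To make the suggested combination concrete, here is a minimal sketch that applies a UDF both inside and outside an aggregate. It runs on Python's built-in `sqlite3` module instead of Spark, uses a plain `GROUP BY` because SQLite has no `PIVOT` clause, and the Python function `udf` is a hypothetical string-returning stand-in for the test suite's UDF. The data is the `courseSales` sample from the quoted diff.

```python
import sqlite3


def udf(value):
    # Hypothetical stand-in for the test suite's UDF (assumption):
    # it returns its input as a string and passes NULL through.
    return None if value is None else str(value)


conn = sqlite3.connect(":memory:")
conn.create_function("udf", 1, udf)

# Same sample rows as the courseSales view in the quoted test file.
conn.executescript("""
    CREATE TABLE courseSales(course TEXT, year INTEGER, earnings INTEGER);
    INSERT INTO courseSales VALUES
      ('dotNET', 2012, 1),
      ('Java', 2012, 2),
      ('dotNET', 2012, 5000),
      ('dotNET', 2013, 48000),
      ('Java', 2013, 3);
""")

# UDF applied to the aggregate input and again to the aggregate result.
rows = conn.execute(
    "SELECT course, udf(sum(udf(earnings))) AS total "
    "FROM courseSales GROUP BY course ORDER BY course"
).fetchall()

# The outer UDF returns strings, so convert before comparing numerically.
totals = {course: float(total) for course, total in rows}
print(totals)
```

Because the stand-in UDF returns strings, the inner call forces the engine to coerce text back to numbers before summing, which is the kind of type difference the header note in the test file warns about.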



[GitHub] [spark] HyukjinKwon commented on a change in pull request #25122: [SPARK-28286][SQL][PYTHON][TESTS][WIP] Convert and port 'pivot.sql' into UDF test base

2019-07-17 Thread GitBox

HyukjinKwon commented on a change in pull request #25122: 
[SPARK-28286][SQL][PYTHON][TESTS][WIP] Convert and port 'pivot.sql' into UDF 
test base
URL: https://github.com/apache/spark/pull/25122#discussion_r304729980
 
 

 ##
 File path: sql/core/src/test/resources/sql-tests/inputs/udf/udf-pivot.sql
 ##
 @@ -0,0 +1,317 @@
+-- This test file was converted from pivot.sql.
+
+-- Note that currently registered UDF returns a string. So there are some differences, for instance
+-- in string cast within UDF in Scala and Python.
+
+--Note some test cases have been commented as the current integrated UDFs cannot handle complex types
+
+create temporary view courseSales as select * from values
+  ("dotNET", 2012, 1),
+  ("Java", 2012, 2),
+  ("dotNET", 2012, 5000),
+  ("dotNET", 2013, 48000),
+  ("Java", 2013, 3)
+  as courseSales(course, year, earnings);
+
+create temporary view years as select * from values
+  (2012, 1),
+  (2013, 2)
+  as years(y, s);
+
+create temporary view yearsWithComplexTypes as select * from values
+  (2012, array(1, 1), map('1', 1), struct(1, 'a')),
+  (2013, array(2, 2), map('2', 2), struct(2, 'b'))
+  as yearsWithComplexTypes(y, a, m, s);
+
+-- pivot courses
+SELECT * FROM (
+  SELECT udf(year), course, earnings FROM courseSales
+)
+PIVOT (
+  udf(sum(earnings))
+  FOR course IN ('dotNET', 'Java')
+);
+
+-- pivot years with no subquery
+SELECT * FROM courseSales
+PIVOT (
+  udf(sum(earnings))
+  FOR year IN (2012, 2013)
+);
+
+-- pivot courses with multiple aggregations
+SELECT * FROM (
+  SELECT year, course, earnings FROM courseSales
+)
+PIVOT (
+  udf(sum(earnings)), udf(avg(earnings))
+  FOR course IN ('dotNET', 'Java')
+);
+
+-- pivot with no group by column
+SELECT * FROM (
+  SELECT udf(course) as course, earnings FROM courseSales
+)
+PIVOT (
+  udf(sum(earnings))
+  FOR course IN ('dotNET', 'Java')
+);
+
+-- pivot with no group by column and with multiple aggregations on different columns
+SELECT * FROM (
+  SELECT year, course, earnings FROM courseSales
+)
+PIVOT (
+  udf(sum(earnings)), udf(min(year))
+  FOR course IN ('dotNET', 'Java')
+);
+
+--todo nan fix
+-- pivot on join query with multiple group by columns
+SELECT * FROM (
+  SELECT course, year, earnings, udf(s) as s
+  FROM courseSales
+  JOIN years ON year = y
+)
+PIVOT (
+  udf(sum(earnings))
+  FOR s IN (1, 2)
+);
+
+-- pivot on join query with multiple aggregations on different columns
+SELECT * FROM (
+  SELECT course, year, earnings, s
+  FROM courseSales
+  JOIN years ON year = y
+)
+PIVOT (
+  udf(sum(earnings)), udf(min(s))
+  FOR course IN ('dotNET', 'Java')
+);
+
+-- pivot on join query with multiple columns in one aggregation
+SELECT * FROM (
+  SELECT course, year, earnings, s
+  FROM courseSales
+  JOIN years ON year = y
+)
+PIVOT (
+  udf(sum(earnings * s))
+  FOR course IN ('dotNET', 'Java')
+);
+
+-- pivot with aliases and projection
+SELECT 2012_s, 2013_s, 2012_a, 2013_a, c FROM (
+  SELECT year y, course c, earnings e FROM courseSales
+)
+PIVOT (
+  udf(sum(e)) s, udf(avg(e)) a
+  FOR y IN (2012, 2013)
+);
+
+-- pivot with projection and value aliases
+SELECT firstYear_s, secondYear_s, firstYear_a, secondYear_a, c FROM (
+  SELECT year y, course c, earnings e FROM courseSales
+)
+PIVOT (
+  udf(sum(e)) s, udf(avg(e)) a
+  FOR y IN (2012 as firstYear, 2013 secondYear)
+);
+
+-- pivot years with non-aggregate function
+SELECT * FROM courseSales
+PIVOT (
+  udf(abs(earnings))
+  FOR year IN (2012, 2013)
+);
+
+-- pivot with one of the expressions as non-aggregate function
+SELECT * FROM (
+  SELECT year, course, earnings FROM courseSales
+)
+PIVOT (
+  udf(sum(earnings)), year
+  FOR course IN ('dotNET', 'Java')
+);
+
+-- pivot with unresolvable columns
+SELECT * FROM (
+  SELECT course, earnings FROM courseSales
+)
+PIVOT (
+  udf(sum(earnings))
+  FOR year IN (2012, 2013)
+);
+
+-- pivot with complex aggregate expressions
+SELECT * FROM (
+  SELECT year, course, earnings FROM courseSales
+)
+PIVOT (
+  udf(ceil(udf(sum(earnings)))), avg(earnings) + 1 as a1
+  FOR course IN ('dotNET', 'Java')
+);
+
+-- pivot with invalid arguments in aggregate expressions
+SELECT * FROM (
+  SELECT year, course, earnings FROM courseSales
+)
+PIVOT (
+  sum(udf(avg(earnings)))
+  FOR course IN ('dotNET', 'Java')
+);
+
+--todo nan fix
+-- pivot on multiple pivot columns
+SELECT * FROM (
+  SELECT course, year, earnings, s
+  FROM courseSales
+  JOIN years ON year = y
+)
+PIVOT (
+  udf(sum(earnings))
+  FOR (course, year) IN (('dotNET', 2012), ('Java', 2013))
+);
+
+--todo nan fix
+-- pivot on multiple pivot columns with aliased values
+SELECT * FROM (
+  SELECT course, year, earnings, s
+  FROM courseSales
+  JOIN years ON year = y
+)
+PIVOT (
+  udf(sum(earnings))
+  FOR (course, s) IN (('dotNET', 2) as c1, ('Java', 1) as c2)
+);
+
+-- pivot on multiple pivot columns with values of wrong data types
+SELECT * FROM (
+  SELECT course, year, earnings, s
+  FROM courseSales
+  JOIN years ON

[GitHub] [spark] HyukjinKwon commented on a change in pull request #25122: [SPARK-28286][SQL][PYTHON][TESTS][WIP] Convert and port 'pivot.sql' into UDF test base

2019-07-17 Thread GitBox

HyukjinKwon commented on a change in pull request #25122: 
[SPARK-28286][SQL][PYTHON][TESTS][WIP] Convert and port 'pivot.sql' into UDF 
test base
URL: https://github.com/apache/spark/pull/25122#discussion_r304729849
 
 

 ##
 File path: sql/core/src/test/resources/sql-tests/inputs/udf/udf-pivot.sql
 ##
 @@ -0,0 +1,317 @@
+-- This test file was converted from pivot.sql.
+
+-- Note that currently registered UDF returns a string. So there are some differences, for instance
+-- in string cast within UDF in Scala and Python.
+
+--Note some test cases have been commented as the current integrated UDFs cannot handle complex types
+
+create temporary view courseSales as select * from values
+  ("dotNET", 2012, 1),
+  ("Java", 2012, 2),
+  ("dotNET", 2012, 5000),
+  ("dotNET", 2013, 48000),
+  ("Java", 2013, 3)
+  as courseSales(course, year, earnings);
+
+create temporary view years as select * from values
+  (2012, 1),
+  (2013, 2)
+  as years(y, s);
+
+create temporary view yearsWithComplexTypes as select * from values
+  (2012, array(1, 1), map('1', 1), struct(1, 'a')),
+  (2013, array(2, 2), map('2', 2), struct(2, 'b'))
+  as yearsWithComplexTypes(y, a, m, s);
+
+-- pivot courses
+SELECT * FROM (
+  SELECT udf(year), course, earnings FROM courseSales
+)
+PIVOT (
+  udf(sum(earnings))
+  FOR course IN ('dotNET', 'Java')
+);
+
+-- pivot years with no subquery
+SELECT * FROM courseSales
+PIVOT (
+  udf(sum(earnings))
+  FOR year IN (2012, 2013)
+);
+
+-- pivot courses with multiple aggregations
+SELECT * FROM (
+  SELECT year, course, earnings FROM courseSales
+)
+PIVOT (
+  udf(sum(earnings)), udf(avg(earnings))
+  FOR course IN ('dotNET', 'Java')
+);
+
+-- pivot with no group by column
+SELECT * FROM (
+  SELECT udf(course) as course, earnings FROM courseSales
+)
+PIVOT (
+  udf(sum(earnings))
+  FOR course IN ('dotNET', 'Java')
+);
+
+-- pivot with no group by column and with multiple aggregations on different columns
+SELECT * FROM (
+  SELECT year, course, earnings FROM courseSales
+)
+PIVOT (
+  udf(sum(earnings)), udf(min(year))
 
 Review comment:
   We can try the `udf(sum(udf(earnings)))` combination too.
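
   As a sketch of that combination applied to the "pivot courses" query quoted above (an illustration only, not a statement taken from the PR; it assumes the test `udf` accepts the numeric `earnings` column and that `sum` can aggregate its result):

   ```sql
   -- pivot courses, with udf applied both inside and outside the aggregate
   SELECT * FROM (
     SELECT year, course, earnings FROM courseSales
   )
   PIVOT (
     udf(sum(udf(earnings)))
     FOR course IN ('dotNET', 'Java')
   );
   ```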



[GitHub] [spark] HyukjinKwon commented on a change in pull request #25119: [SPARK-28283][SQL][PYTHON][TESTS] Convert and port 'intersect-all.sql' into UDF test base

2019-07-17 Thread GitBox

HyukjinKwon commented on a change in pull request #25119: 
[SPARK-28283][SQL][PYTHON][TESTS] Convert and port 'intersect-all.sql' into UDF 
test base
URL: https://github.com/apache/spark/pull/25119#discussion_r304729778
 
 

 ##
 File path: sql/core/src/test/resources/sql-tests/inputs/udf/udf-intersect-all.sql
 ##
 @@ -0,0 +1,164 @@
+-- This test file was converted from intersect-all.sql.
+-- Note that currently registered UDF returns a string. So there are some differences, for instance
+-- in string cast within UDF in Scala and Python.
+
+CREATE TEMPORARY VIEW tab1 AS SELECT * FROM VALUES
+(1, 2), 
+(1, 2),
+(1, 3),
+(1, 3),
+(2, 3),
+(null, null),
+(null, null)
+AS tab1(k, v);
+CREATE TEMPORARY VIEW tab2 AS SELECT * FROM VALUES
+(1, 2), 
+(1, 2), 
+(2, 3),
+(3, 4),
+(null, null),
+(null, null)
+AS tab2(k, v);
+
+-- Basic INTERSECT ALL
+SELECT * FROM tab1
+INTERSECT ALL
+SELECT * FROM tab2;
+
+-- INTERSECT ALL same table in both branches
+SELECT * FROM tab1
+INTERSECT ALL
+SELECT * FROM tab1 WHERE udf(k) = 1;
+
+-- Empty left relation
+SELECT * FROM tab1 WHERE k > udf(2)
+INTERSECT ALL
+SELECT * FROM tab2;
+
+-- Empty right relation
+SELECT * FROM tab1
+INTERSECT ALL
+SELECT * FROM tab2 WHERE CAST(udf(k) AS BIGINT) > CAST(udf(3) AS BIGINT);
+
+-- Type Coerced INTERSECT ALL
+SELECT * FROM tab1
+INTERSECT ALL
+SELECT CAST(udf(1) AS BIGINT), CAST(udf(2) AS BIGINT);
+
+-- Error as types of two side are not compatible
+SELECT * FROM tab1
+INTERSECT ALL
+SELECT array(1), udf(2);
+
+-- Mismatch on number of columns across both branches
+SELECT udf(k) FROM tab1
+INTERSECT ALL
+SELECT udf(k), udf(v) FROM tab2;
+
+-- Basic
+SELECT * FROM tab2
+INTERSECT ALL
+SELECT * FROM tab1
+INTERSECT ALL
+SELECT * FROM tab2;
+
+-- Chain of different `set operations
+SELECT * FROM tab1
+EXCEPT
+SELECT * FROM tab2
+UNION ALL
+SELECT * FROM tab1
+INTERSECT ALL
+SELECT * FROM tab2
+;
+
+-- Chain of different `set operations
+SELECT * FROM tab1
+EXCEPT
+SELECT * FROM tab2
+EXCEPT
+SELECT * FROM tab1
+INTERSECT ALL
+SELECT * FROM tab2
+;
+
+-- test use parenthesis to control order of evaluation
+(
+  (
+(
+  SELECT * FROM tab1
+  EXCEPT
+  SELECT * FROM tab2
+)
+EXCEPT
+SELECT * FROM tab1
+  )
+  INTERSECT ALL
+  SELECT * FROM tab2
+)
+;
+
+-- Join under intersect all
+SELECT * 
+FROM   (SELECT udf(tab1.k),
+   udf(tab2.v)
+FROM   tab1 
+   JOIN tab2 
+ ON CAST(udf(tab1.k) AS BIGINT) = CAST(udf(tab2.k) AS BIGINT))
+INTERSECT ALL 
+SELECT * 
+FROM   (SELECT udf(tab1.k),
+   udf(tab2.v)
+FROM   tab1 
+   JOIN tab2 
+ ON CAST(udf(tab1.k) AS BIGINT) = CAST(udf(tab2.k) AS BIGINT));
+
+-- Join under intersect all (2)
+SELECT * 
+FROM   (SELECT udf(tab1.k),
+   udf(tab2.v)
+FROM   tab1 
+   JOIN tab2 
+ ON CAST(udf(tab1.k) AS BIGINT) = CAST(udf(tab2.k) AS BIGINT))
+INTERSECT ALL 
+SELECT * 
+FROM   (SELECT udf(tab2.v) AS k,
+   udf(tab1.k) AS v
+FROM   tab1 
+   JOIN tab2 
+ ON CAST(udf(tab1.k) AS BIGINT) = CAST(udf(tab2.k) AS BIGINT));
+
+-- Group by under intersect all
+SELECT CAST(udf(v) AS BIGINT) FROM tab1 GROUP BY v
+INTERSECT ALL
+SELECT CAST(udf(k) AS BIGINT) FROM tab2 GROUP BY k;
 
 Review comment:
   Let's get rid of the casts.
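
   A sketch of the last statement with the casts removed (illustrative only, not text from the PR; it assumes the results of `udf` on both branches have compatible types without explicit casts):

   ```sql
   -- Group by under intersect all, casts removed
   SELECT udf(v) FROM tab1 GROUP BY v
   INTERSECT ALL
   SELECT udf(k) FROM tab2 GROUP BY k;
   ```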



[GitHub] [spark] HyukjinKwon commented on a change in pull request #25119: [SPARK-28283][SQL][PYTHON][TESTS] Convert and port 'intersect-all.sql' into UDF test base

2019-07-17 Thread GitBox

HyukjinKwon commented on a change in pull request #25119: 
[SPARK-28283][SQL][PYTHON][TESTS] Convert and port 'intersect-all.sql' into UDF 
test base
URL: https://github.com/apache/spark/pull/25119#discussion_r304729754
 
 

 ##
 File path: sql/core/src/test/resources/sql-tests/inputs/udf/udf-intersect-all.sql
 ##
 @@ -0,0 +1,164 @@
+-- This test file was converted from intersect-all.sql.
+-- Note that currently registered UDF returns a string. So there are some differences, for instance
+-- in string cast within UDF in Scala and Python.
+
+CREATE TEMPORARY VIEW tab1 AS SELECT * FROM VALUES
+(1, 2), 
+(1, 2),
+(1, 3),
+(1, 3),
+(2, 3),
+(null, null),
+(null, null)
+AS tab1(k, v);
+CREATE TEMPORARY VIEW tab2 AS SELECT * FROM VALUES
+(1, 2), 
+(1, 2), 
+(2, 3),
+(3, 4),
+(null, null),
+(null, null)
+AS tab2(k, v);
+
+-- Basic INTERSECT ALL
+SELECT * FROM tab1
+INTERSECT ALL
+SELECT * FROM tab2;
+
+-- INTERSECT ALL same table in both branches
+SELECT * FROM tab1
+INTERSECT ALL
+SELECT * FROM tab1 WHERE udf(k) = 1;
+
+-- Empty left relation
+SELECT * FROM tab1 WHERE k > udf(2)
+INTERSECT ALL
+SELECT * FROM tab2;
+
+-- Empty right relation
+SELECT * FROM tab1
+INTERSECT ALL
+SELECT * FROM tab2 WHERE CAST(udf(k) AS BIGINT) > CAST(udf(3) AS BIGINT);
+
+-- Type Coerced INTERSECT ALL
+SELECT * FROM tab1
+INTERSECT ALL
+SELECT CAST(udf(1) AS BIGINT), CAST(udf(2) AS BIGINT);
+
+-- Error as types of two side are not compatible
+SELECT * FROM tab1
+INTERSECT ALL
+SELECT array(1), udf(2);
+
+-- Mismatch on number of columns across both branches
+SELECT udf(k) FROM tab1
+INTERSECT ALL
+SELECT udf(k), udf(v) FROM tab2;
+
+-- Basic
+SELECT * FROM tab2
+INTERSECT ALL
+SELECT * FROM tab1
+INTERSECT ALL
+SELECT * FROM tab2;
+
+-- Chain of different `set operations
+SELECT * FROM tab1
+EXCEPT
+SELECT * FROM tab2
+UNION ALL
+SELECT * FROM tab1
+INTERSECT ALL
+SELECT * FROM tab2
+;
+
+-- Chain of different `set operations
+SELECT * FROM tab1
+EXCEPT
+SELECT * FROM tab2
+EXCEPT
+SELECT * FROM tab1
+INTERSECT ALL
+SELECT * FROM tab2
+;
+
+-- test use parenthesis to control order of evaluation
+(
+  (
+(
+  SELECT * FROM tab1
+  EXCEPT
+  SELECT * FROM tab2
+)
+EXCEPT
+SELECT * FROM tab1
+  )
+  INTERSECT ALL
+  SELECT * FROM tab2
+)
+;
+
+-- Join under intersect all
+SELECT * 
+FROM   (SELECT udf(tab1.k),
+   udf(tab2.v)
+FROM   tab1 
+   JOIN tab2 
+ ON CAST(udf(tab1.k) AS BIGINT) = CAST(udf(tab2.k) AS BIGINT))
+INTERSECT ALL 
+SELECT * 
+FROM   (SELECT udf(tab1.k),
+   udf(tab2.v)
+FROM   tab1 
+   JOIN tab2 
+ ON CAST(udf(tab1.k) AS BIGINT) = CAST(udf(tab2.k) AS BIGINT));
+
+-- Join under intersect all (2)
+SELECT * 
+FROM   (SELECT udf(tab1.k),
+   udf(tab2.v)
+FROM   tab1 
+   JOIN tab2 
+ ON CAST(udf(tab1.k) AS BIGINT) = CAST(udf(tab2.k) AS BIGINT))
+INTERSECT ALL 
+SELECT * 
+FROM   (SELECT udf(tab2.v) AS k,
+   udf(tab1.k) AS v
+FROM   tab1 
+   JOIN tab2 
+ ON CAST(udf(tab1.k) AS BIGINT) = CAST(udf(tab2.k) AS BIGINT));
 
 Review comment:
   We could try `udf(udf(tab1.k) = udf(tab2.k))` or `udf(udf(tab1.k) = tab2.k)`
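
   For illustration, the first suggested form dropped into the join (a sketch, not text from the PR; it assumes `udf` can wrap the boolean comparison and that the result is still accepted as a join condition, which is what such a test would probe):

   ```sql
   -- Join under intersect all, with the whole join predicate wrapped in udf
   SELECT *
   FROM   (SELECT udf(tab1.k),
                  udf(tab2.v)
           FROM   tab1
                  JOIN tab2
                    ON udf(udf(tab1.k) = udf(tab2.k)));
   ```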



[GitHub] [spark] HyukjinKwon commented on a change in pull request #25119: [SPARK-28283][SQL][PYTHON][TESTS] Convert and port 'intersect-all.sql' into UDF test base

2019-07-17 Thread GitBox

HyukjinKwon commented on a change in pull request #25119: 
[SPARK-28283][SQL][PYTHON][TESTS] Convert and port 'intersect-all.sql' into UDF 
test base
URL: https://github.com/apache/spark/pull/25119#discussion_r304729678
 
 

 ##
 File path: sql/core/src/test/resources/sql-tests/inputs/udf/udf-intersect-all.sql
 ##
 @@ -0,0 +1,164 @@
+-- This test file was converted from intersect-all.sql.
+-- Note that currently registered UDF returns a string. So there are some differences, for instance
+-- in string cast within UDF in Scala and Python.
+
+CREATE TEMPORARY VIEW tab1 AS SELECT * FROM VALUES
+(1, 2), 
+(1, 2),
+(1, 3),
+(1, 3),
+(2, 3),
+(null, null),
+(null, null)
+AS tab1(k, v);
+CREATE TEMPORARY VIEW tab2 AS SELECT * FROM VALUES
+(1, 2), 
+(1, 2), 
+(2, 3),
+(3, 4),
+(null, null),
+(null, null)
+AS tab2(k, v);
+
+-- Basic INTERSECT ALL
+SELECT * FROM tab1
+INTERSECT ALL
+SELECT * FROM tab2;
+
+-- INTERSECT ALL same table in both branches
+SELECT * FROM tab1
+INTERSECT ALL
+SELECT * FROM tab1 WHERE udf(k) = 1;
+
+-- Empty left relation
+SELECT * FROM tab1 WHERE k > udf(2)
+INTERSECT ALL
+SELECT * FROM tab2;
+
+-- Empty right relation
+SELECT * FROM tab1
+INTERSECT ALL
+SELECT * FROM tab2 WHERE CAST(udf(k) AS BIGINT) > CAST(udf(3) AS BIGINT);
+
+-- Type Coerced INTERSECT ALL
+SELECT * FROM tab1
+INTERSECT ALL
+SELECT CAST(udf(1) AS BIGINT), CAST(udf(2) AS BIGINT);
+
+-- Error as types of two side are not compatible
+SELECT * FROM tab1
+INTERSECT ALL
+SELECT array(1), udf(2);
+
+-- Mismatch on number of columns across both branches
+SELECT udf(k) FROM tab1
+INTERSECT ALL
+SELECT udf(k), udf(v) FROM tab2;
+
+-- Basic
+SELECT * FROM tab2
+INTERSECT ALL
+SELECT * FROM tab1
+INTERSECT ALL
+SELECT * FROM tab2;
+
+-- Chain of different `set operations
+SELECT * FROM tab1
+EXCEPT
+SELECT * FROM tab2
+UNION ALL
+SELECT * FROM tab1
+INTERSECT ALL
+SELECT * FROM tab2
+;
+
+-- Chain of different `set operations
+SELECT * FROM tab1
+EXCEPT
+SELECT * FROM tab2
+EXCEPT
+SELECT * FROM tab1
+INTERSECT ALL
+SELECT * FROM tab2
+;
+
+-- test use parenthesis to control order of evaluation
+(
+  (
+(
+  SELECT * FROM tab1
+  EXCEPT
+  SELECT * FROM tab2
+)
+EXCEPT
+SELECT * FROM tab1
+  )
+  INTERSECT ALL
+  SELECT * FROM tab2
+)
+;
+
+-- Join under intersect all
+SELECT * 
+FROM   (SELECT udf(tab1.k),
+   udf(tab2.v)
+FROM   tab1 
+   JOIN tab2 
+ ON CAST(udf(tab1.k) AS BIGINT) = CAST(udf(tab2.k) AS BIGINT))
 
 Review comment:
   Yea, now we don't have to add such cases anymore. Let's get rid of them.



[GitHub] [spark] HyukjinKwon commented on a change in pull request #25119: [SPARK-28283][SQL][PYTHON][TESTS] Convert and port 'intersect-all.sql' into UDF test base

2019-07-17 Thread GitBox

HyukjinKwon commented on a change in pull request #25119: 
[SPARK-28283][SQL][PYTHON][TESTS] Convert and port 'intersect-all.sql' into UDF 
test base
URL: https://github.com/apache/spark/pull/25119#discussion_r304729633
 
 

 ##
 File path: sql/core/src/test/resources/sql-tests/inputs/udf/udf-intersect-all.sql
 ##
 @@ -0,0 +1,164 @@
+-- This test file was converted from intersect-all.sql.
+-- Note that currently registered UDF returns a string. So there are some differences, for instance
+-- in string cast within UDF in Scala and Python.
+
+CREATE TEMPORARY VIEW tab1 AS SELECT * FROM VALUES
+(1, 2), 
+(1, 2),
+(1, 3),
+(1, 3),
+(2, 3),
+(null, null),
+(null, null)
+AS tab1(k, v);
+CREATE TEMPORARY VIEW tab2 AS SELECT * FROM VALUES
+(1, 2), 
+(1, 2), 
+(2, 3),
+(3, 4),
+(null, null),
+(null, null)
+AS tab2(k, v);
+
+-- Basic INTERSECT ALL
+SELECT * FROM tab1
 
 Review comment:
   I think the comments I left on your other PRs apply here too. Let's list the columns explicitly.
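
   One reading of this suggestion, sketched against the basic case (an interpretation, not text from the PR): replace `SELECT *` with explicit columns so that each one can be wrapped in `udf`.

   ```sql
   -- Basic INTERSECT ALL with the columns listed explicitly
   SELECT udf(k), udf(v) FROM tab1
   INTERSECT ALL
   SELECT udf(k), udf(v) FROM tab2;
   ```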



[GitHub] [spark] HyukjinKwon commented on issue #25113: [SPARK-28287][SQL][PYTHON][TESTS] Convert and port 'udaf.sql' into UDF test base

2019-07-17 Thread GitBox

HyukjinKwon commented on issue #25113: [SPARK-28287][SQL][PYTHON][TESTS] 
Convert and port 'udaf.sql' into UDF test base
URL: https://github.com/apache/spark/pull/25113#issuecomment-512659133
 
 
   Looks good to me otherwise.



[GitHub] [spark] HyukjinKwon commented on a change in pull request #25113: [SPARK-28287][SQL][PYTHON][TESTS] Convert and port 'udaf.sql' into UDF test base

2019-07-17 Thread GitBox

HyukjinKwon commented on a change in pull request #25113: 
[SPARK-28287][SQL][PYTHON][TESTS] Convert and port 'udaf.sql' into UDF test base
URL: https://github.com/apache/spark/pull/25113#discussion_r304729414
 
 

 ##
 File path: sql/core/src/test/resources/sql-tests/inputs/udf/udf-udaf.sql
 ##
 @@ -0,0 +1,18 @@
+-- This test file was converted from udaf.sql.
+
+CREATE OR REPLACE TEMPORARY VIEW t1 AS SELECT * FROM VALUES
+(1), (2), (3), (4)
+as t1(int_col1);
+
+CREATE FUNCTION myDoubleAvg AS 'test.org.apache.spark.sql.MyDoubleAvg';
+
+SELECT default.myDoubleAvg(udf(int_col1)) as my_avg from t1;
 
 Review comment:
   @vinodkc, let's add a different combination in general. For instance,
   
   ```
   udf(default.myDoubleAvg(udf(int_col1)))
   ```
   
   ```
   udf(default.myDoubleAvg(int_col1))
   ```



[GitHub] [spark] HyukjinKwon commented on issue #25103: [SPARK-28285][SQL][PYTHON][TESTS] Convert and port 'outer-join.sql' into UDF test base

2019-07-17 Thread GitBox

HyukjinKwon commented on issue #25103: [SPARK-28285][SQL][PYTHON][TESTS] 
Convert and port 'outer-join.sql' into UDF test base
URL: https://github.com/apache/spark/pull/25103#issuecomment-512658948
 
 
   Looks good to me in general if the tests pass



[GitHub] [spark] HyukjinKwon commented on issue #25098: [SPARK-28280][SQL][PYTHON][TESTS] Convert and port 'group-by.sql' into UDF test base

2019-07-17 Thread GitBox

HyukjinKwon commented on issue #25098: [SPARK-28280][SQL][PYTHON][TESTS] 
Convert and port 'group-by.sql' into UDF test base
URL: https://github.com/apache/spark/pull/25098#issuecomment-512658787
 
 
   Looks fine in general, but let's focus on testing the GROUP BY clause with UDFs.



[GitHub] [spark] HyukjinKwon commented on a change in pull request #25098: [SPARK-28280][SQL][PYTHON][TESTS] Convert and port 'group-by.sql' into UDF test base

2019-07-17 Thread GitBox

HyukjinKwon commented on a change in pull request #25098: 
[SPARK-28280][SQL][PYTHON][TESTS] Convert and port 'group-by.sql' into UDF test 
base
URL: https://github.com/apache/spark/pull/25098#discussion_r304729007
 
 

 ##
 File path: sql/core/src/test/resources/sql-tests/inputs/udf/udf-group-by.sql
 ##
 @@ -0,0 +1,156 @@
+-- This test file was converted from group-by.sql.
+-- Test data.
+CREATE OR REPLACE TEMPORARY VIEW testData AS SELECT * FROM VALUES
+(1, 1), (1, 2), (2, 1), (2, 2), (3, 1), (3, 2), (null, 1), (3, null), (null, null)
+AS testData(a, b);
+
+-- Aggregate with empty GroupBy expressions.
+SELECT udf(a), udf(COUNT(b)) FROM testData;
+SELECT COUNT(udf(a)), udf(COUNT(b)) FROM testData;
+
+-- Aggregate with non-empty GroupBy expressions.
+SELECT CAST(udf(a) as int), COUNT(udf(b)) FROM testData GROUP BY a;
+SELECT udf(a), udf(COUNT(b)) FROM testData GROUP BY b;
+SELECT COUNT(udf(a)), COUNT(udf(b)) FROM testData GROUP BY udf(a);
+
+-- Aggregate grouped by literals.
+SELECT 'foo', COUNT(udf(a)) FROM testData GROUP BY 1;
+
+-- Aggregate grouped by literals (whole stage code generation).
+SELECT 'foo' FROM testData WHERE a = 0 GROUP BY 1;
+
+-- Aggregate grouped by literals (hash aggregate).
+SELECT 'foo', udf(APPROX_COUNT_DISTINCT(udf(a))) FROM testData WHERE a = 0 GROUP BY 1;
+
+-- Aggregate grouped by literals (sort aggregate).
+SELECT 'foo', MAX(STRUCT(udf(a))) FROM testData WHERE a = 0 GROUP BY 1;
+
+-- Aggregate with complex GroupBy expressions.
+SELECT CAST(udf(a + b) as INT), udf(COUNT(b)) FROM testData GROUP BY a + b;
 
 Review comment:
   I would focus on adding UDFs in the `GROUP BY` clause, because this test is basically meant to test `GROUP BY`.
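
   A sketch of what moving the UDF into the `GROUP BY` clause could look like for the last statement (illustrative only, not text from the PR; it selects just the aggregate to sidestep the question of whether a UDF in the select list matches the grouping expression):

   ```sql
   -- Aggregate with complex GroupBy expressions, udf in the grouping key
   SELECT udf(COUNT(b)) FROM testData GROUP BY udf(a + b);
   ```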



[GitHub] [spark] HyukjinKwon commented on a change in pull request #25098: [SPARK-28280][SQL][PYTHON][TESTS] Convert and port 'group-by.sql' into UDF test base

2019-07-17 Thread GitBox

HyukjinKwon commented on a change in pull request #25098: 
[SPARK-28280][SQL][PYTHON][TESTS] Convert and port 'group-by.sql' into UDF test 
base
URL: https://github.com/apache/spark/pull/25098#discussion_r304728860
 
 

 ##
 File path: sql/core/src/test/resources/sql-tests/inputs/udf/udf-group-by.sql
 ##
 @@ -0,0 +1,156 @@
+-- This test file was converted from group-by.sql.
+-- Test data.
+CREATE OR REPLACE TEMPORARY VIEW testData AS SELECT * FROM VALUES
+(1, 1), (1, 2), (2, 1), (2, 2), (3, 1), (3, 2), (null, 1), (3, null), (null, null)
+AS testData(a, b);
+
+-- Aggregate with empty GroupBy expressions.
+SELECT udf(a), udf(COUNT(b)) FROM testData;
+SELECT COUNT(udf(a)), udf(COUNT(b)) FROM testData;
+
+-- Aggregate with non-empty GroupBy expressions.
+SELECT CAST(udf(a) as int), COUNT(udf(b)) FROM testData GROUP BY a;
+SELECT udf(a), udf(COUNT(b)) FROM testData GROUP BY b;
 
 Review comment:
   We could test the `udf(COUNT(udf(b)))` combination too.
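
   The suggested combination substituted into the commented line (a sketch, not text from the PR; like the original line it selects `a` while grouping by `b`, so it presumably still targets the same analysis behavior):

   ```sql
   -- udf applied both inside and outside COUNT
   SELECT udf(a), udf(COUNT(udf(b))) FROM testData GROUP BY b;
   ```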



[GitHub] [spark] HyukjinKwon commented on a change in pull request #25098: [SPARK-28280][SQL][PYTHON][TESTS] Convert and port 'group-by.sql' into UDF test base

2019-07-17 Thread GitBox

HyukjinKwon commented on a change in pull request #25098: 
[SPARK-28280][SQL][PYTHON][TESTS] Convert and port 'group-by.sql' into UDF test 
base
URL: https://github.com/apache/spark/pull/25098#discussion_r304728777
 
 

 ##
 File path: sql/core/src/test/resources/sql-tests/inputs/udf/udf-group-by.sql
 ##
 @@ -0,0 +1,156 @@
+-- This test file was converted from group-by.sql.
+-- Test data.
+CREATE OR REPLACE TEMPORARY VIEW testData AS SELECT * FROM VALUES
+(1, 1), (1, 2), (2, 1), (2, 2), (3, 1), (3, 2), (null, 1), (3, null), (null, null)
+AS testData(a, b);
+
+-- Aggregate with empty GroupBy expressions.
+SELECT udf(a), udf(COUNT(b)) FROM testData;
+SELECT COUNT(udf(a)), udf(COUNT(b)) FROM testData;
+
+-- Aggregate with non-empty GroupBy expressions.
+SELECT CAST(udf(a) as int), COUNT(udf(b)) FROM testData GROUP BY a;
+SELECT udf(a), udf(COUNT(b)) FROM testData GROUP BY b;
+SELECT COUNT(udf(a)), COUNT(udf(b)) FROM testData GROUP BY udf(a);
+
+-- Aggregate grouped by literals.
+SELECT 'foo', COUNT(udf(a)) FROM testData GROUP BY 1;
+
+-- Aggregate grouped by literals (whole stage code generation).
+SELECT 'foo' FROM testData WHERE a = 0 GROUP BY 1;
 
 Review comment:
   This one doesn't seem to have a UDF.
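
   One possible way to bring a UDF into this statement (a sketch, not text from the PR; placing it in the `WHERE` predicate is an assumption, and it could equally go in the grouping expression):

   ```sql
   -- Aggregate grouped by literals (whole stage code generation), with a udf in the filter
   SELECT 'foo' FROM testData WHERE udf(a) = 0 GROUP BY 1;
   ```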



[GitHub] [spark] HyukjinKwon commented on issue #25090: [SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF test base

2019-07-17 Thread GitBox

HyukjinKwon commented on issue #25090: [SPARK-28278][SQL][PYTHON][TESTS] 
Convert and port 'except-all.sql' into UDF test base
URL: https://github.com/apache/spark/pull/25090#issuecomment-512658238
 
 
   Looks fine otherwise.



[GitHub] [spark] HyukjinKwon commented on a change in pull request #25090: [SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF test base

2019-07-17 Thread GitBox

HyukjinKwon commented on a change in pull request #25090: 
[SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF 
test base
URL: https://github.com/apache/spark/pull/25090#discussion_r304728650
 
 

 ##
 File path: sql/core/src/test/resources/sql-tests/inputs/udf/udf-except-all.sql
 ##
 @@ -0,0 +1,166 @@
+-- This test file was converted from except-all.sql.
+-- Note that currently registered UDF returns a string. So there are some differences, for instance
+-- in string cast within UDF in Scala and Python.
+
+CREATE TEMPORARY VIEW tab1 AS SELECT * FROM VALUES
+(0), (1), (2), (2), (2), (2), (3), (null), (null) AS tab1(c1);
+CREATE TEMPORARY VIEW tab2 AS SELECT * FROM VALUES
+(1), (2), (2), (3), (5), (5), (null) AS tab2(c1);
+CREATE TEMPORARY VIEW tab3 AS SELECT * FROM VALUES
+(1, 2), 
+(1, 2),
+(1, 3),
+(2, 3),
+(2, 2)
+AS tab3(k, v);
+CREATE TEMPORARY VIEW tab4 AS SELECT * FROM VALUES
+(1, 2), 
+(2, 3),
+(2, 2),
+(2, 2),
+(2, 20)
+AS tab4(k, v);
+
+-- Basic EXCEPT ALL
+SELECT * FROM tab1
+EXCEPT ALL
+SELECT * FROM tab2;
+
+-- MINUS ALL (synonym for EXCEPT)
+SELECT * FROM tab1
+MINUS ALL
+SELECT * FROM tab2;
+
+-- EXCEPT ALL same table in both branches
+-- Note that there will one less NULL in the result compared to the non-udf result
+-- because udf converts null to a string "null".
+SELECT * FROM tab1
+EXCEPT ALL
+SELECT * FROM tab2 WHERE udf(c1) IS NOT NULL;
+
+-- Empty left relation
+SELECT * FROM tab1 WHERE udf(c1) > 5
+EXCEPT ALL
+SELECT * FROM tab2;
+
+-- Empty right relation
+SELECT * FROM tab1
+EXCEPT ALL
+SELECT * FROM tab2 WHERE c1 > udf(6);
+
+-- Type Coerced ExceptAll
+SELECT * FROM tab1
+EXCEPT ALL
+SELECT CAST(udf(1) AS BIGINT);
+
+-- Error as types of two side are not compatible
+SELECT * FROM tab1
+EXCEPT ALL
+SELECT array(1);
+
+-- Basic
+SELECT * FROM tab3
+EXCEPT ALL
+SELECT * FROM tab4;
+
+-- Basic
+SELECT * FROM tab4
+EXCEPT ALL
+SELECT * FROM tab3;
+
+-- EXCEPT ALL + INTERSECT
+SELECT * FROM tab4
+EXCEPT ALL
+SELECT * FROM tab3
+INTERSECT DISTINCT
+SELECT * FROM tab4;
+
+-- EXCEPT ALL + EXCEPT
+SELECT * FROM tab4
+EXCEPT ALL
+SELECT * FROM tab3
+EXCEPT DISTINCT
+SELECT * FROM tab4;
+
+-- Chain of set operations
+SELECT * FROM tab3
+EXCEPT ALL
+SELECT * FROM tab4
+UNION ALL
+SELECT * FROM tab3
+EXCEPT DISTINCT
+SELECT * FROM tab4;
+
+-- Mismatch on number of columns across both branches
+SELECT k FROM tab3
+EXCEPT ALL
+SELECT k, v FROM tab4;
+
+-- Chain of set operations
+SELECT * FROM tab3
+EXCEPT ALL
+SELECT * FROM tab4
+UNION
+SELECT * FROM tab3
+EXCEPT DISTINCT
+SELECT * FROM tab4;
+
+-- Using MINUS ALL
+SELECT * FROM tab3
+MINUS ALL
+SELECT * FROM tab4
+UNION
+SELECT * FROM tab3
+MINUS DISTINCT
+SELECT * FROM tab4;
+
+-- Chain of set operations
+SELECT * FROM tab3
+EXCEPT ALL
+SELECT * FROM tab4
+EXCEPT DISTINCT
+SELECT * FROM tab3
+EXCEPT DISTINCT
+SELECT * FROM tab4;
+
+-- Join under except all. Should produce empty resultset since both left and right sets
+-- are same.
+SELECT * 
+FROM   (SELECT udf(tab3.k),
+   udf(tab4.v)
+FROM   tab3 
+   JOIN tab4 
+ ON udf(tab3.k) = udf(tab4.k))
 
 Review comment:
   Can we use different combinations here and below? For instance,
   
   ```
   udf(tab3.k) = tab4.k)
   ```
   
   ```
   udf(udf(tab3.k) = udf(tab4.k))
   ```
   
   ```
   SELECT *
   FROM   (SELECT tab3.k,
                  udf(tab4.v)
           FROM   tab3
                  JOIN tab4
                    ON udf(tab3.k) = udf(tab4.k))
   ```



[GitHub] [spark] SparkQA commented on issue #25124: [SPARK-28282][SQL][PYTHON][TESTS] Convert and port 'inline-table.sql' into UDF test base

2019-07-17 Thread GitBox

SparkQA commented on issue #25124: [SPARK-28282][SQL][PYTHON][TESTS] Convert 
and port 'inline-table.sql' into UDF test base
URL: https://github.com/apache/spark/pull/25124#issuecomment-512658030
 
 
   **[Test build #107823 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107823/testReport)**
 for PR 25124 at commit 
[`89212b7`](https://github.com/apache/spark/commit/89212b73627a42ff6e0725ccc3c16bdd839d0805).



[GitHub] [spark] SparkQA commented on issue #25127: [SPARK-28284][SQL][PYTHON][TESTS] Convert and port 'join-empty-relation.sql' into UDF test base

2019-07-17 Thread GitBox

SparkQA commented on issue #25127: [SPARK-28284][SQL][PYTHON][TESTS] Convert 
and port 'join-empty-relation.sql' into UDF test base
URL: https://github.com/apache/spark/pull/25127#issuecomment-512658007
 
 
   **[Test build #107822 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107822/testReport)**
 for PR 25127 at commit 
[`394afe8`](https://github.com/apache/spark/commit/394afe85bf3cd1cf0da629714f34e1d4f29bfd4d).



[GitHub] [spark] SparkQA commented on issue #25161: [SPARK-28390][SQL][PYTHON][TESTS] Convert and port 'pgSQL/select_having.sql' into UDF test base

2019-07-17 Thread GitBox

SparkQA commented on issue #25161: [SPARK-28390][SQL][PYTHON][TESTS] Convert 
and port 'pgSQL/select_having.sql' into UDF test base
URL: https://github.com/apache/spark/pull/25161#issuecomment-512658014
 
 
   **[Test build #107821 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107821/testReport)**
 for PR 25161 at commit 
[`6f44282`](https://github.com/apache/spark/commit/6f4428250499738c496aa89cbb338fcecdcd9b9d).



[GitHub] [spark] SparkQA commented on issue #25168: [SPARK-28276][SQL][PYTHON][TEST] Convert and port 'cross-join.sql' into UDF test base

2019-07-17 Thread GitBox

SparkQA commented on issue #25168: [SPARK-28276][SQL][PYTHON][TEST] Convert and 
port 'cross-join.sql' into UDF test base
URL: https://github.com/apache/spark/pull/25168#issuecomment-512658009
 
 
   **[Test build #107820 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107820/testReport)**
 for PR 25168 at commit 
[`ac20743`](https://github.com/apache/spark/commit/ac20743bf09d6a976f632c586da683220ff8bdf5).



[GitHub] [spark] HyukjinKwon commented on a change in pull request #25090: [SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF test base

2019-07-17 Thread GitBox

HyukjinKwon commented on a change in pull request #25090: 
[SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF 
test base
URL: https://github.com/apache/spark/pull/25090#discussion_r304728497
 
 

 ##
 File path: sql/core/src/test/resources/sql-tests/inputs/udf/udf-except-all.sql
 ##
 @@ -0,0 +1,166 @@
+-- This test file was converted from except-all.sql.
+-- Note that currently registered UDF returns a string. So there are some differences, for instance
+-- in string cast within UDF in Scala and Python.
+
+CREATE TEMPORARY VIEW tab1 AS SELECT * FROM VALUES
+(0), (1), (2), (2), (2), (2), (3), (null), (null) AS tab1(c1);
+CREATE TEMPORARY VIEW tab2 AS SELECT * FROM VALUES
+(1), (2), (2), (3), (5), (5), (null) AS tab2(c1);
+CREATE TEMPORARY VIEW tab3 AS SELECT * FROM VALUES
+(1, 2), 
+(1, 2),
+(1, 3),
+(2, 3),
+(2, 2)
+AS tab3(k, v);
+CREATE TEMPORARY VIEW tab4 AS SELECT * FROM VALUES
+(1, 2), 
+(2, 3),
+(2, 2),
+(2, 2),
+(2, 20)
+AS tab4(k, v);
+
+-- Basic EXCEPT ALL
+SELECT * FROM tab1
+EXCEPT ALL
+SELECT * FROM tab2;
+
+-- MINUS ALL (synonym for EXCEPT)
+SELECT * FROM tab1
+MINUS ALL
+SELECT * FROM tab2;
+
+-- EXCEPT ALL same table in both branches
+-- Note that there will one less NULL in the result compared to the non-udf result
+-- because udf converts null to a string "null".
+SELECT * FROM tab1
+EXCEPT ALL
+SELECT * FROM tab2 WHERE udf(c1) IS NOT NULL;
+
+-- Empty left relation
+SELECT * FROM tab1 WHERE udf(c1) > 5
+EXCEPT ALL
+SELECT * FROM tab2;
+
+-- Empty right relation
+SELECT * FROM tab1
+EXCEPT ALL
+SELECT * FROM tab2 WHERE c1 > udf(6);
+
+-- Type Coerced ExceptAll
+SELECT * FROM tab1
+EXCEPT ALL
+SELECT CAST(udf(1) AS BIGINT);
+
+-- Error as types of two side are not compatible
+SELECT * FROM tab1
+EXCEPT ALL
+SELECT array(1);
+
+-- Basic
+SELECT * FROM tab3
+EXCEPT ALL
+SELECT * FROM tab4;
+
+-- Basic
+SELECT * FROM tab4
+EXCEPT ALL
+SELECT * FROM tab3;
+
+-- EXCEPT ALL + INTERSECT
+SELECT * FROM tab4
 
 Review comment:
   I would add UDFs to those tests. Otherwise, they would just duplicate the tests in `except-all.sql`.



[GitHub] [spark] HyukjinKwon commented on a change in pull request #25090: [SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF test base

2019-07-17 Thread GitBox

HyukjinKwon commented on a change in pull request #25090: 
[SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF 
test base
URL: https://github.com/apache/spark/pull/25090#discussion_r304728377
 
 

 ##
 File path: sql/core/src/test/resources/sql-tests/inputs/udf/udf-except-all.sql
 ##
 @@ -0,0 +1,166 @@
+-- This test file was converted from except-all.sql.
+-- Note that currently registered UDF returns a string. So there are some differences, for instance
+-- in string cast within UDF in Scala and Python.
+
+CREATE TEMPORARY VIEW tab1 AS SELECT * FROM VALUES
+(0), (1), (2), (2), (2), (2), (3), (null), (null) AS tab1(c1);
+CREATE TEMPORARY VIEW tab2 AS SELECT * FROM VALUES
+(1), (2), (2), (3), (5), (5), (null) AS tab2(c1);
+CREATE TEMPORARY VIEW tab3 AS SELECT * FROM VALUES
+(1, 2), 
+(1, 2),
+(1, 3),
+(2, 3),
+(2, 2)
+AS tab3(k, v);
+CREATE TEMPORARY VIEW tab4 AS SELECT * FROM VALUES
+(1, 2), 
+(2, 3),
+(2, 2),
+(2, 2),
+(2, 20)
+AS tab4(k, v);
+
+-- Basic EXCEPT ALL
+SELECT * FROM tab1
+EXCEPT ALL
+SELECT * FROM tab2;
+
+-- MINUS ALL (synonym for EXCEPT)
+SELECT * FROM tab1
+MINUS ALL
+SELECT * FROM tab2;
+
+-- EXCEPT ALL same table in both branches
+-- Note that there will one less NULL in the result compared to the non-udf result
+-- because udf converts null to a string "null".
+SELECT * FROM tab1
+EXCEPT ALL
+SELECT * FROM tab2 WHERE udf(c1) IS NOT NULL;
+
+-- Empty left relation
+SELECT * FROM tab1 WHERE udf(c1) > 5
+EXCEPT ALL
+SELECT * FROM tab2;
+
+-- Empty right relation
+SELECT * FROM tab1
+EXCEPT ALL
+SELECT * FROM tab2 WHERE c1 > udf(6);
 
 Review comment:
   I would test a different combination here, e.g. `udf(c1 > udf(6))`.
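
For readers following the quoted diff, the multiset semantics that these `EXCEPT ALL` cases exercise can be sketched in plain Python. This is only an illustrative model, not Spark's implementation: `except_all` and `string_udf` are hypothetical helpers, `string_udf` mimics the diff's note that the registered UDF turns null into the string "null", and the non-UDF filter is assumed to be `c1 IS NOT NULL`.

```python
from collections import Counter

def except_all(left, right):
    """Multiset difference: each right row cancels at most one equal left row.

    NULLs (None) are treated as equal to each other, as in SQL set operations.
    """
    remaining = Counter(left) - Counter(right)
    # Sort only to make the output deterministic; NULLs go last.
    return sorted(remaining.elements(), key=lambda v: (v is None, 0 if v is None else v))

def string_udf(value):
    # Hypothetical stand-in for the test UDF: per the diff's note, null becomes "null".
    return "null" if value is None else str(value)

# Same rows as the tab1 and tab2 views in the quoted diff.
tab1 = [0, 1, 2, 2, 2, 2, 3, None, None]
tab2 = [1, 2, 2, 3, 5, 5, None]

# Basic EXCEPT ALL.
basic = except_all(tab1, tab2)

# WHERE udf(c1) IS NOT NULL keeps every tab2 row, including the NULL one.
with_udf = except_all(tab1, [v for v in tab2 if string_udf(v) is not None])

# Assumed non-UDF original, WHERE c1 IS NOT NULL, drops the NULL row first.
without_udf = except_all(tab1, [v for v in tab2 if v is not None])

# WHERE c1 > 6 matches nothing, so the left side comes back unchanged.
empty_right = except_all(tab1, [v for v in tab2 if v is not None and v > 6])

print(basic, with_udf, without_udf, empty_right, sep="\n")
```

Under these assumptions the UDF-filtered query returns one NULL fewer than the non-UDF one, which is what the comment in the diff describes.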



[GitHub] [spark] HyukjinKwon commented on a change in pull request #25090: [SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF test base

2019-07-17 Thread GitBox

HyukjinKwon commented on a change in pull request #25090: 
[SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF 
test base
URL: https://github.com/apache/spark/pull/25090#discussion_r304728435
 
 

 ##
 File path: sql/core/src/test/resources/sql-tests/inputs/udf/udf-except-all.sql
 ##
 @@ -0,0 +1,166 @@
+-- This test file was converted from except-all.sql.
+-- Note that currently registered UDF returns a string. So there are some differences, for instance
+-- in string cast within UDF in Scala and Python.
+
+CREATE TEMPORARY VIEW tab1 AS SELECT * FROM VALUES
+(0), (1), (2), (2), (2), (2), (3), (null), (null) AS tab1(c1);
+CREATE TEMPORARY VIEW tab2 AS SELECT * FROM VALUES
+(1), (2), (2), (3), (5), (5), (null) AS tab2(c1);
+CREATE TEMPORARY VIEW tab3 AS SELECT * FROM VALUES
+(1, 2), 
+(1, 2),
+(1, 3),
+(2, 3),
+(2, 2)
+AS tab3(k, v);
+CREATE TEMPORARY VIEW tab4 AS SELECT * FROM VALUES
+(1, 2), 
+(2, 3),
+(2, 2),
+(2, 2),
+(2, 20)
+AS tab4(k, v);
+
+-- Basic EXCEPT ALL
+SELECT * FROM tab1
+EXCEPT ALL
+SELECT * FROM tab2;
+
+-- MINUS ALL (synonym for EXCEPT)
+SELECT * FROM tab1
+MINUS ALL
+SELECT * FROM tab2;
+
+-- EXCEPT ALL same table in both branches
+-- Note that there will one less NULL in the result compared to the non-udf result
+-- because udf converts null to a string "null".
+SELECT * FROM tab1
+EXCEPT ALL
+SELECT * FROM tab2 WHERE udf(c1) IS NOT NULL;
+
+-- Empty left relation
+SELECT * FROM tab1 WHERE udf(c1) > 5
+EXCEPT ALL
+SELECT * FROM tab2;
+
+-- Empty right relation
+SELECT * FROM tab1
+EXCEPT ALL
+SELECT * FROM tab2 WHERE c1 > udf(6);
+
+-- Type Coerced ExceptAll
+SELECT * FROM tab1
+EXCEPT ALL
+SELECT CAST(udf(1) AS BIGINT);
+
+-- Error as types of two side are not compatible
+SELECT * FROM tab1
+EXCEPT ALL
+SELECT array(1);
 
 Review comment:
   `udf(array(1))`



[GitHub] [spark] AmplabJenkins removed a comment on issue #25168: [SPARK-28276][SQL][PYTHON][TEST] Convert and port 'cross-join.sql' into UDF test base

2019-07-17 Thread GitBox

AmplabJenkins removed a comment on issue #25168: 
[SPARK-28276][SQL][PYTHON][TEST] Convert and port 'cross-join.sql' into UDF 
test base
URL: https://github.com/apache/spark/pull/25168#issuecomment-512657677
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #25124: [SPARK-28282][SQL][PYTHON][TESTS] Convert and port 'inline-table.sql' into UDF test base

2019-07-17 Thread GitBox

AmplabJenkins removed a comment on issue #25124: 
[SPARK-28282][SQL][PYTHON][TESTS] Convert and port 'inline-table.sql' into UDF 
test base
URL: https://github.com/apache/spark/pull/25124#issuecomment-512657693
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/12933/
   Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #25119: [SPARK-28283][SQL][PYTHON][TESTS] Convert and port 'intersect-all.sql' into UDF test base

2019-07-17 Thread GitBox

AmplabJenkins removed a comment on issue #25119: 
[SPARK-28283][SQL][PYTHON][TESTS] Convert and port 'intersect-all.sql' into UDF 
test base
URL: https://github.com/apache/spark/pull/25119#issuecomment-512657730
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/12935/
   Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #25122: [SPARK-28286][SQL][PYTHON][TESTS][WIP] Convert and port 'pivot.sql' into UDF test base

2019-07-17 Thread GitBox

AmplabJenkins removed a comment on issue #25122: 
[SPARK-28286][SQL][PYTHON][TESTS][WIP] Convert and port 'pivot.sql' into UDF 
test base
URL: https://github.com/apache/spark/pull/25122#issuecomment-512657696
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] HyukjinKwon commented on a change in pull request #25090: [SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF test base

2019-07-17 Thread GitBox

HyukjinKwon commented on a change in pull request #25090: 
[SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF 
test base
URL: https://github.com/apache/spark/pull/25090#discussion_r304728322
 
 

 ##
 File path: sql/core/src/test/resources/sql-tests/inputs/udf/udf-except-all.sql
 ##
 @@ -0,0 +1,166 @@
+-- This test file was converted from except-all.sql.
+-- Note that currently registered UDF returns a string. So there are some differences, for instance
+-- in string cast within UDF in Scala and Python.
+
+CREATE TEMPORARY VIEW tab1 AS SELECT * FROM VALUES
+(0), (1), (2), (2), (2), (2), (3), (null), (null) AS tab1(c1);
+CREATE TEMPORARY VIEW tab2 AS SELECT * FROM VALUES
+(1), (2), (2), (3), (5), (5), (null) AS tab2(c1);
+CREATE TEMPORARY VIEW tab3 AS SELECT * FROM VALUES
+(1, 2), 
+(1, 2),
+(1, 3),
+(2, 3),
+(2, 2)
+AS tab3(k, v);
+CREATE TEMPORARY VIEW tab4 AS SELECT * FROM VALUES
+(1, 2), 
+(2, 3),
+(2, 2),
+(2, 2),
+(2, 20)
+AS tab4(k, v);
+
+-- Basic EXCEPT ALL
+SELECT * FROM tab1
 
 Review comment:
   @imback82, can we list the columns explicitly, for instance `SELECT udf(c1) FROM tab1`?


