[GitHub] [spark] AmplabJenkins removed a comment on issue #25188: spark1
AmplabJenkins removed a comment on issue #25188: spark1 URL: https://github.com/apache/spark/pull/25188#issuecomment-512676002 Can one of the admins verify this patch? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #25188: spark1
AmplabJenkins commented on issue #25188: spark1 URL: https://github.com/apache/spark/pull/25188#issuecomment-512676165 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins commented on issue #25188: spark1
AmplabJenkins commented on issue #25188: spark1 URL: https://github.com/apache/spark/pull/25188#issuecomment-512676002 Can one of the admins verify this patch?
[GitHub] [spark] talendinbox opened a new pull request #25188: spark1
talendinbox opened a new pull request #25188: spark1 URL: https://github.com/apache/spark/pull/25188 ## What changes were proposed in this pull request? (Please fill in changes proposed in this fix) ## How was this patch tested? (Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests) (If this patch involves UI changes, please attach a screenshot; otherwise, remove this) Please review https://spark.apache.org/contributing.html before opening a pull request.
[GitHub] [spark] imback82 commented on a change in pull request #25119: [SPARK-28283][SQL][PYTHON][TESTS] Convert and port 'intersect-all.sql' into UDF test base
imback82 commented on a change in pull request #25119: [SPARK-28283][SQL][PYTHON][TESTS] Convert and port 'intersect-all.sql' into UDF test base URL: https://github.com/apache/spark/pull/25119#discussion_r304743751 ## File path: sql/core/src/test/resources/sql-tests/inputs/udf/udf-intersect-all.sql ## @@ -0,0 +1,164 @@ +-- This test file was converted from intersect-all.sql. +-- Note that currently registered UDF returns a string. So there are some differences, for instance +-- in string cast within UDF in Scala and Python. + +CREATE TEMPORARY VIEW tab1 AS SELECT * FROM VALUES +(1, 2), +(1, 2), +(1, 3), +(1, 3), +(2, 3), +(null, null), +(null, null) +AS tab1(k, v); +CREATE TEMPORARY VIEW tab2 AS SELECT * FROM VALUES +(1, 2), +(1, 2), +(2, 3), +(3, 4), +(null, null), +(null, null) +AS tab2(k, v); + +-- Basic INTERSECT ALL +SELECT * FROM tab1 Review comment: Yes, will do.
[GitHub] [spark] imback82 commented on a change in pull request #25119: [SPARK-28283][SQL][PYTHON][TESTS] Convert and port 'intersect-all.sql' into UDF test base
imback82 commented on a change in pull request #25119: [SPARK-28283][SQL][PYTHON][TESTS] Convert and port 'intersect-all.sql' into UDF test base URL: https://github.com/apache/spark/pull/25119#discussion_r304743647 ## File path: sql/core/src/test/resources/sql-tests/inputs/udf/udf-intersect-all.sql ## @@ -0,0 +1,164 @@ +-- This test file was converted from intersect-all.sql. +-- Note that currently registered UDF returns a string. So there are some differences, for instance +-- in string cast within UDF in Scala and Python. + +CREATE TEMPORARY VIEW tab1 AS SELECT * FROM VALUES +(1, 2), +(1, 2), +(1, 3), +(1, 3), +(2, 3), +(null, null), +(null, null) +AS tab1(k, v); +CREATE TEMPORARY VIEW tab2 AS SELECT * FROM VALUES +(1, 2), +(1, 2), +(2, 3), +(3, 4), +(null, null), +(null, null) +AS tab2(k, v); + +-- Basic INTERSECT ALL +SELECT * FROM tab1 +INTERSECT ALL +SELECT * FROM tab2; + +-- INTERSECT ALL same table in both branches +SELECT * FROM tab1 +INTERSECT ALL +SELECT * FROM tab1 WHERE udf(k) = 1; + +-- Empty left relation +SELECT * FROM tab1 WHERE k > udf(2) +INTERSECT ALL +SELECT * FROM tab2; + +-- Empty right relation +SELECT * FROM tab1 +INTERSECT ALL +SELECT * FROM tab2 WHERE CAST(udf(k) AS BIGINT) > CAST(udf(3) AS BIGINT); + +-- Type Coerced INTERSECT ALL +SELECT * FROM tab1 +INTERSECT ALL +SELECT CAST(udf(1) AS BIGINT), CAST(udf(2) AS BIGINT); + +-- Error as types of two side are not compatible +SELECT * FROM tab1 +INTERSECT ALL +SELECT array(1), udf(2); + +-- Mismatch on number of columns across both branches +SELECT udf(k) FROM tab1 +INTERSECT ALL +SELECT udf(k), udf(v) FROM tab2; + +-- Basic +SELECT * FROM tab2 +INTERSECT ALL +SELECT * FROM tab1 +INTERSECT ALL +SELECT * FROM tab2; + +-- Chain of different `set operations +SELECT * FROM tab1 +EXCEPT +SELECT * FROM tab2 +UNION ALL +SELECT * FROM tab1 +INTERSECT ALL +SELECT * FROM tab2 +; + +-- Chain of different `set operations +SELECT * FROM tab1 +EXCEPT +SELECT * FROM tab2 +EXCEPT +SELECT * FROM tab1 
+INTERSECT ALL +SELECT * FROM tab2 +; + +-- test use parenthesis to control order of evaluation +( + ( +( + SELECT * FROM tab1 + EXCEPT + SELECT * FROM tab2 +) +EXCEPT +SELECT * FROM tab1 + ) + INTERSECT ALL + SELECT * FROM tab2 +) +; + +-- Join under intersect all +SELECT * +FROM (SELECT udf(tab1.k), + udf(tab2.v) +FROM tab1 + JOIN tab2 + ON CAST(udf(tab1.k) AS BIGINT) = CAST(udf(tab2.k) AS BIGINT)) Review comment: Reverted.
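For readers following the quoted test file: `INTERSECT ALL` keeps each row as many times as it appears in both inputs (the smaller of the two counts), and NULL rows are matched with each other for this purpose. The sketch below is only an illustration of that multiset behaviour in Python, using `collections.Counter` as a stand-in for Spark; the `intersect_all` helper is a hypothetical name, and the rows are copied from the `tab1` and `tab2` views in the diff.

```python
from collections import Counter

# Rows copied from the tab1/tab2 views in the quoted test file.
# None stands in for SQL NULL.
tab1 = [(1, 2), (1, 2), (1, 3), (1, 3), (2, 3), (None, None), (None, None)]
tab2 = [(1, 2), (1, 2), (2, 3), (3, 4), (None, None), (None, None)]


def intersect_all(left, right):
    """Multiset intersection: a row survives min(count_left, count_right) times."""
    common = Counter(left) & Counter(right)
    # Sort by repr only to get a stable order for printing.
    return sorted(common.elements(), key=repr)


result = intersect_all(tab1, tab2)
print(result)
```

The row `(1, 3)` is dropped because it never appears in `tab2`, while `(1, 2)` and `(None, None)` each survive twice because both tables contain them twice.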
[GitHub] [spark] AmplabJenkins removed a comment on issue #25090: [SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF test base
AmplabJenkins removed a comment on issue #25090: [SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF test base URL: https://github.com/apache/spark/pull/25090#issuecomment-512672985 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/12940/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25090: [SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF test base
AmplabJenkins removed a comment on issue #25090: [SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF test base URL: https://github.com/apache/spark/pull/25090#issuecomment-512672978 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25090: [SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF test base
AmplabJenkins commented on issue #25090: [SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF test base URL: https://github.com/apache/spark/pull/25090#issuecomment-512672985 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/12940/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25090: [SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF test base
AmplabJenkins commented on issue #25090: [SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF test base URL: https://github.com/apache/spark/pull/25090#issuecomment-512672978 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA commented on issue #25090: [SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF test base
SparkQA commented on issue #25090: [SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF test base URL: https://github.com/apache/spark/pull/25090#issuecomment-512671797 **[Test build #107825 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107825/testReport)** for PR 25090 at commit [`2c8cc19`](https://github.com/apache/spark/commit/2c8cc194fb6552cebe6cd1333cb88374c4a156a8).
[GitHub] [spark] imback82 commented on issue #25090: [SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF test base
imback82 commented on issue #25090: [SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF test base URL: https://github.com/apache/spark/pull/25090#issuecomment-512671629 @HyukjinKwon, I think I addressed all your comments. Please re-review this. Thanks!
[GitHub] [spark] AmplabJenkins removed a comment on issue #25007: [SPARK-28209][CORE][SHUFFLE] Proposed new shuffle writer API
AmplabJenkins removed a comment on issue #25007: [SPARK-28209][CORE][SHUFFLE] Proposed new shuffle writer API URL: https://github.com/apache/spark/pull/25007#issuecomment-512671341 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/107810/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25007: [SPARK-28209][CORE][SHUFFLE] Proposed new shuffle writer API
AmplabJenkins removed a comment on issue #25007: [SPARK-28209][CORE][SHUFFLE] Proposed new shuffle writer API URL: https://github.com/apache/spark/pull/25007#issuecomment-512671334 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25007: [SPARK-28209][CORE][SHUFFLE] Proposed new shuffle writer API
AmplabJenkins commented on issue #25007: [SPARK-28209][CORE][SHUFFLE] Proposed new shuffle writer API URL: https://github.com/apache/spark/pull/25007#issuecomment-512671341 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/107810/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25007: [SPARK-28209][CORE][SHUFFLE] Proposed new shuffle writer API
AmplabJenkins commented on issue #25007: [SPARK-28209][CORE][SHUFFLE] Proposed new shuffle writer API URL: https://github.com/apache/spark/pull/25007#issuecomment-512671334 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25090: [SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF test base
AmplabJenkins removed a comment on issue #25090: [SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF test base URL: https://github.com/apache/spark/pull/25090#issuecomment-512670846 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/107819/ Test FAILed.
[GitHub] [spark] SparkQA removed a comment on issue #25007: [SPARK-28209][CORE][SHUFFLE] Proposed new shuffle writer API
SparkQA removed a comment on issue #25007: [SPARK-28209][CORE][SHUFFLE] Proposed new shuffle writer API URL: https://github.com/apache/spark/pull/25007#issuecomment-512648513 **[Test build #107810 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107810/testReport)** for PR 25007 at commit [`9f597dd`](https://github.com/apache/spark/commit/9f597dd726aba08642c4329534e5ae12ffa6fbe9).
[GitHub] [spark] AmplabJenkins removed a comment on issue #25090: [SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF test base
AmplabJenkins removed a comment on issue #25090: [SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF test base URL: https://github.com/apache/spark/pull/25090#issuecomment-512670838 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA commented on issue #25007: [SPARK-28209][CORE][SHUFFLE] Proposed new shuffle writer API
SparkQA commented on issue #25007: [SPARK-28209][CORE][SHUFFLE] Proposed new shuffle writer API URL: https://github.com/apache/spark/pull/25007#issuecomment-512670975 **[Test build #107810 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107810/testReport)** for PR 25007 at commit [`9f597dd`](https://github.com/apache/spark/commit/9f597dd726aba08642c4329534e5ae12ffa6fbe9). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on issue #25090: [SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF test base
SparkQA removed a comment on issue #25090: [SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF test base URL: https://github.com/apache/spark/pull/25090#issuecomment-512656638 **[Test build #107819 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107819/testReport)** for PR 25090 at commit [`a09df4b`](https://github.com/apache/spark/commit/a09df4b5dc90c93f05d68fd6695ccb2de663895c).
[GitHub] [spark] AmplabJenkins commented on issue #25090: [SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF test base
AmplabJenkins commented on issue #25090: [SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF test base URL: https://github.com/apache/spark/pull/25090#issuecomment-512670846 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/107819/ Test FAILed.
[GitHub] [spark] AmplabJenkins commented on issue #25090: [SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF test base
AmplabJenkins commented on issue #25090: [SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF test base URL: https://github.com/apache/spark/pull/25090#issuecomment-512670838 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA commented on issue #25090: [SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF test base
SparkQA commented on issue #25090: [SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF test base URL: https://github.com/apache/spark/pull/25090#issuecomment-512670717 **[Test build #107819 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107819/testReport)** for PR 25090 at commit [`a09df4b`](https://github.com/apache/spark/commit/a09df4b5dc90c93f05d68fd6695ccb2de663895c). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] imback82 commented on a change in pull request #25090: [SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF test base
imback82 commented on a change in pull request #25090: [SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF test base URL: https://github.com/apache/spark/pull/25090#discussion_r304738907 ## File path: sql/core/src/test/resources/sql-tests/inputs/udf/udf-except-all.sql ## @@ -0,0 +1,166 @@ +-- This test file was converted from except-all.sql. +-- Note that currently registered UDF returns a string. So there are some differences, for instance +-- in string cast within UDF in Scala and Python. + +CREATE TEMPORARY VIEW tab1 AS SELECT * FROM VALUES +(0), (1), (2), (2), (2), (2), (3), (null), (null) AS tab1(c1); +CREATE TEMPORARY VIEW tab2 AS SELECT * FROM VALUES +(1), (2), (2), (3), (5), (5), (null) AS tab2(c1); +CREATE TEMPORARY VIEW tab3 AS SELECT * FROM VALUES +(1, 2), +(1, 2), +(1, 3), +(2, 3), +(2, 2) +AS tab3(k, v); +CREATE TEMPORARY VIEW tab4 AS SELECT * FROM VALUES +(1, 2), +(2, 3), +(2, 2), +(2, 2), +(2, 20) +AS tab4(k, v); + +-- Basic EXCEPT ALL +SELECT * FROM tab1 +EXCEPT ALL +SELECT * FROM tab2; + +-- MINUS ALL (synonym for EXCEPT) +SELECT * FROM tab1 +MINUS ALL +SELECT * FROM tab2; + +-- EXCEPT ALL same table in both branches +-- Note that there will one less NULL in the result compared to the non-udf result +-- because udf converts null to a string "null". 
+SELECT * FROM tab1 +EXCEPT ALL +SELECT * FROM tab2 WHERE udf(c1) IS NOT NULL; + +-- Empty left relation +SELECT * FROM tab1 WHERE udf(c1) > 5 +EXCEPT ALL +SELECT * FROM tab2; + +-- Empty right relation +SELECT * FROM tab1 +EXCEPT ALL +SELECT * FROM tab2 WHERE c1 > udf(6); + +-- Type Coerced ExceptAll +SELECT * FROM tab1 +EXCEPT ALL +SELECT CAST(udf(1) AS BIGINT); + +-- Error as types of two side are not compatible +SELECT * FROM tab1 +EXCEPT ALL +SELECT array(1); + +-- Basic +SELECT * FROM tab3 +EXCEPT ALL +SELECT * FROM tab4; + +-- Basic +SELECT * FROM tab4 +EXCEPT ALL +SELECT * FROM tab3; + +-- EXCEPT ALL + INTERSECT +SELECT * FROM tab4 +EXCEPT ALL +SELECT * FROM tab3 +INTERSECT DISTINCT +SELECT * FROM tab4; + +-- EXCEPT ALL + EXCEPT +SELECT * FROM tab4 +EXCEPT ALL +SELECT * FROM tab3 +EXCEPT DISTINCT +SELECT * FROM tab4; + +-- Chain of set operations +SELECT * FROM tab3 +EXCEPT ALL +SELECT * FROM tab4 +UNION ALL +SELECT * FROM tab3 +EXCEPT DISTINCT +SELECT * FROM tab4; + +-- Mismatch on number of columns across both branches +SELECT k FROM tab3 +EXCEPT ALL +SELECT k, v FROM tab4; + +-- Chain of set operations +SELECT * FROM tab3 +EXCEPT ALL +SELECT * FROM tab4 +UNION +SELECT * FROM tab3 +EXCEPT DISTINCT +SELECT * FROM tab4; + +-- Using MINUS ALL +SELECT * FROM tab3 +MINUS ALL +SELECT * FROM tab4 +UNION +SELECT * FROM tab3 +MINUS DISTINCT +SELECT * FROM tab4; + +-- Chain of set operations +SELECT * FROM tab3 +EXCEPT ALL +SELECT * FROM tab4 +EXCEPT DISTINCT +SELECT * FROM tab3 +EXCEPT DISTINCT +SELECT * FROM tab4; + +-- Join under except all. Should produce empty resultset since both left and right sets +-- are same. +SELECT * +FROM (SELECT udf(tab3.k), + udf(tab4.v) +FROM tab3 + JOIN tab4 + ON udf(tab3.k) = udf(tab4.k)) Review comment: Yes, this can be done now with your `udf` fix. :)
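The note in the quoted test file about "one less NULL" can be checked by hand. `EXCEPT ALL` subtracts row counts, so a NULL row that stays on the right-hand side cancels one NULL on the left. The Python sketch below illustrates this with `collections.Counter`; it is not Spark code. The `udf` function is a hypothetical stand-in that follows the test file's own comment (NULL becomes the string "null"), and the non-UDF variant is assumed to filter with `c1 IS NOT NULL`.

```python
from collections import Counter

# Single-column rows from the quoted tab1/tab2 views; None stands in for SQL NULL.
tab1 = [0, 1, 2, 2, 2, 2, 3, None, None]
tab2 = [1, 2, 2, 3, 5, 5, None]


def except_all(left, right):
    """Multiset difference: a row survives count_left - count_right times, never below zero."""
    return list((Counter(left) - Counter(right)).elements())


def udf(value):
    # Hypothetical stand-in for the registered test UDF: per the comment in the
    # test file, it returns a string and turns NULL into the string "null".
    return "null" if value is None else str(value)


# Assumed non-UDF query: `WHERE c1 IS NOT NULL` drops the NULL row from tab2.
plain = except_all(tab1, [c for c in tab2 if c is not None])

# UDF query: udf(c1) is never NULL, so the NULL row stays in tab2
# and cancels one of the two NULLs in tab1.
with_udf = except_all(tab1, [c for c in tab2 if udf(c) is not None])

print(plain.count(None), with_udf.count(None))  # prints: 2 1
```

Both variants keep `0` once and `2` twice; only the number of surviving NULL rows differs.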
[GitHub] [spark] HyukjinKwon commented on a change in pull request #25184: [SPARK-28431]Fix CSV datasource throw com.univocity.parsers.common.TextParsingException with large size message
HyukjinKwon commented on a change in pull request #25184: [SPARK-28431]Fix CSV datasource throw com.univocity.parsers.common.TextParsingException with large size message URL: https://github.com/apache/spark/pull/25184#discussion_r304737449 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityParser.scala ## @@ -346,4 +348,23 @@ private[sql] object UnivocityParser { parser.options.columnNameOfCorruptRecord) filteredLines.flatMap(safeParser.parse) } + + def limitParserErrorContentLength[T](f: () => T): T = { +try { + f() +} catch { + case e: TextParsingException => +e.setErrorContentLength(SQLConf.get.getConf( Review comment: @WeichenXu123, seems `setErrorContentLength` can be set in `CSVOptions`'s parser and writer settings (see https://github.com/uniVocity/univocity-parsers/blob/f616d151b48150bc9cb98943f9b6f8353b704359/src/test/java/com/univocity/parsers/common/DataProcessingExceptionTest.java)
[GitHub] [spark] imback82 commented on a change in pull request #25090: [SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF test base
imback82 commented on a change in pull request #25090: [SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF test base URL: https://github.com/apache/spark/pull/25090#discussion_r304735783 ## File path: sql/core/src/test/resources/sql-tests/inputs/udf/udf-except-all.sql ## @@ -0,0 +1,166 @@ +-- This test file was converted from except-all.sql. +-- Note that currently registered UDF returns a string. So there are some differences, for instance +-- in string cast within UDF in Scala and Python. + +CREATE TEMPORARY VIEW tab1 AS SELECT * FROM VALUES +(0), (1), (2), (2), (2), (2), (3), (null), (null) AS tab1(c1); +CREATE TEMPORARY VIEW tab2 AS SELECT * FROM VALUES +(1), (2), (2), (3), (5), (5), (null) AS tab2(c1); +CREATE TEMPORARY VIEW tab3 AS SELECT * FROM VALUES +(1, 2), +(1, 2), +(1, 3), +(2, 3), +(2, 2) +AS tab3(k, v); +CREATE TEMPORARY VIEW tab4 AS SELECT * FROM VALUES +(1, 2), +(2, 3), +(2, 2), +(2, 2), +(2, 20) +AS tab4(k, v); + +-- Basic EXCEPT ALL +SELECT * FROM tab1 Review comment: Done.
[GitHub] [spark] imback82 commented on a change in pull request #25090: [SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF test base
imback82 commented on a change in pull request #25090: [SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF test base URL: https://github.com/apache/spark/pull/25090#discussion_r304735863 ## File path: sql/core/src/test/resources/sql-tests/inputs/udf/udf-except-all.sql ## @@ -0,0 +1,166 @@ +-- This test file was converted from except-all.sql. +-- Note that currently registered UDF returns a string. So there are some differences, for instance +-- in string cast within UDF in Scala and Python. + +CREATE TEMPORARY VIEW tab1 AS SELECT * FROM VALUES +(0), (1), (2), (2), (2), (2), (3), (null), (null) AS tab1(c1); +CREATE TEMPORARY VIEW tab2 AS SELECT * FROM VALUES +(1), (2), (2), (3), (5), (5), (null) AS tab2(c1); +CREATE TEMPORARY VIEW tab3 AS SELECT * FROM VALUES +(1, 2), +(1, 2), +(1, 3), +(2, 3), +(2, 2) +AS tab3(k, v); +CREATE TEMPORARY VIEW tab4 AS SELECT * FROM VALUES +(1, 2), +(2, 3), +(2, 2), +(2, 2), +(2, 20) +AS tab4(k, v); + +-- Basic EXCEPT ALL +SELECT * FROM tab1 +EXCEPT ALL +SELECT * FROM tab2; + +-- MINUS ALL (synonym for EXCEPT) +SELECT * FROM tab1 +MINUS ALL +SELECT * FROM tab2; + +-- EXCEPT ALL same table in both branches +-- Note that there will one less NULL in the result compared to the non-udf result +-- because udf converts null to a string "null". +SELECT * FROM tab1 +EXCEPT ALL +SELECT * FROM tab2 WHERE udf(c1) IS NOT NULL; + +-- Empty left relation +SELECT * FROM tab1 WHERE udf(c1) > 5 +EXCEPT ALL +SELECT * FROM tab2; + +-- Empty right relation +SELECT * FROM tab1 +EXCEPT ALL +SELECT * FROM tab2 WHERE c1 > udf(6); + +-- Type Coerced ExceptAll +SELECT * FROM tab1 +EXCEPT ALL +SELECT CAST(udf(1) AS BIGINT); + +-- Error as types of two side are not compatible +SELECT * FROM tab1 +EXCEPT ALL +SELECT array(1); + +-- Basic +SELECT * FROM tab3 +EXCEPT ALL +SELECT * FROM tab4; + +-- Basic +SELECT * FROM tab4 +EXCEPT ALL +SELECT * FROM tab3; + +-- EXCEPT ALL + INTERSECT +SELECT * FROM tab4 Review comment: Done. 
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
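For readers following the quoted test file, the "Basic EXCEPT ALL" query can be read as a multiset difference: each row of the left side survives as many times as it appears on the left minus the number of times it appears on the right. The sketch below is plain Python, not Spark code; `except_all` is a hypothetical helper name, and `None` stands in for SQL NULL (treated as equal to itself, as set operations do).

```python
from collections import Counter

def except_all(left, right):
    """Multiset difference, an illustrative model of EXCEPT ALL (not Spark's implementation)."""
    # Counter subtraction keeps only rows whose count stays positive.
    remaining = Counter(left) - Counter(right)
    return list(remaining.elements())

# Same values as the tab1 and tab2 views in the quoted test file.
tab1 = [0, 1, 2, 2, 2, 2, 3, None, None]
tab2 = [1, 2, 2, 3, 5, 5, None]

result = except_all(tab1, tab2)
print(result)  # one 0, two 2s and one None remain
```

The remaining rows match the `0, 2, 2, NULL` output recorded for the first EXCEPT ALL query in the generated `.sql.out` file quoted later in this thread.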
[GitHub] [spark] HyukjinKwon commented on issue #25133: [SPARK-28365][ML] Fallback locale to en_US in StopWordsRemover if system default locale isn't in available locales in JVM
HyukjinKwon commented on issue #25133: [SPARK-28365][ML] Fallback locale to en_US in StopWordsRemover if system default locale isn't in available locales in JVM URL: https://github.com/apache/spark/pull/25133#issuecomment-512665659 I am not sure. The change here doesn't appear to affect the JVM's default locale, only the locale used in `StopWordsRemover`.
[GitHub] [spark] HyukjinKwon commented on issue #25130: [SPARK-28359][SQL][PYTHON][TESTS] Make integrated UDF tests robust by making UDFs (virtually) no-op
HyukjinKwon commented on issue #25130: [SPARK-28359][SQL][PYTHON][TESTS] Make integrated UDF tests robust by making UDFs (virtually) no-op URL: https://github.com/apache/spark/pull/25130#issuecomment-512665250 Okay, the JDK 11 test, SBT, and Maven builds all look fine.
[GitHub] [spark] HyukjinKwon commented on a change in pull request #25090: [SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF test base
HyukjinKwon commented on a change in pull request #25090: [SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF test base URL: https://github.com/apache/spark/pull/25090#discussion_r304734548 ## File path: sql/core/src/test/resources/sql-tests/inputs/udf/udf-except-all.sql ## @@ -0,0 +1,166 @@ +-- This test file was converted from except-all.sql. +-- Note that currently registered UDF returns a string. So there are some differences, for instance +-- in string cast within UDF in Scala and Python. + +CREATE TEMPORARY VIEW tab1 AS SELECT * FROM VALUES +(0), (1), (2), (2), (2), (2), (3), (null), (null) AS tab1(c1); +CREATE TEMPORARY VIEW tab2 AS SELECT * FROM VALUES +(1), (2), (2), (3), (5), (5), (null) AS tab2(c1); +CREATE TEMPORARY VIEW tab3 AS SELECT * FROM VALUES +(1, 2), +(1, 2), +(1, 3), +(2, 3), +(2, 2) +AS tab3(k, v); +CREATE TEMPORARY VIEW tab4 AS SELECT * FROM VALUES +(1, 2), +(2, 3), +(2, 2), +(2, 2), +(2, 20) +AS tab4(k, v); + +-- Basic EXCEPT ALL +SELECT * FROM tab1 +EXCEPT ALL +SELECT * FROM tab2; + +-- MINUS ALL (synonym for EXCEPT) +SELECT * FROM tab1 +MINUS ALL +SELECT * FROM tab2; + +-- EXCEPT ALL same table in both branches +-- Note that there will one less NULL in the result compared to the non-udf result +-- because udf converts null to a string "null". +SELECT * FROM tab1 +EXCEPT ALL +SELECT * FROM tab2 WHERE udf(c1) IS NOT NULL; + +-- Empty left relation +SELECT * FROM tab1 WHERE udf(c1) > 5 +EXCEPT ALL +SELECT * FROM tab2; + +-- Empty right relation +SELECT * FROM tab1 +EXCEPT ALL +SELECT * FROM tab2 WHERE c1 > udf(6); + +-- Type Coerced ExceptAll +SELECT * FROM tab1 +EXCEPT ALL +SELECT CAST(udf(1) AS BIGINT); + +-- Error as types of two side are not compatible +SELECT * FROM tab1 +EXCEPT ALL +SELECT array(1); Review comment: Oh, yes. complex types cannot be supported via udf for now. I forgot. Yes, let's just don't do it for now and just replace `*` to UDF. This is an automated message from the Apache Git Service. 
[GitHub] [spark] bersprockets commented on issue #25068: [SPARK-28156][SQL][BACKPORT-2.4] Self-join should not miss cached view
bersprockets commented on issue #25068: [SPARK-28156][SQL][BACKPORT-2.4] Self-join should not miss cached view URL: https://github.com/apache/spark/pull/25068#issuecomment-512664402 I did the following: - replaced `!v.sameOutput(child)` with `output != child.output` - replaced `!Cast.canUpCast` with `Cast.mayTruncate` In the process, I broke a test in SQLViewSuite. I will hunt down the cause tomorrow and hopefully post the changes.
[GitHub] [spark] imback82 commented on a change in pull request #25090: [SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF test base
imback82 commented on a change in pull request #25090: [SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF test base URL: https://github.com/apache/spark/pull/25090#discussion_r304733255 ## File path: sql/core/src/test/resources/sql-tests/inputs/udf/udf-except-all.sql ## @@ -0,0 +1,166 @@ +-- This test file was converted from except-all.sql. +-- Note that currently registered UDF returns a string. So there are some differences, for instance +-- in string cast within UDF in Scala and Python. + +CREATE TEMPORARY VIEW tab1 AS SELECT * FROM VALUES +(0), (1), (2), (2), (2), (2), (3), (null), (null) AS tab1(c1); +CREATE TEMPORARY VIEW tab2 AS SELECT * FROM VALUES +(1), (2), (2), (3), (5), (5), (null) AS tab2(c1); +CREATE TEMPORARY VIEW tab3 AS SELECT * FROM VALUES +(1, 2), +(1, 2), +(1, 3), +(2, 3), +(2, 2) +AS tab3(k, v); +CREATE TEMPORARY VIEW tab4 AS SELECT * FROM VALUES +(1, 2), +(2, 3), +(2, 2), +(2, 2), +(2, 20) +AS tab4(k, v); + +-- Basic EXCEPT ALL +SELECT * FROM tab1 +EXCEPT ALL +SELECT * FROM tab2; + +-- MINUS ALL (synonym for EXCEPT) +SELECT * FROM tab1 +MINUS ALL +SELECT * FROM tab2; + +-- EXCEPT ALL same table in both branches +-- Note that there will one less NULL in the result compared to the non-udf result +-- because udf converts null to a string "null". 
+SELECT * FROM tab1 +EXCEPT ALL +SELECT * FROM tab2 WHERE udf(c1) IS NOT NULL; + +-- Empty left relation +SELECT * FROM tab1 WHERE udf(c1) > 5 +EXCEPT ALL +SELECT * FROM tab2; + +-- Empty right relation +SELECT * FROM tab1 +EXCEPT ALL +SELECT * FROM tab2 WHERE c1 > udf(6); + +-- Type Coerced ExceptAll +SELECT * FROM tab1 +EXCEPT ALL +SELECT CAST(udf(1) AS BIGINT); + +-- Error as types of two side are not compatible +SELECT * FROM tab1 +EXCEPT ALL +SELECT array(1); Review comment: Actually, changing to `udf(array(1))` gives the following message: ``` cannot resolve 'udf(cast(array(1) as string))' due to data type mismatch: cannot cast string to array; line 3 pos 7 ``` The expected message is: ``` ExceptAll can only be performed on tables with the compatible column types. array <> int at the first column of the second table; ``` @HyukjinKwon do you still want this change? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on a change in pull request #25090: [SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF test base
HyukjinKwon commented on a change in pull request #25090: [SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF test base URL: https://github.com/apache/spark/pull/25090#discussion_r304732964 ## File path: sql/core/src/test/resources/sql-tests/inputs/udf/udf-except-all.sql ## @@ -0,0 +1,166 @@ +-- This test file was converted from except-all.sql. +-- Note that currently registered UDF returns a string. So there are some differences, for instance +-- in string cast within UDF in Scala and Python. + +CREATE TEMPORARY VIEW tab1 AS SELECT * FROM VALUES +(0), (1), (2), (2), (2), (2), (3), (null), (null) AS tab1(c1); +CREATE TEMPORARY VIEW tab2 AS SELECT * FROM VALUES +(1), (2), (2), (3), (5), (5), (null) AS tab2(c1); +CREATE TEMPORARY VIEW tab3 AS SELECT * FROM VALUES +(1, 2), +(1, 2), +(1, 3), +(2, 3), +(2, 2) +AS tab3(k, v); +CREATE TEMPORARY VIEW tab4 AS SELECT * FROM VALUES +(1, 2), +(2, 3), +(2, 2), +(2, 2), +(2, 20) +AS tab4(k, v); + +-- Basic EXCEPT ALL +SELECT * FROM tab1 Review comment: Yea, otherwise, it would just duplicate the tests in original files. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on a change in pull request #25090: [SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF test base
HyukjinKwon commented on a change in pull request #25090: [SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF test base URL: https://github.com/apache/spark/pull/25090#discussion_r304733009 ## File path: sql/core/src/test/resources/sql-tests/inputs/udf/udf-except-all.sql ## @@ -0,0 +1,166 @@ +-- This test file was converted from except-all.sql. +-- Note that currently registered UDF returns a string. So there are some differences, for instance +-- in string cast within UDF in Scala and Python. + +CREATE TEMPORARY VIEW tab1 AS SELECT * FROM VALUES +(0), (1), (2), (2), (2), (2), (3), (null), (null) AS tab1(c1); +CREATE TEMPORARY VIEW tab2 AS SELECT * FROM VALUES +(1), (2), (2), (3), (5), (5), (null) AS tab2(c1); +CREATE TEMPORARY VIEW tab3 AS SELECT * FROM VALUES +(1, 2), +(1, 2), +(1, 3), +(2, 3), +(2, 2) +AS tab3(k, v); +CREATE TEMPORARY VIEW tab4 AS SELECT * FROM VALUES +(1, 2), +(2, 3), +(2, 2), +(2, 2), +(2, 20) +AS tab4(k, v); + +-- Basic EXCEPT ALL +SELECT * FROM tab1 +EXCEPT ALL +SELECT * FROM tab2; + +-- MINUS ALL (synonym for EXCEPT) +SELECT * FROM tab1 +MINUS ALL +SELECT * FROM tab2; + +-- EXCEPT ALL same table in both branches +-- Note that there will one less NULL in the result compared to the non-udf result +-- because udf converts null to a string "null". 
+SELECT * FROM tab1 +EXCEPT ALL +SELECT * FROM tab2 WHERE udf(c1) IS NOT NULL; + +-- Empty left relation +SELECT * FROM tab1 WHERE udf(c1) > 5 +EXCEPT ALL +SELECT * FROM tab2; + +-- Empty right relation +SELECT * FROM tab1 +EXCEPT ALL +SELECT * FROM tab2 WHERE c1 > udf(6); + +-- Type Coerced ExceptAll +SELECT * FROM tab1 +EXCEPT ALL +SELECT CAST(udf(1) AS BIGINT); + +-- Error as types of two side are not compatible +SELECT * FROM tab1 +EXCEPT ALL +SELECT array(1); + +-- Basic +SELECT * FROM tab3 +EXCEPT ALL +SELECT * FROM tab4; + +-- Basic +SELECT * FROM tab4 +EXCEPT ALL +SELECT * FROM tab3; + +-- EXCEPT ALL + INTERSECT +SELECT * FROM tab4 Review comment: Yea, let's replace it. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on issue #25183: [SPARK-28430][UI] Fix stage table rendering when some tasks' metrics are missing
HyukjinKwon commented on issue #25183: [SPARK-28430][UI] Fix stage table rendering when some tasks' metrics are missing URL: https://github.com/apache/spark/pull/25183#issuecomment-512662741 Looks fine, but a patch that involves UI changes usually attaches a screenshot taken after the fix.
[GitHub] [spark] imback82 commented on a change in pull request #25090: [SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF test base
imback82 commented on a change in pull request #25090: [SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF test base URL: https://github.com/apache/spark/pull/25090#discussion_r304732716 ## File path: sql/core/src/test/resources/sql-tests/inputs/udf/udf-except-all.sql ## @@ -0,0 +1,166 @@ +-- This test file was converted from except-all.sql. +-- Note that currently registered UDF returns a string. So there are some differences, for instance +-- in string cast within UDF in Scala and Python. + +CREATE TEMPORARY VIEW tab1 AS SELECT * FROM VALUES +(0), (1), (2), (2), (2), (2), (3), (null), (null) AS tab1(c1); +CREATE TEMPORARY VIEW tab2 AS SELECT * FROM VALUES +(1), (2), (2), (3), (5), (5), (null) AS tab2(c1); +CREATE TEMPORARY VIEW tab3 AS SELECT * FROM VALUES +(1, 2), +(1, 2), +(1, 3), +(2, 3), +(2, 2) +AS tab3(k, v); +CREATE TEMPORARY VIEW tab4 AS SELECT * FROM VALUES +(1, 2), +(2, 3), +(2, 2), +(2, 2), +(2, 20) +AS tab4(k, v); + +-- Basic EXCEPT ALL +SELECT * FROM tab1 +EXCEPT ALL +SELECT * FROM tab2; + +-- MINUS ALL (synonym for EXCEPT) +SELECT * FROM tab1 +MINUS ALL +SELECT * FROM tab2; + +-- EXCEPT ALL same table in both branches +-- Note that there will one less NULL in the result compared to the non-udf result +-- because udf converts null to a string "null". 
+SELECT * FROM tab1 +EXCEPT ALL +SELECT * FROM tab2 WHERE udf(c1) IS NOT NULL; + +-- Empty left relation +SELECT * FROM tab1 WHERE udf(c1) > 5 +EXCEPT ALL +SELECT * FROM tab2; + +-- Empty right relation +SELECT * FROM tab1 +EXCEPT ALL +SELECT * FROM tab2 WHERE c1 > udf(6); + +-- Type Coerced ExceptAll +SELECT * FROM tab1 +EXCEPT ALL +SELECT CAST(udf(1) AS BIGINT); + +-- Error as types of two side are not compatible +SELECT * FROM tab1 +EXCEPT ALL +SELECT array(1); + +-- Basic +SELECT * FROM tab3 +EXCEPT ALL +SELECT * FROM tab4; + +-- Basic +SELECT * FROM tab4 +EXCEPT ALL +SELECT * FROM tab3; + +-- EXCEPT ALL + INTERSECT +SELECT * FROM tab4 Review comment: @HyukjinKwon, I can replace * with udf(c1). Did you want something else? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
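The note in the quoted test file ("one less NULL in the result ... because udf converts null to a string") can be illustrated with a small model. This is a sketch under stated assumptions, not the registered test UDF: `string_udf` is a hypothetical stand-in that returns the text "null" for a missing input, `except_all` is a hypothetical multiset-difference helper, and `None` stands in for SQL NULL.

```python
from collections import Counter

def string_udf(value):
    # Hypothetical stand-in for a UDF that returns a string:
    # a missing input becomes the text "null" instead of staying missing.
    return "null" if value is None else str(value)

def except_all(left, right):
    # Multiset difference, an illustrative model of EXCEPT ALL.
    return list((Counter(left) - Counter(right)).elements())

tab1 = [0, 1, 2, 2, 2, 2, 3, None, None]
tab2 = [1, 2, 2, 3, 5, 5, None]

# Without the UDF, "c1 IS NOT NULL" drops the NULL row from tab2.
plain_right = [c for c in tab2 if c is not None]
# With the string-returning stand-in, the filter no longer drops it.
udf_right = [c for c in tab2 if string_udf(c) is not None]

plain = except_all(tab1, plain_right)
with_udf = except_all(tab1, udf_right)
print(plain.count(None), with_udf.count(None))
```

Under these assumptions the filtered right side keeps its NULL row in the UDF variant, so one more NULL is subtracted from the left side, which is the difference the note describes.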
[GitHub] [spark] HyukjinKwon closed pull request #25175: [SPARK-28411][PYTHON][SQL] InsertInto with overwrite is not honored
HyukjinKwon closed pull request #25175: [SPARK-28411][PYTHON][SQL] InsertInto with overwrite is not honored URL: https://github.com/apache/spark/pull/25175
[GitHub] [spark] HyukjinKwon closed pull request #25182: [SPARK-27609][PYTHON] Convert values of function options to strings
HyukjinKwon closed pull request #25182: [SPARK-27609][PYTHON] Convert values of function options to strings URL: https://github.com/apache/spark/pull/25182
[GitHub] [spark] imback82 commented on a change in pull request #25090: [SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF test base
imback82 commented on a change in pull request #25090: [SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF test base URL: https://github.com/apache/spark/pull/25090#discussion_r304732055 ## File path: sql/core/src/test/resources/sql-tests/inputs/udf/udf-except-all.sql ## @@ -0,0 +1,166 @@ +-- This test file was converted from except-all.sql. +-- Note that currently registered UDF returns a string. So there are some differences, for instance +-- in string cast within UDF in Scala and Python. + +CREATE TEMPORARY VIEW tab1 AS SELECT * FROM VALUES +(0), (1), (2), (2), (2), (2), (3), (null), (null) AS tab1(c1); +CREATE TEMPORARY VIEW tab2 AS SELECT * FROM VALUES +(1), (2), (2), (3), (5), (5), (null) AS tab2(c1); +CREATE TEMPORARY VIEW tab3 AS SELECT * FROM VALUES +(1, 2), +(1, 2), +(1, 3), +(2, 3), +(2, 2) +AS tab3(k, v); +CREATE TEMPORARY VIEW tab4 AS SELECT * FROM VALUES +(1, 2), +(2, 3), +(2, 2), +(2, 2), +(2, 20) +AS tab4(k, v); + +-- Basic EXCEPT ALL +SELECT * FROM tab1 +EXCEPT ALL +SELECT * FROM tab2; + +-- MINUS ALL (synonym for EXCEPT) +SELECT * FROM tab1 +MINUS ALL +SELECT * FROM tab2; + +-- EXCEPT ALL same table in both branches +-- Note that there will one less NULL in the result compared to the non-udf result +-- because udf converts null to a string "null". +SELECT * FROM tab1 +EXCEPT ALL +SELECT * FROM tab2 WHERE udf(c1) IS NOT NULL; + +-- Empty left relation +SELECT * FROM tab1 WHERE udf(c1) > 5 +EXCEPT ALL +SELECT * FROM tab2; + +-- Empty right relation +SELECT * FROM tab1 +EXCEPT ALL +SELECT * FROM tab2 WHERE c1 > udf(6); Review comment: Changed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
[GitHub] [spark] imback82 commented on a change in pull request #25090: [SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF test base
imback82 commented on a change in pull request #25090: [SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF test base URL: https://github.com/apache/spark/pull/25090#discussion_r304732075 ## File path: sql/core/src/test/resources/sql-tests/inputs/udf/udf-except-all.sql ## @@ -0,0 +1,166 @@ +-- This test file was converted from except-all.sql. +-- Note that currently registered UDF returns a string. So there are some differences, for instance +-- in string cast within UDF in Scala and Python. + +CREATE TEMPORARY VIEW tab1 AS SELECT * FROM VALUES +(0), (1), (2), (2), (2), (2), (3), (null), (null) AS tab1(c1); +CREATE TEMPORARY VIEW tab2 AS SELECT * FROM VALUES +(1), (2), (2), (3), (5), (5), (null) AS tab2(c1); +CREATE TEMPORARY VIEW tab3 AS SELECT * FROM VALUES +(1, 2), +(1, 2), +(1, 3), +(2, 3), +(2, 2) +AS tab3(k, v); +CREATE TEMPORARY VIEW tab4 AS SELECT * FROM VALUES +(1, 2), +(2, 3), +(2, 2), +(2, 2), +(2, 20) +AS tab4(k, v); + +-- Basic EXCEPT ALL +SELECT * FROM tab1 +EXCEPT ALL +SELECT * FROM tab2; + +-- MINUS ALL (synonym for EXCEPT) +SELECT * FROM tab1 +MINUS ALL +SELECT * FROM tab2; + +-- EXCEPT ALL same table in both branches +-- Note that there will one less NULL in the result compared to the non-udf result +-- because udf converts null to a string "null". +SELECT * FROM tab1 +EXCEPT ALL +SELECT * FROM tab2 WHERE udf(c1) IS NOT NULL; + +-- Empty left relation +SELECT * FROM tab1 WHERE udf(c1) > 5 +EXCEPT ALL +SELECT * FROM tab2; + +-- Empty right relation +SELECT * FROM tab1 +EXCEPT ALL +SELECT * FROM tab2 WHERE c1 > udf(6); + +-- Type Coerced ExceptAll +SELECT * FROM tab1 +EXCEPT ALL +SELECT CAST(udf(1) AS BIGINT); + +-- Error as types of two side are not compatible +SELECT * FROM tab1 +EXCEPT ALL +SELECT array(1); Review comment: Changed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
[GitHub] [spark] imback82 commented on a change in pull request #25090: [SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF test base
imback82 commented on a change in pull request #25090: [SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF test base URL: https://github.com/apache/spark/pull/25090#discussion_r304731897 ## File path: sql/core/src/test/resources/sql-tests/inputs/udf/udf-except-all.sql ## @@ -0,0 +1,166 @@ +-- This test file was converted from except-all.sql. +-- Note that currently registered UDF returns a string. So there are some differences, for instance +-- in string cast within UDF in Scala and Python. + +CREATE TEMPORARY VIEW tab1 AS SELECT * FROM VALUES +(0), (1), (2), (2), (2), (2), (3), (null), (null) AS tab1(c1); +CREATE TEMPORARY VIEW tab2 AS SELECT * FROM VALUES +(1), (2), (2), (3), (5), (5), (null) AS tab2(c1); +CREATE TEMPORARY VIEW tab3 AS SELECT * FROM VALUES +(1, 2), +(1, 2), +(1, 3), +(2, 3), +(2, 2) +AS tab3(k, v); +CREATE TEMPORARY VIEW tab4 AS SELECT * FROM VALUES +(1, 2), +(2, 3), +(2, 2), +(2, 2), +(2, 20) +AS tab4(k, v); + +-- Basic EXCEPT ALL +SELECT * FROM tab1 Review comment: @HyukjinKwon, do you want this in all instances below? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on issue #25175: [SPARK-28411][PYTHON][SQL] InsertInto with overwrite is not honored
HyukjinKwon commented on issue #25175: [SPARK-28411][PYTHON][SQL] InsertInto with overwrite is not honored URL: https://github.com/apache/spark/pull/25175#issuecomment-512661717 Merged to master.
[GitHub] [spark] HyukjinKwon commented on issue #25182: [SPARK-27609][PYTHON] Convert values of function options to strings
HyukjinKwon commented on issue #25182: [SPARK-27609][PYTHON] Convert values of function options to strings URL: https://github.com/apache/spark/pull/25182#issuecomment-512661568 Merged to master.
[GitHub] [spark] imback82 commented on a change in pull request #25090: [SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF test base
imback82 commented on a change in pull request #25090: [SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF test base URL: https://github.com/apache/spark/pull/25090#discussion_r304731560 ## File path: sql/core/src/test/resources/sql-tests/results/udf/udf-except-all.sql.out ## @@ -0,0 +1,345 @@ +-- Automatically generated by SQLQueryTestSuite +-- Number of queries: 27 + + +-- !query 0 +CREATE TEMPORARY VIEW tab1 AS SELECT * FROM VALUES +(0), (1), (2), (2), (2), (2), (3), (null), (null) AS tab1(c1) +-- !query 0 schema +struct<> +-- !query 0 output + + + +-- !query 1 +CREATE TEMPORARY VIEW tab2 AS SELECT * FROM VALUES +(1), (2), (2), (3), (5), (5), (null) AS tab2(c1) +-- !query 1 schema +struct<> +-- !query 1 output + + + +-- !query 2 +CREATE TEMPORARY VIEW tab3 AS SELECT * FROM VALUES +(1, 2), +(1, 2), +(1, 3), +(2, 3), +(2, 2) +AS tab3(k, v) +-- !query 2 schema +struct<> +-- !query 2 output + + + +-- !query 3 +CREATE TEMPORARY VIEW tab4 AS SELECT * FROM VALUES +(1, 2), +(2, 3), +(2, 2), +(2, 2), +(2, 20) +AS tab4(k, v) +-- !query 3 schema +struct<> +-- !query 3 output + + + +-- !query 4 +SELECT * FROM tab1 +EXCEPT ALL +SELECT * FROM tab2 +-- !query 4 schema +struct +-- !query 4 output +0 +2 +2 +NULL + + +-- !query 5 +SELECT * FROM tab1 +MINUS ALL +SELECT * FROM tab2 +-- !query 5 schema +struct +-- !query 5 output +0 +2 +2 +NULL + + +-- !query 6 +SELECT * FROM tab1 +EXCEPT ALL +SELECT * FROM tab2 WHERE udf(c1) IS NOT NULL +-- !query 6 schema +struct +-- !query 6 output +0 +2 +2 +NULL Review comment: Reverted the comment since now it returns the correct result with your changes :) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
[GitHub] [spark] AmplabJenkins removed a comment on issue #25160: [SPARK-28399][ML] implement RobustScaler
AmplabJenkins removed a comment on issue #25160: [SPARK-28399][ML] implement RobustScaler URL: https://github.com/apache/spark/pull/25160#issuecomment-512661306 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/107811/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25160: [SPARK-28399][ML] implement RobustScaler
AmplabJenkins removed a comment on issue #25160: [SPARK-28399][ML] implement RobustScaler URL: https://github.com/apache/spark/pull/25160#issuecomment-512661303 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25160: [SPARK-28399][ML] implement RobustScaler
AmplabJenkins commented on issue #25160: [SPARK-28399][ML] implement RobustScaler URL: https://github.com/apache/spark/pull/25160#issuecomment-512661306 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/107811/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25160: [SPARK-28399][ML] implement RobustScaler
AmplabJenkins commented on issue #25160: [SPARK-28399][ML] implement RobustScaler URL: https://github.com/apache/spark/pull/25160#issuecomment-512661303 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA removed a comment on issue #25160: [SPARK-28399][ML] implement RobustScaler
SparkQA removed a comment on issue #25160: [SPARK-28399][ML] implement RobustScaler URL: https://github.com/apache/spark/pull/25160#issuecomment-512649817 **[Test build #107811 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107811/testReport)** for PR 25160 at commit [`a196c09`](https://github.com/apache/spark/commit/a196c09bdc1a94a4f98da1328d29815bb993140b).
[GitHub] [spark] SparkQA commented on issue #25160: [SPARK-28399][ML] implement RobustScaler
SparkQA commented on issue #25160: [SPARK-28399][ML] implement RobustScaler URL: https://github.com/apache/spark/pull/25160#issuecomment-512661053 **[Test build #107811 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107811/testReport)** for PR 25160 at commit [`a196c09`](https://github.com/apache/spark/commit/a196c09bdc1a94a4f98da1328d29815bb993140b). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] HyukjinKwon commented on issue #25168: [SPARK-28276][SQL][PYTHON][TEST] Convert and port 'cross-join.sql' into UDF test base
HyukjinKwon commented on issue #25168: [SPARK-28276][SQL][PYTHON][TEST] Convert and port 'cross-join.sql' into UDF test base URL: https://github.com/apache/spark/pull/25168#issuecomment-512660918 BTW, @viirya, please feel free to review those PRs when you have some time, since you know that code pretty well too.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24798: [SPARK-27724][SQL] Implement REPLACE TABLE and REPLACE TABLE AS SELECT with V2
AmplabJenkins removed a comment on issue #24798: [SPARK-27724][SQL] Implement REPLACE TABLE and REPLACE TABLE AS SELECT with V2 URL: https://github.com/apache/spark/pull/24798#issuecomment-512660656 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/107806/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24798: [SPARK-27724][SQL] Implement REPLACE TABLE and REPLACE TABLE AS SELECT with V2
AmplabJenkins removed a comment on issue #24798: [SPARK-27724][SQL] Implement REPLACE TABLE and REPLACE TABLE AS SELECT with V2 URL: https://github.com/apache/spark/pull/24798#issuecomment-512660654 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA commented on issue #25180: [SPARK-28423][SQL] merge Scan and Batch/Stream
SparkQA commented on issue #25180: [SPARK-28423][SQL] merge Scan and Batch/Stream URL: https://github.com/apache/spark/pull/25180#issuecomment-512660752 **[Test build #107824 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107824/testReport)** for PR 25180 at commit [`878eaa5`](https://github.com/apache/spark/commit/878eaa520dafa109d1682388c601e4f0b43916ee).
[GitHub] [spark] AmplabJenkins commented on issue #24798: [SPARK-27724][SQL] Implement REPLACE TABLE and REPLACE TABLE AS SELECT with V2
AmplabJenkins commented on issue #24798: [SPARK-27724][SQL] Implement REPLACE TABLE and REPLACE TABLE AS SELECT with V2 URL: https://github.com/apache/spark/pull/24798#issuecomment-512660654 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24798: [SPARK-27724][SQL] Implement REPLACE TABLE and REPLACE TABLE AS SELECT with V2
AmplabJenkins commented on issue #24798: [SPARK-27724][SQL] Implement REPLACE TABLE and REPLACE TABLE AS SELECT with V2 URL: https://github.com/apache/spark/pull/24798#issuecomment-512660656 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/107806/ Test PASSed.
[GitHub] [spark] HyukjinKwon commented on issue #25127: [SPARK-28284][SQL][PYTHON][TESTS] Convert and port 'join-empty-relation.sql' into UDF test base
HyukjinKwon commented on issue #25127: [SPARK-28284][SQL][PYTHON][TESTS] Convert and port 'join-empty-relation.sql' into UDF test base URL: https://github.com/apache/spark/pull/25127#issuecomment-512660638 Looks fine in general.
[GitHub] [spark] HyukjinKwon commented on a change in pull request #25127: [SPARK-28284][SQL][PYTHON][TESTS] Convert and port 'join-empty-relation.sql' into UDF test base
HyukjinKwon commented on a change in pull request #25127: [SPARK-28284][SQL][PYTHON][TESTS] Convert and port 'join-empty-relation.sql' into UDF test base URL: https://github.com/apache/spark/pull/25127#discussion_r304730735 ## File path: sql/core/src/test/resources/sql-tests/inputs/udf/udf-join-empty-relation.sql ## @@ -0,0 +1,37 @@ +-- List of configuration the test suite is run against: +--SET spark.sql.autoBroadcastJoinThreshold=10485760 +--SET spark.sql.autoBroadcastJoinThreshold=-1,spark.sql.join.preferSortMergeJoin=true +--SET spark.sql.autoBroadcastJoinThreshold=-1,spark.sql.join.preferSortMergeJoin=false + +-- This test file was converted from join-empty-relation.sql. +-- Note that currently registered UDF returns a string. So there are some differences, for instance +-- in string cast within UDF in Scala and Python. + +CREATE TEMPORARY VIEW t1 AS SELECT * FROM VALUES (1) AS GROUPING(a); +CREATE TEMPORARY VIEW t2 AS SELECT * FROM VALUES (1) AS GROUPING(a); + +CREATE TEMPORARY VIEW empty_table as SELECT a FROM t2 WHERE false; + +SELECT udf(t1.a), udf(empty_table.a) FROM t1 INNER JOIN empty_table ON (udf(t1.a) = udf(empty_table.a)); Review comment: Likewise, we can test the UDFs like `udf(udf(t1.a))` or `udf(udf(empty_table.a) = udf(t1.a))`. Let's add such combinations as well.
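The nested combinations suggested in the review comment above can be tried outside Spark. The following is only a rough sketch, assuming SQLite (through Python's standard `sqlite3` module) as a stand-in engine and a hypothetical string-returning `udf` function in place of the UDF registered by Spark's test base. The tables are ordinary SQLite tables rather than the temporary views from the PR, and nothing here is taken from the patch itself.

```python
import sqlite3


def udf(value):
    """Hypothetical stand-in for the test UDF: casts its input to a string."""
    return None if value is None else str(value)


conn = sqlite3.connect(":memory:")
# Register the Python function as a one-argument SQL function named udf.
conn.create_function("udf", 1, udf)
conn.executescript(
    """
    CREATE TABLE t1 (a INTEGER);
    INSERT INTO t1 VALUES (1);
    CREATE TABLE empty_table (a INTEGER);  -- deliberately left empty
    """
)

# Nested call on a projected column: udf(udf(a)).
nested = conn.execute("SELECT udf(udf(a)) FROM t1").fetchall()

# UDF wrapped around a comparison of two UDF results, used as the join condition.
joined = conn.execute(
    "SELECT udf(t1.a), udf(empty_table.a) "
    "FROM t1 INNER JOIN empty_table "
    "ON udf(udf(empty_table.a) = udf(t1.a))"
).fetchall()

print(nested)
print(joined)
```

Because `empty_table` has no rows, the join returns nothing however the condition is nested, which is the behaviour an empty-relation test is meant to pin down.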
[GitHub] [spark] AmplabJenkins removed a comment on issue #25180: [SPARK-28423][SQL] merge Scan and Batch/Stream
AmplabJenkins removed a comment on issue #25180: [SPARK-28423][SQL] merge Scan and Batch/Stream URL: https://github.com/apache/spark/pull/25180#issuecomment-512660377 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/12939/ Test PASSed.
[GitHub] [spark] SparkQA removed a comment on issue #24798: [SPARK-27724][SQL] Implement REPLACE TABLE and REPLACE TABLE AS SELECT with V2
SparkQA removed a comment on issue #24798: [SPARK-27724][SQL] Implement REPLACE TABLE and REPLACE TABLE AS SELECT with V2 URL: https://github.com/apache/spark/pull/24798#issuecomment-512625863 **[Test build #107806 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107806/testReport)** for PR 24798 at commit [`be04476`](https://github.com/apache/spark/commit/be04476e968bd5cb5722c3b5a208b8430d78b1b9).
[GitHub] [spark] AmplabJenkins removed a comment on issue #25180: [SPARK-28423][SQL] merge Scan and Batch/Stream
AmplabJenkins removed a comment on issue #25180: [SPARK-28423][SQL] merge Scan and Batch/Stream URL: https://github.com/apache/spark/pull/25180#issuecomment-512660375 Merged build finished. Test PASSed.
[GitHub] [spark] HyukjinKwon commented on issue #25124: [SPARK-28282][SQL][PYTHON][TESTS] Convert and port 'inline-table.sql' into UDF test base
HyukjinKwon commented on issue #25124: [SPARK-28282][SQL][PYTHON][TESTS] Convert and port 'inline-table.sql' into UDF test base URL: https://github.com/apache/spark/pull/25124#issuecomment-512660401 Looks fine in general otherwise.
[GitHub] [spark] AmplabJenkins commented on issue #25180: [SPARK-28423][SQL] merge Scan and Batch/Stream
AmplabJenkins commented on issue #25180: [SPARK-28423][SQL] merge Scan and Batch/Stream URL: https://github.com/apache/spark/pull/25180#issuecomment-512660377 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/12939/ Test PASSed.
[GitHub] [spark] SparkQA commented on issue #24798: [SPARK-27724][SQL] Implement REPLACE TABLE and REPLACE TABLE AS SELECT with V2
SparkQA commented on issue #24798: [SPARK-27724][SQL] Implement REPLACE TABLE and REPLACE TABLE AS SELECT with V2 URL: https://github.com/apache/spark/pull/24798#issuecomment-512660320 **[Test build #107806 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107806/testReport)** for PR 24798 at commit [`be04476`](https://github.com/apache/spark/commit/be04476e968bd5cb5722c3b5a208b8430d78b1b9). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on issue #25180: [SPARK-28423][SQL] merge Scan and Batch/Stream
AmplabJenkins commented on issue #25180: [SPARK-28423][SQL] merge Scan and Batch/Stream URL: https://github.com/apache/spark/pull/25180#issuecomment-512660375 Merged build finished. Test PASSed.
[GitHub] [spark] HyukjinKwon commented on a change in pull request #25124: [SPARK-28282][SQL][PYTHON][TESTS] Convert and port 'inline-table.sql' into UDF test base
HyukjinKwon commented on a change in pull request #25124: [SPARK-28282][SQL][PYTHON][TESTS] Convert and port 'inline-table.sql' into UDF test base URL: https://github.com/apache/spark/pull/25124#discussion_r304730520 ## File path: sql/core/src/test/resources/sql-tests/inputs/udf/udf-inline-table.sql ## @@ -0,0 +1,54 @@ +-- This test file was converted from intersect-all.sql. +-- Note that currently registered UDF returns a string. So there are some differences, for instance +-- in string cast within UDF in Scala and Python. + +-- single row, without table and column alias +select * from values ("one", 1); + +-- single row, without column alias +select * from values ("one", 1) as data; + +-- single row +select udf(a), b from values ("one", 1) as data(a, b); + +-- single column multiple rows +select udf(a) from values 1, 2, 3 as data(a); + +-- three rows +select udf(a), b from values ("one", 1), ("two", 2), ("three", null) as data(a, b); + +-- null type +select udf(a), b from values ("one", null), ("two", null) as data(a, b); + +-- int and long coercion +select udf(a), b from values ("one", 1), ("two", 2L) as data(a, b); + +-- foldable expressions +select udf(a), udf(b) from values ("one", 1 + 0), ("two", 1 + 3L) as data(a, b); + +-- complex types +select udf(a), b from values ("one", array(0, 1)), ("two", array(2, 3)) as data(a, b); + +-- decimal and double coercion +select udf(a), b from values ("one", 2.0), ("two", 3.0D) as data(a, b); + +-- error reporting: nondeterministic function rand +select udf(a), b from values ("one", rand(5)), ("two", 3.0D) as data(a, b); + +-- error reporting: different number of columns +select udf(a), udf(b) from values ("one", 2.0), ("two") as data(a, b); + +-- error reporting: types that are incompatible +select udf(a), udf(b) from values ("one", array(0, 1)), ("two", struct(1, 2)) as data(a, b); + +-- error reporting: number aliases different from number data values +select udf(a), udf(b) from values ("one"), ("two") as data(a, b); 
+ +-- error reporting: unresolved expression +select udf(a), udf(b) from values ("one", random_not_exist_func(1)), ("two", 2) as data(a, b); + +-- error reporting: aggregate expression +select udf(a), udf(b) from values ("one", count(1)), ("two", 2) as data(a, b); + +-- string to timestamp Review comment: Let's add udf in all tests. Otherwise, it just duplicates the original file.
[GitHub] [spark] HyukjinKwon commented on a change in pull request #25124: [SPARK-28282][SQL][PYTHON][TESTS] Convert and port 'inline-table.sql' into UDF test base
HyukjinKwon commented on a change in pull request #25124: [SPARK-28282][SQL][PYTHON][TESTS] Convert and port 'inline-table.sql' into UDF test base URL: https://github.com/apache/spark/pull/25124#discussion_r304730452 ## File path: sql/core/src/test/resources/sql-tests/inputs/udf/udf-inline-table.sql ## @@ -0,0 +1,54 @@ +-- This test file was converted from intersect-all.sql. +-- Note that currently registered UDF returns a string. So there are some differences, for instance +-- in string cast within UDF in Scala and Python. + +-- single row, without table and column alias +select * from values ("one", 1); + +-- single row, without column alias +select * from values ("one", 1) as data; + +-- single row +select udf(a), b from values ("one", 1) as data(a, b); + +-- single column multiple rows +select udf(a) from values 1, 2, 3 as data(a); + +-- three rows +select udf(a), b from values ("one", 1), ("two", 2), ("three", null) as data(a, b); + +-- null type +select udf(a), b from values ("one", null), ("two", null) as data(a, b); + +-- int and long coercion +select udf(a), b from values ("one", 1), ("two", 2L) as data(a, b); + +-- foldable expressions +select udf(a), udf(b) from values ("one", 1 + 0), ("two", 1 + 3L) as data(a, b); Review comment: I would test `udf(udf(a))` too
[GitHub] [spark] HyukjinKwon commented on a change in pull request #25124: [SPARK-28282][SQL][PYTHON][TESTS] Convert and port 'inline-table.sql' into UDF test base
HyukjinKwon commented on a change in pull request #25124: [SPARK-28282][SQL][PYTHON][TESTS] Convert and port 'inline-table.sql' into UDF test base URL: https://github.com/apache/spark/pull/25124#discussion_r304730380 ## File path: sql/core/src/test/resources/sql-tests/inputs/udf/udf-inline-table.sql ## @@ -0,0 +1,54 @@ +-- This test file was converted from intersect-all.sql. +-- Note that currently registered UDF returns a string. So there are some differences, for instance +-- in string cast within UDF in Scala and Python. + +-- single row, without table and column alias +select * from values ("one", 1); + +-- single row, without column alias +select * from values ("one", 1) as data; + +-- single row +select udf(a), b from values ("one", 1) as data(a, b); Review comment: See `udf-aggregates_part1.sql` to check how I commented them.
[GitHub] [spark] HyukjinKwon commented on a change in pull request #25124: [SPARK-28282][SQL][PYTHON][TESTS] Convert and port 'inline-table.sql' into UDF test base
HyukjinKwon commented on a change in pull request #25124: [SPARK-28282][SQL][PYTHON][TESTS] Convert and port 'inline-table.sql' into UDF test base URL: https://github.com/apache/spark/pull/25124#discussion_r304730321 ## File path: sql/core/src/test/resources/sql-tests/inputs/udf/udf-inline-table.sql ## @@ -0,0 +1,54 @@ +-- This test file was converted from intersect-all.sql. +-- Note that currently registered UDF returns a string. So there are some differences, for instance +-- in string cast within UDF in Scala and Python. + +-- single row, without table and column alias +select * from values ("one", 1); + +-- single row, without column alias +select * from values ("one", 1) as data; + +-- single row +select udf(a), b from values ("one", 1) as data(a, b); Review comment: I think `values ("one", udf(1))` is not allowed as of SPARK-28291. We can add that test here and comment it out, linking the `SPARK-28291` JIRA.
[GitHub] [spark] HyukjinKwon commented on a change in pull request #25124: [SPARK-28282][SQL][PYTHON][TESTS] Convert and port 'inline-table.sql' into UDF test base
HyukjinKwon commented on a change in pull request #25124: [SPARK-28282][SQL][PYTHON][TESTS] Convert and port 'inline-table.sql' into UDF test base URL: https://github.com/apache/spark/pull/25124#discussion_r304730144 ## File path: sql/core/src/test/resources/sql-tests/inputs/udf/udf-inline-table.sql ## @@ -0,0 +1,54 @@ +-- This test file was converted from intersect-all.sql. +-- Note that currently registered UDF returns a string. So there are some differences, for instance +-- in string cast within UDF in Scala and Python. + +-- single row, without table and column alias +select * from values ("one", 1); + +-- single row, without column alias +select * from values ("one", 1) as data; Review comment: Let's explicitly test the UDF here instead of `*`.
[GitHub] [spark] HyukjinKwon commented on issue #25122: [SPARK-28286][SQL][PYTHON][TESTS][WIP] Convert and port 'pivot.sql' into UDF test base
HyukjinKwon commented on issue #25122: [SPARK-28286][SQL][PYTHON][TESTS][WIP] Convert and port 'pivot.sql' into UDF test base URL: https://github.com/apache/spark/pull/25122#issuecomment-512659898 Looks fine otherwise if the tests pass. I will take another look before merging it in.
[GitHub] [spark] HyukjinKwon commented on a change in pull request #25122: [SPARK-28286][SQL][PYTHON][TESTS][WIP] Convert and port 'pivot.sql' into UDF test base
HyukjinKwon commented on a change in pull request #25122: [SPARK-28286][SQL][PYTHON][TESTS][WIP] Convert and port 'pivot.sql' into UDF test base URL: https://github.com/apache/spark/pull/25122#discussion_r304729849 ## File path: sql/core/src/test/resources/sql-tests/inputs/udf/udf-pivot.sql ## @@ -0,0 +1,317 @@ +-- This test file was converted from pivot.sql. + +-- Note that currently registered UDF returns a string. So there are some differences, for instance +-- in string cast within UDF in Scala and Python. + +--Note some test cases have been commented as the current integrated UDFs cannot handle complex types + +create temporary view courseSales as select * from values + ("dotNET", 2012, 1), + ("Java", 2012, 2), + ("dotNET", 2012, 5000), + ("dotNET", 2013, 48000), + ("Java", 2013, 3) + as courseSales(course, year, earnings); + +create temporary view years as select * from values + (2012, 1), + (2013, 2) + as years(y, s); + +create temporary view yearsWithComplexTypes as select * from values + (2012, array(1, 1), map('1', 1), struct(1, 'a')), + (2013, array(2, 2), map('2', 2), struct(2, 'b')) + as yearsWithComplexTypes(y, a, m, s); + +-- pivot courses +SELECT * FROM ( + SELECT udf(year), course, earnings FROM courseSales +) +PIVOT ( + udf(sum(earnings)) + FOR course IN ('dotNET', 'Java') +); + +-- pivot years with no subquery +SELECT * FROM courseSales +PIVOT ( + udf(sum(earnings)) + FOR year IN (2012, 2013) +); + +-- pivot courses with multiple aggregations +SELECT * FROM ( + SELECT year, course, earnings FROM courseSales +) +PIVOT ( + udf(sum(earnings)), udf(avg(earnings)) + FOR course IN ('dotNET', 'Java') +); + +-- pivot with no group by column +SELECT * FROM ( + SELECT udf(course) as course, earnings FROM courseSales +) +PIVOT ( + udf(sum(earnings)) + FOR course IN ('dotNET', 'Java') +); + +-- pivot with no group by column and with multiple aggregations on different columns +SELECT * FROM ( + SELECT year, course, earnings FROM courseSales +) +PIVOT ( + 
udf(sum(earnings)), udf(min(year)) Review comment: We can also try the `udf(sum(udf(earnings)))` combination throughout this file.
[GitHub] [spark] HyukjinKwon commented on a change in pull request #25122: [SPARK-28286][SQL][PYTHON][TESTS][WIP] Convert and port 'pivot.sql' into UDF test base
HyukjinKwon commented on a change in pull request #25122: [SPARK-28286][SQL][PYTHON][TESTS][WIP] Convert and port 'pivot.sql' into UDF test base URL: https://github.com/apache/spark/pull/25122#discussion_r304729980 ## File path: sql/core/src/test/resources/sql-tests/inputs/udf/udf-pivot.sql ## @@ -0,0 +1,317 @@ +-- This test file was converted from pivot.sql. + +-- Note that currently registered UDF returns a string. So there are some differences, for instance +-- in string cast within UDF in Scala and Python. + +--Note some test cases have been commented as the current integrated UDFs cannot handle complex types + +create temporary view courseSales as select * from values + ("dotNET", 2012, 1), + ("Java", 2012, 2), + ("dotNET", 2012, 5000), + ("dotNET", 2013, 48000), + ("Java", 2013, 3) + as courseSales(course, year, earnings); + +create temporary view years as select * from values + (2012, 1), + (2013, 2) + as years(y, s); + +create temporary view yearsWithComplexTypes as select * from values + (2012, array(1, 1), map('1', 1), struct(1, 'a')), + (2013, array(2, 2), map('2', 2), struct(2, 'b')) + as yearsWithComplexTypes(y, a, m, s); + +-- pivot courses +SELECT * FROM ( + SELECT udf(year), course, earnings FROM courseSales +) +PIVOT ( + udf(sum(earnings)) + FOR course IN ('dotNET', 'Java') +); + +-- pivot years with no subquery +SELECT * FROM courseSales +PIVOT ( + udf(sum(earnings)) + FOR year IN (2012, 2013) +); + +-- pivot courses with multiple aggregations +SELECT * FROM ( + SELECT year, course, earnings FROM courseSales +) +PIVOT ( + udf(sum(earnings)), udf(avg(earnings)) + FOR course IN ('dotNET', 'Java') +); + +-- pivot with no group by column +SELECT * FROM ( + SELECT udf(course) as course, earnings FROM courseSales +) +PIVOT ( + udf(sum(earnings)) + FOR course IN ('dotNET', 'Java') +); + +-- pivot with no group by column and with multiple aggregations on different columns +SELECT * FROM ( + SELECT year, course, earnings FROM courseSales +) +PIVOT ( + 
udf(sum(earnings)), udf(min(year)) + FOR course IN ('dotNET', 'Java') +); + +--todo nan fix +-- pivot on join query with multiple group by columns +SELECT * FROM ( + SELECT course, year, earnings, udf(s) as s + FROM courseSales + JOIN years ON year = y +) +PIVOT ( + udf(sum(earnings)) + FOR s IN (1, 2) +); + +-- pivot on join query with multiple aggregations on different columns +SELECT * FROM ( + SELECT course, year, earnings, s + FROM courseSales + JOIN years ON year = y +) +PIVOT ( + udf(sum(earnings)), udf(min(s)) + FOR course IN ('dotNET', 'Java') +); + +-- pivot on join query with multiple columns in one aggregation +SELECT * FROM ( + SELECT course, year, earnings, s + FROM courseSales + JOIN years ON year = y +) +PIVOT ( + udf(sum(earnings * s)) + FOR course IN ('dotNET', 'Java') +); + +-- pivot with aliases and projection +SELECT 2012_s, 2013_s, 2012_a, 2013_a, c FROM ( + SELECT year y, course c, earnings e FROM courseSales +) +PIVOT ( + udf(sum(e)) s, udf(avg(e)) a + FOR y IN (2012, 2013) +); + +-- pivot with projection and value aliases +SELECT firstYear_s, secondYear_s, firstYear_a, secondYear_a, c FROM ( + SELECT year y, course c, earnings e FROM courseSales +) +PIVOT ( + udf(sum(e)) s, udf(avg(e)) a + FOR y IN (2012 as firstYear, 2013 secondYear) +); + +-- pivot years with non-aggregate function +SELECT * FROM courseSales +PIVOT ( + udf(abs(earnings)) + FOR year IN (2012, 2013) +); + +-- pivot with one of the expressions as non-aggregate function +SELECT * FROM ( + SELECT year, course, earnings FROM courseSales +) +PIVOT ( + udf(sum(earnings)), year + FOR course IN ('dotNET', 'Java') +); + +-- pivot with unresolvable columns +SELECT * FROM ( + SELECT course, earnings FROM courseSales +) +PIVOT ( + udf(sum(earnings)) + FOR year IN (2012, 2013) +); + +-- pivot with complex aggregate expressions +SELECT * FROM ( + SELECT year, course, earnings FROM courseSales +) +PIVOT ( + udf(ceil(udf(sum(earnings, avg(earnings) + 1 as a1 + FOR course IN ('dotNET', 
'Java') +); + +-- pivot with invalid arguments in aggregate expressions +SELECT * FROM ( + SELECT year, course, earnings FROM courseSales +) +PIVOT ( + sum(udf(avg(earnings))) + FOR course IN ('dotNET', 'Java') +); + +--todo nan fix +-- pivot on multiple pivot columns +SELECT * FROM ( + SELECT course, year, earnings, s + FROM courseSales + JOIN years ON year = y +) +PIVOT ( + udf(sum(earnings)) + FOR (course, year) IN (('dotNET', 2012), ('Java', 2013)) +); + +--todo nan fix +-- pivot on multiple pivot columns with aliased values +SELECT * FROM ( + SELECT course, year, earnings, s + FROM courseSales + JOIN years ON year = y +) +PIVOT ( + udf(sum(earnings)) + FOR (course, s) IN (('dotNET', 2) as c1, ('Java', 1) as c2) +); + +-- pivot on multiple pivot columns with values of wrong data types +SELECT * FROM ( + SELECT course, year, earnings, s + FROM courseSales + JOIN years ON
[GitHub] [spark] HyukjinKwon commented on a change in pull request #25122: [SPARK-28286][SQL][PYTHON][TESTS][WIP] Convert and port 'pivot.sql' into UDF test base
HyukjinKwon commented on a change in pull request #25122: [SPARK-28286][SQL][PYTHON][TESTS][WIP] Convert and port 'pivot.sql' into UDF test base URL: https://github.com/apache/spark/pull/25122#discussion_r304729849 ## File path: sql/core/src/test/resources/sql-tests/inputs/udf/udf-pivot.sql ## @@ -0,0 +1,317 @@ +-- This test file was converted from pivot.sql. + +-- Note that currently registered UDF returns a string. So there are some differences, for instance +-- in string cast within UDF in Scala and Python. + +--Note some test cases have been commented as the current integrated UDFs cannot handle complex types + +create temporary view courseSales as select * from values + ("dotNET", 2012, 1), + ("Java", 2012, 2), + ("dotNET", 2012, 5000), + ("dotNET", 2013, 48000), + ("Java", 2013, 3) + as courseSales(course, year, earnings); + +create temporary view years as select * from values + (2012, 1), + (2013, 2) + as years(y, s); + +create temporary view yearsWithComplexTypes as select * from values + (2012, array(1, 1), map('1', 1), struct(1, 'a')), + (2013, array(2, 2), map('2', 2), struct(2, 'b')) + as yearsWithComplexTypes(y, a, m, s); + +-- pivot courses +SELECT * FROM ( + SELECT udf(year), course, earnings FROM courseSales +) +PIVOT ( + udf(sum(earnings)) + FOR course IN ('dotNET', 'Java') +); + +-- pivot years with no subquery +SELECT * FROM courseSales +PIVOT ( + udf(sum(earnings)) + FOR year IN (2012, 2013) +); + +-- pivot courses with multiple aggregations +SELECT * FROM ( + SELECT year, course, earnings FROM courseSales +) +PIVOT ( + udf(sum(earnings)), udf(avg(earnings)) + FOR course IN ('dotNET', 'Java') +); + +-- pivot with no group by column +SELECT * FROM ( + SELECT udf(course) as course, earnings FROM courseSales +) +PIVOT ( + udf(sum(earnings)) + FOR course IN ('dotNET', 'Java') +); + +-- pivot with no group by column and with multiple aggregations on different columns +SELECT * FROM ( + SELECT year, course, earnings FROM courseSales +) +PIVOT ( + 
udf(sum(earnings)), udf(min(year)) Review comment: We can try `udf(sum(udf(earnings)))` combination too. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on a change in pull request #25119: [SPARK-28283][SQL][PYTHON][TESTS] Convert and port 'intersect-all.sql' into UDF test base
HyukjinKwon commented on a change in pull request #25119: [SPARK-28283][SQL][PYTHON][TESTS] Convert and port 'intersect-all.sql' into UDF test base URL: https://github.com/apache/spark/pull/25119#discussion_r304729778 ## File path: sql/core/src/test/resources/sql-tests/inputs/udf/udf-intersect-all.sql ## @@ -0,0 +1,164 @@ +-- This test file was converted from intersect-all.sql. +-- Note that currently registered UDF returns a string. So there are some differences, for instance +-- in string cast within UDF in Scala and Python. + +CREATE TEMPORARY VIEW tab1 AS SELECT * FROM VALUES +(1, 2), +(1, 2), +(1, 3), +(1, 3), +(2, 3), +(null, null), +(null, null) +AS tab1(k, v); +CREATE TEMPORARY VIEW tab2 AS SELECT * FROM VALUES +(1, 2), +(1, 2), +(2, 3), +(3, 4), +(null, null), +(null, null) +AS tab2(k, v); + +-- Basic INTERSECT ALL +SELECT * FROM tab1 +INTERSECT ALL +SELECT * FROM tab2; + +-- INTERSECT ALL same table in both branches +SELECT * FROM tab1 +INTERSECT ALL +SELECT * FROM tab1 WHERE udf(k) = 1; + +-- Empty left relation +SELECT * FROM tab1 WHERE k > udf(2) +INTERSECT ALL +SELECT * FROM tab2; + +-- Empty right relation +SELECT * FROM tab1 +INTERSECT ALL +SELECT * FROM tab2 WHERE CAST(udf(k) AS BIGINT) > CAST(udf(3) AS BIGINT); + +-- Type Coerced INTERSECT ALL +SELECT * FROM tab1 +INTERSECT ALL +SELECT CAST(udf(1) AS BIGINT), CAST(udf(2) AS BIGINT); + +-- Error as types of two side are not compatible +SELECT * FROM tab1 +INTERSECT ALL +SELECT array(1), udf(2); + +-- Mismatch on number of columns across both branches +SELECT udf(k) FROM tab1 +INTERSECT ALL +SELECT udf(k), udf(v) FROM tab2; + +-- Basic +SELECT * FROM tab2 +INTERSECT ALL +SELECT * FROM tab1 +INTERSECT ALL +SELECT * FROM tab2; + +-- Chain of different `set operations +SELECT * FROM tab1 +EXCEPT +SELECT * FROM tab2 +UNION ALL +SELECT * FROM tab1 +INTERSECT ALL +SELECT * FROM tab2 +; + +-- Chain of different `set operations +SELECT * FROM tab1 +EXCEPT +SELECT * FROM tab2 +EXCEPT +SELECT * FROM tab1 
+INTERSECT ALL +SELECT * FROM tab2 +; + +-- test use parenthesis to control order of evaluation +( + ( +( + SELECT * FROM tab1 + EXCEPT + SELECT * FROM tab2 +) +EXCEPT +SELECT * FROM tab1 + ) + INTERSECT ALL + SELECT * FROM tab2 +) +; + +-- Join under intersect all +SELECT * +FROM (SELECT udf(tab1.k), + udf(tab2.v) +FROM tab1 + JOIN tab2 + ON CAST(udf(tab1.k) AS BIGINT) = CAST(udf(tab2.k) AS BIGINT)) +INTERSECT ALL +SELECT * +FROM (SELECT udf(tab1.k), + udf(tab2.v) +FROM tab1 + JOIN tab2 + ON CAST(udf(tab1.k) AS BIGINT) = CAST(udf(tab2.k) AS BIGINT)); + +-- Join under intersect all (2) +SELECT * +FROM (SELECT udf(tab1.k), + udf(tab2.v) +FROM tab1 + JOIN tab2 + ON CAST(udf(tab1.k) AS BIGINT) = CAST(udf(tab2.k) AS BIGINT)) +INTERSECT ALL +SELECT * +FROM (SELECT udf(tab2.v) AS k, + udf(tab1.k) AS v +FROM tab1 + JOIN tab2 + ON CAST(udf(tab1.k) AS BIGINT) = CAST(udf(tab2.k) AS BIGINT)); + +-- Group by under intersect all +SELECT CAST(udf(v) AS BIGINT) FROM tab1 GROUP BY v +INTERSECT ALL +SELECT CAST(udf(k) AS BIGINT) FROM tab2 GROUP BY k; Review comment: Let's get rid of the casts. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
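The `INTERSECT ALL` cases in the quoted test file depend on multiset semantics: a row appears in the result as many times as the smaller of its counts on the two sides. A minimal Python sketch of that rule follows, using `collections.Counter` as a stand-in for the Spark operator. The helper name `intersect_all` is hypothetical, and this is not a description of how Spark implements the operator.

```python
from collections import Counter

# Rows copied from the tab1 / tab2 views in the quoted test file.
tab1 = [(1, 2), (1, 2), (1, 3), (1, 3), (2, 3), (None, None), (None, None)]
tab2 = [(1, 2), (1, 2), (2, 3), (3, 4), (None, None), (None, None)]


def intersect_all(left, right):
    """Hypothetical helper: keep each row min(count_left, count_right) times."""
    # Counter's & operator takes the per-key minimum of the two counts.
    # None equals None here, which mirrors how SQL set operations treat
    # NULLs as not distinct from each other.
    return Counter(left) & Counter(right)


result = intersect_all(tab1, tab2)
for row, count in result.items():
    print(row, count)
```

Wrapping columns in a string-returning UDF changes the values being compared, which is why the reviewer asks for explicit column lists and for dropping casts that are no longer needed.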
[GitHub] [spark] HyukjinKwon commented on a change in pull request #25119: [SPARK-28283][SQL][PYTHON][TESTS] Convert and port 'intersect-all.sql' into UDF test base
HyukjinKwon commented on a change in pull request #25119: [SPARK-28283][SQL][PYTHON][TESTS] Convert and port 'intersect-all.sql' into UDF test base URL: https://github.com/apache/spark/pull/25119#discussion_r304729754 ## File path: sql/core/src/test/resources/sql-tests/inputs/udf/udf-intersect-all.sql ## @@ -0,0 +1,164 @@ +-- This test file was converted from intersect-all.sql. +-- Note that currently registered UDF returns a string. So there are some differences, for instance +-- in string cast within UDF in Scala and Python. + +CREATE TEMPORARY VIEW tab1 AS SELECT * FROM VALUES +(1, 2), +(1, 2), +(1, 3), +(1, 3), +(2, 3), +(null, null), +(null, null) +AS tab1(k, v); +CREATE TEMPORARY VIEW tab2 AS SELECT * FROM VALUES +(1, 2), +(1, 2), +(2, 3), +(3, 4), +(null, null), +(null, null) +AS tab2(k, v); + +-- Basic INTERSECT ALL +SELECT * FROM tab1 +INTERSECT ALL +SELECT * FROM tab2; + +-- INTERSECT ALL same table in both branches +SELECT * FROM tab1 +INTERSECT ALL +SELECT * FROM tab1 WHERE udf(k) = 1; + +-- Empty left relation +SELECT * FROM tab1 WHERE k > udf(2) +INTERSECT ALL +SELECT * FROM tab2; + +-- Empty right relation +SELECT * FROM tab1 +INTERSECT ALL +SELECT * FROM tab2 WHERE CAST(udf(k) AS BIGINT) > CAST(udf(3) AS BIGINT); + +-- Type Coerced INTERSECT ALL +SELECT * FROM tab1 +INTERSECT ALL +SELECT CAST(udf(1) AS BIGINT), CAST(udf(2) AS BIGINT); + +-- Error as types of two side are not compatible +SELECT * FROM tab1 +INTERSECT ALL +SELECT array(1), udf(2); + +-- Mismatch on number of columns across both branches +SELECT udf(k) FROM tab1 +INTERSECT ALL +SELECT udf(k), udf(v) FROM tab2; + +-- Basic +SELECT * FROM tab2 +INTERSECT ALL +SELECT * FROM tab1 +INTERSECT ALL +SELECT * FROM tab2; + +-- Chain of different `set operations +SELECT * FROM tab1 +EXCEPT +SELECT * FROM tab2 +UNION ALL +SELECT * FROM tab1 +INTERSECT ALL +SELECT * FROM tab2 +; + +-- Chain of different `set operations +SELECT * FROM tab1 +EXCEPT +SELECT * FROM tab2 +EXCEPT +SELECT * FROM tab1 
+INTERSECT ALL +SELECT * FROM tab2 +; + +-- test use parenthesis to control order of evaluation +( + ( +( + SELECT * FROM tab1 + EXCEPT + SELECT * FROM tab2 +) +EXCEPT +SELECT * FROM tab1 + ) + INTERSECT ALL + SELECT * FROM tab2 +) +; + +-- Join under intersect all +SELECT * +FROM (SELECT udf(tab1.k), + udf(tab2.v) +FROM tab1 + JOIN tab2 + ON CAST(udf(tab1.k) AS BIGINT) = CAST(udf(tab2.k) AS BIGINT)) +INTERSECT ALL +SELECT * +FROM (SELECT udf(tab1.k), + udf(tab2.v) +FROM tab1 + JOIN tab2 + ON CAST(udf(tab1.k) AS BIGINT) = CAST(udf(tab2.k) AS BIGINT)); + +-- Join under intersect all (2) +SELECT * +FROM (SELECT udf(tab1.k), + udf(tab2.v) +FROM tab1 + JOIN tab2 + ON CAST(udf(tab1.k) AS BIGINT) = CAST(udf(tab2.k) AS BIGINT)) +INTERSECT ALL +SELECT * +FROM (SELECT udf(tab2.v) AS k, + udf(tab1.k) AS v +FROM tab1 + JOIN tab2 + ON CAST(udf(tab1.k) AS BIGINT) = CAST(udf(tab2.k) AS BIGINT)); Review comment: We could try `udf(udf(tab1.k) = udf(tab2.k))` or `udf(udf(tab1.k) = tab2.k)` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on a change in pull request #25119: [SPARK-28283][SQL][PYTHON][TESTS] Convert and port 'intersect-all.sql' into UDF test base
HyukjinKwon commented on a change in pull request #25119: [SPARK-28283][SQL][PYTHON][TESTS] Convert and port 'intersect-all.sql' into UDF test base URL: https://github.com/apache/spark/pull/25119#discussion_r304729678 ## File path: sql/core/src/test/resources/sql-tests/inputs/udf/udf-intersect-all.sql ## @@ -0,0 +1,164 @@ +-- This test file was converted from intersect-all.sql. +-- Note that currently registered UDF returns a string. So there are some differences, for instance +-- in string cast within UDF in Scala and Python. + +CREATE TEMPORARY VIEW tab1 AS SELECT * FROM VALUES +(1, 2), +(1, 2), +(1, 3), +(1, 3), +(2, 3), +(null, null), +(null, null) +AS tab1(k, v); +CREATE TEMPORARY VIEW tab2 AS SELECT * FROM VALUES +(1, 2), +(1, 2), +(2, 3), +(3, 4), +(null, null), +(null, null) +AS tab2(k, v); + +-- Basic INTERSECT ALL +SELECT * FROM tab1 +INTERSECT ALL +SELECT * FROM tab2; + +-- INTERSECT ALL same table in both branches +SELECT * FROM tab1 +INTERSECT ALL +SELECT * FROM tab1 WHERE udf(k) = 1; + +-- Empty left relation +SELECT * FROM tab1 WHERE k > udf(2) +INTERSECT ALL +SELECT * FROM tab2; + +-- Empty right relation +SELECT * FROM tab1 +INTERSECT ALL +SELECT * FROM tab2 WHERE CAST(udf(k) AS BIGINT) > CAST(udf(3) AS BIGINT); + +-- Type Coerced INTERSECT ALL +SELECT * FROM tab1 +INTERSECT ALL +SELECT CAST(udf(1) AS BIGINT), CAST(udf(2) AS BIGINT); + +-- Error as types of two side are not compatible +SELECT * FROM tab1 +INTERSECT ALL +SELECT array(1), udf(2); + +-- Mismatch on number of columns across both branches +SELECT udf(k) FROM tab1 +INTERSECT ALL +SELECT udf(k), udf(v) FROM tab2; + +-- Basic +SELECT * FROM tab2 +INTERSECT ALL +SELECT * FROM tab1 +INTERSECT ALL +SELECT * FROM tab2; + +-- Chain of different `set operations +SELECT * FROM tab1 +EXCEPT +SELECT * FROM tab2 +UNION ALL +SELECT * FROM tab1 +INTERSECT ALL +SELECT * FROM tab2 +; + +-- Chain of different `set operations +SELECT * FROM tab1 +EXCEPT +SELECT * FROM tab2 +EXCEPT +SELECT * FROM tab1 
+INTERSECT ALL +SELECT * FROM tab2 +; + +-- test use parenthesis to control order of evaluation +( + ( +( + SELECT * FROM tab1 + EXCEPT + SELECT * FROM tab2 +) +EXCEPT +SELECT * FROM tab1 + ) + INTERSECT ALL + SELECT * FROM tab2 +) +; + +-- Join under intersect all +SELECT * +FROM (SELECT udf(tab1.k), + udf(tab2.v) +FROM tab1 + JOIN tab2 + ON CAST(udf(tab1.k) AS BIGINT) = CAST(udf(tab2.k) AS BIGINT)) Review comment: Yea, now we don't have to add such cases anymore. Let's get rid of them. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on a change in pull request #25119: [SPARK-28283][SQL][PYTHON][TESTS] Convert and port 'intersect-all.sql' into UDF test base
HyukjinKwon commented on a change in pull request #25119: [SPARK-28283][SQL][PYTHON][TESTS] Convert and port 'intersect-all.sql' into UDF test base URL: https://github.com/apache/spark/pull/25119#discussion_r304729633 ## File path: sql/core/src/test/resources/sql-tests/inputs/udf/udf-intersect-all.sql ## @@ -0,0 +1,164 @@ +-- This test file was converted from intersect-all.sql. +-- Note that currently registered UDF returns a string. So there are some differences, for instance +-- in string cast within UDF in Scala and Python. + +CREATE TEMPORARY VIEW tab1 AS SELECT * FROM VALUES +(1, 2), +(1, 2), +(1, 3), +(1, 3), +(2, 3), +(null, null), +(null, null) +AS tab1(k, v); +CREATE TEMPORARY VIEW tab2 AS SELECT * FROM VALUES +(1, 2), +(1, 2), +(2, 3), +(3, 4), +(null, null), +(null, null) +AS tab2(k, v); + +-- Basic INTERSECT ALL +SELECT * FROM tab1 Review comment: I think the comments I left on your PRs apply here too. Let's list the columns. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on issue #25113: [SPARK-28287][SQL][PYTHON][TESTS] Convert and port 'udaf.sql' into UDF test base
HyukjinKwon commented on issue #25113: [SPARK-28287][SQL][PYTHON][TESTS] Convert and port 'udaf.sql' into UDF test base URL: https://github.com/apache/spark/pull/25113#issuecomment-512659133 Looks good to me otherwise. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on a change in pull request #25113: [SPARK-28287][SQL][PYTHON][TESTS] Convert and port 'udaf.sql' into UDF test base
HyukjinKwon commented on a change in pull request #25113: [SPARK-28287][SQL][PYTHON][TESTS] Convert and port 'udaf.sql' into UDF test base URL: https://github.com/apache/spark/pull/25113#discussion_r304729414 ## File path: sql/core/src/test/resources/sql-tests/inputs/udf/udf-udaf.sql ## @@ -0,0 +1,18 @@ +-- This test file was converted from udaf.sql. + +CREATE OR REPLACE TEMPORARY VIEW t1 AS SELECT * FROM VALUES +(1), (2), (3), (4) +as t1(int_col1); + +CREATE FUNCTION myDoubleAvg AS 'test.org.apache.spark.sql.MyDoubleAvg'; + +SELECT default.myDoubleAvg(udf(int_col1)) as my_avg from t1; Review comment: @vinodkc, let's add a different combination in general. For instance, ``` udf(default.myDoubleAvg(udf(int_col1))) ``` ``` udf(default.myDoubleAvg(int_col1)) ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on issue #25103: [SPARK-28285][SQL][PYTHON][TESTS] Convert and port 'outer-join.sql' into UDF test base
HyukjinKwon commented on issue #25103: [SPARK-28285][SQL][PYTHON][TESTS] Convert and port 'outer-join.sql' into UDF test base URL: https://github.com/apache/spark/pull/25103#issuecomment-512658948 Looks good to me in general if the tests pass This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on issue #25098: [SPARK-28280][SQL][PYTHON][TESTS] Convert and port 'group-by.sql' into UDF test base
HyukjinKwon commented on issue #25098: [SPARK-28280][SQL][PYTHON][TESTS] Convert and port 'group-by.sql' into UDF test base URL: https://github.com/apache/spark/pull/25098#issuecomment-512658787 Looks fine in general but let's focus on testing GROUP BY clause with UDFs. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on a change in pull request #25098: [SPARK-28280][SQL][PYTHON][TESTS] Convert and port 'group-by.sql' into UDF test base
HyukjinKwon commented on a change in pull request #25098: [SPARK-28280][SQL][PYTHON][TESTS] Convert and port 'group-by.sql' into UDF test base URL: https://github.com/apache/spark/pull/25098#discussion_r304729007 ## File path: sql/core/src/test/resources/sql-tests/inputs/udf/udf-group-by.sql ## @@ -0,0 +1,156 @@ +-- This test file was converted from group-by.sql. +-- Test data. +CREATE OR REPLACE TEMPORARY VIEW testData AS SELECT * FROM VALUES +(1, 1), (1, 2), (2, 1), (2, 2), (3, 1), (3, 2), (null, 1), (3, null), (null, null) +AS testData(a, b); + +-- Aggregate with empty GroupBy expressions. +SELECT udf(a), udf(COUNT(b)) FROM testData; +SELECT COUNT(udf(a)), udf(COUNT(b)) FROM testData; + +-- Aggregate with non-empty GroupBy expressions. +SELECT CAST(udf(a) as int), COUNT(udf(b)) FROM testData GROUP BY a; +SELECT udf(a), udf(COUNT(b)) FROM testData GROUP BY b; +SELECT COUNT(udf(a)), COUNT(udf(b)) FROM testData GROUP BY udf(a); + +-- Aggregate grouped by literals. +SELECT 'foo', COUNT(udf(a)) FROM testData GROUP BY 1; + +-- Aggregate grouped by literals (whole stage code generation). +SELECT 'foo' FROM testData WHERE a = 0 GROUP BY 1; + +-- Aggregate grouped by literals (hash aggregate). +SELECT 'foo', udf(APPROX_COUNT_DISTINCT(udf(a))) FROM testData WHERE a = 0 GROUP BY 1; + +-- Aggregate grouped by literals (sort aggregate). +SELECT 'foo', MAX(STRUCT(udf(a))) FROM testData WHERE a = 0 GROUP BY 1; + +-- Aggregate with complex GroupBy expressions. +SELECT CAST(udf(a + b) as INT), udf(COUNT(b)) FROM testData GROUP BY a + b; Review comment: I would focus on adding UDFs in the `GROUP BY` clause, because this test is basically meant to test `GROUP BY`. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
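To illustrate what placing the UDF inside the `GROUP BY` clause looks like, here is a small sketch that runs against SQLite from Python rather than Spark. The `udf` function registered below is a hypothetical stand-in that casts its input to a string and passes NULL through. That behavior is an assumption made for this sketch, not the UDF registered by the Spark test harness, and SQLite's type rules differ from Spark's.

```python
import sqlite3


def string_udf(value):
    """Hypothetical stand-in for the test UDF: cast to string, pass NULL through."""
    return None if value is None else str(value)


conn = sqlite3.connect(":memory:")
conn.create_function("udf", 1, string_udf)
conn.execute("CREATE TABLE testData (a INTEGER, b INTEGER)")
conn.executemany(
    "INSERT INTO testData VALUES (?, ?)",
    [(1, 1), (1, 2), (2, 1), (2, 2), (3, 1), (3, 2),
     (None, 1), (3, None), (None, None)],
)

# The UDF wraps the grouping expression itself, not only the select list.
group_rows = conn.execute(
    "SELECT udf(a + b), COUNT(b) FROM testData "
    "GROUP BY udf(a + b) ORDER BY 1"
).fetchall()
conn.close()

for row in group_rows:
    print(row)
```

Grouping on `udf(a + b)` instead of `a + b` exercises the UDF in the clause under test, which is the point of the review comment.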
[GitHub] [spark] HyukjinKwon commented on a change in pull request #25098: [SPARK-28280][SQL][PYTHON][TESTS] Convert and port 'group-by.sql' into UDF test base
HyukjinKwon commented on a change in pull request #25098: [SPARK-28280][SQL][PYTHON][TESTS] Convert and port 'group-by.sql' into UDF test base URL: https://github.com/apache/spark/pull/25098#discussion_r304728860 ## File path: sql/core/src/test/resources/sql-tests/inputs/udf/udf-group-by.sql ## @@ -0,0 +1,156 @@ +-- This test file was converted from group-by.sql. +-- Test data. +CREATE OR REPLACE TEMPORARY VIEW testData AS SELECT * FROM VALUES +(1, 1), (1, 2), (2, 1), (2, 2), (3, 1), (3, 2), (null, 1), (3, null), (null, null) +AS testData(a, b); + +-- Aggregate with empty GroupBy expressions. +SELECT udf(a), udf(COUNT(b)) FROM testData; +SELECT COUNT(udf(a)), udf(COUNT(b)) FROM testData; + +-- Aggregate with non-empty GroupBy expressions. +SELECT CAST(udf(a) as int), COUNT(udf(b)) FROM testData GROUP BY a; +SELECT udf(a), udf(COUNT(b)) FROM testData GROUP BY b; Review comment: we could test `udf(COUNT(udf(b)))` combination too. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
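The nested form suggested in this comment, `udf(COUNT(udf(b)))`, applies the UDF both to the aggregate's input and to its result. The sketch below tries that shape against SQLite from Python. As before, the registered `udf` is a hypothetical string-casting stand-in that passes NULL through, and grouping by `udf(a)` is chosen only to keep the query valid. None of this reproduces the Spark test harness.

```python
import sqlite3


def string_udf(value):
    """Hypothetical stand-in for the test UDF: cast to string, pass NULL through."""
    return None if value is None else str(value)


conn = sqlite3.connect(":memory:")
conn.create_function("udf", 1, string_udf)
conn.execute("CREATE TABLE testData (a INTEGER, b INTEGER)")
conn.executemany(
    "INSERT INTO testData VALUES (?, ?)",
    [(1, 1), (1, 2), (2, 1), (2, 2), (3, 1), (3, 2),
     (None, 1), (3, None), (None, None)],
)

# Inner udf(b) runs once per row; outer udf(...) runs once per group
# on the integer produced by COUNT.
nested_rows = conn.execute(
    "SELECT udf(a), udf(COUNT(udf(b))) FROM testData "
    "GROUP BY udf(a) ORDER BY 1"
).fetchall()
conn.close()

for row in nested_rows:
    print(row)
```

In this sketch the counts come back as strings because the outer call casts them, which mirrors the note in the quoted files that the registered UDF returns a string.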
[GitHub] [spark] HyukjinKwon commented on a change in pull request #25098: [SPARK-28280][SQL][PYTHON][TESTS] Convert and port 'group-by.sql' into UDF test base
HyukjinKwon commented on a change in pull request #25098: [SPARK-28280][SQL][PYTHON][TESTS] Convert and port 'group-by.sql' into UDF test base URL: https://github.com/apache/spark/pull/25098#discussion_r304728777 ## File path: sql/core/src/test/resources/sql-tests/inputs/udf/udf-group-by.sql ## @@ -0,0 +1,156 @@ +-- This test file was converted from group-by.sql. +-- Test data. +CREATE OR REPLACE TEMPORARY VIEW testData AS SELECT * FROM VALUES +(1, 1), (1, 2), (2, 1), (2, 2), (3, 1), (3, 2), (null, 1), (3, null), (null, null) +AS testData(a, b); + +-- Aggregate with empty GroupBy expressions. +SELECT udf(a), udf(COUNT(b)) FROM testData; +SELECT COUNT(udf(a)), udf(COUNT(b)) FROM testData; + +-- Aggregate with non-empty GroupBy expressions. +SELECT CAST(udf(a) as int), COUNT(udf(b)) FROM testData GROUP BY a; +SELECT udf(a), udf(COUNT(b)) FROM testData GROUP BY b; +SELECT COUNT(udf(a)), COUNT(udf(b)) FROM testData GROUP BY udf(a); + +-- Aggregate grouped by literals. +SELECT 'foo', COUNT(udf(a)) FROM testData GROUP BY 1; + +-- Aggregate grouped by literals (whole stage code generation). +SELECT 'foo' FROM testData WHERE a = 0 GROUP BY 1; Review comment: This one does not seem to have a UDF. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on issue #25090: [SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF test base
HyukjinKwon commented on issue #25090: [SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF test base URL: https://github.com/apache/spark/pull/25090#issuecomment-512658238 Looks fine otherwise. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on a change in pull request #25090: [SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF test base
HyukjinKwon commented on a change in pull request #25090: [SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF test base URL: https://github.com/apache/spark/pull/25090#discussion_r304728650 ## File path: sql/core/src/test/resources/sql-tests/inputs/udf/udf-except-all.sql ## @@ -0,0 +1,166 @@ +-- This test file was converted from except-all.sql. +-- Note that currently registered UDF returns a string. So there are some differences, for instance +-- in string cast within UDF in Scala and Python. + +CREATE TEMPORARY VIEW tab1 AS SELECT * FROM VALUES +(0), (1), (2), (2), (2), (2), (3), (null), (null) AS tab1(c1); +CREATE TEMPORARY VIEW tab2 AS SELECT * FROM VALUES +(1), (2), (2), (3), (5), (5), (null) AS tab2(c1); +CREATE TEMPORARY VIEW tab3 AS SELECT * FROM VALUES +(1, 2), +(1, 2), +(1, 3), +(2, 3), +(2, 2) +AS tab3(k, v); +CREATE TEMPORARY VIEW tab4 AS SELECT * FROM VALUES +(1, 2), +(2, 3), +(2, 2), +(2, 2), +(2, 20) +AS tab4(k, v); + +-- Basic EXCEPT ALL +SELECT * FROM tab1 +EXCEPT ALL +SELECT * FROM tab2; + +-- MINUS ALL (synonym for EXCEPT) +SELECT * FROM tab1 +MINUS ALL +SELECT * FROM tab2; + +-- EXCEPT ALL same table in both branches +-- Note that there will one less NULL in the result compared to the non-udf result +-- because udf converts null to a string "null". 
+SELECT * FROM tab1 +EXCEPT ALL +SELECT * FROM tab2 WHERE udf(c1) IS NOT NULL; + +-- Empty left relation +SELECT * FROM tab1 WHERE udf(c1) > 5 +EXCEPT ALL +SELECT * FROM tab2; + +-- Empty right relation +SELECT * FROM tab1 +EXCEPT ALL +SELECT * FROM tab2 WHERE c1 > udf(6); + +-- Type Coerced ExceptAll +SELECT * FROM tab1 +EXCEPT ALL +SELECT CAST(udf(1) AS BIGINT); + +-- Error as types of two side are not compatible +SELECT * FROM tab1 +EXCEPT ALL +SELECT array(1); + +-- Basic +SELECT * FROM tab3 +EXCEPT ALL +SELECT * FROM tab4; + +-- Basic +SELECT * FROM tab4 +EXCEPT ALL +SELECT * FROM tab3; + +-- EXCEPT ALL + INTERSECT +SELECT * FROM tab4 +EXCEPT ALL +SELECT * FROM tab3 +INTERSECT DISTINCT +SELECT * FROM tab4; + +-- EXCEPT ALL + EXCEPT +SELECT * FROM tab4 +EXCEPT ALL +SELECT * FROM tab3 +EXCEPT DISTINCT +SELECT * FROM tab4; + +-- Chain of set operations +SELECT * FROM tab3 +EXCEPT ALL +SELECT * FROM tab4 +UNION ALL +SELECT * FROM tab3 +EXCEPT DISTINCT +SELECT * FROM tab4; + +-- Mismatch on number of columns across both branches +SELECT k FROM tab3 +EXCEPT ALL +SELECT k, v FROM tab4; + +-- Chain of set operations +SELECT * FROM tab3 +EXCEPT ALL +SELECT * FROM tab4 +UNION +SELECT * FROM tab3 +EXCEPT DISTINCT +SELECT * FROM tab4; + +-- Using MINUS ALL +SELECT * FROM tab3 +MINUS ALL +SELECT * FROM tab4 +UNION +SELECT * FROM tab3 +MINUS DISTINCT +SELECT * FROM tab4; + +-- Chain of set operations +SELECT * FROM tab3 +EXCEPT ALL +SELECT * FROM tab4 +EXCEPT DISTINCT +SELECT * FROM tab3 +EXCEPT DISTINCT +SELECT * FROM tab4; + +-- Join under except all. Should produce empty resultset since both left and right sets +-- are same. +SELECT * +FROM (SELECT udf(tab3.k), + udf(tab4.v) +FROM tab3 + JOIN tab4 + ON udf(tab3.k) = udf(tab4.k)) Review comment: Can we use different combination here and below? 
For instance, ``` udf(tab3.k) = tab4.k) ``` ``` udf(udf(tab3.k) = udf(tab4.k)) ``` ``` SELECT * FROM (SELECT tab3.k, udf(tab4.v) FROM tab3 JOIN tab4 ON udf(tab3.k) = udf(tab4.k)) ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on issue #25124: [SPARK-28282][SQL][PYTHON][TESTS] Convert and port 'inline-table.sql' into UDF test base
SparkQA commented on issue #25124: [SPARK-28282][SQL][PYTHON][TESTS] Convert and port 'inline-table.sql' into UDF test base URL: https://github.com/apache/spark/pull/25124#issuecomment-512658030 **[Test build #107823 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107823/testReport)** for PR 25124 at commit [`89212b7`](https://github.com/apache/spark/commit/89212b73627a42ff6e0725ccc3c16bdd839d0805). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on issue #25127: [SPARK-28284][SQL][PYTHON][TESTS] Convert and port 'join-empty-relation.sql' into UDF test base
SparkQA commented on issue #25127: [SPARK-28284][SQL][PYTHON][TESTS] Convert and port 'join-empty-relation.sql' into UDF test base URL: https://github.com/apache/spark/pull/25127#issuecomment-512658007 **[Test build #107822 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107822/testReport)** for PR 25127 at commit [`394afe8`](https://github.com/apache/spark/commit/394afe85bf3cd1cf0da629714f34e1d4f29bfd4d). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on issue #25161: [SPARK-28390][SQL][PYTHON][TESTS] Convert and port 'pgSQL/select_having.sql' into UDF test base
SparkQA commented on issue #25161: [SPARK-28390][SQL][PYTHON][TESTS] Convert and port 'pgSQL/select_having.sql' into UDF test base URL: https://github.com/apache/spark/pull/25161#issuecomment-512658014 **[Test build #107821 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107821/testReport)** for PR 25161 at commit [`6f44282`](https://github.com/apache/spark/commit/6f4428250499738c496aa89cbb338fcecdcd9b9d). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on issue #25168: [SPARK-28276][SQL][PYTHON][TEST] Convert and port 'cross-join.sql' into UDF test base
SparkQA commented on issue #25168: [SPARK-28276][SQL][PYTHON][TEST] Convert and port 'cross-join.sql' into UDF test base URL: https://github.com/apache/spark/pull/25168#issuecomment-512658009 **[Test build #107820 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/107820/testReport)** for PR 25168 at commit [`ac20743`](https://github.com/apache/spark/commit/ac20743bf09d6a976f632c586da683220ff8bdf5). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on a change in pull request #25090: [SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF test base
HyukjinKwon commented on a change in pull request #25090: [SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF test base URL: https://github.com/apache/spark/pull/25090#discussion_r304728497 ## File path: sql/core/src/test/resources/sql-tests/inputs/udf/udf-except-all.sql ## @@ -0,0 +1,166 @@ +-- This test file was converted from except-all.sql. +-- Note that currently registered UDF returns a string. So there are some differences, for instance +-- in string cast within UDF in Scala and Python. + +CREATE TEMPORARY VIEW tab1 AS SELECT * FROM VALUES +(0), (1), (2), (2), (2), (2), (3), (null), (null) AS tab1(c1); +CREATE TEMPORARY VIEW tab2 AS SELECT * FROM VALUES +(1), (2), (2), (3), (5), (5), (null) AS tab2(c1); +CREATE TEMPORARY VIEW tab3 AS SELECT * FROM VALUES +(1, 2), +(1, 2), +(1, 3), +(2, 3), +(2, 2) +AS tab3(k, v); +CREATE TEMPORARY VIEW tab4 AS SELECT * FROM VALUES +(1, 2), +(2, 3), +(2, 2), +(2, 2), +(2, 20) +AS tab4(k, v); + +-- Basic EXCEPT ALL +SELECT * FROM tab1 +EXCEPT ALL +SELECT * FROM tab2; + +-- MINUS ALL (synonym for EXCEPT) +SELECT * FROM tab1 +MINUS ALL +SELECT * FROM tab2; + +-- EXCEPT ALL same table in both branches +-- Note that there will one less NULL in the result compared to the non-udf result +-- because udf converts null to a string "null". 
+SELECT * FROM tab1 +EXCEPT ALL +SELECT * FROM tab2 WHERE udf(c1) IS NOT NULL; + +-- Empty left relation +SELECT * FROM tab1 WHERE udf(c1) > 5 +EXCEPT ALL +SELECT * FROM tab2; + +-- Empty right relation +SELECT * FROM tab1 +EXCEPT ALL +SELECT * FROM tab2 WHERE c1 > udf(6); + +-- Type Coerced ExceptAll +SELECT * FROM tab1 +EXCEPT ALL +SELECT CAST(udf(1) AS BIGINT); + +-- Error as types of two side are not compatible +SELECT * FROM tab1 +EXCEPT ALL +SELECT array(1); + +-- Basic +SELECT * FROM tab3 +EXCEPT ALL +SELECT * FROM tab4; + +-- Basic +SELECT * FROM tab4 +EXCEPT ALL +SELECT * FROM tab3; + +-- EXCEPT ALL + INTERSECT +SELECT * FROM tab4 Review comment: I would add udfs in those tests. Otherwise, it would just duplicate tests in `except-all.sql`. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
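The quoted file notes that one `EXCEPT ALL` case returns one fewer NULL than its non-UDF counterpart, because the UDF turns NULL into the string "null". A rough Python model of that effect follows, using `collections.Counter` for the multiset arithmetic. The names `string_udf` and `except_all` are hypothetical, and the NULL-to-"null" behavior is taken from the note in the quoted test file rather than from the Spark source.

```python
from collections import Counter

# Values copied from the single-column tab1 / tab2 views in the quoted test file.
tab1 = [0, 1, 2, 2, 2, 2, 3, None, None]
tab2 = [1, 2, 2, 3, 5, 5, None]


def string_udf(value):
    """Hypothetical stand-in for the test UDF: returns a string, even for NULL."""
    return "null" if value is None else str(value)


def except_all(left, right):
    """Hypothetical helper: keep each row max(0, count_left - count_right) times."""
    # Counter subtraction drops keys whose count falls to zero or below.
    return Counter(left) - Counter(right)


# Plain filter: `c1 IS NOT NULL` removes the NULL row from tab2.
plain = except_all(tab1, [c for c in tab2 if c is not None])
# UDF filter: `udf(c1) IS NOT NULL` keeps the NULL row, since "null" is a string.
with_udf = except_all(tab1, [c for c in tab2 if string_udf(c) is not None])

print("NULLs without udf:", plain[None])
print("NULLs with udf:", with_udf[None])
```

Differences of this kind are what make the UDF variants worth having: a test that only repeats the original query without a UDF would, as the reviewer points out, duplicate `except-all.sql`.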
[GitHub] [spark] HyukjinKwon commented on a change in pull request #25090: [SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF test base
HyukjinKwon commented on a change in pull request #25090: [SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF test base
URL: https://github.com/apache/spark/pull/25090#discussion_r304728377

## File path: sql/core/src/test/resources/sql-tests/inputs/udf/udf-except-all.sql ##
@@ -0,0 +1,166 @@
+-- This test file was converted from except-all.sql.
+-- Note that currently registered UDF returns a string. So there are some differences, for instance
+-- in string cast within UDF in Scala and Python.
+
+CREATE TEMPORARY VIEW tab1 AS SELECT * FROM VALUES
+(0), (1), (2), (2), (2), (2), (3), (null), (null) AS tab1(c1);
+CREATE TEMPORARY VIEW tab2 AS SELECT * FROM VALUES
+(1), (2), (2), (3), (5), (5), (null) AS tab2(c1);
+CREATE TEMPORARY VIEW tab3 AS SELECT * FROM VALUES
+(1, 2),
+(1, 2),
+(1, 3),
+(2, 3),
+(2, 2)
+AS tab3(k, v);
+CREATE TEMPORARY VIEW tab4 AS SELECT * FROM VALUES
+(1, 2),
+(2, 3),
+(2, 2),
+(2, 2),
+(2, 20)
+AS tab4(k, v);
+
+-- Basic EXCEPT ALL
+SELECT * FROM tab1
+EXCEPT ALL
+SELECT * FROM tab2;
+
+-- MINUS ALL (synonym for EXCEPT)
+SELECT * FROM tab1
+MINUS ALL
+SELECT * FROM tab2;
+
+-- EXCEPT ALL same table in both branches
+-- Note that there will one less NULL in the result compared to the non-udf result
+-- because udf converts null to a string "null".
+SELECT * FROM tab1
+EXCEPT ALL
+SELECT * FROM tab2 WHERE udf(c1) IS NOT NULL;
+
+-- Empty left relation
+SELECT * FROM tab1 WHERE udf(c1) > 5
+EXCEPT ALL
+SELECT * FROM tab2;
+
+-- Empty right relation
+SELECT * FROM tab1
+EXCEPT ALL
+SELECT * FROM tab2 WHERE c1 > udf(6);

Review comment:
I would test a different combination here `udf(c1 > udf(6))`
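The note in the quoted diff about getting one less NULL can be checked with a small model. This is a sketch under stated assumptions rather than the test harness itself: `except_all` and `fake_udf` are hypothetical names, `None` stands in for SQL NULL, EXCEPT ALL is modelled as multiset difference in which NULLs match each other, and `fake_udf` reproduces only the behaviour the diff comment describes (null becomes the string "null").

```python
from collections import Counter

# Values copied from the tab1 and tab2 views in the quoted diff.
# None models SQL NULL.
tab1 = [0, 1, 2, 2, 2, 2, 3, None, None]
tab2 = [1, 2, 2, 3, 5, 5, None]


def except_all(left, right):
    """Hypothetical model of EXCEPT ALL as multiset difference."""
    remaining = Counter(left) - Counter(right)
    return list(remaining.elements())


def fake_udf(value):
    """Assumed stand-in for the test UDF: always returns a string."""
    return "null" if value is None else str(value)


# Non-udf form: WHERE c1 IS NOT NULL removes the NULL row from tab2.
plain = except_all(tab1, [c for c in tab2 if c is not None])

# UDF form: fake_udf(None) is the string "null", so the filter keeps every row.
with_udf = except_all(tab1, [c for c in tab2 if fake_udf(c) is not None])

print(plain)     # [0, 2, 2, None, None]
print(with_udf)  # [0, 2, 2, None]
```

In the UDF form the NULL row stays on the right-hand side and cancels one of the two NULLs from `tab1`, which matches the difference the diff comment describes.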
[GitHub] [spark] HyukjinKwon commented on a change in pull request #25090: [SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF test base
HyukjinKwon commented on a change in pull request #25090: [SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF test base
URL: https://github.com/apache/spark/pull/25090#discussion_r304728435

## File path: sql/core/src/test/resources/sql-tests/inputs/udf/udf-except-all.sql ##
@@ -0,0 +1,166 @@
+-- This test file was converted from except-all.sql.
+-- Note that currently registered UDF returns a string. So there are some differences, for instance
+-- in string cast within UDF in Scala and Python.
+
+CREATE TEMPORARY VIEW tab1 AS SELECT * FROM VALUES
+(0), (1), (2), (2), (2), (2), (3), (null), (null) AS tab1(c1);
+CREATE TEMPORARY VIEW tab2 AS SELECT * FROM VALUES
+(1), (2), (2), (3), (5), (5), (null) AS tab2(c1);
+CREATE TEMPORARY VIEW tab3 AS SELECT * FROM VALUES
+(1, 2),
+(1, 2),
+(1, 3),
+(2, 3),
+(2, 2)
+AS tab3(k, v);
+CREATE TEMPORARY VIEW tab4 AS SELECT * FROM VALUES
+(1, 2),
+(2, 3),
+(2, 2),
+(2, 2),
+(2, 20)
+AS tab4(k, v);
+
+-- Basic EXCEPT ALL
+SELECT * FROM tab1
+EXCEPT ALL
+SELECT * FROM tab2;
+
+-- MINUS ALL (synonym for EXCEPT)
+SELECT * FROM tab1
+MINUS ALL
+SELECT * FROM tab2;
+
+-- EXCEPT ALL same table in both branches
+-- Note that there will one less NULL in the result compared to the non-udf result
+-- because udf converts null to a string "null".
+SELECT * FROM tab1
+EXCEPT ALL
+SELECT * FROM tab2 WHERE udf(c1) IS NOT NULL;
+
+-- Empty left relation
+SELECT * FROM tab1 WHERE udf(c1) > 5
+EXCEPT ALL
+SELECT * FROM tab2;
+
+-- Empty right relation
+SELECT * FROM tab1
+EXCEPT ALL
+SELECT * FROM tab2 WHERE c1 > udf(6);
+
+-- Type Coerced ExceptAll
+SELECT * FROM tab1
+EXCEPT ALL
+SELECT CAST(udf(1) AS BIGINT);
+
+-- Error as types of two side are not compatible
+SELECT * FROM tab1
+EXCEPT ALL
+SELECT array(1);

Review comment:
`udf(array(1))`
[GitHub] [spark] AmplabJenkins removed a comment on issue #25168: [SPARK-28276][SQL][PYTHON][TEST] Convert and port 'cross-join.sql' into UDF test base
AmplabJenkins removed a comment on issue #25168: [SPARK-28276][SQL][PYTHON][TEST] Convert and port 'cross-join.sql' into UDF test base URL: https://github.com/apache/spark/pull/25168#issuecomment-512657677 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25124: [SPARK-28282][SQL][PYTHON][TESTS] Convert and port 'inline-table.sql' into UDF test base
AmplabJenkins removed a comment on issue #25124: [SPARK-28282][SQL][PYTHON][TESTS] Convert and port 'inline-table.sql' into UDF test base URL: https://github.com/apache/spark/pull/25124#issuecomment-512657693 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/12933/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25119: [SPARK-28283][SQL][PYTHON][TESTS] Convert and port 'intersect-all.sql' into UDF test base
AmplabJenkins removed a comment on issue #25119: [SPARK-28283][SQL][PYTHON][TESTS] Convert and port 'intersect-all.sql' into UDF test base URL: https://github.com/apache/spark/pull/25119#issuecomment-512657730 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/12935/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25122: [SPARK-28286][SQL][PYTHON][TESTS][WIP] Convert and port 'pivot.sql' into UDF test base
AmplabJenkins removed a comment on issue #25122: [SPARK-28286][SQL][PYTHON][TESTS][WIP] Convert and port 'pivot.sql' into UDF test base URL: https://github.com/apache/spark/pull/25122#issuecomment-512657696 Merged build finished. Test PASSed.
[GitHub] [spark] HyukjinKwon commented on a change in pull request #25090: [SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF test base
HyukjinKwon commented on a change in pull request #25090: [SPARK-28278][SQL][PYTHON][TESTS] Convert and port 'except-all.sql' into UDF test base
URL: https://github.com/apache/spark/pull/25090#discussion_r304728322

## File path: sql/core/src/test/resources/sql-tests/inputs/udf/udf-except-all.sql ##
@@ -0,0 +1,166 @@
+-- This test file was converted from except-all.sql.
+-- Note that currently registered UDF returns a string. So there are some differences, for instance
+-- in string cast within UDF in Scala and Python.
+
+CREATE TEMPORARY VIEW tab1 AS SELECT * FROM VALUES
+(0), (1), (2), (2), (2), (2), (3), (null), (null) AS tab1(c1);
+CREATE TEMPORARY VIEW tab2 AS SELECT * FROM VALUES
+(1), (2), (2), (3), (5), (5), (null) AS tab2(c1);
+CREATE TEMPORARY VIEW tab3 AS SELECT * FROM VALUES
+(1, 2),
+(1, 2),
+(1, 3),
+(2, 3),
+(2, 2)
+AS tab3(k, v);
+CREATE TEMPORARY VIEW tab4 AS SELECT * FROM VALUES
+(1, 2),
+(2, 3),
+(2, 2),
+(2, 2),
+(2, 20)
+AS tab4(k, v);
+
+-- Basic EXCEPT ALL
+SELECT * FROM tab1

Review comment:
@imback82, can we manually list up the columns, for instance, `SELECT udf(c1) FROM tab1`?