[GitHub] [spark] AmplabJenkins removed a comment on issue #24557: [SPARK-27653][SQL] Add max_by() SQL aggregate function
AmplabJenkins removed a comment on issue #24557: [SPARK-27653][SQL] Add max_by() SQL aggregate function URL: https://github.com/apache/spark/pull/24557#issuecomment-490527962 Merged build finished. Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #24557: [SPARK-27653][SQL] Add max_by() SQL aggregate function
AmplabJenkins removed a comment on issue #24557: [SPARK-27653][SQL] Add max_by() SQL aggregate function URL: https://github.com/apache/spark/pull/24557#issuecomment-490527970 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/10538/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24557: [SPARK-27653][SQL] Add max_by() SQL aggregate function
AmplabJenkins removed a comment on issue #24557: [SPARK-27653][SQL] Add max_by() SQL aggregate function URL: https://github.com/apache/spark/pull/24557#issuecomment-490599686 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24557: [SPARK-27653][SQL] Add max_by() SQL aggregate function
AmplabJenkins removed a comment on issue #24557: [SPARK-27653][SQL] Add max_by() SQL aggregate function URL: https://github.com/apache/spark/pull/24557#issuecomment-490599697 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/105258/ Test PASSed.
[GitHub] [spark] SparkQA removed a comment on issue #24557: [SPARK-27653][SQL] Add max_by() SQL aggregate function
SparkQA removed a comment on issue #24557: [SPARK-27653][SQL] Add max_by() SQL aggregate function URL: https://github.com/apache/spark/pull/24557#issuecomment-490528827 **[Test build #105258 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105258/testReport)** for PR 24557 at commit [`5c7e3c5`](https://github.com/apache/spark/commit/5c7e3c500aa46461cca1d2d802a6be4f7caa3cb4).
[GitHub] [spark] viirya commented on issue #24557: [SPARK-27653][SQL] Add max_by() SQL aggregate function
viirya commented on issue #24557: [SPARK-27653][SQL] Add max_by() SQL aggregate function URL: https://github.com/apache/spark/pull/24557#issuecomment-490746455

@JoshRosen Thanks for the review!

> * Could you also implement min_by(x, y)?

Yes. I originally planned to do it in a separate PR, but I'm fine with adding it here. Sharing code through a common abstract superclass sounds good.

> * Presto also has three-argument versions of max_by / min_by:

Agreed. We don't need the three-argument versions now; if we need them, we can add them in a follow-up.

> * Were there any bugs in older implementations of Presto version that we might have replicated here? Or Presto tests for edge-cases that we could emulate?

* For using rows / structs as the ordering value, I also think it would work. I will add a few tests.
* For null ordering values, I already have a few test cases. I checked them against Presto's results and they match. Let me double-check whether we've covered the same edge cases.
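In SQL terms, `max_by(x, y)` returns the value of `x` from the row with the largest `y`, and `min_by(x, y)` returns the `x` from the row with the smallest `y`. The Scala sketch below models those semantics over an in-memory `Seq` to make the null-ordering and shared-superclass points in the reply concrete. It is not Spark's implementation: `MaxMinBySketch` and its methods are hypothetical, SQL NULL is modeled with `Option`, and both skipping rows whose ordering value is NULL and keeping the first row on ties are assumptions of this sketch.

```scala
// Hypothetical helper illustrating max_by / min_by semantics over an
// in-memory Seq; this is not Spark's aggregate expression.
object MaxMinBySketch {
  // Shared logic (standing in for a shared abstract superclass): keep the
  // value whose ordering key "beats" the current best according to `better`.
  // Rows with a missing (NULL) ordering key are skipped, and on ties the
  // first row seen is kept. Both behaviors are assumptions of this sketch.
  private def extremumBy[V, K](rows: Seq[(V, Option[K])])(better: (K, K) => Boolean): Option[V] =
    rows.foldLeft(Option.empty[(V, K)]) {
      case (best, (_, None))               => best
      case (None, (v, Some(k)))            => Some((v, k))
      case (Some((bv, bk)), (v, Some(k)))  => if (better(k, bk)) Some((v, k)) else Some((bv, bk))
    }.map(_._1)

  // Value paired with the largest ordering key.
  def maxBy[V, K](rows: Seq[(V, Option[K])])(implicit ord: Ordering[K]): Option[V] =
    extremumBy(rows)((a, b) => ord.gt(a, b))

  // Value paired with the smallest ordering key.
  def minBy[V, K](rows: Seq[(V, Option[K])])(implicit ord: Ordering[K]): Option[V] =
    extremumBy(rows)((a, b) => ord.lt(a, b))
}
```

Because the key only needs an `Ordering`, tuple keys such as `Option((1, "a"))` also work through Scala's built-in tuple orderings, loosely analogous to the rows/structs-as-ordering-value case discussed in the reply.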
[GitHub] [spark] AmplabJenkins removed a comment on issue #24555: [SPARK-27656][GraphX][WIP] Safely register class for GraphX
AmplabJenkins removed a comment on issue #24555: [SPARK-27656][GraphX][WIP] Safely register class for GraphX URL: https://github.com/apache/spark/pull/24555#issuecomment-490745138 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/105273/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24555: [SPARK-27656][GraphX][WIP] Safely register class for GraphX
AmplabJenkins removed a comment on issue #24555: [SPARK-27656][GraphX][WIP] Safely register class for GraphX URL: https://github.com/apache/spark/pull/24555#issuecomment-490745136 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24555: [SPARK-27656][GraphX][WIP] Safely register class for GraphX
AmplabJenkins commented on issue #24555: [SPARK-27656][GraphX][WIP] Safely register class for GraphX URL: https://github.com/apache/spark/pull/24555#issuecomment-490745136 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA removed a comment on issue #24555: [SPARK-27656][GraphX][WIP] Safely register class for GraphX
SparkQA removed a comment on issue #24555: [SPARK-27656][GraphX][WIP] Safely register class for GraphX URL: https://github.com/apache/spark/pull/24555#issuecomment-490724858 **[Test build #105273 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105273/testReport)** for PR 24555 at commit [`928f36a`](https://github.com/apache/spark/commit/928f36ab39a009df558f2cefa7590d635f62e002).
[GitHub] [spark] AmplabJenkins commented on issue #24555: [SPARK-27656][GraphX][WIP] Safely register class for GraphX
AmplabJenkins commented on issue #24555: [SPARK-27656][GraphX][WIP] Safely register class for GraphX URL: https://github.com/apache/spark/pull/24555#issuecomment-490745138 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/105273/ Test PASSed.
[GitHub] [spark] SparkQA commented on issue #24553: [SPARK-27604][SQL] Enhance constant propagation
SparkQA commented on issue #24553: [SPARK-27604][SQL] Enhance constant propagation URL: https://github.com/apache/spark/pull/24553#issuecomment-490744695 **[Test build #105274 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105274/testReport)** for PR 24553 at commit [`7a29f09`](https://github.com/apache/spark/commit/7a29f09db2f2bde32408af3eb9e5acc509f68fcc).
[GitHub] [spark] SparkQA commented on issue #24555: [SPARK-27656][GraphX][WIP] Safely register class for GraphX
SparkQA commented on issue #24555: [SPARK-27656][GraphX][WIP] Safely register class for GraphX URL: https://github.com/apache/spark/pull/24555#issuecomment-490744786 **[Test build #105273 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105273/testReport)** for PR 24555 at commit [`928f36a`](https://github.com/apache/spark/commit/928f36ab39a009df558f2cefa7590d635f62e002). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24553: [SPARK-27604][SQL] Enhance constant propagation
AmplabJenkins removed a comment on issue #24553: [SPARK-27604][SQL] Enhance constant propagation URL: https://github.com/apache/spark/pull/24553#issuecomment-490744405 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/10550/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24553: [SPARK-27604][SQL] Enhance constant propagation
AmplabJenkins removed a comment on issue #24553: [SPARK-27604][SQL] Enhance constant propagation URL: https://github.com/apache/spark/pull/24553#issuecomment-490744396 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24553: [SPARK-27604][SQL] Enhance constant propagation
AmplabJenkins commented on issue #24553: [SPARK-27604][SQL] Enhance constant propagation URL: https://github.com/apache/spark/pull/24553#issuecomment-490744405 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/10550/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24553: [SPARK-27604][SQL] Enhance constant propagation
AmplabJenkins commented on issue #24553: [SPARK-27604][SQL] Enhance constant propagation URL: https://github.com/apache/spark/pull/24553#issuecomment-490744396 Merged build finished. Test PASSed.
[GitHub] [spark] HyukjinKwon commented on issue #24553: [SPARK-27604][SQL] Enhance constant propagation
HyukjinKwon commented on issue #24553: [SPARK-27604][SQL] Enhance constant propagation URL: https://github.com/apache/spark/pull/24553#issuecomment-490743428 ok to test
[GitHub] [spark] AmplabJenkins removed a comment on issue #24553: [SPARK-27604][SQL] Enhance constant propagation
AmplabJenkins removed a comment on issue #24553: [SPARK-27604][SQL] Enhance constant propagation URL: https://github.com/apache/spark/pull/24553#issuecomment-490374276 Can one of the admins verify this patch?
[GitHub] [spark] wangyum commented on issue #20430: [SPARK-23263][TEST] CTAS should update stat if autoUpdate statistics is enabled
wangyum commented on issue #20430: [SPARK-23263][TEST] CTAS should update stat if autoUpdate statistics is enabled URL: https://github.com/apache/spark/pull/20430#issuecomment-490739903 cc @dongjoon-hyun
[GitHub] [spark] felixcheung commented on a change in pull request #24548: [MINOR][SS][DOC] Added missing config `maxFileAge` in file streaming source
felixcheung commented on a change in pull request #24548: [MINOR][SS][DOC] Added missing config `maxFileAge` in file streaming source URL: https://github.com/apache/spark/pull/24548#discussion_r282334994

## File path: docs/structured-streaming-programming-guide.md ##
@@ -510,8 +510,7 @@ returned by `SparkSession.readStream()`. In [R](api/R/read.stream.html), with th
 Input Sources
 There are a few built-in sources.
- - **File source** - Reads files written in a directory as a stream of data. Supported file formats are text, csv, json, orc, parquet. See the docs of the DataStreamReader interface for a more up-to-date list, and supported options for each file format. Note that the files must be atomically placed in the given directory, which in most file systems, can be achieved by file move operations.
-
+ - **File source** - Reads files written in a directory as a stream of data. Files will be processed in the order of file modification time. If `latestFirst` is set, order will be reversed. Supported file formats are text, CSV, JSON, ORC, Parquet. See the docs of the DataStreamReader interface for a more up-to-date list, and supported options for each file format. Note that the files must be atomically placed in the given directory, which in most file systems, can be achieved by file move operations.

Review comment: Why are `text, CSV, JSON, ORC, Parquet` capitalized? I thought file format names were lowercase.
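The new wording says the file source processes files in order of modification time, reversed when `latestFirst` is set. A minimal Scala sketch of that ordering rule follows. `ListedFile` and `FileOrderSketch` are hypothetical names, and the code only illustrates the ordering stated in the diff, not how Spark's file stream source actually lists or batches files (it ignores `maxFileAge` entirely).

```scala
// Hypothetical file-listing record; not a Spark class.
final case class ListedFile(path: String, modificationTime: Long)

object FileOrderSketch {
  // Oldest-first by modification time; `latestFirst` flips it to newest-first.
  // sortBy is stable, so files with equal timestamps keep their listing order
  // before any reversal is applied.
  def processingOrder(files: Seq[ListedFile], latestFirst: Boolean): Seq[ListedFile] = {
    val oldestFirst = files.sortBy(_.modificationTime)
    if (latestFirst) oldestFirst.reverse else oldestFirst
  }
}
```

In an actual query, `latestFirst` is passed as a reader option, e.g. `spark.readStream.option("latestFirst", "true")`.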
[GitHub] [spark] AmplabJenkins removed a comment on issue #24548: [MINOR][SS][DOC] Added missing config `maxFileAge` in file streaming source
AmplabJenkins removed a comment on issue #24548: [MINOR][SS][DOC] Added missing config `maxFileAge` in file streaming source URL: https://github.com/apache/spark/pull/24548#issuecomment-490606875 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/105266/ Test PASSed.
[GitHub] [spark] SparkQA removed a comment on issue #24548: [MINOR][SS][DOC] Added missing config `maxFileAge` in file streaming source
SparkQA removed a comment on issue #24548: [MINOR][SS][DOC] Added missing config `maxFileAge` in file streaming source URL: https://github.com/apache/spark/pull/24548#issuecomment-490602853 **[Test build #105266 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105266/testReport)** for PR 24548 at commit [`c10d65f`](https://github.com/apache/spark/commit/c10d65f6f17ab1a996e58c00bddd0b902bd64442).
[GitHub] [spark] AmplabJenkins removed a comment on issue #24548: [MINOR][SS][DOC] Added missing config `maxFileAge` in file streaming source
AmplabJenkins removed a comment on issue #24548: [MINOR][SS][DOC] Added missing config `maxFileAge` in file streaming source URL: https://github.com/apache/spark/pull/24548#issuecomment-490604859 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24548: [MINOR][SS][DOC] Added missing config `maxFileAge` in file streaming source
AmplabJenkins removed a comment on issue #24548: [MINOR][SS][DOC] Added missing config `maxFileAge` in file streaming source URL: https://github.com/apache/spark/pull/24548#issuecomment-490606855 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24548: [MINOR][SS][DOC] Added missing config `maxFileAge` in file streaming source
AmplabJenkins removed a comment on issue #24548: [MINOR][SS][DOC] Added missing config `maxFileAge` in file streaming source URL: https://github.com/apache/spark/pull/24548#issuecomment-490604864 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/10545/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #20430: [SPARK-23263][TEST] CTAS should update stat if autoUpdate statistics is enabled
AmplabJenkins removed a comment on issue #20430: [SPARK-23263][TEST] CTAS should update stat if autoUpdate statistics is enabled URL: https://github.com/apache/spark/pull/20430#issuecomment-490735505 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #20430: [SPARK-23263][TEST] CTAS should update stat if autoUpdate statistics is enabled
AmplabJenkins removed a comment on issue #20430: [SPARK-23263][TEST] CTAS should update stat if autoUpdate statistics is enabled URL: https://github.com/apache/spark/pull/20430#issuecomment-490735507 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/105272/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #20430: [SPARK-23263][TEST] CTAS should update stat if autoUpdate statistics is enabled
AmplabJenkins commented on issue #20430: [SPARK-23263][TEST] CTAS should update stat if autoUpdate statistics is enabled URL: https://github.com/apache/spark/pull/20430#issuecomment-490735507 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/105272/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #20430: [SPARK-23263][TEST] CTAS should update stat if autoUpdate statistics is enabled
AmplabJenkins commented on issue #20430: [SPARK-23263][TEST] CTAS should update stat if autoUpdate statistics is enabled URL: https://github.com/apache/spark/pull/20430#issuecomment-490735505 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA removed a comment on issue #20430: [SPARK-23263][TEST] CTAS should update stat if autoUpdate statistics is enabled
SparkQA removed a comment on issue #20430: [SPARK-23263][TEST] CTAS should update stat if autoUpdate statistics is enabled URL: https://github.com/apache/spark/pull/20430#issuecomment-490718370 **[Test build #105272 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105272/testReport)** for PR 20430 at commit [`1c9caa0`](https://github.com/apache/spark/commit/1c9caa0ebec9ba6c8e0cf7ce95ef937159561c0a).
[GitHub] [spark] SparkQA commented on issue #20430: [SPARK-23263][TEST] CTAS should update stat if autoUpdate statistics is enabled
SparkQA commented on issue #20430: [SPARK-23263][TEST] CTAS should update stat if autoUpdate statistics is enabled URL: https://github.com/apache/spark/pull/20430#issuecomment-490735330 **[Test build #105272 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105272/testReport)** for PR 20430 at commit [`1c9caa0`](https://github.com/apache/spark/commit/1c9caa0ebec9ba6c8e0cf7ce95ef937159561c0a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] chummyhe89 commented on a change in pull request #24556: [SPARK-27641][CORE] Fix MetricsSystem to remove unregistered source correctly
chummyhe89 commented on a change in pull request #24556: [SPARK-27641][CORE] Fix MetricsSystem to remove unregistered source correctly URL: https://github.com/apache/spark/pull/24556#discussion_r282329325

## File path: core/src/main/scala/org/apache/spark/metrics/MetricsSystem.scala ##
@@ -166,9 +166,11 @@ private[spark] class MetricsSystem private (
   }
   def removeSource(source: Source) {
-    sources -= source
-    val regName = buildRegistryName(source)
-    registry.removeMatching((name: String, _: Metric) => name.startsWith(regName))
+    if (sources.contains(source)) {
+      sources -= source
+      val regName = buildRegistryName(source)
+      registry.removeMatching((name: String, _: Metric) => name.startsWith(regName))

Review comment: Apologies for my misunderstanding. I will correct it soon.
[GitHub] [spark] chummyhe89 commented on a change in pull request #24556: [SPARK-27641][CORE] Fix MetricsSystem to remove unregistered source correctly
chummyhe89 commented on a change in pull request #24556: [SPARK-27641][CORE] Fix MetricsSystem to remove unregistered source correctly URL: https://github.com/apache/spark/pull/24556#discussion_r282328568 ## File path: core/src/main/scala/org/apache/spark/metrics/MetricsSystem.scala ## @@ -166,9 +166,11 @@ private[spark] class MetricsSystem private ( } def removeSource(source: Source) { -sources -= source -val regName = buildRegistryName(source) -registry.removeMatching((name: String, _: Metric) => name.startsWith(regName)) +if (sources.contains(source)) { + sources -= source Review comment: I added the existence check because the original code removes the metrics regardless of whether the source was registered. I will use an index to avoid traversing the sources twice. Thanks for your advice.
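The existence check and the single-pass, index-based removal described in this thread can be sketched in Java as follows. This is a simplified model, not Spark's `MetricsSystem`: `MetricSource`, `SourceRegistry`, and the prefix-keyed `Map` standing in for the metric registry are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical metric source. equals() is not overridden, so two instances
// with the same name are still different sources (reference equality).
class MetricSource {
    final String name;
    MetricSource(String name) { this.name = name; }
}

// Hypothetical, simplified stand-in for a metrics system: registered sources
// live in a list and each metric is keyed by "<sourceName>.<metricName>".
class SourceRegistry {
    private final List<MetricSource> sources = new ArrayList<>();
    private final Map<String, Long> metrics = new HashMap<>();

    void registerSource(MetricSource source, String metricName, long value) {
        sources.add(source);
        metrics.put(source.name + "." + metricName, value);
    }

    // Removes the source and its metrics only if this exact instance was
    // registered. indexOf() scans the list once and remove(int) deletes by
    // position, so the list is not searched a second time.
    boolean removeSource(MetricSource source) {
        int idx = sources.indexOf(source);
        if (idx < 0) {
            return false; // never registered: leave all metrics untouched
        }
        sources.remove(idx);
        String prefix = source.name + ".";
        metrics.keySet().removeIf(name -> name.startsWith(prefix));
        return true;
    }

    int metricCount() { return metrics.size(); }
}
```

Without the `idx < 0` guard, calling `removeSource` with a second, never-registered `MetricSource` named `jvm` would also delete the metrics of the registered `jvm` source, because both share the `jvm.` prefix.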
[GitHub] [spark] AmplabJenkins removed a comment on issue #24559: [SPARK-27658][SQL] Add FunctionCatalog API
AmplabJenkins removed a comment on issue #24559: [SPARK-27658][SQL] Add FunctionCatalog API URL: https://github.com/apache/spark/pull/24559#issuecomment-490582850 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/10543/ Test PASSed.
[GitHub] [spark] SparkQA removed a comment on issue #24559: [SPARK-27658][SQL] Add FunctionCatalog API
SparkQA removed a comment on issue #24559: [SPARK-27658][SQL] Add FunctionCatalog API URL: https://github.com/apache/spark/pull/24559#issuecomment-490583547 **[Test build #105263 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105263/testReport)** for PR 24559 at commit [`6b9b7eb`](https://github.com/apache/spark/commit/6b9b7eb18d9477ce21cacbfa973c95094be02f3a).
[GitHub] [spark] AmplabJenkins removed a comment on issue #24559: [SPARK-27658][SQL] Add FunctionCatalog API
AmplabJenkins removed a comment on issue #24559: [SPARK-27658][SQL] Add FunctionCatalog API URL: https://github.com/apache/spark/pull/24559#issuecomment-490644854 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/105263/ Test PASSed.
[GitHub] [spark] cloud-fan commented on issue #24559: [SPARK-27658][SQL] Add FunctionCatalog API
cloud-fan commented on issue #24559: [SPARK-27658][SQL] Add FunctionCatalog API URL: https://github.com/apache/spark/pull/24559#issuecomment-490729549 I think we need a design doc for the UDF API. We need to think about ease-of-use and performance.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24559: [SPARK-27658][SQL] Add FunctionCatalog API
AmplabJenkins removed a comment on issue #24559: [SPARK-27658][SQL] Add FunctionCatalog API URL: https://github.com/apache/spark/pull/24559#issuecomment-490582840 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24559: [SPARK-27658][SQL] Add FunctionCatalog API
AmplabJenkins removed a comment on issue #24559: [SPARK-27658][SQL] Add FunctionCatalog API URL: https://github.com/apache/spark/pull/24559#issuecomment-490644847 Merged build finished. Test PASSed.
[GitHub] [spark] cloud-fan commented on a change in pull request #24559: [SPARK-27658][SQL] Add FunctionCatalog API
cloud-fan commented on a change in pull request #24559: [SPARK-27658][SQL] Add FunctionCatalog API URL: https://github.com/apache/spark/pull/24559#discussion_r282326920 ## File path: sql/catalyst/src/main/java/org/apache/spark/sql/catalog/v2/ScalarFunction.java ## @@ -0,0 +1,41 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalog.v2; + +import org.apache.spark.sql.catalyst.InternalRow; +import org.apache.spark.sql.types.DataType; + +/** + * Interface for a function that produces a result value for each input row. + * + * The JVM type of result values produced by this function must be the type used by Spark's + * InternalRow API for the {@link DataType SQL data type} returned by {@link #resultType()}. + * + * @param <R> the JVM type of result values + */ +public interface ScalarFunction<R> extends BoundFunction { + + /** + * Applies the function to an input row to produce a value. + * + * @param input an input row + * @return a result value + */ + R produceResult(InternalRow input); Review comment: A UDF doesn't take an entire row as its input, only some columns, e.g. `SELECT substring(strCol, 3)`.
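To illustrate the review point, here is a hypothetical Java sketch, not the API proposed in the PR, in which the caller extracts only the argument columns from a row before invoking the function, so `substring(strCol, 3)` receives a string and a position rather than the whole row. `ArgsScalarFunction`, `Substring`, `FunctionInvoker`, and the `Object[]` row representation are illustrative assumptions.

```java
// Hypothetical function interface that receives only its argument values,
// already extracted from the input row, instead of the entire row.
interface ArgsScalarFunction<R> {
    R produceResult(Object[] args);
}

// Hypothetical substring(str, pos) with SQL-style 1-based positions.
// Only positive positions are handled in this sketch.
class Substring implements ArgsScalarFunction<String> {
    @Override
    public String produceResult(Object[] args) {
        String str = (String) args[0];
        int pos = (Integer) args[1];
        int start = Math.min(pos - 1, str.length());
        return str.substring(start);
    }
}

final class FunctionInvoker {
    private FunctionInvoker() {}

    // Takes the column at columnOrdinal out of the row, appends the literal
    // arguments, and calls the function with just those values.
    static <R> R invoke(ArgsScalarFunction<R> fn, Object[] row,
                        int columnOrdinal, Object... literals) {
        Object[] args = new Object[1 + literals.length];
        args[0] = row[columnOrdinal];
        System.arraycopy(literals, 0, args, 1, literals.length);
        return fn.produceResult(args);
    }
}
```

For example, `FunctionInvoker.invoke(new Substring(), new Object[] {42L, "hello", true}, 1, 3)` passes only `"hello"` and `3` to the function and returns `"llo"`.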
[GitHub] [spark] AmplabJenkins removed a comment on issue #24560: [SPARK-27661][SQL] Add SupportsNamespaces API
AmplabJenkins removed a comment on issue #24560: [SPARK-27661][SQL] Add SupportsNamespaces API URL: https://github.com/apache/spark/pull/24560#issuecomment-490585583 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/10544/ Test PASSed.
[GitHub] [spark] SparkQA removed a comment on issue #24560: [SPARK-27661][SQL] Add SupportsNamespaces API
SparkQA removed a comment on issue #24560: [SPARK-27661][SQL] Add SupportsNamespaces API URL: https://github.com/apache/spark/pull/24560#issuecomment-490586311 **[Test build #105264 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105264/testReport)** for PR 24560 at commit [`3aa9ebd`](https://github.com/apache/spark/commit/3aa9ebd92569afd918189959ffdf602438278e39).
[GitHub] [spark] AmplabJenkins removed a comment on issue #24560: [SPARK-27661][SQL] Add SupportsNamespaces API
AmplabJenkins removed a comment on issue #24560: [SPARK-27661][SQL] Add SupportsNamespaces API URL: https://github.com/apache/spark/pull/24560#issuecomment-490650881 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24560: [SPARK-27661][SQL] Add SupportsNamespaces API
AmplabJenkins removed a comment on issue #24560: [SPARK-27661][SQL] Add SupportsNamespaces API URL: https://github.com/apache/spark/pull/24560#issuecomment-490650890 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/105264/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24560: [SPARK-27661][SQL] Add SupportsNamespaces API
AmplabJenkins removed a comment on issue #24560: [SPARK-27661][SQL] Add SupportsNamespaces API URL: https://github.com/apache/spark/pull/24560#issuecomment-490585571 Merged build finished. Test PASSed.
[GitHub] [spark] cloud-fan commented on issue #24560: [SPARK-27661][SQL] Add SupportsNamespaces API
cloud-fan commented on issue #24560: [SPARK-27661][SQL] Add SupportsNamespaces API URL: https://github.com/apache/spark/pull/24560#issuecomment-490728914 I feel that separating creating and listing namespaces may make the API too complicated. I think it's OK if some implementations throw an exception in `createNamespace`.
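A minimal Java sketch of the single-interface design suggested here: listing and creating namespaces live in one interface, and a catalog that cannot create namespaces throws from `createNamespace`. `NamespaceCatalog` and `ReadOnlyCatalog` are hypothetical names, not the `SupportsNamespaces` API under review, and `UnsupportedOperationException` is an assumed choice of exception.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical single interface covering both listing and creating namespaces.
interface NamespaceCatalog {
    List<String> listNamespaces();

    // Implementations that cannot create namespaces may throw instead.
    void createNamespace(String name);
}

// Hypothetical read-only catalog: listing works, creation is rejected.
class ReadOnlyCatalog implements NamespaceCatalog {
    private final List<String> namespaces;

    ReadOnlyCatalog(List<String> namespaces) {
        this.namespaces = new ArrayList<>(namespaces);
    }

    @Override
    public List<String> listNamespaces() {
        return Collections.unmodifiableList(namespaces);
    }

    @Override
    public void createNamespace(String name) {
        throw new UnsupportedOperationException(
            "createNamespace is not supported by this catalog: " + name);
    }
}
```

Callers see one interface either way; only the catalogs that cannot create namespaces pay the cost of a rejected call.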
[GitHub] [spark] AmplabJenkins commented on issue #23556: [SPARK-26626][SQL] Maximum size for repeatedly substituted aliases in SQL expressions
AmplabJenkins commented on issue #23556: [SPARK-26626][SQL] Maximum size for repeatedly substituted aliases in SQL expressions URL: https://github.com/apache/spark/pull/23556#issuecomment-490728457 Can one of the admins verify this patch?
[GitHub] [spark] cloud-fan closed pull request #24149: [SPARK-27207][SQL] : Ensure aggregate buffers are initialized again for So…
cloud-fan closed pull request #24149: [SPARK-27207][SQL] : Ensure aggregate buffers are initialized again for So… URL: https://github.com/apache/spark/pull/24149
[GitHub] [spark] AmplabJenkins removed a comment on issue #24149: [SPARK-27207][SQL] : Ensure aggregate buffers are initialized again for So…
AmplabJenkins removed a comment on issue #24149: [SPARK-27207][SQL] : Ensure aggregate buffers are initialized again for So… URL: https://github.com/apache/spark/pull/24149#issuecomment-490570083 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/105257/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24149: [SPARK-27207][SQL] : Ensure aggregate buffers are initialized again for So…
AmplabJenkins removed a comment on issue #24149: [SPARK-27207][SQL] : Ensure aggregate buffers are initialized again for So… URL: https://github.com/apache/spark/pull/24149#issuecomment-490652287 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/105265/ Test PASSed.
[GitHub] [spark] SparkQA removed a comment on issue #24149: [SPARK-27207][SQL] : Ensure aggregate buffers are initialized again for So…
SparkQA removed a comment on issue #24149: [SPARK-27207][SQL] : Ensure aggregate buffers are initialized again for So… URL: https://github.com/apache/spark/pull/24149#issuecomment-490515926 **[Test build #105257 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105257/testReport)** for PR 24149 at commit [`28ea0f9`](https://github.com/apache/spark/commit/28ea0f9e9fefe7a23ab43843286812a68e9fa7b9).
[GitHub] [spark] AmplabJenkins removed a comment on issue #24149: [SPARK-27207][SQL] : Ensure aggregate buffers are initialized again for So…
AmplabJenkins removed a comment on issue #24149: [SPARK-27207][SQL] : Ensure aggregate buffers are initialized again for So… URL: https://github.com/apache/spark/pull/24149#issuecomment-490570074 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA removed a comment on issue #24149: [SPARK-27207][SQL] : Ensure aggregate buffers are initialized again for So…
SparkQA removed a comment on issue #24149: [SPARK-27207][SQL] : Ensure aggregate buffers are initialized again for So… URL: https://github.com/apache/spark/pull/24149#issuecomment-490589051 **[Test build #105265 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105265/testReport)** for PR 24149 at commit [`28ea0f9`](https://github.com/apache/spark/commit/28ea0f9e9fefe7a23ab43843286812a68e9fa7b9).
[GitHub] [spark] AmplabJenkins removed a comment on issue #24149: [SPARK-27207][SQL] : Ensure aggregate buffers are initialized again for So…
AmplabJenkins removed a comment on issue #24149: [SPARK-27207][SQL] : Ensure aggregate buffers are initialized again for So… URL: https://github.com/apache/spark/pull/24149#issuecomment-490652282 Merged build finished. Test PASSed.
[GitHub] [spark] cloud-fan commented on issue #24149: [SPARK-27207][SQL] : Ensure aggregate buffers are initialized again for So…
cloud-fan commented on issue #24149: [SPARK-27207][SQL] : Ensure aggregate buffers are initialized again for So… URL: https://github.com/apache/spark/pull/24149#issuecomment-490726932 thanks, merging to master!
[GitHub] [spark] nvander1 commented on issue #24563: [SPARK-27359] [OPTIMIZER] [SQL] Rewrite ArraysOverlap Join
nvander1 commented on issue #24563: [SPARK-27359] [OPTIMIZER] [SQL] Rewrite ArraysOverlap Join URL: https://github.com/apache/spark/pull/24563#issuecomment-490726901 Re: benchmarks. This is only anecdotal, but I’ve used this technique at work to bring a join that ran for a day without making progress down to only a few hours. As part of the experiments I mentioned above, I’ll try to make some dummy data that is similar to that use case.
[GitHub] [spark] nvander1 commented on issue #24563: [SPARK-27359] [OPTIMIZER] [SQL] Rewrite ArraysOverlap Join
nvander1 commented on issue #24563: [SPARK-27359] [OPTIMIZER] [SQL] Rewrite ArraysOverlap Join URL: https://github.com/apache/spark/pull/24563#issuecomment-490726294 @viirya Oops, thanks for pointing out the missing title! :) I’ve only used this when the size of the arrays is several orders of magnitude less than the number of records on the largest side of the join. I don’t have any benchmarks to back this up yet (I’ll do some experiments and post the result here). An assumption is that the number of items in the largest array is several orders of magnitude less than the number of records on either side of the join. This feels similar to how the replication factor used to optimize skew joins is also small.
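One way such a rewrite can work, sketched in Java under the assumption that the join condition is "the two arrays share at least one element": explode both sides into one row per element, run an equi-join on the element, and deduplicate the matched pairs. `Rec` and `OverlapJoin` are hypothetical names, and the in-memory hash map stands in for a distributed equi-join; this is not the PR's optimizer rule.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical record: an id plus an array-valued join column.
class Rec {
    final int id;
    final List<String> tags;
    Rec(int id, List<String> tags) { this.id = id; this.tags = tags; }
}

final class OverlapJoin {
    private OverlapJoin() {}

    // Returns every (leftId, rightId) pair whose tag arrays share at least one
    // element, computed as an equi-join on exploded elements:
    // 1. explode the right side into element -> ids (the hash "build" side),
    // 2. explode each left row and probe by element,
    // 3. deduplicate, since a pair can match on several shared elements.
    static Set<List<Integer>> join(List<Rec> left, List<Rec> right) {
        Map<String, List<Integer>> index = new HashMap<>();
        for (Rec r : right) {
            for (String tag : r.tags) {
                index.computeIfAbsent(tag, k -> new ArrayList<>()).add(r.id);
            }
        }
        Set<List<Integer>> pairs = new LinkedHashSet<>();
        for (Rec l : left) {
            for (String tag : l.tags) {
                for (int rightId : index.getOrDefault(tag, Collections.emptyList())) {
                    pairs.add(Arrays.asList(l.id, rightId));
                }
            }
        }
        return pairs;
    }
}
```

The exploded input grows with array length, which is why the assumption stated above (arrays several orders of magnitude smaller than the tables) matters for this rewrite to pay off.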
[GitHub] [spark] AmplabJenkins commented on issue #24555: [SPARK-27656][GraphX][WIP] Safely register class for GraphX
AmplabJenkins commented on issue #24555: [SPARK-27656][GraphX][WIP] Safely register class for GraphX URL: https://github.com/apache/spark/pull/24555#issuecomment-490724523 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24555: [SPARK-27656][GraphX][WIP] Safely register class for GraphX
AmplabJenkins removed a comment on issue #24555: [SPARK-27656][GraphX][WIP] Safely register class for GraphX URL: https://github.com/apache/spark/pull/24555#issuecomment-490724523 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24555: [SPARK-27656][GraphX][WIP] Safely register class for GraphX
AmplabJenkins removed a comment on issue #24555: [SPARK-27656][GraphX][WIP] Safely register class for GraphX URL: https://github.com/apache/spark/pull/24555#issuecomment-490724531 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/10549/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24555: [SPARK-27656][GraphX][WIP] Safely register class for GraphX
AmplabJenkins commented on issue #24555: [SPARK-27656][GraphX][WIP] Safely register class for GraphX URL: https://github.com/apache/spark/pull/24555#issuecomment-490724531 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/10549/ Test PASSed.
[GitHub] [spark] SparkQA commented on issue #24555: [SPARK-27656][GraphX][WIP] Safely register class for GraphX
SparkQA commented on issue #24555: [SPARK-27656][GraphX][WIP] Safely register class for GraphX URL: https://github.com/apache/spark/pull/24555#issuecomment-490724858 **[Test build #105273 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105273/testReport)** for PR 24555 at commit [`928f36a`](https://github.com/apache/spark/commit/928f36ab39a009df558f2cefa7590d635f62e002).
[GitHub] [spark] cloud-fan edited a comment on issue #24546: [SPARK-27650][SQL] separate the row iterator functionality from ColumnarBatch
cloud-fan edited a comment on issue #24546: [SPARK-27650][SQL] separate the row iterator functionality from ColumnarBatch URL: https://github.com/apache/spark/pull/24546#issuecomment-490723477 > The PR description just says that this avoids referring to MutableColumnarRow in the new class This avoids referring to `MutableColumnarRow` in the old class (`ColumnarBatch`), so that `ColumnarBatch` does not refer to any internal classes and can be moved to the catalyst package. The related functionality that needs `MutableColumnarRow` is moved to the new class `ColumnarBatchRowView`, and the new class is internal. @rdblue please let me know if you need further explanation. @kiszk The responsibility of `ColumnarBatch` is just to carry the columnar data to Spark. We could make it an interface, but I'd imagine everyone would write very similar implementations. I think it's better to keep it as a class, so users can use it directly.
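The split described here, where the batch only carries columnar data and a separate internal class provides the row iterator, can be sketched in Java. `SimpleColumnarBatch`, `MutableRowView`, and `BatchRowView` are hypothetical simplifications standing in for `ColumnarBatch`, `MutableColumnarRow`, and `ColumnarBatchRowView`; they are not Spark's actual classes.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical column of int values.
class IntColumn {
    private final int[] values;
    IntColumn(int... values) { this.values = values; }
    int getInt(int rowId) { return values[rowId]; }
}

// Hypothetical batch that only carries columnar data; it has no row logic
// and does not reference the mutable row type.
class SimpleColumnarBatch {
    final IntColumn[] columns;
    final int numRows;
    SimpleColumnarBatch(int numRows, IntColumn... columns) {
        this.numRows = numRows;
        this.columns = columns;
    }
}

// Hypothetical mutable row that reads through to the batch at rowId.
class MutableRowView {
    private final SimpleColumnarBatch batch;
    int rowId;
    MutableRowView(SimpleColumnarBatch batch) { this.batch = batch; }
    int getInt(int ordinal) { return batch.columns[ordinal].getInt(rowId); }
}

// Hypothetical internal class that owns the row-iterator functionality.
class BatchRowView implements Iterable<MutableRowView> {
    private final SimpleColumnarBatch batch;
    BatchRowView(SimpleColumnarBatch batch) { this.batch = batch; }

    @Override
    public Iterator<MutableRowView> iterator() {
        MutableRowView row = new MutableRowView(batch);
        return new Iterator<MutableRowView>() {
            private int next = 0;

            @Override
            public boolean hasNext() { return next < batch.numRows; }

            @Override
            public MutableRowView next() {
                if (!hasNext()) throw new NoSuchElementException();
                row.rowId = next++;
                return row; // the same row object is reused for every row
            }
        };
    }
}
```

In this sketch the iterator reuses one row object to avoid a per-row allocation, so a caller that needs to keep values must copy them before advancing.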
[GitHub] [spark] zhengruifeng commented on a change in pull request #24555: [SPARK-27656][GraphX][WIP] Safely register class for GraphX
zhengruifeng commented on a change in pull request #24555: [SPARK-27656][GraphX][WIP] Safely register class for GraphX URL: https://github.com/apache/spark/pull/24555#discussion_r282321898 ## File path: graphx/src/test/scala/org/apache/spark/graphx/util/collection/GraphXPrimitiveKeyOpenHashMapSuite.scala ## @@ -0,0 +1,65 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.graphx.util.collection + +import scala.reflect.ClassTag + +import org.apache.spark.{SparkConf, SparkFunSuite} +import org.apache.spark.internal.config.Kryo._ +import org.apache.spark.serializer.KryoSerializer + +class GraphXPrimitiveKeyOpenHashMapSuite extends SparkFunSuite { + test("Kryo class register") { Review comment: I did not test this class in my local environment; it seems that too many anonymous classes are involved here. I will remove it.
[GitHub] [spark] zhengruifeng commented on a change in pull request #24555: [SPARK-27656][GraphX][WIP] Safely register class for GraphX
zhengruifeng commented on a change in pull request #24555: [SPARK-27656][GraphX][WIP] Safely register class for GraphX URL: https://github.com/apache/spark/pull/24555#discussion_r282322656 ## File path: core/src/main/scala/org/apache/spark/serializer/KryoSerializer.scala ## @@ -488,7 +510,13 @@ private[serializer] object KryoSerializer { classOf[StorageLevel], classOf[CompressedMapStatus], classOf[HighlyCompressedMapStatus], +classOf[BitSet], classOf[CompactBuffer[_]], +classOf[OpenHashSet[Int]], Review comment: Like https://github.com/apache/spark/blob/master/graphx/src/main/scala/org/apache/spark/graphx/GraphXUtils.scala#L45, I think they are different, since type specialization is used in https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/collection/OpenHashSet.scala#L44.
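A Java analogy for why the exact runtime class matters when registration is required, assuming (as the comment suggests) that with Scala's `@specialized`, an `OpenHashSet[Int]` instance belongs to a separate generated subclass. `ExactClassRegistry`, `GenericSet`, and `IntSpecializedSet` are hypothetical; the registry only mimics an exact-class lookup and is not Kryo's API.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical registry that, like a serializer with registration required,
// accepts only classes registered exactly (subclasses do not count).
class ExactClassRegistry {
    private final Set<Class<?>> registered = new HashSet<>();

    void register(Class<?> cls) { registered.add(cls); }

    // Looks up the object's exact runtime class, not its superclasses.
    boolean canSerialize(Object obj) {
        return registered.contains(obj.getClass());
    }
}

// Generic container plus a hand-written subclass, standing in for the
// subclass that specialization generates for Int.
class GenericSet<T> {}

class IntSpecializedSet extends GenericSet<Integer> {}
```

In the sketch, registering only `GenericSet` does not cover `IntSpecializedSet`; the subclass has to be registered as well.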
[GitHub] [spark] viirya commented on issue #24563: [SPARK-27359] [OPTIMIZER] [SQL]
viirya commented on issue #24563: [SPARK-27359] [OPTIMIZER] [SQL] URL: https://github.com/apache/spark/pull/24563#issuecomment-490723543 Btw, the PR title is currently empty. Could you write a proper title for this work?
[GitHub] [spark] cloud-fan commented on issue #24546: [SPARK-27650][SQL] separate the row iterator functionality from ColumnarBatch
cloud-fan commented on issue #24546: [SPARK-27650][SQL] separate the row iterator functionality from ColumnarBatch URL: https://github.com/apache/spark/pull/24546#issuecomment-490723477 > The PR description just says that this avoids referring to MutableColumnarRow in the new class This avoids referring to `MutableColumnarRow` in the old class (`ColumnarBatch`), so that `ColumnarBatch` does not refer to any internal classes and can be moved to the catalyst package. The related functionality that needs `MutableColumnarRow` is moved to the new class `ColumnarBatchRowView`. @rdblue please let me know if you need further explanation.
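The split described here can be sketched with simplified, hypothetical types: `Batch` holds only columns, `ReusableRow` plays the role a mutable row class such as `MutableColumnarRow` plays, and `BatchRowView` owns the row iterator, so the batch type never references the row class. None of these are Spark's actual classes or signatures.

```scala
// Hypothetical stand-ins; not Spark's ColumnarBatch API.
// The batch exposes columns and a row count only; it knows nothing about row objects.
final class Batch(val columns: IndexedSeq[IndexedSeq[Int]], val numRows: Int) {
  def numCols: Int = columns.length
}

// A mutable row that is re-pointed at successive row ids instead of being reallocated.
final class ReusableRow(batch: Batch) {
  var rowId: Int = 0
  def getInt(ordinal: Int): Int = batch.columns(ordinal)(rowId)
}

// The row-iteration functionality lives in a separate view class.
final class BatchRowView(batch: Batch) {
  def rowIterator: Iterator[ReusableRow] = new Iterator[ReusableRow] {
    private val row = new ReusableRow(batch)
    private var cursor = 0
    def hasNext: Boolean = cursor < batch.numRows
    // The same row object is returned on every call; callers must copy values before advancing.
    def next(): ReusableRow = {
      row.rowId = cursor
      cursor += 1
      row
    }
  }
}
```

Reusing one row object avoids an allocation per row, which is why such views usually document that rows are only valid until the next call to `next()`.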
[GitHub] [spark] zhengruifeng commented on a change in pull request #24555: [SPARK-27656][GraphX][WIP] Safely register class for GraphX
zhengruifeng commented on a change in pull request #24555: [SPARK-27656][GraphX][WIP] Safely register class for GraphX URL: https://github.com/apache/spark/pull/24555#discussion_r28239 ## File path: core/src/main/scala/org/apache/spark/serializer/KryoSerializer.scala ## @@ -213,6 +213,28 @@ class KryoSerializer(conf: SparkConf) // We can't load those class directly in order to avoid unnecessary jar dependencies. // We load them safely, ignore it if the class not found. Seq( + "org.apache.spark.graphx.Edge", + "org.apache.spark.graphx.Edge$mcB$sp", Review comment: Yes, they are needed if we want to register `Edge`, since type specialization is used in https://github.com/apache/spark/blob/master/graphx/src/main/scala/org/apache/spark/graphx/Edge.scala#L32. I have tested this: if we do not register `org.apache.spark.graphx.Edge$mcB$sp`, Edge[Boolean] will not be handled by Kryo.
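The hunk lists specialized subclasses by name and loads them only if they are on the classpath. Below is a sketch of that pattern in plain Scala; it assumes the Scala 2 naming scheme for a single specialized type parameter (`$mc<code>$sp`, with JVM type codes, where `B` is `Byte` and `Z` is `Boolean`), and `specializedNames` and `loadIfPresent` are hypothetical helpers, not Spark's `KryoSerializer` code.

```scala
object SafeRegistration {
  // JVM type codes used in Scala 2 specialized class names (one type parameter).
  val typeCodes: Map[String, Char] = Map(
    "Byte" -> 'B', "Boolean" -> 'Z', "Char" -> 'C', "Short" -> 'S',
    "Int" -> 'I', "Long" -> 'J', "Float" -> 'F', "Double" -> 'D')

  // Hypothetical helper: the generic class name plus one name per specialized primitive.
  def specializedNames(generic: String, primitives: Seq[String]): Seq[String] =
    generic +: primitives.map(p => generic + "$mc" + typeCodes(p) + "$sp")

  // Hypothetical helper: load each class by name, silently skipping missing ones,
  // so optional modules (e.g. GraphX) are not hard dependencies.
  def loadIfPresent(names: Seq[String]): Seq[Class[_]] =
    names.flatMap { name =>
      try Option[Class[_]](Class.forName(name))
      catch { case _: ClassNotFoundException | _: NoClassDefFoundError => None }
    }
}
```

In a Kryo setup, each class returned by `loadIfPresent` would then be passed to `kryo.register`; which specialized subclasses actually exist depends on the `@specialized` annotation of the class being registered.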
[GitHub] [spark] AmplabJenkins removed a comment on issue #20430: [SPARK-23263][TEST] CTAS should update stat if autoUpdate statistics is enabled
AmplabJenkins removed a comment on issue #20430: [SPARK-23263][TEST] CTAS should update stat if autoUpdate statistics is enabled URL: https://github.com/apache/spark/pull/20430#issuecomment-490719425 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/10548/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #20430: [SPARK-23263][TEST] CTAS should update stat if autoUpdate statistics is enabled
AmplabJenkins removed a comment on issue #20430: [SPARK-23263][TEST] CTAS should update stat if autoUpdate statistics is enabled URL: https://github.com/apache/spark/pull/20430#issuecomment-490719418 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #20430: [SPARK-23263][TEST] CTAS should update stat if autoUpdate statistics is enabled
AmplabJenkins commented on issue #20430: [SPARK-23263][TEST] CTAS should update stat if autoUpdate statistics is enabled URL: https://github.com/apache/spark/pull/20430#issuecomment-490719418 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #20430: [SPARK-23263][TEST] CTAS should update stat if autoUpdate statistics is enabled
AmplabJenkins commented on issue #20430: [SPARK-23263][TEST] CTAS should update stat if autoUpdate statistics is enabled URL: https://github.com/apache/spark/pull/20430#issuecomment-490719425 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/10548/ Test PASSed.
[GitHub] [spark] SparkQA commented on issue #20430: [SPARK-23263][SQL] CTAS should update stat if autoUpdate statistics is enabled
SparkQA commented on issue #20430: [SPARK-23263][SQL] CTAS should update stat if autoUpdate statistics is enabled URL: https://github.com/apache/spark/pull/20430#issuecomment-490718370 **[Test build #105272 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105272/testReport)** for PR 20430 at commit [`1c9caa0`](https://github.com/apache/spark/commit/1c9caa0ebec9ba6c8e0cf7ce95ef937159561c0a).
[GitHub] [spark] wangyum opened a new pull request #20430: [SPARK-23263][SQL] CTAS should update stat if autoUpdate statistics is enabled
wangyum opened a new pull request #20430: [SPARK-23263][SQL] CTAS should update stat if autoUpdate statistics is enabled URL: https://github.com/apache/spark/pull/20430 …update table size is enabled

## What changes were proposed in this pull request?

How to reproduce:

```sql
bin/spark-sql --conf spark.sql.statistics.size.autoUpdate.enabled=true
spark-sql> create table test_create_parquet stored as parquet as select 1;
spark-sql> desc extended test_create_parquet;
```

The table statistics will not exist. This PR fixes this issue.

## How was this patch tested?

Unit tests.
[GitHub] [spark] beliefer commented on issue #24550: [MINOR][SS] Rename `secondLatestBatchId` to `secondLatestOffsets`
beliefer commented on issue #24550: [MINOR][SS] Rename `secondLatestBatchId` to `secondLatestOffsets` URL: https://github.com/apache/spark/pull/24550#issuecomment-490713150 Thank you, @dongjoon-hyun @srowen
[GitHub] [spark] AmplabJenkins removed a comment on issue #24563: [SPARK-27359] [OPTIMIZER] [SQL]
AmplabJenkins removed a comment on issue #24563: [SPARK-27359] [OPTIMIZER] [SQL] URL: https://github.com/apache/spark/pull/24563#issuecomment-490703388 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins commented on issue #24563: [SPARK-27359] [OPTIMIZER] [SQL]
AmplabJenkins commented on issue #24563: [SPARK-27359] [OPTIMIZER] [SQL] URL: https://github.com/apache/spark/pull/24563#issuecomment-490703704 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins removed a comment on issue #24563: [SPARK-27359] [OPTIMIZER] [SQL]
AmplabJenkins removed a comment on issue #24563: [SPARK-27359] [OPTIMIZER] [SQL] URL: https://github.com/apache/spark/pull/24563#issuecomment-490703304 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins commented on issue #24563: [SPARK-27359] [OPTIMIZER] [SQL]
AmplabJenkins commented on issue #24563: [SPARK-27359] [OPTIMIZER] [SQL] URL: https://github.com/apache/spark/pull/24563#issuecomment-490703304 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins commented on issue #24563: [SPARK-27359] [OPTIMIZER] [SQL]
AmplabJenkins commented on issue #24563: [SPARK-27359] [OPTIMIZER] [SQL] URL: https://github.com/apache/spark/pull/24563#issuecomment-490703388 Can one of the admins verify this patch?
[GitHub] [spark] nvander1 opened a new pull request #24563: [SPARK-27359] [OPTIMIZER] [SQL]
nvander1 opened a new pull request #24563: [SPARK-27359] [OPTIMIZER] [SQL] URL: https://github.com/apache/spark/pull/24563

## What changes were proposed in this pull request?

An optimization for joins on an `arrays_overlap` condition. I believe this is worthwhile to integrate into Spark, given the several new array functions released in Spark 2.4. This optimization will allow users to make better use of the `arrays_overlap` function. The technique proposed in the patch can also be trivially extended to joins with a condition involving `array_contains`.

The following code produces a Cartesian product in the physical plan:

```scala
// Assumes a SparkSession named `spark`, as in spark-shell.
import spark.implicits._
import org.apache.spark.sql.functions._

val a = Seq((Seq(1, 2, 3), "one")).toDF("num", "name")
val b = Seq((Seq(1, 5), "two")).toDF("num", "name")
val j = a.join(b, arrays_overlap(b("num"), a("num")))
j.explain(true)
```

```
== Parsed Logical Plan ==
Join Inner, arrays_overlap(num#158, num#149)
:- Project [_1#146 AS num#149, _2#147 AS name#150]
:  +- LocalRelation [_1#146, _2#147]
+- Project [_1#155 AS num#158, _2#156 AS name#159]
   +- LocalRelation [_1#155, _2#156]

== Analyzed Logical Plan ==
num: array<int>, name: string, num: array<int>, name: string
Join Inner, arrays_overlap(num#158, num#149)
:- Project [_1#146 AS num#149, _2#147 AS name#150]
:  +- LocalRelation [_1#146, _2#147]
+- Project [_1#155 AS num#158, _2#156 AS name#159]
   +- LocalRelation [_1#155, _2#156]

== Optimized Logical Plan ==
Join Inner, arrays_overlap(num#158, num#149)
:- LocalRelation [num#149, name#150]
+- LocalRelation [num#158, name#159]

== Physical Plan ==
CartesianProduct arrays_overlap(num#158, num#149)
:- LocalTableScan [num#149, name#150]
+- LocalTableScan [num#158, name#159]
```

This is unacceptable for joins on large datasets. The query can be rewritten as an equivalent equijoin by:

1. exploding the arrays
2. joining on the exploded columns
3. dropping the exploded columns from the joined data
4. removing duplicates from the result of step 3

Doing so can bring a query that might otherwise never complete down to a reasonable running time.

```
== Parsed Logical Plan ==
Join Inner, arrays_overlap(num#158, num#149)
:- Project [_1#146 AS num#149, _2#147 AS name#150]
:  +- LocalRelation [_1#146, _2#147]
+- Project [_1#155 AS num#158, _2#156 AS name#159]
   +- LocalRelation [_1#155, _2#156]

== Analyzed Logical Plan ==
num: array<int>, name: string, num: array<int>, name: string
Join Inner, arrays_overlap(num#158, num#149)
:- Project [_1#146 AS num#149, _2#147 AS name#150]
:  +- LocalRelation [_1#146, _2#147]
+- Project [_1#155 AS num#158, _2#156 AS name#159]
   +- LocalRelation [_1#155, _2#156]

== Optimized Logical Plan ==
Aggregate [1], [first(num#149, false) AS num#149, first(name#150, false) AS name#150, first(num#158, false) AS num#158, first(name#159, false) AS name#159]
+- Project [num#149, name#150, num#158, name#159]
   +- Join Inner, (explode_larr#178 = explode_rarr#180)
      :- Project [num#149, name#150, explode_larr#178]
      :  +- Generate explode(num#149), false, [explode_larr#178]
      :     +- LocalRelation [num#149, name#150]
      +- Project [num#158, name#159, explode_rarr#180]
         +- Generate explode(num#158), false, [explode_rarr#180]
            +- LocalRelation [num#158, name#159]

== Physical Plan ==
SortAggregate(key=[1#185], functions=[finalmerge_first(merge first#188, valueSet#189) AS first(num#149)()#181, finalmerge_first(merge first#192, valueSet#193) AS first(name#150)()#182, finalmerge_first(merge first#196, valueSet#197) AS first(num#158)()#183, finalmerge_first(merge first#200, valueSet#201) AS first(name#159)()#184], output=[num#149, name#150, num#158, name#159])
+- Sort [1#185 ASC NULLS FIRST], false, 0
   +- Exchange hashpartitioning(1#185, 200)
      +- SortAggregate(key=[1 AS 1#185], functions=[partial_first(num#149, false) AS (first#188, valueSet#189), partial_first(name#150, false) AS (first#192, valueSet#193), partial_first(num#158, false) AS (first#196, valueSet#197), partial_first(name#159, false) AS (first#200, valueSet#201)], output=[1#185, first#188, valueSet#189, first#192, valueSet#193, first#196, valueSet#197, first#200, valueSet#201])
         +- *(3) Sort [1 AS 1#185 ASC NULLS FIRST], false, 0
            +- *(3) Project [num#149, name#150, num#158, name#159]
               +- *(3) SortMergeJoin [explode_larr#178], [explode_rarr#180], Inner
                  :- Sort [explode_larr#178 ASC NULLS FIRST], false, 0
                  :  +- Exchange hashpartitioning(explode_larr#178, 200)
                  :     +- *(1) Project [num#149, name#150, explode_larr#178]
                  :        +- *(1) Generate explode(num#149), [num#149, name#150],
```
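The four-step rewrite can be sanity-checked on plain Scala collections before involving the optimizer. The sketch assumes non-null `Int` arrays (SQL null handling of `arrays_overlap` is ignored) and uses a hypothetical `Record` type; it compares a nested-loop evaluation of the overlap predicate, which is what `CartesianProduct` amounts to, with explode, equijoin, drop, and deduplicate. It deduplicates on row positions rather than row values so that legitimately repeated input rows are not collapsed; that is a choice made for this sketch, not necessarily what the patch does.

```scala
object ArraysOverlapRewrite {
  // Hypothetical row type: an array column and a name column.
  final case class Record(num: Seq[Int], name: String)

  // Baseline: Cartesian product filtered by the overlap predicate; returns (leftIndex, rightIndex).
  def nestedLoop(left: IndexedSeq[Record], right: IndexedSeq[Record]): Set[(Int, Int)] =
    (for {
      (l, i) <- left.zipWithIndex
      (r, j) <- right.zipWithIndex
      if l.num.exists(x => r.num.contains(x))
    } yield (i, j)).toSet

  // Rewrite: explode, equijoin on the element, drop the element, deduplicate.
  def explodeJoin(left: IndexedSeq[Record], right: IndexedSeq[Record]): Set[(Int, Int)] = {
    // 1. Explode both sides into (element, rowIndex) pairs.
    val explodedLeft = for ((l, i) <- left.zipWithIndex; x <- l.num) yield (x, i)
    val explodedRight = for ((r, j) <- right.zipWithIndex; x <- r.num) yield (x, j)
    // 2. Equijoin on the element, using a hash table built from the right side.
    val rightIndex: Map[Int, IndexedSeq[Int]] =
      explodedRight.groupBy(_._1).map { case (x, pairs) => x -> pairs.map(_._2) }
    val joined = for {
      (x, i) <- explodedLeft
      j <- rightIndex.getOrElse(x, IndexedSeq.empty[Int])
    } yield (x, i, j)
    // 3. Drop the exploded element; 4. deduplicate pairs that matched on several elements.
    joined.map { case (_, i, j) => (i, j) }.toSet
  }
}
```

For instance, a left row `[1, 2, 3]` and a right row `[2, 3]` share two elements, so step 2 emits that pair twice and step 4 is what removes the duplicate.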
[GitHub] [spark] attilapiros commented on issue #24554: [SPARK-27622][Core] Avoiding the network when block manager fetches disk persisted RDD blocks from the same host
attilapiros commented on issue #24554: [SPARK-27622][Core] Avoiding the network when block manager fetches disk persisted RDD blocks from the same host URL: https://github.com/apache/spark/pull/24554#issuecomment-490699288 I can see one more possible improvement here: in `BlockManagerMasterEndpoint#getLocationsAndStatus` I can remove, from the remote locations, the block manager ID whose local directories are returned in the `BlockLocationsAndStatus`. If direct local disk access fails for that block manager, there is no reason to try it via the network too. I will make this change in a follow-up commit.
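The proposed follow-up amounts to a filter: once one block manager's local directories are handed back for direct disk reads, that same manager is dropped from the network candidates. The case classes below are simplified, hypothetical stand-ins for `BlockManagerId` and `BlockLocationsAndStatus`, not Spark's definitions, and choosing the manager by matching the requester's host is an assumption made for illustration.

```scala
object LocalDirsFirst {
  // Simplified, hypothetical stand-ins; not Spark's BlockManagerId / BlockLocationsAndStatus.
  final case class ManagerId(executorId: String, host: String)
  final case class LocationsAndStatus(
      remoteLocations: Seq[ManagerId],
      localDirs: Option[(ManagerId, Seq[String])])

  // Offers direct disk access to a same-host manager and removes it from the network candidates.
  def locate(
      locations: Seq[ManagerId],
      requesterHost: String,
      dirsOf: ManagerId => Seq[String]): LocationsAndStatus =
    locations.find(_.host == requesterHost) match {
      case Some(sameHost) =>
        // If reading its files directly fails, fetching the same files over the
        // network from that manager is not worth trying, so it is filtered out.
        LocationsAndStatus(locations.filterNot(_ == sameHost), Some(sameHost -> dirsOf(sameHost)))
      case None =>
        LocationsAndStatus(locations, None)
    }
}
```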
[GitHub] [spark] SparkQA commented on issue #24562: [SPARK-27662][SQL] Fix SQL tab shows two jobs for one SQL command
SparkQA commented on issue #24562: [SPARK-27662][SQL] Fix SQL tab shows two jobs for one SQL command URL: https://github.com/apache/spark/pull/24562#issuecomment-490696174 **[Test build #105271 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105271/testReport)** for PR 24562 at commit [`9d73fb1`](https://github.com/apache/spark/commit/9d73fb15029ca94ca1d4dff31c6d65d1e837c592). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24562: [SPARK-27662][SQL] Fix SQL tab shows two jobs for one SQL command
AmplabJenkins removed a comment on issue #24562: [SPARK-27662][SQL] Fix SQL tab shows two jobs for one SQL command URL: https://github.com/apache/spark/pull/24562#issuecomment-490696226 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24562: [SPARK-27662][SQL] Fix SQL tab shows two jobs for one SQL command
AmplabJenkins commented on issue #24562: [SPARK-27662][SQL] Fix SQL tab shows two jobs for one SQL command URL: https://github.com/apache/spark/pull/24562#issuecomment-490696230 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/105271/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24562: [SPARK-27662][SQL] Fix SQL tab shows two jobs for one SQL command
AmplabJenkins removed a comment on issue #24562: [SPARK-27662][SQL] Fix SQL tab shows two jobs for one SQL command URL: https://github.com/apache/spark/pull/24562#issuecomment-490696230 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/105271/ Test PASSed.
[GitHub] [spark] SparkQA removed a comment on issue #24562: [SPARK-27662][SQL] Fix SQL tab shows two jobs for one SQL command
SparkQA removed a comment on issue #24562: [SPARK-27662][SQL] Fix SQL tab shows two jobs for one SQL command URL: https://github.com/apache/spark/pull/24562#issuecomment-490692583 **[Test build #105271 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105271/testReport)** for PR 24562 at commit [`9d73fb1`](https://github.com/apache/spark/commit/9d73fb15029ca94ca1d4dff31c6d65d1e837c592).
[GitHub] [spark] AmplabJenkins commented on issue #24562: [SPARK-27662][SQL] Fix SQL tab shows two jobs for one SQL command
AmplabJenkins commented on issue #24562: [SPARK-27662][SQL] Fix SQL tab shows two jobs for one SQL command URL: https://github.com/apache/spark/pull/24562#issuecomment-490696226 Merged build finished. Test PASSed.
[GitHub] [spark] wangyum commented on issue #24486: [SPARK-27592][SQL] Set the bucketed data source table SerDe correctly
wangyum commented on issue #24486: [SPARK-27592][SQL] Set the bucketed data source table SerDe correctly URL: https://github.com/apache/spark/pull/24486#issuecomment-490695924 > Have you manually tested it? To read the Spark bucketed table in Hive side as non-bucketed table? Yes. I have tested it. Note that we should set `spark.sql.parquet.writeLegacyFormat` to true.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24562: [SPARK-27662][SQL] Fix SQL tab shows two jobs for one SQL command
AmplabJenkins removed a comment on issue #24562: [SPARK-27662][SQL] Fix SQL tab shows two jobs for one SQL command URL: https://github.com/apache/spark/pull/24562#issuecomment-490693637 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/10547/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24562: [SPARK-27662][SQL] Fix SQL tab shows two jobs for one SQL command
AmplabJenkins commented on issue #24562: [SPARK-27662][SQL] Fix SQL tab shows two jobs for one SQL command URL: https://github.com/apache/spark/pull/24562#issuecomment-490693637 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/10547/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24562: [SPARK-27662][SQL] Fix SQL tab shows two jobs for one SQL command
AmplabJenkins commented on issue #24562: [SPARK-27662][SQL] Fix SQL tab shows two jobs for one SQL command URL: https://github.com/apache/spark/pull/24562#issuecomment-490693633 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24562: [SPARK-27662][SQL] Fix SQL tab shows two jobs for one SQL command
AmplabJenkins removed a comment on issue #24562: [SPARK-27662][SQL] Fix SQL tab shows two jobs for one SQL command URL: https://github.com/apache/spark/pull/24562#issuecomment-490693633 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA commented on issue #24562: [SPARK-27662][SQL] Fix SQL tab shows two jobs for one SQL command
SparkQA commented on issue #24562: [SPARK-27662][SQL] Fix SQL tab shows two jobs for one SQL command URL: https://github.com/apache/spark/pull/24562#issuecomment-490692583 **[Test build #105271 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105271/testReport)** for PR 24562 at commit [`9d73fb1`](https://github.com/apache/spark/commit/9d73fb15029ca94ca1d4dff31c6d65d1e837c592).
[GitHub] [spark] wangyum opened a new pull request #24562: [SPARK-27662][SQL] Fix SQL tab shows two jobs for one SQL command
wangyum opened a new pull request #24562: [SPARK-27662][SQL] Fix SQL tab shows two jobs for one SQL command URL: https://github.com/apache/spark/pull/24562

## What changes were proposed in this pull request?

![image](https://user-images.githubusercontent.com/5399861/57415308-41d43880-722e-11e9-85fc-91f8774dd7ef.png)

The SQL tab shows two jobs for one SQL command: the first is the actual job and the other is `LocalTableScan`:

![image](https://user-images.githubusercontent.com/5399861/57415322-51ec1800-722e-11e9-8553-4a4d7c34d90d.png)
![image](https://user-images.githubusercontent.com/5399861/57415338-5dd7da00-722e-11e9-9f87-ce1b89ecbf92.png)

This PR fixes this issue.

## How was this patch tested?

Manual tests:

```shell
build/sbt clean package -Phive -Phive-thriftserver
export SPARK_PREPEND_CLASSES=true
bin/spark-sql
# create table t(id int);
```

After this PR:

![image](https://user-images.githubusercontent.com/5399861/57415163-b8bd0180-722d-11e9-8365-f48cae885bde.png)
[GitHub] [spark] HyukjinKwon closed pull request #24518: [SPARK-27627][SQL] Make option "pathGlobFilter" as a general option for all file sources
HyukjinKwon closed pull request #24518: [SPARK-27627][SQL] Make option "pathGlobFilter" as a general option for all file sources URL: https://github.com/apache/spark/pull/24518
[GitHub] [spark] HyukjinKwon commented on issue #24518: [SPARK-27627][SQL] Make option "pathGlobFilter" as a general option for all file sources
HyukjinKwon commented on issue #24518: [SPARK-27627][SQL] Make option "pathGlobFilter" as a general option for all file sources URL: https://github.com/apache/spark/pull/24518#issuecomment-490690215 Merged to master.
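With this change, `pathGlobFilter` is meant to be a read option shared by the file sources, used along the lines of `spark.read.format("json").option("pathGlobFilter", "*.json").load(path)`. The filtering idea, keeping only files whose names match a glob, is sketched below on plain strings with `java.nio.file.PathMatcher` as a stand-in; it is not Spark's implementation, `GlobFilterSketch` is a hypothetical name, and glob edge cases in Spark may differ from `java.nio`'s.

```scala
import java.nio.file.{FileSystems, Paths}

object GlobFilterSketch {
  // Keep only paths whose file-name component matches the glob (java.nio glob syntax).
  def filterByGlob(paths: Seq[String], glob: String): Seq[String] = {
    val matcher = FileSystems.getDefault.getPathMatcher("glob:" + glob)
    paths.filter(p => matcher.matches(Paths.get(p).getFileName))
  }
}
```

Matching only the file-name component means directory names in the path do not have to match the pattern.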
[GitHub] [spark] AmplabJenkins removed a comment on issue #24561: [SPARK-26130] : Change Event Timeline Display Functionality on the Stages Page to use either REST API or data from other tables
AmplabJenkins removed a comment on issue #24561: [SPARK-26130] : Change Event Timeline Display Functionality on the Stages Page to use either REST API or data from other tables URL: https://github.com/apache/spark/pull/24561#issuecomment-490671885 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/105270/ Test PASSed.