[GitHub] [spark] SparkQA commented on pull request #29395: [3.0][SPARK-32518][CORE] CoarseGrainedSchedulerBackend.maxNumConcurrentTasks should consider all kinds of resources
SparkQA commented on pull request #29395: URL: https://github.com/apache/spark/pull/29395#issuecomment-673900225 **[Test build #127436 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127436/testReport)** for PR 29395 at commit [`9c18479`](https://github.com/apache/spark/commit/9c18479e0b71c5b6ec1a2a0f268c598cf03fa879). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29395: [3.0][SPARK-32518][CORE] CoarseGrainedSchedulerBackend.maxNumConcurrentTasks should consider all kinds of resources
AmplabJenkins removed a comment on pull request #29395: URL: https://github.com/apache/spark/pull/29395#issuecomment-673898076
[GitHub] [spark] AmplabJenkins commented on pull request #29395: [3.0][SPARK-32518][CORE] CoarseGrainedSchedulerBackend.maxNumConcurrentTasks should consider all kinds of resources
AmplabJenkins commented on pull request #29395: URL: https://github.com/apache/spark/pull/29395#issuecomment-673898076
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29270: [SPARK-32466][TEST][SQL] Add PlanStabilitySuite to detect SparkPlan regression
AmplabJenkins removed a comment on pull request #29270: URL: https://github.com/apache/spark/pull/29270#issuecomment-673896427 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/127429/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29270: [SPARK-32466][TEST][SQL] Add PlanStabilitySuite to detect SparkPlan regression
AmplabJenkins removed a comment on pull request #29270: URL: https://github.com/apache/spark/pull/29270#issuecomment-673896423 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on pull request #29270: [SPARK-32466][TEST][SQL] Add PlanStabilitySuite to detect SparkPlan regression
AmplabJenkins commented on pull request #29270: URL: https://github.com/apache/spark/pull/29270#issuecomment-673896423
[GitHub] [spark] SparkQA removed a comment on pull request #29270: [SPARK-32466][TEST][SQL] Add PlanStabilitySuite to detect SparkPlan regression
SparkQA removed a comment on pull request #29270: URL: https://github.com/apache/spark/pull/29270#issuecomment-673840529 **[Test build #127429 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127429/testReport)** for PR 29270 at commit [`891346e`](https://github.com/apache/spark/commit/891346e6b541cc181f1aa5213d0540330bdf99ec).
[GitHub] [spark] SparkQA commented on pull request #29270: [SPARK-32466][TEST][SQL] Add PlanStabilitySuite to detect SparkPlan regression
SparkQA commented on pull request #29270: URL: https://github.com/apache/spark/pull/29270#issuecomment-673896023 **[Test build #127429 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127429/testReport)** for PR 29270 at commit [`891346e`](https://github.com/apache/spark/commit/891346e6b541cc181f1aa5213d0540330bdf99ec).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] Ngone51 commented on a change in pull request #29395: [3.0][SPARK-32518][CORE] CoarseGrainedSchedulerBackend.maxNumConcurrentTasks should consider all kinds of resources
Ngone51 commented on a change in pull request #29395: URL: https://github.com/apache/spark/pull/29395#discussion_r470420645

## File path: core/src/main/scala/org/apache/spark/internal/config/Tests.scala

@@ -61,4 +61,19 @@ private[spark] object Tests {
     .version("3.0.0")
     .intConf
     .createWithDefault(2)
+
+  val RESOURCES_WARNING_TESTING = ConfigBuilder("spark.resources.warnings.testing")
+    .version("3.1.0")
+    .booleanConf
+    .createWithDefault(false)
+
+  // This configuration is used for unit tests to allow skipping the task cpus to cores validation
+  // to allow emulating standalone mode behavior while running in local mode. Standalone mode
+  // by default doesn't specify a number of executor cores, it just uses all the ones available
+  // on the host.
+  val SKIP_VALIDATE_CORES_TESTING =
+    ConfigBuilder("spark.testing.skipValidateCores")
+      .version("3.1.0")
+      .booleanConf
+      .createWithDefault(false)

Review comment: Thank you @dongjoon-hyun for letting me know. I was wondering about it previously.
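For context, test-only flags defined this way are toggled like any other Spark conf. A hypothetical spark-defaults fragment (values illustrative only, matching the keys in the diff above) would be:

```
spark.resources.warnings.testing    true
spark.testing.skipValidateCores     true
```

Both entries default to false, so test suites opt in explicitly.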
[GitHub] [spark] LuciferYang commented on pull request #29370: [SPARK-32526][SQL]Fix some test cases of `sql/catalyst` module in scala 2.13
LuciferYang commented on pull request #29370: URL: https://github.com/apache/spark/pull/29370#issuecomment-673893066 @srowen @HyukjinKwon I will try to open a new PR to resolve the remaining problems.
[GitHub] [spark] gengliangwang commented on pull request #29333: [SPARK-32357][INFRA] Publish failed and succeeded test reports in GitHub Actions
gengliangwang commented on pull request #29333: URL: https://github.com/apache/spark/pull/29333#issuecomment-673892677 A late LGTM. Thanks for the great work!
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28685: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function
AmplabJenkins removed a comment on pull request #28685: URL: https://github.com/apache/spark/pull/28685#issuecomment-673882009
[GitHub] [spark] AmplabJenkins commented on pull request #28685: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function
AmplabJenkins commented on pull request #28685: URL: https://github.com/apache/spark/pull/28685#issuecomment-673882009
[GitHub] [spark] SparkQA commented on pull request #28685: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function
SparkQA commented on pull request #28685: URL: https://github.com/apache/spark/pull/28685#issuecomment-673881668 **[Test build #127435 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127435/testReport)** for PR 28685 at commit [`08405e8`](https://github.com/apache/spark/commit/08405e8703cb8119530161b7a86a3d564c1224ce).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29082: [SPARK-32288][UI] Add exception summary for failed tasks in stage page
AmplabJenkins removed a comment on pull request #29082: URL: https://github.com/apache/spark/pull/29082#issuecomment-673880280
[GitHub] [spark] AmplabJenkins commented on pull request #29082: [SPARK-32288][UI] Add exception summary for failed tasks in stage page
AmplabJenkins commented on pull request #29082: URL: https://github.com/apache/spark/pull/29082#issuecomment-673880280
[GitHub] [spark] SparkQA commented on pull request #29082: [SPARK-32288][UI] Add exception summary for failed tasks in stage page
SparkQA commented on pull request #29082: URL: https://github.com/apache/spark/pull/29082#issuecomment-673880029 **[Test build #127434 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127434/testReport)** for PR 29082 at commit [`b231182`](https://github.com/apache/spark/commit/b23118243bf89f4afebc13640743cc92ff3bb15f).
[GitHub] [spark] itskals edited a comment on pull request #29413: [SPARK-32597][CORE] Tune Event Drop in Async Event Queue
itskals edited a comment on pull request #29413: URL: https://github.com/apache/spark/pull/29413#issuecomment-673878203 > Much harder? IIUC, if users have some experienced stats of the queues of the applications, I guess they could set the individual queues more accurately and we don't need such "pool" at all. @Ngone51 if configuring the event sizes were that easy, then I am fine. I am of the opinion that it is a bit hard to arrive at the right number; it might need trial and error. I guessed it would have been easier to configure one number than three or four. Also, some dynamism like this PR will help. Anyway, thanks.
[GitHub] [spark] itskals commented on pull request #29413: [SPARK-32597][CORE] Tune Event Drop in Async Event Queue
itskals commented on pull request #29413: URL: https://github.com/apache/spark/pull/29413#issuecomment-673878203 > Much harder? IIUC, if users have some experienced stats of the queues of the applications, I guess they could set the individual queues more accurately and we don't need such "pool" at all. @Ngone51 if configuring the event sizes were that easy, then I am fine. I guessed it would have been easier to configure one number than three or four. Thanks.
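The trade-off being debated here is one shared capacity number versus a separate number per listener queue. Spark exposes both knobs in its documented configuration; a spark-defaults sketch (capacity values are illustrative, not recommendations) would be:

```
# One number shared by all listener event queues
spark.scheduler.listenerbus.eventqueue.capacity                      20000
# ...or a separate capacity per named queue, overriding the shared one
spark.scheduler.listenerbus.eventqueue.appStatus.capacity            30000
spark.scheduler.listenerbus.eventqueue.executorManagement.capacity   10000
```

Events are dropped when a queue fills, so sizing these against observed queue stats is exactly the trial-and-error the comment describes.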
[GitHub] [spark] HyukjinKwon commented on pull request #29370: [SPARK-32526][SQL]Fix some test cases of `sql/catalyst` module in scala 2.13
HyukjinKwon commented on pull request #29370: URL: https://github.com/apache/spark/pull/29370#issuecomment-673875162 Nice to get this fixed!
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29423: [SPARK-20680][SQL][FOLLOW-UP] Add HiveVoidType in HiveClientImpl
AmplabJenkins removed a comment on pull request #29423: URL: https://github.com/apache/spark/pull/29423#issuecomment-673874120
[GitHub] [spark] AmplabJenkins commented on pull request #29423: [SPARK-20680][SQL][FOLLOW-UP] Add HiveVoidType in HiveClientImpl
AmplabJenkins commented on pull request #29423: URL: https://github.com/apache/spark/pull/29423#issuecomment-673874120
[GitHub] [spark] SparkQA removed a comment on pull request #29423: [SPARK-20680][SQL][FOLLOW-UP] Add HiveVoidType in HiveClientImpl
SparkQA removed a comment on pull request #29423: URL: https://github.com/apache/spark/pull/29423#issuecomment-673767434 **[Test build #127427 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127427/testReport)** for PR 29423 at commit [`57d8fd8`](https://github.com/apache/spark/commit/57d8fd86c93caf34d1586175f96df173a6239946).
[GitHub] [spark] SparkQA commented on pull request #29423: [SPARK-20680][SQL][FOLLOW-UP] Add HiveVoidType in HiveClientImpl
SparkQA commented on pull request #29423: URL: https://github.com/apache/spark/pull/29423#issuecomment-673873670 **[Test build #127427 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127427/testReport)** for PR 29423 at commit [`57d8fd8`](https://github.com/apache/spark/commit/57d8fd86c93caf34d1586175f96df173a6239946).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] viirya commented on a change in pull request #29427: [SPARK-25557][SQL][TEST][Followup] Add case-sensitivity test for ORC predicate pushdown
viirya commented on a change in pull request #29427: URL: https://github.com/apache/spark/pull/29427#discussion_r470399287 ## File path: sql/core/v1.2/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcFilterSuite.scala ## @@ -513,5 +513,98 @@ class OrcFilterSuite extends OrcTest with SharedSparkSession { ).get.toString } } + + test("SPARK-25557: case sensitivity in predicate pushdown") { +withTempPath { dir => + val count = 10 + val tableName = "spark_25557" + val tableDir1 = dir.getAbsoluteFile + "/table1" + + // Physical ORC files have both `A` and `a` fields. + withSQLConf(SQLConf.CASE_SENSITIVE.key -> "true") { +spark.range(count).repartition(count).selectExpr("id - 1 as A", "id as a") + .write.mode("overwrite").orc(tableDir1) + } + + // Metastore table has both `A` and `a` fields too. + withTable(tableName) { +withSQLConf(SQLConf.CASE_SENSITIVE.key -> "true") { + sql( +s""" + |CREATE TABLE $tableName (A LONG, a LONG) USING ORC LOCATION '$tableDir1' + """.stripMargin) + + checkAnswer(sql(s"select a, A from $tableName"), (0 until count).map(c => Row(c, c - 1))) + + val actual1 = stripSparkFilter(sql(s"select A from $tableName where A < 0")) + assert(actual1.count() == 1) + + val actual2 = stripSparkFilter(sql(s"select A from $tableName where a < 0")) + assert(actual2.count() == 0) +} + +// Exception thrown for ambiguous case. +withSQLConf(SQLConf.CASE_SENSITIVE.key -> "false") { + val e = intercept[AnalysisException] { +sql(s"select a from $tableName where a < 0").collect() + } + assert(e.getMessage.contains( +"Reference 'a' is ambiguous")) +} + } + + // Metastore table has only `A` field. 
+ withTable(tableName) { +withSQLConf(SQLConf.CASE_SENSITIVE.key -> "false") { + sql( +s""" + |CREATE TABLE $tableName (A LONG) USING ORC LOCATION '$tableDir1' + """.stripMargin) + + val e = intercept[SparkException] { +sql(s"select A from $tableName where A < 0").collect() + } + assert(e.getCause.isInstanceOf[RuntimeException] && e.getCause.getMessage.contains( +"""Found duplicate field(s) "A": [A, a] in case-insensitive mode""")) +} + } + + // Physical ORC files have only `A` field. + val tableDir2 = dir.getAbsoluteFile + "/table2" + withSQLConf(SQLConf.CASE_SENSITIVE.key -> "true") { +spark.range(count).repartition(count).selectExpr("id - 1 as A") + .write.mode("overwrite").orc(tableDir2) + } + + withTable(tableName) { +withSQLConf(SQLConf.CASE_SENSITIVE.key -> "false") { + sql( +s""" + |CREATE TABLE $tableName (a LONG) USING ORC LOCATION '$tableDir2' + """.stripMargin) + + checkAnswer(sql(s"select a from $tableName"), (0 until count).map(c => Row(c - 1))) + + val actual = stripSparkFilter(sql(s"select a from $tableName where a < 0")) + // TODO: ORC predicate pushdown should work under case-insensitive analysis. + // assert(actual.count() == 1) Review comment: I use the original SPARK-25557 as the PR title now. If we want to backport this test to branch-3.0, should I create a new JIRA ticket for this?
[GitHub] [spark] HyukjinKwon commented on a change in pull request #29333: [SPARK-32357][INFRA] Publish failed and succeeded test reports in GitHub Actions
HyukjinKwon commented on a change in pull request #29333: URL: https://github.com/apache/spark/pull/29333#discussion_r470397420 ## File path: .github/workflows/master.yml ## @@ -170,13 +170,19 @@ jobs: # Show installed packages in R. sudo Rscript -e 'pkg_list <- as.data.frame(installed.packages()[, c(1,3:4)]); pkg_list[is.na(pkg_list$Priority), 1:2, drop = FALSE]' # Run the tests. -- name: "Run tests: ${{ matrix.modules }}" +- name: Run tests run: | # Hive tests become flaky when running in parallel as it's too intensive. if [[ "$MODULES_TO_TEST" == "hive" ]]; then export SERIAL_SBT_TESTS=1; fi mkdir -p ~/.m2 ./dev/run-tests --parallelism 2 --modules "$MODULES_TO_TEST" --included-tags "$INCLUDED_TAGS" --excluded-tags "$EXCLUDED_TAGS" rm -rf ~/.m2/repository/org/apache/spark +- name: Upload test results to report + if: always() + uses: actions/upload-artifact@v2 Review comment: Yeah, if the tests don't fail, it should upload JUnit XML files and then report the successful test cases. e.g.) 1000 tests passed 6 skipped 0 failures. GitHub Actions has things like `failure()` but I think we should run this always (to report successful cases and also failed cases).
[GitHub] [spark] viirya commented on a change in pull request #29427: [SPARK-25557][SQL][TEST][Followup] Add case-sensitivity test for ORC predicate pushdown
viirya commented on a change in pull request #29427: URL: https://github.com/apache/spark/pull/29427#discussion_r470398272 ## File path: sql/core/v1.2/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcFilterSuite.scala ## @@ -513,5 +513,98 @@ class OrcFilterSuite extends OrcTest with SharedSparkSession { ).get.toString } } + + test("SPARK-25557: case sensitivity in predicate pushdown") { +withTempPath { dir => + val count = 10 + val tableName = "spark_25557" + val tableDir1 = dir.getAbsoluteFile + "/table1" + + // Physical ORC files have both `A` and `a` fields. + withSQLConf(SQLConf.CASE_SENSITIVE.key -> "true") { +spark.range(count).repartition(count).selectExpr("id - 1 as A", "id as a") + .write.mode("overwrite").orc(tableDir1) + } + + // Metastore table has both `A` and `a` fields too. + withTable(tableName) { +withSQLConf(SQLConf.CASE_SENSITIVE.key -> "true") { + sql( +s""" + |CREATE TABLE $tableName (A LONG, a LONG) USING ORC LOCATION '$tableDir1' + """.stripMargin) + + checkAnswer(sql(s"select a, A from $tableName"), (0 until count).map(c => Row(c, c - 1))) + + val actual1 = stripSparkFilter(sql(s"select A from $tableName where A < 0")) + assert(actual1.count() == 1) + + val actual2 = stripSparkFilter(sql(s"select A from $tableName where a < 0")) + assert(actual2.count() == 0) +} + +// Exception thrown for ambiguous case. +withSQLConf(SQLConf.CASE_SENSITIVE.key -> "false") { + val e = intercept[AnalysisException] { +sql(s"select a from $tableName where a < 0").collect() + } + assert(e.getMessage.contains( +"Reference 'a' is ambiguous")) +} + } + + // Metastore table has only `A` field. 
+ withTable(tableName) { +withSQLConf(SQLConf.CASE_SENSITIVE.key -> "false") { + sql( +s""" + |CREATE TABLE $tableName (A LONG) USING ORC LOCATION '$tableDir1' + """.stripMargin) + + val e = intercept[SparkException] { +sql(s"select A from $tableName where A < 0").collect() + } + assert(e.getCause.isInstanceOf[RuntimeException] && e.getCause.getMessage.contains( +"""Found duplicate field(s) "A": [A, a] in case-insensitive mode""")) +} + } + + // Physical ORC files have only `A` field. + val tableDir2 = dir.getAbsoluteFile + "/table2" + withSQLConf(SQLConf.CASE_SENSITIVE.key -> "true") { +spark.range(count).repartition(count).selectExpr("id - 1 as A") + .write.mode("overwrite").orc(tableDir2) + } + + withTable(tableName) { +withSQLConf(SQLConf.CASE_SENSITIVE.key -> "false") { + sql( +s""" + |CREATE TABLE $tableName (a LONG) USING ORC LOCATION '$tableDir2' + """.stripMargin) + + checkAnswer(sql(s"select a from $tableName"), (0 until count).map(c => Row(c - 1))) + + val actual = stripSparkFilter(sql(s"select a from $tableName where a < 0")) + // TODO: ORC predicate pushdown should work under case-insensitive analysis. + // assert(actual.count() == 1) Review comment: Yes, this should be fixed in branch-3.0 too.
[GitHub] [spark] HyukjinKwon commented on pull request #29333: [SPARK-32357][INFRA] Publish failed and succeeded test reports in GitHub Actions
HyukjinKwon commented on pull request #29333: URL: https://github.com/apache/spark/pull/29333#issuecomment-673870618 Also, I would like to thank @cpintado from GitHub. He virtually guided me here a lot on this.
[GitHub] [spark] HyukjinKwon commented on a change in pull request #29333: [SPARK-32357][INFRA] Publish failed and succeeded test reports in GitHub Actions
HyukjinKwon commented on a change in pull request #29333: URL: https://github.com/apache/spark/pull/29333#discussion_r470397420 ## File path: .github/workflows/master.yml ## @@ -170,13 +170,19 @@ jobs: # Show installed packages in R. sudo Rscript -e 'pkg_list <- as.data.frame(installed.packages()[, c(1,3:4)]); pkg_list[is.na(pkg_list$Priority), 1:2, drop = FALSE]' # Run the tests. -- name: "Run tests: ${{ matrix.modules }}" +- name: Run tests run: | # Hive tests become flaky when running in parallel as it's too intensive. if [[ "$MODULES_TO_TEST" == "hive" ]]; then export SERIAL_SBT_TESTS=1; fi mkdir -p ~/.m2 ./dev/run-tests --parallelism 2 --modules "$MODULES_TO_TEST" --included-tags "$INCLUDED_TAGS" --excluded-tags "$EXCLUDED_TAGS" rm -rf ~/.m2/repository/org/apache/spark +- name: Upload test results to report + if: always() + uses: actions/upload-artifact@v2 Review comment: Yeah, if the tests fail, it should upload JUnit XML files and then report the failed test cases. GitHub Actions has things like `failure()` but I think we should run this always (to report successful cases and also failed cases).
[GitHub] [spark] HyukjinKwon commented on a change in pull request #29333: [SPARK-32357][INFRA] Publish failed and succeeded test reports in GitHub Actions
HyukjinKwon commented on a change in pull request #29333: URL: https://github.com/apache/spark/pull/29333#discussion_r470397420 ## File path: .github/workflows/master.yml ## @@ -170,13 +170,19 @@ jobs: # Show installed packages in R. sudo Rscript -e 'pkg_list <- as.data.frame(installed.packages()[, c(1,3:4)]); pkg_list[is.na(pkg_list$Priority), 1:2, drop = FALSE]' # Run the tests. -- name: "Run tests: ${{ matrix.modules }}" +- name: Run tests run: | # Hive tests become flaky when running in parallel as it's too intensive. if [[ "$MODULES_TO_TEST" == "hive" ]]; then export SERIAL_SBT_TESTS=1; fi mkdir -p ~/.m2 ./dev/run-tests --parallelism 2 --modules "$MODULES_TO_TEST" --included-tags "$INCLUDED_TAGS" --excluded-tags "$EXCLUDED_TAGS" rm -rf ~/.m2/repository/org/apache/spark +- name: Upload test results to report + if: always() + uses: actions/upload-artifact@v2 Review comment: Yeah, if the tests fail, it should upload JUnit XML files and then report the failed test cases. GitHub Actions has things like `failure()` but I think we should run this always.
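The step under discussion, restored to YAML layout for readability. The `name`, `if`, and `uses` lines come from the quoted diff; the `with` block (artifact name and report path) is an illustrative assumption, since the diff is truncated before it. `if: always()` makes the step run whether the preceding test step passed or failed, which is the behavior the review comment argues for.

```yaml
- name: Upload test results to report
  if: always()                       # run on success AND failure, unlike the default (success only)
  uses: actions/upload-artifact@v2
  with:
    name: test-results               # hypothetical artifact name
    path: "**/target/test-reports/*.xml"   # hypothetical JUnit XML glob
```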
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29429: [DO-NOT-MERGE] Verify GitHub Actions test report
AmplabJenkins removed a comment on pull request #29429: URL: https://github.com/apache/spark/pull/29429#issuecomment-673870274
[GitHub] [spark] HyukjinKwon commented on pull request #29333: [SPARK-32357][INFRA] Publish failed and succeeded test reports in GitHub Actions
HyukjinKwon commented on pull request #29333: URL: https://github.com/apache/spark/pull/29333#issuecomment-673868134 Thank you all!
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #29427: [SPARK-25557][SQL][TEST][Followup] Add case-sensitivity test for ORC predicate pushdown
dongjoon-hyun commented on a change in pull request #29427: URL: https://github.com/apache/spark/pull/29427#discussion_r470396482 ## File path: sql/core/v1.2/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcFilterSuite.scala ## @@ -513,5 +513,98 @@ class OrcFilterSuite extends OrcTest with SharedSparkSession { ).get.toString } } + + test("SPARK-25557: case sensitivity in predicate pushdown") { +withTempPath { dir => + val count = 10 + val tableName = "spark_25557" + val tableDir1 = dir.getAbsoluteFile + "/table1" + + // Physical ORC files have both `A` and `a` fields. + withSQLConf(SQLConf.CASE_SENSITIVE.key -> "true") { +spark.range(count).repartition(count).selectExpr("id - 1 as A", "id as a") + .write.mode("overwrite").orc(tableDir1) + } + + // Metastore table has both `A` and `a` fields too. + withTable(tableName) { +withSQLConf(SQLConf.CASE_SENSITIVE.key -> "true") { + sql( +s""" + |CREATE TABLE $tableName (A LONG, a LONG) USING ORC LOCATION '$tableDir1' + """.stripMargin) + + checkAnswer(sql(s"select a, A from $tableName"), (0 until count).map(c => Row(c, c - 1))) + + val actual1 = stripSparkFilter(sql(s"select A from $tableName where A < 0")) + assert(actual1.count() == 1) + + val actual2 = stripSparkFilter(sql(s"select A from $tableName where a < 0")) + assert(actual2.count() == 0) +} + +// Exception thrown for ambiguous case. +withSQLConf(SQLConf.CASE_SENSITIVE.key -> "false") { + val e = intercept[AnalysisException] { +sql(s"select a from $tableName where a < 0").collect() + } + assert(e.getMessage.contains( +"Reference 'a' is ambiguous")) +} + } + + // Metastore table has only `A` field. 
+ withTable(tableName) { +withSQLConf(SQLConf.CASE_SENSITIVE.key -> "false") { + sql( +s""" + |CREATE TABLE $tableName (A LONG) USING ORC LOCATION '$tableDir1' + """.stripMargin) + + val e = intercept[SparkException] { +sql(s"select A from $tableName where A < 0").collect() + } + assert(e.getCause.isInstanceOf[RuntimeException] && e.getCause.getMessage.contains( +"""Found duplicate field(s) "A": [A, a] in case-insensitive mode""")) +} + } + + // Physical ORC files have only `A` field. + val tableDir2 = dir.getAbsoluteFile + "/table2" + withSQLConf(SQLConf.CASE_SENSITIVE.key -> "true") { +spark.range(count).repartition(count).selectExpr("id - 1 as A") + .write.mode("overwrite").orc(tableDir2) + } + + withTable(tableName) { +withSQLConf(SQLConf.CASE_SENSITIVE.key -> "false") { + sql( +s""" + |CREATE TABLE $tableName (a LONG) USING ORC LOCATION '$tableDir2' + """.stripMargin) + + checkAnswer(sql(s"select a from $tableName"), (0 until count).map(c => Row(c - 1))) + + val actual = stripSparkFilter(sql(s"select a from $tableName where a < 0")) + // TODO: ORC predicate pushdown should work under case-insensitive analysis. + // assert(actual.count() == 1) Review comment: Can we have this non-nested test case on `branch-3.0`, too?
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28841: [SPARK-31962][SQL] Provide modifiedAfter and modifiedBefore options when filtering from a batch-based file data source
AmplabJenkins removed a comment on pull request #28841: URL: https://github.com/apache/spark/pull/28841#issuecomment-673868049 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/127432/ Test FAILed.
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #29395: [3.0][SPARK-32518][CORE] CoarseGrainedSchedulerBackend.maxNumConcurrentTasks should consider all kinds of resources
dongjoon-hyun commented on a change in pull request #29395: URL: https://github.com/apache/spark/pull/29395#discussion_r470395517 ## File path: core/src/main/scala/org/apache/spark/internal/config/Tests.scala ## @@ -61,4 +61,19 @@ private[spark] object Tests { .version("3.0.0") .intConf .createWithDefault(2) + + val RESOURCES_WARNING_TESTING = ConfigBuilder("spark.resources.warnings.testing") +.version("3.1.0") Review comment: This should be `3.0.1` when it comes to `branch-3.0`, @Ngone51 .
[GitHub] [spark] dongjoon-hyun closed pull request #29333: [SPARK-32357][INFRA] Publish failed and succeeded test reports in GitHub Actions
dongjoon-hyun closed pull request #29333: URL: https://github.com/apache/spark/pull/29333
[GitHub] [spark] AmplabJenkins commented on pull request #29429: [DO-NOT-MERGE] Verify GitHub Actions test report
AmplabJenkins commented on pull request #29429: URL: https://github.com/apache/spark/pull/29429#issuecomment-673870274
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28841: [SPARK-31962][SQL] Provide modifiedAfter and modifiedBefore options when filtering from a batch-based file data source
AmplabJenkins removed a comment on pull request #28841: URL: https://github.com/apache/spark/pull/28841#issuecomment-673868046 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA commented on pull request #28841: [SPARK-31962][SQL] Provide modifiedAfter and modifiedBefore options when filtering from a batch-based file data source
SparkQA commented on pull request #28841: URL: https://github.com/apache/spark/pull/28841#issuecomment-673868031 **[Test build #127432 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127432/testReport)** for PR 28841 at commit [`1ee4af4`](https://github.com/apache/spark/commit/1ee4af433229baa55b3b1d3c970ef362bb2525fa). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] viirya commented on a change in pull request #29333: [SPARK-32357][INFRA] Publish failed and succeeded test reports in GitHub Actions
viirya commented on a change in pull request #29333: URL: https://github.com/apache/spark/pull/29333#discussion_r470397100 ## File path: .github/workflows/master.yml ## @@ -170,13 +170,19 @@ jobs: # Show installed packages in R. sudo Rscript -e 'pkg_list <- as.data.frame(installed.packages()[, c(1,3:4)]); pkg_list[is.na(pkg_list$Priority), 1:2, drop = FALSE]' # Run the tests. -- name: "Run tests: ${{ matrix.modules }}" +- name: Run tests run: | # Hive tests become flaky when running in parallel as it's too intensive. if [[ "$MODULES_TO_TEST" == "hive" ]]; then export SERIAL_SBT_TESTS=1; fi mkdir -p ~/.m2 ./dev/run-tests --parallelism 2 --modules "$MODULES_TO_TEST" --included-tags "$INCLUDED_TAGS" --excluded-tags "$EXCLUDED_TAGS" rm -rf ~/.m2/repository/org/apache/spark +- name: Upload test results to report + if: always() + uses: actions/upload-artifact@v2 Review comment: If the previous `Run tests` step passed without failure, do we still need to run this? I remember GitHub Actions has some conditions other than `always()` that can be used?
[GitHub] [spark] HyukjinKwon commented on a change in pull request #29333: [SPARK-32357][INFRA] Publish failed and succeeded test reports in GitHub Actions
HyukjinKwon commented on a change in pull request #29333: URL: https://github.com/apache/spark/pull/29333#discussion_r470397420 ## File path: .github/workflows/master.yml ## @@ -170,13 +170,19 @@ jobs: # Show installed packages in R. sudo Rscript -e 'pkg_list <- as.data.frame(installed.packages()[, c(1,3:4)]); pkg_list[is.na(pkg_list$Priority), 1:2, drop = FALSE]' # Run the tests. -- name: "Run tests: ${{ matrix.modules }}" +- name: Run tests run: | # Hive tests become flaky when running in parallel as it's too intensive. if [[ "$MODULES_TO_TEST" == "hive" ]]; then export SERIAL_SBT_TESTS=1; fi mkdir -p ~/.m2 ./dev/run-tests --parallelism 2 --modules "$MODULES_TO_TEST" --included-tags "$INCLUDED_TAGS" --excluded-tags "$EXCLUDED_TAGS" rm -rf ~/.m2/repository/org/apache/spark +- name: Upload test results to report + if: always() + uses: actions/upload-artifact@v2 Review comment: Yeah, if the tests fail, it should upload JUnit XML files and then report the failed test cases. GitHub Actions has things like `failure()` but I think we should run this always.
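The `if: always()` choice under discussion can be sketched as a workflow excerpt. This is a hypothetical illustration, not the exact master.yml; the artifact name and results path are assumptions:

```yaml
# Sketch of the step being reviewed, assuming JUnit XML results land under
# target/test-reports. `if: always()` runs the upload on both passing and
# failing runs; `if: failure()` would skip successful runs, so green builds
# would never be reported.
- name: Upload test results to report
  if: always()
  uses: actions/upload-artifact@v2
  with:
    name: test-results
    path: "**/target/test-reports/*.xml"
```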
[GitHub] [spark] SparkQA removed a comment on pull request #28841: [SPARK-31962][SQL] Provide modifiedAfter and modifiedBefore options when filtering from a batch-based file data source
SparkQA removed a comment on pull request #28841: URL: https://github.com/apache/spark/pull/28841#issuecomment-673865344 **[Test build #127432 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127432/testReport)** for PR 28841 at commit [`1ee4af4`](https://github.com/apache/spark/commit/1ee4af433229baa55b3b1d3c970ef362bb2525fa).
[GitHub] [spark] AmplabJenkins commented on pull request #28841: [SPARK-31962][SQL] Provide modifiedAfter and modifiedBefore options when filtering from a batch-based file data source
AmplabJenkins commented on pull request #28841: URL: https://github.com/apache/spark/pull/28841#issuecomment-673868046
[GitHub] [spark] HyukjinKwon commented on pull request #29333: [SPARK-32357][INFRA] Publish failed and succeeded test reports in GitHub Actions
HyukjinKwon commented on pull request #29333: URL: https://github.com/apache/spark/pull/29333#issuecomment-673869619 I opened a PR to verify if this works well in the main commit (https://github.com/apache/spark/commit/5debde94019d46d4ab66f7927d9e5e8c4d16a7ec) and the PR (https://github.com/apache/spark/pull/29429).
[GitHub] [spark] HyukjinKwon opened a new pull request #29429: [DO-NOT-MERGE] Verify GitHub Actions test report
HyukjinKwon opened a new pull request #29429: URL: https://github.com/apache/spark/pull/29429 ### What changes were proposed in this pull request? This PR is to trigger the test report at https://github.com/apache/spark/pull/29333. ### Why are the changes needed? N/A ### Does this PR introduce _any_ user-facing change? N/A ### How was this patch tested? N/A
[GitHub] [spark] dongjoon-hyun commented on pull request #28904: [SPARK-30462][SS] Streamline the logic on file stream source and sink metadata log to avoid memory issue
dongjoon-hyun commented on pull request #28904: URL: https://github.com/apache/spark/pull/28904#issuecomment-673870133 Sure. Have a nice vacation and take care, @HeartSaVioR .
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #29427: [SPARK-25557][SQL][TEST][Followup] Add case-sensitivity test for ORC predicate pushdown
dongjoon-hyun commented on a change in pull request #29427: URL: https://github.com/apache/spark/pull/29427#discussion_r470396482 ## File path: sql/core/v1.2/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcFilterSuite.scala ## @@ -513,5 +513,98 @@ class OrcFilterSuite extends OrcTest with SharedSparkSession { ).get.toString } } + + test("SPARK-25557: case sensitivity in predicate pushdown") { +withTempPath { dir => + val count = 10 + val tableName = "spark_25557" + val tableDir1 = dir.getAbsoluteFile + "/table1" + + // Physical ORC files have both `A` and `a` fields. + withSQLConf(SQLConf.CASE_SENSITIVE.key -> "true") { +spark.range(count).repartition(count).selectExpr("id - 1 as A", "id as a") + .write.mode("overwrite").orc(tableDir1) + } + + // Metastore table has both `A` and `a` fields too. + withTable(tableName) { +withSQLConf(SQLConf.CASE_SENSITIVE.key -> "true") { + sql( +s""" + |CREATE TABLE $tableName (A LONG, a LONG) USING ORC LOCATION '$tableDir1' + """.stripMargin) + + checkAnswer(sql(s"select a, A from $tableName"), (0 until count).map(c => Row(c, c - 1))) + + val actual1 = stripSparkFilter(sql(s"select A from $tableName where A < 0")) + assert(actual1.count() == 1) + + val actual2 = stripSparkFilter(sql(s"select A from $tableName where a < 0")) + assert(actual2.count() == 0) +} + +// Exception thrown for ambiguous case. +withSQLConf(SQLConf.CASE_SENSITIVE.key -> "false") { + val e = intercept[AnalysisException] { +sql(s"select a from $tableName where a < 0").collect() + } + assert(e.getMessage.contains( +"Reference 'a' is ambiguous")) +} + } + + // Metastore table has only `A` field. 
+ withTable(tableName) { +withSQLConf(SQLConf.CASE_SENSITIVE.key -> "false") { + sql( +s""" + |CREATE TABLE $tableName (A LONG) USING ORC LOCATION '$tableDir1' + """.stripMargin) + + val e = intercept[SparkException] { +sql(s"select A from $tableName where A < 0").collect() + } + assert(e.getCause.isInstanceOf[RuntimeException] && e.getCause.getMessage.contains( +"""Found duplicate field(s) "A": [A, a] in case-insensitive mode""")) +} + } + + // Physical ORC files have only `A` field. + val tableDir2 = dir.getAbsoluteFile + "/table2" + withSQLConf(SQLConf.CASE_SENSITIVE.key -> "true") { +spark.range(count).repartition(count).selectExpr("id - 1 as A") + .write.mode("overwrite").orc(tableDir2) + } + + withTable(tableName) { +withSQLConf(SQLConf.CASE_SENSITIVE.key -> "false") { + sql( +s""" + |CREATE TABLE $tableName (a LONG) USING ORC LOCATION '$tableDir2' + """.stripMargin) + + checkAnswer(sql(s"select a from $tableName"), (0 until count).map(c => Row(c - 1))) + + val actual = stripSparkFilter(sql(s"select a from $tableName where a < 0")) + // TODO: ORC predicate pushdown should work under case-insensitive analysis. + // assert(actual.count() == 1) Review comment: Can we have this case on `branch-3.0`, too?
[GitHub] [spark] Karl-WangSK commented on pull request #29360: [SPARK-32542][SQL] Add an optimizer rule to split an Expand into multiple Expands for aggregates
Karl-WangSK commented on pull request #29360: URL: https://github.com/apache/spark/pull/29360#issuecomment-673868646 Yes, the shuffle output is the same, because the size of the data is the same. As you can see in the benchmark, comparing cube on 7 fields k1, k2, k3, k4, k5, k6, k7 (128x projections) with cube on 6 fields k1, k2, k3, k4, k5, k6 (64x projections) with grouping off, the data size is double, but the time is 2.4 min for one and 8.7 min for the other, not just double. It is affected by data size, especially when memory is limited. The original data I created is about 20M and executor memory is 1g; when it expands to 64x or 128x, it has a big impact on shuffle performance.
[GitHub] [spark] SparkQA commented on pull request #29429: [DO-NOT-MERGE] Verify GitHub Actions test report
SparkQA commented on pull request #29429: URL: https://github.com/apache/spark/pull/29429#issuecomment-673870087 **[Test build #127433 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127433/testReport)** for PR 29429 at commit [`c356076`](https://github.com/apache/spark/commit/c356076a0b761e9e8f598fe8468eca5191b7bad6).
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #29395: [3.0][SPARK-32518][CORE] CoarseGrainedSchedulerBackend.maxNumConcurrentTasks should consider all kinds of resources
dongjoon-hyun commented on a change in pull request #29395: URL: https://github.com/apache/spark/pull/29395#discussion_r470395606 ## File path: core/src/main/scala/org/apache/spark/internal/config/Tests.scala ## @@ -61,4 +61,19 @@ private[spark] object Tests { .version("3.0.0") .intConf .createWithDefault(2) + + val RESOURCES_WARNING_TESTING = ConfigBuilder("spark.resources.warnings.testing") +.version("3.1.0") +.booleanConf +.createWithDefault(false) + + // This configuration is used for unit tests to allow skipping the task cpus to cores validation + // to allow emulating standalone mode behavior while running in local mode. Standalone mode + // by default doesn't specify a number of executor cores, it just uses all the ones available + // on the host. + val SKIP_VALIDATE_CORES_TESTING = + ConfigBuilder("spark.testing.skipValidateCores") +.version("3.1.0") +.booleanConf +.createWithDefault(false) Review comment: ditto. This should be `3.0.1` when it comes to `branch-3.0`, @Ngone51 . Also, after merging this, please update `master` branch consistently.
[GitHub] [spark] AmplabJenkins commented on pull request #28841: [SPARK-31962][SQL] Provide modifiedAfter and modifiedBefore options when filtering from a batch-based file data source
AmplabJenkins commented on pull request #28841: URL: https://github.com/apache/spark/pull/28841#issuecomment-673865720
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28841: [SPARK-31962][SQL] Provide modifiedAfter and modifiedBefore options when filtering from a batch-based file data source
AmplabJenkins removed a comment on pull request #28841: URL: https://github.com/apache/spark/pull/28841#issuecomment-673865720
[GitHub] [spark] SparkQA commented on pull request #28841: [SPARK-31962][SQL] Provide modifiedAfter and modifiedBefore options when filtering from a batch-based file data source
SparkQA commented on pull request #28841: URL: https://github.com/apache/spark/pull/28841#issuecomment-673865344 **[Test build #127432 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127432/testReport)** for PR 28841 at commit [`1ee4af4`](https://github.com/apache/spark/commit/1ee4af433229baa55b3b1d3c970ef362bb2525fa).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28841: [SPARK-31962][SQL] Provide modifiedAfter and modifiedBefore options when filtering from a batch-based file data source
AmplabJenkins removed a comment on pull request #28841: URL: https://github.com/apache/spark/pull/28841#issuecomment-673864612 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/127431/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28841: [SPARK-31962][SQL] Provide modifiedAfter and modifiedBefore options when filtering from a batch-based file data source
AmplabJenkins removed a comment on pull request #28841: URL: https://github.com/apache/spark/pull/28841#issuecomment-673864606 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA removed a comment on pull request #28841: [SPARK-31962][SQL] Provide modifiedAfter and modifiedBefore options when filtering from a batch-based file data source
SparkQA removed a comment on pull request #28841: URL: https://github.com/apache/spark/pull/28841#issuecomment-673861861 **[Test build #127431 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127431/testReport)** for PR 28841 at commit [`4329c8a`](https://github.com/apache/spark/commit/4329c8abeb64702c5b92880e11b76511087da841).
[GitHub] [spark] SparkQA commented on pull request #28841: [SPARK-31962][SQL] Provide modifiedAfter and modifiedBefore options when filtering from a batch-based file data source
SparkQA commented on pull request #28841: URL: https://github.com/apache/spark/pull/28841#issuecomment-673864599 **[Test build #127431 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127431/testReport)** for PR 28841 at commit [`4329c8a`](https://github.com/apache/spark/commit/4329c8abeb64702c5b92880e11b76511087da841). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on pull request #28841: [SPARK-31962][SQL] Provide modifiedAfter and modifiedBefore options when filtering from a batch-based file data source
AmplabJenkins commented on pull request #28841: URL: https://github.com/apache/spark/pull/28841#issuecomment-673864606
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28841: [SPARK-31962][SQL] Provide modifiedAfter and modifiedBefore options when filtering from a batch-based file data source
AmplabJenkins removed a comment on pull request #28841: URL: https://github.com/apache/spark/pull/28841#issuecomment-673862283
[GitHub] [spark] AmplabJenkins commented on pull request #28841: [SPARK-31962][SQL] Provide modifiedAfter and modifiedBefore options when filtering from a batch-based file data source
AmplabJenkins commented on pull request #28841: URL: https://github.com/apache/spark/pull/28841#issuecomment-673862283
[GitHub] [spark] SparkQA commented on pull request #28841: [SPARK-31962][SQL] Provide modifiedAfter and modifiedBefore options when filtering from a batch-based file data source
SparkQA commented on pull request #28841: URL: https://github.com/apache/spark/pull/28841#issuecomment-673861861 **[Test build #127431 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127431/testReport)** for PR 28841 at commit [`4329c8a`](https://github.com/apache/spark/commit/4329c8abeb64702c5b92880e11b76511087da841).
[GitHub] [spark] manuzhang closed pull request #28954: [SPARK-32083][SQL] Apply CoalesceShufflePartitions when input RDD has 0 partitions with AQE
manuzhang closed pull request #28954: URL: https://github.com/apache/spark/pull/28954
[GitHub] [spark] HyukjinKwon commented on pull request #29424: [SQL][MINOR] Fixed approx_count_distinct rsd param description
HyukjinKwon commented on pull request #29424: URL: https://github.com/apache/spark/pull/29424#issuecomment-673856142 Looks fine. Can you address the style nits @Comonut?
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29422: [SPARK-32613][CORE] Fix regressions in DecommissionWorkerSuite
AmplabJenkins removed a comment on pull request #29422: URL: https://github.com/apache/spark/pull/29422#issuecomment-673855623
[GitHub] [spark] AmplabJenkins commented on pull request #29422: [SPARK-32613][CORE] Fix regressions in DecommissionWorkerSuite
AmplabJenkins commented on pull request #29422: URL: https://github.com/apache/spark/pull/29422#issuecomment-673855623
[GitHub] [spark] SparkQA removed a comment on pull request #29422: [CORE][SPARK-32613] Fix regressions in DecommissionWorkerSuite
SparkQA removed a comment on pull request #29422: URL: https://github.com/apache/spark/pull/29422#issuecomment-673773346 **[Test build #127428 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127428/testReport)** for PR 29422 at commit [`c051532`](https://github.com/apache/spark/commit/c051532a08f067ffa77b13e12207723d4ecbe27f).
[GitHub] [spark] SparkQA commented on pull request #29422: [CORE][SPARK-32613] Fix regressions in DecommissionWorkerSuite
SparkQA commented on pull request #29422: URL: https://github.com/apache/spark/pull/29422#issuecomment-673855161 **[Test build #127428 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127428/testReport)** for PR 29422 at commit [`c051532`](https://github.com/apache/spark/commit/c051532a08f067ffa77b13e12207723d4ecbe27f). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] beliefer commented on a change in pull request #28685: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function
beliefer commented on a change in pull request #28685: URL: https://github.com/apache/spark/pull/28685#discussion_r470379655 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowExecBase.scala ## @@ -137,7 +137,11 @@ trait WindowExecBase extends UnaryExecNode { function match { case AggregateExpression(f, _, _, _, _) => collect("AGGREGATE", frame, e, f) case f: AggregateWindowFunction => collect("AGGREGATE", frame, e, f) -case f: OffsetWindowFunction => collect("OFFSET", frame, e, f) +case f: OffsetWindowFunction => if (f.isWholeBased) { Review comment: According to the plan, `first_value` and `last_value` need to be realized.
[GitHub] [spark] beliefer commented on a change in pull request #28685: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function
beliefer commented on a change in pull request #28685: URL: https://github.com/apache/spark/pull/28685#discussion_r470377487 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala ## @@ -474,6 +479,55 @@ case class Lag(input: Expression, offset: Expression, default: Expression) override val direction = Descending } +/** + * The NthValue function returns the value of `input` at the row that is the `offset`th row of + * the window frame (counting from 1). Offsets start at 0, which is the current row. When the + * value of `input` is null at the `offset`th row or there is no such an `offset`th row, null + * is returned. + * + * @param input expression to evaluate `offset`th row of the window frame. + * @param offset rows to jump ahead in the partition. + */ +@ExpressionDescription( + usage = """ +_FUNC_(input[, offset]) - Returns the value of `input` at the row that is the `offset`th row + of the window frame (counting from 1). If the value of `input` at the `offset`th row is Review comment: ``offset`th row of the window` ?
[GitHub] [spark] beliefer commented on a change in pull request #28685: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function
beliefer commented on a change in pull request #28685: URL: https://github.com/apache/spark/pull/28685#discussion_r470375984 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala ## @@ -474,6 +479,55 @@ case class Lag(input: Expression, offset: Expression, default: Expression) override val direction = Descending } +/** + * The NthValue function returns the value of `input` at the row that is the `offset`th row of + * the window frame (counting from 1). Offsets start at 0, which is the current row. When the Review comment: Sorry! I made a mistake. `Offsets start at 1`
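The 1-based offset semantics being corrected in this review thread ("Offsets start at 1", the first row of the frame) can be sketched with a small model. The `nth_value` helper below is hypothetical and only illustrates the documented behavior; it is not Spark's implementation:

```python
def nth_value(frame, offset):
    """Return the value at the offset-th row of the window frame,
    counting from 1. Returns None when the frame has no offset-th
    row, or when the value stored at that row is itself None.
    Illustrative model only, not Spark's implementation."""
    if offset < 1:
        raise ValueError("offset must start at 1")
    if offset > len(frame):
        return None  # no offset-th row in this frame
    return frame[offset - 1]

# Offset 1 is the first row of the frame, per the corrected doc:
print(nth_value(["a", "b", "c"], 1))  # -> a
print(nth_value(["a", "b", "c"], 4))  # -> None
```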
[GitHub] [spark] HyukjinKwon edited a comment on pull request #29333: [SPARK-32357][INFRA] Publish failed and succeeded test reports in GitHub Actions
HyukjinKwon edited a comment on pull request #29333: URL: https://github.com/apache/spark/pull/29333#issuecomment-673844193 Okay the example [looks working fine](https://github.com/HyukjinKwon/spark/runs/980622718). This PR should be ready for a review and merged. @srowen, @gengliangwang, @dongjoon-hyun, @dbtsai, @viirya and @maropu can you take a look please?
[GitHub] [spark] HyukjinKwon commented on pull request #29333: [SPARK-32357][INFRA] Publish failed and succeeded test reports in GitHub Actions
HyukjinKwon commented on pull request #29333: URL: https://github.com/apache/spark/pull/29333#issuecomment-673844193 Okay the example [looks working fine](https://github.com/HyukjinKwon/spark/runs/980622718). This PR should be ready for a review and merged. cc @srowen, @gengliangwang, @dongjoon-hyun, @dbtsai, @viirya can you take a look please?
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29428: [SPARK-32608][SQL] Script Transform ROW FORMAT DELIMIT value should format value
AmplabJenkins removed a comment on pull request #29428: URL: https://github.com/apache/spark/pull/29428#issuecomment-673842847
[GitHub] [spark] AmplabJenkins commented on pull request #29428: [SPARK-32608][SQL] Script Transform ROW FORMAT DELIMIT value should format value
AmplabJenkins commented on pull request #29428: URL: https://github.com/apache/spark/pull/29428#issuecomment-673842847
[GitHub] [spark] SparkQA commented on pull request #29428: [SPARK-32608][SQL] Script Transform ROW FORMAT DELIMIT value should format value
SparkQA commented on pull request #29428: URL: https://github.com/apache/spark/pull/29428#issuecomment-673842527 **[Test build #127430 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127430/testReport)** for PR 29428 at commit [`b4d816e`](https://github.com/apache/spark/commit/b4d816e26766923a40c42d2b3ae4356802b16886).
[GitHub] [spark] AngersZhuuuu commented on pull request #29428: [SPARK-32608][SQL] Script Transform ROW FORMAT DELIMIT value should format value
AngersZhuuuu commented on pull request #29428: URL: https://github.com/apache/spark/pull/29428#issuecomment-673841449 FYI @maropu
[GitHub] [spark] AngersZhuuuu opened a new pull request #29428: [SPARK-32608][SQL] Script Transform ROW FORMAT DELIMIT value should format value
AngersZhuuuu opened a new pull request #29428: URL: https://github.com/apache/spark/pull/29428

### What changes were proposed in this pull request?
For SQL
```
SELECT TRANSFORM(a, b, c)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n' NULL DEFINED AS 'null'
USING 'cat' AS (a, b, c)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n' NULL DEFINED AS 'NULL'
FROM testData
```
the correct TOK_TABLEROWFORMATFIELD should be `,` but is actually `','`, and TOK_TABLEROWFORMATLINES should be `\n` but is actually `'\n'`.

### Why are the changes needed?
Fix string value formatting.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Added UT
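The fix described in this PR amounts to storing the delimiter's character value rather than its quoted SQL literal (`,` instead of `','`, a newline instead of `'\n'`). A rough sketch of that unquoting step, using a hypothetical `unquote_delimiter` helper rather than Spark's actual parser code:

```python
def unquote_delimiter(token):
    """Strip one layer of surrounding quotes from a ROW FORMAT
    delimiter token and resolve backslash escapes, so "','"
    yields "," and "'\\n'" yields a newline character.
    Hypothetical helper illustrating the fix, not Spark's parser."""
    if len(token) >= 2 and token[0] == token[-1] and token[0] in ("'", '"'):
        token = token[1:-1]
    # Resolve escapes written literally in the SQL text (e.g. \n, \t).
    return token.encode("latin-1").decode("unicode_escape")

print(repr(unquote_delimiter("','")))     # -> ','
print(repr(unquote_delimiter("'\\n'")))   # -> '\n'
```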
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29270: [SPARK-32466][TEST][SQL] Add PlanStabilitySuite to detect SparkPlan regression
AmplabJenkins removed a comment on pull request #29270: URL: https://github.com/apache/spark/pull/29270#issuecomment-673840860
[GitHub] [spark] AmplabJenkins commented on pull request #29270: [SPARK-32466][TEST][SQL] Add PlanStabilitySuite to detect SparkPlan regression
AmplabJenkins commented on pull request #29270: URL: https://github.com/apache/spark/pull/29270#issuecomment-673840860
[GitHub] [spark] SparkQA commented on pull request #29270: [SPARK-32466][TEST][SQL] Add PlanStabilitySuite to detect SparkPlan regression
SparkQA commented on pull request #29270: URL: https://github.com/apache/spark/pull/29270#issuecomment-673840529 **[Test build #127429 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127429/testReport)** for PR 29270 at commit [`891346e`](https://github.com/apache/spark/commit/891346e6b541cc181f1aa5213d0540330bdf99ec).
[GitHub] [spark] Ngone51 commented on pull request #29270: [SPARK-32466][TEST][SQL] Add PlanStabilitySuite to detect SparkPlan regression
Ngone51 commented on pull request #29270: URL: https://github.com/apache/spark/pull/29270#issuecomment-673840363 retest this please.
[GitHub] [spark] Ngone51 commented on a change in pull request #29270: [SPARK-32466][TEST][SQL] Add PlanStabilitySuite to detect SparkPlan regression
Ngone51 commented on a change in pull request #29270: URL: https://github.com/apache/spark/pull/29270#discussion_r470368611 ## File path: sql/core/src/test/scala/org/apache/spark/sql/TPCDSBase.scala ## @@ -298,12 +302,22 @@ trait TPCDSBase extends SharedSparkSession { tableNames.foreach { tableName => createTable(spark, tableName) if (injectStats) {
// To simulate plan generation on actual TPC-DS data, injects data stats here spark.sessionState.catalog.alterTableStats( TableIdentifier(tableName), Some(TPCDSTableStats.sf100TableStats(tableName))) } } } + override def afterAll(): Unit = { +conf.setConf(SQLConf.CBO_ENABLED, originalCBCEnabled) +conf.setConf(SQLConf.PLAN_STATS_ENABLED, originalPlanStatsEnabled) +conf.setConf(SQLConf.JOIN_REORDER_ENABLED, originalJoinReorderEnabled) +tableNames.foreach { tableName => + spark.sessionState.catalog.alterTableStats(TableIdentifier(tableName), None) Review comment: Yes sure.
[GitHub] [spark] Ngone51 commented on a change in pull request #29270: [SPARK-32466][TEST][SQL] Add PlanStabilitySuite to detect SparkPlan regression
Ngone51 commented on a change in pull request #29270: URL: https://github.com/apache/spark/pull/29270#discussion_r470368540 ## File path: sql/core/src/test/scala/org/apache/spark/sql/PlanStabilitySuite.scala ## @@ -0,0 +1,335 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql + +import java.io.File +import java.nio.charset.StandardCharsets + +import scala.collection.mutable + +import org.apache.commons.io.FileUtils + +import org.apache.spark.sql.catalyst.expressions.AttributeSet +import org.apache.spark.sql.catalyst.util._ +import org.apache.spark.sql.execution._ +import org.apache.spark.sql.execution.adaptive.DisableAdaptiveExecutionSuite +import org.apache.spark.sql.execution.exchange.{Exchange, ReusedExchangeExec} +import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.tags.ExtendedSQLTest + +// scalastyle:off line.size.limit +/** + * Check that TPC-DS SparkPlans don't change. 
+ * If there are plan differences, the error message looks like this: + * Plans did not match: + * last approved simplified plan: /path/to/tpcds-plan-stability/approved-plans-xxx/q1/simplified.txt + * last approved explain plan: /path/to/tpcds-plan-stability/approved-plans-xxx/q1/explain.txt + * [last approved simplified plan] + * + * actual simplified plan: /path/to/tmp/q1.actual.simplified.txt + * actual explain plan: /path/to/tmp/q1.actual.explain.txt + * [actual simplified plan] + * + * The explain files are saved to help debug later, they are not checked. Only the simplified + * plans are checked (by string comparison). + * + * + * To run the entire test suite: + * {{{ + * build/sbt "sql/test-only *PlanStability[WithStats]Suite" + * }}} + * + * To run a single test file upon change: + * {{{ + * build/sbt "sql/test-only *PlanStability[WithStats]Suite -- -z (tpcds-v1.4/q49)" + * }}} + * + * To re-generate golden files for entire suite, run: + * {{{ + * SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "sql/test-only *PlanStability[WithStats]Suite" + * }}} + * + * To re-generate golden file for a single test, run: + * {{{ + * SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "sql/test-only *PlanStability[WithStats]Suite -- -z (tpcds-v1.4/q49)" + * }}} + */ +// scalastyle:on line.size.limit +trait PlanStabilitySuite extends TPCDSBase with DisableAdaptiveExecutionSuite { + + private val originalMaxToStringFields = conf.maxToStringFields + + override def beforeAll(): Unit = { +conf.setConf(SQLConf.MAX_TO_STRING_FIELDS, Int.MaxValue) +super.beforeAll() + } + + override def afterAll(): Unit = { +super.afterAll() +conf.setConf(SQLConf.MAX_TO_STRING_FIELDS, originalMaxToStringFields) + } + + private val regenerateGoldenFiles: Boolean = System.getenv("SPARK_GENERATE_GOLDEN_FILES") == "1" + + protected val baseResourcePath = { +// use the same way as `SQLQueryTestSuite` to get the resource path +java.nio.file.Paths.get("src", "test", "resources", "tpcds-plan-stability").toFile + } + + def 
goldenFilePath: String + + private def getDirForTest(name: String): File = { +new File(goldenFilePath, name) + } + + private def isApproved(dir: File, actualSimplifiedPlan: String): Boolean = { +val file = new File(dir, "simplified.txt") +val approved = FileUtils.readFileToString(file, StandardCharsets.UTF_8) +approved == actualSimplifiedPlan + } + + /** + * Serialize and save this SparkPlan. + * The resulting file is used by [[checkWithApproved]] to check stability. + * + * @param planthe SparkPlan + * @param namethe name of the query + * @param explain the full explain output; this is saved to help debug later as the simplified + *plan is not too useful for debugging + */ + private def generateApprovedPlanFile(plan: SparkPlan, name: String, explain: String): Unit = { +val dir = getDirForTest(name) +val simplified = getSimplifiedPlan(plan) +val foundMatch = dir.exists() && isApproved(dir, simplified) + +if (!foundMatch) { + FileUtils.deleteDirectory(dir) + assert(dir.mkdirs()) + + val file = new File(dir, "simplified.txt") + FileUtils.writeStringToFile(file, simplified,
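The golden-file check quoted above (`isApproved` comparing the actual simplified plan against the stored `simplified.txt` by plain string comparison) can be mirrored in a few lines. This is an illustrative Python sketch of the mechanism, not the suite's Scala code:

```python
import os
import tempfile

def is_approved(golden_dir, actual_simplified_plan):
    """Golden-file style check: the regenerated simplified plan must
    match the stored simplified.txt exactly (string comparison)."""
    path = os.path.join(golden_dir, "simplified.txt")
    if not os.path.isfile(path):
        return False  # no approved plan recorded yet
    with open(path, encoding="utf-8") as f:
        return f.read() == actual_simplified_plan

# Record an "approved" plan, then compare regenerated plans against it.
with tempfile.TemporaryDirectory() as d:
    approved = "Project\n  Filter\n    Scan tpcds.store_sales\n"
    with open(os.path.join(d, "simplified.txt"), "w", encoding="utf-8") as f:
        f.write(approved)
    print(is_approved(d, approved))                              # -> True
    print(is_approved(d, "Project\n  Scan tpcds.store_sales\n"))  # -> False
```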
[GitHub] [spark] Ngone51 commented on a change in pull request #29270: [SPARK-32466][TEST][SQL] Add PlanStabilitySuite to detect SparkPlan regression
Ngone51 commented on a change in pull request #29270: URL: https://github.com/apache/spark/pull/29270#discussion_r470368356 ## File path: sql/core/src/test/scala/org/apache/spark/sql/PlanStabilitySuite.scala ## (quotes the same PlanStabilitySuite.scala snippet as the previous comment)
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28939: [SPARK-32119][CORE] ExecutorPlugin doesn't work with Standalone Cluster and Kubernetes with --jars
AmplabJenkins removed a comment on pull request #28939: URL: https://github.com/apache/spark/pull/28939#issuecomment-673839896
[GitHub] [spark] AmplabJenkins commented on pull request #28939: [SPARK-32119][CORE] ExecutorPlugin doesn't work with Standalone Cluster and Kubernetes with --jars
AmplabJenkins commented on pull request #28939: URL: https://github.com/apache/spark/pull/28939#issuecomment-673839896
[GitHub] [spark] SparkQA commented on pull request #28939: [SPARK-32119][CORE] ExecutorPlugin doesn't work with Standalone Cluster and Kubernetes with --jars
SparkQA commented on pull request #28939: URL: https://github.com/apache/spark/pull/28939#issuecomment-673838923 **[Test build #127425 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127425/testReport)** for PR 28939 at commit [`5d65caf`](https://github.com/apache/spark/commit/5d65caf55c7b87fc0035444b60847e4037ad0f40). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on pull request #28939: [SPARK-32119][CORE] ExecutorPlugin doesn't work with Standalone Cluster and Kubernetes with --jars
SparkQA removed a comment on pull request #28939: URL: https://github.com/apache/spark/pull/28939#issuecomment-673754055 **[Test build #127425 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127425/testReport)** for PR 28939 at commit [`5d65caf`](https://github.com/apache/spark/commit/5d65caf55c7b87fc0035444b60847e4037ad0f40).
[GitHub] [spark] mridulm edited a comment on pull request #28618: [SPARK-31801][API][SHUFFLE] Register map output metadata
mridulm edited a comment on pull request #28618: URL: https://github.com/apache/spark/pull/28618#issuecomment-673827204 @mccheah I will take a look at this later this week/early next week. +CC @squito, @holdenk who reviewed the design doc. Thanks for working on this!
[GitHub] [spark] mridulm commented on pull request #28618: [SPARK-31801][API][SHUFFLE] Register map output metadata
mridulm commented on pull request #28618: URL: https://github.com/apache/spark/pull/28618#issuecomment-673827204 @mccheah I will take a look at this later this week/early next week. +CC @squito, @holdenk who reviewed the design doc.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29426: [SPARK-32610][DOCS] Fix the link to metrics.dropwizard.io in monitoring.md to refer the proper version
AmplabJenkins removed a comment on pull request #29426: URL: https://github.com/apache/spark/pull/29426#issuecomment-673824399
[GitHub] [spark] AmplabJenkins commented on pull request #29426: [SPARK-32610][DOCS] Fix the link to metrics.dropwizard.io in monitoring.md to refer the proper version
AmplabJenkins commented on pull request #29426: URL: https://github.com/apache/spark/pull/29426#issuecomment-673824399
[GitHub] [spark] mridulm commented on pull request #28939: [SPARK-32119][CORE] ExecutorPlugin doesn't work with Standalone Cluster and Kubernetes with --jars
mridulm commented on pull request #28939: URL: https://github.com/apache/spark/pull/28939#issuecomment-673823982 This looks good to me - once Tom's comment is addressed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29422: [CORE][SPARK-32613] Fix regressions in DecommissionWorkerSuite
AmplabJenkins removed a comment on pull request #29422: URL: https://github.com/apache/spark/pull/29422#issuecomment-673822318
[GitHub] [spark] SparkQA removed a comment on pull request #29426: [SPARK-32610][DOCS] Fix the link to metrics.dropwizard.io in monitoring.md to refer the proper version
SparkQA removed a comment on pull request #29426: URL: https://github.com/apache/spark/pull/29426#issuecomment-673751893 **[Test build #127423 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127423/testReport)** for PR 29426 at commit [`ddbf11f`](https://github.com/apache/spark/commit/ddbf11f5f4073ed378dd51654c1b085afb00128e).
[GitHub] [spark] AmplabJenkins commented on pull request #29422: [CORE][SPARK-32613] Fix regressions in DecommissionWorkerSuite
AmplabJenkins commented on pull request #29422: URL: https://github.com/apache/spark/pull/29422#issuecomment-673822318
[GitHub] [spark] SparkQA commented on pull request #29426: [SPARK-32610][DOCS] Fix the link to metrics.dropwizard.io in monitoring.md to refer the proper version
SparkQA commented on pull request #29426: URL: https://github.com/apache/spark/pull/29426#issuecomment-673821660 **[Test build #127423 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127423/testReport)** for PR 29426 at commit [`ddbf11f`](https://github.com/apache/spark/commit/ddbf11f5f4073ed378dd51654c1b085afb00128e).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on pull request #29422: [CORE][SPARK-32613] Fix regressions in DecommissionWorkerSuite
SparkQA removed a comment on pull request #29422: URL: https://github.com/apache/spark/pull/29422#issuecomment-673751916 **[Test build #127424 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127424/testReport)** for PR 29422 at commit [`6334f80`](https://github.com/apache/spark/commit/6334f80e5d593d7e8a29bcd9598ec5e19d756162).
[GitHub] [spark] SparkQA commented on pull request #29422: [CORE][SPARK-32613] Fix regressions in DecommissionWorkerSuite
SparkQA commented on pull request #29422: URL: https://github.com/apache/spark/pull/29422#issuecomment-673819205 **[Test build #127424 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127424/testReport)** for PR 29422 at commit [`6334f80`](https://github.com/apache/spark/commit/6334f80e5d593d7e8a29bcd9598ec5e19d756162).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] AngersZhuuuu commented on a change in pull request #28490: [SPARK-31670][SQL]Resolve Struct Field in Grouping Aggregate with same ExprId
AngersZhuuuu commented on a change in pull request #28490: URL: https://github.com/apache/spark/pull/28490#discussion_r470362613

File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala

@@ -1479,6 +1479,33 @@ class Analyzer(
       // Skip the having clause here, this will be handled in ResolveAggregateFunctions.
       case h: UnresolvedHaving => h

+     case agg @ (_: Aggregate | _: GroupingSets) =>
+       val resolved = agg.mapExpressions(resolveExpressionTopDown(_, agg))
+       val hasStructField = resolved.expressions.exists {
+         _.collectFirst { case gsf: GetStructField => gsf }.isDefined
+       }
+       if (hasStructField) {
+         // A struct field is resolved as Alias(GetStructField, name). In
+         // Aggregate/GroupingSets, this causes the same struct field in
+         // aggExprs/groupExprs/selectedGroupByExprs to be resolved separately,
+         // with a different ExprId for each Alias, so the replacement fails
+         // when ResolveGroupingAnalytics constructs the Aggregate. We therefore
+         // resolve duplicated struct fields here with the same ExprId.

Review comment:
> I don't get it. `CleanupAliases` will remove aliases from the grouping expressions. Why do we hit the bug?

The error happens when `ResolveGroupingAnalytics` constructs the grouping-analytics Aggregate: while expanding the expressions, the match fails because the ExprIds differ.
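The discussion above can be illustrated with a minimal, self-contained Scala sketch. This is not Catalyst's real API: `GetField`, `Alias`, and `dedup` below are hypothetical stand-ins for `GetStructField`, `Alias`, and the deduplication the patch performs, showing only the core idea that the same struct field resolved twice receives two distinct ExprIds, and that deduplicating by structural equality makes a later match by ExprId succeed.

```scala
object ExprIdDedup {
  private var counter = 0L
  private def nextId(): Long = { counter += 1; counter }

  // Stand-in for GetStructField: identified by its child attribute and field name.
  case class GetField(attr: String, field: String)
  // Stand-in for Alias: wraps an expression and carries a fresh, unique ExprId.
  case class Alias(child: GetField, name: String, exprId: Long = nextId())

  // Deduplicate: structurally equal children are forced to share one ExprId.
  def dedup(aliases: Seq[Alias]): Seq[Alias] = {
    val canonical = scala.collection.mutable.Map.empty[GetField, Long]
    aliases.map(a => a.copy(exprId = canonical.getOrElseUpdate(a.child, a.exprId)))
  }

  def main(args: Array[String]): Unit = {
    // The same struct field resolved twice (once in the grouping list, once in
    // the aggregate list) gets two distinct ExprIds - the bug described above.
    val inGroupBy = Alias(GetField("s", "x"), "x")
    val inAggList = Alias(GetField("s", "x"), "x")
    assert(inGroupBy.exprId != inAggList.exprId)
    // After deduplication the ids agree, so a later match by ExprId
    // (as when constructing the grouping-analytics Aggregate) would succeed.
    val deduped = dedup(Seq(inGroupBy, inAggList))
    assert(deduped.head.exprId == deduped(1).exprId)
    println("ok")
  }
}
```

The key design point mirrored here is that alias identity is positional (a fresh id per `Alias`), while the fix keys on the structural identity of the underlying field reference.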