[GitHub] [spark] AngersZhuuuu commented on pull request #29428: [SPARK-32608][SQL] Script Transform ROW FORMAT DELIMIT value should format value
AngersZh commented on pull request #29428: URL: https://github.com/apache/spark/pull/29428#issuecomment-678732959
> @AngersZh Thanks. BTW, my PR accidentally caused a compilation error for the hive-1.2 profile. I'm reverting it in #29519 first, so you can debug and fix the failed test.

Can you share a link showing where this UT fails under hive-1.2?

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29519: Revert "[SPARK-32646][SQL] ORC predicate pushdown should work with case-insensitive analysis"
AmplabJenkins removed a comment on pull request #29519: URL: https://github.com/apache/spark/pull/29519#issuecomment-678732841
[GitHub] [spark] viirya closed pull request #29518: [SPARK-32646][SQL][FOLLOWUP][test-hadoop2.7][test-hive1.2] ORC predicate pushdown should work with case-insensitive analysis
viirya closed pull request #29518: URL: https://github.com/apache/spark/pull/29518
[GitHub] [spark] AmplabJenkins commented on pull request #29519: Revert "[SPARK-32646][SQL] ORC predicate pushdown should work with case-insensitive analysis"
AmplabJenkins commented on pull request #29519: URL: https://github.com/apache/spark/pull/29519#issuecomment-678732841
[GitHub] [spark] SparkQA commented on pull request #29519: Revert "[SPARK-32646][SQL] ORC predicate pushdown should work with case-insensitive analysis"
SparkQA commented on pull request #29519: URL: https://github.com/apache/spark/pull/29519#issuecomment-678732756 **[Test build #127797 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127797/testReport)** for PR 29519 at commit [`cfccfa6`](https://github.com/apache/spark/commit/cfccfa645949011781ae77c5f3c93bade294599a).
[GitHub] [spark] viirya edited a comment on pull request #29457: [SPARK-32646][SQL] ORC predicate pushdown should work with case-insensitive analysis
viirya edited a comment on pull request #29457: URL: https://github.com/apache/spark/pull/29457#issuecomment-678732316 Both master and branch-3.0 have a few tests failing under the hive-1.2 profile, and this diff missed a change in the hive-1.2 code, which causes a compilation error. That makes debugging the failed tests harder, so I'd like to revert this first in #29519. cc @cloud-fan @HyukjinKwon
[GitHub] [spark] viirya commented on pull request #29428: [SPARK-32608][SQL] Script Transform ROW FORMAT DELIMIT value should format value
viirya commented on pull request #29428: URL: https://github.com/apache/spark/pull/29428#issuecomment-678732571 @AngersZh Thanks. BTW, my PR accidentally caused a compilation error for the hive-1.2 profile. I'm reverting it in #29519 first, so you can debug and fix the failed test.
[GitHub] [spark] viirya opened a new pull request #29519: Revert "[SPARK-32646][SQL] ORC predicate pushdown should work with case-insensitive analysis"
viirya opened a new pull request #29519: URL: https://github.com/apache/spark/pull/29519
### What changes were proposed in this pull request?
This reverts commit e277ef1a83e37bc94e7817467ca882d660c83284.
### Why are the changes needed?
Both master and branch-3.0 have a few tests failing under the hive-1.2 profile, and the PR missed a change in the hive-1.2 code, which causes a compilation error. That makes debugging the failed tests harder, so I'd like to revert this first.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Unit test
[GitHub] [spark] AngersZhuuuu commented on pull request #29428: [SPARK-32608][SQL] Script Transform ROW FORMAT DELIMIT value should format value
AngersZh commented on pull request #29428: URL: https://github.com/apache/spark/pull/29428#issuecomment-678732395
> @AngersZh The test "org.apache.spark.sql.hive.execution.HiveScriptTransformationSuite.[SPARK-32608](https://issues.apache.org/jira/browse/SPARK-32608): Script Transform ROW FORMAT DELIMIT value should format value" fails under the hive-1.2 profile in the master and branch-3.0 branches. Can you look at it?

Checking
[GitHub] [spark] viirya commented on pull request #29457: [SPARK-32646][SQL] ORC predicate pushdown should work with case-insensitive analysis
viirya commented on pull request #29457: URL: https://github.com/apache/spark/pull/29457#issuecomment-678732316 Both master and branch-3.0 have a few tests failing under the hive-1.2 profile, and this diff missed a change in the hive-1.2 code, which causes a compilation error. That makes debugging the failed tests harder, so I'd like to revert this first. cc @cloud-fan @HyukjinKwon
[GitHub] [spark] viirya commented on pull request #29503: [SPARK-32678][SQL] Rename EmptyHashedRelationWithAllNullKeys and simplify NAAJ generated code
viirya commented on pull request #29503: URL: https://github.com/apache/spark/pull/29503#issuecomment-678731933 Thanks! Merging to master.
[GitHub] [spark] viirya closed pull request #29503: [SPARK-32678][SQL] Rename EmptyHashedRelationWithAllNullKeys and simplify NAAJ generated code
viirya closed pull request #29503: URL: https://github.com/apache/spark/pull/29503
[GitHub] [spark] chanduhawk commented on a change in pull request #29516: [WIP][SPARK-32614][SQL] Don't apply comment processing if 'comment' unset for CSV
chanduhawk commented on a change in pull request #29516: URL: https://github.com/apache/spark/pull/29516#discussion_r475171964
## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVOptions.scala
## @@ -220,7 +220,9 @@ class CSVOptions(
     format.setQuote(quote)
     format.setQuoteEscape(escape)
     charToEscapeQuoteEscaping.foreach(format.setCharToEscapeQuoteEscaping)
-    format.setComment(comment)
+    if (isCommentSet) {
Review comment: If we change it that way, it might impact existing users for whom \u is the comment character by default, so I would say a separate optional config is a better solution. What I am saying is that we need to wait for univocity 3.0.0, which will include the new changes; then we can make the Spark changes properly.
[GitHub] [spark] chanduhawk commented on a change in pull request #29516: [WIP][SPARK-32614][SQL] Don't apply comment processing if 'comment' unset for CSV
chanduhawk commented on a change in pull request #29516: URL: https://github.com/apache/spark/pull/29516#discussion_r475171964
## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVOptions.scala
## @@ -220,7 +220,9 @@ class CSVOptions(
     format.setQuote(quote)
     format.setQuoteEscape(escape)
     charToEscapeQuoteEscaping.foreach(format.setCharToEscapeQuoteEscaping)
-    format.setComment(comment)
+    if (isCommentSet) {
Review comment: If we change it that way, it might impact existing users for whom \u is the comment character by default, so I would say a separate optional config is a better solution.
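The review thread on #29516 is about whether `format.setComment` should be called when the user never set the `comment` option. As a rough illustration only, here is a self-contained Scala model of the approach the PR title describes ("don't apply comment processing if 'comment' unset"). It does not use univocity or Spark; `CommentSettings`, `commentSettings` and `filterComments` are hypothetical names. It assumes, as in Spark's `CSVOptions`, that `'\u0000'` is the default meaning "no comment character", and simply skips comment handling in that case.

```scala
// Hypothetical, self-contained model of "only apply comment processing when the
// user actually set the `comment` option". Not the univocity or Spark API.
final case class CommentSettings(commentChar: Option[Char])

object CommentHandlingSketch {
  // Assumed default for an unset `comment` option (Spark's CSVOptions uses the NUL char).
  val Unset: Char = '\u0000'

  // Enable comment processing only for an explicit, non-default comment character.
  def commentSettings(options: Map[String, String]): CommentSettings = {
    val c = options.get("comment").flatMap(_.headOption).getOrElse(Unset)
    CommentSettings(if (c != Unset) Some(c) else None)
  }

  // Drop lines starting with the configured comment character; keep everything otherwise.
  def filterComments(lines: Seq[String], s: CommentSettings): Seq[String] =
    s.commentChar match {
      case Some(c) => lines.filterNot(_.headOption.contains(c))
      case None    => lines
    }

  def main(args: Array[String]): Unit = {
    val input = Seq("\u0000a,b", "#x,y", "c,d")
    // No `comment` option: nothing is filtered, prints 3.
    println(filterComments(input, commentSettings(Map.empty)).size)
    // `comment` = "#": the "#x,y" line is dropped, prints 2.
    println(filterComments(input, commentSettings(Map("comment" -> "#"))).size)
  }
}
```

This also shows the compatibility concern raised in the review: in this model, users who currently rely on NUL-prefixed lines being skipped by default would see those lines kept, and an explicit `comment` of `'\u0000'` is indistinguishable from an unset option.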
[GitHub] [spark] viirya commented on pull request #29428: [SPARK-32608][SQL] Script Transform ROW FORMAT DELIMIT value should format value
viirya commented on pull request #29428: URL: https://github.com/apache/spark/pull/29428#issuecomment-678730505 @AngersZh The test "org.apache.spark.sql.hive.execution.HiveScriptTransformationSuite.SPARK-32608: Script Transform ROW FORMAT DELIMIT value should format value" fails under the hive-1.2 profile in the master and branch-3.0 branches. Can you look at it?
[GitHub] [spark] viirya commented on pull request #29518: [SPARK-32646][SQL][FOLLOWUP][test-hadoop2.7][test-hive1.2] ORC predicate pushdown should work with case-insensitive analysis
viirya commented on pull request #29518: URL: https://github.com/apache/spark/pull/29518#issuecomment-678730061 To clarify: these three tests also fail in branch-3.0:
```
org.apache.spark.sql.hive.execution.HiveScriptTransformationSuite.SPARK-32608: Script Transform ROW FORMAT DELIMIT value should format value
org.apache.spark.sql.hive.execution.HiveSerDeReadWriteSuite.Read/Write Hive PARQUET serde table
org.apache.spark.sql.hive.execution.HiveSerDeReadWriteSuite.Read/Write Hive TEXTFILE serde table
```
This test was already failing before #29457; I manually checked out bf221debd02b11003b092201d0326302196e4ba5 and ran it locally to verify.
```
org.apache.spark.sql.hive.orc.HiveOrcHadoopFsRelationSuite.save()/load() - partitioned table - simple queries - partition columns in data
```
[GitHub] [spark] viirya edited a comment on pull request #29513: [SPARK-32646][SQL][BRANCH-3.0][test-hadoop2.7][test-hive1.2] ORC predicate pushdown should work with case-insensitive analysis
viirya edited a comment on pull request #29513: URL: https://github.com/apache/spark/pull/29513#issuecomment-678719209 Err... I think these tests are already failing in the current branch-3.0 and master branches. Please see https://github.com/apache/spark/pull/29517. I created SPARK-32689 to track it.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29518: [SPARK-32646][SQL][FOLLOWUP][test-hadoop2.7][test-hive1.2] ORC predicate pushdown should work with case-insensitive analysis
AmplabJenkins removed a comment on pull request #29518: URL: https://github.com/apache/spark/pull/29518#issuecomment-678726409 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/127796/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29518: [SPARK-32646][SQL][FOLLOWUP][test-hadoop2.7][test-hive1.2] ORC predicate pushdown should work with case-insensitive analysis
AmplabJenkins removed a comment on pull request #29518: URL: https://github.com/apache/spark/pull/29518#issuecomment-678726408 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on pull request #29518: [SPARK-32646][SQL][FOLLOWUP][test-hadoop2.7][test-hive1.2] ORC predicate pushdown should work with case-insensitive analysis
AmplabJenkins commented on pull request #29518: URL: https://github.com/apache/spark/pull/29518#issuecomment-678726408
[GitHub] [spark] SparkQA commented on pull request #29518: [SPARK-32646][SQL][FOLLOWUP][test-hadoop2.7][test-hive1.2] ORC predicate pushdown should work with case-insensitive analysis
SparkQA commented on pull request #29518: URL: https://github.com/apache/spark/pull/29518#issuecomment-678726374 **[Test build #127796 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127796/testReport)** for PR 29518 at commit [`5ce7567`](https://github.com/apache/spark/commit/5ce756759b49ff977a8b49c893df30284f21ed96).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on pull request #29518: [SPARK-32646][SQL][FOLLOWUP][test-hadoop2.7][test-hive1.2] ORC predicate pushdown should work with case-insensitive analysis
SparkQA removed a comment on pull request #29518: URL: https://github.com/apache/spark/pull/29518#issuecomment-678721328 **[Test build #127796 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127796/testReport)** for PR 29518 at commit [`5ce7567`](https://github.com/apache/spark/commit/5ce756759b49ff977a8b49c893df30284f21ed96).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29516: [WIP][SPARK-32614][SQL] Don't apply comment processing if 'comment' unset for CSV
AmplabJenkins removed a comment on pull request #29516: URL: https://github.com/apache/spark/pull/29516#issuecomment-678725614 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/127793/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29516: [WIP][SPARK-32614][SQL] Don't apply comment processing if 'comment' unset for CSV
AmplabJenkins removed a comment on pull request #29516: URL: https://github.com/apache/spark/pull/29516#issuecomment-678725613 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA removed a comment on pull request #29516: [WIP][SPARK-32614][SQL] Don't apply comment processing if 'comment' unset for CSV
SparkQA removed a comment on pull request #29516: URL: https://github.com/apache/spark/pull/29516#issuecomment-678719205 **[Test build #127793 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127793/testReport)** for PR 29516 at commit [`87e8b65`](https://github.com/apache/spark/commit/87e8b65c67fce1bef56e8071e42deca9954fcff8).
[GitHub] [spark] AmplabJenkins commented on pull request #29516: [WIP][SPARK-32614][SQL] Don't apply comment processing if 'comment' unset for CSV
AmplabJenkins commented on pull request #29516: URL: https://github.com/apache/spark/pull/29516#issuecomment-678725613
[GitHub] [spark] SparkQA commented on pull request #29516: [WIP][SPARK-32614][SQL] Don't apply comment processing if 'comment' unset for CSV
SparkQA commented on pull request #29516: URL: https://github.com/apache/spark/pull/29516#issuecomment-678725602 **[Test build #127793 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127793/testReport)** for PR 29516 at commit [`87e8b65`](https://github.com/apache/spark/commit/87e8b65c67fce1bef56e8071e42deca9954fcff8).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29328: [SPARK-32516][SQL] 'path' option cannot coexist with load()'s path parameters
AmplabJenkins removed a comment on pull request #29328: URL: https://github.com/apache/spark/pull/29328#issuecomment-678725229
[GitHub] [spark] AmplabJenkins commented on pull request #29328: [SPARK-32516][SQL] 'path' option cannot coexist with load()'s path parameters
AmplabJenkins commented on pull request #29328: URL: https://github.com/apache/spark/pull/29328#issuecomment-678725229
[GitHub] [spark] SparkQA removed a comment on pull request #29328: [SPARK-32516][SQL] 'path' option cannot coexist with load()'s path parameters
SparkQA removed a comment on pull request #29328: URL: https://github.com/apache/spark/pull/29328#issuecomment-678707161 **[Test build #127790 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127790/testReport)** for PR 29328 at commit [`e1decd4`](https://github.com/apache/spark/commit/e1decd4c7921f58446a46081b339d308d36529cc).
[GitHub] [spark] SparkQA commented on pull request #29328: [SPARK-32516][SQL] 'path' option cannot coexist with load()'s path parameters
SparkQA commented on pull request #29328: URL: https://github.com/apache/spark/pull/29328#issuecomment-678725113 **[Test build #127790 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127790/testReport)** for PR 29328 at commit [`e1decd4`](https://github.com/apache/spark/commit/e1decd4c7921f58446a46081b339d308d36529cc).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] LantaoJin commented on pull request #29378: [SPARK-30069][CORE][YARN] Clean up non-shuffle disk block manager files following executor exists on YARN
LantaoJin commented on pull request #29378: URL: https://github.com/apache/spark/pull/29378#issuecomment-678722916 Gentle ping @tgravescs @dongjoon-hyun
[GitHub] [spark] AmplabJenkins commented on pull request #29328: [SPARK-32516][SQL] 'path' option cannot coexist with load()'s path parameters
AmplabJenkins commented on pull request #29328: URL: https://github.com/apache/spark/pull/29328#issuecomment-678722822
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29328: [SPARK-32516][SQL] 'path' option cannot coexist with load()'s path parameters
AmplabJenkins removed a comment on pull request #29328: URL: https://github.com/apache/spark/pull/29328#issuecomment-678722822
[GitHub] [spark] SparkQA removed a comment on pull request #29328: [SPARK-32516][SQL] 'path' option cannot coexist with load()'s path parameters
SparkQA removed a comment on pull request #29328: URL: https://github.com/apache/spark/pull/29328#issuecomment-678703766 **[Test build #127789 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127789/testReport)** for PR 29328 at commit [`92bf5ef`](https://github.com/apache/spark/commit/92bf5efa57a011b8cc306812901348d88e7c1223).
[GitHub] [spark] SparkQA commented on pull request #29328: [SPARK-32516][SQL] 'path' option cannot coexist with load()'s path parameters
SparkQA commented on pull request #29328: URL: https://github.com/apache/spark/pull/29328#issuecomment-678722674 **[Test build #127789 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127789/testReport)** for PR 29328 at commit [`92bf5ef`](https://github.com/apache/spark/commit/92bf5efa57a011b8cc306812901348d88e7c1223).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] jkleckner commented on pull request #29496: [SPARK-24266][k8s] Back port spark 28423 to 2.4 to restart watcher
jkleckner commented on pull request #29496: URL: https://github.com/apache/spark/pull/29496#issuecomment-678722230 FWIW, we have yet to see a hang in our hourly job.
[GitHub] [spark] jkleckner commented on pull request #29496: [SPARK-24266][k8s] Back port spark 28423 to 2.4 to restart watcher
jkleckner commented on pull request #29496: URL: https://github.com/apache/spark/pull/29496#issuecomment-678722154 Please retest this.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29518: [SPARK-32646][SQL][FOLLOWUP][test-hadoop2.7][test-hive1.2] ORC predicate pushdown should work with case-insensitive analysis
AmplabJenkins removed a comment on pull request #29518: URL: https://github.com/apache/spark/pull/29518#issuecomment-678721402
[GitHub] [spark] AmplabJenkins commented on pull request #29518: [SPARK-32646][SQL][FOLLOWUP][test-hadoop2.7][test-hive1.2] ORC predicate pushdown should work with case-insensitive analysis
AmplabJenkins commented on pull request #29518: URL: https://github.com/apache/spark/pull/29518#issuecomment-678721402
[GitHub] [spark] SparkQA commented on pull request #29518: [SPARK-32646][SQL][FOLLOWUP][test-hadoop2.7][test-hive1.2] ORC predicate pushdown should work with case-insensitive analysis
SparkQA commented on pull request #29518: URL: https://github.com/apache/spark/pull/29518#issuecomment-678721328 **[Test build #127796 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127796/testReport)** for PR 29518 at commit [`5ce7567`](https://github.com/apache/spark/commit/5ce756759b49ff977a8b49c893df30284f21ed96).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29518: [SPARK-32646][SQL][FOLLOWUP][test-hadoop2.7][test-hive1.2] ORC predicate pushdown should work with case-insensitive analysis
AmplabJenkins removed a comment on pull request #29518: URL: https://github.com/apache/spark/pull/29518#issuecomment-678720854 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/127795/ Test FAILed.
[GitHub] [spark] SparkQA removed a comment on pull request #29518: [SPARK-32646][SQL][FOLLOWUP][test-hadoop2.7][test-hive1.2] ORC predicate pushdown should work with case-insensitive analysis
SparkQA removed a comment on pull request #29518: URL: https://github.com/apache/spark/pull/29518#issuecomment-678720465 **[Test build #127795 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127795/testReport)** for PR 29518 at commit [`960e695`](https://github.com/apache/spark/commit/960e6957e925d74e9ace8931275a952395d55165).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29518: [SPARK-32646][SQL][FOLLOWUP][test-hadoop2.7][test-hive1.2] ORC predicate pushdown should work with case-insensitive analysis
AmplabJenkins removed a comment on pull request #29518: URL: https://github.com/apache/spark/pull/29518#issuecomment-678720852 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA commented on pull request #29518: [SPARK-32646][SQL][FOLLOWUP][test-hadoop2.7][test-hive1.2] ORC predicate pushdown should work with case-insensitive analysis
SparkQA commented on pull request #29518: URL: https://github.com/apache/spark/pull/29518#issuecomment-678720850 **[Test build #127795 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127795/testReport)** for PR 29518 at commit [`960e695`](https://github.com/apache/spark/commit/960e6957e925d74e9ace8931275a952395d55165). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on pull request #29518: [SPARK-32646][SQL][FOLLOWUP][test-hadoop2.7][test-hive1.2] ORC predicate pushdown should work with case-insensitive analysis
AmplabJenkins commented on pull request #29518: URL: https://github.com/apache/spark/pull/29518#issuecomment-678720852
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29518: [SPARK-32646][SQL][FOLLOWUP][test-hadoop2.7][test-hive1.2] ORC predicate pushdown should work with case-insensitive analysis
AmplabJenkins removed a comment on pull request #29518: URL: https://github.com/apache/spark/pull/29518#issuecomment-678720546
[GitHub] [spark] tanelk commented on a change in pull request #29515: [SPARK-32688][SQL][TESTS] Add special values to LiteralGenerator for float and double
tanelk commented on a change in pull request #29515: URL: https://github.com/apache/spark/pull/29515#discussion_r475160410 ## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/LiteralGenerator.scala ## @@ -70,14 +70,22 @@ object LiteralGenerator { lazy val floatLiteralGen: Gen[Literal] = for { - f <- Gen.chooseNum(Float.MinValue / 2, Float.MaxValue / 2, -Float.NaN, Float.PositiveInfinity, Float.NegativeInfinity) + f <- Gen.oneOf( +Gen.oneOf( + Float.NaN, Float.PositiveInfinity, Float.NegativeInfinity, Float.MinPositiveValue, + 0.0f, -0.0f, 1.0f, -1.0f), +Arbitrary.arbFloat.arbitrary + ) Review comment: Disregard this comment. When using more than one generator, it would not generate some of the interesting combinations, such as `0.0` and `-0.0`.
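A small plain-Scala illustration of why `0.0` and `-0.0` are worth listing explicitly: they compare equal with `==`, yet they differ in bit pattern, in `java.lang.Float.compare` ordering, and in what they produce downstream. NaN is included for contrast. This uses only `java.lang.Float` and is not code from the PR; the object name is hypothetical.

```scala
object SignedZeroDemo {
  val pz: Float = 0.0f
  val nz: Float = -0.0f
  val nan: Float = Float.NaN

  // `==` on primitive floats follows IEEE 754, so the two zeros compare equal.
  val zerosEqual: Boolean = pz == nz
  // Their bit patterns differ: -0.0f has the sign bit set.
  val sameBits: Boolean =
    java.lang.Float.floatToIntBits(pz) == java.lang.Float.floatToIntBits(nz)
  // java.lang.Float.compare orders -0.0f strictly below 0.0f.
  val negBelowPos: Boolean = java.lang.Float.compare(nz, pz) < 0
  // Dividing by each zero yields infinities of opposite sign.
  val divByPos: Float = 1.0f / pz
  val divByNeg: Float = 1.0f / nz
  // NaN is unequal to itself under `==`, yet compare() treats NaN as equal to NaN.
  val nanSelfEqual: Boolean = nan == nan
  val nanCompare: Int = java.lang.Float.compare(nan, nan)

  def main(args: Array[String]): Unit =
    println(Seq(zerosEqual, sameBits, negBelowPos, divByPos, divByNeg, nanSelfEqual, nanCompare).mkString(", "))
}
```

An expression whose result depends on the sign of zero or on NaN handling can therefore pass with `0.0` alone and still misbehave with `-0.0`.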
[GitHub] [spark] AmplabJenkins commented on pull request #29518: [SPARK-32646][SQL][FOLLOWUP][test-hadoop2.7][test-hive1.2] ORC predicate pushdown should work with case-insensitive analysis
AmplabJenkins commented on pull request #29518: URL: https://github.com/apache/spark/pull/29518#issuecomment-678720546
[GitHub] [spark] SparkQA commented on pull request #29518: [SPARK-32646][SQL][FOLLOWUP][test-hadoop2.7][test-hive1.2] ORC predicate pushdown should work with case-insensitive analysis
SparkQA commented on pull request #29518: URL: https://github.com/apache/spark/pull/29518#issuecomment-678720465 **[Test build #127795 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127795/testReport)** for PR 29518 at commit [`960e695`](https://github.com/apache/spark/commit/960e6957e925d74e9ace8931275a952395d55165).
[GitHub] [spark] viirya opened a new pull request #29518: [SPARK-32646][SQL][FOLLOWUP][test-hadoop2.7][test-hive1.2] ORC predicate pushdown should work with case-insensitive analysis
viirya opened a new pull request #29518: URL: https://github.com/apache/spark/pull/29518 ### What changes were proposed in this pull request? This is a follow-up of #29457 to fix a compilation error. ### Why are the changes needed? Fix a compilation error under the hive-1.2 profile. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Unit test.
[GitHub] [spark] viirya closed pull request #29517: [DO-NOT-MERGE][SQL][BRANCH-3.0][test-hadoop2.7][test-hive1.2] Test HiveSerDeReadWriteSuite
viirya closed pull request #29517: URL: https://github.com/apache/spark/pull/29517
[GitHub] [spark] tanelk edited a comment on pull request #29515: [SPARK-32688][SQL][TESTS] Add special values to LiteralGenerator for float and double
tanelk edited a comment on pull request #29515: URL: https://github.com/apache/spark/pull/29515#issuecomment-678719262 > I think we can't commit a change that causes tests to fail of course. The fix of the tests would have to go with the fix in underlying code as needed. I meant that it could be fixed in another PR before this PR is merged.
[GitHub] [spark] viirya edited a comment on pull request #29513: [SPARK-32646][SQL][BRANCH-3.0][test-hadoop2.7][test-hive1.2] ORC predicate pushdown should work with case-insensitive analysis
viirya edited a comment on pull request #29513: URL: https://github.com/apache/spark/pull/29513#issuecomment-678719209 Err... I think these tests are already failing in the current branch-3.0. Please see https://github.com/apache/spark/pull/29517. I created SPARK-32689 to track it.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29515: [SPARK-32688][SQL][TESTS] Add special values to LiteralGenerator for float and double
AmplabJenkins removed a comment on pull request #29515: URL: https://github.com/apache/spark/pull/29515#issuecomment-678719289
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29516: [WIP][SPARK-32614][SQL] Don't apply comment processing if 'comment' unset for CSV
AmplabJenkins removed a comment on pull request #29516: URL: https://github.com/apache/spark/pull/29516#issuecomment-678719270
[GitHub] [spark] AmplabJenkins commented on pull request #29515: [SPARK-32688][SQL][TESTS] Add special values to LiteralGenerator for float and double
AmplabJenkins commented on pull request #29515: URL: https://github.com/apache/spark/pull/29515#issuecomment-678719289
[GitHub] [spark] tanelk commented on pull request #29515: [SPARK-32688][SQL][TESTS] Add special values to LiteralGenerator for float and double
tanelk commented on pull request #29515: URL: https://github.com/apache/spark/pull/29515#issuecomment-678719262 > I think we can't commit a change that causes tests to fail of course. The fix of the tests would have to go with the fix in underlying code as needed. I meant that it could be fixed before this PR is merged.
[GitHub] [spark] AmplabJenkins commented on pull request #29516: [WIP][SPARK-32614][SQL] Don't apply comment processing if 'comment' unset for CSV
AmplabJenkins commented on pull request #29516: URL: https://github.com/apache/spark/pull/29516#issuecomment-678719270
[GitHub] [spark] srowen commented on a change in pull request #29516: [WIP][SPARK-32614][SQL] Don't apply comment processing if 'comment' unset for CSV
srowen commented on a change in pull request #29516: URL: https://github.com/apache/spark/pull/29516#discussion_r475158855 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVExprUtils.scala ## @@ -25,16 +25,21 @@ object CSVExprUtils { * This is currently being used in CSV reading path and CSV schema inference. */ def filterCommentAndEmpty(iter: Iterator[String], options: CSVOptions): Iterator[String] = { -iter.filter { line => - line.trim.nonEmpty && !line.startsWith(options.comment.toString) +if (options.isCommentSet) { + val commentPrefix = options.comment.toString + iter.filter { line => +line.trim.nonEmpty && !line.startsWith(commentPrefix) + } +} else { + iter.filter(_.trim.nonEmpty) } } def skipComments(iter: Iterator[String], options: CSVOptions): Iterator[String] = { if (options.isCommentSet) { val commentPrefix = options.comment.toString iter.dropWhile { line => -line.trim.isEmpty || line.trim.startsWith(commentPrefix) +line.trim.isEmpty || line.startsWith(commentPrefix) Review comment: I think it's correct to _not_ trim the string that's checked to see if it starts with a comment, which is a slightly separate issue. `\u0000` can't be used as a comment char, but other non-printable chars _could_. ## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala ## @@ -1902,25 +1902,26 @@ abstract class CSVSuite extends QueryTest with SharedSparkSession with TestCsvDa test("SPARK-25387: bad input should not cause NPE") { val schema = StructType(StructField("a", IntegerType) :: Nil) -val input = spark.createDataset(Seq("\u0000\u0000\u0001234")) +val input = spark.createDataset(Seq("\u0001\u0000\u0001234")) Review comment: I think this test was wrong in two ways. First, it relied on, actually, ignoring lines starting with `\u0000`, which is the very bug we're fixing. You can see below it's asserting there is no result at all, when there should be _some_ result. 
## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala ## @@ -1902,25 +1902,26 @@ abstract class CSVSuite extends QueryTest with SharedSparkSession with TestCsvDa test("SPARK-25387: bad input should not cause NPE") { val schema = StructType(StructField("a", IntegerType) :: Nil) -val input = spark.createDataset(Seq("\u0000\u0000\u0001234")) +val input = spark.createDataset(Seq("\u0001\u0000\u0001234")) checkAnswer(spark.read.schema(schema).csv(input), Row(null)) checkAnswer(spark.read.option("multiLine", true).schema(schema).csv(input), Row(null)) -assert(spark.read.csv(input).collect().toSet == Set(Row())) +assert(spark.read.schema(schema).csv(input).collect().toSet == Set(Row(null))) } test("SPARK-31261: bad csv input with `columnNameCorruptRecord` should not cause NPE") { val schema = StructType( StructField("a", IntegerType) :: StructField("_corrupt_record", StringType) :: Nil) -val input = spark.createDataset(Seq("\u0000\u0000\u0001234")) +val input = spark.createDataset(Seq("\u0001\u0000\u0001234")) checkAnswer( spark.read .option("columnNameOfCorruptRecord", "_corrupt_record") .schema(schema) .csv(input), - Row(null, null)) -assert(spark.read.csv(input).collect().toSet == Set(Row())) + Row(null, "\u0001\u0000\u0001234")) Review comment: The other problem I think is that this was asserting there is no corrupt record -- no result at all -- when I think clearly the test should result in a single row with a corrupt record.
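The branch structure of the `filterCommentAndEmpty` change above can be sketched in plain Scala. This is a hypothetical, simplified stand-in, not Spark code: it models the `comment` option as an `Option[Char]` (`None` meaning "unset") rather than using Spark's `CSVOptions`.

```scala
object CommentFilterSketch {
  // `comment` is a hypothetical stand-in for the CSV `comment` option; None means unset.
  def filterCommentAndEmpty(lines: Iterator[String], comment: Option[Char]): Iterator[String] =
    comment match {
      case Some(c) =>
        val prefix = c.toString
        // Drop blank lines and lines that start with the comment prefix.
        // The prefix check runs on the untrimmed line, so a leading control
        // character used as the comment char is still detected.
        lines.filter(line => line.trim.nonEmpty && !line.startsWith(prefix))
      case None =>
        // No comment char configured: only blank lines are dropped, so a line
        // that begins with a NUL (U+0000) or other control character survives.
        lines.filter(_.trim.nonEmpty)
    }

  def main(args: Array[String]): Unit = {
    val input = Seq("\u0001\u0000\u0001234", "", "#skip me", "a,b")
    println(filterCommentAndEmpty(input.iterator, None).size)      // 3
    println(filterCommentAndEmpty(input.iterator, Some('#')).size) // 2
  }
}
```

The line of control characters is not considered blank. `String.trim` strips the leading control characters but leaves `"234"`, so the line reaches the parser and yields a row, as the updated tests expect. It is no longer silently dropped.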
[GitHub] [spark] SparkQA commented on pull request #29516: [WIP][SPARK-32614][SQL] Don't apply comment processing if 'comment' unset for CSV
SparkQA commented on pull request #29516: URL: https://github.com/apache/spark/pull/29516#issuecomment-678719205 **[Test build #127793 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127793/testReport)** for PR 29516 at commit [`87e8b65`](https://github.com/apache/spark/commit/87e8b65c67fce1bef56e8071e42deca9954fcff8).
[GitHub] [spark] SparkQA commented on pull request #29515: [SPARK-32688][SQL][TESTS] Add special values to LiteralGenerator for float and double
SparkQA commented on pull request #29515: URL: https://github.com/apache/spark/pull/29515#issuecomment-678719210 **[Test build #127794 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127794/testReport)** for PR 29515 at commit [`77d63ac`](https://github.com/apache/spark/commit/77d63ac4b9356f2d8db407e05b763f3b4107c80d).
[GitHub] [spark] viirya commented on pull request #29513: [SPARK-32646][SQL][BRANCH-3.0][test-hadoop2.7][test-hive1.2] ORC predicate pushdown should work with case-insensitive analysis
viirya commented on pull request #29513: URL: https://github.com/apache/spark/pull/29513#issuecomment-678719209 Err... I think these tests are already failing in the current branch-3.0. Please see https://github.com/apache/spark/pull/29517.
[GitHub] [spark] tanelk commented on pull request #29515: [SPARK-32688][SQL][TESTS] Add special values to LiteralGenerator for float and double
tanelk commented on pull request #29515: URL: https://github.com/apache/spark/pull/29515#issuecomment-678719160 retest this please
[GitHub] [spark] tanelk commented on a change in pull request #29515: [SPARK-32688][SQL][TESTS] Add special values to LiteralGenerator for float and double
tanelk commented on a change in pull request #29515: URL: https://github.com/apache/spark/pull/29515#discussion_r475158533 ## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/LiteralGenerator.scala ## @@ -70,14 +70,22 @@ object LiteralGenerator { lazy val floatLiteralGen: Gen[Literal] = for { - f <- Gen.chooseNum(Float.MinValue / 2, Float.MaxValue / 2, -Float.NaN, Float.PositiveInfinity, Float.NegativeInfinity) + f <- Gen.oneOf( +Gen.oneOf( + Float.NaN, Float.PositiveInfinity, Float.NegativeInfinity, Float.MinPositiveValue, Review comment: Sure thing
[GitHub] [spark] srowen commented on pull request #29515: [SPARK-32688][SQL][TESTS] Add special values to LiteralGenerator for float and double
srowen commented on pull request #29515: URL: https://github.com/apache/spark/pull/29515#issuecomment-678718652 I think we can't commit a change that causes tests to fail of course. The fix of the tests would have to go with the fix in underlying code as needed.
[GitHub] [spark] tanelk commented on pull request #29515: [SPARK-32688][SQL][TESTS] Add special values to LiteralGenerator for float and double
tanelk commented on pull request #29515: URL: https://github.com/apache/spark/pull/29515#issuecomment-678718410 > **[Test build #127791 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127791/testReport)** for PR 29515 at commit [`8c8313c`](https://github.com/apache/spark/commit/8c8313c7689c04ce781011c420f193ac2a14d9d9). > > * This patch **fails Spark unit tests**. > * This patch merges cleanly. > * This patch adds no public classes. This failure is uncovered by this change, not caused by it. Should it be fixed in a separate pull request? I'm not sure which of the two is the correct behavior. There could be more failures like this, but due to the random nature of the generated values, they might not show up in every build.
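A rough estimate of why such failures may appear only intermittently. It assumes three things:

- the generator shape quoted in this thread, `Gen.oneOf(Gen.oneOf(<8 special values>), Arbitrary.arbFloat.arbitrary)`, with each branch picked with equal probability;
- an expression whose two inputs are drawn independently;
- 100 checks per property, which is an illustrative assumption and not Spark's actual setting.

Under these assumptions, a particular special value appears with probability 1/2 × 1/8 = 1/16 per draw. A particular pair such as (`0.0`, `-0.0`) therefore appears with probability 1/256 per check.

```scala
object MissProbability {
  // Probability that one draw yields one particular special value:
  // 1/2 for picking the special-value branch, times 1/8 for the value itself.
  val pValue: Double = 0.5 * (1.0 / 8)

  // Probability that n independent checks never produce one particular
  // (a, b) input pair, with a and b drawn independently.
  def missPair(n: Int): Double = math.pow(1.0 - pValue * pValue, n)

  def main(args: Array[String]): Unit =
    // n = 100 is an assumed number of checks per property.
    println(missPair(100))
}
```

Under these assumptions `missPair(100)` is about 0.676, so roughly two runs out of three would never exercise that specific pair.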
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29517: [DO-NOT-MERGE][SQL][BRANCH-3.0][test-hadoop2.7][test-hive1.2] Test HiveSerDeReadWriteSuite
AmplabJenkins removed a comment on pull request #29517: URL: https://github.com/apache/spark/pull/29517#issuecomment-678718120 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/127792/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29517: [DO-NOT-MERGE][SQL][BRANCH-3.0][test-hadoop2.7][test-hive1.2] Test HiveSerDeReadWriteSuite
AmplabJenkins removed a comment on pull request #29517: URL: https://github.com/apache/spark/pull/29517#issuecomment-678718118 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA removed a comment on pull request #29517: [DO-NOT-MERGE][SQL][BRANCH-3.0][test-hadoop2.7][test-hive1.2] Test HiveSerDeReadWriteSuite
SparkQA removed a comment on pull request #29517: URL: https://github.com/apache/spark/pull/29517#issuecomment-678710787 **[Test build #127792 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127792/testReport)** for PR 29517 at commit [`0ef55d3`](https://github.com/apache/spark/commit/0ef55d35cf52b0e9d2fcc86a1e3530c7509ada93).
[GitHub] [spark] AmplabJenkins commented on pull request #29517: [DO-NOT-MERGE][SQL][BRANCH-3.0][test-hadoop2.7][test-hive1.2] Test HiveSerDeReadWriteSuite
AmplabJenkins commented on pull request #29517: URL: https://github.com/apache/spark/pull/29517#issuecomment-678718118
[GitHub] [spark] SparkQA commented on pull request #29517: [DO-NOT-MERGE][SQL][BRANCH-3.0][test-hadoop2.7][test-hive1.2] Test HiveSerDeReadWriteSuite
SparkQA commented on pull request #29517: URL: https://github.com/apache/spark/pull/29517#issuecomment-678718091 **[Test build #127792 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127792/testReport)** for PR 29517 at commit [`0ef55d3`](https://github.com/apache/spark/commit/0ef55d35cf52b0e9d2fcc86a1e3530c7509ada93). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] tanelk commented on a change in pull request #29515: [SPARK-32688][SQL][TESTS] Add special values to LiteralGenerator for float and double
tanelk commented on a change in pull request #29515: URL: https://github.com/apache/spark/pull/29515#discussion_r475157521 ## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/LiteralGenerator.scala ## @@ -70,14 +70,22 @@ object LiteralGenerator { lazy val floatLiteralGen: Gen[Literal] = for { - f <- Gen.chooseNum(Float.MinValue / 2, Float.MaxValue / 2, -Float.NaN, Float.PositiveInfinity, Float.NegativeInfinity) + f <- Gen.oneOf( +Gen.oneOf( + Float.NaN, Float.PositiveInfinity, Float.NegativeInfinity, Float.MinPositiveValue, + 0.0f, -0.0f, 1.0f, -1.0f), +Arbitrary.arbFloat.arbitrary Review comment: It generates all possible floating-point values with equal likelihood, except for the special values `Float.NaN, Float.PositiveInfinity, Float.NegativeInfinity`, which `Arbitrary.arbFloat.arbitrary` never returns.
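The distribution described here can be made concrete by assuming "equally likely" means uniform over bit patterns. Under that assumption, drawing the sign, exponent and mantissa fields uniformly, with the exponent kept below the all-ones value `0xFF`, covers every finite float. It never produces NaN or an infinity, because those are exactly the encodings with an all-ones exponent. This is a sketch of that idea using only `scala.util.Random`. It is not ScalaCheck's `arbFloat` source, and the object and method names are hypothetical.

```scala
import scala.util.Random

object FloatGenSketch {
  private val specials: IndexedSeq[Float] = IndexedSeq(
    Float.NaN, Float.PositiveInfinity, Float.NegativeInfinity, Float.MinPositiveValue,
    0.0f, -0.0f, 1.0f, -1.0f)

  // A finite float assembled from uniformly random IEEE 754 fields.
  def finiteFloat(rnd: Random): Float = {
    val sign = rnd.nextInt(2)           // 1 sign bit
    val exponent = rnd.nextInt(0xFF)    // 0 .. 0xFE; 0xFF (NaN/Infinity) is excluded
    val mantissa = rnd.nextInt(1 << 23) // 23 mantissa bits
    java.lang.Float.intBitsToFloat((sign << 31) | (exponent << 23) | mantissa)
  }

  // Mirrors the shape of Gen.oneOf(Gen.oneOf(specials...), arbitrary):
  // half of the draws pick a listed special value, half a random finite float.
  def draw(rnd: Random): Float =
    if (rnd.nextBoolean()) specials(rnd.nextInt(specials.length)) else finiteFloat(rnd)

  def main(args: Array[String]): Unit = {
    val rnd = new Random(42)
    val sample = Seq.fill(10000)(draw(rnd))
    println(s"NaN: ${sample.count(x => java.lang.Float.isNaN(x))}")
    println(s"infinite: ${sample.count(x => java.lang.Float.isInfinite(x))}")
  }
}
```

Uniform by bit pattern is not uniform on the number line. About half of the finite draws have a magnitude below 1.0, and a draw landing exactly on `0.0f`, `-0.0f` or `±1.0f` is about as unlikely as landing on any other single value. That is why listing those values separately still matters.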
[GitHub] [spark] tanelk commented on a change in pull request #29515: [SPARK-32688][SQL][TESTS] Add special values to LiteralGenerator for float and double
tanelk commented on a change in pull request #29515: URL: https://github.com/apache/spark/pull/29515#discussion_r475157086
## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/LiteralGenerator.scala
```diff
@@ -70,14 +70,22 @@ object LiteralGenerator {
   lazy val floatLiteralGen: Gen[Literal] =
     for {
-      f <- Gen.chooseNum(Float.MinValue / 2, Float.MaxValue / 2,
-        Float.NaN, Float.PositiveInfinity, Float.NegativeInfinity)
+      f <- Gen.oneOf(
+        Gen.oneOf(
+          Float.NaN, Float.PositiveInfinity, Float.NegativeInfinity, Float.MinPositiveValue,
+          0.0f, -0.0f, 1.0f, -1.0f),
```
Review comment:
They aren't special in the sense that `Arbitrary.arbFloat.arbitrary` can generate them. They are special in the sense that a function is more likely to act weirdly at these values, for example `log1p`.
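The snippet below shows why such inputs are worth hitting on purpose. It evaluates `java.lang.Math.log1p` at several of them, as doubles, since that method takes a `double`.

According to the method's documented special cases:
* `-1.0` maps to negative infinity.
* NaN, and anything below `-1` (including negative infinity), gives NaN.
* A zero keeps its sign.

The object name is hypothetical.

```scala
// Evaluate java.lang.Math.log1p at edge-case inputs like those in the diff.
object Log1pEdgeCases {
  val inputs: Seq[Double] = Seq(
    Double.NaN, Double.PositiveInfinity, Double.NegativeInfinity,
    -1.0, -0.0, 0.0, 1.0)

  // Pairs of (input, log1p(input)).
  def table: Seq[(Double, Double)] = inputs.map(x => x -> java.lang.Math.log1p(x))

  def main(args: Array[String]): Unit =
    table.foreach { case (x, y) => println(s"log1p($x) = $y") }
}
```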
[GitHub] [spark] sunchao commented on a change in pull request #29387: [SPARK-32481][CORE][SQL] Support truncate table to move data to trash
sunchao commented on a change in pull request #29387: URL: https://github.com/apache/spark/pull/29387#discussion_r475156812
## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala
```diff
@@ -3101,6 +3101,28 @@ abstract class DDLSuite extends QueryTest with SQLTestUtils {
         assert(spark.sessionState.catalog.isRegisteredFunction(rand))
       }
     }
+
+  test("Move data to trash on truncate table if enabled") {
+    withTable("tab1") {
+      withSQLConf(SQLConf.TRUNCATE_TRASH_ENABLED.key -> "true") {
+        sql("CREATE TABLE tab1 (col INT) USING parquet")
+        sql("INSERT INTO tab1 SELECT 1")
+
+        val tablePath = new Path(spark.sessionState.catalog
+          .getTableMetadata(TableIdentifier("tab1")).storage.locationUri.get)
+        val hadoopConf = spark.sessionState.newHadoopConf()
+        val fs = tablePath.getFileSystem(hadoopConf)
+        // trash interval should be configured from hadoop side
+        hadoopConf.setInt("fs.trash.Interval", 5)
+
+        val trashRoot = fs.getTrashRoot(tablePath)
```
Review comment:
Yes, a default implementation is defined in `FileSystem`; it calls `getHomeDirectory`, which is implemented in the same class.

Even though it is supported, the trash mechanism seems less useful in cloud object stores like S3. Those stores have no native rename, so moving data to trash is much more expensive. However, users can disable it through the configs given here and in Hadoop itself.
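The following is a simplified, string-only model of where truncated files would end up under Hadoop's default trash layout. It rests on three assumptions:
* The default `FileSystem.getTrashRoot` described above is the home directory plus `.Trash`.
* The home directory has the form `/user/<name>`.
* Files are moved into the `Current` directory used by Hadoop's default trash policy.

The user and paths are made up. Real code should go through the `FileSystem` API rather than build strings.

On a store without a native rename, such as S3, each of these moves becomes a copy plus a delete. That is the extra cost the comment refers to.

```scala
// Simplified, string-only model of Hadoop's default trash layout (assumption, not the real API).
object TrashPathSketch {
  // Assumed default home directory: "/user/<name>".
  def homeDirectory(user: String): String = s"/user/$user"

  // Default trash root as described in the comment: "<home>/.Trash".
  def trashRoot(user: String): String = s"${homeDirectory(user)}/.Trash"

  // A trashed path keeps its full original path under "<trashRoot>/Current".
  def trashLocation(user: String, originalPath: String): String = {
    require(originalPath.startsWith("/"), "expected an absolute path")
    s"${trashRoot(user)}/Current$originalPath"
  }
}
```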
[GitHub] [spark] c21 commented on a change in pull request #29074: [SPARK-32282][SQL] Improve EnsureRquirement.reorderJoinKeys to handle more scenarios such as PartitioningCollection
c21 commented on a change in pull request #29074: URL: https://github.com/apache/spark/pull/29074#discussion_r475155778
## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/exchange/EnsureRequirementsSuite.scala
```diff
@@ -0,0 +1,146 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.exchange
+
+import org.apache.spark.sql.catalyst.expressions.Literal
+import org.apache.spark.sql.catalyst.plans.Inner
+import org.apache.spark.sql.catalyst.plans.physical.{HashPartitioning, PartitioningCollection}
+import org.apache.spark.sql.execution.{DummySparkPlan, SortExec}
+import org.apache.spark.sql.execution.joins.SortMergeJoinExec
+import org.apache.spark.sql.test.SharedSparkSession
+
+class EnsureRequirementsSuite extends SharedSparkSession {
+  private val exprA = Literal(1)
+  private val exprB = Literal(2)
+  private val exprC = Literal(3)
+
+  test("reorder should handle PartitioningCollection") {
+    val plan1 = DummySparkPlan(
+      outputPartitioning = PartitioningCollection(Seq(
+        HashPartitioning(exprA :: exprB :: Nil, 5),
+        HashPartitioning(exprA :: Nil, 5))))
+    val plan2 = DummySparkPlan()
+
+    // Test PartitioningCollection on the left side of join.
+    val smjExec1 = SortMergeJoinExec(
+      exprB :: exprA :: Nil, exprA :: exprB :: Nil, Inner, None, plan1, plan2)
+    EnsureRequirements(spark.sessionState.conf).apply(smjExec1) match {
+      case SortMergeJoinExec(leftKeys, rightKeys, _, _,
+        SortExec(_, _,
+          DummySparkPlan(_, _, PartitioningCollection(leftPartitionings), _, _), _),
+        SortExec(_, _,
+          ShuffleExchangeExec(HashPartitioning(rightPartitioningExpressions, _), _, _), _), _) =>
+        assert(leftKeys !== smjExec1.leftKeys)
+        assert(rightKeys !== smjExec1.rightKeys)
+        assert(leftKeys === leftPartitionings.head.asInstanceOf[HashPartitioning].expressions)
+        assert(rightKeys === rightPartitioningExpressions)
+      case other => fail(other.toString)
+    }
+
+    // Test PartitioningCollection on the right side of join.
+    val smjExec2 = SortMergeJoinExec(
+      exprA :: exprB :: Nil, exprB :: exprA :: Nil, Inner, None, plan2, plan1)
+    EnsureRequirements(spark.sessionState.conf).apply(smjExec2) match {
+      case SortMergeJoinExec(leftKeys, rightKeys, _, _,
+        SortExec(_, _,
+          ShuffleExchangeExec(HashPartitioning(leftPartitioningExpressions, _), _, _), _),
+        SortExec(_, _,
+          DummySparkPlan(_, _, PartitioningCollection(rightPartitionings), _, _), _), _) =>
+        assert(leftKeys !== smjExec2.leftKeys)
+        assert(rightKeys !== smjExec2.rightKeys)
+        assert(leftKeys === leftPartitioningExpressions)
+        assert(rightKeys === rightPartitionings.head.asInstanceOf[HashPartitioning].expressions)
+      case other => fail(other.toString)
+    }
+
+    // Both sides are PartitioningCollection, but left side cannot be reordered to match
+    // and it should fall back to the right side.
+    val smjExec3 = SortMergeJoinExec(
+      exprA :: exprC :: Nil, exprB :: exprA :: Nil, Inner, None, plan1, plan1)
+    EnsureRequirements(spark.sessionState.conf).apply(smjExec3) match {
+      case SortMergeJoinExec(leftKeys, rightKeys, _, _,
+        SortExec(_, _,
+          ShuffleExchangeExec(HashPartitioning(leftPartitioningExpressions, _), _, _), _),
+        SortExec(_, _,
+          DummySparkPlan(_, _, PartitioningCollection(rightPartitionings), _, _), _), _) =>
+        assert(leftKeys !== smjExec3.leftKeys)
+        assert(rightKeys !== smjExec3.rightKeys)
+        assert(leftKeys === leftPartitioningExpressions)
+        assert(rightKeys === rightPartitionings.head.asInstanceOf[HashPartitioning].expressions)
+      case other => fail(other.toString)
+    }
+  }
+
+  test("reorder should fallback to the other side partitioning") {
+    val plan1 = DummySparkPlan(
+      outputPartitioning = HashPartitioning(exprA :: exprB :: exprC :: Nil, 5))
+    val plan2 = DummySparkPlan(
+      outputPartitioning = HashPartitioning(exprB :: exprC :: Nil, 5))
+
+    // Test fallback to the right side, which has PartitioningCollection.
```
Review comment:
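The behaviour these test cases check can be modeled outside Spark, with strings standing in for expressions. The idea is to permute the join-key pairs so that one side's keys line up with a child's hash-partitioning expressions. Each partitioning in a `PartitioningCollection` is tried in turn, and the other side is used as a fallback when the first side cannot be matched.

This is a hypothetical simplification, not the `EnsureRequirements` implementation. The names, the signature, and the no-duplicate-keys assumption belong to this sketch only.

```scala
object ReorderJoinKeysSketch {
  // Expressions are modeled as plain strings (hypothetical simplification).
  type Expr = String

  // If `expected` is a permutation of `side`, reorder both key lists pairwise so that
  // `side` lines up with `expected`; otherwise return None. Assumes no duplicate keys.
  def reorderTo(
      expected: Seq[Expr],
      side: Seq[Expr],
      leftKeys: Seq[Expr],
      rightKeys: Seq[Expr]): Option[(Seq[Expr], Seq[Expr])] =
    if (expected.length != side.length || expected.toSet != side.toSet) None
    else {
      val order = expected.map(e => side.indexOf(e))
      Some((order.map(i => leftKeys(i)), order.map(i => rightKeys(i))))
    }

  // Try every partitioning reported by the left child, then every one reported by the
  // right child; keep the original key order if none of them can be matched.
  def reorderJoinKeys(
      leftKeys: Seq[Expr],
      rightKeys: Seq[Expr],
      leftPartitionings: Seq[Seq[Expr]],
      rightPartitionings: Seq[Seq[Expr]]): (Seq[Expr], Seq[Expr]) = {
    val candidates =
      leftPartitionings.iterator.flatMap(p => reorderTo(p, leftKeys, leftKeys, rightKeys)) ++
        rightPartitionings.iterator.flatMap(p => reorderTo(p, rightKeys, leftKeys, rightKeys))
    if (candidates.hasNext) candidates.next() else (leftKeys, rightKeys)
  }
}
```

For the first case in the test (`leftKeys = B, A`, `rightKeys = A, B`, left partitionings `[A, B]` and `[A]`), this returns `(A, B)` and `(B, A)`.

For the third case, neither left partitioning matches `A, C`. The right-side partitioning `[A, B]` is used instead, giving `(C, A)` and `(A, B)`.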
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29515: [SPARK-32688][SQL][TESTS] Add special values to LiteralGenerator for float and double
AmplabJenkins removed a comment on pull request #29515: URL: https://github.com/apache/spark/pull/29515#issuecomment-678716609 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/127791/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29515: [SPARK-32688][SQL][TESTS] Add special values to LiteralGenerator for float and double
AmplabJenkins removed a comment on pull request #29515: URL: https://github.com/apache/spark/pull/29515#issuecomment-678716607 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on pull request #29515: [SPARK-32688][SQL][TESTS] Add special values to LiteralGenerator for float and double
AmplabJenkins commented on pull request #29515: URL: https://github.com/apache/spark/pull/29515#issuecomment-678716607
[GitHub] [spark] SparkQA removed a comment on pull request #29515: [SPARK-32688][SQL][TESTS] Add special values to LiteralGenerator for float and double
SparkQA removed a comment on pull request #29515: URL: https://github.com/apache/spark/pull/29515#issuecomment-678707898 **[Test build #127791 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127791/testReport)** for PR 29515 at commit [`8c8313c`](https://github.com/apache/spark/commit/8c8313c7689c04ce781011c420f193ac2a14d9d9).
[GitHub] [spark] SparkQA commented on pull request #29515: [SPARK-32688][SQL][TESTS] Add special values to LiteralGenerator for float and double
SparkQA commented on pull request #29515: URL: https://github.com/apache/spark/pull/29515#issuecomment-678716570
**[Test build #127791 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127791/testReport)** for PR 29515 at commit [`8c8313c`](https://github.com/apache/spark/commit/8c8313c7689c04ce781011c420f193ac2a14d9d9).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] github-actions[bot] closed pull request #26343: [SPARK-29683][YARN] Job will fail due to executor failures all available nodes are blacklisted
github-actions[bot] closed pull request #26343: URL: https://github.com/apache/spark/pull/26343
[GitHub] [spark] github-actions[bot] commented on pull request #27968: [SPARK-31202][CORE]Improve SizeEstimator for AppendOnlyMap
github-actions[bot] commented on pull request #27968: URL: https://github.com/apache/spark/pull/27968#issuecomment-678714066 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!
[GitHub] [spark] github-actions[bot] commented on pull request #27568: [SPARK-30821][K8S]Handle container failure in executor pods with multiple containers
github-actions[bot] commented on pull request #27568: URL: https://github.com/apache/spark/pull/27568#issuecomment-678714068 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!
[GitHub] [spark] leanken commented on pull request #29503: [SPARK-32678][SQL] Rename EmptyHashedRelationWithAllNullKeys and simplify NAAJ generated code
leanken commented on pull request #29503: URL: https://github.com/apache/spark/pull/29503#issuecomment-678713927 @viirya Test passed, is it ok to merge?
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29516: [SPARK-32614][SQL] Don't apply comment processing if 'comment' unset for CSV
AmplabJenkins removed a comment on pull request #29516: URL: https://github.com/apache/spark/pull/29516#issuecomment-678712274 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/127788/ Test FAILed.
[GitHub] [spark] SparkQA removed a comment on pull request #29516: [SPARK-32614][SQL] Don't apply comment processing if 'comment' unset for CSV
SparkQA removed a comment on pull request #29516: URL: https://github.com/apache/spark/pull/29516#issuecomment-678701603 **[Test build #127788 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127788/testReport)** for PR 29516 at commit [`6358727`](https://github.com/apache/spark/commit/6358727eb1c5bb92715a4448f7179727893937b3).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29516: [SPARK-32614][SQL] Don't apply comment processing if 'comment' unset for CSV
AmplabJenkins removed a comment on pull request #29516: URL: https://github.com/apache/spark/pull/29516#issuecomment-678712272 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on pull request #29516: [SPARK-32614][SQL] Don't apply comment processing if 'comment' unset for CSV
AmplabJenkins commented on pull request #29516: URL: https://github.com/apache/spark/pull/29516#issuecomment-678712272
[GitHub] [spark] SparkQA commented on pull request #29516: [SPARK-32614][SQL] Don't apply comment processing if 'comment' unset for CSV
SparkQA commented on pull request #29516: URL: https://github.com/apache/spark/pull/29516#issuecomment-678712244
**[Test build #127788 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127788/testReport)** for PR 29516 at commit [`6358727`](https://github.com/apache/spark/commit/6358727eb1c5bb92715a4448f7179727893937b3).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29517: [DO-NOT-MERGE][SQL][BRANCH-3.0][test-hadoop2.7][test-hive1.2] Test HiveSerDeReadWriteSuite
AmplabJenkins removed a comment on pull request #29517: URL: https://github.com/apache/spark/pull/29517#issuecomment-678710897
[GitHub] [spark] AmplabJenkins commented on pull request #29517: [DO-NOT-MERGE][SQL][BRANCH-3.0][test-hadoop2.7][test-hive1.2] Test HiveSerDeReadWriteSuite
AmplabJenkins commented on pull request #29517: URL: https://github.com/apache/spark/pull/29517#issuecomment-678710897
[GitHub] [spark] SparkQA commented on pull request #29517: [DO-NOT-MERGE][SQL][BRANCH-3.0][test-hadoop2.7][test-hive1.2] Test HiveSerDeReadWriteSuite
SparkQA commented on pull request #29517: URL: https://github.com/apache/spark/pull/29517#issuecomment-678710787 **[Test build #127792 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127792/testReport)** for PR 29517 at commit [`0ef55d3`](https://github.com/apache/spark/commit/0ef55d35cf52b0e9d2fcc86a1e3530c7509ada93).
[GitHub] [spark] c21 removed a comment on pull request #29097: [SPARK-32299] [SQL] Decide SMJ Join Orientation adaptively
c21 removed a comment on pull request #29097: URL: https://github.com/apache/spark/pull/29097#issuecomment-678710762 I have a similar concern to @gatorsmile. I think this also depends on the run-time cardinality of the data. For example, suppose the left side is smaller than the right side, but every row on the left side has the same key while every row on the right side is unique. We should buffer the right side here even though it is larger, because if we buffer the left side, we essentially need to read the entire left side into the buffer. In addition, this PR swaps the left and right sides based on total size. However, at run time, each task/partition can have a different amount of data on the left and right sides. I think simply swapping the left and right sides here might cause some tasks to regress while others improve.
[GitHub] [spark] c21 commented on pull request #29097: [SPARK-32299] [SQL] Decide SMJ Join Orientation adaptively
c21 commented on pull request #29097: URL: https://github.com/apache/spark/pull/29097#issuecomment-678710762 I have a similar concern to @gatorsmile. I think this also depends on the run-time cardinality of the data. For example, suppose the left side is smaller than the right side, but every row on the left side has the same key while every row on the right side is unique. We should buffer the right side here even though it is larger, because if we buffer the left side, we essentially need to read the entire left side into the buffer. In addition, this PR swaps the left and right sides based on total size. However, at run time, each task/partition can have a different amount of data on the left and right sides. I think simply swapping the left and right sides here might cause some tasks to regress while others improve.
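This toy model illustrates the cardinality point. In a sort-merge join, the rows buffered for a streamed key are all the rows on the buffered side with that key. The peak buffer therefore depends on the largest matching key group, not on the side's total size.

The function and data below are hypothetical illustrations for an inner join, not Spark code.

```scala
object SmjBufferSketch {
  // Peak number of rows the buffered side must hold at once in an inner sort-merge join:
  // the largest group of equal keys on the buffered side whose key also appears on the
  // streamed side.
  def peakBufferedRows[K](streamed: Seq[K], buffered: Seq[K]): Int = {
    val streamedKeys = streamed.toSet
    val groupSizes = buffered.groupBy(identity).collect {
      case (k, rows) if streamedKeys.contains(k) => rows.size
    }
    if (groupSizes.isEmpty) 0 else groupSizes.max
  }
}
```

Suppose the left side has 1,000 rows that all share one key and the right side has 100,000 unique rows. Buffering the smaller left side then holds 1,000 rows at once, while buffering the larger right side holds only one.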
[GitHub] [spark] viirya opened a new pull request #29517: [DO-NOT-MERGE][SQL][BRANCH-3.0][test-hadoop2.7][test-hive1.2] Test HiveSerDeReadWriteSuite
viirya opened a new pull request #29517: URL: https://github.com/apache/spark/pull/29517 This is just used to run tests against hadoop2.7 + hive1.2 on the branch-3.0 branch. Will close it after testing.
[GitHub] [spark] viirya commented on pull request #29513: [SPARK-32646][SQL][BRANCH-3.0][test-hadoop2.7][test-hive1.2] ORC predicate pushdown should work with case-insensitive analysis
viirya commented on pull request #29513: URL: https://github.com/apache/spark/pull/29513#issuecomment-678710381
Not sure if these errors are related. For example, for `org.apache.spark.sql.hive.execution.HiveSerDeReadWriteSuite.Read/Write Hive PARQUET serde table`, this is the query plan:
```
== Parsed Logical Plan ==
'UnresolvedRelation [hive_serde]

== Analyzed Logical Plan ==
c1: date
SubqueryAlias spark_catalog.default.hive_serde
+- HiveTableRelation `default`.`hive_serde`, org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe, [c1#40752]

== Optimized Logical Plan ==
HiveTableRelation `default`.`hive_serde`, org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe, [c1#40752]

== Physical Plan ==
Scan hive default.hive_serde [c1#40752], HiveTableRelation `default`.`hive_serde`, org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe, [c1#40752]
```
It is unrelated to ORC, and there is no pushdown predicate. BTW, I cannot reproduce the errors locally.
[GitHub] [spark] srowen commented on a change in pull request #29515: [SPARK-32688][SQL][TESTS] Add special values to LiteralGenerator for float and double
srowen commented on a change in pull request #29515: URL: https://github.com/apache/spark/pull/29515#discussion_r475148456
## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/LiteralGenerator.scala
```diff
@@ -70,14 +70,22 @@ object LiteralGenerator {
   lazy val floatLiteralGen: Gen[Literal] =
     for {
-      f <- Gen.chooseNum(Float.MinValue / 2, Float.MaxValue / 2,
-        Float.NaN, Float.PositiveInfinity, Float.NegativeInfinity)
+      f <- Gen.oneOf(
+        Gen.oneOf(
+          Float.NaN, Float.PositiveInfinity, Float.NegativeInfinity, Float.MinPositiveValue,
```
Review comment:
Do you want `MaxValue` in here too, as the largest finite float? Same for double.
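This quick check shows why `MaxValue` is an interesting boundary. One representable step above it is already infinity, and adding it to itself overflows. By contrast, the halved bounds used by the old `Gen.chooseNum` call sum back to exactly `Float.MaxValue`. (That the halving was meant to avoid overflow is an assumption; the diff does not say.)

Only standard JVM floating-point behaviour is used, and the object name is hypothetical.

```scala
object MaxValueBoundary {
  // The next representable value above the largest finite float/double is infinity.
  val floatNextUp: Float = java.lang.Math.nextUp(Float.MaxValue)
  val doubleNextUp: Double = java.lang.Math.nextUp(Double.MaxValue)

  // Adding MaxValue to itself overflows to positive infinity.
  val floatOverflow: Float = Float.MaxValue + Float.MaxValue

  // Halving only lowers the exponent, so it is exact and the halves sum back to MaxValue.
  val halvesSum: Float = Float.MaxValue / 2 + Float.MaxValue / 2
}
```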
[GitHub] [spark] SparkQA commented on pull request #29515: [SPARK-32688][SQL][TESTS] Add special values to LiteralGenerator for float and double
SparkQA commented on pull request #29515: URL: https://github.com/apache/spark/pull/29515#issuecomment-678707898 **[Test build #127791 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127791/testReport)** for PR 29515 at commit [`8c8313c`](https://github.com/apache/spark/commit/8c8313c7689c04ce781011c420f193ac2a14d9d9).
[GitHub] [spark] maropu commented on pull request #29515: [SPARK-32688][SQL][TESTS] Add special values to LiteralGenerator for float and double
maropu commented on pull request #29515: URL: https://github.com/apache/spark/pull/29515#issuecomment-678707533 also cc: @srowen