[GitHub] [spark] cloud-fan closed pull request #29531: Revert "[SPARK-32412][SQL] Unify error handling for spark thrift serv…
cloud-fan closed pull request #29531: URL: https://github.com/apache/spark/pull/29531 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] beliefer commented on a change in pull request #28685: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function
beliefer commented on a change in pull request #28685: URL: https://github.com/apache/spark/pull/28685#discussion_r476195407 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowExecBase.scala ## @@ -137,7 +137,11 @@ trait WindowExecBase extends UnaryExecNode { function match { case AggregateExpression(f, _, _, _, _) => collect("AGGREGATE", frame, e, f) case f: AggregateWindowFunction => collect("AGGREGATE", frame, e, f) -case f: OffsetWindowFunction => collect("OFFSET", frame, e, f) +case f: OffsetWindowFunction => if (f.startWithCurrentRow) { + collect("OFFSET", frame, e, f) +} else { + collect("WHOLE_OFFSET", frame, e, f) Review comment: ok This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] cloud-fan commented on pull request #29531: Revert "[SPARK-32412][SQL] Unify error handling for spark thrift serv…
cloud-fan commented on pull request #29531: URL: https://github.com/apache/spark/pull/29531#issuecomment-679786678 thanks, merging to master! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] cloud-fan commented on a change in pull request #29535: [SPARK-32592][SQL] Make DataFrameReader.table take the specified options
cloud-fan commented on a change in pull request #29535: URL: https://github.com/apache/spark/pull/29535#discussion_r476194909 ## File path: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala ## @@ -822,6 +821,8 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging { */ def table(tableName: String): DataFrame = { assertNoSpecifiedSchema("table") +for ((k, v) <- this.extraOptions) + sparkSession.conf.set(k, v) Review comment: We need to clearly define the behavior here. I think the options don't make sense if we read temp view, view, or v1 tables in the session catalog. For v1 table, they are created with `CREATE TABLE ... USING ... OPTIONS ...`, so the options are already there and it's confusing to overwrite them per scan. We do need to keep the scan options when reading v2 tables, but I don't think session conf is the right channel. I think we should put the options in `UnresolvedRelation` and apply these options when we resolve `UnresolvedRelation` to v2 relations. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] Ngone51 commented on a change in pull request #29270: [SPARK-32466][TEST][SQL] Add PlanStabilitySuite to detect SparkPlan regression
Ngone51 commented on a change in pull request #29270: URL: https://github.com/apache/spark/pull/29270#discussion_r476193874 ## File path: sql/core/src/test/scala/org/apache/spark/sql/PlanStabilitySuite.scala ## @@ -0,0 +1,327 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql + +import java.io.File +import java.nio.charset.StandardCharsets + +import scala.collection.mutable + +import org.apache.commons.io.FileUtils + +import org.apache.spark.sql.catalyst.expressions.AttributeSet +import org.apache.spark.sql.catalyst.util._ +import org.apache.spark.sql.execution._ +import org.apache.spark.sql.execution.adaptive.DisableAdaptiveExecutionSuite +import org.apache.spark.sql.execution.exchange.{Exchange, ReusedExchangeExec} +import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.tags.ExtendedSQLTest + +// scalastyle:off line.size.limit +/** + * Check that TPC-DS SparkPlans don't change. + * If there are plan differences, the error message looks like this: + * Plans did not match: + * last approved simplified plan: /path/to/tpcds-plan-stability/approved-plans-xxx/q1/simplified.txt + * last approved explain plan: /path/to/tpcds-plan-stability/approved-plans-xxx/q1/explain.txt + * [last approved simplified plan] + * + * actual simplified plan: /path/to/tmp/q1.actual.simplified.txt + * actual explain plan: /path/to/tmp/q1.actual.explain.txt + * [actual simplified plan] + * + * The explain files are saved to help debug later, they are not checked. Only the simplified + * plans are checked (by string comparison). + * + * + * To run the entire test suite: + * {{{ + * build/sbt "sql/test-only *PlanStability[WithStats]Suite" + * }}} + * + * To run a single test file upon change: + * {{{ + * build/sbt "sql/test-only *PlanStability[WithStats]Suite -- -z (tpcds-v1.4/q49)" + * }}} + * + * To re-generate golden files for entire suite, run: + * {{{ + * SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "sql/test-only *PlanStability[WithStats]Suite" + * }}} + * + * To re-generate golden file for a single test, run: + * {{{ + * SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "sql/test-only *PlanStability[WithStats]Suite -- -z (tpcds-v1.4/q49)" + * }}} + */ +// scalastyle:on line.size.limit +trait PlanStabilitySuite extends TPCDSBase with DisableAdaptiveExecutionSuite { + + private val originalMaxToStringFields = conf.maxToStringFields + + override def beforeAll(): Unit = { +conf.setConf(SQLConf.MAX_TO_STRING_FIELDS, Int.MaxValue) +super.beforeAll() + } + + override def afterAll(): Unit = { +super.afterAll() +conf.setConf(SQLConf.MAX_TO_STRING_FIELDS, originalMaxToStringFields) + } + + private val regenerateGoldenFiles: Boolean = System.getenv("SPARK_GENERATE_GOLDEN_FILES") == "1" + + protected val baseResourcePath = { +// use the same way as `SQLQueryTestSuite` to get the resource path +java.nio.file.Paths.get("src", "test", "resources", "tpcds-plan-stability").toFile + } + + private val referenceRegex = "#\\d+".r + private val normalizeRegex = "#\\d+L?".r + + def goldenFilePath: String + + private def getDirForTest(name: String): File = { +new File(goldenFilePath, name) + } + + private def isApproved(dir: File, actualSimplifiedPlan: String): Boolean = { +val file = new File(dir, "simplified.txt") +val approved = FileUtils.readFileToString(file, StandardCharsets.UTF_8) +approved == actualSimplifiedPlan + } + + /** + * Serialize and save this SparkPlan. + * The resulting file is used by [[checkWithApproved]] to check stability. + * + * @param planthe SparkPlan + * @param namethe name of the query + * @param explain the full explain output; this is saved to help debug later as the simplified + *plan is not too useful for debugging + */ + private def generateApprovedPlanFile(plan: SparkPlan, name: String, explain: String): Unit = { Review comment: I can rename it to `generateGoldenFile` since we prefer `expected` compares to `approved`. This is an automated message from the Apache Git Service. To respond to the message,
[GitHub] [spark] Ngone51 commented on pull request #29270: [SPARK-32466][TEST][SQL] Add PlanStabilitySuite to detect SparkPlan regression
Ngone51 commented on pull request #29270: URL: https://github.com/apache/spark/pull/29270#issuecomment-679761800 @HyukjinKwon I can address your comments to the follow-up PR: https://github.com/apache/spark/pull/29537 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] Ngone51 commented on a change in pull request #29270: [SPARK-32466][TEST][SQL] Add PlanStabilitySuite to detect SparkPlan regression
Ngone51 commented on a change in pull request #29270: URL: https://github.com/apache/spark/pull/29270#discussion_r476191815 ## File path: sql/core/src/test/scala/org/apache/spark/sql/PlanStabilitySuite.scala ## @@ -0,0 +1,327 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql + +import java.io.File +import java.nio.charset.StandardCharsets + +import scala.collection.mutable + +import org.apache.commons.io.FileUtils + +import org.apache.spark.sql.catalyst.expressions.AttributeSet +import org.apache.spark.sql.catalyst.util._ +import org.apache.spark.sql.execution._ +import org.apache.spark.sql.execution.adaptive.DisableAdaptiveExecutionSuite +import org.apache.spark.sql.execution.exchange.{Exchange, ReusedExchangeExec} +import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.tags.ExtendedSQLTest + +// scalastyle:off line.size.limit +/** + * Check that TPC-DS SparkPlans don't change. + * If there are plan differences, the error message looks like this: + * Plans did not match: + * last approved simplified plan: /path/to/tpcds-plan-stability/approved-plans-xxx/q1/simplified.txt + * last approved explain plan: /path/to/tpcds-plan-stability/approved-plans-xxx/q1/explain.txt + * [last approved simplified plan] + * + * actual simplified plan: /path/to/tmp/q1.actual.simplified.txt + * actual explain plan: /path/to/tmp/q1.actual.explain.txt + * [actual simplified plan] + * + * The explain files are saved to help debug later, they are not checked. Only the simplified + * plans are checked (by string comparison). + * + * + * To run the entire test suite: + * {{{ + * build/sbt "sql/test-only *PlanStability[WithStats]Suite" + * }}} + * + * To run a single test file upon change: + * {{{ + * build/sbt "sql/test-only *PlanStability[WithStats]Suite -- -z (tpcds-v1.4/q49)" + * }}} + * + * To re-generate golden files for entire suite, run: + * {{{ + * SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "sql/test-only *PlanStability[WithStats]Suite" + * }}} + * + * To re-generate golden file for a single test, run: + * {{{ + * SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "sql/test-only *PlanStability[WithStats]Suite -- -z (tpcds-v1.4/q49)" + * }}} + */ +// scalastyle:on line.size.limit +trait PlanStabilitySuite extends TPCDSBase with DisableAdaptiveExecutionSuite { + + private val originalMaxToStringFields = conf.maxToStringFields + + override def beforeAll(): Unit = { +conf.setConf(SQLConf.MAX_TO_STRING_FIELDS, Int.MaxValue) +super.beforeAll() + } + + override def afterAll(): Unit = { +super.afterAll() +conf.setConf(SQLConf.MAX_TO_STRING_FIELDS, originalMaxToStringFields) + } + + private val regenerateGoldenFiles: Boolean = System.getenv("SPARK_GENERATE_GOLDEN_FILES") == "1" + + protected val baseResourcePath = { +// use the same way as `SQLQueryTestSuite` to get the resource path +java.nio.file.Paths.get("src", "test", "resources", "tpcds-plan-stability").toFile + } + + private val referenceRegex = "#\\d+".r + private val normalizeRegex = "#\\d+L?".r + + def goldenFilePath: String + + private def getDirForTest(name: String): File = { +new File(goldenFilePath, name) + } + + private def isApproved(dir: File, actualSimplifiedPlan: String): Boolean = { +val file = new File(dir, "simplified.txt") +val approved = FileUtils.readFileToString(file, StandardCharsets.UTF_8) +approved == actualSimplifiedPlan Review comment: Sounds reasonable. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HyukjinKwon removed a comment on pull request #29536: [SPARK-32695][INFRA] Explicitly cache and hash 'build' directly in GitHub Actions
HyukjinKwon removed a comment on pull request #29536: URL: https://github.com/apache/spark/pull/29536#issuecomment-679726216 Should be ready for a look. cc @dongjoon-hyun @gengliangwang This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29536: [SPARK-32695][INFRA] Explicitly cache and hash 'build' directly in GitHub Actions
AmplabJenkins removed a comment on pull request #29536: URL: https://github.com/apache/spark/pull/29536#issuecomment-679735247 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #29536: [SPARK-32695][INFRA] Explicitly cache and hash 'build' directly in GitHub Actions
AmplabJenkins commented on pull request #29536: URL: https://github.com/apache/spark/pull/29536#issuecomment-679735247 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29536: [SPARK-32695][INFRA] Explicitly cache and hash 'build' directly in GitHub Actions
AmplabJenkins removed a comment on pull request #29536: URL: https://github.com/apache/spark/pull/29536#issuecomment-679725980 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/127871/ Test FAILed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #29536: [SPARK-32695][INFRA] Explicitly cache and hash 'build' directly in GitHub Actions
SparkQA commented on pull request #29536: URL: https://github.com/apache/spark/pull/29536#issuecomment-679731340 **[Test build #127872 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127872/testReport)** for PR 29536 at commit [`42d4e5c`](https://github.com/apache/spark/commit/42d4e5cf5c2eeed28a7077e57c85f5e38c84c6d1). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29536: [SPARK-32695][INFRA] Explicitly cache and hash 'build' directly in GitHub Actions
AmplabJenkins removed a comment on pull request #29536: URL: https://github.com/apache/spark/pull/29536#issuecomment-679725935 Merged build finished. Test FAILed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on pull request #29536: [SPARK-32695][INFRA] Explicitly cache and hash 'build' directly in GitHub Actions
HyukjinKwon commented on pull request #29536: URL: https://github.com/apache/spark/pull/29536#issuecomment-679726216 Should be ready for a look. cc @dongjoon-hyun This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HyukjinKwon edited a comment on pull request #29536: [SPARK-32695][INFRA] Explicitly cache and hash 'build' directly in GitHub Actions
HyukjinKwon edited a comment on pull request #29536: URL: https://github.com/apache/spark/pull/29536#issuecomment-679726216 Should be ready for a look. cc @dongjoon-hyun @gengliangwang This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #29536: [SPARK-32695][INFRA] Explicitly cache and hash 'build' directly in GitHub Actions
AmplabJenkins commented on pull request #29536: URL: https://github.com/apache/spark/pull/29536#issuecomment-679725935 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] cloud-fan commented on a change in pull request #29522: [SPARK-32641][SQL] withField + getField should return null if original struct was null
cloud-fan commented on a change in pull request #29522: URL: https://github.com/apache/spark/pull/29522#discussion_r476186399 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/ComplexTypes.scala ## @@ -47,7 +47,8 @@ object SimplifyExtractValueOps extends Rule[LogicalPlan] { // For example, if a user submits a query like this: // `$"struct_col".withField("b", lit(1)).withField("b", lit(2)).getField("b")` // we want to return `lit(2)` (and not `lit(1)`). - matches.last._2 + val expr = matches.last._2 + If(IsNull(struct), Literal(null, expr.dataType), expr) Review comment: other optimizer rules will optimize it, but yea it's better to not introduce unnecessary plan change. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] huaxingao commented on pull request #29535: [SPARK-32592][SQL] Make DataFrameReader.table take the specified options
huaxingao commented on pull request #29535: URL: https://github.com/apache/spark/pull/29535#issuecomment-679702212 cc @MaxGekk @cloud-fan @viirya This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29536: [SPARK-32695][INFRA] Add 'build' and 'project/build.properties' into cache key of SBT and Zinc
AmplabJenkins removed a comment on pull request #29536: URL: https://github.com/apache/spark/pull/29536#issuecomment-679696722 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #29536: [SPARK-32695][INFRA] Add 'build' and 'project/build.properties' into cache key of SBT and Zinc
AmplabJenkins commented on pull request #29536: URL: https://github.com/apache/spark/pull/29536#issuecomment-679696722 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29536: [SPARK-32695][INFRA] Add 'build' and 'project/build.properties' into cache key of SBT and Zinc
AmplabJenkins removed a comment on pull request #29536: URL: https://github.com/apache/spark/pull/29536#issuecomment-679616301 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/127870/ Test FAILed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #29536: [SPARK-32695][INFRA] Add 'build' and 'project/build.properties' into cache key of SBT and Zinc
SparkQA commented on pull request #29536: URL: https://github.com/apache/spark/pull/29536#issuecomment-679693189 **[Test build #127871 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127871/testReport)** for PR 29536 at commit [`3900892`](https://github.com/apache/spark/commit/3900892b5b5572bd17953b4a45f823c30a69527d). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29535: [SPARK-32592][SQL] Make DataFrameReader.table take the specified options
AmplabJenkins removed a comment on pull request #29535: URL: https://github.com/apache/spark/pull/29535#issuecomment-679691360 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #29535: [SPARK-32592][SQL] Make DataFrameReader.table take the specified options
AmplabJenkins commented on pull request #29535: URL: https://github.com/apache/spark/pull/29535#issuecomment-679691360 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29527: [SPARK-32664][CORE] Switch the log level from info to debug at BlockManager.getLocalBlockData
AmplabJenkins removed a comment on pull request #29527: URL: https://github.com/apache/spark/pull/29527#issuecomment-679688408 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA removed a comment on pull request #29535: [SPARK-32592][SQL] Make DataFrameReader.table take the specified options
SparkQA removed a comment on pull request #29535: URL: https://github.com/apache/spark/pull/29535#issuecomment-679427286 **[Test build #127860 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127860/testReport)** for PR 29535 at commit [`b54cc55`](https://github.com/apache/spark/commit/b54cc55c85055ced0582a0e0c513c2e2c1fee504). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #29527: [SPARK-32664][CORE] Switch the log level from info to debug at BlockManager.getLocalBlockData
AmplabJenkins commented on pull request #29527: URL: https://github.com/apache/spark/pull/29527#issuecomment-679688408 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #29535: [SPARK-32592][SQL] Make DataFrameReader.table take the specified options
SparkQA commented on pull request #29535: URL: https://github.com/apache/spark/pull/29535#issuecomment-679686309 **[Test build #127860 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127860/testReport)** for PR 29535 at commit [`b54cc55`](https://github.com/apache/spark/commit/b54cc55c85055ced0582a0e0c513c2e2c1fee504). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] viirya commented on a change in pull request #29522: [SPARK-32641][SQL] withField + getField should return null if original struct was null
viirya commented on a change in pull request #29522: URL: https://github.com/apache/spark/pull/29522#discussion_r476183168 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/ComplexTypes.scala ## @@ -47,7 +47,8 @@ object SimplifyExtractValueOps extends Rule[LogicalPlan] { // For example, if a user submits a query like this: // `$"struct_col".withField("b", lit(1)).withField("b", lit(2)).getField("b")` // we want to return `lit(2)` (and not `lit(1)`). - matches.last._2 + val expr = matches.last._2 + If(IsNull(struct), Literal(null, expr.dataType), expr) Review comment: I think we need this only if `struct.nullable` is true? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA removed a comment on pull request #29527: [SPARK-32664][CORE] Switch the log level from info to debug at BlockManager.getLocalBlockData
SparkQA removed a comment on pull request #29527: URL: https://github.com/apache/spark/pull/29527#issuecomment-679462245 **[Test build #127865 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127865/testReport)** for PR 29527 at commit [`82676be`](https://github.com/apache/spark/commit/82676be62c05a67c1eb59b212cf64f0010c37236). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #29527: [SPARK-32664][CORE] Switch the log level from info to debug at BlockManager.getLocalBlockData
SparkQA commented on pull request #29527: URL: https://github.com/apache/spark/pull/29527#issuecomment-679682967 **[Test build #127865 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127865/testReport)** for PR 29527 at commit [`82676be`](https://github.com/apache/spark/commit/82676be62c05a67c1eb59b212cf64f0010c37236). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #29536: [SPARK-32695][INFRA] Add 'build' and 'project/build.properties' into cache key of SBT and Zinc
AmplabJenkins commented on pull request #29536: URL: https://github.com/apache/spark/pull/29536#issuecomment-679677392 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29536: [SPARK-32695][INFRA] Add 'build' and 'project/build.properties' into cache key of SBT and Zinc
AmplabJenkins removed a comment on pull request #29536: URL: https://github.com/apache/spark/pull/29536#issuecomment-679677392 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] cloud-fan commented on pull request #29522: [SPARK-32641][SQL] withField + getField should return null if original struct was null
cloud-fan commented on pull request #29522: URL: https://github.com/apache/spark/pull/29522#issuecomment-679639309 thanks, merging to master! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on a change in pull request #29270: [SPARK-32466][TEST][SQL] Add PlanStabilitySuite to detect SparkPlan regression
HyukjinKwon commented on a change in pull request #29270: URL: https://github.com/apache/spark/pull/29270#discussion_r476178381 ## File path: sql/core/src/test/scala/org/apache/spark/sql/PlanStabilitySuite.scala ## @@ -0,0 +1,327 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql + +import java.io.File +import java.nio.charset.StandardCharsets + +import scala.collection.mutable + +import org.apache.commons.io.FileUtils + +import org.apache.spark.sql.catalyst.expressions.AttributeSet +import org.apache.spark.sql.catalyst.util._ +import org.apache.spark.sql.execution._ +import org.apache.spark.sql.execution.adaptive.DisableAdaptiveExecutionSuite +import org.apache.spark.sql.execution.exchange.{Exchange, ReusedExchangeExec} +import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.tags.ExtendedSQLTest + +// scalastyle:off line.size.limit +/** + * Check that TPC-DS SparkPlans don't change. + * If there are plan differences, the error message looks like this: + * Plans did not match: + * last approved simplified plan: /path/to/tpcds-plan-stability/approved-plans-xxx/q1/simplified.txt + * last approved explain plan: /path/to/tpcds-plan-stability/approved-plans-xxx/q1/explain.txt + * [last approved simplified plan] + * + * actual simplified plan: /path/to/tmp/q1.actual.simplified.txt + * actual explain plan: /path/to/tmp/q1.actual.explain.txt + * [actual simplified plan] + * + * The explain files are saved to help debug later, they are not checked. Only the simplified + * plans are checked (by string comparison). + * + * + * To run the entire test suite: + * {{{ + * build/sbt "sql/test-only *PlanStability[WithStats]Suite" + * }}} + * + * To run a single test file upon change: + * {{{ + * build/sbt "sql/test-only *PlanStability[WithStats]Suite -- -z (tpcds-v1.4/q49)" + * }}} + * + * To re-generate golden files for entire suite, run: + * {{{ + * SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "sql/test-only *PlanStability[WithStats]Suite" + * }}} + * + * To re-generate golden file for a single test, run: + * {{{ + * SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "sql/test-only *PlanStability[WithStats]Suite -- -z (tpcds-v1.4/q49)" + * }}} + */ +// scalastyle:on line.size.limit +trait PlanStabilitySuite extends TPCDSBase with DisableAdaptiveExecutionSuite { + + private val originalMaxToStringFields = conf.maxToStringFields + + override def beforeAll(): Unit = { +conf.setConf(SQLConf.MAX_TO_STRING_FIELDS, Int.MaxValue) +super.beforeAll() + } + + override def afterAll(): Unit = { +super.afterAll() +conf.setConf(SQLConf.MAX_TO_STRING_FIELDS, originalMaxToStringFields) + } + + private val regenerateGoldenFiles: Boolean = System.getenv("SPARK_GENERATE_GOLDEN_FILES") == "1" + + protected val baseResourcePath = { +// use the same way as `SQLQueryTestSuite` to get the resource path +java.nio.file.Paths.get("src", "test", "resources", "tpcds-plan-stability").toFile + } + + private val referenceRegex = "#\\d+".r + private val normalizeRegex = "#\\d+L?".r + + def goldenFilePath: String + + private def getDirForTest(name: String): File = { +new File(goldenFilePath, name) + } + + private def isApproved(dir: File, actualSimplifiedPlan: String): Boolean = { +val file = new File(dir, "simplified.txt") +val approved = FileUtils.readFileToString(file, StandardCharsets.UTF_8) +approved == actualSimplifiedPlan Review comment: If `actualSimplifiedPlan` refers `expected` vs `actual`, I think it would be nicer if we rename `approved` to `expected`. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] cloud-fan closed pull request #29522: [SPARK-32641][SQL] withField + getField should return null if original struct was null
cloud-fan closed pull request #29522: URL: https://github.com/apache/spark/pull/29522 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] viirya commented on pull request #29513: [SPARK-32646][SQL][3.0][test-hadoop2.7][test-hive1.2] ORC predicate pushdown should work with case-insensitive analysis
viirya commented on pull request #29513: URL: https://github.com/apache/spark/pull/29513#issuecomment-679633036 Thanks! @cloud-fan This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on a change in pull request #29537: [SPARK-32466][SQL][FOLLOW-UP] Normalize Location info in explain plan
HyukjinKwon commented on a change in pull request #29537: URL: https://github.com/apache/spark/pull/29537#discussion_r476177245 ## File path: sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q11.sf100/explain.txt ## @@ -89,394 +89,394 @@ TakeOrderedAndProject (87) (1) Scan parquet default.store_sales -Output [4]: [ss_sold_date_sk#1, ss_customer_sk#2, ss_ext_discount_amt#3, ss_ext_list_price#4] +Output [4]: [ss_sold_date_sk#x, ss_customer_sk#x, ss_ext_discount_amt#x, ss_ext_list_price#x] Batched: true -Location: InMemoryFileIndex [file:/Users/yi.wu/IdeaProjects/spark/sql/core/spark-warehouse/org.apache.spark.sql.TPCDSV1_4_PlanStabilityWithStatsSuite/store_sales] Review comment: Hm, how do we check the differences by other changes but not caught by this test? I think we should only result simplified forms here if we test simplified forms. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] viirya commented on pull request #29530: [SPARK-32646][SQL][test-hadoop2.7][test-hive1.2] ORC predicate pushdown should work with case-insensitive analysis
viirya commented on pull request #29530: URL: https://github.com/apache/spark/pull/29530#issuecomment-679631947 Thanks @HyukjinKwon This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on a change in pull request #29270: [SPARK-32466][TEST][SQL] Add PlanStabilitySuite to detect SparkPlan regression
HyukjinKwon commented on a change in pull request #29270: URL: https://github.com/apache/spark/pull/29270#discussion_r476176111 ## File path: sql/core/src/test/scala/org/apache/spark/sql/PlanStabilitySuite.scala ## @@ -0,0 +1,327 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql + +import java.io.File +import java.nio.charset.StandardCharsets + +import scala.collection.mutable + +import org.apache.commons.io.FileUtils + +import org.apache.spark.sql.catalyst.expressions.AttributeSet +import org.apache.spark.sql.catalyst.util._ +import org.apache.spark.sql.execution._ +import org.apache.spark.sql.execution.adaptive.DisableAdaptiveExecutionSuite +import org.apache.spark.sql.execution.exchange.{Exchange, ReusedExchangeExec} +import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.tags.ExtendedSQLTest + +// scalastyle:off line.size.limit +/** + * Check that TPC-DS SparkPlans don't change. + * If there are plan differences, the error message looks like this: + * Plans did not match: + * last approved simplified plan: /path/to/tpcds-plan-stability/approved-plans-xxx/q1/simplified.txt + * last approved explain plan: /path/to/tpcds-plan-stability/approved-plans-xxx/q1/explain.txt + * [last approved simplified plan] + * + * actual simplified plan: /path/to/tmp/q1.actual.simplified.txt + * actual explain plan: /path/to/tmp/q1.actual.explain.txt + * [actual simplified plan] + * + * The explain files are saved to help debug later, they are not checked. Only the simplified + * plans are checked (by string comparison). + * + * + * To run the entire test suite: + * {{{ + * build/sbt "sql/test-only *PlanStability[WithStats]Suite" + * }}} + * + * To run a single test file upon change: + * {{{ + * build/sbt "sql/test-only *PlanStability[WithStats]Suite -- -z (tpcds-v1.4/q49)" + * }}} + * + * To re-generate golden files for entire suite, run: + * {{{ + * SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "sql/test-only *PlanStability[WithStats]Suite" + * }}} + * + * To re-generate golden file for a single test, run: + * {{{ + * SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "sql/test-only *PlanStability[WithStats]Suite -- -z (tpcds-v1.4/q49)" + * }}} + */ +// scalastyle:on line.size.limit +trait PlanStabilitySuite extends TPCDSBase with DisableAdaptiveExecutionSuite { + + private val originalMaxToStringFields = conf.maxToStringFields + + override def beforeAll(): Unit = { +conf.setConf(SQLConf.MAX_TO_STRING_FIELDS, Int.MaxValue) +super.beforeAll() + } + + override def afterAll(): Unit = { +super.afterAll() +conf.setConf(SQLConf.MAX_TO_STRING_FIELDS, originalMaxToStringFields) + } + + private val regenerateGoldenFiles: Boolean = System.getenv("SPARK_GENERATE_GOLDEN_FILES") == "1" + + protected val baseResourcePath = { +// use the same way as `SQLQueryTestSuite` to get the resource path +java.nio.file.Paths.get("src", "test", "resources", "tpcds-plan-stability").toFile + } + + private val referenceRegex = "#\\d+".r + private val normalizeRegex = "#\\d+L?".r + + def goldenFilePath: String + + private def getDirForTest(name: String): File = { +new File(goldenFilePath, name) + } + + private def isApproved(dir: File, actualSimplifiedPlan: String): Boolean = { +val file = new File(dir, "simplified.txt") +val approved = FileUtils.readFileToString(file, StandardCharsets.UTF_8) +approved == actualSimplifiedPlan + } + + /** + * Serialize and save this SparkPlan. + * The resulting file is used by [[checkWithApproved]] to check stability. + * + * @param planthe SparkPlan + * @param namethe name of the query + * @param explain the full explain output; this is saved to help debug later as the simplified + *plan is not too useful for debugging + */ + private def generateApprovedPlanFile(plan: SparkPlan, name: String, explain: String): Unit = { +val dir = getDirForTest(name) +val simplified = getSimplifiedPlan(plan) +val foundMatch = dir.exists() && isApproved(dir, simplified) + +if (!foundMatch) { + FileUtils.deleteDirectory(dir) + assert(dir.mkdirs()) + + val file = new File(dir,
[GitHub] [spark] HyukjinKwon commented on a change in pull request #29270: [SPARK-32466][TEST][SQL] Add PlanStabilitySuite to detect SparkPlan regression
HyukjinKwon commented on a change in pull request #29270: URL: https://github.com/apache/spark/pull/29270#discussion_r476174979 ## File path: sql/core/src/test/scala/org/apache/spark/sql/PlanStabilitySuite.scala ## @@ -0,0 +1,327 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql + +import java.io.File +import java.nio.charset.StandardCharsets + +import scala.collection.mutable + +import org.apache.commons.io.FileUtils + +import org.apache.spark.sql.catalyst.expressions.AttributeSet +import org.apache.spark.sql.catalyst.util._ +import org.apache.spark.sql.execution._ +import org.apache.spark.sql.execution.adaptive.DisableAdaptiveExecutionSuite +import org.apache.spark.sql.execution.exchange.{Exchange, ReusedExchangeExec} +import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.tags.ExtendedSQLTest + +// scalastyle:off line.size.limit +/** + * Check that TPC-DS SparkPlans don't change. + * If there are plan differences, the error message looks like this: + * Plans did not match: + * last approved simplified plan: /path/to/tpcds-plan-stability/approved-plans-xxx/q1/simplified.txt + * last approved explain plan: /path/to/tpcds-plan-stability/approved-plans-xxx/q1/explain.txt + * [last approved simplified plan] + * + * actual simplified plan: /path/to/tmp/q1.actual.simplified.txt + * actual explain plan: /path/to/tmp/q1.actual.explain.txt + * [actual simplified plan] + * + * The explain files are saved to help debug later, they are not checked. Only the simplified + * plans are checked (by string comparison). + * + * + * To run the entire test suite: + * {{{ + * build/sbt "sql/test-only *PlanStability[WithStats]Suite" + * }}} + * + * To run a single test file upon change: + * {{{ + * build/sbt "sql/test-only *PlanStability[WithStats]Suite -- -z (tpcds-v1.4/q49)" + * }}} + * + * To re-generate golden files for entire suite, run: + * {{{ + * SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "sql/test-only *PlanStability[WithStats]Suite" + * }}} + * + * To re-generate golden file for a single test, run: + * {{{ + * SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "sql/test-only *PlanStability[WithStats]Suite -- -z (tpcds-v1.4/q49)" + * }}} + */ +// scalastyle:on line.size.limit +trait PlanStabilitySuite extends TPCDSBase with DisableAdaptiveExecutionSuite { + + private val originalMaxToStringFields = conf.maxToStringFields + + override def beforeAll(): Unit = { +conf.setConf(SQLConf.MAX_TO_STRING_FIELDS, Int.MaxValue) +super.beforeAll() + } + + override def afterAll(): Unit = { +super.afterAll() +conf.setConf(SQLConf.MAX_TO_STRING_FIELDS, originalMaxToStringFields) + } + + private val regenerateGoldenFiles: Boolean = System.getenv("SPARK_GENERATE_GOLDEN_FILES") == "1" + + protected val baseResourcePath = { +// use the same way as `SQLQueryTestSuite` to get the resource path +java.nio.file.Paths.get("src", "test", "resources", "tpcds-plan-stability").toFile + } + + private val referenceRegex = "#\\d+".r + private val normalizeRegex = "#\\d+L?".r + + def goldenFilePath: String + + private def getDirForTest(name: String): File = { +new File(goldenFilePath, name) + } + + private def isApproved(dir: File, actualSimplifiedPlan: String): Boolean = { +val file = new File(dir, "simplified.txt") +val approved = FileUtils.readFileToString(file, StandardCharsets.UTF_8) +approved == actualSimplifiedPlan + } + + /** + * Serialize and save this SparkPlan. + * The resulting file is used by [[checkWithApproved]] to check stability. + * + * @param planthe SparkPlan + * @param namethe name of the query + * @param explain the full explain output; this is saved to help debug later as the simplified + *plan is not too useful for debugging + */ + private def generateApprovedPlanFile(plan: SparkPlan, name: String, explain: String): Unit = { Review comment: I think we usually say a golden file instead of a approved file. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28841: [SPARK-31962][SQL] Provide modifiedAfter and modifiedBefore options when filtering from a batch-based file data source
AmplabJenkins removed a comment on pull request #28841: URL: https://github.com/apache/spark/pull/28841#issuecomment-678908136 Can one of the admins verify this patch? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #28841: [SPARK-31962][SQL] Provide modifiedAfter and modifiedBefore options when filtering from a batch-based file data source
AmplabJenkins commented on pull request #28841: URL: https://github.com/apache/spark/pull/28841#issuecomment-679624551 Can one of the admins verify this patch? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] cloud-fan commented on a change in pull request #29537: [SPARK-32466][SQL][FOLLOW-UP] Normalize Location info in explain plan
cloud-fan commented on a change in pull request #29537: URL: https://github.com/apache/spark/pull/29537#discussion_r476170777 ## File path: sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q11.sf100/explain.txt ## @@ -89,394 +89,394 @@ TakeOrderedAndProject (87) (1) Scan parquet default.store_sales -Output [4]: [ss_sold_date_sk#1, ss_customer_sk#2, ss_ext_discount_amt#3, ss_ext_list_price#4] +Output [4]: [ss_sold_date_sk#x, ss_customer_sk#x, ss_ext_discount_amt#x, ss_ext_list_price#x] Review comment: After a second thought, this might be useful to understand the query and debug the test failure, let's keep it. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HyukjinKwon closed pull request #29530: [SPARK-32646][SQL][test-hadoop2.7][test-hive1.2] ORC predicate pushdown should work with case-insensitive analysis
HyukjinKwon closed pull request #29530: URL: https://github.com/apache/spark/pull/29530 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] cloud-fan commented on a change in pull request #29537: [SPARK-32466][SQL][FOLLOW-UP] Normalize Location info in explain plan
cloud-fan commented on a change in pull request #29537: URL: https://github.com/apache/spark/pull/29537#discussion_r476170398 ## File path: sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q11.sf100/explain.txt ## @@ -89,394 +89,394 @@ TakeOrderedAndProject (87) (1) Scan parquet default.store_sales -Output [4]: [ss_sold_date_sk#1, ss_customer_sk#2, ss_ext_discount_amt#3, ss_ext_list_price#4] +Output [4]: [ss_sold_date_sk#x, ss_customer_sk#x, ss_ext_discount_amt#x, ss_ext_list_price#x] Batched: true -Location: InMemoryFileIndex [file:/Users/yi.wu/IdeaProjects/spark/sql/core/spark-warehouse/org.apache.spark.sql.TPCDSV1_4_PlanStabilityWithStatsSuite/store_sales] Review comment: the test only checks the simplified form, the explain result is only for debugging purpose. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on pull request #29530: [SPARK-32646][SQL][test-hadoop2.7][test-hive1.2] ORC predicate pushdown should work with case-insensitive analysis
HyukjinKwon commented on pull request #29530: URL: https://github.com/apache/spark/pull/29530#issuecomment-679620235 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29536: [SPARK-32695][INFRA] Add 'build' and 'project/build.properties' into cache key of SBT and Zinc
AmplabJenkins removed a comment on pull request #29536: URL: https://github.com/apache/spark/pull/29536#issuecomment-679616281 Merged build finished. Test FAILed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #29536: [SPARK-32695][INFRA] Add 'build' and 'project/build.properties' into cache key of SBT and Zinc
AmplabJenkins commented on pull request #29536: URL: https://github.com/apache/spark/pull/29536#issuecomment-679616281 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] cloud-fan closed pull request #29513: [SPARK-32646][SQL][3.0][test-hadoop2.7][test-hive1.2] ORC predicate pushdown should work with case-insensitive analysis
cloud-fan closed pull request #29513: URL: https://github.com/apache/spark/pull/29513 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HyukjinKwon closed pull request #27908: [SPARK-31000][PYTHON][SQL] Add ability to set table description via Catalog.createTable()
HyukjinKwon closed pull request #27908: URL: https://github.com/apache/spark/pull/27908 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] cloud-fan commented on pull request #29513: [SPARK-32646][SQL][3.0][test-hadoop2.7][test-hive1.2] ORC predicate pushdown should work with case-insensitive analysis
cloud-fan commented on pull request #29513: URL: https://github.com/apache/spark/pull/29513#issuecomment-679614653 thanks, merging to 3.0! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on pull request #27908: [SPARK-31000][PYTHON][SQL] Add ability to set table description via Catalog.createTable()
HyukjinKwon commented on pull request #27908: URL: https://github.com/apache/spark/pull/27908#issuecomment-679612590 Merged to master. Thanks @nchammas. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on pull request #27908: [SPARK-31000] Add ability to set table description via Catalog.createTable()
HyukjinKwon commented on pull request #27908: URL: https://github.com/apache/spark/pull/27908#issuecomment-679606450 Ah, I think it's okay with `description`. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] beliefer commented on a change in pull request #28685: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function
beliefer commented on a change in pull request #28685: URL: https://github.com/apache/spark/pull/28685#discussion_r476159993 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowExecBase.scala ## @@ -137,7 +137,11 @@ trait WindowExecBase extends UnaryExecNode { function match { case AggregateExpression(f, _, _, _, _) => collect("AGGREGATE", frame, e, f) case f: AggregateWindowFunction => collect("AGGREGATE", frame, e, f) -case f: OffsetWindowFunction => collect("OFFSET", frame, e, f) +case f: OffsetWindowFunction => if (f.startWithCurrentRow) { + collect("OFFSET", frame, e, f) +} else { + collect("WHOLE_OFFSET", frame, e, f) Review comment: OK This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on a change in pull request #29516: [WIP][SPARK-32614][SQL] Don't apply comment processing if 'comment' unset for CSV
HyukjinKwon commented on a change in pull request #29516: URL: https://github.com/apache/spark/pull/29516#discussion_r476158718 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVExprUtils.scala ## @@ -25,16 +25,21 @@ object CSVExprUtils { * This is currently being used in CSV reading path and CSV schema inference. */ def filterCommentAndEmpty(iter: Iterator[String], options: CSVOptions): Iterator[String] = { -iter.filter { line => - line.trim.nonEmpty && !line.startsWith(options.comment.toString) +if (options.isCommentSet) { + val commentPrefix = options.comment.toString + iter.filter { line => +line.trim.nonEmpty && !line.startsWith(commentPrefix) + } +} else { + iter.filter(_.trim.nonEmpty) } } def skipComments(iter: Iterator[String], options: CSVOptions): Iterator[String] = { if (options.isCommentSet) { val commentPrefix = options.comment.toString iter.dropWhile { line => -line.trim.isEmpty || line.trim.startsWith(commentPrefix) +line.trim.isEmpty || line.startsWith(commentPrefix) Review comment: Oh, NVM. I misread. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] beliefer commented on a change in pull request #28685: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function
beliefer commented on a change in pull request #28685: URL: https://github.com/apache/spark/pull/28685#discussion_r476153597 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala ## @@ -77,31 +77,30 @@ object WindowFunctionFrame { * @param newMutableProjection function used to create the projection. * @param offset by which rows get moved within a partition. */ -final class OffsetWindowFunctionFrame( Review comment: We can't uniform the behavior. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #29536: [SPARK-32695][INFRA] Add 'build' and 'project/build.properties' into cache key of SBT and Zinc
AmplabJenkins commented on pull request #29536: URL: https://github.com/apache/spark/pull/29536#issuecomment-679588148 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29536: [SPARK-32695][INFRA] Add 'build' and 'project/build.properties' into cache key of SBT and Zinc
AmplabJenkins removed a comment on pull request #29536: URL: https://github.com/apache/spark/pull/29536#issuecomment-679588148 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] beliefer commented on a change in pull request #28685: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function
beliefer commented on a change in pull request #28685: URL: https://github.com/apache/spark/pull/28685#discussion_r476152901 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala ## @@ -363,6 +363,12 @@ abstract class OffsetWindowFunction */ val direction: SortDirection + /** + * Whether the offset is start with the current row. If `startWithCurrentRow` is false, `offset` + * means the offset is start with the first row of the entire window partition. + */ Review comment: OK. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on a change in pull request #29537: [SPARK-32466][SQL][FOLLOW-UP] Normalize Location info in explain plan
HyukjinKwon commented on a change in pull request #29537: URL: https://github.com/apache/spark/pull/29537#discussion_r476152148 ## File path: sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q11.sf100/explain.txt ## @@ -89,394 +89,394 @@ TakeOrderedAndProject (87) (1) Scan parquet default.store_sales -Output [4]: [ss_sold_date_sk#1, ss_customer_sk#2, ss_ext_discount_amt#3, ss_ext_list_price#4] +Output [4]: [ss_sold_date_sk#x, ss_customer_sk#x, ss_ext_discount_amt#x, ss_ext_list_price#x] Batched: true -Location: InMemoryFileIndex [file:/Users/yi.wu/IdeaProjects/spark/sql/core/spark-warehouse/org.apache.spark.sql.TPCDSV1_4_PlanStabilityWithStatsSuite/store_sales] Review comment: How did the tests pass before? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29536: [SPARK-32695][INFRA] Add 'build' and 'project/build.properties' into cache key of SBT and Zinc
AmplabJenkins removed a comment on pull request #29536: URL: https://github.com/apache/spark/pull/29536#issuecomment-679582757 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/127869/ Test FAILed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] srowen commented on a change in pull request #29516: [WIP][SPARK-32614][SQL] Don't apply comment processing if 'comment' unset for CSV
srowen commented on a change in pull request #29516: URL: https://github.com/apache/spark/pull/29516#discussion_r476151826 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVExprUtils.scala ## @@ -25,16 +25,21 @@ object CSVExprUtils { * This is currently being used in CSV reading path and CSV schema inference. */ def filterCommentAndEmpty(iter: Iterator[String], options: CSVOptions): Iterator[String] = { -iter.filter { line => - line.trim.nonEmpty && !line.startsWith(options.comment.toString) +if (options.isCommentSet) { + val commentPrefix = options.comment.toString + iter.filter { line => +line.trim.nonEmpty && !line.startsWith(commentPrefix) + } +} else { + iter.filter(_.trim.nonEmpty) } } def skipComments(iter: Iterator[String], options: CSVOptions): Iterator[String] = { if (options.isCommentSet) { val commentPrefix = options.comment.toString iter.dropWhile { line => -line.trim.isEmpty || line.trim.startsWith(commentPrefix) +line.trim.isEmpty || line.startsWith(commentPrefix) Review comment: I think the existing logic matches the logic of https://github.com/apache/spark/pull/29516/files#diff-7faa93f00223527237747227998e30f1R27 ? maybe I'm missing your point. The logic has always been to drop lines that are empty after trimming no matter what, regardless of comment char. Right or wrong that's separate. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #29536: [SPARK-32695][INFRA] Add 'build' and 'project/build.properties' into cache key of SBT and Zinc
SparkQA commented on pull request #29536: URL: https://github.com/apache/spark/pull/29536#issuecomment-679586372 **[Test build #127870 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127870/testReport)** for PR 29536 at commit [`63032da`](https://github.com/apache/spark/commit/63032daff78ceb8de651fd514339e45bce7b799e). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #29536: [SPARK-32695][INFRA] Add 'build' and 'project/build.properties' into cache key of SBT and Zinc
AmplabJenkins commented on pull request #29536: URL: https://github.com/apache/spark/pull/29536#issuecomment-679582734 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29536: [SPARK-32695][INFRA] Add 'build' and 'project/build.properties' into cache key of SBT and Zinc
AmplabJenkins removed a comment on pull request #29536: URL: https://github.com/apache/spark/pull/29536#issuecomment-679582734 Merged build finished. Test FAILed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] srowen commented on a change in pull request #29516: [WIP][SPARK-32614][SQL] Don't apply comment processing if 'comment' unset for CSV
srowen commented on a change in pull request #29516: URL: https://github.com/apache/spark/pull/29516#discussion_r476149030 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVOptions.scala ## @@ -220,7 +220,9 @@ class CSVOptions( format.setQuote(quote) format.setQuoteEscape(escape) charToEscapeQuoteEscaping.foreach(format.setCharToEscapeQuoteEscaping) -format.setComment(comment) +if (isCommentSet) { Review comment: Oh right, this stanza is for writer settings. There is no `setCommentProcessingEnabled` for writers in univocity. Comments aren't generated. In fact the comment setting doesn't matter, really? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29536: [SPARK-32695][INFRA] Add 'build' and 'project/build.properties' into cache key of SBT and Zinc
AmplabJenkins removed a comment on pull request #29536: URL: https://github.com/apache/spark/pull/29536#issuecomment-679568877 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #29536: [SPARK-32695][INFRA] Add 'build' and 'project/build.properties' into cache key of SBT and Zinc
AmplabJenkins commented on pull request #29536: URL: https://github.com/apache/spark/pull/29536#issuecomment-679568877 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] srowen commented on a change in pull request #29516: [WIP][SPARK-32614][SQL] Don't apply comment processing if 'comment' unset for CSV
srowen commented on a change in pull request #29516: URL: https://github.com/apache/spark/pull/29516#discussion_r476141555 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVOptions.scala ## @@ -220,7 +220,9 @@ class CSVOptions( format.setQuote(quote) format.setQuoteEscape(escape) charToEscapeQuoteEscaping.foreach(format.setCharToEscapeQuoteEscaping) -format.setComment(comment) +if (isCommentSet) { Review comment: Right, yeah, so we have to use the new method in univocity 2.9.0 to turn off its comment handling if its unset in Spark (= `\u`) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #29536: [SPARK-32695][INFRA] Add 'build' and 'project/build.properties' into cache key of SBT and Zinc
SparkQA commented on pull request #29536: URL: https://github.com/apache/spark/pull/29536#issuecomment-679567186 **[Test build #127869 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127869/testReport)** for PR 29536 at commit [`6f78677`](https://github.com/apache/spark/commit/6f78677069bee9847c3064a7b1460f248f44d2ef). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29536: [SPARK-32695][INFRA] Add 'build' and 'project/build.properties' into cache key of SBT and Zinc
AmplabJenkins removed a comment on pull request #29536: URL: https://github.com/apache/spark/pull/29536#issuecomment-679563541 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/127864/ Test FAILed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29536: [SPARK-32695][INFRA] Add 'build' and 'project/build.properties' into cache key of SBT and Zinc
AmplabJenkins removed a comment on pull request #29536: URL: https://github.com/apache/spark/pull/29536#issuecomment-679563520 Merged build finished. Test FAILed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #29536: [SPARK-32695][INFRA] Add 'build' and 'project/build.properties' into cache key of SBT and Zinc
AmplabJenkins commented on pull request #29536: URL: https://github.com/apache/spark/pull/29536#issuecomment-679563520 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] viirya commented on pull request #29497: [WIP][SPARK-32670][SQL]Group exception messages in Catalyst Analyzer in one file
viirya commented on pull request #29497: URL: https://github.com/apache/spark/pull/29497#issuecomment-679549416 This idea sounds okay. I just have few questions. 1. Is it only for certain exceptions or all exceptions? For example, this PR targets only `AnalysisException`. 2. How to organize them? query compilation/query execution + package seems okay, but is there possibly unclear cases? For example, `AnalysisException` is also thrown in `RunnableCommand`, is it query compilation error and query execution error? 3. Is a top level object of error message too big to hold all exceptions in the module? For example `org.apache.spark.sql.QueryCompilationErrors` and `org.apache.spark.sql.QueryExecutionErrors` might contain many exceptions. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] baohe-zhang commented on pull request #29509: [SPARK-31608][CORE][WEBUI][TEST] Add test suites for HybridStore and HistoryServerMemoryManager
baohe-zhang commented on pull request #29509: URL: https://github.com/apache/spark/pull/29509#issuecomment-679530413 @HeartSaVioR Thanks for the review! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] cloud-fan commented on a change in pull request #29537: [SPARK-32466][SQL][FOLLOW-UP] Normalize Location info in explain plan
cloud-fan commented on a change in pull request #29537: URL: https://github.com/apache/spark/pull/29537#discussion_r476119880 ## File path: sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q11.sf100/explain.txt ## @@ -89,394 +89,394 @@ TakeOrderedAndProject (87) (1) Scan parquet default.store_sales -Output [4]: [ss_sold_date_sk#1, ss_customer_sk#2, ss_ext_discount_amt#3, ss_ext_list_price#4] +Output [4]: [ss_sold_date_sk#x, ss_customer_sk#x, ss_ext_discount_amt#x, ss_ext_list_price#x] Review comment: I think it's fine This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] Ngone51 commented on a change in pull request #29537: [SPARK-32466][SQL][FOLLOW-UP] Normalize Location info in explain plan
Ngone51 commented on a change in pull request #29537: URL: https://github.com/apache/spark/pull/29537#discussion_r476117593 ## File path: sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q11.sf100/explain.txt ## @@ -89,394 +89,394 @@ TakeOrderedAndProject (87) (1) Scan parquet default.store_sales -Output [4]: [ss_sold_date_sk#1, ss_customer_sk#2, ss_ext_discount_amt#3, ss_ext_list_price#4] +Output [4]: [ss_sold_date_sk#x, ss_customer_sk#x, ss_ext_discount_amt#x, ss_ext_list_price#x] Review comment: Sorry for bringing massive changes again. Do you think we should avoid this change? If it should, I think we could make a custom copy of `replaceNotIncludedMsg` in `PlanStabilitySuite` instead of resue. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29537: [SPARK-32466][SQL][FOLLOW-UP] Normalize Location info in explain plan
AmplabJenkins removed a comment on pull request #29537: URL: https://github.com/apache/spark/pull/29537#issuecomment-679520909 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #29537: [SPARK-32466][SQL][FOLLOW-UP] Normalize Location info in explain plan
AmplabJenkins commented on pull request #29537: URL: https://github.com/apache/spark/pull/29537#issuecomment-679520909 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #29537: [SPARK-32466][SQL][FOLLOW-UP] Normalize Location info in explain plan
SparkQA commented on pull request #29537: URL: https://github.com/apache/spark/pull/29537#issuecomment-679519576 **[Test build #127868 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127868/testReport)** for PR 29537 at commit [`6f46616`](https://github.com/apache/spark/commit/6f46616cf7e15bb1e9d02950e32ad69ee0461f1a). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] viirya commented on a change in pull request #29530: [SPARK-32646][SQL][test-hadoop2.7][test-hive1.2] ORC predicate pushdown should work with case-insensitive analysis
viirya commented on a change in pull request #29530: URL: https://github.com/apache/spark/pull/29530#discussion_r476115684 ## File path: sql/core/v1.2/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFilters.scala ## @@ -215,11 +215,11 @@ private[sql] object OrcFilters extends OrcFiltersBase { * @return the builder so far. */ private def buildLeafSearchArgument( - dataTypeMap: Map[String, DataType], + dataTypeMap: Map[String, OrcPrimitiveField], expression: Filter, builder: Builder): Option[Builder] = { def getType(attribute: String): PredicateLeaf.Type = - getPredicateLeafType(dataTypeMap(attribute)) + getPredicateLeafType(dataTypeMap(attribute).fieldType) Review comment: It was found if we run the tests with hive-1.2 profile. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] viirya commented on a change in pull request #29530: [SPARK-32646][SQL][test-hadoop2.7][test-hive1.2] ORC predicate pushdown should work with case-insensitive analysis
viirya commented on a change in pull request #29530: URL: https://github.com/apache/spark/pull/29530#discussion_r476115078 ## File path: sql/core/v1.2/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFilters.scala ## @@ -215,11 +215,11 @@ private[sql] object OrcFilters extends OrcFiltersBase { * @return the builder so far. */ private def buildLeafSearchArgument( - dataTypeMap: Map[String, DataType], + dataTypeMap: Map[String, OrcPrimitiveField], expression: Filter, builder: Builder): Option[Builder] = { def getType(attribute: String): PredicateLeaf.Type = - getPredicateLeafType(dataTypeMap(attribute)) + getPredicateLeafType(dataTypeMap(attribute).fieldType) Review comment: Oh, and `getPredicateLeafType`'s scope. It is used in OrcFilterSuite, so cannot be just `private`. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] Ngone51 commented on pull request #29537: [SPARK-32466][SQL][FOLLOW-UP] Normalize Location info in explain plan
Ngone51 commented on pull request #29537: URL: https://github.com/apache/spark/pull/29537#issuecomment-679513602 cc: @wangyum @maropu @cloud-fan (Note this PR has only updated a few golden files yet for easier review as @maropu mentioned previously) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #29530: [SPARK-32646][SQL][test-hadoop2.7][test-hive1.2] ORC predicate pushdown should work with case-insensitive analysis
AmplabJenkins commented on pull request #29530: URL: https://github.com/apache/spark/pull/29530#issuecomment-679511781 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29530: [SPARK-32646][SQL][test-hadoop2.7][test-hive1.2] ORC predicate pushdown should work with case-insensitive analysis
AmplabJenkins removed a comment on pull request #29530: URL: https://github.com/apache/spark/pull/29530#issuecomment-679511781 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] Ngone51 opened a new pull request #29537: [SPARK-32466][SQL][FOLLOW-UP] Normalize Location info in explain plan
Ngone51 opened a new pull request #29537: URL: https://github.com/apache/spark/pull/29537 ### What changes were proposed in this pull request? 1. Extract `SQLQueryTestSuite.replaceNotIncludedMsg` to `PlanTest`. 2. Reuse `replaceNotIncludedMsg` to normalize the explain plan that generated in `PlanStabilitySuite`. ### Why are the changes needed? Eliminates the personal related information (e.g., local directories) in the explain plan. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Updated test. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #29530: [SPARK-32646][SQL][test-hadoop2.7][test-hive1.2] ORC predicate pushdown should work with case-insensitive analysis
SparkQA commented on pull request #29530: URL: https://github.com/apache/spark/pull/29530#issuecomment-679510051 **[Test build #127867 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127867/testReport)** for PR 29530 at commit [`50f0037`](https://github.com/apache/spark/commit/50f00379ef043a81448b4fd6e424cf7f715ebf96). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] viirya commented on a change in pull request #29530: [SPARK-32646][SQL][test-hadoop2.7][test-hive1.2] ORC predicate pushdown should work with case-insensitive analysis
viirya commented on a change in pull request #29530: URL: https://github.com/apache/spark/pull/29530#discussion_r476109029 ## File path: sql/core/v1.2/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFilters.scala ## @@ -215,11 +215,11 @@ private[sql] object OrcFilters extends OrcFiltersBase { * @return the builder so far. */ private def buildLeafSearchArgument( - dataTypeMap: Map[String, DataType], + dataTypeMap: Map[String, OrcPrimitiveField], expression: Filter, builder: Builder): Option[Builder] = { def getType(attribute: String): PredicateLeaf.Type = - getPredicateLeafType(dataTypeMap(attribute)) + getPredicateLeafType(dataTypeMap(attribute).fieldType) Review comment: yes. ## File path: sql/core/v2.3/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFilters.scala ## @@ -139,7 +139,7 @@ private[sql] object OrcFilters extends OrcFiltersBase { /** * Get PredicateLeafType which is corresponding to the given DataType. */ - private def getPredicateLeafType(dataType: DataType) = dataType match { + private[sql] def getPredicateLeafType(dataType: DataType) = dataType match { Review comment: ok This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HeartSaVioR commented on pull request #29509: [SPARK-31608][CORE][WEBUI][TEST] Add test suites for HybridStore and HistoryServerMemoryManager
HeartSaVioR commented on pull request #29509: URL: https://github.com/apache/spark/pull/29509#issuecomment-679490978 Thanks @baohe-zhang for the PR! Merged into master. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HeartSaVioR closed pull request #29509: [SPARK-31608][CORE][WEBUI][TEST] Add test suites for HybridStore and HistoryServerMemoryManager
HeartSaVioR closed pull request #29509: URL: https://github.com/apache/spark/pull/29509 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29509: [SPARK-31608][CORE][WEBUI][TEST] Add test suites for HybridStore and HistoryServerMemoryManager
AmplabJenkins removed a comment on pull request #29509: URL: https://github.com/apache/spark/pull/29509#issuecomment-679484677 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #29509: [SPARK-31608][CORE][WEBUI][TEST] Add test suites for HybridStore and HistoryServerMemoryManager
AmplabJenkins commented on pull request #29509: URL: https://github.com/apache/spark/pull/29509#issuecomment-679484677 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA removed a comment on pull request #29509: [SPARK-31608][CORE][WEBUI][TEST] Add test suites for HybridStore and HistoryServerMemoryManager
SparkQA removed a comment on pull request #29509: URL: https://github.com/apache/spark/pull/29509#issuecomment-679429186 **[Test build #127861 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127861/testReport)** for PR 29509 at commit [`f83d2df`](https://github.com/apache/spark/commit/f83d2dfced494471f1fb5bdb1268260bb3a5e275). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #29509: [SPARK-31608][CORE][WEBUI][TEST] Add test suites for HybridStore and HistoryServerMemoryManager
SparkQA commented on pull request #29509: URL: https://github.com/apache/spark/pull/29509#issuecomment-679481328 **[Test build #127861 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127861/testReport)** for PR 29509 at commit [`f83d2df`](https://github.com/apache/spark/commit/f83d2dfced494471f1fb5bdb1268260bb3a5e275). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29522: [SPARK-32641][SQL] withField + getField should return null if original struct was null
AmplabJenkins removed a comment on pull request #29522: URL: https://github.com/apache/spark/pull/29522#issuecomment-679472632 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #29522: [SPARK-32641][SQL] withField + getField should return null if original struct was null
AmplabJenkins commented on pull request #29522: URL: https://github.com/apache/spark/pull/29522#issuecomment-679472632 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] beliefer commented on a change in pull request #28685: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function
beliefer commented on a change in pull request #28685: URL: https://github.com/apache/spark/pull/28685#discussion_r476096152 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala ## @@ -474,6 +480,54 @@ case class Lag(input: Expression, offset: Expression, default: Expression) override val direction = Descending } +/** + * The NthValue function returns the value of `input` at the `offset`th row from beginning of the + * window frame. Offsets start at 1. When the value of `input` is null at the `offset`th row or + * there is no such an `offset`th row, null is returned. + * + * @param input expression to evaluate `offset`th row of the window frame. + * @param offset rows to jump ahead in the partition. + */ +@ExpressionDescription( + usage = """ +_FUNC_(input[, offset]) - Returns the value of `input` at the row that is the `offset`th row + from beginning of the window frame. Offsets start at 1. If the value of `input` at the + `offset`th row is null, null is returned. If there is no such an offset row (e.g., when the + offset is 10, size of the window frame less than 10), null is returned. + """, + since = "3.1.0") +case class NthValue(input: Expression, offset: Expression) +extends OffsetWindowFunction { + + override val default = Literal(null) + + override val startWithCurrentRow = false + + override val direction = Ascending + + override def checkInputDataTypes(): TypeCheckResult = { +val check = super.checkInputDataTypes() +if (check.isFailure) { + check +} else if (offset.foldable) { + offset.dataType match { +case IntegerType => + offset.eval().asInstanceOf[Int] match { +case i: Int if i <= 0 => TypeCheckFailure( + s"The 'offset' argument of nth_value must be greater than zero but it is $i.") +case _ => TypeCheckSuccess + } +case _ => TypeCheckFailure( + s"The 'offset' parameter must be a int literal but it is ${offset.dataType}.") Review comment: postgresql: integer oracle:integer redshift:integer vertica:integer This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA removed a comment on pull request #29522: [SPARK-32641][SQL] withField + getField should return null if original struct was null
SparkQA removed a comment on pull request #29522: URL: https://github.com/apache/spark/pull/29522#issuecomment-679380548 **[Test build #127857 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127857/testReport)** for PR 29522 at commit [`df0be4a`](https://github.com/apache/spark/commit/df0be4a93b3aa3d1c2fd32a8301e186de396eab3). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #29522: [SPARK-32641][SQL] withField + getField should return null if original struct was null
SparkQA commented on pull request #29522: URL: https://github.com/apache/spark/pull/29522#issuecomment-679471423 **[Test build #127857 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127857/testReport)** for PR 29522 at commit [`df0be4a`](https://github.com/apache/spark/commit/df0be4a93b3aa3d1c2fd32a8301e186de396eab3). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org