[GitHub] [spark] cloud-fan commented on a change in pull request #32073: [SPARK-34976][SQL] Rename GroupingSet to GroupingAnalytic
cloud-fan commented on a change in pull request #32073: URL: https://github.com/apache/spark/pull/32073#discussion_r608360273 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/grouping.scala ## @@ -26,7 +26,7 @@ import org.apache.spark.sql.types._ /** * A placeholder expression for cube/rollup, which will be replaced by analyzer */ -trait GroupingSet extends Expression with CodegenFallback { +trait GroupingAnalytic extends Expression with CodegenFallback { Review comment: another option is `BaseGroupingSets`. cube/rollup is syntax sugar for grouping sets. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
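The "syntax sugar" observation can be made concrete: `CUBE(a, b)` and `ROLLUP(a, b)` both desugar to plain `GROUPING SETS`. A small Python sketch of the expansion (illustrative only — not Spark's actual `cubeExprs`/`rollupExprs` implementation):

```python
from itertools import combinations

def cube_grouping_sets(cols):
    """GROUP BY CUBE(cols) desugars to GROUPING SETS over every
    subset of the columns, largest subset first."""
    return [list(c) for r in range(len(cols), -1, -1)
            for c in combinations(cols, r)]

def rollup_grouping_sets(cols):
    """GROUP BY ROLLUP(cols) desugars to GROUPING SETS over every
    prefix of the columns, longest prefix first."""
    return [list(cols[:r]) for r in range(len(cols), -1, -1)]

print(cube_grouping_sets(["a", "b"]))    # [['a', 'b'], ['a'], ['b'], []]
print(rollup_grouping_sets(["a", "b"]))  # [['a', 'b'], ['a'], []]
```

Since both constructs bottom out in grouping sets, a name like `BaseGroupingSets` for the shared trait matches the semantics.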
[GitHub] [spark] HyukjinKwon commented on pull request #32053: [SPARK-34493][DOCS] Add "TEXT Files" page for Data Source documents
HyukjinKwon commented on pull request #32053: URL: https://github.com/apache/spark/pull/32053#issuecomment-814624945 Looks pretty good otherwise. Make sure to keep the PR description up to date. I will leave it to @srowen, @MaxGekk and @maropu since they are reviewing this.
[GitHub] [spark] HyukjinKwon commented on a change in pull request #32053: [SPARK-34493][DOCS] Add "TEXT Files" page for Data Source documents
HyukjinKwon commented on a change in pull request #32053: URL: https://github.com/apache/spark/pull/32053#discussion_r608359236 ## File path: docs/sql-data-sources-text.md ## @@ -0,0 +1,40 @@ +--- +layout: global +title: Text Files +displayTitle: Text Files +license: | + Licensed to the Apache Software Foundation (ASF) under one or more + contributor license agreements. See the NOTICE file distributed with + this work for additional information regarding copyright ownership. + The ASF licenses this file to You under the Apache License, Version 2.0 + (the "License"); you may not use this file except in compliance with + the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. +--- + +Spark SQL provides `spark.read().text("file_name")` to read a file or directory of text files into a Spark DataFrame, and `dataframe.write().text("path")` to write to a text file. When reading a text file, each line becomes each row that has string "value" column by default. The line separator can be changed as shown in the example below. When specifying a directory as a file path, make sure that the files included in the directory do not contain a format that is inappropriate for reading text, such as ORC or Parquet. The `option()` function can be used to customize the behavior of reading or writing, such as controlling behavior of the line separator, compression, and so on. Review comment: "When specifying a directory as a file path, make sure that the files included in the directory do not contain a format that is inappropriate for reading text, such as ORC or Parquet" I think this is too much to know or document. 
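A minimal, Spark-free model of the reading behavior the page describes (each separated segment becomes one row in a single string "value" column, with a configurable line separator). This is an illustration, not Spark's implementation, and it assumes a trailing separator does not produce an empty row:

```python
def read_text_rows(content: str, line_sep: str = "\n"):
    """Conceptual model of spark.read.text with the lineSep option:
    one row per segment, in a single "value" column."""
    rows = content.split(line_sep)
    if rows and rows[-1] == "":
        rows.pop()  # assumption: a final separator yields no empty row
    return [{"value": v} for v in rows]

print(read_text_rows("a||b||c", line_sep="||"))
# [{'value': 'a'}, {'value': 'b'}, {'value': 'c'}]
```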
[GitHub] [spark] itholic commented on a change in pull request #32053: [SPARK-34493][DOCS] Add "TEXT Files" page for Data Source documents
itholic commented on a change in pull request #32053: URL: https://github.com/apache/spark/pull/32053#discussion_r608358810 ## File path: docs/sql-data-sources.md ## @@ -47,6 +47,7 @@ goes into specific options that are available for the built-in data sources. * [ORC Files](sql-data-sources-orc.html) * [JSON Files](sql-data-sources-json.html) * [CSV Files](sql-data-sources-csv.html) +* [TEXT Files](sql-data-sources-text.html) Review comment: Thanks!
[GitHub] [spark] HyukjinKwon commented on a change in pull request #32053: [SPARK-34493][DOCS] Add "TEXT Files" page for Data Source documents
HyukjinKwon commented on a change in pull request #32053: URL: https://github.com/apache/spark/pull/32053#discussion_r608358436 ## File path: docs/sql-data-sources.md ## @@ -47,6 +47,7 @@ goes into specific options that are available for the built-in data sources. * [ORC Files](sql-data-sources-orc.html) * [JSON Files](sql-data-sources-json.html) * [CSV Files](sql-data-sources-csv.html) +* [TEXT Files](sql-data-sources-text.html) Review comment: TEXT -> Text
[GitHub] [spark] sigmod commented on pull request #32060: [SPARK-34916][SQL] Add condition lambda and rule id to the transform family for early stopping
sigmod commented on pull request #32060: URL: https://github.com/apache/spark/pull/32060#issuecomment-814622504 @dbaliafroozeh @hvanhovell @maryannxue @gengliangwang: this PR is ready for review. Let me know if you have any questions. Thanks!
[GitHub] [spark] cloud-fan closed pull request #31791: [SPARK-34678][SQL] Add table function registry
cloud-fan closed pull request #31791: URL: https://github.com/apache/spark/pull/31791
[GitHub] [spark] cloud-fan commented on pull request #31791: [SPARK-34678][SQL] Add table function registry
cloud-fan commented on pull request #31791: URL: https://github.com/apache/spark/pull/31791#issuecomment-814620919 The GitHub Actions failures are unrelated and Jenkins passes, so I'm merging it to master, thanks!
[GitHub] [spark] SparkQA commented on pull request #32053: [SPARK-34493][DOCS] Add "TEXT Files" page for Data Source documents
SparkQA commented on pull request #32053: URL: https://github.com/apache/spark/pull/32053#issuecomment-814620180 **[Test build #136997 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136997/testReport)** for PR 32053 at commit [`0415cd8`](https://github.com/apache/spark/commit/0415cd87bfcc3fa82915fae9bac7417204aa962a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on pull request #32053: [SPARK-34493][DOCS] Add "TEXT Files" page for Data Source documents
SparkQA removed a comment on pull request #32053: URL: https://github.com/apache/spark/pull/32053#issuecomment-814610651 **[Test build #136997 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136997/testReport)** for PR 32053 at commit [`0415cd8`](https://github.com/apache/spark/commit/0415cd87bfcc3fa82915fae9bac7417204aa962a).
[GitHub] [spark] HyukjinKwon commented on pull request #32073: [SPARK-34976][SQL] Rename GroupingSet to GroupingAnalytic
HyukjinKwon commented on pull request #32073: URL: https://github.com/apache/spark/pull/32073#issuecomment-814619990 @AngersZh please describe why we should rename. The change looks incomplete and I can't follow why we should rename.
[GitHub] [spark] HyukjinKwon commented on a change in pull request #32073: [SPARK-34976][SQL] Rename GroupingSet to GroupingAnalytic
HyukjinKwon commented on a change in pull request #32073: URL: https://github.com/apache/spark/pull/32073#discussion_r608355354 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/grouping.scala ## @@ -26,7 +26,7 @@ import org.apache.spark.sql.types._ /** * A placeholder expression for cube/rollup, which will be replaced by analyzer */ -trait GroupingSet extends Expression with CodegenFallback { +trait GroupingAnalytic extends Expression with CodegenFallback { Review comment: Do you mean `GroupingAnalytics`? and does it represent all grouping analysis including group-bys?
[GitHub] [spark] HyukjinKwon commented on a change in pull request #32073: [SPARK-34976][SQL] Rename GroupingSet to GroupingAnalytic
HyukjinKwon commented on a change in pull request #32073: URL: https://github.com/apache/spark/pull/32073#discussion_r608354977 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/grouping.scala ## @@ -26,7 +26,7 @@ import org.apache.spark.sql.types._ /** * A placeholder expression for cube/rollup, which will be replaced by analyzer */ -trait GroupingSet extends Expression with CodegenFallback { +trait GroupingAnalytic extends Expression with CodegenFallback { def groupingSets: Seq[Seq[Expression]] def selectedGroupByExprs: Seq[Seq[Expression]] Review comment: Error message below "Cannot call GroupingSet.groupByExprs"?
[GitHub] [spark] sarutak commented on a change in pull request #32074: [SPARK-34977][SQL] LIST FILES/JARS/ARCHIVES cannot handle multiple arguments properly when at least one path is quoted
sarutak commented on a change in pull request #32074: URL: https://github.com/apache/spark/pull/32074#discussion_r608354674 ## File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveQuerySuite.scala ## @@ -952,6 +952,72 @@ class HiveQuerySuite extends HiveComparisonTest with SQLTestUtils with BeforeAnd } } + test("SPARK-34977: LIST FILES/JARS/ARCHIVES should handle multiple quoted path arguments") { +withTempDir { dir => + val file1 = File.createTempFile("someprefix1", "somesuffix1", dir) + val file2 = File.createTempFile("someprefix2", "somesuffix2", dir) + val file3 = File.createTempFile("someprefix3", "somesuffix 3", dir) + + Files.write(file1.toPath, "file1".getBytes) + Files.write(file2.toPath, "file2".getBytes) + Files.write(file3.toPath, "file3".getBytes) + + sql(s"ADD FILE ${file1.getAbsolutePath}") + sql(s"ADD FILE ${file2.getAbsolutePath}") + sql(s"ADD FILE '${file3.getAbsolutePath}'") + val listFiles = sql("LIST FILES " + +s"'${file1.getAbsolutePath}' ${file2.getAbsolutePath} '${file3.getAbsolutePath}'") + assert(listFiles.count === 3) + assert(listFiles.filter(_.getString(0).contains(file1.getName)).count() === 1) + assert(listFiles.filter(_.getString(0).contains(file2.getName)).count() === 1) + assert(listFiles.filter( +_.getString(0).contains(file3.getName.replace(" ", "%20"))).count() === 1) + + val file4 = File.createTempFile("someprefix4", "somesuffix4", dir) + val file5 = File.createTempFile("someprefix5", "somesuffix5", dir) + val file6 = File.createTempFile("someprefix6", "somesuffix6", dir) + Files.write(file4.toPath, "file4".getBytes) + Files.write(file5.toPath, "file5".getBytes) + Files.write(file6.toPath, "file6".getBytes) + + val jarFile1 = new File(dir, "test1.jar") + val jarFile2 = new File(dir, "test2.jar") + val jarFile3 = new File(dir, "test 3.jar") + TestUtils.createJar(Seq(file4), jarFile1) + TestUtils.createJar(Seq(file5), jarFile2) + TestUtils.createJar(Seq(file6), jarFile3) + + sql(s"ADD ARCHIVE ${jarFile1.getAbsolutePath}") + sql(s"ADD ARCHIVE ${jarFile2.getAbsolutePath}#foo") + sql(s"ADD ARCHIVE '${jarFile3.getAbsolutePath}'") + val listArchives = sql("LIST ARCHIVES " + +s"'${jarFile1.getAbsolutePath}' ${jarFile2.getAbsolutePath} '${jarFile3.getAbsolutePath}'") + assert(listArchives.count === 3) + assert(listArchives.filter(_.getString(0).contains(jarFile1.getName)).count() === 1) + assert(listArchives.filter(_.getString(0).contains(jarFile2.getName)).count() === 1) + assert(listArchives.filter( +_.getString(0).contains(jarFile3.getName.replace(" ", "%20"))).count() === 1) + + val file7 = File.createTempFile("someprefix7", "somesuffix7", dir) + val file8 = File.createTempFile("someprefix8", "somesuffix8", dir) + Files.write(file4.toPath, "file7".getBytes) + Files.write(file5.toPath, "file8".getBytes) + + val jarFile4 = new File(dir, "test4.jar") + val jarFile5 = new File(dir, "test5.jar") + TestUtils.createJar(Seq(file7), jarFile4) + TestUtils.createJar(Seq(file8), jarFile5) + + sql(s"ADD JAR ${jarFile4.getAbsolutePath}") + sql(s"ADD JAR ${jarFile5.getAbsolutePath}") Review comment: Unlike `ADD FILE "path"` and `ADD ARCHIVE "PATH"`, we cannot execute `ADD JAR "path"` when the path contains whitespaces. I think it's a bug and #32052 will fix this issue.
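As a side note on the `replace(" ", "%20")` assertions in the test above: the listed paths come back as URIs, where a space is percent-encoded. In Python terms (illustrative only, not part of the patch):

```python
from urllib.parse import quote, unquote

# A space in a file name is percent-encoded as %20 in URI form,
# which is why the test matches "test%203.jar" rather than "test 3.jar".
encoded = quote("test 3.jar")
print(encoded)  # test%203.jar
assert unquote(encoded) == "test 3.jar"
```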
[GitHub] [spark] HyukjinKwon commented on a change in pull request #32073: [SPARK-34976][SQL] Rename GroupingSet to GroupingAnalytic
HyukjinKwon commented on a change in pull request #32073: URL: https://github.com/apache/spark/pull/32073#discussion_r608354526 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/grouping.scala ## @@ -106,34 +106,37 @@ object GroupingSet { } } -case class Cube(groupingSetIndexes: Seq[Seq[Int]], children: Seq[Expression]) extends GroupingSet { +case class Cube( +groupingSetIndexes: Seq[Seq[Int]], +children: Seq[Expression]) extends GroupingAnalytic { override def groupingSets: Seq[Seq[Expression]] = groupingSetIndexes.map(_.map(children)) - override def selectedGroupByExprs: Seq[Seq[Expression]] = GroupingSet.cubeExprs(groupingSets) + override def selectedGroupByExprs: Seq[Seq[Expression]] = GroupingAnalytic.cubeExprs(groupingSets) } object Cube { def apply(groupingSets: Seq[Seq[Expression]]): Cube = { -Cube(GroupingSet.computeGroupingSetIndexes(groupingSets), groupingSets.flatten) +Cube(GroupingAnalytic.computeGroupingSetIndexes(groupingSets), groupingSets.flatten) } } case class Rollup( groupingSetIndexes: Seq[Seq[Int]], -children: Seq[Expression]) extends GroupingSet { +children: Seq[Expression]) extends GroupingAnalytic { override def groupingSets: Seq[Seq[Expression]] = groupingSetIndexes.map(_.map(children)) - override def selectedGroupByExprs: Seq[Seq[Expression]] = GroupingSet.rollupExprs(groupingSets) + override def selectedGroupByExprs: Seq[Seq[Expression]] = +GroupingAnalytic.rollupExprs(groupingSets) } object Rollup { def apply(groupingSets: Seq[Seq[Expression]]): Rollup = { -Rollup(GroupingSet.computeGroupingSetIndexes(groupingSets), groupingSets.flatten) +Rollup(GroupingAnalytic.computeGroupingSetIndexes(groupingSets), groupingSets.flatten) } } case class GroupingSets( Review comment: What about this?
[GitHub] [spark] HyukjinKwon commented on a change in pull request #32073: [SPARK-34976][SQL] Rename GroupingSet to GroupingAnalytic
HyukjinKwon commented on a change in pull request #32073: URL: https://github.com/apache/spark/pull/32073#discussion_r608354425 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/grouping.scala ## @@ -145,7 +148,7 @@ object GroupingSets { def apply( groupingSets: Seq[Seq[Expression]], userGivenGroupByExprs: Seq[Expression]): GroupingSets = { -val groupingSetIndexes = GroupingSet.computeGroupingSetIndexes(groupingSets) +val groupingSetIndexes = GroupingAnalytic.computeGroupingSetIndexes(groupingSets) Review comment: Shall we rename the variables too?
[GitHub] [spark] imback82 commented on a change in pull request #32032: [SPARK-34701][SQL] Introduce AnalysisOnlyCommand that allows its children to be removed once the command is marked as analyzed.
imback82 commented on a change in pull request #32032: URL: https://github.com/apache/spark/pull/32032#discussion_r608354169 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/Command.scala ## @@ -37,3 +38,35 @@ trait Command extends LogicalPlan { trait LeafCommand extends Command with LeafLike[LogicalPlan] trait UnaryCommand extends Command with UnaryLike[LogicalPlan] trait BinaryCommand extends Command with BinaryLike[LogicalPlan] + +/** + * A logical node that represents a command whose children are only analyzed, but not optimized. + */ +trait AnalysisOnlyCommand extends Command { + private var _isAnalyzed: Boolean = false + + def childrenToAnalyze: Seq[LogicalPlan] + + override def children: Seq[LogicalPlan] = if (_isAnalyzed) Nil else childrenToAnalyze Review comment: Hmm, I don't think we can use a case class at this level (e.g., it cannot have an abstract member like `childrenToAnalyze`), right? If we need to make the node immutable, I think the responsibility should be at the concrete command - similar to the first commit I had?
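The immutability question in this thread can be sketched in miniature: instead of flipping a private `var`, marking the node as analyzed returns a copy, and `children` is derived from the flag. A hypothetical Python analogue (names mirror the Scala trait; strings stand in for child plans; this is not Spark's actual implementation):

```python
from dataclasses import dataclass, replace
from typing import Tuple

@dataclass(frozen=True)
class AnalysisOnlyCommand:
    """Children are visible until the node is marked as analyzed,
    after which later phases (e.g. the optimizer) see no children."""
    children_to_analyze: Tuple[str, ...]
    is_analyzed: bool = False

    @property
    def children(self):
        return () if self.is_analyzed else self.children_to_analyze

    def marked_as_analyzed(self) -> "AnalysisOnlyCommand":
        # Immutable variant: return a copy instead of mutating state.
        return replace(self, is_analyzed=True)

cmd = AnalysisOnlyCommand(("childPlan",))
print(cmd.children)                       # ('childPlan',)
print(cmd.marked_as_analyzed().children)  # ()
```

The trade-off under discussion is exactly this: the copy-based approach keeps the node immutable but pushes the copying responsibility onto each concrete command.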
[GitHub] [spark] HyukjinKwon commented on a change in pull request #32073: [SPARK-34976][SQL] Rename GroupingSet to GroupingAnalytic
HyukjinKwon commented on a change in pull request #32073: URL: https://github.com/apache/spark/pull/32073#discussion_r608354243 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/grouping.scala ## @@ -106,34 +106,37 @@ object GroupingSet { } } -case class Cube(groupingSetIndexes: Seq[Seq[Int]], children: Seq[Expression]) extends GroupingSet { +case class Cube( +groupingSetIndexes: Seq[Seq[Int]], +children: Seq[Expression]) extends GroupingAnalytic { override def groupingSets: Seq[Seq[Expression]] = groupingSetIndexes.map(_.map(children)) - override def selectedGroupByExprs: Seq[Seq[Expression]] = GroupingSet.cubeExprs(groupingSets) + override def selectedGroupByExprs: Seq[Seq[Expression]] = GroupingAnalytic.cubeExprs(groupingSets) } object Cube { def apply(groupingSets: Seq[Seq[Expression]]): Cube = { -Cube(GroupingSet.computeGroupingSetIndexes(groupingSets), groupingSets.flatten) +Cube(GroupingAnalytic.computeGroupingSetIndexes(groupingSets), groupingSets.flatten) Review comment: Should we rename `computeGroupingSetIndexes` -> `computeGroupingAnalyticIndexes` too?
[GitHub] [spark] SparkQA commented on pull request #32060: [WIP][SPARK-34916][SQL] Add condition lambda and rule id to the transform family for early stopping
SparkQA commented on pull request #32060: URL: https://github.com/apache/spark/pull/32060#issuecomment-814617613 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41564/
[GitHub] [spark] sarutak opened a new pull request #32074: [SPARK-34977][SQL] LIST FILES/JARS/ARCHIVES cannot handle multiple arguments properly when at least one path is quoted
sarutak opened a new pull request #32074: URL: https://github.com/apache/spark/pull/32074

### What changes were proposed in this pull request?

This PR fixes an issue that `LIST {FILES/JARS/ARCHIVES} path1, path2, ...` cannot list all paths if at least one path is quoted. An example:

```
ADD FILE /tmp/test1;
ADD FILE /tmp/test2;

LIST FILES /tmp/test1 /tmp/test2;
file:/tmp/test1
file:/tmp/test2

LIST FILES /tmp/test1 "/tmp/test2";
file:/tmp/test2
```

In this example, the second `LIST FILES` doesn't show `file:/tmp/test1`. To resolve this issue, I modified the syntax rule to handle this case. I also changed `SparkSQLParser` to handle paths which contain white spaces.

### Why are the changes needed?

This is a bug. I also have a plan which extends `ADD FILE/JAR/ARCHIVE` to take multiple paths like Hive, and the syntax rule change is necessary for that.

### Does this PR introduce _any_ user-facing change?

Yes. Users can pass quoted paths when using `ADD FILE/JAR/ARCHIVE`.

### How was this patch tested?

New test.
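The parsing behavior the fix needs — a quoted path is one argument even if it contains spaces, and quoted and unquoted paths can be mixed — is the same rule shell-style tokenizers implement. A Python illustration using `shlex` (not the SQL grammar the PR actually changes):

```python
import shlex

# Quoted arguments stay intact; unquoted arguments split on whitespace.
# This mirrors what LIST FILES path1 "path 2" path3 should produce.
args = shlex.split('/tmp/test1 "/tmp/test2" "/tmp/dir with space/f"')
print(args)  # ['/tmp/test1', '/tmp/test2', '/tmp/dir with space/f']
```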
[GitHub] [spark] SparkQA removed a comment on pull request #32053: [SPARK-34493][DOCS] Add "TEXT Files" page for Data Source documents
SparkQA removed a comment on pull request #32053: URL: https://github.com/apache/spark/pull/32053#issuecomment-814609011 **[Test build #136995 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136995/testReport)** for PR 32053 at commit [`f6198b7`](https://github.com/apache/spark/commit/f6198b7455b543f2b4eea6f429586198c8ec3229).
[GitHub] [spark] SparkQA commented on pull request #32053: [SPARK-34493][DOCS] Add "TEXT Files" page for Data Source documents
SparkQA commented on pull request #32053: URL: https://github.com/apache/spark/pull/32053#issuecomment-814617263 **[Test build #136995 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136995/testReport)** for PR 32053 at commit [`f6198b7`](https://github.com/apache/spark/commit/f6198b7455b543f2b4eea6f429586198c8ec3229). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] cloud-fan commented on a change in pull request #31791: [SPARK-34678][SQL] Add table function registry
cloud-fan commented on a change in pull request #31791: URL: https://github.com/apache/spark/pull/31791#discussion_r608353136 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala ## @@ -83,15 +85,94 @@ trait FunctionRegistry { /** Clear all registered functions. */ def clear(): Unit +} - /** Create a copy of this registry with identical functions as this registry. */ - override def clone(): FunctionRegistry = throw new CloneNotSupportedException() +object FunctionRegistryBase { + + /** + * Return an expression info and a function builder for the function as defined by + * T using the given name. + */ + def build[T : ClassTag](name: String): (ExpressionInfo, Seq[Expression] => T) = { Review comment: You are right, I missed this part.
[GitHub] [spark] cloud-fan commented on a change in pull request #32054: [SPARK-34946][SQL] Block unsupported correlated scalar subquery in Aggregate
cloud-fan commented on a change in pull request #32054: URL: https://github.com/apache/spark/pull/32054#discussion_r608352820 ## File path: sql/core/src/test/scala/org/apache/spark/sql/SubquerySuite.scala ## @@ -1765,4 +1765,35 @@ class SubquerySuite extends QueryTest with SharedSparkSession with AdaptiveSpark } } } + + test("SPARK-34946: correlated scalar subquery in grouping expressions only") { Review comment: ah I see, then +1!
[GitHub] [spark] AngersZhuuuu commented on a change in pull request #30145: [SPARK-33233][SQL]CUBE/ROLLUP/GROUPING SETS support GROUP BY ordinal
AngersZh commented on a change in pull request #30145: URL: https://github.com/apache/spark/pull/30145#discussion_r608352704 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ## @@ -598,8 +598,8 @@ class Analyzer(override val catalogManager: CatalogManager) val aggForResolving = h.child match { // For CUBE/ROLLUP expressions, to avoid resolving repeatedly, here we delete them from // groupingExpressions for condition resolving. -case a @ Aggregate(Seq(gs: GroupingSet), _, _) => - a.copy(groupingExpressions = gs.groupByExprs) +case a @ Aggregate(Seq(gs: GroupingAnalytic), _, _) => + a.copy(groupingExpressions =gs.groupingSets, gs.groupByExprs) Review comment: > nit: one space after `=` A mistake when merging code, done
[GitHub] [spark] AngersZhuuuu commented on a change in pull request #30145: [SPARK-33233][SQL]CUBE/ROLLUP/GROUPING SETS support GROUP BY ordinal
AngersZh commented on a change in pull request #30145: URL: https://github.com/apache/spark/pull/30145#discussion_r608352609 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ## @@ -1787,16 +1787,41 @@ class Analyzer(override val catalogManager: CatalogManager) // Replace the index with the corresponding expression in aggregateExpressions. The index is // a 1-base position of aggregateExpressions, which is output columns (select expression) case Aggregate(groups, aggs, child) if aggs.forall(_.resolved) && -groups.exists(_.isInstanceOf[UnresolvedOrdinal]) => -val newGroups = groups.map { - case u @ UnresolvedOrdinal(index) if index > 0 && index <= aggs.size => -aggs(index - 1) - case ordinal @ UnresolvedOrdinal(index) => -throw QueryCompilationErrors.groupByPositionRangeError(index, aggs.size, ordinal) - case o => o -} +groups.exists(containUnresolvedOrdinal) => +val newGroups = groups.map((resolveGroupByExpressionOrdinal(_, aggs))) Aggregate(newGroups, aggs, child) } + +private def containUnresolvedOrdinal(e: Expression): Boolean = e match { + case _: UnresolvedOrdinal => true + case Cube(_, groupByExprs) => groupByExprs.exists(containUnresolvedOrdinal) + case Rollup(_, groupByExprs) => groupByExprs.exists(containUnresolvedOrdinal) + case GroupingSets(_, flatGroupingSets, groupByExprs) => +flatGroupingSets.exists(containUnresolvedOrdinal) || + groupByExprs.exists(containUnresolvedOrdinal) Review comment: > Can we simply do `case a: GroupingAnalytic a.children.exists...`? Done
[GitHub] [spark] AngersZhuuuu commented on a change in pull request #30145: [SPARK-33233][SQL]CUBE/ROLLUP/GROUPING SETS support GROUP BY ordinal
AngersZhuuuu commented on a change in pull request #30145:
URL: https://github.com/apache/spark/pull/30145#discussion_r608352564

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala

@@ -1787,16 +1787,41 @@ class Analyzer(override val catalogManager: CatalogManager)
     // Replace the index with the corresponding expression in aggregateExpressions. The index is
     // a 1-base position of aggregateExpressions, which is output columns (select expression)
     case Aggregate(groups, aggs, child) if aggs.forall(_.resolved) &&
-      groups.exists(_.isInstanceOf[UnresolvedOrdinal]) =>
-      val newGroups = groups.map {
-        case u @ UnresolvedOrdinal(index) if index > 0 && index <= aggs.size =>
-          aggs(index - 1)
-        case ordinal @ UnresolvedOrdinal(index) =>
-          throw QueryCompilationErrors.groupByPositionRangeError(index, aggs.size, ordinal)
-        case o => o
-      }
+      groups.exists(containUnresolvedOrdinal) =>
+      val newGroups = groups.map((resolveGroupByExpressionOrdinal(_, aggs)))
       Aggregate(newGroups, aggs, child)
   }
+
+  private def containUnresolvedOrdinal(e: Expression): Boolean = e match {
+    case _: UnresolvedOrdinal => true
+    case Cube(_, groupByExprs) => groupByExprs.exists(containUnresolvedOrdinal)
+    case Rollup(_, groupByExprs) => groupByExprs.exists(containUnresolvedOrdinal)
+    case GroupingSets(_, flatGroupingSets, groupByExprs) =>
+      flatGroupingSets.exists(containUnresolvedOrdinal) ||
+        groupByExprs.exists(containUnresolvedOrdinal)
+    case _ => false
+  }
+
+  private def resolveGroupByExpressionOrdinal(
+      expr: Expression,
+      aggs: Seq[Expression]): Expression = expr match {
+    case ordinal @ UnresolvedOrdinal(index) =>
+      if (index > 0 && index <= aggs.size) {
+        aggs(index - 1)
+      } else {
+        throw QueryCompilationErrors.groupByPositionRangeError(index, aggs.size, ordinal)
+      }
+    case cube @ Cube(_, groupByExprs) =>

Review comment:
> how about using `expr.withNewChildren`?

Done
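The `withNewChildren` suggestion above collapses the per-class `Cube`/`Rollup`/`GroupingSets` cases into one generic recursion over a node's children. A minimal, self-contained sketch of the idea (toy `Expr` classes standing in for Catalyst's `Expression`; all names here are illustrative, not Spark's actual API):

```scala
// Minimal stand-in for Catalyst expressions, to show the shape of the
// reviewer's `withNewChildren` suggestion (names are illustrative, not
// Spark's actual API).
sealed trait Expr {
  def children: Seq[Expr]
  def withNewChildren(newChildren: Seq[Expr]): Expr
}
case class Ordinal(index: Int) extends Expr {
  def children: Seq[Expr] = Nil
  def withNewChildren(c: Seq[Expr]): Expr = this
}
case class Col(name: String) extends Expr {
  def children: Seq[Expr] = Nil
  def withNewChildren(c: Seq[Expr]): Expr = this
}
case class Cube(children: Seq[Expr]) extends Expr {
  def withNewChildren(c: Seq[Expr]): Expr = copy(children = c)
}

// One generic recursion: resolve ordinals against the select list, and
// rebuild any other node (Cube/Rollup/GroupingSets alike) from its
// resolved children instead of matching each class separately.
def resolveOrdinal(e: Expr, aggs: Seq[Expr]): Expr = e match {
  case Ordinal(i) if i > 0 && i <= aggs.size => aggs(i - 1)
  case Ordinal(i) =>
    throw new IllegalArgumentException(s"GROUP BY position $i is not in the select list")
  case other => other.withNewChildren(other.children.map(resolveOrdinal(_, aggs)))
}
```

With this shape, adding a new grouping-analytics node requires no change to the resolution rule, which is the point of the review comment.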
[GitHub] [spark] AngersZhuuuu commented on a change in pull request #30145: [SPARK-33233][SQL]CUBE/ROLLUP/GROUPING SETS support GROUP BY ordinal
AngersZhuuuu commented on a change in pull request #30145:
URL: https://github.com/apache/spark/pull/30145#discussion_r608352335

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/grouping.scala

@@ -144,12 +147,12 @@ case class GroupingSets(
 object GroupingSets {
   def apply(
       groupingSets: Seq[Seq[Expression]],
-      userGivenGroupByExprs: Seq[Expression]): GroupingSets = {
-    val groupingSetIndexes = GroupingSet.computeGroupingSetIndexes(groupingSets)
+      userGivenGroupByExprs: Seq[Expression]): GroupingAnalytic = {
+    val groupingSetIndexes = GroupingAnalytic.computeGroupingSetIndexes(groupingSets)
     GroupingSets(groupingSetIndexes, groupingSets.flatten, userGivenGroupByExprs)
   }

-  def apply(groupingSets: Seq[Seq[Expression]]): GroupingSets = {
+  def apply(groupingSets: Seq[Seq[Expression]]): GroupingAnalytic = {

Review comment:
> we can probably do the rename in a separate PR.

Done: https://github.com/apache/spark/pull/32073

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/SubstituteUnresolvedOrdinals.scala

@@ -42,10 +52,19 @@ object SubstituteUnresolvedOrdinals extends Rule[LogicalPlan] {
     }
     withOrigin(s.origin)(s.copy(order = newOrders))

-    case a: Aggregate if conf.groupByOrdinal && a.groupingExpressions.exists(isIntLiteral) =>
+    case a: Aggregate if conf.groupByOrdinal && a.groupingExpressions.exists(containIntLiteral) =>
       val newGroups = a.groupingExpressions.map {
         case ordinal @ Literal(index: Int, IntegerType) =>
           withOrigin(ordinal.origin)(UnresolvedOrdinal(index))
+        case cube @ Cube(_, children) =>
+          withOrigin(cube.origin)(cube.copy(children = children.map(substituteUnresolvedOrdinal)))
+        case rollup @ Rollup(_, children) =>
+          withOrigin(rollup.origin)(rollup.copy(
+            children = children.map(substituteUnresolvedOrdinal)))
+        case groupingSets @ GroupingSets(_, flatGroupingSets, groupByExprs) =>
+          withOrigin(groupingSets.origin)(groupingSets.copy(
+            flatGroupingSets = flatGroupingSets.map(substituteUnresolvedOrdinal),
+            groupByExprs = groupByExprs.map(substituteUnresolvedOrdinal)))

Review comment:
> ditto, we can use `withNewChildren`

Yea

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/SubstituteUnresolvedOrdinals.scala

@@ -27,13 +27,23 @@ import org.apache.spark.sql.types.IntegerType
  * Replaces ordinal in 'order by' or 'group by' with UnresolvedOrdinal expression.
  */
 object SubstituteUnresolvedOrdinals extends Rule[LogicalPlan] {
-  private def isIntLiteral(e: Expression) = e match {
+  private def containIntLiteral(e: Expression): Boolean = e match {
     case Literal(_, IntegerType) => true
+    case Cube(_, groupByExprs) => groupByExprs.exists(containIntLiteral)
+    case Rollup(_, groupByExprs) => groupByExprs.exists(containIntLiteral)
+    case GroupingSets(_, flatGroupingSets, groupByExprs) =>
+      flatGroupingSets.exists(containIntLiteral) || groupByExprs.exists(containIntLiteral)

Review comment:
> ditto, we can use `children`

Yea
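The `children`-based containment check the reviewer asks for can be sketched the same way: one generic recursion replaces the hand-written `Cube`/`Rollup`/`GroupingSets` cases. A toy, self-contained illustration (node classes here are illustrative, not Catalyst's):

```scala
// A tiny tree exposing `children`, enough to show the reviewer's point:
// with a uniform `children` method, "does any node in the tree contain an
// ordinal literal?" needs no per-class Cube/Rollup/GroupingSets cases.
sealed trait Node { def children: Seq[Node] }
case class OrdinalLit(index: Int) extends Node { def children: Seq[Node] = Nil }
case class Group(children: Seq[Node]) extends Node

def containsOrdinal(n: Node): Boolean = n match {
  case _: OrdinalLit => true
  case other => other.children.exists(containsOrdinal)
}
```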
[GitHub] [spark] SparkQA commented on pull request #30145: [SPARK-33233][SQL]CUBE/ROLLUP/GROUPING SETS support GROUP BY ordinal
SparkQA commented on pull request #30145: URL: https://github.com/apache/spark/pull/30145#issuecomment-814615642 Kubernetes integration test unable to build dist. exiting with code: 1 URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41571/
[GitHub] [spark] SparkQA commented on pull request #32060: [WIP][SPARK-34916][SQL] Add condition lambda and rule id to the transform family for early stopping
SparkQA commented on pull request #32060: URL: https://github.com/apache/spark/pull/32060#issuecomment-814615001 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41564/
[GitHub] [spark] SparkQA removed a comment on pull request #30145: [SPARK-33233][SQL]CUBE/ROLLUP/GROUPING SETS support GROUP BY ordinal
SparkQA removed a comment on pull request #30145: URL: https://github.com/apache/spark/pull/30145#issuecomment-814609505 **[Test build #136996 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136996/testReport)** for PR 30145 at commit [`ff6794e`](https://github.com/apache/spark/commit/ff6794eb5387b6e83bfd3875884df02b75b0fafd).
[GitHub] [spark] SparkQA commented on pull request #30145: [SPARK-33233][SQL]CUBE/ROLLUP/GROUPING SETS support GROUP BY ordinal
SparkQA commented on pull request #30145: URL: https://github.com/apache/spark/pull/30145#issuecomment-814612779 **[Test build #136996 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136996/testReport)** for PR 30145 at commit [`ff6794e`](https://github.com/apache/spark/commit/ff6794eb5387b6e83bfd3875884df02b75b0fafd). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on pull request #32066: [SPARK-34970][SQL][SERCURITY] Redact map-type options in the output of explain()
AmplabJenkins commented on pull request #32066: URL: https://github.com/apache/spark/pull/32066#issuecomment-814612482 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41563/
[GitHub] [spark] SparkQA commented on pull request #32066: [SPARK-34970][SQL][SERCURITY] Redact map-type options in the output of explain()
SparkQA commented on pull request #32066: URL: https://github.com/apache/spark/pull/32066#issuecomment-814612454
[GitHub] [spark] cloud-fan commented on a change in pull request #32070: [SPARK-34668][SQL] Support casting of day-time intervals to strings
cloud-fan commented on a change in pull request #32070:
URL: https://github.com/apache/spark/pull/32070#discussion_r608349143

## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala

@@ -818,6 +818,31 @@ abstract class CastSuiteBase extends SparkFunSuite with ExpressionEvalHelper {
     checkConsistencyBetweenInterpretedAndCodegen(
       (child: Expression) => Cast(child, StringType), YearMonthIntervalType)
   }
+
+  test("SPARK-34668: cast day-time interval to string") {
+    Seq(
+      Duration.ZERO -> "0 0:0:0",
+      Duration.of(1, ChronoUnit.MICROS) -> "0 0:0:0.01",
+      Duration.ofMillis(-1) -> "-0 0:0:0.001",
+      Duration.ofMillis(1234) -> "0 0:0:1.234",
+      Duration.ofSeconds(-59).minus(99, ChronoUnit.MICROS) -> "-0 0:0:59.99",
+      Duration.ofMinutes(30).plusMillis(10) -> "0 0:30:0.01",
+      Duration.ofHours(-23).minusSeconds(59) -> "-0 23:0:59",

Review comment:
that's the literal syntax, which is supposed to be more flexible. How about the cast behavior?
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32073: [SPARK-34976][SQL] Rename GroupingSet to GroupingAnalytic
AmplabJenkins removed a comment on pull request #32073: URL: https://github.com/apache/spark/pull/32073#issuecomment-814611799 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41567/
[GitHub] [spark] SparkQA commented on pull request #32073: [SPARK-34976][SQL] Rename GroupingSet to GroupingAnalytic
SparkQA commented on pull request #32073: URL: https://github.com/apache/spark/pull/32073#issuecomment-814611787 Kubernetes integration test unable to build dist. exiting with code: 1 URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41567/
[GitHub] [spark] AmplabJenkins commented on pull request #32073: [SPARK-34976][SQL] Rename GroupingSet to GroupingAnalytic
AmplabJenkins commented on pull request #32073: URL: https://github.com/apache/spark/pull/32073#issuecomment-814611799 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41567/
[GitHub] [spark] cloud-fan commented on a change in pull request #32032: [SPARK-34701][SQL] Introduce AnalysisOnlyCommand that allows its children to be removed once the command is marked as analyzed.
cloud-fan commented on a change in pull request #32032:
URL: https://github.com/apache/spark/pull/32032#discussion_r608348517

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/Command.scala

@@ -37,3 +38,35 @@ trait Command extends LogicalPlan {
 trait LeafCommand extends Command with LeafLike[LogicalPlan]
 trait UnaryCommand extends Command with UnaryLike[LogicalPlan]
 trait BinaryCommand extends Command with BinaryLike[LogicalPlan]
+
+/**
+ * A logical node that represents a command whose children are only analyzed, but not optimized.
+ */
+trait AnalysisOnlyCommand extends Command {
+  private var _isAnalyzed: Boolean = false
+
+  def childrenToAnalyze: Seq[LogicalPlan]
+
+  override def children: Seq[LogicalPlan] = if (_isAnalyzed) Nil else childrenToAnalyze

Review comment:
This is tricky because the `children` can change dynamically. I was expecting to put `isAnalyzed` as a case class parameter, so `markAsAnalyzed` creates a new copy and the plan node is still immutable. We can avoid changing `TreeNode` if we do so.
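The immutability alternative the reviewer sketches, `isAnalyzed` as a case-class parameter with `markAsAnalyzed` returning a copy, could look roughly like this (a minimal sketch with illustrative names, not Spark's final API):

```scala
// Instead of a mutable `private var _isAnalyzed`, carry the flag as a
// constructor parameter so the plan node stays immutable; "marking" the
// node as analyzed produces a new copy via the case-class `copy` method.
case class AnalysisOnlyNode(
    childrenToAnalyze: Seq[String],
    isAnalyzed: Boolean = false) {
  // Once analyzed, report no children, hiding them from later phases.
  def children: Seq[String] = if (isAnalyzed) Nil else childrenToAnalyze
  def markAsAnalyzed: AnalysisOnlyNode = copy(isAnalyzed = true)
}
```

Because `markAsAnalyzed` returns a fresh node rather than mutating state, the original node keeps exposing its children, which sidesteps the "children can change dynamically" concern.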
[GitHub] [spark] SparkQA commented on pull request #32053: [SPARK-34493][DOCS] Add "TEXT Files" page for Data Source documents
SparkQA commented on pull request #32053: URL: https://github.com/apache/spark/pull/32053#issuecomment-814610651 **[Test build #136997 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136997/testReport)** for PR 32053 at commit [`0415cd8`](https://github.com/apache/spark/commit/0415cd87bfcc3fa82915fae9bac7417204aa962a).
[GitHub] [spark] itholic commented on a change in pull request #32053: [SPARK-34493][DOCS] Add "TEXT Files" page for Data Source documents
itholic commented on a change in pull request #32053:
URL: https://github.com/apache/spark/pull/32053#discussion_r608347743

## File path: examples/src/main/java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java

@@ -389,6 +392,67 @@ private static void runCsvDatasetExample(SparkSession spark) {
     // $example off:csv_dataset$
   }

+  private static void runTextDatasetExample(SparkSession spark) {
+    // $example on:text_dataset$
+    // A text dataset is pointed to by path.
+    // The path can be either a single text file or a directory of text files
+    String path = "examples/src/main/resources/people.txt";
+
+    Dataset<Row> df1 = spark.read().text(path);
+    df1.show();
+    // +-----------+
+    // |      value|
+    // +-----------+
+    // |Michael, 29|
+    // |   Andy, 30|
+    // | Justin, 19|
+    // +-----------+
+
+    // You can use 'lineSep' option to define the line separator.
+    // If None is set, it covers all `\r`, `\r\n` and `\n` (default).
+    Dataset<Row> df2 = spark.read().option("lineSep", ",").text(path);
+    df2.show();
+    // +-----------+
+    // |      value|
+    // +-----------+
+    // |    Michael|
+    // |   29\nAndy|
+    // | 30\nJustin|
+    // |       19\n|
+    // +-----------+
+
+    // You can also use 'wholetext' option to read each input file as a single row.
+    Dataset<Row> df3 = spark.read().option("wholetext", "true").text(path);
+    df3.show();
+    // +--------------------+
+    // |               value|
+    // +--------------------+
+    // |Michael, 29\nAndy...|
+    // +--------------------+
+
+    // "output" is a folder which contains multiple text files and a _SUCCESS file.
+    df1.write().text("output");
+
+    // You can specify the compression format using the 'compression' option.
+    df1.write().option("compression", "gzip").text("output_compressed");
+
+    // Read all files in a folder.
+    String folderPath = "examples/src/main/resources";
+    Dataset<Row> df = spark.read().text(folderPath);
+    df.show();
+    // +----------+
+    // |     value|
+    // +----------+
+    // |238val_238|

Review comment:
Thanks! Just removed this from the examples block, and rather added more comments to the main contents block (because we already have the case for reading one proper text file above):

"When specifying a directory as a file path, make sure that the files included in the directory do not contain a format that is inappropriate for reading text, such as ORC or Parquet."
[GitHub] [spark] SparkQA commented on pull request #30145: [SPARK-33233][SQL]CUBE/ROLLUP/GROUPING SETS support GROUP BY ordinal
SparkQA commented on pull request #30145: URL: https://github.com/apache/spark/pull/30145#issuecomment-814609505 **[Test build #136996 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136996/testReport)** for PR 30145 at commit [`ff6794e`](https://github.com/apache/spark/commit/ff6794eb5387b6e83bfd3875884df02b75b0fafd).
[GitHub] [spark] MaxGekk commented on a change in pull request #32070: [SPARK-34668][SQL] Support casting of day-time intervals to strings
MaxGekk commented on a change in pull request #32070:
URL: https://github.com/apache/spark/pull/32070#discussion_r608347206

## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala

@@ -818,6 +818,31 @@ abstract class CastSuiteBase extends SparkFunSuite with ExpressionEvalHelper {
     checkConsistencyBetweenInterpretedAndCodegen(
       (child: Expression) => Cast(child, StringType), YearMonthIntervalType)
   }
+
+  test("SPARK-34668: cast day-time interval to string") {
+    Seq(
+      Duration.ZERO -> "0 0:0:0",
+      Duration.of(1, ChronoUnit.MICROS) -> "0 0:0:0.01",
+      Duration.ofMillis(-1) -> "-0 0:0:0.001",
+      Duration.ofMillis(1234) -> "0 0:0:1.234",
+      Duration.ofSeconds(-59).minus(99, ChronoUnit.MICROS) -> "-0 0:0:59.99",
+      Duration.ofMinutes(30).plusMillis(10) -> "0 0:30:0.01",
+      Duration.ofHours(-23).minusSeconds(59) -> "-0 23:0:59",

Review comment:
> Have we checked with other databases?

For example, Oracle doesn't prepend zero for hours, see the [doc](https://docs.oracle.com/en/database/oracle/oracle-database/12.2/sqlrf/Literals.html#GUID-49FADC66-794D-4763-88C7-B81BB4F26D9E):

    INTERVAL '4 5:12:10.222' DAY TO SECOND
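The format under discussion renders a day-time interval as `days hours:minutes:seconds[.fraction]` with a leading sign. A hypothetical formatter along those lines (not the PR's actual implementation; printing up to six fractional digits and trimming trailing zeros is an assumption here, as is leaving the fields un-padded):

```scala
import java.time.Duration

// Hypothetical helper: render a java.time.Duration in a DAY TO SECOND
// style such as "4 5:12:10.222". The sign is taken from the duration as a
// whole, and the remaining fields are computed from its absolute value.
def toDayTimeString(d: Duration): String = {
  val sign = if (d.isNegative) "-" else ""
  val micros = d.abs.toNanos / 1000L          // whole microseconds
  val days = micros / 86400000000L
  val hours = micros / 3600000000L % 24
  val minutes = micros / 60000000L % 60
  val seconds = micros / 1000000L % 60
  val fraction = micros % 1000000L
  // Print the fraction only when non-zero, trimming trailing zeros.
  val frac =
    if (fraction == 0L) ""
    else "." + "%06d".format(fraction).reverse.dropWhile(_ == '0').reverse
  s"$sign$days $hours:$minutes:$seconds$frac"
}
```

For instance, `toDayTimeString(Duration.ofMillis(1234))` yields `"0 0:0:1.234"`, matching the corresponding case in the quoted test.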
[GitHub] [spark] itholic commented on a change in pull request #32053: [SPARK-34493][DOCS] Add "TEXT Files" page for Data Source documents
itholic commented on a change in pull request #32053:
URL: https://github.com/apache/spark/pull/32053#discussion_r608347059

## File path: examples/src/main/scala/org/apache/spark/examples/sql/SQLDataSourceExample.scala

@@ -309,6 +310,67 @@ object SQLDataSourceExample {
     // $example off:csv_dataset$
   }

+  private def runTextDatasetExample(spark: SparkSession): Unit = {
+    // $example on:text_dataset$
+    // A text dataset is pointed to by path.
+    // The path can be either a single text file or a directory of text files
+    val path = "examples/src/main/resources/people.txt"
+
+    val df1 = spark.read.text(path)
+    df1.show()
+    // +-----------+
+    // |      value|
+    // +-----------+
+    // |Michael, 29|
+    // |   Andy, 30|
+    // | Justin, 19|
+    // +-----------+
+
+    // You can use 'lineSep' option to define the line separator.
+    // If None is set, the line separator handles all `\r`, `\r\n` and `\n` by default.

Review comment:
Thanks!
[GitHub] [spark] SparkQA commented on pull request #32053: [SPARK-34493][DOCS] Add "TEXT Files" page for Data Source documents
SparkQA commented on pull request #32053: URL: https://github.com/apache/spark/pull/32053#issuecomment-814609011 **[Test build #136995 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136995/testReport)** for PR 32053 at commit [`f6198b7`](https://github.com/apache/spark/commit/f6198b7455b543f2b4eea6f429586198c8ec3229).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32073: [SPARK-34976][SQL] Rename GroupingSet to GroupingAnalytic
AmplabJenkins removed a comment on pull request #32073: URL: https://github.com/apache/spark/pull/32073#issuecomment-814608951 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136990/
[GitHub] [spark] SparkQA removed a comment on pull request #32073: [SPARK-34976][SQL] Rename GroupingSet to GroupingAnalytic
SparkQA removed a comment on pull request #32073: URL: https://github.com/apache/spark/pull/32073#issuecomment-814604786 **[Test build #136990 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136990/testReport)** for PR 32073 at commit [`e77c289`](https://github.com/apache/spark/commit/e77c289c297ae0e91787415eab5b8bea4c17d158).
[GitHub] [spark] SparkQA commented on pull request #32073: [SPARK-34976][SQL] Rename GroupingSet to GroupingAnalytic
SparkQA commented on pull request #32073: URL: https://github.com/apache/spark/pull/32073#issuecomment-814608927 **[Test build #136990 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136990/testReport)** for PR 32073 at commit [`e77c289`](https://github.com/apache/spark/commit/e77c289c297ae0e91787415eab5b8bea4c17d158). * This patch **fails to build**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `trait GroupingAnalytic extends Expression with CodegenFallback ` * `case class Cube(`
[GitHub] [spark] AmplabJenkins commented on pull request #32073: [SPARK-34976][SQL] Rename GroupingSet to GroupingAnalytic
AmplabJenkins commented on pull request #32073: URL: https://github.com/apache/spark/pull/32073#issuecomment-814608951 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136990/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30145: [SPARK-33233][SQL]CUBE/ROLLUP/GROUPING SETS support GROUP BY ordinal
AmplabJenkins removed a comment on pull request #30145: URL: https://github.com/apache/spark/pull/30145#issuecomment-814607938 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136994/
[GitHub] [spark] SparkQA removed a comment on pull request #30145: [SPARK-33233][SQL]CUBE/ROLLUP/GROUPING SETS support GROUP BY ordinal
SparkQA removed a comment on pull request #30145: URL: https://github.com/apache/spark/pull/30145#issuecomment-814605230 **[Test build #136994 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136994/testReport)** for PR 30145 at commit [`a013120`](https://github.com/apache/spark/commit/a013120c8e9f0bdfb6eac91b3ed881059577d855).
[GitHub] [spark] AmplabJenkins commented on pull request #30145: [SPARK-33233][SQL]CUBE/ROLLUP/GROUPING SETS support GROUP BY ordinal
AmplabJenkins commented on pull request #30145: URL: https://github.com/apache/spark/pull/30145#issuecomment-814607938 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136994/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32053: [SPARK-34493][DOCS] Add "TEXT Files" page for Data Source documents
AmplabJenkins removed a comment on pull request #32053: URL: https://github.com/apache/spark/pull/32053#issuecomment-814607685 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41565/
[GitHub] [spark] SparkQA commented on pull request #30145: [SPARK-33233][SQL]CUBE/ROLLUP/GROUPING SETS support GROUP BY ordinal
SparkQA commented on pull request #30145: URL: https://github.com/apache/spark/pull/30145#issuecomment-814607918 **[Test build #136994 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136994/testReport)** for PR 30145 at commit [`a013120`](https://github.com/apache/spark/commit/a013120c8e9f0bdfb6eac91b3ed881059577d855).
* This patch **fails to build**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on pull request #32053: [SPARK-34493][DOCS] Add "TEXT Files" page for Data Source documents
AmplabJenkins commented on pull request #32053: URL: https://github.com/apache/spark/pull/32053#issuecomment-814607685 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41565/
[GitHub] [spark] SparkQA commented on pull request #32053: [SPARK-34493][DOCS] Add "TEXT Files" page for Data Source documents
SparkQA commented on pull request #32053: URL: https://github.com/apache/spark/pull/32053#issuecomment-814607655
[GitHub] [spark] beliefer commented on pull request #31920: [SPARK-33604][SQL] Group exception messages in sql/execution
beliefer commented on pull request #31920: URL: https://github.com/apache/spark/pull/31920#issuecomment-814607415 cc @cloud-fan
[GitHub] [spark] SparkQA commented on pull request #30145: [SPARK-33233][SQL]CUBE/ROLLUP/GROUPING SETS support GROUP BY ordinal
SparkQA commented on pull request #30145: URL: https://github.com/apache/spark/pull/30145#issuecomment-814605230 **[Test build #136994 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136994/testReport)** for PR 30145 at commit [`a013120`](https://github.com/apache/spark/commit/a013120c8e9f0bdfb6eac91b3ed881059577d855).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32068: [SPARK-34910][SQL] Add an option for different stride orders
AmplabJenkins removed a comment on pull request #32068: URL: https://github.com/apache/spark/pull/32068#issuecomment-814605090 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136974/
[GitHub] [spark] AmplabJenkins commented on pull request #32068: [SPARK-34910][SQL] Add an option for different stride orders
AmplabJenkins commented on pull request #32068: URL: https://github.com/apache/spark/pull/32068#issuecomment-814605090 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136974/
[GitHub] [spark] SparkQA commented on pull request #31974: [SPARK-34877][CORE][YARN]Add the code change for adding the Spark AM log link in spark UI
SparkQA commented on pull request #31974: URL: https://github.com/apache/spark/pull/31974#issuecomment-814604872 **[Test build #136993 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136993/testReport)** for PR 31974 at commit [`d5ca5e2`](https://github.com/apache/spark/commit/d5ca5e2f9763f087b3af7f26f3a6f4c58e89cfb9).
[GitHub] [spark] allisonwang-db commented on pull request #31920: [SPARK-33604][SQL] Group exception messages in sql/execution
allisonwang-db commented on pull request #31920: URL: https://github.com/apache/spark/pull/31920#issuecomment-814604930 Looks good to me!
[GitHub] [spark] SparkQA commented on pull request #32061: [WIP][SPARK-32833][SQL] JDBC V2 Datasource aggregate push down
SparkQA commented on pull request #32061: URL: https://github.com/apache/spark/pull/32061#issuecomment-814604810 **[Test build #136991 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136991/testReport)** for PR 32061 at commit [`dcc16a1`](https://github.com/apache/spark/commit/dcc16a1cc6aa7d5f7323f77f44a766a5f6e785bd).
[GitHub] [spark] SparkQA commented on pull request #32032: [SPARK-34701][SQL] Introduce AnalysisOnlyCommand that allows its children to be removed once the command is marked as analyzed.
SparkQA commented on pull request #32032: URL: https://github.com/apache/spark/pull/32032#issuecomment-814604823 **[Test build #136992 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136992/testReport)** for PR 32032 at commit [`acb74a1`](https://github.com/apache/spark/commit/acb74a115972fa5e45e9f212760b09e1c18bd462).
[GitHub] [spark] SparkQA commented on pull request #32073: [SPARK-34976][SQL] Rename GroupingSet to GroupingAnalytic
SparkQA commented on pull request #32073: URL: https://github.com/apache/spark/pull/32073#issuecomment-814604786 **[Test build #136990 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136990/testReport)** for PR 32073 at commit [`e77c289`](https://github.com/apache/spark/commit/e77c289c297ae0e91787415eab5b8bea4c17d158).
[GitHub] [spark] allisonwang-db commented on a change in pull request #31920: [SPARK-33604][SQL] Group exception messages in sql/execution
allisonwang-db commented on a change in pull request #31920: URL: https://github.com/apache/spark/pull/31920#discussion_r608343436

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryParsingErrors.scala

## @@ -303,4 +303,64 @@ object QueryParsingErrors {
     new ParseException(s"Found duplicate keys '$key'.", ctx)
   }

+  def formatForSetConfigurationUnExpectedError(ctx: SetConfigurationContext): Throwable = {
+    new ParseException(
+      s"""
+         |Expected format is 'SET', 'SET key', or 'SET key=value'. If you want to include
+         |special characters in key, or include semicolon in value, please use quotes,
+         |e.g., SET `ke y`=`v;alue`.
+       """.stripMargin.replaceAll("\n", " "), ctx)
+  }
+
+  def invalidPropertyKeyForSetQuotedConfigurationError(
+      keyCandidate: String, valueStr: String, ctx: SetQuotedConfigurationContext): Throwable = {
+    new ParseException(s"'$keyCandidate' is an invalid property key, please " +
+      s"use quotes, e.g. SET `$keyCandidate`=`$valueStr`", ctx)
+  }
+
+  def invalidPropertyValueForSetQuotedConfigurationError(

Review comment: Sounds good. We can emphasize the property key/value should be quoted.
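The thread above is about SPARK-33604, which centralizes user-facing error messages into one constructor helper per error. A minimal sketch of that pattern in plain Python (the names below are illustrative only, not Spark's actual API):

```python
# Hypothetical error-constructor module: each user-facing error gets one
# helper function, so the message wording lives in a single place and
# stays consistent across call sites.
class ParseException(Exception):
    pass


def invalid_property_key_error(key_candidate: str, value_str: str) -> ParseException:
    # Mirrors the message in the diff above: emphasize that the
    # property key/value should be quoted.
    return ParseException(
        f"'{key_candidate}' is an invalid property key, please "
        f"use quotes, e.g. SET `{key_candidate}`=`{value_str}`"
    )


print(invalid_property_key_error("ke y", "v;alue"))
# 'ke y' is an invalid property key, please use quotes, e.g. SET `ke y`=`v;alue`
```

Callers raise `invalid_property_key_error(...)` instead of building ad hoc strings, which is the grouping the PR performs for `sql/execution`.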
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32053: [SPARK-34493][DOCS] Add "TEXT Files" page for Data Source documents
AmplabJenkins removed a comment on pull request #32053: URL: https://github.com/apache/spark/pull/32053#issuecomment-814604206 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136988/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32071: [SPARK-34973][SQL] Cleanup unused fields and methods in vectorized Parquet reader
AmplabJenkins removed a comment on pull request #32071: URL: https://github.com/apache/spark/pull/32071#issuecomment-814604207 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136973/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32067: [SPARK-34962][SQL] Explicit representation of * in UpdateAction and InsertAction in MergeIntoTable
AmplabJenkins removed a comment on pull request #32067: URL: https://github.com/apache/spark/pull/32067#issuecomment-814604208 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136969/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30145: [SPARK-33233][SQL]CUBE/ROLLUP/GROUPING SETS support GROUP BY ordinal
AmplabJenkins removed a comment on pull request #30145: URL: https://github.com/apache/spark/pull/30145#issuecomment-814604212 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41566/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32054: [SPARK-34946][SQL] Block unsupported correlated scalar subquery in Aggregate
AmplabJenkins removed a comment on pull request #32054: URL: https://github.com/apache/spark/pull/32054#issuecomment-814604209 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136972/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #31791: [SPARK-34678][SQL] Add table function registry
AmplabJenkins removed a comment on pull request #31791: URL: https://github.com/apache/spark/pull/31791#issuecomment-814604214 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136970/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32059: [SPARK-34963][SQL] Fix nested column pruning for extracting case-insensitive struct field from array of struct
AmplabJenkins removed a comment on pull request #32059: URL: https://github.com/apache/spark/pull/32059#issuecomment-814604213 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41562/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32060: [WIP][SPARK-34916][SQL] Add condition lambda and rule id to the transform family for early stopping
AmplabJenkins removed a comment on pull request #32060: URL: https://github.com/apache/spark/pull/32060#issuecomment-814604210
[GitHub] [spark] AmplabJenkins commented on pull request #32053: [SPARK-34493][DOCS] Add "TEXT Files" page for Data Source documents
AmplabJenkins commented on pull request #32053: URL: https://github.com/apache/spark/pull/32053#issuecomment-814604206 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136988/
[GitHub] [spark] AmplabJenkins commented on pull request #32054: [SPARK-34946][SQL] Block unsupported correlated scalar subquery in Aggregate
AmplabJenkins commented on pull request #32054: URL: https://github.com/apache/spark/pull/32054#issuecomment-814604209 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136972/
[GitHub] [spark] AmplabJenkins commented on pull request #32071: [SPARK-34973][SQL] Cleanup unused fields and methods in vectorized Parquet reader
AmplabJenkins commented on pull request #32071: URL: https://github.com/apache/spark/pull/32071#issuecomment-814604207 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136973/
[GitHub] [spark] AmplabJenkins commented on pull request #32067: [SPARK-34962][SQL] Explicit representation of * in UpdateAction and InsertAction in MergeIntoTable
AmplabJenkins commented on pull request #32067: URL: https://github.com/apache/spark/pull/32067#issuecomment-814604208 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136969/
[GitHub] [spark] AmplabJenkins commented on pull request #31791: [SPARK-34678][SQL] Add table function registry
AmplabJenkins commented on pull request #31791: URL: https://github.com/apache/spark/pull/31791#issuecomment-814604214 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136970/
[GitHub] [spark] AmplabJenkins commented on pull request #30145: [SPARK-33233][SQL]CUBE/ROLLUP/GROUPING SETS support GROUP BY ordinal
AmplabJenkins commented on pull request #30145: URL: https://github.com/apache/spark/pull/30145#issuecomment-814604212 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41566/
[GitHub] [spark] AmplabJenkins commented on pull request #32059: [SPARK-34963][SQL] Fix nested column pruning for extracting case-insensitive struct field from array of struct
AmplabJenkins commented on pull request #32059: URL: https://github.com/apache/spark/pull/32059#issuecomment-814604213 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41562/
[GitHub] [spark] AmplabJenkins commented on pull request #32060: [WIP][SPARK-34916][SQL] Add condition lambda and rule id to the transform family for early stopping
AmplabJenkins commented on pull request #32060: URL: https://github.com/apache/spark/pull/32060#issuecomment-814604210
[GitHub] [spark] SparkQA removed a comment on pull request #32068: [SPARK-34910][SQL] Add an option for different stride orders
SparkQA removed a comment on pull request #32068: URL: https://github.com/apache/spark/pull/32068#issuecomment-814510451 **[Test build #136974 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136974/testReport)** for PR 32068 at commit [`89923f6`](https://github.com/apache/spark/commit/89923f662ae23e642774574db47fb8a8af95cae7).
[GitHub] [spark] SparkQA commented on pull request #32068: [SPARK-34910][SQL] Add an option for different stride orders
SparkQA commented on pull request #32068: URL: https://github.com/apache/spark/pull/32068#issuecomment-814604077 **[Test build #136974 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136974/testReport)** for PR 32068 at commit [`89923f6`](https://github.com/apache/spark/commit/89923f662ae23e642774574db47fb8a8af95cae7).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] HyukjinKwon commented on a change in pull request #32053: [SPARK-34493][DOCS] Add "TEXT Files" page for Data Source documents
HyukjinKwon commented on a change in pull request #32053: URL: https://github.com/apache/spark/pull/32053#discussion_r608341903

## File path: examples/src/main/scala/org/apache/spark/examples/sql/SQLDataSourceExample.scala

## @@ -309,6 +310,67 @@ object SQLDataSourceExample {
     // $example off:csv_dataset$
   }

+  private def runTextDatasetExample(spark: SparkSession): Unit = {
+    // $example on:text_dataset$
+    // A text dataset is pointed to by path.
+    // The path can be either a single text file or a directory of text files
+    val path = "examples/src/main/resources/people.txt"
+
+    val df1 = spark.read.text(path)
+    df1.show()
+    // +-----------+
+    // |      value|
+    // +-----------+
+    // |Michael, 29|
+    // |   Andy, 30|
+    // | Justin, 19|
+    // +-----------+
+
+    // You can use 'lineSep' option to define the line separator.
+    // If None is set, it covers all `\r`, `\r\n` and `\n` (default).

Review comment: @itholic please don't resolve the comment if you did not address.
[GitHub] [spark] HyukjinKwon commented on a change in pull request #32053: [SPARK-34493][DOCS] Add "TEXT Files" page for Data Source documents
HyukjinKwon commented on a change in pull request #32053: URL: https://github.com/apache/spark/pull/32053#discussion_r608341605

## File path: examples/src/main/scala/org/apache/spark/examples/sql/SQLDataSourceExample.scala

## @@ -309,6 +310,67 @@ object SQLDataSourceExample {
     // $example off:csv_dataset$
   }

+  private def runTextDatasetExample(spark: SparkSession): Unit = {
+    // $example on:text_dataset$
+    // A text dataset is pointed to by path.
+    // The path can be either a single text file or a directory of text files
+    val path = "examples/src/main/resources/people.txt"
+
+    val df1 = spark.read.text(path)
+    df1.show()
+    // +-----------+
+    // |      value|
+    // +-----------+
+    // |Michael, 29|
+    // |   Andy, 30|
+    // | Justin, 19|
+    // +-----------+
+
+    // You can use 'lineSep' option to define the line separator.
+    // If None is set, the line separator handles all `\r`, `\r\n` and `\n` by default.

Review comment: and address the same instances too.
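The `lineSep` behavior being documented above (when no separator is set, all of `\r`, `\r\n`, and `\n` are treated as line boundaries) can be sketched outside Spark. The plain-Python approximation below is illustrative only, not Spark's implementation:

```python
import re


def split_text(text, line_sep=None):
    # With no lineSep set, cover all of \r, \r\n and \n (the documented
    # default); try \r\n first so it is not split into two boundaries.
    if line_sep is None:
        return re.split(r"\r\n|\r|\n", text)
    # Otherwise split strictly on the user-supplied separator.
    return text.split(line_sep)


print(split_text("Michael, 29\r\nAndy, 30\nJustin, 19"))
# ['Michael, 29', 'Andy, 30', 'Justin, 19']
```

In Spark itself the same effect comes from `spark.read.option("lineSep", ...).text(path)`; omitting the option gives the default multi-separator behavior shown here.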
[GitHub] [spark] HyukjinKwon edited a comment on pull request #32053: [SPARK-34493][DOCS] Add "TEXT Files" page for Data Source documents
HyukjinKwon edited a comment on pull request #32053: URL: https://github.com/apache/spark/pull/32053#issuecomment-814602384 @itholic please address all leftover comments before requesting another review: https://github.com/apache/spark/pull/32053#discussion_r607762705
[GitHub] [spark] HyukjinKwon commented on pull request #32053: [SPARK-34493][DOCS] Add "TEXT Files" page for Data Source documents
HyukjinKwon commented on pull request #32053: URL: https://github.com/apache/spark/pull/32053#issuecomment-814602384 @itholic please address all leftover comments before requesting another review.
[GitHub] [spark] AngersZhuuuu commented on pull request #32073: [SPARK-34976][SQL] Rename GroupingSet to GroupingAnalytic
AngersZhuuuu commented on pull request #32073: URL: https://github.com/apache/spark/pull/32073#issuecomment-814598788 FYI @cloud-fan
[GitHub] [spark] AngersZhuuuu opened a new pull request #32073: [SPARK-34976][SQL] Rename GroupingSet to GroupingAnalytic
AngersZh opened a new pull request #32073: URL: https://github.com/apache/spark/pull/32073

### What changes were proposed in this pull request?
Rename GroupingSet to GroupingAnalytic.

### Why are the changes needed?
Refactor class name.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Not needed.
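For context, the change is just a trait rename. A rough sketch of the type relationship (in Spark these are Scala traits in org.apache.spark.sql.catalyst.expressions; plain Java interfaces with stubbed supertypes are used here only for illustration):

```java
// Self-contained sketch of the rename proposed in this PR.
// Expression and CodegenFallback are stubs; the real ones live in
// Spark's catalyst module.
interface Expression {}
interface CodegenFallback {}

// Formerly `GroupingSet`; the PR renames it to `GroupingAnalytic`
// (review discussion also floated `BaseGroupingSets`, since cube/rollup
// are syntax sugar for grouping sets).
interface GroupingAnalytic extends Expression, CodegenFallback {}
```

Subtypes such as the cube and rollup placeholder expressions are unaffected beyond extending the renamed trait.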
[GitHub] [spark] SparkQA commented on pull request #30145: [SPARK-33233][SQL]CUBE/ROLLUP/GROUPING SETS support GROUP BY ordinal
SparkQA commented on pull request #30145: URL: https://github.com/apache/spark/pull/30145#issuecomment-814596163 Kubernetes integration test unable to build dist. exiting with code: 1 URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41566/
[GitHub] [spark] SparkQA removed a comment on pull request #31791: [SPARK-34678][SQL] Add table function registry
SparkQA removed a comment on pull request #31791: URL: https://github.com/apache/spark/pull/31791#issuecomment-814491009 **[Test build #136970 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136970/testReport)** for PR 31791 at commit [`3e41b61`](https://github.com/apache/spark/commit/3e41b618b1d01e1e36db3fa3b324834718ce38e0).
[GitHub] [spark] imback82 closed pull request #32041: [SPARK-34701][SQL] Introduce CommandWithAnalyzedChildren for a command to have its children only analyzed but not optimized
imback82 closed pull request #32041: URL: https://github.com/apache/spark/pull/32041
[GitHub] [spark] imback82 commented on pull request #32041: [SPARK-34701][SQL] Introduce CommandWithAnalyzedChildren for a command to have its children only analyzed but not optimized
imback82 commented on pull request #32041: URL: https://github.com/apache/spark/pull/32041#issuecomment-814595902 Closing in favor of #32032.
[GitHub] [spark] SparkQA commented on pull request #31791: [SPARK-34678][SQL] Add table function registry
SparkQA commented on pull request #31791: URL: https://github.com/apache/spark/pull/31791#issuecomment-814595797 **[Test build #136970 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136970/testReport)** for PR 31791 at commit [`3e41b61`](https://github.com/apache/spark/commit/3e41b618b1d01e1e36db3fa3b324834718ce38e0).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #32059: [SPARK-34963][SQL] Fix nested column pruning for extracting case-insensitive struct field from array of struct
SparkQA commented on pull request #32059: URL: https://github.com/apache/spark/pull/32059#issuecomment-814595654 Kubernetes integration test unable to build dist. exiting with code: 1 URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41562/
[GitHub] [spark] SparkQA removed a comment on pull request #32054: [SPARK-34946][SQL] Block unsupported correlated scalar subquery in Aggregate
SparkQA removed a comment on pull request #32054: URL: https://github.com/apache/spark/pull/32054#issuecomment-814495351 **[Test build #136972 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136972/testReport)** for PR 32054 at commit [`140fb72`](https://github.com/apache/spark/commit/140fb72dcc82ffd7d98acfeef6d8fa0d5db05970).
[GitHub] [spark] SparkQA commented on pull request #32054: [SPARK-34946][SQL] Block unsupported correlated scalar subquery in Aggregate
SparkQA commented on pull request #32054: URL: https://github.com/apache/spark/pull/32054#issuecomment-814594965 **[Test build #136972 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136972/testReport)** for PR 32054 at commit [`140fb72`](https://github.com/apache/spark/commit/140fb72dcc82ffd7d98acfeef6d8fa0d5db05970).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] sunchao commented on pull request #32071: [SPARK-34973][SQL] Cleanup unused fields and methods in vectorized Parquet reader
sunchao commented on pull request #32071: URL: https://github.com/apache/spark/pull/32071#issuecomment-814594810 Thanks for the review. I'm not sure these were intended for nested types - they were introduced in #9774 and soon replaced by the vectorized value readers. For the nested reader we might want to do something different instead of reusing the parquet-mr classes, such as decoding a batch of def/rep levels at a time according to the batch size.
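The batched def/rep-level decoding idea mentioned at the end could look roughly like this. Purely illustrative: `countNulls`, `defLevels`, and `maxDefLevel` are hypothetical names, not parquet-mr or Spark API; the sketch only shows how a whole batch of definition levels maps to null counts.

```java
// Illustrative sketch: process a batch of Parquet definition levels at once.
// In the Parquet model, a value is null at a column when its definition
// level is below the column's maximum definition level.
class DefLevelBatch {
    static int countNulls(int[] defLevels, int maxDefLevel) {
        int nulls = 0;
        for (int level : defLevels) {
            if (level < maxDefLevel) {
                nulls++;
            }
        }
        return nulls;
    }
}
```

A real batched reader would additionally use the repetition levels to find record boundaries, but the per-batch loop structure would be similar.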