[GitHub] [spark] cloud-fan commented on a change in pull request #32073: [SPARK-34976][SQL] Rename GroupingSet to GroupingAnalytic

2021-04-06 Thread GitBox


cloud-fan commented on a change in pull request #32073:
URL: https://github.com/apache/spark/pull/32073#discussion_r608360273



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/grouping.scala
##
@@ -26,7 +26,7 @@ import org.apache.spark.sql.types._
 /**
  * A placeholder expression for cube/rollup, which will be replaced by analyzer
  */
-trait GroupingSet extends Expression with CodegenFallback {
+trait GroupingAnalytic extends Expression with CodegenFallback {

Review comment:
   another option is `BaseGroupingSets`. cube/rollup is syntax sugar for 
grouping sets.
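   For context, a minimal self-contained sketch of the equivalence this comment relies on; the table and column names are illustrative, not from the PR:

   ```
   import org.apache.spark.sql.SparkSession

   object GroupingSetsSugarSketch {
     def main(args: Array[String]): Unit = {
       val spark = SparkSession.builder().master("local[*]").appName("grouping-sets-sugar").getOrCreate()
       import spark.implicits._

       Seq(("US", "web", 10), ("US", "app", 5), ("EU", "web", 7))
         .toDF("region", "channel", "amount")
         .createOrReplaceTempView("sales")

       // CUBE(region, channel) is shorthand for the explicit GROUPING SETS below,
       // which is the sense in which cube/rollup are "syntax sugar" for grouping sets.
       spark.sql(
         "SELECT region, channel, sum(amount) FROM sales GROUP BY CUBE(region, channel)").show()
       spark.sql(
         """SELECT region, channel, sum(amount) FROM sales
           |GROUP BY GROUPING SETS ((region, channel), (region), (channel), ())""".stripMargin).show()

       spark.stop()
     }
   }
   ```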




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on pull request #32053: [SPARK-34493][DOCS] Add "TEXT Files" page for Data Source documents

2021-04-06 Thread GitBox


HyukjinKwon commented on pull request #32053:
URL: https://github.com/apache/spark/pull/32053#issuecomment-814624945


   Looks pretty good otherwise. Make sure to keep the PR description up to date. I
will leave it to @srowen, @MaxGekk and @maropu since they are reviewing this.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on a change in pull request #32053: [SPARK-34493][DOCS] Add "TEXT Files" page for Data Source documents

2021-04-06 Thread GitBox


HyukjinKwon commented on a change in pull request #32053:
URL: https://github.com/apache/spark/pull/32053#discussion_r608359236



##
File path: docs/sql-data-sources-text.md
##
@@ -0,0 +1,40 @@
+---
+layout: global
+title: Text Files
+displayTitle: Text Files
+license: |
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements.  See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License.  You may obtain a copy of the License at
+ 
+ http://www.apache.org/licenses/LICENSE-2.0
+ 
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+---
+
+Spark SQL provides `spark.read().text("file_name")` to read a file or 
directory of text files into a Spark DataFrame, and 
`dataframe.write().text("path")` to write to a text file. When reading a text 
file, each line becomes each row that has string "value" column by default. The 
line separator can be changed as shown in the example below. When specifying a 
directory as a file path, make sure that the files included in the directory do 
not contain a format that is inappropriate for reading text, such as ORC or 
Parquet. The `option()` function can be used to customize the behavior of 
reading or writing, such as controlling behavior of the line separator, 
compression, and so on.

Review comment:
   "When specifying a directory as a file path, make sure that the files 
included in the directory do not contain a format that is inappropriate for 
reading text, such as ORC or Parquet" I think this is too much to know or 
document.
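   For reference, a minimal sketch of the reader/writer API the quoted paragraph describes; the paths, option values, and object name here are illustrative:

   ```
   import org.apache.spark.sql.SparkSession

   object TextSourceSketch {
     def main(args: Array[String]): Unit = {
       val spark = SparkSession.builder().master("local[*]").appName("text-source-sketch").getOrCreate()

       // Each line becomes a row with a single string column named "value".
       val df = spark.read.text("examples/src/main/resources/people.txt")
       df.printSchema() // root |-- value: string (nullable = true)

       // Reader behavior is tuned through option(), e.g. a custom line separator:
       val byComma = spark.read.option("lineSep", ",").text("examples/src/main/resources/people.txt")
       byComma.show(truncate = false)

       // Writing back out as plain text (the data must be a single string column):
       df.write.mode("overwrite").text("/tmp/text-source-sketch-output")

       spark.stop()
     }
   }
   ```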




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] itholic commented on a change in pull request #32053: [SPARK-34493][DOCS] Add "TEXT Files" page for Data Source documents

2021-04-06 Thread GitBox


itholic commented on a change in pull request #32053:
URL: https://github.com/apache/spark/pull/32053#discussion_r608358810



##
File path: docs/sql-data-sources.md
##
@@ -47,6 +47,7 @@ goes into specific options that are available for the 
built-in data sources.
 * [ORC Files](sql-data-sources-orc.html)
 * [JSON Files](sql-data-sources-json.html)
 * [CSV Files](sql-data-sources-csv.html)
+* [TEXT Files](sql-data-sources-text.html)

Review comment:
   Thanks!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on a change in pull request #32053: [SPARK-34493][DOCS] Add "TEXT Files" page for Data Source documents

2021-04-06 Thread GitBox


HyukjinKwon commented on a change in pull request #32053:
URL: https://github.com/apache/spark/pull/32053#discussion_r608358436



##
File path: docs/sql-data-sources.md
##
@@ -47,6 +47,7 @@ goes into specific options that are available for the 
built-in data sources.
 * [ORC Files](sql-data-sources-orc.html)
 * [JSON Files](sql-data-sources-json.html)
 * [CSV Files](sql-data-sources-csv.html)
+* [TEXT Files](sql-data-sources-text.html)

Review comment:
   TEXT -> Text




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] sigmod commented on pull request #32060: [SPARK-34916][SQL] Add condition lambda and rule id to the transform family for early stopping

2021-04-06 Thread GitBox


sigmod commented on pull request #32060:
URL: https://github.com/apache/spark/pull/32060#issuecomment-814622504


   @dbaliafroozeh @hvanhovell @maryannxue @gengliangwang: this PR is ready for 
review. Let me know if you have any questions. Thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan closed pull request #31791: [SPARK-34678][SQL] Add table function registry

2021-04-06 Thread GitBox


cloud-fan closed pull request #31791:
URL: https://github.com/apache/spark/pull/31791


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on pull request #31791: [SPARK-34678][SQL] Add table function registry

2021-04-06 Thread GitBox


cloud-fan commented on pull request #31791:
URL: https://github.com/apache/spark/pull/31791#issuecomment-814620919


   The GitHub Actions failures are unrelated and Jenkins passes, so I'm merging
it to master. Thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #32053: [SPARK-34493][DOCS] Add "TEXT Files" page for Data Source documents

2021-04-06 Thread GitBox


SparkQA commented on pull request #32053:
URL: https://github.com/apache/spark/pull/32053#issuecomment-814620180


   **[Test build #136997 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136997/testReport)**
 for PR 32053 at commit 
[`0415cd8`](https://github.com/apache/spark/commit/0415cd87bfcc3fa82915fae9bac7417204aa962a).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #32053: [SPARK-34493][DOCS] Add "TEXT Files" page for Data Source documents

2021-04-06 Thread GitBox


SparkQA removed a comment on pull request #32053:
URL: https://github.com/apache/spark/pull/32053#issuecomment-814610651


   **[Test build #136997 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136997/testReport)**
 for PR 32053 at commit 
[`0415cd8`](https://github.com/apache/spark/commit/0415cd87bfcc3fa82915fae9bac7417204aa962a).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on pull request #32073: [SPARK-34976][SQL] Rename GroupingSet to GroupingAnalytic

2021-04-06 Thread GitBox


HyukjinKwon commented on pull request #32073:
URL: https://github.com/apache/spark/pull/32073#issuecomment-814619990


   @AngersZhuuuu please describe why we should rename. The change looks
incomplete and I can't follow why we should rename.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on a change in pull request #32073: [SPARK-34976][SQL] Rename GroupingSet to GroupingAnalytic

2021-04-06 Thread GitBox


HyukjinKwon commented on a change in pull request #32073:
URL: https://github.com/apache/spark/pull/32073#discussion_r608355354



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/grouping.scala
##
@@ -26,7 +26,7 @@ import org.apache.spark.sql.types._
 /**
  * A placeholder expression for cube/rollup, which will be replaced by analyzer
  */
-trait GroupingSet extends Expression with CodegenFallback {
+trait GroupingAnalytic extends Expression with CodegenFallback {

Review comment:
   Do you mean `GroupingAnalytics`? And does it represent all grouping
analysis, including group-bys?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on a change in pull request #32073: [SPARK-34976][SQL] Rename GroupingSet to GroupingAnalytic

2021-04-06 Thread GitBox


HyukjinKwon commented on a change in pull request #32073:
URL: https://github.com/apache/spark/pull/32073#discussion_r608354977



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/grouping.scala
##
@@ -26,7 +26,7 @@ import org.apache.spark.sql.types._
 /**
  * A placeholder expression for cube/rollup, which will be replaced by analyzer
  */
-trait GroupingSet extends Expression with CodegenFallback {
+trait GroupingAnalytic extends Expression with CodegenFallback {
 
   def groupingSets: Seq[Seq[Expression]]
   def selectedGroupByExprs: Seq[Seq[Expression]]

Review comment:
   What about the error message below, "Cannot call GroupingSet.groupByExprs"? Should it be updated too?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] sarutak commented on a change in pull request #32074: [SPARK-34977][SQL] LIST FILES/JARS/ARCHIVES cannot handle multiple arguments properly when at least one path is quoted

2021-04-06 Thread GitBox


sarutak commented on a change in pull request #32074:
URL: https://github.com/apache/spark/pull/32074#discussion_r608354674



##
File path: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveQuerySuite.scala
##
@@ -952,6 +952,72 @@ class HiveQuerySuite extends HiveComparisonTest with 
SQLTestUtils with BeforeAnd
 }
   }
 
+  test("SPARK-34977: LIST FILES/JARS/ARCHIVES should handle multiple quoted 
path arguments") {
+withTempDir { dir =>
+  val file1 = File.createTempFile("someprefix1", "somesuffix1", dir)
+  val file2 = File.createTempFile("someprefix2", "somesuffix2", dir)
+  val file3 = File.createTempFile("someprefix3", "somesuffix 3", dir)
+
+  Files.write(file1.toPath, "file1".getBytes)
+  Files.write(file2.toPath, "file2".getBytes)
+  Files.write(file3.toPath, "file3".getBytes)
+
+  sql(s"ADD FILE ${file1.getAbsolutePath}")
+  sql(s"ADD FILE ${file2.getAbsolutePath}")
+  sql(s"ADD FILE '${file3.getAbsolutePath}'")
+  val listFiles = sql("LIST FILES " +
+s"'${file1.getAbsolutePath}' ${file2.getAbsolutePath} 
'${file3.getAbsolutePath}'")
+  assert(listFiles.count === 3)
+  assert(listFiles.filter(_.getString(0).contains(file1.getName)).count() 
=== 1)
+  assert(listFiles.filter(_.getString(0).contains(file2.getName)).count() 
=== 1)
+  assert(listFiles.filter(
+_.getString(0).contains(file3.getName.replace(" ", "%20"))).count() 
=== 1)
+
+  val file4 = File.createTempFile("someprefix4", "somesuffix4", dir)
+  val file5 = File.createTempFile("someprefix5", "somesuffix5", dir)
+  val file6 = File.createTempFile("someprefix6", "somesuffix6", dir)
+  Files.write(file4.toPath, "file4".getBytes)
+  Files.write(file5.toPath, "file5".getBytes)
+  Files.write(file6.toPath, "file6".getBytes)
+
+  val jarFile1 = new File(dir, "test1.jar")
+  val jarFile2 = new File(dir, "test2.jar")
+  val jarFile3 = new File(dir, "test 3.jar")
+  TestUtils.createJar(Seq(file4), jarFile1)
+  TestUtils.createJar(Seq(file5), jarFile2)
+  TestUtils.createJar(Seq(file6), jarFile3)
+
+  sql(s"ADD ARCHIVE ${jarFile1.getAbsolutePath}")
+  sql(s"ADD ARCHIVE ${jarFile2.getAbsolutePath}#foo")
+  sql(s"ADD ARCHIVE '${jarFile3.getAbsolutePath}'")
+  val listArchives = sql("LIST ARCHIVES " +
+s"'${jarFile1.getAbsolutePath}' ${jarFile2.getAbsolutePath} 
'${jarFile3.getAbsolutePath}'")
+  assert(listArchives.count === 3)
+  
assert(listArchives.filter(_.getString(0).contains(jarFile1.getName)).count() 
=== 1)
+  
assert(listArchives.filter(_.getString(0).contains(jarFile2.getName)).count() 
=== 1)
+  assert(listArchives.filter(
+_.getString(0).contains(jarFile3.getName.replace(" ", "%20"))).count() 
=== 1)
+
+  val file7 = File.createTempFile("someprefix7", "somesuffix7", dir)
+  val file8 = File.createTempFile("someprefix8", "somesuffix8", dir)
+  Files.write(file4.toPath, "file7".getBytes)
+  Files.write(file5.toPath, "file8".getBytes)
+
+  val jarFile4 = new File(dir, "test4.jar")
+  val jarFile5 = new File(dir, "test5.jar")
+  TestUtils.createJar(Seq(file7), jarFile4)
+  TestUtils.createJar(Seq(file8), jarFile5)
+
+  sql(s"ADD JAR ${jarFile4.getAbsolutePath}")
+  sql(s"ADD JAR ${jarFile5.getAbsolutePath}")

Review comment:
   Unlike `ADD FILE "path"` and `ADD ARCHIVE "path"`, we cannot execute
`ADD JAR "path"` when the path contains whitespace.
   I think it's a bug and #32052 will fix this issue.
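   A small sketch of the behavior described above, assuming an active local SparkSession; the paths and object name are illustrative, and the `ADD JAR` line is left commented out because it is the reported failure:

   ```
   import org.apache.spark.sql.SparkSession

   object QuotedPathSketch {
     def main(args: Array[String]): Unit = {
       val spark = SparkSession.builder().master("local[*]").appName("quoted-path-sketch").getOrCreate()

       // Quoted paths containing a space (the files are assumed to exist).
       spark.sql("ADD FILE '/tmp/some dir/data.txt'")      // works
       spark.sql("ADD ARCHIVE '/tmp/some dir/bundle.zip'") // works
       // Per the comment above, the equivalent ADD JAR currently fails for such
       // paths; #32052 is expected to fix it.
       // spark.sql("ADD JAR '/tmp/some dir/library.jar'")

       spark.sql("LIST FILES").show(truncate = false)
       spark.stop()
     }
   }
   ```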




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on a change in pull request #32073: [SPARK-34976][SQL] Rename GroupingSet to GroupingAnalytic

2021-04-06 Thread GitBox


HyukjinKwon commented on a change in pull request #32073:
URL: https://github.com/apache/spark/pull/32073#discussion_r608354526



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/grouping.scala
##
@@ -106,34 +106,37 @@ object GroupingSet {
   }
 }
 
-case class Cube(groupingSetIndexes: Seq[Seq[Int]], children: Seq[Expression]) 
extends GroupingSet {
+case class Cube(
+groupingSetIndexes: Seq[Seq[Int]],
+children: Seq[Expression]) extends GroupingAnalytic {
   override def groupingSets: Seq[Seq[Expression]] = 
groupingSetIndexes.map(_.map(children))
-  override def selectedGroupByExprs: Seq[Seq[Expression]] = 
GroupingSet.cubeExprs(groupingSets)
+  override def selectedGroupByExprs: Seq[Seq[Expression]] = 
GroupingAnalytic.cubeExprs(groupingSets)
 }
 
 object Cube {
   def apply(groupingSets: Seq[Seq[Expression]]): Cube = {
-Cube(GroupingSet.computeGroupingSetIndexes(groupingSets), 
groupingSets.flatten)
+Cube(GroupingAnalytic.computeGroupingSetIndexes(groupingSets), 
groupingSets.flatten)
   }
 }
 
 case class Rollup(
 groupingSetIndexes: Seq[Seq[Int]],
-children: Seq[Expression]) extends GroupingSet {
+children: Seq[Expression]) extends GroupingAnalytic {
   override def groupingSets: Seq[Seq[Expression]] = 
groupingSetIndexes.map(_.map(children))
-  override def selectedGroupByExprs: Seq[Seq[Expression]] = 
GroupingSet.rollupExprs(groupingSets)
+  override def selectedGroupByExprs: Seq[Seq[Expression]] =
+GroupingAnalytic.rollupExprs(groupingSets)
 }
 
 object Rollup {
   def apply(groupingSets: Seq[Seq[Expression]]): Rollup = {
-Rollup(GroupingSet.computeGroupingSetIndexes(groupingSets), 
groupingSets.flatten)
+Rollup(GroupingAnalytic.computeGroupingSetIndexes(groupingSets), 
groupingSets.flatten)
   }
 }
 
 case class GroupingSets(

Review comment:
   What about this?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on a change in pull request #32073: [SPARK-34976][SQL] Rename GroupingSet to GroupingAnalytic

2021-04-06 Thread GitBox


HyukjinKwon commented on a change in pull request #32073:
URL: https://github.com/apache/spark/pull/32073#discussion_r608354425



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/grouping.scala
##
@@ -145,7 +148,7 @@ object GroupingSets {
   def apply(
   groupingSets: Seq[Seq[Expression]],
   userGivenGroupByExprs: Seq[Expression]): GroupingSets = {
-val groupingSetIndexes = 
GroupingSet.computeGroupingSetIndexes(groupingSets)
+val groupingSetIndexes = 
GroupingAnalytic.computeGroupingSetIndexes(groupingSets)

Review comment:
   Shall we rename the variables too?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] imback82 commented on a change in pull request #32032: [SPARK-34701][SQL] Introduce AnalysisOnlyCommand that allows its children to be removed once the command is marked as analyzed.

2021-04-06 Thread GitBox


imback82 commented on a change in pull request #32032:
URL: https://github.com/apache/spark/pull/32032#discussion_r608354169



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/Command.scala
##
@@ -37,3 +38,35 @@ trait Command extends LogicalPlan {
 trait LeafCommand extends Command with LeafLike[LogicalPlan]
 trait UnaryCommand extends Command with UnaryLike[LogicalPlan]
 trait BinaryCommand extends Command with BinaryLike[LogicalPlan]
+
+/**
+ * A logical node that represents a command whose children are only analyzed, 
but not optimized.
+ */
+trait AnalysisOnlyCommand extends Command {
+  private var _isAnalyzed: Boolean = false
+
+  def childrenToAnalyze: Seq[LogicalPlan]
+
+  override def children: Seq[LogicalPlan] = if (_isAnalyzed) Nil else 
childrenToAnalyze

Review comment:
   Hmm, I don't think we can use a case class at this level (e.g., it cannot
have an abstract member like `childrenToAnalyze`), right?
   
   If we need to make the node immutable, I think the responsibility should be
at the concrete command - similar to the first commit I had?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on a change in pull request #32073: [SPARK-34976][SQL] Rename GroupingSet to GroupingAnalytic

2021-04-06 Thread GitBox


HyukjinKwon commented on a change in pull request #32073:
URL: https://github.com/apache/spark/pull/32073#discussion_r608354243



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/grouping.scala
##
@@ -106,34 +106,37 @@ object GroupingSet {
   }
 }
 
-case class Cube(groupingSetIndexes: Seq[Seq[Int]], children: Seq[Expression]) 
extends GroupingSet {
+case class Cube(
+groupingSetIndexes: Seq[Seq[Int]],
+children: Seq[Expression]) extends GroupingAnalytic {
   override def groupingSets: Seq[Seq[Expression]] = 
groupingSetIndexes.map(_.map(children))
-  override def selectedGroupByExprs: Seq[Seq[Expression]] = 
GroupingSet.cubeExprs(groupingSets)
+  override def selectedGroupByExprs: Seq[Seq[Expression]] = 
GroupingAnalytic.cubeExprs(groupingSets)
 }
 
 object Cube {
   def apply(groupingSets: Seq[Seq[Expression]]): Cube = {
-Cube(GroupingSet.computeGroupingSetIndexes(groupingSets), 
groupingSets.flatten)
+Cube(GroupingAnalytic.computeGroupingSetIndexes(groupingSets), 
groupingSets.flatten)

Review comment:
   Should we rename `computeGroupingSetIndexes` -> 
`computeGroupingAnalyticIndexes` too?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #32060: [WIP][SPARK-34916][SQL] Add condition lambda and rule id to the transform family for early stopping

2021-04-06 Thread GitBox


SparkQA commented on pull request #32060:
URL: https://github.com/apache/spark/pull/32060#issuecomment-814617613


   Kubernetes integration test status failure
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41564/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] sarutak opened a new pull request #32074: [SPARK-34977][SQL] LIST FILES/JARS/ARCHIVES cannot handle multiple arguments properly when at least one path is quoted

2021-04-06 Thread GitBox


sarutak opened a new pull request #32074:
URL: https://github.com/apache/spark/pull/32074


   ### What changes were proposed in this pull request?
   
   This PR fixes an issue where `LIST {FILES/JARS/ARCHIVES} path1, path2, ...`
cannot list all paths if at least one path is quoted.
   Here is an example.
   ```
   ADD FILE /tmp/test1;
   ADD FILE /tmp/test2;
   
   LIST FILES /tmp/test1 /tmp/test2;
   file:/tmp/test1
   file:/tmp/test2
   
   LIST FILES /tmp/test1 "/tmp/test2";
   file:/tmp/test2
   ```
   
   In this example, the second `LIST FILES` doesn't show `file:/tmp/test1`.
   
   To resolve this issue, I modified the syntax rule to handle this case.
   I also changed `SparkSQLParser` to handle paths that contain whitespace.
   
   ### Why are the changes needed?
   
   This is a bug.
   I also plan to extend `ADD FILE/JAR/ARCHIVE` to take multiple paths, as Hive
does, and the syntax rule change is necessary for that.
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes. Users can pass quoted paths when using `ADD FILE/JAR/ARCHIVE`.
   
   ### How was this patch tested?
   
   New test.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #32053: [SPARK-34493][DOCS] Add "TEXT Files" page for Data Source documents

2021-04-06 Thread GitBox


SparkQA removed a comment on pull request #32053:
URL: https://github.com/apache/spark/pull/32053#issuecomment-814609011


   **[Test build #136995 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136995/testReport)**
 for PR 32053 at commit 
[`f6198b7`](https://github.com/apache/spark/commit/f6198b7455b543f2b4eea6f429586198c8ec3229).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #32053: [SPARK-34493][DOCS] Add "TEXT Files" page for Data Source documents

2021-04-06 Thread GitBox


SparkQA commented on pull request #32053:
URL: https://github.com/apache/spark/pull/32053#issuecomment-814617263


   **[Test build #136995 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136995/testReport)**
 for PR 32053 at commit 
[`f6198b7`](https://github.com/apache/spark/commit/f6198b7455b543f2b4eea6f429586198c8ec3229).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a change in pull request #31791: [SPARK-34678][SQL] Add table function registry

2021-04-06 Thread GitBox


cloud-fan commented on a change in pull request #31791:
URL: https://github.com/apache/spark/pull/31791#discussion_r608353136



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
##
@@ -83,15 +85,94 @@ trait FunctionRegistry {
 
   /** Clear all registered functions. */
   def clear(): Unit
+}
 
-  /** Create a copy of this registry with identical functions as this 
registry. */
-  override def clone(): FunctionRegistry = throw new 
CloneNotSupportedException()
+object FunctionRegistryBase {
+
+  /**
+   * Return an expression info and a function builder for the function as 
defined by
+   * T using the given name.
+   */
+  def build[T : ClassTag](name: String): (ExpressionInfo, Seq[Expression] => 
T) = {

Review comment:
   You are right, I missed this part.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a change in pull request #32054: [SPARK-34946][SQL] Block unsupported correlated scalar subquery in Aggregate

2021-04-06 Thread GitBox


cloud-fan commented on a change in pull request #32054:
URL: https://github.com/apache/spark/pull/32054#discussion_r608352820



##
File path: sql/core/src/test/scala/org/apache/spark/sql/SubquerySuite.scala
##
@@ -1765,4 +1765,35 @@ class SubquerySuite extends QueryTest with 
SharedSparkSession with AdaptiveSpark
   }
 }
   }
+
+  test("SPARK-34946: correlated scalar subquery in grouping expressions only") 
{

Review comment:
   ah I see, then +1!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AngersZhuuuu commented on a change in pull request #30145: [SPARK-33233][SQL]CUBE/ROLLUP/GROUPING SETS support GROUP BY ordinal

2021-04-06 Thread GitBox


AngersZhuuuu commented on a change in pull request #30145:
URL: https://github.com/apache/spark/pull/30145#discussion_r608352704



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##
@@ -598,8 +598,8 @@ class Analyzer(override val catalogManager: CatalogManager)
   val aggForResolving = h.child match {
 // For CUBE/ROLLUP expressions, to avoid resolving repeatedly, here we 
delete them from
 // groupingExpressions for condition resolving.
-case a @ Aggregate(Seq(gs: GroupingSet), _, _) =>
-  a.copy(groupingExpressions = gs.groupByExprs)
+case a @ Aggregate(Seq(gs: GroupingAnalytic), _, _) =>
+  a.copy(groupingExpressions =gs.groupingSets, gs.groupByExprs)

Review comment:
   > nit: one space after `=`
   
   Mistake when merging code, done




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AngersZhuuuu commented on a change in pull request #30145: [SPARK-33233][SQL]CUBE/ROLLUP/GROUPING SETS support GROUP BY ordinal

2021-04-06 Thread GitBox


AngersZhuuuu commented on a change in pull request #30145:
URL: https://github.com/apache/spark/pull/30145#discussion_r608352609



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##
@@ -1787,16 +1787,41 @@ class Analyzer(override val catalogManager: 
CatalogManager)
   // Replace the index with the corresponding expression in 
aggregateExpressions. The index is
   // a 1-base position of aggregateExpressions, which is output columns 
(select expression)
   case Aggregate(groups, aggs, child) if aggs.forall(_.resolved) &&
-groups.exists(_.isInstanceOf[UnresolvedOrdinal]) =>
-val newGroups = groups.map {
-  case u @ UnresolvedOrdinal(index) if index > 0 && index <= aggs.size 
=>
-aggs(index - 1)
-  case ordinal @ UnresolvedOrdinal(index) =>
-throw QueryCompilationErrors.groupByPositionRangeError(index, 
aggs.size, ordinal)
-  case o => o
-}
+groups.exists(containUnresolvedOrdinal) =>
+val newGroups = groups.map((resolveGroupByExpressionOrdinal(_, aggs)))
 Aggregate(newGroups, aggs, child)
 }
+
+private def containUnresolvedOrdinal(e: Expression): Boolean = e match {
+  case _: UnresolvedOrdinal => true
+  case Cube(_, groupByExprs) => 
groupByExprs.exists(containUnresolvedOrdinal)
+  case Rollup(_, groupByExprs) => 
groupByExprs.exists(containUnresolvedOrdinal)
+  case GroupingSets(_, flatGroupingSets, groupByExprs) =>
+flatGroupingSets.exists(containUnresolvedOrdinal) ||
+  groupByExprs.exists(containUnresolvedOrdinal)

Review comment:
   > Can we simply do `case a: GroupingAnalytic => a.children.exists...`?
   
   Done




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AngersZhuuuu commented on a change in pull request #30145: [SPARK-33233][SQL]CUBE/ROLLUP/GROUPING SETS support GROUP BY ordinal

2021-04-06 Thread GitBox


AngersZhuuuu commented on a change in pull request #30145:
URL: https://github.com/apache/spark/pull/30145#discussion_r608352564



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##
@@ -1787,16 +1787,41 @@ class Analyzer(override val catalogManager: 
CatalogManager)
   // Replace the index with the corresponding expression in 
aggregateExpressions. The index is
   // a 1-base position of aggregateExpressions, which is output columns 
(select expression)
   case Aggregate(groups, aggs, child) if aggs.forall(_.resolved) &&
-groups.exists(_.isInstanceOf[UnresolvedOrdinal]) =>
-val newGroups = groups.map {
-  case u @ UnresolvedOrdinal(index) if index > 0 && index <= aggs.size 
=>
-aggs(index - 1)
-  case ordinal @ UnresolvedOrdinal(index) =>
-throw QueryCompilationErrors.groupByPositionRangeError(index, 
aggs.size, ordinal)
-  case o => o
-}
+groups.exists(containUnresolvedOrdinal) =>
+val newGroups = groups.map((resolveGroupByExpressionOrdinal(_, aggs)))
 Aggregate(newGroups, aggs, child)
 }
+
+private def containUnresolvedOrdinal(e: Expression): Boolean = e match {
+  case _: UnresolvedOrdinal => true
+  case Cube(_, groupByExprs) => 
groupByExprs.exists(containUnresolvedOrdinal)
+  case Rollup(_, groupByExprs) => 
groupByExprs.exists(containUnresolvedOrdinal)
+  case GroupingSets(_, flatGroupingSets, groupByExprs) =>
+flatGroupingSets.exists(containUnresolvedOrdinal) ||
+  groupByExprs.exists(containUnresolvedOrdinal)
+  case _ => false
+}
+
+private def resolveGroupByExpressionOrdinal(
+expr: Expression,
+aggs: Seq[Expression]): Expression = expr match {
+  case ordinal @ UnresolvedOrdinal(index) =>
+if (index > 0 && index <= aggs.size) {
+  aggs(index - 1)
+} else {
+  throw QueryCompilationErrors.groupByPositionRangeError(index, 
aggs.size, ordinal)
+}
+  case cube @ Cube(_, groupByExprs) =>

Review comment:
   > how about using `expr.withNewChildren`?
   
   Done
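   For readers following the suggestion, a minimal sketch of the `withNewChildren` approach, assuming spark-catalyst on the classpath; the generic recursion over `children` is broader than the rule in the diff (which only recurses into the grouping-analytics trait), and out-of-range error handling is omitted:

   ```
   import org.apache.spark.sql.catalyst.analysis.UnresolvedOrdinal
   import org.apache.spark.sql.catalyst.expressions.Expression

   object OrdinalResolutionSketch {
     // Resolve GROUP BY ordinals, rebuilding composite expressions such as
     // cube/rollup/grouping sets generically via withNewChildren instead of
     // copying each case class by hand. The real rule also throws
     // QueryCompilationErrors.groupByPositionRangeError for bad ordinals.
     def resolveGroupByExpressionOrdinal(
         expr: Expression,
         aggs: Seq[Expression]): Expression = expr match {
       case UnresolvedOrdinal(index) if index > 0 && index <= aggs.size =>
         aggs(index - 1)
       case other if other.children.nonEmpty =>
         other.withNewChildren(other.children.map(resolveGroupByExpressionOrdinal(_, aggs)))
       case other => other
     }
   }
   ```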




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AngersZhuuuu commented on a change in pull request #30145: [SPARK-33233][SQL]CUBE/ROLLUP/GROUPING SETS support GROUP BY ordinal

2021-04-06 Thread GitBox


AngersZhuuuu commented on a change in pull request #30145:
URL: https://github.com/apache/spark/pull/30145#discussion_r608352335



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/grouping.scala
##
@@ -144,12 +147,12 @@ case class GroupingSets(
 object GroupingSets {
   def apply(
   groupingSets: Seq[Seq[Expression]],
-  userGivenGroupByExprs: Seq[Expression]): GroupingSets = {
-val groupingSetIndexes = 
GroupingSet.computeGroupingSetIndexes(groupingSets)
+  userGivenGroupByExprs: Seq[Expression]): GroupingAnalytic = {
+val groupingSetIndexes = 
GroupingAnalytic.computeGroupingSetIndexes(groupingSets)
 GroupingSets(groupingSetIndexes, groupingSets.flatten, 
userGivenGroupByExprs)
   }
 
-  def apply(groupingSets: Seq[Seq[Expression]]): GroupingSets = {
+  def apply(groupingSets: Seq[Seq[Expression]]): GroupingAnalytic = {

Review comment:
   > we can probably do the rename in a separate PR.
   
   Done https://github.com/apache/spark/pull/32073

##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/SubstituteUnresolvedOrdinals.scala
##
@@ -42,10 +52,19 @@ object SubstituteUnresolvedOrdinals extends 
Rule[LogicalPlan] {
   }
   withOrigin(s.origin)(s.copy(order = newOrders))
 
-case a: Aggregate if conf.groupByOrdinal && 
a.groupingExpressions.exists(isIntLiteral) =>
+case a: Aggregate if conf.groupByOrdinal && 
a.groupingExpressions.exists(containIntLiteral) =>
   val newGroups = a.groupingExpressions.map {
 case ordinal @ Literal(index: Int, IntegerType) =>
   withOrigin(ordinal.origin)(UnresolvedOrdinal(index))
+case cube @ Cube(_, children) =>
+  withOrigin(cube.origin)(cube.copy(children = 
children.map(substituteUnresolvedOrdinal)))
+case rollup @ Rollup(_, children) =>
+  withOrigin(rollup.origin)(rollup.copy(
+children = children.map(substituteUnresolvedOrdinal)))
+case groupingSets @ GroupingSets(_, flatGroupingSets, groupByExprs) =>
+  withOrigin(groupingSets.origin)(groupingSets.copy(
+flatGroupingSets = 
flatGroupingSets.map(substituteUnresolvedOrdinal),
+groupByExprs = groupByExprs.map(substituteUnresolvedOrdinal)))

Review comment:
   > ditto, we can use `withNewChildren`
   
   Yea

##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/SubstituteUnresolvedOrdinals.scala
##
@@ -27,13 +27,23 @@ import org.apache.spark.sql.types.IntegerType
  * Replaces ordinal in 'order by' or 'group by' with UnresolvedOrdinal 
expression.
  */
 object SubstituteUnresolvedOrdinals extends Rule[LogicalPlan] {
-  private def isIntLiteral(e: Expression) = e match {
+  private def containIntLiteral(e: Expression): Boolean = e match {
 case Literal(_, IntegerType) => true
+case Cube(_, groupByExprs) => groupByExprs.exists(containIntLiteral)
+case Rollup(_, groupByExprs) => groupByExprs.exists(containIntLiteral)
+case GroupingSets(_, flatGroupingSets, groupByExprs) =>
+  flatGroupingSets.exists(containIntLiteral) || 
groupByExprs.exists(containIntLiteral)

Review comment:
   > ditto, we can use `children`
   
   Yea




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #30145: [SPARK-33233][SQL]CUBE/ROLLUP/GROUPING SETS support GROUP BY ordinal

2021-04-06 Thread GitBox


SparkQA commented on pull request #30145:
URL: https://github.com/apache/spark/pull/30145#issuecomment-814615642


   Kubernetes integration test unable to build dist.
   
   exiting with code: 1
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41571/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #32060: [WIP][SPARK-34916][SQL] Add condition lambda and rule id to the transform family for early stopping

2021-04-06 Thread GitBox


SparkQA commented on pull request #32060:
URL: https://github.com/apache/spark/pull/32060#issuecomment-814615001


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41564/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #30145: [SPARK-33233][SQL]CUBE/ROLLUP/GROUPING SETS support GROUP BY ordinal

2021-04-06 Thread GitBox


SparkQA removed a comment on pull request #30145:
URL: https://github.com/apache/spark/pull/30145#issuecomment-814609505


   **[Test build #136996 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136996/testReport)**
 for PR 30145 at commit 
[`ff6794e`](https://github.com/apache/spark/commit/ff6794eb5387b6e83bfd3875884df02b75b0fafd).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #30145: [SPARK-33233][SQL]CUBE/ROLLUP/GROUPING SETS support GROUP BY ordinal

2021-04-06 Thread GitBox


SparkQA commented on pull request #30145:
URL: https://github.com/apache/spark/pull/30145#issuecomment-814612779


   **[Test build #136996 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136996/testReport)**
 for PR 30145 at commit 
[`ff6794e`](https://github.com/apache/spark/commit/ff6794eb5387b6e83bfd3875884df02b75b0fafd).
* This patch **fails to build**.
* This patch merges cleanly.
* This patch adds no public classes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #32066: [SPARK-34970][SQL][SERCURITY] Redact map-type options in the output of explain()

2021-04-06 Thread GitBox


AmplabJenkins commented on pull request #32066:
URL: https://github.com/apache/spark/pull/32066#issuecomment-814612482


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41563/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #32066: [SPARK-34970][SQL][SERCURITY] Redact map-type options in the output of explain()

2021-04-06 Thread GitBox


SparkQA commented on pull request #32066:
URL: https://github.com/apache/spark/pull/32066#issuecomment-814612454






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a change in pull request #32070: [SPARK-34668][SQL] Support casting of day-time intervals to strings

2021-04-06 Thread GitBox


cloud-fan commented on a change in pull request #32070:
URL: https://github.com/apache/spark/pull/32070#discussion_r608349143



##
File path: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala
##
@@ -818,6 +818,31 @@ abstract class CastSuiteBase extends SparkFunSuite with 
ExpressionEvalHelper {
 checkConsistencyBetweenInterpretedAndCodegen(
   (child: Expression) => Cast(child, StringType), YearMonthIntervalType)
   }
+
+  test("SPARK-34668: cast day-time interval to string") {
+Seq(
+  Duration.ZERO -> "0 0:0:0",
+  Duration.of(1, ChronoUnit.MICROS) -> "0 0:0:0.01",
+  Duration.ofMillis(-1) -> "-0 0:0:0.001",
+  Duration.ofMillis(1234) -> "0 0:0:1.234",
+  Duration.ofSeconds(-59).minus(99, ChronoUnit.MICROS) -> "-0 
0:0:59.99",
+  Duration.ofMinutes(30).plusMillis(10) -> "0 0:30:0.01",
+  Duration.ofHours(-23).minusSeconds(59) -> "-0 23:0:59",

Review comment:
   that's the literal syntax, which is supposed to be more flexible. How 
about the cast behavior?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #32073: [SPARK-34976][SQL] Rename GroupingSet to GroupingAnalytic

2021-04-06 Thread GitBox


AmplabJenkins removed a comment on pull request #32073:
URL: https://github.com/apache/spark/pull/32073#issuecomment-814611799


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41567/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #32073: [SPARK-34976][SQL] Rename GroupingSet to GroupingAnalytic

2021-04-06 Thread GitBox


SparkQA commented on pull request #32073:
URL: https://github.com/apache/spark/pull/32073#issuecomment-814611787


   Kubernetes integration test unable to build dist.
   
   exiting with code: 1
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41567/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #32073: [SPARK-34976][SQL] Rename GroupingSet to GroupingAnalytic

2021-04-06 Thread GitBox


AmplabJenkins commented on pull request #32073:
URL: https://github.com/apache/spark/pull/32073#issuecomment-814611799


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41567/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a change in pull request #32032: [SPARK-34701][SQL] Introduce AnalysisOnlyCommand that allows its children to be removed once the command is marked as analyzed.

2021-04-06 Thread GitBox


cloud-fan commented on a change in pull request #32032:
URL: https://github.com/apache/spark/pull/32032#discussion_r608348517



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/Command.scala
##
@@ -37,3 +38,35 @@ trait Command extends LogicalPlan {
 trait LeafCommand extends Command with LeafLike[LogicalPlan]
 trait UnaryCommand extends Command with UnaryLike[LogicalPlan]
 trait BinaryCommand extends Command with BinaryLike[LogicalPlan]
+
+/**
+ * A logical node that represents a command whose children are only analyzed, 
but not optimized.
+ */
+trait AnalysisOnlyCommand extends Command {
+  private var _isAnalyzed: Boolean = false
+
+  def childrenToAnalyze: Seq[LogicalPlan]
+
+  override def children: Seq[LogicalPlan] = if (_isAnalyzed) Nil else 
childrenToAnalyze

Review comment:
   This is tricky because the `children` can change dynamically. I was 
expecting to put `isAnalyzed` as a case class parameter, so `markAsAnalyzed` 
creates a new copy and the plan node is still immutable. We can avoid changing 
`TreeNode` if we do so.
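   A self-contained toy model of the immutable alternative described above; it uses no Spark types, and every name in it (Plan, CreateViewLikeCommand, markAsAnalyzed) is illustrative rather than the actual API:

   ```
   // `isAnalyzed` is a case-class parameter, so marking the command as analyzed
   // produces a new copy and `children` changes accordingly, with no mutable state.
   sealed trait Plan { def children: Seq[Plan] }

   case class Relation(name: String) extends Plan {
     def children: Seq[Plan] = Nil
   }

   trait AnalysisOnly extends Plan {
     def isAnalyzed: Boolean
     def childrenToAnalyze: Seq[Plan]
     final def children: Seq[Plan] = if (isAnalyzed) Nil else childrenToAnalyze
     def markAsAnalyzed(): Plan
   }

   case class CreateViewLikeCommand(
       childrenToAnalyze: Seq[Plan],
       isAnalyzed: Boolean = false) extends AnalysisOnly {
     def markAsAnalyzed(): Plan = copy(isAnalyzed = true)
   }

   object AnalysisOnlyDemo extends App {
     val cmd = CreateViewLikeCommand(Seq(Relation("t")))
     assert(cmd.children.nonEmpty)                 // the analyzer still sees the child
     assert(cmd.markAsAnalyzed().children.isEmpty) // later phases no longer do
   }
   ```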




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #32053: [SPARK-34493][DOCS] Add "TEXT Files" page for Data Source documents

2021-04-06 Thread GitBox


SparkQA commented on pull request #32053:
URL: https://github.com/apache/spark/pull/32053#issuecomment-814610651


   **[Test build #136997 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136997/testReport)**
 for PR 32053 at commit 
[`0415cd8`](https://github.com/apache/spark/commit/0415cd87bfcc3fa82915fae9bac7417204aa962a).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] itholic commented on a change in pull request #32053: [SPARK-34493][DOCS] Add "TEXT Files" page for Data Source documents

2021-04-06 Thread GitBox


itholic commented on a change in pull request #32053:
URL: https://github.com/apache/spark/pull/32053#discussion_r608347743



##
File path: 
examples/src/main/java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java
##
@@ -389,6 +392,67 @@ private static void runCsvDatasetExample(SparkSession 
spark) {
 // $example off:csv_dataset$
   }
 
+  private static void runTextDatasetExample(SparkSession spark) {
+// $example on:text_dataset$
+// A text dataset is pointed to by path.
+// The path can be either a single text file or a directory of text files
+String path = "examples/src/main/resources/people.text";
+
+Dataset<Row> df1 = spark.read().text(path);
+df1.show();
+// +---+
+// |  value|
+// +---+
+// |Michael, 29|
+// |   Andy, 30|
+// | Justin, 19|
+// +---+
+
+// You can use 'lineSep' option to define the line separator.
+// If None is set, it covers all `\r`, `\r\n` and `\n` (default).
+Dataset<Row> df2 = spark.read().option("lineSep", ",").text(path);
+df2.show();
+// +---+
+// |  value|
+// +---+
+// |Michael|
+// |   29\nAndy|
+// | 30\nJustin|
+// |   19\n|
+// +---+
+
+// You can also use 'wholetext' option to read each input file as a single 
row.
+Dataset<Row> df3 = spark.read().option("wholetext", "true").text(path);
+df3.show();
+//  ++
+//  |   value|
+//  ++
+//  |Michael, 29\nAndy...|
+//  ++
+
+// "output" is a folder which contains multiple text files and a _SUCCESS 
file.
+df1.write().text("output");
+
+// You can specify the compression format using the 'compression' option.
+df1.write().option("compression", "gzip").text("output_compressed");
+
+// Read all files in a folder.
+String folderPath = "examples/src/main/resources";
+Dataset<Row> df = spark.read().text(folderPath);
+df.show();
+// +---+
+// |  value|
+// +---+
+// |238val_238|

Review comment:
   Thanks!
   Just removed this from the examples block and instead added more explanation
to the main contents block.
   (because we already have the case of reading a proper text file above)
   
   "When specifying a directory as a file path, make sure that the files 
included in the directory do not contain a format that is inappropriate for 
reading text, such as ORC or Parquet"




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] itholic commented on a change in pull request #32053: [SPARK-34493][DOCS] Add "TEXT Files" page for Data Source documents

2021-04-06 Thread GitBox


itholic commented on a change in pull request #32053:
URL: https://github.com/apache/spark/pull/32053#discussion_r608347743



##
File path: 
examples/src/main/java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java
##
@@ -389,6 +392,67 @@ private static void runCsvDatasetExample(SparkSession 
spark) {
 // $example off:csv_dataset$
   }
 
+  private static void runTextDatasetExample(SparkSession spark) {
+// $example on:text_dataset$
+// A text dataset is pointed to by path.
+// The path can be either a single text file or a directory of text files
+String path = "examples/src/main/resources/people.text";
+
+Dataset<Row> df1 = spark.read().text(path);
+df1.show();
+// +---+
+// |  value|
+// +---+
+// |Michael, 29|
+// |   Andy, 30|
+// | Justin, 19|
+// +---+
+
+// You can use 'lineSep' option to define the line separator.
+// If None is set, it covers all `\r`, `\r\n` and `\n` (default).
+Dataset<Row> df2 = spark.read().option("lineSep", ",").text(path);
+df2.show();
+// +---+
+// |  value|
+// +---+
+// |Michael|
+// |   29\nAndy|
+// | 30\nJustin|
+// |   19\n|
+// +---+
+
+// You can also use 'wholetext' option to read each input file as a single row.
+Dataset<Row> df3 = spark.read().option("wholetext", "true").text(path);
+df3.show();
+//  ++
+//  |   value|
+//  ++
+//  |Michael, 29\nAndy...|
+//  ++
+
+// "output" is a folder which contains multiple text files and a _SUCCESS file.
+df1.write().text("output");
+
+// You can specify the compression format using the 'compression' option.
+df1.write().option("compression", "gzip").text("output_compressed");
+
+// Read all files in a folder.
+String folderPath = "examples/src/main/resources";
+Dataset<Row> df = spark.read().text(folderPath);
+df.show();
+// +---+
+// |  value|
+// +---+
+// |238val_238|

Review comment:
   Thanks!
   Just removed this from the examples block, and rather added more comments to the main contents block.
   (because we already have the case for reading one proper text file above)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #30145: [SPARK-33233][SQL]CUBE/ROLLUP/GROUPING SETS support GROUP BY ordinal

2021-04-06 Thread GitBox


SparkQA commented on pull request #30145:
URL: https://github.com/apache/spark/pull/30145#issuecomment-814609505


   **[Test build #136996 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136996/testReport)**
 for PR 30145 at commit 
[`ff6794e`](https://github.com/apache/spark/commit/ff6794eb5387b6e83bfd3875884df02b75b0fafd).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] MaxGekk commented on a change in pull request #32070: [SPARK-34668][SQL] Support casting of day-time intervals to strings

2021-04-06 Thread GitBox


MaxGekk commented on a change in pull request #32070:
URL: https://github.com/apache/spark/pull/32070#discussion_r608347206



##
File path: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala
##
@@ -818,6 +818,31 @@ abstract class CastSuiteBase extends SparkFunSuite with ExpressionEvalHelper {
 checkConsistencyBetweenInterpretedAndCodegen(
   (child: Expression) => Cast(child, StringType), YearMonthIntervalType)
   }
+
+  test("SPARK-34668: cast day-time interval to string") {
+Seq(
+  Duration.ZERO -> "0 0:0:0",
+  Duration.of(1, ChronoUnit.MICROS) -> "0 0:0:0.01",
+  Duration.ofMillis(-1) -> "-0 0:0:0.001",
+  Duration.ofMillis(1234) -> "0 0:0:1.234",
+  Duration.ofSeconds(-59).minus(99, ChronoUnit.MICROS) -> "-0 0:0:59.99",
+  Duration.ofMinutes(30).plusMillis(10) -> "0 0:30:0.01",
+  Duration.ofHours(-23).minusSeconds(59) -> "-0 23:0:59",

Review comment:
   > Have we checked with other databases?
   
   For example, Oracle doesn't prepend zero for hours, see the [doc](https://docs.oracle.com/en/database/oracle/oracle-database/12.2/sqlrf/Literals.html#GUID-49FADC66-794D-4763-88C7-B81BB4F26D9E):
   ```
   INTERVAL '4 5:12:10.222' DAY TO SECOND
   ```
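   
   For comparison, here is a minimal Scala sketch of how the cast under discussion might be exercised (illustrative only; `spark` is an assumed `SparkSession`, and the exact zero-padding of the hours field is precisely the open question in this thread):
   
   ```scala
   // Illustrative only: cast an ANSI day-time interval to a string.
   // The resulting format ("4 5:12:10.222" vs. "4 05:12:10.222") is not asserted here.
   spark.sql("SELECT CAST(INTERVAL '4 5:12:10.222' DAY TO SECOND AS STRING)").show(false)
   ```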




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] itholic commented on a change in pull request #32053: [SPARK-34493][DOCS] Add "TEXT Files" page for Data Source documents

2021-04-06 Thread GitBox


itholic commented on a change in pull request #32053:
URL: https://github.com/apache/spark/pull/32053#discussion_r608347059



##
File path: 
examples/src/main/scala/org/apache/spark/examples/sql/SQLDataSourceExample.scala
##
@@ -309,6 +310,67 @@ object SQLDataSourceExample {
 // $example off:csv_dataset$
   }
 
+  private def runTextDatasetExample(spark: SparkSession): Unit = {
+// $example on:text_dataset$
+// A text dataset is pointed to by path.
+// The path can be either a single text file or a directory of text files
+val path = "examples/src/main/resources/people.txt"
+
+val df1 = spark.read.text(path)
+df1.show()
+// +---+
+// |  value|
+// +---+
+// |Michael, 29|
+// |   Andy, 30|
+// | Justin, 19|
+// +---+
+
+// You can use 'lineSep' option to define the line separator.
+// If None is set, the line separator handles all `\r`, `\r\n` and `\n` by default.

Review comment:
   Thanks!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #32053: [SPARK-34493][DOCS] Add "TEXT Files" page for Data Source documents

2021-04-06 Thread GitBox


SparkQA commented on pull request #32053:
URL: https://github.com/apache/spark/pull/32053#issuecomment-814609011


   **[Test build #136995 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136995/testReport)**
 for PR 32053 at commit 
[`f6198b7`](https://github.com/apache/spark/commit/f6198b7455b543f2b4eea6f429586198c8ec3229).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #32073: [SPARK-34976][SQL] Rename GroupingSet to GroupingAnalytic

2021-04-06 Thread GitBox


AmplabJenkins removed a comment on pull request #32073:
URL: https://github.com/apache/spark/pull/32073#issuecomment-814608951


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136990/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #32073: [SPARK-34976][SQL] Rename GroupingSet to GroupingAnalytic

2021-04-06 Thread GitBox


SparkQA removed a comment on pull request #32073:
URL: https://github.com/apache/spark/pull/32073#issuecomment-814604786


   **[Test build #136990 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136990/testReport)**
 for PR 32073 at commit 
[`e77c289`](https://github.com/apache/spark/commit/e77c289c297ae0e91787415eab5b8bea4c17d158).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #32073: [SPARK-34976][SQL] Rename GroupingSet to GroupingAnalytic

2021-04-06 Thread GitBox


SparkQA commented on pull request #32073:
URL: https://github.com/apache/spark/pull/32073#issuecomment-814608927


   **[Test build #136990 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136990/testReport)**
 for PR 32073 at commit 
[`e77c289`](https://github.com/apache/spark/commit/e77c289c297ae0e91787415eab5b8bea4c17d158).
* This patch **fails to build**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
 * `trait GroupingAnalytic extends Expression with CodegenFallback `
 * `case class Cube(`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #32073: [SPARK-34976][SQL] Rename GroupingSet to GroupingAnalytic

2021-04-06 Thread GitBox


AmplabJenkins commented on pull request #32073:
URL: https://github.com/apache/spark/pull/32073#issuecomment-814608951


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136990/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #30145: [SPARK-33233][SQL]CUBE/ROLLUP/GROUPING SETS support GROUP BY ordinal

2021-04-06 Thread GitBox


AmplabJenkins removed a comment on pull request #30145:
URL: https://github.com/apache/spark/pull/30145#issuecomment-814607938


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136994/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #30145: [SPARK-33233][SQL]CUBE/ROLLUP/GROUPING SETS support GROUP BY ordinal

2021-04-06 Thread GitBox


SparkQA removed a comment on pull request #30145:
URL: https://github.com/apache/spark/pull/30145#issuecomment-814605230


   **[Test build #136994 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136994/testReport)**
 for PR 30145 at commit 
[`a013120`](https://github.com/apache/spark/commit/a013120c8e9f0bdfb6eac91b3ed881059577d855).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #30145: [SPARK-33233][SQL]CUBE/ROLLUP/GROUPING SETS support GROUP BY ordinal

2021-04-06 Thread GitBox


AmplabJenkins commented on pull request #30145:
URL: https://github.com/apache/spark/pull/30145#issuecomment-814607938


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136994/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #32053: [SPARK-34493][DOCS] Add "TEXT Files" page for Data Source documents

2021-04-06 Thread GitBox


AmplabJenkins removed a comment on pull request #32053:
URL: https://github.com/apache/spark/pull/32053#issuecomment-814607685


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41565/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #30145: [SPARK-33233][SQL]CUBE/ROLLUP/GROUPING SETS support GROUP BY ordinal

2021-04-06 Thread GitBox


SparkQA commented on pull request #30145:
URL: https://github.com/apache/spark/pull/30145#issuecomment-814607918


   **[Test build #136994 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136994/testReport)**
 for PR 30145 at commit 
[`a013120`](https://github.com/apache/spark/commit/a013120c8e9f0bdfb6eac91b3ed881059577d855).
* This patch **fails to build**.
* This patch merges cleanly.
* This patch adds no public classes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #32053: [SPARK-34493][DOCS] Add "TEXT Files" page for Data Source documents

2021-04-06 Thread GitBox


AmplabJenkins commented on pull request #32053:
URL: https://github.com/apache/spark/pull/32053#issuecomment-814607685


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41565/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #32053: [SPARK-34493][DOCS] Add "TEXT Files" page for Data Source documents

2021-04-06 Thread GitBox


SparkQA commented on pull request #32053:
URL: https://github.com/apache/spark/pull/32053#issuecomment-814607655






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] beliefer commented on pull request #31920: [SPARK-33604][SQL] Group exception messages in sql/execution

2021-04-06 Thread GitBox


beliefer commented on pull request #31920:
URL: https://github.com/apache/spark/pull/31920#issuecomment-814607415


   cc @cloud-fan 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #30145: [SPARK-33233][SQL]CUBE/ROLLUP/GROUPING SETS support GROUP BY ordinal

2021-04-06 Thread GitBox


SparkQA commented on pull request #30145:
URL: https://github.com/apache/spark/pull/30145#issuecomment-814605230


   **[Test build #136994 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136994/testReport)**
 for PR 30145 at commit 
[`a013120`](https://github.com/apache/spark/commit/a013120c8e9f0bdfb6eac91b3ed881059577d855).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #32068: [SPARK-34910][SQL] Add an option for different stride orders

2021-04-06 Thread GitBox


AmplabJenkins removed a comment on pull request #32068:
URL: https://github.com/apache/spark/pull/32068#issuecomment-814605090


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136974/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #32068: [SPARK-34910][SQL] Add an option for different stride orders

2021-04-06 Thread GitBox


AmplabJenkins commented on pull request #32068:
URL: https://github.com/apache/spark/pull/32068#issuecomment-814605090


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136974/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #31974: [SPARK-34877][CORE][YARN]Add the code change for adding the Spark AM log link in spark UI

2021-04-06 Thread GitBox


SparkQA commented on pull request #31974:
URL: https://github.com/apache/spark/pull/31974#issuecomment-814604872


   **[Test build #136993 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136993/testReport)**
 for PR 31974 at commit 
[`d5ca5e2`](https://github.com/apache/spark/commit/d5ca5e2f9763f087b3af7f26f3a6f4c58e89cfb9).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] allisonwang-db commented on pull request #31920: [SPARK-33604][SQL] Group exception messages in sql/execution

2021-04-06 Thread GitBox


allisonwang-db commented on pull request #31920:
URL: https://github.com/apache/spark/pull/31920#issuecomment-814604930


   Looks good to me!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #32061: [WIP][SPARK-32833][SQL] JDBC V2 Datasource aggregate push down

2021-04-06 Thread GitBox


SparkQA commented on pull request #32061:
URL: https://github.com/apache/spark/pull/32061#issuecomment-814604810


   **[Test build #136991 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136991/testReport)**
 for PR 32061 at commit 
[`dcc16a1`](https://github.com/apache/spark/commit/dcc16a1cc6aa7d5f7323f77f44a766a5f6e785bd).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #32032: [SPARK-34701][SQL] Introduce AnalysisOnlyCommand that allows its children to be removed once the command is marked as analyzed.

2021-04-06 Thread GitBox


SparkQA commented on pull request #32032:
URL: https://github.com/apache/spark/pull/32032#issuecomment-814604823


   **[Test build #136992 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136992/testReport)**
 for PR 32032 at commit 
[`acb74a1`](https://github.com/apache/spark/commit/acb74a115972fa5e45e9f212760b09e1c18bd462).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #32073: [SPARK-34976][SQL] Rename GroupingSet to GroupingAnalytic

2021-04-06 Thread GitBox


SparkQA commented on pull request #32073:
URL: https://github.com/apache/spark/pull/32073#issuecomment-814604786


   **[Test build #136990 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136990/testReport)**
 for PR 32073 at commit 
[`e77c289`](https://github.com/apache/spark/commit/e77c289c297ae0e91787415eab5b8bea4c17d158).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] allisonwang-db commented on a change in pull request #31920: [SPARK-33604][SQL] Group exception messages in sql/execution

2021-04-06 Thread GitBox


allisonwang-db commented on a change in pull request #31920:
URL: https://github.com/apache/spark/pull/31920#discussion_r608343436



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryParsingErrors.scala
##
@@ -303,4 +303,64 @@ object QueryParsingErrors {
 new ParseException(s"Found duplicate keys '$key'.", ctx)
   }
 
+  def formatForSetConfigurationUnExpectedError(ctx: SetConfigurationContext): Throwable = {
+new ParseException(
+  s"""
+ |Expected format is 'SET', 'SET key', or 'SET key=value'. If you want to include
+ |special characters in key, or include semicolon in value, please use quotes,
+ |e.g., SET `ke y`=`v;alue`.
+   """.stripMargin.replaceAll("\n", " "), ctx)
+  }
+
+  def invalidPropertyKeyForSetQuotedConfigurationError(
+  keyCandidate: String, valueStr: String, ctx: SetQuotedConfigurationContext): Throwable = {
+new ParseException(s"'$keyCandidate' is an invalid property key, please " +
+  s"use quotes, e.g. SET `$keyCandidate`=`$valueStr`", ctx)
+  }
+
+  def invalidPropertyValueForSetQuotedConfigurationError(

Review comment:
   Sounds good. We can emphasize the property key/value should be quoted. 
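   
   For illustration (not code from this PR; `spark` is an assumed `SparkSession`), the quoting style the message recommends looks like this:
   
   ```scala
   // Illustrative only: a plain key=value pair needs no quotes ...
   spark.sql("SET spark.sql.shuffle.partitions=10")
   // ... while backquotes allow a space in the key and a semicolon in the value,
   // exactly as the error message's example suggests.
   spark.sql("SET `ke y`=`v;alue`")
   ```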




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #32053: [SPARK-34493][DOCS] Add "TEXT Files" page for Data Source documents

2021-04-06 Thread GitBox


AmplabJenkins removed a comment on pull request #32053:
URL: https://github.com/apache/spark/pull/32053#issuecomment-814604206


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136988/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #32071: [SPARK-34973][SQL] Cleanup unused fields and methods in vectorized Parquet reader

2021-04-06 Thread GitBox


AmplabJenkins removed a comment on pull request #32071:
URL: https://github.com/apache/spark/pull/32071#issuecomment-814604207


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136973/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #32067: [SPARK-34962][SQL] Explicit representation of * in UpdateAction and InsertAction in MergeIntoTable

2021-04-06 Thread GitBox


AmplabJenkins removed a comment on pull request #32067:
URL: https://github.com/apache/spark/pull/32067#issuecomment-814604208


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136969/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #30145: [SPARK-33233][SQL]CUBE/ROLLUP/GROUPING SETS support GROUP BY ordinal

2021-04-06 Thread GitBox


AmplabJenkins removed a comment on pull request #30145:
URL: https://github.com/apache/spark/pull/30145#issuecomment-814604212


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41566/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #32054: [SPARK-34946][SQL] Block unsupported correlated scalar subquery in Aggregate

2021-04-06 Thread GitBox


AmplabJenkins removed a comment on pull request #32054:
URL: https://github.com/apache/spark/pull/32054#issuecomment-814604209


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136972/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #31791: [SPARK-34678][SQL] Add table function registry

2021-04-06 Thread GitBox


AmplabJenkins removed a comment on pull request #31791:
URL: https://github.com/apache/spark/pull/31791#issuecomment-814604214


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136970/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #32059: [SPARK-34963][SQL] Fix nested column pruning for extracting case-insensitive struct field from array of struct

2021-04-06 Thread GitBox


AmplabJenkins removed a comment on pull request #32059:
URL: https://github.com/apache/spark/pull/32059#issuecomment-814604213


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41562/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #32060: [WIP][SPARK-34916][SQL] Add condition lambda and rule id to the transform family for early stopping

2021-04-06 Thread GitBox


AmplabJenkins removed a comment on pull request #32060:
URL: https://github.com/apache/spark/pull/32060#issuecomment-814604210






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #32053: [SPARK-34493][DOCS] Add "TEXT Files" page for Data Source documents

2021-04-06 Thread GitBox


AmplabJenkins commented on pull request #32053:
URL: https://github.com/apache/spark/pull/32053#issuecomment-814604206


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136988/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #32054: [SPARK-34946][SQL] Block unsupported correlated scalar subquery in Aggregate

2021-04-06 Thread GitBox


AmplabJenkins commented on pull request #32054:
URL: https://github.com/apache/spark/pull/32054#issuecomment-814604209


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136972/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #32071: [SPARK-34973][SQL] Cleanup unused fields and methods in vectorized Parquet reader

2021-04-06 Thread GitBox


AmplabJenkins commented on pull request #32071:
URL: https://github.com/apache/spark/pull/32071#issuecomment-814604207


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136973/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #32067: [SPARK-34962][SQL] Explicit representation of * in UpdateAction and InsertAction in MergeIntoTable

2021-04-06 Thread GitBox


AmplabJenkins commented on pull request #32067:
URL: https://github.com/apache/spark/pull/32067#issuecomment-814604208


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136969/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #31791: [SPARK-34678][SQL] Add table function registry

2021-04-06 Thread GitBox


AmplabJenkins commented on pull request #31791:
URL: https://github.com/apache/spark/pull/31791#issuecomment-814604214


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136970/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #30145: [SPARK-33233][SQL]CUBE/ROLLUP/GROUPING SETS support GROUP BY ordinal

2021-04-06 Thread GitBox


AmplabJenkins commented on pull request #30145:
URL: https://github.com/apache/spark/pull/30145#issuecomment-814604212


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41566/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #32059: [SPARK-34963][SQL] Fix nested column pruning for extracting case-insensitive struct field from array of struct

2021-04-06 Thread GitBox


AmplabJenkins commented on pull request #32059:
URL: https://github.com/apache/spark/pull/32059#issuecomment-814604213


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41562/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #32060: [WIP][SPARK-34916][SQL] Add condition lambda and rule id to the transform family for early stopping

2021-04-06 Thread GitBox


AmplabJenkins commented on pull request #32060:
URL: https://github.com/apache/spark/pull/32060#issuecomment-814604210






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #32068: [SPARK-34910][SQL] Add an option for different stride orders

2021-04-06 Thread GitBox


SparkQA removed a comment on pull request #32068:
URL: https://github.com/apache/spark/pull/32068#issuecomment-814510451


   **[Test build #136974 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136974/testReport)**
 for PR 32068 at commit 
[`89923f6`](https://github.com/apache/spark/commit/89923f662ae23e642774574db47fb8a8af95cae7).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #32068: [SPARK-34910][SQL] Add an option for different stride orders

2021-04-06 Thread GitBox


SparkQA commented on pull request #32068:
URL: https://github.com/apache/spark/pull/32068#issuecomment-814604077


   **[Test build #136974 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136974/testReport)**
 for PR 32068 at commit 
[`89923f6`](https://github.com/apache/spark/commit/89923f662ae23e642774574db47fb8a8af95cae7).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on a change in pull request #32053: [SPARK-34493][DOCS] Add "TEXT Files" page for Data Source documents

2021-04-06 Thread GitBox


HyukjinKwon commented on a change in pull request #32053:
URL: https://github.com/apache/spark/pull/32053#discussion_r608341903



##
File path: 
examples/src/main/scala/org/apache/spark/examples/sql/SQLDataSourceExample.scala
##
@@ -309,6 +310,67 @@ object SQLDataSourceExample {
 // $example off:csv_dataset$
   }
 
+  private def runTextDatasetExample(spark: SparkSession): Unit = {
+// $example on:text_dataset$
+// A text dataset is pointed to by path.
+// The path can be either a single text file or a directory of text files
+val path = "examples/src/main/resources/people.txt"
+
+val df1 = spark.read.text(path)
+df1.show()
+// +---+
+// |  value|
+// +---+
+// |Michael, 29|
+// |   Andy, 30|
+// | Justin, 19|
+// +---+
+
+// You can use 'lineSep' option to define the line separator.
+// If None is set, it covers all `\r`, `\r\n` and `\n` (default).

Review comment:
   @itholic please don't resolve the comment if you did not address.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on a change in pull request #32053: [SPARK-34493][DOCS] Add "TEXT Files" page for Data Source documents

2021-04-06 Thread GitBox


HyukjinKwon commented on a change in pull request #32053:
URL: https://github.com/apache/spark/pull/32053#discussion_r608341605



##
File path: 
examples/src/main/scala/org/apache/spark/examples/sql/SQLDataSourceExample.scala
##
@@ -309,6 +310,67 @@ object SQLDataSourceExample {
 // $example off:csv_dataset$
   }
 
+  private def runTextDatasetExample(spark: SparkSession): Unit = {
+// $example on:text_dataset$
+// A text dataset is pointed to by path.
+// The path can be either a single text file or a directory of text files
+val path = "examples/src/main/resources/people.txt"
+
+val df1 = spark.read.text(path)
+df1.show()
+// +---+
+// |  value|
+// +---+
+// |Michael, 29|
+// |   Andy, 30|
+// | Justin, 19|
+// +---+
+
+// You can use 'lineSep' option to define the line separator.
+// If None is set, the line separator handles all `\r`, `\r\n` and `\n` by default.

Review comment:
   and address the same instances too.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon edited a comment on pull request #32053: [SPARK-34493][DOCS] Add "TEXT Files" page for Data Source documents

2021-04-06 Thread GitBox


HyukjinKwon edited a comment on pull request #32053:
URL: https://github.com/apache/spark/pull/32053#issuecomment-814602384


   @itholic please address all leftover comments before requesting another 
review: https://github.com/apache/spark/pull/32053#discussion_r607762705


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on pull request #32053: [SPARK-34493][DOCS] Add "TEXT Files" page for Data Source documents

2021-04-06 Thread GitBox


HyukjinKwon commented on pull request #32053:
URL: https://github.com/apache/spark/pull/32053#issuecomment-814602384


   @itholic please address all leftover comments before requesting another 
review.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AngersZhuuuu commented on pull request #32073: [SPARK-34976][SQL] Rename GroupingSet to GroupingAnalytic

2021-04-06 Thread GitBox


AngersZhuuuu commented on pull request #32073:
URL: https://github.com/apache/spark/pull/32073#issuecomment-814598788


   FYI @cloud-fan 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AngersZhuuuu opened a new pull request #32073: [SPARK-34976][SQL] Rename GroupingSet to GroupingAnalytic

2021-04-06 Thread GitBox


AngersZhuuuu opened a new pull request #32073:
URL: https://github.com/apache/spark/pull/32073


   ### What changes were proposed in this pull request?
   Rename GroupingSet to GroupingAnalytic
   
   ### Why are the changes needed?
   Refactor class name
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   Not needed.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #30145: [SPARK-33233][SQL]CUBE/ROLLUP/GROUPING SETS support GROUP BY ordinal

2021-04-06 Thread GitBox


SparkQA commented on pull request #30145:
URL: https://github.com/apache/spark/pull/30145#issuecomment-814596163


   Kubernetes integration test unable to build dist.
   
   exiting with code: 1
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41566/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #31791: [SPARK-34678][SQL] Add table function registry

2021-04-06 Thread GitBox


SparkQA removed a comment on pull request #31791:
URL: https://github.com/apache/spark/pull/31791#issuecomment-814491009


   **[Test build #136970 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136970/testReport)**
 for PR 31791 at commit 
[`3e41b61`](https://github.com/apache/spark/commit/3e41b618b1d01e1e36db3fa3b324834718ce38e0).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] imback82 closed pull request #32041: [SPARK-34701][SQL] Introduce CommandWithAnalyzedChildren for a command to have its children only analyzed but not optimized

2021-04-06 Thread GitBox


imback82 closed pull request #32041:
URL: https://github.com/apache/spark/pull/32041


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] imback82 commented on pull request #32041: [SPARK-34701][SQL] Introduce CommandWithAnalyzedChildren for a command to have its children only analyzed but not optimized

2021-04-06 Thread GitBox


imback82 commented on pull request #32041:
URL: https://github.com/apache/spark/pull/32041#issuecomment-814595902


   Closing in favor of #32032.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #31791: [SPARK-34678][SQL] Add table function registry

2021-04-06 Thread GitBox


SparkQA commented on pull request #31791:
URL: https://github.com/apache/spark/pull/31791#issuecomment-814595797


   **[Test build #136970 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136970/testReport)**
 for PR 31791 at commit 
[`3e41b61`](https://github.com/apache/spark/commit/3e41b618b1d01e1e36db3fa3b324834718ce38e0).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #32059: [SPARK-34963][SQL] Fix nested column pruning for extracting case-insensitive struct field from array of struct

2021-04-06 Thread GitBox


SparkQA commented on pull request #32059:
URL: https://github.com/apache/spark/pull/32059#issuecomment-814595654


   Kubernetes integration test unable to build dist.
   
   exiting with code: 1
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41562/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #32054: [SPARK-34946][SQL] Block unsupported correlated scalar subquery in Aggregate

2021-04-06 Thread GitBox


SparkQA removed a comment on pull request #32054:
URL: https://github.com/apache/spark/pull/32054#issuecomment-814495351


   **[Test build #136972 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136972/testReport)**
 for PR 32054 at commit 
[`140fb72`](https://github.com/apache/spark/commit/140fb72dcc82ffd7d98acfeef6d8fa0d5db05970).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #32054: [SPARK-34946][SQL] Block unsupported correlated scalar subquery in Aggregate

2021-04-06 Thread GitBox


SparkQA commented on pull request #32054:
URL: https://github.com/apache/spark/pull/32054#issuecomment-814594965


   **[Test build #136972 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136972/testReport)**
 for PR 32054 at commit 
[`140fb72`](https://github.com/apache/spark/commit/140fb72dcc82ffd7d98acfeef6d8fa0d5db05970).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] sunchao commented on pull request #32071: [SPARK-34973][SQL] Cleanup unused fields and methods in vectorized Parquet reader

2021-04-06 Thread GitBox


sunchao commented on pull request #32071:
URL: https://github.com/apache/spark/pull/32071#issuecomment-814594810


   Thanks for the review. I'm not sure if these are intended for nested types - 
they were introduced in #9774 and soon replaced by the vectorized value 
readers. For nested reader we might want to do something differently instead of 
reusing the parquet-mr classes, such as decode a batch of def/rep levels at a 
time according to the batch size.
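   
   A very rough sketch of that batch-at-a-time idea (all names here are invented for illustration; this is not Spark's or parquet-mr's API):
   
   ```scala
   // Illustrative only: decode the definition levels for a whole batch first,
   // then use them to decide which slots are null while reading the values.
   def readBatch(defLevels: Array[Int], maxDefLevel: Int, readValue: () => Any): Array[Any] = {
     defLevels.map { level =>
       if (level == maxDefLevel) readValue() else null // a lower level means null at this nesting
     }
   }
   ```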


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org


