date:20200824

[GitHub] [spark] cloud-fan closed pull request #29531: Revert "[SPARK-32412][SQL] Unify error handling for spark thrift serv…

2020-08-24 Thread GitBox



cloud-fan closed pull request #29531:
URL: https://github.com/apache/spark/pull/29531


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] beliefer commented on a change in pull request #28685: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

2020-08-24 Thread GitBox



beliefer commented on a change in pull request #28685:
URL: https://github.com/apache/spark/pull/28685#discussion_r476195407



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowExecBase.scala
##
@@ -137,7 +137,11 @@ trait WindowExecBase extends UnaryExecNode {
   function match {
 case AggregateExpression(f, _, _, _, _) => collect("AGGREGATE", 
frame, e, f)
 case f: AggregateWindowFunction => collect("AGGREGATE", frame, e, 
f)
-case f: OffsetWindowFunction => collect("OFFSET", frame, e, f)
+case f: OffsetWindowFunction => if (f.startWithCurrentRow) {
+  collect("OFFSET", frame, e, f)
+} else {
+  collect("WHOLE_OFFSET", frame, e, f)

Review comment:
   ok





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] cloud-fan commented on pull request #29531: Revert "[SPARK-32412][SQL] Unify error handling for spark thrift serv…

2020-08-24 Thread GitBox



cloud-fan commented on pull request #29531:
URL: https://github.com/apache/spark/pull/29531#issuecomment-679786678


   thanks, merging to master!



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] cloud-fan commented on a change in pull request #29535: [SPARK-32592][SQL] Make DataFrameReader.table take the specified options

2020-08-24 Thread GitBox



cloud-fan commented on a change in pull request #29535:
URL: https://github.com/apache/spark/pull/29535#discussion_r476194909



##
File path: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala
##
@@ -822,6 +821,8 @@ class DataFrameReader private[sql](sparkSession: 
SparkSession) extends Logging {
*/
   def table(tableName: String): DataFrame = {
 assertNoSpecifiedSchema("table")
+for ((k, v) <- this.extraOptions)
+  sparkSession.conf.set(k, v)

Review comment:
   We need to clearly define the behavior here.
   
   I think the options don't make sense if we read temp view, view, or v1 
tables in the session catalog. For v1 table, they are created with `CREATE 
TABLE  ... USING ... OPTIONS ...`, so the options are already there and it's 
confusing to overwrite them per scan.
   
   We do need to keep the scan options when reading v2 tables, but I don't 
think session conf is the right channel. I think we should put the options in 
`UnresolvedRelation` and apply these options when we resolve 
`UnresolvedRelation` to v2 relations.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] Ngone51 commented on a change in pull request #29270: [SPARK-32466][TEST][SQL] Add PlanStabilitySuite to detect SparkPlan regression

2020-08-24 Thread GitBox



Ngone51 commented on a change in pull request #29270:
URL: https://github.com/apache/spark/pull/29270#discussion_r476193874



##
File path: sql/core/src/test/scala/org/apache/spark/sql/PlanStabilitySuite.scala
##
@@ -0,0 +1,327 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import java.io.File
+import java.nio.charset.StandardCharsets
+
+import scala.collection.mutable
+
+import org.apache.commons.io.FileUtils
+
+import org.apache.spark.sql.catalyst.expressions.AttributeSet
+import org.apache.spark.sql.catalyst.util._
+import org.apache.spark.sql.execution._
+import org.apache.spark.sql.execution.adaptive.DisableAdaptiveExecutionSuite
+import org.apache.spark.sql.execution.exchange.{Exchange, ReusedExchangeExec}
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.tags.ExtendedSQLTest
+
+// scalastyle:off line.size.limit
+/**
+ * Check that TPC-DS SparkPlans don't change.
+ * If there are plan differences, the error message looks like this:
+ *   Plans did not match:
+ *   last approved simplified plan: 
/path/to/tpcds-plan-stability/approved-plans-xxx/q1/simplified.txt
+ *   last approved explain plan: 
/path/to/tpcds-plan-stability/approved-plans-xxx/q1/explain.txt
+ *   [last approved simplified plan]
+ *
+ *   actual simplified plan: /path/to/tmp/q1.actual.simplified.txt
+ *   actual explain plan: /path/to/tmp/q1.actual.explain.txt
+ *   [actual simplified plan]
+ *
+ * The explain files are saved to help debug later, they are not checked. Only 
the simplified
+ * plans are checked (by string comparison).
+ *
+ *
+ * To run the entire test suite:
+ * {{{
+ *   build/sbt "sql/test-only *PlanStability[WithStats]Suite"
+ * }}}
+ *
+ * To run a single test file upon change:
+ * {{{
+ *   build/sbt "sql/test-only *PlanStability[WithStats]Suite -- -z 
(tpcds-v1.4/q49)"
+ * }}}
+ *
+ * To re-generate golden files for entire suite, run:
+ * {{{
+ *   SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "sql/test-only 
*PlanStability[WithStats]Suite"
+ * }}}
+ *
+ * To re-generate golden file for a single test, run:
+ * {{{
+ *   SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "sql/test-only 
*PlanStability[WithStats]Suite -- -z (tpcds-v1.4/q49)"
+ * }}}
+ */
+// scalastyle:on line.size.limit
+trait PlanStabilitySuite extends TPCDSBase with DisableAdaptiveExecutionSuite {
+
+  private val originalMaxToStringFields = conf.maxToStringFields
+
+  override def beforeAll(): Unit = {
+conf.setConf(SQLConf.MAX_TO_STRING_FIELDS, Int.MaxValue)
+super.beforeAll()
+  }
+
+  override def afterAll(): Unit = {
+super.afterAll()
+conf.setConf(SQLConf.MAX_TO_STRING_FIELDS, originalMaxToStringFields)
+  }
+
+  private val regenerateGoldenFiles: Boolean = 
System.getenv("SPARK_GENERATE_GOLDEN_FILES") == "1"
+
+  protected val baseResourcePath = {
+// use the same way as `SQLQueryTestSuite` to get the resource path
+java.nio.file.Paths.get("src", "test", "resources", 
"tpcds-plan-stability").toFile
+  }
+
+  private val referenceRegex = "#\\d+".r
+  private val normalizeRegex = "#\\d+L?".r
+
+  def goldenFilePath: String
+
+  private def getDirForTest(name: String): File = {
+new File(goldenFilePath, name)
+  }
+
+  private def isApproved(dir: File, actualSimplifiedPlan: String): Boolean = {
+val file = new File(dir, "simplified.txt")
+val approved = FileUtils.readFileToString(file, StandardCharsets.UTF_8)
+approved == actualSimplifiedPlan
+  }
+
+  /**
+   * Serialize and save this SparkPlan.
+   * The resulting file is used by [[checkWithApproved]] to check stability.
+   *
+   * @param planthe SparkPlan
+   * @param namethe name of the query
+   * @param explain the full explain output; this is saved to help debug later 
as the simplified
+   *plan is not too useful for debugging
+   */
+  private def generateApprovedPlanFile(plan: SparkPlan, name: String, explain: 
String): Unit = {

Review comment:
   I can rename it to `generateGoldenFile` since we prefer `expected` 
compares to `approved`. 





This is an automated message from the Apache Git Service.
To respond to the message,

[GitHub] [spark] Ngone51 commented on pull request #29270: [SPARK-32466][TEST][SQL] Add PlanStabilitySuite to detect SparkPlan regression

2020-08-24 Thread GitBox



Ngone51 commented on pull request #29270:
URL: https://github.com/apache/spark/pull/29270#issuecomment-679761800


   @HyukjinKwon I can address your comments to the follow-up PR: 
https://github.com/apache/spark/pull/29537



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] Ngone51 commented on a change in pull request #29270: [SPARK-32466][TEST][SQL] Add PlanStabilitySuite to detect SparkPlan regression

2020-08-24 Thread GitBox



Ngone51 commented on a change in pull request #29270:
URL: https://github.com/apache/spark/pull/29270#discussion_r476191815



##
File path: sql/core/src/test/scala/org/apache/spark/sql/PlanStabilitySuite.scala
##
@@ -0,0 +1,327 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import java.io.File
+import java.nio.charset.StandardCharsets
+
+import scala.collection.mutable
+
+import org.apache.commons.io.FileUtils
+
+import org.apache.spark.sql.catalyst.expressions.AttributeSet
+import org.apache.spark.sql.catalyst.util._
+import org.apache.spark.sql.execution._
+import org.apache.spark.sql.execution.adaptive.DisableAdaptiveExecutionSuite
+import org.apache.spark.sql.execution.exchange.{Exchange, ReusedExchangeExec}
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.tags.ExtendedSQLTest
+
+// scalastyle:off line.size.limit
+/**
+ * Check that TPC-DS SparkPlans don't change.
+ * If there are plan differences, the error message looks like this:
+ *   Plans did not match:
+ *   last approved simplified plan: 
/path/to/tpcds-plan-stability/approved-plans-xxx/q1/simplified.txt
+ *   last approved explain plan: 
/path/to/tpcds-plan-stability/approved-plans-xxx/q1/explain.txt
+ *   [last approved simplified plan]
+ *
+ *   actual simplified plan: /path/to/tmp/q1.actual.simplified.txt
+ *   actual explain plan: /path/to/tmp/q1.actual.explain.txt
+ *   [actual simplified plan]
+ *
+ * The explain files are saved to help debug later, they are not checked. Only 
the simplified
+ * plans are checked (by string comparison).
+ *
+ *
+ * To run the entire test suite:
+ * {{{
+ *   build/sbt "sql/test-only *PlanStability[WithStats]Suite"
+ * }}}
+ *
+ * To run a single test file upon change:
+ * {{{
+ *   build/sbt "sql/test-only *PlanStability[WithStats]Suite -- -z 
(tpcds-v1.4/q49)"
+ * }}}
+ *
+ * To re-generate golden files for entire suite, run:
+ * {{{
+ *   SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "sql/test-only 
*PlanStability[WithStats]Suite"
+ * }}}
+ *
+ * To re-generate golden file for a single test, run:
+ * {{{
+ *   SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "sql/test-only 
*PlanStability[WithStats]Suite -- -z (tpcds-v1.4/q49)"
+ * }}}
+ */
+// scalastyle:on line.size.limit
+trait PlanStabilitySuite extends TPCDSBase with DisableAdaptiveExecutionSuite {
+
+  private val originalMaxToStringFields = conf.maxToStringFields
+
+  override def beforeAll(): Unit = {
+conf.setConf(SQLConf.MAX_TO_STRING_FIELDS, Int.MaxValue)
+super.beforeAll()
+  }
+
+  override def afterAll(): Unit = {
+super.afterAll()
+conf.setConf(SQLConf.MAX_TO_STRING_FIELDS, originalMaxToStringFields)
+  }
+
+  private val regenerateGoldenFiles: Boolean = 
System.getenv("SPARK_GENERATE_GOLDEN_FILES") == "1"
+
+  protected val baseResourcePath = {
+// use the same way as `SQLQueryTestSuite` to get the resource path
+java.nio.file.Paths.get("src", "test", "resources", 
"tpcds-plan-stability").toFile
+  }
+
+  private val referenceRegex = "#\\d+".r
+  private val normalizeRegex = "#\\d+L?".r
+
+  def goldenFilePath: String
+
+  private def getDirForTest(name: String): File = {
+new File(goldenFilePath, name)
+  }
+
+  private def isApproved(dir: File, actualSimplifiedPlan: String): Boolean = {
+val file = new File(dir, "simplified.txt")
+val approved = FileUtils.readFileToString(file, StandardCharsets.UTF_8)
+approved == actualSimplifiedPlan

Review comment:
   Sounds reasonable.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] HyukjinKwon removed a comment on pull request #29536: [SPARK-32695][INFRA] Explicitly cache and hash 'build' directly in GitHub Actions

2020-08-24 Thread GitBox



HyukjinKwon removed a comment on pull request #29536:
URL: https://github.com/apache/spark/pull/29536#issuecomment-679726216


   Should be ready for a look. cc @dongjoon-hyun @gengliangwang 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on pull request #29536: [SPARK-32695][INFRA] Explicitly cache and hash 'build' directly in GitHub Actions

2020-08-24 Thread GitBox



AmplabJenkins removed a comment on pull request #29536:
URL: https://github.com/apache/spark/pull/29536#issuecomment-679735247







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on pull request #29536: [SPARK-32695][INFRA] Explicitly cache and hash 'build' directly in GitHub Actions

2020-08-24 Thread GitBox



AmplabJenkins commented on pull request #29536:
URL: https://github.com/apache/spark/pull/29536#issuecomment-679735247







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on pull request #29536: [SPARK-32695][INFRA] Explicitly cache and hash 'build' directly in GitHub Actions

2020-08-24 Thread GitBox



AmplabJenkins removed a comment on pull request #29536:
URL: https://github.com/apache/spark/pull/29536#issuecomment-679725980


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/127871/
   Test FAILed.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] SparkQA commented on pull request #29536: [SPARK-32695][INFRA] Explicitly cache and hash 'build' directly in GitHub Actions

2020-08-24 Thread GitBox



SparkQA commented on pull request #29536:
URL: https://github.com/apache/spark/pull/29536#issuecomment-679731340


   **[Test build #127872 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127872/testReport)**
 for PR 29536 at commit 
[`42d4e5c`](https://github.com/apache/spark/commit/42d4e5cf5c2eeed28a7077e57c85f5e38c84c6d1).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on pull request #29536: [SPARK-32695][INFRA] Explicitly cache and hash 'build' directly in GitHub Actions

2020-08-24 Thread GitBox



AmplabJenkins removed a comment on pull request #29536:
URL: https://github.com/apache/spark/pull/29536#issuecomment-679725935


   Merged build finished. Test FAILed.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] HyukjinKwon commented on pull request #29536: [SPARK-32695][INFRA] Explicitly cache and hash 'build' directly in GitHub Actions

2020-08-24 Thread GitBox



HyukjinKwon commented on pull request #29536:
URL: https://github.com/apache/spark/pull/29536#issuecomment-679726216


   Should be ready for a look. cc @dongjoon-hyun 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] HyukjinKwon edited a comment on pull request #29536: [SPARK-32695][INFRA] Explicitly cache and hash 'build' directly in GitHub Actions

2020-08-24 Thread GitBox



HyukjinKwon edited a comment on pull request #29536:
URL: https://github.com/apache/spark/pull/29536#issuecomment-679726216


   Should be ready for a look. cc @dongjoon-hyun @gengliangwang 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on pull request #29536: [SPARK-32695][INFRA] Explicitly cache and hash 'build' directly in GitHub Actions

2020-08-24 Thread GitBox



AmplabJenkins commented on pull request #29536:
URL: https://github.com/apache/spark/pull/29536#issuecomment-679725935







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] cloud-fan commented on a change in pull request #29522: [SPARK-32641][SQL] withField + getField should return null if original struct was null

2020-08-24 Thread GitBox



cloud-fan commented on a change in pull request #29522:
URL: https://github.com/apache/spark/pull/29522#discussion_r476186399



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/ComplexTypes.scala
##
@@ -47,7 +47,8 @@ object SimplifyExtractValueOps extends Rule[LogicalPlan] {
   // For example, if a user submits a query like this:
   // `$"struct_col".withField("b", lit(1)).withField("b", 
lit(2)).getField("b")`
   // we want to return `lit(2)` (and not `lit(1)`).
-  matches.last._2
+  val expr = matches.last._2
+  If(IsNull(struct), Literal(null, expr.dataType), expr)

Review comment:
   other optimizer rules will optimize it, but yea it's better to not 
introduce unnecessary plan change.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] huaxingao commented on pull request #29535: [SPARK-32592][SQL] Make DataFrameReader.table take the specified options

2020-08-24 Thread GitBox



huaxingao commented on pull request #29535:
URL: https://github.com/apache/spark/pull/29535#issuecomment-679702212


   cc @MaxGekk @cloud-fan @viirya 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on pull request #29536: [SPARK-32695][INFRA] Add 'build' and 'project/build.properties' into cache key of SBT and Zinc

2020-08-24 Thread GitBox



AmplabJenkins removed a comment on pull request #29536:
URL: https://github.com/apache/spark/pull/29536#issuecomment-679696722







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on pull request #29536: [SPARK-32695][INFRA] Add 'build' and 'project/build.properties' into cache key of SBT and Zinc

2020-08-24 Thread GitBox



AmplabJenkins commented on pull request #29536:
URL: https://github.com/apache/spark/pull/29536#issuecomment-679696722







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on pull request #29536: [SPARK-32695][INFRA] Add 'build' and 'project/build.properties' into cache key of SBT and Zinc

2020-08-24 Thread GitBox



AmplabJenkins removed a comment on pull request #29536:
URL: https://github.com/apache/spark/pull/29536#issuecomment-679616301


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/127870/
   Test FAILed.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] SparkQA commented on pull request #29536: [SPARK-32695][INFRA] Add 'build' and 'project/build.properties' into cache key of SBT and Zinc

2020-08-24 Thread GitBox



SparkQA commented on pull request #29536:
URL: https://github.com/apache/spark/pull/29536#issuecomment-679693189


   **[Test build #127871 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127871/testReport)**
 for PR 29536 at commit 
[`3900892`](https://github.com/apache/spark/commit/3900892b5b5572bd17953b4a45f823c30a69527d).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on pull request #29535: [SPARK-32592][SQL] Make DataFrameReader.table take the specified options

2020-08-24 Thread GitBox



AmplabJenkins removed a comment on pull request #29535:
URL: https://github.com/apache/spark/pull/29535#issuecomment-679691360







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on pull request #29535: [SPARK-32592][SQL] Make DataFrameReader.table take the specified options

2020-08-24 Thread GitBox



AmplabJenkins commented on pull request #29535:
URL: https://github.com/apache/spark/pull/29535#issuecomment-679691360







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on pull request #29527: [SPARK-32664][CORE] Switch the log level from info to debug at BlockManager.getLocalBlockData

2020-08-24 Thread GitBox



AmplabJenkins removed a comment on pull request #29527:
URL: https://github.com/apache/spark/pull/29527#issuecomment-679688408







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] SparkQA removed a comment on pull request #29535: [SPARK-32592][SQL] Make DataFrameReader.table take the specified options

2020-08-24 Thread GitBox



SparkQA removed a comment on pull request #29535:
URL: https://github.com/apache/spark/pull/29535#issuecomment-679427286


   **[Test build #127860 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127860/testReport)**
 for PR 29535 at commit 
[`b54cc55`](https://github.com/apache/spark/commit/b54cc55c85055ced0582a0e0c513c2e2c1fee504).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on pull request #29527: [SPARK-32664][CORE] Switch the log level from info to debug at BlockManager.getLocalBlockData

2020-08-24 Thread GitBox



AmplabJenkins commented on pull request #29527:
URL: https://github.com/apache/spark/pull/29527#issuecomment-679688408







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] SparkQA commented on pull request #29535: [SPARK-32592][SQL] Make DataFrameReader.table take the specified options

2020-08-24 Thread GitBox



SparkQA commented on pull request #29535:
URL: https://github.com/apache/spark/pull/29535#issuecomment-679686309


   **[Test build #127860 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127860/testReport)**
 for PR 29535 at commit 
[`b54cc55`](https://github.com/apache/spark/commit/b54cc55c85055ced0582a0e0c513c2e2c1fee504).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] viirya commented on a change in pull request #29522: [SPARK-32641][SQL] withField + getField should return null if original struct was null

2020-08-24 Thread GitBox



viirya commented on a change in pull request #29522:
URL: https://github.com/apache/spark/pull/29522#discussion_r476183168



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/ComplexTypes.scala
##
@@ -47,7 +47,8 @@ object SimplifyExtractValueOps extends Rule[LogicalPlan] {
   // For example, if a user submits a query like this:
   // `$"struct_col".withField("b", lit(1)).withField("b", 
lit(2)).getField("b")`
   // we want to return `lit(2)` (and not `lit(1)`).
-  matches.last._2
+  val expr = matches.last._2
+  If(IsNull(struct), Literal(null, expr.dataType), expr)

Review comment:
   I think we need this only if `struct.nullable` is true?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] SparkQA removed a comment on pull request #29527: [SPARK-32664][CORE] Switch the log level from info to debug at BlockManager.getLocalBlockData

2020-08-24 Thread GitBox



SparkQA removed a comment on pull request #29527:
URL: https://github.com/apache/spark/pull/29527#issuecomment-679462245


   **[Test build #127865 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127865/testReport)**
 for PR 29527 at commit 
[`82676be`](https://github.com/apache/spark/commit/82676be62c05a67c1eb59b212cf64f0010c37236).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] SparkQA commented on pull request #29527: [SPARK-32664][CORE] Switch the log level from info to debug at BlockManager.getLocalBlockData

2020-08-24 Thread GitBox



SparkQA commented on pull request #29527:
URL: https://github.com/apache/spark/pull/29527#issuecomment-679682967


   **[Test build #127865 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127865/testReport)**
 for PR 29527 at commit 
[`82676be`](https://github.com/apache/spark/commit/82676be62c05a67c1eb59b212cf64f0010c37236).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on pull request #29536: [SPARK-32695][INFRA] Add 'build' and 'project/build.properties' into cache key of SBT and Zinc

2020-08-24 Thread GitBox



AmplabJenkins commented on pull request #29536:
URL: https://github.com/apache/spark/pull/29536#issuecomment-679677392







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on pull request #29536: [SPARK-32695][INFRA] Add 'build' and 'project/build.properties' into cache key of SBT and Zinc

2020-08-24 Thread GitBox



AmplabJenkins removed a comment on pull request #29536:
URL: https://github.com/apache/spark/pull/29536#issuecomment-679677392







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] cloud-fan commented on pull request #29522: [SPARK-32641][SQL] withField + getField should return null if original struct was null

2020-08-24 Thread GitBox



cloud-fan commented on pull request #29522:
URL: https://github.com/apache/spark/pull/29522#issuecomment-679639309


   thanks, merging to master!



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] HyukjinKwon commented on a change in pull request #29270: [SPARK-32466][TEST][SQL] Add PlanStabilitySuite to detect SparkPlan regression

2020-08-24 Thread GitBox



HyukjinKwon commented on a change in pull request #29270:
URL: https://github.com/apache/spark/pull/29270#discussion_r476178381



##
File path: sql/core/src/test/scala/org/apache/spark/sql/PlanStabilitySuite.scala
##
@@ -0,0 +1,327 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import java.io.File
+import java.nio.charset.StandardCharsets
+
+import scala.collection.mutable
+
+import org.apache.commons.io.FileUtils
+
+import org.apache.spark.sql.catalyst.expressions.AttributeSet
+import org.apache.spark.sql.catalyst.util._
+import org.apache.spark.sql.execution._
+import org.apache.spark.sql.execution.adaptive.DisableAdaptiveExecutionSuite
+import org.apache.spark.sql.execution.exchange.{Exchange, ReusedExchangeExec}
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.tags.ExtendedSQLTest
+
+// scalastyle:off line.size.limit
+/**
+ * Check that TPC-DS SparkPlans don't change.
+ * If there are plan differences, the error message looks like this:
+ *   Plans did not match:
+ *   last approved simplified plan: 
/path/to/tpcds-plan-stability/approved-plans-xxx/q1/simplified.txt
+ *   last approved explain plan: 
/path/to/tpcds-plan-stability/approved-plans-xxx/q1/explain.txt
+ *   [last approved simplified plan]
+ *
+ *   actual simplified plan: /path/to/tmp/q1.actual.simplified.txt
+ *   actual explain plan: /path/to/tmp/q1.actual.explain.txt
+ *   [actual simplified plan]
+ *
+ * The explain files are saved to help debug later, they are not checked. Only 
the simplified
+ * plans are checked (by string comparison).
+ *
+ *
+ * To run the entire test suite:
+ * {{{
+ *   build/sbt "sql/test-only *PlanStability[WithStats]Suite"
+ * }}}
+ *
+ * To run a single test file upon change:
+ * {{{
+ *   build/sbt "sql/test-only *PlanStability[WithStats]Suite -- -z 
(tpcds-v1.4/q49)"
+ * }}}
+ *
+ * To re-generate golden files for entire suite, run:
+ * {{{
+ *   SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "sql/test-only 
*PlanStability[WithStats]Suite"
+ * }}}
+ *
+ * To re-generate golden file for a single test, run:
+ * {{{
+ *   SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "sql/test-only 
*PlanStability[WithStats]Suite -- -z (tpcds-v1.4/q49)"
+ * }}}
+ */
+// scalastyle:on line.size.limit
+trait PlanStabilitySuite extends TPCDSBase with DisableAdaptiveExecutionSuite {
+
+  private val originalMaxToStringFields = conf.maxToStringFields
+
+  override def beforeAll(): Unit = {
+conf.setConf(SQLConf.MAX_TO_STRING_FIELDS, Int.MaxValue)
+super.beforeAll()
+  }
+
+  override def afterAll(): Unit = {
+super.afterAll()
+conf.setConf(SQLConf.MAX_TO_STRING_FIELDS, originalMaxToStringFields)
+  }
+
+  private val regenerateGoldenFiles: Boolean = 
System.getenv("SPARK_GENERATE_GOLDEN_FILES") == "1"
+
+  protected val baseResourcePath = {
+// use the same way as `SQLQueryTestSuite` to get the resource path
+java.nio.file.Paths.get("src", "test", "resources", 
"tpcds-plan-stability").toFile
+  }
+
+  private val referenceRegex = "#\\d+".r
+  private val normalizeRegex = "#\\d+L?".r
+
+  def goldenFilePath: String
+
+  private def getDirForTest(name: String): File = {
+new File(goldenFilePath, name)
+  }
+
+  private def isApproved(dir: File, actualSimplifiedPlan: String): Boolean = {
+val file = new File(dir, "simplified.txt")
+val approved = FileUtils.readFileToString(file, StandardCharsets.UTF_8)
+approved == actualSimplifiedPlan

Review comment:
   If `actualSimplifiedPlan` refers `expected` vs `actual`, I think it 
would be nicer if we rename `approved` to `expected`.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] cloud-fan closed pull request #29522: [SPARK-32641][SQL] withField + getField should return null if original struct was null

2020-08-24 Thread GitBox



cloud-fan closed pull request #29522:
URL: https://github.com/apache/spark/pull/29522


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] viirya commented on pull request #29513: [SPARK-32646][SQL][3.0][test-hadoop2.7][test-hive1.2] ORC predicate pushdown should work with case-insensitive analysis

2020-08-24 Thread GitBox



viirya commented on pull request #29513:
URL: https://github.com/apache/spark/pull/29513#issuecomment-679633036


   Thanks! @cloud-fan



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] HyukjinKwon commented on a change in pull request #29537: [SPARK-32466][SQL][FOLLOW-UP] Normalize Location info in explain plan

2020-08-24 Thread GitBox



HyukjinKwon commented on a change in pull request #29537:
URL: https://github.com/apache/spark/pull/29537#discussion_r476177245



##
File path: 
sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q11.sf100/explain.txt
##
@@ -89,394 +89,394 @@ TakeOrderedAndProject (87)
 
 
 (1) Scan parquet default.store_sales
-Output [4]: [ss_sold_date_sk#1, ss_customer_sk#2, ss_ext_discount_amt#3, 
ss_ext_list_price#4]
+Output [4]: [ss_sold_date_sk#x, ss_customer_sk#x, ss_ext_discount_amt#x, 
ss_ext_list_price#x]
 Batched: true
-Location: InMemoryFileIndex 
[file:/Users/yi.wu/IdeaProjects/spark/sql/core/spark-warehouse/org.apache.spark.sql.TPCDSV1_4_PlanStabilityWithStatsSuite/store_sales]

Review comment:
   Hm, how do we check the differences by other changes but not caught by 
this test?
   I think we should only result simplified forms here if we test simplified 
forms.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] viirya commented on pull request #29530: [SPARK-32646][SQL][test-hadoop2.7][test-hive1.2] ORC predicate pushdown should work with case-insensitive analysis

2020-08-24 Thread GitBox



viirya commented on pull request #29530:
URL: https://github.com/apache/spark/pull/29530#issuecomment-679631947


   Thanks @HyukjinKwon 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] HyukjinKwon commented on a change in pull request #29270: [SPARK-32466][TEST][SQL] Add PlanStabilitySuite to detect SparkPlan regression

2020-08-24 Thread GitBox



HyukjinKwon commented on a change in pull request #29270:
URL: https://github.com/apache/spark/pull/29270#discussion_r476176111



##
File path: sql/core/src/test/scala/org/apache/spark/sql/PlanStabilitySuite.scala
##
@@ -0,0 +1,327 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import java.io.File
+import java.nio.charset.StandardCharsets
+
+import scala.collection.mutable
+
+import org.apache.commons.io.FileUtils
+
+import org.apache.spark.sql.catalyst.expressions.AttributeSet
+import org.apache.spark.sql.catalyst.util._
+import org.apache.spark.sql.execution._
+import org.apache.spark.sql.execution.adaptive.DisableAdaptiveExecutionSuite
+import org.apache.spark.sql.execution.exchange.{Exchange, ReusedExchangeExec}
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.tags.ExtendedSQLTest
+
+// scalastyle:off line.size.limit
+/**
+ * Check that TPC-DS SparkPlans don't change.
+ * If there are plan differences, the error message looks like this:
+ *   Plans did not match:
+ *   last approved simplified plan: 
/path/to/tpcds-plan-stability/approved-plans-xxx/q1/simplified.txt
+ *   last approved explain plan: 
/path/to/tpcds-plan-stability/approved-plans-xxx/q1/explain.txt
+ *   [last approved simplified plan]
+ *
+ *   actual simplified plan: /path/to/tmp/q1.actual.simplified.txt
+ *   actual explain plan: /path/to/tmp/q1.actual.explain.txt
+ *   [actual simplified plan]
+ *
+ * The explain files are saved to help debug later, they are not checked. Only 
the simplified
+ * plans are checked (by string comparison).
+ *
+ *
+ * To run the entire test suite:
+ * {{{
+ *   build/sbt "sql/test-only *PlanStability[WithStats]Suite"
+ * }}}
+ *
+ * To run a single test file upon change:
+ * {{{
+ *   build/sbt "sql/test-only *PlanStability[WithStats]Suite -- -z 
(tpcds-v1.4/q49)"
+ * }}}
+ *
+ * To re-generate golden files for entire suite, run:
+ * {{{
+ *   SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "sql/test-only 
*PlanStability[WithStats]Suite"
+ * }}}
+ *
+ * To re-generate golden file for a single test, run:
+ * {{{
+ *   SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "sql/test-only 
*PlanStability[WithStats]Suite -- -z (tpcds-v1.4/q49)"
+ * }}}
+ */
+// scalastyle:on line.size.limit
+trait PlanStabilitySuite extends TPCDSBase with DisableAdaptiveExecutionSuite {
+
+  private val originalMaxToStringFields = conf.maxToStringFields
+
+  override def beforeAll(): Unit = {
+conf.setConf(SQLConf.MAX_TO_STRING_FIELDS, Int.MaxValue)
+super.beforeAll()
+  }
+
+  override def afterAll(): Unit = {
+super.afterAll()
+conf.setConf(SQLConf.MAX_TO_STRING_FIELDS, originalMaxToStringFields)
+  }
+
+  private val regenerateGoldenFiles: Boolean = 
System.getenv("SPARK_GENERATE_GOLDEN_FILES") == "1"
+
+  protected val baseResourcePath = {
+// use the same way as `SQLQueryTestSuite` to get the resource path
+java.nio.file.Paths.get("src", "test", "resources", 
"tpcds-plan-stability").toFile
+  }
+
+  private val referenceRegex = "#\\d+".r
+  private val normalizeRegex = "#\\d+L?".r
+
+  def goldenFilePath: String
+
+  private def getDirForTest(name: String): File = {
+new File(goldenFilePath, name)
+  }
+
+  private def isApproved(dir: File, actualSimplifiedPlan: String): Boolean = {
+val file = new File(dir, "simplified.txt")
+val approved = FileUtils.readFileToString(file, StandardCharsets.UTF_8)
+approved == actualSimplifiedPlan
+  }
+
+  /**
+   * Serialize and save this SparkPlan.
+   * The resulting file is used by [[checkWithApproved]] to check stability.
+   *
+   * @param planthe SparkPlan
+   * @param namethe name of the query
+   * @param explain the full explain output; this is saved to help debug later 
as the simplified
+   *plan is not too useful for debugging
+   */
+  private def generateApprovedPlanFile(plan: SparkPlan, name: String, explain: 
String): Unit = {
+val dir = getDirForTest(name)
+val simplified = getSimplifiedPlan(plan)
+val foundMatch = dir.exists() && isApproved(dir, simplified)
+
+if (!foundMatch) {
+  FileUtils.deleteDirectory(dir)
+  assert(dir.mkdirs())
+
+  val file = new File(dir,

[GitHub] [spark] HyukjinKwon commented on a change in pull request #29270: [SPARK-32466][TEST][SQL] Add PlanStabilitySuite to detect SparkPlan regression

2020-08-24 Thread GitBox



HyukjinKwon commented on a change in pull request #29270:
URL: https://github.com/apache/spark/pull/29270#discussion_r476174979



##
File path: sql/core/src/test/scala/org/apache/spark/sql/PlanStabilitySuite.scala
##
@@ -0,0 +1,327 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import java.io.File
+import java.nio.charset.StandardCharsets
+
+import scala.collection.mutable
+
+import org.apache.commons.io.FileUtils
+
+import org.apache.spark.sql.catalyst.expressions.AttributeSet
+import org.apache.spark.sql.catalyst.util._
+import org.apache.spark.sql.execution._
+import org.apache.spark.sql.execution.adaptive.DisableAdaptiveExecutionSuite
+import org.apache.spark.sql.execution.exchange.{Exchange, ReusedExchangeExec}
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.tags.ExtendedSQLTest
+
+// scalastyle:off line.size.limit
+/**
+ * Check that TPC-DS SparkPlans don't change.
+ * If there are plan differences, the error message looks like this:
+ *   Plans did not match:
+ *   last approved simplified plan: 
/path/to/tpcds-plan-stability/approved-plans-xxx/q1/simplified.txt
+ *   last approved explain plan: 
/path/to/tpcds-plan-stability/approved-plans-xxx/q1/explain.txt
+ *   [last approved simplified plan]
+ *
+ *   actual simplified plan: /path/to/tmp/q1.actual.simplified.txt
+ *   actual explain plan: /path/to/tmp/q1.actual.explain.txt
+ *   [actual simplified plan]
+ *
+ * The explain files are saved to help debug later, they are not checked. Only 
the simplified
+ * plans are checked (by string comparison).
+ *
+ *
+ * To run the entire test suite:
+ * {{{
+ *   build/sbt "sql/test-only *PlanStability[WithStats]Suite"
+ * }}}
+ *
+ * To run a single test file upon change:
+ * {{{
+ *   build/sbt "sql/test-only *PlanStability[WithStats]Suite -- -z 
(tpcds-v1.4/q49)"
+ * }}}
+ *
+ * To re-generate golden files for entire suite, run:
+ * {{{
+ *   SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "sql/test-only 
*PlanStability[WithStats]Suite"
+ * }}}
+ *
+ * To re-generate golden file for a single test, run:
+ * {{{
+ *   SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "sql/test-only 
*PlanStability[WithStats]Suite -- -z (tpcds-v1.4/q49)"
+ * }}}
+ */
+// scalastyle:on line.size.limit
+trait PlanStabilitySuite extends TPCDSBase with DisableAdaptiveExecutionSuite {
+
+  private val originalMaxToStringFields = conf.maxToStringFields
+
+  override def beforeAll(): Unit = {
+conf.setConf(SQLConf.MAX_TO_STRING_FIELDS, Int.MaxValue)
+super.beforeAll()
+  }
+
+  override def afterAll(): Unit = {
+super.afterAll()
+conf.setConf(SQLConf.MAX_TO_STRING_FIELDS, originalMaxToStringFields)
+  }
+
+  private val regenerateGoldenFiles: Boolean = 
System.getenv("SPARK_GENERATE_GOLDEN_FILES") == "1"
+
+  protected val baseResourcePath = {
+// use the same way as `SQLQueryTestSuite` to get the resource path
+java.nio.file.Paths.get("src", "test", "resources", 
"tpcds-plan-stability").toFile
+  }
+
+  private val referenceRegex = "#\\d+".r
+  private val normalizeRegex = "#\\d+L?".r
+
+  def goldenFilePath: String
+
+  private def getDirForTest(name: String): File = {
+new File(goldenFilePath, name)
+  }
+
+  private def isApproved(dir: File, actualSimplifiedPlan: String): Boolean = {
+val file = new File(dir, "simplified.txt")
+val approved = FileUtils.readFileToString(file, StandardCharsets.UTF_8)
+approved == actualSimplifiedPlan
+  }
+
+  /**
+   * Serialize and save this SparkPlan.
+   * The resulting file is used by [[checkWithApproved]] to check stability.
+   *
+   * @param planthe SparkPlan
+   * @param namethe name of the query
+   * @param explain the full explain output; this is saved to help debug later 
as the simplified
+   *plan is not too useful for debugging
+   */
+  private def generateApprovedPlanFile(plan: SparkPlan, name: String, explain: 
String): Unit = {

Review comment:
   I think we usually say a golden file instead of a approved file.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28841: [SPARK-31962][SQL] Provide modifiedAfter and modifiedBefore options when filtering from a batch-based file data source

2020-08-24 Thread GitBox



AmplabJenkins removed a comment on pull request #28841:
URL: https://github.com/apache/spark/pull/28841#issuecomment-678908136


   Can one of the admins verify this patch?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on pull request #28841: [SPARK-31962][SQL] Provide modifiedAfter and modifiedBefore options when filtering from a batch-based file data source

2020-08-24 Thread GitBox



AmplabJenkins commented on pull request #28841:
URL: https://github.com/apache/spark/pull/28841#issuecomment-679624551


   Can one of the admins verify this patch?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] cloud-fan commented on a change in pull request #29537: [SPARK-32466][SQL][FOLLOW-UP] Normalize Location info in explain plan

2020-08-24 Thread GitBox



cloud-fan commented on a change in pull request #29537:
URL: https://github.com/apache/spark/pull/29537#discussion_r476170777



##
File path: 
sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q11.sf100/explain.txt
##
@@ -89,394 +89,394 @@ TakeOrderedAndProject (87)
 
 
 (1) Scan parquet default.store_sales
-Output [4]: [ss_sold_date_sk#1, ss_customer_sk#2, ss_ext_discount_amt#3, 
ss_ext_list_price#4]
+Output [4]: [ss_sold_date_sk#x, ss_customer_sk#x, ss_ext_discount_amt#x, 
ss_ext_list_price#x]

Review comment:
   After a second thought, this might be useful to understand the query and 
debug the test failure, let's keep it.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] HyukjinKwon closed pull request #29530: [SPARK-32646][SQL][test-hadoop2.7][test-hive1.2] ORC predicate pushdown should work with case-insensitive analysis

2020-08-24 Thread GitBox



HyukjinKwon closed pull request #29530:
URL: https://github.com/apache/spark/pull/29530


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] cloud-fan commented on a change in pull request #29537: [SPARK-32466][SQL][FOLLOW-UP] Normalize Location info in explain plan

2020-08-24 Thread GitBox



cloud-fan commented on a change in pull request #29537:
URL: https://github.com/apache/spark/pull/29537#discussion_r476170398



##
File path: 
sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q11.sf100/explain.txt
##
@@ -89,394 +89,394 @@ TakeOrderedAndProject (87)
 
 
 (1) Scan parquet default.store_sales
-Output [4]: [ss_sold_date_sk#1, ss_customer_sk#2, ss_ext_discount_amt#3, 
ss_ext_list_price#4]
+Output [4]: [ss_sold_date_sk#x, ss_customer_sk#x, ss_ext_discount_amt#x, 
ss_ext_list_price#x]
 Batched: true
-Location: InMemoryFileIndex 
[file:/Users/yi.wu/IdeaProjects/spark/sql/core/spark-warehouse/org.apache.spark.sql.TPCDSV1_4_PlanStabilityWithStatsSuite/store_sales]

Review comment:
   the test only checks the simplified form, the explain result is only for 
debugging purpose.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] HyukjinKwon commented on pull request #29530: [SPARK-32646][SQL][test-hadoop2.7][test-hive1.2] ORC predicate pushdown should work with case-insensitive analysis

2020-08-24 Thread GitBox



HyukjinKwon commented on pull request #29530:
URL: https://github.com/apache/spark/pull/29530#issuecomment-679620235







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on pull request #29536: [SPARK-32695][INFRA] Add 'build' and 'project/build.properties' into cache key of SBT and Zinc

2020-08-24 Thread GitBox



AmplabJenkins removed a comment on pull request #29536:
URL: https://github.com/apache/spark/pull/29536#issuecomment-679616281


   Merged build finished. Test FAILed.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on pull request #29536: [SPARK-32695][INFRA] Add 'build' and 'project/build.properties' into cache key of SBT and Zinc

2020-08-24 Thread GitBox



AmplabJenkins commented on pull request #29536:
URL: https://github.com/apache/spark/pull/29536#issuecomment-679616281







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] cloud-fan closed pull request #29513: [SPARK-32646][SQL][3.0][test-hadoop2.7][test-hive1.2] ORC predicate pushdown should work with case-insensitive analysis

2020-08-24 Thread GitBox



cloud-fan closed pull request #29513:
URL: https://github.com/apache/spark/pull/29513


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] HyukjinKwon closed pull request #27908: [SPARK-31000][PYTHON][SQL] Add ability to set table description via Catalog.createTable()

2020-08-24 Thread GitBox



HyukjinKwon closed pull request #27908:
URL: https://github.com/apache/spark/pull/27908


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] cloud-fan commented on pull request #29513: [SPARK-32646][SQL][3.0][test-hadoop2.7][test-hive1.2] ORC predicate pushdown should work with case-insensitive analysis

2020-08-24 Thread GitBox



cloud-fan commented on pull request #29513:
URL: https://github.com/apache/spark/pull/29513#issuecomment-679614653


   thanks, merging to 3.0!



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] HyukjinKwon commented on pull request #27908: [SPARK-31000][PYTHON][SQL] Add ability to set table description via Catalog.createTable()

2020-08-24 Thread GitBox



HyukjinKwon commented on pull request #27908:
URL: https://github.com/apache/spark/pull/27908#issuecomment-679612590


   Merged to master.
   
   Thanks @nchammas.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] HyukjinKwon commented on pull request #27908: [SPARK-31000] Add ability to set table description via Catalog.createTable()

2020-08-24 Thread GitBox



HyukjinKwon commented on pull request #27908:
URL: https://github.com/apache/spark/pull/27908#issuecomment-679606450


   Ah, I think it's okay with `description`.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] beliefer commented on a change in pull request #28685: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

2020-08-24 Thread GitBox



beliefer commented on a change in pull request #28685:
URL: https://github.com/apache/spark/pull/28685#discussion_r476159993



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowExecBase.scala
##
@@ -137,7 +137,11 @@ trait WindowExecBase extends UnaryExecNode {
   function match {
 case AggregateExpression(f, _, _, _, _) => collect("AGGREGATE", 
frame, e, f)
 case f: AggregateWindowFunction => collect("AGGREGATE", frame, e, 
f)
-case f: OffsetWindowFunction => collect("OFFSET", frame, e, f)
+case f: OffsetWindowFunction => if (f.startWithCurrentRow) {
+  collect("OFFSET", frame, e, f)
+} else {
+  collect("WHOLE_OFFSET", frame, e, f)

Review comment:
   OK





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] HyukjinKwon commented on a change in pull request #29516: [WIP][SPARK-32614][SQL] Don't apply comment processing if 'comment' unset for CSV

2020-08-24 Thread GitBox



HyukjinKwon commented on a change in pull request #29516:
URL: https://github.com/apache/spark/pull/29516#discussion_r476158718



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVExprUtils.scala
##
@@ -25,16 +25,21 @@ object CSVExprUtils {
* This is currently being used in CSV reading path and CSV schema inference.
*/
   def filterCommentAndEmpty(iter: Iterator[String], options: CSVOptions): 
Iterator[String] = {
-iter.filter { line =>
-  line.trim.nonEmpty && !line.startsWith(options.comment.toString)
+if (options.isCommentSet) {
+  val commentPrefix = options.comment.toString
+  iter.filter { line =>
+line.trim.nonEmpty && !line.startsWith(commentPrefix)
+  }
+} else {
+  iter.filter(_.trim.nonEmpty)
 }
   }
 
   def skipComments(iter: Iterator[String], options: CSVOptions): 
Iterator[String] = {
 if (options.isCommentSet) {
   val commentPrefix = options.comment.toString
   iter.dropWhile { line =>
-line.trim.isEmpty || line.trim.startsWith(commentPrefix)
+line.trim.isEmpty || line.startsWith(commentPrefix)

Review comment:
   Oh, NVM. I misread.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] beliefer commented on a change in pull request #28685: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

2020-08-24 Thread GitBox



beliefer commented on a change in pull request #28685:
URL: https://github.com/apache/spark/pull/28685#discussion_r476153597



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/window/WindowFunctionFrame.scala
##
@@ -77,31 +77,30 @@ object WindowFunctionFrame {
  * @param newMutableProjection function used to create the projection.
  * @param offset by which rows get moved within a partition.
  */
-final class OffsetWindowFunctionFrame(

Review comment:
   We can't uniform the behavior.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on pull request #29536: [SPARK-32695][INFRA] Add 'build' and 'project/build.properties' into cache key of SBT and Zinc

2020-08-24 Thread GitBox



AmplabJenkins commented on pull request #29536:
URL: https://github.com/apache/spark/pull/29536#issuecomment-679588148







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on pull request #29536: [SPARK-32695][INFRA] Add 'build' and 'project/build.properties' into cache key of SBT and Zinc

2020-08-24 Thread GitBox



AmplabJenkins removed a comment on pull request #29536:
URL: https://github.com/apache/spark/pull/29536#issuecomment-679588148







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] beliefer commented on a change in pull request #28685: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

2020-08-24 Thread GitBox



beliefer commented on a change in pull request #28685:
URL: https://github.com/apache/spark/pull/28685#discussion_r476152901



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala
##
@@ -363,6 +363,12 @@ abstract class OffsetWindowFunction
*/
   val direction: SortDirection
 
+  /**
+   * Whether the offset is start with the current row. If 
`startWithCurrentRow` is false, `offset`
+   * means the offset is start with the first row of the entire window 
partition.
+   */

Review comment:
   OK.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] HyukjinKwon commented on a change in pull request #29537: [SPARK-32466][SQL][FOLLOW-UP] Normalize Location info in explain plan

2020-08-24 Thread GitBox



HyukjinKwon commented on a change in pull request #29537:
URL: https://github.com/apache/spark/pull/29537#discussion_r476152148



##
File path: 
sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q11.sf100/explain.txt
##
@@ -89,394 +89,394 @@ TakeOrderedAndProject (87)
 
 
 (1) Scan parquet default.store_sales
-Output [4]: [ss_sold_date_sk#1, ss_customer_sk#2, ss_ext_discount_amt#3, 
ss_ext_list_price#4]
+Output [4]: [ss_sold_date_sk#x, ss_customer_sk#x, ss_ext_discount_amt#x, 
ss_ext_list_price#x]
 Batched: true
-Location: InMemoryFileIndex 
[file:/Users/yi.wu/IdeaProjects/spark/sql/core/spark-warehouse/org.apache.spark.sql.TPCDSV1_4_PlanStabilityWithStatsSuite/store_sales]

Review comment:
   How did the tests pass before?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on pull request #29536: [SPARK-32695][INFRA] Add 'build' and 'project/build.properties' into cache key of SBT and Zinc

2020-08-24 Thread GitBox



AmplabJenkins removed a comment on pull request #29536:
URL: https://github.com/apache/spark/pull/29536#issuecomment-679582757


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/127869/
   Test FAILed.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] srowen commented on a change in pull request #29516: [WIP][SPARK-32614][SQL] Don't apply comment processing if 'comment' unset for CSV

2020-08-24 Thread GitBox



srowen commented on a change in pull request #29516:
URL: https://github.com/apache/spark/pull/29516#discussion_r476151826



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVExprUtils.scala
##
@@ -25,16 +25,21 @@ object CSVExprUtils {
* This is currently being used in CSV reading path and CSV schema inference.
*/
   def filterCommentAndEmpty(iter: Iterator[String], options: CSVOptions): 
Iterator[String] = {
-iter.filter { line =>
-  line.trim.nonEmpty && !line.startsWith(options.comment.toString)
+if (options.isCommentSet) {
+  val commentPrefix = options.comment.toString
+  iter.filter { line =>
+line.trim.nonEmpty && !line.startsWith(commentPrefix)
+  }
+} else {
+  iter.filter(_.trim.nonEmpty)
 }
   }
 
   def skipComments(iter: Iterator[String], options: CSVOptions): 
Iterator[String] = {
 if (options.isCommentSet) {
   val commentPrefix = options.comment.toString
   iter.dropWhile { line =>
-line.trim.isEmpty || line.trim.startsWith(commentPrefix)
+line.trim.isEmpty || line.startsWith(commentPrefix)

Review comment:
   I think the existing logic matches the logic of 
https://github.com/apache/spark/pull/29516/files#diff-7faa93f00223527237747227998e30f1R27
 ? maybe I'm missing your point. The logic has always been to drop lines that 
are empty after trimming no matter what, regardless of comment char. Right or 
wrong that's separate.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] SparkQA commented on pull request #29536: [SPARK-32695][INFRA] Add 'build' and 'project/build.properties' into cache key of SBT and Zinc

2020-08-24 Thread GitBox



SparkQA commented on pull request #29536:
URL: https://github.com/apache/spark/pull/29536#issuecomment-679586372


   **[Test build #127870 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127870/testReport)**
 for PR 29536 at commit 
[`63032da`](https://github.com/apache/spark/commit/63032daff78ceb8de651fd514339e45bce7b799e).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on pull request #29536: [SPARK-32695][INFRA] Add 'build' and 'project/build.properties' into cache key of SBT and Zinc

2020-08-24 Thread GitBox



AmplabJenkins commented on pull request #29536:
URL: https://github.com/apache/spark/pull/29536#issuecomment-679582734







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on pull request #29536: [SPARK-32695][INFRA] Add 'build' and 'project/build.properties' into cache key of SBT and Zinc

2020-08-24 Thread GitBox



AmplabJenkins removed a comment on pull request #29536:
URL: https://github.com/apache/spark/pull/29536#issuecomment-679582734


   Merged build finished. Test FAILed.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] srowen commented on a change in pull request #29516: [WIP][SPARK-32614][SQL] Don't apply comment processing if 'comment' unset for CSV

2020-08-24 Thread GitBox



srowen commented on a change in pull request #29516:
URL: https://github.com/apache/spark/pull/29516#discussion_r476149030



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVOptions.scala
##
@@ -220,7 +220,9 @@ class CSVOptions(
 format.setQuote(quote)
 format.setQuoteEscape(escape)
 charToEscapeQuoteEscaping.foreach(format.setCharToEscapeQuoteEscaping)
-format.setComment(comment)
+if (isCommentSet) {

Review comment:
   Oh right, this stanza is for writer settings. There is no 
`setCommentProcessingEnabled` for writers in univocity. Comments aren't 
generated. In fact the comment setting doesn't matter, really?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on pull request #29536: [SPARK-32695][INFRA] Add 'build' and 'project/build.properties' into cache key of SBT and Zinc

2020-08-24 Thread GitBox



AmplabJenkins removed a comment on pull request #29536:
URL: https://github.com/apache/spark/pull/29536#issuecomment-679568877







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on pull request #29536: [SPARK-32695][INFRA] Add 'build' and 'project/build.properties' into cache key of SBT and Zinc

2020-08-24 Thread GitBox



AmplabJenkins commented on pull request #29536:
URL: https://github.com/apache/spark/pull/29536#issuecomment-679568877







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] srowen commented on a change in pull request #29516: [WIP][SPARK-32614][SQL] Don't apply comment processing if 'comment' unset for CSV

2020-08-24 Thread GitBox



srowen commented on a change in pull request #29516:
URL: https://github.com/apache/spark/pull/29516#discussion_r476141555



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVOptions.scala
##
@@ -220,7 +220,9 @@ class CSVOptions(
 format.setQuote(quote)
 format.setQuoteEscape(escape)
 charToEscapeQuoteEscaping.foreach(format.setCharToEscapeQuoteEscaping)
-format.setComment(comment)
+if (isCommentSet) {

Review comment:
   Right, yeah, so we have to use the new method in univocity 2.9.0 to turn 
off its comment handling if its unset in Spark (= `\u`)





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] SparkQA commented on pull request #29536: [SPARK-32695][INFRA] Add 'build' and 'project/build.properties' into cache key of SBT and Zinc

2020-08-24 Thread GitBox



SparkQA commented on pull request #29536:
URL: https://github.com/apache/spark/pull/29536#issuecomment-679567186


   **[Test build #127869 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127869/testReport)**
 for PR 29536 at commit 
[`6f78677`](https://github.com/apache/spark/commit/6f78677069bee9847c3064a7b1460f248f44d2ef).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on pull request #29536: [SPARK-32695][INFRA] Add 'build' and 'project/build.properties' into cache key of SBT and Zinc

2020-08-24 Thread GitBox



AmplabJenkins removed a comment on pull request #29536:
URL: https://github.com/apache/spark/pull/29536#issuecomment-679563541


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/127864/
   Test FAILed.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on pull request #29536: [SPARK-32695][INFRA] Add 'build' and 'project/build.properties' into cache key of SBT and Zinc

2020-08-24 Thread GitBox



AmplabJenkins removed a comment on pull request #29536:
URL: https://github.com/apache/spark/pull/29536#issuecomment-679563520


   Merged build finished. Test FAILed.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on pull request #29536: [SPARK-32695][INFRA] Add 'build' and 'project/build.properties' into cache key of SBT and Zinc

2020-08-24 Thread GitBox



AmplabJenkins commented on pull request #29536:
URL: https://github.com/apache/spark/pull/29536#issuecomment-679563520







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] viirya commented on pull request #29497: [WIP][SPARK-32670][SQL]Group exception messages in Catalyst Analyzer in one file

2020-08-24 Thread GitBox



viirya commented on pull request #29497:
URL: https://github.com/apache/spark/pull/29497#issuecomment-679549416


   This idea sounds okay. I just have few questions.
   
   1. Is it only for certain exceptions or all exceptions? For example, this PR 
targets only `AnalysisException`.
   
   2. How to organize them? query compilation/query execution + package seems 
okay, but is there possibly unclear cases? For example, `AnalysisException` is 
also thrown in `RunnableCommand`, is it query compilation error and query 
execution error?
   
   3. Is a top level object of error message too big to hold all exceptions in 
the module? For example `org.apache.spark.sql.QueryCompilationErrors` and 
`org.apache.spark.sql.QueryExecutionErrors` might contain many exceptions.
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] baohe-zhang commented on pull request #29509: [SPARK-31608][CORE][WEBUI][TEST] Add test suites for HybridStore and HistoryServerMemoryManager

2020-08-24 Thread GitBox



baohe-zhang commented on pull request #29509:
URL: https://github.com/apache/spark/pull/29509#issuecomment-679530413


   @HeartSaVioR Thanks for the review!



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] cloud-fan commented on a change in pull request #29537: [SPARK-32466][SQL][FOLLOW-UP] Normalize Location info in explain plan

2020-08-24 Thread GitBox



cloud-fan commented on a change in pull request #29537:
URL: https://github.com/apache/spark/pull/29537#discussion_r476119880



##
File path: 
sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q11.sf100/explain.txt
##
@@ -89,394 +89,394 @@ TakeOrderedAndProject (87)
 
 
 (1) Scan parquet default.store_sales
-Output [4]: [ss_sold_date_sk#1, ss_customer_sk#2, ss_ext_discount_amt#3, 
ss_ext_list_price#4]
+Output [4]: [ss_sold_date_sk#x, ss_customer_sk#x, ss_ext_discount_amt#x, 
ss_ext_list_price#x]

Review comment:
   I think it's fine





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] Ngone51 commented on a change in pull request #29537: [SPARK-32466][SQL][FOLLOW-UP] Normalize Location info in explain plan

2020-08-24 Thread GitBox



Ngone51 commented on a change in pull request #29537:
URL: https://github.com/apache/spark/pull/29537#discussion_r476117593



##
File path: 
sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q11.sf100/explain.txt
##
@@ -89,394 +89,394 @@ TakeOrderedAndProject (87)
 
 
 (1) Scan parquet default.store_sales
-Output [4]: [ss_sold_date_sk#1, ss_customer_sk#2, ss_ext_discount_amt#3, 
ss_ext_list_price#4]
+Output [4]: [ss_sold_date_sk#x, ss_customer_sk#x, ss_ext_discount_amt#x, 
ss_ext_list_price#x]

Review comment:
   Sorry for bringing massive changes again.
   
   Do you think we should avoid this change? If it should, I think we could 
make a custom copy of `replaceNotIncludedMsg` in `PlanStabilitySuite` instead 
of resue.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on pull request #29537: [SPARK-32466][SQL][FOLLOW-UP] Normalize Location info in explain plan

2020-08-24 Thread GitBox



AmplabJenkins removed a comment on pull request #29537:
URL: https://github.com/apache/spark/pull/29537#issuecomment-679520909







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on pull request #29537: [SPARK-32466][SQL][FOLLOW-UP] Normalize Location info in explain plan

2020-08-24 Thread GitBox



AmplabJenkins commented on pull request #29537:
URL: https://github.com/apache/spark/pull/29537#issuecomment-679520909







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] SparkQA commented on pull request #29537: [SPARK-32466][SQL][FOLLOW-UP] Normalize Location info in explain plan

2020-08-24 Thread GitBox



SparkQA commented on pull request #29537:
URL: https://github.com/apache/spark/pull/29537#issuecomment-679519576


   **[Test build #127868 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127868/testReport)**
 for PR 29537 at commit 
[`6f46616`](https://github.com/apache/spark/commit/6f46616cf7e15bb1e9d02950e32ad69ee0461f1a).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] viirya commented on a change in pull request #29530: [SPARK-32646][SQL][test-hadoop2.7][test-hive1.2] ORC predicate pushdown should work with case-insensitive analysis

2020-08-24 Thread GitBox



viirya commented on a change in pull request #29530:
URL: https://github.com/apache/spark/pull/29530#discussion_r476115684



##
File path: 
sql/core/v1.2/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFilters.scala
##
@@ -215,11 +215,11 @@ private[sql] object OrcFilters extends OrcFiltersBase {
* @return the builder so far.
*/
   private def buildLeafSearchArgument(
-  dataTypeMap: Map[String, DataType],
+  dataTypeMap: Map[String, OrcPrimitiveField],
   expression: Filter,
   builder: Builder): Option[Builder] = {
 def getType(attribute: String): PredicateLeaf.Type =
-  getPredicateLeafType(dataTypeMap(attribute))
+  getPredicateLeafType(dataTypeMap(attribute).fieldType)

Review comment:
   It was found if we run the tests with hive-1.2 profile.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] viirya commented on a change in pull request #29530: [SPARK-32646][SQL][test-hadoop2.7][test-hive1.2] ORC predicate pushdown should work with case-insensitive analysis

2020-08-24 Thread GitBox



viirya commented on a change in pull request #29530:
URL: https://github.com/apache/spark/pull/29530#discussion_r476115078



##
File path: 
sql/core/v1.2/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFilters.scala
##
@@ -215,11 +215,11 @@ private[sql] object OrcFilters extends OrcFiltersBase {
* @return the builder so far.
*/
   private def buildLeafSearchArgument(
-  dataTypeMap: Map[String, DataType],
+  dataTypeMap: Map[String, OrcPrimitiveField],
   expression: Filter,
   builder: Builder): Option[Builder] = {
 def getType(attribute: String): PredicateLeaf.Type =
-  getPredicateLeafType(dataTypeMap(attribute))
+  getPredicateLeafType(dataTypeMap(attribute).fieldType)

Review comment:
   Oh, and `getPredicateLeafType`'s scope. It is used in OrcFilterSuite, so 
cannot be just `private`.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] Ngone51 commented on pull request #29537: [SPARK-32466][SQL][FOLLOW-UP] Normalize Location info in explain plan

2020-08-24 Thread GitBox



Ngone51 commented on pull request #29537:
URL: https://github.com/apache/spark/pull/29537#issuecomment-679513602


   cc: @wangyum @maropu @cloud-fan 
   
   (Note this PR has only updated a few golden files yet for easier review as 
@maropu mentioned previously)



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on pull request #29530: [SPARK-32646][SQL][test-hadoop2.7][test-hive1.2] ORC predicate pushdown should work with case-insensitive analysis

2020-08-24 Thread GitBox



AmplabJenkins commented on pull request #29530:
URL: https://github.com/apache/spark/pull/29530#issuecomment-679511781







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on pull request #29530: [SPARK-32646][SQL][test-hadoop2.7][test-hive1.2] ORC predicate pushdown should work with case-insensitive analysis

2020-08-24 Thread GitBox



AmplabJenkins removed a comment on pull request #29530:
URL: https://github.com/apache/spark/pull/29530#issuecomment-679511781







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] Ngone51 opened a new pull request #29537: [SPARK-32466][SQL][FOLLOW-UP] Normalize Location info in explain plan

2020-08-24 Thread GitBox



Ngone51 opened a new pull request #29537:
URL: https://github.com/apache/spark/pull/29537


   
   
   ### What changes were proposed in this pull request?
   
   
   1. Extract `SQLQueryTestSuite.replaceNotIncludedMsg` to `PlanTest`.
   
   2. Reuse `replaceNotIncludedMsg` to normalize the explain plan that 
generated in `PlanStabilitySuite`.
   
   
   ### Why are the changes needed?
   
   
   
   Eliminates the personal related information (e.g., local directories) in the 
explain plan.
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   No.
   
   
   ### How was this patch tested?
   
   
   Updated test.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] SparkQA commented on pull request #29530: [SPARK-32646][SQL][test-hadoop2.7][test-hive1.2] ORC predicate pushdown should work with case-insensitive analysis

2020-08-24 Thread GitBox



SparkQA commented on pull request #29530:
URL: https://github.com/apache/spark/pull/29530#issuecomment-679510051


   **[Test build #127867 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127867/testReport)**
 for PR 29530 at commit 
[`50f0037`](https://github.com/apache/spark/commit/50f00379ef043a81448b4fd6e424cf7f715ebf96).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] viirya commented on a change in pull request #29530: [SPARK-32646][SQL][test-hadoop2.7][test-hive1.2] ORC predicate pushdown should work with case-insensitive analysis

2020-08-24 Thread GitBox



viirya commented on a change in pull request #29530:
URL: https://github.com/apache/spark/pull/29530#discussion_r476109029



##
File path: 
sql/core/v1.2/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFilters.scala
##
@@ -215,11 +215,11 @@ private[sql] object OrcFilters extends OrcFiltersBase {
* @return the builder so far.
*/
   private def buildLeafSearchArgument(
-  dataTypeMap: Map[String, DataType],
+  dataTypeMap: Map[String, OrcPrimitiveField],
   expression: Filter,
   builder: Builder): Option[Builder] = {
 def getType(attribute: String): PredicateLeaf.Type =
-  getPredicateLeafType(dataTypeMap(attribute))
+  getPredicateLeafType(dataTypeMap(attribute).fieldType)

Review comment:
   yes.

##
File path: 
sql/core/v2.3/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFilters.scala
##
@@ -139,7 +139,7 @@ private[sql] object OrcFilters extends OrcFiltersBase {
   /**
* Get PredicateLeafType which is corresponding to the given DataType.
*/
-  private def getPredicateLeafType(dataType: DataType) = dataType match {
+  private[sql] def getPredicateLeafType(dataType: DataType) = dataType match {

Review comment:
   ok





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] HeartSaVioR commented on pull request #29509: [SPARK-31608][CORE][WEBUI][TEST] Add test suites for HybridStore and HistoryServerMemoryManager

2020-08-24 Thread GitBox



HeartSaVioR commented on pull request #29509:
URL: https://github.com/apache/spark/pull/29509#issuecomment-679490978


   Thanks @baohe-zhang for the PR! Merged into master.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] HeartSaVioR closed pull request #29509: [SPARK-31608][CORE][WEBUI][TEST] Add test suites for HybridStore and HistoryServerMemoryManager

2020-08-24 Thread GitBox



HeartSaVioR closed pull request #29509:
URL: https://github.com/apache/spark/pull/29509


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on pull request #29509: [SPARK-31608][CORE][WEBUI][TEST] Add test suites for HybridStore and HistoryServerMemoryManager

2020-08-24 Thread GitBox



AmplabJenkins removed a comment on pull request #29509:
URL: https://github.com/apache/spark/pull/29509#issuecomment-679484677







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on pull request #29509: [SPARK-31608][CORE][WEBUI][TEST] Add test suites for HybridStore and HistoryServerMemoryManager

2020-08-24 Thread GitBox



AmplabJenkins commented on pull request #29509:
URL: https://github.com/apache/spark/pull/29509#issuecomment-679484677







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] SparkQA removed a comment on pull request #29509: [SPARK-31608][CORE][WEBUI][TEST] Add test suites for HybridStore and HistoryServerMemoryManager

2020-08-24 Thread GitBox



SparkQA removed a comment on pull request #29509:
URL: https://github.com/apache/spark/pull/29509#issuecomment-679429186


   **[Test build #127861 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127861/testReport)**
 for PR 29509 at commit 
[`f83d2df`](https://github.com/apache/spark/commit/f83d2dfced494471f1fb5bdb1268260bb3a5e275).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] SparkQA commented on pull request #29509: [SPARK-31608][CORE][WEBUI][TEST] Add test suites for HybridStore and HistoryServerMemoryManager

2020-08-24 Thread GitBox



SparkQA commented on pull request #29509:
URL: https://github.com/apache/spark/pull/29509#issuecomment-679481328


   **[Test build #127861 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127861/testReport)**
 for PR 29509 at commit 
[`f83d2df`](https://github.com/apache/spark/commit/f83d2dfced494471f1fb5bdb1268260bb3a5e275).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on pull request #29522: [SPARK-32641][SQL] withField + getField should return null if original struct was null

2020-08-24 Thread GitBox



AmplabJenkins removed a comment on pull request #29522:
URL: https://github.com/apache/spark/pull/29522#issuecomment-679472632







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on pull request #29522: [SPARK-32641][SQL] withField + getField should return null if original struct was null

2020-08-24 Thread GitBox



AmplabJenkins commented on pull request #29522:
URL: https://github.com/apache/spark/pull/29522#issuecomment-679472632







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] beliefer commented on a change in pull request #28685: [SPARK-27951][SQL] Support ANSI SQL NTH_VALUE window function

2020-08-24 Thread GitBox



beliefer commented on a change in pull request #28685:
URL: https://github.com/apache/spark/pull/28685#discussion_r476096152



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala
##
@@ -474,6 +480,54 @@ case class Lag(input: Expression, offset: Expression, 
default: Expression)
   override val direction = Descending
 }
 
+/**
+ * The NthValue function returns the value of `input` at the `offset`th row 
from beginning of the
+ * window frame. Offsets start at 1. When the value of `input` is null at the 
`offset`th row or
+ * there is no such an `offset`th row, null is returned.
+ *
+ * @param input expression to evaluate `offset`th row of the window frame.
+ * @param offset rows to jump ahead in the partition.
+ */
+@ExpressionDescription(
+  usage = """
+_FUNC_(input[, offset]) - Returns the value of `input` at the row that is 
the `offset`th row
+  from beginning of the window frame. Offsets start at 1. If the value of 
`input` at the
+  `offset`th row is null, null is returned. If there is no such an offset 
row (e.g., when the
+  offset is 10, size of the window frame less than 10), null is returned.
+  """,
+  since = "3.1.0")
+case class NthValue(input: Expression, offset: Expression)
+extends OffsetWindowFunction {
+
+  override val default = Literal(null)
+
+  override val startWithCurrentRow = false
+
+  override val direction = Ascending
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+val check = super.checkInputDataTypes()
+if (check.isFailure) {
+  check
+} else if (offset.foldable) {
+  offset.dataType match {
+case IntegerType =>
+  offset.eval().asInstanceOf[Int] match {
+case i: Int if i <= 0 => TypeCheckFailure(
+  s"The 'offset' argument of nth_value must be greater than zero 
but it is $i.")
+case _ => TypeCheckSuccess
+  }
+case _ => TypeCheckFailure(
+  s"The 'offset' parameter must be a int literal but it is 
${offset.dataType}.")

Review comment:
   postgresql: integer
   oracle:integer
   redshift:integer
   vertica:integer





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] SparkQA removed a comment on pull request #29522: [SPARK-32641][SQL] withField + getField should return null if original struct was null

2020-08-24 Thread GitBox



SparkQA removed a comment on pull request #29522:
URL: https://github.com/apache/spark/pull/29522#issuecomment-679380548


   **[Test build #127857 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127857/testReport)**
 for PR 29522 at commit 
[`df0be4a`](https://github.com/apache/spark/commit/df0be4a93b3aa3d1c2fd32a8301e186de396eab3).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] SparkQA commented on pull request #29522: [SPARK-32641][SQL] withField + getField should return null if original struct was null

2020-08-24 Thread GitBox



SparkQA commented on pull request #29522:
URL: https://github.com/apache/spark/pull/29522#issuecomment-679471423


   **[Test build #127857 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127857/testReport)**
 for PR 29522 at commit 
[`df0be4a`](https://github.com/apache/spark/commit/df0be4a93b3aa3d1c2fd32a8301e186de396eab3).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

1 2 3 4 5 6 >

1 - 100 of 570 matches

Mail list logo