[GitHub] [spark] ScrapCodes commented on pull request #29334: [SPARK-32495][2.4] Update jackson versions to a maintained release, to fix various security vulnerabilities.

2020-08-23 Thread GitBox


ScrapCodes commented on pull request #29334:
URL: https://github.com/apache/spark/pull/29334#issuecomment-678919877


   Thank you @cowtowncoder, @srowen and @Fokko. Indeed, these security 
vulnerabilities only generate false alarms and do not apply to Spark itself; 
however, if a client application depends on Spark and uses jackson-databind 
directly, it has to deal with the security issues on its own. 
   
   The best thing to do is upgrade to 3.0, but that is difficult for folks who 
have only recently upgraded to Spark 2.4.x, which is also the reason we are 
still maintaining the 2.4.x release line. A lot of great suggestions have come 
in; shading the jar comes with its own set of complexity. I am not absolutely 
sure, but if we cannot upgrade as is, I suggest we reconsider this later.
   
   Thanks again everyone for chiming in and providing valuable suggestions.
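   For downstream builds that want to isolate their own jackson-databind from 
the version Spark ships, a minimal shading sketch with sbt-assembly (hedged; 
the relocation pattern and target package are illustrative, not a 
recommendation from this thread):
   ```scala
   // build.sbt -- assumes the sbt-assembly plugin is enabled.
   // Relocates the application's Jackson classes so they cannot clash with
   // the jackson-databind version on Spark's classpath.
   assembly / assemblyShadeRules := Seq(
     ShadeRule.rename("com.fasterxml.jackson.**" -> "shadedjackson.@1").inAll
   )
   ```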



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #29421: [SPARK-32388][SQL][test-hadoop2.7][test-hive1.2] TRANSFORM with schema-less mode should keep the same with hive

2020-08-23 Thread GitBox


AmplabJenkins removed a comment on pull request #29421:
URL: https://github.com/apache/spark/pull/29421#issuecomment-678918248







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #29421: [SPARK-32388][SQL][test-hadoop2.7][test-hive1.2] TRANSFORM with schema-less mode should keep the same with hive

2020-08-23 Thread GitBox


AmplabJenkins commented on pull request #29421:
URL: https://github.com/apache/spark/pull/29421#issuecomment-678918248







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] maropu commented on a change in pull request #29421: [SPARK-32388][SQL][test-hadoop2.7][test-hive1.2] TRANSFORM with schema-less mode should keep the same with hive

2020-08-23 Thread GitBox


maropu commented on a change in pull request #29421:
URL: https://github.com/apache/spark/pull/29421#discussion_r475356961



##
File path: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveScriptTransformationSuite.scala
##
@@ -182,7 +182,11 @@ class HiveScriptTransformationSuite extends 
BaseScriptTransformationSuite with T
 identity,
 df.select(
   'a.cast("string").as("key"),
-  'b.cast("string").as("value")).collect())
+  concat_ws("\t",
+'b.cast("string"),
+'c.cast("string"),
+'d.cast("string"),
+'e.cast("string")).as("value")).collect())

Review comment:
   Oh, I see. In that case, we should return NULL.
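   For context, a hedged, standalone check (not code from this PR) of how 
`concat_ws` treats NULL inputs, which may be why the NULL case needs explicit 
handling here:
   ```scala
   import org.apache.spark.sql.SparkSession
   import org.apache.spark.sql.functions.concat_ws

   object ConcatWsNullCheck extends App {
     val spark = SparkSession.builder().master("local[1]")
       .appName("concat-ws-null-check").getOrCreate()
     import spark.implicits._
     val df = Seq(("1", null.asInstanceOf[String], "x")).toDF("b", "c", "d")
     // concat_ws silently drops NULL inputs, so the middle column vanishes:
     // this prints "1\tx" rather than NULL or "1\t\tx".
     df.select(concat_ws("\t", $"b", $"c", $"d").as("value")).show(false)
     spark.stop()
   }
   ```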





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #29421: [SPARK-32388][SQL][test-hadoop2.7][test-hive1.2] TRANSFORM with schema-less mode should keep the same with hive

2020-08-23 Thread GitBox


SparkQA commented on pull request #29421:
URL: https://github.com/apache/spark/pull/29421#issuecomment-678917855


   **[Test build #127830 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127830/testReport)**
 for PR 29421 at commit 
[`5f03222`](https://github.com/apache/spark/commit/5f032229ca2c457753622e21e22d92848de24fa6).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] maropu commented on pull request #29421: [SPARK-32388][SQL][test-hadoop2.7][test-hive1.2] TRANSFORM with schema-less mode should keep the same with hive

2020-08-23 Thread GitBox


maropu commented on pull request #29421:
URL: https://github.com/apache/spark/pull/29421#issuecomment-678916433


   retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] maropu commented on a change in pull request #29485: [SPARK-32638][SQL] Corrects references when adding aliases in WidenSetOperationTypes

2020-08-23 Thread GitBox


maropu commented on a change in pull request #29485:
URL: https://github.com/apache/spark/pull/29485#discussion_r475352241



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala
##
@@ -328,27 +328,46 @@ object TypeCoercion {
*/
   object WidenSetOperationTypes extends Rule[LogicalPlan] {
 
-def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperatorsUp {
-  case s @ Except(left, right, isAll) if s.childrenResolved &&
-left.output.length == right.output.length && !s.resolved =>
-val newChildren: Seq[LogicalPlan] = 
buildNewChildrenWithWiderTypes(left :: right :: Nil)
-assert(newChildren.length == 2)
-Except(newChildren.head, newChildren.last, isAll)
-
-  case s @ Intersect(left, right, isAll) if s.childrenResolved &&
-left.output.length == right.output.length && !s.resolved =>
-val newChildren: Seq[LogicalPlan] = 
buildNewChildrenWithWiderTypes(left :: right :: Nil)
-assert(newChildren.length == 2)
-Intersect(newChildren.head, newChildren.last, isAll)
-
-  case s: Union if s.childrenResolved && !s.byName &&
+def apply(plan: LogicalPlan): LogicalPlan = {
+  val exprIdMapArray = mutable.ArrayBuffer[(ExprId, Attribute)]()
+  val newPlan = plan resolveOperatorsUp {
+case s @ Except(left, right, isAll) if s.childrenResolved &&
+  left.output.length == right.output.length && !s.resolved =>
+  val (newChildren, newExprIds) = buildNewChildrenWithWiderTypes(left 
:: right :: Nil)
+  exprIdMapArray ++= newExprIds
+  assert(newChildren.length == 2)
+  Except(newChildren.head, newChildren.last, isAll)
+
+case s @ Intersect(left, right, isAll) if s.childrenResolved &&
+  left.output.length == right.output.length && !s.resolved =>
+  val (newChildren, newExprIds) = buildNewChildrenWithWiderTypes(left 
:: right :: Nil)
+  exprIdMapArray ++= newExprIds
+  assert(newChildren.length == 2)
+  Intersect(newChildren.head, newChildren.last, isAll)
+
+case s: Union if s.childrenResolved && !s.byName &&
   s.children.forall(_.output.length == s.children.head.output.length) 
&& !s.resolved =>
-val newChildren: Seq[LogicalPlan] = 
buildNewChildrenWithWiderTypes(s.children)
-s.copy(children = newChildren)
+  val (newChildren, newExprIds) = 
buildNewChildrenWithWiderTypes(s.children)
+  exprIdMapArray ++= newExprIds
+  s.copy(children = newChildren)
+  }
+
+  // Re-maps existing references to the new ones (exprId and dataType)
+  // for aliases added when widening columns' data types.

Review comment:
   Yea, I tried that first, but `RemoveNoopOperators` will remove a `Project` 
with a rewritten alias: 
https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala#L480
   That is because it assumes that projects sharing common exprIds have the 
same output. There may be a way to avoid that case, and I'll check 
`TimeWindowing`.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a change in pull request #29485: [SPARK-32638][SQL] Corrects references when adding aliases in WidenSetOperationTypes

2020-08-23 Thread GitBox


cloud-fan commented on a change in pull request #29485:
URL: https://github.com/apache/spark/pull/29485#discussion_r475348817



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala
##
@@ -328,27 +328,46 @@ object TypeCoercion {
*/
   object WidenSetOperationTypes extends Rule[LogicalPlan] {
 
-def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperatorsUp {
-  case s @ Except(left, right, isAll) if s.childrenResolved &&
-left.output.length == right.output.length && !s.resolved =>
-val newChildren: Seq[LogicalPlan] = 
buildNewChildrenWithWiderTypes(left :: right :: Nil)
-assert(newChildren.length == 2)
-Except(newChildren.head, newChildren.last, isAll)
-
-  case s @ Intersect(left, right, isAll) if s.childrenResolved &&
-left.output.length == right.output.length && !s.resolved =>
-val newChildren: Seq[LogicalPlan] = 
buildNewChildrenWithWiderTypes(left :: right :: Nil)
-assert(newChildren.length == 2)
-Intersect(newChildren.head, newChildren.last, isAll)
-
-  case s: Union if s.childrenResolved && !s.byName &&
+def apply(plan: LogicalPlan): LogicalPlan = {
+  val exprIdMapArray = mutable.ArrayBuffer[(ExprId, Attribute)]()
+  val newPlan = plan resolveOperatorsUp {
+case s @ Except(left, right, isAll) if s.childrenResolved &&
+  left.output.length == right.output.length && !s.resolved =>
+  val (newChildren, newExprIds) = buildNewChildrenWithWiderTypes(left 
:: right :: Nil)
+  exprIdMapArray ++= newExprIds
+  assert(newChildren.length == 2)
+  Except(newChildren.head, newChildren.last, isAll)
+
+case s @ Intersect(left, right, isAll) if s.childrenResolved &&
+  left.output.length == right.output.length && !s.resolved =>
+  val (newChildren, newExprIds) = buildNewChildrenWithWiderTypes(left 
:: right :: Nil)
+  exprIdMapArray ++= newExprIds
+  assert(newChildren.length == 2)
+  Intersect(newChildren.head, newChildren.last, isAll)
+
+case s: Union if s.childrenResolved && !s.byName &&
   s.children.forall(_.output.length == s.children.head.output.length) 
&& !s.resolved =>
-val newChildren: Seq[LogicalPlan] = 
buildNewChildrenWithWiderTypes(s.children)
-s.copy(children = newChildren)
+  val (newChildren, newExprIds) = 
buildNewChildrenWithWiderTypes(s.children)
+  exprIdMapArray ++= newExprIds
+  s.copy(children = newChildren)
+  }
+
+  // Re-maps existing references to the new ones (exprId and dataType)
+  // for aliases added when widening columns' data types.

Review comment:
   Yes, like re-aliasing with exprId=1.
   
   I just did a quick search: the rules `TimeWindowing` and `Aggregation` do it. 
AFAIK it's a common approach when you need to change the plan in the middle and 
don't want to affect the parent nodes.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] maropu commented on a change in pull request #29485: [SPARK-32638][SQL] Corrects references when adding aliases in WidenSetOperationTypes

2020-08-23 Thread GitBox


maropu commented on a change in pull request #29485:
URL: https://github.com/apache/spark/pull/29485#discussion_r475347837



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala
##
@@ -328,27 +328,46 @@ object TypeCoercion {
*/
   object WidenSetOperationTypes extends Rule[LogicalPlan] {
 
-def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperatorsUp {
-  case s @ Except(left, right, isAll) if s.childrenResolved &&
-left.output.length == right.output.length && !s.resolved =>
-val newChildren: Seq[LogicalPlan] = 
buildNewChildrenWithWiderTypes(left :: right :: Nil)
-assert(newChildren.length == 2)
-Except(newChildren.head, newChildren.last, isAll)
-
-  case s @ Intersect(left, right, isAll) if s.childrenResolved &&
-left.output.length == right.output.length && !s.resolved =>
-val newChildren: Seq[LogicalPlan] = 
buildNewChildrenWithWiderTypes(left :: right :: Nil)
-assert(newChildren.length == 2)
-Intersect(newChildren.head, newChildren.last, isAll)
-
-  case s: Union if s.childrenResolved && !s.byName &&
+def apply(plan: LogicalPlan): LogicalPlan = {
+  val exprIdMapArray = mutable.ArrayBuffer[(ExprId, Attribute)]()
+  val newPlan = plan resolveOperatorsUp {
+case s @ Except(left, right, isAll) if s.childrenResolved &&
+  left.output.length == right.output.length && !s.resolved =>
+  val (newChildren, newExprIds) = buildNewChildrenWithWiderTypes(left 
:: right :: Nil)
+  exprIdMapArray ++= newExprIds
+  assert(newChildren.length == 2)
+  Except(newChildren.head, newChildren.last, isAll)
+
+case s @ Intersect(left, right, isAll) if s.childrenResolved &&
+  left.output.length == right.output.length && !s.resolved =>
+  val (newChildren, newExprIds) = buildNewChildrenWithWiderTypes(left 
:: right :: Nil)
+  exprIdMapArray ++= newExprIds
+  assert(newChildren.length == 2)
+  Intersect(newChildren.head, newChildren.last, isAll)
+
+case s: Union if s.childrenResolved && !s.byName &&
   s.children.forall(_.output.length == s.children.head.output.length) 
&& !s.resolved =>
-val newChildren: Seq[LogicalPlan] = 
buildNewChildrenWithWiderTypes(s.children)
-s.copy(children = newChildren)
+  val (newChildren, newExprIds) = 
buildNewChildrenWithWiderTypes(s.children)
+  exprIdMapArray ++= newExprIds
+  s.copy(children = newChildren)
+  }
+
+  // Re-maps existing references to the new ones (exprId and dataType)
+  // for aliases added when widening columns' data types.

Review comment:
   You meant re-aliasing with exprId=1 in the example above, like this?
   ```
   org.apache.spark.sql.AnalysisException: Resolved attribute(s) v#1 missing 
from v#3 in operator !Project [v#1]. Attribute(s) with the same name appear in 
the operation: v. Please check if the right attribute(s) are used.;;
   !Project [v#1]  <-- the reference got missing
   +- SubqueryAlias t
  +- Union
 :- Project [cast(v#1 as decimal(11,0)) AS v#3] <-  re-alias 
with exprId=#1 ?!
 :  +- Project [v#1]
 : +- SubqueryAlias t3
 :+- SubqueryAlias tbl
 :   +- LocalRelation [v#1]
 +- Project [v#2]
+- Project [CheckOverflow((promote_precision(cast(v#1 as decimal(11,0))) + promote_precision(cast(v#1 as decimal(11,0)))), DecimalType(11,0), true) AS v#2]
   +- SubqueryAlias t3
  +- SubqueryAlias tbl
 +- LocalRelation [v#1]
   ```





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] viirya commented on a change in pull request #29485: [SPARK-32638][SQL] Corrects references when adding aliases in WidenSetOperationTypes

2020-08-23 Thread GitBox


viirya commented on a change in pull request #29485:
URL: https://github.com/apache/spark/pull/29485#discussion_r475347688



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala
##
@@ -328,27 +328,46 @@ object TypeCoercion {
*/
   object WidenSetOperationTypes extends Rule[LogicalPlan] {
 
-def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperatorsUp {
-  case s @ Except(left, right, isAll) if s.childrenResolved &&
-left.output.length == right.output.length && !s.resolved =>
-val newChildren: Seq[LogicalPlan] = 
buildNewChildrenWithWiderTypes(left :: right :: Nil)
-assert(newChildren.length == 2)
-Except(newChildren.head, newChildren.last, isAll)
-
-  case s @ Intersect(left, right, isAll) if s.childrenResolved &&
-left.output.length == right.output.length && !s.resolved =>
-val newChildren: Seq[LogicalPlan] = 
buildNewChildrenWithWiderTypes(left :: right :: Nil)
-assert(newChildren.length == 2)
-Intersect(newChildren.head, newChildren.last, isAll)
-
-  case s: Union if s.childrenResolved && !s.byName &&
+def apply(plan: LogicalPlan): LogicalPlan = {
+  val exprIdMapArray = mutable.ArrayBuffer[(ExprId, Attribute)]()
+  val newPlan = plan resolveOperatorsUp {
+case s @ Except(left, right, isAll) if s.childrenResolved &&
+  left.output.length == right.output.length && !s.resolved =>
+  val (newChildren, newExprIds) = buildNewChildrenWithWiderTypes(left 
:: right :: Nil)
+  exprIdMapArray ++= newExprIds
+  assert(newChildren.length == 2)
+  Except(newChildren.head, newChildren.last, isAll)
+
+case s @ Intersect(left, right, isAll) if s.childrenResolved &&
+  left.output.length == right.output.length && !s.resolved =>
+  val (newChildren, newExprIds) = buildNewChildrenWithWiderTypes(left 
:: right :: Nil)
+  exprIdMapArray ++= newExprIds
+  assert(newChildren.length == 2)
+  Intersect(newChildren.head, newChildren.last, isAll)
+
+case s: Union if s.childrenResolved && !s.byName &&
   s.children.forall(_.output.length == s.children.head.output.length) 
&& !s.resolved =>
-val newChildren: Seq[LogicalPlan] = 
buildNewChildrenWithWiderTypes(s.children)
-s.copy(children = newChildren)
+  val (newChildren, newExprIds) = 
buildNewChildrenWithWiderTypes(s.children)
+  exprIdMapArray ++= newExprIds
+  s.copy(children = newChildren)
+  }
+
+  // Re-maps existing references to the new ones (exprId and dataType)
+  // for aliases added when widening columns' data types.

Review comment:
   I thought about it too, but I'm not sure if a duplicate exprId is okay. If 
this is a common way, it sounds simple and safe.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] tanelk commented on pull request #29515: [WIP][SPARK-32688][SQL][TESTS] Add special values to LiteralGenerator for float and double

2020-08-23 Thread GitBox


tanelk commented on pull request #29515:
URL: https://github.com/apache/spark/pull/29515#issuecomment-678908920


   There is an `org.apache.spark.sql.RandomDataGenerator` that does pretty much 
the same thing as the `LiteralGenerator`. Perhaps they should be unified?
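   As a rough illustration of what "special values" could mean here (a hedged 
sketch using ScalaCheck, which `LiteralGenerator` is built on; the names and 
weights are illustrative, not the PR's actual code):
   ```scala
   import org.scalacheck.Gen

   // IEEE-754 edge cases that uniform random generation almost never hits.
   val specialDoubles: Gen[Double] = Gen.oneOf(
     Double.NaN, Double.PositiveInfinity, Double.NegativeInfinity,
     -0.0d, 0.0d, Double.MinPositiveValue, Double.MinValue, Double.MaxValue)

   // Mix the special values into an otherwise uniform generator.
   val doubleGen: Gen[Double] =
     Gen.frequency(9 -> Gen.choose(-1e9, 1e9), 1 -> specialDoubles)
   ```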



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cchighman commented on pull request #28841: [SPARK-31962][SQL] Provide modifiedAfter and modifiedBefore options when filtering from a batch-based file data source

2020-08-23 Thread GitBox


cchighman commented on pull request #28841:
URL: https://github.com/apache/spark/pull/28841#issuecomment-678908616


   I intend to update the PR based on the comments; I'll try to swing around to 
it this evening.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cchighman commented on a change in pull request #28841: [SPARK-31962][SQL] Provide modifiedAfter and modifiedBefore options when filtering from a batch-based file data source

2020-08-23 Thread GitBox


cchighman commented on a change in pull request #28841:
URL: https://github.com/apache/spark/pull/28841#discussion_r475347166



##
File path: docs/sql-data-sources-generic-options.md
##
@@ -119,3 +119,48 @@ To load all files recursively, you can use:
 {% include_example recursive_file_lookup r/RSparkSQLExample.R %}
 
 
+
+### Modification Time Path Filters
+`modifiedBefore` and `modifiedAfter` are options that can be
+applied together or separately to achieve finer-grained control
+over which files are loaded during a Spark batch query.
+
+When the `timeZone` option is present, modified timestamps are
+interpreted according to the specified zone. When no timezone option
+is provided, modified timestamps are interpreted according to the
+default zone specified in the Spark configuration. Without any
+timezone configuration, modified timestamps are interpreted as UTC.
+
+`modifiedBefore` only allows files with last-modified timestamps
+occurring before the specified time to load. For example, when
+`modifiedBefore` has the timestamp `2020-06-01T12:00:00` applied,
+all files modified after that time are not considered when loading
+from a file data source.
+
+`modifiedAfter` only allows files with last-modified timestamps
+occurring after the specified timestamp. For example, when `modifiedAfter`
+has the timestamp `2020-06-01T12:00:00` applied, only files modified after
+this time are eligible when loading from a file data source. When both
+`modifiedBefore` and `modifiedAfter` are specified together, only files with
+last-modified timestamps within the resulting time range are allowed to load.
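   A hedged usage sketch for the options documented above (only the option 
names come from this change; the path, format, and timestamps are 
illustrative):
   ```scala
   val df = spark.read
     .format("csv")
     .option("modifiedAfter", "2020-06-01T12:00:00")   // files modified after this instant
     .option("modifiedBefore", "2020-07-01T12:00:00")  // ...and before this one
     .option("timeZone", "UTC")                        // zone used to interpret both timestamps
     .load("/path/to/dir")
   ```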

Review comment:
   Will update

##
File path: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/PathFilterSuite.scala
##
@@ -0,0 +1,501 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources
+
+import java.io.File
+import java.time.{LocalDateTime, ZoneOffset}
+import java.time.format.DateTimeFormatter
+
+import org.apache.hadoop.fs.Path
+
+import org.apache.spark.sql.{AnalysisException, QueryTest, Row}
+import org.apache.spark.sql.catalyst.util.{stringToFile, CaseInsensitiveMap, 
DateTimeUtils}
+import org.apache.spark.sql.test.SharedSparkSession
+
+class PathFilterSuite extends QueryTest with SharedSparkSession {
+  import testImplicits._
+
+  test("SPARK-31962: when modifiedAfter specified with a past date") {
+withTempDir { dir =>
+  val path = new Path(dir.getCanonicalPath)
+  val file = new File(dir, "file1.csv")
+  stringToFile(file, "text")
+  file.setLastModified(DateTimeUtils.currentTimestamp())
+  val df = spark.read
+.option("modifiedAfter", "2019-05-10T01:11:00")
+.format("csv")
+.load(path.toString)
+  assert(df.count() == 1)
+}
+  }
+
+  test("SPARK-31962: when modifiedBefore specified with a future date") {
+withTempDir { dir =>
+  val path = new Path(dir.getCanonicalPath)
+  val file = new File(dir, "file1.csv")
+  stringToFile(file, "text")
+  val df = spark.read
+.option("modifiedBefore", "2090-05-10T01:11:00")
+.format("csv")
+.load(path.toString)
+  assert(df.count() == 1)
+}
+  }
+
+  test("SPARK-31962: when modifiedBefore specified with a past date") {
+withTempDir { dir =>
+  val path = new Path(dir.getCanonicalPath)
+  val file = new File(dir, "file1.csv")
+  stringToFile(file, "text")
+  file.setLastModified(DateTimeUtils.currentTimestamp())
+  val msg = intercept[AnalysisException] {
+spark.read
+  .option("modifiedBefore", "1984-05-01T01:00:00")
+  .format("csv")
+  .load(path.toString)
+  }.getMessage
+  assert(msg.contains("Unable to infer schema for CSV"))
+}
+  }
+
+  test("SPARK-31962: when modifiedAfter specified with a past date, multiple 
files, one valid") {
+withTempDir { dir =>
+  val path = new Path(dir.getCanonicalPath)
+  val file1 = new File(dir, "file1.csv")
+  val file2 = new File(dir, "file2.csv")
+  stringToFile(file1, "text")
+  stringToFile(file2, "text")
+  

[GitHub] [spark] AmplabJenkins commented on pull request #28841: [SPARK-31962][SQL] Provide modifiedAfter and modifiedBefore options when filtering from a batch-based file data source

2020-08-23 Thread GitBox


AmplabJenkins commented on pull request #28841:
URL: https://github.com/apache/spark/pull/28841#issuecomment-678908136


   Can one of the admins verify this patch?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] maropu commented on pull request #29526: [SPARK-32352][SQL][FOLLOW-UP][test-hadoop2.7][test-hive1.2] Exclude partition columns from data columns

2020-08-23 Thread GitBox


maropu commented on pull request #29526:
URL: https://github.com/apache/spark/pull/29526#issuecomment-678907495


   Ah, I see.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a change in pull request #29485: [SPARK-32638][SQL] Corrects references when adding aliases in WidenSetOperationTypes

2020-08-23 Thread GitBox


cloud-fan commented on a change in pull request #29485:
URL: https://github.com/apache/spark/pull/29485#discussion_r475346278



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala
##
@@ -328,27 +328,46 @@ object TypeCoercion {
*/
   object WidenSetOperationTypes extends Rule[LogicalPlan] {
 
-def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperatorsUp {
-  case s @ Except(left, right, isAll) if s.childrenResolved &&
-left.output.length == right.output.length && !s.resolved =>
-val newChildren: Seq[LogicalPlan] = 
buildNewChildrenWithWiderTypes(left :: right :: Nil)
-assert(newChildren.length == 2)
-Except(newChildren.head, newChildren.last, isAll)
-
-  case s @ Intersect(left, right, isAll) if s.childrenResolved &&
-left.output.length == right.output.length && !s.resolved =>
-val newChildren: Seq[LogicalPlan] = 
buildNewChildrenWithWiderTypes(left :: right :: Nil)
-assert(newChildren.length == 2)
-Intersect(newChildren.head, newChildren.last, isAll)
-
-  case s: Union if s.childrenResolved && !s.byName &&
+def apply(plan: LogicalPlan): LogicalPlan = {
+  val exprIdMapArray = mutable.ArrayBuffer[(ExprId, Attribute)]()
+  val newPlan = plan resolveOperatorsUp {
+case s @ Except(left, right, isAll) if s.childrenResolved &&
+  left.output.length == right.output.length && !s.resolved =>
+  val (newChildren, newExprIds) = buildNewChildrenWithWiderTypes(left 
:: right :: Nil)
+  exprIdMapArray ++= newExprIds
+  assert(newChildren.length == 2)
+  Except(newChildren.head, newChildren.last, isAll)
+
+case s @ Intersect(left, right, isAll) if s.childrenResolved &&
+  left.output.length == right.output.length && !s.resolved =>
+  val (newChildren, newExprIds) = buildNewChildrenWithWiderTypes(left 
:: right :: Nil)
+  exprIdMapArray ++= newExprIds
+  assert(newChildren.length == 2)
+  Intersect(newChildren.head, newChildren.last, isAll)
+
+case s: Union if s.childrenResolved && !s.byName &&
   s.children.forall(_.output.length == s.children.head.output.length) 
&& !s.resolved =>
-val newChildren: Seq[LogicalPlan] = 
buildNewChildrenWithWiderTypes(s.children)
-s.copy(children = newChildren)
+  val (newChildren, newExprIds) = 
buildNewChildrenWithWiderTypes(s.children)
+  exprIdMapArray ++= newExprIds
+  s.copy(children = newChildren)
+  }
+
+  // Re-maps existing references to the new ones (exprId and dataType)
+  // for aliases added when widening columns' data types.

Review comment:
   Another common way to solve this issue is to create an `Alias` with the 
existing exprId, so that we don't need to rewrite the parent nodes.
   
   I think it's safer than rewriting the parent nodes. We rewrite parent nodes 
in `ResolveReferences.dedupRight`; we hit bugs there and ended up with a 
complicated solution.
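   A minimal sketch of that alternative (hedged; `attr` and `widerType` stand 
for the original output attribute and the widened type, and this is not the 
code in this PR):
   ```scala
   import org.apache.spark.sql.catalyst.expressions.{Alias, Cast}

   // Alias the widening Cast under the child's *existing* exprId, so
   // references in parent operators keep resolving without any rewrite.
   val widened = Alias(Cast(attr, widerType), attr.name)(exprId = attr.exprId)
   ```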





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #29526: [SPARK-32352][SQL][FOLLOW-UP][test-hadoop2.7][test-hive1.2] Exclude partition columns from data columns

2020-08-23 Thread GitBox


AmplabJenkins removed a comment on pull request #29526:
URL: https://github.com/apache/spark/pull/29526#issuecomment-678907031







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] viirya commented on pull request #29526: [SPARK-32352][SQL][FOLLOW-UP][test-hadoop2.7][test-hive1.2] Exclude partition columns from data columns

2020-08-23 Thread GitBox


viirya commented on pull request #29526:
URL: https://github.com/apache/spark/pull/29526#issuecomment-678906923


   @maropu I think #29406 was only merged to master, so we don't need to 
backport this.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #29526: [SPARK-32352][SQL][FOLLOW-UP][test-hadoop2.7][test-hive1.2] Exclude partition columns from data columns

2020-08-23 Thread GitBox


AmplabJenkins commented on pull request #29526:
URL: https://github.com/apache/spark/pull/29526#issuecomment-678907031







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #29526: [SPARK-32352][SQL][FOLLOW-UP][test-hadoop2.7][test-hive1.2] Exclude partition columns from data columns

2020-08-23 Thread GitBox


SparkQA removed a comment on pull request #29526:
URL: https://github.com/apache/spark/pull/29526#issuecomment-678844753


   **[Test build #127819 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127819/testReport)**
 for PR 29526 at commit 
[`d16f654`](https://github.com/apache/spark/commit/d16f65482820746299868a3572a42129d7e3).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #29526: [SPARK-32352][SQL][FOLLOW-UP][test-hadoop2.7][test-hive1.2] Exclude partition columns from data columns

2020-08-23 Thread GitBox


SparkQA commented on pull request #29526:
URL: https://github.com/apache/spark/pull/29526#issuecomment-678906358


   **[Test build #127819 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127819/testReport)**
 for PR 29526 at commit 
[`d16f654`](https://github.com/apache/spark/commit/d16f65482820746299868a3572a42129d7e3).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] maropu commented on pull request #29526: [SPARK-32352][SQL][FOLLOW-UP][test-hadoop2.7][test-hive1.2] Exclude partition columns from data columns

2020-08-23 Thread GitBox


maropu commented on pull request #29526:
URL: https://github.com/apache/spark/pull/29526#issuecomment-678906274


   Merged to master. @viirya Looks like conflicts with branch-3.0. Could you 
backport it?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] maropu edited a comment on pull request #29526: [SPARK-32352][SQL][FOLLOW-UP][test-hadoop2.7][test-hive1.2] Exclude partition columns from data columns

2020-08-23 Thread GitBox


maropu edited a comment on pull request #29526:
URL: https://github.com/apache/spark/pull/29526#issuecomment-678906274


   Merged to master. @viirya Looks like conflicts with branch-3.0. Could you 
backport it?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] viirya commented on pull request #29526: [SPARK-32352][SQL][FOLLOW-UP][test-hadoop2.7][test-hive1.2] Exclude partition columns from data columns

2020-08-23 Thread GitBox


viirya commented on pull request #29526:
URL: https://github.com/apache/spark/pull/29526#issuecomment-678906138


   Thanks all!



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] maropu closed pull request #29526: [SPARK-32352][SQL][FOLLOW-UP][test-hadoop2.7][test-hive1.2] Exclude partition columns from data columns

2020-08-23 Thread GitBox


maropu closed pull request #29526:
URL: https://github.com/apache/spark/pull/29526


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #29526: [SPARK-32352][SQL][FOLLOW-UP][test-hadoop2.7][test-hive1.2] Exclude partition columns from data columns

2020-08-23 Thread GitBox


AmplabJenkins removed a comment on pull request #29526:
URL: https://github.com/apache/spark/pull/29526#issuecomment-678905843







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #29526: [SPARK-32352][SQL][FOLLOW-UP][test-hadoop2.7][test-hive1.2] Exclude partition columns from data columns

2020-08-23 Thread GitBox


AmplabJenkins commented on pull request #29526:
URL: https://github.com/apache/spark/pull/29526#issuecomment-678905843







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #29526: [SPARK-32352][SQL][FOLLOW-UP][test-hadoop2.7][test-hive1.2] Exclude partition columns from data columns

2020-08-23 Thread GitBox


SparkQA removed a comment on pull request #29526:
URL: https://github.com/apache/spark/pull/29526#issuecomment-678845961


   **[Test build #127820 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127820/testReport)**
 for PR 29526 at commit 
[`b37f694`](https://github.com/apache/spark/commit/b37f6949f1f7c4c6d2264559402a963eb077990d).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #29526: [SPARK-32352][SQL][FOLLOW-UP][test-hadoop2.7][test-hive1.2] Exclude partition columns from data columns

2020-08-23 Thread GitBox


SparkQA commented on pull request #29526:
URL: https://github.com/apache/spark/pull/29526#issuecomment-678905088


   **[Test build #127820 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127820/testReport)**
 for PR 29526 at commit 
[`b37f694`](https://github.com/apache/spark/commit/b37f6949f1f7c4c6d2264559402a963eb077990d).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #29421: [SPARK-32388][SQL][test-hadoop2.7][test-hive1.2] TRANSFORM with schema-less mode should keep the same with hive

2020-08-23 Thread GitBox


AmplabJenkins removed a comment on pull request #29421:
URL: https://github.com/apache/spark/pull/29421#issuecomment-678903482


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/127823/
   Test FAILed.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #29414: [SPARK-32106][SQL] Implement script transform in sql/core

2020-08-23 Thread GitBox


SparkQA commented on pull request #29414:
URL: https://github.com/apache/spark/pull/29414#issuecomment-678903820


   **[Test build #127829 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127829/testReport)**
 for PR 29414 at commit 
[`dabae9b`](https://github.com/apache/spark/commit/dabae9b38038c06f8b3f1e9a7b6b5be04150b667).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #29421: [SPARK-32388][SQL][test-hadoop2.7][test-hive1.2] TRANSFORM with schema-less mode should keep the same with hive

2020-08-23 Thread GitBox


AmplabJenkins removed a comment on pull request #29421:
URL: https://github.com/apache/spark/pull/29421#issuecomment-678903475


   Merged build finished. Test FAILed.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #29421: [SPARK-32388][SQL][test-hadoop2.7][test-hive1.2] TRANSFORM with schema-less mode should keep the same with hive

2020-08-23 Thread GitBox


SparkQA removed a comment on pull request #29421:
URL: https://github.com/apache/spark/pull/29421#issuecomment-678859868


   **[Test build #127823 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127823/testReport)**
 for PR 29421 at commit 
[`5f03222`](https://github.com/apache/spark/commit/5f032229ca2c457753622e21e22d92848de24fa6).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #29421: [SPARK-32388][SQL][test-hadoop2.7][test-hive1.2] TRANSFORM with schema-less mode should keep the same with hive

2020-08-23 Thread GitBox


AmplabJenkins commented on pull request #29421:
URL: https://github.com/apache/spark/pull/29421#issuecomment-678903475







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] srowen commented on pull request #29501: [SPARK-32676][3.0][ML] Fix double caching in KMeans/BiKMeans

2020-08-23 Thread GitBox


srowen commented on pull request #29501:
URL: https://github.com/apache/spark/pull/29501#issuecomment-678903211


   Oh yeah, to backport, you would need to check out branch-3.0, cherry-pick 
the commit, and then push straight to branch-3.0. It's not hard, it just 
doesn't use the script (I don't know why it doesn't work anymore). It just 
takes a little care to make sure you push what you mean and where!



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #29421: [SPARK-32388][SQL][test-hadoop2.7][test-hive1.2] TRANSFORM with schema-less mode should keep the same with hive

2020-08-23 Thread GitBox


SparkQA commented on pull request #29421:
URL: https://github.com/apache/spark/pull/29421#issuecomment-678903206


   **[Test build #127823 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127823/testReport)**
 for PR 29421 at commit 
[`5f03222`](https://github.com/apache/spark/commit/5f032229ca2c457753622e21e22d92848de24fa6).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AngersZhuuuu commented on a change in pull request #29414: [SPARK-32106][SQL] Implement script transform in sql/core

2020-08-23 Thread GitBox


AngersZhuuuu commented on a change in pull request #29414:
URL: https://github.com/apache/spark/pull/29414#discussion_r475341428



##
File path: sql/core/src/test/resources/sql-tests/results/transform.sql.out
##
@@ -0,0 +1,224 @@
+-- Automatically generated by SQLQueryTestSuite
+-- Number of queries: 15
+
+
+-- !query
+CREATE OR REPLACE TEMPORARY VIEW t AS SELECT * FROM VALUES
+('1', true, unhex('537061726B2053514C'), tinyint(1), 1, smallint(100), 
bigint(1), float(1.0), 1.0, Decimal(1.0), timestamp('1997-01-02'), 
date('2000-04-01')),
+('2', false, unhex('537061726B2053514C'), tinyint(2), 2,  smallint(200), 
bigint(2), float(2.0), 2.0, Decimal(2.0), timestamp('1997-01-02 03:04:05'), 
date('2000-04-02')),
+('3', true, unhex('537061726B2053514C'), tinyint(3), 3, smallint(300), 
bigint(3), float(3.0), 3.0, Decimal(3.0), timestamp('1997-02-10 17:32:01-08'), 
date('2000-04-03'))
+AS t(a, b, c, d, e, f, g, h, i, j, k, l)
+-- !query schema
+struct<>
+-- !query output
+
+
+
+-- !query
+SELECT TRANSFORM(a)
+USING 'cat' AS (a)
+FROM t
+-- !query schema
+struct
+-- !query output
+1
+2
+3
+
+
+-- !query
+SELECT TRANSFORM(a)
+USING 'some_non_existent_command' AS (a)
+FROM t
+-- !query schema
+struct<>
+-- !query output
+org.apache.spark.SparkException
+Subprocess exited with status 127. Error: /bin/bash: 
some_non_existent_command: command not found
+
+
+-- !query
+SELECT TRANSFORM(a)
+USING 'python some_non_existent_file' AS (a)
+FROM t
+-- !query schema
+struct<>
+-- !query output
+org.apache.spark.SparkException
+Subprocess exited with status 2. Error: python: can't open file 
'some_non_existent_file': [Errno 2] No such file or directory
+
+
+-- !query
+SELECT a, b, decode(c, 'UTF-8'), d, e, f, g, h, i, j, k, l FROM (
+  SELECT TRANSFORM(a, b, c, d, e, f, g, h, i, j, k, l)
+  USING 'cat' AS (
+a string,
+b boolean,
+c binary,
+d tinyint,
+e int,
+f smallint,
+g long,
+h float,
+i double,
+j decimal(38, 18),
+k timestamp,
+l date)
+  FROM t
+) tmp
+-- !query schema
+struct
+-- !query output
+1  trueSpark SQL   1   1   100 1   1.0 1.0 
1.001997-01-02 00:00:00 2000-04-01
+2  false   Spark SQL   2   2   200 2   2.0 2.0 
2.001997-01-02 03:04:05 2000-04-02
+3  trueSpark SQL   3   3   300 3   3.0 3.0 
3.001997-02-10 17:32:01 2000-04-03
+
+
+-- !query
+SELECT a, b, decode(c, 'UTF-8'), d, e, f, g, h, i, j, k, l FROM (
+  SELECT TRANSFORM(a, b, c, d, e, f, g, h, i, j, k, l)
+  USING 'cat' AS (
+a string,
+b string,
+c string,
+d string,
+e string,
+f string,
+g string,
+h string,
+i string,
+j string,
+k string,
+l string)
+  FROM t
+) tmp
+-- !query schema
+struct
+-- !query output
+1  trueSpark SQL   1   1   100 1   1.0 1.0 
1   1997-01-02 00:00:00 2000-04-01
+2  false   Spark SQL   2   2   200 2   2.0 2.0 
2   1997-01-02 03:04:05 2000-04-02
+3  trueSpark SQL   3   3   300 3   3.0 3.0 
3   1997-02-10 17:32:01 2000-04-03
+
+
+-- !query
+SELECT TRANSFORM(a)
+USING 'cat'
+FROM t
+-- !query schema
+struct<>
+-- !query output
+java.lang.ArrayIndexOutOfBoundsException
+1
+
+
+-- !query
+SELECT TRANSFORM(a, b)
+USING 'cat'
+FROM t
+-- !query schema
+struct
+-- !query output
+1  true
+2  false
+3  true
+
+
+-- !query
+SELECT TRANSFORM(a, b, c)
+USING 'cat'
+FROM t
+-- !query schema
+struct
+-- !query output
+1  true
+2  false
+3  true
+
+
+-- !query
+SELECT TRANSFORM(a, b, c, d, e, f, g, h, i)
+USING 'cat' AS (a int, b short, c long, d byte, e float, f double, g 
decimal(38, 18), h date, i timestamp)
+FROM VALUES
+('a','','1231a','a','213.21a','213.21a','0a.21d','2000-04-01123','1997-0102 
00:00:') tmp(a, b, c, d, e, f, g, h, i)
+-- !query schema
+struct
+-- !query output
+NULL   NULLNULLNULLNULLNULLNULLNULLNULL
+
+
+-- !query
+SELECT TRANSFORM(b, max(a), sum(f))
+USING 'cat' AS (a, b)
+FROM t
+GROUP BY b
+-- !query schema
+struct<>
+-- !query output
+org.apache.spark.sql.catalyst.parser.ParseException
+
+mismatched input 'GROUP' expecting {<EOF>, ';'}(line 4, pos 0)
+
+== SQL ==
+SELECT TRANSFORM(b, max(a), sum(f))
+USING 'cat' AS (a, b)
+FROM t
+GROUP BY b
+^^^
+
+
+-- !query
+MAP a, b USING 'cat' AS (a, b) FROM t
+-- !query schema
+struct
+-- !query output
+1  true
+2  false
+3  true
+
+
+-- !query
+REDUCE a, b USING 'cat' AS (a, b) FROM t
+-- !query schema
+struct
+-- !query output
+1  true
+2  false
+3  true
+
+
+-- !query
+SELECT TRANSFORM(a, b, c, null)
+  ROW FORMAT DELIMITED
+  FIELDS TERMINATED BY '|'
+  LINES TERMINATED BY '\n'
+  NULL DEFINED AS 'NULL'
+USING 'cat' AS (a, b, c, d)

Review comment:
   > Also, could you add test cases 

[GitHub] [spark] AmplabJenkins removed a comment on pull request #29414: [SPARK-32106][SQL] Implement script transform in sql/core

2020-08-23 Thread GitBox


AmplabJenkins removed a comment on pull request #29414:
URL: https://github.com/apache/spark/pull/29414#issuecomment-678902339







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #29414: [SPARK-32106][SQL] Implement script transform in sql/core

2020-08-23 Thread GitBox


AmplabJenkins commented on pull request #29414:
URL: https://github.com/apache/spark/pull/29414#issuecomment-678902339







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AngersZhuuuu commented on a change in pull request #29414: [SPARK-32106][SQL] Implement script transform in sql/core

2020-08-23 Thread GitBox


AngersZh commented on a change in pull request #29414:
URL: https://github.com/apache/spark/pull/29414#discussion_r475341428



##
File path: sql/core/src/test/resources/sql-tests/results/transform.sql.out
##
@@ -0,0 +1,224 @@
+-- Automatically generated by SQLQueryTestSuite
+-- Number of queries: 15
+
+
+-- !query
+CREATE OR REPLACE TEMPORARY VIEW t AS SELECT * FROM VALUES
+('1', true, unhex('537061726B2053514C'), tinyint(1), 1, smallint(100), 
bigint(1), float(1.0), 1.0, Decimal(1.0), timestamp('1997-01-02'), 
date('2000-04-01')),
+('2', false, unhex('537061726B2053514C'), tinyint(2), 2,  smallint(200), 
bigint(2), float(2.0), 2.0, Decimal(2.0), timestamp('1997-01-02 03:04:05'), 
date('2000-04-02')),
+('3', true, unhex('537061726B2053514C'), tinyint(3), 3, smallint(300), 
bigint(3), float(3.0), 3.0, Decimal(3.0), timestamp('1997-02-10 17:32:01-08'), 
date('2000-04-03'))
+AS t(a, b, c, d, e, f, g, h, i, j, k, l)
+-- !query schema
+struct<>
+-- !query output
+
+
+
+-- !query
+SELECT TRANSFORM(a)
+USING 'cat' AS (a)
+FROM t
+-- !query schema
+struct
+-- !query output
+1
+2
+3
+
+
+-- !query
+SELECT TRANSFORM(a)
+USING 'some_non_existent_command' AS (a)
+FROM t
+-- !query schema
+struct<>
+-- !query output
+org.apache.spark.SparkException
+Subprocess exited with status 127. Error: /bin/bash: 
some_non_existent_command: command not found
+
+
+-- !query
+SELECT TRANSFORM(a)
+USING 'python some_non_existent_file' AS (a)
+FROM t
+-- !query schema
+struct<>
+-- !query output
+org.apache.spark.SparkException
+Subprocess exited with status 2. Error: python: can't open file 
'some_non_existent_file': [Errno 2] No such file or directory
+
+
+-- !query
+SELECT a, b, decode(c, 'UTF-8'), d, e, f, g, h, i, j, k, l FROM (
+  SELECT TRANSFORM(a, b, c, d, e, f, g, h, i, j, k, l)
+  USING 'cat' AS (
+a string,
+b boolean,
+c binary,
+d tinyint,
+e int,
+f smallint,
+g long,
+h float,
+i double,
+j decimal(38, 18),
+k timestamp,
+l date)
+  FROM t
+) tmp
+-- !query schema
+struct
+-- !query output
+1  trueSpark SQL   1   1   100 1   1.0 1.0 
1.001997-01-02 00:00:00 2000-04-01
+2  false   Spark SQL   2   2   200 2   2.0 2.0 
2.001997-01-02 03:04:05 2000-04-02
+3  trueSpark SQL   3   3   300 3   3.0 3.0 
3.001997-02-10 17:32:01 2000-04-03
+
+
+-- !query
+SELECT a, b, decode(c, 'UTF-8'), d, e, f, g, h, i, j, k, l FROM (
+  SELECT TRANSFORM(a, b, c, d, e, f, g, h, i, j, k, l)
+  USING 'cat' AS (
+a string,
+b string,
+c string,
+d string,
+e string,
+f string,
+g string,
+h string,
+i string,
+j string,
+k string,
+l string)
+  FROM t
+) tmp
+-- !query schema
+struct
+-- !query output
+1  trueSpark SQL   1   1   100 1   1.0 1.0 
1   1997-01-02 00:00:00 2000-04-01
+2  false   Spark SQL   2   2   200 2   2.0 2.0 
2   1997-01-02 03:04:05 2000-04-02
+3  trueSpark SQL   3   3   300 3   3.0 3.0 
3   1997-02-10 17:32:01 2000-04-03
+
+
+-- !query
+SELECT TRANSFORM(a)
+USING 'cat'
+FROM t
+-- !query schema
+struct<>
+-- !query output
+java.lang.ArrayIndexOutOfBoundsException
+1
+
+
+-- !query
+SELECT TRANSFORM(a, b)
+USING 'cat'
+FROM t
+-- !query schema
+struct
+-- !query output
+1  true
+2  false
+3  true
+
+
+-- !query
+SELECT TRANSFORM(a, b, c)
+USING 'cat'
+FROM t
+-- !query schema
+struct
+-- !query output
+1  true
+2  false
+3  true
+
+
+-- !query
+SELECT TRANSFORM(a, b, c, d, e, f, g, h, i)
+USING 'cat' AS (a int, b short, c long, d byte, e float, f double, g decimal(38, 18), h date, i timestamp)
+FROM VALUES
+('a','','1231a','a','213.21a','213.21a','0a.21d','2000-04-01123','1997-0102 00:00:') tmp(a, b, c, d, e, f, g, h, i)
+-- !query schema
+struct
+-- !query output
+NULL   NULLNULLNULLNULLNULLNULLNULLNULL
+
+
+-- !query
+SELECT TRANSFORM(b, max(a), sum(f))
+USING 'cat' AS (a, b)
+FROM t
+GROUP BY b
+-- !query schema
+struct<>
+-- !query output
+org.apache.spark.sql.catalyst.parser.ParseException
+
+mismatched input 'GROUP' expecting {<EOF>, ';'}(line 4, pos 0)
+
+== SQL ==
+SELECT TRANSFORM(b, max(a), sum(f))
+USING 'cat' AS (a, b)
+FROM t
+GROUP BY b
+^^^
+
+
+-- !query
+MAP a, b USING 'cat' AS (a, b) FROM t
+-- !query schema
+struct
+-- !query output
+1  true
+2  false
+3  true
+
+
+-- !query
+REDUCE a, b USING 'cat' AS (a, b) FROM t
+-- !query schema
+struct
+-- !query output
+1  true
+2  false
+3  true
+
+
+-- !query
+SELECT TRANSFORM(a, b, c, null)
+  ROW FORMAT DELIMITED
+  FIELDS TERMINATED BY '|'
+  LINES TERMINATED BY '\n'
+  NULL DEFINED AS 'NULL'
+USING 'cat' AS (a, b, c, d)

Review comment:
   > Also, could you add test cases 

[GitHub] [spark] AmplabJenkins removed a comment on pull request #29516: [WIP][SPARK-32614][SQL] Don't apply comment processing if 'comment' unset for CSV

2020-08-23 Thread GitBox


AmplabJenkins removed a comment on pull request #29516:
URL: https://github.com/apache/spark/pull/29516#issuecomment-678900712


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/127824/
   Test FAILed.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #29516: [WIP][SPARK-32614][SQL] Don't apply comment processing if 'comment' unset for CSV

2020-08-23 Thread GitBox


AmplabJenkins removed a comment on pull request #29516:
URL: https://github.com/apache/spark/pull/29516#issuecomment-678900708


   Merged build finished. Test FAILed.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #29516: [WIP][SPARK-32614][SQL] Don't apply comment processing if 'comment' unset for CSV

2020-08-23 Thread GitBox


SparkQA removed a comment on pull request #29516:
URL: https://github.com/apache/spark/pull/29516#issuecomment-678870379


   **[Test build #127824 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127824/testReport)** for PR 29516 at commit [`f3d14c6`](https://github.com/apache/spark/commit/f3d14c61550877a6d3b2df15954fee30c8546fa5).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #29516: [WIP][SPARK-32614][SQL] Don't apply comment processing if 'comment' unset for CSV

2020-08-23 Thread GitBox


AmplabJenkins commented on pull request #29516:
URL: https://github.com/apache/spark/pull/29516#issuecomment-678900708







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] huaxingao commented on pull request #29501: [SPARK-32676][3.0][ML] Fix double caching in KMeans/BiKMeans

2020-08-23 Thread GitBox


huaxingao commented on pull request #29501:
URL: https://github.com/apache/spark/pull/29501#issuecomment-678899420


   I don't know how to merge this one. I got the following message:
   ```
   Pull request 29501 is not mergeable in its current form.
   Continue? (experts only!) (y/n): 
   ```
   I am not sure if I should continue. Do I need to reopen the PR first?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #29513: [SPARK-32646][SQL][3.0][test-hadoop2.7][test-hive1.2] ORC predicate pushdown should work with case-insensitive analysis

2020-08-23 Thread GitBox


AmplabJenkins removed a comment on pull request #29513:
URL: https://github.com/apache/spark/pull/29513#issuecomment-678897328







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #29513: [SPARK-32646][SQL][3.0][test-hadoop2.7][test-hive1.2] ORC predicate pushdown should work with case-insensitive analysis

2020-08-23 Thread GitBox


AmplabJenkins commented on pull request #29513:
URL: https://github.com/apache/spark/pull/29513#issuecomment-678897328







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #29513: [SPARK-32646][SQL][3.0][test-hadoop2.7][test-hive1.2] ORC predicate pushdown should work with case-insensitive analysis

2020-08-23 Thread GitBox


SparkQA commented on pull request #29513:
URL: https://github.com/apache/spark/pull/29513#issuecomment-678897095


   **[Test build #127828 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127828/testReport)** for PR 29513 at commit [`a19e523`](https://github.com/apache/spark/commit/a19e523a02b7ef39213aabb130d554839a50beeb).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] viirya commented on pull request #29513: [SPARK-32646][SQL][3.0][test-hadoop2.7][test-hive1.2] ORC predicate pushdown should work with case-insensitive analysis

2020-08-23 Thread GitBox


viirya commented on pull request #29513:
URL: https://github.com/apache/spark/pull/29513#issuecomment-678896440


   retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #29527: [SPARK-32664] fixes log level

2020-08-23 Thread GitBox


AmplabJenkins removed a comment on pull request #29527:
URL: https://github.com/apache/spark/pull/29527#issuecomment-678895387


   Can one of the admins verify this patch?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #29527: [SPARK-32664] fixes log level

2020-08-23 Thread GitBox


AmplabJenkins commented on pull request #29527:
URL: https://github.com/apache/spark/pull/29527#issuecomment-678895672


   Can one of the admins verify this patch?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #29527: [SPARK-32664] fixes log level

2020-08-23 Thread GitBox


AmplabJenkins commented on pull request #29527:
URL: https://github.com/apache/spark/pull/29527#issuecomment-678895387


   Can one of the admins verify this patch?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] srowen commented on pull request #29501: [SPARK-32676][3.0][ML] Fix double caching in KMeans/BiKMeans

2020-08-23 Thread GitBox


srowen commented on pull request #29501:
URL: https://github.com/apache/spark/pull/29501#issuecomment-678895333


   Go ahead yes



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dmoore62 opened a new pull request #29527: [SPARK-32664] fixes log level

2020-08-23 Thread GitBox


dmoore62 opened a new pull request #29527:
URL: https://github.com/apache/spark/pull/29527


   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] huaxingao commented on pull request #29501: [SPARK-32676][3.0][ML] Fix double caching in KMeans/BiKMeans

2020-08-23 Thread GitBox


huaxingao commented on pull request #29501:
URL: https://github.com/apache/spark/pull/29501#issuecomment-678893930


   @srowen Shall I merge this into 3.0?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] huaxingao commented on pull request #29524: [SPARK-32092][ML][PySpark][3.0] Removed foldCol related code

2020-08-23 Thread GitBox


huaxingao commented on pull request #29524:
URL: https://github.com/apache/spark/pull/29524#issuecomment-678893290


   Merged to 3.0. Thank you all!



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] huaxingao closed pull request #29524: [SPARK-32092][ML][PySpark][3.0] Removed foldCol related code

2020-08-23 Thread GitBox


huaxingao closed pull request #29524:
URL: https://github.com/apache/spark/pull/29524


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #29228: [SPARK-31847][CORE][TESTS] DAGSchedulerSuite: Rewrite the test framework to support apply specified spark configurations.

2020-08-23 Thread GitBox


AmplabJenkins removed a comment on pull request #29228:
URL: https://github.com/apache/spark/pull/29228#issuecomment-67647







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #29509: [SPARK-31608][CORE][WEBUI][TEST] Add test suites for HybridStore and HistoryServerMemoryManager

2020-08-23 Thread GitBox


AmplabJenkins commented on pull request #29509:
URL: https://github.com/apache/spark/pull/29509#issuecomment-67621







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #29509: [SPARK-31608][CORE][WEBUI][TEST] Add test suites for HybridStore and HistoryServerMemoryManager

2020-08-23 Thread GitBox


AmplabJenkins removed a comment on pull request #29509:
URL: https://github.com/apache/spark/pull/29509#issuecomment-67621







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #29228: [SPARK-31847][CORE][TESTS] DAGSchedulerSuite: Rewrite the test framework to support apply specified spark configurations.

2020-08-23 Thread GitBox


AmplabJenkins commented on pull request #29228:
URL: https://github.com/apache/spark/pull/29228#issuecomment-67647







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #29228: [SPARK-31847][CORE][TESTS] DAGSchedulerSuite: Rewrite the test framework to support apply specified spark configurations.

2020-08-23 Thread GitBox


SparkQA commented on pull request #29228:
URL: https://github.com/apache/spark/pull/29228#issuecomment-67372


   **[Test build #127827 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127827/testReport)** for PR 29228 at commit [`86dc8f8`](https://github.com/apache/spark/commit/86dc8f81702f6694bd17d4578d81133ce0731ac5).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #29509: [SPARK-31608][CORE][WEBUI][TEST] Add test suites for HybridStore and HistoryServerMemoryManager

2020-08-23 Thread GitBox


SparkQA commented on pull request #29509:
URL: https://github.com/apache/spark/pull/29509#issuecomment-67349


   **[Test build #127826 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127826/testReport)** for PR 29509 at commit [`1b105e3`](https://github.com/apache/spark/commit/1b105e3a9ca090fd134b8eebce5ed714d8567a1e).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] huaxingao commented on pull request #29355: [SPARK-32552][SQL][DOCS]Complete the documentation for Table-valued Function

2020-08-23 Thread GitBox


huaxingao commented on pull request #29355:
URL: https://github.com/apache/spark/pull/29355#issuecomment-678887751


   Thanks a lot! @maropu 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] baohe-zhang commented on a change in pull request #29509: [SPARK-31608][CORE][WEBUI][TEST] Add test suites for HybridStore and HistoryServerMemoryManager

2020-08-23 Thread GitBox


baohe-zhang commented on a change in pull request #29509:
URL: https://github.com/apache/spark/pull/29509#discussion_r475325934



##
File path: 
core/src/test/scala/org/apache/spark/deploy/history/HybridStoreSuite.scala
##
@@ -0,0 +1,230 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.history
+
+import java.io.File
+import java.util.NoSuchElementException
+import java.util.concurrent.LinkedBlockingQueue
+
+import org.apache.commons.io.FileUtils
+import org.scalatest.BeforeAndAfter
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.status.KVUtils._
+import org.apache.spark.util.kvstore._
+
+class HybridStoreSuite extends SparkFunSuite with BeforeAndAfter {
+
+  private var db: LevelDB = _
+  private var dbpath: File = _
+
+  before {
+dbpath = File.createTempFile("test.", ".ldb")
+dbpath.delete()
+db = new LevelDB(dbpath, new KVStoreScalaSerializer())
+  }
+
+  after {
+if (db != null) {
+  db.close()
+}
+if (dbpath != null) {
+  FileUtils.deleteQuietly(dbpath)
+}
+  }
+
+  test("test multiple objects write read delete") {
+val store = createHybridStore()
+
+val t1 = createCustomType1(1)
+val t2 = createCustomType1(2)
+
+intercept[NoSuchElementException] {
+  store.read(t1.getClass(), t1.key)
+}
+
+store.write(t1)
+store.write(t2)
+store.delete(t2.getClass(), t2.key)
+
+Seq(false, true).foreach { switch =>
+  if (switch) switchHybridStore(store)
+
+  intercept[NoSuchElementException] {
+   store.read(t2.getClass(), t2.key)
+  }
+  assert(store.read(t1.getClass(), t1.key) === t1)
+  assert(store.count(t1.getClass()) === 1L)
+}
+  }
+
+  test("test metadata") {
+val store = createHybridStore()
+assert(store.getMetadata(classOf[CustomType1]) === null)
+
+val t1 = createCustomType1(1)
+store.setMetadata(t1)
+assert(store.getMetadata(classOf[CustomType1]) === t1)
+
+// Switch to LevelDB and set a new metadata
+switchHybridStore(store)
+
+val t2 = createCustomType1(2)
+store.setMetadata(t2)
+assert(store.getMetadata(classOf[CustomType1]) === t2)
+  }
+
+  test("test update") {
+val store = createHybridStore()
+val t = createCustomType1(1)
+
+store.write(t)
+t.name = "name2"
+store.write(t)
+
+Seq(false, true).foreach { switch =>
+  if (switch) switchHybridStore(store)
+
+  assert(store.count(t.getClass()) === 1L)
+  assert(store.read(t.getClass(), t.key) === t)
+}
+  }
+
+  test("test basic iteration") {
+val store = createHybridStore()
+
+val t1 = createCustomType1(1)
+store.write(t1)
+val t2 = createCustomType1(2)
+store.write(t2)
+
+Seq(false, true).foreach { switch =>
+  if (switch) switchHybridStore(store)
+
+  assert(store.view(t1.getClass()).iterator().next().id === t1.id)
+  assert(store.view(t1.getClass()).skip(1).iterator().next().id === t2.id)
+  assert(store.view(t1.getClass()).skip(1).max(1).iterator().next().id === t2.id)
+  assert(store.view(t1.getClass()).first(t1.key).max(1).iterator().next().id === t1.id)
+  assert(store.view(t1.getClass()).first(t2.key).max(1).iterator().next().id === t2.id)
+}
+  }
+
+  test("test delete after switch") {
+val store = createHybridStore()
+val t = createCustomType1(1)
+store.write(t)
+switchHybridStore(store)
+intercept[IllegalStateException] {
+ store.delete(t.getClass(), t.key)
+}
+  }
+
+  test("test klassMap") {
+val store = createHybridStore()
+val t1 = createCustomType1(1)
+store.write(t1)
+assert(store.klassMap.size === 1)
+val t2 = new CustomType2("key2")
+store.write(t2)
+assert(store.klassMap.size === 2)
+
+switchHybridStore(store)
+val t3 = new CustomType3("key3")
+store.write(t3)
+// Cannot put new klass to klassMap after the switching starts
+assert(store.klassMap.size === 2)
+  }
+
+  private def createHybridStore(): HybridStore = {
+val store = new HybridStore()
+store.setLevelDB(db)
+store
+  }
+
+  private def createCustomType1(i: Int): CustomType1 = {
+new CustomType1("key" + i, "id" + i, 

[GitHub] [spark] baohe-zhang commented on a change in pull request #29509: [SPARK-31608][CORE][WEBUI][TEST] Add test suites for HybridStore and HistoryServerMemoryManager

2020-08-23 Thread GitBox


baohe-zhang commented on a change in pull request #29509:
URL: https://github.com/apache/spark/pull/29509#discussion_r475325165



##
File path: 
core/src/test/scala/org/apache/spark/deploy/history/HybridStoreSuite.scala
##
@@ -0,0 +1,230 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.history
+
+import java.io.File
+import java.util.NoSuchElementException
+import java.util.concurrent.LinkedBlockingQueue
+
+import org.apache.commons.io.FileUtils
+import org.scalatest.BeforeAndAfter
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.status.KVUtils._
+import org.apache.spark.util.kvstore._
+
+class HybridStoreSuite extends SparkFunSuite with BeforeAndAfter {
+
+  private var db: LevelDB = _
+  private var dbpath: File = _
+
+  before {
+dbpath = File.createTempFile("test.", ".ldb")
+dbpath.delete()
+db = new LevelDB(dbpath, new KVStoreScalaSerializer())
+  }
+
+  after {
+if (db != null) {
+  db.close()
+}
+if (dbpath != null) {
+  FileUtils.deleteQuietly(dbpath)
+}
+  }
+
+  test("test multiple objects write read delete") {
+val store = createHybridStore()
+
+val t1 = createCustomType1(1)
+val t2 = createCustomType1(2)
+
+intercept[NoSuchElementException] {
+  store.read(t1.getClass(), t1.key)
+}
+
+store.write(t1)
+store.write(t2)
+store.delete(t2.getClass(), t2.key)
+
+Seq(false, true).foreach { switch =>
+  if (switch) switchHybridStore(store)
+
+  intercept[NoSuchElementException] {
+   store.read(t2.getClass(), t2.key)
+  }
+  assert(store.read(t1.getClass(), t1.key) === t1)
+  assert(store.count(t1.getClass()) === 1L)
+}
+  }
+
+  test("test metadata") {
+val store = createHybridStore()
+assert(store.getMetadata(classOf[CustomType1]) === null)
+
+val t1 = createCustomType1(1)
+store.setMetadata(t1)
+assert(store.getMetadata(classOf[CustomType1]) === t1)
+
+// Switch to LevelDB and set a new metadata
+switchHybridStore(store)
+
+val t2 = createCustomType1(2)
+store.setMetadata(t2)
+assert(store.getMetadata(classOf[CustomType1]) === t2)
+  }
+
+  test("test update") {
+val store = createHybridStore()
+val t = createCustomType1(1)
+
+store.write(t)
+t.name = "name2"
+store.write(t)
+
+Seq(false, true).foreach { switch =>
+  if (switch) switchHybridStore(store)
+
+  assert(store.count(t.getClass()) === 1L)
+  assert(store.read(t.getClass(), t.key) === t)
+}
+  }
+
+  test("test basic iteration") {
+val store = createHybridStore()
+
+val t1 = createCustomType1(1)
+store.write(t1)
+val t2 = createCustomType1(2)
+store.write(t2)
+
+Seq(false, true).foreach { switch =>
+  if (switch) switchHybridStore(store)
+
+  assert(store.view(t1.getClass()).iterator().next().id === t1.id)
+  assert(store.view(t1.getClass()).skip(1).iterator().next().id === t2.id)
+  assert(store.view(t1.getClass()).skip(1).max(1).iterator().next().id === t2.id)
+  assert(store.view(t1.getClass()).first(t1.key).max(1).iterator().next().id === t1.id)
+  assert(store.view(t1.getClass()).first(t2.key).max(1).iterator().next().id === t2.id)
+}
+  }
+
+  test("test delete after switch") {
+val store = createHybridStore()
+val t = createCustomType1(1)
+store.write(t)
+switchHybridStore(store)
+intercept[IllegalStateException] {
+ store.delete(t.getClass(), t.key)
+}
+  }
+
+  test("test klassMap") {
+val store = createHybridStore()
+val t1 = createCustomType1(1)
+store.write(t1)
+assert(store.klassMap.size === 1)
+val t2 = new CustomType2("key2")
+store.write(t2)
+assert(store.klassMap.size === 2)
+
+switchHybridStore(store)
+val t3 = new CustomType3("key3")
+store.write(t3)
+// Cannot put new klass to klassMap after the switching starts
+assert(store.klassMap.size === 2)
+  }
+
+  private def createHybridStore(): HybridStore = {
+val store = new HybridStore()
+store.setLevelDB(db)
+store
+  }
+
+  private def createCustomType1(i: Int): CustomType1 = {
+new CustomType1("key" + i, "id" + i, 

[GitHub] [spark] baohe-zhang commented on a change in pull request #29509: [SPARK-31608][CORE][WEBUI][TEST] Add test suites for HybridStore and HistoryServerMemoryManager

2020-08-23 Thread GitBox


baohe-zhang commented on a change in pull request #29509:
URL: https://github.com/apache/spark/pull/29509#discussion_r475325041



##
File path: 
core/src/test/scala/org/apache/spark/deploy/history/HybridStoreSuite.scala
##
@@ -0,0 +1,230 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.history
+
+import java.io.File
+import java.util.NoSuchElementException
+import java.util.concurrent.LinkedBlockingQueue
+
+import org.apache.commons.io.FileUtils
+import org.scalatest.BeforeAndAfter
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.status.KVUtils._
+import org.apache.spark.util.kvstore._
+
+class HybridStoreSuite extends SparkFunSuite with BeforeAndAfter {
+
+  private var db: LevelDB = _
+  private var dbpath: File = _
+
+  before {
+dbpath = File.createTempFile("test.", ".ldb")
+dbpath.delete()
+db = new LevelDB(dbpath, new KVStoreScalaSerializer())
+  }
+
+  after {
+if (db != null) {
+  db.close()
+}
+if (dbpath != null) {
+  FileUtils.deleteQuietly(dbpath)
+}
+  }
+
+  test("test multiple objects write read delete") {
+val store = createHybridStore()
+
+val t1 = createCustomType1(1)
+val t2 = createCustomType1(2)
+
+intercept[NoSuchElementException] {
+  store.read(t1.getClass(), t1.key)
+}
+
+store.write(t1)
+store.write(t2)
+store.delete(t2.getClass(), t2.key)
+
+Seq(false, true).foreach { switch =>
+  if (switch) switchHybridStore(store)
+
+  intercept[NoSuchElementException] {
+   store.read(t2.getClass(), t2.key)

Review comment:
   Fixed





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] baohe-zhang commented on a change in pull request #29509: [SPARK-31608][CORE][WEBUI][TEST] Add test suites for HybridStore and HistoryServerMemoryManager

2020-08-23 Thread GitBox


baohe-zhang commented on a change in pull request #29509:
URL: https://github.com/apache/spark/pull/29509#discussion_r475325096



##
File path: 
core/src/test/scala/org/apache/spark/deploy/history/HybridStoreSuite.scala
##
@@ -0,0 +1,230 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.history
+
+import java.io.File
+import java.util.NoSuchElementException
+import java.util.concurrent.LinkedBlockingQueue
+
+import org.apache.commons.io.FileUtils
+import org.scalatest.BeforeAndAfter
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.status.KVUtils._
+import org.apache.spark.util.kvstore._
+
+class HybridStoreSuite extends SparkFunSuite with BeforeAndAfter {
+
+  private var db: LevelDB = _
+  private var dbpath: File = _
+
+  before {
+dbpath = File.createTempFile("test.", ".ldb")
+dbpath.delete()
+db = new LevelDB(dbpath, new KVStoreScalaSerializer())
+  }
+
+  after {
+if (db != null) {
+  db.close()
+}
+if (dbpath != null) {
+  FileUtils.deleteQuietly(dbpath)
+}
+  }
+
+  test("test multiple objects write read delete") {
+val store = createHybridStore()
+
+val t1 = createCustomType1(1)
+val t2 = createCustomType1(2)
+
+intercept[NoSuchElementException] {
+  store.read(t1.getClass(), t1.key)
+}
+
+store.write(t1)
+store.write(t2)
+store.delete(t2.getClass(), t2.key)
+
+Seq(false, true).foreach { switch =>
+  if (switch) switchHybridStore(store)
+
+  intercept[NoSuchElementException] {
+   store.read(t2.getClass(), t2.key)
+  }
+  assert(store.read(t1.getClass(), t1.key) === t1)
+  assert(store.count(t1.getClass()) === 1L)
+}
+  }
+
+  test("test metadata") {
+val store = createHybridStore()
+assert(store.getMetadata(classOf[CustomType1]) === null)
+
+val t1 = createCustomType1(1)
+store.setMetadata(t1)
+assert(store.getMetadata(classOf[CustomType1]) === t1)
+
+// Switch to LevelDB and set a new metadata
+switchHybridStore(store)
+
+val t2 = createCustomType1(2)
+store.setMetadata(t2)
+assert(store.getMetadata(classOf[CustomType1]) === t2)
+  }
+
+  test("test update") {
+val store = createHybridStore()
+val t = createCustomType1(1)
+
+store.write(t)
+t.name = "name2"
+store.write(t)
+
+Seq(false, true).foreach { switch =>
+  if (switch) switchHybridStore(store)
+
+  assert(store.count(t.getClass()) === 1L)
+  assert(store.read(t.getClass(), t.key) === t)
+}
+  }
+
+  test("test basic iteration") {
+val store = createHybridStore()
+
+val t1 = createCustomType1(1)
+store.write(t1)
+val t2 = createCustomType1(2)
+store.write(t2)
+
+Seq(false, true).foreach { switch =>
+  if (switch) switchHybridStore(store)
+
+  assert(store.view(t1.getClass()).iterator().next().id === t1.id)
+  assert(store.view(t1.getClass()).skip(1).iterator().next().id === t2.id)
+  assert(store.view(t1.getClass()).skip(1).max(1).iterator().next().id === t2.id)
+  assert(store.view(t1.getClass()).first(t1.key).max(1).iterator().next().id === t1.id)
+  assert(store.view(t1.getClass()).first(t2.key).max(1).iterator().next().id === t2.id)
+}
+  }
+
+  test("test delete after switch") {
+val store = createHybridStore()
+val t = createCustomType1(1)
+store.write(t)
+switchHybridStore(store)
+intercept[IllegalStateException] {
+ store.delete(t.getClass(), t.key)
+}
+  }
+
+  test("test klassMap") {
+val store = createHybridStore()
+val t1 = createCustomType1(1)
+store.write(t1)
+assert(store.klassMap.size === 1)
+val t2 = new CustomType2("key2")
+store.write(t2)
+assert(store.klassMap.size === 2)
+
+switchHybridStore(store)
+val t3 = new CustomType3("key3")
+store.write(t3)
+// Cannot put new klass to klassMap after the switching starts
+assert(store.klassMap.size === 2)
+  }
+
+  private def createHybridStore(): HybridStore = {
+val store = new HybridStore()
+store.setLevelDB(db)
+store
+  }
+
+  private def createCustomType1(i: Int): CustomType1 = {
+new CustomType1("key" + i, "id" + i, 

[GitHub] [spark] baohe-zhang commented on a change in pull request #29509: [SPARK-31608][CORE][WEBUI][TEST] Add test suites for HybridStore and HistoryServerMemoryManager

2020-08-23 Thread GitBox


baohe-zhang commented on a change in pull request #29509:
URL: https://github.com/apache/spark/pull/29509#discussion_r475325018



##
File path: 
core/src/test/scala/org/apache/spark/deploy/history/FsHistoryProviderSuite.scala
##
@@ -1509,13 +1513,18 @@ class FsHistoryProviderSuite extends SparkFunSuite with Matchers with Logging {
 new FileOutputStream(file).close()
   }
 
-  private def createTestConf(inMemory: Boolean = false): SparkConf = {
+  private def createTestConf(
+  inMemory: Boolean = false,
+  useHybridStore: Boolean = false): SparkConf = {
 val conf = new SparkConf()
   .set(HISTORY_LOG_DIR, testDir.getAbsolutePath())
   .set(FAST_IN_PROGRESS_PARSING, true)
 
 if (!inMemory) {
   conf.set(LOCAL_STORE_DIR, Utils.createTempDir().getAbsolutePath())
+  if (useHybridStore) {

Review comment:
   Fixed





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] huaxingao commented on pull request #29501: [SPARK-32676][3.0][ML] Fix double caching in KMeans/BiKMeans

2020-08-23 Thread GitBox


huaxingao commented on pull request #29501:
URL: https://github.com/apache/spark/pull/29501#issuecomment-678885037


   I think we need to put the fix in 3.0, because in the case where the data is already cached, this fix makes 3.0.0 behave the same as 2.4.
   In 2.4
   ```
   cache norm in memory
   ```
   
   currently in 3.0
   ```
   always cache zipped data (data and norm) regardless of whether the original data is cached or not
   ```
   
   After this fix
   ```
   if (data is cached)
 cache norm in memory and disk
   else 
 cache zipped data (data and norm)
   ```
   
   The double caching in current 3.0 may cause performance degradation from 2.4 to 3.0, so we want to put the fix in 3.0.
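   
   For illustration only, a minimal Scala sketch of the caching decision described above (the helper name and signature are hypothetical; the actual KMeans code differs):
   
   ```scala
   import org.apache.spark.ml.linalg.{Vector, Vectors}
   import org.apache.spark.rdd.RDD
   import org.apache.spark.storage.StorageLevel
   
   // Hypothetical sketch: avoid double caching by checking whether the
   // input RDD is already persisted before caching derived data.
   def dataWithNorms(data: RDD[Vector]): RDD[(Vector, Double)] = {
     if (data.getStorageLevel != StorageLevel.NONE) {
       // Input already cached: persist only the norms, in memory and on disk.
       val norms = data.map(v => Vectors.norm(v, 2.0)).persist(StorageLevel.MEMORY_AND_DISK)
       data.zip(norms)
     } else {
       // Input not cached: persist the zipped (point, norm) pairs instead.
       data.map(v => (v, Vectors.norm(v, 2.0))).persist(StorageLevel.MEMORY_AND_DISK)
     }
   }
   ```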
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on pull request #29526: [SPARK-32352][SQL][FOLLOW-UP][test-hadoop2.7][test-hive1.2] Exclude partition columns from data columns

2020-08-23 Thread GitBox


cloud-fan commented on pull request #29526:
URL: https://github.com/apache/spark/pull/29526#issuecomment-678884094


   good catch! LGTM



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] baohe-zhang commented on a change in pull request #29509: [SPARK-31608][CORE][WEBUI][TEST] Add test suites for HybridStore and HistoryServerMemoryManager

2020-08-23 Thread GitBox


baohe-zhang commented on a change in pull request #29509:
URL: https://github.com/apache/spark/pull/29509#discussion_r475320679



##
File path: 
core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala
##
@@ -1214,8 +1214,8 @@ private[history] class FsHistoryProvider(conf: SparkConf, clock: Clock)
 // Use InMemoryStore to rebuild app store
 while (hybridStore == null) {
   // A RuntimeException will be thrown if the heap memory is not sufficient
-  memoryManager.lease(appId, attempt.info.attemptId, reader.totalSize,

Review comment:
   It's related to the test code, but my original thought was that passing the actual amount of memory, instead of the file size, to memoryManager.lease() would make more sense, although exposing inner details to FsHistoryProvider is not ideal. What's your opinion? Should I revert this change?
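   
   To make the trade-off concrete, a minimal sketch of the leasing idea under discussion (the class and names here are simplified and hypothetical, not the actual HistoryServerMemoryManager API):
   
   ```scala
   // Hypothetical sketch: lease against the estimated in-memory size of the
   // rebuilt store rather than the on-disk event log size.
   class MemoryBudget(maxUsageBytes: Long) {
     private var currentUsageBytes = 0L
   
     def lease(appId: String, estimatedMemoryBytes: Long): Unit = synchronized {
       if (currentUsageBytes + estimatedMemoryBytes > maxUsageBytes) {
         throw new RuntimeException(s"Not enough memory to create hybrid store for app $appId.")
       }
       currentUsageBytes += estimatedMemoryBytes
     }
   
     def release(estimatedMemoryBytes: Long): Unit = synchronized {
       currentUsageBytes -= estimatedMemoryBytes
     }
   }
   ```
   
   The design question is then only who computes `estimatedMemoryBytes`: the caller (FsHistoryProvider, which leaks the estimation detail) or the manager itself (which keeps the API a plain file size).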





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] LuciferYang commented on pull request #29000: [SPARK-27194][SPARK-29302][SQL] Fix commit collision in dynamic partition overwrite mode

2020-08-23 Thread GitBox


LuciferYang commented on pull request #29000:
URL: https://github.com/apache/spark/pull/29000#issuecomment-678872666


   @Ngone51 Could you please review it again?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] agrawaldevesh commented on a change in pull request #29452: [SPARK-32643][CORE][K8s] Consolidate state decommissioning in the TaskSchedulerImpl realm

2020-08-23 Thread GitBox


agrawaldevesh commented on a change in pull request #29452:
URL: https://github.com/apache/spark/pull/29452#discussion_r475310046



##
File path: 
core/src/main/scala/org/apache/spark/scheduler/ExecutorDecommissionInfo.scala
##
@@ -18,11 +18,22 @@
 package org.apache.spark.scheduler
 
 /**
- * Provides more detail when an executor is being decommissioned.
+ * Message providing more detail when an executor is being decommissioned.
  * @param message Human readable reason for why the decommissioning is happening.
  * @param isHostDecommissioned Whether the host (aka the `node` or `worker` in other places) is
  * being decommissioned too. Used to infer if the shuffle data might
  * be lost even if the external shuffle service is enabled.
  */
 private[spark]
 case class ExecutorDecommissionInfo(message: String, isHostDecommissioned: Boolean)
+
+/**
+ * State related to decommissioning that is kept by the TaskSchedulerImpl. This state is derived
+ * from the info message above but it is kept distinct to allow the state to evolve independently
+ * from the message.
+ */
+case class ExecutorDecommissionState(
+message: String,

Review comment:
   So far that need hasn't come up :-) But when it does, we can easily add it.

##
File path: 
core/src/main/scala/org/apache/spark/scheduler/ExecutorDecommissionInfo.scala
##
@@ -18,11 +18,22 @@
 package org.apache.spark.scheduler
 
 /**
- * Provides more detail when an executor is being decommissioned.
+ * Message providing more detail when an executor is being decommissioned.
  * @param message Human readable reason for why the decommissioning is happening.
  * @param isHostDecommissioned Whether the host (aka the `node` or `worker` in other places) is
  * being decommissioned too. Used to infer if the shuffle data might
  * be lost even if the external shuffle service is enabled.
  */
 private[spark]
 case class ExecutorDecommissionInfo(message: String, isHostDecommissioned: Boolean)
+
+/**
+ * State related to decommissioning that is kept by the TaskSchedulerImpl. This state is derived
+ * from the info message above but it is kept distinct to allow the state to evolve independently
+ * from the message.
+ */
+case class ExecutorDecommissionState(
+message: String,
+// Timestamp the decommissioning commenced in millis since epoch of the driver's clock

Review comment:
   Yeah, it is used to compute what was formerly known as `tidToExecutorKillTimeMapping` (search for this in the code on the left). It's not so much for expiry of the decommission state, for which we are using the cache that you suggested in the previous PR.
   
   Good suggestion to add some idea of how it is used. I will add a comment.

##
File path: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala
##
@@ -1123,14 +1127,6 @@ private[spark] class TaskSetManager(
 
   def executorDecommission(execId: String): Unit = {
 recomputeLocality()
-if (speculationEnabled) {

Review comment:
   This was used as an efficiency improvement: to avoid doing this bookkeeping in the driver when speculation is not enabled, saving both CPU cycles and memory.
   
   Now this check is done in checkSpeculatableTasks, which is not even called if speculation is disabled, and thus automatically keeps this efficiency improvement. This is a positive side effect of changing the bookkeeping by merging tidToExecutorKillTimeMapping into executorDecommissionState.
   
   In the meantime I will hunt for a suitable test that adds some coverage here, or consider adding one.
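   
   A rough sketch of what the merged bookkeeping looks like (names are illustrative; the simplified case class mirrors the one in this diff):
   
   ```scala
   // Simplified stand-in for the state kept by TaskSchedulerImpl.
   case class ExecutorDecommissionState(message: String, startTime: Long)
   
   // Hypothetical helper: derive the speculation kill deadline on demand from
   // the decommission state instead of a separate tidToExecutorKillTimeMapping.
   // Only reached via checkSpeculatableTasks, which is never invoked when
   // speculation is disabled, so no cost is paid in that case.
   def speculationDeadline(
       execId: String,
       decomStates: Map[String, ExecutorDecommissionState],
       graceMillis: Long): Option[Long] = {
     decomStates.get(execId).map(_.startTime + graceMillis)
   }
   ```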

##
File path: 
core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala
##
@@ -926,18 +926,21 @@ private[spark] class TaskSchedulerImpl(
 // and some of those can have isHostDecommissioned false. We merge them such that
 // if we heard isHostDecommissioned ever true, then we keep that one since it is
 // most likely coming from the cluster manager and thus authoritative
-val oldDecomInfo = executorsPendingDecommission.get(executorId)
-if (!oldDecomInfo.exists(_.isHostDecommissioned)) {
-  executorsPendingDecommission(executorId) = decommissionInfo
+val oldDecomState = executorsPendingDecommission.get(executorId)
+if (!oldDecomState.exists(_.isHostDecommissioned)) {
+  executorsPendingDecommission(executorId) = ExecutorDecommissionState(
+decommissionInfo.message,
+oldDecomState.map(_.startTime).getOrElse(clock.getTimeMillis()),

Review comment:
   Sure, I will tweak the comment above.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

[GitHub] [spark] zhengruifeng commented on pull request #29501: [SPARK-32676][3.0][ML] Fix double caching in KMeans/BiKMeans

2020-08-23 Thread GitBox


zhengruifeng commented on pull request #29501:
URL: https://github.com/apache/spark/pull/29501#issuecomment-678871882


   This double caching did not exist in 2.4; it was first introduced in 3.0.0, so I tend to put it into RC2. What do you think? @huaxingao 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AngersZhuuuu commented on a change in pull request #29526: [SPARK-32352][SQL][FOLLOW-UP][test-hadoop2.7][test-hive1.2] Exclude partition columns from data columns

2020-08-23 Thread GitBox


AngersZhuuuu commented on a change in pull request #29526:
URL: https://github.com/apache/spark/pull/29526#discussion_r475310377



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala
##
@@ -176,9 +176,10 @@ object FileSourceStrategy extends Strategy with PredicateHelper with Logging {
 l.resolve(fsRelation.dataSchema, fsRelation.sparkSession.sessionState.analyzer.resolver)
 
   // Partition keys are not available in the statistics of the files.
+  val dataColumnsWithoutPartitionCols = dataColumns.filterNot(partitionColumns.contains)

Review comment:
   > Sure. Added.
   > 
   > It only happens in the hive-1.2 profile, because for hive-2.3 we go down a different path to create pushed-down filters. In that path, we have checked whether an attribute is in the field map.
   
   LGTM. When I was working on PR https://github.com/apache/spark/pull/29406, I thought that, from the code, it seems dataColumns won't contain partition columns.
   
   Thanks for the fix.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #29507: [SPARK-32680][SQL] Don't Preprocess V2 CTAS with Unresolved Query

2020-08-23 Thread GitBox


AmplabJenkins removed a comment on pull request #29507:
URL: https://github.com/apache/spark/pull/29507#issuecomment-678870683







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #29507: [SPARK-32680][SQL] Don't Preprocess V2 CTAS with Unresolved Query

2020-08-23 Thread GitBox


AmplabJenkins commented on pull request #29507:
URL: https://github.com/apache/spark/pull/29507#issuecomment-678870683







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #29516: [WIP][SPARK-32614][SQL] Don't apply comment processing if 'comment' unset for CSV

2020-08-23 Thread GitBox


SparkQA commented on pull request #29516:
URL: https://github.com/apache/spark/pull/29516#issuecomment-678870379


   **[Test build #127824 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127824/testReport)** for PR 29516 at commit [`f3d14c6`](https://github.com/apache/spark/commit/f3d14c61550877a6d3b2df15954fee30c8546fa5).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #29507: [SPARK-32680][SQL] Don't Preprocess V2 CTAS with Unresolved Query

2020-08-23 Thread GitBox


SparkQA commented on pull request #29507:
URL: https://github.com/apache/spark/pull/29507#issuecomment-678870388


   **[Test build #127825 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127825/testReport)** for PR 29507 at commit [`e03e64d`](https://github.com/apache/spark/commit/e03e64dbc44660fbcd2183e2cdc222ebccbcd7c8).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] srowen commented on pull request #29501: [SPARK-32676][3.0][ML] Fix double caching in KMeans/BiKMeans

2020-08-23 Thread GitBox


srowen commented on pull request #29501:
URL: https://github.com/apache/spark/pull/29501#issuecomment-678869607


   Do we need it in 3.0? I'm not super against it, but it's more of an improvement/optimization, not a bug fix.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] srowen commented on pull request #29516: [WIP][SPARK-32614][SQL] Don't apply comment processing if 'comment' unset for CSV

2020-08-23 Thread GitBox


srowen commented on pull request #29516:
URL: https://github.com/apache/spark/pull/29516#issuecomment-678869497


   BTW I think we may still have a real test failure here, I'm looking into it.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #29516: [WIP][SPARK-32614][SQL] Don't apply comment processing if 'comment' unset for CSV

2020-08-23 Thread GitBox


AmplabJenkins removed a comment on pull request #29516:
URL: https://github.com/apache/spark/pull/29516#issuecomment-678869069







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #29516: [WIP][SPARK-32614][SQL] Don't apply comment processing if 'comment' unset for CSV

2020-08-23 Thread GitBox


AmplabJenkins commented on pull request #29516:
URL: https://github.com/apache/spark/pull/29516#issuecomment-678869069







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] zhengruifeng commented on pull request #29516: [WIP][SPARK-32614][SQL] Don't apply comment processing if 'comment' unset for CSV

2020-08-23 Thread GitBox


zhengruifeng commented on pull request #29516:
URL: https://github.com/apache/spark/pull/29516#issuecomment-678868865


   retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] Ngone51 commented on a change in pull request #29270: [SPARK-32466][TEST][SQL] Add PlanStabilitySuite to detect SparkPlan regression

2020-08-23 Thread GitBox


Ngone51 commented on a change in pull request #29270:
URL: https://github.com/apache/spark/pull/29270#discussion_r475307366



##
File path: sql/core/src/test/resources/tpcds-plan-stability/approved-plans-modified/q10.sf100/explain.txt
##
@@ -0,0 +1,286 @@
+== Physical Plan ==
+TakeOrderedAndProject (52)
++- * HashAggregate (51)
+   +- Exchange (50)
+  +- * HashAggregate (49)
+ +- * Project (48)
++- * BroadcastHashJoin Inner BuildLeft (47)
+   :- BroadcastExchange (43)
+   :  +- * Project (42)
+   : +- * BroadcastHashJoin Inner BuildRight (41)
+   ::- * Project (35)
+   ::  +- SortMergeJoin LeftSemi (34)
+   :: :- SortMergeJoin LeftSemi (25)
+   :: :  :- * Sort (5)
+   :: :  :  +- Exchange (4)
+   :: :  : +- * Filter (3)
+   :: :  :+- * ColumnarToRow (2)
+   :: :  :   +- Scan parquet default.customer (1)
+   :: :  +- * Sort (24)
+   :: : +- Exchange (23)
+   :: :+- Union (22)
+   :: :   :- * Project (15)
+   :: :   :  +- * BroadcastHashJoin Inner BuildRight (14)
+   :: :   : :- * Filter (8)
+   :: :   : :  +- * ColumnarToRow (7)
+   :: :   : : +- Scan parquet default.web_sales (6)
+   :: :   : +- BroadcastExchange (13)
+   :: :   :+- * Project (12)
+   :: :   :   +- * Filter (11)
+   :: :   :  +- * ColumnarToRow (10)
+   :: :   : +- Scan parquet default.date_dim (9)
+   :: :   +- * Project (21)
+   :: :  +- * BroadcastHashJoin Inner BuildRight (20)
+   :: : :- * Filter (18)
+   :: : :  +- * ColumnarToRow (17)
+   :: : : +- Scan parquet default.catalog_sales (16)
+   :: : +- ReusedExchange (19)
+   :: +- * Sort (33)
+   ::+- Exchange (32)
+   ::   +- * Project (31)
+   ::  +- * BroadcastHashJoin Inner BuildRight (30)
+   :: :- * Filter (28)
+   :: :  +- * ColumnarToRow (27)
+   :: : +- Scan parquet default.store_sales (26)
+   :: +- ReusedExchange (29)
+   :+- BroadcastExchange (40)
+   :   +- * Project (39)
+   :  +- * Filter (38)
+   : +- * ColumnarToRow (37)
+   :+- Scan parquet default.customer_address (36)
+   +- * Filter (46)
+  +- * ColumnarToRow (45)
+ +- Scan parquet default.customer_demographics (44)
+
+
+(1) Scan parquet default.customer
+Output [3]: [c_customer_sk#1, c_current_cdemo_sk#2, c_current_addr_sk#3]
+Batched: true
+Location: InMemoryFileIndex [file:/Users/yi.wu/IdeaProjects/spark/sql/core/spark-warehouse/org.apache.spark.sql.TPCDSModifiedPlanStabilityWithStatsSuite/customer]

Review comment:
   Oh, I see. Thanks for pointing it out. I'll make a follow-up soon.
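
   For reference, the issue flagged above is that the approved explain file 
embeds a machine-specific warehouse path. A hedged sketch of the kind of 
normalization a follow-up could apply (hypothetical helper, not the suite's 
actual code):

       // Replace machine-specific file-index locations with a stable token
       // before diffing a captured plan against the approved golden file.
       def normalizeLocation(explain: String): String =
         explain.replaceAll(
           """Location: InMemoryFileIndex \[[^\]]*\]""",
           "Location: InMemoryFileIndex [not included in comparison]")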








[GitHub] [spark] LuciferYang commented on pull request #29434: [SPARK-32526][SQL] Pass all test of sql/catalyst module in Scala 2.13

2020-08-23 Thread GitBox


LuciferYang commented on pull request #29434:
URL: https://github.com/apache/spark/pull/29434#issuecomment-678867553


   @srowen @cloud-fan @HyukjinKwon @dongjoon-hyun Thank you for your review~






[GitHub] [spark] zhengruifeng commented on pull request #29501: [SPARK-32676][3.0][ML] Fix double caching in KMeans/BiKMeans

2020-08-23 Thread GitBox


zhengruifeng commented on pull request #29501:
URL: https://github.com/apache/spark/pull/29501#issuecomment-678867157


   @srowen @huaxingao Thanks for reviewing! Would you mind helping to 
backport this to 3.0? I do not have a computer to do this right now.
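
   For readers following along, the PR's subject is caching an input that 
the caller may already have cached. A minimal sketch of the usual guard (an 
assumed shape, not the PR's exact code):

       import org.apache.spark.rdd.RDD
       import org.apache.spark.storage.StorageLevel

       // Persist only when the caller has not, and unpersist only what
       // we persisted ourselves, so the input is never cached twice.
       def withCaching[T, R](data: RDD[T])(train: RDD[T] => R): R = {
         val handlePersistence = data.getStorageLevel == StorageLevel.NONE
         if (handlePersistence) data.persist(StorageLevel.MEMORY_AND_DISK)
         try train(data) finally {
           if (handlePersistence) data.unpersist()
         }
       }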






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29477: [SPARK-32661][K8S] Spark executors should request extra memory for off-heap allocations.

2020-08-23 Thread GitBox


AmplabJenkins removed a comment on pull request #29477:
URL: https://github.com/apache/spark/pull/29477#issuecomment-678860803


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/32446/
   Test FAILed.






[GitHub] [spark] SparkQA commented on pull request #29477: [SPARK-32661][K8S] Spark executors should request extra memory for off-heap allocations.

2020-08-23 Thread GitBox


SparkQA commented on pull request #29477:
URL: https://github.com/apache/spark/pull/29477#issuecomment-678860789


   Kubernetes integration test status failure
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/32446/
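
   For context on what this PR sizes, a hedged sketch of the settings 
involved (these are standard Spark configs; the pod-sizing behavior is the 
change proposed here, paraphrased, not a statement of the merged 
implementation):

       import org.apache.spark.SparkConf

       val conf = new SparkConf()
         // Off-heap allocations the executor will make at runtime.
         .set("spark.memory.offHeap.enabled", "true")
         .set("spark.memory.offHeap.size", "2g")
         // On K8s the executor pod's memory request should cover heap,
         // overhead, and (per SPARK-32661) the off-heap size as well.
         .set("spark.executor.memory", "4g")
         .set("spark.executor.memoryOverhead", "1g")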
   






[GitHub] [spark] AmplabJenkins commented on pull request #29477: [SPARK-32661][K8S] Spark executors should request extra memory for off-heap allocations.

2020-08-23 Thread GitBox


AmplabJenkins commented on pull request #29477:
URL: https://github.com/apache/spark/pull/29477#issuecomment-678860798










[GitHub] [spark] AmplabJenkins removed a comment on pull request #29477: [SPARK-32661][K8S] Spark executors should request extra memory for off-heap allocations.

2020-08-23 Thread GitBox


AmplabJenkins removed a comment on pull request #29477:
URL: https://github.com/apache/spark/pull/29477#issuecomment-678860798


   Merged build finished. Test FAILed.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29421: [SPARK-32388][SQL][test-hadoop2.7][test-hive1.2] TRANSFORM with schema-less mode should keep the same with hive

2020-08-23 Thread GitBox


AmplabJenkins removed a comment on pull request #29421:
URL: https://github.com/apache/spark/pull/29421#issuecomment-678860152










[GitHub] [spark] AmplabJenkins commented on pull request #29421: [SPARK-32388][SQL][test-hadoop2.7][test-hive1.2] TRANSFORM with schema-less mode should keep the same with hive

2020-08-23 Thread GitBox


AmplabJenkins commented on pull request #29421:
URL: https://github.com/apache/spark/pull/29421#issuecomment-678860152










[GitHub] [spark] SparkQA commented on pull request #29421: [SPARK-32388][SQL][test-hadoop2.7][test-hive1.2] TRANSFORM with schema-less mode should keep the same with hive

2020-08-23 Thread GitBox


SparkQA commented on pull request #29421:
URL: https://github.com/apache/spark/pull/29421#issuecomment-678859868


   **[Test build #127823 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127823/testReport)**
 for PR 29421 at commit 
[`5f03222`](https://github.com/apache/spark/commit/5f032229ca2c457753622e21e22d92848de24fa6).
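
   For context on the schema-less mode under test: when a TRANSFORM query 
has no AS clause, Hive returns two string columns, 'key' (everything before 
the first tab of the script's output) and 'value' (the rest of the line), 
and this PR aligns Spark with that. A hedged sketch (the SparkSession and 
table name are assumed):

       // Assumes an active SparkSession `spark` and a table testData(a, b, c).
       // With no AS clause, the expected schema is (key: string, value: string).
       val df = spark.sql(
         """SELECT TRANSFORM(a, b, c)
           |USING 'cat'
           |FROM testData""".stripMargin)
       df.printSchema()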






[GitHub] [spark] SparkQA commented on pull request #29526: [SPARK-32352][SQL][FOLLOW-UP][test-hadoop2.7][test-hive1.2] Exclude partition columns from data columns

2020-08-23 Thread GitBox


SparkQA commented on pull request #29526:
URL: https://github.com/apache/spark/pull/29526#issuecomment-678845961


   **[Test build #127820 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/127820/testReport)**
 for PR 29526 at commit 
[`b37f694`](https://github.com/apache/spark/commit/b37f6949f1f7c4c6d2264559402a963eb077990d).






[GitHub] [spark] dongjoon-hyun commented on pull request #29505: [SPARK-32648][SS] Remove unused DELETE_ACTION in FileStreamSinkLog

2020-08-23 Thread GitBox


dongjoon-hyun commented on pull request #29505:
URL: https://github.com/apache/spark/pull/29505#issuecomment-678846054


   Thank you and welcome, @michal-wieleba.
   You have been added to the Apache Spark contributor group, and 
SPARK-32648 has been assigned to you.






[GitHub] [spark] viirya commented on pull request #29526: [SPARK-32352][SQL][FOLLOW-UP][test-hadoop2.7][test-hive1.2] Exclude partition columns from data columns

2020-08-23 Thread GitBox


viirya commented on pull request #29526:
URL: https://github.com/apache/spark/pull/29526#issuecomment-678845494


   Yeah, we don't usually run the hive-1.2 tests unless we know the diff 
touches the Hive 1.2 code path. These failed tests don't touch that code 
directly, but they affect it indirectly...
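
   For readers catching up, the fix's idea is that partition values come 
from the directory layout rather than the data files, so partition columns 
must not be handed to the file reader as data columns. A minimal 
illustration (hypothetical schema, not the PR's code):

       import org.apache.spark.sql.types._

       val tableSchema = StructType(Seq(
         StructField("id", LongType),
         StructField("value", StringType),
         StructField("part", StringType)))  // partition column
       val partitionCols = Set("part")
       // Only non-partition fields should be requested from the data files.
       val dataCols = tableSchema.filterNot(f => partitionCols.contains(f.name))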






[GitHub] [spark] dongjoon-hyun closed pull request #29505: [SPARK-32648][SS] Remove unused DELETE_ACTION in FileStreamSinkLog

2020-08-23 Thread GitBox


dongjoon-hyun closed pull request #29505:
URL: https://github.com/apache/spark/pull/29505


   






[GitHub] [spark] maropu edited a comment on pull request #29526: [SPARK-32352][SQL][FOLLOW-UP][test-hadoop2.7][test-hive1.2] Exclude partition columns from data columns

2020-08-23 Thread GitBox


maropu edited a comment on pull request #29526:
URL: https://github.com/apache/spark/pull/29526#issuecomment-678845205


   Nice, thanks for the swift fixes, @viirya! Anyway, it seems we didn't 
notice this test failure for 10+ days, so we need to check the branches w/ 
hive-1.2 carefully...





