date:20200520

[GitHub] [spark] TJX2014 commented on a change in pull request #28534: [SPARK-31710][SQL] Adds TIMESTAMP_SECONDS, TIMESTAMP_MILLISECONDS and TIMESTAMP_MICROSECONDS functions

2020-05-20 Thread GitBox



TJX2014 commented on a change in pull request #28534:
URL: https://github.com/apache/spark/pull/28534#discussion_r428478287



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
##
@@ -424,6 +424,9 @@ object FunctionRegistry {
 expression[MakeInterval]("make_interval"),
 expression[DatePart]("date_part"),
 expression[Extract]("extract"),
+expression[SecondsToTimestamp]("timestamp_seconds"),
+expression[MillisToTimestamp]("timestamp_milliseconds"),

Review comment:
   Ok





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28556: [SPARK-31736][SQL] Nested column aliasing for RepartitionByExpression/Join

2020-05-20 Thread GitBox



AmplabJenkins removed a comment on pull request #28556:
URL: https://github.com/apache/spark/pull/28556#issuecomment-631915587








[GitHub] [spark] AmplabJenkins commented on pull request #28556: [SPARK-31736][SQL] Nested column aliasing for RepartitionByExpression/Join

2020-05-20 Thread GitBox



AmplabJenkins commented on pull request #28556:
URL: https://github.com/apache/spark/pull/28556#issuecomment-631915587








[GitHub] [spark] cloud-fan commented on a change in pull request #28123: [SPARK-31350][SQL] Coalesce bucketed tables for join if applicable

2020-05-20 Thread GitBox



cloud-fan commented on a change in pull request #28123:
URL: https://github.com/apache/spark/pull/28123#discussion_r428474929



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/bucketing/CoalesceBucketsInEquiJoin.scala
##
@@ -0,0 +1,127 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.bucketing
+
+import org.apache.spark.sql.catalyst.catalog.BucketSpec
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.planning.ExtractEquiJoinKeys
+import org.apache.spark.sql.catalyst.plans.logical.{Filter, Join, LogicalPlan, Project, UnaryNode}
+import org.apache.spark.sql.catalyst.rules.Rule
+import org.apache.spark.sql.execution.datasources.{HadoopFsRelation, LogicalRelation}
+import org.apache.spark.sql.internal.SQLConf
+
+/**
+ * Wraps `LogicalRelation` to provide the number of buckets for coalescing.
+ */
+case class CoalesceBuckets(
+    numCoalescedBuckets: Int,
+    child: LogicalRelation) extends UnaryNode {
+  require(numCoalescedBuckets > 0,
+    s"Number of coalesced buckets ($numCoalescedBuckets) must be positive.")
+
+  override def output: Seq[Attribute] = child.output
+}
+
+/**
+ * This rule adds a `CoalesceBuckets` logical plan if the following conditions are met:
+ *   - Two bucketed tables are joined.
+ *   - Join is the equi-join.
+ *   - The larger bucket number is divisible by the smaller bucket number.
+ *   - "spark.sql.bucketing.coalesceBucketsInJoin.enabled" is set to true.
+ *   - The difference in the number of buckets is less than the value set in
+ *     "spark.sql.bucketing.coalesceBucketsInJoin.maxNumBucketsDiff".
+ */
+object CoalesceBucketsInEquiJoin extends Rule[LogicalPlan]  {
+  private def isPlanEligible(plan: LogicalPlan): Boolean = {
+    def forall(plan: LogicalPlan)(p: LogicalPlan => Boolean): Boolean = {
+      p(plan) && plan.children.forall(forall(_)(p))
+    }
+
+    forall(plan) {
+      case _: Filter | _: Project | _: LogicalRelation => true
+      case _ => false
+    }
+  }
+
+  private def getBucketSpec(plan: LogicalPlan): Option[BucketSpec] = {
+    if (isPlanEligible(plan)) {
+      plan.collectFirst {
+        case _ @ LogicalRelation(r: HadoopFsRelation, _, _, _) if r.bucketSpec.nonEmpty =>
+          r.bucketSpec.get
+      }
+    } else {
+      None
+    }
+  }
+
+  private def mayCoalesce(numBuckets1: Int, numBuckets2: Int, conf: SQLConf): Option[Int] = {
+    assert(numBuckets1 != numBuckets2)
+    val (small, large) = (math.min(numBuckets1, numBuckets2), math.max(numBuckets1, numBuckets2))
+    // A bucket can be coalesced only if the bigger number of buckets is divisible by the smaller
+    // number of buckets because bucket id is calculated by modding the total number of buckets.
+    if ((large % small == 0) && ((large - small) <= conf.coalesceBucketsInJoinMaxNumBucketsDiff)) {
+      Some(small)
+    } else {
+      None
+    }
+  }
+
+  private def addCoalesceBuckets(plan: LogicalPlan, numCoalescedBuckets: Int): LogicalPlan = {
+    plan.transformUp {
+      case l @ LogicalRelation(_: HadoopFsRelation, _, _, _) =>
+        CoalesceBuckets(numCoalescedBuckets, l)
+    }
+  }
+
+  object ExtractJoinWithBuckets {
+    def unapply(plan: LogicalPlan): Option[(Join, Int, Int)] = {
+      plan match {
+        case join @ ExtractEquiJoinKeys(_, _, _, _, left, right, _) =>

Review comment:
   yes
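
The divisibility condition in `mayCoalesce` follows from how rows map to buckets. A minimal sketch of that reasoning, assuming a bucket id is simply a non-negative hash modulo the bucket count; `BucketModuloSketch`, `bucketId` and `coalescesCleanly` are hypothetical names for illustration, not part of this PR:

```scala
object BucketModuloSketch {
  // Hypothetical helper: bucket id as a non-negative hash modulo the bucket count.
  def bucketId(hash: Int, numBuckets: Int): Int =
    java.lang.Math.floorMod(hash, numBuckets)

  // True when taking each hash's id among `large` buckets and reducing it modulo
  // `small` gives the same id as bucketing the hash directly into `small` buckets.
  def coalescesCleanly(hashes: Seq[Int], large: Int, small: Int): Boolean =
    hashes.forall(h => bucketId(bucketId(h, large), small) == bucketId(h, small))
}
```

When `small` divides `large` (for example 8 and 4), the check holds for every hash, so several of the larger table's buckets can be read together as one coalesced bucket. When it does not (for example 8 and 3), some rows would land in a different bucket than their join partners.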






[GitHub] [spark] HyukjinKwon commented on a change in pull request #28534: [SPARK-31710][SQL] Adds TIMESTAMP_SECONDS, TIMESTAMP_MILLISECONDS and TIMESTAMP_MICROSECONDS functions

2020-05-20 Thread GitBox



HyukjinKwon commented on a change in pull request #28534:
URL: https://github.com/apache/spark/pull/28534#discussion_r428474898



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
##
@@ -424,6 +424,9 @@ object FunctionRegistry {
 expression[MakeInterval]("make_interval"),
 expression[DatePart]("date_part"),
 expression[Extract]("extract"),
+expression[SecondsToTimestamp]("timestamp_seconds"),
+expression[MillisToTimestamp]("timestamp_milliseconds"),

Review comment:
   Let's also not forget to update the PR title and description.






[GitHub] [spark] SparkQA commented on pull request #28556: [SPARK-31736][SQL] Nested column aliasing for RepartitionByExpression/Join

2020-05-20 Thread GitBox



SparkQA commented on pull request #28556:
URL: https://github.com/apache/spark/pull/28556#issuecomment-631915185


   **[Test build #122916 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122916/testReport)** for PR 28556 at commit [`b77a1ba`](https://github.com/apache/spark/commit/b77a1ba23f5c39fb541bb8f61ebbbd62ffbb975d).




[GitHub] [spark] viirya commented on a change in pull request #28556: [SPARK-31736][SQL] Nested column aliasing for RepartitionByExpression/Join

2020-05-20 Thread GitBox



viirya commented on a change in pull request #28556:
URL: https://github.com/apache/spark/pull/28556#discussion_r428473655



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NestedColumnAliasing.scala
##
@@ -204,15 +211,8 @@ object GeneratorNestedColumnAliasing {
       g: Generate,
       nestedFieldToAlias: Map[ExtractValue, Alias],
       attrToAliases: Map[ExprId, Seq[Alias]]): LogicalPlan = {
-    val newGenerator = g.generator.transform {
-      case f: ExtractValue if nestedFieldToAlias.contains(f) =>
-        nestedFieldToAlias(f).toAttribute
-    }.asInstanceOf[Generator]
-
     // Defer updating `Generate.unrequiredChildIndex` to next round of `ColumnPruning`.
-    val newGenerate = g.copy(generator = newGenerator)
-
-    NestedColumnAliasing.replaceChildrenWithAliases(newGenerate, attrToAliases)
+    NestedColumnAliasing.replaceChildrenWithAliases(g, nestedFieldToAlias, attrToAliases)

Review comment:
   Changed the method name.






[GitHub] [spark] viirya commented on a change in pull request #28556: [SPARK-31736][SQL] Nested column aliasing for RepartitionByExpression/Join

2020-05-20 Thread GitBox



viirya commented on a change in pull request #28556:
URL: https://github.com/apache/spark/pull/28556#discussion_r428473708



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NestedColumnAliasing.scala
##
@@ -34,7 +34,8 @@ object NestedColumnAliasing {
     : Option[(Map[ExtractValue, Alias], Map[ExprId, Seq[Alias]])] = plan match {
     case Project(projectList, child)
         if SQLConf.get.nestedSchemaPruningEnabled && canProjectPushThrough(child) =>
-      getAliasSubMap(projectList)
+      val exprsToPrune = projectList ++ child.expressions

Review comment:
   changed.






[GitHub] [spark] GuoPhilipse commented on a change in pull request #28593: [SPARK-31710][SQL] Add two compatibility flag to cast long to timestamp

2020-05-20 Thread GitBox



GuoPhilipse commented on a change in pull request #28593:
URL: https://github.com/apache/spark/pull/28593#discussion_r428473216



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala
##
@@ -1277,7 +1285,11 @@ abstract class CastBase extends UnaryExpression with TimeZoneAwareExpression wit
     val block = inline"new java.math.BigDecimal($MICROS_PER_SECOND)"
     code"($d.toBigDecimal().bigDecimal().multiply($block)).longValue()"
   }
-  private[this] def longToTimeStampCode(l: ExprValue): Block = code"$l * (long)$MICROS_PER_SECOND"
+  private[this] def longToTimeStampCode(l: ExprValue): Block = {
+    if (SQLConf.get.numericConvertToTimestampInSeconds) code"" +

Review comment:
   Yes, let me correct it.






[GitHub] [spark] cloud-fan commented on a change in pull request #28534: [SPARK-31710][SQL] Adds TIMESTAMP_SECONDS, TIMESTAMP_MILLISECONDS and TIMESTAMP_MICROSECONDS functions

2020-05-20 Thread GitBox



cloud-fan commented on a change in pull request #28534:
URL: https://github.com/apache/spark/pull/28534#discussion_r428472274



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
##
@@ -424,6 +424,9 @@ object FunctionRegistry {
 expression[MakeInterval]("make_interval"),
 expression[DatePart]("date_part"),
 expression[Extract]("extract"),
+expression[SecondsToTimestamp]("timestamp_seconds"),
+expression[MillisToTimestamp]("timestamp_milliseconds"),

Review comment:
   `timestamp_milliseconds` -> `timestamp_millis`






[GitHub] [spark] cloud-fan commented on a change in pull request #28534: [SPARK-31710][SQL] Adds TIMESTAMP_SECONDS, TIMESTAMP_MILLISECONDS and TIMESTAMP_MICROSECONDS functions

2020-05-20 Thread GitBox



cloud-fan commented on a change in pull request #28534:
URL: https://github.com/apache/spark/pull/28534#discussion_r428472315



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
##
@@ -424,6 +424,9 @@ object FunctionRegistry {
 expression[MakeInterval]("make_interval"),
 expression[DatePart]("date_part"),
 expression[Extract]("extract"),
+expression[SecondsToTimestamp]("timestamp_seconds"),
+expression[MillisToTimestamp]("timestamp_milliseconds"),
+expression[MicrosToTimestamp]("timestamp_microseconds"),

Review comment:
   ditto






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28534: [SPARK-31710][SQL] Adds TIMESTAMP_SECONDS, TIMESTAMP_MILLISECONDS and TIMESTAMP_MICROSECONDS functions

2020-05-20 Thread GitBox



AmplabJenkins removed a comment on pull request #28534:
URL: https://github.com/apache/spark/pull/28534#issuecomment-631909869








[GitHub] [spark] AmplabJenkins commented on pull request #28534: [SPARK-31710][SQL] Adds TIMESTAMP_SECONDS, TIMESTAMP_MILLISECONDS and TIMESTAMP_MICROSECONDS functions

2020-05-20 Thread GitBox



AmplabJenkins commented on pull request #28534:
URL: https://github.com/apache/spark/pull/28534#issuecomment-631909869








[GitHub] [spark] SparkQA commented on pull request #28534: [SPARK-31710][SQL] Adds TIMESTAMP_SECONDS, TIMESTAMP_MILLISECONDS and TIMESTAMP_MICROSECONDS functions

2020-05-20 Thread GitBox



SparkQA commented on pull request #28534:
URL: https://github.com/apache/spark/pull/28534#issuecomment-631909451


   **[Test build #122915 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122915/testReport)** for PR 28534 at commit [`97263e8`](https://github.com/apache/spark/commit/97263e84877bb767e476cfde635e5fb991dc4d6e).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #28534: [SPARK-31710][SQL] Adds TIMESTAMP_SECONDS, TIMESTAMP_MILLISECONDS and TIMESTAMP_MICROSECONDS functions

2020-05-20 Thread GitBox



AmplabJenkins removed a comment on pull request #28534:
URL: https://github.com/apache/spark/pull/28534#issuecomment-631892624








[GitHub] [spark] AmplabJenkins commented on pull request #28534: [SPARK-31710][SQL] Adds TIMESTAMP_SECONDS, TIMESTAMP_MILLISECONDS and TIMESTAMP_MICROSECONDS functions

2020-05-20 Thread GitBox



AmplabJenkins commented on pull request #28534:
URL: https://github.com/apache/spark/pull/28534#issuecomment-631892624








[GitHub] [spark] SparkQA commented on pull request #28534: [SPARK-31710][SQL] Adds TIMESTAMP_SECONDS, TIMESTAMP_MILLISECONDS and TIMESTAMP_MICROSECONDS functions

2020-05-20 Thread GitBox



SparkQA commented on pull request #28534:
URL: https://github.com/apache/spark/pull/28534#issuecomment-631892304


   **[Test build #122913 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122913/testReport)** for PR 28534 at commit [`370bba7`](https://github.com/apache/spark/commit/370bba7676689deb92617303bce3bb173a62e0c3).




[GitHub] [spark] SparkQA commented on pull request #28595: [SPARK-31781][ML][PySpark] Move param k (number of clusters) to shared params

2020-05-20 Thread GitBox



SparkQA commented on pull request #28595:
URL: https://github.com/apache/spark/pull/28595#issuecomment-631892298


   **[Test build #122914 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122914/testReport)** for PR 28595 at commit [`2d6e87b`](https://github.com/apache/spark/commit/2d6e87b1892683b6035bfdf6ed8c475bada5bbc8).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #28595: [SPARK-31781][ML][PySpark] Move param k (number of clusters) to shared params

2020-05-20 Thread GitBox



AmplabJenkins removed a comment on pull request #28595:
URL: https://github.com/apache/spark/pull/28595#issuecomment-631890539








[GitHub] [spark] AmplabJenkins commented on pull request #28595: [SPARK-31781][ML][PySpark] Move param k (number of clusters) to shared params

2020-05-20 Thread GitBox



AmplabJenkins commented on pull request #28595:
URL: https://github.com/apache/spark/pull/28595#issuecomment-631890539








[GitHub] [spark] HyukjinKwon commented on a change in pull request #28534: [SPARK-31710][SQL] Adds TIMESTAMP_SECONDS, TIMESTAMP_MILLISECONDS and TIMESTAMP_MICROSECONDS functions

2020-05-20 Thread GitBox



HyukjinKwon commented on a change in pull request #28534:
URL: https://github.com/apache/spark/pull/28534#discussion_r42873



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
##
@@ -424,6 +424,9 @@ object FunctionRegistry {
 expression[MakeInterval]("make_interval"),
 expression[DatePart]("date_part"),
 expression[Extract]("extract"),
+expression[SecondsToTimestamp]("timestamp_seconds"),
+expression[MilliSecondsToTimestamp]("timestamp_milliseconds"),
+expression[MicroSecondsToTimestamp]("timestamp_microseconds"),

Review comment:
   Not a big deal but yeah let's do that.






[GitHub] [spark] TJX2014 commented on a change in pull request #28534: [SPARK-31710][SQL] Adds TIMESTAMP_SECONDS, TIMESTAMP_MILLISECONDS and TIMESTAMP_MICROSECONDS functions

2020-05-20 Thread GitBox



TJX2014 commented on a change in pull request #28534:
URL: https://github.com/apache/spark/pull/28534#discussion_r428443365



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
##
@@ -424,6 +424,9 @@ object FunctionRegistry {
 expression[MakeInterval]("make_interval"),
 expression[DatePart]("date_part"),
 expression[Extract]("extract"),
+expression[SecondsToTimestamp]("timestamp_seconds"),
+expression[MilliSecondsToTimestamp]("timestamp_milliseconds"),
+expression[MicroSecondsToTimestamp]("timestamp_microseconds"),

Review comment:
   @cloud-fan, does this mean we should rename `MilliSecondsToTimestamp` to `MillisToTimestamp` and `MicroSecondsToTimestamp` to `MicrosToTimestamp`? I will correct it if needed.






[GitHub] [spark] TJX2014 commented on a change in pull request #28534: [SPARK-31710][SQL] Adds TIMESTAMP_SECONDS, TIMESTAMP_MILLISECONDS and TIMESTAMP_MICROSECONDS functions

2020-05-20 Thread GitBox



TJX2014 commented on a change in pull request #28534:
URL: https://github.com/apache/spark/pull/28534#discussion_r428443365



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
##
@@ -424,6 +424,9 @@ object FunctionRegistry {
 expression[MakeInterval]("make_interval"),
 expression[DatePart]("date_part"),
 expression[Extract]("extract"),
+expression[SecondsToTimestamp]("timestamp_seconds"),
+expression[MilliSecondsToTimestamp]("timestamp_milliseconds"),
+expression[MicroSecondsToTimestamp]("timestamp_microseconds"),

Review comment:
   @cloud-fan, does this mean we should rename `MilliSecondsToTimestamp` to `MillisToTimestamp` and `MicroSecondsToTimestamp` to `MicrosToTimestamp`?






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28534: [SPARK-31710][SQL] Adds TIMESTAMP_SECONDS, TIMESTAMP_MILLISECONDS and TIMESTAMP_MICROSECONDS functions

2020-05-20 Thread GitBox



AmplabJenkins removed a comment on pull request #28534:
URL: https://github.com/apache/spark/pull/28534#issuecomment-631876701








[GitHub] [spark] AmplabJenkins commented on pull request #28534: [SPARK-31710][SQL] Adds TIMESTAMP_SECONDS, TIMESTAMP_MILLISECONDS and TIMESTAMP_MICROSECONDS functions

2020-05-20 Thread GitBox



AmplabJenkins commented on pull request #28534:
URL: https://github.com/apache/spark/pull/28534#issuecomment-631876701








[GitHub] [spark] SparkQA commented on pull request #28534: [SPARK-31710][SQL] Adds TIMESTAMP_SECONDS, TIMESTAMP_MILLISECONDS and TIMESTAMP_MICROSECONDS functions

2020-05-20 Thread GitBox



SparkQA commented on pull request #28534:
URL: https://github.com/apache/spark/pull/28534#issuecomment-631876272


   **[Test build #122912 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122912/testReport)** for PR 28534 at commit [`a6383a1`](https://github.com/apache/spark/commit/a6383a188f5bc0da524793ac34bfa823bf55adbf).




[GitHub] [spark] cloud-fan commented on a change in pull request #28534: [SPARK-31710][SQL] Adds TIMESTAMP_SECONDS, TIMESTAMP_MILLISECONDS and TIMESTAMP_MICROSECONDS functions

2020-05-20 Thread GitBox



cloud-fan commented on a change in pull request #28534:
URL: https://github.com/apache/spark/pull/28534#discussion_r428439032



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
##
@@ -424,6 +424,9 @@ object FunctionRegistry {
 expression[MakeInterval]("make_interval"),
 expression[DatePart]("date_part"),
 expression[Extract]("extract"),
+expression[SecondsToTimestamp]("timestamp_seconds"),
+expression[MilliSecondsToTimestamp]("timestamp_milliseconds"),
+expression[MicroSecondsToTimestamp]("timestamp_microseconds"),

Review comment:
   Oh, I didn't realize it's `MILLIS`, not `MILLISECONDS`. Let's follow the short name.
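
For readers following the naming discussion, here is a rough sketch of what the three functions compute, assuming the target timestamp is a count of microseconds since the Unix epoch. The object, constant and method names are hypothetical and only echo the short-name style agreed on here, and the overflow behaviour is a choice made for this sketch rather than something stated in the PR:

```scala
object TimestampUnitsSketch {
  // Conversion factors to microseconds (names are illustrative only).
  val MicrosPerSecond: Long = 1000000L
  val MicrosPerMilli: Long = 1000L

  // Each method returns microseconds since the Unix epoch.
  // multiplyExact throws ArithmeticException on overflow instead of wrapping.
  def timestampSeconds(seconds: Long): Long =
    java.lang.Math.multiplyExact(seconds, MicrosPerSecond)

  def timestampMillis(millis: Long): Long =
    java.lang.Math.multiplyExact(millis, MicrosPerMilli)

  // Microsecond input already has the target unit, so it passes through unchanged.
  def timestampMicros(micros: Long): Long = micros
}
```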






[GitHub] [spark] imback82 commented on pull request #28123: [SPARK-31350][SQL] Coalesce bucketed tables for join if applicable

2020-05-20 Thread GitBox



imback82 commented on pull request #28123:
URL: https://github.com/apache/spark/pull/28123#issuecomment-631869859


   > 1. It's not possible to have an accurate cost model to guide this 
optimization. Do we have a heuristic? like coalescing 1 buckets to 2 is 
very likely to cause regression as parallelism is reduced too much.
   
   A heuristic is discussed [here](https://github.com/apache/spark/pull/28123/files#r411809034) to set the default value for `spark.sql.bucketing.coalesceBucketsInJoin.maxNumBucketsDiff`. The config should help prevent the regression caused by reduced parallelism.
   
   > 2. Are we going to support joins with more than 2 tables? e.g. "100 
buckets table" join "50 buckets table" join "10 buckets table".
   
   Good idea. I can do a follow-up PR to improve the current rule (which 
doesn't support nested joins)?
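
To make the role of `maxNumBucketsDiff` concrete, here is a standalone sketch of the eligibility check from the diff under review. It assumes the config value is passed in as a plain `Int` parameter instead of being read from `SQLConf`; the `CoalesceHeuristicSketch` object and the limit of 256 mentioned afterwards are illustrative, not the PR's defaults:

```scala
object CoalesceHeuristicSketch {
  // Returns the coalesced bucket count when the two tables are eligible, else None.
  // Follows the condition in the quoted diff, which uses `<=` for the difference check.
  def mayCoalesce(numBuckets1: Int, numBuckets2: Int, maxNumBucketsDiff: Int): Option[Int] = {
    require(numBuckets1 > 0 && numBuckets2 > 0 && numBuckets1 != numBuckets2)
    val small = math.min(numBuckets1, numBuckets2)
    val large = math.max(numBuckets1, numBuckets2)
    if (large % small == 0 && (large - small) <= maxNumBucketsDiff) Some(small) else None
  }
}
```

With a limit of 256, `mayCoalesce(8, 4, 256)` yields `Some(4)`, while `mayCoalesce(1000, 2, 256)` yields `None` even though 1000 is divisible by 2, because coalescing would cut the join's parallelism too sharply.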
   
   




[GitHub] [spark] cloud-fan commented on pull request #28523: [SPARK-31706][SQL] add back the support of streaming update mode

2020-05-20 Thread GitBox



cloud-fan commented on pull request #28523:
URL: https://github.com/apache/spark/pull/28523#issuecomment-631869094


   I thought you had canceled your veto in https://github.com/apache/spark/pull/28523#issuecomment-628164147
   
   BTW, if someone leaves a veto, please actively defend it. You can't just leave a veto and then disappear, assuming no one would merge the PR. It's a 3.0 blocker and your last review was 7 days ago. If you are busy with other stuff, please let people know when you will be able to come back to review.




[GitHub] [spark] imback82 commented on a change in pull request #28123: [SPARK-31350][SQL] Coalesce bucketed tables for join if applicable

2020-05-20 Thread GitBox



imback82 commented on a change in pull request #28123:
URL: https://github.com/apache/spark/pull/28123#discussion_r428435196



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/bucketing/CoalesceBucketsInEquiJoin.scala
##
@@ -0,0 +1,127 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.bucketing
+
+import org.apache.spark.sql.catalyst.catalog.BucketSpec
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.planning.ExtractEquiJoinKeys
+import org.apache.spark.sql.catalyst.plans.logical.{Filter, Join, LogicalPlan, 
Project, UnaryNode}
+import org.apache.spark.sql.catalyst.rules.Rule
+import org.apache.spark.sql.execution.datasources.{HadoopFsRelation, 
LogicalRelation}
+import org.apache.spark.sql.internal.SQLConf
+
+/**
+ * Wraps `LogicalRelation` to provide the number of buckets for coalescing.
+ */
+case class CoalesceBuckets(
+numCoalescedBuckets: Int,
+child: LogicalRelation) extends UnaryNode {
+  require(numCoalescedBuckets > 0,
+s"Number of coalesced buckets ($numCoalescedBuckets) must be positive.")
+
+  override def output: Seq[Attribute] = child.output
+}
+
+/**
+ * This rule adds a `CoalesceBuckets` logical plan if the following conditions 
are met:
+ *   - Two bucketed tables are joined.
+ *   - Join is the equi-join.
+ *   - The larger bucket number is divisible by the smaller bucket number.
+ *   - "spark.sql.bucketing.coalesceBucketsInJoin.enabled" is set to true.
+ *   - The difference in the number of buckets is less than the value set in
+ * "spark.sql.bucketing.coalesceBucketsInJoin.maxNumBucketsDiff".
+ */
+object CoalesceBucketsInEquiJoin extends Rule[LogicalPlan]  {
+  private def isPlanEligible(plan: LogicalPlan): Boolean = {
+def forall(plan: LogicalPlan)(p: LogicalPlan => Boolean): Boolean = {
+  p(plan) && plan.children.forall(forall(_)(p))
+}
+
+forall(plan) {
+  case _: Filter | _: Project | _: LogicalRelation => true
+  case _ => false
+}
+  }
+
+  private def getBucketSpec(plan: LogicalPlan): Option[BucketSpec] = {
+if (isPlanEligible(plan)) {
+  plan.collectFirst {
+case _ @ LogicalRelation(r: HadoopFsRelation, _, _, _) if 
r.bucketSpec.nonEmpty =>
+  r.bucketSpec.get
+  }
+} else {
+  None
+}
+  }
+
+  private def mayCoalesce(numBuckets1: Int, numBuckets2: Int, conf: SQLConf): 
Option[Int] = {
+assert(numBuckets1 != numBuckets2)
+val (small, large) = (math.min(numBuckets1, numBuckets2), 
math.max(numBuckets1, numBuckets2))
+// A bucket can be coalesced only if the bigger number of buckets is 
divisible by the smaller
+// number of buckets because bucket id is calculated by modding the total 
number of buckets.
+if ((large % small == 0) && ((large - small) <= 
conf.coalesceBucketsInJoinMaxNumBucketsDiff)) {
+  Some(small)
+} else {
+  None
+}
+  }
+
+  private def addCoalesceBuckets(plan: LogicalPlan, numCoalescedBuckets: Int): 
LogicalPlan = {
+plan.transformUp {
+  case l @ LogicalRelation(_: HadoopFsRelation, _, _, _) =>
+CoalesceBuckets(numCoalescedBuckets, l)
+}
+  }
+
+  object ExtractJoinWithBuckets {
+def unapply(plan: LogicalPlan): Option[(Join, Int, Int)] = {
+  plan match {
+case join @ ExtractEquiJoinKeys(_, _, _, _, left, right, _) =>

Review comment:
   Sure, I guess the concern here is the unnecessarily reduced parallelism 
if a broadcast join is picked.
   
   Is `QueryExecution.preparations` a reasonable place to insert the rule? 
(That seems to be the only place where you can plug in a rule that takes in a 
`SparkPlan` and returns a `SparkPlan`.)
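
For context on the divisibility condition in the quoted `mayCoalesce`, the following standalone sketch checks the underlying arithmetic. It assumes a bucket id is a non-negative modulus of a hash value; the object and method names are hypothetical, and Spark's actual hashing is not reproduced here.

```scala
// Standalone sketch. Assumption: bucketId = non-negative (hash mod numBuckets).
object BucketIdSketch {
  def bucketId(hash: Int, numBuckets: Int): Int = Math.floorMod(hash, numBuckets)

  /**
   * True if, for every sampled hash, re-modding an id computed with `large` buckets
   * gives the same id as bucketing directly into `small` buckets.
   */
  def coalescesCleanly(large: Int, small: Int, hashes: Range): Boolean =
    hashes.forall(h => bucketId(bucketId(h, large), small) == bucketId(h, small))

  def main(args: Array[String]): Unit = {
    println(coalescesCleanly(8, 4, -1000 to 1000)) // true: 8 is divisible by 4
    println(coalescesCleanly(6, 4, -1000 to 1000)) // false: e.g. hash 6 maps to 0 instead of 2
  }
}
```

When the larger count is a multiple of the smaller one, rows from several large-table buckets land in exactly one small-table bucket, so the coalesced side lines up with the other side of the join.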






[GitHub] [spark] HyukjinKwon commented on a change in pull request #28534: [SPARK-31710][SQL]TIMESTAMP_SECONDS, TIMESTAMP_MILLISECONDS and TIMESTAMP_MICROSECONDS to timestamp transfer

2020-05-20 Thread GitBox



HyukjinKwon commented on a change in pull request #28534:
URL: https://github.com/apache/spark/pull/28534#discussion_r428433186



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
##
@@ -401,6 +401,81 @@ case class DayOfYear(child: Expression) extends 
UnaryExpression with ImplicitCas
   }
 }
 
+abstract class NumberToTimestampBase extends UnaryExpression
+  with ImplicitCastInputTypes {
+
+  protected def upScaleFactor: Long
+
+  override def inputTypes: Seq[AbstractDataType] = Seq(LongType)
+
+  override def dataType: DataType = TimestampType
+
+  override def nullSafeEval(input: Any): Any = {
+Math.multiplyExact(input.asInstanceOf[Long], upScaleFactor)
+  }
+
+  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): 
ExprCode = {
+if (upScaleFactor == 1) {
+  defineCodeGen(ctx, ev, c => c)
+} else {
+  defineCodeGen(ctx, ev, c => s"java.lang.Math.multiplyExact($c, 
$upScaleFactor)")
+}
+  }
+}
+
+@ExpressionDescription(
+  usage = "_FUNC_(seconds) - Creates timestamp from the number of seconds 
since UTC epoch.",
+  examples = """
+Examples:
+  > SELECT _FUNC_(1230219000);
+   2008-12-25 07:30:00
+  """,
+  group = "datetime_funcs",
+  since = "3.1.0")
+case class SecondsToTimestamp(child: Expression)
+  extends NumberToTimestampBase {
+
+  override def upScaleFactor: Long = MICROS_PER_SECOND
+
+  override def prettyName: String = "timestamp_seconds"
+}
+
+@ExpressionDescription(
+  usage = "_FUNC_(milliseconds) - " +
+"Creates timestamp from the number of milliseconds since UTC epoch.",

Review comment:
   nit: let's do either:
   
   ```
   usage = """
 _FUNC_(milliseconds) - Creates timestamp from the number of milliseconds
   since UTC epoch.
   ...
   """
   ```
   
   or
   
   ```
   // scalastyle:off line.size.limit
   @ExpressionDescription(
 usage = "_FUNC_(milliseconds) - Creates timestamp from the number of 
milliseconds since UTC epoch.",
 ...
   // scalastyle:on line.size.limit
   ```
   
   just for the sake of consistency.
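
For readers following the diff above, here is a minimal sketch of the overflow-checked scaling that `NumberToTimestampBase` performs with `Math.multiplyExact`. It runs without Spark; the object and constant names are assumptions for illustration. Note that the PR's doc example shows `2008-12-25 07:30:00`, which is presumably rendered in a UTC-8 session time zone, while the sketch prints the same instant in UTC.

```scala
import java.time.Instant

// Standalone sketch of epoch-to-microsecond scaling with overflow checking.
object EpochToMicrosSketch {
  val MicrosPerSecond: Long = 1000000L
  val MicrosPerMilli: Long = 1000L

  // multiplyExact throws ArithmeticException instead of silently wrapping around.
  def secondsToMicros(seconds: Long): Long = Math.multiplyExact(seconds, MicrosPerSecond)
  def millisToMicros(millis: Long): Long = Math.multiplyExact(millis, MicrosPerMilli)

  def main(args: Array[String]): Unit = {
    println(secondsToMicros(1230219000L))       // 1230219000000000
    println(Instant.ofEpochSecond(1230219000L)) // 2008-12-25T15:30:00Z
  }
}
```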






[GitHub] [spark] cloud-fan commented on pull request #28582: [SPARK-31762][SQL] Fix perf regression of date/timestamp formatting in toHiveString

2020-05-20 Thread GitBox



cloud-fan commented on pull request #28582:
URL: https://github.com/apache/spark/pull/28582#issuecomment-631865472


   thanks, merging to master/3.0!




[GitHub] [spark] cloud-fan closed pull request #28582: [SPARK-31762][SQL] Fix perf regression of date/timestamp formatting in toHiveString

2020-05-20 Thread GitBox



cloud-fan closed pull request #28582:
URL: https://github.com/apache/spark/pull/28582


   




[GitHub] [spark] HyukjinKwon commented on a change in pull request #28534: [SPARK-31710][SQL]TIMESTAMP_SECONDS, TIMESTAMP_MILLISECONDS and TIMESTAMP_MICROSECONDS to timestamp transfer

2020-05-20 Thread GitBox



HyukjinKwon commented on a change in pull request #28534:
URL: https://github.com/apache/spark/pull/28534#discussion_r428431767



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
##
@@ -424,6 +424,9 @@ object FunctionRegistry {
 expression[MakeInterval]("make_interval"),
 expression[DatePart]("date_part"),
 expression[Extract]("extract"),
+expression[SecondsToTimestamp]("timestamp_seconds"),
+expression[MilliSecondsToTimestamp]("timestamp_milliseconds"),
+expression[MicroSecondsToTimestamp]("timestamp_microseconds"),

Review comment:
   I think it's okay to add these expressions, presumably to get away from the 
`cast(ts as long)` behaviour, since adding them is not invasive.  But it would be 
interesting to see how other DBMSes solved this problem. From a quick search, 
vendors take different approaches.
   
   Seems like BigQuery has these three functions 
(https://cloud.google.com/bigquery/docs/reference/standard-sql/timestamp_functions?hl=en#timestamp_seconds).
 Shall we match the naming at least?






[GitHub] [spark] HyukjinKwon commented on a change in pull request #28534: [SPARK-31710][SQL]TIMESTAMP_SECONDS, TIMESTAMP_MILLISECONDS and TIMESTAMP_MICROSECONDS to timestamp transfer

2020-05-20 Thread GitBox



HyukjinKwon commented on a change in pull request #28534:
URL: https://github.com/apache/spark/pull/28534#discussion_r428431767



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
##
@@ -424,6 +424,9 @@ object FunctionRegistry {
 expression[MakeInterval]("make_interval"),
 expression[DatePart]("date_part"),
 expression[Extract]("extract"),
+expression[SecondsToTimestamp]("timestamp_seconds"),
+expression[MilliSecondsToTimestamp]("timestamp_milliseconds"),
+expression[MicroSecondsToTimestamp]("timestamp_microseconds"),

Review comment:
   I think it's okay to add these expressions, presumably to get away from the 
`cast(ts as long)` behaviour, since adding them is not invasive.  But it would be 
interesting to see how other DBMSes solved this problem. From a quick search, 
vendors take different approaches.
   
   Seems like BigQuery has these three functions 
(https://cloud.google.com/bigquery/docs/reference/standard-sql/timestamp_functions?hl=ko#timestamp_seconds).
 Shall we match the naming at least?






[GitHub] [spark] HyukjinKwon commented on pull request #28534: [SPARK-31710][SQL]TIMESTAMP_SECONDS, TIMESTAMP_MILLISECONDS and TIMESTAMP_MICROSECONDS to timestamp transfer

2020-05-20 Thread GitBox



HyukjinKwon commented on pull request #28534:
URL: https://github.com/apache/spark/pull/28534#issuecomment-631862270


   I would prefer this way over #28593. 




[GitHub] [spark] HyukjinKwon commented on a change in pull request #28593: [SPARK-31710][SQL] Add two compatibility flag to cast long to timestamp

2020-05-20 Thread GitBox



HyukjinKwon commented on a change in pull request #28593:
URL: https://github.com/apache/spark/pull/28593#discussion_r428428811



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##
@@ -2586,6 +2586,22 @@ object SQLConf {
   .checkValue(_ > 0, "The timeout value must be positive")
   .createWithDefault(10L)
 
+  val LEGACY_NUMERIC_CONVERT_TO_TIMESTAMP_ENABLE =
+buildConf("spark.sql.legacy.numericConvertToTimestampEnable")
+  .doc("when true,use legacy numberic can convert to timestamp")
+  .version("3.0.0")
+  .booleanConf
+  .createWithDefault(false)
+
+  val LEGACY_NUMERIC_CONVERT_TO_TIMESTAMP_IN_SECONDS =
+buildConf("spark.sql.legacy.numericConvertToTimestampInSeconds")
+  .internal()
+  .doc("The legacy only works when 
LEGACY_NUMERIC_CONVERT_TO_TIMESTAMP_ENABLE is true." +
+"when true,the value will be  interpreted as seconds,which follow 
spark style," +
+"when false,value is interpreted as milliseconds,which follow hive 
style")

Review comment:
   Sorry, but I still can't follow why Spark should care about 
following Hive style here. Most likely legacy users already depend 
on this behaviour, and a few users might have had to work around it themselves. 
I don't think even `cast(ts as long)` is a standard or widely accepted 
behaviour. -1 from me.
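
To make the disagreement concrete, here is a small sketch of how the same long value maps to very different instants under the two interpretations described in the proposed config doc (seconds for "Spark style", milliseconds for "Hive style"). The helper name and the `inSeconds` flag are hypothetical and only stand in for the proposed setting; this is not the PR's implementation.

```scala
import java.time.Instant

// Standalone sketch: one numeric value, two interpretations.
object LongToTimestampSketch {
  /** `inSeconds` is a hypothetical stand-in for the proposed legacy flag. */
  def toInstant(value: Long, inSeconds: Boolean): Instant =
    if (inSeconds) Instant.ofEpochSecond(value) else Instant.ofEpochMilli(value)

  def main(args: Array[String]): Unit = {
    println(toInstant(1230219000L, inSeconds = true))  // 2008-12-25T15:30:00Z
    println(toInstant(1230219000L, inSeconds = false)) // 1970-01-15T05:43:39Z
  }
}
```

A query that silently switches between these two readings changes results by decades, which is why the default matters to existing users.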






[GitHub] [spark] HyukjinKwon commented on a change in pull request #28593: [SPARK-31710][SQL] Add two compatibility flag to cast long to timestamp

2020-05-20 Thread GitBox



HyukjinKwon commented on a change in pull request #28593:
URL: https://github.com/apache/spark/pull/28593#discussion_r428428811



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##
@@ -2586,6 +2586,22 @@ object SQLConf {
   .checkValue(_ > 0, "The timeout value must be positive")
   .createWithDefault(10L)
 
+  val LEGACY_NUMERIC_CONVERT_TO_TIMESTAMP_ENABLE =
+buildConf("spark.sql.legacy.numericConvertToTimestampEnable")
+  .doc("when true,use legacy numberic can convert to timestamp")
+  .version("3.0.0")
+  .booleanConf
+  .createWithDefault(false)
+
+  val LEGACY_NUMERIC_CONVERT_TO_TIMESTAMP_IN_SECONDS =
+buildConf("spark.sql.legacy.numericConvertToTimestampInSeconds")
+  .internal()
+  .doc("The legacy only works when 
LEGACY_NUMERIC_CONVERT_TO_TIMESTAMP_ENABLE is true." +
+"when true,the value will be  interpreted as seconds,which follow 
spark style," +
+"when false,value is interpreted as milliseconds,which follow hive 
style")

Review comment:
   Sorry, but I still can't follow why Spark should follow Hive style. Most 
likely legacy users already depend on this behaviour, and a few users 
might have had to work around it themselves. I don't think even `cast(ts as 
long)` is a standard or widely accepted behaviour. -1 from me.






[GitHub] [spark] HyukjinKwon commented on a change in pull request #28593: [SPARK-31710][SQL] Add two compatibility flag to cast long to timestamp

2020-05-20 Thread GitBox



HyukjinKwon commented on a change in pull request #28593:
URL: https://github.com/apache/spark/pull/28593#discussion_r428428811



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##
@@ -2586,6 +2586,22 @@ object SQLConf {
   .checkValue(_ > 0, "The timeout value must be positive")
   .createWithDefault(10L)
 
+  val LEGACY_NUMERIC_CONVERT_TO_TIMESTAMP_ENABLE =
+buildConf("spark.sql.legacy.numericConvertToTimestampEnable")
+  .doc("when true,use legacy numberic can convert to timestamp")
+  .version("3.0.0")
+  .booleanConf
+  .createWithDefault(false)
+
+  val LEGACY_NUMERIC_CONVERT_TO_TIMESTAMP_IN_SECONDS =
+buildConf("spark.sql.legacy.numericConvertToTimestampInSeconds")
+  .internal()
+  .doc("The legacy only works when 
LEGACY_NUMERIC_CONVERT_TO_TIMESTAMP_ENABLE is true." +
+"when true,the value will be  interpreted as seconds,which follow 
spark style," +
+"when false,value is interpreted as milliseconds,which follow hive 
style")

Review comment:
   Sorry, but I still can't follow why Spark should follow Hive style even 
by default. Most likely legacy users already depend on this 
behaviour, and a few users might have had to work around it themselves. I don't 
think even `cast(ts as long)` is a standard or widely accepted behaviour. 
-1 from me.






[GitHub] [spark] duanmeng commented on a change in pull request #28525: [SPARK-27562][Shuffle] Complete the verification mechanism for shuffle transmitted data

2020-05-20 Thread GitBox



duanmeng commented on a change in pull request #28525:
URL: https://github.com/apache/spark/pull/28525#discussion_r428428247



##
File path: 
core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala
##
@@ -626,16 +628,61 @@ final class ShuffleBlockFetcherIterator(
   buf.release()
   throwFetchFailedException(blockId, mapIndex, address, e)
   }
+
+  // If shuffle digest enabled is true, check the block with checkSum.
+  var failedOnDigestCheck = false
+  if (digestEnabled) {
+if (digest >= 0) {
+  val digestToCheck = try {
+DigestUtils.getDigest(in)
+  } catch {
+case e: IOException =>
+  logError("Error occurs when checking digest", e)
+  buf.release()
+  throwFetchFailedException(blockId, mapIndex, address, e)
+  }
+  failedOnDigestCheck = digest != digestToCheck
+  if (!failedOnDigestCheck) {

Review comment:
   should be `if (failedOnDigestCheck)`
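
The intended control flow can be sketched outside Spark as follows. The PR's `DigestUtils.getDigest` is not shown in this thread, so the sketch substitutes `java.util.zip.CRC32` as an assumed stand-in; the object name and the `Either` return type are also illustrative choices, not the PR's API.

```scala
import java.util.zip.CRC32

// Standalone sketch of "fail when the digests differ". CRC32 is an assumed stand-in
// for the PR's DigestUtils.getDigest.
object DigestCheckSketch {
  def crc32(bytes: Array[Byte]): Long = {
    val crc = new CRC32()
    crc.update(bytes)
    crc.getValue
  }

  /** Left(message) when the digest does not match, Right(bytes) when it does. */
  def verify(bytes: Array[Byte], expectedDigest: Long): Either[String, Array[Byte]] = {
    val actualDigest = crc32(bytes)
    val failedOnDigestCheck = expectedDigest != actualDigest
    // The failure branch must run when the check failed, not when it passed.
    if (failedOnDigestCheck) Left(s"digest mismatch: expected $expectedDigest, got $actualDigest")
    else Right(bytes)
  }
}
```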






[GitHub] [spark] duanmeng commented on a change in pull request #28525: [SPARK-27562][Shuffle] Complete the verification mechanism for shuffle transmitted data

2020-05-20 Thread GitBox



duanmeng commented on a change in pull request #28525:
URL: https://github.com/apache/spark/pull/28525#discussion_r428427803



##
File path: 
core/src/main/scala/org/apache/spark/shuffle/IndexShuffleBlockResolver.scala
##
@@ -170,11 +190,38 @@ private[spark] class IndexShuffleBlockResolver(
   // There is only one IndexShuffleBlockResolver per executor, this 
synchronization make sure
   // the following check and rename are atomic.
   synchronized {
-val existingLengths = checkIndexAndDataFile(indexFile, dataFile, 
lengths.length)
-if (existingLengths != null) {
+val digests = new Array[Long](lengths.length)
+val dateIn = if (dataTmp != null && dataTmp.exists()) {
+  new FileInputStream(dataTmp)
+} else {
+  null
+}
+Utils.tryWithSafeFinally {
+  if (digestEnable && dateIn != null) {
+for (i <- (0 until lengths.length)) {
+  val length = lengths(i)
+  if (length == 0) {
+digests(i) = -1L
+  } else {
+digests(i) = DigestUtils.getDigest(new 
LimitedInputStream(dateIn, length))
+  }
+}
+  }
+} {
+  if (dateIn != null) {
+dateIn.close()
+  }
+}
+
+val existingLengthsDigests =
+  checkIndexAndDataFile(indexFile, dataFile, lengths.length, digests)
+if (existingLengthsDigests != null) {
+  val existingLengths = existingLengthsDigests._1
+  val existingDigests = existingLengthsDigests._2
   // Another attempt for the same task has already written our map 
outputs successfully,
   // so just use the existing partition lengths and delete our 
temporary map outputs.
   System.arraycopy(existingLengths, 0, lengths, 0, lengths.length)
+  System.arraycopy(existingDigests, 0, digests, 0, digests.length)

Review comment:
   This may be unnecessary






[GitHub] [spark] cloud-fan commented on a change in pull request #28128: [SPARK-31354] SparkContext only register one SparkSession ApplicationEnd listener

2020-05-20 Thread GitBox



cloud-fan commented on a change in pull request #28128:
URL: https://github.com/apache/spark/pull/28128#discussion_r428427640



##
File path: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala
##
@@ -1064,6 +1055,20 @@ object SparkSession extends Logging {
   // Private methods from now on
   

 
+  private val listenerRegistered: AtomicBoolean = new AtomicBoolean(false)
+
+  /** Register the AppEnd listener onto the Context  */
+  private def registerContextListener(sparkContext: SparkContext): Unit = {
+if (!SparkSession.listenerRegistered.get()) {

Review comment:
   nit: we are in the same class so we can remove `SparkSession.`

##
File path: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala
##
@@ -1064,6 +1055,20 @@ object SparkSession extends Logging {
   // Private methods from now on
   

 
+  private val listenerRegistered: AtomicBoolean = new AtomicBoolean(false)
+
+  /** Register the AppEnd listener onto the Context  */
+  private def registerContextListener(sparkContext: SparkContext): Unit = {
+if (!SparkSession.listenerRegistered.get()) {
+  sparkContext.addSparkListener(new SparkListener {
+override def onApplicationEnd(applicationEnd: 
SparkListenerApplicationEnd): Unit = {
+  defaultSession.set(null)
+}
+  })
+  SparkSession.listenerRegistered.set(true)

Review comment:
   ditto
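
Separately from the nit above, the quoted `get()` followed by `set(true)` is a check-then-act sequence, so two threads could both pass the check. A possible alternative, shown here only as a sketch and not as what this PR does, is `AtomicBoolean.compareAndSet`. The class name and the `register` callback are hypothetical.

```scala
import java.util.concurrent.atomic.AtomicBoolean

// Standalone sketch: run a registration action at most once, even under concurrent calls.
final class RegisterOnceSketch {
  private val listenerRegistered = new AtomicBoolean(false)

  /** Runs `register` only for the first caller; returns whether this call ran it. */
  def registerOnce(register: () => Unit): Boolean =
    if (listenerRegistered.compareAndSet(false, true)) {
      register()
      true
    } else {
      false
    }
}
```

One trade-off of this shape is that the flag is set before `register` runs, so if registration throws, later calls will not retry.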






[GitHub] [spark] WeichenXu123 commented on pull request #28584: [SPARK-31730][CORE][TEST] Fix flaky tests in BarrierTaskContextSuite

2020-05-20 Thread GitBox



WeichenXu123 commented on pull request #28584:
URL: https://github.com/apache/spark/pull/28584#issuecomment-631859882


   Also fix this test like:
   ```
     test("share messages with allGather() call") {
       val conf = new SparkConf()
         .setMaster("local-cluster[4, 1, 1024]")
         .setAppName("test-cluster")
       sc = new SparkContext(conf)
       val rdd = sc.makeRDD(1 to 4, 4)
       val rdd2 = rdd.barrier().mapPartitions { it =>
         val context = BarrierTaskContext.get()
         // Sleep for a random time before global sync.
         Thread.sleep(Random.nextInt(1000))
         // Pass partitionId message in
         val message: String = context.partitionId().toString
         val messages: Array[String] = context.allGather(message)
         Iterator.single(messages.toList)
       }
       // Take a sorted list of all the partitionId messages
       val messages_list = rdd2.collect()
       assert(messages_list.length === 4)
       for (messages <- messages_list) {
         assert(messages === (0 until 4).map(_.toString).toList)
       }
     }
   ```




[GitHub] [spark] HyukjinKwon commented on a change in pull request #28593: [SPARK-31710][SQL] Add two compatibility flag to cast long to timestamp

2020-05-20 Thread GitBox



HyukjinKwon commented on a change in pull request #28593:
URL: https://github.com/apache/spark/pull/28593#discussion_r428422910



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala
##
@@ -1277,7 +1285,11 @@ abstract class CastBase extends UnaryExpression with 
TimeZoneAwareExpression wit
 val block = inline"new java.math.BigDecimal($MICROS_PER_SECOND)"
 code"($d.toBigDecimal().bigDecimal().multiply($block)).longValue()"
   }
-  private[this] def longToTimeStampCode(l: ExprValue): Block = code"$l * 
(long)$MICROS_PER_SECOND"
+  private[this] def longToTimeStampCode(l: ExprValue): Block = {
+if (SQLConf.get.numericConvertToTimestampInSeconds) code"" +

Review comment:
   Let's change `l` to something else per 
https://github.com/databricks/scala-style-guide#variable-naming while we're 
here.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28594: [SPARK-31780][K8S][TESTS] Add R test tag to exclude R K8s image building and test

2020-05-20 Thread GitBox



AmplabJenkins removed a comment on pull request #28594:
URL: https://github.com/apache/spark/pull/28594#issuecomment-631851933








[GitHub] [spark] AmplabJenkins commented on pull request #28594: [SPARK-31780][K8S][TESTS] Add R test tag to exclude R K8s image building and test

2020-05-20 Thread GitBox



AmplabJenkins commented on pull request #28594:
URL: https://github.com/apache/spark/pull/28594#issuecomment-631851933








[GitHub] [spark] SparkQA removed a comment on pull request #28594: [SPARK-31780][K8S][TESTS] Add R test tag to exclude R K8s image building and test

2020-05-20 Thread GitBox



SparkQA removed a comment on pull request #28594:
URL: https://github.com/apache/spark/pull/28594#issuecomment-631804956


   **[Test build #122908 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122908/testReport)**
 for PR 28594 at commit 
[`0a37e82`](https://github.com/apache/spark/commit/0a37e821a097b0cb842e8cbc23a788afe1929ac5).




[GitHub] [spark] Ngone51 commented on pull request #28572: [SPARK-31750][SQL] Eliminate UpCast if child's dataType is DecimalType

2020-05-20 Thread GitBox



Ngone51 commented on pull request #28572:
URL: https://github.com/apache/spark/pull/28572#issuecomment-631851718


   thanks all!




[GitHub] [spark] SparkQA commented on pull request #28594: [SPARK-31780][K8S][TESTS] Add R test tag to exclude R K8s image building and test

2020-05-20 Thread GitBox



SparkQA commented on pull request #28594:
URL: https://github.com/apache/spark/pull/28594#issuecomment-631851365


   **[Test build #122908 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122908/testReport)**
 for PR 28594 at commit 
[`0a37e82`](https://github.com/apache/spark/commit/0a37e821a097b0cb842e8cbc23a788afe1929ac5).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #28593: [SPARK-31710][SQL] Add two compatibility flag to cast long to timestamp

2020-05-20 Thread GitBox



AmplabJenkins removed a comment on pull request #28593:
URL: https://github.com/apache/spark/pull/28593#issuecomment-631848306


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/122910/
   Test FAILed.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #28593: [SPARK-31710][SQL] Add two compatibility flag to cast long to timestamp

2020-05-20 Thread GitBox



AmplabJenkins removed a comment on pull request #28593:
URL: https://github.com/apache/spark/pull/28593#issuecomment-631848296


   Merged build finished. Test FAILed.




[GitHub] [spark] AmplabJenkins commented on pull request #28593: [SPARK-31710][SQL] Add two compatibility flag to cast long to timestamp

2020-05-20 Thread GitBox



AmplabJenkins commented on pull request #28593:
URL: https://github.com/apache/spark/pull/28593#issuecomment-631848296








[GitHub] [spark] SparkQA removed a comment on pull request #28593: [SPARK-31710][SQL] Add two compatibility flag to cast long to timestamp

2020-05-20 Thread GitBox



SparkQA removed a comment on pull request #28593:
URL: https://github.com/apache/spark/pull/28593#issuecomment-631815642


   **[Test build #122910 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122910/testReport)** for PR 28593 at commit [`7f0ba76`](https://github.com/apache/spark/commit/7f0ba76a5e9d4604b2f586b1e1bc512f7675115b).




[GitHub] [spark] SparkQA commented on pull request #28593: [SPARK-31710][SQL] Add two compatibility flag to cast long to timestamp

2020-05-20 Thread GitBox



SparkQA commented on pull request #28593:
URL: https://github.com/apache/spark/pull/28593#issuecomment-631848096


   **[Test build #122910 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122910/testReport)** for PR 28593 at commit [`7f0ba76`](https://github.com/apache/spark/commit/7f0ba76a5e9d4604b2f586b1e1bc512f7675115b).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #28594: [SPARK-31780][K8S][TESTS] Add R test tag to exclude R K8s image building and test

2020-05-20 Thread GitBox



AmplabJenkins removed a comment on pull request #28594:
URL: https://github.com/apache/spark/pull/28594#issuecomment-631836881








[GitHub] [spark] AmplabJenkins commented on pull request #28594: [SPARK-31780][K8S][TESTS] Add R test tag to exclude R K8s image building and test

2020-05-20 Thread GitBox



AmplabJenkins commented on pull request #28594:
URL: https://github.com/apache/spark/pull/28594#issuecomment-631836881








[GitHub] [spark] SparkQA removed a comment on pull request #28594: [SPARK-31780][K8S][TESTS] Add R test tag to exclude R K8s image building and test

2020-05-20 Thread GitBox



SparkQA removed a comment on pull request #28594:
URL: https://github.com/apache/spark/pull/28594#issuecomment-631791158


   **[Test build #122907 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122907/testReport)** for PR 28594 at commit [`a26fb26`](https://github.com/apache/spark/commit/a26fb26afdeed73fbaaaf91281680e4ac2c41817).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #28595: [SPARK-31781][ML][PySpark] Move param k (number of clusters) to shared params

2020-05-20 Thread GitBox



AmplabJenkins removed a comment on pull request #28595:
URL: https://github.com/apache/spark/pull/28595#issuecomment-631836062


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/122911/
   Test FAILed.




[GitHub] [spark] SparkQA commented on pull request #28594: [SPARK-31780][K8S][TESTS] Add R test tag to exclude R K8s image building and test

2020-05-20 Thread GitBox



SparkQA commented on pull request #28594:
URL: https://github.com/apache/spark/pull/28594#issuecomment-631836233


   **[Test build #122907 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122907/testReport)** for PR 28594 at commit [`a26fb26`](https://github.com/apache/spark/commit/a26fb26afdeed73fbaaaf91281680e4ac2c41817).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #28595: [SPARK-31781][ML][PySpark] Move param k (number of clusters) to shared params

2020-05-20 Thread GitBox



AmplabJenkins removed a comment on pull request #28595:
URL: https://github.com/apache/spark/pull/28595#issuecomment-631836059


   Merged build finished. Test FAILed.




[GitHub] [spark] SparkQA commented on pull request #28595: [SPARK-31781][ML][PySpark] Move param k (number of clusters) to shared params

2020-05-20 Thread GitBox



SparkQA commented on pull request #28595:
URL: https://github.com/apache/spark/pull/28595#issuecomment-631836040


   **[Test build #122911 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122911/testReport)** for PR 28595 at commit [`d259e5e`](https://github.com/apache/spark/commit/d259e5eda4404e34a4ac7c6fc8afef40cddcbe0d).
* This patch **fails MiMa tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
 * `trait HasK extends Params `
 * `class _LDAParams(HasMaxIter, HasFeaturesCol, HasSeed, HasCheckpointInterval, HasK):`
 * `class _PowerIterationClusteringParams(HasMaxIter, HasWeightCol, HasK):`
 * `class HasK(Params):`




[GitHub] [spark] SparkQA removed a comment on pull request #28595: [SPARK-31781][ML][PySpark] Move param k (number of clusters) to shared params

2020-05-20 Thread GitBox



SparkQA removed a comment on pull request #28595:
URL: https://github.com/apache/spark/pull/28595#issuecomment-631833038


   **[Test build #122911 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122911/testReport)** for PR 28595 at commit [`d259e5e`](https://github.com/apache/spark/commit/d259e5eda4404e34a4ac7c6fc8afef40cddcbe0d).




[GitHub] [spark] AmplabJenkins commented on pull request #28595: [SPARK-31781][ML][PySpark] Move param k (number of clusters) to shared params

2020-05-20 Thread GitBox



AmplabJenkins commented on pull request #28595:
URL: https://github.com/apache/spark/pull/28595#issuecomment-631836059








[GitHub] [spark] AmplabJenkins removed a comment on pull request #28595: [SPARK-31781][ML][PySpark] Move param k (number of clusters) to shared params

2020-05-20 Thread GitBox



AmplabJenkins removed a comment on pull request #28595:
URL: https://github.com/apache/spark/pull/28595#issuecomment-631833284








[GitHub] [spark] AmplabJenkins commented on pull request #28595: [SPARK-31781][ML][PySpark] Move param k (number of clusters) to shared params

2020-05-20 Thread GitBox



AmplabJenkins commented on pull request #28595:
URL: https://github.com/apache/spark/pull/28595#issuecomment-631833284








[GitHub] [spark] SparkQA commented on pull request #28595: [SPARK-31781][ML][PySpark] Move param k (number of clusters) to shared params

2020-05-20 Thread GitBox



SparkQA commented on pull request #28595:
URL: https://github.com/apache/spark/pull/28595#issuecomment-631833038


   **[Test build #122911 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122911/testReport)** for PR 28595 at commit [`d259e5e`](https://github.com/apache/spark/commit/d259e5eda4404e34a4ac7c6fc8afef40cddcbe0d).




[GitHub] [spark] huaxingao opened a new pull request #28595: [SPARK-31781][ML][PySpark] Move param k (number of clusters) to shared params

2020-05-20 Thread GitBox



huaxingao opened a new pull request #28595:
URL: https://github.com/apache/spark/pull/28595


   
   ### What changes were proposed in this pull request?
   The param `k` (number of clusters) is used by all the clustering algorithms, so this PR moves it to shared params.
   
   
   ### Why are the changes needed?
   Code reuse
   
   
   ### Does this PR introduce _any_ user-facing change?
   no
   
   
   ### How was this patch tested?
   existing tests
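
The idea of a shared param can be sketched as follows. This is only an illustration under stated assumptions, not the code from this PR: the `Params` class here is a tiny hypothetical stand-in (not `pyspark.ml.param.Params`), `ToyKMeansParams` and `ToyLDAParams` are made-up names, and the `k >= 1` check is an assumed validation rule.

```python
class Params:
    """Hypothetical stand-in for a parameter container (not pyspark's Params)."""

    def __init__(self):
        self._values = {}

    def _set(self, name, value):
        self._values[name] = value
        return self  # returning self allows chained setter calls

    def _get(self, name, default=None):
        return self._values.get(name, default)


class HasK(Params):
    """Shared mixin: `k` is declared once and reused by every algorithm."""

    def setK(self, value):
        # Assumed validation for this sketch only.
        if not isinstance(value, int) or value < 1:
            raise ValueError("k must be a positive integer")
        return self._set("k", value)

    def getK(self):
        return self._get("k")


class HasMaxIter(Params):
    """A second shared mixin, to show how several of them compose."""

    def setMaxIter(self, value):
        return self._set("maxIter", value)

    def getMaxIter(self):
        return self._get("maxIter")


# Two hypothetical algorithm param classes reuse the same `k` definition
# instead of each declaring its own copy.
class ToyKMeansParams(HasMaxIter, HasK):
    pass


class ToyLDAParams(HasMaxIter, HasK):
    pass
```

With this layout, `ToyKMeansParams().setK(3).getK()` and `ToyLDAParams().setK(10).getK()` both go through the single `HasK` definition, which is the code-reuse benefit the description refers to.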
   




[GitHub] [spark] dongjoon-hyun closed pull request #28594: [SPARK-31780][K8S][TESTS] Add R test tag to exclude R K8s image building and test

2020-05-20 Thread GitBox



dongjoon-hyun closed pull request #28594:
URL: https://github.com/apache/spark/pull/28594


   




[GitHub] [spark] dbtsai commented on pull request #28594: [SPARK-31780][K8S][TESTS] Add R test tag to exclude R K8s image building and test

2020-05-20 Thread GitBox



dbtsai commented on pull request #28594:
URL: https://github.com/apache/spark/pull/28594#issuecomment-631825289


   LGTM.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #28594: [SPARK-31780][K8S][TESTS] Add R test tag to exclude R K8s image building and test

2020-05-20 Thread GitBox



AmplabJenkins removed a comment on pull request #28594:
URL: https://github.com/apache/spark/pull/28594#issuecomment-631824322








[GitHub] [spark] SparkQA commented on pull request #28594: [SPARK-31780][K8S][TESTS] Add R test tag to exclude R K8s image building and test

2020-05-20 Thread GitBox



SparkQA commented on pull request #28594:
URL: https://github.com/apache/spark/pull/28594#issuecomment-631824311


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/27551/
   




[GitHub] [spark] AmplabJenkins commented on pull request #28594: [SPARK-31780][K8S][TESTS] Add R test tag to exclude R K8s image building and test

2020-05-20 Thread GitBox



AmplabJenkins commented on pull request #28594:
URL: https://github.com/apache/spark/pull/28594#issuecomment-631824322








[GitHub] [spark] dongjoon-hyun commented on pull request #28594: [SPARK-31780][K8S][TESTS] Add R test tag to exclude R K8s image building and test

2020-05-20 Thread GitBox



dongjoon-hyun commented on pull request #28594:
URL: https://github.com/apache/spark/pull/28594#issuecomment-631817059


   Hi, @dbtsai .
   Could you review this PR?




[GitHub] [spark] AmplabJenkins removed a comment on pull request #28593: [SPARK-31710][SQL] Add two compatibility flag to cast long to timestamp

2020-05-20 Thread GitBox



AmplabJenkins removed a comment on pull request #28593:
URL: https://github.com/apache/spark/pull/28593#issuecomment-631816007








[GitHub] [spark] AmplabJenkins commented on pull request #28593: [SPARK-31710][SQL] Add two compatibility flag to cast long to timestamp

2020-05-20 Thread GitBox



AmplabJenkins commented on pull request #28593:
URL: https://github.com/apache/spark/pull/28593#issuecomment-631816007








[GitHub] [spark] SparkQA commented on pull request #28594: [SPARK-31780][K8S][TESTS] Add R test tag to exclude R K8s image building and test

2020-05-20 Thread GitBox



SparkQA commented on pull request #28594:
URL: https://github.com/apache/spark/pull/28594#issuecomment-631816084


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/27551/
   




[GitHub] [spark] SparkQA commented on pull request #28593: [SPARK-31710][SQL] Add two compatibility flag to cast long to timestamp

2020-05-20 Thread GitBox



SparkQA commented on pull request #28593:
URL: https://github.com/apache/spark/pull/28593#issuecomment-631815642


   **[Test build #122910 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122910/testReport)** for PR 28593 at commit [`7f0ba76`](https://github.com/apache/spark/commit/7f0ba76a5e9d4604b2f586b1e1bc512f7675115b).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #28593: [SPARK-31710][SQL] Add two compatibility flag to cast long to timestamp

2020-05-20 Thread GitBox



AmplabJenkins removed a comment on pull request #28593:
URL: https://github.com/apache/spark/pull/28593#issuecomment-631810838


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/122909/
   Test FAILed.




[GitHub] [spark] SparkQA removed a comment on pull request #28593: [SPARK-31710][SQL] Add two compatibility flag to cast long to timestamp

2020-05-20 Thread GitBox



SparkQA removed a comment on pull request #28593:
URL: https://github.com/apache/spark/pull/28593#issuecomment-631809545


   **[Test build #122909 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122909/testReport)** for PR 28593 at commit [`a39067d`](https://github.com/apache/spark/commit/a39067d6f326df4dc1292ff65d2e70b74baf0fe1).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #28593: [SPARK-31710][SQL] Add two compatibility flag to cast long to timestamp

2020-05-20 Thread GitBox



AmplabJenkins removed a comment on pull request #28593:
URL: https://github.com/apache/spark/pull/28593#issuecomment-631810835


   Merged build finished. Test FAILed.




[GitHub] [spark] SparkQA commented on pull request #28593: [SPARK-31710][SQL] Add two compatibility flag to cast long to timestamp

2020-05-20 Thread GitBox



SparkQA commented on pull request #28593:
URL: https://github.com/apache/spark/pull/28593#issuecomment-631810824


   **[Test build #122909 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122909/testReport)** for PR 28593 at commit [`a39067d`](https://github.com/apache/spark/commit/a39067d6f326df4dc1292ff65d2e70b74baf0fe1).
* This patch **fails to build**.
* This patch merges cleanly.
* This patch adds no public classes.




[GitHub] [spark] AmplabJenkins commented on pull request #28593: [SPARK-31710][SQL] Add two compatibility flag to cast long to timestamp

2020-05-20 Thread GitBox



AmplabJenkins commented on pull request #28593:
URL: https://github.com/apache/spark/pull/28593#issuecomment-631810835








[GitHub] [spark] AmplabJenkins commented on pull request #28593: [SPARK-31710][SQL] Add two compatibility flag to cast long to timestamp

2020-05-20 Thread GitBox



AmplabJenkins commented on pull request #28593:
URL: https://github.com/apache/spark/pull/28593#issuecomment-631810012








[GitHub] [spark] AmplabJenkins removed a comment on pull request #28593: [SPARK-31710][SQL] Add two compatibility flag to cast long to timestamp

2020-05-20 Thread GitBox



AmplabJenkins removed a comment on pull request #28593:
URL: https://github.com/apache/spark/pull/28593#issuecomment-631810012








[GitHub] [spark] SparkQA commented on pull request #28593: [SPARK-31710][SQL] Add two compatibility flag to cast long to timestamp

2020-05-20 Thread GitBox



SparkQA commented on pull request #28593:
URL: https://github.com/apache/spark/pull/28593#issuecomment-631809545


   **[Test build #122909 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122909/testReport)** for PR 28593 at commit [`a39067d`](https://github.com/apache/spark/commit/a39067d6f326df4dc1292ff65d2e70b74baf0fe1).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #28594: [SPARK-31780][K8S][TESTS] Add R test tag to exclude R K8s image building and test

2020-05-20 Thread GitBox



AmplabJenkins removed a comment on pull request #28594:
URL: https://github.com/apache/spark/pull/28594#issuecomment-631804786


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/27550/
   Test FAILed.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #28594: [SPARK-31780][K8S][TESTS] Add R test tag to exclude R K8s image building and test

2020-05-20 Thread GitBox



AmplabJenkins removed a comment on pull request #28594:
URL: https://github.com/apache/spark/pull/28594#issuecomment-631804784


   Merged build finished. Test FAILed.




[GitHub] [spark] SparkQA commented on pull request #28594: [SPARK-31780][K8S][TESTS] Add R test tag to exclude R K8s image building and test

2020-05-20 Thread GitBox



SparkQA commented on pull request #28594:
URL: https://github.com/apache/spark/pull/28594#issuecomment-631804956


   **[Test build #122908 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122908/testReport)** for PR 28594 at commit [`0a37e82`](https://github.com/apache/spark/commit/0a37e821a097b0cb842e8cbc23a788afe1929ac5).




[GitHub] [spark] SparkQA commented on pull request #28594: [SPARK-31780][K8S][TESTS] Add R test tag to exclude R K8s image building and test

2020-05-20 Thread GitBox



SparkQA commented on pull request #28594:
URL: https://github.com/apache/spark/pull/28594#issuecomment-631804775


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/27550/
   




[GitHub] [spark] AmplabJenkins commented on pull request #28594: [SPARK-31780][K8S][TESTS] Add R test tag to exclude R K8s image building and test

2020-05-20 Thread GitBox



AmplabJenkins commented on pull request #28594:
URL: https://github.com/apache/spark/pull/28594#issuecomment-631804784








[GitHub] [spark] github-actions[bot] commented on pull request #27436: [SPARK-30705][SQL] Improve CaseWhen sub-expression equality

2020-05-20 Thread GitBox



github-actions[bot] commented on pull request #27436:
URL: https://github.com/apache/spark/pull/27436#issuecomment-631803512


   We're closing this PR because it hasn't been updated in a while. This isn't 
a judgement on the merit of the PR in any way. It's just a way of keeping the 
PR queue manageable.
   If you'd like to revive this PR, please reopen it and ask a committer to 
remove the Stale tag!




[GitHub] [spark] github-actions[bot] commented on pull request #27231: [SPARK-28478][SQL] Remove redundant null checks

2020-05-20 Thread GitBox



github-actions[bot] commented on pull request #27231:
URL: https://github.com/apache/spark/pull/27231#issuecomment-631803519


   We're closing this PR because it hasn't been updated in a while. This isn't 
a judgement on the merit of the PR in any way. It's just a way of keeping the 
PR queue manageable.
   If you'd like to revive this PR, please reopen it and ask a committer to 
remove the Stale tag!




[GitHub] [spark] SparkQA commented on pull request #28594: [SPARK-31780][K8S][TESTS] Add R test tag to exclude R K8s image building and test

2020-05-20 Thread GitBox



SparkQA commented on pull request #28594:
URL: https://github.com/apache/spark/pull/28594#issuecomment-631803216


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/27550/
   




[GitHub] [spark] HeartSaVioR commented on a change in pull request #28363: [SPARK-27188][SS] FileStreamSink: provide a new option to have retention on output files

2020-05-20 Thread GitBox



HeartSaVioR commented on a change in pull request #28363:
URL: https://github.com/apache/spark/pull/28363#discussion_r428373935



##
File path: docs/structured-streaming-programming-guide.md
##
@@ -1860,7 +1860,10 @@ Here are the details of all the sinks in Spark.
 File Sink
 Append
 
-path: path to the output directory, must be specified.
+path: path to the output directory, must be specified.
+outputRetentionMs: time to live (TTL) for output files. Output files which batches were

Review comment:
   I guess we avoid exposing implementation details in the docs. For example, 
if I'm not mistaken, there's no explanation of the metadata format, so it 
would be confusing which field is being used, because end users don't even 
know what the fields are.






[GitHub] [spark] HeartSaVioR commented on a change in pull request #28363: [SPARK-27188][SS] FileStreamSink: provide a new option to have retention on output files

2020-05-20 Thread GitBox



HeartSaVioR commented on a change in pull request #28363:
URL: https://github.com/apache/spark/pull/28363#discussion_r428372910



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSinkLog.scala
##
@@ -45,7 +46,20 @@ case class SinkFileStatus(
 modificationTime: Long,
 blockReplication: Int,
 blockSize: Long,
-action: String) {
+action: String,
+commitTime: Long) {

Review comment:
   So the introduction of "commit time" came from the concern about the 
uncertainty of HDFS file timestamps in the previous PR. If we are sure about 
the modification time, there is no need to use "commit time".
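
   To make the trade-off in the comment above concrete, here is a minimal 
sketch of a retention check that can use either timestamp. It is not the PR's 
implementation: `SinkEntry`, `RetentionSketch`, `isExpired`, and 
`useCommitTime` are hypothetical names introduced for illustration. Only the 
`modificationTime` and `commitTime` fields and the `outputRetentionMs` option 
appear in the diffs under review, and the sketch assumes all values are epoch 
milliseconds.

```scala
// Hypothetical, simplified stand-in for SinkFileStatus; only the two
// timestamp fields from the diff are modelled (assumed epoch milliseconds).
final case class SinkEntry(path: String, modificationTime: Long, commitTime: Long)

object RetentionSketch {
  // Returns true when the entry is older than the retention window.
  // useCommitTime is an illustrative switch, not an option from the PR.
  def isExpired(
      entry: SinkEntry,
      nowMs: Long,
      outputRetentionMs: Long,
      useCommitTime: Boolean): Boolean = {
    val referenceMs = if (useCommitTime) entry.commitTime else entry.modificationTime
    nowMs - referenceMs > outputRetentionMs
  }
}
```

   If the file system's modification time can be trusted, the two branches 
agree closely and the extra field is redundant, which is the point the 
comment raises.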






[GitHub] [spark] SparkQA commented on pull request #28594: [SPARK-31780][K8S][TESTS] Add R test tag to exclude R K8s image building and test

2020-05-20 Thread GitBox



SparkQA commented on pull request #28594:
URL: https://github.com/apache/spark/pull/28594#issuecomment-631791158


   **[Test build #122907 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122907/testReport)** for PR 28594 at commit [`a26fb26`](https://github.com/apache/spark/commit/a26fb26afdeed73fbaaaf91281680e4ac2c41817).




[GitHub] [spark] dongjoon-hyun opened a new pull request #28594: [SPARK-31780][K8S][TESTS] Add R test tag to exclude R K8s image building and test

2020-05-20 Thread GitBox



dongjoon-hyun opened a new pull request #28594:
URL: https://github.com/apache/spark/pull/28594


   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #28593: [SPARK-31710][SQL] Add two compatibility flag to cast long to timestamp

2020-05-20 Thread GitBox



AmplabJenkins removed a comment on pull request #28593:
URL: https://github.com/apache/spark/pull/28593#issuecomment-631785210


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/122906/
   Test FAILed.




[GitHub] [spark] SparkQA removed a comment on pull request #28593: [SPARK-31710][SQL] Add two compatibility flag to cast long to timestamp

2020-05-20 Thread GitBox



SparkQA removed a comment on pull request #28593:
URL: https://github.com/apache/spark/pull/28593#issuecomment-631783497


   **[Test build #122906 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122906/testReport)** for PR 28593 at commit [`4577fa8`](https://github.com/apache/spark/commit/4577fa813b15e828a46ee322d4119b17e796ee8d).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #28128: [SPARK-31354] SparkContext only register one SparkSession ApplicationEnd listener

2020-05-20 Thread GitBox



AmplabJenkins removed a comment on pull request #28128:
URL: https://github.com/apache/spark/pull/28128#issuecomment-631785345







