[GitHub] [spark] TJX2014 commented on a change in pull request #28534: [SPARK-31710][SQL] Adds TIMESTAMP_SECONDS, TIMESTAMP_MILLISECONDS and TIMESTAMP_MICROSECONDS functions
TJX2014 commented on a change in pull request #28534:
URL: https://github.com/apache/spark/pull/28534#discussion_r428478287

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala ##
@@ -424,6 +424,9 @@ object FunctionRegistry {
     expression[MakeInterval]("make_interval"),
     expression[DatePart]("date_part"),
     expression[Extract]("extract"),
+    expression[SecondsToTimestamp]("timestamp_seconds"),
+    expression[MillisToTimestamp]("timestamp_milliseconds"),

Review comment:
Ok

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] cloud-fan commented on a change in pull request #28123: [SPARK-31350][SQL] Coalesce bucketed tables for join if applicable
cloud-fan commented on a change in pull request #28123:
URL: https://github.com/apache/spark/pull/28123#discussion_r428474929

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/bucketing/CoalesceBucketsInEquiJoin.scala ##
@@ -0,0 +1,127 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.bucketing
+
+import org.apache.spark.sql.catalyst.catalog.BucketSpec
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.planning.ExtractEquiJoinKeys
+import org.apache.spark.sql.catalyst.plans.logical.{Filter, Join, LogicalPlan, Project, UnaryNode}
+import org.apache.spark.sql.catalyst.rules.Rule
+import org.apache.spark.sql.execution.datasources.{HadoopFsRelation, LogicalRelation}
+import org.apache.spark.sql.internal.SQLConf
+
+/**
+ * Wraps `LogicalRelation` to provide the number of buckets for coalescing.
+ */
+case class CoalesceBuckets(
+    numCoalescedBuckets: Int,
+    child: LogicalRelation) extends UnaryNode {
+  require(numCoalescedBuckets > 0,
+    s"Number of coalesced buckets ($numCoalescedBuckets) must be positive.")
+
+  override def output: Seq[Attribute] = child.output
+}
+
+/**
+ * This rule adds a `CoalesceBuckets` logical plan if the following conditions are met:
+ *   - Two bucketed tables are joined.
+ *   - Join is the equi-join.
+ *   - The larger bucket number is divisible by the smaller bucket number.
+ *   - "spark.sql.bucketing.coalesceBucketsInJoin.enabled" is set to true.
+ *   - The difference in the number of buckets is less than the value set in
+ *     "spark.sql.bucketing.coalesceBucketsInJoin.maxNumBucketsDiff".
+ */
+object CoalesceBucketsInEquiJoin extends Rule[LogicalPlan] {
+  private def isPlanEligible(plan: LogicalPlan): Boolean = {
+    def forall(plan: LogicalPlan)(p: LogicalPlan => Boolean): Boolean = {
+      p(plan) && plan.children.forall(forall(_)(p))
+    }
+
+    forall(plan) {
+      case _: Filter | _: Project | _: LogicalRelation => true
+      case _ => false
+    }
+  }
+
+  private def getBucketSpec(plan: LogicalPlan): Option[BucketSpec] = {
+    if (isPlanEligible(plan)) {
+      plan.collectFirst {
+        case _ @ LogicalRelation(r: HadoopFsRelation, _, _, _) if r.bucketSpec.nonEmpty =>
+          r.bucketSpec.get
+      }
+    } else {
+      None
+    }
+  }
+
+  private def mayCoalesce(numBuckets1: Int, numBuckets2: Int, conf: SQLConf): Option[Int] = {
+    assert(numBuckets1 != numBuckets2)
+    val (small, large) = (math.min(numBuckets1, numBuckets2), math.max(numBuckets1, numBuckets2))
+    // A bucket can be coalesced only if the bigger number of buckets is divisible by the smaller
+    // number of buckets because bucket id is calculated by modding the total number of buckets.
+    if ((large % small == 0) && ((large - small) <= conf.coalesceBucketsInJoinMaxNumBucketsDiff)) {
+      Some(small)
+    } else {
+      None
+    }
+  }
+
+  private def addCoalesceBuckets(plan: LogicalPlan, numCoalescedBuckets: Int): LogicalPlan = {
+    plan.transformUp {
+      case l @ LogicalRelation(_: HadoopFsRelation, _, _, _) =>
+        CoalesceBuckets(numCoalescedBuckets, l)
+    }
+  }
+
+  object ExtractJoinWithBuckets {
+    def unapply(plan: LogicalPlan): Option[(Join, Int, Int)] = {
+      plan match {
+        case join @ ExtractEquiJoinKeys(_, _, _, _, left, right, _) =>

Review comment:
yes
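The divisibility condition in the quoted `mayCoalesce` can be checked in isolation. The following is a minimal sketch in plain Scala, not the PR's code: it assumes, as the quoted comment says, that a bucket id is a hash taken modulo the number of buckets. The names `BucketCoalescingSketch`, `bucketId`, `maxDiff`, and `coalescingPreservesBuckets` are illustrative, and `maxDiff` stands in for the value read from `conf.coalesceBucketsInJoinMaxNumBucketsDiff`.

```scala
object BucketCoalescingSketch {
  // Hypothetical stand-in for a bucket id: a non-negative hash modulo the bucket count.
  def bucketId(hash: Int, numBuckets: Int): Int = Math.floorMod(hash, numBuckets)

  // Mirrors the condition in the quoted `mayCoalesce`, with the config value
  // passed in as a plain parameter (`maxDiff` is an illustrative name).
  def mayCoalesce(n1: Int, n2: Int, maxDiff: Int): Option[Int] = {
    require(n1 > 0 && n2 > 0 && n1 != n2)
    val (small, large) = (math.min(n1, n2), math.max(n1, n2))
    if (large % small == 0 && large - small <= maxDiff) Some(small) else None
  }

  // True when re-modding each bucket id of the larger table by `small` yields the
  // same bucket that a table with `small` buckets would have assigned directly.
  def coalescingPreservesBuckets(small: Int, large: Int, hashes: Seq[Int]): Boolean =
    hashes.forall(h => bucketId(bucketId(h, large), small) == bucketId(h, small))
}
```

With 8 and 4 buckets the property holds for every hash, because 4 divides 8. With 8 and 3 buckets it fails (hash 8 lands in bucket 0 of 8, which re-mods to 0, while a 3-bucket table would place it in bucket 2), which is why the rule would have to skip such a pair.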
[GitHub] [spark] HyukjinKwon commented on a change in pull request #28534: [SPARK-31710][SQL] Adds TIMESTAMP_SECONDS, TIMESTAMP_MILLISECONDS and TIMESTAMP_MICROSECONDS functions
HyukjinKwon commented on a change in pull request #28534:
URL: https://github.com/apache/spark/pull/28534#discussion_r428474898

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala ##
@@ -424,6 +424,9 @@ object FunctionRegistry {
     expression[MakeInterval]("make_interval"),
     expression[DatePart]("date_part"),
     expression[Extract]("extract"),
+    expression[SecondsToTimestamp]("timestamp_seconds"),
+    expression[MillisToTimestamp]("timestamp_milliseconds"),

Review comment:
Let's also not forget to update the PR title and description.
[GitHub] [spark] SparkQA commented on pull request #28556: [SPARK-31736][SQL] Nested column aliasing for RepartitionByExpression/Join
SparkQA commented on pull request #28556:
URL: https://github.com/apache/spark/pull/28556#issuecomment-631915185

**[Test build #122916 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122916/testReport)** for PR 28556 at commit [`b77a1ba`](https://github.com/apache/spark/commit/b77a1ba23f5c39fb541bb8f61ebbbd62ffbb975d).
[GitHub] [spark] viirya commented on a change in pull request #28556: [SPARK-31736][SQL] Nested column aliasing for RepartitionByExpression/Join
viirya commented on a change in pull request #28556:
URL: https://github.com/apache/spark/pull/28556#discussion_r428473655

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NestedColumnAliasing.scala ##
@@ -204,15 +211,8 @@ object GeneratorNestedColumnAliasing {
       g: Generate,
       nestedFieldToAlias: Map[ExtractValue, Alias],
       attrToAliases: Map[ExprId, Seq[Alias]]): LogicalPlan = {
-    val newGenerator = g.generator.transform {
-      case f: ExtractValue if nestedFieldToAlias.contains(f) =>
-        nestedFieldToAlias(f).toAttribute
-    }.asInstanceOf[Generator]
-
     // Defer updating `Generate.unrequiredChildIndex` to next round of `ColumnPruning`.
-    val newGenerate = g.copy(generator = newGenerator)
-
-    NestedColumnAliasing.replaceChildrenWithAliases(newGenerate, attrToAliases)
+    NestedColumnAliasing.replaceChildrenWithAliases(g, nestedFieldToAlias, attrToAliases)

Review comment:
Changed the method name.
[GitHub] [spark] viirya commented on a change in pull request #28556: [SPARK-31736][SQL] Nested column aliasing for RepartitionByExpression/Join
viirya commented on a change in pull request #28556:
URL: https://github.com/apache/spark/pull/28556#discussion_r428473708

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NestedColumnAliasing.scala ##
@@ -34,7 +34,8 @@ object NestedColumnAliasing {
     : Option[(Map[ExtractValue, Alias], Map[ExprId, Seq[Alias]])] = plan match {
     case Project(projectList, child)
         if SQLConf.get.nestedSchemaPruningEnabled && canProjectPushThrough(child) =>
-      getAliasSubMap(projectList)
+      val exprsToPrune = projectList ++ child.expressions

Review comment:
Changed.
[GitHub] [spark] GuoPhilipse commented on a change in pull request #28593: [SPARK-31710][SQL] Add two compatibility flag to cast long to timestamp
GuoPhilipse commented on a change in pull request #28593:
URL: https://github.com/apache/spark/pull/28593#discussion_r428473216

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala ##
@@ -1277,7 +1285,11 @@ abstract class CastBase extends UnaryExpression with TimeZoneAwareExpression wit
     val block = inline"new java.math.BigDecimal($MICROS_PER_SECOND)"
     code"($d.toBigDecimal().bigDecimal().multiply($block)).longValue()"
   }
-  private[this] def longToTimeStampCode(l: ExprValue): Block = code"$l * (long)$MICROS_PER_SECOND"
+  private[this] def longToTimeStampCode(l: ExprValue): Block = {
+    if (SQLConf.get.numericConvertToTimestampInSeconds) code"" +

Review comment:
Yes, let me correct it.
[GitHub] [spark] cloud-fan commented on a change in pull request #28534: [SPARK-31710][SQL] Adds TIMESTAMP_SECONDS, TIMESTAMP_MILLISECONDS and TIMESTAMP_MICROSECONDS functions
cloud-fan commented on a change in pull request #28534:
URL: https://github.com/apache/spark/pull/28534#discussion_r428472274

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala ##
@@ -424,6 +424,9 @@ object FunctionRegistry {
     expression[MakeInterval]("make_interval"),
     expression[DatePart]("date_part"),
     expression[Extract]("extract"),
+    expression[SecondsToTimestamp]("timestamp_seconds"),
+    expression[MillisToTimestamp]("timestamp_milliseconds"),

Review comment:
`timestamp_millisecond` -> `timestamp_millis`
[GitHub] [spark] cloud-fan commented on a change in pull request #28534: [SPARK-31710][SQL] Adds TIMESTAMP_SECONDS, TIMESTAMP_MILLISECONDS and TIMESTAMP_MICROSECONDS functions
cloud-fan commented on a change in pull request #28534:
URL: https://github.com/apache/spark/pull/28534#discussion_r428472315

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala ##
@@ -424,6 +424,9 @@ object FunctionRegistry {
     expression[MakeInterval]("make_interval"),
     expression[DatePart]("date_part"),
     expression[Extract]("extract"),
+    expression[SecondsToTimestamp]("timestamp_seconds"),
+    expression[MillisToTimestamp]("timestamp_milliseconds"),
+    expression[MicrosToTimestamp]("timestamp_microseconds"),

Review comment:
ditto
[GitHub] [spark] SparkQA commented on pull request #28534: [SPARK-31710][SQL] Adds TIMESTAMP_SECONDS, TIMESTAMP_MILLISECONDS and TIMESTAMP_MICROSECONDS functions
SparkQA commented on pull request #28534:
URL: https://github.com/apache/spark/pull/28534#issuecomment-631909451

**[Test build #122915 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122915/testReport)** for PR 28534 at commit [`97263e8`](https://github.com/apache/spark/commit/97263e84877bb767e476cfde635e5fb991dc4d6e).
[GitHub] [spark] SparkQA commented on pull request #28534: [SPARK-31710][SQL] Adds TIMESTAMP_SECONDS, TIMESTAMP_MILLISECONDS and TIMESTAMP_MICROSECONDS functions
SparkQA commented on pull request #28534:
URL: https://github.com/apache/spark/pull/28534#issuecomment-631892304

**[Test build #122913 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122913/testReport)** for PR 28534 at commit [`370bba7`](https://github.com/apache/spark/commit/370bba7676689deb92617303bce3bb173a62e0c3).
[GitHub] [spark] SparkQA commented on pull request #28595: [SPARK-31781][ML][PySpark] Move param k (number of clusters) to shared params
SparkQA commented on pull request #28595:
URL: https://github.com/apache/spark/pull/28595#issuecomment-631892298

**[Test build #122914 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122914/testReport)** for PR 28595 at commit [`2d6e87b`](https://github.com/apache/spark/commit/2d6e87b1892683b6035bfdf6ed8c475bada5bbc8).
[GitHub] [spark] HyukjinKwon commented on a change in pull request #28534: [SPARK-31710][SQL] Adds TIMESTAMP_SECONDS, TIMESTAMP_MILLISECONDS and TIMESTAMP_MICROSECONDS functions
HyukjinKwon commented on a change in pull request #28534:
URL: https://github.com/apache/spark/pull/28534#discussion_r42873

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala ##
@@ -424,6 +424,9 @@ object FunctionRegistry {
     expression[MakeInterval]("make_interval"),
     expression[DatePart]("date_part"),
     expression[Extract]("extract"),
+    expression[SecondsToTimestamp]("timestamp_seconds"),
+    expression[MilliSecondsToTimestamp]("timestamp_milliseconds"),
+    expression[MicroSecondsToTimestamp]("timestamp_microseconds"),

Review comment:
Not a big deal, but yeah, let's do that.
[GitHub] [spark] TJX2014 commented on a change in pull request #28534: [SPARK-31710][SQL] Adds TIMESTAMP_SECONDS, TIMESTAMP_MILLISECONDS and TIMESTAMP_MICROSECONDS functions
TJX2014 commented on a change in pull request #28534:
URL: https://github.com/apache/spark/pull/28534#discussion_r428443365

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala ##
@@ -424,6 +424,9 @@ object FunctionRegistry {
     expression[MakeInterval]("make_interval"),
     expression[DatePart]("date_part"),
     expression[Extract]("extract"),
+    expression[SecondsToTimestamp]("timestamp_seconds"),
+    expression[MilliSecondsToTimestamp]("timestamp_milliseconds"),
+    expression[MicroSecondsToTimestamp]("timestamp_microseconds"),

Review comment:
@cloud-fan, does this mean we should rename `MilliSecondsToTimestamp` to `MillisToTimestamp` and `MicroSecondsToTimestamp` to `MicrosToTimestamp`? I will correct it if needed.
[GitHub] [spark] SparkQA commented on pull request #28534: [SPARK-31710][SQL] Adds TIMESTAMP_SECONDS, TIMESTAMP_MILLISECONDS and TIMESTAMP_MICROSECONDS functions
SparkQA commented on pull request #28534:
URL: https://github.com/apache/spark/pull/28534#issuecomment-631876272

**[Test build #122912 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122912/testReport)** for PR 28534 at commit [`a6383a1`](https://github.com/apache/spark/commit/a6383a188f5bc0da524793ac34bfa823bf55adbf).
[GitHub] [spark] cloud-fan commented on a change in pull request #28534: [SPARK-31710][SQL] Adds TIMESTAMP_SECONDS, TIMESTAMP_MILLISECONDS and TIMESTAMP_MICROSECONDS functions
cloud-fan commented on a change in pull request #28534:
URL: https://github.com/apache/spark/pull/28534#discussion_r428439032

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala ##
@@ -424,6 +424,9 @@ object FunctionRegistry {
     expression[MakeInterval]("make_interval"),
     expression[DatePart]("date_part"),
     expression[Extract]("extract"),
+    expression[SecondsToTimestamp]("timestamp_seconds"),
+    expression[MilliSecondsToTimestamp]("timestamp_milliseconds"),
+    expression[MicroSecondsToTimestamp]("timestamp_microseconds"),

Review comment:
Oh, I didn't realize it's `MILLIS` not `MILLISECONDS`. Let's follow the short name.
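For readers skimming the naming thread above: going by their names, the three expressions being registered differ only in the unit of the input long. The sketch below illustrates that unit relationship with `java.time.Instant`. It is not Spark's implementation, and the object and method names (`EpochUnitsSketch`, `fromSeconds`, `fromMillis`, `fromMicros`) are made up for the example.

```scala
import java.time.Instant
import java.time.temporal.ChronoUnit

object EpochUnitsSketch {
  // Interprets the input as whole seconds since 1970-01-01T00:00:00Z.
  def fromSeconds(s: Long): Instant = Instant.ofEpochSecond(s)

  // Interprets the input as milliseconds since the epoch.
  def fromMillis(ms: Long): Instant = Instant.ofEpochMilli(ms)

  // Interprets the input as microseconds since the epoch.
  def fromMicros(us: Long): Instant = Instant.EPOCH.plus(us, ChronoUnit.MICROS)
}
```

Under this reading, `fromSeconds(1L)`, `fromMillis(1000L)`, and `fromMicros(1000000L)` all denote the same instant, so the short names `millis` and `micros` only select the scale of the argument.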
[GitHub] [spark] imback82 commented on pull request #28123: [SPARK-31350][SQL] Coalesce bucketed tables for join if applicable
imback82 commented on pull request #28123:
URL: https://github.com/apache/spark/pull/28123#issuecomment-631869859

> 1. It's not possible to have an accurate cost model to guide this optimization. Do we have a heuristic? like coalescing 1 buckets to 2 is very likely to cause regression as parallelism is reduced too much.

A heuristic is discussed [here](https://github.com/apache/spark/pull/28123/files#r411809034) to set the default value for `spark.sql.bucketing.coalesceBucketsInJoin.maxNumBucketsDiff`. The config should help prevent regressions caused by the reduced parallelism.

> 2. Are we going to support joins with more than 2 tables? e.g. "100 buckets table" join "50 buckets table" join "10 buckets table".

Good idea. Can I do a follow-up PR to improve the current rule (which doesn't support nested joins)?
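One way to picture the multi-table case raised in the second question is to apply the same two checks (divisibility and a bounded difference) against the smallest bucket count among all joined tables. This is purely a hypothetical sketch of what a follow-up could look like, not something proposed in the thread. `MultiTableCoalesceSketch`, `coalescedNumBuckets`, and `maxDiff` are illustrative names, and `maxDiff` plays the role of `spark.sql.bucketing.coalesceBucketsInJoin.maxNumBucketsDiff`.

```scala
object MultiTableCoalesceSketch {
  // Hypothetical generalization of the two-table check: every table must have a
  // bucket count divisible by the smallest one and within `maxDiff` of it.
  // Returns the target bucket count, or None if any table fails either check.
  def coalescedNumBuckets(numBuckets: Seq[Int], maxDiff: Int): Option[Int] = {
    require(numBuckets.nonEmpty && numBuckets.forall(_ > 0))
    val target = numBuckets.min
    val eligible = numBuckets.forall(n => n % target == 0 && n - target <= maxDiff)
    if (eligible) Some(target) else None
  }
}
```

For the "100, 50, 10" example from the question, this sketch returns 10 only when `maxDiff` is at least 90. That makes the parallelism concern from the first question concrete: a small limit rejects the coalescing, while a large one allows a tenfold reduction for the biggest table.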
[GitHub] [spark] cloud-fan commented on pull request #28523: [SPARK-31706][SQL] add back the support of streaming update mode
cloud-fan commented on pull request #28523:
URL: https://github.com/apache/spark/pull/28523#issuecomment-631869094

I thought you had canceled your veto in https://github.com/apache/spark/pull/28523#issuecomment-628164147

BTW, if someone leaves a veto, please actively defend it. You can't just leave a veto and then disappear, assuming no one will merge the PR. It's a 3.0 blocker and your last review was 7 days ago. If you are busy with other stuff, please let people know when you will be able to come back to review.
[GitHub] [spark] imback82 commented on a change in pull request #28123: [SPARK-31350][SQL] Coalesce bucketed tables for join if applicable
imback82 commented on a change in pull request #28123: URL: https://github.com/apache/spark/pull/28123#discussion_r428435196 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/bucketing/CoalesceBucketsInEquiJoin.scala ## @@ -0,0 +1,127 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.bucketing + +import org.apache.spark.sql.catalyst.catalog.BucketSpec +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.catalyst.planning.ExtractEquiJoinKeys +import org.apache.spark.sql.catalyst.plans.logical.{Filter, Join, LogicalPlan, Project, UnaryNode} +import org.apache.spark.sql.catalyst.rules.Rule +import org.apache.spark.sql.execution.datasources.{HadoopFsRelation, LogicalRelation} +import org.apache.spark.sql.internal.SQLConf + +/** + * Wraps `LogicalRelation` to provide the number of buckets for coalescing. 
+ */ +case class CoalesceBuckets( +numCoalescedBuckets: Int, +child: LogicalRelation) extends UnaryNode { + require(numCoalescedBuckets > 0, +s"Number of coalesced buckets ($numCoalescedBuckets) must be positive.") + + override def output: Seq[Attribute] = child.output +} + +/** + * This rule adds a `CoalesceBuckets` logical plan if the following conditions are met: + * - Two bucketed tables are joined. + * - Join is the equi-join. + * - The larger bucket number is divisible by the smaller bucket number. + * - "spark.sql.bucketing.coalesceBucketsInJoin.enabled" is set to true. + * - The difference in the number of buckets is less than the value set in + * "spark.sql.bucketing.coalesceBucketsInJoin.maxNumBucketsDiff". + */ +object CoalesceBucketsInEquiJoin extends Rule[LogicalPlan] { + private def isPlanEligible(plan: LogicalPlan): Boolean = { +def forall(plan: LogicalPlan)(p: LogicalPlan => Boolean): Boolean = { + p(plan) && plan.children.forall(forall(_)(p)) +} + +forall(plan) { + case _: Filter | _: Project | _: LogicalRelation => true + case _ => false +} + } + + private def getBucketSpec(plan: LogicalPlan): Option[BucketSpec] = { +if (isPlanEligible(plan)) { + plan.collectFirst { +case _ @ LogicalRelation(r: HadoopFsRelation, _, _, _) if r.bucketSpec.nonEmpty => + r.bucketSpec.get + } +} else { + None +} + } + + private def mayCoalesce(numBuckets1: Int, numBuckets2: Int, conf: SQLConf): Option[Int] = { +assert(numBuckets1 != numBuckets2) +val (small, large) = (math.min(numBuckets1, numBuckets2), math.max(numBuckets1, numBuckets2)) +// A bucket can be coalesced only if the bigger number of buckets is divisible by the smaller +// number of buckets because bucket id is calculated by modding the total number of buckets. 
+if ((large % small == 0) && ((large - small) <= conf.coalesceBucketsInJoinMaxNumBucketsDiff)) { + Some(small) +} else { + None +} + } + + private def addCoalesceBuckets(plan: LogicalPlan, numCoalescedBuckets: Int): LogicalPlan = { +plan.transformUp { + case l @ LogicalRelation(_: HadoopFsRelation, _, _, _) => +CoalesceBuckets(numCoalescedBuckets, l) +} + } + + object ExtractJoinWithBuckets { +def unapply(plan: LogicalPlan): Option[(Join, Int, Int)] = { + plan match { +case join @ ExtractEquiJoinKeys(_, _, _, _, left, right, _) => Review comment: Sure, I guess the concern here is the unnecessarily reduced parallelism if broadcast join is picked. Is `QueryExecution.preparations` a reasonable place to insert the rule? (That seems to be the only place where you can plug in a rule that takes in a `SparkPlan` and returns a `SparkPlan`.) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-
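The divisibility condition in `mayCoalesce` can be checked in isolation. The sketch below is a minimal, hypothetical restatement and not the PR's code: `maxNumBucketsDiff` is passed in as a plain parameter instead of being read from `SQLConf`, and `bucketId` is a simplified non-negative modulo that stands in for Spark's hash-based bucket assignment.

```scala
object BucketCoalescingSketch {
  // Hypothetical stand-in for the PR's mayCoalesce: returns the coalesced bucket
  // count when the larger count is a multiple of the smaller one and the
  // difference stays within the allowed maximum.
  def mayCoalesce(numBuckets1: Int, numBuckets2: Int, maxNumBucketsDiff: Int): Option[Int] = {
    require(numBuckets1 > 0 && numBuckets2 > 0, "bucket counts must be positive")
    val small = math.min(numBuckets1, numBuckets2)
    val large = math.max(numBuckets1, numBuckets2)
    if (small != large && large % small == 0 && large - small <= maxNumBucketsDiff) Some(small)
    else None
  }

  // Simplified bucket id (an assumption for illustration): non-negative modulo of a hash.
  def bucketId(hash: Int, numBuckets: Int): Int = Math.floorMod(hash, numBuckets)

  // Bucket id of a row after `large` buckets are merged down to `small` buckets.
  def coalescedBucketId(hash: Int, large: Int, small: Int): Int =
    bucketId(hash, large) % small
}
```

With 8 and 4 buckets, a row in bucket `b` of the 8-bucket table ends up in bucket `b % 4` after coalescing, which is the bucket it would have had in a 4-bucket table. That is why the rule requires the larger count to be divisible by the smaller one.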
[GitHub] [spark] HyukjinKwon commented on a change in pull request #28534: [SPARK-31710][SQL]TIMESTAMP_SECONDS, TIMESTAMP_MILLISECONDS and TIMESTAMP_MICROSECONDS to timestamp transfer
HyukjinKwon commented on a change in pull request #28534:
URL: https://github.com/apache/spark/pull/28534#discussion_r428433186

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala ##

@@ -401,6 +401,81 @@ case class DayOfYear(child: Expression) extends UnaryExpression with ImplicitCas
   }
 }
 
+abstract class NumberToTimestampBase extends UnaryExpression
+  with ImplicitCastInputTypes {
+
+  protected def upScaleFactor: Long
+
+  override def inputTypes: Seq[AbstractDataType] = Seq(LongType)
+
+  override def dataType: DataType = TimestampType
+
+  override def nullSafeEval(input: Any): Any = {
+    Math.multiplyExact(input.asInstanceOf[Long], upScaleFactor)
+  }
+
+  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+    if (upScaleFactor == 1) {
+      defineCodeGen(ctx, ev, c => c)
+    } else {
+      defineCodeGen(ctx, ev, c => s"java.lang.Math.multiplyExact($c, $upScaleFactor)")
+    }
+  }
+}
+
+@ExpressionDescription(
+  usage = "_FUNC_(seconds) - Creates timestamp from the number of seconds since UTC epoch.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(1230219000);
+       2008-12-25 07:30:00
+  """,
+  group = "datetime_funcs",
+  since = "3.1.0")
+case class SecondsToTimestamp(child: Expression)
+  extends NumberToTimestampBase {
+
+  override def upScaleFactor: Long = MICROS_PER_SECOND
+
+  override def prettyName: String = "timestamp_seconds"
+}
+
+@ExpressionDescription(
+  usage = "_FUNC_(milliseconds) - " +
+    "Creates timestamp from the number of milliseconds since UTC epoch.",

Review comment:
nit: let's do either:

```
usage = """
  _FUNC_(milliseconds) - Creates timestamp from the number of milliseconds since UTC epoch.
  ...
"""
```

or

```
// scalastyle:off line.size.limit
@ExpressionDescription(
  usage = "_FUNC_(milliseconds) - Creates timestamp from the number of milliseconds since UTC epoch.",
  ...
// scalastyle:on line.size.limit
```

just for the sake of consistency.
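The scaling done by `NumberToTimestampBase` can be tried outside Catalyst. The following is a minimal sketch under stated assumptions: the object and method names are hypothetical, the constants mirror `MICROS_PER_SECOND` and its millisecond counterpart, and `java.time.Instant` is used only to display the result in UTC (the `07:30:00` in the PR's example presumably reflects a session time zone eight hours behind UTC).

```scala
import java.time.Instant
import java.time.temporal.ChronoUnit

object NumberToTimestampSketch {
  val MicrosPerSecond: Long = 1000000L
  val MicrosPerMilli: Long = 1000L

  // Each conversion scales the input up to microseconds since the epoch.
  // Math.multiplyExact throws ArithmeticException on overflow instead of wrapping.
  def secondsToMicros(seconds: Long): Long = Math.multiplyExact(seconds, MicrosPerSecond)
  def millisToMicros(millis: Long): Long = Math.multiplyExact(millis, MicrosPerMilli)
  def microsToMicros(micros: Long): Long = micros

  // Display helper: microseconds since the epoch as a UTC instant.
  def microsToInstant(micros: Long): Instant = Instant.EPOCH.plus(micros, ChronoUnit.MICROS)
}
```

For the input used in the PR's example, `microsToInstant(secondsToMicros(1230219000L))` is `2008-12-25T15:30:00Z`.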
[GitHub] [spark] cloud-fan commented on pull request #28582: [SPARK-31762][SQL] Fix perf regression of date/timestamp formatting in toHiveString
cloud-fan commented on pull request #28582: URL: https://github.com/apache/spark/pull/28582#issuecomment-631865472 thanks, merging to master/3.0!
[GitHub] [spark] cloud-fan closed pull request #28582: [SPARK-31762][SQL] Fix perf regression of date/timestamp formatting in toHiveString
cloud-fan closed pull request #28582: URL: https://github.com/apache/spark/pull/28582
[GitHub] [spark] HyukjinKwon commented on a change in pull request #28534: [SPARK-31710][SQL]TIMESTAMP_SECONDS, TIMESTAMP_MILLISECONDS and TIMESTAMP_MICROSECONDS to timestamp transfer
HyukjinKwon commented on a change in pull request #28534:
URL: https://github.com/apache/spark/pull/28534#discussion_r428431767

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala ##

@@ -424,6 +424,9 @@ object FunctionRegistry {
     expression[MakeInterval]("make_interval"),
     expression[DatePart]("date_part"),
     expression[Extract]("extract"),
+    expression[SecondsToTimestamp]("timestamp_seconds"),
+    expression[MilliSecondsToTimestamp]("timestamp_milliseconds"),
+    expression[MicroSecondsToTimestamp]("timestamp_microseconds"),

Review comment:
I think it's okay to have expressions presumably to get away from `cast(ts as long)` behaviour which is not invasive. But it would be more interesting to see how other DBMSes solved this problem. From arbitrary googling, vendors have different approaches. Seems like BigQuery has these three functions (https://cloud.google.com/bigquery/docs/reference/standard-sql/timestamp_functions?hl=en#timestamp_seconds). Shall we match the naming at least?
[GitHub] [spark] HyukjinKwon commented on pull request #28534: [SPARK-31710][SQL]TIMESTAMP_SECONDS, TIMESTAMP_MILLISECONDS and TIMESTAMP_MICROSECONDS to timestamp transfer
HyukjinKwon commented on pull request #28534: URL: https://github.com/apache/spark/pull/28534#issuecomment-631862270 I would prefer this way over #28593.
[GitHub] [spark] HyukjinKwon commented on a change in pull request #28593: [SPARK-31710][SQL] Add two compatibility flag to cast long to timestamp
HyukjinKwon commented on a change in pull request #28593:
URL: https://github.com/apache/spark/pull/28593#discussion_r428428811

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ##

@@ -2586,6 +2586,22 @@ object SQLConf {
       .checkValue(_ > 0, "The timeout value must be positive")
       .createWithDefault(10L)
 
+  val LEGACY_NUMERIC_CONVERT_TO_TIMESTAMP_ENABLE =
+    buildConf("spark.sql.legacy.numericConvertToTimestampEnable")
+      .doc("when true,use legacy numberic can convert to timestamp")
+      .version("3.0.0")
+      .booleanConf
+      .createWithDefault(false)
+
+  val LEGACY_NUMERIC_CONVERT_TO_TIMESTAMP_IN_SECONDS =
+    buildConf("spark.sql.legacy.numericConvertToTimestampInSeconds")
+      .internal()
+      .doc("The legacy only works when LEGACY_NUMERIC_CONVERT_TO_TIMESTAMP_ENABLE is true." +
+        "when true,the value will be interpreted as seconds,which follow spark style," +
+        "when false,value is interpreted as milliseconds,which follow hive style")

Review comment:
Sorry, but I still can't follow why Spark should care about following Hive style here. Most likely the legacy users are already depending on this behaviour, and a few users might have had to do the workaround by themselves. I don't think even `cast(ts as long)` is a standard or widely accepted behaviour. -1 from me.
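To make the disagreement concrete, here is a minimal sketch of what the proposed flag would change. It is an illustration under assumptions, not the PR's implementation: `longToMicros` is a hypothetical helper, and its `inSeconds` parameter plays the role of `spark.sql.legacy.numericConvertToTimestampInSeconds`.

```scala
object LegacyLongToTimestampSketch {
  val MicrosPerSecond: Long = 1000000L
  val MicrosPerMilli: Long = 1000L

  // Hypothetical helper: interprets `value` as seconds (described in the PR as the
  // Spark style) or as milliseconds (described as the Hive style) and returns
  // microseconds since the epoch.
  def longToMicros(value: Long, inSeconds: Boolean): Long = {
    val factor = if (inSeconds) MicrosPerSecond else MicrosPerMilli
    Math.multiplyExact(value, factor)
  }
}
```

The same stored number therefore maps to timestamps that differ by a factor of 1000 depending on the flag, which is why existing users who rely on the seconds interpretation would be affected by a change of default.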
[GitHub] [spark] duanmeng commented on a change in pull request #28525: [SPARK-27562][Shuffle] Complete the verification mechanism for shuffle transmitted data
duanmeng commented on a change in pull request #28525:
URL: https://github.com/apache/spark/pull/28525#discussion_r428428247

## File path: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala ##

@@ -626,16 +628,61 @@ final class ShuffleBlockFetcherIterator(
               buf.release()
               throwFetchFailedException(blockId, mapIndex, address, e)
           }
+
+          // If shuffle digest enabled is true, check the block with checkSum.
+          var failedOnDigestCheck = false
+          if (digestEnabled) {
+            if (digest >= 0) {
+              val digestToCheck = try {
+                DigestUtils.getDigest(in)
+              } catch {
+                case e: IOException =>
+                  logError("Error occurs when checking digest", e)
+                  buf.release()
+                  throwFetchFailedException(blockId, mapIndex, address, e)
+              }
+              failedOnDigestCheck = digest != digestToCheck
+              if (!failedOnDigestCheck) {

Review comment:
should be `if (failedOnDigestCheck)`
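The intent of the check, including the corrected condition, can be sketched without Spark. This is a hedged illustration only: the PR's `DigestUtils.getDigest` is not shown in this thread, so the sketch substitutes a CRC32 over the stream (an assumption), and `failedOnDigestCheck` here is a hypothetical standalone function rather than the PR's local variable.

```scala
import java.io.{ByteArrayInputStream, InputStream}
import java.util.zip.{CRC32, CheckedInputStream}

object DigestCheckSketch {
  // Assumed stand-in for DigestUtils.getDigest: CRC32 of everything left in the stream.
  def crc32Of(in: InputStream): Long = {
    val checked = new CheckedInputStream(in, new CRC32)
    val buffer = new Array[Byte](8192)
    // Reading through the CheckedInputStream updates the checksum as a side effect.
    while (checked.read(buffer) != -1) {}
    checked.getChecksum.getValue
  }

  // True when the block must be treated as corrupt. A negative expected digest
  // means no digest was recorded, so the check is skipped.
  def failedOnDigestCheck(expected: Long, block: Array[Byte]): Boolean =
    expected >= 0 && expected != crc32Of(new ByteArrayInputStream(block))
}
```

The error path should run only when this function returns true, which is the point of the review comment above.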
[GitHub] [spark] duanmeng commented on a change in pull request #28525: [SPARK-27562][Shuffle] Complete the verification mechanism for shuffle transmitted data
duanmeng commented on a change in pull request #28525:
URL: https://github.com/apache/spark/pull/28525#discussion_r428427803

## File path: core/src/main/scala/org/apache/spark/shuffle/IndexShuffleBlockResolver.scala ##

@@ -170,11 +190,38 @@ private[spark] class IndexShuffleBlockResolver(
     // There is only one IndexShuffleBlockResolver per executor, this synchronization make sure
     // the following check and rename are atomic.
     synchronized {
-      val existingLengths = checkIndexAndDataFile(indexFile, dataFile, lengths.length)
-      if (existingLengths != null) {
+      val digests = new Array[Long](lengths.length)
+      val dateIn = if (dataTmp != null && dataTmp.exists()) {
+        new FileInputStream(dataTmp)
+      } else {
+        null
+      }
+      Utils.tryWithSafeFinally {
+        if (digestEnable && dateIn != null) {
+          for (i <- (0 until lengths.length)) {
+            val length = lengths(i)
+            if (length == 0) {
+              digests(i) = -1L
+            } else {
+              digests(i) = DigestUtils.getDigest(new LimitedInputStream(dateIn, length))
+            }
+          }
+        }
+      } {
+        if (dateIn != null) {
+          dateIn.close()
+        }
+      }
+
+      val existingLengthsDigests =
+        checkIndexAndDataFile(indexFile, dataFile, lengths.length, digests)
+      if (existingLengthsDigests != null) {
+        val existingLengths = existingLengthsDigests._1
+        val existingDigests = existingLengthsDigests._2
         // Another attempt for the same task has already written our map outputs successfully,
         // so just use the existing partition lengths and delete our temporary map outputs.
         System.arraycopy(existingLengths, 0, lengths, 0, lengths.length)
+        System.arraycopy(existingDigests, 0, digests, 0, digests.length)

Review comment:
This may be unnecessary
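The per-partition loop in this hunk reads consecutive segments of one data file and records one digest per segment, using `-1` for empty partitions. A minimal sketch of that idea follows, assuming CRC32 as the digest and a hypothetical `partitionDigests` helper; it does not use the PR's `DigestUtils` or `LimitedInputStream`.

```scala
import java.io.{EOFException, InputStream}
import java.util.zip.CRC32

object PartitionDigestSketch {
  // Reads `lengths(i)` bytes for each partition, in order, from a single stream and
  // returns one CRC32 per partition. Empty partitions get -1, as in the PR's loop.
  def partitionDigests(data: InputStream, lengths: Array[Long]): Array[Long] = {
    val buffer = new Array[Byte](8192)
    lengths.map { length =>
      if (length == 0) {
        -1L
      } else {
        val crc = new CRC32
        var remaining = length
        while (remaining > 0) {
          val toRead = math.min(buffer.length.toLong, remaining).toInt
          val n = data.read(buffer, 0, toRead)
          if (n < 0) throw new EOFException(s"expected $remaining more bytes")
          crc.update(buffer, 0, n)
          remaining -= n
        }
        crc.getValue
      }
    }
  }
}
```

Because the segments are consumed sequentially from one stream, each digest covers exactly the bytes of its own partition and nothing from its neighbours.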
[GitHub] [spark] cloud-fan commented on a change in pull request #28128: [SPARK-31354] SparkContext only register one SparkSession ApplicationEnd listener
cloud-fan commented on a change in pull request #28128:
URL: https://github.com/apache/spark/pull/28128#discussion_r428427640

## File path: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala ##

@@ -1064,6 +1055,20 @@ object SparkSession extends Logging {
   // Private methods from now on
 
+  private val listenerRegistered: AtomicBoolean = new AtomicBoolean(false)
+
+  /** Register the AppEnd listener onto the Context */
+  private def registerContextListener(sparkContext: SparkContext): Unit = {
+    if (!SparkSession.listenerRegistered.get()) {

Review comment:
nit: we are in the same class so we can remove `SparkSession.`

## File path: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala ##

@@ -1064,6 +1055,20 @@ object SparkSession extends Logging {
   // Private methods from now on
 
+  private val listenerRegistered: AtomicBoolean = new AtomicBoolean(false)
+
+  /** Register the AppEnd listener onto the Context */
+  private def registerContextListener(sparkContext: SparkContext): Unit = {
+    if (!SparkSession.listenerRegistered.get()) {
+      sparkContext.addSparkListener(new SparkListener {
+        override def onApplicationEnd(applicationEnd: SparkListenerApplicationEnd): Unit = {
+          defaultSession.set(null)
+        }
+      })
+      SparkSession.listenerRegistered.set(true)

Review comment:
ditto
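The register-once pattern in this hunk can be shown on its own. The sketch below is not the PR's code: it swaps the separate `get()` and `set(true)` calls for a single `compareAndSet`, which is an alternative technique that closes the window between the check and the update, and `OnceGuard` and `runOnce` are hypothetical names introduced for illustration.

```scala
import java.util.concurrent.atomic.AtomicBoolean

object RegisterOnceSketch {
  final class OnceGuard {
    private val done = new AtomicBoolean(false)

    // Runs `action` only for the first caller; returns whether this call ran it.
    // compareAndSet flips false -> true atomically, so concurrent callers cannot both win.
    def runOnce(action: () => Unit): Boolean =
      if (done.compareAndSet(false, true)) {
        action()
        true
      } else {
        false
      }
  }
}
```

In the setting of this PR, the action would be the `addSparkListener` call, so that repeated session creation registers only one application-end listener.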
[GitHub] [spark] WeichenXu123 commented on pull request #28584: [SPARK-31730][CORE][TEST] Fix flaky tests in BarrierTaskContextSuite
WeichenXu123 commented on pull request #28584:
URL: https://github.com/apache/spark/pull/28584#issuecomment-631859882

Also fix this test like:

```
test("share messages with allGather() call") {
  val conf = new SparkConf()
    .setMaster("local-cluster[4, 1, 1024]")
    .setAppName("test-cluster")
  sc = new SparkContext(conf)
  val rdd = sc.makeRDD(1 to 4, 4)
  val rdd2 = rdd.barrier().mapPartitions { it =>
    val context = BarrierTaskContext.get()
    // Sleep for a random time before global sync.
    Thread.sleep(Random.nextInt(1000))
    // Pass partitionId message in
    val message: String = context.partitionId().toString
    val messages: Array[String] = context.allGather(message)
    Iterator.single(messages.toList)
  }
  // Take a sorted list of all the partitionId messages
  val messages_list = rdd2.collect()
  assert (messages_list.length === 4)
  for (messages <- messages_list) {
    assert (messages === (0 until 4).map(_.toString).toList)
  }
}
```
[GitHub] [spark] HyukjinKwon commented on a change in pull request #28593: [SPARK-31710][SQL] Add two compatibility flag to cast long to timestamp
HyukjinKwon commented on a change in pull request #28593:
URL: https://github.com/apache/spark/pull/28593#discussion_r428422910

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala ##

@@ -1277,7 +1285,11 @@ abstract class CastBase extends UnaryExpression with TimeZoneAwareExpression wit
     val block = inline"new java.math.BigDecimal($MICROS_PER_SECOND)"
     code"($d.toBigDecimal().bigDecimal().multiply($block)).longValue()"
   }
-  private[this] def longToTimeStampCode(l: ExprValue): Block = code"$l * (long)$MICROS_PER_SECOND"
+  private[this] def longToTimeStampCode(l: ExprValue): Block = {
+    if (SQLConf.get.numericConvertToTimestampInSeconds) code""
+

Review comment:
Let's change `l` to something else per https://github.com/databricks/scala-style-guide#variable-naming while we're here.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28594: [SPARK-31780][K8S][TESTS] Add R test tag to exclude R K8s image building and test
AmplabJenkins removed a comment on pull request #28594: URL: https://github.com/apache/spark/pull/28594#issuecomment-631851933
[GitHub] [spark] AmplabJenkins commented on pull request #28594: [SPARK-31780][K8S][TESTS] Add R test tag to exclude R K8s image building and test
AmplabJenkins commented on pull request #28594: URL: https://github.com/apache/spark/pull/28594#issuecomment-631851933
[GitHub] [spark] SparkQA removed a comment on pull request #28594: [SPARK-31780][K8S][TESTS] Add R test tag to exclude R K8s image building and test
SparkQA removed a comment on pull request #28594: URL: https://github.com/apache/spark/pull/28594#issuecomment-631804956 **[Test build #122908 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122908/testReport)** for PR 28594 at commit [`0a37e82`](https://github.com/apache/spark/commit/0a37e821a097b0cb842e8cbc23a788afe1929ac5).
[GitHub] [spark] Ngone51 commented on pull request #28572: [SPARK-31750][SQL] Eliminate UpCast if child's dataType is DecimalType
Ngone51 commented on pull request #28572: URL: https://github.com/apache/spark/pull/28572#issuecomment-631851718 thanks all!
[GitHub] [spark] SparkQA commented on pull request #28594: [SPARK-31780][K8S][TESTS] Add R test tag to exclude R K8s image building and test
SparkQA commented on pull request #28594:
URL: https://github.com/apache/spark/pull/28594#issuecomment-631851365

**[Test build #122908 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122908/testReport)** for PR 28594 at commit [`0a37e82`](https://github.com/apache/spark/commit/0a37e821a097b0cb842e8cbc23a788afe1929ac5).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28593: [SPARK-31710][SQL] Add two compatibility flag to cast long to timestamp
AmplabJenkins removed a comment on pull request #28593: URL: https://github.com/apache/spark/pull/28593#issuecomment-631848306 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/122910/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28593: [SPARK-31710][SQL] Add two compatibility flag to cast long to timestamp
AmplabJenkins removed a comment on pull request #28593: URL: https://github.com/apache/spark/pull/28593#issuecomment-631848296 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on pull request #28593: [SPARK-31710][SQL] Add two compatibility flag to cast long to timestamp
AmplabJenkins commented on pull request #28593: URL: https://github.com/apache/spark/pull/28593#issuecomment-631848296
[GitHub] [spark] SparkQA removed a comment on pull request #28593: [SPARK-31710][SQL] Add two compatibility flag to cast long to timestamp
SparkQA removed a comment on pull request #28593: URL: https://github.com/apache/spark/pull/28593#issuecomment-631815642 **[Test build #122910 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122910/testReport)** for PR 28593 at commit [`7f0ba76`](https://github.com/apache/spark/commit/7f0ba76a5e9d4604b2f586b1e1bc512f7675115b).
[GitHub] [spark] SparkQA commented on pull request #28593: [SPARK-31710][SQL] Add two compatibility flag to cast long to timestamp
SparkQA commented on pull request #28593:
URL: https://github.com/apache/spark/pull/28593#issuecomment-631848096

**[Test build #122910 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122910/testReport)** for PR 28593 at commit [`7f0ba76`](https://github.com/apache/spark/commit/7f0ba76a5e9d4604b2f586b1e1bc512f7675115b).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28594: [SPARK-31780][K8S][TESTS] Add R test tag to exclude R K8s image building and test
AmplabJenkins removed a comment on pull request #28594: URL: https://github.com/apache/spark/pull/28594#issuecomment-631836881
[GitHub] [spark] AmplabJenkins commented on pull request #28594: [SPARK-31780][K8S][TESTS] Add R test tag to exclude R K8s image building and test
AmplabJenkins commented on pull request #28594: URL: https://github.com/apache/spark/pull/28594#issuecomment-631836881
[GitHub] [spark] SparkQA removed a comment on pull request #28594: [SPARK-31780][K8S][TESTS] Add R test tag to exclude R K8s image building and test
SparkQA removed a comment on pull request #28594: URL: https://github.com/apache/spark/pull/28594#issuecomment-631791158 **[Test build #122907 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122907/testReport)** for PR 28594 at commit [`a26fb26`](https://github.com/apache/spark/commit/a26fb26afdeed73fbaaaf91281680e4ac2c41817).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28595: [SPARK-31781][ML][PySpark] Move param k (number of clusters) to shared params
AmplabJenkins removed a comment on pull request #28595: URL: https://github.com/apache/spark/pull/28595#issuecomment-631836062 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/122911/ Test FAILed.
[GitHub] [spark] SparkQA commented on pull request #28594: [SPARK-31780][K8S][TESTS] Add R test tag to exclude R K8s image building and test
SparkQA commented on pull request #28594: URL: https://github.com/apache/spark/pull/28594#issuecomment-631836233 **[Test build #122907 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122907/testReport)** for PR 28594 at commit [`a26fb26`](https://github.com/apache/spark/commit/a26fb26afdeed73fbaaaf91281680e4ac2c41817). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28595: [SPARK-31781][ML][PySpark] Move param k (number of clusters) to shared params
AmplabJenkins removed a comment on pull request #28595: URL: https://github.com/apache/spark/pull/28595#issuecomment-631836059 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA commented on pull request #28595: [SPARK-31781][ML][PySpark] Move param k (number of clusters) to shared params
SparkQA commented on pull request #28595: URL: https://github.com/apache/spark/pull/28595#issuecomment-631836040 **[Test build #122911 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122911/testReport)** for PR 28595 at commit [`d259e5e`](https://github.com/apache/spark/commit/d259e5eda4404e34a4ac7c6fc8afef40cddcbe0d). * This patch **fails MiMa tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `trait HasK extends Params ` * `class _LDAParams(HasMaxIter, HasFeaturesCol, HasSeed, HasCheckpointInterval, HasK):` * `class _PowerIterationClusteringParams(HasMaxIter, HasWeightCol, HasK):` * `class HasK(Params):`
[GitHub] [spark] SparkQA removed a comment on pull request #28595: [SPARK-31781][ML][PySpark] Move param k (number of clusters) to shared params
SparkQA removed a comment on pull request #28595: URL: https://github.com/apache/spark/pull/28595#issuecomment-631833038 **[Test build #122911 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122911/testReport)** for PR 28595 at commit [`d259e5e`](https://github.com/apache/spark/commit/d259e5eda4404e34a4ac7c6fc8afef40cddcbe0d).
[GitHub] [spark] AmplabJenkins commented on pull request #28595: [SPARK-31781][ML][PySpark] Move param k (number of clusters) to shared params
AmplabJenkins commented on pull request #28595: URL: https://github.com/apache/spark/pull/28595#issuecomment-631836059
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28595: [SPARK-31781][ML][PySpark] Move param k (number of clusters) to shared params
AmplabJenkins removed a comment on pull request #28595: URL: https://github.com/apache/spark/pull/28595#issuecomment-631833284
[GitHub] [spark] AmplabJenkins commented on pull request #28595: [SPARK-31781][ML][PySpark] Move param k (number of clusters) to shared params
AmplabJenkins commented on pull request #28595: URL: https://github.com/apache/spark/pull/28595#issuecomment-631833284
[GitHub] [spark] SparkQA commented on pull request #28595: [SPARK-31781][ML][PySpark] Move param k (number of clusters) to shared params
SparkQA commented on pull request #28595: URL: https://github.com/apache/spark/pull/28595#issuecomment-631833038 **[Test build #122911 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122911/testReport)** for PR 28595 at commit [`d259e5e`](https://github.com/apache/spark/commit/d259e5eda4404e34a4ac7c6fc8afef40cddcbe0d).
[GitHub] [spark] huaxingao opened a new pull request #28595: [SPARK-31781][ML][PySpark] Move param k (number of clusters) to shared params
huaxingao opened a new pull request #28595: URL: https://github.com/apache/spark/pull/28595 ### What changes were proposed in this pull request? Param k (number of clusters) is used for all the clustering algorithms, so move it to shared params. ### Why are the changes needed? Code reuse ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? existing tests
[GitHub] [spark] dongjoon-hyun closed pull request #28594: [SPARK-31780][K8S][TESTS] Add R test tag to exclude R K8s image building and test
dongjoon-hyun closed pull request #28594: URL: https://github.com/apache/spark/pull/28594
[GitHub] [spark] dbtsai commented on pull request #28594: [SPARK-31780][K8S][TESTS] Add R test tag to exclude R K8s image building and test
dbtsai commented on pull request #28594: URL: https://github.com/apache/spark/pull/28594#issuecomment-631825289 LGTM.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28594: [SPARK-31780][K8S][TESTS] Add R test tag to exclude R K8s image building and test
AmplabJenkins removed a comment on pull request #28594: URL: https://github.com/apache/spark/pull/28594#issuecomment-631824322
[GitHub] [spark] SparkQA commented on pull request #28594: [SPARK-31780][K8S][TESTS] Add R test tag to exclude R K8s image building and test
SparkQA commented on pull request #28594: URL: https://github.com/apache/spark/pull/28594#issuecomment-631824311 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/27551/
[GitHub] [spark] AmplabJenkins commented on pull request #28594: [SPARK-31780][K8S][TESTS] Add R test tag to exclude R K8s image building and test
AmplabJenkins commented on pull request #28594: URL: https://github.com/apache/spark/pull/28594#issuecomment-631824322
[GitHub] [spark] dongjoon-hyun commented on pull request #28594: [SPARK-31780][K8S][TESTS] Add R test tag to exclude R K8s image building and test
dongjoon-hyun commented on pull request #28594: URL: https://github.com/apache/spark/pull/28594#issuecomment-631817059 Hi, @dbtsai . Could you review this PR?
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28593: [SPARK-31710][SQL] Add two compatibility flag to cast long to timestamp
AmplabJenkins removed a comment on pull request #28593: URL: https://github.com/apache/spark/pull/28593#issuecomment-631816007
[GitHub] [spark] AmplabJenkins commented on pull request #28593: [SPARK-31710][SQL] Add two compatibility flag to cast long to timestamp
AmplabJenkins commented on pull request #28593: URL: https://github.com/apache/spark/pull/28593#issuecomment-631816007
[GitHub] [spark] SparkQA commented on pull request #28594: [SPARK-31780][K8S][TESTS] Add R test tag to exclude R K8s image building and test
SparkQA commented on pull request #28594: URL: https://github.com/apache/spark/pull/28594#issuecomment-631816084 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/27551/
[GitHub] [spark] SparkQA commented on pull request #28593: [SPARK-31710][SQL] Add two compatibility flag to cast long to timestamp
SparkQA commented on pull request #28593: URL: https://github.com/apache/spark/pull/28593#issuecomment-631815642 **[Test build #122910 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122910/testReport)** for PR 28593 at commit [`7f0ba76`](https://github.com/apache/spark/commit/7f0ba76a5e9d4604b2f586b1e1bc512f7675115b).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28593: [SPARK-31710][SQL] Add two compatibility flag to cast long to timestamp
AmplabJenkins removed a comment on pull request #28593: URL: https://github.com/apache/spark/pull/28593#issuecomment-631810838 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/122909/ Test FAILed.
[GitHub] [spark] SparkQA removed a comment on pull request #28593: [SPARK-31710][SQL] Add two compatibility flag to cast long to timestamp
SparkQA removed a comment on pull request #28593: URL: https://github.com/apache/spark/pull/28593#issuecomment-631809545 **[Test build #122909 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122909/testReport)** for PR 28593 at commit [`a39067d`](https://github.com/apache/spark/commit/a39067d6f326df4dc1292ff65d2e70b74baf0fe1).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28593: [SPARK-31710][SQL] Add two compatibility flag to cast long to timestamp
AmplabJenkins removed a comment on pull request #28593: URL: https://github.com/apache/spark/pull/28593#issuecomment-631810835 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA commented on pull request #28593: [SPARK-31710][SQL] Add two compatibility flag to cast long to timestamp
SparkQA commented on pull request #28593: URL: https://github.com/apache/spark/pull/28593#issuecomment-631810824 **[Test build #122909 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122909/testReport)** for PR 28593 at commit [`a39067d`](https://github.com/apache/spark/commit/a39067d6f326df4dc1292ff65d2e70b74baf0fe1). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on pull request #28593: [SPARK-31710][SQL] Add two compatibility flag to cast long to timestamp
AmplabJenkins commented on pull request #28593: URL: https://github.com/apache/spark/pull/28593#issuecomment-631810835
[GitHub] [spark] AmplabJenkins commented on pull request #28593: [SPARK-31710][SQL] Add two compatibility flag to cast long to timestamp
AmplabJenkins commented on pull request #28593: URL: https://github.com/apache/spark/pull/28593#issuecomment-631810012
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28593: [SPARK-31710][SQL] Add two compatibility flag to cast long to timestamp
AmplabJenkins removed a comment on pull request #28593: URL: https://github.com/apache/spark/pull/28593#issuecomment-631810012
[GitHub] [spark] SparkQA commented on pull request #28593: [SPARK-31710][SQL] Add two compatibility flag to cast long to timestamp
SparkQA commented on pull request #28593: URL: https://github.com/apache/spark/pull/28593#issuecomment-631809545 **[Test build #122909 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122909/testReport)** for PR 28593 at commit [`a39067d`](https://github.com/apache/spark/commit/a39067d6f326df4dc1292ff65d2e70b74baf0fe1).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28594: [SPARK-31780][K8S][TESTS] Add R test tag to exclude R K8s image building and test
AmplabJenkins removed a comment on pull request #28594: URL: https://github.com/apache/spark/pull/28594#issuecomment-631804786 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/27550/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28594: [SPARK-31780][K8S][TESTS] Add R test tag to exclude R K8s image building and test
AmplabJenkins removed a comment on pull request #28594: URL: https://github.com/apache/spark/pull/28594#issuecomment-631804784 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA commented on pull request #28594: [SPARK-31780][K8S][TESTS] Add R test tag to exclude R K8s image building and test
SparkQA commented on pull request #28594: URL: https://github.com/apache/spark/pull/28594#issuecomment-631804956 **[Test build #122908 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122908/testReport)** for PR 28594 at commit [`0a37e82`](https://github.com/apache/spark/commit/0a37e821a097b0cb842e8cbc23a788afe1929ac5).
[GitHub] [spark] SparkQA commented on pull request #28594: [SPARK-31780][K8S][TESTS] Add R test tag to exclude R K8s image building and test
SparkQA commented on pull request #28594: URL: https://github.com/apache/spark/pull/28594#issuecomment-631804775 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/27550/
[GitHub] [spark] AmplabJenkins commented on pull request #28594: [SPARK-31780][K8S][TESTS] Add R test tag to exclude R K8s image building and test
AmplabJenkins commented on pull request #28594: URL: https://github.com/apache/spark/pull/28594#issuecomment-631804784
[GitHub] [spark] github-actions[bot] commented on pull request #27436: [SPARK-30705][SQL] Improve CaseWhen sub-expression equality
github-actions[bot] commented on pull request #27436: URL: https://github.com/apache/spark/pull/27436#issuecomment-631803512 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!
[GitHub] [spark] github-actions[bot] commented on pull request #27231: [SPARK-28478][SQL] Remove redundant null checks
github-actions[bot] commented on pull request #27231: URL: https://github.com/apache/spark/pull/27231#issuecomment-631803519 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!
[GitHub] [spark] SparkQA commented on pull request #28594: [SPARK-31780][K8S][TESTS] Add R test tag to exclude R K8s image building and test
SparkQA commented on pull request #28594: URL: https://github.com/apache/spark/pull/28594#issuecomment-631803216 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/27550/
[GitHub] [spark] HeartSaVioR commented on a change in pull request #28363: [SPARK-27188][SS] FileStreamSink: provide a new option to have retention on output files
HeartSaVioR commented on a change in pull request #28363: URL: https://github.com/apache/spark/pull/28363#discussion_r428373935 ## File path: docs/structured-streaming-programming-guide.md ## @@ -1860,7 +1860,10 @@ Here are the details of all the sinks in Spark. File Sink Append -path: path to the output directory, must be specified. +path: path to the output directory, must be specified. +outputRetentionMs: time to live (TTL) for output files. Output files which batches were Review comment: I guess we avoid exposing implementation details in the docs. For example, if I'm not mistaken, there's no explanation of the metadata format, so it would be confusing which field is being used, because end users don't even know what the fields are.
[GitHub] [spark] HeartSaVioR commented on a change in pull request #28363: [SPARK-27188][SS] FileStreamSink: provide a new option to have retention on output files
HeartSaVioR commented on a change in pull request #28363: URL: https://github.com/apache/spark/pull/28363#discussion_r428372910 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSinkLog.scala ## @@ -45,7 +46,20 @@ case class SinkFileStatus( modificationTime: Long, blockReplication: Int, blockSize: Long, -action: String) { +action: String, +commitTime: Long) { Review comment: So the introduction of "commit time" came from the concern about the uncertainty of the HDFS file timestamp in the previous PR. If we are sure about the modification time, there is no need to use "commit time".
[GitHub] [spark] SparkQA commented on pull request #28594: [SPARK-31780][K8S][TESTS] Add R test tag to exclude R K8s image building and test
SparkQA commented on pull request #28594: URL: https://github.com/apache/spark/pull/28594#issuecomment-631791158 **[Test build #122907 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122907/testReport)** for PR 28594 at commit [`a26fb26`](https://github.com/apache/spark/commit/a26fb26afdeed73fbaaaf91281680e4ac2c41817).
[GitHub] [spark] dongjoon-hyun opened a new pull request #28594: [SPARK-31780][K8S][TESTS] Add R test tag to exclude R K8s image building and test
dongjoon-hyun opened a new pull request #28594: URL: https://github.com/apache/spark/pull/28594 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested?
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28593: [SPARK-31710][SQL] Add two compatibility flag to cast long to timestamp
AmplabJenkins removed a comment on pull request #28593: URL: https://github.com/apache/spark/pull/28593#issuecomment-631785210 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/122906/ Test FAILed.
[GitHub] [spark] SparkQA removed a comment on pull request #28593: [SPARK-31710][SQL] Add two compatibility flag to cast long to timestamp
SparkQA removed a comment on pull request #28593: URL: https://github.com/apache/spark/pull/28593#issuecomment-631783497 **[Test build #122906 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122906/testReport)** for PR 28593 at commit [`4577fa8`](https://github.com/apache/spark/commit/4577fa813b15e828a46ee322d4119b17e796ee8d).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #28128: [SPARK-31354] SparkContext only register one SparkSession ApplicationEnd listener
AmplabJenkins removed a comment on pull request #28128: URL: https://github.com/apache/spark/pull/28128#issuecomment-631785345