[GitHub] [spark] AnywalkerGiser commented on a diff in pull request #36537: [SPARK-39176][PYSPARK] Fixed a problem with pyspark serializing pre-1970 datetime in windows
AnywalkerGiser commented on code in PR #36537: URL: https://github.com/apache/spark/pull/36537#discussion_r873359793

## python/pyspark/tests/test_rdd.py ##
@@ -669,6 +670,12 @@ def test_sample(self):
         wr_s21 = rdd.sample(True, 0.4, 21).collect()
         self.assertNotEqual(set(wr_s11), set(wr_s21))
+def test_datetime(self):

Review Comment: It has been added and modified, please approve it again.

-- 
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] MaxGekk opened a new pull request, #36558: [SPARK-39187][SQL][3.3] Remove `SparkIllegalStateException`
MaxGekk opened a new pull request, #36558: URL: https://github.com/apache/spark/pull/36558

### What changes were proposed in this pull request?
Remove `SparkIllegalStateException` and replace it by `IllegalStateException` where it was used. This is a backport of https://github.com/apache/spark/pull/36550.

### Why are the changes needed?
To improve code maintenance and be consistent with other places where `IllegalStateException` is used for illegal states (for instance, see https://github.com/apache/spark/pull/36524). After the PR https://github.com/apache/spark/pull/36500, the exception is substituted by `SparkException` w/ the `INTERNAL_ERROR` error class.

### Does this PR introduce _any_ user-facing change?
No. Users shouldn't face the exception in regular cases.

### How was this patch tested?
By running the affected test suites:
```
$ build/sbt "sql/test:testOnly *QueryExecutionErrorsSuite*"
$ build/sbt "test:testOnly *ArrowUtilsSuite"
```

Authored-by: Max Gekk
Signed-off-by: Max Gekk
(cherry picked from commit 1a90512f605c490255f7b38215c207e64621475b)
Signed-off-by: Max Gekk
[GitHub] [spark] AnywalkerGiser closed pull request #36537: [SPARK-39176][PYSPARK][WINDOWS] Fixed a problem with pyspark serializing pre-1970 datetime in windows
AnywalkerGiser closed pull request #36537: [SPARK-39176][PYSPARK][WINDOWS] Fixed a problem with pyspark serializing pre-1970 datetime in windows URL: https://github.com/apache/spark/pull/36537
[GitHub] [spark] cloud-fan commented on a diff in pull request #36530: [SPARK-39172][SQL] Remove outer join if all output come from streamed side and buffered side keys exist unique key
cloud-fan commented on code in PR #36530: URL: https://github.com/apache/spark/pull/36530#discussion_r873346931

## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala ##
@@ -211,6 +219,15 @@ object EliminateOuterJoin extends Rule[LogicalPlan] with PredicateHelper {
       if projectList.forall(_.deterministic) && p.references.subsetOf(right.outputSet) &&
         allDuplicateAgnostic(aggExprs) =>
       a.copy(child = p.copy(child = right))
+
+    case p @ Project(_, ExtractEquiJoinKeys(LeftOuter, _, rightKeys, _, _, left, right, _))
+        if right.distinctKeys.exists(_.subsetOf(ExpressionSet(rightKeys))) &&
+          p.references.subsetOf(left.outputSet) =>
+      p.copy(child = left)

Review Comment: For a left outer join with only left-side columns being selected, the join can only change the result if we can find more than one matched row on the right side. If the right-side join keys are unique, apparently we can't find more than one match. So this optimization LGTM.
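The reviewer's argument above can be illustrated outside of Spark. The following plain-Python sketch (an assumption for illustration, not Spark internals) shows that when the right-side join keys are unique and only left-side columns are projected, the left outer join is a no-op:

```python
# Illustrative sketch: with unique join keys on the right side, a left outer
# join matches each left row at most once, so projecting only left-side
# columns makes the join removable.

def left_outer_join(left, right, key):
    """Naive left outer join of lists of dicts; None-pads unmatched rows."""
    out = []
    for l in left:
        matches = [r for r in right if r[key] == l[key]]
        if not matches:
            out.append({**l, "r_val": None})
        for r in matches:
            out.append({**l, "r_val": r["r_val"]})
    return out

left = [{"k": 1, "l_val": "a"}, {"k": 2, "l_val": "b"}]
right = [{"k": 1, "r_val": "x"}]  # "k" is unique on the right side

joined = left_outer_join(left, right, "k")
projected = [{"k": row["k"], "l_val": row["l_val"]} for row in joined]
assert projected == left  # same rows, same multiplicity: the join can go
```

With a non-unique right side (two rows with `k == 1`), the projection would contain a duplicate of the first left row, which is exactly why the uniqueness condition is required.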
[GitHub] [spark] cloud-fan commented on a diff in pull request #36530: [SPARK-39172][SQL] Remove outer join if all output come from streamed side and buffered side keys exist unique key
cloud-fan commented on code in PR #36530: URL: https://github.com/apache/spark/pull/36530#discussion_r873344595

## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala ##
@@ -139,6 +139,14 @@ object ReorderJoin extends Rule[LogicalPlan] with PredicateHelper {
  * SELECT t1.c1, max(t1.c2) FROM t1 GROUP BY t1.c1
  * }}}
  *
+ * 3. Remove outer join if all output comes from streamed side and the join keys from buffered side
+ *    exist unique key.

Review Comment: It looks a bit weird to talk about stream side and buffer side in the logical plan phase. Can we explain this optimization in a different way?
[GitHub] [spark] cloud-fan commented on a diff in pull request #36295: [SPARK-38978][SQL] Support push down OFFSET to JDBC data source V2
cloud-fan commented on code in PR #36295: URL: https://github.com/apache/spark/pull/36295#discussion_r873341127

## sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/SupportsPushDownOffset.java ##
@@ -0,0 +1,36 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.connector.read;
+
+import org.apache.spark.annotation.Evolving;
+
+/**
+ * A mix-in interface for {@link ScanBuilder}. Data sources can implement this interface to
+ * push down OFFSET. Please note that the combination of OFFSET with other operations
+ * such as AGGREGATE, GROUP BY, SORT BY, CLUSTER BY, DISTRIBUTE BY, etc. is NOT pushed down.

Review Comment: BTW we need to update `ScanBuilder`'s classdoc for the new pushdown support.
[GitHub] [spark] cloud-fan commented on a diff in pull request #36295: [SPARK-38978][SQL] Support push down OFFSET to JDBC data source V2
cloud-fan commented on code in PR #36295: URL: https://github.com/apache/spark/pull/36295#discussion_r873340929

## sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/SupportsPushDownOffset.java ##
@@ -0,0 +1,36 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.connector.read;
+
+import org.apache.spark.annotation.Evolving;
+
+/**
+ * A mix-in interface for {@link ScanBuilder}. Data sources can implement this interface to
+ * push down OFFSET. Please note that the combination of OFFSET with other operations
+ * such as AGGREGATE, GROUP BY, SORT BY, CLUSTER BY, DISTRIBUTE BY, etc. is NOT pushed down.

Review Comment: I understand that this is copied from other pushdown interfaces, but I find it really hard to follow. We can push down OFFSET with many other operators if they follow the operator order we defined in `ScanBuilder`'s class doc.
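The operator-order point in the review above can be made concrete with a small sketch. This is plain Python standing in for scanned rows, not the DSv2 API; the function name is a hypothetical placeholder:

```python
# Sketch of the OFFSET/LIMIT composition a pushdown-capable source must
# preserve: OFFSET skips rows first, then LIMIT caps what remains, so
# "LIMIT n OFFSET m" reads rows [m, m + n).

def offset_then_limit(rows, offset, limit):
    return rows[offset:offset + limit]

rows = list(range(10))
assert offset_then_limit(rows, 3, 4) == [3, 4, 5, 6]

# Applying the operators in the wrong order gives a different answer,
# which is why the class doc has to pin down the order:
assert rows[:4][3:] == [3]  # LIMIT first, then OFFSET: wrong result
```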
[GitHub] [spark] MaxGekk commented on pull request #36479: [SPARK-38688][SQL][TESTS] Use error classes in the compilation errors of deserializer
MaxGekk commented on PR #36479: URL: https://github.com/apache/spark/pull/36479#issuecomment-1127239102 @panbingkun Since this PR modified error classes, could you backport it to branch-3.3, please.
[GitHub] [spark] MaxGekk closed pull request #36479: [SPARK-38688][SQL][TESTS] Use error classes in the compilation errors of deserializer
MaxGekk closed pull request #36479: [SPARK-38688][SQL][TESTS] Use error classes in the compilation errors of deserializer URL: https://github.com/apache/spark/pull/36479
[GitHub] [spark] cloud-fan closed pull request #36412: [SPARK-39073][SQL] Keep rowCount after hive table partition pruning if table only have hive statistics
cloud-fan closed pull request #36412: [SPARK-39073][SQL] Keep rowCount after hive table partition pruning if table only have hive statistics URL: https://github.com/apache/spark/pull/36412
[GitHub] [spark] cloud-fan commented on pull request #36412: [SPARK-39073][SQL] Keep rowCount after hive table partition pruning if table only have hive statistics
cloud-fan commented on PR #36412: URL: https://github.com/apache/spark/pull/36412#issuecomment-1127235625 thanks, merging to master!
[GitHub] [spark] cloud-fan commented on a diff in pull request #36412: [SPARK-39073][SQL] Keep rowCount after hive table partition pruning if table only have hive statistics
cloud-fan commented on code in PR #36412: URL: https://github.com/apache/spark/pull/36412#discussion_r87309

## sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/PruneHiveTablePartitions.scala ##
@@ -80,10 +80,15 @@ private[sql] class PruneHiveTablePartitions(session: SparkSession)
     val colStats = filteredStats.map(_.attributeStats.map { case (attr, colStat) =>
       (attr.name, colStat.toCatalogColumnStat(attr.name, attr.dataType))
     })
+    val rowCount = if (prunedPartitions.forall(_.stats.flatMap(_.rowCount).exists(_ > 0))) {

Review Comment: You are right, I misread the code.
[GitHub] [spark] MaxGekk closed pull request #36550: [SPARK-39187][SQL] Remove `SparkIllegalStateException`
MaxGekk closed pull request #36550: [SPARK-39187][SQL] Remove `SparkIllegalStateException` URL: https://github.com/apache/spark/pull/36550
[GitHub] [spark] cloud-fan closed pull request #36121: [SPARK-38836][SQL] Improve the performance of ExpressionSet
cloud-fan closed pull request #36121: [SPARK-38836][SQL] Improve the performance of ExpressionSet URL: https://github.com/apache/spark/pull/36121
[GitHub] [spark] MaxGekk commented on pull request #36550: [SPARK-39187][SQL] Remove `SparkIllegalStateException`
MaxGekk commented on PR #36550: URL: https://github.com/apache/spark/pull/36550#issuecomment-1127234215 Merging to master. Thank you, @HyukjinKwon and @cloud-fan for review.
[GitHub] [spark] cloud-fan commented on pull request #36121: [SPARK-38836][SQL] Improve the performance of ExpressionSet
cloud-fan commented on PR #36121: URL: https://github.com/apache/spark/pull/36121#issuecomment-1127234077 thanks, merging to master!
[GitHub] [spark] AnywalkerGiser commented on pull request #36537: [SPARK-39176][PYSPARK][WINDOWS] Fixed a problem with pyspark serializing pre-1970 datetime in windows
AnywalkerGiser commented on PR #36537: URL: https://github.com/apache/spark/pull/36537#issuecomment-1127233836 @HyukjinKwon It hasn't been tested in master, I found the problem in 3.0.1, and I can test it in master later.
[GitHub] [spark] cloud-fan commented on a diff in pull request #36541: [SPARK-39180][SQL] Simplify the planning of limit and offset
cloud-fan commented on code in PR #36541: URL: https://github.com/apache/spark/pull/36541#discussion_r873317698

## sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala ##
@@ -82,52 +82,45 @@ abstract class SparkStrategies extends QueryPlanner[SparkPlan] {
   object SpecialLimits extends Strategy {
     override def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match {
       case ReturnAnswer(rootPlan) => rootPlan match {
-        case Limit(IntegerLiteral(limit), Sort(order, true, child))

Review Comment: As I mentioned in the PR description, we don't need to plan `TakeOrderedAndProjectExec` under `ReturnAnswer`, as we don't have special logic for it. It will still be planned in the normal code path, which is `case other => planLater(other) :: Nil`, and we do have a planner rule to match `Limit(IntegerLiteral(limit), Sort(order, true, child))`.
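The pattern `Limit(IntegerLiteral(limit), Sort(order, true, child))` mentioned above is what lets Spark plan a single top-k operator (`TakeOrderedAndProjectExec`) instead of a full sort followed by a limit. A plain-Python sketch of the equivalence (an illustration, not Spark code):

```python
import heapq

# Sorting everything and then taking the first n rows gives the same answer
# as a single-pass top-k that only keeps n elements at a time; that is the
# rewrite behind planning Limit-over-Sort as one operator.

data = [7, 1, 9, 3, 5, 2, 8]
n = 3

full_sort_then_limit = sorted(data)[:n]  # Limit(n, Sort(child))
top_k = heapq.nsmallest(n, data)         # single top-k pass

assert full_sort_then_limit == top_k == [1, 2, 3]
```

The win is that the top-k form never materializes a full sort of the input, which matters when `child` is large and `n` is small.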
[GitHub] [spark] cloud-fan commented on a diff in pull request #36531: [SPARK-39171][SQL] Unify the Cast expression
cloud-fan commented on code in PR #36531: URL: https://github.com/apache/spark/pull/36531#discussion_r873314783

## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala ##
@@ -2117,7 +2265,9 @@ case class Cast(
     child: Expression,
     dataType: DataType,
     timeZoneId: Option[String] = None,
-    override val ansiEnabled: Boolean = SQLConf.get.ansiEnabled)
+    override val ansiEnabled: Boolean = SQLConf.get.ansiEnabled,
+    fallbackConfKey: String = SQLConf.ANSI_ENABLED.key,
+    fallbackConfValue: String = "false")

Review Comment: Can we make it an abstract class so that implementations can override? I'm really worried about changing the class constructor, as many Spark plugins use `Cast.apply/unapply`.
[GitHub] [spark] gengliangwang commented on a diff in pull request #36557: [SPARK-39190][SQL] Provide query context for decimal precision overflow error when WSCG is off
gengliangwang commented on code in PR #36557: URL: https://github.com/apache/spark/pull/36557#discussion_r873307369

## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/decimalExpressions.scala ##
@@ -128,7 +128,7 @@ case class PromotePrecision(child: Expression) extends UnaryExpression {
 case class CheckOverflow(

Review Comment: Note: we need to change `CheckOverflowInSum` as well. However, its error context is actually empty even when WSCG is available, and I need more time for that. I am making this change now to catch the Spark 3.3 RC2, which is happening soon.
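For readers unfamiliar with what `CheckOverflow` guards against, here is a hedged sketch of the underlying precision rule (plain Python, not Spark's implementation; `fits_decimal` is a hypothetical helper): a value fits `DECIMAL(precision, scale)` only if it needs at most `precision - scale` digits before the decimal point.

```python
from decimal import Decimal

# Sketch of a decimal precision overflow check: DECIMAL(p, s) stores at most
# p significant digits with s of them after the point, so the magnitude must
# stay below 10 ** (p - s).

def fits_decimal(value: Decimal, precision: int, scale: int) -> bool:
    return abs(value) < Decimal(10) ** (precision - scale)

assert fits_decimal(Decimal("99.99"), 4, 2)       # fits DECIMAL(4, 2)
assert not fits_decimal(Decimal("100.00"), 4, 2)  # overflows DECIMAL(4, 2)
```

When the check fails under ANSI mode, Spark raises an error, and the query context this PR adds is what tells the user which expression in the query produced the overflowing value.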
[GitHub] [spark] gengliangwang opened a new pull request, #36557: [SPARK-39190][SQL] Provide query context for decimal precision overflow error when WSCG is off
gengliangwang opened a new pull request, #36557: URL: https://github.com/apache/spark/pull/36557

### What changes were proposed in this pull request?
Similar to https://github.com/apache/spark/pull/36525, this PR provides query context for the decimal precision overflow error when WSCG is off.

### Why are the changes needed?
Enhance the runtime error query context of checking decimal overflow. After the changes, it works when whole-stage codegen is not available.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
UT
[GitHub] [spark] AnywalkerGiser commented on a diff in pull request #36537: [SPARK-39176][PYSPARK][WINDOWS] Fixed a problem with pyspark serializing pre-1970 datetime in windows
AnywalkerGiser commented on code in PR #36537: URL: https://github.com/apache/spark/pull/36537#discussion_r873305033

## python/pyspark/sql/types.py ##
@@ -191,14 +191,25 @@ def needConversion(self):
     def toInternal(self, dt):
         if dt is not None:
-            seconds = (calendar.timegm(dt.utctimetuple()) if dt.tzinfo
-                       else time.mktime(dt.timetuple()))
+            seconds = 0.0
+            try:
+                seconds = (calendar.timegm(dt.utctimetuple()) if dt.tzinfo
+                           else time.mktime(dt.timetuple()))
+            except:

Review Comment: Sure, I'll change the test again.
[GitHub] [spark] AngersZhuuuu commented on pull request #36056: [SPARK-36571][SQL] Add an SQLOverwriteHadoopMapReduceCommitProtocol to support all SQL overwrite write data to staging dir
AngersZhuuuu commented on PR #36056: URL: https://github.com/apache/spark/pull/36056#issuecomment-1127179471 Gentle ping @cloud-fan, could you take a look?
[GitHub] [spark] AngersZhuuuu commented on pull request #35799: [SPARK-38498][STREAM] Support customized StreamingListener by configuration
AngersZhuuuu commented on PR #35799: URL: https://github.com/apache/spark/pull/35799#issuecomment-1127178691 Any more suggestions?
[GitHub] [spark] HyukjinKwon commented on pull request #36537: [SPARK-39176][PYSPARK][WINDOWS] Fixed a problem with pyspark serializing pre-1970 datetime in windows
HyukjinKwon commented on PR #36537: URL: https://github.com/apache/spark/pull/36537#issuecomment-1127177497 @AnywalkerGiser mind creating a PR against `master` branch?
[GitHub] [spark] HyukjinKwon commented on a diff in pull request #36537: [SPARK-39176][PYSPARK][WINDOWS] Fixed a problem with pyspark serializing pre-1970 datetime in windows
HyukjinKwon commented on code in PR #36537: URL: https://github.com/apache/spark/pull/36537#discussion_r873298180

## python/pyspark/sql/types.py ##
@@ -191,14 +191,25 @@ def needConversion(self):
     def toInternal(self, dt):
         if dt is not None:
-            seconds = (calendar.timegm(dt.utctimetuple()) if dt.tzinfo
-                       else time.mktime(dt.timetuple()))
+            seconds = 0.0
+            try:
+                seconds = (calendar.timegm(dt.utctimetuple()) if dt.tzinfo
+                           else time.mktime(dt.timetuple()))
+            except:

Review Comment: Can we do this with an if/else on the OS and a negative-value check?
[GitHub] [spark] HyukjinKwon commented on a diff in pull request #36537: [SPARK-39176][PYSPARK][WINDOWS] Fixed a problem with pyspark serializing pre-1970 datetime in windows
HyukjinKwon commented on code in PR #36537: URL: https://github.com/apache/spark/pull/36537#discussion_r873297988

## python/pyspark/tests/test_rdd.py ##
@@ -669,6 +670,12 @@ def test_sample(self):
         wr_s21 = rdd.sample(True, 0.4, 21).collect()
         self.assertNotEqual(set(wr_s11), set(wr_s21))
+def test_datetime(self):

Review Comment: Should probably add a comment like:
```
SPARK-39176: ...
```
See also https://spark.apache.org/contributing.html
[GitHub] [spark] HyukjinKwon commented on a diff in pull request #36537: [SPARK-39176][PYSPARK][WINDOWS] Fixed a problem with pyspark serializing pre-1970 datetime in windows
HyukjinKwon commented on code in PR #36537: URL: https://github.com/apache/spark/pull/36537#discussion_r873297660

## python/pyspark/sql/types.py ##
@@ -191,14 +191,25 @@ def needConversion(self):
     def toInternal(self, dt):
         if dt is not None:
-            seconds = (calendar.timegm(dt.utctimetuple()) if dt.tzinfo
-                       else time.mktime(dt.timetuple()))
+            seconds = 0.0
+            try:
+                seconds = (calendar.timegm(dt.utctimetuple()) if dt.tzinfo
+                           else time.mktime(dt.timetuple()))
+            except:

Review Comment: I think we'd better not rely on exception handling for the regular data parsing path.
[GitHub] [spark] HyukjinKwon commented on a diff in pull request #36537: [SPARK-39176][PYSPARK][WINDOWS] Fixed a problem with pyspark serializing pre-1970 datetime in windows
HyukjinKwon commented on code in PR #36537: URL: https://github.com/apache/spark/pull/36537#discussion_r873297554

## python/pyspark/sql/types.py ##
@@ -191,14 +191,25 @@ def needConversion(self):
     def toInternal(self, dt):
         if dt is not None:
-            seconds = (calendar.timegm(dt.utctimetuple()) if dt.tzinfo
-                       else time.mktime(dt.timetuple()))
+            seconds = 0.0
+            try:
+                seconds = (calendar.timegm(dt.utctimetuple()) if dt.tzinfo
+                           else time.mktime(dt.timetuple()))
+            except:
+                # On Windows, the current value is converted to a timestamp when the current value is less than 1970
+                seconds = (dt - datetime.datetime.fromtimestamp(int(time.localtime(0).tm_sec) / 1000)).total_seconds()

Review Comment: IIRC the pre-1970 handling issue is not an OS-specific problem. It would be great if you could link some reported issues related to that.
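The review thread above is about `time.mktime`, which can raise `OSError` on Windows for pre-1970 (negative-timestamp) datetimes. A hedged sketch of the kind of platform-independent alternative the reviewers are hinting at (not the PR's actual fix; this version assumes naive datetimes are UTC, whereas `toInternal` uses local time):

```python
import datetime

# Subtracting the epoch directly handles negative timestamps on every
# platform, avoiding time.mktime entirely.
EPOCH = datetime.datetime(1970, 1, 1, tzinfo=datetime.timezone.utc)

def to_unix_seconds(dt: datetime.datetime) -> float:
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=datetime.timezone.utc)  # assumption: naive == UTC
    return (dt - EPOCH).total_seconds()

assert to_unix_seconds(datetime.datetime(1970, 1, 1)) == 0.0
assert to_unix_seconds(datetime.datetime(1969, 12, 31, 23, 59)) == -60.0
```

Because it is pure datetime arithmetic, this path needs no exception handling and no OS check, which lines up with the earlier comment about not relying on exceptions for regular parsing.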
[GitHub] [spark] AngersZhuuuu commented on a diff in pull request #36550: [SPARK-39187][SQL] Remove `SparkIllegalStateException`
AngersZh commented on code in PR #36550: URL: https://github.com/apache/spark/pull/36550#discussion_r873294811 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala: ## @@ -582,8 +582,8 @@ trait CheckAnalysis extends PredicateHelper with LookupCatalog { |in operator ${operator.simpleString(SQLConf.get.maxToStringFields)} """.stripMargin) - case _: UnresolvedHint => -throw QueryExecutionErrors.logicalHintOperatorNotRemovedDuringAnalysisError + case _: UnresolvedHint => throw new IllegalStateException( +"Logical hint operator should be removed during analysis.") Review Comment: How about ``` case _: UnresolvedHint => throw new IllegalStateException("Logical hint operator should be removed during analysis.") ```
[GitHub] [spark] beliefer opened a new pull request, #36556: [SPARK-39162][SQL][3.3] Jdbc dialect should decide which function could be pushed down
beliefer opened a new pull request, #36556: URL: https://github.com/apache/spark/pull/36556 ### What changes were proposed in this pull request? This PR backports https://github.com/apache/spark/pull/36521 to branch-3.3. ### Why are the changes needed? To make function push-down more flexible. ### Does this PR introduce _any_ user-facing change? 'No'. New feature. ### How was this patch tested? Existing tests.
[GitHub] [spark] AnywalkerGiser commented on pull request #36537: [SPARK-39176][PYSPARK][WINDOWS] Fixed a problem with pyspark serializing pre-1970 datetime in windows
AnywalkerGiser commented on PR #36537: URL: https://github.com/apache/spark/pull/36537#issuecomment-1127149821 Is there a committer who can approve this?
[GitHub] [spark] beliefer commented on pull request #36521: [SPARK-39162][SQL] Jdbc dialect should decide which function could be pushed down
beliefer commented on PR #36521: URL: https://github.com/apache/spark/pull/36521#issuecomment-1127146479 @cloud-fan @huaxingao Thanks a lot! I will create a backport to 3.3.
[GitHub] [spark] beliefer closed pull request #36520: [SPARK-38633][SQL] Support push down AnsiCast to JDBC data source V2
beliefer closed pull request #36520: [SPARK-38633][SQL] Support push down AnsiCast to JDBC data source V2 URL: https://github.com/apache/spark/pull/36520
[GitHub] [spark] beliefer commented on a diff in pull request #36531: [SPARK-39171][SQL] Unify the Cast expression
beliefer commented on code in PR #36531: URL: https://github.com/apache/spark/pull/36531#discussion_r873277888 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala: ## @@ -275,6 +376,53 @@ object Cast { case _ => null } } + + // Show suggestion on how to complete the disallowed explicit casting with built-in type + // conversion functions. + private def suggestionOnConversionFunctions ( + from: DataType, + to: DataType, + functionNames: String): String = { +// scalastyle:off line.size.limit +s"""cannot cast ${from.catalogString} to ${to.catalogString}. + |To convert values from ${from.catalogString} to ${to.catalogString}, you can use $functionNames instead. + |""".stripMargin +// scalastyle:on line.size.limit + } + + def typeCheckFailureMessage( + from: DataType, + to: DataType, + fallbackConfKey: Option[String], + fallbackConfValue: Option[String]): String = +(from, to) match { + case (_: NumericType, TimestampType) => +suggestionOnConversionFunctions(from, to, + "functions TIMESTAMP_SECONDS/TIMESTAMP_MILLIS/TIMESTAMP_MICROS") + + case (TimestampType, _: NumericType) => +suggestionOnConversionFunctions(from, to, "functions UNIX_SECONDS/UNIX_MILLIS/UNIX_MICROS") + + case (_: NumericType, DateType) => +suggestionOnConversionFunctions(from, to, "function DATE_FROM_UNIX_DATE") + + case (DateType, _: NumericType) => +suggestionOnConversionFunctions(from, to, "function UNIX_DATE") + + // scalastyle:off line.size.limit + case _ if fallbackConfKey.isDefined && fallbackConfValue.isDefined && Cast.canCast(from, to) => +s""" + | cannot cast ${from.catalogString} to ${to.catalogString} with ANSI mode on. + | If you have to cast ${from.catalogString} to ${to.catalogString}, you can set ${fallbackConfKey.get} as ${fallbackConfValue.get}.
+ |""".stripMargin + // scalastyle:on line.size.limit + + case _ => s"cannot cast ${from.catalogString} to ${to.catalogString}" +} + + def ansiCast(child: Expression, dataType: DataType, timeZoneId: Option[String] = None): Cast = +Cast(child, dataType, timeZoneId, true, + SQLConf.STORE_ASSIGNMENT_POLICY.key, SQLConf.StoreAssignmentPolicy.LEGACY.toString) } abstract class CastBase extends UnaryExpression with TimeZoneAwareExpression with NullIntolerant { Review Comment: Yes. `TryCast` extends this parent class too.
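The conversion functions the error messages above point users to have straightforward epoch-arithmetic semantics. A pure-Python sketch mirroring what `UNIX_DATE`, `DATE_FROM_UNIX_DATE`, and `TIMESTAMP_SECONDS` compute (illustrative of the semantics only, not Spark's implementation):

```python
import datetime

EPOCH_DATE = datetime.date(1970, 1, 1)

def unix_date(d):
    # Mirrors SQL UNIX_DATE: date -> days since 1970-01-01
    return (d - EPOCH_DATE).days

def date_from_unix_date(days):
    # Mirrors SQL DATE_FROM_UNIX_DATE: days since 1970-01-01 -> date
    return EPOCH_DATE + datetime.timedelta(days=days)

def timestamp_seconds(seconds):
    # Mirrors SQL TIMESTAMP_SECONDS: seconds since the epoch -> UTC timestamp
    epoch = datetime.datetime(1970, 1, 1, tzinfo=datetime.timezone.utc)
    return epoch + datetime.timedelta(seconds=seconds)
```

Suggesting these explicit functions avoids the silently lossy numeric-to-timestamp casts that ANSI mode disallows.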
[GitHub] [spark] LuciferYang commented on pull request #36515: [SPARK-39156][SQL] Clean up the usage of `ParquetLogRedirector` in `ParquetFileFormat`.
LuciferYang commented on PR #36515: URL: https://github.com/apache/spark/pull/36515#issuecomment-1127140077 thanks @huaxingao @sunchao
[GitHub] [spark] zhengruifeng commented on pull request #36555: [SPARK-39189][PYTHON] Support limit_area parameter in pandas API on Spark
zhengruifeng commented on PR #36555: URL: https://github.com/apache/spark/pull/36555#issuecomment-1127136933 @HyukjinKwon Sure! will update soon
[GitHub] [spark] beobest2 commented on pull request #36509: [SPARK-38961][PYTHON][DOCS] Enhance to automatically generate the pandas API support list
beobest2 commented on PR #36509: URL: https://github.com/apache/spark/pull/36509#issuecomment-1127127677 @bjornjorgensen Seems like a good idea! I can simply add a column to display parameters that only exist in pandas. However, it is necessary to discuss whether or not it meets the intent of the document. Any chance of confusing pandas users? cc. @HyukjinKwon @Yikun @xinrong-databricks
[GitHub] [spark] HyukjinKwon commented on pull request #36555: [SPARK-39189][PYTHON] Support limit_area parameter in pandas API on Spark
HyukjinKwon commented on PR #36555: URL: https://github.com/apache/spark/pull/36555#issuecomment-1127098019 @zhengruifeng mind showing the example of this argument usage in the PR description?
[GitHub] [spark] HyukjinKwon closed pull request #36554: [SPARK-39186][PYTHON][FOLLOWUP] Improve the numerical stability of pandas-on-Spark's skewness
HyukjinKwon closed pull request #36554: [SPARK-39186][PYTHON][FOLLOWUP] Improve the numerical stability of pandas-on-Spark's skewness URL: https://github.com/apache/spark/pull/36554
[GitHub] [spark] HyukjinKwon commented on pull request #36554: [SPARK-39186][PYTHON][FOLLOWUP] Improve the numerical stability of pandas-on-Spark's skewness
HyukjinKwon commented on PR #36554: URL: https://github.com/apache/spark/pull/36554#issuecomment-1127097656 Merged to master.
[GitHub] [spark] github-actions[bot] closed pull request #35357: [SPARK-21195][CORE] MetricSystem should pick up dynamically registered metrics in sources
github-actions[bot] closed pull request #35357: [SPARK-21195][CORE] MetricSystem should pick up dynamically registered metrics in sources URL: https://github.com/apache/spark/pull/35357
[GitHub] [spark] bjornjorgensen commented on pull request #36509: [SPARK-38961][PYTHON][DOCS] Enhance to automatically generate the pandas API support list
bjornjorgensen commented on PR #36509: URL: https://github.com/apache/spark/pull/36509#issuecomment-1127032945 Yes, very good. I was thinking: pandas API on Spark has some options that pandas does not have. For example, to_json() has `ignoreNullFields=True` and `num_files=1`. Can we add another column for these extras?
[GitHub] [spark] tiagovrtr commented on pull request #33675: [SPARK-27997][K8S] Add support for kubernetes OAuth Token refresh
tiagovrtr commented on PR #33675: URL: https://github.com/apache/spark/pull/33675#issuecomment-1126996196 This patch seems only to bring in the latest changes from master; is there anything else to do here?
[GitHub] [spark] mridulm commented on a diff in pull request #36512: [SPARK-39152][CORE] Deregistering disk persisted local RDD blocks in case of IO related errors
mridulm commented on code in PR #36512: URL: https://github.com/apache/spark/pull/36512#discussion_r873204359 ## core/src/main/scala/org/apache/spark/storage/BlockManager.scala: ## @@ -933,10 +933,29 @@ private[spark] class BlockManager( }) Some(new BlockResult(ci, DataReadMethod.Memory, info.size)) } else if (level.useDisk && diskStore.contains(blockId)) { - try { -val diskData = diskStore.getBytes(blockId) -val iterToReturn: Iterator[Any] = { - if (level.deserialized) { + var retryCount = 0 + val retryLimit = 3 Review Comment: My main concern is, usually with bad disks/etc, the reads can take an inordinate amount of delay (due to various layers down retrying, trying to recover) - so a read which should typically take a few ms can go into minutes or higher: hence why I want to understand how to estimate/configure this. One option is to make it a private config and make it user configurable - with 3 (or 2 ?) as the default. Thoughts ?
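The reviewer's suggestion is to inject the retry limit (e.g. from a private config) instead of hardcoding `retryLimit = 3`. A Python sketch of that bounded-retry shape, with the limit passed in as a parameter:

```python
def read_with_retries(read_fn, max_attempts=3):
    """Retry an IO-bound read up to max_attempts times.

    max_attempts is injected, e.g. from a (hypothetical) private
    config, rather than hardcoded, addressing the review concern.
    The last error is re-raised once the budget is exhausted, so a
    bad disk fails fast instead of retrying indefinitely.
    """
    last_err = None
    for _ in range(max_attempts):
        try:
            return read_fn()
        except OSError as e:  # IOError is an alias of OSError in Python 3
            last_err = e
    raise last_err
```

Capping attempts matters because each low-level retry against a failing disk can itself take seconds to minutes, which is the delay amplification the comment warns about.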
[GitHub] [spark] MaxGekk commented on a diff in pull request #36479: [SPARK-38688][SQL][TESTS] Use error classes in the compilation errors of deserializer
MaxGekk commented on code in PR #36479: URL: https://github.com/apache/spark/pull/36479#discussion_r873201532 ## sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala: ## @@ -147,14 +147,17 @@ object QueryCompilationErrors extends QueryErrorsBase { dataType: DataType, desiredType: String): Throwable = { val quantifier = if (desiredType.equals("array")) "an" else "a" new AnalysisException( - s"need $quantifier $desiredType field but got " + dataType.catalogString) + errorClass = "UNSUPPORTED_DESERIALIZER", + messageParameters = +Array("DATA_TYPE_MISMATCH", quantifier, desiredType, toSQLType(dataType))) Review Comment: Please, double quote `desiredType` and upper case it for consistency.
[GitHub] [spark] huaxingao commented on pull request #36515: [SPARK-39156][SQL] Clean up the usage of `ParquetLogRedirector` in `ParquetFileFormat`.
huaxingao commented on PR #36515: URL: https://github.com/apache/spark/pull/36515#issuecomment-1126965449 Thanks! Merged to master.
[GitHub] [spark] huaxingao closed pull request #36515: [SPARK-39156][SQL] Clean up the usage of `ParquetLogRedirector` in `ParquetFileFormat`.
huaxingao closed pull request #36515: [SPARK-39156][SQL] Clean up the usage of `ParquetLogRedirector` in `ParquetFileFormat`. URL: https://github.com/apache/spark/pull/36515
[GitHub] [spark] zhengruifeng opened a new pull request, #36555: [SPARK-39189][PYTHON] interpolate supports limit_area
zhengruifeng opened a new pull request, #36555: URL: https://github.com/apache/spark/pull/36555 ### What changes were proposed in this pull request? `interpolate` supports the `limit_area` parameter. ### Why are the changes needed? To increase API coverage. ### Does this PR introduce _any_ user-facing change? Yes, one parameter added. ### How was this patch tested? Updated UT.
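In pandas, `limit_area` restricts which gaps `interpolate` may fill: `'inside'` only fills NaNs surrounded by valid values on both sides, and `'outside'` only fills leading and trailing NaNs. A pure-Python sketch of that gap-selection rule (illustrative of the semantics, not the pandas-on-Spark implementation; `fillable_mask` is a hypothetical helper name):

```python
def fillable_mask(values, limit_area=None):
    """Which None gaps may be filled, following pandas' limit_area:

    None      -> every gap
    'inside'  -> only gaps with valid values on both sides
    'outside' -> only leading/trailing gaps
    """
    valid = [i for i, v in enumerate(values) if v is not None]
    if not valid:
        return [False] * len(values)
    first, last = valid[0], valid[-1]
    mask = []
    for i, v in enumerate(values):
        if v is not None:
            mask.append(False)  # already valid, never filled
        elif limit_area == "inside":
            mask.append(first < i < last)
        elif limit_area == "outside":
            mask.append(i < first or i > last)
        else:
            mask.append(True)
    return mask
```

For `[None, 1, None, 3, None]`, `'inside'` marks only the middle gap and `'outside'` marks only the two edge gaps.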