Re: [PR] [SPARK-47148][SQL] Avoid to materialize AQE ExchangeQueryStageExec on the cancellation [spark]
cloud-fan commented on code in PR #45234: URL: https://github.com/apache/spark/pull/45234#discussion_r1575677485

## sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/QueryStageExec.scala:

@@ -51,13 +51,30 @@ abstract class QueryStageExec extends LeafExecNode {
    */
   val plan: SparkPlan

+  /**
+   * Name of this query stage which is unique in the entire query plan.
+   */
+  val name: String = s"${this.getClass.getSimpleName}-$id"
+
+  /**
+   * This flag aims to detect if the stage materialization is started. This helps
+   * to avoid unnecessary stage materialization when the stage is canceled.
+   */
+  private val materializationStarted = new AtomicBoolean()

Review Comment: Sorry for the last-minute proposal, but I'm wondering if it's more efficient to push this cancellation optimization into the shuffle and broadcast nodes. It looks a bit fragile to operate on the `shuffleFuture` directly in `ShuffleQueryStageExec.cancel`; I think we should let `ShuffleExchangeLike` provide clear APIs for this. Today it provides `submitShuffleJob`, and it should also provide `cancelShuffleJob`. Within `ShuffleExchangeLike`, we can do more optimizations. For example, even if we cancel the shuffle stage after it has been submitted, we can still avoid submitting the shuffle job, as the shuffle node might be doing other preparation work: generating the shuffle dependency, waiting for subqueries to finish, etc. It's more efficient to check the isCanceled flag at the last minute, right before submitting the shuffle job.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
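The API shape suggested above, `cancelShuffleJob` alongside `submitShuffleJob` with a last-minute cancellation check before job submission, can be sketched in plain Java. This is a hypothetical illustration only; all class and field names here are invented, and the real `ShuffleExchangeLike` is a Scala trait in Spark:

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch: the exchange node owns the cancellation state and
// checks it at the last possible moment, right before submitting the job.
class ShuffleExchangeSketch {
    private final AtomicBoolean materializationStarted = new AtomicBoolean(false);
    private final AtomicBoolean canceled = new AtomicBoolean(false);

    // Analogous to submitShuffleJob: preparation work (building the shuffle
    // dependency, waiting for subqueries) happens before this check, so a
    // cancel() arriving during that window still prevents a useless job.
    boolean submitShuffleJob() {
        materializationStarted.set(true);
        if (canceled.get()) {
            return false; // skip submitting a job whose result would be discarded
        }
        // ... actually submit the job here ...
        return true;
    }

    // Analogous to the proposed cancelShuffleJob API.
    void cancelShuffleJob() {
        canceled.set(true);
        // A real implementation would also cancel an already-submitted future here.
    }
}
```

The point of the design is that the caller (`ShuffleQueryStageExec.cancel`) never touches the future directly; the exchange node decides internally whether there is anything to cancel.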
Re: [PR] [SPARK-47943] Add `GitHub Action` CI for Java Build and Test [spark-kubernetes-operator]
dongjoon-hyun commented on PR #7: URL: https://github.com/apache/spark-kubernetes-operator/pull/7#issuecomment-2071455107

Merged to main.
Re: [PR] [SPARK-47943] Add `GitHub Action` CI for Java Build and Test [spark-kubernetes-operator]
dongjoon-hyun closed pull request #7: [SPARK-47943] Add `GitHub Action` CI for Java Build and Test URL: https://github.com/apache/spark-kubernetes-operator/pull/7
Re: [PR] [SPARK-47929] Setup Static Analysis for Operator [spark-kubernetes-operator]
dongjoon-hyun closed pull request #6: [SPARK-47929] Setup Static Analysis for Operator URL: https://github.com/apache/spark-kubernetes-operator/pull/6
Re: [PR] [SPARK-47945][SQL] MsSQLServer: Document Mapping Spark SQL Data Types from Microsoft SQL Server and add tests [spark]
yaooqinn commented on code in PR #46173: URL: https://github.com/apache/spark/pull/46173#discussion_r1575669023

## connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MsSqlServerIntegrationSuite.scala:

@@ -265,7 +273,7 @@ class MsSqlServerIntegrationSuite extends DockerJDBCIntegrationSuite {
     assert(types(7).equals("class java.lang.String"))
     assert(types(8).equals("class [B"))
     assert(row.getString(0).length == 10)
-    assert(row.getString(0).trim.equals("the"))
+    assert(row.getString(0).equals("the".padTo(10, ' ')))

Review Comment: Anyway, I removed this LOC.
Re: [PR] [SPARK-47945][SQL] MsSQLServer: Document Mapping Spark SQL Data Types from Microsoft SQL Server and add tests [spark]
yaooqinn commented on code in PR #46173: URL: https://github.com/apache/spark/pull/46173#discussion_r1575667729

## connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MsSqlServerIntegrationSuite.scala:

@@ -265,7 +273,7 @@ class MsSqlServerIntegrationSuite extends DockerJDBCIntegrationSuite {
     assert(types(7).equals("class java.lang.String"))
     assert(types(8).equals("class [B"))
     assert(row.getString(0).length == 10)
-    assert(row.getString(0).trim.equals("the"))
+    assert(row.getString(0).equals("the".padTo(10, ' ')))

Review Comment: Char padding is ANSI-irrelevant, AFAIK.
Re: [PR] [SPARK-47945][SQL] MsSQLServer: Document Mapping Spark SQL Data Types from Microsoft SQL Server and add tests [spark]
yaooqinn commented on code in PR #46173: URL: https://github.com/apache/spark/pull/46173#discussion_r1575666204

## docs/sql-data-sources-jdbc.md:

@@ -1441,3 +1441,192 @@ The Spark Catalyst data types below are not supported with suitable Oracle types:
 - NullType
 - ObjectType
 - VariantType

### Mapping Spark SQL Data Types from Microsoft SQL Server

The table below describes the data type conversions from Microsoft SQL Server data types to Spark SQL data types, when reading data from a Microsoft SQL Server table using the built-in JDBC data source with mssql-jdbc as the activated JDBC driver.

| SQL Server Data Type | Spark SQL Data Type | Remarks |
|----------------------|---------------------|---------|
| bit | BooleanType | |
| tinyint | ShortType | |
| smallint | ShortType | |
| int | IntegerType | |
| bigint | LongType | |
| float(p), real | FloatType | 1 ≤ p ≤ 24 |
| float[(p)] | DoubleType | 25 ≤ p ≤ 53 |
| double precision | DoubleType | |
| smallmoney | DecimalType(10, 4) | |
| money | DecimalType(19, 4) | |
| decimal[(p[, s])], numeric[(p[, s])] | DecimalType(p, s) | |
| date | DateType | |
| datetime | TimestampType | (Default) preferTimestampNTZ=false or spark.sql.timestampType=TIMESTAMP_LTZ |
| datetime | TimestampNTZType | preferTimestampNTZ=true or spark.sql.timestampType=TIMESTAMP_NTZ |
| datetime2 [ (fractional seconds precision) ] | TimestampType | (Default) preferTimestampNTZ=false or spark.sql.timestampType=TIMESTAMP_LTZ |
| datetime2 [ (fractional seconds precision) ] | TimestampNTZType | preferTimestampNTZ=true or spark.sql.timestampType=TIMESTAMP_NTZ |
| datetimeoffset [ (fractional seconds precision) ] | StringType | |

Review Comment: https://github.com/apache/spark/blob/9d715ba491710969340d9e8a49a21d11f51ef7d3/sql/core/src/main/scala/org/apache/spark/sql/jdbc/MsSqlServerDialect.scala#L112-L114 This comment appears not to be true, as we use mssql-jdbc.
Re: [PR] [SPARK-47730][K8S] Support `APP_ID` and `EXECUTOR_ID` placeholders in labels [spark]
dongjoon-hyun commented on PR #46149: URL: https://github.com/apache/spark/pull/46149#issuecomment-2071446978

Gentle ping, @jshmchenxi.
Re: [PR] [SPARK-47604][CORE] Resource managers: Migrate logInfo with variables to structured logging framework [spark]
gengliangwang commented on PR #46130: URL: https://github.com/apache/spark/pull/46130#issuecomment-2071445935

@panbingkun Thanks. Please resolve the conflicts so that I can merge this one later.
Re: [PR] [SPARK-47945][SQL] MsSQLServer: Document Mapping Spark SQL Data Types from Microsoft SQL Server and add tests [spark]
dongjoon-hyun commented on code in PR #46173: URL: https://github.com/apache/spark/pull/46173#discussion_r1575663710

## connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MsSqlServerIntegrationSuite.scala:

@@ -265,7 +273,7 @@ class MsSqlServerIntegrationSuite extends DockerJDBCIntegrationSuite {
     assert(types(7).equals("class java.lang.String"))
     assert(types(8).equals("class [B"))
     assert(row.getString(0).length == 10)
-    assert(row.getString(0).trim.equals("the"))
+    assert(row.getString(0).equals("the".padTo(10, ' ')))

Review Comment: This seems to be an ANSI result. Is this safe in non-ANSI CI, @yaooqinn? - https://github.com/apache/spark/actions/workflows/build_non_ansi.yml
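For context, the assertion under discussion hinges on CHAR(n) semantics: a CHAR(10) column returns its value right-padded with spaces to the declared length, so `"the"` compares equal only after padding to 10 characters. A minimal Java equivalent of the Scala `"the".padTo(10, ' ')` comparison (the column value shown is illustrative, not taken from the test):

```java
// What a CHAR(10) column would hand back for the stored value "the":
// the three characters followed by seven trailing spaces.
String fromCharColumn = "the       ";

// Java's left-justified width specifier pads with spaces on the right,
// mirroring Scala's "the".padTo(10, ' ').
String padded = String.format("%-10s", "the");

boolean matches = fromCharColumn.equals(padded); // true: padding, not trimming, is compared
```

This is why the test was changed from `.trim.equals("the")` to an exact comparison against the padded string: the trimmed form would also pass if the driver stripped padding, hiding the CHAR semantics being verified.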
Re: [PR] [SPARK-44170][BUILD][FOLLOWUP] Align JUnit5 dependency's version and clean up exclusions [spark]
dongjoon-hyun commented on PR #46160: URL: https://github.com/apache/spark/pull/46160#issuecomment-2071442761

Merged to master. Thank you, @pan3793 and all!
Re: [PR] [SPARK-44170][BUILD][FOLLOWUP] Align JUnit5 dependency's version and clean up exclusions [spark]
dongjoon-hyun closed pull request #46160: [SPARK-44170][BUILD][FOLLOWUP] Align JUnit5 dependency's version and clean up exclusions URL: https://github.com/apache/spark/pull/46160
Re: [PR] [SPARK-47938][SQL] MsSQLServer: Cannot find data type BYTE error [spark]
dongjoon-hyun commented on PR #46164: URL: https://github.com/apache/spark/pull/46164#issuecomment-2071441869

Merged to master.
Re: [PR] [SPARK-47938][SQL] MsSQLServer: Cannot find data type BYTE error [spark]
yaooqinn commented on code in PR #46164: URL: https://github.com/apache/spark/pull/46164#discussion_r1575661507

## connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MsSqlServerIntegrationSuite.scala:

@@ -437,4 +437,13 @@ class MsSqlServerIntegrationSuite extends DockerJDBCIntegrationSuite {
       .load()
     assert(df.collect().toSet === expectedResult)
   }
+
+

Review Comment: @dongjoon-hyun Thank you.
Re: [PR] [SPARK-47938][SQL] MsSQLServer: Cannot find data type BYTE error [spark]
dongjoon-hyun closed pull request #46164: [SPARK-47938][SQL] MsSQLServer: Cannot find data type BYTE error URL: https://github.com/apache/spark/pull/46164
Re: [PR] [SPARK-47938][SQL] MsSQLServer: Cannot find data type BYTE error [spark]
dongjoon-hyun commented on code in PR #46164: URL: https://github.com/apache/spark/pull/46164#discussion_r1575660470

## connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MsSqlServerIntegrationSuite.scala:

@@ -437,4 +437,13 @@ class MsSqlServerIntegrationSuite extends DockerJDBCIntegrationSuite {
       .load()
     assert(df.collect().toSet === expectedResult)
   }
+
+

Review Comment: nit. Redundant empty line.
Re: [PR] [SPARK-47600][CORE] MLLib: Migrate logInfo with variables to structured logging framework [spark]
gengliangwang closed pull request #46151: [SPARK-47600][CORE] MLLib: Migrate logInfo with variables to structured logging framework URL: https://github.com/apache/spark/pull/46151
Re: [PR] [SPARK-47600][CORE] MLLib: Migrate logInfo with variables to structured logging framework [spark]
gengliangwang commented on PR #46151: URL: https://github.com/apache/spark/pull/46151#issuecomment-2071433391

Thanks, merging to master.
Re: [PR] [SPARK-47933][PYTHON] Parent Column class for Spark Connect and Spark Classic [spark]
HyukjinKwon closed pull request #46155: [SPARK-47933][PYTHON] Parent Column class for Spark Connect and Spark Classic URL: https://github.com/apache/spark/pull/46155
Re: [PR] [SPARK-47933][PYTHON] Parent Column class for Spark Connect and Spark Classic [spark]
HyukjinKwon commented on PR #46155: URL: https://github.com/apache/spark/pull/46155#issuecomment-2071421918

Merged to master. For naming, I will send an email soon. For the others, please leave comments and I will follow up.
[PR] [SPARK-47943] Add Operator CI Task for Java Build and Test [spark-kubernetes-operator]
jiangzho opened a new pull request, #7: URL: https://github.com/apache/spark-kubernetes-operator/pull/7

### What changes were proposed in this pull request?
This PR adds an additional CI build task for the operator.

### Why are the changes needed?
The additional CI task is needed in order to build and test Java code for upcoming operator pull requests. When the Java plugin is enabled and Java sources are checked in, the `./gradlew build` [task](https://docs.gradle.org/3.3/userguide/java_plugin.html#sec:java_tasks) by default includes a set of tasks to compile and run tests. This can serve as the pull request build.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Tested locally.

### Was this patch authored or co-authored using generative AI tooling?
No.
[PR] [SPARK-47945][SQL] MsSQLServer: Document Mapping Spark SQL Data Types from Microsoft SQL Server and add tests [spark]
yaooqinn opened a new pull request, #46173: URL: https://github.com/apache/spark/pull/46173

### What changes were proposed in this pull request?
This PR adds documentation mapping Spark SQL data types from Microsoft SQL Server and adds more tests.

### Why are the changes needed?
Doc improvement.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
New test and doc build.
![image](https://github.com/apache/spark/assets/8326978/29ad3814-9284-45a3-aefb-6f0fb8f26593)

### Was this patch authored or co-authored using generative AI tooling?
No.
Re: [PR] [SPARK-47938][SQL] MsSQLServer: Cannot find data type BYTE error [spark]
yaooqinn commented on PR #46164: URL: https://github.com/apache/spark/pull/46164#issuecomment-2071365832

cc @dongjoon-hyun @cloud-fan, thank you.
Re: [PR] [SPARK-47148][SQL] Avoid to materialize AQE ExchangeQueryStageExec on the cancellation [spark]
cloud-fan commented on code in PR #45234: URL: https://github.com/apache/spark/pull/45234#discussion_r1575603485

## sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/QueryStageExec.scala:

@@ -198,18 +215,23 @@ case class ShuffleQueryStageExec(
     reuse
   }

-  override def cancel(): Unit = shuffleFuture match {
-    case action: FutureAction[MapOutputStatistics] if !action.isCompleted =>
-      action.cancel()
-    case _ =>
+  override def cancel(): Unit = {

Review Comment: nit: it seems better to follow `def materialize` and `def doMaterialize`, and have `def cancel` and `def doCancel`. We can put this materialization check in `def cancel`, and maybe more common logic related to cancellation in the future.
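The `cancel`/`doCancel` split suggested here mirrors the template-method shape of `materialize`/`doMaterialize`: the base class holds the common check, subclasses supply the specific work. A minimal Java sketch of that shape (class and field names are illustrative, not Spark's actual implementation):

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical template-method sketch: the shared materialization check lives
// in final cancel(), and subclasses only implement the stage-specific doCancel().
abstract class QueryStageSketch {
    private final AtomicBoolean materializationStarted = new AtomicBoolean(false);

    final void materialize() {
        materializationStarted.set(true);
        doMaterialize();
    }

    final void cancel() {
        // Common cancellation logic: only cancel work that was actually started.
        if (materializationStarted.get()) {
            doCancel();
        }
    }

    protected abstract void doMaterialize();
    protected abstract void doCancel();
}

// Concrete subclass used to demonstrate the delegation.
class CountingStage extends QueryStageSketch {
    int cancels = 0;
    @Override protected void doMaterialize() { /* no-op for the sketch */ }
    @Override protected void doCancel() { cancels++; }
}
```

The benefit is the same as with `doMaterialize`: future common cancellation logic has a single obvious home, and subclasses cannot forget the started-check.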
Re: [PR] [SPARK-47482] Add HiveDialect to sql module [spark]
beliefer commented on PR #45644: URL: https://github.com/apache/spark/pull/45644#issuecomment-2071347859

I think we should not add HiveDialect as a built-in dialect. Users can add a custom dialect with https://github.com/apache/spark/pull/45626.
Re: [PR] [SPARK-47805][SS] Implementing TTL for MapState [spark]
HeartSaVioR commented on code in PR #45991: URL: https://github.com/apache/spark/pull/45991#discussion_r1575597023

## sql/core/src/test/scala/org/apache/spark/sql/streaming/TransformWithMapStateTTLSuite.scala:

@@ -0,0 +1,308 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.streaming
+
+import java.time.Duration
+
+import org.apache.spark.internal.Logging
+import org.apache.spark.sql.Encoders
+import org.apache.spark.sql.execution.streaming.{MapStateImplWithTTL, MemoryStream}
+import org.apache.spark.sql.execution.streaming.state.RocksDBStateStoreProvider
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.streaming.util.StreamManualClock
+
+class MapStateSingleKeyTTLProcessor(ttlConfig: TTLConfig)

Review Comment: Ah OK, you are passing this over the base tests.
Re: [PR] [SPARK-47805][SS] Implementing TTL for MapState [spark]
ericm-db commented on code in PR #45991: URL: https://github.com/apache/spark/pull/45991#discussion_r1575592659

## sql/core/src/test/scala/org/apache/spark/sql/streaming/TransformWithMapStateTTLSuite.scala:

@@ -0,0 +1,308 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.streaming
+
+import java.time.Duration
+
+import org.apache.spark.internal.Logging
+import org.apache.spark.sql.Encoders
+import org.apache.spark.sql.execution.streaming.{MapStateImplWithTTL, MemoryStream}
+import org.apache.spark.sql.execution.streaming.state.RocksDBStateStoreProvider
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.streaming.util.StreamManualClock
+
+class MapStateSingleKeyTTLProcessor(ttlConfig: TTLConfig)
+  extends StatefulProcessor[String, InputEvent, OutputEvent]
+    with Logging {
+
+  @transient private var _mapState: MapStateImplWithTTL[String, Int] = _
+
+  override def init(
+      outputMode: OutputMode,
+      timeMode: TimeMode): Unit = {
+    _mapState = getHandle
+      .getMapState("mapState", Encoders.STRING, Encoders.scalaInt, ttlConfig)
+      .asInstanceOf[MapStateImplWithTTL[String, Int]]
+  }
+
+  override def handleInputRows(
+      key: String,
+      inputRows: Iterator[InputEvent],
+      timerValues: TimerValues,
+      expiredTimerInfo: ExpiredTimerInfo): Iterator[OutputEvent] = {
+    var results = List[OutputEvent]()
+
+    for (row <- inputRows) {
+      val resultIter = processRow(row, _mapState)
+      resultIter.foreach { r =>
+        results = r :: results
+      }
+    }
+
+    results.iterator
+  }
+
+  def processRow(
+      row: InputEvent,
+      mapState: MapStateImplWithTTL[String, Int]): Iterator[OutputEvent] = {
+    var results = List[OutputEvent]()
+    val key = row.key
+    val userKey = "key"
+    if (row.action == "get") {
+      if (mapState.containsKey(userKey)) {
+        results = OutputEvent(key, mapState.getValue(userKey), isTTLValue = false, -1) :: results
+      }
+    } else if (row.action == "get_without_enforcing_ttl") {
+      val currState = mapState.getWithoutEnforcingTTL(userKey)
+      if (currState.isDefined) {
+        results = OutputEvent(key, currState.get, isTTLValue = false, -1) :: results
+      }
+    } else if (row.action == "get_ttl_value_from_state") {
+      val ttlValue = mapState.getTTLValue(userKey)
+      if (ttlValue.isDefined) {
+        val value = ttlValue.get._1
+        val ttlExpiration = ttlValue.get._2
+        results = OutputEvent(key, value, isTTLValue = true, ttlExpiration) :: results
+      }
+    } else if (row.action == "put") {
+      mapState.updateValue(userKey, row.value)
+    } else if (row.action == "get_values_in_ttl_state") {
+      val ttlValues = mapState.getKeyValuesInTTLState()
+      ttlValues.foreach { v =>
+        results = OutputEvent(key, -1, isTTLValue = true, ttlValue = v._2) :: results
+      }
+    }
+
+    results.iterator
+  }
+}
+
+case class MapInputEvent(
+    key: String,
+    userKey: String,
+    action: String,
+    value: Int)
+
+case class MapOutputEvent(
+    key: String,
+    userKey: String,
+    value: Int,
+    isTTLValue: Boolean,
+    ttlValue: Long)
+
+class MapStateTTLProcessor(ttlConfig: TTLConfig)
+  extends StatefulProcessor[String, MapInputEvent, MapOutputEvent]
+    with Logging {
+
+  @transient private var _mapState: MapStateImplWithTTL[String, Int] = _
+
+  override def init(
+      outputMode: OutputMode,
+      timeMode: TimeMode): Unit = {
+    _mapState = getHandle
+      .getMapState("mapState", Encoders.STRING, Encoders.scalaInt, ttlConfig)
+      .asInstanceOf[MapStateImplWithTTL[String, Int]]
+  }
+
+  override def handleInputRows(
+      key: String,
+      inputRows: Iterator[MapInputEvent],
+      timerValues: TimerValues,
+      expiredTimerInfo: ExpiredTimerInfo): Iterator[MapOutputEvent] = {
+    var results = List[MapOutputEvent]()
+
+    for (row <- inputRows) {
+      val resultIter = processRow(row, _mapState)
+      resultIter.foreach { r =>
+        results = r :: results
+      }
+    }
+
+    results.iterator
+  }
+
+  def processRow(
+      row: MapInputEvent,
+      mapState: MapStateImplWithTTL[String, Int]):
Re: [PR] [SPARK-47805][SS] Implementing TTL for MapState [spark]
ericm-db commented on code in PR #45991: URL: https://github.com/apache/spark/pull/45991#discussion_r1575586347

## sql/core/src/test/scala/org/apache/spark/sql/streaming/TransformWithMapStateTTLSuite.scala:

@@ -0,0 +1,308 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.streaming
+
+import java.time.Duration
+
+import org.apache.spark.internal.Logging
+import org.apache.spark.sql.Encoders
+import org.apache.spark.sql.execution.streaming.{MapStateImplWithTTL, MemoryStream}
+import org.apache.spark.sql.execution.streaming.state.RocksDBStateStoreProvider
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.streaming.util.StreamManualClock
+
+class MapStateSingleKeyTTLProcessor(ttlConfig: TTLConfig)

Review Comment: We need this processor since it has different input/output types than the other processor in this file.
Re: [PR] [SPARK-47944][BUILD] Upgrade icu4j to 75.1 [spark]
LuciferYang commented on PR #46172: URL: https://github.com/apache/spark/pull/46172#issuecomment-2071319910 The icu-related features appear to still be under development. Although I am not aware of the criteria for selecting the initial version (72.1), I believe it would be advisable to wait until the related features are stable and have undergone benchmark testing (if performance is affected) before attempting to upgrade the version. cc @cloud-fan
Re: [PR] [SPARK-47936][BUILD] Improve `toUpperCase` & `toLowerCase` with `Locale.ROOT` rules [spark]
LuciferYang commented on code in PR #46147: URL: https://github.com/apache/spark/pull/46147#discussion_r1575579520 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationFactory.java: ## @@ -206,7 +210,7 @@ public static StringSearch getStringSearch( * Returns if the given collationName is valid one. */ public static boolean isValidCollation(String collationName) { -return collationNameToIdMap.containsKey(collationName.toUpperCase()); +return collationNameToIdMap.containsKey(collationName.toUpperCase(Locale.ROOT)); Review Comment: `toUpperCase(Locale.ROOT)` uses the root locale, which is independent of the machine's default locale, ensuring consistent results across different machines, so I think it's OK.
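The locale pitfall behind this change can be demonstrated in plain Java (this is a standalone sketch, not Spark code): the no-argument `toUpperCase()` uses the JVM's default locale, so on a machine with a Turkish default locale the lowercase `i` maps to dotted capital `İ` (U+0130) and no longer matches ASCII keys, while the `Locale.ROOT` overload behaves the same everywhere.

```java
import java.util.Locale;

public class LocaleRootDemo {
    public static void main(String[] args) {
        String name = "utf8_binary";

        // Locale.ROOT gives the same result on every machine.
        System.out.println(name.toUpperCase(Locale.ROOT)); // UTF8_BINARY

        // Under a Turkish locale, 'i' uppercases to dotted capital İ (U+0130),
        // so the result would not match ASCII map keys like "UTF8_BINARY".
        Locale turkish = new Locale("tr", "TR");
        System.out.println(name.toUpperCase(turkish)
            .equals(name.toUpperCase(Locale.ROOT))); // false
    }
}
```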
Re: [PR] [SPARK-47805][SS] Implementing TTL for MapState [spark]
HeartSaVioR commented on code in PR #45991: URL: https://github.com/apache/spark/pull/45991#discussion_r1575569366 ## sql/core/src/test/scala/org/apache/spark/sql/streaming/TransformWithMapStateTTLSuite.scala: ## @@ -0,0 +1,308 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.streaming + +import java.time.Duration + +import org.apache.spark.internal.Logging +import org.apache.spark.sql.Encoders +import org.apache.spark.sql.execution.streaming.{MapStateImplWithTTL, MemoryStream} +import org.apache.spark.sql.execution.streaming.state.RocksDBStateStoreProvider +import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.sql.streaming.util.StreamManualClock + +class MapStateSingleKeyTTLProcessor(ttlConfig: TTLConfig) Review Comment: Maybe redundant to have this separately? ## sql/core/src/test/scala/org/apache/spark/sql/streaming/TransformWithMapStateTTLSuite.scala: ## @@ -0,0 +1,308 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. 
+ * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.streaming + +import java.time.Duration + +import org.apache.spark.internal.Logging +import org.apache.spark.sql.Encoders +import org.apache.spark.sql.execution.streaming.{MapStateImplWithTTL, MemoryStream} +import org.apache.spark.sql.execution.streaming.state.RocksDBStateStoreProvider +import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.sql.streaming.util.StreamManualClock + +class MapStateSingleKeyTTLProcessor(ttlConfig: TTLConfig) + extends StatefulProcessor[String, InputEvent, OutputEvent] +with Logging { Review Comment: nit: we don't use this, right? ## sql/core/src/test/scala/org/apache/spark/sql/streaming/TransformWithMapStateTTLSuite.scala: ## @@ -0,0 +1,308 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. 
You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.streaming + +import java.time.Duration + +import org.apache.spark.internal.Logging +import org.apache.spark.sql.Encoders +import org.apache.spark.sql.execution.streaming.{MapStateImplWithTTL, MemoryStream} +import org.apache.spark.sql.execution.streaming.state.RocksDBStateStoreProvider +import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.sql.streaming.util.StreamManualClock + +class MapStateSingleKeyTTLProcessor(ttlConfig: TTLConfig) + extends StatefulProcessor[String, InputEvent, OutputEvent] +with Logging { + + @transient private var _mapState: MapStateImplWithTTL[String, Int] = _ + + override def init( + outputMode: OutputMode, + timeMode: TimeMode): Unit = { +_mapState = getHandle + .getMapState("mapState", Encoders.STRING, Encoders.scalaInt, ttlConfig) +
Re: [PR] [SPARK-44305][SQL] Dynamically choose whether to broadcast hadoop conf [spark]
7mming7 commented on PR #46162: URL: https://github.com/apache/spark/pull/46162#issuecomment-2071298400 cc @HyukjinKwon @mridulm
Re: [PR] [SPARK-47933][PYTHON] Parent Column class for Spark Connect and Spark Classic [spark]
HyukjinKwon commented on code in PR #46155: URL: https://github.com/apache/spark/pull/46155#discussion_r1575562214 ## python/pyspark/ml/connect/functions.py: ## @@ -15,7 +15,7 @@ # limitations under the License. # from pyspark.ml import functions as PyMLFunctions -from pyspark.sql.connect.column import Column +from pyspark.sql.column import Column Review Comment: It is actually related to the type hints. We're using `pyspark.sql.Column` for type hints, and when we actually need to create an instance, we should use `from pyspark.sql.classic.column import Column` or `from pyspark.sql.connect.column import Column` .. it is also a bit messy. I should probably clean those up separately.
Re: [PR] [SPARK-47933][PYTHON] Parent Column class for Spark Connect and Spark Classic [spark]
HyukjinKwon commented on code in PR #46155: URL: https://github.com/apache/spark/pull/46155#discussion_r1575561525 ## python/pyspark/ml/stat.py: ## @@ -22,7 +22,8 @@ from pyspark.ml.common import _java2py, _py2java from pyspark.ml.linalg import Matrix, Vector from pyspark.ml.wrapper import JavaWrapper, _jvm -from pyspark.sql.column import Column, _to_seq +from pyspark.sql.column import Column Review Comment: This file is only used in Spark Classic so it should probably be fine. It is a bit messy and complicated .. I will try to refactor them separately.
Re: [PR] [SPARK-47847][CORE] Deprecate spark.network.remoteReadNioBufferConversion [spark]
pan3793 commented on code in PR #46047: URL: https://github.com/apache/spark/pull/46047#discussion_r1575558648 ## core/src/main/scala/org/apache/spark/SparkConf.scala: ## @@ -640,7 +640,8 @@ private[spark] object SparkConf extends Logging { DeprecatedConfig("spark.blacklist.killBlacklistedExecutors", "3.1.0", "Please use spark.excludeOnFailure.killExcludedExecutors"), DeprecatedConfig("spark.yarn.blacklist.executor.launch.blacklisting.enabled", "3.1.0", -"Please use spark.yarn.executor.launch.excludeOnFailure.enabled") +"Please use spark.yarn.executor.launch.excludeOnFailure.enabled"), + DeprecatedConfig("spark.network.remoteReadNioBufferConversion", "3.5.2", "") Review Comment: I filled the deprecation message with "Not used anymore", to be consistent with existing items

```
DeprecatedConfig("spark.yarn.am.port", "2.0.0", "Not used anymore"),
DeprecatedConfig("spark.executor.port", "2.0.0", "Not used anymore"),
...
```
[PR] [SPARK-47944][BUILD] Upgrade icu4j to 75.1 [spark]
panbingkun opened a new pull request, #46172: URL: https://github.com/apache/spark/pull/46172 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? ### Was this patch authored or co-authored using generative AI tooling?
Re: [PR] [SPARK-47937][PYTHON][DOCS] Fix docstring of `hll_sketch_agg` [spark]
zhengruifeng commented on PR #46163: URL: https://github.com/apache/spark/pull/46163#issuecomment-2071275104 thank you @dongjoon-hyun and @HyukjinKwon
Re: [PR] [SPARK-47805][SS] Implementing TTL for MapState [spark]
HeartSaVioR commented on code in PR #45991: URL: https://github.com/apache/spark/pull/45991#discussion_r1575549218 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MapStateImplWithTTL.scala: ## @@ -0,0 +1,280 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.sql.execution.streaming + +import org.apache.spark.internal.Logging +import org.apache.spark.sql.Encoder +import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder +import org.apache.spark.sql.execution.streaming.TransformWithStateKeyValueRowSchema.{COMPOSITE_KEY_ROW_SCHEMA, VALUE_ROW_SCHEMA_WITH_TTL} +import org.apache.spark.sql.execution.streaming.state.{PrefixKeyScanStateEncoderSpec, StateStore, StateStoreErrors} +import org.apache.spark.sql.streaming.{MapState, TTLConfig} +import org.apache.spark.util.NextIterator + +/** + * Class that provides a concrete implementation for map state associated with state + * variables (with ttl expiration support) used in the streaming transformWithState operator. 
+ * @param store - reference to the StateStore instance to be used for storing state + * @param stateName - name of the state variable + * @param keyExprEnc - Spark SQL encoder for key + * @param userKeyEnc - Spark SQL encoder for the map key + * @param valEncoder - SQL encoder for state variable + * @param ttlConfig - the ttl configuration (time to live duration etc.) + * @param batchTimestampMs - current batch processing timestamp. + * @tparam K - type of key for map state variable + * @tparam V - type of value for map state variable + * @return - instance of MapState of type [K,V] that can be used to store state persistently + */ +class MapStateImplWithTTL[K, V]( +store: StateStore, +stateName: String, +keyExprEnc: ExpressionEncoder[Any], +userKeyEnc: Encoder[K], +valEncoder: Encoder[V], +ttlConfig: TTLConfig, +batchTimestampMs: Long) extends CompositeKeyTTLStateImpl(stateName, store, batchTimestampMs) + with MapState[K, V] with Logging { + + private val keySerializer = keyExprEnc.createSerializer() + private val stateTypesEncoder = new CompositeKeyStateEncoder( +keySerializer, userKeyEnc, valEncoder, COMPOSITE_KEY_ROW_SCHEMA, stateName, hasTtl = true) + + private val ttlExpirationMs = +StateTTL.calculateExpirationTimeForDuration(ttlConfig.ttlDuration, batchTimestampMs) + + createColumnFamily() + + private def createColumnFamily(): Unit = { +store.createColFamilyIfAbsent(stateName, COMPOSITE_KEY_ROW_SCHEMA, VALUE_ROW_SCHEMA_WITH_TTL, + PrefixKeyScanStateEncoderSpec(COMPOSITE_KEY_ROW_SCHEMA, 1)) + } + + /** Whether state exists or not. 
*/ + override def exists(): Boolean = { +iterator().nonEmpty + } + + /** Get the state value if it exists */ + override def getValue(key: K): V = { +StateStoreErrors.requireNonNullStateValue(key, stateName) +val encodedCompositeKey = stateTypesEncoder.encodeCompositeKey(key) +val retRow = store.get(encodedCompositeKey, stateName) + +if (retRow != null) { + val resState = stateTypesEncoder.decodeValue(retRow) + + if (!stateTypesEncoder.isExpired(retRow, batchTimestampMs)) { +resState + } else { +null.asInstanceOf[V] + } +} else { + null.asInstanceOf[V] +} + } + + /** Check if the user key is contained in the map */ + override def containsKey(key: K): Boolean = { +StateStoreErrors.requireNonNullStateValue(key, stateName) +getValue(key) != null + } + + /** Update value for given user key */ + override def updateValue(key: K, value: V): Unit = { +StateStoreErrors.requireNonNullStateValue(key, stateName) +StateStoreErrors.requireNonNullStateValue(value, stateName) + +val encodedValue = stateTypesEncoder.encodeValue(value, ttlExpirationMs) +val encodedCompositeKey = stateTypesEncoder.encodeCompositeKey(key) +store.put(encodedCompositeKey, encodedValue, stateName) + +val serializedGroupingKey = stateTypesEncoder.serializeGroupingKey() +val serializedUserKey = stateTypesEncoder.serializeUserKey(key) +upsertTTLForStateKey(ttlExpirationMs, serializedGroupingKey, serializedUserKey) + } + + /** Get the map associated
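The TTL mechanics quoted in this diff boil down to one contract: the expiration timestamp is the batch timestamp plus the configured TTL duration, and a value is treated as expired once the batch timestamp reaches it (`getValue` then returns null). A minimal sketch of that contract in plain Java — the names and the exact `>=` boundary are illustrative assumptions, not Spark's actual `StateTTL` API:

```java
import java.time.Duration;

public class TtlSketch {
    // Assumed contract: expiration = batch timestamp + configured TTL duration.
    static long expirationMs(Duration ttl, long batchTimestampMs) {
        return batchTimestampMs + ttl.toMillis();
    }

    // Assumed contract: expired once the current batch timestamp reaches expiration.
    static boolean isExpired(long expirationMs, long batchTimestampMs) {
        return batchTimestampMs >= expirationMs;
    }

    public static void main(String[] args) {
        long exp = expirationMs(Duration.ofMinutes(1), 1_000L);
        System.out.println(exp);                     // 61000
        System.out.println(isExpired(exp, 30_000L)); // false
        System.out.println(isExpired(exp, 61_000L)); // true
    }
}
```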
Re: [PR] [SPARK-46632][SQL] Fix subexpression elimination when equivalent ternary expressions have different children [spark]
zml1206 commented on PR #46135: URL: https://github.com/apache/spark/pull/46135#issuecomment-2071270386 cc @cloud-fan Do you have time to help take a look? Thank you.
Re: [PR] [SPARK-47928][SQL][TEST] Speed up test "Add jar support Ivy URI in SQL" [spark]
yaooqinn commented on PR #46150: URL: https://github.com/apache/spark/pull/46150#issuecomment-2071269848 Hi @dongjoon-hyun, tests of HiveVersionsSuite might cover the hive dependency
Re: [PR] [SPARK-47912][SQL] Infer serde class from format classes [spark]
wForget commented on PR #46132: URL: https://github.com/apache/spark/pull/46132#issuecomment-2071268489 > Structured Streaming supports writing Spark SQL and using SQL to write stream processing logic. Is this possible, similar to the syntax of Flink. SQL can satisfy the flow processing process. This doesn't seem to be relevant to the current PR, you can get more help from https://spark.apache.org/community.html.
Re: [PR] [MINOR] Remove unnecessary `imports` [spark]
dongjoon-hyun commented on PR #46161: URL: https://github.com/apache/spark/pull/46161#issuecomment-2071267699 Thanks. Merged to master.
Re: [PR] [SPARK-47937][PYTHON][DOCS] Fix docstring of `hll_sketch_agg` [spark]
dongjoon-hyun closed pull request #46163: [SPARK-47937][PYTHON][DOCS] Fix docstring of `hll_sketch_agg` URL: https://github.com/apache/spark/pull/46163
Re: [PR] [SPARK-47937][PYTHON][DOCS] Fix docstring of `hll_sketch_agg` [spark]
dongjoon-hyun commented on PR #46163: URL: https://github.com/apache/spark/pull/46163#issuecomment-2071267535 Merged to master.
Re: [PR] [MINOR] Remove unnecessary `imports` [spark]
dongjoon-hyun closed pull request #46161: [MINOR] Remove unnecessary `imports` URL: https://github.com/apache/spark/pull/46161
Re: [PR] [SPARK-47909][CONNECT][PYTHON][TESTS][FOLLOW-UP] Move `pyspark.classic` references [spark]
HyukjinKwon closed pull request #46171: [SPARK-47909][CONNECT][PYTHON][TESTS][FOLLOW-UP] Move `pyspark.classic` references URL: https://github.com/apache/spark/pull/46171
Re: [PR] [SPARK-47909][CONNECT][PYTHON][TESTS][FOLLOW-UP] Move `pyspark.classic` references [spark]
HyukjinKwon commented on PR #46171: URL: https://github.com/apache/spark/pull/46171#issuecomment-2071260842 Merged to master.
Re: [PR] [SPARK-44050][K8S]add retry config when creating Kubernetes resources. [spark]
liangyouze commented on PR #45911: URL: https://github.com/apache/spark/pull/45911#issuecomment-2071259936 > Thank you for making a PR, but I'm not sure if this is a right layer to do. For me, it sounds like you are hitting your K8s cluster issue or K8s client library issue. Could you elaborate your environment and the error message more, @liangyouze ? > > > When creating Kubernetes resources, we occasionally encounter situations where resources such as ConfigMap cannot be successfully created, resulting in the driver pod remaining in the 'ContainerCreating' state. Therefore, it is necessary to add a verification mechanism after creating other resources to ensure that the resources are actually created It's the same issue as described in SPARK-44050; I've encountered it myself. When creating resources such as ConfigMaps, this situation occasionally occurs: the code does not throw any exceptions, but the ConfigMap resource is not actually created, so the driver pod remains in the ContainerCreating state and cannot proceed to the next step. This may be a Kubernetes issue or a feature (as far as I know, Kubernetes has some rate-limiting policies that may cause certain requests to be dropped, but I'm not sure whether that's related), but in any case, Spark should not get stuck because of it.
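The behavior being requested — don't trust a create call that returned without error; verify the resource is visible and re-issue the create if not — is a generic create-and-verify retry pattern. A hedged sketch in plain Java; the `create` and `exists` callbacks stand in for the Kubernetes client calls, and none of this is Spark's actual implementation:

```java
import java.util.function.BooleanSupplier;

public class CreateWithRetry {
    /**
     * Runs `create`, then checks `exists`; if the resource is still not
     * visible, re-issues `create`, up to maxAttempts times with a fixed
     * backoff. Returns true once the resource is confirmed to exist.
     */
    static boolean createAndVerify(Runnable create, BooleanSupplier exists,
                                   int maxAttempts, long backoffMs)
            throws InterruptedException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            create.run();
            Thread.sleep(backoffMs); // give the API server time to register it
            if (exists.getAsBoolean()) {
                return true;
            }
        }
        return false; // caller should fail fast instead of hanging
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulate a flaky API that silently drops the first two create requests.
        int[] calls = {0};
        boolean ok = createAndVerify(
            () -> calls[0]++,
            () -> calls[0] >= 3, // "resource exists" only after the third call
            5, 1L);
        System.out.println(ok + " after " + calls[0] + " create calls");
        // true after 3 create calls
    }
}
```

The key design point is that verification happens after every attempt, so a dropped request costs one backoff interval instead of leaving the driver stuck in ContainerCreating indefinitely.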
Re: [PR] [SPARK-47940][BUILD][TESTS] Upgrade `guava` dependency to `33.1.0-jre` in Docker IT [spark]
pan3793 commented on PR #46167: URL: https://github.com/apache/spark/pull/46167#issuecomment-2071252279 thank you, @dongjoon-hyun
Re: [PR] [SPARK-47018][BUILD] Upgrade built-in Hive to 2.3.10 [spark]
pan3793 commented on code in PR #45372: URL: https://github.com/apache/spark/pull/45372#discussion_r1575536345 ## dev/test-dependencies.sh: ## @@ -49,7 +49,7 @@ OLD_VERSION=$($MVN -q \ --non-recursive \ org.codehaus.mojo:exec-maven-plugin:1.6.0:exec | grep -E '[0-9]+\.[0-9]+\.[0-9]+') # dependency:get for guava and jetty-io are workaround for SPARK-37302. -GUAVA_VERSION=$(build/mvn help:evaluate -Dexpression=guava.version -q -DforceStdout | grep -E "^[0-9.]+$") +GUAVA_VERSION=$(build/mvn help:evaluate -Dexpression=guava.version -q -DforceStdout | grep -E "^[0-9\.]+") Review Comment: Recent Guava versions carry a `-jre` or `-android` suffix, so this change is required to match the new version string.
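The effect of dropping the trailing `$` anchor can be checked with `java.util.regex` (the matching semantics mirror what `grep -E` applies per line): the old fully-anchored pattern rejects a version with a `-jre` suffix, while the relaxed prefix-anchored pattern still selects it.

```java
import java.util.regex.Pattern;

public class GuavaVersionRegex {
    public static void main(String[] args) {
        String oldStyleVersion = "14.0.1";
        String newStyleVersion = "33.1.0-jre";

        // Old pattern: the entire line must be digits and dots.
        Pattern anchored = Pattern.compile("^[0-9.]+$");
        System.out.println(anchored.matcher(oldStyleVersion).find()); // true
        System.out.println(anchored.matcher(newStyleVersion).find()); // false

        // New pattern: the line only needs to *start* with digits/dots,
        // so grep still prints the "33.1.0-jre" line.
        Pattern prefixOnly = Pattern.compile("^[0-9.]+");
        System.out.println(prefixOnly.matcher(newStyleVersion).find()); // true
    }
}
```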
Re: [PR] [SPARK-44170][BUILD][FOLLOWUP] Align JUnit5 dependency's version and clean up exclusions [spark]
pan3793 commented on code in PR #46160: URL: https://github.com/apache/spark/pull/46160#discussion_r1575535040 ## pom.xml: ## @@ -220,6 +220,9 @@ 4.1.109.Final 2.0.65.Final 72.1 +5.9.1 Review Comment: @dongjoon-hyun sorry, that's a typo, corrected
Re: [PR] [SPARK-44170][BUILD][FOLLOWUP] Align JUnit5 dependency's version and clean up exclusions [spark]
pan3793 commented on code in PR #46160: URL: https://github.com/apache/spark/pull/46160#discussion_r1575534643 ## pom.xml: ## @@ -220,6 +220,9 @@ 4.1.109.Final 2.0.65.Final 72.1 +5.9.1 Review Comment: ```suggestion 5.9.3 ```
Re: [PR] [SPARK-47937][PYTHON][DOCS] Fix docstring of `hll_sketch_agg` [spark]
zhengruifeng commented on code in PR #46163: URL: https://github.com/apache/spark/pull/46163#discussion_r1575493828 ## python/pyspark/sql/connect/functions/builtin.py: ## @@ -3804,12 +3804,14 @@ def sha2(col: "ColumnOrName", numBits: int) -> Column: sha2.__doc__ = pysparkfuncs.sha2.__doc__ -def hll_sketch_agg(col: "ColumnOrName", lgConfigK: Optional[Union[int, Column]] = None) -> Column: +def hll_sketch_agg( +col: "ColumnOrName", +lgConfigK: Optional[Union[int, Column]] = None, +) -> Column: Review Comment: this line is kind of long (on my screen), so I broke it into multiple lines: (screenshot: https://github.com/apache/spark/assets/7322292/86029dd9-9939-4b14-8df2-fc9c89feffdb)
[PR] [SPARK-47909][CONNECT][PYTHON][TESTS][FOLLOW-UP] Move `pyspark.classic` references [spark]
HyukjinKwon opened a new pull request, #46171: URL: https://github.com/apache/spark/pull/46171 ### What changes were proposed in this pull request? This PR is a followup of https://github.com/apache/spark/pull/46129 that moves `pyspark.classic` references into the actual test methods so they are not referenced during the `pyspark-connect`-only test (which does not have the `pyspark.classic` package). ### Why are the changes needed? To recover the CI: https://github.com/apache/spark/actions/runs/8789489804/job/24119356874 ### Does this PR introduce _any_ user-facing change? No, test-only. ### How was this patch tested? Manually ### Was this patch authored or co-authored using generative AI tooling? No.
Re: [PR] [SPARK-47604][CORE] Resource managers: Migrate logInfo with variables to structured logging framework [spark]
panbingkun commented on code in PR #46130: URL: https://github.com/apache/spark/pull/46130#discussion_r1575532352 ## resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala: ## @@ -464,7 +465,7 @@ private[spark] class ApplicationMaster( val dummyRunner = new ExecutorRunnable(None, yarnConf, _sparkConf, driverUrl, "", "", executorMemory, executorCores, appId, securityMgr, localResources, ResourceProfile.DEFAULT_RESOURCE_PROFILE_ID) - dummyRunner.launchContextDebugInfo() + log"${MDC(LogKey.LAUNCH_CONTEXT_DEBUG_INFO, dummyRunner.launchContextDebugInfo())}" Review Comment: Done.
Re: [PR] [SPARK-44305][SQL] Dynamically choose whether to broadcast hadoop conf [spark]
7mming7 commented on PR #46162: URL: https://github.com/apache/spark/pull/46162#issuecomment-2071214561 > Could you provide some concrete numbers about your claim? @dongjoon-hyun Yes, we found that for small queries with high concurrency, the overhead is more noticeable. We tested 50 concurrent queries on SSB with Parquet as the source data and measured a difference of 10%-13% ![image](https://github.com/apache/spark/assets/5662745/fe505244-d654-4f7a-9b66-f9c020d9b6be)
Re: [PR] [CONNECT] Use v1 as spark connect go library starting version [spark-connect-go]
viirya commented on PR #19: URL: https://github.com/apache/spark-connect-go/pull/19#issuecomment-2071185132 Does this Spark connect go library have any previous release yet?
Re: [PR] [MINOR] Remove unnecessary `imports` [spark]
panbingkun commented on PR #46161: URL: https://github.com/apache/spark/pull/46161#issuecomment-2071176182 > To @panbingkun , please don't use `[WIP]` in the PR title. The GitHub `Draft` feature is better for preventing merging. > > It seems that you frequently forget to remove `[WIP]` from the PR title. :) Yeah, I was in a hurry to have `breakfast` just now and forgot to remove it. Sorry, haha ❤️
Re: [PR] [SPARK-47937][PYTHON][DOCS] Fix docstring of `hll_sketch_agg` [spark]
zhengruifeng commented on code in PR #46163: URL: https://github.com/apache/spark/pull/46163#discussion_r1575494685 ## python/pyspark/sql/connect/functions/builtin.py: ## @@ -3804,12 +3804,14 @@ def sha2(col: "ColumnOrName", numBits: int) -> Column: sha2.__doc__ = pysparkfuncs.sha2.__doc__ -def hll_sketch_agg(col: "ColumnOrName", lgConfigK: Optional[Union[int, Column]] = None) -> Column: +def hll_sketch_agg( +col: "ColumnOrName", +lgConfigK: Optional[Union[int, Column]] = None, +) -> Column: if lgConfigK is None: return _invoke_function_over_columns("hll_sketch_agg", col) else: -_lgConfigK = lit(lgConfigK) if isinstance(lgConfigK, int) else lgConfigK Review Comment: I added this change to the PR description
Re: [PR] [SPARK-47937][PYTHON][DOCS] Fix docstring of `hll_sketch_agg` [spark]
zhengruifeng commented on code in PR #46163: URL: https://github.com/apache/spark/pull/46163#discussion_r1575493828 ## python/pyspark/sql/connect/functions/builtin.py: ## @@ -3804,12 +3804,14 @@ def sha2(col: "ColumnOrName", numBits: int) -> Column: sha2.__doc__ = pysparkfuncs.sha2.__doc__ -def hll_sketch_agg(col: "ColumnOrName", lgConfigK: Optional[Union[int, Column]] = None) -> Column: +def hll_sketch_agg( +col: "ColumnOrName", +lgConfigK: Optional[Union[int, Column]] = None, +) -> Column: Review Comment: this line is kind of long (on my screen), so I broke it into multiple lines: https://github.com/apache/spark/assets/7322292/86029dd9-9939-4b14-8df2-fc9c89feffdb
Re: [PR] [SPARK-47937][PYTHON][DOCS] Fix docstring of `hll_sketch_agg` [spark]
zhengruifeng commented on code in PR #46163: URL: https://github.com/apache/spark/pull/46163#discussion_r1575492977 ## python/pyspark/sql/connect/functions/builtin.py: ## @@ -3804,12 +3804,14 @@ def sha2(col: "ColumnOrName", numBits: int) -> Column: sha2.__doc__ = pysparkfuncs.sha2.__doc__ -def hll_sketch_agg(col: "ColumnOrName", lgConfigK: Optional[Union[int, Column]] = None) -> Column: +def hll_sketch_agg( +col: "ColumnOrName", +lgConfigK: Optional[Union[int, Column]] = None, +) -> Column: if lgConfigK is None: return _invoke_function_over_columns("hll_sketch_agg", col) else: -_lgConfigK = lit(lgConfigK) if isinstance(lgConfigK, int) else lgConfigK Review Comment: oh, it is not a logical change. The `lit` function actually accepts a `Column` input and returns the column itself, so for a `Union[int, Column]` input we do not need this `if else` (there may be other similar cases; I just updated this one BTW)
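The simplification described above — dropping the `isinstance` branch because `lit` passes a `Column` through unchanged — can be sketched with a minimal stand-in (the `Column` and `lit` below are simplified illustrations, not the real PySpark implementations):

```python
from typing import Union


class Column:
    """Minimal stand-in for pyspark's Column."""
    def __init__(self, expr):
        self.expr = expr


def lit(value: Union[int, "Column"]) -> Column:
    # Like pyspark's lit: a Column input is returned as-is,
    # while a plain Python value is wrapped into a literal Column.
    if isinstance(value, Column):
        return value
    return Column(("literal", value))


# Before: the caller branched on the input type itself.
def before(lgConfigK: Union[int, Column]) -> Column:
    return lit(lgConfigK) if isinstance(lgConfigK, int) else lgConfigK


# After: lit already handles both cases, so the branch is redundant.
def after(lgConfigK: Union[int, Column]) -> Column:
    return lit(lgConfigK)


col = Column(("name", "x"))
assert before(col) is after(col) is col    # a Column passes through untouched
assert before(12).expr == after(12).expr   # an int is wrapped identically
```

Since both forms produce identical results, removing the branch is a pure cleanup rather than a behavior change, which is the point being made in the comment.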
Re: [PR] [WIP][MINOR] Remove unnecessary `imports` [spark]
dongjoon-hyun commented on PR #46161: URL: https://github.com/apache/spark/pull/46161#issuecomment-2071159435 To @panbingkun , please don't use `[WIP]` in the PR title. The GitHub `Draft` feature is better for preventing merging. It seems that you frequently forget to remove `[WIP]` from the PR title. :)
Re: [PR] [SPARK-47903][PYTHON][FOLLOW-UP] Removed changes relating to try_parse_json [spark]
harshmotw-db commented on code in PR #46170: URL: https://github.com/apache/spark/pull/46170#discussion_r1575488191 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExpectsInputTypes.scala: ## Review Comment: This change is not accidental. A very old version of the `try_parse_json` PR had removed a line in this file, and that had been fixed in that PR.
[PR] [SPARK-47903][PYTHON][FOLLOW-UP] Removed changes relating to try_parse_json [spark]
harshmotw-db opened a new pull request, #46170: URL: https://github.com/apache/spark/pull/46170 ### What changes were proposed in this pull request? Removed changes relating to `try_parse_json` that were accidentally pushed during the late stages of this PR. ### Why are the changes needed? There is already another PR in progress adding support for `try_parse_json`, and the implementation that was accidentally pushed is outdated. ### Does this PR introduce _any_ user-facing change? Yes, it removes the `try_parse_json` that was added just now. This feature will be added again soon. ### How was this patch tested? NA ### Was this patch authored or co-authored using generative AI tooling? No
Re: [PR] [SPARK-47935][INFRA][PYTHON] Pin `pandas==2.0.3` for `pypy3.8` [spark]
zhengruifeng commented on PR #46159: URL: https://github.com/apache/spark/pull/46159#issuecomment-2071153358 thank you @dongjoon-hyun and @HyukjinKwon for reviews
Re: [PR] [SPARK-47903][PYTHON] Add support for remaining scalar types in the PySpark Variant library [spark]
HyukjinKwon closed pull request #46122: [SPARK-47903][PYTHON] Add support for remaining scalar types in the PySpark Variant library URL: https://github.com/apache/spark/pull/46122
Re: [PR] [SPARK-47903][PYTHON] Add support for remaining scalar types in the PySpark Variant library [spark]
HyukjinKwon commented on PR #46122: URL: https://github.com/apache/spark/pull/46122#issuecomment-2071123524 Merged to master.
[PR] [CONNECT] Use v1 as spark connect go library starting version [spark-connect-go]
hiboyang opened a new pull request, #19: URL: https://github.com/apache/spark-connect-go/pull/19 ### What changes were proposed in this pull request? Use v1 for this Spark Connect Go Client library, instead of the previously used v34. ### Why are the changes needed? There is a recent discussion in the Spark community about the Spark Operator version naming convention. People prefer a versioning scheme independent of Spark versions. That applies to the Spark Connect Go Client as well, so this changes the version here to start from v1. ### Does this PR introduce _any_ user-facing change? Yes, Spark Connect Go Client users will use v1 instead of v34. ### How was this patch tested? Ran unit tests.
Re: [PR] [WIP][MINOR] Remove unnecessary `imports` [spark]
panbingkun commented on PR #46161: URL: https://github.com/apache/spark/pull/46161#issuecomment-2071119544 > +1, LGTM. > > BTW, do you know why this is not detected by `UnusedImport` rule? After this PR (which landed after `scala 2.13`), `scala.collection.immutable.IndexedSeq` was added to `src/library/scala/package.scala`, which means that we do not need an explicit import when using `IndexedSeq`: https://github.com/scala/scala/commit/457dfd07449a8b7943b2a6c614fb57ea2a7b452e#diff-4b7ff743cb5f9fa1d39314acc17d6d0eebd6b9d74ce333dd6c5549cca13de731 https://github.com/apache/spark/assets/15246973/8fa2c595-7f80-4081-877f-12a039bf34ab https://github.com/apache/spark/assets/15246973/a16136a5-c3e6-4b7c-bb0b-7db141581265
Re: [PR] [SPARK-47793][SS][PYTHON] Implement SimpleDataSourceStreamReader for python streaming data source [spark]
chaoqin-li1123 commented on code in PR #45977: URL: https://github.com/apache/spark/pull/45977#discussion_r1575445077 ## python/pyspark/sql/datasource.py: ## @@ -469,6 +501,200 @@ def stop(self) -> None: ... +class SimpleInputPartition(InputPartition): Review Comment: I see, moved it to another file.
Re: [PR] [SPARK-47929]Setup Static Analysis for Operator [spark-kubernetes-operator]
jiangzho commented on code in PR #6: URL: https://github.com/apache/spark-kubernetes-operator/pull/6#discussion_r1575438072 ## config/checkstyle/checkstyle.xml: ## @@ -0,0 +1,195 @@ + + +https://checkstyle.org/dtds/configuration_1_3.dtd;> + + + + Review Comment: Yes, this is taken from the Spark repo. The major differences are: * Remove `checkstyle-suppressions.xml`, as no SuppressionFilter is expected for this repository yet. * Style differences for line length / indentation, etc.
Re: [PR] [SPARK-47929]Setup Static Analysis for Operator [spark-kubernetes-operator]
jiangzho commented on PR #6: URL: https://github.com/apache/spark-kubernetes-operator/pull/6#issuecomment-2071080506 Thanks @dongjoon-hyun for the review! Yes, we target the latest versions where possible: [Checkstyle](https://checkstyle.sourceforge.io/) latest 10.15.0; [pmd](https://pmd.github.io/) latest 7.0.0 - it's a major version released on Mar 22, 2024, with another minor version 7.1.0 expected on Apr 26, so we started with the latest released 6.x version in this PR while evaluating v7; [Spotbugs plugin](https://plugins.gradle.org/plugin/com.github.spotbugs) latest version 6.0.12; [spotbugs](https://github.com/spotbugs/spotbugs) latest version 4.8.4; [Spotless plugin](https://plugins.gradle.org/plugin/com.diffplug.gradle.spotless) latest version 6.25.0
Re: [PR] [SPARK-47600][CORE] MLLib: Migrate logInfo with variables to structured logging framework [spark]
zeotuan commented on code in PR #46151: URL: https://github.com/apache/spark/pull/46151#discussion_r1575426488 ## mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala: ## @@ -358,15 +359,17 @@ class KMeans private ( } val iterationTimeInSeconds = (System.nanoTime() - iterationStartTime) / 1e9 -logInfo(f"Iterations took $iterationTimeInSeconds%.3f seconds.") +logInfo(log"Iterations took ${MDC(TIME_UNITS, iterationTimeInSeconds%.3f)} seconds.") if (iteration == maxIterations) { - logInfo(s"KMeans reached the max number of iterations: $maxIterations.") + logInfo(log"KMeans reached the max number of" + +log" iterations: ${MDC(NUM_ITERATIONS, maxIterations)}.") } else { - logInfo(s"KMeans converged in $iteration iterations.") + logInfo(log"KMeans converged in " + +log"${MDC(NUM_ITERATIONS, iteration)} iterations.") Review Comment: Merged to one line
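The migration pattern under review — replacing string-interpolated log values with named MDC keys so log pipelines can index them — can be illustrated with the Python stdlib's `extra` mechanism as a rough stand-in for Spark's `MDC` (the key name `NUM_ITERATIONS` mirrors the diff; this is an analogy, not Spark's logging framework):

```python
import logging

logging.basicConfig(format="%(message)s")
logger = logging.getLogger("kmeans")


# Before: the value is baked into the rendered message string and is
# only recoverable afterwards by parsing the text.
def log_unstructured(iteration: int) -> None:
    logger.info("KMeans converged in %d iterations." % iteration)


# After: the value also travels alongside the message under a stable
# key, so aggregation tools can filter or chart it without parsing.
def log_structured(iteration: int) -> None:
    logger.info(
        "KMeans converged in %d iterations.",
        iteration,
        extra={"NUM_ITERATIONS": iteration},
    )
```

The rendered message is identical in both cases; the difference is that the structured variant attaches `NUM_ITERATIONS` as a machine-readable attribute on the log record.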
Re: [PR] [SPARK-47904][SQL][3.5] Preserve case in Avro schema when using enableStableIdentifiersForUnionType [spark]
dongjoon-hyun commented on PR #46169: URL: https://github.com/apache/spark/pull/46169#issuecomment-2071065080 Merged to branch-3.5.
Re: [PR] [SPARK-47904][SQL][3.5] Preserve case in Avro schema when using enableStableIdentifiersForUnionType [spark]
dongjoon-hyun closed pull request #46169: [SPARK-47904][SQL][3.5] Preserve case in Avro schema when using enableStableIdentifiersForUnionType URL: https://github.com/apache/spark/pull/46169
Re: [PR] [SPARK-47600][CORE] MLLib: Migrate logInfo with variables to structured logging framework [spark]
zeotuan commented on code in PR #46151: URL: https://github.com/apache/spark/pull/46151#discussion_r1575423794 ## mllib/src/main/scala/org/apache/spark/ml/util/Instrumentation.scala: ## @@ -253,6 +254,13 @@ private[spark] class OptionalInstrumentation private( } } + override def logInfo(logEntry: LogEntry): Unit = { Review Comment: This is so that classes that use `OptionalInstrumentation` for logging, such as [WeightedLeastSquares](mllib/src/main/scala/org/apache/spark/ml/optim/WeightedLeastSquares.scala) and [CrossValidator](mllib/src/main/scala/org/apache/spark/ml/tuning/CrossValidator.scala), can also log a `MessageWithContext`. Otherwise I get a type mismatch: `String => MessageWithContext`.
Re: [PR] [SPARK-47910][CORE] close stream when DiskBlockObjectWriter closeResources to avoid memory leak [spark]
dongjoon-hyun commented on code in PR #46131: URL: https://github.com/apache/spark/pull/46131#discussion_r1575409704 ## core/src/main/scala/org/apache/spark/storage/DiskBlockObjectWriter.scala: ## @@ -174,23 +174,46 @@ private[spark] class DiskBlockObjectWriter( * Should call after committing or reverting partial writes. */ private def closeResources(): Unit = { -if (initialized) { - Utils.tryWithSafeFinally { -mcs.manualClose() - } { -channel = null -mcs = null -bs = null -fos = null -ts = null -objOut = null -initialized = false -streamOpen = false -hasBeenClosed = true +try { + if (streamOpen) { +Utils.tryWithSafeFinally { + objOut = closeIfNonNull(objOut) + bs = null +} { + bs = closeIfNonNull(bs) +} + } +} catch { + case e: IOException => +logError(log"Exception occurred while closing the output stream" + + log"${MDC(ERROR, e.getMessage)}") +} finally { + if (initialized) { +Utils.tryWithSafeFinally { + mcs.manualClose() +} { + channel = null + mcs = null + bs = null + fos = null + ts = null + objOut = null + initialized = false + streamOpen = false + hasBeenClosed = true +} } } } + Review Comment: nit. Redundant empty line. Please remove this.
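The shape of the `closeResources` change above — close the wrapping stream first, log rather than propagate a close failure, and always release references and mark the writer closed in a `finally` — can be sketched in a language-neutral way. This Python toy assumes a writer that owns a single stream; it is an illustration of the pattern, not Spark's actual implementation:

```python
import logging

logger = logging.getLogger(__name__)


class ObjectWriter:
    """Toy writer illustrating the close-then-release pattern."""

    def __init__(self, stream):
        self.stream = stream
        self.stream_open = True
        self.has_been_closed = False

    def close_resources(self):
        try:
            # Close the open stream first so buffered data and file
            # handles are not leaked.
            if self.stream_open:
                self.stream.close()
        except OSError as e:
            # A failure while closing must not prevent releasing
            # the remaining resources below.
            logger.error("Exception occurred while closing the output stream: %s", e)
        finally:
            # Always drop references and mark the writer closed,
            # even when close() above raised.
            self.stream = None
            self.stream_open = False
            self.has_been_closed = True
```

The key property is that `has_been_closed` becomes true and `stream` is released even when `stream.close()` throws, which is exactly the leak the PR is guarding against.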
Re: [PR] [SPARK-44170][BUILD][FOLLOWUP] Align JUnit5 dependency's version and clean up exclusions [spark]
dongjoon-hyun commented on code in PR #46160: URL: https://github.com/apache/spark/pull/46160#discussion_r1575402128 ## pom.xml: ## @@ -220,6 +220,9 @@ 4.1.109.Final 2.0.65.Final 72.1 +5.9.1 Review Comment: May I ask why we downgrade this, @pan3793 ? Previously, we had `5.9.3`. https://github.com/apache/spark/blob/ac9a12ef6e062ae07e878e202521b22de9979a17/pom.xml#L1239
Re: [PR] [SPARK-47937][PYTHON][DOCS] Fix docstring of `hll_sketch_agg` [spark]
dongjoon-hyun commented on code in PR #46163: URL: https://github.com/apache/spark/pull/46163#discussion_r1575399787 ## python/pyspark/sql/connect/functions/builtin.py: ## @@ -3804,12 +3804,14 @@ def sha2(col: "ColumnOrName", numBits: int) -> Column: sha2.__doc__ = pysparkfuncs.sha2.__doc__ -def hll_sketch_agg(col: "ColumnOrName", lgConfigK: Optional[Union[int, Column]] = None) -> Column: +def hll_sketch_agg( +col: "ColumnOrName", +lgConfigK: Optional[Union[int, Column]] = None, +) -> Column: if lgConfigK is None: return _invoke_function_over_columns("hll_sketch_agg", col) else: -_lgConfigK = lit(lgConfigK) if isinstance(lgConfigK, int) else lgConfigK Review Comment: This is a logical code change, isn't it? If the previous code is a bug, please file a JIRA and handle it separately, @zhengruifeng .
Re: [PR] [SPARK-47937][PYTHON][DOCS] Fix docstring of `hll_sketch_agg` [spark]
dongjoon-hyun commented on code in PR #46163: URL: https://github.com/apache/spark/pull/46163#discussion_r1575400474 ## python/pyspark/sql/functions/builtin.py: ## @@ -19551,8 +19554,7 @@ def hll_sketch_agg(col: "ColumnOrName", lgConfigK: Optional[Union[int, Column]] if lgConfigK is None: return _invoke_function_over_columns("hll_sketch_agg", col) else: -_lgConfigK = lit(lgConfigK) if isinstance(lgConfigK, int) else lgConfigK Review Comment: ditto. This is a logical code change which is irrelevant to `docstring`.
Re: [PR] [SPARK-47937][PYTHON][DOCS] Fix docstring of `hll_sketch_agg` [spark]
dongjoon-hyun commented on code in PR #46163: URL: https://github.com/apache/spark/pull/46163#discussion_r1575400033 ## python/pyspark/sql/functions/builtin.py: ## @@ -19502,7 +19502,10 @@ def unwrap_udt(col: "ColumnOrName") -> Column: @_try_remote_functions -def hll_sketch_agg(col: "ColumnOrName", lgConfigK: Optional[Union[int, Column]] = None) -> Column: +def hll_sketch_agg( +col: "ColumnOrName", +lgConfigK: Optional[Union[int, Column]] = None, +) -> Column: Review Comment: ditto.
Re: [PR] [SPARK-47937][PYTHON][DOCS] Fix docstring of `hll_sketch_agg` [spark]
dongjoon-hyun commented on code in PR #46163: URL: https://github.com/apache/spark/pull/46163#discussion_r1575399123 ## python/pyspark/sql/connect/functions/builtin.py: ## @@ -3804,12 +3804,14 @@ def sha2(col: "ColumnOrName", numBits: int) -> Column: sha2.__doc__ = pysparkfuncs.sha2.__doc__ -def hll_sketch_agg(col: "ColumnOrName", lgConfigK: Optional[Union[int, Column]] = None) -> Column: +def hll_sketch_agg( +col: "ColumnOrName", +lgConfigK: Optional[Union[int, Column]] = None, +) -> Column: Review Comment: This doesn't look like a doc string. May I ask if we need this style change?
Re: [PR] [SPARK-47942][K8S][DOCS] Drop K8s v1.26 Support [spark]
dongjoon-hyun commented on PR #46168: URL: https://github.com/apache/spark/pull/46168#issuecomment-2071008564 Merged to master for Apache Spark 4.0.0
Re: [PR] [SPARK-47942][K8S][DOCS] Drop K8s v1.26 Support [spark]
dongjoon-hyun closed pull request #46168: [SPARK-47942][K8S][DOCS] Drop K8s v1.26 Support URL: https://github.com/apache/spark/pull/46168
Re: [PR] [SPARK-47904][SQL][3.5] Preserve case in Avro schema when using enableStableIdentifiersForUnionType [spark]
sadikovi commented on PR #46169: URL: https://github.com/apache/spark/pull/46169#issuecomment-2071002159 cc @dongjoon-hyun
[PR] [SPARK-47904][SQL][3.5] Preserve case in Avro schema when using enableStableIdentifiersForUnionType [spark]
sadikovi opened a new pull request, #46169: URL: https://github.com/apache/spark/pull/46169 ### What changes were proposed in this pull request? Backport of https://github.com/apache/spark/pull/46126 to branch-3.5. When `enableStableIdentifiersForUnionType` is enabled, all of the types are lowercased, which creates a problem when field types are case-sensitive: Union type with fields: ``` Schema.createEnum("myENUM", "", null, List[String]("E1", "e2").asJava), Schema.createRecord("myRecord2", "", null, false, List[Schema.Field](new Schema.Field("F", Schema.create(Type.FLOAT))).asJava) ``` would become ``` struct> ``` but instead should be ``` struct> ``` ### Why are the changes needed? Fixes a bug of lowercasing the field name (the type portion). ### Does this PR introduce _any_ user-facing change? Yes, if a user enables `enableStableIdentifiersForUnionType` and has Union types, all fields will preserve the case. Previously, the field names would be all in lowercase. ### How was this patch tested? I added a test case to verify the new field names. ### Was this patch tested? ### Was this patch authored or co-authored using generative AI tooling? No.
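The fix described above can be sketched as a tiny naming function. Note the `member_` prefix below is an assumption about how stable union-member identifiers are derived from the member's type name; the essence of the bug is only the extra lowercasing of the type portion:

```python
def stable_union_field_name(type_name: str, preserve_case: bool) -> str:
    # Stable identifiers derive the struct field name from the union
    # member's type name (assumed "member_" prefix for illustration).
    # The bug: the type portion was additionally lowercased, losing
    # the original case of case-sensitive Avro names.
    suffix = type_name if preserve_case else type_name.lower()
    return "member_" + suffix


# Old behavior collapses case; the fixed behavior preserves it.
assert stable_union_field_name("myENUM", preserve_case=False) == "member_myenum"
assert stable_union_field_name("myENUM", preserve_case=True) == "member_myENUM"
```

With case preserved, two union members whose names differ only in case also stay distinguishable, which the lowercasing behavior could not guarantee.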
Re: [PR] [SPARK-47904][SQL] Preserve case in Avro schema when using enableStableIdentifiersForUnionType [spark]
sadikovi commented on PR #46126: URL: https://github.com/apache/spark/pull/46126#issuecomment-2070979626

Yes, I will open a separate PR.
Re: [PR] [SPARK-47904][SQL] Preserve case in Avro schema when using enableStableIdentifiersForUnionType [spark]
dongjoon-hyun commented on PR #46126: URL: https://github.com/apache/spark/pull/46126#issuecomment-2070968153

There is a small conflict. Could you make a backporting PR to `branch-3.5`?
Re: [PR] [SPARK-47904][SQL] Preserve case in Avro schema when using enableStableIdentifiersForUnionType [spark]
dongjoon-hyun closed pull request #46126: [SPARK-47904][SQL] Preserve case in Avro schema when using enableStableIdentifiersForUnionType URL: https://github.com/apache/spark/pull/46126
Re: [PR] [SPARK-47942][K8S][DOCS] Drop K8s v1.26 Support [spark]
dongjoon-hyun commented on PR #46168: URL: https://github.com/apache/spark/pull/46168#issuecomment-2070956886

Thank you, @viirya !
Re: [PR] [SPARK-47942][K8S][DOCS] Drop K8s v1.26 Support [spark]
dongjoon-hyun commented on PR #46168: URL: https://github.com/apache/spark/pull/46168#issuecomment-2070953251

Could you review this K8s version doc PR, @viirya ?
Re: [PR] [SPARK-47907][SQL] Put bang under a config [spark]
gengliangwang closed pull request #46138: [SPARK-47907][SQL] Put bang under a config URL: https://github.com/apache/spark/pull/46138
[PR] [SPARK-47942][K8S][DOCS] Drop K8s v1.26 Support [spark]
dongjoon-hyun opened a new pull request, #46168: URL: https://github.com/apache/spark/pull/46168

### What changes were proposed in this pull request?

### Why are the changes needed?

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

### Was this patch authored or co-authored using generative AI tooling?
Re: [PR] [SPARK-47907] Put bang under a config [spark]
gengliangwang commented on PR #46138: URL: https://github.com/apache/spark/pull/46138#issuecomment-2070929813

Thanks, merging to master.
Re: [PR] [SPARK-47600][CORE] MLLib: Migrate logInfo with variables to structured logging framework [spark]
gengliangwang commented on PR #46151: URL: https://github.com/apache/spark/pull/46151#issuecomment-2070917462

@zeotuan Thanks for the work! LGTM overall except for some minor comments.
Re: [PR] [SPARK-47940][BUILD][TESTS] Upgrade `guava` dependency to `33.1.0-jre` in Docker IT [spark]
dongjoon-hyun closed pull request #46167: [SPARK-47940][BUILD][TESTS] Upgrade `guava` dependency to `33.1.0-jre` in Docker IT URL: https://github.com/apache/spark/pull/46167
Re: [PR] [SPARK-47940][BUILD][TESTS] Upgrade `guava` dependency to `33.1.0-jre` in Docker IT [spark]
dongjoon-hyun commented on PR #46167: URL: https://github.com/apache/spark/pull/46167#issuecomment-2070905656

Thank you! Merged to master.
Re: [PR] [SPARK-47600][CORE] MLLib: Migrate logInfo with variables to structured logging framework [spark]
gengliangwang commented on code in PR #46151: URL: https://github.com/apache/spark/pull/46151#discussion_r1575322770

## mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala:

```
@@ -358,15 +359,17 @@ class KMeans private (
     }
     val iterationTimeInSeconds = (System.nanoTime() - iterationStartTime) / 1e9
-    logInfo(f"Iterations took $iterationTimeInSeconds%.3f seconds.")
+    logInfo(log"Iterations took ${MDC(TIME_UNITS, iterationTimeInSeconds%.3f)} seconds.")
     if (iteration == maxIterations) {
-      logInfo(s"KMeans reached the max number of iterations: $maxIterations.")
+      logInfo(log"KMeans reached the max number of" +
+        log" iterations: ${MDC(NUM_ITERATIONS, maxIterations)}.")
     } else {
-      logInfo(s"KMeans converged in $iteration iterations.")
+      logInfo(log"KMeans converged in " +
+        log"${MDC(NUM_ITERATIONS, iteration)} iterations.")
```

Review Comment: nit: can we merge into one line?
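The diff above migrates plain interpolated log messages to a structured form where each variable is tagged with a stable key (MDC). Here is a hypothetical minimal sketch of that idea in Java, not Spark's actual `Logging` API: the `StructuredLog`, `text`, and `mdc` names are invented for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of structured logging: the rendered message and a
// key -> value context map are built together, so log aggregators can index
// each value under its key instead of parsing free-form text.
public class StructuredLog {
    private final StringBuilder message = new StringBuilder();
    private final Map<String, Object> context = new LinkedHashMap<>();

    // Append a literal message fragment.
    StructuredLog text(String s) {
        message.append(s);
        return this;
    }

    // Append a value to the message AND record it under a stable key.
    StructuredLog mdc(String key, Object value) {
        message.append(value);
        context.put(key, value);
        return this;
    }

    String render() { return message.toString(); }
    Map<String, Object> context() { return context; }

    public static void main(String[] args) {
        StructuredLog entry = new StructuredLog()
            .text("KMeans converged in ")
            .mdc("NUM_ITERATIONS", 7)
            .text(" iterations.");
        System.out.println(entry.render());  // KMeans converged in 7 iterations.
        System.out.println(entry.context()); // {NUM_ITERATIONS=7}
    }
}
```

The human-readable message is unchanged, but the `NUM_ITERATIONS` key now travels with the value, which is the point of the migration the PR performs.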
Re: [PR] [SPARK-47600][CORE] MLLib: Migrate logInfo with variables to structured logging framework [spark]
gengliangwang commented on code in PR #46151: URL: https://github.com/apache/spark/pull/46151#discussion_r1575320474

## mllib/src/main/scala/org/apache/spark/ml/util/Instrumentation.scala:

```
@@ -253,6 +254,13 @@ private[spark] class OptionalInstrumentation private(
     }
   }
+  override def logInfo(logEntry: LogEntry): Unit = {
```

Review Comment: why do we need this?
Re: [PR] [SPARK-47600][CORE] MLLib: Migrate logInfo with variables to structured logging framework [spark]
gengliangwang commented on code in PR #46151: URL: https://github.com/apache/spark/pull/46151#discussion_r1575319360

## mllib/src/main/scala/org/apache/spark/ml/util/Instrumentation.scala:

```
@@ -53,8 +54,8 @@ private[spark] class Instrumentation private () extends Logging with MLEvents {
     // estimator.getClass.getSimpleName can cause Malformed class name error,
     // call safer `Utils.getSimpleName` instead
     val className = Utils.getSimpleName(stage.getClass)
-    logInfo(s"Stage class: $className")
-    logInfo(s"Stage uid: ${stage.uid}")
+    logInfo(log"Stage class: ${MDC(CLASS_NAME, className)}")
+    logInfo(log"Stage uid: ${MDC(STAGE_ID, stage.uid)}")
```

Review Comment:
```suggestion
    logInfo(log"Stage uid: ${MDC(PIPELINE_STAGE_UID, stage.uid)}")
```