[spark] branch master updated: [SPARK-45267][PS] Change the default value for numeric_only
This is an automated email from the ASF dual-hosted git repository. ruifengz pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 8cbc741320d [SPARK-45267][PS] Change the default value for numeric_only 8cbc741320d is described below commit 8cbc741320dac60ce814ce0a9b3e72239248efb8 Author: Haejoon Lee AuthorDate: Wed Sep 27 14:04:54 2023 +0800 [SPARK-45267][PS] Change the default value for numeric_only ### What changes were proposed in this pull request? This PR proposes to change the default value for `numeric_only` in related functions. ### Why are the changes needed? Many functions that support the `numeric_only` parameter have changed their default value from `True` to `False` since pandas 2.0.0, so we should follow that behavior. See https://pandas.pydata.org/docs/dev/whatsnew/v2.0.0.html for more details. ### Does this PR introduce _any_ user-facing change? Yes, the default value for `numeric_only` is changed to `False`. ### How was this patch tested? Updated the related UTs. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #43043 from itholic/numeric_only. 
Authored-by: Haejoon Lee Signed-off-by: Ruifeng Zheng --- python/pyspark/pandas/frame.py | 38 +++ python/pyspark/pandas/groupby.py | 54 +++--- python/pyspark/pandas/series.py| 13 -- .../pandas/tests/computation/test_compute.py | 8 +++- 4 files changed, 47 insertions(+), 66 deletions(-) diff --git a/python/pyspark/pandas/frame.py b/python/pyspark/pandas/frame.py index 08450c0be87..faa595f80e3 100644 --- a/python/pyspark/pandas/frame.py +++ b/python/pyspark/pandas/frame.py @@ -747,7 +747,7 @@ class DataFrame(Frame, Generic[T]): sfun: Callable[["Series"], PySparkColumn], name: str, axis: Optional[Axis] = None, -numeric_only: bool = True, +numeric_only: bool = False, skipna: bool = True, **kwargs: Any, ) -> "Series": @@ -762,10 +762,8 @@ class DataFrame(Frame, Generic[T]): axis: used only for sanity check because the series only supports index axis. name : original pandas API name. axis : axis to apply. 0 or 1, or 'index' or 'columns. -numeric_only : bool, default True -Include only float, int, boolean columns. False is not supported. This parameter -is mainly for pandas compatibility. Only 'DataFrame.count' uses this parameter -currently. +numeric_only : bool, default False +Include only float, int, boolean columns. skipna : bool, default True Exclude NA/null values when computing the result. """ @@ -11150,7 +11148,7 @@ defaultdict(, {'col..., 'col...})] # TODO: add axis, pct, na_option parameter def rank( -self, method: str = "average", ascending: bool = True, numeric_only: Optional[bool] = None +self, method: str = "average", ascending: bool = True, numeric_only: bool = False ) -> "DataFrame": """ Compute numerical data ranks (1 through n) along axis. 
Equal values are @@ -11171,9 +11169,13 @@ defaultdict(, {'col..., 'col...})] * dense: like 'min', but rank always increases by 1 between groups ascending : boolean, default True False for ranks by high (1) to low (N) -numeric_only : bool, optional +numeric_only : bool, default False For DataFrame objects, rank only numeric columns if set to True. +.. versionchanged:: 4.0.0 +The default value of ``numeric_only`` is now ``False``. + + Returns --- ranks : same type as caller @@ -11238,11 +11240,6 @@ defaultdict(, {'col..., 'col...})] 2 2.5 3 4.0 """ -warnings.warn( -"Default value of `numeric_only` will be changed to `False` " -"instead of `None` in 4.0.0.", -FutureWarning, -) if numeric_only: numeric_col_names = [] for label in self._internal.column_labels: @@ -12206,7 +12203,7 @@ defaultdict(, {'col..., 'col...})] self, q: Union[float, Iterable[float]] = 0.5, axis: Axis = 0, -numeric_only: bool = True, +numeric_only: bool = False, accuracy: int = 1, ) -> DataFrameOrSeries: """ @@ -1,9 +12219,12 @@ defaultdict(, {'col..., 'col...})] 0 <= q <= 1, the quantile(s) to compute. axis : int or str, default 0 or 'index' Can only be set to 0 now. -numeric_only : bool, default True -If False, the quantile of datetime and time
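The pandas 2.x semantics this commit follows can be illustrated with plain pandas (a minimal sketch using pandas itself rather than pandas-on-Spark; it assumes pandas ≥ 2.0, where `numeric_only` defaults to `False`):

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3], "s": ["a", "b", "c"]})

# numeric_only=True: non-numeric columns are silently dropped
# before the reduction runs.
means = df.mean(numeric_only=True)
assert means["x"] == 2.0
assert "s" not in means

# numeric_only=False (the new default): the string column participates,
# and a reduction that cannot handle it raises instead of dropping it.
try:
    df.mean(numeric_only=False)
    raised = False
except TypeError:
    raised = True
assert raised
```

The same explicit-opt-in pattern (`numeric_only=True`) is the portable way to get the old behavior after this change.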
[spark] branch master updated: [SPARK-45335][SQL][DOCS] Correct the group of `ElementAt` and `TryElementAt`
This is an automated email from the ASF dual-hosted git repository. ruifengz pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 644c8b25bfd [SPARK-45335][SQL][DOCS] Correct the group of `ElementAt` and `TryElementAt` 644c8b25bfd is described below commit 644c8b25bfda1fe323a7b621d14a340018560136 Author: Ruifeng Zheng AuthorDate: Wed Sep 27 14:03:31 2023 +0800 [SPARK-45335][SQL][DOCS] Correct the group of `ElementAt` and `TryElementAt` ### What changes were proposed in this pull request? Correct the group of `ElementAt` and `TryElementAt`: they both support array and map input, so they should be in `collection functions`. The existing category strategy appears to be that a function supporting more than one type belongs in `collection functions`: - `Size` supports both array and map, and it is in `collection functions`; - `Reverse` supports both array and string, and it is in `collection functions`. So far, I didn't find other places with incorrect groups. ### Why are the changes needed? For the documentation. ### Does this PR introduce _any_ user-facing change? Yes, they will be in `collection functions` in SQL references. ### How was this patch tested? CI ### Was this patch authored or co-authored using generative AI tooling? No Closes #43121 from zhengruifeng/group_element_at. 
Lead-authored-by: Ruifeng Zheng Co-authored-by: Ruifeng Zheng Signed-off-by: Ruifeng Zheng --- .../apache/spark/sql/catalyst/expressions/collectionOperations.scala | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala index 4a3c7bbc2be..759000bc5f5 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala @@ -2333,7 +2333,7 @@ case class Get( b """, since = "2.4.0", - group = "map_funcs") + group = "collection_funcs") case class ElementAt( left: Expression, right: Expression, @@ -2557,7 +2557,7 @@ case class ElementAt( b """, since = "3.3.0", - group = "map_funcs") + group = "collection_funcs") case class TryElementAt(left: Expression, right: Expression, replacement: Expression) extends RuntimeReplaceable with InheritAnalysisRules { def this(left: Expression, right: Expression) = - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
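The reclassification makes sense because `element_at` is polymorphic over arrays and maps. A plain-Python sketch of its value semantics (not Spark code; it models the ANSI-off behavior, where a missing map key or out-of-range array index yields NULL):

```python
def element_at(container, key):
    # Map input: return the value for the key, or None (SQL NULL).
    if isinstance(container, dict):
        return container.get(key)
    # Array input: SQL arrays are 1-based; index 0 is an error in Spark.
    if key == 0:
        raise ValueError("SQL array indices start at 1 (or -1 from the end)")
    idx = key - 1 if key > 0 else key  # negative indices count from the end
    if -len(container) <= idx < len(container):
        return container[idx]
    return None  # out-of-range index yields NULL when ANSI mode is off

assert element_at([10, 20, 30], 2) == 20
assert element_at([10, 20, 30], -1) == 30
assert element_at([10, 20, 30], 5) is None
assert element_at({"a": 1, "b": 2}, "b") == 2
assert element_at({"a": 1}, "z") is None
```

Because one function handles both container types, grouping it with map-only functions understated its scope; `collection_funcs` matches `Size` and `Reverse`, the precedents cited above.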
[spark] branch master updated (eff46ea77e9 -> 28dc555821b)
This is an automated email from the ASF dual-hosted git repository. ruifengz pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from eff46ea77e9 [SPARK-45340][SQL] Remove the SQL config `spark.sql.hive.verifyPartitionPath` add 28dc555821b [SPARK-45329][PYTHON][CONNECT] DataFrame methods skip pandas conversion No new revisions were added by this update. Summary of changes: python/pyspark/sql/connect/dataframe.py | 66 + 1 file changed, 34 insertions(+), 32 deletions(-)
[spark] branch master updated: [SPARK-45340][SQL] Remove the SQL config `spark.sql.hive.verifyPartitionPath`
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new eff46ea77e9 [SPARK-45340][SQL] Remove the SQL config `spark.sql.hive.verifyPartitionPath` eff46ea77e9 is described below commit eff46ea77e9bebef3076277bef1e086833dd Author: Max Gekk AuthorDate: Wed Sep 27 08:28:45 2023 +0300 [SPARK-45340][SQL] Remove the SQL config `spark.sql.hive.verifyPartitionPath` ### What changes were proposed in this pull request? In this PR, I propose to remove the already-deprecated SQL config `spark.sql.hive.verifyPartitionPath`, and the code under the config. The config has been deprecated since Spark 3.0. ### Why are the changes needed? To improve code maintainability by removing unused code. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? By running the modified test suites: ``` $ build/sbt "test:testOnly *SQLConfSuite" $ build/sbt "test:testOnly *QueryPartitionSuite" ``` ### Was this patch authored or co-authored using generative AI tooling? No. Closes #43130 from MaxGekk/remove-verifyPartitionPath. 
Authored-by: Max Gekk Signed-off-by: Max Gekk --- .../org/apache/spark/sql/internal/SQLConf.scala| 17 ++--- .../apache/spark/sql/internal/SQLConfSuite.scala | 4 +-- .../org/apache/spark/sql/hive/TableReader.scala| 41 +- .../spark/sql/hive/QueryPartitionSuite.scala | 12 ++- 4 files changed, 8 insertions(+), 66 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala index 43eb0756d8d..aeef531dbcd 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala @@ -34,7 +34,6 @@ import org.apache.hadoop.fs.Path import org.apache.spark.{ErrorMessageFormat, SparkConf, SparkContext, TaskContext} import org.apache.spark.internal.Logging import org.apache.spark.internal.config._ -import org.apache.spark.internal.config.{IGNORE_MISSING_FILES => SPARK_IGNORE_MISSING_FILES} import org.apache.spark.network.util.ByteUnit import org.apache.spark.sql.catalyst.ScalaReflection import org.apache.spark.sql.catalyst.analysis.{HintErrorLogger, Resolver} @@ -1261,14 +1260,6 @@ object SQLConf { .booleanConf .createWithDefault(false) - val HIVE_VERIFY_PARTITION_PATH = buildConf("spark.sql.hive.verifyPartitionPath") -.doc("When true, check all the partition paths under the table\'s root directory " + - "when reading data stored in HDFS. 
This configuration will be deprecated in the future " + - s"releases and replaced by ${SPARK_IGNORE_MISSING_FILES.key}.") -.version("1.4.0") -.booleanConf -.createWithDefault(false) - val HIVE_METASTORE_DROP_PARTITION_BY_NAME = buildConf("spark.sql.hive.dropPartitionByName.enabled") .doc("When true, Spark will get partition name rather than partition object " + @@ -4472,8 +4463,6 @@ object SQLConf { PANDAS_GROUPED_MAP_ASSIGN_COLUMNS_BY_NAME.key, "2.4", "The config allows to switch to the behaviour before Spark 2.4 " + "and will be removed in the future releases."), - DeprecatedConfig(HIVE_VERIFY_PARTITION_PATH.key, "3.0", -s"This config is replaced by '${SPARK_IGNORE_MISSING_FILES.key}'."), DeprecatedConfig(ARROW_EXECUTION_ENABLED.key, "3.0", s"Use '${ARROW_PYSPARK_EXECUTION_ENABLED.key}' instead of it."), DeprecatedConfig(ARROW_FALLBACK_ENABLED.key, "3.0", @@ -4552,7 +4541,9 @@ object SQLConf { RemovedConfig("spark.sql.ansi.strictIndexOperator", "3.4.0", "true", "This was an internal configuration. It is not needed anymore since Spark SQL always " + "returns null when getting a map value with a non-existing key. See SPARK-40066 " + - "for more details.") + "for more details."), + RemovedConfig("spark.sql.hive.verifyPartitionPath", "4.0.0", "false", +s"This config was replaced by '${IGNORE_MISSING_FILES.key}'.") ) Map(configs.map { cfg => cfg.key -> cfg } : _*) @@ -4766,8 +4757,6 @@ class SQLConf extends Serializable with Logging with SqlApiConf { def isOrcSchemaMergingEnabled: Boolean = getConf(ORC_SCHEMA_MERGING_ENABLED) - def verifyPartitionPath: Boolean = getConf(HIVE_VERIFY_PARTITION_PATH) - def metastoreDropPartitionsByName: Boolean = getConf(HIVE_METASTORE_DROP_PARTITION_BY_NAME) def metastorePartitionPruning: Boolean = getConf(HIVE_METASTORE_PARTITION_PRUNING) diff --git a/sql/core/src/test/scala/org/apache/spark/sql/internal/SQLConfSuite.scala b/sql/core/src/test/scala/org/apache/sp
[spark] branch master updated: [SPARK-44780][DOC] SQL temporary variables
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new f6c6acbc00d [SPARK-44780][DOC] SQL temporary variables f6c6acbc00d is described below commit f6c6acbc00d2d96c43298c282e4bd8ebeb160ad1 Author: Serge Rielau AuthorDate: Wed Sep 27 13:05:19 2023 +0800 [SPARK-44780][DOC] SQL temporary variables ### What changes were proposed in this pull request? Document the previously pushed feature SQL temporary variables ### Why are the changes needed? If it's not documented, it doesn't exist ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? build docs, verify HTML Closes #42467 from srielau/SPARK-44780-Doc-sql-session-variables. Lead-authored-by: Serge Rielau Co-authored-by: srielau Signed-off-by: Wenchen Fan --- docs/sql-ref-syntax-aux-conf-mgmt-set.md| 3 + docs/sql-ref-syntax-aux-set-var.md | 98 + docs/sql-ref-syntax-ddl-declare-variable.md | 82 docs/sql-ref-syntax-ddl-drop-variable.md| 66 +++ docs/sql-ref-syntax.md | 3 + 5 files changed, 252 insertions(+) diff --git a/docs/sql-ref-syntax-aux-conf-mgmt-set.md b/docs/sql-ref-syntax-aux-conf-mgmt-set.md index f97b7f2a8ef..9e57a221f96 100644 --- a/docs/sql-ref-syntax-aux-conf-mgmt-set.md +++ b/docs/sql-ref-syntax-aux-conf-mgmt-set.md @@ -23,6 +23,8 @@ license: | The SET command sets a property, returns the value of an existing property or returns all SQLConf properties with value and meaning. +To set SQL variables defined with [DECLARE VARIABLE](sql-ref-syntax-ddl-declare-variable.html) use [SET VAR](sql-ref-syntax-aux-set-var.html). 
+ ### Syntax ```sql @@ -69,3 +71,4 @@ SET spark.sql.variable.substitute; ### Related Statements * [RESET](sql-ref-syntax-aux-conf-mgmt-reset.html) +* [SET VAR](sql-ref-syntax-aux-set-var.html) diff --git a/docs/sql-ref-syntax-aux-set-var.md b/docs/sql-ref-syntax-aux-set-var.md new file mode 100644 index 000..9ce9e68cd4f --- /dev/null +++ b/docs/sql-ref-syntax-aux-set-var.md @@ -0,0 +1,98 @@ +--- +layout: global +title: SET VAR +displayTitle: SET VAR +license: | + Licensed to the Apache Software Foundation (ASF) under one or more + contributor license agreements. See the NOTICE file distributed with + this work for additional information regarding copyright ownership. + The ASF licenses this file to You under the Apache License, Version 2.0 + (the "License"); you may not use this file except in compliance with + the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. +--- + +### Description + +The `SET VAR` command sets a temporary variable which has been previously declared in the current session. + +To set a config variable or a hive variable use [SET](sql-ref-syntax-aux-conf-mgmt-set.html). + +### Syntax + +```sql +SET { VAR | VARIABLE } + { { variable_name = { expression | DEFAULT } } [, ...] | +( variable_name [, ...] ) = ( query ) } +``` + +### Parameters + +* **variable_name** + + Specifies an existing variable. + If you specify multiple variables, there must not be any duplicates. + +* **expression** + + Any expression, including scalar subqueries. + +* **DEFAULT** + + If you specify `DEFAULT`, the default expression of the variable is assigned, + or `NULL` if there is none. 
+ +* **query** + + A [query](sql-ref-syntax-qry-select.html) that returns at most one row and as many columns as + the number of specified variables. Each column must be implicitly castable to the data type of the + corresponding variable. + If the query returns no row `NULL` values are assigned. + +### Examples + +```sql +-- +DECLARE VARIABLE var1 INT DEFAULT 7; +DECLARE VARIABLE var2 STRING; + +-- A simple assignment +SET VAR var1 = 5; +SELECT var1; + 5 + +-- A complex expression assignment +SET VARIABLE var1 = (SELECT max(c1) FROM VALUES(1), (2) AS t(c1)); +SELECT var1; + 2 + +-- resetting the variable to DEFAULT +SET VAR var1 = DEFAULT; +SELECT var1; + 7 + +-- A multi variable assignment +SET VAR (var1, var2) = (SELECT max(c1), CAST(min(c1) AS STRING) FROM VALUES(1), (2) AS t(c1)); +SELECT var1, var2; + 2 1 + +-- Too many rows +SET VAR (var1, var2) = (SELECT c1, CAST(c1 AS STRING) FROM VALUES(1), (2) AS t(c1)); +Error: ROW_SUBQUERY_TOO_MANY_ROWS + +-- No rows +SET VAR (var1, var2) =
[spark] branch master updated: [SPARK-43850][BUILD][GRAPHX] Remove the import for `scala.language.higherKinds` and delete the corresponding suppression rule
This is an automated email from the ASF dual-hosted git repository. yangjie01 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 1597d8174a5 [SPARK-43850][BUILD][GRAPHX] Remove the import for `scala.language.higherKinds` and delete the corresponding suppression rule 1597d8174a5 is described below commit 1597d8174a54b8657572f2e40897a20c985d2794 Author: yangjie01 AuthorDate: Wed Sep 27 12:26:46 2023 +0800 [SPARK-43850][BUILD][GRAPHX] Remove the import for `scala.language.higherKinds` and delete the corresponding suppression rule ### What changes were proposed in this pull request? `scala.language.higherKinds` is deprecated and no longer needs to be imported explicitly in Scala 2.13, so this PR removes the imports for scala.language.higherKinds and the corresponding compiler suppression rules. ### Why are the changes needed? In SPARK-43849(https://github.com/apache/spark/pull/41356), I added compiler suppression rules to allow `unused imports` checks to work with both Scala 2.12 and Scala 2.13. As there is no longer a need to support Scala 2.12, we can now clean them up. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Pass GitHub Actions ### Was this patch authored or co-authored using generative AI tooling? No Closes #43128 from LuciferYang/SPARK-43850. 
Authored-by: yangjie01 Signed-off-by: yangjie01 --- .../scala/org/apache/spark/graphx/impl/VertexPartitionBase.scala| 1 - .../scala/org/apache/spark/graphx/impl/VertexPartitionBaseOps.scala | 1 - pom.xml | 6 -- project/SparkBuild.scala| 4 4 files changed, 12 deletions(-) diff --git a/graphx/src/main/scala/org/apache/spark/graphx/impl/VertexPartitionBase.scala b/graphx/src/main/scala/org/apache/spark/graphx/impl/VertexPartitionBase.scala index 8da46db98be..bbc4bca5016 100644 --- a/graphx/src/main/scala/org/apache/spark/graphx/impl/VertexPartitionBase.scala +++ b/graphx/src/main/scala/org/apache/spark/graphx/impl/VertexPartitionBase.scala @@ -17,7 +17,6 @@ package org.apache.spark.graphx.impl -import scala.language.higherKinds import scala.reflect.ClassTag import org.apache.spark.graphx._ diff --git a/graphx/src/main/scala/org/apache/spark/graphx/impl/VertexPartitionBaseOps.scala b/graphx/src/main/scala/org/apache/spark/graphx/impl/VertexPartitionBaseOps.scala index a8ed59b09bb..cf4c8ca2a9c 100644 --- a/graphx/src/main/scala/org/apache/spark/graphx/impl/VertexPartitionBaseOps.scala +++ b/graphx/src/main/scala/org/apache/spark/graphx/impl/VertexPartitionBaseOps.scala @@ -17,7 +17,6 @@ package org.apache.spark.graphx.impl -import scala.language.higherKinds import scala.language.implicitConversions import scala.reflect.ClassTag diff --git a/pom.xml b/pom.xml index 1d0ab387900..6dc54a9bc94 100644 --- a/pom.xml +++ b/pom.xml @@ -2970,12 +2970,6 @@ -Wconf:cat=unchecked&msg=outer reference:s -Wconf:cat=unchecked&msg=eliminated by erasure:s -Wconf:msg=^(?=.*?a value of type)(?=.*?cannot also be).+$:s - - -Wconf:cat=unused-imports&src=org\/apache\/spark\/graphx\/impl\/VertexPartitionBase.scala:s - -Wconf:cat=unused-imports&src=org\/apache\/spark\/graphx\/impl\/VertexPartitionBaseOps.scala:s diff --git a/project/SparkBuild.scala b/project/SparkBuild.scala index ad2b67c67c6..817f79a84a4 100644 --- a/project/SparkBuild.scala +++ b/project/SparkBuild.scala @@ -258,10 +258,6 @@ 
object SparkBuild extends PomBuild { "-Wconf:cat=unchecked&msg=outer reference:s", "-Wconf:cat=unchecked&msg=eliminated by erasure:s", "-Wconf:msg=^(?=.*?a value of type)(?=.*?cannot also be).+$:s", -// TODO(SPARK-43850): Remove the following suppression rules and remove `import scala.language.higherKinds` // from the corresponding files when Scala 2.12 is no longer supported. - "-Wconf:cat=unused-imports&src=org\\/apache\\/spark\\/graphx\\/impl\\/VertexPartitionBase.scala:s", - "-Wconf:cat=unused-imports&src=org\\/apache\\/spark\\/graphx\\/impl\\/VertexPartitionBaseOps.scala:s", // SPARK-40497 Upgrade Scala to 2.13.11 and suppress `Implicit definition should have explicit type` "-Wconf:msg=Implicit definition should have explicit type:s" )
[spark] branch master updated: [SPARK-45334][SQL] Remove misleading comment in parquetSchemaConverter
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 7e8aafd2c0f [SPARK-45334][SQL] Remove misleading comment in parquetSchemaConverter 7e8aafd2c0f is described below commit 7e8aafd2c0f1f6fcd03a69afe2b85fd3fda95d20 Author: lanmengran1 AuthorDate: Tue Sep 26 21:01:02 2023 -0500 [SPARK-45334][SQL] Remove misleading comment in parquetSchemaConverter ### What changes were proposed in this pull request? Remove one line of comment; the details are described in JIRA https://issues.apache.org/jira/browse/SPARK-45334 ### Why are the changes needed? The comment is outdated and misleading: - the parquet-hive module has been removed from the parquet-mr project (https://issues.apache.org/jira/browse/PARQUET-1676) - Hive always uses "array_element" as the name ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? No tests needed. ### Was this patch authored or co-authored using generative AI tooling? No Closes #43119 from amoylan2/remove_misleading_comment_in_parquetSchemaConverter. 
Authored-by: lanmengran1 Signed-off-by: Sean Owen --- .../spark/sql/execution/datasources/parquet/ParquetSchemaConverter.scala | 1 - 1 file changed, 1 deletion(-) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaConverter.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaConverter.scala index 9c9e7ce729c..eedd165278a 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaConverter.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaConverter.scala @@ -646,7 +646,6 @@ class SparkToParquetSchemaConverter( .buildGroup(repetition).as(LogicalTypeAnnotation.listType()) .addField(Types .buildGroup(REPEATED) -// "array" is the name chosen by parquet-hive (1.7.0 and prior version) .addField(convertField(StructField("array", elementType, nullable))) .named("bag")) .named(field.name)
[spark] branch master updated: [SPARK-45302][PYTHON] Remove PID communication between Python workers when no daemon is used
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 17430fe4702 [SPARK-45302][PYTHON] Remove PID communication between Python workers when no daemon is used 17430fe4702 is described below commit 17430fe47029f1d27c7913468b95abfd856fddcc Author: Hyukjin Kwon AuthorDate: Wed Sep 27 10:48:17 2023 +0900 [SPARK-45302][PYTHON] Remove PID communication between Python workers when no daemon is used ### What changes were proposed in this pull request? This PR removes the legacy workaround for JDK 8 in `PythonWorkerFactory`. ### Why are the changes needed? There is no need to manually send the PID through the socket. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? There are existing unit tests covering the daemon-disabled case. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #43087 from HyukjinKwon/SPARK-45302. 
Lead-authored-by: Hyukjin Kwon Co-authored-by: Hyukjin Kwon Signed-off-by: Hyukjin Kwon --- core/src/main/scala/org/apache/spark/SparkEnv.scala | 4 ++-- .../scala/org/apache/spark/api/python/PythonRunner.scala | 10 +- .../org/apache/spark/api/python/PythonWorkerFactory.scala | 15 +++ python/pyspark/daemon.py | 4 ++-- .../sql/connect/streaming/worker/foreach_batch_worker.py | 2 -- .../sql/connect/streaming/worker/listener_worker.py | 2 -- python/pyspark/sql/worker/analyze_udtf.py | 3 --- python/pyspark/worker.py | 3 --- .../spark/sql/execution/python/PythonArrowOutput.scala| 2 +- .../spark/sql/execution/python/PythonUDFRunner.scala | 2 +- 10 files changed, 18 insertions(+), 29 deletions(-) diff --git a/core/src/main/scala/org/apache/spark/SparkEnv.scala b/core/src/main/scala/org/apache/spark/SparkEnv.scala index e404c9ee8b4..937170b5ee8 100644 --- a/core/src/main/scala/org/apache/spark/SparkEnv.scala +++ b/core/src/main/scala/org/apache/spark/SparkEnv.scala @@ -128,7 +128,7 @@ class SparkEnv ( pythonExec: String, workerModule: String, daemonModule: String, - envVars: Map[String, String]): (PythonWorker, Option[Int]) = { + envVars: Map[String, String]): (PythonWorker, Option[Long]) = { synchronized { val key = PythonWorkersKey(pythonExec, workerModule, daemonModule, envVars) pythonWorkers.getOrElseUpdate(key, @@ -139,7 +139,7 @@ class SparkEnv ( private[spark] def createPythonWorker( pythonExec: String, workerModule: String, - envVars: Map[String, String]): (PythonWorker, Option[Int]) = { + envVars: Map[String, String]): (PythonWorker, Option[Long]) = { createPythonWorker( pythonExec, workerModule, PythonWorkerFactory.defaultDaemonModule, envVars) } diff --git a/core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala b/core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala index db95e6c2bd6..2a63298d0a1 100644 --- a/core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala +++ 
b/core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala @@ -84,7 +84,7 @@ private object BasePythonRunner { private lazy val faultHandlerLogDir = Utils.createTempDir(namePrefix = "faulthandler") - private def faultHandlerLogPath(pid: Int): Path = { + private def faultHandlerLogPath(pid: Long): Path = { new File(faultHandlerLogDir, pid.toString).toPath } } @@ -200,7 +200,7 @@ private[spark] abstract class BasePythonRunner[IN, OUT]( envVars.put("SPARK_JOB_ARTIFACT_UUID", jobArtifactUUID.getOrElse("default")) -val (worker: PythonWorker, pid: Option[Int]) = env.createPythonWorker( +val (worker: PythonWorker, pid: Option[Long]) = env.createPythonWorker( pythonExec, workerModule, daemonModule, envVars.asScala.toMap) // Whether is the worker released into idle pool or closed. When any codes try to release or // close a worker, they should use `releasedOrClosed.compareAndSet` to flip the state to make @@ -253,7 +253,7 @@ private[spark] abstract class BasePythonRunner[IN, OUT]( startTime: Long, env: SparkEnv, worker: PythonWorker, - pid: Option[Int], + pid: Option[Long], releasedOrClosed: AtomicBoolean, context: TaskContext): Iterator[OUT] @@ -463,7 +463,7 @@ private[spark] abstract class BasePythonRunner[IN, OUT]( startTime: Long, env: SparkEnv, worker: PythonWorker, - pid: Option[Int], + pid: Option[Long], releasedOrClosed: AtomicBoolean, context: TaskContext) extends Iterator[OUT] { @@ -838,7 +838,7 @@ private[spark] class PythonRunn
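The idea behind dropping the PID round-trip can be sketched in plain Python (an illustrative analogy, not Spark's actual code): just as a Python parent can read a child's PID from the `Popen` handle, a modern JVM can read it from the process handle, so the worker no longer needs to write its own PID back over a socket. The `pid` type widening from `Int` to `Long` in the diff mirrors the JVM side returning a long.

```python
import subprocess
import sys

# The parent obtains the worker's PID directly from the process handle,
# so the child process does not have to report its PID back itself.
proc = subprocess.Popen(
    [sys.executable, "-c", "import os; print(os.getpid())"],
    stdout=subprocess.PIPE,
    text=True,
)
child_reported = int(proc.stdout.read())  # what the old protocol sent back
proc.wait()
# Both views of the PID agree, making the round-trip redundant.
assert proc.pid == child_reported
```

This only covers the non-daemon path; the daemon path still multiplexes workers, which is why `daemon.py` appears in the diff separately.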
[spark] branch master updated (e6d1e9ed384 -> 17881eb7eca)
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from e6d1e9ed384 [SPARK-44751][SQL][FOLLOWUP] Change `xmlExpressions.scala` package name add 17881eb7eca [SPARK-45339][PYTHON][CONNECT] Pyspark should log errors it retries No new revisions were added by this update. Summary of changes: python/pyspark/sql/connect/client/core.py | 9 +++-- 1 file changed, 7 insertions(+), 2 deletions(-)
[spark] branch master updated: [SPARK-44751][SQL][FOLLOWUP] Change `xmlExpressions.scala` package name
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new e6d1e9ed384 [SPARK-44751][SQL][FOLLOWUP] Change `xmlExpressions.scala` package name e6d1e9ed384 is described below commit e6d1e9ed3843352e6a39ad5bb18d9b849442a1de Author: Jia Fan AuthorDate: Wed Sep 27 09:38:39 2023 +0900 [SPARK-44751][SQL][FOLLOWUP] Change `xmlExpressions.scala` package name ### What changes were proposed in this pull request? The `xmlExpressions.scala` file in package `org.apache.spark.sql.catalyst.expressions`, but it package name is `org.apache.spark.sql.catalyst.expressions.xml`. ### Why are the changes needed? Fix not correct package name. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? exist test. ### Was this patch authored or co-authored using generative AI tooling? No Closes #43102 from Hisoka-X/xml-package-name-fix. Authored-by: Jia Fan Signed-off-by: Hyukjin Kwon --- .../org/apache/spark/sql/catalyst/expressions/xmlExpressions.scala | 2 +- sql/core/src/main/scala/org/apache/spark/sql/functions.scala| 1 - sql/core/src/test/resources/sql-functions/sql-expression-schema.md | 6 +++--- 3 files changed, 4 insertions(+), 5 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/xmlExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/xmlExpressions.scala index c0fd725943d..df63429ae33 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/xmlExpressions.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/xmlExpressions.scala @@ -14,7 +14,7 @@ * See the License for the specific language governing permissions and * limitations under the License. 
*/ -package org.apache.spark.sql.catalyst.expressions.xml +package org.apache.spark.sql.catalyst.expressions import org.apache.spark.sql.catalyst.InternalRow import org.apache.spark.sql.catalyst.analysis.TypeCheckResult diff --git a/sql/core/src/main/scala/org/apache/spark/sql/functions.scala b/sql/core/src/main/scala/org/apache/spark/sql/functions.scala index 2a7ed263c74..a2343ed04d4 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/functions.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/functions.scala @@ -30,7 +30,6 @@ import org.apache.spark.sql.catalyst.analysis.{Star, UnresolvedFunction} import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder import org.apache.spark.sql.catalyst.expressions._ import org.apache.spark.sql.catalyst.expressions.aggregate._ -import org.apache.spark.sql.catalyst.expressions.xml._ import org.apache.spark.sql.catalyst.plans.logical.{BROADCAST, HintInfo, ResolvedHint} import org.apache.spark.sql.catalyst.util.CharVarcharUtils import org.apache.spark.sql.errors.{DataTypeErrors, QueryCompilationErrors} diff --git a/sql/core/src/test/resources/sql-functions/sql-expression-schema.md b/sql/core/src/test/resources/sql-functions/sql-expression-schema.md index d21ceaeb14b..4fd493d1a3c 100644 --- a/sql/core/src/test/resources/sql-functions/sql-expression-schema.md +++ b/sql/core/src/test/resources/sql-functions/sql-expression-schema.md @@ -274,6 +274,7 @@ | org.apache.spark.sql.catalyst.expressions.RowNumber | row_number | SELECT a, b, row_number() OVER (PARTITION BY a ORDER BY b) FROM VALUES ('A1', 2), ('A1', 1), ('A2', 3), ('A1', 1) tab(a, b) | struct | | org.apache.spark.sql.catalyst.expressions.SchemaOfCsv | schema_of_csv | SELECT schema_of_csv('1,abc') | struct | | org.apache.spark.sql.catalyst.expressions.SchemaOfJson | schema_of_json | SELECT schema_of_json('[{"col":0}]') | struct | +| org.apache.spark.sql.catalyst.expressions.SchemaOfXml | schema_of_xml | SELECT schema_of_xml('<p><a>1</a></p>') | struct<schema_of_xml(<p><a>1</a></p>):string> | | 
org.apache.spark.sql.catalyst.expressions.Sec | sec | SELECT sec(0) | struct | | org.apache.spark.sql.catalyst.expressions.Second | second | SELECT second('2009-07-30 12:58:59') | struct | | org.apache.spark.sql.catalyst.expressions.SecondsToTimestamp | timestamp_seconds | SELECT timestamp_seconds(1230219000) | struct | @@ -365,6 +366,7 @@ | org.apache.spark.sql.catalyst.expressions.WeekOfYear | weekofyear | SELECT weekofyear('2008-02-20') | struct | | org.apache.spark.sql.catalyst.expressions.WidthBucket | width_bucket | SELECT width_bucket(5.3, 0.2, 10.6, 5) | struct | | org.apache.spark.sql.catalyst.expressions.WindowTime | window_time | SELECT a, window.start as start, window.end as end, window_time(window), cnt FROM (SELECT a, window, count(*) as cnt FROM VALUES ('A1', '2021-01-01 00:00:00'), ('A1', '2021-01-01 00:04:30'), ('A1', '2021-01-01 00:06:00'), ('A2', '2021-01-01 00:01:00') AS tab(a, b) GROUP by a
[spark] branch master updated: [SPARK-45328][SQL] Remove Hive support prior to 2.0.0
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 3c84c229d16 [SPARK-45328][SQL] Remove Hive support prior to 2.0.0 3c84c229d16 is described below commit 3c84c229d167a6ab2857649e91fff6f0d57bb12c Author: Hyukjin Kwon AuthorDate: Wed Sep 27 07:20:14 2023 +0900 [SPARK-45328][SQL] Remove Hive support prior to 2.0.0 ### What changes were proposed in this pull request? This PR proposes to remove Hive support prior to 2.0.0 (`spark.sql.hive.metastore.version`). ### Why are the changes needed? We dropped JDK 8 and 11, and Hive prior to 2.0.0 cannot work with the JDKs Spark now requires. These code paths are effectively dead already. ### Does this PR introduce _any_ user-facing change? Technically no, because these versions would not work anyway. ### How was this patch tested? No, because there is no way to test them. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #43116 from HyukjinKwon/SPARK-45328. 
Authored-by: Hyukjin Kwon Signed-off-by: Hyukjin Kwon --- docs/sql-migration-guide.md| 1 + .../org/apache/spark/sql/hive/HiveUtils.scala | 2 +- .../spark/sql/hive/client/HiveClientImpl.scala | 6 --- .../apache/spark/sql/hive/client/HiveShim.scala| 12 +++--- .../sql/hive/client/IsolatedClientLoader.scala | 6 --- .../org/apache/spark/sql/hive/client/package.scala | 46 +- .../spark/sql/hive/execution/HiveTempPath.scala| 40 ++- .../spark/sql/hive/client/HiveClientVersions.scala | 7 +--- .../hive/client/HivePartitionFilteringSuites.scala | 3 +- 9 files changed, 16 insertions(+), 107 deletions(-) diff --git a/docs/sql-migration-guide.md b/docs/sql-migration-guide.md index 56a3c8292cd..a28f6fd284d 100644 --- a/docs/sql-migration-guide.md +++ b/docs/sql-migration-guide.md @@ -26,6 +26,7 @@ license: | - Since Spark 4.0, the default value of `spark.sql.maxSinglePartitionBytes` is changed from `Long.MaxValue` to `128m`. To restore the previous behavior, set `spark.sql.maxSinglePartitionBytes` to `9223372036854775807`(`Long.MaxValue`). - Since Spark 4.0, any read of SQL tables takes into consideration the SQL configs `spark.sql.files.ignoreCorruptFiles`/`spark.sql.files.ignoreMissingFiles` instead of the core config `spark.files.ignoreCorruptFiles`/`spark.files.ignoreMissingFiles`. +- Since Spark 4.0, `spark.sql.hive.metastore` drops the support of Hive prior to 2.0.0 as they require JDK 8 that Spark does not support anymore. Users should migrate to higher versions. 
## Upgrading from Spark SQL 3.4 to 3.5 diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala index a01246520f3..794838a1190 100644 --- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala +++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala @@ -73,7 +73,7 @@ private[spark] object HiveUtils extends Logging { val HIVE_METASTORE_VERSION = buildStaticConf("spark.sql.hive.metastore.version") .doc("Version of the Hive metastore. Available options are " + -"0.12.0 through 2.3.9 and " + +"2.0.0 through 2.3.9 and " + "3.0.0 through 3.1.3.") .version("1.4.0") .stringConf diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala index f3d7d7e66a5..4e4ef6ce9f7 100644 --- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala +++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala @@ -115,12 +115,6 @@ private[hive] class HiveClientImpl( private val outputBuffer = new CircularBuffer() private val shim = version match { -case hive.v12 => new Shim_v0_12() -case hive.v13 => new Shim_v0_13() -case hive.v14 => new Shim_v0_14() -case hive.v1_0 => new Shim_v1_0() -case hive.v1_1 => new Shim_v1_1() -case hive.v1_2 => new Shim_v1_2() case hive.v2_0 => new Shim_v2_0() case hive.v2_1 => new Shim_v2_1() case hive.v2_2 => new Shim_v2_2() diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala index 338498d3d48..e12fe857c88 100644 --- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala +++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala @@ -255,7 +255,7 @@ private[client] sealed abstract class Shim { } } -private[client] class Shim_v0_12 extends Shim with 
Logging { +private class Shim_v0_12 extends Shim with Logging { // See
[GitHub] [spark-website] mateiz commented on pull request #480: Fix UI issue for `published` docs about Switch languages consistently across docs for all code snippets
mateiz commented on PR #480: URL: https://github.com/apache/spark-website/pull/480#issuecomment-1736040999 Super excited to see this getting fixed! I really pushed for the original switcher years ago to improve docs usability and I was sad when I noticed it was gone. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-45325][BUILD][FOLLOWUP] Update docs and sbt
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new ea3104fa71b [SPARK-45325][BUILD][FOLLOWUP] Update docs and sbt ea3104fa71b is described below commit ea3104fa71b0d7b6ac5e74292c28e40acb1e6537 Author: Ismaël Mejía AuthorDate: Tue Sep 26 10:40:08 2023 -0700 [SPARK-45325][BUILD][FOLLOWUP] Update docs and sbt ### What changes were proposed in this pull request? This PR adds missing parts of the upgrade to Avro 1.11.3 ### Why are the changes needed? Because there are missing references to the version of the library ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Pass the CI + Verify docs ### Was this patch authored or co-authored using generative AI tooling? No Closes #43118 from iemejia/master. Authored-by: Ismaël Mejía Signed-off-by: Dongjoon Hyun --- .../avro/src/main/scala/org/apache/spark/sql/avro/AvroOptions.scala | 4 ++-- docs/sql-data-sources-avro.md | 4 ++-- project/SparkBuild.scala | 2 +- .../test/scala/org/apache/spark/sql/hive/client/HiveClientSuite.scala | 2 +- 4 files changed, 6 insertions(+), 6 deletions(-) diff --git a/connector/avro/src/main/scala/org/apache/spark/sql/avro/AvroOptions.scala b/connector/avro/src/main/scala/org/apache/spark/sql/avro/AvroOptions.scala index edaaa8835cc..a0db82f9871 100644 --- a/connector/avro/src/main/scala/org/apache/spark/sql/avro/AvroOptions.scala +++ b/connector/avro/src/main/scala/org/apache/spark/sql/avro/AvroOptions.scala @@ -81,14 +81,14 @@ private[sql] class AvroOptions( /** * Top level record name in write result, which is required in Avro spec. - * See https://avro.apache.org/docs/1.11.2/specification/#schema-record . + * See https://avro.apache.org/docs/1.11.3/specification/#schema-record . 
* Default value is "topLevelRecord" */ val recordName: String = parameters.getOrElse(RECORD_NAME, "topLevelRecord") /** * Record namespace in write result. Default value is "". - * See Avro spec for details: https://avro.apache.org/docs/1.11.2/specification/#schema-record . + * See Avro spec for details: https://avro.apache.org/docs/1.11.3/specification/#schema-record . */ val recordNamespace: String = parameters.getOrElse(RECORD_NAMESPACE, "") diff --git a/docs/sql-data-sources-avro.md b/docs/sql-data-sources-avro.md index b01174b9182..72741b0e9d1 100644 --- a/docs/sql-data-sources-avro.md +++ b/docs/sql-data-sources-avro.md @@ -417,7 +417,7 @@ applications. Read the [Advanced Dependency Management](https://spark.apache Submission Guide for more details. ## Supported types for Avro -> Spark SQL conversion -Currently Spark supports reading all [primitive types](https://avro.apache.org/docs/1.11.2/specification/#primitive-types) and [complex types](https://avro.apache.org/docs/1.11.2/specification/#complex-types) under records of Avro. +Currently Spark supports reading all [primitive types](https://avro.apache.org/docs/1.11.3/specification/#primitive-types) and [complex types](https://avro.apache.org/docs/1.11.3/specification/#complex-types) under records of Avro. Avro typeSpark SQL type @@ -481,7 +481,7 @@ In addition to the types listed above, it supports reading `union` types. The fo 3. `union(something, null)`, where something is any supported Avro type. This will be mapped to the same Spark SQL type as that of something, with nullable set to true. All other union types are considered complex. They will be mapped to StructType where field names are member0, member1, etc., in accordance with members of the union. This is consistent with the behavior when converting between Avro and Parquet. 
-It also supports reading the following Avro [logical types](https://avro.apache.org/docs/1.11.2/specification/#logical-types): +It also supports reading the following Avro [logical types](https://avro.apache.org/docs/1.11.3/specification/#logical-types): Avro logical typeAvro typeSpark SQL type diff --git a/project/SparkBuild.scala b/project/SparkBuild.scala index 400ee8c5f28..ad2b67c67c6 100644 --- a/project/SparkBuild.scala +++ b/project/SparkBuild.scala @@ -1071,7 +1071,7 @@ object DependencyOverrides { dependencyOverrides += "com.google.guava" % "guava" % guavaVersion, dependencyOverrides += "xerces" % "xercesImpl" % "2.12.2", dependencyOverrides += "jline" % "jline" % "2.14.6", -dependencyOverrides += "org.apache.avro" % "avro" % "1.11.2") +dependencyOverrides += "org.apache.avro" % "avro" % "1.11.3") } /** diff --git a/sql/hive/src/test/scala/org/apache/spark/sql/hive/client/HiveClientSuite.scala b/sql/h
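The union-type rules quoted above (a numeric two-member union collapses to the wider type, `union(T, null)` becomes nullable `T`, and any other union becomes a struct with `member0`, `member1`, … fields) can be illustrated with a toy mapper. This is only a sketch of the documented rule in plain Java, not Spark's actual Avro converter:

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class UnionMapSketch {
    // Applies the documented Avro-union-to-Spark-SQL mapping to a list of
    // member type names; returns "type" or "type (nullable)".
    static String mapUnion(List<String> members) {
        List<String> nonNull = members.stream()
            .filter(m -> !m.equals("null")).collect(Collectors.toList());
        boolean nullable = nonNull.size() < members.size();
        Set<String> kinds = Set.copyOf(nonNull);
        String mapped;
        if (nonNull.size() == 1) {
            mapped = nonNull.get(0);                   // union(T, null) -> T
        } else if (kinds.equals(Set.of("int", "long"))) {
            mapped = "long";                           // union(int, long) -> long
        } else if (kinds.equals(Set.of("float", "double"))) {
            mapped = "double";                         // union(float, double) -> double
        } else {
            // complex union -> struct<member0: ..., member1: ...>
            StringBuilder sb = new StringBuilder("struct<");
            for (int i = 0; i < nonNull.size(); i++) {
                if (i > 0) sb.append(", ");
                sb.append("member").append(i).append(": ").append(nonNull.get(i));
            }
            mapped = sb.append(">").toString();
        }
        return nullable ? mapped + " (nullable)" : mapped;
    }

    public static void main(String[] args) {
        System.out.println(mapUnion(List.of("int", "long")));    // long
        System.out.println(mapUnion(List.of("string", "null"))); // string (nullable)
        System.out.println(mapUnion(List.of("int", "string"))); // struct<member0: int, member1: string>
    }
}
```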
[spark] branch master updated: [SPARK-44366][BUILD] Upgrade antlr4 to 4.13.1
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 13cd291c354 [SPARK-44366][BUILD] Upgrade antlr4 to 4.13.1 13cd291c354 is described below commit 13cd291c3549467dfd5d10a665e2d6a577f35bcb Author: yangjie01 AuthorDate: Tue Sep 26 11:14:21 2023 -0500 [SPARK-44366][BUILD] Upgrade antlr4 to 4.13.1 ### What changes were proposed in this pull request? This PR aims to upgrade `antlr4` from 4.9.3 to 4.13.1 ### Why are the changes needed? Since 4.10, antlr4 uses Java 11 for the source code and the compiled .class files of the ANTLR tool. There are some bug fixes and improvements after 4.9.3: - https://github.com/antlr/antlr4/pull/3399 - https://github.com/antlr/antlr4/issues/1105 - https://github.com/antlr/antlr4/issues/2788 - https://github.com/antlr/antlr4/pull/3957 - https://github.com/antlr/antlr4/pull/4394 The full release notes as follows: - https://github.com/antlr/antlr4/releases/tag/4.13.1 - https://github.com/antlr/antlr4/releases/tag/4.13.0 - https://github.com/antlr/antlr4/releases/tag/4.12.0 - https://github.com/antlr/antlr4/releases/tag/4.11.1 - https://github.com/antlr/antlr4/releases/tag/4.11.0 - https://github.com/antlr/antlr4/releases/tag/4.10.1 - https://github.com/antlr/antlr4/releases/tag/4.10 ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Pass GitHub Actions ### Was this patch authored or co-authored using generative AI tooling? No Closes #43075 from LuciferYang/antlr4-4131. 
Authored-by: yangjie01 Signed-off-by: Sean Owen --- dev/deps/spark-deps-hadoop-3-hive-2.3 | 2 +- pom.xml | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3 index 206361e1efa..5c17d727b0a 100644 --- a/dev/deps/spark-deps-hadoop-3-hive-2.3 +++ b/dev/deps/spark-deps-hadoop-3-hive-2.3 @@ -12,7 +12,7 @@ aliyun-java-sdk-ram/3.1.0//aliyun-java-sdk-ram-3.1.0.jar aliyun-sdk-oss/3.13.0//aliyun-sdk-oss-3.13.0.jar annotations/17.0.0//annotations-17.0.0.jar antlr-runtime/3.5.2//antlr-runtime-3.5.2.jar -antlr4-runtime/4.9.3//antlr4-runtime-4.9.3.jar +antlr4-runtime/4.13.1//antlr4-runtime-4.13.1.jar aopalliance-repackaged/2.6.1//aopalliance-repackaged-2.6.1.jar arpack/3.0.3//arpack-3.0.3.jar arpack_combined_all/0.1//arpack_combined_all-0.1.jar diff --git a/pom.xml b/pom.xml index 5fd3e173857..1d0ab387900 100644 --- a/pom.xml +++ b/pom.xml @@ -212,7 +212,7 @@ 3.0.0 0.12.0 -4.9.3 +4.13.1 1.1 4.12.1 4.12.0
[spark] branch master updated: [SPARK-44756][CORE] Executor hangs when RetryingBlockTransferor fails to initiate retry
This is an automated email from the ASF dual-hosted git repository. mridulm80 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new ff084d2852e [SPARK-44756][CORE] Executor hangs when RetryingBlockTransferor fails to initiate retry ff084d2852e is described below commit ff084d2852e62c6670e074ef423ae16c915710bc Author: Harunobu Daikoku AuthorDate: Tue Sep 26 11:07:41 2023 -0500 [SPARK-44756][CORE] Executor hangs when RetryingBlockTransferor fails to initiate retry ### What changes were proposed in this pull request? This PR fixes a bug in `RetryingBlockTransferor` that happens when retry initiation has failed. With this patch, the callers of `RetryingBlockTransferor#initiateRetry()` will catch any error and invoke the parent listener's exception handler. ### Why are the changes needed? This is needed to prevent an edge case where retry initiation fails and the executor gets stuck. More details in SPARK-44756 ### Does this PR introduce _any_ user-facing change? N/A ### How was this patch tested? Added a new test case in `RetryingBlockTransferorSuite` that simulates the problematic scenario. Closes #42426 from hdaikoku/SPARK-44756. 
Authored-by: Harunobu Daikoku Signed-off-by: Mridul Muralidharan gmail.com> --- .../network/shuffle/RetryingBlockTransferor.java | 47 -- .../shuffle/RetryingBlockTransferorSuite.java | 34 +++- 2 files changed, 67 insertions(+), 14 deletions(-) diff --git a/common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RetryingBlockTransferor.java b/common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RetryingBlockTransferor.java index 892de991612..c628b201b20 100644 --- a/common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RetryingBlockTransferor.java +++ b/common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RetryingBlockTransferor.java @@ -144,6 +144,11 @@ public class RetryingBlockTransferor { this(conf, transferStarter, blockIds, listener, ErrorHandler.NOOP_ERROR_HANDLER); } + @VisibleForTesting + synchronized void setCurrentListener(RetryingBlockTransferListener listener) { +this.currentListener = listener; + } + /** * Initiates the transfer of all blocks provided in the constructor, with possible retries * in the event of transient IOExceptions. @@ -176,12 +181,14 @@ public class RetryingBlockTransferor { listener.getTransferType(), blockIdsToTransfer.length, numRetries > 0 ? "(after " + numRetries + " retries)" : ""), e); - if (shouldRetry(e)) { -initiateRetry(e); - } else { -for (String bid : blockIdsToTransfer) { - listener.onBlockTransferFailure(bid, e); -} + if (shouldRetry(e) && initiateRetry(e)) { +// successfully initiated a retry +return; + } + + // retry is not possible, so fail remaining blocks + for (String bid : blockIdsToTransfer) { +listener.onBlockTransferFailure(bid, e); } } } @@ -189,8 +196,10 @@ public class RetryingBlockTransferor { /** * Lightweight method which initiates a retry in a different thread. The retry will involve * calling transferAllOutstanding() after a configured wait time. + * Returns true if the retry was successfully initiated, false otherwise. 
*/ - private synchronized void initiateRetry(Throwable e) { + @VisibleForTesting + synchronized boolean initiateRetry(Throwable e) { if (enableSaslRetries && e instanceof SaslTimeoutException) { saslRetryCount += 1; } @@ -201,10 +210,17 @@ public class RetryingBlockTransferor { listener.getTransferType(), retryCount, maxRetries, outstandingBlocksIds.size(), retryWaitTime); -executorService.submit(() -> { - Uninterruptibles.sleepUninterruptibly(retryWaitTime, TimeUnit.MILLISECONDS); - transferAllOutstanding(); -}); +try { + executorService.execute(() -> { +Uninterruptibles.sleepUninterruptibly(retryWaitTime, TimeUnit.MILLISECONDS); +transferAllOutstanding(); + }); +} catch (Throwable t) { + logger.error("Exception while trying to initiate retry", t); + return false; +} + +return true; } /** @@ -240,7 +256,8 @@ public class RetryingBlockTransferor { * listener. Note that in the event of a retry, we will immediately replace the 'currentListener' * field, indicating that any responses from non-current Listeners should be ignored. */ - private class RetryingBlockTransferListener implements + @VisibleForTesting + class RetryingBlockTransferListener implements BlockFetchingLis
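The core of the fix in the diff above is that `initiateRetry` now catches failures from `ExecutorService.execute` (for example a `RejectedExecutionException` from a shut-down executor) and returns `false`, so the caller fails the remaining blocks instead of hanging. A minimal standalone sketch of that pattern, with hypothetical names rather than Spark's real classes:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class RetrySketch {
    // Mirrors the fixed initiateRetry(): returns false instead of letting an
    // exception from the executor escape, so the caller can fail the blocks.
    static boolean tryInitiateRetry(ExecutorService executorService, Runnable retryTask) {
        try {
            executorService.execute(retryTask);
        } catch (Throwable t) {
            // e.g. RejectedExecutionException when the executor is shut down
            return false;
        }
        return true;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        System.out.println(tryInitiateRetry(pool, () -> {}));  // true: task accepted
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        System.out.println(tryInitiateRetry(pool, () -> {}));  // false: task rejected
    }
}
```

Before the patch, the equivalent of the second call would throw inside `initiateRetry`, leaving neither a retry scheduled nor a failure reported to the listener.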
[spark] branch master updated: [SPARK-45316][CORE][SQL] Add new parameters `ignoreCorruptFiles`/`ignoreMissingFiles` to `HadoopRDD` and `NewHadoopRDD`
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 60d02b444e2 [SPARK-45316][CORE][SQL] Add new parameters `ignoreCorruptFiles`/`ignoreMissingFiles` to `HadoopRDD` and `NewHadoopRDD` 60d02b444e2 is described below commit 60d02b444e2225b3afbe4955dabbea505e9f769c Author: Max Gekk AuthorDate: Tue Sep 26 17:33:07 2023 +0300 [SPARK-45316][CORE][SQL] Add new parameters `ignoreCorruptFiles`/`ignoreMissingFiles` to `HadoopRDD` and `NewHadoopRDD` ### What changes were proposed in this pull request? In the PR, I propose to add new parameters `ignoreCorruptFiles`/`ignoreMissingFiles` to `HadoopRDD` and `NewHadoopRDD`, and set them to the current value of: - `spark.files.ignoreCorruptFiles`/`ignoreMissingFiles` in Spark `core`, - `spark.sql.files.ignoreCorruptFiles`/`ignoreMissingFiles` when the RDDs are created in Spark SQL. ### Why are the changes needed? 1. To make `HadoopRDD` and `NewHadoopRDD` consistent with other RDDs like `FileScanRDD` created by Spark SQL that take into account the SQL configs `spark.sql.files.ignoreCorruptFiles`/`ignoreMissingFiles`. 2. To improve user experience with Spark SQL, so users can control ignoring of missing files without re-creating the Spark context. ### Does this PR introduce _any_ user-facing change? Yes, `HadoopRDD`/`NewHadoopRDD` invoked by SQL code such as Hive table scans respect the SQL configs `spark.sql.files.ignoreCorruptFiles`/`ignoreMissingFiles` and don't respect the core configs `spark.files.ignoreCorruptFiles`/`ignoreMissingFiles`. ### How was this patch tested? By running the affected tests: ``` $ build/sbt "test:testOnly *QueryPartitionSuite" $ build/sbt "test:testOnly *FileSuite" $ build/sbt "test:testOnly *FileBasedDataSourceSuite" ``` ### Was this patch authored or co-authored using generative AI tooling? No. 
Closes #43097 from MaxGekk/dynamic-ignoreMissingFiles. Authored-by: Max Gekk Signed-off-by: Max Gekk --- .../scala/org/apache/spark/rdd/HadoopRDD.scala | 31 ++ .../scala/org/apache/spark/rdd/NewHadoopRDD.scala | 27 +++ docs/sql-migration-guide.md| 1 + .../org/apache/spark/sql/hive/TableReader.scala| 9 --- .../spark/sql/hive/QueryPartitionSuite.scala | 6 ++--- 5 files changed, 58 insertions(+), 16 deletions(-) diff --git a/core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala b/core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala index cad107256c5..0b5f6a3d716 100644 --- a/core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala +++ b/core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala @@ -89,6 +89,8 @@ private[spark] class HadoopPartition(rddId: Int, override val index: Int, s: Inp * @param keyClass Class of the key associated with the inputFormatClass. * @param valueClass Class of the value associated with the inputFormatClass. * @param minPartitions Minimum number of HadoopRDD partitions (Hadoop Splits) to generate. + * @param ignoreCorruptFiles Whether to ignore corrupt files. + * @param ignoreMissingFiles Whether to ignore missing files. 
* * @note Instantiating this class directly is not recommended, please use * `org.apache.spark.SparkContext.hadoopRDD()` @@ -101,13 +103,36 @@ class HadoopRDD[K, V]( inputFormatClass: Class[_ <: InputFormat[K, V]], keyClass: Class[K], valueClass: Class[V], -minPartitions: Int) +minPartitions: Int, +ignoreCorruptFiles: Boolean, +ignoreMissingFiles: Boolean) extends RDD[(K, V)](sc, Nil) with Logging { if (initLocalJobConfFuncOpt.isDefined) { sparkContext.clean(initLocalJobConfFuncOpt.get) } + def this( + sc: SparkContext, + broadcastedConf: Broadcast[SerializableConfiguration], + initLocalJobConfFuncOpt: Option[JobConf => Unit], + inputFormatClass: Class[_ <: InputFormat[K, V]], + keyClass: Class[K], + valueClass: Class[V], + minPartitions: Int) = { +this( + sc, + broadcastedConf, + initLocalJobConfFuncOpt, + inputFormatClass, + keyClass, + valueClass, + minPartitions, + ignoreCorruptFiles = sc.conf.get(IGNORE_CORRUPT_FILES), + ignoreMissingFiles = sc.conf.get(IGNORE_MISSING_FILES) +) + } + def this( sc: SparkContext, conf: JobConf, @@ -135,10 +160,6 @@ class HadoopRDD[K, V]( private val shouldCloneJobConf = sparkContext.conf.getBoolean("spark.hadoop.cloneConf", false) - private val ignoreCorruptFiles = sparkContext.conf.get(IGNORE_CORRUPT_FILES) - - private val ignoreMissingFiles = sparkContext.conf.get(IGNORE_MISSING_FILES) - private val ignoreEmptySplits = sparkContext.conf.get(HADOOP_RDD_IGNORE_EMPTY_SPLITS) //
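The diff above keeps the old `HadoopRDD` constructor signature working by adding an auxiliary constructor that delegates to the new primary one, defaulting the new flags from the context's configuration. The same compatibility pattern, sketched with toy stand-in classes (not Spark's actual API):

```java
import java.util.Map;

public class CompatSketch {
    static class RddLike {
        final int minPartitions;
        final boolean ignoreCorruptFiles;
        final boolean ignoreMissingFiles;

        // New primary constructor: the flags become explicit parameters.
        RddLike(int minPartitions, boolean ignoreCorruptFiles, boolean ignoreMissingFiles) {
            this.minPartitions = minPartitions;
            this.ignoreCorruptFiles = ignoreCorruptFiles;
            this.ignoreMissingFiles = ignoreMissingFiles;
        }

        // Old signature kept for compatibility: delegate with defaults read
        // from the conf, matching what the class previously did internally.
        RddLike(Map<String, Boolean> conf, int minPartitions) {
            this(minPartitions,
                 conf.getOrDefault("spark.files.ignoreCorruptFiles", false),
                 conf.getOrDefault("spark.files.ignoreMissingFiles", false));
        }
    }

    public static void main(String[] args) {
        RddLike rdd = new RddLike(Map.of("spark.files.ignoreCorruptFiles", true), 4);
        System.out.println(rdd.ignoreCorruptFiles + " " + rdd.ignoreMissingFiles); // prints "true false"
    }
}
```

This lets SQL-side callers pass the `spark.sql.files.*` values explicitly while existing core-side callers keep their old behavior untouched.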
[spark] branch master updated: [SPARK-45271][SQL] Merge _LEGACY_ERROR_TEMP_1113 into TABLE_OPERATION & delete some unused method in QueryCompilationErrors
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 2aa06fcf160 [SPARK-45271][SQL] Merge _LEGACY_ERROR_TEMP_1113 into TABLE_OPERATION & delete some unused method in QueryCompilationErrors 2aa06fcf160 is described below commit 2aa06fcf1607bbad9e09649e587493032e739e35 Author: panbingkun AuthorDate: Tue Sep 26 19:35:27 2023 +0800 [SPARK-45271][SQL] Merge _LEGACY_ERROR_TEMP_1113 into TABLE_OPERATION & delete some unused method in QueryCompilationErrors ### What changes were proposed in this pull request? The PR aims to - merge _LEGACY_ERROR_TEMP_1113 into UNSUPPORTED_FEATURE.TABLE_OPERATION - delete some unused methods in QueryCompilationErrors - refactor some methods to reduce the call hierarchy ### Why are the changes needed? The changes improve the error framework. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? - Pass GA - Manual tests ### Was this patch authored or co-authored using generative AI tooling? No. Closes #43044 from panbingkun/LEGACY_ERROR_TEMP_1113. 
Authored-by: panbingkun Signed-off-by: Wenchen Fan --- .../src/main/resources/error/error-classes.json| 5 -- .../spark/sql/catalyst/plans/logical/object.scala | 12 ++- .../spark/sql/errors/QueryCompilationErrors.scala | 88 +- .../main/scala/org/apache/spark/sql/Dataset.scala | 2 +- .../datasources/v2/TableCapabilityCheck.scala | 2 +- .../streaming/test/DataStreamTableAPISuite.scala | 13 +++- 6 files changed, 57 insertions(+), 65 deletions(-) diff --git a/common/utils/src/main/resources/error/error-classes.json b/common/utils/src/main/resources/error/error-classes.json index 9bcbcbc1962..5d827c67482 100644 --- a/common/utils/src/main/resources/error/error-classes.json +++ b/common/utils/src/main/resources/error/error-classes.json @@ -4097,11 +4097,6 @@ "DESCRIBE does not support partition for v2 tables." ] }, - "_LEGACY_ERROR_TEMP_1113" : { -"message" : [ - "Table does not support ." -] - }, "_LEGACY_ERROR_TEMP_1114" : { "message" : [ "The streaming sources in a query do not have a common supported execution mode.", diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/object.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/object.scala index d4851019db8..9bf8db0b4fa 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/object.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/object.scala @@ -727,16 +727,20 @@ object JoinWith { if a.sameRef(b) => catalyst.expressions.EqualTo( plan.left.resolveQuoted(a.name, resolver).getOrElse( - throw QueryCompilationErrors.resolveException(a.name, plan.left.schema.fieldNames)), + throw QueryCompilationErrors.unresolvedColumnError( +a.name, plan.left.schema.fieldNames)), plan.right.resolveQuoted(b.name, resolver).getOrElse( - throw QueryCompilationErrors.resolveException(b.name, plan.right.schema.fieldNames))) + throw QueryCompilationErrors.unresolvedColumnError( +b.name, plan.right.schema.fieldNames))) 
case catalyst.expressions.EqualNullSafe(a: AttributeReference, b: AttributeReference) if a.sameRef(b) => catalyst.expressions.EqualNullSafe( plan.left.resolveQuoted(a.name, resolver).getOrElse( - throw QueryCompilationErrors.resolveException(a.name, plan.left.schema.fieldNames)), + throw QueryCompilationErrors.unresolvedColumnError( +a.name, plan.left.schema.fieldNames)), plan.right.resolveQuoted(b.name, resolver).getOrElse( - throw QueryCompilationErrors.resolveException(b.name, plan.right.schema.fieldNames))) + throw QueryCompilationErrors.unresolvedColumnError( +b.name, plan.right.schema.fieldNames))) } } plan.copy(condition = cond) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala index 3536626d239..9d2b1225825 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala @@ -818,10 +818,6 @@ private[sql] object QueryCompilationErrors extends QueryErrorsBase with Compilat messageParameters = Map("hintName" -> hintName)) } - def attributeNameSyntaxError(name: String): Throwable = {
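The error framework this change migrates to pairs a stable error-class name with a `messageParameters` map that gets substituted into a template like the ones kept in `error-classes.json`. A toy illustration of that substitution step, using a hypothetical helper rather than Spark's actual error machinery:

```java
import java.util.Map;

public class ErrorClassSketch {
    // Substitutes <param> placeholders in an error-class template, similar in
    // spirit to how error-classes.json message templates are rendered.
    static String format(String template, Map<String, String> params) {
        String msg = template;
        for (Map.Entry<String, String> e : params.entrySet()) {
            msg = msg.replace("<" + e.getKey() + ">", e.getValue());
        }
        return msg;
    }

    public static void main(String[] args) {
        String template = "Table <tableName> does not support <operation>.";
        System.out.println(format(template,
            Map.of("tableName", "`t1`", "operation", "DESCRIBE PARTITION")));
        // prints: Table `t1` does not support DESCRIBE PARTITION.
    }
}
```

Structured parameters are what allow a legacy free-form template such as `_LEGACY_ERROR_TEMP_1113` to be folded into a shared class like `UNSUPPORTED_FEATURE.TABLE_OPERATION` without losing the per-call details.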
[spark] branch master updated: [SPARK-45309][SQL] Remove all SystemUtils.isJavaVersionAtLeast with JDK 9/11/17
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new e54c866701d [SPARK-45309][SQL] Remove all SystemUtils.isJavaVersionAtLeast with JDK 9/11/17 e54c866701d is described below commit e54c866701dda617f625545192f321e88b3e614e Author: Hyukjin Kwon AuthorDate: Tue Sep 26 19:59:04 2023 +0900 [SPARK-45309][SQL] Remove all SystemUtils.isJavaVersionAtLeast with JDK 9/11/17 ### What changes were proposed in this pull request? This PR removes all SystemUtils.isJavaVersionAtLeast with JDK 9/11/17. ### Why are the changes needed? - To remove unused code. - We dropped JDK 8 and 11 at SPARK-44112 so no need to check lower versions conditionally. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? CI in this PR should test them out. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #43098 from HyukjinKwon/SPARK-45309. 
Authored-by: Hyukjin Kwon Signed-off-by: Hyukjin Kwon --- .../org/apache/spark/sql/ClientE2ETestSuite.scala | 23 +++- .../apache/spark/sql/SQLImplicitsTestSuite.scala | 11 .../org/apache/spark/internal/config/UI.scala | 4 +-- .../org/apache/spark/storage/StorageUtils.scala| 32 ++ .../org/apache/spark/util/ClosureCleaner.scala | 5 ++-- .../sql/hive/execution/InsertIntoHiveTable.scala | 23 .../hive/HiveExternalCatalogVersionsSuite.scala| 6 +--- .../spark/sql/hive/HiveSparkSubmitSuite.scala | 13 +++-- .../spark/sql/hive/client/HiveClientSuite.scala| 9 ++ .../spark/sql/hive/execution/HiveQuerySuite.scala | 8 +- 10 files changed, 35 insertions(+), 99 deletions(-) diff --git a/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/ClientE2ETestSuite.scala b/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/ClientE2ETestSuite.scala index 55718ed9c0b..c8999a2f22c 100644 --- a/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/ClientE2ETestSuite.scala +++ b/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/ClientE2ETestSuite.scala @@ -26,7 +26,6 @@ import scala.collection.mutable import org.apache.commons.io.FileUtils import org.apache.commons.io.output.TeeOutputStream -import org.apache.commons.lang3.{JavaVersion, SystemUtils} import org.scalactic.TolerantNumerics import org.scalatest.PrivateMethodTester @@ -410,18 +409,16 @@ class ClientE2ETestSuite extends RemoteSparkSession with SQLHelper with PrivateM test("write jdbc") { assume(IntegrationTestUtils.isSparkHiveJarAvailable) -if (SystemUtils.isJavaVersionAtLeast(JavaVersion.JAVA_9)) { - val url = "jdbc:derby:memory:1234" - val table = "t1" - try { -spark.range(10).write.jdbc(url = s"$url;create=true", table, new Properties()) -val result = spark.read.jdbc(url = url, table, new Properties()).collect() -assert(result.length == 10) - } finally { -// clean up -assertThrows[SparkException] { - spark.read.jdbc(url = s"$url;drop=true", table, new Properties()).collect() -} 
+val url = "jdbc:derby:memory:1234" +val table = "t1" +try { + spark.range(10).write.jdbc(url = s"$url;create=true", table, new Properties()) + val result = spark.read.jdbc(url = url, table, new Properties()).collect() + assert(result.length == 10) +} finally { + // clean up + assertThrows[SparkException] { +spark.read.jdbc(url = s"$url;drop=true", table, new Properties()).collect() } } } diff --git a/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/SQLImplicitsTestSuite.scala b/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/SQLImplicitsTestSuite.scala index 680380c91a0..2e258a356fc 100644 --- a/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/SQLImplicitsTestSuite.scala +++ b/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/SQLImplicitsTestSuite.scala @@ -23,7 +23,7 @@ import java.util.concurrent.atomic.AtomicLong import io.grpc.inprocess.InProcessChannelBuilder import org.apache.arrow.memory.RootAllocator -import org.apache.commons.lang3.{JavaVersion, SystemUtils} +import org.apache.commons.lang3.SystemUtils import org.scalatest.BeforeAndAfterAll import org.apache.spark.sql.connect.client.SparkConnectClient @@ -146,13 +146,12 @@ class SQLImplicitsTestSuite extends ConnectFunSuite with BeforeAndAfterAll { testImplicit(BigDecimal(decimal)) testImplicit(Date.valueOf(LocalDate.now())) testImplicit(LocalDate.now()) -// SPARK-42770: Run `LocalDateTime.now()` and `Instant.now()
[spark] branch master updated: [SPARK-45323][BUILD] Upgrade `snappy` to 1.1.10.4
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 9933e9c2c54 [SPARK-45323][BUILD] Upgrade `snappy` to 1.1.10.4
9933e9c2c54 is described below

commit 9933e9c2c54e7081ef3f23c4b3804d3ecdd175ff
Author: Bjørn Jørgensen
AuthorDate: Tue Sep 26 19:57:46 2023 +0900

    [SPARK-45323][BUILD] Upgrade `snappy` to 1.1.10.4

    ### What changes were proposed in this pull request?
    Upgrade snappy from 1.1.10.3 to 1.1.10.4

    ### Why are the changes needed?
    Security fix: "Fixed SnappyInputStream so as not to allocate too large memory when decompressing data with an extremely large chunk size" by tunnelshade ([code change](https://github.com/xerial/snappy-java/commit/9f8c3cf74223ed0a8a834134be9c917b9f10ceb5)). This does not affect users only using the Snappy.compress/uncompress methods. [Release note](https://github.com/xerial/snappy-java/releases)

    Details: "While performing mitigation efforts related to [CVE-2023-34455](https://nvd.nist.gov/vuln/detail/CVE-2023-34455) in Confluent products, our Application Security team closely analyzed the fix that was accepted and merged into snappy-java version 1.1.10.1 in [this](https://github.com/xerial/snappy-java/commit/3bf67857fcf70d9eea56eed4af7c925671e8eaea) commit. The check on [line 421](https://github.com/xerial/snappy-java/commit/3bf67857fcf70d9eea56eed4af7c925671e8eaea#diff-c3e536102670929 [...]

    ### Does this PR introduce _any_ user-facing change?
    No.

    ### How was this patch tested?
    Pass GA

    ### Was this patch authored or co-authored using generative AI tooling?
    No.

    Closes #43108
    Closes #43109 from bjornjorgensen/snappy_compress.
Authored-by: Bjørn Jørgensen
Signed-off-by: Hyukjin Kwon
---
 dev/deps/spark-deps-hadoop-3-hive-2.3 | 2 +-
 pom.xml                               | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3
index f11a7d757f1..206361e1efa 100644
--- a/dev/deps/spark-deps-hadoop-3-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-3-hive-2.3
@@ -240,7 +240,7 @@ shims/0.9.45//shims-0.9.45.jar
 slf4j-api/2.0.9//slf4j-api-2.0.9.jar
 snakeyaml-engine/2.6//snakeyaml-engine-2.6.jar
 snakeyaml/2.0//snakeyaml-2.0.jar
-snappy-java/1.1.10.3//snappy-java-1.1.10.3.jar
+snappy-java/1.1.10.4//snappy-java-1.1.10.4.jar
 spire-macros_2.13/0.18.0//spire-macros_2.13-0.18.0.jar
 spire-platform_2.13/0.18.0//spire-platform_2.13-0.18.0.jar
 spire-util_2.13/0.18.0//spire-util_2.13-0.18.0.jar

diff --git a/pom.xml b/pom.xml
index 33dc854dd26..5fd3e173857 100644
--- a/pom.xml
+++ b/pom.xml
@@ -188,7 +188,7 @@
     2.15.2
     2.3.0
     3.0.2
-    1.1.10.3
+    1.1.10.4
     3.0.3
     1.16.0
     1.24.0

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
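Note that the mailing-list rendering of the `pom.xml` hunk above has stripped the XML tags, leaving only the bare version numbers. A hedged sketch of what such a Maven version bump typically looks like — the `snappy.version` property name and the dependency block here are assumptions for illustration, not copied from Spark's actual `pom.xml`:

```xml
<!-- Sketch only: the property name is an assumption, since the
     mailing-list rendering stripped the XML markup from the hunk. -->
<properties>
  <snappy.version>1.1.10.4</snappy.version>
</properties>

<dependencies>
  <dependency>
    <groupId>org.xerial.snappy</groupId>
    <artifactId>snappy-java</artifactId>
    <version>${snappy.version}</version>
  </dependency>
</dependencies>
```

Centralizing the version in one property is what lets the upgrade be a two-line diff: one line in the POM, one line in the checked-in `dev/deps` manifest.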
[spark] branch dependabot/maven/org.xerial.snappy-snappy-java-1.1.10.4 deleted (was 2b18d0c7daa)
This is an automated email from the ASF dual-hosted git repository.

github-bot pushed a change to branch dependabot/maven/org.xerial.snappy-snappy-java-1.1.10.4
in repository https://gitbox.apache.org/repos/asf/spark.git

 was 2b18d0c7daa Bump org.xerial.snappy:snappy-java from 1.1.10.3 to 1.1.10.4

The revisions that were on this branch are still contained in other references; therefore, this change does not discard any commits from the repository.
[spark] branch master updated: [SPARK-45321][TESTS] Clean up the unnecessary Scala 2.12 related binary files
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 2094e712170 [SPARK-45321][TESTS] Clean up the unnecessary Scala 2.12 related binary files
2094e712170 is described below

commit 2094e71217005655349f4e78847ccbdb6b886bd0
Author: yangjie01
AuthorDate: Tue Sep 26 18:39:24 2023 +0800

    [SPARK-45321][TESTS] Clean up the unnecessary Scala 2.12 related binary files

    ### What changes were proposed in this pull request?
    The purpose of this PR is to clean up the binary files used to assist with Scala 2.12 testing. They include:
    - `core/src/test/resources/TestHelloV3_2.12.jar` and `core/src/test/resources/TestHelloV2_2.12.jar`, added by SPARK-44246 (https://github.com/apache/spark/pull/41789)
    - `connector/connect/client/jvm/src/test/resources/udf2.12` and `connector/connect/client/jvm/src/test/resources/udf2.12.jar`, added by SPARK-43744 (https://github.com/apache/spark/pull/42069)
    - `connector/connect/client/jvm/src/test/resources/TestHelloV2_2.12.jar`, added by SPARK-44293 (https://github.com/apache/spark/pull/41844)
    - `sql/hive/src/test/resources/regression-test-SPARK-8489/test-2.12.jar`, added by SPARK-25304 (https://github.com/apache/spark/pull/22308)

    ### Why are the changes needed?
    Spark 4.0 no longer supports Scala 2.12.

    ### Does this PR introduce _any_ user-facing change?
    No

    ### How was this patch tested?
    Pass GitHub Actions

    ### Was this patch authored or co-authored using generative AI tooling?
    No

    Closes #43106 from LuciferYang/SPARK-45321.
Authored-by: yangjie01
Signed-off-by: yangjie01
---
 .../client/jvm/src/test/resources/TestHelloV2_2.12.jar   | Bin 3784 -> 0 bytes
 connector/connect/client/jvm/src/test/resources/udf2.12  | Bin 1520 -> 0 bytes
 .../connect/client/jvm/src/test/resources/udf2.12.jar    | Bin 5332 -> 0 bytes
 core/src/test/resources/TestHelloV2_2.12.jar             | Bin 3784 -> 0 bytes
 core/src/test/resources/TestHelloV3_2.12.jar             | Bin 3595 -> 0 bytes
 .../resources/regression-test-SPARK-8489/test-2.12.jar   | Bin 7179 -> 0 bytes
 6 files changed, 0 insertions(+), 0 deletions(-)

diff --git a/connector/connect/client/jvm/src/test/resources/TestHelloV2_2.12.jar b/connector/connect/client/jvm/src/test/resources/TestHelloV2_2.12.jar
deleted file mode 100644
index d89cf6543a2..000
Binary files a/connector/connect/client/jvm/src/test/resources/TestHelloV2_2.12.jar and /dev/null differ
diff --git a/connector/connect/client/jvm/src/test/resources/udf2.12 b/connector/connect/client/jvm/src/test/resources/udf2.12
deleted file mode 100644
index 1090bc90d9b..000
Binary files a/connector/connect/client/jvm/src/test/resources/udf2.12 and /dev/null differ
diff --git a/connector/connect/client/jvm/src/test/resources/udf2.12.jar b/connector/connect/client/jvm/src/test/resources/udf2.12.jar
deleted file mode 100644
index 6ce6799678f..000
Binary files a/connector/connect/client/jvm/src/test/resources/udf2.12.jar and /dev/null differ
diff --git a/core/src/test/resources/TestHelloV2_2.12.jar b/core/src/test/resources/TestHelloV2_2.12.jar
deleted file mode 100644
index d89cf6543a2..000
Binary files a/core/src/test/resources/TestHelloV2_2.12.jar and /dev/null differ
diff --git a/core/src/test/resources/TestHelloV3_2.12.jar b/core/src/test/resources/TestHelloV3_2.12.jar
deleted file mode 100644
index b175a6c8640..000
Binary files a/core/src/test/resources/TestHelloV3_2.12.jar and /dev/null differ
diff --git a/sql/hive/src/test/resources/regression-test-SPARK-8489/test-2.12.jar b/sql/hive/src/test/resources/regression-test-SPARK-8489/test-2.12.jar
deleted file mode 100644
index b0d3fd17a41..000
Binary files a/sql/hive/src/test/resources/regression-test-SPARK-8489/test-2.12.jar and /dev/null differ
[spark] branch master updated: [MINOR][BUILD] Fix lint-js
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 73d17d52a8bc [MINOR][BUILD] Fix lint-js
73d17d52a8bc is described below

commit 73d17d52a8bc2d761412e1954eaa6c0bdef44a9d
Author: panbingkun
AuthorDate: Tue Sep 26 19:18:31 2023 +0900

    [MINOR][BUILD] Fix lint-js

    ### What changes were proposed in this pull request?
    The pr aims to fix lint-js.

    ### Why are the changes needed?
    Failing run: https://github.com/panbingkun/spark/actions/runs/6306820397/job/17123186216
    (screenshot: https://github.com/apache/spark/assets/15246973/7e70617a-c15e-47de-8282-5b06b5426567)

    ### Does this PR introduce _any_ user-facing change?
    No.

    ### How was this patch tested?
    Manually test.
    (screenshot: https://github.com/apache/spark/assets/15246973/d62e1dbf-da12-478c-855a-f82df804bd75)

    ### Was this patch authored or co-authored using generative AI tooling?
    No.

    Closes #43122 from panbingkun/mirror_lint-js.
Authored-by: panbingkun
Signed-off-by: Hyukjin Kwon
---
 .../org/apache/spark/sql/execution/ui/static/spark-sql-viz.js | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sql/core/src/main/resources/org/apache/spark/sql/execution/ui/static/spark-sql-viz.js b/sql/core/src/main/resources/org/apache/spark/sql/execution/ui/static/spark-sql-viz.js
index 8999d6ff1fed..96a7a7a3cc0e 100644
--- a/sql/core/src/main/resources/org/apache/spark/sql/execution/ui/static/spark-sql-viz.js
+++ b/sql/core/src/main/resources/org/apache/spark/sql/execution/ui/static/spark-sql-viz.js
@@ -258,7 +258,7 @@ function onClickAdditionalMetricsCheckbox(checkboxNode) {
   window.localStorage.setItem("stageId-and-taskId-checked", isChecked);
 }
 
-function togglePlanViz() {
+function togglePlanViz() { // eslint-disable-line no-unused-vars
   const arrow = d3.select("#plan-viz-graph-arrow");
   arrow.each(function () {
     $(this).toggleClass("arrow-open").toggleClass("arrow-closed")