[spark] branch master updated: [SPARK-45267][PS] Change the default value for numeric_only
This is an automated email from the ASF dual-hosted git repository. ruifengz pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 8cbc741320d [SPARK-45267][PS] Change the default value for numeric_only 8cbc741320d is described below commit 8cbc741320dac60ce814ce0a9b3e72239248efb8 Author: Haejoon Lee AuthorDate: Wed Sep 27 14:04:54 2023 +0800 [SPARK-45267][PS] Change the default value for numeric_only ### What changes were proposed in this pull request? This PR proposes to change the default value for `numeric_only` in related functions. ### Why are the changes needed? Many functions that support the `numeric_only` parameter have changed their default value from `True` to `False` since pandas 2.0.0, so we should follow that behavior. See https://pandas.pydata.org/docs/dev/whatsnew/v2.0.0.html for more details. ### Does this PR introduce _any_ user-facing change? Yes, the default value for `numeric_only` is changed to `False`. ### How was this patch tested? Updated the related UTs. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #43043 from itholic/numeric_only. 
Authored-by: Haejoon Lee Signed-off-by: Ruifeng Zheng --- python/pyspark/pandas/frame.py | 38 +++ python/pyspark/pandas/groupby.py | 54 +++--- python/pyspark/pandas/series.py| 13 -- .../pandas/tests/computation/test_compute.py | 8 +++- 4 files changed, 47 insertions(+), 66 deletions(-) diff --git a/python/pyspark/pandas/frame.py b/python/pyspark/pandas/frame.py index 08450c0be87..faa595f80e3 100644 --- a/python/pyspark/pandas/frame.py +++ b/python/pyspark/pandas/frame.py @@ -747,7 +747,7 @@ class DataFrame(Frame, Generic[T]): sfun: Callable[["Series"], PySparkColumn], name: str, axis: Optional[Axis] = None, -numeric_only: bool = True, +numeric_only: bool = False, skipna: bool = True, **kwargs: Any, ) -> "Series": @@ -762,10 +762,8 @@ class DataFrame(Frame, Generic[T]): axis: used only for sanity check because the series only supports index axis. name : original pandas API name. axis : axis to apply. 0 or 1, or 'index' or 'columns. -numeric_only : bool, default True -Include only float, int, boolean columns. False is not supported. This parameter -is mainly for pandas compatibility. Only 'DataFrame.count' uses this parameter -currently. +numeric_only : bool, default False +Include only float, int, boolean columns. skipna : bool, default True Exclude NA/null values when computing the result. """ @@ -11150,7 +11148,7 @@ defaultdict(, {'col..., 'col...})] # TODO: add axis, pct, na_option parameter def rank( -self, method: str = "average", ascending: bool = True, numeric_only: Optional[bool] = None +self, method: str = "average", ascending: bool = True, numeric_only: bool = False ) -> "DataFrame": """ Compute numerical data ranks (1 through n) along axis. 
Equal values are @@ -11171,9 +11169,13 @@ defaultdict(, {'col..., 'col...})] * dense: like 'min', but rank always increases by 1 between groups ascending : boolean, default True False for ranks by high (1) to low (N) -numeric_only : bool, optional +numeric_only : bool, default False For DataFrame objects, rank only numeric columns if set to True. +.. versionchanged:: 4.0.0 +The default value of ``numeric_only`` is now ``False``. + + Returns --- ranks : same type as caller @@ -11238,11 +11240,6 @@ defaultdict(, {'col..., 'col...})] 2 2.5 3 4.0 """ -warnings.warn( -"Default value of `numeric_only` will be changed to `False` " -"instead of `None` in 4.0.0.", -FutureWarning, -) if numeric_only: numeric_col_names = [] for label in self._internal.column_labels: @@ -12206,7 +12203,7 @@ defaultdict(, {'col..., 'col...})] self, q: Union[float, Iterable[float]] = 0.5, axis: Axis = 0, -numeric_only: bool = True, +numeric_only: bool = False, accuracy: int = 1, ) -> DataFrameOrSeries: """ @@ -1,9 +12219,12 @@ defaultdict(, {'col..., 'col...})] 0 <= q <= 1, the quantile(s) to compute. axis : int or str, default 0 or 'index' Can only be set to 0 now. -numeric_only : bool, default True -If False, the quantile of datetime and time
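The pandas 2.x semantics this commit follows can be illustrated with plain pandas (a minimal sketch using pandas itself rather than pandas-on-Spark; it assumes pandas ≥ 2.0, where `numeric_only` defaults to `False`):

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3], "s": ["a", "b", "c"]})

# numeric_only=True: non-numeric columns are silently dropped
# before the reduction runs.
means = df.mean(numeric_only=True)
assert means["x"] == 2.0
assert "s" not in means

# numeric_only=False (the new default): the string column participates,
# and a reduction that cannot handle it raises instead of dropping it.
try:
    df.mean(numeric_only=False)
    raised = False
except TypeError:
    raised = True
assert raised
```

The same explicit-opt-in pattern (`numeric_only=True`) is the portable way to get the old behavior after this change.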
[spark] branch master updated: [SPARK-45335][SQL][DOCS] Correct the group of `ElementAt` and `TryElementAt`
This is an automated email from the ASF dual-hosted git repository. ruifengz pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 644c8b25bfd [SPARK-45335][SQL][DOCS] Correct the group of `ElementAt` and `TryElementAt` 644c8b25bfd is described below commit 644c8b25bfda1fe323a7b621d14a340018560136 Author: Ruifeng Zheng AuthorDate: Wed Sep 27 14:03:31 2023 +0800 [SPARK-45335][SQL][DOCS] Correct the group of `ElementAt` and `TryElementAt` ### What changes were proposed in this pull request? Correct the group of `ElementAt` and `TryElementAt`: they both support array and map input, so they should be in `collection functions`. The existing category strategy appears to be that a function supporting more than one type belongs in `collection functions`: - `Size` supports both array and map, and it is in `collection functions`; - `Reverse` supports both array and string, and it is in `collection functions`. So far, I didn't find other places with incorrect groups. ### Why are the changes needed? For the documentation. ### Does this PR introduce _any_ user-facing change? Yes, they will be in `collection functions` in SQL references. ### How was this patch tested? CI ### Was this patch authored or co-authored using generative AI tooling? No Closes #43121 from zhengruifeng/group_element_at. 
Lead-authored-by: Ruifeng Zheng Co-authored-by: Ruifeng Zheng Signed-off-by: Ruifeng Zheng --- .../apache/spark/sql/catalyst/expressions/collectionOperations.scala | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala index 4a3c7bbc2be..759000bc5f5 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala @@ -2333,7 +2333,7 @@ case class Get( b """, since = "2.4.0", - group = "map_funcs") + group = "collection_funcs") case class ElementAt( left: Expression, right: Expression, @@ -2557,7 +2557,7 @@ case class ElementAt( b """, since = "3.3.0", - group = "map_funcs") + group = "collection_funcs") case class TryElementAt(left: Expression, right: Expression, replacement: Expression) extends RuntimeReplaceable with InheritAnalysisRules { def this(left: Expression, right: Expression) = - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
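The reclassification makes sense because `element_at` is polymorphic over arrays and maps. A plain-Python sketch of its value semantics (not Spark code; it models the ANSI-off behavior, where a missing map key or out-of-range array index yields NULL):

```python
def element_at(container, key):
    # Map input: return the value for the key, or None (SQL NULL).
    if isinstance(container, dict):
        return container.get(key)
    # Array input: SQL arrays are 1-based; index 0 is an error in Spark.
    if key == 0:
        raise ValueError("SQL array indices start at 1 (or -1 from the end)")
    idx = key - 1 if key > 0 else key  # negative indices count from the end
    if -len(container) <= idx < len(container):
        return container[idx]
    return None  # out-of-range index yields NULL when ANSI mode is off

assert element_at([10, 20, 30], 2) == 20
assert element_at([10, 20, 30], -1) == 30
assert element_at([10, 20, 30], 5) is None
assert element_at({"a": 1, "b": 2}, "b") == 2
assert element_at({"a": 1}, "z") is None
```

Because one function handles both container types, grouping it with map-only functions understated its scope; `collection_funcs` matches `Size` and `Reverse`, the precedents cited above.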
[spark] branch master updated (eff46ea77e9 -> 28dc555821b)
This is an automated email from the ASF dual-hosted git repository. ruifengz pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from eff46ea77e9 [SPARK-45340][SQL] Remove the SQL config `spark.sql.hive.verifyPartitionPath` add 28dc555821b [SPARK-45329][PYTHON][CONNECT] DataFrame methods skip pandas conversion No new revisions were added by this update. Summary of changes: python/pyspark/sql/connect/dataframe.py | 66 + 1 file changed, 34 insertions(+), 32 deletions(-)
[spark] branch master updated: [SPARK-45340][SQL] Remove the SQL config `spark.sql.hive.verifyPartitionPath`
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new eff46ea77e9 [SPARK-45340][SQL] Remove the SQL config `spark.sql.hive.verifyPartitionPath` eff46ea77e9 is described below commit eff46ea77e9bebef3076277bef1e086833dd Author: Max Gekk AuthorDate: Wed Sep 27 08:28:45 2023 +0300 [SPARK-45340][SQL] Remove the SQL config `spark.sql.hive.verifyPartitionPath` ### What changes were proposed in this pull request? In this PR, I propose to remove the already-deprecated SQL config `spark.sql.hive.verifyPartitionPath`, and the code under the config. The config has been deprecated since Spark 3.0. ### Why are the changes needed? To improve code maintainability by removing unused code. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? By running the modified test suites: ``` $ build/sbt "test:testOnly *SQLConfSuite" $ build/sbt "test:testOnly *QueryPartitionSuite" ``` ### Was this patch authored or co-authored using generative AI tooling? No. Closes #43130 from MaxGekk/remove-verifyPartitionPath. 
Authored-by: Max Gekk Signed-off-by: Max Gekk --- .../org/apache/spark/sql/internal/SQLConf.scala| 17 ++--- .../apache/spark/sql/internal/SQLConfSuite.scala | 4 +-- .../org/apache/spark/sql/hive/TableReader.scala| 41 +- .../spark/sql/hive/QueryPartitionSuite.scala | 12 ++- 4 files changed, 8 insertions(+), 66 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala index 43eb0756d8d..aeef531dbcd 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala @@ -34,7 +34,6 @@ import org.apache.hadoop.fs.Path import org.apache.spark.{ErrorMessageFormat, SparkConf, SparkContext, TaskContext} import org.apache.spark.internal.Logging import org.apache.spark.internal.config._ -import org.apache.spark.internal.config.{IGNORE_MISSING_FILES => SPARK_IGNORE_MISSING_FILES} import org.apache.spark.network.util.ByteUnit import org.apache.spark.sql.catalyst.ScalaReflection import org.apache.spark.sql.catalyst.analysis.{HintErrorLogger, Resolver} @@ -1261,14 +1260,6 @@ object SQLConf { .booleanConf .createWithDefault(false) - val HIVE_VERIFY_PARTITION_PATH = buildConf("spark.sql.hive.verifyPartitionPath") -.doc("When true, check all the partition paths under the table\'s root directory " + - "when reading data stored in HDFS. 
This configuration will be deprecated in the future " + - s"releases and replaced by ${SPARK_IGNORE_MISSING_FILES.key}.") -.version("1.4.0") -.booleanConf -.createWithDefault(false) - val HIVE_METASTORE_DROP_PARTITION_BY_NAME = buildConf("spark.sql.hive.dropPartitionByName.enabled") .doc("When true, Spark will get partition name rather than partition object " + @@ -4472,8 +4463,6 @@ object SQLConf { PANDAS_GROUPED_MAP_ASSIGN_COLUMNS_BY_NAME.key, "2.4", "The config allows to switch to the behaviour before Spark 2.4 " + "and will be removed in the future releases."), - DeprecatedConfig(HIVE_VERIFY_PARTITION_PATH.key, "3.0", -s"This config is replaced by '${SPARK_IGNORE_MISSING_FILES.key}'."), DeprecatedConfig(ARROW_EXECUTION_ENABLED.key, "3.0", s"Use '${ARROW_PYSPARK_EXECUTION_ENABLED.key}' instead of it."), DeprecatedConfig(ARROW_FALLBACK_ENABLED.key, "3.0", @@ -4552,7 +4541,9 @@ object SQLConf { RemovedConfig("spark.sql.ansi.strictIndexOperator", "3.4.0", "true", "This was an internal configuration. It is not needed anymore since Spark SQL always " + "returns null when getting a map value with a non-existing key. See SPARK-40066 " + - "for more details.") + "for more details."), + RemovedConfig("spark.sql.hive.verifyPartitionPath", "4.0.0", "false", +s"This config was replaced by '${IGNORE_MISSING_FILES.key}'.") ) Map(configs.map { cfg => cfg.key -> cfg } : _*) @@ -4766,8 +4757,6 @@ class SQLConf extends Serializable with Logging with SqlApiConf { def isOrcSchemaMergingEnabled: Boolean = getConf(ORC_SCHEMA_MERGING_ENABLED) - def verifyPartitionPath: Boolean = getConf(HIVE_VERIFY_PARTITION_PATH) - def metastoreDropPartitionsByName: Boolean = getConf(HIVE_METASTORE_DROP_PARTITION_BY_NAME) def metastorePartitionPruning: Boolean = getConf(HIVE_METASTORE_PARTITION_PRUNING) diff --git a/sql/core/src/test/scala/org/apache/spark/sql/internal/SQLConfSuite.scala b/sql/core/src/test/scala/org/apache/sp
[spark] branch master updated: [SPARK-44780][DOC] SQL temporary variables
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new f6c6acbc00d [SPARK-44780][DOC] SQL temporary variables f6c6acbc00d is described below commit f6c6acbc00d2d96c43298c282e4bd8ebeb160ad1 Author: Serge Rielau AuthorDate: Wed Sep 27 13:05:19 2023 +0800 [SPARK-44780][DOC] SQL temporary variables ### What changes were proposed in this pull request? Document the previously pushed feature SQL temporary variables ### Why are the changes needed? If it's not documented, it doesn't exist ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? build docs, verify HTML Closes #42467 from srielau/SPARK-44780-Doc-sql-session-variables. Lead-authored-by: Serge Rielau Co-authored-by: srielau Signed-off-by: Wenchen Fan --- docs/sql-ref-syntax-aux-conf-mgmt-set.md| 3 + docs/sql-ref-syntax-aux-set-var.md | 98 + docs/sql-ref-syntax-ddl-declare-variable.md | 82 docs/sql-ref-syntax-ddl-drop-variable.md| 66 +++ docs/sql-ref-syntax.md | 3 + 5 files changed, 252 insertions(+) diff --git a/docs/sql-ref-syntax-aux-conf-mgmt-set.md b/docs/sql-ref-syntax-aux-conf-mgmt-set.md index f97b7f2a8ef..9e57a221f96 100644 --- a/docs/sql-ref-syntax-aux-conf-mgmt-set.md +++ b/docs/sql-ref-syntax-aux-conf-mgmt-set.md @@ -23,6 +23,8 @@ license: | The SET command sets a property, returns the value of an existing property or returns all SQLConf properties with value and meaning. +To set SQL variables defined with [DECLARE VARIABLE](sql-ref-syntax-ddl-declare-variable.html) use [SET VAR](sql-ref-syntax-aux-set-var.html). 
+ ### Syntax ```sql @@ -69,3 +71,4 @@ SET spark.sql.variable.substitute; ### Related Statements * [RESET](sql-ref-syntax-aux-conf-mgmt-reset.html) +* [SET VAR](sql-ref-syntax-aux-set-var.html) diff --git a/docs/sql-ref-syntax-aux-set-var.md b/docs/sql-ref-syntax-aux-set-var.md new file mode 100644 index 000..9ce9e68cd4f --- /dev/null +++ b/docs/sql-ref-syntax-aux-set-var.md @@ -0,0 +1,98 @@ +--- +layout: global +title: SET VAR +displayTitle: SET VAR +license: | + Licensed to the Apache Software Foundation (ASF) under one or more + contributor license agreements. See the NOTICE file distributed with + this work for additional information regarding copyright ownership. + The ASF licenses this file to You under the Apache License, Version 2.0 + (the "License"); you may not use this file except in compliance with + the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. +--- + +### Description + +The `SET VAR` command sets a temporary variable which has been previously declared in the current session. + +To set a config variable or a hive variable use [SET](sql-ref-syntax-aux-conf-mgmt-set.html). + +### Syntax + +```sql +SET { VAR | VARIABLE } + { { variable_name = { expression | DEFAULT } } [, ...] | +( variable_name [, ...] ) = ( query ) } +``` + +### Parameters + +* **variable_name** + + Specifies an existing variable. + If you specify multiple variables, there must not be any duplicates. + +* **expression** + + Any expression, including scalar subqueries. + +* **DEFAULT** + + If you specify `DEFAULT`, the default expression of the variable is assigned, + or `NULL` if there is none. 
+ +* **query** + + A [query](sql-ref-syntax-qry-select.html) that returns at most one row and as many columns as + the number of specified variables. Each column must be implicitly castable to the data type of the + corresponding variable. + If the query returns no row `NULL` values are assigned. + +### Examples + +```sql +-- +DECLARE VARIABLE var1 INT DEFAULT 7; +DECLARE VARIABLE var2 STRING; + +-- A simple assignment +SET VAR var1 = 5; +SELECT var1; + 5 + +-- A complex expression assignment +SET VARIABLE var1 = (SELECT max(c1) FROM VALUES(1), (2) AS t(c1)); +SELECT var1; + 2 + +-- resetting the variable to DEFAULT +SET VAR var1 = DEFAULT; +SELECT var1; + 7 + +-- A multi variable assignment +SET VAR (var1, var2) = (SELECT max(c1), CAST(min(c1) AS STRING) FROM VALUES(1), (2) AS t(c1)); +SELECT var1, var2; + 2 1 + +-- Too many rows +SET VAR (var1, var2) = (SELECT c1, CAST(c1 AS STRING) FROM VALUES(1), (2) AS t(c1)); +Error: ROW_SUBQUERY_TOO_MANY_ROWS + +-- No rows +SET VAR (var1, var2) =
[spark] branch master updated: [SPARK-43850][BUILD][GRAPHX] Remove the import for `scala.language.higherKinds` and delete the corresponding suppression rule
This is an automated email from the ASF dual-hosted git repository. yangjie01 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 1597d8174a5 [SPARK-43850][BUILD][GRAPHX] Remove the import for `scala.language.higherKinds` and delete the corresponding suppression rule 1597d8174a5 is described below commit 1597d8174a54b8657572f2e40897a20c985d2794 Author: yangjie01 AuthorDate: Wed Sep 27 12:26:46 2023 +0800 [SPARK-43850][BUILD][GRAPHX] Remove the import for `scala.language.higherKinds` and delete the corresponding suppression rule ### What changes were proposed in this pull request? `scala.language.higherKinds` is deprecated and no longer needs to be imported explicitly in Scala 2.13, so this PR removes the imports for scala.language.higherKinds and the corresponding compiler suppression rules. ### Why are the changes needed? In SPARK-43849(https://github.com/apache/spark/pull/41356), I added compiler suppression rules to allow `unused imports` checks to work with both Scala 2.12 and Scala 2.13. As there is no longer a need to support Scala 2.12, we can now clean them up. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Pass GitHub Actions ### Was this patch authored or co-authored using generative AI tooling? No Closes #43128 from LuciferYang/SPARK-43850. 
Authored-by: yangjie01 Signed-off-by: yangjie01 --- .../scala/org/apache/spark/graphx/impl/VertexPartitionBase.scala| 1 - .../scala/org/apache/spark/graphx/impl/VertexPartitionBaseOps.scala | 1 - pom.xml | 6 -- project/SparkBuild.scala| 4 4 files changed, 12 deletions(-) diff --git a/graphx/src/main/scala/org/apache/spark/graphx/impl/VertexPartitionBase.scala b/graphx/src/main/scala/org/apache/spark/graphx/impl/VertexPartitionBase.scala index 8da46db98be..bbc4bca5016 100644 --- a/graphx/src/main/scala/org/apache/spark/graphx/impl/VertexPartitionBase.scala +++ b/graphx/src/main/scala/org/apache/spark/graphx/impl/VertexPartitionBase.scala @@ -17,7 +17,6 @@ package org.apache.spark.graphx.impl -import scala.language.higherKinds import scala.reflect.ClassTag import org.apache.spark.graphx._ diff --git a/graphx/src/main/scala/org/apache/spark/graphx/impl/VertexPartitionBaseOps.scala b/graphx/src/main/scala/org/apache/spark/graphx/impl/VertexPartitionBaseOps.scala index a8ed59b09bb..cf4c8ca2a9c 100644 --- a/graphx/src/main/scala/org/apache/spark/graphx/impl/VertexPartitionBaseOps.scala +++ b/graphx/src/main/scala/org/apache/spark/graphx/impl/VertexPartitionBaseOps.scala @@ -17,7 +17,6 @@ package org.apache.spark.graphx.impl -import scala.language.higherKinds import scala.language.implicitConversions import scala.reflect.ClassTag diff --git a/pom.xml b/pom.xml index 1d0ab387900..6dc54a9bc94 100644 --- a/pom.xml +++ b/pom.xml @@ -2970,12 +2970,6 @@ -Wconf:cat=unchecked&msg=outer reference:s -Wconf:cat=unchecked&msg=eliminated by erasure:s -Wconf:msg=^(?=.*?a value of type)(?=.*?cannot also be).+$:s - - -Wconf:cat=unused-imports&src=org\/apache\/spark\/graphx\/impl\/VertexPartitionBase.scala:s - -Wconf:cat=unused-imports&src=org\/apache\/spark\/graphx\/impl\/VertexPartitionBaseOps.scala:s diff --git a/project/SparkBuild.scala b/project/SparkBuild.scala index ad2b67c67c6..817f79a84a4 100644 --- a/project/SparkBuild.scala +++ b/project/SparkBuild.scala @@ -258,10 +258,6 @@ 
object SparkBuild extends PomBuild { "-Wconf:cat=unchecked&msg=outer reference:s", "-Wconf:cat=unchecked&msg=eliminated by erasure:s", "-Wconf:msg=^(?=.*?a value of type)(?=.*?cannot also be).+$:s", -// TODO(SPARK-43850): Remove the following suppression rules and remove `import scala.language.higherKinds` // from the corresponding files when Scala 2.12 is no longer supported. - "-Wconf:cat=unused-imports&src=org\\/apache\\/spark\\/graphx\\/impl\\/VertexPartitionBase.scala:s", - "-Wconf:cat=unused-imports&src=org\\/apache\\/spark\\/graphx\\/impl\\/VertexPartitionBaseOps.scala:s", // SPARK-40497 Upgrade Scala to 2.13.11 and suppress `Implicit definition should have explicit type` "-Wconf:msg=Implicit definition should have explicit type:s" )
[spark] branch master updated: [SPARK-45334][SQL] Remove misleading comment in parquetSchemaConverter
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 7e8aafd2c0f [SPARK-45334][SQL] Remove misleading comment in parquetSchemaConverter 7e8aafd2c0f is described below commit 7e8aafd2c0f1f6fcd03a69afe2b85fd3fda95d20 Author: lanmengran1 AuthorDate: Tue Sep 26 21:01:02 2023 -0500 [SPARK-45334][SQL] Remove misleading comment in parquetSchemaConverter ### What changes were proposed in this pull request? Remove one line of comment; the details are described in JIRA https://issues.apache.org/jira/browse/SPARK-45334 ### Why are the changes needed? The comment is outdated and misleading: - the parquet-hive module has been removed from the parquet-mr project (https://issues.apache.org/jira/browse/PARQUET-1676) - Hive always uses "array_element" as the name ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? No tests needed. ### Was this patch authored or co-authored using generative AI tooling? No Closes #43119 from amoylan2/remove_misleading_comment_in_parquetSchemaConverter. 
Authored-by: lanmengran1 Signed-off-by: Sean Owen --- .../spark/sql/execution/datasources/parquet/ParquetSchemaConverter.scala | 1 - 1 file changed, 1 deletion(-) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaConverter.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaConverter.scala index 9c9e7ce729c..eedd165278a 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaConverter.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaConverter.scala @@ -646,7 +646,6 @@ class SparkToParquetSchemaConverter( .buildGroup(repetition).as(LogicalTypeAnnotation.listType()) .addField(Types .buildGroup(REPEATED) -// "array" is the name chosen by parquet-hive (1.7.0 and prior version) .addField(convertField(StructField("array", elementType, nullable))) .named("bag")) .named(field.name)
[spark] branch master updated: [SPARK-45302][PYTHON] Remove PID communication between Python workers when no daemon is used
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 17430fe4702 [SPARK-45302][PYTHON] Remove PID communication between Python workers when no daemon is used 17430fe4702 is described below commit 17430fe47029f1d27c7913468b95abfd856fddcc Author: Hyukjin Kwon AuthorDate: Wed Sep 27 10:48:17 2023 +0900 [SPARK-45302][PYTHON] Remove PID communication between Python workers when no daemon is used ### What changes were proposed in this pull request? This PR removes the legacy workaround for JDK 8 in `PythonWorkerFactory`. ### Why are the changes needed? There is no need to manually send the PID through the socket. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? There are existing unit tests covering the daemon-disabled case. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #43087 from HyukjinKwon/SPARK-45302. 
Lead-authored-by: Hyukjin Kwon Co-authored-by: Hyukjin Kwon Signed-off-by: Hyukjin Kwon --- core/src/main/scala/org/apache/spark/SparkEnv.scala | 4 ++-- .../scala/org/apache/spark/api/python/PythonRunner.scala | 10 +- .../org/apache/spark/api/python/PythonWorkerFactory.scala | 15 +++ python/pyspark/daemon.py | 4 ++-- .../sql/connect/streaming/worker/foreach_batch_worker.py | 2 -- .../sql/connect/streaming/worker/listener_worker.py | 2 -- python/pyspark/sql/worker/analyze_udtf.py | 3 --- python/pyspark/worker.py | 3 --- .../spark/sql/execution/python/PythonArrowOutput.scala| 2 +- .../spark/sql/execution/python/PythonUDFRunner.scala | 2 +- 10 files changed, 18 insertions(+), 29 deletions(-) diff --git a/core/src/main/scala/org/apache/spark/SparkEnv.scala b/core/src/main/scala/org/apache/spark/SparkEnv.scala index e404c9ee8b4..937170b5ee8 100644 --- a/core/src/main/scala/org/apache/spark/SparkEnv.scala +++ b/core/src/main/scala/org/apache/spark/SparkEnv.scala @@ -128,7 +128,7 @@ class SparkEnv ( pythonExec: String, workerModule: String, daemonModule: String, - envVars: Map[String, String]): (PythonWorker, Option[Int]) = { + envVars: Map[String, String]): (PythonWorker, Option[Long]) = { synchronized { val key = PythonWorkersKey(pythonExec, workerModule, daemonModule, envVars) pythonWorkers.getOrElseUpdate(key, @@ -139,7 +139,7 @@ class SparkEnv ( private[spark] def createPythonWorker( pythonExec: String, workerModule: String, - envVars: Map[String, String]): (PythonWorker, Option[Int]) = { + envVars: Map[String, String]): (PythonWorker, Option[Long]) = { createPythonWorker( pythonExec, workerModule, PythonWorkerFactory.defaultDaemonModule, envVars) } diff --git a/core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala b/core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala index db95e6c2bd6..2a63298d0a1 100644 --- a/core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala +++ 
b/core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala @@ -84,7 +84,7 @@ private object BasePythonRunner { private lazy val faultHandlerLogDir = Utils.createTempDir(namePrefix = "faulthandler") - private def faultHandlerLogPath(pid: Int): Path = { + private def faultHandlerLogPath(pid: Long): Path = { new File(faultHandlerLogDir, pid.toString).toPath } } @@ -200,7 +200,7 @@ private[spark] abstract class BasePythonRunner[IN, OUT]( envVars.put("SPARK_JOB_ARTIFACT_UUID", jobArtifactUUID.getOrElse("default")) -val (worker: PythonWorker, pid: Option[Int]) = env.createPythonWorker( +val (worker: PythonWorker, pid: Option[Long]) = env.createPythonWorker( pythonExec, workerModule, daemonModule, envVars.asScala.toMap) // Whether is the worker released into idle pool or closed. When any codes try to release or // close a worker, they should use `releasedOrClosed.compareAndSet` to flip the state to make @@ -253,7 +253,7 @@ private[spark] abstract class BasePythonRunner[IN, OUT]( startTime: Long, env: SparkEnv, worker: PythonWorker, - pid: Option[Int], + pid: Option[Long], releasedOrClosed: AtomicBoolean, context: TaskContext): Iterator[OUT] @@ -463,7 +463,7 @@ private[spark] abstract class BasePythonRunner[IN, OUT]( startTime: Long, env: SparkEnv, worker: PythonWorker, - pid: Option[Int], + pid: Option[Long], releasedOrClosed: AtomicBoolean, context: TaskContext) extends Iterator[OUT] { @@ -838,7 +838,7 @@ private[spark] class PythonRunn
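The idea behind dropping the PID round-trip can be sketched in plain Python (an illustrative analogy, not Spark's actual code): just as a Python parent can read a child's PID from the `Popen` handle, a modern JVM can read it from the process handle, so the worker no longer needs to write its own PID back over a socket. The `pid` type widening from `Int` to `Long` in the diff mirrors the JVM side returning a long.

```python
import subprocess
import sys

# The parent obtains the worker's PID directly from the process handle,
# so the child process does not have to report its PID back itself.
proc = subprocess.Popen(
    [sys.executable, "-c", "import os; print(os.getpid())"],
    stdout=subprocess.PIPE,
    text=True,
)
child_reported = int(proc.stdout.read())  # what the old protocol sent back
proc.wait()
# Both views of the PID agree, making the round-trip redundant.
assert proc.pid == child_reported
```

This only covers the non-daemon path; the daemon path still multiplexes workers, which is why `daemon.py` appears in the diff separately.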
[spark] branch master updated (e6d1e9ed384 -> 17881eb7eca)
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git from e6d1e9ed384 [SPARK-44751][SQL][FOLLOWUP] Change `xmlExpressions.scala` package name add 17881eb7eca [SPARK-45339][PYTHON][CONNECT] Pyspark should log errors it retries No new revisions were added by this update. Summary of changes: python/pyspark/sql/connect/client/core.py | 9 +++-- 1 file changed, 7 insertions(+), 2 deletions(-)
[spark] branch master updated: [SPARK-44751][SQL][FOLLOWUP] Change `xmlExpressions.scala` package name
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new e6d1e9ed384 [SPARK-44751][SQL][FOLLOWUP] Change `xmlExpressions.scala` package name e6d1e9ed384 is described below commit e6d1e9ed3843352e6a39ad5bb18d9b849442a1de Author: Jia Fan AuthorDate: Wed Sep 27 09:38:39 2023 +0900 [SPARK-44751][SQL][FOLLOWUP] Change `xmlExpressions.scala` package name ### What changes were proposed in this pull request? The `xmlExpressions.scala` file in package `org.apache.spark.sql.catalyst.expressions`, but it package name is `org.apache.spark.sql.catalyst.expressions.xml`. ### Why are the changes needed? Fix not correct package name. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? exist test. ### Was this patch authored or co-authored using generative AI tooling? No Closes #43102 from Hisoka-X/xml-package-name-fix. Authored-by: Jia Fan Signed-off-by: Hyukjin Kwon --- .../org/apache/spark/sql/catalyst/expressions/xmlExpressions.scala | 2 +- sql/core/src/main/scala/org/apache/spark/sql/functions.scala| 1 - sql/core/src/test/resources/sql-functions/sql-expression-schema.md | 6 +++--- 3 files changed, 4 insertions(+), 5 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/xmlExpressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/xmlExpressions.scala index c0fd725943d..df63429ae33 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/xmlExpressions.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/xmlExpressions.scala @@ -14,7 +14,7 @@ * See the License for the specific language governing permissions and * limitations under the License. 
*/ -package org.apache.spark.sql.catalyst.expressions.xml +package org.apache.spark.sql.catalyst.expressions import org.apache.spark.sql.catalyst.InternalRow import org.apache.spark.sql.catalyst.analysis.TypeCheckResult diff --git a/sql/core/src/main/scala/org/apache/spark/sql/functions.scala b/sql/core/src/main/scala/org/apache/spark/sql/functions.scala index 2a7ed263c74..a2343ed04d4 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/functions.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/functions.scala @@ -30,7 +30,6 @@ import org.apache.spark.sql.catalyst.analysis.{Star, UnresolvedFunction} import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder import org.apache.spark.sql.catalyst.expressions._ import org.apache.spark.sql.catalyst.expressions.aggregate._ -import org.apache.spark.sql.catalyst.expressions.xml._ import org.apache.spark.sql.catalyst.plans.logical.{BROADCAST, HintInfo, ResolvedHint} import org.apache.spark.sql.catalyst.util.CharVarcharUtils import org.apache.spark.sql.errors.{DataTypeErrors, QueryCompilationErrors} diff --git a/sql/core/src/test/resources/sql-functions/sql-expression-schema.md b/sql/core/src/test/resources/sql-functions/sql-expression-schema.md index d21ceaeb14b..4fd493d1a3c 100644 --- a/sql/core/src/test/resources/sql-functions/sql-expression-schema.md +++ b/sql/core/src/test/resources/sql-functions/sql-expression-schema.md @@ -274,6 +274,7 @@ | org.apache.spark.sql.catalyst.expressions.RowNumber | row_number | SELECT a, b, row_number() OVER (PARTITION BY a ORDER BY b) FROM VALUES ('A1', 2), ('A1', 1), ('A2', 3), ('A1', 1) tab(a, b) | struct | | org.apache.spark.sql.catalyst.expressions.SchemaOfCsv | schema_of_csv | SELECT schema_of_csv('1,abc') | struct | | org.apache.spark.sql.catalyst.expressions.SchemaOfJson | schema_of_json | SELECT schema_of_json('[{"col":0}]') | struct | +| org.apache.spark.sql.catalyst.expressions.SchemaOfXml | schema_of_xml | SELECT schema_of_xml('<p><a>1</a></p>') | struct<schema_of_xml(<p><a>1</a></p>):string> | | 
org.apache.spark.sql.catalyst.expressions.Sec | sec | SELECT sec(0) | struct | | org.apache.spark.sql.catalyst.expressions.Second | second | SELECT second('2009-07-30 12:58:59') | struct | | org.apache.spark.sql.catalyst.expressions.SecondsToTimestamp | timestamp_seconds | SELECT timestamp_seconds(1230219000) | struct | @@ -365,6 +366,7 @@ | org.apache.spark.sql.catalyst.expressions.WeekOfYear | weekofyear | SELECT weekofyear('2008-02-20') | struct | | org.apache.spark.sql.catalyst.expressions.WidthBucket | width_bucket | SELECT width_bucket(5.3, 0.2, 10.6, 5) | struct | | org.apache.spark.sql.catalyst.expressions.WindowTime | window_time | SELECT a, window.start as start, window.end as end, window_time(window), cnt FROM (SELECT a, window, count(*) as cnt FROM VALUES ('A1', '2021-01-01 00:00:00'), ('A1', '2021-01-01 00:04:30'), ('A1', '2021-01-01 00:06:00'), ('A2', '2021-01-01 00:01:00') AS tab(a, b) GROUP by a
[spark] branch master updated: [SPARK-45328][SQL] Remove Hive support prior to 2.0.0
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 3c84c229d16 [SPARK-45328][SQL] Remove Hive support prior to 2.0.0 3c84c229d16 is described below commit 3c84c229d167a6ab2857649e91fff6f0d57bb12c Author: Hyukjin Kwon AuthorDate: Wed Sep 27 07:20:14 2023 +0900 [SPARK-45328][SQL] Remove Hive support prior to 2.0.0 ### What changes were proposed in this pull request? This PR proposes to remove Hive support prior to 2.0.0 (`spark.sql.hive.metastore.version`). ### Why are the changes needed? We dropped JDK 8 and 11, and Hive prior to 2.0.0 cannot work with the JDKs Spark now requires. These code paths are effectively dead already. ### Does this PR introduce _any_ user-facing change? Technically no, because these versions would not work anyway. ### How was this patch tested? No, because there is no way to test them. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #43116 from HyukjinKwon/SPARK-45328. 
Authored-by: Hyukjin Kwon Signed-off-by: Hyukjin Kwon --- docs/sql-migration-guide.md| 1 + .../org/apache/spark/sql/hive/HiveUtils.scala | 2 +- .../spark/sql/hive/client/HiveClientImpl.scala | 6 --- .../apache/spark/sql/hive/client/HiveShim.scala| 12 +++--- .../sql/hive/client/IsolatedClientLoader.scala | 6 --- .../org/apache/spark/sql/hive/client/package.scala | 46 +- .../spark/sql/hive/execution/HiveTempPath.scala| 40 ++- .../spark/sql/hive/client/HiveClientVersions.scala | 7 +--- .../hive/client/HivePartitionFilteringSuites.scala | 3 +- 9 files changed, 16 insertions(+), 107 deletions(-) diff --git a/docs/sql-migration-guide.md b/docs/sql-migration-guide.md index 56a3c8292cd..a28f6fd284d 100644 --- a/docs/sql-migration-guide.md +++ b/docs/sql-migration-guide.md @@ -26,6 +26,7 @@ license: | - Since Spark 4.0, the default value of `spark.sql.maxSinglePartitionBytes` is changed from `Long.MaxValue` to `128m`. To restore the previous behavior, set `spark.sql.maxSinglePartitionBytes` to `9223372036854775807`(`Long.MaxValue`). - Since Spark 4.0, any read of SQL tables takes into consideration the SQL configs `spark.sql.files.ignoreCorruptFiles`/`spark.sql.files.ignoreMissingFiles` instead of the core config `spark.files.ignoreCorruptFiles`/`spark.files.ignoreMissingFiles`. +- Since Spark 4.0, `spark.sql.hive.metastore` drops the support of Hive prior to 2.0.0 as they require JDK 8 that Spark does not support anymore. Users should migrate to higher versions. 
## Upgrading from Spark SQL 3.4 to 3.5 diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala index a01246520f3..794838a1190 100644 --- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala +++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala @@ -73,7 +73,7 @@ private[spark] object HiveUtils extends Logging { val HIVE_METASTORE_VERSION = buildStaticConf("spark.sql.hive.metastore.version") .doc("Version of the Hive metastore. Available options are " + -"0.12.0 through 2.3.9 and " + +"2.0.0 through 2.3.9 and " + "3.0.0 through 3.1.3.") .version("1.4.0") .stringConf diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala index f3d7d7e66a5..4e4ef6ce9f7 100644 --- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala +++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala @@ -115,12 +115,6 @@ private[hive] class HiveClientImpl( private val outputBuffer = new CircularBuffer() private val shim = version match { -case hive.v12 => new Shim_v0_12() -case hive.v13 => new Shim_v0_13() -case hive.v14 => new Shim_v0_14() -case hive.v1_0 => new Shim_v1_0() -case hive.v1_1 => new Shim_v1_1() -case hive.v1_2 => new Shim_v1_2() case hive.v2_0 => new Shim_v2_0() case hive.v2_1 => new Shim_v2_1() case hive.v2_2 => new Shim_v2_2() diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala index 338498d3d48..e12fe857c88 100644 --- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala +++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala @@ -255,7 +255,7 @@ private[client] sealed abstract class Shim { } } -private[client] class Shim_v0_12 extends Shim with 
Logging { +private class Shim_v0_12 extends Shim with Logging { // See
[GitHub] [spark-website] mateiz commented on pull request #480: Fix UI issue for `published` docs about Switch languages consistently across docs for all code snippets
mateiz commented on PR #480: URL: https://github.com/apache/spark-website/pull/480#issuecomment-1736040999 Super excited to see this getting fixed! I really pushed for the original switcher years ago to improve docs usability and I was sad when I noticed it was gone. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-45325][BUILD][FOLLOWUP] Update docs and sbt
This is an automated email from the ASF dual-hosted git repository. dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new ea3104fa71b [SPARK-45325][BUILD][FOLLOWUP] Update docs and sbt ea3104fa71b is described below commit ea3104fa71b0d7b6ac5e74292c28e40acb1e6537 Author: Ismaël Mejía AuthorDate: Tue Sep 26 10:40:08 2023 -0700 [SPARK-45325][BUILD][FOLLOWUP] Update docs and sbt ### What changes were proposed in this pull request? This PR adds missing parts of the upgrade to Avro 1.11.3 ### Why are the changes needed? Because there are missing references to the version of the library ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Pass the CI + Verify docs ### Was this patch authored or co-authored using generative AI tooling? No Closes #43118 from iemejia/master. Authored-by: Ismaël Mejía Signed-off-by: Dongjoon Hyun --- .../avro/src/main/scala/org/apache/spark/sql/avro/AvroOptions.scala | 4 ++-- docs/sql-data-sources-avro.md | 4 ++-- project/SparkBuild.scala | 2 +- .../test/scala/org/apache/spark/sql/hive/client/HiveClientSuite.scala | 2 +- 4 files changed, 6 insertions(+), 6 deletions(-) diff --git a/connector/avro/src/main/scala/org/apache/spark/sql/avro/AvroOptions.scala b/connector/avro/src/main/scala/org/apache/spark/sql/avro/AvroOptions.scala index edaaa8835cc..a0db82f9871 100644 --- a/connector/avro/src/main/scala/org/apache/spark/sql/avro/AvroOptions.scala +++ b/connector/avro/src/main/scala/org/apache/spark/sql/avro/AvroOptions.scala @@ -81,14 +81,14 @@ private[sql] class AvroOptions( /** * Top level record name in write result, which is required in Avro spec. - * See https://avro.apache.org/docs/1.11.2/specification/#schema-record . + * See https://avro.apache.org/docs/1.11.3/specification/#schema-record . 
* Default value is "topLevelRecord" */ val recordName: String = parameters.getOrElse(RECORD_NAME, "topLevelRecord") /** * Record namespace in write result. Default value is "". - * See Avro spec for details: https://avro.apache.org/docs/1.11.2/specification/#schema-record . + * See Avro spec for details: https://avro.apache.org/docs/1.11.3/specification/#schema-record . */ val recordNamespace: String = parameters.getOrElse(RECORD_NAMESPACE, "") diff --git a/docs/sql-data-sources-avro.md b/docs/sql-data-sources-avro.md index b01174b9182..72741b0e9d1 100644 --- a/docs/sql-data-sources-avro.md +++ b/docs/sql-data-sources-avro.md @@ -417,7 +417,7 @@ applications. Read the [Advanced Dependency Management](https://spark.apache Submission Guide for more details. ## Supported types for Avro -> Spark SQL conversion -Currently Spark supports reading all [primitive types](https://avro.apache.org/docs/1.11.2/specification/#primitive-types) and [complex types](https://avro.apache.org/docs/1.11.2/specification/#complex-types) under records of Avro. +Currently Spark supports reading all [primitive types](https://avro.apache.org/docs/1.11.3/specification/#primitive-types) and [complex types](https://avro.apache.org/docs/1.11.3/specification/#complex-types) under records of Avro. Avro typeSpark SQL type @@ -481,7 +481,7 @@ In addition to the types listed above, it supports reading `union` types. The fo 3. `union(something, null)`, where something is any supported Avro type. This will be mapped to the same Spark SQL type as that of something, with nullable set to true. All other union types are considered complex. They will be mapped to StructType where field names are member0, member1, etc., in accordance with members of the union. This is consistent with the behavior when converting between Avro and Parquet. 
-It also supports reading the following Avro [logical types](https://avro.apache.org/docs/1.11.2/specification/#logical-types): +It also supports reading the following Avro [logical types](https://avro.apache.org/docs/1.11.3/specification/#logical-types): Avro logical typeAvro typeSpark SQL type diff --git a/project/SparkBuild.scala b/project/SparkBuild.scala index 400ee8c5f28..ad2b67c67c6 100644 --- a/project/SparkBuild.scala +++ b/project/SparkBuild.scala @@ -1071,7 +1071,7 @@ object DependencyOverrides { dependencyOverrides += "com.google.guava" % "guava" % guavaVersion, dependencyOverrides += "xerces" % "xercesImpl" % "2.12.2", dependencyOverrides += "jline" % "jline" % "2.14.6", -dependencyOverrides += "org.apache.avro" % "avro" % "1.11.2") +dependencyOverrides += "org.apache.avro" % "avro" % "1.11.3") } /** diff --git a/sql/hive/src/test/scala/org/apache/spark/sql/hive/client/HiveClientSuite.scala b/sql/h
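The union-type rules quoted above (a numeric two-member union collapses to the wider type, `union(T, null)` becomes nullable `T`, and any other union becomes a struct with `member0`, `member1`, … fields) can be illustrated with a toy mapper. This is only a sketch of the documented rule in plain Java, not Spark's actual Avro converter:

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class UnionMapSketch {
    // Applies the documented Avro-union-to-Spark-SQL mapping to a list of
    // member type names; returns "type" or "type (nullable)".
    static String mapUnion(List<String> members) {
        List<String> nonNull = members.stream()
            .filter(m -> !m.equals("null")).collect(Collectors.toList());
        boolean nullable = nonNull.size() < members.size();
        Set<String> kinds = Set.copyOf(nonNull);
        String mapped;
        if (nonNull.size() == 1) {
            mapped = nonNull.get(0);                   // union(T, null) -> T
        } else if (kinds.equals(Set.of("int", "long"))) {
            mapped = "long";                           // union(int, long) -> long
        } else if (kinds.equals(Set.of("float", "double"))) {
            mapped = "double";                         // union(float, double) -> double
        } else {
            // complex union -> struct<member0: ..., member1: ...>
            StringBuilder sb = new StringBuilder("struct<");
            for (int i = 0; i < nonNull.size(); i++) {
                if (i > 0) sb.append(", ");
                sb.append("member").append(i).append(": ").append(nonNull.get(i));
            }
            mapped = sb.append(">").toString();
        }
        return nullable ? mapped + " (nullable)" : mapped;
    }

    public static void main(String[] args) {
        System.out.println(mapUnion(List.of("int", "long")));    // long
        System.out.println(mapUnion(List.of("string", "null"))); // string (nullable)
        System.out.println(mapUnion(List.of("int", "string"))); // struct<member0: int, member1: string>
    }
}
```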
[spark] branch master updated: [SPARK-44366][BUILD] Upgrade antlr4 to 4.13.1
This is an automated email from the ASF dual-hosted git repository. srowen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 13cd291c354 [SPARK-44366][BUILD] Upgrade antlr4 to 4.13.1 13cd291c354 is described below commit 13cd291c3549467dfd5d10a665e2d6a577f35bcb Author: yangjie01 AuthorDate: Tue Sep 26 11:14:21 2023 -0500 [SPARK-44366][BUILD] Upgrade antlr4 to 4.13.1 ### What changes were proposed in this pull request? This PR aims to upgrade `antlr4` from 4.9.3 to 4.13.1 ### Why are the changes needed? Since 4.10, antlr4 uses Java 11 for the source code and the compiled .class files of the ANTLR tool. There are some bug fixes and improvements after 4.9.3: - https://github.com/antlr/antlr4/pull/3399 - https://github.com/antlr/antlr4/issues/1105 - https://github.com/antlr/antlr4/issues/2788 - https://github.com/antlr/antlr4/pull/3957 - https://github.com/antlr/antlr4/pull/4394 The full release notes as follows: - https://github.com/antlr/antlr4/releases/tag/4.13.1 - https://github.com/antlr/antlr4/releases/tag/4.13.0 - https://github.com/antlr/antlr4/releases/tag/4.12.0 - https://github.com/antlr/antlr4/releases/tag/4.11.1 - https://github.com/antlr/antlr4/releases/tag/4.11.0 - https://github.com/antlr/antlr4/releases/tag/4.10.1 - https://github.com/antlr/antlr4/releases/tag/4.10 ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Pass GitHub Actions ### Was this patch authored or co-authored using generative AI tooling? No Closes #43075 from LuciferYang/antlr4-4131. 
Authored-by: yangjie01 Signed-off-by: Sean Owen --- dev/deps/spark-deps-hadoop-3-hive-2.3 | 2 +- pom.xml | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3 index 206361e1efa..5c17d727b0a 100644 --- a/dev/deps/spark-deps-hadoop-3-hive-2.3 +++ b/dev/deps/spark-deps-hadoop-3-hive-2.3 @@ -12,7 +12,7 @@ aliyun-java-sdk-ram/3.1.0//aliyun-java-sdk-ram-3.1.0.jar aliyun-sdk-oss/3.13.0//aliyun-sdk-oss-3.13.0.jar annotations/17.0.0//annotations-17.0.0.jar antlr-runtime/3.5.2//antlr-runtime-3.5.2.jar -antlr4-runtime/4.9.3//antlr4-runtime-4.9.3.jar +antlr4-runtime/4.13.1//antlr4-runtime-4.13.1.jar aopalliance-repackaged/2.6.1//aopalliance-repackaged-2.6.1.jar arpack/3.0.3//arpack-3.0.3.jar arpack_combined_all/0.1//arpack_combined_all-0.1.jar diff --git a/pom.xml b/pom.xml index 5fd3e173857..1d0ab387900 100644 --- a/pom.xml +++ b/pom.xml @@ -212,7 +212,7 @@ 3.0.0 0.12.0 -4.9.3 +4.13.1 1.1 4.12.1 4.12.0
[spark] branch master updated: [SPARK-44756][CORE] Executor hangs when RetryingBlockTransferor fails to initiate retry
This is an automated email from the ASF dual-hosted git repository. mridulm80 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new ff084d2852e [SPARK-44756][CORE] Executor hangs when RetryingBlockTransferor fails to initiate retry ff084d2852e is described below commit ff084d2852e62c6670e074ef423ae16c915710bc Author: Harunobu Daikoku AuthorDate: Tue Sep 26 11:07:41 2023 -0500 [SPARK-44756][CORE] Executor hangs when RetryingBlockTransferor fails to initiate retry ### What changes were proposed in this pull request? This PR fixes a bug in `RetryingBlockTransferor` that happens when retry initiation has failed. With this patch, the callers of `RetryingBlockTransferor#initiateRetry()` will catch any error and invoke the parent listener's exception handler. ### Why are the changes needed? This is needed to prevent an edge case where retry initiation fails and the executor gets stuck. More details in SPARK-44756 ### Does this PR introduce _any_ user-facing change? N/A ### How was this patch tested? Added a new test case in `RetryingBlockTransferorSuite` that simulates the problematic scenario. Closes #42426 from hdaikoku/SPARK-44756. 
Authored-by: Harunobu Daikoku Signed-off-by: Mridul Muralidharan gmail.com> --- .../network/shuffle/RetryingBlockTransferor.java | 47 -- .../shuffle/RetryingBlockTransferorSuite.java | 34 +++- 2 files changed, 67 insertions(+), 14 deletions(-) diff --git a/common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RetryingBlockTransferor.java b/common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RetryingBlockTransferor.java index 892de991612..c628b201b20 100644 --- a/common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RetryingBlockTransferor.java +++ b/common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RetryingBlockTransferor.java @@ -144,6 +144,11 @@ public class RetryingBlockTransferor { this(conf, transferStarter, blockIds, listener, ErrorHandler.NOOP_ERROR_HANDLER); } + @VisibleForTesting + synchronized void setCurrentListener(RetryingBlockTransferListener listener) { +this.currentListener = listener; + } + /** * Initiates the transfer of all blocks provided in the constructor, with possible retries * in the event of transient IOExceptions. @@ -176,12 +181,14 @@ public class RetryingBlockTransferor { listener.getTransferType(), blockIdsToTransfer.length, numRetries > 0 ? "(after " + numRetries + " retries)" : ""), e); - if (shouldRetry(e)) { -initiateRetry(e); - } else { -for (String bid : blockIdsToTransfer) { - listener.onBlockTransferFailure(bid, e); -} + if (shouldRetry(e) && initiateRetry(e)) { +// successfully initiated a retry +return; + } + + // retry is not possible, so fail remaining blocks + for (String bid : blockIdsToTransfer) { +listener.onBlockTransferFailure(bid, e); } } } @@ -189,8 +196,10 @@ public class RetryingBlockTransferor { /** * Lightweight method which initiates a retry in a different thread. The retry will involve * calling transferAllOutstanding() after a configured wait time. + * Returns true if the retry was successfully initiated, false otherwise. 
*/ - private synchronized void initiateRetry(Throwable e) { + @VisibleForTesting + synchronized boolean initiateRetry(Throwable e) { if (enableSaslRetries && e instanceof SaslTimeoutException) { saslRetryCount += 1; } @@ -201,10 +210,17 @@ public class RetryingBlockTransferor { listener.getTransferType(), retryCount, maxRetries, outstandingBlocksIds.size(), retryWaitTime); -executorService.submit(() -> { - Uninterruptibles.sleepUninterruptibly(retryWaitTime, TimeUnit.MILLISECONDS); - transferAllOutstanding(); -}); +try { + executorService.execute(() -> { +Uninterruptibles.sleepUninterruptibly(retryWaitTime, TimeUnit.MILLISECONDS); +transferAllOutstanding(); + }); +} catch (Throwable t) { + logger.error("Exception while trying to initiate retry", t); + return false; +} + +return true; } /** @@ -240,7 +256,8 @@ public class RetryingBlockTransferor { * listener. Note that in the event of a retry, we will immediately replace the 'currentListener' * field, indicating that any responses from non-current Listeners should be ignored. */ - private class RetryingBlockTransferListener implements + @VisibleForTesting + class RetryingBlockTransferListener implements BlockFetchingLis
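The core of the fix in the diff above is that `initiateRetry` now catches failures from `ExecutorService.execute` (for example a `RejectedExecutionException` from a shut-down executor) and returns `false`, so the caller fails the remaining blocks instead of hanging. A minimal standalone sketch of that pattern, with hypothetical names rather than Spark's real classes:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class RetrySketch {
    // Mirrors the fixed initiateRetry(): returns false instead of letting an
    // exception from the executor escape, so the caller can fail the blocks.
    static boolean tryInitiateRetry(ExecutorService executorService, Runnable retryTask) {
        try {
            executorService.execute(retryTask);
        } catch (Throwable t) {
            // e.g. RejectedExecutionException when the executor is shut down
            return false;
        }
        return true;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        System.out.println(tryInitiateRetry(pool, () -> {}));  // true: task accepted
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        System.out.println(tryInitiateRetry(pool, () -> {}));  // false: task rejected
    }
}
```

Before the patch, the equivalent of the second call would throw inside `initiateRetry`, leaving neither a retry scheduled nor a failure reported to the listener.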
[spark] branch master updated: [SPARK-45316][CORE][SQL] Add new parameters `ignoreCorruptFiles`/`ignoreMissingFiles` to `HadoopRDD` and `NewHadoopRDD`
This is an automated email from the ASF dual-hosted git repository. maxgekk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 60d02b444e2 [SPARK-45316][CORE][SQL] Add new parameters `ignoreCorruptFiles`/`ignoreMissingFiles` to `HadoopRDD` and `NewHadoopRDD` 60d02b444e2 is described below commit 60d02b444e2225b3afbe4955dabbea505e9f769c Author: Max Gekk AuthorDate: Tue Sep 26 17:33:07 2023 +0300 [SPARK-45316][CORE][SQL] Add new parameters `ignoreCorruptFiles`/`ignoreMissingFiles` to `HadoopRDD` and `NewHadoopRDD` ### What changes were proposed in this pull request? In the PR, I propose to add new parameters `ignoreCorruptFiles`/`ignoreMissingFiles` to `HadoopRDD` and `NewHadoopRDD`, and set them to the current value of: - `spark.files.ignoreCorruptFiles`/`ignoreMissingFiles` in Spark `core`, - `spark.sql.files.ignoreCorruptFiles`/`ignoreMissingFiles` when the RDDs are created in Spark SQL. ### Why are the changes needed? 1. To make `HadoopRDD` and `NewHadoopRDD` consistent with other RDDs like `FileScanRDD` created by Spark SQL that take into account the SQL configs `spark.sql.files.ignoreCorruptFiles`/`ignoreMissingFiles`. 2. To improve user experience with Spark SQL, so users can control ignoring of missing files without re-creating the Spark context. ### Does this PR introduce _any_ user-facing change? Yes, `HadoopRDD`/`NewHadoopRDD` invoked by SQL code such as Hive table scans respect the SQL configs `spark.sql.files.ignoreCorruptFiles`/`ignoreMissingFiles` and don't respect the core configs `spark.files.ignoreCorruptFiles`/`ignoreMissingFiles`. ### How was this patch tested? By running the affected tests: ``` $ build/sbt "test:testOnly *QueryPartitionSuite" $ build/sbt "test:testOnly *FileSuite" $ build/sbt "test:testOnly *FileBasedDataSourceSuite" ``` ### Was this patch authored or co-authored using generative AI tooling? No. 
Closes #43097 from MaxGekk/dynamic-ignoreMissingFiles. Authored-by: Max Gekk Signed-off-by: Max Gekk --- .../scala/org/apache/spark/rdd/HadoopRDD.scala | 31 ++ .../scala/org/apache/spark/rdd/NewHadoopRDD.scala | 27 +++ docs/sql-migration-guide.md| 1 + .../org/apache/spark/sql/hive/TableReader.scala| 9 --- .../spark/sql/hive/QueryPartitionSuite.scala | 6 ++--- 5 files changed, 58 insertions(+), 16 deletions(-) diff --git a/core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala b/core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala index cad107256c5..0b5f6a3d716 100644 --- a/core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala +++ b/core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala @@ -89,6 +89,8 @@ private[spark] class HadoopPartition(rddId: Int, override val index: Int, s: Inp * @param keyClass Class of the key associated with the inputFormatClass. * @param valueClass Class of the value associated with the inputFormatClass. * @param minPartitions Minimum number of HadoopRDD partitions (Hadoop Splits) to generate. + * @param ignoreCorruptFiles Whether to ignore corrupt files. + * @param ignoreMissingFiles Whether to ignore missing files. 
* * @note Instantiating this class directly is not recommended, please use * `org.apache.spark.SparkContext.hadoopRDD()` @@ -101,13 +103,36 @@ class HadoopRDD[K, V]( inputFormatClass: Class[_ <: InputFormat[K, V]], keyClass: Class[K], valueClass: Class[V], -minPartitions: Int) +minPartitions: Int, +ignoreCorruptFiles: Boolean, +ignoreMissingFiles: Boolean) extends RDD[(K, V)](sc, Nil) with Logging { if (initLocalJobConfFuncOpt.isDefined) { sparkContext.clean(initLocalJobConfFuncOpt.get) } + def this( + sc: SparkContext, + broadcastedConf: Broadcast[SerializableConfiguration], + initLocalJobConfFuncOpt: Option[JobConf => Unit], + inputFormatClass: Class[_ <: InputFormat[K, V]], + keyClass: Class[K], + valueClass: Class[V], + minPartitions: Int) = { +this( + sc, + broadcastedConf, + initLocalJobConfFuncOpt, + inputFormatClass, + keyClass, + valueClass, + minPartitions, + ignoreCorruptFiles = sc.conf.get(IGNORE_CORRUPT_FILES), + ignoreMissingFiles = sc.conf.get(IGNORE_MISSING_FILES) +) + } + def this( sc: SparkContext, conf: JobConf, @@ -135,10 +160,6 @@ class HadoopRDD[K, V]( private val shouldCloneJobConf = sparkContext.conf.getBoolean("spark.hadoop.cloneConf", false) - private val ignoreCorruptFiles = sparkContext.conf.get(IGNORE_CORRUPT_FILES) - - private val ignoreMissingFiles = sparkContext.conf.get(IGNORE_MISSING_FILES) - private val ignoreEmptySplits = sparkContext.conf.get(HADOOP_RDD_IGNORE_EMPTY_SPLITS) //
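The diff above keeps the old `HadoopRDD` constructor signature working by adding an auxiliary constructor that delegates to the new primary one, defaulting the new flags from the context's configuration. The same compatibility pattern, sketched with toy stand-in classes (not Spark's actual API):

```java
import java.util.Map;

public class CompatSketch {
    static class RddLike {
        final int minPartitions;
        final boolean ignoreCorruptFiles;
        final boolean ignoreMissingFiles;

        // New primary constructor: the flags become explicit parameters.
        RddLike(int minPartitions, boolean ignoreCorruptFiles, boolean ignoreMissingFiles) {
            this.minPartitions = minPartitions;
            this.ignoreCorruptFiles = ignoreCorruptFiles;
            this.ignoreMissingFiles = ignoreMissingFiles;
        }

        // Old signature kept for compatibility: delegate with defaults read
        // from the conf, matching what the class previously did internally.
        RddLike(Map<String, Boolean> conf, int minPartitions) {
            this(minPartitions,
                 conf.getOrDefault("spark.files.ignoreCorruptFiles", false),
                 conf.getOrDefault("spark.files.ignoreMissingFiles", false));
        }
    }

    public static void main(String[] args) {
        RddLike rdd = new RddLike(Map.of("spark.files.ignoreCorruptFiles", true), 4);
        System.out.println(rdd.ignoreCorruptFiles + " " + rdd.ignoreMissingFiles); // prints "true false"
    }
}
```

This lets SQL-side callers pass the `spark.sql.files.*` values explicitly while existing core-side callers keep their old behavior untouched.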
[spark] branch master updated: [SPARK-45271][SQL] Merge _LEGACY_ERROR_TEMP_1113 into TABLE_OPERATION & delete some unused method in QueryCompilationErrors
This is an automated email from the ASF dual-hosted git repository. wenchen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 2aa06fcf160 [SPARK-45271][SQL] Merge _LEGACY_ERROR_TEMP_1113 into TABLE_OPERATION & delete some unused method in QueryCompilationErrors 2aa06fcf160 is described below commit 2aa06fcf1607bbad9e09649e587493032e739e35 Author: panbingkun AuthorDate: Tue Sep 26 19:35:27 2023 +0800 [SPARK-45271][SQL] Merge _LEGACY_ERROR_TEMP_1113 into TABLE_OPERATION & delete some unused method in QueryCompilationErrors ### What changes were proposed in this pull request? The PR aims to - merge _LEGACY_ERROR_TEMP_1113 into UNSUPPORTED_FEATURE.TABLE_OPERATION - delete some unused methods in QueryCompilationErrors - refactor some methods to reduce the call hierarchy ### Why are the changes needed? The changes improve the error framework. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? - Pass GA - Manual tests ### Was this patch authored or co-authored using generative AI tooling? No. Closes #43044 from panbingkun/LEGACY_ERROR_TEMP_1113. 
Authored-by: panbingkun Signed-off-by: Wenchen Fan --- .../src/main/resources/error/error-classes.json| 5 -- .../spark/sql/catalyst/plans/logical/object.scala | 12 ++- .../spark/sql/errors/QueryCompilationErrors.scala | 88 +- .../main/scala/org/apache/spark/sql/Dataset.scala | 2 +- .../datasources/v2/TableCapabilityCheck.scala | 2 +- .../streaming/test/DataStreamTableAPISuite.scala | 13 +++- 6 files changed, 57 insertions(+), 65 deletions(-) diff --git a/common/utils/src/main/resources/error/error-classes.json b/common/utils/src/main/resources/error/error-classes.json index 9bcbcbc1962..5d827c67482 100644 --- a/common/utils/src/main/resources/error/error-classes.json +++ b/common/utils/src/main/resources/error/error-classes.json @@ -4097,11 +4097,6 @@ "DESCRIBE does not support partition for v2 tables." ] }, - "_LEGACY_ERROR_TEMP_1113" : { -"message" : [ - "Table does not support ." -] - }, "_LEGACY_ERROR_TEMP_1114" : { "message" : [ "The streaming sources in a query do not have a common supported execution mode.", diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/object.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/object.scala index d4851019db8..9bf8db0b4fa 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/object.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/object.scala @@ -727,16 +727,20 @@ object JoinWith { if a.sameRef(b) => catalyst.expressions.EqualTo( plan.left.resolveQuoted(a.name, resolver).getOrElse( - throw QueryCompilationErrors.resolveException(a.name, plan.left.schema.fieldNames)), + throw QueryCompilationErrors.unresolvedColumnError( +a.name, plan.left.schema.fieldNames)), plan.right.resolveQuoted(b.name, resolver).getOrElse( - throw QueryCompilationErrors.resolveException(b.name, plan.right.schema.fieldNames))) + throw QueryCompilationErrors.unresolvedColumnError( +b.name, plan.right.schema.fieldNames))) 
case catalyst.expressions.EqualNullSafe(a: AttributeReference, b: AttributeReference) if a.sameRef(b) => catalyst.expressions.EqualNullSafe( plan.left.resolveQuoted(a.name, resolver).getOrElse( - throw QueryCompilationErrors.resolveException(a.name, plan.left.schema.fieldNames)), + throw QueryCompilationErrors.unresolvedColumnError( +a.name, plan.left.schema.fieldNames)), plan.right.resolveQuoted(b.name, resolver).getOrElse( - throw QueryCompilationErrors.resolveException(b.name, plan.right.schema.fieldNames))) + throw QueryCompilationErrors.unresolvedColumnError( +b.name, plan.right.schema.fieldNames))) } } plan.copy(condition = cond) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala index 3536626d239..9d2b1225825 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala @@ -818,10 +818,6 @@ private[sql] object QueryCompilationErrors extends QueryErrorsBase with Compilat messageParameters = Map("hintName" -> hintName)) } - def attributeNameSyntaxError(name: String): Throwable = {
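The error framework this change migrates to pairs a stable error-class name with a `messageParameters` map that gets substituted into a template like the ones kept in `error-classes.json`. A toy illustration of that substitution step, using a hypothetical helper rather than Spark's actual error machinery:

```java
import java.util.Map;

public class ErrorClassSketch {
    // Substitutes <param> placeholders in an error-class template, similar in
    // spirit to how error-classes.json message templates are rendered.
    static String format(String template, Map<String, String> params) {
        String msg = template;
        for (Map.Entry<String, String> e : params.entrySet()) {
            msg = msg.replace("<" + e.getKey() + ">", e.getValue());
        }
        return msg;
    }

    public static void main(String[] args) {
        String template = "Table <tableName> does not support <operation>.";
        System.out.println(format(template,
            Map.of("tableName", "`t1`", "operation", "DESCRIBE PARTITION")));
        // prints: Table `t1` does not support DESCRIBE PARTITION.
    }
}
```

Structured parameters are what allow a legacy free-form template such as `_LEGACY_ERROR_TEMP_1113` to be folded into a shared class like `UNSUPPORTED_FEATURE.TABLE_OPERATION` without losing the per-call details.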
[spark] branch master updated: [SPARK-45309][SQL] Remove all SystemUtils.isJavaVersionAtLeast with JDK 9/11/17
This is an automated email from the ASF dual-hosted git repository. gurwls223 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new e54c866701d [SPARK-45309][SQL] Remove all SystemUtils.isJavaVersionAtLeast with JDK 9/11/17 e54c866701d is described below commit e54c866701dda617f625545192f321e88b3e614e Author: Hyukjin Kwon AuthorDate: Tue Sep 26 19:59:04 2023 +0900 [SPARK-45309][SQL] Remove all SystemUtils.isJavaVersionAtLeast with JDK 9/11/17 ### What changes were proposed in this pull request? This PR removes all SystemUtils.isJavaVersionAtLeast with JDK 9/11/17. ### Why are the changes needed? - To remove unused code. - We dropped JDK 8 and 11 at SPARK-44112 so no need to check lower versions conditionally. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? CI in this PR should test them out. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #43098 from HyukjinKwon/SPARK-45309. 
Authored-by: Hyukjin Kwon Signed-off-by: Hyukjin Kwon --- .../org/apache/spark/sql/ClientE2ETestSuite.scala | 23 +++- .../apache/spark/sql/SQLImplicitsTestSuite.scala | 11 .../org/apache/spark/internal/config/UI.scala | 4 +-- .../org/apache/spark/storage/StorageUtils.scala| 32 ++ .../org/apache/spark/util/ClosureCleaner.scala | 5 ++-- .../sql/hive/execution/InsertIntoHiveTable.scala | 23 .../hive/HiveExternalCatalogVersionsSuite.scala| 6 +--- .../spark/sql/hive/HiveSparkSubmitSuite.scala | 13 +++-- .../spark/sql/hive/client/HiveClientSuite.scala| 9 ++ .../spark/sql/hive/execution/HiveQuerySuite.scala | 8 +- 10 files changed, 35 insertions(+), 99 deletions(-) diff --git a/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/ClientE2ETestSuite.scala b/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/ClientE2ETestSuite.scala index 55718ed9c0b..c8999a2f22c 100644 --- a/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/ClientE2ETestSuite.scala +++ b/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/ClientE2ETestSuite.scala @@ -26,7 +26,6 @@ import scala.collection.mutable import org.apache.commons.io.FileUtils import org.apache.commons.io.output.TeeOutputStream -import org.apache.commons.lang3.{JavaVersion, SystemUtils} import org.scalactic.TolerantNumerics import org.scalatest.PrivateMethodTester @@ -410,18 +409,16 @@ class ClientE2ETestSuite extends RemoteSparkSession with SQLHelper with PrivateM test("write jdbc") { assume(IntegrationTestUtils.isSparkHiveJarAvailable) -if (SystemUtils.isJavaVersionAtLeast(JavaVersion.JAVA_9)) { - val url = "jdbc:derby:memory:1234" - val table = "t1" - try { -spark.range(10).write.jdbc(url = s"$url;create=true", table, new Properties()) -val result = spark.read.jdbc(url = url, table, new Properties()).collect() -assert(result.length == 10) - } finally { -// clean up -assertThrows[SparkException] { - spark.read.jdbc(url = s"$url;drop=true", table, new Properties()).collect() -} 
+val url = "jdbc:derby:memory:1234" +val table = "t1" +try { + spark.range(10).write.jdbc(url = s"$url;create=true", table, new Properties()) + val result = spark.read.jdbc(url = url, table, new Properties()).collect() + assert(result.length == 10) +} finally { + // clean up + assertThrows[SparkException] { +spark.read.jdbc(url = s"$url;drop=true", table, new Properties()).collect() } } } diff --git a/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/SQLImplicitsTestSuite.scala b/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/SQLImplicitsTestSuite.scala index 680380c91a0..2e258a356fc 100644 --- a/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/SQLImplicitsTestSuite.scala +++ b/connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/SQLImplicitsTestSuite.scala @@ -23,7 +23,7 @@ import java.util.concurrent.atomic.AtomicLong import io.grpc.inprocess.InProcessChannelBuilder import org.apache.arrow.memory.RootAllocator -import org.apache.commons.lang3.{JavaVersion, SystemUtils} +import org.apache.commons.lang3.SystemUtils import org.scalatest.BeforeAndAfterAll import org.apache.spark.sql.connect.client.SparkConnectClient @@ -146,13 +146,12 @@ class SQLImplicitsTestSuite extends ConnectFunSuite with BeforeAndAfterAll { testImplicit(BigDecimal(decimal)) testImplicit(Date.valueOf(LocalDate.now())) testImplicit(LocalDate.now()) -// SPARK-42770: Run `LocalDateTime.now()` and `Instant.now()
[spark] branch master updated: [SPARK-45323][BUILD] Upgrade `snappy` to 1.1.10.4
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 9933e9c2c54 [SPARK-45323][BUILD] Upgrade `snappy` to 1.1.10.4
9933e9c2c54 is described below

commit 9933e9c2c54e7081ef3f23c4b3804d3ecdd175ff
Author: Bjørn Jørgensen
AuthorDate: Tue Sep 26 19:57:46 2023 +0900

    [SPARK-45323][BUILD] Upgrade `snappy` to 1.1.10.4

    ### What changes were proposed in this pull request?
    Upgrade snappy from 1.1.10.3 to 1.1.10.4

    ### Why are the changes needed?
    Security fix: "Fixed SnappyInputStream so as not to allocate too large memory when decompressing data with an extremely large chunk size" by tunnelshade ([code change](https://github.com/xerial/snappy-java/commit/9f8c3cf74223ed0a8a834134be9c917b9f10ceb5)). This does not affect users only using the Snappy.compress/uncompress methods. [Release note](https://github.com/xerial/snappy-java/releases)

    Details: "While performing mitigation efforts related to [CVE-2023-34455](https://nvd.nist.gov/vuln/detail/CVE-2023-34455) in Confluent products, our Application Security team closely analyzed the fix that was accepted and merged into snappy-java version 1.1.10.1 in [this](https://github.com/xerial/snappy-java/commit/3bf67857fcf70d9eea56eed4af7c925671e8eaea) commit. The check on [line 421](https://github.com/xerial/snappy-java/commit/3bf67857fcf70d9eea56eed4af7c925671e8eaea#diff-c3e536102670929 [...]

    ### Does this PR introduce _any_ user-facing change?
    No.

    ### How was this patch tested?
    Pass GA

    ### Was this patch authored or co-authored using generative AI tooling?
    No.

    Closes #43108
    Closes #43109 from bjornjorgensen/snappy_compress.
Authored-by: Bjørn Jørgensen
Signed-off-by: Hyukjin Kwon
---
 dev/deps/spark-deps-hadoop-3-hive-2.3 | 2 +-
 pom.xml                               | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3
index f11a7d757f1..206361e1efa 100644
--- a/dev/deps/spark-deps-hadoop-3-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-3-hive-2.3
@@ -240,7 +240,7 @@ shims/0.9.45//shims-0.9.45.jar
 slf4j-api/2.0.9//slf4j-api-2.0.9.jar
 snakeyaml-engine/2.6//snakeyaml-engine-2.6.jar
 snakeyaml/2.0//snakeyaml-2.0.jar
-snappy-java/1.1.10.3//snappy-java-1.1.10.3.jar
+snappy-java/1.1.10.4//snappy-java-1.1.10.4.jar
 spire-macros_2.13/0.18.0//spire-macros_2.13-0.18.0.jar
 spire-platform_2.13/0.18.0//spire-platform_2.13-0.18.0.jar
 spire-util_2.13/0.18.0//spire-util_2.13-0.18.0.jar

diff --git a/pom.xml b/pom.xml
index 33dc854dd26..5fd3e173857 100644
--- a/pom.xml
+++ b/pom.xml
@@ -188,7 +188,7 @@
     2.15.2
     2.3.0
     3.0.2
-    1.1.10.3
+    1.1.10.4
     3.0.3
     1.16.0
     1.24.0

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
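Note that the mailing-list rendering of the `pom.xml` hunk above has stripped the XML tags, leaving only the bare version numbers. A hedged sketch of what such a Maven version bump typically looks like — the `snappy.version` property name and the dependency block here are assumptions for illustration, not copied from Spark's actual `pom.xml`:

```xml
<!-- Sketch only: the property name is an assumption, since the
     mailing-list rendering stripped the XML markup from the hunk. -->
<properties>
  <snappy.version>1.1.10.4</snappy.version>
</properties>

<dependencies>
  <dependency>
    <groupId>org.xerial.snappy</groupId>
    <artifactId>snappy-java</artifactId>
    <version>${snappy.version}</version>
  </dependency>
</dependencies>
```

Centralizing the version in one property is what lets the upgrade be a two-line diff: one line in the POM, one line in the checked-in `dev/deps` manifest.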
[spark] branch dependabot/maven/org.xerial.snappy-snappy-java-1.1.10.4 deleted (was 2b18d0c7daa)
This is an automated email from the ASF dual-hosted git repository.

github-bot pushed a change to branch dependabot/maven/org.xerial.snappy-snappy-java-1.1.10.4
in repository https://gitbox.apache.org/repos/asf/spark.git

 was 2b18d0c7daa Bump org.xerial.snappy:snappy-java from 1.1.10.3 to 1.1.10.4

The revisions that were on this branch are still contained in other references; therefore, this change does not discard any commits from the repository.
[spark] branch master updated: [SPARK-45321][TESTS] Clean up the unnecessary Scala 2.12 related binary files
This is an automated email from the ASF dual-hosted git repository.

yangjie01 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 2094e712170 [SPARK-45321][TESTS] Clean up the unnecessary Scala 2.12 related binary files
2094e712170 is described below

commit 2094e71217005655349f4e78847ccbdb6b886bd0
Author: yangjie01
AuthorDate: Tue Sep 26 18:39:24 2023 +0800

    [SPARK-45321][TESTS] Clean up the unnecessary Scala 2.12 related binary files

    ### What changes were proposed in this pull request?
    The purpose of this PR is to clean up the binary files used to assist with Scala 2.12 testing. They include:
    - `core/src/test/resources/TestHelloV3_2.12.jar` and `core/src/test/resources/TestHelloV2_2.12.jar`, added by SPARK-44246 (https://github.com/apache/spark/pull/41789)
    - `connector/connect/client/jvm/src/test/resources/udf2.12` and `connector/connect/client/jvm/src/test/resources/udf2.12.jar`, added by SPARK-43744 (https://github.com/apache/spark/pull/42069)
    - `connector/connect/client/jvm/src/test/resources/TestHelloV2_2.12.jar`, added by SPARK-44293 (https://github.com/apache/spark/pull/41844)
    - `sql/hive/src/test/resources/regression-test-SPARK-8489/test-2.12.jar`, added by SPARK-25304 (https://github.com/apache/spark/pull/22308)

    ### Why are the changes needed?
    Spark 4.0 no longer supports Scala 2.12.

    ### Does this PR introduce _any_ user-facing change?
    No

    ### How was this patch tested?
    Pass GitHub Actions

    ### Was this patch authored or co-authored using generative AI tooling?
    No

    Closes #43106 from LuciferYang/SPARK-45321.
Authored-by: yangjie01
Signed-off-by: yangjie01
---
 .../client/jvm/src/test/resources/TestHelloV2_2.12.jar   | Bin 3784 -> 0 bytes
 connector/connect/client/jvm/src/test/resources/udf2.12  | Bin 1520 -> 0 bytes
 .../connect/client/jvm/src/test/resources/udf2.12.jar    | Bin 5332 -> 0 bytes
 core/src/test/resources/TestHelloV2_2.12.jar             | Bin 3784 -> 0 bytes
 core/src/test/resources/TestHelloV3_2.12.jar             | Bin 3595 -> 0 bytes
 .../resources/regression-test-SPARK-8489/test-2.12.jar   | Bin 7179 -> 0 bytes
 6 files changed, 0 insertions(+), 0 deletions(-)

diff --git a/connector/connect/client/jvm/src/test/resources/TestHelloV2_2.12.jar b/connector/connect/client/jvm/src/test/resources/TestHelloV2_2.12.jar
deleted file mode 100644
index d89cf6543a2..000
Binary files a/connector/connect/client/jvm/src/test/resources/TestHelloV2_2.12.jar and /dev/null differ
diff --git a/connector/connect/client/jvm/src/test/resources/udf2.12 b/connector/connect/client/jvm/src/test/resources/udf2.12
deleted file mode 100644
index 1090bc90d9b..000
Binary files a/connector/connect/client/jvm/src/test/resources/udf2.12 and /dev/null differ
diff --git a/connector/connect/client/jvm/src/test/resources/udf2.12.jar b/connector/connect/client/jvm/src/test/resources/udf2.12.jar
deleted file mode 100644
index 6ce6799678f..000
Binary files a/connector/connect/client/jvm/src/test/resources/udf2.12.jar and /dev/null differ
diff --git a/core/src/test/resources/TestHelloV2_2.12.jar b/core/src/test/resources/TestHelloV2_2.12.jar
deleted file mode 100644
index d89cf6543a2..000
Binary files a/core/src/test/resources/TestHelloV2_2.12.jar and /dev/null differ
diff --git a/core/src/test/resources/TestHelloV3_2.12.jar b/core/src/test/resources/TestHelloV3_2.12.jar
deleted file mode 100644
index b175a6c8640..000
Binary files a/core/src/test/resources/TestHelloV3_2.12.jar and /dev/null differ
diff --git a/sql/hive/src/test/resources/regression-test-SPARK-8489/test-2.12.jar b/sql/hive/src/test/resources/regression-test-SPARK-8489/test-2.12.jar
deleted file mode 100644
index b0d3fd17a41..000
Binary files a/sql/hive/src/test/resources/regression-test-SPARK-8489/test-2.12.jar and /dev/null differ
[spark] branch master updated: [MINOR][BUILD] Fix lint-js
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 73d17d52a8bc [MINOR][BUILD] Fix lint-js
73d17d52a8bc is described below

commit 73d17d52a8bc2d761412e1954eaa6c0bdef44a9d
Author: panbingkun
AuthorDate: Tue Sep 26 19:18:31 2023 +0900

    [MINOR][BUILD] Fix lint-js

    ### What changes were proposed in this pull request?
    The pr aims to fix lint-js.

    ### Why are the changes needed?
    Failing run: https://github.com/panbingkun/spark/actions/runs/6306820397/job/17123186216
    (screenshot: https://github.com/apache/spark/assets/15246973/7e70617a-c15e-47de-8282-5b06b5426567)

    ### Does this PR introduce _any_ user-facing change?
    No.

    ### How was this patch tested?
    Manually test.
    (screenshot: https://github.com/apache/spark/assets/15246973/d62e1dbf-da12-478c-855a-f82df804bd75)

    ### Was this patch authored or co-authored using generative AI tooling?
    No.

    Closes #43122 from panbingkun/mirror_lint-js.
Authored-by: panbingkun
Signed-off-by: Hyukjin Kwon
---
 .../org/apache/spark/sql/execution/ui/static/spark-sql-viz.js | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sql/core/src/main/resources/org/apache/spark/sql/execution/ui/static/spark-sql-viz.js b/sql/core/src/main/resources/org/apache/spark/sql/execution/ui/static/spark-sql-viz.js
index 8999d6ff1fed..96a7a7a3cc0e 100644
--- a/sql/core/src/main/resources/org/apache/spark/sql/execution/ui/static/spark-sql-viz.js
+++ b/sql/core/src/main/resources/org/apache/spark/sql/execution/ui/static/spark-sql-viz.js
@@ -258,7 +258,7 @@ function onClickAdditionalMetricsCheckbox(checkboxNode) {
   window.localStorage.setItem("stageId-and-taskId-checked", isChecked);
 }
 
-function togglePlanViz() {
+function togglePlanViz() { // eslint-disable-line no-unused-vars
   const arrow = d3.select("#plan-viz-graph-arrow");
   arrow.each(function () {
     $(this).toggleClass("arrow-open").toggleClass("arrow-closed")