[spark] branch branch-3.1 updated: [SPARK-37392][SQL] Fix the performance bug when inferring constraints for Generate

2021-12-07 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.1 by this push:
 new c51f644  [SPARK-37392][SQL] Fix the performance bug when inferring 
constraints for Generate
c51f644 is described below

commit c51f6449d38d30d0bff22df895dca515898a520b
Author: Wenchen Fan 
AuthorDate: Wed Dec 8 13:04:40 2021 +0800

[SPARK-37392][SQL] Fix the performance bug when inferring constraints for 
Generate

### What changes were proposed in this pull request?

This is a performance regression since Spark 3.1, caused by https://issues.apache.org/jira/browse/SPARK-32295

If you run the query in the JIRA ticket
```
Seq(
  (1, "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", 
"x", "x", "x", "x", "x", "x")
).toDF()
  .checkpoint() // or save and reload to truncate lineage
  .createOrReplaceTempView("sub")

session.sql("""
  SELECT
*
  FROM
  (
SELECT
  EXPLODE( ARRAY( * ) ) result
FROM
(
  SELECT
_1 a, _2 b, _3 c, _4 d, _5 e, _6 f, _7 g, _8 h, _9 i, _10 j, _11 k, 
_12 l, _13 m, _14 n, _15 o, _16 p, _17 q, _18 r, _19 s, _20 t, _21 u
  FROM
sub
)
  )
  WHERE
result != ''
  """).show()
```
You will hit an OOM. The reason is that:
1. We infer additional predicates for `Generate`. In this case, it's `size(array(cast(_1#21 as string), _2#22, _3#23, ...)) > 0`.
2. Because of the cast, the `ConstantFolding` rule can't optimize this `size(array(...))`.
3. We end up with a plan containing this part
```
   +- Project [_1#21 AS a#106, _2#22 AS b#107, _3#23 AS c#108, _4#24 AS 
d#109, _5#25 AS e#110, _6#26 AS f#111, _7#27 AS g#112, _8#28 AS h#113, _9#29 AS 
i#114, _10#30 AS j#115, _11#31 AS k#116, _12#32 AS l#117, _13#33 AS m#118, 
_14#34 AS n#119, _15#35 AS o#120, _16#36 AS p#121, _17#37 AS q#122, _18#38 AS 
r#123, _19#39 AS s#124, _20#40 AS t#125, _21#41 AS u#126]
  +- Filter (size(array(cast(_1#21 as string), _2#22, _3#23, _4#24, 
_5#25, _6#26, _7#27, _8#28, _9#29, _10#30, _11#31, _12#32, _13#33, _14#34, 
_15#35, _16#36, _17#37, _18#38, _19#39, _20#40, _21#41), true) > 0)
 +- LogicalRDD [_1#21, _2#22, _3#23, _4#24, _5#25, _6#26, _7#27, 
_8#28, _9#29, _10#30, _11#31, _12#32, _13#33, _14#34, _15#35, _16#36, _17#37, 
_18#38, _19#39, _20#40, _21#41]
```
When calculating the constraints of the `Project`, we generate around 2^20 
expressions, due to this code
```
var allConstraints = child.constraints
projectList.foreach {
  case a @ Alias(l: Literal, _) =>
    allConstraints += EqualNullSafe(a.toAttribute, l)
  case a @ Alias(e, _) =>
    // For every alias in `projectList`, replace the reference in
    // constraints by its attribute.
    allConstraints ++= allConstraints.map(_ transform {
      case expr: Expression if expr.semanticEquals(e) =>
        a.toAttribute
    })
    allConstraints += EqualNullSafe(e, a.toAttribute)
  case _ => // Don't change.
}
```
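
To make the blow-up concrete, here is a standalone sketch (toy strings, not Catalyst expressions) of what this loop does: each alias referenced by an existing constraint keeps the old constraint *and* adds a rewritten copy, so the set doubles once per alias, and ~20 aliased columns referenced by a single inferred predicate yield on the order of 2^20 expressions.
```scala
object ConstraintBlowupSketch {
  def main(args: Array[String]): Unit = {
    // One inferred predicate that references every aliased column.
    var constraints = Set("size(array(_1, _2, ..., _16)) > 0")
    for (i <- 1 to 16) {
      // Mirrors `allConstraints ++= allConstraints.map(_ transform ...)`:
      // the existing constraints stay, and a copy rewritten to use the
      // i-th alias is added, doubling the set on every iteration.
      constraints ++= constraints.map(c => s"$c [with alias a$i]")
    }
    println(constraints.size) // 65536 = 2^16; with ~20 columns, ~2^20
  }
}
```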

There are 3 issues here:
1. We may infer complicated predicates from `Generate`.
2. The `ConstantFolding` rule is too conservative. At least `Cast` has no side effects with ANSI mode off.
3. When calculating constraints, we should have an upper bound to avoid generating too many expressions.

This PR fixes the first 2 issues and leaves the third one for the future.

### Why are the changes needed?

Fix a performance issue.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

New tests, plus running the query from the JIRA ticket locally.

Closes #34823 from cloud-fan/perf.

Authored-by: Wenchen Fan 
Signed-off-by: Wenchen Fan 
(cherry picked from commit 1fac7a9d9992b7c120f325cdfa6a935b52c7f3bc)
Signed-off-by: Wenchen Fan 
---
 .../spark/sql/catalyst/optimizer/Optimizer.scala   | 41 +
 .../spark/sql/catalyst/optimizer/expressions.scala |  1 +
 .../optimizer/InferFiltersFromGenerateSuite.scala  | 98 ++
 3 files changed, 67 insertions(+), 73 deletions(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
index 99b5240..e39fa23 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
@@ -893,25 +893,30 @@ object TransposeWindow extends Rule[LogicalPlan] {
  * by this [[Generate]] can be removed earlier - before joins and in data 
sources.
  */
 object InferFiltersFromGenerate extends Rule[LogicalPlan] {
-  def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
-// This rule does not infer filters from foldable expressions to avoid 
constant filters
-// like 'size([1, 2, 3]) > 0'. These 

[spark] branch branch-3.2 updated: [SPARK-37392][SQL] Fix the performance bug when inferring constraints for Generate

2021-12-07 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.2 by this push:
 new 59380fe  [SPARK-37392][SQL] Fix the performance bug when inferring 
constraints for Generate
59380fe is described below

commit 59380fe448a9695f8b0c609af955410dbd7ce89a
Author: Wenchen Fan 
AuthorDate: Wed Dec 8 13:04:40 2021 +0800

[SPARK-37392][SQL] Fix the performance bug when inferring constraints for 
Generate

### What changes were proposed in this pull request?

This is a performance regression since Spark 3.1, caused by 
https://issues.apache.org/jira/browse/SPARK-32295

If you run the query in the JIRA ticket
```
Seq(
  (1, "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", 
"x", "x", "x", "x", "x", "x")
).toDF()
  .checkpoint() // or save and reload to truncate lineage
  .createOrReplaceTempView("sub")

session.sql("""
  SELECT
*
  FROM
  (
SELECT
  EXPLODE( ARRAY( * ) ) result
FROM
(
  SELECT
_1 a, _2 b, _3 c, _4 d, _5 e, _6 f, _7 g, _8 h, _9 i, _10 j, _11 k, 
_12 l, _13 m, _14 n, _15 o, _16 p, _17 q, _18 r, _19 s, _20 t, _21 u
  FROM
sub
)
  )
  WHERE
result != ''
  """).show()
```
You will hit an OOM. The reason is that:
1. We infer additional predicates for `Generate`. In this case, it's `size(array(cast(_1#21 as string), _2#22, _3#23, ...)) > 0`.
2. Because of the cast, the `ConstantFolding` rule can't optimize this `size(array(...))`.
3. We end up with a plan containing this part
```
   +- Project [_1#21 AS a#106, _2#22 AS b#107, _3#23 AS c#108, _4#24 AS 
d#109, _5#25 AS e#110, _6#26 AS f#111, _7#27 AS g#112, _8#28 AS h#113, _9#29 AS 
i#114, _10#30 AS j#115, _11#31 AS k#116, _12#32 AS l#117, _13#33 AS m#118, 
_14#34 AS n#119, _15#35 AS o#120, _16#36 AS p#121, _17#37 AS q#122, _18#38 AS 
r#123, _19#39 AS s#124, _20#40 AS t#125, _21#41 AS u#126]
  +- Filter (size(array(cast(_1#21 as string), _2#22, _3#23, _4#24, 
_5#25, _6#26, _7#27, _8#28, _9#29, _10#30, _11#31, _12#32, _13#33, _14#34, 
_15#35, _16#36, _17#37, _18#38, _19#39, _20#40, _21#41), true) > 0)
 +- LogicalRDD [_1#21, _2#22, _3#23, _4#24, _5#25, _6#26, _7#27, 
_8#28, _9#29, _10#30, _11#31, _12#32, _13#33, _14#34, _15#35, _16#36, _17#37, 
_18#38, _19#39, _20#40, _21#41]
```
When calculating the constraints of the `Project`, we generate around 2^20 
expressions, due to this code
```
var allConstraints = child.constraints
projectList.foreach {
  case a @ Alias(l: Literal, _) =>
    allConstraints += EqualNullSafe(a.toAttribute, l)
  case a @ Alias(e, _) =>
    // For every alias in `projectList`, replace the reference in
    // constraints by its attribute.
    allConstraints ++= allConstraints.map(_ transform {
      case expr: Expression if expr.semanticEquals(e) =>
        a.toAttribute
    })
    allConstraints += EqualNullSafe(e, a.toAttribute)
  case _ => // Don't change.
}
```

There are 3 issues here:
1. We may infer complicated predicates from `Generate`.
2. The `ConstantFolding` rule is too conservative. At least `Cast` has no side effects with ANSI mode off.
3. When calculating constraints, we should have an upper bound to avoid generating too many expressions.

This PR fixes the first 2 issues and leaves the third one for the future.
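
A toy illustration of issue 2 above (sketch classes, not Catalyst's): `size(array(e1, ..., en))` is `n` no matter whether the elements, or casts around them, are constant, so a slightly less conservative folder can collapse the inferred filter to a literal before constraint propagation ever sees it.
```scala
// Toy expression tree; Lit, Attr, Cast, MakeArray, Size and Gt are
// illustrative stand-ins, not Catalyst classes.
sealed trait Expr
case class Lit(value: Any) extends Expr
case class Attr(name: String) extends Expr
case class Cast(child: Expr) extends Expr
case class MakeArray(children: Seq[Expr]) extends Expr
case class Size(child: Expr) extends Expr
case class Gt(left: Expr, right: Expr) extends Expr

object ToyFolding {
  def fold(e: Expr): Expr = e match {
    // size(array(...)) equals the element count even when the elements
    // (e.g. cast(_1 as string)) are not themselves foldable.
    case Size(MakeArray(children)) => Lit(children.length)
    case Gt(l, r) => (fold(l), fold(r)) match {
      case (Lit(a: Int), Lit(b: Int)) => Lit(a > b)
      case (fl, fr)                   => Gt(fl, fr)
    }
    case other => other
  }

  def main(args: Array[String]): Unit = {
    val inferred = Gt(Size(MakeArray(Seq(Cast(Attr("_1")), Attr("_2")))), Lit(0))
    println(fold(inferred)) // Lit(true): the inferred filter folds away
  }
}
```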

### Why are the changes needed?

Fix a performance issue.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

New tests, plus running the query from the JIRA ticket locally.

Closes #34823 from cloud-fan/perf.

Authored-by: Wenchen Fan 
Signed-off-by: Wenchen Fan 
(cherry picked from commit 1fac7a9d9992b7c120f325cdfa6a935b52c7f3bc)
Signed-off-by: Wenchen Fan 
---
 .../spark/sql/catalyst/optimizer/Optimizer.scala   | 39 +
 .../spark/sql/catalyst/optimizer/expressions.scala |  1 +
 .../optimizer/InferFiltersFromGenerateSuite.scala  | 98 ++
 3 files changed, 66 insertions(+), 72 deletions(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
index ae2d2ed..a92c1d0 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
@@ -1103,24 +1103,29 @@ object TransposeWindow extends Rule[LogicalPlan] {
 object InferFiltersFromGenerate extends Rule[LogicalPlan] {
   def apply(plan: LogicalPlan): LogicalPlan = plan.transformUpWithPruning(
 

[spark] branch master updated: [SPARK-37392][SQL] Fix the performance bug when inferring constraints for Generate

2021-12-07 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 1fac7a9  [SPARK-37392][SQL] Fix the performance bug when inferring 
constraints for Generate
1fac7a9 is described below

commit 1fac7a9d9992b7c120f325cdfa6a935b52c7f3bc
Author: Wenchen Fan 
AuthorDate: Wed Dec 8 13:04:40 2021 +0800

[SPARK-37392][SQL] Fix the performance bug when inferring constraints for 
Generate

### What changes were proposed in this pull request?

This is a performance regression since Spark 3.1, caused by 
https://issues.apache.org/jira/browse/SPARK-32295

If you run the query in the JIRA ticket
```
Seq(
  (1, "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", 
"x", "x", "x", "x", "x", "x")
).toDF()
  .checkpoint() // or save and reload to truncate lineage
  .createOrReplaceTempView("sub")

session.sql("""
  SELECT
*
  FROM
  (
SELECT
  EXPLODE( ARRAY( * ) ) result
FROM
(
  SELECT
_1 a, _2 b, _3 c, _4 d, _5 e, _6 f, _7 g, _8 h, _9 i, _10 j, _11 k, 
_12 l, _13 m, _14 n, _15 o, _16 p, _17 q, _18 r, _19 s, _20 t, _21 u
  FROM
sub
)
  )
  WHERE
result != ''
  """).show()
```
You will hit an OOM. The reason is that:
1. We infer additional predicates for `Generate`. In this case, it's `size(array(cast(_1#21 as string), _2#22, _3#23, ...)) > 0`.
2. Because of the cast, the `ConstantFolding` rule can't optimize this `size(array(...))`.
3. We end up with a plan containing this part
```
   +- Project [_1#21 AS a#106, _2#22 AS b#107, _3#23 AS c#108, _4#24 AS 
d#109, _5#25 AS e#110, _6#26 AS f#111, _7#27 AS g#112, _8#28 AS h#113, _9#29 AS 
i#114, _10#30 AS j#115, _11#31 AS k#116, _12#32 AS l#117, _13#33 AS m#118, 
_14#34 AS n#119, _15#35 AS o#120, _16#36 AS p#121, _17#37 AS q#122, _18#38 AS 
r#123, _19#39 AS s#124, _20#40 AS t#125, _21#41 AS u#126]
  +- Filter (size(array(cast(_1#21 as string), _2#22, _3#23, _4#24, 
_5#25, _6#26, _7#27, _8#28, _9#29, _10#30, _11#31, _12#32, _13#33, _14#34, 
_15#35, _16#36, _17#37, _18#38, _19#39, _20#40, _21#41), true) > 0)
 +- LogicalRDD [_1#21, _2#22, _3#23, _4#24, _5#25, _6#26, _7#27, 
_8#28, _9#29, _10#30, _11#31, _12#32, _13#33, _14#34, _15#35, _16#36, _17#37, 
_18#38, _19#39, _20#40, _21#41]
```
When calculating the constraints of the `Project`, we generate around 2^20 
expressions, due to this code
```
var allConstraints = child.constraints
projectList.foreach {
  case a @ Alias(l: Literal, _) =>
    allConstraints += EqualNullSafe(a.toAttribute, l)
  case a @ Alias(e, _) =>
    // For every alias in `projectList`, replace the reference in
    // constraints by its attribute.
    allConstraints ++= allConstraints.map(_ transform {
      case expr: Expression if expr.semanticEquals(e) =>
        a.toAttribute
    })
    allConstraints += EqualNullSafe(e, a.toAttribute)
  case _ => // Don't change.
}
```

There are 3 issues here:
1. We may infer complicated predicates from `Generate`.
2. The `ConstantFolding` rule is too conservative. At least `Cast` has no side effects with ANSI mode off.
3. When calculating constraints, we should have an upper bound to avoid generating too many expressions.

This PR fixes the first 2 issues and leaves the third one for the future.
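
To make issue 1 concrete, here is a sketch of the kind of inference `InferFiltersFromGenerate` performs (toy plan nodes, not the actual rule): `explode(e)` emits no rows when `e` is null or empty, so a filter on the generator input can be pushed below the `Generate`, where joins and data sources can prune rows earlier.
```scala
// Toy logical-plan nodes; illustrative only, not Catalyst's classes.
sealed trait Plan
case class Relation(output: Seq[String]) extends Plan
case class Filter(condition: String, child: Plan) extends Plan
case class Generate(explodedExpr: String, child: Plan) extends Plan

object ToyInferFiltersFromGenerate {
  // explode(e) produces rows only when e is non-null and non-empty, so
  // `size(e) > 0 AND e IS NOT NULL` can be inferred below the Generate.
  def infer(plan: Plan): Plan = plan match {
    case Generate(e, child) =>
      Generate(e, Filter(s"size($e) > 0 AND $e IS NOT NULL", child))
    case other => other
  }

  def main(args: Array[String]): Unit = {
    println(infer(Generate("array(a, b, c)", Relation(Seq("a", "b", "c")))))
  }
}
```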

### Why are the changes needed?

Fix a performance issue.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

New tests, plus running the query from the JIRA ticket locally.

Closes #34823 from cloud-fan/perf.

Authored-by: Wenchen Fan 
Signed-off-by: Wenchen Fan 
---
 .../spark/sql/catalyst/optimizer/Optimizer.scala   | 39 +
 .../spark/sql/catalyst/optimizer/expressions.scala |  1 +
 .../optimizer/InferFiltersFromGenerateSuite.scala  | 98 ++
 3 files changed, 66 insertions(+), 72 deletions(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
index 687711a..12e6888 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
@@ -1170,24 +1170,29 @@ object TransposeWindow extends Rule[LogicalPlan] {
 object InferFiltersFromGenerate extends Rule[LogicalPlan] {
   def apply(plan: LogicalPlan): LogicalPlan = plan.transformUpWithPruning(
 _.containsPattern(GENERATE)) {
-// This rule does not infer filters from foldable expressions to avoid 
constant 

[spark] branch master updated: [SPARK-37516][PYTHON][SQL] Uses Python's standard string formatter for SQL API in PySpark

2021-12-07 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 26f4953  [SPARK-37516][PYTHON][SQL] Uses Python's standard string 
formatter for SQL API in PySpark
26f4953 is described below

commit 26f495370fb45071f52cde6fff199d7f4b674bc7
Author: Hyukjin Kwon 
AuthorDate: Wed Dec 8 13:57:35 2021 +0900

[SPARK-37516][PYTHON][SQL] Uses Python's standard string formatter for SQL 
API in PySpark

### What changes were proposed in this pull request?

This PR proposes to use [Python's standard string formatter](https://docs.python.org/3/library/string.html#custom-string-formatting) in `SparkSession.sql`; see also https://github.com/apache/spark/pull/34677.

### Why are the changes needed?

To improve usability in PySpark. It works together with Python's standard string formatter.

### Does this PR introduce _any_ user-facing change?

By default, there is no user-facing change. If `kwargs` is specified, yes.

1. Attribute and item access on a DataFrame (standard Python formatter support):

```python
mydf = spark.range(10)
spark.sql("SELECT {tbl.id}, {tbl[id]} FROM {tbl}", tbl=mydf)
```

2. Understanding `DataFrame`:

```python
mydf = spark.range(10)
spark.sql("SELECT * FROM {tbl}", tbl=mydf)
```

3. Understanding `Column` (explicit column references only):

```python
mydf = spark.range(10)
spark.sql("SELECT {c} FROM {tbl}", c=col("id"), tbl=mydf)
```

4. Leveraging other Python string formatting:

```python
mydf = spark.range(10)
spark.sql(
"SELECT {col} FROM {mydf} WHERE id IN {x}",
col=mydf.id, mydf=mydf, x=tuple(range(4)))
```

### How was this patch tested?

Doctests were added.

Closes #34774 from HyukjinKwon/SPARK-37516.

Authored-by: Hyukjin Kwon 
Signed-off-by: Hyukjin Kwon 
---
 python/pyspark/pandas/sql_formatter.py   | 10 ++--
 python/pyspark/pandas/tests/test_sql.py  |  4 --
 python/pyspark/sql/session.py| 90 +---
 python/pyspark/sql/sql_formatter.py  | 84 +
 python/pyspark/sql/tests/test_session.py | 10 +++-
 5 files changed, 182 insertions(+), 16 deletions(-)

diff --git a/python/pyspark/pandas/sql_formatter.py 
b/python/pyspark/pandas/sql_formatter.py
index 685ee25..4ade2b9 100644
--- a/python/pyspark/pandas/sql_formatter.py
+++ b/python/pyspark/pandas/sql_formatter.py
@@ -163,7 +163,7 @@ def sql(
 return sql_processor.sql(query, index_col=index_col, **kwargs)
 
 session = default_session()
-formatter = SQLStringFormatter(session)
+formatter = PandasSQLStringFormatter(session)
 try:
 sdf = session.sql(formatter.format(query, **kwargs))
 finally:
@@ -178,7 +178,7 @@ def sql(
 )
 
 
-class SQLStringFormatter(string.Formatter):
+class PandasSQLStringFormatter(string.Formatter):
 """
 A standard ``string.Formatter`` in Python that can understand 
pandas-on-Spark instances
 with basic Python objects. This object has to be clear after the use for 
single SQL
@@ -191,7 +191,7 @@ class SQLStringFormatter(string.Formatter):
 self._ref_sers: List[Tuple[Series, str]] = []
 
 def vformat(self, format_string: str, args: Sequence[Any], kwargs: 
Mapping[str, Any]) -> str:
-ret = super(SQLStringFormatter, self).vformat(format_string, args, 
kwargs)
+ret = super(PandasSQLStringFormatter, self).vformat(format_string, 
args, kwargs)
 
 for ref, n in self._ref_sers:
 if not any((ref is v for v in df._pssers.values()) for df, _ in 
self._temp_views):
@@ -200,7 +200,7 @@ class SQLStringFormatter(string.Formatter):
 return ret
 
 def get_field(self, field_name: str, args: Sequence[Any], kwargs: 
Mapping[str, Any]) -> Any:
-obj, first = super(SQLStringFormatter, self).get_field(field_name, 
args, kwargs)
+obj, first = super(PandasSQLStringFormatter, 
self).get_field(field_name, args, kwargs)
 return self._convert_value(obj, field_name), first
 
 def _convert_value(self, val: Any, name: str) -> Optional[str]:
@@ -256,7 +256,7 @@ def _test() -> None:
 globs["ps"] = pyspark.pandas
 spark = (
 SparkSession.builder.master("local[4]")
-.appName("pyspark.pandas.sql_processor tests")
+.appName("pyspark.pandas.sql_formatter tests")
 .getOrCreate()
 )
 (failure_count, test_count) = doctest.testmod(
diff --git a/python/pyspark/pandas/tests/test_sql.py 
b/python/pyspark/pandas/tests/test_sql.py
index ca0dd99..5a5d6d4 100644
--- a/python/pyspark/pandas/tests/test_sql.py
+++ b/python/pyspark/pandas/tests/test_sql.py
@@ -26,10 +26,6 @@ 

[spark] branch master updated (fb40c0e -> fdc276b)

2021-12-07 Thread sunchao
This is an automated email from the ASF dual-hosted git repository.

sunchao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from fb40c0e  [SPARK-37556][SQL] Deser void class fail with Java 
serialization
 add fdc276b  [SPARK-37445][BUILD] Rename the maven hadoop profile to 
hadoop-3 and hadoop-2

No new revisions were added by this update.

Summary of changes:
 .github/workflows/build_and_test.yml   | 14 
 dev/create-release/release-build.sh| 10 +++---
 ...p-2.7-hive-2.3 => spark-deps-hadoop-2-hive-2.3} |  0
 ...p-3.2-hive-2.3 => spark-deps-hadoop-3-hive-2.3} |  0
 dev/run-tests-jenkins.py   |  8 ++---
 dev/run-tests.py   |  8 ++---
 dev/test-dependencies.sh   | 10 +++---
 docs/building-spark.md |  4 +--
 hadoop-cloud/pom.xml   |  8 ++---
 pom.xml| 14 
 python/docs/source/getting_started/install.rst | 16 -
 python/pyspark/install.py  | 31 
 python/pyspark/tests/test_install_spark.py | 42 +++---
 .../kubernetes/integration-tests/README.md |  6 ++--
 .../dev/dev-run-integration-tests.sh   |  2 +-
 .../kubernetes/integration-tests/pom.xml   |  4 +--
 resource-managers/yarn/pom.xml |  4 +--
 .../org/apache/spark/deploy/yarn/ClientSuite.scala |  2 +-
 sql/core/pom.xml   |  2 +-
 .../hive/HiveExternalCatalogVersionsSuite.scala| 12 ---
 20 files changed, 126 insertions(+), 71 deletions(-)
 rename dev/deps/{spark-deps-hadoop-2.7-hive-2.3 => 
spark-deps-hadoop-2-hive-2.3} (100%)
 rename dev/deps/{spark-deps-hadoop-3.2-hive-2.3 => 
spark-deps-hadoop-3-hive-2.3} (100%)




[spark] branch branch-3.2 updated (ce414f8 -> fd7bd3e)

2021-12-07 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a change to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git.


from ce414f8  [SPARK-37556][SQL] Deser void class fail with Java 
serialization
 add e9d2a27  Preparing Spark release v3.2.1-rc1
 new fd7bd3e  Preparing development version 3.2.2-SNAPSHOT

The 1 revision listed above as "new" is entirely new to this repository and will be described in a separate email. The revisions listed as "add" were already present in the repository and have only been added to this reference.


Summary of changes:
 R/pkg/DESCRIPTION  | 2 +-
 assembly/pom.xml   | 2 +-
 common/kvstore/pom.xml | 2 +-
 common/network-common/pom.xml  | 2 +-
 common/network-shuffle/pom.xml | 2 +-
 common/network-yarn/pom.xml| 2 +-
 common/sketch/pom.xml  | 2 +-
 common/tags/pom.xml| 2 +-
 common/unsafe/pom.xml  | 2 +-
 core/pom.xml   | 2 +-
 docs/_config.yml   | 6 +++---
 examples/pom.xml   | 2 +-
 external/avro/pom.xml  | 2 +-
 external/docker-integration-tests/pom.xml  | 2 +-
 external/kafka-0-10-assembly/pom.xml   | 2 +-
 external/kafka-0-10-sql/pom.xml| 2 +-
 external/kafka-0-10-token-provider/pom.xml | 2 +-
 external/kafka-0-10/pom.xml| 2 +-
 external/kinesis-asl-assembly/pom.xml  | 2 +-
 external/kinesis-asl/pom.xml   | 2 +-
 external/spark-ganglia-lgpl/pom.xml| 2 +-
 graphx/pom.xml | 2 +-
 hadoop-cloud/pom.xml   | 2 +-
 launcher/pom.xml   | 2 +-
 mllib-local/pom.xml| 2 +-
 mllib/pom.xml  | 2 +-
 pom.xml| 2 +-
 python/pyspark/version.py  | 2 +-
 repl/pom.xml   | 2 +-
 resource-managers/kubernetes/core/pom.xml  | 2 +-
 resource-managers/kubernetes/integration-tests/pom.xml | 2 +-
 resource-managers/mesos/pom.xml| 2 +-
 resource-managers/yarn/pom.xml | 2 +-
 sql/catalyst/pom.xml   | 2 +-
 sql/core/pom.xml   | 2 +-
 sql/hive-thriftserver/pom.xml  | 2 +-
 sql/hive/pom.xml   | 2 +-
 streaming/pom.xml  | 2 +-
 tools/pom.xml  | 2 +-
 39 files changed, 41 insertions(+), 41 deletions(-)




[spark] 01/01: Preparing development version 3.2.2-SNAPSHOT

2021-12-07 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git

commit fd7bd3ef6b7635ce35b4cc35362d814d5204cd27
Author: Huaxin Gao 
AuthorDate: Tue Dec 7 22:29:06 2021 +

Preparing development version 3.2.2-SNAPSHOT
---
 R/pkg/DESCRIPTION  | 2 +-
 assembly/pom.xml   | 2 +-
 common/kvstore/pom.xml | 2 +-
 common/network-common/pom.xml  | 2 +-
 common/network-shuffle/pom.xml | 2 +-
 common/network-yarn/pom.xml| 2 +-
 common/sketch/pom.xml  | 2 +-
 common/tags/pom.xml| 2 +-
 common/unsafe/pom.xml  | 2 +-
 core/pom.xml   | 2 +-
 docs/_config.yml   | 6 +++---
 examples/pom.xml   | 2 +-
 external/avro/pom.xml  | 2 +-
 external/docker-integration-tests/pom.xml  | 2 +-
 external/kafka-0-10-assembly/pom.xml   | 2 +-
 external/kafka-0-10-sql/pom.xml| 2 +-
 external/kafka-0-10-token-provider/pom.xml | 2 +-
 external/kafka-0-10/pom.xml| 2 +-
 external/kinesis-asl-assembly/pom.xml  | 2 +-
 external/kinesis-asl/pom.xml   | 2 +-
 external/spark-ganglia-lgpl/pom.xml| 2 +-
 graphx/pom.xml | 2 +-
 hadoop-cloud/pom.xml   | 2 +-
 launcher/pom.xml   | 2 +-
 mllib-local/pom.xml| 2 +-
 mllib/pom.xml  | 2 +-
 pom.xml| 2 +-
 python/pyspark/version.py  | 2 +-
 repl/pom.xml   | 2 +-
 resource-managers/kubernetes/core/pom.xml  | 2 +-
 resource-managers/kubernetes/integration-tests/pom.xml | 2 +-
 resource-managers/mesos/pom.xml| 2 +-
 resource-managers/yarn/pom.xml | 2 +-
 sql/catalyst/pom.xml   | 2 +-
 sql/core/pom.xml   | 2 +-
 sql/hive-thriftserver/pom.xml  | 2 +-
 sql/hive/pom.xml   | 2 +-
 streaming/pom.xml  | 2 +-
 tools/pom.xml  | 2 +-
 39 files changed, 41 insertions(+), 41 deletions(-)

diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION
index 2abad61..5590c86 100644
--- a/R/pkg/DESCRIPTION
+++ b/R/pkg/DESCRIPTION
@@ -1,6 +1,6 @@
 Package: SparkR
 Type: Package
-Version: 3.2.1
+Version: 3.2.2
 Title: R Front End for 'Apache Spark'
 Description: Provides an R Front end for 'Apache Spark' 
.
 Authors@R: c(person("Shivaram", "Venkataraman", role = "aut",
diff --git a/assembly/pom.xml b/assembly/pom.xml
index a852011..9584884 100644
--- a/assembly/pom.xml
+++ b/assembly/pom.xml
@@ -21,7 +21,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.2.1
+3.2.2-SNAPSHOT
 ../pom.xml
   
 
diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml
index 11cf0cb..167e69f 100644
--- a/common/kvstore/pom.xml
+++ b/common/kvstore/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.2.1
+3.2.2-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml
index 9957a77..eaf1c1e 100644
--- a/common/network-common/pom.xml
+++ b/common/network-common/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.2.1
+3.2.2-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml
index b3ea287..811e503 100644
--- a/common/network-shuffle/pom.xml
+++ b/common/network-shuffle/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.2.1
+3.2.2-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml
index 8fb7d4e..23513f6 100644
--- a/common/network-yarn/pom.xml
+++ b/common/network-yarn/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.2.1
+3.2.2-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/sketch/pom.xml b/common/sketch/pom.xml
index 7e4c6c3..c5c6161 100644
--- a/common/sketch/pom.xml
+++ b/common/sketch/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.2.1
+3.2.2-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/tags/pom.xml b/common/tags/pom.xml
index 

[spark] 01/01: Preparing Spark release v3.2.1-rc1

2021-12-07 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a commit to tag v3.2.1-rc1
in repository https://gitbox.apache.org/repos/asf/spark.git

commit e9d2a2796a25cab5e7237edbb95b18259e426fe4
Author: Huaxin Gao 
AuthorDate: Tue Dec 7 22:28:57 2021 +

Preparing Spark release v3.2.1-rc1
---
 assembly/pom.xml   | 2 +-
 common/kvstore/pom.xml | 2 +-
 common/network-common/pom.xml  | 2 +-
 common/network-shuffle/pom.xml | 2 +-
 common/network-yarn/pom.xml| 2 +-
 common/sketch/pom.xml  | 2 +-
 common/tags/pom.xml| 2 +-
 common/unsafe/pom.xml  | 2 +-
 core/pom.xml   | 2 +-
 docs/_config.yml   | 2 +-
 examples/pom.xml   | 2 +-
 external/avro/pom.xml  | 2 +-
 external/docker-integration-tests/pom.xml  | 2 +-
 external/kafka-0-10-assembly/pom.xml   | 2 +-
 external/kafka-0-10-sql/pom.xml| 2 +-
 external/kafka-0-10-token-provider/pom.xml | 2 +-
 external/kafka-0-10/pom.xml| 2 +-
 external/kinesis-asl-assembly/pom.xml  | 2 +-
 external/kinesis-asl/pom.xml   | 2 +-
 external/spark-ganglia-lgpl/pom.xml| 2 +-
 graphx/pom.xml | 2 +-
 hadoop-cloud/pom.xml   | 2 +-
 launcher/pom.xml   | 2 +-
 mllib-local/pom.xml| 2 +-
 mllib/pom.xml  | 2 +-
 pom.xml| 2 +-
 python/pyspark/version.py  | 2 +-
 repl/pom.xml   | 2 +-
 resource-managers/kubernetes/core/pom.xml  | 2 +-
 resource-managers/kubernetes/integration-tests/pom.xml | 2 +-
 resource-managers/mesos/pom.xml| 2 +-
 resource-managers/yarn/pom.xml | 2 +-
 sql/catalyst/pom.xml   | 2 +-
 sql/core/pom.xml   | 2 +-
 sql/hive-thriftserver/pom.xml  | 2 +-
 sql/hive/pom.xml   | 2 +-
 streaming/pom.xml  | 2 +-
 tools/pom.xml  | 2 +-
 38 files changed, 38 insertions(+), 38 deletions(-)

diff --git a/assembly/pom.xml b/assembly/pom.xml
index 5b2f449..a852011 100644
--- a/assembly/pom.xml
+++ b/assembly/pom.xml
@@ -21,7 +21,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.2.1-SNAPSHOT
+3.2.1
 ../pom.xml
   
 
diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml
index ff66ac6..11cf0cb 100644
--- a/common/kvstore/pom.xml
+++ b/common/kvstore/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.2.1-SNAPSHOT
+3.2.1
 ../../pom.xml
   
 
diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml
index 9db4231..9957a77 100644
--- a/common/network-common/pom.xml
+++ b/common/network-common/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.2.1-SNAPSHOT
+3.2.1
 ../../pom.xml
   
 
diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml
index 1788b0f..b3ea287 100644
--- a/common/network-shuffle/pom.xml
+++ b/common/network-shuffle/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.2.1-SNAPSHOT
+3.2.1
 ../../pom.xml
   
 
diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml
index 1528026..8fb7d4e 100644
--- a/common/network-yarn/pom.xml
+++ b/common/network-yarn/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.2.1-SNAPSHOT
+3.2.1
 ../../pom.xml
   
 
diff --git a/common/sketch/pom.xml b/common/sketch/pom.xml
index be18e9b..7e4c6c3 100644
--- a/common/sketch/pom.xml
+++ b/common/sketch/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.2.1-SNAPSHOT
+3.2.1
 ../../pom.xml
   
 
diff --git a/common/tags/pom.xml b/common/tags/pom.xml
index ea063d9..bdf992c 100644
--- a/common/tags/pom.xml
+++ b/common/tags/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.2.1-SNAPSHOT
+3.2.1
 ../../pom.xml
   
 
diff --git a/common/unsafe/pom.xml b/common/unsafe/pom.xml
index 0c9842c..e2db52b 100644
--- a/common/unsafe/pom.xml
+++ b/common/unsafe/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.2.1-SNAPSHOT
+3.2.1
 ../../pom.xml
   
 
diff --git 

[spark] tag v3.2.1-rc1 created (now e9d2a27)

2021-12-07 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a change to tag v3.2.1-rc1
in repository https://gitbox.apache.org/repos/asf/spark.git.


  at e9d2a27  (commit)
This tag includes the following new commits:

 new e9d2a27  Preparing Spark release v3.2.1-rc1

The 1 revision listed above as "new" is entirely new to this repository and will be described in a separate email. The revisions listed as "add" were already present in the repository and have only been added to this reference.





[spark] branch branch-3.0 updated: [SPARK-37556][SQL] Deser void class fail with Java serialization

2021-12-07 Thread srowen
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
 new 113f750  [SPARK-37556][SQL] Deser void class fail with Java 
serialization
113f750 is described below

commit 113f75058f99465281cd2065c22c0456c344be71
Author: Daniel Dai 
AuthorDate: Tue Dec 7 08:48:23 2021 -0600

[SPARK-37556][SQL] Deser void class fail with Java serialization

**What changes were proposed in this pull request?**
Change the deserialization mapping for primitive type void.

**Why are the changes needed?**
The void primitive type in Scala should map to classOf[Unit], not classOf[Void]. Spark erroneously [maps it](https://github.com/apache/spark/blob/v3.2.0/core/src/main/scala/org/apache/spark/serializer/JavaSerializer.scala#L80) differently from all other primitive types. Here is the code:
```
private object JavaDeserializationStream {
  val primitiveMappings = Map[String, Class[_]](
"boolean" -> classOf[Boolean],
"byte" -> classOf[Byte],
"char" -> classOf[Char],
"short" -> classOf[Short],
"int" -> classOf[Int],
"long" -> classOf[Long],
"float" -> classOf[Float],
"double" -> classOf[Double],
"void" -> classOf[Void]
  )
}
```
Here is a demonstration:
```
scala> classOf[Long]
val res0: Class[Long] = long

scala> classOf[Double]
val res1: Class[Double] = double

scala> classOf[Byte]
val res2: Class[Byte] = byte

scala> classOf[Void]
val res3: Class[Void] = class java.lang.Void  <--- this is wrong

scala> classOf[Unit]
val res4: Class[Unit] = void  <--- this is right
```

It will result in Spark deserialization error if the Spark code contains 
void primitive type:
`java.io.InvalidClassException: java.lang.Void; local class name 
incompatible with stream class name "void"`
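
To see this failure mode outside Spark, here is a minimal sketch: `MappingInputStream` is a made-up stand-in for Spark's `JavaDeserializationStream`, resolving primitive class names through an explicit map the way Spark does.
```scala
import java.io._

// Made-up stand-in for Spark's JavaDeserializationStream: resolve
// primitive class names through an explicit map, as Spark does.
class MappingInputStream(in: InputStream, mapping: Map[String, Class[_]])
    extends ObjectInputStream(in) {
  override protected def resolveClass(desc: ObjectStreamClass): Class[_] =
    mapping.getOrElse(desc.getName, super.resolveClass(desc))
}

object VoidRoundTrip {
  def roundTrip(voidClass: Class[_]): Class[_] = {
    val buf = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buf)
    out.writeObject(classOf[Unit]) // serialized under the stream class name "void"
    out.close()
    val in = new MappingInputStream(
      new ByteArrayInputStream(buf.toByteArray), Map("void" -> voidClass))
    in.readObject().asInstanceOf[Class[_]]
  }

  def main(args: Array[String]): Unit = {
    println(roundTrip(classOf[Unit])) // fine: prints "void"
    println(roundTrip(classOf[Void])) // InvalidClassException: local class name
                                      // incompatible with stream class name "void"
  }
}
```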

**Does this PR introduce any user-facing change?**
no

**How was this patch tested?**
Changed the test; also tested end-to-end with the code that previously hit the deserialization error, and it passes now.

Closes #34816 from daijyc/voidtype.

Authored-by: Daniel Dai 
Signed-off-by: Sean Owen 
(cherry picked from commit fb40c0e19f84f2de9a3d69d809e9e4031f76ef90)
Signed-off-by: Sean Owen 
---
 core/src/main/scala/org/apache/spark/serializer/JavaSerializer.scala  | 4 ++--
 .../test/scala/org/apache/spark/serializer/JavaSerializerSuite.scala  | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git 
a/core/src/main/scala/org/apache/spark/serializer/JavaSerializer.scala 
b/core/src/main/scala/org/apache/spark/serializer/JavaSerializer.scala
index 077b035..3c13401 100644
--- a/core/src/main/scala/org/apache/spark/serializer/JavaSerializer.scala
+++ b/core/src/main/scala/org/apache/spark/serializer/JavaSerializer.scala
@@ -87,8 +87,8 @@ private object JavaDeserializationStream {
 "long" -> classOf[Long],
 "float" -> classOf[Float],
 "double" -> classOf[Double],
-"void" -> classOf[Void]
-  )
+"void" -> classOf[Unit])
+
 }
 
 private[spark] class JavaSerializerInstance(
diff --git 
a/core/src/test/scala/org/apache/spark/serializer/JavaSerializerSuite.scala 
b/core/src/test/scala/org/apache/spark/serializer/JavaSerializerSuite.scala
index 6a6ea42..03349f8 100644
--- a/core/src/test/scala/org/apache/spark/serializer/JavaSerializerSuite.scala
+++ b/core/src/test/scala/org/apache/spark/serializer/JavaSerializerSuite.scala
@@ -47,5 +47,5 @@ private class ContainsPrimitiveClass extends Serializable {
   val floatClass = classOf[Float]
   val booleanClass = classOf[Boolean]
   val byteClass = classOf[Byte]
-  val voidClass = classOf[Void]
+  val voidClass = classOf[Unit]
 }




[spark] branch branch-3.1 updated: [SPARK-37556][SQL] Deser void class fail with Java serialization

2021-12-07 Thread srowen
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.1 by this push:
 new 2816017  [SPARK-37556][SQL] Deser void class fail with Java 
serialization
2816017 is described below

commit 281601739de100521de6009b4a65efc3e922622a
Author: Daniel Dai 
AuthorDate: Tue Dec 7 08:48:23 2021 -0600

[SPARK-37556][SQL] Deser void class fail with Java serialization

**What changes were proposed in this pull request?**
Change the deserialization mapping for primitive type void.

**Why are the changes needed?**
The void primitive type in Scala should map to classOf[Unit], not classOf[Void]. Spark erroneously [maps it](https://github.com/apache/spark/blob/v3.2.0/core/src/main/scala/org/apache/spark/serializer/JavaSerializer.scala#L80) differently from all other primitive types. Here is the code:
```
private object JavaDeserializationStream {
  val primitiveMappings = Map[String, Class[_]](
"boolean" -> classOf[Boolean],
"byte" -> classOf[Byte],
"char" -> classOf[Char],
"short" -> classOf[Short],
"int" -> classOf[Int],
"long" -> classOf[Long],
"float" -> classOf[Float],
"double" -> classOf[Double],
"void" -> classOf[Void]
  )
}
```
Here is a demonstration:
```
scala> classOf[Long]
val res0: Class[Long] = long

scala> classOf[Double]
val res1: Class[Double] = double

scala> classOf[Byte]
val res2: Class[Byte] = byte

scala> classOf[Void]
val res3: Class[Void] = class java.lang.Void  <--- this is wrong

scala> classOf[Unit]
val res4: Class[Unit] = void  <--- this is right
```

It will result in Spark deserialization error if the Spark code contains 
void primitive type:
`java.io.InvalidClassException: java.lang.Void; local class name 
incompatible with stream class name "void"`
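
The distinction in one runnable snippet (plain Scala, no Spark required):
```scala
object VoidClassNames {
  def main(args: Array[String]): Unit = {
    println(classOf[Unit].getName)  // "void": the primitive class (Void.TYPE)
    println(classOf[Void].getName)  // "java.lang.Void": the wrapper class
    // classOf[Unit] is exactly the primitive class the stream names "void".
    println(classOf[Unit] eq java.lang.Void.TYPE) // true
  }
}
```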

**Does this PR introduce any user-facing change?**
no

**How was this patch tested?**
Changed the test; also tested end-to-end with the code that previously hit the deserialization error, and it passes now.

Closes #34816 from daijyc/voidtype.

Authored-by: Daniel Dai 
Signed-off-by: Sean Owen 
(cherry picked from commit fb40c0e19f84f2de9a3d69d809e9e4031f76ef90)
Signed-off-by: Sean Owen 
---
 core/src/main/scala/org/apache/spark/serializer/JavaSerializer.scala  | 4 ++--
 .../test/scala/org/apache/spark/serializer/JavaSerializerSuite.scala  | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git 
a/core/src/main/scala/org/apache/spark/serializer/JavaSerializer.scala 
b/core/src/main/scala/org/apache/spark/serializer/JavaSerializer.scala
index 077b035..3c13401 100644
--- a/core/src/main/scala/org/apache/spark/serializer/JavaSerializer.scala
+++ b/core/src/main/scala/org/apache/spark/serializer/JavaSerializer.scala
@@ -87,8 +87,8 @@ private object JavaDeserializationStream {
 "long" -> classOf[Long],
 "float" -> classOf[Float],
 "double" -> classOf[Double],
-"void" -> classOf[Void]
-  )
+"void" -> classOf[Unit])
+
 }
 
 private[spark] class JavaSerializerInstance(
diff --git 
a/core/src/test/scala/org/apache/spark/serializer/JavaSerializerSuite.scala 
b/core/src/test/scala/org/apache/spark/serializer/JavaSerializerSuite.scala
index 6a6ea42..03349f8 100644
--- a/core/src/test/scala/org/apache/spark/serializer/JavaSerializerSuite.scala
+++ b/core/src/test/scala/org/apache/spark/serializer/JavaSerializerSuite.scala
@@ -47,5 +47,5 @@ private class ContainsPrimitiveClass extends Serializable {
   val floatClass = classOf[Float]
   val booleanClass = classOf[Boolean]
   val byteClass = classOf[Byte]
-  val voidClass = classOf[Void]
+  val voidClass = classOf[Unit]
 }




[spark] branch branch-3.2 updated: [SPARK-37556][SQL] Deser void class fail with Java serialization

2021-12-07 Thread srowen
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.2 by this push:
 new ce414f8  [SPARK-37556][SQL] Deser void class fail with Java 
serialization
ce414f8 is described below

commit ce414f82eb69a1888f0a166ce8f3bd3f209b15a6
Author: Daniel Dai 
AuthorDate: Tue Dec 7 08:48:23 2021 -0600

[SPARK-37556][SQL] Deser void class fail with Java serialization

**What changes were proposed in this pull request?**
Change the deserialization mapping for primitive type void.

**Why are the changes needed?**
The void primitive type in Scala should map to classOf[Unit], not classOf[Void]. Spark erroneously [maps it](https://github.com/apache/spark/blob/v3.2.0/core/src/main/scala/org/apache/spark/serializer/JavaSerializer.scala#L80) differently from all other primitive types. Here is the code:
```
private object JavaDeserializationStream {
  val primitiveMappings = Map[String, Class[_]](
"boolean" -> classOf[Boolean],
"byte" -> classOf[Byte],
"char" -> classOf[Char],
"short" -> classOf[Short],
"int" -> classOf[Int],
"long" -> classOf[Long],
"float" -> classOf[Float],
"double" -> classOf[Double],
"void" -> classOf[Void]
  )
}
```
Here is a demonstration:
```
scala> classOf[Long]
val res0: Class[Long] = long

scala> classOf[Double]
val res1: Class[Double] = double

scala> classOf[Byte]
val res2: Class[Byte] = byte

scala> classOf[Void]
val res3: Class[Void] = class java.lang.Void  <--- this is wrong

scala> classOf[Unit]
val res4: Class[Unit] = void  <--- this is right
```

It will result in Spark deserialization error if the Spark code contains 
void primitive type:
`java.io.InvalidClassException: java.lang.Void; local class name 
incompatible with stream class name "void"`
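
In the same spirit as the `ContainsPrimitiveClass` suite touched by this patch, here is a plain-JDK round trip of an object that captures `classOf[Unit]` (`Holder` is a made-up name for this sketch, no Spark required):
```scala
import java.io._

// A field capturing the primitive void class, like the suite's
// ContainsPrimitiveClass; Holder is a made-up name for this sketch.
case class Holder(voidClass: Class[_])

object HolderRoundTrip {
  def main(args: Array[String]): Unit = {
    val buf = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buf)
    out.writeObject(Holder(classOf[Unit])) // written with class name "void"
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(buf.toByteArray))
    // The default JDK stream resolves "void" to the primitive class; a
    // custom resolveClass (like Spark's) must do the same or this fails.
    println(in.readObject()) // Holder(void)
  }
}
```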

**Does this PR introduce any user-facing change?**
no

**How was this patch tested?**
Changed the test; also tested end-to-end with the code that previously hit the deserialization error, and it passes now.

Closes #34816 from daijyc/voidtype.

Authored-by: Daniel Dai 
Signed-off-by: Sean Owen 
(cherry picked from commit fb40c0e19f84f2de9a3d69d809e9e4031f76ef90)
Signed-off-by: Sean Owen 
---
 core/src/main/scala/org/apache/spark/serializer/JavaSerializer.scala  | 4 ++--
 .../test/scala/org/apache/spark/serializer/JavaSerializerSuite.scala  | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git 
a/core/src/main/scala/org/apache/spark/serializer/JavaSerializer.scala 
b/core/src/main/scala/org/apache/spark/serializer/JavaSerializer.scala
index 077b035..3c13401 100644
--- a/core/src/main/scala/org/apache/spark/serializer/JavaSerializer.scala
+++ b/core/src/main/scala/org/apache/spark/serializer/JavaSerializer.scala
@@ -87,8 +87,8 @@ private object JavaDeserializationStream {
 "long" -> classOf[Long],
 "float" -> classOf[Float],
 "double" -> classOf[Double],
-"void" -> classOf[Void]
-  )
+"void" -> classOf[Unit])
+
 }
 
 private[spark] class JavaSerializerInstance(
diff --git 
a/core/src/test/scala/org/apache/spark/serializer/JavaSerializerSuite.scala 
b/core/src/test/scala/org/apache/spark/serializer/JavaSerializerSuite.scala
index 6a6ea42..03349f8 100644
--- a/core/src/test/scala/org/apache/spark/serializer/JavaSerializerSuite.scala
+++ b/core/src/test/scala/org/apache/spark/serializer/JavaSerializerSuite.scala
@@ -47,5 +47,5 @@ private class ContainsPrimitiveClass extends Serializable {
   val floatClass = classOf[Float]
   val booleanClass = classOf[Boolean]
   val byteClass = classOf[Byte]
-  val voidClass = classOf[Void]
+  val voidClass = classOf[Unit]
 }




[spark] branch master updated: [SPARK-37556][SQL] Deser void class fail with Java serialization

2021-12-07 Thread srowen
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new fb40c0e  [SPARK-37556][SQL] Deser void class fail with Java 
serialization
fb40c0e is described below

commit fb40c0e19f84f2de9a3d69d809e9e4031f76ef90
Author: Daniel Dai 
AuthorDate: Tue Dec 7 08:48:23 2021 -0600

[SPARK-37556][SQL] Deser void class fail with Java serialization

**What changes were proposed in this pull request?**
Change the deserialization mapping for primitive type void.

**Why are the changes needed?**
The void primitive type in Scala should map to classOf[Unit], not classOf[Void]. Spark erroneously [maps it](https://github.com/apache/spark/blob/v3.2.0/core/src/main/scala/org/apache/spark/serializer/JavaSerializer.scala#L80) differently from all other primitive types. Here is the code:
```
private object JavaDeserializationStream {
  val primitiveMappings = Map[String, Class[_]](
"boolean" -> classOf[Boolean],
"byte" -> classOf[Byte],
"char" -> classOf[Char],
"short" -> classOf[Short],
"int" -> classOf[Int],
"long" -> classOf[Long],
"float" -> classOf[Float],
"double" -> classOf[Double],
"void" -> classOf[Void]
  )
}
```
Here is a demonstration:
```
scala> classOf[Long]
val res0: Class[Long] = long

scala> classOf[Double]
val res1: Class[Double] = double

scala> classOf[Byte]
val res2: Class[Byte] = byte

scala> classOf[Void]
val res3: Class[Void] = class java.lang.Void  <--- this is wrong

scala> classOf[Unit]
val res4: Class[Unit] = void  <--- this is right
```

It will result in Spark deserialization error if the Spark code contains 
void primitive type:
`java.io.InvalidClassException: java.lang.Void; local class name 
incompatible with stream class name "void"`
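
One way to state the invariant the fix restores (plain Scala sketch; the map mirrors the patched `primitiveMappings`): each mapped class's runtime name must equal its key, since that is what Java deserialization compares against the stream descriptor.
```scala
object PrimitiveNameCheck {
  def main(args: Array[String]): Unit = {
    val mapping = Map[String, Class[_]](
      "boolean" -> classOf[Boolean], "byte" -> classOf[Byte],
      "char" -> classOf[Char], "short" -> classOf[Short],
      "int" -> classOf[Int], "long" -> classOf[Long],
      "float" -> classOf[Float], "double" -> classOf[Double],
      "void" -> classOf[Unit]) // classOf[Void] would break this invariant
    mapping.foreach { case (name, cls) =>
      assert(cls.getName == name, s"$name resolved to ${cls.getName}")
    }
    println("all primitive names round-trip consistently")
  }
}
```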

**Does this PR introduce any user-facing change?**
no

**How was this patch tested?**
Changed the test; also tested end-to-end with the code that previously hit the deserialization error, and it passes now.

Closes #34816 from daijyc/voidtype.

Authored-by: Daniel Dai 
Signed-off-by: Sean Owen 
---
 core/src/main/scala/org/apache/spark/serializer/JavaSerializer.scala| 2 +-
 .../test/scala/org/apache/spark/serializer/JavaSerializerSuite.scala| 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git 
a/core/src/main/scala/org/apache/spark/serializer/JavaSerializer.scala 
b/core/src/main/scala/org/apache/spark/serializer/JavaSerializer.scala
index 9d76611..95d2bdc 100644
--- a/core/src/main/scala/org/apache/spark/serializer/JavaSerializer.scala
+++ b/core/src/main/scala/org/apache/spark/serializer/JavaSerializer.scala
@@ -99,7 +99,7 @@ private object JavaDeserializationStream {
 "long" -> classOf[Long],
 "float" -> classOf[Float],
 "double" -> classOf[Double],
-"void" -> classOf[Void])
+"void" -> classOf[Unit])
 
 }
 
diff --git 
a/core/src/test/scala/org/apache/spark/serializer/JavaSerializerSuite.scala 
b/core/src/test/scala/org/apache/spark/serializer/JavaSerializerSuite.scala
index 77226af..6a35fd0 100644
--- a/core/src/test/scala/org/apache/spark/serializer/JavaSerializerSuite.scala
+++ b/core/src/test/scala/org/apache/spark/serializer/JavaSerializerSuite.scala
@@ -69,5 +69,5 @@ private class ContainsPrimitiveClass extends Serializable {
   val floatClass = classOf[Float]
   val booleanClass = classOf[Boolean]
   val byteClass = classOf[Byte]
-  val voidClass = classOf[Void]
+  val voidClass = classOf[Unit]
 }




[spark] branch master updated (cd4476f -> f6e2e65)

2021-12-07 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from cd4476f  [SPARK-37469][WEBUI] unified shuffle read block time to 
shuffle read fetch wait time in StagePage
 add f6e2e65  [SPARK-37478][SQL][TESTS] Unify v1 and v2 DROP NAMESPACE tests

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/parser/DDLParserSuite.scala |  27 -
 .../spark/sql/connector/DataSourceV2SQLSuite.scala |  64 
 .../spark/sql/execution/command/DDLSuite.scala |  81 +--
 .../command/DescribeNamespaceSuiteBase.scala   |   3 -
 ...rSuite.scala => DropNamespaceParserSuite.scala} |  34 +++---
 .../execution/command/DropNamespaceSuiteBase.scala | 114 +
 ...espacesSuite.scala => DropNamespaceSuite.scala} |  31 +++---
 ...ocationSuite.scala => DropNamespaceSuite.scala} |  18 ++--
 ...titionsSuite.scala => DropNamespaceSuite.scala} |  10 +-
 9 files changed, 170 insertions(+), 212 deletions(-)
 copy 
sql/core/src/test/scala/org/apache/spark/sql/execution/command/{AlterNamespaceSetLocationParserSuite.scala
 => DropNamespaceParserSuite.scala} (52%)
 create mode 100644 
sql/core/src/test/scala/org/apache/spark/sql/execution/command/DropNamespaceSuiteBase.scala
 copy 
sql/core/src/test/scala/org/apache/spark/sql/execution/command/v1/{ShowNamespacesSuite.scala
 => DropNamespaceSuite.scala} (58%)
 copy 
sql/core/src/test/scala/org/apache/spark/sql/execution/command/v2/{AlterNamespaceSetLocationSuite.scala
 => DropNamespaceSuite.scala} (63%)
 copy 
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/command/{AlterTableRecoverPartitionsSuite.scala
 => DropNamespaceSuite.scala} (79%)
