[spark] branch master updated: [SPARK-44060][SQL] Code-gen for build side outer shuffled hash join

2023-06-30 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 2db8cfb3bd9 [SPARK-44060][SQL] Code-gen for build side outer shuffled 
hash join
2db8cfb3bd9 is described below

commit 2db8cfb3bd9bf5e85379c6d5ca414d36cfd9292d
Author: Szehon Ho 
AuthorDate: Fri Jun 30 22:04:22 2023 -0700

[SPARK-44060][SQL] Code-gen for build side outer shuffled hash join

### What changes were proposed in this pull request?
Code generation for shuffled hash join when the outer side is also the build side (i.e., left outer join with build left, or right outer join with build right).

 ### Why are the changes needed?
The implementation in https://github.com/apache/spark/pull/41398 covered only the non-codegen path; code generation was disabled for this scenario.

 ### Does this PR introduce _any_ user-facing change?
No

 ### How was this patch tested?
New unit test in WholeStageCodegenSuite
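
A minimal usage sketch (hedged: it assumes a Spark 3.5+ session named `spark`; the table sizes and hint placement are illustrative, not taken from this patch) showing how the new code-gen path can be exercised, with the internal config added below acting as an escape hatch:

```scala
// LEFT OUTER join where the left (outer) side is also the build side,
// forced to shuffled hash join via the SHUFFLE_HASH hint.
val left  = spark.range(1000).withColumnRenamed("id", "k")
val right = spark.range(10).withColumnRenamed("id", "k")

// The hinted relation is preferred as the build side.
val joined = left.hint("SHUFFLE_HASH").join(right, Seq("k"), "left_outer")
joined.explain("codegen")   // whole-stage codegen is now generated for this plan

// Escape hatch introduced by this patch (internal config, default true).
spark.conf.set("spark.sql.codegen.join.buildSideOuterShuffledHashJoin.enabled", "false")
```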

Closes #41614 from szehon-ho/same_side_outer_join_codegen_master.

Authored-by: Szehon Ho 
Signed-off-by: huaxingao 
---
 .../org/apache/spark/sql/internal/SQLConf.scala|   9 ++
 .../sql/execution/joins/ShuffledHashJoinExec.scala |  68 ++
 .../scala/org/apache/spark/sql/JoinSuite.scala | 146 +++--
 .../sql/execution/WholeStageCodegenSuite.scala |  89 +
 4 files changed, 217 insertions(+), 95 deletions(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
index d60f5d170e7..270508139e4 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
@@ -2182,6 +2182,15 @@ object SQLConf {
   .booleanConf
   .createWithDefault(true)
 
+  val ENABLE_BUILD_SIDE_OUTER_SHUFFLED_HASH_JOIN_CODEGEN =
+buildConf("spark.sql.codegen.join.buildSideOuterShuffledHashJoin.enabled")
+  .internal()
+  .doc("When true, enable code-gen for an OUTER shuffled hash join where 
outer side" +
+" is the build side.")
+  .version("3.5.0")
+  .booleanConf
+  .createWithDefault(true)
+
   val ENABLE_FULL_OUTER_SORT_MERGE_JOIN_CODEGEN =
 buildConf("spark.sql.codegen.join.fullOuterSortMergeJoin.enabled")
   .internal()
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/ShuffledHashJoinExec.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/ShuffledHashJoinExec.scala
index 8953bf19f35..974f6f9e50c 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/ShuffledHashJoinExec.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/ShuffledHashJoinExec.scala
@@ -340,8 +340,10 @@ case class ShuffledHashJoinExec(
 
   override def supportCodegen: Boolean = joinType match {
 case FullOuter => 
conf.getConf(SQLConf.ENABLE_FULL_OUTER_SHUFFLED_HASH_JOIN_CODEGEN)
-case LeftOuter if buildSide == BuildLeft => false
-case RightOuter if buildSide == BuildRight => false
+case LeftOuter if buildSide == BuildLeft =>
+  conf.getConf(SQLConf.ENABLE_BUILD_SIDE_OUTER_SHUFFLED_HASH_JOIN_CODEGEN)
+case RightOuter if buildSide == BuildRight =>
+  conf.getConf(SQLConf.ENABLE_BUILD_SIDE_OUTER_SHUFFLED_HASH_JOIN_CODEGEN)
 case _ => true
   }
 
@@ -362,9 +364,15 @@ case class ShuffledHashJoinExec(
   }
 
   override def doProduce(ctx: CodegenContext): String = {
-// Specialize `doProduce` code for full outer join, because full outer 
join needs to
-// iterate streamed and build side separately.
-if (joinType != FullOuter) {
+// Specialize `doProduce` code for full outer join and build-side outer 
join,
+// because we need to iterate streamed and build side separately.
+val specializedProduce = joinType match {
+  case FullOuter => true
+  case LeftOuter if buildSide == BuildLeft => true
+  case RightOuter if buildSide == BuildRight => true
+  case _ => false
+}
+if (!specializedProduce) {
   return super.doProduce(ctx)
 }
 
@@ -407,21 +415,24 @@ case class ShuffledHashJoinExec(
   case BuildLeft => buildResultVars ++ streamedResultVars
   case BuildRight => streamedResultVars ++ buildResultVars
 }
-val consumeFullOuterJoinRow = ctx.freshName("consumeFullOuterJoinRow")
-ctx.addNewFunction(consumeFullOuterJoinRow,
+val consumeOuterJoinRow = ctx.freshName("consumeOuterJoinRow")
+ctx.addNewFunction(consumeOuterJoinRow,
   s"""
- |private void $consumeFullOuterJoinRow() throws java.io.IOException {
+ |private void $consumeOuterJoinRow() thr

[spark] branch master updated: [SPARK-36612][SQL] Support left outer join build left or right outer join build right in shuffled hash join

2023-06-02 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 0effbec16ed [SPARK-36612][SQL] Support left outer join build left or 
right outer join build right in shuffled hash join
0effbec16ed is described below

commit 0effbec16edc27c644d4089bdf266cd4ecbed235
Author: Szehon Ho 
AuthorDate: Fri Jun 2 10:48:32 2023 -0700

[SPARK-36612][SQL] Support left outer join build left or right outer join 
build right in shuffled hash join

### What changes were proposed in this pull request?
Add support for shuffled hash join for the following scenarios:

* left outer join with left-side build
* right outer join with right-side build

The algorithm is similar to SPARK-32399, which supports shuffle-hash join 
for full outer join.

The existing methods fullOuterJoinWithUniqueKey and fullOuterJoinWithNonUniqueKey are extended to support the new cases. These methods are called after the HashedRelation has already been built for the build side, and perform two iterations:

1. Iterate the stream side.
  a. If a match is found on the build side, mark the matched build row.
  b. If there is no match, join with a null build-side row and add to the result.
2. Iterate the build side.
  a. If the row was marked as matched, add the joined row to the result.
  b. If it was not marked, join with a null stream-side row.

Left outer join with left-side build and right outer join with right-side build need only a subset of this logic: step 1b above becomes a no-op (a simplified sketch of the two passes follows below).

Codegen is left for a follow-up PR.
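
A standalone sketch of the two-pass logic described above (hedged: plain Scala collections model the build-side HashedRelation and the matched-row marking; this is an illustration, not Spark's internal implementation):

```scala
import scala.collection.mutable

// Build-side outer join (e.g. LEFT OUTER with build left) over plain key/row pairs.
def buildSideOuterJoin[K, B, S](
    build: IndexedSeq[(K, B)],
    stream: Iterable[(K, S)]): Seq[(Option[B], Option[S])] = {
  // "HashedRelation": key -> indices of build rows with that key.
  val relation = build.indices.groupBy(i => build(i)._1)
  val matched = new mutable.BitSet(build.size)
  val result = mutable.Buffer.empty[(Option[B], Option[S])]

  // Pass 1: iterate the stream side; only matching rows produce output
  // (step 1b is a no-op for a build-side outer join).
  for ((k, s) <- stream; idx <- relation.getOrElse(k, Nil)) {
    matched += idx                               // mark the matched build row
    result += ((Some(build(idx)._2), Some(s)))
  }
  // Pass 2: iterate the build side; unmatched rows join with a null stream row.
  for (i <- build.indices if !matched(i)) {
    result += ((Some(build(i)._2), None))
  }
  result.toSeq
}

// Tiny check: build keys {1, 2}, stream keys {2, 3}.
val out = buildSideOuterJoin(IndexedSeq(1 -> "b1", 2 -> "b2"), Seq(2 -> "s2", 3 -> "s3"))
assert(out == Seq((Some("b2"), Some("s2")), (Some("b1"), None)))
```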

### Why are the changes needed?
For these join types, shuffled hash join can be more performant than sort-merge join, especially when one side is very large, because it skips the expensive sort of that large table.

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
Unit test in JoinSuite.scala

Closes #41398 from szehon-ho/same_side_outer_build_join_master.

Authored-by: Szehon Ho 
Signed-off-by: huaxingao 
---
 .../spark/sql/catalyst/optimizer/joins.scala   |  4 +-
 .../spark/sql/execution/joins/HashedRelation.scala |  1 -
 .../sql/execution/joins/ShuffledHashJoinExec.scala | 74 ++---
 .../scala/org/apache/spark/sql/JoinHintSuite.scala | 30 ++---
 .../scala/org/apache/spark/sql/JoinSuite.scala | 77 ++
 5 files changed, 151 insertions(+), 35 deletions(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala
index 972be43a946..48b4007a897 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala
@@ -374,14 +374,14 @@ trait JoinSelectionHelper {
 
   def canBuildShuffledHashJoinLeft(joinType: JoinType): Boolean = {
 joinType match {
-  case _: InnerLike | RightOuter | FullOuter => true
+  case _: InnerLike | LeftOuter | FullOuter | RightOuter => true
   case _ => false
 }
   }
 
   def canBuildShuffledHashJoinRight(joinType: JoinType): Boolean = {
 joinType match {
-  case _: InnerLike | LeftOuter | FullOuter |
+  case _: InnerLike | LeftOuter | FullOuter | RightOuter |
LeftSemi | LeftAnti | _: ExistenceJoin => true
   case _ => false
 }
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala
index 4d3e63282fa..16345bb35db 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala
@@ -127,7 +127,6 @@ private[execution] object HashedRelation {
* Create a HashedRelation from an Iterator of InternalRow.
*
* @param allowsNullKeyAllow NULL keys in HashedRelation.
-   * This is used for full outer join in 
`ShuffledHashJoinExec` only.
* @param ignoresDuplicatedKey Ignore rows with duplicated keys in 
HashedRelation.
* This is only used for semi and anti join 
without join condition in
* `ShuffledHashJoinExec` only.
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/ShuffledHashJoinExec.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/ShuffledHashJoinExec.scala
index cfe35d04778..8953bf19f35 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/ShuffledHashJoinExec.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/ShuffledHas

[spark] branch branch-3.3 updated: [SPARK-41660][SQL][3.3] Only propagate metadata columns if they are used

2023-04-21 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a commit to branch branch-3.3
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.3 by this push:
 new 6d0a271c059 [SPARK-41660][SQL][3.3] Only propagate metadata columns if 
they are used
6d0a271c059 is described below

commit 6d0a271c0595d46384e18f8d292afe2d2e04e2c2
Author: huaxingao 
AuthorDate: Fri Apr 21 07:46:00 2023 -0700

[SPARK-41660][SQL][3.3] Only propagate metadata columns if they are used

### What changes were proposed in this pull request?
backporting https://github.com/apache/spark/pull/39152 to 3.3

### Why are the changes needed?
bug fixing

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
UT

Closes #40889 from huaxingao/metadata.

Authored-by: huaxingao 
Signed-off-by: huaxingao 
---
 .../apache/spark/sql/catalyst/analysis/Analyzer.scala   | 16 ++--
 .../spark/sql/connector/MetadataColumnSuite.scala   | 17 +
 2 files changed, 27 insertions(+), 6 deletions(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
index 4a5c0b4aa88..526dfd8ab5e 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
@@ -925,7 +925,7 @@ class Analyzer(override val catalogManager: CatalogManager)
 if (metaCols.isEmpty) {
   node
 } else {
-  val newNode = addMetadataCol(node)
+  val newNode = node.mapChildren(addMetadataCol(_, 
metaCols.map(_.exprId).toSet))
   // We should not change the output schema of the plan. We should 
project away the extra
   // metadata columns if necessary.
   if (newNode.sameOutput(node)) {
@@ -959,16 +959,20 @@ class Analyzer(override val catalogManager: 
CatalogManager)
   })
 }
 
-private def addMetadataCol(plan: LogicalPlan): LogicalPlan = plan match {
-  case s: ExposesMetadataColumns => s.withMetadataColumns()
-  case p: Project =>
+private def addMetadataCol(
+plan: LogicalPlan,
+requiredAttrIds: Set[ExprId]): LogicalPlan = plan match {
+  case s: ExposesMetadataColumns if s.metadataOutput.exists(a =>
+requiredAttrIds.contains(a.exprId)) =>
+s.withMetadataColumns()
+  case p: Project if p.metadataOutput.exists(a => 
requiredAttrIds.contains(a.exprId)) =>
 val newProj = p.copy(
   // Do not leak the qualified-access-only restriction to normal plan 
outputs.
   projectList = p.projectList ++ 
p.metadataOutput.map(_.markAsAllowAnyAccess()),
-  child = addMetadataCol(p.child))
+  child = addMetadataCol(p.child, requiredAttrIds))
 newProj.copyTagsFrom(p)
 newProj
-  case _ => plan.withNewChildren(plan.children.map(addMetadataCol))
+  case _ => plan.withNewChildren(plan.children.map(addMetadataCol(_, 
requiredAttrIds)))
 }
   }
 
diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/connector/MetadataColumnSuite.scala
 
b/sql/core/src/test/scala/org/apache/spark/sql/connector/MetadataColumnSuite.scala
index 7f0e74f6bc7..70338bffed0 100644
--- 
a/sql/core/src/test/scala/org/apache/spark/sql/connector/MetadataColumnSuite.scala
+++ 
b/sql/core/src/test/scala/org/apache/spark/sql/connector/MetadataColumnSuite.scala
@@ -18,6 +18,7 @@
 package org.apache.spark.sql.connector
 
 import org.apache.spark.sql.{AnalysisException, Row}
+import org.apache.spark.sql.execution.datasources.v2.DataSourceV2Relation
 import org.apache.spark.sql.functions.struct
 
 class MetadataColumnSuite extends DatasourceV2SQLBase {
@@ -232,4 +233,20 @@ class MetadataColumnSuite extends DatasourceV2SQLBase {
   )
 }
   }
+
+  test("SPARK-41660: only propagate metadata columns if they are used") {
+withTable(tbl) {
+  prepareTable()
+  val df = sql(s"SELECT t2.id FROM $tbl t1 JOIN $tbl t2 USING (id)")
+  val scans = df.logicalPlan.collect {
+case d: DataSourceV2Relation => d
+  }
+  assert(scans.length == 2)
+  scans.foreach { scan =>
+// The query only access join hidden columns, and scan nodes should 
not expose its metadata
+// columns.
+assert(scan.output.map(_.name) == Seq("id", "data"))
+  }
+}
+  }
 }





[spark] branch master updated: [SPARK-42470][SQL] Remove unused declarations from Hive module

2023-02-17 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 1389c9f5a93 [SPARK-42470][SQL] Remove unused declarations from Hive 
module
1389c9f5a93 is described below

commit 1389c9f5a932bb085c9589a6d5f1455e70d0d583
Author: yangjie01 
AuthorDate: Fri Feb 17 20:51:24 2023 -0800

[SPARK-42470][SQL] Remove unused declarations from Hive module

### What changes were proposed in this pull request?
This PR cleans up unused declarations in the Hive module:

- Input parameter `dataTypes` of the `HiveInspectors#wrap` method: `dataTypes` was introduced by SPARK-9354, but after SPARK-17509 the implementation of `HiveInspectors#wrap` no longer needs it to be passed explicitly, so it became unused; `inputDataTypes` in `HiveSimpleUDF` also becomes unused after this PR.

- `UNLIMITED_DECIMAL_PRECISION` and `UNLIMITED_DECIMAL_SCALE` in `HiveShim`: these two `val`s were introduced by SPARK-6909 for unlimited decimals, but SPARK-9069 removed unlimited-precision support from DecimalType and SPARK-14877 deleted `object HiveMetastoreTypes` in favor of `.catalogString`, so they are no longer used.

### Why are the changes needed?
Code clean up.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass GitHub Actions

Closes #40053 from LuciferYang/sql-hive-unused.

Authored-by: yangjie01 
Signed-off-by: huaxingao 
---
 .../src/main/scala/org/apache/spark/sql/hive/HiveInspectors.scala| 3 +--
 sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveShim.scala | 4 
 sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala | 5 +
 3 files changed, 2 insertions(+), 10 deletions(-)

diff --git 
a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveInspectors.scala 
b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveInspectors.scala
index 9d8437b068d..8ff96fa63c2 100644
--- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveInspectors.scala
+++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveInspectors.scala
@@ -806,8 +806,7 @@ private[hive] trait HiveInspectors {
   def wrap(
   row: Seq[Any],
   wrappers: Array[(Any) => Any],
-  cache: Array[AnyRef],
-  dataTypes: Array[DataType]): Array[AnyRef] = {
+  cache: Array[AnyRef]): Array[AnyRef] = {
 var i = 0
 val length = wrappers.length
 while (i < length) {
diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveShim.scala 
b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveShim.scala
index 351cde58427..6605d297010 100644
--- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveShim.scala
+++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveShim.scala
@@ -40,10 +40,6 @@ import org.apache.spark.sql.types.Decimal
 import org.apache.spark.util.Utils
 
 private[hive] object HiveShim {
-  // Precision and scale to pass for unlimited decimals; these are the same as 
the precision and
-  // scale Hive 0.13 infers for BigDecimals from sources that don't specify 
them (e.g. UDFs)
-  val UNLIMITED_DECIMAL_PRECISION = 38
-  val UNLIMITED_DECIMAL_SCALE = 18
   val HIVE_GENERIC_UDF_MACRO_CLS = 
"org.apache.hadoop.hive.ql.udf.generic.GenericUDFMacro"
 
   /*
diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala 
b/sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala
index d5cff31ed64..67229d494a2 100644
--- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala
+++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala
@@ -91,12 +91,9 @@ private[hive] case class HiveSimpleUDF(
   @transient
   private lazy val cached: Array[AnyRef] = new Array[AnyRef](children.length)
 
-  @transient
-  private lazy val inputDataTypes: Array[DataType] = 
children.map(_.dataType).toArray
-
   // TODO: Finish input output types.
   override def eval(input: InternalRow): Any = {
-val inputs = wrap(children.map(_.eval(input)), wrappers, cached, 
inputDataTypes)
+val inputs = wrap(children.map(_.eval(input)), wrappers, cached)
 val ret = FunctionRegistry.invoke(
   method,
   function,





[spark] branch branch-3.2 updated: [SPARK-42188][BUILD][3.2] Force SBT protobuf version to match Maven

2023-01-25 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.2 by this push:
 new 1800bff498a [SPARK-42188][BUILD][3.2] Force SBT protobuf version to 
match Maven
1800bff498a is described below

commit 1800bff498a0d52ddfdc6f376dd164ff379e7c82
Author: Steve Vaughan Jr 
AuthorDate: Wed Jan 25 18:23:00 2023 -0800

[SPARK-42188][BUILD][3.2] Force SBT protobuf version to match Maven

### What changes were proposed in this pull request?
Update `SparkBuild.scala` to force SBT's `protobuf-java` version to match Maven. The Maven dependencyManagement section pins `protobuf-java` to `2.5.0`, but SBT was using `3.14.0`.

### Why are the changes needed?
Define `protoVersion` in `SparkBuild.scala` and use it in 
`DependencyOverrides` to force the SBT version of `protobuf-java` to match the 
setting defined in the Maven top-level `pom.xml`.  Add comments to both 
`pom.xml` and `SparkBuild.scala` to ensure that the values are kept in sync.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Before the update, SBT reported using `3.14.0`:
```
% build/sbt dependencyTree | grep proto | sed 's/^.*-com/com/' | sort | 
uniq -c
   8 com.google.protobuf:protobuf-java:2.5.0 (evicted by: 3.14.0)
  70 com.google.protobuf:protobuf-java:3.14.0
```

After the patch is applied, SBT reports using `2.5.0`:
```
% build/sbt dependencyTree | grep proto | sed 's/^.*-com/com/' | sort | 
uniq -c
  70 com.google.protobuf:protobuf-java:2.5.0
```

Closes #39745 from snmvaughan/feature/SPARK-42188.

Authored-by: Steve Vaughan Jr 
Signed-off-by: huaxingao 
---
 pom.xml  | 1 +
 project/SparkBuild.scala | 6 ++
 2 files changed, 7 insertions(+)

diff --git a/pom.xml b/pom.xml
index 8127d69476c..99ed2308da3 100644
--- a/pom.xml
+++ b/pom.xml
@@ -121,6 +121,7 @@
 1.7.30
 1.2.17
 3.3.1
+
 2.5.0
 ${hadoop.version}
 3.6.2
diff --git a/project/SparkBuild.scala b/project/SparkBuild.scala
index 8c1dbceb8b3..435040344c3 100644
--- a/project/SparkBuild.scala
+++ b/project/SparkBuild.scala
@@ -79,6 +79,9 @@ object BuildCommons {
   val testTempDir = s"$sparkHome/target/tmp"
 
   val javaVersion = settingKey[String]("source and target JVM version for 
javac and scalac")
+
+  // SPARK-42188: needs to be consistent with `protobuf.version` in `pom.xml`.
+  val protoVersion = "2.5.0"
 }
 
 object SparkBuild extends PomBuild {
@@ -676,9 +679,12 @@ object KubernetesIntegrationTests {
  * Overrides to work around sbt's dependency resolution being different from 
Maven's.
  */
 object DependencyOverrides {
+  import BuildCommons.protoVersion
+
   lazy val guavaVersion = sys.props.get("guava.version").getOrElse("14.0.1")
   lazy val settings = Seq(
 dependencyOverrides += "com.google.guava" % "guava" % guavaVersion,
+dependencyOverrides += "com.google.protobuf" % "protobuf-java" % 
protoVersion,
 dependencyOverrides += "xerces" % "xercesImpl" % "2.12.0",
 dependencyOverrides += "jline" % "jline" % "2.14.6",
 dependencyOverrides += "org.apache.avro" % "avro" % "1.10.2")





[spark] branch branch-3.3 updated (d69e7b6ae46 -> 518a24c6116)

2023-01-25 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a change to branch branch-3.3
in repository https://gitbox.apache.org/repos/asf/spark.git


from d69e7b6ae46 [SPARK-42179][BUILD][SQL][3.3] Upgrade ORC to 1.7.8
 add 518a24c6116 [SPARK-42188][BUILD][3.3] Force SBT protobuf version to 
match Maven

No new revisions were added by this update.

Summary of changes:
 pom.xml  | 1 +
 project/SparkBuild.scala | 6 ++
 2 files changed, 7 insertions(+)





[spark] branch branch-3.3 updated: [SPARK-42134][SQL] Fix getPartitionFiltersAndDataFilters() to handle filters without referenced attributes

2023-01-20 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a commit to branch branch-3.3
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.3 by this push:
 new 8f09a699025 [SPARK-42134][SQL] Fix getPartitionFiltersAndDataFilters() 
to handle filters without referenced attributes
8f09a699025 is described below

commit 8f09a699025015e1882bb593b38be1b1acf62ff0
Author: Peter Toth 
AuthorDate: Fri Jan 20 18:35:33 2023 -0800

[SPARK-42134][SQL] Fix getPartitionFiltersAndDataFilters() to handle 
filters without referenced attributes

### What changes were proposed in this pull request?
This is a small correctness fix to 
`DataSourceUtils.getPartitionFiltersAndDataFilters()` to handle filters without 
any referenced attributes correctly. For example, without the fix, the following query on a ParquetV2 source:
```
spark.conf.set("spark.sql.sources.useV1SourceList", "")
spark.range(1).write.mode("overwrite").format("parquet").save(path)
df = spark.read.parquet(path).toDF("i")
f = udf(lambda x: False, "boolean")(lit(1))
r = df.filter(f)
r.show()
```
returns
```
+---+
|  i|
+---+
|  0|
+---+
```
but it should return an empty result.
The root cause is that during `V2ScanRelationPushDown`, a filter that doesn't reference any column is incorrectly identified as a partition filter.
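
A minimal standalone sketch of why the added `references.nonEmpty` check matters (hedged: plain Scala sets stand in for Catalyst's `Expression.references` and the table's partition columns; the names are illustrative):

```scala
// Model a predicate only by the set of column names it references.
case class Pred(references: Set[String])

val partitionCols = Set("dt")
val filters = Seq(Pred(Set("dt")), Pred(Set.empty)) // second models udf(...)(lit(1))

// Old behaviour: the empty set is a subset of everything, so the UDF filter is
// wrongly routed to partition pruning and never evaluated against data rows.
val (oldPartition, _) = filters.partition(_.references.subsetOf(partitionCols))
assert(oldPartition.size == 2)

// Fixed behaviour (mirrors the one-line patch): require at least one reference.
val (newPartition, newData) =
  filters.partition(f => f.references.nonEmpty && f.references.subsetOf(partitionCols))
assert(newPartition.size == 1 && newData.size == 1)
```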

### Why are the changes needed?
To fix a correctness issue.

### Does this PR introduce _any_ user-facing change?
Yes, fixes a correctness issue.

### How was this patch tested?
Added new UT.

Closes #39676 from 
peter-toth/SPARK-42134-fix-getpartitionfiltersanddatafilters.
    
Authored-by: Peter Toth 
Signed-off-by: huaxingao 
(cherry picked from commit dcdcb80c53681d1daff416c007cf8a2810155625)
Signed-off-by: huaxingao 
---
 python/pyspark/sql/tests/test_udf.py   | 18 ++
 .../sql/execution/datasources/DataSourceUtils.scala|  2 +-
 2 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/python/pyspark/sql/tests/test_udf.py 
b/python/pyspark/sql/tests/test_udf.py
index 34ac08cb818..1df16f151e1 100644
--- a/python/pyspark/sql/tests/test_udf.py
+++ b/python/pyspark/sql/tests/test_udf.py
@@ -682,6 +682,24 @@ class UDFTests(ReusedSQLTestCase):
 finally:
 shutil.rmtree(path)
 
+# SPARK-42134
+def test_file_dsv2_with_udf_filter(self):
+from pyspark.sql.functions import lit
+
+path = tempfile.mkdtemp()
+shutil.rmtree(path)
+
+try:
+with self.sql_conf({"spark.sql.sources.useV1SourceList": ""}):
+
self.spark.range(1).write.mode("overwrite").format("parquet").save(path)
+df = self.spark.read.parquet(path).toDF("i")
+f = udf(lambda x: False, "boolean")(lit(1))
+result = df.filter(f)
+self.assertEqual(0, result.count())
+
+finally:
+shutil.rmtree(path)
+
 # SPARK-25591
 def test_same_accumulator_in_udfs(self):
 data_schema = StructType(
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceUtils.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceUtils.scala
index 15d40a78f23..b61f19f4765 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceUtils.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceUtils.scala
@@ -274,7 +274,7 @@ object DataSourceUtils extends PredicateHelper {
 }
 val partitionSet = AttributeSet(partitionColumns)
 val (partitionFilters, dataFilters) = normalizedFilters.partition(f =>
-  f.references.subsetOf(partitionSet)
+  f.references.nonEmpty && f.references.subsetOf(partitionSet)
 )
 val extraPartitionFilter =
   dataFilters.flatMap(extractPredicatesWithinOutputSet(_, partitionSet))





[spark] branch master updated: [SPARK-42134][SQL] Fix getPartitionFiltersAndDataFilters() to handle filters without referenced attributes

2023-01-20 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new dcdcb80c536 [SPARK-42134][SQL] Fix getPartitionFiltersAndDataFilters() 
to handle filters without referenced attributes
dcdcb80c536 is described below

commit dcdcb80c53681d1daff416c007cf8a2810155625
Author: Peter Toth 
AuthorDate: Fri Jan 20 18:35:33 2023 -0800

[SPARK-42134][SQL] Fix getPartitionFiltersAndDataFilters() to handle 
filters without referenced attributes

### What changes were proposed in this pull request?
This is a small correctness fix to 
`DataSourceUtils.getPartitionFiltersAndDataFilters()` to handle filters without 
any referenced attributes correctly. For example, without the fix, the following query on a ParquetV2 source:
```
spark.conf.set("spark.sql.sources.useV1SourceList", "")
spark.range(1).write.mode("overwrite").format("parquet").save(path)
df = spark.read.parquet(path).toDF("i")
f = udf(lambda x: False, "boolean")(lit(1))
r = df.filter(f)
r.show()
```
returns
```
+---+
|  i|
+---+
|  0|
+---+
```
but it should return an empty result.
The root cause is that during `V2ScanRelationPushDown`, a filter that doesn't reference any column is incorrectly identified as a partition filter.

### Why are the changes needed?
To fix a correctness issue.

### Does this PR introduce _any_ user-facing change?
Yes, fixes a correctness issue.

### How was this patch tested?
Added new UT.

Closes #39676 from 
peter-toth/SPARK-42134-fix-getpartitionfiltersanddatafilters.
    
Authored-by: Peter Toth 
Signed-off-by: huaxingao 
---
 python/pyspark/sql/tests/test_udf.py   | 18 ++
 .../sql/execution/datasources/DataSourceUtils.scala|  2 +-
 2 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/python/pyspark/sql/tests/test_udf.py 
b/python/pyspark/sql/tests/test_udf.py
index 79fee7a48e5..1a2ec213ca6 100644
--- a/python/pyspark/sql/tests/test_udf.py
+++ b/python/pyspark/sql/tests/test_udf.py
@@ -681,6 +681,24 @@ class BaseUDFTests(object):
 finally:
 shutil.rmtree(path)
 
+# SPARK-42134
+def test_file_dsv2_with_udf_filter(self):
+from pyspark.sql.functions import lit
+
+path = tempfile.mkdtemp()
+shutil.rmtree(path)
+
+try:
+with self.sql_conf({"spark.sql.sources.useV1SourceList": ""}):
+
self.spark.range(1).write.mode("overwrite").format("parquet").save(path)
+df = self.spark.read.parquet(path).toDF("i")
+f = udf(lambda x: False, "boolean")(lit(1))
+result = df.filter(f)
+self.assertEqual(0, result.count())
+
+finally:
+shutil.rmtree(path)
+
 # SPARK-25591
 def test_same_accumulator_in_udfs(self):
 data_schema = StructType(
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceUtils.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceUtils.scala
index 26f22670a51..5eb422f80e2 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceUtils.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceUtils.scala
@@ -273,7 +273,7 @@ object DataSourceUtils extends PredicateHelper {
 }
 val partitionSet = AttributeSet(partitionColumns)
 val (partitionFilters, dataFilters) = normalizedFilters.partition(f =>
-  f.references.subsetOf(partitionSet)
+  f.references.nonEmpty && f.references.subsetOf(partitionSet)
 )
 val extraPartitionFilter =
   dataFilters.flatMap(extractPredicatesWithinOutputSet(_, partitionSet))





[spark] branch master updated: [SPARK-42031][CORE][SQL] Clean up `remove` methods that do not need override

2023-01-12 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new af935e6ef19 [SPARK-42031][CORE][SQL] Clean up `remove` methods that do 
not need override
af935e6ef19 is described below

commit af935e6ef19e34329d0210ebc17cc6e7359458d2
Author: yangjie01 
AuthorDate: Thu Jan 12 13:47:56 2023 -0800

[SPARK-42031][CORE][SQL] Clean up `remove` methods that do not need override

### What changes were proposed in this pull request?
Java 8 began to provide the default remove method implementation for the 
`java.util.Iterator` interface.


https://github.com/openjdk/jdk/blob/9a9add8825a040565051a09010b29b099c2e7d49/jdk/src/share/classes/java/util/Iterator.java#L92-L94

```java
default void remove() {
throw new UnsupportedOperationException("remove");
}
```

So this PR cleans up the unnecessary `remove` method overrides in the Spark code.

### Why are the changes needed?
Code cleanup

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass Github Actions

Closes #39533 from LuciferYang/cleanup-remove.

Authored-by: yangjie01 
Signed-off-by: huaxingao 
---
 .../src/main/java/org/apache/spark/util/kvstore/InMemoryStore.java   | 5 -
 .../src/main/java/org/apache/spark/util/kvstore/LevelDBIterator.java | 5 -
 .../src/main/java/org/apache/spark/util/kvstore/RocksDBIterator.java | 5 -
 core/src/main/java/org/apache/spark/unsafe/map/BytesToBytesMap.java  | 5 -
 .../src/main/java/org/apache/spark/sql/vectorized/ColumnarBatch.java | 5 -
 .../apache/spark/sql/execution/vectorized/ColumnarBatchSuite.scala   | 4 
 .../src/main/java/org/apache/hive/service/cli/ColumnBasedSet.java| 5 -
 7 files changed, 34 deletions(-)

diff --git 
a/common/kvstore/src/main/java/org/apache/spark/util/kvstore/InMemoryStore.java 
b/common/kvstore/src/main/java/org/apache/spark/util/kvstore/InMemoryStore.java
index 431c7e42774..a353a53d4b8 100644
--- 
a/common/kvstore/src/main/java/org/apache/spark/util/kvstore/InMemoryStore.java
+++ 
b/common/kvstore/src/main/java/org/apache/spark/util/kvstore/InMemoryStore.java
@@ -468,11 +468,6 @@ public class InMemoryStore implements KVStore {
   return iter.next();
 }
 
-@Override
-public void remove() {
-  throw new UnsupportedOperationException();
-}
-
 @Override
 public List next(int max) {
   List list = new ArrayList<>(max);
diff --git 
a/common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDBIterator.java
 
b/common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDBIterator.java
index 62f7768ea7f..35d0c6065fb 100644
--- 
a/common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDBIterator.java
+++ 
b/common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDBIterator.java
@@ -143,11 +143,6 @@ class LevelDBIterator implements KVStoreIterator {
 }
   }
 
-  @Override
-  public void remove() {
-throw new UnsupportedOperationException();
-  }
-
   @Override
   public List next(int max) {
 List list = new ArrayList<>(max);
diff --git 
a/common/kvstore/src/main/java/org/apache/spark/util/kvstore/RocksDBIterator.java
 
b/common/kvstore/src/main/java/org/apache/spark/util/kvstore/RocksDBIterator.java
index fce50a5fc22..2b12fddef65 100644
--- 
a/common/kvstore/src/main/java/org/apache/spark/util/kvstore/RocksDBIterator.java
+++ 
b/common/kvstore/src/main/java/org/apache/spark/util/kvstore/RocksDBIterator.java
@@ -134,11 +134,6 @@ class RocksDBIterator implements KVStoreIterator {
 }
   }
 
-  @Override
-  public void remove() {
-throw new UnsupportedOperationException();
-  }
-
   @Override
   public List next(int max) {
 List list = new ArrayList<>(max);
diff --git 
a/core/src/main/java/org/apache/spark/unsafe/map/BytesToBytesMap.java 
b/core/src/main/java/org/apache/spark/unsafe/map/BytesToBytesMap.java
index f4f4052b4fa..35c5efc77f6 100644
--- a/core/src/main/java/org/apache/spark/unsafe/map/BytesToBytesMap.java
+++ b/core/src/main/java/org/apache/spark/unsafe/map/BytesToBytesMap.java
@@ -387,11 +387,6 @@ public final class BytesToBytesMap extends MemoryConsumer {
   return released;
 }
 
-@Override
-public void remove() {
-  throw new UnsupportedOperationException();
-}
-
 private void handleFailedDelete() {
   if (spillWriters.size() > 0) {
 // remove the spill file from disk
diff --git 
a/sql/catalyst/src/main/java/org/apache/spark/sql/vectorized/ColumnarBatch.java 
b/sql/catalyst/src/main/java/org/apache/spark/sql/vectorized/ColumnarBatch.java
index b0fd1486420..9e859e77644 100644
--- 
a/sql/catalyst/src/main/java/org/apache/spark/sql/vectorized/ColumnarBatch.java
+++ 
b/sql/catalyst

[spark] branch branch-3.3 updated (cd9f5642060 -> b18d582c7a0)

2022-09-09 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a change to branch branch-3.3
in repository https://gitbox.apache.org/repos/asf/spark.git


from cd9f5642060 [SPARK-40389][SQL] Decimals can't upcast as integral types 
if the cast can overflow
 add b18d582c7a0 [SPARK-40280][SQL][FOLLOWUP][3.3] Fix 'ParquetFilterSuite' 
issue

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala  | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)





[spark] branch branch-3.2 updated (24818bf4c9d -> 8a882d5da58)

2022-09-09 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a change to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git


from 24818bf4c9d [SPARK-40280][SQL] Add support for parquet push down for 
annotated int and long
 add 8a882d5da58 [SPARK-40280][SQL][FOLLOWUP][3.2] Fix 'ParquetFilterSuite' 
issue

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala  | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)





[spark] branch master updated (64dd81d97da -> cee30cb4994)

2022-08-30 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 64dd81d97da [SPARK-40056][BUILD] Upgrade mvn-scalafmt from 1.0.4 to 
1.1.1640084764.9f463a9
 add cee30cb4994 [SPARK-40113][SQL] Reactor ParquetScanBuilder DataSourceV2 
interface implementations

No new revisions were added by this update.

Summary of changes:
 .../v2/parquet/ParquetScanBuilder.scala| 26 --
 1 file changed, 9 insertions(+), 17 deletions(-)





[spark] branch master updated (cbd09346d40 -> 42719d9425b)

2022-08-24 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from cbd09346d40 [SPARK-40094][CORE] Send TaskEnd event when task failed 
with NotSerializableException or TaskOutputFileAlreadyExistException
 add 42719d9425b [SPARK-39528][SQL][FOLLOWUP] Make 
DynamicPartitionPruningV2FilterSuite extend DynamicPartitionPruningV2Suite

No new revisions were added by this update.

Summary of changes:
 .../scala/org/apache/spark/sql/DynamicPartitionPruningSuite.scala| 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)





[spark] branch master updated: [SPARK-40064][SQL] Use V2 Filter in SupportsOverwrite

2022-08-15 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 1103343e71f [SPARK-40064][SQL] Use V2 Filter in SupportsOverwrite
1103343e71f is described below

commit 1103343e71fbcb478fa41941c87d2c28b0c09281
Author: huaxingao 
AuthorDate: Mon Aug 15 10:58:14 2022 -0700

[SPARK-40064][SQL] Use V2 Filter in SupportsOverwrite

### What changes were proposed in this pull request?
Migrate `SupportsOverwrite` to use V2 Filter

### Why are the changes needed?
this is part of the V2Filter migration work

### Does this PR introduce _any_ user-facing change?
Yes
add `SupportsOverwriteV2`
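
A hedged sketch of building the V2 predicates that the new overwrite-by-filter path consumes instead of V1 `Filter`s (public connector-expressions API; the column name and literal are illustrative, and a connector implementing `SupportsOverwriteV2` would translate such predicates into its own replacement condition):

```scala
import org.apache.spark.sql.connector.expressions.{Expression, Expressions, LiteralValue}
import org.apache.spark.sql.connector.expressions.filter.Predicate
import org.apache.spark.sql.types.IntegerType

// V2 equivalent of the V1 filter EqualTo("dept", 1).
val children: Array[Expression] =
  Array(Expressions.column("dept"), LiteralValue(1, IntegerType))
val overwriteCondition: Array[Predicate] = Array(new Predicate("=", children))
// Rows matching all predicates in the array are replaced by the newly written data.
```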

### How was this patch tested?
new tests

Closes #37502 from huaxingao/v2overwrite.

Authored-by: huaxingao 
Signed-off-by: huaxingao 
---
 .../sql/connector/catalog/TableCapability.java |   2 +-
 .../connector/write/SupportsDynamicOverwrite.java  |   2 +-
 .../sql/connector/write/SupportsOverwrite.java |  31 ++-
 ...ortsOverwrite.java => SupportsOverwriteV2.java} |  31 ++-
 .../sql/connector/catalog/InMemoryBaseTable.scala  | 138 +++-
 .../sql/connector/catalog/InMemoryTable.scala  |  99 -
 .../catalog/InMemoryTableWithV2Filter.scala|  72 +--
 .../sql/execution/datasources/v2/V2Writes.scala|  23 +-
 .../spark/sql/connector/DataSourceV2SQLSuite.scala | 233 +++--
 .../spark/sql/connector/DeleteFromTests.scala  | 132 
 .../spark/sql/connector/V1WriteFallbackSuite.scala |   4 +-
 11 files changed, 412 insertions(+), 355 deletions(-)

diff --git 
a/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/TableCapability.java
 
b/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/TableCapability.java
index 5bb42fb4b31..5732c0f3af4 100644
--- 
a/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/TableCapability.java
+++ 
b/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/TableCapability.java
@@ -76,7 +76,7 @@ public enum TableCapability {
* Signals that the table can replace existing data that matches a filter 
with appended data in
* a write operation.
* 
-   * See {@link org.apache.spark.sql.connector.write.SupportsOverwrite}.
+   * See {@link org.apache.spark.sql.connector.write.SupportsOverwriteV2}.
*/
   OVERWRITE_BY_FILTER,
 
diff --git 
a/sql/catalyst/src/main/java/org/apache/spark/sql/connector/write/SupportsDynamicOverwrite.java
 
b/sql/catalyst/src/main/java/org/apache/spark/sql/connector/write/SupportsDynamicOverwrite.java
index 422cd71d345..0288a679891 100644
--- 
a/sql/catalyst/src/main/java/org/apache/spark/sql/connector/write/SupportsDynamicOverwrite.java
+++ 
b/sql/catalyst/src/main/java/org/apache/spark/sql/connector/write/SupportsDynamicOverwrite.java
@@ -27,7 +27,7 @@ import org.apache.spark.annotation.Evolving;
  * write does not contain data will remain unchanged.
  * 
  * This is provided to implement SQL compatible with Hive table operations but 
is not recommended.
- * Instead, use the {@link SupportsOverwrite overwrite by filter API} to 
explicitly replace data.
+ * Instead, use the {@link SupportsOverwriteV2 overwrite by filter API} to 
explicitly replace data.
  *
  * @since 3.0.0
  */
diff --git 
a/sql/catalyst/src/main/java/org/apache/spark/sql/connector/write/SupportsOverwrite.java
 
b/sql/catalyst/src/main/java/org/apache/spark/sql/connector/write/SupportsOverwrite.java
index b4e60257942..51bec236088 100644
--- 
a/sql/catalyst/src/main/java/org/apache/spark/sql/connector/write/SupportsOverwrite.java
+++ 
b/sql/catalyst/src/main/java/org/apache/spark/sql/connector/write/SupportsOverwrite.java
@@ -18,6 +18,8 @@
 package org.apache.spark.sql.connector.write;
 
 import org.apache.spark.annotation.Evolving;
+import org.apache.spark.sql.connector.expressions.filter.Predicate;
+import org.apache.spark.sql.internal.connector.PredicateUtils;
 import org.apache.spark.sql.sources.AlwaysTrue$;
 import org.apache.spark.sql.sources.Filter;
 
@@ -30,7 +32,24 @@ import org.apache.spark.sql.sources.Filter;
  * @since 3.0.0
  */
 @Evolving
-public interface SupportsOverwrite extends WriteBuilder, SupportsTruncate {
+public interface SupportsOverwrite extends SupportsOverwriteV2 {
+
+  /**
+   * Checks whether it is possible to overwrite data from a data source table 
that matches filter
+   * expressions.
+   * 
+   * Rows should be overwritten from the data source iff all of the filter 
expressions match.
+   * That is, the expressions must be interpreted as a set of filters that are 
ANDed together.
+   *
+   * @param filters V2 filter expressions, used to match data to overwrite
+   * @return true if the delete operation can be performed
+   *
+   * @since 3.4.0
+   */
+  default boolean canOverwrite(Fil

[spark] branch master updated (71792411083 -> 9dff034bdef)

2022-08-11 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 71792411083 [SPARK-40027][PYTHON][SS][DOCS] Add self-contained 
examples for pyspark.sql.streaming.readwriter
 add 9dff034bdef [SPARK-39966][SQL] Use V2 Filter in SupportsDelete

No new revisions were added by this update.

Summary of changes:
 .../sql/connector/catalog/SupportsDelete.java  |  14 +-
 .../{SupportsDelete.java => SupportsDeleteV2.java} |  32 +-
 .../connector/read/SupportsRuntimeFiltering.java   |  19 +-
 .../catalyst/analysis/RewriteDeleteFromTable.scala |   6 +-
 .../sql/catalyst/plans/logical/v2Commands.scala|   5 +-
 .../spark/sql/errors/QueryCompilationErrors.scala  |   3 +-
 .../datasources/v2/DataSourceV2Implicits.scala |   6 +-
 .../sql/internal/connector/PredicateUtils.scala|   8 +-
 ...InMemoryTable.scala => InMemoryBaseTable.scala} |  38 +-
 .../sql/connector/catalog/InMemoryTable.scala  | 646 +
 .../catalog/InMemoryTableWithV2Filter.scala|  81 ++-
 .../datasources/v2/DataSourceV2Strategy.scala  |   8 +-
 .../datasources/v2/DeleteFromTableExec.scala   |   8 +-
 .../v2/OptimizeMetadataOnlyDeleteFromTable.scala   |  12 +-
 .../spark/sql/connector/DataSourceV2SQLSuite.scala |  89 ++-
 .../spark/sql/connector/DatasourceV2SQLBase.scala  |   4 +-
 .../spark/sql/connector/V1WriteFallbackSuite.scala |   4 +-
 17 files changed, 265 insertions(+), 718 deletions(-)
 copy 
sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/{SupportsDelete.java
 => SupportsDeleteV2.java} (74%)
 copy 
sql/catalyst/src/test/scala/org/apache/spark/sql/connector/catalog/{InMemoryTable.scala
 => InMemoryBaseTable.scala} (94%)





[spark] branch master updated: [SPARK-39914][SQL] Add DS V2 Filter to V1 Filter conversion

2022-08-01 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 2ef738205c0 [SPARK-39914][SQL] Add DS V2 Filter to V1 Filter conversion
2ef738205c0 is described below

commit 2ef738205c0d4598a577a248afc117ac0844f3ad
Author: huaxingao 
AuthorDate: Mon Aug 1 11:23:13 2022 -0700

[SPARK-39914][SQL] Add DS V2 Filter to V1 Filter conversion

### What changes were proposed in this pull request?
Add util methods to convert DS V2 Filter to V1 Filter.

### Why are the changes needed?
Provide convenient methods to convert V2 to V1 Filters. These methods can 
be used by 
[`SupportsRuntimeFiltering`](https://github.com/apache/spark/pull/36918/files#diff-0d3268f351817ca948e75e7b6641e5cc67c4d773c3234920a7aa62faf11f6c8e)
 and later be used by `SupportsDelete`
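
A standalone miniature of the V2-to-V1 mapping these util methods implement (hedged: the real helper, `PredicateUtils.toV1`, is `private[sql]`, so the case classes below only model the shape of the conversion rather than using Spark's classes):

```scala
// Simplified stand-ins for org.apache.spark.sql.sources V1 filters.
sealed trait V1Filter
case class EqualTo(attr: String, value: Any) extends V1Filter
case class GreaterThan(attr: String, value: Any) extends V1Filter
case class IsNull(attr: String) extends V1Filter

// A V2 predicate is a named operator over children; only the
// `column op literal` and unary null-check shapes are modeled here.
case class V2Pred(name: String, attr: String, value: Option[Any] = None)

def toV1(p: V2Pred): Option[V1Filter] = p.name match {
  case "="       => p.value.map(EqualTo(p.attr, _))
  case ">"       => p.value.map(GreaterThan(p.attr, _))
  case "IS_NULL" => Some(IsNull(p.attr))
  case _         => None // unsupported predicates simply do not translate
}

assert(toV1(V2Pred("=", "dept", Some(1))) == Some(EqualTo("dept", 1)))
assert(toV1(V2Pred("RLIKE", "name", Some("a.*"))).isEmpty)
```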

### Does this PR introduce _any_ user-facing change?
No. These are intended for internal use only

### How was this patch tested?
new tests

Closes #37332 from huaxingao/toV1.

Authored-by: huaxingao 
Signed-off-by: huaxingao 
---
 .../sql/internal/connector/PredicateUtils.scala| 92 +-
 .../datasources/v2/V2PredicateSuite.scala  | 85 
 2 files changed, 174 insertions(+), 3 deletions(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/connector/PredicateUtils.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/connector/PredicateUtils.scala
index ace6b30d4cc..263edd82197 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/connector/PredicateUtils.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/connector/PredicateUtils.scala
@@ -19,14 +19,25 @@ package org.apache.spark.sql.internal.connector
 
 import org.apache.spark.sql.catalyst.CatalystTypeConverters
 import org.apache.spark.sql.connector.expressions.{LiteralValue, 
NamedReference}
-import org.apache.spark.sql.connector.expressions.filter.Predicate
-import org.apache.spark.sql.sources.{Filter, In}
+import org.apache.spark.sql.connector.expressions.filter.{And => V2And, Not => 
V2Not, Or => V2Or, Predicate}
+import org.apache.spark.sql.sources.{AlwaysFalse, AlwaysTrue, And, 
EqualNullSafe, EqualTo, Filter, GreaterThan, GreaterThanOrEqual, In, IsNotNull, 
IsNull, LessThan, LessThanOrEqual, Not, Or, StringContains, StringEndsWith, 
StringStartsWith}
+import org.apache.spark.sql.types.StringType
 
 private[sql] object PredicateUtils {
 
   def toV1(predicate: Predicate): Option[Filter] = {
+
+def isValidBinaryPredicate(): Boolean = {
+  if (predicate.children().length == 2 &&
+predicate.children()(0).isInstanceOf[NamedReference] &&
+predicate.children()(1).isInstanceOf[LiteralValue[_]]) {
+true
+  } else {
+false
+  }
+}
+
 predicate.name() match {
-  // TODO: add conversion for other V2 Predicate
   case "IN" if predicate.children()(0).isInstanceOf[NamedReference] =>
 val attribute = predicate.children()(0).toString
 val values = predicate.children().drop(1)
@@ -43,6 +54,81 @@ private[sql] object PredicateUtils {
   Some(In(attribute, Array.empty[Any]))
 }
 
+  case "=" | "<=>" | ">" | "<" | ">=" | "<=" if isValidBinaryPredicate =>
+val attribute = predicate.children()(0).toString
+val value = predicate.children()(1).asInstanceOf[LiteralValue[_]]
+val v1Value = CatalystTypeConverters.convertToScala(value.value, 
value.dataType)
+val v1Filter = predicate.name() match {
+  case "=" => EqualTo(attribute, v1Value)
+  case "<=>" => EqualNullSafe(attribute, v1Value)
+  case ">" => GreaterThan(attribute, v1Value)
+  case ">=" => GreaterThanOrEqual(attribute, v1Value)
+  case "<" => LessThan(attribute, v1Value)
+  case "<=" => LessThanOrEqual(attribute, v1Value)
+}
+Some(v1Filter)
+
+  case "IS_NULL" | "IS_NOT_NULL" if predicate.children().length == 1 &&
+  predicate.children()(0).isInstanceOf[NamedReference] =>
+val attribute = predicate.children()(0).toString
+val v1Filter = predicate.name() match {
+  case "IS_NULL" => IsNull(attribute)
+  case "IS_NOT_NULL" => IsNotNull(attribute)
+}
+Some(v1Filter)
+
+  case "STARTS_WITH" | "ENDS_WITH" | "CONTAINS" if isValidBinaryPredicate 
=>
+val attribute = predicate.children()(0).toString
+val value = predicate.chil

[spark] branch master updated: [SPARK-39909] Organize the check of push down information for JDBCV2Suite

2022-07-29 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 5eb5077acf4 [SPARK-39909] Organize the check of push down information 
for JDBCV2Suite
5eb5077acf4 is described below

commit 5eb5077acf47e911f7a5299bb029d4e5a2702f9f
Author: chenliang.lu 
AuthorDate: Fri Jul 29 10:33:53 2022 -0700

[SPARK-39909] Organize the check of push down information for JDBCV2Suite

### What changes were proposed in this pull request?
This PR changes the check method from `check(one_large_string)` to 
`check(small_string1, small_string2, ...)`

### Why are the changes needed?
It lets us check each expected fragment individually and makes the code clearer.
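
A small standalone sketch of the varargs pattern the suite moves to (hedged: `explainOutput` stands in for the plan's pushed-down information string, and the helper is a simplified stand-in for the suite's `checkPushedInfo`):

```scala
// Assert each expected fragment separately so a failure pinpoints exactly
// which piece of push-down information is missing.
def checkPushedInfo(explainOutput: String, expected: String*): Unit =
  expected.foreach { fragment =>
    assert(explainOutput.contains(fragment), s"missing pushed info: $fragment")
  }

val explainOutput =
  "PushedFilters: [DEPT IS NOT NULL, DEPT = 1], PushedLimit: LIMIT 1, ReadSchema: struct<...>"

checkPushedInfo(explainOutput,
  "PushedFilters: [DEPT IS NOT NULL, DEPT = 1]",
  "PushedLimit: LIMIT 1")
```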

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
existing tests

Closes #37342 from yabola/fix.

Authored-by: chenliang.lu 
Signed-off-by: huaxingao 
---
 .../org/apache/spark/sql/jdbc/JDBCV2Suite.scala| 307 ++---
 1 file changed, 203 insertions(+), 104 deletions(-)

diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCV2Suite.scala 
b/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCV2Suite.scala
index d64b1815007..3b226d60643 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCV2Suite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCV2Suite.scala
@@ -210,7 +210,9 @@ class JDBCV2Suite extends QueryTest with SharedSparkSession 
with ExplainSuiteHel
   test("TABLESAMPLE (integer_expression ROWS) is the same as LIMIT") {
 val df = sql("SELECT NAME FROM h2.test.employee TABLESAMPLE (3 ROWS)")
 checkSchemaNames(df, Seq("NAME"))
-checkPushedInfo(df, "PushedFilters: [], PushedLimit: LIMIT 3, ")
+checkPushedInfo(df,
+  "PushedFilters: []",
+  "PushedLimit: LIMIT 3")
 checkAnswer(df, Seq(Row("amy"), Row("alex"), Row("cathy")))
   }
 
@@ -238,7 +240,8 @@ class JDBCV2Suite extends QueryTest with SharedSparkSession 
with ExplainSuiteHel
   .where($"dept" === 1).limit(1)
 checkLimitRemoved(df1)
 checkPushedInfo(df1,
-  "PushedFilters: [DEPT IS NOT NULL, DEPT = 1], PushedLimit: LIMIT 1, ")
+  "PushedFilters: [DEPT IS NOT NULL, DEPT = 1]",
+  "PushedLimit: LIMIT 1")
 checkAnswer(df1, Seq(Row(1, "amy", 1.00, 1000.0, true)))
 
 val df2 = spark.read
@@ -251,14 +254,16 @@ class JDBCV2Suite extends QueryTest with 
SharedSparkSession with ExplainSuiteHel
   .limit(1)
 checkLimitRemoved(df2, false)
 checkPushedInfo(df2,
-  "PushedFilters: [DEPT IS NOT NULL, DEPT > 1], PushedLimit: LIMIT 1, ")
+  "PushedFilters: [DEPT IS NOT NULL, DEPT > 1]",
+  "PushedLimit: LIMIT 1")
 checkAnswer(df2, Seq(Row(2, "alex", 12000.00, 1200.0, false)))
 
 val df3 = sql("SELECT name FROM h2.test.employee WHERE dept > 1 LIMIT 1")
 checkSchemaNames(df3, Seq("NAME"))
 checkLimitRemoved(df3)
 checkPushedInfo(df3,
-  "PushedFilters: [DEPT IS NOT NULL, DEPT > 1], PushedLimit: LIMIT 1, ")
+  "PushedFilters: [DEPT IS NOT NULL, DEPT > 1]",
+  "PushedLimit: LIMIT 1")
 checkAnswer(df3, Seq(Row("alex")))
 
 val df4 = spark.read
@@ -283,7 +288,7 @@ class JDBCV2Suite extends QueryTest with SharedSparkSession 
with ExplainSuiteHel
   .limit(1)
 checkLimitRemoved(df5, false)
 // LIMIT is pushed down only if all the filters are pushed down
-checkPushedInfo(df5, "PushedFilters: [], ")
+checkPushedInfo(df5, "PushedFilters: []")
 checkAnswer(df5, Seq(Row(1.00, 1000.0, "amy")))
   }
 
@@ -305,7 +310,8 @@ class JDBCV2Suite extends QueryTest with SharedSparkSession 
with ExplainSuiteHel
   .offset(1)
 checkOffsetRemoved(df1)
 checkPushedInfo(df1,
-  "PushedFilters: [DEPT IS NOT NULL, DEPT = 1], PushedOffset: OFFSET 1,")
+  "PushedFilters: [DEPT IS NOT NULL, DEPT = 1]",
+  "PushedOffset: OFFSET 1")
 checkAnswer(df1, Seq(Row(1, "cathy", 9000.00, 1200.0, false)))
 
 val df2 = spark.read
@@ -315,7 +321,8 @@ class JDBCV2Suite extends QueryTest with SharedSparkSession 
with ExplainSuiteHel
   .offset(1)
 checkOffsetRemoved(df2, false)
 checkPushedInfo(df2,
-  "PushedFilters: [DEPT IS NOT NULL, DEPT = 1], ReadSchema:")
+  "PushedFilters: [DEPT IS NOT NULL, DEPT = 1]",
+  "ReadSchema:")
 checkAnswer(df2, Seq(Row(1, "cathy", 9000.00, 1200.0, false)))
 
 val df3 = spark.read
@@ -325,7 +332,8 @@ cl

[spark] branch branch-3.3 updated (ee8cafbd0ff -> 609efe1515f)

2022-07-27 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a change to branch branch-3.3
in repository https://gitbox.apache.org/repos/asf/spark.git


from ee8cafbd0ff [SPARK-39839][SQL] Handle special case of null 
variable-length Decimal with non-zero offsetAndSize in UnsafeRow structural 
integrity check
 add 609efe1515f [SPARK-39857][SQL][3.3] V2ExpressionBuilder uses the wrong 
LiteralValue data type for In predicate

No new revisions were added by this update.

Summary of changes:
 .../sql/catalyst/util/V2ExpressionBuilder.scala|   4 +-
 .../datasources/v2/DataSourceV2StrategySuite.scala | 282 -
 2 files changed, 280 insertions(+), 6 deletions(-)





[spark] branch master updated: [SPARK-39812][SQL] Simplify code which construct `AggregateExpression` with `toAggregateExpression`

2022-07-23 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 4547c9c90e3 [SPARK-39812][SQL] Simplify code which construct 
`AggregateExpression` with `toAggregateExpression`
4547c9c90e3 is described below

commit 4547c9c90e3d35436afe89b10c794050ed8d04d7
Author: Jiaan Geng 
AuthorDate: Sat Jul 23 15:05:14 2022 -0700

[SPARK-39812][SQL] Simplify code which construct `AggregateExpression` with 
`toAggregateExpression`

### What changes were proposed in this pull request?
Currently, Spark provides `toAggregateExpression` to simplify this code, but many places still call `AggregateExpression.apply` directly.

This PR replaces those `AggregateExpression.apply` calls with `toAggregateExpression`.
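
A hedged before/after sketch of the pattern (Catalyst-internal API; the `Sum` aggregate and the attribute below are illustrative, not taken from this patch):

```scala
import org.apache.spark.sql.catalyst.expressions.AttributeReference
import org.apache.spark.sql.catalyst.expressions.aggregate.{AggregateExpression, Complete, Sum}
import org.apache.spark.sql.types.LongType

val x = AttributeReference("x", LongType)()
val sum = Sum(x)

// Before: spell out the wrapper explicitly.
val before = AggregateExpression(sum, mode = Complete, isDistinct = false)
// After: let the aggregate function build its own Complete, non-distinct wrapper.
val after = sum.toAggregateExpression()
// Both describe the same aggregate; only the generated result ExprId differs.
```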

### Why are the changes needed?
Simplify code with `toAggregateExpression`.

### Does this PR introduce _any_ user-facing change?
No. Only the internal implementation changes.

### How was this patch tested?
N/A

Closes #37224 from beliefer/SPARK-39812.

Authored-by: Jiaan Geng 
Signed-off-by: huaxingao 
---
 .../main/scala/org/apache/spark/ml/stat/Summarizer.scala |  4 ++--
 .../apache/spark/sql/catalyst/analysis/Analyzer.scala|  6 +++---
 .../sql/catalyst/optimizer/InjectRuntimeFilter.scala |  6 +++---
 .../apache/spark/sql/catalyst/optimizer/Optimizer.scala  |  6 +++---
 .../catalyst/optimizer/RewriteDistinctAggregates.scala   |  7 ++-
 .../spark/sql/catalyst/analysis/AnalysisErrorSuite.scala | 16 +---
 .../expressions/aggregate/AggregateExpressionSuite.scala |  2 +-
 .../org/apache/spark/sql/expressions/Aggregator.scala|  7 +--
 .../spark/sql/expressions/UserDefinedFunction.scala  |  3 +--
 .../scala/org/apache/spark/sql/expressions/udaf.scala| 12 ++--
 10 files changed, 23 insertions(+), 46 deletions(-)

diff --git a/mllib/src/main/scala/org/apache/spark/ml/stat/Summarizer.scala 
b/mllib/src/main/scala/org/apache/spark/ml/stat/Summarizer.scala
index 7fd99faf0c8..bf9d07338db 100644
--- a/mllib/src/main/scala/org/apache/spark/ml/stat/Summarizer.scala
+++ b/mllib/src/main/scala/org/apache/spark/ml/stat/Summarizer.scala
@@ -27,7 +27,7 @@ import org.apache.spark.rdd.RDD
 import org.apache.spark.sql.Column
 import org.apache.spark.sql.catalyst.InternalRow
 import org.apache.spark.sql.catalyst.expressions.{Expression, 
ImplicitCastInputTypes}
-import 
org.apache.spark.sql.catalyst.expressions.aggregate.{AggregateExpression, 
Complete, TypedImperativeAggregate}
+import 
org.apache.spark.sql.catalyst.expressions.aggregate.TypedImperativeAggregate
 import org.apache.spark.sql.catalyst.trees.BinaryLike
 import org.apache.spark.sql.functions.lit
 import org.apache.spark.sql.types._
@@ -256,7 +256,7 @@ private[ml] class SummaryBuilderImpl(
   mutableAggBufferOffset = 0,
   inputAggBufferOffset = 0)
 
-new Column(AggregateExpression(agg, mode = Complete, isDistinct = false))
+new Column(agg.toAggregateExpression())
   }
 }
 
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
index 7667b4fef71..cc79048b7c7 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
@@ -2219,9 +2219,9 @@ class Analyzer(override val catalogManager: 
CatalogManager)
 throw 
QueryCompilationErrors.functionWithUnsupportedSyntaxError(
   agg.prettyName, "IGNORE NULLS")
 }
-AggregateExpression(aggFunc, Complete, u.isDistinct, u.filter)
+aggFunc.toAggregateExpression(u.isDistinct, u.filter)
   } else {
-AggregateExpression(agg, Complete, u.isDistinct, u.filter)
+agg.toAggregateExpression(u.isDistinct, u.filter)
   }
 // This function is not an aggregate function, just return the 
resolved one.
 case other if u.isDistinct =>
@@ -2332,7 +2332,7 @@ class Analyzer(override val catalogManager: 
CatalogManager)
   aggFunc.name(), "IGNORE NULLS")
   }
   val aggregator = V2Aggregator(aggFunc, arguments)
-  AggregateExpression(aggregator, Complete, u.isDistinct, u.filter)
+  aggregator.toAggregateExpression(u.isDistinct, u.filter)
 }
 
 /**
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/InjectRuntimeFilter.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/InjectRuntimeFilter.scala
index baaf82c00db..236636ac7ea 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/InjectRuntimeFilter.scala
+++ 
b/sql/cata

[spark] branch master updated: [SPARK-39784][SQL] Put Literal values on the right side of the data source filter after translating Catalyst Expression to data source filter

2022-07-22 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 2e2b1ae1021 [SPARK-39784][SQL] Put Literal values on the right side of 
the data source filter after translating Catalyst Expression to data source 
filter
2e2b1ae1021 is described below

commit 2e2b1ae1021bc4bc99f9749e05e4770be3aec43f
Author: huaxingao 
AuthorDate: Fri Jul 22 13:49:00 2022 -0700

[SPARK-39784][SQL] Put Literal values on the right side of the data source 
filter after translating Catalyst Expression to data source filter

### What changes were proposed in this pull request?

Even though a literal value can appear on either side of a filter (both
`a > 1` and `1 < a` are valid), after translating a Catalyst Expression to a
data source filter we want the literal value on the right side so it is easier
for the data source to handle these filters. We already do this kind of
normalization for V1 Filters, and V2 Filters should have the same behavior.

Before this PR, filters that have the literal value on the left side,
e.g. `1 > a`, were kept as is. After this PR, we normalize them to `a < 1` so
the data source doesn't need to check each of the filters (and do the flip),
as sketched below.
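
A minimal sketch of the normalization in terms of the V2 connector expressions,
assuming an integer column `a` (the column name and type are purely illustrative):

```scala
import org.apache.spark.sql.connector.expressions.{Expression, FieldReference, LiteralValue}
import org.apache.spark.sql.connector.expressions.filter.Predicate
import org.apache.spark.sql.types.IntegerType

// Catalyst's `1 > a` used to translate with the literal on the left:
val before = new Predicate(">",
  Array[Expression](LiteralValue(1, IntegerType), FieldReference("a")))
// after this PR the operator is flipped so the column reference comes first:
val after = new Predicate("<",
  Array[Expression](FieldReference("a"), LiteralValue(1, IntegerType)))
```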

### Why are the changes needed?
We should follow the V1 Filter behavior and normalize the filters at
Catalyst-Expression-to-DS-Filter translation time so that the literal values
end up on the right side. Later on, the data source then doesn't need to check
every single filter to figure out whether it needs to flip the sides.

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?
new test

Closes #37197 from huaxingao/flip.

Authored-by: huaxingao 
Signed-off-by: huaxingao 
---
 .../sql/catalyst/util/V2ExpressionBuilder.scala| 21 +++
 .../datasources/v2/DataSourceV2StrategySuite.scala | 67 +-
 2 files changed, 86 insertions(+), 2 deletions(-)

diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/catalyst/util/V2ExpressionBuilder.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/catalyst/util/V2ExpressionBuilder.scala
index 8bb65a88044..59cbcf48334 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/catalyst/util/V2ExpressionBuilder.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/catalyst/util/V2ExpressionBuilder.scala
@@ -233,6 +233,10 @@ class V2ExpressionBuilder(e: Expression, isPredicate: 
Boolean = false) {
   val r = generateExpression(b.right)
   if (l.isDefined && r.isDefined) {
 b match {
+  case _: Predicate if isBinaryComparisonOperator(b.sqlOperator) &&
+  l.get.isInstanceOf[LiteralValue[_]] && 
r.get.isInstanceOf[FieldReference] =>
+Some(new V2Predicate(flipComparisonOperatorName(b.sqlOperator),
+  Array[V2Expression](r.get, l.get)))
   case _: Predicate =>
 Some(new V2Predicate(b.sqlOperator, Array[V2Expression](l.get, 
r.get)))
   case _ =>
@@ -408,6 +412,23 @@ class V2ExpressionBuilder(e: Expression, isPredicate: 
Boolean = false) {
   }
 case _ => None
   }
+
+  private def isBinaryComparisonOperator(operatorName: String): Boolean = {
+operatorName match {
+  case ">" | "<" | ">=" | "<=" | "=" | "<=>" => true
+  case _ => false
+}
+  }
+
+  private def flipComparisonOperatorName(operatorName: String): String = {
+operatorName match {
+  case ">" => "<"
+  case "<" => ">"
+  case ">=" => "<="
+  case "<=" => ">="
+  case _ => operatorName
+}
+  }
 }
 
 object ColumnOrField {
diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2StrategySuite.scala
 
b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2StrategySuite.scala
index 66dc65cf681..c3f51bed269 100644
--- 
a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2StrategySuite.scala
+++ 
b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2StrategySuite.scala
@@ -18,14 +18,77 @@
 package org.apache.spark.sql.execution.datasources.v2
 
 import org.apache.spark.sql.catalyst.dsl.expressions._
-import org.apache.spark.sql.catalyst.expressions.Expression
+import org.apache.spark.sql.catalyst.expressions._
 import org.apache.spark.sql.catalyst.plans.PlanTest
 import org.apache.spark.sql.connector.expressions.{FieldReference, 
LiteralValue}
 import org.apache.spark.sql.connector.expressions.filter.Predicate

[spark] branch master updated: [SPARK-39759][SQL] Implement listIndexes in JDBC (H2 dialect)

2022-07-19 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 6278becfbed [SPARK-39759][SQL] Implement listIndexes in JDBC (H2 
dialect)
6278becfbed is described below

commit 6278becfbed412bad3d00f2b7989fd19a3a0ff07
Author: panbingkun 
AuthorDate: Mon Jul 18 23:34:28 2022 -0700

[SPARK-39759][SQL] Implement listIndexes in JDBC (H2 dialect)

### What changes were proposed in this pull request?
Implementing listIndexes in DS V2 JDBC for H2 dialect.

### Why are the changes needed?
This is a subtask of the V2 Index 
support(https://issues.apache.org/jira/browse/SPARK-36525).
**This makes it easier to test the index interface locally.**
> This PR implements listIndexes in H2 dialect.
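
A hedged usage sketch of the new capability; `table` stands for a JDBC `Table`
loaded from an H2-backed `JDBCTableCatalog`, and how it is obtained is outside
the scope of this sketch:

```scala
import org.apache.spark.sql.connector.catalog.Table
import org.apache.spark.sql.connector.catalog.index.SupportsIndex

// Print every index that the dialect reports for a loaded table.
def printIndexes(table: Table): Unit = table match {
  case s: SupportsIndex =>
    s.listIndexes().foreach { idx =>
      println(s"${idx.indexName()} (${idx.indexType()}) on " +
        idx.columns().mkString(", "))
    }
  case _ => println(s"${table.name()} does not support indexes")
}
```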

### Does this PR introduce _any_ user-facing change?
Yes, listIndexes in DS V2 JDBC for H2

### How was this patch tested?
Updated the existing UT.

Closes #37172 from panbingkun/h2dialect_listindex_dev.

Authored-by: panbingkun 
Signed-off-by: huaxingao 
---
 .../sql/execution/datasources/jdbc/JdbcUtils.scala |  4 +-
 .../execution/datasources/v2/jdbc/JDBCTable.scala  |  2 +-
 .../org/apache/spark/sql/jdbc/H2Dialect.scala  | 66 +-
 .../org/apache/spark/sql/jdbc/JdbcDialects.scala   |  2 +-
 .../org/apache/spark/sql/jdbc/MySQLDialect.scala   |  4 +-
 .../org/apache/spark/sql/jdbc/JDBCV2Suite.scala|  9 +++
 6 files changed, 78 insertions(+), 9 deletions(-)

diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala
index 4401ee4564e..60ecd2ff60b 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala
@@ -1072,10 +1072,10 @@ object JdbcUtils extends Logging with SQLConfHelper {
*/
   def listIndexes(
   conn: Connection,
-  tableName: String,
+  tableIdent: Identifier,
   options: JDBCOptions): Array[TableIndex] = {
 val dialect = JdbcDialects.get(options.url)
-dialect.listIndexes(conn, tableName, options)
+dialect.listIndexes(conn, tableIdent, options)
   }
 
   private def executeStatement(conn: Connection, options: JDBCOptions, sql: 
String): Unit = {
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/jdbc/JDBCTable.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/jdbc/JDBCTable.scala
index be8e1c68b7c..0a184116a0f 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/jdbc/JDBCTable.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/jdbc/JDBCTable.scala
@@ -83,7 +83,7 @@ case class JDBCTable(ident: Identifier, schema: StructType, 
jdbcOptions: JDBCOpt
 
   override def listIndexes(): Array[TableIndex] = {
 JdbcUtils.withConnection(jdbcOptions) { conn =>
-  JdbcUtils.listIndexes(conn, name, jdbcOptions)
+  JdbcUtils.listIndexes(conn, ident, jdbcOptions)
 }
   }
 }
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/jdbc/H2Dialect.scala 
b/sql/core/src/main/scala/org/apache/spark/sql/jdbc/H2Dialect.scala
index d41929225a8..4200ba91fb1 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/jdbc/H2Dialect.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/jdbc/H2Dialect.scala
@@ -25,12 +25,14 @@ import java.util.concurrent.ConcurrentHashMap
 import scala.collection.JavaConverters._
 import scala.util.control.NonFatal
 
+import org.apache.commons.lang3.StringUtils
+
 import org.apache.spark.sql.AnalysisException
 import org.apache.spark.sql.catalyst.analysis.{IndexAlreadyExistsException, 
NoSuchIndexException, NoSuchNamespaceException, NoSuchTableException, 
TableAlreadyExistsException}
 import org.apache.spark.sql.connector.catalog.Identifier
 import org.apache.spark.sql.connector.catalog.functions.UnboundFunction
-import org.apache.spark.sql.connector.expressions.Expression
-import org.apache.spark.sql.connector.expressions.NamedReference
+import org.apache.spark.sql.connector.catalog.index.TableIndex
+import org.apache.spark.sql.connector.expressions.{Expression, FieldReference, 
NamedReference}
 import org.apache.spark.sql.execution.datasources.jdbc.{JDBCOptions, JdbcUtils}
 import org.apache.spark.sql.types.{BooleanType, ByteType, DataType, 
DecimalType, ShortType, StringType}
 
@@ -110,6 +112,64 @@ private[sql] object H2Dialect extends JdbcDialect {
 JdbcUtils.checkIfIndexExists(conn, sql, options)
   }
 
+  // See
+  // 
https://www.h2database.com/html/systemtables.html?#information_schema_indexes
+  // 
https://www.h2database.com/html/systemtables.html?#information_schema_index_c

[spark] branch master updated: [SPARK-39704][SQL] Implement createIndex & dropIndex & indexExists in JDBC (H2 dialect)

2022-07-13 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 0b1077caf31 [SPARK-39704][SQL] Implement createIndex & dropIndex & 
indexExists in JDBC (H2 dialect)
0b1077caf31 is described below

commit 0b1077caf319f86b5175afbee88e1f14da5a
Author: panbingkun 
AuthorDate: Tue Jul 12 23:15:35 2022 -0700

[SPARK-39704][SQL] Implement createIndex & dropIndex & indexExists in JDBC 
(H2 dialect)

### What changes were proposed in this pull request?
Implementing createIndex/dropIndex/indexExists in DS V2 JDBC for H2 dialect.

### Why are the changes needed?
This is a subtask of the V2 Index 
support(https://issues.apache.org/jira/browse/SPARK-36525).
This PR implements createIndex, dropIndex and indexExists.
After this PR has been reviewed, I will create a new PR for listIndexes.
**This makes it easier to test the index interface locally.**
> This PR only implements createIndex, IndexExists and dropIndex in H2 
dialect.
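
A hedged round-trip sketch; `table` stands for an H2-backed JDBC table that
implements `SupportsIndex`, the index and column names are hypothetical, and the
`createIndex` signature is assumed to mirror the `JdbcUtils` helper in this
commit:

```scala
import org.apache.spark.sql.connector.catalog.index.SupportsIndex
import org.apache.spark.sql.connector.expressions.{FieldReference, NamedReference}

def roundTripIndex(table: SupportsIndex): Unit = {
  val columns = Array[NamedReference](FieldReference("id"))
  // no per-column or index properties in this sketch
  val columnsProperties =
    new java.util.HashMap[NamedReference, java.util.Map[String, String]]()
  val properties = new java.util.HashMap[String, String]()

  table.createIndex("people_id_idx", columns, columnsProperties, properties)
  assert(table.indexExists("people_id_idx"))
  table.dropIndex("people_id_idx")
}
```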

### Does this PR introduce _any_ user-facing change?
Yes, createIndex/dropIndex/indexExists in DS V2 JDBC for H2

### How was this patch tested?
New UT.

Closes #37112 from panbingkun/h2dialect-create-drop.

Authored-by: panbingkun 
Signed-off-by: huaxingao 
---
 .../sql/execution/datasources/jdbc/JdbcUtils.scala | 14 +++---
 .../execution/datasources/v2/jdbc/JDBCTable.scala  |  6 +--
 .../org/apache/spark/sql/jdbc/H2Dialect.scala  | 56 --
 .../org/apache/spark/sql/jdbc/JdbcDialects.scala   | 14 +++---
 .../org/apache/spark/sql/jdbc/MySQLDialect.scala   | 14 +++---
 .../apache/spark/sql/jdbc/PostgresDialect.scala| 11 +++--
 .../org/apache/spark/sql/jdbc/JDBCV2Suite.scala| 17 +++
 7 files changed, 101 insertions(+), 31 deletions(-)

diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala
index fa4c032fcb0..5d8838906bf 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala
@@ -39,7 +39,7 @@ import 
org.apache.spark.sql.catalyst.expressions.SpecificInternalRow
 import org.apache.spark.sql.catalyst.parser.CatalystSqlParser
 import org.apache.spark.sql.catalyst.util.{CaseInsensitiveMap, DateTimeUtils, 
GenericArrayData}
 import org.apache.spark.sql.catalyst.util.DateTimeUtils.{instantToMicros, 
localDateTimeToMicros, localDateToDays, toJavaDate, toJavaTimestamp, 
toJavaTimestampNoRebase}
-import org.apache.spark.sql.connector.catalog.TableChange
+import org.apache.spark.sql.connector.catalog.{Identifier, TableChange}
 import org.apache.spark.sql.connector.catalog.index.{SupportsIndex, TableIndex}
 import org.apache.spark.sql.connector.expressions.NamedReference
 import org.apache.spark.sql.errors.{QueryCompilationErrors, 
QueryExecutionErrors}
@@ -1033,14 +1033,14 @@ object JdbcUtils extends Logging with SQLConfHelper {
   def createIndex(
   conn: Connection,
   indexName: String,
-  tableName: String,
+  tableIdent: Identifier,
   columns: Array[NamedReference],
   columnsProperties: util.Map[NamedReference, util.Map[String, String]],
   properties: util.Map[String, String],
   options: JDBCOptions): Unit = {
 val dialect = JdbcDialects.get(options.url)
 executeStatement(conn, options,
-  dialect.createIndex(indexName, tableName, columns, columnsProperties, 
properties))
+  dialect.createIndex(indexName, tableIdent, columns, columnsProperties, 
properties))
   }
 
   /**
@@ -1049,10 +1049,10 @@ object JdbcUtils extends Logging with SQLConfHelper {
   def indexExists(
   conn: Connection,
   indexName: String,
-  tableName: String,
+  tableIdent: Identifier,
   options: JDBCOptions): Boolean = {
 val dialect = JdbcDialects.get(options.url)
-dialect.indexExists(conn, indexName, tableName, options)
+dialect.indexExists(conn, indexName, tableIdent, options)
   }
 
   /**
@@ -1061,10 +1061,10 @@ object JdbcUtils extends Logging with SQLConfHelper {
   def dropIndex(
   conn: Connection,
   indexName: String,
-  tableName: String,
+  tableIdent: Identifier,
   options: JDBCOptions): Unit = {
 val dialect = JdbcDialects.get(options.url)
-executeStatement(conn, options, dialect.dropIndex(indexName, tableName))
+executeStatement(conn, options, dialect.dropIndex(indexName, tableIdent))
   }
 
   /**
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/jdbc/JDBCTable.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/jdbc/JDBCTable.scal

[spark] branch master updated: [SPARK-39711][TESTS] Remove redundant trait: BeforeAndAfterAll & BeforeAndAfterEach & Logging

2022-07-11 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new d10db29ca16 [SPARK-39711][TESTS] Remove redundant trait: 
BeforeAndAfterAll & BeforeAndAfterEach & Logging
d10db29ca16 is described below

commit d10db29ca1609c34f082068f0cc7419c5ecef190
Author: panbingkun 
AuthorDate: Mon Jul 11 19:30:30 2022 -0700

[SPARK-39711][TESTS] Remove redundant trait: BeforeAndAfterAll & 
BeforeAndAfterEach & Logging

### What changes were proposed in this pull request?
SparkFunSuite declare as follow:
```
abstract class SparkFunSuite
extends AnyFunSuite
with BeforeAndAfterAll
with BeforeAndAfterEach
with ThreadAudit
with Logging
```
Some suites extend SparkFunSuite and also mix in BeforeAndAfterAll,
BeforeAndAfterEach, or Logging, which is redundant, for example:
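
A minimal sketch of the redundancy (the suite names are hypothetical):

```scala
import org.apache.spark.SparkFunSuite
import org.apache.spark.internal.Logging
import org.scalatest.BeforeAndAfterAll

// Redundant: SparkFunSuite already mixes in BeforeAndAfterAll,
// BeforeAndAfterEach and Logging.
class RedundantSuite extends SparkFunSuite with BeforeAndAfterAll with Logging

// Equivalent and cleaner.
class CleanSuite extends SparkFunSuite
```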

### Why are the changes needed?
Eliminate redundant information and make the code cleaner.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Pass GA.

Closes #37123 from panbingkun/remove_BeforeAndAfterAll.

Authored-by: panbingkun 
    Signed-off-by: huaxingao 
---
 .../scala/org/apache/spark/kafka010/KafkaTokenSparkConfSuite.scala | 3 +--
 .../apache/spark/streaming/kafka010/KafkaDataConsumerSuite.scala   | 3 +--
 .../scala/org/apache/spark/streaming/kafka010/KafkaRDDSuite.scala  | 3 +--
 core/src/test/scala/org/apache/spark/SSLOptionsSuite.scala | 3 +--
 .../org/apache/spark/SparkContextSchedulerCreationSuite.scala  | 4 +---
 core/src/test/scala/org/apache/spark/ThreadingSuite.scala  | 4 +---
 .../test/scala/org/apache/spark/deploy/SparkSubmitUtilsSuite.scala | 3 +--
 .../org/apache/spark/deploy/history/ApplicationCacheSuite.scala| 2 +-
 .../org/apache/spark/deploy/history/FsHistoryProviderSuite.scala   | 4 +---
 .../spark/deploy/history/RealBrowserUIHistoryServerSuite.scala | 4 +---
 .../scala/org/apache/spark/deploy/master/ui/MasterWebUISuite.scala | 4 +---
 .../org/apache/spark/deploy/rest/StandaloneRestSubmitSuite.scala   | 3 +--
 .../org/apache/spark/input/WholeTextFileInputFormatSuite.scala | 5 +
 .../org/apache/spark/input/WholeTextFileRecordReaderSuite.scala| 4 +---
 .../org/apache/spark/internal/plugin/PluginContainerSuite.scala| 3 +--
 .../test/scala/org/apache/spark/memory/MemoryManagerSuite.scala| 4 +---
 .../src/test/scala/org/apache/spark/rdd/AsyncRDDActionsSuite.scala | 3 +--
 core/src/test/scala/org/apache/spark/rdd/SortingSuite.scala| 3 +--
 core/src/test/scala/org/apache/spark/rpc/RpcEnvSuite.scala | 3 +--
 .../org/apache/spark/scheduler/EventLoggingListenerSuite.scala | 5 +
 .../test/scala/org/apache/spark/scheduler/HealthTrackerSuite.scala | 4 +---
 .../scala/org/apache/spark/scheduler/TaskSchedulerImplSuite.scala  | 6 ++
 .../scala/org/apache/spark/scheduler/TaskSetExcludelistSuite.scala | 3 +--
 .../org/apache/spark/serializer/SerializationDebuggerSuite.scala   | 5 +
 .../scala/org/apache/spark/shuffle/ShuffleBlockPusherSuite.scala   | 3 +--
 .../org/apache/spark/shuffle/ShuffleDriverComponentsSuite.scala| 4 +---
 .../apache/spark/shuffle/sort/IndexShuffleBlockResolverSuite.scala | 3 +--
 .../shuffle/sort/io/LocalDiskShuffleMapOutputWriterSuite.scala | 3 +--
 .../scala/org/apache/spark/storage/BlockInfoManagerSuite.scala | 4 +---
 .../test/scala/org/apache/spark/storage/BlockManagerSuite.scala| 7 +++
 .../scala/org/apache/spark/storage/DiskBlockManagerSuite.scala | 4 +---
 .../org/apache/spark/storage/DiskBlockObjectWriterSuite.scala  | 4 +---
 .../scala/org/apache/spark/ui/RealBrowserUISeleniumSuite.scala | 3 +--
 core/src/test/scala/org/apache/spark/ui/UISeleniumSuite.scala  | 3 +--
 core/src/test/scala/org/apache/spark/util/FileAppenderSuite.scala  | 4 ++--
 core/src/test/scala/org/apache/spark/util/UtilsSuite.scala | 3 +--
 .../apache/spark/util/collection/ExternalSorterSpillSuite.scala| 3 +--
 .../test/scala/org/apache/spark/util/collection/SorterSuite.scala  | 3 +--
 .../apache/spark/util/collection/unsafe/sort/RadixSortSuite.scala  | 3 +--
 mllib/src/test/scala/org/apache/spark/ml/MLEventsSuite.scala   | 5 +
 .../test/scala/org/apache/spark/ml/recommendation/ALSSuite.scala   | 3 +--
 .../src/test/scala/org/apache/spark/ml/stat/CorrelationSuite.scala | 4 +---
 .../org/apache/spark/ml/tree/impl/GradientBoostedTreesSuite.scala  | 3 +--
 .../test/scala/org/apache/spark/mllib/linalg/VectorsSuite.scala| 3 +--
 .../test/scala/org/apache/spark/mllib/stat/CorrelationSuite.scala  | 3 +--
 .../org/apache/spark/mllib/tree/GradientBoostedTreesSuite.scala| 3 +--
 repl/src/test/scala-2.12/org/apache/spark/repl/Repl2Suite.scala| 4 +---
 repl/src/test/scala-2.13/

[spark] branch master updated: [SPARK-39724][CORE] Remove duplicate `.setAccessible(true)` call in `kvstore.KVTypeInfo`

2022-07-09 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 7ad37925afb [SPARK-39724][CORE] Remove duplicate 
`.setAccessible(true)`  call in `kvstore.KVTypeInfo`
7ad37925afb is described below

commit 7ad37925afb1441a49ba3cbd8528e67240b9fd0b
Author: yangjie01 
AuthorDate: Sat Jul 9 15:34:40 2022 -0700

[SPARK-39724][CORE] Remove duplicate `.setAccessible(true)`  call in 
`kvstore.KVTypeInfo`

### What changes were proposed in this pull request?
This PR just removes a duplicate `.setAccessible(true)` call in
`kvstore.KVTypeInfo`.

### Why are the changes needed?
It deletes an unnecessary method invocation.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass GitHub Actions

Closes #37136 from LuciferYang/remove-dup-setAccessible.

Authored-by: yangjie01 
Signed-off-by: huaxingao 
---
 .../kvstore/src/main/java/org/apache/spark/util/kvstore/KVTypeInfo.java | 2 --
 1 file changed, 2 deletions(-)

diff --git 
a/common/kvstore/src/main/java/org/apache/spark/util/kvstore/KVTypeInfo.java 
b/common/kvstore/src/main/java/org/apache/spark/util/kvstore/KVTypeInfo.java
index a7e5831846a..a15d07cf599 100644
--- a/common/kvstore/src/main/java/org/apache/spark/util/kvstore/KVTypeInfo.java
+++ b/common/kvstore/src/main/java/org/apache/spark/util/kvstore/KVTypeInfo.java
@@ -48,7 +48,6 @@ public class KVTypeInfo {
 checkIndex(idx, indices);
 f.setAccessible(true);
 indices.put(idx.value(), idx);
-f.setAccessible(true);
 accessors.put(idx.value(), new FieldAccessor(f));
   }
 }
@@ -61,7 +60,6 @@ public class KVTypeInfo {
   "Annotated method %s::%s should not have any parameters.", 
type.getName(), m.getName());
 m.setAccessible(true);
 indices.put(idx.value(), idx);
-m.setAccessible(true);
 accessors.put(idx.value(), new MethodAccessor(m));
   }
 }





[spark] branch branch-3.3 updated (2edd344392a -> f9e3668dbb1)

2022-07-05 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a change to branch branch-3.3
in repository https://gitbox.apache.org/repos/asf/spark.git


from 2edd344392a [SPARK-39611][PYTHON][PS] Fix wrong aliases in 
__array_ufunc__
 add f9e3668dbb1 [SPARK-39656][SQL][3.3] Fix wrong namespace in 
DescribeNamespaceExec

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/execution/datasources/v2/DescribeNamespaceExec.scala | 3 ++-
 .../apache/spark/sql/execution/command/v2/DescribeNamespaceSuite.scala | 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)





[spark] branch branch-3.2 updated (3d084fe3217 -> 1c0bd4c15a2)

2022-07-05 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a change to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git


from 3d084fe3217 [SPARK-39677][SQL][DOCS][3.2] Fix args formatting of the 
regexp and like functions
 add 1c0bd4c15a2 [SPARK-39656][SQL][3.2] Fix wrong namespace in 
DescribeNamespaceExec

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/execution/datasources/v2/DescribeNamespaceExec.scala  | 3 ++-
 .../scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala | 6 +++---
 2 files changed, 5 insertions(+), 4 deletions(-)





[spark] branch master updated: [MINOR][SQL][TESTS] Remove unused super class & unused variable in JDBCXXXSuite

2022-07-04 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new dabe57eca87 [MINOR][SQL][TESTS] Remove unused super class & unused 
variable in JDBCXXXSuite
dabe57eca87 is described below

commit dabe57eca879f7f8bfbdeb158450b6188ee4e2c7
Author: panbingkun 
AuthorDate: Mon Jul 4 08:45:18 2022 -0700

[MINOR][SQL][TESTS] Remove unused super class & unused variable in 
JDBCXXXSuite

### What changes were proposed in this pull request?
> Remove unused super classes "BeforeAndAfter" & "PrivateMethodTester" in
JDBCSuite
> Remove unused variable "conn" in JDBCV2Suite & JDBCTableCatalogSuite

### Why are the changes needed?
Eliminate redundant information and make the code cleaner

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Pass GA

Closes #37062 from panbingkun/minor-jdbcsuites.
    
Authored-by: panbingkun 
Signed-off-by: huaxingao 
---
 .../sql/execution/datasources/v2/jdbc/JDBCTableCatalogSuite.scala | 1 -
 sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala | 4 +---
 sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCV2Suite.scala   | 1 -
 3 files changed, 1 insertion(+), 5 deletions(-)

diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/v2/jdbc/JDBCTableCatalogSuite.scala
 
b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/v2/jdbc/JDBCTableCatalogSuite.scala
index 8d8d13211fd..7aa8adc07ed 100644
--- 
a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/v2/jdbc/JDBCTableCatalogSuite.scala
+++ 
b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/v2/jdbc/JDBCTableCatalogSuite.scala
@@ -35,7 +35,6 @@ class JDBCTableCatalogSuite extends QueryTest with 
SharedSparkSession {
   val tempDir = Utils.createTempDir()
   val url = 
s"jdbc:h2:${tempDir.getCanonicalPath};user=testUser;password=testPass"
   val defaultMetadata = new MetadataBuilder().putLong("scale", 0).build()
-  var conn: java.sql.Connection = null
 
   override def sparkConf: SparkConf = super.sparkConf
 .set("spark.sql.catalog.h2", classOf[JDBCTableCatalog].getName)
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala 
b/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala
index 494ae6d5487..b87fee6cec2 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala
@@ -26,7 +26,6 @@ import scala.collection.JavaConverters._
 
 import org.mockito.ArgumentMatchers._
 import org.mockito.Mockito._
-import org.scalatest.{BeforeAndAfter, PrivateMethodTester}
 
 import org.apache.spark.SparkException
 import org.apache.spark.sql.{AnalysisException, DataFrame, QueryTest, Row}
@@ -45,8 +44,7 @@ import org.apache.spark.sql.test.SharedSparkSession
 import org.apache.spark.sql.types._
 import org.apache.spark.util.Utils
 
-class JDBCSuite extends QueryTest
-  with BeforeAndAfter with PrivateMethodTester with SharedSparkSession {
+class JDBCSuite extends QueryTest with SharedSparkSession {
   import testImplicits._
 
   val url = "jdbc:h2:mem:testdb0"
diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCV2Suite.scala 
b/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCV2Suite.scala
index 90ab976d9d5..1cc5f87e5fc 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCV2Suite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCV2Suite.scala
@@ -44,7 +44,6 @@ class JDBCV2Suite extends QueryTest with SharedSparkSession 
with ExplainSuiteHel
 
   val tempDir = Utils.createTempDir()
   val url = 
s"jdbc:h2:${tempDir.getCanonicalPath};user=testUser;password=testPass"
-  var conn: java.sql.Connection = null
 
   val testH2Dialect = new JdbcDialect {
 override def canHandle(url: String): Boolean = H2Dialect.canHandle(url)





[spark] branch master updated: [MINOR] whitespace must between Curly braces and the bodies of classes, methods

2022-07-02 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 00d099c8ce8 [MINOR] whitespace must between Curly braces and the 
bodies of classes, methods
00d099c8ce8 is described below

commit 00d099c8ce8f0711a4c6057613bf90d9ece033b9
Author: panbingkun 
AuthorDate: Sat Jul 2 20:36:53 2022 -0700

[MINOR] whitespace must between Curly braces and the bodies of classes, 
methods

### What changes were proposed in this pull request?
Make the code style general and consistent.

### Why are the changes needed?
Whitespace must appear between curly braces and the bodies of classes and methods.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass GA

Closes #37057 from panbingkun/minor-code-style.

Authored-by: panbingkun 
Signed-off-by: huaxingao 
---
 core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala | 2 +-
 core/src/main/scala/org/apache/spark/api/python/SerDeUtil.scala | 2 +-
 core/src/main/scala/org/apache/spark/ui/ConsoleProgressBar.scala| 2 +-
 .../test/scala/org/apache/spark/metrics/MetricsSystemSuite.scala| 2 +-
 .../org/apache/spark/shuffle/HostLocalShuffleReadingSuite.scala | 6 +++---
 .../apache/spark/util/collection/ExternalAppendOnlyMapSuite.scala   | 2 +-
 .../apache/spark/ml/regression/GeneralizedLinearRegression.scala| 2 +-
 .../org/apache/spark/mllib/pmml/export/KMeansPMMLModelExport.scala  | 2 +-
 .../src/test/scala/org/apache/spark/ml/feature/InstanceSuite.scala  | 2 +-
 .../org/apache/spark/ml/regression/RandomForestRegressorSuite.scala | 2 +-
 .../org/apache/spark/ml/tuning/TrainValidationSplitSuite.scala  | 2 +-
 .../org/apache/spark/deploy/mesos/MesosClusterDispatcherSuite.scala | 2 +-
 .../scala/org/apache/spark/sql/catalyst/expressions/SortOrder.scala | 4 ++--
 .../apache/spark/sql/execution/datasources/PartitioningUtils.scala  | 2 +-
 .../sql/execution/datasources/v2/parquet/ParquetScanBuilder.scala   | 2 +-
 .../src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala | 2 +-
 .../scala/org/apache/spark/sql/DataFrameWindowFunctionsSuite.scala  | 2 +-
 sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala| 2 +-
 .../spark/sql/execution/datasources/FileFormatWriterSuite.scala | 2 +-
 .../org/apache/spark/sql/execution/ui/SparkPlanInfoSuite.scala  | 2 +-
 .../org/apache/spark/streaming/api/java/JavaStreamingListener.scala | 2 +-
 21 files changed, 24 insertions(+), 24 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala 
b/core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala
index 33f2b18cb27..d11d2e7a4f6 100644
--- a/core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala
+++ b/core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala
@@ -697,7 +697,7 @@ private[spark] class PythonAccumulatorV2(
 @transient private val serverHost: String,
 private val serverPort: Int,
 private val secretToken: String)
-  extends CollectionAccumulator[Array[Byte]] with Logging{
+  extends CollectionAccumulator[Array[Byte]] with Logging {
 
   Utils.checkHost(serverHost)
 
diff --git a/core/src/main/scala/org/apache/spark/api/python/SerDeUtil.scala 
b/core/src/main/scala/org/apache/spark/api/python/SerDeUtil.scala
index dd962ca11ec..a2a7fb5c100 100644
--- a/core/src/main/scala/org/apache/spark/api/python/SerDeUtil.scala
+++ b/core/src/main/scala/org/apache/spark/api/python/SerDeUtil.scala
@@ -48,7 +48,7 @@ private[spark] object SerDeUtil extends Logging {
   // This should be called before trying to unpickle array.array from Python
   // In cluster mode, this should be put in closure
   def initialize(): Unit = {
-synchronized{
+synchronized {
   if (!initialized) {
 Unpickler.registerConstructor("__builtin__", "bytearray", new 
ByteArrayConstructor())
 Unpickler.registerConstructor("builtins", "bytearray", new 
ByteArrayConstructor())
diff --git a/core/src/main/scala/org/apache/spark/ui/ConsoleProgressBar.scala 
b/core/src/main/scala/org/apache/spark/ui/ConsoleProgressBar.scala
index d3a061fae74..64a786e5825 100644
--- a/core/src/main/scala/org/apache/spark/ui/ConsoleProgressBar.scala
+++ b/core/src/main/scala/org/apache/spark/ui/ConsoleProgressBar.scala
@@ -47,7 +47,7 @@ private[spark] class ConsoleProgressBar(sc: SparkContext) 
extends Logging {
 
   // Schedule a refresh thread to run periodically
   private val timer = new Timer("refresh progress", true)
-  timer.schedule(new TimerTask{
+  timer.schedule(new TimerTask {
 override def run(): Unit = {
   refresh()
 }
diff --git 
a/core/src/test/scala/org/apache/spark/metrics/MetricsSystemSuite.scala 
b/core/src/test/scala/org/apache/spark/

[spark] branch branch-3.3 updated: [SPARK-39633][SQL] Support timestamp in seconds for TimeTravel using Dataframe options

2022-06-30 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a commit to branch branch-3.3
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.3 by this push:
 new 18000fd0e20 [SPARK-39633][SQL] Support timestamp in seconds for 
TimeTravel using Dataframe options
18000fd0e20 is described below

commit 18000fd0e20787b44b930296556483f3fb419a8f
Author: Prashant Singh 
AuthorDate: Thu Jun 30 17:16:32 2022 -0700

[SPARK-39633][SQL] Support timestamp in seconds for TimeTravel using 
Dataframe options

### What changes were proposed in this pull request?

Support timestamp in seconds for TimeTravel using Dataframe options

### Why are the changes needed?

To have parity between doing TimeTravel via SQL and via DataFrame options.

Spark SQL supports queries like:
```sql
SELECT * from {table} TIMESTAMP AS OF 1548751078
```

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Added new UTs for testing the behaviour.

Closes #37025 from singhpk234/fix/timetravel_df_options.

Authored-by: Prashant Singh 
Signed-off-by: huaxingao 
(cherry picked from commit 44e2657f3d511c25135c95dc3d584c540d227b5b)
Signed-off-by: huaxingao 
---
 .../sql/execution/datasources/v2/DataSourceV2Utils.scala | 12 ++--
 .../apache/spark/sql/connector/DataSourceV2SQLSuite.scala| 11 +++
 .../spark/sql/connector/SupportsCatalogOptionsSuite.scala|  7 +++
 3 files changed, 28 insertions(+), 2 deletions(-)

diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Utils.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Utils.scala
index f69a2a45886..7fd61c44fd1 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Utils.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Utils.scala
@@ -32,7 +32,7 @@ import org.apache.spark.sql.connector.catalog.{CatalogV2Util, 
SessionConfigSuppo
 import org.apache.spark.sql.connector.catalog.TableCapability.BATCH_READ
 import org.apache.spark.sql.errors.QueryExecutionErrors
 import org.apache.spark.sql.internal.SQLConf
-import org.apache.spark.sql.types.StructType
+import org.apache.spark.sql.types.{LongType, StructType}
 import org.apache.spark.sql.util.CaseInsensitiveStringMap
 
 private[sql] object DataSourceV2Utils extends Logging {
@@ -124,7 +124,15 @@ private[sql] object DataSourceV2Utils extends Logging {
 val timestamp = hasCatalog.extractTimeTravelTimestamp(dsOptions)
 
 val timeTravelVersion = if (version.isPresent) Some(version.get) else 
None
-val timeTravelTimestamp = if (timestamp.isPresent) 
Some(Literal(timestamp.get)) else None
+val timeTravelTimestamp = if (timestamp.isPresent) {
+  if (timestamp.get.forall(_.isDigit)) {
+Some(Literal(timestamp.get.toLong, LongType))
+  } else {
+Some(Literal(timestamp.get))
+  }
+} else {
+  None
+}
 val timeTravel = TimeTravelSpec.create(timeTravelTimestamp, 
timeTravelVersion, conf)
 (CatalogV2Util.loadTable(catalog, ident, timeTravel).get, 
Some(catalog), Some(ident))
   case _ =>
diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala
 
b/sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala
index b64ed080d8b..675dd2807ca 100644
--- 
a/sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala
+++ 
b/sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala
@@ -21,6 +21,7 @@ import java.sql.Timestamp
 import java.time.{Duration, LocalDate, Period}
 
 import scala.collection.JavaConverters._
+import scala.concurrent.duration.MICROSECONDS
 
 import org.apache.spark.sql._
 import org.apache.spark.sql.catalyst.InternalRow
@@ -2691,6 +2692,8 @@ class DataSourceV2SQLSuite
 val ts2 = DateTimeUtils.stringToTimestampAnsi(
   UTF8String.fromString("2021-01-29 00:00:00"),
   DateTimeUtils.getZoneId(SQLConf.get.sessionLocalTimeZone))
+val ts1InSeconds = MICROSECONDS.toSeconds(ts1).toString
+val ts2InSeconds = MICROSECONDS.toSeconds(ts2).toString
 val t3 = s"testcat.t$ts1"
 val t4 = s"testcat.t$ts2"
 
@@ -2707,6 +2710,14 @@ class DataSourceV2SQLSuite
 === Array(Row(5), Row(6)))
   assert(sql("SELECT * FROM t TIMESTAMP AS OF '2021-01-29 
00:00:00'").collect
 === Array(Row(7), Row(8)))
+  assert(sql(s"SELECT * FROM t TIMESTAMP AS OF $ts1InSeconds").collect
+=== Array(Row(5), Row(6)))
+  assert(sql(s"SELECT * FROM t TIMESTAMP AS OF $ts2InSeconds").collect
+=== Array(Row(7

[spark] branch master updated: [SPARK-39633][SQL] Support timestamp in seconds for TimeTravel using Dataframe options

2022-06-30 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 44e2657f3d5 [SPARK-39633][SQL] Support timestamp in seconds for 
TimeTravel using Dataframe options
44e2657f3d5 is described below

commit 44e2657f3d511c25135c95dc3d584c540d227b5b
Author: Prashant Singh 
AuthorDate: Thu Jun 30 17:16:32 2022 -0700

[SPARK-39633][SQL] Support timestamp in seconds for TimeTravel using 
Dataframe options

### What changes were proposed in this pull request?

Support timestamp in seconds for TimeTravel using Dataframe options

### Why are the changes needed?

To have parity between doing TimeTravel via SQL and via DataFrame options.

Spark SQL supports queries like:
```sql
SELECT * from {table} TIMESTAMP AS OF 1548751078
```
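
A hedged sketch of the DataFrame-options side, assuming an active SparkSession
`spark` and a DS V2 source whose `SupportsCatalogOptions` implementation reads
the time-travel timestamp from a `timestampAsOf` option (the format name and
option key are illustrative, not part of this change):

```scala
// Before this change the option value had to be a timestamp string; a value in
// epoch seconds (as in the SQL example above) now works as well, because
// all-digit values are parsed as a Long literal.
val df = spark.read
  .format("my.datasource")                 // hypothetical DS V2 source
  .option("timestampAsOf", "1548751078")   // timestamp in seconds
  .load()
```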

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Added new UTs for testing the behaviour.

Closes #37025 from singhpk234/fix/timetravel_df_options.

Authored-by: Prashant Singh 
Signed-off-by: huaxingao 
---
 .../sql/execution/datasources/v2/DataSourceV2Utils.scala | 12 ++--
 .../apache/spark/sql/connector/DataSourceV2SQLSuite.scala| 11 +++
 .../spark/sql/connector/SupportsCatalogOptionsSuite.scala|  7 +++
 3 files changed, 28 insertions(+), 2 deletions(-)

diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Utils.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Utils.scala
index f69a2a45886..7fd61c44fd1 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Utils.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Utils.scala
@@ -32,7 +32,7 @@ import org.apache.spark.sql.connector.catalog.{CatalogV2Util, 
SessionConfigSuppo
 import org.apache.spark.sql.connector.catalog.TableCapability.BATCH_READ
 import org.apache.spark.sql.errors.QueryExecutionErrors
 import org.apache.spark.sql.internal.SQLConf
-import org.apache.spark.sql.types.StructType
+import org.apache.spark.sql.types.{LongType, StructType}
 import org.apache.spark.sql.util.CaseInsensitiveStringMap
 
 private[sql] object DataSourceV2Utils extends Logging {
@@ -124,7 +124,15 @@ private[sql] object DataSourceV2Utils extends Logging {
 val timestamp = hasCatalog.extractTimeTravelTimestamp(dsOptions)
 
 val timeTravelVersion = if (version.isPresent) Some(version.get) else 
None
-val timeTravelTimestamp = if (timestamp.isPresent) 
Some(Literal(timestamp.get)) else None
+val timeTravelTimestamp = if (timestamp.isPresent) {
+  if (timestamp.get.forall(_.isDigit)) {
+Some(Literal(timestamp.get.toLong, LongType))
+  } else {
+Some(Literal(timestamp.get))
+  }
+} else {
+  None
+}
 val timeTravel = TimeTravelSpec.create(timeTravelTimestamp, 
timeTravelVersion, conf)
 (CatalogV2Util.loadTable(catalog, ident, timeTravel).get, 
Some(catalog), Some(ident))
   case _ =>
diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala
 
b/sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala
index 9c92c1d9a0b..c82d875faa7 100644
--- 
a/sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala
+++ 
b/sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala
@@ -21,6 +21,7 @@ import java.sql.Timestamp
 import java.time.{Duration, LocalDate, Period}
 
 import scala.collection.JavaConverters._
+import scala.concurrent.duration.MICROSECONDS
 
 import org.apache.spark.sql._
 import org.apache.spark.sql.catalyst.InternalRow
@@ -2591,6 +2592,8 @@ class DataSourceV2SQLSuite
 val ts2 = DateTimeUtils.stringToTimestampAnsi(
   UTF8String.fromString("2021-01-29 00:00:00"),
   DateTimeUtils.getZoneId(SQLConf.get.sessionLocalTimeZone))
+val ts1InSeconds = MICROSECONDS.toSeconds(ts1).toString
+val ts2InSeconds = MICROSECONDS.toSeconds(ts2).toString
 val t3 = s"testcat.t$ts1"
 val t4 = s"testcat.t$ts2"
 
@@ -2607,6 +2610,14 @@ class DataSourceV2SQLSuite
 === Array(Row(5), Row(6)))
   assert(sql("SELECT * FROM t TIMESTAMP AS OF '2021-01-29 
00:00:00'").collect
 === Array(Row(7), Row(8)))
+  assert(sql(s"SELECT * FROM t TIMESTAMP AS OF $ts1InSeconds").collect
+=== Array(Row(5), Row(6)))
+  assert(sql(s"SELECT * FROM t TIMESTAMP AS OF $ts2InSeconds").collect
+=== Array(Row(7), Row(8)))
+  assert(sql(s"SELECT * FROM t FOR SYSTEM_TIME AS OF 
$ts1InSeconds").collec

[spark] branch master updated: [MINOR][SQL] Remove duplicate code for `AggregateExpression.isAggregate` usage

2022-06-15 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 2d001642ac2 [MINOR][SQL] Remove duplicate code for 
`AggregateExpression.isAggregate` usage
2d001642ac2 is described below

commit 2d001642ac2983fad613e435c30c6f09b811858b
Author: Yuming Wang 
AuthorDate: Wed Jun 15 21:38:07 2022 -0700

[MINOR][SQL] Remove duplicate code for `AggregateExpression.isAggregate` 
usage

### What changes were proposed in this pull request?

Remove duplicate code for `AggregateExpression.isAggregate` usage.

### Why are the changes needed?

Make the code easier to maintain.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing UT.

Closes #36886 from wangyum/isAggregateExpression.

Authored-by: Yuming Wang 
Signed-off-by: huaxingao 
---
 .../org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala | 10 +++---
 1 file changed, 3 insertions(+), 7 deletions(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
index 7635918279a..45e70bdcb6c 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
@@ -301,20 +301,16 @@ trait CheckAnalysis extends PredicateHelper with 
LookupCatalog {
   failAnalysis("Input argument tolerance must be non-negative.")
 }
 
-  case a @ Aggregate(groupingExprs, aggregateExprs, child) =>
-def isAggregateExpression(expr: Expression): Boolean = {
-  expr.isInstanceOf[AggregateExpression] || 
PythonUDF.isGroupedAggPandasUDF(expr)
-}
-
+  case Aggregate(groupingExprs, aggregateExprs, _) =>
 def checkValidAggregateExpression(expr: Expression): Unit = expr 
match {
-  case expr: Expression if isAggregateExpression(expr) =>
+  case expr: Expression if AggregateExpression.isAggregate(expr) =>
 val aggFunction = expr match {
   case agg: AggregateExpression => agg.aggregateFunction
   case udf: PythonUDF => udf
 }
 aggFunction.children.foreach { child =>
   child.foreach {
-case expr: Expression if isAggregateExpression(expr) =>
+case expr: Expression if 
AggregateExpression.isAggregate(expr) =>
   failAnalysis(
 s"It is not allowed to use an aggregate function in 
the argument of " +
   s"another aggregate function. Please use the inner 
aggregate function " +





[spark] branch master updated: [SPARK-39417][SQL] Handle Null partition values in PartitioningUtils

2022-06-09 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new dcfd9f01289 [SPARK-39417][SQL] Handle Null partition values in 
PartitioningUtils
dcfd9f01289 is described below

commit dcfd9f01289f26c1a25e97432710a13772b3ad4c
Author: Prashant Singh 
AuthorDate: Wed Jun 8 23:08:44 2022 -0700

[SPARK-39417][SQL] Handle Null partition values in PartitioningUtils

### What changes were proposed in this pull request?

We should not try casting everything returned by
`removeLeadingZerosFromNumberTypePartition` to a string, because it returns
null when the partition value is null and has already been replaced by
`DEFAULT_PARTITION_NAME`.

### Why are the changes needed?

For null partitions, where `removeLeadingZerosFromNumberTypePartition` is
called, it would throw an NPE and the query would fail.
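
A minimal sketch of the null-safe pattern used in the fix (the helper name is
hypothetical):

```scala
// Wrap a possibly-null result in Option before calling toString, so that null
// propagates instead of throwing a NullPointerException.
def safeToString(v: Any): String = Option(v).map(_.toString).orNull

assert(safeToString(10) == "10")
assert(safeToString(null) == null)
```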

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Added a UT, which would fail with an NPE otherwise.

Closes #36810 from singhpk234/psinghvk/fix-npe.

Authored-by: Prashant Singh 
Signed-off-by: huaxingao 
---
 .../spark/sql/execution/datasources/PartitioningUtils.scala   | 2 +-
 .../datasources/parquet/ParquetPartitionDiscoverySuite.scala  | 8 
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala
index 166fc852899..e856bb5b9c2 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala
@@ -359,7 +359,7 @@ object PartitioningUtils extends SQLConfHelper{
   def removeLeadingZerosFromNumberTypePartition(value: String, dataType: 
DataType): String =
 dataType match {
   case ByteType | ShortType | IntegerType | LongType | FloatType | 
DoubleType =>
-castPartValueToDesiredType(dataType, value, null).toString
+Option(castPartValueToDesiredType(dataType, value, 
null)).map(_.toString).orNull
   case _ => value
 }
 
diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetPartitionDiscoverySuite.scala
 
b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetPartitionDiscoverySuite.scala
index b5947a4f820..fb5595322f7 100644
--- 
a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetPartitionDiscoverySuite.scala
+++ 
b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetPartitionDiscoverySuite.scala
@@ -1259,6 +1259,14 @@ class ParquetV2PartitionDiscoverySuite extends 
ParquetPartitionDiscoverySuite {
 assert("p_int=10/p_float=1.0" === path)
   }
 
+  test("SPARK-39417: Null partition value") {
+// null partition value is replaced by DEFAULT_PARTITION_NAME before 
hitting getPathFragment.
+val spec = Map("p_int"-> ExternalCatalogUtils.DEFAULT_PARTITION_NAME)
+val schema = new StructType().add("p_int", "int")
+val path = PartitioningUtils.getPathFragment(spec, schema)
+assert(s"p_int=${ExternalCatalogUtils.DEFAULT_PARTITION_NAME}" === path)
+  }
+
   test("read partitioned table - partition key included in Parquet file") {
 withTempDir { base =>
   for {





[spark] branch branch-3.3 updated: [SPARK-39417][SQL] Handle Null partition values in PartitioningUtils

2022-06-09 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a commit to branch branch-3.3
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.3 by this push:
 new 4e5ada90cfb [SPARK-39417][SQL] Handle Null partition values in 
PartitioningUtils
4e5ada90cfb is described below

commit 4e5ada90cfb89caa25addd8991cec2af843e24a9
Author: Prashant Singh 
AuthorDate: Wed Jun 8 23:08:44 2022 -0700

[SPARK-39417][SQL] Handle Null partition values in PartitioningUtils

### What changes were proposed in this pull request?

We should not try casting everything returned by
`removeLeadingZerosFromNumberTypePartition` to a string, because it returns
null when the partition value is null and has already been replaced by
`DEFAULT_PARTITION_NAME`.

### Why are the changes needed?

For null partitions, where `removeLeadingZerosFromNumberTypePartition` is
called, it would throw an NPE and the query would fail.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Added a UT, which would fail with an NPE otherwise.

Closes #36810 from singhpk234/psinghvk/fix-npe.

Authored-by: Prashant Singh 
Signed-off-by: huaxingao 
(cherry picked from commit dcfd9f01289f26c1a25e97432710a13772b3ad4c)
Signed-off-by: huaxingao 
---
 .../spark/sql/execution/datasources/PartitioningUtils.scala   | 2 +-
 .../datasources/parquet/ParquetPartitionDiscoverySuite.scala  | 8 
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala
index 166fc852899..e856bb5b9c2 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala
@@ -359,7 +359,7 @@ object PartitioningUtils extends SQLConfHelper{
   def removeLeadingZerosFromNumberTypePartition(value: String, dataType: 
DataType): String =
 dataType match {
   case ByteType | ShortType | IntegerType | LongType | FloatType | 
DoubleType =>
-castPartValueToDesiredType(dataType, value, null).toString
+Option(castPartValueToDesiredType(dataType, value, 
null)).map(_.toString).orNull
   case _ => value
 }
 
diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetPartitionDiscoverySuite.scala
 
b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetPartitionDiscoverySuite.scala
index ee905fba745..bd908a36401 100644
--- 
a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetPartitionDiscoverySuite.scala
+++ 
b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetPartitionDiscoverySuite.scala
@@ -1259,6 +1259,14 @@ class ParquetV2PartitionDiscoverySuite extends 
ParquetPartitionDiscoverySuite {
 assert("p_int=10/p_float=1.0" === path)
   }
 
+  test("SPARK-39417: Null partition value") {
+// null partition value is replaced by DEFAULT_PARTITION_NAME before 
hitting getPathFragment.
+val spec = Map("p_int"-> ExternalCatalogUtils.DEFAULT_PARTITION_NAME)
+val schema = new StructType().add("p_int", "int")
+val path = PartitioningUtils.getPathFragment(spec, schema)
+assert(s"p_int=${ExternalCatalogUtils.DEFAULT_PARTITION_NAME}" === path)
+  }
+
   test("read partitioned table - partition key included in Parquet file") {
 withTempDir { base =>
   for {





[spark] branch branch-3.1 updated: [SPARK-39393][SQL] Parquet data source only supports push-down predicate filters for non-repeated primitive types

2022-06-08 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a commit to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.1 by this push:
 new 512d337abf1 [SPARK-39393][SQL] Parquet data source only supports 
push-down predicate filters for non-repeated primitive types
512d337abf1 is described below

commit 512d337abf1387a81ac47e50656e330eb3f51b22
Author: Amin Borjian 
AuthorDate: Wed Jun 8 13:30:44 2022 -0700

[SPARK-39393][SQL] Parquet data source only supports push-down predicate 
filters for non-repeated primitive types

### What changes were proposed in this pull request?

In Spark version 3.1.0 and newer, Spark creates extra filter predicate 
conditions for repeated parquet columns.
These fields cannot have a filter predicate, according to the
[PARQUET-34](https://issues.apache.org/jira/browse/PARQUET-34) issue in the
parquet library.

This PR solves this problem until the appropriate functionality is provided
by the parquet library.

Before this PR:

Assume the following Protocol Buffers schema:

```
message Model {
string name = 1;
repeated string keywords = 2;
}
```

Suppose a Parquet file is created from a set of records in the above format
with the help of the parquet-protobuf library.
With Spark 3.1.0 or newer, we get the following exception when running this
query in spark-shell:

```
val data = spark.read.parquet("/path/to/parquet")
data.registerTempTable("models")
spark.sql("select * from models where array_contains(keywords, 
'X')").show(false)
```

```
Caused by: java.lang.IllegalArgumentException: FilterPredicates do not 
currently support repeated columns. Column keywords is repeated.
  at 
org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.validateColumn(SchemaCompatibilityValidator.java:176)
  at 
org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.validateColumnFilterPredicate(SchemaCompatibilityValidator.java:149)
  at 
org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.visit(SchemaCompatibilityValidator.java:89)
  at 
org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.visit(SchemaCompatibilityValidator.java:56)
  at 
org.apache.parquet.filter2.predicate.Operators$NotEq.accept(Operators.java:192)
  at 
org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.validate(SchemaCompatibilityValidator.java:61)
  at 
org.apache.parquet.filter2.compat.RowGroupFilter.visit(RowGroupFilter.java:95)
  at 
org.apache.parquet.filter2.compat.RowGroupFilter.visit(RowGroupFilter.java:45)
  at 
org.apache.parquet.filter2.compat.FilterCompat$FilterPredicateCompat.accept(FilterCompat.java:149)
  at 
org.apache.parquet.filter2.compat.RowGroupFilter.filterRowGroups(RowGroupFilter.java:72)
  at 
org.apache.parquet.hadoop.ParquetFileReader.filterRowGroups(ParquetFileReader.java:870)
  at 
org.apache.parquet.hadoop.ParquetFileReader.(ParquetFileReader.java:789)
  at 
org.apache.parquet.hadoop.ParquetFileReader.open(ParquetFileReader.java:657)
  at 
org.apache.parquet.hadoop.ParquetRecordReader.initializeInternalReader(ParquetRecordReader.java:162)
  at 
org.apache.parquet.hadoop.ParquetRecordReader.initialize(ParquetRecordReader.java:140)
  at 
org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat.$anonfun$buildReaderWithPartitionValues$2(ParquetFileFormat.scala:373)
  at 
org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:127)
...
```

The cause of the problem is a change in the generated data filtering
conditions:

```
spark.sql("select * from log where array_contains(keywords, 
'X')").explain(true);

// Spark 3.0.2 and older
== Physical Plan ==
...
+- FileScan parquet [link#0,keywords#1]
  DataFilters: [array_contains(keywords#1, Google)]
  PushedFilters: []
  ...

// Spark 3.1.0 and newer
== Physical Plan == ...
+- FileScan parquet [link#0,keywords#1]
  DataFilters: [isnotnull(keywords#1),  array_contains(keywords#1, Google)]
  PushedFilters: [IsNotNull(keywords)]
  ...
```

Pushing filters down for repeated Parquet columns is unnecessary because the
Parquet library does not support it for now, so we can exclude such columns
from the pushed predicate filters and avoid the exception.
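As a rough illustration of the kind of guard this implies (a sketch against parquet-mr's public schema API, not the actual change in `ParquetFilters`):

```scala
import org.apache.parquet.schema.MessageType
import org.apache.parquet.schema.Type.Repetition

// Only non-repeated primitive fields can safely receive a pushed-down predicate;
// repeated fields are rejected by parquet-mr's SchemaCompatibilityValidator.
def canPushDown(schema: MessageType, fieldName: String): Boolean =
  schema.containsField(fieldName) && {
    val field = schema.getType(fieldName)
    field.isPrimitive && !field.isRepetition(Repetition.REPEATED)
  }
```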

### Why are the changes needed?

Predicate filters that are pushed down to parquet should not be created on 
repeated-type fields.

### Does this PR introduce any user-facing change?

No, it only fixes a bug; before this, due to the limitatio

[spark] branch branch-3.2 updated: [SPARK-39393][SQL] Parquet data source only supports push-down predicate filters for non-repeated primitive types

2022-06-08 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.2 by this push:
 new d42f53b5ec4 [SPARK-39393][SQL] Parquet data source only supports 
push-down predicate filters for non-repeated primitive types
d42f53b5ec4 is described below

commit d42f53b5ec4d3442acadaa0f2737a8430172a562
Author: Amin Borjian 
AuthorDate: Wed Jun 8 13:30:44 2022 -0700

[SPARK-39393][SQL] Parquet data source only supports push-down predicate 
filters for non-repeated primitive types

### What changes were proposed in this pull request?

In Spark 3.1.0 and newer, Spark creates extra filter predicate conditions for
repeated Parquet columns. Such fields cannot carry a filter predicate, according
to the [PARQUET-34](https://issues.apache.org/jira/browse/PARQUET-34) issue in
the Parquet library.

This PR works around the problem until the appropriate functionality is
provided by the Parquet library.

Before this PR:

Assume the following Protocol Buffers schema:

```
message Model {
string name = 1;
repeated string keywords = 2;
}
```

Suppose a Parquet file is created from a set of records in the above format
with the help of the parquet-protobuf library.
With Spark 3.1.0 or newer, we get the following exception when running this
query in spark-shell:

```
val data = spark.read.parquet("/path/to/parquet")
data.registerTempTable("models")
spark.sql("select * from models where array_contains(keywords, 
'X')").show(false)
```

```
Caused by: java.lang.IllegalArgumentException: FilterPredicates do not 
currently support repeated columns. Column keywords is repeated.
  at 
org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.validateColumn(SchemaCompatibilityValidator.java:176)
  at 
org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.validateColumnFilterPredicate(SchemaCompatibilityValidator.java:149)
  at 
org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.visit(SchemaCompatibilityValidator.java:89)
  at 
org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.visit(SchemaCompatibilityValidator.java:56)
  at 
org.apache.parquet.filter2.predicate.Operators$NotEq.accept(Operators.java:192)
  at 
org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.validate(SchemaCompatibilityValidator.java:61)
  at 
org.apache.parquet.filter2.compat.RowGroupFilter.visit(RowGroupFilter.java:95)
  at 
org.apache.parquet.filter2.compat.RowGroupFilter.visit(RowGroupFilter.java:45)
  at 
org.apache.parquet.filter2.compat.FilterCompat$FilterPredicateCompat.accept(FilterCompat.java:149)
  at 
org.apache.parquet.filter2.compat.RowGroupFilter.filterRowGroups(RowGroupFilter.java:72)
  at 
org.apache.parquet.hadoop.ParquetFileReader.filterRowGroups(ParquetFileReader.java:870)
  at 
org.apache.parquet.hadoop.ParquetFileReader.(ParquetFileReader.java:789)
  at 
org.apache.parquet.hadoop.ParquetFileReader.open(ParquetFileReader.java:657)
  at 
org.apache.parquet.hadoop.ParquetRecordReader.initializeInternalReader(ParquetRecordReader.java:162)
  at 
org.apache.parquet.hadoop.ParquetRecordReader.initialize(ParquetRecordReader.java:140)
  at 
org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat.$anonfun$buildReaderWithPartitionValues$2(ParquetFileFormat.scala:373)
  at 
org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:127)
...
```

The cause of the problem is a change in the generated data filtering
conditions:

```
spark.sql("select * from log where array_contains(keywords, 
'X')").explain(true);

// Spark 3.0.2 and older
== Physical Plan ==
...
+- FileScan parquet [link#0,keywords#1]
  DataFilters: [array_contains(keywords#1, Google)]
  PushedFilters: []
  ...

// Spark 3.1.0 and newer
== Physical Plan == ...
+- FileScan parquet [link#0,keywords#1]
  DataFilters: [isnotnull(keywords#1),  array_contains(keywords#1, Google)]
  PushedFilters: [IsNotNull(keywords)]
  ...
```

Pushing filters down for repeated Parquet columns is unnecessary because the
Parquet library does not support it for now, so we can exclude such columns
from the pushed predicate filters and avoid the exception.

### Why are the changes needed?

Predicate filters that are pushed down to parquet should not be created on 
repeated-type fields.

### Does this PR introduce any user-facing change?

No, it only fixes a bug; before this, due to the limitatio

[spark] branch branch-3.3 updated: [SPARK-39393][SQL] Parquet data source only supports push-down predicate filters for non-repeated primitive types

2022-06-08 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a commit to branch branch-3.3
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.3 by this push:
 new 5847014fc3f [SPARK-39393][SQL] Parquet data source only supports 
push-down predicate filters for non-repeated primitive types
5847014fc3f is described below

commit 5847014fc3fe08b8a59c107a99c1540fbb2c2208
Author: Amin Borjian 
AuthorDate: Wed Jun 8 13:30:44 2022 -0700

[SPARK-39393][SQL] Parquet data source only supports push-down predicate 
filters for non-repeated primitive types

### What changes were proposed in this pull request?

In Spark 3.1.0 and newer, Spark creates extra filter predicate conditions for
repeated Parquet columns. Such fields cannot carry a filter predicate, according
to the [PARQUET-34](https://issues.apache.org/jira/browse/PARQUET-34) issue in
the Parquet library.

This PR works around the problem until the appropriate functionality is
provided by the Parquet library.

Before this PR:

Assume the following Protocol Buffers schema:

```
message Model {
string name = 1;
repeated string keywords = 2;
}
```

Suppose a Parquet file is created from a set of records in the above format
with the help of the parquet-protobuf library.
With Spark 3.1.0 or newer, we get the following exception when running this
query in spark-shell:

```
val data = spark.read.parquet("/path/to/parquet")
data.registerTempTable("models")
spark.sql("select * from models where array_contains(keywords, 
'X')").show(false)
```

```
Caused by: java.lang.IllegalArgumentException: FilterPredicates do not 
currently support repeated columns. Column keywords is repeated.
  at 
org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.validateColumn(SchemaCompatibilityValidator.java:176)
  at 
org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.validateColumnFilterPredicate(SchemaCompatibilityValidator.java:149)
  at 
org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.visit(SchemaCompatibilityValidator.java:89)
  at 
org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.visit(SchemaCompatibilityValidator.java:56)
  at 
org.apache.parquet.filter2.predicate.Operators$NotEq.accept(Operators.java:192)
  at 
org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.validate(SchemaCompatibilityValidator.java:61)
  at 
org.apache.parquet.filter2.compat.RowGroupFilter.visit(RowGroupFilter.java:95)
  at 
org.apache.parquet.filter2.compat.RowGroupFilter.visit(RowGroupFilter.java:45)
  at 
org.apache.parquet.filter2.compat.FilterCompat$FilterPredicateCompat.accept(FilterCompat.java:149)
  at 
org.apache.parquet.filter2.compat.RowGroupFilter.filterRowGroups(RowGroupFilter.java:72)
  at 
org.apache.parquet.hadoop.ParquetFileReader.filterRowGroups(ParquetFileReader.java:870)
  at 
org.apache.parquet.hadoop.ParquetFileReader.(ParquetFileReader.java:789)
  at 
org.apache.parquet.hadoop.ParquetFileReader.open(ParquetFileReader.java:657)
  at 
org.apache.parquet.hadoop.ParquetRecordReader.initializeInternalReader(ParquetRecordReader.java:162)
  at 
org.apache.parquet.hadoop.ParquetRecordReader.initialize(ParquetRecordReader.java:140)
  at 
org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat.$anonfun$buildReaderWithPartitionValues$2(ParquetFileFormat.scala:373)
  at 
org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:127)
...
```

The cause of the problem is a change in the generated data filtering
conditions:

```
spark.sql("select * from log where array_contains(keywords, 
'X')").explain(true);

// Spark 3.0.2 and older
== Physical Plan ==
...
+- FileScan parquet [link#0,keywords#1]
  DataFilters: [array_contains(keywords#1, Google)]
  PushedFilters: []
  ...

// Spark 3.1.0 and newer
== Physical Plan == ...
+- FileScan parquet [link#0,keywords#1]
  DataFilters: [isnotnull(keywords#1),  array_contains(keywords#1, Google)]
  PushedFilters: [IsNotNull(keywords)]
  ...
```

Pushing filters down for repeated Parquet columns is unnecessary because the
Parquet library does not support it for now, so we can exclude such columns
from the pushed predicate filters and avoid the exception.

### Why are the changes needed?

Predicate filters that are pushed down to parquet should not be created on 
repeated-type fields.

### Does this PR introduce any user-facing change?

No, it only fixes a bug; before this, due to the limitatio

[spark] branch master updated (19afe1341d2 -> ac2881a8c3c)

2022-06-08 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 19afe1341d2 [SPARK-39412][SQL] Exclude IllegalStateException from 
Spark's internal errors
 add ac2881a8c3c [SPARK-39393][SQL] Parquet data source only supports 
push-down predicate filters for non-repeated primitive types

No new revisions were added by this update.

Summary of changes:
 .../datasources/parquet/ParquetFilters.scala   |  6 -
 .../datasources/parquet/ParquetFilterSuite.scala   | 29 ++
 2 files changed, 34 insertions(+), 1 deletion(-)





[spark] branch master updated: [SPARK-39413][SQL] Capitalize sql keywords in JDBCV2Suite

2022-06-08 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 7d44b47596a [SPARK-39413][SQL] Capitalize sql keywords in JDBCV2Suite
7d44b47596a is described below

commit 7d44b47596a14269c4199ccf86aebf4e6c9e7ca4
Author: Jiaan Geng 
AuthorDate: Wed Jun 8 07:34:36 2022 -0700

[SPARK-39413][SQL] Capitalize sql keywords in JDBCV2Suite

### What changes were proposed in this pull request?
`JDBCV2Suite` has some test cases whose SQL keywords are not capitalized.
This PR capitalizes the SQL keywords in `JDBCV2Suite`.

### Why are the changes needed?
Capitalize SQL keywords in `JDBCV2Suite`.

### Does this PR introduce _any_ user-facing change?
'No'.
Just update test cases.

### How was this patch tested?
N/A.

Closes #36805 from beliefer/SPARK-39413.

Authored-by: Jiaan Geng 
Signed-off-by: huaxingao 
---
 .../org/apache/spark/sql/jdbc/JDBCV2Suite.scala| 66 +++---
 1 file changed, 33 insertions(+), 33 deletions(-)

diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCV2Suite.scala 
b/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCV2Suite.scala
index 9de4872fd60..cf96c35d8ae 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCV2Suite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCV2Suite.scala
@@ -679,7 +679,7 @@ class JDBCV2Suite extends QueryTest with SharedSparkSession 
with ExplainSuiteHel
   }
 
   test("scan with filter push-down with string functions") {
-val df1 = sql("select * FROM h2.test.employee where " +
+val df1 = sql("SELECT * FROM h2.test.employee WHERE " +
   "substr(name, 2, 1) = 'e'" +
   " AND upper(name) = 'JEN' AND lower(name) = 'jen' ")
 checkFiltersRemoved(df1)
@@ -689,7 +689,7 @@ class JDBCV2Suite extends QueryTest with SharedSparkSession 
with ExplainSuiteHel
 checkPushedInfo(df1, expectedPlanFragment1)
 checkAnswer(df1, Seq(Row(6, "jen", 12000, 1200, true)))
 
-val df2 = sql("select * FROM h2.test.employee where " +
+val df2 = sql("SELECT * FROM h2.test.employee WHERE " +
   "trim(name) = 'jen' AND trim('j', name) = 'en'" +
   "AND translate(name, 'e', 1) = 'j1n'")
 checkFiltersRemoved(df2)
@@ -699,7 +699,7 @@ class JDBCV2Suite extends QueryTest with SharedSparkSession 
with ExplainSuiteHel
 checkPushedInfo(df2, expectedPlanFragment2)
 checkAnswer(df2, Seq(Row(6, "jen", 12000, 1200, true)))
 
-val df3 = sql("select * FROM h2.test.employee where " +
+val df3 = sql("SELECT * FROM h2.test.employee WHERE " +
   "ltrim(name) = 'jen' AND ltrim('j', name) = 'en'")
 checkFiltersRemoved(df3)
 val expectedPlanFragment3 =
@@ -708,7 +708,7 @@ class JDBCV2Suite extends QueryTest with SharedSparkSession 
with ExplainSuiteHel
 checkPushedInfo(df3, expectedPlanFragment3)
 checkAnswer(df3, Seq(Row(6, "jen", 12000, 1200, true)))
 
-val df4 = sql("select * FROM h2.test.employee where " +
+val df4 = sql("SELECT * FROM h2.test.employee WHERE " +
   "rtrim(name) = 'jen' AND rtrim('n', name) = 'je'")
 checkFiltersRemoved(df4)
 val expectedPlanFragment4 =
@@ -718,7 +718,7 @@ class JDBCV2Suite extends QueryTest with SharedSparkSession 
with ExplainSuiteHel
 checkAnswer(df4, Seq(Row(6, "jen", 12000, 1200, true)))
 
 // H2 does not support OVERLAY
-val df5 = sql("select * FROM h2.test.employee where OVERLAY(NAME, '1', 2, 
1) = 'j1n'")
+val df5 = sql("SELECT * FROM h2.test.employee WHERE OVERLAY(NAME, '1', 2, 
1) = 'j1n'")
 checkFiltersRemoved(df5, false)
 val expectedPlanFragment5 =
   "PushedFilters: [NAME IS NOT NULL]"
@@ -727,8 +727,8 @@ class JDBCV2Suite extends QueryTest with SharedSparkSession 
with ExplainSuiteHel
   }
 
   test("scan with aggregate push-down: MAX AVG with filter and group by") {
-val df = sql("select MAX(SaLaRY), AVG(BONUS) FROM h2.test.employee where 
dept > 0" +
-  " group by DePt")
+val df = sql("SELECT MAX(SaLaRY), AVG(BONUS) FROM h2.test.employee WHERE 
dept > 0" +
+  " GROUP BY DePt")
 checkFiltersRemoved(df)
 checkAggregateRemoved(df)
 checkPushedInfo(df, "PushedAggregates: [MAX(SALARY), AVG(BONUS)], " +
@@ -749,7 +749,7 @@ class JDBCV2Suite extends QueryTest with SharedSparkSession 
with ExplainSuiteHel
   }
 
   test("scan with aggregate push-down: MAX AVG with filter without group by") {
-val df = sql("select MAX(ID), AVG(ID) FROM h2.test.people where id > 0"

[spark] branch master updated: [SPARK-39390][CORE] Hide and optimize `viewAcls`/`viewAclsGroups`/`modifyAcls`/`modifyAclsGroups` from INFO log

2022-06-06 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 63f0f91b3f5 [SPARK-39390][CORE] Hide and optimize 
`viewAcls`/`viewAclsGroups`/`modifyAcls`/`modifyAclsGroups` from INFO log
63f0f91b3f5 is described below

commit 63f0f91b3f5c5d1dee9236824027bd978192a9ff
Author: Qian.Sun 
AuthorDate: Mon Jun 6 21:21:45 2022 -0700

[SPARK-39390][CORE] Hide and optimize 
`viewAcls`/`viewAclsGroups`/`modifyAcls`/`modifyAclsGroups` from INFO log

### What changes were proposed in this pull request?

This PR aims to hide and optimize 
`viewAcls`/`viewAclsGroups`/`modifyAcls`/`modifyAclsGroups` from INFO log.

### Why are the changes needed?

* In the case of an empty set, `Set()`, it conveys little information to users.
* In the case of a non-empty set, `Set(root)`, the reading experience is poor.
```scala
2022-06-02 22:02:48.328 - stderr> 22/06/03 05:02:48 INFO SecurityManager: 
SecurityManager: authentication
disabled; ui acls disabled; users  with view permissions: Set(root); groups 
with view permissions: Set();
users  with modify permissions: Set(root); groups with modify permissions: 
Set()
```
### Does this PR introduce _any_ user-facing change?

This is an INFO-log-only change.

### How was this patch tested?

Manually.

**BEFORE**

```scala
2022-06-02 22:02:48.328 - stderr> 22/06/03 05:02:48 INFO SecurityManager: 
SecurityManager: authentication
disabled; ui acls disabled; users  with view permissions: Set(root); groups 
with view permissions: Set();
users  with modify permissions: Set(root); groups with modify permissions: 
Set()
```
**AFTER**
```scala
2022-06-02 22:02:48.328 - stderr> 22/06/03 05:02:48 INFO SecurityManager: 
SecurityManager: authentication
disabled; ui acls disabled; users  with view permissions: root; groups with 
view permissions: EMPTY;
users  with modify permissions: root; groups with modify permissions: root, 
spark
```
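The formatting rule applied inline in the diff below can be captured by a tiny helper (hypothetical, not part of the patch):

```scala
// Non-empty ACL sets are rendered as a comma-separated list; empty sets as "EMPTY".
def formatAcl(entries: Set[String]): String =
  if (entries.nonEmpty) entries.mkString(", ") else "EMPTY"

formatAcl(Set.empty)             // "EMPTY"
formatAcl(Set("root", "spark"))  // "root, spark"
```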

Closes #36777 from dcoliversun/SPARK-39390.

Authored-by: Qian.Sun 
Signed-off-by: huaxingao 
---
 core/src/main/scala/org/apache/spark/SecurityManager.scala | 12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/SecurityManager.scala 
b/core/src/main/scala/org/apache/spark/SecurityManager.scala
index f11176cc233..7e72ae8d89e 100644
--- a/core/src/main/scala/org/apache/spark/SecurityManager.scala
+++ b/core/src/main/scala/org/apache/spark/SecurityManager.scala
@@ -87,10 +87,14 @@ private[spark] class SecurityManager(
   private var secretKey: String = _
   logInfo("SecurityManager: authentication " + (if (authOn) "enabled" else 
"disabled") +
 "; ui acls " + (if (aclsOn) "enabled" else "disabled") +
-"; users  with view permissions: " + viewAcls.toString() +
-"; groups with view permissions: " + viewAclsGroups.toString() +
-"; users  with modify permissions: " + modifyAcls.toString() +
-"; groups with modify permissions: " + modifyAclsGroups.toString())
+"; users with view permissions: " +
+(if (viewAcls.nonEmpty) viewAcls.mkString(", ") else "EMPTY") +
+"; groups with view permissions: " +
+(if (viewAclsGroups.nonEmpty) viewAclsGroups.mkString(", ") else "EMPTY") +
+"; users with modify permissions: " +
+(if (modifyAcls.nonEmpty) modifyAcls.mkString(", ") else "EMPTY") +
+"; groups with modify permissions: " +
+(if (modifyAclsGroups.nonEmpty) modifyAclsGroups.mkString(", ") else 
"EMPTY"))
 
   private val hadoopConf = SparkHadoopUtil.get.newConfiguration(sparkConf)
   // the default SSL configuration - it will be used by all communication 
layers unless overwritten





[spark] branch branch-3.2 updated: [MINOR][ML][DOCS] Fix sql data types link in the ml-pipeline page

2022-05-23 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.2 by this push:
 new ff682c4e26f [MINOR][ML][DOCS] Fix sql data types link in the 
ml-pipeline page
ff682c4e26f is described below

commit ff682c4e26fe82259f14774368fa104da0c41a63
Author: Kent Yao 
AuthorDate: Mon May 23 07:45:50 2022 -0700

[MINOR][ML][DOCS] Fix sql data types link in the ml-pipeline page

### What changes were proposed in this pull request?

Screenshot: https://user-images.githubusercontent.com/8326978/169767919-6c48554c-87ff-4d40-a47d-ec4da0c993f7.png

The [Spark SQL datatype reference](https://spark.apache.org/docs/latest/sql-reference.html#data-types)
link on https://spark.apache.org/docs/latest/ml-pipeline.html#dataframe, i.e.
`https://spark.apache.org/docs/latest/sql-reference.html#data-types`, is invalid;
it should be [Spark SQL datatype reference](https://spark.apache.org/docs/latest/sql-ref-datatypes.html),
i.e. `https://spark.apache.org/docs/latest/sql-ref-datatypes.html`.

### Why are the changes needed?

doc fix

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

`bundle exec jekyll serve`

Closes #36633 from yaooqinn/minor.

Authored-by: Kent Yao 
Signed-off-by: huaxingao 
(cherry picked from commit de73753bb2e5fd947f237e731ff05aa9f2711677)
Signed-off-by: huaxingao 
---
 docs/ml-pipeline.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/ml-pipeline.md b/docs/ml-pipeline.md
index 105b1273311..5f9c94781ba 100644
--- a/docs/ml-pipeline.md
+++ b/docs/ml-pipeline.md
@@ -72,7 +72,7 @@ E.g., a learning algorithm is an `Estimator` which trains on 
a `DataFrame` and p
 Machine learning can be applied to a wide variety of data types, such as 
vectors, text, images, and structured data.
 This API adopts the `DataFrame` from Spark SQL in order to support a variety 
of data types.
 
-`DataFrame` supports many basic and structured types; see the [Spark SQL 
datatype reference](sql-reference.html#data-types) for a list of supported 
types.
+`DataFrame` supports many basic and structured types; see the [Spark SQL 
datatype reference](sql-ref-datatypes.html) for a list of supported types.
 In addition to the types listed in the Spark SQL guide, `DataFrame` can use ML 
[`Vector`](mllib-data-types.html#local-vector) types.
 
 A `DataFrame` can be created either implicitly or explicitly from a regular 
`RDD`.  See the code examples below and the [Spark SQL programming 
guide](sql-programming-guide.html) for examples.





[spark] branch branch-3.3 updated: [MINOR][ML][DOCS] Fix sql data types link in the ml-pipeline page

2022-05-23 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a commit to branch branch-3.3
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.3 by this push:
 new 2a31bf572bf [MINOR][ML][DOCS] Fix sql data types link in the 
ml-pipeline page
2a31bf572bf is described below

commit 2a31bf572bf386bbae2a8c6941ea43722068e0c6
Author: Kent Yao 
AuthorDate: Mon May 23 07:45:50 2022 -0700

[MINOR][ML][DOCS] Fix sql data types link in the ml-pipeline page

### What changes were proposed in this pull request?

Screenshot: https://user-images.githubusercontent.com/8326978/169767919-6c48554c-87ff-4d40-a47d-ec4da0c993f7.png

The [Spark SQL datatype reference](https://spark.apache.org/docs/latest/sql-reference.html#data-types)
link on https://spark.apache.org/docs/latest/ml-pipeline.html#dataframe, i.e.
`https://spark.apache.org/docs/latest/sql-reference.html#data-types`, is invalid;
it should be [Spark SQL datatype reference](https://spark.apache.org/docs/latest/sql-ref-datatypes.html),
i.e. `https://spark.apache.org/docs/latest/sql-ref-datatypes.html`.

### Why are the changes needed?

doc fix

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

`bundle exec jekyll serve`

Closes #36633 from yaooqinn/minor.

Authored-by: Kent Yao 
Signed-off-by: huaxingao 
(cherry picked from commit de73753bb2e5fd947f237e731ff05aa9f2711677)
Signed-off-by: huaxingao 
---
 docs/ml-pipeline.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/ml-pipeline.md b/docs/ml-pipeline.md
index 105b1273311..5f9c94781ba 100644
--- a/docs/ml-pipeline.md
+++ b/docs/ml-pipeline.md
@@ -72,7 +72,7 @@ E.g., a learning algorithm is an `Estimator` which trains on 
a `DataFrame` and p
 Machine learning can be applied to a wide variety of data types, such as 
vectors, text, images, and structured data.
 This API adopts the `DataFrame` from Spark SQL in order to support a variety 
of data types.
 
-`DataFrame` supports many basic and structured types; see the [Spark SQL 
datatype reference](sql-reference.html#data-types) for a list of supported 
types.
+`DataFrame` supports many basic and structured types; see the [Spark SQL 
datatype reference](sql-ref-datatypes.html) for a list of supported types.
 In addition to the types listed in the Spark SQL guide, `DataFrame` can use ML 
[`Vector`](mllib-data-types.html#local-vector) types.
 
 A `DataFrame` can be created either implicitly or explicitly from a regular 
`RDD`.  See the code examples below and the [Spark SQL programming 
guide](sql-programming-guide.html) for examples.





[spark] branch master updated: [MINOR][ML][DOCS] Fix sql data types link in the ml-pipeline page

2022-05-23 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new de73753bb2e [MINOR][ML][DOCS] Fix sql data types link in the 
ml-pipeline page
de73753bb2e is described below

commit de73753bb2e5fd947f237e731ff05aa9f2711677
Author: Kent Yao 
AuthorDate: Mon May 23 07:45:50 2022 -0700

[MINOR][ML][DOCS] Fix sql data types link in the ml-pipeline page

### What changes were proposed in this pull request?

Screenshot: https://user-images.githubusercontent.com/8326978/169767919-6c48554c-87ff-4d40-a47d-ec4da0c993f7.png

The [Spark SQL datatype reference](https://spark.apache.org/docs/latest/sql-reference.html#data-types)
link on https://spark.apache.org/docs/latest/ml-pipeline.html#dataframe, i.e.
`https://spark.apache.org/docs/latest/sql-reference.html#data-types`, is invalid;
it should be [Spark SQL datatype reference](https://spark.apache.org/docs/latest/sql-ref-datatypes.html),
i.e. `https://spark.apache.org/docs/latest/sql-ref-datatypes.html`.

### Why are the changes needed?

doc fix

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

`bundle exec jekyll serve`

Closes #36633 from yaooqinn/minor.

Authored-by: Kent Yao 
Signed-off-by: huaxingao 
---
 docs/ml-pipeline.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/ml-pipeline.md b/docs/ml-pipeline.md
index 105b1273311..5f9c94781ba 100644
--- a/docs/ml-pipeline.md
+++ b/docs/ml-pipeline.md
@@ -72,7 +72,7 @@ E.g., a learning algorithm is an `Estimator` which trains on 
a `DataFrame` and p
 Machine learning can be applied to a wide variety of data types, such as 
vectors, text, images, and structured data.
 This API adopts the `DataFrame` from Spark SQL in order to support a variety 
of data types.
 
-`DataFrame` supports many basic and structured types; see the [Spark SQL 
datatype reference](sql-reference.html#data-types) for a list of supported 
types.
+`DataFrame` supports many basic and structured types; see the [Spark SQL 
datatype reference](sql-ref-datatypes.html) for a list of supported types.
 In addition to the types listed in the Spark SQL guide, `DataFrame` can use ML 
[`Vector`](mllib-data-types.html#local-vector) types.
 
 A `DataFrame` can be created either implicitly or explicitly from a regular 
`RDD`.  See the code examples below and the [Spark SQL programming 
guide](sql-programming-guide.html) for examples.





[spark] branch master updated: [SPARK-39156][SQL] Clean up the usage of `ParquetLogRedirector` in `ParquetFileFormat`

2022-05-15 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new cc560ea8585 [SPARK-39156][SQL] Clean up the usage of 
`ParquetLogRedirector` in `ParquetFileFormat`
cc560ea8585 is described below

commit cc560ea8585d845af9a01ced1c536036f88e7ba7
Author: yangjie01 
AuthorDate: Sun May 15 08:34:09 2022 -0700

[SPARK-39156][SQL] Clean up the usage of `ParquetLogRedirector` in 
`ParquetFileFormat`

### What changes were proposed in this pull request?
SPARK-17993 introduced `ParquetLogRedirector` for Parquet versions < 1.9.
[PARQUET-305](https://issues.apache.org/jira/browse/PARQUET-305) switched Parquet
to slf4j instead of JUL in 1.9, and Spark now uses Parquet 1.12.2 and
[no longer relies on Parquet 1.6](https://github.com/apache/spark/pull/30005),
so `ParquetLogRedirector` is no longer needed. This PR cleans up its usage in
`ParquetFileFormat`.
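For context, the removed class installed a JUL-to-SLF4J bridge roughly like the sketch below (a simplified reconstruction, not the exact removed code):

```scala
import java.util.logging.{Logger => JulLogger}
import org.slf4j.bridge.SLF4JBridgeHandler

// Route java.util.logging records emitted by old parquet-mr (< 1.9) into SLF4J.
def redirectJulToSlf4j(logger: JulLogger): Unit = {
  logger.getHandlers.foreach(logger.removeHandler) // drop the default JUL handlers
  logger.setUseParentHandlers(false)               // avoid double logging via the root logger
  logger.addHandler(new SLF4JBridgeHandler())      // hand records to SLF4J
}
```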

### Why are the changes needed?
Clean up the usage of `ParquetLogRedirector` in `ParquetFileFormat`.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?

- Pass GA
- Manual test:

1. Build Spark client manually before and after this pr
2. Change parquet log4j level to debug:
```
logger.parquet1.name = org.apache.parquet
logger.parquet1.level = debug
logger.parquet2.name = parquet
logger.parquet2.level = debug
```
3. Try to read a Parquet file written with 1.6, for example
`sql/core/src/test/resources/test-data/dec-in-i32.parquet`.

```
java -jar parquet-tools-1.10.1.jar meta /${basedir}/dec-in-i32.parquet
file:file:/${basedir}/dec-in-i32.parquet
creator: parquet-mr version 1.6.0
extra:   org.apache.spark.sql.parquet.row.metadata = 
{"type":"struct","fields":[{"name":"i32_dec","type":"decimal(5,2)","nullable":true,"metadata":{}}]}

file schema: spark_schema


i32_dec: OPTIONAL INT32 O:DECIMAL R:0 D:1

row group 1: RC:16 TS:102 OFFSET:4


i32_dec:  INT32 GZIP DO:0 FPO:4 SZ:131/102/0.78 VC:16 
ENC:RLE,PLAIN_DICTIONARY,BIT_PACKED ST:[no stats for this column]
```

```
spark.read.parquet("file://${basedir}/ptable/dec-in-i32.parquet").show()
```

The log contents before and after this PR are consistent, and the error log
mentioned in SPARK-17993 does not appear.

Closes #36515 from LuciferYang/remove-parquetLogRedirector.

Authored-by: yangjie01 
Signed-off-by: huaxingao 
---
 .../datasources/parquet/ParquetLogRedirector.java  | 72 --
 .../datasources/parquet/ParquetFileFormat.scala| 10 ---
 2 files changed, 82 deletions(-)

diff --git 
a/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/ParquetLogRedirector.java
 
b/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/ParquetLogRedirector.java
deleted file mode 100644
index 7a7f32ee1e8..000
--- 
a/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/ParquetLogRedirector.java
+++ /dev/null
@@ -1,72 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one or more
- * contributor license agreements.  See the NOTICE file distributed with
- * this work for additional information regarding copyright ownership.
- * The ASF licenses this file to You under the Apache License, Version 2.0
- * (the "License"); you may not use this file except in compliance with
- * the License.  You may obtain a copy of the License at
- *
- *http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-package org.apache.spark.sql.execution.datasources.parquet;
-
-import java.io.Serializable;
-import java.util.logging.Handler;
-import java.util.logging.Logger;
-
-import org.apache.parquet.Log;
-import org.slf4j.bridge.SLF4JBridgeHandler;
-
-// Redirects the JUL logging for parquet-mr versions <= 1.8 to SLF4J logging 
using
-// SLF4JBridgeHandler. Parquet-mr versions >= 1.9 use SLF4J directly
-final class ParquetLogRedirector implements Serializable {
-  // Client classes should hold a reference to INSTANCE to ensure redirection 
occurs. Thi

[spark] branch master updated: [SPARK-39162][SQL] Jdbc dialect should decide which function could be pushed down

2022-05-14 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 97efe0efb26 [SPARK-39162][SQL] Jdbc dialect should decide which 
function could be pushed down
97efe0efb26 is described below

commit 97efe0efb2665833910e13eb7bae16cc1ad4e0fa
Author: Jiaan Geng 
AuthorDate: Sat May 14 16:28:21 2022 -0700

[SPARK-39162][SQL] Jdbc dialect should decide which function could be 
pushed down

### What changes were proposed in this pull request?
Regardless of whether a function is ANSI or not, we cannot be sure that every
database supports it.
So we should add a new API to `JdbcDialect` so that each JDBC dialect can
decide which functions can be pushed down.
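A minimal sketch of how a third-party dialect could use the new hook (the dialect name, URL prefix and function list are illustrative, not from this patch):

```scala
import java.util.Locale

import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects}

object MyDialect extends JdbcDialect {
  override def canHandle(url: String): Boolean =
    url.toLowerCase(Locale.ROOT).startsWith("jdbc:mydb")

  // Only functions listed here are compiled into SQL that is pushed down to the database.
  private val supportedFunctions = Set("ABS", "COALESCE", "LN", "EXP", "SQRT")
  override def isSupportedFunction(funcName: String): Boolean =
    supportedFunctions.contains(funcName)
}

JdbcDialects.registerDialect(MyDialect)
```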

### Why are the changes needed?
Make function push-down more flexible.

### Does this PR introduce _any_ user-facing change?
'No'.
New feature.

### How was this patch tested?
Existing tests.

Closes #36521 from beliefer/SPARK-39162.

Authored-by: Jiaan Geng 
Signed-off-by: huaxingao 
---
 .../spark/sql/errors/QueryCompilationErrors.scala  |  4 
 .../org/apache/spark/sql/jdbc/H2Dialect.scala  | 28 --
 .../org/apache/spark/sql/jdbc/JdbcDialects.scala   | 19 +++
 3 files changed, 23 insertions(+), 28 deletions(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala
index 3b167eeb417..efb4389ec50 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala
@@ -2404,10 +2404,6 @@ object QueryCompilationErrors extends QueryErrorsBase {
   "Sinks cannot request distribution and ordering in continuous execution 
mode")
   }
 
-  def noSuchFunctionError(database: String, funcInfo: String): Throwable = {
-new AnalysisException(s"$database does not support function: $funcInfo")
-  }
-
   // Return a more descriptive error message if the user tries to nest a 
DEFAULT column reference
   // inside some other expression (such as DEFAULT + 1) in an INSERT INTO 
command's VALUES list;
   // this is not allowed.
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/jdbc/H2Dialect.scala 
b/sql/core/src/main/scala/org/apache/spark/sql/jdbc/H2Dialect.scala
index 56cadbe8e2c..4a88203ec59 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/jdbc/H2Dialect.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/jdbc/H2Dialect.scala
@@ -20,13 +20,9 @@ package org.apache.spark.sql.jdbc
 import java.sql.{SQLException, Types}
 import java.util.Locale
 
-import scala.util.control.NonFatal
-
 import org.apache.spark.sql.AnalysisException
 import org.apache.spark.sql.catalyst.analysis.{NoSuchNamespaceException, 
NoSuchTableException, TableAlreadyExistsException}
-import org.apache.spark.sql.connector.expressions.Expression
 import org.apache.spark.sql.connector.expressions.aggregate.{AggregateFunc, 
GeneralAggregateFunc}
-import org.apache.spark.sql.errors.QueryCompilationErrors
 import org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils
 import org.apache.spark.sql.types.{BooleanType, ByteType, DataType, 
DecimalType, ShortType, StringType}
 
@@ -34,27 +30,11 @@ private object H2Dialect extends JdbcDialect {
   override def canHandle(url: String): Boolean =
 url.toLowerCase(Locale.ROOT).startsWith("jdbc:h2")
 
-  class H2SQLBuilder extends JDBCSQLBuilder {
-override def visitSQLFunction(funcName: String, inputs: Array[String]): 
String = {
-  funcName match {
-case "WIDTH_BUCKET" =>
-  val functionInfo = super.visitSQLFunction(funcName, inputs)
-  throw QueryCompilationErrors.noSuchFunctionError("H2", functionInfo)
-case _ => super.visitSQLFunction(funcName, inputs)
-  }
-}
-  }
+  private val supportedFunctions =
+Set("ABS", "COALESCE", "LN", "EXP", "POWER", "SQRT", "FLOOR", "CEIL")
 
-  override def compileExpression(expr: Expression): Option[String] = {
-val h2SQLBuilder = new H2SQLBuilder()
-try {
-  Some(h2SQLBuilder.build(expr))
-} catch {
-  case NonFatal(e) =>
-logWarning("Error occurs while compiling V2 expression", e)
-None
-}
-  }
+  override def isSupportedFunction(funcName: String): Boolean =
+supportedFunctions.contains(funcName)
 
   override def compileAggregate(aggFunction: AggregateFunc): Option[String] = {
 super.compileAggregate(aggFunction).orElse(
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala 

[spark] branch master updated (ad81ba83971 -> 16b5124d75d)

2022-05-09 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from ad81ba83971 [SPARK-39125][BUILD] Upgrade netty to 4.1.77 and 
netty-tcnative
 add 16b5124d75d [SPARK-38939][SQL][FOLLOWUP] Replace named parameter with 
comment in ReplaceColumns

No new revisions were added by this update.

Summary of changes:
 .../apache/spark/sql/catalyst/plans/logical/v2AlterTableCommands.scala  | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)





[spark] branch branch-3.3 updated: [SPARK-38939][SQL][FOLLOWUP] Replace named parameter with comment in ReplaceColumns

2022-05-09 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a commit to branch branch-3.3
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.3 by this push:
 new cf13262bc2d [SPARK-38939][SQL][FOLLOWUP] Replace named parameter with 
comment in ReplaceColumns
cf13262bc2d is described below

commit cf13262bc2d7ee1bce8c08292725353b2beccadd
Author: Qian.Sun 
AuthorDate: Mon May 9 10:00:05 2022 -0700

[SPARK-38939][SQL][FOLLOWUP] Replace named parameter with comment in 
ReplaceColumns

### What changes were proposed in this pull request?

This PR aims to replace named parameter with comment in `ReplaceColumns`.

### Why are the changes needed?

#36252 changed the signature of `deleteColumn` in **TableChange.java**, which
broke the sbt compilation of the k8s integration test.
```shell
> build/sbt -Pkubernetes -Pkubernetes-integration-tests 
-Dtest.exclude.tags=r -Dspark.kubernetes.test.imageRepo=kubespark 
"kubernetes-integration-tests/test"
[error] 
/Users/IdeaProjects/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2AlterTableCommands.scala:147:45:
 not found: value ifExists
[error]   TableChange.deleteColumn(Array(name), ifExists = false)
[error] ^
[error] 
/Users/IdeaProjects/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2AlterTableCommands.scala:159:19:
 value ++ is not a member of Array[Nothing]
[error] deleteChanges ++ addChanges
[error]   ^
[error] two errors found
[error] (catalyst / Compile / compileIncremental) Compilation failed
```

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Pass the GA and k8s integration test.

Closes #36487 from dcoliversun/SPARK-38939.

Authored-by: Qian.Sun 
Signed-off-by: huaxingao 
(cherry picked from commit 16b5124d75dc974c37f2fd87c78d231f8a3bf772)
Signed-off-by: huaxingao 
---
 .../apache/spark/sql/catalyst/plans/logical/v2AlterTableCommands.scala  | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2AlterTableCommands.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2AlterTableCommands.scala
index 8cc93c2dd09..4bd4f58b6a7 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2AlterTableCommands.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2AlterTableCommands.scala
@@ -144,7 +144,7 @@ case class ReplaceColumns(
 require(table.resolved)
 val deleteChanges = table.schema.fieldNames.map { name =>
   // REPLACE COLUMN should require column to exist
-  TableChange.deleteColumn(Array(name), ifExists = false)
+  TableChange.deleteColumn(Array(name), false /* ifExists */)
 }
 val addChanges = columnsToAdd.map { col =>
   assert(col.path.isEmpty)





[spark] branch master updated: [SPARK-37259][SQL] Support CTE and temp table queries with MSSQL JDBC

2022-05-06 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 0129f34f201 [SPARK-37259][SQL] Support CTE and temp table queries with 
MSSQL JDBC
0129f34f201 is described below

commit 0129f34f2016a4d9a0f0e862d21778a26259b4d0
Author: Peter Toth 
AuthorDate: Fri May 6 14:23:06 2022 -0700

[SPARK-37259][SQL] Support CTE and temp table queries with MSSQL JDBC

### What changes were proposed in this pull request?
Currently CTE queries from Spark are not supported with MSSQL server via 
JDBC. This is because MSSQL server doesn't support the nested CTE syntax 
(`SELECT * FROM (WITH t AS (...) SELECT ... FROM t) WHERE 1=0`) that Spark 
builds from the original query (`options.tableOrQuery`) in 
`JDBCRDD.resolveTable()`  and in `JDBCRDD.compute()`.
Unfortunately, it is non-trivial to split an arbitrary query into "with" 
and "regular" clauses in `MsSqlServerDialect`. So instead, I'm proposing a new 
general JDBC option `prepareQuery` that users can use if they have complex 
queries:
```
val df = spark.read.format("jdbc")
  .option("url", jdbcUrl)
  .option("prepareQuery", "WITH t AS (SELECT x, y FROM tbl)")
  .option("query", "SELECT * FROM t WHERE x > 10")
  .load()
```
This change also works with MSSQL's temp table syntax:
```
val df = spark.read.format("jdbc")
  .option("url", jdbcUrl)
  .option("prepareQuery", "(SELECT * INTO #TempTable FROM (SELECT * FROM 
tbl WHERE x > 10) t)")
  .option("query", "SELECT * FROM #TempTable")
  .load()
```

### Why are the changes needed?
To support CTE and temp table queries with MSSQL.

### Does this PR introduce _any_ user-facing change?
Yes, CTE and temp table queries are supported from now on.

### How was this patch tested?
Added new integration UTs.

Closes #36440 from peter-toth/SPARK-37259-cte-mssql.

Authored-by: Peter Toth 
Signed-off-by: huaxingao 
---
 .../sql/jdbc/MsSqlServerIntegrationSuite.scala | 55 ++
 docs/sql-data-sources-jdbc.md  | 31 
 .../execution/datasources/jdbc/JDBCOptions.scala   |  5 ++
 .../sql/execution/datasources/jdbc/JDBCRDD.scala   |  6 ++-
 .../execution/datasources/jdbc/JDBCRelation.scala  |  2 +-
 .../sql/execution/datasources/jdbc/JdbcUtils.scala |  3 +-
 .../datasources/v2/jdbc/JDBCScanBuilder.scala  |  3 +-
 7 files changed, 100 insertions(+), 5 deletions(-)

diff --git 
a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MsSqlServerIntegrationSuite.scala
 
b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MsSqlServerIntegrationSuite.scala
index e293f9a8f7b..a4e2dba5343 100644
--- 
a/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MsSqlServerIntegrationSuite.scala
+++ 
b/connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MsSqlServerIntegrationSuite.scala
@@ -21,6 +21,7 @@ import java.math.BigDecimal
 import java.sql.{Connection, Date, Timestamp}
 import java.util.Properties
 
+import org.apache.spark.sql.Row
 import org.apache.spark.sql.catalyst.util.DateTimeTestUtils._
 import org.apache.spark.sql.functions.col
 import org.apache.spark.sql.internal.SQLConf
@@ -374,4 +375,58 @@ class MsSqlServerIntegrationSuite extends 
DockerJDBCIntegrationSuite {
 val filtered = df.where(col("c") === 0).collect()
 assert(filtered.length == 0)
   }
+
+  test("SPARK-37259: prepareQuery and query JDBC options") {
+val expectedResult = Set(
+  (42, "fred"),
+  (17, "dave")
+).map { case (x, y) =>
+  Row(Integer.valueOf(x), String.valueOf(y))
+}
+
+val prepareQuery = "WITH t AS (SELECT x, y FROM tbl)"
+val query = "SELECT * FROM t WHERE x > 10"
+val df = spark.read.format("jdbc")
+  .option("url", jdbcUrl)
+  .option("prepareQuery", prepareQuery)
+  .option("query", query)
+  .load()
+assert(df.collect.toSet === expectedResult)
+  }
+
+  test("SPARK-37259: prepareQuery and dbtable JDBC options") {
+val expectedResult = Set(
+  (42, "fred"),
+  (17, "dave")
+).map { case (x, y) =>
+  Row(Integer.valueOf(x), String.valueOf(y))
+}
+
+val prepareQuery = "WITH t AS (SELECT x, y FROM tbl WHERE x > 10)"
+val dbtable = "t"
+val df = spark.read.format("jdbc")
+  .option("url", jdbcUrl)
+  .option("prepareQuery", prepareQuery)
+  .

[spark] branch master updated: [SPARK-39116][SQL] Replace double negation in `exists` with `forall`

2022-05-06 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 2754d75e1b3 [SPARK-39116][SQL] Replace double negation in `exists` 
with `forall`
2754d75e1b3 is described below

commit 2754d75e1b33ae191616077bd23801edbd7c7c49
Author: yangjie01 
AuthorDate: Fri May 6 11:30:07 2022 -0700

[SPARK-39116][SQL] Replace double negation in `exists` with `forall`

### What changes were proposed in this pull request?
This is a minor code simplification:
**Before**

```scala
!Seq(1, 2).exists(x => !condition(x))
```

**After**

```scala
Seq(1, 2).forall(x => condition(x))
```

### Why are the changes needed?
Code simplification

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass GA

Closes #36470 from LuciferYang/SPARK-39116.

Authored-by: yangjie01 
Signed-off-by: huaxingao 
---
 .../main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala  | 2 +-
 .../spark/sql/catalyst/plans/logical/basicLogicalOperators.scala  | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
index 906077a9c0e..817a62fd1d8 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
@@ -3159,7 +3159,7 @@ class Analyzer(override val catalogManager: 
CatalogManager)
   // We only extract Window Expressions after all expressions of the 
Project
   // have been resolved.
   case p @ Project(projectList, child)
-if hasWindowFunction(projectList) && 
!p.expressions.exists(!_.resolved) =>
+if hasWindowFunction(projectList) && p.expressions.forall(_.resolved) 
=>
 val (windowExpressions, regularExpressions) = 
extract(projectList.toIndexedSeq)
 // We add a project to get all needed expressions for window 
expressions from the child
 // of the original Project operator.
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala
index 419e28c8007..e38fa627346 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala
@@ -81,7 +81,7 @@ case class Project(projectList: Seq[NamedExpression], child: 
LogicalPlan)
   }.nonEmpty
 )
 
-!expressions.exists(!_.resolved) && childrenResolved && 
!hasSpecialExpressions
+expressions.forall(_.resolved) && childrenResolved && 
!hasSpecialExpressions
   }
 
   override lazy val validConstraints: ExpressionSet =
@@ -985,7 +985,7 @@ case class Aggregate(
   }.nonEmpty
 )
 
-!expressions.exists(!_.resolved) && childrenResolved && 
!hasWindowExpressions
+expressions.forall(_.resolved) && childrenResolved && !hasWindowExpressions
   }
 
   override def output: Seq[Attribute] = aggregateExpressions.map(_.toAttribute)





[spark] branch branch-3.3 updated: [SPARK-34960][SQL][DOCS][FOLLOWUP] Improve doc for DSv2 aggregate push down

2022-04-22 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a commit to branch branch-3.3
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.3 by this push:
 new ca9138ee8b6 [SPARK-34960][SQL][DOCS][FOLLOWUP] Improve doc for DSv2 
aggregate push down
ca9138ee8b6 is described below

commit ca9138ee8b6d8645943b737cc4231fbb0154c8cb
Author: Cheng Su 
AuthorDate: Fri Apr 22 10:13:40 2022 -0700

[SPARK-34960][SQL][DOCS][FOLLOWUP] Improve doc for DSv2 aggregate push down

### What changes were proposed in this pull request?

This is a followup per comment in 
https://issues.apache.org/jira/browse/SPARK-34960, to improve the documentation 
for data source v2 aggregate push down of Parquet and ORC.

* Unify the SQL config docs between Parquet and ORC, and add a note that if
statistics are missing from any file footer, an exception will be thrown.
* Also add the same note about the exception to the Parquet and ORC methods
that aggregate from statistics.

Though in a future Spark release we may improve the behavior to fall back to
aggregating from the real file data when statistics are missing, we'd better
document the current behavior clearly now.
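A brief usage sketch of the config being documented (the path and column names are illustrative, and the aggregate is pushed down only when the scan goes through the DSv2 Parquet reader):

```scala
import org.apache.spark.sql.functions.{count, max, min}

spark.conf.set("spark.sql.parquet.aggregatePushdown", "true")

// MIN/MAX/COUNT without a filter can be answered from footer statistics;
// if any file footer lacks statistics, the query currently throws instead of
// falling back to scanning the data.
spark.read.parquet("/path/to/events")
  .agg(min("event_date"), max("event_date"), count("event_date"))
  .explain()
```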

### Why are the changes needed?

Give users & developers a better idea of when aggregate push down will throw
an exception.
Document the current behavior better.

### Does this PR introduce _any_ user-facing change?

Yes, the documentation change in SQL configs.

### How was this patch tested?

Existing tests as this is just documentation change.

Closes #36311 from c21/agg-doc.

Authored-by: Cheng Su 
Signed-off-by: huaxingao 
(cherry picked from commit 86b8757c2c4bab6a0f7a700cf2c690cdd7f31eba)
Signed-off-by: huaxingao 
---
 .../src/main/scala/org/apache/spark/sql/internal/SQLConf.scala | 10 ++
 .../apache/spark/sql/execution/datasources/orc/OrcUtils.scala  |  2 ++
 .../spark/sql/execution/datasources/parquet/ParquetUtils.scala |  4 +++-
 3 files changed, 11 insertions(+), 5 deletions(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
index f97b7f8f004..76f3d1f5a84 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
@@ -974,9 +974,10 @@ object SQLConf {
   .createWithDefault(10)
 
   val PARQUET_AGGREGATE_PUSHDOWN_ENABLED = 
buildConf("spark.sql.parquet.aggregatePushdown")
-.doc("If true, MAX/MIN/COUNT without filter and group by will be pushed" +
-  " down to Parquet for optimization. MAX/MIN/COUNT for complex types and 
timestamp" +
-  " can't be pushed down")
+.doc("If true, aggregates will be pushed down to Parquet for optimization. 
Support MIN, MAX " +
+  "and COUNT as aggregate expression. For MIN/MAX, support boolean, 
integer, float and date " +
+  "type. For COUNT, support all data types. If statistics is missing from 
any Parquet file " +
+  "footer, exception would be thrown.")
 .version("3.3.0")
 .booleanConf
 .createWithDefault(false)
@@ -1110,7 +,8 @@ object SQLConf {
   val ORC_AGGREGATE_PUSHDOWN_ENABLED = 
buildConf("spark.sql.orc.aggregatePushdown")
 .doc("If true, aggregates will be pushed down to ORC for optimization. 
Support MIN, MAX and " +
   "COUNT as aggregate expression. For MIN/MAX, support boolean, integer, 
float and date " +
-  "type. For COUNT, support all data types.")
+  "type. For COUNT, support all data types. If statistics is missing from 
any ORC file " +
+  "footer, exception would be thrown.")
 .version("3.3.0")
 .booleanConf
 .createWithDefault(false)
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcUtils.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcUtils.scala
index a68ce1a8636..9011821e1a7 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcUtils.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcUtils.scala
@@ -408,6 +408,8 @@ object OrcUtils extends Logging {
* (Max/Min/Count) result using the statistics information from ORC file 
footer, and then
* construct an InternalRow from these aggregate results.
*
+   * NOTE: if statistics is missing from ORC file footer, exception would be 
thrown.
+   *
* @return Aggregate results in the format of InternalRow
*/
   def createAggInternalRowFromFooter(
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/

[spark] branch master updated: [SPARK-34960][SQL][DOCS][FOLLOWUP] Improve doc for DSv2 aggregate push down

2022-04-22 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 86b8757c2c4 [SPARK-34960][SQL][DOCS][FOLLOWUP] Improve doc for DSv2 
aggregate push down
86b8757c2c4 is described below

commit 86b8757c2c4bab6a0f7a700cf2c690cdd7f31eba
Author: Cheng Su 
AuthorDate: Fri Apr 22 10:13:40 2022 -0700

[SPARK-34960][SQL][DOCS][FOLLOWUP] Improve doc for DSv2 aggregate push down

### What changes were proposed in this pull request?

This is a followup per comment in 
https://issues.apache.org/jira/browse/SPARK-34960, to improve the documentation 
for data source v2 aggregate push down of Parquet and ORC.

* Unify SQL config docs between Parquet and ORC, and add the note that if 
statistics are missing from any file footer, an exception will be thrown.
* Also add the same note about the exception to the Parquet and ORC methods that 
aggregate from statistics.

Though in a future Spark release we may improve the behavior to fall back to 
aggregating from the actual file data when statistics are missing, we should 
clearly document the current behavior for now.

### Why are the changes needed?

Give users and developers a better idea of when aggregate push down will 
throw an exception, and provide clearer documentation of the current behavior.

### Does this PR introduce _any_ user-facing change?

Yes, the documentation change in SQL configs.

### How was this patch tested?

Existing tests, as this is only a documentation change.

Closes #36311 from c21/agg-doc.

Authored-by: Cheng Su 
Signed-off-by: huaxingao 
---
 .../src/main/scala/org/apache/spark/sql/internal/SQLConf.scala | 10 ++
 .../apache/spark/sql/execution/datasources/orc/OrcUtils.scala  |  2 ++
 .../spark/sql/execution/datasources/parquet/ParquetUtils.scala |  4 +++-
 3 files changed, 11 insertions(+), 5 deletions(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
index 50d09d046bc..6d3f283fa73 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
@@ -974,9 +974,10 @@ object SQLConf {
   .createWithDefault(10)
 
   val PARQUET_AGGREGATE_PUSHDOWN_ENABLED = 
buildConf("spark.sql.parquet.aggregatePushdown")
-.doc("If true, MAX/MIN/COUNT without filter and group by will be pushed" +
-  " down to Parquet for optimization. MAX/MIN/COUNT for complex types and 
timestamp" +
-  " can't be pushed down")
+.doc("If true, aggregates will be pushed down to Parquet for optimization. 
Support MIN, MAX " +
+  "and COUNT as aggregate expression. For MIN/MAX, support boolean, 
integer, float and date " +
+  "type. For COUNT, support all data types. If statistics is missing from 
any Parquet file " +
+  "footer, exception would be thrown.")
 .version("3.3.0")
 .booleanConf
 .createWithDefault(false)
@@ -1110,7 +1111,8 @@ object SQLConf {
   val ORC_AGGREGATE_PUSHDOWN_ENABLED = 
buildConf("spark.sql.orc.aggregatePushdown")
 .doc("If true, aggregates will be pushed down to ORC for optimization. 
Support MIN, MAX and " +
   "COUNT as aggregate expression. For MIN/MAX, support boolean, integer, 
float and date " +
-  "type. For COUNT, support all data types.")
+  "type. For COUNT, support all data types. If statistics is missing from 
any ORC file " +
+  "footer, exception would be thrown.")
 .version("3.3.0")
 .booleanConf
 .createWithDefault(false)
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcUtils.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcUtils.scala
index 79abdfe4690..f07573beae6 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcUtils.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcUtils.scala
@@ -407,6 +407,8 @@ object OrcUtils extends Logging {
* (Max/Min/Count) result using the statistics information from ORC file 
footer, and then
* construct an InternalRow from these aggregate results.
*
+   * NOTE: if statistics is missing from ORC file footer, exception would be 
thrown.
+   *
* @return Aggregate results in the format of InternalRow
*/
   def createAggInternalRowFromFooter(
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetUtils.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/exec

[spark] branch branch-3.3 updated: [SPARK-38950][SQL][FOLLOWUP] Fix java doc

2022-04-21 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a commit to branch branch-3.3
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.3 by this push:
 new 17552d5ff90 [SPARK-38950][SQL][FOLLOWUP] Fix java doc
17552d5ff90 is described below

commit 17552d5ff90e6421b2699726468c5798a12970b9
Author: huaxingao 
AuthorDate: Thu Apr 21 11:52:04 2022 -0700

[SPARK-38950][SQL][FOLLOWUP] Fix java doc

### What changes were proposed in this pull request?
`{link #pushFilters(Predicate[])}` ->  `{link 
#pushFilters(Seq[Expression])}`

### Why are the changes needed?
Fixes the Javadoc reference.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?

Closes #36302 from huaxingao/fix.

Authored-by: huaxingao 
Signed-off-by: huaxingao 
(cherry picked from commit 0b543e7480b6e414b23e02e6c805a33abc535c89)
Signed-off-by: huaxingao 
---
 .../spark/sql/internal/connector/SupportsPushDownCatalystFilters.scala  | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/connector/SupportsPushDownCatalystFilters.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/connector/SupportsPushDownCatalystFilters.scala
index 99590480220..4641a06ba3e 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/connector/SupportsPushDownCatalystFilters.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/connector/SupportsPushDownCatalystFilters.scala
@@ -35,7 +35,7 @@ trait SupportsPushDownCatalystFilters {
 
   /**
* Returns the data filters that are pushed to the data source via
-   * {@link #pushFilters(Predicate[])}.
+   * {@link #pushFilters(Seq[Expression])}.
*/
   def pushedFilters: Array[Predicate]
 }


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-38950][SQL][FOLLOWUP] Fix java doc

2022-04-21 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 0b543e7480b [SPARK-38950][SQL][FOLLOWUP] Fix java doc
0b543e7480b is described below

commit 0b543e7480b6e414b23e02e6c805a33abc535c89
Author: huaxingao 
AuthorDate: Thu Apr 21 11:52:04 2022 -0700

[SPARK-38950][SQL][FOLLOWUP] Fix java doc

### What changes were proposed in this pull request?
`{link #pushFilters(Predicate[])}` ->  `{link 
#pushFilters(Seq[Expression])}`

### Why are the changes needed?
Fixes the Javadoc reference.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?

Closes #36302 from huaxingao/fix.

Authored-by: huaxingao 
Signed-off-by: huaxingao 
---
 .../spark/sql/internal/connector/SupportsPushDownCatalystFilters.scala  | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/connector/SupportsPushDownCatalystFilters.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/connector/SupportsPushDownCatalystFilters.scala
index 99590480220..4641a06ba3e 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/connector/SupportsPushDownCatalystFilters.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/connector/SupportsPushDownCatalystFilters.scala
@@ -35,7 +35,7 @@ trait SupportsPushDownCatalystFilters {
 
   /**
* Returns the data filters that are pushed to the data source via
-   * {@link #pushFilters(Predicate[])}.
+   * {@link #pushFilters(Seq[Expression])}.
*/
   def pushedFilters: Array[Predicate]
 }


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.3 updated: [SPARK-38825][SQL][TEST][FOLLOWUP] Add test for in(null) and notIn(null)

2022-04-18 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a commit to branch branch-3.3
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.3 by this push:
 new dd6eca7550c [SPARK-38825][SQL][TEST][FOLLOWUP] Add test for in(null) 
and notIn(null)
dd6eca7550c is described below

commit dd6eca7550c25dbcad9f12caf9fccfcad981d33f
Author: huaxingao 
AuthorDate: Mon Apr 18 21:27:57 2022 -0700

[SPARK-38825][SQL][TEST][FOLLOWUP] Add test for in(null) and notIn(null)

### What changes were proposed in this pull request?
Add test for filter `in(null)` and `notIn(null)`

### Why are the changes needed?
to make tests more complete

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?

new test
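
For context, a standalone sketch (my own illustration, not part of the patch) of 
why both filters come back empty: under SQL three-valued logic a comparison with 
NULL yields NULL, and a filter only keeps rows whose predicate is TRUE.

```
// Illustrative only -- mirrors what the new test asserts.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

val df = Seq("mary", "alex", "dan").toDF("name")

// Both predicates evaluate to NULL for every row, so neither filter keeps anything.
df.filter(col("name").isin(null)).show()    // 0 rows
df.filter(!col("name").isin(null)).show()   // 0 rows
```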

Closes #36248 from huaxingao/inNotIn.

Authored-by: huaxingao 
Signed-off-by: huaxingao 
(cherry picked from commit b760e4a686939bdb837402286b8d3d8b445c5ed4)
Signed-off-by: huaxingao 
---
 .../datasources/parquet/ParquetFilterSuite.scala   | 22 +-
 1 file changed, 17 insertions(+), 5 deletions(-)

diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala
 
b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala
index 71ea474409c..7a09011f27c 100644
--- 
a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala
+++ 
b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala
@@ -1905,21 +1905,33 @@ abstract class ParquetFilterSuite extends QueryTest 
with ParquetTest with Shared
   test("SPARK-38825: in and notIn filters") {
 import testImplicits._
 withTempPath { file =>
-  Seq(1, 2, 0, -1, 99, 1000, 3, 7, 
2).toDF("id").coalesce(1).write.mode("overwrite")
+  Seq(1, 2, 0, -1, 99, Integer.MAX_VALUE, 1000, 3, 7, Integer.MIN_VALUE, 2)
+.toDF("id").coalesce(1).write.mode("overwrite")
 .parquet(file.getCanonicalPath)
   var df = spark.read.parquet(file.getCanonicalPath)
-  var in = df.filter(col("id").isin(100, 3, 11, 12, 13))
-  var notIn = df.filter(!col("id").isin(100, 3, 11, 12, 13))
-  checkAnswer(in, Seq(Row(3)))
+  var in = df.filter(col("id").isin(100, 3, 11, 12, 13, Integer.MAX_VALUE, 
Integer.MIN_VALUE))
+  var notIn =
+df.filter(!col("id").isin(100, 3, 11, 12, 13, Integer.MAX_VALUE, 
Integer.MIN_VALUE))
+  checkAnswer(in, Seq(Row(3), Row(-2147483648), Row(2147483647)))
   checkAnswer(notIn, Seq(Row(1), Row(2), Row(0), Row(-1), Row(99), 
Row(1000), Row(7), Row(2)))
 
-  Seq("mary", "martin", "lucy", "alex", "mary", 
"dan").toDF("name").coalesce(1)
+  Seq("mary", "martin", "lucy", "alex", null, "mary", 
"dan").toDF("name").coalesce(1)
 .write.mode("overwrite").parquet(file.getCanonicalPath)
   df = spark.read.parquet(file.getCanonicalPath)
   in = df.filter(col("name").isin("mary", "victor", "leo", "alex"))
   notIn = df.filter(!col("name").isin("mary", "victor", "leo", "alex"))
   checkAnswer(in, Seq(Row("mary"), Row("alex"), Row("mary")))
   checkAnswer(notIn, Seq(Row("martin"), Row("lucy"), Row("dan")))
+
+  in = df.filter(col("name").isin("mary", "victor", "leo", "alex", null))
+  notIn = df.filter(!col("name").isin("mary", "victor", "leo", "alex", 
null))
+  checkAnswer(in, Seq(Row("mary"), Row("alex"), Row("mary")))
+  checkAnswer(notIn, Seq())
+
+  in = df.filter(col("name").isin(null))
+  notIn = df.filter(!col("name").isin(null))
+  checkAnswer(in, Seq())
+  checkAnswer(notIn, Seq())
 }
   }
 }


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (242ee22c003 -> b760e4a6869)

2022-04-18 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 242ee22c003 [SPARK-38796][SQL] Update to_number and try_to_number 
functions to restrict S and MI sequence to start or end only
 add b760e4a6869 [SPARK-38825][SQL][TEST][FOLLOWUP] Add test for in(null) 
and notIn(null)

No new revisions were added by this update.

Summary of changes:
 .../datasources/parquet/ParquetFilterSuite.scala   | 22 +-
 1 file changed, 17 insertions(+), 5 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.3 updated (76fa565fac3 -> 2e0a21ae1d3)

2022-04-12 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a change to branch branch-3.3
in repository https://gitbox.apache.org/repos/asf/spark.git


from 76fa565fac3 [SPARK-38882][PYTHON] Fix usage logger attachment to 
handle static methods properly
 add 2e0a21ae1d3 [SPARK-38865][SQL][DOCS] Update document of JDBC options 
for `pushDownAggregate` and `pushDownLimit`

No new revisions were added by this update.

Summary of changes:
 docs/sql-data-sources-jdbc.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
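
For reference, a hedged sketch (my own example, not part of the doc change) of 
setting the two JDBC reader options the updated page documents; the connection 
URL, table, and column names below are placeholders.

```
// Sketch only: pushDownAggregate / pushDownLimit let Spark delegate simple
// aggregates and LIMIT clauses to the external database when possible.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()

val jdbcDF = spark.read
  .format("jdbc")
  .option("url", "jdbc:postgresql://db-host:5432/mydb")   // placeholder connection
  .option("dbtable", "sales")                             // placeholder table
  .option("pushDownAggregate", "true")
  .option("pushDownLimit", "true")
  .load()

jdbcDF.groupBy("region").count().limit(10).show()         // placeholder column
```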


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (484d573933c -> 988af33af8d)

2022-04-12 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 484d573933c [SPARK-38785][PYTHON][SQL] Implement 
ExponentialMovingWindow
 add 988af33af8d [SPARK-38865][SQL][DOCS] Update document of JDBC options 
for `pushDownAggregate` and `pushDownLimit`

No new revisions were added by this update.

Summary of changes:
 docs/sql-data-sources-jdbc.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.3 updated: [SPARK-38825][SQL][TEST] Add a test to cover parquet notIn filter

2022-04-07 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a commit to branch branch-3.3
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.3 by this push:
 new cf7e3574efc [SPARK-38825][SQL][TEST] Add a test to cover parquet notIn 
filter
cf7e3574efc is described below

commit cf7e3574efc1d4bb7233f18fcf344e94d26c2ac1
Author: huaxingao 
AuthorDate: Thu Apr 7 16:08:45 2022 -0700

[SPARK-38825][SQL][TEST] Add a test to cover parquet notIn filter

### What changes were proposed in this pull request?
Currently we don't have a test for parquet `notIn` filter, so add a test 
for this

### Why are the changes needed?
to make tests more complete

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
new test

Closes #36109 from huaxingao/inFilter.

Authored-by: huaxingao 
Signed-off-by: huaxingao 
(cherry picked from commit d6fd0405b60875ac5e2c9daee1ec785f74e9b7a3)
Signed-off-by: huaxingao 
---
 .../datasources/parquet/ParquetFilterSuite.scala| 21 +
 1 file changed, 21 insertions(+)

diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala
 
b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala
index 64a2ec6308c..71ea474409c 100644
--- 
a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala
+++ 
b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala
@@ -1901,6 +1901,27 @@ abstract class ParquetFilterSuite extends QueryTest with 
ParquetTest with Shared
   }
 }
   }
+
+  test("SPARK-38825: in and notIn filters") {
+import testImplicits._
+withTempPath { file =>
+  Seq(1, 2, 0, -1, 99, 1000, 3, 7, 
2).toDF("id").coalesce(1).write.mode("overwrite")
+.parquet(file.getCanonicalPath)
+  var df = spark.read.parquet(file.getCanonicalPath)
+  var in = df.filter(col("id").isin(100, 3, 11, 12, 13))
+  var notIn = df.filter(!col("id").isin(100, 3, 11, 12, 13))
+  checkAnswer(in, Seq(Row(3)))
+  checkAnswer(notIn, Seq(Row(1), Row(2), Row(0), Row(-1), Row(99), 
Row(1000), Row(7), Row(2)))
+
+  Seq("mary", "martin", "lucy", "alex", "mary", 
"dan").toDF("name").coalesce(1)
+.write.mode("overwrite").parquet(file.getCanonicalPath)
+  df = spark.read.parquet(file.getCanonicalPath)
+  in = df.filter(col("name").isin("mary", "victor", "leo", "alex"))
+  notIn = df.filter(!col("name").isin("mary", "victor", "leo", "alex"))
+  checkAnswer(in, Seq(Row("mary"), Row("alex"), Row("mary")))
+  checkAnswer(notIn, Seq(Row("martin"), Row("lucy"), Row("dan")))
+}
+  }
 }
 
 @ExtendedSQLTest


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (83963828b54 -> d6fd0405b60)

2022-04-07 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 83963828b54 [SPARK-38802][K8S][TESTS] Add Support for 
`spark.kubernetes.test.(driver|executor)RequestCores`
 add d6fd0405b60 [SPARK-38825][SQL][TEST] Add a test to cover parquet notIn 
filter

No new revisions were added by this update.

Summary of changes:
 .../datasources/parquet/ParquetFilterSuite.scala| 21 +
 1 file changed, 21 insertions(+)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-38643][ML] Validate input dataset of ml.regression

2022-03-25 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 6d3149a  [SPARK-38643][ML] Validate input dataset of ml.regression
6d3149a is described below

commit 6d3149a0d5fe0652197841a589bbeb8654471e58
Author: Ruifeng Zheng 
AuthorDate: Thu Mar 24 23:46:31 2022 -0700

[SPARK-38643][ML] Validate input dataset of ml.regression

### What changes were proposed in this pull request?
Validate the input dataset and fail fast when it contains invalid values.
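
As a standalone sketch (my own example, not part of the patch), the fail-fast 
validation pattern the PR applies, using `when` plus `raise_error` so invalid 
rows abort the job instead of silently corrupting the model:

```
// Sketch of the validation pattern; column names and data are made up.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types.DoubleType

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

val df = Seq((1.0, 0.0), (2.0, Double.NaN)).toDF("label", "censor")

val validatedCensor = {
  val c = col("censor").cast(DoubleType)
  when(c.isNull || c.isNaN, raise_error(lit("Censors MUST NOT be Null or NaN")))
    .when(c =!= 0 && c =!= 1,
      raise_error(concat(lit("Censors MUST be in {0, 1}, but got "), c)))
    .otherwise(c)
}

// Fails at execution time on the NaN row rather than training on it.
df.select(col("label"), validatedCensor.alias("censor")).show()
```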

### Why are the changes needed?
to avoid silently returning a bad model

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
added test suites

Closes #35958 from zhengruifeng/regression_validate_training_dataset.

Authored-by: Ruifeng Zheng 
Signed-off-by: huaxingao 
---
 .../ml/regression/AFTSurvivalRegression.scala  | 26 ++-
 .../ml/regression/DecisionTreeRegressor.scala  | 13 ++--
 .../apache/spark/ml/regression/FMRegressor.scala   |  9 --
 .../apache/spark/ml/regression/GBTRegressor.scala  | 14 ++--
 .../regression/GeneralizedLinearRegression.scala   | 28 
 .../spark/ml/regression/IsotonicRegression.scala   | 16 ++
 .../spark/ml/regression/LinearRegression.scala | 16 ++
 .../ml/regression/RandomForestRegressor.scala  | 12 +--
 .../org/apache/spark/ml/util/DatasetUtils.scala| 12 ---
 .../ml/regression/AFTSurvivalRegressionSuite.scala | 37 ++
 .../ml/regression/DecisionTreeRegressorSuite.scala |  6 
 .../spark/ml/regression/FMRegressorSuite.scala |  5 +++
 .../spark/ml/regression/GBTRegressorSuite.scala|  6 
 .../GeneralizedLinearRegressionSuite.scala | 31 ++
 .../ml/regression/IsotonicRegressionSuite.scala| 32 +++
 .../ml/regression/LinearRegressionSuite.scala  |  6 
 .../ml/regression/RandomForestRegressorSuite.scala |  6 
 .../scala/org/apache/spark/ml/util/MLTest.scala| 29 +
 18 files changed, 258 insertions(+), 46 deletions(-)

diff --git 
a/mllib/src/main/scala/org/apache/spark/ml/regression/AFTSurvivalRegression.scala
 
b/mllib/src/main/scala/org/apache/spark/ml/regression/AFTSurvivalRegression.scala
index 117229b..c48fe68 100644
--- 
a/mllib/src/main/scala/org/apache/spark/ml/regression/AFTSurvivalRegression.scala
+++ 
b/mllib/src/main/scala/org/apache/spark/ml/regression/AFTSurvivalRegression.scala
@@ -35,6 +35,7 @@ import org.apache.spark.ml.param._
 import org.apache.spark.ml.param.shared._
 import org.apache.spark.ml.stat._
 import org.apache.spark.ml.util._
+import org.apache.spark.ml.util.DatasetUtils._
 import org.apache.spark.ml.util.Instrumentation.instrumented
 import org.apache.spark.mllib.util.MLUtils
 import org.apache.spark.rdd.RDD
@@ -210,14 +211,23 @@ class AFTSurvivalRegression @Since("1.6.0") 
(@Since("1.6.0") override val uid: S
 s"then cached during training. Be careful of double caching!")
 }
 
-val instances = dataset.select(col($(featuresCol)), 
col($(labelCol)).cast(DoubleType),
-  col($(censorCol)).cast(DoubleType))
-  .rdd.map { case Row(features: Vector, label: Double, censor: Double) =>
-require(censor == 1.0 || censor == 0.0, "censor must be 1.0 or 0.0")
-// AFT does not support instance weighting,
-// here use Instance.weight to store censor for convenience
-Instance(label, censor, features)
-  }.setName("training instances")
+val validatedCensorCol = {
+  val casted = col($(censorCol)).cast(DoubleType)
+  when(casted.isNull || casted.isNaN, raise_error(lit("Censors MUST NOT be 
Null or NaN")))
+.when(casted =!= 0 && casted =!= 1,
+  raise_error(concat(lit("Censors MUST be in {0, 1}, but got "), 
casted)))
+.otherwise(casted)
+}
+
+val instances = dataset.select(
+  checkRegressionLabels($(labelCol)),
+  validatedCensorCol,
+  checkNonNanVectors($(featuresCol))
+).rdd.map { case Row(l: Double, c: Double, v: Vector) =>
+  // AFT does not support instance weighting,
+  // here use Instance.weight to store censor for convenience
+  Instance(l, c, v)
+}.setName("training instances")
 
 val summarizer = instances.treeAggregate(
   Summarizer.createSummarizerBuffer("mean", "std", "count"))(
diff --git 
a/mllib/src/main/scala/org/apache/spark/ml/regression/DecisionTreeRegressor.scala
 
b/mllib/src/main/scala/org/apache/spark/ml/regression/DecisionTreeRegressor.scala
index 6913718..d9942f1 100644
--- 
a/mllib/src/main/scala/org/apache/spark/ml/regression/DecisionTreeRegre

[spark] branch master updated: [SPARK-38414][CORE][DSTREAM][EXAMPLES][ML][MLLIB][SQL] Remove redundant `@SuppressWarnings `

2022-03-07 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new ddc1803  [SPARK-38414][CORE][DSTREAM][EXAMPLES][ML][MLLIB][SQL] Remove 
redundant `@SuppressWarnings `
ddc1803 is described below

commit ddc18038ca8be82e801d2554043ae06dafc3f31f
Author: yangjie01 
AuthorDate: Mon Mar 7 10:55:57 2022 -0800

[SPARK-38414][CORE][DSTREAM][EXAMPLES][ML][MLLIB][SQL] Remove redundant 
`@SuppressWarnings `

### What changes were proposed in this pull request?
This PR removes redundant `@SuppressWarnings` annotations in Spark Java code; all 
cases were inspected with an IDE (IntelliJ).

### Why are the changes needed?
Remove redundant `SuppressWarnings `

### Does this PR introduce _any_ user-facing change?
NO.

### How was this patch tested?
Pass GA

Closes #35732 from LuciferYang/cleanup-redundant-suppression.

Authored-by: yangjie01 
Signed-off-by: huaxingao 
---
 .../shuffle/RetryingBlockTransferorSuite.java  |  1 -
 .../shuffle/sort/UnsafeShuffleWriterSuite.java |  1 -
 .../java/test/org/apache/spark/JavaAPISuite.java   | 20 -
 .../mllib/JavaPowerIterationClusteringExample.java |  1 -
 .../mllib/JavaStratifiedSamplingExample.java   |  1 -
 .../streaming/JavaStatefulNetworkWordCount.java|  1 -
 .../JavaLogisticRegressionSuite.java   |  1 -
 .../JavaStreamingLogisticRegressionSuite.java  |  1 -
 .../mllib/clustering/JavaStreamingKMeansSuite.java |  1 -
 .../mllib/evaluation/JavaRankingMetricsSuite.java  |  1 -
 .../apache/spark/mllib/feature/JavaTfIdfSuite.java |  2 -
 .../spark/mllib/feature/JavaWord2VecSuite.java |  1 -
 .../spark/mllib/fpm/JavaAssociationRulesSuite.java |  1 -
 .../apache/spark/mllib/fpm/JavaFPGrowthSuite.java  |  2 -
 .../spark/mllib/linalg/JavaVectorsSuite.java   |  1 -
 .../spark/mllib/random/JavaRandomRDDsSuite.java|  7 
 .../JavaStreamingLinearRegressionSuite.java|  1 -
 .../test/org/apache/spark/sql/JavaRowSuite.java|  1 -
 .../test/org/apache/spark/sql/JavaUDAFSuite.java   |  1 -
 .../test/org/apache/spark/sql/JavaUDFSuite.java|  8 
 .../org/apache/spark/streaming/JavaAPISuite.java   | 49 --
 21 files changed, 103 deletions(-)

diff --git 
a/common/network-shuffle/src/test/java/org/apache/spark/network/shuffle/RetryingBlockTransferorSuite.java
 
b/common/network-shuffle/src/test/java/org/apache/spark/network/shuffle/RetryingBlockTransferorSuite.java
index 1b44b06..985a7a3 100644
--- 
a/common/network-shuffle/src/test/java/org/apache/spark/network/shuffle/RetryingBlockTransferorSuite.java
+++ 
b/common/network-shuffle/src/test/java/org/apache/spark/network/shuffle/RetryingBlockTransferorSuite.java
@@ -240,7 +240,6 @@ public class RetryingBlockTransferorSuite {
* retries -- the first interaction may include an IOException, which causes 
a retry of some
* subset of the original blocks in a second interaction.
*/
-  @SuppressWarnings("unchecked")
   private static void performInteractions(List> 
interactions,
   BlockFetchingListener listener)
 throws IOException, InterruptedException {
diff --git 
a/core/src/test/java/org/apache/spark/shuffle/sort/UnsafeShuffleWriterSuite.java
 
b/core/src/test/java/org/apache/spark/shuffle/sort/UnsafeShuffleWriterSuite.java
index cd25f32..f4e09b7 100644
--- 
a/core/src/test/java/org/apache/spark/shuffle/sort/UnsafeShuffleWriterSuite.java
+++ 
b/core/src/test/java/org/apache/spark/shuffle/sort/UnsafeShuffleWriterSuite.java
@@ -89,7 +89,6 @@ public class UnsafeShuffleWriterSuite implements 
ShuffleChecksumTestHelper {
   }
 
   @Before
-  @SuppressWarnings("unchecked")
   public void setUp() throws Exception {
 MockitoAnnotations.openMocks(this).close();
 tempDir = Utils.createTempDir(null, "test");
diff --git a/core/src/test/java/test/org/apache/spark/JavaAPISuite.java 
b/core/src/test/java/test/org/apache/spark/JavaAPISuite.java
index 3796d3b..fd91237 100644
--- a/core/src/test/java/test/org/apache/spark/JavaAPISuite.java
+++ b/core/src/test/java/test/org/apache/spark/JavaAPISuite.java
@@ -130,7 +130,6 @@ public class JavaAPISuite implements Serializable {
 assertEquals(4, pUnion.count());
   }
 
-  @SuppressWarnings("unchecked")
   @Test
   public void intersection() {
 List ints1 = Arrays.asList(1, 10, 2, 3, 4, 5);
@@ -216,7 +215,6 @@ public class JavaAPISuite implements Serializable {
 assertEquals(new Tuple2<>(3, 2), sortedPairs.get(2));
   }
 
-  @SuppressWarnings("unchecked")
   @Test
   public void repartitionAndSortWithinPartitions() {
 List> pairs = new ArrayList<>();
@@ -356,7 +354,6 @@ public class JavaAPISuite implements Serializable {
 assertEquals(correctIndexes, i

[spark] branch master updated: [SPARK-38269][CORE][SQL][SS][ML][MLLIB][MESOS][YARN][K8S][EXAMPLES] Clean up redundant type cast

2022-03-02 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 226bdec  
[SPARK-38269][CORE][SQL][SS][ML][MLLIB][MESOS][YARN][K8S][EXAMPLES] Clean up 
redundant type cast
226bdec is described below

commit 226bdec8d99c51a58018f0bd085a51f1907c1e1a
Author: yangjie01 
AuthorDate: Wed Mar 2 12:02:27 2022 -0800

[SPARK-38269][CORE][SQL][SS][ML][MLLIB][MESOS][YARN][K8S][EXAMPLES] Clean 
up redundant type cast

### What changes were proposed in this pull request?
This PR aims to clean up redundant type casts in Spark code.

### Why are the changes needed?
Code simplification

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?

- Pass GA
- Manually built a client and verified that 
`org.apache.spark.examples.DriverSubmissionTest` and 
`org.apache.spark.examples.mllib.LDAExample` still pass

Closes #35592 from LuciferYang/redundant-cast.

Authored-by: yangjie01 
Signed-off-by: huaxingao 
---
 .../main/java/org/apache/spark/util/kvstore/ArrayWrappers.java |  2 +-
 .../src/main/java/org/apache/spark/unsafe/types/ByteArray.java |  2 +-
 .../main/java/org/apache/spark/unsafe/types/UTF8String.java|  4 ++--
 .../main/java/org/apache/spark/io/ReadAheadInputStream.java|  2 +-
 .../org/apache/spark/deploy/history/FsHistoryProvider.scala|  6 +++---
 .../scala/org/apache/spark/deploy/master/ui/MasterPage.scala   |  8 
 .../main/scala/org/apache/spark/metrics/MetricsConfig.scala|  6 +++---
 .../scala/org/apache/spark/rdd/ReliableRDDCheckpointData.scala |  2 +-
 .../main/scala/org/apache/spark/resource/ResourceProfile.scala |  2 +-
 core/src/main/scala/org/apache/spark/ui/GraphUIData.scala  |  2 +-
 .../main/scala/org/apache/spark/ui/storage/StoragePage.scala   |  2 +-
 core/src/main/scala/org/apache/spark/util/JsonProtocol.scala   |  2 +-
 core/src/main/scala/org/apache/spark/util/SizeEstimator.scala  |  4 ++--
 .../scala/org/apache/spark/examples/DriverSubmissionTest.scala |  2 +-
 .../scala/org/apache/spark/examples/mllib/LDAExample.scala |  2 +-
 .../org/apache/spark/sql/kafka010/KafkaSourceProvider.scala|  2 +-
 .../apache/spark/ml/classification/LogisticRegression.scala|  2 +-
 .../scala/org/apache/spark/ml/classification/NaiveBayes.scala  |  2 +-
 mllib/src/main/scala/org/apache/spark/ml/fpm/FPGrowth.scala|  2 +-
 .../spark/ml/regression/GeneralizedLinearRegression.scala  |  2 +-
 .../scala/org/apache/spark/ml/tree/impl/RandomForest.scala |  4 ++--
 mllib/src/main/scala/org/apache/spark/ml/tree/treeModels.scala |  8 
 .../scala/org/apache/spark/mllib/clustering/LDAModel.scala |  4 ++--
 .../org/apache/spark/mllib/evaluation/MulticlassMetrics.scala  | 10 +-
 .../spark/mllib/linalg/distributed/IndexedRowMatrix.scala  |  4 ++--
 .../cluster/k8s/KubernetesClusterSchedulerBackend.scala|  4 ++--
 .../cluster/mesos/MesosCoarseGrainedSchedulerBackend.scala |  2 +-
 .../org/apache/spark/deploy/yarn/ResourceRequestHelper.scala   |  4 ++--
 .../main/scala/org/apache/spark/deploy/yarn/YarnRMClient.scala |  2 +-
 .../spark/sql/catalyst/rules/QueryExecutionMetering.scala  |  2 +-
 .../datasources/parquet/VectorizedParquetRecordReader.java |  2 +-
 .../spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala   |  2 +-
 .../spark/sql/execution/datasources/jdbc/JDBCOptions.scala |  2 +-
 .../spark/sql/execution/ui/StreamingQueryStatusStore.scala |  2 +-
 .../org/apache/spark/sql/streaming/StreamingQueryStatus.scala  |  2 +-
 .../spark/sql/streaming/ui/StreamingQueryStatusListener.scala  |  2 +-
 .../sql/hive/thriftserver/ui/HiveThriftServer2Listener.scala   |  4 ++--
 .../spark/sql/hive/thriftserver/ui/ThriftServerPage.scala  |  2 +-
 .../main/scala/org/apache/spark/tools/GenerateMIMAIgnore.scala |  2 +-
 39 files changed, 61 insertions(+), 61 deletions(-)

diff --git 
a/common/kvstore/src/main/java/org/apache/spark/util/kvstore/ArrayWrappers.java 
b/common/kvstore/src/main/java/org/apache/spark/util/kvstore/ArrayWrappers.java
index 825355e..6f94873 100644
--- 
a/common/kvstore/src/main/java/org/apache/spark/util/kvstore/ArrayWrappers.java
+++ 
b/common/kvstore/src/main/java/org/apache/spark/util/kvstore/ArrayWrappers.java
@@ -200,7 +200,7 @@ class ArrayWrappers {
 public int compareTo(ComparableObjectArray other) {
   int len = Math.min(array.length, other.array.length);
   for (int i = 0; i < len; i++) {
-int diff = ((Comparable) 
array[i]).compareTo((Comparable) other.array[i]);
+int diff = ((Comparable) array[i]).compareTo(other.array[i]);
 if (diff != 0) {
   return diff;
 }
diff --git 
a/common/unsafe/src/main/java/org/apache/spark/unsafe/types/ByteArray.java 
b/common/unsafe/src/m

[spark] branch branch-3.1 updated: [MINOR][SQL][DOCS] Add more examples to sql-ref-syntax-ddl-create-table-datasource

2022-03-02 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a commit to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.1 by this push:
 new 6ec3045  [MINOR][SQL][DOCS] Add more examples to 
sql-ref-syntax-ddl-create-table-datasource
6ec3045 is described below

commit 6ec30452b87f39b1a22ddf0edb7f83ec94cc906c
Author: Yuming Wang 
AuthorDate: Wed Mar 2 11:57:38 2022 -0800

[MINOR][SQL][DOCS] Add more examples to 
sql-ref-syntax-ddl-create-table-datasource

### What changes were proposed in this pull request?

Add more examples to sql-ref-syntax-ddl-create-table-datasource:
1. Create partitioned and bucketed table through CTAS.
2. Create bucketed table through CTAS and CTE

### Why are the changes needed?

Improve doc.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manual test.

Closes #35712 from wangyum/sql-ref-syntax-ddl-create-table-datasource.

Authored-by: Yuming Wang 
Signed-off-by: huaxingao 
(cherry picked from commit 829d7fb045e47f1ddd43f2645949ea8257ca330d)
Signed-off-by: huaxingao 
---
 docs/sql-ref-syntax-ddl-create-table-datasource.md | 17 +
 1 file changed, 17 insertions(+)

diff --git a/docs/sql-ref-syntax-ddl-create-table-datasource.md 
b/docs/sql-ref-syntax-ddl-create-table-datasource.md
index ba0516a..9fa5dcb 100644
--- a/docs/sql-ref-syntax-ddl-create-table-datasource.md
+++ b/docs/sql-ref-syntax-ddl-create-table-datasource.md
@@ -132,6 +132,23 @@ CREATE TABLE student (id INT, name STRING, age INT)
 USING CSV
 PARTITIONED BY (age)
 CLUSTERED BY (Id) INTO 4 buckets;
+
+--Create partitioned and bucketed table through CTAS
+CREATE TABLE student_partition_bucket
+USING parquet
+PARTITIONED BY (age)
+CLUSTERED BY (id) INTO 4 buckets
+AS SELECT * FROM student;
+
+--Create bucketed table through CTAS and CTE
+CREATE TABLE student_bucket
+USING parquet
+CLUSTERED BY (id) INTO 4 buckets (
+WITH tmpTable AS (
+SELECT * FROM student WHERE id > 100
+)
+SELECT * FROM tmpTable
+);
 ```
 
 ### Related Statements

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.2 updated: [MINOR][SQL][DOCS] Add more examples to sql-ref-syntax-ddl-create-table-datasource

2022-03-02 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.2 by this push:
 new c0e4e73  [MINOR][SQL][DOCS] Add more examples to 
sql-ref-syntax-ddl-create-table-datasource
c0e4e73 is described below

commit c0e4e73c38cc7dfa1f06ea5ae59b1ce9fcab14ad
Author: Yuming Wang 
AuthorDate: Wed Mar 2 11:57:38 2022 -0800

[MINOR][SQL][DOCS] Add more examples to 
sql-ref-syntax-ddl-create-table-datasource

### What changes were proposed in this pull request?

Add more examples to sql-ref-syntax-ddl-create-table-datasource:
1. Create partitioned and bucketed table through CTAS.
2. Create bucketed table through CTAS and CTE

### Why are the changes needed?

Improve doc.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manual test.

Closes #35712 from wangyum/sql-ref-syntax-ddl-create-table-datasource.

Authored-by: Yuming Wang 
Signed-off-by: huaxingao 
(cherry picked from commit 829d7fb045e47f1ddd43f2645949ea8257ca330d)
Signed-off-by: huaxingao 
---
 docs/sql-ref-syntax-ddl-create-table-datasource.md | 17 +
 1 file changed, 17 insertions(+)

diff --git a/docs/sql-ref-syntax-ddl-create-table-datasource.md 
b/docs/sql-ref-syntax-ddl-create-table-datasource.md
index ba0516a..9fa5dcb 100644
--- a/docs/sql-ref-syntax-ddl-create-table-datasource.md
+++ b/docs/sql-ref-syntax-ddl-create-table-datasource.md
@@ -132,6 +132,23 @@ CREATE TABLE student (id INT, name STRING, age INT)
 USING CSV
 PARTITIONED BY (age)
 CLUSTERED BY (Id) INTO 4 buckets;
+
+--Create partitioned and bucketed table through CTAS
+CREATE TABLE student_partition_bucket
+USING parquet
+PARTITIONED BY (age)
+CLUSTERED BY (id) INTO 4 buckets
+AS SELECT * FROM student;
+
+--Create bucketed table through CTAS and CTE
+CREATE TABLE student_bucket
+USING parquet
+CLUSTERED BY (id) INTO 4 buckets (
+WITH tmpTable AS (
+SELECT * FROM student WHERE id > 100
+)
+SELECT * FROM tmpTable
+);
 ```
 
 ### Related Statements

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [MINOR][SQL][DOCS] Add more examples to sql-ref-syntax-ddl-create-table-datasource

2022-03-02 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 829d7fb  [MINOR][SQL][DOCS] Add more examples to 
sql-ref-syntax-ddl-create-table-datasource
829d7fb is described below

commit 829d7fb045e47f1ddd43f2645949ea8257ca330d
Author: Yuming Wang 
AuthorDate: Wed Mar 2 11:57:38 2022 -0800

[MINOR][SQL][DOCS] Add more examples to 
sql-ref-syntax-ddl-create-table-datasource

### What changes were proposed in this pull request?

Add more examples to sql-ref-syntax-ddl-create-table-datasource:
1. Create partitioned and bucketed table through CTAS.
2. Create bucketed table through CTAS and CTE

### Why are the changes needed?

Improve doc.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manual test.
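
For comparison, a hedged sketch (my own example, not part of the doc change) of 
the DataFrame API counterpart of the partitioned and bucketed CTAS added in the 
diff below, assuming the `student` table from the documentation examples exists.

```
// Sketch only: programmatic equivalent of CREATE TABLE ... PARTITIONED BY ...
// CLUSTERED BY ... AS SELECT, written to a hypothetical target table.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()

spark.table("student")
  .write
  .format("parquet")
  .partitionBy("age")
  .bucketBy(4, "id")
  .saveAsTable("student_partition_bucket_df")
```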

Closes #35712 from wangyum/sql-ref-syntax-ddl-create-table-datasource.

Authored-by: Yuming Wang 
Signed-off-by: huaxingao 
---
 docs/sql-ref-syntax-ddl-create-table-datasource.md | 17 +
 1 file changed, 17 insertions(+)

diff --git a/docs/sql-ref-syntax-ddl-create-table-datasource.md 
b/docs/sql-ref-syntax-ddl-create-table-datasource.md
index ba0516a..9fa5dcb 100644
--- a/docs/sql-ref-syntax-ddl-create-table-datasource.md
+++ b/docs/sql-ref-syntax-ddl-create-table-datasource.md
@@ -132,6 +132,23 @@ CREATE TABLE student (id INT, name STRING, age INT)
 USING CSV
 PARTITIONED BY (age)
 CLUSTERED BY (Id) INTO 4 buckets;
+
+--Create partitioned and bucketed table through CTAS
+CREATE TABLE student_partition_bucket
+USING parquet
+PARTITIONED BY (age)
+CLUSTERED BY (id) INTO 4 buckets
+AS SELECT * FROM student;
+
+--Create bucketed table through CTAS and CTE
+CREATE TABLE student_bucket
+USING parquet
+CLUSTERED BY (id) INTO 4 buckets (
+WITH tmpTable AS (
+SELECT * FROM student WHERE id > 100
+)
+SELECT * FROM tmpTable
+);
 ```
 
 ### Related Statements

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.1 updated: [SPARK-36553][ML] KMeans avoid compute auxiliary statistics for large K

2022-03-02 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a commit to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.1 by this push:
 new 357d3b2  [SPARK-36553][ML] KMeans avoid compute auxiliary statistics 
for large K
357d3b2 is described below

commit 357d3b24173405cdf915be60f2cebe442fa31536
Author: Ruifeng Zheng 
AuthorDate: Wed Mar 2 11:51:06 2022 -0800

[SPARK-36553][ML] KMeans avoid compute auxiliary statistics for large K

### What changes were proposed in this pull request?

SPARK-31007 introduced auxiliary statistics to speed up computation in 
KMeans.

However, it needs an array of size `k * (k + 1) / 2`, which may cause 
overflow or OOM when k is too large.

So we should skip this optimization in this case.
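
For a sense of scale, a back-of-the-envelope sketch (my own illustration, not 
from the patch):

```
// Why the k*(k+1)/2 statistics array breaks down for large k.
val k = 50000L
val entries = k * (k + 1) / 2          // 1,250,025,000 entries
val gib = entries * 8.0 / (1L << 30)   // ~9.3 GiB of Doubles just for the statistics
println(f"entries=$entries, approx=$gib%.1f GiB")

// With Int arithmetic the intermediate k*(k+1) already overflows for k >= 46341:
val kInt = 50000
println(kInt * (kInt + 1) / 2)         // prints a negative number due to Int overflow
```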

### Why are the changes needed?

avoid overflow or OOM when k is too large (like 50,000)

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
existing test suites

Closes #35457 from zhengruifeng/kmean_k_limit.

Authored-by: Ruifeng Zheng 
Signed-off-by: huaxingao 
(cherry picked from commit ad5427ebe644fc01a9b4c19a48f902f584245edf)
Signed-off-by: huaxingao 
---
 .../spark/mllib/clustering/DistanceMeasure.scala   | 23 ++
 .../org/apache/spark/mllib/clustering/KMeans.scala | 15 ++
 .../spark/mllib/clustering/KMeansModel.scala   | 11 +--
 3 files changed, 43 insertions(+), 6 deletions(-)

diff --git 
a/mllib/src/main/scala/org/apache/spark/mllib/clustering/DistanceMeasure.scala 
b/mllib/src/main/scala/org/apache/spark/mllib/clustering/DistanceMeasure.scala
index 9ac473a..e4c29a7 100644
--- 
a/mllib/src/main/scala/org/apache/spark/mllib/clustering/DistanceMeasure.scala
+++ 
b/mllib/src/main/scala/org/apache/spark/mllib/clustering/DistanceMeasure.scala
@@ -118,6 +118,24 @@ private[spark] abstract class DistanceMeasure extends 
Serializable {
   }
 
   /**
+   * @param centers the clustering centers
+   * @param statistics optional statistics to accelerate the computation, 
which should not
+   *   change the result.
+   * @param point given point
+   * @return the index of the closest center to the given point, as well as 
the cost.
+   */
+  def findClosest(
+  centers: Array[VectorWithNorm],
+  statistics: Option[Array[Double]],
+  point: VectorWithNorm): (Int, Double) = {
+if (statistics.nonEmpty) {
+  findClosest(centers, statistics.get, point)
+} else {
+  findClosest(centers, point)
+}
+  }
+
+  /**
* @return the index of the closest center to the given point, as well as 
the cost.
*/
   def findClosest(
@@ -253,6 +271,11 @@ object DistanceMeasure {
   case _ => false
 }
   }
+
+  private[clustering] def shouldComputeStatistics(k: Int): Boolean = k < 1000
+
+  private[clustering] def shouldComputeStatisticsLocally(k: Int, numFeatures: 
Int): Boolean =
+k.toLong * k * numFeatures < 100
 }
 
 private[spark] class EuclideanDistanceMeasure extends DistanceMeasure {
diff --git 
a/mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala 
b/mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala
index 76e2928..c140b1b 100644
--- a/mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala
+++ b/mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala
@@ -269,15 +269,22 @@ class KMeans private (
 
 instr.foreach(_.logNumFeatures(numFeatures))
 
-val shouldDistributed = centers.length * centers.length * 
numFeatures.toLong > 100L
+val shouldComputeStats =
+  DistanceMeasure.shouldComputeStatistics(centers.length)
+val shouldComputeStatsLocally =
+  DistanceMeasure.shouldComputeStatisticsLocally(centers.length, 
numFeatures)
 
 // Execute iterations of Lloyd's algorithm until converged
 while (iteration < maxIterations && !converged) {
   val bcCenters = sc.broadcast(centers)
-  val stats = if (shouldDistributed) {
-distanceMeasureInstance.computeStatisticsDistributedly(sc, bcCenters)
+  val stats = if (shouldComputeStats) {
+if (shouldComputeStatsLocally) {
+  Some(distanceMeasureInstance.computeStatistics(centers))
+} else {
+  Some(distanceMeasureInstance.computeStatisticsDistributedly(sc, 
bcCenters))
+}
   } else {
-distanceMeasureInstance.computeStatistics(centers)
+None
   }
   val bcStats = sc.broadcast(stats)
 
diff --git 
a/mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeansModel.scala 
b/mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeansModel.scala
index a24493b..64b3521 100644
--- a/mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeansModel.scala
+++ b/mllib/src/main/scala/org/apache

[spark] branch branch-3.2 updated: [SPARK-36553][ML] KMeans avoid compute auxiliary statistics for large K

2022-03-02 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.2 by this push:
 new d5e90cf  [SPARK-36553][ML] KMeans avoid compute auxiliary statistics 
for large K
d5e90cf is described below

commit d5e90cf5ecf287eb53234e25e3a4cc37794360f2
Author: Ruifeng Zheng 
AuthorDate: Wed Mar 2 11:51:06 2022 -0800

[SPARK-36553][ML] KMeans avoid compute auxiliary statistics for large K

### What changes were proposed in this pull request?

SPARK-31007 introduced auxiliary statistics to speed up computation in 
KMeans.

However, it needs an array of size `k * (k + 1) / 2`, which may cause 
overflow or OOM when k is too large.

So we should skip this optimization in this case.

### Why are the changes needed?

avoid overflow or OOM when k is too large (like 50,000)

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
existing test suites

Closes #35457 from zhengruifeng/kmean_k_limit.

Authored-by: Ruifeng Zheng 
Signed-off-by: huaxingao 
(cherry picked from commit ad5427ebe644fc01a9b4c19a48f902f584245edf)
Signed-off-by: huaxingao 
---
 .../spark/mllib/clustering/DistanceMeasure.scala   | 23 ++
 .../org/apache/spark/mllib/clustering/KMeans.scala | 15 ++
 .../spark/mllib/clustering/KMeansModel.scala   | 11 +--
 3 files changed, 43 insertions(+), 6 deletions(-)

diff --git 
a/mllib/src/main/scala/org/apache/spark/mllib/clustering/DistanceMeasure.scala 
b/mllib/src/main/scala/org/apache/spark/mllib/clustering/DistanceMeasure.scala
index 9ac473a..e4c29a7 100644
--- 
a/mllib/src/main/scala/org/apache/spark/mllib/clustering/DistanceMeasure.scala
+++ 
b/mllib/src/main/scala/org/apache/spark/mllib/clustering/DistanceMeasure.scala
@@ -118,6 +118,24 @@ private[spark] abstract class DistanceMeasure extends 
Serializable {
   }
 
   /**
+   * @param centers the clustering centers
+   * @param statistics optional statistics to accelerate the computation, 
which should not
+   *   change the result.
+   * @param point given point
+   * @return the index of the closest center to the given point, as well as 
the cost.
+   */
+  def findClosest(
+  centers: Array[VectorWithNorm],
+  statistics: Option[Array[Double]],
+  point: VectorWithNorm): (Int, Double) = {
+if (statistics.nonEmpty) {
+  findClosest(centers, statistics.get, point)
+} else {
+  findClosest(centers, point)
+}
+  }
+
+  /**
* @return the index of the closest center to the given point, as well as 
the cost.
*/
   def findClosest(
@@ -253,6 +271,11 @@ object DistanceMeasure {
   case _ => false
 }
   }
+
+  private[clustering] def shouldComputeStatistics(k: Int): Boolean = k < 1000
+
+  private[clustering] def shouldComputeStatisticsLocally(k: Int, numFeatures: 
Int): Boolean =
+k.toLong * k * numFeatures < 100
 }
 
 private[spark] class EuclideanDistanceMeasure extends DistanceMeasure {
diff --git 
a/mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala 
b/mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala
index 76e2928..c140b1b 100644
--- a/mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala
+++ b/mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala
@@ -269,15 +269,22 @@ class KMeans private (
 
 instr.foreach(_.logNumFeatures(numFeatures))
 
-val shouldDistributed = centers.length * centers.length * 
numFeatures.toLong > 100L
+val shouldComputeStats =
+  DistanceMeasure.shouldComputeStatistics(centers.length)
+val shouldComputeStatsLocally =
+  DistanceMeasure.shouldComputeStatisticsLocally(centers.length, 
numFeatures)
 
 // Execute iterations of Lloyd's algorithm until converged
 while (iteration < maxIterations && !converged) {
   val bcCenters = sc.broadcast(centers)
-  val stats = if (shouldDistributed) {
-distanceMeasureInstance.computeStatisticsDistributedly(sc, bcCenters)
+  val stats = if (shouldComputeStats) {
+if (shouldComputeStatsLocally) {
+  Some(distanceMeasureInstance.computeStatistics(centers))
+} else {
+  Some(distanceMeasureInstance.computeStatisticsDistributedly(sc, 
bcCenters))
+}
   } else {
-distanceMeasureInstance.computeStatistics(centers)
+None
   }
   val bcStats = sc.broadcast(stats)
 
diff --git 
a/mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeansModel.scala 
b/mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeansModel.scala
index a24493b..64b3521 100644
--- a/mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeansModel.scala
+++ b/mllib/src/main/scala/org/apache

[spark] branch master updated (4d4c044 -> ad5427e)

2022-03-02 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 4d4c044  [SPARK-38392][K8S][TESTS] Add `spark-` prefix to namespaces 
and `-driver` suffix to drivers during IT
 add ad5427e  [SPARK-36553][ML] KMeans avoid compute auxiliary statistics 
for large K

No new revisions were added by this update.

Summary of changes:
 .../spark/mllib/clustering/DistanceMeasure.scala   | 23 ++
 .../org/apache/spark/mllib/clustering/KMeans.scala | 15 ++
 .../spark/mllib/clustering/KMeansModel.scala   | 11 +--
 3 files changed, 43 insertions(+), 6 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (3aa0cd4 -> 89464bf)

2022-02-26 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 3aa0cd4  [SPARK-38302][K8S][TESTS] Use `Java 17` in K8S IT in case of 
`spark-tgz` option
 add 89464bf  [SPARK-36488][SQL][FOLLOWUP] Simplify the implementation of 
ResolveReferences#extractStar

No new revisions were added by this update.

Summary of changes:
 .../main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala| 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (fb543a7 -> 4357643)

2022-02-23 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from fb543a7  [SPARK-38306][SQL] Fix ExplainSuite,StatisticsCollectionSuite 
and StringFunctionsSuite under ANSI mode
 add 4357643  [SPARK-37923][SQL][FOLLOWUP] Rename 
MultipleBucketTransformsError in QueryExecutionErrors to 
multipleBucketTransformsError

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/sql/connector/catalog/CatalogV2Implicits.scala | 2 +-
 .../main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala   | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.2 updated: [SPARK-38100][SQL] Remove unused private method in `Decimal`

2022-02-03 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.2 by this push:
 new 4cd3fd3  [SPARK-38100][SQL] Remove unused private method in `Decimal`
4cd3fd3 is described below

commit 4cd3fd3152949a57939a850ec18d9eb4bc31fdf7
Author: yangjie01 
AuthorDate: Thu Feb 3 13:38:10 2022 -0800

[SPARK-38100][SQL] Remove unused private method in `Decimal`

### What changes were proposed in this pull request?
There is an unused `private` method `overflowException` in 
`org.apache.spark.sql.types.Decimal`. The method was added by SPARK-28741, and the 
relevant invocations were replaced by direct calls to 
`QueryExecutionErrors.castingCauseOverflowError` after SPARK-35060, so this PR 
removes the unused method.

### Why are the changes needed?
Remove unused method.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass GA

Closes #35392 from LuciferYang/SPARK-38100.

Authored-by: yangjie01 
Signed-off-by: huaxingao 
(cherry picked from commit 7a613ecb826d3009f4587cefeb89f31b1cb4bed2)
Signed-off-by: huaxingao 
---
 sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala | 3 ---
 1 file changed, 3 deletions(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala
index cb468c5..4681429 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala
@@ -251,9 +251,6 @@ final class Decimal extends Ordered[Decimal] with 
Serializable {
 
   def toByte: Byte = toLong.toByte
 
-  private def overflowException(dataType: String) =
-throw QueryExecutionErrors.castingCauseOverflowError(this, dataType)
-
   /**
* @return the Byte value that is equal to the rounded decimal.
* @throws ArithmeticException if the decimal is too big to fit in Byte type.

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (4fcfcc8 -> 7a613ec)

2022-02-03 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 4fcfcc8  [SPARK-38096][BUILD] Update sbt to 1.6.2
 add 7a613ec  [SPARK-38100][SQL] Remove unused private method in `Decimal`

No new revisions were added by this update.

Summary of changes:
 sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala | 3 ---
 1 file changed, 3 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark-website] branch asf-site updated: fix display problem

2022-01-28 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/spark-website.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new 87d73df  fix display problem
87d73df is described below

commit 87d73df22b2626ae8bd43fc30b8478ff7ac981f8
Author: huaxingao 
AuthorDate: Fri Jan 28 15:00:28 2022 -0800

fix display problem


I think the `|` inside `Support (IGNORE | RESPECT) ...` caused the display 
problem: the bullet dot looks strange, and the `|` between `IGNORE` and `RESPECT` 
is gone.
```
- [[SPARK-30789]](https://issues.apache.org/jira/browse/SPARK-30789): 
Support (IGNORE | RESPECT) NULLS for LEAD/LAG/NTH_VALUE/FIRST_VALUE/LAST_VALUE
```
https://user-images.githubusercontent.com/13592258/151631202-b3219f5e-9c88-4cb9-a60d-bd2de2536f1d.png

I will also fix the spaces in this file. All the other release md files 
have two leading spaces on each line of the JIRA list, but my IntelliJ 
settings stripped them, so I will manually add these spaces back.

Author: huaxingao 

Closes #377 from huaxingao/fixspace.
---
 releases/_posts/2022-01-26-spark-release-3-2-1.md | 38 +++
 site/releases/spark-release-3-2-1.html| 11 +--
 2 files changed, 20 insertions(+), 29 deletions(-)

diff --git a/releases/_posts/2022-01-26-spark-release-3-2-1.md 
b/releases/_posts/2022-01-26-spark-release-3-2-1.md
index 7ef8375..d999d0c 100644
--- a/releases/_posts/2022-01-26-spark-release-3-2-1.md
+++ b/releases/_posts/2022-01-26-spark-release-3-2-1.md
@@ -15,30 +15,30 @@ Spark 3.2.1 is a maintenance release containing stability 
fixes. This release is
 
 ### Notable changes
 
-- [[SPARK-30789]](https://issues.apache.org/jira/browse/SPARK-30789): Support 
(IGNORE | RESPECT) NULLS for LEAD/LAG/NTH_VALUE/FIRST_VALUE/LAST_VALUE
-- [[SPARK-33277]](https://issues.apache.org/jira/browse/SPARK-33277): 
Python/Pandas UDF right after off-heap vectorized reader could cause executor 
crash.
-- [[SPARK-34399]](https://issues.apache.org/jira/browse/SPARK-34399): Add file 
commit time to metrics and shown in SQL Tab UI
-- [[SPARK-35714]](https://issues.apache.org/jira/browse/SPARK-35714): Bug fix 
for deadlock during the executor shutdown
-- [[SPARK-36754]](https://issues.apache.org/jira/browse/SPARK-36754): 
array_intersect should handle Double.NaN and Float.NaN
-- [[SPARK-37001]](https://issues.apache.org/jira/browse/SPARK-37001): Disable 
two level of map for final hash aggregation by default
-- [[SPARK-37023]](https://issues.apache.org/jira/browse/SPARK-37023): Avoid 
fetching merge status when shuffleMergeEnabled is false for a shuffleDependency 
during retry
-- [[SPARK-37088]](https://issues.apache.org/jira/browse/SPARK-37088): Python 
UDF after off-heap vectorized reader can cause crash due to use-after-free in 
writer thread
-- [[SPARK-37202]](https://issues.apache.org/jira/browse/SPARK-37202): Temp 
view didn't collect temp function that registered with catalog API
-- [[SPARK-37208]](https://issues.apache.org/jira/browse/SPARK-37208): Support 
mapping Spark gpu/fpga resource types to custom YARN resource type
-- [[SPARK-37214]](https://issues.apache.org/jira/browse/SPARK-37214): Fail 
query analysis earlier with invalid identifiers
-- [[SPARK-37392]](https://issues.apache.org/jira/browse/SPARK-37392): Fix the 
performance bug when inferring constraints for Generate
-- [[SPARK-37695]](https://issues.apache.org/jira/browse/SPARK-37695): Skip 
diagnosis ob merged blocks from push-based shuffle
-- [[SPARK-37705]](https://issues.apache.org/jira/browse/SPARK-37705): Write 
session time zone in the Parquet file metadata so that rebase can use it 
instead of JVM timezone
-- [[SPARK-37957]](https://issues.apache.org/jira/browse/SPARK-37957): 
Deterministic flag is not handled for V2 functions
+  - [[SPARK-30789]](https://issues.apache.org/jira/browse/SPARK-30789): 
Support IGNORE/RESPECT NULLS for LEAD/LAG/NTH_VALUE/FIRST_VALUE/LAST_VALUE
+  - [[SPARK-33277]](https://issues.apache.org/jira/browse/SPARK-33277): 
Python/Pandas UDF right after off-heap vectorized reader could cause executor 
crash.
+  - [[SPARK-34399]](https://issues.apache.org/jira/browse/SPARK-34399): Add 
file commit time to metrics and shown in SQL Tab UI
+  - [[SPARK-35714]](https://issues.apache.org/jira/browse/SPARK-35714): Bug 
fix for deadlock during the executor shutdown
+  - [[SPARK-36754]](https://issues.apache.org/jira/browse/SPARK-36754): 
array_intersect should handle Double.NaN and Float.NaN
+  - [[SPARK-37001]](https://issues.apache.org/jira/browse/SPARK-37001): 
Disable two level of map for final hash aggregation by default
+  - [[SPARK-37023]](https://issues.apache.org/jira/browse/SPARK-37023): Avoid 
fetching merge status when shuffleMergeEnabled is false for a shuffleDependency 
during retry
+  - [[SPARK-37088]](ht

[spark-website] branch asf-site updated: fix version in downloads.md

2022-01-28 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/spark-website.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new 15615fc  fix version in downloads.md
15615fc is described below

commit 15615fccd4f74ff9863dec5917636a383f0ef2bf
Author: huaxingao 
AuthorDate: Fri Jan 28 11:29:24 2022 -0800

fix version in downloads.md


Fix wrong version

Author: huaxingao 

Closes #376 from huaxingao/fix.
---
 downloads.md| 2 +-
 site/downloads.html | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/downloads.md b/downloads.md
index 8acc73e..e5d1db3 100644
--- a/downloads.md
+++ b/downloads.md
@@ -35,7 +35,7 @@ Spark artifacts are [hosted in Maven 
Central](https://search.maven.org/search?q=
 
 groupId: org.apache.spark
 artifactId: spark-core_2.12
-version: 3.2.0
+version: 3.2.1
 
 ### Installing with PyPi
 PySpark (https://pypi.org/project/pyspark/) is now available in 
pypi. To install just run `pip install pyspark`.
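
As an aside, a minimal sbt equivalent of the corrected Maven coordinates above 
(illustrative only, not part of this commit; it assumes scalaVersion 2.12 so 
that `%%` resolves to spark-core_2.12):

```scala
// sbt build definition: same artifact as groupId org.apache.spark,
// artifactId spark-core_2.12, version 3.2.1.
libraryDependencies += "org.apache.spark" %% "spark-core" % "3.2.1"
```
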
diff --git a/site/downloads.html b/site/downloads.html
index 9de6966..243c32e 100644
--- a/site/downloads.html
+++ b/site/downloads.html
@@ -179,7 +179,7 @@ window.onload = function () {
 
 groupId: org.apache.spark
 artifactId: spark-core_2.12
-version: 3.2.0
+version: 3.2.1
 
 
 Installing with PyPi

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



svn commit: r52278 - /dev/spark/v3.2.1-rc2-docs/

2022-01-25 Thread huaxingao
Author: huaxingao
Date: Wed Jan 26 05:52:38 2022
New Revision: 52278

Log:
Remove RC artifacts

Removed:
dev/spark/v3.2.1-rc2-docs/


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] tag v3.2.1 created (now 4f25b3f)

2022-01-25 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a change to tag v3.2.1
in repository https://gitbox.apache.org/repos/asf/spark.git.


  at 4f25b3f  (commit)
No new revisions were added by this update.

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.2 updated: [SPARK-30062][SQL] Add the IMMEDIATE statement to the DB2 dialect truncate implementation

2022-01-25 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.2 by this push:
 new cd7a3c2  [SPARK-30062][SQL] Add the IMMEDIATE statement to the DB2 
dialect truncate implementation
cd7a3c2 is described below

commit cd7a3c2e667600c722c86b3914d487394f711916
Author: Ivan Karol 
AuthorDate: Tue Jan 25 19:14:24 2022 -0800

[SPARK-30062][SQL] Add the IMMEDIATE statement to the DB2 dialect truncate 
implementation

### What changes were proposed in this pull request?
I've added a DB2-specific truncate implementation that appends an IMMEDIATE 
statement to the end of the query.

### Why are the changes needed?
I've encountered this issue myself while working with DB2 and trying to use 
the truncate functionality.
A quick Google search shows that some people have also encountered this 
issue before:

https://stackoverflow.com/questions/70027567/overwrite-mode-does-not-work-in-spark-sql-while-adding-data-in-db2
https://issues.apache.org/jira/browse/SPARK-30062

Looking into the DB2 docs, it becomes apparent that the IMMEDIATE statement 
is only optional if the table is column organized (though I'm not sure if that 
applies to all DB2 versions). So for cases (such as mine) where the table is 
not column organized, adding an IMMEDIATE statement becomes essential for the 
query to work.

https://www.ibm.com/support/knowledgecenter/en/SSEPGG_11.5.0/com.ibm.db2.luw.sql.ref.doc/doc/r0053474.html

Also, this might not be the best example, but I've found that DbVisualizer 
does add an IMMEDIATE statement at the end of the truncate command, though it 
does so only for DB2 versions that are >= 9.7:
https://fossies.org/linux/dbvis/resources/profiles/db2.xml (please look at 
line number 473)
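
To make the idea concrete, here is a minimal standalone sketch (the helper name 
`db2TruncateQuery` is hypothetical and this is not the exact Spark dialect code, 
just an illustration of appending IMMEDIATE so the statement also works on 
row-organized tables):

```scala
// Hypothetical sketch only: build a DB2 TRUNCATE statement that also works
// on row-organized tables by appending IMMEDIATE.
def db2TruncateQuery(table: String): String =
  s"TRUNCATE TABLE $table IMMEDIATE"

// db2TruncateQuery("mySchema.myTable")
// => "TRUNCATE TABLE mySchema.myTable IMMEDIATE"
```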

### Does this PR introduce _any_ user-facing change?
It should not. Even though the docs mention that a TRUNCATE statement 
executed in conjunction with IMMEDIATE has to be the first statement in the 
transaction, the JDBC connection established to execute the TRUNCATE statement 
has auto-commit mode turned on. This means no other query/statement will be 
executed before it within the same transaction.
https://www.ibm.com/docs/en/db2/11.5?topic=statements-truncate (see the 
description for IMMEDIATE)

https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcRelationProvider.scala#L49

https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcRelationProvider.scala#L57

https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala#L108

### How was this patch tested?
Existing test case with slightly adjusted logic.

Closes #35283 from ikarol/SPARK-30062.

Authored-by: Ivan Karol 
Signed-off-by: huaxingao 
(cherry picked from commit 7e5c3b216431b6a5e9a0786bf7cded694228cdee)
Signed-off-by: huaxingao 
---
 .../apache/spark/sql/jdbc/DB2IntegrationSuite.scala | 21 -
 .../org/apache/spark/sql/jdbc/DB2Dialect.scala  |  9 +
 .../scala/org/apache/spark/sql/jdbc/JDBCSuite.scala |  8 ++--
 3 files changed, 35 insertions(+), 3 deletions(-)

diff --git 
a/external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/DB2IntegrationSuite.scala
 
b/external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/DB2IntegrationSuite.scala
index 77d7254..fd4f2aa 100644
--- 
a/external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/DB2IntegrationSuite.scala
+++ 
b/external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/DB2IntegrationSuite.scala
@@ -23,7 +23,7 @@ import java.util.Properties
 
 import org.scalatest.time.SpanSugar._
 
-import org.apache.spark.sql.Row
+import org.apache.spark.sql.{Row, SaveMode}
 import org.apache.spark.sql.catalyst.util.DateTimeTestUtils._
 import org.apache.spark.sql.types.{BooleanType, ByteType, ShortType, 
StructType}
 import org.apache.spark.tags.DockerTest
@@ -198,4 +198,23 @@ class DB2IntegrationSuite extends 
DockerJDBCIntegrationSuite {
""".stripMargin.replaceAll("\n", " "))
 assert(sql("select x, y from queryOption").collect.toSet == expectedResult)
   }
+
+  test("SPARK-30062") {
+val expectedResult = Set(
+  (42, "fred"),
+  (17, "dave")
+).map { case (x, y) =>
+  Row(Integer.valueOf(x), String.valueOf(y))
+}
+val df = sqlContext.read.jdbc(jdbcUrl, "tbl", new Properties)
+for (_ <- 0 to 2) {
+  df.write.mode(SaveMode.Append).jdbc(jdbcUrl, "tblcopy", 

[spark] branch master updated (94df0d5 -> 7e5c3b2)

2022-01-25 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 94df0d5  [SPARK-38028][SQL] Expose Arrow Vector from ArrowColumnVector
 add 7e5c3b2  [SPARK-30062][SQL] Add the IMMEDIATE statement to the DB2 
dialect truncate implementation

No new revisions were added by this update.

Summary of changes:
 .../apache/spark/sql/jdbc/DB2IntegrationSuite.scala | 21 -
 .../org/apache/spark/sql/jdbc/DB2Dialect.scala  |  9 +
 .../scala/org/apache/spark/sql/jdbc/JDBCSuite.scala |  8 ++--
 3 files changed, 35 insertions(+), 3 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



svn commit: r52181 - in /dev/spark/v3.2.1-rc2-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _site/api/java/org/apache/parqu

2022-01-20 Thread huaxingao
Author: huaxingao
Date: Thu Jan 20 22:50:09 2022
New Revision: 52181

Log:
Apache Spark v3.2.1-rc2 docs


[This commit notification would consist of 2355 parts, 
which exceeds the limit of 50 ones, so it was shortened to the summary.]

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



svn commit: r52178 - /dev/spark/v3.2.1-rc2-bin/

2022-01-20 Thread huaxingao
Author: huaxingao
Date: Thu Jan 20 21:37:10 2022
New Revision: 52178

Log:
Apache Spark v3.2.1-rc2

Added:
dev/spark/v3.2.1-rc2-bin/
dev/spark/v3.2.1-rc2-bin/SparkR_3.2.1.tar.gz   (with props)
dev/spark/v3.2.1-rc2-bin/SparkR_3.2.1.tar.gz.asc
dev/spark/v3.2.1-rc2-bin/SparkR_3.2.1.tar.gz.sha512
dev/spark/v3.2.1-rc2-bin/pyspark-3.2.1.tar.gz   (with props)
dev/spark/v3.2.1-rc2-bin/pyspark-3.2.1.tar.gz.asc
dev/spark/v3.2.1-rc2-bin/pyspark-3.2.1.tar.gz.sha512
dev/spark/v3.2.1-rc2-bin/spark-3.2.1-bin-hadoop2.7.tgz   (with props)
dev/spark/v3.2.1-rc2-bin/spark-3.2.1-bin-hadoop2.7.tgz.asc
dev/spark/v3.2.1-rc2-bin/spark-3.2.1-bin-hadoop2.7.tgz.sha512
dev/spark/v3.2.1-rc2-bin/spark-3.2.1-bin-hadoop3.2-scala2.13.tgz   (with 
props)
dev/spark/v3.2.1-rc2-bin/spark-3.2.1-bin-hadoop3.2-scala2.13.tgz.asc
dev/spark/v3.2.1-rc2-bin/spark-3.2.1-bin-hadoop3.2-scala2.13.tgz.sha512
dev/spark/v3.2.1-rc2-bin/spark-3.2.1-bin-hadoop3.2.tgz   (with props)
dev/spark/v3.2.1-rc2-bin/spark-3.2.1-bin-hadoop3.2.tgz.asc
dev/spark/v3.2.1-rc2-bin/spark-3.2.1-bin-hadoop3.2.tgz.sha512
dev/spark/v3.2.1-rc2-bin/spark-3.2.1-bin-without-hadoop.tgz   (with props)
dev/spark/v3.2.1-rc2-bin/spark-3.2.1-bin-without-hadoop.tgz.asc
dev/spark/v3.2.1-rc2-bin/spark-3.2.1-bin-without-hadoop.tgz.sha512
dev/spark/v3.2.1-rc2-bin/spark-3.2.1.tgz   (with props)
dev/spark/v3.2.1-rc2-bin/spark-3.2.1.tgz.asc
dev/spark/v3.2.1-rc2-bin/spark-3.2.1.tgz.sha512

Added: dev/spark/v3.2.1-rc2-bin/SparkR_3.2.1.tar.gz
==
Binary file - no diff available.

Propchange: dev/spark/v3.2.1-rc2-bin/SparkR_3.2.1.tar.gz
--
svn:mime-type = application/octet-stream

Added: dev/spark/v3.2.1-rc2-bin/SparkR_3.2.1.tar.gz.asc
==
--- dev/spark/v3.2.1-rc2-bin/SparkR_3.2.1.tar.gz.asc (added)
+++ dev/spark/v3.2.1-rc2-bin/SparkR_3.2.1.tar.gz.asc Thu Jan 20 21:37:10 2022
@@ -0,0 +1,16 @@
+-BEGIN PGP SIGNATURE-
+
+iQIzBAABCgAdFiEEzqiIvbMtmDx/CUVkrAHm6ROfYQwFAmHpwcEACgkQrAHm6ROf
+YQwx5RAAjRhC2HpxkJJt+e20pwSg1OIGAIYBswDGFXyzHFNFazxTgpVXSfUU2zzp
+kmAeSiNW2LN0fo89X6ajLsaoO11QFvGwidH6xq3euTZIrBSe/cUlEt5AH8hEcYnR
+GpIe6q9TGewc380H8oXrP+ldgPeoy0TFTh4yWvtAaQR29TGXbvCmqh3EOShHsC0Y
+IOZZc5tKAIuKNa5NOe43gIBLfDPlDYg/O169pjoc8vfn16GsCSUDacDt1ByCVO/T
+sGvN7sjvSdaQ1lM4mqq3hBJRzj4mgLl+pgGQsAwXHDEa9zJvu19/sjW7OXNgy/aJ
+0NxLAxTkMhOIlEAuidAIGdJoqBxVOzz5aF2GMS+xUqUHrXHrupPECwQ6LuAVV++p
+9BZD9dXIwwZ7JlBefLC8BUtPzeJqJWA0+IAp3uaq5ezzAKfjyeOmGF52wvSIEEf6
+PdKyGh1QQ6BRwUEWwiucQNvqZ+8uN4AHY1nO0MVj0h3x5FeNCv2k/CILmNfxrv2x
+g+cw92Z2pPCiTPKJU23OsYhlG4hSmt1L9eH5xgpVySNPYtcUX3sSCyL/u04K6Pid
+Oq269oS4O27WbjF7MWE6be766yMNm8MlsMddvOvoxPodOyrZEYmgBfb0GxK154mv
+HfZ39o8vgDVCHpxSy70p943MPp7l2c1X6CusC3g7BgL4bZ+uQFE=
+=7mqr
+-END PGP SIGNATURE-

Added: dev/spark/v3.2.1-rc2-bin/SparkR_3.2.1.tar.gz.sha512
==
--- dev/spark/v3.2.1-rc2-bin/SparkR_3.2.1.tar.gz.sha512 (added)
+++ dev/spark/v3.2.1-rc2-bin/SparkR_3.2.1.tar.gz.sha512 Thu Jan 20 21:37:10 2022
@@ -0,0 +1,3 @@
+SparkR_3.2.1.tar.gz: 1DB089C3 E111728C 99CE62A8 A58423CD BF81A427 93521BC7
+ F013AA63 9AC88FB0 C43D2AAA 33B3E619 3559B298 0BE10BC2
+ 7AF509EE DF543023 101B8CCF 05611A2E

Added: dev/spark/v3.2.1-rc2-bin/pyspark-3.2.1.tar.gz
==
Binary file - no diff available.

Propchange: dev/spark/v3.2.1-rc2-bin/pyspark-3.2.1.tar.gz
--
svn:mime-type = application/octet-stream

Added: dev/spark/v3.2.1-rc2-bin/pyspark-3.2.1.tar.gz.asc
==
--- dev/spark/v3.2.1-rc2-bin/pyspark-3.2.1.tar.gz.asc (added)
+++ dev/spark/v3.2.1-rc2-bin/pyspark-3.2.1.tar.gz.asc Thu Jan 20 21:37:10 2022
@@ -0,0 +1,16 @@
+-BEGIN PGP SIGNATURE-
+
+iQIzBAABCgAdFiEEzqiIvbMtmDx/CUVkrAHm6ROfYQwFAmHpwcUACgkQrAHm6ROf
+YQxMwBAA02aZypS+tl6TeF7xMmQr4FFpAXdBc/9/WnBj0ufEv3VQnCKBllejDKY8
+IPLrMwwqpuw9vaRzLWOv1GlI3wExyBRrUcW3G5afLZ9RjKixqYhXdUaPnGIBu7uy
+93Q4H2LTYfwKUlQ26UvP9WjiPHqK1ym9NhPbJcgiftwjcN+DReezrRbeWi1mWlku
+oA3KYI0h7Rz6S/V2+JVGPorJfJtpwku3nvBihLIAtj+UBdhHLnIuHsZjb/4OMLJj
+oRchZjbxahq3jqIjG+WigMcq4hKhfoojP7/QV37SRMoqNnSwcQsYUyG+96NygOHx
+V3t4RlpnetfM5KF7FEbdbOOVCWMGzvtp8OZ8JFN4f25EU7VucgQ0EZle7FanKLL7
+D0bnjbG5SEarM9FLGQfgSe4ZAJHqzGzWWoyaVIT2Wr5gF1Xe+T6gszFscTfiMuhF
+p3GqqApz7ug2gMhU7jDeC7RjD8U+vgy4uAJ/Hjtqa7x92q0jI4mpDRxa2OxqUyXq
+m4NbK6l9zCk9qNC0OU74ooPJPuO4Mlvsn7hImDm/0KMyrKIEy5dqqGM55iWSlZLL
+jSoczDlJhwei3UwsQHzSD9/cY5GQ+w+mPv++BusUr/IEUnOGNCkoRutCHSkep2SC
+LC+K7PNMQdb0v+BYqe00dpAGuR8P/l347YycTdwO7TMVYEJCtmA=
+=lyY9
+-END PGP SIGNATURE

svn commit: r52174 - in /dev/spark: v3.2.1-rc2-bin/ v3.2.1-rc2-docs/

2022-01-20 Thread huaxingao
Author: huaxingao
Date: Thu Jan 20 17:36:50 2022
New Revision: 52174

Log:
Removing v3.2.1-rc2 artifacts.

Removed:
dev/spark/v3.2.1-rc2-bin/
dev/spark/v3.2.1-rc2-docs/


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-3.2 updated (3860ac5 -> 025b885)

2022-01-19 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a change to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 3860ac5  [SPARK-37957][SQL] Correctly pass deterministic flag for V2 
scalar functions
 add 4f25b3f  Preparing Spark release v3.2.1-rc2
 new 025b885  Preparing development version 3.2.2-SNAPSHOT

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] 01/01: Preparing development version 3.2.2-SNAPSHOT

2022-01-19 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git

commit 025b8852372d8aefec0360c8af5310b6d60d7dda
Author: Huaxin Gao 
AuthorDate: Thu Jan 20 05:03:17 2022 +

Preparing development version 3.2.2-SNAPSHOT
---
 R/pkg/DESCRIPTION  | 2 +-
 assembly/pom.xml   | 2 +-
 common/kvstore/pom.xml | 2 +-
 common/network-common/pom.xml  | 2 +-
 common/network-shuffle/pom.xml | 2 +-
 common/network-yarn/pom.xml| 2 +-
 common/sketch/pom.xml  | 2 +-
 common/tags/pom.xml| 2 +-
 common/unsafe/pom.xml  | 2 +-
 core/pom.xml   | 2 +-
 docs/_config.yml   | 6 +++---
 examples/pom.xml   | 2 +-
 external/avro/pom.xml  | 2 +-
 external/docker-integration-tests/pom.xml  | 2 +-
 external/kafka-0-10-assembly/pom.xml   | 2 +-
 external/kafka-0-10-sql/pom.xml| 2 +-
 external/kafka-0-10-token-provider/pom.xml | 2 +-
 external/kafka-0-10/pom.xml| 2 +-
 external/kinesis-asl-assembly/pom.xml  | 2 +-
 external/kinesis-asl/pom.xml   | 2 +-
 external/spark-ganglia-lgpl/pom.xml| 2 +-
 graphx/pom.xml | 2 +-
 hadoop-cloud/pom.xml   | 2 +-
 launcher/pom.xml   | 2 +-
 mllib-local/pom.xml| 2 +-
 mllib/pom.xml  | 2 +-
 pom.xml| 2 +-
 python/pyspark/version.py  | 2 +-
 repl/pom.xml   | 2 +-
 resource-managers/kubernetes/core/pom.xml  | 2 +-
 resource-managers/kubernetes/integration-tests/pom.xml | 2 +-
 resource-managers/mesos/pom.xml| 2 +-
 resource-managers/yarn/pom.xml | 2 +-
 sql/catalyst/pom.xml   | 2 +-
 sql/core/pom.xml   | 2 +-
 sql/hive-thriftserver/pom.xml  | 2 +-
 sql/hive/pom.xml   | 2 +-
 streaming/pom.xml  | 2 +-
 tools/pom.xml  | 2 +-
 39 files changed, 41 insertions(+), 41 deletions(-)

diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION
index 2abad61..5590c86 100644
--- a/R/pkg/DESCRIPTION
+++ b/R/pkg/DESCRIPTION
@@ -1,6 +1,6 @@
 Package: SparkR
 Type: Package
-Version: 3.2.1
+Version: 3.2.2
 Title: R Front End for 'Apache Spark'
 Description: Provides an R Front end for 'Apache Spark' 
<https://spark.apache.org>.
 Authors@R: c(person("Shivaram", "Venkataraman", role = "aut",
diff --git a/assembly/pom.xml b/assembly/pom.xml
index a852011..9584884 100644
--- a/assembly/pom.xml
+++ b/assembly/pom.xml
@@ -21,7 +21,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.2.1
+3.2.2-SNAPSHOT
 ../pom.xml
   
 
diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml
index 11cf0cb..167e69f 100644
--- a/common/kvstore/pom.xml
+++ b/common/kvstore/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.2.1
+3.2.2-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml
index 9957a77..eaf1c1e 100644
--- a/common/network-common/pom.xml
+++ b/common/network-common/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.2.1
+3.2.2-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml
index b3ea287..811e503 100644
--- a/common/network-shuffle/pom.xml
+++ b/common/network-shuffle/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.2.1
+3.2.2-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml
index 8fb7d4e..23513f6 100644
--- a/common/network-yarn/pom.xml
+++ b/common/network-yarn/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.2.1
+3.2.2-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/sketch/pom.xml b/common/sketch/pom.xml
index 7e4c6c3..c5c6161 100644
--- a/common/sketch/pom.xml
+++ b/common/sketch/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.2.1
+3.2.2-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/tags

[spark] tag v3.2.1-rc2 created (now 4f25b3f)

2022-01-19 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a change to tag v3.2.1-rc2
in repository https://gitbox.apache.org/repos/asf/spark.git.


  at 4f25b3f  (commit)
This tag includes the following new commits:

 new 4f25b3f  Preparing Spark release v3.2.1-rc2

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] 01/01: Preparing Spark release v3.2.1-rc2

2022-01-19 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a commit to tag v3.2.1-rc2
in repository https://gitbox.apache.org/repos/asf/spark.git

commit 4f25b3f71238a00508a356591553f2dfa89f8290
Author: Huaxin Gao 
AuthorDate: Thu Jan 20 05:03:12 2022 +

Preparing Spark release v3.2.1-rc2
---
 R/pkg/DESCRIPTION  | 2 +-
 assembly/pom.xml   | 2 +-
 common/kvstore/pom.xml | 2 +-
 common/network-common/pom.xml  | 2 +-
 common/network-shuffle/pom.xml | 2 +-
 common/network-yarn/pom.xml| 2 +-
 common/sketch/pom.xml  | 2 +-
 common/tags/pom.xml| 2 +-
 common/unsafe/pom.xml  | 2 +-
 core/pom.xml   | 2 +-
 docs/_config.yml   | 6 +++---
 examples/pom.xml   | 2 +-
 external/avro/pom.xml  | 2 +-
 external/docker-integration-tests/pom.xml  | 2 +-
 external/kafka-0-10-assembly/pom.xml   | 2 +-
 external/kafka-0-10-sql/pom.xml| 2 +-
 external/kafka-0-10-token-provider/pom.xml | 2 +-
 external/kafka-0-10/pom.xml| 2 +-
 external/kinesis-asl-assembly/pom.xml  | 2 +-
 external/kinesis-asl/pom.xml   | 2 +-
 external/spark-ganglia-lgpl/pom.xml| 2 +-
 graphx/pom.xml | 2 +-
 hadoop-cloud/pom.xml   | 2 +-
 launcher/pom.xml   | 2 +-
 mllib-local/pom.xml| 2 +-
 mllib/pom.xml  | 2 +-
 pom.xml| 2 +-
 python/pyspark/version.py  | 2 +-
 repl/pom.xml   | 2 +-
 resource-managers/kubernetes/core/pom.xml  | 2 +-
 resource-managers/kubernetes/integration-tests/pom.xml | 2 +-
 resource-managers/mesos/pom.xml| 2 +-
 resource-managers/yarn/pom.xml | 2 +-
 sql/catalyst/pom.xml   | 2 +-
 sql/core/pom.xml   | 2 +-
 sql/hive-thriftserver/pom.xml  | 2 +-
 sql/hive/pom.xml   | 2 +-
 streaming/pom.xml  | 2 +-
 tools/pom.xml  | 2 +-
 39 files changed, 41 insertions(+), 41 deletions(-)

diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION
index 5590c86..2abad61 100644
--- a/R/pkg/DESCRIPTION
+++ b/R/pkg/DESCRIPTION
@@ -1,6 +1,6 @@
 Package: SparkR
 Type: Package
-Version: 3.2.2
+Version: 3.2.1
 Title: R Front End for 'Apache Spark'
 Description: Provides an R Front end for 'Apache Spark' 
<https://spark.apache.org>.
 Authors@R: c(person("Shivaram", "Venkataraman", role = "aut",
diff --git a/assembly/pom.xml b/assembly/pom.xml
index 9584884..a852011 100644
--- a/assembly/pom.xml
+++ b/assembly/pom.xml
@@ -21,7 +21,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.2.2-SNAPSHOT
+3.2.1
 ../pom.xml
   
 
diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml
index 167e69f..11cf0cb 100644
--- a/common/kvstore/pom.xml
+++ b/common/kvstore/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.2.2-SNAPSHOT
+3.2.1
 ../../pom.xml
   
 
diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml
index eaf1c1e..9957a77 100644
--- a/common/network-common/pom.xml
+++ b/common/network-common/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.2.2-SNAPSHOT
+3.2.1
 ../../pom.xml
   
 
diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml
index 811e503..b3ea287 100644
--- a/common/network-shuffle/pom.xml
+++ b/common/network-shuffle/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.2.2-SNAPSHOT
+3.2.1
 ../../pom.xml
   
 
diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml
index 23513f6..8fb7d4e 100644
--- a/common/network-yarn/pom.xml
+++ b/common/network-yarn/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.2.2-SNAPSHOT
+3.2.1
 ../../pom.xml
   
 
diff --git a/common/sketch/pom.xml b/common/sketch/pom.xml
index c5c6161..7e4c6c3 100644
--- a/common/sketch/pom.xml
+++ b/common/sketch/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.2.2-SNAPSHOT
+3.2.1
 ../../pom.xml
   
 
diff --git a/common/tags/pom.xml 

[spark] branch branch-3.2 updated: [SPARK-37959][ML] Fix the UT of checking norm in KMeans & BiKMeans

2022-01-19 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.2 by this push:
 new 5cf8108  [SPARK-37959][ML] Fix the UT of checking norm in KMeans & 
BiKMeans
5cf8108 is described below

commit 5cf810870073693f7ec2e1f2efe030567c973fb4
Author: Ruifeng Zheng 
AuthorDate: Wed Jan 19 09:17:25 2022 -0800

[SPARK-37959][ML] Fix the UT of checking norm in KMeans & BiKMeans

### What changes were proposed in this pull request?

In `KMeansSuite` and `BisectingKMeansSuite`, there are some unused lines:

```
model1.clusterCenters.forall(Vectors.norm(_, 2) == 1.0)
```

For cosine distance, the norm of each center vector should be 1, so the norm 
check is meaningful.

For Euclidean distance, the norm check is meaningless.

### Why are the changes needed?

To enable the norm check for cosine distance, and disable it for Euclidean 
distance.
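
As a standalone illustration of why the bare line is a no-op (a hedged sketch 
with made-up vectors; the suite itself uses the `~==`/`absTol` helpers shown in 
the diff below):

```scala
import org.apache.spark.ml.linalg.Vectors

val centers = Array(Vectors.dense(0.6, 0.8), Vectors.dense(1.0, 0.0))

// Old form: the Boolean result is computed and silently discarded,
// so the test can never fail because of it.
centers.forall(v => Vectors.norm(v, 2) == 1.0)

// Wrapping it in assert (here with a plain tolerance check) turns it into
// an actual test condition.
assert(centers.forall(v => math.abs(Vectors.norm(v, 2) - 1.0) < 1e-6))
```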

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Updated test suites.

Closes #35247 from zhengruifeng/fix_kmeans_ut.

Authored-by: Ruifeng Zheng 
Signed-off-by: huaxingao 
(cherry picked from commit 789fce8c8b200eba5f94c2d83b4b83e3bfb9a2b1)
Signed-off-by: huaxingao 
---
 .../apache/spark/ml/clustering/BisectingKMeansSuite.scala  | 10 +++---
 .../scala/org/apache/spark/ml/clustering/KMeansSuite.scala | 14 +++---
 2 files changed, 6 insertions(+), 18 deletions(-)

diff --git 
a/mllib/src/test/scala/org/apache/spark/ml/clustering/BisectingKMeansSuite.scala
 
b/mllib/src/test/scala/org/apache/spark/ml/clustering/BisectingKMeansSuite.scala
index 04b20d1..fb6110d 100644
--- 
a/mllib/src/test/scala/org/apache/spark/ml/clustering/BisectingKMeansSuite.scala
+++ 
b/mllib/src/test/scala/org/apache/spark/ml/clustering/BisectingKMeansSuite.scala
@@ -186,7 +186,7 @@ class BisectingKMeansSuite extends MLTest with 
DefaultReadWriteTest {
 assert(predictionsMap(Vectors.dense(-1.0, 1.0)) ==
   predictionsMap(Vectors.dense(-100.0, 90.0)))
 
-model.clusterCenters.forall(Vectors.norm(_, 2) == 1.0)
+assert(model.clusterCenters.forall(Vectors.norm(_, 2) ~== 1.0 absTol 1e-6))
   }
 
   test("Comparing with and without weightCol with cosine distance") {
@@ -217,7 +217,7 @@ class BisectingKMeansSuite extends MLTest with 
DefaultReadWriteTest {
 assert(predictionsMap1(Vectors.dense(-1.0, 1.0)) ==
   predictionsMap1(Vectors.dense(-100.0, 90.0)))
 
-model1.clusterCenters.forall(Vectors.norm(_, 2) == 1.0)
+assert(model1.clusterCenters.forall(Vectors.norm(_, 2) ~== 1.0 absTol 
1e-6))
 
 val df2 = spark.createDataFrame(spark.sparkContext.parallelize(Seq(
   (Vectors.dense(1.0, 1.0), 2.0), (Vectors.dense(10.0, 10.0), 2.0),
@@ -244,7 +244,7 @@ class BisectingKMeansSuite extends MLTest with 
DefaultReadWriteTest {
 assert(predictionsMap2(Vectors.dense(-1.0, 1.0)) ==
   predictionsMap2(Vectors.dense(-100.0, 90.0)))
 
-model2.clusterCenters.forall(Vectors.norm(_, 2) == 1.0)
+assert(model2.clusterCenters.forall(Vectors.norm(_, 2) ~== 1.0 absTol 
1e-6))
 assert(model1.clusterCenters === model2.clusterCenters)
   }
 
@@ -284,8 +284,6 @@ class BisectingKMeansSuite extends MLTest with 
DefaultReadWriteTest {
 assert(predictionsMap1(Vectors.dense(10.0, 10.0)) ==
   predictionsMap1(Vectors.dense(10.0, 4.4)))
 
-model1.clusterCenters.forall(Vectors.norm(_, 2) == 1.0)
-
 val df2 = spark.createDataFrame(spark.sparkContext.parallelize(Seq(
   (Vectors.dense(1.0, 1.0), 1.0), (Vectors.dense(10.0, 10.0), 2.0),
   (Vectors.dense(1.0, 0.5), 2.0), (Vectors.dense(10.0, 4.4), 3.0),
@@ -310,8 +308,6 @@ class BisectingKMeansSuite extends MLTest with 
DefaultReadWriteTest {
 assert(predictionsMap2(Vectors.dense(10.0, 10.0)) ==
   predictionsMap2(Vectors.dense(10.0, 4.4)))
 
-model2.clusterCenters.forall(Vectors.norm(_, 2) == 1.0)
-
 assert(model1.clusterCenters(0) === model2.clusterCenters(0))
 assert(model1.clusterCenters(1) === model2.clusterCenters(1))
 assert(model1.clusterCenters(2) ~== model2.clusterCenters(2) absTol 1e-6)
diff --git 
a/mllib/src/test/scala/org/apache/spark/ml/clustering/KMeansSuite.scala 
b/mllib/src/test/scala/org/apache/spark/ml/clustering/KMeansSuite.scala
index 61f4359..7d2a0b8 100644
--- a/mllib/src/test/scala/org/apache/spark/ml/clustering/KMeansSuite.scala
+++ b/mllib/src/test/scala/org/apache/spark/ml/clustering/KMeansSuite.scala
@@ -186,7 +186,7 @@ class KMeansSuite extends MLTest with DefaultReadWriteTest 
with PMMLReadWriteTes
 assert(predictionsMap(Vectors.dense(-1.0, 1.0)) ==
   predictionsMap(Vectors.dense(-100.0, 90.0)))
 
-model.clusterCenters.forall(Vectors.norm(_, 2) == 1.0)
+assert(model.clusterCenters.forall

[spark] branch master updated: [SPARK-37959][ML] Fix the UT of checking norm in KMeans & BiKMeans

2022-01-19 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 789fce8  [SPARK-37959][ML] Fix the UT of checking norm in KMeans & 
BiKMeans
789fce8 is described below

commit 789fce8c8b200eba5f94c2d83b4b83e3bfb9a2b1
Author: Ruifeng Zheng 
AuthorDate: Wed Jan 19 09:17:25 2022 -0800

[SPARK-37959][ML] Fix the UT of checking norm in KMeans & BiKMeans

### What changes were proposed in this pull request?

In `KMeansSuite` and `BisectingKMeansSuite`, there are some unused lines:

```
model1.clusterCenters.forall(Vectors.norm(_, 2) == 1.0)
```

For cosine distance, the norm of each center vector should be 1, so the norm 
check is meaningful.

For Euclidean distance, the norm check is meaningless.

### Why are the changes needed?

To enable the norm check for cosine distance, and disable it for Euclidean 
distance.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Updated test suites.

Closes #35247 from zhengruifeng/fix_kmeans_ut.

Authored-by: Ruifeng Zheng 
Signed-off-by: huaxingao 
---
 .../apache/spark/ml/clustering/BisectingKMeansSuite.scala  | 10 +++---
 .../scala/org/apache/spark/ml/clustering/KMeansSuite.scala | 14 +++---
 2 files changed, 6 insertions(+), 18 deletions(-)

diff --git 
a/mllib/src/test/scala/org/apache/spark/ml/clustering/BisectingKMeansSuite.scala
 
b/mllib/src/test/scala/org/apache/spark/ml/clustering/BisectingKMeansSuite.scala
index 04b20d1..fb6110d 100644
--- 
a/mllib/src/test/scala/org/apache/spark/ml/clustering/BisectingKMeansSuite.scala
+++ 
b/mllib/src/test/scala/org/apache/spark/ml/clustering/BisectingKMeansSuite.scala
@@ -186,7 +186,7 @@ class BisectingKMeansSuite extends MLTest with 
DefaultReadWriteTest {
 assert(predictionsMap(Vectors.dense(-1.0, 1.0)) ==
   predictionsMap(Vectors.dense(-100.0, 90.0)))
 
-model.clusterCenters.forall(Vectors.norm(_, 2) == 1.0)
+assert(model.clusterCenters.forall(Vectors.norm(_, 2) ~== 1.0 absTol 1e-6))
   }
 
   test("Comparing with and without weightCol with cosine distance") {
@@ -217,7 +217,7 @@ class BisectingKMeansSuite extends MLTest with 
DefaultReadWriteTest {
 assert(predictionsMap1(Vectors.dense(-1.0, 1.0)) ==
   predictionsMap1(Vectors.dense(-100.0, 90.0)))
 
-model1.clusterCenters.forall(Vectors.norm(_, 2) == 1.0)
+assert(model1.clusterCenters.forall(Vectors.norm(_, 2) ~== 1.0 absTol 
1e-6))
 
 val df2 = spark.createDataFrame(spark.sparkContext.parallelize(Seq(
   (Vectors.dense(1.0, 1.0), 2.0), (Vectors.dense(10.0, 10.0), 2.0),
@@ -244,7 +244,7 @@ class BisectingKMeansSuite extends MLTest with 
DefaultReadWriteTest {
 assert(predictionsMap2(Vectors.dense(-1.0, 1.0)) ==
   predictionsMap2(Vectors.dense(-100.0, 90.0)))
 
-model2.clusterCenters.forall(Vectors.norm(_, 2) == 1.0)
+assert(model2.clusterCenters.forall(Vectors.norm(_, 2) ~== 1.0 absTol 
1e-6))
 assert(model1.clusterCenters === model2.clusterCenters)
   }
 
@@ -284,8 +284,6 @@ class BisectingKMeansSuite extends MLTest with 
DefaultReadWriteTest {
 assert(predictionsMap1(Vectors.dense(10.0, 10.0)) ==
   predictionsMap1(Vectors.dense(10.0, 4.4)))
 
-model1.clusterCenters.forall(Vectors.norm(_, 2) == 1.0)
-
 val df2 = spark.createDataFrame(spark.sparkContext.parallelize(Seq(
   (Vectors.dense(1.0, 1.0), 1.0), (Vectors.dense(10.0, 10.0), 2.0),
   (Vectors.dense(1.0, 0.5), 2.0), (Vectors.dense(10.0, 4.4), 3.0),
@@ -310,8 +308,6 @@ class BisectingKMeansSuite extends MLTest with 
DefaultReadWriteTest {
 assert(predictionsMap2(Vectors.dense(10.0, 10.0)) ==
   predictionsMap2(Vectors.dense(10.0, 4.4)))
 
-model2.clusterCenters.forall(Vectors.norm(_, 2) == 1.0)
-
 assert(model1.clusterCenters(0) === model2.clusterCenters(0))
 assert(model1.clusterCenters(1) === model2.clusterCenters(1))
 assert(model1.clusterCenters(2) ~== model2.clusterCenters(2) absTol 1e-6)
diff --git 
a/mllib/src/test/scala/org/apache/spark/ml/clustering/KMeansSuite.scala 
b/mllib/src/test/scala/org/apache/spark/ml/clustering/KMeansSuite.scala
index 61f4359..7d2a0b8 100644
--- a/mllib/src/test/scala/org/apache/spark/ml/clustering/KMeansSuite.scala
+++ b/mllib/src/test/scala/org/apache/spark/ml/clustering/KMeansSuite.scala
@@ -186,7 +186,7 @@ class KMeansSuite extends MLTest with DefaultReadWriteTest 
with PMMLReadWriteTes
 assert(predictionsMap(Vectors.dense(-1.0, 1.0)) ==
   predictionsMap(Vectors.dense(-100.0, 90.0)))
 
-model.clusterCenters.forall(Vectors.norm(_, 2) == 1.0)
+assert(model.clusterCenters.forall(Vectors.norm(_, 2) ~== 1.0 absTol 1e-6))
   }
 
   test("KMeans with cosine distance is not support

svn commit: r52092 - in /dev/spark/v3.2.1-rc2-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _site/api/java/org/apache/parqu

2022-01-15 Thread huaxingao
Author: huaxingao
Date: Sat Jan 15 08:52:14 2022
New Revision: 52092

Log:
Apache Spark v3.2.1-rc2 docs


[This commit notification would consist of 2355 parts, 
which exceeds the limit of 50 ones, so it was shortened to the summary.]

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



svn commit: r52091 - /dev/spark/v3.2.1-rc2-bin/

2022-01-14 Thread huaxingao
Author: huaxingao
Date: Sat Jan 15 07:23:31 2022
New Revision: 52091

Log:
Apache Spark v3.2.1-rc2

Added:
dev/spark/v3.2.1-rc2-bin/
dev/spark/v3.2.1-rc2-bin/SparkR_3.2.1.tar.gz   (with props)
dev/spark/v3.2.1-rc2-bin/SparkR_3.2.1.tar.gz.asc
dev/spark/v3.2.1-rc2-bin/SparkR_3.2.1.tar.gz.sha512
dev/spark/v3.2.1-rc2-bin/pyspark-3.2.1.tar.gz   (with props)
dev/spark/v3.2.1-rc2-bin/pyspark-3.2.1.tar.gz.asc
dev/spark/v3.2.1-rc2-bin/pyspark-3.2.1.tar.gz.sha512
dev/spark/v3.2.1-rc2-bin/spark-3.2.1-bin-hadoop2.7.tgz   (with props)
dev/spark/v3.2.1-rc2-bin/spark-3.2.1-bin-hadoop2.7.tgz.asc
dev/spark/v3.2.1-rc2-bin/spark-3.2.1-bin-hadoop2.7.tgz.sha512
dev/spark/v3.2.1-rc2-bin/spark-3.2.1-bin-hadoop3.2-scala2.13.tgz   (with 
props)
dev/spark/v3.2.1-rc2-bin/spark-3.2.1-bin-hadoop3.2-scala2.13.tgz.asc
dev/spark/v3.2.1-rc2-bin/spark-3.2.1-bin-hadoop3.2-scala2.13.tgz.sha512
dev/spark/v3.2.1-rc2-bin/spark-3.2.1-bin-hadoop3.2.tgz   (with props)
dev/spark/v3.2.1-rc2-bin/spark-3.2.1-bin-hadoop3.2.tgz.asc
dev/spark/v3.2.1-rc2-bin/spark-3.2.1-bin-hadoop3.2.tgz.sha512
dev/spark/v3.2.1-rc2-bin/spark-3.2.1-bin-without-hadoop.tgz   (with props)
dev/spark/v3.2.1-rc2-bin/spark-3.2.1-bin-without-hadoop.tgz.asc
dev/spark/v3.2.1-rc2-bin/spark-3.2.1-bin-without-hadoop.tgz.sha512
dev/spark/v3.2.1-rc2-bin/spark-3.2.1.tgz   (with props)
dev/spark/v3.2.1-rc2-bin/spark-3.2.1.tgz.asc
dev/spark/v3.2.1-rc2-bin/spark-3.2.1.tgz.sha512

Added: dev/spark/v3.2.1-rc2-bin/SparkR_3.2.1.tar.gz
==
Binary file - no diff available.

Propchange: dev/spark/v3.2.1-rc2-bin/SparkR_3.2.1.tar.gz
--
svn:mime-type = application/octet-stream

Added: dev/spark/v3.2.1-rc2-bin/SparkR_3.2.1.tar.gz.asc
==
--- dev/spark/v3.2.1-rc2-bin/SparkR_3.2.1.tar.gz.asc (added)
+++ dev/spark/v3.2.1-rc2-bin/SparkR_3.2.1.tar.gz.asc Sat Jan 15 07:23:31 2022
@@ -0,0 +1,16 @@
+-BEGIN PGP SIGNATURE-
+
+iQIzBAABCgAdFiEEzqiIvbMtmDx/CUVkrAHm6ROfYQwFAmHiYk0ACgkQrAHm6ROf
+YQxwgxAAp4VhD5G8WItw3bw2/IcpBVXtCS+6l8Qx9byNmTHLPeVFK/d4jVT7eshH
+Z8hJS3iU/MOZQvYdXsIfBKP+SffQjfd2m8/g6xpFp+wD1wLDaIRiJfgsLnbJ2f86
+wbzDGrHUVWDjDguEplKaHvIFNV6HulJkv2oZdCXWfDrSOJ8LB+injGH4GlQrsgJX
+3CYsu0RjrW7InTAhryr8ylVtyt+HzJv3C1sM778WCSpeUV0Pc6DxnhWV+VaBYRVl
+BZuL3V5FwWpq5vKfvo83QP0GLdHNgzivUt4zn8JWWA/7QYVqpKbcpi+kCEJCMGcq
+YCSkMrumh38Cg++lAsOkHBDQi0E3TlGQjvWCnKL3Ag7cH8AyU9TP2GrmcqsHM87S
+gWwWvldQszw2kUDgIAIYdZTtOoFcwwRJHvUcdnYmqV8m45fTG3XAXqLx7kNkI589
+r3PWrZ0NAGTqfKVKwmHjCeKzKCEYEfJZNxKBzyHE31/ZYnRNI78o6W6dDQTqq9+J
+YeXFArYlQlkIETLewG9IOi8H682RfKSokylGpHXuBf3r7/7XFe7zrBjI2Kcgm59S
+ST0WvoTEsl/ZuQvFYoUFSdf0Voj33c9oFdpIKv3N4r4m31l1MrDva1muGpkhI9WP
+HywnuvMB6yUI3375byD81lAMM/B7Y9wsbLy4s1zyFzslq9Jwg00=
+=Aw0i
+-END PGP SIGNATURE-

Added: dev/spark/v3.2.1-rc2-bin/SparkR_3.2.1.tar.gz.sha512
==
--- dev/spark/v3.2.1-rc2-bin/SparkR_3.2.1.tar.gz.sha512 (added)
+++ dev/spark/v3.2.1-rc2-bin/SparkR_3.2.1.tar.gz.sha512 Sat Jan 15 07:23:31 2022
@@ -0,0 +1,3 @@
+SparkR_3.2.1.tar.gz: 57418EC8 C2A116A7 97536A36 57FC2AF0 A319BDF4 91F62E9D
+ 9A4CB9BE C64292C2 BB284731 FC819CB9 0CF7EA7B FDC55F4B
+ 97F9E418 FE737CAF AB2984AD 3979F784

Added: dev/spark/v3.2.1-rc2-bin/pyspark-3.2.1.tar.gz
==
Binary file - no diff available.

Propchange: dev/spark/v3.2.1-rc2-bin/pyspark-3.2.1.tar.gz
--
svn:mime-type = application/octet-stream

Added: dev/spark/v3.2.1-rc2-bin/pyspark-3.2.1.tar.gz.asc
==
--- dev/spark/v3.2.1-rc2-bin/pyspark-3.2.1.tar.gz.asc (added)
+++ dev/spark/v3.2.1-rc2-bin/pyspark-3.2.1.tar.gz.asc Sat Jan 15 07:23:31 2022
@@ -0,0 +1,16 @@
+-BEGIN PGP SIGNATURE-
+
+iQIzBAABCgAdFiEEzqiIvbMtmDx/CUVkrAHm6ROfYQwFAmHiYlEACgkQrAHm6ROf
+YQzhGw//ZEtDVBcPqfyEsAvdzDU1RAYtAWK8by/NIs2jccpZV5L+1uCdtOK9QT1w
+BN5RJdWEY9+gNkQrSA01sJmVuCxwVr6Z0xSedTboOHn9myv4+YoPznXrSV3WLDsK
+sJLVSgYLssCiai/5ALOP4uy1ZgFFcc2USHMytMY0FXeziz07RMYH2n+D6+67e6Hb
+ZfuvzptpONUny9Kp8Ilr+JnZiPYq7hi1It5QrSn3Zf2Z8mKUtFhqO0DUTdqFfIst
+LrlFwiuBBec6tBfN7y8vi9ZN2cMChxjGTid+8riIhztwI9MaRo7zoXEJkljZi1AC
+KtVL1GHvqq2dTYBp87MViOcoJ1RQ6bzz+83b6CPntYYh1qtSTQ6Y0avaxQ4ne/ZX
+ao7xRkTeHT6bBjPHP+g0F5Guvadl/B2oajlqVrhCQAw2gdOHmxQ03jFh+kls8lVV
+hTPpG3D6WimsFF49lgI0FwjaqSDuFcLLt8LlmupbnBQb8/10MB/bQ1U8Y7F1Uptk
+ypa/DBUSvFQ6Af5R4+rPwJ3glw9ABIwIwI+QDxwqPe2wGwHtywO86f5nt2I1Hiux
+P6s9zvp5La6kXGK/xGkOMdxIzQl/tkHuGlHxLJDPDtJzU++mBPPoqf0RLTffkg2U
+mBN68COBynJ80abHSepBsxQ2ha9aosfH66p51AkDVixSGCFufXw=
+=Xh7N
+-END PGP SIGNATURE

[spark] 01/01: Preparing development version 3.2.2-SNAPSHOT

2022-01-14 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git

commit 75ea6457471cf3df55e7641c4dc6ba66b03e733b
Author: Huaxin Gao 
AuthorDate: Sat Jan 15 01:24:15 2022 +

Preparing development version 3.2.2-SNAPSHOT
---
 R/pkg/DESCRIPTION  | 2 +-
 assembly/pom.xml   | 2 +-
 common/kvstore/pom.xml | 2 +-
 common/network-common/pom.xml  | 2 +-
 common/network-shuffle/pom.xml | 2 +-
 common/network-yarn/pom.xml| 2 +-
 common/sketch/pom.xml  | 2 +-
 common/tags/pom.xml| 2 +-
 common/unsafe/pom.xml  | 2 +-
 core/pom.xml   | 2 +-
 docs/_config.yml   | 6 +++---
 examples/pom.xml   | 2 +-
 external/avro/pom.xml  | 2 +-
 external/docker-integration-tests/pom.xml  | 2 +-
 external/kafka-0-10-assembly/pom.xml   | 2 +-
 external/kafka-0-10-sql/pom.xml| 2 +-
 external/kafka-0-10-token-provider/pom.xml | 2 +-
 external/kafka-0-10/pom.xml| 2 +-
 external/kinesis-asl-assembly/pom.xml  | 2 +-
 external/kinesis-asl/pom.xml   | 2 +-
 external/spark-ganglia-lgpl/pom.xml| 2 +-
 graphx/pom.xml | 2 +-
 hadoop-cloud/pom.xml   | 2 +-
 launcher/pom.xml   | 2 +-
 mllib-local/pom.xml| 2 +-
 mllib/pom.xml  | 2 +-
 pom.xml| 2 +-
 python/pyspark/version.py  | 2 +-
 repl/pom.xml   | 2 +-
 resource-managers/kubernetes/core/pom.xml  | 2 +-
 resource-managers/kubernetes/integration-tests/pom.xml | 2 +-
 resource-managers/mesos/pom.xml| 2 +-
 resource-managers/yarn/pom.xml | 2 +-
 sql/catalyst/pom.xml   | 2 +-
 sql/core/pom.xml   | 2 +-
 sql/hive-thriftserver/pom.xml  | 2 +-
 sql/hive/pom.xml   | 2 +-
 streaming/pom.xml  | 2 +-
 tools/pom.xml  | 2 +-
 39 files changed, 41 insertions(+), 41 deletions(-)

diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION
index 2abad61..5590c86 100644
--- a/R/pkg/DESCRIPTION
+++ b/R/pkg/DESCRIPTION
@@ -1,6 +1,6 @@
 Package: SparkR
 Type: Package
-Version: 3.2.1
+Version: 3.2.2
 Title: R Front End for 'Apache Spark'
 Description: Provides an R Front end for 'Apache Spark' 
<https://spark.apache.org>.
 Authors@R: c(person("Shivaram", "Venkataraman", role = "aut",
diff --git a/assembly/pom.xml b/assembly/pom.xml
index a852011..9584884 100644
--- a/assembly/pom.xml
+++ b/assembly/pom.xml
@@ -21,7 +21,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.2.1
+3.2.2-SNAPSHOT
 ../pom.xml
   
 
diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml
index 11cf0cb..167e69f 100644
--- a/common/kvstore/pom.xml
+++ b/common/kvstore/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.2.1
+3.2.2-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml
index 9957a77..eaf1c1e 100644
--- a/common/network-common/pom.xml
+++ b/common/network-common/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.2.1
+3.2.2-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml
index b3ea287..811e503 100644
--- a/common/network-shuffle/pom.xml
+++ b/common/network-shuffle/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.2.1
+3.2.2-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml
index 8fb7d4e..23513f6 100644
--- a/common/network-yarn/pom.xml
+++ b/common/network-yarn/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.2.1
+3.2.2-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/sketch/pom.xml b/common/sketch/pom.xml
index 7e4c6c3..c5c6161 100644
--- a/common/sketch/pom.xml
+++ b/common/sketch/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.2.1
+3.2.2-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/tags

[spark] branch branch-3.2 updated (66e1231 -> 75ea645)

2022-01-14 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a change to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 66e1231  [SPARK-37859][SQL] Do not check for metadata during schema 
comparison
 add ea8ce99  Preparing Spark release v3.2.1-rc2
 new 75ea645  Preparing development version 3.2.2-SNAPSHOT

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] tag v3.2.1-rc2 created (now ea8ce99)

2022-01-14 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a change to tag v3.2.1-rc2
in repository https://gitbox.apache.org/repos/asf/spark.git.


  at ea8ce99  (commit)
This tag includes the following new commits:

 new ea8ce99  Preparing Spark release v3.2.1-rc2

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] 01/01: Preparing Spark release v3.2.1-rc2

2022-01-14 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a commit to tag v3.2.1-rc2
in repository https://gitbox.apache.org/repos/asf/spark.git

commit ea8ce995b7651bda7ef8c2e89c39b92f17e55402
Author: Huaxin Gao 
AuthorDate: Sat Jan 15 01:24:08 2022 +

Preparing Spark release v3.2.1-rc2
---
 R/pkg/DESCRIPTION  | 2 +-
 assembly/pom.xml   | 2 +-
 common/kvstore/pom.xml | 2 +-
 common/network-common/pom.xml  | 2 +-
 common/network-shuffle/pom.xml | 2 +-
 common/network-yarn/pom.xml| 2 +-
 common/sketch/pom.xml  | 2 +-
 common/tags/pom.xml| 2 +-
 common/unsafe/pom.xml  | 2 +-
 core/pom.xml   | 2 +-
 docs/_config.yml   | 6 +++---
 examples/pom.xml   | 2 +-
 external/avro/pom.xml  | 2 +-
 external/docker-integration-tests/pom.xml  | 2 +-
 external/kafka-0-10-assembly/pom.xml   | 2 +-
 external/kafka-0-10-sql/pom.xml| 2 +-
 external/kafka-0-10-token-provider/pom.xml | 2 +-
 external/kafka-0-10/pom.xml| 2 +-
 external/kinesis-asl-assembly/pom.xml  | 2 +-
 external/kinesis-asl/pom.xml   | 2 +-
 external/spark-ganglia-lgpl/pom.xml| 2 +-
 graphx/pom.xml | 2 +-
 hadoop-cloud/pom.xml   | 2 +-
 launcher/pom.xml   | 2 +-
 mllib-local/pom.xml| 2 +-
 mllib/pom.xml  | 2 +-
 pom.xml| 2 +-
 python/pyspark/version.py  | 2 +-
 repl/pom.xml   | 2 +-
 resource-managers/kubernetes/core/pom.xml  | 2 +-
 resource-managers/kubernetes/integration-tests/pom.xml | 2 +-
 resource-managers/mesos/pom.xml| 2 +-
 resource-managers/yarn/pom.xml | 2 +-
 sql/catalyst/pom.xml   | 2 +-
 sql/core/pom.xml   | 2 +-
 sql/hive-thriftserver/pom.xml  | 2 +-
 sql/hive/pom.xml   | 2 +-
 streaming/pom.xml  | 2 +-
 tools/pom.xml  | 2 +-
 39 files changed, 41 insertions(+), 41 deletions(-)

diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION
index 5590c86..2abad61 100644
--- a/R/pkg/DESCRIPTION
+++ b/R/pkg/DESCRIPTION
@@ -1,6 +1,6 @@
 Package: SparkR
 Type: Package
-Version: 3.2.2
+Version: 3.2.1
 Title: R Front End for 'Apache Spark'
 Description: Provides an R Front end for 'Apache Spark' 
<https://spark.apache.org>.
 Authors@R: c(person("Shivaram", "Venkataraman", role = "aut",
diff --git a/assembly/pom.xml b/assembly/pom.xml
index 9584884..a852011 100644
--- a/assembly/pom.xml
+++ b/assembly/pom.xml
@@ -21,7 +21,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.2.2-SNAPSHOT
+3.2.1
 ../pom.xml
   
 
diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml
index 167e69f..11cf0cb 100644
--- a/common/kvstore/pom.xml
+++ b/common/kvstore/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.2.2-SNAPSHOT
+3.2.1
 ../../pom.xml
   
 
diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml
index eaf1c1e..9957a77 100644
--- a/common/network-common/pom.xml
+++ b/common/network-common/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.2.2-SNAPSHOT
+3.2.1
 ../../pom.xml
   
 
diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml
index 811e503..b3ea287 100644
--- a/common/network-shuffle/pom.xml
+++ b/common/network-shuffle/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.2.2-SNAPSHOT
+3.2.1
 ../../pom.xml
   
 
diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml
index 23513f6..8fb7d4e 100644
--- a/common/network-yarn/pom.xml
+++ b/common/network-yarn/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.2.2-SNAPSHOT
+3.2.1
 ../../pom.xml
   
 
diff --git a/common/sketch/pom.xml b/common/sketch/pom.xml
index c5c6161..7e4c6c3 100644
--- a/common/sketch/pom.xml
+++ b/common/sketch/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.2.2-SNAPSHOT
+3.2.1
 ../../pom.xml
   
 
diff --git a/common/tags/pom.xml 

svn commit: r51967 - /dev/spark/KEYS

2022-01-09 Thread huaxingao
Author: huaxingao
Date: Mon Jan 10 05:55:45 2022
New Revision: 51967

Log:
Update KEYS

Modified:
dev/spark/KEYS

Modified: dev/spark/KEYS
==
--- dev/spark/KEYS (original)
+++ dev/spark/KEYS Mon Jan 10 05:55:45 2022
@@ -1679,3 +1679,60 @@ atzKlpZxTel4xO9ZPRdngxTrtAxbcOY4C9R017/q
 KSJjUZyL1f+EufpF7lRzqRVVRzc=
 =kgaF
 -END PGP PUBLIC KEY BLOCK-
+
+pub   rsa4096 2021-12-07 [SC]
+  CEA888BDB32D983C7F094564AC01E6E9139F610C
+uid   [ultimate] Huaxin Gao (CODE SIGNING KEY) 
+sub   rsa4096 2021-12-07 [E]
+-BEGIN PGP PUBLIC KEY BLOCK-
+
+mQINBGGutG0BEADV+VY+DciBLfD1iZDrDKs/hND4K4q9rE7qHgXoWdzF2JlvbSmn
+EM26aTySuvsH8Y02a/g/GwAmHVyjSOHd69/kdvtzUS04W3yBToZbS9ZZ1M4NXVe5
+Apl5WlfF5CSW28CcbB8X67YDAkjc3qAviSWhGYn+V19wUx5gBE3QhmhPgGvnTpzw
+je7TmtU6HMbfI+Nt2gNyQ5YWMFqIgKBH70F+cvy5Cs4mEJ8llLRqt600vOPLITCd
+Wi9SpyEcftxWyTopfxuMDiuyw7quKsx5pfnOMbaGqN9YpCmK1/KuYkIXOS0i84Nr
+1iNCZJjRxt/inPRH9kZZtRzTpr5MEmYooE5sfwZUGSo+EI+4eQV950p8x9eUsrx1
+X7BiEyDjTnBJU1qSr0f+CvTgjhcCBGMH2eV+r2/Vl/u+WzRfXOiBdh5EUQd5BW9+
+3zB8YwHp7cFFNhD/oF1IPWPhEiqEs+KsNYbKcqkjipakAyu/SQppTXCLgLFf2fGT
+fa57S/uablQfsIL0Em3pl+mkpidxZ0st/ZhBtFBjVQ8vCnrYIKuswUd6XMI85kEt
+YdaUYqaT+riLXX96SdTLiq4IGJypo1ERgF7epYWTH7XCIO1IZ8K/HoK2+wiOc3jA
++6ydOHAxruncBl8glM+Ffi6c/g0cULYBxJV010rm7L5NyUl9iktkXtl6EwARAQAB
+tDZIdWF4aW4gR2FvIChDT0RFIFNJR05JTkcgS0VZKSA8aHVheGluLmdhbzExQGdt
+YWlsLmNvbT6JAlIEEwEIADwWIQTOqIi9sy2YPH8JRWSsAebpE59hDAUCYa60bQIb
+AwULCQgHAgMiAgEGFQoJCAsCBBYCAwECHgcCF4AACgkQrAHm6ROfYQwMPxAApjub
+YoZK9/2Y7XlbWwIRDkcXA2ktMGlka/gISBfOw0aXkjeRTwuq7fG6YwK4BRlsuZVF
+ALtGRvNiz+UMsPemR/NRaCQY+z4onIvwMbotQ+4ow6vmxZMPhyeCkhL50NPWX2M7
+XkZWRm1r4P9+jJaQiqL6XKcfUb8W9bK6xQ9+SABEh7Nwp8vf8+A9Ab8jXMjYqhmj
+yAITsBW7y8xCdJ26xhWNIQbTwnoKsT6X5pDD/mQpXvTnqRXK1//IO0c5jHtswKgx
+qEe3nMM4GFFaCLghI5DBoXKJPTgIdb+XEyaBJzuw3tI+ZClxi7P82GOE85m/xiwh
+KO3VDpInp81cnHB1aDNh4QLd2F89KYsNUbWlnA6lgLJA+T4Ljg8A4ps5jf0VSP1Y
+KJ/G4C999WD03EZzi1XbIdN2JujsdLpvJzxkyL0civaKbYD/Rn/cWuvQ9JMR3hna
+0qE2w5NSsxRTvCt2svo/8KSr09fUZvqakkUhJWd5q/TJd6ysgXZ1qIeLqy4zkilp
+sopHYFPfsuccVze7wblCIkZPT6bXK9cLBKddiaSCX8iIP57xDrktutrtTGmKmkf/
+9BPHYVK3sM4yiWFkmBn8gyFT52wTY1Hoq8k4SIsA1uG14EK8OYZsGudOAP3sGjZr
+5K+1EWxFk2E976IryZ/jqI5wArbYyeJL+w9nIUa5Ag0EYa60bQEQAKvEMF4C/MR8
+X5YPkWFVaiQJL7WW4Rxc9dMV1wzUm7xexWvSnkglN1QtVJ9MFUoBUJsCBtcS5GaJ
+u4Vv9W/gA9nkk3EG5h8AMIa9XrPQvv1VudLx8I4VAI33XW83bDCxh0jo4Fq9TZt1
+Wa/jcbIPxIiV1Z18VXaaMgS/N/SL+zO2IuMsj1mctlZ2AvR6e2j3M30l4ZfbJ8fO
+PvyG9FiPiVCikmoI92eOFl06AfEQTrCbwsB1/i5ugKZleHalS46tynkCgzUtxJCk
+z6q1xgJtbF164lL8TCPHzTr3bfEZCAw0LgJuRTHK/qloPGVcCheYnaijeMExtYY2
+Q/VM36q5adDegEZOhGIzcJ9mbTDpl8euvRuvAAn5bQOqO/v0aKE00sNlfIGiU6wu
+B0K3QtxHgO48lhZ4agU489fyPGNBrHR+/goKPSNthMa+Pp0/B3FGdG5US9BiIe1O
+cLTllvofeEwjrlqetZna0687peImiu3FWBG5JUzXTFjfEckXqsxsMcQe8715y8tz
+unW8fEmHkGxK9vRljFTy15ug2cgAIdl8WF9h6zReKyVmvQPaROhN6+H0CIanDnzB
+xt3hhfIhY7LP0E1baCrVxPWslugcEVTO93mmzRFEgV599BojbO7XUr7nziMNPYJL
+/WwGnQHQ9wMTIE4iBK5mtbvTKTxzq0kvABEBAAGJAjYEGAEIACAWIQTOqIi9sy2Y
+PH8JRWSsAebpE59hDAUCYa60bQIbDAAKCRCsAebpE59hDINOD/9muHui2A2BgiO0
+PE4cpzLw0AvHvilFF1Cnd6pwy9SyXKUXCHBAKbo3Z0PRBXw24BgwUiAsbFakPc60
+cD8IgcGKyvDeFNat1cYtIzw+ZFFtLdedzlUbaAnMCB+c7CncKhjNfPxJl7AgNn6r
+bG7kQ0n1By7VMEcN7x9jpg5b5IzWi7nOWbPL1XTTg5f8JknB63eWFqvjivdCL08m
+uTIR76frsvnlkhdxgnBvdAw/iPc43/EAM1IPvsm45Bpa7kZShU+HslLT5BXg2f3x
+/BGCwX2DI4Aoww452iwqYlKbZES8bVROk1BFmaRSzqjz8qN3PRbd1rNJK1IgzmlZ
+LFLHhOnCwTO79/cNn3u47he6h1PPvBZsacWGlCHJlXYi51z8Wdq8k31LaczNpk1u
+MBR4ngBnW7QqXVE4LlSqISBczpTaYuTTvx93d5SjBLi8woWTZHH4GAyRTuFXK3lu
+DR+P2FH3Gqvb1dRbmh5R4w1WuuepzU46rANYthRDaiaGTn5npkplEzMr3fscACDU
+q52TUoULJ0ztpnejklwULzpyD8QzR/TKKjdjpKepX5ykIcRuhsriJ3CiVTnXBfLA
+QzSNDZqk1/XFpio3lgqBgt6UuZyJfb24mnMUvghiYBxPhf2AT2XR9YVYlQJmZ3YN
+NStvKoQzc2ERXMW50A6DhyYI1UGQVw==
+=rMlG
+-END PGP PUBLIC KEY BLOCK-



-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



svn commit: r51966 - in /dev/spark/v3.2.1-rc1-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _site/api/java/org/apache/parqu

2022-01-09 Thread huaxingao
Author: huaxingao
Date: Mon Jan 10 05:23:09 2022
New Revision: 51966

Log:
Apache Spark v3.2.1-rc1 docs


[This commit notification would consist of 2355 parts, 
which exceeds the limit of 50 ones, so it was shortened to the summary.]

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



svn commit: r51965 - /dev/spark/v3.2.1-rc1-bin/

2022-01-09 Thread huaxingao
Author: huaxingao
Date: Mon Jan 10 04:10:14 2022
New Revision: 51965

Log:
Apache Spark v3.2.1-rc1

Added:
dev/spark/v3.2.1-rc1-bin/
dev/spark/v3.2.1-rc1-bin/SparkR_3.2.1.tar.gz   (with props)
dev/spark/v3.2.1-rc1-bin/SparkR_3.2.1.tar.gz.asc
dev/spark/v3.2.1-rc1-bin/SparkR_3.2.1.tar.gz.sha512
dev/spark/v3.2.1-rc1-bin/pyspark-3.2.1.tar.gz   (with props)
dev/spark/v3.2.1-rc1-bin/pyspark-3.2.1.tar.gz.asc
dev/spark/v3.2.1-rc1-bin/pyspark-3.2.1.tar.gz.sha512
dev/spark/v3.2.1-rc1-bin/spark-3.2.1-bin-hadoop2.7.tgz   (with props)
dev/spark/v3.2.1-rc1-bin/spark-3.2.1-bin-hadoop2.7.tgz.asc
dev/spark/v3.2.1-rc1-bin/spark-3.2.1-bin-hadoop2.7.tgz.sha512
dev/spark/v3.2.1-rc1-bin/spark-3.2.1-bin-hadoop3.2-scala2.13.tgz   (with 
props)
dev/spark/v3.2.1-rc1-bin/spark-3.2.1-bin-hadoop3.2-scala2.13.tgz.asc
dev/spark/v3.2.1-rc1-bin/spark-3.2.1-bin-hadoop3.2-scala2.13.tgz.sha512
dev/spark/v3.2.1-rc1-bin/spark-3.2.1-bin-hadoop3.2.tgz   (with props)
dev/spark/v3.2.1-rc1-bin/spark-3.2.1-bin-hadoop3.2.tgz.asc
dev/spark/v3.2.1-rc1-bin/spark-3.2.1-bin-hadoop3.2.tgz.sha512
dev/spark/v3.2.1-rc1-bin/spark-3.2.1-bin-without-hadoop.tgz   (with props)
dev/spark/v3.2.1-rc1-bin/spark-3.2.1-bin-without-hadoop.tgz.asc
dev/spark/v3.2.1-rc1-bin/spark-3.2.1-bin-without-hadoop.tgz.sha512
dev/spark/v3.2.1-rc1-bin/spark-3.2.1.tgz   (with props)
dev/spark/v3.2.1-rc1-bin/spark-3.2.1.tgz.asc
dev/spark/v3.2.1-rc1-bin/spark-3.2.1.tgz.sha512

Added: dev/spark/v3.2.1-rc1-bin/SparkR_3.2.1.tar.gz
==
Binary file - no diff available.

Propchange: dev/spark/v3.2.1-rc1-bin/SparkR_3.2.1.tar.gz
--
svn:mime-type = application/octet-stream

Added: dev/spark/v3.2.1-rc1-bin/SparkR_3.2.1.tar.gz.asc
==
--- dev/spark/v3.2.1-rc1-bin/SparkR_3.2.1.tar.gz.asc (added)
+++ dev/spark/v3.2.1-rc1-bin/SparkR_3.2.1.tar.gz.asc Mon Jan 10 04:10:14 2022
@@ -0,0 +1,16 @@
+-BEGIN PGP SIGNATURE-
+
+iQIzBAABCgAdFiEEzqiIvbMtmDx/CUVkrAHm6ROfYQwFAmHbnZwACgkQrAHm6ROf
+YQyraxAAxxp6KE72KrxF12G0UeBG3s8XV5f83sNCZtajq4W6EMAwLp2gcex931d1
+CZrPXO3mChSgAE3q43unOZy6wNmxx9768gdCqdcif205bzhW4F7/5E1JC/VnRMab
+6kQGjOu6GinnUATcB+55hQphWBCxv/3icv5D2kRooovOK6T4FwgKYYRkt3yX9SHS
+XWIbjAuYNAp3atkRe3p1B7ZZONPF+AXvbHpTnoE3lfUCOUlNOWV15yYSz7ZHqSwp
+6vfH9EHkx0bChHRYlxzSa46tEdlhKcS5Sml0isTx3Dz6eZUYyJ75be7aS7qer3fI
+rxAm6+FFgRd2bqztOMqyQ8v17xNOxQXfPgPdEAONoULT8qVMwjeIwqITSmFQisrj
+PUgHgeSku+8GDLJI2cSR2IUGNx5v9ek1zzbI3fblxXJLiYhe3tvp13YUvp4xarP4
+FcbMEZEZf83lqy73d6Q96JxQQT+FjVJRmAUoooYjtXXqsNC6cdgw6IBATCUZ5/y5
+uGo79d+S/zwyvQJKSTVOuyORcqQIzaNIiuZJGnTnqsYNVR8A8Co+AHRT4Qamk2DL
+EbVoGK4gGTDjhwRvE6F62TXEH2VlSLCi8HhZT+moze5YiQZB59L/z+8RsoE9VNl+
+EdtzUf9JyR2ohupjGABy8/s9ed7ZOVc/p9OD/GFQK31GtqHTnF4=
+=wVaR
+-END PGP SIGNATURE-

Added: dev/spark/v3.2.1-rc1-bin/SparkR_3.2.1.tar.gz.sha512
==
--- dev/spark/v3.2.1-rc1-bin/SparkR_3.2.1.tar.gz.sha512 (added)
+++ dev/spark/v3.2.1-rc1-bin/SparkR_3.2.1.tar.gz.sha512 Mon Jan 10 04:10:14 2022
@@ -0,0 +1,3 @@
+SparkR_3.2.1.tar.gz: 4B5310DD FDB9932B 138EADF0 64B4A6EB 26D7AFAE 53D1B7F7
+ 1F24AF6C B65BE777 54C3331F 1EF94CAD A6908B4D 70741454
+ 983BB268 31B9C151 25A247E0 A1408AD7

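The .sha512 files use the grouped, uppercase hex layout shown above rather
than a bare digest, so a check has to normalize the whitespace first. A small
sketch of doing that; the helper and the local file names are illustrative
assumptions, not project tooling.

    # Hypothetical helper: compare an artifact against its .sha512 file, which
    # stores the digest as "name: XXXXXXXX XXXXXXXX ..." split across lines.
    import hashlib

    def expected_digest(sha512_path: str) -> str:
        """Parse the grouped hex digest out of the .sha512 file."""
        with open(sha512_path, encoding="utf-8") as f:
            text = f.read()
        _, _, hex_part = text.partition(":")      # drop the "filename:" label
        return "".join(hex_part.split()).lower()  # strip grouping whitespace

    def actual_digest(artifact_path: str) -> str:
        """Stream the artifact through SHA-512 to keep memory bounded."""
        h = hashlib.sha512()
        with open(artifact_path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()

    if __name__ == "__main__":
        artifact = "SparkR_3.2.1.tar.gz"   # assumed local file name from the listing
        if expected_digest(artifact + ".sha512") == actual_digest(artifact):
            print("checksum OK")
        else:
            raise SystemExit("checksum mismatch")
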
Added: dev/spark/v3.2.1-rc1-bin/pyspark-3.2.1.tar.gz
==
Binary file - no diff available.

Propchange: dev/spark/v3.2.1-rc1-bin/pyspark-3.2.1.tar.gz
--
svn:mime-type = application/octet-stream

Added: dev/spark/v3.2.1-rc1-bin/pyspark-3.2.1.tar.gz.asc
==
--- dev/spark/v3.2.1-rc1-bin/pyspark-3.2.1.tar.gz.asc (added)
+++ dev/spark/v3.2.1-rc1-bin/pyspark-3.2.1.tar.gz.asc Mon Jan 10 04:10:14 2022
@@ -0,0 +1,16 @@
+-BEGIN PGP SIGNATURE-
+
+iQIzBAABCgAdFiEEzqiIvbMtmDx/CUVkrAHm6ROfYQwFAmHbnaAACgkQrAHm6ROf
+YQzjMhAAspBZTlDONKh5q5PzIEzcHHmlwDZoPhsWlu728yRvMQrBEdgKSCQX5XSj
+m109gzVkEFerF4Q7ubLRtbDeB9XBJfl2IXCuEExuGLoppZD3LxOFhJiKJXSwGOF2
+YqEnXH18ylPsOIxEXWcS1udWyMWS5fsJn7AQuaen015KRjehYw6Thl5h+e6bN8pW
+mQegyzoK6nLFquxMPoY4D4iR1peW4kIBDKLfhHRjVYQgbL56boVGrY36XEVI3lRR
+A6vrsM330KQvLWCIjE5f2JcpOu4dGtb0BjtJYVcL4FakS1VajPgMuLZKKUkAtqUi
+GleFhAfeDiWXFtdnn5TeRGRxjOOjssdXsFhz+X3fjXLYcHSGRh6sXXOuu0F/7dhV
+giJu2eISyYv+psavMA0mvpAizKwZXE5NUKj0UXbNUrP+23wARjelNuuDI0WRhPjZ
+z70gOjsrL6PKTiy3Xp3wa6i1PcwuJR628aN+6wkEeE8FUwRwdCZN1pyphJOzYNxx
+trdTZryOoYlrPIvTNDCoB1Dks6TuNkqac/t3yPKS8Q6xORrICWm0JpDvS8uCVRwY
+Dc6CFCWdSCASpIWjAMlpxnkQzPcVjZ3BZMtSz1syU+fpjFuAd008vPLc1RawOozD
+JoIwSQN3WqQ+JH4beSArbfa9f6h6+ZlbU2Fz0q1ocImHqNCUEtw=
+=3xiY
+-END PGP SIGNATURE-

[spark] 01/01: Preparing development version 3.2.2-SNAPSHOT

2022-01-07 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git

commit ae309f0c60c3db8c2a4f1b1a75f99146fb172554
Author: Huaxin Gao 
AuthorDate: Fri Jan 7 17:38:42 2022 +

Preparing development version 3.2.2-SNAPSHOT
---
 R/pkg/DESCRIPTION  | 2 +-
 assembly/pom.xml   | 2 +-
 common/kvstore/pom.xml | 2 +-
 common/network-common/pom.xml  | 2 +-
 common/network-shuffle/pom.xml | 2 +-
 common/network-yarn/pom.xml| 2 +-
 common/sketch/pom.xml  | 2 +-
 common/tags/pom.xml| 2 +-
 common/unsafe/pom.xml  | 2 +-
 core/pom.xml   | 2 +-
 docs/_config.yml   | 6 +++---
 examples/pom.xml   | 2 +-
 external/avro/pom.xml  | 2 +-
 external/docker-integration-tests/pom.xml  | 2 +-
 external/kafka-0-10-assembly/pom.xml   | 2 +-
 external/kafka-0-10-sql/pom.xml| 2 +-
 external/kafka-0-10-token-provider/pom.xml | 2 +-
 external/kafka-0-10/pom.xml| 2 +-
 external/kinesis-asl-assembly/pom.xml  | 2 +-
 external/kinesis-asl/pom.xml   | 2 +-
 external/spark-ganglia-lgpl/pom.xml| 2 +-
 graphx/pom.xml | 2 +-
 hadoop-cloud/pom.xml   | 2 +-
 launcher/pom.xml   | 2 +-
 mllib-local/pom.xml| 2 +-
 mllib/pom.xml  | 2 +-
 pom.xml| 2 +-
 python/pyspark/version.py  | 2 +-
 repl/pom.xml   | 2 +-
 resource-managers/kubernetes/core/pom.xml  | 2 +-
 resource-managers/kubernetes/integration-tests/pom.xml | 2 +-
 resource-managers/mesos/pom.xml| 2 +-
 resource-managers/yarn/pom.xml | 2 +-
 sql/catalyst/pom.xml   | 2 +-
 sql/core/pom.xml   | 2 +-
 sql/hive-thriftserver/pom.xml  | 2 +-
 sql/hive/pom.xml   | 2 +-
 streaming/pom.xml  | 2 +-
 tools/pom.xml  | 2 +-
 39 files changed, 41 insertions(+), 41 deletions(-)

diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION
index 2abad61..5590c86 100644
--- a/R/pkg/DESCRIPTION
+++ b/R/pkg/DESCRIPTION
@@ -1,6 +1,6 @@
 Package: SparkR
 Type: Package
-Version: 3.2.1
+Version: 3.2.2
 Title: R Front End for 'Apache Spark'
 Description: Provides an R Front end for 'Apache Spark' 
<https://spark.apache.org>.
 Authors@R: c(person("Shivaram", "Venkataraman", role = "aut",
diff --git a/assembly/pom.xml b/assembly/pom.xml
index a852011..9584884 100644
--- a/assembly/pom.xml
+++ b/assembly/pom.xml
@@ -21,7 +21,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.2.1
+3.2.2-SNAPSHOT
 ../pom.xml
   
 
diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml
index 11cf0cb..167e69f 100644
--- a/common/kvstore/pom.xml
+++ b/common/kvstore/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.2.1
+3.2.2-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml
index 9957a77..eaf1c1e 100644
--- a/common/network-common/pom.xml
+++ b/common/network-common/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.2.1
+3.2.2-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml
index b3ea287..811e503 100644
--- a/common/network-shuffle/pom.xml
+++ b/common/network-shuffle/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.2.1
+3.2.2-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml
index 8fb7d4e..23513f6 100644
--- a/common/network-yarn/pom.xml
+++ b/common/network-yarn/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.2.1
+3.2.2-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/sketch/pom.xml b/common/sketch/pom.xml
index 7e4c6c3..c5c6161 100644
--- a/common/sketch/pom.xml
+++ b/common/sketch/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.2.1
+3.2.2-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/tags

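The diffs in this "Preparing development version 3.2.2-SNAPSHOT" commit repeat
the same one-line parent-version change across every module's pom.xml, plus a
few non-Maven files (R/pkg/DESCRIPTION, docs/_config.yml,
python/pyspark/version.py) that a pom-only pass would miss. As a rough sketch
of how the pom.xml part of such a bump could be scripted, purely for
illustration and not the project's actual release scripts; OLD, NEW and the
root path are assumptions.

    # Rough sketch only: bump the first <version> element (the spark-parent
    # version in the modules shown above) in every pom.xml under the checkout.
    import pathlib
    import re

    OLD, NEW = "3.2.1", "3.2.2-SNAPSHOT"
    ROOT = pathlib.Path(".")  # assumed to be the top of a Spark source checkout

    for pom in sorted(ROOT.rglob("pom.xml")):
        text = pom.read_text(encoding="utf-8")
        updated = re.sub(rf"<version>{re.escape(OLD)}</version>",
                         f"<version>{NEW}</version>", text, count=1)
        if updated != text:
            pom.write_text(updated, encoding="utf-8")
            print(f"bumped {pom}")

The companion "Preparing Spark release v3.2.1-rc1" commit further down is
simply the inverse edit, setting the same files back to the release version.
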
[spark] branch branch-3.2 updated (4b5d2d7 -> ae309f0)

2022-01-07 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a change to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 4b5d2d7  [SPARK-37802][SQL][3.2] Composite field name should work with Aggregate push down
 add 2b0ee22  Preparing Spark release v3.2.1-rc1
 new ae309f0  Preparing development version 3.2.2-SNAPSHOT

The 1 revision listed above as "new" is entirely new to this
repository and will be described in a separate email. The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:




[spark] 01/01: Preparing Spark release v3.2.1-rc1

2022-01-07 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a commit to tag v3.2.1-rc1
in repository https://gitbox.apache.org/repos/asf/spark.git

commit 2b0ee226f8dd17b278ad11139e62464433191653
Author: Huaxin Gao 
AuthorDate: Fri Jan 7 17:38:35 2022 +

Preparing Spark release v3.2.1-rc1
---
 R/pkg/DESCRIPTION  | 2 +-
 assembly/pom.xml   | 2 +-
 common/kvstore/pom.xml | 2 +-
 common/network-common/pom.xml  | 2 +-
 common/network-shuffle/pom.xml | 2 +-
 common/network-yarn/pom.xml| 2 +-
 common/sketch/pom.xml  | 2 +-
 common/tags/pom.xml| 2 +-
 common/unsafe/pom.xml  | 2 +-
 core/pom.xml   | 2 +-
 docs/_config.yml   | 6 +++---
 examples/pom.xml   | 2 +-
 external/avro/pom.xml  | 2 +-
 external/docker-integration-tests/pom.xml  | 2 +-
 external/kafka-0-10-assembly/pom.xml   | 2 +-
 external/kafka-0-10-sql/pom.xml| 2 +-
 external/kafka-0-10-token-provider/pom.xml | 2 +-
 external/kafka-0-10/pom.xml| 2 +-
 external/kinesis-asl-assembly/pom.xml  | 2 +-
 external/kinesis-asl/pom.xml   | 2 +-
 external/spark-ganglia-lgpl/pom.xml| 2 +-
 graphx/pom.xml | 2 +-
 hadoop-cloud/pom.xml   | 2 +-
 launcher/pom.xml   | 2 +-
 mllib-local/pom.xml| 2 +-
 mllib/pom.xml  | 2 +-
 pom.xml| 2 +-
 python/pyspark/version.py  | 2 +-
 repl/pom.xml   | 2 +-
 resource-managers/kubernetes/core/pom.xml  | 2 +-
 resource-managers/kubernetes/integration-tests/pom.xml | 2 +-
 resource-managers/mesos/pom.xml| 2 +-
 resource-managers/yarn/pom.xml | 2 +-
 sql/catalyst/pom.xml   | 2 +-
 sql/core/pom.xml   | 2 +-
 sql/hive-thriftserver/pom.xml  | 2 +-
 sql/hive/pom.xml   | 2 +-
 streaming/pom.xml  | 2 +-
 tools/pom.xml  | 2 +-
 39 files changed, 41 insertions(+), 41 deletions(-)

diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION
index 5590c86..2abad61 100644
--- a/R/pkg/DESCRIPTION
+++ b/R/pkg/DESCRIPTION
@@ -1,6 +1,6 @@
 Package: SparkR
 Type: Package
-Version: 3.2.2
+Version: 3.2.1
 Title: R Front End for 'Apache Spark'
 Description: Provides an R Front end for 'Apache Spark' 
<https://spark.apache.org>.
 Authors@R: c(person("Shivaram", "Venkataraman", role = "aut",
diff --git a/assembly/pom.xml b/assembly/pom.xml
index 9584884..a852011 100644
--- a/assembly/pom.xml
+++ b/assembly/pom.xml
@@ -21,7 +21,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.2.2-SNAPSHOT
+3.2.1
 ../pom.xml
   
 
diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml
index 167e69f..11cf0cb 100644
--- a/common/kvstore/pom.xml
+++ b/common/kvstore/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.2.2-SNAPSHOT
+3.2.1
 ../../pom.xml
   
 
diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml
index eaf1c1e..9957a77 100644
--- a/common/network-common/pom.xml
+++ b/common/network-common/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.2.2-SNAPSHOT
+3.2.1
 ../../pom.xml
   
 
diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml
index 811e503..b3ea287 100644
--- a/common/network-shuffle/pom.xml
+++ b/common/network-shuffle/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.2.2-SNAPSHOT
+3.2.1
 ../../pom.xml
   
 
diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml
index 23513f6..8fb7d4e 100644
--- a/common/network-yarn/pom.xml
+++ b/common/network-yarn/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.2.2-SNAPSHOT
+3.2.1
 ../../pom.xml
   
 
diff --git a/common/sketch/pom.xml b/common/sketch/pom.xml
index c5c6161..7e4c6c3 100644
--- a/common/sketch/pom.xml
+++ b/common/sketch/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.2.2-SNAPSHOT
+3.2.1
 ../../pom.xml
   
 
diff --git a/common/tags/pom.xml b/common

[spark] tag v3.2.1-rc1 created (now 2b0ee22)

2022-01-07 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a change to tag v3.2.1-rc1
in repository https://gitbox.apache.org/repos/asf/spark.git.


  at 2b0ee22  (commit)
This tag includes the following new commits:

 new 2b0ee22  Preparing Spark release v3.2.1-rc1

The 1 revision listed above as "new" is entirely new to this
repository and will be described in a separate email. The revisions
listed as "add" were already present in the repository and have only
been added to this reference.





[spark] branch branch-3.2 updated (2470640 -> 4b5d2d7)

2022-01-06 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a change to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 2470640  [SPARK-37800][SQL][FOLLOW-UP] Remove duplicated LogicalPlan inheritance
 add 4b5d2d7  [SPARK-37802][SQL][3.2] Composite field name should work with Aggregate push down

No new revisions were added by this update.

Summary of changes:
 .../sql/connector/expressions/expressions.scala|  4 +++
 .../execution/datasources/DataSourceStrategy.scala | 10 +++---
 .../execution/datasources/v2/PushDownUtils.scala   |  2 +-
 .../org/apache/spark/sql/jdbc/JDBCV2Suite.scala| 41 +-
 4 files changed, 50 insertions(+), 7 deletions(-)



