[spark] branch branch-3.3 updated: [SPARK-39647][CORE] Register the executor with ESS before registering the BlockManager

2022-07-11 Thread mridulm80
This is an automated email from the ASF dual-hosted git repository.

mridulm80 pushed a commit to branch branch-3.3
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.3 by this push:
 new acf8f66650a [SPARK-39647][CORE] Register the executor with ESS before registering the BlockManager
acf8f66650a is described below

commit acf8f66650af53718b08f3778c2a2a3a5d10a88f
Author: Chandni Singh 
AuthorDate: Tue Jul 12 00:20:43 2022 -0500

[SPARK-39647][CORE] Register the executor with ESS before registering the BlockManager

### What changes were proposed in this pull request?
Currently the executors register with the ESS after the `BlockManager` registration with the `BlockManagerMaster`. This order creates a problem for push-based shuffle: a registered BlockManager node is picked up by the driver as a merger, but the shuffle service on that node is not yet ready to merge data, which causes block pushes to fail until the local executor registers with it. This fix reverses the order, that is, it registers with the ESS before registering the `BlockManager`.
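
Purely for illustration (not the actual `BlockManager` code; `EssClient`, `DriverClient` and the parameter names are made-up stand-ins), a self-contained Scala sketch of the ordering this change enforces:
```
// Illustrative sketch only: register with the external shuffle service (ESS)
// before announcing the BlockManager to the driver.
object RegistrationOrderSketch {
  final case class BlockManagerId(executorId: String, host: String, port: Int)

  trait EssClient { def registerExecutor(id: BlockManagerId): Unit }
  trait DriverClient { def registerBlockManager(id: BlockManagerId): Unit }

  def initialize(
      executorId: String,
      host: String,
      essPort: Int,
      bmPort: Int,
      isDriver: Boolean,
      essEnabled: Boolean,
      ess: EssClient,
      driver: DriverClient): Unit = {
    if (essEnabled && !isDriver) {
      // Step 1: register with the local ESS first, so it receives the
      // merge-directory metadata and can merge pushed blocks.
      ess.registerExecutor(BlockManagerId(executorId, host, essPort))
    }
    // Step 2: only now register the BlockManager, after which the driver may
    // immediately select this host as a push-based shuffle merger.
    driver.registerBlockManager(BlockManagerId(executorId, host, bmPort))
  }
}
```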

### Why are the changes needed?
They are needed to fix the issue which causes block pushes to fail.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Added a UT.

Closes #37052 from otterc/SPARK-39647.

Authored-by: Chandni Singh
Signed-off-by: Mridul Muralidharan
(cherry picked from commit 79ba2890f51c5f676b9cd6e3a6682c7969462999)
Signed-off-by: Mridul Muralidharan 
---
 .../org/apache/spark/storage/BlockManager.scala| 30 --
 .../apache/spark/storage/BlockManagerSuite.scala   | 36 ++
 2 files changed, 56 insertions(+), 10 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/storage/BlockManager.scala 
b/core/src/main/scala/org/apache/spark/storage/BlockManager.scala
index d5901888d1a..53d2d054121 100644
--- a/core/src/main/scala/org/apache/spark/storage/BlockManager.scala
+++ b/core/src/main/scala/org/apache/spark/storage/BlockManager.scala
@@ -516,9 +516,27 @@ private[spark] class BlockManager(
   ret
 }
 
+// Register Executors' configuration with the local shuffle service, if one should exist.
+// Registration with the ESS should happen before registering the block manager with the
+// BlockManagerMaster. In push-based shuffle, the registered BM is selected by the driver
+// as a merger. However, for the ESS on this host to be able to merge blocks successfully,
+// it needs the merge directories metadata which is provided by the local executor during
+// the registration with the ESS. Therefore, this registration should be prior to
+// the BlockManager registration. See SPARK-39647.
+if (externalShuffleServiceEnabled) {
+  logInfo(s"external shuffle service port = $externalShuffleServicePort")
+  shuffleServerId = BlockManagerId(executorId, blockTransferService.hostName,
+externalShuffleServicePort)
+  if (!isDriver) {
+registerWithExternalShuffleServer()
+  }
+}
+
 val id =
   BlockManagerId(executorId, blockTransferService.hostName, blockTransferService.port, None)
 
+// The idFromMaster has just additional topology information. Otherwise, it has the same
+// executor id/host/port of idWithoutTopologyInfo which is not expected to be changed.
 val idFromMaster = master.registerBlockManager(
   id,
   diskBlockManager.localDirsString,
@@ -528,16 +546,8 @@ private[spark] class BlockManager(
 
 blockManagerId = if (idFromMaster != null) idFromMaster else id
 
-shuffleServerId = if (externalShuffleServiceEnabled) {
-  logInfo(s"external shuffle service port = $externalShuffleServicePort")
-  BlockManagerId(executorId, blockTransferService.hostName, externalShuffleServicePort)
-} else {
-  blockManagerId
-}
-
-// Register Executors' configuration with the local shuffle service, if one should exist.
-if (externalShuffleServiceEnabled && !blockManagerId.isDriver) {
-  registerWithExternalShuffleServer()
+if (!externalShuffleServiceEnabled) {
+  shuffleServerId = blockManagerId
 }
 
 hostLocalDirManager = {
diff --git 
a/core/src/test/scala/org/apache/spark/storage/BlockManagerSuite.scala 
b/core/src/test/scala/org/apache/spark/storage/BlockManagerSuite.scala
index 45e05b2cc2d..874b2b4f005 100644
--- a/core/src/test/scala/org/apache/spark/storage/BlockManagerSuite.scala
+++ b/core/src/test/scala/org/apache/spark/storage/BlockManagerSuite.scala
@@ -2175,6 +2175,42 @@ class BlockManagerSuite extends SparkFunSuite with Matchers with BeforeAndAfterE
 assert(kryoException.getMessage === "java.io.IOException: Input/output error")
   }
 
+  test("SPARK-39647: Failure to register with ESS should prevent registering 
the BM") {
+

[spark] branch master updated (aa51da42908 -> 79ba2890f51)

2022-07-11 Thread mridulm80
This is an automated email from the ASF dual-hosted git repository.

mridulm80 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from aa51da42908 [SPARK-39723][R] Implement functionExists/getFunc in 
SparkR for 3L namespace
 add 79ba2890f51 [SPARK-39647][CORE] Register the executor with ESS before 
registering the BlockManager

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/storage/BlockManager.scala| 30 --
 .../apache/spark/storage/BlockManagerSuite.scala   | 36 ++
 2 files changed, 56 insertions(+), 10 deletions(-)





[spark] branch master updated: [SPARK-39723][R] Implement functionExists/getFunc in SparkR for 3L namespace

2022-07-11 Thread ruifengz
This is an automated email from the ASF dual-hosted git repository.

ruifengz pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new aa51da42908 [SPARK-39723][R] Implement functionExists/getFunc in 
SparkR for 3L namespace
aa51da42908 is described below

commit aa51da4290814bf3ccdc52000b8d90d6db575d3f
Author: Ruifeng Zheng 
AuthorDate: Tue Jul 12 11:05:25 2022 +0800

[SPARK-39723][R] Implement functionExists/getFunc in SparkR for 3L namespace

### What changes were proposed in this pull request?
1. Implement functionExists/getFunc in SparkR (a Scala sketch of the underlying Catalog calls is included below).
2. Update the documentation of listFunctions.
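
For reference, a rough Scala sketch of the JVM Catalog API that these SparkR wrappers delegate to (the function name `spark_catalog.default.my_func` is a placeholder; 3-level qualified names rely on the 3L namespace support targeted at 3.4.0):
```
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[1]").getOrCreate()

// Scala equivalents of the new SparkR wrappers: functionExists()/getFunc()
// reach these Catalog methods through callJMethod.
if (spark.catalog.functionExists("spark_catalog.default.my_func")) {
  val fn = spark.catalog.getFunction("spark_catalog.default.my_func")
  println(s"${fn.name}: className=${fn.className}, isTemporary=${fn.isTemporary}")
}
```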

### Why are the changes needed?
They are needed for 3L (three-layer) namespace support.

### Does this PR introduce _any_ user-facing change?
Yes, new APIs: `functionExists` and `getFunc`.

### How was this patch tested?
Added unit tests.

Closes #37135 from zhengruifeng/r_3L_func.

Lead-authored-by: Ruifeng Zheng 
Co-authored-by: Ruifeng Zheng 
Signed-off-by: Ruifeng Zheng 
---
 R/pkg/NAMESPACE   |  2 +
 R/pkg/R/catalog.R | 75 ++-
 R/pkg/pkgdown/_pkgdown_template.yml   |  2 +
 R/pkg/tests/fulltests/test_sparkSQL.R | 34 +++-
 4 files changed, 111 insertions(+), 2 deletions(-)

diff --git a/R/pkg/NAMESPACE b/R/pkg/NAMESPACE
index 3937791421a..e078ba0c2cd 100644
--- a/R/pkg/NAMESPACE
+++ b/R/pkg/NAMESPACE
@@ -479,7 +479,9 @@ export("as.DataFrame",
"databaseExists",
"dropTempTable",
"dropTempView",
+   "functionExists",
"getDatabase",
+   "getFunc",
"getTable",
"listCatalogs",
"listColumns",
diff --git a/R/pkg/R/catalog.R b/R/pkg/R/catalog.R
index 680415ea6cd..942af4de3c0 100644
--- a/R/pkg/R/catalog.R
+++ b/R/pkg/R/catalog.R
@@ -583,13 +583,14 @@ listColumns <- function(tableName, databaseName = NULL) {
 #' This includes all temporary functions.
 #'
 #' @param databaseName (optional) name of the database
+#' The database name can be qualified with catalog name 
since 3.4.0.
 #' @return a SparkDataFrame of the list of function descriptions.
 #' @rdname listFunctions
 #' @name listFunctions
 #' @examples
 #' \dontrun{
 #' sparkR.session()
-#' listFunctions()
+#' listFunctions(spark_catalog.default)
 #' }
 #' @note since 2.2.0
 listFunctions <- function(databaseName = NULL) {
@@ -606,6 +607,78 @@ listFunctions <- function(databaseName = NULL) {
   dataFrame(callJMethod(jdst, "toDF"))
 }
 
+#' Checks if the function with the specified name exists.
+#'
+#' Checks if the function with the specified name exists.
+#'
+#' @param functionName name of the function, allowed to be qualified with 
catalog name
+#' @rdname functionExists
+#' @name functionExists
+#' @examples
+#' \dontrun{
+#' sparkR.session()
+#' functionExists("spark_catalog.default.myFunc")
+#' }
+#' @note since 3.4.0
+functionExists <- function(functionName) {
+  sparkSession <- getSparkSession()
+  if (class(functionName) != "character") {
+stop("functionName must be a string.")
+  }
+  catalog <- callJMethod(sparkSession, "catalog")
+  callJMethod(catalog, "functionExists", functionName)
+}
+
+#' Get the function with the specified name
+#'
+#' Get the function with the specified name
+#'
+#' @param functionName name of the function, allowed to be qualified with 
catalog name
+#' @return A named list.
+#' @rdname getFunc
+#' @name getFunc
+#' @examples
+#' \dontrun{
+#' sparkR.session()
+#' func <- getFunc("spark_catalog.default.myFunc")
+#' }
+#' @note since 3.4.0. Use different name with the scala/python side, to avoid 
the
+#'   signature conflict with built-in "getFunction".
+getFunc <- function(functionName) {
+  sparkSession <- getSparkSession()
+  if (class(functionName) != "character") {
+stop("functionName must be a string.")
+  }
+  catalog <- callJMethod(sparkSession, "catalog")
+  jfunc <- handledCallJMethod(catalog, "getFunction", functionName)
+
+  ret <- list(name = callJMethod(jfunc, "name"))
+  jcata <- callJMethod(jfunc, "catalog")
+  if (is.null(jcata)) {
+ret$catalog <- NA
+  } else {
+ret$catalog <- jcata
+  }
+
+  jns <- callJMethod(jfunc, "namespace")
+  if (is.null(jns)) {
+ret$namespace <- NA
+  } else {
+ret$namespace <- jns
+  }
+
+  jdesc <- callJMethod(jfunc, "description")
+  if (is.null(jdesc)) {
+ret$description <- NA
+  } else {
+ret$description <- jdesc
+  }
+
+  ret$className <- callJMethod(jfunc, "className")
+  ret$isTemporary <- callJMethod(jfunc, "isTemporary")
+  ret
+}
+
 #' Recovers all the partitions in the directory of a table and update the 
catalog
 #'
 #' Recovers all the partitions in the directory of a table and update the 
catalog. The name should
diff --git a/R/pkg/pkgdown/_pkgdown_template.yml 
b/R/pkg/pkgdown/_pkgdown_template.yml
index df93f200ab2..1da1d62ee9c 100644
--- 

[spark] branch master updated: [SPARK-39711][TESTS] Remove redundant trait: BeforeAndAfterAll & BeforeAndAfterEach & Logging

2022-07-11 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new d10db29ca16 [SPARK-39711][TESTS] Remove redundant trait: 
BeforeAndAfterAll & BeforeAndAfterEach & Logging
d10db29ca16 is described below

commit d10db29ca1609c34f082068f0cc7419c5ecef190
Author: panbingkun 
AuthorDate: Mon Jul 11 19:30:30 2022 -0700

[SPARK-39711][TESTS] Remove redundant trait: BeforeAndAfterAll & 
BeforeAndAfterEach & Logging

### What changes were proposed in this pull request?
SparkFunSuite is declared as follows:
```
abstract class SparkFunSuite
extends AnyFunSuite
with BeforeAndAfterAll
with BeforeAndAfterEach
with ThreadAudit
with Logging
```
Some suites extend SparkFunSuite and also mix in BeforeAndAfterAll, BeforeAndAfterEach, or Logging, which is redundant.
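
For illustration only (`BaseSuite`, `RedundantSuite` and `CleanSuite` are made-up names; `BaseSuite` stands in for SparkFunSuite), a minimal sketch of the redundancy being removed:
```
import org.scalatest.{BeforeAndAfterAll, BeforeAndAfterEach}
import org.scalatest.funsuite.AnyFunSuite

// Stand-in for SparkFunSuite: the base class already mixes in the lifecycle traits.
abstract class BaseSuite extends AnyFunSuite with BeforeAndAfterAll with BeforeAndAfterEach

// Redundant: repeats traits the base class already provides.
class RedundantSuite extends BaseSuite with BeforeAndAfterAll with BeforeAndAfterEach {
  test("example") { assert(1 + 1 == 2) }
}

// Equivalent and cleaner: the traits are inherited from BaseSuite.
class CleanSuite extends BaseSuite {
  test("example") { assert(1 + 1 == 2) }
}
```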

### Why are the changes needed?
Eliminate redundant information and make the code cleaner.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Pass GA.

Closes #37123 from panbingkun/remove_BeforeAndAfterAll.

Authored-by: panbingkun 
Signed-off-by: huaxingao 
---
 .../scala/org/apache/spark/kafka010/KafkaTokenSparkConfSuite.scala | 3 +--
 .../apache/spark/streaming/kafka010/KafkaDataConsumerSuite.scala   | 3 +--
 .../scala/org/apache/spark/streaming/kafka010/KafkaRDDSuite.scala  | 3 +--
 core/src/test/scala/org/apache/spark/SSLOptionsSuite.scala | 3 +--
 .../org/apache/spark/SparkContextSchedulerCreationSuite.scala  | 4 +---
 core/src/test/scala/org/apache/spark/ThreadingSuite.scala  | 4 +---
 .../test/scala/org/apache/spark/deploy/SparkSubmitUtilsSuite.scala | 3 +--
 .../org/apache/spark/deploy/history/ApplicationCacheSuite.scala| 2 +-
 .../org/apache/spark/deploy/history/FsHistoryProviderSuite.scala   | 4 +---
 .../spark/deploy/history/RealBrowserUIHistoryServerSuite.scala | 4 +---
 .../scala/org/apache/spark/deploy/master/ui/MasterWebUISuite.scala | 4 +---
 .../org/apache/spark/deploy/rest/StandaloneRestSubmitSuite.scala   | 3 +--
 .../org/apache/spark/input/WholeTextFileInputFormatSuite.scala | 5 +
 .../org/apache/spark/input/WholeTextFileRecordReaderSuite.scala| 4 +---
 .../org/apache/spark/internal/plugin/PluginContainerSuite.scala| 3 +--
 .../test/scala/org/apache/spark/memory/MemoryManagerSuite.scala| 4 +---
 .../src/test/scala/org/apache/spark/rdd/AsyncRDDActionsSuite.scala | 3 +--
 core/src/test/scala/org/apache/spark/rdd/SortingSuite.scala| 3 +--
 core/src/test/scala/org/apache/spark/rpc/RpcEnvSuite.scala | 3 +--
 .../org/apache/spark/scheduler/EventLoggingListenerSuite.scala | 5 +
 .../test/scala/org/apache/spark/scheduler/HealthTrackerSuite.scala | 4 +---
 .../scala/org/apache/spark/scheduler/TaskSchedulerImplSuite.scala  | 6 ++
 .../scala/org/apache/spark/scheduler/TaskSetExcludelistSuite.scala | 3 +--
 .../org/apache/spark/serializer/SerializationDebuggerSuite.scala   | 5 +
 .../scala/org/apache/spark/shuffle/ShuffleBlockPusherSuite.scala   | 3 +--
 .../org/apache/spark/shuffle/ShuffleDriverComponentsSuite.scala| 4 +---
 .../apache/spark/shuffle/sort/IndexShuffleBlockResolverSuite.scala | 3 +--
 .../shuffle/sort/io/LocalDiskShuffleMapOutputWriterSuite.scala | 3 +--
 .../scala/org/apache/spark/storage/BlockInfoManagerSuite.scala | 4 +---
 .../test/scala/org/apache/spark/storage/BlockManagerSuite.scala| 7 +++
 .../scala/org/apache/spark/storage/DiskBlockManagerSuite.scala | 4 +---
 .../org/apache/spark/storage/DiskBlockObjectWriterSuite.scala  | 4 +---
 .../scala/org/apache/spark/ui/RealBrowserUISeleniumSuite.scala | 3 +--
 core/src/test/scala/org/apache/spark/ui/UISeleniumSuite.scala  | 3 +--
 core/src/test/scala/org/apache/spark/util/FileAppenderSuite.scala  | 4 ++--
 core/src/test/scala/org/apache/spark/util/UtilsSuite.scala | 3 +--
 .../apache/spark/util/collection/ExternalSorterSpillSuite.scala| 3 +--
 .../test/scala/org/apache/spark/util/collection/SorterSuite.scala  | 3 +--
 .../apache/spark/util/collection/unsafe/sort/RadixSortSuite.scala  | 3 +--
 mllib/src/test/scala/org/apache/spark/ml/MLEventsSuite.scala   | 5 +
 .../test/scala/org/apache/spark/ml/recommendation/ALSSuite.scala   | 3 +--
 .../src/test/scala/org/apache/spark/ml/stat/CorrelationSuite.scala | 4 +---
 .../org/apache/spark/ml/tree/impl/GradientBoostedTreesSuite.scala  | 3 +--
 .../test/scala/org/apache/spark/mllib/linalg/VectorsSuite.scala| 3 +--
 .../test/scala/org/apache/spark/mllib/stat/CorrelationSuite.scala  | 3 +--
 .../org/apache/spark/mllib/tree/GradientBoostedTreesSuite.scala| 3 +--
 repl/src/test/scala-2.12/org/apache/spark/repl/Repl2Suite.scala| 4 +---
 

[spark] branch master updated: [SPARK-39737][SQL] `PERCENTILE_CONT` and `PERCENTILE_DISC` should support aggregate filter

2022-07-11 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new ab277e123c5 [SPARK-39737][SQL] `PERCENTILE_CONT` and `PERCENTILE_DISC` 
should support aggregate filter
ab277e123c5 is described below

commit ab277e123c5f8cdc9f147ae019dbb38bc0e50262
Author: Jiaan Geng 
AuthorDate: Tue Jul 12 10:18:23 2022 +0800

[SPARK-39737][SQL] `PERCENTILE_CONT` and `PERCENTILE_DISC` should support 
aggregate filter

### What changes were proposed in this pull request?
Currently, Spark supports the ANSI aggregate functions `percentile_cont` and `percentile_disc`, but these two aggregate functions do not support an aggregate FILTER clause.
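
A minimal sketch of the syntax this enables, assuming a Spark build that includes this change (the table and data are made up):
```
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[1]").getOrCreate()
spark.range(0, 100).selectExpr("id", "id % 10 AS col").createOrReplaceTempView("t")

// PERCENTILE_CONT/PERCENTILE_DISC now accept the standard aggregate FILTER clause.
spark.sql(
  """SELECT
    |  percentile_cont(0.5) WITHIN GROUP (ORDER BY col) FILTER (WHERE id > 10) AS p50
    |FROM t
    |""".stripMargin).show()
```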

### Why are the changes needed?
An aggregate FILTER clause can improve performance and is very useful.

### Does this PR introduce _any_ user-facing change?
'No'.
New feature.

### How was this patch tested?
New test cases.

Closes #37150 from beliefer/SPARK-39737.

Authored-by: Jiaan Geng 
Signed-off-by: Wenchen Fan 
---
 .../spark/sql/catalyst/parser/SqlBaseParser.g4 |  3 +-
 .../spark/sql/catalyst/parser/AstBuilder.scala |  3 +-
 .../sql/catalyst/parser/PlanParserSuite.scala  | 14 ++-
 .../resources/sql-tests/inputs/percentiles.sql | 16 ++--
 .../sql-tests/results/percentiles.sql.out  | 48 +-
 5 files changed, 57 insertions(+), 27 deletions(-)

diff --git 
a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4
 
b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4
index ce37a09d5ba..f398ddd76f7 100644
--- 
a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4
+++ 
b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4
@@ -849,7 +849,8 @@ primaryExpression
 | OVERLAY LEFT_PAREN input=valueExpression PLACING replace=valueExpression
   FROM position=valueExpression (FOR length=valueExpression)? RIGHT_PAREN  
#overlay
 | name=(PERCENTILE_CONT | PERCENTILE_DISC) LEFT_PAREN 
percentage=valueExpression RIGHT_PAREN
-  WITHIN GROUP LEFT_PAREN ORDER BY sortItem RIGHT_PAREN ( OVER 
windowSpec)?#percentile
+WITHIN GROUP LEFT_PAREN ORDER BY sortItem RIGHT_PAREN
+(FILTER LEFT_PAREN WHERE where=booleanExpression RIGHT_PAREN)? ( OVER 
windowSpec)? #percentile
 ;
 
 constant
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
index 46847411bf0..05b3ddca022 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
@@ -1865,7 +1865,8 @@ class AstBuilder extends SqlBaseParserBaseVisitor[AnyRef] 
with SQLConfHelper wit
   case Descending => PercentileDisc(sortOrder.child, percentage, true)
 }
 }
-val aggregateExpression = percentile.toAggregateExpression()
+val filter = Option(ctx.where).map(expression(_))
+val aggregateExpression = percentile.toAggregateExpression(false, filter)
 ctx.windowSpec match {
   case spec: WindowRefContext =>
 UnresolvedWindowExpression(aggregateExpression, visitWindowRef(spec))
diff --git 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/PlanParserSuite.scala
 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/PlanParserSuite.scala
index 3c757442e13..6c0d970143b 100644
--- 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/PlanParserSuite.scala
+++ 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/PlanParserSuite.scala
@@ -1327,6 +1327,12 @@ class PlanParserSuite extends AnalysisTest {
 Literal(Decimal(0.1), DecimalType(1, 1)), true).toAggregateExpression()
 )
 
+assertPercentilePlans(
+  "SELECT PERCENTILE_CONT(0.1) WITHIN GROUP (ORDER BY col) FILTER (WHERE 
id > 10)",
+  PercentileCont(UnresolvedAttribute("col"), Literal(Decimal(0.1), 
DecimalType(1, 1)))
+.toAggregateExpression(false, 
Some(GreaterThan(UnresolvedAttribute("id"), Literal(10
+)
+
 assertPercentilePlans(
   "SELECT PERCENTILE_DISC(0.1) WITHIN GROUP (ORDER BY col)",
   PercentileDisc(UnresolvedAttribute("col"), Literal(Decimal(0.1), 
DecimalType(1, 1)))
@@ -1335,8 +1341,14 @@ class PlanParserSuite extends AnalysisTest {
 
 assertPercentilePlans(
   "SELECT PERCENTILE_DISC(0.1) WITHIN GROUP (ORDER BY col DESC)",
-  new PercentileDisc(UnresolvedAttribute("col"),
+  PercentileDisc(UnresolvedAttribute("col"),
 Literal(Decimal(0.1), DecimalType(1, 1)), true).toAggregateExpression()
 )
+
+ 

[spark] branch master updated (ac6f8bb280f -> ab96d805373)

2022-07-11 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from ac6f8bb280f [MINOR][FOLLOWUP] Remove redundant return
 add ab96d805373 [SPARK-39736][INFRA] Enable base image build in SparkR job

No new revisions were added by this update.

Summary of changes:
 .github/workflows/build_and_test.yml | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)





[spark] branch master updated: [MINOR][FOLLOWUP] Remove redundant return

2022-07-11 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new ac6f8bb280f [MINOR][FOLLOWUP] Remove redundant return
ac6f8bb280f is described below

commit ac6f8bb280f86f4420bbc45b60344b4adbfd7ccd
Author: panbingkun 
AuthorDate: Tue Jul 12 08:42:09 2022 +0900

[MINOR][FOLLOWUP] Remove redundant return

### What changes were proposed in this pull request?
Remove redundant `return` statements in Scala code.

The pr followup: https://github.com/apache/spark/pull/37148
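
A standalone sketch of the pattern being cleaned up (not code from this patch): in Scala the last expression of each branch is already the method's value, so the explicit `return` adds nothing.
```
object ReturnExample {
  // Before: explicit `return` on every branch.
  def signBefore(x: Int): Int = {
    if (x < 0) {
      return -1
    } else {
      return 1
    }
  }

  // After: the branches themselves are the result.
  def signAfter(x: Int): Int = if (x < 0) -1 else 1
}
```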

### Why are the changes needed?
Syntactic simplification.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Pass GA.

Closes #37157 from panbingkun/remove_redundance_return_followup.

Authored-by: panbingkun 
Signed-off-by: Hyukjin Kwon 
---
 mllib-local/src/main/scala/org/apache/spark/ml/linalg/BLAS.scala  | 4 
 .../spark/sql/catalyst/expressions/conditionalExpressions.scala   | 4 ++--
 .../src/main/scala/org/apache/spark/sql/types/ArrayType.scala | 6 +++---
 .../org/apache/spark/sql/execution/streaming/FileStreamSink.scala | 2 +-
 .../sql/execution/streaming/continuous/ContinuousExecution.scala  | 1 -
 .../execution/streaming/state/SymmetricHashJoinStateManager.scala | 8 
 .../scala/org/apache/spark/sql/SparkSessionExtensionSuite.scala   | 6 +++---
 .../spark/sql/execution/streaming/state/StateStoreSuite.scala | 4 ++--
 .../org/apache/spark/streaming/dstream/FileInputDStream.scala | 2 +-
 9 files changed, 16 insertions(+), 21 deletions(-)

diff --git a/mllib-local/src/main/scala/org/apache/spark/ml/linalg/BLAS.scala 
b/mllib-local/src/main/scala/org/apache/spark/ml/linalg/BLAS.scala
index f93195c4750..c81dee5ef8f 100644
--- a/mllib-local/src/main/scala/org/apache/spark/ml/linalg/BLAS.scala
+++ b/mllib-local/src/main/scala/org/apache/spark/ml/linalg/BLAS.scala
@@ -385,7 +385,6 @@ private[spark] object BLAS extends Serializable {
   "The matrix C cannot be the product of a transpose() call. 
C.isTransposed must be false.")
 if (alpha == 0.0 && beta == 1.0) {
   // gemm: alpha is equal to 0 and beta is equal to 1. Returning C.
-  return
 } else if (alpha == 0.0) {
   getBLAS(C.values.length).dscal(C.values.length, beta, C.values, 1)
 } else {
@@ -416,7 +415,6 @@ private[spark] object BLAS extends Serializable {
   s"the length of CValues must be no less than ${A.numRows} X 
${B.numCols}")
 if (alpha == 0.0 && beta == 1.0) {
   // gemm: alpha is equal to 0 and beta is equal to 1. Returning C.
-  return
 } else if (alpha == 0.0) {
   val n = A.numRows * B.numCols
   getBLAS(n).dscal(n, beta, CValues, 1)
@@ -619,7 +617,6 @@ private[spark] object BLAS extends Serializable {
   s"The rows of A don't match the number of elements of y. A: 
${A.numRows}, y:${y.length}")
 if (alpha == 0.0 && beta == 1.0) {
   // gemv: alpha is equal to 0 and beta is equal to 1. Returning y.
-  return
 } else if (alpha == 0.0) {
   getBLAS(A.numRows).dscal(A.numRows, beta, y, 1)
 } else {
@@ -650,7 +647,6 @@ private[spark] object BLAS extends Serializable {
   s"The rows of A don't match the number of elements of y. A: 
${A.numRows}, y:${y.size}")
 if (alpha == 0.0 && beta == 1.0) {
   // gemv: alpha is equal to 0 and beta is equal to 1. Returning y.
-  return
 } else if (alpha == 0.0) {
   scal(beta, y)
 } else {
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala
index 7213440bebe..f506acde7c2 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala
@@ -226,9 +226,9 @@ case class CaseWhen(
   i += 1
 }
 if (elseValue.isDefined) {
-  return elseValue.get.eval(input)
+  elseValue.get.eval(input)
 } else {
-  return null
+  null
 }
   }
 
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/types/ArrayType.scala 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/types/ArrayType.scala
index a3a2ccf5ab1..b5708bae923 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/types/ArrayType.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/types/ArrayType.scala
@@ -139,11 +139,11 @@ case class ArrayType(elementType: DataType, containsNull: 
Boolean) extends DataT
 i += 1
   }
   if (leftArray.numElements() < rightArray.numElements()) {
-return -1
+-1
   } else if (leftArray.numElements() > rightArray.numElements()) {
-return 1

[spark] branch master updated (9c5c21ccc9c -> e99f3f635cf)

2022-07-11 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 9c5c21ccc9c [SPARK-39667][SQL] Add another workaround when there is 
not enough memory to build and broadcast the table
 add e99f3f635cf [SPARK-38796][SQL][DOC][FOLLOWUP] Remove try_to_char 
reference in the doc

No new revisions were added by this update.

Summary of changes:
 docs/sql-ref-number-pattern.md | 22 +++---
 1 file changed, 11 insertions(+), 11 deletions(-)





svn commit: r55687 - in /dev/spark/v3.2.2-rc1-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _site/api/java/org/apache/parqu

2022-07-11 Thread dongjoon
Author: dongjoon
Date: Mon Jul 11 16:49:02 2022
New Revision: 55687

Log:
Apache Spark v3.2.2-rc1 docs


[This commit notification would consist of 2528 parts, which exceeds the limit of 50 ones, so it was shortened to the summary.]




svn commit: r55686 - /dev/spark/v3.2.2-rc1-bin/

2022-07-11 Thread dongjoon
Author: dongjoon
Date: Mon Jul 11 16:18:32 2022
New Revision: 55686

Log:
Apache Spark v3.2.2-rc1

Added:
dev/spark/v3.2.2-rc1-bin/
dev/spark/v3.2.2-rc1-bin/SparkR_3.2.2.tar.gz   (with props)
dev/spark/v3.2.2-rc1-bin/SparkR_3.2.2.tar.gz.asc
dev/spark/v3.2.2-rc1-bin/SparkR_3.2.2.tar.gz.sha512
dev/spark/v3.2.2-rc1-bin/pyspark-3.2.2.tar.gz   (with props)
dev/spark/v3.2.2-rc1-bin/pyspark-3.2.2.tar.gz.asc
dev/spark/v3.2.2-rc1-bin/pyspark-3.2.2.tar.gz.sha512
dev/spark/v3.2.2-rc1-bin/spark-3.2.2-bin-hadoop2.7.tgz   (with props)
dev/spark/v3.2.2-rc1-bin/spark-3.2.2-bin-hadoop2.7.tgz.asc
dev/spark/v3.2.2-rc1-bin/spark-3.2.2-bin-hadoop2.7.tgz.sha512
dev/spark/v3.2.2-rc1-bin/spark-3.2.2-bin-hadoop3.2-scala2.13.tgz   (with 
props)
dev/spark/v3.2.2-rc1-bin/spark-3.2.2-bin-hadoop3.2-scala2.13.tgz.asc
dev/spark/v3.2.2-rc1-bin/spark-3.2.2-bin-hadoop3.2-scala2.13.tgz.sha512
dev/spark/v3.2.2-rc1-bin/spark-3.2.2-bin-hadoop3.2.tgz   (with props)
dev/spark/v3.2.2-rc1-bin/spark-3.2.2-bin-hadoop3.2.tgz.asc
dev/spark/v3.2.2-rc1-bin/spark-3.2.2-bin-hadoop3.2.tgz.sha512
dev/spark/v3.2.2-rc1-bin/spark-3.2.2-bin-without-hadoop.tgz   (with props)
dev/spark/v3.2.2-rc1-bin/spark-3.2.2-bin-without-hadoop.tgz.asc
dev/spark/v3.2.2-rc1-bin/spark-3.2.2-bin-without-hadoop.tgz.sha512
dev/spark/v3.2.2-rc1-bin/spark-3.2.2.tgz   (with props)
dev/spark/v3.2.2-rc1-bin/spark-3.2.2.tgz.asc
dev/spark/v3.2.2-rc1-bin/spark-3.2.2.tgz.sha512

Added: dev/spark/v3.2.2-rc1-bin/SparkR_3.2.2.tar.gz
==
Binary file - no diff available.

Propchange: dev/spark/v3.2.2-rc1-bin/SparkR_3.2.2.tar.gz
--
svn:mime-type = application/octet-stream

Added: dev/spark/v3.2.2-rc1-bin/SparkR_3.2.2.tar.gz.asc
==
--- dev/spark/v3.2.2-rc1-bin/SparkR_3.2.2.tar.gz.asc (added)
+++ dev/spark/v3.2.2-rc1-bin/SparkR_3.2.2.tar.gz.asc Mon Jul 11 16:18:32 2022
@@ -0,0 +1,17 @@
+-BEGIN PGP SIGNATURE-
+
+iQJIBAABCgAyFiEE8oycklwYjDXjRWFN7aAM6DTw/FwFAmLMSXIUHGRvbmdqb29u
+QGFwYWNoZS5vcmcACgkQ7aAM6DTw/FzJCg/9HYpnXfLcAC8uXxQhbVuJybax8Yok
+GgeH8+u0WpjlwOkpMYrdXqkXcN4Eo/ljYA3wWLbXVJ1maLlea1fdJSuM3F1Pjm+9
+WvrRE/pWAkhVP1YKse3mL6LPowHA1VGSpC6NL/YiJcFlrdr6Q9PdFzwgvysJeTgP
+TsxkbBGEzIhQBtOduSB3FrcY2btl9NseTVfmKFXDvdA8mulGCTiICmFBMK4BhXEw
+js+Qm7eoGlsbpV255hPvLP1nIvpwHDLxiflmUKb8q47lofj5dBkEn6nMtIIyCBAA
+RhtshzQkj0TD8KAh4BitHcHiDZxWu2GNe9rmDUws0uK+4qNBwehHWIGApO2ER3Sg
+en9AA+hMtYdebZML+yTsJd1Lfw3aDikgAQd0d8KMX6pp3j0IJw2dXLHjn79JzgQV
+J045fPkaaZduNHmqnQAYQWmZ5qcjWkRw7CU+sCmnoYYpn+DL2nRGNBQqA9AaFyWO
+12hKtjIr7yE1qxCkVp64cYsig3mbSV6MIkVzMgREhEg3d6XFrrAt0Oz3DIthVmY+
+2zwDJygUmyvoCViqAlYYE3skUUTQBzk2gigJi6bmPMePzAy1iYfBJmyBVWJULWOA
+7HLDg+AUUduOp/viO3bza4Wv3uOXHbkfs4OT1y3E0VUjAKEVCjdXFWiA922/iDMa
+bxexIJ/48VfLdBw=
+=kDyY
+-END PGP SIGNATURE-

Added: dev/spark/v3.2.2-rc1-bin/SparkR_3.2.2.tar.gz.sha512
==
--- dev/spark/v3.2.2-rc1-bin/SparkR_3.2.2.tar.gz.sha512 (added)
+++ dev/spark/v3.2.2-rc1-bin/SparkR_3.2.2.tar.gz.sha512 Mon Jul 11 16:18:32 2022
@@ -0,0 +1 @@
+3672add0d081f76b236e8e7a57209aed25ca3f677aab294ce35005482a25de39ab56760efe37c2ebfe012a99f0f3e647cb732af4002b89691eb2256ee68a8ee3
  SparkR_3.2.2.tar.gz

Added: dev/spark/v3.2.2-rc1-bin/pyspark-3.2.2.tar.gz
==
Binary file - no diff available.

Propchange: dev/spark/v3.2.2-rc1-bin/pyspark-3.2.2.tar.gz
--
svn:mime-type = application/octet-stream

Added: dev/spark/v3.2.2-rc1-bin/pyspark-3.2.2.tar.gz.asc
==
--- dev/spark/v3.2.2-rc1-bin/pyspark-3.2.2.tar.gz.asc (added)
+++ dev/spark/v3.2.2-rc1-bin/pyspark-3.2.2.tar.gz.asc Mon Jul 11 16:18:32 2022
@@ -0,0 +1,17 @@
+-BEGIN PGP SIGNATURE-
+
+iQJIBAABCgAyFiEE8oycklwYjDXjRWFN7aAM6DTw/FwFAmLMSXQUHGRvbmdqb29u
+QGFwYWNoZS5vcmcACgkQ7aAM6DTw/FzwKA/9EBe4Om2DMSFnErQBlNXn7392/LlU
+FcE76t5U2kIMUCGZyjXlOUdIPgPhQk5lZk2tgpZagDIqKE7AWsR3fWPiaXcTXxFv
+nEypXvn7cAJiSi0FEbKR/4nCG/RR0l5z8aXoyF0GhCLZkdG9nSTrvzb59VBcQ/cf
+WYesiC2mZDk78nEcuul0iW9batthuq4k7pUKrkYZKTbunmh6gQOnl/Pd0D+RELXM
+puFuwkbJ0YwOI45BPy604VYzUMztxZqX3D7FuJdya4+/TMFzPUAui6wIpMz/dkqG
+Ih2WpIO5MaBd1QhrlK4eRtdrIDFjLXK5gk/95hyvl4TXCYP9U05lWtumMnTIiTfW
+2y/Cy4g26cgJWSArIgysGRHCXl1+7bNvenJAWbRsiHYIBH3/tLv9HpndIfcro4m9
+1nbI0R3LXqR0khOTeJZx6dt4Y1Hl55fpCIfoD9rCSkrqSNJptVyGr3DVTTCG/wLb
+OZ7+mUGSfovpqvzbgTwIoOjzIYyOLDuvZn02vjV9ZJMrjLO49goTkAQ5rDBWDmFD
+QVJQajc31PwxKOWWsKWP/5xnaazF/0kNg20Xv4xUkLCMWktCfh2cpULghGTPjdz3
+qbUVJY3Js+9BTstOlzK4zH/kvGGnN5cmdHOC8u8q8hzbnzmsuCvGAzGIWRI0c+7v
+1B/NunjicGZdmJU=
+=sxHD
+-END PGP SIGNATURE-


[spark] branch branch-3.2 updated (ba978b3c533 -> 1cae70dc559)

2022-07-11 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git


from ba978b3c533 [SPARK-39099][BUILD] Add dependencies to Dockerfile for 
building Spark releases
 add 78a5825fe26 Preparing Spark release v3.2.2-rc1
 new 1cae70dc559 Preparing development version 3.2.3-SNAPSHOT

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 R/pkg/DESCRIPTION  | 2 +-
 assembly/pom.xml   | 2 +-
 common/kvstore/pom.xml | 2 +-
 common/network-common/pom.xml  | 2 +-
 common/network-shuffle/pom.xml | 2 +-
 common/network-yarn/pom.xml| 2 +-
 common/sketch/pom.xml  | 2 +-
 common/tags/pom.xml| 2 +-
 common/unsafe/pom.xml  | 2 +-
 core/pom.xml   | 2 +-
 docs/_config.yml   | 6 +++---
 examples/pom.xml   | 2 +-
 external/avro/pom.xml  | 2 +-
 external/docker-integration-tests/pom.xml  | 2 +-
 external/kafka-0-10-assembly/pom.xml   | 2 +-
 external/kafka-0-10-sql/pom.xml| 2 +-
 external/kafka-0-10-token-provider/pom.xml | 2 +-
 external/kafka-0-10/pom.xml| 2 +-
 external/kinesis-asl-assembly/pom.xml  | 2 +-
 external/kinesis-asl/pom.xml   | 2 +-
 external/spark-ganglia-lgpl/pom.xml| 2 +-
 graphx/pom.xml | 2 +-
 hadoop-cloud/pom.xml   | 2 +-
 launcher/pom.xml   | 2 +-
 mllib-local/pom.xml| 2 +-
 mllib/pom.xml  | 2 +-
 pom.xml| 2 +-
 python/pyspark/version.py  | 2 +-
 repl/pom.xml   | 2 +-
 resource-managers/kubernetes/core/pom.xml  | 2 +-
 resource-managers/kubernetes/integration-tests/pom.xml | 2 +-
 resource-managers/mesos/pom.xml| 2 +-
 resource-managers/yarn/pom.xml | 2 +-
 sql/catalyst/pom.xml   | 2 +-
 sql/core/pom.xml   | 2 +-
 sql/hive-thriftserver/pom.xml  | 2 +-
 sql/hive/pom.xml   | 2 +-
 streaming/pom.xml  | 2 +-
 tools/pom.xml  | 2 +-
 39 files changed, 41 insertions(+), 41 deletions(-)





[spark] 01/01: Preparing development version 3.2.3-SNAPSHOT

2022-07-11 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git

commit 1cae70dc559122da5cbf1227d9b3238623fd87fb
Author: Dongjoon Hyun 
AuthorDate: Mon Jul 11 15:13:06 2022 +

Preparing development version 3.2.3-SNAPSHOT
---
 R/pkg/DESCRIPTION  | 2 +-
 assembly/pom.xml   | 2 +-
 common/kvstore/pom.xml | 2 +-
 common/network-common/pom.xml  | 2 +-
 common/network-shuffle/pom.xml | 2 +-
 common/network-yarn/pom.xml| 2 +-
 common/sketch/pom.xml  | 2 +-
 common/tags/pom.xml| 2 +-
 common/unsafe/pom.xml  | 2 +-
 core/pom.xml   | 2 +-
 docs/_config.yml   | 6 +++---
 examples/pom.xml   | 2 +-
 external/avro/pom.xml  | 2 +-
 external/docker-integration-tests/pom.xml  | 2 +-
 external/kafka-0-10-assembly/pom.xml   | 2 +-
 external/kafka-0-10-sql/pom.xml| 2 +-
 external/kafka-0-10-token-provider/pom.xml | 2 +-
 external/kafka-0-10/pom.xml| 2 +-
 external/kinesis-asl-assembly/pom.xml  | 2 +-
 external/kinesis-asl/pom.xml   | 2 +-
 external/spark-ganglia-lgpl/pom.xml| 2 +-
 graphx/pom.xml | 2 +-
 hadoop-cloud/pom.xml   | 2 +-
 launcher/pom.xml   | 2 +-
 mllib-local/pom.xml| 2 +-
 mllib/pom.xml  | 2 +-
 pom.xml| 2 +-
 python/pyspark/version.py  | 2 +-
 repl/pom.xml   | 2 +-
 resource-managers/kubernetes/core/pom.xml  | 2 +-
 resource-managers/kubernetes/integration-tests/pom.xml | 2 +-
 resource-managers/mesos/pom.xml| 2 +-
 resource-managers/yarn/pom.xml | 2 +-
 sql/catalyst/pom.xml   | 2 +-
 sql/core/pom.xml   | 2 +-
 sql/hive-thriftserver/pom.xml  | 2 +-
 sql/hive/pom.xml   | 2 +-
 streaming/pom.xml  | 2 +-
 tools/pom.xml  | 2 +-
 39 files changed, 41 insertions(+), 41 deletions(-)

diff --git a/R/pkg/DESCRIPTION b/R/pkg/DESCRIPTION
index 5590c862921..46e5717e10c 100644
--- a/R/pkg/DESCRIPTION
+++ b/R/pkg/DESCRIPTION
@@ -1,6 +1,6 @@
 Package: SparkR
 Type: Package
-Version: 3.2.2
+Version: 3.2.3
 Title: R Front End for 'Apache Spark'
 Description: Provides an R Front end for 'Apache Spark' 
.
 Authors@R: c(person("Shivaram", "Venkataraman", role = "aut",
diff --git a/assembly/pom.xml b/assembly/pom.xml
index 7248e549a9c..bb0ae44676d 100644
--- a/assembly/pom.xml
+++ b/assembly/pom.xml
@@ -21,7 +21,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.2.2
+3.2.3-SNAPSHOT
 ../pom.xml
   
 
diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml
index 42f54b5f110..834cb79a5e0 100644
--- a/common/kvstore/pom.xml
+++ b/common/kvstore/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.2.2
+3.2.3-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml
index d74828a18b0..712069007c9 100644
--- a/common/network-common/pom.xml
+++ b/common/network-common/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.2.2
+3.2.3-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml
index 7e73e2e1bc4..f673dfb29b3 100644
--- a/common/network-shuffle/pom.xml
+++ b/common/network-shuffle/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.2.2
+3.2.3-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml
index f7e6a344d7d..d413da51414 100644
--- a/common/network-yarn/pom.xml
+++ b/common/network-yarn/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.2.2
+3.2.3-SNAPSHOT
 ../../pom.xml
   
 
diff --git a/common/sketch/pom.xml b/common/sketch/pom.xml
index e57b79ef33f..17596971dfd 100644
--- a/common/sketch/pom.xml
+++ b/common/sketch/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.2.2
+3.2.3-SNAPSHOT
 ../../pom.xml
   
 
diff 

[spark] tag v3.2.2-rc1 created (now 78a5825fe26)

2022-07-11 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to tag v3.2.2-rc1
in repository https://gitbox.apache.org/repos/asf/spark.git


  at 78a5825fe26 (commit)
This tag includes the following new commits:

 new 78a5825fe26 Preparing Spark release v3.2.2-rc1

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.






[spark] 01/01: Preparing Spark release v3.2.2-rc1

2022-07-11 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to tag v3.2.2-rc1
in repository https://gitbox.apache.org/repos/asf/spark.git

commit 78a5825fe266c0884d2dd18cbca9625fa258d7f7
Author: Dongjoon Hyun 
AuthorDate: Mon Jul 11 15:13:02 2022 +

Preparing Spark release v3.2.2-rc1
---
 assembly/pom.xml   | 2 +-
 common/kvstore/pom.xml | 2 +-
 common/network-common/pom.xml  | 2 +-
 common/network-shuffle/pom.xml | 2 +-
 common/network-yarn/pom.xml| 2 +-
 common/sketch/pom.xml  | 2 +-
 common/tags/pom.xml| 2 +-
 common/unsafe/pom.xml  | 2 +-
 core/pom.xml   | 2 +-
 docs/_config.yml   | 2 +-
 examples/pom.xml   | 2 +-
 external/avro/pom.xml  | 2 +-
 external/docker-integration-tests/pom.xml  | 2 +-
 external/kafka-0-10-assembly/pom.xml   | 2 +-
 external/kafka-0-10-sql/pom.xml| 2 +-
 external/kafka-0-10-token-provider/pom.xml | 2 +-
 external/kafka-0-10/pom.xml| 2 +-
 external/kinesis-asl-assembly/pom.xml  | 2 +-
 external/kinesis-asl/pom.xml   | 2 +-
 external/spark-ganglia-lgpl/pom.xml| 2 +-
 graphx/pom.xml | 2 +-
 hadoop-cloud/pom.xml   | 2 +-
 launcher/pom.xml   | 2 +-
 mllib-local/pom.xml| 2 +-
 mllib/pom.xml  | 2 +-
 pom.xml| 2 +-
 python/pyspark/version.py  | 2 +-
 repl/pom.xml   | 2 +-
 resource-managers/kubernetes/core/pom.xml  | 2 +-
 resource-managers/kubernetes/integration-tests/pom.xml | 2 +-
 resource-managers/mesos/pom.xml| 2 +-
 resource-managers/yarn/pom.xml | 2 +-
 sql/catalyst/pom.xml   | 2 +-
 sql/core/pom.xml   | 2 +-
 sql/hive-thriftserver/pom.xml  | 2 +-
 sql/hive/pom.xml   | 2 +-
 streaming/pom.xml  | 2 +-
 tools/pom.xml  | 2 +-
 38 files changed, 38 insertions(+), 38 deletions(-)

diff --git a/assembly/pom.xml b/assembly/pom.xml
index 9584884b003..7248e549a9c 100644
--- a/assembly/pom.xml
+++ b/assembly/pom.xml
@@ -21,7 +21,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.2.2-SNAPSHOT
+3.2.2
 ../pom.xml
   
 
diff --git a/common/kvstore/pom.xml b/common/kvstore/pom.xml
index 167e69f9e6e..42f54b5f110 100644
--- a/common/kvstore/pom.xml
+++ b/common/kvstore/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.2.2-SNAPSHOT
+3.2.2
 ../../pom.xml
   
 
diff --git a/common/network-common/pom.xml b/common/network-common/pom.xml
index eaf1c1e12e2..d74828a18b0 100644
--- a/common/network-common/pom.xml
+++ b/common/network-common/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.2.2-SNAPSHOT
+3.2.2
 ../../pom.xml
   
 
diff --git a/common/network-shuffle/pom.xml b/common/network-shuffle/pom.xml
index 811e5035c46..7e73e2e1bc4 100644
--- a/common/network-shuffle/pom.xml
+++ b/common/network-shuffle/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.2.2-SNAPSHOT
+3.2.2
 ../../pom.xml
   
 
diff --git a/common/network-yarn/pom.xml b/common/network-yarn/pom.xml
index 23513f60439..f7e6a344d7d 100644
--- a/common/network-yarn/pom.xml
+++ b/common/network-yarn/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.2.2-SNAPSHOT
+3.2.2
 ../../pom.xml
   
 
diff --git a/common/sketch/pom.xml b/common/sketch/pom.xml
index c5c61611440..e57b79ef33f 100644
--- a/common/sketch/pom.xml
+++ b/common/sketch/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.2.2-SNAPSHOT
+3.2.2
 ../../pom.xml
   
 
diff --git a/common/tags/pom.xml b/common/tags/pom.xml
index cffc824aa41..1b3438fbb6c 100644
--- a/common/tags/pom.xml
+++ b/common/tags/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-3.2.2-SNAPSHOT
+3.2.2
 ../../pom.xml
   
 
diff --git a/common/unsafe/pom.xml b/common/unsafe/pom.xml
index 993d0f32ed9..f3d07103e52 100644
--- a/common/unsafe/pom.xml
+++ b/common/unsafe/pom.xml
@@ -22,7 +22,7 @@
   
 org.apache.spark
 spark-parent_2.12
-  

[spark] branch master updated (3bf07190510 -> 9c5c21ccc9c)

2022-07-11 Thread yumwang
This is an automated email from the ASF dual-hosted git repository.

yumwang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from 3bf07190510 [SPARK-39735][INFRA] Move image condition to jobs to make 
non-master schedule job work
 add 9c5c21ccc9c [SPARK-39667][SQL] Add another workaround when there is 
not enough memory to build and broadcast the table

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/sql/errors/QueryExecutionErrors.scala  | 13 ++---
 .../sql/execution/exchange/BroadcastExchangeExec.scala  |  5 +++--
 2 files changed, 13 insertions(+), 5 deletions(-)





[spark] branch master updated: [SPARK-39735][INFRA] Move image condition to jobs to make non-master schedule job work

2022-07-11 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 3bf07190510 [SPARK-39735][INFRA] Move image condition to jobs to make 
non-master schedule job work
3bf07190510 is described below

commit 3bf07190510e1dc9a882914204995a9b63a79d60
Author: Yikun Jiang 
AuthorDate: Mon Jul 11 22:08:07 2022 +0900

[SPARK-39735][INFRA] Move image condition to jobs to make non-master 
schedule job work

### What changes were proposed in this pull request?
Currently, we only enable the infra image when `inputs.branch == 'master'`; otherwise the infra-image job is skipped, and branch schedule jobs fail due to an empty container name.

### Why are the changes needed?
We need to move the image condition into the jobs so that branch schedule jobs work.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
CI passed
- base image push the [right 
image](https://github.com/Yikun/spark/runs/7280303774?check_suite_focus=true#step:7:209
 )
- precondition generate the [right image 
url](https://github.com/Yikun/spark/runs/7280249393?check_suite_focus=true#step:5:4)
- pyspark use [right image 
url](https://github.com/Yikun/spark/runs/7280366394?check_suite_focus=true#step:2:19)
- lint use [right 
image:](https://github.com/Yikun/spark/runs/7280366124?check_suite_focus=true#step:2:19)

Closes #37155 from Yikun/SPARK-39736-precondition.

Authored-by: Yikun Jiang 
Signed-off-by: Hyukjin Kwon 
---
 .github/workflows/build_and_test.yml | 33 -
 1 file changed, 16 insertions(+), 17 deletions(-)

diff --git a/.github/workflows/build_and_test.yml 
b/.github/workflows/build_and_test.yml
index febc6383c2e..4949c6159d4 100644
--- a/.github/workflows/build_and_test.yml
+++ b/.github/workflows/build_and_test.yml
@@ -56,6 +56,11 @@ jobs:
   GITHUB_PREV_SHA: ${{ github.event.before }}
 outputs:
   required: ${{ steps.set-outputs.outputs.required }}
+  image_url: >-
+${{
+  (inputs.branch == 'master' && 
steps.infra-image-outputs.outputs.image_url)
+  || 'dongjoon/apache-spark-github-action-image:20220207'
+}}
 steps:
 - name: Checkout Spark repository
   uses: actions/checkout@v2
@@ -108,6 +113,14 @@ jobs:
   precondition="${precondition//$'\n'/'%0A'}"
   echo "::set-output name=required::$precondition"
 fi
+- name: Generate infra image URL
+  id: infra-image-outputs
+  run: |
+# Convert to lowercase to meet Docker repo name requirement
+REPO_OWNER=$(echo "${{ github.repository_owner }}" | tr '[:upper:]' 
'[:lower:]')
+IMG_NAME="apache-spark-ci-image:${{ inputs.branch }}-${{ github.run_id 
}}"
+IMG_URL="ghcr.io/$REPO_OWNER/$IMG_NAME"
+echo ::set-output name=image_url::$IMG_URL
 
   # Build: build Spark and run the tests for specified modules.
   build:
@@ -260,21 +273,7 @@ jobs:
   fromJson(needs.precondition.outputs.required).lint == 'true') &&
   inputs.branch == 'master'
 runs-on: ubuntu-latest
-outputs:
-  image_url: >-
-${{
-  (inputs.branch == 'master' && 
steps.infra-image-outputs.outputs.image_url)
-  || 'dongjoon/apache-spark-github-action-image:20220207'
-}}
 steps:
-  - name: Generate image name and url
-id: infra-image-outputs
-run: |
-  # Convert to lowercase to meet docker repo name requirement
-  REPO_OWNER=$(echo "${{ github.repository_owner }}" | tr '[:upper:]' 
'[:lower:]')
-  IMG_NAME="apache-spark-ci-image:${{ inputs.branch }}-${{ 
github.run_id }}"
-  IMG_URL="ghcr.io/$REPO_OWNER/$IMG_NAME"
-  echo ::set-output name=image_url::$IMG_URL
   - name: Login to GitHub Container Registry
 uses: docker/login-action@v2
 with:
@@ -306,7 +305,7 @@ jobs:
   context: ./dev/infra/
   push: true
   tags: |
-${{ steps.infra-image-outputs.outputs.image_url }}
+${{ needs.precondition.outputs.image_url }}
   # Use the infra image cache to speed up
   cache-from: 
type=registry,ref=ghcr.io/apache/spark/apache-spark-github-action-image-cache:${{
 inputs.branch }}
 
@@ -317,7 +316,7 @@ jobs:
 name: "Build modules: ${{ matrix.modules }}"
 runs-on: ubuntu-20.04
 container:
-  image: ${{ needs.infra-image.outputs.image_url }}
+  image: ${{ needs.precondition.outputs.image_url }}
 strategy:
   fail-fast: false
   matrix:
@@ -498,7 +497,7 @@ jobs:
   PYSPARK_DRIVER_PYTHON: python3.9
   PYSPARK_PYTHON: python3.9
 container:
-  image: ${{ needs.infra-image.outputs.image_url }}
+  image: ${{ needs.precondition.outputs.image_url }}
 steps:
 

[spark] branch master updated: [SQL][MINOR] Move general char/varchar test to the base test suite

2022-07-11 Thread maxgekk
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 61233769ffa [SQL][MINOR] Move general char/varchar test to the base 
test suite
61233769ffa is described below

commit 61233769ffabbbed243bc8c65b7a788c04e57244
Author: Wenchen Fan 
AuthorDate: Mon Jul 11 13:28:15 2022 +0300

[SQL][MINOR] Move general char/varchar test to the base test suite

### What changes were proposed in this pull request?

This is a followup of https://github.com/apache/spark/pull/32501 . It moves 
a general char/varchar test from file source suite to the base char/varchar 
suite, so that it will be verified in all table formats, including v2.

### Why are the changes needed?

improve test coverage

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

N/A

Closes #37152 from cloud-fan/minor.

Authored-by: Wenchen Fan 
Signed-off-by: Max Gekk 
---
 .../apache/spark/sql/CharVarcharTestSuite.scala| 25 +++---
 1 file changed, 12 insertions(+), 13 deletions(-)

diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/CharVarcharTestSuite.scala 
b/sql/core/src/test/scala/org/apache/spark/sql/CharVarcharTestSuite.scala
index 978e3f8d36d..321a838f276 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/CharVarcharTestSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/CharVarcharTestSuite.scala
@@ -672,6 +672,18 @@ trait CharVarcharTestSuite extends QueryTest with 
SQLTestUtils {
   }
 }
   }
+
+  test("SPARK-35359: create table and insert data over length values") {
+Seq("char", "varchar").foreach { typ =>
+  withSQLConf((SQLConf.LEGACY_CHAR_VARCHAR_AS_STRING.key, "true")) {
+withTable("t") {
+  sql(s"CREATE TABLE t (col $typ(2)) using $format")
+  sql("INSERT INTO t SELECT 'aaa'")
+  checkAnswer(sql("select * from t"), Row("aaa"))
+}
+  }
+}
+  }
 }
 
 // Some basic char/varchar tests which doesn't rely on table implementation.
@@ -799,7 +811,6 @@ class FileSourceCharVarcharTestSuite extends 
CharVarcharTestSuite with SharedSpa
 withTable("t") {
   sql("SELECT '12' as col").write.format(format).save(dir.toString)
   sql(s"CREATE TABLE t (col $typ(2)) using $format LOCATION '$dir'")
-  val df = sql("select * from t")
   checkAnswer(sql("select * from t"), Row("12"))
 }
   }
@@ -818,18 +829,6 @@ class FileSourceCharVarcharTestSuite extends 
CharVarcharTestSuite with SharedSpa
 }
   }
 
-  test("SPARK-35359: create table and insert data over length values") {
-Seq("char", "varchar").foreach { typ =>
-  withSQLConf((SQLConf.LEGACY_CHAR_VARCHAR_AS_STRING.key, "true")) {
-withTable("t") {
-  sql(s"CREATE TABLE t (col $typ(2)) using $format")
-  sql("INSERT INTO t SELECT 'aaa'")
-  checkAnswer(sql("select * from t"), Row("aaa"))
-}
-  }
-}
-  }
-
   test("alter table set location w/ fit length values") {
 Seq("char", "varchar").foreach { typ =>
   withTempPath { dir =>





[spark] branch master updated: [MINOR][SQL] Add docstring for function pyspark.sql.functions.timestamp_seconds

2022-07-11 Thread maxgekk
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 775b5ced70c [MINOR][SQL] Add docstring for function 
pyspark.sql.functions.timestamp_seconds
775b5ced70c is described below

commit 775b5ced70c87d6a8709f44eb4a5de48c286a51d
Author: moritzkoerber 
AuthorDate: Mon Jul 11 13:26:42 2022 +0300

[MINOR][SQL] Add docstring for function 
pyspark.sql.functions.timestamp_seconds

### What changes were proposed in this pull request?
The documentation of the function `pyspark.sql.functions.timestamp_seconds` 
currently features an example but no text describing the function. This PR adds 
the missing text based on the docstring of 
`pyspark.sql.functions.from_unixtime`.
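
For comparison, a minimal Scala sketch of the same behaviour using `org.apache.spark.sql.functions.timestamp_seconds` (the epoch value mirrors the doctest added in this patch; the local session setup is assumed):
```
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{lit, timestamp_seconds}

val spark = SparkSession.builder().master("local[1]").getOrCreate()
spark.conf.set("spark.sql.session.timeZone", "UTC")

// 1230219000 seconds after 1970-01-01T00:00:00Z is 2008-12-25 15:30:00 UTC.
spark.range(1).select(timestamp_seconds(lit(1230219000L)).alias("ts")).show()
```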

### Why are the changes needed?
The docstring is currently missing.

### Does this PR introduce _any_ user-facing change?
Yes, this PR adds a docstring that will be published in Spark's 
documentation.

### How was this patch tested?
No tests were added because this PR just adds text in a docstring.

Closes #36944 from moritzkoerber/add-timestamps_seconds-docstring.

Lead-authored-by: moritzkoerber 
Co-authored-by: Moritz Körber 
Signed-off-by: Max Gekk 
---
 R/pkg/R/functions.R  |  6 --
 python/pyspark/sql/functions.py  | 10 --
 sql/core/src/main/scala/org/apache/spark/sql/functions.scala |  3 ++-
 3 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/R/pkg/R/functions.R b/R/pkg/R/functions.R
index 1377f0daa73..d772c9bd4e4 100644
--- a/R/pkg/R/functions.R
+++ b/R/pkg/R/functions.R
@@ -3256,7 +3256,8 @@ setMethod("format_string", signature(format = 
"character", x = "Column"),
 #' tmp <- mutate(df, to_unix = unix_timestamp(df$time),
 #'   to_unix2 = unix_timestamp(df$time, '-MM-dd HH'),
 #'   from_unix = from_unixtime(unix_timestamp(df$time)),
-#'   from_unix2 = from_unixtime(unix_timestamp(df$time), 
'-MM-dd HH:mm'))
+#'   from_unix2 = from_unixtime(unix_timestamp(df$time), 
'-MM-dd HH:mm'),
+#'   timestamp_from_unix = 
timestamp_seconds(unix_timestamp(df$time)))
 #' head(tmp)}
 #' @note from_unixtime since 1.5.0
 setMethod("from_unixtime", signature(x = "Column"),
@@ -4854,7 +4855,8 @@ setMethod("current_timestamp",
   })
 
 #' @details
-#' \code{timestamp_seconds}: Creates timestamp from the number of seconds 
since UTC epoch.
+#' \code{timestamp_seconds}: Converts the number of seconds from the Unix epoch
+#' (1970-01-01T00:00:00Z) to a timestamp.
 #'
 #' @rdname column_datetime_functions
 #' @aliases timestamp_seconds timestamp_seconds,Column-method
diff --git a/python/pyspark/sql/functions.py b/python/pyspark/sql/functions.py
index 3112690cc68..db99dbfc400 100644
--- a/python/pyspark/sql/functions.py
+++ b/python/pyspark/sql/functions.py
@@ -2519,19 +2519,25 @@ def to_utc_timestamp(timestamp: "ColumnOrName", tz: 
"ColumnOrName") -> Column:
 
 def timestamp_seconds(col: "ColumnOrName") -> Column:
 """
+Converts the number of seconds from the Unix epoch (1970-01-01T00:00:00Z)
+to a timestamp.
+
 .. versionadded:: 3.1.0
 
 Examples
 
 >>> from pyspark.sql.functions import timestamp_seconds
->>> spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles")
+>>> spark.conf.set("spark.sql.session.timeZone", "UTC")
 >>> time_df = spark.createDataFrame([(1230219000,)], ['unix_time'])
 >>> time_df.select(timestamp_seconds(time_df.unix_time).alias('ts')).show()
 +---+
 | ts|
 +---+
-|2008-12-25 07:30:00|
+|2008-12-25 15:30:00|
 +---+
+>>> 
time_df.select(timestamp_seconds('unix_time').alias('ts')).printSchema()
+root
+ |-- ts: timestamp (nullable = true)
 >>> spark.conf.unset("spark.sql.session.timeZone")
 """
 
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/functions.scala 
b/sql/core/src/main/scala/org/apache/spark/sql/functions.scala
index 814a2e472f7..c056baba8ba 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/functions.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/functions.scala
@@ -3826,7 +3826,8 @@ object functions {
   }
 
   /**
-   * Creates timestamp from the number of seconds since UTC epoch.
+   * Converts the number of seconds from the Unix epoch (1970-01-01T00:00:00Z)
+   * to a timestamp.
* @group datetime_funcs
* @since 3.1.0
*/





[spark] branch master updated (a68bcc62244 -> 85f422fc142)

2022-07-11 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from a68bcc62244 [SPARK-39728][PYTHON] Add explicit PySpark SQL function 
parity check
 add 85f422fc142 [SPARK-39739][BUILD] Upgrade sbt to 1.7.0

No new revisions were added by this update.

Summary of changes:
 dev/appveyor-install-dependencies.ps1 | 2 +-
 project/build.properties  | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)





[spark] branch master updated: [SPARK-39728][PYTHON] Add explicit PySpark SQL function parity check

2022-07-11 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new a68bcc62244 [SPARK-39728][PYTHON] Add explicit PySpark SQL function 
parity check
a68bcc62244 is described below

commit a68bcc622446fa85414286da9563da3bcdf1fbaa
Author: Andrew Ray 
AuthorDate: Mon Jul 11 16:30:45 2022 +0900

[SPARK-39728][PYTHON] Add explicit PySpark SQL function parity check

### What changes were proposed in this pull request?

This PR adds a test that compares the available list of Python DataFrame functions in
pyspark.sql.functions with those available in the Scala/Java DataFrame API in
org.apache.spark.sql.functions. If a function is added to only one but not the other,
this test will fail until its exclusions are updated.

### Why are the changes needed?

Currently there is no easy way to verify which functions are missing from the Python DataFrame API.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

This PR is testing only

Closes #37144 from aray/python-function-parity-test.

Authored-by: Andrew Ray 
Signed-off-by: Hyukjin Kwon 
---
 python/pyspark/sql/tests/test_functions.py | 56 ++
 1 file changed, 56 insertions(+)

diff --git a/python/pyspark/sql/tests/test_functions.py b/python/pyspark/sql/tests/test_functions.py
index 5c6acaffa32..5091fa711a8 100644
--- a/python/pyspark/sql/tests/test_functions.py
+++ b/python/pyspark/sql/tests/test_functions.py
@@ -16,6 +16,7 @@
 #
 
 import datetime
+from inspect import getmembers, isfunction
 from itertools import chain
 import re
 import math
@@ -51,10 +52,65 @@ from pyspark.sql.functions import (
 slice,
 least,
 )
+from pyspark.sql import functions
 from pyspark.testing.sqlutils import ReusedSQLTestCase, SQLTestUtils
 
 
 class FunctionsTests(ReusedSQLTestCase):
+    def test_function_parity(self):
+        # This test compares the available list of functions in pyspark.sql.functions with those
+        # available in the Scala/Java DataFrame API in org.apache.spark.sql.functions.
+        #
+        # NOTE FOR DEVELOPERS:
+        # If this test fails one of the following needs to happen
+        # * If a function was added to org.apache.spark.sql.functions it either needs to be added to
+        #   pyspark.sql.functions or added to the below expected_missing_in_py set.
+        # * If a function was added to pyspark.sql.functions that was already in
+        #   org.apache.spark.sql.functions then it needs to be removed from expected_missing_in_py
+        #   below. If the function has a different name it needs to be added to the py_equiv_jvm
+        #   mapping.
+        # * If it's not related to an added/removed function then likely the exclusion list
+        #   jvm_excluded_fn needs to be updated.
+
+        jvm_fn_set = {name for (name, value) in getmembers(self.sc._jvm.functions)}
+        py_fn_set = {name for (name, value) in getmembers(functions, isfunction) if name[0] != "_"}
+
+        # Functions on the JVM side we do not expect to be available in python because they are
+        # deprecated, irrelevant to python, or have equivalents.
+        jvm_excluded_fn = [
+            "callUDF",  # deprecated, use call_udf
+            "typedlit",  # Scala only
+            "typedLit",  # Scala only
+            "monotonicallyIncreasingId",  # deprecated, use monotonically_increasing_id
+            "negate",  # equivalent to python -expression
+            "not",  # equivalent to python ~expression
+            "udaf",  # used for creating UDAF's which are not supported in PySpark
+        ]
+
+        jvm_fn_set.difference_update(jvm_excluded_fn)
+
+        # For functions that are named differently in pyspark this is the mapping of their
+        # python name to the JVM equivalent
+        py_equiv_jvm = {"create_map": "map"}
+        for py_name, jvm_name in py_equiv_jvm.items():
+            if py_name in py_fn_set:
+                py_fn_set.remove(py_name)
+                py_fn_set.add(jvm_name)
+
+        missing_in_py = jvm_fn_set.difference(py_fn_set)
+
+        # Functions that we expect to be missing in python until they are added to pyspark
+        expected_missing_in_py = {
+            "call_udf",  # TODO(SPARK-39734)
+            "localtimestamp",  # TODO(SPARK-36259)
+            "map_contains_key",  # TODO(SPARK-39733)
+            "pmod",  # TODO(SPARK-37348)
+        }
+
+        self.assertEqual(
+            expected_missing_in_py, missing_in_py, "Missing functions in pyspark not as expected"
+        )
+
     def test_explode(self):
         from pyspark.sql.functions import explode, explode_outer, posexplode_outer
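For readers who want to reproduce the comparison interactively, a condensed sketch of the set arithmetic the new test performs (a minimal illustration, not the committed test; it assumes an active SparkContext bound to `sc` so the JVM-side org.apache.spark.sql.functions object can be reflected over via py4j):

    from inspect import getmembers, isfunction
    from pyspark.sql import functions

    # Names exposed by the JVM functions object vs. public Python functions
    jvm_fn_set = {name for name, _ in getmembers(sc._jvm.functions)}
    py_fn_set = {name for name, _ in getmembers(functions, isfunction) if not name.startswith("_")}

    print(sorted(jvm_fn_set - py_fn_set))  # candidates missing on the Python side
    print(sorted(py_fn_set - jvm_fn_set))  # Python-only (or differently named) helpers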
 



[spark] branch master updated (a1cc5d60c5f -> 49143acb97b)

2022-07-11 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from a1cc5d60c5f [SPARK-39735][INFRA] Enable base image build in lint job
 add 49143acb97b [MINOR] Remove redundant return

No new revisions were added by this update.

Summary of changes:
 core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala   | 2 +-
 .../scala/org/apache/spark/serializer/SerializationDebugger.scala | 8 
 .../scala/org/apache/spark/shuffle/sort/SortShuffleWriter.scala   | 4 ++--
 core/src/main/scala/org/apache/spark/storage/BlockManager.scala   | 2 +-
 .../org/apache/spark/storage/BlockManagerDecommissioner.scala | 2 +-
 core/src/main/scala/org/apache/spark/util/SizeEstimator.scala | 2 +-
 .../main/scala/org/apache/spark/repl/ExecutorClassLoader.scala| 8 
 7 files changed, 14 insertions(+), 14 deletions(-)





[spark] branch master updated (d5e9c5801cb -> a1cc5d60c5f)

2022-07-11 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


from d5e9c5801cb [SPARK-39719][R] Implement databaseExists/getDatabase in SparkR support 3L namespace
 add a1cc5d60c5f [SPARK-39735][INFRA] Enable base image build in lint job

No new revisions were added by this update.

Summary of changes:
 .github/workflows/build_and_test.yml | 25 ++---
 dev/infra/Dockerfile |  3 +++
 2 files changed, 17 insertions(+), 11 deletions(-)





[spark] branch master updated: [SPARK-39719][R] Implement databaseExists/getDatabase in SparkR support 3L namespace

2022-07-11 Thread ruifengz
This is an automated email from the ASF dual-hosted git repository.

ruifengz pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new d5e9c5801cb [SPARK-39719][R] Implement databaseExists/getDatabase in SparkR support 3L namespace
d5e9c5801cb is described below

commit d5e9c5801cb1d0c8cb545b679261bd94b5ae0280
Author: Ruifeng Zheng 
AuthorDate: Mon Jul 11 14:23:12 2022 +0800

[SPARK-39719][R] Implement databaseExists/getDatabase in SparkR support 3L namespace

### What changes were proposed in this pull request?
1. add `databaseExists`/`getDatabase`
2. make sure `listTables` supports the 3L namespace
3. modify the SparkR-specific catalog methods `tables` and `tableNames` to support the 3L namespace

### Why are the changes needed?
To support the three-level (catalog.database.table) namespace in SparkR.

### Does this PR introduce _any_ user-facing change?
Yes

### How was this patch tested?
updated UT and manual check
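
For readers more familiar with the Python API, the SparkR additions mirror the existing PySpark catalog calls; a minimal, hedged sketch (assumes an active SparkSession bound to `spark` on Spark 3.4+, where catalog-qualified names are accepted):

    # Check for a database and fetch its metadata, using a catalog-qualified name
    spark.catalog.databaseExists("spark_catalog.default")   # -> True
    db = spark.catalog.getDatabase("spark_catalog.default")
    print(db.name, db.locationUri)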

Closes #37132 from zhengruifeng/r_3L_dbname.

Authored-by: Ruifeng Zheng 
Signed-off-by: Ruifeng Zheng 
---
 R/pkg/NAMESPACE|  2 +
 R/pkg/R/catalog.R  | 72 +-
 R/pkg/pkgdown/_pkgdown_template.yml|  2 +
 R/pkg/tests/fulltests/test_sparkSQL.R  | 34 +-
 .../org/apache/spark/sql/api/r/SQLUtils.scala  |  2 +-
 5 files changed, 107 insertions(+), 5 deletions(-)

diff --git a/R/pkg/NAMESPACE b/R/pkg/NAMESPACE
index f5f60ecf134..3937791421a 100644
--- a/R/pkg/NAMESPACE
+++ b/R/pkg/NAMESPACE
@@ -476,8 +476,10 @@ export("as.DataFrame",
"createTable",
"currentCatalog",
"currentDatabase",
+   "databaseExists",
"dropTempTable",
"dropTempView",
+   "getDatabase",
"getTable",
"listCatalogs",
"listColumns",
diff --git a/R/pkg/R/catalog.R b/R/pkg/R/catalog.R
index 8237ac26b33..680415ea6cd 100644
--- a/R/pkg/R/catalog.R
+++ b/R/pkg/R/catalog.R
@@ -278,13 +278,14 @@ dropTempView <- function(viewName) {
 #' Returns a SparkDataFrame containing names of tables in the given database.
 #'
 #' @param databaseName (optional) name of the database
+#'        The database name can be qualified with catalog name since 3.4.0.
 #' @return a SparkDataFrame
 #' @rdname tables
 #' @seealso \link{listTables}
 #' @examples
 #'\dontrun{
 #' sparkR.session()
-#' tables("hive")
+#' tables("spark_catalog.hive")
 #' }
 #' @name tables
 #' @note tables since 1.4.0
@@ -298,12 +299,13 @@ tables <- function(databaseName = NULL) {
 #' Returns the names of tables in the given database as an array.
 #'
 #' @param databaseName (optional) name of the database
+#'        The database name can be qualified with catalog name since 3.4.0.
 #' @return a list of table names
 #' @rdname tableNames
 #' @examples
 #'\dontrun{
 #' sparkR.session()
-#' tableNames("hive")
+#' tableNames("spark_catalog.hive")
 #' }
 #' @name tableNames
 #' @note tableNames since 1.4.0
@@ -356,6 +358,28 @@ setCurrentDatabase <- function(databaseName) {
   invisible(handledCallJMethod(catalog, "setCurrentDatabase", databaseName))
 }
 
+#' Checks if the database with the specified name exists.
+#'
+#' Checks if the database with the specified name exists.
+#'
+#' @param databaseName name of the database, allowed to be qualified with catalog name
+#' @rdname databaseExists
+#' @name databaseExists
+#' @examples
+#' \dontrun{
+#' sparkR.session()
+#' databaseExists("spark_catalog.default")
+#' }
+#' @note since 3.4.0
+databaseExists <- function(databaseName) {
+  sparkSession <- getSparkSession()
+  if (class(databaseName) != "character") {
+stop("databaseName must be a string.")
+  }
+  catalog <- callJMethod(sparkSession, "catalog")
+  callJMethod(catalog, "databaseExists", databaseName)
+}
+
 #' Returns a list of databases available
 #'
 #' Returns a list of databases available.
@@ -375,12 +399,54 @@ listDatabases <- function() {
   dataFrame(callJMethod(callJMethod(catalog, "listDatabases"), "toDF"))
 }
 
+#' Get the database with the specified name
+#'
+#' Get the database with the specified name
+#'
+#' @param databaseName name of the database, allowed to be qualified with catalog name
+#' @return A named list.
+#' @rdname getDatabase
+#' @name getDatabase
+#' @examples
+#' \dontrun{
+#' sparkR.session()
+#' db <- getDatabase("default")
+#' }
+#' @note since 3.4.0
+getDatabase <- function(databaseName) {
+  sparkSession <- getSparkSession()
+  if (class(databaseName) != "character") {
+stop("databaseName must be a string.")
+  }
+  catalog <- callJMethod(sparkSession, "catalog")
+  jdb <- handledCallJMethod(catalog, "getDatabase", databaseName)
+
+  ret <- list(name = callJMethod(jdb, "name"))
+  jcata <- callJMethod(jdb, "catalog")
+  if (is.null(jcata)) {
+ret$catalog <- NA
+  } else {