[spark] branch branch-3.2 updated: [SPARK-38487][PYTHON][DOC] Fix docstrings of nlargest/nsmallest of DataFrame

2022-03-09 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.2 by this push:
 new e683932  [SPARK-38487][PYTHON][DOC] Fix docstrings of nlargest/nsmallest of DataFrame
e683932 is described below

commit e683932495fae444b2c17a755d9a660a6c2d63ef
Author: Xinrong Meng 
AuthorDate: Thu Mar 10 15:32:48 2022 +0900

[SPARK-38487][PYTHON][DOC] Fix docstrings of nlargest/nsmallest of DataFrame

### What changes were proposed in this pull request?
Fix docstrings of nlargest/nsmallest of DataFrame

### Why are the changes needed?
To make the docstrings less confusing.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Manual test.

Closes #35793 from xinrong-databricks/frame.ntop.

Authored-by: Xinrong Meng 
Signed-off-by: Hyukjin Kwon 
(cherry picked from commit c483e2977cbc6ae33d999c9c9d1dbacd9c53d85a)
Signed-off-by: Hyukjin Kwon 
---
 python/pyspark/pandas/frame.py | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/python/pyspark/pandas/frame.py b/python/pyspark/pandas/frame.py
index e576789..efc677b 100644
--- a/python/pyspark/pandas/frame.py
+++ b/python/pyspark/pandas/frame.py
@@ -7198,7 +7198,7 @@ defaultdict(<class 'list'>, {'col..., 'col...})]
 )
 return internal
 
-# TODO:  add keep = First
+# TODO: add keep = First
    def nlargest(self, n: int, columns: Union[Name, List[Name]]) -> "DataFrame":
 """
 Return the first `n` rows ordered by `columns` in descending order.
@@ -7255,7 +7255,7 @@ defaultdict(<class 'list'>, {'col..., 'col...})]
 6  NaN  12
 
 In the following example, we will use ``nlargest`` to select the three
-rows having the largest values in column "population".
+rows having the largest values in column "X".
 
 >>> df.nlargest(n=3, columns='X')
  X   Y
@@ -7263,12 +7263,14 @@ defaultdict(<class 'list'>, {'col..., 'col...})]
 4  6.0  10
 3  5.0   9
 
+To order by the largest values in column "Y" and then "X", we can
+specify multiple columns like in the next example.
+
 >>> df.nlargest(n=3, columns=['Y', 'X'])
  X   Y
 6  NaN  12
 5  7.0  11
 4  6.0  10
-
 """
 return self.sort_values(by=columns, ascending=False).head(n=n)
 
@@ -7318,7 +7320,7 @@ defaultdict(<class 'list'>, {'col..., 'col...})]
 6  NaN  12
 
 In the following example, we will use ``nsmallest`` to select the
-three rows having the smallest values in column "a".
+three rows having the smallest values in column "X".
 
 >>> df.nsmallest(n=3, columns='X') # doctest: +NORMALIZE_WHITESPACE
  X   Y
@@ -7326,7 +7328,7 @@ defaultdict(<class 'list'>, {'col..., 'col...})]
 1  2.0   7
 2  3.0   8
 
-To order by the largest values in column "a" and then "c", we can
+To order by the smallest values in column "Y" and then "X", we can
 specify multiple columns like in the next example.
 
 >>> df.nsmallest(n=3, columns=['Y', 'X']) # doctest: +NORMALIZE_WHITESPACE
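For context, a runnable sketch of the behavior the corrected docstrings describe; the example frame is reconstructed from the doctest output above (inferred, not copied from frame.py):

```python
import pyspark.pandas as ps

# DataFrame inferred from the doctest output shown in the patch.
df = ps.DataFrame({"X": [1.0, 2.0, 3.0, 5.0, 6.0, 7.0, None],
                   "Y": [6, 7, 8, 9, 10, 11, 12]})

# Three rows with the largest values in column "X".
df.nlargest(n=3, columns="X")

# Order by "Y", then break ties with "X"; per the patch this is
# equivalent to df.sort_values(by=["Y", "X"], ascending=False).head(n=3).
df.nlargest(n=3, columns=["Y", "X"])
```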




[spark] branch master updated: [SPARK-38487][PYTHON][DOC] Fix docstrings of nlargest/nsmallest of DataFrame

2022-03-09 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new c483e29  [SPARK-38487][PYTHON][DOC] Fix docstrings of nlargest/nsmallest of DataFrame
c483e29 is described below

commit c483e2977cbc6ae33d999c9c9d1dbacd9c53d85a
Author: Xinrong Meng 
AuthorDate: Thu Mar 10 15:32:48 2022 +0900

[SPARK-38487][PYTHON][DOC] Fix docstrings of nlargest/nsmallest of DataFrame

### What changes were proposed in this pull request?
Fix docstrings of nlargest/nsmallest of DataFrame

### Why are the changes needed?
To make the docstrings less confusing.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Manual test.

Closes #35793 from xinrong-databricks/frame.ntop.

Authored-by: Xinrong Meng 
Signed-off-by: Hyukjin Kwon 
---
 python/pyspark/pandas/frame.py | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/python/pyspark/pandas/frame.py b/python/pyspark/pandas/frame.py
index d4803eb..64a6471 100644
--- a/python/pyspark/pandas/frame.py
+++ b/python/pyspark/pandas/frame.py
@@ -7283,7 +7283,7 @@ defaultdict(<class 'list'>, {'col..., 'col...})]
 )
 return internal
 
-# TODO:  add keep = First
+# TODO: add keep = First
    def nlargest(self, n: int, columns: Union[Name, List[Name]]) -> "DataFrame":
 """
 Return the first `n` rows ordered by `columns` in descending order.
@@ -7340,7 +7340,7 @@ defaultdict(<class 'list'>, {'col..., 'col...})]
 6  NaN  12
 
 In the following example, we will use ``nlargest`` to select the three
-rows having the largest values in column "population".
+rows having the largest values in column "X".
 
 >>> df.nlargest(n=3, columns='X')
  X   Y
@@ -7348,12 +7348,14 @@ defaultdict(<class 'list'>, {'col..., 'col...})]
 4  6.0  10
 3  5.0   9
 
+To order by the largest values in column "Y" and then "X", we can
+specify multiple columns like in the next example.
+
 >>> df.nlargest(n=3, columns=['Y', 'X'])
  X   Y
 6  NaN  12
 5  7.0  11
 4  6.0  10
-
 """
 return self.sort_values(by=columns, ascending=False).head(n=n)
 
@@ -7403,7 +7405,7 @@ defaultdict(<class 'list'>, {'col..., 'col...})]
 6  NaN  12
 
 In the following example, we will use ``nsmallest`` to select the
-three rows having the smallest values in column "a".
+three rows having the smallest values in column "X".
 
 >>> df.nsmallest(n=3, columns='X') # doctest: +NORMALIZE_WHITESPACE
  X   Y
@@ -7411,7 +7413,7 @@ defaultdict(<class 'list'>, {'col..., 'col...})]
 1  2.0   7
 2  3.0   8
 
-To order by the largest values in column "a" and then "c", we can
+To order by the smallest values in column "Y" and then "X", we can
 specify multiple columns like in the next example.
 
 >>> df.nsmallest(n=3, columns=['Y', 'X']) # doctest: +NORMALIZE_WHITESPACE




[spark] branch branch-3.2 updated: Revert "[SPARK-38379][K8S] Fix Kubernetes Client mode when mounting persistent volume with storage class"

2022-03-09 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.2 by this push:
 new c52c34a  Revert "[SPARK-38379][K8S] Fix Kubernetes Client mode when mounting persistent volume with storage class"
c52c34a is described below

commit c52c34a8429bf7fe58c1d5d33117974b602905a3
Author: Dongjoon Hyun 
AuthorDate: Wed Mar 9 21:39:26 2022 -0800

Revert "[SPARK-38379][K8S] Fix Kubernetes Client mode when mounting persistent volume with storage class"

This reverts commit f97c3b37b85fecd78627e7f85da1f2edbcc75910.
---
 .../k8s/features/MountVolumesFeatureStep.scala |  2 +-
 .../features/MountVolumesFeatureStepSuite.scala| 25 --
 2 files changed, 1 insertion(+), 26 deletions(-)

diff --git a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/MountVolumesFeatureStep.scala b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/MountVolumesFeatureStep.scala
index 78dd6ec..4e16473 100644
--- a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/MountVolumesFeatureStep.scala
+++ b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/MountVolumesFeatureStep.scala
@@ -85,7 +85,7 @@ private[spark] class MountVolumesFeatureStep(conf: KubernetesConf)
   .withApiVersion("v1")
   .withNewMetadata()
 .withName(claimName)
-.addToLabels(SPARK_APP_ID_LABEL, conf.appId)
+.addToLabels(SPARK_APP_ID_LABEL, conf.sparkConf.getAppId)
 .endMetadata()
   .withNewSpec()
 .withStorageClassName(storageClass.get)
diff --git a/resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/features/MountVolumesFeatureStepSuite.scala b/resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/features/MountVolumesFeatureStepSuite.scala
index 468d1dd..38f8fac 100644
--- a/resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/features/MountVolumesFeatureStepSuite.scala
+++ b/resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/features/MountVolumesFeatureStepSuite.scala
@@ -89,31 +89,6 @@ class MountVolumesFeatureStepSuite extends SparkFunSuite {
     assert(executorPVC.getClaimName === s"pvc-spark-${KubernetesTestConf.EXECUTOR_ID}")
   }
 
-  test("SPARK-32713 Mounts parameterized persistentVolumeClaims in executors with storage class") {
-    val volumeConf = KubernetesVolumeSpec(
-      "testVolume",
-      "/tmp",
-      "",
-      true,
-      KubernetesPVCVolumeConf("pvc-spark-SPARK_EXECUTOR_ID", Some("fast"), Some("512mb"))
-    )
-    val driverConf = KubernetesTestConf.createDriverConf(volumes = Seq(volumeConf))
-    val driverStep = new MountVolumesFeatureStep(driverConf)
-    val driverPod = driverStep.configurePod(SparkPod.initialPod())
-
-    assert(driverPod.pod.getSpec.getVolumes.size() === 1)
-    val driverPVC = driverPod.pod.getSpec.getVolumes.get(0).getPersistentVolumeClaim
-    assert(driverPVC.getClaimName === "pvc-spark-SPARK_EXECUTOR_ID")
-
-    val executorConf = KubernetesTestConf.createExecutorConf(volumes = Seq(volumeConf))
-    val executorStep = new MountVolumesFeatureStep(executorConf)
-    val executorPod = executorStep.configurePod(SparkPod.initialPod())
-
-    assert(executorPod.pod.getSpec.getVolumes.size() === 1)
-    val executorPVC = executorPod.pod.getSpec.getVolumes.get(0).getPersistentVolumeClaim
-    assert(executorPVC.getClaimName === s"pvc-spark-${KubernetesTestConf.EXECUTOR_ID}")
-  }
-
   test("Create and mounts persistentVolumeClaims in driver") {
 val volumeConf = KubernetesVolumeSpec(
   "testVolume",




[spark] branch master updated (ec544ad -> e5a86a3)

2022-03-09 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from ec544ad  [SPARK-38148][SQL] Do not add dynamic partition pruning if there exists static partition pruning
 add e5a86a3  [SPARK-38453][K8S][DOCS] Add `volcano` section to K8s IT `README.md`

No new revisions were added by this update.

Summary of changes:
 .../kubernetes/integration-tests/README.md  | 21 +
 1 file changed, 21 insertions(+)




[spark] branch branch-3.2 updated: [SPARK-38379][K8S] Fix Kubernetes Client mode when mounting persistent volume with storage class

2022-03-09 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.2 by this push:
 new f97c3b3  [SPARK-38379][K8S] Fix Kubernetes Client mode when mounting persistent volume with storage class
f97c3b3 is described below

commit f97c3b37b85fecd78627e7f85da1f2edbcc75910
Author: Thomas Graves 
AuthorDate: Wed Mar 9 21:06:25 2022 -0800

[SPARK-38379][K8S] Fix Kubernetes Client mode when mounting persistent volume with storage class

### What changes were proposed in this pull request?

Running spark-shell in client mode on a Kubernetes cluster when mounting persistent volumes with a storage class results in a big warning being thrown on startup.

https://issues.apache.org/jira/browse/SPARK-38379

The issue here is that there is a race condition between when spark.app.id is set in SparkContext and when it's used, so this changes the code to use the KubernetesConf appId, which is what is used to set spark.app.id.

### Why are the changes needed?

It throws a big warning to the user, and I believe the label is wrong as well.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Unit test added.  The test fails without the fix.
Also manually tested on real k8s cluster.

Closes #35792 from tgravescs/fixVolk8s.

Authored-by: Thomas Graves 
Signed-off-by: Dongjoon Hyun 
(cherry picked from commit f286416ee16e878de3c70a31cef20549b33aaa0a)
Signed-off-by: Dongjoon Hyun 
---
 .../k8s/features/MountVolumesFeatureStep.scala |  2 +-
 .../features/MountVolumesFeatureStepSuite.scala| 25 ++
 2 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/MountVolumesFeatureStep.scala b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/MountVolumesFeatureStep.scala
index 4e16473..78dd6ec 100644
--- a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/MountVolumesFeatureStep.scala
+++ b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/MountVolumesFeatureStep.scala
@@ -85,7 +85,7 @@ private[spark] class MountVolumesFeatureStep(conf: KubernetesConf)
   .withApiVersion("v1")
   .withNewMetadata()
 .withName(claimName)
-.addToLabels(SPARK_APP_ID_LABEL, conf.sparkConf.getAppId)
+.addToLabels(SPARK_APP_ID_LABEL, conf.appId)
 .endMetadata()
   .withNewSpec()
 .withStorageClassName(storageClass.get)
diff --git a/resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/features/MountVolumesFeatureStepSuite.scala b/resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/features/MountVolumesFeatureStepSuite.scala
index 38f8fac..468d1dd 100644
--- a/resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/features/MountVolumesFeatureStepSuite.scala
+++ b/resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/features/MountVolumesFeatureStepSuite.scala
@@ -89,6 +89,31 @@ class MountVolumesFeatureStepSuite extends SparkFunSuite {
     assert(executorPVC.getClaimName === s"pvc-spark-${KubernetesTestConf.EXECUTOR_ID}")
   }
 
+  test("SPARK-32713 Mounts parameterized persistentVolumeClaims in executors with storage class") {
+    val volumeConf = KubernetesVolumeSpec(
+      "testVolume",
+      "/tmp",
+      "",
+      true,
+      KubernetesPVCVolumeConf("pvc-spark-SPARK_EXECUTOR_ID", Some("fast"), Some("512mb"))
+    )
+    val driverConf = KubernetesTestConf.createDriverConf(volumes = Seq(volumeConf))
+    val driverStep = new MountVolumesFeatureStep(driverConf)
+    val driverPod = driverStep.configurePod(SparkPod.initialPod())
+
+    assert(driverPod.pod.getSpec.getVolumes.size() === 1)
+    val driverPVC = driverPod.pod.getSpec.getVolumes.get(0).getPersistentVolumeClaim
+    assert(driverPVC.getClaimName === "pvc-spark-SPARK_EXECUTOR_ID")
+
+    val executorConf = KubernetesTestConf.createExecutorConf(volumes = Seq(volumeConf))
+    val executorStep = new MountVolumesFeatureStep(executorConf)
+    val executorPod = executorStep.configurePod(SparkPod.initialPod())
+
+    assert(executorPod.pod.getSpec.getVolumes.size() === 1)
+    val executorPVC = executorPod.pod.getSpec.getVolumes.get(0).getPersistentVolumeClaim
+    assert(executorPVC.getClaimName === s"pvc-spark-${KubernetesTestConf.EXECUTOR_ID}")
+  }
+
   test("Create and mounts persistentVolumeClaims in driver") {
 val volumeConf = KubernetesVolumeSpec(
   "testVolume",


[spark] branch master updated (82b6194 -> f286416)

2022-03-09 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 82b6194  [SPARK-38385][SQL] Improve error messages of empty statement and <EOF> in ParseException
 add f286416  [SPARK-38379][K8S] Fix Kubernetes Client mode when mounting persistent volume with storage class

No new revisions were added by this update.

Summary of changes:
 .../k8s/features/MountVolumesFeatureStep.scala |  2 +-
 .../features/MountVolumesFeatureStepSuite.scala| 25 ++
 2 files changed, 26 insertions(+), 1 deletion(-)




[spark] branch master updated (f286416 -> ec544ad)

2022-03-09 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from f286416  [SPARK-38379][K8S] Fix Kubernetes Client mode when mounting persistent volume with storage class
 add ec544ad  [SPARK-38148][SQL] Do not add dynamic partition pruning if there exists static partition pruning

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/execution/SparkOptimizer.scala   |  2 +
 .../CleanupDynamicPruningFilters.scala | 38 --
 .../approved-plans-v1_4/q13.sf100/explain.txt  |  4 +-
 .../approved-plans-v1_4/q13/explain.txt|  4 +-
 .../spark/sql/DynamicPartitionPruningSuite.scala   | 45 ++
 5 files changed, 85 insertions(+), 8 deletions(-)




[spark] branch master updated (ecabfb1c9 -> 82b6194)

2022-03-09 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from ecabfb1c9 [SPARK-38187][K8S][TESTS] Add K8S IT for `volcano` minResources cpu/memory spec
 add 82b6194  [SPARK-38385][SQL] Improve error messages of empty statement and <EOF> in ParseException

No new revisions were added by this update.

Summary of changes:
 core/src/main/resources/error/error-classes.json  |  4 
 .../apache/spark/sql/catalyst/parser/ParseDriver.scala|  7 ++-
 .../sql/catalyst/parser/SparkParserErrorStrategy.scala|  7 ++-
 .../spark/sql/catalyst/parser/ErrorParserSuite.scala  | 15 +++
 .../test/resources/sql-tests/results/show-tables.sql.out  |  2 +-
 5 files changed, 32 insertions(+), 3 deletions(-)




[spark] branch master updated (bd08e79 -> ecabfb1c9)

2022-03-09 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from bd08e79  [SPARK-38355][PYTHON][TESTS] Use `mkstemp` instead of `mktemp`
 add ecabfb1c9 [SPARK-38187][K8S][TESTS] Add K8S IT for `volcano` minResources cpu/memory spec

No new revisions were added by this update.

Summary of changes:
 ...ate.yml => driver-podgroup-template-cpu-2u.yml} |  5 +-
 yml => driver-podgroup-template-memory-3g.yml} |  5 +-
 .../volcano/{enable-queue.yml => queue-2u-3g.yml}  |  5 +-
 .../k8s/integrationtest/VolcanoTestsSuite.scala| 68 +-
 4 files changed, 76 insertions(+), 7 deletions(-)
 copy resource-managers/kubernetes/integration-tests/src/test/resources/volcano/{queue-driver-podgroup-template.yml => driver-podgroup-template-cpu-2u.yml} (92%)
 copy resource-managers/kubernetes/integration-tests/src/test/resources/volcano/{queue0-driver-podgroup-template.yml => driver-podgroup-template-memory-3g.yml} (92%)
 copy resource-managers/kubernetes/integration-tests/src/test/resources/volcano/{enable-queue.yml => queue-2u-3g.yml} (94%)




[spark] branch master updated (0f4c26a -> bd08e79)

2022-03-09 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 0f4c26a  [SPARK-38387][PYTHON] Support `na_action` and Series input correspondence in `Series.map`
 add bd08e79  [SPARK-38355][PYTHON][TESTS] Use `mkstemp` instead of `mktemp`

No new revisions were added by this update.

Summary of changes:
 python/pyspark/testing/pandasutils.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
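The motivation, as a short hedged sketch (illustrative, not taken from the patch): `mktemp()` only returns a name, leaving a race window in which another process can create the file first, while `mkstemp()` creates and opens the file atomically.

```python
import os
import tempfile

# Racy: mktemp() returns only a name; the file does not exist yet.
# path = tempfile.mktemp()

# Safe: mkstemp() creates and opens the file atomically.
fd, path = tempfile.mkstemp()
try:
    with os.fdopen(fd, "w") as f:
        f.write("test data")
finally:
    os.remove(path)
```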




[spark] branch master updated (01014aa -> 0f4c26a)

2022-03-09 Thread ueshin
This is an automated email from the ASF dual-hosted git repository.

ueshin pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 01014aa  [SPARK-38486][K8S][TESTS] Upgrade the minimum Minikube version to 1.18.0
 add 0f4c26a  [SPARK-38387][PYTHON] Support `na_action` and Series input correspondence in `Series.map`

No new revisions were added by this update.

Summary of changes:
 python/pyspark/pandas/series.py| 36 +-
 python/pyspark/pandas/tests/test_series.py | 25 +++--
 2 files changed, 53 insertions(+), 8 deletions(-)
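A hedged sketch of what the change enables in `Series.map` (the data is made up; the PR's tests are authoritative):

```python
import pyspark.pandas as ps

s = ps.Series(["cat", "dog", None, "rabbit"])

# Map via a dict; unmatched values become None.
s.map({"cat": "kitten", "dog": "puppy"})

# na_action="ignore" propagates missing values instead of
# passing them to the function.
s.map(lambda x: x.upper(), na_action="ignore")

# Map via another Series: values are looked up in its index.
s.map(ps.Series(["kitten", "puppy"], index=["cat", "dog"]))
```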




[GitHub] [spark-website] HyukjinKwon closed pull request #381: Regenerate PySpark API documentation for Spark 3.2.1

2022-03-09 Thread GitBox


HyukjinKwon closed pull request #381:
URL: https://github.com/apache/spark-website/pull/381


   





[GitHub] [spark-website] HyukjinKwon commented on pull request #381: Regenerate PySpark API documentation for Spark 3.2.1

2022-03-09 Thread GitBox


HyukjinKwon commented on pull request #381:
URL: https://github.com/apache/spark-website/pull/381#issuecomment-1063531527


   Merged.





[spark] branch master updated: [SPARK-38486][K8S][TESTS] Upgrade the minimum Minikube version to 1.18.0

2022-03-09 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 01014aa  [SPARK-38486][K8S][TESTS] Upgrade the minimum Minikube version to 1.18.0
01014aa is described below

commit 01014aa99fa851411262a6719058dde97319bbb3
Author: Dongjoon Hyun 
AuthorDate: Wed Mar 9 16:22:04 2022 -0800

[SPARK-38486][K8S][TESTS] Upgrade the minimum Minikube version to 1.18.0

### What changes were proposed in this pull request?

This PR aims to upgrade the minimum Minikube version to 1.18.0 from 1.7.3 in Apache Spark 3.3.0.

### Why are the changes needed?

Minikube v1.18.0 was released one year ago, in March 2021, and is the first version supporting Apple Silicon natively. Previously, there were some issues when running the Intel arch binary on Apple Silicon.
- https://github.com/kubernetes/minikube/releases/download/v1.18.0/minikube-darwin-arm64
- https://github.com/kubernetes/minikube/releases/tag/v1.18.0

### Does this PR introduce _any_ user-facing change?

No, this is a test-only PR.

### How was this patch tested?

Manually.

Closes #35791 from dongjoon-hyun/SPARK-38486.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
---
 resource-managers/kubernetes/integration-tests/README.md  | 4 ++--
 .../spark/deploy/k8s/integrationtest/backend/minikube/Minikube.scala  | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/resource-managers/kubernetes/integration-tests/README.md b/resource-managers/kubernetes/integration-tests/README.md
index 9eb928d..ac82282 100644
--- a/resource-managers/kubernetes/integration-tests/README.md
+++ b/resource-managers/kubernetes/integration-tests/README.md
@@ -28,7 +28,7 @@ To run tests with Hadoop 2.x instead of Hadoop 3.x, use `--hadoop-profile`.
 
 ./dev/dev-run-integration-tests.sh --hadoop-profile hadoop-2
 
-The minimum tested version of Minikube is 1.7.3. The kube-dns addon must be enabled. Minikube should
+The minimum tested version of Minikube is 1.18.0. The kube-dns addon must be enabled. Minikube should
 run with a minimum of 4 CPUs and 6G of memory:
 
 minikube start --cpus 4 --memory 6144
@@ -47,7 +47,7 @@ default this is set to `minikube`, the available backends are their prerequisite
 
 ### `minikube`
 
-Uses the local `minikube` cluster, this requires that `minikube` 1.7.3 or greater be installed and that it be allocated
+Uses the local `minikube` cluster, this requires that `minikube` 1.18.0 or greater be installed and that it be allocated
 at least 4 CPUs and 6GB memory (some users have reported success with as few as 3 CPUs and 4GB memory). The tests will
 check if `minikube` is started and abort early if it isn't currently running.
 
diff --git a/resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/backend/minikube/Minikube.scala b/resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/backend/minikube/Minikube.scala
index 9f99ede..755feb9 100644
--- a/resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/backend/minikube/Minikube.scala
+++ b/resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/backend/minikube/Minikube.scala
@@ -48,9 +48,9 @@ private[spark] object Minikube extends Logging {
 
 versionArrayOpt match {
   case Some(Array(x, y, z)) =>
-        if (Ordering.Tuple3[Int, Int, Int].lt((x, y, z), (1, 7, 3))) {
+        if (Ordering.Tuple3[Int, Int, Int].lt((x, y, z), (1, 18, 0))) {
           assert(false, s"Unsupported Minikube version is detected: $minikubeVersionString." +
-            "For integration testing Minikube version 1.7.3 or greater is expected.")
+            "For integration testing Minikube version 1.18.0 or greater is expected.")
 }
   case _ =>
        assert(false, s"Unexpected version format detected in `$minikubeVersionString`." +
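As a cross-check of the gate above, the same comparison expressed in Python (a rough sketch assuming the `x.y.z` integer-tuple semantics of the Scala code):

```python
def minikube_supported(version: str, minimum=(1, 18, 0)) -> bool:
    # Compare the leading x.y.z components as an integer tuple,
    # mirroring Ordering.Tuple3[Int, Int, Int].lt above.
    parts = tuple(int(p) for p in version.lstrip("v").split(".")[:3])
    return parts >= minimum

assert minikube_supported("v1.18.0")
assert not minikube_supported("v1.7.3")
```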




[spark-website] branch asf-site updated: Add notice for CVE-2021-38296

2022-03-09 Thread srowen
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/spark-website.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new 1569fce  Add notice for CVE-2021-38296
1569fce is described below

commit 1569fcefeb8b6deba7270acc928a27ee678b6118
Author: Sean Owen 
AuthorDate: Wed Mar 9 16:11:18 2022 -0600

Add notice for CVE-2021-38296

Author: Sean Owen 

Closes #382 from srowen/CVE-2021-38296.
---
 security.md| 27 +++
 site/security.html | 32 
 2 files changed, 59 insertions(+)

diff --git a/security.md b/security.md
index dc9a9e6..32bbb74 100644
--- a/security.md
+++ b/security.md
@@ -18,6 +18,33 @@ non-public list that will reach the Apache Security team, as well as the Spark P
 
 Known security issues
 
+CVE-2021-38296: Apache Spark™ Key Negotiation Vulnerability
+
+Severity: Medium
+
+Vendor: The Apache Software Foundation
+
+Versions Affected:
+
+- Apache Spark 3.1.2 and earlier
+
+Description:
+
+Apache Spark supports end-to-end encryption of RPC connections via `spark.authenticate` and `spark.network.crypto.enabled`.
+In versions 3.1.2 and earlier, it uses a bespoke mutual authentication protocol that allows for full encryption key
+recovery. After an initial interactive attack, this would allow someone to decrypt plaintext traffic offline.
+Note that this does not affect security mechanisms controlled by `spark.authenticate.enableSaslEncryption`,
+`spark.io.encryption.enabled`, `spark.ssl`, `spark.ui.strictTransportSecurity`.
+
+Mitigation:
+
+- Update to Spark 3.1.3 or later
+
+Credit:
+
+- Steve Weis (Databricks)
+
+
 CVE-2020-9480: Apache Spark™ RCE vulnerability in auth-enabled standalone master
 
 Severity: Important
diff --git a/site/security.html b/site/security.html
index ff3de6c..be0a8d8 100644
--- a/site/security.html
+++ b/site/security.html
@@ -155,6 +155,38 @@ non-public list that will reach the Apache Security team, as well as the Spark P
 
 Known security issues
 
+CVE-2021-38296: Apache Spark™ Key Negotiation Vulnerability
+
+Severity: Medium
+
+Vendor: The Apache Software Foundation
+
+Versions Affected:
+
+
+  Apache Spark 3.1.2 and earlier
+
+
+Description:
+
+Apache Spark supports end-to-end encryption of RPC connections via spark.authenticate and spark.network.crypto.enabled.
+In versions 3.1.2 and earlier, it uses a bespoke mutual authentication protocol that allows for full encryption key
+recovery. After an initial interactive attack, this would allow someone to decrypt plaintext traffic offline.
+Note that this does not affect security mechanisms controlled by spark.authenticate.enableSaslEncryption, 
+spark.io.encryption.enabled, spark.ssl, spark.ui.strictTransportSecurity.
+
+Mitigation:
+
+
+  Update to Spark 3.1.3 or later
+
+
+Credit:
+
+
+  Steve Weis (Databricks)
+
+
 CVE-2020-9480: Apache Spark™ RCE vulnerability in auth-enabled standalone master
 
 Severity: Important
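For readers checking exposure, a hedged illustration of the configuration combination the advisory concerns (settings taken from the advisory text above; on Spark 3.1.2 and earlier this combination enables the vulnerable bespoke protocol):

```python
from pyspark import SparkConf

conf = (
    SparkConf()
    # The affected mechanism, per the advisory.
    .set("spark.authenticate", "true")
    .set("spark.network.crypto.enabled", "true")
)
# Not affected, per the advisory: spark.authenticate.enableSaslEncryption,
# spark.io.encryption.enabled, spark.ssl, spark.ui.strictTransportSecurity.
```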




[GitHub] [spark-website] srowen closed pull request #382: Add notice for CVE-2021-38296

2022-03-09 Thread GitBox


srowen closed pull request #382:
URL: https://github.com/apache/spark-website/pull/382


   





[spark] branch master updated (effef84 -> 97df016)

2022-03-09 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from effef84  [SPARK-36681][CORE][TEST] Enable SnappyCodec test in FileSuite
 add 97df016  [SPARK-38480][K8S] Remove `spark.kubernetes.job.queue` in favor of `spark.kubernetes.driver.podGroupTemplateFile`

No new revisions were added by this update.

Summary of changes:
 docs/running-on-kubernetes.md  |  9 -
 .../src/main/scala/org/apache/spark/deploy/k8s/Config.scala|  7 ---
 .../apache/spark/deploy/k8s/features/VolcanoFeatureStep.scala  |  2 --
 .../spark/deploy/k8s/features/VolcanoFeatureStepSuite.scala| 10 --
 .../{disable-queue.yml => queue-driver-podgroup-template.yml}  |  8 ++--
 .../{disable-queue.yml => queue0-driver-podgroup-template.yml} |  8 ++--
 .../{disable-queue.yml => queue1-driver-podgroup-template.yml} |  8 ++--
 .../spark/deploy/k8s/integrationtest/VolcanoTestsSuite.scala   |  7 ++-
 8 files changed, 12 insertions(+), 47 deletions(-)
 copy resource-managers/kubernetes/integration-tests/src/test/resources/volcano/{disable-queue.yml => queue-driver-podgroup-template.yml} (91%)
 copy resource-managers/kubernetes/integration-tests/src/test/resources/volcano/{disable-queue.yml => queue0-driver-podgroup-template.yml} (91%)
 copy resource-managers/kubernetes/integration-tests/src/test/resources/volcano/{disable-queue.yml => queue1-driver-podgroup-template.yml} (91%)




[spark] branch master updated (1584366 -> effef84)

2022-03-09 Thread viirya
This is an automated email from the ASF dual-hosted git repository.

viirya pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 1584366  [SPARK-38354][SQL] Add hash probes metric for shuffled hash join
 add effef84  [SPARK-36681][CORE][TEST] Enable SnappyCodec test in FileSuite

No new revisions were added by this update.

Summary of changes:
 core/src/test/scala/org/apache/spark/FileSuite.scala | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)




[spark] branch master updated (93a25a4 -> 1584366)

2022-03-09 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 93a25a4  [SPARK-37947][SQL] Extract generator from GeneratorOuter expression contained by a Generate operator
 add 1584366  [SPARK-38354][SQL] Add hash probes metric for shuffled hash join

No new revisions were added by this update.

Summary of changes:
 .../apache/spark/unsafe/map/BytesToBytesMap.java   |  2 +-
 .../execution/UnsafeFixedWidthAggregationMap.java  |  6 ++--
 .../execution/aggregate/HashAggregateExec.scala|  4 +--
 .../aggregate/TungstenAggregationIterator.scala|  2 +-
 .../spark/sql/execution/joins/HashedRelation.scala | 35 ++
 .../sql/execution/joins/ShuffledHashJoinExec.scala | 10 +--
 .../sql/execution/metric/SQLMetricsSuite.scala | 16 +-
 7 files changed, 59 insertions(+), 16 deletions(-)




[spark] branch master updated (62e4c29 -> 93a25a4)

2022-03-09 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 62e4c29  [SPARK-37421][PYTHON] Inline type hints for python/pyspark/mllib/evaluation.py
 add 93a25a4  [SPARK-37947][SQL] Extract generator from GeneratorOuter expression contained by a Generate operator

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/analysis/Analyzer.scala |  3 +++
 .../apache/spark/sql/GeneratorFunctionSuite.scala  | 24 ++
 2 files changed, 27 insertions(+)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (bd6a3b4 -> 62e4c29)

2022-03-09 Thread zero323
This is an automated email from the ASF dual-hosted git repository.

zero323 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from bd6a3b4  [SPARK-38437][SQL] Lenient serialization of datetime from datasource
 add 62e4c29  [SPARK-37421][PYTHON] Inline type hints for python/pyspark/mllib/evaluation.py

No new revisions were added by this update.

Summary of changes:
 python/pyspark/mllib/evaluation.py  | 134 +++-
 python/pyspark/mllib/evaluation.pyi |  92 -
 2 files changed, 72 insertions(+), 154 deletions(-)
 delete mode 100644 python/pyspark/mllib/evaluation.pyi
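What "inlining type hints" means here, as a hedged before/after sketch (the member shown is illustrative, not copied from the patch):

```python
# Before: annotations lived in a separate stub file, evaluation.pyi:
#     class BinaryClassificationMetrics:
#         @property
#         def areaUnderROC(self) -> float: ...

# After: the annotation is written directly in evaluation.py.
class BinaryClassificationMetrics:
    @property
    def areaUnderROC(self) -> float:
        ...
```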




[spark] branch master updated: [SPARK-38437][SQL] Lenient serialization of datetime from datasource

2022-03-09 Thread maxgekk
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new bd6a3b4  [SPARK-38437][SQL] Lenient serialization of datetime from datasource
bd6a3b4 is described below

commit bd6a3b4a001d29255f36bab9e9969cd919306fc2
Author: Max Gekk 
AuthorDate: Wed Mar 9 11:36:57 2022 +0300

[SPARK-38437][SQL] Lenient serialization of datetime from datasource

### What changes were proposed in this pull request?
In the PR, I propose to support the lenient mode in the row serializer used by datasources to convert rows received from scans. Spark SQL will be able to accept:
- `java.time.Instant` and `java.sql.Timestamp` for the `TIMESTAMP` type, and
- `java.time.LocalDate` and `java.sql.Date` for the `DATE` type

independently of the current value of the SQL config `spark.sql.datetime.java8API.enabled`.

### Why are the changes needed?
A datasource might not be aware of the Spark SQL config `spark.sql.datetime.java8API.enabled` if the datasource was developed before the config was introduced in Spark 3.0.0. In that case, it always returns "legacy" timestamps/dates of the types `java.sql.Timestamp`/`java.sql.Date` even if the user enabled Java 8 API. As Spark expects `java.time.Instant` or `java.time.LocalDate` but gets `java.sql.Timestamp` or `java.sql.Date`, the user observes the exception:
```java
ERROR SparkExecuteStatementOperation: Error executing query with ac61b10a-486e-463b-8726-3b61da58582e, currentState RUNNING,
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 2.0 failed 4 times, most recent failure: Lost task 0.3 in stage 2.0 (TID 8) (10.157.1.194 executor 0): java.lang.RuntimeException: Error while encoding: java.lang.RuntimeException: java.sql.Timestamp is not a valid external type for schema of timestamp
if (assertnotnull(input[0, org.apache.spark.sql.Row, true]).isNullAt) null else staticinvoke(class org.apache.spark.sql.catalyst.util.DateTimeUtils$, TimestampType, instantToMicros, validateexternaltype(getexternalrowfield(assertnotnull(input[0, org.apache.spark.sql.Row, true]), 0, loan_perf_date), TimestampType), true, false) AS loan_perf_date#1125
at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$Serializer.apply(ExpressionEncoder.scala:239)
```

This PR fixes the issue above. After the changes, users can use legacy datasource connectors with new Spark versions even when they need to enable Java 8 API.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
By running the affected test suites:
```
$ build/sbt "test:testOnly *CodeGenerationSuite"
$ build/sbt "test:testOnly *ObjectExpressionsSuite"
```
and new tests:
```
$ build/sbt "test:testOnly *RowEncoderSuite"
$ build/sbt "test:testOnly *TableScanSuite"
```

Closes #35756 from MaxGekk/dynamic-serializer-java-ts.

Authored-by: Max Gekk 
Signed-off-by: Max Gekk 
---
 .../spark/sql/catalyst/SerializerBuildHelper.scala | 18 +++
 .../spark/sql/catalyst/encoders/RowEncoder.scala   | 35 ++
 .../sql/catalyst/expressions/objects/objects.scala | 26 
 .../spark/sql/catalyst/util/DateTimeUtils.scala| 22 ++
 .../sql/catalyst/encoders/RowEncoderSuite.scala| 23 ++
 .../catalyst/expressions/CodeGenerationSuite.scala |  4 ++-
 .../expressions/ObjectExpressionsSuite.scala   |  8 +++--
 .../execution/datasources/DataSourceStrategy.scala |  2 +-
 .../apache/spark/sql/sources/TableScanSuite.scala  | 27 +
 9 files changed, 144 insertions(+), 21 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SerializerBuildHelper.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SerializerBuildHelper.scala
index 3c17575..8dec923 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SerializerBuildHelper.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SerializerBuildHelper.scala
@@ -86,6 +86,15 @@ object SerializerBuildHelper {
       returnNullable = false)
   }
 
+  def createSerializerForAnyTimestamp(inputObject: Expression): Expression = {
+    StaticInvoke(
+      DateTimeUtils.getClass,
+      TimestampType,
+      "anyToMicros",
+      inputObject :: Nil,
+      returnNullable = false)
+  }
+
   def createSerializerForLocalDateTime(inputObject: Expression): Expression = {
     StaticInvoke(
       DateTimeUtils.getClass,
@@ -113,6 +122,15 @@ object SerializerBuildHelper {
       returnNullable = false)
   }
 
+  def createSerializerForAnyDate(inputObject: Expression): Expression = {
+    StaticInvoke(
+      DateTimeUtils.getClass,
+      DateType,
+