[spark] branch branch-3.2 updated: [SPARK-38487][PYTHON][DOC] Fix docstrings of nlargest/nsmallest of DataFrame
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.2 by this push:
     new e683932  [SPARK-38487][PYTHON][DOC] Fix docstrings of nlargest/nsmallest of DataFrame
e683932 is described below

commit e683932495fae444b2c17a755d9a660a6c2d63ef
Author: Xinrong Meng
AuthorDate: Thu Mar 10 15:32:48 2022 +0900

    [SPARK-38487][PYTHON][DOC] Fix docstrings of nlargest/nsmallest of DataFrame

    ### What changes were proposed in this pull request?
    Fix docstrings of nlargest/nsmallest of DataFrame

    ### Why are the changes needed?
    To make docstring less confusing.

    ### Does this PR introduce _any_ user-facing change?
    No.

    ### How was this patch tested?
    Manual test.

    Closes #35793 from xinrong-databricks/frame.ntop.

    Authored-by: Xinrong Meng
    Signed-off-by: Hyukjin Kwon
    (cherry picked from commit c483e2977cbc6ae33d999c9c9d1dbacd9c53d85a)
    Signed-off-by: Hyukjin Kwon
---
 python/pyspark/pandas/frame.py | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/python/pyspark/pandas/frame.py b/python/pyspark/pandas/frame.py
index e576789..efc677b 100644
--- a/python/pyspark/pandas/frame.py
+++ b/python/pyspark/pandas/frame.py
@@ -7198,7 +7198,7 @@ defaultdict(, {'col..., 'col...})]
         )
         return internal

-    # TODO: add keep = First
+    # TODO: add keep = First
     def nlargest(self, n: int, columns: Union[Name, List[Name]]) -> "DataFrame":
         """
         Return the first `n` rows ordered by `columns` in descending order.
@@ -7255,7 +7255,7 @@ defaultdict(, {'col..., 'col...})]
         6  NaN  12

         In the following example, we will use ``nlargest`` to select the three
-        rows having the largest values in column "population".
+        rows having the largest values in column "X".

         >>> df.nlargest(n=3, columns='X')
              X   Y
@@ -7263,12 +7263,14 @@ defaultdict(, {'col..., 'col...})]
         4  6.0  10
         3  5.0   9

+        To order by the largest values in column "Y" and then "X", we can
+        specify multiple columns like in the next example.
+
         >>> df.nlargest(n=3, columns=['Y', 'X'])
              X   Y
         6  NaN  12
         5  7.0  11
         4  6.0  10
-
         """
         return self.sort_values(by=columns, ascending=False).head(n=n)

@@ -7318,7 +7320,7 @@ defaultdict(, {'col..., 'col...})]
         6  NaN  12

         In the following example, we will use ``nsmallest`` to select the
-        three rows having the smallest values in column "a".
+        three rows having the smallest values in column "X".

         >>> df.nsmallest(n=3, columns='X')  # doctest: +NORMALIZE_WHITESPACE
              X   Y
@@ -7326,7 +7328,7 @@ defaultdict(, {'col..., 'col...})]
         1  2.0   7
         2  3.0   8

-        To order by the largest values in column "a" and then "c", we can
+        To order by the smallest values in column "Y" and then "X", we can
         specify multiple columns like in the next example.

         >>> df.nsmallest(n=3, columns=['Y', 'X'])  # doctest: +NORMALIZE_WHITESPACE

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
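The corrected doctest semantics mirror plain pandas, so they can be sanity-checked outside Spark. A minimal sketch using the same X/Y data as the docstring, written against ordinary `pandas` rather than `pyspark.pandas` (whose API it aims to match):

```python
import pandas as pd

# Same data as the docstring example: X has a NaN, Y is fully populated.
df = pd.DataFrame({'X': [1.0, 2.0, 3.0, 5.0, 6.0, 7.0, float('nan')],
                   'Y': [6, 7, 8, 9, 10, 11, 12]})

# Three rows with the largest values in column "X" (NaN rows sort last).
top_x = df.nlargest(n=3, columns='X')

# Order by the largest values in "Y" first, breaking ties with "X".
top_yx = df.nlargest(n=3, columns=['Y', 'X'])

# Three rows with the smallest values in column "X".
bottom_x = df.nsmallest(n=3, columns='X')
```

With this data, `top_x` holds the rows with X of 7.0, 6.0, 5.0, while `top_yx` picks rows by descending Y (12, 11, 10), including the row whose X is NaN.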
[spark] branch master updated: [SPARK-38487][PYTHON][DOC] Fix docstrings of nlargest/nsmallest of DataFrame
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new c483e29  [SPARK-38487][PYTHON][DOC] Fix docstrings of nlargest/nsmallest of DataFrame
c483e29 is described below

commit c483e2977cbc6ae33d999c9c9d1dbacd9c53d85a
Author: Xinrong Meng
AuthorDate: Thu Mar 10 15:32:48 2022 +0900

    [SPARK-38487][PYTHON][DOC] Fix docstrings of nlargest/nsmallest of DataFrame

    ### What changes were proposed in this pull request?
    Fix docstrings of nlargest/nsmallest of DataFrame

    ### Why are the changes needed?
    To make docstring less confusing.

    ### Does this PR introduce _any_ user-facing change?
    No.

    ### How was this patch tested?
    Manual test.

    Closes #35793 from xinrong-databricks/frame.ntop.

    Authored-by: Xinrong Meng
    Signed-off-by: Hyukjin Kwon
---
 python/pyspark/pandas/frame.py | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/python/pyspark/pandas/frame.py b/python/pyspark/pandas/frame.py
index d4803eb..64a6471 100644
--- a/python/pyspark/pandas/frame.py
+++ b/python/pyspark/pandas/frame.py
@@ -7283,7 +7283,7 @@ defaultdict(, {'col..., 'col...})]
         )
         return internal

-    # TODO: add keep = First
+    # TODO: add keep = First
     def nlargest(self, n: int, columns: Union[Name, List[Name]]) -> "DataFrame":
         """
         Return the first `n` rows ordered by `columns` in descending order.
@@ -7340,7 +7340,7 @@ defaultdict(, {'col..., 'col...})]
         6  NaN  12

         In the following example, we will use ``nlargest`` to select the three
-        rows having the largest values in column "population".
+        rows having the largest values in column "X".

         >>> df.nlargest(n=3, columns='X')
              X   Y
@@ -7348,12 +7348,14 @@ defaultdict(, {'col..., 'col...})]
         4  6.0  10
         3  5.0   9

+        To order by the largest values in column "Y" and then "X", we can
+        specify multiple columns like in the next example.
+
         >>> df.nlargest(n=3, columns=['Y', 'X'])
              X   Y
         6  NaN  12
         5  7.0  11
         4  6.0  10
-
         """
         return self.sort_values(by=columns, ascending=False).head(n=n)

@@ -7403,7 +7405,7 @@ defaultdict(, {'col..., 'col...})]
         6  NaN  12

         In the following example, we will use ``nsmallest`` to select the
-        three rows having the smallest values in column "a".
+        three rows having the smallest values in column "X".

         >>> df.nsmallest(n=3, columns='X')  # doctest: +NORMALIZE_WHITESPACE
              X   Y
@@ -7411,7 +7413,7 @@ defaultdict(, {'col..., 'col...})]
         1  2.0   7
         2  3.0   8

-        To order by the largest values in column "a" and then "c", we can
+        To order by the smallest values in column "Y" and then "X", we can
         specify multiple columns like in the next example.

         >>> df.nsmallest(n=3, columns=['Y', 'X'])  # doctest: +NORMALIZE_WHITESPACE
[spark] branch branch-3.2 updated: Revert "[SPARK-38379][K8S] Fix Kubernetes Client mode when mounting persistent volume with storage class"
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.2 by this push:
     new c52c34a  Revert "[SPARK-38379][K8S] Fix Kubernetes Client mode when mounting persistent volume with storage class"
c52c34a is described below

commit c52c34a8429bf7fe58c1d5d33117974b602905a3
Author: Dongjoon Hyun
AuthorDate: Wed Mar 9 21:39:26 2022 -0800

    Revert "[SPARK-38379][K8S] Fix Kubernetes Client mode when mounting persistent volume with storage class"

    This reverts commit f97c3b37b85fecd78627e7f85da1f2edbcc75910.
---
 .../k8s/features/MountVolumesFeatureStep.scala     |  2 +-
 .../features/MountVolumesFeatureStepSuite.scala    | 25 --
 2 files changed, 1 insertion(+), 26 deletions(-)

diff --git a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/MountVolumesFeatureStep.scala b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/MountVolumesFeatureStep.scala
index 78dd6ec..4e16473 100644
--- a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/MountVolumesFeatureStep.scala
+++ b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/MountVolumesFeatureStep.scala
@@ -85,7 +85,7 @@ private[spark] class MountVolumesFeatureStep(conf: KubernetesConf)
       .withApiVersion("v1")
       .withNewMetadata()
       .withName(claimName)
-      .addToLabels(SPARK_APP_ID_LABEL, conf.appId)
+      .addToLabels(SPARK_APP_ID_LABEL, conf.sparkConf.getAppId)
       .endMetadata()
       .withNewSpec()
       .withStorageClassName(storageClass.get)
diff --git a/resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/features/MountVolumesFeatureStepSuite.scala b/resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/features/MountVolumesFeatureStepSuite.scala
index 468d1dd..38f8fac 100644
--- a/resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/features/MountVolumesFeatureStepSuite.scala
+++ b/resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/features/MountVolumesFeatureStepSuite.scala
@@ -89,31 +89,6 @@ class MountVolumesFeatureStepSuite extends SparkFunSuite {
     assert(executorPVC.getClaimName === s"pvc-spark-${KubernetesTestConf.EXECUTOR_ID}")
   }

-  test("SPARK-32713 Mounts parameterized persistentVolumeClaims in executors with storage class") {
-    val volumeConf = KubernetesVolumeSpec(
-      "testVolume",
-      "/tmp",
-      "",
-      true,
-      KubernetesPVCVolumeConf("pvc-spark-SPARK_EXECUTOR_ID", Some("fast"), Some("512mb"))
-    )
-    val driverConf = KubernetesTestConf.createDriverConf(volumes = Seq(volumeConf))
-    val driverStep = new MountVolumesFeatureStep(driverConf)
-    val driverPod = driverStep.configurePod(SparkPod.initialPod())
-
-    assert(driverPod.pod.getSpec.getVolumes.size() === 1)
-    val driverPVC = driverPod.pod.getSpec.getVolumes.get(0).getPersistentVolumeClaim
-    assert(driverPVC.getClaimName === "pvc-spark-SPARK_EXECUTOR_ID")
-
-    val executorConf = KubernetesTestConf.createExecutorConf(volumes = Seq(volumeConf))
-    val executorStep = new MountVolumesFeatureStep(executorConf)
-    val executorPod = executorStep.configurePod(SparkPod.initialPod())
-
-    assert(executorPod.pod.getSpec.getVolumes.size() === 1)
-    val executorPVC = executorPod.pod.getSpec.getVolumes.get(0).getPersistentVolumeClaim
-    assert(executorPVC.getClaimName === s"pvc-spark-${KubernetesTestConf.EXECUTOR_ID}")
-  }
-
   test("Create and mounts persistentVolumeClaims in driver") {
     val volumeConf = KubernetesVolumeSpec(
       "testVolume",
[spark] branch master updated (ec544ad -> e5a86a3)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

 from ec544ad  [SPARK-38148][SQL] Do not add dynamic partition pruning if there exists static partition pruning
  add e5a86a3  [SPARK-38453][K8S][DOCS] Add `volcano` section to K8s IT `README.md`

No new revisions were added by this update.

Summary of changes:
 .../kubernetes/integration-tests/README.md | 21 +
 1 file changed, 21 insertions(+)
[spark] branch branch-3.2 updated: [SPARK-38379][K8S] Fix Kubernetes Client mode when mounting persistent volume with storage class
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.2 by this push:
     new f97c3b3  [SPARK-38379][K8S] Fix Kubernetes Client mode when mounting persistent volume with storage class
f97c3b3 is described below

commit f97c3b37b85fecd78627e7f85da1f2edbcc75910
Author: Thomas Graves
AuthorDate: Wed Mar 9 21:06:25 2022 -0800

    [SPARK-38379][K8S] Fix Kubernetes Client mode when mounting persistent volume with storage class

    ### What changes were proposed in this pull request?

    Running spark-shell in client mode on Kubernetes cluster when mounting persistent volumes with a storage class results in a big warning being thrown on startup. https://issues.apache.org/jira/browse/SPARK-38379

    The issue here is there is a race condition between when spark.app.id is set in SparkContext and when its used, so change to use the KubernetesConf appId, which is what is used to set spark.app.id.

    ### Why are the changes needed?

    Throws big warning to user and I believe the label is wrong as well.

    ### Does this PR introduce _any_ user-facing change?

    No

    ### How was this patch tested?

    Unit test added. The test fails without the fix. Also manually tested on real k8s cluster.

    Closes #35792 from tgravescs/fixVolk8s.

    Authored-by: Thomas Graves
    Signed-off-by: Dongjoon Hyun
    (cherry picked from commit f286416ee16e878de3c70a31cef20549b33aaa0a)
    Signed-off-by: Dongjoon Hyun
---
 .../k8s/features/MountVolumesFeatureStep.scala     |  2 +-
 .../features/MountVolumesFeatureStepSuite.scala    | 25 ++
 2 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/MountVolumesFeatureStep.scala b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/MountVolumesFeatureStep.scala
index 4e16473..78dd6ec 100644
--- a/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/MountVolumesFeatureStep.scala
+++ b/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/MountVolumesFeatureStep.scala
@@ -85,7 +85,7 @@ private[spark] class MountVolumesFeatureStep(conf: KubernetesConf)
       .withApiVersion("v1")
       .withNewMetadata()
       .withName(claimName)
-      .addToLabels(SPARK_APP_ID_LABEL, conf.sparkConf.getAppId)
+      .addToLabels(SPARK_APP_ID_LABEL, conf.appId)
      .endMetadata()
       .withNewSpec()
       .withStorageClassName(storageClass.get)
diff --git a/resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/features/MountVolumesFeatureStepSuite.scala b/resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/features/MountVolumesFeatureStepSuite.scala
index 38f8fac..468d1dd 100644
--- a/resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/features/MountVolumesFeatureStepSuite.scala
+++ b/resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/features/MountVolumesFeatureStepSuite.scala
@@ -89,6 +89,31 @@ class MountVolumesFeatureStepSuite extends SparkFunSuite {
     assert(executorPVC.getClaimName === s"pvc-spark-${KubernetesTestConf.EXECUTOR_ID}")
   }

+  test("SPARK-32713 Mounts parameterized persistentVolumeClaims in executors with storage class") {
+    val volumeConf = KubernetesVolumeSpec(
+      "testVolume",
+      "/tmp",
+      "",
+      true,
+      KubernetesPVCVolumeConf("pvc-spark-SPARK_EXECUTOR_ID", Some("fast"), Some("512mb"))
+    )
+    val driverConf = KubernetesTestConf.createDriverConf(volumes = Seq(volumeConf))
+    val driverStep = new MountVolumesFeatureStep(driverConf)
+    val driverPod = driverStep.configurePod(SparkPod.initialPod())
+
+    assert(driverPod.pod.getSpec.getVolumes.size() === 1)
+    val driverPVC = driverPod.pod.getSpec.getVolumes.get(0).getPersistentVolumeClaim
+    assert(driverPVC.getClaimName === "pvc-spark-SPARK_EXECUTOR_ID")
+
+    val executorConf = KubernetesTestConf.createExecutorConf(volumes = Seq(volumeConf))
+    val executorStep = new MountVolumesFeatureStep(executorConf)
+    val executorPod = executorStep.configurePod(SparkPod.initialPod())
+
+    assert(executorPod.pod.getSpec.getVolumes.size() === 1)
+    val executorPVC = executorPod.pod.getSpec.getVolumes.get(0).getPersistentVolumeClaim
+    assert(executorPVC.getClaimName === s"pvc-spark-${KubernetesTestConf.EXECUTOR_ID}")
+  }
+
   test("Create and mounts persistentVolumeClaims in driver") {
     val volumeConf = KubernetesVolumeSpec(
       "testVolume",
[spark] branch master updated (82b6194 -> f286416)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

 from 82b6194  [SPARK-38385][SQL] Improve error messages of empty statement and in ParseException
  add f286416  [SPARK-38379][K8S] Fix Kubernetes Client mode when mounting persistent volume with storage class

No new revisions were added by this update.

Summary of changes:
 .../k8s/features/MountVolumesFeatureStep.scala     |  2 +-
 .../features/MountVolumesFeatureStepSuite.scala    | 25 ++
 2 files changed, 26 insertions(+), 1 deletion(-)
[spark] branch master updated (f286416 -> ec544ad)
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

 from f286416  [SPARK-38379][K8S] Fix Kubernetes Client mode when mounting persistent volume with storage class
  add ec544ad  [SPARK-38148][SQL] Do not add dynamic partition pruning if there exists static partition pruning

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/execution/SparkOptimizer.scala       |  2 +
 .../CleanupDynamicPruningFilters.scala             | 38 --
 .../approved-plans-v1_4/q13.sf100/explain.txt      |  4 +-
 .../approved-plans-v1_4/q13/explain.txt            |  4 +-
 .../spark/sql/DynamicPartitionPruningSuite.scala   | 45 ++
 5 files changed, 85 insertions(+), 8 deletions(-)
[spark] branch master updated (ecabfb1c9 -> 82b6194)
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

 from ecabfb1c9  [SPARK-38187][K8S][TESTS] Add K8S IT for `volcano` minResources cpu/memory spec
  add 82b6194    [SPARK-38385][SQL] Improve error messages of empty statement and in ParseException

No new revisions were added by this update.

Summary of changes:
 core/src/main/resources/error/error-classes.json           |  4 
 .../apache/spark/sql/catalyst/parser/ParseDriver.scala     |  7 ++-
 .../sql/catalyst/parser/SparkParserErrorStrategy.scala     |  7 ++-
 .../spark/sql/catalyst/parser/ErrorParserSuite.scala       | 15 +++
 .../test/resources/sql-tests/results/show-tables.sql.out   |  2 +-
 5 files changed, 32 insertions(+), 3 deletions(-)
[spark] branch master updated (bd08e79 -> ecabfb1c9)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

 from bd08e79     [SPARK-38355][PYTHON][TESTS] Use `mkstemp` instead of `mktemp`
  add ecabfb1c9   [SPARK-38187][K8S][TESTS] Add K8S IT for `volcano` minResources cpu/memory spec

No new revisions were added by this update.

Summary of changes:
 ...ate.yml => driver-podgroup-template-cpu-2u.yml} |  5 +-
 yml => driver-podgroup-template-memory-3g.yml}     |  5 +-
 .../volcano/{enable-queue.yml => queue-2u-3g.yml}  |  5 +-
 .../k8s/integrationtest/VolcanoTestsSuite.scala    | 68 +-
 4 files changed, 76 insertions(+), 7 deletions(-)
 copy resource-managers/kubernetes/integration-tests/src/test/resources/volcano/{queue-driver-podgroup-template.yml => driver-podgroup-template-cpu-2u.yml} (92%)
 copy resource-managers/kubernetes/integration-tests/src/test/resources/volcano/{queue0-driver-podgroup-template.yml => driver-podgroup-template-memory-3g.yml} (92%)
 copy resource-managers/kubernetes/integration-tests/src/test/resources/volcano/{enable-queue.yml => queue-2u-3g.yml} (94%)
[spark] branch master updated (0f4c26a -> bd08e79)
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

 from 0f4c26a  [SPARK-38387][PYTHON] Support `na_action` and Series input correspondence in `Series.map`
  add bd08e79  [SPARK-38355][PYTHON][TESTS] Use `mkstemp` instead of `mktemp`

No new revisions were added by this update.

Summary of changes:
 python/pyspark/testing/pandasutils.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
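For context on this test-only change: `tempfile.mktemp` merely returns a currently unused name, leaving a window in which another process can create the file first, while `tempfile.mkstemp` atomically creates and opens the file. A minimal illustration of the safer pattern, not the Spark test code itself:

```python
import os
import tempfile

# mkstemp() creates the file and returns an open descriptor, so no other
# process can claim the same path between name generation and creation.
fd, path = tempfile.mkstemp(suffix='.txt')
try:
    # Wrap the raw descriptor in a file object for convenient writing.
    with os.fdopen(fd, 'w') as f:
        f.write('hello')
    with open(path) as f:
        content = f.read()
finally:
    os.remove(path)
```

Unlike `mktemp`, the caller is responsible for closing the descriptor and removing the file, which is why the cleanup lives in a `finally` block.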
[spark] branch master updated (01014aa -> 0f4c26a)
This is an automated email from the ASF dual-hosted git repository.

ueshin pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

 from 01014aa  [SPARK-38486][K8S][TESTS] Upgrade the minimum Minikube version to 1.18.0
  add 0f4c26a  [SPARK-38387][PYTHON] Support `na_action` and Series input correspondence in `Series.map`

No new revisions were added by this update.

Summary of changes:
 python/pyspark/pandas/series.py            | 36 +-
 python/pyspark/pandas/tests/test_series.py | 25 +++--
 2 files changed, 53 insertions(+), 8 deletions(-)
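The two features this change adds to `pyspark.pandas` follow the plain-pandas semantics of `Series.map`. A quick sketch of both (`na_action` and a Series argument) using ordinary `pandas`, which the pandas-on-Spark API aims to match:

```python
import pandas as pd

s = pd.Series([1.0, 2.0, float('nan')])

# na_action='ignore' propagates NaN without calling the function on it.
scaled = s.map(lambda x: x * 10, na_action='ignore')

# A Series argument acts as a lookup table: values of `s` are matched
# against the argument's index; unmatched values become NaN.
lookup = pd.Series(['one', 'two'], index=[1.0, 2.0])
named = s.map(lookup)
```

Here `scaled` is `[10.0, 20.0, NaN]` and `named` is `['one', 'two', NaN]`.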
[GitHub] [spark-website] HyukjinKwon closed pull request #381: Regenerate PySpark API documentation for Spark 3.2.1
HyukjinKwon closed pull request #381:
URL: https://github.com/apache/spark-website/pull/381

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use
the URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
[GitHub] [spark-website] HyukjinKwon commented on pull request #381: Regenerate PySpark API documentation for Spark 3.2.1
HyukjinKwon commented on pull request #381:
URL: https://github.com/apache/spark-website/pull/381#issuecomment-1063531527

   Merged.
[spark] branch master updated: [SPARK-38486][K8S][TESTS] Upgrade the minimum Minikube version to 1.18.0
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 01014aa  [SPARK-38486][K8S][TESTS] Upgrade the minimum Minikube version to 1.18.0
01014aa is described below

commit 01014aa99fa851411262a6719058dde97319bbb3
Author: Dongjoon Hyun
AuthorDate: Wed Mar 9 16:22:04 2022 -0800

    [SPARK-38486][K8S][TESTS] Upgrade the minimum Minikube version to 1.18.0

    ### What changes were proposed in this pull request?

    This PR aims to upgrade the minimum Minikube version to 1.18.0 from 1.7.3 at Apache Spark 3.3.0.

    ### Why are the changes needed?

    Minikube v1.18.0 was released one year ago on March 2021, and the first version supporting Apple Silicon natively. Previously, there exists some issues while running Intel arch binary on Apple Silicon.
    - https://github.com/kubernetes/minikube/releases/download/v1.18.0/minikube-darwin-arm64
    - https://github.com/kubernetes/minikube/releases/tag/v1.18.0

    ### Does this PR introduce _any_ user-facing change?

    No, this is a test-only PR.

    ### How was this patch tested?

    Manually.

    Closes #35791 from dongjoon-hyun/SPARK-38486.

    Authored-by: Dongjoon Hyun
    Signed-off-by: Dongjoon Hyun
---
 resource-managers/kubernetes/integration-tests/README.md             | 4 ++--
 .../spark/deploy/k8s/integrationtest/backend/minikube/Minikube.scala | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/resource-managers/kubernetes/integration-tests/README.md b/resource-managers/kubernetes/integration-tests/README.md
index 9eb928d..ac82282 100644
--- a/resource-managers/kubernetes/integration-tests/README.md
+++ b/resource-managers/kubernetes/integration-tests/README.md
@@ -28,7 +28,7 @@ To run tests with Hadoop 2.x instead of Hadoop 3.x, use `--hadoop-profile`.

     ./dev/dev-run-integration-tests.sh --hadoop-profile hadoop-2

-The minimum tested version of Minikube is 1.7.3. The kube-dns addon must be enabled. Minikube should
+The minimum tested version of Minikube is 1.18.0. The kube-dns addon must be enabled. Minikube should
 run with a minimum of 4 CPUs and 6G of memory:

     minikube start --cpus 4 --memory 6144

@@ -47,7 +47,7 @@ default this is set to `minikube`, the available backends are their prerequisite

 ### `minikube`

-Uses the local `minikube` cluster, this requires that `minikube` 1.7.3 or greater be installed and that it be allocated
+Uses the local `minikube` cluster, this requires that `minikube` 1.18.0 or greater be installed and that it be allocated
 at least 4 CPUs and 6GB memory (some users have reported success with as few as 3 CPUs and 4GB memory).
 The tests will check if `minikube` is started and abort early if it isn't currently running.

diff --git a/resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/backend/minikube/Minikube.scala b/resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/backend/minikube/Minikube.scala
index 9f99ede..755feb9 100644
--- a/resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/backend/minikube/Minikube.scala
+++ b/resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/backend/minikube/Minikube.scala
@@ -48,9 +48,9 @@ private[spark] object Minikube extends Logging {
     versionArrayOpt match {
       case Some(Array(x, y, z)) =>
-        if (Ordering.Tuple3[Int, Int, Int].lt((x, y, z), (1, 7, 3))) {
+        if (Ordering.Tuple3[Int, Int, Int].lt((x, y, z), (1, 18, 0))) {
           assert(false, s"Unsupported Minikube version is detected: $minikubeVersionString." +
-            "For integration testing Minikube version 1.7.3 or greater is expected.")
+            "For integration testing Minikube version 1.18.0 or greater is expected.")
         }
       case _ =>
         assert(false, s"Unexpected version format detected in `$minikubeVersionString`." +
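The Scala version gate above relies on lexicographic tuple ordering, which matches semantic-version ordering for purely numeric `major.minor.patch` strings where plain string comparison would not. A hedged Python sketch of the same idea (the helper name `is_supported` is ours, not Spark's):

```python
MINIMUM = (1, 18, 0)  # minimum Minikube version after this change

def is_supported(version_string: str) -> bool:
    # "v1.18.0" -> (1, 18, 0); tuples compare element by element,
    # so (1, 7, 3) < (1, 18, 0) even though "1.7.3" > "1.18.0" as strings.
    parts = tuple(int(p) for p in version_string.lstrip('v').split('.'))
    return parts >= MINIMUM
```

This is why the Scala code compares an `(x, y, z)` tuple with `Ordering.Tuple3` instead of comparing raw version strings.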
[spark-website] branch asf-site updated: Add notice for CVE-2021-38296
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/spark-website.git

The following commit(s) were added to refs/heads/asf-site by this push:
     new 1569fce  Add notice for CVE-2021-38296
1569fce is described below

commit 1569fcefeb8b6deba7270acc928a27ee678b6118
Author: Sean Owen
AuthorDate: Wed Mar 9 16:11:18 2022 -0600

    Add notice for CVE-2021-38296

    Author: Sean Owen

    Closes #382 from srowen/CVE-2021-38296.
---
 security.md        | 27 +++
 site/security.html | 32 
 2 files changed, 59 insertions(+)

diff --git a/security.md b/security.md
index dc9a9e6..32bbb74 100644
--- a/security.md
+++ b/security.md
@@ -18,6 +18,33 @@ non-public list that will reach the Apache Security team, as well as the Spark P
 Known security issues

+CVE-2021-38296: Apache Spark™ Key Negotiation Vulnerability
+
+Severity: Medium
+
+Vendor: The Apache Software Foundation
+
+Versions Affected:
+
+- Apache Spark 3.1.2 and earlier
+
+Description:
+
+Apache Spark supports end-to-end encryption of RPC connections via `spark.authenticate` and `spark.network.crypto.enabled`.
+In versions 3.1.2 and earlier, it uses a bespoke mutual authentication protocol that allows for full encryption key
+recovery. After an initial interactive attack, this would allow someone to decrypt plaintext traffic offline.
+Note that this does not affect security mechanisms controlled by `spark.authenticate.enableSaslEncryption`,
+`spark.io.encryption.enabled`, `spark.ssl`, `spark.ui.strictTransportSecurity`.
+
+Mitigation:
+
+- Update to Spark 3.1.3 or later
+
+Credit:
+
+- Steve Weis (Databricks)
+
 CVE-2020-9480: Apache Spark™ RCE vulnerability in auth-enabled standalone master

 Severity: Important

diff --git a/site/security.html b/site/security.html
index ff3de6c..be0a8d8 100644
--- a/site/security.html
+++ b/site/security.html
@@ -155,6 +155,38 @@ non-public list that will reach the Apache Security team, as well as the Spark P
 Known security issues

+CVE-2021-38296: Apache Spark™ Key Negotiation Vulnerability
+
+Severity: Medium
+
+Vendor: The Apache Software Foundation
+
+Versions Affected:
+
+ Apache Spark 3.1.2 and earlier
+
+Description:
+
+Apache Spark supports end-to-end encryption of RPC connections via spark.authenticate and spark.network.crypto.enabled.
+In versions 3.1.2 and earlier, it uses a bespoke mutual authentication protocol that allows for full encryption key
+recovery. After an initial interactive attack, this would allow someone to decrypt plaintext traffic offline.
+Note that this does not affect security mechanisms controlled by spark.authenticate.enableSaslEncryption,
+spark.io.encryption.enabled, spark.ssl, spark.ui.strictTransportSecurity.
+
+Mitigation:
+
+ Update to Spark 3.1.3 or later
+
+Credit:
+
+ Steve Weis (Databricks)
+
 CVE-2020-9480: Apache Spark™ RCE vulnerability in auth-enabled standalone master

 Severity: Important
[GitHub] [spark-website] srowen closed pull request #382: Add notice for CVE-2021-38296
srowen closed pull request #382:
URL: https://github.com/apache/spark-website/pull/382
[spark] branch master updated (effef84 -> 97df016)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

 from effef84  [SPARK-36681][CORE][TEST] Enable SnappyCodec test in FileSuite
  add 97df016  [SPARK-38480][K8S] Remove `spark.kubernetes.job.queue` in favor of `spark.kubernetes.driver.podGroupTemplateFile`

No new revisions were added by this update.

Summary of changes:
 docs/running-on-kubernetes.md                                  |  9 -
 .../src/main/scala/org/apache/spark/deploy/k8s/Config.scala    |  7 ---
 .../apache/spark/deploy/k8s/features/VolcanoFeatureStep.scala  |  2 --
 .../spark/deploy/k8s/features/VolcanoFeatureStepSuite.scala    | 10 --
 .../{disable-queue.yml => queue-driver-podgroup-template.yml}  |  8 ++--
 .../{disable-queue.yml => queue0-driver-podgroup-template.yml} |  8 ++--
 .../{disable-queue.yml => queue1-driver-podgroup-template.yml} |  8 ++--
 .../spark/deploy/k8s/integrationtest/VolcanoTestsSuite.scala   |  7 ++-
 8 files changed, 12 insertions(+), 47 deletions(-)
 copy resource-managers/kubernetes/integration-tests/src/test/resources/volcano/{disable-queue.yml => queue-driver-podgroup-template.yml} (91%)
 copy resource-managers/kubernetes/integration-tests/src/test/resources/volcano/{disable-queue.yml => queue0-driver-podgroup-template.yml} (91%)
 copy resource-managers/kubernetes/integration-tests/src/test/resources/volcano/{disable-queue.yml => queue1-driver-podgroup-template.yml} (91%)
[spark] branch master updated (1584366 -> effef84)
This is an automated email from the ASF dual-hosted git repository.

viirya pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

 from 1584366  [SPARK-38354][SQL] Add hash probes metric for shuffled hash join
  add effef84  [SPARK-36681][CORE][TEST] Enable SnappyCodec test in FileSuite

No new revisions were added by this update.

Summary of changes:
 core/src/test/scala/org/apache/spark/FileSuite.scala | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
[spark] branch master updated (93a25a4 -> 1584366)
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 93a25a4  [SPARK-37947][SQL] Extract generator from GeneratorOuter expression contained by a Generate operator
     add 1584366  [SPARK-38354][SQL] Add hash probes metric for shuffled hash join

No new revisions were added by this update.

Summary of changes:
 .../apache/spark/unsafe/map/BytesToBytesMap.java   |  2 +-
 .../execution/UnsafeFixedWidthAggregationMap.java  |  6 ++--
 .../execution/aggregate/HashAggregateExec.scala    |  4 +--
 .../aggregate/TungstenAggregationIterator.scala    |  2 +-
 .../spark/sql/execution/joins/HashedRelation.scala | 35 ++
 .../sql/execution/joins/ShuffledHashJoinExec.scala | 10 +--
 .../sql/execution/metric/SQLMetricsSuite.scala     | 16 +-
 7 files changed, 59 insertions(+), 16 deletions(-)
[spark] branch master updated (62e4c29 -> 93a25a4)
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 62e4c29  [SPARK-37421][PYTHON] Inline type hints for python/pyspark/mllib/evaluation.py
     add 93a25a4  [SPARK-37947][SQL] Extract generator from GeneratorOuter expression contained by a Generate operator

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/analysis/Analyzer.scala    |  3 +++
 .../apache/spark/sql/GeneratorFunctionSuite.scala | 24 ++
 2 files changed, 27 insertions(+)
[spark] branch master updated (bd6a3b4 -> 62e4c29)
This is an automated email from the ASF dual-hosted git repository.

zero323 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from bd6a3b4  [SPARK-38437][SQL] Lenient serialization of datetime from datasource
     add 62e4c29  [SPARK-37421][PYTHON] Inline type hints for python/pyspark/mllib/evaluation.py

No new revisions were added by this update.

Summary of changes:
 python/pyspark/mllib/evaluation.py  | 134 +++-
 python/pyspark/mllib/evaluation.pyi |  92 -
 2 files changed, 72 insertions(+), 154 deletions(-)
 delete mode 100644 python/pyspark/mllib/evaluation.pyi
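[Editor's sketch] SPARK-37421 deletes the separate `evaluation.pyi` stub and moves its type hints into `evaluation.py` itself. A simplified stand-in showing what "inlining" means here; the class and method below are illustrative, not the actual `pyspark.mllib.evaluation` API:

```python
# Before inlining, annotations lived in a parallel stub file (evaluation.pyi):
#
#   class BinaryClassificationMetrics:
#       def __init__(self, scoreAndLabels: RDD[Tuple[float, float]]) -> None: ...
#
# After inlining, the annotations sit directly on the implementation and the
# .pyi file is deleted. A toy, self-contained analogue:

class Metrics:
    """Toy stand-in for an MLlib evaluation class with inline type hints."""

    def __init__(self, score_and_labels: list[tuple[float, float]]) -> None:
        self._pairs = score_and_labels

    @property
    def accuracy(self) -> float:
        # Fraction of pairs whose score rounds to the label.
        correct = sum(1 for score, label in self._pairs if round(score) == label)
        return correct / len(self._pairs)


m = Metrics([(0.9, 1.0), (0.2, 0.0), (0.6, 0.0)])
print(m.accuracy)
```

Inline hints keep signatures and implementations in one place, so the two can no longer drift apart.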
[spark] branch master updated: [SPARK-38437][SQL] Lenient serialization of datetime from datasource
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new bd6a3b4  [SPARK-38437][SQL] Lenient serialization of datetime from datasource

bd6a3b4 is described below

commit bd6a3b4a001d29255f36bab9e9969cd919306fc2
Author: Max Gekk
AuthorDate: Wed Mar 9 11:36:57 2022 +0300

    [SPARK-38437][SQL] Lenient serialization of datetime from datasource

    ### What changes were proposed in this pull request?

    In the PR, I propose to support the lenient mode in the row serializer used by datasources to convert rows received from scans. Spark SQL will be able to accept:
    - `java.time.Instant` and `java.sql.Timestamp` for the `TIMESTAMP` type, and
    - `java.time.LocalDate` and `java.sql.Date` for the `DATE` type

    independently of the current value of the SQL config `spark.sql.datetime.java8API.enabled`.

    ### Why are the changes needed?

    A datasource might not be aware of the Spark SQL config `spark.sql.datetime.java8API.enabled` if the datasource was developed before the config was introduced in Spark 3.0.0. In that case, it always returns "legacy" timestamps/dates of the types `java.sql.Timestamp`/`java.sql.Date`, even if the user has enabled the Java 8 API.
    As Spark expects `java.time.Instant` or `java.time.LocalDate` but gets `java.sql.Timestamp` or `java.sql.Date`, the user observes the exception:
    ```java
    ERROR SparkExecuteStatementOperation: Error executing query with ac61b10a-486e-463b-8726-3b61da58582e, currentState RUNNING,
    org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 2.0 failed 4 times, most recent failure:
    Lost task 0.3 in stage 2.0 (TID 8) (10.157.1.194 executor 0): java.lang.RuntimeException: Error while encoding:
    java.lang.RuntimeException: java.sql.Timestamp is not a valid external type for schema of timestamp
    if (assertnotnull(input[0, org.apache.spark.sql.Row, true]).isNullAt) null else staticinvoke(class org.apache.spark.sql.catalyst.util.DateTimeUtils$, TimestampType, instantToMicros, validateexternaltype(getexternalrowfield(assertnotnull(input[0, org.apache.spark.sql.Row, true]), 0, loan_perf_date), TimestampType), true, false) AS loan_perf_date#1125
        at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$Serializer.apply(ExpressionEncoder.scala:239)
    ```

    This PR fixes the issue above. After the changes, users can use legacy datasource connectors with new Spark versions even when they need to enable the Java 8 API.

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    By running the affected test suites:
    ```
    $ build/sbt "test:testOnly *CodeGenerationSuite"
    $ build/sbt "test:testOnly *ObjectExpressionsSuite"
    ```
    and new tests:
    ```
    $ build/sbt "test:testOnly *RowEncoderSuite"
    $ build/sbt "test:testOnly *TableScanSuite"
    ```

Closes #35756 from MaxGekk/dynamic-serializer-java-ts.
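[Editor's sketch] The fix hinges on a conversion that accepts either datetime representation instead of only one. A rough Python sketch of the dispatch; the real helper (`anyToMicros`, added to Scala's `DateTimeUtils` in this commit) operates on `java.time.Instant` vs. `java.sql.Timestamp`, so the classes below are illustrative stand-ins:

```python
# Stand-ins for the two Java timestamp representations Spark must now accept.
class Instant:                      # analogue of java.time.Instant
    def __init__(self, epoch_second: int, nano: int) -> None:
        self.epoch_second = epoch_second
        self.nano = nano

class LegacyTimestamp:              # analogue of java.sql.Timestamp (millis-based)
    def __init__(self, millis: int) -> None:
        self.millis = millis

def any_to_micros(value) -> int:
    """Lenient conversion: accept either representation and normalize to
    microseconds since the epoch, so a pre-3.0 datasource that still emits
    legacy timestamps keeps working when Java 8 API mode is enabled."""
    if isinstance(value, Instant):
        return value.epoch_second * 1_000_000 + value.nano // 1_000
    if isinstance(value, LegacyTimestamp):
        return value.millis * 1_000
    # Mirrors the "not a valid external type" failure mode from the error above.
    raise TypeError(f"{type(value).__name__} is not a valid external type for timestamp")

print(any_to_micros(Instant(10, 500_000)))     # 10 s + 500 µs -> 10000500
print(any_to_micros(LegacyTimestamp(10_000)))  # 10 s in millis -> 10000000
```

Before this PR, the serializer dispatched on the SQL config alone; afterwards it dispatches on the runtime type of the value, which is what makes the mode "lenient".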
Authored-by: Max Gekk
Signed-off-by: Max Gekk
---
 .../spark/sql/catalyst/SerializerBuildHelper.scala | 18 +++
 .../spark/sql/catalyst/encoders/RowEncoder.scala   | 35 ++
 .../sql/catalyst/expressions/objects/objects.scala | 26
 .../spark/sql/catalyst/util/DateTimeUtils.scala    | 22 ++
 .../sql/catalyst/encoders/RowEncoderSuite.scala    | 23 ++
 .../catalyst/expressions/CodeGenerationSuite.scala |  4 ++-
 .../expressions/ObjectExpressionsSuite.scala       |  8 +++--
 .../execution/datasources/DataSourceStrategy.scala |  2 +-
 .../apache/spark/sql/sources/TableScanSuite.scala  | 27 +
 9 files changed, 144 insertions(+), 21 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SerializerBuildHelper.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SerializerBuildHelper.scala
index 3c17575..8dec923 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SerializerBuildHelper.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SerializerBuildHelper.scala
@@ -86,6 +86,15 @@ object SerializerBuildHelper {
       returnNullable = false)
   }

+  def createSerializerForAnyTimestamp(inputObject: Expression): Expression = {
+    StaticInvoke(
+      DateTimeUtils.getClass,
+      TimestampType,
+      "anyToMicros",
+      inputObject :: Nil,
+      returnNullable = false)
+  }
+
   def createSerializerForLocalDateTime(inputObject: Expression): Expression = {
     StaticInvoke(
       DateTimeUtils.getClass,
@@ -113,6 +122,15 @@ object SerializerBuildHelper {
       returnNullable = false)
   }

+  def createSerializerForAnyDate(inputObject: Expression): Expression = {
+    StaticInvoke(
+      DateTimeUtils.getClass,
+      DateType,
+