[spark] branch master updated: [SPARK-41862][SQL][TESTS][FOLLOWUP] Update OrcReadBenchmark result
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new aa7e54fc072 [SPARK-41862][SQL][TESTS][FOLLOWUP] Update OrcReadBenchmark result
aa7e54fc072 is described below

commit aa7e54fc0728af99cc8c55f3bd88b4ade4aaab05
Author: Dongjoon Hyun
AuthorDate: Tue Jan 3 22:05:50 2023 -0800

    [SPARK-41862][SQL][TESTS][FOLLOWUP] Update OrcReadBenchmark result

    ### What changes were proposed in this pull request?

    This PR is a follow-up of https://github.com/apache/spark/pull/39370 .

    ### Why are the changes needed?

    To sync the patch with the recovered perf result.

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    Manual review.
    - Java 8: https://github.com/dongjoon-hyun/spark/actions/runs/3834890434
    - Java 11: https://github.com/dongjoon-hyun/spark/actions/runs/3834892478
    - Java 17: https://github.com/dongjoon-hyun/spark/actions/runs/3834893844

    Closes #39380 from dongjoon-hyun/SPARK-41862.

    Authored-by: Dongjoon Hyun
    Signed-off-by: Dongjoon Hyun
---
 .../benchmarks/OrcReadBenchmark-jdk11-results.txt | 188 ++---
 .../benchmarks/OrcReadBenchmark-jdk17-results.txt | 188 ++---
 sql/hive/benchmarks/OrcReadBenchmark-results.txt  | 188 ++---
 3 files changed, 282 insertions(+), 282 deletions(-)

diff --git a/sql/hive/benchmarks/OrcReadBenchmark-jdk11-results.txt b/sql/hive/benchmarks/OrcReadBenchmark-jdk11-results.txt
index 5c44741b591..7d6db9ae30d 100644
--- a/sql/hive/benchmarks/OrcReadBenchmark-jdk11-results.txt
+++ b/sql/hive/benchmarks/OrcReadBenchmark-jdk11-results.txt
@@ -3,52 +3,52 @@ SQL Single Numeric Column Scan
 OpenJDK 64-Bit Server VM 11.0.17+8 on Linux 5.15.0-1023-azure
-Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
 SQL Single TINYINT Column Scan:   Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
-Hive built-in ORC                          1088          1102         18       14.5         69.2      1.0X
-Native ORC MR                               905           971         90       17.4         57.5      1.2X
-Native ORC Vectorized                       137           206         69      114.4          8.7      7.9X
+Hive built-in ORC                          1087          1119         45       14.5         69.1      1.0X
+Native ORC MR                               882           936         50       17.8         56.1      1.2X
+Native ORC Vectorized                       164           213         31       96.0         10.4      6.6X

 OpenJDK 64-Bit Server VM 11.0.17+8 on Linux 5.15.0-1023-azure
-Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
 SQL Single SMALLINT Column Scan:  Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
-Hive built-in ORC                          1265          1279         20       12.4         80.4      1.0X
-Native ORC MR                              1022          1102        113       15.4         65.0      1.2X
-Native ORC Vectorized                       135           201         63      116.8          8.6      9.4X
+Hive built-in ORC                          1282          1289         10       12.3         81.5      1.0X
+Native ORC MR                               916           962         65       17.2         58.2      1.4X
+Native ORC Vectorized                       151           212         47      104.1          9.6      8.5X

 OpenJDK 64-Bit Server VM 11.0.17+8 on Linux 5.15.0-1023-azure
-Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
 SQL Single INT Column Scan:       Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
-Hive built-in ORC                          1196          1258         88       13.1         76.0      1.0X
-Native ORC MR                               995          1014         27       15.8         63.3      1.2X
-Native ORC Vectorized
[spark] 01/03: [SPARK-38261][INFRA] Add missing R packages from base image
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git

commit b8978fecea4580300c51be748a6afb99fb352974
Author: khalidmammadov
AuthorDate: Mon Feb 21 11:04:48 2022 +0900

    [SPARK-38261][INFRA] Add missing R packages from base image

    The current GitHub workflow job **Linters, licenses, dependencies and documentation generation**
    is missing the R packages needed to complete the documentation and API build. **Build and test**
    is not failing because these packages are installed on the base image. We need to keep them in
    sync with the base image for an easy switch back to the ubuntu runner when ready.

    Reference: [**The base image**](https://hub.docker.com/layers/dongjoon/apache-spark-github-action-image/20220207/images/sha256-af09d172ff8e2cbd71df9a1bc5384a47578c4a4cc293786c539333cafaf4a7ce?context=explore)

    This change adds the missing packages to the workflow file, making it consistent with the base
    image config and letting the job complete standalone (i.e. without this image).

    No user-facing change. Tested with GitHub builds and in local Docker containers.

    Closes #35583 from khalidmammadov/sync_doc_build_with_base.

    Authored-by: khalidmammadov
    Signed-off-by: Hyukjin Kwon
    (cherry picked from commit 898542746b2c56b2571562ed8e9818bcb565aff2)
    Signed-off-by: Dongjoon Hyun
---
 .github/workflows/build_and_test.yml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/.github/workflows/build_and_test.yml b/.github/workflows/build_and_test.yml
index 4d8e6db73e0..2d2e96fdee8 100644
--- a/.github/workflows/build_and_test.yml
+++ b/.github/workflows/build_and_test.yml
@@ -406,7 +406,7 @@ jobs:
     python3.9 -m pip install 'docutils<0.18.0' # See SPARK-39421
     apt-get update -y
     apt-get install -y ruby ruby-dev
-    Rscript -e "install.packages(c('devtools', 'testthat', 'knitr', 'rmarkdown', 'roxygen2'), repos='https://cloud.r-project.org/')"
+    Rscript -e "install.packages(c('devtools', 'testthat', 'knitr', 'rmarkdown', 'markdown', 'e1071', 'roxygen2'), repos='https://cloud.r-project.org/')"
     gem install bundler
     cd docs
     bundle install

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.2 updated (576ca6e43c3 -> 0f5e231923b)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git

 from 576ca6e43c3 Revert "[SPARK-36939][PYTHON][DOCS] Add orphan migration page into list in PySpark documentation"
  new b8978fecea4 [SPARK-38261][INFRA] Add missing R packages from base image
  new 706cecdc028 [SPARK-39596][INFRA] Install `ggplot2` for GitHub Action linter job
  new 0f5e231923b [SPARK-39596][INFRA][FOLLOWUP] Install `mvtnorm` and `statmod` at linter job

The 3 revisions listed above as "new" are entirely new to this repository and will be
described in separate emails. The revisions listed as "add" were already present in the
repository and have only been added to this reference.

Summary of changes:
 .github/workflows/build_and_test.yml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] 02/03: [SPARK-39596][INFRA] Install `ggplot2` for GitHub Action linter job
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git

commit 706cecdc02833e2ea8f2137383cd0ff1222e8f44
Author: Dongjoon Hyun
AuthorDate: Sat Jun 25 00:31:54 2022 -0700

    [SPARK-39596][INFRA] Install `ggplot2` for GitHub Action linter job

    ### What changes were proposed in this pull request?

    This PR aims to fix the GitHub Action linter job by installing `ggplot2`.

    ### Why are the changes needed?

    The job started to fail like the following.
    - https://github.com/apache/spark/runs/7047294196?check_suite_focus=true
    ```
    x Failed to parse Rd in histogram.Rd
    ℹ there is no package called ‘ggplot2’
    ```

    ### Does this PR introduce _any_ user-facing change?

    No. This is a dev-only change.

    ### How was this patch tested?

    Pass the GitHub Action linter job.

    Closes #36987 from dongjoon-hyun/SPARK-39596.

    Authored-by: Dongjoon Hyun
    Signed-off-by: Dongjoon Hyun
    (cherry picked from commit bf59f6e4bd7f34f8a36bfef1e93e0ddccddf9e43)
    Signed-off-by: Dongjoon Hyun
---
 .github/workflows/build_and_test.yml | 1 +
 1 file changed, 1 insertion(+)

diff --git a/.github/workflows/build_and_test.yml b/.github/workflows/build_and_test.yml
index 2d2e96fdee8..7639fea7e79 100644
--- a/.github/workflows/build_and_test.yml
+++ b/.github/workflows/build_and_test.yml
@@ -385,6 +385,7 @@ jobs:
     libtiff5-dev libjpeg-dev
     Rscript -e "install.packages(c('devtools'), repos='https://cloud.r-project.org/')"
     Rscript -e "devtools::install_version('lintr', version='2.0.1', repos='https://cloud.r-project.org')"
+    Rscript -e "install.packages(c('ggplot2'), repos='https://cloud.r-project.org/')"
     ./R/install-dev.sh
 - name: Instll JavaScript linter dependencies
   run: |

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] 03/03: [SPARK-39596][INFRA][FOLLOWUP] Install `mvtnorm` and `statmod` at linter job
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git

commit 0f5e231923b77d39239acef80b654834834e9b29
Author: Dongjoon Hyun
AuthorDate: Sat Jun 25 20:37:53 2022 +0900

    [SPARK-39596][INFRA][FOLLOWUP] Install `mvtnorm` and `statmod` at linter job

    Closes #36988 from dongjoon-hyun/SPARK-39596-2.

    Authored-by: Dongjoon Hyun
    Signed-off-by: Hyukjin Kwon
    (cherry picked from commit 4c79cc7d5f0d818e479565f5d623e168d777ba0a)
    Signed-off-by: Dongjoon Hyun
---
 .github/workflows/build_and_test.yml | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/.github/workflows/build_and_test.yml b/.github/workflows/build_and_test.yml
index 7639fea7e79..4a4840995a1 100644
--- a/.github/workflows/build_and_test.yml
+++ b/.github/workflows/build_and_test.yml
@@ -385,7 +385,6 @@ jobs:
     libtiff5-dev libjpeg-dev
     Rscript -e "install.packages(c('devtools'), repos='https://cloud.r-project.org/')"
     Rscript -e "devtools::install_version('lintr', version='2.0.1', repos='https://cloud.r-project.org')"
-    Rscript -e "install.packages(c('ggplot2'), repos='https://cloud.r-project.org/')"
     ./R/install-dev.sh
 - name: Instll JavaScript linter dependencies
   run: |
@@ -407,7 +406,7 @@ jobs:
     python3.9 -m pip install 'docutils<0.18.0' # See SPARK-39421
     apt-get update -y
     apt-get install -y ruby ruby-dev
-    Rscript -e "install.packages(c('devtools', 'testthat', 'knitr', 'rmarkdown', 'markdown', 'e1071', 'roxygen2'), repos='https://cloud.r-project.org/')"
+    Rscript -e "install.packages(c('devtools', 'testthat', 'knitr', 'rmarkdown', 'markdown', 'e1071', 'roxygen2', 'ggplot2', 'mvtnorm', 'statmod'), repos='https://cloud.r-project.org/')"
     gem install bundler
     cd docs
     bundle install

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (0b786901633 -> 3130ca9748b)
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

 from 0b786901633 [SPARK-41850][CONNECT][PYTHON][TESTS] Enable doctest for `isnan`
  add 3130ca9748b [SPARK-41859][SQL] CreateHiveTableAsSelectCommand should set the overwrite flag correctly

No new revisions were added by this update.

Summary of changes:
 .../sql/hive/execution/CreateHiveTableAsSelectCommand.scala | 12 +---
 1 file changed, 5 insertions(+), 7 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.2 updated: Revert "[SPARK-36939][PYTHON][DOCS] Add orphan migration page into list in PySpark documentation"
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.2 by this push:
     new 576ca6e43c3 Revert "[SPARK-36939][PYTHON][DOCS] Add orphan migration page into list in PySpark documentation"
576ca6e43c3 is described below

commit 576ca6e43c37570bc920cc5239ecbb29a4e34560
Author: Dongjoon Hyun
AuthorDate: Tue Jan 3 21:03:20 2023 -0800

    Revert "[SPARK-36939][PYTHON][DOCS] Add orphan migration page into list in PySpark documentation"

    This reverts commit 0565d95a86e738d24e9c05a4c5c3c3815944b4be.
---
 python/docs/source/migration_guide/index.rst | 1 -
 1 file changed, 1 deletion(-)

diff --git a/python/docs/source/migration_guide/index.rst b/python/docs/source/migration_guide/index.rst
index 2e61653a9a5..b25ac313c7c 100644
--- a/python/docs/source/migration_guide/index.rst
+++ b/python/docs/source/migration_guide/index.rst
@@ -25,7 +25,6 @@ This page describes the migration guide specific to PySpark.
 .. toctree::
    :maxdepth: 2

-   pyspark_3.2_to_3.3
    pyspark_3.1_to_3.2
    pyspark_2.4_to_3.0
    pyspark_2.3_to_2.4

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-41850][CONNECT][PYTHON][TESTS] Enable doctest for `isnan`
This is an automated email from the ASF dual-hosted git repository.

ruifengz pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 0b786901633 [SPARK-41850][CONNECT][PYTHON][TESTS] Enable doctest for `isnan`
0b786901633 is described below

commit 0b786901633f3b8942dcb4c25e6c8a1671d3c0d6
Author: Ruifeng Zheng
AuthorDate: Wed Jan 4 12:28:20 2023 +0800

    [SPARK-41850][CONNECT][PYTHON][TESTS] Enable doctest for `isnan`

    ### What changes were proposed in this pull request?

    Enable doctest for `isnan`, it had been resolved in https://github.com/apache/spark/pull/39360

    ### Why are the changes needed?

    for test coverage

    ### Does this PR introduce _any_ user-facing change?

    no, test-only

    ### How was this patch tested?

    enabled doctest

    Closes #39376 from zhengruifeng/connect_fix_41850.

    Authored-by: Ruifeng Zheng
    Signed-off-by: Ruifeng Zheng
---
 python/pyspark/sql/connect/functions.py | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/python/pyspark/sql/connect/functions.py b/python/pyspark/sql/connect/functions.py
index c8ddd0cea7c..7a50906ee39 100644
--- a/python/pyspark/sql/connect/functions.py
+++ b/python/pyspark/sql/connect/functions.py
@@ -2444,9 +2444,6 @@ def _test() -> None:
     # TODO(SPARK-41849): implement DataFrameReader.text
     del pyspark.sql.connect.functions.input_file_name.__doc__

-    # TODO(SPARK-41850): fix isnan
-    del pyspark.sql.connect.functions.isnan.__doc__
-
     # Creates a remote Spark session.
     os.environ["SPARK_REMOTE"] = "sc://localhost"
     globs["spark"] = PySparkSession.builder.remote("sc://localhost").getOrCreate()

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.3 updated: [SPARK-41864][INFRA][PYTHON] Fix mypy linter errors
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.3
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.3 by this push:
     new 977e445865c [SPARK-41864][INFRA][PYTHON] Fix mypy linter errors
977e445865c is described below

commit 977e445865c0b835b19db44a129502c135b3348a
Author: Dongjoon Hyun
AuthorDate: Tue Jan 3 15:00:50 2023 -0800

    [SPARK-41864][INFRA][PYTHON] Fix mypy linter errors

    Currently, the GitHub Action Python linter job is broken. This PR recovers the
    Python linter failure. There are two kinds of failures.

    1. https://github.com/apache/spark/actions/runs/3829330032/jobs/6524170799
    ```
    python/pyspark/pandas/sql_processor.py:221: error: unused "type: ignore" comment
    Found 1 error in 1 file (checked 380 source files)
    ```

    2. After fixing (1), we hit the following.
    ```
    ModuleNotFoundError: No module named 'py._path'; 'py' is not a package
    ```

    No user-facing change.

    Pass the GitHub CI on this PR. Or, manually run the following.
    ```
    $ dev/lint-python
    starting python compilation test...
    python compilation succeeded.

    starting black test...
    black checks passed.

    starting flake8 test...
    flake8 checks passed.

    starting mypy annotations test...
    annotations passed mypy checks.

    starting mypy examples test...
    examples passed mypy checks.

    starting mypy data test...
    annotations passed data checks.

    all lint-python tests passed!
    ```

    Closes #39373 from dongjoon-hyun/SPARK-41864.

    Authored-by: Dongjoon Hyun
    Signed-off-by: Dongjoon Hyun
    (cherry picked from commit 13b2856e6e77392a417d2bb2ce804f873ee72b28)
    Signed-off-by: Dongjoon Hyun
---
 dev/requirements.txt                   | 1 +
 python/pyspark/pandas/sql_processor.py | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/dev/requirements.txt b/dev/requirements.txt
index e7e0a4b4274..79a70624312 100644
--- a/dev/requirements.txt
+++ b/dev/requirements.txt
@@ -43,3 +43,4 @@ PyGithub
 # pandas API on Spark Code formatter.
 black
+py

diff --git a/python/pyspark/pandas/sql_processor.py b/python/pyspark/pandas/sql_processor.py
index d8ae6888b68..7cf2f7461ba 100644
--- a/python/pyspark/pandas/sql_processor.py
+++ b/python/pyspark/pandas/sql_processor.py
@@ -218,7 +218,7 @@ def _get_ipython_scope() -> Dict[str, Any]:
     in an IPython notebook environment.
     """
     try:
-        from IPython import get_ipython  # type: ignore[import]
+        from IPython import get_ipython

         shell = get_ipython()
         return shell.user_ns

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
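The helper the patch touches follows a common "optional IPython" pattern: the import fails outside notebook environments, and `get_ipython()` returns `None` outside an interactive shell. A minimal standalone sketch of that pattern (simplified from `pyspark/pandas/sql_processor.py`; the function name here is illustrative, not PySpark's API):

```python
from typing import Any, Dict


def get_ipython_scope() -> Dict[str, Any]:
    """Return the IPython user namespace, or {} when not running under IPython.

    Two failure paths both fall back to {}: the import raises ImportError when
    IPython is not installed, and get_ipython() returns None outside an
    interactive shell, so shell.user_ns raises AttributeError.
    """
    try:
        from IPython import get_ipython  # may raise ImportError

        shell = get_ipython()
        return shell.user_ns
    except Exception:
        return {}


print(get_ipython_scope())  # {} in a plain (non-IPython) Python process
```

Dropping the `# type: ignore[import]` comment here is exactly the kind of change mypy's `warn_unused_ignores` flags once type stubs become available for the import.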
[spark] branch branch-3.2 updated: [SPARK-36883][INFRA] Upgrade R version to 4.1.1 in CI images
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.2 by this push:
     new 09f65c1e304 [SPARK-36883][INFRA] Upgrade R version to 4.1.1 in CI images
09f65c1e304 is described below

commit 09f65c1e304ade7036322920a97edc11fad1b194
Author: Dongjoon Hyun
AuthorDate: Wed Sep 29 11:39:01 2021 -0700

    [SPARK-36883][INFRA] Upgrade R version to 4.1.1 in CI images

    ### What changes were proposed in this pull request?

    This PR aims to upgrade the GitHub Action CI image to recover a CRAN installation failure.

    ### Why are the changes needed?

    Sometimes, the GitHub Action linter job failed.
    - https://github.com/apache/spark/runs/3739748809

    The new image has R 4.1.1 and will recover the failure.
    ```
    $ docker run -it --rm dongjoon/apache-spark-github-action-image:20210928 R --version
    R version 4.1.1 (2021-08-10) -- "Kick Things"
    Copyright (C) 2021 The R Foundation for Statistical Computing
    Platform: x86_64-pc-linux-gnu (64-bit)

    R is free software and comes with ABSOLUTELY NO WARRANTY.
    You are welcome to redistribute it under the terms of the
    GNU General Public License versions 2 or 3.
    For more information about these matters see
    https://www.gnu.org/licenses/.
    ```

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    Pass `GitHub Action`.

    Closes #34138 from dongjoon-hyun/SPARK-36883.

    Authored-by: Dongjoon Hyun
    Signed-off-by: Dongjoon Hyun
    (cherry picked from commit aa9064ad96ff7cefaa4381e912608b0b0d39a09c)
    Signed-off-by: Dongjoon Hyun
---
 .github/workflows/build_and_test.yml | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/.github/workflows/build_and_test.yml b/.github/workflows/build_and_test.yml
index 45d688ea98e..4d8e6db73e0 100644
--- a/.github/workflows/build_and_test.yml
+++ b/.github/workflows/build_and_test.yml
@@ -168,7 +168,7 @@ jobs:
   name: "Build modules: ${{ matrix.modules }}"
   runs-on: ubuntu-20.04
   container:
-    image: dongjoon/apache-spark-github-action-image:20210730
+    image: dongjoon/apache-spark-github-action-image:20210930
   strategy:
     fail-fast: false
     matrix:
@@ -265,7 +265,7 @@ jobs:
   name: "Build modules: sparkr"
   runs-on: ubuntu-20.04
   container:
-    image: dongjoon/apache-spark-github-action-image:20210602
+    image: dongjoon/apache-spark-github-action-image:20210930
   env:
     HADOOP_PROFILE: hadoop3.2
     HIVE_PROFILE: hive2.3
@@ -328,8 +328,9 @@ jobs:
     LC_ALL: C.UTF-8
     LANG: C.UTF-8
     PYSPARK_DRIVER_PYTHON: python3.9
+    PYSPARK_PYTHON: python3.9
   container:
-    image: dongjoon/apache-spark-github-action-image:20210602
+    image: dongjoon/apache-spark-github-action-image:20210930
   steps:
   - name: Checkout Spark repository
     uses: actions/checkout@v2

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-41719][CORE] Skip SSLOptions sub-settings if `ssl` is disabled
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 98f7182122e [SPARK-41719][CORE] Skip SSLOptions sub-settings if `ssl` is disabled
98f7182122e is described below

commit 98f7182122e151cbf7ea83303e39c44d9acb1a72
Author: Shrikant Prasad
AuthorDate: Tue Jan 3 17:40:40 2023 -0800

    [SPARK-41719][CORE] Skip SSLOptions sub-settings if `ssl` is disabled

    ### What changes were proposed in this pull request?

    In SSLOptions, the rest of the settings should be set only when SSL is enabled.

    ### Why are the changes needed?

    If spark.ssl.enabled is false, there is no use in setting the rest of the spark.ssl.*
    settings in SSLOptions, as this requires unnecessary operations to set these properties.
    An additional implication is that if any error occurs while setting these properties, it
    causes a job failure that otherwise would not have happened since SSL is disabled. For
    example, if the user doesn't have access to the keystore path set in
    hadoop.security.credential.provider.path of hive-site.xml, it can result in a failure
    while launching the Spark shell, since SSLOptions won't be initialized due to the error
    in accessing the keystore.

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    Added a new test.

    Closes #39221 from shrprasa/ssl_options_fix.

    Authored-by: Shrikant Prasad
    Signed-off-by: Dongjoon Hyun
---
 core/src/main/scala/org/apache/spark/SSLOptions.scala  |  4 +++-
 .../test/scala/org/apache/spark/SSLOptionsSuite.scala  | 18 --
 2 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/SSLOptions.scala b/core/src/main/scala/org/apache/spark/SSLOptions.scala
index f1668966d8e..d159f5717b0 100644
--- a/core/src/main/scala/org/apache/spark/SSLOptions.scala
+++ b/core/src/main/scala/org/apache/spark/SSLOptions.scala
@@ -181,7 +181,9 @@ private[spark] object SSLOptions extends Logging {
       ns: String,
       defaults: Option[SSLOptions] = None): SSLOptions = {
     val enabled = conf.getBoolean(s"$ns.enabled", defaultValue = defaults.exists(_.enabled))
-
+    if (!enabled) {
+      return new SSLOptions()
+    }
     val port = conf.getWithSubstitution(s"$ns.port").map(_.toInt)
     port.foreach { p =>
       require(p >= 0, "Port number must be a non-negative value.")
diff --git a/core/src/test/scala/org/apache/spark/SSLOptionsSuite.scala b/core/src/test/scala/org/apache/spark/SSLOptionsSuite.scala
index c990d81de2e..81bc4ae9da0 100644
--- a/core/src/test/scala/org/apache/spark/SSLOptionsSuite.scala
+++ b/core/src/test/scala/org/apache/spark/SSLOptionsSuite.scala
@@ -109,7 +109,7 @@ class SSLOptionsSuite extends SparkFunSuite {
     val conf = new SparkConf
     val hadoopConf = new Configuration()
     conf.set("spark.ssl.enabled", "true")
-    conf.set("spark.ssl.ui.enabled", "false")
+    conf.set("spark.ssl.ui.enabled", "true")
     conf.set("spark.ssl.ui.port", "4242")
     conf.set("spark.ssl.keyStore", keyStorePath)
     conf.set("spark.ssl.keyStorePassword", "password")
@@ -125,7 +125,7 @@ class SSLOptionsSuite extends SparkFunSuite {
     val defaultOpts = SSLOptions.parse(conf, hadoopConf, "spark.ssl", defaults = None)
     val opts = SSLOptions.parse(conf, hadoopConf, "spark.ssl.ui", defaults = Some(defaultOpts))

-    assert(opts.enabled === false)
+    assert(opts.enabled === true)
     assert(opts.port === Some(4242))
     assert(opts.trustStore.isDefined)
     assert(opts.trustStore.get.getName === "truststore")
@@ -140,6 +140,20 @@ class SSLOptionsSuite extends SparkFunSuite {
     assert(opts.enabledAlgorithms === Set("ABC", "DEF"))
   }

+  test("SPARK-41719: Skip ssl sub-settings if ssl is disabled") {
+    val keyStorePath = new File(this.getClass.getResource("/keystore").toURI).getAbsolutePath
+    val conf = new SparkConf
+    val hadoopConf = new Configuration()
+    conf.set("spark.ssl.enabled", "false")
+    conf.set("spark.ssl.keyStorePassword", "password")
+    conf.set("spark.ssl.keyStore", keyStorePath)
+    val sslOpts = SSLOptions.parse(conf, hadoopConf, "spark.ssl", defaults = None)
+
+    assert(sslOpts.enabled === false)
+    assert(sslOpts.keyStorePassword === None)
+    assert(sslOpts.keyStore === None)
+  }
+
   test("variable substitution") {
     val conf = new SparkConfWithEnv(Map(
       "ENV1" -> "val1",

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
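The fix above is an early-return guard: when `$ns.enabled` is false, return an empty options object before resolving any other `$ns.*` keys (which may touch keystores the user cannot read). A minimal Python sketch of the same guard, under assumed names (`parse_ssl_options` and the `SSLOptions` dataclass are illustrative, not Spark's Scala API):

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class SSLOptions:
    enabled: bool = False
    key_store: Optional[str] = None
    key_store_password: Optional[str] = None


def parse_ssl_options(conf: Dict[str, str], ns: str = "spark.ssl") -> SSLOptions:
    """Mirror the patched logic: if $ns.enabled is false, skip every other
    sub-setting and return defaults immediately."""
    enabled = conf.get(f"{ns}.enabled", "false").lower() == "true"
    if not enabled:
        return SSLOptions()  # early return, as in the Scala fix
    return SSLOptions(
        enabled=True,
        key_store=conf.get(f"{ns}.keyStore"),
        key_store_password=conf.get(f"{ns}.keyStorePassword"),
    )


# Even though a keystore path is configured, it is never touched when disabled.
opts = parse_ssl_options({"spark.ssl.enabled": "false", "spark.ssl.keyStore": "/restricted/ks"})
assert opts.enabled is False and opts.key_store is None
```

The design point is that the guard avoids side effects (filesystem access, credential-provider lookups) for configuration that is semantically inert, so a broken sub-setting cannot fail a job that never uses SSL.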
[spark] branch branch-3.2 updated (63722c39462 -> 736964e73b7)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git

 from 63722c39462 [SPARK-41865][INFRA][3.2] Use pycodestyle to 2.7.0 to fix pycodestyle errors
  add 736964e73b7 [SPARK-41030][BUILD][3.2] Upgrade `Apache Ivy` to 2.5.1

No new revisions were added by this update.

Summary of changes:
 dev/deps/spark-deps-hadoop-2.7-hive-2.3 | 2 +-
 dev/deps/spark-deps-hadoop-3.2-hive-2.3 | 2 +-
 pom.xml                                 | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.2 updated (ad2d42709ab -> 63722c39462)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git

 from ad2d42709ab [SPARK-41863][INFRA][PYTHON][TESTS] Skip `flake8` tests if the command is not available
  add 63722c39462 [SPARK-41865][INFRA][3.2] Use pycodestyle to 2.7.0 to fix pycodestyle errors

No new revisions were added by this update.

Summary of changes:
 .github/workflows/build_and_test.yml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-41862][SQL] Fix correctness bug related to DEFAULT values in Orc reader
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new a2392be592b [SPARK-41862][SQL] Fix correctness bug related to DEFAULT values in Orc reader
a2392be592b is described below

commit a2392be592bf6aa75391ea50cbab77cde152f8ce
Author: Daniel Tenedorio
AuthorDate: Wed Jan 4 09:30:42 2023 +0900

    [SPARK-41862][SQL] Fix correctness bug related to DEFAULT values in Orc reader

    ### What changes were proposed in this pull request?

    This PR fixes a correctness bug related to column DEFAULT values in the Orc reader.

    * https://github.com/apache/spark/pull/37280 introduced a performance regression in the Orc reader.
    * https://github.com/apache/spark/pull/39362 fixed the performance regression, but stopped the column DEFAULT feature from working, causing a temporary correctness regression that we agreed for me to fix later.
    * This PR restores column DEFAULT functionality for Orc scans and fixes the correctness regression while not reintroducing the performance regression.

    ### Why are the changes needed?

    This PR fixes a correctness bug.

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    This PR updates a unit test to exercise that the Orc scan functionality is correct.

    Closes #39370 from dtenedor/fix-perf-bug-orc-reader.

    Authored-by: Daniel Tenedorio
    Signed-off-by: Hyukjin Kwon
---
 .../datasources/orc/OrcDeserializer.scala          | 71 +-
 .../org/apache/spark/sql/sources/InsertSuite.scala | 15 +
 2 files changed, 19 insertions(+), 67 deletions(-)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcDeserializer.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcDeserializer.scala
index 5b207a04ada..5bac404fd53 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcDeserializer.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcDeserializer.scala
@@ -42,21 +42,26 @@ class OrcDeserializer(
   // is always null in this case
   // - a function that updates target column `index` otherwise.
   private val fieldWriters: Array[WritableComparable[_] => Unit] = {
+    // Assume we create a table backed by Orc files. Then if we later run a command "ALTER TABLE t
+    // ADD COLUMN c DEFAULT " on the Orc table, this adds one field to the Catalyst schema.
+    // Then if we query the old files with the new Catalyst schema, we should only apply the
+    // existence default value to the columns whose IDs are not explicitly requested.
+    if (requiredSchema.hasExistenceDefaultValues) {
+      for (i <- 0 until requiredSchema.existenceDefaultValues.size) {
+        requiredSchema.existenceDefaultsBitmask(i) =
+          if (requestedColIds(i) != -1) {
+            false
+          } else {
+            requiredSchema.existenceDefaultValues(i) != null
+          }
+      }
+    }
     requiredSchema.zipWithIndex
       .map { case (f, index) =>
         if (requestedColIds(index) == -1) {
           null
         } else {
-          // Create a RowUpdater instance for converting Orc objects to Catalyst rows. If any fields
-          // in the Orc result schema have associated existence default values, maintain a
-          // boolean array to track which fields have been explicitly assigned for each row.
-          val rowUpdater: RowUpdater =
-            if (requiredSchema.hasExistenceDefaultValues) {
-              resetExistenceDefaultsBitmask(requiredSchema)
-              new RowUpdaterWithBitmask(resultRow, requiredSchema.existenceDefaultsBitmask)
-            } else {
-              new RowUpdater(resultRow)
-            }
+          val rowUpdater = new RowUpdater(resultRow)
           val writer = newWriter(f.dataType, rowUpdater)
           (value: WritableComparable[_]) => writer(index, value)
         }
@@ -93,6 +98,7 @@ class OrcDeserializer(
       }
       targetColumnIndex += 1
     }
+    applyExistenceDefaultValuesToRow(requiredSchema, resultRow)
     resultRow
   }
@@ -288,49 +294,4 @@ class OrcDeserializer(
     override def setDouble(ordinal: Int, value: Double): Unit = array.setDouble(ordinal, value)
     override def setFloat(ordinal: Int, value: Float): Unit = array.setFloat(ordinal, value)
   }
-
-  /**
-   * Subclass of RowUpdater that also updates a boolean array bitmask. In this way, after all
-   * assignments are complete, it is possible to inspect the bitmask to determine which columns have
-   * been written at least once.
-   */
-  final class RowUpdaterWithBitmask(
-      row: InternalRow, bitmask: Array[Boolean]) extends RowUpdater(row) {
-    override def setNullAt(ordinal:
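The core of the fix above is computing the existence-defaults bitmask once, up front, instead of tracking per-row writes with a `RowUpdaterWithBitmask`: a column needs its DEFAULT applied only when it is absent from the Orc file (requested column id is -1) and a default value exists. A minimal Python sketch of that per-schema computation (the function name is illustrative, not Spark's API):

```python
from typing import List, Optional


def existence_defaults_bitmask(
    requested_col_ids: List[int],
    existence_defaults: List[Optional[object]],
) -> List[bool]:
    """Sketch of the bitmask computed once in the patched OrcDeserializer:
    True means "fill this column from its existence DEFAULT"; a column
    present in the file (id != -1) is always read from the file instead."""
    return [
        col_id == -1 and default is not None
        for col_id, default in zip(requested_col_ids, existence_defaults)
    ]


# Two columns present in an old Orc file, one added later via
# ALTER TABLE ... ADD COLUMN ... DEFAULT, so only the third needs the default.
mask = existence_defaults_bitmask([0, 1, -1], [None, None, 42])
assert mask == [False, False, True]
```

Because the mask depends only on the schema and the requested column ids, it can be computed once per file rather than reset per row, which is how the patch avoids reintroducing the per-row bookkeeping that caused the earlier performance regression.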
[spark] branch master updated (7da7ad3c5b9 -> c26d59864a9)
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

 from 7da7ad3c5b9 [SPARK-41423][CORE][BUILD] Exclude StageData.rddIds, this and accumulatorUpdates for Scala 2.13
  add c26d59864a9 [SPARK-41856][CONNECT][TESTS] Enable test_create_nan_decimal_dataframe, test_freqItems, test_input_files, test_to_pandas_required_pandas_not_found

No new revisions were added by this update.

Summary of changes:
 python/pyspark/sql/tests/connect/test_parity_dataframe.py | 12 ------------
 python/pyspark/sql/tests/test_dataframe.py                |  2 +-
 2 files changed, 1 insertion(+), 13 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (1a5ef40a4d5 -> 7da7ad3c5b9)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

 from 1a5ef40a4d5 [SPARK-41863][INFRA][PYTHON][TESTS] Skip `flake8` tests if the command is not available
  add 7da7ad3c5b9 [SPARK-41423][CORE][BUILD] Exclude StageData.rddIds, this and accumulatorUpdates for Scala 2.13

No new revisions were added by this update.

Summary of changes:
 project/MimaExcludes.scala | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.2 updated: [SPARK-41863][INFRA][PYTHON][TESTS] Skip `flake8` tests if the command is not available
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.2 in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.2 by this push:
     new ad2d42709ab [SPARK-41863][INFRA][PYTHON][TESTS] Skip `flake8` tests if the command is not available

ad2d42709ab is described below

commit ad2d42709abfc8f8ad27f836c811a4b75ef32ee9
Author: Dongjoon Hyun
AuthorDate: Tue Jan 3 15:01:43 2023 -0800

    [SPARK-41863][INFRA][PYTHON][TESTS] Skip `flake8` tests if the command is not available

    ### What changes were proposed in this pull request?

    This PR aims to skip `flake8` tests if the command is not available.

    ### Why are the changes needed?

    Linters are optional modules, and they can already be skipped on some systems, like `mypy`.
    ```
    $ dev/lint-python
    starting python compilation test...
    python compilation succeeded.

    The Python library providing 'black' module was not found. Skipping black checks for now.

    The flake8 command was not found. Skipping for now.

    The mypy command was not found. Skipping for now.

    all lint-python tests passed!
    ```

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    Manual tests.

    Closes #39372 from dongjoon-hyun/SPARK-41863.

    Authored-by: Dongjoon Hyun
    Signed-off-by: Dongjoon Hyun
    (cherry picked from commit 1a5ef40a4d59b377b028b55ea3805caf5d55f28f)
    Signed-off-by: Dongjoon Hyun
---
 dev/lint-python | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/dev/lint-python b/dev/lint-python
index e54e391c587..031b34f4af9 100755
--- a/dev/lint-python
+++ b/dev/lint-python
@@ -164,9 +164,8 @@ function flake8_test {
     local FLAKE8_STATUS=

     if ! hash "$FLAKE8_BUILD" 2> /dev/null; then
-        echo "The flake8 command was not found."
-        echo "flake8 checks failed."
-        exit 1
+        echo "The flake8 command was not found. Skipping for now."
+        return
     fi

     _FLAKE8_VERSION=($($FLAKE8_BUILD --version))
[spark] branch branch-3.3 updated: [SPARK-41863][INFRA][PYTHON][TESTS] Skip `flake8` tests if the command is not available
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.3 in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.3 by this push:
     new 2da30ad0658 [SPARK-41863][INFRA][PYTHON][TESTS] Skip `flake8` tests if the command is not available

2da30ad0658 is described below

commit 2da30ad0658406462ede656ad368f890e7051a5e
Author: Dongjoon Hyun
AuthorDate: Tue Jan 3 15:01:43 2023 -0800

    [SPARK-41863][INFRA][PYTHON][TESTS] Skip `flake8` tests if the command is not available

    ### What changes were proposed in this pull request?

    This PR aims to skip `flake8` tests if the command is not available.

    ### Why are the changes needed?

    Linters are optional modules, and they can already be skipped on some systems, like `mypy`.
    ```
    $ dev/lint-python
    starting python compilation test...
    python compilation succeeded.

    The Python library providing 'black' module was not found. Skipping black checks for now.

    The flake8 command was not found. Skipping for now.

    The mypy command was not found. Skipping for now.

    all lint-python tests passed!
    ```

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    Manual tests.

    Closes #39372 from dongjoon-hyun/SPARK-41863.

    Authored-by: Dongjoon Hyun
    Signed-off-by: Dongjoon Hyun
    (cherry picked from commit 1a5ef40a4d59b377b028b55ea3805caf5d55f28f)
    Signed-off-by: Dongjoon Hyun
---
 dev/lint-python | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/dev/lint-python b/dev/lint-python
index f0ca8832be0..5505f4b1105 100755
--- a/dev/lint-python
+++ b/dev/lint-python
@@ -173,9 +173,8 @@ function flake8_test {
     local FLAKE8_STATUS=

     if ! hash "$FLAKE8_BUILD" 2> /dev/null; then
-        echo "The flake8 command was not found."
-        echo "flake8 checks failed."
-        exit 1
+        echo "The flake8 command was not found. Skipping for now."
+        return
     fi

     _FLAKE8_VERSION=($($FLAKE8_BUILD --version))
[spark] branch master updated: [SPARK-41863][INFRA][PYTHON][TESTS] Skip `flake8` tests if the command is not available
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 1a5ef40a4d5 [SPARK-41863][INFRA][PYTHON][TESTS] Skip `flake8` tests if the command is not available

1a5ef40a4d5 is described below

commit 1a5ef40a4d59b377b028b55ea3805caf5d55f28f
Author: Dongjoon Hyun
AuthorDate: Tue Jan 3 15:01:43 2023 -0800

    [SPARK-41863][INFRA][PYTHON][TESTS] Skip `flake8` tests if the command is not available

    ### What changes were proposed in this pull request?

    This PR aims to skip `flake8` tests if the command is not available.

    ### Why are the changes needed?

    Linters are optional modules, and they can already be skipped on some systems, like `mypy`.
    ```
    $ dev/lint-python
    starting python compilation test...
    python compilation succeeded.

    The Python library providing 'black' module was not found. Skipping black checks for now.

    The flake8 command was not found. Skipping for now.

    The mypy command was not found. Skipping for now.

    all lint-python tests passed!
    ```

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    Manual tests.

    Closes #39372 from dongjoon-hyun/SPARK-41863.

    Authored-by: Dongjoon Hyun
    Signed-off-by: Dongjoon Hyun
---
 dev/lint-python | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/dev/lint-python b/dev/lint-python
index f1f4e9f1070..b5ee63e3869 100755
--- a/dev/lint-python
+++ b/dev/lint-python
@@ -175,9 +175,8 @@ function flake8_test {
     local FLAKE8_STATUS=

     if ! hash "$FLAKE8_BUILD" 2> /dev/null; then
-        echo "The flake8 command was not found."
-        echo "flake8 checks failed."
-        exit 1
+        echo "The flake8 command was not found. Skipping for now."
+        return
     fi

     _FLAKE8_VERSION=($($FLAKE8_BUILD --version))
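The skip-if-missing pattern adopted above (`hash "$FLAKE8_BUILD" || return` instead of a hard `exit 1`) can be sketched in Python with `shutil.which`; the function name and linter name below are illustrative, not part of `dev/lint-python`.

```python
import shutil
import sys


def run_optional_linter(command, run_checks):
    """Run a linter only when its executable is on PATH; otherwise skip,
    mirroring the new flake8 behavior instead of failing the whole build."""
    if shutil.which(command) is None:
        print(f"The {command} command was not found. Skipping for now.")
        return None  # skipped, not failed
    return run_checks()


# A command that is certainly absent is skipped...
assert run_optional_linter("no-such-linter-xyz", lambda: "passed") is None
# ...while an available executable (the running interpreter) is invoked.
assert run_optional_linter(sys.executable, lambda: "passed") == "passed"
```

The design point is the return value: a missing optional tool yields "skipped" rather than a non-zero exit, so the remaining linters still run.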
[spark] branch master updated: [SPARK-41864][INFRA][PYTHON] Fix mypy linter errors
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 13b2856e6e7 [SPARK-41864][INFRA][PYTHON] Fix mypy linter errors

13b2856e6e7 is described below

commit 13b2856e6e77392a417d2bb2ce804f873ee72b28
Author: Dongjoon Hyun
AuthorDate: Tue Jan 3 15:00:50 2023 -0800

    [SPARK-41864][INFRA][PYTHON] Fix mypy linter errors

    ### What changes were proposed in this pull request?

    Currently, the GitHub Action Python linter job is broken. This PR recovers the Python linter from the failure.

    ### Why are the changes needed?

    There are two kinds of failures.

    1. https://github.com/apache/spark/actions/runs/3829330032/jobs/6524170799
    ```
    python/pyspark/pandas/sql_processor.py:221: error: unused "type: ignore" comment
    Found 1 error in 1 file (checked 380 source files)
    ```

    2. After fixing (1), we hit the following.
    ```
    ModuleNotFoundError: No module named 'py._path'; 'py' is not a package
    ```

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    Pass the GitHub CI on this PR. Or, manually run the following.
    ```
    $ dev/lint-python
    starting python compilation test...
    python compilation succeeded.

    starting black test...
    black checks passed.

    starting flake8 test...
    flake8 checks passed.

    starting mypy annotations test...
    annotations passed mypy checks.

    starting mypy examples test...
    examples passed mypy checks.

    starting mypy data test...
    annotations passed data checks.

    all lint-python tests passed!
    ```

    Closes #39373 from dongjoon-hyun/SPARK-41864.

    Authored-by: Dongjoon Hyun
    Signed-off-by: Dongjoon Hyun
---
 dev/requirements.txt                   | 1 +
 python/pyspark/pandas/sql_processor.py | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/dev/requirements.txt b/dev/requirements.txt
index c3911b57eb9..1d978c4602c 100644
--- a/dev/requirements.txt
+++ b/dev/requirements.txt
@@ -47,6 +47,7 @@ PyGithub

 # pandas API on Spark Code formatter.
 black==22.6.0
+py

 # Spark Connect (required)
 grpcio==1.48.1

diff --git a/python/pyspark/pandas/sql_processor.py b/python/pyspark/pandas/sql_processor.py
index ec6b0498511..28e2329b8f9 100644
--- a/python/pyspark/pandas/sql_processor.py
+++ b/python/pyspark/pandas/sql_processor.py
@@ -218,7 +218,7 @@ def _get_ipython_scope() -> Dict[str, Any]:
     in an IPython notebook environment.
     """
     try:
-        from IPython import get_ipython  # type: ignore[import]
+        from IPython import get_ipython
         shell = get_ipython()
         return shell.user_ns
[spark] branch master updated (23aec321bd8 -> 7ede493bfca)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git

from 23aec321bd8 [SPARK-41049][SQL][FOLLOWUP] Move expression initialization code to the base class
add 7ede493bfca [SPARK-41814][SPARK-41851][SPARK-41852][FOLLOW-UP] Reeanble skipped doctests

No new revisions were added by this update.

Summary of changes:
 python/pyspark/sql/column.py            | 3 +--
 python/pyspark/sql/connect/functions.py | 6 ------
 2 files changed, 1 insertion(+), 8 deletions(-)
[spark] branch master updated: [SPARK-41049][SQL][FOLLOWUP] Move expression initialization code to the base class
This is an automated email from the ASF dual-hosted git repository.

viirya pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 23aec321bd8 [SPARK-41049][SQL][FOLLOWUP] Move expression initialization code to the base class

23aec321bd8 is described below

commit 23aec321bd822867a698ee3bc17b21753ce8
Author: Wenchen Fan
AuthorDate: Tue Jan 3 10:46:44 2023 -0800

    [SPARK-41049][SQL][FOLLOWUP] Move expression initialization code to the base class

    ### What changes were proposed in this pull request?

    This is a followup of https://github.com/apache/spark/pull/39248 , to add one more code cleanup. The expression initialization code is duplicated 6 times, and we should put it in the base class.

    ### Why are the changes needed?

    Code cleanup.

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    Existing tests.

    Closes #39364 from cloud-fan/expr.

    Authored-by: Wenchen Fan
    Signed-off-by: Liang-Chi Hsieh
---
 .../sql/catalyst/expressions/ExpressionsEvaluator.scala         | 7 +++++++
 .../sql/catalyst/expressions/InterpretedMutableProjection.scala | 5 +----
 .../sql/catalyst/expressions/InterpretedSafeProjection.scala    | 5 +----
 .../sql/catalyst/expressions/InterpretedUnsafeProjection.scala  | 5 +----
 .../org/apache/spark/sql/catalyst/expressions/Projection.scala  | 5 +----
 .../org/apache/spark/sql/catalyst/expressions/predicates.scala  | 6 +-----
 6 files changed, 12 insertions(+), 21 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExpressionsEvaluator.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExpressionsEvaluator.scala
index dcbc6926cd3..1fc0144fede 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExpressionsEvaluator.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExpressionsEvaluator.scala
@@ -42,4 +42,11 @@ trait ExpressionsEvaluator {
    * The default implementation does nothing.
    */
   def initialize(partitionIndex: Int): Unit = {}
+
+  protected def initializeExprs(exprs: Seq[Expression], partitionIndex: Int): Unit = {
+    exprs.foreach(_.foreach {
+      case n: Nondeterministic => n.initialize(partitionIndex)
+      case _ =>
+    })
+  }
 }

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/InterpretedMutableProjection.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/InterpretedMutableProjection.scala
index 682604b9bf7..01e9de085da 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/InterpretedMutableProjection.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/InterpretedMutableProjection.scala
@@ -41,10 +41,7 @@ class InterpretedMutableProjection(expressions: Seq[Expression]) extends Mutable
   private[this] val buffer = new Array[Any](expressions.size)

   override def initialize(partitionIndex: Int): Unit = {
-    exprs.foreach(_.foreach {
-      case n: Nondeterministic => n.initialize(partitionIndex)
-      case _ =>
-    })
+    initializeExprs(exprs, partitionIndex)
   }

   private[this] val validExprs = expressions.zipWithIndex.filter {

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/InterpretedSafeProjection.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/InterpretedSafeProjection.scala
index 84263d97f5d..87539e80b0b 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/InterpretedSafeProjection.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/InterpretedSafeProjection.scala
@@ -101,10 +101,7 @@ class InterpretedSafeProjection(expressions: Seq[Expression]) extends Projection
   }

   override def initialize(partitionIndex: Int): Unit = {
-    expressions.foreach(_.foreach {
-      case n: Nondeterministic => n.initialize(partitionIndex)
-      case _ =>
-    })
+    initializeExprs(exprs, partitionIndex)
   }

   override def apply(row: InternalRow): InternalRow = {

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/InterpretedUnsafeProjection.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/InterpretedUnsafeProjection.scala
index 9108a045c09..90a90444695 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/InterpretedUnsafeProjection.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/InterpretedUnsafeProjection.scala
@@ -67,10 +67,7 @@ class InterpretedUnsafeProjection(expressions: Array[Expression]) extends Unsafe
   }

   override def initialize(partitionIndex: Int): Unit
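The refactor above hoists the same "initialize every nondeterministic expression" loop out of six subclasses into one protected helper on the base trait. A Python analogue of that pattern (class and attribute names are illustrative, not Spark's Python API):

```python
class ExpressionsEvaluator:
    """Base class: the shared initialization walk lives here once,
    instead of being repeated in every projection/predicate subclass."""

    def initialize_exprs(self, exprs, partition_index):
        for expr in exprs:
            # Only expressions that declare themselves nondeterministic
            # carry per-partition state that needs initializing.
            if getattr(expr, "nondeterministic", False):
                expr.initialize(partition_index)


class RandExpr:
    nondeterministic = True

    def __init__(self):
        self.partition_index = None

    def initialize(self, partition_index):
        self.partition_index = partition_index


class LiteralExpr:
    nondeterministic = False


class InterpretedProjection(ExpressionsEvaluator):
    def __init__(self, exprs):
        self.exprs = exprs

    def initialize(self, partition_index):
        # Each subclass now delegates to the shared helper.
        self.initialize_exprs(self.exprs, partition_index)


proj = InterpretedProjection([RandExpr(), LiteralExpr()])
proj.initialize(partition_index=3)
print(proj.exprs[0].partition_index)  # → 3
```

As in the Scala diff, subclasses keep their public `initialize` override but its body collapses to a single delegation call.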
[spark] branch master updated: [SPARK-41858][SQL] Fix ORC reader perf regression due to DEFAULT value feature
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new d81e55e1ff9 [SPARK-41858][SQL] Fix ORC reader perf regression due to DEFAULT value feature

d81e55e1ff9 is described below

commit d81e55e1ff998c624fa80c5660d7724701b4df23
Author: Dongjoon Hyun
AuthorDate: Tue Jan 3 10:40:44 2023 -0800

    [SPARK-41858][SQL] Fix ORC reader perf regression due to DEFAULT value feature

    ### What changes were proposed in this pull request?

    This PR is a partial and logical revert of SPARK-39862 (https://github.com/apache/spark/pull/37280) to fix the huge ORC reader perf regression (3x slower). SPARK-39862 should propose a fix without the perf regression.

    ### Why are the changes needed?

    During Apache Spark 3.4.0 preparation, SPARK-41782 identified a perf regression.
    - https://github.com/apache/spark/pull/39301#discussion_r1059239575

    ### Does this PR introduce _any_ user-facing change?

    After this PR, the regression is removed. However, the bug in the DEFAULT value feature will remain. This should be handled separately.

    ### How was this patch tested?

    Pass the CI.

    Closes #39362 from dongjoon-hyun/SPARK-41858.
Authored-by: Dongjoon Hyun
Signed-off-by: Dongjoon Hyun
---
 .../execution/datasources/orc/OrcDeserializer.scala | 21 +++++++++++----------
 .../org/apache/spark/sql/sources/InsertSuite.scala  |  9 +++++++--
 2 files changed, 18 insertions(+), 12 deletions(-)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcDeserializer.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcDeserializer.scala
index 5276f5c6d7b..5b207a04ada 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcDeserializer.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcDeserializer.scala
@@ -57,14 +57,7 @@ class OrcDeserializer(
         } else {
           new RowUpdater(resultRow)
         }
-        val writer: (Int, WritableComparable[_]) => Unit =
-          (ordinal, value) =>
-            if (value == null) {
-              rowUpdater.setNullAt(ordinal)
-            } else {
-              val writerFunc = newWriter(f.dataType, rowUpdater)
-              writerFunc(ordinal, value)
-            }
+        val writer = newWriter(f.dataType, rowUpdater)
         (value: WritableComparable[_]) => writer(index, value)
       }
     }.toArray
@@ -75,7 +68,11 @@ class OrcDeserializer(
     while (targetColumnIndex < fieldWriters.length) {
       if (fieldWriters(targetColumnIndex) != null) {
         val value = orcStruct.getFieldValue(requestedColIds(targetColumnIndex))
-        fieldWriters(targetColumnIndex)(value)
+        if (value == null) {
+          resultRow.setNullAt(targetColumnIndex)
+        } else {
+          fieldWriters(targetColumnIndex)(value)
+        }
       }
       targetColumnIndex += 1
     }
@@ -88,7 +85,11 @@ class OrcDeserializer(
     while (targetColumnIndex < fieldWriters.length) {
       if (fieldWriters(targetColumnIndex) != null) {
         val value = orcValues(requestedColIds(targetColumnIndex))
-        fieldWriters(targetColumnIndex)(value)
+        if (value == null) {
+          resultRow.setNullAt(targetColumnIndex)
+        } else {
+          fieldWriters(targetColumnIndex)(value)
+        }
       }
       targetColumnIndex += 1
     }

diff --git a/sql/core/src/test/scala/org/apache/spark/sql/sources/InsertSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/sources/InsertSuite.scala
index dd37c93871e..7c4a39d6ff4 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/sources/InsertSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/sources/InsertSuite.scala
@@ -1679,7 +1679,8 @@ class InsertSuite extends DataSourceTest with SharedSparkSession {
       Config(
         None),
       Config(
-        Some(SQLConf.ORC_VECTORIZED_READER_ENABLED.key -> "false",
+        Some(SQLConf.ORC_VECTORIZED_READER_ENABLED.key -> "false"),
+        insertNullsToStorage = false))),
     TestCase(
       dataSource = "parquet",
       Seq(
@@ -1943,7 +1944,11 @@ class InsertSuite extends DataSourceTest with SharedSparkSession {
         Row(Seq(Row(1, 2)), Seq(Map(false -> "def", true -> "jkl"))),
         Seq(Map(true -> "xyz"))),
       Row(2,
-        null,
+        if (config.dataSource != "orc") {
+          null
+        } else {
+          Row(Seq(Row(1, 2)), Seq(Map(false -> "def", true -> "jkl")))
+        },
         Seq(Map(true -> "xyz"))),
       Row(3, Row(Seq(Row(3, 4)), Seq(Map(false -> "mno", true ->
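The performance bug fixed above is that the old per-value closure called `newWriter(...)` again for every non-null value, rebuilding the field writer on each row; the fix builds the writer once and handles nulls at the call site. A Python sketch of the difference (function names are illustrative, not Spark's):

```python
def make_field_writer_slow(build_writer):
    """Regression shape: the concrete writer is rebuilt per non-null value."""
    def write(row, ordinal, value):
        if value is None:
            row[ordinal] = None
        else:
            build_writer()(row, ordinal, value)  # rebuilt on every call!
    return write


def make_field_writer_fast(build_writer):
    """Fixed shape: build the writer once; nulls are handled by the caller."""
    writer = build_writer()
    def write(row, ordinal, value):
        if value is None:
            row[ordinal] = None
        else:
            writer(row, ordinal, value)
    return write


builds = {"n": 0}

def build_writer():
    builds["n"] += 1
    return lambda row, ordinal, value: row.__setitem__(ordinal, value)

row = [None]
fast = make_field_writer_fast(build_writer)
for v in (1, 2, None, 3):
    fast(row, 0, v)

print(builds["n"])  # → 1 (built once, however many values flow through)
```

With the slow shape, `builds["n"]` would grow with every non-null value, which is exactly the per-row overhead the revert eliminates.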
[spark] branch master updated (3c40be2dddc -> ec594236df4)
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git

from 3c40be2dddc [SPARK-41405][SQL] Centralize the column resolution logic
add ec594236df4 [SPARK-41853][CORE] Use Map in place of SortedMap for ErrorClassesJsonReader

No new revisions were added by this update.

Summary of changes:
 core/src/main/scala/org/apache/spark/ErrorClassesJSONReader.scala | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
[spark] branch master updated (f0d9692c5d2 -> 3c40be2dddc)
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git

from f0d9692c5d2 [SPARK-41855][SPARK-41814][SPARK-41851][SPARK-41852][CONNECT][PYTHON] Make `createDataFrame` handle None/NaN properly
add 3c40be2dddc [SPARK-41405][SQL] Centralize the column resolution logic

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/analysis/Analyzer.scala     | 797 ++---
 .../ResolveLateralColumnAliasReference.scala       |  24 +-
 .../spark/sql/catalyst/analysis/unresolved.scala   |  19 +-
 .../catalyst/expressions/namedExpressions.scala    |  23 +-
 .../spark/sql/catalyst/expressions/subquery.scala  |   9 +-
 .../sql/catalyst/rules/RuleIdCollection.scala      |   1 -
 .../spark/sql/catalyst/trees/TreePatterns.scala    |   1 +
 .../apache/spark/sql/LateralColumnAliasSuite.scala |   3 +-
 8 files changed, 424 insertions(+), 453 deletions(-)
[spark] branch master updated (5935693185d -> f0d9692c5d2)
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git

from 5935693185d [SPARK-41857][CONNECT][TESTS] Enable test_between_function, test_datetime_functions, test_expr, test_math_functions, test_window_functions_cumulative_sum, test_corr, test_cov, test_crosstab, test_approxQuantile
add f0d9692c5d2 [SPARK-41855][SPARK-41814][SPARK-41851][SPARK-41852][CONNECT][PYTHON] Make `createDataFrame` handle None/NaN properly

No new revisions were added by this update.

Summary of changes:
 python/pyspark/sql/connect/session.py       |  52 +++++++---
 .../sql/tests/connect/test_connect_basic.py | 102 +++++++++++++++
 2 files changed, 142 insertions(+), 12 deletions(-)
[spark] branch master updated (02f12eeed0c -> 5935693185d)
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git

from 02f12eeed0c [SPARK-41658][SPARK-41656][DOCS][FOLLOW-UP] Update JIRAs in skipped tests' comments
add 5935693185d [SPARK-41857][CONNECT][TESTS] Enable test_between_function, test_datetime_functions, test_expr, test_math_functions, test_window_functions_cumulative_sum, test_corr, test_cov, test_crosstab, test_approxQuantile

No new revisions were added by this update.

Summary of changes:
 .../sql/tests/connect/test_parity_functions.py | 36 ------
 1 file changed, 36 deletions(-)