[spark] branch branch-3.2 updated: [SPARK-38073][PYTHON] Update atexit function to avoid issues with late binding

2022-02-04 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.2 by this push:
 new 2e382c8  [SPARK-38073][PYTHON] Update atexit function to avoid issues 
with late binding
2e382c8 is described below

commit 2e382c8bff2d0c3733b9b525168254971ca1175e
Author: zero323 
AuthorDate: Fri Feb 4 20:21:02 2022 -0800

[SPARK-38073][PYTHON] Update atexit function to avoid issues with late 
binding

### What changes were proposed in this pull request?

This PR updates the function registered with `atexit` in the PySpark shell to 
capture `SparkContext` directly, instead of depending on the surrounding context.

**Note**

A simpler approach

```python
atexit.register(sc.stop)
```

is possible, but won't work properly for contexts with monkey-patched `stop` 
methods (for example, 
[pyspark-asyncactions](https://github.com/zero323/pyspark-asyncactions))

I also considered using `_active_spark_context`

```python
atexit.register(lambda: (
    SparkContext._active_spark_context.stop()
    if SparkContext._active_spark_context
    else None
))
```

but `SparkContext` is also out of scope, so that doesn't work without 
introducing a regular function into the enclosing scope.
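
For illustration, a minimal self-contained sketch of the difference between 
the two bindings (`FakeContext` is a hypothetical stand-in for `SparkContext`, 
not part of PySpark):

```python
import atexit

class FakeContext:
    """Hypothetical stand-in for SparkContext, for illustration only."""
    def stop(self):
        print("context stopped")

sc = FakeContext()

# Late binding: `sc` is looked up in the enclosing namespace only when the
# callback runs at interpreter exit, so it breaks if the name is gone by then.
atexit.register(lambda: sc.stop())

# Early binding: the outer lambda is applied immediately, so the inner
# callback closes over the current value of `sc`; no module-level name lookup
# happens at exit time.
atexit.register((lambda sc: lambda: sc.stop())(sc))
```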

### Why are the changes needed?

When using `ipython` as a driver with Python 3.8, `sc` goes out of scope before 
the `atexit` function is called, which leads to a `NameError` on exit. This is 
a mild annoyance and likely a bug in ipython (there are quite a few reported 
issues with similar behavior), but it is easy to address on our side without 
causing regressions for users of earlier Python versions.
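
A rough reproduction of the failure mode, independent of ipython (the `del` 
below simulates the shell tearing down its namespace before exit handlers run):

```python
import atexit

class Ctx:
    """Hypothetical stand-in for SparkContext, for illustration only."""
    def stop(self):
        print("stopped")

sc = Ctx()
atexit.register(lambda: sc.stop())

# Simulate the namespace being cleaned up before atexit callbacks fire,
# as ipython does on Python 3.8+:
del sc
# On interpreter exit, the callback raises: NameError: name 'sc' is not defined
```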

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manual testing to confirm that:

- `NameError` is no longer thrown on exit with ipython and Python 3.8 or 
later.
- `stop` is indeed invoked on exit with both the plain interpreter and ipython 
shells.

Closes #35396 from zero323/SPARK-38073.

Authored-by: zero323 
Signed-off-by: Dongjoon Hyun 
(cherry picked from commit 3e0d4899dcb3be226a120cbeec8df78ff7fb00ba)
Signed-off-by: Dongjoon Hyun 
---
 python/pyspark/shell.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/python/pyspark/shell.py b/python/pyspark/shell.py
index 25aadb1..0c6a608 100644
--- a/python/pyspark/shell.py
+++ b/python/pyspark/shell.py
@@ -45,7 +45,7 @@ except Exception:
 
 sc = spark.sparkContext
 sql = spark.sql
-atexit.register(lambda: sc.stop())
+atexit.register((lambda sc: lambda: sc.stop())(sc))
 
 # for compatibility
 sqlContext = spark._wrapped




[spark] branch master updated (49f215a -> 3e0d489)

2022-02-04 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 49f215a  [SPARK-38082][PYTHON] Update minimum numpy version to 1.15
 add 3e0d489  [SPARK-38073][PYTHON] Update atexit function to avoid issues 
with late binding

No new revisions were added by this update.

Summary of changes:
 python/pyspark/shell.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)




[spark] branch master updated: [SPARK-38082][PYTHON] Update minimum numpy version to 1.15

2022-02-04 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 49f215a  [SPARK-38082][PYTHON] Update minimum numpy version to 1.15
49f215a is described below

commit 49f215a5ae64a50e889ae5cf94421cdeb0eacf09
Author: zero323 
AuthorDate: Fri Feb 4 20:05:35 2022 -0800

[SPARK-38082][PYTHON] Update minimum numpy version to 1.15

### What changes were proposed in this pull request?

This PR changes the minimum required numpy version to 1.15.

Additionally, it replaces calls to the deprecated `tostring` method with 
`tobytes`.
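
For context, a minimal sketch of the pickling pattern these changes touch 
(`MiniVector` is a simplified stand-in, not the actual PySpark class):

```python
import pickle
import numpy as np

class MiniVector:
    """Simplified stand-in for DenseVector, for illustration only."""
    def __init__(self, data):
        # Accept either raw bytes (from unpickling) or a sequence of floats.
        self.array = (np.frombuffer(data, dtype=np.float64)
                      if isinstance(data, bytes)
                      else np.asarray(data, dtype=np.float64))

    def __reduce__(self):
        # `tobytes` returns the raw buffer, exactly like the `tostring`
        # alias it replaces (deprecated since numpy 1.19).
        return MiniVector, (self.array.tobytes(),)

v = pickle.loads(pickle.dumps(MiniVector([1.0, 2.0, 3.0])))
assert v.array.tolist() == [1.0, 2.0, 3.0]
```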

### Why are the changes needed?

The current lower bound is ancient and no longer supported by the rest of our 
dependencies.

Additionally, supporting it requires the use of long-deprecated methods, which 
creates unnecessary gaps in our type checker coverage.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing tests.

Closes #35398 from zero323/SPARK-38082.

Authored-by: zero323 
Signed-off-by: Dongjoon Hyun 
---
 python/pyspark/ml/linalg/__init__.py| 12 ++--
 python/pyspark/mllib/linalg/__init__.py | 14 +++---
 python/setup.py |  6 +++---
 3 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/python/pyspark/ml/linalg/__init__.py b/python/pyspark/ml/linalg/__init__.py
index b361925..03e63e9 100644
--- a/python/pyspark/ml/linalg/__init__.py
+++ b/python/pyspark/ml/linalg/__init__.py
@@ -303,7 +303,7 @@ class DenseVector(Vector):
 self.array = ar
 
 def __reduce__(self):
-return DenseVector, (self.array.tostring(),)
+return DenseVector, (self.array.tobytes(),)
 
 def numNonzeros(self):
 """
@@ -591,7 +591,7 @@ class SparseVector(Vector):
 return np.linalg.norm(self.values, p)
 
 def __reduce__(self):
-return (SparseVector, (self.size, self.indices.tostring(), self.values.tostring()))
+return (SparseVector, (self.size, self.indices.tobytes(), self.values.tobytes()))
 
 def dot(self, other):
 """
@@ -949,7 +949,7 @@ class DenseMatrix(Matrix):
 return DenseMatrix, (
 self.numRows,
 self.numCols,
-self.values.tostring(),
+self.values.tobytes(),
 int(self.isTransposed),
 )
 
@@ -1160,9 +1160,9 @@ class SparseMatrix(Matrix):
 return SparseMatrix, (
 self.numRows,
 self.numCols,
-self.colPtrs.tostring(),
-self.rowIndices.tostring(),
-self.values.tostring(),
+self.colPtrs.tobytes(),
+self.rowIndices.tobytes(),
+self.values.tobytes(),
 int(self.isTransposed),
 )
 
diff --git a/python/pyspark/mllib/linalg/__init__.py b/python/pyspark/mllib/linalg/__init__.py
index 30fa84c..b9c391e 100644
--- a/python/pyspark/mllib/linalg/__init__.py
+++ b/python/pyspark/mllib/linalg/__init__.py
@@ -390,7 +390,7 @@ class DenseVector(Vector):
 return DenseVector(values)
 
 def __reduce__(self) -> Tuple[Type["DenseVector"], Tuple[bytes]]:
-return DenseVector, (self.array.tostring(),)  # type: ignore[attr-defined]
+return DenseVector, (self.array.tobytes(),)
 
 def numNonzeros(self) -> int:
 """
@@ -712,8 +712,8 @@ class SparseVector(Vector):
 SparseVector,
 (
 self.size,
-self.indices.tostring(),  # type: ignore[attr-defined]
-self.values.tostring(),  # type: ignore[attr-defined]
+self.indices.tobytes(),
+self.values.tobytes(),
 ),
 )
 
@@ -1256,7 +1256,7 @@ class DenseMatrix(Matrix):
 return DenseMatrix, (
 self.numRows,
 self.numCols,
-self.values.tostring(),  # type: ignore[attr-defined]
+self.values.tobytes(),
 int(self.isTransposed),
 )
 
@@ -1489,9 +1489,9 @@ class SparseMatrix(Matrix):
 return SparseMatrix, (
 self.numRows,
 self.numCols,
-self.colPtrs.tostring(),  # type: ignore[attr-defined]
-self.rowIndices.tostring(),  # type: ignore[attr-defined]
-self.values.tostring(),  # type: ignore[attr-defined]
+self.colPtrs.tobytes(),
+self.rowIndices.tobytes(),
+self.values.tobytes(),
 int(self.isTransposed),
 )
 
diff --git a/python/setup.py b/python/setup.py
index 4ff495c..673b146 100755
--- a/python/setup.py
+++ b/python/setup.py
@@ -260,8 +260,8 @@ try:
 # if you're updating the versions or dependencies.
 install_requires=['py4j==0.10.9.3'],
 extras_require={
-'ml': ['numpy>=1.7'],
-

[spark] branch master updated (54b11fa -> 973ea0f)

2022-02-04 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 54b11fa  [MINOR] Remove unnecessary null check for exception cause
 add 973ea0f  [SPARK-36837][BUILD] Upgrade Kafka to 3.1.0

No new revisions were added by this update.

Summary of changes:
 .../apache/spark/sql/kafka010/KafkaTestUtils.scala   |  7 +++
 .../spark/streaming/kafka010/KafkaRDDSuite.scala | 20 ++--
 .../spark/streaming/kafka010/KafkaTestUtils.scala|  3 ++-
 pom.xml  |  2 +-
 4 files changed, 20 insertions(+), 12 deletions(-)




[GitHub] [spark-website] srowen closed pull request #378: Contribution guide to document actual guide for pull requests

2022-02-04 Thread GitBox


srowen closed pull request #378:
URL: https://github.com/apache/spark-website/pull/378


   





[spark-website] branch asf-site updated: Contribution guide to document actual guide for pull requests

2022-02-04 Thread srowen
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/spark-website.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new 991df19  Contribution guide to document actual guide for pull requests
991df19 is described below

commit 991df1959e2381dfd32dadce39cbfa2be80ec0c6
Author: khalidmammadov 
AuthorDate: Fri Feb 4 17:07:55 2022 -0600

Contribution guide to document actual guide for pull requests

Currently the contribution guide does not reflect the actual flow for raising a 
new PR, so it is not clear (for new contributors) what exactly needs to be done 
to make a PR for the Spark repository and test it as expected. This PR 
addresses that as follows:

- It describes, in the Pull request section of the Contributing page, the 
actual procedure, taking a contributor through a step-by-step process.
- It removes the optional "Running tests in your forked repository" section on 
the Developer Tools page, which is obsolete and no longer reflects reality: 
it says tests can be run by clicking the "Run workflow" button, which is no 
longer available since the workflow stopped using the "workflow_dispatch" 
event trigger; that was removed in https://github.com/apache/spark/pull/32092
- Instead, it documents the new procedure that the above PR introduced, i.e. 
contributors need to use their own free GitHub workflow credits to test the 
changes they are proposing, and a Spark Actions workflow will expect that to 
be completed before marking the PR as ready for review.
- Some general wording was copied from the "Running tests in your forked 
repository" section on the Developer Tools page, but the main content was 
rewritten to meet the objective.
- Also fixed the URL to developer-tools.html so it is resolved by the parser 
(which converts it into a relative URI) instead of using a hard-coded 
absolute URL.

Tested empirically with `bundle exec jekyll serve`; static files were 
generated with the `bundle exec jekyll build` command.

This closes https://issues.apache.org/jira/browse/SPARK-37996

Author: khalidmammadov 

Closes #378 from khalidmammadov/fix_contribution_workflow_guide.
---
 contributing.md|  21 +++--
 developer-tools.md |  17 -
 images/running-tests-using-github-actions.png  | Bin 312696 -> 0 bytes
 site/contributing.html |  18 +-
 site/developer-tools.html  |  19 ---
 site/images/running-tests-using-github-actions.png | Bin 312696 -> 0 bytes
 6 files changed, 28 insertions(+), 47 deletions(-)

diff --git a/contributing.md b/contributing.md
index d5f0142..b127afe 100644
--- a/contributing.md
+++ b/contributing.md
@@ -322,9 +322,16 @@ Example: `Fix typos in Foo scaladoc`
 
 Pull request
 
+Before creating a pull request in Apache Spark, it is important to check if tests can pass on your branch because
+our GitHub Actions workflows automatically run tests for your pull request/following commits
+and every run burdens the limited resources of GitHub Actions in the Apache Spark repository.
+The steps below will take you through the process.
+
+
 1. <a href="https://help.github.com/articles/fork-a-repo/">Fork the GitHub repository</a> at
 <a href="https://github.com/apache/spark">https://github.com/apache/spark</a> if you haven't already
-1. Clone your fork, create a new branch, push commits to the branch.
+1. Go to "Actions" tab on your forked repository and enable "Build and test" and "Report test results" workflows
+1. Clone your fork and create a new branch
 1. Consider whether documentation or tests need to be added or updated as part of the change,
 and add them as needed.
   1. When you add tests, make sure the tests are self-descriptive.
@@ -355,14 +362,16 @@ and add them as needed.
   ...
 ```
 1. Consider whether benchmark results should be added or updated as part of the change, and add them as needed by
-<a href="https://spark.apache.org/developer-tools.html#github-workflow-benchmarks">Running benchmarks in your forked repository</a>
+<a href="developer-tools.html#github-workflow-benchmarks">Running benchmarks in your forked repository</a>
 to generate benchmark results.
 1. Run all tests with `./dev/run-tests` to verify that the code still compiles, passes tests, and
-passes style checks. Alternatively you can run the tests via GitHub Actions workflow by
-<a href="https://spark.apache.org/developer-tools.html#github-workflow-tests">Running tests in your forked repository</a>.
+passes style checks.
 If style checks fail, review the Code Style Guide below.
+1. Push commits to your branch. This will trigger "Build and test" and "Report test results" workflows
+on your forked repository and start testing and validating your changes.
 1. <a href="https://help.github.com/articles/using-pull-requests/">Open a pull request</a> against
-the 

[GitHub] [spark-website] khalidmammadov commented on pull request #378: Contribution guide to document actual guide for pull requests

2022-02-04 Thread GitBox


khalidmammadov commented on pull request #378:
URL: https://github.com/apache/spark-website/pull/378#issuecomment-1030411355


   I think all done, anything left to do here?





[spark] branch master updated (7a613ec -> 54b11fa)

2022-02-04 Thread srowen
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 7a613ec  [SPARK-38100][SQL] Remove unused private method in `Decimal`
 add 54b11fa  [MINOR] Remove unnecessary null check for exception cause

No new revisions were added by this update.

Summary of changes:
 .../src/main/java/org/apache/spark/network/shuffle/ErrorHandler.java  | 4 ++--
 .../org/apache/spark/network/shuffle/RetryingBlockTransferor.java | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)
