[GitHub] spark pull request #21383: [SPARK-23754][Python] Re-raising StopIteration in...

2018-05-21 Thread e-dorigatti
GitHub user e-dorigatti opened a pull request: https://github.com/apache/spark/pull/21383 [SPARK-23754][Python] Re-raising StopIteration in client code ## What changes were proposed in this pull request? Make sure that `StopIteration`s raised in users' code do not silently
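For readers unfamiliar with the failure mode this pull request targets, here is a minimal plain-Python sketch (no Spark involved) of how a `StopIteration` raised inside user code can be mistaken for the normal end of an iterator. The function name `first_positive` and the sample data are hypothetical and exist only for illustration.

```python
def first_positive(x):
    # Hypothetical user function: next() on an exhausted generator
    # raises StopIteration, which escapes this function.
    return next(v for v in [x] if v > 0)

data = [1, 2, -3, 4]

# map() lets the StopIteration raised for -3 propagate, and list()
# interprets it as "the iterator is exhausted", so the remaining
# elements are dropped without any error being reported.
result = list(map(first_positive, data))
print(result)
```

The element `4` never appears in the output even though nothing failed visibly, which is the kind of silent data loss the description refers to.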

[GitHub] spark pull request #21383: [SPARK-23754][Python] Re-raising StopIteration in...

2018-05-22 Thread e-dorigatti
Github user e-dorigatti commented on a diff in the pull request: https://github.com/apache/spark/pull/21383#discussion_r189823961 --- Diff: python/pyspark/shuffle.py --- @@ -67,6 +67,19 @@ def get_used_memory(): return 0 +def safe_iter(f

[GitHub] spark pull request #21383: [SPARK-23754][Python] Re-raising StopIteration in...

2018-05-22 Thread e-dorigatti
Github user e-dorigatti commented on a diff in the pull request: https://github.com/apache/spark/pull/21383#discussion_r189863049 --- Diff: python/pyspark/rdd.py --- @@ -173,6 +173,7 @@ def ignore_unicode_prefix(f): return f + --- End diff -- I

[GitHub] spark pull request #21383: [SPARK-23754][Python] Re-raising StopIteration in...

2018-05-22 Thread e-dorigatti
Github user e-dorigatti commented on a diff in the pull request: https://github.com/apache/spark/pull/21383#discussion_r189872060 --- Diff: python/pyspark/shuffle.py --- @@ -67,6 +67,19 @@ def get_used_memory(): return 0 +def safe_iter(f

[GitHub] spark pull request #21383: [SPARK-23754][Python] Re-raising StopIteration in...

2018-05-22 Thread e-dorigatti
Github user e-dorigatti commented on a diff in the pull request: https://github.com/apache/spark/pull/21383#discussion_r189890760 --- Diff: python/pyspark/shuffle.py --- @@ -67,6 +67,19 @@ def get_used_memory(): return 0 +def safe_iter(f

[GitHub] spark pull request #21383: [SPARK-23754][Python] Re-raising StopIteration in...

2018-05-23 Thread e-dorigatti
Github user e-dorigatti commented on a diff in the pull request: https://github.com/apache/spark/pull/21383#discussion_r190153031 --- Diff: python/pyspark/shuffle.py --- @@ -67,6 +67,19 @@ def get_used_memory(): return 0 +def safe_iter(f

[GitHub] spark issue #21383: [SPARK-23754][Python] Re-raising StopIteration in client...

2018-05-26 Thread e-dorigatti
Github user e-dorigatti commented on the issue: https://github.com/apache/spark/pull/21383 I am thinking of using `fail_on_stopiteration` in the worker instead of in the `UserDefinedFunction`. I don't really like this solution since you have to fix every other place that uses a udf
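The helper named in this comment can be pictured roughly as follows. This is a sketch under assumptions, not the code from the pull request: only the name `fail_on_stopiteration` comes from the thread, while the body, the error message, and the sample function `user_func` are illustrative guesses at the general technique (turning a stray `StopIteration` into an ordinary error).

```python
def fail_on_stopiteration(f):
    # Assumed behaviour: re-raise StopIteration from user code as a
    # RuntimeError so that it cannot be confused with "end of data".
    def wrapper(*args, **kwargs):
        try:
            return f(*args, **kwargs)
        except StopIteration as exc:
            raise RuntimeError("StopIteration raised in user code") from exc
    return wrapper

def user_func(x):
    # Hypothetical user function that misbehaves on negative input.
    if x < 0:
        raise StopIteration
    return x * 10

try:
    list(map(fail_on_stopiteration(user_func), [1, -2, 3]))
    failed = False
except RuntimeError:
    failed = True
```

With the wrapper in place the task fails loudly instead of returning a truncated result.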

[GitHub] spark pull request #21383: [SPARK-23754][Python] Re-raising StopIteration in...

2018-05-24 Thread e-dorigatti
Github user e-dorigatti commented on a diff in the pull request: https://github.com/apache/spark/pull/21383#discussion_r190603773 --- Diff: python/pyspark/tests.py --- @@ -1246,6 +1277,25 @@ def test_pipe_unicode(self): result = rdd.pipe('cat').collect

[GitHub] spark pull request #21383: [SPARK-23754][Python] Re-raising StopIteration in...

2018-05-24 Thread e-dorigatti
Github user e-dorigatti commented on a diff in the pull request: https://github.com/apache/spark/pull/21383#discussion_r190567010 --- Diff: python/pyspark/tests.py --- @@ -1246,6 +1277,31 @@ def test_pipe_unicode(self): result = rdd.pipe('cat').collect

[GitHub] spark pull request #21383: [SPARK-23754][Python] Re-raising StopIteration in...

2018-05-24 Thread e-dorigatti
Github user e-dorigatti commented on a diff in the pull request: https://github.com/apache/spark/pull/21383#discussion_r190656662 --- Diff: python/pyspark/tests.py --- @@ -1246,6 +1277,25 @@ def test_pipe_unicode(self): result = rdd.pipe('cat').collect

[GitHub] spark issue #21383: [SPARK-23754][Python] Re-raising StopIteration in client...

2018-05-24 Thread e-dorigatti
Github user e-dorigatti commented on the issue: https://github.com/apache/spark/pull/21383 I must be doing something wrong because tests pass on my machine. For now I have no other way but keep pushing here

[GitHub] spark issue #21383: [SPARK-23754][Python] Re-raising StopIteration in client...

2018-05-22 Thread e-dorigatti
Github user e-dorigatti commented on the issue: https://github.com/apache/spark/pull/21383 I am really not sure how moving and renaming the function broke the `udf` test, I'm looking into it right now

[GitHub] spark pull request #21538: [SPARK-23754][PYTHON][FOLLOWUP] Move UDF stop ite...

2018-06-12 Thread e-dorigatti
GitHub user e-dorigatti opened a pull request: https://github.com/apache/spark/pull/21538 [SPARK-23754][PYTHON][FOLLOWUP] Move UDF stop iteration wrapping from driver to executor SPARK-23754 was fixed in #21383 by changing the UDF code to wrap the user function, but this required

[GitHub] spark issue #21538: [SPARK-23754][PYTHON][FOLLOWUP][BACKPORT-2.3] Move UDF s...

2018-06-12 Thread e-dorigatti
Github user e-dorigatti commented on the issue: https://github.com/apache/spark/pull/21538 Seems like it skipped the pandas tests, for both python2.7 and pypy ``` Will skip Pandas related features against Python executable

[GitHub] spark pull request #21467: [SPARK-23754][PYTHON][FOLLOWUP] Move UDF stop ite...

2018-06-08 Thread e-dorigatti
Github user e-dorigatti commented on a diff in the pull request: https://github.com/apache/spark/pull/21467#discussion_r194041622 --- Diff: python/pyspark/worker.py --- @@ -140,15 +139,18 @@ def read_single_udf(pickleSer, infile, eval_type): else

[GitHub] spark issue #21467: [SPARK-23754][PYTHON][FOLLOWUP] Move UDF stop iteration ...

2018-06-08 Thread e-dorigatti
Github user e-dorigatti commented on the issue: https://github.com/apache/spark/pull/21467 I updated the description, so that it will be included in the commit message, and slightly clarified some comments

[GitHub] spark issue #21538: [SPARK-23754][PYTHON][FOLLOWUP][BACKPORT-2.3] Move UDF s...

2018-06-13 Thread e-dorigatti
Github user e-dorigatti commented on the issue: https://github.com/apache/spark/pull/21538 @HyukjinKwon thank you so much for your patience :) --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

[GitHub] spark pull request #21538: [SPARK-23754][PYTHON][FOLLOWUP][BACKPORT-2.3] Mov...

2018-06-13 Thread e-dorigatti
Github user e-dorigatti closed the pull request at: https://github.com/apache/spark/pull/21538

[GitHub] spark pull request #21461: [SPARK-23754][Python] Backport bugfix

2018-05-30 Thread e-dorigatti
Github user e-dorigatti closed the pull request at: https://github.com/apache/spark/pull/21461

[GitHub] spark pull request #21463: [SPARK-23754][BRANCH-2.3][PYTHON] Re-raising Stop...

2018-05-30 Thread e-dorigatti
Github user e-dorigatti closed the pull request at: https://github.com/apache/spark/pull/21463

[GitHub] spark issue #21463: [SPARK-23754][BRANCH-2.3][PYTHON] Re-raising StopIterati...

2018-05-30 Thread e-dorigatti
Github user e-dorigatti commented on the issue: https://github.com/apache/spark/pull/21463 @HyukjinKwon go ahead, no problem

[GitHub] spark pull request #21467: [SPARK-23754][Python] Fix UDF hack

2018-05-31 Thread e-dorigatti
GitHub user e-dorigatti opened a pull request: https://github.com/apache/spark/pull/21467 [SPARK-23754][Python] Fix UDF hack ## What changes were proposed in this pull request? SPARK-23754 was fixed in #21383 by changing the UDF code to wrap the user function, but this required

[GitHub] spark issue #21463: [SPARK-23754][BRANCH-2.3][PYTHON] Re-raising StopIterati...

2018-05-31 Thread e-dorigatti
Github user e-dorigatti commented on the issue: https://github.com/apache/spark/pull/21463 @HyukjinKwon haha no probs :) I have to open a new PR tho as yesterday I messed up my repos.

[GitHub] spark pull request #21461: [SPARK-23754][Python] Backport bugfix

2018-05-30 Thread e-dorigatti
GitHub user e-dorigatti opened a pull request: https://github.com/apache/spark/pull/21461 [SPARK-23754][Python] Backport bugfix Fix for master was already accepted in [another pull request](https://github.com/apache/spark/pull/21383), but there were conflicts while merging

[GitHub] spark issue #21383: [SPARK-23754][Python] Re-raising StopIteration in client...

2018-05-30 Thread e-dorigatti
Github user e-dorigatti commented on the issue: https://github.com/apache/spark/pull/21383 @HyukjinKwon [here](https://github.com/apache/spark/pull/21461)

[GitHub] spark pull request #21383: [SPARK-23754][Python] Re-raising StopIteration in...

2018-05-29 Thread e-dorigatti
Github user e-dorigatti commented on a diff in the pull request: https://github.com/apache/spark/pull/21383#discussion_r191418719 --- Diff: python/pyspark/sql/udf.py --- @@ -157,7 +157,17 @@ def _create_judf(self): spark = SparkSession.builder.getOrCreate

[GitHub] spark pull request #21383: [SPARK-23754][Python] Re-raising StopIteration in...

2018-05-29 Thread e-dorigatti
Github user e-dorigatti commented on a diff in the pull request: https://github.com/apache/spark/pull/21383#discussion_r191427825 --- Diff: python/pyspark/sql/udf.py --- @@ -157,7 +157,17 @@ def _create_judf(self): spark = SparkSession.builder.getOrCreate

[GitHub] spark issue #21467: [SPARK-23754][PYTHON][FOLLOWUP] Move UDF stop iteration ...

2018-06-04 Thread e-dorigatti
Github user e-dorigatti commented on the issue: https://github.com/apache/spark/pull/21467 does jenkins show _all_ the failed tests? or just the first one?

[GitHub] spark issue #21467: [SPARK-23754][PYTHON][FOLLOWUP] Move UDF stop iteration ...

2018-05-31 Thread e-dorigatti
Github user e-dorigatti commented on the issue: https://github.com/apache/spark/pull/21467 I'm the least qualified to answer here, but technically the bug is fixed and this is just to clean the code a bit, so it shouldn't be a blocker (but don't quote me

[GitHub] spark issue #21467: [SPARK-23754][PYTHON][FOLLOWUP] Move UDF stop iteration ...

2018-06-01 Thread e-dorigatti
Github user e-dorigatti commented on the issue: https://github.com/apache/spark/pull/21467 @viirya we only want to revert `udf.py` and the hack in `_get_argspec`. Did I miss anything there?

[GitHub] spark pull request #21467: [SPARK-23754][PYTHON][FOLLOWUP] Move UDF stop ite...

2018-06-01 Thread e-dorigatti
Github user e-dorigatti commented on a diff in the pull request: https://github.com/apache/spark/pull/21467#discussion_r192502611 --- Diff: python/pyspark/util.py --- @@ -53,16 +53,11 @@ def _get_argspec(f): """ Get argspec of a function. Support

[GitHub] spark issue #21467: [SPARK-23754][PYTHON][FOLLOWUP] Move UDF stop iteration ...

2018-06-01 Thread e-dorigatti
Github user e-dorigatti commented on the issue: https://github.com/apache/spark/pull/21467 `PyArrow` again

[GitHub] spark issue #21467: [SPARK-23754][PYTHON][FOLLOWUP] Move UDF stop iteration ...

2018-06-01 Thread e-dorigatti
Github user e-dorigatti commented on the issue: https://github.com/apache/spark/pull/21467 @viirya for RDDs, we cannot check for stopiterations in the executor because this bug is introduced in the driver by rewriting the methods. consider rdd.map: ``` def map(self, f
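The argument about `rdd.map` can be illustrated without Spark. In the sketch below, `make_partition_func` is a hypothetical stand-in for a driver-side method that hides the user function inside a closure built on `map`; it is not the PySpark source. The `fail_on_stopiteration` body is likewise an assumption. The point is that wrapping the outer closure (all the executor would see) does not help, because `map` is lazy and the exception only surfaces later, during iteration.

```python
def fail_on_stopiteration(f):
    # Assumed helper: convert StopIteration from f into RuntimeError.
    def wrapper(*args, **kwargs):
        try:
            return f(*args, **kwargs)
        except StopIteration as exc:
            raise RuntimeError("StopIteration raised in user code") from exc
    return wrapper

def make_partition_func(f):
    # Hypothetical stand-in for the driver rewriting a user function
    # into a per-partition function.
    def func(_, iterator):
        return map(f, iterator)
    return func

def user_func(x):
    # Hypothetical user function that misbehaves on negative input.
    if x < 0:
        raise StopIteration
    return x * 10

data = [1, -2, 3]

# Wrapping late (outer closure only): the call returns a lazy map
# object without raising, and the result is silently truncated.
late = fail_on_stopiteration(make_partition_func(user_func))
truncated = list(late(0, iter(data)))

# Wrapping early (the user function itself, before the closure is
# built): the error surfaces as a RuntimeError.
early = make_partition_func(fail_on_stopiteration(user_func))
try:
    list(early(0, iter(data)))
    failed = False
except RuntimeError:
    failed = True
```

Under these assumptions, the wrapping has to happen where the closure is constructed, which matches the comment's claim that the RDD case cannot be handled in the executor.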

[GitHub] spark issue #21467: [SPARK-23754][PYTHON][FOLLOWUP] Move UDF stop iteration ...

2018-06-01 Thread e-dorigatti
Github user e-dorigatti commented on the issue: https://github.com/apache/spark/pull/21467 messed up, of course :) fixing

[GitHub] spark issue #21467: [SPARK-23754][PYTHON][FOLLOWUP] Move UDF stop iteration ...

2018-06-05 Thread e-dorigatti
Github user e-dorigatti commented on the issue: https://github.com/apache/spark/pull/21467 retest this please

[GitHub] spark issue #21467: [SPARK-23754][PYTHON][FOLLOWUP] Move UDF stop iteration ...

2018-06-05 Thread e-dorigatti
Github user e-dorigatti commented on the issue: https://github.com/apache/spark/pull/21467 @HyukjinKwon thank you, I thought it was reserved to members

[GitHub] spark issue #21383: [SPARK-23754][Python] Re-raising StopIteration in client...

2018-05-27 Thread e-dorigatti
Github user e-dorigatti commented on the issue: https://github.com/apache/spark/pull/21383 Yes, the problem was that the signature is lost when the function is wrapped, and the worker needs the signature to know whether the function needs keys together with values
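The signature problem described here can be reproduced with the standard `inspect` module. This is an illustrative sketch: `user_func` and `wrap` are hypothetical names, and the way the worker actually inspects the function is not shown in this thread.

```python
import inspect

def user_func(key, values):
    # Hypothetical grouped function taking the key together with the values.
    return key, sum(values)

def wrap(f):
    # A generic wrapper accepts anything, so its own argspec says
    # nothing about how many positional arguments f expects.
    def wrapper(*args, **kwargs):
        return f(*args, **kwargs)
    return wrapper

original_args = inspect.getfullargspec(user_func).args
wrapped_args = inspect.getfullargspec(wrap(user_func)).args
print(original_args, wrapped_args)
```

Code that decides between "values only" and "key plus values" by counting named arguments would therefore see two arguments for the original function and none for the wrapped one.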

[GitHub] spark pull request #21383: [SPARK-23754][Python] Re-raising StopIteration in...

2018-05-27 Thread e-dorigatti
Github user e-dorigatti commented on a diff in the pull request: https://github.com/apache/spark/pull/21383#discussion_r191113977 --- Diff: python/pyspark/util.py --- @@ -89,6 +93,33 @@ def majorMinorVersion(sparkVersion): " version nu

[GitHub] spark pull request #21383: [SPARK-23754][Python] Re-raising StopIteration in...

2018-05-29 Thread e-dorigatti
Github user e-dorigatti commented on a diff in the pull request: https://github.com/apache/spark/pull/21383#discussion_r191472884 --- Diff: python/pyspark/sql/udf.py --- @@ -157,7 +157,17 @@ def _create_judf(self): spark = SparkSession.builder.getOrCreate

[GitHub] spark pull request #21383: [SPARK-23754][Python] Re-raising StopIteration in...

2018-05-29 Thread e-dorigatti
Github user e-dorigatti commented on a diff in the pull request: https://github.com/apache/spark/pull/21383#discussion_r191473968 --- Diff: python/pyspark/sql/tests.py --- @@ -900,6 +900,22 @@ def __call__(self, x): self.assertEqual(f, f_.func

[GitHub] spark pull request #21467: [SPARK-23754][Python] Fix UDF argspec hack

2018-05-31 Thread e-dorigatti
Github user e-dorigatti commented on a diff in the pull request: https://github.com/apache/spark/pull/21467#discussion_r192114498 --- Diff: python/pyspark/worker.py --- @@ -69,6 +69,7 @@ def chain(f, g): def wrap_udf(f, return_type): +f

[GitHub] spark pull request #21467: [SPARK-23754][PYTHON][FOLLOWUP] Move UDF stop ite...

2018-05-31 Thread e-dorigatti
Github user e-dorigatti commented on a diff in the pull request: https://github.com/apache/spark/pull/21467#discussion_r192116000 --- Diff: python/pyspark/worker.py --- @@ -69,6 +69,7 @@ def chain(f, g): def wrap_udf(f, return_type): +f

[GitHub] spark pull request #21463: [SPARK-23754][BRANCH-2.3][PYTHON] Re-raising Stop...

2018-05-30 Thread e-dorigatti
GitHub user e-dorigatti opened a pull request: https://github.com/apache/spark/pull/21463 [SPARK-23754][BRANCH-2.3][PYTHON] Re-raising StopIteration in client code ## What changes are proposed Make sure that `StopIteration`s raised in users' code do not silently interrupt

[GitHub] spark pull request #21467: [SPARK-23754][PYTHON][FOLLOWUP] Move UDF stop ite...

2018-06-01 Thread e-dorigatti
Github user e-dorigatti commented on a diff in the pull request: https://github.com/apache/spark/pull/21467#discussion_r192455582 --- Diff: python/pyspark/worker.py --- @@ -140,15 +139,20 @@ def read_single_udf(pickleSer, infile, eval_type): else

[GitHub] spark issue #21467: [SPARK-23754][Python] Fix UDF argspec hack

2018-05-31 Thread e-dorigatti
Github user e-dorigatti commented on the issue: https://github.com/apache/spark/pull/21467 Seems like a problem in Jenkins ``` ImportError: PyArrow >= 0.8.0 must be installed; however, it was not fo