HyukjinKwon commented on a change in pull request #25847:
[SPARK-21045][PYSPARK] Defensive check for exception info thrown by user
URL: https://github.com/apache/spark/pull/25847#discussion_r326044391
##########
File path: python/pyspark/tests/test_worker.py
##########
@@ -150,6 +151,28 @@ def test_with_different_versions_of_python(self):
         finally:
             self.sc.pythonVer = version
+    def test_python_exception_non_hanging(self):
+        """
+        SPARK-21045: exceptions with no ascii encoding shall not hanging PySpark.
+        """
+        def f():
+            raise Exception("exception with 中 and \xd6\xd0")
+
+        def run():
+            self.sc.parallelize([1]).map(lambda x: f()).count()
+
+        t = ExecThread(target=run)
+        t.daemon = True
+        t.start()
+        t.join(10)
+        self.assertFalse(t.isAlive(), "Spark should not be blocked")
+        self.assertIsInstance(t.exception, Py4JJavaError)
+        if sys.version_info.major < 3:
+            # we have to use unicode here to avoid UnicodeDecodeError
+            self.assertRegexpMatches(unicode(t.exception).encode("utf-8"),
+                                     "exception with 中")
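The test detects a hang by running the job in a daemon thread and joining with a timeout, then asserting the thread has finished and recorded the expected exception. A minimal, self-contained sketch of that pattern follows. `ExecThread` here is a hypothetical stand-in written for illustration (the hunk does not show how the real helper is defined), and the Spark job is replaced by a function that simply raises.

```python
import threading


class ExecThread(threading.Thread):
    """Hypothetical stand-in for the test's ExecThread: runs `target` in a
    background thread and stores any exception it raises in `self.exception`."""

    def __init__(self, target):
        super().__init__()
        self._fn = target
        self.exception = None

    def run(self):
        try:
            self._fn()
        except Exception as e:  # record the failure instead of losing it in the thread
            self.exception = e


def run():
    # Stand-in for the Spark job; here it just fails immediately.
    raise Exception("exception with 中")


t = ExecThread(target=run)
t.daemon = True  # a hung job cannot keep the interpreter alive at exit
t.start()
t.join(10)       # wait at most 10 seconds, then carry on either way
assert not t.is_alive(), "job should not be blocked"
assert isinstance(t.exception, Exception)
```

If the job hangs, `join(10)` returns after the timeout and the `is_alive()` assertion fails instead of blocking the whole suite. Note that the sketch uses `is_alive()`; the camelCase `isAlive()` seen in the diff was deprecated and removed in Python 3.9.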
Review comment:
Yes, calling `str` on a Py4J exception doesn't properly handle non-ASCII characters (https://github.com/bartdag/py4j/pull/308).
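The PR title describes a defensive check on the exception info thrown by user code. A minimal sketch of that idea, assuming the worker has to turn the formatted traceback into UTF-8 bytes before reporting it: `encode_exception_info` is a hypothetical helper, not PySpark's actual code. Text is encoded directly, while bytes in an unknown encoding (for example the GBK bytes `\xd6\xd0` for "中", as a Python 2 `str` literal would produce) are decoded with the `"replace"` error handler so the conversion itself can never raise.

```python
import traceback


def encode_exception_info(exc_info):
    """Hypothetical helper: return traceback text as UTF-8 bytes without raising.

    Text (Python 3 str) is encoded directly. Bytes in an unknown encoding are
    decoded as UTF-8 with invalid sequences replaced by U+FFFD, then re-encoded.
    """
    if isinstance(exc_info, bytes):
        return exc_info.decode("utf-8", "replace").encode("utf-8")
    return exc_info.encode("utf-8")


def f():
    # Under Python 3, "\xd6\xd0" is two Latin-1 characters, not GBK bytes.
    raise Exception("exception with 中 and \xd6\xd0")


try:
    f()
except Exception:
    payload = encode_exception_info(traceback.format_exc())

print(payload.decode("utf-8").splitlines()[-1])
# prints: Exception: exception with 中 and ÖÐ
```

Replacing undecodable bytes loses a little information, but it guarantees an error report is produced, which is the property the test above checks for.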