Github user e-dorigatti commented on a diff in the pull request:
https://github.com/apache/spark/pull/21383#discussion_r191113977
--- Diff: python/pyspark/util.py ---
@@ -89,6 +93,33 @@ def majorMinorVersion(sparkVersion):
" version numbers.")
+def fail_on_stopiteration(f):
+    """
+    Wraps the input function to fail on 'StopIteration' by raising a 'RuntimeError';
+    prevents silent loss of data when 'f' is used in a for loop
+    """
+    def wrapper(*args, **kwargs):
+        try:
+            return f(*args, **kwargs)
+        except StopIteration as exc:
+            raise RuntimeError(
+                "Caught StopIteration thrown from user's code; failing the task",
+                exc
+            )
+
+    # prevent inspect from failing,
+    # e.g. inspect.getargspec(sum) raises
+    # TypeError: <built-in function sum> is not a Python function
+    try:
+        argspec = _get_argspec(f)
--- End diff --
You said to do it in `udf.UserDefinedFunction._create_judf`, but pasted the
code of `udf._create_udf`. I assume you meant the former, since we cannot do
it in `_create_udf`: `UserDefinedFunction._wrapped` needs the original
function for its documentation and other attributes. I will also simplify the
code as you suggested, yes.
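For context, here is a minimal, standalone sketch of the wrapping pattern the diff introduces. The wrapper mirrors the diff; `bad_udf` is a hypothetical user function of my own, added only to show why a leaked `StopIteration` is dangerous: when it escapes into a consuming for loop (or `map`), iteration just ends early and the remaining data is silently dropped.

```python
def fail_on_stopiteration(f):
    """Wrap f so a StopIteration escaping it surfaces as a RuntimeError."""
    def wrapper(*args, **kwargs):
        try:
            return f(*args, **kwargs)
        except StopIteration as exc:
            raise RuntimeError(
                "Caught StopIteration thrown from user's code; failing the task",
                exc,
            )
    return wrapper


def bad_udf(x):
    # hypothetical user function that leaks a StopIteration
    if x == 2:
        raise StopIteration
    return x * 10


# Unwrapped: the leaked StopIteration silently ends iteration, so the
# elements after x == 2 are lost without any error.
print(list(map(bad_udf, [1, 2, 3])))   # [10]

# Wrapped: the same bug now fails loudly instead of truncating the data.
safe = fail_on_stopiteration(bad_udf)
print(safe(1))                          # 10
try:
    safe(2)
except RuntimeError as e:
    print("RuntimeError:", e.args[0])
```

Note this only covers `StopIteration` raised directly by user code from a plain call; `StopIteration` raised inside a generator body is already converted to `RuntimeError` by PEP 479 on Python 3.7+.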
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]