Github user zero323 commented on a diff in the pull request:
https://github.com/apache/spark/pull/16537#discussion_r98964355
--- Diff: python/pyspark/sql/tests.py ---
@@ -429,6 +429,11 @@ def test_udf_with_input_file_name(self):
row = self.spark.read.json(filePath).select(sourceFile(input_file_name())).first()
self.assertTrue(row[0].find("people1.json") != -1)
+ def test_udf_should_validate_input_args(self):
+ from pyspark.sql.functions import udf
+
+ self.assertRaises(TypeError, udf(lambda x: x), None)
--- End diff --
Yeah. I am afraid it could cause more trouble than it's worth:
- If we throw an exception, there is a chance we hit some edge cases.
- Issuing a warning doesn't prevent task failure, so it doesn't provide the
same advantages as failing early.
Maybe it is better to leave it as is. Right now users get clear feedback
if there is an incorrect type, and for additional safety one can always use
type annotations and a type checker.
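
As a rough illustration of the annotation route mentioned above, here is a minimal sketch. It deliberately avoids importing pyspark: `Column`, `ColumnOrName`, and `call_udf` are hypothetical stand-ins invented for this example, not part of the Spark API or of this pull request.

```python
from typing import List, Union


class Column:
    """Hypothetical stand-in for a column object (illustration only)."""

    def __init__(self, name: str) -> None:
        self.name = name


# Hypothetical alias: an argument may be a Column or a column name.
ColumnOrName = Union[Column, str]


def call_udf(*cols: ColumnOrName) -> List[str]:
    """Hypothetical annotated wrapper that returns the names it was given.

    There is no runtime validation here; the annotations exist so that a
    static type checker can reject bad arguments before anything runs.
    """
    return [c.name if isinstance(c, Column) else c for c in cols]


# Both argument kinds match the annotation.
names = call_udf(Column("age"), "name")

# A static checker such as mypy would be expected to flag the call below,
# since None is neither a Column nor a str. It is left commented out
# because nothing in this sketch rejects it at runtime.
# call_udf(None)
```

The trade-off is the one described in the comment: nothing changes at runtime, so no new exceptions or warnings are introduced, and the extra safety is available only to users who opt in to running a type checker.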