GitHub user kalpit commented on the pull request:
https://github.com/apache/spark/pull/554#issuecomment-41487520
I suspect that the NPEs will happen for any PySpark user who has an RDD
whose lambda/transform returns null for some input "x". Check out
the test case I added to "PythonRDDSuite.scala" to reproduce the NPE.
I considered using a negative length (-4) to pass "None" to Python
(PythonRDD.SpecialLengths -1 to -3 are taken). The tricky part, however, is
that the read() method returns an array of bytes based on the length, and the
existing code treats an empty array as end of data/stream. So I am not sure
how we would communicate "None" to Python. Thoughts?
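
To make the question concrete, here is a minimal, self-contained Python sketch of
length-prefixed framing with a negative sentinel for "None". It is not the
actual PySpark serializer code: the names END_OF_DATA, NULL_ITEM, EndOfStream,
write_item, write_end, read_item and read_all are hypothetical, and the only
detail taken from the discussion above is that -4 is a free negative length.
The idea it illustrates is that the reader signals end of data with something
other than an empty byte array, so an empty payload, a "None", and end of
stream stay distinguishable.

```python
import io
import struct

# Hypothetical sentinel values for this sketch; the names are assumptions.
END_OF_DATA = -1   # marks the end of the data section
NULL_ITEM = -4     # marks a null/None element (no payload follows)


class EndOfStream(Exception):
    """Raised by read_item when the end-of-data marker is seen."""


def write_item(out, item):
    """Write one element as a 4-byte big-endian signed length plus payload."""
    if item is None:
        out.write(struct.pack(">i", NULL_ITEM))
    else:
        out.write(struct.pack(">i", len(item)))
        out.write(item)


def write_end(out):
    """Write the end-of-data marker."""
    out.write(struct.pack(">i", END_OF_DATA))


def read_item(stream):
    """Read one element; return bytes or None, or raise EndOfStream."""
    header = stream.read(4)
    if len(header) < 4:
        raise EOFError("stream closed before a length header was read")
    (length,) = struct.unpack(">i", header)
    if length == END_OF_DATA:
        raise EndOfStream()
    if length == NULL_ITEM:
        return None
    return stream.read(length)


def read_all(stream):
    """Collect elements until the end-of-data marker."""
    items = []
    while True:
        try:
            items.append(read_item(stream))
        except EndOfStream:
            return items
```

With this shape the caller never has to interpret an empty byte array as end of
stream, which is the ambiguity described above. Whether the existing read()
contract can be changed this way is an open question.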