Github user kalpit commented on the pull request:
https://github.com/apache/spark/pull/554#issuecomment-41595891
I see your point. I don't have a Python-only use-case that can trigger the
NPE.
My custom RDD implementation had a corner case in which the RDD's compute()
method returned a null element in the iterator stream. I have fixed my custom
RDD implementation to not do that, so I no longer run into this NPE. However,
if anyone else ever implements a custom RDD of a similar nature (one that has
nulls for some elements in a partition's iterator stream) and tries to access
such an RDD from PySpark, they would hit the same NPE, so I thought it would
be nicer if we handled nulls in the stream gracefully.
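For illustration only, here is a rough sketch (not from my actual code, and the
class name is made up) of the kind of custom RDD shape that can produce this
situation, assuming it is collected from PySpark:

    import org.apache.spark.{Partition, SparkContext, TaskContext}
    import org.apache.spark.rdd.RDD

    // Hypothetical minimal custom RDD whose compute() yields a null element
    // in the partition's iterator stream.
    class NullElementRDD(sc: SparkContext) extends RDD[String](sc, Nil) {

      override protected def getPartitions: Array[Partition] =
        Array(new Partition { override def index: Int = 0 })

      override def compute(split: Partition, context: TaskContext): Iterator[String] =
        Iterator("a", null, "b")  // the null element is what surfaces as an NPE on the PySpark side
    }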