[GitHub] spark pull request: SPARK-1630: Make PythonRDD handle NULL element...

kalpit Sat, 26 Apr 2014 20:35:21 -0700

Github user kalpit commented on the pull request:

    https://github.com/apache/spark/pull/554#issuecomment-41487520
  
    I suspect that the NPEs will happen for any PySpark user whose RDD 
transformation (the lambda) returns null for some input "x". Check out the 
test case I added to "PythonRDDSuite.scala" to reproduce the NPE.
    
    I considered using a negative length (-4) to pass "None" to Python 
(PythonRDD.SpecialLengths -1 to -3 are already taken). The tricky part, 
however, is that the read() method returns an array of bytes based on the 
length, and the existing code treats an empty array as the end of the 
data/stream. So I am not sure how we would communicate "None" to Python. 
Thoughts?
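    A minimal sketch of the negative-length idea, in Python, under a simplified framing that is an assumption here: each record is a 4-byte big-endian signed length followed by its payload, -1 marks end of data, and -4 is the hypothetical null marker proposed above. The names write_records, read_records, END_OF_DATA and NULL_ELEMENT are illustrative only, not PythonRDD's actual code. What it shows is that if the reader branches on the length header before reading any payload, then None, an empty payload, and end-of-data stay distinct instead of all collapsing into an empty byte array.

```python
import io
import struct

# Hypothetical marker values for this sketch only. -1 as end-of-data and
# -4 as the null marker are assumptions, not PythonRDD's real constants.
END_OF_DATA = -1
NULL_ELEMENT = -4


def write_records(stream, records):
    """Frame each record as a 4-byte big-endian signed length plus payload."""
    for rec in records:
        if rec is None:
            # A null element is sent as the marker length with no payload.
            stream.write(struct.pack(">i", NULL_ELEMENT))
        else:
            stream.write(struct.pack(">i", len(rec)))
            stream.write(rec)
    stream.write(struct.pack(">i", END_OF_DATA))


def read_records(stream):
    """Yield payloads in order, mapping the null marker back to None.

    The decision is made on the length header before any payload is read,
    so a zero-length payload is returned as b"" and is not mistaken for
    the end of the stream.
    """
    while True:
        header = stream.read(4)
        if len(header) < 4:
            raise EOFError("stream ended without an end-of-data marker")
        (length,) = struct.unpack(">i", header)
        if length == END_OF_DATA:
            return
        if length == NULL_ELEMENT:
            yield None
            continue
        if length < 0:
            # This sketch does not model the other special lengths.
            raise ValueError(f"unexpected special length {length}")
        payload = stream.read(length)
        if len(payload) < length:
            raise EOFError("truncated payload")
        yield payload


if __name__ == "__main__":
    buf = io.BytesIO()
    write_records(buf, [b"a", None, b"", b"bc"])
    buf.seek(0)
    print(list(read_records(buf)))  # [b'a', None, b'', b'bc']
```

    In the real code the same check would have to live wherever read() currently turns a length into a byte array, so that the null case returns something other than an empty array.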
