GitHub user davies opened a pull request:
https://github.com/apache/spark/pull/1551
[SPARK-1630] Turn Null of Java/Scala into None of Python
Serializing a PythonRDD throws a NullPointerException (NPE) if the RDD contains a null element.
This patch passes such null elements to Python as None instead.
This PR is based on #554, thanks to @kalpit.
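As a rough illustration of the idea described above (not the code in this patch), the following self-contained Python sketch shows one way a length-prefixed stream can carry a null element: the writer emits a reserved negative length instead of dereferencing the element, and the reader maps that marker to None. The names NULL_MARKER, write_frames and read_frames, the marker value -1, and the use of a Python None to stand in for a Java null on the writer side are all assumptions made for this example; they are not taken from Spark.

```python
import io
import struct

# Hypothetical sentinel length meaning "this element is null".
# The value is an assumption for this sketch, not Spark's constant.
NULL_MARKER = -1


def write_frames(items, out):
    """Write each item as a 4-byte big-endian length followed by UTF-8 bytes.

    A None item (standing in for a Java null) is written as NULL_MARKER
    with no payload, so the writer never touches the missing value.
    """
    for item in items:
        if item is None:
            out.write(struct.pack(">i", NULL_MARKER))
        else:
            payload = item.encode("utf-8")
            out.write(struct.pack(">i", len(payload)))
            out.write(payload)


def read_frames(inp):
    """Yield items from a stream produced by write_frames."""
    while True:
        header = inp.read(4)
        if len(header) < 4:
            return  # end of stream
        (length,) = struct.unpack(">i", header)
        if length == NULL_MARKER:
            yield None  # null on the writer side becomes None here
        else:
            yield inp.read(length).decode("utf-8")


buf = io.BytesIO()
write_frames(["a", None, "b"], buf)
buf.seek(0)
result = list(read_frames(buf))
print(result)
```

Using a reserved length keeps null distinct from an empty string, which is written as a length of 0 with no payload.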
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/davies/spark null
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/1551.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #1551
----
commit ff036d31c6adbc2cd5f2c9347c267073b673167b
Author: Kalpit Shah <[email protected]>
Date: 2014-04-25T17:44:30Z
SPARK-1630: Make PythonRDD handle Null elements and strings gracefully
commit 8a4a0f94d34b76b44b590ca741b438393b803106
Author: Kalpit Shah <[email protected]>
Date: 2014-04-25T18:24:41Z
SPARK-1630: Incorporated code-review feedback
commit dddda9e2858d518c916b60972a2ba0a025b38855
Author: Kalpit Shah <[email protected]>
Date: 2014-04-25T18:59:50Z
SPARK-1630: Fixed indentation
commit 55a077a900244bb0d286927bfa2b51a049cea94b
Author: Davies Liu <[email protected]>
Date: 2014-07-23T19:17:50Z
Merge branch 'pyspark/handleNullData' of github.com:kalpit/spark into null
commit 3af8b4d4e7152bf5857201febd6e223aab4587fe
Author: Davies Liu <[email protected]>
Date: 2014-07-23T19:47:33Z
turn Null of Java into None of Python
----