GitHub user davies commented on the pull request:
https://github.com/apache/spark/pull/10024#issuecomment-211480633
@lianhuiwang Thanks for working on this; I think it's heading in the right
direction. Two things are left:
1) Thread safety. For example, PythonRDD (and likewise RRDD) uses two threads:
one iterates over rows from the parent RDD, the other iterates over rows from
PythonRDD/RRDD. The second thread can trigger spilling, so the spill runs in
the second thread while the first thread may still be consuming the same
iterator. Both must therefore be made thread safe. This is the hardest part;
you could take the SQL operators as examples.
2) Adding more tests. As @squito suggested, more comments explaining the
high-level ideas would also be good to have.
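
To make the thread-safety concern in 1) concrete, here is a minimal sketch in Python. It is not Spark code and not the approach taken in this PR: `SpillableIterator`, `spill`, and `run_demo` are hypothetical names invented for illustration, and a single lock is only one possible way to do it. The idea is that the consuming thread and the thread that triggers the spill both take the same lock, so the source can never be swapped out in the middle of a read.

```python
import pickle
import tempfile
import threading


class SpillableIterator:
    """Hypothetical iterator whose unread items can be spilled to disk."""

    def __init__(self, items):
        self._lock = threading.Lock()
        self._source = iter(items)  # in-memory source until spill() runs
        self.spilled = False

    def __iter__(self):
        return self

    def __next__(self):
        # Hold the lock so spill() cannot swap the source mid-read.
        with self._lock:
            return next(self._source)

    def spill(self):
        """Move all unread items to a temp file; callable from any thread."""
        with self._lock:
            if self.spilled:
                return 0
            f = tempfile.TemporaryFile()
            count = 0
            for item in self._source:
                pickle.dump(item, f)
                count += 1
            f.seek(0)
            # From now on, reads are served from the spill file.
            self._source = self._read_spilled(f, count)
            self.spilled = True
            return count

    @staticmethod
    def _read_spilled(f, count):
        try:
            for _ in range(count):
                yield pickle.load(f)
        finally:
            f.close()


def run_demo(n=10000, spill_after=100):
    """One thread consumes the iterator while the caller's thread spills it."""
    it = SpillableIterator(range(n))
    consumed = []
    started = threading.Event()

    def consume():
        for x in it:
            consumed.append(x)
            if len(consumed) == spill_after:
                started.set()
        started.set()  # avoid a hang if the input is shorter than spill_after

    t = threading.Thread(target=consume)
    t.start()
    started.wait()
    it.spill()  # runs in this thread while consume() is still iterating
    t.join()
    return consumed, it.spilled
```

Calling `run_demo()` should return every item exactly once and in order, whether each item was read before or after the spill. Without the lock in `__next__`, the consumer could read from the in-memory source while `spill()` is draining it, and rows could be lost or duplicated.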