[GitHub] spark pull request: [SPARK-4452][Core]Shuffle data structures can ...

davies Mon, 18 Apr 2016 10:13:35 -0700

Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/10024#issuecomment-211480633
  
    @lianhuiwang Thanks for working on this, I think it's heading in the right 
direction. Two things are left:
    
    1) Thread safety. For example, PythonRDD (and likewise RRDD) uses two 
threads: one iterates over rows from the parent RDD, and the other iterates 
over rows from PythonRDD/RRDD. The second thread could trigger spilling, so the 
spill would happen in the second thread while the first thread could still be 
consuming the same iterator. So we must make them thread safe. This is the 
hardest part; you could take the SQL operators as examples.
    
    2) Adding more tests. As @squito suggested, more comments explaining the 
high-level ideas would also be good to have.
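
    To illustrate point 1, here is a minimal sketch of one way to make an 
iterator safe to spill from another thread. It is not code from this PR or from 
Spark: the class name `SpillableIterator`, the `spill()` method, and the 
in-memory list standing in for a spill file are all assumptions made for the 
example. The idea is only that the consumer (`hasNext`/`next`) and the spiller 
take the same lock, and that the spiller swaps the underlying source while 
holding it.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical class, not part of Spark: an iterator whose unread elements
// can be "spilled" by one thread while another thread is consuming it.
final class SpillableIterator<T> implements Iterator<T> {
    private final Object lock = new Object(); // guards `current` and `spilled`
    private Iterator<T> current;              // source currently being read
    private boolean spilled = false;

    SpillableIterator(Iterator<T> inMemory) {
        this.current = inMemory;
    }

    // Called by the thread that needs memory back.
    // Returns true if this call performed the spill, false if already spilled.
    boolean spill() {
        synchronized (lock) {
            if (spilled) {
                return false;
            }
            // Assumption: this list stands in for writing unread rows to disk.
            List<T> onDisk = new ArrayList<>();
            while (current.hasNext()) {
                onDisk.add(current.next());
            }
            // Swap the source under the lock so the consumer never sees a
            // half-drained in-memory iterator.
            current = onDisk.iterator();
            spilled = true;
            return true;
        }
    }

    @Override
    public boolean hasNext() {
        synchronized (lock) {
            return current.hasNext();
        }
    }

    @Override
    public T next() {
        synchronized (lock) {
            if (!current.hasNext()) {
                throw new NoSuchElementException();
            }
            return current.next();
        }
    }
}
```

    With a single consumer thread, a `spill()` that runs between `hasNext()` 
and `next()` preserves every unread element in order, so the consumer sees the 
same rows whether or not a spill happened. A real implementation would also 
have to release the memory and clean up the spill file, which this sketch 
leaves out.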



