[GitHub] spark pull request: [SPARK-6307][Core] Speed up RDD.cartesian by c...

viirya Mon, 25 May 2015 03:24:08 -0700

Github user viirya commented on the pull request:

    https://github.com/apache/spark/pull/5572#issuecomment-105199065
  
    @squito I suppose that, because we don't do anything special to them, the newly 
cached blocks are treated the same as other cached blocks? I think applying only 
idea 1 will not improve performance as significantly as caching remote blocks 
does.
    
    I agree that this is a very narrow case. However, without this kind of 
modification, the original approach is very inefficient. If it were only a 2X or 3X 
improvement, we could skip it, but given the observed improvement, it is still worth 
adding. This PR also keeps the modification limited: only `CartesianRDD` uses 
the newly added `iterator`.
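    
    For illustration only, here is a toy Python model (not Spark code) of why caching a remote block can matter for a cartesian product. It assumes the inner partition is fetched again each time its iterator is requested. `RemotePartition`, `cartesian_refetch`, and `cartesian_cached` are hypothetical names invented for this sketch, and the fetch counter merely stands in for network transfers.

```python
class RemotePartition:
    """Hypothetical stand-in for a partition stored on another node."""

    def __init__(self, items):
        self._items = list(items)
        self.fetches = 0  # counts simulated network fetches

    def iterator(self):
        # Assumption: every iterator request pulls the data again.
        self.fetches += 1
        return iter(self._items)


def cartesian_refetch(left, right):
    # Nested iteration: the right-hand iterator is requested once per left element.
    for x in left:
        for y in right.iterator():
            yield (x, y)


def cartesian_cached(left, right):
    # Fetch the remote partition once and reuse the local copy.
    local = list(right.iterator())
    for x in left:
        for y in local:
            yield (x, y)


left = range(1000)
naive = RemotePartition("ab")
cached = RemotePartition("ab")

# Both strategies produce the same pairs in the same order.
assert list(cartesian_refetch(left, naive)) == list(cartesian_cached(left, cached))
print(naive.fetches, cached.fetches)  # 1000 1
```

    In this toy model the number of fetches grows with the size of the left-hand partition unless the remote data is kept locally, so the gap can be far larger than a small constant factor.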



