[GitHub] spark pull request: [SPARK-6307][Core] Speed up RDD.cartesian by c...

squito Mon, 25 May 2015 08:49:44 -0700

Github user squito commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5572#discussion_r30985762
  
    --- Diff: core/src/main/scala/org/apache/spark/CacheManager.scala ---
    @@ -48,7 +49,13 @@ private[spark] class CacheManager(blockManager: BlockManager) extends Logging {
               .getInputMetricsForReadMethod(blockResult.readMethod)
             existingMetrics.incBytesRead(blockResult.bytes)
     
    -        val iter = blockResult.data.asInstanceOf[Iterator[T]]
    +        val buf = blockResult.data.toArray
    --- End diff --
    
    Yes, that's right. Using this approach while avoiding OOMs requires some
    more changes: as you suggest, using `unrollSafely`, or perhaps making a
    variant of `blockManager.putIterator` that gives an iterator back, or
    something like that. But if the block is only cached to disk anyway, then you
    end up jumping through some hoops for nothing, so it gets tricky.
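    A minimal sketch of the "unroll safely" idea, for illustration only. `UnrollSketch` and `unrollWithLimit` are hypothetical names, not Spark APIs, and the budget here counts elements rather than the estimated bytes Spark's memory store tracks. The sketch either fully materializes the block into an array (safe to iterate repeatedly, which cartesian needs) or, once the budget is exceeded, hands back a single-pass iterator that replays the already-unrolled prefix followed by the remaining values, so nothing is lost.

```scala
import scala.collection.mutable.ArrayBuffer
import scala.reflect.ClassTag

object UnrollSketch {
  // Hypothetical stand-in for unrolling a block under a memory budget.
  // Assumption: the budget is a simple element count, not a byte estimate.
  def unrollWithLimit[T: ClassTag](values: Iterator[T], maxElements: Int): Either[Array[T], Iterator[T]] = {
    val buffer = new ArrayBuffer[T]
    // Pull values until the input is exhausted or the budget is reached.
    while (values.hasNext && buffer.size < maxElements) {
      buffer += values.next()
    }
    if (!values.hasNext) {
      // Everything fit: return an array that can be re-iterated freely.
      Left(buffer.toArray)
    } else {
      // Budget exceeded: replay the buffered prefix, then the untouched rest.
      Right(buffer.iterator ++ values)
    }
  }
}
```

    With `Left`, the caller could iterate the array once per outer element; with `Right`, it would have to fall back to a single pass (or recomputation), which is where the extra complexity discussed here comes from.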
    
    All of these complications really have me in favor of "idea 1" for its
    simplicity, which is why I was avoiding getting into some of the nitty-gritty
    here. But if we really go for "idea 2", there are many more details like this
    to figure out, I think.
