Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/6990#discussion_r33749523
  
    --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala ---
    @@ -833,8 +833,10 @@ private[spark] class BlockManager(
         logDebug("Put block %s locally took %s".format(blockId, 
Utils.getUsedTimeMs(startTimeMs)))
     
         // Either we're storing bytes and we asynchronously started replication, or we're storing
    -    // values and need to serialize and replicate them now:
    -    if (putLevel.replication > 1) {
    +    // values and need to serialize and replicate them now.
    +    // Should not replicate the block if its StorageLevel is StorageLevel.NONE or
    +    // if putting it into the local store failed.
    +    if (!putBlockInfo.isFailed && putLevel.replication > 1) {
    --- End diff --
    
    Since those blocks are never used and eventually get evicted, the only
    problem is slightly higher memory usage for some time. So I don't really see
    a critical problem that needs to be solved at the cost of de-optimizing
    existing code paths.
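
    To make the trade-off concrete, here is a minimal, self-contained sketch of
    the guard being debated (PutBlockInfo and StorageLevel here are hypothetical
    stand-in case classes, not Spark's actual BlockManager types):

        // Skip replication when the local put has already failed, so no
        // remote copy sits unused until it is evicted.
        object ReplicationGuardSketch {
          final case class PutBlockInfo(isFailed: Boolean)
          final case class StorageLevel(replication: Int)

          // True only when the block should be shipped to remote peers.
          def shouldReplicate(putBlockInfo: PutBlockInfo, putLevel: StorageLevel): Boolean =
            !putBlockInfo.isFailed && putLevel.replication > 1

          def main(args: Array[String]): Unit = {
            val twoCopies = StorageLevel(replication = 2)
            // Local put succeeded: replicate as before.
            assert(shouldReplicate(PutBlockInfo(isFailed = false), twoCopies))
            // Local put failed (e.g. the memory store was full): skip replication.
            assert(!shouldReplicate(PutBlockInfo(isFailed = true), twoCopies))
            // replication == 1 never replicates, regardless of failure state.
            assert(!shouldReplicate(PutBlockInfo(isFailed = false), StorageLevel(replication = 1)))
            println("replication guard behaves as described")
          }
        }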
    
    On Wed, Jul 1, 2015 at 11:01 PM, Dibyendu Bhattacharya <
    [email protected]> wrote:
    
    > In core/src/main/scala/org/apache/spark/storage/BlockManager.scala
    > <https://github.com/apache/spark/pull/6990#discussion_r33749161>:
    >
    > > @@ -833,8 +833,10 @@ private[spark] class BlockManager(
    > >      logDebug("Put block %s locally took %s".format(blockId, Utils.getUsedTimeMs(startTimeMs)))
    > >
    > >      // Either we're storing bytes and we asynchronously started replication, or we're storing
    > > -    // values and need to serialize and replicate them now:
    > > -    if (putLevel.replication > 1) {
    > > +    // values and need to serialize and replicate them now.
    > > +    // Should not replicate the block if its StorageLevel is StorageLevel.NONE or
    > > +    // if putting it into the local store failed.
    > > +    if (!putBlockInfo.isFailed && putLevel.replication > 1) {
    >
    > The problem here is: if local memory fills up and the block store fails,
    > blocks still get replicated to remote nodes and use up memory there, but
    > those blocks are never used in Streaming jobs. Even though those blocks
    > will eventually be evicted, this fix avoids the wasted memory. I
    > understand your concern about RDD partitions, which can still use the
    > remote replica for a speedup even if the local store failed.
    >
    > —
    > Reply to this email directly or view it on GitHub
    > <https://github.com/apache/spark/pull/6990/files#r33749161>.
    >