GitHub user squito commented on a diff in the pull request:

    https://github.com/apache/spark/pull/6990#discussion_r33741094
  
    --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala ---
    @@ -833,8 +833,10 @@ private[spark] class BlockManager(
         logDebug("Put block %s locally took %s".format(blockId, Utils.getUsedTimeMs(startTimeMs)))
     
         // Either we're storing bytes and we asynchronously started replication, or we're storing
    -    // values and need to serialize and replicate them now:
    -    if (putLevel.replication > 1) {
    +    // values and need to serialize and replicate them now.
    +    // Should not replicate the block if its StorageLevel is StorageLevel.NONE or
    +    // putting it to local is failed.
    +    if (!putBlockInfo.isFailed && putLevel.replication > 1) {
    --- End diff --
    
    Ah, OK, I hadn't considered streaming before.  I'm not an expert on that part of the code, so maybe we should rope in somebody more knowledgeable -- @tdas @harishreedharan, want to weigh in here?
    
    With that caveat, I think I see the issue now.  If I understand correctly, the problem is that if you can't store a block locally, the receiver thinks the block has not been stored anywhere -- even if it has been successfully replicated.  That is because the receiver [just looks at the result of `doPut`](https://github.com/apache/spark/blob/3a342dedc04799948bf6da69843bd1a91202ffe5/streaming/src/main/scala/org/apache/spark/streaming/receiver/ReceivedBlockHandler.scala#L91) to see if the block has been stored.  But, as you've noted, when we fail to store locally but successfully store on a remote, that result doesn't contain the block.
    
    So your proposal seems to be -- as long as the receiver is going to ignore the replicated block in any case, there isn't any sense in replicating.  But maybe a better alternative would be for the receiver to accept a block that is stored only on a remote?
    
    This makes me wonder whether the receiver should also check that the block has actually been replicated, and not only that it was stored locally?
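
    To make the options above concrete, here is a rough sketch in plain Scala.  It is not Spark code: `PutOutcome`, `ReceiverCheck` and every method name are made up for illustration, and the last check assumes the replication level counts the local copy.

    ```scala
    // Hypothetical model of what a put could report back (not a Spark type).
    final case class PutOutcome(storedLocally: Boolean, replicatedTo: Seq[String])

    object ReceiverCheck {
      // Behaviour described above: only the local result counts.
      def storedLocalOnly(o: PutOutcome): Boolean = o.storedLocally

      // Alternative: a copy held by any remote also counts as stored.
      def storedAnywhere(o: PutOutcome): Boolean =
        o.storedLocally || o.replicatedTo.nonEmpty

      // Stricter variant: require as many copies as the replication level asks for.
      // Assumption: `replication` includes the local copy.
      def fullyReplicated(o: PutOutcome, replication: Int): Boolean = {
        val copies = (if (o.storedLocally) 1 else 0) + o.replicatedTo.size
        copies >= replication
      }
    }
    ```

    With `PutOutcome(storedLocally = false, replicatedTo = Seq("peer-1"))`, the first check reports the block as lost while the second accepts it, which is the mismatch in question.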

