Github user dibbhatt commented on a diff in the pull request:

    https://github.com/apache/spark/pull/6990#discussion_r33746918
  
    --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala 
---
    @@ -833,8 +833,10 @@ private[spark] class BlockManager(
logDebug("Put block %s locally took %s".format(blockId, Utils.getUsedTimeMs(startTimeMs)))
     
     // Either we're storing bytes and we asynchronously started replication, or we're storing
    -    // values and need to serialize and replicate them now:
    -    if (putLevel.replication > 1) {
    +    // values and need to serialize and replicate them now.
    +    // Should not replicate the block if its StorageLevel is StorageLevel.NONE
    +    // or putting it locally failed.
    +    if (!putBlockInfo.isFailed && putLevel.replication > 1) {
    --- End diff --
    
    Hi @tdas, it was indeed a race condition. I did not see your reply before posting mine :)
    
    I completely understand your point that option 3 is better, and it is also true that the MEMORY_AND_DISK cases won't have this block-dropping issue. But if someone uses the MEMORY_ONLY_2 setting, there is still a possible problem.
    
    Nevertheless, my point here was a little different. There are two possible cases being discussed here.
    
    Case 1: Even in the present state, for the MEMORY_ONLY_2 StorageLevel, if a block fails to store locally but is replicated to a remote peer, ReceivedBlockHandler won't get the block id and will throw an exception. It is then up to the Receiver implementation to ignore this exception or retry, depending on how resilient the Receiver is towards data loss. My point is: if the Receiver won't get the block id anyway when the BlockManager fails to store locally, why even try to replicate? It unnecessarily consumes additional memory on the remote peer. And I guess this can happen to any client which calls BlockManager doPut to store a block and then checks the returned updatedBlocks object, a Seq[(BlockId, BlockStatus)], for the block id. Since updatedBlocks does not record whether replication succeeded, the client won't find the blockId if the local store failed.
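    To make Case 1 concrete, here is a minimal, self-contained Scala sketch. The types and the `doPut` below are simplified stand-ins for illustration only, not the real BlockManager API; they just model the fact that the returned sequence reflects local storage status and says nothing about replication:

    ```scala
    // Hypothetical, simplified sketch -- NOT the real Spark API.
    case class BlockId(name: String)
    case class BlockStatus(storedLocally: Boolean)

    // doPut reports only blocks whose *local* status changed; whether
    // replication to a remote peer succeeded is not part of the result.
    def doPut(id: BlockId, localStoreSucceeds: Boolean): Seq[(BlockId, BlockStatus)] =
      if (localStoreSucceeds) Seq(id -> BlockStatus(storedLocally = true))
      else Seq.empty // local store failed: id absent even if a remote replica exists

    @main def demo(): Unit = {
      val updatedBlocks = doPut(BlockId("input-0-1"), localStoreSucceeds = false)
      // The only check a client such as ReceivedBlockHandler can make:
      val found = updatedBlocks.exists(_._1 == BlockId("input-0-1"))
      println(s"block id visible to client: $found") // false, despite a possible replica
    }
    ```

    In this model the client cannot distinguish "local store failed but a replica exists" from "the block was never stored anywhere", which is exactly why replicating after a local failure only wastes memory on the peer.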
    
    Case 2: What @squito says is also correct. Even if the block fails to store locally but is replicated to a remote peer, why can't ReceivedBlockHandler get the block id from the replicated node before throwing the exception? But that needs a change in ReceivedBlockHandler to fetch block details from remote peers, and in this case the effective replication level is reduced to 1 instead of 2.
    
    If we try option 3, even in that case I guess ReceivedBlockHandler needs changes to get the block id from the remote peer.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
