[GitHub] spark pull request: [SPARK-8591][CORE]Block failed to unroll to me...

tdas Wed, 01 Jul 2015 18:52:07 -0700

Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/6990#discussion_r33741857
  
    --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala ---
    @@ -833,8 +833,10 @@ private[spark] class BlockManager(
         logDebug("Put block %s locally took %s".format(blockId, Utils.getUsedTimeMs(startTimeMs)))
     
         // Either we're storing bytes and we asynchronously started replication, or we're storing
    -    // values and need to serialize and replicate them now:
    -    if (putLevel.replication > 1) {
    +    // values and need to serialize and replicate them now.
    +    // Should not replicate the block if its StorageLevel is StorageLevel.NONE or
    +    // putting it to local is failed.
    +    if (!putBlockInfo.isFailed && putLevel.replication > 1) {
    --- End diff --
    
    I can see that it's beneficial to throw errors if replicating to two nodes 
was not possible, so that the receiver can retry. However, even if the receiver 
retries, there is no good way for the receiver to ensure that the block has 
been replicated to the desired level, even after two tries. Since there is no 
feedback mechanism to check for success after retries, doing something that 
only increases the "likelihood" of replication is not very useful. Doing 
something like that and relying on it is bad design. 
    
    That's why I am more inclined towards option 3, that is, if the local put 
fails, it still tries to replicate the block to two machines. I agree that it is 
inconsistent with MEMORY_ONLY, but it's still a better change than the above, 
which does not provide anything significantly more. And I think it's okay to 
break consistency because of the benefit we are getting, especially in a 
scenario where the behavior is related to something as critical as fault 
tolerance.
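
    To make the difference concrete, here is a minimal sketch contrasting the 
condition in the diff above with option 3. It is a simplified model, not the 
real BlockManager code: the names `LocalPut`, `replicateAsInDiff`, 
`replicateOption3` and `remoteTargets` are hypothetical, and the assumption 
that a successful local put leaves `replication - 1` remote copies to make is 
mine.

```scala
// Hypothetical, simplified model of the put path; not the real BlockManager API.
object ReplicationPolicySketch {
  // Outcome of trying to store the block on the local node.
  final case class LocalPut(failed: Boolean)

  // Behavior in the diff above: skip replication when the local put failed.
  def replicateAsInDiff(local: LocalPut, replication: Int): Boolean =
    !local.failed && replication > 1

  // "Option 3": replicate whenever replication is requested,
  // even if the local put failed.
  def replicateOption3(local: LocalPut, replication: Int): Boolean =
    replication > 1

  // Number of remote machines to target under option 3 (assumption: the local
  // copy counts as one replica, so a failed local put means all copies are remote).
  def remoteTargets(local: LocalPut, replication: Int): Int =
    if (replication <= 1) 0
    else if (local.failed) replication
    else replication - 1

  def main(args: Array[String]): Unit = {
    val failedLocally = LocalPut(failed = true)
    println(replicateAsInDiff(failedLocally, 2)) // the diff: no replication
    println(replicateOption3(failedLocally, 2))  // option 3: still replicates
    println(remoteTargets(failedLocally, 2))     // option 3: both copies remote
  }
}
```

    With a failed local put and a replication factor of 2, the diff's condition 
stores the block nowhere, while option 3 still attempts two remote copies.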
    
    Furthermore, stepping back, for receivers, why are we even using MEMORY_ONLY 
and not MEMORY_AND_DISK (with or w/o replication)? Do you get any benefit by 
using the former over the latter?




