Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/6990#issuecomment-120515683
  
    @dibbhatt Thanks for submitting the fix. I spoke offline with @tdas about this issue, and we came to the following conclusions.
    
    **Problem summary.** The issue seems to be a disconnect between what the `BlockManagerMaster` knows and what the `ReceiverTracker` knows. If a block fails to unroll, the `ReceiverTracker` never learns of the block and will not include it in any future computation. In the meantime, however, the block may have been replicated, taking up space on other executors even though it will never be used.
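    To make the disconnect concrete, here is a minimal sketch of the two reporting paths. The method names (`unrollToMemory`, `replicate`, `reportBlockToTracker`) are hypothetical stand-ins for the actual receiver/store internals, not real Spark APIs:

    ```scala
    import org.apache.spark.storage.StreamBlockId

    object ReportingPathsSketch {
      // Hypothetical stubs standing in for the real receiver/store internals.
      def unrollToMemory(id: StreamBlockId, bytes: Array[Byte]): Boolean = ???
      def replicate(id: StreamBlockId, bytes: Array[Byte]): Unit = ???
      def reportBlockToTracker(id: StreamBlockId): Unit = ???

      def storeReceivedBlock(id: StreamBlockId, bytes: Array[Byte]): Unit = {
        val unrolled = unrollToMemory(id, bytes) // may fail under memory pressure
        replicate(id, bytes) // peers report their copies to the BlockManagerMaster
        if (unrolled) {
          reportBlockToTracker(id) // only successful local stores reach the tracker
        }
        // If unrolling failed, the ReceiverTracker never hears of `id`, yet its
        // replicas still occupy memory on other executors.
      }
    }
    ```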
    
    **Implications for Spark core.** For Spark core, however, it is reasonable to replicate a block even if it fails to unroll. Just because there is not enough memory to cache the block on this executor doesn't mean the same is true on a different executor. This is all best-effort, but a future computation of the RDD will still benefit from having the block cached somewhere. (Note: the existing code doesn't actually do this yet because `CacheManager` has its own unrolling logic, but replicating even when unrolling fails is something we want in the future for normal RDD caching.)
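    As a rough illustration of why a remote replica still pays off, consider a read path along the following lines. The helper itself is hypothetical; `BlockManager.get` is a real call that checks local storage first and then asks the `BlockManagerMaster` for remote replicas:

    ```scala
    import org.apache.spark.storage.{BlockId, BlockManager}

    object ReadPathSketch {
      // Hypothetical helper, not the actual CacheManager code; it only shows
      // the shape of the read path.
      def getOrCompute[T](bm: BlockManager, id: BlockId,
                          compute: () => Iterator[T]): Iterator[T] = {
        bm.get(id) match {
          case Some(result) =>
            // Hit: served locally, or fetched from an executor holding a replica.
            result.data.asInstanceOf[Iterator[T]]
          case None =>
            // Miss everywhere: fall back to recomputing the partition.
            compute()
        }
      }
    }
    ```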
    
    **Alternative fix.** The right fix in Spark Streaming would be to have the `ReceiverTracker` simply read its blocks from the `BlockManagerMaster`. This collapses the two divergent block-reporting code paths into one. Since the `BlockManagerMaster` is notified of replicated blocks, the replication here would also help mitigate data loss in the `MEMORY_ONLY_*` case.
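    A rough sketch of what that refactoring could look like, assuming the existing `BlockManagerMaster.getMatchingBlockIds` API (the helper itself is hypothetical):

    ```scala
    import org.apache.spark.storage.{BlockId, BlockManagerMaster, StreamBlockId}

    object TrackerRefactorSketch {
      // Hypothetical helper the ReceiverTracker could use instead of keeping
      // its own record of reported blocks: ask the BlockManagerMaster which
      // stream blocks actually exist for a given receiver stream.
      def blocksForStream(master: BlockManagerMaster,
                          streamId: Int): Seq[StreamBlockId] = {
        val matching: BlockId => Boolean = {
          case StreamBlockId(id, _) => id == streamId
          case _                    => false
        }
        master
          .getMatchingBlockIds(matching, askSlaves = true)
          .collect { case sbid: StreamBlockId => sbid }
      }
    }
    ```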
    
    **TL;DR.** This patch removes a small feature from the block manager that, though not currently used, is desirable in the future. The underlying issue is not caused by a bug in the block manager, but by an incorrect assumption in the `ReceiverTracker`, which doesn't take replication into account. The correct way forward is to fix this in Spark Streaming by refactoring the `ReceiverTracker` to depend on the `BlockManagerMaster`.
    
    In short, we should consider an alternative fix that is longer-term and more correct.

