Github user JoshRosen commented on the issue:

    https://github.com/apache/spark/pull/21346
  
    It's been a little while since I've thought about this issue, so I have a 
few clarifying questions to help me understand the high-level changes:
    
    1. I recall that the problem with large shuffle blocks was that the 
OneForOneBlockFetcher strategy read the entire block as a single chunk, an 
approach that breaks down once a block grows beyond what a single chunk can 
hold. I understand that we have since removed this limitation for shuffles by 
using a streaming transfer strategy, but only for large blocks (above some 
threshold); see the first sketch after this list. Is this patch conceptually 
doing the same thing for push-based communication, where the action is 
initiated by a sender (e.g. pushing a block for replication)? Does it also 
affect pull-based remote cache block reads, or will those be handled 
separately?
    2. Given that we already seem to have pull-based `openStream()` calls 
which can be initiated from the receiving side, could we simplify things here 
by pushing a "this value is big, pull it" message and then having the remote 
end initiate a streaming read, similar to how DirectTaskResult and 
IndirectTaskResult work? (The second sketch below illustrates this 
indirection.)
    3. For remote reads of large cached blocks: is it true that this works 
today _only if_ the block is on disk but fails if the block is in memory? If 
certain size-limit problems only occur when things are cached in memory, 
could we simplify anything by requiring that blocks above 2GB be cached 
_only_ on disk, regardless of the requested storage level? (See the third 
sketch below.)
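
    To make question 1 concrete, here is a minimal sketch of the threshold 
dispatch I have in mind: small blocks are fetched as a single in-memory 
chunk, while blocks above the threshold fall back to a streaming read. This 
is my own illustrative code, not the actual OneForOneBlockFetcher / 
ShuffleBlockFetcherIterator implementation; `BlockClient` and 
`ThresholdFetcher` are made-up names:

    ```scala
    import java.io.{ByteArrayInputStream, InputStream}

    // Illustrative transport interface, not Spark's actual API.
    trait BlockClient {
      def fetchChunk(blockId: String): Array[Byte]  // buffers whole block in memory
      def openStream(blockId: String): InputStream  // streams without full buffering
    }

    class ThresholdFetcher(client: BlockClient, maxFetchToMem: Long) {
      def fetch(blockId: String, blockSize: Long): InputStream = {
        if (blockSize <= maxFetchToMem) {
          // Small block: a single-chunk fetch is cheap and simple.
          new ByteArrayInputStream(client.fetchChunk(blockId))
        } else {
          // Large block: never materialize it as one chunk; stream it instead.
          client.openStream(blockId)
        }
      }
    }
    ```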
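
    For question 2, a rough sketch of the "this value is big, pull it" 
indirection, modeled loosely on the DirectTaskResult / IndirectTaskResult 
split. All names here are hypothetical:

    ```scala
    import java.io.InputStream

    object PushProtocolSketch {
      sealed trait PushMessage
      // Small payload: ship the bytes inline with the message.
      case class DirectPush(blockId: String, data: Array[Byte]) extends PushMessage
      // Large payload: send only metadata and let the receiver pull a stream.
      case class IndirectPush(blockId: String, size: Long) extends PushMessage

      def push(blockId: String, data: Array[Byte], maxDirectSize: Long): PushMessage =
        if (data.length <= maxDirectSize) DirectPush(blockId, data)
        else IndirectPush(blockId, data.length)

      // Receiver side: on IndirectPush, initiate a streaming read back to the sender.
      def handle(msg: PushMessage, openStream: String => InputStream): Unit = msg match {
        case DirectPush(id, data) => store(id, data)
        case IndirectPush(id, _)  => storeStream(id, openStream(id))
      }

      private def store(id: String, data: Array[Byte]): Unit = ()      // stub
      private def storeStream(id: String, in: InputStream): Unit = ()  // stub
    }
    ```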
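
    And for question 3, the disk-only requirement could be as simple as 
downgrading the effective storage level when a block crosses the 2GB 
single-chunk limit. Again, this is just a sketch with made-up names, not a 
claim about how BlockManager actually behaves:

    ```scala
    object LargeBlockPolicy {
      // JVM byte arrays / single chunks cap out near 2GB.
      val MaxInMemoryBlockSize: Long = 2L * 1024 * 1024 * 1024

      sealed trait Level
      case object MemoryOnly extends Level
      case object DiskOnly extends Level
      case object MemoryAndDisk extends Level

      // Downgrade any memory-bearing level to disk-only once the block
      // exceeds the single-chunk limit.
      def effectiveLevel(requested: Level, blockSize: Long): Level =
        if (blockSize > MaxInMemoryBlockSize) DiskOnly else requested
    }
    ```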

