GitHub user vanzin commented on the issue:

    https://github.com/apache/spark/pull/21346
  
    So, one thing I was thinking about is whether it would be worth making
error handling a little better here. I think this is no worse than the status
quo, and after looking at the related PR I'm not sure how much better this
would make things, but...
    
    The current implementation sends a "header" message plus the streamed
payload as a single RPC, so the receiver has only one opportunity to return an
error. That means that if, for example, the receiver does not have enough space
to store a block that is being uploaded, it can return an error, but the sender
will still send all of the block data to the receiver (which will just ignore
it).
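
    To make the cost concrete, here is a toy, in-process Python model of the single-RPC behavior described above. It is only a sketch under assumptions, not Spark's network code: Receiver, handle_upload and single_rpc_upload are hypothetical names, the "wire" is a generator of chunks, and the "not enough space" check stands in for whatever error the real receiver might return. The point it shows is that the receiver can decide to reject as soon as it sees the header, but because the payload is part of the same RPC, every byte is still sent.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, Optional, Tuple


@dataclass
class Receiver:
    """Hypothetical receiver for the single-RPC protocol: header and payload arrive together."""
    free_space: int
    blocks: Dict[str, bytes] = field(default_factory=dict)

    def handle_upload(self, header: dict, payload: Iterator[bytes]) -> Optional[str]:
        accept = header["size"] <= self.free_space
        received = bytearray()
        # Header and payload are one RPC, so the payload keeps arriving (and has
        # to be drained) even after the receiver has decided to reject the block.
        for chunk in payload:
            if accept:
                received.extend(chunk)
        if not accept:
            return "not enough space"  # the only chance to report an error
        self.free_space -= len(received)
        self.blocks[header["block_id"]] = bytes(received)
        return None


def single_rpc_upload(receiver: Receiver, block_id: str, data: bytes,
                      chunk_size: int = 4) -> Tuple[Optional[str], int]:
    """Send header + streamed payload as one RPC; return (error, payload bytes sent)."""
    sent = 0

    def stream() -> Iterator[bytes]:
        nonlocal sent
        for i in range(0, len(data), chunk_size):
            chunk = data[i:i + chunk_size]
            sent += len(chunk)  # count bytes as they go "on the wire"
            yield chunk

    error = receiver.handle_upload({"block_id": block_id, "size": len(data)}, stream())
    return error, sent
```

    For example, with Receiver(free_space=4) and a 10-byte block, single_rpc_upload returns ("not enough space", 10): the upload is rejected, but all 10 bytes were transferred anyway.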
    
    I'm wondering if it would be worth trying to implement this as a pair of
"chained RPCs": one that sends the metadata and a second one that streams the
data. That way the receiver can error out on the first RPC and the sender can
just drop the second RPC, instead of having to transfer everything.
    
    It might create the "some state needs to be stored somewhere" problem on
the receiver side, though, since the receiver has to remember the metadata from
the first RPC until the second one arrives. I haven't really thought that far
yet.
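
    A similar toy model can sketch the chained-RPC idea and the receiver-side state it implies. Again, this is a hypothetical illustration, not a proposed implementation: open_upload, stream_upload, abandon and the integer upload id are made-up names, and reserving space during the first RPC is just one possible design choice. The pending map is the "some state needs to be stored somewhere" problem: if the second RPC never arrives, something like abandon() (which would have to be triggered by a timeout or a connection-closed callback, neither of which is modeled here) must release the reservation.

```python
import itertools
from dataclasses import dataclass, field
from typing import Dict, Iterator, Optional, Tuple


@dataclass
class PendingUpload:
    block_id: str
    size: int


@dataclass
class ChainedReceiver:
    free_space: int
    blocks: Dict[str, bytes] = field(default_factory=dict)
    # The extra state: metadata accepted by RPC 1, waiting for RPC 2.
    pending: Dict[int, PendingUpload] = field(default_factory=dict)
    next_id: Iterator[int] = field(default_factory=itertools.count)

    def open_upload(self, block_id: str, size: int) -> Tuple[Optional[int], Optional[str]]:
        """RPC 1 (metadata only): reject early, or reserve space and return an id."""
        if size > self.free_space:
            return None, "not enough space"
        self.free_space -= size  # reserve so other uploads cannot oversubscribe
        upload_id = next(self.next_id)
        self.pending[upload_id] = PendingUpload(block_id, size)
        return upload_id, None

    def stream_upload(self, upload_id: int, payload: Iterator[bytes]) -> Optional[str]:
        """RPC 2 (data): matched against the reservation made by RPC 1."""
        data = b"".join(payload)
        meta = self.pending.pop(upload_id, None)
        if meta is None:
            return "unknown upload id"
        if len(data) != meta.size:
            self.free_space += meta.size  # release the reservation
            return "size mismatch"
        self.blocks[meta.block_id] = data
        return None

    def abandon(self, upload_id: int) -> None:
        """Release a reservation whose second RPC never arrived."""
        meta = self.pending.pop(upload_id, None)
        if meta is not None:
            self.free_space += meta.size


def chained_upload(receiver: ChainedReceiver, block_id: str, data: bytes,
                   chunk_size: int = 4) -> Tuple[Optional[str], int]:
    """Send metadata first; stream the payload only if RPC 1 succeeded."""
    upload_id, error = receiver.open_upload(block_id, len(data))
    if error is not None:
        return error, 0  # rejected up front: no payload bytes sent
    sent = 0

    def stream() -> Iterator[bytes]:
        nonlocal sent
        for i in range(0, len(data), chunk_size):
            chunk = data[i:i + chunk_size]
            sent += len(chunk)
            yield chunk

    result = receiver.stream_upload(upload_id, stream())
    return result, sent
```

    With ChainedReceiver(free_space=4) and the same 10-byte block, chained_upload returns ("not enough space", 0), so the rejected upload costs one small metadata round trip instead of the full payload.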

