Github user squito commented on the issue:
https://github.com/apache/spark/pull/21346
yeah I see what you're saying about better error handling, but I'd really
rather not take that on here. I think some prior attempts at solving the 2gb
limit have tried to take on too much, and I'd like to keep this as simple as
possible and leave more for future improvements. I guess it means that when
(if) we do make the changes you're proposing, we'd have to go back and change
the network layer again, possibly introducing new message types etc. But we're
not really painting ourselves into a corner at all; we can do that if it
becomes necessary.
fwiw, there are other things that are higher on my list to fix when the
basic functionality goes in:
1) when you do a remote read of cached data, even if you fetch to disk,
you memory map the entire file rather than just using a FileInputStream
(see the first sketch after this list)
2) if you replicate a disk-cached block, it'll get written to a temp file
on disk, then read back from that file into memory, and then written to the
new location (second sketch below)
3) when you do a remote read of cached data, you shouldn't actually have
to wait till you fetch all the data; you should just be able to treat it as
an InputStream
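For 1) (and really 3) too), the difference is roughly the sketch below. This
is just hand-wavy illustration, not actual Spark code; `blockFile` stands in
for whatever file the fetch-to-disk path produces, and the buffer size is
made up:

```scala
import java.io.{File, FileInputStream, RandomAccessFile}
import java.nio.channels.FileChannel.MapMode

// stand-in for a block that was fetched to disk
val blockFile = new File("/tmp/fetched-block")

// what happens today (roughly): the whole file gets mapped at once,
// so the entire block occupies address space / page cache together
val channel = new RandomAccessFile(blockFile, "r").getChannel
val mapped = channel.map(MapMode.READ_ONLY, 0, channel.size())
channel.close()

// what we'd want instead: stream it, so only one small buffer is
// ever resident, no matter how big the block is
val in = new FileInputStream(blockFile)
try {
  val buf = new Array[Byte](64 * 1024)
  var n = in.read(buf)
  while (n != -1) {
    // consume buf(0 until n) here, e.g. hand it to the deserializer
    n = in.read(buf)
  }
} finally {
  in.close()
}
```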
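And for 2), the fix would be to let the bytes go file-to-file instead of
taking the round trip through memory, something like this (again just a
sketch, the paths are made up and this isn't the real block manager layout):

```scala
import java.io.File
import java.nio.channels.FileChannel
import java.nio.file.StandardOpenOption.{CREATE, READ, WRITE}

// made-up paths standing in for the temp file and the final location
val tempFile = new File("/tmp/replicated-block.tmp").toPath
val destFile = new File("/tmp/replicated-block").toPath

// transferTo lets the OS move the bytes directly between the files,
// instead of reading the temp file fully into memory and writing it out
val src = FileChannel.open(tempFile, READ)
val dst = FileChannel.open(destFile, WRITE, CREATE)
try {
  var pos = 0L
  val size = src.size()
  while (pos < size) {
    pos += src.transferTo(pos, size - pos, dst)
  }
} finally {
  src.close()
  dst.close()
}
```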