GitHub user squito commented on the issue:
https://github.com/apache/spark/pull/21456
Re: normalization -- if I understand correctly, it's not that you know the
normalization definitely *does* change the strings for the heap dump you
have. It's just to make sure that your change is effective even if
normalization were to change things. In practice, I don't think Spark's usage
should lead to any de-normalized paths, but I think it's a good precaution.
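To illustrate what "de-normalized" means here (this is just a sketch, not
code from this PR, and the file names are made up): two different spellings
of the same path are distinct strings until you normalize them, so
deduplicating the raw strings alone wouldn't collapse them.

```scala
import java.nio.file.Paths

// Two de-normalized spellings of the same on-disk file. As raw strings
// they compare unequal, so path-string dedup keyed on the raw value
// would keep both copies alive.
val raw1 = "/data/spark/./shuffle_0_1_0.data"
val raw2 = "/data/spark/tmp/../shuffle_0_1_0.data"

// Normalizing first collapses "." and ".." segments, so both spellings
// dedup to a single string.
val norm1 = Paths.get(raw1).normalize().toString
val norm2 = Paths.get(raw2).normalize().toString
assert(raw1 != raw2 && norm1 == norm2)
```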
Re: so many objects. I don't think it's that surprising, actually. Imagine
a shuffle on a large cluster writing to 10k partitions. The shuffle-read side
is going to make a lot of simultaneous requests to the same shuffle-write side
task -- all that data lives in the same file, just at different offsets.
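To make the duplication pattern concrete, here's a hypothetical sketch
(the `Segment` class, path, and sizes are illustrative stand-ins, not
Spark's actual classes): each request for a different offset range of the
same shuffle file ends up holding its own copy of an identical path string
unless those strings are deduplicated.

```scala
// Hypothetical per-request handle into one shuffle file; this only
// demonstrates the duplication pattern, not Spark's real data structures.
case class Segment(path: String, offset: Long, length: Long)

val path = "/data/spark/shuffle_0_1_0.data" // one writer-side file
// 10k reducers each request their own offset range of that single file.
// `new String(path)` simulates each request independently constructing
// (or deserializing) its own copy of the path.
val segments = (0 until 10000).map { reducerId =>
  Segment(new String(path), reducerId * 1024L, 1024L)
}
// Without dedup, that's 10k live copies of an identical string;
// deduplicating leaves one shared instance referenced 10k times.
```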