Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/4450#issuecomment-95237844
  
    Hey @sryza - two higher level questions as I'm doing a deeper review of 
this.
    
    1. This seems predicated on the idea that serialization streams can safely 
be sliced with respect to individual objects. In general that's a bit of a 
gamble, given that both of our serializers only officially support use through 
a stream interface. I was actually surprised this works given that Kryo does 
object reference tracking, however, I looked and it turns out that Kryo will 
reset its internal tracking maps after every outer call to writeObject(): 
https://github.com/EsotericSoftware/kryo/blob/kryo-2.21/src/com/esotericsoftware/kryo/Kryo.java#L488.
 Is the idea that we just assume Kryo will continue to have this property, or 
at least some configuration where this will work?
    2. The second broad question is whether the chained buffer is necessary, 
since it constitutes a lot of the complexity in this patch. Is the chain buffer 
just an additional optimization or is it in some way required for using 
serailized outputs? The way it is now, you are going to allocate a large number 
of 8KB buffers, at random memory locations.... overall the fragmentation this 
creates could make GC slower for the JVM. The "win" is that you don't need to 
use a buffer that does copying to double growth, but given that the copying is 
amortized over a relatively small number of doublings, I'm not sure I see why 
that's a big deal. When GC tries to move your little buffers around, it will 
likely have to copy them anyways. I guess I'd like to better understand the 
evidence that this chaining is clearly better than the simpler alternative of 
using a managed, growing buffer.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to