Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/6415#issuecomment-105679926
  
    Whoops, meant to post this comment earlier:
    
    I think that he root problem here is that KryoSerializerInstance can be 
used in a way which is unsafe even within a single thread, e.g. by creating 
multiple open OutputStreams from the same instance or by interleaving 
deserialize and deserializeStream calls.  I considered a smaller patch which 
adds assertions to guard against this type of "misuse" but abandoned that 
approach after I realized how convoluted the Scaladoc became.
    
    I just pushed a WIP commit which illustrates my proposed fix.  In a 
nutshell, I think that rather than using a single `Kryo` instance in 
KryoSerializerInstance, we should implement a `borrowKryo()` / `releaseKryo()` 
API that's backed by a "pool" of capacity 1.  In the common case, every call to 
a KryoSerializerInstnace method, do its work, then release the serializer 
instance back to the pool.  If the pool is empty and we need an instance, we'll 
allocate a new Kryo on-demand.  This makes it safe for multiple OutputStreams 
to be opened from the same serializer.  If we try to release a Kryo back to the 
pool but the pool already contains a Kryo, then we'll just discard the new 
Kryo.  I don't think there's a clear benefit to having a larger pool since our 
usages tend to fall into two cases, a) where we only create a single 
OutputStream and b) where we create a huge number of OutputStreams with the 
same lifecycle, then destroy the KryoSerializerInstance (this is what's 
happening in th
 e bypassMergeSort code path that my failing test hits).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to