Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/5445#issuecomment-91945517
  
    To recap my understanding of this patch:
    
    - The `_cached_cls` dictionary maps from either DataType object ids or 
DataType objects to the generated Row classes for those data types.
    - Using an object id as a dictionary key will be safe as long as that 
id refers to the same object for the lifetime of the dictionary.  As long as a 
DataType instance isn't garbage-collected, its object id will not be re-used by 
a different DataType object.
    - The problem here seems to be that we weren't guaranteed to retain a 
strong reference to the DataType instance.  Although the DataType itself was 
used as a dictionary key for the `_cached_cls` dictionary, that dictionary is a 
[WeakValueDictionary](https://docs.python.org/2/library/weakref.html#weakref.WeakValueDictionary),
 so its reference to the DataType key would be removed if that type's Row class 
was garbage collected.
    - The solution implemented in this patch addresses this issue via two 
mechanisms:
      - When storing a Row class in the `_cached_cls` map, store a reference in 
the Row class that points to the DataType.  The Row class then keeps its 
DataType alive, which avoids the problem that we have now where the Row class 
can remain in the map even though its DataType has been garbage-collected.
      - Add a check that tests whether the Row class returned from the 
`_cached_cls` map has the expected DataType.  As far as I understand it, this 
acts more as an assertion / sanity check / error-handler, so we expect this 
check to succeed most (all?) of the time.


