[GitHub] [spark] peter-toth commented on pull request #31682: [WIP][SPARK-34545][SQL] Fix issues with valueCompare feature of pyrolite

GitBox Tue, 02 Mar 2021 06:31:40 -0800


peter-toth commented on pull request #31682:
URL: https://github.com/apache/spark/pull/31682#issuecomment-788950105



   > correct me if I'm wrong: pickler recursively serializes the input and 
applies the cache. The input is a row of `(c1, c2)`, but pickler recursively 
serializes the row of `c1` and `c2`, and causes a problem because of the cache.
   
   You are right that caching plays an important role in this issue. But IMO 
cache lookup by reference can't cause issues if we use immutable objects. The 
issue here is that pyrolite 4.21 introduced cache lookup by value, and some of 
our data structures (`GenericRowWithSchema`) behave unexpectedly when compared 
with `.equals()`...
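   
   To illustrate the difference between the two lookup modes, here is a 
minimal Python sketch. It is not pyrolite's code: the `Row` class, 
`serialize` and `deserialize` are hypothetical stand-ins, and it assumes 
(as a labeled assumption) that the problematic equality compares only the 
values and ignores the schema.

```python
class Row:
    """Hypothetical stand-in for a row type whose equality ignores its schema."""

    def __init__(self, schema, values):
        self.schema = tuple(schema)
        self.values = tuple(values)

    def __eq__(self, other):
        # Assumption: only the values are compared, not the schema.
        return isinstance(other, Row) and self.values == other.values

    def __hash__(self):
        return hash(self.values)


def serialize(rows, value_compare):
    """Toy serializer with a memo keyed by value or by object identity."""
    memo = {}  # key -> position of the first occurrence in the output
    out = []
    for row in rows:
        key = row if value_compare else id(row)
        if key in memo:
            # Cache hit: emit a back-reference instead of the row itself.
            out.append(("ref", memo[key]))
        else:
            memo[key] = len(out)
            out.append(("row", row.schema, row.values))
    return out


def deserialize(stream):
    """Resolve back-references to the previously decoded (schema, values) pair."""
    decoded = []
    for item in stream:
        if item[0] == "ref":
            decoded.append(decoded[item[1]])
        else:
            decoded.append((item[1], item[2]))
    return decoded


a = Row(["c1"], [1])
b = Row(["c2"], [1])  # same values, different schema

by_value = deserialize(serialize([a, b], value_compare=True))
by_ref = deserialize(serialize([a, b], value_compare=False))
```

   With lookup by value, `b` hits the cache entry created for `a`, so it 
comes back with `a`'s schema. With lookup by reference, the two distinct 
objects are serialized separately and each keeps its own schema.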
   
   > Then I think it's not realistic to make one pickler instance to handle 
data with the same schema. Turning off `valueCompare` may be the only choice.
   
   Agreed. I've already modified this PR to revert previous changes and add 
`valueCompare=false`.
    
   > To evaluate the severity of the problem, it seems only an issue when there 
are nested struct types?
   
   Yes.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


