sadikovi commented on PR #37293:
URL: https://github.com/apache/spark/pull/37293#issuecomment-1210088022

   It is fantastic to have performance improvements like this one 👍 but it 
would also be good to avoid future regressions if possible. People use 
different parquet-mr versions with Spark, so it would be good to either fail 
with a clear constraint/error so they know what went wrong, or have a general 
way of handling the version differences.
   
   I don't think the problem is in the current codebase, at least not in the 
Spark one. It would be up to committers to revert but, IMHO, that is too 
drastic: we can just update the code to make it future-proof, and the change 
is only in master for now.
   
   For example, we could add another code path for it that is only enabled when 
certain conditions are met. We could also add a feature flag that allows 
falling back to the previous behaviour. Alternatively, we could rewrite the 
page deserialisation logic instead of relying on parquet-mr machinery, which 
should make reads even faster.
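
   To make the first two options concrete, here is a rough sketch of a guarded 
code path with a fallback flag. It is written in Python only to keep it short; 
it is not Spark code. The flag name, the minimum version, and the function 
names are all hypothetical and chosen purely for illustration.

```python
from typing import Dict, Tuple

# Hypothetical flag name, for illustration only; not an existing Spark config.
FAST_PATH_FLAG = "spark.sql.parquet.fastPageRead.enabled"

# Hypothetical minimum parquet-mr version for which the new path is assumed safe.
MIN_FAST_PATH_VERSION = (1, 12, 3)


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '1.12.3' or '1.13.0-SNAPSHOT' into a comparable tuple of ints."""
    core = version.split("-", 1)[0]  # drop any qualifier such as -SNAPSHOT
    return tuple(int(part) for part in core.split("."))


def choose_read_path(conf: Dict[str, str], parquet_mr_version: str) -> str:
    """Return 'fast' only when the flag is on and the library version qualifies."""
    enabled = conf.get(FAST_PATH_FLAG, "true").lower() == "true"
    if not enabled:
        # The user explicitly asked for the previous behaviour.
        return "legacy"
    if parse_version(parquet_mr_version) < MIN_FAST_PATH_VERSION:
        # Older library: silently use the old path instead of failing at read time.
        return "legacy"
    return "fast"
```

   With something like this, users on an older parquet-mr would get the old 
path automatically, and anyone who hits a regression could switch the flag off 
without waiting for a revert.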
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


