sadikovi commented on PR #37293: URL: https://github.com/apache/spark/pull/37293#issuecomment-1210088022
It is fantastic to have performance improvements like this one 👍 but it would also be good to avoid future regressions if possible. People use different parquet-mr versions with Spark, so it would be good either to add a constraint or error so they know what went wrong, or to have some general way of handling this. I don't think the problem lies in the current codebase, at least not in the Spark one. It would be up to committers to revert, but IMHO that is too drastic: we can just update the code to make it future-proof, and the change is only in master for now. For example, we could add a separate code path that is only enabled when certain conditions are met. We could also add a feature flag that allows falling back to the previous implementation. Alternatively, we could rewrite the page deserialisation logic instead of relying on parquet-mr machinery, which should make reads even faster.
