steveloughran commented on PR #45740: URL: https://github.com/apache/spark/pull/45740#issuecomment-2059899891
I have no problems with the PR; we have made it the default in our releases.

This could be a good time to revisit the "why is there a separate PathOutputCommitter" stuff. Originally it was because Spark built against Hadoop releases without the new PathOutputCommitter interface. This no longer holds: could anything needed from it be pulled up into the main committer?

One recurrent trouble spot we have with committing work is Parquet: it requires all committers to be a subclass of ParquetOutputCommitter, hence the (ugly, brittle) wrapping stuff. Life would be a lot easier if Parquet didn't mind being given any PathOutputCommitter; it would just skip the schema writing. Of course, we then come up against the fact that Parquet still wants to build against Hadoop 2.8. Everyone needs to move on, especially as Hadoop's Java 11+ support is 3.2.x+ only.
