yaooqinn opened a new pull request, #48494:
URL: https://github.com/apache/spark/pull/48494
### What changes were proposed in this pull request?
In `HadoopMapReduceCommitProtocol`, task output file names are generated up
front rather than by calling
`org.apache.hadoop.mapreduce.lib.output.FileOutputFormat#getDefaultWorkFile`,
which uses the value of `mapreduce.output.basename` as the prefix of output
files. This pull request modifies the
`HadoopMapReduceCommitProtocol.getFilename` method to look up this
configuration as well, instead of always using the hardcoded prefix `part`.
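
The lookup described above can be modeled with a small sketch. This is not the
Spark implementation: the `get_filename` function, its parameters, and the
`<basename>-<split>-<job id><suffix>` layout are assumptions made for
illustration, and a plain dict stands in for the Hadoop configuration. It only
shows the idea of reading `mapreduce.output.basename` and falling back to
`part` when the key is unset.

```python
# Illustrative model only; names and file name layout are assumptions,
# not Spark's actual code.
DEFAULT_BASENAME = "part"
BASENAME_KEY = "mapreduce.output.basename"


def get_filename(conf, split, job_id, suffix=""):
    """Build a task output file name from a dict standing in for the Hadoop conf.

    conf   -- mapping of configuration keys to string values
    split  -- integer task (partition) id, zero-padded to at least 5 digits
    job_id -- opaque job identifier string
    suffix -- file extension such as ".parquet" (may be empty)
    """
    # Use the configured basename if present, otherwise the old hardcoded prefix.
    basename = conf.get(BASENAME_KEY, DEFAULT_BASENAME)
    return f"{basename}-{split:05d}-{job_id}{suffix}"
```

Under these assumptions, `get_filename({}, 3, "abc", ".parquet")` yields a name
starting with `part-`, while passing
`{"mapreduce.output.basename": "spark"}` yields one starting with `spark-`.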
### Why are the changes needed?
Giving output files a custom name prefix is a useful feature for users. They
can use it to distinguish files written by different engines, on different
days, and so on. It also aligns Spark's behavior with other SQL-on-Hadoop
engines.
### Does this PR introduce _any_ user-facing change?
Yes. The Hadoop configuration `mapreduce.output.basename` can now be used to
set the prefix of output file names for file data sources.
### How was this patch tested?
New tests.
### Was this patch authored or co-authored using generative AI tooling?
No
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]