Hi Hive users,

We need help from the Hive community! For historical reasons we are still running a very old Hive version (0.13), and we often hit the following error:
Caused by: java.io.IOException: File already exists:s3://smart-dmp/warehouse/uploaded/ad_dmp_pixel/dt=2021-06-21/key=259f3XXXXXXX

We have investigated this issue for quite a long time but have not found a good fix, so I would like to ask the Hive community whether there is a solution.

The error occurs during the map/reduce stage. Once an instance fails for some unexpected reason (for example, an unstable spot instance gets killed), the later retry throws the above exception instead of overwriting the file.

We have several guesses:

1. Is it caused by the ORC file format? I found a similar issue, https://issues.apache.org/jira/browse/HIVE-6341, but it has no comments, and our table is stored as ORC.
2. Is the problem solved in newer Hive versions? We are also running Hive 2.3.6 and have not hit this issue there, so we would like to know whether a version upgrade would fix it.
3. Is there a config that always cleans up existing folders when a mapper/reducer stage is retried? I have searched all the MapReduce configs but cannot find one.

I am sorry to trouble the list with this question, but I really need help from the community. Thanks a lot in advance!