Github user tejasapatil commented on the issue:
https://github.com/apache/spark/pull/18975
@gatorsmile: If the query fails midway (e.g. tasks are OOMing), Hive
would have written data to the staging location and not to the final output
location, so users won't see the partial data.
Here, we are telling tasks to write directly to the final output
location. So if a few tasks have completed, their output is already in the
final output location. If the remaining tasks then hit issues that lead to
job failure, users are left with partial output.
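
To make the difference concrete, here is a rough sketch in plain Python of the two behaviours. It is not Hive's or Spark's actual code: `run_job`, `use_staging`, and the `.staging-` prefix are hypothetical names for illustration, and a local directory stands in for the output location.

```python
import os
import shutil
import tempfile


def run_job(tasks, final_dir, use_staging=True):
    """Run each task (a callable that takes an output directory).

    With use_staging=True, output becomes visible in final_dir only if
    every task succeeds. With use_staging=False, tasks write straight
    into final_dir, so a failure leaves partial output behind.
    """
    os.makedirs(final_dir, exist_ok=True)
    if use_staging:
        # Scratch directory next to final_dir (same filesystem, so the
        # later os.replace is a plain rename).
        parent = os.path.dirname(os.path.abspath(final_dir))
        out_dir = tempfile.mkdtemp(prefix=".staging-", dir=parent)
    else:
        out_dir = final_dir
    try:
        for task in tasks:
            task(out_dir)
    except Exception:
        if use_staging:
            # Job failed: discard everything the finished tasks wrote.
            shutil.rmtree(out_dir, ignore_errors=True)
        raise
    if use_staging:
        # Job succeeded: move the files into the final location.
        for name in os.listdir(out_dir):
            os.replace(os.path.join(out_dir, name),
                       os.path.join(final_dir, name))
        os.rmdir(out_dir)
```

If one task writes a file and a later task raises, the staging variant leaves `final_dir` empty, while the direct variant leaves the first task's file in place for users to see.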