zhengchenyu commented on PR #37346: URL: https://github.com/apache/spark/pull/37346#issuecomment-3397018906
@viirya @wForget @dongjoon-hyun I see the same behavior. Here is the audit log:

```
# first rename op
cmd=rename src=/user/testuser/testdb.db/test_table/_temporary/0/task_xxx/pt=20250908000000 dst=/user/testuser/testdb.db/test_table/pt=20250908000000
# second delete op
cmd=delete src=/user/testuser/testdb.db/test_table/_temporary
```

Every application deletes `/user/testuser/testdb.db/test_table/_temporary` after renaming its output into place, regardless of which partition it writes. When multiple applications writing different partitions run at the same time, data loss may occur, because one application's delete can remove output that another application has staged under the same directory but not yet renamed.

We can solve this simply by replacing `/user/testuser/testdb.db/test_table/_temporary` with a unique directory. How about reopening this PR?

I also don't think the use case is suspicious. For example, if I want to recompute the partition data for the last month, I will run multiple applications in parallel.
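To make the race concrete, here is a minimal sketch that replays the rename-then-delete sequence from the audit log on a local filesystem. It is only a simulation under stated assumptions, not Spark's or Hadoop's actual committer code: the helper names (`stage`, `commit`, `run_two_jobs`), the task names, the second partition value, and the `_temporary-<uuid>` naming are all hypothetical.

```python
import shutil
import tempfile
import uuid
from pathlib import Path


def stage(staging: Path, task: str, partition: str) -> Path:
    """Write one task's output for a partition under the staging directory."""
    part_dir = staging / "0" / task / partition
    part_dir.mkdir(parents=True, exist_ok=True)
    (part_dir / "part-00000").write_text(f"rows for {partition}\n")
    return part_dir


def commit(table: Path, staging: Path, staged_partition: Path) -> bool:
    """Rename the staged partition into the table, then delete the staging root.

    Returns False if the staged data was already removed by another job.
    """
    if not staged_partition.exists():
        return False
    staged_partition.rename(table / staged_partition.name)  # cmd=rename
    shutil.rmtree(staging, ignore_errors=True)               # cmd=delete
    return True


def run_two_jobs(unique_staging: bool) -> list[str]:
    """Run two overlapping jobs and return the partitions that reach the table."""
    with tempfile.TemporaryDirectory() as root:
        table = Path(root) / "test_table"
        table.mkdir()

        def staging_dir() -> Path:
            # Hypothetical naming: a per-job suffix makes the directory unique.
            if unique_staging:
                return table / f"_temporary-{uuid.uuid4().hex}"
            return table / "_temporary"

        staging_a, staging_b = staging_dir(), staging_dir()
        # Both jobs stage their output before either one commits.
        staged_a = stage(staging_a, "task_a", "pt=20250908000000")
        staged_b = stage(staging_b, "task_b", "pt=20250909000000")
        commit(table, staging_a, staged_a)  # with a shared dir, wipes job B's staged data
        commit(table, staging_b, staged_b)
        return sorted(p.name for p in table.iterdir() if p.name.startswith("pt="))


if __name__ == "__main__":
    print("shared _temporary:", run_two_jobs(unique_staging=False))
    print("unique staging:   ", run_two_jobs(unique_staging=True))
```

With the shared directory, only the first job's partition reaches the table; with a unique staging directory per job, both do. The real commit protocol has more steps than this, so the sketch only illustrates why sharing the delete target is unsafe.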
