zhengchenyu commented on PR #37346:
URL: https://github.com/apache/spark/pull/37346#issuecomment-3397018906

   @viirya @wForget @dongjoon-hyun 
   I have run into the same problem. Here is the relevant audit log:
   
   ```
   # first rename op
   cmd=rename   src=/user/testuser/testdb.db/test_table/_temporary/0/task_xxx/pt=20250908000000   dst=/user/testuser/testdb.db/test_table/pt=20250908000000
   # second delete op
   cmd=delete   src=/user/testuser/testdb.db/test_table/_temporary
   ```
   
   Every application that writes a partition deletes 
`/user/testuser/testdb.db/test_table/_temporary` when it finishes. When multiple 
applications writing different partitions of the same table run at the same 
time, one application's delete can remove staged output that another has not 
yet renamed into place, so data loss may occur. We could fix this by replacing 
`/user/testuser/testdb.db/test_table/_temporary` with a directory that is unique 
to each application.
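
To make the interleaving concrete, here is a minimal sketch that replays the rename-then-delete pattern from the audit log on a local temporary directory. It is a toy model, not Spark or Hadoop committer code: the helpers `stage_file`, `commit`, and `run_two_jobs`, the second partition value, and the `_temporary-<uuid>` naming are all hypothetical and only illustrate the idea of a per-application staging directory.

```python
import shutil
import tempfile
import uuid
from pathlib import Path


def stage_file(staging: Path, partition: str) -> Path:
    """Write one task output under the staging root and return its partition dir."""
    task_dir = staging / "0" / "task_xxx" / partition
    task_dir.mkdir(parents=True, exist_ok=True)
    (task_dir / "part-00000").write_text(f"rows for {partition}\n")
    return task_dir


def commit(table: Path, staging: Path, task_dir: Path) -> bool:
    """Rename the staged partition into the table, then delete the staging root.

    Returns False if the staged data has already disappeared.
    """
    if not task_dir.exists():
        return False
    task_dir.rename(table / task_dir.name)      # cmd=rename in the audit log
    shutil.rmtree(staging, ignore_errors=True)  # cmd=delete in the audit log
    return True


def run_two_jobs(unique_staging: bool) -> list:
    """Simulate two applications writing different partitions of one table."""
    with tempfile.TemporaryDirectory() as root:
        table = Path(root) / "test_table"
        table.mkdir()

        def staging_dir() -> Path:
            # Hypothetical naming scheme: a random suffix per application.
            suffix = "-" + uuid.uuid4().hex if unique_staging else ""
            return table / ("_temporary" + suffix)

        stg_a, stg_b = staging_dir(), staging_dir()
        # Interleaving: both applications stage output before either commits.
        task_a = stage_file(stg_a, "pt=20250908000000")
        task_b = stage_file(stg_b, "pt=20250909000000")
        ok_a = commit(table, stg_a, task_a)
        ok_b = commit(table, stg_b, task_b)
        return [ok_a, ok_b]


if __name__ == "__main__":
    print("shared _temporary:", run_two_jobs(unique_staging=False))
    print("unique staging:   ", run_two_jobs(unique_staging=True))
```

With a shared `_temporary`, the first application's cleanup removes the second application's staged partition before it is renamed, so the second commit finds nothing to move. With a staging directory per application, both commits succeed.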
   
   How about reopening this PR? I also do not think the use case is suspicious. 
For example, if I want to recompute the partition data for the last month, I 
will run multiple applications in parallel.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

