[jira] [Commented] (HADOOP-15086) NativeAzureFileSystem.rename is not atomic
[ https://issues.apache.org/jira/browse/HADOOP-15086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16278416#comment-16278416 ]

Steve Loughran commented on HADOOP-15086:

bq. Looks like we can use conditional-headers to implement an atomic file rename on Azure blob storage

Re-opening then, assigning to you. Tests expected, and for this kind of stuff, anything you can do to prove correctness.

What would be the plan?
- enum headers
- use them in the conditional rename?

What about:
* something else adding something to the dest path during the rename process?
* something deleting the parent dir & replacing it with a file?
* something deleting files added to the dest path during the rename process?
* failure of the rename process partway through?

I think you really need to consider that you can't guarantee atomicity here, though you may stand a chance of improving on what you get today. Before doing that, it'd be worth getting the opinions of the Wasb team, in the form of [~tmarquardt].

bq. I think it's not necessary to introduce object-store specific committers when a storage already supports atomic operations.

It depends on whether the store consistently does rename as O(1); if it's O(data), you will find that what works in demo & test underperforms in production. Azure does do its renames faster than S3... we should really do some (scalable) micro-benchmark integration test for both which creates a file, does the rename & prints its bandwidth. I think this is mostly done on both already, but not really quantified across stores.

The good news: the Hadoop FileOutputCommitters don't have atomic task commit (V2 algorithm) or job commit (V1 algorithm), instead hoping that the probability of a failure during task commit is low (which it is when rename is fast enough), and expecting the job itself to detect and react to the failure/timeout of task commit and job commit (v1 can recover with all committed tasks retained; v2 fails the job & so delivers at-most-once).
So, if you can make the atomicity semantics of the commit algorithms closer to those of HDFS, those algorithms may work better. However, I fear that you can't do enough for V1, not if you intend to support job restart (Spark doesn't do this, I know, but MR does). The v1 algorithm lists the task attempt ID paths under `$dest/_temporary/$appAttemptId/` and assumes that if a path is there, that task has completed. Therefore, if the `$dest/_temporary/$appAttemptId/$taskAttemptId` path is visible before the rename completes, rename() doesn't meet the expectations of `FileOutputCommitter`... Better to write a manifest file to `$dest/_temporary/$appAttemptId/` for every task attempt & rely on PUT being atomic on that write, at which point, yes, you are in the world of custom committers.

> NativeAzureFileSystem.rename is not atomic
> --
>
>                 Key: HADOOP-15086
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15086
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/azure
>    Affects Versions: 2.7.3
>            Reporter: Shixiong Zhu
>         Attachments: RenameReproducer.java
>
>
> When multiple threads rename files to the same target path, more than 1
> threads can succeed. It's because check and copy file in `rename` is not
> atomic.
> I would expect it's atomic just like HDFS.

--
This message was sent by Atlassian JIRA (v6.4.14#64029)

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
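The manifest idea at the end of Steve Loughran's comment above could look roughly like the following. This is only a sketch under stated assumptions: the in-memory map stands in for an object store whose single-object PUT is atomic, and the class name `ManifestCommitSketch`, the `.manifest` suffix, and the method names are hypothetical, not part of `FileOutputCommitter` or any Hadoop API. The point it illustrates is that a task attempt counts as committed only once its manifest is visible, so a half-written task directory is never mistaken for a completed task.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

/** Hypothetical model of manifest-based task commit; not Hadoop code. */
final class ManifestCommitSketch {
    // Object name -> contents. One put() stands in for one atomic PUT.
    private final Map<String, String> store = new ConcurrentSkipListMap<>();
    private final String jobDir;

    ManifestCommitSketch(String dest, int appAttemptId) {
        this.jobDir = dest + "/_temporary/" + appAttemptId + "/";
    }

    /** A task attempt writes its output under its own prefix. */
    void writeTaskFile(String taskAttemptId, String fileName, String data) {
        store.put(jobDir + taskAttemptId + "/" + fileName, data);
    }

    /** Task commit: a single PUT of a manifest naming every file the attempt wrote. */
    void commitTask(String taskAttemptId, List<String> fileNames) {
        store.put(jobDir + taskAttemptId + ".manifest", String.join("\n", fileNames));
    }

    /** Task attempts whose manifest is present directly under the job directory. */
    List<String> committedTasks() {
        String suffix = ".manifest";
        List<String> done = new ArrayList<>();
        for (String key : store.keySet()) {
            if (!key.startsWith(jobDir) || !key.endsWith(suffix)) {
                continue;
            }
            String name = key.substring(jobDir.length(), key.length() - suffix.length());
            if (!name.contains("/")) { // ignore files nested inside task directories
                done.add(name);
            }
        }
        return done;
    }

    public static void main(String[] args) {
        ManifestCommitSketch job = new ManifestCommitSketch("/out", 1);
        job.writeTaskFile("attempt_0", "part-0", "a");
        job.writeTaskFile("attempt_1", "part-0", "b"); // written but never committed
        job.commitTask("attempt_0", List.of("part-0"));
        System.out.println(job.committedTasks());
    }
}
```

A restarted job would call something like `committedTasks()` instead of listing task attempt directories, which is the step where the v1 algorithm depends on rename being atomic.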
[jira] [Commented] (HADOOP-15086) NativeAzureFileSystem.rename is not atomic
[ https://issues.apache.org/jira/browse/HADOOP-15086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16277260#comment-16277260 ]

Shixiong Zhu commented on HADOOP-15086:

In addition, I probably was not clear: I created this ticket just for atomic file rename.
[jira] [Commented] (HADOOP-15086) NativeAzureFileSystem.rename is not atomic
[ https://issues.apache.org/jira/browse/HADOOP-15086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16277256#comment-16277256 ]

Shixiong Zhu commented on HADOOP-15086:

Looks like we can use conditional-headers to implement an atomic file rename on Azure blob storage: https://docs.microsoft.com/en-us/rest/api/storageservices/specifying-conditional-headers-for-blob-service-operations ?

I think it's not necessary to introduce object-store specific committers when a storage already supports atomic operations.
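To make the proposal above concrete, here is a minimal in-memory sketch of a rename built on a "create only if absent" precondition, which is roughly what a write carrying an `If-None-Match: *` conditional header expresses. Everything in it is an assumption for illustration: `ConditionalBlobStore` and its methods are hypothetical, it does not call the Azure SDK or REST API, and it does not show how `NativeAzureFileSystem` would actually issue such requests.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory model of a blob store with conditional create; not Azure code. */
final class ConditionalBlobStore {
    private final Map<String, byte[]> blobs = new HashMap<>();

    /** Unconditional write: overwrites any existing blob. */
    synchronized void put(String name, byte[] data) {
        blobs.put(name, data);
    }

    /** Models a write with an "If-None-Match: *" precondition: succeeds only if absent. */
    synchronized boolean putIfAbsent(String name, byte[] data) {
        if (blobs.containsKey(name)) {
            return false; // precondition failed, existing blob is left untouched
        }
        blobs.put(name, data);
        return true;
    }

    synchronized byte[] get(String name) {
        return blobs.get(name);
    }

    synchronized boolean exists(String name) {
        return blobs.containsKey(name);
    }

    synchronized void delete(String name) {
        blobs.remove(name);
    }

    /**
     * Rename as "conditional copy, then delete source".
     * Only the copy step is atomic; a failure between the two steps
     * would leave both source and destination in place.
     */
    boolean rename(String src, String dst) {
        byte[] data = get(src);
        if (data == null) {
            return false; // nothing to rename
        }
        if (!putIfAbsent(dst, data)) {
            return false; // someone else created dst first
        }
        delete(src);
        return true;
    }

    public static void main(String[] args) {
        ConditionalBlobStore store = new ConditionalBlobStore();
        store.put("a", new byte[] {1});
        store.put("b", new byte[] {2});
        System.out.println("rename a -> dst: " + store.rename("a", "dst"));
        System.out.println("rename b -> dst: " + store.rename("b", "dst"));
    }
}
```

In this model the second rename to the same destination is rejected and its source is left in place, which is the "error if the destination file already exists" behaviour. It covers a single file only; directories are outside what the sketch models.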
[jira] [Commented] (HADOOP-15086) NativeAzureFileSystem.rename is not atomic
[ https://issues.apache.org/jira/browse/HADOOP-15086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16276796#comment-16276796 ]

Steve Loughran commented on HADOOP-15086:

I don't disagree with you about the existence of the problem, I just don't think it's easily fixed.

Essentially: blobstores tend not to have a rename() (or indeed: create(overwrite=false), delete(directory)), and the things we do to mimic these in our connectors aren't atomic.

1. We cover this in [Object Stores|https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/filesystem/introduction.html#Object_Stores_vs._Filesystems]
2. This is also common to: S3x, Swift, OSS, ADL, ...
3. By inference, the Hadoop FileOutputCommit protocol is not atomic on object stores either.
4. Compare with the requirements of rename() as covered in [rename()|https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/filesystem/filesystem.html#boolean_renamePath_src_Path_d]

There is actually special support in Azure for atomic rename of HBase directories; this is done with leasing, recovery and stuff. It manages exclusivity, but it is still not an O(1) operation.

If you look at where we are going with this, the work is in moving to object-store specific committers which provide the commit semantics without relying on renames. HADOOP-13786 is the initial implementation of this for S3A, but the hooks put into FileOutputFormat are designed to support filesystem-specific committers for any store which implements one.

I'm closing as a WONTFIX. Sorry. It's not that we don't want to, it's just that directory operations are where the metaphor "object stores are like filesystems" fails if you look closely enough.
(On a brighter note: wasb is consistent for both metadata and data)
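Whether a store's rename behaves as O(1) or O(data) can be checked by timing the same rename at several file sizes. The sketch below does that against a local temporary directory using `java.nio.file`, purely as a stand-in harness: the class and method names are hypothetical, and a real test would go through the Hadoop `FileSystem` API against a wasb or s3a URL, which is not shown here. On a local filesystem the rename time is expected to stay roughly flat as size grows; on a store that copies data it would grow with the file.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical rename timing harness on the local filesystem; not a Hadoop test. */
final class RenameBandwidth {

    static Path newTempDir() {
        try {
            return Files.createTempDirectory("rename-bw");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Creates a file of sizeBytes in dir, renames it, and returns the rename time in ns. */
    static long timeRenameNanos(Path dir, int sizeBytes) {
        Path src = dir.resolve("src-" + sizeBytes);
        Path dst = dir.resolve("dst-" + sizeBytes);
        try {
            Files.write(src, new byte[sizeBytes]);
            long start = System.nanoTime();
            Files.move(src, dst);
            // Clamp to 1 ns so the bandwidth calculation never divides by zero.
            return Math.max(1L, System.nanoTime() - start);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Effective bandwidth of the rename: bytes "moved" per second, in MB/s. */
    static double megabytesPerSecond(long sizeBytes, long nanos) {
        return (sizeBytes / 1e6) / (nanos / 1e9);
    }

    public static void main(String[] args) {
        Path dir = newTempDir();
        for (int size : new int[] {1 << 10, 1 << 20, 1 << 24}) {
            long nanos = timeRenameNanos(dir, size);
            System.out.printf("%d bytes: %d ns, %.1f MB/s%n",
                    size, nanos, megabytesPerSecond(size, nanos));
        }
    }
}
```

The sketch leaves its temporary files behind and takes a single sample per size; a usable benchmark would repeat each measurement and clean up afterwards.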
[jira] [Commented] (HADOOP-15086) NativeAzureFileSystem.rename is not atomic
[ https://issues.apache.org/jira/browse/HADOOP-15086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16275422#comment-16275422 ]

Cheng Lian commented on HADOOP-15086:

To be more specific, when multiple threads rename files to the same target path, more than 1 *but not all* threads can succeed. It's because the check and the file copy in {{NativeAzureFileSystem#rename()}} are not atomic.

The problem here is that it's unclear what the expected semantics of {{NativeAzureFileSystem#rename()}} are:
- If the semantics are "error if the destination file already exists", then only 1 thread can succeed.
- If the semantics are "overwrite if the destination file already exists", then all threads should succeed.
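The race described above can be reproduced deterministically with a small in-memory model. This is a sketch, not the WASB implementation and not the attached RenameReproducer.java: the map standing in for the blob container, the class name `RenameRaceDemo`, and the barrier that forces every thread to finish its existence check before any thread copies are all assumptions made for illustration. The barrier produces the worst-case interleaving, in which every thread succeeds; in practice the outcome is the "more than 1 but not all" described in the comment.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical in-memory model of a check-then-copy rename race; not WASB code. */
final class RenameRaceDemo {

    /** Non-atomic rename: the existence check and the copy are separate steps. */
    static boolean checkThenCopy(Map<String, String> store, String src, String dst,
                                 CyclicBarrier afterCheck) throws Exception {
        boolean dstExists = store.containsKey(dst); // step 1: check
        afterCheck.await();                         // every thread finishes step 1 first
        if (dstExists) {
            return false;
        }
        store.put(dst, store.get(src));             // step 2: copy
        store.remove(src);                          // step 3: delete source
        return true;
    }

    /** "Error if the destination exists" semantics: check and write are one operation. */
    static boolean atomicRename(Map<String, String> store, String src, String dst) {
        if (store.putIfAbsent(dst, store.get(src)) != null) {
            return false;
        }
        store.remove(src);
        return true;
    }

    /** Runs one rename per thread against the same destination; returns how many succeeded. */
    static int countWinners(int threads, boolean atomic) {
        Map<String, String> store = new ConcurrentHashMap<>();
        for (int i = 0; i < threads; i++) {
            store.put("src-" + i, "data-" + i);
        }
        CyclicBarrier afterCheck = new CyclicBarrier(threads);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Boolean>> results = new ArrayList<>();
            for (int i = 0; i < threads; i++) {
                String src = "src-" + i;
                results.add(pool.submit(() -> atomic
                        ? atomicRename(store, src, "dst")
                        : checkThenCopy(store, src, "dst", afterCheck)));
            }
            int winners = 0;
            for (Future<Boolean> result : results) {
                if (result.get()) {
                    winners++;
                }
            }
            return winners;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("check-then-copy winners: " + countWinners(4, false));
        System.out.println("atomic winners: " + countWinners(4, true));
    }
}
```

With four threads, the check-then-copy variant reports four successful renames even though only one file ends up at the destination, while the atomic variant reports exactly one.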