[jira] [Commented] (HADOOP-15086) NativeAzureFileSystem.rename is not atomic

2017-12-05 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16278416#comment-16278416
 ] 

Steve Loughran commented on HADOOP-15086:
-

bq. Looks like we can use conditional-headers to implement an atomic file 
rename on Azure blob storage

re-opening then, assigning to you. Tests expected, and for this kind of stuff, 
anything you can do to prove correctness.

What would be the plan?
-enum headers
-use them in the conditional rename?

What about: 
* something else adding something to the dest path during the rename process?
* something deleting the parent dir & replacing it with a file?
* something deleting files added to the dest path during the rename process?
* failure of the rename process partway through?

I think you really need to consider that you can't guarantee atomicity here, 
though you may stand a chance of improving what you get today. Before you do 
that, it'd be worth getting the opinions of the Wasb team, in the form of 
[~tmarquardt].

bq.  I think it's not necessary to introduce object-store specific committers 
when a storage already supports atomic operations.

depends on whether the store consistently does rename as O(1); if it's O(data), 
you will find that what works in demo & test underperforms in production. Azure 
does do its renames faster than S3...we should really do some (scalable) 
micro-benchmark integration test for both which creates a file, does the rename 
& prints its bandwidth. I think this is mostly done on both already, but not 
really quantified across stores.
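
A minimal sketch of the kind of micro-benchmark described above: create a file, 
rename it, and print a nominal bandwidth. It runs against the local filesystem 
purely for illustration; `rename_benchmark` and its parameter are hypothetical 
names, and a real integration test would presumably go through the Hadoop 
FileSystem API against each store.

```python
import os
import tempfile
import time


def rename_benchmark(size_bytes=1 << 20):
    """Create a file of size_bytes, rename it, and report the timing.

    Illustrative only: this uses the local filesystem, where rename is a
    metadata operation. On a store that copies data during rename, the
    elapsed time would be expected to grow with size_bytes.
    """
    with tempfile.TemporaryDirectory() as workdir:
        src = os.path.join(workdir, "src.bin")
        dst = os.path.join(workdir, "dst.bin")
        with open(src, "wb") as f:
            f.write(b"\0" * size_bytes)

        # Time only the rename itself, not the file creation.
        start = time.perf_counter()
        os.rename(src, dst)
        elapsed = time.perf_counter() - start

        renamed = os.path.exists(dst) and not os.path.exists(src)
        size_after = os.path.getsize(dst)

    # Guard against a zero reading from a coarse clock.
    bandwidth = size_after / max(elapsed, 1e-9)
    return {
        "bytes": size_after,
        "seconds": elapsed,
        "bytes_per_second": bandwidth,
        "renamed": renamed,
    }


if __name__ == "__main__":
    # Scale the file size; an O(1) rename should show roughly flat timings.
    for size in (1 << 10, 1 << 20, 1 << 24):
        r = rename_benchmark(size)
        print(f"{r['bytes']:>10} bytes  {r['seconds']:.6f}s  "
              f"{r['bytes_per_second'] / 1e6:.1f} MB/s")
```

If the rename time stays flat as the size grows, the store is behaving as O(1); 
if it grows with the size, the rename is O(data).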

The good news: the Hadoop FileOutputCommitters don't have atomic task commit 
(V2 algorithm) or job commit (V1 algorithm), instead hoping that the 
probability of a failure on task commit is low (which it is when rename is fast 
enough), and expecting the job itself to detect and react to the failure/timeout 
of task commit and job commit (v1 can recover with all committed tasks 
retained; v2 fails the job & so delivers at-most-once). So, if you can make the 
atomicity semantics of the commit algorithms closer to those of HDFS, those 
algorithms may work better. However, I fear that you can't do enough for V1, 
not if you intend to support job restart (Spark doesn't do this, I know, but MR 
does). The v1 algorithm lists the task attempt ID paths under 
`$dest/_temporary/$appAttemptId/` and assumes that if a path is there, that 
taskId has been completed. Therefore, if the 
`$dest/_temporary/$appAttemptId/$taskAttemptId` path is visible before the 
rename completes, rename() doesn't meet the expectations of 
`FileOutputCommitter`. Better to write a manifest file to 
`$dest/_temporary/$appAttemptId/` for every task attempt & rely on PUT being 
atomic on that write, at which point, yes, you are in the world of custom 
committers.
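
The manifest idea can be sketched as follows, assuming a store where a single 
PUT is atomic. The store is modelled as a plain dict, and the `.manifest` 
suffix, function names and JSON layout are hypothetical choices for 
illustration, not an existing committer's format.

```python
import json


def write_task_output(store, dest, app_attempt_id, task_attempt_id, files):
    """Write a task attempt's files under its own attempt directory."""
    base = f"{dest}/_temporary/{app_attempt_id}/{task_attempt_id}"
    paths = []
    for name, data in files.items():
        path = f"{base}/{name}"
        store[path] = data  # each write stands in for one PUT
        paths.append(path)
    return paths


def commit_task(store, dest, app_attempt_id, task_attempt_id, paths):
    """Commit the task attempt by writing one manifest object (one PUT)."""
    manifest = f"{dest}/_temporary/{app_attempt_id}/{task_attempt_id}.manifest"
    store[manifest] = json.dumps(
        {"task": task_attempt_id, "files": sorted(paths)})
    return manifest


def committed_tasks(store, dest, app_attempt_id):
    """List task attempts that have a manifest; data files alone don't count."""
    prefix = f"{dest}/_temporary/{app_attempt_id}/"
    tasks = []
    for key in sorted(store):
        rest = key[len(prefix):]
        if key.startswith(prefix) and "/" not in rest \
                and rest.endswith(".manifest"):
            tasks.append(json.loads(store[key])["task"])
    return tasks


if __name__ == "__main__":
    store = {}
    paths = write_task_output(
        store, "out", "app_1", "task_1_attempt_0", {"part-0000": "a,b\n"})
    print(committed_tasks(store, "out", "app_1"))  # data written, no manifest
    commit_task(store, "out", "app_1", "task_1_attempt_0", paths)
    print(committed_tasks(store, "out", "app_1"))  # manifest now visible
```

A restarted job would look for manifests rather than for the presence of the 
task attempt path, so a partially written attempt directory is never mistaken 
for a completed task.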

> NativeAzureFileSystem.rename is not atomic
> --
>
> Key: HADOOP-15086
> URL: https://issues.apache.org/jira/browse/HADOOP-15086
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 2.7.3
>Reporter: Shixiong Zhu
> Attachments: RenameReproducer.java
>
>
> When multiple threads rename files to the same target path, more than 1 
> thread can succeed. It's because the check and the file copy in `rename` are 
> not atomic.
> I would expect it to be atomic, just like HDFS.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15086) NativeAzureFileSystem.rename is not atomic

2017-12-04 Thread Shixiong Zhu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16277260#comment-16277260
 ] 

Shixiong Zhu commented on HADOOP-15086:
---

In addition, I probably was not clear: I created this ticket just for atomic 
file rename.




[jira] [Commented] (HADOOP-15086) NativeAzureFileSystem.rename is not atomic

2017-12-04 Thread Shixiong Zhu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16277256#comment-16277256
 ] 

Shixiong Zhu commented on HADOOP-15086:
---

Looks like we can use conditional-headers to implement an atomic file rename on 
Azure blob storage? See 
https://docs.microsoft.com/en-us/rest/api/storageservices/specifying-conditional-headers-for-blob-service-operations
I think it's not necessary to introduce object-store specific committers 
when a storage already supports atomic operations.
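
A minimal sketch of the idea, assuming the service evaluates an 
`If-None-Match: *` precondition atomically with the write. `ToyBlobService` is 
an in-memory stand-in, not the Azure SDK or REST API, and the status codes it 
returns are this sketch's own choice.

```python
class ToyBlobService:
    """In-memory stand-in for a blob service; not the Azure SDK or REST API."""

    def __init__(self):
        self.blobs = {}

    def put(self, key, data):
        self.blobs[key] = data

    def copy(self, src, dst, headers=None):
        """Copy src to dst, honouring an If-None-Match: * precondition.

        The existence check and the write happen inside this one call, which
        models the service evaluating the condition together with the write.
        """
        headers = headers or {}
        if src not in self.blobs:
            return 404  # source missing
        if headers.get("If-None-Match") == "*" and dst in self.blobs:
            return 412  # precondition failed: destination already exists
        self.blobs[dst] = self.blobs[src]
        return 201

    def delete(self, key):
        self.blobs.pop(key, None)


def rename_no_overwrite(service, src, dst):
    """Rename a single file, failing if the destination already exists."""
    status = service.copy(src, dst, headers={"If-None-Match": "*"})
    if status != 201:
        return False  # lost the race (or no source): leave src untouched
    service.delete(src)
    return True


if __name__ == "__main__":
    svc = ToyBlobService()
    svc.put("a", "A")
    svc.put("b", "B")
    print(rename_no_overwrite(svc, "a", "dst"))
    print(rename_no_overwrite(svc, "b", "dst"))
```

Only the destination create is conditional here; the copy followed by the delete 
of the source is still two steps, so a failure between them leaves both blobs 
present.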




[jira] [Commented] (HADOOP-15086) NativeAzureFileSystem.rename is not atomic

2017-12-04 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16276796#comment-16276796
 ] 

Steve Loughran commented on HADOOP-15086:
-

I don't disagree with you about the existence of the problem, just don't think 
it's easily fixed. Essentially: blobstores tend not to have a rename() (or 
indeed create(overwrite=false) or delete(directory)), and the things we do to 
mimic these in our connectors aren't atomic.

1. We cover this in 
[Object Stores|https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/filesystem/introduction.html#Object_Stores_vs._Filesystems]
2. This is also common to: S3x, Swift, OSS, ADL, ...
3. By inference, the Hadoop FileOutputCommit protocol is not atomic on object 
stores either. 
4. Compare with the requirements of rename() as covered in 
[rename()|https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/filesystem/filesystem.html#boolean_renamePath_src_Path_d]
 

There is actually special support in Azure for atomic rename of HBase 
directories; this is done with leasing, recovery and stuff. It manages 
exclusivity, but it is still not an O(1) operation.

If you look at where we are going with this, the work is in moving to 
object-store specific committers which provide the commit semantics without 
relying on renames. HADOOP-13786 is the initial implementation of this for S3A, 
but the hooks put into FileOutputFormat are designed to support 
filesystem-specific committers for any store which implements one. 

I'm closing as a WONTFIX. Sorry. It's not that we don't want to, it's just that 
directory operations are where the metaphor "object stores are like 
filesystems" fails if you look closely enough.

(On a brighter note: wasb is consistent for both metadata and data)




[jira] [Commented] (HADOOP-15086) NativeAzureFileSystem.rename is not atomic

2017-12-01 Thread Cheng Lian (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16275422#comment-16275422
 ] 

Cheng Lian commented on HADOOP-15086:
-

To be more specific, when multiple threads rename files to the same target 
path, more than 1 *but not all* threads can succeed. It's because check and 
copy file in {{NativeAzureFileSystem#rename()}} is not atomic.

The problem here is that it's unclear what the expected semantics of 
{{NativeAzureFileSystem#rename()}} is:

- If the semantics is "error if the destination file already exists", then only 
1 thread can succeed.
- If the semantics is "overwrite if the destination file already exists", then 
all threads should succeed.
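
The race can be reproduced in a toy model. The sketch below is an in-memory 
simulation, not the `NativeAzureFileSystem` code: `BlobStore`, `racy_rename` and 
`atomic_rename` are hypothetical names. A barrier forces every caller past the 
existence check before any copy happens, which is the worst case; without it, 
more than 1 but not all callers would typically win.

```python
import threading


class BlobStore:
    """Toy in-memory blob store; each method is individually atomic."""

    def __init__(self):
        self._blobs = {}
        self._lock = threading.Lock()

    def exists(self, key):
        with self._lock:
            return key in self._blobs

    def get(self, key):
        with self._lock:
            return self._blobs[key]

    def put(self, key, data):
        with self._lock:
            self._blobs[key] = data

    def put_if_absent(self, key, data):
        """Check and write under one lock, so only one caller can create key."""
        with self._lock:
            if key in self._blobs:
                return False
            self._blobs[key] = data
            return True

    def delete(self, key):
        with self._lock:
            self._blobs.pop(key, None)


def racy_rename(store, src, dst, barrier):
    """Check, then copy: another caller can slip in between the two steps."""
    if store.exists(dst):
        return False
    barrier.wait()  # every caller has now passed the check
    store.put(dst, store.get(src))
    store.delete(src)
    return True


def atomic_rename(store, src, dst, barrier):
    """'Error if the destination exists' semantics via an atomic create."""
    barrier.wait()  # start all callers together
    if not store.put_if_absent(dst, store.get(src)):
        return False
    store.delete(src)
    return True


def run(rename, n=4):
    """Rename n distinct sources to one destination; return the win count."""
    store = BlobStore()
    for i in range(n):
        store.put(f"src-{i}", f"data-{i}")
    barrier = threading.Barrier(n)
    results = [False] * n

    def worker(i):
        results[i] = rename(store, f"src-{i}", "dst", barrier)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(results)


if __name__ == "__main__":
    print("check-then-copy winners:", run(racy_rename))
    print("atomic create winners:  ", run(atomic_rename))
```

Under "error if the destination file already exists" semantics, only the second 
variant gives the expected single winner.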
