[jira] [Commented] (HADOOP-15875) S3AInputStream.seek should throw EOFException if seeking past the end of file
[ https://issues.apache.org/jira/browse/HADOOP-15875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16664106#comment-16664106 ]

Shixiong Zhu commented on HADOOP-15875:
---------------------------------------

Sorry, I misread your comment.

> S3AInputStream.seek should throw EOFException if seeking past the end of file
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-15875
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15875
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.2.0
>            Reporter: Shixiong Zhu
>            Priority: Minor
>
> I read the javadoc of `Seekable.seek`, but it doesn't say what should happen
> when seeking past the end of the file. Right now, DFSInputStream throws an
> EOFException, but S3AInputStream doesn't throw any error.
> I think it's better to have consistent behavior in `seek`.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15875) S3AInputStream.seek should throw EOFException if seeking past the end of file
[ https://issues.apache.org/jira/browse/HADOOP-15875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16662802#comment-16662802 ]

Shixiong Zhu commented on HADOOP-15875:
---------------------------------------

[~ste...@apache.org] S3AInputStream can check this easily since it has the file length. BlockBlobInputStream has the same check as well:
https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azure/BlockBlobInputStream.java#L123
[jira] [Created] (HADOOP-15875) S3AInputStream.seek should throw EOFException if seeking past the end of file
Shixiong Zhu created HADOOP-15875:
----------------------------------

             Summary: S3AInputStream.seek should throw EOFException if seeking past the end of file
                 Key: HADOOP-15875
                 URL: https://issues.apache.org/jira/browse/HADOOP-15875
             Project: Hadoop Common
          Issue Type: Sub-task
          Components: fs/s3
            Reporter: Shixiong Zhu

I read the javadoc of `Seekable.seek`, but it doesn't say what should happen when seeking past the end of the file. Right now, DFSInputStream throws an EOFException, but S3AInputStream doesn't throw any error.

I think it's better to have consistent behavior in `seek`.
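To make the requested behavior concrete, here is a minimal sketch of a bounds-checked `seek`. It is not the S3AInputStream or DFSInputStream source: `BoundedSeekableStream` is a hypothetical in-memory class, and allowing a seek to exactly the file length (but not beyond it) is an assumption of this sketch, not something the issue states.

```java
import java.io.EOFException;
import java.io.IOException;

/** Hypothetical in-memory stream used only for illustration. */
class BoundedSeekableStream {
    private final byte[] data;   // stands in for the remote object
    private long pos;            // current read position

    BoundedSeekableStream(byte[] data) {
        this.data = data;
    }

    /** Rejects negative targets and targets beyond the known length. */
    void seek(long targetPos) throws IOException {
        if (targetPos < 0) {
            throw new EOFException("Cannot seek to a negative offset: " + targetPos);
        }
        // Assumption: seeking to exactly the length is allowed, anything larger is not.
        if (targetPos > data.length) {
            throw new EOFException("Cannot seek past end of file: "
                + targetPos + " > " + data.length);
        }
        pos = targetPos;
    }

    long getPos() {
        return pos;
    }

    /** Returns the next byte, or -1 at end of file. */
    int read() {
        return pos < data.length ? (data[(int) pos++] & 0xff) : -1;
    }
}
```

Because the check only needs the file length, a stream that already knows its content length can perform it without any extra remote call.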
[jira] [Created] (HADOOP-15871) Some input streams does not obey "java.io.InputStream.available" contract
Shixiong Zhu created HADOOP-15871:
----------------------------------

             Summary: Some input streams does not obey "java.io.InputStream.available" contract
                 Key: HADOOP-15871
                 URL: https://issues.apache.org/jira/browse/HADOOP-15871
             Project: Hadoop Common
          Issue Type: Bug
          Components: fs, fs/s3
            Reporter: Shixiong Zhu

E.g., DFSInputStream and S3AInputStream return the number of remaining bytes, but the javadoc of "available" says that it "Returns an estimate of the number of bytes that can be read (or skipped over) from this input stream *without blocking* by the next invocation of a method for this input stream."

I understand that some applications may rely on the current behavior. It would be great to have an interface that documents how "available" should be implemented.
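The difference between the two interpretations can be shown with a small sketch. `ChunkedStream` and its `remainingInFile` method are hypothetical names for this example; the class fakes a remote source with a byte array and treats each chunk fetch as the point where a real stream would block.

```java
import java.io.InputStream;

/** Hypothetical stream that reads a "remote" byte array in fixed-size chunks. */
class ChunkedStream extends InputStream {
    private final byte[] remote;   // stands in for data that needs a blocking fetch
    private final int chunkSize;
    private int fetched;           // bytes already fetched into the local buffer
    private int pos;               // next byte to hand to the caller

    ChunkedStream(byte[] remote, int chunkSize) {
        this.remote = remote;
        this.chunkSize = chunkSize;
    }

    @Override
    public int read() {
        if (pos >= remote.length) {
            return -1;
        }
        if (pos >= fetched) {
            // A real stream would perform a blocking network read here.
            fetched = Math.min(remote.length, fetched + chunkSize);
        }
        return remote[pos++] & 0xff;
    }

    /** Bytes readable without another fetch, following the java.io.InputStream javadoc. */
    @Override
    public int available() {
        return fetched - pos;
    }

    /** Bytes left in the whole file, which is the value the issue says some streams report. */
    public long remainingInFile() {
        return remote.length - pos;
    }
}
```

With a 10-byte source and a chunk size of 4, one `read()` leaves `available()` at 3 while `remainingInFile()` is 9, which is the gap the issue describes.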
[jira] [Created] (HADOOP-15870) S3AInputStream.remainingInFile should use nextReadPos
Shixiong Zhu created HADOOP-15870:
----------------------------------

             Summary: S3AInputStream.remainingInFile should use nextReadPos
                 Key: HADOOP-15870
                 URL: https://issues.apache.org/jira/browse/HADOOP-15870
             Project: Hadoop Common
          Issue Type: Bug
          Components: fs/s3
    Affects Versions: 3.1.1
            Reporter: Shixiong Zhu

Otherwise `remainingInFile` will not change after `seek`.
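A rough sketch of why this matters, assuming a lazy-seek design in which `seek` only records the requested offset and the underlying position moves on the next read. `LazySeekStream` and both `remaining...` method names are hypothetical; only `nextReadPos` and `remainingInFile` come from the issue text.

```java
/** Hypothetical lazy-seek stream; it tracks positions only and returns dummy data. */
class LazySeekStream {
    private final long contentLength;
    private long pos;          // position of the underlying stream, moved only on read
    private long nextReadPos;  // position requested by the caller, moved on seek

    LazySeekStream(long contentLength) {
        this.contentLength = contentLength;
    }

    /** Lazy seek: remembers the target but does not touch pos. */
    void seek(long target) {
        nextReadPos = target;
    }

    /** Computed from pos, so it is stale between a seek and the next read. */
    long remainingFromPos() {
        return contentLength - pos;
    }

    /** Computed from nextReadPos, so it reflects the seek immediately. */
    long remainingFromNextReadPos() {
        return contentLength - nextReadPos;
    }

    /** Applies the pending seek, then consumes one (dummy) byte. */
    int read() {
        if (nextReadPos >= contentLength) {
            return -1;
        }
        pos = nextReadPos;
        pos++;
        nextReadPos++;
        return 0;
    }
}
```

After `seek(40)` on a 100-byte stream, the `pos`-based value still reports 100 while the `nextReadPos`-based value reports 60; the two agree again only after a read.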
[jira] [Commented] (HADOOP-15086) NativeAzureFileSystem.rename is not atomic
[ https://issues.apache.org/jira/browse/HADOOP-15086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16277260#comment-16277260 ]

Shixiong Zhu commented on HADOOP-15086:
---------------------------------------

In addition, I probably was not clear: I created this ticket just for atomic file rename.

> NativeAzureFileSystem.rename is not atomic
> ------------------------------------------
>
>                 Key: HADOOP-15086
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15086
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/azure
>    Affects Versions: 2.7.3
>            Reporter: Shixiong Zhu
>         Attachments: RenameReproducer.java
>
> When multiple threads rename files to the same target path, more than one
> thread can succeed, because the check and the file copy in `rename` are not
> performed atomically.
> I would expect it to be atomic, just like HDFS.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15086) NativeAzureFileSystem.rename is not atomic
[ https://issues.apache.org/jira/browse/HADOOP-15086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16277256#comment-16277256 ]

Shixiong Zhu commented on HADOOP-15086:
---------------------------------------

It looks like we could use conditional headers to implement an atomic file rename on Azure blob storage:
https://docs.microsoft.com/en-us/rest/api/storageservices/specifying-conditional-headers-for-blob-service-operations

I think it's not necessary to introduce object-store-specific committers when a storage system already supports atomic operations.
[jira] [Updated] (HADOOP-15086) NativeAzureFileSystem.rename is not atomic
[ https://issues.apache.org/jira/browse/HADOOP-15086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shixiong Zhu updated HADOOP-15086:
----------------------------------
    Attachment: RenameReproducer.java

Reproducer
[jira] [Comment Edited] (HADOOP-15086) NativeAzureFileSystem.rename is not atomic
[ https://issues.apache.org/jira/browse/HADOOP-15086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16275198#comment-16275198 ]

Shixiong Zhu edited comment on HADOOP-15086 at 12/1/17 11:43 PM:
----------------------------------------------------------------

Attached a reproducer

was (Author: zsxwing):
Reproducer
[jira] [Created] (HADOOP-15086) NativeAzureFileSystem.rename is not atomic
Shixiong Zhu created HADOOP-15086:
----------------------------------

             Summary: NativeAzureFileSystem.rename is not atomic
                 Key: HADOOP-15086
                 URL: https://issues.apache.org/jira/browse/HADOOP-15086
             Project: Hadoop Common
          Issue Type: Sub-task
          Components: fs/azure
    Affects Versions: 2.7.3
            Reporter: Shixiong Zhu

When multiple threads rename files to the same target path, more than one thread can succeed, because the check and the file copy in `rename` are not performed atomically.
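The race described above is a check-then-act problem. The following sketch contrasts it with an atomic claim on the destination, using an in-memory map in place of a blob store. `ToyStore` and its methods are hypothetical and do not reflect the NativeAzureFileSystem code; `ConcurrentMap.putIfAbsent` stands in for whatever conditional operation the storage service offers.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical in-memory "file system": paths map to file contents. */
class ToyStore {
    private final ConcurrentMap<String, String> files = new ConcurrentHashMap<>();

    void create(String path, String content) {
        files.put(path, content);
    }

    String read(String path) {
        return files.get(path);
    }

    /** Racy: another thread can create dst between the check and the copy. */
    boolean renameCheckThenCopy(String src, String dst) {
        if (files.containsKey(dst)) {
            return false;
        }
        String content = files.get(src);
        if (content == null) {
            return false;
        }
        files.put(dst, content);   // may overwrite a concurrent winner
        files.remove(src);
        return true;
    }

    /** Atomic on the destination: putIfAbsent lets exactly one caller claim dst. */
    boolean renameAtomic(String src, String dst) {
        String content = files.get(src);
        if (content == null) {
            return false;
        }
        if (files.putIfAbsent(dst, content) != null) {
            return false;          // someone else already owns dst
        }
        files.remove(src);
        return true;
    }
}
```

With `renameAtomic`, any number of threads renaming different sources to the same target see exactly one success. With `renameCheckThenCopy`, two threads can both pass the `containsKey` check and both return true.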
[jira] [Updated] (HADOOP-15086) NativeAzureFileSystem.rename is not atomic
[ https://issues.apache.org/jira/browse/HADOOP-15086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shixiong Zhu updated HADOOP-15086:
----------------------------------
    Description:
When multiple threads rename files to the same target path, more than one thread can succeed, because the check and the file copy in `rename` are not performed atomically.

I would expect it to be atomic, just like HDFS.

  was:
When multiple threads rename files to the same target path, more than one thread can succeed, because the check and the file copy in `rename` are not performed atomically.
[jira] [Updated] (HADOOP-14084) Shell.joinThread swallows InterruptedException
[ https://issues.apache.org/jira/browse/HADOOP-14084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shixiong Zhu updated HADOOP-14084:
----------------------------------
    Description:
In "Shell.joinThread", when the user tries to interrupt the thread that runs Shell.joinThread, it catches the InterruptedException and propagates the interrupt to thread t. However, it doesn't set the interrupt status of the current thread before returning, so the user code won't know it has already been interrupted.

See https://github.com/apache/hadoop/blob/9e19f758c1950cbcfcd1969461a8a910efca0767/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/Shell.java#L1035

  was:
In "Shell.joinThread", when the user tries to interrupt the thread that runs Shell.joinThread, it catches the InterruptedException and propagates the interrupt to thread t. However, it doesn't set the interrupt status of the current thread before returning, so the user code won't know it has already been interrupted and should exit.

See https://github.com/apache/hadoop/blob/9e19f758c1950cbcfcd1969461a8a910efca0767/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/Shell.java#L1035

> Shell.joinThread swallows InterruptedException
> ----------------------------------------------
>
>                 Key: HADOOP-14084
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14084
>             Project: Hadoop Common
>          Issue Type: Bug
>            Reporter: Shixiong Zhu
>            Priority: Minor
>
> In "Shell.joinThread", when the user tries to interrupt the thread that runs
> Shell.joinThread, it catches the InterruptedException and propagates the
> interrupt to thread t. However, it doesn't set the interrupt status of the
> current thread before returning, so the user code won't know it has already
> been interrupted.
> See
> https://github.com/apache/hadoop/blob/9e19f758c1950cbcfcd1969461a8a910efca0767/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/Shell.java#L1035

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Created] (HADOOP-14084) Shell.joinThread swallows InterruptedException
Shixiong Zhu created HADOOP-14084:
----------------------------------

             Summary: Shell.joinThread swallows InterruptedException
                 Key: HADOOP-14084
                 URL: https://issues.apache.org/jira/browse/HADOOP-14084
             Project: Hadoop Common
          Issue Type: Bug
            Reporter: Shixiong Zhu
            Priority: Minor

In "Shell.joinThread", when the user tries to interrupt the thread that runs Shell.joinThread, it catches the InterruptedException and propagates the interrupt to thread t. However, it doesn't set the interrupt status of the current thread before returning, so the user code won't know it has already been interrupted and should exit.

See https://github.com/apache/hadoop/blob/9e19f758c1950cbcfcd1969461a8a910efca0767/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/Shell.java#L1035
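One possible shape of a fix, written as a standalone sketch and not as a patch against Shell.java: keep joining until thread t has finished, forward the interrupt to t as the description says the method already does, and restore the caller's interrupt status before returning. The class name `InterruptAwareJoin` is hypothetical.

```java
/** Hypothetical helper; a sketch of the idea, not the Shell.joinThread source. */
class InterruptAwareJoin {
    static void joinThread(Thread t) {
        boolean interrupted = false;
        while (t.isAlive()) {
            try {
                t.join();
            } catch (InterruptedException e) {
                // join() cleared the caller's interrupt status; remember it.
                interrupted = true;
                // Forward the interrupt to the thread being joined.
                t.interrupt();
            }
        }
        if (interrupted) {
            // Restore the status so the caller can see it was interrupted.
            Thread.currentThread().interrupt();
        }
    }
}
```

After the call returns, the caller can check `Thread.currentThread().isInterrupted()` and decide whether to exit, which is the information the current behavior loses.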