[jira] [Commented] (HADOOP-15875) S3AInputStream.seek should throw EOFException if seeking past the end of file

2018-10-25 Thread Shixiong Zhu (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16664106#comment-16664106
 ] 

Shixiong Zhu commented on HADOOP-15875:
---

Sorry. I misread your comment.

> S3AInputStream.seek should throw EOFException if seeking past the end of file
> -
>
> Key: HADOOP-15875
> URL: https://issues.apache.org/jira/browse/HADOOP-15875
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: 3.2.0
> Reporter: Shixiong Zhu
> Priority: Minor
>
> I read the javadoc of `Seekable.seek`, but it doesn't say what should happen 
> when seeking past the end of the file. Right now, DFSInputStream throws an 
> EOFException, but S3AInputStream doesn't throw any error.
> I think it's better to have consistent behavior in `seek`.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-15875) S3AInputStream.seek should throw EOFException if seeking past the end of file

2018-10-24 Thread Shixiong Zhu (JIRA)


[ 
https://issues.apache.org/jira/browse/HADOOP-15875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16662802#comment-16662802
 ] 

Shixiong Zhu commented on HADOOP-15875:
---

[~ste...@apache.org] S3AInputStream can check this easily since it has the file 
length. 
https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azure/BlockBlobInputStream.java#L123
 has the same check as well.

> S3AInputStream.seek should throw EOFException if seeking past the end of file
> -
>
> Key: HADOOP-15875
> URL: https://issues.apache.org/jira/browse/HADOOP-15875
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: 3.2.0
> Reporter: Shixiong Zhu
> Priority: Minor
>
> I read the javadoc of `Seekable.seek`, but it doesn't say what should happen 
> when seeking past the end of the file. Right now, DFSInputStream throws an 
> EOFException, but S3AInputStream doesn't throw any error.
> I think it's better to have consistent behavior in `seek`.






[jira] [Created] (HADOOP-15875) S3AInputStream.seek should throw EOFException if seeking past the end of file

2018-10-23 Thread Shixiong Zhu (JIRA)
Shixiong Zhu created HADOOP-15875:
-

 Summary: S3AInputStream.seek should throw EOFException if seeking past the end of file
 Key: HADOOP-15875
 URL: https://issues.apache.org/jira/browse/HADOOP-15875
 Project: Hadoop Common
 Issue Type: Sub-task
 Components: fs/s3
 Reporter: Shixiong Zhu


I read the javadoc of `Seekable.seek`, but it doesn't say what should happen 
when seeking past the end of the file. Right now, DFSInputStream throws an 
EOFException, but S3AInputStream doesn't throw any error.

I think it's better to have consistent behavior in `seek`.
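
For illustration, the kind of bounds check being asked for could look like the 
sketch below. It is not the actual S3AInputStream or DFSInputStream code: 
`BoundedSeekableStream` is a hypothetical in-memory class, and allowing a seek 
to exactly the file length is an assumption made for this example.

```java
import java.io.EOFException;
import java.io.IOException;

/** Hypothetical in-memory stream; not the actual S3AInputStream code. */
class BoundedSeekableStream {
    private final byte[] data;
    private long pos;

    BoundedSeekableStream(byte[] data) {
        this.data = data;
    }

    /** Moves the read position, rejecting targets outside [0, length]. */
    void seek(long targetPos) throws IOException {
        if (targetPos < 0) {
            throw new EOFException("Cannot seek to a negative offset: " + targetPos);
        }
        // Assumption: seeking to exactly the end of the file is allowed.
        if (targetPos > data.length) {
            throw new EOFException(
                "Cannot seek past end of file: " + targetPos + " > " + data.length);
        }
        pos = targetPos;
    }

    long getPos() {
        return pos;
    }

    /** Returns the next byte as an unsigned value, or -1 at end of file. */
    int read() {
        return pos < data.length ? (data[(int) pos++] & 0xFF) : -1;
    }
}
```

Because the check runs before `pos` is updated, a rejected seek leaves the 
current position unchanged.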






[jira] [Created] (HADOOP-15871) Some input streams does not obey "java.io.InputStream.available" contract

2018-10-22 Thread Shixiong Zhu (JIRA)
Shixiong Zhu created HADOOP-15871:
-

 Summary: Some input streams does not obey "java.io.InputStream.available" contract
 Key: HADOOP-15871
 URL: https://issues.apache.org/jira/browse/HADOOP-15871
 Project: Hadoop Common
 Issue Type: Bug
 Components: fs, fs/s3
 Reporter: Shixiong Zhu


For example, DFSInputStream and S3AInputStream return the number of bytes 
remaining in the file, but the javadoc of "available" says it "Returns an 
estimate of the number of bytes that can be read (or skipped over) from this 
input stream *without blocking* by the next invocation of a method for this 
input stream."

I understand that some applications may rely on the current behavior. It would 
be great to have an interface that documents how "available" should be 
implemented.
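
To make the difference concrete, here is a minimal sketch that contrasts the 
two interpretations. `BufferedRemoteStream`, `fill` and `consume` are 
hypothetical names invented for this example; the class only models a stream 
with a local buffer and is not taken from any Hadoop implementation.

```java
/** Hypothetical stream over a remote file with a local read-ahead buffer. */
class BufferedRemoteStream {
    private final long fileLength;
    private long pos;       // bytes already consumed by the caller
    private int buffered;   // bytes fetched locally but not yet consumed

    BufferedRemoteStream(long fileLength) {
        this.fileLength = fileLength;
    }

    /** Simulates a blocking fetch of up to n more bytes into the buffer. */
    void fill(int n) {
        long unfetched = fileLength - pos - buffered;
        buffered += (int) Math.min(n, unfetched);
    }

    /** Consumes up to n buffered bytes and returns how many were consumed. */
    int consume(int n) {
        int taken = Math.min(n, buffered);
        buffered -= taken;
        pos += taken;
        return taken;
    }

    /** Behavior described in this issue: bytes left in the whole file. */
    long remainingInFile() {
        return fileLength - pos;
    }

    /** Behavior described in the javadoc: bytes readable without blocking. */
    int available() {
        return buffered;
    }
}
```

With a 100-byte file and 10 bytes buffered, `remainingInFile()` is 100 while 
`available()` is 10, so a caller that treats the first number as "readable 
without blocking" may block on the remaining 90 bytes.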






[jira] [Created] (HADOOP-15870) S3AInputStream.remainingInFile should use nextReadPos

2018-10-22 Thread Shixiong Zhu (JIRA)
Shixiong Zhu created HADOOP-15870:
-

 Summary: S3AInputStream.remainingInFile should use nextReadPos
 Key: HADOOP-15870
 URL: https://issues.apache.org/jira/browse/HADOOP-15870
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs/s3
Affects Versions: 3.1.1
Reporter: Shixiong Zhu


Otherwise `remainingInFile` will not change after `seek`.
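
A minimal sketch of the problem, assuming a stream that implements `seek` 
lazily (it only records the target and moves the underlying position on the 
next read). `LazySeekStream` and the fields `pos` and `contentLength` are 
assumptions for this example; only `nextReadPos` and `remainingInFile` are 
named in this issue.

```java
/** Hypothetical lazy-seek stream; models the bookkeeping only, no real I/O. */
class LazySeekStream {
    private final long contentLength;
    private long pos;          // position of the underlying, already opened stream
    private long nextReadPos;  // position requested by the most recent seek

    LazySeekStream(long contentLength) {
        this.contentLength = contentLength;
    }

    /** Lazy seek: remembers the target but does not touch the underlying stream. */
    void seek(long targetPos) {
        nextReadPos = targetPos;
    }

    /** Stale after a seek, because pos only moves on the next read. */
    long remainingUsingPos() {
        return contentLength - pos;
    }

    /** Reflects the seek immediately. */
    long remainingUsingNextReadPos() {
        return contentLength - nextReadPos;
    }
}
```

After `seek(40)` on a 100-byte file, the `pos`-based value is still 100, while 
the `nextReadPos`-based value is 60.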






[jira] [Commented] (HADOOP-15086) NativeAzureFileSystem.rename is not atomic

2017-12-04 Thread Shixiong Zhu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16277260#comment-16277260
 ] 

Shixiong Zhu commented on HADOOP-15086:
---

In addition, I was probably not clear: I created this ticket just for atomic 
file rename.

> NativeAzureFileSystem.rename is not atomic
> --
>
> Key: HADOOP-15086
> URL: https://issues.apache.org/jira/browse/HADOOP-15086
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/azure
> Affects Versions: 2.7.3
> Reporter: Shixiong Zhu
> Attachments: RenameReproducer.java
>
>
> When multiple threads rename files to the same target path, more than one 
> thread can succeed, because the check and the file copy in `rename` are not 
> performed atomically.
> I would expect it to be atomic, just like in HDFS.






[jira] [Commented] (HADOOP-15086) NativeAzureFileSystem.rename is not atomic

2017-12-04 Thread Shixiong Zhu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16277256#comment-16277256
 ] 

Shixiong Zhu commented on HADOOP-15086:
---

Could we use conditional headers to implement an atomic file rename on Azure 
Blob Storage? 
https://docs.microsoft.com/en-us/rest/api/storageservices/specifying-conditional-headers-for-blob-service-operations
I don't think it's necessary to introduce object-store-specific committers 
when a storage system already supports atomic operations.

> NativeAzureFileSystem.rename is not atomic
> --
>
> Key: HADOOP-15086
> URL: https://issues.apache.org/jira/browse/HADOOP-15086
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/azure
> Affects Versions: 2.7.3
> Reporter: Shixiong Zhu
> Attachments: RenameReproducer.java
>
>
> When multiple threads rename files to the same target path, more than one 
> thread can succeed, because the check and the file copy in `rename` are not 
> performed atomically.
> I would expect it to be atomic, just like in HDFS.






[jira] [Updated] (HADOOP-15086) NativeAzureFileSystem.rename is not atomic

2017-12-01 Thread Shixiong Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu updated HADOOP-15086:
--
Attachment: RenameReproducer.java

Reproducer

> NativeAzureFileSystem.rename is not atomic
> --
>
> Key: HADOOP-15086
> URL: https://issues.apache.org/jira/browse/HADOOP-15086
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/azure
> Affects Versions: 2.7.3
> Reporter: Shixiong Zhu
> Attachments: RenameReproducer.java
>
>
> When multiple threads rename files to the same target path, more than one 
> thread can succeed, because the check and the file copy in `rename` are not 
> performed atomically.
> I would expect it to be atomic, just like in HDFS.






[jira] [Comment Edited] (HADOOP-15086) NativeAzureFileSystem.rename is not atomic

2017-12-01 Thread Shixiong Zhu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16275198#comment-16275198
 ] 

Shixiong Zhu edited comment on HADOOP-15086 at 12/1/17 11:43 PM:
-

Attached a reproducer


was (Author: zsxwing):
Reproducer

> NativeAzureFileSystem.rename is not atomic
> --
>
> Key: HADOOP-15086
> URL: https://issues.apache.org/jira/browse/HADOOP-15086
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/azure
> Affects Versions: 2.7.3
> Reporter: Shixiong Zhu
> Attachments: RenameReproducer.java
>
>
> When multiple threads rename files to the same target path, more than one 
> thread can succeed, because the check and the file copy in `rename` are not 
> performed atomically.
> I would expect it to be atomic, just like in HDFS.






[jira] [Created] (HADOOP-15086) NativeAzureFileSystem.rename is not atomic

2017-12-01 Thread Shixiong Zhu (JIRA)
Shixiong Zhu created HADOOP-15086:
-

 Summary: NativeAzureFileSystem.rename is not atomic
 Key: HADOOP-15086
 URL: https://issues.apache.org/jira/browse/HADOOP-15086
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/azure
Affects Versions: 2.7.3
Reporter: Shixiong Zhu


When multiple threads rename files to the same target path, more than one 
thread can succeed, because the check and the file copy in `rename` are not 
performed atomically.
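
The race can be sketched with a toy store. This is not the 
NativeAzureFileSystem code and does not talk to Azure: `InMemoryStore`, 
`renameCheckThenCopy`, `renameAtomic` and `RenameRaceDemo` are hypothetical 
names, and a `ConcurrentHashMap` stands in for the blob store. The first 
rename checks the target and then copies in two separate steps; the second 
claims the target in one atomic step.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical in-memory store; it only models the race, not Azure Blob Storage. */
class InMemoryStore {
    private final ConcurrentMap<String, String> files = new ConcurrentHashMap<>();

    void create(String path, String content) {
        files.put(path, content);
    }

    boolean exists(String path) {
        return files.containsKey(path);
    }

    /** Racy: another thread can create dst between the check and the copy. */
    boolean renameCheckThenCopy(String src, String dst) {
        String content = files.get(src);
        if (content == null || files.containsKey(dst)) {
            return false;
        }
        files.put(dst, content);
        files.remove(src);
        return true;
    }

    /** Check and copy as one step: putIfAbsent lets exactly one caller claim dst. */
    boolean renameAtomic(String src, String dst) {
        String content = files.get(src);
        if (content == null || files.putIfAbsent(dst, content) != null) {
            return false;
        }
        files.remove(src);
        return true;
    }
}

class RenameRaceDemo {
    /** Renames `threads` distinct sources to one target and counts the successes. */
    static int countWinners(int threads) throws InterruptedException {
        InMemoryStore store = new InMemoryStore();
        AtomicInteger winners = new AtomicInteger();
        CountDownLatch start = new CountDownLatch(1);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            String src = "/src-" + i;
            store.create(src, "data-" + i);
            workers[i] = new Thread(() -> {
                try {
                    start.await(); // release all workers at the same time
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                if (store.renameAtomic(src, "/dst")) {
                    winners.incrementAndGet();
                }
            });
            workers[i].start();
        }
        start.countDown();
        for (Thread worker : workers) {
            worker.join();
        }
        return winners.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("winners with atomic rename: " + countWinners(8));
    }
}
```

With `renameAtomic`, only one of the concurrent renames can succeed. Swapping 
in `renameCheckThenCopy` can let several callers report success, depending on 
thread timing. Note that the sketch is atomic only with respect to the target 
path; it does not protect against two callers renaming the same source.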






[jira] [Updated] (HADOOP-15086) NativeAzureFileSystem.rename is not atomic

2017-12-01 Thread Shixiong Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu updated HADOOP-15086:
--
Description: 
When multiple threads rename files to the same target path, more than one 
thread can succeed, because the check and the file copy in `rename` are not 
performed atomically.

I would expect it to be atomic, just like in HDFS.

  was: When multiple threads rename files to the same target path, more than 
one thread can succeed, because the check and the file copy in `rename` are 
not performed atomically.


> NativeAzureFileSystem.rename is not atomic
> --
>
> Key: HADOOP-15086
> URL: https://issues.apache.org/jira/browse/HADOOP-15086
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/azure
> Affects Versions: 2.7.3
> Reporter: Shixiong Zhu
>
> When multiple threads rename files to the same target path, more than one 
> thread can succeed, because the check and the file copy in `rename` are not 
> performed atomically.
> I would expect it to be atomic, just like in HDFS.






[jira] [Updated] (HADOOP-14084) Shell.joinThread swallows InterruptedException

2017-02-15 Thread Shixiong Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu updated HADOOP-14084:
--
Description: 
In "Shell.joinThread", when the user interrupts the thread that runs 
Shell.joinThread, the method catches the InterruptedException and propagates 
the interrupt to thread t. However, it doesn't restore the interrupt status of 
the current thread before returning, so the calling code won't know that it 
has already been interrupted.

See 
https://github.com/apache/hadoop/blob/9e19f758c1950cbcfcd1969461a8a910efca0767/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/Shell.java#L1035

  was:
In "Shell.joinThread", when the user interrupts the thread that runs 
Shell.joinThread, the method catches the InterruptedException and propagates 
the interrupt to thread t. However, it doesn't restore the interrupt status of 
the current thread before returning, so the calling code won't know that it 
has already been interrupted and should exit.

See 
https://github.com/apache/hadoop/blob/9e19f758c1950cbcfcd1969461a8a910efca0767/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/Shell.java#L1035


> Shell.joinThread swallows InterruptedException
> --
>
> Key: HADOOP-14084
> URL: https://issues.apache.org/jira/browse/HADOOP-14084
> Project: Hadoop Common
> Issue Type: Bug
> Reporter: Shixiong Zhu
> Priority: Minor
>
> In "Shell.joinThread", when the user interrupts the thread that runs 
> Shell.joinThread, the method catches the InterruptedException and propagates 
> the interrupt to thread t. However, it doesn't restore the interrupt status 
> of the current thread before returning, so the calling code won't know that 
> it has already been interrupted.
> See 
> https://github.com/apache/hadoop/blob/9e19f758c1950cbcfcd1969461a8a910efca0767/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/Shell.java#L1035






[jira] [Created] (HADOOP-14084) Shell.joinThread swallows InterruptedException

2017-02-15 Thread Shixiong Zhu (JIRA)
Shixiong Zhu created HADOOP-14084:
-

 Summary: Shell.joinThread swallows InterruptedException
 Key: HADOOP-14084
 URL: https://issues.apache.org/jira/browse/HADOOP-14084
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Shixiong Zhu
Priority: Minor


In "Shell.joinThread", when the user interrupts the thread that runs 
Shell.joinThread, the method catches the InterruptedException and propagates 
the interrupt to thread t. However, it doesn't restore the interrupt status of 
the current thread before returning, so the calling code won't know that it 
has already been interrupted and should exit.

See 
https://github.com/apache/hadoop/blob/9e19f758c1950cbcfcd1969461a8a910efca0767/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/Shell.java#L1035
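
A minimal sketch of the usual idiom for this situation, under the assumption 
that the method should still wait for thread t to finish. `JoinUtil` is a 
hypothetical class written for this example; it is not the Hadoop source and 
not necessarily the change that should be applied there.

```java
/** Hypothetical helper; illustrates restoring the interrupt status after a join. */
class JoinUtil {
    /** Waits for t to finish, forwarding any interrupt to t and remembering it. */
    static void joinThread(Thread t) {
        boolean interrupted = false;
        while (t.isAlive()) {
            try {
                t.join();
            } catch (InterruptedException ie) {
                interrupted = true;
                t.interrupt(); // propagate the interrupt to the thread being joined
            }
        }
        if (interrupted) {
            // Catching InterruptedException cleared the flag; set it again so
            // the caller can observe that it was interrupted.
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        Thread worker = new Thread(() -> {
            try {
                Thread.sleep(60_000);
            } catch (InterruptedException e) {
                // exit when interrupted
            }
        });
        worker.start();
        Thread.currentThread().interrupt();
        joinThread(worker);
        System.out.println("caller still interrupted: " + Thread.interrupted());
    }
}
```

After `joinThread` returns, the caller can check `Thread.interrupted()` or 
`isInterrupted()` and decide to exit.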


