[jira] [Commented] (HADOOP-11851) s3n to swallow IOEs on inner stream close
[ https://issues.apache.org/jira/browse/HADOOP-11851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14503893#comment-14503893 ] Takenori Sato commented on HADOOP-11851:
-----------------------------------------

Isn't this a duplicate of HADOOP-11730?

> s3n to swallow IOEs on inner stream close
> ------------------------------------------
>
> Key: HADOOP-11851
> URL: https://issues.apache.org/jira/browse/HADOOP-11851
> Project: Hadoop Common
> Issue Type: Improvement
> Components: fs/s3
> Affects Versions: 2.6.0
> Reporter: Steve Loughran
> Assignee: Anu Engineer
> Priority: Minor
>
> We've seen a situation where some work was failing from (recurrent) connection reset exceptions.
> Irrespective of the root cause, these were surfacing not in the read operations, but when the input stream was being closed, including during a seek().
> These exceptions could be caught and logged at warn level, rather than trigger immediate failures. It shouldn't matter to the next GET whether the last stream closed prematurely, as long as the new one works.
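To make the proposal concrete, here is a minimal sketch of "catch, log at warn" on the inner close. It assumes an inner {{wrappedStream}} field and an SLF4J logger; all names are illustrative, not the actual S3N/S3A code:
{code}
import java.io.IOException;
import java.io.InputStream;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Hypothetical sketch only, not the committed patch: swallow the IOException
// raised while closing the inner stream, since the next GET opens a fresh one.
class QuietInnerClose {
  private static final Logger LOG = LoggerFactory.getLogger(QuietInnerClose.class);
  private InputStream wrappedStream; // the inner S3 object stream (illustrative name)

  void closeInnerStream() {
    if (wrappedStream == null) {
      return;
    }
    try {
      wrappedStream.close();
    } catch (IOException e) {
      // Log at warn level instead of failing the read/seek/close path.
      LOG.warn("Ignoring exception while closing inner S3 stream", e);
    } finally {
      wrappedStream = null;
    }
  }
}
{code}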
[jira] [Commented] (HADOOP-11742) mkdir by file system shell fails on an empty bucket
[ https://issues.apache.org/jira/browse/HADOOP-11742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14392251#comment-14392251 ] Takenori Sato commented on HADOOP-11742:
-----------------------------------------

_mkdir_ and _ls_ worked as expected with the fix.
{code}
# hadoop-2.7.0-SNAPSHOT/bin/hdfs dfs -Dfs.s3a.access.key=ACCESS_KEY -Dfs.s3a.secret.key=SECRET_KEY -ls s3a://s3atest/
15/04/02 06:52:55 DEBUG s3a.S3AFileSystem: Getting path status for s3a://s3atest/ ()
15/04/02 06:52:55 DEBUG s3a.S3AFileSystem: s3a://s3atest/ is empty? true
15/04/02 06:52:55 DEBUG s3a.S3AFileSystem: List status for path: s3a://s3atest/
15/04/02 06:52:55 DEBUG s3a.S3AFileSystem: Getting path status for s3a://s3atest/ ()
15/04/02 06:52:55 DEBUG s3a.S3AFileSystem: s3a://s3atest/ is empty? true
15/04/02 06:52:55 DEBUG s3a.S3AFileSystem: listStatus: doing listObjects for directory

# hadoop-2.7.0-SNAPSHOT/bin/hdfs dfs -Dfs.s3a.access.key=ACCESS_KEY -Dfs.s3a.secret.key=SECRET_KEY -mkdir s3a://s3atest/root
15/04/02 06:53:20 DEBUG s3a.S3AFileSystem: Getting path status for s3a://s3atest/root (root)
15/04/02 06:53:20 DEBUG s3a.S3AFileSystem: Not Found: s3a://s3atest/root
15/04/02 06:53:20 DEBUG s3a.S3AFileSystem: Getting path status for s3a://s3atest/ ()
15/04/02 06:53:20 DEBUG s3a.S3AFileSystem: s3a://s3atest/ is empty? true
15/04/02 06:53:20 DEBUG s3a.S3AFileSystem: Making directory: s3a://s3atest/root
15/04/02 06:53:20 DEBUG s3a.S3AFileSystem: Getting path status for s3a://s3atest/root (root)
15/04/02 06:53:20 DEBUG s3a.S3AFileSystem: Not Found: s3a://s3atest/root
15/04/02 06:53:20 DEBUG s3a.S3AFileSystem: Getting path status for s3a://s3atest/root (root)
15/04/02 06:53:20 DEBUG s3a.S3AFileSystem: Not Found: s3a://s3atest/root
15/04/02 06:53:20 DEBUG s3a.S3AFileSystem: Getting path status for s3a://s3atest/ ()
15/04/02 06:53:20 DEBUG s3a.S3AFileSystem: s3a://s3atest/ is empty? true

# hadoop-2.7.0-SNAPSHOT/bin/hdfs dfs -Dfs.s3a.access.key=ACCESS_KEY -Dfs.s3a.secret.key=SECRET_KEY -ls s3a://s3atest/
15/04/02 06:53:26 DEBUG s3a.S3AFileSystem: Getting path status for s3a://s3atest/ ()
15/04/02 06:53:26 DEBUG s3a.S3AFileSystem: s3a://s3atest/ is empty? false
15/04/02 06:53:26 DEBUG s3a.S3AFileSystem: List status for path: s3a://s3atest/
15/04/02 06:53:26 DEBUG s3a.S3AFileSystem: Getting path status for s3a://s3atest/ ()
15/04/02 06:53:26 DEBUG s3a.S3AFileSystem: s3a://s3atest/ is empty? false
15/04/02 06:53:26 DEBUG s3a.S3AFileSystem: listStatus: doing listObjects for directory
15/04/02 06:53:26 DEBUG s3a.S3AFileSystem: Adding: rd: s3a://s3atest/root
Found 1 items
drwxrwxrwx   -          0 1970-01-01 00:00 s3a://s3atest/root
{code}
The created directory didn't become visible immediately, but the subsequent _ls_ showed it had been created.

> mkdir by file system shell fails on an empty bucket
> ----------------------------------------------------
>
> Key: HADOOP-11742
> URL: https://issues.apache.org/jira/browse/HADOOP-11742
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs/s3
> Affects Versions: 2.7.0
> Environment: CentOS 7
> Reporter: Takenori Sato
> Assignee: Takenori Sato
> Priority: Minor
> Attachments: HADOOP-11742-branch-2.7.001.patch, HADOOP-11742-branch-2.7.002.patch, HADOOP-11742-branch-2.7.003-1.patch, HADOOP-11742-branch-2.7.003-2.patch
>
> I have built the latest 2.7 and tried S3AFileSystem. I then found that _mkdir_ fails on an empty bucket, named *s3a* here, as follows:
> {code}
> # hadoop-2.7.0-SNAPSHOT/bin/hdfs dfs -mkdir s3a://s3a/foo
> 15/03/24 03:49:35 DEBUG s3a.S3AFileSystem: Getting path status for s3a://s3a/foo (foo)
> 15/03/24 03:49:36 DEBUG s3a.S3AFileSystem: Not Found: s3a://s3a/foo
> 15/03/24 03:49:36 DEBUG s3a.S3AFileSystem: Getting path status for s3a://s3a/ ()
> 15/03/24 03:49:36 DEBUG s3a.S3AFileSystem: Not Found: s3a://s3a/
> mkdir: `s3a://s3a/foo': No such file or directory
> {code}
> So does _ls_.
> {code}
> # hadoop-2.7.0-SNAPSHOT/bin/hdfs dfs -ls s3a://s3a/
> 15/03/24 03:47:48 DEBUG s3a.S3AFileSystem: Getting path status for s3a://s3a/ ()
> 15/03/24 03:47:48 DEBUG s3a.S3AFileSystem: Not Found: s3a://s3a/
> ls: `s3a://s3a/': No such file or directory
> {code}
> This is how it works via s3n.
> {code}
> # hadoop-2.7.0-SNAPSHOT/bin/hdfs dfs -ls s3n://s3n/
> # hadoop-2.7.0-SNAPSHOT/bin/hdfs dfs -mkdir s3n://s3n/foo
> # hadoop-2.7.0-SNAPSHOT/bin/hdfs dfs -ls s3n://s3n/
> Found 1 items
> drwxrwxrwx   -          0 1970-01-01 00:00 s3n://s3n/foo
> {code}
> The snapshot is the following:
> {quote}
> \# git branch
> \* branch-2.7
> trunk
> \# git log
> commit 929b04ce3a4fe419dece49ed68d4f6228be214c1
> Author: Harsh J
> Date: Sun Mar 22 10:18:32 2015 +0530
> {quote}
[jira] [Commented] (HADOOP-11742) mkdir by file system shell fails on an empty bucket
[ https://issues.apache.org/jira/browse/HADOOP-11742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14392231#comment-14392231 ] Takenori Sato commented on HADOOP-11742:
-----------------------------------------

Patches are verified as follows.

1. run TestS3AContractRootDir to see it succeeds
{code}
-------------------------------------------------------
 T E S T S
-------------------------------------------------------
Running org.apache.hadoop.fs.contract.s3a.TestS3AContractRootDir
Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 4.855 sec - in org.apache.hadoop.fs.contract.s3a.TestS3AContractRootDir

Results :

Tests run: 5, Failures: 0, Errors: 0, Skipped: 0

[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 10.341 s
[INFO] Finished at: 2015-04-02T05:41:48+00:00
[INFO] Final Memory: 28M/407M
[INFO] ------------------------------------------------------------------------
{code}

2. apply the test patch (003-2), and run TestS3AContractRootDir
{code}
-------------------------------------------------------
 T E S T S
-------------------------------------------------------
Running org.apache.hadoop.fs.contract.s3a.TestS3AContractRootDir
Tests run: 5, Failures: 0, Errors: 4, Skipped: 0, Time elapsed: 21.296 sec <<< FAILURE! - in org.apache.hadoop.fs.contract.s3a.TestS3AContractRootDir
testRmEmptyRootDirNonRecursive(org.apache.hadoop.fs.contract.s3a.TestS3AContractRootDir)  Time elapsed: 4.608 sec  <<< ERROR!
java.io.FileNotFoundException: No such file or directory: /
        at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:996)
        at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:77)
        at org.apache.hadoop.fs.contract.ContractTestUtils.assertIsDirectory(ContractTestUtils.java:464)
        at org.apache.hadoop.fs.contract.AbstractContractRootDirectoryTest.testRmEmptyRootDirNonRecursive(AbstractContractRootDirectoryTest.java:70)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
        at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
        at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
        at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
        at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
        at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
        at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)

testRmRootRecursive(org.apache.hadoop.fs.contract.s3a.TestS3AContractRootDir)  Time elapsed: 2.509 sec  <<< ERROR!
java.io.FileNotFoundException: No such file or directory: /
        at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:996)
        at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:77)
        at org.apache.hadoop.fs.contract.ContractTestUtils.assertIsDirectory(ContractTestUtils.java:464)
        at org.apache.hadoop.fs.contract.AbstractContractRootDirectoryTest.testRmRootRecursive(AbstractContractRootDirectoryTest.java:104)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
        at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
        at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
        at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
        at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
        at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
        at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)

testCreateFileOverRoot(org.apache.hadoop.fs.contract.s3a.TestS3AContractRootDir)  Time elapsed: 3.006 sec  <<< ERROR!
com.amazonaws.services.s3.model.AmazonS3Exception: Status Code: 400, AWS Service: Amazon S3, AWS Request ID: 2B352694A5577C62, AWS Error Code: MalformedXML, AWS Error Message: The XML you provided was not well-formed or did not validate against our published schema.
{code}
[jira] [Updated] (HADOOP-11742) mkdir by file system shell fails on an empty bucket
[ https://issues.apache.org/jira/browse/HADOOP-11742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takenori Sato updated HADOOP-11742:
-----------------------------------
    Attachment: HADOOP-11742-branch-2.7.003-2.patch

This is the patch to fix the unit test, _AbstractContractRootDirectoryTest_. Changes are:
# setup() prepares an empty root directory
# an assertion was added to make sure the root dir is empty in testRmEmptyRootDirNonRecursive()
# teardown() does nothing
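A rough sketch of the three changes, assuming a JUnit 4 contract test with a {{getFileSystem()}} accessor; the class and method bodies here are approximations of _AbstractContractRootDirectoryTest_, not the exact patch:
{code}
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.junit.After;
import org.junit.Before;
import org.junit.Test;
import static org.junit.Assert.assertEquals;

public abstract class RootDirContractSketch {

  protected abstract FileSystem getFileSystem();

  @Before
  public void setup() throws Exception {
    // 1. prepare an empty root: delete whatever an earlier case left behind
    FileSystem fs = getFileSystem();
    for (FileStatus status : fs.listStatus(new Path("/"))) {
      fs.delete(status.getPath(), true);
    }
  }

  @Test
  public void testRmEmptyRootDirNonRecursive() throws Exception {
    // 2. assert the root dir really is empty before exercising the delete
    assertEquals("root directory is not empty", 0,
        getFileSystem().listStatus(new Path("/")).length);
    // ... the original non-recursive delete assertions follow here ...
  }

  @After
  public void teardown() {
    // 3. intentionally a no-op: do not recreate a test directory
  }
}
{code}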
[jira] [Updated] (HADOOP-11742) mkdir by file system shell fails on an empty bucket
[ https://issues.apache.org/jira/browse/HADOOP-11742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takenori Sato updated HADOOP-11742:
-----------------------------------
    Attachment: HADOOP-11742-branch-2.7.003-1.patch

This is the patch to fix _S3AFileSystem#getFileStatus_. A dedicated branch to handle the root directory was added, which is entered only when key.isEmpty() == true.
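The idea, as a simplified sketch (the real change lives inside _S3AFileSystem#getFileStatus_; the two-argument signature and the plain FileStatus constructor here are illustrative):
{code}
import java.io.FileNotFoundException;
import java.io.IOException;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.Path;

class RootStatusSketch {
  FileStatus getFileStatus(Path f, String key) throws IOException {
    if (key.isEmpty()) {
      // The root of an existing bucket is always a directory, even when the
      // bucket holds no keys, so answer directly instead of probing S3 for
      // an object or prefix (the probe that logged "Not Found: s3a://s3a/").
      return new FileStatus(0L, true, 1, 0L, 0L, f);
    }
    // ... the existing object/prefix probes handle non-root keys ...
    throw new FileNotFoundException("No such file or directory: " + f);
  }
}
{code}
With this branch in place, an empty bucket's root reports as an (empty) directory, so shell _mkdir_ and _ls_ no longer fail while the bucket holds no keys.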
[jira] [Reopened] (HADOOP-11742) mkdir by file system shell fails on an empty bucket
[ https://issues.apache.org/jira/browse/HADOOP-11742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takenori Sato reopened HADOOP-11742:

I confirmed mkdir fails on an empty bucket against AWS as follows:

1. make sure the bucket is empty, but get an exception
{code}
# hadoop-2.7.0-SNAPSHOT/bin/hdfs dfs -Dfs.s3a.access.key=ACCESS_KEY -Dfs.s3a.secret.key=SECRET_KEY -ls s3a://s3atest/
15/04/02 01:49:09 DEBUG http.wire: >> "HEAD / HTTP/1.1[\r][\n]"
15/04/02 01:49:09 DEBUG http.wire: >> "Host: s3atest.s3.amazonaws.com[\r][\n]"
15/04/02 01:49:09 DEBUG http.wire: >> "Authorization: AWS XXX=[\r][\n]"
15/04/02 01:49:09 DEBUG http.wire: >> "Date: Thu, 02 Apr 2015 01:49:08 GMT[\r][\n]"
15/04/02 01:49:09 DEBUG http.wire: >> "User-Agent: aws-sdk-java/1.7.4 Linux/3.10.0-123.8.1.el7.centos.plus.x86_64 Java_HotSpot(TM)_64-Bit_Server_VM/24.75-b04/1.7.0_75[\r][\n]"
15/04/02 01:49:09 DEBUG http.wire: >> "Content-Type: application/x-www-form-urlencoded; charset=utf-8[\r][\n]"
15/04/02 01:49:09 DEBUG http.wire: >> "Connection: Keep-Alive[\r][\n]"
15/04/02 01:49:09 DEBUG http.wire: >> "[\r][\n]"
15/04/02 01:49:09 DEBUG http.wire: << "HTTP/1.1 200 OK[\r][\n]"
15/04/02 01:49:09 DEBUG http.wire: << "x-amz-id-2: XXX[\r][\n]"
15/04/02 01:49:09 DEBUG http.wire: << "x-amz-request-id: XXX[\r][\n]"
15/04/02 01:49:09 DEBUG http.wire: << "Date: Thu, 02 Apr 2015 01:49:10 GMT[\r][\n]"
15/04/02 01:49:09 DEBUG http.wire: << "Content-Type: application/xml[\r][\n]"
15/04/02 01:49:09 DEBUG http.wire: << "Transfer-Encoding: chunked[\r][\n]"
15/04/02 01:49:09 DEBUG http.wire: << "Server: AmazonS3[\r][\n]"
15/04/02 01:49:09 DEBUG http.wire: << "[\r][\n]"
15/04/02 01:49:09 DEBUG s3a.S3AFileSystem: Getting path status for s3a://s3atest/ ()
15/04/02 01:49:09 DEBUG http.wire: >> "GET /?delimiter=%2F&max-keys=1&prefix= HTTP/1.1[\r][\n]"
15/04/02 01:49:09 DEBUG http.wire: >> "Host: s3atest.s3.amazonaws.com[\r][\n]"
15/04/02 01:49:09 DEBUG http.wire: >> "Authorization: AWS XXX=[\r][\n]"
15/04/02 01:49:09 DEBUG http.wire: >> "Date: Thu, 02 Apr 2015 01:49:09 GMT[\r][\n]"
15/04/02 01:49:09 DEBUG http.wire: >> "User-Agent: aws-sdk-java/1.7.4 Linux/3.10.0-123.8.1.el7.centos.plus.x86_64 Java_HotSpot(TM)_64-Bit_Server_VM/24.75-b04/1.7.0_75[\r][\n]"
15/04/02 01:49:09 DEBUG http.wire: >> "Content-Type: application/x-www-form-urlencoded; charset=utf-8[\r][\n]"
15/04/02 01:49:09 DEBUG http.wire: >> "Connection: Keep-Alive[\r][\n]"
15/04/02 01:49:09 DEBUG http.wire: >> "[\r][\n]"
15/04/02 01:49:09 DEBUG http.wire: << "HTTP/1.1 200 OK[\r][\n]"
15/04/02 01:49:09 DEBUG http.wire: << "x-amz-id-2: XXX[\r][\n]"
15/04/02 01:49:09 DEBUG http.wire: << "x-amz-request-id: XXX[\r][\n]"
15/04/02 01:49:09 DEBUG http.wire: << "Date: Thu, 02 Apr 2015 01:49:10 GMT[\r][\n]"
15/04/02 01:49:09 DEBUG http.wire: << "Content-Type: application/xml[\r][\n]"
15/04/02 01:49:09 DEBUG http.wire: << "Transfer-Encoding: chunked[\r][\n]"
15/04/02 01:49:09 DEBUG http.wire: << "Server: AmazonS3[\r][\n]"
15/04/02 01:49:09 DEBUG http.wire: << "[\r][\n]"
15/04/02 01:49:09 DEBUG http.wire: << "fe[\r][\n]"
15/04/02 01:49:09 DEBUG http.wire: << "<?xml version="1.0" encoding="UTF-8"?>[\n]"
15/04/02 01:49:09 DEBUG http.wire: << "<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><Name>s3atest</Name><Prefix></Prefix><Marker></Marker><MaxKeys>1</MaxKeys><Delimiter>/</Delimiter><IsTruncated>false</IsTruncated></ListBucketResult>"
15/04/02 01:49:09 DEBUG http.wire: << "[\r][\n]"
15/04/02 01:49:09 DEBUG http.wire: << "0[\r][\n]"
15/04/02 01:49:09 DEBUG http.wire: << "[\r][\n]"
15/04/02 01:49:09 DEBUG s3a.S3AFileSystem: Not Found: s3a://s3atest/
ls: `s3a://s3atest/': No such file or directory
{code}

2. create a directory, but get an exception
{code}
# hadoop-2.7.0-SNAPSHOT/bin/hdfs dfs -Dfs.s3a.access.key=ACCESS_KEY -Dfs.s3a.secret.key=SECRET_KEY -mkdir s3a://s3atest/root
15/04/02 01:49:41 DEBUG http.wire: >> "HEAD / HTTP/1.1[\r][\n]"
15/04/02 01:49:41 DEBUG http.wire: >> "Host: s3atest.s3.amazonaws.com[\r][\n]"
15/04/02 01:49:41 DEBUG http.wire: >> "Authorization: AWS XXX=[\r][\n]"
15/04/02 01:49:41 DEBUG http.wire: >> "Date: Thu, 02 Apr 2015 01:49:41 GMT[\r][\n]"
15/04/02 01:49:41 DEBUG http.wire: >> "User-Agent: aws-sdk-java/1.7.4 Linux/3.10.0-123.8.1.el7.centos.plus.x86_64 Java_HotSpot(TM)_64-Bit_Server_VM/24.75-b04/1.7.0_75[\r][\n]"
15/04/02 01:49:41 DEBUG http.wire: >> "Content-Type: application/x-www-form-urlencoded; charset=utf-8[\r][\n]"
15/04/02 01:49:41 DEBUG http.wire: >> "Connection: Keep-Alive[\r][\n]"
15/04/02 01:49:41 DEBUG http.wire: >> "[\r][\n]"
15/04/02 01:49:41 DEBUG http.wire: << "HTTP/1.1 200 OK[\r][\n]"
15/04/02 01:49:41 DEBUG http.wire: << "x-amz-id-2: XXX[\r][\n]"
15/04/02 01:49:41 DEBUG http.wire: << "x-amz-request-id: XXX[\r][\n]"
15/04/02 01:49:41 DEBUG http.wire: << "Date: Thu, 02 Apr 2015 01:49:42 GMT[\r][\n]"
15/04/02 01:49:41 DEBUG http.wire: << "Content-Type: application/xml[\r][\n]"
15/04/02 01:49:41 DEBUG http.wire: << "Transfer-Encoding: chunked[\r][\n]"
15/04/02 01:49:41 DEBUG http.wire: << "Server: AmazonS3[\r][\n]"
15/04/02 01:49:41 DEBUG http.wire: << "[\r][\n]"
15/04
{code}
[jira] [Commented] (HADOOP-11753) TestS3AContractOpen#testOpenReadZeroByteFile fails due to negative range header
[ https://issues.apache.org/jira/browse/HADOOP-11753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386493#comment-14386493 ] Takenori Sato commented on HADOOP-11753:
-----------------------------------------

Thanks, it makes sense. I will discuss internally.

> TestS3AContractOpen#testOpenReadZeroByteFile fails due to negative range header
> --------------------------------------------------------------------------------
>
> Key: HADOOP-11753
> URL: https://issues.apache.org/jira/browse/HADOOP-11753
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs/s3
> Affects Versions: 3.0.0, 2.7.0
> Reporter: Takenori Sato
> Assignee: Takenori Sato
> Attachments: HADOOP-11753-branch-2.7.001.patch
>
> _TestS3AContractOpen#testOpenReadZeroByteFile_ fails as follows.
> {code}
> testOpenReadZeroByteFile(org.apache.hadoop.fs.contract.s3a.TestS3AContractOpen)  Time elapsed: 3.312 sec  <<< ERROR!
> com.amazonaws.services.s3.model.AmazonS3Exception: Status Code: 416, AWS Service: Amazon S3, AWS Request ID: A58A95E0D36811E4, AWS Error Code: InvalidRange, AWS Error Message: The requested range cannot be satisfied.
>         at com.amazonaws.http.AmazonHttpClient.handleErrorResponse(AmazonHttpClient.java:798)
>         at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:421)
>         at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:232)
>         at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3528)
>         at com.amazonaws.services.s3.AmazonS3Client.getObject(AmazonS3Client.java:)
>         at org.apache.hadoop.fs.s3a.S3AInputStream.reopen(S3AInputStream.java:91)
>         at org.apache.hadoop.fs.s3a.S3AInputStream.openIfNeeded(S3AInputStream.java:62)
>         at org.apache.hadoop.fs.s3a.S3AInputStream.read(S3AInputStream.java:127)
>         at java.io.FilterInputStream.read(FilterInputStream.java:83)
>         at org.apache.hadoop.fs.contract.AbstractContractOpenTest.testOpenReadZeroByteFile(AbstractContractOpenTest.java:66)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>         at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>         at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>         at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>         at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>         at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>         at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
> {code}
> This is because the Range header is wrong when calling _S3AInputStream#read_ after _S3AInputStream#open_:
> {code}
> Range: bytes=0--1
> * from 0 to -1
> {code}
> Tested on the latest branch-2.7.
> {quote}
> $ git log
> commit d286673c602524af08935ea132c8afd181b6e2e4
> Author: Jitendra Pandey
> Date: Tue Mar 24 16:17:06 2015 -0700
> {quote}
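For anyone skimming, the failure mode can be reproduced in isolation with the AWS SDK: for a zero-byte object, contentLength is 0, so an unconditional range of [pos, contentLength - 1] asks for bytes 0 through -1, which S3 rejects with 416 InvalidRange. The helper below is illustrative, not the Hadoop source:
{code}
import com.amazonaws.services.s3.model.GetObjectRequest;

class NegativeRangeDemo {
  // Builds the kind of request the bug produces (names are hypothetical).
  static GetObjectRequest buggyReopenRequest(String bucket, String key,
                                             long pos, long contentLength) {
    GetObjectRequest request = new GetObjectRequest(bucket, key);
    request.setRange(pos, contentLength - 1); // end becomes -1 for an empty file
    return request;                           // -> "Range: bytes=0--1"
  }
}
{code}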
[jira] [Reopened] (HADOOP-11742) mkdir by file system shell fails on an empty bucket
[ https://issues.apache.org/jira/browse/HADOOP-11742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takenori Sato reopened HADOOP-11742:

Reopen to mark this as invalid.
[jira] [Resolved] (HADOOP-11742) mkdir by file system shell fails on an empty bucket
[ https://issues.apache.org/jira/browse/HADOOP-11742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takenori Sato resolved HADOOP-11742.
    Resolution: Invalid
[jira] [Commented] (HADOOP-11742) mkdir by file system shell fails on an empty bucket
[ https://issues.apache.org/jira/browse/HADOOP-11742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386172#comment-14386172 ] Takenori Sato commented on HADOOP-11742:
-----------------------------------------

Thomas, Steve, yes, again this is against our own store (Cloudian), not AWS S3 itself. I will check the difference. Let me close.
[jira] [Updated] (HADOOP-11742) mkdir by file system shell fails on an empty bucket
[ https://issues.apache.org/jira/browse/HADOOP-11742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takenori Sato updated HADOOP-11742:
-----------------------------------
    Resolution: Fixed
    Assignee: Takenori Sato
    Status: Resolved  (was: Patch Available)
[jira] [Resolved] (HADOOP-11753) TestS3AContractOpen#testOpenReadZeroByteFile fails due to negative range header
[ https://issues.apache.org/jira/browse/HADOOP-11753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takenori Sato resolved HADOOP-11753.
    Resolution: Invalid
[jira] [Commented] (HADOOP-11753) TestS3AContractOpen#testOpenReadZeroByteFile fails due to negative range header
[ https://issues.apache.org/jira/browse/HADOOP-11753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386168#comment-14386168 ] Takenori Sato commented on HADOOP-11753:
-----------------------------------------

Thanks for the clarification. Yes, this is against Cloudian. So let me close. Will check AWS as well for further tests.
[jira] [Updated] (HADOOP-11742) mkdir by file system shell fails on an empty bucket
[ https://issues.apache.org/jira/browse/HADOOP-11742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takenori Sato updated HADOOP-11742:
-----------------------------------
    Attachment: HADOOP-11742-branch-2.7.002.patch

I found _AbstractFSContractTestBase#setup_ always creates a test directory, which is removed at _teardown_. Thus, an empty root directory was never actually exercised by the concrete test cases.

The problem is not the _mkdir_ call on an empty bucket as such: it is that _S3AFileSystem#getFileStatus("/")_ throws an exception when the bucket is empty. To set up such a condition, I chose instead to remove the test directory at setup and make teardown a no-op.

Then, without this fix, TestS3AContractRootDir failed as follows.
{code}
-------------------------------------------------------
 T E S T S
-------------------------------------------------------
Running org.apache.hadoop.fs.contract.s3a.TestS3AContractRootDir
Tests run: 5, Failures: 0, Errors: 4, Skipped: 0, Time elapsed: 8.027 sec <<< FAILURE! - in org.apache.hadoop.fs.contract.s3a.TestS3AContractRootDir
testRmEmptyRootDirNonRecursive(org.apache.hadoop.fs.contract.s3a.TestS3AContractRootDir)  Time elapsed: 2.82 sec  <<< ERROR!
java.io.FileNotFoundException: No such file or directory: /
        at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:995)
        at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:77)
        at org.apache.hadoop.fs.contract.ContractTestUtils.assertIsDirectory(ContractTestUtils.java:464)
        at org.apache.hadoop.fs.contract.AbstractContractRootDirectoryTest.testRmEmptyRootDirNonRecursive(AbstractContractRootDirectoryTest.java:63)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
        at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
        at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
        at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
        at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
        at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
        at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)

testRmRootRecursive(org.apache.hadoop.fs.contract.s3a.TestS3AContractRootDir)  Time elapsed: 0.475 sec  <<< ERROR!
java.io.FileNotFoundException: No such file or directory: /
        at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:995)
        at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:77)
        at org.apache.hadoop.fs.contract.ContractTestUtils.assertIsDirectory(ContractTestUtils.java:464)
        at org.apache.hadoop.fs.contract.AbstractContractRootDirectoryTest.testRmRootRecursive(AbstractContractRootDirectoryTest.java:96)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
        at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
        at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
        at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
        at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
        at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
        at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)

testCreateFileOverRoot(org.apache.hadoop.fs.contract.s3a.TestS3AContractRootDir)  Time elapsed: 2.922 sec  <<< ERROR!
com.amazonaws.services.s3.model.AmazonS3Exception: Status Code: 400, AWS Service: Amazon S3, AWS Request ID: 368CF290D38711E4, AWS Error Code: MalformedXML, AWS Error Message: The XML you provided was not well-formed or did not validate against our published schema.
        at com.amazonaws.http.AmazonHttpClient.handleErrorResponse(AmazonHttpClient.java:798)
        at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:421)
        at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:232)
        at com.amazonaws.services.s3.AmazonS3Cl
{code}
[jira] [Updated] (HADOOP-11753) TestS3AContractOpen#testOpenReadZeroByteFile fails due to negative range header
[ https://issues.apache.org/jira/browse/HADOOP-11753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takenori Sato updated HADOOP-11753:
-----------------------------------
    Attachment: HADOOP-11753-branch-2.7.001.patch

Set the Range header only when contentLength > 0.
{code}
-------------------------------------------------------
 T E S T S
-------------------------------------------------------
Running org.apache.hadoop.fs.contract.s3a.TestS3AContractCreate
Tests run: 6, Failures: 0, Errors: 0, Skipped: 3, Time elapsed: 19.821 sec - in org.apache.hadoop.fs.contract.s3a.TestS3AContractCreate
Running org.apache.hadoop.fs.contract.s3a.TestS3AContractDelete
Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 15.186 sec - in org.apache.hadoop.fs.contract.s3a.TestS3AContractDelete
Running org.apache.hadoop.fs.contract.s3a.TestS3AContractMkdir
Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 10.563 sec - in org.apache.hadoop.fs.contract.s3a.TestS3AContractMkdir
Running org.apache.hadoop.fs.contract.s3a.TestS3AContractOpen
Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 10.412 sec - in org.apache.hadoop.fs.contract.s3a.TestS3AContractOpen
Running org.apache.hadoop.fs.contract.s3a.TestS3AContractRename
Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 25.687 sec - in org.apache.hadoop.fs.contract.s3a.TestS3AContractRename
Running org.apache.hadoop.fs.contract.s3a.TestS3AContractRootDir
Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 5.29 sec - in org.apache.hadoop.fs.contract.s3a.TestS3AContractRootDir
Running org.apache.hadoop.fs.contract.s3a.TestS3AContractSeek
Tests run: 10, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 36.943 sec - in org.apache.hadoop.fs.contract.s3a.TestS3AContractSeek
Running org.apache.hadoop.fs.contract.s3n.TestS3NContractCreate
Tests run: 6, Failures: 0, Errors: 0, Skipped: 3, Time elapsed: 16.791 sec - in org.apache.hadoop.fs.contract.s3n.TestS3NContractCreate
Running org.apache.hadoop.fs.contract.s3n.TestS3NContractDelete
Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 12.891 sec - in org.apache.hadoop.fs.contract.s3n.TestS3NContractDelete
Running org.apache.hadoop.fs.contract.s3n.TestS3NContractMkdir
Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 9.791 sec - in org.apache.hadoop.fs.contract.s3n.TestS3NContractMkdir
Running org.apache.hadoop.fs.contract.s3n.TestS3NContractOpen
Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 8.736 sec - in org.apache.hadoop.fs.contract.s3n.TestS3NContractOpen
Running org.apache.hadoop.fs.contract.s3n.TestS3NContractRename
Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 22.308 sec - in org.apache.hadoop.fs.contract.s3n.TestS3NContractRename
Running org.apache.hadoop.fs.contract.s3n.TestS3NContractRootDir
Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 3.716 sec - in org.apache.hadoop.fs.contract.s3n.TestS3NContractRootDir
Running org.apache.hadoop.fs.contract.s3n.TestS3NContractSeek
Tests run: 10, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 33.433 sec - in org.apache.hadoop.fs.contract.s3n.TestS3NContractSeek
Running org.apache.hadoop.fs.s3.TestInMemoryS3FileSystemContract
Tests run: 31, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 2.641 sec - in org.apache.hadoop.fs.s3.TestInMemoryS3FileSystemContract
Running org.apache.hadoop.fs.s3.TestINode
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.127 sec - in org.apache.hadoop.fs.s3.TestINode
Running org.apache.hadoop.fs.s3.TestS3Credentials
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.36 sec - in org.apache.hadoop.fs.s3.TestS3Credentials
Running org.apache.hadoop.fs.s3.TestS3FileSystem
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.362 sec - in org.apache.hadoop.fs.s3.TestS3FileSystem
Running org.apache.hadoop.fs.s3.TestS3InMemoryFileSystem
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.754 sec - in org.apache.hadoop.fs.s3.TestS3InMemoryFileSystem
Running org.apache.hadoop.fs.s3a.scale.TestS3ADeleteManyFiles
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 500.943 sec - in org.apache.hadoop.fs.s3a.scale.TestS3ADeleteManyFiles
Running org.apache.hadoop.fs.s3a.TestS3ABlocksize
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 6.25 sec - in org.apache.hadoop.fs.s3a.TestS3ABlocksize
Running org.apache.hadoop.fs.s3a.TestS3AConfiguration
Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 4.334 sec - in org.apache.hadoop.fs.s3a.TestS3AConfiguration
Running org.apache.hadoop.fs.s3a.TestS3AFastOutputStream
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 9.867 sec - in org.apache.hadoop.fs.s3a.TestS3AFastOutputStream
Running org.apache.hadoop.fs.s3a.TestS3AFileSystemContract
Tests run: 31, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 79.965 sec - in org.apache.ha
{code}
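A simplified sketch of the guard this patch describes ("set Range header only when contentLength > 0"), shaped after _S3AInputStream#reopen_; the method and parameter names here are approximations, not the patch itself:
{code}
import com.amazonaws.services.s3.model.GetObjectRequest;

class GuardedRangeSketch {
  static GetObjectRequest reopenRequest(String bucket, String key,
                                        long pos, long contentLength) {
    GetObjectRequest request = new GetObjectRequest(bucket, key);
    if (contentLength > 0) {
      // Only a non-empty object can satisfy a byte range; for a zero-byte
      // file a plain GET returns the (empty) body without a Range header.
      request.setRange(pos, contentLength - 1);
    }
    return request;
  }
}
{code}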
[jira] [Commented] (HADOOP-11753) TestS3AContractOpen#testOpenReadZeroByteFile fails due to negative range header
[ https://issues.apache.org/jira/browse/HADOOP-11753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14381384#comment-14381384 ] Takenori Sato commented on HADOOP-11753:
-----------------------------------------

Hi, OK, thanks. I was about to start. So I leave this to you.
[jira] [Commented] (HADOOP-11742) mkdir by file system shell fails on an empty bucket
[ https://issues.apache.org/jira/browse/HADOOP-11742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14381351#comment-14381351 ] Takenori Sato commented on HADOOP-11742:
-----------------------------------------

OK, will do. But I found that the current s3a-related unit tests don't finish successfully. Filed as HADOOP-11753.
[jira] [Commented] (HADOOP-11753) TestS3AContractOpen#testOpenReadZeroByteFile fails due to negative range header
[ https://issues.apache.org/jira/browse/HADOOP-11753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14381343#comment-14381343 ] Takenori Sato commented on HADOOP-11753:
-----------------------------------------

Another one:
{code}
testSeekZeroByteFile(org.apache.hadoop.fs.contract.s3a.TestS3AContractSeek)  Time elapsed: 9.478 sec  <<< ERROR!
com.amazonaws.services.s3.model.AmazonS3Exception: Status Code: 416, AWS Service: Amazon S3, AWS Request ID: 29E6B1A0D37011E4, AWS Error Code: InvalidRange, AWS Error Message: The requested range cannot be satisfied.
        at com.amazonaws.http.AmazonHttpClient.handleErrorResponse(AmazonHttpClient.java:798)
        at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:421)
        at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:232)
        at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3528)
        at com.amazonaws.services.s3.AmazonS3Client.getObject(AmazonS3Client.java:)
        at org.apache.hadoop.fs.s3a.S3AInputStream.reopen(S3AInputStream.java:91)
        at org.apache.hadoop.fs.s3a.S3AInputStream.openIfNeeded(S3AInputStream.java:62)
        at org.apache.hadoop.fs.s3a.S3AInputStream.read(S3AInputStream.java:127)
        at java.io.FilterInputStream.read(FilterInputStream.java:83)
        at org.apache.hadoop.fs.contract.AbstractContractSeekTest.testSeekZeroByteFile(AbstractContractSeekTest.java:88)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
        at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
        at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
        at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
        at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
        at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
        at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
{code}
[jira] [Commented] (HADOOP-11753) TestS3AContractOpen#testOpenReadZeroByteFile fails due to negative range header
[ https://issues.apache.org/jira/browse/HADOOP-11753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14381338#comment-14381338 ] Takenori Sato commented on HADOOP-11753: _TestS3AContractSeek#testBlockReadZeroByteFile_ fails for the same reason, too. > TestS3AContractOpen#testOpenReadZeroByteFile fails due to negative range > header > --- > > Key: HADOOP-11753 > URL: https://issues.apache.org/jira/browse/HADOOP-11753 > Project: Hadoop Common > Issue Type: Bug > Components: fs/s3 >Reporter: Takenori Sato > > _TestS3AContractOpen#testOpenReadZeroByteFile_ fails as follows. > {code} > testOpenReadZeroByteFile(org.apache.hadoop.fs.contract.s3a.TestS3AContractOpen) > Time elapsed: 3.312 sec <<< ERROR! > com.amazonaws.services.s3.model.AmazonS3Exception: Status Code: 416, AWS > Service: Amazon S3, AWS Request ID: A58A95E0D36811E4, AWS Error Code: > InvalidRange, AWS Error Message: The requested range cannot be satisfied. > at > com.amazonaws.http.AmazonHttpClient.handleErrorResponse(AmazonHttpClient.java:798) > at > com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:421) > at > com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:232) > at > com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3528) > at > com.amazonaws.services.s3.AmazonS3Client.getObject(AmazonS3Client.java:) > at > org.apache.hadoop.fs.s3a.S3AInputStream.reopen(S3AInputStream.java:91) > at > org.apache.hadoop.fs.s3a.S3AInputStream.openIfNeeded(S3AInputStream.java:62) > at org.apache.hadoop.fs.s3a.S3AInputStream.read(S3AInputStream.java:127) > at java.io.FilterInputStream.read(FilterInputStream.java:83) > at > org.apache.hadoop.fs.contract.AbstractContractOpenTest.testOpenReadZeroByteFile(AbstractContractOpenTest.java:66) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > at > org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74) > {code} > This is because the header is wrong when calling _S3AInputStream#read_ after > _S3AInputStream#open_. > {code} > Range: bytes=0--1 > * from 0 to -1 > {code} > Tested on the latest branch-2.7. > {quote} > $ git log > commit d286673c602524af08935ea132c8afd181b6e2e4 > Author: Jitendra Pandey > Date: Tue Mar 24 16:17:06 2015 -0700 > {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HADOOP-11753) TestS3AContractOpen#testOpenReadZeroByteFile fails due to negative range header
[ https://issues.apache.org/jira/browse/HADOOP-11753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takenori Sato updated HADOOP-11753: --- Summary: TestS3AContractOpen#testOpenReadZeroByteFile fails due to negative range header (was: TestS3AContractOpen#testOpenReadZeroByteFile fals due to negative range header) > TestS3AContractOpen#testOpenReadZeroByteFile fails due to negative range > header > --- > > Key: HADOOP-11753 > URL: https://issues.apache.org/jira/browse/HADOOP-11753 > Project: Hadoop Common > Issue Type: Bug > Components: fs/s3 >Reporter: Takenori Sato > > _TestS3AContractOpen#testOpenReadZeroByteFile_ fails as follows. > {code} > testOpenReadZeroByteFile(org.apache.hadoop.fs.contract.s3a.TestS3AContractOpen) > Time elapsed: 3.312 sec <<< ERROR! > com.amazonaws.services.s3.model.AmazonS3Exception: Status Code: 416, AWS > Service: Amazon S3, AWS Request ID: A58A95E0D36811E4, AWS Error Code: > InvalidRange, AWS Error Message: The requested range cannot be satisfied. > at > com.amazonaws.http.AmazonHttpClient.handleErrorResponse(AmazonHttpClient.java:798) > at > com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:421) > at > com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:232) > at > com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3528) > at > com.amazonaws.services.s3.AmazonS3Client.getObject(AmazonS3Client.java:) > at > org.apache.hadoop.fs.s3a.S3AInputStream.reopen(S3AInputStream.java:91) > at > org.apache.hadoop.fs.s3a.S3AInputStream.openIfNeeded(S3AInputStream.java:62) > at org.apache.hadoop.fs.s3a.S3AInputStream.read(S3AInputStream.java:127) > at java.io.FilterInputStream.read(FilterInputStream.java:83) > at > org.apache.hadoop.fs.contract.AbstractContractOpenTest.testOpenReadZeroByteFile(AbstractContractOpenTest.java:66) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > at > org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74) > {code} > This is because the header is wrong when calling _S3AInputStream#read_ after > _S3AInputStream#open_. > {code} > Range: bytes=0--1 > * from 0 to -1 > {code} > Tested on the latest branch-2.7. > {quote} > $ git log > commit d286673c602524af08935ea132c8afd181b6e2e4 > Author: Jitendra Pandey > Date: Tue Mar 24 16:17:06 2015 -0700 > {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HADOOP-11753) TestS3AContractOpen#testOpenReadZeroByteFile fals due to negative range header
Takenori Sato created HADOOP-11753: -- Summary: TestS3AContractOpen#testOpenReadZeroByteFile fals due to negative range header Key: HADOOP-11753 URL: https://issues.apache.org/jira/browse/HADOOP-11753 Project: Hadoop Common Issue Type: Bug Components: fs/s3 Reporter: Takenori Sato _TestS3AContractOpen#testOpenReadZeroByteFile_ fails as follows. {code} testOpenReadZeroByteFile(org.apache.hadoop.fs.contract.s3a.TestS3AContractOpen) Time elapsed: 3.312 sec <<< ERROR! com.amazonaws.services.s3.model.AmazonS3Exception: Status Code: 416, AWS Service: Amazon S3, AWS Request ID: A58A95E0D36811E4, AWS Error Code: InvalidRange, AWS Error Message: The requested range cannot be satisfied. at com.amazonaws.http.AmazonHttpClient.handleErrorResponse(AmazonHttpClient.java:798) at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:421) at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:232) at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3528) at com.amazonaws.services.s3.AmazonS3Client.getObject(AmazonS3Client.java:) at org.apache.hadoop.fs.s3a.S3AInputStream.reopen(S3AInputStream.java:91) at org.apache.hadoop.fs.s3a.S3AInputStream.openIfNeeded(S3AInputStream.java:62) at org.apache.hadoop.fs.s3a.S3AInputStream.read(S3AInputStream.java:127) at java.io.FilterInputStream.read(FilterInputStream.java:83) at org.apache.hadoop.fs.contract.AbstractContractOpenTest.testOpenReadZeroByteFile(AbstractContractOpenTest.java:66) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74) {code} This is because the header is wrong when calling _S3AInputStream#read_ after _S3AInputStream#open_. {code} Range: bytes=0--1 * from 0 to -1 {code} Tested on the latest branch-2.7. {quote} $ git log commit d286673c602524af08935ea132c8afd181b6e2e4 Author: Jitendra Pandey Date: Tue Mar 24 16:17:06 2015 -0700 {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
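For context on the failure mode: with a zero-byte object, contentLength is 0, so an inclusive byte range from the current position to contentLength - 1 produces the invalid header Range: bytes=0--1 quoted above, which S3 rejects with 416 InvalidRange. A minimal sketch of a guard against issuing that ranged GET — the class and method names here are illustrative, not the actual S3AInputStream internals:
{code}
import java.io.ByteArrayInputStream;
import java.io.InputStream;

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.model.GetObjectRequest;

// Sketch only: illustrative names, not the actual S3AInputStream code.
class RangedOpenSketch {
  static InputStream open(AmazonS3 client, String bucket, String key,
      long pos, long contentLength) {
    // For a zero-byte object, pos = 0 and contentLength - 1 = -1 would
    // yield "Range: bytes=0--1", which S3 rejects with 416 InvalidRange.
    // Return an empty stream so read() reports EOF instead.
    if (contentLength == 0 || pos >= contentLength) {
      return new ByteArrayInputStream(new byte[0]);
    }
    GetObjectRequest request = new GetObjectRequest(bucket, key)
        .withRange(pos, contentLength - 1); // inclusive end offset
    return client.getObject(request).getObjectContent();
  }
}
{code}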
[jira] [Commented] (HADOOP-11742) mkdir by file system shell fails on an empty bucket
[ https://issues.apache.org/jira/browse/HADOOP-11742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14379380#comment-14379380 ] Takenori Sato commented on HADOOP-11742: I checked how this is covered in test cases. _NativeS3FileSystemContractBaseTest#testListStatusForRoot_ looks like a relevant test for s3n, but I could not find anything similar for s3a. _TestS3AContractRootDir_ is supposed to cover this scenario, correct? > mkdir by file system shell fails on an empty bucket > --- > > Key: HADOOP-11742 > URL: https://issues.apache.org/jira/browse/HADOOP-11742 > Project: Hadoop Common > Issue Type: Bug > Components: fs/s3 > Environment: CentOS 7 >Reporter: Takenori Sato > Attachments: HADOOP-11742-branch-2.7.001.patch > > > I have built the latest 2.7, and tried S3AFileSystem. > Then found that _mkdir_ fails on an empty bucket, named *s3a* here, as > follows: > {code} > # hadoop-2.7.0-SNAPSHOT/bin/hdfs dfs -mkdir s3a://s3a/foo > 15/03/24 03:49:35 DEBUG s3a.S3AFileSystem: Getting path status for > s3a://s3a/foo (foo) > 15/03/24 03:49:36 DEBUG s3a.S3AFileSystem: Not Found: s3a://s3a/foo > 15/03/24 03:49:36 DEBUG s3a.S3AFileSystem: Getting path status for s3a://s3a/ > () > 15/03/24 03:49:36 DEBUG s3a.S3AFileSystem: Not Found: s3a://s3a/ > mkdir: `s3a://s3a/foo': No such file or directory > {code} > So does _ls_. > {code} > # hadoop-2.7.0-SNAPSHOT/bin/hdfs dfs -ls s3a://s3a/ > 15/03/24 03:47:48 DEBUG s3a.S3AFileSystem: Getting path status for s3a://s3a/ > () > 15/03/24 03:47:48 DEBUG s3a.S3AFileSystem: Not Found: s3a://s3a/ > ls: `s3a://s3a/': No such file or directory > {code} > This is how it works via s3n. > {code} > # hadoop-2.7.0-SNAPSHOT/bin/hdfs dfs -ls s3n://s3n/ > # hadoop-2.7.0-SNAPSHOT/bin/hdfs dfs -mkdir s3n://s3n/foo > # hadoop-2.7.0-SNAPSHOT/bin/hdfs dfs -ls s3n://s3n/ > Found 1 items > drwxrwxrwx - 0 1970-01-01 00:00 s3n://s3n/foo > {code} > The snapshot is the following: > {quote} > \# git branch > \* branch-2.7 > trunk > \# git log > commit 929b04ce3a4fe419dece49ed68d4f6228be214c1 > Author: Harsh J > Date: Sun Mar 22 10:18:32 2015 +0530 > {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
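For reference, the kind of assertion _testListStatusForRoot_ makes could look roughly like the following against the FileSystem API — a hedged sketch, not the actual body of either test class:
{code}
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch of the root-listing assertion such a test would make; the
// FileSystem is assumed to be bound to an s3a:// URI by the test harness.
class RootDirListingSketch {
  static void assertEmptyRootListable(FileSystem fs) throws Exception {
    // On an empty bucket this must succeed with an empty listing,
    // not fail with "No such file or directory" as reported here.
    FileStatus[] statuses = fs.listStatus(new Path("/"));
    if (statuses.length != 0) {
      throw new AssertionError("expected an empty root listing");
    }
  }
}
{code}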
[jira] [Updated] (HADOOP-11742) mkdir by file system shell fails on an empty bucket
[ https://issues.apache.org/jira/browse/HADOOP-11742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takenori Sato updated HADOOP-11742: --- Attachment: HADOOP-11742-branch-2.7.001.patch An empty key means a root directory instead of "Not Found". This is the same behavior as _NativeS3FileSystem#getFileStatus_. {code} # hadoop-2.7.0-SNAPSHOT/bin/hdfs dfs -ls s3a://s3a/ 15/03/25 06:28:05 DEBUG s3a.S3AFileSystem: Getting path status for s3a://s3a/ () 15/03/25 06:28:05 DEBUG s3a.S3AFileSystem: List status for path: s3a://s3a/ 15/03/25 06:28:05 DEBUG s3a.S3AFileSystem: Getting path status for s3a://s3a/ () 15/03/25 06:28:05 DEBUG s3a.S3AFileSystem: listStatus: doing listObjects for directory # hadoop-2.7.0-SNAPSHOT/bin/hdfs dfs -mkdir s3a://s3a/foo 15/03/25 06:28:22 DEBUG s3a.S3AFileSystem: Getting path status for s3a://s3a/foo (foo) 15/03/25 06:28:23 DEBUG s3a.S3AFileSystem: Not Found: s3a://s3a/foo 15/03/25 06:28:23 DEBUG s3a.S3AFileSystem: Getting path status for s3a://s3a/ () 15/03/25 06:28:23 DEBUG s3a.S3AFileSystem: Making directory: s3a://s3a/foo 15/03/25 06:28:23 DEBUG s3a.S3AFileSystem: Getting path status for s3a://s3a/foo (foo) 15/03/25 06:28:23 DEBUG s3a.S3AFileSystem: Not Found: s3a://s3a/foo 15/03/25 06:28:23 DEBUG s3a.S3AFileSystem: Getting path status for s3a://s3a/foo (foo) 15/03/25 06:28:24 DEBUG s3a.S3AFileSystem: Not Found: s3a://s3a/foo 15/03/25 06:28:24 DEBUG s3a.S3AFileSystem: Getting path status for s3a://s3a/ () # hadoop-2.7.0-SNAPSHOT/bin/hdfs dfs -ls s3a://s3a/ 15/03/25 06:28:31 DEBUG s3a.S3AFileSystem: Getting path status for s3a://s3a/ () 15/03/25 06:28:31 DEBUG s3a.S3AFileSystem: List status for path: s3a://s3a/ 15/03/25 06:28:31 DEBUG s3a.S3AFileSystem: Getting path status for s3a://s3a/ () 15/03/25 06:28:31 DEBUG s3a.S3AFileSystem: listStatus: doing listObjects for directory 15/03/25 06:28:31 DEBUG s3a.S3AFileSystem: Adding: rd: s3a://s3a/foo Found 1 items drwxrwxrwx - 0 1970-01-01 00:00 s3a://s3a/foo {code} > mkdir by file system shell fails on an empty bucket > --- > > Key: HADOOP-11742 > URL: https://issues.apache.org/jira/browse/HADOOP-11742 > Project: Hadoop Common > Issue Type: Bug > Components: fs/s3 > Environment: CentOS 7 >Reporter: Takenori Sato > Attachments: HADOOP-11742-branch-2.7.001.patch > > > I have built the latest 2.7, and tried S3AFileSystem. > Then found that _mkdir_ fails on an empty bucket, named *s3a* here, as > follows: > {code} > # hadoop-2.7.0-SNAPSHOT/bin/hdfs dfs -mkdir s3a://s3a/foo > 15/03/24 03:49:35 DEBUG s3a.S3AFileSystem: Getting path status for > s3a://s3a/foo (foo) > 15/03/24 03:49:36 DEBUG s3a.S3AFileSystem: Not Found: s3a://s3a/foo > 15/03/24 03:49:36 DEBUG s3a.S3AFileSystem: Getting path status for s3a://s3a/ > () > 15/03/24 03:49:36 DEBUG s3a.S3AFileSystem: Not Found: s3a://s3a/ > mkdir: `s3a://s3a/foo': No such file or directory > {code} > So does _ls_. > {code} > # hadoop-2.7.0-SNAPSHOT/bin/hdfs dfs -ls s3a://s3a/ > 15/03/24 03:47:48 DEBUG s3a.S3AFileSystem: Getting path status for s3a://s3a/ > () > 15/03/24 03:47:48 DEBUG s3a.S3AFileSystem: Not Found: s3a://s3a/ > ls: `s3a://s3a/': No such file or directory > {code} > This is how it works via s3n. 
> {code} > # hadoop-2.7.0-SNAPSHOT/bin/hdfs dfs -ls s3n://s3n/ > # hadoop-2.7.0-SNAPSHOT/bin/hdfs dfs -mkdir s3n://s3n/foo > # hadoop-2.7.0-SNAPSHOT/bin/hdfs dfs -ls s3n://s3n/ > Found 1 items > drwxrwxrwx - 0 1970-01-01 00:00 s3n://s3n/foo > {code} > The snapshot is the following: > {quote} > \# git branch > \* branch-2.7 > trunk > \# git log > commit 929b04ce3a4fe419dece49ed68d4f6228be214c1 > Author: Harsh J > Date: Sun Mar 22 10:18:32 2015 +0530 > {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
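The essence of the attached patch, as described in the comment above, is a short circuit in _getFileStatus_: an empty key denotes the bucket itself, so it should resolve to a root directory status rather than fall through to "Not Found". A rough sketch of that shape — illustrative, not the literal patch:
{code}
import java.io.FileNotFoundException;
import java.io.IOException;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.Path;

// Illustrative sketch of the empty-key short circuit, not the literal patch.
class EmptyKeySketch {
  static FileStatus statusFor(String key, Path qualifiedPath)
      throws IOException {
    if (key.isEmpty()) {
      // An empty key means the bucket root: always report an existing
      // directory (mirroring NativeS3FileSystem#getFileStatus) instead
      // of falling through to "Not Found". Zero modification time
      // matches the "1970-01-01 00:00" shown in the listings above.
      return new FileStatus(0L, true, 1, 0L, 0L, qualifiedPath);
    }
    // The real implementation probes S3 for an object or a directory
    // marker here; a genuinely missing key still fails as before.
    throw new FileNotFoundException("No such file or directory: " + key);
  }
}
{code}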
[jira] [Created] (HADOOP-11742) mkdir by file system shell fails on an empty bucket
Takenori Sato created HADOOP-11742: -- Summary: mkdir by file system shell fails on an empty bucket Key: HADOOP-11742 URL: https://issues.apache.org/jira/browse/HADOOP-11742 Project: Hadoop Common Issue Type: Bug Components: fs/s3 Environment: CentOS 7 Reporter: Takenori Sato I have built the latest 2.7, and tried S3AFileSystem. Then found that _mkdir_ fails on an empty bucket, named *s3a* here, as follows: {code} # hadoop-2.7.0-SNAPSHOT/bin/hdfs dfs -mkdir s3a://s3a/foo 15/03/24 03:49:35 DEBUG s3a.S3AFileSystem: Getting path status for s3a://s3a/foo (foo) 15/03/24 03:49:36 DEBUG s3a.S3AFileSystem: Not Found: s3a://s3a/foo 15/03/24 03:49:36 DEBUG s3a.S3AFileSystem: Getting path status for s3a://s3a/ () 15/03/24 03:49:36 DEBUG s3a.S3AFileSystem: Not Found: s3a://s3a/ mkdir: `s3a://s3a/foo': No such file or directory {code} So does _ls_. {code} # hadoop-2.7.0-SNAPSHOT/bin/hdfs dfs -ls s3a://s3a/ 15/03/24 03:47:48 DEBUG s3a.S3AFileSystem: Getting path status for s3a://s3a/ () 15/03/24 03:47:48 DEBUG s3a.S3AFileSystem: Not Found: s3a://s3a/ ls: `s3a://s3a/': No such file or directory {code} This is how it works via s3n. {code} # hadoop-2.7.0-SNAPSHOT/bin/hdfs dfs -ls s3n://s3n/ # hadoop-2.7.0-SNAPSHOT/bin/hdfs dfs -mkdir s3n://s3n/foo # hadoop-2.7.0-SNAPSHOT/bin/hdfs dfs -ls s3n://s3n/ Found 1 items drwxrwxrwx - 0 1970-01-01 00:00 s3n://s3n/foo {code} The snapshot is the following: {quote} \# git branch \* branch-2.7 trunk \# git log commit 929b04ce3a4fe419dece49ed68d4f6228be214c1 Author: Harsh J Date: Sun Mar 22 10:18:32 2015 +0530 {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HADOOP-11730) The broken s3n read retry logic causes a wrong output being committed
[ https://issues.apache.org/jira/browse/HADOOP-11730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takenori Sato updated HADOOP-11730: --- Attachment: (was: HADOOP-11730-branch-2.6.0.001.patch) > The broken s3n read retry logic causes a wrong output being committed > - > > Key: HADOOP-11730 > URL: https://issues.apache.org/jira/browse/HADOOP-11730 > Project: Hadoop Common > Issue Type: Bug > Components: fs/s3 >Affects Versions: 2.6.0 > Environment: HDP 2.2 >Reporter: Takenori Sato >Assignee: Takenori Sato > Attachments: HADOOP-11730-branch-2.6.0.001.patch > > > s3n attempts to read again when it encounters IOException during read. But > the current logic does not reopen the connection, thus, it ends up with > no-op, and committing the wrong(truncated) output. > Here's a stack trace as an example. > {quote} > 2015-03-13 20:17:24,835 [TezChild] INFO > org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor - > Starting output org.apache.tez.mapreduce.output.MROutput@52008dbd to vertex > scope-12 > 2015-03-13 20:17:24,866 [TezChild] DEBUG > org.jets3t.service.impl.rest.httpclient.HttpMethodReleaseInputStream - > Released HttpMethod as its response data stream threw an exception > org.apache.http.ConnectionClosedException: Premature end of Content-Length > delimited message body (expected: 296587138; received: 155648 > at > org.apache.http.impl.io.ContentLengthInputStream.read(ContentLengthInputStream.java:184) > at > org.apache.http.conn.EofSensorInputStream.read(EofSensorInputStream.java:138) > at > org.jets3t.service.io.InterruptableInputStream.read(InterruptableInputStream.java:78) > at > org.jets3t.service.impl.rest.httpclient.HttpMethodReleaseInputStream.read(HttpMethodReleaseInputStream.java:146) > at > org.apache.hadoop.fs.s3native.NativeS3FileSystem$NativeS3FsInputStream.read(NativeS3FileSystem.java:145) > at java.io.BufferedInputStream.read1(BufferedInputStream.java:273) > at java.io.BufferedInputStream.read(BufferedInputStream.java:334) > at java.io.DataInputStream.read(DataInputStream.java:100) > at org.apache.hadoop.util.LineReader.fillBuffer(LineReader.java:180) > at > org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:216) > at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174) > at > org.apache.hadoop.mapreduce.lib.input.LineRecordReader.nextKeyValue(LineRecordReader.java:185) > at org.apache.pig.builtin.PigStorage.getNext(PigStorage.java:259) > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:204) > at > org.apache.tez.mapreduce.lib.MRReaderMapReduce.next(MRReaderMapReduce.java:116) > at > org.apache.pig.backend.hadoop.executionengine.tez.plan.operator.POSimpleTezLoad.getNextTuple(POSimpleTezLoad.java:106) > at > org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:307) > at > org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNextTuple(POForEach.java:246) > at > org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:307) > at > org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFilter.getNextTuple(POFilter.java:91) > at > org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:307) > at > org.apache.pig.backend.hadoop.executionengine.tez.plan.operator.POStoreTez.getNextTuple(POStoreTez.java:117) > at > 
org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.runPipeline(PigProcessor.java:313) > at > org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.run(PigProcessor.java:192) > at > org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324) > at > org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:176) > at > org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:168) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) > at > org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:168) > at > org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:163) > at java.util.concurrent.FutureTask.run(FutureTask.java:262) > at >
[jira] [Updated] (HADOOP-11730) The broken s3n read retry logic causes a wrong output being committed
[ https://issues.apache.org/jira/browse/HADOOP-11730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takenori Sato updated HADOOP-11730: --- Attachment: HADOOP-11730-branch-2.6.0.001.patch The first patch with the updated test case. > The broken s3n read retry logic causes a wrong output being committed > - > > Key: HADOOP-11730 > URL: https://issues.apache.org/jira/browse/HADOOP-11730 > Project: Hadoop Common > Issue Type: Bug > Components: fs/s3 >Affects Versions: 2.6.0 > Environment: HDP 2.2 >Reporter: Takenori Sato >Assignee: Takenori Sato > Attachments: HADOOP-11730-branch-2.6.0.001.patch > > > s3n attempts to read again when it encounters IOException during read. But > the current logic does not reopen the connection, thus, it ends up with > no-op, and committing the wrong(truncated) output. > Here's a stack trace as an example. > {quote} > 2015-03-13 20:17:24,835 [TezChild] INFO > org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor - > Starting output org.apache.tez.mapreduce.output.MROutput@52008dbd to vertex > scope-12 > 2015-03-13 20:17:24,866 [TezChild] DEBUG > org.jets3t.service.impl.rest.httpclient.HttpMethodReleaseInputStream - > Released HttpMethod as its response data stream threw an exception > org.apache.http.ConnectionClosedException: Premature end of Content-Length > delimited message body (expected: 296587138; received: 155648 > at > org.apache.http.impl.io.ContentLengthInputStream.read(ContentLengthInputStream.java:184) > at > org.apache.http.conn.EofSensorInputStream.read(EofSensorInputStream.java:138) > at > org.jets3t.service.io.InterruptableInputStream.read(InterruptableInputStream.java:78) > at > org.jets3t.service.impl.rest.httpclient.HttpMethodReleaseInputStream.read(HttpMethodReleaseInputStream.java:146) > at > org.apache.hadoop.fs.s3native.NativeS3FileSystem$NativeS3FsInputStream.read(NativeS3FileSystem.java:145) > at java.io.BufferedInputStream.read1(BufferedInputStream.java:273) > at java.io.BufferedInputStream.read(BufferedInputStream.java:334) > at java.io.DataInputStream.read(DataInputStream.java:100) > at org.apache.hadoop.util.LineReader.fillBuffer(LineReader.java:180) > at > org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:216) > at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174) > at > org.apache.hadoop.mapreduce.lib.input.LineRecordReader.nextKeyValue(LineRecordReader.java:185) > at org.apache.pig.builtin.PigStorage.getNext(PigStorage.java:259) > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:204) > at > org.apache.tez.mapreduce.lib.MRReaderMapReduce.next(MRReaderMapReduce.java:116) > at > org.apache.pig.backend.hadoop.executionengine.tez.plan.operator.POSimpleTezLoad.getNextTuple(POSimpleTezLoad.java:106) > at > org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:307) > at > org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNextTuple(POForEach.java:246) > at > org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:307) > at > org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFilter.getNextTuple(POFilter.java:91) > at > org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:307) > at > 
org.apache.pig.backend.hadoop.executionengine.tez.plan.operator.POStoreTez.getNextTuple(POStoreTez.java:117) > at > org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.runPipeline(PigProcessor.java:313) > at > org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.run(PigProcessor.java:192) > at > org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324) > at > org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:176) > at > org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:168) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) > at > org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:168) > at > org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:163) > at java.util.concurrent.FutureTask.run(F
[jira] [Updated] (HADOOP-11730) The broken s3n read retry logic causes a wrong output being committed
[ https://issues.apache.org/jira/browse/HADOOP-11730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takenori Sato updated HADOOP-11730: --- Attachment: HADOOP-11730-branch-2.6.0.001.patch The first proposal without the test case. 2015-03-20 12:05:08,473 [TezChild] INFO org.apache.hadoop.fs.s3native.NativeS3FileSystem - Received IOException while reading 'user/hadoop/tsato/readlarge/input/cloudian-s3.log.20141119', attempting to reopen. 2015-03-20 12:05:08,473 [TezChild] DEBUG org.jets3t.service.impl.rest.httpclient.RestStorageService - Retrieving All information for bucket shared and object user/hadoop/tsato/readlarge/input/cloudian-s3.log.20141119 Verified manually that it reopens a new connection after IOException. > The broken s3n read retry logic causes a wrong output being committed > - > > Key: HADOOP-11730 > URL: https://issues.apache.org/jira/browse/HADOOP-11730 > Project: Hadoop Common > Issue Type: Bug > Components: fs/s3 >Affects Versions: 2.6.0 > Environment: HDP 2.2 >Reporter: Takenori Sato >Assignee: Takenori Sato > Attachments: HADOOP-11730-branch-2.6.0.001.patch > > > s3n attempts to read again when it encounters IOException during read. But > the current logic does not reopen the connection, thus, it ends up with > no-op, and committing the wrong(truncated) output. > Here's a stack trace as an example. > {quote} > 2015-03-13 20:17:24,835 [TezChild] INFO > org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor - > Starting output org.apache.tez.mapreduce.output.MROutput@52008dbd to vertex > scope-12 > 2015-03-13 20:17:24,866 [TezChild] DEBUG > org.jets3t.service.impl.rest.httpclient.HttpMethodReleaseInputStream - > Released HttpMethod as its response data stream threw an exception > org.apache.http.ConnectionClosedException: Premature end of Content-Length > delimited message body (expected: 296587138; received: 155648 > at > org.apache.http.impl.io.ContentLengthInputStream.read(ContentLengthInputStream.java:184) > at > org.apache.http.conn.EofSensorInputStream.read(EofSensorInputStream.java:138) > at > org.jets3t.service.io.InterruptableInputStream.read(InterruptableInputStream.java:78) > at > org.jets3t.service.impl.rest.httpclient.HttpMethodReleaseInputStream.read(HttpMethodReleaseInputStream.java:146) > at > org.apache.hadoop.fs.s3native.NativeS3FileSystem$NativeS3FsInputStream.read(NativeS3FileSystem.java:145) > at java.io.BufferedInputStream.read1(BufferedInputStream.java:273) > at java.io.BufferedInputStream.read(BufferedInputStream.java:334) > at java.io.DataInputStream.read(DataInputStream.java:100) > at org.apache.hadoop.util.LineReader.fillBuffer(LineReader.java:180) > at > org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:216) > at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174) > at > org.apache.hadoop.mapreduce.lib.input.LineRecordReader.nextKeyValue(LineRecordReader.java:185) > at org.apache.pig.builtin.PigStorage.getNext(PigStorage.java:259) > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:204) > at > org.apache.tez.mapreduce.lib.MRReaderMapReduce.next(MRReaderMapReduce.java:116) > at > org.apache.pig.backend.hadoop.executionengine.tez.plan.operator.POSimpleTezLoad.getNextTuple(POSimpleTezLoad.java:106) > at > org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:307) > at > 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNextTuple(POForEach.java:246) > at > org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:307) > at > org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFilter.getNextTuple(POFilter.java:91) > at > org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:307) > at > org.apache.pig.backend.hadoop.executionengine.tez.plan.operator.POStoreTez.getNextTuple(POStoreTez.java:117) > at > org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.runPipeline(PigProcessor.java:313) > at > org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.run(PigProcessor.java:192) > at > org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324) > at > org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:176) > at > org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTa
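The fix being verified above hinges on actually reopening the connection at the current offset before retrying, rather than retrying against the dead stream. A rough sketch of that pattern — the class and field names are illustrative, not the exact NativeS3FsInputStream code:
{code}
import java.io.IOException;
import java.io.InputStream;

// Illustrative sketch of reopen-and-retry, not the exact
// NativeS3FsInputStream code.
abstract class ReopenOnErrorStream extends InputStream {
  private InputStream in;  // current S3 response body stream
  private long pos;        // bytes consumed so far

  /** Issues a fresh GET starting at the given offset. */
  protected abstract InputStream reopen(long offset) throws IOException;

  @Override
  public int read() throws IOException {
    int b;
    try {
      b = in.read();
    } catch (IOException e) {
      // The broken logic retried on the same exhausted stream, which
      // silently turned the error into EOF. Reopen at the current
      // offset instead so the retry reads real data.
      try {
        in.close();
      } catch (IOException ignored) {
        // the connection is already broken
      }
      in = reopen(pos);
      b = in.read();
    }
    if (b >= 0) {
      pos++;
    }
    return b;
  }
}
{code}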
[jira] [Created] (HADOOP-11730) The broken s3n read retry logic causes a wrong output being committed
Takenori Sato created HADOOP-11730: -- Summary: The broken s3n read retry logic causes a wrong output being committed Key: HADOOP-11730 URL: https://issues.apache.org/jira/browse/HADOOP-11730 Project: Hadoop Common Issue Type: Bug Components: fs/s3 Affects Versions: 2.6.0 Environment: HDP 2.2 Reporter: Takenori Sato Assignee: Takenori Sato s3n attempts to read again when it encounters an IOException during read. But the current logic does not reopen the connection, so the retry ends up as a no-op and the wrong (truncated) output gets committed. Here's a stack trace as an example. {quote} 2015-03-13 20:17:24,835 [TezChild] INFO org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor - Starting output org.apache.tez.mapreduce.output.MROutput@52008dbd to vertex scope-12 2015-03-13 20:17:24,866 [TezChild] DEBUG org.jets3t.service.impl.rest.httpclient.HttpMethodReleaseInputStream - Released HttpMethod as its response data stream threw an exception org.apache.http.ConnectionClosedException: Premature end of Content-Length delimited message body (expected: 296587138; received: 155648 at org.apache.http.impl.io.ContentLengthInputStream.read(ContentLengthInputStream.java:184) at org.apache.http.conn.EofSensorInputStream.read(EofSensorInputStream.java:138) at org.jets3t.service.io.InterruptableInputStream.read(InterruptableInputStream.java:78) at org.jets3t.service.impl.rest.httpclient.HttpMethodReleaseInputStream.read(HttpMethodReleaseInputStream.java:146) at org.apache.hadoop.fs.s3native.NativeS3FileSystem$NativeS3FsInputStream.read(NativeS3FileSystem.java:145) at java.io.BufferedInputStream.read1(BufferedInputStream.java:273) at java.io.BufferedInputStream.read(BufferedInputStream.java:334) at java.io.DataInputStream.read(DataInputStream.java:100) at org.apache.hadoop.util.LineReader.fillBuffer(LineReader.java:180) at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:216) at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174) at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.nextKeyValue(LineRecordReader.java:185) at org.apache.pig.builtin.PigStorage.getNext(PigStorage.java:259) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:204) at org.apache.tez.mapreduce.lib.MRReaderMapReduce.next(MRReaderMapReduce.java:116) at org.apache.pig.backend.hadoop.executionengine.tez.plan.operator.POSimpleTezLoad.getNextTuple(POSimpleTezLoad.java:106) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:307) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNextTuple(POForEach.java:246) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:307) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFilter.getNextTuple(POFilter.java:91) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:307) at org.apache.pig.backend.hadoop.executionengine.tez.plan.operator.POStoreTez.getNextTuple(POStoreTez.java:117) at org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.runPipeline(PigProcessor.java:313) at org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.run(PigProcessor.java:192) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324) at 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:176) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:168) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:168) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:163) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) 2015-03-13 20:17:24,867 [TezChild] INFO org.apache.hadoop.fs.s3native.NativeS3FileSystem - Received IOException while reading 'user/hadoop/tsato/readlarge/input/clou
[jira] [Resolved] (HADOOP-10037) s3n read truncated, but doesn't throw exception
[ https://issues.apache.org/jira/browse/HADOOP-10037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takenori Sato resolved HADOOP-10037. Resolution: Fixed The issue that had reopened this turned out to be a separate issue. > s3n read truncated, but doesn't throw exception > > > Key: HADOOP-10037 > URL: https://issues.apache.org/jira/browse/HADOOP-10037 > Project: Hadoop Common > Issue Type: Bug > Components: fs/s3 >Affects Versions: 2.0.0-alpha > Environment: Ubuntu Linux 13.04 running on Amazon EC2 (cc2.8xlarge) >Reporter: David Rosenstrauch > Fix For: 2.6.0 > > Attachments: S3ReadFailedOnTruncation.html, S3ReadSucceeded.html > > > For months now we've been finding that we've been experiencing frequent data > truncation issues when reading from S3 using the s3n:// protocol. I finally > was able to gather some debugging output on the issue in a job I ran last > night, and so can finally file a bug report. > The job I ran last night was on a 16-node cluster (all of them AWS EC2 > cc2.8xlarge machines, running Ubuntu 13.04 and Cloudera CDH4.3.0). The job > was a Hadoop streaming job, which reads through a large number (i.e., > ~55,000) of files on S3, each of them approximately 300K bytes in size. > All of the files contain 46 columns of data in each record. But I added in > an extra check in my mapper code to count and verify the number of columns in > every record - throwing an error and crashing the map task if the column > count is wrong. > If you look in the attached task logs, you'll see 2 attempts on the same > task. The first one fails due to data truncated (i.e., my job intentionally > fails the map task due to the current record failing the column count check). > The task then gets retried on a different machine and runs to a succesful > completion. > You can see further evidence of the truncation further down in the task logs, > where it displays the count of the records read: the failed task says 32953 > records read, while the successful task says 63133. > Any idea what the problem might be here and/or how to work around it? This > issue is a very common occurrence on our clusters. E.g., in the job I ran > last night before I had gone to bed I had already encountered 8 such > failuers, and the job was only 10% complete. (~25,000 out of ~250,000 tasks.) > I realize that it's common for I/O errors to occur - possibly even frequently > - in a large Hadoop job. But I would think that if an I/O failure (like a > truncated read) did occur, that something in the underlying infrastructure > code (i.e., either in NativeS3FileSystem or in jets3t) should detect the > error and throw an IOException accordingly. It shouldn't be up to the > calling code to detect such failures, IMO. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-10037) s3n read truncated, but doesn't throw exception
[ https://issues.apache.org/jira/browse/HADOOP-10037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370292#comment-14370292 ] Takenori Sato commented on HADOOP-10037: David, thanks for your clarification. I heard from Steve that my issue was introduced by some optimizations done for 2.4. So let me close this as FIXED. I will create a new issue for mine. > s3n read truncated, but doesn't throw exception > > > Key: HADOOP-10037 > URL: https://issues.apache.org/jira/browse/HADOOP-10037 > Project: Hadoop Common > Issue Type: Bug > Components: fs/s3 >Affects Versions: 2.0.0-alpha > Environment: Ubuntu Linux 13.04 running on Amazon EC2 (cc2.8xlarge) >Reporter: David Rosenstrauch > Fix For: 2.6.0 > > Attachments: S3ReadFailedOnTruncation.html, S3ReadSucceeded.html > > > For months now we've been finding that we've been experiencing frequent data > truncation issues when reading from S3 using the s3n:// protocol. I finally > was able to gather some debugging output on the issue in a job I ran last > night, and so can finally file a bug report. > The job I ran last night was on a 16-node cluster (all of them AWS EC2 > cc2.8xlarge machines, running Ubuntu 13.04 and Cloudera CDH4.3.0). The job > was a Hadoop streaming job, which reads through a large number (i.e., > ~55,000) of files on S3, each of them approximately 300K bytes in size. > All of the files contain 46 columns of data in each record. But I added in > an extra check in my mapper code to count and verify the number of columns in > every record - throwing an error and crashing the map task if the column > count is wrong. > If you look in the attached task logs, you'll see 2 attempts on the same > task. The first one fails due to data truncated (i.e., my job intentionally > fails the map task due to the current record failing the column count check). > The task then gets retried on a different machine and runs to a succesful > completion. > You can see further evidence of the truncation further down in the task logs, > where it displays the count of the records read: the failed task says 32953 > records read, while the successful task says 63133. > Any idea what the problem might be here and/or how to work around it? This > issue is a very common occurrence on our clusters. E.g., in the job I ran > last night before I had gone to bed I had already encountered 8 such > failuers, and the job was only 10% complete. (~25,000 out of ~250,000 tasks.) > I realize that it's common for I/O errors to occur - possibly even frequently > - in a large Hadoop job. But I would think that if an I/O failure (like a > truncated read) did occur, that something in the underlying infrastructure > code (i.e., either in NativeS3FileSystem or in jets3t) should detect the > error and throw an IOException accordingly. It shouldn't be up to the > calling code to detect such failures, IMO. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Reopened] (HADOOP-10037) s3n read truncated, but doesn't throw exception
[ https://issues.apache.org/jira/browse/HADOOP-10037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takenori Sato reopened HADOOP-10037: I confirmed this happens on Hadoop 2.6.0, and found the reason. Here's the stacktrace. {quote} 2015-03-13 20:17:24,866 [TezChild] DEBUG org.jets3t.service.impl.rest.httpclient.HttpMethodReleaseInputStream - Released HttpMethod as its response data stream threw an exception org.apache.http.ConnectionClosedException: Premature end of Content-Length delimited message body (expected: 296587138; received: 155648 at org.apache.http.impl.io.ContentLengthInputStream.read(ContentLengthInputStream.java:184) at org.apache.http.conn.EofSensorInputStream.read(EofSensorInputStream.java:138) at org.jets3t.service.io.InterruptableInputStream.read(InterruptableInputStream.java:78) at org.jets3t.service.impl.rest.httpclient.HttpMethodReleaseInputStream.read(HttpMethodReleaseInputStream.java:146) at org.apache.hadoop.fs.s3native.NativeS3FileSystem$NativeS3FsInputStream.read(NativeS3FileSystem.java:145) at java.io.BufferedInputStream.read1(BufferedInputStream.java:273) at java.io.BufferedInputStream.read(BufferedInputStream.java:334) at java.io.DataInputStream.read(DataInputStream.java:100) at org.apache.hadoop.util.LineReader.fillBuffer(LineReader.java:180) at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:216) at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174) at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.nextKeyValue(LineRecordReader.java:185) at org.apache.pig.builtin.PigStorage.getNext(PigStorage.java:259) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:204) at org.apache.tez.mapreduce.lib.MRReaderMapReduce.next(MRReaderMapReduce.java:116) at org.apache.pig.backend.hadoop.executionengine.tez.plan.operator.POSimpleTezLoad.getNextTuple(POSimpleTezLoad.java:106) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:307) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNextTuple(POForEach.java:246) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:307) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFilter.getNextTuple(POFilter.java:91) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:307) at org.apache.pig.backend.hadoop.executionengine.tez.plan.operator.POStoreTez.getNextTuple(POStoreTez.java:117) at org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.runPipeline(PigProcessor.java:313) at org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.run(PigProcessor.java:192) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:176) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:168) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:168) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:163) 
at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) 2015-03-13 20:17:24,867 [TezChild] INFO org.apache.hadoop.fs.s3native.NativeS3FileSystem - Received IOException while reading 'user/hadoop/tsato/readlarge/input/cloudian-s3.log.20141119', attempting to reopen. 2015-03-13 20:17:24,867 [TezChild] DEBUG org.jets3t.service.impl.rest.httpclient.HttpMethodReleaseInputStream - Released HttpMethod as its response data stream is fully consumed 2015-03-13 20:17:24,868 [TezChild] INFO org.apache.tez.dag.app.TaskAttemptListenerImpTezDag - Commit go/no-go request from attempt_1426245338920_0001_1_00_04_0 2015-03-13 20:17:24,868 [TezChild] INFO org.apache.tez.dag.app.dag.impl.TaskImpl - attempt_1426245338920_0001_1_00_04_0 given a go for committing the task output. {quote} The problem is that a job successfully finishes after the exception. T
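The diagnosis above is the crux of both this issue and HADOOP-11730: the swallowed exception surfaces to callers as an ordinary end-of-stream, which no record reader can distinguish from a legitimate EOF, so the job commits the truncated output and reports success. Schematically — a hedged illustration of the caller side, not Hadoop's actual LineReader:
{code}
import java.io.IOException;
import java.io.InputStream;

// Illustration of why a swallowed read error commits truncated output;
// not Hadoop's actual LineReader code.
class TruncationIllustration {
  static long countBytes(InputStream in) throws IOException {
    byte[] buf = new byte[8192];
    long total = 0;
    int n;
    // If the stream's retry logic swallows an IOException and then
    // returns -1, this loop terminates normally with a short count,
    // and the truncated output is committed as if it were complete.
    while ((n = in.read(buf)) > 0) {
      total += n;
    }
    return total;
  }
}
{code}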
[jira] [Commented] (HADOOP-10400) Incorporate new S3A FileSystem implementation
[ https://issues.apache.org/jira/browse/HADOOP-10400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14104967#comment-14104967 ] Takenori Sato commented on HADOOP-10400: Hi Jordan Mendelson, I came from HADOOP-10643, where you suggested that new improvements over NativeS3FileSystem should be made here. So I've made 2 pull requests for your upstream repository. 1. make endpoint configurable https://github.com/Aloisius/hadoop-s3a/pull/8 jets3t allows a user to configure an endpoint (protocol, host, and port) through jets3t.properties. But with the AWS SDK, a user can't configure one without calling a particular method. This fix simply allows it. 2. subclass of AbstractFileSystem https://github.com/Aloisius/hadoop-s3a/pull/9 This contains a fix for a problem similar to HADOOP-10643. The difference is that this fix is simpler and requires no modification to AbstractFileSystem. Also, when using this subclass, HADOOP-8984 becomes obvious, so its fix is included as well. Btw, on my test with Pig, I needed to apply the following fix to make this work. "Ensure the file is open before trying to seek" https://github.com/Aloisius/hadoop-s3a/pull/6 > Incorporate new S3A FileSystem implementation > - > > Key: HADOOP-10400 > URL: https://issues.apache.org/jira/browse/HADOOP-10400 > Project: Hadoop Common > Issue Type: Improvement > Components: fs, fs/s3 >Affects Versions: 2.4.0 >Reporter: Jordan Mendelson >Assignee: Jordan Mendelson > Attachments: HADOOP-10400-1.patch, HADOOP-10400-2.patch, > HADOOP-10400-3.patch, HADOOP-10400-4.patch, HADOOP-10400-5.patch, > HADOOP-10400-6.patch > > > The s3native filesystem has a number of limitations (some of which were > recently fixed by HADOOP-9454). This patch adds an s3a filesystem which uses > the aws-sdk instead of the jets3t library. There are a number of improvements > over s3native including: > - Parallel copy (rename) support (dramatically speeds up commits on large > files) > - AWS S3 explorer compatible empty directories files "xyz/" instead of > "xyz_$folder$" (reduces littering) > - Ignores s3native created _$folder$ files created by s3native and other S3 > browsing utilities > - Supports multiple output buffer dirs to even out IO when uploading files > - Supports IAM role-based authentication > - Allows setting a default canned ACL for uploads (public, private, etc.) > - Better error recovery handling > - Should handle input seeks without having to download the whole file (used > for splits a lot) > This code is a copy of https://github.com/Aloisius/hadoop-s3a with patches to > various pom files to get it to build against trunk. I've been using 0.0.1 in > production with CDH 4 for several months and CDH 5 for a few days. The > version here is 0.0.2 which changes around some keys to hopefully bring the > key name style more inline with the rest of hadoop 2.x. 
> *Tunable parameters:* > fs.s3a.access.key - Your AWS access key ID (omit for role authentication) > fs.s3a.secret.key - Your AWS secret key (omit for role authentication) > fs.s3a.connection.maximum - Controls how many parallel connections > HttpClient spawns (default: 15) > fs.s3a.connection.ssl.enabled - Enables or disables SSL connections to S3 > (default: true) > fs.s3a.attempts.maximum - How many times we should retry commands on > transient errors (default: 10) > fs.s3a.connection.timeout - Socket connect timeout (default: 5000) > fs.s3a.paging.maximum - How many keys to request from S3 when doing > directory listings at a time (default: 5000) > fs.s3a.multipart.size - How big (in bytes) to split a upload or copy > operation up into (default: 104857600) > fs.s3a.multipart.threshold - Until a file is this large (in bytes), use > non-parallel upload (default: 2147483647) > fs.s3a.acl.default - Set a canned ACL on newly created/copied objects > (private | public-read | public-read-write | authenticated-read | > log-delivery-write | bucket-owner-read | bucket-owner-full-control) > fs.s3a.multipart.purge - True if you want to purge existing multipart > uploads that may not have been completed/aborted correctly (default: false) > fs.s3a.multipart.purge.age - Minimum age in seconds of multipart uploads > to purge (default: 86400) > fs.s3a.buffer.dir - Comma separated list of directories that will be used > to buffer file writes out of (default: uses ${hadoop.tmp.dir}/s3a ) > *Caveats*: > Hadoop uses a standard output committer which uploads files as > filename.COPYING before renaming them. This can cause unnecessary performance > issues with S3 because it does not have a rename operation and S3 already > verifies uploads against an md5 that the driver sets on the upload request. > While this FileSyste
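As a usage sketch of the tunables listed above: the keys come straight from that list, the values shown are arbitrary examples, and the credential keys can be omitted entirely for IAM role-based authentication.
{code}
import org.apache.hadoop.conf.Configuration;

// Example wiring for a few of the s3a tunables listed above;
// values are arbitrary examples, not recommendations.
class S3AConfigExample {
  static Configuration s3aConf() {
    Configuration conf = new Configuration();
    // Credentials; omit both keys for IAM role-based authentication.
    conf.set("fs.s3a.access.key", "YOUR_ACCESS_KEY");
    conf.set("fs.s3a.secret.key", "YOUR_SECRET_KEY");
    // Connection and upload tuning (defaults are given in the list above).
    conf.setInt("fs.s3a.connection.maximum", 30);
    conf.setBoolean("fs.s3a.connection.ssl.enabled", true);
    conf.setLong("fs.s3a.multipart.size", 104857600L); // 100 MB parts
    conf.set("fs.s3a.acl.default", "bucket-owner-full-control");
    return conf;
  }
}
{code}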