[jira] [Commented] (HADOOP-11851) s3n to swallow IOEs on inner stream close
[ https://issues.apache.org/jira/browse/HADOOP-11851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14503893#comment-14503893 ]

Takenori Sato commented on HADOOP-11851:
----------------------------------------

Isn't this a duplicate of HADOOP-11730?

s3n to swallow IOEs on inner stream close
-----------------------------------------
                 Key: HADOOP-11851
                 URL: https://issues.apache.org/jira/browse/HADOOP-11851
             Project: Hadoop Common
          Issue Type: Improvement
          Components: fs/s3
    Affects Versions: 2.6.0
            Reporter: Steve Loughran
            Assignee: Anu Engineer
            Priority: Minor

We've seen a situation where some work was failing with (recurrent) connection reset exceptions. Irrespective of the root cause, these surfaced not in the read operations but when the input stream was being closed, including during a seek(). These exceptions could be caught and logged at warn level rather than triggering immediate failures. It shouldn't matter to the next GET whether the last stream closed prematurely, as long as the new one works.
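A minimal sketch of the behavior this issue asks for, assuming a wrapped inner-stream field and an SLF4J-style logger (all names here are illustrative, not the actual patch):

{code}
import java.io.IOException;
import java.io.InputStream;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

final class QuietCloser {
  private static final Logger LOG = LoggerFactory.getLogger(QuietCloser.class);

  private QuietCloser() {
  }

  /**
   * Close the inner S3 object stream, swallowing any IOException.
   * A premature close of the old stream need not fail the caller's
   * seek() or close(); the next GET simply opens a fresh stream.
   */
  static void closeQuietly(InputStream innerStream) {
    if (innerStream == null) {
      return;
    }
    try {
      innerStream.close();
    } catch (IOException e) {
      // Log at warn instead of propagating, per the proposal above.
      LOG.warn("Ignoring exception while closing inner stream", e);
    }
  }
}
{code}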
[jira] [Commented] (HADOOP-11742) mkdir by file system shell fails on an empty bucket
[ https://issues.apache.org/jira/browse/HADOOP-11742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14392251#comment-14392251 ]

Takenori Sato commented on HADOOP-11742:
----------------------------------------

_mkdir_ and _ls_ worked as expected with the fix.

{code}
# hadoop-2.7.0-SNAPSHOT/bin/hdfs dfs -Dfs.s3a.access.key=ACCESS_KEY -Dfs.s3a.secret.key=SECRET_KEY -ls s3a://s3atest/
15/04/02 06:52:55 DEBUG s3a.S3AFileSystem: Getting path status for s3a://s3atest/ ()
15/04/02 06:52:55 DEBUG s3a.S3AFileSystem: s3a://s3atest/ is empty? true
15/04/02 06:52:55 DEBUG s3a.S3AFileSystem: List status for path: s3a://s3atest/
15/04/02 06:52:55 DEBUG s3a.S3AFileSystem: Getting path status for s3a://s3atest/ ()
15/04/02 06:52:55 DEBUG s3a.S3AFileSystem: s3a://s3atest/ is empty? true
15/04/02 06:52:55 DEBUG s3a.S3AFileSystem: listStatus: doing listObjects for directory

# hadoop-2.7.0-SNAPSHOT/bin/hdfs dfs -Dfs.s3a.access.key=ACCESS_KEY -Dfs.s3a.secret.key=SECRET_KEY -mkdir s3a://s3atest/root
15/04/02 06:53:20 DEBUG s3a.S3AFileSystem: Getting path status for s3a://s3atest/root (root)
15/04/02 06:53:20 DEBUG s3a.S3AFileSystem: Not Found: s3a://s3atest/root
15/04/02 06:53:20 DEBUG s3a.S3AFileSystem: Getting path status for s3a://s3atest/ ()
15/04/02 06:53:20 DEBUG s3a.S3AFileSystem: s3a://s3atest/ is empty? true
15/04/02 06:53:20 DEBUG s3a.S3AFileSystem: Making directory: s3a://s3atest/root
15/04/02 06:53:20 DEBUG s3a.S3AFileSystem: Getting path status for s3a://s3atest/root (root)
15/04/02 06:53:20 DEBUG s3a.S3AFileSystem: Not Found: s3a://s3atest/root
15/04/02 06:53:20 DEBUG s3a.S3AFileSystem: Getting path status for s3a://s3atest/root (root)
15/04/02 06:53:20 DEBUG s3a.S3AFileSystem: Not Found: s3a://s3atest/root
15/04/02 06:53:20 DEBUG s3a.S3AFileSystem: Getting path status for s3a://s3atest/ ()
15/04/02 06:53:20 DEBUG s3a.S3AFileSystem: s3a://s3atest/ is empty? true

# hadoop-2.7.0-SNAPSHOT/bin/hdfs dfs -Dfs.s3a.access.key=ACCESS_KEY -Dfs.s3a.secret.key=SECRET_KEY -ls s3a://s3atest/
15/04/02 06:53:26 DEBUG s3a.S3AFileSystem: Getting path status for s3a://s3atest/ ()
15/04/02 06:53:26 DEBUG s3a.S3AFileSystem: s3a://s3atest/ is empty? false
15/04/02 06:53:26 DEBUG s3a.S3AFileSystem: List status for path: s3a://s3atest/
15/04/02 06:53:26 DEBUG s3a.S3AFileSystem: Getting path status for s3a://s3atest/ ()
15/04/02 06:53:26 DEBUG s3a.S3AFileSystem: s3a://s3atest/ is empty? false
15/04/02 06:53:26 DEBUG s3a.S3AFileSystem: listStatus: doing listObjects for directory
15/04/02 06:53:26 DEBUG s3a.S3AFileSystem: Adding: rd: s3a://s3atest/root
Found 1 items
drwxrwxrwx   -          0 1970-01-01 00:00 s3a://s3atest/root
{code}

The created directory did not become visible immediately, but the subsequent _ls_ showed that it had been created successfully.

mkdir by file system shell fails on an empty bucket
---------------------------------------------------
                 Key: HADOOP-11742
                 URL: https://issues.apache.org/jira/browse/HADOOP-11742
             Project: Hadoop Common
          Issue Type: Bug
          Components: fs/s3
    Affects Versions: 2.7.0
         Environment: CentOS 7
            Reporter: Takenori Sato
            Assignee: Takenori Sato
            Priority: Minor
         Attachments: HADOOP-11742-branch-2.7.001.patch, HADOOP-11742-branch-2.7.002.patch, HADOOP-11742-branch-2.7.003-1.patch, HADOOP-11742-branch-2.7.003-2.patch

I have built the latest 2.7 and tried S3AFileSystem. I then found that _mkdir_ fails on an empty bucket, named *s3a* here, as follows:

{code}
# hadoop-2.7.0-SNAPSHOT/bin/hdfs dfs -mkdir s3a://s3a/foo
15/03/24 03:49:35 DEBUG s3a.S3AFileSystem: Getting path status for s3a://s3a/foo (foo)
15/03/24 03:49:36 DEBUG s3a.S3AFileSystem: Not Found: s3a://s3a/foo
15/03/24 03:49:36 DEBUG s3a.S3AFileSystem: Getting path status for s3a://s3a/ ()
15/03/24 03:49:36 DEBUG s3a.S3AFileSystem: Not Found: s3a://s3a/
mkdir: `s3a://s3a/foo': No such file or directory
{code}

So does _ls_.

{code}
# hadoop-2.7.0-SNAPSHOT/bin/hdfs dfs -ls s3a://s3a/
15/03/24 03:47:48 DEBUG s3a.S3AFileSystem: Getting path status for s3a://s3a/ ()
15/03/24 03:47:48 DEBUG s3a.S3AFileSystem: Not Found: s3a://s3a/
ls: `s3a://s3a/': No such file or directory
{code}

This is how it works via s3n.

{code}
# hadoop-2.7.0-SNAPSHOT/bin/hdfs dfs -ls s3n://s3n/
# hadoop-2.7.0-SNAPSHOT/bin/hdfs dfs -mkdir s3n://s3n/foo
# hadoop-2.7.0-SNAPSHOT/bin/hdfs dfs -ls s3n://s3n/
Found 1 items
drwxrwxrwx   -          0 1970-01-01 00:00 s3n://s3n/foo
{code}

The snapshot is the following:

{quote}
# git branch
* branch-2.7
  trunk
# git log
commit 929b04ce3a4fe419dece49ed68d4f6228be214c1
Author: Harsh J <ha...@cloudera.com>
Date:   Sun Mar 22 10:18:32 2015 +0530
{quote}
[jira] [Updated] (HADOOP-11742) mkdir by file system shell fails on an empty bucket
[ https://issues.apache.org/jira/browse/HADOOP-11742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Takenori Sato updated HADOOP-11742:
-----------------------------------
    Attachment: HADOOP-11742-branch-2.7.003-1.patch

This is the patch to fix _S3AFileSystem#getFileStatus_. A dedicated branch to process the root directory was added, entered only when key.isEmpty() == true (see the sketch below).

mkdir by file system shell fails on an empty bucket
---------------------------------------------------
                 Key: HADOOP-11742
                 URL: https://issues.apache.org/jira/browse/HADOOP-11742
             Project: Hadoop Common
          Issue Type: Bug
          Components: fs/s3
    Affects Versions: 2.7.0
         Environment: CentOS 7
            Reporter: Takenori Sato
            Assignee: Takenori Sato
            Priority: Minor
         Attachments: HADOOP-11742-branch-2.7.001.patch, HADOOP-11742-branch-2.7.002.patch, HADOOP-11742-branch-2.7.003-1.patch
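A hedged sketch of the shape of this fix, not the patch itself: pathToKey, uri and workingDir are existing S3AFileSystem members, while the S3AFileStatus constructor arguments shown here are assumptions for illustration.

{code}
// Sketch of the dedicated root-directory branch in getFileStatus().
public S3AFileStatus getFileStatus(Path f) throws IOException {
  String key = pathToKey(f);

  if (key.isEmpty()) {
    // The bucket root always exists, even in an empty bucket, so it
    // must never fall through to the Not Found path below.
    return new S3AFileStatus(true /* isDir */, true /* isEmptyDir */,
        f.makeQualified(uri, workingDir));
  }

  // ... the existing lookup: HEAD the object, then its directory
  // marker, then a LIST with the key as prefix ...
  throw new FileNotFoundException("No such file or directory: " + f);
}
{code}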
[jira] [Commented] (HADOOP-11742) mkdir by file system shell fails on an empty bucket
[ https://issues.apache.org/jira/browse/HADOOP-11742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14392231#comment-14392231 ]

Takenori Sato commented on HADOOP-11742:
----------------------------------------

Patches are verified as follows.

1. Run TestS3AContractRootDir to see that it succeeds:

{code}
-------------------------------------------------------
 T E S T S
-------------------------------------------------------
Running org.apache.hadoop.fs.contract.s3a.TestS3AContractRootDir
Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 4.855 sec - in org.apache.hadoop.fs.contract.s3a.TestS3AContractRootDir

Results :

Tests run: 5, Failures: 0, Errors: 0, Skipped: 0

[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 10.341 s
[INFO] Finished at: 2015-04-02T05:41:48+00:00
[INFO] Final Memory: 28M/407M
[INFO] ------------------------------------------------------------------------
{code}

2. Apply the test patch (003-2) and run TestS3AContractRootDir:

{code}
-------------------------------------------------------
 T E S T S
-------------------------------------------------------
Running org.apache.hadoop.fs.contract.s3a.TestS3AContractRootDir
Tests run: 5, Failures: 0, Errors: 4, Skipped: 0, Time elapsed: 21.296 sec <<< FAILURE! - in org.apache.hadoop.fs.contract.s3a.TestS3AContractRootDir
testRmEmptyRootDirNonRecursive(org.apache.hadoop.fs.contract.s3a.TestS3AContractRootDir)  Time elapsed: 4.608 sec  <<< ERROR!
java.io.FileNotFoundException: No such file or directory: /
	at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:996)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:77)
	at org.apache.hadoop.fs.contract.ContractTestUtils.assertIsDirectory(ContractTestUtils.java:464)
	at org.apache.hadoop.fs.contract.AbstractContractRootDirectoryTest.testRmEmptyRootDirNonRecursive(AbstractContractRootDirectoryTest.java:70)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
	at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)

testRmRootRecursive(org.apache.hadoop.fs.contract.s3a.TestS3AContractRootDir)  Time elapsed: 2.509 sec  <<< ERROR!
java.io.FileNotFoundException: No such file or directory: /
	at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:996)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:77)
	at org.apache.hadoop.fs.contract.ContractTestUtils.assertIsDirectory(ContractTestUtils.java:464)
	at org.apache.hadoop.fs.contract.AbstractContractRootDirectoryTest.testRmRootRecursive(AbstractContractRootDirectoryTest.java:104)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
	at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)

testCreateFileOverRoot(org.apache.hadoop.fs.contract.s3a.TestS3AContractRootDir)  Time elapsed: 3.006 sec  <<< ERROR!
com.amazonaws.services.s3.model.AmazonS3Exception: Status Code: 400, AWS Service: Amazon S3, AWS Request ID: 2B352694A5577C62, AWS Error Code: MalformedXML, AWS Error Message: The XML you provided was not well-formed or did not validate against our published schema.
{code}
[jira] [Updated] (HADOOP-11742) mkdir by file system shell fails on an empty bucket
[ https://issues.apache.org/jira/browse/HADOOP-11742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Takenori Sato updated HADOOP-11742:
-----------------------------------
    Attachment: HADOOP-11742-branch-2.7.003-2.patch

This is the patch to fix the unit test, _AbstractContractRootDirectoryTest_. The changes are as follows (a sketch appears below):
# setup() prepares an empty root directory
# an assertion was added to testRmEmptyRootDirNonRecursive() to make sure the root dir is empty
# teardown() does nothing

mkdir by file system shell fails on an empty bucket
---------------------------------------------------
                 Key: HADOOP-11742
                 URL: https://issues.apache.org/jira/browse/HADOOP-11742
             Project: Hadoop Common
          Issue Type: Bug
          Components: fs/s3
    Affects Versions: 2.7.0
         Environment: CentOS 7
            Reporter: Takenori Sato
            Assignee: Takenori Sato
            Priority: Minor
         Attachments: HADOOP-11742-branch-2.7.001.patch, HADOOP-11742-branch-2.7.002.patch, HADOOP-11742-branch-2.7.003-1.patch, HADOOP-11742-branch-2.7.003-2.patch
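A hedged sketch of what those three changes could look like in _AbstractContractRootDirectoryTest_ (JUnit 4; the exact assertions and helper names in the attached patch may differ):

{code}
@Override
public void setup() throws Exception {
  super.setup();
  // Change 1: start every case from an empty root, so the empty-bucket
  // code path in getFileStatus("/") is actually exercised.
  getFileSystem().delete(new Path("/"), true);
}

@Test
public void testRmEmptyRootDirNonRecursive() throws Throwable {
  Path root = new Path("/");
  ContractTestUtils.assertIsDirectory(getFileSystem(), root);
  // Change 2: assert the root really is empty before deleting it.
  assertEquals("root directory is not empty",
      0, getFileSystem().listStatus(root).length);
  getFileSystem().delete(root, false);
  ContractTestUtils.assertIsDirectory(getFileSystem(), root);  // root survives
}

@Override
public void teardown() throws Exception {
  // Change 3: intentionally a no-op, leaving the bucket empty rather
  // than recreating the shared test directory.
}
{code}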
[jira] [Reopened] (HADOOP-11742) mkdir by file system shell fails on an empty bucket
[ https://issues.apache.org/jira/browse/HADOOP-11742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Takenori Sato reopened HADOOP-11742:
------------------------------------

I confirmed that mkdir fails on an empty bucket for AWS as follows:

1. Make sure the bucket is empty, but get an exception:

{code}
# hadoop-2.7.0-SNAPSHOT/bin/hdfs dfs -Dfs.s3a.access.key=ACCESS_KEY -Dfs.s3a.secret.key=SECRET_KEY -ls s3a://s3atest/
15/04/02 01:49:09 DEBUG http.wire: HEAD / HTTP/1.1[\r][\n]
15/04/02 01:49:09 DEBUG http.wire: Host: s3atest.s3.amazonaws.com[\r][\n]
15/04/02 01:49:09 DEBUG http.wire: Authorization: AWS XXX=[\r][\n]
15/04/02 01:49:09 DEBUG http.wire: Date: Thu, 02 Apr 2015 01:49:08 GMT[\r][\n]
15/04/02 01:49:09 DEBUG http.wire: User-Agent: aws-sdk-java/1.7.4 Linux/3.10.0-123.8.1.el7.centos.plus.x86_64 Java_HotSpot(TM)_64-Bit_Server_VM/24.75-b04/1.7.0_75[\r][\n]
15/04/02 01:49:09 DEBUG http.wire: Content-Type: application/x-www-form-urlencoded; charset=utf-8[\r][\n]
15/04/02 01:49:09 DEBUG http.wire: Connection: Keep-Alive[\r][\n]
15/04/02 01:49:09 DEBUG http.wire: [\r][\n]
15/04/02 01:49:09 DEBUG http.wire: HTTP/1.1 200 OK[\r][\n]
15/04/02 01:49:09 DEBUG http.wire: x-amz-id-2: XXX[\r][\n]
15/04/02 01:49:09 DEBUG http.wire: x-amz-request-id: XXX[\r][\n]
15/04/02 01:49:09 DEBUG http.wire: Date: Thu, 02 Apr 2015 01:49:10 GMT[\r][\n]
15/04/02 01:49:09 DEBUG http.wire: Content-Type: application/xml[\r][\n]
15/04/02 01:49:09 DEBUG http.wire: Transfer-Encoding: chunked[\r][\n]
15/04/02 01:49:09 DEBUG http.wire: Server: AmazonS3[\r][\n]
15/04/02 01:49:09 DEBUG http.wire: [\r][\n]
15/04/02 01:49:09 DEBUG s3a.S3AFileSystem: Getting path status for s3a://s3atest/ ()
15/04/02 01:49:09 DEBUG http.wire: GET /?delimiter=%2F&max-keys=1&prefix= HTTP/1.1[\r][\n]
15/04/02 01:49:09 DEBUG http.wire: Host: s3atest.s3.amazonaws.com[\r][\n]
15/04/02 01:49:09 DEBUG http.wire: Authorization: AWS XXX=[\r][\n]
15/04/02 01:49:09 DEBUG http.wire: Date: Thu, 02 Apr 2015 01:49:09 GMT[\r][\n]
15/04/02 01:49:09 DEBUG http.wire: User-Agent: aws-sdk-java/1.7.4 Linux/3.10.0-123.8.1.el7.centos.plus.x86_64 Java_HotSpot(TM)_64-Bit_Server_VM/24.75-b04/1.7.0_75[\r][\n]
15/04/02 01:49:09 DEBUG http.wire: Content-Type: application/x-www-form-urlencoded; charset=utf-8[\r][\n]
15/04/02 01:49:09 DEBUG http.wire: Connection: Keep-Alive[\r][\n]
15/04/02 01:49:09 DEBUG http.wire: [\r][\n]
15/04/02 01:49:09 DEBUG http.wire: HTTP/1.1 200 OK[\r][\n]
15/04/02 01:49:09 DEBUG http.wire: x-amz-id-2: XXX[\r][\n]
15/04/02 01:49:09 DEBUG http.wire: x-amz-request-id: XXX[\r][\n]
15/04/02 01:49:09 DEBUG http.wire: Date: Thu, 02 Apr 2015 01:49:10 GMT[\r][\n]
15/04/02 01:49:09 DEBUG http.wire: Content-Type: application/xml[\r][\n]
15/04/02 01:49:09 DEBUG http.wire: Transfer-Encoding: chunked[\r][\n]
15/04/02 01:49:09 DEBUG http.wire: Server: AmazonS3[\r][\n]
15/04/02 01:49:09 DEBUG http.wire: [\r][\n]
15/04/02 01:49:09 DEBUG http.wire: fe[\r][\n]
15/04/02 01:49:09 DEBUG http.wire: <?xml version="1.0" encoding="UTF-8"?>[\n]
15/04/02 01:49:09 DEBUG http.wire: <ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><Name>s3atest</Name><Prefix></Prefix><Marker></Marker><MaxKeys>1</MaxKeys><Delimiter>/</Delimiter><IsTruncated>false</IsTruncated></ListBucketResult>
15/04/02 01:49:09 DEBUG http.wire: [\r][\n]
15/04/02 01:49:09 DEBUG http.wire: 0[\r][\n]
15/04/02 01:49:09 DEBUG http.wire: [\r][\n]
15/04/02 01:49:09 DEBUG s3a.S3AFileSystem: Not Found: s3a://s3atest/
ls: `s3a://s3atest/': No such file or directory
{code}

2. Create a directory, but get an exception:

{code}
# hadoop-2.7.0-SNAPSHOT/bin/hdfs dfs -Dfs.s3a.access.key=ACCESS_KEY -Dfs.s3a.secret.key=SECRET_KEY -mkdir s3a://s3atest/root
15/04/02 01:49:41 DEBUG http.wire: HEAD / HTTP/1.1[\r][\n]
15/04/02 01:49:41 DEBUG http.wire: Host: s3atest.s3.amazonaws.com[\r][\n]
15/04/02 01:49:41 DEBUG http.wire: Authorization: AWS XXX=[\r][\n]
15/04/02 01:49:41 DEBUG http.wire: Date: Thu, 02 Apr 2015 01:49:41 GMT[\r][\n]
15/04/02 01:49:41 DEBUG http.wire: User-Agent: aws-sdk-java/1.7.4 Linux/3.10.0-123.8.1.el7.centos.plus.x86_64 Java_HotSpot(TM)_64-Bit_Server_VM/24.75-b04/1.7.0_75[\r][\n]
15/04/02 01:49:41 DEBUG http.wire: Content-Type: application/x-www-form-urlencoded; charset=utf-8[\r][\n]
15/04/02 01:49:41 DEBUG http.wire: Connection: Keep-Alive[\r][\n]
15/04/02 01:49:41 DEBUG http.wire: [\r][\n]
15/04/02 01:49:41 DEBUG http.wire: HTTP/1.1 200 OK[\r][\n]
15/04/02 01:49:41 DEBUG http.wire: x-amz-id-2: XXX[\r][\n]
15/04/02 01:49:41 DEBUG http.wire: x-amz-request-id: XXX[\r][\n]
15/04/02 01:49:41 DEBUG http.wire: Date: Thu, 02 Apr 2015 01:49:42 GMT[\r][\n]
15/04/02 01:49:41 DEBUG http.wire: Content-Type: application/xml[\r][\n]
15/04/02 01:49:41 DEBUG http.wire: Transfer-Encoding: chunked[\r][\n]
15/04/02 01:49:41 DEBUG http.wire: Server: AmazonS3[\r][\n]
15/04/02 01:49:41 DEBUG http.wire: [\r][\n]
15/04/02 01:49:41 DEBUG s3a.S3AFileSystem: Getting path
{code}
[jira] [Commented] (HADOOP-11753) TestS3AContractOpen#testOpenReadZeroByteFile fails due to negative range header
[ https://issues.apache.org/jira/browse/HADOOP-11753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386493#comment-14386493 ]

Takenori Sato commented on HADOOP-11753:
----------------------------------------

Thanks, it makes sense. I will discuss internally.

TestS3AContractOpen#testOpenReadZeroByteFile fails due to negative range header
-------------------------------------------------------------------------------
                 Key: HADOOP-11753
                 URL: https://issues.apache.org/jira/browse/HADOOP-11753
             Project: Hadoop Common
          Issue Type: Bug
          Components: fs/s3
    Affects Versions: 3.0.0, 2.7.0
            Reporter: Takenori Sato
            Assignee: Takenori Sato
         Attachments: HADOOP-11753-branch-2.7.001.patch
[jira] [Resolved] (HADOOP-11753) TestS3AContractOpen#testOpenReadZeroByteFile fails due to negative range header
[ https://issues.apache.org/jira/browse/HADOOP-11753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Takenori Sato resolved HADOOP-11753.
------------------------------------
    Resolution: Invalid

TestS3AContractOpen#testOpenReadZeroByteFile fails due to negative range header
-------------------------------------------------------------------------------
                 Key: HADOOP-11753
                 URL: https://issues.apache.org/jira/browse/HADOOP-11753
             Project: Hadoop Common
          Issue Type: Bug
          Components: fs/s3
    Affects Versions: 3.0.0, 2.7.0
            Reporter: Takenori Sato
            Assignee: Takenori Sato
         Attachments: HADOOP-11753-branch-2.7.001.patch
[jira] [Commented] (HADOOP-11753) TestS3AContractOpen#testOpenReadZeroByteFile fails due to negative range header
[ https://issues.apache.org/jira/browse/HADOOP-11753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386168#comment-14386168 ]

Takenori Sato commented on HADOOP-11753:
----------------------------------------

Thanks for the clarification. Yes, this is against Cloudian, so let me close. I will check AWS as well for further tests.

TestS3AContractOpen#testOpenReadZeroByteFile fails due to negative range header
-------------------------------------------------------------------------------
                 Key: HADOOP-11753
                 URL: https://issues.apache.org/jira/browse/HADOOP-11753
             Project: Hadoop Common
          Issue Type: Bug
          Components: fs/s3
    Affects Versions: 3.0.0, 2.7.0
            Reporter: Takenori Sato
            Assignee: Takenori Sato
         Attachments: HADOOP-11753-branch-2.7.001.patch
[jira] [Commented] (HADOOP-11742) mkdir by file system shell fails on an empty bucket
[ https://issues.apache.org/jira/browse/HADOOP-11742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386172#comment-14386172 ]

Takenori Sato commented on HADOOP-11742:
----------------------------------------

Thomas, Steve: yes, again this is against our own. I will check the difference. Let me close.

mkdir by file system shell fails on an empty bucket
---------------------------------------------------
                 Key: HADOOP-11742
                 URL: https://issues.apache.org/jira/browse/HADOOP-11742
             Project: Hadoop Common
          Issue Type: Bug
          Components: fs/s3
    Affects Versions: 2.7.0
         Environment: CentOS 7
            Reporter: Takenori Sato
            Priority: Minor
         Attachments: HADOOP-11742-branch-2.7.001.patch, HADOOP-11742-branch-2.7.002.patch
[jira] [Updated] (HADOOP-11742) mkdir by file system shell fails on an empty bucket
[ https://issues.apache.org/jira/browse/HADOOP-11742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Takenori Sato updated HADOOP-11742:
-----------------------------------
    Resolution: Fixed
      Assignee: Takenori Sato
        Status: Resolved  (was: Patch Available)

mkdir by file system shell fails on an empty bucket
---------------------------------------------------
                 Key: HADOOP-11742
                 URL: https://issues.apache.org/jira/browse/HADOOP-11742
             Project: Hadoop Common
          Issue Type: Bug
          Components: fs/s3
    Affects Versions: 2.7.0
         Environment: CentOS 7
            Reporter: Takenori Sato
            Assignee: Takenori Sato
            Priority: Minor
         Attachments: HADOOP-11742-branch-2.7.001.patch, HADOOP-11742-branch-2.7.002.patch
[jira] [Resolved] (HADOOP-11742) mkdir by file system shell fails on an empty bucket
[ https://issues.apache.org/jira/browse/HADOOP-11742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Takenori Sato resolved HADOOP-11742.
------------------------------------
    Resolution: Invalid

mkdir by file system shell fails on an empty bucket
---------------------------------------------------
                 Key: HADOOP-11742
                 URL: https://issues.apache.org/jira/browse/HADOOP-11742
             Project: Hadoop Common
          Issue Type: Bug
          Components: fs/s3
    Affects Versions: 2.7.0
         Environment: CentOS 7
            Reporter: Takenori Sato
            Assignee: Takenori Sato
            Priority: Minor
         Attachments: HADOOP-11742-branch-2.7.001.patch, HADOOP-11742-branch-2.7.002.patch
[jira] [Reopened] (HADOOP-11742) mkdir by file system shell fails on an empty bucket
[ https://issues.apache.org/jira/browse/HADOOP-11742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Takenori Sato reopened HADOOP-11742:
------------------------------------

Reopen to mark this as invalid.

mkdir by file system shell fails on an empty bucket
---------------------------------------------------
                 Key: HADOOP-11742
                 URL: https://issues.apache.org/jira/browse/HADOOP-11742
             Project: Hadoop Common
          Issue Type: Bug
          Components: fs/s3
    Affects Versions: 2.7.0
         Environment: CentOS 7
            Reporter: Takenori Sato
            Assignee: Takenori Sato
            Priority: Minor
         Attachments: HADOOP-11742-branch-2.7.001.patch, HADOOP-11742-branch-2.7.002.patch
[jira] [Updated] (HADOOP-11753) TestS3AContractOpen#testOpenReadZeroByteFile fails due to negative range header
[ https://issues.apache.org/jira/browse/HADOOP-11753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Takenori Sato updated HADOOP-11753:
------------------------------------
    Attachment: HADOOP-11753-branch-2.7.001.patch

Set the Range header only when contentLength > 0 (see the sketch after the test output).

{code}
-------------------------------------------------------
 T E S T S
-------------------------------------------------------
Running org.apache.hadoop.fs.contract.s3a.TestS3AContractCreate
Tests run: 6, Failures: 0, Errors: 0, Skipped: 3, Time elapsed: 19.821 sec - in org.apache.hadoop.fs.contract.s3a.TestS3AContractCreate
Running org.apache.hadoop.fs.contract.s3a.TestS3AContractDelete
Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 15.186 sec - in org.apache.hadoop.fs.contract.s3a.TestS3AContractDelete
Running org.apache.hadoop.fs.contract.s3a.TestS3AContractMkdir
Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 10.563 sec - in org.apache.hadoop.fs.contract.s3a.TestS3AContractMkdir
Running org.apache.hadoop.fs.contract.s3a.TestS3AContractOpen
Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 10.412 sec - in org.apache.hadoop.fs.contract.s3a.TestS3AContractOpen
Running org.apache.hadoop.fs.contract.s3a.TestS3AContractRename
Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 25.687 sec - in org.apache.hadoop.fs.contract.s3a.TestS3AContractRename
Running org.apache.hadoop.fs.contract.s3a.TestS3AContractRootDir
Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 5.29 sec - in org.apache.hadoop.fs.contract.s3a.TestS3AContractRootDir
Running org.apache.hadoop.fs.contract.s3a.TestS3AContractSeek
Tests run: 10, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 36.943 sec - in org.apache.hadoop.fs.contract.s3a.TestS3AContractSeek
Running org.apache.hadoop.fs.contract.s3n.TestS3NContractCreate
Tests run: 6, Failures: 0, Errors: 0, Skipped: 3, Time elapsed: 16.791 sec - in org.apache.hadoop.fs.contract.s3n.TestS3NContractCreate
Running org.apache.hadoop.fs.contract.s3n.TestS3NContractDelete
Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 12.891 sec - in org.apache.hadoop.fs.contract.s3n.TestS3NContractDelete
Running org.apache.hadoop.fs.contract.s3n.TestS3NContractMkdir
Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 9.791 sec - in org.apache.hadoop.fs.contract.s3n.TestS3NContractMkdir
Running org.apache.hadoop.fs.contract.s3n.TestS3NContractOpen
Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 8.736 sec - in org.apache.hadoop.fs.contract.s3n.TestS3NContractOpen
Running org.apache.hadoop.fs.contract.s3n.TestS3NContractRename
Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 22.308 sec - in org.apache.hadoop.fs.contract.s3n.TestS3NContractRename
Running org.apache.hadoop.fs.contract.s3n.TestS3NContractRootDir
Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 3.716 sec - in org.apache.hadoop.fs.contract.s3n.TestS3NContractRootDir
Running org.apache.hadoop.fs.contract.s3n.TestS3NContractSeek
Tests run: 10, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 33.433 sec - in org.apache.hadoop.fs.contract.s3n.TestS3NContractSeek
Running org.apache.hadoop.fs.s3.TestInMemoryS3FileSystemContract
Tests run: 31, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 2.641 sec - in org.apache.hadoop.fs.s3.TestInMemoryS3FileSystemContract
Running org.apache.hadoop.fs.s3.TestINode
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.127 sec - in org.apache.hadoop.fs.s3.TestINode
Running org.apache.hadoop.fs.s3.TestS3Credentials
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.36 sec - in org.apache.hadoop.fs.s3.TestS3Credentials
Running org.apache.hadoop.fs.s3.TestS3FileSystem
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.362 sec - in org.apache.hadoop.fs.s3.TestS3FileSystem
Running org.apache.hadoop.fs.s3.TestS3InMemoryFileSystem
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.754 sec - in org.apache.hadoop.fs.s3.TestS3InMemoryFileSystem
Running org.apache.hadoop.fs.s3a.scale.TestS3ADeleteManyFiles
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 500.943 sec - in org.apache.hadoop.fs.s3a.scale.TestS3ADeleteManyFiles
Running org.apache.hadoop.fs.s3a.TestS3ABlocksize
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 6.25 sec - in org.apache.hadoop.fs.s3a.TestS3ABlocksize
Running org.apache.hadoop.fs.s3a.TestS3AConfiguration
Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 4.334 sec - in org.apache.hadoop.fs.s3a.TestS3AConfiguration
Running org.apache.hadoop.fs.s3a.TestS3AFastOutputStream
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 9.867 sec - in org.apache.hadoop.fs.s3a.TestS3AFastOutputStream
Running org.apache.hadoop.fs.s3a.TestS3AFileSystemContract
Tests run: 31, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 79.965 sec - in
{code}
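A hedged sketch of the guard described above (bucket, key, client, contentLength, pos, and wrappedStream are assumed from _S3AInputStream_'s surroundings; this is not the attached patch verbatim):

{code}
private synchronized void reopen(long pos) throws IOException {
  GetObjectRequest request = new GetObjectRequest(bucket, key);
  if (contentLength > 0) {
    // HTTP ranges are inclusive, so the last readable byte is
    // contentLength - 1; skip the header entirely for empty objects,
    // avoiding the invalid "Range: bytes=0--1".
    request.setRange(pos, contentLength - 1);
  }
  wrappedStream = client.getObject(request).getObjectContent();
  this.pos = pos;
}
{code}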
[jira] [Updated] (HADOOP-11742) mkdir by file system shell fails on an empty bucket
[ https://issues.apache.org/jira/browse/HADOOP-11742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Takenori Sato updated HADOOP-11742:
-----------------------------------
    Attachment: HADOOP-11742-branch-2.7.002.patch

I found that _AbstractFSContractTestBase#setup_ always creates a test directory, which is removed at _teardown_. Thus, an empty directory was never tested by the concrete test cases. The problem here is not the mkdir call on an empty bucket itself: it is that _S3AFileSystem#getFileStatus(/)_ throws an exception when called on an empty bucket. To set up such a condition, I chose instead to remove the test directory at setup and make teardown a no-op. Without this fix, TestS3AContractRootDir then failed as follows.

{code}
-------------------------------------------------------
 T E S T S
-------------------------------------------------------
Running org.apache.hadoop.fs.contract.s3a.TestS3AContractRootDir
Tests run: 5, Failures: 0, Errors: 4, Skipped: 0, Time elapsed: 8.027 sec <<< FAILURE! - in org.apache.hadoop.fs.contract.s3a.TestS3AContractRootDir
testRmEmptyRootDirNonRecursive(org.apache.hadoop.fs.contract.s3a.TestS3AContractRootDir)  Time elapsed: 2.82 sec  <<< ERROR!
java.io.FileNotFoundException: No such file or directory: /
	at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:995)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:77)
	at org.apache.hadoop.fs.contract.ContractTestUtils.assertIsDirectory(ContractTestUtils.java:464)
	at org.apache.hadoop.fs.contract.AbstractContractRootDirectoryTest.testRmEmptyRootDirNonRecursive(AbstractContractRootDirectoryTest.java:63)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
	at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)

testRmRootRecursive(org.apache.hadoop.fs.contract.s3a.TestS3AContractRootDir)  Time elapsed: 0.475 sec  <<< ERROR!
java.io.FileNotFoundException: No such file or directory: /
	at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:995)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:77)
	at org.apache.hadoop.fs.contract.ContractTestUtils.assertIsDirectory(ContractTestUtils.java:464)
	at org.apache.hadoop.fs.contract.AbstractContractRootDirectoryTest.testRmRootRecursive(AbstractContractRootDirectoryTest.java:96)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
	at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)

testCreateFileOverRoot(org.apache.hadoop.fs.contract.s3a.TestS3AContractRootDir)  Time elapsed: 2.922 sec  <<< ERROR!
com.amazonaws.services.s3.model.AmazonS3Exception: Status Code: 400, AWS Service: Amazon S3, AWS Request ID: 368CF290D38711E4, AWS Error Code: MalformedXML, AWS Error Message: The XML you provided was not well-formed or did not validate against our published schema.
	at com.amazonaws.http.AmazonHttpClient.handleErrorResponse(AmazonHttpClient.java:798)
	at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:421)
	at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:232)
{code}
[jira] [Commented] (HADOOP-11742) mkdir by file system shell fails on an empty bucket
[ https://issues.apache.org/jira/browse/HADOOP-11742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14379380#comment-14379380 ]

Takenori Sato commented on HADOOP-11742:
----------------------------------------

I checked how this is covered in the test cases. _NativeS3FileSystemContractBaseTest#testListStatusForRoot_ looks like a relevant test for s3n, but nothing similar is found for s3a. _TestS3AContractRootDir_ is supposed to test this scenario, correct?

mkdir by file system shell fails on an empty bucket
---------------------------------------------------
                 Key: HADOOP-11742
                 URL: https://issues.apache.org/jira/browse/HADOOP-11742
             Project: Hadoop Common
          Issue Type: Bug
          Components: fs/s3
         Environment: CentOS 7
            Reporter: Takenori Sato
         Attachments: HADOOP-11742-branch-2.7.001.patch
[jira] [Updated] (HADOOP-11742) mkdir by file system shell fails on an empty bucket
[ https://issues.apache.org/jira/browse/HADOOP-11742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Takenori Sato updated HADOOP-11742:
-----------------------------------
    Attachment: HADOOP-11742-branch-2.7.001.patch

An empty key means a root directory instead of Not Found. This is the same behavior as _NativeS3FileSystem#getFileStatus_.

{code}
# hadoop-2.7.0-SNAPSHOT/bin/hdfs dfs -ls s3a://s3a/
15/03/25 06:28:05 DEBUG s3a.S3AFileSystem: Getting path status for s3a://s3a/ ()
15/03/25 06:28:05 DEBUG s3a.S3AFileSystem: List status for path: s3a://s3a/
15/03/25 06:28:05 DEBUG s3a.S3AFileSystem: Getting path status for s3a://s3a/ ()
15/03/25 06:28:05 DEBUG s3a.S3AFileSystem: listStatus: doing listObjects for directory

# hadoop-2.7.0-SNAPSHOT/bin/hdfs dfs -mkdir s3a://s3a/foo
15/03/25 06:28:22 DEBUG s3a.S3AFileSystem: Getting path status for s3a://s3a/foo (foo)
15/03/25 06:28:23 DEBUG s3a.S3AFileSystem: Not Found: s3a://s3a/foo
15/03/25 06:28:23 DEBUG s3a.S3AFileSystem: Getting path status for s3a://s3a/ ()
15/03/25 06:28:23 DEBUG s3a.S3AFileSystem: Making directory: s3a://s3a/foo
15/03/25 06:28:23 DEBUG s3a.S3AFileSystem: Getting path status for s3a://s3a/foo (foo)
15/03/25 06:28:23 DEBUG s3a.S3AFileSystem: Not Found: s3a://s3a/foo
15/03/25 06:28:23 DEBUG s3a.S3AFileSystem: Getting path status for s3a://s3a/foo (foo)
15/03/25 06:28:24 DEBUG s3a.S3AFileSystem: Not Found: s3a://s3a/foo
15/03/25 06:28:24 DEBUG s3a.S3AFileSystem: Getting path status for s3a://s3a/ ()

# hadoop-2.7.0-SNAPSHOT/bin/hdfs dfs -ls s3a://s3a/
15/03/25 06:28:31 DEBUG s3a.S3AFileSystem: Getting path status for s3a://s3a/ ()
15/03/25 06:28:31 DEBUG s3a.S3AFileSystem: List status for path: s3a://s3a/
15/03/25 06:28:31 DEBUG s3a.S3AFileSystem: Getting path status for s3a://s3a/ ()
15/03/25 06:28:31 DEBUG s3a.S3AFileSystem: listStatus: doing listObjects for directory
15/03/25 06:28:31 DEBUG s3a.S3AFileSystem: Adding: rd: s3a://s3a/foo
Found 1 items
drwxrwxrwx   -          0 1970-01-01 00:00 s3a://s3a/foo
{code}

mkdir by file system shell fails on an empty bucket
---------------------------------------------------
                 Key: HADOOP-11742
                 URL: https://issues.apache.org/jira/browse/HADOOP-11742
             Project: Hadoop Common
          Issue Type: Bug
          Components: fs/s3
         Environment: CentOS 7
            Reporter: Takenori Sato
         Attachments: HADOOP-11742-branch-2.7.001.patch
[jira] [Commented] (HADOOP-11753) TestS3AContractOpen#testOpenReadZeroByteFile fails due to negative range header
[ https://issues.apache.org/jira/browse/HADOOP-11753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14381384#comment-14381384 ]

Takenori Sato commented on HADOOP-11753:
----------------------------------------

Hi, OK, thanks. I was about to start, so I leave this to you.

TestS3AContractOpen#testOpenReadZeroByteFile fails due to negative range header
-------------------------------------------------------------------------------
                 Key: HADOOP-11753
                 URL: https://issues.apache.org/jira/browse/HADOOP-11753
             Project: Hadoop Common
          Issue Type: Bug
          Components: fs/s3
            Reporter: Takenori Sato
            Assignee: J.Andreina
[jira] [Created] (HADOOP-11753) TestS3AContractOpen#testOpenReadZeroByteFile fals due to negative range header
Takenori Sato created HADOOP-11753:
--------------------------------------
             Summary: TestS3AContractOpen#testOpenReadZeroByteFile fals due to negative range header
                 Key: HADOOP-11753
                 URL: https://issues.apache.org/jira/browse/HADOOP-11753
             Project: Hadoop Common
          Issue Type: Bug
          Components: fs/s3
            Reporter: Takenori Sato

_TestS3AContractOpen#testOpenReadZeroByteFile_ fails as follows.

{code}
testOpenReadZeroByteFile(org.apache.hadoop.fs.contract.s3a.TestS3AContractOpen)  Time elapsed: 3.312 sec  <<< ERROR!
com.amazonaws.services.s3.model.AmazonS3Exception: Status Code: 416, AWS Service: Amazon S3, AWS Request ID: A58A95E0D36811E4, AWS Error Code: InvalidRange, AWS Error Message: The requested range cannot be satisfied.
	at com.amazonaws.http.AmazonHttpClient.handleErrorResponse(AmazonHttpClient.java:798)
	at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:421)
	at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:232)
	at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3528)
	at com.amazonaws.services.s3.AmazonS3Client.getObject(AmazonS3Client.java:)
	at org.apache.hadoop.fs.s3a.S3AInputStream.reopen(S3AInputStream.java:91)
	at org.apache.hadoop.fs.s3a.S3AInputStream.openIfNeeded(S3AInputStream.java:62)
	at org.apache.hadoop.fs.s3a.S3AInputStream.read(S3AInputStream.java:127)
	at java.io.FilterInputStream.read(FilterInputStream.java:83)
	at org.apache.hadoop.fs.contract.AbstractContractOpenTest.testOpenReadZeroByteFile(AbstractContractOpenTest.java:66)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
	at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
{code}

This is because the Range header is wrong when calling _S3AInputStream#read_ after _S3AInputStream#open_:

{code}
Range: bytes=0--1
  * from 0 to -1
{code}

Tested on the latest branch-2.7.

{quote}
$ git log
commit d286673c602524af08935ea132c8afd181b6e2e4
Author: Jitendra Pandey <Jitendra@Jitendra-Pandeys-MacBook-Pro-4.local>
Date:   Tue Mar 24 16:17:06 2015 -0700
{quote}
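For context, a hedged reconstruction of how the invalid header arises (names assumed, not the actual S3AInputStream source): on a zero-byte object, contentLength is 0, so an unconditional range of pos through contentLength - 1 becomes 0 through -1, which the AWS SDK serializes as the rejected header.

{code}
// Hypothetical reconstruction: with pos == 0 and contentLength == 0,
// this produces "Range: bytes=0--1", which S3 rejects with 416.
GetObjectRequest request = new GetObjectRequest(bucket, key);
request.setRange(pos, contentLength - 1);
{code}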
[jira] [Updated] (HADOOP-11753) TestS3AContractOpen#testOpenReadZeroByteFile fails due to negative range header
[ https://issues.apache.org/jira/browse/HADOOP-11753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takenori Sato updated HADOOP-11753: --- Summary: TestS3AContractOpen#testOpenReadZeroByteFile fails due to negative range header (was: TestS3AContractOpen#testOpenReadZeroByteFile fals due to negative range header) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-11742) mkdir by file system shell fails on an empty bucket
[ https://issues.apache.org/jira/browse/HADOOP-11742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14381351#comment-14381351 ] Takenori Sato commented on HADOOP-11742: OK, will do. But I found that the current s3a-related unit tests do not finish successfully. Filed as HADOOP-11753. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-11753) TestS3AContractOpen#testOpenReadZeroByteFile fails due to negative range header
[ https://issues.apache.org/jira/browse/HADOOP-11753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14381338#comment-14381338 ] Takenori Sato commented on HADOOP-11753: _TestS3AContractSeek#testBlockReadZeroByteFile_ fails for the same reason. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-11753) TestS3AContractOpen#testOpenReadZeroByteFile fails due to negative range header
[ https://issues.apache.org/jira/browse/HADOOP-11753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14381343#comment-14381343 ] Takenori Sato commented on HADOOP-11753: Another one. {code} testSeekZeroByteFile(org.apache.hadoop.fs.contract.s3a.TestS3AContractSeek) Time elapsed: 9.478 sec ERROR! com.amazonaws.services.s3.model.AmazonS3Exception: Status Code: 416, AWS Service: Amazon S3, AWS Request ID: 29E6B1A0D37011E4, AWS Error Code: InvalidRange, AWS Error Message: The requested range cannot be satisfied. at com.amazonaws.http.AmazonHttpClient.handleErrorResponse(AmazonHttpClient.java:798) at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:421) at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:232) at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3528) at com.amazonaws.services.s3.AmazonS3Client.getObject(AmazonS3Client.java:) at org.apache.hadoop.fs.s3a.S3AInputStream.reopen(S3AInputStream.java:91) at org.apache.hadoop.fs.s3a.S3AInputStream.openIfNeeded(S3AInputStream.java:62) at org.apache.hadoop.fs.s3a.S3AInputStream.read(S3AInputStream.java:127) at java.io.FilterInputStream.read(FilterInputStream.java:83) at org.apache.hadoop.fs.contract.AbstractContractSeekTest.testSeekZeroByteFile(AbstractContractSeekTest.java:88) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74) {code}
[jira] [Created] (HADOOP-11742) mkdir by file system shell fails on an empty bucket
Takenori Sato created HADOOP-11742: -- Summary: mkdir by file system shell fails on an empty bucket Key: HADOOP-11742 URL: https://issues.apache.org/jira/browse/HADOOP-11742 Project: Hadoop Common Issue Type: Bug Components: fs/s3 Environment: CentOS 7 Reporter: Takenori Sato I have built the latest 2.7, and tried S3AFileSystem. Then found that _mkdir_ fails on an empty bucket, named *s3a* here, as follows: {code} # hadoop-2.7.0-SNAPSHOT/bin/hdfs dfs -mkdir s3a://s3a/foo 15/03/24 03:49:35 DEBUG s3a.S3AFileSystem: Getting path status for s3a://s3a/foo (foo) 15/03/24 03:49:36 DEBUG s3a.S3AFileSystem: Not Found: s3a://s3a/foo 15/03/24 03:49:36 DEBUG s3a.S3AFileSystem: Getting path status for s3a://s3a/ () 15/03/24 03:49:36 DEBUG s3a.S3AFileSystem: Not Found: s3a://s3a/ mkdir: `s3a://s3a/foo': No such file or directory {code} So does _ls_. {code} # hadoop-2.7.0-SNAPSHOT/bin/hdfs dfs -ls s3a://s3a/ 15/03/24 03:47:48 DEBUG s3a.S3AFileSystem: Getting path status for s3a://s3a/ () 15/03/24 03:47:48 DEBUG s3a.S3AFileSystem: Not Found: s3a://s3a/ ls: `s3a://s3a/': No such file or directory {code} This is how it works via s3n. {code} # hadoop-2.7.0-SNAPSHOT/bin/hdfs dfs -ls s3n://s3n/ # hadoop-2.7.0-SNAPSHOT/bin/hdfs dfs -mkdir s3n://s3n/foo # hadoop-2.7.0-SNAPSHOT/bin/hdfs dfs -ls s3n://s3n/ Found 1 items drwxrwxrwx - 0 1970-01-01 00:00 s3n://s3n/foo {code} The snapshot is the following: {quote} \# git branch \* branch-2.7 trunk \# git log commit 929b04ce3a4fe419dece49ed68d4f6228be214c1 Author: Harsh J ha...@cloudera.com Date: Sun Mar 22 10:18:32 2015 +0530 {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
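The debug output above points at the root cause: getFileStatus() resolves s3a://s3a/ by key lookup, and an empty bucket has no keys at all, so even the bucket root reports Not Found. A minimal sketch of the special case an empty bucket needs (a hypothetical helper, not the actual HADOOP-11742 patch):
{code}
// Hypothetical sketch of the special case an empty bucket needs --
// not the HADOOP-11742 patch itself. The root of a bucket must exist
// as a directory even when listObjects() returns no keys.
import java.net.URI;

public final class RootStatusSketch {

  // Returns a status string for a path, given whether any keys exist
  // under it. Mirrors the "Getting path status" / "Not Found" lines
  // in the debug logs above.
  static String pathStatus(URI path, boolean anyKeys) {
    String key = path.getPath().replaceFirst("^/+", "");
    if (key.isEmpty()) {
      // Bucket root: always an existing directory; only its emptiness
      // depends on the listing.
      return "directory (empty? " + !anyKeys + ")";
    }
    return anyKeys ? "found" : "Not Found";
  }

  public static void main(String[] args) {
    System.out.println(pathStatus(URI.create("s3a://s3a/"), false));    // directory (empty? true)
    System.out.println(pathStatus(URI.create("s3a://s3a/foo"), false)); // Not Found
  }
}
{code}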
[jira] [Updated] (HADOOP-11730) The broken s3n read retry logic causes a wrong output being committed
[ https://issues.apache.org/jira/browse/HADOOP-11730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takenori Sato updated HADOOP-11730: --- Attachment: (was: HADOOP-11730-branch-2.6.0.001.patch)
[jira] [Updated] (HADOOP-11730) The broken s3n read retry logic causes a wrong output being committed
[ https://issues.apache.org/jira/browse/HADOOP-11730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takenori Sato updated HADOOP-11730: --- Attachment: HADOOP-11730-branch-2.6.0.001.patch The first patch with the updated test case.
[jira] [Resolved] (HADOOP-10037) s3n read truncated, but doesn't throw exception
[ https://issues.apache.org/jira/browse/HADOOP-10037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takenori Sato resolved HADOOP-10037. Resolution: Fixed The issue that had reopened this turned out to be a separate issue. s3n read truncated, but doesn't throw exception Key: HADOOP-10037 URL: https://issues.apache.org/jira/browse/HADOOP-10037 Project: Hadoop Common Issue Type: Bug Components: fs/s3 Affects Versions: 2.0.0-alpha Environment: Ubuntu Linux 13.04 running on Amazon EC2 (cc2.8xlarge) Reporter: David Rosenstrauch Fix For: 2.6.0 Attachments: S3ReadFailedOnTruncation.html, S3ReadSucceeded.html For months now we've been experiencing frequent data truncation issues when reading from S3 using the s3n:// protocol. I finally was able to gather some debugging output on the issue in a job I ran last night, and so can finally file a bug report. The job I ran last night was on a 16-node cluster (all of them AWS EC2 cc2.8xlarge machines, running Ubuntu 13.04 and Cloudera CDH4.3.0). The job was a Hadoop streaming job, which reads through a large number (i.e., ~55,000) of files on S3, each of them approximately 300K bytes in size. All of the files contain 46 columns of data in each record. But I added in an extra check in my mapper code to count and verify the number of columns in every record - throwing an error and crashing the map task if the column count is wrong. If you look in the attached task logs, you'll see 2 attempts on the same task. The first one fails due to truncated data (i.e., my job intentionally fails the map task because the current record fails the column count check). The task then gets retried on a different machine and runs to a successful completion. You can see further evidence of the truncation further down in the task logs, where it displays the count of the records read: the failed task says 32953 records read, while the successful task says 63133. Any idea what the problem might be here and/or how to work around it? This issue is a very common occurrence on our clusters. E.g., in the job I ran last night before I had gone to bed I had already encountered 8 such failures, and the job was only 10% complete. (~25,000 out of ~250,000 tasks.) I realize that it's common for I/O errors to occur - possibly even frequently - in a large Hadoop job. But I would think that if an I/O failure (like a truncated read) did occur, then something in the underlying infrastructure code (i.e., either in NativeS3FileSystem or in jets3t) should detect the error and throw an IOException accordingly. It shouldn't be up to the calling code to detect such failures, IMO. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-10037) s3n read truncated, but doesn't throw exception
[ https://issues.apache.org/jira/browse/HADOOP-10037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14370292#comment-14370292 ] Takenori Sato commented on HADOOP-10037: David, thanks for your clarification. I heard from Steve that my issue was introduced by some optimizations done for 2.4. So let me close this as FIXED. I will create a new issue for mine. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HADOOP-11730) The broken s3n read retry logic causes a wrong output being committed
Takenori Sato created HADOOP-11730: -- Summary: The broken s3n read retry logic causes a wrong output being committed Key: HADOOP-11730 URL: https://issues.apache.org/jira/browse/HADOOP-11730 Project: Hadoop Common Issue Type: Bug Components: fs/s3 Affects Versions: 2.6.0 Environment: HDP 2.2 Reporter: Takenori Sato Assignee: Takenori Sato s3n attempts to read again when it encounters an IOException during a read. But the current logic does not reopen the connection; thus it ends up as a no-op and commits the wrong (truncated) output. Here's a stack trace as an example. {quote} 2015-03-13 20:17:24,835 [TezChild] INFO org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor - Starting output org.apache.tez.mapreduce.output.MROutput@52008dbd to vertex scope-12 2015-03-13 20:17:24,866 [TezChild] DEBUG org.jets3t.service.impl.rest.httpclient.HttpMethodReleaseInputStream - Released HttpMethod as its response data stream threw an exception org.apache.http.ConnectionClosedException: Premature end of Content-Length delimited message body (expected: 296587138; received: 155648 at org.apache.http.impl.io.ContentLengthInputStream.read(ContentLengthInputStream.java:184) at org.apache.http.conn.EofSensorInputStream.read(EofSensorInputStream.java:138) at org.jets3t.service.io.InterruptableInputStream.read(InterruptableInputStream.java:78) at org.jets3t.service.impl.rest.httpclient.HttpMethodReleaseInputStream.read(HttpMethodReleaseInputStream.java:146) at org.apache.hadoop.fs.s3native.NativeS3FileSystem$NativeS3FsInputStream.read(NativeS3FileSystem.java:145) at java.io.BufferedInputStream.read1(BufferedInputStream.java:273) at java.io.BufferedInputStream.read(BufferedInputStream.java:334) at java.io.DataInputStream.read(DataInputStream.java:100) at org.apache.hadoop.util.LineReader.fillBuffer(LineReader.java:180) at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:216) at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174) at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.nextKeyValue(LineRecordReader.java:185) at org.apache.pig.builtin.PigStorage.getNext(PigStorage.java:259) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:204) at org.apache.tez.mapreduce.lib.MRReaderMapReduce.next(MRReaderMapReduce.java:116) at org.apache.pig.backend.hadoop.executionengine.tez.plan.operator.POSimpleTezLoad.getNextTuple(POSimpleTezLoad.java:106) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:307) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNextTuple(POForEach.java:246) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:307) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFilter.getNextTuple(POFilter.java:91) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:307) at org.apache.pig.backend.hadoop.executionengine.tez.plan.operator.POStoreTez.getNextTuple(POStoreTez.java:117) at org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.runPipeline(PigProcessor.java:313) at org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.run(PigProcessor.java:192) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324) at
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:176) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:168) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:168) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:163) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) 2015-03-13 20:17:24,867 [TezChild] INFO org.apache.hadoop.fs.s3native.NativeS3FileSystem - Received IOException while reading
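A minimal sketch of the retry behaviour being described, with hypothetical names rather than the NativeS3FileSystem code: if the catch block retries on the same dead stream, read() just returns -1, the caller sees a clean EOF, and the truncated output gets committed. Reopening at the current position is the missing step.
{code}
// Sketch only -- hypothetical names, not the NativeS3FileSystem code.
import java.io.IOException;
import java.io.InputStream;

abstract class ReopeningS3ReadSketch {
  private InputStream in;  // the wrapped S3 object stream
  private long pos;        // bytes successfully read so far

  ReopeningS3ReadSketch(InputStream initial) {
    this.in = initial;
  }

  // Assumed helper: issues a fresh GET starting at 'offset'
  // (e.g. via a Range header) and returns the new body stream.
  abstract InputStream reopenAt(long offset) throws IOException;

  int read() throws IOException {
    int b;
    try {
      b = in.read();
    } catch (IOException e) {
      // The broken logic effectively retried on the same dead stream,
      // got -1, and the truncated result looked like a normal EOF.
      in = reopenAt(pos);  // the missing step: reconnect at pos
      b = in.read();
    }
    if (b >= 0) {
      pos++;
    }
    return b;
  }
}
{code}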
[jira] [Updated] (HADOOP-11730) The broken s3n read retry logic causes a wrong output being committed
[ https://issues.apache.org/jira/browse/HADOOP-11730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takenori Sato updated HADOOP-11730: --- Attachment: HADOOP-11730-branch-2.6.0.001.patch The first proposal without the test case. {quote} 2015-03-20 12:05:08,473 [TezChild] INFO org.apache.hadoop.fs.s3native.NativeS3FileSystem - Received IOException while reading 'user/hadoop/tsato/readlarge/input/cloudian-s3.log.20141119', attempting to reopen. 2015-03-20 12:05:08,473 [TezChild] DEBUG org.jets3t.service.impl.rest.httpclient.RestStorageService - Retrieving All information for bucket shared and object user/hadoop/tsato/readlarge/input/cloudian-s3.log.20141119 {quote} Verified manually that it reopens a new connection after IOException.
[jira] [Reopened] (HADOOP-10037) s3n read truncated, but doesn't throw exception
[ https://issues.apache.org/jira/browse/HADOOP-10037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takenori Sato reopened HADOOP-10037: I confirmed this happens on Hadoop 2.6.0, and found the reason. Here's the stacktrace. {quote} 2015-03-13 20:17:24,866 [TezChild] DEBUG org.jets3t.service.impl.rest.httpclient.HttpMethodReleaseInputStream - Released HttpMethod as its response data stream threw an exception org.apache.http.ConnectionClosedException: Premature end of Content-Length delimited message body (expected: 296587138; received: 155648 at org.apache.http.impl.io.ContentLengthInputStream.read(ContentLengthInputStream.java:184) at org.apache.http.conn.EofSensorInputStream.read(EofSensorInputStream.java:138) at org.jets3t.service.io.InterruptableInputStream.read(InterruptableInputStream.java:78) at org.jets3t.service.impl.rest.httpclient.HttpMethodReleaseInputStream.read(HttpMethodReleaseInputStream.java:146) at org.apache.hadoop.fs.s3native.NativeS3FileSystem$NativeS3FsInputStream.read(NativeS3FileSystem.java:145) at java.io.BufferedInputStream.read1(BufferedInputStream.java:273) at java.io.BufferedInputStream.read(BufferedInputStream.java:334) at java.io.DataInputStream.read(DataInputStream.java:100) at org.apache.hadoop.util.LineReader.fillBuffer(LineReader.java:180) at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:216) at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174) at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.nextKeyValue(LineRecordReader.java:185) at org.apache.pig.builtin.PigStorage.getNext(PigStorage.java:259) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:204) at org.apache.tez.mapreduce.lib.MRReaderMapReduce.next(MRReaderMapReduce.java:116) at org.apache.pig.backend.hadoop.executionengine.tez.plan.operator.POSimpleTezLoad.getNextTuple(POSimpleTezLoad.java:106) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:307) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNextTuple(POForEach.java:246) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:307) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFilter.getNextTuple(POFilter.java:91) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:307) at org.apache.pig.backend.hadoop.executionengine.tez.plan.operator.POStoreTez.getNextTuple(POStoreTez.java:117) at org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.runPipeline(PigProcessor.java:313) at org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.run(PigProcessor.java:192) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:176) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:168) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:168) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:163) 
at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) 2015-03-13 20:17:24,867 [TezChild] INFO org.apache.hadoop.fs.s3native.NativeS3FileSystem - Received IOException while reading 'user/hadoop/tsato/readlarge/input/cloudian-s3.log.20141119', attempting to reopen. 2015-03-13 20:17:24,867 [TezChild] DEBUG org.jets3t.service.impl.rest.httpclient.HttpMethodReleaseInputStream - Released HttpMethod as its response data stream is fully consumed 2015-03-13 20:17:24,868 [TezChild] INFO org.apache.tez.dag.app.TaskAttemptListenerImpTezDag - Commit go/no-go request from attempt_1426245338920_0001_1_00_04_0 2015-03-13 20:17:24,868 [TezChild] INFO org.apache.tez.dag.app.dag.impl.TaskImpl - attempt_1426245338920_0001_1_00_04_0 given a go for committing the task output. {quote} The problem is that a job successfully finishes after the exception.
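One way to make such truncation fail loudly, as the original report asks for, is to check the received byte count against the expected Content-Length once the stream reports EOF. A rough sketch under that assumption (a hypothetical helper, not Hadoop or jets3t code):
{code}
// Rough sketch -- hypothetical helper, not Hadoop/jets3t code. Assumes
// the expected length is known from the Content-Length header.
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

final class LengthCheckSketch {

  // Drains the stream and throws if fewer bytes arrive than promised,
  // instead of letting the caller commit a truncated result.
  static long drainFully(InputStream in, long expected) throws IOException {
    long received = 0;
    byte[] buf = new byte[8192];
    int n;
    while ((n = in.read(buf)) != -1) {
      received += n;
    }
    if (received != expected) {
      // Same shape as the message in the traces above:
      // "expected: 296587138; received: 155648"
      throw new EOFException("Premature end of body (expected: "
          + expected + "; received: " + received + ")");
    }
    return received;
  }
}
{code}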
[jira] [Commented] (HADOOP-10400) Incorporate new S3A FileSystem implementation
[ https://issues.apache.org/jira/browse/HADOOP-10400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14104967#comment-14104967 ] Takenori Sato commented on HADOOP-10400: Hi Jordan Mendelson, I came from HADOOP-10643, where you suggested that a new improvement over NativeS3FileSystem should be done here. So I've made 2 pull requests for your upstream repository. 1. make endpoint configurable https://github.com/Aloisius/hadoop-s3a/pull/8 jets3t allows a user to configure an endpoint (protocol, host, and port) through jets3t.properties, but with the AWS SDK a user can't configure one without calling a particular method. This fix simply allows it. 2. subclass of AbstractFileSystem https://github.com/Aloisius/hadoop-s3a/pull/9 This contains a fix for a problem similar to HADOOP-10643. The difference is that this fix is simpler and requires no modification to AbstractFileSystem. Also, when using this subclass, HADOOP-8984 becomes obvious, so its fix is included as well. Btw, on my test with Pig, I needed to apply the following fix to make this work. Ensure the file is open before trying to seek https://github.com/Aloisius/hadoop-s3a/pull/6 Incorporate new S3A FileSystem implementation - Key: HADOOP-10400 URL: https://issues.apache.org/jira/browse/HADOOP-10400 Project: Hadoop Common Issue Type: Improvement Components: fs, fs/s3 Affects Versions: 2.4.0 Reporter: Jordan Mendelson Assignee: Jordan Mendelson Attachments: HADOOP-10400-1.patch, HADOOP-10400-2.patch, HADOOP-10400-3.patch, HADOOP-10400-4.patch, HADOOP-10400-5.patch, HADOOP-10400-6.patch The s3native filesystem has a number of limitations (some of which were recently fixed by HADOOP-9454). This patch adds an s3a filesystem which uses the aws-sdk instead of the jets3t library. There are a number of improvements over s3native including: - Parallel copy (rename) support (dramatically speeds up commits on large files) - AWS S3 explorer compatible empty directory files "xyz/" instead of "xyz_$folder$" (reduces littering) - Ignores _$folder$ files created by s3native and other S3 browsing utilities - Supports multiple output buffer dirs to even out IO when uploading files - Supports IAM role-based authentication - Allows setting a default canned ACL for uploads (public, private, etc.) - Better error recovery handling - Should handle input seeks without having to download the whole file (used for splits a lot) This code is a copy of https://github.com/Aloisius/hadoop-s3a with patches to various pom files to get it to build against trunk. I've been using 0.0.1 in production with CDH 4 for several months and CDH 5 for a few days. The version here is 0.0.2 which changes around some keys to hopefully bring the key name style more in line with the rest of hadoop 2.x.
*Tunable parameters:* fs.s3a.access.key - Your AWS access key ID (omit for role authentication) fs.s3a.secret.key - Your AWS secret key (omit for role authentication) fs.s3a.connection.maximum - Controls how many parallel connections HttpClient spawns (default: 15) fs.s3a.connection.ssl.enabled - Enables or disables SSL connections to S3 (default: true) fs.s3a.attempts.maximum - How many times we should retry commands on transient errors (default: 10) fs.s3a.connection.timeout - Socket connect timeout (default: 5000) fs.s3a.paging.maximum - How many keys to request from S3 when doing directory listings at a time (default: 5000) fs.s3a.multipart.size - How big (in bytes) to split an upload or copy operation into (default: 104857600) fs.s3a.multipart.threshold - Until a file is this large (in bytes), use non-parallel upload (default: 2147483647) fs.s3a.acl.default - Set a canned ACL on newly created/copied objects (private | public-read | public-read-write | authenticated-read | log-delivery-write | bucket-owner-read | bucket-owner-full-control) fs.s3a.multipart.purge - True if you want to purge existing multipart uploads that may not have been completed/aborted correctly (default: false) fs.s3a.multipart.purge.age - Minimum age in seconds of multipart uploads to purge (default: 86400) fs.s3a.buffer.dir - Comma-separated list of directories that will be used to buffer file writes out of (default: uses ${hadoop.tmp.dir}/s3a ) *Caveats*: Hadoop uses a standard output committer which uploads files as filename.COPYING before renaming them. This can cause unnecessary performance issues with S3 because it does not have a rename operation and S3 already verifies uploads against an md5 that the driver sets on the upload request. While this FileSystem should be significantly faster than the built-in s3native driver
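On the AbstractFileSystem pull request mentioned in this comment: the usual Hadoop pattern for exposing a FileSystem through the AbstractFileSystem API is a thin DelegateToFileSystem subclass. A sketch of that shape, shown for illustration rather than as the pull request's exact code:
{code}
// Sketch of exposing S3AFileSystem through the AbstractFileSystem API
// via DelegateToFileSystem -- the common Hadoop pattern, shown here
// for illustration rather than as the pull request's exact code.
import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.DelegateToFileSystem;
import org.apache.hadoop.fs.s3a.S3AFileSystem;

public class S3A extends DelegateToFileSystem {
  public S3A(URI theUri, Configuration conf)
      throws IOException, URISyntaxException {
    // Scheme "s3a"; the authority (the bucket) is carried in the URI
    // but is not strictly required, hence 'false'.
    super(theUri, new S3AFileSystem(), conf, "s3a", false);
  }
}
{code}
And to show the tunables above in use, a small driver that sets a few of the listed keys programmatically; the bucket name and credential values are placeholders:
{code}
// Placeholder values; the keys are the ones listed in this comment.
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public final class S3ATunablesExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.set("fs.s3a.access.key", "ACCESS_KEY");        // omit for IAM roles
    conf.set("fs.s3a.secret.key", "SECRET_KEY");        // omit for IAM roles
    conf.setInt("fs.s3a.connection.maximum", 15);
    conf.setLong("fs.s3a.multipart.size", 104857600L);  // 100 MB parts
    conf.set("fs.s3a.buffer.dir", "/tmp/s3a");
    FileSystem fs = FileSystem.get(URI.create("s3a://bucket/"), conf);
    System.out.println(fs.getFileStatus(new Path("s3a://bucket/")));
  }
}
{code}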