[jira] [Commented] (HADOOP-11851) s3n to swallow IOEs on inner stream close

2015-04-20 Thread Takenori Sato (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14503893#comment-14503893
 ] 

Takenori Sato commented on HADOOP-11851:


Isn't this a duplicate of HADOOP-11730?

 s3n to swallow IOEs on inner stream close
 -

 Key: HADOOP-11851
 URL: https://issues.apache.org/jira/browse/HADOOP-11851
 Project: Hadoop Common
  Issue Type: Improvement
  Components: fs/s3
Affects Versions: 2.6.0
Reporter: Steve Loughran
Assignee: Anu Engineer
Priority: Minor

 We've seen a situation where some work was failing from (recurrent) 
 connection reset exceptions.
 Irrespective of the root cause, these were surfacing not in the read 
 operations, but when the input stream was being closed - including during a 
 seek().
 These exceptions could be caught and logged at warn level, rather than trigger 
 immediate failures. It shouldn't matter to the next GET whether the last 
 stream closed prematurely, as long as the new one works.
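 A minimal sketch of that behavior (names such as closeQuietly and the logger 
 are illustrative assumptions, not the actual s3n code):
 {code}
 import java.io.IOException;
 import java.io.InputStream;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;

 /**
  * Hedged sketch only: close the inner stream, logging IOExceptions at warn
  * instead of propagating them, since the next GET opens a fresh stream anyway.
  */
 class InnerStreamCloser {
   private static final Logger LOG = LoggerFactory.getLogger(InnerStreamCloser.class);

   static void closeQuietly(InputStream in) {
     if (in == null) {
       return;
     }
     try {
       in.close();
     } catch (IOException e) {
       LOG.warn("Ignoring IOException on inner stream close: " + e, e);
     }
   }
 }
 {code}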



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11742) mkdir by file system shell fails on an empty bucket

2015-04-02 Thread Takenori Sato (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14392251#comment-14392251
 ] 

Takenori Sato commented on HADOOP-11742:


_mkdir_ and _ls_ worked as expected with the fix.

{code}
# hadoop-2.7.0-SNAPSHOT/bin/hdfs dfs -Dfs.s3a.access.key=ACCESS_KEY 
-Dfs.s3a.secret.key=SECRET_KEY -ls s3a://s3atest/
15/04/02 06:52:55 DEBUG s3a.S3AFileSystem: Getting path status for 
s3a://s3atest/ ()
15/04/02 06:52:55 DEBUG s3a.S3AFileSystem: s3a://s3atest/ is empty? true
15/04/02 06:52:55 DEBUG s3a.S3AFileSystem: List status for path: s3a://s3atest/
15/04/02 06:52:55 DEBUG s3a.S3AFileSystem: Getting path status for 
s3a://s3atest/ ()
15/04/02 06:52:55 DEBUG s3a.S3AFileSystem: s3a://s3atest/ is empty? true
15/04/02 06:52:55 DEBUG s3a.S3AFileSystem: listStatus: doing listObjects for 
directory 
# hadoop-2.7.0-SNAPSHOT/bin/hdfs dfs -Dfs.s3a.access.key=ACCESS_KEY 
-Dfs.s3a.secret.key=SECRET_KEY -mkdir s3a://s3atest/root
15/04/02 06:53:20 DEBUG s3a.S3AFileSystem: Getting path status for 
s3a://s3atest/root (root)
15/04/02 06:53:20 DEBUG s3a.S3AFileSystem: Not Found: s3a://s3atest/root
15/04/02 06:53:20 DEBUG s3a.S3AFileSystem: Getting path status for 
s3a://s3atest/ ()
15/04/02 06:53:20 DEBUG s3a.S3AFileSystem: s3a://s3atest/ is empty? true
15/04/02 06:53:20 DEBUG s3a.S3AFileSystem: Making directory: s3a://s3atest/root
15/04/02 06:53:20 DEBUG s3a.S3AFileSystem: Getting path status for 
s3a://s3atest/root (root)
15/04/02 06:53:20 DEBUG s3a.S3AFileSystem: Not Found: s3a://s3atest/root
15/04/02 06:53:20 DEBUG s3a.S3AFileSystem: Getting path status for 
s3a://s3atest/root (root)
15/04/02 06:53:20 DEBUG s3a.S3AFileSystem: Not Found: s3a://s3atest/root
15/04/02 06:53:20 DEBUG s3a.S3AFileSystem: Getting path status for 
s3a://s3atest/ ()
15/04/02 06:53:20 DEBUG s3a.S3AFileSystem: s3a://s3atest/ is empty? true
# hadoop-2.7.0-SNAPSHOT/bin/hdfs dfs -Dfs.s3a.access.key=ACCESS_KEY 
-Dfs.s3a.secret.key=SECRET_KEY -ls s3a://s3atest/
15/04/02 06:53:26 DEBUG s3a.S3AFileSystem: Getting path status for 
s3a://s3atest/ ()
15/04/02 06:53:26 DEBUG s3a.S3AFileSystem: s3a://s3atest/ is empty? false
15/04/02 06:53:26 DEBUG s3a.S3AFileSystem: List status for path: s3a://s3atest/
15/04/02 06:53:26 DEBUG s3a.S3AFileSystem: Getting path status for 
s3a://s3atest/ ()
15/04/02 06:53:26 DEBUG s3a.S3AFileSystem: s3a://s3atest/ is empty? false
15/04/02 06:53:26 DEBUG s3a.S3AFileSystem: listStatus: doing listObjects for 
directory 
15/04/02 06:53:26 DEBUG s3a.S3AFileSystem: Adding: rd: s3a://s3atest/root
Found 1 items
drwxrwxrwx   -  0 1970-01-01 00:00 s3a://s3atest/root 
{code}

The created directory didn't become visible immediately, but the subsequent 
_ls_ showed it had been created successfully.

 mkdir by file system shell fails on an empty bucket
 ---

 Key: HADOOP-11742
 URL: https://issues.apache.org/jira/browse/HADOOP-11742
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs/s3
Affects Versions: 2.7.0
 Environment: CentOS 7
Reporter: Takenori Sato
Assignee: Takenori Sato
Priority: Minor
 Attachments: HADOOP-11742-branch-2.7.001.patch, 
 HADOOP-11742-branch-2.7.002.patch, HADOOP-11742-branch-2.7.003-1.patch, 
 HADOOP-11742-branch-2.7.003-2.patch


 I have built the latest 2.7, and tried S3AFileSystem.
 Then found that _mkdir_ fails on an empty bucket, named *s3a* here, as 
 follows:
 {code}
 # hadoop-2.7.0-SNAPSHOT/bin/hdfs dfs -mkdir s3a://s3a/foo
 15/03/24 03:49:35 DEBUG s3a.S3AFileSystem: Getting path status for 
 s3a://s3a/foo (foo)
 15/03/24 03:49:36 DEBUG s3a.S3AFileSystem: Not Found: s3a://s3a/foo
 15/03/24 03:49:36 DEBUG s3a.S3AFileSystem: Getting path status for s3a://s3a/ 
 ()
 15/03/24 03:49:36 DEBUG s3a.S3AFileSystem: Not Found: s3a://s3a/
 mkdir: `s3a://s3a/foo': No such file or directory
 {code}
 So does _ls_.
 {code}
 # hadoop-2.7.0-SNAPSHOT/bin/hdfs dfs -ls s3a://s3a/
 15/03/24 03:47:48 DEBUG s3a.S3AFileSystem: Getting path status for s3a://s3a/ 
 ()
 15/03/24 03:47:48 DEBUG s3a.S3AFileSystem: Not Found: s3a://s3a/
 ls: `s3a://s3a/': No such file or directory
 {code}
 This is how it works via s3n.
 {code}
 # hadoop-2.7.0-SNAPSHOT/bin/hdfs dfs -ls s3n://s3n/
 # hadoop-2.7.0-SNAPSHOT/bin/hdfs dfs -mkdir s3n://s3n/foo
 # hadoop-2.7.0-SNAPSHOT/bin/hdfs dfs -ls s3n://s3n/
 Found 1 items
 drwxrwxrwx   -  0 1970-01-01 00:00 s3n://s3n/foo
 {code}
 The snapshot is the following:
 {quote}
 \# git branch
 \* branch-2.7
   trunk
 \# git log
 commit 929b04ce3a4fe419dece49ed68d4f6228be214c1
 Author: Harsh J ha...@cloudera.com
 Date:   Sun Mar 22 10:18:32 2015 +0530
 {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11742) mkdir by file system shell fails on an empty bucket

2015-04-02 Thread Takenori Sato (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takenori Sato updated HADOOP-11742:
---
Attachment: HADOOP-11742-branch-2.7.003-1.patch

This patch fixes _S3AFileSystem#getFileStatus_. A dedicated branch for handling 
the root directory was added, which is entered only when key.isEmpty() == true.
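
A minimal sketch of the idea (not the actual patch; the helper name and the use 
of a plain FileStatus are illustrative assumptions):

{code}
import java.io.FileNotFoundException;
import java.io.IOException;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.Path;

class RootStatusSketch {
  /**
   * When the key derived from the path is empty, the path is the bucket root:
   * return a synthetic directory status instead of falling through to the S3
   * lookups, which raise FileNotFoundException on an empty bucket.
   */
  static FileStatus statusForKey(String key, Path qualifiedPath) throws IOException {
    if (key.isEmpty()) {
      // length 0, isdir true, replication 1, blocksize 0, mtime 0
      return new FileStatus(0L, true, 1, 0L, 0L, qualifiedPath);
    }
    // In the real S3AFileSystem the object and prefix lookups happen here;
    // this sketch only models the "not found" outcome.
    throw new FileNotFoundException("No such file or directory: " + qualifiedPath);
  }
}
{code}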

 mkdir by file system shell fails on an empty bucket
 ---

 Key: HADOOP-11742
 URL: https://issues.apache.org/jira/browse/HADOOP-11742
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs/s3
Affects Versions: 2.7.0
 Environment: CentOS 7
Reporter: Takenori Sato
Assignee: Takenori Sato
Priority: Minor
 Attachments: HADOOP-11742-branch-2.7.001.patch, 
 HADOOP-11742-branch-2.7.002.patch, HADOOP-11742-branch-2.7.003-1.patch


 I have built the latest 2.7, and tried S3AFileSystem.
 Then found that _mkdir_ fails on an empty bucket, named *s3a* here, as 
 follows:
 {code}
 # hadoop-2.7.0-SNAPSHOT/bin/hdfs dfs -mkdir s3a://s3a/foo
 15/03/24 03:49:35 DEBUG s3a.S3AFileSystem: Getting path status for 
 s3a://s3a/foo (foo)
 15/03/24 03:49:36 DEBUG s3a.S3AFileSystem: Not Found: s3a://s3a/foo
 15/03/24 03:49:36 DEBUG s3a.S3AFileSystem: Getting path status for s3a://s3a/ 
 ()
 15/03/24 03:49:36 DEBUG s3a.S3AFileSystem: Not Found: s3a://s3a/
 mkdir: `s3a://s3a/foo': No such file or directory
 {code}
 So does _ls_.
 {code}
 # hadoop-2.7.0-SNAPSHOT/bin/hdfs dfs -ls s3a://s3a/
 15/03/24 03:47:48 DEBUG s3a.S3AFileSystem: Getting path status for s3a://s3a/ 
 ()
 15/03/24 03:47:48 DEBUG s3a.S3AFileSystem: Not Found: s3a://s3a/
 ls: `s3a://s3a/': No such file or directory
 {code}
 This is how it works via s3n.
 {code}
 # hadoop-2.7.0-SNAPSHOT/bin/hdfs dfs -ls s3n://s3n/
 # hadoop-2.7.0-SNAPSHOT/bin/hdfs dfs -mkdir s3n://s3n/foo
 # hadoop-2.7.0-SNAPSHOT/bin/hdfs dfs -ls s3n://s3n/
 Found 1 items
 drwxrwxrwx   -  0 1970-01-01 00:00 s3n://s3n/foo
 {code}
 The snapshot is the following:
 {quote}
 \# git branch
 \* branch-2.7
   trunk
 \# git log
 commit 929b04ce3a4fe419dece49ed68d4f6228be214c1
 Author: Harsh J ha...@cloudera.com
 Date:   Sun Mar 22 10:18:32 2015 +0530
 {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11742) mkdir by file system shell fails on an empty bucket

2015-04-02 Thread Takenori Sato (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14392231#comment-14392231
 ] 

Takenori Sato commented on HADOOP-11742:


Patches are verified as follows.

1. run TestS3AContractRootDir to see it succeeds

{code}
---
 T E S T S
---
Running org.apache.hadoop.fs.contract.s3a.TestS3AContractRootDir
Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 4.855 sec - in 
org.apache.hadoop.fs.contract.s3a.TestS3AContractRootDir

Results :

Tests run: 5, Failures: 0, Errors: 0, Skipped: 0

[INFO] 
[INFO] BUILD SUCCESS
[INFO] 
[INFO] Total time: 10.341 s
[INFO] Finished at: 2015-04-02T05:41:48+00:00
[INFO] Final Memory: 28M/407M
[INFO] 
{code}

2. apply the test patch (003-2), and run TestS3AContractRootDir

{code}
---
 T E S T S
---
Running org.apache.hadoop.fs.contract.s3a.TestS3AContractRootDir
Tests run: 5, Failures: 0, Errors: 4, Skipped: 0, Time elapsed: 21.296 sec  
FAILURE! - in org.apache.hadoop.fs.contract.s3a.TestS3AContractRootDir
testRmEmptyRootDirNonRecursive(org.apache.hadoop.fs.contract.s3a.TestS3AContractRootDir)
  Time elapsed: 4.608 sec   ERROR!
java.io.FileNotFoundException: No such file or directory: /
at 
org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:996)
at 
org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:77)
at 
org.apache.hadoop.fs.contract.ContractTestUtils.assertIsDirectory(ContractTestUtils.java:464)
at 
org.apache.hadoop.fs.contract.AbstractContractRootDirectoryTest.testRmEmptyRootDirNonRecursive(AbstractContractRootDirectoryTest.java:70)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at 
org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)

testRmRootRecursive(org.apache.hadoop.fs.contract.s3a.TestS3AContractRootDir)  
Time elapsed: 2.509 sec   ERROR!
java.io.FileNotFoundException: No such file or directory: /
at 
org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:996)
at 
org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:77)
at 
org.apache.hadoop.fs.contract.ContractTestUtils.assertIsDirectory(ContractTestUtils.java:464)
at 
org.apache.hadoop.fs.contract.AbstractContractRootDirectoryTest.testRmRootRecursive(AbstractContractRootDirectoryTest.java:104)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at 
org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)

testCreateFileOverRoot(org.apache.hadoop.fs.contract.s3a.TestS3AContractRootDir)
  Time elapsed: 3.006 sec   ERROR!
com.amazonaws.services.s3.model.AmazonS3Exception: Status Code: 400, AWS 
Service: Amazon S3, AWS Request ID: 2B352694A5577C62, AWS Error Code: 
MalformedXML, AWS Error Message: The XML you provided was not well-formed or 
did not validate against our published schema.

[jira] [Updated] (HADOOP-11742) mkdir by file system shell fails on an empty bucket

2015-04-02 Thread Takenori Sato (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takenori Sato updated HADOOP-11742:
---
Attachment: HADOOP-11742-branch-2.7.003-2.patch

This patch fixes the unit test, _AbstractContractRootDirectoryTest_.

The changes (sketched below) are:
# setup() prepares an empty root directory
# an assertion was added to testRmEmptyRootDirNonRecursive() to make sure the 
root dir is empty
# teardown() does nothing
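
A sketch of these changes (illustrative only; the class and method names below 
do not claim to match the actual patch):

{code}
import static org.junit.Assert.assertEquals;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.junit.After;
import org.junit.Before;
import org.junit.Test;

public abstract class EmptyRootDirSketchTest {

  protected abstract FileSystem getFileSystem() throws Exception;

  @Before
  public void setup() throws Exception {
    // 1. prepare an empty root: recursively delete everything under "/"
    //    so the bucket really is empty when the test starts
    getFileSystem().delete(new Path("/"), true);
  }

  @Test
  public void testRmEmptyRootDirNonRecursive() throws Exception {
    FileSystem fs = getFileSystem();
    // 2. added assertion: the root directory must be empty before the test
    FileStatus[] children = fs.listStatus(new Path("/"));
    assertEquals("root directory is not empty", 0, children.length);
    // ... the original non-recursive delete of "/" would follow here ...
  }

  @After
  public void teardown() {
    // 3. teardown() does nothing, so the empty-bucket condition is not undone
  }
}
{code}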

 mkdir by file system shell fails on an empty bucket
 ---

 Key: HADOOP-11742
 URL: https://issues.apache.org/jira/browse/HADOOP-11742
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs/s3
Affects Versions: 2.7.0
 Environment: CentOS 7
Reporter: Takenori Sato
Assignee: Takenori Sato
Priority: Minor
 Attachments: HADOOP-11742-branch-2.7.001.patch, 
 HADOOP-11742-branch-2.7.002.patch, HADOOP-11742-branch-2.7.003-1.patch, 
 HADOOP-11742-branch-2.7.003-2.patch


 I have built the latest 2.7, and tried S3AFileSystem.
 Then found that _mkdir_ fails on an empty bucket, named *s3a* here, as 
 follows:
 {code}
 # hadoop-2.7.0-SNAPSHOT/bin/hdfs dfs -mkdir s3a://s3a/foo
 15/03/24 03:49:35 DEBUG s3a.S3AFileSystem: Getting path status for 
 s3a://s3a/foo (foo)
 15/03/24 03:49:36 DEBUG s3a.S3AFileSystem: Not Found: s3a://s3a/foo
 15/03/24 03:49:36 DEBUG s3a.S3AFileSystem: Getting path status for s3a://s3a/ 
 ()
 15/03/24 03:49:36 DEBUG s3a.S3AFileSystem: Not Found: s3a://s3a/
 mkdir: `s3a://s3a/foo': No such file or directory
 {code}
 So does _ls_.
 {code}
 # hadoop-2.7.0-SNAPSHOT/bin/hdfs dfs -ls s3a://s3a/
 15/03/24 03:47:48 DEBUG s3a.S3AFileSystem: Getting path status for s3a://s3a/ 
 ()
 15/03/24 03:47:48 DEBUG s3a.S3AFileSystem: Not Found: s3a://s3a/
 ls: `s3a://s3a/': No such file or directory
 {code}
 This is how it works via s3n.
 {code}
 # hadoop-2.7.0-SNAPSHOT/bin/hdfs dfs -ls s3n://s3n/
 # hadoop-2.7.0-SNAPSHOT/bin/hdfs dfs -mkdir s3n://s3n/foo
 # hadoop-2.7.0-SNAPSHOT/bin/hdfs dfs -ls s3n://s3n/
 Found 1 items
 drwxrwxrwx   -  0 1970-01-01 00:00 s3n://s3n/foo
 {code}
 The snapshot is the following:
 {quote}
 \# git branch
 \* branch-2.7
   trunk
 \# git log
 commit 929b04ce3a4fe419dece49ed68d4f6228be214c1
 Author: Harsh J ha...@cloudera.com
 Date:   Sun Mar 22 10:18:32 2015 +0530
 {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (HADOOP-11742) mkdir by file system shell fails on an empty bucket

2015-04-01 Thread Takenori Sato (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takenori Sato reopened HADOOP-11742:


I confirmed that mkdir fails on an empty bucket against AWS as follows:

1. confirm the bucket is empty, but _ls_ throws an exception

{code}
# hadoop-2.7.0-SNAPSHOT/bin/hdfs dfs -Dfs.s3a.access.key=ACCESS_KEY 
-Dfs.s3a.secret.key=SECRET_KEY -ls s3a://s3atest/
15/04/02 01:49:09 DEBUG http.wire:  HEAD / HTTP/1.1[\r][\n]
15/04/02 01:49:09 DEBUG http.wire:  Host: s3atest.s3.amazonaws.com[\r][\n]
15/04/02 01:49:09 DEBUG http.wire:  Authorization: AWS XXX=[\r][\n]
15/04/02 01:49:09 DEBUG http.wire:  Date: Thu, 02 Apr 2015 01:49:08 
GMT[\r][\n]
15/04/02 01:49:09 DEBUG http.wire:  User-Agent: aws-sdk-java/1.7.4 
Linux/3.10.0-123.8.1.el7.centos.plus.x86_64 
Java_HotSpot(TM)_64-Bit_Server_VM/24.75-b04/1.7.0_75[\r][\n]
15/04/02 01:49:09 DEBUG http.wire:  Content-Type: 
application/x-www-form-urlencoded; charset=utf-8[\r][\n]
15/04/02 01:49:09 DEBUG http.wire:  Connection: Keep-Alive[\r][\n]
15/04/02 01:49:09 DEBUG http.wire:  [\r][\n]
15/04/02 01:49:09 DEBUG http.wire:  HTTP/1.1 200 OK[\r][\n]
15/04/02 01:49:09 DEBUG http.wire:  x-amz-id-2: XXX[\r][\n]
15/04/02 01:49:09 DEBUG http.wire:  x-amz-request-id: XXX[\r][\n]
15/04/02 01:49:09 DEBUG http.wire:  Date: Thu, 02 Apr 2015 01:49:10 
GMT[\r][\n]
15/04/02 01:49:09 DEBUG http.wire:  Content-Type: application/xml[\r][\n]
15/04/02 01:49:09 DEBUG http.wire:  Transfer-Encoding: chunked[\r][\n]
15/04/02 01:49:09 DEBUG http.wire:  Server: AmazonS3[\r][\n]
15/04/02 01:49:09 DEBUG http.wire:  [\r][\n]
15/04/02 01:49:09 DEBUG s3a.S3AFileSystem: Getting path status for 
s3a://s3atest/ ()
15/04/02 01:49:09 DEBUG http.wire:  GET /?delimiter=%2F&max-keys=1&prefix= 
HTTP/1.1[\r][\n]
15/04/02 01:49:09 DEBUG http.wire:  Host: s3atest.s3.amazonaws.com[\r][\n]
15/04/02 01:49:09 DEBUG http.wire:  Authorization: AWS XXX=[\r][\n]
15/04/02 01:49:09 DEBUG http.wire:  Date: Thu, 02 Apr 2015 01:49:09 
GMT[\r][\n]
15/04/02 01:49:09 DEBUG http.wire:  User-Agent: aws-sdk-java/1.7.4 
Linux/3.10.0-123.8.1.el7.centos.plus.x86_64 
Java_HotSpot(TM)_64-Bit_Server_VM/24.75-b04/1.7.0_75[\r][\n]
15/04/02 01:49:09 DEBUG http.wire:  Content-Type: 
application/x-www-form-urlencoded; charset=utf-8[\r][\n]
15/04/02 01:49:09 DEBUG http.wire:  Connection: Keep-Alive[\r][\n]
15/04/02 01:49:09 DEBUG http.wire:  [\r][\n]
15/04/02 01:49:09 DEBUG http.wire:  HTTP/1.1 200 OK[\r][\n]
15/04/02 01:49:09 DEBUG http.wire:  x-amz-id-2: XXX[\r][\n]
15/04/02 01:49:09 DEBUG http.wire:  x-amz-request-id: XXX[\r][\n]
15/04/02 01:49:09 DEBUG http.wire:  Date: Thu, 02 Apr 2015 01:49:10 
GMT[\r][\n]
15/04/02 01:49:09 DEBUG http.wire:  Content-Type: application/xml[\r][\n]
15/04/02 01:49:09 DEBUG http.wire:  Transfer-Encoding: chunked[\r][\n]
15/04/02 01:49:09 DEBUG http.wire:  Server: AmazonS3[\r][\n]
15/04/02 01:49:09 DEBUG http.wire:  [\r][\n]
15/04/02 01:49:09 DEBUG http.wire:  fe[\r][\n]
15/04/02 01:49:09 DEBUG http.wire:  <?xml version="1.0" 
encoding="UTF-8"?>[\n]
15/04/02 01:49:09 DEBUG http.wire:  <ListBucketResult 
xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><Name>s3atest</Name><Prefix></Prefix><Marker></Marker><MaxKeys>1</MaxKeys><Delimiter>/</Delimiter><IsTruncated>false</IsTruncated></ListBucketResult>
15/04/02 01:49:09 DEBUG http.wire:  [\r][\n]
15/04/02 01:49:09 DEBUG http.wire:  0[\r][\n]
15/04/02 01:49:09 DEBUG http.wire:  [\r][\n]
15/04/02 01:49:09 DEBUG s3a.S3AFileSystem: Not Found: s3a://s3atest/
ls: `s3a://s3atest/': No such file or directory
{code}

2. create a directory, but _mkdir_ throws an exception

{code}
# hadoop-2.7.0-SNAPSHOT/bin/hdfs dfs -Dfs.s3a.access.key=ACCESS_KEY 
-Dfs.s3a.secret.key=SECRET_KEY -mkdir s3a://s3atest/root
15/04/02 01:49:41 DEBUG http.wire:  HEAD / HTTP/1.1[\r][\n]
15/04/02 01:49:41 DEBUG http.wire:  Host: s3atest.s3.amazonaws.com[\r][\n]
15/04/02 01:49:41 DEBUG http.wire:  Authorization: AWS XXX=[\r][\n]
15/04/02 01:49:41 DEBUG http.wire:  Date: Thu, 02 Apr 2015 01:49:41 
GMT[\r][\n]
15/04/02 01:49:41 DEBUG http.wire:  User-Agent: aws-sdk-java/1.7.4 
Linux/3.10.0-123.8.1.el7.centos.plus.x86_64 
Java_HotSpot(TM)_64-Bit_Server_VM/24.75-b04/1.7.0_75[\r][\n]
15/04/02 01:49:41 DEBUG http.wire:  Content-Type: 
application/x-www-form-urlencoded; charset=utf-8[\r][\n]
15/04/02 01:49:41 DEBUG http.wire:  Connection: Keep-Alive[\r][\n]
15/04/02 01:49:41 DEBUG http.wire:  [\r][\n]
15/04/02 01:49:41 DEBUG http.wire:  HTTP/1.1 200 OK[\r][\n]
15/04/02 01:49:41 DEBUG http.wire:  x-amz-id-2: XXX[\r][\n]
15/04/02 01:49:41 DEBUG http.wire:  x-amz-request-id: XXX[\r][\n]
15/04/02 01:49:41 DEBUG http.wire:  Date: Thu, 02 Apr 2015 01:49:42 
GMT[\r][\n]
15/04/02 01:49:41 DEBUG http.wire:  Content-Type: application/xml[\r][\n]
15/04/02 01:49:41 DEBUG http.wire:  Transfer-Encoding: chunked[\r][\n]
15/04/02 01:49:41 DEBUG http.wire:  Server: AmazonS3[\r][\n]
15/04/02 01:49:41 DEBUG http.wire:  [\r][\n]
15/04/02 01:49:41 DEBUG s3a.S3AFileSystem: Getting path 

[jira] [Commented] (HADOOP-11753) TestS3AContractOpen#testOpenReadZeroByteFile fails due to negative range header

2015-03-30 Thread Takenori Sato (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14386493#comment-14386493
 ] 

Takenori Sato commented on HADOOP-11753:


Thanks, it makes sense. I will discuss internally.

 TestS3AContractOpen#testOpenReadZeroByteFile fails due to negative range 
 header
 ---

 Key: HADOOP-11753
 URL: https://issues.apache.org/jira/browse/HADOOP-11753
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs/s3
Affects Versions: 3.0.0, 2.7.0
Reporter: Takenori Sato
Assignee: Takenori Sato
 Attachments: HADOOP-11753-branch-2.7.001.patch


 _TestS3AContractOpen#testOpenReadZeroByteFile_ fails as follows.
 {code}
 testOpenReadZeroByteFile(org.apache.hadoop.fs.contract.s3a.TestS3AContractOpen)
   Time elapsed: 3.312 sec   ERROR!
 com.amazonaws.services.s3.model.AmazonS3Exception: Status Code: 416, AWS 
 Service: Amazon S3, AWS Request ID: A58A95E0D36811E4, AWS Error Code: 
 InvalidRange, AWS Error Message: The requested range cannot be satisfied.
   at 
 com.amazonaws.http.AmazonHttpClient.handleErrorResponse(AmazonHttpClient.java:798)
   at 
 com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:421)
   at 
 com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:232)
   at 
 com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3528)
   at 
 com.amazonaws.services.s3.AmazonS3Client.getObject(AmazonS3Client.java:)
   at 
 org.apache.hadoop.fs.s3a.S3AInputStream.reopen(S3AInputStream.java:91)
   at 
 org.apache.hadoop.fs.s3a.S3AInputStream.openIfNeeded(S3AInputStream.java:62)
   at org.apache.hadoop.fs.s3a.S3AInputStream.read(S3AInputStream.java:127)
   at java.io.FilterInputStream.read(FilterInputStream.java:83)
   at 
 org.apache.hadoop.fs.contract.AbstractContractOpenTest.testOpenReadZeroByteFile(AbstractContractOpenTest.java:66)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
   at 
 org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
   at 
 org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
   at 
 org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
   at 
 org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
   at 
 org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
   at 
 org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
   at 
 org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
 {code}
 This is because the header is wrong when calling _S3AInputStream#read_ after 
 _S3AInputStream#open_.
 {code}
 Range: bytes=0--1
 * from 0 to -1
 {code}
 Tested on the latest branch-2.7.
 {quote}
 $ git log
 commit d286673c602524af08935ea132c8afd181b6e2e4
 Author: Jitendra Pandey Jitendra@Jitendra-Pandeys-MacBook-Pro-4.local
 Date:   Tue Mar 24 16:17:06 2015 -0700
 {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HADOOP-11753) TestS3AContractOpen#testOpenReadZeroByteFile fails due to negative range header

2015-03-29 Thread Takenori Sato (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takenori Sato resolved HADOOP-11753.

Resolution: Invalid

 TestS3AContractOpen#testOpenReadZeroByteFile fails due to negative range 
 header
 ---

 Key: HADOOP-11753
 URL: https://issues.apache.org/jira/browse/HADOOP-11753
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs/s3
Affects Versions: 3.0.0, 2.7.0
Reporter: Takenori Sato
Assignee: Takenori Sato
 Attachments: HADOOP-11753-branch-2.7.001.patch


 _TestS3AContractOpen#testOpenReadZeroByteFile_ fails as follows.
 {code}
 testOpenReadZeroByteFile(org.apache.hadoop.fs.contract.s3a.TestS3AContractOpen)
   Time elapsed: 3.312 sec   ERROR!
 com.amazonaws.services.s3.model.AmazonS3Exception: Status Code: 416, AWS 
 Service: Amazon S3, AWS Request ID: A58A95E0D36811E4, AWS Error Code: 
 InvalidRange, AWS Error Message: The requested range cannot be satisfied.
   at 
 com.amazonaws.http.AmazonHttpClient.handleErrorResponse(AmazonHttpClient.java:798)
   at 
 com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:421)
   at 
 com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:232)
   at 
 com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3528)
   at 
 com.amazonaws.services.s3.AmazonS3Client.getObject(AmazonS3Client.java:)
   at 
 org.apache.hadoop.fs.s3a.S3AInputStream.reopen(S3AInputStream.java:91)
   at 
 org.apache.hadoop.fs.s3a.S3AInputStream.openIfNeeded(S3AInputStream.java:62)
   at org.apache.hadoop.fs.s3a.S3AInputStream.read(S3AInputStream.java:127)
   at java.io.FilterInputStream.read(FilterInputStream.java:83)
   at 
 org.apache.hadoop.fs.contract.AbstractContractOpenTest.testOpenReadZeroByteFile(AbstractContractOpenTest.java:66)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
   at 
 org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
   at 
 org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
   at 
 org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
   at 
 org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
   at 
 org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
   at 
 org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
   at 
 org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
 {code}
 This is because the header is wrong when calling _S3AInputStream#read_ after 
 _S3AInputStream#open_.
 {code}
 Range: bytes=0--1
 * from 0 to -1
 {code}
 Tested on the latest branch-2.7.
 {quote}
 $ git log
 commit d286673c602524af08935ea132c8afd181b6e2e4
 Author: Jitendra Pandey Jitendra@Jitendra-Pandeys-MacBook-Pro-4.local
 Date:   Tue Mar 24 16:17:06 2015 -0700
 {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11753) TestS3AContractOpen#testOpenReadZeroByteFile fails due to negative range header

2015-03-29 Thread Takenori Sato (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14386168#comment-14386168
 ] 

Takenori Sato commented on HADOOP-11753:


Thanks for the clarification. Yes, this is against Cloudian. So let me close. 
Will check AWS as well for further tests.

 TestS3AContractOpen#testOpenReadZeroByteFile fails due to negative range 
 header
 ---

 Key: HADOOP-11753
 URL: https://issues.apache.org/jira/browse/HADOOP-11753
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs/s3
Affects Versions: 3.0.0, 2.7.0
Reporter: Takenori Sato
Assignee: Takenori Sato
 Attachments: HADOOP-11753-branch-2.7.001.patch


 _TestS3AContractOpen#testOpenReadZeroByteFile_ fails as follows.
 {code}
 testOpenReadZeroByteFile(org.apache.hadoop.fs.contract.s3a.TestS3AContractOpen)
   Time elapsed: 3.312 sec   ERROR!
 com.amazonaws.services.s3.model.AmazonS3Exception: Status Code: 416, AWS 
 Service: Amazon S3, AWS Request ID: A58A95E0D36811E4, AWS Error Code: 
 InvalidRange, AWS Error Message: The requested range cannot be satisfied.
   at 
 com.amazonaws.http.AmazonHttpClient.handleErrorResponse(AmazonHttpClient.java:798)
   at 
 com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:421)
   at 
 com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:232)
   at 
 com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3528)
   at 
 com.amazonaws.services.s3.AmazonS3Client.getObject(AmazonS3Client.java:)
   at 
 org.apache.hadoop.fs.s3a.S3AInputStream.reopen(S3AInputStream.java:91)
   at 
 org.apache.hadoop.fs.s3a.S3AInputStream.openIfNeeded(S3AInputStream.java:62)
   at org.apache.hadoop.fs.s3a.S3AInputStream.read(S3AInputStream.java:127)
   at java.io.FilterInputStream.read(FilterInputStream.java:83)
   at 
 org.apache.hadoop.fs.contract.AbstractContractOpenTest.testOpenReadZeroByteFile(AbstractContractOpenTest.java:66)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
   at 
 org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
   at 
 org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
   at 
 org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
   at 
 org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
   at 
 org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
   at 
 org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
   at 
 org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
 {code}
 This is because the header is wrong when calling _S3AInputStream#read_ after 
 _S3AInputStream#open_.
 {code}
 Range: bytes=0--1
 * from 0 to -1
 {code}
 Tested on the latest branch-2.7.
 {quote}
 $ git log
 commit d286673c602524af08935ea132c8afd181b6e2e4
 Author: Jitendra Pandey Jitendra@Jitendra-Pandeys-MacBook-Pro-4.local
 Date:   Tue Mar 24 16:17:06 2015 -0700
 {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11742) mkdir by file system shell fails on an empty bucket

2015-03-29 Thread Takenori Sato (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14386172#comment-14386172
 ] 

Takenori Sato commented on HADOOP-11742:


Thomas, Steve, yes, again this is against our own storage (Cloudian). I will 
check the difference. Let me close.

 mkdir by file system shell fails on an empty bucket
 ---

 Key: HADOOP-11742
 URL: https://issues.apache.org/jira/browse/HADOOP-11742
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs/s3
Affects Versions: 2.7.0
 Environment: CentOS 7
Reporter: Takenori Sato
Priority: Minor
 Attachments: HADOOP-11742-branch-2.7.001.patch, 
 HADOOP-11742-branch-2.7.002.patch


 I have built the latest 2.7, and tried S3AFileSystem.
 Then found that _mkdir_ fails on an empty bucket, named *s3a* here, as 
 follows:
 {code}
 # hadoop-2.7.0-SNAPSHOT/bin/hdfs dfs -mkdir s3a://s3a/foo
 15/03/24 03:49:35 DEBUG s3a.S3AFileSystem: Getting path status for 
 s3a://s3a/foo (foo)
 15/03/24 03:49:36 DEBUG s3a.S3AFileSystem: Not Found: s3a://s3a/foo
 15/03/24 03:49:36 DEBUG s3a.S3AFileSystem: Getting path status for s3a://s3a/ 
 ()
 15/03/24 03:49:36 DEBUG s3a.S3AFileSystem: Not Found: s3a://s3a/
 mkdir: `s3a://s3a/foo': No such file or directory
 {code}
 So does _ls_.
 {code}
 # hadoop-2.7.0-SNAPSHOT/bin/hdfs dfs -ls s3a://s3a/
 15/03/24 03:47:48 DEBUG s3a.S3AFileSystem: Getting path status for s3a://s3a/ 
 ()
 15/03/24 03:47:48 DEBUG s3a.S3AFileSystem: Not Found: s3a://s3a/
 ls: `s3a://s3a/': No such file or directory
 {code}
 This is how it works via s3n.
 {code}
 # hadoop-2.7.0-SNAPSHOT/bin/hdfs dfs -ls s3n://s3n/
 # hadoop-2.7.0-SNAPSHOT/bin/hdfs dfs -mkdir s3n://s3n/foo
 # hadoop-2.7.0-SNAPSHOT/bin/hdfs dfs -ls s3n://s3n/
 Found 1 items
 drwxrwxrwx   -  0 1970-01-01 00:00 s3n://s3n/foo
 {code}
 The snapshot is the following:
 {quote}
 \# git branch
 \* branch-2.7
   trunk
 \# git log
 commit 929b04ce3a4fe419dece49ed68d4f6228be214c1
 Author: Harsh J ha...@cloudera.com
 Date:   Sun Mar 22 10:18:32 2015 +0530
 {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11742) mkdir by file system shell fails on an empty bucket

2015-03-29 Thread Takenori Sato (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takenori Sato updated HADOOP-11742:
---
Resolution: Fixed
  Assignee: Takenori Sato
Status: Resolved  (was: Patch Available)

 mkdir by file system shell fails on an empty bucket
 ---

 Key: HADOOP-11742
 URL: https://issues.apache.org/jira/browse/HADOOP-11742
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs/s3
Affects Versions: 2.7.0
 Environment: CentOS 7
Reporter: Takenori Sato
Assignee: Takenori Sato
Priority: Minor
 Attachments: HADOOP-11742-branch-2.7.001.patch, 
 HADOOP-11742-branch-2.7.002.patch


 I have built the latest 2.7, and tried S3AFileSystem.
 Then found that _mkdir_ fails on an empty bucket, named *s3a* here, as 
 follows:
 {code}
 # hadoop-2.7.0-SNAPSHOT/bin/hdfs dfs -mkdir s3a://s3a/foo
 15/03/24 03:49:35 DEBUG s3a.S3AFileSystem: Getting path status for 
 s3a://s3a/foo (foo)
 15/03/24 03:49:36 DEBUG s3a.S3AFileSystem: Not Found: s3a://s3a/foo
 15/03/24 03:49:36 DEBUG s3a.S3AFileSystem: Getting path status for s3a://s3a/ 
 ()
 15/03/24 03:49:36 DEBUG s3a.S3AFileSystem: Not Found: s3a://s3a/
 mkdir: `s3a://s3a/foo': No such file or directory
 {code}
 So does _ls_.
 {code}
 # hadoop-2.7.0-SNAPSHOT/bin/hdfs dfs -ls s3a://s3a/
 15/03/24 03:47:48 DEBUG s3a.S3AFileSystem: Getting path status for s3a://s3a/ 
 ()
 15/03/24 03:47:48 DEBUG s3a.S3AFileSystem: Not Found: s3a://s3a/
 ls: `s3a://s3a/': No such file or directory
 {code}
 This is how it works via s3n.
 {code}
 # hadoop-2.7.0-SNAPSHOT/bin/hdfs dfs -ls s3n://s3n/
 # hadoop-2.7.0-SNAPSHOT/bin/hdfs dfs -mkdir s3n://s3n/foo
 # hadoop-2.7.0-SNAPSHOT/bin/hdfs dfs -ls s3n://s3n/
 Found 1 items
 drwxrwxrwx   -  0 1970-01-01 00:00 s3n://s3n/foo
 {code}
 The snapshot is the following:
 {quote}
 \# git branch
 \* branch-2.7
   trunk
 \# git log
 commit 929b04ce3a4fe419dece49ed68d4f6228be214c1
 Author: Harsh J ha...@cloudera.com
 Date:   Sun Mar 22 10:18:32 2015 +0530
 {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HADOOP-11742) mkdir by file system shell fails on an empty bucket

2015-03-29 Thread Takenori Sato (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takenori Sato resolved HADOOP-11742.

Resolution: Invalid

 mkdir by file system shell fails on an empty bucket
 ---

 Key: HADOOP-11742
 URL: https://issues.apache.org/jira/browse/HADOOP-11742
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs/s3
Affects Versions: 2.7.0
 Environment: CentOS 7
Reporter: Takenori Sato
Assignee: Takenori Sato
Priority: Minor
 Attachments: HADOOP-11742-branch-2.7.001.patch, 
 HADOOP-11742-branch-2.7.002.patch


 I have built the latest 2.7, and tried S3AFileSystem.
 Then found that _mkdir_ fails on an empty bucket, named *s3a* here, as 
 follows:
 {code}
 # hadoop-2.7.0-SNAPSHOT/bin/hdfs dfs -mkdir s3a://s3a/foo
 15/03/24 03:49:35 DEBUG s3a.S3AFileSystem: Getting path status for 
 s3a://s3a/foo (foo)
 15/03/24 03:49:36 DEBUG s3a.S3AFileSystem: Not Found: s3a://s3a/foo
 15/03/24 03:49:36 DEBUG s3a.S3AFileSystem: Getting path status for s3a://s3a/ 
 ()
 15/03/24 03:49:36 DEBUG s3a.S3AFileSystem: Not Found: s3a://s3a/
 mkdir: `s3a://s3a/foo': No such file or directory
 {code}
 So does _ls_.
 {code}
 # hadoop-2.7.0-SNAPSHOT/bin/hdfs dfs -ls s3a://s3a/
 15/03/24 03:47:48 DEBUG s3a.S3AFileSystem: Getting path status for s3a://s3a/ 
 ()
 15/03/24 03:47:48 DEBUG s3a.S3AFileSystem: Not Found: s3a://s3a/
 ls: `s3a://s3a/': No such file or directory
 {code}
 This is how it works via s3n.
 {code}
 # hadoop-2.7.0-SNAPSHOT/bin/hdfs dfs -ls s3n://s3n/
 # hadoop-2.7.0-SNAPSHOT/bin/hdfs dfs -mkdir s3n://s3n/foo
 # hadoop-2.7.0-SNAPSHOT/bin/hdfs dfs -ls s3n://s3n/
 Found 1 items
 drwxrwxrwx   -  0 1970-01-01 00:00 s3n://s3n/foo
 {code}
 The snapshot is the following:
 {quote}
 \# git branch
 \* branch-2.7
   trunk
 \# git log
 commit 929b04ce3a4fe419dece49ed68d4f6228be214c1
 Author: Harsh J ha...@cloudera.com
 Date:   Sun Mar 22 10:18:32 2015 +0530
 {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (HADOOP-11742) mkdir by file system shell fails on an empty bucket

2015-03-29 Thread Takenori Sato (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takenori Sato reopened HADOOP-11742:


Reopening to mark this as invalid.

 mkdir by file system shell fails on an empty bucket
 ---

 Key: HADOOP-11742
 URL: https://issues.apache.org/jira/browse/HADOOP-11742
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs/s3
Affects Versions: 2.7.0
 Environment: CentOS 7
Reporter: Takenori Sato
Assignee: Takenori Sato
Priority: Minor
 Attachments: HADOOP-11742-branch-2.7.001.patch, 
 HADOOP-11742-branch-2.7.002.patch


 I have built the latest 2.7, and tried S3AFileSystem.
 Then found that _mkdir_ fails on an empty bucket, named *s3a* here, as 
 follows:
 {code}
 # hadoop-2.7.0-SNAPSHOT/bin/hdfs dfs -mkdir s3a://s3a/foo
 15/03/24 03:49:35 DEBUG s3a.S3AFileSystem: Getting path status for 
 s3a://s3a/foo (foo)
 15/03/24 03:49:36 DEBUG s3a.S3AFileSystem: Not Found: s3a://s3a/foo
 15/03/24 03:49:36 DEBUG s3a.S3AFileSystem: Getting path status for s3a://s3a/ 
 ()
 15/03/24 03:49:36 DEBUG s3a.S3AFileSystem: Not Found: s3a://s3a/
 mkdir: `s3a://s3a/foo': No such file or directory
 {code}
 So does _ls_.
 {code}
 # hadoop-2.7.0-SNAPSHOT/bin/hdfs dfs -ls s3a://s3a/
 15/03/24 03:47:48 DEBUG s3a.S3AFileSystem: Getting path status for s3a://s3a/ 
 ()
 15/03/24 03:47:48 DEBUG s3a.S3AFileSystem: Not Found: s3a://s3a/
 ls: `s3a://s3a/': No such file or directory
 {code}
 This is how it works via s3n.
 {code}
 # hadoop-2.7.0-SNAPSHOT/bin/hdfs dfs -ls s3n://s3n/
 # hadoop-2.7.0-SNAPSHOT/bin/hdfs dfs -mkdir s3n://s3n/foo
 # hadoop-2.7.0-SNAPSHOT/bin/hdfs dfs -ls s3n://s3n/
 Found 1 items
 drwxrwxrwx   -  0 1970-01-01 00:00 s3n://s3n/foo
 {code}
 The snapshot is the following:
 {quote}
 \# git branch
 \* branch-2.7
   trunk
 \# git log
 commit 929b04ce3a4fe419dece49ed68d4f6228be214c1
 Author: Harsh J ha...@cloudera.com
 Date:   Sun Mar 22 10:18:32 2015 +0530
 {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11753) TestS3AContractOpen#testOpenReadZeroByteFile fails due to negative range header

2015-03-26 Thread Takenori Sato (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takenori Sato updated HADOOP-11753:
---
Attachment: HADOOP-11753-branch-2.7.001.patch

Set the Range header only when contentLength > 0.
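
A minimal sketch of that change against the AWS SDK (illustrative only, not the 
actual patch; the helper below is hypothetical, not S3AInputStream code):

{code}
import com.amazonaws.services.s3.model.GetObjectRequest;

class RangeRequestSketch {
  static GetObjectRequest newRequest(String bucket, String key,
                                     long pos, long contentLength) {
    GetObjectRequest request = new GetObjectRequest(bucket, key);
    if (contentLength > 0) {
      // range end is inclusive, hence contentLength - 1; for a zero-byte
      // object no Range header is set, avoiding "Range: bytes=0--1"
      request.setRange(pos, contentLength - 1);
    }
    return request;
  }
}
{code}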

{code}
---
 T E S T S
---
Running org.apache.hadoop.fs.contract.s3a.TestS3AContractCreate
Tests run: 6, Failures: 0, Errors: 0, Skipped: 3, Time elapsed: 19.821 sec - in 
org.apache.hadoop.fs.contract.s3a.TestS3AContractCreate
Running org.apache.hadoop.fs.contract.s3a.TestS3AContractDelete
Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 15.186 sec - in 
org.apache.hadoop.fs.contract.s3a.TestS3AContractDelete
Running org.apache.hadoop.fs.contract.s3a.TestS3AContractMkdir
Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 10.563 sec - in 
org.apache.hadoop.fs.contract.s3a.TestS3AContractMkdir
Running org.apache.hadoop.fs.contract.s3a.TestS3AContractOpen
Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 10.412 sec - in 
org.apache.hadoop.fs.contract.s3a.TestS3AContractOpen
Running org.apache.hadoop.fs.contract.s3a.TestS3AContractRename
Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 25.687 sec - in 
org.apache.hadoop.fs.contract.s3a.TestS3AContractRename
Running org.apache.hadoop.fs.contract.s3a.TestS3AContractRootDir
Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 5.29 sec - in 
org.apache.hadoop.fs.contract.s3a.TestS3AContractRootDir
Running org.apache.hadoop.fs.contract.s3a.TestS3AContractSeek
Tests run: 10, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 36.943 sec - 
in org.apache.hadoop.fs.contract.s3a.TestS3AContractSeek
Running org.apache.hadoop.fs.contract.s3n.TestS3NContractCreate
Tests run: 6, Failures: 0, Errors: 0, Skipped: 3, Time elapsed: 16.791 sec - in 
org.apache.hadoop.fs.contract.s3n.TestS3NContractCreate
Running org.apache.hadoop.fs.contract.s3n.TestS3NContractDelete
Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 12.891 sec - in 
org.apache.hadoop.fs.contract.s3n.TestS3NContractDelete
Running org.apache.hadoop.fs.contract.s3n.TestS3NContractMkdir
Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 9.791 sec - in 
org.apache.hadoop.fs.contract.s3n.TestS3NContractMkdir
Running org.apache.hadoop.fs.contract.s3n.TestS3NContractOpen
Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 8.736 sec - in 
org.apache.hadoop.fs.contract.s3n.TestS3NContractOpen
Running org.apache.hadoop.fs.contract.s3n.TestS3NContractRename
Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 22.308 sec - in 
org.apache.hadoop.fs.contract.s3n.TestS3NContractRename
Running org.apache.hadoop.fs.contract.s3n.TestS3NContractRootDir
Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 3.716 sec - in 
org.apache.hadoop.fs.contract.s3n.TestS3NContractRootDir
Running org.apache.hadoop.fs.contract.s3n.TestS3NContractSeek
Tests run: 10, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 33.433 sec - 
in org.apache.hadoop.fs.contract.s3n.TestS3NContractSeek
Running org.apache.hadoop.fs.s3.TestInMemoryS3FileSystemContract
Tests run: 31, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 2.641 sec - in 
org.apache.hadoop.fs.s3.TestInMemoryS3FileSystemContract
Running org.apache.hadoop.fs.s3.TestINode
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.127 sec - in 
org.apache.hadoop.fs.s3.TestINode
Running org.apache.hadoop.fs.s3.TestS3Credentials
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.36 sec - in 
org.apache.hadoop.fs.s3.TestS3Credentials
Running org.apache.hadoop.fs.s3.TestS3FileSystem
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.362 sec - in 
org.apache.hadoop.fs.s3.TestS3FileSystem
Running org.apache.hadoop.fs.s3.TestS3InMemoryFileSystem
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.754 sec - in 
org.apache.hadoop.fs.s3.TestS3InMemoryFileSystem
Running org.apache.hadoop.fs.s3a.scale.TestS3ADeleteManyFiles
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 500.943 sec - 
in org.apache.hadoop.fs.s3a.scale.TestS3ADeleteManyFiles
Running org.apache.hadoop.fs.s3a.TestS3ABlocksize
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 6.25 sec - in 
org.apache.hadoop.fs.s3a.TestS3ABlocksize
Running org.apache.hadoop.fs.s3a.TestS3AConfiguration
Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 4.334 sec - in 
org.apache.hadoop.fs.s3a.TestS3AConfiguration
Running org.apache.hadoop.fs.s3a.TestS3AFastOutputStream
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 9.867 sec - in 
org.apache.hadoop.fs.s3a.TestS3AFastOutputStream
Running org.apache.hadoop.fs.s3a.TestS3AFileSystemContract
Tests run: 31, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 79.965 sec - 
in 

[jira] [Updated] (HADOOP-11742) mkdir by file system shell fails on an empty bucket

2015-03-26 Thread Takenori Sato (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takenori Sato updated HADOOP-11742:
---
Attachment: HADOOP-11742-branch-2.7.002.patch

I found that _AbstractFSContractTestBase#setup_ always creates a test directory, 
which is removed at _teardown_. Thus, an empty root directory was never tested 
by the concrete test cases.

The problem here is not the mkdir call on an empty bucket itself. Rather, 
calling _S3AFileSystem#getFileStatus(/)_ on an empty bucket throws an exception.

To set up such a condition, I chose instead to remove the test directory at 
setup, and to make teardown a no-op.

Then, without this fix, TestS3AContractRootDir failed as follows.

{code}
---
 T E S T S
---
Running org.apache.hadoop.fs.contract.s3a.TestS3AContractRootDir
Tests run: 5, Failures: 0, Errors: 4, Skipped: 0, Time elapsed: 8.027 sec  
FAILURE! - in org.apache.hadoop.fs.contract.s3a.TestS3AContractRootDir
testRmEmptyRootDirNonRecursive(org.apache.hadoop.fs.contract.s3a.TestS3AContractRootDir)
  Time elapsed: 2.82 sec   ERROR!
java.io.FileNotFoundException: No such file or directory: /
at 
org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:995)
at 
org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:77)
at 
org.apache.hadoop.fs.contract.ContractTestUtils.assertIsDirectory(ContractTestUtils.java:464)
at 
org.apache.hadoop.fs.contract.AbstractContractRootDirectoryTest.testRmEmptyRootDirNonRecursive(AbstractContractRootDirectoryTest.java:63)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at 
org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)

testRmRootRecursive(org.apache.hadoop.fs.contract.s3a.TestS3AContractRootDir)  
Time elapsed: 0.475 sec   ERROR!
java.io.FileNotFoundException: No such file or directory: /
at 
org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:995)
at 
org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:77)
at 
org.apache.hadoop.fs.contract.ContractTestUtils.assertIsDirectory(ContractTestUtils.java:464)
at 
org.apache.hadoop.fs.contract.AbstractContractRootDirectoryTest.testRmRootRecursive(AbstractContractRootDirectoryTest.java:96)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at 
org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)

testCreateFileOverRoot(org.apache.hadoop.fs.contract.s3a.TestS3AContractRootDir)
  Time elapsed: 2.922 sec   ERROR!
com.amazonaws.services.s3.model.AmazonS3Exception: Status Code: 400, AWS 
Service: Amazon S3, AWS Request ID: 368CF290D38711E4, AWS Error Code: 
MalformedXML, AWS Error Message: The XML you provided was not well-formed or 
did not validate against our published schema.
at 
com.amazonaws.http.AmazonHttpClient.handleErrorResponse(AmazonHttpClient.java:798)
at 
com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:421)
at 
com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:232)
at 

[jira] [Commented] (HADOOP-11742) mkdir by file system shell fails on an empty bucket

2015-03-25 Thread Takenori Sato (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14379380#comment-14379380
 ] 

Takenori Sato commented on HADOOP-11742:


I checked how this is covered in test cases. 
_NativeS3FileSystemContractBaseTest#testListStatusForRoot_ looks like a 
relevant test for s3n, but nothing similar was found for s3a.

Still, _TestS3AContractRootDir_ is supposed to cover this scenario, correct?

 mkdir by file system shell fails on an empty bucket
 ---

 Key: HADOOP-11742
 URL: https://issues.apache.org/jira/browse/HADOOP-11742
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs/s3
 Environment: CentOS 7
Reporter: Takenori Sato
 Attachments: HADOOP-11742-branch-2.7.001.patch


 I have built the latest 2.7, and tried S3AFileSystem.
 Then found that _mkdir_ fails on an empty bucket, named *s3a* here, as 
 follows:
 {code}
 # hadoop-2.7.0-SNAPSHOT/bin/hdfs dfs -mkdir s3a://s3a/foo
 15/03/24 03:49:35 DEBUG s3a.S3AFileSystem: Getting path status for 
 s3a://s3a/foo (foo)
 15/03/24 03:49:36 DEBUG s3a.S3AFileSystem: Not Found: s3a://s3a/foo
 15/03/24 03:49:36 DEBUG s3a.S3AFileSystem: Getting path status for s3a://s3a/ 
 ()
 15/03/24 03:49:36 DEBUG s3a.S3AFileSystem: Not Found: s3a://s3a/
 mkdir: `s3a://s3a/foo': No such file or directory
 {code}
 So does _ls_.
 {code}
 # hadoop-2.7.0-SNAPSHOT/bin/hdfs dfs -ls s3a://s3a/
 15/03/24 03:47:48 DEBUG s3a.S3AFileSystem: Getting path status for s3a://s3a/ 
 ()
 15/03/24 03:47:48 DEBUG s3a.S3AFileSystem: Not Found: s3a://s3a/
 ls: `s3a://s3a/': No such file or directory
 {code}
 This is how it works via s3n.
 {code}
 # hadoop-2.7.0-SNAPSHOT/bin/hdfs dfs -ls s3n://s3n/
 # hadoop-2.7.0-SNAPSHOT/bin/hdfs dfs -mkdir s3n://s3n/foo
 # hadoop-2.7.0-SNAPSHOT/bin/hdfs dfs -ls s3n://s3n/
 Found 1 items
 drwxrwxrwx   -  0 1970-01-01 00:00 s3n://s3n/foo
 {code}
 The snapshot is the following:
 {quote}
 \# git branch
 \* branch-2.7
   trunk
 \# git log
 commit 929b04ce3a4fe419dece49ed68d4f6228be214c1
 Author: Harsh J ha...@cloudera.com
 Date:   Sun Mar 22 10:18:32 2015 +0530
 {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11742) mkdir by file system shell fails on an empty bucket

2015-03-25 Thread Takenori Sato (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takenori Sato updated HADOOP-11742:
---
Attachment: HADOOP-11742-branch-2.7.001.patch

With this patch, an empty key is treated as the root directory instead of Not 
Found. This is the same behavior as _NativeS3FileSystem#getFileStatus_.

{code}
# hadoop-2.7.0-SNAPSHOT/bin/hdfs dfs -ls s3a://s3a/
15/03/25 06:28:05 DEBUG s3a.S3AFileSystem: Getting path status for s3a://s3a/ ()
15/03/25 06:28:05 DEBUG s3a.S3AFileSystem: List status for path: s3a://s3a/
15/03/25 06:28:05 DEBUG s3a.S3AFileSystem: Getting path status for s3a://s3a/ ()
15/03/25 06:28:05 DEBUG s3a.S3AFileSystem: listStatus: doing listObjects for 
directory 
# hadoop-2.7.0-SNAPSHOT/bin/hdfs dfs -mkdir s3a://s3a/foo
15/03/25 06:28:22 DEBUG s3a.S3AFileSystem: Getting path status for 
s3a://s3a/foo (foo)
15/03/25 06:28:23 DEBUG s3a.S3AFileSystem: Not Found: s3a://s3a/foo
15/03/25 06:28:23 DEBUG s3a.S3AFileSystem: Getting path status for s3a://s3a/ ()
15/03/25 06:28:23 DEBUG s3a.S3AFileSystem: Making directory: s3a://s3a/foo
15/03/25 06:28:23 DEBUG s3a.S3AFileSystem: Getting path status for 
s3a://s3a/foo (foo)
15/03/25 06:28:23 DEBUG s3a.S3AFileSystem: Not Found: s3a://s3a/foo
15/03/25 06:28:23 DEBUG s3a.S3AFileSystem: Getting path status for 
s3a://s3a/foo (foo)
15/03/25 06:28:24 DEBUG s3a.S3AFileSystem: Not Found: s3a://s3a/foo
15/03/25 06:28:24 DEBUG s3a.S3AFileSystem: Getting path status for s3a://s3a/ ()
# hadoop-2.7.0-SNAPSHOT/bin/hdfs dfs -ls s3a://s3a/
15/03/25 06:28:31 DEBUG s3a.S3AFileSystem: Getting path status for s3a://s3a/ ()
15/03/25 06:28:31 DEBUG s3a.S3AFileSystem: List status for path: s3a://s3a/
15/03/25 06:28:31 DEBUG s3a.S3AFileSystem: Getting path status for s3a://s3a/ ()
15/03/25 06:28:31 DEBUG s3a.S3AFileSystem: listStatus: doing listObjects for 
directory 
15/03/25 06:28:31 DEBUG s3a.S3AFileSystem: Adding: rd: s3a://s3a/foo
Found 1 items
drwxrwxrwx   -  0 1970-01-01 00:00 s3a://s3a/foo
{code}


 mkdir by file system shell fails on an empty bucket
 ---

 Key: HADOOP-11742
 URL: https://issues.apache.org/jira/browse/HADOOP-11742
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs/s3
 Environment: CentOS 7
Reporter: Takenori Sato
 Attachments: HADOOP-11742-branch-2.7.001.patch


 I have built the latest 2.7, and tried S3AFileSystem.
 Then found that _mkdir_ fails on an empty bucket, named *s3a* here, as 
 follows:
 {code}
 # hadoop-2.7.0-SNAPSHOT/bin/hdfs dfs -mkdir s3a://s3a/foo
 15/03/24 03:49:35 DEBUG s3a.S3AFileSystem: Getting path status for 
 s3a://s3a/foo (foo)
 15/03/24 03:49:36 DEBUG s3a.S3AFileSystem: Not Found: s3a://s3a/foo
 15/03/24 03:49:36 DEBUG s3a.S3AFileSystem: Getting path status for s3a://s3a/ 
 ()
 15/03/24 03:49:36 DEBUG s3a.S3AFileSystem: Not Found: s3a://s3a/
 mkdir: `s3a://s3a/foo': No such file or directory
 {code}
 So does _ls_.
 {code}
 # hadoop-2.7.0-SNAPSHOT/bin/hdfs dfs -ls s3a://s3a/
 15/03/24 03:47:48 DEBUG s3a.S3AFileSystem: Getting path status for s3a://s3a/ 
 ()
 15/03/24 03:47:48 DEBUG s3a.S3AFileSystem: Not Found: s3a://s3a/
 ls: `s3a://s3a/': No such file or directory
 {code}
 This is how it works via s3n.
 {code}
 # hadoop-2.7.0-SNAPSHOT/bin/hdfs dfs -ls s3n://s3n/
 # hadoop-2.7.0-SNAPSHOT/bin/hdfs dfs -mkdir s3n://s3n/foo
 # hadoop-2.7.0-SNAPSHOT/bin/hdfs dfs -ls s3n://s3n/
 Found 1 items
 drwxrwxrwx   -  0 1970-01-01 00:00 s3n://s3n/foo
 {code}
 The snapshot is the following:
 {quote}
 \# git branch
 \* branch-2.7
   trunk
 \# git log
 commit 929b04ce3a4fe419dece49ed68d4f6228be214c1
 Author: Harsh J ha...@cloudera.com
 Date:   Sun Mar 22 10:18:32 2015 +0530
 {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11753) TestS3AContractOpen#testOpenReadZeroByteFile fails due to negative range header

2015-03-25 Thread Takenori Sato (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14381384#comment-14381384
 ] 

Takenori Sato commented on HADOOP-11753:


Hi, OK, thanks. I was about to start. So I leave this to you.

 TestS3AContractOpen#testOpenReadZeroByteFile fails due to negative range 
 header
 ---

 Key: HADOOP-11753
 URL: https://issues.apache.org/jira/browse/HADOOP-11753
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs/s3
Reporter: Takenori Sato
Assignee: J.Andreina

 _TestS3AContractOpen#testOpenReadZeroByteFile_ fails as follows.
 {code}
 testOpenReadZeroByteFile(org.apache.hadoop.fs.contract.s3a.TestS3AContractOpen)
   Time elapsed: 3.312 sec   ERROR!
 com.amazonaws.services.s3.model.AmazonS3Exception: Status Code: 416, AWS 
 Service: Amazon S3, AWS Request ID: A58A95E0D36811E4, AWS Error Code: 
 InvalidRange, AWS Error Message: The requested range cannot be satisfied.
   at 
 com.amazonaws.http.AmazonHttpClient.handleErrorResponse(AmazonHttpClient.java:798)
   at 
 com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:421)
   at 
 com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:232)
   at 
 com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3528)
   at 
 com.amazonaws.services.s3.AmazonS3Client.getObject(AmazonS3Client.java:)
   at 
 org.apache.hadoop.fs.s3a.S3AInputStream.reopen(S3AInputStream.java:91)
   at 
 org.apache.hadoop.fs.s3a.S3AInputStream.openIfNeeded(S3AInputStream.java:62)
   at org.apache.hadoop.fs.s3a.S3AInputStream.read(S3AInputStream.java:127)
   at java.io.FilterInputStream.read(FilterInputStream.java:83)
   at 
 org.apache.hadoop.fs.contract.AbstractContractOpenTest.testOpenReadZeroByteFile(AbstractContractOpenTest.java:66)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
   at 
 org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
   at 
 org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
   at 
 org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
   at 
 org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
   at 
 org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
   at 
 org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
   at 
 org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
 {code}
 This is because the header is wrong when calling _S3AInputStream#read_ after 
 _S3AInputStream#open_.
 {code}
 Range: bytes=0--1
 * from 0 to -1
 {code}
 Tested on the latest branch-2.7.
 {quote}
 $ git log
 commit d286673c602524af08935ea132c8afd181b6e2e4
 Author: Jitendra Pandey Jitendra@Jitendra-Pandeys-MacBook-Pro-4.local
 Date:   Tue Mar 24 16:17:06 2015 -0700
 {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HADOOP-11753) TestS3AContractOpen#testOpenReadZeroByteFile fals due to negative range header

2015-03-25 Thread Takenori Sato (JIRA)
Takenori Sato created HADOOP-11753:
--

 Summary: TestS3AContractOpen#testOpenReadZeroByteFile fals due to 
negative range header
 Key: HADOOP-11753
 URL: https://issues.apache.org/jira/browse/HADOOP-11753
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs/s3
Reporter: Takenori Sato


_TestS3AContractOpen#testOpenReadZeroByteFile_ fails as follows.

{code}
testOpenReadZeroByteFile(org.apache.hadoop.fs.contract.s3a.TestS3AContractOpen) 
 Time elapsed: 3.312 sec   ERROR!
com.amazonaws.services.s3.model.AmazonS3Exception: Status Code: 416, AWS 
Service: Amazon S3, AWS Request ID: A58A95E0D36811E4, AWS Error Code: 
InvalidRange, AWS Error Message: The requested range cannot be satisfied.
at 
com.amazonaws.http.AmazonHttpClient.handleErrorResponse(AmazonHttpClient.java:798)
at 
com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:421)
at 
com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:232)
at 
com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3528)
at 
com.amazonaws.services.s3.AmazonS3Client.getObject(AmazonS3Client.java:)
at 
org.apache.hadoop.fs.s3a.S3AInputStream.reopen(S3AInputStream.java:91)
at 
org.apache.hadoop.fs.s3a.S3AInputStream.openIfNeeded(S3AInputStream.java:62)
at org.apache.hadoop.fs.s3a.S3AInputStream.read(S3AInputStream.java:127)
at java.io.FilterInputStream.read(FilterInputStream.java:83)
at 
org.apache.hadoop.fs.contract.AbstractContractOpenTest.testOpenReadZeroByteFile(AbstractContractOpenTest.java:66)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at 
org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
{code}

This is because the header is wrong when calling _S3AInputStream#read_ after 
_S3AInputStream#open_.

{code}
Range: bytes=0--1
* from 0 to -1
{code}
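
For illustration only, a minimal sketch of the kind of guard that avoids the 
negative range: skip the ranged GET entirely when the object is zero bytes long 
(or the position is already at end of file) instead of asking S3 for bytes 0 
through -1. The class and parameter names below are illustrative, not the actual 
patch.
{code}
import java.io.ByteArrayInputStream;
import java.io.InputStream;

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.model.GetObjectRequest;

// Sketch only: never issue a ranged GET whose inclusive end offset
// (contentLength - 1) is below the start offset, which is exactly what
// happens for a zero-byte object ("Range: bytes=0--1").
final class RangedOpenSketch {
  static InputStream open(AmazonS3 client, String bucket, String key,
                          long pos, long contentLength) {
    if (contentLength == 0 || pos >= contentLength) {
      return new ByteArrayInputStream(new byte[0]); // nothing to fetch
    }
    GetObjectRequest req = new GetObjectRequest(bucket, key)
        .withRange(pos, contentLength - 1);         // end offset is inclusive
    return client.getObject(req).getObjectContent();
  }
}
{code}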

Tested on the latest branch-2.7.

{quote}
$ git log
commit d286673c602524af08935ea132c8afd181b6e2e4
Author: Jitendra Pandey Jitendra@Jitendra-Pandeys-MacBook-Pro-4.local
Date:   Tue Mar 24 16:17:06 2015 -0700
{quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11753) TestS3AContractOpen#testOpenReadZeroByteFile fails due to negative range header

2015-03-25 Thread Takenori Sato (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takenori Sato updated HADOOP-11753:
---
Summary: TestS3AContractOpen#testOpenReadZeroByteFile fails due to negative 
range header  (was: TestS3AContractOpen#testOpenReadZeroByteFile fals due to 
negative range header)

 TestS3AContractOpen#testOpenReadZeroByteFile fails due to negative range 
 header
 ---

 Key: HADOOP-11753
 URL: https://issues.apache.org/jira/browse/HADOOP-11753
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs/s3
Reporter: Takenori Sato

 _TestS3AContractOpen#testOpenReadZeroByteFile_ fails as follows.
 {code}
 testOpenReadZeroByteFile(org.apache.hadoop.fs.contract.s3a.TestS3AContractOpen)
   Time elapsed: 3.312 sec   ERROR!
 com.amazonaws.services.s3.model.AmazonS3Exception: Status Code: 416, AWS 
 Service: Amazon S3, AWS Request ID: A58A95E0D36811E4, AWS Error Code: 
 InvalidRange, AWS Error Message: The requested range cannot be satisfied.
   at 
 com.amazonaws.http.AmazonHttpClient.handleErrorResponse(AmazonHttpClient.java:798)
   at 
 com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:421)
   at 
 com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:232)
   at 
 com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3528)
   at 
 com.amazonaws.services.s3.AmazonS3Client.getObject(AmazonS3Client.java:)
   at 
 org.apache.hadoop.fs.s3a.S3AInputStream.reopen(S3AInputStream.java:91)
   at 
 org.apache.hadoop.fs.s3a.S3AInputStream.openIfNeeded(S3AInputStream.java:62)
   at org.apache.hadoop.fs.s3a.S3AInputStream.read(S3AInputStream.java:127)
   at java.io.FilterInputStream.read(FilterInputStream.java:83)
   at 
 org.apache.hadoop.fs.contract.AbstractContractOpenTest.testOpenReadZeroByteFile(AbstractContractOpenTest.java:66)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
   at 
 org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
   at 
 org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
   at 
 org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
   at 
 org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
   at 
 org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
   at 
 org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
   at 
 org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
 {code}
 This is because the header is wrong when calling _S3AInputStream#read_ after 
 _S3AInputStream#open_.
 {code}
 Range: bytes=0--1
 * from 0 to -1
 {code}
 Tested on the latest branch-2.7.
 {quote}
 $ git log
 commit d286673c602524af08935ea132c8afd181b6e2e4
 Author: Jitendra Pandey Jitendra@Jitendra-Pandeys-MacBook-Pro-4.local
 Date:   Tue Mar 24 16:17:06 2015 -0700
 {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11742) mkdir by file system shell fails on an empty bucket

2015-03-25 Thread Takenori Sato (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14381351#comment-14381351
 ] 

Takenori Sato commented on HADOOP-11742:


OK, will do. But I found that the current s3a-related unit tests do not finish 
successfully. Filed as HADOOP-11753. 

 mkdir by file system shell fails on an empty bucket
 ---

 Key: HADOOP-11742
 URL: https://issues.apache.org/jira/browse/HADOOP-11742
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs/s3
 Environment: CentOS 7
Reporter: Takenori Sato
 Attachments: HADOOP-11742-branch-2.7.001.patch


 I have built the latest 2.7, and tried S3AFileSystem.
 Then found that _mkdir_ fails on an empty bucket, named *s3a* here, as 
 follows:
 {code}
 # hadoop-2.7.0-SNAPSHOT/bin/hdfs dfs -mkdir s3a://s3a/foo
 15/03/24 03:49:35 DEBUG s3a.S3AFileSystem: Getting path status for 
 s3a://s3a/foo (foo)
 15/03/24 03:49:36 DEBUG s3a.S3AFileSystem: Not Found: s3a://s3a/foo
 15/03/24 03:49:36 DEBUG s3a.S3AFileSystem: Getting path status for s3a://s3a/ 
 ()
 15/03/24 03:49:36 DEBUG s3a.S3AFileSystem: Not Found: s3a://s3a/
 mkdir: `s3a://s3a/foo': No such file or directory
 {code}
 So does _ls_.
 {code}
 # hadoop-2.7.0-SNAPSHOT/bin/hdfs dfs -ls s3a://s3a/
 15/03/24 03:47:48 DEBUG s3a.S3AFileSystem: Getting path status for s3a://s3a/ 
 ()
 15/03/24 03:47:48 DEBUG s3a.S3AFileSystem: Not Found: s3a://s3a/
 ls: `s3a://s3a/': No such file or directory
 {code}
 This is how it works via s3n.
 {code}
 # hadoop-2.7.0-SNAPSHOT/bin/hdfs dfs -ls s3n://s3n/
 # hadoop-2.7.0-SNAPSHOT/bin/hdfs dfs -mkdir s3n://s3n/foo
 # hadoop-2.7.0-SNAPSHOT/bin/hdfs dfs -ls s3n://s3n/
 Found 1 items
 drwxrwxrwx   -  0 1970-01-01 00:00 s3n://s3n/foo
 {code}
 The snapshot is the following:
 {quote}
 \# git branch
 \* branch-2.7
   trunk
 \# git log
 commit 929b04ce3a4fe419dece49ed68d4f6228be214c1
 Author: Harsh J ha...@cloudera.com
 Date:   Sun Mar 22 10:18:32 2015 +0530
 {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11753) TestS3AContractOpen#testOpenReadZeroByteFile fails due to negative range header

2015-03-25 Thread Takenori Sato (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14381338#comment-14381338
 ] 

Takenori Sato commented on HADOOP-11753:


_TestS3AContractSeek#testBlockReadZeroByteFile_ fails for the same reason, too.

 TestS3AContractOpen#testOpenReadZeroByteFile fails due to negative range 
 header
 ---

 Key: HADOOP-11753
 URL: https://issues.apache.org/jira/browse/HADOOP-11753
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs/s3
Reporter: Takenori Sato

 _TestS3AContractOpen#testOpenReadZeroByteFile_ fails as follows.
 {code}
 testOpenReadZeroByteFile(org.apache.hadoop.fs.contract.s3a.TestS3AContractOpen)
   Time elapsed: 3.312 sec   ERROR!
 com.amazonaws.services.s3.model.AmazonS3Exception: Status Code: 416, AWS 
 Service: Amazon S3, AWS Request ID: A58A95E0D36811E4, AWS Error Code: 
 InvalidRange, AWS Error Message: The requested range cannot be satisfied.
   at 
 com.amazonaws.http.AmazonHttpClient.handleErrorResponse(AmazonHttpClient.java:798)
   at 
 com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:421)
   at 
 com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:232)
   at 
 com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3528)
   at 
 com.amazonaws.services.s3.AmazonS3Client.getObject(AmazonS3Client.java:)
   at 
 org.apache.hadoop.fs.s3a.S3AInputStream.reopen(S3AInputStream.java:91)
   at 
 org.apache.hadoop.fs.s3a.S3AInputStream.openIfNeeded(S3AInputStream.java:62)
   at org.apache.hadoop.fs.s3a.S3AInputStream.read(S3AInputStream.java:127)
   at java.io.FilterInputStream.read(FilterInputStream.java:83)
   at 
 org.apache.hadoop.fs.contract.AbstractContractOpenTest.testOpenReadZeroByteFile(AbstractContractOpenTest.java:66)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
   at 
 org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
   at 
 org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
   at 
 org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
   at 
 org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
   at 
 org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
   at 
 org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
   at 
 org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
 {code}
 This is because the header is wrong when calling _S3AInputStream#read_ after 
 _S3AInputStream#open_.
 {code}
 Range: bytes=0--1
 * from 0 to -1
 {code}
 Tested on the latest branch-2.7.
 {quote}
 $ git log
 commit d286673c602524af08935ea132c8afd181b6e2e4
 Author: Jitendra Pandey Jitendra@Jitendra-Pandeys-MacBook-Pro-4.local
 Date:   Tue Mar 24 16:17:06 2015 -0700
 {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11753) TestS3AContractOpen#testOpenReadZeroByteFile fails due to negative range header

2015-03-25 Thread Takenori Sato (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14381343#comment-14381343
 ] 

Takenori Sato commented on HADOOP-11753:


Another one.

{code}
testSeekZeroByteFile(org.apache.hadoop.fs.contract.s3a.TestS3AContractSeek)  
Time elapsed: 9.478 sec   ERROR!
com.amazonaws.services.s3.model.AmazonS3Exception: Status Code: 416, AWS 
Service: Amazon S3, AWS Request ID: 29E6B1A0D37011E4, AWS Error Code: 
InvalidRange, AWS Error Message: The requested range cannot be satisfied.
at 
com.amazonaws.http.AmazonHttpClient.handleErrorResponse(AmazonHttpClient.java:798)
at 
com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:421)
at 
com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:232)
at 
com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3528)
at 
com.amazonaws.services.s3.AmazonS3Client.getObject(AmazonS3Client.java:)
at 
org.apache.hadoop.fs.s3a.S3AInputStream.reopen(S3AInputStream.java:91)
at 
org.apache.hadoop.fs.s3a.S3AInputStream.openIfNeeded(S3AInputStream.java:62)
at org.apache.hadoop.fs.s3a.S3AInputStream.read(S3AInputStream.java:127)
at java.io.FilterInputStream.read(FilterInputStream.java:83)
at 
org.apache.hadoop.fs.contract.AbstractContractSeekTest.testSeekZeroByteFile(AbstractContractSeekTest.java:88)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at 
org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
{code}

 TestS3AContractOpen#testOpenReadZeroByteFile fails due to negative range 
 header
 ---

 Key: HADOOP-11753
 URL: https://issues.apache.org/jira/browse/HADOOP-11753
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs/s3
Reporter: Takenori Sato

 _TestS3AContractOpen#testOpenReadZeroByteFile_ fails as follows.
 {code}
 testOpenReadZeroByteFile(org.apache.hadoop.fs.contract.s3a.TestS3AContractOpen)
   Time elapsed: 3.312 sec   ERROR!
 com.amazonaws.services.s3.model.AmazonS3Exception: Status Code: 416, AWS 
 Service: Amazon S3, AWS Request ID: A58A95E0D36811E4, AWS Error Code: 
 InvalidRange, AWS Error Message: The requested range cannot be satisfied.
   at 
 com.amazonaws.http.AmazonHttpClient.handleErrorResponse(AmazonHttpClient.java:798)
   at 
 com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:421)
   at 
 com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:232)
   at 
 com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3528)
   at 
 com.amazonaws.services.s3.AmazonS3Client.getObject(AmazonS3Client.java:)
   at 
 org.apache.hadoop.fs.s3a.S3AInputStream.reopen(S3AInputStream.java:91)
   at 
 org.apache.hadoop.fs.s3a.S3AInputStream.openIfNeeded(S3AInputStream.java:62)
   at org.apache.hadoop.fs.s3a.S3AInputStream.read(S3AInputStream.java:127)
   at java.io.FilterInputStream.read(FilterInputStream.java:83)
   at 
 org.apache.hadoop.fs.contract.AbstractContractOpenTest.testOpenReadZeroByteFile(AbstractContractOpenTest.java:66)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
   at 
 org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
   at 
 org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
   at 
 org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
   at 
 org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
   at 
 org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
   at 
 

[jira] [Created] (HADOOP-11742) mkdir by file system shell fails on an empty bucket

2015-03-23 Thread Takenori Sato (JIRA)
Takenori Sato created HADOOP-11742:
--

 Summary: mkdir by file system shell fails on an empty bucket
 Key: HADOOP-11742
 URL: https://issues.apache.org/jira/browse/HADOOP-11742
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs/s3
 Environment: CentOS 7
Reporter: Takenori Sato


I have built the latest 2.7, and tried S3AFileSystem.

Then found that _mkdir_ fails on an empty bucket, named *s3a* here, as follows:

{code}
# hadoop-2.7.0-SNAPSHOT/bin/hdfs dfs -mkdir s3a://s3a/foo
15/03/24 03:49:35 DEBUG s3a.S3AFileSystem: Getting path status for 
s3a://s3a/foo (foo)
15/03/24 03:49:36 DEBUG s3a.S3AFileSystem: Not Found: s3a://s3a/foo
15/03/24 03:49:36 DEBUG s3a.S3AFileSystem: Getting path status for s3a://s3a/ ()
15/03/24 03:49:36 DEBUG s3a.S3AFileSystem: Not Found: s3a://s3a/
mkdir: `s3a://s3a/foo': No such file or directory
{code}

So does _ls_.

{code}
# hadoop-2.7.0-SNAPSHOT/bin/hdfs dfs -ls s3a://s3a/
15/03/24 03:47:48 DEBUG s3a.S3AFileSystem: Getting path status for s3a://s3a/ ()
15/03/24 03:47:48 DEBUG s3a.S3AFileSystem: Not Found: s3a://s3a/
ls: `s3a://s3a/': No such file or directory
{code}

This is how it works via s3n.

{code}
# hadoop-2.7.0-SNAPSHOT/bin/hdfs dfs -ls s3n://s3n/
# hadoop-2.7.0-SNAPSHOT/bin/hdfs dfs -mkdir s3n://s3n/foo
# hadoop-2.7.0-SNAPSHOT/bin/hdfs dfs -ls s3n://s3n/
Found 1 items
drwxrwxrwx   -  0 1970-01-01 00:00 s3n://s3n/foo
{code}

The snapshot is the following:

{quote}
\# git branch
\* branch-2.7
  trunk
\# git log
commit 929b04ce3a4fe419dece49ed68d4f6228be214c1
Author: Harsh J ha...@cloudera.com
Date:   Sun Mar 22 10:18:32 2015 +0530
{quote}
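
From the debug output above, the root of an empty bucket never resolves to a 
directory, so both commands bail out with "No such file or directory". A minimal 
sketch of the behaviour a fix needs, with illustrative names rather than the 
actual patch: the bucket root should be reported as a directory even when the 
bucket holds no objects yet.
{code}
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.Path;

// Sketch only (illustrative names, not the actual patch): the bucket
// root should resolve to a directory even when the bucket is empty, so
// that -ls and -mkdir on an empty bucket succeed.
final class EmptyBucketRootSketch {

  /** An empty key (or "/") means the path is the bucket root itself. */
  static boolean isRoot(String key) {
    return key.isEmpty() || "/".equals(key);
  }

  /** Directory status reported for the root instead of "Not Found". */
  static FileStatus rootStatus(Path qualifiedRoot) {
    return new FileStatus(0L, true, 1, 0L, 0L, qualifiedRoot);
  }
}
{code}
getFileStatus would then consult isRoot(key) before falling through to its Not 
Found branch.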



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11730) The broken s3n read retry logic causes a wrong output being committed

2015-03-20 Thread Takenori Sato (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takenori Sato updated HADOOP-11730:
---
Attachment: (was: HADOOP-11730-branch-2.6.0.001.patch)

 The broken s3n read retry logic causes a wrong output being committed
 -

 Key: HADOOP-11730
 URL: https://issues.apache.org/jira/browse/HADOOP-11730
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs/s3
Affects Versions: 2.6.0
 Environment: HDP 2.2
Reporter: Takenori Sato
Assignee: Takenori Sato
 Attachments: HADOOP-11730-branch-2.6.0.001.patch


 s3n attempts to read again when it encounters an IOException during read. But 
 the current logic does not reopen the connection, so the retry ends up as a 
 no-op and the wrong (truncated) output is committed.
 Here's a stack trace as an example.
 {quote}
 2015-03-13 20:17:24,835 [TezChild] INFO  
 org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor - 
 Starting output org.apache.tez.mapreduce.output.MROutput@52008dbd to vertex 
 scope-12
 2015-03-13 20:17:24,866 [TezChild] DEBUG 
 org.jets3t.service.impl.rest.httpclient.HttpMethodReleaseInputStream - 
 Released HttpMethod as its response data stream threw an exception
 org.apache.http.ConnectionClosedException: Premature end of Content-Length 
 delimited message body (expected: 296587138; received: 155648
   at 
 org.apache.http.impl.io.ContentLengthInputStream.read(ContentLengthInputStream.java:184)
   at 
 org.apache.http.conn.EofSensorInputStream.read(EofSensorInputStream.java:138)
   at 
 org.jets3t.service.io.InterruptableInputStream.read(InterruptableInputStream.java:78)
   at 
 org.jets3t.service.impl.rest.httpclient.HttpMethodReleaseInputStream.read(HttpMethodReleaseInputStream.java:146)
   at 
 org.apache.hadoop.fs.s3native.NativeS3FileSystem$NativeS3FsInputStream.read(NativeS3FileSystem.java:145)
   at java.io.BufferedInputStream.read1(BufferedInputStream.java:273)
   at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
   at java.io.DataInputStream.read(DataInputStream.java:100)
   at org.apache.hadoop.util.LineReader.fillBuffer(LineReader.java:180)
   at 
 org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:216)
   at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174)
   at 
 org.apache.hadoop.mapreduce.lib.input.LineRecordReader.nextKeyValue(LineRecordReader.java:185)
   at org.apache.pig.builtin.PigStorage.getNext(PigStorage.java:259)
   at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:204)
   at 
 org.apache.tez.mapreduce.lib.MRReaderMapReduce.next(MRReaderMapReduce.java:116)
   at 
 org.apache.pig.backend.hadoop.executionengine.tez.plan.operator.POSimpleTezLoad.getNextTuple(POSimpleTezLoad.java:106)
   at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:307)
   at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNextTuple(POForEach.java:246)
   at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:307)
   at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFilter.getNextTuple(POFilter.java:91)
   at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:307)
   at 
 org.apache.pig.backend.hadoop.executionengine.tez.plan.operator.POStoreTez.getNextTuple(POStoreTez.java:117)
   at 
 org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.runPipeline(PigProcessor.java:313)
   at 
 org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.run(PigProcessor.java:192)
   at 
 org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324)
   at 
 org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:176)
   at 
 org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:168)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
   at 
 org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:168)
   at 
 org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:163)
   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   

[jira] [Updated] (HADOOP-11730) The broken s3n read retry logic causes a wrong output being committed

2015-03-20 Thread Takenori Sato (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takenori Sato updated HADOOP-11730:
---
Attachment: HADOOP-11730-branch-2.6.0.001.patch

The first patch with the updated test case.

 The broken s3n read retry logic causes a wrong output being committed
 -

 Key: HADOOP-11730
 URL: https://issues.apache.org/jira/browse/HADOOP-11730
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs/s3
Affects Versions: 2.6.0
 Environment: HDP 2.2
Reporter: Takenori Sato
Assignee: Takenori Sato
 Attachments: HADOOP-11730-branch-2.6.0.001.patch


 s3n attempts to read again when it encounters an IOException during read. But 
 the current logic does not reopen the connection, so the retry ends up as a 
 no-op and the wrong (truncated) output is committed.
 Here's a stack trace as an example.
 {quote}
 2015-03-13 20:17:24,835 [TezChild] INFO  
 org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor - 
 Starting output org.apache.tez.mapreduce.output.MROutput@52008dbd to vertex 
 scope-12
 2015-03-13 20:17:24,866 [TezChild] DEBUG 
 org.jets3t.service.impl.rest.httpclient.HttpMethodReleaseInputStream - 
 Released HttpMethod as its response data stream threw an exception
 org.apache.http.ConnectionClosedException: Premature end of Content-Length 
 delimited message body (expected: 296587138; received: 155648
   at 
 org.apache.http.impl.io.ContentLengthInputStream.read(ContentLengthInputStream.java:184)
   at 
 org.apache.http.conn.EofSensorInputStream.read(EofSensorInputStream.java:138)
   at 
 org.jets3t.service.io.InterruptableInputStream.read(InterruptableInputStream.java:78)
   at 
 org.jets3t.service.impl.rest.httpclient.HttpMethodReleaseInputStream.read(HttpMethodReleaseInputStream.java:146)
   at 
 org.apache.hadoop.fs.s3native.NativeS3FileSystem$NativeS3FsInputStream.read(NativeS3FileSystem.java:145)
   at java.io.BufferedInputStream.read1(BufferedInputStream.java:273)
   at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
   at java.io.DataInputStream.read(DataInputStream.java:100)
   at org.apache.hadoop.util.LineReader.fillBuffer(LineReader.java:180)
   at 
 org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:216)
   at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174)
   at 
 org.apache.hadoop.mapreduce.lib.input.LineRecordReader.nextKeyValue(LineRecordReader.java:185)
   at org.apache.pig.builtin.PigStorage.getNext(PigStorage.java:259)
   at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:204)
   at 
 org.apache.tez.mapreduce.lib.MRReaderMapReduce.next(MRReaderMapReduce.java:116)
   at 
 org.apache.pig.backend.hadoop.executionengine.tez.plan.operator.POSimpleTezLoad.getNextTuple(POSimpleTezLoad.java:106)
   at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:307)
   at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNextTuple(POForEach.java:246)
   at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:307)
   at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFilter.getNextTuple(POFilter.java:91)
   at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:307)
   at 
 org.apache.pig.backend.hadoop.executionengine.tez.plan.operator.POStoreTez.getNextTuple(POStoreTez.java:117)
   at 
 org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.runPipeline(PigProcessor.java:313)
   at 
 org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.run(PigProcessor.java:192)
   at 
 org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324)
   at 
 org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:176)
   at 
 org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:168)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
   at 
 org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:168)
   at 
 org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:163)
   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
   at 
 

[jira] [Resolved] (HADOOP-10037) s3n read truncated, but doesn't throw exception

2015-03-19 Thread Takenori Sato (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takenori Sato resolved HADOOP-10037.

Resolution: Fixed

The issue that had reopened this turned out to be a separate issue.

 s3n read truncated, but doesn't throw exception 
 

 Key: HADOOP-10037
 URL: https://issues.apache.org/jira/browse/HADOOP-10037
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs/s3
Affects Versions: 2.0.0-alpha
 Environment: Ubuntu Linux 13.04 running on Amazon EC2 (cc2.8xlarge)
Reporter: David Rosenstrauch
 Fix For: 2.6.0

 Attachments: S3ReadFailedOnTruncation.html, S3ReadSucceeded.html


 For months now we've been finding that we've been experiencing frequent data 
 truncation issues when reading from S3 using the s3n:// protocol.  I finally 
 was able to gather some debugging output on the issue in a job I ran last 
 night, and so can finally file a bug report.
 The job I ran last night was on a 16-node cluster (all of them AWS EC2 
 cc2.8xlarge machines, running Ubuntu 13.04 and Cloudera CDH4.3.0).  The job 
 was a Hadoop streaming job, which reads through a large number (i.e., 
 ~55,000) of files on S3, each of them approximately 300K bytes in size.
 All of the files contain 46 columns of data in each record.  But I added in 
 an extra check in my mapper code to count and verify the number of columns in 
 every record - throwing an error and crashing the map task if the column 
 count is wrong.
 If you look in the attached task logs, you'll see 2 attempts on the same 
 task.  The first one fails due to data truncated (i.e., my job intentionally 
 fails the map task due to the current record failing the column count check). 
 The task then gets retried on a different machine and runs to a successful 
 completion.
 You can see further evidence of the truncation further down in the task logs, 
 where it displays the count of the records read:  the failed task says 32953 
 records read, while the successful task says 63133.
 Any idea what the problem might be here and/or how to work around it?  This 
 issue is a very common occurrence on our clusters.  E.g., in the job I ran 
 last night before I had gone to bed I had already encountered 8 such 
 failures, and the job was only 10% complete.  (~25,000 out of ~250,000 tasks.)
 I realize that it's common for I/O errors to occur - possibly even frequently 
 - in a large Hadoop job.  But I would think that if an I/O failure (like a 
 truncated read) did occur, that something in the underlying infrastructure 
 code (i.e., either in NativeS3FileSystem or in jets3t) should detect the 
 error and throw an IOException accordingly.  It shouldn't be up to the 
 calling code to detect such failures, IMO.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-10037) s3n read truncated, but doesn't throw exception

2015-03-19 Thread Takenori Sato (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14370292#comment-14370292
 ] 

Takenori Sato commented on HADOOP-10037:


David, thanks for your clarification.

I heard from Steve that my issue was introduced by some optimizations done for 
2.4.

So let me close this as FIXED. I will create a new issue for mine.

 s3n read truncated, but doesn't throw exception 
 

 Key: HADOOP-10037
 URL: https://issues.apache.org/jira/browse/HADOOP-10037
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs/s3
Affects Versions: 2.0.0-alpha
 Environment: Ubuntu Linux 13.04 running on Amazon EC2 (cc2.8xlarge)
Reporter: David Rosenstrauch
 Fix For: 2.6.0

 Attachments: S3ReadFailedOnTruncation.html, S3ReadSucceeded.html


 For months now we've been finding that we've been experiencing frequent data 
 truncation issues when reading from S3 using the s3n:// protocol.  I finally 
 was able to gather some debugging output on the issue in a job I ran last 
 night, and so can finally file a bug report.
 The job I ran last night was on a 16-node cluster (all of them AWS EC2 
 cc2.8xlarge machines, running Ubuntu 13.04 and Cloudera CDH4.3.0).  The job 
 was a Hadoop streaming job, which reads through a large number (i.e., 
 ~55,000) of files on S3, each of them approximately 300K bytes in size.
 All of the files contain 46 columns of data in each record.  But I added in 
 an extra check in my mapper code to count and verify the number of columns in 
 every record - throwing an error and crashing the map task if the column 
 count is wrong.
 If you look in the attached task logs, you'll see 2 attempts on the same 
 task.  The first one fails due to data truncated (i.e., my job intentionally 
 fails the map task due to the current record failing the column count check). 
 The task then gets retried on a different machine and runs to a successful 
 completion.
 You can see further evidence of the truncation further down in the task logs, 
 where it displays the count of the records read:  the failed task says 32953 
 records read, while the successful task says 63133.
 Any idea what the problem might be here and/or how to work around it?  This 
 issue is a very common occurrence on our clusters.  E.g., in the job I ran 
 last night before I had gone to bed I had already encountered 8 such 
 failures, and the job was only 10% complete.  (~25,000 out of ~250,000 tasks.)
 I realize that it's common for I/O errors to occur - possibly even frequently 
 - in a large Hadoop job.  But I would think that if an I/O failure (like a 
 truncated read) did occur, that something in the underlying infrastructure 
 code (i.e., either in NativeS3FileSystem or in jets3t) should detect the 
 error and throw an IOException accordingly.  It shouldn't be up to the 
 calling code to detect such failures, IMO.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HADOOP-11730) The broken s3n read retry logic causes a wrong output being committed

2015-03-19 Thread Takenori Sato (JIRA)
Takenori Sato created HADOOP-11730:
--

 Summary: The broken s3n read retry logic causes a wrong output 
being committed
 Key: HADOOP-11730
 URL: https://issues.apache.org/jira/browse/HADOOP-11730
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs/s3
Affects Versions: 2.6.0
 Environment: HDP 2.2
Reporter: Takenori Sato
Assignee: Takenori Sato


s3n attempts to read again when it encounters an IOException during read. But the 
current logic does not reopen the connection, so the retry ends up as a no-op and 
the wrong (truncated) output is committed.

Here's a stack trace as an example.

{quote}
2015-03-13 20:17:24,835 [TezChild] INFO  
org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor - 
Starting output org.apache.tez.mapreduce.output.MROutput@52008dbd to vertex 
scope-12
2015-03-13 20:17:24,866 [TezChild] DEBUG 
org.jets3t.service.impl.rest.httpclient.HttpMethodReleaseInputStream - Released 
HttpMethod as its response data stream threw an exception
org.apache.http.ConnectionClosedException: Premature end of Content-Length 
delimited message body (expected: 296587138; received: 155648
at 
org.apache.http.impl.io.ContentLengthInputStream.read(ContentLengthInputStream.java:184)
at 
org.apache.http.conn.EofSensorInputStream.read(EofSensorInputStream.java:138)
at 
org.jets3t.service.io.InterruptableInputStream.read(InterruptableInputStream.java:78)
at 
org.jets3t.service.impl.rest.httpclient.HttpMethodReleaseInputStream.read(HttpMethodReleaseInputStream.java:146)
at 
org.apache.hadoop.fs.s3native.NativeS3FileSystem$NativeS3FsInputStream.read(NativeS3FileSystem.java:145)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:273)
at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
at java.io.DataInputStream.read(DataInputStream.java:100)
at org.apache.hadoop.util.LineReader.fillBuffer(LineReader.java:180)
at 
org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:216)
at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174)
at 
org.apache.hadoop.mapreduce.lib.input.LineRecordReader.nextKeyValue(LineRecordReader.java:185)
at org.apache.pig.builtin.PigStorage.getNext(PigStorage.java:259)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:204)
at 
org.apache.tez.mapreduce.lib.MRReaderMapReduce.next(MRReaderMapReduce.java:116)
at 
org.apache.pig.backend.hadoop.executionengine.tez.plan.operator.POSimpleTezLoad.getNextTuple(POSimpleTezLoad.java:106)
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:307)
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNextTuple(POForEach.java:246)
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:307)
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFilter.getNextTuple(POFilter.java:91)
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:307)
at 
org.apache.pig.backend.hadoop.executionengine.tez.plan.operator.POStoreTez.getNextTuple(POStoreTez.java:117)
at 
org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.runPipeline(PigProcessor.java:313)
at 
org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.run(PigProcessor.java:192)
at 
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324)
at 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:176)
at 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:168)
at 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:163)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
2015-03-13 20:17:24,867 [TezChild] INFO  
org.apache.hadoop.fs.s3native.NativeS3FileSystem - Received IOException while 
reading 

[jira] [Updated] (HADOOP-11730) The broken s3n read retry logic causes a wrong output being committed

2015-03-19 Thread Takenori Sato (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takenori Sato updated HADOOP-11730:
---
Attachment: HADOOP-11730-branch-2.6.0.001.patch

The first proposal without the test case.

{code}
2015-03-20 12:05:08,473 [TezChild] INFO  org.apache.hadoop.fs.s3native.NativeS3FileSystem - Received IOException while reading 'user/hadoop/tsato/readlarge/input/cloudian-s3.log.20141119', attempting to reopen.
2015-03-20 12:05:08,473 [TezChild] DEBUG org.jets3t.service.impl.rest.httpclient.RestStorageService - Retrieving All information for bucket shared and object user/hadoop/tsato/readlarge/input/cloudian-s3.log.20141119
{code}

Verified manually that it reopens a new connection after IOException.
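
For readers following along, a minimal sketch of the shape of such a retry, with 
illustrative names rather than the attached patch: the wrapped stream has to be 
re-fetched from the current offset before the read is retried, otherwise the 
retry operates on a dead stream.
{code}
import java.io.IOException;
import java.io.InputStream;

// Sketch only: retry a failed read after reopening the underlying
// stream at the current offset. Field and method names are illustrative.
abstract class ReopeningReadSketch {
  private InputStream in;
  private long pos;

  /** Re-issue the GET for the same key starting at the given offset. */
  protected abstract InputStream reopenAt(long pos) throws IOException;

  int read(byte[] b, int off, int len) throws IOException {
    int result;
    try {
      result = in.read(b, off, len);
    } catch (IOException e) {
      // Without this reopen the retry is a no-op on a dead stream and a
      // truncated output can be committed as if it were complete.
      in = reopenAt(pos);
      result = in.read(b, off, len);
    }
    if (result > 0) {
      pos += result;
    }
    return result;
  }
}
{code}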



 The broken s3n read retry logic causes a wrong output being committed
 -

 Key: HADOOP-11730
 URL: https://issues.apache.org/jira/browse/HADOOP-11730
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs/s3
Affects Versions: 2.6.0
 Environment: HDP 2.2
Reporter: Takenori Sato
Assignee: Takenori Sato
 Attachments: HADOOP-11730-branch-2.6.0.001.patch


 s3n attempts to read again when it encounters an IOException during read. But 
 the current logic does not reopen the connection, so the retry ends up as a 
 no-op and the wrong (truncated) output is committed.
 Here's a stack trace as an example.
 {quote}
 2015-03-13 20:17:24,835 [TezChild] INFO  
 org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor - 
 Starting output org.apache.tez.mapreduce.output.MROutput@52008dbd to vertex 
 scope-12
 2015-03-13 20:17:24,866 [TezChild] DEBUG 
 org.jets3t.service.impl.rest.httpclient.HttpMethodReleaseInputStream - 
 Released HttpMethod as its response data stream threw an exception
 org.apache.http.ConnectionClosedException: Premature end of Content-Length 
 delimited message body (expected: 296587138; received: 155648
   at 
 org.apache.http.impl.io.ContentLengthInputStream.read(ContentLengthInputStream.java:184)
   at 
 org.apache.http.conn.EofSensorInputStream.read(EofSensorInputStream.java:138)
   at 
 org.jets3t.service.io.InterruptableInputStream.read(InterruptableInputStream.java:78)
   at 
 org.jets3t.service.impl.rest.httpclient.HttpMethodReleaseInputStream.read(HttpMethodReleaseInputStream.java:146)
   at 
 org.apache.hadoop.fs.s3native.NativeS3FileSystem$NativeS3FsInputStream.read(NativeS3FileSystem.java:145)
   at java.io.BufferedInputStream.read1(BufferedInputStream.java:273)
   at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
   at java.io.DataInputStream.read(DataInputStream.java:100)
   at org.apache.hadoop.util.LineReader.fillBuffer(LineReader.java:180)
   at 
 org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:216)
   at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174)
   at 
 org.apache.hadoop.mapreduce.lib.input.LineRecordReader.nextKeyValue(LineRecordReader.java:185)
   at org.apache.pig.builtin.PigStorage.getNext(PigStorage.java:259)
   at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:204)
   at 
 org.apache.tez.mapreduce.lib.MRReaderMapReduce.next(MRReaderMapReduce.java:116)
   at 
 org.apache.pig.backend.hadoop.executionengine.tez.plan.operator.POSimpleTezLoad.getNextTuple(POSimpleTezLoad.java:106)
   at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:307)
   at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNextTuple(POForEach.java:246)
   at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:307)
   at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFilter.getNextTuple(POFilter.java:91)
   at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:307)
   at 
 org.apache.pig.backend.hadoop.executionengine.tez.plan.operator.POStoreTez.getNextTuple(POStoreTez.java:117)
   at 
 org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.runPipeline(PigProcessor.java:313)
   at 
 org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.run(PigProcessor.java:192)
   at 
 org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324)
   at 
 org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:176)
   at 
 org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:168)
   at 

[jira] [Reopened] (HADOOP-10037) s3n read truncated, but doesn't throw exception

2015-03-18 Thread Takenori Sato (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takenori Sato reopened HADOOP-10037:


I confirmed this happens on Hadoop 2.6.0, and found the reason.

Here's the stacktrace.

{quote}

2015-03-13 20:17:24,866 [TezChild] DEBUG 
org.jets3t.service.impl.rest.httpclient.HttpMethodReleaseInputStream - Released 
HttpMethod as its response data stream threw an exception
org.apache.http.ConnectionClosedException: Premature end of Content-Length 
delimited message body (expected: 296587138; received: 155648
at 
org.apache.http.impl.io.ContentLengthInputStream.read(ContentLengthInputStream.java:184)
at 
org.apache.http.conn.EofSensorInputStream.read(EofSensorInputStream.java:138)
at 
org.jets3t.service.io.InterruptableInputStream.read(InterruptableInputStream.java:78)
at 
org.jets3t.service.impl.rest.httpclient.HttpMethodReleaseInputStream.read(HttpMethodReleaseInputStream.java:146)
at 
org.apache.hadoop.fs.s3native.NativeS3FileSystem$NativeS3FsInputStream.read(NativeS3FileSystem.java:145)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:273)
at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
at java.io.DataInputStream.read(DataInputStream.java:100)
at org.apache.hadoop.util.LineReader.fillBuffer(LineReader.java:180)
at 
org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:216)
at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174)
at 
org.apache.hadoop.mapreduce.lib.input.LineRecordReader.nextKeyValue(LineRecordReader.java:185)
at org.apache.pig.builtin.PigStorage.getNext(PigStorage.java:259)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:204)
at 
org.apache.tez.mapreduce.lib.MRReaderMapReduce.next(MRReaderMapReduce.java:116)
at 
org.apache.pig.backend.hadoop.executionengine.tez.plan.operator.POSimpleTezLoad.getNextTuple(POSimpleTezLoad.java:106)
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:307)
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNextTuple(POForEach.java:246)
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:307)
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFilter.getNextTuple(POFilter.java:91)
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:307)
at 
org.apache.pig.backend.hadoop.executionengine.tez.plan.operator.POStoreTez.getNextTuple(POStoreTez.java:117)
at 
org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.runPipeline(PigProcessor.java:313)
at 
org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.run(PigProcessor.java:192)
at 
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324)
at 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:176)
at 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:168)
at 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:163)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
2015-03-13 20:17:24,867 [TezChild] INFO  
org.apache.hadoop.fs.s3native.NativeS3FileSystem - Received IOException while 
reading 'user/hadoop/tsato/readlarge/input/cloudian-s3.log.20141119', 
attempting to reopen.
2015-03-13 20:17:24,867 [TezChild] DEBUG 
org.jets3t.service.impl.rest.httpclient.HttpMethodReleaseInputStream - Released 
HttpMethod as its response data stream is fully consumed
2015-03-13 20:17:24,868 [TezChild] INFO  
org.apache.tez.dag.app.TaskAttemptListenerImpTezDag - Commit go/no-go request 
from attempt_1426245338920_0001_1_00_04_0
2015-03-13 20:17:24,868 [TezChild] INFO  
org.apache.tez.dag.app.dag.impl.TaskImpl - 
attempt_1426245338920_0001_1_00_04_0 given a go for committing the task 
output.

{quote}

The problem is that a job successfully finishes after the exception. 

[jira] [Commented] (HADOOP-10400) Incorporate new S3A FileSystem implementation

2014-08-20 Thread Takenori Sato (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14104967#comment-14104967
 ] 

Takenori Sato commented on HADOOP-10400:


Hi Jordan Mendelson,

I came from HADOOP-10643, where you suggested that a new improvement over 
NativeS3FileSystem should be done here.

So I've made 2 pull requests for your upstream repository.

1. make endpoint configurable
https://github.com/Aloisius/hadoop-s3a/pull/8

jets3t allows a user to configure an endpoint (protocol, host, and port) through 
jets3t.properties, but with the Amazon SDK the endpoint can only be set by calling 
a particular method, so a user has no way to configure it. This fix simply allows 
it (a sketch follows at the end of this comment).

2. subclass of AbstractFileSystem
https://github.com/Aloisius/hadoop-s3a/pull/9

This contains a fix for a problem similar to HADOOP-10643. The difference is 
that this fix is simpler and needs no modification to AbstractFileSystem.
Also, when using this subclass, HADOOP-8984 becomes obvious, so a fix for it is 
included as well.


Btw, on my test with Pig, I needed to apply the following fix to make this work:
"Ensure the file is open before trying to seek"
https://github.com/Aloisius/hadoop-s3a/pull/6
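
For the first pull request, a rough sketch of what "make endpoint configurable" 
amounts to; the property name fs.s3a.endpoint and the helper below are 
assumptions for illustration, not taken from the pull request.
{code}
import org.apache.hadoop.conf.Configuration;

import com.amazonaws.services.s3.AmazonS3Client;

// Sketch only: with the AWS SDK the endpoint (protocol, host, port) is
// set by a method call rather than a properties file as with jets3t.
// The property name "fs.s3a.endpoint" is assumed, not from the PR.
final class EndpointSetupSketch {
  static void applyEndpoint(AmazonS3Client client, Configuration conf) {
    String endpoint = conf.get("fs.s3a.endpoint", "").trim();
    if (!endpoint.isEmpty()) {
      client.setEndpoint(endpoint); // e.g. "https://s3.example.com:8443"
    }
  }
}
{code}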

 Incorporate new S3A FileSystem implementation
 -

 Key: HADOOP-10400
 URL: https://issues.apache.org/jira/browse/HADOOP-10400
 Project: Hadoop Common
  Issue Type: Improvement
  Components: fs, fs/s3
Affects Versions: 2.4.0
Reporter: Jordan Mendelson
Assignee: Jordan Mendelson
 Attachments: HADOOP-10400-1.patch, HADOOP-10400-2.patch, 
 HADOOP-10400-3.patch, HADOOP-10400-4.patch, HADOOP-10400-5.patch, 
 HADOOP-10400-6.patch


 The s3native filesystem has a number of limitations (some of which were 
 recently fixed by HADOOP-9454). This patch adds an s3a filesystem which uses 
 the aws-sdk instead of the jets3t library. There are a number of improvements 
 over s3native including:
 - Parallel copy (rename) support (dramatically speeds up commits on large 
 files)
 - AWS S3 explorer compatible empty directories files "xyz/" instead of 
 "xyz_$folder$" (reduces littering)
 - Ignores s3native created _$folder$ files created by s3native and other S3 
 browsing utilities
 - Supports multiple output buffer dirs to even out IO when uploading files
 - Supports IAM role-based authentication
 - Allows setting a default canned ACL for uploads (public, private, etc.)
 - Better error recovery handling
 - Should handle input seeks without having to download the whole file (used 
 for splits a lot)
 This code is a copy of https://github.com/Aloisius/hadoop-s3a with patches to 
 various pom files to get it to build against trunk. I've been using 0.0.1 in 
 production with CDH 4 for several months and CDH 5 for a few days. The 
 version here is 0.0.2 which changes around some keys to hopefully bring the 
 key name style more inline with the rest of hadoop 2.x.
 *Tunable parameters:*
 fs.s3a.access.key - Your AWS access key ID (omit for role authentication)
 fs.s3a.secret.key - Your AWS secret key (omit for role authentication)
 fs.s3a.connection.maximum - Controls how many parallel connections 
 HttpClient spawns (default: 15)
 fs.s3a.connection.ssl.enabled - Enables or disables SSL connections to S3 
 (default: true)
 fs.s3a.attempts.maximum - How many times we should retry commands on 
 transient errors (default: 10)
 fs.s3a.connection.timeout - Socket connect timeout (default: 5000)
 fs.s3a.paging.maximum - How many keys to request from S3 when doing 
 directory listings at a time (default: 5000)
 fs.s3a.multipart.size - How big (in bytes) to split a upload or copy 
 operation up into (default: 104857600)
 fs.s3a.multipart.threshold - Until a file is this large (in bytes), use 
 non-parallel upload (default: 2147483647)
 fs.s3a.acl.default - Set a canned ACL on newly created/copied objects 
 (private | public-read | public-read-write | authenticated-read | 
 log-delivery-write | bucket-owner-read | bucket-owner-full-control)
 fs.s3a.multipart.purge - True if you want to purge existing multipart 
 uploads that may not have been completed/aborted correctly (default: false)
 fs.s3a.multipart.purge.age - Minimum age in seconds of multipart uploads 
 to purge (default: 86400)
 fs.s3a.buffer.dir - Comma separated list of directories that will be used 
 to buffer file writes out of (default: uses ${hadoop.tmp.dir}/s3a )
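
 For illustration, a minimal Java sketch that wires a few of the keys above into 
 a Hadoop Configuration; the bucket name and credential values are placeholders.
 {code}
 import java.net.URI;

 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.fs.FileSystem;

 // Sketch only: bucket name and credentials below are placeholders.
 public class S3AQuickStart {
   public static void main(String[] args) throws Exception {
     Configuration conf = new Configuration();
     conf.set("fs.s3a.access.key", "ACCESS_KEY");   // omit for role auth
     conf.set("fs.s3a.secret.key", "SECRET_KEY");   // omit for role auth
     conf.setInt("fs.s3a.connection.maximum", 15);
     conf.setBoolean("fs.s3a.connection.ssl.enabled", true);
     FileSystem fs = FileSystem.get(URI.create("s3a://s3atest/"), conf);
     System.out.println(fs.getUri());
   }
 }
 {code}
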
 *Caveats*:
 Hadoop uses a standard output committer which uploads files as 
 filename.COPYING before renaming them. This can cause unnecessary performance 
 issues with S3 because it does not have a rename operation and S3 already 
 verifies uploads against an md5 that the driver sets on the upload request. 
 While this FileSystem should be significantly faster than the built-in 
 s3native driver