[jira] [Commented] (HADOOP-11335) KMS ACL in meta data or database
[ https://issues.apache.org/jira/browse/HADOOP-11335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14281666#comment-14281666 ]

Dian Fu commented on HADOOP-11335:
----------------------------------

The test failure is unrelated to this patch. I have run the failed test case locally and it passed.

> KMS ACL in meta data or database
>
>                 Key: HADOOP-11335
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11335
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: kms
>    Affects Versions: 2.6.0
>            Reporter: Jerry Chen
>            Assignee: Dian Fu
>              Labels: Security
>         Attachments: HADOOP-11335.001.patch, HADOOP-11335.002.patch, HADOOP-11335.003.patch, KMS ACL in metadata or database.pdf
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> Currently Hadoop KMS implements ACLs for keys, and the per-key ACLs are stored in the configuration file kms-acls.xml.
> Managing ACLs in a configuration file is not easy in enterprise usage, and it makes backup and recovery difficult.
> It would be ideal to store the ACLs for keys in the key metadata, similar to how file system ACLs work. That way, backup and recovery procedures that operate on keys would cover the keys' ACLs as well.
> Moreover, with the ACLs in metadata, the ACL of each key can be easily manipulated with an API or command-line tool and take effect instantly. This is very important for enterprise-level access control management, and can be addressed by a separate JIRA. With the configuration file approach, these capabilities would be hard to provide.

-- 
This message was sent by Atlassian JIRA (v6.3.4#6332)
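The proposal above can be illustrated with a minimal sketch: ACLs that travel with the key's metadata, so that backing up or restoring the key automatically carries its ACLs, and changes take effect without a config-file reload. Every class, field, and operation name here is hypothetical; none of it is taken from the actual HADOOP-11335 patch.

```java
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of per-key ACLs carried in the key's own metadata.
// Backup/restore of this object would automatically include the ACLs.
public class KeyMetadataSketch implements Serializable {
    private final String keyName;
    // operation name (e.g. "DECRYPT_EEK") -> comma-separated allowed principals
    private final Map<String, String> acls = new HashMap<>();

    public KeyMetadataSketch(String keyName) {
        this.keyName = keyName;
    }

    public void setAcl(String operation, String principals) {
        acls.put(operation, principals);   // takes effect immediately, no reload
    }

    public boolean isAllowed(String operation, String user) {
        String principals = acls.get(operation);
        if (principals == null) {
            return false;                  // no ACL entry for this op -> deny
        }
        for (String p : principals.split(",")) {
            if (p.trim().equals(user)) {
                return true;
            }
        }
        return false;
    }

    public String getKeyName() {
        return keyName;
    }
}
```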
[jira] [Commented] (HADOOP-11463) Replace method-local TransferManager object with S3AFileSystem#transfers
[ https://issues.apache.org/jira/browse/HADOOP-11463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14281506#comment-14281506 ]

Hadoop QA commented on HADOOP-11463:
------------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12692927/hadoop-11463-003.patch
against trunk revision 2908fe4.

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-tools/hadoop-aws.

Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/5423//testReport/
Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/5423//console

This message is automatically generated.

> Replace method-local TransferManager object with S3AFileSystem#transfers
>
>                 Key: HADOOP-11463
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11463
>             Project: Hadoop Common
>          Issue Type: Task
>          Components: fs/s3
>    Affects Versions: 2.7.0
>            Reporter: Ted Yu
>            Assignee: Ted Yu
>         Attachments: hadoop-11463-001.patch, hadoop-11463-002.patch, hadoop-11463-003.patch
>
> This is a continuation of HADOOP-11446. The following changes are made according to Thomas Demoor's comments:
> 1. Replace the method-local TransferManager object with S3AFileSystem#transfers
> 2. Do not shut down the TransferManager after purging existing multipart files; otherwise the current transfer is unable to proceed
> 3. Shut down the TransferManager instance in the close method of S3AFileSystem
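The three changes above can be sketched with a stand-in for the AWS SDK's TransferManager (the real class lives in the SDK, so a fake is used here to keep the example self-contained). The shape is what matters: one shared manager held in a field, never shut down after purging multipart uploads, and shut down exactly once in close(). All names except the general pattern are illustrative.

```java
// Sketch of the TransferManager lifecycle change, with a minimal stand-in
// for com.amazonaws.services.s3.transfer.TransferManager.
public class TransferLifecycleSketch {

    // Stand-in for the SDK's TransferManager.
    static class FakeTransferManager {
        boolean shutDown = false;
        void shutdownNow() { shutDown = true; }
        String upload(String key) {
            if (shutDown) throw new IllegalStateException("manager shut down");
            return "uploaded:" + key;
        }
    }

    // Before the change: each method created and shut down its own manager.
    // After: one shared instance, created once and reused across operations.
    private final FakeTransferManager transfers = new FakeTransferManager();

    void purgeExistingMultipartUploads() {
        // ...abort stale multipart uploads here...
        // Crucially, do NOT call transfers.shutdownNow() afterwards, or any
        // in-flight transfer on the shared manager would fail.
    }

    String upload(String key) {
        return transfers.upload(key);   // reuses the shared manager
    }

    public void close() {
        transfers.shutdownNow();        // shut down exactly once, on close
    }
}
```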
[jira] [Updated] (HADOOP-11463) Replace method-local TransferManager object with S3AFileSystem#transfers
[ https://issues.apache.org/jira/browse/HADOOP-11463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HADOOP-11463:
----------------------------
    Attachment: hadoop-11463-003.patch
[jira] [Commented] (HADOOP-11171) Enable using a proxy server to connect to S3a.
[ https://issues.apache.org/jira/browse/HADOOP-11171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14281481#comment-14281481 ]

Hudson commented on HADOOP-11171:
---------------------------------

FAILURE: Integrated in Hadoop-trunk-Commit #6883 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6883/])
HADOOP-11171 Enable using a proxy server to connect to S3a. (Thomas Demoor via stevel) (stevel: rev 2908fe4ec52f78d74e4207274a34d88d54cd468f)
* hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Constants.java
* hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java
* hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/TestS3AConfiguration.java
* hadoop-common-project/hadoop-common/CHANGES.txt

> Enable using a proxy server to connect to S3a.
>
>                 Key: HADOOP-11171
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11171
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs/s3
>    Affects Versions: 2.4.0
>            Reporter: Thomas Demoor
>            Assignee: Thomas Demoor
>              Labels: amazon, s3
>             Fix For: 2.7.0
>         Attachments: HADOOP-11171-10.patch, HADOOP-11171-2.patch, HADOOP-11171-3.patch, HADOOP-11171-4.patch, HADOOP-11171-5.patch, HADOOP-11171-6.patch, HADOOP-11171-7.patch, HADOOP-11171-8.patch, HADOOP-11171-9.patch, HADOOP-11171.patch
>
> This exposes the AWS SDK config for a proxy (host and port) to s3a through config settings.
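The idea of surfacing the SDK's proxy settings through configuration keys can be sketched with plain java.util.Properties standing in for a Hadoop Configuration. The key names shown and the validation rule (a port configured without a host is an error) are assumptions for illustration, not taken from the patch itself.

```java
import java.util.Properties;

// Sketch of wiring proxy settings from configuration keys into an
// AWS-SDK-style client config. Key names and validation are assumptions.
public class ProxyConfigSketch {
    static final String PROXY_HOST = "fs.s3a.proxy.host";
    static final String PROXY_PORT = "fs.s3a.proxy.port";

    // Returns "host:port" when a proxy is configured, "" for a direct
    // connection, and throws when the settings are inconsistent.
    public static String resolveProxy(Properties conf) {
        String host = conf.getProperty(PROXY_HOST, "").trim();
        String port = conf.getProperty(PROXY_PORT, "").trim();
        if (host.isEmpty()) {
            if (!port.isEmpty()) {
                throw new IllegalArgumentException(
                    PROXY_PORT + " set without " + PROXY_HOST);
            }
            return "";                    // no proxy configured
        }
        if (port.isEmpty()) {
            port = "8080";                // assumed default, illustrative only
        }
        return host + ":" + Integer.parseInt(port);
    }
}
```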
[jira] [Commented] (HADOOP-11463) Replace method-local TransferManager object with S3AFileSystem#transfers
[ https://issues.apache.org/jira/browse/HADOOP-11463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14281479#comment-14281479 ]

Steve Loughran commented on HADOOP-11463:
-----------------------------------------

I had a look at the other {{FileSystem}} classes; I'm afraid you must also call {{super.close()}} for its cleanup work.
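The point about {{super.close()}} can be shown with a stand-in base class: the Hadoop FileSystem base class does its own cleanup in close() (for example, cache removal and deleteOnExit processing), so an override that only shuts down its own resources silently skips that work. Only the override-calls-super shape is the point; the class and field names here are illustrative.

```java
// Sketch of why an overriding close() must chain to super.close().
public class CloseChainSketch {

    // Stand-in for the FileSystem base class and its cleanup duties.
    static class BaseFs {
        boolean baseCleanedUp = false;
        public void close() {
            baseCleanedUp = true;   // e.g. remove from FS cache, deleteOnExit
        }
    }

    static class S3LikeFs extends BaseFs {
        boolean transfersShutDown = false;

        @Override
        public void close() {
            transfersShutDown = true;   // shut down the shared TransferManager
            super.close();              // then let the base class clean up too
        }
    }
}
```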
[jira] [Resolved] (HADOOP-11171) Enable using a proxy server to connect to S3a.
[ https://issues.apache.org/jira/browse/HADOOP-11171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Loughran resolved HADOOP-11171.
-------------------------------------
    Resolution: Fixed
    Fix Version/s: 2.7.0

Committed. You are building up more things to document, though.
[jira] [Commented] (HADOOP-11171) Enable using a proxy server to connect to S3a.
[ https://issues.apache.org/jira/browse/HADOOP-11171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14281477#comment-14281477 ]

Steve Loughran commented on HADOOP-11171:
-----------------------------------------

Now takes ~3s for me, dropping to ~2.7s on Java 8. +1, committing.
[jira] [Commented] (HADOOP-10542) Potential null pointer dereference in Jets3tFileSystemStore#retrieveBlock()
[ https://issues.apache.org/jira/browse/HADOOP-10542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14281476#comment-14281476 ]

Hudson commented on HADOOP-10542:
---------------------------------

FAILURE: Integrated in Hadoop-trunk-Commit #6882 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6882/])
HADOOP-10542 Potential null pointer dereference in Jets3tFileSystemStore retrieveBlock(). (Ted Yu via stevel) (stevel: rev c6c0f4eb25e511944915bc869e741197f7a277e0)
* hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3/Jets3tFileSystemStore.java
* hadoop-common-project/hadoop-common/CHANGES.txt

> Potential null pointer dereference in Jets3tFileSystemStore#retrieveBlock()
>
>                 Key: HADOOP-10542
>                 URL: https://issues.apache.org/jira/browse/HADOOP-10542
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs/s3
>    Affects Versions: 2.6.0
>            Reporter: Ted Yu
>            Assignee: Ted Yu
>            Priority: Minor
>         Attachments: hadoop-10542-001.patch
>
> {code}
> in = get(blockToKey(block), byteRangeStart);
> out = new BufferedOutputStream(new FileOutputStream(fileBlock));
> byte[] buf = new byte[bufferSize];
> int numRead;
> while ((numRead = in.read(buf)) >= 0) {
> {code}
> get() may return null, and the while loop dereferences {{in}} without a null check.
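The shape of the fix for the snippet above is to fail fast with a meaningful IOException when get() returns null, rather than hitting a NullPointerException in the read loop. Whether the actual patch throws, retries, or logs is not shown here; the null guard itself is the point, and the method and parameter names are illustrative.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Sketch of the null guard missing from Jets3tFileSystemStore#retrieveBlock.
public class NullGuardSketch {

    public static byte[] retrieveBlock(InputStream in, String key)
            throws IOException {
        if (in == null) {                           // the missing check
            throw new IOException("Block not found: " + key);
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int numRead;
        while ((numRead = in.read(buf)) >= 0) {     // safe: in is non-null here
            out.write(buf, 0, numRead);
        }
        return out.toByteArray();
    }
}
```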
[jira] [Updated] (HADOOP-10542) Potential null pointer dereference in Jets3tFileSystemStore#retrieveBlock()
[ https://issues.apache.org/jira/browse/HADOOP-10542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Loughran updated HADOOP-10542:
------------------------------------
    Resolution: Fixed
    Status: Resolved (was: Patch Available)

+1, committing
[jira] [Commented] (HADOOP-11487) NativeS3FileSystem.getStatus must retry on FileNotFoundException
[ https://issues.apache.org/jira/browse/HADOOP-11487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14281462#comment-14281462 ]

Steve Loughran commented on HADOOP-11487:
-----------------------------------------

# Which version of Hadoop?
# Which S3 zone? Only US-east lacks create consistency.

Blobstores are the bane of our lives. They aren't real filesystems, and code around them needs to recognise this and act on it; as applications all have standard expectations of files and their metadata, though, that's not easy.

It's not enough to retry on FS status, as there are other inconsistencies: directory renames and deletes, blob updates, etc.

There's a new FS client, s3a, in Hadoop 2.6, which is where all future fs/s3 work is going on. Try it to see if it is any better, though I doubt it. If we were to fix this, the route would be to go with something derived from Netflix's S3mper. Retrying on a 404 is not sufficient.

> NativeS3FileSystem.getStatus must retry on FileNotFoundException
>
>                 Key: HADOOP-11487
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11487
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs, fs/s3
>            Reporter: Paulo Motta
>
> I'm trying to copy a large amount of files from HDFS to S3 via distcp and I'm getting the following exception:
> {code:java}
> 2015-01-16 20:53:18,187 ERROR [main] org.apache.hadoop.tools.mapred.CopyMapper: Failure in copying hdfs://10.165.35.216/hdfsFolder/file.gz to s3n://s3-bucket/file.gz
> java.io.FileNotFoundException: No such file or directory 's3n://s3-bucket/file.gz'
>         at org.apache.hadoop.fs.s3native.NativeS3FileSystem.getFileStatus(NativeS3FileSystem.java:445)
>         at org.apache.hadoop.tools.util.DistCpUtils.preserve(DistCpUtils.java:187)
>         at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:233)
>         at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:45)
>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
>         at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>         at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
> 2015-01-16 20:53:18,276 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.io.FileNotFoundException: No such file or directory 's3n://s3-bucket/file.gz'
>         (same stack trace as above)
> {code}
> However, when I try "hadoop fs -ls s3n://s3-bucket/file.gz" the file is there, so the job failure is probably due to Amazon's S3 eventual consistency.
> In my opinion, in order to fix this problem NativeS3FileSystem.getFileStatus must use the fs.s3.maxRetries property in order to avoid failures like this.
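The reporter's suggestion, a bounded retry around a metadata lookup that may transiently 404 under eventual consistency, can be sketched with standard-library types only. As the comment above notes, this masks only one class of inconsistency; the attempt count and sleep time are illustrative, and none of this reflects what NativeS3FileSystem actually does.

```java
import java.io.FileNotFoundException;
import java.util.concurrent.Callable;

// Sketch of a bounded retry on FileNotFoundException, the reporter's
// suggested mitigation for S3's eventual listing consistency.
public class EventualConsistencyRetry {

    public static <T> T withRetries(Callable<T> op, int maxAttempts,
                                    long sleepMillis) throws Exception {
        for (int attempt = 1; ; attempt++) {
            try {
                return op.call();
            } catch (FileNotFoundException e) {
                if (attempt >= maxAttempts) {
                    throw e;                // give up: the file may truly be gone
                }
                Thread.sleep(sleepMillis);  // wait for the listing to converge
            }
        }
    }
}
```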
[jira] [Commented] (HADOOP-11335) KMS ACL in meta data or database
[ https://issues.apache.org/jira/browse/HADOOP-11335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14281452#comment-14281452 ]

Hadoop QA commented on HADOOP-11335:
------------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12692914/HADOOP-11335.003.patch
against trunk revision 43302f6.

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files.
{color:red}-1 javac{color}. The applied patch generated 1205 javac compiler warnings (more than the trunk's current 1204 warnings).
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-common-project/hadoop-kms hadoop-hdfs-project/hadoop-hdfs:
    org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager

Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/5420//testReport/
Javac warnings: https://builds.apache.org/job/PreCommit-HADOOP-Build/5420//artifact/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/5420//console

This message is automatically generated.
[jira] [Commented] (HADOOP-11209) Configuration is not thread-safe
[ https://issues.apache.org/jira/browse/HADOOP-11209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14281400#comment-14281400 ]

Hadoop QA commented on HADOOP-11209:
------------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12692920/HADOOP-11209.003.patch
against trunk revision 43302f6.

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test file.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common:
    org.apache.hadoop.ha.TestZKFailoverControllerStress

Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/5421//testReport/
Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/5421//console

This message is automatically generated.

> Configuration is not thread-safe
>
>                 Key: HADOOP-11209
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11209
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: conf
>            Reporter: Josh Rosen
>            Assignee: Varun Saxena
>         Attachments: HADOOP-11209.001.patch, HADOOP-11209.002.patch, HADOOP-11209.003.patch
>
> {{Configuration}} objects are not fully thread-safe, which causes problems in multi-threaded frameworks like Spark that use these configurations to interact with existing Hadoop APIs (such as InputFormats).
> SPARK-2546 is an example of a problem caused by this lack of thread-safety. In that bug, multiple concurrent modifications of the same Configuration (in third-party code) caused an infinite loop because Configuration's internal {{java.util.HashMap}} is not thread-safe.
> One workaround is for our code to clone Configuration objects; unfortunately, this also suffers from thread-safety issues on older Hadoop versions because Configuration's constructor wasn't thread-safe (HADOOP-10456).
> [Looking at a recent version of Configuration.java|https://github.com/apache/hadoop/blob/d989ac04449dc33da5e2c32a7f24d59cc92de536/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/conf/Configuration.java#L666], it seems that the private {{updatingResource}} HashMap and {{finalParameters}} HashSet fields are the only non-thread-safe collections in Configuration (Java's {{Properties}} class is thread-safe), so I don't think it would be hard to make Configuration fully thread-safe.
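One fix direction the description suggests is to swap the plain HashMap/HashSet fields for concurrent equivalents, so concurrent reads and writes cannot corrupt internal state or loop forever. The field names below mirror the description, but the class is a sketch; the actual patch may well use synchronization instead.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of replacing Configuration's non-thread-safe fields with
// concurrent collections. Field names mirror the issue description.
public class ThreadSafeConfSketch {
    private final Map<String, String[]> updatingResource =
        new ConcurrentHashMap<>();
    private final Set<String> finalParameters =
        ConcurrentHashMap.newKeySet();   // thread-safe HashSet replacement

    public void recordResource(String key, String[] sources) {
        updatingResource.put(key, sources);
    }

    public String[] getResource(String key) {
        return updatingResource.get(key);
    }

    public void markFinal(String key) {
        finalParameters.add(key);
    }

    public boolean isFinal(String key) {
        return finalParameters.contains(key);
    }
}
```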
[jira] [Commented] (HADOOP-10542) Potential null pointer dereference in Jets3tFileSystemStore#retrieveBlock()
[ https://issues.apache.org/jira/browse/HADOOP-10542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14281392#comment-14281392 ]

Hadoop QA commented on HADOOP-10542:
------------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12692922/hadoop-10542-001.patch
against trunk revision 43302f6.

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-tools/hadoop-aws.

Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/5422//testReport/
Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/5422//console

This message is automatically generated.
[jira] [Updated] (HADOOP-10542) Potential null pointer dereference in Jets3tFileSystemStore#retrieveBlock()
[ https://issues.apache.org/jira/browse/HADOOP-10542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HADOOP-10542:
----------------------------
    Attachment: hadoop-10542-001.patch
[jira] [Updated] (HADOOP-10542) Potential null pointer dereference in Jets3tFileSystemStore#retrieveBlock()
[ https://issues.apache.org/jira/browse/HADOOP-10542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HADOOP-10542:
----------------------------
    Status: Patch Available (was: Open)
[jira] [Assigned] (HADOOP-10542) Potential null pointer dereference in Jets3tFileSystemStore#retrieveBlock()
[ https://issues.apache.org/jira/browse/HADOOP-10542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu reassigned HADOOP-10542:
-------------------------------
    Assignee: Ted Yu
[jira] [Updated] (HADOOP-11209) Configuration is not thread-safe
[ https://issues.apache.org/jira/browse/HADOOP-11209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Varun Saxena updated HADOOP-11209:
----------------------------------
    Status: Patch Available (was: Open)
[jira] [Updated] (HADOOP-11209) Configuration is not thread-safe
[ https://issues.apache.org/jira/browse/HADOOP-11209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Varun Saxena updated HADOOP-11209:
----------------------------------
    Status: Open (was: Patch Available)
[jira] [Updated] (HADOOP-11209) Configuration is not thread-safe
[ https://issues.apache.org/jira/browse/HADOOP-11209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Varun Saxena updated HADOOP-11209:
----------------------------------
    Attachment: HADOOP-11209.003.patch
[jira] [Commented] (HADOOP-11209) Configuration is not thread-safe
[ https://issues.apache.org/jira/browse/HADOOP-11209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14281355#comment-14281355 ] Varun Saxena commented on HADOOP-11209: --- Thanks [~ozawa] for the review. It's difficult to simulate thread-safety issues, so I will just update a test case which accesses/modifies the {{Configuration}} object from multiple threads. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
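The issue description above points at {{updatingResource}} (a HashMap) and {{finalParameters}} (a HashSet) as the only non-thread-safe collections in {{Configuration}}. A minimal, hypothetical sketch of the fix direction is to swap those fields for concurrent equivalents; the class and method names below are illustrative, not the actual {{Configuration}} code:

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, stripped-down sketch: replace the plain HashMap/HashSet
// fields with concurrent equivalents so concurrent readers and writers
// cannot corrupt internal state (e.g. the infinite loop seen in SPARK-2546).
public class ThreadSafeConfigSketch {
    // was: private HashMap<String, String[]> updatingResource;
    private final Map<String, String[]> updatingResource = new ConcurrentHashMap<>();
    // was: private HashSet<String> finalParameters;
    private final Set<String> finalParameters =
        Collections.newSetFromMap(new ConcurrentHashMap<String, Boolean>());

    public void recordResource(String key, String[] sources) {
        updatingResource.put(key, sources);
    }

    public void markFinal(String key) {
        finalParameters.add(key);
    }

    public boolean isFinal(String key) {
        return finalParameters.contains(key);
    }
}
```

Since {{Properties}} is already synchronized, making only these two fields concurrent would cover the remaining unsynchronized state the reporter identified.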
[jira] [Commented] (HADOOP-11335) KMS ACL in meta data or database
[ https://issues.apache.org/jira/browse/HADOOP-11335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14281352#comment-14281352 ] Dian Fu commented on HADOOP-11335: -- Hi [~asuresh], thanks a lot for your review and comments. {quote} 1) JavaKeyStoreProvider: * createKey() : I think it might be a bit odd that we have metadata with version == 0 {quote} OK, I have updated the patch and the version will no longer be equal to 0. {quote} * deleteKey() : Am sorry, I might be missing something but, I did not quite understand why we can't just delete the metadata from the cache when the key is deleted {quote} Because we may store the ACL in metadata, and the ACL shouldn't be deleted when the key is deleted. {quote} 2) KeyProvider: * The Metadata class now has a dependency on {{KeyOpType}} which is defined in {{KeyProviderAuthorizationExtension}}. The Extension classes were meant to add functionality to a KeyProvider. It seems a bit weird that the KeyProvider class should have a dependency on an Extension. {quote} Agree. I have moved {{KeyOpType}} to {{KeyProvider}}. {quote} * Not very comfortable adding setters to Metadata. I guess the original implementation consciously made a choice to not allow modification of metadata once it is created (except for the version) {quote} Agree. I have removed the setters of Metadata. {quote} 3) KeyProviderExtension: * Do we really need the read and write locks? The underlying KeyProvider should take care of the synchronization ({{JavaKeyStoreProvider}}, for example, does in fact use write and read locks for createKey etc.); this would probably lead to unnecessary double locking. 4) KeyProviderAuthorizationExtension: * Same as above.. do we really need the read and write locks? I feel the Extension class should handle its own concurrency semantics {quote} The lock in {{JavaKeyStoreProvider}} is only used for key-related operations, such as {{createKey}}, {{deleteKey}}, etc. 
Because the newly added methods such as {{createKeyAcl}} and {{deleteKeyAcl}} also need to read/write the keystore, and these methods only exist in {{KeyProviderAuthorizationExtension}}, not in {{KeyProvider}}, we add the lock in {{KeyProviderExtension}}, and the lock in {{KeyProviderAuthorizationExtension}} is inherited from {{KeyProviderExtension}}. {quote} 5) MetadataKeyAuthorizer * Remove commented code {quote} Removed the commented code in the latest patch. {quote} Looking at the commented code in {{MetadataKeyAuthorizer}}, I see that you had initially toyed with having an extended {{MetadataWithACL}} class. Any reason why you did not pursue that design? It seems to me like that could have been a way to probably avoid having to modify {{JavaKeyStoreProvider}} and {{KeyProvider}}. One suggestion would have been to templatize {{KeyProvider}} like so: {noformat} public class KeyProvider ... {noformat} and have different implementations of a {{KeyProvider}} like: {noformat} public class KeyProviderWithACls extends KeyProvider ... {noformat} {quote} The approach you suggested is a good one. I tried it, but found some problems. For example, currently we create {{KeyProviderAuthorizationExtension}} by wrapping a {{KeyProvider}} in it, and this {{KeyProvider}} must be created first. So when we create the {{KeyProvider}}, we would already have to know the concrete type of {{Metadata}}, for example whether it's {{Metadata}} or {{MetadataWithAcl}}. This is a little weird, as whether the ACL is stored in Metadata should be controlled by {{KeyProviderAuthorizationExtension}}. I chose to solve this issue by modifying {{Metadata}}: it currently has two elements, {{MetadataForKey}} and {{MetadataForAcl}}. You can refer to the latest patch (revision 003) for the detailed implementation. 
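To make the ordering problem in the comment above concrete, here is a hypothetical sketch of the templatized design the reviewer proposed (illustrative class names, not the real Hadoop classes). The metadata type is a type parameter, so it must be fixed when the provider is constructed, i.e. before the authorization extension that actually decides on ACL storage gets to wrap the provider:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the suggested templatized KeyProvider design.
public class TemplatizedProviderSketch {
    static class Metadata { }

    // An extended metadata class carrying the per-key ACL.
    static class MetadataWithAcl extends Metadata {
        String acl;
    }

    // The metadata type is a type parameter of the provider.
    static class KeyProvider<M extends Metadata> {
        protected final Map<String, M> store = new HashMap<>();
        M getMetadata(String keyName) { return store.get(keyName); }
    }

    // The concrete metadata type must be chosen here, at construction time --
    // which is the chicken-and-egg problem the comment describes, since the
    // authorization extension only wraps an already-built provider.
    static class KeyProviderWithACls extends KeyProvider<MetadataWithAcl> {
        void setAcl(String keyName, String acl) {
            MetadataWithAcl m = store.computeIfAbsent(keyName, k -> new MetadataWithAcl());
            m.acl = acl;
        }
    }
}
```

The patch's alternative of composing {{Metadata}} out of key and ACL parts sidesteps this, at the cost of touching {{KeyProvider}} itself.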
[jira] [Created] (HADOOP-11487) NativeS3FileSystem.getStatus must retry on FileNotFoundException
Paulo Motta created HADOOP-11487: Summary: NativeS3FileSystem.getStatus must retry on FileNotFoundException Key: HADOOP-11487 URL: https://issues.apache.org/jira/browse/HADOOP-11487 Project: Hadoop Common Issue Type: Bug Components: fs, fs/s3 Reporter: Paulo Motta I'm trying to copy a large amount of files from HDFS to S3 via distcp and I'm getting the following exception: {code:java} 2015-01-16 20:53:18,187 ERROR [main] org.apache.hadoop.tools.mapred.CopyMapper: Failure in copying hdfs://10.165.35.216/hdfsFolder/file.gz to s3n://s3-bucket/file.gz java.io.FileNotFoundException: No such file or directory 's3n://s3-bucket/file.gz' at org.apache.hadoop.fs.s3native.NativeS3FileSystem.getFileStatus(NativeS3FileSystem.java:445) at org.apache.hadoop.tools.util.DistCpUtils.preserve(DistCpUtils.java:187) at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:233) at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:45) at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162) 2015-01-16 20:53:18,276 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.io.FileNotFoundException: No such file or directory 's3n://s3-bucket/file.gz' at org.apache.hadoop.fs.s3native.NativeS3FileSystem.getFileStatus(NativeS3FileSystem.java:445) at org.apache.hadoop.tools.util.DistCpUtils.preserve(DistCpUtils.java:187) at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:233) at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:45) at 
org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162) {code} However, when I try hadoop fs -ls s3n://s3-bucket/file.gz the file is there, so the job failure is probably due to Amazon S3's eventual consistency. In my opinion, to fix this problem NativeS3FileSystem.getFileStatus must use the fs.s3.maxRetries property in order to avoid failures like this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
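The retry behaviour proposed above could be sketched as a small wrapper around the status lookup. This is a hypothetical illustration, not the actual {{NativeS3FileSystem}} code: the {{StatusFetcher}} interface and parameter names are made up, and in a real fix the retry count would be read from the {{fs.s3.maxRetries}} configuration property:

```java
import java.io.FileNotFoundException;
import java.io.IOException;

// Hypothetical sketch: retry a status lookup that may transiently fail with
// FileNotFoundException because of S3's eventual consistency.
public class S3RetrySketch {
    interface StatusFetcher<T> {
        T fetch() throws IOException;
    }

    static <T> T fetchWithRetries(StatusFetcher<T> fetcher, int maxRetries, long sleepMillis)
            throws IOException {
        IOException last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return fetcher.fetch();
            } catch (FileNotFoundException e) {
                // Possibly an eventual-consistency artifact: back off and retry.
                last = e;
                try {
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new IOException("interrupted while retrying", ie);
                }
            }
        }
        throw last; // exhausted retries: surface the original failure
    }
}
```

With such a wrapper, a just-written object that is not yet visible would usually appear within a few retries instead of failing the whole distcp task.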
[jira] [Updated] (HADOOP-11335) KMS ACL in meta data or database
[ https://issues.apache.org/jira/browse/HADOOP-11335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dian Fu updated HADOOP-11335: - Attachment: HADOOP-11335.003.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-10542) Potential null pointer dereference in Jets3tFileSystemStore#retrieveBlock()
[ https://issues.apache.org/jira/browse/HADOOP-10542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14281318#comment-14281318 ] Steve Loughran commented on HADOOP-10542: - IOE: it's the only thing that ensures callers won't themselves NPE > Potential null pointer dereference in Jets3tFileSystemStore#retrieveBlock() > --- > > Key: HADOOP-10542 > URL: https://issues.apache.org/jira/browse/HADOOP-10542 > Project: Hadoop Common > Issue Type: Bug > Components: fs/s3 >Affects Versions: 2.6.0 >Reporter: Ted Yu >Priority: Minor > > {code} > in = get(blockToKey(block), byteRangeStart); > out = new BufferedOutputStream(new FileOutputStream(fileBlock)); > byte[] buf = new byte[bufferSize]; > int numRead; > while ((numRead = in.read(buf)) >= 0) { > {code} > get() may return null. > The while loop dereferences in without null check. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
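The fix Steve suggests, throwing an IOException when {{get()}} returns null so callers never hit the NPE in the read loop, could look like the guard below. This is a minimal hypothetical sketch with illustrative names, not the actual {{Jets3tFileSystemStore}} code:

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch: fail fast with an IOException if the block stream is
// null, instead of letting in.read(buf) throw a NullPointerException.
public class NullGuardSketch {
    static InputStream requireStream(InputStream in, String key) throws IOException {
        if (in == null) {
            throw new IOException("Block " + key + " not found");
        }
        return in;
    }
}
```

In {{retrieveBlock()}} the call site would then be {{in = requireStream(get(blockToKey(block), byteRangeStart), blockToKey(block))}}, so the while loop only ever sees a non-null stream.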
[jira] [Resolved] (HADOOP-9248) Allow configuration of Amazon S3 Endpoint
[ https://issues.apache.org/jira/browse/HADOOP-9248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran resolved HADOOP-9248. Resolution: Won't Fix Fix Version/s: 2.7.0 OK, closing as a wontfix then. Timur, when the next beta release of Hadoop comes out, (or even better, grab the branch-2 branch and build it), please test the s3a support and make sure it works for you > Allow configuration of Amazon S3 Endpoint > - > > Key: HADOOP-9248 > URL: https://issues.apache.org/jira/browse/HADOOP-9248 > Project: Hadoop Common > Issue Type: New Feature > Components: fs/s3 > Environment: All environments connecting to S3 >Reporter: Timur Perelmutov > Fix For: 2.7.0 > > > http://wiki.apache.org/hadoop/AmazonS3 page describes configuration of Hadoop > with S3 as storage. Other systems like EMC Atmos now implement S3 Interface, > but in order to be able to connect to them, the endpoint needs to be > configurable. Please add a configuration parameter that would be propagated > to underlying jets3t library as s3service.s3-endpoint param. -- This message was sent by Atlassian JIRA (v6.3.4#6332)