[jira] [Commented] (HDFS-12737) Thousands of sockets lingering in TIME_WAIT state due to frequent file open operations
[ https://issues.apache.org/jira/browse/HDFS-12737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16235221#comment-16235221 ] Todd Lipcon commented on HDFS-12737: In the data transfer protocol we just pass tokens with each operation. Could the relevant RPCs just be modified to take tokens as parameters rather than using them as part of the connection context? > Thousands of sockets lingering in TIME_WAIT state due to frequent file open > operations > -- > > Key: HDFS-12737 > URL: https://issues.apache.org/jira/browse/HDFS-12737 > Project: Hadoop HDFS > Issue Type: Bug > Components: ipc > Environment: CDH5.10.2, HBase Multi-WAL=2, 250 replication peers >Reporter: Wei-Chiu Chuang >Assignee: Wei-Chiu Chuang >Priority: Major > > On an HBase cluster we found that HBase RegionServers had thousands of sockets in > TIME_WAIT state. This depleted system resources and caused other services to > fail. > After months of troubleshooting, we found that the cluster has > hundreds of replication peers and multi-WAL = 2. That creates hundreds > of replication threads in the HBase RS, and each thread opens the WAL file *every > second*. > We found that the IPC client closes the socket right away and does not reuse the > connection. Since each closed socket stays in TIME_WAIT state for 60 > seconds in Linux by default, this generates thousands of TIME_WAIT sockets. > {code:title=ClientDatanodeProtocolTranslatorPB:createClientDatanodeProtocolProxy} > // Since we're creating a new UserGroupInformation here, we know that no > // future RPC proxies will be able to re-use the same connection. And > // usages of this proxy tend to be one-off calls. > // > // This is a temporary fix: callers should really achieve this by using > // RPC.stopProxy() on the resulting object, but this is currently not > // working in trunk. See the discussion on HDFS-1965. 
> Configuration confWithNoIpcIdle = new Configuration(conf); > confWithNoIpcIdle.setInt(CommonConfigurationKeysPublic > .IPC_CLIENT_CONNECTION_MAXIDLETIME_KEY, 0); > {code} > This piece of code is used in DistributedFileSystem#open() > {noformat} > 2017-10-27 14:01:44,152 DEBUG org.apache.hadoop.ipc.Client: New connection > Thread[IPC Client (1838187805) connection to /172.131.21.48:20001 from > blk_1013754707_14032,5,main] for remoteId /172.131.21.48:20001 > java.lang.Throwable: For logging stack trace, not a real exception > at org.apache.hadoop.ipc.Client.getConnection(Client.java:1556) > at org.apache.hadoop.ipc.Client.call(Client.java:1482) > at org.apache.hadoop.ipc.Client.call(Client.java:1443) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230) > at com.sun.proxy.$Proxy28.getReplicaVisibleLength(Unknown Source) > at > org.apache.hadoop.hdfs.protocolPB.ClientDatanodeProtocolTranslatorPB.getReplicaVisibleLength(ClientDatanodeProtocolTranslatorPB.java:198) > at > org.apache.hadoop.hdfs.DFSInputStream.readBlockLength(DFSInputStream.java:365) > at > org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:335) > at > org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:271) > at > org.apache.hadoop.hdfs.DFSInputStream.(DFSInputStream.java:263) > at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1585) > at > org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:326) > at > org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:322) > at > org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) > at > org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:322) > at > org.apache.hadoop.fs.FilterFileSystem.open(FilterFileSystem.java:162) > at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:783) > at > 
org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:293) > at > org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:267) > at > org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:255) > at > org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:414) > at > org.apache.hadoop.hbase.replication.regionserver.ReplicationWALReaderManager.openReader(ReplicationWALReaderManager.java:70) > at > org.apache.hadoop.hbase.replication.regionserver.ReplicationSource$ReplicationSourceWorkerThread.openReader(ReplicationSource.java:747) > at >
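The connection-cache behavior described above can be sketched with a toy model (this is not Hadoop's actual {{Client}} class; the class and method names below are invented for illustration). With {{ipc.client.connection.maxidletime}} forced to 0, as in the quoted snippet, every RPC opens and closes its own socket, and each close leaves a TIME_WAIT entry behind; any nonzero idle time would let calls to the same datanode share one socket.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Simplified model of an IPC client's connection cache (NOT Hadoop's real
 * Client class): connections are keyed by remote address, and a max idle
 * time of 0 closes the socket after every call, so no call can ever reuse
 * a cached connection.
 */
class IpcConnectionModel {
    private final Map<String, Integer> cache = new HashMap<>();
    private final int maxIdleTimeMs;
    private int socketsOpened = 0;

    IpcConnectionModel(int maxIdleTimeMs) {
        this.maxIdleTimeMs = maxIdleTimeMs;
    }

    /** One RPC call to the given remote datanode address. */
    void call(String remoteId) {
        if (!cache.containsKey(remoteId)) {
            socketsOpened++;               // a new TCP socket; it will sit in TIME_WAIT once closed
            cache.put(remoteId, socketsOpened);
        }
        if (maxIdleTimeMs == 0) {
            cache.remove(remoteId);        // closed immediately, as in the quoted snippet
        }
    }

    int socketsOpened() {
        return socketsOpened;
    }
}
```

With 100 opens per second against the same datanode, the zero-idle-time model opens 100 sockets where the reusing model opens one, which is the TIME_WAIT pile-up reported here.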
[jira] [Commented] (HDFS-12443) Ozone: Improve SCM block deletion throttling algorithm
[ https://issues.apache.org/jira/browse/HDFS-12443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16235211#comment-16235211 ] Weiwei Yang commented on HDFS-12443: That all sounds good to me, [~linyiqun]. Please go ahead. Thanks a lot. > Ozone: Improve SCM block deletion throttling algorithm > --- > > Key: HDFS-12443 > URL: https://issues.apache.org/jira/browse/HDFS-12443 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone, scm >Reporter: Weiwei Yang >Assignee: Yiqun Lin >Priority: Major > Labels: OzonePostMerge > Attachments: HDFS-12443-HDFS-7240.001.patch, > HDFS-12443-HDFS-7240.002.patch, HDFS-12443-HDFS-7240.002.patch, > HDFS-12443-SCM-blockdeletion-throttle.pdf > > > Currently SCM periodically scans delLog to send deletion transactions to datanodes. > The throttling algorithm is simple: it scans at most > {{BLOCK_DELETE_TX_PER_REQUEST_LIMIT}} transactions (by default 50) at a time. This is > non-optimal: in the worst case it might cache 50 TXs for 50 different DNs, so each DN > gets only 1 TX to process per interval, which makes deletion slow. An improvement > is to throttle per datanode, e.g. 50 > TXs per datanode per interval. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
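The proposed per-datanode throttle can be sketched as follows (a simplified model, not the actual SCM code; method and variable names are illustrative). With a single global limit, 50 interleaved transactions belonging to 50 datanodes yield one TX per DN per interval; with a per-datanode limit, each DN receives up to the limit.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Toy sketch of SCM block-deletion throttling (illustrative only, not the
 * real SCM code). Each transaction index in txOwners is "owned" by the
 * datanode whose name appears at that index.
 */
class DeletionThrottle {
    /** Current style: take the first {@code limit} transactions overall. */
    static Map<String, List<Integer>> globalLimit(List<String> txOwners, int limit) {
        Map<String, List<Integer>> perDn = new LinkedHashMap<>();
        for (int tx = 0; tx < txOwners.size() && tx < limit; tx++) {
            perDn.computeIfAbsent(txOwners.get(tx), k -> new ArrayList<>()).add(tx);
        }
        return perDn;
    }

    /** Proposed style: take up to {@code limit} transactions per datanode. */
    static Map<String, List<Integer>> perDatanodeLimit(List<String> txOwners, int limit) {
        Map<String, List<Integer>> perDn = new LinkedHashMap<>();
        for (int tx = 0; tx < txOwners.size(); tx++) {
            List<Integer> txs =
                perDn.computeIfAbsent(txOwners.get(tx), k -> new ArrayList<>());
            if (txs.size() < limit) {
                txs.add(tx);           // stop adding once this DN hits its own quota
            }
        }
        return perDn;
    }
}
```

In the worst case from the description (transactions interleaved across 50 DNs), the global limit hands each DN a single transaction per interval, while the per-datanode limit lets every DN drain all of its pending work up to its quota.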
[jira] [Commented] (HDFS-12737) Thousands of sockets lingering in TIME_WAIT state due to frequent file open operations
[ https://issues.apache.org/jira/browse/HDFS-12737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16235182#comment-16235182 ] Jitendra Nath Pandey commented on HDFS-12737: - [~yzhangal], The block token is also used to authorize access to a block; therefore, a connection context must be established using that particular block token. In {{DataNode#checkReadAccess}}, the block ID from the token identifier in the UGI is used to authorize the access. Sharing connections across different block tokens would therefore likely expose a security risk.
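Jitendra's point above — that a connection's authorization is only as broad as the block token it was established with — can be illustrated with a toy check (the real logic lives in {{DataNode#checkReadAccess}}; the class below is invented for illustration and is not that code):

```java
/**
 * Toy illustration of why an IPC connection authenticated with one block
 * token cannot safely serve reads of other blocks. NOT the real
 * DataNode#checkReadAccess implementation, just the shape of the check.
 */
class TokenScopedConnection {
    /** Block ID embedded in the token used when the connection was set up. */
    private final long tokenBlockId;

    TokenScopedConnection(long tokenBlockId) {
        this.tokenBlockId = tokenBlockId;
    }

    /** The server authorizes a read only for the block named in the token. */
    boolean checkReadAccess(long requestedBlockId) {
        return requestedBlockId == tokenBlockId;
    }
}
```

Reusing such a connection for a different block would either fail this check or, if the check were relaxed, let a caller read blocks its token never authorized — which is why passing tokens per RPC (rather than per connection) is the direction suggested in this thread.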
[jira] [Commented] (HDFS-12719) Ozone: Fix checkstyle, javac, whitespace issues in HDFS-7240 branch
[ https://issues.apache.org/jira/browse/HDFS-12719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16235176#comment-16235176 ] Hadoop QA commented on HDFS-12719: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 5s{color} | {color:red} HDFS-12719 does not apply to HDFS-7240. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | HDFS-12719 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12894723/HDFS-12719-HDFS-7240.001.patch | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/21924/console | | Powered by | Apache Yetus 0.7.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Ozone: Fix checkstyle, javac, whitespace issues in HDFS-7240 branch > --- > > Key: HDFS-12719 > URL: https://issues.apache.org/jira/browse/HDFS-12719 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Affects Versions: HDFS-7240 >Reporter: Mukul Kumar Singh >Assignee: Mukul Kumar Singh >Priority: Major > Fix For: HDFS-7240 > > Attachments: HDFS-12719-HDFS-7240.001.patch > > > There are outstanding whitespace/javac/checkstyle issues on the HDFS-7240 > branch. These were observed by uploading the branch diff to the trunk via > parent jira HDFS-7240. This jira will fix all the valid outstanding issues.
[jira] [Updated] (HDFS-12719) Ozone: Fix checkstyle, javac, whitespace issues in HDFS-7240 branch
[ https://issues.apache.org/jira/browse/HDFS-12719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mukul Kumar Singh updated HDFS-12719: - Status: Patch Available (was: Open)
[jira] [Commented] (HDFS-12390) Support to refresh DNS to switch mapping
[ https://issues.apache.org/jira/browse/HDFS-12390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16235134#comment-16235134 ] Hadoop QA commented on HDFS-12390: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 16s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 27s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 42s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 28s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 49s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 29s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 58s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 10s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 6s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 8s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 25s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:red}-1{color} | {color:red} compile {color} | {color:red} 0m 46s{color} | {color:red} hadoop-hdfs-project in the patch failed. {color} | | {color:red}-1{color} | {color:red} cc {color} | {color:red} 0m 46s{color} | {color:red} hadoop-hdfs-project in the patch failed. {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 0m 46s{color} | {color:red} hadoop-hdfs-project in the patch failed. {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 46s{color} | {color:orange} hadoop-hdfs-project: The patch generated 9 new + 611 unchanged - 0 fixed = 620 total (was 611) {color} | | {color:red}-1{color} | {color:red} mvnsite {color} | {color:red} 0m 29s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:red}-1{color} | {color:red} shadedclient {color} | {color:red} 3m 12s{color} | {color:red} patch has errors when building and testing our client artifacts. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 16s{color} | {color:red} hadoop-hdfs in the patch failed. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 7s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 13s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 28s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 16s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 47m 18s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 | | JIRA Issue | HDFS-12390 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12885292/HDFS-12390.002.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle cc | | uname | Linux 00cbfc182bf5 3.13.0-119-generic #166-Ubuntu SMP Wed May 3 12:18:55 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool
[jira] [Commented] (HDFS-12443) Ozone: Improve SCM block deletion throttling algorithm
[ https://issues.apache.org/jira/browse/HDFS-12443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16235120#comment-16235120 ] Yiqun Lin commented on HDFS-12443: -- Thanks for the comments, [~cheersyang]. I think we are in the same direction now. There are some details I'd like to confirm with you. bq. How you plan to define the max number of containers for each node? I'd like to calculate this based on the configured container and block sizes, using the calculation I mentioned in my comment above. Please have a look. bq. I think we need an in-memory data structure to handle this... For this new data structure, I'd like to make a change based on the current class {{DatanodeBlockDeletionTransactions}} and make it an independent class. That will make it convenient for us to test. Please see if this looks good to you, or let me know any suggestions; then I will start working on this. Thank you.
[jira] [Commented] (HDFS-12739) Add Support for SCM --init command
[ https://issues.apache.org/jira/browse/HDFS-12739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16235095#comment-16235095 ] Yiqun Lin commented on HDFS-12739: -- Thanks [~shashikant] for updating the patch. I'm +1 for the change. Please wait for [~nandakumar131]'s review comments on the latest patch. We may attach the same patch to re-trigger Jenkins. Thanks. > Add Support for SCM --init command > -- > > Key: HDFS-12739 > URL: https://issues.apache.org/jira/browse/HDFS-12739 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: HDFS-7240 >Affects Versions: HDFS-7240 >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Attachments: HDFS-12739-HDFS-7240.001.patch, > HDFS-12739-HDFS-7240.002.patch, HDFS-12739-HDFS-7240.003.patch, > HDFS-12739-HDFS-7240.004.patch > > > The SCM --init command will generate a cluster ID and persist it locally. The same > cluster ID will be shared with KSM and the datanodes. If the cluster ID is > already available in the local version file, the command will just read it.
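The --init flow described here can be sketched roughly as follows (hypothetical class name and a single-field version file; the real SCM version file carries more fields than just the cluster ID):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.UUID;

/**
 * Rough sketch of the described "scm --init" behavior: generate a cluster
 * ID and persist it, or reuse the ID already present in the version file.
 * Hypothetical names; NOT the actual SCM implementation.
 */
class ScmInitSketch {
    static String initClusterId(Path versionFile) {
        try {
            if (Files.exists(versionFile)) {
                // Version file already present: just read the existing cluster ID.
                return Files.readString(versionFile).trim();
            }
            // First init: generate a new cluster ID and persist it locally.
            String id = UUID.randomUUID().toString();
            Files.writeString(versionFile, id);
            return id;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Running init twice returns the same ID, which is the idempotence the description asks for ({{Files.readString}}/{{writeString}} require Java 11+).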
[jira] [Commented] (HDFS-7240) Object store in HDFS
[ https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16235080#comment-16235080 ] Konstantin Shvachko commented on HDFS-7240: --- ??I hope this addresses your concerns.?? I don't think _that_ addressed any of my concerns. * Ozone by itself does not solve any of HDFS's problems. It uses an HDFS-agnostic, S3-like API, and I cannot use it on my clusters unless I can convince thousands of my users to rewrite their thousands of applications, along with the existing computational frameworks: YARN, Hive, Pig, Spark, .. created over the past 10 years. * I was talking about the futuristic architecture, where you start using Ozone for block management and rewrite the NameNode to store its namespace in LevelDB, if this is still your plan. I agree this architecture solves the object-count problem. But it does not solve the problem of scaling RPC requests, which is more important to me than the # of objects, since you still cannot grow the cluster beyond a single NameNode's RPC-processing capability. > Object store in HDFS > > > Key: HDFS-7240 > URL: https://issues.apache.org/jira/browse/HDFS-7240 > Project: Hadoop HDFS > Issue Type: New Feature >Reporter: Jitendra Nath Pandey >Assignee: Jitendra Nath Pandey >Priority: Major > Attachments: HDFS-7240.001.patch, HDFS-7240.002.patch, > HDFS-7240.003.patch, HDFS-7240.003.patch, HDFS-7240.004.patch, > Ozone-architecture-v1.pdf, Ozonedesignupdate.pdf, ozone_user_v0.pdf > > > This jira proposes to add object store capabilities to HDFS. > As part of the federation work (HDFS-1052) we separated block storage into a > generic storage layer. Using the Block Pool abstraction, new kinds of > namespaces can be built on top of the storage layer, i.e. the datanodes. > In this jira I will explore building an object store using the datanode > storage, but independent of namespace metadata. > I will soon update with a detailed design document. 
[jira] [Commented] (HDFS-12744) More logs when short-circuit read is failed and disabled
[ https://issues.apache.org/jira/browse/HDFS-12744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16235061#comment-16235061 ] Weiwei Yang commented on HDFS-12744: Thanks [~subru], that's nice. > More logs when short-circuit read is failed and disabled > > > Key: HDFS-12744 > URL: https://issues.apache.org/jira/browse/HDFS-12744 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Reporter: Weiwei Yang >Assignee: Weiwei Yang >Priority: Major > Labels: supportability > Fix For: 2.9.0, 3.0.0 > > Attachments: HDFS-12744.001.patch, HDFS-12744.002.patch > > > A short-circuit read (SCR) failed with the following error: > {noformat} > 2017-10-21 16:42:28,024 WARN > [B.defaultRpcServer.handler=7,queue=7,port=16020] > impl.BlockReaderFactory: BlockReaderFactory(xxx): unknown response code ERROR > while attempting to set up short-circuit access. Block xxx is not valid > {noformat} > Then short-circuit read was disabled for *10 minutes* without any warning > message in the log. This caused us to spend more time figuring out > why there was a long window during which SCR was not working. We propose to add a > warning log (as is already done elsewhere) to indicate that SCR is disabled, and some > more logging in the DN to show what happened.
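The supportability gap described above — SCR silently disabled for a fixed window — can be sketched as follows (illustrative only; the real logic lives in the HDFS client's short-circuit machinery, and every name below is invented):

```java
/**
 * Toy model of "disable short-circuit reads for a while after a failure",
 * with an explicit warning so the operator can see when and why SCR went
 * dark. Invented names, NOT the real HDFS client code.
 */
class ScrDisableTracker {
    private long disabledUntilMs = 0;
    private String lastWarning = "";

    /** Record a failure: disable SCR for a window and emit a visible warning. */
    void disable(long nowMs, long durationMs, String reason) {
        disabledUntilMs = nowMs + durationMs;
        lastWarning = "WARN short-circuit reads disabled for " + durationMs
            + " ms: " + reason;
    }

    /** Whether SCR is still inside its disabled window. */
    boolean isDisabled(long nowMs) {
        return nowMs < disabledUntilMs;
    }

    String lastWarning() {
        return lastWarning;
    }
}
```

The point of the patch is the warning string: without it, the only symptom of the 10-minute window is silently slower reads.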
[jira] [Updated] (HDFS-11096) Support rolling upgrade between 2.x and 3.x
[ https://issues.apache.org/jira/browse/HDFS-11096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated HDFS-11096: --- Target Version/s: 3.0.1 (was: 3.0.0) Thanks Sean, I'm going to bump this to 3.0.1 then. > Support rolling upgrade between 2.x and 3.x > --- > > Key: HDFS-11096 > URL: https://issues.apache.org/jira/browse/HDFS-11096 > Project: Hadoop HDFS > Issue Type: Improvement > Components: rolling upgrades >Affects Versions: 3.0.0-alpha1 >Reporter: Andrew Wang >Assignee: Sean Mackrory >Priority: Blocker > Attachments: HDFS-11096.001.patch, HDFS-11096.002.patch, > HDFS-11096.003.patch, HDFS-11096.004.patch, HDFS-11096.005.patch, > HDFS-11096.006.patch, HDFS-11096.007.patch > > > trunk has a minimum software version of 3.0.0-alpha1. This means we can't > do a rolling upgrade between branch-2 and trunk. > This is a showstopper for large deployments. Unless there are very compelling > reasons to break compatibility, let's restore the ability to do rolling upgrades > to 3.x releases.
[jira] [Comment Edited] (HDFS-12744) More logs when short-circuit read is failed and disabled
[ https://issues.apache.org/jira/browse/HDFS-12744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16234697#comment-16234697 ] Subru Krishnan edited comment on HDFS-12744 at 11/2/17 1:16 AM: [~cheersyang]/[~jzhuge], I cherry-picked to branch-2.9 since you want to include this in the 2.9.0 release. was (Author: subru): [~cheersyang]/[~jzhuge], you should cherry-pick to branch-2.9 if you want to include this in the 2.9.0 release. Thanks.
[jira] [Commented] (HDFS-12618) fsck -includeSnapshots reports wrong amount of total blocks
[ https://issues.apache.org/jira/browse/HDFS-12618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16235007#comment-16235007 ] Xiao Chen commented on HDFS-12618: -- Thanks for the new patch, Wellington. From a quick look this seems to work, nice job. I'd like to see: - more thorough unit tests covering the scenario in the description (2 snapshots referring to a deleted file) - tests covering some combinations of create / delete snapshot, verifying that the number is correct - not an expert on lambdas, but it seems {{DirTypeCheck}} could be private. - In general we need to acquire the FSDirectory lock as well as the FSNamesystem lock, so we need dir.readLock() after the namesystem read lock, and dir.readUnlock() before the FSN unlock. - It looks like you applied a formatter to the entire NamenodeFsck.java (instead of just the changed code), which resulted in some unnecessary changes. Let's not make those changes. I will provide a more complete review later this week. > fsck -includeSnapshots reports wrong amount of total blocks > --- > > Key: HDFS-12618 > URL: https://issues.apache.org/jira/browse/HDFS-12618 > Project: Hadoop HDFS > Issue Type: Bug > Components: tools >Affects Versions: 3.0.0-alpha3 >Reporter: Wellington Chevreuil >Assignee: Wellington Chevreuil >Priority: Minor > Attachments: HDFS-121618.initial, HDFS-12618.001.patch, > HDFS-12618.002.patch, HDFS-12618.003.patch > > > When snapshots are enabled, if a file is deleted but is contained by a > snapshot, *fsck* will not report blocks for such a file, showing a different > number of *total blocks* than what is exposed in the Web UI. > This should be fine, as *fsck* provides the *-includeSnapshots* option. The > problem is that the *-includeSnapshots* option causes *fsck* to count blocks for > every occurrence of a file in snapshots, which is wrong because these blocks > should be counted only once (for instance, if a 100MB file is present in 3 > snapshots, it would still map to only one block in HDFS). 
This causes fsck to > report many more blocks than actually exist in HDFS and than are reported in > the Web UI. > Here's an example: > 1) HDFS has two files of 2 blocks each: > {noformat} > $ hdfs dfs -ls -R / > drwxr-xr-x - root supergroup 0 2017-10-07 21:21 /snap-test > -rw-r--r-- 1 root supergroup 209715200 2017-10-07 20:16 /snap-test/file1 > -rw-r--r-- 1 root supergroup 209715200 2017-10-07 20:17 /snap-test/file2 > drwxr-xr-x - root supergroup 0 2017-05-13 13:03 /test > {noformat} > 2) There are two snapshots, with the two files present on each of the > snapshots: > {noformat} > $ hdfs dfs -ls -R /snap-test/.snapshot > drwxr-xr-x - root supergroup 0 2017-10-07 21:21 > /snap-test/.snapshot/snap1 > -rw-r--r-- 1 root supergroup 209715200 2017-10-07 20:16 > /snap-test/.snapshot/snap1/file1 > -rw-r--r-- 1 root supergroup 209715200 2017-10-07 20:17 > /snap-test/.snapshot/snap1/file2 > drwxr-xr-x - root supergroup 0 2017-10-07 21:21 > /snap-test/.snapshot/snap2 > -rw-r--r-- 1 root supergroup 209715200 2017-10-07 20:16 > /snap-test/.snapshot/snap2/file1 > -rw-r--r-- 1 root supergroup 209715200 2017-10-07 20:17 > /snap-test/.snapshot/snap2/file2 > {noformat} > 3) *fsck -includeSnapshots* reports 12 blocks in total (4 blocks for the > normal file path, plus 4 blocks for each snapshot path): > {noformat} > $ hdfs fsck / -includeSnapshots > FSCK started by root (auth:SIMPLE) from /127.0.0.1 for path / at Mon Oct 09 > 15:15:36 BST 2017 > Status: HEALTHY > Number of data-nodes:1 > Number of racks: 1 > Total dirs: 6 > Total symlinks: 0 > Replicated Blocks: > Total size: 1258291200 B > Total files: 6 > Total blocks (validated):12 (avg. 
block size 104857600 B) > Minimally replicated blocks: 12 (100.0 %) > Over-replicated blocks: 0 (0.0 %) > Under-replicated blocks: 0 (0.0 %) > Mis-replicated blocks: 0 (0.0 %) > Default replication factor: 1 > Average block replication: 1.0 > Missing blocks: 0 > Corrupt blocks: 0 > Missing replicas:0 (0.0 %) > {noformat} > 4) Web UI shows the correct number (4 blocks only): > {noformat} > Security is off. > Safemode is off. > 5 files and directories, 4 blocks = 9 total filesystem object(s). > {noformat} > I would like to work on this solution, will propose an initial solution > shortly. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
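The double count described above comes down to counting a block once per path that references it instead of once per unique block ID. A minimal sketch of the distinction, using hypothetical names (the real fix lives in NamenodeFsck's traversal, not in a helper like this):

```java
import java.util.HashSet;
import java.util.Set;

public class SnapshotBlockCounter {
    // Buggy behavior: every occurrence of a block, including one per
    // snapshot path, is added to the total.
    public static int countWithDuplicates(long[][] filesBlockIds) {
        int total = 0;
        for (long[] blocks : filesBlockIds) {
            total += blocks.length;
        }
        return total;
    }

    // Intended behavior: a block contributes to the total only the
    // first time its ID is seen, no matter how many snapshots hold it.
    public static int countUnique(long[][] filesBlockIds) {
        Set<Long> seen = new HashSet<>();
        for (long[] blocks : filesBlockIds) {
            for (long id : blocks) {
                seen.add(id);
            }
        }
        return seen.size();
    }
}
```

With the example from the report (2 files of 2 blocks each, visible through the live path plus 2 snapshots), the duplicate count gives 12 while the unique count gives the correct 4.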
[jira] [Updated] (HDFS-12756) Ozone: Add datanodeID to heartbeat responses and container protocol
[ https://issues.apache.org/jira/browse/HDFS-12756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anu Engineer updated HDFS-12756: Attachment: HDFS-12756-HDFS-7240.001.patch cc: [~xyao], [~nandakumar131], [~elek], [~Weiwei Yang] Please take a look when you get a chance. This is the first step towards having a cluster simulator as part of MiniOzoneCluster. > Ozone: Add datanodeID to heartbeat responses and container protocol > --- > > Key: HDFS-12756 > URL: https://issues.apache.org/jira/browse/HDFS-12756 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Anu Engineer >Assignee: Anu Engineer > Attachments: HDFS-12756-HDFS-7240.001.patch > > > If we have the datanode ID in the heartbeat responses and commands sent to the datanode, we > will be able to do additional sanity checking on the datanode before executing > the command. This is also very helpful in creating a MiniOzoneCluster with > 1000s of simulated nodes. This is needed for scale-based unit tests of SCM. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
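The sanity check this issue enables (a datanode verifying that a command really targets it before executing) can be sketched as follows. The names here are illustrative, not the actual Ozone/SCM protocol classes:

```java
public class DatanodeCommandCheck {
    // Hypothetical check: with the datanode ID carried in heartbeat
    // responses, a node can refuse commands addressed to another node,
    // which also lets one JVM safely simulate many datanodes.
    public static boolean shouldExecute(String myId, String targetId) {
        return myId != null && myId.equals(targetId);
    }
}
```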
[jira] [Created] (HDFS-12756) Ozone: Add datanodeID to heartbeat responses and container protocol
Anu Engineer created HDFS-12756: --- Summary: Ozone: Add datanodeID to heartbeat responses and container protocol Key: HDFS-12756 URL: https://issues.apache.org/jira/browse/HDFS-12756 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Anu Engineer Assignee: Anu Engineer If we have the datanode ID in the heartbeat responses and commands sent to the datanode, we will be able to do additional sanity checking on the datanode before executing the command. This is also very helpful in creating a MiniOzoneCluster with 1000s of simulated nodes. This is needed for scale-based unit tests of SCM. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-7240) Object store in HDFS
[ https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16234959#comment-16234959 ] Anu Engineer commented on HDFS-7240: [~ste...@apache.org] Thank you for the comments. bq. For now, biggest issue I have is that OzoneException needs to become an IOE I have filed HDFS-12755 for converting the OzoneException to an IOException. bq. What's your scale limit? I see a single PUT for the upload, GET path > tmp in open() . Is there a test for different sizes of file? We have tested with different sizes, from 1-byte files to 2 GB. There is no size limit imposed by the ozone architecture. However, we have always planned to follow the S3 limit of 5 GB. We can certainly add tests for different sizes of files -- but creating these data files during unit tests takes time. We have strived to keep the unit tests of ozone under 4 mins so far, and large key sizes add prohibitive unit test times. So our approach is to use Corona, a load-generation tool for ozone. We run this 4 times daily with different key sizes, and it is trivial to set up and run. For the comments on the OzoneFileSystem, I will let the appropriate person respond. > Object store in HDFS > > > Key: HDFS-7240 > URL: https://issues.apache.org/jira/browse/HDFS-7240 > Project: Hadoop HDFS > Issue Type: New Feature >Reporter: Jitendra Nath Pandey >Assignee: Jitendra Nath Pandey >Priority: Major > Attachments: HDFS-7240.001.patch, HDFS-7240.002.patch, > HDFS-7240.003.patch, HDFS-7240.003.patch, HDFS-7240.004.patch, > Ozone-architecture-v1.pdf, Ozonedesignupdate.pdf, ozone_user_v0.pdf > > > This jira proposes to add object store capabilities into HDFS. > As part of the federation work (HDFS-1052) we separated block storage as a > generic storage layer. Using the Block Pool abstraction, new kinds of > namespaces can be built on top of the storage layer i.e. datanodes. 
> In this jira I will explore building an object store using the datanode > storage, but independent of namespace metadata. > I will soon update with a detailed design document. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-12755) Ozone: OzoneException needs to become an IOException
Anu Engineer created HDFS-12755: --- Summary: Ozone: OzoneException needs to become an IOException Key: HDFS-12755 URL: https://issues.apache.org/jira/browse/HDFS-12755 Project: Hadoop HDFS Issue Type: Sub-task Components: ozone Affects Versions: HDFS-7240 Reporter: Anu Engineer Assignee: Anu Engineer Priority: Critical Fix For: HDFS-7240 From review comments from [~ste...@apache.org]: For now, the biggest issue I have is that OzoneException needs to become an IOE, so simplifying exception handling all round, preserving information, not losing stack traces, and generally leading to happy support teams as well as developers. Changing the base class isn't itself traumatic, but it will implicate the client code as there's almost no longer any need to catch & wrap things. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
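A minimal sketch of the suggested change, assuming a constructor-based design (the class name and constructors here are illustrative, not the actual Ozone code): subclassing IOException lets callers propagate the exception untouched, while a cause-preserving constructor keeps the full stack trace and never produces a null message.

```java
import java.io.IOException;

// Sketch: once the exception extends IOException, callers can let it
// propagate instead of catching, wrapping, and losing the stack trace.
public class OzoneExceptionSketch extends IOException {
    public OzoneExceptionSketch(String message) {
        super(message);
    }

    // Keep the cause (hence the full stack trace) and fall back to
    // toString() for causes with a null message, e.g. a bare
    // NullPointerException.
    public OzoneExceptionSketch(Throwable cause) {
        super(cause.getMessage() != null ? cause.getMessage()
                                         : cause.toString(), cause);
    }
}
```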
[jira] [Commented] (HDFS-12682) ECAdmin -listPolicies will always show SystemErasureCodingPolicies state as DISABLED
[ https://issues.apache.org/jira/browse/HDFS-12682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16234947#comment-16234947 ] Xiao Chen commented on HDFS-12682: -- Fixing checkstyle... > ECAdmin -listPolicies will always show SystemErasureCodingPolicies state as > DISABLED > > > Key: HDFS-12682 > URL: https://issues.apache.org/jira/browse/HDFS-12682 > Project: Hadoop HDFS > Issue Type: Bug > Components: erasure-coding >Reporter: Xiao Chen >Assignee: Xiao Chen >Priority: Blocker > Labels: hdfs-ec-3.0-must-do > Attachments: HDFS-12682.01.patch, HDFS-12682.02.patch, > HDFS-12682.03.patch, HDFS-12682.04.patch, HDFS-12682.05.patch, > HDFS-12682.06.patch, HDFS-12682.07.patch, HDFS-12682.08.patch > > > On a real cluster, {{hdfs ec -listPolicies}} will always show policy state as > DISABLED. > {noformat} > [hdfs@nightly6x-1 root]$ hdfs ec -listPolicies > Erasure Coding Policies: > ErasureCodingPolicy=[Name=RS-10-4-1024k, Schema=[ECSchema=[Codec=rs, > numDataUnits=10, numParityUnits=4]], CellSize=1048576, Id=5, State=DISABLED] > ErasureCodingPolicy=[Name=RS-3-2-1024k, Schema=[ECSchema=[Codec=rs, > numDataUnits=3, numParityUnits=2]], CellSize=1048576, Id=2, State=DISABLED] > ErasureCodingPolicy=[Name=RS-6-3-1024k, Schema=[ECSchema=[Codec=rs, > numDataUnits=6, numParityUnits=3]], CellSize=1048576, Id=1, State=DISABLED] > ErasureCodingPolicy=[Name=RS-LEGACY-6-3-1024k, > Schema=[ECSchema=[Codec=rs-legacy, numDataUnits=6, numParityUnits=3]], > CellSize=1048576, Id=3, State=DISABLED] > ErasureCodingPolicy=[Name=XOR-2-1-1024k, Schema=[ECSchema=[Codec=xor, > numDataUnits=2, numParityUnits=1]], CellSize=1048576, Id=4, State=DISABLED] > [hdfs@nightly6x-1 root]$ hdfs ec -getPolicy -path /ecec > XOR-2-1-1024k > {noformat} > This is because when [deserializing > protobuf|https://github.com/apache/hadoop/blob/branch-3.0.0-beta1/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelperClient.java#L2942], > the static instance of 
[SystemErasureCodingPolicies > class|https://github.com/apache/hadoop/blob/branch-3.0.0-beta1/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/SystemErasureCodingPolicies.java#L101] > is first checked, and always returns the cached policy objects, which are > created by default with state=DISABLED. > All the existing unit tests pass, because that static instance that the > client (e.g. ECAdmin) reads in unit test is updated by NN. :) -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
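The root cause described above can be reduced to a toy model: a static cache hands back shared objects created with state=DISABLED, and a deserializer that consults the cache first drops the state carried on the wire. All names below are hypothetical; the real classes are PBHelperClient and SystemErasureCodingPolicies.

```java
import java.util.HashMap;
import java.util.Map;

public class PolicyCacheSketch {
    public enum State { DISABLED, ENABLED }

    public static class Policy {
        public final int id;
        public State state = State.DISABLED;  // default on construction
        Policy(int id) { this.id = id; }
    }

    // Static cache: every caller gets the same shared Policy object.
    private static final Map<Integer, Policy> CACHE = new HashMap<>();

    public static Policy byId(int id) {
        return CACHE.computeIfAbsent(id, Policy::new);
    }

    // Buggy deserialization: the cached object wins, so the state the
    // NameNode sent over the wire is silently dropped.
    public static State fromProtoBuggy(int id, State wireState) {
        return byId(id).state;
    }

    // Fixed deserialization: trust the state carried in the protobuf.
    public static State fromProtoFixed(int id, State wireState) {
        return wireState;
    }
}
```

This also shows why the unit tests passed: in a single-JVM test, the NameNode mutates the very same cached object the client reads, hiding the dropped wire state.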
[jira] [Updated] (HDFS-12682) ECAdmin -listPolicies will always show SystemErasureCodingPolicies state as DISABLED
[ https://issues.apache.org/jira/browse/HDFS-12682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Chen updated HDFS-12682: - Attachment: HDFS-12682.08.patch > ECAdmin -listPolicies will always show SystemErasureCodingPolicies state as > DISABLED > > > Key: HDFS-12682 > URL: https://issues.apache.org/jira/browse/HDFS-12682 > Project: Hadoop HDFS > Issue Type: Bug > Components: erasure-coding >Reporter: Xiao Chen >Assignee: Xiao Chen >Priority: Blocker > Labels: hdfs-ec-3.0-must-do > Attachments: HDFS-12682.01.patch, HDFS-12682.02.patch, > HDFS-12682.03.patch, HDFS-12682.04.patch, HDFS-12682.05.patch, > HDFS-12682.06.patch, HDFS-12682.07.patch, HDFS-12682.08.patch > > > On a real cluster, {{hdfs ec -listPolicies}} will always show policy state as > DISABLED. > {noformat} > [hdfs@nightly6x-1 root]$ hdfs ec -listPolicies > Erasure Coding Policies: > ErasureCodingPolicy=[Name=RS-10-4-1024k, Schema=[ECSchema=[Codec=rs, > numDataUnits=10, numParityUnits=4]], CellSize=1048576, Id=5, State=DISABLED] > ErasureCodingPolicy=[Name=RS-3-2-1024k, Schema=[ECSchema=[Codec=rs, > numDataUnits=3, numParityUnits=2]], CellSize=1048576, Id=2, State=DISABLED] > ErasureCodingPolicy=[Name=RS-6-3-1024k, Schema=[ECSchema=[Codec=rs, > numDataUnits=6, numParityUnits=3]], CellSize=1048576, Id=1, State=DISABLED] > ErasureCodingPolicy=[Name=RS-LEGACY-6-3-1024k, > Schema=[ECSchema=[Codec=rs-legacy, numDataUnits=6, numParityUnits=3]], > CellSize=1048576, Id=3, State=DISABLED] > ErasureCodingPolicy=[Name=XOR-2-1-1024k, Schema=[ECSchema=[Codec=xor, > numDataUnits=2, numParityUnits=1]], CellSize=1048576, Id=4, State=DISABLED] > [hdfs@nightly6x-1 root]$ hdfs ec -getPolicy -path /ecec > XOR-2-1-1024k > {noformat} > This is because when [deserializing > protobuf|https://github.com/apache/hadoop/blob/branch-3.0.0-beta1/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelperClient.java#L2942], > the static instance of 
[SystemErasureCodingPolicies > class|https://github.com/apache/hadoop/blob/branch-3.0.0-beta1/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/SystemErasureCodingPolicies.java#L101] > is first checked, and always returns the cached policy objects, which are > created by default with state=DISABLED. > All the existing unit tests pass, because that static instance that the > client (e.g. ECAdmin) reads in unit test is updated by NN. :) -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-12682) ECAdmin -listPolicies will always show SystemErasureCodingPolicies state as DISABLED
[ https://issues.apache.org/jira/browse/HDFS-12682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Chen updated HDFS-12682: - Attachment: (was: HDFS-12682.08.patch) > ECAdmin -listPolicies will always show SystemErasureCodingPolicies state as > DISABLED > > > Key: HDFS-12682 > URL: https://issues.apache.org/jira/browse/HDFS-12682 > Project: Hadoop HDFS > Issue Type: Bug > Components: erasure-coding >Reporter: Xiao Chen >Assignee: Xiao Chen >Priority: Blocker > Labels: hdfs-ec-3.0-must-do > Attachments: HDFS-12682.01.patch, HDFS-12682.02.patch, > HDFS-12682.03.patch, HDFS-12682.04.patch, HDFS-12682.05.patch, > HDFS-12682.06.patch, HDFS-12682.07.patch, HDFS-12682.08.patch > > > On a real cluster, {{hdfs ec -listPolicies}} will always show policy state as > DISABLED. > {noformat} > [hdfs@nightly6x-1 root]$ hdfs ec -listPolicies > Erasure Coding Policies: > ErasureCodingPolicy=[Name=RS-10-4-1024k, Schema=[ECSchema=[Codec=rs, > numDataUnits=10, numParityUnits=4]], CellSize=1048576, Id=5, State=DISABLED] > ErasureCodingPolicy=[Name=RS-3-2-1024k, Schema=[ECSchema=[Codec=rs, > numDataUnits=3, numParityUnits=2]], CellSize=1048576, Id=2, State=DISABLED] > ErasureCodingPolicy=[Name=RS-6-3-1024k, Schema=[ECSchema=[Codec=rs, > numDataUnits=6, numParityUnits=3]], CellSize=1048576, Id=1, State=DISABLED] > ErasureCodingPolicy=[Name=RS-LEGACY-6-3-1024k, > Schema=[ECSchema=[Codec=rs-legacy, numDataUnits=6, numParityUnits=3]], > CellSize=1048576, Id=3, State=DISABLED] > ErasureCodingPolicy=[Name=XOR-2-1-1024k, Schema=[ECSchema=[Codec=xor, > numDataUnits=2, numParityUnits=1]], CellSize=1048576, Id=4, State=DISABLED] > [hdfs@nightly6x-1 root]$ hdfs ec -getPolicy -path /ecec > XOR-2-1-1024k > {noformat} > This is because when [deserializing > protobuf|https://github.com/apache/hadoop/blob/branch-3.0.0-beta1/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelperClient.java#L2942], > the static instance of 
[SystemErasureCodingPolicies > class|https://github.com/apache/hadoop/blob/branch-3.0.0-beta1/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/SystemErasureCodingPolicies.java#L101] > is first checked, and always returns the cached policy objects, which are > created by default with state=DISABLED. > All the existing unit tests pass, because that static instance that the > client (e.g. ECAdmin) reads in unit test is updated by NN. :) -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12682) ECAdmin -listPolicies will always show SystemErasureCodingPolicies state as DISABLED
[ https://issues.apache.org/jira/browse/HDFS-12682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16234926#comment-16234926 ] Hadoop QA commented on HDFS-12682: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 15m 56s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 6 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 1m 36s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 59s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 12m 36s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 2m 14s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 20s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 15m 29s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 16s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 49s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 16s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 6s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 12m 24s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 12m 24s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 2m 11s{color} | {color:orange} root: The patch generated 1 new + 647 unchanged - 2 fixed = 648 total (was 649) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 35s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 18s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 38s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 47s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 28s{color} | {color:green} hadoop-hdfs-client in the patch passed. 
{color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red}117m 56s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red}125m 26s{color} | {color:red} hadoop-mapreduce-client-jobclient in the patch failed. {color} | | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 1m 1s{color} | {color:red} The patch generated 1 ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}353m 55s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure | | Timed out junit tests | org.apache.hadoop.mapred.pipes.TestPipeApplication | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 | | JIRA Issue | HDFS-12682 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12895226/HDFS-12682.08.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 2c8d0b595d67 3.13.0-129-generic #178-Ubuntu SMP Fri Aug 11 12:48:20 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool |
[jira] [Commented] (HDFS-12720) Ozone: Ratis options are not passed from KSM Client protobuf helper correctly.
[ https://issues.apache.org/jira/browse/HDFS-12720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16234925#comment-16234925 ] Hadoop QA commented on HDFS-12720: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 10m 18s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 3 new or modified test files. {color} | || || || || {color:brown} HDFS-7240 Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 33s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 13m 56s{color} | {color:green} HDFS-7240 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 37s{color} | {color:green} HDFS-7240 passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 42s{color} | {color:green} HDFS-7240 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 39s{color} | {color:green} HDFS-7240 passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 20s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 38s{color} | {color:green} HDFS-7240 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 46s{color} | {color:green} HDFS-7240 passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 6s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 25s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 30s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 30s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 36s{color} | {color:orange} hadoop-hdfs-project: The patch generated 1 new + 1 unchanged - 1 fixed = 2 total (was 2) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 28s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 9m 19s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 40s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 42s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 22s{color} | {color:green} hadoop-hdfs-client in the patch passed. 
{color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 99m 8s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 23s{color} | {color:red} The patch generated 3 ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}165m 7s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.TestErasureCodingPolicies | | | hadoop.hdfs.TestHdfsAdmin | | | hadoop.hdfs.TestMaintenanceState | | | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure | | | hadoop.hdfs.TestDFSStripedOutputStreamWithRandomECPolicy | | | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure140 | | | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure180 | | | hadoop.hdfs.TestReconstructStripedFile | | | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure110 | | | hadoop.ozone.scm.container.TestContainerMapping | | | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure010 | | | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure130 | | | hadoop.hdfs.web.TestWebHdfsTimeouts | | | hadoop.hdfs.server.balancer.TestBalancerRPCDelay | | |
[jira] [Commented] (HDFS-12754) Lease renewal can hit a deadlock
[ https://issues.apache.org/jira/browse/HDFS-12754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16234905#comment-16234905 ] Kuhu Shukla commented on HDFS-12754: This deadlock was found during testing on our end about a year or so ago. The fix (attached patch) has been deployed to our production clusters ever since and has had a significant amount of run time. > Lease renewal can hit a deadlock > - > > Key: HDFS-12754 > URL: https://issues.apache.org/jira/browse/HDFS-12754 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.8.1 >Reporter: Kuhu Shukla >Assignee: Kuhu Shukla >Priority: Major > Attachments: HDFS-12754.001.patch > > > The client and the renewer can hit a deadlock during the close operation, since > closeFile() reaches back to DFSClient#removeFileBeingWritten. This is > possible if the client calls close while the renewer is renewing a lease. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
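The deadlock described is a lock-ordering inversion: close() holds the client's lock and calls into the renewer, while the renewer holds its own lock and calls back into the client via removeFileBeingWritten. A sketch of the problem and one conventional fix, with hypothetical ReentrantLocks standing in for the DFSClient and LeaseRenewer monitors (this is not the actual patch):

```java
import java.util.concurrent.locks.ReentrantLock;

public class LeaseDeadlockSketch {
    // Stand-ins for the two monitors involved in the report.
    private final ReentrantLock clientLock = new ReentrantLock();
    private final ReentrantLock renewerLock = new ReentrantLock();

    // Deadlock-prone shape (never run these two concurrently):
    //   close():  clientLock -> renewerLock
    //   renew():  renewerLock -> clientLock (via removeFileBeingWritten)
    // Each thread grabs its first lock and then waits forever on the other.

    // One fix: make close() acquire locks in the renewer's order, so a
    // single global acquisition order exists and no cycle can form.
    public boolean closeFileSafely() {
        renewerLock.lock();
        try {
            clientLock.lock();
            try {
                return true;  // remove file from files-being-written
            } finally {
                clientLock.unlock();
            }
        } finally {
            renewerLock.unlock();
        }
    }
}
```

Other common fixes for this shape are to drop one lock before the cross-component callback, or to move the callback outside the synchronized region entirely.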
[jira] [Commented] (HDFS-12474) Ozone: SCM: Handling container report with key count and container usage.
[ https://issues.apache.org/jira/browse/HDFS-12474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16234876#comment-16234876 ] Xiaoyu Yao commented on HDFS-12474: --- Thanks [~nandakumar131] for the update. +1 for the v001 patch. I will commit it tomorrow if [~linyiqun] and others don't have additional comments. > Ozone: SCM: Handling container report with key count and container usage. > - > > Key: HDFS-12474 > URL: https://issues.apache.org/jira/browse/HDFS-12474 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: HDFS-7240 >Reporter: Xiaoyu Yao >Assignee: Nanda kumar >Priority: Major > Labels: ozoneMerge > Attachments: HDFS-12474-HDFS-7240.000.patch, > HDFS-12474-HDFS-7240.001.patch > > > Currently, the container report only contains the # of reports sent to SCM. > We will need to provide the key count and the usage of each individual > containers to update the SCM container state maintained by > ContainerStateManager. This has a dependency on HDFS-12387. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12754) Lease renewal can hit a deadlock
[ https://issues.apache.org/jira/browse/HDFS-12754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16234842#comment-16234842 ] Hadoop QA commented on HDFS-12754: -- | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 9m 24s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 33s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 36s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 17s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 39s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 9m 58s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 31s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 38s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 34s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 34s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 15s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs-client: The patch generated 1 new + 96 unchanged - 0 fixed = 97 total (was 96) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 37s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 14s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 39s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 13s{color} | {color:green} hadoop-hdfs-client in the patch passed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 17s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 58m 15s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 | | JIRA Issue | HDFS-12754 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12895265/HDFS-12754.001.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux cb15286aecc2 3.13.0-119-generic #166-Ubuntu SMP Wed May 3 12:18:55 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 70f1a94 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_131 | | findbugs | v3.1.0-RC1 | | checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/21920/artifact/out/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs-client.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/21920/testReport/ | | modules | C: hadoop-hdfs-project/hadoop-hdfs-client U: hadoop-hdfs-project/hadoop-hdfs-client | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/21920/console | | Powered by | Apache Yetus 0.7.0-SNAPSHOT http://yetus.apache.org | This message was
[jira] [Commented] (HDFS-7240) Object store in HDFS
[ https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16234770#comment-16234770 ] Steve Loughran commented on HDFS-7240: -- I'm starting with hadoop-common and hadoop-ozone; more to follow on Thursday. For now, the biggest issue I have is that OzoneException needs to become an IOE, so simplifying exception handling all round, preserving information, not losing stack traces, and generally leading to happy support teams as well as developers. Changing the base class isn't itself traumatic, but it will affect the client code, as there will almost no longer be any need to catch & wrap things. Other: What's your scale limit? I see a single PUT for the upload, GET path > tmp in open(). Is there a test for different sizes of file? h2. hadoop-common h3. Config I've filed some comments on the newly created HADOOP-15007, "Stabilize and document Configuration element", to cover making sure that there are the tests & docs for this to go in. * HDFSPropertyTag: s/DEPRICATED/DEPRECATED/ * OzonePropertyTag: s/there/their/ * OzoneConfig Property.toString() is going to be "key valuenull" if there is no tag defined. Space? h3. FileUtils minor: imports all shuffled about compared to trunk & branch-2. Revert. h3. OzoneException This is its own exception, not an IOE, and at least in OzoneFileSystem the process to build an IOE from it invariably loses the inner stack trace and all meaningful information about the exception type. Equally, OzoneBucket catches all forms of IOException and converts them to an {{OzoneRestClientException}}. We don't need to do this: it will lose stack trace data, cause confusion, and is already making the client code over-complex with catching IOEs, wrapping to OzoneException, catching OzoneException and converting to an IOE, at which point all core information is lost. 1. 
Make this a subclass of IOE, consistent with the rest of our code, and then clients can throw it up untouched, except in the special case that they need to perform some form of exception translation. 1. Except for (any?) special cases, pass up IOEs raised in the http client as-is. Also: * Confused by the overriding of message/getMessage(). Is it for serialization? * Consider adding a setMessage(String format, String... args) and calling String.format: it would tie in with uses in the code. * Override setThrowable() and setMessage() to set the nested exception (hence the full stack), and handle the case where the exception returns null for getMessage(). {code} OzoneException initCause(Throwable t) { super.initCause(t); setMessage(t.getMessage() != null ? t.getMessage() : t.toString()); return this; } {code} h2. OzoneFileSystem h3. general * Various places use LOG.info("text " + something); they should all move to LOG.info("text {}", something). * Once OzoneException -> IOE, you can cut the catch-and-translate here. * Qualify the path before all uses. That's needed to stop paths being relative, and to catch things like someone calling ozfs.rename("o3://bucket/src", "s3a://bucket/dest"), delete("s3a://bucket/path"), etc, as well as problems with validation happening before paths are made absolute. * {{RenameIterator.iterate()}} is going to log @ warn whenever it can't delete a temp file because it doesn't exist, which may be a distraction in failures. Better: {{if(!tmpFile.delete() && tmpFile.exists())}}, as that will only warn if the temp file is actually there. h3. OzoneFileSystem.rename() Rename() is the operation to fear on an object store. I haven't looked at it in full detail. * Qualify all the paths before doing directory validation. Otherwise you can defeat the "don't rename into self" checks: rename("/path/src", "/path/../path/src/dest"). * Log @ debug all the paths taken before returning so you can debug if needed. 
* S3A rename ended up having a special RenameFailedException() which innerRename() raises, with text and a return code. The outer rename logs the text and returns the return code. This means that all failing paths have an exception clearly thrown, and when we eventually make rename/3 public, it's lined up to throw exceptions back to the caller. Consider copying this code. h3. OzoneFileSystem.delete * Qualify the path before use. * Don't log at error if you can't delete a nonexistent path; delete is used everywhere for silent cleanup. Cut it. h3. OzoneFileSystem.ListStatusIterator * Make the status field final. h3. OzoneFileSystem.mkdir Liked your algorithm here; took me a moment to understand how rollback didn't need to track all created directories. Nice. * Do qualify the path first. h3. OzoneFileSystem.getFileStatus {{getKeyInfo()}} catches all exceptions and maps them to null, which is interpreted as not found and eventually surfaces as an FNFE. This is misleading if the failure is for any other reason. Once OzoneException -> IOException, {{getKeyInfo()}} should only catch & downgrade the explicit not-found (404?)
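The "qualify every path before use" advice in the review above can be sketched with plain java.net.URI — a minimal illustration only, not Hadoop's actual FileSystem code (a real implementation would use Path.makeQualified() and FileSystem.checkPath(); the class and method names below are assumptions):

```java
import java.net.URI;

// Sketch of "qualify paths first": resolve relative paths against the
// filesystem's working directory, and reject paths that carry a foreign
// scheme (e.g. an s3a:// path handed to an o3:// filesystem).
public class PathQualifier {
    private final URI fsUri;        // e.g. o3://bucket/
    private final URI workingDir;   // e.g. o3://bucket/user/alice/

    public PathQualifier(URI fsUri, URI workingDir) {
        this.fsUri = fsUri;
        this.workingDir = workingDir;
    }

    public URI qualify(URI path) {
        // Relative paths become absolute here, so later validation
        // (e.g. "don't rename into self") sees the real target.
        URI resolved = workingDir.resolve(path);
        if (resolved.getScheme() != null
                && !resolved.getScheme().equals(fsUri.getScheme())) {
            throw new IllegalArgumentException(
                "Wrong FS: " + path + ", expected scheme: " + fsUri.getScheme());
        }
        return resolved;
    }
}
```

Doing this at the top of every public FileSystem method is what catches both the relative-path and the cross-scheme cases the review mentions.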
[jira] [Commented] (HDFS-12737) Thousands of sockets lingering in TIME_WAIT state due to frequent file open operations
[ https://issues.apache.org/jira/browse/HDFS-12737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16234752#comment-16234752 ] Yongjun Zhang commented on HDFS-12737: -- Many thanks [~jnp], that makes sense! If we could make the BlockTokenSelector also check the block id when it finds a block token, that would help, but it doesn't look like an easy thing to do at all. > Thousands of sockets lingering in TIME_WAIT state due to frequent file open > operations > -- > > Key: HDFS-12737 > URL: https://issues.apache.org/jira/browse/HDFS-12737 > Project: Hadoop HDFS > Issue Type: Bug > Components: ipc > Environment: CDH5.10.2, HBase Multi-WAL=2, 250 replication peers >Reporter: Wei-Chiu Chuang >Assignee: Wei-Chiu Chuang >Priority: Major > > On a HBase cluster we found HBase RegionServers have thousands of sockets in > TIME_WAIT state. It depleted system resources and caused other services to > fail. > After months of troubleshooting, we found the issue is the cluster has > hundreds of replication peers, and has multi-WAL = 2. That creates hundreds > of replication threads in HBase RS, and each thread opens WAL file *every > second*. > We found that the IPC client closes socket right away, and does not reuse > socket connection. Since each closed socket stays in TIME_WAIT state for 60 > seconds in Linux by default, that generates thousands of TIME_WAIT sockets. > {code:title=ClientDatanodeProtocolTranslatorPB:createClientDatanodeProtocolProxy} > // Since we're creating a new UserGroupInformation here, we know that no > // future RPC proxies will be able to re-use the same connection. And > // usages of this proxy tend to be one-off calls. > // > // This is a temporary fix: callers should really achieve this by using > // RPC.stopProxy() on the resulting object, but this is currently not > // working in trunk. See the discussion on HDFS-1965. 
> Configuration confWithNoIpcIdle = new Configuration(conf); > confWithNoIpcIdle.setInt(CommonConfigurationKeysPublic > .IPC_CLIENT_CONNECTION_MAXIDLETIME_KEY, 0); > {code} > This piece of code is used in DistributedFileSystem#open() > {noformat} > 2017-10-27 14:01:44,152 DEBUG org.apache.hadoop.ipc.Client: New connection > Thread[IPC Client (1838187805) connection to /172.131.21.48:20001 from > blk_1013754707_14032,5,main] for remoteId /172.131.21.48:20001 > java.lang.Throwable: For logging stack trace, not a real exception > at org.apache.hadoop.ipc.Client.getConnection(Client.java:1556) > at org.apache.hadoop.ipc.Client.call(Client.java:1482) > at org.apache.hadoop.ipc.Client.call(Client.java:1443) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230) > at com.sun.proxy.$Proxy28.getReplicaVisibleLength(Unknown Source) > at > org.apache.hadoop.hdfs.protocolPB.ClientDatanodeProtocolTranslatorPB.getReplicaVisibleLength(ClientDatanodeProtocolTranslatorPB.java:198) > at > org.apache.hadoop.hdfs.DFSInputStream.readBlockLength(DFSInputStream.java:365) > at > org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:335) > at > org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:271) > at > org.apache.hadoop.hdfs.DFSInputStream.(DFSInputStream.java:263) > at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1585) > at > org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:326) > at > org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:322) > at > org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) > at > org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:322) > at > org.apache.hadoop.fs.FilterFileSystem.open(FilterFileSystem.java:162) > at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:783) > at > 
org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:293) > at > org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:267) > at > org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:255) > at > org.apache.hadoop.hbase.wal.WALFactory.createReader(WALFactory.java:414) > at > org.apache.hadoop.hbase.replication.regionserver.ReplicationWALReaderManager.openReader(ReplicationWALReaderManager.java:70) > at > org.apache.hadoop.hbase.replication.regionserver.ReplicationSource$ReplicationSourceWorkerThread.openReader(ReplicationSource.java:747) > at >
[jira] [Commented] (HDFS-12744) More logs when short-circuit read is failed and disabled
[ https://issues.apache.org/jira/browse/HDFS-12744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16234697#comment-16234697 ] Subru Krishnan commented on HDFS-12744: --- [~cheersyang]/[~jzhuge], you should cherry-pick to branch-2.9 if you want to include in 2.9.0 release. Thanks. > More logs when short-circuit read is failed and disabled > > > Key: HDFS-12744 > URL: https://issues.apache.org/jira/browse/HDFS-12744 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Reporter: Weiwei Yang >Assignee: Weiwei Yang >Priority: Major > Labels: supportability > Fix For: 2.9.0, 3.0.0 > > Attachments: HDFS-12744.001.patch, HDFS-12744.002.patch > > > Short-circuit read (SCR) failed with following error > {noformat} > 2017-10-21 16:42:28,024 WARN > [B.defaultRpcServer.handler=7,queue=7,port=16020] > impl.BlockReaderFactory: BlockReaderFactory(xxx): unknown response code ERROR > while attempting to set up short-circuit access. Block xxx is not valid > {noformat} > then short-circuit read is disabled for *10 minutes* without any warning > message given in the log. This causes us spent some more time to figure out > why we had a long time window that SCR was not working. Propose to add a > warning log (other places already did) to indicate SCR is disabled and some > more logging in DN to display what happened. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12725) BlockPlacementPolicyRackFaultTolerant still fails with racks with very few nodes
[ https://issues.apache.org/jira/browse/HDFS-12725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16234685#comment-16234685 ] Hadoop QA commented on HDFS-12725: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 9m 21s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 58s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 54s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 37s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 2s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 17s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 56s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 48s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 59s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 51s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 51s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 35s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch generated 1 new + 4 unchanged - 1 fixed = 5 total (was 5) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 58s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 53s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 59s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 46s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 64m 3s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 19s{color} | {color:red} The patch generated 96 ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}124m 0s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.TestSafeModeWithStripedFile | | | hadoop.hdfs.security.TestDelegationTokenForProxyUser | | | hadoop.hdfs.web.TestWebHdfsFileSystemContract | | | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure190 | | | hadoop.hdfs.TestFileLengthOnClusterRestart | | | hadoop.hdfs.TestWriteReadStripedFile | | | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure150 | | | hadoop.hdfs.TestReconstructStripedFile | | Timed out junit tests | org.apache.hadoop.hdfs.TestReadStripedFileWithDecodingCorruptData | | | org.apache.hadoop.hdfs.TestReadStripedFileWithDecodingDeletedData | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 | | JIRA Issue | HDFS-12725 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12895241/HDFS-12725.05.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 2598baa15338 3.13.0-123-generic #172-Ubuntu SMP Mon Jun 26 18:04:35 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 56b88b0 | | maven
[jira] [Commented] (HDFS-12720) Ozone: Ratis options are not passed from KSM Client protobuf helper correctly.
[ https://issues.apache.org/jira/browse/HDFS-12720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16234673#comment-16234673 ] Tsz Wo Nicholas Sze commented on HDFS-12720: +1 for the v5 patch. > Ozone: Ratis options are not passed from KSM Client protobuf helper correctly. > -- > > Key: HDFS-12720 > URL: https://issues.apache.org/jira/browse/HDFS-12720 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Affects Versions: HDFS-7240 >Reporter: Mukul Kumar Singh >Assignee: Mukul Kumar Singh >Priority: Major > Labels: ozoneMerge > Fix For: HDFS-7240 > > Attachments: HDFS-12720-HDFS-7240.001.patch, > HDFS-12720-HDFS-7240.002.patch, HDFS-12720-HDFS-7240.003.patch, > HDFS-12720-HDFS-7240.004.patch, HDFS-12720-HDFS-7240.005.patch > > > {{KeySpaceManagerProtocolClientSideTranslatorPB#allocateBlock}} and > {{KeySpaceManagerProtocolClientSideTranslatorPB#openKey}} do not pass the > ratis replication factor and replication type to the KSM server. this causes > the allocations using ratis model to resort to standalone mode even when > Ratis mode is specified. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12754) Lease renewal can hit a deadlock
[ https://issues.apache.org/jira/browse/HDFS-12754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16234646#comment-16234646 ] Kuhu Shukla commented on HDFS-12754: CC: [~kihwal]. > Lease renewal can hit a deadlock > - > > Key: HDFS-12754 > URL: https://issues.apache.org/jira/browse/HDFS-12754 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.8.1 >Reporter: Kuhu Shukla >Assignee: Kuhu Shukla >Priority: Major > Attachments: HDFS-12754.001.patch > > > The Client and the renewer can hit a deadlock during close operation since > closeFile() reaches back to the DFSClient#removeFileBeingWritten. This is > possible if the client calls close when the renewer is renewing a lease. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-12754) Lease renewal can hit a deadlock
[ https://issues.apache.org/jira/browse/HDFS-12754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kuhu Shukla updated HDFS-12754: --- Attachment: HDFS-12754.001.patch This patch calls removal only when necessary in endFileLease(). > Lease renewal can hit a deadlock > - > > Key: HDFS-12754 > URL: https://issues.apache.org/jira/browse/HDFS-12754 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.8.1 >Reporter: Kuhu Shukla >Assignee: Kuhu Shukla >Priority: Major > Attachments: HDFS-12754.001.patch > > > The Client and the renewer can hit a deadlock during close operation since > closeFile() reaches back to the DFSClient#removeFileBeingWritten. This is > possible if the client calls close when the renewer is renewing a lease. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
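The deadlock described in this issue is a lock-ordering inversion between the client and the lease renewer; the following is a simplified, hypothetical sketch of the shape of the problem (class, method, and lock names are illustrative, not the actual HDFS code):

```java
// Illustration of the reported deadlock shape: the renewer thread locks the
// renewer first and then the client (closeFile() reaching back into
// removeFileBeingWritten()), while the closing thread locks the client first
// and then the renewer. If both run concurrently, each holds the lock the
// other needs. The direction of the attached patch is to avoid taking the
// second lock from the renewal path unless it is actually necessary.
public class LeaseDeadlockSketch {
    private final Object clientLock = new Object();
    private final Object renewerLock = new Object();

    // Runs on the lease-renewer thread.
    void renewLeases() {
        synchronized (renewerLock) {
            synchronized (clientLock) {   // e.g. removeFileBeingWritten()
                // update per-file lease state
            }
        }
    }

    // Runs on the application thread calling close().
    void closeFile() {
        synchronized (clientLock) {
            synchronized (renewerLock) {  // e.g. deregister from the renewer
                // remove the file from the renewal set
            }
        }
    }
}
```

Called sequentially the methods are harmless; the hazard only appears when the two threads interleave between the outer and inner lock acquisitions.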
[jira] [Updated] (HDFS-12754) Lease renewal can hit a deadlock
[ https://issues.apache.org/jira/browse/HDFS-12754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kuhu Shukla updated HDFS-12754: --- Status: Patch Available (was: Open) > Lease renewal can hit a deadlock > - > > Key: HDFS-12754 > URL: https://issues.apache.org/jira/browse/HDFS-12754 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.8.1 >Reporter: Kuhu Shukla >Assignee: Kuhu Shukla >Priority: Major > Attachments: HDFS-12754.001.patch > > > The Client and the renewer can hit a deadlock during close operation since > closeFile() reaches back to the DFSClient#removeFileBeingWritten. This is > possible if the client calls close when the renewer is renewing a lease. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-12754) Lease renewal can hit a deadlock
Kuhu Shukla created HDFS-12754: -- Summary: Lease renewal can hit a deadlock Key: HDFS-12754 URL: https://issues.apache.org/jira/browse/HDFS-12754 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.8.1 Reporter: Kuhu Shukla Assignee: Kuhu Shukla Priority: Major The Client and the renewer can hit a deadlock during close operation since closeFile() reaches back to the DFSClient#removeFileBeingWritten. This is possible if the client calls close when the renewer is renewing a lease. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12753) Getting file not found exception while using distcp with s3a
[ https://issues.apache.org/jira/browse/HDFS-12753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16234627#comment-16234627 ] Logesh Rangan commented on HDFS-12753: -- But our production environment doesn't offer a Dynamo DB instance for S3Guard. Is there a way to tune the options for distcp to copy the huge files? I'm looking for the below information: 1) How to select the number of maps and their size. I have a directory which has ~1+ files with total size of ~250 GB. When I run with the below options, it is taking ~1.30 hours. hadoop distcp -D HADOOP_OPTS=-Xmx12g -D HADOOP_CLIENT_OPTS='-Xmx12g -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled' -D 'mapreduce.map.memory.mb=12288' -D 'mapreduce.map.java.opts=-Xmx10g' -D 'mapreduce.reduce.memory.mb=12288' -D 'mapreduce.reduce.java.opts=-Xmx10g' '-Dfs.s3a.proxy.host=edhmgrn-prod.cloud.capitalone.com' '-Dfs.s3a.proxy.port=8088' '-Dfs.s3a.access.key=XXX' '-Dfs.s3a.secret.key=XXX' '-Dfs.s3a.connection.timeout=18' '-Dfs.s3a.attempts.maximum=5' '-Dfs.s3a.fast.upload=true' '-Dfs.s3a.fast.upload.buffer=array' '-Dfs.s3a.fast.upload.active.blocks=50' '-Dfs.s3a.multipart.size=262144000' '-Dfs.s3a.threads.max=500' '-Dfs.s3a.threads.keepalivetime=600' '-Dfs.s3a.server-side-encryption-algorithm=AES256' -bandwidth 3072 -strategy dynamic -m 200 -numListstatusThreads 30 /src/ s3a://bucket/dest 2) I'm not seeing the throughput of 3gbps even after configuring -bandwidth as 3072. 3) How to configure the Java heap and map size for the huge files, so that distcp will give better performance. 4) With the fast upload option, I'm writing the files to S3 using threads. Could you please help me with some tuning options for this? Appreciate your help. 
> Getting file not found exception while using distcp with s3a > > > Key: HDFS-12753 > URL: https://issues.apache.org/jira/browse/HDFS-12753 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Logesh Rangan > > I'm using the distcp option to copy the huge files from Hadoop to S3. > Sometimes i'm getting the below error, > *Command:* (Copying 378 GB data) > _hadoop distcp -D HADOOP_OPTS=-Xmx12g -D HADOOP_CLIENT_OPTS='-Xmx12g > -XX:+UseParNewGC -XX:+UseConcMarkSweepGC > -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled' -D > 'mapreduce.map.memory.mb=12288' -D 'mapreduce.map.java.opts=-Xmx10g' -D > 'mapreduce.reduce.memory.mb=12288' -D 'mapreduce.reduce.java.opts=-Xmx10g' > '-Dfs.s3a.proxy.host=edhmgrn-prod.cloud.capitalone.com' > '-Dfs.s3a.proxy.port=8088' '-Dfs.s3a.access.key=XXX' > '-Dfs.s3a.secret.key=XXX' '-Dfs.s3a.connection.timeout=18' > '-Dfs.s3a.attempts.maximum=5' '-Dfs.s3a.fast.upload=true' > '-Dfs.s3a.fast.upload.buffer=array' '-Dfs.s3a.fast.upload.active.blocks=50' > '-Dfs.s3a.multipart.size=262144000' '-Dfs.s3a.threads.max=500' > '-Dfs.s3a.threads.keepalivetime=600' > '-Dfs.s3a.server-side-encryption-algorithm=AES256' -bandwidth 3072 -strategy > dynamic -m 220 -numListstatusThreads 30 /src/ s3a://bucket/dest > _ > 17/11/01 12:23:27 INFO mapreduce.Job: Task Id : > attempt_1497120915913_2792335_m_000165_0, Status : FAILED > Error: java.io.FileNotFoundException: No such file or directory: > s3a://bucketname/filename > at > org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:1132) > at > org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:78) > at > org.apache.hadoop.tools.util.DistCpUtils.preserve(DistCpUtils.java:197) > at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:256) > at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:50) > at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145) > at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787) > at 
org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) > at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1912) > at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) > 17/11/01 12:28:32 INFO mapreduce.Job: Task Id : > attempt_1497120915913_2792335_m_10_0, Status : FAILED > Error: java.io.IOException: File copy failed: hdfs://nameservice1/filena --> > s3a://cof-prod-lake-card/src/seam/acct_scores/acctmdlscore_card_cobna_anon_vldtd/instnc_id=2016102300/04_0_copy_6 > at > org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:284) > at
[jira] [Commented] (HDFS-12753) Getting file not found exception while using distcp with s3a
[ https://issues.apache.org/jira/browse/HDFS-12753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16234533#comment-16234533 ] Wei-Chiu Chuang commented on HDFS-12753: Looks like you are hit by S3's eventual consistency. Check out S3Guard which should help with your problem: https://blog.cloudera.com/blog/2017/08/introducing-s3guard-s3-consistency-for-apache-hadoop/ https://hortonworks.com/blog/s3guard-amazon-s3-consistency/ > Getting file not found exception while using distcp with s3a > > > Key: HDFS-12753 > URL: https://issues.apache.org/jira/browse/HDFS-12753 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Logesh Rangan > > I'm using the distcp option to copy the huge files from Hadoop to S3. > Sometimes i'm getting the below error, > *Command:* (Copying 378 GB data) > _hadoop distcp -D HADOOP_OPTS=-Xmx12g -D HADOOP_CLIENT_OPTS='-Xmx12g > -XX:+UseParNewGC -XX:+UseConcMarkSweepGC > -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled' -D > 'mapreduce.map.memory.mb=12288' -D 'mapreduce.map.java.opts=-Xmx10g' -D > 'mapreduce.reduce.memory.mb=12288' -D 'mapreduce.reduce.java.opts=-Xmx10g' > '-Dfs.s3a.proxy.host=edhmgrn-prod.cloud.capitalone.com' > '-Dfs.s3a.proxy.port=8088' '-Dfs.s3a.access.key=XXX' > '-Dfs.s3a.secret.key=XXX' '-Dfs.s3a.connection.timeout=18' > '-Dfs.s3a.attempts.maximum=5' '-Dfs.s3a.fast.upload=true' > '-Dfs.s3a.fast.upload.buffer=array' '-Dfs.s3a.fast.upload.active.blocks=50' > '-Dfs.s3a.multipart.size=262144000' '-Dfs.s3a.threads.max=500' > '-Dfs.s3a.threads.keepalivetime=600' > '-Dfs.s3a.server-side-encryption-algorithm=AES256' -bandwidth 3072 -strategy > dynamic -m 220 -numListstatusThreads 30 /src/ s3a://bucket/dest > _ > 17/11/01 12:23:27 INFO mapreduce.Job: Task Id : > attempt_1497120915913_2792335_m_000165_0, Status : FAILED > Error: java.io.FileNotFoundException: No such file or directory: > s3a://bucketname/filename > at > 
org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:1132) > at > org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:78) > at > org.apache.hadoop.tools.util.DistCpUtils.preserve(DistCpUtils.java:197) > at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:256) > at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:50) > at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145) > at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) > at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1912) > at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) > 17/11/01 12:28:32 INFO mapreduce.Job: Task Id : > attempt_1497120915913_2792335_m_10_0, Status : FAILED > Error: java.io.IOException: File copy failed: hdfs://nameservice1/filena --> > s3a://cof-prod-lake-card/src/seam/acct_scores/acctmdlscore_card_cobna_anon_vldtd/instnc_id=2016102300/04_0_copy_6 > at > org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:284) > at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:252) > at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:50) > at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145) > at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) > at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1912) > at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) > Caused by: 
java.io.IOException: Couldn't run retriable-command: Copying > hdfs://nameservice1/filename to s3a://bucketname/filename > at > org.apache.hadoop.tools.util.RetriableCommand.execute(RetriableCommand.java:101) > at > org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:280) > ... 10 more > Caused by: com.cloudera.com.amazonaws.AmazonClientException: Failed to parse > XML document with handler class > com.cloudera.com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler > at >
[jira] [Updated] (HDFS-12720) Ozone: Ratis options are not passed from KSM Client protobuf helper correctly.
[ https://issues.apache.org/jira/browse/HDFS-12720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mukul Kumar Singh updated HDFS-12720: - Attachment: HDFS-12720-HDFS-7240.005.patch Patch v5 fixes the unit test failures and checkstyle issues. > Ozone: Ratis options are not passed from KSM Client protobuf helper correctly. > -- > > Key: HDFS-12720 > URL: https://issues.apache.org/jira/browse/HDFS-12720 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Affects Versions: HDFS-7240 >Reporter: Mukul Kumar Singh >Assignee: Mukul Kumar Singh >Priority: Major > Labels: ozoneMerge > Fix For: HDFS-7240 > > Attachments: HDFS-12720-HDFS-7240.001.patch, > HDFS-12720-HDFS-7240.002.patch, HDFS-12720-HDFS-7240.003.patch, > HDFS-12720-HDFS-7240.004.patch, HDFS-12720-HDFS-7240.005.patch > > > {{KeySpaceManagerProtocolClientSideTranslatorPB#allocateBlock}} and > {{KeySpaceManagerProtocolClientSideTranslatorPB#openKey}} do not pass the > ratis replication factor and replication type to the KSM server. this causes > the allocations using ratis model to resort to standalone mode even when > Ratis mode is specified. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-12720) Ozone: Ratis options are not passed from KSM Client protobuf helper correctly.
[ https://issues.apache.org/jira/browse/HDFS-12720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-12720: --- Hadoop Flags: Reviewed +1 the new patch looks good. Thanks > Ozone: Ratis options are not passed from KSM Client protobuf helper correctly. > -- > > Key: HDFS-12720 > URL: https://issues.apache.org/jira/browse/HDFS-12720 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Affects Versions: HDFS-7240 >Reporter: Mukul Kumar Singh >Assignee: Mukul Kumar Singh >Priority: Major > Labels: ozoneMerge > Fix For: HDFS-7240 > > Attachments: HDFS-12720-HDFS-7240.001.patch, > HDFS-12720-HDFS-7240.002.patch, HDFS-12720-HDFS-7240.003.patch, > HDFS-12720-HDFS-7240.004.patch > > > {{KeySpaceManagerProtocolClientSideTranslatorPB#allocateBlock}} and > {{KeySpaceManagerProtocolClientSideTranslatorPB#openKey}} do not pass the > ratis replication factor and replication type to the KSM server. this causes > the allocations using ratis model to resort to standalone mode even when > Ratis mode is specified. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-12725) BlockPlacementPolicyRackFaultTolerant still fails with racks with very few nodes
[ https://issues.apache.org/jira/browse/HDFS-12725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Chen updated HDFS-12725: - Attachment: HDFS-12725.05.patch I was thinking about this patch and IMO we should still WARN in NN logs even if it's placed, so the situation doesn't go unnoticed. Will now emit a message like: {noformat} 2017-11-01 10:49:27,081 [IPC Server handler 8 on 55407] WARN blockmanagement.BlockPlacementPolicy (BlockPlacementPolicyRackFaultTolerant.java:chooseTargetInOrder(142)) - Only able to place 7 of total expected 9 (maxNodesPerRack=2, numOfReplicas=4) nodes evenly across racks, falling back to uneven placement. {noformat} > BlockPlacementPolicyRackFaultTolerant still fails with racks with very few > nodes > > > Key: HDFS-12725 > URL: https://issues.apache.org/jira/browse/HDFS-12725 > Project: Hadoop HDFS > Issue Type: Bug > Components: erasure-coding >Affects Versions: 3.0.0 >Reporter: Xiao Chen >Assignee: Xiao Chen >Priority: Major > Labels: hdfs-ec-3.0-must-do > Attachments: HDFS-12725.01.patch, HDFS-12725.02.patch, > HDFS-12725.03.patch, HDFS-12725.04.patch, HDFS-12725.05.patch > > > HDFS-12567 tries to fix the scenario where EC blocks may not be allocated in > an extremely rack-imbalanced cluster. > The added fall-back step of the fix could be improved to do a best-effort > placement. This is more likely to happen in testing than in real clusters. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12564) Add the documents of swebhdfs configurations on the client side
[ https://issues.apache.org/jira/browse/HDFS-12564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16234398#comment-16234398 ] Xiaoyu Yao commented on HDFS-12564: --- Thanks [~tasanuma0829]. The patch looks good to me overall, here are a few comments: Distcp.md.vm Line 423: suggest adding a separate section and put the content(links) under it. "#H3 Secure Copy over the wire with distcp" ServerSetup.md.vm This page is for HTTPFS. To avoid confusion, I would suggest we add a detailed ssl-client.xml example instead of linking it to Swebhdfs document. Webhdfs.md Line 161: /etc/hadoop/hdfs-site.xml has a configuration key to enable secure http, i.e., dfs.http.policy=HTTPS_ONLY Also note that dfs.http.policy is not for swebhdfs only. This will also affect all the HTTP endpoints of HDFS such as the NN, DN WebUI, JMX, QJM. Line 198: suggest give a full path: ssl-client.xml -> /etc/hadoop/ssl-client.xml We also need to document settings for the server side settings, e.g., ssl-server.xml. > Add the documents of swebhdfs configurations on the client side > --- > > Key: HDFS-12564 > URL: https://issues.apache.org/jira/browse/HDFS-12564 > Project: Hadoop HDFS > Issue Type: Improvement > Components: documentation, webhdfs >Reporter: Takanobu Asanuma >Assignee: Takanobu Asanuma >Priority: Major > Attachments: HDFS-12564.1.patch, HDFS-12564.2.patch > > > Documentation does not cover the swebhdfs configurations on the client side. > We can reuse the hftp/hsftp documents which was removed from Hadoop-3.0 in > HDFS-5570, HDFS-9640. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
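A detailed ssl-client.xml example of the kind suggested above might look like the following. Paths and passwords are placeholders; the property names are the standard Hadoop {{ssl.client.*}} keys, so verify them against the target release before documenting them.

```xml
<!-- /etc/hadoop/ssl-client.xml: truststore settings used by swebhdfs
     (and other HTTPS clients). All values below are placeholders. -->
<configuration>
  <property>
    <name>ssl.client.truststore.location</name>
    <value>/etc/hadoop/conf/truststore.jks</value>
  </property>
  <property>
    <name>ssl.client.truststore.password</name>
    <value>changeit</value>
  </property>
  <property>
    <name>ssl.client.truststore.type</name>
    <value>jks</value>
  </property>
  <property>
    <name>ssl.client.truststore.reload.interval</name>
    <value>10000</value>
  </property>
</configuration>
```

The server side would carry the matching keystore settings in ssl-server.xml, alongside dfs.http.policy=HTTPS_ONLY in hdfs-site.xml as noted in the review.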
[jira] [Updated] (HDFS-12735) Make ContainerStateMachine#applyTransaction async
[ https://issues.apache.org/jira/browse/HDFS-12735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lokesh Jain updated HDFS-12735: --- Attachment: HDFS-12735-HDFS-7240.000.patch > Make ContainerStateMachine#applyTransaction async > - > > Key: HDFS-12735 > URL: https://issues.apache.org/jira/browse/HDFS-12735 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Lokesh Jain >Assignee: Lokesh Jain >Priority: Major > Labels: performance > Attachments: HDFS-12735-HDFS-7240.000.patch > > > Currently ContainerStateMachine#applyTransaction makes a synchronous call to > dispatch client requests. Idea is to have a thread pool which dispatches > client requests and returns a CompletableFuture. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
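The idea in this sub-task can be sketched as below. The names are illustrative, not the real ContainerStateMachine API: instead of dispatching the client request synchronously inside applyTransaction, hand it to a thread pool and return a CompletableFuture so the state machine thread is not blocked on I/O.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class AsyncDispatchSketch {
    private static final ExecutorService DISPATCH_POOL =
        Executors.newFixedThreadPool(4);

    // Stands in for the synchronous dispatcher.dispatch(request) call.
    static String dispatch(String request) {
        return "handled:" + request;
    }

    // applyTransaction-style entry point: returns immediately with a future
    // that completes when the pooled dispatch finishes.
    static CompletableFuture<String> applyTransaction(String request) {
        return CompletableFuture.supplyAsync(() -> dispatch(request), DISPATCH_POOL);
    }

    public static void main(String[] args) {
        CompletableFuture<String> result = applyTransaction("putKey");
        // join() blocks only here, for the demo; real callers would compose.
        System.out.println(result.join());
        DISPATCH_POOL.shutdown();
    }
}
```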
[jira] [Updated] (HDFS-12753) Getting file not found exception while using distcp with s3a
[ https://issues.apache.org/jira/browse/HDFS-12753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Logesh Rangan updated HDFS-12753: - Summary: Getting file not found exception while using distcp with s3a (was: Getting file not founf exception while using distcp with s3a) > Getting file not found exception while using distcp with s3a > > > Key: HDFS-12753 > URL: https://issues.apache.org/jira/browse/HDFS-12753 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Logesh Rangan > > I'm using the distcp option to copy the huge files from Hadoop to S3. > Sometimes i'm getting the below error, > *Command:* (Copying 378 GB data) > _hadoop distcp -D HADOOP_OPTS=-Xmx12g -D HADOOP_CLIENT_OPTS='-Xmx12g > -XX:+UseParNewGC -XX:+UseConcMarkSweepGC > -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled' -D > 'mapreduce.map.memory.mb=12288' -D 'mapreduce.map.java.opts=-Xmx10g' -D > 'mapreduce.reduce.memory.mb=12288' -D 'mapreduce.reduce.java.opts=-Xmx10g' > '-Dfs.s3a.proxy.host=edhmgrn-prod.cloud.capitalone.com' > '-Dfs.s3a.proxy.port=8088' '-Dfs.s3a.access.key=XXX' > '-Dfs.s3a.secret.key=XXX' '-Dfs.s3a.connection.timeout=18' > '-Dfs.s3a.attempts.maximum=5' '-Dfs.s3a.fast.upload=true' > '-Dfs.s3a.fast.upload.buffer=array' '-Dfs.s3a.fast.upload.active.blocks=50' > '-Dfs.s3a.multipart.size=262144000' '-Dfs.s3a.threads.max=500' > '-Dfs.s3a.threads.keepalivetime=600' > '-Dfs.s3a.server-side-encryption-algorithm=AES256' -bandwidth 3072 -strategy > dynamic -m 220 -numListstatusThreads 30 /src/ s3a://bucket/dest > _ > 17/11/01 12:23:27 INFO mapreduce.Job: Task Id : > attempt_1497120915913_2792335_m_000165_0, Status : FAILED > Error: java.io.FileNotFoundException: No such file or directory: > s3a://bucketname/filename > at > org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:1132) > at > org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:78) > at > 
org.apache.hadoop.tools.util.DistCpUtils.preserve(DistCpUtils.java:197) > at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:256) > at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:50) > at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145) > at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) > at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1912) > at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) > 17/11/01 12:28:32 INFO mapreduce.Job: Task Id : > attempt_1497120915913_2792335_m_10_0, Status : FAILED > Error: java.io.IOException: File copy failed: hdfs://nameservice1/filena --> > s3a://cof-prod-lake-card/src/seam/acct_scores/acctmdlscore_card_cobna_anon_vldtd/instnc_id=2016102300/04_0_copy_6 > at > org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:284) > at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:252) > at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:50) > at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145) > at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) > at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1912) > at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) > Caused by: java.io.IOException: Couldn't run retriable-command: Copying > hdfs://nameservice1/filename to s3a://bucketname/filename > at > 
org.apache.hadoop.tools.util.RetriableCommand.execute(RetriableCommand.java:101) > at > org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:280) > ... 10 more > Caused by: com.cloudera.com.amazonaws.AmazonClientException: Failed to parse > XML document with handler class > com.cloudera.com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler > at > com.cloudera.com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser.parseXmlInputStream(XmlResponsesSaxParser.java:164) > at >
[jira] [Created] (HDFS-12753) Getting file not founf exception while using distcp with s3a
Logesh Rangan created HDFS-12753: Summary: Getting file not founf exception while using distcp with s3a Key: HDFS-12753 URL: https://issues.apache.org/jira/browse/HDFS-12753 Project: Hadoop HDFS Issue Type: Bug Reporter: Logesh Rangan I'm using the distcp option to copy the huge files from Hadoop to S3. Sometimes i'm getting the below error, *Command:* (Copying 378 GB data) _hadoop distcp -D HADOOP_OPTS=-Xmx12g -D HADOOP_CLIENT_OPTS='-Xmx12g -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled' -D 'mapreduce.map.memory.mb=12288' -D 'mapreduce.map.java.opts=-Xmx10g' -D 'mapreduce.reduce.memory.mb=12288' -D 'mapreduce.reduce.java.opts=-Xmx10g' '-Dfs.s3a.proxy.host=edhmgrn-prod.cloud.capitalone.com' '-Dfs.s3a.proxy.port=8088' '-Dfs.s3a.access.key=XXX' '-Dfs.s3a.secret.key=XXX' '-Dfs.s3a.connection.timeout=18' '-Dfs.s3a.attempts.maximum=5' '-Dfs.s3a.fast.upload=true' '-Dfs.s3a.fast.upload.buffer=array' '-Dfs.s3a.fast.upload.active.blocks=50' '-Dfs.s3a.multipart.size=262144000' '-Dfs.s3a.threads.max=500' '-Dfs.s3a.threads.keepalivetime=600' '-Dfs.s3a.server-side-encryption-algorithm=AES256' -bandwidth 3072 -strategy dynamic -m 220 -numListstatusThreads 30 /src/ s3a://bucket/dest _ 17/11/01 12:23:27 INFO mapreduce.Job: Task Id : attempt_1497120915913_2792335_m_000165_0, Status : FAILED Error: java.io.FileNotFoundException: No such file or directory: s3a://bucketname/filename at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:1132) at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:78) at org.apache.hadoop.tools.util.DistCpUtils.preserve(DistCpUtils.java:197) at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:256) at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:50) at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787) at 
org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1912) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) 17/11/01 12:28:32 INFO mapreduce.Job: Task Id : attempt_1497120915913_2792335_m_10_0, Status : FAILED Error: java.io.IOException: File copy failed: hdfs://nameservice1/filena --> s3a://cof-prod-lake-card/src/seam/acct_scores/acctmdlscore_card_cobna_anon_vldtd/instnc_id=2016102300/04_0_copy_6 at org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:284) at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:252) at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:50) at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1912) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) Caused by: java.io.IOException: Couldn't run retriable-command: Copying hdfs://nameservice1/filename to s3a://bucketname/filename at org.apache.hadoop.tools.util.RetriableCommand.execute(RetriableCommand.java:101) at org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:280) ... 
10 more Caused by: com.cloudera.com.amazonaws.AmazonClientException: Failed to parse XML document with handler class com.cloudera.com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListBucketHandler at com.cloudera.com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser.parseXmlInputStream(XmlResponsesSaxParser.java:164) at com.cloudera.com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser.parseListBucketObjectsResponse(XmlResponsesSaxParser.java:299) at com.cloudera.com.amazonaws.services.s3.model.transform.Unmarshallers$ListObjectsUnmarshaller.unmarshall(Unmarshallers.java:77) at com.cloudera.com.amazonaws.services.s3.model.transform.Unmarshallers$ListObjectsUnmarshaller.unmarshall(Unmarshallers.java:74) at com.cloudera.com.amazonaws.services.s3.internal.S3XmlResponseHandler.handle(S3XmlResponseHandler.java:62) at
[jira] [Commented] (HDFS-11661) GetContentSummary uses excessive amounts of memory
[ https://issues.apache.org/jira/browse/HDFS-11661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16234342#comment-16234342 ] Xiao Chen commented on HDFS-11661: -- {quote} There are more bugs related to snapshots and content summary and quota usage discrepencies. I almost have a patch ready that optimizes content summary and appears to fix the snapshot issues. {quote} Hi [~daryn] and [~shahrs87], Just wanted to check if this was eventually done? And could you share the jira if so? Thanks! > GetContentSummary uses excessive amounts of memory > -- > > Key: HDFS-11661 > URL: https://issues.apache.org/jira/browse/HDFS-11661 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.8.0, 3.0.0-alpha2 >Reporter: Nathan Roberts >Assignee: Wei-Chiu Chuang >Priority: Blocker > Fix For: 2.9.0, 3.0.0-alpha4, 2.8.2 > > Attachments: HDFS-11661.001.patch, HDFs-11661.002.patch, Heap > growth.png > > > ContentSummaryComputationContext::nodeIncluded() is being used to keep track > of all INodes visited during the current content summary calculation. This > can be all of the INodes in the filesystem, making for a VERY large hash > table. This simply won't work on large filesystems. > We noticed this after upgrading a namenode with ~100Million filesystem > objects was spending significantly more time in GC. Fortunately this system > had some memory breathing room, other clusters we have will not run with this > additional demand on memory. > This was added as part of HDFS-10797 as a way of keeping track of INodes that > have already been accounted for - to avoid double counting. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-12739) Add Support for SCM --init command
[ https://issues.apache.org/jira/browse/HDFS-12739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDFS-12739: --- Attachment: HDFS-12739-HDFS-7240.004.patch [~linyiqun], Thanks for the review comments. The patch addresses the review comments. Please have a look. > Add Support for SCM --init command > -- > > Key: HDFS-12739 > URL: https://issues.apache.org/jira/browse/HDFS-12739 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: HDFS-7240 >Affects Versions: HDFS-7240 >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Attachments: HDFS-12739-HDFS-7240.001.patch, > HDFS-12739-HDFS-7240.002.patch, HDFS-12739-HDFS-7240.003.patch, > HDFS-12739-HDFS-7240.004.patch > > > The SCM --init command will generate a cluster ID and persist it locally. The same > cluster ID will be shared with KSM and the datanodes. If the cluster ID is > already available in the local version file, it will just read > the cluster ID. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
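The --init semantics described in HDFS-12739 can be sketched as follows. The file layout and "CID-" prefix here are assumptions, not the real SCM version file format: generate a cluster ID once and persist it; a repeated init reads the existing ID back instead of minting a new one.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.UUID;

public class ScmInitSketch {
    // Idempotent init: returns the persisted cluster ID if one exists,
    // otherwise generates and persists a fresh one.
    static String initClusterId(Path versionFile) {
        try {
            if (Files.exists(versionFile)) {
                return Files.readString(versionFile).trim(); // already initialized
            }
            String clusterId = "CID-" + UUID.randomUUID();
            Files.writeString(versionFile, clusterId);
            return clusterId;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Round-trips two inits against a temp file and reports whether the
    // second init saw the same ID as the first.
    static boolean demoIdempotent() {
        try {
            Path f = Files.createTempFile("scm-version", ".txt");
            Files.delete(f); // start from "not initialized"
            String first = initClusterId(f);
            String second = initClusterId(f);
            Files.deleteIfExists(f);
            return first.equals(second);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("idempotent=" + demoIdempotent());
    }
}
```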
[jira] [Updated] (HDFS-12682) ECAdmin -listPolicies will always show SystemErasureCodingPolicies state as DISABLED
[ https://issues.apache.org/jira/browse/HDFS-12682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Chen updated HDFS-12682: - Attachment: HDFS-12682.08.patch Thanks for the review Rakesh! Patch 8 to address all the comments. > ECAdmin -listPolicies will always show SystemErasureCodingPolicies state as > DISABLED > > > Key: HDFS-12682 > URL: https://issues.apache.org/jira/browse/HDFS-12682 > Project: Hadoop HDFS > Issue Type: Bug > Components: erasure-coding >Reporter: Xiao Chen >Assignee: Xiao Chen >Priority: Blocker > Labels: hdfs-ec-3.0-must-do > Attachments: HDFS-12682.01.patch, HDFS-12682.02.patch, > HDFS-12682.03.patch, HDFS-12682.04.patch, HDFS-12682.05.patch, > HDFS-12682.06.patch, HDFS-12682.07.patch, HDFS-12682.08.patch > > > On a real cluster, {{hdfs ec -listPolicies}} will always show policy state as > DISABLED. > {noformat} > [hdfs@nightly6x-1 root]$ hdfs ec -listPolicies > Erasure Coding Policies: > ErasureCodingPolicy=[Name=RS-10-4-1024k, Schema=[ECSchema=[Codec=rs, > numDataUnits=10, numParityUnits=4]], CellSize=1048576, Id=5, State=DISABLED] > ErasureCodingPolicy=[Name=RS-3-2-1024k, Schema=[ECSchema=[Codec=rs, > numDataUnits=3, numParityUnits=2]], CellSize=1048576, Id=2, State=DISABLED] > ErasureCodingPolicy=[Name=RS-6-3-1024k, Schema=[ECSchema=[Codec=rs, > numDataUnits=6, numParityUnits=3]], CellSize=1048576, Id=1, State=DISABLED] > ErasureCodingPolicy=[Name=RS-LEGACY-6-3-1024k, > Schema=[ECSchema=[Codec=rs-legacy, numDataUnits=6, numParityUnits=3]], > CellSize=1048576, Id=3, State=DISABLED] > ErasureCodingPolicy=[Name=XOR-2-1-1024k, Schema=[ECSchema=[Codec=xor, > numDataUnits=2, numParityUnits=1]], CellSize=1048576, Id=4, State=DISABLED] > [hdfs@nightly6x-1 root]$ hdfs ec -getPolicy -path /ecec > XOR-2-1-1024k > {noformat} > This is because when [deserializing > 
protobuf|https://github.com/apache/hadoop/blob/branch-3.0.0-beta1/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelperClient.java#L2942], > the static instance of [SystemErasureCodingPolicies > class|https://github.com/apache/hadoop/blob/branch-3.0.0-beta1/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/SystemErasureCodingPolicies.java#L101] > is first checked, and always returns the cached policy objects, which are > created by default with state=DISABLED. > All the existing unit tests pass, because that static instance that the > client (e.g. ECAdmin) reads in unit test is updated by NN. :) -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
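The pitfall described above can be reduced to a self-contained sketch. The names are illustrative, not the real HDFS classes: a static cache of policy objects constructed with a default state will shadow the per-cluster state carried in the RPC response unless deserialization applies the wire state to a copy rather than returning the cached instance.

```java
import java.util.Map;

public class PolicyCacheSketch {
    enum State { DISABLED, ENABLED }

    static final class Policy {
        final String name;
        final State state;
        Policy(String name, State state) { this.name = name; this.state = state; }
    }

    // System policies cached once at class load, DISABLED by construction.
    static final Map<String, Policy> CACHE =
        Map.of("XOR-2-1-1024k", new Policy("XOR-2-1-1024k", State.DISABLED));

    // Buggy convert(): returns the cached object, dropping the wire state.
    static Policy convertBuggy(String name, State wireState) {
        return CACHE.get(name);
    }

    // Fixed convert(): carries the state from the wire into a fresh copy.
    static Policy convertFixed(String name, State wireState) {
        return new Policy(name, wireState);
    }

    public static void main(String[] args) {
        // The NN says ENABLED, but the buggy path still reports DISABLED:
        assert convertBuggy("XOR-2-1-1024k", State.ENABLED).state == State.DISABLED;
        assert convertFixed("XOR-2-1-1024k", State.ENABLED).state == State.ENABLED;
    }
}
```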
[jira] [Commented] (HDFS-12681) Fold HdfsLocatedFileStatus into HdfsFileStatus
[ https://issues.apache.org/jira/browse/HDFS-12681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16234304#comment-16234304 ] Chris Douglas commented on HDFS-12681: -- Test failures are unrelated to the patch; all are due to resource exhaustion. Checkstyle errors are from the builder pattern. > Fold HdfsLocatedFileStatus into HdfsFileStatus > -- > > Key: HDFS-12681 > URL: https://issues.apache.org/jira/browse/HDFS-12681 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Chris Douglas >Priority: Minor > Attachments: HDFS-12681.00.patch, HDFS-12681.01.patch, > HDFS-12681.02.patch, HDFS-12681.03.patch, HDFS-12681.04.patch, > HDFS-12681.05.patch, HDFS-12681.06.patch, HDFS-12681.07.patch, > HDFS-12681.08.patch, HDFS-12681.09.patch, HDFS-12681.10.patch > > > {{HdfsLocatedFileStatus}} is a subtype of {{HdfsFileStatus}}, but not of > {{LocatedFileStatus}}. Conversion requires copying common fields and shedding > unknown data. It would be cleaner and sufficient for {{HdfsFileStatus}} to > extend {{LocatedFileStatus}}. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12750) Ozone: Fix TestStorageContainerManager#testBlockDeletionTransactions
[ https://issues.apache.org/jira/browse/HDFS-12750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16234261#comment-16234261 ] Xiaoyu Yao commented on HDFS-12750: --- Thanks [~cheersyang] for the commit. This is a very low-risk, unit-test-only change; given you have done all the local verification, that should be OK. > Ozone: Fix TestStorageContainerManager#testBlockDeletionTransactions > > > Key: HDFS-12750 > URL: https://issues.apache.org/jira/browse/HDFS-12750 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: HDFS-7240 >Reporter: Xiaoyu Yao >Assignee: Xiaoyu Yao >Priority: Major > Fix For: HDFS-7240 > > Attachments: HDFS-12750-HDFS-7240.001.patch > > > Some of the newly added ozone tests need to shut down the MiniOzoneCluster so > that the metadata db and test files are cleaned up for subsequent tests. > TestStorageContainerManager#testBlockDeletionTransactions -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12350) Support meta tags in configs
[ https://issues.apache.org/jira/browse/HDFS-12350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16234240#comment-16234240 ] Steve Loughran commented on HDFS-12350: --- This is a change to hadoop-common. It should have been filed and discussed there. Please don't make changes to hadoop-common in hdfs patches without some publicity. thanks > Support meta tags in configs > > > Key: HDFS-12350 > URL: https://issues.apache.org/jira/browse/HDFS-12350 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Ajay Kumar >Assignee: Ajay Kumar >Priority: Major > Fix For: 3.1.0 > > Attachments: HDFS-12350.01.patch, HDFS-12350.02.patch, > HDFS-12350.03.patch > > > We should tag the hadoop/hdfs config so that we can retrieve properties by > there usage/application like PERFORMANCE, NAMENODE etc. Right now we don't > have an option available to group or list related properties together. > Grouping properties through some restricted set of Meta tags and then > exposing them in Configuration class will be useful for end users. > For example, here is an config file with tags. > {code} > > > dfs.namenode.servicerpc-bind-host > localhost >REQUIRED > > > > dfs.namenode.fs-limits.min-block-size >1048576 >PERFORMANCE,REQUIRED > > > dfs.namenode.logging.level > Info > HDFS, DEBUG > > > > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
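The tag-lookup idea behind HDFS-12350 can be sketched independently of the Configuration class. The method and map layout below are illustrative, not the actual Hadoop API: index each property by its comma-separated tags so callers can retrieve all properties carrying a tag such as PERFORMANCE.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class TagIndexSketch {
    // Builds a tag -> property-names index from a property -> tags map.
    static Map<String, Set<String>> buildIndex(Map<String, String> propToTags) {
        Map<String, Set<String>> byTag = new HashMap<>();
        propToTags.forEach((prop, tags) -> {
            for (String tag : tags.split(",")) {
                byTag.computeIfAbsent(tag.trim(), t -> new TreeSet<>()).add(prop);
            }
        });
        return byTag;
    }

    public static void main(String[] args) {
        // Sample properties mirroring the config file in the comment above.
        Map<String, String> props = Map.of(
            "dfs.namenode.servicerpc-bind-host", "REQUIRED",
            "dfs.namenode.fs-limits.min-block-size", "PERFORMANCE,REQUIRED");
        Map<String, Set<String>> idx = buildIndex(props);
        assert idx.get("REQUIRED").size() == 2;
        assert idx.get("PERFORMANCE").contains("dfs.namenode.fs-limits.min-block-size");
    }
}
```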
[jira] [Updated] (HDFS-12219) Javadoc for FSNamesystem#getMaxObjects is incorrect
[ https://issues.apache.org/jira/browse/HDFS-12219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Krogen updated HDFS-12219: --- Fix Version/s: (was: 3.1.0) > Javadoc for FSNamesystem#getMaxObjects is incorrect > --- > > Key: HDFS-12219 > URL: https://issues.apache.org/jira/browse/HDFS-12219 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Erik Krogen >Assignee: Erik Krogen > Fix For: 3.0.0 > > Attachments: HDFS-12219.000.patch > > > The Javadoc states that this represents the total number of objects in the > system, but it really represents the maximum allowed number of objects (as > correctly stated on the Javadoc for {{FSNamesystemMBean#getMaxObjects()}}). -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-12219) Javadoc for FSNamesystem#getMaxObjects is incorrect
[ https://issues.apache.org/jira/browse/HDFS-12219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Krogen updated HDFS-12219: --- Fix Version/s: 3.1.0 > Javadoc for FSNamesystem#getMaxObjects is incorrect > --- > > Key: HDFS-12219 > URL: https://issues.apache.org/jira/browse/HDFS-12219 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Erik Krogen >Assignee: Erik Krogen > Fix For: 3.0.0, 3.1.0 > > Attachments: HDFS-12219.000.patch > > > The Javadoc states that this represents the total number of objects in the > system, but it really represents the maximum allowed number of objects (as > correctly stated on the Javadoc for {{FSNamesystemMBean#getMaxObjects()}}). -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-11096) Support rolling upgrade between 2.x and 3.x
[ https://issues.apache.org/jira/browse/HDFS-11096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16234197#comment-16234197 ] Sean Mackrory commented on HDFS-11096: -- >From an HDFS standpoint, definitely - I've run many successful rolling upgrade >and distcp-over-webhdfs tests this week and updated the patch. The only thing >remaining is to get automation itself in place after this is committed. I looked into the YARN issues. I'm still seeing very similar symptoms to the YARN-6457 issue mentioned above in both branch-3.0 and trunk. In trunk I'm also seeing this: {quote} 17/10/31 23:05:49 INFO security.AMRMTokenSecretManager: Creating password for appattempt_1509490231144_0628_02 17/10/31 23:05:49 INFO amlauncher.AMLauncher: Error launching appattempt_1509490231144_0628_02. Got exception: org.apache.hadoop.security.token.SecretManager$InvalidToken: Invalid container token used for starting container on : container-5.docker:35151 at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.verifyAndGetContainerTokenIdentifier(ContainerManagerImpl.java:974) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.startContainers(ContainerManagerImpl.java:789) at org.apache.hadoop.yarn.api.impl.pb.service.ContainerManagementProtocolPBServiceImpl.startContainers(ContainerManagementProtocolPBServiceImpl.java:70) at org.apache.hadoop.yarn.proto.ContainerManagementProtocol$ContainerManagementProtocolService$2.callBlockingMethod(ContainerManagementProtocol.java:127) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:447) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:845) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:788) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1807) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2455) at sun.reflect.GeneratedConstructorAccessor70.newInstance(Unknown Source) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53) at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateIOException(RPCUtil.java:80) at org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:119) at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:131) at sun.reflect.GeneratedMethodAccessor85.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422) at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165) at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157) at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359) at com.sun.proxy.$Proxy89.startContainers(Unknown Source) at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:123) at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:304) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: 
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): Invalid container token used for starting container on : container-5.docker:35151 at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.verifyAndGetContainerTokenIdentifier(ContainerManagerImpl.java:974) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.startContainers(ContainerManagerImpl.java:789) at org.apache.hadoop.yarn.api.impl.pb.service.ContainerManagementProtocolPBServiceImpl.startContainers(ContainerManagementProtocolPBServiceImpl.java:70) at
[jira] [Comment Edited] (HDFS-11096) Support rolling upgrade between 2.x and 3.x
[ https://issues.apache.org/jira/browse/HDFS-11096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16234197#comment-16234197 ] Sean Mackrory edited comment on HDFS-11096 at 11/1/17 3:16 PM: --- From an HDFS standpoint, definitely - I've run many successful rolling upgrade and distcp-over-webhdfs tests this week and updated the patch. The only thing remaining is to get the automation itself in place after this is committed. I looked into the YARN issues. I'm still seeing symptoms very similar to the YARN-6457 issue mentioned above in both branch-3.0 and trunk. In trunk I'm also seeing this: {code}
17/10/31 23:05:49 INFO security.AMRMTokenSecretManager: Creating password for appattempt_1509490231144_0628_02
17/10/31 23:05:49 INFO amlauncher.AMLauncher: Error launching appattempt_1509490231144_0628_02. Got exception: org.apache.hadoop.security.token.SecretManager$InvalidToken: Invalid container token used for starting container on : container-5.docker:35151
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.verifyAndGetContainerTokenIdentifier(ContainerManagerImpl.java:974)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.startContainers(ContainerManagerImpl.java:789)
	at org.apache.hadoop.yarn.api.impl.pb.service.ContainerManagementProtocolPBServiceImpl.startContainers(ContainerManagementProtocolPBServiceImpl.java:70)
	at org.apache.hadoop.yarn.proto.ContainerManagementProtocol$ContainerManagementProtocolService$2.callBlockingMethod(ContainerManagementProtocol.java:127)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:447)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:845)
	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:788)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1807)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2455)
	at sun.reflect.GeneratedConstructorAccessor70.newInstance(Unknown Source)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
	at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateIOException(RPCUtil.java:80)
	at org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:119)
	at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:131)
	at sun.reflect.GeneratedMethodAccessor85.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
	at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
	at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
	at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
	at com.sun.proxy.$Proxy89.startContainers(Unknown Source)
	at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:123)
	at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:304)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): Invalid container token used for starting container on : container-5.docker:35151
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.verifyAndGetContainerTokenIdentifier(ContainerManagerImpl.java:974)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.startContainers(ContainerManagerImpl.java:789)
	at org.apache.hadoop.yarn.api.impl.pb.service.ContainerManagementProtocolPBServiceImpl.startContainers(ContainerManagementProtocolPBServiceImpl.java:70) at
[jira] [Updated] (HDFS-12708) Fix hdfs haadmin usage
[ https://issues.apache.org/jira/browse/HDFS-12708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] fang zhenyi updated HDFS-12708: --- Attachment: (was: HDFS-15004.001.patch) > Fix hdfs haadmin usage > --- > > Key: HDFS-12708 > URL: https://issues.apache.org/jira/browse/HDFS-12708 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 3.0.0-alpha4 >Reporter: fang zhenyi >Assignee: fang zhenyi >Priority: Minor > Fix For: 3.1.0 > > Attachments: HDFS-12708.001.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-12708) Fix hdfs haadmin usage
[ https://issues.apache.org/jira/browse/HDFS-12708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] fang zhenyi updated HDFS-12708: --- Attachment: HDFS-15004.001.patch
[jira] [Updated] (HDFS-12711) deadly hdfs test
[ https://issues.apache.org/jira/browse/HDFS-12711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-12711: Attachment: fakepatch.branch-2.txt > deadly hdfs test > > > Key: HDFS-12711 > URL: https://issues.apache.org/jira/browse/HDFS-12711 > Project: Hadoop HDFS > Issue Type: Test >Affects Versions: 2.9.0, 2.8.2 >Reporter: Allen Wittenauer >Priority: Critical > Attachments: HDFS-12711.branch-2.00.patch, fakepatch.branch-2.txt > >
[jira] [Commented] (HDFS-10323) transient deleteOnExit failure in ViewFileSystem due to close() ordering
[ https://issues.apache.org/jira/browse/HDFS-10323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16234060#comment-16234060 ] Hadoop QA commented on HDFS-10323: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 16s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 58s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 18m 25s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 58s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 43s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 26s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 30s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 53s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 47s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 12m 13s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 12m 13s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 39s{color} | {color:orange} hadoop-common-project/hadoop-common: The patch generated 4 new + 76 unchanged - 0 fixed = 80 total (was 76) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 6s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 1s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 51s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 55s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 8m 56s{color} | {color:green} hadoop-common in the patch passed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 33s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 91m 1s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 | | JIRA Issue | HDFS-10323 | | GITHUB PR | https://github.com/apache/hadoop/pull/287 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux ee35e48afebf 3.13.0-119-generic #166-Ubuntu SMP Wed May 3 12:18:55 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 56b88b0 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_131 | | findbugs | v3.1.0-RC1 | | checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/21913/artifact/out/diff-checkstyle-hadoop-common-project_hadoop-common.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/21913/testReport/ | | modules | C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/21913/console | | Powered by | Apache
[jira] [Updated] (HDFS-11807) libhdfs++: Get minidfscluster tests running under valgrind
[ https://issues.apache.org/jira/browse/HDFS-11807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anatoli Shein updated HDFS-11807: - Attachment: HDFS-11807.HDFS-8707.004.patch Whitespace fix > libhdfs++: Get minidfscluster tests running under valgrind > -- > > Key: HDFS-11807 > URL: https://issues.apache.org/jira/browse/HDFS-11807 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: James Clampffer >Assignee: Anatoli Shein >Priority: Major > Attachments: HDFS-11807.HDFS-8707.000.patch, > HDFS-11807.HDFS-8707.001.patch, HDFS-11807.HDFS-8707.002.patch, > HDFS-11807.HDFS-8707.003.patch, HDFS-11807.HDFS-8707.004.patch > > > The gmock based unit tests generally don't expose race conditions and memory > stomps. A good way to expose these is running libhdfs++ stress tests and > tools under valgrind and pointing them at a real cluster. Right now the CI > tools don't do that so bugs occasionally slip in and aren't caught until they > cause trouble in applications that use libhdfs++ for HDFS access. > The reason the minidfscluster tests don't run under valgrind is because the > GC and JIT compiler in the embedded JVM do things that look like errors to > valgrind. I'd like to have these tests do some basic setup and then fork > into two processes: one for the minidfscluster stuff and one for the > libhdfs++ client test. A small amount of shared memory can be used to > provide a place for the minidfscluster to stick the hdfsBuilder object that > the client needs to get info about which port to connect to. Can also stick > a condition variable there to let the minidfscluster know when it can shut > down.
[jira] [Commented] (HDFS-12443) Ozone: Improve SCM block deletion throttling algorithm
[ https://issues.apache.org/jira/browse/HDFS-12443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16233972#comment-16233972 ] Weiwei Yang commented on HDFS-12443: Hi [~linyiqun] bq. add datanode info into delLog This might not be a good option. A fixed containerName-to-datanode mapping is not flexible, because a container might be replicated to another node if the original DN is lost. bq. Scan the entire delLog from the beginning to end, getting blocks list info for each node. If one node reach maximum container number, then its record will be skipped. I think this approach is the best for now. How do you plan to define the max number of containers for each node? Actually I am fine with a fixed number, e.g. 50, to simplify the problem. bq. If not, keep scanning log until it reach the maximum value. Yes, good idea. I think we need an in-memory data structure to handle this. It maintains a map whose key is the datanodeID and whose value is a list of {{DeletedBlocksTransaction}}, e.g. DatanodeBlockDeletionTransactions. Each datanodeID is bounded by a max size for the length of its {{DeletedBlocksTransaction}} list, and it behaves like this: # a KV entry is full once its value reaches the max length; adding more elements for that datanodeID will be skipped # the map is full only when all KV entries are full # each value has no duplicate elements (distinguished by TXID), and we ensure the delLog is scanned at most once each time. I suggest writing a separate test case for such a structure, to ensure the behavior is well tested; then the implementation in SCMBlockDeletingService will be straightforward. Thanks for driving this forward, much appreciated. 
> Ozone: Improve SCM block deletion throttling algorithm > --- > > Key: HDFS-12443 > URL: https://issues.apache.org/jira/browse/HDFS-12443 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone, scm >Reporter: Weiwei Yang >Assignee: Yiqun Lin >Priority: Major > Labels: OzonePostMerge > Attachments: HDFS-12443-HDFS-7240.001.patch, > HDFS-12443-HDFS-7240.002.patch, HDFS-12443-HDFS-7240.002.patch, > HDFS-12443-SCM-blockdeletion-throttle.pdf > > > Currently SCM scans delLog to send deletion transactions to datanode > periodically, the throttling algorithm is simple, it scans at most > {{BLOCK_DELETE_TX_PER_REQUEST_LIMIT}} (by default 50) at a time. This is > non-optimal, worst case it might cache 50 TXs for 50 different DNs so each DN > will only get 1 TX to proceed in an interval, this will make the deletion > slow. An improvement to this is to make this throttling by datanode, e.g 50 > TXs per datanode per interval.
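The bounded per-datanode buffer described in the comment above can be sketched as follows. This is a minimal illustration, not the actual Ozone/SCM code; the class name DatanodeBlockDeletionTransactions comes from the comment, but every signature here is hypothetical:

```java
import java.util.*;

// Hypothetical sketch of the per-datanode transaction buffer discussed above.
// Each datanode's TXID set is capped at maxTxPerNode; the map is "full" only
// when every datanode has reached its cap, which signals the delLog scan to stop.
public class DatanodeBlockDeletionTransactions {
    private final Map<UUID, Set<Long>> transactions = new HashMap<>(); // dnId -> TXIDs
    private final Set<UUID> datanodes;
    private final int maxTxPerNode;

    public DatanodeBlockDeletionTransactions(Set<UUID> datanodes, int maxTxPerNode) {
        this.datanodes = datanodes;
        this.maxTxPerNode = maxTxPerNode;
    }

    /** Adds a TXID for a datanode; the add is skipped if that node's quota is
     *  reached or the TXID is already present (de-duplicated by TXID). */
    public boolean add(UUID dnId, long txId) {
        Set<Long> txs = transactions.computeIfAbsent(dnId, k -> new LinkedHashSet<>());
        if (txs.size() >= maxTxPerNode) {
            return false; // this KV entry is full, skip further TXs for this DN
        }
        return txs.add(txId); // false if duplicate TXID
    }

    /** True only when all datanodes have reached their quota. */
    public boolean isFull() {
        for (UUID dn : datanodes) {
            Set<Long> txs = transactions.get(dn);
            if (txs == null || txs.size() < maxTxPerNode) {
                return false;
            }
        }
        return true;
    }

    public Set<Long> getTransactions(UUID dnId) {
        return transactions.getOrDefault(dnId, Collections.emptySet());
    }
}
```

A scanner would then read the delLog once, call add() per transaction, and stop as soon as isFull() returns true, giving each DN up to maxTxPerNode TXs per interval instead of one.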
[jira] [Comment Edited] (HDFS-10323) transient deleteOnExit failure in ViewFileSystem due to close() ordering
[ https://issues.apache.org/jira/browse/HDFS-10323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16233962#comment-16233962 ] Wenxin He edited comment on HDFS-10323 at 11/1/17 11:51 AM: I hit this problem too when using Spark, and the undeleted files left the HDFS cluster with no space. So, following [~bpodgursky]'s suggestion and [~cmccabe]'s comment bq. 2) FileSystem.Cache.closeAll could first close all ViewFileSystems, then all other FileSystems. I submitted the 001 patch to fix the problem. In this patch: # FileSystem.Cache.map is changed to a {color:red}LinkedHashMap{color}, in which filesystems are stored in {color:red}insertion order{color}. When a ViewFileSystem is initialized, the DistributedFileSystem is stored in FileSystem.Cache.map first, then the ViewFileSystem. # When FileSystem.Cache.closeAll is invoked, all cached filesystems are {color:red}closed in reverse order{color}, like a LIFO model. So a ViewFileSystem closes before the DistributedFileSystems it refers to, and all deleteOnExit files are deleted safely before the DistributedFileSystems close. 
> transient deleteOnExit failure in ViewFileSystem due to close() ordering > > > Key: HDFS-10323 > URL: https://issues.apache.org/jira/browse/HDFS-10323 > Project: Hadoop HDFS > Issue Type: Bug > Components: federation >Affects Versions: 2.6.0, 2.7.4, 3.0.0-beta1 >Reporter: Ben Podgursky >Assignee: Wenxin He >Priority: Major > Attachments: HDFS-10323.001.patch > > > After switching to using a ViewFileSystem, fs.deleteOnExit calls began > failing frequently, displaying this error on failure: > 16/04/21 13:56:24 INFO fs.FileSystem: Ignoring failure to deleteOnExit for > path /tmp/delete_on_exit_test_123/a438afc0-a3ca-44f1-9eb5-010ca4a62d84 > Since FileSystem eats the error involved, it is difficult to be sure what the > error is, but I believe what is happening is that the ViewFileSystem’s child > FileSystems are being close()’d before the ViewFileSystem, due to the random > order ClientFinalizer closes FileSystems; so then when the ViewFileSystem > tries to close(), it tries to forward the delete() calls to the appropriate > child, and fails because the child is already closed. > I’m unsure how to write an actual Hadoop test to reproduce this, since it > involves testing behavior on actual JVM shutdown. However, I can verify that > while > {code:java} > fs.deleteOnExit(randomTemporaryDir); > {code} > regularly (~50% of the time) fails to delete the temporary directory, this > code: > {code:java} > ViewFileSystem viewfs = (ViewFileSystem)fs1; > for (FileSystem fileSystem : viewfs.getChildFileSystems()) { > if (fileSystem.exists(randomTemporaryDir)) { > fileSystem.deleteOnExit(randomTemporaryDir); > } > } > {code} > always successfully deletes the temporary directory on JVM shutdown. > I am not very familiar with FileSystem inheritance hierarchies, but at first > glance I see two ways to fix this behavior: > 1) ViewFileSystem could forward deleteOnExit calls to the appropriate child > FileSystem, and not hold onto that path itself. 
> 2) FileSystem.Cache.closeAll could first close all ViewFileSystems, then all > other FileSystems. > Would appreciate any thoughts of whether this seems accurate, and thoughts > (or help) on the fix.
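The close-ordering idea in the patch discussed above can be illustrated standalone. This is a simplified model, not the actual Hadoop FileSystem.Cache code; Closeable stands in for FileSystem, and the class and method names are hypothetical:

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.*;

// Simplified model of the proposed fix: an insertion-ordered cache whose
// closeAll() walks entries in reverse, so a ViewFileSystem (cached last,
// after its child filesystems) is closed first.
public class OrderedCloseCache {
    // LinkedHashMap preserves insertion order, unlike a plain HashMap.
    private final Map<String, Closeable> map = new LinkedHashMap<>();

    public void put(String key, Closeable fs) {
        map.put(key, fs);
    }

    /** Closes cached entries in reverse insertion order (LIFO) and
     *  returns the keys in the order they were closed. */
    public List<String> closeAll() {
        List<Map.Entry<String, Closeable>> entries = new ArrayList<>(map.entrySet());
        List<String> closedOrder = new ArrayList<>();
        for (int i = entries.size() - 1; i >= 0; i--) {
            try {
                entries.get(i).getValue().close();
            } catch (IOException e) {
                // keep closing the remaining filesystems
            }
            closedOrder.add(entries.get(i).getKey());
        }
        map.clear();
        return closedOrder;
    }
}
```

With this ordering, a ViewFileSystem gets a chance to run its deleteOnExit paths while the child DistributedFileSystems it forwards to are still open.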
[jira] [Updated] (HDFS-10323) transient deleteOnExit failure in ViewFileSystem due to close() ordering
[ https://issues.apache.org/jira/browse/HDFS-10323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenxin He updated HDFS-10323: - Attachment: HDFS-10323.001.patch
[jira] [Updated] (HDFS-10323) transient deleteOnExit failure in ViewFileSystem due to close() ordering
[ https://issues.apache.org/jira/browse/HDFS-10323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenxin He updated HDFS-10323: - Affects Version/s: 2.7.4 3.0.0-beta1 Status: Patch Available (was: Open)
[jira] [Commented] (HDFS-12682) ECAdmin -listPolicies will always show SystemErasureCodingPolicies state as DISABLED
[ https://issues.apache.org/jira/browse/HDFS-12682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16233952#comment-16233952 ] Hadoop QA commented on HDFS-12682: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 39s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 6 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 51s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 14s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 13m 23s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 2m 5s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 22s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 15m 28s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 36s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 42s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 16s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 4s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 12m 24s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 12m 24s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 2m 16s{color} | {color:green} root: The patch generated 0 new + 647 unchanged - 2 fixed = 647 total (was 649) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 37s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 1s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 39s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 41s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 42s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 20s{color} | {color:green} hadoop-hdfs-client in the patch passed. 
{color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red}123m 30s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red}120m 20s{color} | {color:red} hadoop-mapreduce-client-jobclient in the patch failed. {color} | | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 43s{color} | {color:red} The patch generated 1 ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}338m 25s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting | | | hadoop.hdfs.TestEncryptionZones | | | hadoop.hdfs.server.blockmanagement.TestReconstructStripedBlocksWithRackAwareness | | | hadoop.hdfs.server.datanode.TestDataNodeUUID | | Timed out junit tests | org.apache.hadoop.mapred.pipes.TestPipeApplication | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 | | JIRA Issue | HDFS-12682 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12895124/HDFS-12682.07.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite
[jira] [Updated] (HDFS-12750) Ozone: Fix TestStorageContainerManager#testBlockDeletionTransactions
[ https://issues.apache.org/jira/browse/HDFS-12750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated HDFS-12750: --- Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: HDFS-7240 Status: Resolved (was: Patch Available) > Ozone: Fix TestStorageContainerManager#testBlockDeletionTransactions > > > Key: HDFS-12750 > URL: https://issues.apache.org/jira/browse/HDFS-12750 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: HDFS-7240 >Reporter: Xiaoyu Yao >Assignee: Xiaoyu Yao >Priority: Major > Fix For: HDFS-7240 > > Attachments: HDFS-12750-HDFS-7240.001.patch > > > Some of the newly added ozone tests need to shutdown the MiniOzoneCluster so > that the metadata db and test files are cleaned up for subsequent tests. > TestStorageContainerManager#testBlockDeletionTransactions
[jira] [Commented] (HDFS-12750) Ozone: Fix TestStorageContainerManager#testBlockDeletionTransactions
[ https://issues.apache.org/jira/browse/HDFS-12750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16233838#comment-16233838 ] Weiwei Yang commented on HDFS-12750: Oops ... I just realized the patch did not trigger a Jenkins job while I was committing it. The patch no longer applies now that I have committed it, which is why it reported the error just now. I verified it on my local env before (and after) committing, so the patch was OK. I don't think we should revert the patch and do it all over again, so I'm closing this now. But feel free to revert and reopen if you disagree. Apologies again.
[jira] [Commented] (HDFS-12744) More logs when short-circuit read is failed and disabled
[ https://issues.apache.org/jira/browse/HDFS-12744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16233837#comment-16233837 ] Hudson commented on HDFS-12744: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13174 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/13174/]) HDFS-12744. More logs when short-circuit read is failed and disabled. (wwei: rev 56b88b06705441f6f171eec7fb2fa77946ca204b) * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiver.java * (edit) hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/client/impl/BlockReaderFactory.java > More logs when short-circuit read is failed and disabled > > > Key: HDFS-12744 > URL: https://issues.apache.org/jira/browse/HDFS-12744 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Reporter: Weiwei Yang >Assignee: Weiwei Yang >Priority: Major > Labels: supportability > Fix For: 2.9.0, 3.0.0 > > Attachments: HDFS-12744.001.patch, HDFS-12744.002.patch > > > Short-circuit read (SCR) failed with following error > {noformat} > 2017-10-21 16:42:28,024 WARN > [B.defaultRpcServer.handler=7,queue=7,port=16020] > impl.BlockReaderFactory: BlockReaderFactory(xxx): unknown response code ERROR > while attempting to set up short-circuit access. Block xxx is not valid > {noformat} > then short-circuit read is disabled for *10 minutes* without any warning > message given in the log. This causes us spent some more time to figure out > why we had a long time window that SCR was not working. Propose to add a > warning log (other places already did) to indicate SCR is disabled and some > more logging in DN to display what happened. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10323) transient deleteOnExit failure in ViewFileSystem due to close() ordering
[ https://issues.apache.org/jira/browse/HDFS-10323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16233836#comment-16233836 ] ASF GitHub Bot commented on HDFS-10323: --- GitHub user wenxinhe opened a pull request: https://github.com/apache/hadoop/pull/287 HDFS-10323. transient deleteOnExit failure in ViewFileSystem due to close() ordering You can merge this pull request into a Git repository by running: $ git pull https://github.com/wenxinhe/hadoop trunk Alternatively you can review and apply these changes as the patch at: https://github.com/apache/hadoop/pull/287.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #287 commit a8b39e070b09005b2781ee46a9b2f3a09c04246e Author: wenxinheDate: 2017-11-01T09:05:16Z HDFS-10323. transient deleteOnExit failure in ViewFileSystem due to close() ordering > transient deleteOnExit failure in ViewFileSystem due to close() ordering > > > Key: HDFS-10323 > URL: https://issues.apache.org/jira/browse/HDFS-10323 > Project: Hadoop HDFS > Issue Type: Bug > Components: federation >Affects Versions: 2.6.0 >Reporter: Ben Podgursky >Assignee: Wenxin He >Priority: Major > > After switching to using a ViewFileSystem, fs.deleteOnExit calls began > failing frequently, displaying this error on failure: > 16/04/21 13:56:24 INFO fs.FileSystem: Ignoring failure to deleteOnExit for > path /tmp/delete_on_exit_test_123/a438afc0-a3ca-44f1-9eb5-010ca4a62d84 > Since FileSystem eats the error involved, it is difficult to be sure what the > error is, but I believe what is happening is that the ViewFileSystem’s child > FileSystems are being close()’d before the ViewFileSystem, due to the random > order ClientFinalizer closes FileSystems; so then when the ViewFileSystem > tries to close(), it tries to forward the delete() calls to the appropriate > child, and fails because the child is already closed. 
> I’m unsure how to write an actual Hadoop test to reproduce this, since it > involves testing behavior on actual JVM shutdown. However, I can verify that > while > {code:java} > fs.deleteOnExit(randomTemporaryDir); > {code} > regularly (~50% of the time) fails to delete the temporary directory, this > code: > {code:java} > ViewFileSystem viewfs = (ViewFileSystem)fs1; > for (FileSystem fileSystem : viewfs.getChildFileSystems()) { > if (fileSystem.exists(randomTemporaryDir)) { > fileSystem.deleteOnExit(randomTemporaryDir); > } > } > {code} > always successfully deletes the temporary directory on JVM shutdown. > I am not very familiar with FileSystem inheritance hierarchies, but at first > glance I see two ways to fix this behavior: > 1) ViewFileSystem could forward deleteOnExit calls to the appropriate child > FileSystem, and not hold onto that path itself. > 2) FileSystem.Cache.closeAll could first close all ViewFileSystems, then all > other FileSystems. > Would appreciate any thoughts of whether this seems accurate, and thoughts > (or help) on the fix. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
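Fix option 1 above — the view forwarding deleteOnExit to the child that owns the path, instead of holding the path itself — can be sketched in isolation. ViewFs and ChildFs below are hypothetical stand-ins, not the real ViewFileSystem classes; mount resolution is simplified to a prefix match:

```java
import java.util.*;

// If the registration lives on the owning child, the close() ordering of
// ClientFinalizer no longer matters: each child deletes its own paths.
public class ForwardDemo {
    static class ChildFs {
        final String mount;
        final Set<String> deleteOnExit = new HashSet<>(); // paths this child will delete
        ChildFs(String mount) { this.mount = mount; }
        boolean owns(String path) { return path.startsWith(mount); }
    }

    static class ViewFs {
        final List<ChildFs> children;
        ViewFs(List<ChildFs> children) { this.children = children; }

        // Forward to the owning child instead of remembering the path locally.
        void deleteOnExit(String path) {
            for (ChildFs c : children) {
                if (c.owns(path)) { c.deleteOnExit.add(path); return; }
            }
            throw new IllegalArgumentException("no mount point for " + path);
        }
    }

    public static void main(String[] args) {
        ChildFs tmp = new ChildFs("/tmp");
        ChildFs data = new ChildFs("/data");
        ViewFs view = new ViewFs(Arrays.asList(tmp, data));
        view.deleteOnExit("/tmp/delete_on_exit_test_123");
        System.out.println(tmp.deleteOnExit); // registered on the owning child only
    }
}
```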
[jira] [Commented] (HDFS-12750) Ozone: Fix TestStorageContainerManager#testBlockDeletionTransactions
[ https://issues.apache.org/jira/browse/HDFS-12750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16233831#comment-16233831 ] Hadoop QA commented on HDFS-12750: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 5s{color} | {color:red} HDFS-12750 does not apply to HDFS-7240. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | HDFS-12750 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12895081/HDFS-12750-HDFS-7240.001.patch | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/21912/console | | Powered by | Apache Yetus 0.7.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Ozone: Fix TestStorageContainerManager#testBlockDeletionTransactions > > > Key: HDFS-12750 > URL: https://issues.apache.org/jira/browse/HDFS-12750 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: HDFS-7240 >Reporter: Xiaoyu Yao >Assignee: Xiaoyu Yao >Priority: Major > Attachments: HDFS-12750-HDFS-7240.001.patch > > > Some of the newly added ozone tests need to shutdown the MiniOzoneCluster so > that the metadata db and test files are cleaned up for subsequent tests. > TestStorageContainerManager#testBlockDeletionTransactions -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12750) Ozone: Fix TestStorageContainerManager#testBlockDeletionTransactions
[ https://issues.apache.org/jira/browse/HDFS-12750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16233817#comment-16233817 ] Weiwei Yang commented on HDFS-12750: +1, committing the patch now, thanks [~xyao] for fixing this. > Ozone: Fix TestStorageContainerManager#testBlockDeletionTransactions > > > Key: HDFS-12750 > URL: https://issues.apache.org/jira/browse/HDFS-12750 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: HDFS-7240 >Reporter: Xiaoyu Yao >Assignee: Xiaoyu Yao >Priority: Major > Attachments: HDFS-12750-HDFS-7240.001.patch > > > Some of the newly added ozone tests need to shutdown the MiniOzoneCluster so > that the metadata db and test files are cleaned up for subsequent tests. > TestStorageContainerManager#testBlockDeletionTransactions
[jira] [Commented] (HDFS-11902) [READ] Merge BlockFormatProvider and FileRegionProvider.
[ https://issues.apache.org/jira/browse/HDFS-11902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16233812#comment-16233812 ] Hadoop QA commented on HDFS-11902: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 4m 0s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 5 new or modified test files. {color} | || || || || {color:brown} HDFS-9806 Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 5m 36s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 6s{color} | {color:green} HDFS-9806 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 13m 22s{color} | {color:green} HDFS-9806 passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 2m 14s{color} | {color:green} HDFS-9806 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 34s{color} | {color:green} HDFS-9806 passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 41s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 34s{color} | {color:red} hadoop-tools/hadoop-fs2img in HDFS-9806 has 1 extant Findbugs warnings. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 19s{color} | {color:green} HDFS-9806 passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 16s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 8s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 12m 17s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 12m 17s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 2m 7s{color} | {color:orange} root: The patch generated 9 new + 448 unchanged - 11 fixed = 457 total (was 459) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 38s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 2s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 9m 9s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 30s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 53s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 87m 13s{color} | {color:red} hadoop-hdfs in the patch failed. 
{color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 43s{color} | {color:green} hadoop-fs2img in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 58s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}182m 34s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistReplicaRecovery | | | hadoop.hdfs.server.blockmanagement.TestReplicationPolicy | | | hadoop.hdfs.server.namenode.TestStartup | | | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure140 | | | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure210 | | | hadoop.hdfs.server.namenode.TestNameNodeMetadataConsistency | | | hadoop.hdfs.TestSetrepDecreasing | | Timed out junit tests | org.apache.hadoop.hdfs.TestReadStripedFileWithDecodingCorruptData | | |
[jira] [Updated] (HDFS-12744) More logs when short-circuit read is failed and disabled
[ https://issues.apache.org/jira/browse/HDFS-12744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated HDFS-12744: --- Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 3.0.0 2.9.0 Status: Resolved (was: Patch Available) Committed to trunk, branch-2 and branch-3.0. Thanks [~jzhuge] for the review. > More logs when short-circuit read is failed and disabled > > > Key: HDFS-12744 > URL: https://issues.apache.org/jira/browse/HDFS-12744 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Reporter: Weiwei Yang >Assignee: Weiwei Yang >Priority: Major > Labels: supportability > Fix For: 2.9.0, 3.0.0 > > Attachments: HDFS-12744.001.patch, HDFS-12744.002.patch > > > Short-circuit read (SCR) failed with following error > {noformat} > 2017-10-21 16:42:28,024 WARN > [B.defaultRpcServer.handler=7,queue=7,port=16020] > impl.BlockReaderFactory: BlockReaderFactory(xxx): unknown response code ERROR > while attempting to set up short-circuit access. Block xxx is not valid > {noformat} > then short-circuit read is disabled for *10 minutes* without any warning > message given in the log. This causes us spent some more time to figure out > why we had a long time window that SCR was not working. Propose to add a > warning log (other places already did) to indicate SCR is disabled and some > more logging in DN to display what happened. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12725) BlockPlacementPolicyRackFaultTolerant still fails with racks with very few nodes
[ https://issues.apache.org/jira/browse/HDFS-12725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16233783#comment-16233783 ] Hadoop QA commented on HDFS-12725: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 54s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 28s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 49s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 36s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 2s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 25s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 57s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 58s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 58s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 49s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 49s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 36s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch generated 1 new + 4 unchanged - 1 fixed = 5 total (was 5) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 54s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 40s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 57s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 46s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red}110m 36s{color} | {color:red} hadoop-hdfs in the patch failed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 25s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}164m 38s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.TestReadStripedFileWithMissingBlocks | | | hadoop.hdfs.server.namenode.ha.TestPipelinesFailover | | | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 | | JIRA Issue | HDFS-12725 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12895133/HDFS-12725.04.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 871cf8bcf635 3.13.0-119-generic #166-Ubuntu SMP Wed May 3 12:18:55 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / b8c8b5b | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_131 | | findbugs | v3.1.0-RC1 | | checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/21908/artifact/out/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt | | unit | https://builds.apache.org/job/PreCommit-HDFS-Build/21908/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt | | Test Results |
[jira] [Commented] (HDFS-12748) NameNode memory leak when accessing webhdfs GETHOMEDIRECTORY
[ https://issues.apache.org/jira/browse/HDFS-12748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16233761#comment-16233761 ] Weiwei Yang commented on HDFS-12748: Thanks [~daryn], your comment makes sense to me. Just uploaded the v2 patch; it pulls some common methods out for re-use and removes the FileSystem call for GETHOMEDIRECTORY, please help to review, thanks. Note, GETTRASHROOT has the same issue, but it requires more refactoring (related to EC) to make it work consistently in webhdfs and HDFS; I think we need a separate JIRA to fix it. Please let me know if this makes sense, thanks. > NameNode memory leak when accessing webhdfs GETHOMEDIRECTORY > > > Key: HDFS-12748 > URL: https://issues.apache.org/jira/browse/HDFS-12748 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 2.8.2 >Reporter: Jiandan Yang >Assignee: Weiwei Yang >Priority: Major > Attachments: HDFS-12748.001.patch, HDFS-12748.002.patch > > > In our production environment, the standby NN often do fullgc, through mat we > found the largest object is FileSystem$Cache, which contains 7,844,890 > DistributedFileSystem. > By view hierarchy of method FileSystem.get() , I found only > NamenodeWebHdfsMethods#get call FileSystem.get(). I don't know why creating > different DistributedFileSystem every time instead of get a FileSystem from > cache. > {code:java} > case GETHOMEDIRECTORY: { > final String js = JsonUtil.toJsonString("Path", > FileSystem.get(conf != null ? conf : new Configuration()) > .getHomeDirectory().toUri().getPath()); > return Response.ok(js).type(MediaType.APPLICATION_JSON).build(); > } > {code} > When we close FileSystem when GETHOMEDIRECTORY, NN don't do fullgc. > {code:java} > case GETHOMEDIRECTORY: { > FileSystem fs = null; > try { > fs = FileSystem.get(conf != null ?
conf : new Configuration()); > final String js = JsonUtil.toJsonString("Path", > fs.getHomeDirectory().toUri().getPath()); > return Response.ok(js).type(MediaType.APPLICATION_JSON).build(); > } finally { > if (fs != null) { > fs.close(); > } > } > } > {code}
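The try/finally in the proposed fix is the standard ensure-close pattern; since FileSystem is Closeable, the same guarantee can also be expressed with try-with-resources. A self-contained sketch with a stand-in resource class (CachedResource is not a Hadoop type — the real patch closes an actual FileSystem handle):

```java
// try-with-resources calls close() even when the body throws, which is
// exactly what the finally block in the v2 patch achieves: the cached
// handle is released and the FileSystem$Cache cannot grow without bound.
public class CloseDemo {
    static class CachedResource implements AutoCloseable {
        static int openCount = 0;            // tracks leaked handles
        CachedResource() { openCount++; }
        String homeDirectory() { return "/user/demo"; }
        @Override public void close() { openCount--; }
    }

    public static void main(String[] args) {
        try (CachedResource fs = new CachedResource()) {
            System.out.println(fs.homeDirectory());        // /user/demo
        }
        System.out.println("open handles: " + CachedResource.openCount); // 0
    }
}
```

Note that closing a cached FileSystem also evicts it for other users of the cache, which is why the real fix avoids `FileSystem.get()` for this call path rather than simply closing the shared instance.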
[jira] [Updated] (HDFS-12748) NameNode memory leak when accessing webhdfs GETHOMEDIRECTORY
[ https://issues.apache.org/jira/browse/HDFS-12748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weiwei Yang updated HDFS-12748: --- Attachment: HDFS-12748.002.patch > NameNode memory leak when accessing webhdfs GETHOMEDIRECTORY > > > Key: HDFS-12748 > URL: https://issues.apache.org/jira/browse/HDFS-12748 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 2.8.2 >Reporter: Jiandan Yang >Assignee: Weiwei Yang >Priority: Major > Attachments: HDFS-12748.001.patch, HDFS-12748.002.patch > > > In our production environment, the standby NN often do fullgc, through mat we > found the largest object is FileSystem$Cache, which contains 7,844,890 > DistributedFileSystem. > By view hierarchy of method FileSystem.get() , I found only > NamenodeWebHdfsMethods#get call FileSystem.get(). I don't know why creating > different DistributedFileSystem every time instead of get a FileSystem from > cache. > {code:java} > case GETHOMEDIRECTORY: { > final String js = JsonUtil.toJsonString("Path", > FileSystem.get(conf != null ? conf : new Configuration()) > .getHomeDirectory().toUri().getPath()); > return Response.ok(js).type(MediaType.APPLICATION_JSON).build(); > } > {code} > When we close FileSystem when GETHOMEDIRECTORY, NN don't do fullgc. > {code:java} > case GETHOMEDIRECTORY: { > FileSystem fs = null; > try { > fs = FileSystem.get(conf != null ? conf : new Configuration()); > final String js = JsonUtil.toJsonString("Path", > fs.getHomeDirectory().toUri().getPath()); > return Response.ok(js).type(MediaType.APPLICATION_JSON).build(); > } finally { > if (fs != null) { > fs.close(); > } > } > } > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12622) Fix enumerate in HDFSErasureCoding.md
[ https://issues.apache.org/jira/browse/HDFS-12622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16233754#comment-16233754 ] Akira Ajisaka commented on HDFS-12622: -- Thanks! > Fix enumerate in HDFSErasureCoding.md > - > > Key: HDFS-12622 > URL: https://issues.apache.org/jira/browse/HDFS-12622 > Project: Hadoop HDFS > Issue Type: Bug > Components: documentation >Reporter: Akira Ajisaka >Assignee: Yiqun Lin >Priority: Minor > Labels: newbie > Fix For: 3.0.0 > > Attachments: HDFS-12622.001.patch, HDFS-12622.001.patch, Screen Shot > 2017-10-10 at 17.36.16.png, screenshot.png > > > {noformat} > HDFS native implementation of default RS codec leverages Intel ISA-L > library to improve the encoding and decoding calculation. To enable and use > Intel ISA-L, there are three steps. > 1. Build ISA-L library. Please refer to the official site > "https://github.com/01org/isa-l/" for detail information. > 2. Build Hadoop with ISA-L support. Please refer to "Intel ISA-L build > options" section in "Build instructions for Hadoop" in (BUILDING.txt) in the > source code. > 3. Use `-Dbundle.isal` to copy the contents of the `isal.lib` directory > into the final tar file. Deploy Hadoop with the tar file. Make sure ISA-L is > available on HDFS clients and DataNodes. > {noformat} > Missing empty line before enumerate.
[jira] [Updated] (HDFS-12219) Javadoc for FSNamesystem#getMaxObjects is incorrect
[ https://issues.apache.org/jira/browse/HDFS-12219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira Ajisaka updated HDFS-12219: - Fix Version/s: (was: 3.1.0) 3.0.0 Cherry-picked to branch-3.0. Thanks! > Javadoc for FSNamesystem#getMaxObjects is incorrect > --- > > Key: HDFS-12219 > URL: https://issues.apache.org/jira/browse/HDFS-12219 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Erik Krogen >Assignee: Erik Krogen > Fix For: 3.0.0 > > Attachments: HDFS-12219.000.patch > > > The Javadoc states that this represents the total number of objects in the > system, but it really represents the maximum allowed number of objects (as > correctly stated on the Javadoc for {{FSNamesystemMBean#getMaxObjects()}}).
[jira] [Commented] (HDFS-12219) Javadoc for FSNamesystem#getMaxObjects is incorrect
[ https://issues.apache.org/jira/browse/HDFS-12219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16233730#comment-16233730 ] Hudson commented on HDFS-12219: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13173 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/13173/]) HDFS-12219. Javadoc for FSNamesystem#getMaxObjects is incorrect. (yqlin: rev 20304b91cc1513e3d82a01d36f4ee9c4c81b60e4) * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java > Javadoc for FSNamesystem#getMaxObjects is incorrect > --- > > Key: HDFS-12219 > URL: https://issues.apache.org/jira/browse/HDFS-12219 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Erik Krogen >Assignee: Erik Krogen > Fix For: 3.1.0 > > Attachments: HDFS-12219.000.patch > > > The Javadoc states that this represents the total number of objects in the > system, but it really represents the maximum allowed number of objects (as > correctly stated on the Javadoc for {{FSNamesystemMBean#getMaxObjects()}}).
[jira] [Assigned] (HDFS-10323) transient deleteOnExit failure in ViewFileSystem due to close() ordering
[ https://issues.apache.org/jira/browse/HDFS-10323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenxin He reassigned HDFS-10323: Assignee: Wenxin He > transient deleteOnExit failure in ViewFileSystem due to close() ordering > > > Key: HDFS-10323 > URL: https://issues.apache.org/jira/browse/HDFS-10323 > Project: Hadoop HDFS > Issue Type: Bug > Components: federation >Affects Versions: 2.6.0 >Reporter: Ben Podgursky >Assignee: Wenxin He >Priority: Major > > After switching to using a ViewFileSystem, fs.deleteOnExit calls began > failing frequently, displaying this error on failure: > 16/04/21 13:56:24 INFO fs.FileSystem: Ignoring failure to deleteOnExit for > path /tmp/delete_on_exit_test_123/a438afc0-a3ca-44f1-9eb5-010ca4a62d84 > Since FileSystem eats the error involved, it is difficult to be sure what the > error is, but I believe what is happening is that the ViewFileSystem’s child > FileSystems are being close()’d before the ViewFileSystem, due to the random > order ClientFinalizer closes FileSystems; so then when the ViewFileSystem > tries to close(), it tries to forward the delete() calls to the appropriate > child, and fails because the child is already closed. > I’m unsure how to write an actual Hadoop test to reproduce this, since it > involves testing behavior on actual JVM shutdown. However, I can verify that > while > {code:java} > fs.deleteOnExit(randomTemporaryDir); > {code} > regularly (~50% of the time) fails to delete the temporary directory, this > code: > {code:java} > ViewFileSystem viewfs = (ViewFileSystem)fs1; > for (FileSystem fileSystem : viewfs.getChildFileSystems()) { > if (fileSystem.exists(randomTemporaryDir)) { > fileSystem.deleteOnExit(randomTemporaryDir); > } > } > {code} > always successfully deletes the temporary directory on JVM shutdown. 
> I am not very familiar with FileSystem inheritance hierarchies, but at first > glance I see two ways to fix this behavior: > 1) ViewFileSystem could forward deleteOnExit calls to the appropriate child > FileSystem, and not hold onto that path itself. > 2) FileSystem.Cache.closeAll could first close all ViewFileSystems, then all > other FileSystems. > Would appreciate any thoughts on whether this seems accurate, and thoughts > (or help) on the fix.
[jira] [Updated] (HDFS-12219) Javadoc for FSNamesystem#getMaxObjects is incorrect
[ https://issues.apache.org/jira/browse/HDFS-12219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yiqun Lin updated HDFS-12219: - Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 3.1.0 Status: Resolved (was: Patch Available) Just committed this to trunk. Thanks [~xkrogen] for the contribution and thanks [~hanishakoneru], [~ajisakaa] for the review. > Javadoc for FSNamesystem#getMaxObjects is incorrect > --- > > Key: HDFS-12219 > URL: https://issues.apache.org/jira/browse/HDFS-12219 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Erik Krogen >Assignee: Erik Krogen > Fix For: 3.1.0 > > Attachments: HDFS-12219.000.patch > > > The Javadoc states that this represents the total number of objects in the > system, but it really represents the maximum allowed number of objects (as > correctly stated on the Javadoc for {{FSNamesystemMBean#getMaxObjects()}}).
[jira] [Commented] (HDFS-12219) Javadoc for FSNamesystem#getMaxObjects is incorrect
[ https://issues.apache.org/jira/browse/HDFS-12219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16233705#comment-16233705 ] Yiqun Lin commented on HDFS-12219: -- +1. I'd like to help commit this, :) > Javadoc for FSNamesystem#getMaxObjects is incorrect > --- > > Key: HDFS-12219 > URL: https://issues.apache.org/jira/browse/HDFS-12219 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Erik Krogen >Assignee: Erik Krogen > Attachments: HDFS-12219.000.patch > > > The Javadoc states that this represents the total number of objects in the > system, but it really represents the maximum allowed number of objects (as > correctly stated on the Javadoc for {{FSNamesystemMBean#getMaxObjects()}}).
[jira] [Commented] (HDFS-12682) ECAdmin -listPolicies will always show SystemErasureCodingPolicies state as DISABLED
[ https://issues.apache.org/jira/browse/HDFS-12682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16233703#comment-16233703 ]

Rakesh R commented on HDFS-12682:
---------------------------------

Good work, [~xiaochen]! Apart from the below comments, overall the patch looks good to me.
# Please make ErasureCodingPolicyInfo {{implements Serializable}}.
# Could you rename the {{DFSTestUtil#getPolicyState}} method to {{DFSTestUtil#getECPolicyState}}?
# It returns both system and user-defined policies, so please change the message to {{ErasureCodingPolicy <" + policy + "> doesn't exist in the policies:" + Arrays.toString(policyInfos)}}
{code}
DFSTestUtil#getPolicyState(policy)

throw new IllegalArgumentException("Policy <" + policy + "> is not in"
    + " system policies:" + Arrays.toString(policyInfos));
{code}
# Considering we make the ECP class {{InterfaceAudience.Private}}, can we also make ECPS {{@InterfaceAudience.Private}}?
{code}
@InterfaceAudience.Public
@InterfaceStability.Evolving
public enum ErasureCodingPolicyState {
{code}

> ECAdmin -listPolicies will always show SystemErasureCodingPolicies state as
> DISABLED
> ---------------------------------------------------------------------------
>
>                 Key: HDFS-12682
>                 URL: https://issues.apache.org/jira/browse/HDFS-12682
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: erasure-coding
>            Reporter: Xiao Chen
>            Assignee: Xiao Chen
>            Priority: Blocker
>              Labels: hdfs-ec-3.0-must-do
>         Attachments: HDFS-12682.01.patch, HDFS-12682.02.patch,
> HDFS-12682.03.patch, HDFS-12682.04.patch, HDFS-12682.05.patch,
> HDFS-12682.06.patch, HDFS-12682.07.patch
>
>
> On a real cluster, {{hdfs ec -listPolicies}} will always show the policy state as
> DISABLED.
> {noformat}
> [hdfs@nightly6x-1 root]$ hdfs ec -listPolicies
> Erasure Coding Policies:
> ErasureCodingPolicy=[Name=RS-10-4-1024k, Schema=[ECSchema=[Codec=rs,
> numDataUnits=10, numParityUnits=4]], CellSize=1048576, Id=5, State=DISABLED]
> ErasureCodingPolicy=[Name=RS-3-2-1024k, Schema=[ECSchema=[Codec=rs,
> numDataUnits=3, numParityUnits=2]], CellSize=1048576, Id=2, State=DISABLED]
> ErasureCodingPolicy=[Name=RS-6-3-1024k, Schema=[ECSchema=[Codec=rs,
> numDataUnits=6, numParityUnits=3]], CellSize=1048576, Id=1, State=DISABLED]
> ErasureCodingPolicy=[Name=RS-LEGACY-6-3-1024k,
> Schema=[ECSchema=[Codec=rs-legacy, numDataUnits=6, numParityUnits=3]],
> CellSize=1048576, Id=3, State=DISABLED]
> ErasureCodingPolicy=[Name=XOR-2-1-1024k, Schema=[ECSchema=[Codec=xor,
> numDataUnits=2, numParityUnits=1]], CellSize=1048576, Id=4, State=DISABLED]
> [hdfs@nightly6x-1 root]$ hdfs ec -getPolicy -path /ecec
> XOR-2-1-1024k
> {noformat}
> This is because when [deserializing
> protobuf|https://github.com/apache/hadoop/blob/branch-3.0.0-beta1/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelperClient.java#L2942],
> the static instance of the [SystemErasureCodingPolicies
> class|https://github.com/apache/hadoop/blob/branch-3.0.0-beta1/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/SystemErasureCodingPolicies.java#L101]
> is checked first, and it always returns the cached policy objects, which are
> created by default with state=DISABLED.
> All the existing unit tests pass, because the static instance that the
> client (e.g. ECAdmin) reads in unit tests is updated by the NN. :)
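The caching pitfall described above can be sketched in miniature (the class and method names below are illustrative, not the actual HDFS code): a static cache of policy objects, each constructed with a default DISABLED state, is consulted during deserialization, so the state carried on the wire is silently discarded.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model of the HDFS-12682 bug: a static cache analogous to
// SystemErasureCodingPolicies masks the state sent over the wire.
public class CachedPolicyPitfall {
    enum State { DISABLED, ENABLED }

    static final class Policy {
        final int id;
        State state = State.DISABLED; // default at construction time
        Policy(int id) { this.id = id; }
    }

    // Static cache of "system" policies, populated once.
    private static final Map<Integer, Policy> CACHE = new HashMap<>();
    static { CACHE.put(1, new Policy(1)); }

    // Buggy deserialization: checks the cache first and returns the stale
    // cached object, ignoring the state that arrived over the wire.
    static Policy convertBuggy(int id, State wireState) {
        Policy cached = CACHE.get(id);
        if (cached != null) {
            return cached; // wireState silently dropped
        }
        Policy p = new Policy(id);
        p.state = wireState;
        return p;
    }

    public static void main(String[] args) {
        Policy p = convertBuggy(1, State.ENABLED);
        // The server said ENABLED, but the client still sees DISABLED.
        System.out.println(p.state);
    }
}
```

This also illustrates why the unit tests passed: when client and NN share one JVM, they share the same static cache, so the NN's in-place update is visible to the client.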
[jira] [Commented] (HDFS-12739) Add Support for SCM --init command
[ https://issues.apache.org/jira/browse/HDFS-12739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16233697#comment-16233697 ]

Yiqun Lin commented on HDFS-12739:
----------------------------------

Thanks for working on this, [~shashikant]. The following are some comments from me:
# The usage of {{GENCLUSTERID}} is missing in {{USAGE}}.
# The return value of {{StorageContainerManager#scmInit}} looks confusing: when SCM init succeeds, the method returns false, and when it fails, it returns true. Can we change this and let {{aborted = !scmInit(conf);}}?
# The following line only prints the cluster id with no other description; would you make a change?
{code}
private static boolean scmInit(OzoneConfiguration conf) throws IOException {
  ...
  if (state != StorageState.NORMAL) {
    try {
      scmStorage.createStorageDir();
      clusterId = StartupOption.INIT.getClusterId();
      if (clusterId == null || clusterId.isEmpty()) {
        // Generate a new cluster id
        clusterId = SCMStorage.newClusterID();
      }
      scmStorage.setClusterID(clusterId);
      scmStorage.writeProperties();
      System.out.println(clusterId);   <=
      return false;
    }
{code}
# Since we introduce new SCM commands here, we need to add some tests to verify the behaviour of these commands.
Thanks.

> Add Support for SCM --init command
> ----------------------------------
>
>                 Key: HDFS-12739
>                 URL: https://issues.apache.org/jira/browse/HDFS-12739
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: HDFS-7240
>    Affects Versions: HDFS-7240
>            Reporter: Shashikant Banerjee
>            Assignee: Shashikant Banerjee
>            Priority: Major
>         Attachments: HDFS-12739-HDFS-7240.001.patch,
> HDFS-12739-HDFS-7240.002.patch, HDFS-12739-HDFS-7240.003.patch
>
>
> SCM --init command will generate a cluster ID and persist it locally. The same
> cluster ID will be shared with KSM and the datanodes. If the cluster ID is
> already available in the locally available version file, it will just read
> the cluster ID.
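The inverted return value called out in the review above can be sketched as follows (names and structure are illustrative, not the actual SCM code): if {{scmInit}} returns true on success, the caller reads naturally as {{aborted = !scmInit(conf)}}, instead of success mapping to false.

```java
// Sketch of the suggested refactor: true == success, and the success path
// prints a description alongside the generated cluster id.
public class ScmInitSketch {
    static boolean scmInit(boolean storageAlreadyNormal) {
        if (!storageAlreadyNormal) {
            // Generate a new cluster id (illustrative format).
            String clusterId = "CID-" + java.util.UUID.randomUUID();
            System.out.println("SCM initialization succeeded. Cluster id: " + clusterId);
        }
        return true; // success, so the caller can negate it for "aborted"
    }

    public static void main(String[] args) {
        boolean aborted = !scmInit(false);
        System.out.println("aborted=" + aborted);
    }
}
```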
[jira] [Commented] (HDFS-12714) Hadoop 3 missing fix for HDFS-5169
[ https://issues.apache.org/jira/browse/HDFS-12714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16233687#comment-16233687 ]

Hudson commented on HDFS-12714:
-------------------------------

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13172 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/13172/])
HDFS-12714. Hadoop 3 missing fix for HDFS-5169. Contributed by Joe (jzhuge: rev b8c8b5bc274211b29be125e5463662795a363f84)
* (edit) hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/hdfs.c

> Hadoop 3 missing fix for HDFS-5169
> ----------------------------------
>
>                 Key: HDFS-12714
>                 URL: https://issues.apache.org/jira/browse/HDFS-12714
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: native
>    Affects Versions: 3.0.0-alpha1, 3.0.0-beta1, 3.0.0-alpha2, 3.0.0-alpha4,
> 3.0.0-alpha3
>            Reporter: Joe McDonnell
>            Assignee: Joe McDonnell
>            Priority: Major
>             Fix For: 3.0.0-beta1, 3.1.0
>
>         Attachments: HDFS-12714.001.patch
>
>
> HDFS-5169 is a fix for a null pointer dereference in translateZCRException.
> This line in hdfs.c:
> ret = printExceptionAndFree(env, jthr, PRINT_EXC_ALL, "hadoopZeroCopyRead:
> ZeroCopyCursor#read failed");
> should be:
> ret = printExceptionAndFree(env, exc, PRINT_EXC_ALL, "hadoopZeroCopyRead:
> ZeroCopyCursor#read failed");
> Plainly, translateZCRException should print the exception (exc) passed in to
> the function rather than the uninitialized local jthr.
> The fix for HDFS-5169 (part of HDFS-4949) exists on hadoop 2.* branches, but
> it is missing on hadoop 3 branches including trunk.
> Hadoop 2.8:
> https://github.com/apache/hadoop/blob/branch-2.8/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/hdfs.c#L2514
> Hadoop 3.0:
> https://github.com/apache/hadoop/blob/branch-3.0/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/hdfs.c#L2691
[jira] [Commented] (HDFS-12482) Provide a configuration to adjust the weight of EC recovery tasks to adjust the speed of recovery
[ https://issues.apache.org/jira/browse/HDFS-12482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16233686#comment-16233686 ]

Hudson commented on HDFS-12482:
-------------------------------

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13172 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/13172/])
HDFS-12482. Provide a configuration to adjust the weight of EC recovery (lei: rev 9367c25dbdfedf60cdbd65611281cf9c667829e6)
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestReconstructStripedFile.java
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/erasurecode/ErasureCodingWorker.java
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HDFSErasureCoding.md
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNodeFaultInjector.java
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/erasurecode/StripedBlockReconstructor.java

> Provide a configuration to adjust the weight of EC recovery tasks to adjust
> the speed of recovery
> ---------------------------------------------------------------------------
>
>                 Key: HDFS-12482
>                 URL: https://issues.apache.org/jira/browse/HDFS-12482
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: erasure-coding
>    Affects Versions: 3.0.0-alpha4
>            Reporter: Lei (Eddy) Xu
>            Assignee: Lei (Eddy) Xu
>            Priority: Minor
>              Labels: hdfs-ec-3.0-nice-to-have
>             Fix For: 3.0.0
>
>         Attachments: HDFS-12482.00.patch, HDFS-12482.01.patch,
> HDFS-12482.02.patch, HDFS-12482.03.patch, HDFS-12482.04.patch,
> HDFS-12482.05.patch
>
>
> The relative speed of EC recovery compared to 3x replica recovery is a
> function of (EC codec, number of sources, NIC speed, CPU speed, etc.).
> Currently EC recovery has a fixed {{xmitsInProgress}} of {{max(# of
> sources, # of targets)}}, compared to {{1}} for 3x replica recovery, and the NN
> uses {{xmitsInProgress}} to decide how many recovery tasks to schedule to the
> DataNode, thus we can add a coefficient for the user to tune the weight of EC
> recovery tasks.
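The weighting described above can be sketched as follows (the field name and the rounding rule are assumptions for illustration, not the exact HDFS-12482 implementation): EC reconstruction consumes {{max(#sources, #targets)}} transfer slots, and a user-tunable coefficient scales that weight up or down.

```java
// Illustrative sketch: compute the xmits cost of an EC reconstruction
// task, scaled by a tunable weight (HDFS exposes such a weight via an
// hdfs-site.xml key; the exact key and rounding here are assumed).
public class EcXmitsWeight {
    static double xmitWeight = 1.0; // assumed default

    static int ecReconstructionXmits(int numSources, int numTargets) {
        int base = Math.max(numSources, numTargets);
        // Round up so a nonzero weight always costs at least 1 xmit.
        return (int) Math.ceil(base * xmitWeight);
    }

    public static void main(String[] args) {
        // An RS-6-3 recovery reading from 6 sources to 1 target:
        System.out.println(ecReconstructionXmits(6, 1)); // weight 1.0 -> 6
        xmitWeight = 0.5; // operator de-prioritizes EC recovery
        System.out.println(ecReconstructionXmits(6, 1)); // weight 0.5 -> 3
    }
}
```

Lowering the weight lets more EC tasks fit under the DataNode's xmits limit (faster recovery, more load); raising it throttles EC recovery relative to replication.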
[jira] [Commented] (HDFS-5169) hdfs.c: translateZCRException: null pointer deref when translating some exceptions
[ https://issues.apache.org/jira/browse/HDFS-5169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16233688#comment-16233688 ]

Hudson commented on HDFS-5169:
------------------------------

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13172 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/13172/])
HDFS-12714. Hadoop 3 missing fix for HDFS-5169. Contributed by Joe (jzhuge: rev b8c8b5bc274211b29be125e5463662795a363f84)
* (edit) hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/hdfs.c

> hdfs.c: translateZCRException: null pointer deref when translating some
> exceptions
> -----------------------------------------------------------------------
>
>                 Key: HDFS-5169
>                 URL: https://issues.apache.org/jira/browse/HDFS-5169
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: libhdfs
>    Affects Versions: HDFS-4949
>            Reporter: Colin P. McCabe
>            Assignee: Colin P. McCabe
>            Priority: Minor
>             Fix For: HDFS-4949
>
>         Attachments: HDFS-5169-caching.001.patch
>
>
> hdfs.c: translateZCRException: there is a null pointer deref when translating
> some exceptions.
[jira] [Comment Edited] (HDFS-12714) Hadoop 3 missing fix for HDFS-5169
[ https://issues.apache.org/jira/browse/HDFS-12714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16233678#comment-16233678 ]

John Zhuge edited comment on HDFS-12714 at 11/1/17 6:03 AM:
------------------------------------------------------------

Committed to trunk and branch-3.0. Thanks [~joemcdonnell] for reporting and fixing the issue!

was (Author: jzhuge):
Committed to trunk and branch-3.0. Thanks [~joemcdonnell] for the contribution!

> Hadoop 3 missing fix for HDFS-5169
> ----------------------------------
>
>                 Key: HDFS-12714
>                 URL: https://issues.apache.org/jira/browse/HDFS-12714
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: native
>    Affects Versions: 3.0.0-alpha1, 3.0.0-beta1, 3.0.0-alpha2, 3.0.0-alpha4,
> 3.0.0-alpha3
>            Reporter: Joe McDonnell
>            Assignee: Joe McDonnell
>            Priority: Major
>             Fix For: 3.0.0-beta1, 3.1.0
>
>         Attachments: HDFS-12714.001.patch
>
>
> HDFS-5169 is a fix for a null pointer dereference in translateZCRException.
> This line in hdfs.c:
> ret = printExceptionAndFree(env, jthr, PRINT_EXC_ALL, "hadoopZeroCopyRead:
> ZeroCopyCursor#read failed");
> should be:
> ret = printExceptionAndFree(env, exc, PRINT_EXC_ALL, "hadoopZeroCopyRead:
> ZeroCopyCursor#read failed");
> Plainly, translateZCRException should print the exception (exc) passed in to
> the function rather than the uninitialized local jthr.
> The fix for HDFS-5169 (part of HDFS-4949) exists on hadoop 2.* branches, but
> it is missing on hadoop 3 branches including trunk.
> Hadoop 2.8:
> https://github.com/apache/hadoop/blob/branch-2.8/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/hdfs.c#L2514
> Hadoop 3.0:
> https://github.com/apache/hadoop/blob/branch-3.0/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/hdfs.c#L2691
[jira] [Resolved] (HDFS-12714) Hadoop 3 missing fix for HDFS-5169
[ https://issues.apache.org/jira/browse/HDFS-12714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Zhuge resolved HDFS-12714.
-------------------------------
       Resolution: Fixed
    Fix Version/s: 3.0.0-beta1
                   3.1.0

Committed to trunk and branch-3.0. Thanks [~joemcdonnell] for the contribution!

> Hadoop 3 missing fix for HDFS-5169
> ----------------------------------
>
>                 Key: HDFS-12714
>                 URL: https://issues.apache.org/jira/browse/HDFS-12714
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: native
>    Affects Versions: 3.0.0-alpha1, 3.0.0-beta1, 3.0.0-alpha2, 3.0.0-alpha4,
> 3.0.0-alpha3
>            Reporter: Joe McDonnell
>            Assignee: Joe McDonnell
>            Priority: Major
>             Fix For: 3.1.0, 3.0.0-beta1
>
>         Attachments: HDFS-12714.001.patch
>
>
> HDFS-5169 is a fix for a null pointer dereference in translateZCRException.
> This line in hdfs.c:
> ret = printExceptionAndFree(env, jthr, PRINT_EXC_ALL, "hadoopZeroCopyRead:
> ZeroCopyCursor#read failed");
> should be:
> ret = printExceptionAndFree(env, exc, PRINT_EXC_ALL, "hadoopZeroCopyRead:
> ZeroCopyCursor#read failed");
> Plainly, translateZCRException should print the exception (exc) passed in to
> the function rather than the uninitialized local jthr.
> The fix for HDFS-5169 (part of HDFS-4949) exists on hadoop 2.* branches, but
> it is missing on hadoop 3 branches including trunk.
> Hadoop 2.8:
> https://github.com/apache/hadoop/blob/branch-2.8/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/hdfs.c#L2514
> Hadoop 3.0:
> https://github.com/apache/hadoop/blob/branch-3.0/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/hdfs.c#L2691