[jira] [Assigned] (HDFS-7335) Redundant checkOperation() in FSN.analyzeFileState()
[ https://issues.apache.org/jira/browse/HDFS-7335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Milan Desai reassigned HDFS-7335: - Assignee: Milan Desai Redundant checkOperation() in FSN.analyzeFileState() Key: HDFS-7335 URL: https://issues.apache.org/jira/browse/HDFS-7335 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.5.1 Reporter: Konstantin Shvachko Assignee: Milan Desai Labels: newbie FSN.analyzeFileState() should not call checkOperation(): the operation is already properly checked before the call, first with the READ category and then with WRITE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
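For readers outside the NameNode code, the redundant-check pattern described above can be sketched roughly as follows. All names here are simplified stand-ins for illustration, not the actual FSNamesystem code:

```java
// Simplified, hypothetical sketch of the HDFS-7335 situation: the caller
// already invokes checkOperation() before each call to analyzeFileState(),
// so a second check inside the helper would be redundant.
public class OperationCheckSketch {
    enum OperationCategory { READ, WRITE }

    static int checks = 0;

    static void checkOperation(OperationCategory op) {
        checks++; // in the real NameNode this verifies the HA state permits op
    }

    static void analyzeFileState() {
        // Redundant checkOperation() call removed here: the caller has
        // already performed the check for the current category.
    }

    static void getAdditionalBlock() {
        checkOperation(OperationCategory.READ);  // first pass, read category
        analyzeFileState();
        checkOperation(OperationCategory.WRITE); // second pass, write category
        analyzeFileState();
    }

    public static void main(String[] args) {
        getAdditionalBlock();
        System.out.println(checks); // two checks total, not four
    }
}
```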
[jira] [Assigned] (HDFS-7338) Reed-Solomon codec library support
[ https://issues.apache.org/jira/browse/HDFS-7338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kai Zheng reassigned HDFS-7338: --- Assignee: Kai Zheng (was: Li Bo) Reed-Solomon codec library support -- Key: HDFS-7338 URL: https://issues.apache.org/jira/browse/HDFS-7338 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: HDFS-EC Reporter: Zhe Zhang Assignee: Kai Zheng -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7285) Erasure Coding Support inside HDFS
[ https://issues.apache.org/jira/browse/HDFS-7285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kai Zheng updated HDFS-7285: Fix Version/s: HDFS-EC Erasure Coding Support inside HDFS -- Key: HDFS-7285 URL: https://issues.apache.org/jira/browse/HDFS-7285 Project: Hadoop HDFS Issue Type: New Feature Reporter: Weihua Jiang Assignee: Zhe Zhang Fix For: HDFS-EC Attachments: HDFSErasureCodingDesign-20141028.pdf Erasure Coding (EC) can greatly reduce storage overhead without sacrificing data reliability, compared to the existing HDFS 3-replica approach. For example, with 10+4 Reed-Solomon coding we can tolerate the loss of 4 blocks, with a storage overhead of only 40%. This makes EC a quite attractive alternative for big data storage, particularly for cold data. Facebook had a related open source project called HDFS-RAID. It used to be one of the contrib packages in HDFS but was removed in Hadoop 2.0 for maintenance reasons. Its drawbacks were: 1) it sits on top of HDFS and depends on MapReduce to do encoding and decoding tasks; 2) it can only be used for cold files that will not be appended anymore; 3) the pure-Java EC coding implementation is extremely slow in practical use. For these reasons, it might not be a good idea to just bring HDFS-RAID back. We (Intel and Cloudera) are working on a design that builds EC into HDFS itself, with no external dependencies, so that it is self-contained and independently maintained. This design layers the EC feature on storage type support and aims to stay compatible with existing HDFS features like caching, snapshots, encryption, and high availability. The design will also support different EC coding schemes, implementations, and policies for different deployment scenarios. By utilizing advanced libraries (e.g. the Intel ISA-L library), an implementation can greatly improve EC encoding/decoding performance and make the EC solution even more attractive. We will post the design document soon.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
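The overhead figures quoted in the description can be verified with a bit of arithmetic. The snippet below is purely illustrative, not HDFS code: for a k+m Reed-Solomon scheme (k data blocks, m parity blocks) the extra storage is m/k of the raw data, while n-way replication costs n-1 extra copies:

```java
// Back-of-the-envelope check of the quoted overhead numbers:
// 10+4 Reed-Solomon -> 4/10 = 40% extra storage,
// 3-way replication -> 2 extra copies = 200% extra storage.
public class EcOverhead {
    // extra storage as a fraction of the raw data size for k data + m parity
    static double ecOverhead(int dataBlocks, int parityBlocks) {
        return (double) parityBlocks / dataBlocks;
    }

    // extra storage as a fraction of the raw data size for n-way replication
    static double replicationOverhead(int replicas) {
        return replicas - 1;
    }

    public static void main(String[] args) {
        System.out.println(ecOverhead(10, 4));      // prints 0.4
        System.out.println(replicationOverhead(3)); // prints 2.0
    }
}
```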
[jira] [Commented] (HDFS-7314) Aborted DFSClient's impact on long running service like YARN
[ https://issues.apache.org/jira/browse/HDFS-7314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14195968#comment-14195968 ] Hadoop QA commented on HDFS-7314: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12679168/HDFS-7314.patch against trunk revision 2bb327e. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8638//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8638//console This message is automatically generated. Aborted DFSClient's impact on long running service like YARN Key: HDFS-7314 URL: https://issues.apache.org/jira/browse/HDFS-7314 Project: Hadoop HDFS Issue Type: Improvement Reporter: Ming Ma Assignee: Ming Ma Attachments: HDFS-7314.patch It happened in the YARN NodeManager scenario, but it could happen to any long running service that uses a cached instance of DistributedFileSystem. 1. The active NN is under heavy load, so it became unavailable for 10 minutes; any DFSClient request will get ConnectTimeoutException. 2.
The YARN NodeManager uses DFSClient for certain write operations, such as the log aggregator or the shared cache in YARN-1492. The DFSClient used by the YARN NM's renewLease RPC got ConnectTimeoutException. {noformat} 2014-10-29 01:36:19,559 WARN org.apache.hadoop.hdfs.LeaseRenewer: Failed to renew lease for [DFSClient_NONMAPREDUCE_-550838118_1] for 372 seconds. Aborting ... {noformat} 3. After DFSClient is in the Aborted state, the YARN NM can't use that cached instance of DistributedFileSystem. {noformat} 2014-10-29 20:26:23,991 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Failed to download rsrc... java.io.IOException: Filesystem closed at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:727) at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1780) at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1124) at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1120) at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1120) at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:237) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:340) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:57) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) {noformat} We can make YARN or DFSClient more tolerant to temporary NN unavailability. Given that the call stack is YARN - DistributedFileSystem - DFSClient, this can be addressed at different layers.
* YARN closes the DistributedFileSystem object when it receives some well-defined exception. Then the next HDFS call will create a new instance of DistributedFileSystem. We would have to fix all the places in YARN, and other HDFS applications would need to address this as well.
* DistributedFileSystem detects the aborted DFSClient and creates a new instance of DFSClient. We would need to fix all the places where DistributedFileSystem calls DFSClient.
* After DFSClient gets into the Aborted state, it doesn't have to reject all requests; instead it can retry. If the NN is available again it can transition back to the healthy state.
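The second option above, detecting the aborted client and swapping in a fresh instance, could look roughly like the sketch below. FsClient and the factory are invented stand-ins for illustration, not the real DistributedFileSystem/DFSClient API:

```java
import java.io.IOException;
import java.util.function.Supplier;

// Hedged sketch: a wrapper that catches the well-defined "Filesystem closed"
// failure an aborted client produces, recreates the client, and retries once.
public class RecreateOnAbort {
    // stand-in for the underlying client (hypothetical, not DFSClient)
    interface FsClient {
        String getFileInfo(String path) throws IOException;
    }

    private FsClient client;
    private final Supplier<FsClient> factory;

    RecreateOnAbort(Supplier<FsClient> factory) {
        this.factory = factory;
        this.client = factory.get();
    }

    String getFileInfo(String path) throws IOException {
        try {
            return client.getFileInfo(path);
        } catch (IOException e) {
            if (!"Filesystem closed".equals(e.getMessage())) {
                throw e; // not the aborted-client case; propagate as before
            }
            client = factory.get(); // recreate the client and retry once
            return client.getFileInfo(path);
        }
    }
}
```

The retry is deliberately bounded to one attempt; unbounded retries would just move the hang from HDFS into the caller.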
[jira] [Commented] (HDFS-6803) Documenting DFSClient#DFSInputStream expectations reading and preading in concurrent context
[ https://issues.apache.org/jira/browse/HDFS-6803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14195980#comment-14195980 ] Steve Loughran commented on HDFS-6803: -- maybe we should say: MUST be consistent with serialized operations; SHOULD be concurrent. What we really want is for two parallel operations to always produce the right data; concurrency boosts throughput, but is not guaranteed. {code} read(pos1, dest, len) -> dest[0..len-1] = [data(FS, path, pos1), data(FS, path, pos1+1), ..., data(FS, path, pos1+len-1)] {code} and {{read(pos2, dest2, len2)}} does the same for pos2..pos2+len2-1. This defines the isolation; the SHOULD/MAY sets the policy. Documenting DFSClient#DFSInputStream expectations reading and preading in concurrent context Key: HDFS-6803 URL: https://issues.apache.org/jira/browse/HDFS-6803 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Affects Versions: 2.4.1 Reporter: stack Attachments: 9117.md.txt, DocumentingDFSClientDFSInputStream (1).pdf, DocumentingDFSClientDFSInputStream.v2.pdf Reviews of the patch posted on the parent task suggest that we be more explicit about how DFSIS is expected to behave when being read by contending threads. It is also suggested that presumptions made internally be made explicit by documenting expectations. Before we put up a patch we've made a document of assertions we'd like to make into tenets of DFSInputStream. If there is agreement, we'll attach to this issue a patch that weaves the assumptions into DFSIS as javadoc and class comments. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
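The isolation property being specified here is the one java.nio's positioned reads already provide: each read takes an explicit offset and does not disturb any shared cursor, so two of them can never see each other's position. A minimal illustration using the plain JDK (not the HDFS client):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Two positioned reads at pos1 and pos2 must each return exactly the bytes
// at their own offsets, regardless of interleaving. FileChannel.read(buf, pos)
// has that contract, which is the property pread is meant to document.
public class PreadContract {
    static byte[] pread(FileChannel ch, long pos, int len) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(len);
        while (buf.hasRemaining()) {
            // offset is passed explicitly; no shared file pointer is moved
            if (ch.read(buf, pos + buf.position()) < 0) break; // EOF
        }
        byte[] out = new byte[buf.position()];
        buf.flip();
        buf.get(out);
        return out;
    }

    public static void main(String[] args) throws IOException {
        Path p = Files.createTempFile("pread", ".bin");
        Files.write(p, "abcdefghij".getBytes());
        try (FileChannel ch = FileChannel.open(p, StandardOpenOption.READ)) {
            System.out.println(new String(pread(ch, 2, 3))); // prints cde
            System.out.println(new String(pread(ch, 6, 4))); // prints ghij
        }
        Files.delete(p);
    }
}
```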
[jira] [Commented] (HDFS-6698) try to optimize DFSInputStream.getFileLength()
[ https://issues.apache.org/jira/browse/HDFS-6698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14195981#comment-14195981 ] Steve Loughran commented on HDFS-6698: -- bq. it's just so easy to write incorrect code with volatile. yes, but it's very fast incorrect code ... try to optimize DFSInputStream.getFileLength() -- Key: HDFS-6698 URL: https://issues.apache.org/jira/browse/HDFS-6698 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Affects Versions: 3.0.0 Reporter: Liang Xie Assignee: Liang Xie Attachments: HDFS-6698.txt, HDFS-6698.txt, HDFS-6698v2.txt, HDFS-6698v2.txt, HDFS-6698v3.txt HBase prefers to invoke read() to serve scan requests, and invoke pread() to serve get requests, because pread() holds almost no locks. Let's imagine there's a read() running. Because the definition is: {code} public synchronized int read {code} no other read() request could run concurrently; this is known. But pread() also could not run, because: {code} public int read(long position, byte[] buffer, int offset, int length) throws IOException { // sanity checks dfsClient.checkOpen(); if (closed) { throw new IOException("Stream closed"); } failures = 0; long filelen = getFileLength(); {code} the getFileLength() also needs the lock. So we need to figure out a lock-free implementation of getFileLength() before the HBase multi-stream feature is done. [~saint@gmail.com] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
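A minimal sketch of the lock-free direction being discussed, with hypothetical names rather than the actual DFSInputStream patch: publish the length through a volatile field so pread() can read it without taking the stream's monitor. As the comment above warns, volatile only gives visibility for this single field; any compound update (length plus related block state, say) would still need the lock.

```java
// Hypothetical sketch, not the HDFS-6698 patch: a volatile field lets
// readers observe the latest published length without acquiring the
// stream's monitor, so pread() no longer contends with synchronized read().
public class LockFreeLength {
    private volatile long fileLength;

    // writers (e.g. code updating the last block's length) still serialize
    synchronized void updateLength(long newLength) {
        fileLength = newLength;
    }

    // no lock: a single volatile read is safe and always sees the
    // most recent completed updateLength()
    long getFileLength() {
        return fileLength;
    }
}
```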
[jira] [Updated] (HDFS-7017) Implement OutputStream for libhdfs3
[ https://issues.apache.org/jira/browse/HDFS-7017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhanwei Wang updated HDFS-7017: --- Attachment: HDFS-7017-pnative.003.patch Implement OutputStream for libhdfs3 --- Key: HDFS-7017 URL: https://issues.apache.org/jira/browse/HDFS-7017 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Reporter: Zhanwei Wang Assignee: Zhanwei Wang Attachments: HDFS-7017-pnative.002.patch, HDFS-7017-pnative.003.patch, HDFS-7017.patch Implement pipeline and OutputStream C++ interface -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7017) Implement OutputStream for libhdfs3
[ https://issues.apache.org/jira/browse/HDFS-7017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhanwei Wang updated HDFS-7017: --- Attachment: (was: HDFS-7017-pnative.003.patch) Implement OutputStream for libhdfs3 --- Key: HDFS-7017 URL: https://issues.apache.org/jira/browse/HDFS-7017 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Reporter: Zhanwei Wang Assignee: Zhanwei Wang Attachments: HDFS-7017-pnative.002.patch, HDFS-7017.patch Implement pipeline and OutputStream C++ interface -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7017) Implement OutputStream for libhdfs3
[ https://issues.apache.org/jira/browse/HDFS-7017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhanwei Wang updated HDFS-7017: --- Attachment: HDFS-7017-pnative.003.patch Implement OutputStream for libhdfs3 --- Key: HDFS-7017 URL: https://issues.apache.org/jira/browse/HDFS-7017 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Reporter: Zhanwei Wang Assignee: Zhanwei Wang Attachments: HDFS-7017-pnative.002.patch, HDFS-7017-pnative.003.patch, HDFS-7017.patch Implement pipeline and OutputStream C++ interface -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7337) Configurable and pluggable Erasure Codec and schema
[ https://issues.apache.org/jira/browse/HDFS-7337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinayakumar B updated HDFS-7337: Fix Version/s: (was: HDFS-EC) Configurable and pluggable Erasure Codec and schema --- Key: HDFS-7337 URL: https://issues.apache.org/jira/browse/HDFS-7337 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Zhe Zhang According to HDFS-7285 and the design, this proposes to support multiple Erasure Codecs via a pluggable approach. It allows defining and configuring multiple codec schemas with different coding algorithms and parameters. The resulting codec schemas can be utilized and specified via a command tool for different file folders. While designing and implementing such a pluggable framework, we also need to implement a concrete codec by default (Reed-Solomon) to prove the framework is useful and workable. A separate JIRA could be opened for the RS codec implementation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7339) Create block groups for initial block encoding
[ https://issues.apache.org/jira/browse/HDFS-7339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinayakumar B updated HDFS-7339: Fix Version/s: (was: HDFS-EC) Create block groups for initial block encoding -- Key: HDFS-7339 URL: https://issues.apache.org/jira/browse/HDFS-7339 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Zhe Zhang Assignee: Zhe Zhang -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7324) haadmin command usage prints incorrect command name
[ https://issues.apache.org/jira/browse/HDFS-7324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196025#comment-14196025 ] Hudson commented on HDFS-7324: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #733 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/733/]) HDFS-7324. haadmin command usage prints incorrect command name. Contributed by Brahma Reddy Battula. (sureshms: rev 237890feabc809ade4e7542039634e04219d0bcb) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSHAAdmin.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt haadmin command usage prints incorrect command name --- Key: HDFS-7324 URL: https://issues.apache.org/jira/browse/HDFS-7324 Project: Hadoop HDFS Issue Type: Bug Components: ha, tools Affects Versions: 2.5.1 Reporter: Brahma Reddy Battula Assignee: Brahma Reddy Battula Fix For: 2.7.0 Attachments: HDFS-7324.patch Scenario: === Run the haadmin help as follows. The usage message prints DFSHAAdmin [-ns nameserviceId], but DFSHAAdmin is not actually available as a command name, which we can verify with the command shown below. [root@linux156 bin]# *{color:red}./hdfs haadmin{color}* No GC_PROFILE is given. Defaults to medium. *{color:red}Usage: DFSHAAdmin [-ns nameserviceId]{color}* [-transitionToActive serviceId [--forceactive]] [-transitionToStandby serviceId] [-failover [--forcefence] [--forceactive] serviceId serviceId] [-getServiceState serviceId] [-checkHealth serviceId] [-help command] Generic options supported are -conf configuration file specify an application configuration file -D property=value use value for given property -fs local|namenode:port specify a namenode -jt local|jobtracker:port specify a job tracker -files comma separated list of files specify comma separated files to be copied to the map reduce cluster -libjars comma separated list of jars specify comma separated jar files to include in the classpath.
-archives comma separated list of archives specify comma separated archives to be unarchived on the compute machines. The general command line syntax is bin/hadoop command [genericOptions] [commandOptions] *{color:blue}[root@linux156 bin]# ./hdfs DFSHAAdmin -ns 100{color}* Error: Could not find or load main class DFSHAAdmin -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7147) Update archival storage user documentation
[ https://issues.apache.org/jira/browse/HDFS-7147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196017#comment-14196017 ] Hudson commented on HDFS-7147: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #733 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/733/]) HDFS-7147. Update archival storage user documentation. Contributed by Tsz Wo Nicholas Sze. (wheat9: rev 35d353e0f66b424508e2dd93bd036718cc4d5876) * hadoop-project/src/site/site.xml * hadoop-hdfs-project/hadoop-hdfs/src/site/apt/ArchivalStorage.apt.vm * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockStoragePolicySuite.java * hadoop-hdfs-project/hadoop-hdfs/src/main/resources/blockStoragePolicy-default.xml * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Update archival storage user documentation -- Key: HDFS-7147 URL: https://issues.apache.org/jira/browse/HDFS-7147 Project: Hadoop HDFS Issue Type: Bug Components: documentation Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Blocker Fix For: 2.6.0 Attachments: h7147_20140926.patch, h7147_20141101.patch, h7147_20141103.patch The Configurations section is no longer valid and should be removed. Also, if new APIs such as the addStoragePolicy API proposed in HDFS-7076 are able to get in, the corresponding user documentation should be added. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7328) TestTraceAdmin assumes Unix line endings.
[ https://issues.apache.org/jira/browse/HDFS-7328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196024#comment-14196024 ] Hudson commented on HDFS-7328: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #733 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/733/]) HDFS-7328. TestTraceAdmin assumes Unix line endings. Contributed by Chris Nauroth. (cnauroth: rev 2bb327eb939f57626d3dac10f7016ed634375d94) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/tracing/TestTraceAdmin.java TestTraceAdmin assumes Unix line endings. - Key: HDFS-7328 URL: https://issues.apache.org/jira/browse/HDFS-7328 Project: Hadoop HDFS Issue Type: Bug Components: test Reporter: Chris Nauroth Assignee: Chris Nauroth Priority: Trivial Fix For: 2.6.0 Attachments: HDFS-7328.1.patch {{TestTraceAdmin}} contains some string assertions that assume Unix line endings. The test fails on Windows. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7017) Implement OutputStream for libhdfs3
[ https://issues.apache.org/jira/browse/HDFS-7017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196027#comment-14196027 ] Zhanwei Wang commented on HDFS-7017: Added a log when failing to close a file; removed OutputStream::lastError and related code. I catch std::bad_alloc in the lease renewer: if overcommit is turned on, it does nothing, but if it is thrown in some case, I do not want the library to die in the backend working thread. std::bad_alloc will be thrown again somewhere in the main thread and the API can handle it well. Implement OutputStream for libhdfs3 --- Key: HDFS-7017 URL: https://issues.apache.org/jira/browse/HDFS-7017 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Reporter: Zhanwei Wang Assignee: Zhanwei Wang Attachments: HDFS-7017-pnative.002.patch, HDFS-7017-pnative.003.patch, HDFS-7017.patch Implement pipeline and OutputStream C++ interface -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7344) Erasure Coding worker and support in DataNode
[ https://issues.apache.org/jira/browse/HDFS-7344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinayakumar B updated HDFS-7344: Target Version/s: HDFS-EC Affects Version/s: (was: HDFS-EC) Erasure Coding worker and support in DataNode - Key: HDFS-7344 URL: https://issues.apache.org/jira/browse/HDFS-7344 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Reporter: Kai Zheng Assignee: Li Bo According to HDFS-7285 and the design, this handles the DataNode-side extension and related support for Erasure Coding, and implements ECWorker. It mainly covers the following aspects, and separate tasks may be opened to handle each of them. * Process encoding work, calculating parity blocks as specified by block groups and the codec schema; * Process decoding work, recovering data blocks according to block groups and the codec schema; * Handle client requests for passive recovery of block data, serving data on demand while reconstructing; * Write parity blocks according to the storage policy. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7345) Local Reconstruction Codes (LRC)
[ https://issues.apache.org/jira/browse/HDFS-7345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinayakumar B updated HDFS-7345: Target Version/s: HDFS-EC Affects Version/s: (was: HDFS-EC) Local Reconstruction Codes (LRC) Key: HDFS-7345 URL: https://issues.apache.org/jira/browse/HDFS-7345 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kai Zheng Assignee: Kai Zheng HDFS-7285 proposes to support Erasure Coding inside HDFS, supporting multiple Erasure Coding codecs via a pluggable framework and implementing Reed-Solomon code by default. This issue is to support a more advanced coding mechanism, Local Reconstruction Codes (LRC). As discussed in the paper (https://www.usenix.org/system/files/conference/atc12/atc12-final181_0.pdf), LRC reduces the number of erasure coding fragments that need to be read when reconstructing data fragments that are offline, while still keeping the storage overhead low. The important benefits of LRC are that it reduces the bandwidth and I/Os required for repair reads compared to prior codes, while still allowing a significant reduction in storage overhead. The Intel ISA-L library also supports LRC in a recent update and can be leveraged as well. The implementation would also consider how to distribute the calculation of local and global parity blocks to the relevant DataNodes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7285) Erasure Coding Support inside HDFS
[ https://issues.apache.org/jira/browse/HDFS-7285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Liu updated HDFS-7285: - Target Version/s: HDFS-EC Fix Version/s: (was: HDFS-EC) Erasure Coding Support inside HDFS -- Key: HDFS-7285 URL: https://issues.apache.org/jira/browse/HDFS-7285 Project: Hadoop HDFS Issue Type: New Feature Reporter: Weihua Jiang Assignee: Zhe Zhang Attachments: HDFSErasureCodingDesign-20141028.pdf Erasure Coding (EC) can greatly reduce storage overhead without sacrificing data reliability, compared to the existing HDFS 3-replica approach. For example, with 10+4 Reed-Solomon coding we can tolerate the loss of 4 blocks, with a storage overhead of only 40%. This makes EC a quite attractive alternative for big data storage, particularly for cold data. Facebook had a related open source project called HDFS-RAID. It used to be one of the contrib packages in HDFS but was removed in Hadoop 2.0 for maintenance reasons. Its drawbacks were: 1) it sits on top of HDFS and depends on MapReduce to do encoding and decoding tasks; 2) it can only be used for cold files that will not be appended anymore; 3) the pure-Java EC coding implementation is extremely slow in practical use. For these reasons, it might not be a good idea to just bring HDFS-RAID back. We (Intel and Cloudera) are working on a design that builds EC into HDFS itself, with no external dependencies, so that it is self-contained and independently maintained. This design layers the EC feature on storage type support and aims to stay compatible with existing HDFS features like caching, snapshots, encryption, and high availability. The design will also support different EC coding schemes, implementations, and policies for different deployment scenarios. By utilizing advanced libraries (e.g. the Intel ISA-L library), an implementation can greatly improve EC encoding/decoding performance and make the EC solution even more attractive.
We will post the design document soon. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7343) A comprehensive and flexible storage policy engine
[ https://issues.apache.org/jira/browse/HDFS-7343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Liu updated HDFS-7343: - Target Version/s: HDFS-EC A comprehensive and flexible storage policy engine -- Key: HDFS-7343 URL: https://issues.apache.org/jira/browse/HDFS-7343 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Kai Zheng As discussed in HDFS-7285, it would be better to have a comprehensive and flexible storage policy engine that considers file attributes, metadata, data temperature, storage type, EC codec, available hardware capabilities, user/application preferences, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7347) Configurable erasure coding policy for individual files and directories
[ https://issues.apache.org/jira/browse/HDFS-7347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Liu updated HDFS-7347: - Fix Version/s: (was: HDFS-EC) Configurable erasure coding policy for individual files and directories --- Key: HDFS-7347 URL: https://issues.apache.org/jira/browse/HDFS-7347 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Zhe Zhang Assignee: Zhe Zhang HDFS users and admins should be able to turn on and off erasure coding for individual files or directories. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7349) Support DFS command for the EC encoding
Vinayakumar B created HDFS-7349: --- Summary: Support DFS command for the EC encoding Key: HDFS-7349 URL: https://issues.apache.org/jira/browse/HDFS-7349 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Vinayakumar B Assignee: Vinayakumar B Support implementation of the following commands: *hdfs dfs -convertToEC path* Converts all blocks under this path to EC form (if not already in EC form, and if they can be coded). *hdfs dfs -convertToRep path* Converts all blocks under this path to replicated form. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7350) WebHDFS: Support EC commands through webhdfs
Uma Maheswara Rao G created HDFS-7350: - Summary: WebHDFS: Support EC commands through webhdfs Key: HDFS-7350 URL: https://issues.apache.org/jira/browse/HDFS-7350 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: HDFS-EC Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7351) Document the HDFS Erasure Coding feature
Uma Maheswara Rao G created HDFS-7351: - Summary: Document the HDFS Erasure Coding feature Key: HDFS-7351 URL: https://issues.apache.org/jira/browse/HDFS-7351 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: HDFS-EC Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7352) Common side changes for HDFS Erasure coding support
Uma Maheswara Rao G created HDFS-7352: - Summary: Common side changes for HDFS Erasure coding support Key: HDFS-7352 URL: https://issues.apache.org/jira/browse/HDFS-7352 Project: Hadoop HDFS Issue Type: Bug Affects Versions: HDFS-EC Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G This is an umbrella JIRA for tracking the Common-side changes for HDFS Erasure Coding support. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7352) Common side changes for HDFS Erasure coding support
[ https://issues.apache.org/jira/browse/HDFS-7352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uma Maheswara Rao G updated HDFS-7352: -- Issue Type: New Feature (was: Bug) Common side changes for HDFS Erasure coding support --- Key: HDFS-7352 URL: https://issues.apache.org/jira/browse/HDFS-7352 Project: Hadoop HDFS Issue Type: New Feature Affects Versions: HDFS-EC Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G This is an umbrella JIRA for tracking the Common-side changes for HDFS Erasure Coding support. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7353) Common Erasure Codec API and plugin support
Kai Zheng created HDFS-7353: --- Summary: Common Erasure Codec API and plugin support Key: HDFS-7353 URL: https://issues.apache.org/jira/browse/HDFS-7353 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kai Zheng This is to abstract and define a common codec API across different coding algorithms such as RS and XOR. The API can be implemented by utilizing various libraries, such as the Intel ISA-L library and the Jerasure library. It provides a default implementation and also allows plugging in vendor-specific ones. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
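Purely as an illustration of what "pluggable" could mean here, a hypothetical registry-style codec API; the interface and registry names are invented for this sketch and are not from the HDFS-7353 design:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a pluggable erasure codec API: a common interface
// implemented per algorithm/library, plus a registry keyed by schema name.
public class CodecPluginSketch {
    interface ErasureCoder {
        byte[][] encode(byte[][] dataBlocks);                 // returns parity blocks
        byte[][] decode(byte[][] available, int[] erasedIdx); // returns recovered blocks
    }

    private static final Map<String, ErasureCoder> REGISTRY = new HashMap<>();

    // a vendor implementation (e.g. ISA-L-backed) would call this at load time
    static void register(String schema, ErasureCoder coder) {
        REGISTRY.put(schema, coder); // e.g. "rs-10-4", "xor-2-1"
    }

    static ErasureCoder lookup(String schema) {
        ErasureCoder c = REGISTRY.get(schema);
        if (c == null) {
            throw new IllegalArgumentException("unknown codec schema: " + schema);
        }
        return c;
    }
}
```

With such a split, the default pure-Java coder and any native-library-backed coder are interchangeable behind the same schema name.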
[jira] [Commented] (HDFS-7324) haadmin command usage prints incorrect command name
[ https://issues.apache.org/jira/browse/HDFS-7324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196137#comment-14196137 ] Hudson commented on HDFS-7324: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1922 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1922/]) HDFS-7324. haadmin command usage prints incorrect command name. Contributed by Brahma Reddy Battula. (sureshms: rev 237890feabc809ade4e7542039634e04219d0bcb) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSHAAdmin.java haadmin command usage prints incorrect command name --- Key: HDFS-7324 URL: https://issues.apache.org/jira/browse/HDFS-7324 Project: Hadoop HDFS Issue Type: Bug Components: ha, tools Affects Versions: 2.5.1 Reporter: Brahma Reddy Battula Assignee: Brahma Reddy Battula Fix For: 2.7.0 Attachments: HDFS-7324.patch Scenario: === Run the haadmin help as follows. The usage message prints DFSHAAdmin [-ns nameserviceId], but DFSHAAdmin is not actually available as a command name, which we can verify with the command shown below. [root@linux156 bin]# *{color:red}./hdfs haadmin{color}* No GC_PROFILE is given. Defaults to medium. *{color:red}Usage: DFSHAAdmin [-ns nameserviceId]{color}* [-transitionToActive serviceId [--forceactive]] [-transitionToStandby serviceId] [-failover [--forcefence] [--forceactive] serviceId serviceId] [-getServiceState serviceId] [-checkHealth serviceId] [-help command] Generic options supported are -conf configuration file specify an application configuration file -D property=value use value for given property -fs local|namenode:port specify a namenode -jt local|jobtracker:port specify a job tracker -files comma separated list of files specify comma separated files to be copied to the map reduce cluster -libjars comma separated list of jars specify comma separated jar files to include in the classpath.
-archives comma separated list of archivesspecify comma separated archives to be unarchived on the compute machines. The general command line syntax is bin/hadoop command [genericOptions] [commandOptions] *{color:blue}[root@linux156 bin]# ./hdfs DFSHAAdmin -ns 100{color}* Error: Could not find or load main class DFSHAAdmin -- This message was sent by Atlassian JIRA (v6.3.4#6332)
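The bug here is that the usage banner prints the Java class's simple name ("DFSHAAdmin"), which is not an invokable shell command, instead of "haadmin". A generic sketch of the fix pattern, with hypothetical names (the real change is in DFSHAAdmin.java):

```java
/**
 * Illustrative sketch: build the usage banner from the shell command the
 * user actually typed ("haadmin") rather than a hardcoded class name
 * ("DFSHAAdmin"). Class and method names here are hypothetical.
 */
public class UsageSketch {
    private final String toolName;

    public UsageSketch(String toolName) {
        this.toolName = toolName;
    }

    /** Usage text parameterized on the tool name, so it always matches the CLI. */
    public String usage() {
        return "Usage: " + toolName + " [-ns <nameserviceId>]"
            + " [-transitionToActive <serviceId>]"
            + " [-transitionToStandby <serviceId>]";
    }
}
```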
[jira] [Commented] (HDFS-7147) Update archival storage user documentation
[ https://issues.apache.org/jira/browse/HDFS-7147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196129#comment-14196129 ] Hudson commented on HDFS-7147: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1922 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1922/]) HDFS-7147. Update archival storage user documentation. Contributed by Tsz Wo Nicholas Sze. (wheat9: rev 35d353e0f66b424508e2dd93bd036718cc4d5876) * hadoop-hdfs-project/hadoop-hdfs/src/main/resources/blockStoragePolicy-default.xml * hadoop-hdfs-project/hadoop-hdfs/src/site/apt/ArchivalStorage.apt.vm * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-project/src/site/site.xml * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockStoragePolicySuite.java Update archival storage user documentation -- Key: HDFS-7147 URL: https://issues.apache.org/jira/browse/HDFS-7147 Project: Hadoop HDFS Issue Type: Bug Components: documentation Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Blocker Fix For: 2.6.0 Attachments: h7147_20140926.patch, h7147_20141101.patch, h7147_20141103.patch The Configurations section is no longer valid. It should be removed. Also, if there are new APIs able to get in such as the addStoragePolicy API proposed in HDFS-7076, the corresponding user documentation should be added. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HDFS-7353) Common Erasure Codec API and plugin support
[ https://issues.apache.org/jira/browse/HDFS-7353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kai Zheng reassigned HDFS-7353: --- Assignee: Kai Zheng Common Erasure Codec API and plugin support --- Key: HDFS-7353 URL: https://issues.apache.org/jira/browse/HDFS-7353 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kai Zheng Assignee: Kai Zheng This is to abstract and define common codec API across different codec algorithms like RS, XOR and etc. Such API can be implemented by utilizing various library support, such as Intel ISA library and Jerasure library. It provides default implementation and also allows to plugin vendor specific ones. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7328) TestTraceAdmin assumes Unix line endings.
[ https://issues.apache.org/jira/browse/HDFS-7328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196136#comment-14196136 ] Hudson commented on HDFS-7328: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1922 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1922/]) HDFS-7328. TestTraceAdmin assumes Unix line endings. Contributed by Chris Nauroth. (cnauroth: rev 2bb327eb939f57626d3dac10f7016ed634375d94) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/tracing/TestTraceAdmin.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt TestTraceAdmin assumes Unix line endings. - Key: HDFS-7328 URL: https://issues.apache.org/jira/browse/HDFS-7328 Project: Hadoop HDFS Issue Type: Bug Components: test Reporter: Chris Nauroth Assignee: Chris Nauroth Priority: Trivial Fix For: 2.6.0 Attachments: HDFS-7328.1.patch {{TestTraceAdmin}} contains some string assertions that assume Unix line endings. The test fails on Windows. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
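The portable pattern behind this kind of fix is to stop comparing against a literal "\n": either normalize both sides to LF before asserting, or build the expected string with the platform separator. A generic illustration (not the actual TestTraceAdmin change):

```java
/**
 * Sketch of platform-neutral line-ending handling for test assertions.
 * Generic illustration, not the HDFS-7328 patch itself.
 */
public class LineEndings {
    /** Normalize CRLF (and stray CR) to LF so string comparisons are portable. */
    public static String normalize(String s) {
        return s.replace("\r\n", "\n").replace("\r", "\n");
    }

    /** Build expected multi-line text using the current platform's separator. */
    public static String joinLines(String... lines) {
        return String.join(System.lineSeparator(), lines);
    }
}
```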
[jira] [Commented] (HDFS-7338) Reed-Solomon codec library support
[ https://issues.apache.org/jira/browse/HDFS-7338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196142#comment-14196142 ] Kai Zheng commented on HDFS-7338: - This will follow the common codec API and plugin support to be defined in HDFS-7353, and will support the RS codec. We're considering providing the default implementation by utilizing the Intel ISA-L library, as it offers better performance and its license is friendly. Reed-Solomon codec library support -- Key: HDFS-7338 URL: https://issues.apache.org/jira/browse/HDFS-7338 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: HDFS-EC Reporter: Zhe Zhang Assignee: Kai Zheng -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7338) Reed-Solomon codec library support
[ https://issues.apache.org/jira/browse/HDFS-7338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kai Zheng updated HDFS-7338: Description: This is to provide an RS codec implementation for encoding and decoding. Reed-Solomon codec library support -- Key: HDFS-7338 URL: https://issues.apache.org/jira/browse/HDFS-7338 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: HDFS-EC Reporter: Zhe Zhang Assignee: Kai Zheng This is to provide an RS codec implementation for encoding and decoding. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
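As intuition for the decoding side: Reed-Solomon generalizes the single-parity XOR case, where any one lost unit is recovered by XOR-ing the parity with the surviving units. Real RS coding (e.g. the 10+4 scheme discussed in HDFS-7285) uses Galois-field arithmetic and survives multiple losses; this sketch shows only the degenerate case.

```java
/**
 * XOR recovery: the degenerate single-parity erasure code that RS
 * generalizes. Intuition only, not an RS implementation.
 */
public class XorRecovery {
    /** Recover the unit at lostIndex given the surviving units and XOR parity. */
    public static byte[] recover(byte[][] units, int lostIndex, byte[] parity) {
        byte[] recovered = parity.clone();
        for (int u = 0; u < units.length; u++) {
            if (u == lostIndex) continue;  // the lost unit is unavailable
            for (int i = 0; i < recovered.length; i++) {
                recovered[i] ^= units[u][i];
            }
        }
        return recovered;
    }
}
```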
[jira] [Updated] (HDFS-7337) Configurable and pluggable Erasure Codec and schema
[ https://issues.apache.org/jira/browse/HDFS-7337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kai Zheng updated HDFS-7337: Description: According to HDFS-7285 and the design, this considers to support multiple Erasure Codecs via pluggable approach. It allows to define and configure multiple codec schemas with different coding algorithms and parameters. The resultant codec schemas can be utilized and specified via command tool for different file folders. While design and implement such pluggable framework, it’s also to implement a concrete codec by default (Reed Solomon) to prove the framework is useful and workable. Separate JIRA could be opened for the RS codec implementation. Note HDFS-7353 will focus on the very low level codec API and implementation to make concrete vendor libraries transparent to the upper layer. This JIRA focuses on high level stuffs that interact with configuration, schema and etc. was: According to HDFS-7285 and the design, this considers to support multiple Erasure Codecs via pluggable approach. It allows to define and configure multiple codec schemas with different coding algorithms and parameters. The resultant codec schemas can be utilized and specified via command tool for different file folders. While design and implement such pluggable framework, it’s also to implement a concrete codec by default (Reed Solomon) to prove the framework is useful and workable. Separate JIRA could be opened for the RS codec implementation. Configurable and pluggable Erasure Codec and schema --- Key: HDFS-7337 URL: https://issues.apache.org/jira/browse/HDFS-7337 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Zhe Zhang According to HDFS-7285 and the design, this considers to support multiple Erasure Codecs via pluggable approach. It allows to define and configure multiple codec schemas with different coding algorithms and parameters. 
The resultant codec schemas can be utilized and specified via a command tool for different file folders. While designing and implementing such a pluggable framework, a concrete codec (Reed-Solomon) should also be implemented by default to prove the framework is useful and workable. A separate JIRA could be opened for the RS codec implementation. Note HDFS-7353 will focus on the very low-level codec API and implementation to make concrete vendor libraries transparent to the upper layer. This JIRA focuses on the high-level concerns that interact with configuration, schemas, and so on. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
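A minimal sketch of what a configurable codec schema might carry, per the description above: an algorithm name plus coding parameters. The class and field names are hypothetical, not the eventual HDFS API.

```java
/**
 * Hypothetical codec schema: a named, parameterized coding configuration
 * of the kind HDFS-7337 describes. Illustrative names only.
 */
public class CodecSchema {
    private final String name;       // e.g. "rs-10-4"
    private final String algorithm;  // e.g. "reed-solomon"
    private final int dataUnits;     // original blocks per group
    private final int parityUnits;   // parity blocks per group

    public CodecSchema(String name, String algorithm, int dataUnits, int parityUnits) {
        this.name = name;
        this.algorithm = algorithm;
        this.dataUnits = dataUnits;
        this.parityUnits = parityUnits;
    }

    /** Storage overhead relative to raw data, e.g. 4/10 = 40% for RS(10,4). */
    public double storageOverhead() {
        return (double) parityUnits / dataUnits;
    }

    public String getName() { return name; }
    public String getAlgorithm() { return algorithm; }
}
```

With something like this, multiple schemas (different algorithms or parameters) can coexist and be attached to folders by a command tool.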
[jira] [Commented] (HDFS-7337) Configurable and pluggable Erasure Codec and schema
[ https://issues.apache.org/jira/browse/HDFS-7337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196154#comment-14196154 ] Kai Zheng commented on HDFS-7337: - Also opened HDFS-7353 to focus on the very low level codec API and implementation to make concrete vendor libraries transparent to the upper layer. This JIRA focuses on high level stuffs that interact with configuration, schema and etc. Configurable and pluggable Erasure Codec and schema --- Key: HDFS-7337 URL: https://issues.apache.org/jira/browse/HDFS-7337 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Zhe Zhang According to HDFS-7285 and the design, this considers to support multiple Erasure Codecs via pluggable approach. It allows to define and configure multiple codec schemas with different coding algorithms and parameters. The resultant codec schemas can be utilized and specified via command tool for different file folders. While design and implement such pluggable framework, it’s also to implement a concrete codec by default (Reed Solomon) to prove the framework is useful and workable. Separate JIRA could be opened for the RS codec implementation. Note HDFS-7353 will focus on the very low level codec API and implementation to make concrete vendor libraries transparent to the upper layer. This JIRA focuses on high level stuffs that interact with configuration, schema and etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7337) Configurable and pluggable Erasure Codec and schema
[ https://issues.apache.org/jira/browse/HDFS-7337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196163#comment-14196163 ] Kai Zheng commented on HDFS-7337: - Zhe, let me consider these issues together and think about how to define and implement such configurable and pluggable codec plus schema. Will give my thoughts here for the discussion. Assigned to me. Configurable and pluggable Erasure Codec and schema --- Key: HDFS-7337 URL: https://issues.apache.org/jira/browse/HDFS-7337 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Zhe Zhang According to HDFS-7285 and the design, this considers to support multiple Erasure Codecs via pluggable approach. It allows to define and configure multiple codec schemas with different coding algorithms and parameters. The resultant codec schemas can be utilized and specified via command tool for different file folders. While design and implement such pluggable framework, it’s also to implement a concrete codec by default (Reed Solomon) to prove the framework is useful and workable. Separate JIRA could be opened for the RS codec implementation. Note HDFS-7353 will focus on the very low level codec API and implementation to make concrete vendor libraries transparent to the upper layer. This JIRA focuses on high level stuffs that interact with configuration, schema and etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HDFS-7337) Configurable and pluggable Erasure Codec and schema
[ https://issues.apache.org/jira/browse/HDFS-7337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kai Zheng reassigned HDFS-7337: --- Assignee: Kai Zheng Configurable and pluggable Erasure Codec and schema --- Key: HDFS-7337 URL: https://issues.apache.org/jira/browse/HDFS-7337 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Zhe Zhang Assignee: Kai Zheng According to HDFS-7285 and the design, this considers to support multiple Erasure Codecs via pluggable approach. It allows to define and configure multiple codec schemas with different coding algorithms and parameters. The resultant codec schemas can be utilized and specified via command tool for different file folders. While design and implement such pluggable framework, it’s also to implement a concrete codec by default (Reed Solomon) to prove the framework is useful and workable. Separate JIRA could be opened for the RS codec implementation. Note HDFS-7353 will focus on the very low level codec API and implementation to make concrete vendor libraries transparent to the upper layer. This JIRA focuses on high level stuffs that interact with configuration, schema and etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7353) Common Erasure Codec API and plugin support
[ https://issues.apache.org/jira/browse/HDFS-7353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kai Zheng updated HDFS-7353: Fix Version/s: HDFS-EC Common Erasure Codec API and plugin support --- Key: HDFS-7353 URL: https://issues.apache.org/jira/browse/HDFS-7353 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kai Zheng Assignee: Kai Zheng Fix For: HDFS-EC This is to abstract and define common codec API across different codec algorithms like RS, XOR and etc. Such API can be implemented by utilizing various library support, such as Intel ISA library and Jerasure library. It provides default implementation and also allows to plugin vendor specific ones. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7147) Update archival storage user documentation
[ https://issues.apache.org/jira/browse/HDFS-7147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196198#comment-14196198 ] Hudson commented on HDFS-7147: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1947 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1947/]) HDFS-7147. Update archival storage user documentation. Contributed by Tsz Wo Nicholas Sze. (wheat9: rev 35d353e0f66b424508e2dd93bd036718cc4d5876) * hadoop-project/src/site/site.xml * hadoop-hdfs-project/hadoop-hdfs/src/main/resources/blockStoragePolicy-default.xml * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockStoragePolicySuite.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/site/apt/ArchivalStorage.apt.vm Update archival storage user documentation -- Key: HDFS-7147 URL: https://issues.apache.org/jira/browse/HDFS-7147 Project: Hadoop HDFS Issue Type: Bug Components: documentation Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Blocker Fix For: 2.6.0 Attachments: h7147_20140926.patch, h7147_20141101.patch, h7147_20141103.patch The Configurations section is no longer valid. It should be removed. Also, if there are new APIs able to get in such as the addStoragePolicy API proposed in HDFS-7076, the corresponding user documentation should be added. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7324) haadmin command usage prints incorrect command name
[ https://issues.apache.org/jira/browse/HDFS-7324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196206#comment-14196206 ] Hudson commented on HDFS-7324: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1947 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1947/]) HDFS-7324. haadmin command usage prints incorrect command name. Contributed by Brahma Reddy Battula. (sureshms: rev 237890feabc809ade4e7542039634e04219d0bcb) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSHAAdmin.java haadmin command usage prints incorrect command name --- Key: HDFS-7324 URL: https://issues.apache.org/jira/browse/HDFS-7324 Project: Hadoop HDFS Issue Type: Bug Components: ha, tools Affects Versions: 2.5.1 Reporter: Brahma Reddy Battula Assignee: Brahma Reddy Battula Fix For: 2.7.0 Attachments: HDFS-7324.patch Scenario: === Try the help command for hadadmin like following.. Here usage is coming as DFSHAAdmin -ns, Ideally this not availble which we can check following command. [root@linux156 bin]# *{color:red}./hdfs haadmin{color}* No GC_PROFILE is given. Defaults to medium. *{color:red}Usage: DFSHAAdmin [-ns nameserviceId]{color}* [-transitionToActive serviceId [--forceactive]] [-transitionToStandby serviceId] [-failover [--forcefence] [--forceactive] serviceId serviceId] [-getServiceState serviceId] [-checkHealth serviceId] [-help command] Generic options supported are -conf configuration file specify an application configuration file -D property=valueuse value for given property -fs local|namenode:port specify a namenode -jt local|jobtracker:portspecify a job tracker -files comma separated list of filesspecify comma separated files to be copied to the map reduce cluster -libjars comma separated list of jarsspecify comma separated jar files to include in the classpath. 
-archives comma separated list of archivesspecify comma separated archives to be unarchived on the compute machines. The general command line syntax is bin/hadoop command [genericOptions] [commandOptions] *{color:blue}[root@linux156 bin]# ./hdfs DFSHAAdmin -ns 100{color}* Error: Could not find or load main class DFSHAAdmin -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7328) TestTraceAdmin assumes Unix line endings.
[ https://issues.apache.org/jira/browse/HDFS-7328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196205#comment-14196205 ] Hudson commented on HDFS-7328: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1947 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1947/]) HDFS-7328. TestTraceAdmin assumes Unix line endings. Contributed by Chris Nauroth. (cnauroth: rev 2bb327eb939f57626d3dac10f7016ed634375d94) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/tracing/TestTraceAdmin.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt TestTraceAdmin assumes Unix line endings. - Key: HDFS-7328 URL: https://issues.apache.org/jira/browse/HDFS-7328 Project: Hadoop HDFS Issue Type: Bug Components: test Reporter: Chris Nauroth Assignee: Chris Nauroth Priority: Trivial Fix For: 2.6.0 Attachments: HDFS-7328.1.patch {{TestTraceAdmin}} contains some string assertions that assume Unix line endings. The test fails on Windows. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7340) make rollingUpgrade start/finalize idempotent
[ https://issues.apache.org/jira/browse/HDFS-7340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196334#comment-14196334 ] Suresh Srinivas commented on HDFS-7340: --- +1 for the patch. Couple of comments: bq. upgrade has been finalized Can you please change this to upgrade already has been finalized? Also please add Idempotent annotation to the method. make rollingUpgrade start/finalize idempotent - Key: HDFS-7340 URL: https://issues.apache.org/jira/browse/HDFS-7340 Project: Hadoop HDFS Issue Type: Bug Components: ha Affects Versions: 2.4.0 Reporter: Arpit Gupta Assignee: Jing Zhao Attachments: HDFS-7340.000.patch I was running this on a HA cluster with dfs.client.test.drop.namenode.response.number set to 1. So the first request goes through but the response is dropped. Which then causes another request which fails and says a request is already in progress. We should add retry cache support for this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
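The idempotency idea under review here, sketched generically: when a response is dropped and the client retries, a start or finalize that already took effect should succeed quietly rather than fail with "already in progress". This is an illustration of the semantics, not the HDFS-7340 patch (which also involves the retry cache).

```java
/**
 * Generic sketch of idempotent start/finalize semantics for a rolling
 * upgrade. Illustrative only; not the HDFS-7340 implementation.
 */
public class RollingUpgradeSketch {
    private boolean inProgress = false;
    private boolean finalized = false;

    /** Starting twice is a no-op the second time (retry-safe). */
    public synchronized void start() {
        if (inProgress) return;  // idempotent: a retried start just succeeds
        inProgress = true;
    }

    /** Finalizing an already-finalized upgrade is likewise a no-op. */
    public synchronized void finalizeUpgrade() {
        if (finalized) return;   // idempotent
        inProgress = false;
        finalized = true;
    }

    public synchronized boolean isFinalized() { return finalized; }
}
```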
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196336#comment-14196336 ] Plamen Jeliazkov commented on HDFS-3107: [~cmccabe], At the time [~shv] talked about my new patch there was nothing posted yet in HDFS-7056 minus Konstantin's design doc. We only uploaded even newer patches yesterday around noon. Please be careful not to confuse [~shv] and [~cos]. The snapshot support patch (for HDFS-7056) was not ready yet when [~cos] made his comment. We don't have to commit HDFS-3107 on its own. There is the option to treat the combined patch HDFS-3107--7056 as the first patch, which accounts for upgrade and rollback functionality as well as snapshot support, demonstrated in unit test. This should address your comment: My reasoning is that if the first patch breaks rollback, it's tough to see it getting into trunk. I am not objecting to do work on a branch but I am unsure it is necessary given the combined patch seems to meet the support requirements asked for this work. I'll investigate the FindBugs. HDFS truncate - Key: HDFS-3107 URL: https://issues.apache.org/jira/browse/HDFS-3107 Project: Hadoop HDFS Issue Type: New Feature Components: datanode, namenode Reporter: Lei Chang Assignee: Plamen Jeliazkov Attachments: HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, editsStored, editsStored.xml Original Estimate: 1,344h Remaining Estimate: 1,344h Systems with transaction support often need to undo changes made to the underlying storage when a transaction is aborted. 
Currently HDFS does not support truncate (a standard POSIX operation), the reverse operation of append. This forces upper-layer applications to use ugly workarounds (such as keeping track of the discarded byte range per file in a separate metadata store, and periodically running a vacuum process to rewrite compacted files) to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
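The semantics being requested are those of POSIX truncate, shown here on a local file with java.nio: cut a file back to a given length, discarding bytes past it. HDFS-3107 proposes an analogous call on HDFS itself; this local demo only illustrates the operation.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/**
 * Local-filesystem demo of truncate semantics (the inverse of append).
 * Illustrates the operation HDFS-3107 adds to HDFS; not HDFS code.
 */
public class TruncateDemo {
    /** Truncate the file at p to newLength bytes and return the new size. */
    public static long truncateTo(Path p, long newLength) throws IOException {
        try (FileChannel ch = FileChannel.open(p, StandardOpenOption.WRITE)) {
            ch.truncate(newLength);  // discards bytes past newLength
        }
        return Files.size(p);
    }

    /** Write 11 bytes, truncate to 5 ("hello"), return the resulting size. */
    public static long demo() throws IOException {
        Path p = Files.createTempFile("truncate", ".txt");
        try {
            Files.write(p, "hello world".getBytes(StandardCharsets.UTF_8));
            return truncateTo(p, 5);
        } finally {
            Files.delete(p);
        }
    }
}
```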
[jira] [Commented] (HDFS-7340) make rollingUpgrade start/finalize idempotent
[ https://issues.apache.org/jira/browse/HDFS-7340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196355#comment-14196355 ] Tsz Wo Nicholas Sze commented on HDFS-7340: --- +1 patch looks good. No additional comments. make rollingUpgrade start/finalize idempotent - Key: HDFS-7340 URL: https://issues.apache.org/jira/browse/HDFS-7340 Project: Hadoop HDFS Issue Type: Bug Components: ha Affects Versions: 2.4.0 Reporter: Arpit Gupta Assignee: Jing Zhao Attachments: HDFS-7340.000.patch I was running this on a HA cluster with dfs.client.test.drop.namenode.response.number set to 1. So the first request goes through but the response is dropped. Which then causes another request which fails and says a request is already in progress. We should add retry cache support for this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7218) FSNamesystem ACL operations should write to audit log on failure
[ https://issues.apache.org/jira/browse/HDFS-7218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-7218: Hadoop Flags: Reviewed +1 from me too. Thank you, Charles. FSNamesystem ACL operations should write to audit log on failure Key: HDFS-7218 URL: https://issues.apache.org/jira/browse/HDFS-7218 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.6.0 Reporter: Charles Lamb Assignee: Charles Lamb Priority: Minor Attachments: HDFS-7218.001.patch, HDFS-7218.002.patch, HDFS-7218.003.patch, HDFS-7218.004.patch, HDFS-7218.005.patch Various Acl methods in FSNamesystem do not write to the audit log when the operation is not successful. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
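The pattern being applied, sketched generically: record an audit entry on the failure path too, not only after success, so a denied operation leaves a trace. Names below are illustrative; the real fix touches FSNamesystem's ACL methods.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Generic sketch of auditing on both success and failure paths.
 * Illustrative only; not the HDFS-7218 patch.
 */
public class AuditSketch {
    private final List<String> auditLog = new ArrayList<>();

    public void setAcl(String path, boolean permitted) {
        try {
            if (!permitted) {
                throw new SecurityException("access denied: " + path);
            }
            auditLog.add("setAcl " + path + " success=true");
        } catch (SecurityException e) {
            // The bug class: without this line, denied operations left no trace.
            auditLog.add("setAcl " + path + " success=false");
            throw e;
        }
    }

    public List<String> audit() { return auditLog; }
}
```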
[jira] [Updated] (HDFS-7339) Create block groups for initial block encoding
[ https://issues.apache.org/jira/browse/HDFS-7339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang updated HDFS-7339: Description: All erasure codec operations center around the concept of _block groups_, which are formed in encoding and looked up in decoding. This JIRA creates a lightweight {{BlockGroup}} class to record the original and parity blocks in an encoding group, as well as a pointer to the codec schema. Pluggable codec schemas will be supported in HDFS-7337. The NameNode creates and maintains {{BlockGroup}} instances through 2 new components; the attached figure has an illustration of the architecture. {{ECManager}}: This module manages {{BlockGroups}} and associated codec schemas. As a simple example, it stores the codec schema of Reed-Solomon algorithm with 3 original and 2 parity blocks (5 blocks in each group). Each {{BlockGroup}} points to the schema it uses. To facilitate lookups during recovery requests, {{BlockGroups}} should be oraganized as a map keyed by {{Blocks}}. {{ErasureCodingBlocks}}: Block encoding work is triggered by multiple events. This module analyzes the incoming events, and dispatches tasks to {{UnderReplicatedBlocks}} to create parity blocks. A new queue ({{QUEUE_INITIAL_ENCODING}}) will be added to the 5 existing priority queues to maintain the relative order of encoding and replication tasks. * Whenever a block is finalized and meets EC criteria -- including 1) block size is full; 2) the file’s storage policy allows EC -- {{ErasureCodingBlocks}} tries to form a {{BlockGroup}}. In order to do so it needs to store a set of blocks waiting to be encoded. Different grouping algorithms can be applied -- e.g., always grouping blocks in the same file. Blocks in a group should also reside on different DataNodes, and ideally on different racks, to tolerate node and rack failures. If successful, it records the formed group with {{ECManager}} and insert the parity blocks into {{QUEUE_INITIAL_ENCODING}}. 
* When a parity block or a raw block in {{ENCODED}} state is found missing, {{ErasureCodingBlocks}} adds it to existing priority queues in {{UnderReplicatedBlocks}}. E.g., if all parity blocks in a group are lost, they should be added to {{QUEUE_HIGHEST_PRIORITY}}. New priorities might be added for fine grained differentiation (e.g., loss of a raw block versus a parity one). Create block groups for initial block encoding -- Key: HDFS-7339 URL: https://issues.apache.org/jira/browse/HDFS-7339 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Zhe Zhang Assignee: Zhe Zhang Attachments: Encoding-design-NN.jpg All erasure codec operations center around the concept of _block groups_, which are formed in encoding and looked up in decoding. This JIRA creates a lightweight {{BlockGroup}} class to record the original and parity blocks in an encoding group, as well as a pointer to the codec schema. Pluggable codec schemas will be supported in HDFS-7337. The NameNode creates and maintains {{BlockGroup}} instances through 2 new components; the attached figure has an illustration of the architecture. {{ECManager}}: This module manages {{BlockGroups}} and associated codec schemas. As a simple example, it stores the codec schema of Reed-Solomon algorithm with 3 original and 2 parity blocks (5 blocks in each group). Each {{BlockGroup}} points to the schema it uses. To facilitate lookups during recovery requests, {{BlockGroups}} should be oraganized as a map keyed by {{Blocks}}. {{ErasureCodingBlocks}}: Block encoding work is triggered by multiple events. This module analyzes the incoming events, and dispatches tasks to {{UnderReplicatedBlocks}} to create parity blocks. A new queue ({{QUEUE_INITIAL_ENCODING}}) will be added to the 5 existing priority queues to maintain the relative order of encoding and replication tasks. 
* Whenever a block is finalized and meets EC criteria -- including 1) block size is full; 2) the file’s storage policy allows EC -- {{ErasureCodingBlocks}} tries to form a {{BlockGroup}}. In order to do so it needs to store a set of blocks waiting to be encoded. Different grouping algorithms can be applied -- e.g., always grouping blocks in the same file. Blocks in a group should also reside on different DataNodes, and ideally on different racks, to tolerate node and rack failures. If successful, it records the formed group with {{ECManager}} and inserts the parity blocks into {{QUEUE_INITIAL_ENCODING}}. * When a parity block or a raw block in {{ENCODED}} state is found missing, {{ErasureCodingBlocks}} adds it to existing priority queues in {{UnderReplicatedBlocks}}. E.g., if all parity blocks in a group are lost, they should be added to {{QUEUE_HIGHEST_PRIORITY}}. New priorities might be added for fine-grained differentiation (e.g., loss of a raw block versus a parity one).
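A lightweight sketch of the {{BlockGroup}} idea described above: original and parity block IDs plus a pointer to the codec schema, indexed by member block so recovery requests can find the group. Class and field names are illustrative, not the eventual HDFS types.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical BlockGroup: the unit around which encoding and decoding
 * are organized. Illustrative only; not the HDFS-7339 implementation.
 */
public class BlockGroupSketch {
    final List<Long> originalBlocks;
    final List<Long> parityBlocks;
    final String schemaName;  // e.g. "rs-3-2" per the 3+2 example above

    public BlockGroupSketch(List<Long> original, List<Long> parity, String schemaName) {
        this.originalBlocks = original;
        this.parityBlocks = parity;
        this.schemaName = schemaName;
    }

    /** ECManager-style index: every member block maps back to its group. */
    public static Map<Long, BlockGroupSketch> index(BlockGroupSketch g) {
        Map<Long, BlockGroupSketch> byBlock = new HashMap<>();
        for (long b : g.originalBlocks) byBlock.put(b, g);
        for (long b : g.parityBlocks) byBlock.put(b, g);
        return byBlock;
    }
}
```

Keying the map by block (rather than by group) matches the lookup direction of a recovery request: a missing block is reported first, and its group must be found from it.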
[jira] [Updated] (HDFS-7339) Create block groups for initial block encoding
[ https://issues.apache.org/jira/browse/HDFS-7339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang updated HDFS-7339: Attachment: Encoding-design-NN.jpg Architecture of NameNode extensions Create block groups for initial block encoding -- Key: HDFS-7339 URL: https://issues.apache.org/jira/browse/HDFS-7339 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Zhe Zhang Assignee: Zhe Zhang Attachments: Encoding-design-NN.jpg All erasure codec operations center around the concept of _block groups_, which are formed in encoding and looked up in decoding. This JIRA creates a lightweight {{BlockGroup}} class to record the original and parity blocks in an encoding group, as well as a pointer to the codec schema. Pluggable codec schemas will be supported in HDFS-7337. The NameNode creates and maintains {{BlockGroup}} instances through 2 new components; the attached figure has an illustration of the architecture. {{ECManager}}: This module manages {{BlockGroups}} and associated codec schemas. As a simple example, it stores the codec schema of Reed-Solomon algorithm with 3 original and 2 parity blocks (5 blocks in each group). Each {{BlockGroup}} points to the schema it uses. To facilitate lookups during recovery requests, {{BlockGroups}} should be oraganized as a map keyed by {{Blocks}}. {{ErasureCodingBlocks}}: Block encoding work is triggered by multiple events. This module analyzes the incoming events, and dispatches tasks to {{UnderReplicatedBlocks}} to create parity blocks. A new queue ({{QUEUE_INITIAL_ENCODING}}) will be added to the 5 existing priority queues to maintain the relative order of encoding and replication tasks. * Whenever a block is finalized and meets EC criteria -- including 1) block size is full; 2) the file’s storage policy allows EC -- {{ErasureCodingBlocks}} tries to form a {{BlockGroup}}. In order to do so it needs to store a set of blocks waiting to be encoded. 
Different grouping algorithms can be applied -- e.g., always grouping blocks in the same file. Blocks in a group should also reside on different DataNodes, and ideally on different racks, to tolerate node and rack failures. If successful, it records the formed group with {{ECManager}} and inserts the parity blocks into {{QUEUE_INITIAL_ENCODING}}. * When a parity block or a raw block in {{ENCODED}} state is found missing, {{ErasureCodingBlocks}} adds it to existing priority queues in {{UnderReplicatedBlocks}}. E.g., if all parity blocks in a group are lost, they should be added to {{QUEUE_HIGHEST_PRIORITY}}. New priorities might be added for fine-grained differentiation (e.g., loss of a raw block versus a parity one). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7199) DFSOutputStream can silently drop data if DataStreamer crashes with a non-I/O exception
[ https://issues.apache.org/jira/browse/HDFS-7199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196496#comment-14196496 ] Colin Patrick McCabe commented on HDFS-7199: Can you post a new patch with the else on the same line as the close brace as per our coding standard? Then I'll commit this. Thanks guys. DFSOutputStream can silently drop data if DataStreamer crashes with a non-I/O exception --- Key: HDFS-7199 URL: https://issues.apache.org/jira/browse/HDFS-7199 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.5.0 Reporter: Jason Lowe Assignee: Rushabh S Shah Priority: Critical Attachments: HDFS-7199-WIP.patch, HDFS-7199.patch If the DataStreamer thread encounters a non-I/O exception then it closes the output stream but does not set lastException. When the client later calls close on the output stream then it will see the stream is already closed with lastException == null, mistakently think this is a redundant close call, and fail to report any error to the client. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
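The failure mode under discussion, sketched generically: if the background streamer records only IOExceptions, a RuntimeException closes the stream with lastException still null, and close() then looks like a harmless redundant close. Catching Throwable and recording it lets close() surface the error. Names are illustrative, not the DFSOutputStream internals.

```java
import java.io.IOException;

/**
 * Sketch of "record any Throwable so close() can report it".
 * Illustrative only; not the HDFS-7199 patch.
 */
public class StreamerSketch {
    private volatile Throwable lastException;
    private volatile boolean closed;

    /** Simulate the streamer loop failing with an arbitrary Throwable. */
    public void runStreamer(Runnable work) {
        try {
            work.run();
        } catch (Throwable t) {   // not just IOException
            lastException = t;    // the bookkeeping the bug was missing
        } finally {
            closed = true;
        }
    }

    /** close() must report a recorded failure instead of silently returning. */
    public void close() throws IOException {
        if (closed && lastException != null) {
            throw new IOException("stream failed", lastException);
        }
        closed = true;
    }
}
```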
[jira] [Updated] (HDFS-7340) make rollingUpgrade start/finalize idempotent
[ https://issues.apache.org/jira/browse/HDFS-7340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-7340: Attachment: HDFS-7340.001.patch Thanks Suresh and Nicholas for the review. Update the patch to address Suresh's comments. bq. add Idempotent annotation to the method The ClientProtocol#rollingUpgrade has already been annotated as idempotent before the fix. make rollingUpgrade start/finalize idempotent - Key: HDFS-7340 URL: https://issues.apache.org/jira/browse/HDFS-7340 Project: Hadoop HDFS Issue Type: Bug Components: ha Affects Versions: 2.4.0 Reporter: Arpit Gupta Assignee: Jing Zhao Attachments: HDFS-7340.000.patch, HDFS-7340.001.patch I was running this on a HA cluster with dfs.client.test.drop.namenode.response.number set to 1. So the first request goes through but the response is dropped. Which then causes another request which fails and says a request is already in progress. We should add retry cache support for this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6803) Documenting DFSClient#DFSInputStream expectations reading and preading in concurrent context
[ https://issues.apache.org/jira/browse/HDFS-6803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HDFS-6803: Attachment: HDFS-6803v2.txt Thanks [~ste...@apache.org]. Here is v2. Documenting DFSClient#DFSInputStream expectations reading and preading in concurrent context Key: HDFS-6803 URL: https://issues.apache.org/jira/browse/HDFS-6803 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Affects Versions: 2.4.1 Reporter: stack Attachments: 9117.md.txt, DocumentingDFSClientDFSInputStream (1).pdf, DocumentingDFSClientDFSInputStream.v2.pdf, HDFS-6803v2.txt Reviews of the patch posted on the parent task suggest that we be more explicit about how DFSIS is expected to behave when being read by contending threads. It is also suggested that presumptions made internally be made explicit by documenting expectations. Before we put up a patch we've made a document of assertions we'd like to make into tenets of DFSInputStream. If there is agreement, we'll attach to this issue a patch that weaves the assumptions into DFSIS as javadoc and class comments. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7340) make rollingUpgrade start/finalize idempotent
[ https://issues.apache.org/jira/browse/HDFS-7340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196511#comment-14196511 ] Hadoop QA commented on HDFS-7340: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12679263/HDFS-7340.001.patch against trunk revision 3dfd6e6. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8639//console This message is automatically generated. make rollingUpgrade start/finalize idempotent - Key: HDFS-7340 URL: https://issues.apache.org/jira/browse/HDFS-7340 Project: Hadoop HDFS Issue Type: Bug Components: ha Affects Versions: 2.4.0 Reporter: Arpit Gupta Assignee: Jing Zhao Attachments: HDFS-7340.000.patch, HDFS-7340.001.patch I was running this on a HA cluster with dfs.client.test.drop.namenode.response.number set to 1. So the first request goes through but the response is dropped. Which then causes another request which fails and says a request is already in progress. We should add retry cache support for this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7340) make rollingUpgrade start/finalize idempotent
[ https://issues.apache.org/jira/browse/HDFS-7340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196514#comment-14196514 ] Hudson commented on HDFS-7340: -- FAILURE: Integrated in Hadoop-trunk-Commit #6434 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6434/]) HDFS-7340. Make rollingUpgrade start/finalize idempotent. Contributed by Jing Zhao. (jing9: rev 3dfd6e68fe5028fe3766ae5056dc175c38cc97e1)
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestRollingUpgrade.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSAdmin.java
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
make rollingUpgrade start/finalize idempotent - Key: HDFS-7340 URL: https://issues.apache.org/jira/browse/HDFS-7340 Project: Hadoop HDFS Issue Type: Bug Components: ha Affects Versions: 2.4.0 Reporter: Arpit Gupta Assignee: Jing Zhao Attachments: HDFS-7340.000.patch, HDFS-7340.001.patch I was running this on a HA cluster with dfs.client.test.drop.namenode.response.number set to 1. So the first request goes through but the response is dropped. Which then causes another request which fails and says a request is already in progress. We should add retry cache support for this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7340) make rollingUpgrade start/finalize idempotent
[ https://issues.apache.org/jira/browse/HDFS-7340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-7340: Resolution: Fixed Fix Version/s: 2.6.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Since the change between 000 and 001 patch is only adding a word into the dfsadmin output (and we do not check the content of this output in the current unit tests), I committed this patch before waiting for Jenkins again. make rollingUpgrade start/finalize idempotent - Key: HDFS-7340 URL: https://issues.apache.org/jira/browse/HDFS-7340 Project: Hadoop HDFS Issue Type: Bug Components: ha Affects Versions: 2.4.0 Reporter: Arpit Gupta Assignee: Jing Zhao Fix For: 2.6.0 Attachments: HDFS-7340.000.patch, HDFS-7340.001.patch I was running this on a HA cluster with dfs.client.test.drop.namenode.response.number set to 1. So the first request goes through but the response is dropped. Which then causes another request which fails and says a request is already in progress. We should add retry cache support for this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7334) Fix periodic failures of TestCheckpoint#testTooManyEditReplayFailures
[ https://issues.apache.org/jira/browse/HDFS-7334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196576#comment-14196576 ] Haohui Mai commented on HDFS-7334: -- Looks good to me.
{code}
 conf.set(DFSConfigKeys.DFS_NAMENODE_CHECKPOINT_MAX_RETRIES_KEY, 1);
-conf.set(DFSConfigKeys.DFS_NAMENODE_CHECKPOINT_CHECK_PERIOD_KEY, 1);
+conf.set(DFSConfigKeys.DFS_NAMENODE_CHECKPOINT_PERIOD_KEY, 1);
{code}
I think that the code should call {{setInt}} instead. Can you use the jira to clean them up? Thanks. Fix periodic failures of TestCheckpoint#testTooManyEditReplayFailures - Key: HDFS-7334 URL: https://issues.apache.org/jira/browse/HDFS-7334 Project: Hadoop HDFS Issue Type: Bug Components: test Reporter: Charles Lamb Assignee: Charles Lamb Priority: Minor Attachments: HDFS-7334.001.patch TestCheckpoint#testTooManyEditReplayFailures occasionally fails with a test timeout. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7017) Implement OutputStream for libhdfs3
[ https://issues.apache.org/jira/browse/HDFS-7017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196585#comment-14196585 ] Colin Patrick McCabe commented on HDFS-7017: Thanks, this looks better. bq. I catch std::bad_alloc in lease renewer, if overcommit turned on, it does nothing, but if it is thrown in some case, I do not want the library die in backend working thread. std::bad_alloc will be thrown again somewhere in main thread and the API can handle it well. I really can't agree with this rationale. If {{std::bad_alloc}} is causing arbitrary threads to terminate (without any message, since we don't log anything currently), how is the user supposed to know? And why do we think that std::bad_alloc will be thrown again somewhere in main thread? Perhaps terminating this thread freed up enough memory to proceed. I think that 99.% of all users will run with memory overcommit turned on, which means that this catch block will never be an issue. The fact that nobody runs with overcommit disabled also means this code will never be tested. If we want to keep the catch block, let's at least log a message. If you're concerned that the logging will throw another exception, we can have another try... catch block. Implement OutputStream for libhdfs3 --- Key: HDFS-7017 URL: https://issues.apache.org/jira/browse/HDFS-7017 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Reporter: Zhanwei Wang Assignee: Zhanwei Wang Attachments: HDFS-7017-pnative.002.patch, HDFS-7017-pnative.003.patch, HDFS-7017.patch Implement pipeline and OutputStream C++ interface -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7314) Aborted DFSClient's impact on long running service like YARN
[ https://issues.apache.org/jira/browse/HDFS-7314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196603#comment-14196603 ] Colin Patrick McCabe commented on HDFS-7314: Thanks, [~mingma]. It's interesting that all the unit tests pass with the changed behavior of {{DFSClient#abort}}. I would prefer not to add this new configuration key, because I really can't think of any cases where I'd like to set it to {{true}}. I think it would be better just to have the lease timeout logic call a function other than {{DFSClient#abort}}. Basically create something like {{DFSClient#abortOpenFiles}} and have the lease timeout code call this instead of abort. That way we don't get confused about what abort means, but we also have the nice behavior that our client continues to be useful after a lease timeout. Aborted DFSClient's impact on long running service like YARN Key: HDFS-7314 URL: https://issues.apache.org/jira/browse/HDFS-7314 Project: Hadoop HDFS Issue Type: Improvement Reporter: Ming Ma Assignee: Ming Ma Attachments: HDFS-7314.patch It happened in the YARN nodemanager scenario, but it could happen to any long running service that uses a cached instance of DistributedFileSystem. 1. Active NN is under heavy load, so it became unavailable for 10 minutes; any DFSClient request will get ConnectTimeoutException. 2. YARN nodemanager uses DFSClient for certain write operations such as the log aggregator or shared cache in YARN-1492. The DFSClient used by YARN NM's renewLease RPC got ConnectTimeoutException. {noformat} 2014-10-29 01:36:19,559 WARN org.apache.hadoop.hdfs.LeaseRenewer: Failed to renew lease for [DFSClient_NONMAPREDUCE_-550838118_1] for 372 seconds. Aborting ... {noformat} 3. After DFSClient is in the Aborted state, YARN NM can't use that cached instance of DistributedFileSystem. {noformat} 2014-10-29 20:26:23,991 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Failed to download rsrc...
java.io.IOException: Filesystem closed
	at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:727)
	at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1780)
	at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1124)
	at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1120)
	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
	at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1120)
	at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:237)
	at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:340)
	at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:57)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
{noformat}
We can make YARN or DFSClient more tolerant to temporary NN unavailability. Given that the call stack is YARN - DistributedFileSystem - DFSClient, this can be addressed at different layers.
* YARN closes the DistributedFileSystem object when it receives some well-defined exception. Then the next HDFS call will create a new instance of DistributedFileSystem. We have to fix all the places in YARN. Plus other HDFS applications need to address this as well.
* DistributedFileSystem detects an aborted DFSClient and creates a new instance of DFSClient. We will need to fix all the places DistributedFileSystem calls DFSClient.
* After DFSClient gets into the Aborted state, it doesn't have to reject all requests; instead it can retry. If NN is available again it can transition to the healthy state.
Comments?
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
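The first option in the list above (the caller drops and recreates its cached filesystem handle when it sees a well-defined failure) can be sketched generically. The interface and wrapper below are hypothetical stand-ins, not YARN or HDFS code; the "Filesystem closed" message matching mirrors the stack trace in the report.

```java
import java.io.IOException;
import java.util.function.Supplier;

public class ReopeningClient {
    // Hypothetical stand-in for a filesystem client such as DistributedFileSystem.
    interface Client {
        String stat(String path) throws IOException;
    }

    private final Supplier<Client> factory;
    private Client client;

    ReopeningClient(Supplier<Client> factory) {
        this.factory = factory;
        this.client = factory.get();
    }

    // On the well-defined "Filesystem closed" failure, recreate the cached
    // client once and retry; any other IOException propagates unchanged.
    String stat(String path) throws IOException {
        try {
            return client.stat(path);
        } catch (IOException e) {
            if (e.getMessage() == null || !e.getMessage().contains("Filesystem closed")) {
                throw e;
            }
            client = factory.get();
            return client.stat(path);
        }
    }

    public static void main(String[] args) throws IOException {
        final boolean[] aborted = {true};   // the first cached client is "aborted"
        ReopeningClient rc = new ReopeningClient(() -> path -> {
            if (aborted[0]) {
                aborted[0] = false;
                throw new IOException("Filesystem closed");
            }
            return "ok:" + path;
        });
        System.out.println(rc.stat("/tmp/x"));
    }
}
```

The drawback noted in the issue applies here too: every caller needs this wrapper, which is why the later options push the fix down into DistributedFileSystem or DFSClient itself.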
[jira] [Created] (HDFS-7354) Support parity blocks in block management
Zhe Zhang created HDFS-7354: --- Summary: Support parity blocks in block management Key: HDFS-7354 URL: https://issues.apache.org/jira/browse/HDFS-7354 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Zhe Zhang Assignee: Zhe Zhang In the current block management system, each block is associated with a file. Orphan blocks are considered corrupt and will be removed. In this JIRA we extend {{Block}} with a binary flag denoting whether it is a parity block ({{isParity}}). Parity blocks are created, stored, and reported the same way as raw ones. They have regular block IDs which are unrelated to those of the raw blocks in the same group; their replicas (normally only 1) are stored in RBW and finalized directories on the DataNode depending on the stage; they are also included in block reports. The only distinction of a parity block is the lack of file affiliation. The block management system will be aware of parity blocks and will _not_ try to remove them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
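The proposed {{isParity}} flag changes only one decision: an orphan raw block is corrupt and removable, while an orphan parity block is expected and kept. A minimal sketch (block descriptor and method names hypothetical, not the actual BlockManager code):

```java
import java.util.ArrayList;
import java.util.List;

public class ParityBlockSketch {
    // Hypothetical block descriptor carrying the proposed isParity flag.
    static class Block {
        final long id;
        final boolean isParity;   // true: no file affiliation, by design
        final Object fileOwner;   // null when the block belongs to no file
        Block(long id, boolean isParity, Object fileOwner) {
            this.id = id;
            this.isParity = isParity;
            this.fileOwner = fileOwner;
        }
    }

    // Orphan raw blocks are scheduled for removal; parity blocks, which are
    // orphans by design, are deliberately kept.
    static List<Block> orphansToRemove(List<Block> all) {
        List<Block> out = new ArrayList<>();
        for (Block b : all) {
            if (!b.isParity && b.fileOwner == null) {
                out.add(b);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Block> blocks = List.of(
            new Block(1, false, "someFile"),  // raw block with an owner: kept
            new Block(2, false, null),        // raw orphan: removed
            new Block(3, true, null));        // parity block: kept
        System.out.println(orphansToRemove(blocks).size());
    }
}
```

Everything else (block IDs, RBW/finalized directories, block reports) stays on the existing code paths, as the description says.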
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196619#comment-14196619 ] Colin Patrick McCabe commented on HDFS-3107: bq. There is the option to treat the combined patch HDFS-3107--7056 as the first patch, which accounts for upgrade and rollback functionality as well as snapshot support, demonstrated in unit test. That's fine with me. It can go into trunk directly if it doesn't break rollback + snapshots. bq. I am not objecting to do work on a branch but I am unsure it is necessary given the combined patch seems to meet the support requirements asked for this work. I suggested a branch since I thought it would let us commit things quicker. But I don't think it's necessary if you can do things without breaking trunk. It is going to be no more than 3-4 patches anyway as I understand. Whatever is easiest for you guys. Just one request: Can you post the combined patch on a subtask rather than this JIRA? I think having patches on this umbrella jira is very confusing. If you're going to combine the patches, post the combined patch on either HDFS-7341 or HDFS-7056 please. Thanks. HDFS truncate - Key: HDFS-3107 URL: https://issues.apache.org/jira/browse/HDFS-3107 Project: Hadoop HDFS Issue Type: New Feature Components: datanode, namenode Reporter: Lei Chang Assignee: Plamen Jeliazkov Attachments: HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, editsStored, editsStored.xml Original Estimate: 1,344h Remaining Estimate: 1,344h Systems with transaction support often need to undo changes made to the underlying storage when a transaction is aborted. 
Currently HDFS does not support truncate (a standard POSIX operation), which is the reverse operation of append; this makes upper-layer applications use ugly workarounds (such as keeping track of the discarded byte range per file in a separate metadata store, and periodically running a vacuum process to rewrite compacted files) to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6982) nntop: top-like tool for name node users
[ https://issues.apache.org/jira/browse/HDFS-6982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196620#comment-14196620 ] Haohui Mai commented on HDFS-6982: --
{code}
+if (bucket.isStaleNow(time)) {
+  bucket.safeReset(time);
+}
{code}
Maybe I'm missing something, but it looks like it resets the bucket at every interval, causing hiccups in the data. It might make more sense to use a decay function in this case:
{code}
bucket <- alpha * bucket + delta
{code}
where 0 < alpha < 1. Assuming that the requests follow a Poisson distribution, you can calculate alpha w.r.t. each window based on the timespan of the delta. nntop: top-like tool for name node users - Key: HDFS-6982 URL: https://issues.apache.org/jira/browse/HDFS-6982 Project: Hadoop HDFS Issue Type: New Feature Reporter: Maysam Yabandeh Assignee: Maysam Yabandeh Attachments: HDFS-6982.patch, HDFS-6982.v2.patch, HDFS-6982.v3.patch, HDFS-6982.v4.patch, HDFS-6982.v5.patch, HDFS-6982.v6.patch, nntop-design-v1.pdf In this jira we motivate the need for nntop, a tool that, similarly to what top does in Linux, gives the list of top users of the HDFS name node and gives insight about which users are sending majority of each traffic type to the name node. This information turns out to be the most critical when the name node is under pressure and the HDFS admin needs to know which user is hammering the name node and with what kind of requests. Here we present the design of nntop which has been in production at Twitter in the past 10 months. nntop proved to have low cpu overhead (< 2% in a cluster of 4K nodes), low memory footprint (less than a few MB), and to be quite efficient for the write path (only two hash lookups for updating a metric). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
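The decay function Haohui suggests can be sketched as a tiny counter. This is an illustration of the suggestion only, with hypothetical names, not nntop code: at each window boundary the bucket is multiplied by alpha instead of being zeroed, so old traffic fades out gradually rather than dropping to zero.

```java
public class DecayCounter {
    private double bucket = 0.0;
    private final double alpha;   // 0 < alpha < 1: how much history survives a window

    DecayCounter(double alpha) {
        this.alpha = alpha;
    }

    // At each window boundary, decay the old value instead of zeroing it:
    // bucket <- alpha * bucket + delta
    void endOfWindow(double delta) {
        bucket = alpha * bucket + delta;
    }

    double value() {
        return bucket;
    }

    public static void main(String[] args) {
        DecayCounter c = new DecayCounter(0.5);
        c.endOfWindow(10);   // one busy window
        c.endOfWindow(0);    // an idle window: value fades to 5.0 instead of dropping to 0
        System.out.println(c.value());
    }
}
```

This is the "no hiccups" property: a metrics reader never observes a freshly zeroed bucket, only a smoothly decaying one.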
[jira] [Updated] (HDFS-7199) DFSOutputStream can silently drop data if DataStreamer crashes with a non-I/O exception
[ https://issues.apache.org/jira/browse/HDFS-7199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rushabh S Shah updated HDFS-7199: - Status: Open (was: Patch Available) Need to address Colin's comments. DFSOutputStream can silently drop data if DataStreamer crashes with a non-I/O exception --- Key: HDFS-7199 URL: https://issues.apache.org/jira/browse/HDFS-7199 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.5.0 Reporter: Jason Lowe Assignee: Rushabh S Shah Priority: Critical Attachments: HDFS-7199-WIP.patch, HDFS-7199.patch If the DataStreamer thread encounters a non-I/O exception then it closes the output stream but does not set lastException. When the client later calls close on the output stream then it will see the stream is already closed with lastException == null, mistakenly think this is a redundant close call, and fail to report any error to the client. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7199) DFSOutputStream can silently drop data if DataStreamer crashes with a non-I/O exception
[ https://issues.apache.org/jira/browse/HDFS-7199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rushabh S Shah updated HDFS-7199: - Status: Patch Available (was: Open) DFSOutputStream can silently drop data if DataStreamer crashes with a non-I/O exception --- Key: HDFS-7199 URL: https://issues.apache.org/jira/browse/HDFS-7199 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.5.0 Reporter: Jason Lowe Assignee: Rushabh S Shah Priority: Critical Attachments: HDFS-7199-1.patch, HDFS-7199-WIP.patch, HDFS-7199.patch If the DataStreamer thread encounters a non-I/O exception then it closes the output stream but does not set lastException. When the client later calls close on the output stream then it will see the stream is already closed with lastException == null, mistakenly think this is a redundant close call, and fail to report any error to the client. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7199) DFSOutputStream can silently drop data if DataStreamer crashes with a non-I/O exception
[ https://issues.apache.org/jira/browse/HDFS-7199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rushabh S Shah updated HDFS-7199: - Attachment: HDFS-7199-1.patch Updated patch addressing Colin's comments. DFSOutputStream can silently drop data if DataStreamer crashes with a non-I/O exception --- Key: HDFS-7199 URL: https://issues.apache.org/jira/browse/HDFS-7199 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.5.0 Reporter: Jason Lowe Assignee: Rushabh S Shah Priority: Critical Attachments: HDFS-7199-1.patch, HDFS-7199-WIP.patch, HDFS-7199.patch If the DataStreamer thread encounters a non-I/O exception then it closes the output stream but does not set lastException. When the client later calls close on the output stream then it will see the stream is already closed with lastException == null, mistakenly think this is a redundant close call, and fail to report any error to the client. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6982) nntop: top-like tool for name node users
[ https://issues.apache.org/jira/browse/HDFS-6982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196632#comment-14196632 ] Maysam Yabandeh commented on HDFS-6982: --- Thanks [~wheat9] for the comment. Let me explain how the buckets are employed in the rolling window implementation. The rolling window can compute the total value of the event in the past period of time, let's say a minute. The last minute is divided into multiple buckets, where the buckets are placed in a ring. The total number of events in the last minute is the sum of the values of the buckets. As time rolls forward, a bucket of the last time period is reused for the current time period. Let's say that the bucket that we are writing to was used to accumulate the events of 67 seconds ago. Before we start adding events to that (which will be used to compute the event of the last 60 seconds) we need to zero the content of the bucket. Whether the bucket is stale or not is determined by the #isStaleNow method. Considering the above explanation, let me know if the current implementation of zeroing stale buckets makes sense to you. nntop: top-like tool for name node users - Key: HDFS-6982 URL: https://issues.apache.org/jira/browse/HDFS-6982 Project: Hadoop HDFS Issue Type: New Feature Reporter: Maysam Yabandeh Assignee: Maysam Yabandeh Attachments: HDFS-6982.patch, HDFS-6982.v2.patch, HDFS-6982.v3.patch, HDFS-6982.v4.patch, HDFS-6982.v5.patch, HDFS-6982.v6.patch, nntop-design-v1.pdf In this jira we motivate the need for nntop, a tool that, similarly to what top does in Linux, gives the list of top users of the HDFS name node and gives insight about which users are sending majority of each traffic type to the name node. This information turns out to be the most critical when the name node is under pressure and the HDFS admin needs to know which user is hammering the name node and with what kind of requests.
Here we present the design of nntop which has been in production at Twitter in the past 10 months. nntop proved to have low cpu overhead (< 2% in a cluster of 4K nodes), low memory footprint (less than a few MB), and to be quite efficient for the write path (only two hash lookups for updating a metric). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7334) Fix periodic failures of TestCheckpoint#testTooManyEditReplayFailures
[ https://issues.apache.org/jira/browse/HDFS-7334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Charles Lamb updated HDFS-7334: --- Attachment: HDFS-7334.002.patch [~wheat9], Thanks for the review! The .002 patch changes the set to setInt. Fix periodic failures of TestCheckpoint#testTooManyEditReplayFailures - Key: HDFS-7334 URL: https://issues.apache.org/jira/browse/HDFS-7334 Project: Hadoop HDFS Issue Type: Bug Components: test Reporter: Charles Lamb Assignee: Charles Lamb Priority: Minor Attachments: HDFS-7334.001.patch, HDFS-7334.002.patch TestCheckpoint#testTooManyEditReplayFailures occasionally fails with a test timeout. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6982) nntop: top-like tool for name node users
[ https://issues.apache.org/jira/browse/HDFS-6982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196644#comment-14196644 ] Haohui Mai commented on HDFS-6982: -- bq. Before we start adding events to that (which will be used to compute the event of the last 60 seconds) we need to zero the content of the bucket. Thanks for the explanation. When the bucket is zeroed and the metrics are collected right after it, does it mean that the metrics have smaller numbers? nntop: top-like tool for name node users - Key: HDFS-6982 URL: https://issues.apache.org/jira/browse/HDFS-6982 Project: Hadoop HDFS Issue Type: New Feature Reporter: Maysam Yabandeh Assignee: Maysam Yabandeh Attachments: HDFS-6982.patch, HDFS-6982.v2.patch, HDFS-6982.v3.patch, HDFS-6982.v4.patch, HDFS-6982.v5.patch, HDFS-6982.v6.patch, nntop-design-v1.pdf In this jira we motivate the need for nntop, a tool that, similarly to what top does in Linux, gives the list of top users of the HDFS name node and gives insight about which users are sending majority of each traffic type to the name node. This information turns out to be the most critical when the name node is under pressure and the HDFS admin needs to know which user is hammering the name node and with what kind of requests. Here we present the design of nntop which has been in production at Twitter in the past 10 months. nntop proved to have low cpu overhead (< 2% in a cluster of 4K nodes), low memory footprint (less than a few MB), and to be quite efficient for the write path (only two hash lookups for updating a metric). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7334) Fix periodic failures of TestCheckpoint#testTooManyEditReplayFailures
[ https://issues.apache.org/jira/browse/HDFS-7334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196649#comment-14196649 ] Haohui Mai commented on HDFS-7334: -- +1 pending jenkins. Fix periodic failures of TestCheckpoint#testTooManyEditReplayFailures - Key: HDFS-7334 URL: https://issues.apache.org/jira/browse/HDFS-7334 Project: Hadoop HDFS Issue Type: Bug Components: test Reporter: Charles Lamb Assignee: Charles Lamb Priority: Minor Attachments: HDFS-7334.001.patch, HDFS-7334.002.patch TestCheckpoint#testTooManyEditReplayFailures occasionally fails with a test timeout. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7056) Snapshot support for truncate
[ https://issues.apache.org/jira/browse/HDFS-7056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196655#comment-14196655 ] Jing Zhao commented on HDFS-7056: - Thanks for working on this, [~shv] and [~zero45]. So far I have just gone through the namenode snapshot part (INode, INodeFile, FileDiff, and FileDiffList) and I will continue reviewing the remaining parts.
# Looks like {{findLaterSnapshotWithBlocks}} and {{findEarlierSnapshotWithBlocks}} are always coupled with {{FileDiff#getBlocks}}. Maybe we can combine them so that we can wrap logic like the following code into two methods such as findBlocksAfter and findBlocksBefore?
{code}
+FileDiff diff = getDiffs().getDiffById(snapshot);
+BlockInfo[] snapshotBlocks = diff == null ? getBlocks() : diff.getBlocks();
+if(snapshotBlocks != null)
+  return snapshotBlocks;
+// Blocks are not in the current snapshot
+// Find next snapshot with blocks present or return current file blocks
+diff = getDiffs().findLaterSnapshotWithBlocks(diff.getSnapshotId());
+snapshotBlocks = (diff == null) ? getBlocks() : diff.getBlocks();
{code}
# Since the same block can be included in different file diffs, we may have duplicated blocks in {{collectedBlocks}}. Will this lead to duplicated records in the invalid block list?
{code}
public void destroyAndCollectSnapshotBlocks(
    BlocksMapUpdateInfo collectedBlocks) {
  for(FileDiff d : asList())
    d.destroyAndCollectSnapshotBlocks(collectedBlocks);
}
{code}
# INodeFile#destroyAndCollectBlocks destroys the whole file, including the file diffs for snapshots. Thus we do not need to call {{collectBlocksAndClear}} and define a new destroyAndCollectAllBlocks method. Instead, we can simply first destroy all the blocks belonging to the current file, then check whether calling {{sf.getDiffs().destroyAndCollectSnapshotBlocks}} is necessary.
{code}
+FileWithSnapshotFeature sf = getFileWithSnapshotFeature();
+if(sf == null || getDiffs().asList().isEmpty()) {
+  destroyAndCollectAllBlocks(collectedBlocks, removedINodes);
+  return;
+}
+sf.getDiffs().destroyAndCollectSnapshotBlocks(collectedBlocks);
{code}
# How do we currently calculate/update the quota for a file? I guess we need to update the quota calculation algorithm for an INodeFile here.
# I guess the semantics of {{findEarlierSnapshotWithBlocks}} is to find the FileDiff that satisfies: 1) its block list is not null, and 2) its snapshot id is less than the given {{snapshotId}}. However, if the given {{snapshotId}} is not {{CURRENT_STATE_ID}}, the current implementation may return a FileDiff whose snapshot id is >= the given {{snapshotId}} (since {{getDiffById}} may return a diff with snapshot id greater than the given id).
{code}
public FileDiff findEarlierSnapshotWithBlocks(int snapshotId) {
  FileDiff diff = (snapshotId == Snapshot.CURRENT_STATE_ID) ?
      getLast() : getDiffById(snapshotId);
  BlockInfo[] snapshotBlocks = null;
  while(diff != null) {
    snapshotBlocks = diff.getBlocks();
    if(snapshotBlocks != null)
      break;
    int p = getPrior(diff.getSnapshotId(), true);
    diff = (p == Snapshot.NO_SNAPSHOT_ID) ? null : getDiffById(p);
  }
  return diff;
}
{code}
# Still for findEarlierSnapshotWithBlocks: because {{getPrior}} is currently a {{log\(n\)}} operation, the worst-case time complexity can be {{nlog\(n\)}}. Considering that the snapshot diff list is usually not big (we have an upper limit for the total number of snapshots), we may consider directly doing a linear scan of the file diff list.
# In INode.java, why do we need the following change?
{code}
 public final boolean isInLatestSnapshot(final int latestSnapshotId) {
-if (latestSnapshotId == Snapshot.CURRENT_STATE_ID) {
+if (latestSnapshotId == Snapshot.CURRENT_STATE_ID ||
+    latestSnapshotId == Snapshot.NO_SNAPSHOT_ID) {
{code}
# Nit: need to add \{ and \} for the while loop according to our current coding style. Similar for several other places (e.g., {{FileDiffList#destroyAndCollectSnapshotBlocks}}).
{code}
+while(i < currentBlocks.length && i < snapshotBlocks.length
+    && currentBlocks[i] == snapshotBlocks[i])
+  i++;
+// Collect the remaining blocks of the file
+while(i < currentBlocks.length)
+  collectedBlocks.addDeleteBlock(currentBlocks[i++]);
{code}
# Minor: In the following code, instead of calling {{getDiffById}} to search for the file diff, we can let {{AbstractINodeDiffList#saveSelf2Snapshot}} return the diff it just finds/creates.
{code}
public void saveSelf2Snapshot(int latestSnapshotId, INodeFile iNodeFile,
    INodeFileAttributes snapshotCopy, boolean withBlocks)
    throws QuotaExceededException {
  super.saveSelf2Snapshot(latestSnapshotId, iNodeFile, snapshotCopy);
  if(! withBlocks)
    return;
  final FileDiff diff = getDiffById(latestSnapshotId); //
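The while-loop nit in the review concerns a common-prefix scan: walk past the blocks shared between the current file and the snapshot copy, then collect the rest for deletion. As a self-contained illustration (block IDs and names hypothetical, not the HDFS types), the scan behaves like:

```java
import java.util.ArrayList;
import java.util.List;

public class TruncateCollectSketch {
    // Returns the blocks present in currentBlocks but past the common prefix
    // with snapshotBlocks: the blocks a truncate can schedule for deletion.
    static List<Long> blocksToCollect(long[] currentBlocks, long[] snapshotBlocks) {
        List<Long> collected = new ArrayList<>();
        int i = 0;
        while (i < currentBlocks.length && i < snapshotBlocks.length
            && currentBlocks[i] == snapshotBlocks[i]) {
            i++;
        }
        // Collect the remaining blocks of the file
        while (i < currentBlocks.length) {
            collected.add(currentBlocks[i++]);
        }
        return collected;
    }

    public static void main(String[] args) {
        long[] current = {101, 102, 103, 104};
        long[] snapshot = {101, 102};   // snapshot still references the prefix
        System.out.println(blocksToCollect(current, snapshot));
    }
}
```

Blocks still referenced by the snapshot stay; only the suffix unique to the current file is collected.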
[jira] [Commented] (HDFS-6982) nntop: top-like tool for name node users
[ https://issues.apache.org/jira/browse/HDFS-6982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196664#comment-14196664 ] Maysam Yabandeh commented on HDFS-6982: --- Let me take an example.
Time period: 60 seconds
Bucket duration: 20 seconds
Buckets/window = 3
Event1: time 00:10:55. Current time: 00:11:03. Current sum for the past window (1 min) = 1
Event2: time 00:11:54. Current time: 00:12:07. Current sum for the past window (1 min) = 1
Now for this behavior to be implemented correctly we need to zero the content of bucket number 3, because both Event1 and Event2 map to the same bucket but Event1 is irrelevant at time 00:12:07 since it happened before the last 60 seconds. Makes sense? nntop: top-like tool for name node users - Key: HDFS-6982 URL: https://issues.apache.org/jira/browse/HDFS-6982 Project: Hadoop HDFS Issue Type: New Feature Reporter: Maysam Yabandeh Assignee: Maysam Yabandeh Attachments: HDFS-6982.patch, HDFS-6982.v2.patch, HDFS-6982.v3.patch, HDFS-6982.v4.patch, HDFS-6982.v5.patch, HDFS-6982.v6.patch, nntop-design-v1.pdf In this jira we motivate the need for nntop, a tool that, similarly to what top does in Linux, gives the list of top users of the HDFS name node and gives insight about which users are sending the majority of each traffic type to the name node. This information turns out to be the most critical when the name node is under pressure and the HDFS admin needs to know which user is hammering the name node and with what kind of requests. Here we present the design of nntop, which has been in production at Twitter for the past 10 months. nntop proved to have low CPU overhead (< 2% in a cluster of 4K nodes), low memory footprint (less than a few MB), and to be quite efficient on the write path (only two hash lookups for updating a metric). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
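The bucket-zeroing behavior walked through above can be sketched as follows. This is a hypothetical illustration of the idea (class and method names are invented, not nntop's actual implementation): a window of 60 s is split into 3 buckets of 20 s, and a bucket is reset before reuse when it still holds counts from an older cycle, so an event like Event1 stops being counted once the window has moved past it.

```java
// Illustrative sketch of a bucketed rolling-window counter (not nntop's real code).
// 3 buckets of 20 s cover a 60 s window; a bucket holding a stale period is
// zeroed before it is reused, which drops events older than one full window.
public class RollingWindowSketch {
  static final int BUCKETS = 3;
  static final long BUCKET_MILLIS = 20_000L;
  static final long WINDOW_MILLIS = BUCKETS * BUCKET_MILLIS;

  private final long[] counts = new long[BUCKETS];
  // Start time of the period each bucket currently represents.
  private final long[] bucketStart = new long[BUCKETS];

  /** Record one event occurring at the given timestamp (millis). */
  public void incr(long now) {
    int i = (int) ((now / BUCKET_MILLIS) % BUCKETS);
    long start = (now / BUCKET_MILLIS) * BUCKET_MILLIS;
    if (bucketStart[i] != start) { // bucket still holds an older cycle: reset it
      counts[i] = 0;
      bucketStart[i] = start;
    }
    counts[i]++;
  }

  /** Sum of events recorded within the last full window relative to 'now'. */
  public long sum(long now) {
    long total = 0;
    for (int i = 0; i < BUCKETS; i++) {
      if (now - bucketStart[i] < WINDOW_MILLIS) {
        total += counts[i];
      }
    }
    return total;
  }
}
```

With the example's timestamps, Event1 (00:10:55) and Event2 (00:11:54) land in the same bucket index; Event2's increment zeroes the bucket first, so the sum at 00:12:07 is 1, matching the behavior described.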
[jira] [Commented] (HDFS-7199) DFSOutputStream can silently drop data if DataStreamer crashes with a non-I/O exception
[ https://issues.apache.org/jira/browse/HDFS-7199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1419#comment-1419 ] Colin Patrick McCabe commented on HDFS-7199: +1 pending jenkins DFSOutputStream can silently drop data if DataStreamer crashes with a non-I/O exception --- Key: HDFS-7199 URL: https://issues.apache.org/jira/browse/HDFS-7199 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.5.0 Reporter: Jason Lowe Assignee: Rushabh S Shah Priority: Critical Attachments: HDFS-7199-1.patch, HDFS-7199-WIP.patch, HDFS-7199.patch If the DataStreamer thread encounters a non-I/O exception then it closes the output stream but does not set lastException. When the client later calls close on the output stream then it will see the stream is already closed with lastException == null, mistakenly think this is a redundant close call, and fail to report any error to the client. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7314) Aborted DFSClient's impact on long running service like YARN
[ https://issues.apache.org/jira/browse/HDFS-7314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ming Ma updated HDFS-7314: -- Attachment: HDFS-7314-2.patch Thanks, [~cmccabe]. I have updated the patch based on your suggestion. Aborted DFSClient's impact on long running service like YARN Key: HDFS-7314 URL: https://issues.apache.org/jira/browse/HDFS-7314 Project: Hadoop HDFS Issue Type: Improvement Reporter: Ming Ma Assignee: Ming Ma Attachments: HDFS-7314-2.patch, HDFS-7314.patch It happened in a YARN nodemanager scenario. But it could happen to any long running service that uses a cached instance of DistributedFileSystem. 1. Active NN is under heavy load. So it became unavailable for 10 minutes; any DFSClient request will get ConnectTimeoutException. 2. YARN nodemanager uses DFSClient for certain write operations such as the log aggregator or shared cache in YARN-1492. The DFSClient used by YARN NM's renewLease RPC got ConnectTimeoutException. {noformat} 2014-10-29 01:36:19,559 WARN org.apache.hadoop.hdfs.LeaseRenewer: Failed to renew lease for [DFSClient_NONMAPREDUCE_-550838118_1] for 372 seconds. Aborting ... {noformat} 3. After DFSClient is in Aborted state, YARN NM can't use that cached instance of DistributedFileSystem. {noformat} 2014-10-29 20:26:23,991 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Failed to download rsrc...
java.io.IOException: Filesystem closed
at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:727)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1780)
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1124)
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1120)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1120)
at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:237)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:340)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:57)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
{noformat}
We can make YARN or DFSClient more tolerant to temporary NN unavailability. Given the call stack is YARN -> DistributedFileSystem -> DFSClient, this can be addressed at different layers.
* YARN closes the DistributedFileSystem object when it receives some well-defined exception. Then the next HDFS call will create a new instance of DistributedFileSystem. We have to fix all the places in YARN. Plus other HDFS applications need to address this as well.
* DistributedFileSystem detects an aborted DFSClient and creates a new instance of DFSClient. We will need to fix all the places DistributedFileSystem calls DFSClient.
* After DFSClient gets into the Aborted state, it doesn't have to reject all requests; instead it can retry. If NN is available again it can transition back to a healthy state. Comments?
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
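Of the options listed in HDFS-7314 above, the second (DistributedFileSystem transparently recreating an aborted DFSClient) can be sketched with stand-in types. The {{Client}} interface, the factory, and the "Filesystem closed" message below are illustrative stand-ins for DFSClient internals, not real Hadoop APIs:

```java
import java.util.function.Supplier;

// Illustrative sketch: a wrapper that detects an aborted/closed client and
// transparently recreates it before retrying once. The Client interface and
// the "Filesystem closed" message are invented stand-ins, not Hadoop classes.
public class RecreateOnAbort {
  public interface Client {
    // Throws IllegalStateException("Filesystem closed") when aborted.
    String getFileInfo(String path);
  }

  private final Supplier<Client> factory;
  private Client current;

  public RecreateOnAbort(Supplier<Client> factory) {
    this.factory = factory;
    this.current = factory.get();
  }

  /** Run the call; if the cached client was aborted, recreate it and retry once. */
  public String getFileInfo(String path) {
    try {
      return current.getFileInfo(path);
    } catch (IllegalStateException e) {
      if (!"Filesystem closed".equals(e.getMessage())) {
        throw e; // unrelated failure: propagate
      }
      current = factory.get(); // replace the aborted instance
      return current.getFileInfo(path);
    }
  }
}
```

The trade-off noted in the comment still applies: every DistributedFileSystem-to-DFSClient call site would need this wrapping, which is why the retry-inside-DFSClient option may be simpler.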
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196716#comment-14196716 ] Plamen Jeliazkov commented on HDFS-3107: I will attach it to HDFS-7056 since it has the design doc attached to it and is assigned to me. Thanks [~cmccabe]. HDFS truncate - Key: HDFS-3107 URL: https://issues.apache.org/jira/browse/HDFS-3107 Project: Hadoop HDFS Issue Type: New Feature Components: datanode, namenode Reporter: Lei Chang Assignee: Plamen Jeliazkov Attachments: HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, editsStored, editsStored.xml Original Estimate: 1,344h Remaining Estimate: 1,344h Systems with transaction support often need to undo changes made to the underlying storage when a transaction is aborted. Currently HDFS does not support truncate (a standard Posix operation) which is a reverse operation of append, which makes upper layer applications use ugly workarounds (such as keeping track of the discarded byte range per file in a separate metadata store, and periodically running a vacuum process to rewrite compacted files) to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7056) Snapshot support for truncate
[ https://issues.apache.org/jira/browse/HDFS-7056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Plamen Jeliazkov updated HDFS-7056: --- Attachment: HDFS-3107-HDFS-7056-combined.patch Attaching combined patch here as well. Snapshot support for truncate - Key: HDFS-7056 URL: https://issues.apache.org/jira/browse/HDFS-7056 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: 3.0.0 Reporter: Konstantin Shvachko Assignee: Plamen Jeliazkov Attachments: HDFS-3107-HDFS-7056-combined.patch, HDFS-7056.patch, HDFSSnapshotWithTruncateDesign.docx Implementation of truncate in HDFS-3107 does not allow truncating files which are in a snapshot. It is desirable to be able to truncate and still keep the old file state of the file in the snapshot. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7343) A comprehensive and flexible storage policy engine
[ https://issues.apache.org/jira/browse/HDFS-7343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196759#comment-14196759 ] Andrew Purtell commented on HDFS-7343: -- Most of the ideas mentioned in the description of HDFS-4672 have made it in. Might be worth examining the remainder in the context of this issue. (Or not.) A comprehensive and flexible storage policy engine -- Key: HDFS-7343 URL: https://issues.apache.org/jira/browse/HDFS-7343 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Kai Zheng As discussed in HDFS-7285, it would be better to have a comprehensive and flexible storage policy engine considering file attributes, metadata, data temperature, storage type, EC codec, available hardware capabilities, user/application preference and etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7335) Redundant checkOperation() in FSN.analyzeFileState()
[ https://issues.apache.org/jira/browse/HDFS-7335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Milan Desai updated HDFS-7335: -- Status: Patch Available (was: Open) Redundant checkOperation() in FSN.analyzeFileState() Key: HDFS-7335 URL: https://issues.apache.org/jira/browse/HDFS-7335 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.5.1 Reporter: Konstantin Shvachko Assignee: Milan Desai Labels: newbie Attachments: HDFS-7335.patch FSN.analyzeFileState() should not call checkOperation(). It is already properly checked before the call. First time as READ category, second time as WRITE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7355) TestDataNodeVolumeFailure#testUnderReplicationAfterVolFailure fails on Windows, because we cannot deny access to the file owner.
[ https://issues.apache.org/jira/browse/HDFS-7355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-7355: Status: Patch Available (was: Open) TestDataNodeVolumeFailure#testUnderReplicationAfterVolFailure fails on Windows, because we cannot deny access to the file owner. Key: HDFS-7355 URL: https://issues.apache.org/jira/browse/HDFS-7355 Project: Hadoop HDFS Issue Type: Test Components: test Reporter: Chris Nauroth Assignee: Chris Nauroth Priority: Trivial Attachments: HDFS-7355.1.patch {{TestDataNodeVolumeFailure#testUnderReplicationAfterVolFailure}} fails on Windows. The test attempts to simulate volume failure by denying permissions to data volume directories. This doesn't work on Windows, because Windows allows the file owner access regardless of the permission settings. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7355) TestDataNodeVolumeFailure#testUnderReplicationAfterVolFailure fails on Windows, because we cannot deny access to the file owner.
[ https://issues.apache.org/jira/browse/HDFS-7355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-7355: Attachment: HDFS-7355.1.patch The attached patch skips the test. TestDataNodeVolumeFailure#testUnderReplicationAfterVolFailure fails on Windows, because we cannot deny access to the file owner. Key: HDFS-7355 URL: https://issues.apache.org/jira/browse/HDFS-7355 Project: Hadoop HDFS Issue Type: Test Components: test Reporter: Chris Nauroth Assignee: Chris Nauroth Priority: Trivial Attachments: HDFS-7355.1.patch {{TestDataNodeVolumeFailure#testUnderReplicationAfterVolFailure}} fails on Windows. The test attempts to simulate volume failure by denying permissions to data volume directories. This doesn't work on Windows, because Windows allows the file owner access regardless of the permission settings. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7355) TestDataNodeVolumeFailure#testUnderReplicationAfterVolFailure fails on Windows, because we cannot deny access to the file owner.
Chris Nauroth created HDFS-7355: --- Summary: TestDataNodeVolumeFailure#testUnderReplicationAfterVolFailure fails on Windows, because we cannot deny access to the file owner. Key: HDFS-7355 URL: https://issues.apache.org/jira/browse/HDFS-7355 Project: Hadoop HDFS Issue Type: Test Components: test Reporter: Chris Nauroth Assignee: Chris Nauroth Priority: Trivial {{TestDataNodeVolumeFailure#testUnderReplicationAfterVolFailure}} fails on Windows. The test attempts to simulate volume failure by denying permissions to data volume directories. This doesn't work on Windows, because Windows allows the file owner access regardless of the permission settings. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7335) Redundant checkOperation() in FSN.analyzeFileState()
[ https://issues.apache.org/jira/browse/HDFS-7335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Milan Desai updated HDFS-7335: -- Attachment: HDFS-7335.patch Attaching a patch that removes the checkOperation call in FSNamesystem.analyzeFileState. No tests added as this change is trivial. Redundant checkOperation() in FSN.analyzeFileState() Key: HDFS-7335 URL: https://issues.apache.org/jira/browse/HDFS-7335 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.5.1 Reporter: Konstantin Shvachko Assignee: Milan Desai Labels: newbie Attachments: HDFS-7335.patch FSN.analyzeFileState() should not call checkOperation(). It is already properly checked before the call. First time as READ category, second time as WRITE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7355) TestDataNodeVolumeFailure#testUnderReplicationAfterVolFailure fails on Windows, because we cannot deny access to the file owner.
[ https://issues.apache.org/jira/browse/HDFS-7355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196734#comment-14196734 ] Chris Nauroth commented on HDFS-7355: - http://technet.microsoft.com/en-us/library/cc783530(v=ws.10).aspx Quoting the relevant section: {quote} Permissions enable the owner of each secured object, such as a file, Active Directory object, or registry key, to control who can perform an operation or a set of operations on the object or object property. Because access to an object is at the owner’s discretion, the type of access control that is used in Windows Server 2003 is called discretionary access control. An owner of an object always has the ability to read and change permissions on the object.{quote} We'll need to skip this test on Windows. TestDataNodeVolumeFailure#testUnderReplicationAfterVolFailure fails on Windows, because we cannot deny access to the file owner. Key: HDFS-7355 URL: https://issues.apache.org/jira/browse/HDFS-7355 Project: Hadoop HDFS Issue Type: Test Components: test Reporter: Chris Nauroth Assignee: Chris Nauroth Priority: Trivial Attachments: HDFS-7355.1.patch {{TestDataNodeVolumeFailure#testUnderReplicationAfterVolFailure}} fails on Windows. The test attempts to simulate volume failure by denying permissions to data volume directories. This doesn't work on Windows, because Windows allows the file owner access regardless of the permission settings. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7208) NN doesn't schedule replication when a DN storage fails
[ https://issues.apache.org/jira/browse/HDFS-7208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196752#comment-14196752 ] Chris Nauroth commented on HDFS-7208: - The new test cannot work correctly on Windows. See HDFS-7355 for a full explanation and a trivial patch to skip the test on Windows. NN doesn't schedule replication when a DN storage fails --- Key: HDFS-7208 URL: https://issues.apache.org/jira/browse/HDFS-7208 Project: Hadoop HDFS Issue Type: Bug Components: namenode Reporter: Ming Ma Assignee: Ming Ma Fix For: 2.6.0 Attachments: HDFS-7208-2.patch, HDFS-7208-3.patch, HDFS-7208.patch We found the following problem. When a storage device on a DN fails, NN continues to believe replicas of those blocks on that storage are valid and doesn't schedule replication. A DN has 12 storage disks. So there is one blockReport for each storage. When a disk fails, # of blockReport from that DN is reduced from 12 to 11. Given dfs.datanode.failed.volumes.tolerated is configured to be 0, NN still considers that DN healthy. 1. A disk failed. All blocks of that disk are removed from DN dataset. {noformat} 2014-10-04 02:11:12,626 WARN org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Removing replica BP-1748500278-xx.xx.xx.xxx-1377803467793:1121568886 on failed volume /data/disk6/dfs/current {noformat} 2. NN receives DatanodeProtocol.DISK_ERROR. But that isn't enough to have NN remove the DN and the replicas from the BlocksMap. In addition, blockReport doesn't provide the diff given that is done per storage. {noformat} 2014-10-04 02:11:12,681 WARN org.apache.hadoop.hdfs.server.namenode.NameNode: Disk error on DatanodeRegistration(xx.xx.xx.xxx, datanodeUuid=f3b8a30b-e715-40d6-8348-3c766f9ba9ab, infoPort=50075, ipcPort=50020, storageInfo=lv=-55;cid=CID-e3c38355-fde5-4e3a-b7ce-edacebdfa7a1;nsid=420527250;c=1410283484939): DataNode failed volumes:/data/disk6/dfs/current {noformat} 3. 
Run fsck on the file and confirm the NN's BlocksMap still has that replica. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7335) Redundant checkOperation() in FSN.analyzeFileState()
[ https://issues.apache.org/jira/browse/HDFS-7335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196781#comment-14196781 ] Hadoop QA commented on HDFS-7335: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12679304/HDFS-7335.patch against trunk revision 1eed102. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:red}-1 eclipse:eclipse{color}. The patch failed to build with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8644//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8644//console This message is automatically generated. Redundant checkOperation() in FSN.analyzeFileState() Key: HDFS-7335 URL: https://issues.apache.org/jira/browse/HDFS-7335 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.5.1 Reporter: Konstantin Shvachko Assignee: Milan Desai Labels: newbie Attachments: HDFS-7335.patch FSN.analyzeFileState() should not call checkOperation(). It is already properly checked before the call. 
First time as READ category, second time as WRITE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7356) Use DirectoryListing.hasMore() directly in nfs
Haohui Mai created HDFS-7356: Summary: Use DirectoryListing.hasMore() directly in nfs Key: HDFS-7356 URL: https://issues.apache.org/jira/browse/HDFS-7356 Project: Hadoop HDFS Issue Type: Bug Reporter: Haohui Mai Priority: Minor In NFS the following code path can be simplified using {{DirectoryListing.hasMore()}}:
{code}
boolean eof = (n < fstatus.length) ? false :
    (dlisting.getRemainingEntries() == 0);
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
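The equivalence behind the suggested simplification can be sketched as follows. The {{DirectoryListing}} stub below is a stand-in for the real class in org.apache.hadoop.hdfs.protocol; the only property it assumes is that {{hasMore()}} means {{getRemainingEntries() != 0}}:

```java
// Sketch of the suggested simplification. DirectoryListing here is a stub
// standing in for the HDFS class; it only assumes hasMore() is equivalent
// to getRemainingEntries() != 0.
public class EofSimplification {
  public static class DirectoryListing {
    private final int remaining;
    public DirectoryListing(int remaining) { this.remaining = remaining; }
    public int getRemainingEntries() { return remaining; }
    public boolean hasMore() { return remaining != 0; }
  }

  /** Original form: not eof while entries from this batch remain unconsumed. */
  public static boolean eofOriginal(int n, int length, DirectoryListing dlisting) {
    return (n < length) ? false : (dlisting.getRemainingEntries() == 0);
  }

  /** Equivalent form using hasMore() directly. */
  public static boolean eofSimplified(int n, int length, DirectoryListing dlisting) {
    return n >= length && !dlisting.hasMore();
  }
}
```

The ternary collapses to a single boolean expression: eof holds only when the current batch is exhausted and the listing reports no remaining entries.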
[jira] [Assigned] (HDFS-7356) Use DirectoryListing.hasMore() directly in nfs
[ https://issues.apache.org/jira/browse/HDFS-7356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu reassigned HDFS-7356: --- Assignee: Li Lu Use DirectoryListing.hasMore() directly in nfs -- Key: HDFS-7356 URL: https://issues.apache.org/jira/browse/HDFS-7356 Project: Hadoop HDFS Issue Type: Bug Reporter: Haohui Mai Assignee: Li Lu Priority: Minor In NFS the following code path can be simplified using {{DirectoryListing.hasMore()}}:
{code}
boolean eof = (n < fstatus.length) ? false :
    (dlisting.getRemainingEntries() == 0);
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7233) NN logs unnecessary org.apache.hadoop.hdfs.protocol.UnresolvedPathException
[ https://issues.apache.org/jira/browse/HDFS-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196806#comment-14196806 ] Rushabh S Shah commented on HDFS-7233: -- All the tests pass on my local setup. NN logs unnecessary org.apache.hadoop.hdfs.protocol.UnresolvedPathException --- Key: HDFS-7233 URL: https://issues.apache.org/jira/browse/HDFS-7233 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.5.1 Reporter: Rushabh S Shah Assignee: Rushabh S Shah Attachments: HDFS-7233.patch Namenode logs the UnresolvedPathException even though that file exists in HDFS. Each time a symlink is accessed the NN will throw an UnresolvedPathException to have the client resolve it. This shouldn't be logged in the NN log and we could have really large NN logs if we don't fix this since every MR job on the cluster will access this symlink and cause a stacktrace to be logged. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7357) FSNamesystem.checkFileProgress should log file path
Tsz Wo Nicholas Sze created HDFS-7357: - Summary: FSNamesystem.checkFileProgress should log file path Key: HDFS-7357 URL: https://issues.apache.org/jira/browse/HDFS-7357 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor There is a log message in FSNamesystem.checkFileProgress for incomplete blocks. However, the log message does not include the file path. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7356) Use DirectoryListing.hasMore() directly in nfs
[ https://issues.apache.org/jira/browse/HDFS-7356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu updated HDFS-7356: Attachment: HDFS-7356-110414.patch Hi [~wheat9], I've fixed this in my patch. If you have time please feel free to have a look at it. Thanks! Use DirectoryListing.hasMore() directly in nfs -- Key: HDFS-7356 URL: https://issues.apache.org/jira/browse/HDFS-7356 Project: Hadoop HDFS Issue Type: Bug Reporter: Haohui Mai Assignee: Li Lu Priority: Minor Attachments: HDFS-7356-110414.patch In NFS the following code path can be simplified using {{DirectoryListing.hasMore()}}:
{code}
boolean eof = (n < fstatus.length) ? false :
    (dlisting.getRemainingEntries() == 0);
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7356) Use DirectoryListing.hasMore() directly in nfs
[ https://issues.apache.org/jira/browse/HDFS-7356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu updated HDFS-7356: Status: Patch Available (was: Open) Use DirectoryListing.hasMore() directly in nfs -- Key: HDFS-7356 URL: https://issues.apache.org/jira/browse/HDFS-7356 Project: Hadoop HDFS Issue Type: Bug Reporter: Haohui Mai Assignee: Li Lu Priority: Minor Attachments: HDFS-7356-110414.patch In NFS the following code path can be simplified using {{DirectoryListing.hasMore()}}:
{code}
boolean eof = (n < fstatus.length) ? false :
    (dlisting.getRemainingEntries() == 0);
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7356) Use DirectoryListing.hasMore() directly in nfs
[ https://issues.apache.org/jira/browse/HDFS-7356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196821#comment-14196821 ] Jing Zhao commented on HDFS-7356: - +1 pending Jenkins Use DirectoryListing.hasMore() directly in nfs -- Key: HDFS-7356 URL: https://issues.apache.org/jira/browse/HDFS-7356 Project: Hadoop HDFS Issue Type: Bug Reporter: Haohui Mai Assignee: Li Lu Priority: Minor Attachments: HDFS-7356-110414.patch In NFS the following code path can be simplified using {{DirectoryListing.hasMore()}}:
{code}
boolean eof = (n < fstatus.length) ? false :
    (dlisting.getRemainingEntries() == 0);
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7357) FSNamesystem.checkFileProgress should log file path
[ https://issues.apache.org/jira/browse/HDFS-7357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-7357: -- Status: Patch Available (was: Open) FSNamesystem.checkFileProgress should log file path --- Key: HDFS-7357 URL: https://issues.apache.org/jira/browse/HDFS-7357 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor Attachments: h7357_20141104.patch There is a log message in FSNamesystem.checkFileProgress for incomplete blocks. However, the log message does not include the file path. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7357) FSNamesystem.checkFileProgress should log file path
[ https://issues.apache.org/jira/browse/HDFS-7357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-7357: -- Attachment: h7357_20141104.patch h7357_20141104.patch: - add path and other info to the log messages in checkFileProgress; - replace FSNamesystem.LOG with LOG; - avoid printing block pool Id; - slightly clean up some other log messages. FSNamesystem.checkFileProgress should log file path --- Key: HDFS-7357 URL: https://issues.apache.org/jira/browse/HDFS-7357 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor Attachments: h7357_20141104.patch There is a log message in FSNamesystem.checkFileProgress for incomplete blocks. However, the log message does not include the file path. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7357) FSNamesystem.checkFileProgress should log file path
[ https://issues.apache.org/jira/browse/HDFS-7357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196834#comment-14196834 ] Haohui Mai commented on HDFS-7357: -- +1 pending jenkins. FSNamesystem.checkFileProgress should log file path --- Key: HDFS-7357 URL: https://issues.apache.org/jira/browse/HDFS-7357 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor Attachments: h7357_20141104.patch There is a log message in FSNamesystem.checkFileProgress for incomplete blocks. However, the log message does not include the file path. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7355) TestDataNodeVolumeFailure#testUnderReplicationAfterVolFailure fails on Windows, because we cannot deny access to the file owner.
[ https://issues.apache.org/jira/browse/HDFS-7355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196849#comment-14196849 ] Ming Ma commented on HDFS-7355: --- Thanks, [~cnauroth]. The patch looks good. BTW, it seems some other test cases use {{assumeTrue(!System.getProperty("os.name").startsWith("Windows"));}}. Perhaps this came up before: if we want to make unit tests pass on other non-Linux OSes, should we set up Jenkins builds for that? TestDataNodeVolumeFailure#testUnderReplicationAfterVolFailure fails on Windows, because we cannot deny access to the file owner. Key: HDFS-7355 URL: https://issues.apache.org/jira/browse/HDFS-7355 Project: Hadoop HDFS Issue Type: Test Components: test Reporter: Chris Nauroth Assignee: Chris Nauroth Priority: Trivial Attachments: HDFS-7355.1.patch {{TestDataNodeVolumeFailure#testUnderReplicationAfterVolFailure}} fails on Windows. The test attempts to simulate volume failure by denying permissions to data volume directories. This doesn't work on Windows, because Windows allows the file owner access regardless of the permission settings. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7355) TestDataNodeVolumeFailure#testUnderReplicationAfterVolFailure fails on Windows, because we cannot deny access to the file owner.
[ https://issues.apache.org/jira/browse/HDFS-7355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196855#comment-14196855 ] Haohui Mai commented on HDFS-7355: -- +1 pending jenkins. TestDataNodeVolumeFailure#testUnderReplicationAfterVolFailure fails on Windows, because we cannot deny access to the file owner. Key: HDFS-7355 URL: https://issues.apache.org/jira/browse/HDFS-7355 Project: Hadoop HDFS Issue Type: Test Components: test Reporter: Chris Nauroth Assignee: Chris Nauroth Priority: Trivial Attachments: HDFS-7355.1.patch {{TestDataNodeVolumeFailure#testUnderReplicationAfterVolFailure}} fails on Windows. The test attempts to simulate volume failure by denying permissions to data volume directories. This doesn't work on Windows, because Windows allows the file owner access regardless of the permission settings. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7056) Snapshot support for truncate
[ https://issues.apache.org/jira/browse/HDFS-7056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196891#comment-14196891 ] Konstantin Shvachko commented on HDFS-7056: --- {quote}Actually Guo and I have finished the POC for this several months ago. But we couldn't open source it{quote} Hi Hu. It shows indeed in the design. Too bad you couldn't open source yours. Hope ours is similar. I know at least the getBlocks(snapshotId) method is in common :-) Looking at Jing's comments. Snapshot support for truncate - Key: HDFS-7056 URL: https://issues.apache.org/jira/browse/HDFS-7056 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: 3.0.0 Reporter: Konstantin Shvachko Assignee: Plamen Jeliazkov Attachments: HDFS-3107-HDFS-7056-combined.patch, HDFS-7056.patch, HDFSSnapshotWithTruncateDesign.docx Implementation of truncate in HDFS-3107 does not allow truncating files which are in a snapshot. It is desirable to be able to truncate and still keep the old file state of the file in the snapshot. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7295) Support arbitrary max expiration times for delegation token
[ https://issues.apache.org/jira/browse/HDFS-7295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196896#comment-14196896 ] Allen Wittenauer commented on HDFS-7295: FWIW, I *do* think the max lifespan should be configurable. But letting that absolute max time span be up to the user is suicide for security. Support arbitrary max expiration times for delegation token --- Key: HDFS-7295 URL: https://issues.apache.org/jira/browse/HDFS-7295 Project: Hadoop HDFS Issue Type: Improvement Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Currently the max lifetime of HDFS delegation tokens is hardcoded to 7 days. This is a problem for different users of HDFS, such as long-running YARN apps. Users should be allowed to optionally specify a max lifetime for their tokens. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
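The tension in the comment above can be sketched as a server-side cap: a user may request any lifetime, but the effective lifetime is clamped to an administrator-configured absolute maximum, which is never under user control. This is a minimal illustrative sketch, not HDFS code; all names are hypothetical:

```java
// Hypothetical sketch of clamping a user-requested delegation-token lifetime
// to a server-side absolute maximum, reflecting the point that the absolute
// cap must stay under administrator control.
public class TokenLifetimePolicy {
    // The current hardcoded default: 7 days, in milliseconds.
    static final long DEFAULT_MAX_LIFETIME_MS = 7L * 24 * 60 * 60 * 1000;

    final long serverMaxLifetimeMs; // configured by the admin, never the user

    TokenLifetimePolicy(long serverMaxLifetimeMs) {
        this.serverMaxLifetimeMs = serverMaxLifetimeMs;
    }

    // A user may request a shorter lifetime, but can never exceed the cap.
    long effectiveLifetime(long requestedMs) {
        return Math.min(requestedMs, serverMaxLifetimeMs);
    }
}
```

Under this model, making the cap itself a configuration knob satisfies long-running apps without handing the absolute maximum to the token requester.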
[jira] [Updated] (HDFS-7231) rollingupgrade needs some guard rails
[ https://issues.apache.org/jira/browse/HDFS-7231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-7231: --- Target Version/s: 2.6.0 rollingupgrade needs some guard rails - Key: HDFS-7231 URL: https://issues.apache.org/jira/browse/HDFS-7231 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: Allen Wittenauer Priority: Blocker See first comment. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7335) Redundant checkOperation() in FSN.analyzeFileState()
[ https://issues.apache.org/jira/browse/HDFS-7335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Milan Desai updated HDFS-7335: -- Attachment: HDFS-7335.patch New patch removes git diff prefix Redundant checkOperation() in FSN.analyzeFileState() Key: HDFS-7335 URL: https://issues.apache.org/jira/browse/HDFS-7335 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.5.1 Reporter: Konstantin Shvachko Assignee: Milan Desai Labels: newbie Attachments: HDFS-7335.patch, HDFS-7335.patch FSN.analyzeFileState() should not call checkOperation(). It is already properly checked before the call. First time as READ category, second time as WRITE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
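The redundancy being removed can be illustrated with a stripped-down version of the call pattern (hypothetical names; this is not the actual FSNamesystem code):

```java
// Hypothetical sketch of the pattern: the operation category is already
// verified at the call sites before analyzeFileState() runs (first as READ,
// then as WRITE), so a third check inside the helper adds nothing.
public class CheckOperationSketch {
    enum OperationCategory { READ, WRITE }

    static int checks = 0;

    // Stand-in for the real HA-state check.
    static void checkOperation(OperationCategory op) {
        checks++;
    }

    static void analyzeFileState() {
        // after the patch: no redundant checkOperation() call in here
    }

    static void caller() {
        checkOperation(OperationCategory.READ);   // first check, READ
        checkOperation(OperationCategory.WRITE);  // second check, WRITE
        analyzeFileState();                       // already guarded
    }
}
```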
[jira] [Updated] (HDFS-7279) Use netty to implement DatanodeWebHdfsMethods
[ https://issues.apache.org/jira/browse/HDFS-7279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-7279: - Attachment: HDFS-7279.004.patch Use netty to implement DatanodeWebHdfsMethods - Key: HDFS-7279 URL: https://issues.apache.org/jira/browse/HDFS-7279 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, webhdfs Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-7279.000.patch, HDFS-7279.001.patch, HDFS-7279.002.patch, HDFS-7279.003.patch, HDFS-7279.004.patch Currently the DN implements all related webhdfs functionality using jetty. Because the jetty version the DN currently uses (jetty 6) lacks fine-grained buffer and connection management, the DN often suffers from long latency and OOM when its webhdfs component is under sustained heavy load. This jira proposes to implement the webhdfs component in the DN using netty, which can be more efficient and allows finer-grained control over webhdfs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7199) DFSOutputStream can silently drop data if DataStreamer crashes with a non-I/O exception
[ https://issues.apache.org/jira/browse/HDFS-7199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196933#comment-14196933 ] Hadoop QA commented on HDFS-7199: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12679281/HDFS-7199-1.patch against trunk revision 1eed102. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:red}-1 javac{color}. The applied patch generated 1224 javac compiler warnings (more than the trunk's current 1223 warnings). {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.TestLeaseRecovery2 The following test timeouts occurred in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.TestFileCreation {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8640//testReport/ Javac warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/8640//artifact/patchprocess/diffJavacWarnings.txt Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8640//console This message is automatically generated. 
DFSOutputStream can silently drop data if DataStreamer crashes with a non-I/O exception --- Key: HDFS-7199 URL: https://issues.apache.org/jira/browse/HDFS-7199 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.5.0 Reporter: Jason Lowe Assignee: Rushabh S Shah Priority: Critical Attachments: HDFS-7199-1.patch, HDFS-7199-WIP.patch, HDFS-7199.patch If the DataStreamer thread encounters a non-I/O exception, it closes the output stream but does not set lastException. When the client later calls close on the output stream, it will see the stream is already closed with lastException == null, mistakenly think this is a redundant close call, and fail to report any error to the client. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
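The failure mode described above reduces to a simple pattern: a background streamer that records only certain exception types leaves lastException null when it dies from anything else, so a later close() looks like a harmless redundant call. A minimal sketch under those assumptions (the class, fields, and methods here are hypothetical, not the real DFSOutputStream):

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch of the bug and its fix: record *any* failure before
// marking the stream closed, so close() can surface it to the caller.
public class StreamerSketch {
    final AtomicReference<Throwable> lastException = new AtomicReference<>();
    volatile boolean closed = false;

    // Buggy variant: a RuntimeException closes the stream without
    // recording anything, so lastException stays null.
    void runBuggy(Runnable work) {
        try {
            work.run();
        } catch (RuntimeException e) {
            // bug: failure swallowed, lastException never set
        } finally {
            closed = true;
        }
    }

    // Fixed variant: record any failure before the stream is marked closed.
    void runFixed(Runnable work) {
        try {
            work.run();
        } catch (Throwable t) {
            lastException.compareAndSet(null, t);
        } finally {
            closed = true;
        }
    }

    // Models close(): true iff an error is surfaced to the caller. With
    // lastException == null, a dead stream is indistinguishable from a
    // cleanly closed one and the data loss goes unreported.
    boolean closeReportsError() {
        return closed && lastException.get() != null;
    }
}
```

In the buggy variant the crash is silently dropped; in the fixed variant close() sees the recorded exception and can rethrow it.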
[jira] [Commented] (HDFS-7334) Fix periodic failures of TestCheckpoint#testTooManyEditReplayFailures
[ https://issues.apache.org/jira/browse/HDFS-7334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196934#comment-14196934 ] Hadoop QA commented on HDFS-7334: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12679284/HDFS-7334.002.patch against trunk revision 1eed102. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The following test timeouts occurred in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.TestFileCreation {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8641//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8641//console This message is automatically generated. Fix periodic failures of TestCheckpoint#testTooManyEditReplayFailures - Key: HDFS-7334 URL: https://issues.apache.org/jira/browse/HDFS-7334 Project: Hadoop HDFS Issue Type: Bug Components: test Reporter: Charles Lamb Assignee: Charles Lamb Priority: Minor Attachments: HDFS-7334.001.patch, HDFS-7334.002.patch TestCheckpoint#testTooManyEditReplayFailures occasionally fails with a test timeout. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7334) Fix periodic failures of TestCheckpoint#testTooManyEditReplayFailures
[ https://issues.apache.org/jira/browse/HDFS-7334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196944#comment-14196944 ] Haohui Mai commented on HDFS-7334: -- The test report does not have the failure. I'll commit this patch shortly. Fix periodic failures of TestCheckpoint#testTooManyEditReplayFailures - Key: HDFS-7334 URL: https://issues.apache.org/jira/browse/HDFS-7334 Project: Hadoop HDFS Issue Type: Bug Components: test Reporter: Charles Lamb Assignee: Charles Lamb Priority: Minor Attachments: HDFS-7334.001.patch, HDFS-7334.002.patch TestCheckpoint#testTooManyEditReplayFailures occasionally fails with a test timeout. -- This message was sent by Atlassian JIRA (v6.3.4#6332)