[jira] [Commented] (HDFS-7609) startup used too much time to load edits
[ https://issues.apache.org/jira/browse/HDFS-7609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289533#comment-14289533 ] Kihwal Lee commented on HDFS-7609: -- Compared to 0.23, edit replaying in 2.x is 5x-10x slower. This affects the namenode fail-over latency. [~mingma] also reported this issue before and saw the retry cache being the bottleneck. startup used too much time to load edits Key: HDFS-7609 URL: https://issues.apache.org/jira/browse/HDFS-7609 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.2.0 Reporter: Carrey Zhan Attachments: HDFS-7609-CreateEditsLogWithRPCIDs.patch, recovery_do_not_use_retrycache.patch One day my namenode crashed because two journal nodes timed out at the same time under very high load, leaving behind about 100 million transactions in the edits log. (I still have no idea why they were not rolled into the fsimage.) I tried to restart the namenode, but it showed that almost 20 hours would be needed to finish, and it was loading fsedits most of the time. I also tried to restart the namenode in recover mode; the loading speed was no different. I looked into the stack trace and judged that it was caused by the retry cache. So I set dfs.namenode.enable.retrycache to false, and the restart process finished in half an hour. I think the retry cache is useless during startup, at least during the recover process. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7666) Datanode blockId layout upgrade threads should be daemon thread
[ https://issues.apache.org/jira/browse/HDFS-7666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rakesh R updated HDFS-7666: --- Attachment: HDFS-7666-v1.patch Datanode blockId layout upgrade threads should be daemon thread --- Key: HDFS-7666 URL: https://issues.apache.org/jira/browse/HDFS-7666 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Reporter: Rakesh R Assignee: Rakesh R Attachments: HDFS-7666-v1.patch This jira is to mark the layout upgrade threads as daemon threads. {code} int numLinkWorkers = datanode.getConf().getInt( DFSConfigKeys.DFS_DATANODE_BLOCK_ID_LAYOUT_UPGRADE_THREADS_KEY, DFSConfigKeys.DFS_DATANODE_BLOCK_ID_LAYOUT_UPGRADE_THREADS); ExecutorService linkWorkers = Executors.newFixedThreadPool(numLinkWorkers); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-3689) Add support for variable length block
[ https://issues.apache.org/jira/browse/HDFS-3689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-3689: Attachment: HDFS-3689.009.patch Add support for variable length block - Key: HDFS-3689 URL: https://issues.apache.org/jira/browse/HDFS-3689 Project: Hadoop HDFS Issue Type: New Feature Components: datanode, hdfs-client, namenode Affects Versions: 3.0.0 Reporter: Suresh Srinivas Assignee: Jing Zhao Attachments: HDFS-3689.000.patch, HDFS-3689.001.patch, HDFS-3689.002.patch, HDFS-3689.003.patch, HDFS-3689.003.patch, HDFS-3689.004.patch, HDFS-3689.005.patch, HDFS-3689.006.patch, HDFS-3689.007.patch, HDFS-3689.008.patch, HDFS-3689.008.patch, HDFS-3689.009.patch, HDFS-3689.009.patch Currently HDFS supports fixed length blocks. Supporting variable length block will allow new use cases and features to be built on top of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7337) Configurable and pluggable Erasure Codec and schema
[ https://issues.apache.org/jira/browse/HDFS-7337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289686#comment-14289686 ] Andrew Wang commented on HDFS-7337: --- I don't think it's necessary to move to HADOOP. If anything, I find it conceptually easier if everything related to erasure encoding stayed a subtask of HDFS-7285. Configurable and pluggable Erasure Codec and schema --- Key: HDFS-7337 URL: https://issues.apache.org/jira/browse/HDFS-7337 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Zhe Zhang Assignee: Kai Zheng Attachments: HDFS-7337-prototype-v1.patch, HDFS-7337-prototype-v2.zip, HDFS-7337-prototype-v3.zip, PluggableErasureCodec.pdf According to HDFS-7285 and the design, this considers supporting multiple Erasure Codecs via a pluggable approach. It allows defining and configuring multiple codec schemas with different coding algorithms and parameters. The resultant codec schemas can be utilized and specified via a command tool for different file folders. While designing and implementing such a pluggable framework, we will also implement a concrete codec by default (Reed Solomon) to prove the framework is useful and workable. A separate JIRA could be opened for the RS codec implementation. Note HDFS-7353 will focus on the very low level codec API and implementation to make concrete vendor libraries transparent to the upper layer. This JIRA focuses on the high level pieces that interact with configuration, schemas, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7666) Datanode blockId layout upgrade threads should be daemon thread
[ https://issues.apache.org/jira/browse/HDFS-7666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rakesh R updated HDFS-7666: --- Status: Patch Available (was: Open) Datanode blockId layout upgrade threads should be daemon thread --- Key: HDFS-7666 URL: https://issues.apache.org/jira/browse/HDFS-7666 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Reporter: Rakesh R Assignee: Rakesh R Attachments: HDFS-7666-v1.patch This jira is to mark the layout upgrade threads as daemon threads. {code} int numLinkWorkers = datanode.getConf().getInt( DFSConfigKeys.DFS_DATANODE_BLOCK_ID_LAYOUT_UPGRADE_THREADS_KEY, DFSConfigKeys.DFS_DATANODE_BLOCK_ID_LAYOUT_UPGRADE_THREADS); ExecutorService linkWorkers = Executors.newFixedThreadPool(numLinkWorkers); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7666) Datanode blockId layout upgrade threads should be daemon thread
Rakesh R created HDFS-7666: -- Summary: Datanode blockId layout upgrade threads should be daemon thread Key: HDFS-7666 URL: https://issues.apache.org/jira/browse/HDFS-7666 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Reporter: Rakesh R Assignee: Rakesh R This jira is to mark the layout upgrade threads as daemon threads. {code} int numLinkWorkers = datanode.getConf().getInt( DFSConfigKeys.DFS_DATANODE_BLOCK_ID_LAYOUT_UPGRADE_THREADS_KEY, DFSConfigKeys.DFS_DATANODE_BLOCK_ID_LAYOUT_UPGRADE_THREADS); ExecutorService linkWorkers = Executors.newFixedThreadPool(numLinkWorkers); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
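The change described in this jira can be sketched with a custom ThreadFactory passed to Executors.newFixedThreadPool. This is a minimal illustration, not the actual patch; the class name, thread-name prefix, and hard-coded worker count are assumptions for the example:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

public class DaemonPoolSketch {
    // A ThreadFactory that marks every pool thread as a daemon thread,
    // so a pending layout-upgrade task cannot keep the JVM alive on shutdown.
    static ThreadFactory daemonFactory(final String namePrefix) {
        final AtomicInteger count = new AtomicInteger();
        return new ThreadFactory() {
            @Override
            public Thread newThread(Runnable r) {
                Thread t = new Thread(r, namePrefix + "-" + count.incrementAndGet());
                t.setDaemon(true);
                return t;
            }
        };
    }

    public static void main(String[] args) throws InterruptedException {
        // Hypothetical stand-in for the configured worker count.
        int numLinkWorkers = 4;
        ExecutorService linkWorkers = Executors.newFixedThreadPool(
            numLinkWorkers, daemonFactory("blockid-layout-upgrade"));
        linkWorkers.submit(() -> System.out.println(
            "daemon? " + Thread.currentThread().isDaemon()));
        linkWorkers.shutdown();
    }
}
```

The two-argument Executors.newFixedThreadPool overload is the standard way to control thread attributes of a fixed pool without subclassing ThreadPoolExecutor.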
[jira] [Commented] (HDFS-7609) startup used too much time to load edits
[ https://issues.apache.org/jira/browse/HDFS-7609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289642#comment-14289642 ] Ming Ma commented on HDFS-7609: --- Yeah, we also had this issue. It appears that somehow an entry with the same client id and call id already existed in the retryCache, which ended up calling the expensive PriorityQueue#remove function. Below is the call stack captured when the standby was replaying the edit logs. {noformat} Edit log tailer prio=10 tid=0x7f096d491000 nid=0x533c runnable [0x7ef05ee7a000] java.lang.Thread.State: RUNNABLE at java.util.PriorityQueue.removeAt(PriorityQueue.java:605) at java.util.PriorityQueue.remove(PriorityQueue.java:364) at org.apache.hadoop.util.LightWeightCache.put(LightWeightCache.java:218) at org.apache.hadoop.ipc.RetryCache.addCacheEntry(RetryCache.java:296) - locked 0x7ef2fe306978 (a org.apache.hadoop.ipc.RetryCache) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.addCacheEntry(FSNamesystem.java:801) at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:507) at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:224) at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:133) at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:804) at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:785) at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:230) at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:324) at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:282) at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:299) at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:415) at 
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:295) {noformat} startup used too much time to load edits Key: HDFS-7609 URL: https://issues.apache.org/jira/browse/HDFS-7609 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.2.0 Reporter: Carrey Zhan Attachments: HDFS-7609-CreateEditsLogWithRPCIDs.patch, recovery_do_not_use_retrycache.patch One day my namenode crashed because two journal nodes timed out at the same time under very high load, leaving behind about 100 million transactions in the edits log. (I still have no idea why they were not rolled into the fsimage.) I tried to restart the namenode, but it showed that almost 20 hours would be needed to finish, and it was loading fsedits most of the time. I also tried to restart the namenode in recover mode; the loading speed was no different. I looked into the stack trace and judged that it was caused by the retry cache. So I set dfs.namenode.enable.retrycache to false, and the restart process finished in half an hour. I think the retry cache is useless during startup, at least during the recover process. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
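The stack trace above bottoms out in PriorityQueue#remove(Object), which performs a linear scan of the backing array to locate the element, so with millions of cached entries each replayed edit-log op pays an O(n) cost. A minimal standalone illustration of the asymmetry (not HDFS code; the element count is arbitrary and the timings are only indicative):

```java
import java.util.PriorityQueue;

public class PqRemoveCost {
    public static void main(String[] args) {
        PriorityQueue<Integer> pq = new PriorityQueue<>();
        int n = 200_000;
        for (int i = 0; i < n; i++) {
            pq.add(i);
        }

        long start = System.nanoTime();
        // remove(Object) must scan the heap array to find the element:
        // O(n) per call, regardless of where the element sits.
        pq.remove(n - 1);
        long scanNanos = System.nanoTime() - start;

        start = System.nanoTime();
        pq.poll(); // removes the head in O(log n)
        long pollNanos = System.nanoTime() - start;

        System.out.println("remove(Object): " + scanNanos
            + " ns, poll(): " + pollNanos + " ns");
    }
}
```

This is why hitting the remove path on every cache insertion during edit replay dominates startup time once the queue grows large.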
[jira] [Updated] (HDFS-3689) Add support for variable length block
[ https://issues.apache.org/jira/browse/HDFS-3689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-3689: Attachment: (was: HDFS-3689.009.patch) Add support for variable length block - Key: HDFS-3689 URL: https://issues.apache.org/jira/browse/HDFS-3689 Project: Hadoop HDFS Issue Type: New Feature Components: datanode, hdfs-client, namenode Affects Versions: 3.0.0 Reporter: Suresh Srinivas Assignee: Jing Zhao Attachments: HDFS-3689.000.patch, HDFS-3689.001.patch, HDFS-3689.002.patch, HDFS-3689.003.patch, HDFS-3689.003.patch, HDFS-3689.004.patch, HDFS-3689.005.patch, HDFS-3689.006.patch, HDFS-3689.007.patch, HDFS-3689.008.patch, HDFS-3689.008.patch, HDFS-3689.009.patch Currently HDFS supports fixed length blocks. Supporting variable length block will allow new use cases and features to be built on top of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-3689) Add support for variable length block
[ https://issues.apache.org/jira/browse/HDFS-3689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-3689: Attachment: HDFS-3689.009.patch editsStored Add support for variable length block - Key: HDFS-3689 URL: https://issues.apache.org/jira/browse/HDFS-3689 Project: Hadoop HDFS Issue Type: New Feature Components: datanode, hdfs-client, namenode Affects Versions: 3.0.0 Reporter: Suresh Srinivas Assignee: Jing Zhao Attachments: HDFS-3689.000.patch, HDFS-3689.001.patch, HDFS-3689.002.patch, HDFS-3689.003.patch, HDFS-3689.003.patch, HDFS-3689.004.patch, HDFS-3689.005.patch, HDFS-3689.006.patch, HDFS-3689.007.patch, HDFS-3689.008.patch, HDFS-3689.008.patch, HDFS-3689.009.patch, HDFS-3689.009.patch, editsStored Currently HDFS supports fixed length blocks. Supporting variable length block will allow new use cases and features to be built on top of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7353) Raw Erasure Coder API for concrete encoding and decoding
[ https://issues.apache.org/jira/browse/HDFS-7353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289765#comment-14289765 ] Tsz Wo Nicholas Sze commented on HDFS-7353: --- Thanks for the update. Some comments: - ec can also mean error correcting. How about renaming the package to io.erasure? Then, using EC inside the package won't be ambiguous. - Should the package be moved under hdfs? Do you expect that it will be used outside hdfs? - Please explain what Raw means in the javadoc. - By The number of elements, do you mean the length in bytes? Should it be long instead of int? - The javadoc An abstract raw erasure decoder class does not really explain what the class does. Could you add more description about how the class is used and its relationship with the other classes? - protected methods, especially the abstract ones, should also have javadoc. - There are some tab characters. We should replace them with spaces. Raw Erasure Coder API for concrete encoding and decoding Key: HDFS-7353 URL: https://issues.apache.org/jira/browse/HDFS-7353 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kai Zheng Assignee: Kai Zheng Fix For: HDFS-EC Attachments: HDFS-7353-v1.patch, HDFS-7353-v2.patch This is to abstract and define a raw erasure coder API across different coding algorithms like RS, XOR, etc. Such an API can be implemented by utilizing various library support, such as the Intel ISA library and the Jerasure library. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7653) Block Readers and Writers used in both client side and datanode side
[ https://issues.apache.org/jira/browse/HDFS-7653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289772#comment-14289772 ] Tsz Wo Nicholas Sze commented on HDFS-7653: --- Sounds good. Thanks! Block Readers and Writers used in both client side and datanode side Key: HDFS-7653 URL: https://issues.apache.org/jira/browse/HDFS-7653 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Li Bo Assignee: Li Bo Attachments: BlockReadersWriters.patch There are a lot of block read/write operations in HDFS-EC. For example, when a client writes a file in the striping layout, it has to write several blocks to several different datanodes; if a datanode wants to do an encoding/decoding task, it has to read several blocks from itself and other datanodes, and write one or more blocks to itself or other datanodes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7584) Enable Quota Support for Storage Types (SSD)
[ https://issues.apache.org/jira/browse/HDFS-7584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao updated HDFS-7584: - Attachment: HDFS-7584.3.patch Include the editsStored binary in the patch. Enable Quota Support for Storage Types (SSD) - Key: HDFS-7584 URL: https://issues.apache.org/jira/browse/HDFS-7584 Project: Hadoop HDFS Issue Type: New Feature Components: datanode, namenode Reporter: Xiaoyu Yao Assignee: Xiaoyu Yao Attachments: HDFS-7584 Quota by Storage Type - 01202015.pdf, HDFS-7584.0.patch, HDFS-7584.1.patch, HDFS-7584.2.patch, HDFS-7584.3.patch, editsStored Phase II of the Heterogeneous Storage feature was completed by HDFS-6584. This JIRA is opened to enable quota support for different storage types in terms of storage space usage. This is more important for certain storage types such as SSD, as it is precious and more performant. As described in the design doc of HDFS-5682, we plan to add a new quotaByStorageType command and a new name node RPC protocol for it. The quota by storage type feature applies at the HDFS directory level, similar to the traditional HDFS space quota. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7648) Verify the datanode directory layout
[ https://issues.apache.org/jira/browse/HDFS-7648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289778#comment-14289778 ] Rakesh R commented on HDFS-7648: bq. If a mismatch is found, it should also fix it Should I worry about the race between the DatanodeBlockId_Layout threads (they will do the linking) in DataStorage and this call path? bq. DirectoryScanner seems a better place to do the verification Thanks for the hint. Let me try this as well. Verify the datanode directory layout Key: HDFS-7648 URL: https://issues.apache.org/jira/browse/HDFS-7648 Project: Hadoop HDFS Issue Type: Bug Components: datanode Reporter: Tsz Wo Nicholas Sze HDFS-6482 changed the datanode layout to use the block ID to determine the directory in which to store the block. We should have some mechanism to verify it. Either DirectoryScanner or block report generation could do the check. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7667) Various typos and improvements to HDFS Federation doc
[ https://issues.apache.org/jira/browse/HDFS-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Charles Lamb updated HDFS-7667: --- Attachment: HDFS-7667.001.patch [~aw], Thanks for looking it over. The .001 version makes those two changes. Various typos and improvements to HDFS Federation doc - Key: HDFS-7667 URL: https://issues.apache.org/jira/browse/HDFS-7667 Project: Hadoop HDFS Issue Type: Bug Components: documentation Affects Versions: 2.6.0 Reporter: Charles Lamb Assignee: Charles Lamb Priority: Minor Attachments: HDFS-7667.000.patch, HDFS-7667.001.patch Fix several incorrect commands, typos, grammatical errors, etc. in the HDFS Federation doc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7647) DatanodeManager.sortLocatedBlocks() sorts DatanodeInfos but not StorageIDs
[ https://issues.apache.org/jira/browse/HDFS-7647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289725#comment-14289725 ] Arpit Agarwal commented on HDFS-7647: - Thanks for the patch [~milandesai], this looks like a good change. Couple of comments: # {{LocatedBlocks.getStorageTypes}} and {{.getStorageIDs}} should cache the generated arrays on first invocation since existing callers expect these calls to be cheap. Except for the sorting code the content of {{locs}} is not modified once the object is initialized. # The sorting code must invalidate the cached arrays from 1. # We should add a unit test for sortLocatedBlocks specifically for the invalidation. # Also it would be good to add a comment to {{LocatedBlocks}} stating the assumption that {{locs}} must not be modified by the caller, with the exception of {{sortLocatedBlocks}}. In a separate Jira it would be good to make {{locs}} an unmodifiable list or a Guava {{ImmutableList}}. The source of the issue is that an external function reaches into the LocatedBlock object and modifies its private fields. It doesn't help that Java lacks support for C++-style const arrays. DatanodeManager.sortLocatedBlocks() sorts DatanodeInfos but not StorageIDs -- Key: HDFS-7647 URL: https://issues.apache.org/jira/browse/HDFS-7647 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: Milan Desai Assignee: Milan Desai Attachments: HDFS-7647-2.patch, HDFS-7647.patch DatanodeManager.sortLocatedBlocks() sorts the array of DatanodeInfos inside each LocatedBlock, but does not touch the array of StorageIDs and StorageTypes. As a result, the DatanodeInfos and StorageIDs/StorageTypes are mismatched. The method is called by FSNamesystem.getBlockLocations(), so the client will not know which StorageID/Type corresponds to which DatanodeInfo. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
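The caching and invalidation suggested in points 1 and 2 above can be sketched as a lazily built array that the sorting code drops. This is only an illustration with hypothetical names; the real classes are LocatedBlock/LocatedBlocks and the real element type is a storage ID string per replica:

```java
import java.util.ArrayList;
import java.util.List;

public class CachedStorageIds {
    private final List<String> locs = new ArrayList<>();
    private String[] cachedStorageIds; // built on first call, cheap afterwards

    public CachedStorageIds(List<String> storageIds) {
        locs.addAll(storageIds);
    }

    // Callers expect this to be cheap, so cache the generated array
    // instead of rebuilding it on every invocation.
    public String[] getStorageIDs() {
        if (cachedStorageIds == null) {
            cachedStorageIds = locs.toArray(new String[0]);
        }
        return cachedStorageIds;
    }

    // The sorting code is the one caller allowed to mutate locs,
    // so it must invalidate the cached array afterwards.
    public void sort() {
        locs.sort(null); // natural ordering; stands in for the distance sort
        cachedStorageIds = null;
    }
}
```

The invariant being tested in the suggested unit test is exactly this: after sort(), a fresh array reflecting the new order is returned, while repeated calls without mutation return the same cached instance.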
[jira] [Commented] (HDFS-7421) Move processing of postponed over-replicated blocks to a background task
[ https://issues.apache.org/jira/browse/HDFS-7421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289795#comment-14289795 ] Aaron T. Myers commented on HDFS-7421: -- Hey Kihwal, yes indeed, this seems like a dupe. I'll go ahead and close this one. Thanks for pointing that out, and thanks for filing/fixing the issue in HDFS-6425. Move processing of postponed over-replicated blocks to a background task Key: HDFS-7421 URL: https://issues.apache.org/jira/browse/HDFS-7421 Project: Hadoop HDFS Issue Type: Improvement Components: ha, namenode Affects Versions: 2.6.0 Reporter: Aaron T. Myers Assignee: Aaron T. Myers In an HA environment, we postpone sending block invalidates to DNs until all DNs holding a given block have done at least one block report to the NN after it became active. When that first block report after becoming active does occur, we attempt to reprocess all postponed misreplicated blocks inline with the block report RPC. In the case where there are many postponed misreplicated blocks, this can cause block report RPCs to take an inordinately long time to complete, sometimes on the order of minutes, which has the potential to tie up RPC handlers, block incoming RPCs, etc. There's no need to hurriedly process all postponed misreplicated blocks so that we can quickly send invalidate commands back to DNs, so let's move this processing outside of the RPC handler context and into a background thread. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7667) Various typos and improvements to HDFS Federation doc
[ https://issues.apache.org/jira/browse/HDFS-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Charles Lamb updated HDFS-7667: --- Attachment: HDFS-7667.000.patch Diffs attached. Various typos and improvements to HDFS Federation doc - Key: HDFS-7667 URL: https://issues.apache.org/jira/browse/HDFS-7667 Project: Hadoop HDFS Issue Type: Bug Components: documentation Affects Versions: 2.6.0 Reporter: Charles Lamb Assignee: Charles Lamb Priority: Minor Attachments: HDFS-7667.000.patch Fix several incorrect commands, typos, grammatical errors, etc. in the HDFS Federation doc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7667) Various typos and improvements to HDFS Federation doc
[ https://issues.apache.org/jira/browse/HDFS-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Charles Lamb updated HDFS-7667: --- Status: Patch Available (was: Open) Various typos and improvements to HDFS Federation doc - Key: HDFS-7667 URL: https://issues.apache.org/jira/browse/HDFS-7667 Project: Hadoop HDFS Issue Type: Bug Components: documentation Affects Versions: 2.6.0 Reporter: Charles Lamb Assignee: Charles Lamb Priority: Minor Attachments: HDFS-7667.000.patch Fix several incorrect commands, typos, grammatical errors, etc. in the HDFS Federation doc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7667) Various typos and improvements to HDFS Federation doc
[ https://issues.apache.org/jira/browse/HDFS-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289838#comment-14289838 ] Allen Wittenauer commented on HDFS-7667: Oh, we should probably drop --config $HADOOP_CONF_DIR, since that's pretty useless as well. Various typos and improvements to HDFS Federation doc - Key: HDFS-7667 URL: https://issues.apache.org/jira/browse/HDFS-7667 Project: Hadoop HDFS Issue Type: Bug Components: documentation Affects Versions: 2.6.0 Reporter: Charles Lamb Assignee: Charles Lamb Priority: Minor Attachments: HDFS-7667.000.patch Fix several incorrect commands, typos, grammatical errors, etc. in the HDFS Federation doc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7285) Erasure Coding Support inside HDFS
[ https://issues.apache.org/jira/browse/HDFS-7285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289871#comment-14289871 ] Zhe Zhang commented on HDFS-7285: - Thanks for clarifying. bq. After some discussion with Jing, we think that block group ID is not needed at all – we only need to keep the block group index within a file. Will give more details later. This is [discussed | https://issues.apache.org/jira/browse/HDFS-7339?focusedCommentId=14289868page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14289868] under HDFS-7339. Erasure Coding Support inside HDFS -- Key: HDFS-7285 URL: https://issues.apache.org/jira/browse/HDFS-7285 Project: Hadoop HDFS Issue Type: New Feature Reporter: Weihua Jiang Assignee: Zhe Zhang Attachments: ECAnalyzer.py, ECParser.py, HDFSErasureCodingDesign-20141028.pdf, HDFSErasureCodingDesign-20141217.pdf, fsimage-analysis-20150105.pdf Erasure Coding (EC) can greatly reduce the storage overhead without sacrificing data reliability, compared to the existing HDFS 3-replica approach. For example, if we use 10+4 Reed-Solomon coding, we can tolerate the loss of 4 blocks, with the storage overhead being only 40%. This makes EC a quite attractive alternative for big data storage, particularly for cold data. Facebook had a related open source project called HDFS-RAID. It used to be one of the contributed packages in HDFS but has been removed since Hadoop 2.0 for maintenance reasons. The drawbacks are: 1) it sits on top of HDFS and depends on MapReduce to do encoding and decoding tasks; 2) it can only be used for cold files that are not intended to be appended anymore; 3) the pure Java EC coding implementation is extremely slow in practical use. Due to these, it might not be a good idea to just bring HDFS-RAID back. We (Intel and Cloudera) are working on a design to build EC into HDFS that gets rid of any external dependencies, making it self-contained and independently maintained. 
This design lays the EC feature on top of the storage type support and aims to be compatible with existing HDFS features like caching, snapshots, encryption, high availability, etc. This design will also support different EC coding schemes, implementations and policies for different deployment scenarios. By utilizing advanced libraries (e.g. the Intel ISA-L library), an implementation can greatly improve the performance of EC encoding/decoding and make the EC solution even more attractive. We will post the design document soon. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7339) Allocating and persisting block groups in NameNode
[ https://issues.apache.org/jira/browse/HDFS-7339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289901#comment-14289901 ] Zhe Zhang commented on HDFS-7339: - bq. First a quick comment about the current SequentialBlockGroupIdGenerator and SequentialBlockIdGenerator. The current patch tries to use a flag to distinguish contiguous and striped blocks. However, since there may still be conflicts coming from historical randomly assigned block IDs, for blocks in block reports we still need to check two places to determine whether a block is contiguous or striped. If a block's ID has the 'striped' flag bit, we always _attempt_ to look up the block group map first. Without rolling upgrade we only need this one lookup. And yes, we do need to check two places in the worst case. Given that HDFS-4645 will be over 2 years old by the time erasure coding is released, I guess this won't happen a lot? Allocating and persisting block groups in NameNode -- Key: HDFS-7339 URL: https://issues.apache.org/jira/browse/HDFS-7339 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Zhe Zhang Assignee: Zhe Zhang Attachments: HDFS-7339-001.patch, HDFS-7339-002.patch, HDFS-7339-003.patch, HDFS-7339-004.patch, HDFS-7339-005.patch, HDFS-7339-006.patch, Meta-striping.jpg, NN-stripping.jpg All erasure codec operations center around the concept of _block group_; they are formed in initial encoding and looked up in recoveries and conversions. A lightweight class {{BlockGroup}} is created to record the original and parity blocks in a coding group, as well as a pointer to the codec schema (pluggable codec schemas will be supported in HDFS-7337). With the striping layout, the HDFS client needs to operate on all blocks in a {{BlockGroup}} concurrently. Therefore we propose to extend a file’s inode to switch between _contiguous_ and _striping_ modes, with the current mode recorded in a binary flag. 
An array of BlockGroups (or BlockGroup IDs) is added, which remains empty for “traditional” HDFS files with contiguous block layout. The NameNode creates and maintains {{BlockGroup}} instances through the new {{ECManager}} component; the attached figure has an illustration of the architecture. As a simple example, when a {_Striping+EC_} file is created and written to, it will serve requests from the client to allocate new {{BlockGroups}} and store them under the {{INodeFile}}. In the current phase, {{BlockGroups}} are allocated both in initial online encoding and in the conversion from replication to EC. {{ECManager}} also facilitates the lookup of {{BlockGroup}} information for block recovery work. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
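The lookup order Zhe describes (flag bit first, then fall back for legacy randomly assigned IDs) can be sketched as follows. The flag-bit position, map types, and class name here are assumptions for illustration only; the actual ID scheme is defined by the patch under review:

```java
import java.util.HashMap;
import java.util.Map;

public class BlockIdLookupSketch {
    // Assumption for illustration: the high bit marks a striped block ID.
    static final long STRIPED_FLAG = 1L << 63;

    final Map<Long, String> blockGroupMap = new HashMap<>();
    final Map<Long, String> contiguousBlockMap = new HashMap<>();

    // For IDs carrying the striped flag, attempt the block group map first;
    // fall back to the contiguous map to handle historical random IDs that
    // happen to collide with the flag bit (the worst-case second lookup).
    String lookup(long blockId) {
        if ((blockId & STRIPED_FLAG) != 0) {
            String group = blockGroupMap.get(blockId);
            if (group != null) {
                return group;
            }
        }
        return contiguousBlockMap.get(blockId);
    }
}
```

Without pre-HDFS-4645 random IDs in the cluster, the first lookup is always conclusive, which is the common case the comment relies on.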
[jira] [Commented] (HDFS-7648) Verify the datanode directory layout
[ https://issues.apache.org/jira/browse/HDFS-7648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289698#comment-14289698 ] Rakesh R commented on HDFS-7648: [~szetszwo] I'm going through the block ID-based block layout on datanodes design and I came across this jira. I'm interested in implementing this idea. I feel block report generation would be a feasible one. Could you briefly explain the verification points if you have anything specific in mind. Thanks! Verify the datanode directory layout Key: HDFS-7648 URL: https://issues.apache.org/jira/browse/HDFS-7648 Project: Hadoop HDFS Issue Type: Bug Components: datanode Reporter: Tsz Wo Nicholas Sze HDFS-6482 changed the datanode layout to use the block ID to determine the directory in which to store the block. We should have some mechanism to verify it. Either DirectoryScanner or block report generation could do the check. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7584) Enable Quota Support for Storage Types (SSD)
[ https://issues.apache.org/jira/browse/HDFS-7584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao updated HDFS-7584: - Attachment: HDFS-7584.2.patch Enable Quota Support for Storage Types (SSD) - Key: HDFS-7584 URL: https://issues.apache.org/jira/browse/HDFS-7584 Project: Hadoop HDFS Issue Type: New Feature Components: datanode, namenode Reporter: Xiaoyu Yao Assignee: Xiaoyu Yao Attachments: HDFS-7584 Quota by Storage Type - 01202015.pdf, HDFS-7584.0.patch, HDFS-7584.1.patch, HDFS-7584.2.patch, editsStored Phase II of the Heterogeneous Storage feature was completed by HDFS-6584. This JIRA is opened to enable quota support for different storage types in terms of storage space usage. This is more important for certain storage types such as SSD, as it is precious and more performant. As described in the design doc of HDFS-5682, we plan to add a new quotaByStorageType command and a new name node RPC protocol for it. The quota by storage type feature applies at the HDFS directory level, similar to the traditional HDFS space quota. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HDFS-7421) Move processing of postponed over-replicated blocks to a background task
[ https://issues.apache.org/jira/browse/HDFS-7421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers resolved HDFS-7421. -- Resolution: Duplicate Move processing of postponed over-replicated blocks to a background task Key: HDFS-7421 URL: https://issues.apache.org/jira/browse/HDFS-7421 Project: Hadoop HDFS Issue Type: Improvement Components: ha, namenode Affects Versions: 2.6.0 Reporter: Aaron T. Myers Assignee: Aaron T. Myers In an HA environment, we postpone sending block invalidates to DNs until all DNs holding a given block have done at least one block report to the NN after it became active. When that first block report after becoming active does occur, we attempt to reprocess all postponed misreplicated blocks inline with the block report RPC. In the case where there are many postponed misreplicated blocks, this can cause block report RPCs to take an inordinately long time to complete, sometimes on the order of minutes, which has the potential to tie up RPC handlers, block incoming RPCs, etc. There's no need to hurriedly process all postponed misreplicated blocks so that we can quickly send invalidate commands back to DNs, so let's move this processing outside of the RPC handler context and into a background thread. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7648) Verify the datanode directory layout
[ https://issues.apache.org/jira/browse/HDFS-7648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289822#comment-14289822 ] Rakesh R commented on HDFS-7648: Ah, this is upgrade path. {{DataStorage.java:}} {code} line#1036 ExecutorService linkWorkers = Executors.newFixedThreadPool(numLinkWorkers); . . futures.add(linkWorkers.submit(new CallableVoid() { {code} Verify the datanode directory layout Key: HDFS-7648 URL: https://issues.apache.org/jira/browse/HDFS-7648 Project: Hadoop HDFS Issue Type: Bug Components: datanode Reporter: Tsz Wo Nicholas Sze HDFS-6482 changed datanode layout to use block ID to determine the directory to store the block. We should have some mechanism to verify it. Either DirectoryScanner or block report generation could do the check. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
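For context, the {{newFixedThreadPool}} call quoted above uses the default (non-daemon) thread factory, so the layout upgrade workers can keep the JVM alive. A hedged sketch of how such a pool can be built with daemon workers instead — the factory, thread name, and worker count here are illustrative, not the actual DataStorage code:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadFactory;

public class DaemonPoolSketch {
    // Illustrative stand-in for the configured worker count
    // (DFS_DATANODE_BLOCK_ID_LAYOUT_UPGRADE_THREADS in the real config).
    static final int NUM_LINK_WORKERS = 4;

    static ExecutorService newDaemonFixedPool(int workers) {
        ThreadFactory daemonFactory = r -> {
            Thread t = new Thread(r, "blockid-layout-upgrade");
            t.setDaemon(true); // JVM exit is not blocked by these threads
            return t;
        };
        return Executors.newFixedThreadPool(workers, daemonFactory);
    }

    public static void main(String[] args) throws Exception {
        ExecutorService linkWorkers = newDaemonFixedPool(NUM_LINK_WORKERS);
        Future<Boolean> f =
            linkWorkers.submit(() -> Thread.currentThread().isDaemon());
        System.out.println(f.get()); // prints "true"
        linkWorkers.shutdown();
    }
}
```

The only behavioral difference from the quoted snippet is thread lifecycle: daemon workers never prevent shutdown, which is the point of marking upgrade threads daemon.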
[jira] [Commented] (HDFS-7667) Various typos and improvements to HDFS Federation doc
[ https://issues.apache.org/jira/browse/HDFS-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289828#comment-14289828 ] Allen Wittenauer commented on HDFS-7667: While you're there, fix: {code} $HADOOP_PREFIX/bin/hdfs start namenode --config $HADOOP_CONF_DIR -upgrade -clusterId cluster_ID {code} to be {code} $HADOOP_PREFIX/bin/hdfs --daemon start namenode --config $HADOOP_CONF_DIR -upgrade -clusterId cluster_ID {code} Also be aware that this may not apply to 2.x. The documentation is different. Various typos and improvements to HDFS Federation doc - Key: HDFS-7667 URL: https://issues.apache.org/jira/browse/HDFS-7667 Project: Hadoop HDFS Issue Type: Bug Components: documentation Affects Versions: 2.6.0 Reporter: Charles Lamb Assignee: Charles Lamb Priority: Minor Attachments: HDFS-7667.000.patch Fix several incorrect commands, typos, grammatical errors, etc. in the HDFS Federation doc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7648) Verify the datanode directory layout
[ https://issues.apache.org/jira/browse/HDFS-7648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289855#comment-14289855 ] Tsz Wo Nicholas Sze commented on HDFS-7648: --- Yes. It won't be a problem then. Verify the datanode directory layout Key: HDFS-7648 URL: https://issues.apache.org/jira/browse/HDFS-7648 Project: Hadoop HDFS Issue Type: Bug Components: datanode Reporter: Tsz Wo Nicholas Sze HDFS-6482 changed datanode layout to use block ID to determine the directory to store the block. We should have some mechanism to verify it. Either DirectoryScanner or block report generation could do the check. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7614) Implement COMPLETE state of erasure coding block groups
[ https://issues.apache.org/jira/browse/HDFS-7614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289875#comment-14289875 ] Zhe Zhang commented on HDFS-7614: - This design question is mainly [discussed | https://issues.apache.org/jira/browse/HDFS-7339?focusedCommentId=14289868page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14289868] under HDFS-7339. Implement COMPLETE state of erasure coding block groups --- Key: HDFS-7614 URL: https://issues.apache.org/jira/browse/HDFS-7614 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Zhe Zhang Assignee: Zhe Zhang HDFS-7339 implements 2 states of an under-construction block group: {{UNDER_CONSTRUCTION}} and {{COMMITTED}}. The {{COMPLETE}} requires DataNode to report stored replicas, therefore will be separately implemented in this JIRA. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7648) Verify the datanode directory layout
[ https://issues.apache.org/jira/browse/HDFS-7648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-7648: -- Description: HDFS-6482 changed datanode layout to use block ID to determine the directory to store the block. We should have some mechanism to verify it. Either DirectoryScanner or block report generation could do the check. (was: HDFS-6482 changed datanode layout to use block ID to determine the directory to store the block. We should have some mechanism to verify it. Either DirectoryScanner or block report generation do the check.) Verify the datanode directory layout Key: HDFS-7648 URL: https://issues.apache.org/jira/browse/HDFS-7648 Project: Hadoop HDFS Issue Type: Bug Components: datanode Reporter: Tsz Wo Nicholas Sze HDFS-6482 changed datanode layout to use block ID to determine the directory to store the block. We should have some mechanism to verify it. Either DirectoryScanner or block report generation could do the check. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7648) Verify the datanode directory layout
[ https://issues.apache.org/jira/browse/HDFS-7648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289727#comment-14289727 ] Tsz Wo Nicholas Sze commented on HDFS-7648: --- During block report generation or directory scanning, the DataNode traverses the directories to collect all the replica information. We should verify whether the actual directory location of a replica matches the expected directory path computed from its block ID. If a mismatch is found, it should also fix it. On second thought, DirectoryScanner seems like a better place to do the verification since the purpose of the DirectoryScanner is to verify and fix the blocks stored in the local directories. What do you think? Verify the datanode directory layout Key: HDFS-7648 URL: https://issues.apache.org/jira/browse/HDFS-7648 Project: Hadoop HDFS Issue Type: Bug Components: datanode Reporter: Tsz Wo Nicholas Sze HDFS-6482 changed datanode layout to use block ID to determine the directory to store the block. We should have some mechanism to verify it. Either DirectoryScanner or block report generation could do the check. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
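A hedged sketch of the verification idea discussed in this comment: recompute the expected subdirectory from the block ID and compare it with where the replica was actually found. The two-level subdir scheme follows the HDFS-6482 layout direction, but the shift and mask values here are assumptions for illustration, not quoted from the Hadoop source.

```java
import java.io.File;

public class BlockDirCheckSketch {
    // Assumed layout parameters: two directory levels derived from bits of
    // the block ID (mask value illustrative, not authoritative).
    static final long MASK = 0x1F;

    // Expected directory for a block, computed purely from its ID.
    static File idToBlockDir(File root, long blockId) {
        int d1 = (int) ((blockId >> 16) & MASK);
        int d2 = (int) ((blockId >> 8) & MASK);
        return new File(root, "subdir" + d1 + File.separator + "subdir" + d2);
    }

    // True when the replica sits in the directory its ID prescribes;
    // DirectoryScanner could flag (or move) replicas where this is false.
    static boolean isInExpectedDir(File root, long blockId, File actualDir) {
        return idToBlockDir(root, blockId).equals(actualDir);
    }
}
```

Because the check is pure arithmetic on the ID, it can run during the scanner's existing traversal with no extra I/O beyond what the scan already does.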
[jira] [Commented] (HDFS-7652) Process block reports for erasure coded blocks
[ https://issues.apache.org/jira/browse/HDFS-7652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289879#comment-14289879 ] Zhe Zhang commented on HDFS-7652: - This design question is mainly [discussed | https://issues.apache.org/jira/browse/HDFS-7339?focusedCommentId=14289868page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14289868] under HDFS-7339. Again, I really appreciate the in-depth thoughts! [~szetszwo] [~jingzhao] Process block reports for erasure coded blocks -- Key: HDFS-7652 URL: https://issues.apache.org/jira/browse/HDFS-7652 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Zhe Zhang Assignee: Zhe Zhang HDFS-7339 adds support in NameNode for persisting block groups. For memory efficiency, erasure coded blocks under the striping layout are not stored in {{BlockManager#blocksMap}}. Instead, entire block groups are stored in {{BlockGroupManager#blockGroups}}. When a block report arrives from the DataNode, it should be processed under the block group that it belongs to. The following naming protocol is used to calculate the group of a given block: {code} * HDFS-EC introduces a hierarchical protocol to name blocks and groups: * Contiguous: {reserved block IDs | flag | block ID} * Striped: {reserved block IDs | flag | block group ID | index in group} * * Following n bits of reserved block IDs, the (n+1)th bit in an ID * distinguishes contiguous (0) and striped (1) blocks. For a striped block, * bits (n+2) to (64-m) represent the ID of its block group, while the last m * bits represent its index in the group. The value m is determined by the * maximum number of blocks in a group (MAX_BLOCKS_IN_GROUP). {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
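The naming protocol above reduces to a few bit operations on the 64-bit block ID. A hedged sketch, assuming 4 index bits (i.e. MAX_BLOCKS_IN_GROUP = 16) and the striped flag in the sign bit — both values are illustrative; the real constants come from the HDFS-7339 code:

```java
public class BlockIdProtocolSketch {
    // Assumptions for this sketch: m = 4 index bits, flag in the sign bit.
    static final int INDEX_BITS = 4;
    static final long INDEX_MASK = (1L << INDEX_BITS) - 1;

    // Flag bit set => negative long in this sketch's encoding.
    static boolean isStriped(long blockId) {
        return blockId < 0;
    }

    // Group ID is the block ID with its low index bits zeroed.
    static long groupId(long blockId) {
        return blockId & ~INDEX_MASK;
    }

    // Index of the block within its group: the low m bits.
    static int indexInGroup(long blockId) {
        return (int) (blockId & INDEX_MASK);
    }
}
```

With this arithmetic, the NameNode can route a reported block to its group in O(1) without any per-block lookup table, which is exactly why the hierarchical ID scheme helps keep {{blocksMap}} small.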
[jira] [Commented] (HDFS-7648) Verify the datanode directory layout
[ https://issues.apache.org/jira/browse/HDFS-7648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289787#comment-14289787 ] Tsz Wo Nicholas Sze commented on HDFS-7648: --- Should I worry about the race between the blockId layout upgrade threads (they will do the linking) in DataStorage and this call path? Could you show me the line number in DataStorage.java? Verify the datanode directory layout Key: HDFS-7648 URL: https://issues.apache.org/jira/browse/HDFS-7648 Project: Hadoop HDFS Issue Type: Bug Components: datanode Reporter: Tsz Wo Nicholas Sze HDFS-6482 changed datanode layout to use block ID to determine the directory to store the block. We should have some mechanism to verify it. Either DirectoryScanner or block report generation could do the check. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7337) Configurable and pluggable Erasure Codec and schema
[ https://issues.apache.org/jira/browse/HDFS-7337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289780#comment-14289780 ] Tsz Wo Nicholas Sze commented on HDFS-7337: --- Do you expect that the erasure code package will be used outside hdfs? If not, we could put everything under hdfs for the moment. Configurable and pluggable Erasure Codec and schema --- Key: HDFS-7337 URL: https://issues.apache.org/jira/browse/HDFS-7337 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Zhe Zhang Assignee: Kai Zheng Attachments: HDFS-7337-prototype-v1.patch, HDFS-7337-prototype-v2.zip, HDFS-7337-prototype-v3.zip, PluggableErasureCodec.pdf According to HDFS-7285 and the design, this considers to support multiple Erasure Codecs via pluggable approach. It allows to define and configure multiple codec schemas with different coding algorithms and parameters. The resultant codec schemas can be utilized and specified via command tool for different file folders. While design and implement such pluggable framework, it’s also to implement a concrete codec by default (Reed Solomon) to prove the framework is useful and workable. Separate JIRA could be opened for the RS codec implementation. Note HDFS-7353 will focus on the very low level codec API and implementation to make concrete vendor libraries transparent to the upper layer. This JIRA focuses on high level stuffs that interact with configuration, schema and etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7667) Various typos and improvements to HDFS Federation doc
Charles Lamb created HDFS-7667: -- Summary: Various typos and improvements to HDFS Federation doc Key: HDFS-7667 URL: https://issues.apache.org/jira/browse/HDFS-7667 Project: Hadoop HDFS Issue Type: Bug Components: documentation Affects Versions: 2.6.0 Reporter: Charles Lamb Assignee: Charles Lamb Priority: Minor Fix several incorrect commands, typos, grammatical errors, etc. in the HDFS Federation doc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7339) Allocating and persisting block groups in NameNode
[ https://issues.apache.org/jira/browse/HDFS-7339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289868#comment-14289868 ] Zhe Zhang commented on HDFS-7339: - [~jingzhao] Thanks for the insightful review! I believe this discussion also addresses comments from [~szetszwo] under HDFS-7285, HDFS-7614, and HDFS-7652. The main reason for creating a BlockGroup class and the hierarchical block ID protocol is to _minimize NN memory overhead_. As shown in the [fsimage analysis | https://issues.apache.org/jira/secure/attachment/12690129/fsimage-analysis-20150105.pdf], the {{blocksMap}} size increases 3.5x~5.4x if the NN plainly tracks every striped block -- this translates to 10s GB of memory usage. This is mainly caused by small blocks being striped into many more even smaller blocks. bq. I think DataNode does not need to know the difference between contiguous blocks and stripped blocks (when doing recovery the datanode can learn the information from NameNode). The concept of BlockGroup should be known and used only internally in NameNode (and maybe also logically known by the client while writing). bq. Datanodes and their block reports do not distinguish stripped and contiguous blocks. And we do not need to distinguish them from the block ID. They are treated equally while storing and reporting in/from the DN. Agreed. DN is indeed group-agnostic in the current design. The only DN code change will be for block recovery and conversion. It will probably be clearer when the client patch (HDFS-7545) is ready. As shown in the [design | https://issues.apache.org/jira/secure/attachment/12687886/DataStripingSupportinHDFSClient.pdf], after receiving a newly allocated block group, the client does the following: # Calculates blocks IDs from the block group ID and the group layout (number of data and parity blocks) -- a block's ID is basically the group ID plus the block's index in the group. 
# The {{DFSOutputStream}} starts _n_ {{DataStreamer}} threads, each writing one block to its destination DN. Note that even the {{DataStreamer}} is unaware of the group -- it just follows the regular client-DN block writing protocol. Therefore the DN just receives and processes regular block creation and write requests. The DN then follows the regular block reporting protocol for all contiguous and striped blocks. Then the NN (with the logic from HDFS-7652) will parse the reported block ID and store the reported info under either {{blocksMap}} or the map of block groups. Again, the benefit of having a separate map for block groups is to avoid the order-of-magnitude increase of {{blocksMap}} size. We can track at the granularity of block groups because data loss can only happen when the entire group is under-replicated -- i.e., the number of healthy blocks in the group falls below a threshold. This coarse-grained tracking also aligns with the plan to push some monitoring and recovery workload from NN to DN, as [~sureshms] also [proposed | https://issues.apache.org/jira/browse/HDFS-7285?focusedCommentId=14192480page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14192480] at the meetup. bq. Fundamentally BlockGroup is also a BlockCollection. We do not need to assign a generation stamp to BlockGroup (and even its id can be omitted). What we need is only maintaining the mapping between block and blockgroup in the original blocksmap, recording the list of blocks in the blockgroup, and recording the blockgroups in INodeFile. This is an interesting thought and does simplify the code. But it seems to me the added complexity of tracking block groups is necessary to avoid heavy NN overhead. The generation stamp of a block group will be used to derive the stamps for its blocks (this logic is not included in the patch yet). bq. I think in this way we can simplify the current design and reuse most of the current block management code. 
Reusing block management code is a great point. While developing this patch I did have to take much {{Block}} management logic and create counterparts for {{BlockGroup}}. One possibility is to create a common ancestor class for {{Block}} and {{BlockGroup}} (e.g., {{GeneralizedBlock}}). The main commonalities are: # Both represent a contiguous range of data in a file. Therefore each file consists of an array of {{GeneralizedBlock}}. # Both are a separate unit for NN monitoring. Therefore {{BlocksMap}} can work with {{GeneralizedBlock}}. # Both have a capacity and a set of storage locations. Another alternative to reuse block management code is to treat each {{Block}} as a single-member {{BlockGroup}}. I discussed the above two alternatives offline with [~andrew.wang] and we are inclined to use separate block group management code in this JIRA and start a refactoring JIRA after more of the logic is fleshed out. At that time we'll see more clearly which option is easier.
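The "group ID plus the block's index" rule the client uses in step 1 can be sketched in a few lines. The index width is an assumption here (the real constant is derived from MAX_BLOCKS_IN_GROUP), and the class name is hypothetical:

```java
public class GroupToBlockIdsSketch {
    // Assumption: the low 4 bits of an ID carry the index within the group
    // (i.e. MAX_BLOCKS_IN_GROUP = 16); the real width comes from HDFS-7339.
    static final int INDEX_BITS = 4;

    // Block IDs for the n data+parity streamers of one group: the group ID
    // combined with each block's index -- the "group ID + index" rule above.
    static long[] blockIdsInGroup(long groupId, int numBlocks) {
        long[] ids = new long[numBlocks];
        for (int i = 0; i < numBlocks; i++) {
            ids[i] = groupId | i; // group IDs have zeroed index bits
        }
        return ids;
    }
}
```

Each {{DataStreamer}} then gets one of these IDs and writes an ordinary block, which is why the DN side needs no group awareness.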
[jira] [Commented] (HDFS-7611) deleteSnapshot and delete of a file can leave orphaned blocks in the blocksMap on NameNode restart.
[ https://issues.apache.org/jira/browse/HDFS-7611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290062#comment-14290062 ] Tsz Wo Nicholas Sze commented on HDFS-7611: --- Thanks for digging deep into it. We should fix the snapshot bug. Is there a way to change TestFileTruncate for working around the bug? It is a bad advertisement for the new truncate feature if TestFileTruncate keeps failing. deleteSnapshot and delete of a file can leave orphaned blocks in the blocksMap on NameNode restart. --- Key: HDFS-7611 URL: https://issues.apache.org/jira/browse/HDFS-7611 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.6.0 Reporter: Konstantin Shvachko Assignee: Byron Wong Priority: Critical Attachments: blocksNotDeletedTest.patch, testTruncateEditLogLoad.log If quotas are enabled a combination of operations *deleteSnapshot* and *delete* of a file can leave orphaned blocks in the blocksMap on NameNode restart. They are counted as missing on the NameNode, and can prevent NameNode from coming out of safeMode and could cause memory leak during startup. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7058) Tests for truncate CLI
[ https://issues.apache.org/jira/browse/HDFS-7058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290085#comment-14290085 ] Dasha Boudnik commented on HDFS-7058: - I can look into this. Thanks! Tests for truncate CLI -- Key: HDFS-7058 URL: https://issues.apache.org/jira/browse/HDFS-7058 Project: Hadoop HDFS Issue Type: Sub-task Components: test Affects Versions: 3.0.0 Reporter: Konstantin Shvachko Assignee: Dasha Boudnik Modify TestCLI to include general truncate tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-4919) Improve documentation of dfs.permissions.enabled flag.
[ https://issues.apache.org/jira/browse/HDFS-4919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290108#comment-14290108 ] Hadoop QA commented on HDFS-4919: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12600936/HDFS-4919.patch against trunk revision 6c3fec5. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9316//console This message is automatically generated. Improve documentation of dfs.permissions.enabled flag. -- Key: HDFS-4919 URL: https://issues.apache.org/jira/browse/HDFS-4919 Project: Hadoop HDFS Issue Type: Improvement Components: documentation Affects Versions: 3.0.0, 2.1.0-beta Reporter: Chris Nauroth Attachments: HDFS-4919.patch The description of dfs.permissions.enabled in hdfs-default.xml does not state that permissions are always checked on certain calls regardless of this configuration. The HDFS permissions guide still mentions the deprecated dfs.permissions property instead of the currently supported dfs.permissions.enabled. http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsPermissionsGuide.html#Configuration_Parameters -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3728) Update Httpfs documentation
[ https://issues.apache.org/jira/browse/HDFS-3728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290113#comment-14290113 ] Hadoop QA commented on HDFS-3728: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12550247/HDFS-3728.patch against trunk revision 6c3fec5. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9318//console This message is automatically generated. Update Httpfs documentation --- Key: HDFS-3728 URL: https://issues.apache.org/jira/browse/HDFS-3728 Project: Hadoop HDFS Issue Type: Bug Components: documentation Affects Versions: 1.0.3, 3.0.0, 2.0.2-alpha Reporter: Santhosh Srinivasan Priority: Minor Labels: newbie Attachments: HDFS-3728.patch Link: http://hadoop.apache.org/common/docs/current/hadoop-hdfs-httpfs/index.html Section: How HttpFS and Hadoop HDFS Proxy differ? # Change seening to seen # HttpFS uses a clean HTTP REST API making its use with HTTP tools more intuitive. is very subjective. Can it be rephrased or removed? Link: http://hadoop.apache.org/common/docs/current/hadoop-hdfs-httpfs/ServerSetup.html Section: Configure HttpFS # Change ...add to the httpfs-site.xml file the httpfs.hadoop.config.dir property set to... to add to the httpfs-site.xml file the httpfs.hadoop.config.dir property and set the value to ... Section: Configure Hadoop # Change defined to define Section: Restart Hadoop # Typo - to (not ot) Section: Start/Stop HttpFS # lists (plural) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager
[ https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated HDFS-7411: -- Attachment: hdfs-7411.008.patch Refactor and improve decommissioning logic into DecommissionManager --- Key: HDFS-7411 URL: https://issues.apache.org/jira/browse/HDFS-7411 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.5.1 Reporter: Andrew Wang Assignee: Andrew Wang Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch, hdfs-7411.006.patch, hdfs-7411.007.patch, hdfs-7411.008.patch Would be nice to split out decommission logic from DatanodeManager to DecommissionManager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager
[ https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290173#comment-14290173 ] Andrew Wang commented on HDFS-7411: --- If you look at version 2 of the patch, you can see the initial refactor, which consisted of moving some methods from BlockManager to DecomManager. I didn't bother splitting this though since it ended up not being very interesting. DecomManager is also basically all new code, so the old code would be moved and then subsequently deleted if we split it. I think the easiest way of reviewing it is just to read through DecomManager, which really isn't that big of a class. It's quite well commented and has lots of logging, which is part of why this change as a whole appears large. Refactor and improve decommissioning logic into DecommissionManager --- Key: HDFS-7411 URL: https://issues.apache.org/jira/browse/HDFS-7411 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.5.1 Reporter: Andrew Wang Assignee: Andrew Wang Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch, hdfs-7411.006.patch, hdfs-7411.007.patch, hdfs-7411.008.patch Would be nice to split out decommission logic from DatanodeManager to DecommissionManager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7667) Various typos and improvements to HDFS Federation doc
[ https://issues.apache.org/jira/browse/HDFS-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290002#comment-14290002 ] Charles Lamb commented on HDFS-7667: [~aw], Thanks for the review. I started out intending to just fix a few minor errors (missing articles, obviously wrong typos in commands, etc.). Then I couldn't help myself so I made some slightly larger grammatical changes and tightened up a few things. Please stop me before I kill any more and commit this. Thanks! Of course we still have not heard from Mr. Jenkins... I wonder where he is today. Various typos and improvements to HDFS Federation doc - Key: HDFS-7667 URL: https://issues.apache.org/jira/browse/HDFS-7667 Project: Hadoop HDFS Issue Type: Bug Components: documentation Affects Versions: 2.6.0 Reporter: Charles Lamb Assignee: Charles Lamb Priority: Minor Attachments: HDFS-7667.000.patch, HDFS-7667.001.patch Fix several incorrect commands, typos, grammatical errors, etc. in the HDFS Federation doc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7667) Various typos and improvements to HDFS Federation doc
[ https://issues.apache.org/jira/browse/HDFS-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-7667: --- Resolution: Fixed Fix Version/s: 3.0.0 Target Version/s: (was: 2.7.0) Status: Resolved (was: Patch Available) lol, believe I know the feeling... :D Committed to trunk. Thanks! Various typos and improvements to HDFS Federation doc - Key: HDFS-7667 URL: https://issues.apache.org/jira/browse/HDFS-7667 Project: Hadoop HDFS Issue Type: Bug Components: documentation Affects Versions: 2.6.0 Reporter: Charles Lamb Assignee: Charles Lamb Priority: Minor Fix For: 3.0.0 Attachments: HDFS-7667.000.patch, HDFS-7667.001.patch Fix several incorrect commands, typos, grammatical errors, etc. in the HDFS Federation doc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7667) Various typos and improvements to HDFS Federation doc
[ https://issues.apache.org/jira/browse/HDFS-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290032#comment-14290032 ] Charles Lamb commented on HDFS-7667: Thanks for the review and the commit [~aw]. If you're bored, HDFS-7644 is a 3 char fix. Various typos and improvements to HDFS Federation doc - Key: HDFS-7667 URL: https://issues.apache.org/jira/browse/HDFS-7667 Project: Hadoop HDFS Issue Type: Bug Components: documentation Affects Versions: 2.6.0 Reporter: Charles Lamb Assignee: Charles Lamb Priority: Minor Fix For: 3.0.0 Attachments: HDFS-7667.000.patch, HDFS-7667.001.patch Fix several incorrect commands, typos, grammatical errors, etc. in the HDFS Federation doc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-4919) Improve documentation of dfs.permissions.enabled flag.
[ https://issues.apache.org/jira/browse/HDFS-4919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290090#comment-14290090 ] Allen Wittenauer commented on HDFS-4919: This no longer applies. :( Improve documentation of dfs.permissions.enabled flag. -- Key: HDFS-4919 URL: https://issues.apache.org/jira/browse/HDFS-4919 Project: Hadoop HDFS Issue Type: Improvement Components: documentation Affects Versions: 3.0.0, 2.1.0-beta Reporter: Chris Nauroth Attachments: HDFS-4919.patch The description of dfs.permissions.enabled in hdfs-default.xml does not state that permissions are always checked on certain calls regardless of this configuration. The HDFS permissions guide still mentions the deprecated dfs.permissions property instead of the currently supported dfs.permissions.enabled. http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsPermissionsGuide.html#Configuration_Parameters -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7667) Various typos and improvements to HDFS Federation doc
[ https://issues.apache.org/jira/browse/HDFS-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290034#comment-14290034 ] Hudson commented on HDFS-7667: -- SUCCESS: Integrated in Hadoop-trunk-Commit #6920 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6920/]) HDFS-7667. Various typos and improvements to HDFS Federation doc (Charles Lamb via aw) (aw: rev d411460e0d66b9b9d58924df295a957ba84b17d7) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/site/apt/Federation.apt.vm Various typos and improvements to HDFS Federation doc - Key: HDFS-7667 URL: https://issues.apache.org/jira/browse/HDFS-7667 Project: Hadoop HDFS Issue Type: Bug Components: documentation Affects Versions: 2.6.0 Reporter: Charles Lamb Assignee: Charles Lamb Priority: Minor Fix For: 3.0.0 Attachments: HDFS-7667.000.patch, HDFS-7667.001.patch Fix several incorrect commands, typos, grammatical errors, etc. in the HDFS Federation doc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7644) minor typo in HttpFS doc
[ https://issues.apache.org/jira/browse/HDFS-7644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290048#comment-14290048 ] Charles Lamb commented on HDFS-7644: Gee, here I am fixing all these typos and I can't even get the Jira title correct. Thanks for the review and the commit [~aw]. minor typo in HttpFS doc Key: HDFS-7644 URL: https://issues.apache.org/jira/browse/HDFS-7644 Project: Hadoop HDFS Issue Type: Bug Components: documentation Affects Versions: 2.7.0 Reporter: Charles Lamb Assignee: Charles Lamb Priority: Trivial Fix For: 2.7.0 Attachments: HDFS-7644.000.patch In hadoop-httpfs/src/site/apt/index.apt.vm, s/seening/seen/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HDFS-7058) Tests for truncate CLI
[ https://issues.apache.org/jira/browse/HDFS-7058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dasha Boudnik reassigned HDFS-7058: --- Assignee: Dasha Boudnik Tests for truncate CLI -- Key: HDFS-7058 URL: https://issues.apache.org/jira/browse/HDFS-7058 Project: Hadoop HDFS Issue Type: Sub-task Components: test Affects Versions: 3.0.0 Reporter: Konstantin Shvachko Assignee: Dasha Boudnik Modify TestCLI to include general truncate tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-3750) API docs don't include HDFS
[ https://issues.apache.org/jira/browse/HDFS-3750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-3750: --- Resolution: Fixed Fix Version/s: 3.0.0 Status: Resolved (was: Patch Available) +1 Committed to trunk. Thanks! API docs don't include HDFS --- Key: HDFS-3750 URL: https://issues.apache.org/jira/browse/HDFS-3750 Project: Hadoop HDFS Issue Type: Bug Components: documentation Affects Versions: 2.0.0-alpha Reporter: Eli Collins Assignee: Jolly Chen Priority: Critical Fix For: 3.0.0 Attachments: HDFS-3750.patch [The javadocs|http://hadoop.apache.org/common/docs/current/api/index.html] don't include HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-3728) Update Httpfs documentation
[ https://issues.apache.org/jira/browse/HDFS-3728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-3728: --- Status: Open (was: Patch Available) Update Httpfs documentation --- Key: HDFS-3728 URL: https://issues.apache.org/jira/browse/HDFS-3728 Project: Hadoop HDFS Issue Type: Bug Components: documentation Affects Versions: 2.0.2-alpha, 1.0.3, 3.0.0 Reporter: Santhosh Srinivasan Priority: Minor Labels: newbie Attachments: HDFS-3728.patch Link: http://hadoop.apache.org/common/docs/current/hadoop-hdfs-httpfs/index.html Section: How HttpFS and Hadoop HDFS Proxy differ? # Change seening to seen # HttpFS uses a clean HTTP REST API making its use with HTTP tools more intuitive. is very subjective. Can it be rephrased or removed? Link: http://hadoop.apache.org/common/docs/current/hadoop-hdfs-httpfs/ServerSetup.html Section: Configure HttpFS # Change ...add to the httpfs-site.xml file the httpfs.hadoop.config.dir property set to... to add to the httpfs-site.xml file the httpfs.hadoop.config.dir property and set the value to ... Section: Configure Hadoop # Change defined to define Section: Restart Hadoop # Typo - to (not ot) Section: Start/Stop HttpFS # lists (plural) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290095#comment-14290095 ] Tsz Wo Nicholas Sze commented on HDFS-3107: --- BTW, have we updated user documentation for the truncate CLI change? HDFS truncate - Key: HDFS-3107 URL: https://issues.apache.org/jira/browse/HDFS-3107 Project: Hadoop HDFS Issue Type: New Feature Components: datanode, namenode Reporter: Lei Chang Assignee: Plamen Jeliazkov Fix For: 3.0.0 Attachments: HDFS-3107-13.patch, HDFS-3107-14.patch, HDFS-3107-15.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, HDFS-3107.15_branch2.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, editsStored, editsStored.xml Original Estimate: 1,344h Remaining Estimate: 1,344h Systems with transaction support often need to undo changes made to the underlying storage when a transaction is aborted. Currently HDFS does not support truncate (a standard Posix operation) which is a reverse operation of append, which makes upper layer applications use ugly workarounds (such as keeping track of the discarded byte range per file in a separate metadata store, and periodically running a vacuum process to rewrite compacted files) to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-3728) Update Httpfs documentation
[ https://issues.apache.org/jira/browse/HDFS-3728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-3728: --- Status: Patch Available (was: Open) Update Httpfs documentation --- Key: HDFS-3728 URL: https://issues.apache.org/jira/browse/HDFS-3728 Project: Hadoop HDFS Issue Type: Bug Components: documentation Affects Versions: 2.0.2-alpha, 1.0.3, 3.0.0 Reporter: Santhosh Srinivasan Priority: Minor Labels: newbie Attachments: HDFS-3728.patch Link: http://hadoop.apache.org/common/docs/current/hadoop-hdfs-httpfs/index.html Section: How HttpFS and Hadoop HDFS Proxy differ? # Change seening to seen # HttpFS uses a clean HTTP REST API making its use with HTTP tools more intuitive. is very subjective. Can it be rephrased or removed? Link: http://hadoop.apache.org/common/docs/current/hadoop-hdfs-httpfs/ServerSetup.html Section: Configure HttpFS # Change ...add to the httpfs-site.xml file the httpfs.hadoop.config.dir property set to... to add to the httpfs-site.xml file the httpfs.hadoop.config.dir property and set the value to ... Section: Configure Hadoop # Change defined to define Section: Restart Hadoop # Typo - to (not ot) Section: Start/Stop HttpFS # lists (plural) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager
[ https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated HDFS-7411: -- Attachment: (was: hdfs-7411.008.patch) Refactor and improve decommissioning logic into DecommissionManager --- Key: HDFS-7411 URL: https://issues.apache.org/jira/browse/HDFS-7411 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.5.1 Reporter: Andrew Wang Assignee: Andrew Wang Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch, hdfs-7411.006.patch, hdfs-7411.007.patch Would be nice to split out decommission logic from DatanodeManager to DecommissionManager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager
[ https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated HDFS-7411: -- Attachment: hdfs-7411.008.patch Refactor and improve decommissioning logic into DecommissionManager --- Key: HDFS-7411 URL: https://issues.apache.org/jira/browse/HDFS-7411 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.5.1 Reporter: Andrew Wang Assignee: Andrew Wang Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch, hdfs-7411.006.patch, hdfs-7411.007.patch, hdfs-7411.008.patch Would be nice to split out decommission logic from DatanodeManager to DecommissionManager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager
[ https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290146#comment-14290146 ] Andrew Wang commented on HDFS-7411: --- Thanks again for reviewing Colin, fixed with the following notes: bq. Grammar: is already decomissioning decommissioning in progress is a state for a node, so I think this is accurate, although ugly, language. bq. What's the rationale for initializing the DecomissionManager configuration in activate rather than in the constructor? It seems like if we initialized the conf stuff in the constructor we could make more of it final? I wasn't sure about this either, but it seems like the NN really likes for everything to be init'd with the Configuration passed when starting common services. For this particular function, I went ahead and made the config variables final since they're just scoped to that function. Since we make a new Monitor each time, those members are final there too. Refactor and improve decommissioning logic into DecommissionManager --- Key: HDFS-7411 URL: https://issues.apache.org/jira/browse/HDFS-7411 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.5.1 Reporter: Andrew Wang Assignee: Andrew Wang Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch, hdfs-7411.006.patch, hdfs-7411.007.patch, hdfs-7411.008.patch Would be nice to split out decommission logic from DatanodeManager to DecommissionManager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7667) Various typos and improvements to HDFS Federation doc
[ https://issues.apache.org/jira/browse/HDFS-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289985#comment-14289985 ] Allen Wittenauer commented on HDFS-7667: There are other problems in the doc, but I don't know if you want to fix them now or wait till later. Following the mantra of Don't let best stop better, I'm +1 for committing this to trunk. Let me know if you want to continue working on it or commit this version now. :) Thanks! Various typos and improvements to HDFS Federation doc - Key: HDFS-7667 URL: https://issues.apache.org/jira/browse/HDFS-7667 Project: Hadoop HDFS Issue Type: Bug Components: documentation Affects Versions: 2.6.0 Reporter: Charles Lamb Assignee: Charles Lamb Priority: Minor Attachments: HDFS-7667.000.patch, HDFS-7667.001.patch Fix several incorrect commands, typos, grammatical errors, etc. in the HDFS Federation doc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3689) Add support for variable length block
[ https://issues.apache.org/jira/browse/HDFS-3689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289966#comment-14289966 ] Hadoop QA commented on HDFS-3689: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12694208/HDFS-3689.009.patch against trunk revision 24aa462. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 14 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs hadoop-hdfs-project/hadoop-hdfs-nfs: org.apache.hadoop.security.ssl.TestReloadingX509TrustManager org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover org.apache.hadoop.hdfs.server.namenode.TestFileTruncate org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA The following test timeouts occurred in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs hadoop-hdfs-project/hadoop-hdfs-nfs: org.apache.hadoop.hdfs.qjournal.client.TestQJMWithFaults Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/9313//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9313//console This message is automatically generated. 
Add support for variable length block - Key: HDFS-3689 URL: https://issues.apache.org/jira/browse/HDFS-3689 Project: Hadoop HDFS Issue Type: New Feature Components: datanode, hdfs-client, namenode Affects Versions: 3.0.0 Reporter: Suresh Srinivas Assignee: Jing Zhao Attachments: HDFS-3689.000.patch, HDFS-3689.001.patch, HDFS-3689.002.patch, HDFS-3689.003.patch, HDFS-3689.003.patch, HDFS-3689.004.patch, HDFS-3689.005.patch, HDFS-3689.006.patch, HDFS-3689.007.patch, HDFS-3689.008.patch, HDFS-3689.008.patch, HDFS-3689.009.patch, HDFS-3689.009.patch, editsStored Currently HDFS supports fixed length blocks. Supporting variable length block will allow new use cases and features to be built on top of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7339) Allocating and persisting block groups in NameNode
[ https://issues.apache.org/jira/browse/HDFS-7339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289983#comment-14289983 ] Tsz Wo Nicholas Sze commented on HDFS-7339: --- The main reason for creating a BlockGroup class and the hierarchical block ID protocol is to minimize NN memory overhead. ... This can be achieved by using consecutive (normal) block IDs for the blocks in a block group without dividing the ID space; see below. (This is not easy to describe; please let me know if you are confused.) - For the block groups stored in the namenode, only store the first block ID. The other block IDs can be deduced with the storage policy. - Use the same generation stamp for all the blocks. - How to support lookups in BlocksMap? There are several ways described below. -# Change the hash function so that consecutive IDs will be mapped to the same hash value, and implement BlockGroup.equals(..) so that it returns true for any block ID in the group. For example, we may use only the high 60 bits for computing the hash code. Suppose the blocks in a block group have IDs from 0x302 to 0x30A. We will be able to look up the block group using any of the block IDs. What happens if the first ID is near the low 4-bit boundary, say 0x30D? We may simply skip to 0x310 when allocating the block IDs so that it won't happen. -# We may also store the first ID (or the offset to the first ID) in the datanode for EC blocks. This seems not a good solution. If we enforce block ID allocation so that the low 4 bits of the first ID must be zeros, then it is very similar to the scheme proposed in the design doc, except that there is no notion of block group in the block IDs. 
Allocating and persisting block groups in NameNode -- Key: HDFS-7339 URL: https://issues.apache.org/jira/browse/HDFS-7339 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Zhe Zhang Assignee: Zhe Zhang Attachments: HDFS-7339-001.patch, HDFS-7339-002.patch, HDFS-7339-003.patch, HDFS-7339-004.patch, HDFS-7339-005.patch, HDFS-7339-006.patch, Meta-striping.jpg, NN-stripping.jpg All erasure codec operations center around the concept of _block group_; they are formed in initial encoding and looked up in recoveries and conversions. A lightweight class {{BlockGroup}} is created to record the original and parity blocks in a coding group, as well as a pointer to the codec schema (pluggable codec schemas will be supported in HDFS-7337). With the striping layout, the HDFS client needs to operate on all blocks in a {{BlockGroup}} concurrently. Therefore we propose to extend a file’s inode to switch between _contiguous_ and _striping_ modes, with the current mode recorded in a binary flag. An array of BlockGroups (or BlockGroup IDs) is added, which remains empty for “traditional” HDFS files with contiguous block layout. The NameNode creates and maintains {{BlockGroup}} instances through the new {{ECManager}} component; the attached figure has an illustration of the architecture. As a simple example, when a {_Striping+EC_} file is created and written to, it will serve requests from the client to allocate new {{BlockGroups}} and store them under the {{INodeFile}}. In the current phase, {{BlockGroups}} are allocated both in initial online encoding and in the conversion from replication to EC. {{ECManager}} also facilitates the lookup of {{BlockGroup}} information for block recovery work. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
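The lookup scheme sketched in the comment above (hash on the high 60 bits so every member ID of a block group lands in the same BlocksMap bucket) can be illustrated as follows. This is a standalone sketch under assumed names — the class, methods, and the 4-bit split are illustrative, not actual HDFS code.

```java
// Hypothetical illustration of the ID-masking lookup scheme: member block IDs
// of one group are consecutive, and the bucket hash is computed from the high
// 60 bits only, so any member ID hashes to the group's bucket.
public class BlockGroupHashDemo {
    static final int LOW_BITS = 4;  // low 4 bits distinguish blocks within a group

    static int bucketHash(long blockId) {
        return Long.hashCode(blockId >>> LOW_BITS);  // high 60 bits only
    }

    // True iff every ID in [first, last] falls into the same hash bucket.
    static boolean sameBucket(long first, long last) {
        for (long id = first; id <= last; id++) {
            if (bucketHash(id) != bucketHash(first)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // 9 consecutive IDs starting at 0x302 stay within one 16-ID window: OK.
        System.out.println("0x302..0x30a same bucket: " + sameBucket(0x302, 0x30A));
        // Starting at 0x30d crosses the low 4-bit boundary, which is why the
        // allocator would skip ahead to 0x310 instead of using this range.
        System.out.println("0x30d..0x315 same bucket: " + sameBucket(0x30D, 0x315));
    }
}
```

The second case shows the motivation for skipping to an aligned first ID: without the skip, a group could straddle two hash buckets and the single-bucket lookup would fail.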
[jira] [Updated] (HDFS-7644) minor typo in HttpFS doc
[ https://issues.apache.org/jira/browse/HDFS-7644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-7644: --- Summary: minor typo in HttpFS doc (was: minor typo in HffpFS doc) minor typo in HttpFS doc Key: HDFS-7644 URL: https://issues.apache.org/jira/browse/HDFS-7644 Project: Hadoop HDFS Issue Type: Bug Components: documentation Affects Versions: 2.7.0 Reporter: Charles Lamb Assignee: Charles Lamb Priority: Trivial Attachments: HDFS-7644.000.patch In hadoop-httpfs/src/site/apt/index.apt.vm, s/seening/seen/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-4922) Improve the short-circuit document
[ https://issues.apache.org/jira/browse/HDFS-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290101#comment-14290101 ] Allen Wittenauer commented on HDFS-4922: If someone updates this, we can get this committed :) Improve the short-circuit document -- Key: HDFS-4922 URL: https://issues.apache.org/jira/browse/HDFS-4922 Project: Hadoop HDFS Issue Type: Improvement Components: documentation, hdfs-client Affects Versions: 3.0.0, 2.1.0-beta Reporter: Fengdong Yu Assignee: Fengdong Yu Priority: Minor Attachments: HDFS-4922-002.patch, HDFS-4922-003.patch, HDFS-4922-004.patch, HDFS-4922-005.patch, HDFS-4922-006.patch, HDFS-4922.patch explain the default value and add one configure key, which doesn't show in the document, but exists in the code. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7320) The appearance of hadoop-hdfs-httpfs site docs is inconsistent
[ https://issues.apache.org/jira/browse/HDFS-7320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290135#comment-14290135 ] Hudson commented on HDFS-7320: -- FAILURE: Integrated in Hadoop-trunk-Commit #6923 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6923/]) HDFS-7320. The appearance of hadoop-hdfs-httpfs site docs is inconsistent (Masatake Iwasaki via aw) (aw: rev 8f26d5a8a13539e8292c1cf7f141eff7e58984a5) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs-httpfs/pom.xml The appearance of hadoop-hdfs-httpfs site docs is inconsistent --- Key: HDFS-7320 URL: https://issues.apache.org/jira/browse/HDFS-7320 Project: Hadoop HDFS Issue Type: Bug Components: documentation Reporter: Masatake Iwasaki Assignee: Masatake Iwasaki Priority: Minor Fix For: 3.0.0 Attachments: HDFS-7320.1.patch The docs of hadoop-hdfs-httpfs use different maven-base.css and maven-theme.css from other modules. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7667) Various typos and improvements to HDFS Federation doc
[ https://issues.apache.org/jira/browse/HDFS-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290190#comment-14290190 ] Hadoop QA commented on HDFS-7667: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12694230/HDFS-7667.001.patch against trunk revision 56df5f4. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+0 tests included{color}. The patch appears to be a documentation patch that doesn't require tests. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.TestFileTruncate Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/9315//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9315//console This message is automatically generated. Various typos and improvements to HDFS Federation doc - Key: HDFS-7667 URL: https://issues.apache.org/jira/browse/HDFS-7667 Project: Hadoop HDFS Issue Type: Bug Components: documentation Affects Versions: 2.6.0 Reporter: Charles Lamb Assignee: Charles Lamb Priority: Minor Fix For: 3.0.0 Attachments: HDFS-7667.000.patch, HDFS-7667.001.patch Fix several incorrect commands, typos, grammatical errors, etc. in the HDFS Federation doc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7058) Tests for truncate CLI
[ https://issues.apache.org/jira/browse/HDFS-7058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-7058: -- Description: Modify TestCLI to include general truncate tests. (was: Comprehensive test coverage for truncate.) Summary: Tests for truncate CLI (was: Tests for truncate) Revised summary and description. Tests for truncate CLI -- Key: HDFS-7058 URL: https://issues.apache.org/jira/browse/HDFS-7058 Project: Hadoop HDFS Issue Type: Sub-task Components: test Affects Versions: 3.0.0 Reporter: Konstantin Shvachko Modify TestCLI to include general truncate tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7644) minor typo in HttpFS doc
[ https://issues.apache.org/jira/browse/HDFS-7644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290054#comment-14290054 ] Hudson commented on HDFS-7644: -- FAILURE: Integrated in Hadoop-trunk-Commit #6921 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6921/]) HDFS-7644. minor typo in HttpFS doc (Charles Lamb via aw) (aw: rev 5c93ca2f3cfd9ebcb98be89c3a238a36c03f4422) * hadoop-hdfs-project/hadoop-hdfs-httpfs/src/site/apt/index.apt.vm * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt minor typo in HttpFS doc Key: HDFS-7644 URL: https://issues.apache.org/jira/browse/HDFS-7644 Project: Hadoop HDFS Issue Type: Bug Components: documentation Affects Versions: 2.7.0 Reporter: Charles Lamb Assignee: Charles Lamb Priority: Trivial Fix For: 2.7.0 Attachments: HDFS-7644.000.patch In hadoop-httpfs/src/site/apt/index.apt.vm, s/seening/seen/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6261) Add document for enabling node group layer in HDFS
[ https://issues.apache.org/jira/browse/HDFS-6261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290092#comment-14290092 ] Allen Wittenauer commented on HDFS-6261: I'd prefer to see this get merged into the RackAwareness documentation rather than building a completely new doc. Add document for enabling node group layer in HDFS -- Key: HDFS-6261 URL: https://issues.apache.org/jira/browse/HDFS-6261 Project: Hadoop HDFS Issue Type: Task Components: documentation Reporter: Wenwu Peng Assignee: Binglin Chang Labels: documentation Attachments: 2-layer-topology.png, 3-layer-topology.png, 3layer-topology.png, 4layer-topology.png, HDFS-6261.v1.patch, HDFS-6261.v1.patch, HDFS-6261.v2.patch, HDFS-6261.v3.patch Most of the patches from umbrella JIRA HADOOP-8468 have been committed; however, there is no doc introducing NodeGroup awareness (Hadoop Virtualization Extensions) and how to configure it, so we need to document it. 1. Document NodeGroup awareness in http://hadoop.apache.org/docs/current 2. Document NodeGroup-aware properties in core-default.xml. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7320) The appearance of hadoop-hdfs-httpfs site docs is inconsistent
[ https://issues.apache.org/jira/browse/HDFS-7320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-7320: --- Resolution: Fixed Fix Version/s: 3.0.0 Status: Resolved (was: Patch Available) +1 committed to trunk Thanks! The appearance of hadoop-hdfs-httpfs site docs is inconsistent --- Key: HDFS-7320 URL: https://issues.apache.org/jira/browse/HDFS-7320 Project: Hadoop HDFS Issue Type: Bug Components: documentation Reporter: Masatake Iwasaki Assignee: Masatake Iwasaki Priority: Minor Fix For: 3.0.0 Attachments: HDFS-7320.1.patch The docs of hadoop-hdfs-httpfs use different maven-base.css and maven-theme.css from other modules. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-4922) Improve the short-circuit document
[ https://issues.apache.org/jira/browse/HDFS-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290112#comment-14290112 ] Hadoop QA commented on HDFS-4922: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12622806/HDFS-4922-006.patch against trunk revision 6c3fec5. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9317//console This message is automatically generated. Improve the short-circuit document -- Key: HDFS-4922 URL: https://issues.apache.org/jira/browse/HDFS-4922 Project: Hadoop HDFS Issue Type: Improvement Components: documentation, hdfs-client Affects Versions: 3.0.0, 2.1.0-beta Reporter: Fengdong Yu Assignee: Fengdong Yu Priority: Minor Attachments: HDFS-4922-002.patch, HDFS-4922-003.patch, HDFS-4922-004.patch, HDFS-4922-005.patch, HDFS-4922-006.patch, HDFS-4922.patch explain the default value and add one configure key, which doesn't show in the document, but exists in the code. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6261) Add document for enabling node group layer in HDFS
[ https://issues.apache.org/jira/browse/HDFS-6261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290111#comment-14290111 ] Hadoop QA commented on HDFS-6261: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12660927/HDFS-6261.v3.patch against trunk revision 6c3fec5. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9319//console This message is automatically generated. Add document for enabling node group layer in HDFS -- Key: HDFS-6261 URL: https://issues.apache.org/jira/browse/HDFS-6261 Project: Hadoop HDFS Issue Type: Task Components: documentation Reporter: Wenwu Peng Assignee: Binglin Chang Labels: documentation Attachments: 2-layer-topology.png, 3-layer-topology.png, 3layer-topology.png, 4layer-topology.png, HDFS-6261.v1.patch, HDFS-6261.v1.patch, HDFS-6261.v2.patch, HDFS-6261.v3.patch Most of the patches from umbrella JIRA HADOOP-8468 have been committed; however, there is no doc introducing NodeGroup awareness (Hadoop Virtualization Extensions) and how to configure it, so we need to document it. 1. Document NodeGroup awareness in http://hadoop.apache.org/docs/current 2. Document NodeGroup-aware properties in core-default.xml. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager
[ https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290152#comment-14290152 ] Tsz Wo Nicholas Sze commented on HDFS-7411: --- Could we separate the code refactoring and the improvement into two JIRAs? The refactoring, though probably a big patch, would be easy to review. The improvement patch would then be much smaller. Refactor and improve decommissioning logic into DecommissionManager --- Key: HDFS-7411 URL: https://issues.apache.org/jira/browse/HDFS-7411 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.5.1 Reporter: Andrew Wang Assignee: Andrew Wang Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch, hdfs-7411.006.patch, hdfs-7411.007.patch, hdfs-7411.008.patch Would be nice to split out decommission logic from DatanodeManager to DecommissionManager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7666) Datanode blockId layout upgrade threads should be daemon thread
[ https://issues.apache.org/jira/browse/HDFS-7666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289994#comment-14289994 ] Hadoop QA commented on HDFS-7666: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12694194/HDFS-7666-v1.patch against trunk revision 24aa462. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover org.apache.hadoop.hdfs.TestDecommission org.apache.hadoop.hdfs.qjournal.TestSecureNNWithQJM Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/9312//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9312//console This message is automatically generated. Datanode blockId layout upgrade threads should be daemon thread --- Key: HDFS-7666 URL: https://issues.apache.org/jira/browse/HDFS-7666 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Reporter: Rakesh R Assignee: Rakesh R Attachments: HDFS-7666-v1.patch This jira is to mark the layout upgrade thread as daemon thread. 
{code}
int numLinkWorkers = datanode.getConf().getInt(
    DFSConfigKeys.DFS_DATANODE_BLOCK_ID_LAYOUT_UPGRADE_THREADS_KEY,
    DFSConfigKeys.DFS_DATANODE_BLOCK_ID_LAYOUT_UPGRADE_THREADS);
ExecutorService linkWorkers = Executors.newFixedThreadPool(numLinkWorkers);
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
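A minimal sketch (not the actual HDFS-7666 patch) of how a fixed pool like the one above can be built with a daemon `ThreadFactory`, so the upgrade workers cannot keep the JVM alive during shutdown. The pool-name prefix and helper method are illustrative.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadFactory;

public class DaemonPoolDemo {
    // Build a fixed-size pool whose worker threads are marked daemon.
    static ExecutorService newDaemonFixedPool(int numThreads, String prefix) {
        ThreadFactory daemonFactory = runnable -> {
            Thread t = new Thread(runnable, prefix + "-worker");
            t.setDaemon(true);  // the key change: daemon threads don't block JVM exit
            return t;
        };
        return Executors.newFixedThreadPool(numThreads, daemonFactory);
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = newDaemonFixedPool(2, "blockid-layout-upgrade");
        // Verify that tasks actually run on daemon threads.
        Future<Boolean> f = pool.submit(() -> Thread.currentThread().isDaemon());
        System.out.println("daemon=" + f.get());
        pool.shutdown();
    }
}
```

Without a custom factory, `Executors.newFixedThreadPool(int)` creates non-daemon threads, which is exactly why the JIRA proposes the change.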
[jira] [Updated] (HDFS-7644) minor typo in HttpFS doc
[ https://issues.apache.org/jira/browse/HDFS-7644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-7644: --- Resolution: Fixed Fix Version/s: 2.7.0 Status: Resolved (was: Patch Available) +1 Committed to branch-2 and trunk. Thanks! minor typo in HttpFS doc Key: HDFS-7644 URL: https://issues.apache.org/jira/browse/HDFS-7644 Project: Hadoop HDFS Issue Type: Bug Components: documentation Affects Versions: 2.7.0 Reporter: Charles Lamb Assignee: Charles Lamb Priority: Trivial Fix For: 2.7.0 Attachments: HDFS-7644.000.patch In hadoop-httpfs/src/site/apt/index.apt.vm, s/seening/seen/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290072#comment-14290072 ] Tsz Wo Nicholas Sze commented on HDFS-3107: --- There is a list of unresolved JIRAs. Let's discuss it. - HDFS-7341 Add initial snapshot support based on pipeline recovery Is it still relevant? - HDFS-7058 Tests for truncate CLI Let's finish it before merging since the CLI is user facing. Is anyone working on it? - HDFS-7655/HDFS-7656 Expose truncate API for Web HDFS/httpfs It seems that we should not wait for them before merging. Agree? - HDFS-7659 We should check the new length of truncate can't be a negative value Looks like this is going to be committed soon. - HDFS-7665 Add definition of truncate preconditions/postconditions to filesystem specification This is simply a documentation change. Let's finish it before merging? As a summary, how about finishing HDFS-7058, HDFS-7659 and HDFS-7665 before merging it to branch-2? HDFS truncate - Key: HDFS-3107 URL: https://issues.apache.org/jira/browse/HDFS-3107 Project: Hadoop HDFS Issue Type: New Feature Components: datanode, namenode Reporter: Lei Chang Assignee: Plamen Jeliazkov Fix For: 3.0.0 Attachments: HDFS-3107-13.patch, HDFS-3107-14.patch, HDFS-3107-15.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, HDFS-3107.15_branch2.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, editsStored, editsStored.xml Original Estimate: 1,344h Remaining Estimate: 1,344h Systems with transaction support often need to undo changes made to the underlying storage when a transaction is aborted. 
Currently HDFS does not support truncate (a standard POSIX operation), which is the reverse operation of append. This makes upper-layer applications use ugly workarounds (such as keeping track of the discarded byte range per file in a separate metadata store, and periodically running a vacuum process to rewrite compacted files) to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
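The truncate contract being discussed (the reverse of append; HDFS-7659 adds the check that the new length cannot be negative) can be illustrated on a local filesystem. This sketch uses java.nio rather than the HDFS client API, purely to show the semantics.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// POSIX-style truncate on a local file: shrink the file to the requested
// length, discarding everything past it. NOT the HDFS client API.
public class TruncateDemo {
    public static void main(String[] args) throws IOException {
        Path p = Files.createTempFile("truncate-demo", ".txt");
        Files.write(p, "0123456789".getBytes(StandardCharsets.UTF_8));
        try (FileChannel ch = FileChannel.open(p, StandardOpenOption.WRITE)) {
            ch.truncate(4);  // keep bytes 0..3, discard the rest
        }
        System.out.println(new String(Files.readAllBytes(p), StandardCharsets.UTF_8));
        Files.delete(p);
    }
}
```

The precondition debated in the sub-tasks maps directly onto this model: the new length must be non-negative and no greater than the current file length, otherwise the operation is rejected.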
[jira] [Commented] (HDFS-3750) API docs don't include HDFS
[ https://issues.apache.org/jira/browse/HDFS-3750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290121#comment-14290121 ] Hudson commented on HDFS-3750: -- SUCCESS: Integrated in Hadoop-trunk-Commit #6922 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6922/]) HDFS-3750. API docs don't include HDFS (Jolly Chen via aw) (aw: rev 6c3fec5ec25caabbd8c5ac795a5bc5229b5365de) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * pom.xml API docs don't include HDFS --- Key: HDFS-3750 URL: https://issues.apache.org/jira/browse/HDFS-3750 Project: Hadoop HDFS Issue Type: Bug Components: documentation Affects Versions: 2.0.0-alpha Reporter: Eli Collins Assignee: Jolly Chen Priority: Critical Fix For: 3.0.0 Attachments: HDFS-3750.patch [The javadocs|http://hadoop.apache.org/common/docs/current/api/index.html] don't include HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7667) Various typos and improvements to HDFS Federation doc
[ https://issues.apache.org/jira/browse/HDFS-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290188#comment-14290188 ] Hadoop QA commented on HDFS-7667: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12694227/HDFS-7667.000.patch against trunk revision 56df5f4. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+0 tests included{color}. The patch appears to be a documentation patch that doesn't require tests. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/9314//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9314//console This message is automatically generated. Various typos and improvements to HDFS Federation doc - Key: HDFS-7667 URL: https://issues.apache.org/jira/browse/HDFS-7667 Project: Hadoop HDFS Issue Type: Bug Components: documentation Affects Versions: 2.6.0 Reporter: Charles Lamb Assignee: Charles Lamb Priority: Minor Fix For: 3.0.0 Attachments: HDFS-7667.000.patch, HDFS-7667.001.patch Fix several incorrect commands, typos, grammatical errors, etc. in the HDFS Federation doc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7644) minor typo in HttpFS doc
[ https://issues.apache.org/jira/browse/HDFS-7644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289203#comment-14289203 ] Charles Lamb commented on HDFS-7644: The FB warnings are spurious. minor typo in HttpFS doc Key: HDFS-7644 URL: https://issues.apache.org/jira/browse/HDFS-7644 Project: Hadoop HDFS Issue Type: Bug Components: documentation Affects Versions: 2.7.0 Reporter: Charles Lamb Assignee: Charles Lamb Priority: Trivial Attachments: HDFS-7644.000.patch In hadoop-httpfs/src/site/apt/index.apt.vm, s/seening/seen/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7653) Block Readers and Writers used in both client side and datanode side
[ https://issues.apache.org/jira/browse/HDFS-7653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289262#comment-14289262 ] Li Bo commented on HDFS-7653: - Hi Zhe, thanks for your comments. 1. BlockReader contains some client-related methods, and we may add new methods to the interface, so I chose to create a new one. 2. We will create a new dfs input stream named DFSStripeInputStream; it uses BlockGroupReader to read data from different datanodes. BlockGroupReader contains a set of BlockReaders. I will upload code to HDFS-7545 in several days. The current implementation tries to make client and datanode EC encoding/decoding work in the same model. 3. Agreed that we should simplify the logic of BlockWriter between datanodes. I will optimize the code later. I am not clear on why FSOutputSummer is not appropriate for a block writer; it contains a data buffer and a checksum buffer. 4. We can have a further discussion after the code is uploaded to HDFS-7545. I will refer to the RSStriper logic in QFS to optimize the current implementation. Nits: 1. We can read a byte and put it into buf immediately. The position of buf needs to remain the same; I will fix it and add a unit test. 2. I will change the unit test class name later. I generated the patch after all test cases passed. getBlockFile() is a static method, and other classes also use it as MiniDFSCluster.getBlockFile. You can check further. 
Block Readers and Writers used in both client side and datanode side Key: HDFS-7653 URL: https://issues.apache.org/jira/browse/HDFS-7653 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Li Bo Assignee: Li Bo Attachments: BlockReadersWriters.patch There are a lot of block read/write operations in HDFS-EC. For example, when a client writes a file in striping layout, it has to write several blocks to several different datanodes; if a datanode wants to do an encoding/decoding task, it has to read several blocks from itself and other datanodes, and write one or more blocks to itself or other datanodes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
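The striped write path described above can be sketched with plain arithmetic: a logical file offset determines which data block in a block group it lands in, and where inside that block. This is an illustrative model only, not code from the patch; the cell size and the six-block group width are assumed example values.

```java
// Illustrative model of a striped block layout. DATA_BLOCKS and
// CELL_SIZE are assumed values for the example, not from the patch.
public class StripeLayout {
    public static final int DATA_BLOCKS = 6;        // data blocks per block group (assumed)
    public static final long CELL_SIZE = 64 * 1024; // striping cell size (assumed)

    // Index of the data block within the group that holds this logical offset.
    public static int blockIndex(long offset) {
        return (int) ((offset / CELL_SIZE) % DATA_BLOCKS);
    }

    // Byte offset inside that block's own byte stream.
    public static long offsetInBlock(long offset) {
        long stripeRow = offset / (CELL_SIZE * DATA_BLOCKS); // full cell rows above this offset
        return stripeRow * CELL_SIZE + (offset % CELL_SIZE);
    }

    public static void main(String[] args) {
        long off = 6 * CELL_SIZE; // first byte of the second stripe row
        System.out.println(blockIndex(off) + " " + offsetInBlock(off)); // prints "0 65536"
    }
}
```

A BlockGroupReader as described in the comment would invoke one BlockReader per value of blockIndex, seeking each to offsetInBlock.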
[jira] [Updated] (HDFS-3689) Add support for variable length block
[ https://issues.apache.org/jira/browse/HDFS-3689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-3689: Attachment: HDFS-3689.009.patch Thanks again Nicholas! Updated the patch to add quota verification. Add support for variable length block - Key: HDFS-3689 URL: https://issues.apache.org/jira/browse/HDFS-3689 Project: Hadoop HDFS Issue Type: New Feature Components: datanode, hdfs-client, namenode Affects Versions: 3.0.0 Reporter: Suresh Srinivas Assignee: Jing Zhao Attachments: HDFS-3689.000.patch, HDFS-3689.001.patch, HDFS-3689.002.patch, HDFS-3689.003.patch, HDFS-3689.003.patch, HDFS-3689.004.patch, HDFS-3689.005.patch, HDFS-3689.006.patch, HDFS-3689.007.patch, HDFS-3689.008.patch, HDFS-3689.008.patch, HDFS-3689.009.patch Currently HDFS supports fixed length blocks. Supporting variable length blocks will allow new use cases and features to be built on top of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7418) Raw Reed-Solomon coder in pure Java
[ https://issues.apache.org/jira/browse/HDFS-7418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kai Zheng updated HDFS-7418: Attachment: HDFS-7418-v1.patch Uploaded an initial patch, pending submission since it depends on the one in HDFS-7353. Raw Reed-Solomon coder in pure Java --- Key: HDFS-7418 URL: https://issues.apache.org/jira/browse/HDFS-7418 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kai Zheng Assignee: Kai Zheng Attachments: HDFS-7418-v1.patch This will implement the RS coder by porting existing code from HDFS-RAID into the new codec and coder framework, which could be useful in case native support isn't available or convenient in some environments or platforms. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7337) Configurable and pluggable Erasure Codec and schema
[ https://issues.apache.org/jira/browse/HDFS-7337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14288973#comment-14288973 ] Kai Zheng commented on HDFS-7337: - As discussed, most of this source code will be moved to the hadoop-common side, but I'm not sure if it's OK to still use these JIRA entries that start with HDFS, instead of HADOOP. Would anyone help confirm this? It would be great if we don't have to change; it's reasonable because it does work for HDFS, although for other considerations we'd better move over there. Configurable and pluggable Erasure Codec and schema --- Key: HDFS-7337 URL: https://issues.apache.org/jira/browse/HDFS-7337 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Zhe Zhang Assignee: Kai Zheng Attachments: HDFS-7337-prototype-v1.patch, HDFS-7337-prototype-v2.zip, HDFS-7337-prototype-v3.zip, PluggableErasureCodec.pdf According to HDFS-7285 and the design, this JIRA considers supporting multiple Erasure Codecs via a pluggable approach. It allows defining and configuring multiple codec schemas with different coding algorithms and parameters. The resultant codec schemas can be utilized and specified via a command tool for different file folders. While designing and implementing such a pluggable framework, we will also implement a concrete codec by default (Reed-Solomon) to prove the framework is useful and workable. A separate JIRA could be opened for the RS codec implementation. Note HDFS-7353 will focus on the very low level codec API and implementation to make concrete vendor libraries transparent to the upper layer. This JIRA focuses on high-level concerns that interact with configuration, schema, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7669) HDFS Design Doc references commands that no longer exist.
Allen Wittenauer created HDFS-7669: -- Summary: HDFS Design Doc references commands that no longer exist. Key: HDFS-7669 URL: https://issues.apache.org/jira/browse/HDFS-7669 Project: Hadoop HDFS Issue Type: Bug Reporter: Allen Wittenauer hadoop dfs should be hadoop fs hadoop dfsadmin should be hdfs dfsadmin hadoop dfs -rmr should be hadoop fs -rm -R -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6673) Add Delimited format supports for PB OIV tool
[ https://issues.apache.org/jira/browse/HDFS-6673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290278#comment-14290278 ] Haohui Mai commented on HDFS-6673: -- -0 bq. ... a fairly large fsimage here. ... I think the current level of performance is sufficient for the vast majority of our customers. The test is on a 3G fsimage which can easily fit in the working set of your laptop. Multiple users are running much larger clusters, where their fsimage can be as big as 40G (see HDFS-5698). I see the value of using LevelDB as a swap space to handle fsimages that are bigger than the working set, but what are the net benefits that the tool brings in if it can only handle fsimages that are 10x smaller than the ones in some of the production clusters? bq. Furthermore, this is a boolean improvement over the previous state of affairs; currently, we have no delimited OIV tool, and with this patch, we do. This is not true. Delimited OIV was such a headache that we had to revive the legacy fsimage saver / loader / oiv in HDFS-6293. bq. This is the result of a few rounds of performance tuning. You guys deserve all the credit for getting this tool working, but given we have HDFS-6293 as a solid solution today, I would much rather see this tool be capable of handling fsimages from real, large-scale production runs (at least from the design point of view), instead of putting in a half-baked solution as-is. I'm also happy to provide help if necessary. 
Add Delimited format supports for PB OIV tool - Key: HDFS-6673 URL: https://issues.apache.org/jira/browse/HDFS-6673 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: 2.4.0 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Priority: Minor Attachments: HDFS-6673.000.patch, HDFS-6673.001.patch, HDFS-6673.002.patch, HDFS-6673.003.patch, HDFS-6673.004.patch, HDFS-6673.005.patch The new oiv tool, which is designed for the Protobuf fsimage, lacks a few features supported in the old {{oiv}} tool. This task adds support for the _Delimited_ processor to the oiv tool. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7665) Add definition of truncate preconditions/postconditions to filesystem specification
[ https://issues.apache.org/jira/browse/HDFS-7665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290352#comment-14290352 ] Konstantin Shvachko commented on HDFS-7665: --- Steve, could you mention where exactly the specifications should be added? I understand we need to add the truncate operation to the HDFS documentation here: - http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/FileSystemShell.html - http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/filesystem/filesystem.html I guess you meant something else. Add definition of truncate preconditions/postconditions to filesystem specification --- Key: HDFS-7665 URL: https://issues.apache.org/jira/browse/HDFS-7665 Project: Hadoop HDFS Issue Type: Sub-task Components: documentation Affects Versions: 3.0.0 Reporter: Steve Loughran Fix For: 3.0.0 With the addition of a major new feature to filesystems, the filesystem specification in hadoop-common/site is now out of sync. This means that # there's no strict specification of what it should do # you can't derive tests from that specification # other people trying to implement the API will have to infer what to do from the HDFS source # there's no way to decide whether or not the HDFS implementation does what is intended. # without matching tests against the raw local FS, differences between the HDFS impl and the POSIX standard one won't be caught until it is potentially too late to fix. The operation should be relatively easy to define (after a truncate, the file's bytes [0...len-1] must equal the original bytes, length(file)==len, etc.) The truncate tests already written could then be pulled up into contract tests which any filesystem implementation can run against. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
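The postcondition named in the description can be expressed as a short, self-contained sketch. This uses the local filesystem via java.nio (not the HDFS API or the actual contract-test framework) purely to illustrate what such a spec test would assert; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

public class TruncateContract {
    // Truncates the file and checks the postcondition stated in the issue:
    // length(file) == len, and bytes [0..len-1] equal the original bytes.
    public static boolean postconditionHolds(Path file, long newLen) throws IOException {
        byte[] before = Files.readAllBytes(file);
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.WRITE)) {
            ch.truncate(newLen); // discards bytes at offsets >= newLen
        }
        byte[] after = Files.readAllBytes(file);
        return after.length == newLen
            && Arrays.equals(after, Arrays.copyOfRange(before, 0, (int) newLen));
    }

    public static void main(String[] args) throws IOException {
        Path f = Files.createTempFile("truncate", ".dat");
        Files.write(f, "hello world".getBytes());
        System.out.println(postconditionHolds(f, 5)); // prints "true"
        Files.delete(f);
    }
}
```

A contract test in the filesystem specification would run the same check against every FileSystem implementation, including HDFS and the raw local FS.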
[jira] [Commented] (HDFS-6729) Support maintenance mode for DN
[ https://issues.apache.org/jira/browse/HDFS-6729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290402#comment-14290402 ] Hadoop QA commented on HDFS-6729: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12694290/HDFS-6729.004.patch against trunk revision 8f26d5a. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.cli.TestHDFSCLI Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/9321//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9321//console This message is automatically generated. Support maintenance mode for DN --- Key: HDFS-6729 URL: https://issues.apache.org/jira/browse/HDFS-6729 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 2.5.0 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Attachments: HDFS-6729.000.patch, HDFS-6729.001.patch, HDFS-6729.002.patch, HDFS-6729.003.patch, HDFS-6729.004.patch Some maintenance work (e.g., upgrading RAM or adding disks) on a DataNode takes only a short amount of time (e.g., 10 minutes). 
In these cases, the users do not want to report missing blocks on this DN because the DN will be online shortly without data loss. Thus, we need a maintenance mode for a DN so that maintenance work can be carried out on the DN without having to decommission it or the DN being marked as dead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6729) Support maintenance mode for DN
[ https://issues.apache.org/jira/browse/HDFS-6729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lei (Eddy) Xu updated HDFS-6729: Attachment: HDFS-6729.004.patch Updated the patch to: # add {{dfsadmin -setMaintenanceMode}} command and RPCs to NN # change {{dfsadmin -report}} to display maintenance node information. Support maintenance mode for DN --- Key: HDFS-6729 URL: https://issues.apache.org/jira/browse/HDFS-6729 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 2.5.0 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Attachments: HDFS-6729.000.patch, HDFS-6729.001.patch, HDFS-6729.002.patch, HDFS-6729.003.patch, HDFS-6729.004.patch Some maintenance work (e.g., upgrading RAM or adding disks) on a DataNode takes only a short amount of time (e.g., 10 minutes). In these cases, the users do not want to report missing blocks on this DN because the DN will be online shortly without data loss. Thus, we need a maintenance mode for a DN so that maintenance work can be carried out on the DN without having to decommission it or the DN being marked as dead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6729) Support maintenance mode for DN
[ https://issues.apache.org/jira/browse/HDFS-6729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290240#comment-14290240 ] Andrew Wang commented on HDFS-6729: --- Hey Eddy, quick question: this looks like soft state that isn't persisted across NN restarts / failovers. Is that suitable for the target use cases? Support maintenance mode for DN --- Key: HDFS-6729 URL: https://issues.apache.org/jira/browse/HDFS-6729 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 2.5.0 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Attachments: HDFS-6729.000.patch, HDFS-6729.001.patch, HDFS-6729.002.patch, HDFS-6729.003.patch, HDFS-6729.004.patch Some maintenance work (e.g., upgrading RAM or adding disks) on a DataNode takes only a short amount of time (e.g., 10 minutes). In these cases, the users do not want to report missing blocks on this DN because the DN will be online shortly without data loss. Thus, we need a maintenance mode for a DN so that maintenance work can be carried out on the DN without having to decommission it or the DN being marked as dead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6673) Add Delimited format supports for PB OIV tool
[ https://issues.apache.org/jira/browse/HDFS-6673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290297#comment-14290297 ] Andrew Wang commented on HDFS-6673: --- Oops, realized Eddy still needs to address my previous comments. Waiting on that and Jenkins. Add Delimited format supports for PB OIV tool - Key: HDFS-6673 URL: https://issues.apache.org/jira/browse/HDFS-6673 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: 2.4.0 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Priority: Minor Attachments: HDFS-6673.000.patch, HDFS-6673.001.patch, HDFS-6673.002.patch, HDFS-6673.003.patch, HDFS-6673.004.patch, HDFS-6673.005.patch The new oiv tool, which is designed for the Protobuf fsimage, lacks a few features supported in the old {{oiv}} tool. This task adds support for the _Delimited_ processor to the oiv tool. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7670) HDFS Quota guide has typos, incomplete command lines
Allen Wittenauer created HDFS-7670: -- Summary: HDFS Quota guide has typos, incomplete command lines Key: HDFS-7670 URL: https://issues.apache.org/jira/browse/HDFS-7670 Project: Hadoop HDFS Issue Type: Bug Components: documentation Reporter: Allen Wittenauer HDFS quota guide uses fs -count, etc. as valid commands instead of hadoop fs -count, etc. There is also a typo, 'director' for 'directory'. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7669) HDFS Design Doc references commands that no longer exist.
[ https://issues.apache.org/jira/browse/HDFS-7669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-7669: --- Component/s: documentation HDFS Design Doc references commands that no longer exist. - Key: HDFS-7669 URL: https://issues.apache.org/jira/browse/HDFS-7669 Project: Hadoop HDFS Issue Type: Bug Components: documentation Reporter: Allen Wittenauer hadoop dfs should be hadoop fs hadoop dfsadmin should be hdfs dfsadmin hadoop dfs -rmr should be hadoop fs -rm -R -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7672) Handle write failure for EC blocks
Tsz Wo Nicholas Sze created HDFS-7672: - Summary: Handle write failure for EC blocks Key: HDFS-7672 URL: https://issues.apache.org/jira/browse/HDFS-7672 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze For (6, 3)-Reed-Solomon, a client writes to 6 data blocks and 3 parity blocks concurrently. We need to handle datanode or network failures when writing an EC BlockGroup. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
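As a rough illustration of why parity blocks make write failures survivable, here is a single-parity (XOR) analogue. This is not Reed-Solomon and not code from HDFS-EC: XOR tolerates only one lost unit per stripe, whereas (6, 3)-RS tolerates any three; the sketch only shows the reconstruct-from-survivors idea.

```java
import java.util.Arrays;

public class XorParity {
    // Computes the XOR parity unit over the data units of one stripe.
    public static byte[] parity(byte[][] data) {
        byte[] p = new byte[data[0].length];
        for (byte[] unit : data)
            for (int i = 0; i < p.length; i++)
                p[i] ^= unit[i];
        return p;
    }

    // Rebuilds the unit at index `lost` from the surviving units plus parity.
    public static byte[] recover(byte[][] data, byte[] parityUnit, int lost) {
        byte[] r = parityUnit.clone();
        for (int u = 0; u < data.length; u++)
            if (u != lost)
                for (int i = 0; i < r.length; i++)
                    r[i] ^= data[u][i];
        return r;
    }

    public static void main(String[] args) {
        byte[][] data = { {1, 2}, {3, 4}, {5, 6} };
        byte[] p = parity(data);
        // Pretend the datanode holding unit 1 failed mid-write.
        System.out.println(Arrays.toString(recover(data, p, 1))); // prints "[3, 4]"
    }
}
```

With RS the same recovery is a matrix inversion over GF(2^8) instead of a plain XOR, which is what lets it survive multiple concurrent datanode failures.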
[jira] [Updated] (HDFS-6673) Add Delimited format supports for PB OIV tool
[ https://issues.apache.org/jira/browse/HDFS-6673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lei (Eddy) Xu updated HDFS-6673: Attachment: HDFS-6673.006.patch Thanks [~andrew.wang] and [~wheat9]. This patch addressed the comments previously posted. Add Delimited format supports for PB OIV tool - Key: HDFS-6673 URL: https://issues.apache.org/jira/browse/HDFS-6673 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: 2.4.0 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Priority: Minor Attachments: HDFS-6673.000.patch, HDFS-6673.001.patch, HDFS-6673.002.patch, HDFS-6673.003.patch, HDFS-6673.004.patch, HDFS-6673.005.patch, HDFS-6673.006.patch The new oiv tool, which is designed for the Protobuf fsimage, lacks a few features supported in the old {{oiv}} tool. This task adds support for the _Delimited_ processor to the oiv tool. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6673) Add Delimited format supports for PB OIV tool
[ https://issues.apache.org/jira/browse/HDFS-6673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290331#comment-14290331 ] Haohui Mai commented on HDFS-6673: -- bq. Also, even a 40GB image easily fits in memory on servers these days. The whole point of this tool is to run the oiv on machines that do not have the luxury of abundant memory. Can you clarify what point you are trying to make? bq. These seem like pretty big drawbacks to me, and are addressed by this tool. I think calling it half-baked is unfair considering it provides greater functionality. Can you clarify what the greater functionality is? The Delimited processor only outputs mtime/atime and other information available from the legacy fsimage. BTW, if you really want to commit this, please update the document to explicitly state that the tool will not work for large fsimages, so that the users won't be caught by surprise. Add Delimited format supports for PB OIV tool - Key: HDFS-6673 URL: https://issues.apache.org/jira/browse/HDFS-6673 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: 2.4.0 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Priority: Minor Attachments: HDFS-6673.000.patch, HDFS-6673.001.patch, HDFS-6673.002.patch, HDFS-6673.003.patch, HDFS-6673.004.patch, HDFS-6673.005.patch, HDFS-6673.006.patch The new oiv tool, which is designed for the Protobuf fsimage, lacks a few features supported in the old {{oiv}} tool. This task adds support for the _Delimited_ processor to the oiv tool. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7659) We should check the new length of truncate can't be a negative value.
[ https://issues.apache.org/jira/browse/HDFS-7659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290329#comment-14290329 ] Konstantin Shvachko commented on HDFS-7659: --- +1. Looks good. Can I ask you to add three lines to TestFileTruncate with this patch, even though it is not directly related to your change? This will fix TestFileTruncate failures as [~szetszwo] requested in HDFS-7611. We do not risk losing the bug since we know the problem now. Or we can, of course, fix it in another jira. The lines, placed at the very beginning of {{testTruncateEditLogLoad()}}, are as follows: {code} // purge previously accumulated edits fs.setSafeMode(SafeModeAction.SAFEMODE_ENTER); fs.saveNamespace(); fs.setSafeMode(SafeModeAction.SAFEMODE_LEAVE); {code} We should check the new length of truncate can't be a negative value. - Key: HDFS-7659 URL: https://issues.apache.org/jira/browse/HDFS-7659 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, namenode Reporter: Yi Liu Assignee: Yi Liu Fix For: 3.0.0 Attachments: HDFS-7659.001.patch, HDFS-7659.002.patch, HDFS-7659.003.patch It's obvious that we should check that the new length passed to truncate can't be a negative value. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290345#comment-14290345 ] Konstantin Shvachko commented on HDFS-3107: --- Yes, HDFS-7659 is ready. Documentation needs to be updated. I mentioned it in my email on the dev list some days ago. Other things to do are adding truncate to DFSIO and SLive. I don't think we should wait for them to merge. My main concern is that it increases the work for developers. Branches being substantially diverged means that it is harder to merge new code, even code unrelated to truncate, into branch-2. Also, it will be easier to implement TestCLI, for example, or the documentation update, and then merge it into both branches at once. In the end, it is not as if we have a release planned next week. Would it be OK with you if we commit HDFS-7659, fix TestFileTruncate as I proposed there, and then merge? HDFS truncate - Key: HDFS-3107 URL: https://issues.apache.org/jira/browse/HDFS-3107 Project: Hadoop HDFS Issue Type: New Feature Components: datanode, namenode Reporter: Lei Chang Assignee: Plamen Jeliazkov Fix For: 3.0.0 Attachments: HDFS-3107-13.patch, HDFS-3107-14.patch, HDFS-3107-15.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, HDFS-3107.15_branch2.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, editsStored, editsStored.xml Original Estimate: 1,344h Remaining Estimate: 1,344h Systems with transaction support often need to undo changes made to the underlying storage when a transaction is aborted. 
Currently HDFS does not support truncate (a standard POSIX operation), the reverse operation of append. This forces upper-layer applications to use ugly workarounds (such as keeping track of the discarded byte range per file in a separate metadata store, and periodically running a vacuum process to rewrite compacted files) to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6673) Add Delimited format supports for PB OIV tool
[ https://issues.apache.org/jira/browse/HDFS-6673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290238#comment-14290238 ] Andrew Wang commented on HDFS-6673: --- Hey Haohui, Eddy's done performance testing with a fairly large fsimage here. This is the result of a few rounds of performance tuning. I think the current level of performance is sufficient for the vast majority of our customers. Furthermore, this is a boolean improvement over the previous state of affairs; currently, we have no delimited OIV tool, and with this patch, we do. So, with that said, I'd like to commit this and we can discuss further improvements in a follow-on JIRA. I'll go ahead and commit unless I hear otherwise by EOD Monday. Thanks. Add Delimited format supports for PB OIV tool - Key: HDFS-6673 URL: https://issues.apache.org/jira/browse/HDFS-6673 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: 2.4.0 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Priority: Minor Attachments: HDFS-6673.000.patch, HDFS-6673.001.patch, HDFS-6673.002.patch, HDFS-6673.003.patch, HDFS-6673.004.patch, HDFS-6673.005.patch The new oiv tool, which is designed for the Protobuf fsimage, lacks a few features supported in the old {{oiv}} tool. This task adds support for the _Delimited_ processor to the oiv tool. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7668) Convert site documentation from apt to markdown
[ https://issues.apache.org/jira/browse/HDFS-7668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-7668: --- Component/s: documentation Convert site documentation from apt to markdown --- Key: HDFS-7668 URL: https://issues.apache.org/jira/browse/HDFS-7668 Project: Hadoop HDFS Issue Type: Improvement Components: documentation Affects Versions: 3.0.0 Reporter: Allen Wittenauer HDFS analog to HADOOP-11495 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7668) Convert site documentation from apt to markdown
Allen Wittenauer created HDFS-7668: -- Summary: Convert site documentation from apt to markdown Key: HDFS-7668 URL: https://issues.apache.org/jira/browse/HDFS-7668 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 3.0.0 Reporter: Allen Wittenauer HDFS analog to HADOOP-11495 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7671) hdfs user guide should point to the common rack awareness doc
Allen Wittenauer created HDFS-7671: -- Summary: hdfs user guide should point to the common rack awareness doc Key: HDFS-7671 URL: https://issues.apache.org/jira/browse/HDFS-7671 Project: Hadoop HDFS Issue Type: Improvement Components: documentation Reporter: Allen Wittenauer HDFS user guide has a section on rack awareness that should really just be a pointer to the common doc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6673) Add Delimited format supports for PB OIV tool
[ https://issues.apache.org/jira/browse/HDFS-6673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290294#comment-14290294 ] Andrew Wang commented on HDFS-6673: --- bq. The test is on a 3G fsimage which can easily fit in the working set of your laptop...Multiple users are running much larger clusters, where their fsimage can be as big as 40G I know there are larger customers out there, but as I said above, we surveyed the sizes of our customers' fsimages, and this was one of the larger ones. Thus this solution will work for most production deployments. Also, even a 40GB image easily fits in memory on servers these days. bq. but given we have HDFS-6293 as a solid solution today, I would much rather see this tool be capable of handling fsimages from real, large-scale production runs HDFS-6293 doesn't include metadata newer than the old format. It also requires the NN to write out an additional old fsimage just for OIV alongside the real one. These seem like pretty big drawbacks to me, and are addressed by this tool. I think calling it half-baked is unfair considering it provides greater functionality. Anyway, thanks for not -1'ing, I'll commit this now. Let's continue this discussion in a follow-on JIRA. Add Delimited format supports for PB OIV tool - Key: HDFS-6673 URL: https://issues.apache.org/jira/browse/HDFS-6673 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: 2.4.0 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Priority: Minor Attachments: HDFS-6673.000.patch, HDFS-6673.001.patch, HDFS-6673.002.patch, HDFS-6673.003.patch, HDFS-6673.004.patch, HDFS-6673.005.patch The new oiv tool, which is designed for the Protobuf fsimage, lacks a few features supported in the old {{oiv}} tool. This task adds support for the _Delimited_ processor to the oiv tool. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7673) synthetic load generator docs give incorrect/incomplete commands
Allen Wittenauer created HDFS-7673: --
Summary: synthetic load generator docs give incorrect/incomplete commands
Key: HDFS-7673
URL: https://issues.apache.org/jira/browse/HDFS-7673
Project: Hadoop HDFS
Issue Type: Bug
Components: documentation
Reporter: Allen Wittenauer
The synthetic load generator guide gives this helpful command to start it:
{code}
java LoadGenerator [options]
{code}
This, of course, won't work. What's the class path? What jar is it in? Is this really the command? Isn't there a shell script wrapping this? This atrocity against normal users is committed three more times after this one, with equally incomplete commands for other parts of the system.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
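To illustrate the complaint: a usable doc example has to spell out at least the jar and the fully qualified class name. The sketch below is one plausible form, not what the docs should necessarily say; the test-jar location and the option values are assumptions based on the Hadoop 2.x layout (the {{LoadGenerator}} class ships in the hadoop-common test artifact), so the sketch only prints the command it would run.

```shell
# Hedged sketch: jar path and option values are assumptions, shown only to
# contrast with the guide's bare "java LoadGenerator [options]".
TESTS_JAR=$(echo "${HADOOP_HOME:-/opt/hadoop}"/share/hadoop/common/hadoop-common-*-tests.jar)

# A complete invocation names the jar and the fully qualified class.
CMD=(hadoop jar "$TESTS_JAR"
     org.apache.hadoop.fs.loadGenerator.LoadGenerator
     -numOfThreads 10 -elapsedTime 60)

# Print the command; uncomment the last line to run on a live cluster.
echo "${CMD[@]}"
# "${CMD[@]}"
```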
[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager
[ https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290348#comment-14290348 ] Hadoop QA commented on HDFS-7411: -
{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12694272/hdfs-7411.008.patch
against trunk revision 8f26d5a.
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 12 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs:
org.apache.hadoop.hdfs.TestDecommission
The following test timeouts occurred in hadoop-hdfs-project/hadoop-hdfs:
org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication
Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/9320//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9320//console
This message is automatically generated.
Refactor and improve decommissioning logic into DecommissionManager
---
Key: HDFS-7411
URL: https://issues.apache.org/jira/browse/HDFS-7411
Project: Hadoop HDFS
Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Andrew Wang
Assignee: Andrew Wang
Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch, hdfs-7411.006.patch, hdfs-7411.007.patch, hdfs-7411.008.patch
Would be nice to split out decommission logic from DatanodeManager to DecommissionManager.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)