[jira] [Commented] (HDFS-7609) startup used too much time to load edits

2015-01-23 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289533#comment-14289533
 ] 

Kihwal Lee commented on HDFS-7609:
--

Compared to 0.23, edit replaying in 2.x is 5x-10x slower.  This affects the 
namenode fail-over latency.  [~mingma] also reported this issue before and saw 
the retry cache being the bottleneck.

 startup used too much time to load edits
 

 Key: HDFS-7609
 URL: https://issues.apache.org/jira/browse/HDFS-7609
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Affects Versions: 2.2.0
Reporter: Carrey Zhan
 Attachments: HDFS-7609-CreateEditsLogWithRPCIDs.patch, 
 recovery_do_not_use_retrycache.patch


 One day my namenode crashed because two journal nodes timed out at the same 
 time under very high load, leaving behind about 100 million transactions in 
 the edits log. (I still have no idea why they were not rolled into the fsimage.)
 I tried to restart the namenode, but it showed that almost 20 hours would be 
 needed to finish, and it was loading fsedits most of the time. I also tried to 
 restart the namenode in recovery mode; the loading speed was no different.
 I looked into the stack trace and judged that the slowness was caused by the 
 retry cache. After I set dfs.namenode.enable.retrycache to false, the restart 
 finished in half an hour.
 I think the retry cache is useless during startup, at least during the 
 recovery process.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7666) Datanode blockId layout upgrade threads should be daemon thread

2015-01-23 Thread Rakesh R (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh R updated HDFS-7666:
---
Attachment: HDFS-7666-v1.patch

 Datanode blockId layout upgrade threads should be daemon thread
 ---

 Key: HDFS-7666
 URL: https://issues.apache.org/jira/browse/HDFS-7666
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Reporter: Rakesh R
Assignee: Rakesh R
 Attachments: HDFS-7666-v1.patch


 This jira is to mark the layout upgrade threads as daemon threads.
 {code}
  int numLinkWorkers = datanode.getConf().getInt(
  DFSConfigKeys.DFS_DATANODE_BLOCK_ID_LAYOUT_UPGRADE_THREADS_KEY,
  DFSConfigKeys.DFS_DATANODE_BLOCK_ID_LAYOUT_UPGRADE_THREADS);
 ExecutorService linkWorkers = 
 Executors.newFixedThreadPool(numLinkWorkers);
 {code}
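One common way to achieve this (a minimal sketch, not the attached 
HDFS-7666-v1.patch; the class and thread names below are illustrative) is to 
hand the same {{Executors}} call a {{ThreadFactory}} that marks each worker as 
a daemon:
{code}
// Minimal sketch: same fixed-size pool, but the workers are daemon threads.
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;

public class DaemonPoolSketch {
  public static ExecutorService newDaemonFixedThreadPool(int numWorkers) {
    ThreadFactory daemonFactory = new ThreadFactory() {
      @Override
      public Thread newThread(Runnable r) {
        Thread t = new Thread(r, "blockid-layout-upgrade-worker");
        t.setDaemon(true);  // JVM shutdown is not blocked by these threads
        return t;
      }
    };
    return Executors.newFixedThreadPool(numWorkers, daemonFactory);
  }
}
{code}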



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-3689) Add support for variable length block

2015-01-23 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-3689:

Attachment: HDFS-3689.009.patch

 Add support for variable length block
 -

 Key: HDFS-3689
 URL: https://issues.apache.org/jira/browse/HDFS-3689
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: datanode, hdfs-client, namenode
Affects Versions: 3.0.0
Reporter: Suresh Srinivas
Assignee: Jing Zhao
 Attachments: HDFS-3689.000.patch, HDFS-3689.001.patch, 
 HDFS-3689.002.patch, HDFS-3689.003.patch, HDFS-3689.003.patch, 
 HDFS-3689.004.patch, HDFS-3689.005.patch, HDFS-3689.006.patch, 
 HDFS-3689.007.patch, HDFS-3689.008.patch, HDFS-3689.008.patch, 
 HDFS-3689.009.patch, HDFS-3689.009.patch


 Currently HDFS supports fixed length blocks. Supporting variable length 
 blocks will allow new use cases and features to be built on top of HDFS. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7337) Configurable and pluggable Erasure Codec and schema

2015-01-23 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289686#comment-14289686
 ] 

Andrew Wang commented on HDFS-7337:
---

I don't think it's necessary to move to HADOOP. If anything, I find it 
conceptually easier if everything related to erasure encoding stayed a subtask 
of HDFS-7285.

 Configurable and pluggable Erasure Codec and schema
 ---

 Key: HDFS-7337
 URL: https://issues.apache.org/jira/browse/HDFS-7337
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Zhe Zhang
Assignee: Kai Zheng
 Attachments: HDFS-7337-prototype-v1.patch, 
 HDFS-7337-prototype-v2.zip, HDFS-7337-prototype-v3.zip, 
 PluggableErasureCodec.pdf


 According to HDFS-7285 and its design, this JIRA is to support multiple 
 erasure codecs via a pluggable approach. It allows defining and configuring 
 multiple codec schemas with different coding algorithms and parameters. The 
 resulting codec schemas can then be selected and specified via a command-line 
 tool for different file folders. While designing and implementing such a 
 pluggable framework, a concrete default codec (Reed-Solomon) will also be 
 implemented to prove the framework is useful and workable. A separate JIRA 
 could be opened for the RS codec implementation.
 Note that HDFS-7353 will focus on the very low-level codec API and 
 implementation that makes concrete vendor libraries transparent to the upper 
 layer. This JIRA focuses on the high-level parts that interact with 
 configuration, schemas, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7666) Datanode blockId layout upgrade threads should be daemon thread

2015-01-23 Thread Rakesh R (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh R updated HDFS-7666:
---
Status: Patch Available  (was: Open)

 Datanode blockId layout upgrade threads should be daemon thread
 ---

 Key: HDFS-7666
 URL: https://issues.apache.org/jira/browse/HDFS-7666
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Reporter: Rakesh R
Assignee: Rakesh R
 Attachments: HDFS-7666-v1.patch


 This jira is to mark the layout upgrade threads as daemon threads.
 {code}
  int numLinkWorkers = datanode.getConf().getInt(
  DFSConfigKeys.DFS_DATANODE_BLOCK_ID_LAYOUT_UPGRADE_THREADS_KEY,
  DFSConfigKeys.DFS_DATANODE_BLOCK_ID_LAYOUT_UPGRADE_THREADS);
 ExecutorService linkWorkers = 
 Executors.newFixedThreadPool(numLinkWorkers);
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7666) Datanode blockId layout upgrade threads should be daemon thread

2015-01-23 Thread Rakesh R (JIRA)
Rakesh R created HDFS-7666:
--

 Summary: Datanode blockId layout upgrade threads should be daemon 
thread
 Key: HDFS-7666
 URL: https://issues.apache.org/jira/browse/HDFS-7666
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Reporter: Rakesh R
Assignee: Rakesh R


This jira is to mark the layout upgrade threads as daemon threads.

{code}
 int numLinkWorkers = datanode.getConf().getInt(
 DFSConfigKeys.DFS_DATANODE_BLOCK_ID_LAYOUT_UPGRADE_THREADS_KEY,
 DFSConfigKeys.DFS_DATANODE_BLOCK_ID_LAYOUT_UPGRADE_THREADS);
ExecutorService linkWorkers = Executors.newFixedThreadPool(numLinkWorkers);
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7609) startup used too much time to load edits

2015-01-23 Thread Ming Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289642#comment-14289642
 ] 

Ming Ma commented on HDFS-7609:
---

Yeah, we also had this issue. It appears that somehow an entry with the same 
client id and caller id already existed in the retryCache, which ended up 
calling the expensive PriorityQueue#remove function. Below is the call stack 
captured when the standby was replaying the edit logs.

{noformat}
Edit log tailer prio=10 tid=0x7f096d491000 nid=0x533c runnable 
[0x7ef05ee7a000]
   java.lang.Thread.State: RUNNABLE
at java.util.PriorityQueue.removeAt(PriorityQueue.java:605)
at java.util.PriorityQueue.remove(PriorityQueue.java:364)
at 
org.apache.hadoop.util.LightWeightCache.put(LightWeightCache.java:218)
at org.apache.hadoop.ipc.RetryCache.addCacheEntry(RetryCache.java:296)
- locked 0x7ef2fe306978 (a org.apache.hadoop.ipc.RetryCache)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.addCacheEntry(FSNamesystem.java:801)
at 
org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:507)
at 
org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:224)
at 
org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:133)
at 
org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:804)
at 
org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:785)
at 
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:230)
at 
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:324)
at 
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:282)
at 
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:299)
at 
org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:415)
at 
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:295)
{noformat}
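For context on why the top frames above are expensive: 
{{java.util.PriorityQueue#remove(Object)}} does a linear scan of the backing 
array, so each eviction from a large retry cache is O(n). A toy, 
self-contained illustration (not HDFS code; the sizes are arbitrary):
{code}
// Toy illustration, not HDFS code: PriorityQueue.remove(Object) scans the
// backing array, so removing an arbitrary entry from a large queue is O(n).
import java.util.PriorityQueue;

public class PqRemoveCost {
  public static void main(String[] args) {
    PriorityQueue<Integer> pq = new PriorityQueue<>();
    for (int i = 0; i < 1_000_000; i++) {
      pq.add(i);  // inserted in order, so the largest value ends up last
    }
    long start = System.nanoTime();
    pq.remove(999_999);  // linear scan before the actual removal
    System.out.println("remove took " + (System.nanoTime() - start) / 1e6 + " ms");
  }
}
{code}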


 startup used too much time to load edits
 

 Key: HDFS-7609
 URL: https://issues.apache.org/jira/browse/HDFS-7609
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Affects Versions: 2.2.0
Reporter: Carrey Zhan
 Attachments: HDFS-7609-CreateEditsLogWithRPCIDs.patch, 
 recovery_do_not_use_retrycache.patch


 One day my namenode crashed because two journal nodes timed out at the same 
 time under very high load, leaving behind about 100 million transactions in 
 the edits log. (I still have no idea why they were not rolled into the fsimage.)
 I tried to restart the namenode, but it showed that almost 20 hours would be 
 needed to finish, and it was loading fsedits most of the time. I also tried to 
 restart the namenode in recovery mode; the loading speed was no different.
 I looked into the stack trace and judged that the slowness was caused by the 
 retry cache. After I set dfs.namenode.enable.retrycache to false, the restart 
 finished in half an hour.
 I think the retry cache is useless during startup, at least during the 
 recovery process.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-3689) Add support for variable length block

2015-01-23 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-3689:

Attachment: (was: HDFS-3689.009.patch)

 Add support for variable length block
 -

 Key: HDFS-3689
 URL: https://issues.apache.org/jira/browse/HDFS-3689
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: datanode, hdfs-client, namenode
Affects Versions: 3.0.0
Reporter: Suresh Srinivas
Assignee: Jing Zhao
 Attachments: HDFS-3689.000.patch, HDFS-3689.001.patch, 
 HDFS-3689.002.patch, HDFS-3689.003.patch, HDFS-3689.003.patch, 
 HDFS-3689.004.patch, HDFS-3689.005.patch, HDFS-3689.006.patch, 
 HDFS-3689.007.patch, HDFS-3689.008.patch, HDFS-3689.008.patch, 
 HDFS-3689.009.patch


 Currently HDFS supports fixed length blocks. Supporting variable length 
 blocks will allow new use cases and features to be built on top of HDFS. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-3689) Add support for variable length block

2015-01-23 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-3689:

Attachment: HDFS-3689.009.patch
editsStored

 Add support for variable length block
 -

 Key: HDFS-3689
 URL: https://issues.apache.org/jira/browse/HDFS-3689
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: datanode, hdfs-client, namenode
Affects Versions: 3.0.0
Reporter: Suresh Srinivas
Assignee: Jing Zhao
 Attachments: HDFS-3689.000.patch, HDFS-3689.001.patch, 
 HDFS-3689.002.patch, HDFS-3689.003.patch, HDFS-3689.003.patch, 
 HDFS-3689.004.patch, HDFS-3689.005.patch, HDFS-3689.006.patch, 
 HDFS-3689.007.patch, HDFS-3689.008.patch, HDFS-3689.008.patch, 
 HDFS-3689.009.patch, HDFS-3689.009.patch, editsStored


 Currently HDFS supports fixed length blocks. Supporting variable length 
 blocks will allow new use cases and features to be built on top of HDFS. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7353) Raw Erasure Coder API for concrete encoding and decoding

2015-01-23 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289765#comment-14289765
 ] 

Tsz Wo Nicholas Sze commented on HDFS-7353:
---

Thanks for the update.  Some comments:
- "ec" can also mean error correcting.  How about renaming the package to 
io.erasure?  Then using "EC" inside the package won't be ambiguous.
- Should the package be moved under hdfs?  Do you expect that it will be used 
outside hdfs?
- Please explain in the javadoc what "Raw" means.
- By "the number of elements", do you mean the length in bytes?  Should it be 
long instead of int?
- The javadoc "An abstract raw erasure decoder class" does not really explain 
what the class does.  Could you add more description of how the class is used 
and its relationship with the other classes?
- protected methods, especially the abstract ones, should also have javadoc.
- There are some tab characters.  We should replace them with spaces.

 Raw Erasure Coder API for concrete encoding and decoding
 

 Key: HDFS-7353
 URL: https://issues.apache.org/jira/browse/HDFS-7353
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Kai Zheng
Assignee: Kai Zheng
 Fix For: HDFS-EC

 Attachments: HDFS-7353-v1.patch, HDFS-7353-v2.patch


 This is to abstract and define a raw erasure coder API across different 
 coding algorithms like RS, XOR, etc. Such an API can be implemented using 
 various libraries, such as the Intel ISA-L library and the Jerasure library.
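 As a rough illustration of the intended scope (a hypothetical sketch only -- 
 not the API in the attached patches; names and signatures are illustrative), 
 such a raw coder abstraction might look like:
{code}
/**
 * Hypothetical sketch only -- not the API in the attached patches.
 * A "raw" coder works directly on buffers of data/parity units and knows
 * nothing about blocks, files or HDFS.
 */
public interface RawErasureEncoderSketch {
  /** Number of data units per coding group, e.g. 10 for a 10+4 RS schema. */
  int numDataUnits();

  /** Number of parity units per coding group, e.g. 4 for a 10+4 RS schema. */
  int numParityUnits();

  /** Computes the parity units from the data units; all units are equal-sized. */
  void encode(byte[][] dataUnits, byte[][] parityUnits);
}
{code}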



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7653) Block Readers and Writers used in both client side and datanode side

2015-01-23 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289772#comment-14289772
 ] 

Tsz Wo Nicholas Sze commented on HDFS-7653:
---

Sound good.  Thanks!

 Block Readers and Writers used in both client side and datanode side
 

 Key: HDFS-7653
 URL: https://issues.apache.org/jira/browse/HDFS-7653
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Li Bo
Assignee: Li Bo
 Attachments: BlockReadersWriters.patch


 There are a lot of block read/write operations in HDFS-EC. For example, when 
 a client writes a file in the striping layout, it has to write several blocks 
 to several different datanodes; if a datanode wants to do an 
 encoding/decoding task, it has to read several blocks from itself and other 
 datanodes, and write one or more blocks to itself or other datanodes.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7584) Enable Quota Support for Storage Types (SSD)

2015-01-23 Thread Xiaoyu Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDFS-7584:
-
Attachment: HDFS-7584.3.patch

Include the editsStored binary in the patch.

 Enable Quota Support for Storage Types (SSD) 
 -

 Key: HDFS-7584
 URL: https://issues.apache.org/jira/browse/HDFS-7584
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: datanode, namenode
Reporter: Xiaoyu Yao
Assignee: Xiaoyu Yao
 Attachments: HDFS-7584 Quota by Storage Type - 01202015.pdf, 
 HDFS-7584.0.patch, HDFS-7584.1.patch, HDFS-7584.2.patch, HDFS-7584.3.patch, 
 editsStored


 Phase II of the heterogeneous storage features was completed by HDFS-6584. 
 This JIRA is opened to enable quota support for different storage types in 
 terms of storage space usage. This is more important for certain storage 
 types such as SSD, as it is precious and more performant. 
 As described in the design doc of HDFS-5682, we plan to add a new 
 quotaByStorageType command and a new NameNode RPC protocol for it. The quota 
 by storage type feature applies at the HDFS directory level, similar to the 
 traditional HDFS space quota. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7648) Verify the datanode directory layout

2015-01-23 Thread Rakesh R (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289778#comment-14289778
 ] 

Rakesh R commented on HDFS-7648:


bq. If a mismatch is found, it should also fix it
Do I need to worry about the race between the blockId layout upgrade threads 
(they do the linking) in DataStorage and this call path?

bq. DirectoryScanner seems a better place to do the verification
Thanks for the hint. Let me try this as well.


 Verify the datanode directory layout
 

 Key: HDFS-7648
 URL: https://issues.apache.org/jira/browse/HDFS-7648
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Reporter: Tsz Wo Nicholas Sze

 HDFS-6482 changed datanode layout to use block ID to determine the directory 
 to store the block.  We should have some mechanism to verify it.  Either 
 DirectoryScanner or block report generation could do the check.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7667) Various typos and improvements to HDFS Federation doc

2015-01-23 Thread Charles Lamb (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Lamb updated HDFS-7667:
---
Attachment: HDFS-7667.001.patch

[~aw],

Thanks for looking it over. The .001 version makes those two changes.

 Various typos and improvements to HDFS Federation doc
 -

 Key: HDFS-7667
 URL: https://issues.apache.org/jira/browse/HDFS-7667
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: documentation
Affects Versions: 2.6.0
Reporter: Charles Lamb
Assignee: Charles Lamb
Priority: Minor
 Attachments: HDFS-7667.000.patch, HDFS-7667.001.patch


 Fix several incorrect commands, typos, grammatical errors, etc. in the HDFS 
 Federation doc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7647) DatanodeManager.sortLocatedBlocks() sorts DatanodeInfos but not StorageIDs

2015-01-23 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289725#comment-14289725
 ] 

Arpit Agarwal commented on HDFS-7647:
-

Thanks for the patch [~milandesai], this looks like a good change.

Couple of comments:
# {{LocatedBlocks.getStorageTypes}} and {{.getStorageIDs}} should cache the 
generated arrays on first invocation since existing callers expect these calls 
to be cheap. Except for the sorting code the content of {{locs}} is not 
modified once the object is initialized.
# The sorting code must invalidate the cached arrays from 1.
# We should add a unit test for sortLocatedBlocks specifically for the 
invalidation.
# Also it would be good to add a comment to {{LocatedBlocks}} stating the 
assumption that {{locs}} must not be modified by the caller, with the exception 
of {{sortLocatedBlocks}}.

In a separate Jira it would be good to make {{locs}} an unmodifiable list or a 
Guava {{ImmutableList}}. The source of the issue is that an external function 
reaches into the LocatedBlock object and modifies its private fields. It 
doesn't help that Java lacks support for C++-style const arrays.
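A minimal, self-contained sketch of the cache-and-invalidate pattern suggested 
in points 1 and 2 (the class and fields here are hypothetical stand-ins, not 
the HDFS types):
{code}
// Hypothetical stand-in classes, not the HDFS types: derived arrays are built
// lazily on first access and dropped whenever the backing list is re-ordered.
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class SortableLocations {
  private final List<String> storageIds = new ArrayList<String>();
  private String[] cachedIdArray;  // built on first getStorageIds() call

  synchronized String[] getStorageIds() {
    if (cachedIdArray == null) {
      cachedIdArray = storageIds.toArray(new String[0]);  // materialize once
    }
    return cachedIdArray;
  }

  synchronized void sortLocations() {
    Collections.shuffle(storageIds);  // stand-in for the real distance-based sort
    cachedIdArray = null;             // sorting must invalidate the cached array
  }
}
{code}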

 DatanodeManager.sortLocatedBlocks() sorts DatanodeInfos but not StorageIDs
 --

 Key: HDFS-7647
 URL: https://issues.apache.org/jira/browse/HDFS-7647
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Milan Desai
Assignee: Milan Desai
 Attachments: HDFS-7647-2.patch, HDFS-7647.patch


 DatanodeManager.sortLocatedBlocks() sorts the array of DatanodeInfos inside 
 each LocatedBlock, but does not touch the array of StorageIDs and 
 StorageTypes. As a result, the DatanodeInfos and StorageIDs/StorageTypes are 
 mismatched. The method is called by FSNamesystem.getBlockLocations(), so the 
 client will not know which StorageID/Type corresponds to which DatanodeInfo.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7421) Move processing of postponed over-replicated blocks to a background task

2015-01-23 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289795#comment-14289795
 ] 

Aaron T. Myers commented on HDFS-7421:
--

Hey Kihwal, yes indeed, this seems like a dupe. I'll go ahead and close this 
one. Thanks for pointing that out, and thanks for filing/fixing the issue in 
HDFS-6425.

 Move processing of postponed over-replicated blocks to a background task
 

 Key: HDFS-7421
 URL: https://issues.apache.org/jira/browse/HDFS-7421
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: ha, namenode
Affects Versions: 2.6.0
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers

 In an HA environment, we postpone sending block invalidates to DNs until all 
 DNs holding a given block have done at least one block report to the NN after 
 it became active. When that first block report after becoming active does 
 occur, we attempt to reprocess all postponed misreplicated blocks inline with 
 the block report RPC. In the case where there are many postponed 
 misreplicated blocks, this can cause block report RPCs to take an 
 inordinately long time to complete, sometimes on the order of minutes, which 
 has the potential to tie up RPC handlers, block incoming RPCs, etc. There's 
 no need to hurriedly process all postponed misreplicated blocks so that we 
 can quickly send invalidate commands back to DNs, so let's move this 
 processing outside of the RPC handler context and into a background thread.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7667) Various typos and improvements to HDFS Federation doc

2015-01-23 Thread Charles Lamb (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Lamb updated HDFS-7667:
---
Attachment: HDFS-7667.000.patch

Diffs attached.

 Various typos and improvements to HDFS Federation doc
 -

 Key: HDFS-7667
 URL: https://issues.apache.org/jira/browse/HDFS-7667
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: documentation
Affects Versions: 2.6.0
Reporter: Charles Lamb
Assignee: Charles Lamb
Priority: Minor
 Attachments: HDFS-7667.000.patch


 Fix several incorrect commands, typos, grammatical errors, etc. in the HDFS 
 Federation doc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7667) Various typos and improvements to HDFS Federation doc

2015-01-23 Thread Charles Lamb (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Lamb updated HDFS-7667:
---
Status: Patch Available  (was: Open)

 Various typos and improvements to HDFS Federation doc
 -

 Key: HDFS-7667
 URL: https://issues.apache.org/jira/browse/HDFS-7667
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: documentation
Affects Versions: 2.6.0
Reporter: Charles Lamb
Assignee: Charles Lamb
Priority: Minor
 Attachments: HDFS-7667.000.patch


 Fix several incorrect commands, typos, grammatical errors, etc. in the HDFS 
 Federation doc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7667) Various typos and improvements to HDFS Federation doc

2015-01-23 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289838#comment-14289838
 ] 

Allen Wittenauer commented on HDFS-7667:


Oh, we should probably drop {{--config $HADOOP_CONF_DIR}}, since that's pretty 
useless as well.

 Various typos and improvements to HDFS Federation doc
 -

 Key: HDFS-7667
 URL: https://issues.apache.org/jira/browse/HDFS-7667
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: documentation
Affects Versions: 2.6.0
Reporter: Charles Lamb
Assignee: Charles Lamb
Priority: Minor
 Attachments: HDFS-7667.000.patch


 Fix several incorrect commands, typos, grammatical errors, etc. in the HDFS 
 Federation doc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7285) Erasure Coding Support inside HDFS

2015-01-23 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289871#comment-14289871
 ] 

Zhe Zhang commented on HDFS-7285:
-

Thanks for clarifying.

bq. After some discussion with Jing, we think that block group ID is not needed 
at all – we only need to keep the block group index within a file. Will give 
more details later.
This is [discussed | 
https://issues.apache.org/jira/browse/HDFS-7339?focusedCommentId=14289868&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14289868]
 under HDFS-7339.

 Erasure Coding Support inside HDFS
 --

 Key: HDFS-7285
 URL: https://issues.apache.org/jira/browse/HDFS-7285
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Weihua Jiang
Assignee: Zhe Zhang
 Attachments: ECAnalyzer.py, ECParser.py, 
 HDFSErasureCodingDesign-20141028.pdf, HDFSErasureCodingDesign-20141217.pdf, 
 fsimage-analysis-20150105.pdf


 Erasure Coding (EC) can greatly reduce the storage overhead without 
 sacrificing data reliability, compared to the existing HDFS 3-replica 
 approach. For example, if we use 10+4 Reed-Solomon coding, we can tolerate 
 the loss of 4 blocks with a storage overhead of only 40%. This makes EC a 
 quite attractive alternative for big data storage, particularly for cold 
 data. 
 Facebook had a related open source project called HDFS-RAID. It used to be 
 one of the contrib packages in HDFS but was removed after Hadoop 2.0 for 
 maintenance reasons. The drawbacks are: 1) it sits on top of HDFS and depends 
 on MapReduce to do encoding and decoding tasks; 2) it can only be used for 
 cold files that are not intended to be appended to anymore; 3) the pure Java 
 EC coding implementation is extremely slow in practical use. Due to these, it 
 might not be a good idea to just bring HDFS-RAID back.
 We (Intel and Cloudera) are working on a design to build EC into HDFS that 
 gets rid of any external dependencies, making it self-contained and 
 independently maintained. This design lays the EC feature on top of the 
 storage type support and aims to be compatible with existing HDFS features 
 like caching, snapshots, encryption, and high availability. This design will 
 also support different EC coding schemes, implementations and policies for 
 different deployment scenarios. By utilizing advanced libraries (e.g. the 
 Intel ISA-L library), an implementation can greatly improve the performance 
 of EC encoding/decoding and make the EC solution even more attractive. We 
 will post the design document soon. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7339) Allocating and persisting block groups in NameNode

2015-01-23 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289901#comment-14289901
 ] 

Zhe Zhang commented on HDFS-7339:
-

bq. First a quick comment about the current SequentialBlockGroupIdGenerator and 
SequentialBlockIdGenerator. The current patch tries to use a flag to 
distinguish contiguous and striped blocks. However, since there may still be 
conflicts coming from historical randomly assigned block IDs, for blocks in 
block reports we still need to check two places to determine whether this is a 
contiguous block or a striped block.
If a block's ID has the 'striped' flag bit, we always _attempt_ to look up the 
block group map first. Without rolling upgrade we only need this one lookup. 
And yes, we do need to check two places in the worst case. Given that HDFS-4645 
will be over 2 years old by the time erasure coding is released, I guess this 
won't happen a lot?

 Allocating and persisting block groups in NameNode
 --

 Key: HDFS-7339
 URL: https://issues.apache.org/jira/browse/HDFS-7339
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Zhe Zhang
Assignee: Zhe Zhang
 Attachments: HDFS-7339-001.patch, HDFS-7339-002.patch, 
 HDFS-7339-003.patch, HDFS-7339-004.patch, HDFS-7339-005.patch, 
 HDFS-7339-006.patch, Meta-striping.jpg, NN-stripping.jpg


 All erasure codec operations center around the concept of _block group_; they 
 are formed in initial encoding and looked up in recoveries and conversions. A 
 lightweight class {{BlockGroup}} is created to record the original and parity 
 blocks in a coding group, as well as a pointer to the codec schema (pluggable 
 codec schemas will be supported in HDFS-7337). With the striping layout, the 
 HDFS client needs to operate on all blocks in a {{BlockGroup}} concurrently. 
 Therefore we propose to extend a file’s inode to switch between _contiguous_ 
 and _striping_ modes, with the current mode recorded in a binary flag. An 
 array of BlockGroups (or BlockGroup IDs) is added, which remains empty for 
 “traditional” HDFS files with contiguous block layout.
 The NameNode creates and maintains {{BlockGroup}} instances through the new 
 {{ECManager}} component; the attached figure has an illustration of the 
 architecture. As a simple example, when a {_Striping+EC_} file is created and 
 written to, it will serve requests from the client to allocate new 
 {{BlockGroups}} and store them under the {{INodeFile}}. In the current phase, 
 {{BlockGroups}} are allocated both in initial online encoding and in the 
 conversion from replication to EC. {{ECManager}} also facilitates the lookup 
 of {{BlockGroup}} information for block recovery work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7648) Verify the datanode directory layout

2015-01-23 Thread Rakesh R (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289698#comment-14289698
 ] 

Rakesh R commented on HDFS-7648:


[~szetszwo] I'm going through the block ID-based block layout on datanodes 
design and came across this jira. I'm interested in implementing this idea. I 
feel block report generation would be the feasible option. Could you briefly 
explain the verification points if you have anything specific in mind? 
Thanks!

 Verify the datanode directory layout
 

 Key: HDFS-7648
 URL: https://issues.apache.org/jira/browse/HDFS-7648
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Reporter: Tsz Wo Nicholas Sze

 HDFS-6482 changed datanode layout to use block ID to determine the directory 
 to store the block.  We should have some mechanism to verify it.  Either 
 DirectoryScanner or block report generation do the check.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7584) Enable Quota Support for Storage Types (SSD)

2015-01-23 Thread Xiaoyu Yao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDFS-7584:
-
Attachment: HDFS-7584.2.patch

 Enable Quota Support for Storage Types (SSD) 
 -

 Key: HDFS-7584
 URL: https://issues.apache.org/jira/browse/HDFS-7584
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: datanode, namenode
Reporter: Xiaoyu Yao
Assignee: Xiaoyu Yao
 Attachments: HDFS-7584 Quota by Storage Type - 01202015.pdf, 
 HDFS-7584.0.patch, HDFS-7584.1.patch, HDFS-7584.2.patch, editsStored


 Phase II of the heterogeneous storage features was completed by HDFS-6584. 
 This JIRA is opened to enable quota support for different storage types in 
 terms of storage space usage. This is more important for certain storage 
 types such as SSD, as it is precious and more performant. 
 As described in the design doc of HDFS-5682, we plan to add a new 
 quotaByStorageType command and a new NameNode RPC protocol for it. The quota 
 by storage type feature applies at the HDFS directory level, similar to the 
 traditional HDFS space quota. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HDFS-7421) Move processing of postponed over-replicated blocks to a background task

2015-01-23 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers resolved HDFS-7421.
--
Resolution: Duplicate

 Move processing of postponed over-replicated blocks to a background task
 

 Key: HDFS-7421
 URL: https://issues.apache.org/jira/browse/HDFS-7421
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: ha, namenode
Affects Versions: 2.6.0
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers

 In an HA environment, we postpone sending block invalidates to DNs until all 
 DNs holding a given block have done at least one block report to the NN after 
 it became active. When that first block report after becoming active does 
 occur, we attempt to reprocess all postponed misreplicated blocks inline with 
 the block report RPC. In the case where there are many postponed 
 misreplicated blocks, this can cause block report RPCs to take an 
 inordinately long time to complete, sometimes on the order of minutes, which 
 has the potential to tie up RPC handlers, block incoming RPCs, etc. There's 
 no need to hurriedly process all postponed misreplicated blocks so that we 
 can quickly send invalidate commands back to DNs, so let's move this 
 processing outside of the RPC handler context and into a background thread.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7648) Verify the datanode directory layout

2015-01-23 Thread Rakesh R (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289822#comment-14289822
 ] 

Rakesh R commented on HDFS-7648:


Ah, this is the upgrade path.

{{DataStorage.java:}}
{code}
// line #1036
ExecutorService linkWorkers = Executors.newFixedThreadPool(numLinkWorkers);
...
futures.add(linkWorkers.submit(new Callable<Void>() {
{code}

 Verify the datanode directory layout
 

 Key: HDFS-7648
 URL: https://issues.apache.org/jira/browse/HDFS-7648
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Reporter: Tsz Wo Nicholas Sze

 HDFS-6482 changed datanode layout to use block ID to determine the directory 
 to store the block.  We should have some mechanism to verify it.  Either 
 DirectoryScanner or block report generation could do the check.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7667) Various typos and improvements to HDFS Federation doc

2015-01-23 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289828#comment-14289828
 ] 

Allen Wittenauer commented on HDFS-7667:


While you're there, fix:

{code}
$HADOOP_PREFIX/bin/hdfs start namenode --config $HADOOP_CONF_DIR  -upgrade 
-clusterId cluster_ID
{code}

to be

{code}
$HADOOP_PREFIX/bin/hdfs --daemon start namenode --config $HADOOP_CONF_DIR  
-upgrade -clusterId cluster_ID
{code}


Also be aware that this may not apply to 2.x.  The documentation is different.

 Various typos and improvements to HDFS Federation doc
 -

 Key: HDFS-7667
 URL: https://issues.apache.org/jira/browse/HDFS-7667
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: documentation
Affects Versions: 2.6.0
Reporter: Charles Lamb
Assignee: Charles Lamb
Priority: Minor
 Attachments: HDFS-7667.000.patch


 Fix several incorrect commands, typos, grammatical errors, etc. in the HDFS 
 Federation doc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7648) Verify the datanode directory layout

2015-01-23 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289855#comment-14289855
 ] 

Tsz Wo Nicholas Sze commented on HDFS-7648:
---

Yes. It won't be a problem then.

 Verify the datanode directory layout
 

 Key: HDFS-7648
 URL: https://issues.apache.org/jira/browse/HDFS-7648
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Reporter: Tsz Wo Nicholas Sze

 HDFS-6482 changed datanode layout to use block ID to determine the directory 
 to store the block.  We should have some mechanism to verify it.  Either 
 DirectoryScanner or block report generation could do the check.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7614) Implement COMPLETE state of erasure coding block groups

2015-01-23 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289875#comment-14289875
 ] 

Zhe Zhang commented on HDFS-7614:
-

This design question is mainly [discussed | 
https://issues.apache.org/jira/browse/HDFS-7339?focusedCommentId=14289868&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14289868]
 under HDFS-7339.

 Implement COMPLETE state of erasure coding block groups
 ---

 Key: HDFS-7614
 URL: https://issues.apache.org/jira/browse/HDFS-7614
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Zhe Zhang
Assignee: Zhe Zhang

 HDFS-7339 implements 2 states of an under-construction block group: 
 {{UNDER_CONSTRUCTION}} and {{COMMITTED}}. The {{COMPLETE}} state requires 
 DataNodes to report stored replicas, and will therefore be implemented 
 separately in this JIRA.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7648) Verify the datanode directory layout

2015-01-23 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated HDFS-7648:
--
Description: HDFS-6482 changed datanode layout to use block ID to determine 
the directory to store the block.  We should have some mechanism to verify it.  
Either DirectoryScanner or block report generation could do the check.  (was: 
HDFS-6482 changed datanode layout to use block ID to determine the directory to 
store the block.  We should have some mechanism to verify it.  Either 
DirectoryScanner or block report generation do the check.)

 Verify the datanode directory layout
 

 Key: HDFS-7648
 URL: https://issues.apache.org/jira/browse/HDFS-7648
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Reporter: Tsz Wo Nicholas Sze

 HDFS-6482 changed datanode layout to use block ID to determine the directory 
 to store the block.  We should have some mechanism to verify it.  Either 
 DirectoryScanner or block report generation could do the check.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7648) Verify the datanode directory layout

2015-01-23 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289727#comment-14289727
 ] 

Tsz Wo Nicholas Sze commented on HDFS-7648:
---

During block report generation or directory scanning, the datanode traverses 
the directories to collect all the replica information.  We should verify that 
the actual directory location of a replica matches the expected directory path 
computed from its block ID.  If a mismatch is found, it should also fix it.  
On second thought, DirectoryScanner seems a better place to do the 
verification, since the purpose of the DirectoryScanner is to verify and fix 
the blocks stored in the local directories.  What do you think?
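To make the proposed check concrete, a rough sketch follows; the helper and 
the shift/mask constants are illustrative stand-ins for the actual HDFS-6482 
path computation, not its real values:
{code}
// Illustrative only: recompute where the replica *should* live from its block
// ID and compare with where it was actually found. The masks/shifts below are
// placeholders, not the real layout constants.
import java.io.File;

class LayoutCheckSketch {
  static File expectedBlockDir(File finalizedDir, long blockId) {
    int d1 = (int) ((blockId >> 16) & 0x1F);  // placeholder bit arithmetic
    int d2 = (int) ((blockId >> 8) & 0x1F);
    return new File(finalizedDir, "subdir" + d1 + File.separator + "subdir" + d2);
  }

  static boolean isMisplaced(File finalizedDir, long blockId, File actualDir) {
    // A DirectoryScanner-style check could move the replica when this is true.
    return !expectedBlockDir(finalizedDir, blockId).equals(actualDir);
  }
}
{code}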

 Verify the datanode directory layout
 

 Key: HDFS-7648
 URL: https://issues.apache.org/jira/browse/HDFS-7648
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Reporter: Tsz Wo Nicholas Sze

 HDFS-6482 changed datanode layout to use block ID to determine the directory 
 to store the block.  We should have some mechanism to verify it.  Either 
 DirectoryScanner or block report generation could do the check.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7652) Process block reports for erasure coded blocks

2015-01-23 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289879#comment-14289879
 ] 

Zhe Zhang commented on HDFS-7652:
-

This design question is mainly [discussed | 
https://issues.apache.org/jira/browse/HDFS-7339?focusedCommentId=14289868&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14289868]
 under HDFS-7339

Again, I really appreciate the in-depth thoughts! [~szetszwo] [~jingzhao]


 Process block reports for erasure coded blocks
 --

 Key: HDFS-7652
 URL: https://issues.apache.org/jira/browse/HDFS-7652
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Zhe Zhang
Assignee: Zhe Zhang

 HDFS-7339 adds support in NameNode for persisting block groups. For memory 
 efficiency, erasure coded blocks under the striping layout are not stored in 
 {{BlockManager#blocksMap}}. Instead, entire block groups are stored in 
 {{BlockGroupManager#blockGroups}}. When a block report arrives from the 
 DataNode, it should be processed under the block group that it belongs to. 
 The following naming protocol is used to calculate the group of a given block:
 {code}
  * HDFS-EC introduces a hierarchical protocol to name blocks and groups:
  * Contiguous: {reserved block IDs | flag | block ID}
  * Striped: {reserved block IDs | flag | block group ID | index in group}
  *
  * Following n bits of reserved block IDs, The (n+1)th bit in an ID
  * distinguishes contiguous (0) and striped (1) blocks. For a striped block,
  * bits (n+2) to (64-m) represent the ID of its block group, while the last m
  * bits represent its index of the group. The value m is determined by the
  * maximum number of blocks in a group (MAX_BLOCKS_IN_GROUP).
 {code}
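A toy sketch of the ID arithmetic described above (the flag bit position and 
the width of m are illustrative placeholders, not the committed constants):
{code}
// Toy sketch of the hierarchical naming protocol quoted above; the flag bit
// position and index width (m) are illustrative placeholders.
class BlockGroupIdSketch {
  static final int INDEX_BITS = 4;            // "m": up to 16 blocks per group
  static final long STRIPED_FLAG = 1L << 62;  // placeholder flag position

  static boolean isStriped(long blockId) {
    return (blockId & STRIPED_FLAG) != 0;
  }

  static long groupId(long blockId) {
    // Drop the per-block index bits; what remains identifies the block group.
    return blockId & ~((1L << INDEX_BITS) - 1);
  }

  static int indexInGroup(long blockId) {
    return (int) (blockId & ((1L << INDEX_BITS) - 1));
  }
}
{code}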



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7648) Verify the datanode directory layout

2015-01-23 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289787#comment-14289787
 ] 

Tsz Wo Nicholas Sze commented on HDFS-7648:
---

bq. Do I need to worry about the race between the blockId layout upgrade 
threads (they do the linking) in DataStorage and this call path?

Could you show me the line number in DataStorage.java?

 Verify the datanode directory layout
 

 Key: HDFS-7648
 URL: https://issues.apache.org/jira/browse/HDFS-7648
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Reporter: Tsz Wo Nicholas Sze

 HDFS-6482 changed datanode layout to use block ID to determine the directory 
 to store the block.  We should have some mechanism to verify it.  Either 
 DirectoryScanner or block report generation could do the check.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7337) Configurable and pluggable Erasure Codec and schema

2015-01-23 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289780#comment-14289780
 ] 

Tsz Wo Nicholas Sze commented on HDFS-7337:
---

Do you expect that the erasure code package will be used outside hdfs?  If not, 
we could put everything under hdfs for the moment.

 Configurable and pluggable Erasure Codec and schema
 ---

 Key: HDFS-7337
 URL: https://issues.apache.org/jira/browse/HDFS-7337
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Zhe Zhang
Assignee: Kai Zheng
 Attachments: HDFS-7337-prototype-v1.patch, 
 HDFS-7337-prototype-v2.zip, HDFS-7337-prototype-v3.zip, 
 PluggableErasureCodec.pdf


 According to HDFS-7285 and its design, this JIRA is to support multiple 
 erasure codecs via a pluggable approach. It allows defining and configuring 
 multiple codec schemas with different coding algorithms and parameters. The 
 resulting codec schemas can then be selected and specified via a command-line 
 tool for different file folders. While designing and implementing such a 
 pluggable framework, a concrete default codec (Reed-Solomon) will also be 
 implemented to prove the framework is useful and workable. A separate JIRA 
 could be opened for the RS codec implementation.
 Note that HDFS-7353 will focus on the very low-level codec API and 
 implementation that makes concrete vendor libraries transparent to the upper 
 layer. This JIRA focuses on the high-level parts that interact with 
 configuration, schemas, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7667) Various typos and improvements to HDFS Federation doc

2015-01-23 Thread Charles Lamb (JIRA)
Charles Lamb created HDFS-7667:
--

 Summary: Various typos and improvements to HDFS Federation doc
 Key: HDFS-7667
 URL: https://issues.apache.org/jira/browse/HDFS-7667
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: documentation
Affects Versions: 2.6.0
Reporter: Charles Lamb
Assignee: Charles Lamb
Priority: Minor


Fix several incorrect commands, typos, grammatical errors, etc. in the HDFS 
Federation doc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7339) Allocating and persisting block groups in NameNode

2015-01-23 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289868#comment-14289868
 ] 

Zhe Zhang commented on HDFS-7339:
-

[~jingzhao] Thanks for the insightful review! I believe this discussion also 
addresses comments from [~szetszwo] under HDFS-7285, HDFS-7614, and HDFS-7652. 

The main reason for creating a BlockGroup class and the hierarchical block ID 
protocol is to _minimize NN memory overhead_. As shown in the [fsimage analysis 
| 
https://issues.apache.org/jira/secure/attachment/12690129/fsimage-analysis-20150105.pdf],
 the {{blocksMap}} size increases 3.5x~5.4x if the NN plainly tracks every 
striped block -- this translates to tens of GB of memory usage. This is mainly 
caused by small blocks being striped into many more, even smaller blocks. 

bq. I think DataNode does not need to know the difference between contiguous 
blocks and striped blocks (when doing recovery the datanode can learn the 
information from the NameNode). The concept of BlockGroup should be known and 
used only internally in the NameNode (and maybe also logically known by the 
client while writing). 
bq. Datanodes and their block reports do not distinguish striped and 
contiguous blocks. And we do not need to distinguish them from the block ID. 
They are treated equally while storing and reporting in/from the DN.
Agreed. The DN is indeed group-agnostic in the current design. The only DN code 
change will be for block recovery and conversion. It will probably be clearer 
when the client patch (HDFS-7545) is ready. As shown in the [design | 
https://issues.apache.org/jira/secure/attachment/12687886/DataStripingSupportinHDFSClient.pdf],
 after receiving a newly allocated block group, the client does the following:
# Calculates block IDs from the block group ID and the group layout (number of 
data and parity blocks) -- a block's ID is basically the group ID plus the 
block's index in the group.
# The {{DFSOutputStream}} starts _n_ {{DataStreamer}} threads, each writing one 
block to its destination DN. Note that even the {{DataStreamer}} is unaware of 
the group -- it just follows the regular client-DN block writing protocol. 
Therefore the DN just receives and processes regular block creation and write 
requests.

The DN then follows the regular block reporting protocol for all contiguous and 
striped blocks. Then the NN (with the logic from HDFS-7652) will parse the 
reported block ID, and store the reported info under either {{blocksMap}} or 
the map of block groups. Again, the benefit of having a separate map for block 
groups is to avoid the order-of-magnitude increase of {{blocksMap}} size. 

We can track at the granularity of block groups because data loss can only 
happen when the entire group is under-replicated -- i.e. the number of healthy 
blocks in the group falls below a threshold. This coarse-grained tracking also 
aligns with the plan to push some monitoring and recovery workload from NN to 
DN, as [~sureshms] also [proposed | 
https://issues.apache.org/jira/browse/HDFS-7285?focusedCommentId=14192480&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14192480]
 in the meetup. 

bq. Fundamentally BlockGroup is also a BlockCollection. We do not need to 
assign generation stamp to BlockGroup (and even its id can be omitted). What we 
need is only maintaining the mapping between block and blockgroup in the 
original blocksmap, recording the list of blocks in the blockgroup, and 
recording the blockgroups in INodeFile.
This is an interesting thought and does simplify the code. But it seems to me 
the added complexity of tracking block groups is necessary to avoid heavy NN 
overhead. The generation stamp of a block group will be used to derive the 
stamps for its blocks (this logic is not included in the patch yet).

bq. I think in this way we can simplify the current design and reuse most of 
the current block management code. 
Reusing block management code is a great point. While developing this patch I 
did have to take a lot of {{Block}} management logic and create counterparts 
for {{BlockGroup}}. One possibility is to create a common ancestor class for 
{{Block}} and {{BlockGroup}} (e.g., {{GeneralizedBlock}}). The main 
commonalities are:
# Both represent a contiguous range of data in a file. Therefore each file 
consists of an array of {{GeneralizedBlock}}.
# Both are a separate unit for NN monitoring. Therefore {{BlocksMap}} can work 
with {{GeneralizedBlock}}
# Both have a capacity and a set of storage locations

Another alternative to reuse block mgmt code is to treat each {{Block}} as a 
single-member {{BlockGroup}}. 

I discussed the above 2 alternatives offline with [~andrew.wang] and we are 
inclined to use separate block group management code in this JIRA and start a 
refactoring JIRA after more of the logic is fleshed out. At that time we'll 
see more clearly which option is easier.
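For illustration, a minimal hypothetical sketch of the {{GeneralizedBlock}} 
idea, using only the commonalities listed above (names and fields are made up 
for this sketch and follow the proposal, not committed HDFS classes):
{code}
// Hypothetical sketch of a common ancestor class, based only on the
// commonalities listed above.
abstract class GeneralizedBlock {
  long id;
  long numBytes;  // both kinds cover a contiguous range of data in a file

  // both are a separate unit for NN monitoring, each with storage locations
  abstract int expectedStorageLocations();
}

class ContiguousBlock extends GeneralizedBlock {
  int replication;
  int expectedStorageLocations() { return replication; }
}

class BlockGroup extends GeneralizedBlock {
  int dataBlocks;    // e.g. 6 data + 3 parity in a (6,3) schema
  int parityBlocks;
  int expectedStorageLocations() { return dataBlocks + parityBlocks; }
}
{code}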

bq. 

[jira] [Commented] (HDFS-7611) deleteSnapshot and delete of a file can leave orphaned blocks in the blocksMap on NameNode restart.

2015-01-23 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290062#comment-14290062
 ] 

Tsz Wo Nicholas Sze commented on HDFS-7611:
---

Thanks for digging deep into it.  We should fix the snapshot bug.

Is there a way to change TestFileTruncate to work around the bug?  It is a 
bad advertisement for the new truncate feature if TestFileTruncate keeps 
failing.

 deleteSnapshot and delete of a file can leave orphaned blocks in the 
 blocksMap on NameNode restart.
 ---

 Key: HDFS-7611
 URL: https://issues.apache.org/jira/browse/HDFS-7611
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.6.0
Reporter: Konstantin Shvachko
Assignee: Byron Wong
Priority: Critical
 Attachments: blocksNotDeletedTest.patch, testTruncateEditLogLoad.log


 If quotas are enabled, a combination of the *deleteSnapshot* and *delete* 
 operations on a file can leave orphaned blocks in the blocksMap on NameNode 
 restart. They are counted as missing on the NameNode, can prevent the 
 NameNode from coming out of safeMode, and could cause a memory leak during 
 startup.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7058) Tests for truncate CLI

2015-01-23 Thread Dasha Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290085#comment-14290085
 ] 

Dasha Boudnik commented on HDFS-7058:
-

I can look into this. Thanks!

 Tests for truncate CLI
 --

 Key: HDFS-7058
 URL: https://issues.apache.org/jira/browse/HDFS-7058
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: test
Affects Versions: 3.0.0
Reporter: Konstantin Shvachko
Assignee: Dasha Boudnik

 Modify TestCLI to include general truncate tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-4919) Improve documentation of dfs.permissions.enabled flag.

2015-01-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290108#comment-14290108
 ] 

Hadoop QA commented on HDFS-4919:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12600936/HDFS-4919.patch
  against trunk revision 6c3fec5.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9316//console

This message is automatically generated.

 Improve documentation of dfs.permissions.enabled flag.
 --

 Key: HDFS-4919
 URL: https://issues.apache.org/jira/browse/HDFS-4919
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: documentation
Affects Versions: 3.0.0, 2.1.0-beta
Reporter: Chris Nauroth
 Attachments: HDFS-4919.patch


 The description of dfs.permissions.enabled in hdfs-default.xml does not state 
 that permissions are always checked on certain calls regardless of this 
 configuration.  The HDFS permissions guide still mentions the deprecated 
 dfs.permissions property instead of the currently supported 
 dfs.permissions.enabled.
 http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsPermissionsGuide.html#Configuration_Parameters



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-3728) Update Httpfs documentation

2015-01-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290113#comment-14290113
 ] 

Hadoop QA commented on HDFS-3728:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12550247/HDFS-3728.patch
  against trunk revision 6c3fec5.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9318//console

This message is automatically generated.

 Update Httpfs documentation
 ---

 Key: HDFS-3728
 URL: https://issues.apache.org/jira/browse/HDFS-3728
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: documentation
Affects Versions: 1.0.3, 3.0.0, 2.0.2-alpha
Reporter: Santhosh Srinivasan
Priority: Minor
  Labels: newbie
 Attachments: HDFS-3728.patch


 Link: 
 http://hadoop.apache.org/common/docs/current/hadoop-hdfs-httpfs/index.html
 Section: How HttpFS and Hadoop HDFS Proxy differ?
 # Change "seening" to "seen"
 # "HttpFS uses a clean HTTP REST API making its use with HTTP tools more 
 intuitive." is very subjective. Can it be rephrased or removed?
 Link: 
 http://hadoop.apache.org/common/docs/current/hadoop-hdfs-httpfs/ServerSetup.html
 Section: Configure HttpFS
 # Change "...add to the httpfs-site.xml file the httpfs.hadoop.config.dir 
 property set to..." to "...add to the httpfs-site.xml file the 
 httpfs.hadoop.config.dir property and set the value to ..."
 Section: Configure Hadoop
 # Change "defined" to "define"
 Section: Restart Hadoop
 # Typo - "to" (not "ot")
 Section: Start/Stop HttpFS
 # "lists" (plural)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager

2015-01-23 Thread Andrew Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wang updated HDFS-7411:
--
Attachment: hdfs-7411.008.patch

 Refactor and improve decommissioning logic into DecommissionManager
 ---

 Key: HDFS-7411
 URL: https://issues.apache.org/jira/browse/HDFS-7411
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Andrew Wang
Assignee: Andrew Wang
 Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, 
 hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch, 
 hdfs-7411.006.patch, hdfs-7411.007.patch, hdfs-7411.008.patch


 Would be nice to split out decommission logic from DatanodeManager to 
 DecommissionManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager

2015-01-23 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290173#comment-14290173
 ] 

Andrew Wang commented on HDFS-7411:
---

If you look at version 2 of the patch, you can see the initial refactor, which 
consisted of moving some methods from BlockManager to DecomManager. I didn't 
bother splitting this though since it ended up not being very interesting. 
DecomManager is also basically all new code, so the old code would be moved and 
then subsequently deleted if we split it.

I think the easiest way of reviewing it is just to read through DecomManager, 
which really isn't that big of a class. It's quite well commented and has lots 
of logging, which is part of why this change as a whole appears large.

 Refactor and improve decommissioning logic into DecommissionManager
 ---

 Key: HDFS-7411
 URL: https://issues.apache.org/jira/browse/HDFS-7411
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Andrew Wang
Assignee: Andrew Wang
 Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, 
 hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch, 
 hdfs-7411.006.patch, hdfs-7411.007.patch, hdfs-7411.008.patch


 Would be nice to split out decommission logic from DatanodeManager to 
 DecommissionManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7667) Various typos and improvements to HDFS Federation doc

2015-01-23 Thread Charles Lamb (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290002#comment-14290002
 ] 

Charles Lamb commented on HDFS-7667:


[~aw],

Thanks for the review. I started out intending to just fix a few minor errors 
(missing articles, obviously wrong typos in commands, etc.). Then I couldn't 
help myself so I made some slightly larger grammatical changes and tightened up 
a few things. Please stop me before I kill any more and commit this.

Thanks!

Of course we still have not heard from Mr. Jenkins... I wonder where he is 
today.


 Various typos and improvements to HDFS Federation doc
 -

 Key: HDFS-7667
 URL: https://issues.apache.org/jira/browse/HDFS-7667
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: documentation
Affects Versions: 2.6.0
Reporter: Charles Lamb
Assignee: Charles Lamb
Priority: Minor
 Attachments: HDFS-7667.000.patch, HDFS-7667.001.patch


 Fix several incorrect commands, typos, grammatical errors, etc. in the HDFS 
 Federation doc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7667) Various typos and improvements to HDFS Federation doc

2015-01-23 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated HDFS-7667:
---
  Resolution: Fixed
   Fix Version/s: 3.0.0
Target Version/s:   (was: 2.7.0)
  Status: Resolved  (was: Patch Available)

lol, believe I know the feeling... :D

Committed to trunk.

Thanks!

 Various typos and improvements to HDFS Federation doc
 -

 Key: HDFS-7667
 URL: https://issues.apache.org/jira/browse/HDFS-7667
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: documentation
Affects Versions: 2.6.0
Reporter: Charles Lamb
Assignee: Charles Lamb
Priority: Minor
 Fix For: 3.0.0

 Attachments: HDFS-7667.000.patch, HDFS-7667.001.patch


 Fix several incorrect commands, typos, grammatical errors, etc. in the HDFS 
 Federation doc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7667) Various typos and improvements to HDFS Federation doc

2015-01-23 Thread Charles Lamb (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290032#comment-14290032
 ] 

Charles Lamb commented on HDFS-7667:


Thanks for the review and the commit [~aw]. If you're bored, HDFS-7644 is a 3 
char fix.


 Various typos and improvements to HDFS Federation doc
 -

 Key: HDFS-7667
 URL: https://issues.apache.org/jira/browse/HDFS-7667
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: documentation
Affects Versions: 2.6.0
Reporter: Charles Lamb
Assignee: Charles Lamb
Priority: Minor
 Fix For: 3.0.0

 Attachments: HDFS-7667.000.patch, HDFS-7667.001.patch


 Fix several incorrect commands, typos, grammatical errors, etc. in the HDFS 
 Federation doc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-4919) Improve documentation of dfs.permissions.enabled flag.

2015-01-23 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290090#comment-14290090
 ] 

Allen Wittenauer commented on HDFS-4919:


This no longer applies. :(

 Improve documentation of dfs.permissions.enabled flag.
 --

 Key: HDFS-4919
 URL: https://issues.apache.org/jira/browse/HDFS-4919
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: documentation
Affects Versions: 3.0.0, 2.1.0-beta
Reporter: Chris Nauroth
 Attachments: HDFS-4919.patch


 The description of dfs.permissions.enabled in hdfs-default.xml does not state 
 that permissions are always checked on certain calls regardless of this 
 configuration.  The HDFS permissions guide still mentions the deprecated 
 dfs.permissions property instead of the currently supported 
 dfs.permissions.enabled.
 http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsPermissionsGuide.html#Configuration_Parameters



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7667) Various typos and improvements to HDFS Federation doc

2015-01-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290034#comment-14290034
 ] 

Hudson commented on HDFS-7667:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #6920 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6920/])
HDFS-7667. Various typos and improvements to HDFS Federation doc  (Charles Lamb 
via aw) (aw: rev d411460e0d66b9b9d58924df295a957ba84b17d7)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* hadoop-hdfs-project/hadoop-hdfs/src/site/apt/Federation.apt.vm


 Various typos and improvements to HDFS Federation doc
 -

 Key: HDFS-7667
 URL: https://issues.apache.org/jira/browse/HDFS-7667
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: documentation
Affects Versions: 2.6.0
Reporter: Charles Lamb
Assignee: Charles Lamb
Priority: Minor
 Fix For: 3.0.0

 Attachments: HDFS-7667.000.patch, HDFS-7667.001.patch


 Fix several incorrect commands, typos, grammatical errors, etc. in the HDFS 
 Federation doc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7644) minor typo in HttpFS doc

2015-01-23 Thread Charles Lamb (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290048#comment-14290048
 ] 

Charles Lamb commented on HDFS-7644:


Gee, here I am fixing all these typos and I can't even get the Jira title 
correct.

Thanks for the review and the commit [~aw].


 minor typo in HttpFS doc
 

 Key: HDFS-7644
 URL: https://issues.apache.org/jira/browse/HDFS-7644
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: documentation
Affects Versions: 2.7.0
Reporter: Charles Lamb
Assignee: Charles Lamb
Priority: Trivial
 Fix For: 2.7.0

 Attachments: HDFS-7644.000.patch


 In hadoop-httpfs/src/site/apt/index.apt.vm, s/seening/seen/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HDFS-7058) Tests for truncate CLI

2015-01-23 Thread Dasha Boudnik (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dasha Boudnik reassigned HDFS-7058:
---

Assignee: Dasha Boudnik

 Tests for truncate CLI
 --

 Key: HDFS-7058
 URL: https://issues.apache.org/jira/browse/HDFS-7058
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: test
Affects Versions: 3.0.0
Reporter: Konstantin Shvachko
Assignee: Dasha Boudnik

 Modify TestCLI to include general truncate tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-3750) API docs don't include HDFS

2015-01-23 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated HDFS-3750:
---
   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

+1

Committed to trunk.

Thanks!

 API docs don't include HDFS
 ---

 Key: HDFS-3750
 URL: https://issues.apache.org/jira/browse/HDFS-3750
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: documentation
Affects Versions: 2.0.0-alpha
Reporter: Eli Collins
Assignee: Jolly Chen
Priority: Critical
 Fix For: 3.0.0

 Attachments: HDFS-3750.patch


 [The javadocs|http://hadoop.apache.org/common/docs/current/api/index.html] 
 don't include HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-3728) Update Httpfs documentation

2015-01-23 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated HDFS-3728:
---
Status: Open  (was: Patch Available)

 Update Httpfs documentation
 ---

 Key: HDFS-3728
 URL: https://issues.apache.org/jira/browse/HDFS-3728
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: documentation
Affects Versions: 2.0.2-alpha, 1.0.3, 3.0.0
Reporter: Santhosh Srinivasan
Priority: Minor
  Labels: newbie
 Attachments: HDFS-3728.patch


 Link: 
 http://hadoop.apache.org/common/docs/current/hadoop-hdfs-httpfs/index.html
 Section: How HttpFS and Hadoop HDFS Proxy differ?
 # Change seening to seen
 # HttpFS uses a clean HTTP REST API making its use with HTTP tools more 
 intuitive. is very subjective. Can it be rephrased or removed?
 Link: 
 http://hadoop.apache.org/common/docs/current/hadoop-hdfs-httpfs/ServerSetup.html
 Section: Configure HttpFS
 # Change ...add to the httpfs-site.xml file the httpfs.hadoop.config.dir 
 property set to... to add to the httpfs-site.xml file the 
 httpfs.hadoop.config.dir property and set the value to ...
 Section: Configure Hadoop
 # Change defined to define
 Section: Restart Hadoop
 # Typo - to (not ot)
 Section: Start/Stop HttpFS
 # lists (plural)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-3107) HDFS truncate

2015-01-23 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290095#comment-14290095
 ] 

Tsz Wo Nicholas Sze commented on HDFS-3107:
---

BTW, have we updated user documentation for the truncate CLI change?

 HDFS truncate
 -

 Key: HDFS-3107
 URL: https://issues.apache.org/jira/browse/HDFS-3107
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: datanode, namenode
Reporter: Lei Chang
Assignee: Plamen Jeliazkov
 Fix For: 3.0.0

 Attachments: HDFS-3107-13.patch, HDFS-3107-14.patch, 
 HDFS-3107-15.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, 
 HDFS-3107.15_branch2.patch, HDFS-3107.patch, HDFS-3107.patch, 
 HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
 HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
 HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate.pdf, 
 HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, 
 editsStored, editsStored.xml

   Original Estimate: 1,344h
  Remaining Estimate: 1,344h

 Systems with transaction support often need to undo changes made to the 
 underlying storage when a transaction is aborted. Currently HDFS does not 
 support truncate (a standard Posix operation) which is a reverse operation of 
 append, which makes upper layer applications use ugly workarounds (such as 
 keeping track of the discarded byte range per file in a separate metadata 
 store, and periodically running a vacuum process to rewrite compacted files) 
 to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-3728) Update Httpfs documentation

2015-01-23 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated HDFS-3728:
---
Status: Patch Available  (was: Open)

 Update Httpfs documentation
 ---

 Key: HDFS-3728
 URL: https://issues.apache.org/jira/browse/HDFS-3728
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: documentation
Affects Versions: 2.0.2-alpha, 1.0.3, 3.0.0
Reporter: Santhosh Srinivasan
Priority: Minor
  Labels: newbie
 Attachments: HDFS-3728.patch


 Link: 
 http://hadoop.apache.org/common/docs/current/hadoop-hdfs-httpfs/index.html
 Section: How HttpFS and Hadoop HDFS Proxy differ?
 # Change seening to seen
 # HttpFS uses a clean HTTP REST API making its use with HTTP tools more 
 intuitive. is very subjective. Can it be rephrased or removed?
 Link: 
 http://hadoop.apache.org/common/docs/current/hadoop-hdfs-httpfs/ServerSetup.html
 Section: Configure HttpFS
 # Change ...add to the httpfs-site.xml file the httpfs.hadoop.config.dir 
 property set to... to add to the httpfs-site.xml file the 
 httpfs.hadoop.config.dir property and set the value to ...
 Section: Configure Hadoop
 # Change defined to define
 Section: Restart Hadoop
 # Typo - to (not ot)
 Section: Start/Stop HttpFS
 # lists (plural)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager

2015-01-23 Thread Andrew Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wang updated HDFS-7411:
--
Attachment: (was: hdfs-7411.008.patch)

 Refactor and improve decommissioning logic into DecommissionManager
 ---

 Key: HDFS-7411
 URL: https://issues.apache.org/jira/browse/HDFS-7411
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Andrew Wang
Assignee: Andrew Wang
 Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, 
 hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch, 
 hdfs-7411.006.patch, hdfs-7411.007.patch


 Would be nice to split out decommission logic from DatanodeManager to 
 DecommissionManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager

2015-01-23 Thread Andrew Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wang updated HDFS-7411:
--
Attachment: hdfs-7411.008.patch

 Refactor and improve decommissioning logic into DecommissionManager
 ---

 Key: HDFS-7411
 URL: https://issues.apache.org/jira/browse/HDFS-7411
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Andrew Wang
Assignee: Andrew Wang
 Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, 
 hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch, 
 hdfs-7411.006.patch, hdfs-7411.007.patch, hdfs-7411.008.patch


 Would be nice to split out decommission logic from DatanodeManager to 
 DecommissionManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager

2015-01-23 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290146#comment-14290146
 ] 

Andrew Wang commented on HDFS-7411:
---

Thanks again for reviewing Colin, fixed with the following notes:

bq. Grammar: is already decomissioning

decommissioning in progress is a state for a node, so I think this is 
accurate, although ugly, language.

bq. What's the rationale for initializing the DecomissionManager configuration 
in activate rather than in the constructor? It seems like if we initialized the 
conf stuff in the constructor we could make more of it final?

I wasn't sure about this either, but it seems like the NN really likes for 
everything to be init'd with the Configuration passed when starting common 
services.

For this particular function, I went ahead and made the config variables final 
since they're just scoped to that function. Since we make a new Monitor each 
time, those members are final there too.

 Refactor and improve decommissioning logic into DecommissionManager
 ---

 Key: HDFS-7411
 URL: https://issues.apache.org/jira/browse/HDFS-7411
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Andrew Wang
Assignee: Andrew Wang
 Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, 
 hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch, 
 hdfs-7411.006.patch, hdfs-7411.007.patch, hdfs-7411.008.patch


 Would be nice to split out decommission logic from DatanodeManager to 
 DecommissionManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7667) Various typos and improvements to HDFS Federation doc

2015-01-23 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289985#comment-14289985
 ] 

Allen Wittenauer commented on HDFS-7667:


There are other problems in the doc, but I don't know if you want to fix them 
now or wait till later. Following the mantra of Don't let best stop better, 
I'm +1 for committing this to trunk.  Let me know if you want to continue 
working on it or commit this version now. :)

Thanks!

 Various typos and improvements to HDFS Federation doc
 -

 Key: HDFS-7667
 URL: https://issues.apache.org/jira/browse/HDFS-7667
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: documentation
Affects Versions: 2.6.0
Reporter: Charles Lamb
Assignee: Charles Lamb
Priority: Minor
 Attachments: HDFS-7667.000.patch, HDFS-7667.001.patch


 Fix several incorrect commands, typos, grammatical errors, etc. in the HDFS 
 Federation doc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-3689) Add support for variable length block

2015-01-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289966#comment-14289966
 ] 

Hadoop QA commented on HDFS-3689:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12694208/HDFS-3689.009.patch
  against trunk revision 24aa462.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 14 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs 
hadoop-hdfs-project/hadoop-hdfs-nfs:

  org.apache.hadoop.security.ssl.TestReloadingX509TrustManager
  
org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover
  org.apache.hadoop.hdfs.server.namenode.TestFileTruncate
  
org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer
  org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA

  The following test timeouts occurred in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs 
hadoop-hdfs-project/hadoop-hdfs-nfs:

org.apache.hadoop.hdfs.qjournal.client.TestQJMWithFaults

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9313//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9313//console

This message is automatically generated.

 Add support for variable length block
 -

 Key: HDFS-3689
 URL: https://issues.apache.org/jira/browse/HDFS-3689
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: datanode, hdfs-client, namenode
Affects Versions: 3.0.0
Reporter: Suresh Srinivas
Assignee: Jing Zhao
 Attachments: HDFS-3689.000.patch, HDFS-3689.001.patch, 
 HDFS-3689.002.patch, HDFS-3689.003.patch, HDFS-3689.003.patch, 
 HDFS-3689.004.patch, HDFS-3689.005.patch, HDFS-3689.006.patch, 
 HDFS-3689.007.patch, HDFS-3689.008.patch, HDFS-3689.008.patch, 
 HDFS-3689.009.patch, HDFS-3689.009.patch, editsStored


 Currently HDFS supports fixed length blocks. Supporting variable length block 
 will allow new use cases and features to be built on top of HDFS. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7339) Allocating and persisting block groups in NameNode

2015-01-23 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289983#comment-14289983
 ] 

Tsz Wo Nicholas Sze commented on HDFS-7339:
---

 The main reason for creating a BlockGroup class and the hierarchical block ID 
 protocol is to minimize NN memory overhead. ...

This can be achieved by using consecutive (normal) block IDs for the blocks in 
a block group without dividing the ID space; see below.  (This is not easy to 
describe.  Please let me know if you are confused.)
- For the block groups stored in the namenode, only store the first block ID.  The 
other block IDs can be deduced from the storage policy.
- Use the same generation stamp for all the blocks.
- How to support lookups in BlocksMap?  There are several ways described below.
-# Change the hash function so that consecutive IDs are mapped to the same 
hash value and implement BlockGroup.equals(..) so that it returns true for any 
block ID in the group.  For example, we may only use the high 60 bits for 
computing the hash code.  Suppose the blocks in a block group have IDs from 0x302 to 
0x30A.  We will be able to look up the block group using any of the block IDs.  
What happens if the first ID is near the low 4-bit boundary, say 0x30D?  We may 
simply skip to 0x310 when allocating the block IDs so that it won't happen.
-# We may store the first ID (or the offset to the first ID) also in the datanode 
for EC blocks.  This does not seem like a good solution.

If we enforce block ID allocation so that the low 4 bits of the first ID must 
be zero, then it is very similar to the scheme proposed in the design doc, 
except there is no notion of block group in the block IDs.
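
To make the first lookup option concrete, here is a purely illustrative sketch (the class name, fields, and map-probe trick are assumptions, not actual HDFS code): hash on the high 60 bits so every member ID lands in the same bucket, and let equality accept a probe built from any member ID.

{code}
// Purely illustrative sketch (not actual HDFS code): hash on the high 60 bits
// so every ID in a block group maps to the same BlocksMap bucket, and let
// equality accept a probe built from any member ID.
public class BlockGroup {
  private final long firstBlockId;  // allocated so the group never crosses a 16-ID boundary
  private final int numBlocks;      // data + parity blocks in the group

  public BlockGroup(long firstBlockId, int numBlocks) {
    this.firstBlockId = firstBlockId;
    this.numBlocks = numBlocks;
  }

  public boolean contains(long blockId) {
    return blockId >= firstBlockId && blockId < firstBlockId + numBlocks;
  }

  @Override
  public int hashCode() {
    long high = firstBlockId >>> 4;          // drop the low 4 bits
    return (int) (high ^ (high >>> 32));
  }

  @Override
  public boolean equals(Object obj) {
    if (!(obj instanceof BlockGroup)) {
      return false;
    }
    BlockGroup that = (BlockGroup) obj;
    // A probe such as new BlockGroup(anyMemberId, 1) matches the stored group,
    // so map.get(probe) finds the group from any of its block IDs.
    return this.contains(that.firstBlockId) || that.contains(this.firstBlockId);
  }
}
{code}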


 Allocating and persisting block groups in NameNode
 --

 Key: HDFS-7339
 URL: https://issues.apache.org/jira/browse/HDFS-7339
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Zhe Zhang
Assignee: Zhe Zhang
 Attachments: HDFS-7339-001.patch, HDFS-7339-002.patch, 
 HDFS-7339-003.patch, HDFS-7339-004.patch, HDFS-7339-005.patch, 
 HDFS-7339-006.patch, Meta-striping.jpg, NN-stripping.jpg


 All erasure codec operations center around the concept of _block group_; they 
 are formed in initial encoding and looked up in recoveries and conversions. A 
 lightweight class {{BlockGroup}} is created to record the original and parity 
 blocks in a coding group, as well as a pointer to the codec schema (pluggable 
 codec schemas will be supported in HDFS-7337). With the striping layout, the 
 HDFS client needs to operate on all blocks in a {{BlockGroup}} concurrently. 
 Therefore we propose to extend a file’s inode to switch between _contiguous_ 
 and _striping_ modes, with the current mode recorded in a binary flag. An 
 array of BlockGroups (or BlockGroup IDs) is added, which remains empty for 
 “traditional” HDFS files with contiguous block layout.
 The NameNode creates and maintains {{BlockGroup}} instances through the new 
 {{ECManager}} component; the attached figure has an illustration of the 
 architecture. As a simple example, when a {_Striping+EC_} file is created and 
 written to, it will serve requests from the client to allocate new 
 {{BlockGroups}} and store them under the {{INodeFile}}. In the current phase, 
 {{BlockGroups}} are allocated both in initial online encoding and in the 
 conversion from replication to EC. {{ECManager}} also facilitates the lookup 
 of {{BlockGroup}} information for block recovery work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7644) minor typo in HttpFS doc

2015-01-23 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated HDFS-7644:
---
Summary: minor typo in HttpFS doc  (was: minor typo in HffpFS doc)

 minor typo in HttpFS doc
 

 Key: HDFS-7644
 URL: https://issues.apache.org/jira/browse/HDFS-7644
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: documentation
Affects Versions: 2.7.0
Reporter: Charles Lamb
Assignee: Charles Lamb
Priority: Trivial
 Attachments: HDFS-7644.000.patch


 In hadoop-httpfs/src/site/apt/index.apt.vm, s/seening/seen/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-4922) Improve the short-circuit document

2015-01-23 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290101#comment-14290101
 ] 

Allen Wittenauer commented on HDFS-4922:


If someone updates this, we can get this committed :)

 Improve the short-circuit document
 --

 Key: HDFS-4922
 URL: https://issues.apache.org/jira/browse/HDFS-4922
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: documentation, hdfs-client
Affects Versions: 3.0.0, 2.1.0-beta
Reporter: Fengdong Yu
Assignee: Fengdong Yu
Priority: Minor
 Attachments: HDFS-4922-002.patch, HDFS-4922-003.patch, 
 HDFS-4922-004.patch, HDFS-4922-005.patch, HDFS-4922-006.patch, HDFS-4922.patch


 Explain the default value and add one configuration key which doesn't appear in 
 the document but exists in the code.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7320) The appearance of hadoop-hdfs-httpfs site docs is inconsistent

2015-01-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290135#comment-14290135
 ] 

Hudson commented on HDFS-7320:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6923 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6923/])
HDFS-7320. The appearance of hadoop-hdfs-httpfs site docs is inconsistent 
(Masatake Iwasaki via aw) (aw: rev 8f26d5a8a13539e8292c1cf7f141eff7e58984a5)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* hadoop-hdfs-project/hadoop-hdfs-httpfs/pom.xml


 The appearance of hadoop-hdfs-httpfs site docs is inconsistent 
 ---

 Key: HDFS-7320
 URL: https://issues.apache.org/jira/browse/HDFS-7320
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: documentation
Reporter: Masatake Iwasaki
Assignee: Masatake Iwasaki
Priority: Minor
 Fix For: 3.0.0

 Attachments: HDFS-7320.1.patch


 The docs of hadoop-hdfs-httpfs use different maven-base.css and 
 maven-theme.css from other modules.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7667) Various typos and improvements to HDFS Federation doc

2015-01-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290190#comment-14290190
 ] 

Hadoop QA commented on HDFS-7667:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12694230/HDFS-7667.001.patch
  against trunk revision 56df5f4.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+0 tests included{color}.  The patch appears to be a 
documentation patch that doesn't require tests.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.server.namenode.TestFileTruncate

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9315//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9315//console

This message is automatically generated.

 Various typos and improvements to HDFS Federation doc
 -

 Key: HDFS-7667
 URL: https://issues.apache.org/jira/browse/HDFS-7667
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: documentation
Affects Versions: 2.6.0
Reporter: Charles Lamb
Assignee: Charles Lamb
Priority: Minor
 Fix For: 3.0.0

 Attachments: HDFS-7667.000.patch, HDFS-7667.001.patch


 Fix several incorrect commands, typos, grammatical errors, etc. in the HDFS 
 Federation doc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7058) Tests for truncate CLI

2015-01-23 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated HDFS-7058:
--
Description: Modify TestCLI to include general truncate tests.  (was: 
Comprehensive test coverage for truncate.)
Summary: Tests for truncate CLI  (was: Tests for truncate)

Revised summary and description.

 Tests for truncate CLI
 --

 Key: HDFS-7058
 URL: https://issues.apache.org/jira/browse/HDFS-7058
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: test
Affects Versions: 3.0.0
Reporter: Konstantin Shvachko

 Modify TestCLI to include general truncate tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7644) minor typo in HttpFS doc

2015-01-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290054#comment-14290054
 ] 

Hudson commented on HDFS-7644:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6921 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6921/])
HDFS-7644. minor typo in HttpFS doc (Charles Lamb via aw) (aw: rev 
5c93ca2f3cfd9ebcb98be89c3a238a36c03f4422)
* hadoop-hdfs-project/hadoop-hdfs-httpfs/src/site/apt/index.apt.vm
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


 minor typo in HttpFS doc
 

 Key: HDFS-7644
 URL: https://issues.apache.org/jira/browse/HDFS-7644
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: documentation
Affects Versions: 2.7.0
Reporter: Charles Lamb
Assignee: Charles Lamb
Priority: Trivial
 Fix For: 2.7.0

 Attachments: HDFS-7644.000.patch


 In hadoop-httpfs/src/site/apt/index.apt.vm, s/seening/seen/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6261) Add document for enabling node group layer in HDFS

2015-01-23 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290092#comment-14290092
 ] 

Allen Wittenauer commented on HDFS-6261:


I'd prefer to see this get merged into the RackAwareness documentation rather 
than building a completely new doc.

 Add document for enabling node group layer in HDFS
 --

 Key: HDFS-6261
 URL: https://issues.apache.org/jira/browse/HDFS-6261
 Project: Hadoop HDFS
  Issue Type: Task
  Components: documentation
Reporter: Wenwu Peng
Assignee: Binglin Chang
  Labels: documentation
 Attachments: 2-layer-topology.png, 3-layer-topology.png, 
 3layer-topology.png, 4layer-topology.png, HDFS-6261.v1.patch, 
 HDFS-6261.v1.patch, HDFS-6261.v2.patch, HDFS-6261.v3.patch


 Most of the patches from umbrella JIRA HADOOP-8468 have been committed. However, there 
 is no site documentation introducing NodeGroup awareness (Hadoop Virtualization Extensions) 
 and how to configure it, so we need to document it.
 1.  Document NodeGroup-aware related content in http://hadoop.apache.org/docs/current 
 2.  Document NodeGroup-aware properties in core-default.xml.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7320) The appearance of hadoop-hdfs-httpfs site docs is inconsistent

2015-01-23 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated HDFS-7320:
---
   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

+1 committed to trunk

Thanks!

 The appearance of hadoop-hdfs-httpfs site docs is inconsistent 
 ---

 Key: HDFS-7320
 URL: https://issues.apache.org/jira/browse/HDFS-7320
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: documentation
Reporter: Masatake Iwasaki
Assignee: Masatake Iwasaki
Priority: Minor
 Fix For: 3.0.0

 Attachments: HDFS-7320.1.patch


 The docs of hadoop-hdfs-httpfs use different maven-base.css and 
 maven-theme.css from other modules.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-4922) Improve the short-circuit document

2015-01-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290112#comment-14290112
 ] 

Hadoop QA commented on HDFS-4922:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12622806/HDFS-4922-006.patch
  against trunk revision 6c3fec5.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9317//console

This message is automatically generated.

 Improve the short-circuit document
 --

 Key: HDFS-4922
 URL: https://issues.apache.org/jira/browse/HDFS-4922
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: documentation, hdfs-client
Affects Versions: 3.0.0, 2.1.0-beta
Reporter: Fengdong Yu
Assignee: Fengdong Yu
Priority: Minor
 Attachments: HDFS-4922-002.patch, HDFS-4922-003.patch, 
 HDFS-4922-004.patch, HDFS-4922-005.patch, HDFS-4922-006.patch, HDFS-4922.patch


 Explain the default value and add one configuration key which doesn't appear in 
 the document but exists in the code.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6261) Add document for enabling node group layer in HDFS

2015-01-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290111#comment-14290111
 ] 

Hadoop QA commented on HDFS-6261:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12660927/HDFS-6261.v3.patch
  against trunk revision 6c3fec5.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9319//console

This message is automatically generated.

 Add document for enabling node group layer in HDFS
 --

 Key: HDFS-6261
 URL: https://issues.apache.org/jira/browse/HDFS-6261
 Project: Hadoop HDFS
  Issue Type: Task
  Components: documentation
Reporter: Wenwu Peng
Assignee: Binglin Chang
  Labels: documentation
 Attachments: 2-layer-topology.png, 3-layer-topology.png, 
 3layer-topology.png, 4layer-topology.png, HDFS-6261.v1.patch, 
 HDFS-6261.v1.patch, HDFS-6261.v2.patch, HDFS-6261.v3.patch


 Most of the patches from umbrella JIRA HADOOP-8468 have been committed. However, there 
 is no site documentation introducing NodeGroup awareness (Hadoop Virtualization Extensions) 
 and how to configure it, so we need to document it.
 1.  Document NodeGroup-aware related content in http://hadoop.apache.org/docs/current 
 2.  Document NodeGroup-aware properties in core-default.xml.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager

2015-01-23 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290152#comment-14290152
 ] 

Tsz Wo Nicholas Sze commented on HDFS-7411:
---

Could we separate the code refactoring and the improvement into two JIRAs?  The 
refactoring, though probably a big patch, is easy to review.  The improvement patch 
will be much smaller.

 Refactor and improve decommissioning logic into DecommissionManager
 ---

 Key: HDFS-7411
 URL: https://issues.apache.org/jira/browse/HDFS-7411
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Andrew Wang
Assignee: Andrew Wang
 Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, 
 hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch, 
 hdfs-7411.006.patch, hdfs-7411.007.patch, hdfs-7411.008.patch


 Would be nice to split out decommission logic from DatanodeManager to 
 DecommissionManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7666) Datanode blockId layout upgrade threads should be daemon thread

2015-01-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289994#comment-14289994
 ] 

Hadoop QA commented on HDFS-7666:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12694194/HDFS-7666-v1.patch
  against trunk revision 24aa462.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  
org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover
  org.apache.hadoop.hdfs.TestDecommission
  org.apache.hadoop.hdfs.qjournal.TestSecureNNWithQJM

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9312//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9312//console

This message is automatically generated.

 Datanode blockId layout upgrade threads should be daemon thread
 ---

 Key: HDFS-7666
 URL: https://issues.apache.org/jira/browse/HDFS-7666
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Reporter: Rakesh R
Assignee: Rakesh R
 Attachments: HDFS-7666-v1.patch


 This jira is to mark the layout upgrade thread as daemon thread.
 {code}
  int numLinkWorkers = datanode.getConf().getInt(
  DFSConfigKeys.DFS_DATANODE_BLOCK_ID_LAYOUT_UPGRADE_THREADS_KEY,
  DFSConfigKeys.DFS_DATANODE_BLOCK_ID_LAYOUT_UPGRADE_THREADS);
 ExecutorService linkWorkers = 
 Executors.newFixedThreadPool(numLinkWorkers);
 {code}
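
As a hedged sketch of the kind of change this implies (not the attached patch; the helper name is illustrative), the fix amounts to passing a daemon ThreadFactory when building the pool:

{code}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch only (not the attached patch): build the layout-upgrade worker pool
// with daemon threads so the pool cannot keep the datanode JVM alive.
public class DaemonPoolSketch {
  public static ExecutorService newDaemonFixedThreadPool(int numWorkers) {
    final AtomicInteger count = new AtomicInteger();
    ThreadFactory factory = new ThreadFactory() {
      @Override
      public Thread newThread(Runnable r) {
        Thread t = new Thread(r, "blockid-layout-upgrade-" + count.getAndIncrement());
        t.setDaemon(true);  // the actual point of this JIRA
        return t;
      }
    };
    return Executors.newFixedThreadPool(numWorkers, factory);
  }
}
{code}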



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7644) minor typo in HttpFS doc

2015-01-23 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated HDFS-7644:
---
   Resolution: Fixed
Fix Version/s: 2.7.0
   Status: Resolved  (was: Patch Available)

+1

Committed to branch-2 and trunk.

Thanks!

 minor typo in HttpFS doc
 

 Key: HDFS-7644
 URL: https://issues.apache.org/jira/browse/HDFS-7644
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: documentation
Affects Versions: 2.7.0
Reporter: Charles Lamb
Assignee: Charles Lamb
Priority: Trivial
 Fix For: 2.7.0

 Attachments: HDFS-7644.000.patch


 In hadoop-httpfs/src/site/apt/index.apt.vm, s/seening/seen/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-3107) HDFS truncate

2015-01-23 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290072#comment-14290072
 ] 

Tsz Wo Nicholas Sze commented on HDFS-3107:
---

There is a list of unresolved JIRAs.  Let's discuss it.
- HDFS-7341 Add initial snapshot support based on pipeline recovery
Is it still relevant?
- HDFS-7058 Tests for truncate CLI
Let's finish it before merging since CLI is user facing.  Is anyone working on 
it?
- HDFS-7655/HDFS-7656 Expose truncate API for Web HDFS/httpfs
It seems that we should not wait for them before merging.  Agree?
- HDFS-7659 We should check the new length of truncate can't be a negative value
Looks like this is going to be committed soon.
- HDFS-7665 Add definition of truncate preconditions/postconditions to 
filesystem specification
This is simply a documentation change.  Let's finish it before merging?

In summary, how about finishing HDFS-7058, HDFS-7659 and HDFS-7665 before 
merging it to branch-2?


 HDFS truncate
 -

 Key: HDFS-3107
 URL: https://issues.apache.org/jira/browse/HDFS-3107
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: datanode, namenode
Reporter: Lei Chang
Assignee: Plamen Jeliazkov
 Fix For: 3.0.0

 Attachments: HDFS-3107-13.patch, HDFS-3107-14.patch, 
 HDFS-3107-15.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, 
 HDFS-3107.15_branch2.patch, HDFS-3107.patch, HDFS-3107.patch, 
 HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
 HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
 HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate.pdf, 
 HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, 
 editsStored, editsStored.xml

   Original Estimate: 1,344h
  Remaining Estimate: 1,344h

 Systems with transaction support often need to undo changes made to the 
 underlying storage when a transaction is aborted. Currently HDFS does not 
 support truncate (a standard Posix operation) which is a reverse operation of 
 append, which makes upper layer applications use ugly workarounds (such as 
 keeping track of the discarded byte range per file in a separate metadata 
 store, and periodically running a vacuum process to rewrite compacted files) 
 to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-3750) API docs don't include HDFS

2015-01-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290121#comment-14290121
 ] 

Hudson commented on HDFS-3750:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #6922 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6922/])
HDFS-3750. API docs don't include HDFS (Jolly Chen via aw) (aw: rev 
6c3fec5ec25caabbd8c5ac795a5bc5229b5365de)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* pom.xml


 API docs don't include HDFS
 ---

 Key: HDFS-3750
 URL: https://issues.apache.org/jira/browse/HDFS-3750
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: documentation
Affects Versions: 2.0.0-alpha
Reporter: Eli Collins
Assignee: Jolly Chen
Priority: Critical
 Fix For: 3.0.0

 Attachments: HDFS-3750.patch


 [The javadocs|http://hadoop.apache.org/common/docs/current/api/index.html] 
 don't include HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7667) Various typos and improvements to HDFS Federation doc

2015-01-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290188#comment-14290188
 ] 

Hadoop QA commented on HDFS-7667:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12694227/HDFS-7667.000.patch
  against trunk revision 56df5f4.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+0 tests included{color}.  The patch appears to be a 
documentation patch that doesn't require tests.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9314//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9314//console

This message is automatically generated.

 Various typos and improvements to HDFS Federation doc
 -

 Key: HDFS-7667
 URL: https://issues.apache.org/jira/browse/HDFS-7667
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: documentation
Affects Versions: 2.6.0
Reporter: Charles Lamb
Assignee: Charles Lamb
Priority: Minor
 Fix For: 3.0.0

 Attachments: HDFS-7667.000.patch, HDFS-7667.001.patch


 Fix several incorrect commands, typos, grammatical errors, etc. in the HDFS 
 Federation doc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7644) minor typo in HffpFS doc

2015-01-23 Thread Charles Lamb (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289203#comment-14289203
 ] 

Charles Lamb commented on HDFS-7644:


The FB warnings are spurious.


 minor typo in HffpFS doc
 

 Key: HDFS-7644
 URL: https://issues.apache.org/jira/browse/HDFS-7644
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: documentation
Affects Versions: 2.7.0
Reporter: Charles Lamb
Assignee: Charles Lamb
Priority: Trivial
 Attachments: HDFS-7644.000.patch


 In hadoop-httpfs/src/site/apt/index.apt.vm, s/seening/seen/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7653) Block Readers and Writers used in both client side and datanode side

2015-01-23 Thread Li Bo (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289262#comment-14289262
 ] 

Li Bo commented on HDFS-7653:
-

Hi Zhe, thanks for your comments. 
1.  BlockReader contains some client-related methods, and we may add 
some new methods to the interface, so I chose to create a new one.
2.  We will create a new DFS input stream named DFSStripeInputStream; it 
uses a BlockGroupReader to read data from different datanodes. BlockGroupReader 
contains a set of BlockReaders.  I will upload the code to HDFS-7545 in a few 
days. The current implementation tries to make client-side and datanode-side EC 
encoding/decoding work in the same model.
3.  Agreed that we should simplify the logic of BlockWriter between 
datanodes. I will optimize the code later. I am not very clear on why 
FSOutputSummer is not appropriate for a block writer; it contains a data buffer 
and a checksum buffer. 
4.  We can have a further discussion after the code is uploaded to HDFS-7545. I 
will refer to the RSStriper logic in QFS to optimize the current implementation.

Nits:
1.  We can read a byte and put it into buf immediately. The position of buf needs 
to remain the same; I will fix it and add a unit test.
2.  I will change the unit test class name later. I generated the patch 
after all test cases passed.  getBlockFile() is a static method and other classes 
also use it as MiniDFSCluster.getBlockFile. You can have a further 
check.
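
A purely hypothetical sketch of the structure described in point 2, just to illustrate how a striped read could fan out over per-block readers (the names and signatures below are assumptions; the real code will be posted on HDFS-7545):

{code}
import java.io.IOException;
import java.util.List;

// Hypothetical sketch only, not the HDFS-7545 code: a group reader that fans a
// striped read out to one reader per block.
interface SimpleBlockReader {
  /** Reads up to len bytes of this block into buf at off; returns bytes read or -1. */
  int read(byte[] buf, int off, int len) throws IOException;
}

class BlockGroupReaderSketch {
  private final List<SimpleBlockReader> blockReaders;  // one per block in the group
  private final int cellSize;                          // striping cell size in bytes

  BlockGroupReaderSketch(List<SimpleBlockReader> blockReaders, int cellSize) {
    this.blockReaders = blockReaders;
    this.cellSize = cellSize;
  }

  /** Reads one stripe: one cell per block, laid out in order in buf (buf.length >= readers * cellSize). */
  int readStripe(byte[] buf) throws IOException {
    int total = 0;
    for (int i = 0; i < blockReaders.size(); i++) {
      int n = blockReaders.get(i).read(buf, i * cellSize, cellSize);
      if (n < 0) {
        break;  // block exhausted; a real reader would handle short or partial stripes
      }
      total += n;
    }
    return total;
  }
}
{code}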


 Block Readers and Writers used in both client side and datanode side
 

 Key: HDFS-7653
 URL: https://issues.apache.org/jira/browse/HDFS-7653
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Li Bo
Assignee: Li Bo
 Attachments: BlockReadersWriters.patch


 There are a lot of block read/write operations in HDFS-EC. For example, when a 
 client writes a file in striping layout, the client has to write several blocks 
 to several different datanodes; if a datanode wants to do an 
 encoding/decoding task, it has to read several blocks from itself and other 
 datanodes, and write one or more blocks to itself or other datanodes.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-3689) Add support for variable length block

2015-01-23 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-3689:

Attachment: HDFS-3689.009.patch

Thanks again Nicholas! Updated the patch to add quota verification.

 Add support for variable length block
 -

 Key: HDFS-3689
 URL: https://issues.apache.org/jira/browse/HDFS-3689
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: datanode, hdfs-client, namenode
Affects Versions: 3.0.0
Reporter: Suresh Srinivas
Assignee: Jing Zhao
 Attachments: HDFS-3689.000.patch, HDFS-3689.001.patch, 
 HDFS-3689.002.patch, HDFS-3689.003.patch, HDFS-3689.003.patch, 
 HDFS-3689.004.patch, HDFS-3689.005.patch, HDFS-3689.006.patch, 
 HDFS-3689.007.patch, HDFS-3689.008.patch, HDFS-3689.008.patch, 
 HDFS-3689.009.patch


 Currently HDFS supports fixed length blocks. Supporting variable length block 
 will allow new use cases and features to be built on top of HDFS. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7418) Raw Reed-Solomon coder in pure Java

2015-01-23 Thread Kai Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kai Zheng updated HDFS-7418:

Attachment: HDFS-7418-v1.patch

Uploaded an initial patch; it is pending submission since it depends on the one in 
HDFS-7353.

 Raw Reed-Solomon coder in pure Java
 ---

 Key: HDFS-7418
 URL: https://issues.apache.org/jira/browse/HDFS-7418
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Kai Zheng
Assignee: Kai Zheng
 Attachments: HDFS-7418-v1.patch


 This will implement RS coder by porting existing codes in HDFS-RAID in the 
 new codec and coder framework, which could be useful in case native support 
 isn't available or convenient in some environments or platforms.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7337) Configurable and pluggable Erasure Codec and schema

2015-01-23 Thread Kai Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14288973#comment-14288973
 ] 

Kai Zheng commented on HDFS-7337:
-

As discussed, most of this source code will be moved to the hadoop-common 
side, but I'm not sure if it's OK to keep using these JIRA entries that start 
with HDFS instead of HADOOP.

Would anyone help confirm this? It would be great if we don't have to change them; 
that seems reasonable because the code does work for HDFS, although for other 
considerations we'd better move it over there.

 Configurable and pluggable Erasure Codec and schema
 ---

 Key: HDFS-7337
 URL: https://issues.apache.org/jira/browse/HDFS-7337
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Zhe Zhang
Assignee: Kai Zheng
 Attachments: HDFS-7337-prototype-v1.patch, 
 HDFS-7337-prototype-v2.zip, HDFS-7337-prototype-v3.zip, 
 PluggableErasureCodec.pdf


 According to HDFS-7285 and the design, this considers supporting multiple 
 Erasure Codecs via a pluggable approach. It allows defining and configuring 
 multiple codec schemas with different coding algorithms and parameters. The 
 resulting codec schemas can be utilized and specified via a command tool for 
 different file folders. While designing and implementing such a pluggable framework, 
 we also implement a concrete codec by default (Reed-Solomon) to prove 
 the framework is useful and workable. A separate JIRA could be opened for the 
 RS codec implementation.
 Note HDFS-7353 will focus on the very low-level codec API and implementation 
 to make concrete vendor libraries transparent to the upper layer. This JIRA 
 focuses on the high-level parts that interact with configuration, schemas, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7669) HDFS Design Doc references commands that no longer exist.

2015-01-23 Thread Allen Wittenauer (JIRA)
Allen Wittenauer created HDFS-7669:
--

 Summary: HDFS Design Doc references commands that no longer exist.
 Key: HDFS-7669
 URL: https://issues.apache.org/jira/browse/HDFS-7669
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Allen Wittenauer


hadoop dfs should be hadoop fs
hadoop dfsadmin should be hdfs dfsadmin
hadoop dfs -rmr should be hadoop fs -rm -R



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6673) Add Delimited format supports for PB OIV tool

2015-01-23 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290278#comment-14290278
 ] 

Haohui Mai commented on HDFS-6673:
--

-0

bq. ... a fairly large fsimage here.  ... I think the current level of 
performance is sufficient for the vast majority of our customers.

The test is on a 3G fsimage which can easily fit in the working set of your 
laptop. 

Multiple users are running much larger clusters, where their fsimage can be as 
big as 40G (see HDFS-5698). I see the value of using LevelDB as a swap space to 
handle fsimages that are bigger than the working set, but what are the net 
benefits that the tool brings if it can only handle fsimages that are 10x 
smaller than the ones in some of the production clusters?

bq. Furthermore, this is a boolean improvement over the previous state of 
affairs; currently, we have no delimited OIV tool, and with this patch, we do.

This is not true. Delimited OIV was such a headache that we had to revive the 
legacy fsimage saver / loader / oiv in HDFS-6293. 

bq. This is the result of a few rounds of performance tuning. 

You guys deserve all the credit for getting this tool working, but given we 
have HDFS-6293 as a solid solution today, I would much rather see this tool 
become capable of handling fsimages from real, large-scale production runs (at 
least from the design point of view), instead of putting in a half-baked 
solution as-is. I'm also happy to provide help if necessary.


 Add Delimited format supports for PB OIV tool
 -

 Key: HDFS-6673
 URL: https://issues.apache.org/jira/browse/HDFS-6673
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: 2.4.0
Reporter: Lei (Eddy) Xu
Assignee: Lei (Eddy) Xu
Priority: Minor
 Attachments: HDFS-6673.000.patch, HDFS-6673.001.patch, 
 HDFS-6673.002.patch, HDFS-6673.003.patch, HDFS-6673.004.patch, 
 HDFS-6673.005.patch


 The new oiv tool, which is designed for Protobuf fsimage, lacks a few 
 features supported in the old {{oiv}} tool. 
 This task adds support for the _Delimited_ processor to the oiv tool. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7665) Add definition of truncate preconditions/postconditions to filesystem specification

2015-01-23 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290352#comment-14290352
 ] 

Konstantin Shvachko commented on HDFS-7665:
---

Steve, could you mention where exactly the specifications should be added?
I understand we need to add the truncate operation to the HDFS documentation here:
- 
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/FileSystemShell.html
- 
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/filesystem/filesystem.html

I guess you meant something else.

 Add definition of truncate preconditions/postconditions to filesystem 
 specification
 ---

 Key: HDFS-7665
 URL: https://issues.apache.org/jira/browse/HDFS-7665
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: documentation
Affects Versions: 3.0.0
Reporter: Steve Loughran
 Fix For: 3.0.0


 With the addition of a major new feature to filesystems, the filesystem 
 specification in hadoop-common/site is now out of sync. 
 This means that
 # there's no strict specification of what it should do
 # you can't derive tests from that specification
 # other people trying to implement the API will have to infer what to do from 
 the HDFS source
 # there's no way to decide whether or not the HDFS implementation does what 
 it is intended to do.
 # without matching tests against the raw local FS, differences between the 
 HDFS impl and the Posix standard one won't be caught until it is potentially 
 too late to fix.
 The operation should be relatively easy to define (after a truncate, the 
 file's bytes [0...len-1] must equal the original bytes, length(file)==len, etc.).
 The truncate tests already written could then be pulled up into contract 
 tests which any filesystem implementation can run against.
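
As a sketch of what such a contract test could look like, assuming the 
FileSystem#truncate(Path, long) API from HDFS-3107 and JUnit assertions; the 
helpers marked hypothetical are placeholders, not existing contract utilities.

{code}
@Test
public void testTruncateShrinksFileAndPreservesPrefix() throws Exception {
  Path path = new Path("/test/truncate-contract");
  byte[] original = writeTestFile(fs, path, 64);   // hypothetical helper

  int newLength = 16;
  boolean done = fs.truncate(path, newLength);
  if (!done) {
    // Truncating inside the last block may need block recovery first.
    waitForBlockRecovery(fs, path);                // hypothetical helper
  }

  // Postconditions: length(file) == newLength and bytes [0...newLength-1]
  // are identical to the original data.
  assertEquals(newLength, fs.getFileStatus(path).getLen());
  byte[] remaining = readAllBytes(fs, path);       // hypothetical helper
  for (int i = 0; i < newLength; i++) {
    assertEquals(original[i], remaining[i]);
  }
}
{code}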



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6729) Support maintenance mode for DN

2015-01-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290402#comment-14290402
 ] 

Hadoop QA commented on HDFS-6729:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12694290/HDFS-6729.004.patch
  against trunk revision 8f26d5a.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.cli.TestHDFSCLI

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9321//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9321//console

This message is automatically generated.

 Support maintenance mode for DN
 ---

 Key: HDFS-6729
 URL: https://issues.apache.org/jira/browse/HDFS-6729
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Affects Versions: 2.5.0
Reporter: Lei (Eddy) Xu
Assignee: Lei (Eddy) Xu
 Attachments: HDFS-6729.000.patch, HDFS-6729.001.patch, 
 HDFS-6729.002.patch, HDFS-6729.003.patch, HDFS-6729.004.patch


 Some maintenance work (e.g., upgrading RAM or adding disks) on a DataNode only 
 takes a short amount of time (e.g., 10 minutes). In these cases, the users do 
 not want missing blocks to be reported for this DN because the DN will be back 
 online shortly without data loss. Thus, we need a maintenance mode for a DN so 
 that maintenance work can be carried out on the DN without having to 
 decommission it or have the DN marked as dead.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-6729) Support maintenance mode for DN

2015-01-23 Thread Lei (Eddy) Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei (Eddy) Xu updated HDFS-6729:

Attachment: HDFS-6729.004.patch

Updated the patch to:

# add {{dfsadmin -setMaintenanceMode}} command and RPCs to NN
# change {{dfsadmin -report}} to display maintenance node information.

 Support maintenance mode for DN
 ---

 Key: HDFS-6729
 URL: https://issues.apache.org/jira/browse/HDFS-6729
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Affects Versions: 2.5.0
Reporter: Lei (Eddy) Xu
Assignee: Lei (Eddy) Xu
 Attachments: HDFS-6729.000.patch, HDFS-6729.001.patch, 
 HDFS-6729.002.patch, HDFS-6729.003.patch, HDFS-6729.004.patch


 Some maintenance work (e.g., upgrading RAM or adding disks) on a DataNode only 
 takes a short amount of time (e.g., 10 minutes). In these cases, the users do 
 not want missing blocks to be reported for this DN because the DN will be back 
 online shortly without data loss. Thus, we need a maintenance mode for a DN so 
 that maintenance work can be carried out on the DN without having to 
 decommission it or have the DN marked as dead.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6729) Support maintenance mode for DN

2015-01-23 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290240#comment-14290240
 ] 

Andrew Wang commented on HDFS-6729:
---

Hey Eddy, quick question: this looks like soft state that isn't persisted 
across NN restarts / failovers. Is that suitable for the target use cases?

 Support maintenance mode for DN
 ---

 Key: HDFS-6729
 URL: https://issues.apache.org/jira/browse/HDFS-6729
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode
Affects Versions: 2.5.0
Reporter: Lei (Eddy) Xu
Assignee: Lei (Eddy) Xu
 Attachments: HDFS-6729.000.patch, HDFS-6729.001.patch, 
 HDFS-6729.002.patch, HDFS-6729.003.patch, HDFS-6729.004.patch


 Some maintenance work (e.g., upgrading RAM or adding disks) on a DataNode only 
 takes a short amount of time (e.g., 10 minutes). In these cases, the users do 
 not want missing blocks to be reported for this DN because the DN will be back 
 online shortly without data loss. Thus, we need a maintenance mode for a DN so 
 that maintenance work can be carried out on the DN without having to 
 decommission it or have the DN marked as dead.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6673) Add Delimited format supports for PB OIV tool

2015-01-23 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290297#comment-14290297
 ] 

Andrew Wang commented on HDFS-6673:
---

Oops, realized Eddy still needs to address my previous comments. Waiting on 
that and Jenkins.

 Add Delimited format supports for PB OIV tool
 -

 Key: HDFS-6673
 URL: https://issues.apache.org/jira/browse/HDFS-6673
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: 2.4.0
Reporter: Lei (Eddy) Xu
Assignee: Lei (Eddy) Xu
Priority: Minor
 Attachments: HDFS-6673.000.patch, HDFS-6673.001.patch, 
 HDFS-6673.002.patch, HDFS-6673.003.patch, HDFS-6673.004.patch, 
 HDFS-6673.005.patch


 The new oiv tool, which is designed for Protobuf fsimage, lacks a few 
 features supported in the old {{oiv}} tool. 
 This task adds support for the _Delimited_ processor to the oiv tool. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7670) HDFS Quota guide has typos, incomplete command lines

2015-01-23 Thread Allen Wittenauer (JIRA)
Allen Wittenauer created HDFS-7670:
--

 Summary: HDFS Quota guide has typos, incomplete command lines
 Key: HDFS-7670
 URL: https://issues.apache.org/jira/browse/HDFS-7670
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: documentation
Reporter: Allen Wittenauer


The HDFS quota guide uses fs -count, etc., as valid commands instead of hadoop 
fs -count, etc.

There is also a typo in 'director'.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7669) HDFS Design Doc references commands that no longer exist.

2015-01-23 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated HDFS-7669:
---
Component/s: documentation

 HDFS Design Doc references commands that no longer exist.
 -

 Key: HDFS-7669
 URL: https://issues.apache.org/jira/browse/HDFS-7669
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: documentation
Reporter: Allen Wittenauer

 hadoop dfs should be hadoop fs
 hadoop dfsadmin should be hdfs dfsadmin
 hadoop dfs -rmr should be hadoop fs -rm -R



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7672) Handle write failure for EC blocks

2015-01-23 Thread Tsz Wo Nicholas Sze (JIRA)
Tsz Wo Nicholas Sze created HDFS-7672:
-

 Summary: Handle write failure for EC blocks
 Key: HDFS-7672
 URL: https://issues.apache.org/jira/browse/HDFS-7672
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: hdfs-client
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze


For (6, 3)-Reed-Solomon, a client writes to 6 data blocks and 3 parity blocks 
concurrently.  We need to handle datanode or network failures when writing an 
EC BlockGroup.
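
One way to frame the failure budget: with RS(6,3), the block group remains 
decodable as long as at most 3 of the 9 targets fail, since any 6 blocks can 
reconstruct the data. A minimal sketch of such a check follows; the streamer 
type and field names are assumptions, not the actual client implementation.

{code}
// Illustrative failure-budget check for an RS(6,3) block group write.
final int numDataBlocks = 6;
final int numParityBlocks = 3;

int failedStreamers = 0;
for (StripedBlockStreamer s : streamers) {   // hypothetical per-target streamers
  if (s.hasFailed()) {
    failedStreamers++;
  }
}

if (failedStreamers > numParityBlocks) {
  // Fewer than numDataBlocks healthy targets remain, so the block group may
  // become unrecoverable; the write has to fail or be retried elsewhere.
  throw new IOException("Too many failed streamers in EC block group: "
      + failedStreamers);
}
{code}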



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-6673) Add Delimited format supports for PB OIV tool

2015-01-23 Thread Lei (Eddy) Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei (Eddy) Xu updated HDFS-6673:

Attachment: HDFS-6673.006.patch

Thanks [~andrew.wang] and [~wheat9].

This patch addressed comments previously posted.

 Add Delimited format supports for PB OIV tool
 -

 Key: HDFS-6673
 URL: https://issues.apache.org/jira/browse/HDFS-6673
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: 2.4.0
Reporter: Lei (Eddy) Xu
Assignee: Lei (Eddy) Xu
Priority: Minor
 Attachments: HDFS-6673.000.patch, HDFS-6673.001.patch, 
 HDFS-6673.002.patch, HDFS-6673.003.patch, HDFS-6673.004.patch, 
 HDFS-6673.005.patch, HDFS-6673.006.patch


 The new oiv tool, which is designed for Protobuf fsimage, lacks a few 
 features supported in the old {{oiv}} tool. 
 This task adds support for the _Delimited_ processor to the oiv tool. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6673) Add Delimited format supports for PB OIV tool

2015-01-23 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290331#comment-14290331
 ] 

Haohui Mai commented on HDFS-6673:
--

bq. Also, even a 40GB image easily fits in memory on servers these days.

The whole point of this tool is to run the oiv on machines that do not have the 
luxury of abundant memory. Can you clarify what point you are trying to make?

bq. These seem like pretty big drawbacks to me, and are addressed by this tool. 
I think calling it half-baked is unfair considering it provides greater 
functionality.

Can you clarify what the greater functionality is? The Delimited processor only 
outputs mtime/atime and other information already available from the legacy 
fsimage.

BTW, if you really want to commit this, please update the documentation to 
explicitly state that the tool will not work for large fsimages, so that users 
won't be caught by surprise.

 Add Delimited format supports for PB OIV tool
 -

 Key: HDFS-6673
 URL: https://issues.apache.org/jira/browse/HDFS-6673
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: 2.4.0
Reporter: Lei (Eddy) Xu
Assignee: Lei (Eddy) Xu
Priority: Minor
 Attachments: HDFS-6673.000.patch, HDFS-6673.001.patch, 
 HDFS-6673.002.patch, HDFS-6673.003.patch, HDFS-6673.004.patch, 
 HDFS-6673.005.patch, HDFS-6673.006.patch


 The new oiv tool, which is designed for Protobuf fsimage, lacks a few 
 features supported in the old {{oiv}} tool. 
 This task adds support for the _Delimited_ processor to the oiv tool. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7659) We should check the new length of truncate can't be a negative value.

2015-01-23 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290329#comment-14290329
 ] 

Konstantin Shvachko commented on HDFS-7659:
---

+1. Looks good.

Can I ask you to add three lines to TestFileTruncate with this patch, even 
though it is not directly related to your change?
This will fix the TestFileTruncate failures, as [~szetszwo] requested in 
HDFS-7611. We do not risk losing the bug since we know the problem now.
Or we can of course fix it in another jira.
The lines are as follows, placed at the very beginning of 
{{testTruncateEditLogLoad()}}:
{code}
// purge previously accumulated edits
fs.setSafeMode(SafeModeAction.SAFEMODE_ENTER);
fs.saveNamespace();
fs.setSafeMode(SafeModeAction.SAFEMODE_LEAVE);
{code} 

 We should check the new length of truncate can't be a negative value.
 -

 Key: HDFS-7659
 URL: https://issues.apache.org/jira/browse/HDFS-7659
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode, namenode
Reporter: Yi Liu
Assignee: Yi Liu
 Fix For: 3.0.0

 Attachments: HDFS-7659.001.patch, HDFS-7659.002.patch, 
 HDFS-7659.003.patch


 It's obvious that we should check that the new length of truncate can't be a 
 negative value.
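
A minimal sketch of this precondition, assuming it sits on the NameNode-side 
truncate path; HadoopIllegalArgumentException is an existing Hadoop exception 
type, but the placement and message here are illustrative, not the committed 
patch.

{code}
// Reject negative truncate lengths up front, before any block manipulation.
if (newLength < 0) {
  throw new HadoopIllegalArgumentException(
      "Cannot truncate to a negative file size: " + newLength + ".");
}
{code}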



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-3107) HDFS truncate

2015-01-23 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290345#comment-14290345
 ] 

Konstantin Shvachko commented on HDFS-3107:
---

Yes HDFS-7659 is ready.
Documentation needs to be updates. I mentioned it in my email on the dev list 
some days ago. Other things to do are adding truncate to DFSIO and SLive.
I don't think we should wait  for them to merge. My main concern is that it 
increases the work for developers. Branches being substancially diverged means 
that it is harder to merge new code into branch-2, which is not related to 
truncate.
Also it will be easier to implement TestCLI for example or the documentation 
update and then merge it into both branches at once.
In the end it is not that we have a release planned next week.

Would it be ok with you if we commit HDFS-7659, fix TestFileTruncate as I 
proposed there, and then merge?

 HDFS truncate
 -

 Key: HDFS-3107
 URL: https://issues.apache.org/jira/browse/HDFS-3107
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: datanode, namenode
Reporter: Lei Chang
Assignee: Plamen Jeliazkov
 Fix For: 3.0.0

 Attachments: HDFS-3107-13.patch, HDFS-3107-14.patch, 
 HDFS-3107-15.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, 
 HDFS-3107.15_branch2.patch, HDFS-3107.patch, HDFS-3107.patch, 
 HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
 HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
 HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate.pdf, 
 HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, 
 editsStored, editsStored.xml

   Original Estimate: 1,344h
  Remaining Estimate: 1,344h

 Systems with transaction support often need to undo changes made to the 
 underlying storage when a transaction is aborted. Currently HDFS does not 
 support truncate (a standard Posix operation), the reverse of append, which 
 makes upper-layer applications use ugly workarounds (such as keeping track of 
 the discarded byte range per file in a separate metadata store, and 
 periodically running a vacuum process to rewrite compacted files) to overcome 
 this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6673) Add Delimited format supports for PB OIV tool

2015-01-23 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290238#comment-14290238
 ] 

Andrew Wang commented on HDFS-6673:
---

Hey Haohui,

Eddy's done performance testing with a fairly large fsimage here. This is the 
result of a few rounds of performance tuning. I think the current level of 
performance is sufficient for the vast majority of our customers. Furthermore, 
this is a boolean improvement over the previous state of affairs; currently, we 
have no delimited OIV tool, and with this patch, we do.

So, with that said, I'd like to commit this and we can discuss further 
improvements on a follow-on JIRA. I'll go ahead and commit unless I hear 
otherwise by EOD Monday. Thanks.

 Add Delimited format supports for PB OIV tool
 -

 Key: HDFS-6673
 URL: https://issues.apache.org/jira/browse/HDFS-6673
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: 2.4.0
Reporter: Lei (Eddy) Xu
Assignee: Lei (Eddy) Xu
Priority: Minor
 Attachments: HDFS-6673.000.patch, HDFS-6673.001.patch, 
 HDFS-6673.002.patch, HDFS-6673.003.patch, HDFS-6673.004.patch, 
 HDFS-6673.005.patch


 The new oiv tool, which is designed for Protobuf fsimage, lacks a few 
 features supported in the old {{oiv}} tool. 
 This task adds support for the _Delimited_ processor to the oiv tool. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7668) Convert site documentation from apt to markdown

2015-01-23 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated HDFS-7668:
---
Component/s: documentation

 Convert site documentation from apt to markdown
 ---

 Key: HDFS-7668
 URL: https://issues.apache.org/jira/browse/HDFS-7668
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: documentation
Affects Versions: 3.0.0
Reporter: Allen Wittenauer

 HDFS analog to HADOOP-11495



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7668) Convert site documentation from apt to markdown

2015-01-23 Thread Allen Wittenauer (JIRA)
Allen Wittenauer created HDFS-7668:
--

 Summary: Convert site documentation from apt to markdown
 Key: HDFS-7668
 URL: https://issues.apache.org/jira/browse/HDFS-7668
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 3.0.0
Reporter: Allen Wittenauer


HDFS analog to HADOOP-11495



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7671) hdfs user guide should point to the common rack awareness doc

2015-01-23 Thread Allen Wittenauer (JIRA)
Allen Wittenauer created HDFS-7671:
--

 Summary: hdfs user guide should point to the common rack awareness 
doc
 Key: HDFS-7671
 URL: https://issues.apache.org/jira/browse/HDFS-7671
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: documentation
Reporter: Allen Wittenauer


HDFS user guide has a section on rack awareness that should really just be a 
pointer to the common doc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6673) Add Delimited format supports for PB OIV tool

2015-01-23 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290294#comment-14290294
 ] 

Andrew Wang commented on HDFS-6673:
---

bq. The test is on a 3G fsimage which can easily fit in the working set of your 
laptop...Multiple users are running much larger clusters, where their fsimage 
can be as big as 40G

I know there are larger customers out there, but as I said above, we surveyed 
the sizes of our customers' fsimages, and this was one of the larger ones. Thus 
this solution will work for most production deployments. Also, even a 40GB 
image easily fits in memory on servers these days.

bq. but given we have HDFS-6293 as a solid solution today, I would much rather 
to see this tool to be capable of handling fsimages from the real, large-scale 
production runs

HDFS-6293 doesn't include metadata newer than the old format. It also requires 
the NN to write out an additional old fsimage just for OIV alongside the real 
one. These seem like pretty big drawbacks to me, and are addressed by this 
tool. I think calling it half-baked is unfair considering it provides greater 
functionality.

Anyway, thanks for not -1'ing, I'll commit this now. Let's continue this 
discussion in a follow-on JIRA.

 Add Delimited format supports for PB OIV tool
 -

 Key: HDFS-6673
 URL: https://issues.apache.org/jira/browse/HDFS-6673
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: 2.4.0
Reporter: Lei (Eddy) Xu
Assignee: Lei (Eddy) Xu
Priority: Minor
 Attachments: HDFS-6673.000.patch, HDFS-6673.001.patch, 
 HDFS-6673.002.patch, HDFS-6673.003.patch, HDFS-6673.004.patch, 
 HDFS-6673.005.patch


 The new oiv tool, which is designed for Protobuf fsimage, lacks a few 
 features supported in the old {{oiv}} tool. 
 This task adds support for the _Delimited_ processor to the oiv tool. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7673) synthetic load generator docs give incorrect/incomplete commands

2015-01-23 Thread Allen Wittenauer (JIRA)
Allen Wittenauer created HDFS-7673:
--

 Summary: synthetic load generator docs give incorrect/incomplete 
commands
 Key: HDFS-7673
 URL: https://issues.apache.org/jira/browse/HDFS-7673
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: documentation
Reporter: Allen Wittenauer


The synthetic load generator guide gives this helpful command to start it:

{code}
java LoadGenerator [options]
{code}

This, of course, won't work.  What's the class path?  What jar is it in?  Is 
this really the command?  Isn't there a shell script wrapping this?

This atrocity against normal users is committed three more times after this one 
with equally incomplete commands for other parts of the system.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager

2015-01-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290348#comment-14290348
 ] 

Hadoop QA commented on HDFS-7411:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12694272/hdfs-7411.008.patch
  against trunk revision 8f26d5a.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 12 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.TestDecommission

  The following test timeouts occurred in 
hadoop-hdfs-project/hadoop-hdfs:

org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9320//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9320//console

This message is automatically generated.

 Refactor and improve decommissioning logic into DecommissionManager
 ---

 Key: HDFS-7411
 URL: https://issues.apache.org/jira/browse/HDFS-7411
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Andrew Wang
Assignee: Andrew Wang
 Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, 
 hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch, 
 hdfs-7411.006.patch, hdfs-7411.007.patch, hdfs-7411.008.patch


 Would be nice to split out decommission logic from DatanodeManager to 
 DecommissionManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

