[jira] [Commented] (HDFS-7433) Optimize performance of DatanodeManager's node map
[ https://issues.apache.org/jira/browse/HDFS-7433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357154#comment-14357154 ] Hadoop QA commented on HDFS-7433: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12703937/HDFS-7433.patch against trunk revision 30c428a. {color:red}-1 patch{color}. Trunk compilation may be broken. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9839//console This message is automatically generated. Optimize performance of DatanodeManager's node map -- Key: HDFS-7433 URL: https://issues.apache.org/jira/browse/HDFS-7433 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.0.0-alpha, 3.0.0 Reporter: Daryn Sharp Assignee: Daryn Sharp Priority: Critical Attachments: HDFS-7433.patch, HDFS-7433.patch, HDFS-7433.patch, HDFS-7433.patch The datanode map is currently a {{TreeMap}}. For many thousands of datanodes, tree lookups are ~10X more expensive than a {{HashMap}}. Insertions and removals are up to 100X more expensive. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7433) Optimize performance of DatanodeManager's node map
[ https://issues.apache.org/jira/browse/HDFS-7433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daryn Sharp updated HDFS-7433: -- Attachment: HDFS-7433.patch Rebased due to decomm manager rewrite. No tests, but the change is now completely trivial. Optimize performance of DatanodeManager's node map -- Key: HDFS-7433 URL: https://issues.apache.org/jira/browse/HDFS-7433 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.0.0-alpha, 3.0.0 Reporter: Daryn Sharp Assignee: Daryn Sharp Priority: Critical Attachments: HDFS-7433.patch, HDFS-7433.patch, HDFS-7433.patch, HDFS-7433.patch The datanode map is currently a {{TreeMap}}. For many thousands of datanodes, tree lookups are ~10X more expensive than a {{HashMap}}. Insertions and removals are up to 100X more expensive. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
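The cost difference described in HDFS-7433 can be sketched with plain JDK collections. This is not the patch itself; `NodeMapDemo` and the `dn-N` keys are made up for illustration, but the asymptotics are the point: `TreeMap` pays O(log n) comparisons per lookup/insert/remove, while `HashMap` is O(1) expected.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class NodeMapDemo {
    // Populate a map keyed by a datanode id, mimicking DatanodeManager's node map.
    static Map<String, String> fill(Map<String, String> map, int n) {
        for (int i = 0; i < n; i++) {
            map.put("dn-" + i, "host" + i + ":50010");
        }
        return map;
    }

    // Time n lookups; with many thousands of entries the TreeMap's
    // O(log n) comparisons dominate, matching the ~10X gap in the issue.
    static long timeLookups(Map<String, String> map, int n) {
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            map.get("dn-" + i);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int n = 100_000;
        Map<String, String> tree = fill(new TreeMap<>(), n);  // O(log n) per get
        Map<String, String> hash = fill(new HashMap<>(), n);  // O(1) expected per get
        System.out.println("TreeMap ns: " + timeLookups(tree, n));
        System.out.println("HashMap ns: " + timeLookups(hash, n));
    }
}
```

The swap is safe here because nothing relies on the datanode map's sorted iteration order.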
[jira] [Commented] (HDFS-7491) Add incremental blockreport latency to DN metrics
[ https://issues.apache.org/jira/browse/HDFS-7491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357168#comment-14357168 ] Xiaoyu Yao commented on HDFS-7491: -- Thanks Ming for the contribution. The patch v4 looks good to me. +1 (Non-binding). Add incremental blockreport latency to DN metrics - Key: HDFS-7491 URL: https://issues.apache.org/jira/browse/HDFS-7491 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Reporter: Ming Ma Assignee: Ming Ma Priority: Minor Attachments: HDFS-7491-2.patch, HDFS-7491-3.patch, HDFS-7491-4.patch, HDFS-7491-branch-2.patch, HDFS-7491.patch In a busy cluster, IBR processing could be delayed due to NN FSNamesystem lock and cause NN to throw NotReplicatedYetException to DFSClient and thus increase the overall application latency. This will be taken care of when we address the NN FSNamesystem lock contention issue. It is useful if we can provide IBR latency metrics from DN's point of view. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
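A JDK-only sketch of the metric being proposed, in the spirit of Hadoop's MutableRate counters: record each incremental block report's send latency on the DN side and expose a count and mean. The class and method names below are illustrative, not the actual DataNodeMetrics change.

```java
import java.util.concurrent.atomic.AtomicLong;

// Illustrative IBR-latency metric: count of IBRs sent and mean latency.
public class IbrLatencyMetric {
    private final AtomicLong count = new AtomicLong();
    private final AtomicLong totalNanos = new AtomicLong();

    // Called after each incremental block report RPC completes.
    public void addIbrLatency(long nanos) {
        count.incrementAndGet();
        totalNanos.addAndGet(nanos);
    }

    public long numOps() { return count.get(); }

    public double meanMillis() {
        long c = count.get();
        return c == 0 ? 0.0 : (totalNanos.get() / (double) c) / 1_000_000.0;
    }

    public static void main(String[] args) {
        IbrLatencyMetric m = new IbrLatencyMetric();
        long start = System.nanoTime();
        // ... send the incremental block report RPC here ...
        m.addIbrLatency(System.nanoTime() - start);
        System.out.println("IBRs: " + m.numOps() + ", mean ms: " + m.meanMillis());
    }
}
```

The value of measuring on the DN side, per the issue, is that it captures queueing behind the NN's FSNamesystem lock that the NN's own metrics would not attribute to IBRs.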
[jira] [Updated] (HDFS-6658) Namenode memory optimization - Block replicas list
[ https://issues.apache.org/jira/browse/HDFS-6658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daryn Sharp updated HDFS-6658: -- Attachment: HDFS-6658.patch Sorry, a last-minute change to revert the code back as close as possible to the current code busted the replication monitor with an NPE. The preconditions I've added are detecting some bugs in the BM that are currently masked. Namely, the BM is designed to return phony values for blocks not in the blocks map (i.e. 0 counts, 0 storages, etc.) instead of the caller dealing with the situation. Added a log to getStorages when iterating a non-existent block. Namenode memory optimization - Block replicas list --- Key: HDFS-6658 URL: https://issues.apache.org/jira/browse/HDFS-6658 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.4.1 Reporter: Amir Langer Assignee: Daryn Sharp Attachments: BlockListOptimizationComparison.xlsx, BlocksMap redesign.pdf, HDFS-6658.patch, HDFS-6658.patch, HDFS-6658.patch, Namenode Memory Optimizations - Block replicas list.docx Part of the memory consumed by every BlockInfo object in the Namenode is a linked list of block references for every DatanodeStorageInfo (called triplets). We propose to change the way we store the list in memory. Using primitive integer indexes instead of object references will reduce the memory needed for every block replica (when compressed oops is disabled) and in our new design the list overhead will be per DatanodeStorageInfo and not per block replica. see attached design doc. for details and evaluation results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
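The core trick in the design sketched above is threading the per-storage replica list through primitive int "next" indexes rather than per-block object references. A toy JDK-only version of that idea, with hypothetical names (this is not the HDFS-6658 implementation):

```java
// Toy per-storage replica list threaded through an int[] of "next" indexes.
// Each replica costs one int of list overhead instead of object references.
public class IntLinkedReplicaList {
    private final int[] next;   // next[i] = index of the next block, -1 = end of list
    private int head = -1;
    private int size = 0;

    public IntLinkedReplicaList(int capacity) {
        next = new int[capacity];
        java.util.Arrays.fill(next, -1);
    }

    // O(1) insert at the head, like the triplets scheme but with ints.
    public void add(int blockIndex) {
        next[blockIndex] = head;
        head = blockIndex;
        size++;
    }

    public int size() { return size; }

    // Walk the chain and collect block indexes in list order.
    public int[] toArray() {
        int[] out = new int[size];
        int i = 0;
        for (int cur = head; cur != -1; cur = next[cur]) {
            out[i++] = cur;
        }
        return out;
    }
}
```

An `int` index is 4 bytes regardless of JVM settings, which is why the design doc's savings are largest when compressed oops is disabled and object references cost 8 bytes each.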
[jira] [Commented] (HDFS-7880) Remove the tests for legacy Web UI in branch-2
[ https://issues.apache.org/jira/browse/HDFS-7880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14357097#comment-14357097 ] Brahma Reddy Battula commented on HDFS-7880: Shall I raise another issue for this removal of classes? Remove the tests for legacy Web UI in branch-2 -- Key: HDFS-7880 URL: https://issues.apache.org/jira/browse/HDFS-7880 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 2.7.0 Reporter: Akira AJISAKA Assignee: Brahma Reddy Battula Priority: Blocker Attachments: HDFS-7880-002.patch, HDFS-7880.patch These tests fail in branch-2 because they assert that the legacy UI exists.
* TestJournalNode.testHttpServer:174 expected:200 but was:404
* TestNNWithQJM.testWebPageHasQjmInfo:229 expected:200 but was:404
* TestHAWebUI.testLinkAndClusterSummary:50 expected:200 but was:404
* TestHostsFiles.testHostsExcludeDfshealthJsp:130 expected:200 but was:404
* TestSecondaryWebUi.testSecondaryWebUiJsp:87 expected:200 but was:404
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7830) DataNode does not release the volume lock when adding a volume fails.
[ https://issues.apache.org/jira/browse/HDFS-7830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357016#comment-14357016 ] Hudson commented on HDFS-7830: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #129 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/129/]) HDFS-7830. DataNode does not release the volume lock when adding a volume fails. (Lei Xu via Colin P. McCabe) (cmccabe: rev 5c1036d598051cf6af595740f1ab82092b0b6554) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetTestUtil.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeHotSwapVolumes.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/common/Storage.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/TestFsDatasetImpl.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt DataNode does not release the volume lock when adding a volume fails. - Key: HDFS-7830 URL: https://issues.apache.org/jira/browse/HDFS-7830 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.6.0 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Fix For: 2.7.0 Attachments: HDFS-7830.000.patch, HDFS-7830.001.patch, HDFS-7830.002.patch, HDFS-7830.003.patch, HDFS-7830.004.patch When there is a failure in adding volume process, the {{in_use.lock}} is not released. Also, doing another {{-reconfig}} to remove the new dir in order to cleanup doesn't remove the lock. lsof still shows datanode holding on to the lock file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
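The general shape of the fix committed above is the classic "release the lock on any failure after acquiring it" pattern. A minimal JDK-only sketch using `java.nio` file locking on an `in_use.lock` file; `addVolumeSteps` is a stand-in for the real volume-addition work, not the FsDatasetImpl API:

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class VolumeLockDemo {
    // Acquire in_use.lock, run the add-volume steps, and release the lock
    // if any step fails, rather than leaving it held forever.
    static void addVolume(Path lockFile, Runnable addVolumeSteps) throws IOException {
        FileChannel ch = FileChannel.open(lockFile,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE);
        FileLock lock = ch.tryLock();  // assume uncontended here; real code must handle null
        try {
            addVolumeSteps.run();      // may throw
        } catch (RuntimeException e) {
            lock.release();            // don't leave in_use.lock held after a failure
            ch.close();
            throw e;
        }
        // Success: keep holding the lock for as long as the volume is in use.
    }
}
```

Without the catch-and-release, a later `-reconfig` to remove the directory cannot clean up, which is exactly the lsof symptom the reporter describes.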
[jira] [Assigned] (HDFS-7873) OIV webhdfs premature close channel issue
[ https://issues.apache.org/jira/browse/HDFS-7873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benoit Perroud reassigned HDFS-7873: Assignee: (was: Benoit Perroud) OIV webhdfs premature close channel issue - Key: HDFS-7873 URL: https://issues.apache.org/jira/browse/HDFS-7873 Project: Hadoop HDFS Issue Type: Bug Components: tools Affects Versions: 2.6.0, 2.5.2 Reporter: Benoit Perroud Priority: Minor Attachments: HDFS-7873-v1.txt, HDFS-7873-v2.txt The new Offline Image Viewer (OIV) supports to load the FSImage and _emulate_ a webhdfs server to explore the image without touching the NN. This webhdfs server is not working with folders holding a significant number of children (files or other folders): {quote} $ hadoop fs -ls webhdfs://127.0.0.1:5978/a/big/folder 15/03/03 04:28:19 WARN ssl.FileBasedKeyStoresFactory: The property 'ssl.client.truststore.location' has not been set, no TrustStore will be loaded 15/03/03 04:28:21 WARN security.UserGroupInformation: PriviledgedActionException as:bperroud (auth:SIMPLE) cause:java.io.IOException: Response decoding failure: java.lang.IllegalStateException: Expected one of '}' ls: Response decoding failure: java.lang.IllegalStateException: Expected one of '}' {quote} The error comes from an inappropriate usage of Netty. {{e.getFuture().addListener(ChannelFutureListener.CLOSE)}} is closing the channel too early because the future attached to the channel already sent the header so the I/O operation succeeded. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
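The bug above is an ordering problem: the close action is attached to the future of an early write (the header), so the channel can close before the large response body is flushed. A JDK-only analogy using `CompletableFuture` in place of Netty's `ChannelFuture` (all names here are illustrative, not Netty or OIV code):

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PrematureCloseDemo {
    final List<String> sent = new CopyOnWriteArrayList<>();
    volatile boolean closed = false;

    // Asynchronous write that silently drops data once the channel is closed.
    CompletableFuture<Void> write(String chunk, ExecutorService io) {
        return CompletableFuture.runAsync(() -> {
            if (!closed) sent.add(chunk);
        }, io);
    }

    public static void main(String[] args) {
        ExecutorService io = Executors.newSingleThreadExecutor();
        PrematureCloseDemo ch = new PrematureCloseDemo();
        CompletableFuture<Void> header = ch.write("header", io);
        CompletableFuture<Void> body = ch.write("big-json-body", io);
        // Buggy version: header.thenRun(() -> ch.closed = true) races with the
        // body write and can drop it, like ChannelFutureListener.CLOSE on the
        // header's future.
        // Fixed version: close only after the FINAL write's future completes.
        body.thenRun(() -> ch.closed = true).join();
        io.shutdown();
        System.out.println(ch.sent);
    }
}
```

In Netty terms, the fix is to add `ChannelFutureListener.CLOSE` to the future of the last write in the response, not to an earlier one whose I/O has already succeeded.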
[jira] [Commented] (HDFS-7830) DataNode does not release the volume lock when adding a volume fails.
[ https://issues.apache.org/jira/browse/HDFS-7830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357045#comment-14357045 ] Hudson commented on HDFS-7830: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2079 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2079/]) HDFS-7830. DataNode does not release the volume lock when adding a volume fails. (Lei Xu via Colin P. McCabe) (cmccabe: rev 5c1036d598051cf6af595740f1ab82092b0b6554) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/TestFsDatasetImpl.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetTestUtil.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeHotSwapVolumes.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/common/Storage.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java DataNode does not release the volume lock when adding a volume fails. - Key: HDFS-7830 URL: https://issues.apache.org/jira/browse/HDFS-7830 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.6.0 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Fix For: 2.7.0 Attachments: HDFS-7830.000.patch, HDFS-7830.001.patch, HDFS-7830.002.patch, HDFS-7830.003.patch, HDFS-7830.004.patch When there is a failure in adding volume process, the {{in_use.lock}} is not released. Also, doing another {{-reconfig}} to remove the new dir in order to cleanup doesn't remove the lock. lsof still shows datanode holding on to the lock file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6658) Namenode memory optimization - Block replicas list
[ https://issues.apache.org/jira/browse/HDFS-6658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14356996#comment-14356996 ] Hadoop QA commented on HDFS-6658: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12703915/HDFS-6658.patch against trunk revision 30c428a. {color:red}-1 patch{color}. Trunk compilation may be broken. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9836//console This message is automatically generated. Namenode memory optimization - Block replicas list --- Key: HDFS-6658 URL: https://issues.apache.org/jira/browse/HDFS-6658 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.4.1 Reporter: Amir Langer Assignee: Daryn Sharp Attachments: BlockListOptimizationComparison.xlsx, BlocksMap redesign.pdf, HDFS-6658.patch, HDFS-6658.patch, HDFS-6658.patch, Namenode Memory Optimizations - Block replicas list.docx Part of the memory consumed by every BlockInfo object in the Namenode is a linked list of block references for every DatanodeStorageInfo (called triplets). We propose to change the way we store the list in memory. Using primitive integer indexes instead of object references will reduce the memory needed for every block replica (when compressed oops is disabled) and in our new design the list overhead will be per DatanodeStorageInfo and not per block replica. see attached design doc. for details and evaluation results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7435) PB encoding of block reports is very inefficient
[ https://issues.apache.org/jira/browse/HDFS-7435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daryn Sharp updated HDFS-7435: -- Attachment: HDFS-7435.patch Test failed because I stubbed the simulated dataset to not return reports... Fixed. [~jingzhao], please review. We'd like to add this to our internal builds to help alleviate BR processing issues. We also want to leverage this change to speed up rolling upgrades by dumping/reading the encoded BR to disk, which this change makes trivial to do. PB encoding of block reports is very inefficient Key: HDFS-7435 URL: https://issues.apache.org/jira/browse/HDFS-7435 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, namenode Affects Versions: 2.0.0-alpha, 3.0.0 Reporter: Daryn Sharp Assignee: Daryn Sharp Priority: Critical Attachments: HDFS-7435.000.patch, HDFS-7435.001.patch, HDFS-7435.002.patch, HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch Block reports are encoded as a PB repeating long. Repeating fields use an {{ArrayList}} with default capacity of 10. A block report containing tens or hundreds of thousands of longs (3 for each replica) is extremely expensive since the {{ArrayList}} must realloc many times. Also, decoding repeating fields will box the primitive longs, which must then be unboxed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
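The two costs named in the description (repeated `ArrayList` growth from the default capacity of 10, plus boxing every primitive long) can be made concrete with a small sketch. The decode helpers and the 3-longs-per-replica layout (blockId, numBytes, genstamp) follow the issue's description; the class is illustrative, not the patch:

```java
import java.util.ArrayList;
import java.util.List;

public class BlockReportEncodingDemo {
    // Boxed path: what a naive repeated-PB-long decode produces.
    static List<Long> decodeBoxed(long[] wire) {
        List<Long> out = new ArrayList<>();  // capacity 10, reallocs ~1.5x repeatedly
        for (long v : wire) out.add(v);      // boxes every primitive long
        return out;
    }

    // Primitive path: size once, copy once, no boxing, no realloc.
    static long[] decodePrimitive(long[] wire) {
        long[] out = new long[wire.length];
        System.arraycopy(wire, 0, out, 0, wire.length);
        return out;
    }

    public static void main(String[] args) {
        int replicas = 200_000;
        long[] wire = new long[replicas * 3];  // blockId, numBytes, genstamp per replica
        for (int i = 0; i < wire.length; i++) wire[i] = i;
        System.out.println(decodeBoxed(wire).size());      // 600000
        System.out.println(decodePrimitive(wire).length);  // 600000
    }
}
```

A flat `long[]` is also exactly the shape that is cheap to dump to and read from disk, which is what makes the rolling-upgrade use mentioned in the comment easy.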
[jira] [Commented] (HDFS-6576) Datanode log is generating at root directory in security mode
[ https://issues.apache.org/jira/browse/HDFS-6576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14357507#comment-14357507 ] Allen Wittenauer commented on HDFS-6576: Actually, this is fixed in trunk, so only if someone needs it in branch-2. Datanode log is generating at root directory in security mode - Key: HDFS-6576 URL: https://issues.apache.org/jira/browse/HDFS-6576 Project: Hadoop HDFS Issue Type: Bug Components: datanode, scripts Affects Versions: 2.4.0 Reporter: surendra singh lilhore Assignee: surendra singh lilhore Priority: Minor Attachments: HDFS-6576.patch, HDFS-6576_1.patch In the hadoop-env.sh script we export HADOOP_SECURE_DN_LOG_DIR, but the export statement for HADOOP_LOG_DIR on the line above it is commented out. If HADOOP_LOG_DIR is not exported in the user's environment, then the HADOOP_SECURE_DN_LOG_DIR env variable is exported with the value / and the DN logs in the root directory. {noformat}
# Where log files are stored. $HADOOP_HOME/logs by default.
#export HADOOP_LOG_DIR=${HADOOP_LOG_DIR}/$USER

# Where log files are stored in the secure data environment.
export HADOOP_SECURE_DN_LOG_DIR=${HADOOP_LOG_DIR}/${HADOOP_HDFS_USER}
{noformat} I think we should comment out this line as well. hadoop-daemon.sh already handles the case where HADOOP_SECURE_DN_LOG_DIR and HADOOP_LOG_DIR are empty. 
In hadoop-daemon.sh we assign the value of HADOOP_SECURE_DN_LOG_DIR to HADOOP_LOG_DIR, and after that we check whether HADOOP_LOG_DIR is empty; if so, HADOOP_LOG_DIR is exported with the value $HADOOP_PREFIX/logs. {noformat}
# Determine if we're starting a secure datanode, and if so, redefine appropriate variables
if [ "$command" == "datanode" ] && [ "$EUID" -eq 0 ] && [ -n "$HADOOP_SECURE_DN_USER" ]; then
  export HADOOP_PID_DIR=$HADOOP_SECURE_DN_PID_DIR
  export HADOOP_LOG_DIR=$HADOOP_SECURE_DN_LOG_DIR
  export HADOOP_IDENT_STRING=$HADOOP_SECURE_DN_USER
  starting_secure_dn="true"
fi

if [ "$HADOOP_IDENT_STRING" = "" ]; then
  export HADOOP_IDENT_STRING="$USER"
fi

# get log directory
if [ "$HADOOP_LOG_DIR" = "" ]; then
  export HADOOP_LOG_DIR="$HADOOP_PREFIX/logs"
fi
{noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7748) Separate ECN flags from the Status in the DataTransferPipelineAck
[ https://issues.apache.org/jira/browse/HDFS-7748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357538#comment-14357538 ] Vinod Kumar Vavilapalli commented on HDFS-7748: --- [~wheat9], should 2.7 stop for this? If so, any plans of making progress here? Separate ECN flags from the Status in the DataTransferPipelineAck - Key: HDFS-7748 URL: https://issues.apache.org/jira/browse/HDFS-7748 Project: Hadoop HDFS Issue Type: Bug Reporter: Haohui Mai Assignee: Haohui Mai Priority: Blocker Prior to the discussions on HDFS-7270, the old clients might fail to talk to the newer server when ECN is turned on. This jira proposes to separate the ECN flags in a separate protobuf field to make the ack compatible on both versions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7491) Add incremental blockreport latency to DN metrics
[ https://issues.apache.org/jira/browse/HDFS-7491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357653#comment-14357653 ] Hudson commented on HDFS-7491: -- FAILURE: Integrated in Hadoop-trunk-Commit #7305 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7305/]) HDFS-7491. Add incremental blockreport latency to DN metrics. Contributed by Ming Ma. (cnauroth: rev fb34f45727e63ea55377fe90241328025307d818) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/metrics/DataNodeMetrics.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPServiceActor.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeMetrics.java * hadoop-common-project/hadoop-common/src/site/markdown/Metrics.md Add incremental blockreport latency to DN metrics - Key: HDFS-7491 URL: https://issues.apache.org/jira/browse/HDFS-7491 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Reporter: Ming Ma Assignee: Ming Ma Priority: Minor Fix For: 2.7.0 Attachments: HDFS-7491-2.patch, HDFS-7491-3.patch, HDFS-7491-4.patch, HDFS-7491-branch-2.patch, HDFS-7491.patch In a busy cluster, IBR processing could be delayed due to NN FSNamesystem lock and cause NN to throw NotReplicatedYetException to DFSClient and thus increase the overall application latency. This will be taken care of when we address the NN FSNamesystem lock contention issue. It is useful if we can provide IBR latency metrics from DN's point of view. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7878) API - expose an unique file identifier
[ https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357646#comment-14357646 ] Sergey Shelukhin commented on HDFS-7878: [~cmccabe] ping? API - expose an unique file identifier -- Key: HDFS-7878 URL: https://issues.apache.org/jira/browse/HDFS-7878 Project: Hadoop HDFS Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HDFS-7878.01.patch, HDFS-7878.02.patch, HDFS-7878.patch See HDFS-487. Even though that is resolved as duplicate, the ID is actually not exposed by the JIRA it supposedly duplicates. INode ID for the file should be easy to expose; alternatively ID could be derived from block IDs, to account for appends... This is useful e.g. for cache key by file, to make sure cache stays correct when file is overwritten. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
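The cache-correctness argument in the description can be shown in a few lines: a path-keyed cache goes stale when the path is overwritten, while a cache keyed by an inode-style unique file ID simply misses. The IDs below are made up for illustration:

```java
import java.util.HashMap;
import java.util.Map;

public class FileIdCacheDemo {
    // Cache keyed by a unique file ID instead of by path.
    static final Map<Long, String> cacheById = new HashMap<>();

    public static void main(String[] args) {
        long fileIdV1 = 1001L;                // id of /data/part-0 before overwrite
        cacheById.put(fileIdV1, "contents-v1");

        long fileIdV2 = 1002L;                // same path after overwrite: new inode, new id

        // A path-keyed cache would serve the stale contents-v1 here;
        // the id-keyed cache correctly misses for the rewritten file.
        System.out.println(cacheById.containsKey(fileIdV2));  // false
    }
}
```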
[jira] [Commented] (HDFS-5796) The file system browser in the namenode UI requires SPNEGO.
[ https://issues.apache.org/jira/browse/HDFS-5796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14357561#comment-14357561 ] Ryan Sasson commented on HDFS-5796: --- [~asuresh], your ticket brings up an important point about the last patch. The way hadoop authentication filters consume signature secrets was revamped in 2.6+ with support for reading secrets from zookeeper. Because of this the last patch would not be fully compatible, as it does not consume signature secrets the same way that hadoop authentication filters do. The file system browser in the namenode UI requires SPNEGO. --- Key: HDFS-5796 URL: https://issues.apache.org/jira/browse/HDFS-5796 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.5.0 Reporter: Kihwal Lee Assignee: Ryan Sasson Priority: Blocker Attachments: HDFS-5796.1.patch, HDFS-5796.1.patch, HDFS-5796.2.patch, HDFS-5796.3.patch, HDFS-5796.3.patch, HDFS-5796.4.patch After HDFS-5382, the browser makes webhdfs REST calls directly, requiring SPNEGO to work between user's browser and namenode. This won't work if the cluster's security infrastructure is isolated from the regular network. Moreover, SPNEGO is not supposed to be required for user-facing web pages. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6576) Datanode log is generating at root directory in security mode
[ https://issues.apache.org/jira/browse/HDFS-6576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-6576: --- Component/s: scripts Datanode log is generating at root directory in security mode - Key: HDFS-6576 URL: https://issues.apache.org/jira/browse/HDFS-6576 Project: Hadoop HDFS Issue Type: Bug Components: datanode, scripts Affects Versions: 2.4.0 Reporter: surendra singh lilhore Assignee: surendra singh lilhore Priority: Minor Attachments: HDFS-6576.patch, HDFS-6576_1.patch In the hadoop-env.sh script we export HADOOP_SECURE_DN_LOG_DIR, but the export statement for HADOOP_LOG_DIR on the line above it is commented out. If HADOOP_LOG_DIR is not exported in the user's environment, then the HADOOP_SECURE_DN_LOG_DIR env variable is exported with the value / and the DN logs in the root directory. {noformat}
# Where log files are stored. $HADOOP_HOME/logs by default.
#export HADOOP_LOG_DIR=${HADOOP_LOG_DIR}/$USER

# Where log files are stored in the secure data environment.
export HADOOP_SECURE_DN_LOG_DIR=${HADOOP_LOG_DIR}/${HADOOP_HDFS_USER}
{noformat} I think we should comment out this line as well. hadoop-daemon.sh already handles the case where HADOOP_SECURE_DN_LOG_DIR and HADOOP_LOG_DIR are empty. 
In hadoop-daemon.sh we assign the value of HADOOP_SECURE_DN_LOG_DIR to HADOOP_LOG_DIR, and after that we check whether HADOOP_LOG_DIR is empty; if so, HADOOP_LOG_DIR is exported with the value $HADOOP_PREFIX/logs. {noformat}
# Determine if we're starting a secure datanode, and if so, redefine appropriate variables
if [ "$command" == "datanode" ] && [ "$EUID" -eq 0 ] && [ -n "$HADOOP_SECURE_DN_USER" ]; then
  export HADOOP_PID_DIR=$HADOOP_SECURE_DN_PID_DIR
  export HADOOP_LOG_DIR=$HADOOP_SECURE_DN_LOG_DIR
  export HADOOP_IDENT_STRING=$HADOOP_SECURE_DN_USER
  starting_secure_dn="true"
fi

if [ "$HADOOP_IDENT_STRING" = "" ]; then
  export HADOOP_IDENT_STRING="$USER"
fi

# get log directory
if [ "$HADOOP_LOG_DIR" = "" ]; then
  export HADOOP_LOG_DIR="$HADOOP_PREFIX/logs"
fi
{noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7915) The DataNode can sometimes allocate a ShortCircuitShm slot and fail to tell the DFSClient about it because of a network error
[ https://issues.apache.org/jira/browse/HDFS-7915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-7915: --- Attachment: HDFS-7915.002.patch I added the log message. The test should now fail when DataXceiver is not modified. Previously I was injecting the failure in a slightly wrong place. The DataNode can sometimes allocate a ShortCircuitShm slot and fail to tell the DFSClient about it because of a network error - Key: HDFS-7915 URL: https://issues.apache.org/jira/browse/HDFS-7915 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.7.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-7915.001.patch, HDFS-7915.002.patch The DataNode can sometimes allocate a ShortCircuitShm slot and fail to tell the DFSClient about it because of a network error. In {{DataXceiver#requestShortCircuitFds}}, the DataNode can succeed at the first part (mark the slot as used) and fail at the second part (tell the DFSClient what it did). The try block for unregistering the slot only covers a failure in the first part, not the second part. In this way, a divergence can form between the views of which slots are allocated on DFSClient and on server. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
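The shape of the bug described above: the slot is marked used (part one), then the reply to the DFSClient can fail on a network error (part two), and the original try block only covered part one, leaking the slot. Widening the rollback to cover the reply keeps the client and server views consistent. `Shm`, `Replier`, and the method below are stand-ins for illustration, not the real DataXceiver API:

```java
public class SlotLeakDemo {
    interface Shm {
        int allocateSlot();
        void unregisterSlot(int slot);
    }

    interface Replier {
        void replyTo(int slot);  // network write: may throw
    }

    // Fixed flow: any failure after allocation releases the slot.
    static int requestShortCircuitFds(Shm shm, Replier client) {
        int slot = shm.allocateSlot();
        try {
            client.replyTo(slot);      // the previously uncovered second step
            return slot;
        } catch (RuntimeException e) {
            shm.unregisterSlot(slot);  // roll back so the views don't diverge
            throw e;
        }
    }
}
```

Without the rollback, the server believes the slot is in use while the client never learned about it, which is exactly the divergence the issue describes.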
[jira] [Updated] (HDFS-7491) Add incremental blockreport latency to DN metrics
[ https://issues.apache.org/jira/browse/HDFS-7491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-7491: Resolution: Fixed Fix Version/s: 2.7.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) +1 for the patch. I committed this to trunk, branch-2 and branch-2.7. Ming, thank you for the patch. Xiaoyu, thank you for assistance with the code review. The test failures in the last Jenkins run were unrelated. {{TestFileTruncate}} is a known intermittent failure tracked elsewhere. The other failures don't repro when I rerun them. Add incremental blockreport latency to DN metrics - Key: HDFS-7491 URL: https://issues.apache.org/jira/browse/HDFS-7491 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Reporter: Ming Ma Assignee: Ming Ma Priority: Minor Fix For: 2.7.0 Attachments: HDFS-7491-2.patch, HDFS-7491-3.patch, HDFS-7491-4.patch, HDFS-7491-branch-2.patch, HDFS-7491.patch In a busy cluster, IBR processing could be delayed due to NN FSNamesystem lock and cause NN to throw NotReplicatedYetException to DFSClient and thus increase the overall application latency. This will be taken care of when we address the NN FSNamesystem lock contention issue. It is useful if we can provide IBR latency metrics from DN's point of view. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6658) Namenode memory optimization - Block replicas list
[ https://issues.apache.org/jira/browse/HDFS-6658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357635#comment-14357635 ] Hadoop QA commented on HDFS-6658: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12704013/New%20primative%20indexes.jpg against trunk revision fb34f45. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9841//console This message is automatically generated. Namenode memory optimization - Block replicas list --- Key: HDFS-6658 URL: https://issues.apache.org/jira/browse/HDFS-6658 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.4.1 Reporter: Amir Langer Assignee: Daryn Sharp Attachments: BlockListOptimizationComparison.xlsx, BlocksMap redesign.pdf, HDFS-6658.patch, HDFS-6658.patch, HDFS-6658.patch, Namenode Memory Optimizations - Block replicas list.docx, New primative indexes.jpg, Old triplets.jpg Part of the memory consumed by every BlockInfo object in the Namenode is a linked list of block references for every DatanodeStorageInfo (called triplets). We propose to change the way we store the list in memory. Using primitive integer indexes instead of object references will reduce the memory needed for every block replica (when compressed oops is disabled) and in our new design the list overhead will be per DatanodeStorageInfo and not per block replica. see attached design doc. for details and evaluation results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7435) PB encoding of block reports is very inefficient
[ https://issues.apache.org/jira/browse/HDFS-7435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14357652#comment-14357652 ] Daryn Sharp commented on HDFS-7435: --- Tests are passing for me. The grumbling about edit log corruption should have nothing to do with this patch. PB encoding of block reports is very inefficient Key: HDFS-7435 URL: https://issues.apache.org/jira/browse/HDFS-7435 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, namenode Affects Versions: 2.0.0-alpha, 3.0.0 Reporter: Daryn Sharp Assignee: Daryn Sharp Priority: Critical Attachments: HDFS-7435.000.patch, HDFS-7435.001.patch, HDFS-7435.002.patch, HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch Block reports are encoded as a PB repeating long. Repeating fields use an {{ArrayList}} with default capacity of 10. A block report containing tens or hundreds of thousands of longs (3 for each replica) is extremely expensive since the {{ArrayList}} must realloc many times. Also, decoding repeating fields will box the primitive longs, which must then be unboxed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7491) Add incremental blockreport latency to DN metrics
[ https://issues.apache.org/jira/browse/HDFS-7491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14357503#comment-14357503 ] Hadoop QA commented on HDFS-7491: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12703926/HDFS-7491-4.patch against trunk revision 30c428a.
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs:
org.apache.hadoop.ipc.TestRPCWaitForProxy
org.apache.hadoop.hdfs.qjournal.TestSecureNNWithQJM
org.apache.hadoop.hdfs.server.namenode.TestFileTruncate
The following test timeouts occurred in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs:
org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager
Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/9837//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9837//console
This message is automatically generated. 
Add incremental blockreport latency to DN metrics - Key: HDFS-7491 URL: https://issues.apache.org/jira/browse/HDFS-7491 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Reporter: Ming Ma Assignee: Ming Ma Priority: Minor Attachments: HDFS-7491-2.patch, HDFS-7491-3.patch, HDFS-7491-4.patch, HDFS-7491-branch-2.patch, HDFS-7491.patch In a busy cluster, IBR processing could be delayed due to the NN FSNamesystem lock, causing the NN to throw NotReplicatedYetException to the DFSClient and thus increasing overall application latency. This will be taken care of when we address the NN FSNamesystem lock contention issue. It would be useful if we could provide IBR latency metrics from the DN's point of view. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6964) NN fails to fix under replication leading to data loss
[ https://issues.apache.org/jira/browse/HDFS-6964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated HDFS-6964: -- Target Version/s: (was: 2.7.0) Moving this out of 2.7 as I see no activity here. Please revert if you disagree. NN fails to fix under replication leading to data loss -- Key: HDFS-6964 URL: https://issues.apache.org/jira/browse/HDFS-6964 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.0.0-alpha, 3.0.0 Reporter: Daryn Sharp Priority: Blocker We've encountered lost blocks due to node failure even when there is ample time to fix the under-replication. 2 nodes were lost. The 3rd node with the last remaining replicas averaged 1 block copy per heartbeat (3s) until ~7h later, when that node was lost, resulting in over 50 lost blocks. When the node was restarted and sent its BR, the NN immediately began fixing the replication. In another data loss event, over 150 blocks were lost due to node failure, but the timing of the node loss is not known, so there may have been inadequate time to fix the under-replication, unlike the first case. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-2745) unclear to users which command to use to access the filesystem
[ https://issues.apache.org/jira/browse/HDFS-2745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357599#comment-14357599 ] Allen Wittenauer commented on HDFS-2745: There have been a lot of changes since this JIRA was filed, including a lot of finger memory around 'hdfs dfs'... unclear to users which command to use to access the filesystem -- Key: HDFS-2745 URL: https://issues.apache.org/jira/browse/HDFS-2745 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.23.0, 1.2.0, 2.0.2-alpha Reporter: Thomas Graves Assignee: Andrew Wang Priority: Critical Attachments: hdfs-2745-1.patch, hdfs-2745-2.patch It's unclear to users which command to use to access the filesystem. We need some background, and then we can fix accordingly. We have 3 choices:
hadoop dfs - says it's deprecated and to use hdfs. If I run hdfs, the usage doesn't list any options like -ls, although there is an hdfs dfs command.
hdfs dfs - not in the usage of hdfs. If we recommend it when running hadoop dfs, it should at least be in the usage.
hadoop fs - seems like the one to use; it appears generic for any filesystem.
Any input on what the recommended way to do this is? Based on that we can fix up the other issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6658) Namenode memory optimization - Block replicas list
[ https://issues.apache.org/jira/browse/HDFS-6658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daryn Sharp updated HDFS-6658: -- Attachment: New primative indexes.jpg Old triplets.jpg Excuse my bad whiteboard drawing skills. These pictures attempt to illustrate the triplets vs the data structures. It shows a 3-block file with repl factor 2 that is stored on 2 nodes. I started trying to diagram a 3-repl factor picture with proper block placement on multiple nodes but it was spaghetti for the triplets. My whiteboard isn't that big. Everything is a reference in the triplets pic. The new pic is based on primitive indexes. The design I recently posted goes into more detail on the indexing. Namenode memory optimization - Block replicas list --- Key: HDFS-6658 URL: https://issues.apache.org/jira/browse/HDFS-6658 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.4.1 Reporter: Amir Langer Assignee: Daryn Sharp Attachments: BlockListOptimizationComparison.xlsx, BlocksMap redesign.pdf, HDFS-6658.patch, HDFS-6658.patch, HDFS-6658.patch, Namenode Memory Optimizations - Block replicas list.docx, New primative indexes.jpg, Old triplets.jpg Part of the memory consumed by every BlockInfo object in the Namenode is a linked list of block references for every DatanodeStorageInfo (called triplets). We propose to change the way we store the list in memory. Using primitive integer indexes instead of object references will reduce the memory needed for every block replica (when compressed oops is disabled) and in our new design the list overhead will be per DatanodeStorageInfo and not per block replica. see attached design doc. for details and evaluation results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6658) Namenode memory optimization - Block replicas list
[ https://issues.apache.org/jira/browse/HDFS-6658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357661#comment-14357661 ] Hadoop QA commented on HDFS-6658: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12704013/New%20primative%20indexes.jpg against trunk revision fb34f45. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9842//console This message is automatically generated. Namenode memory optimization - Block replicas list --- Key: HDFS-6658 URL: https://issues.apache.org/jira/browse/HDFS-6658 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.4.1 Reporter: Amir Langer Assignee: Daryn Sharp Attachments: BlockListOptimizationComparison.xlsx, BlocksMap redesign.pdf, HDFS-6658.patch, HDFS-6658.patch, HDFS-6658.patch, Namenode Memory Optimizations - Block replicas list.docx, New primative indexes.jpg, Old triplets.jpg Part of the memory consumed by every BlockInfo object in the Namenode is a linked list of block references for every DatanodeStorageInfo (called triplets). We propose to change the way we store the list in memory. Using primitive integer indexes instead of object references will reduce the memory needed for every block replica (when compressed oops is disabled) and in our new design the list overhead will be per DatanodeStorageInfo and not per block replica. see attached design doc. for details and evaluation results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7435) PB encoding of block reports is very inefficient
[ https://issues.apache.org/jira/browse/HDFS-7435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357666#comment-14357666 ] Jing Zhao commented on HDFS-7435: - Thanks for updating the patch, Daryn. The latest patch looks good to me, and I think the new capabilities field in NamespaceInfo is a good idea. Some minor comments: # Need to fix the javadoc for {{decodeBuffer}}, {{encode}}, and {{getBlocksBuffer}} in BlockListAsLongs. # By "older" here I guess you mean the current version compared with a future version. Maybe we should make it more clear in the comment.
{code}
// reserve upper bits for future use. decoding masks off these bits to
// allow compatibility for older namenodes
private static long NUM_BYTES_MASK = (-1L) >>> (64 - 48);
private static long REPLICA_STATE_MASK = (-1L) >>> (64 - 4);
{code}
# Looks like the following code can be simplified, and we actually do not need isSupported?
{code}
+    Capability(boolean isSupported) {
+      int bits = ordinal() - 1;
+      mask = (bits < 0) ? 0 : (1L << bits);
+      if (isSupported) {
+        CAPABILITIES_SUPPORTED |= mask;
+      }
+    }
{code}
PB encoding of block reports is very inefficient Key: HDFS-7435 URL: https://issues.apache.org/jira/browse/HDFS-7435 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, namenode Affects Versions: 2.0.0-alpha, 3.0.0 Reporter: Daryn Sharp Assignee: Daryn Sharp Priority: Critical Attachments: HDFS-7435.000.patch, HDFS-7435.001.patch, HDFS-7435.002.patch, HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch Block reports are encoded as a PB repeating long. Repeating fields use an {{ArrayList}} with default capacity of 10. A block report containing tens or hundreds of thousands of longs (3 for each replica) is extremely expensive since the {{ArrayList}} must realloc many times. Also, decoding repeating fields will box the primitive longs which must then be unboxed. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
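The reserved-bits discussion above can be sketched in isolation. A minimal illustration, assuming a layout of 48 value bits with the upper 16 reserved (illustrative, not the exact BlockListAsLongs wire format): decoders mask off bits they do not understand, so a future encoder can use the reserved bits without breaking older namenodes.

```java
// Sketch of the reserved-upper-bits compatibility idea. Field layout and
// names are illustrative assumptions, not the actual Hadoop wire format.
public class ReservedBitsSketch {
    // low 48 bits carry the value; upper 16 bits are reserved
    static final long NUM_BYTES_MASK = (-1L) >>> (64 - 48);

    // a future version could stash flags in the reserved upper 16 bits
    static long encodeNumBytes(long numBytes, long futureFlags) {
        return (futureFlags << 48) | (numBytes & NUM_BYTES_MASK);
    }

    // an "older" decoder simply masks off the bits it does not understand
    static long decodeNumBytes(long encoded) {
        return encoded & NUM_BYTES_MASK;
    }

    public static void main(String[] args) {
        long encoded = encodeNumBytes(123456789L, 0x3L);
        System.out.println(decodeNumBytes(encoded)); // 123456789
    }
}
```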
[jira] [Updated] (HDFS-7913) HADOOP_HDFS_LOG_DIR should be HDFS_LOG_DIR in deprecations
[ https://issues.apache.org/jira/browse/HDFS-7913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-7913: --- Description: The wrong variable is deprecated in hdfs-config.sh. It should be HDFS_LOG_DIR, not HADOOP_HDFS_LOG_DIR. This is breaking backward compatibility. It might be worthwhile to doublecheck the other dep's to make sure they are correct as well. Also, release notes for the deprecation jira should be updated to reflect this change. was: The wrong variable is deprecated in hdfs-env.sh. It should be HDFS_LOG_DIR, not HADOOP_HDFS_LOG_DIR. This is breaking backward compatibility. It might be worthwhile to doublecheck the other dep's to make sure they are correct as well. Also, release notes for the deprecation jira should be updated to reflect this change. HADOOP_HDFS_LOG_DIR should be HDFS_LOG_DIR in deprecations -- Key: HDFS-7913 URL: https://issues.apache.org/jira/browse/HDFS-7913 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0 Reporter: Allen Wittenauer Assignee: Brahma Reddy Battula Priority: Critical The wrong variable is deprecated in hdfs-config.sh. It should be HDFS_LOG_DIR, not HADOOP_HDFS_LOG_DIR. This is breaking backward compatibility. It might be worthwhile to doublecheck the other dep's to make sure they are correct as well. Also, release notes for the deprecation jira should be updated to reflect this change. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7913) HADOOP_HDFS_LOG_DIR should be HDFS_LOG_DIR in deprecations
[ https://issues.apache.org/jira/browse/HDFS-7913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357383#comment-14357383 ] Brahma Reddy Battula commented on HDFS-7913: Here I got confused. The following is from HADOOP-11460; I did not see HDFS_LOG_DIR in it:
{code}
# ...
# this should get deprecated at some point.
-  HADOOP_LOG_DIR=${HADOOP_HDFS_LOG_DIR:-$HADOOP_LOG_DIR}
-  HADOOP_HDFS_LOG_DIR=${HADOOP_LOG_DIR}
-
-  HADOOP_LOGFILE=${HADOOP_HDFS_LOGFILE:-$HADOOP_LOGFILE}
-  HADOOP_HDFS_LOGFILE=${HADOOP_LOGFILE}
-
{code}
HADOOP_HDFS_LOG_DIR should be HDFS_LOG_DIR in deprecations -- Key: HDFS-7913 URL: https://issues.apache.org/jira/browse/HDFS-7913 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0 Reporter: Allen Wittenauer Assignee: Brahma Reddy Battula Priority: Critical Attachments: HDFS-7913.patch The wrong variable is deprecated in hdfs-config.sh. It should be HDFS_LOG_DIR, not HADOOP_HDFS_LOG_DIR. This is breaking backward compatibility. It might be worthwhile to doublecheck the other dep's to make sure they are correct as well. Also, release notes for the deprecation jira should be updated to reflect this change. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7748) Separate ECN flags from the Status in the DataTransferPipelineAck
[ https://issues.apache.org/jira/browse/HDFS-7748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357626#comment-14357626 ] Haohui Mai commented on HDFS-7748: -- Will take care of it in a day or two. Separate ECN flags from the Status in the DataTransferPipelineAck - Key: HDFS-7748 URL: https://issues.apache.org/jira/browse/HDFS-7748 Project: Hadoop HDFS Issue Type: Bug Reporter: Haohui Mai Assignee: Haohui Mai Priority: Blocker Prior to the discussions on HDFS-7270, the old clients might fail to talk to the newer server when ECN is turned on. This jira proposes to separate the ECN flags in a separate protobuf field to make the ack compatible on both versions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7715) Implement the Hitchhiker erasure coding algorithm
[ https://issues.apache.org/jira/browse/HDFS-7715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rashmi Vinayak updated HDFS-7715: - Description: [Hitchhiker | http://www.eecs.berkeley.edu/~nihar/publications/Hitchhiker_SIGCOMM14.pdf] is a new erasure coding algorithm developed as a research project at UC Berkeley. It has been shown to reduce network traffic and disk I/O by 25%-45% during data reconstruction. This JIRA aims to introduce Hitchhiker to the HDFS-EC framework, as one of the pluggable codec algorithms. The existing implementation is based on HDFS-RAID. was: [Hitchhiker | http://www.eecs.berkeley.edu/~nihar/publications/Hitchhiker_SIGCOMM14.pdf] is a new erasure coding algorithm developed as a research project at UC Berkeley. It has been shown to reduce network traffic and disk I/O by 25% and 45% during data reconstruction. This JIRA aims to introduce Hitchhiker to the HDFS-EC framework, as one of the pluggable codec algorithms. The existing implementation is based on HDFS-RAID. Implement the Hitchhiker erasure coding algorithm - Key: HDFS-7715 URL: https://issues.apache.org/jira/browse/HDFS-7715 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Zhe Zhang Assignee: jack liuquan [Hitchhiker | http://www.eecs.berkeley.edu/~nihar/publications/Hitchhiker_SIGCOMM14.pdf] is a new erasure coding algorithm developed as a research project at UC Berkeley. It has been shown to reduce network traffic and disk I/O by 25%-45% during data reconstruction. This JIRA aims to introduce Hitchhiker to the HDFS-EC framework, as one of the pluggable codec algorithms. The existing implementation is based on HDFS-RAID. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6658) Namenode memory optimization - Block replicas list
[ https://issues.apache.org/jira/browse/HDFS-6658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357683#comment-14357683 ] Daryn Sharp commented on HDFS-6658: --- I've added many preconditions, not asserts, to avoid inconsistencies so it's already self-checking in many cases (hopefully didn't ruin performance). Since a storage iterator is based on a block, when removing a storage it will cross check that the block id in the storage at the given offset actually matches the iterator's block id. Same thing when replacing a value. I was also thinking about adding a consistency checking thread, at least while tests are running. Namenode memory optimization - Block replicas list --- Key: HDFS-6658 URL: https://issues.apache.org/jira/browse/HDFS-6658 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.4.1 Reporter: Amir Langer Assignee: Daryn Sharp Attachments: BlockListOptimizationComparison.xlsx, BlocksMap redesign.pdf, HDFS-6658.patch, HDFS-6658.patch, HDFS-6658.patch, Namenode Memory Optimizations - Block replicas list.docx, New primative indexes.jpg, Old triplets.jpg Part of the memory consumed by every BlockInfo object in the Namenode is a linked list of block references for every DatanodeStorageInfo (called triplets). We propose to change the way we store the list in memory. Using primitive integer indexes instead of object references will reduce the memory needed for every block replica (when compressed oops is disabled) and in our new design the list overhead will be per DatanodeStorageInfo and not per block replica. see attached design doc. for details and evaluation results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7915) The DataNode can sometimes allocate a ShortCircuitShm slot and fail to tell the DFSClient about it because of a network error
[ https://issues.apache.org/jira/browse/HDFS-7915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357693#comment-14357693 ] Chris Nauroth commented on HDFS-7915: - Thanks for the patch, Colin. The change looks good. In the test, is the {{Visitor}} indirection necessary, or would it be easier to add 2 {{VisibleForTesting}} getters that return the segments and slots directly to the test code? The DataNode can sometimes allocate a ShortCircuitShm slot and fail to tell the DFSClient about it because of a network error - Key: HDFS-7915 URL: https://issues.apache.org/jira/browse/HDFS-7915 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.7.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-7915.001.patch, HDFS-7915.002.patch The DataNode can sometimes allocate a ShortCircuitShm slot and fail to tell the DFSClient about it because of a network error. In {{DataXceiver#requestShortCircuitFds}}, the DataNode can succeed at the first part (mark the slot as used) and fail at the second part (tell the DFSClient what it did). The try block for unregistering the slot only covers a failure in the first part, not the second part. In this way, a divergence can form between the views of which slots are allocated on DFSClient and on server. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7435) PB encoding of block reports is very inefficient
[ https://issues.apache.org/jira/browse/HDFS-7435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357484#comment-14357484 ] Hadoop QA commented on HDFS-7435: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12703930/HDFS-7435.patch against trunk revision 30c428a. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 12 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover The following test timeouts occurred in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager org.apache.hadoop.hdfs.server.namenode.ha.TestStandbyCheckpoints Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/9838//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9838//console This message is automatically generated. 
PB encoding of block reports is very inefficient Key: HDFS-7435 URL: https://issues.apache.org/jira/browse/HDFS-7435 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, namenode Affects Versions: 2.0.0-alpha, 3.0.0 Reporter: Daryn Sharp Assignee: Daryn Sharp Priority: Critical Attachments: HDFS-7435.000.patch, HDFS-7435.001.patch, HDFS-7435.002.patch, HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch Block reports are encoded as a PB repeating long. Repeating fields use an {{ArrayList}} with default capacity of 10. A block report containing tens or hundreds of thousands of longs (3 for each replica) is extremely expensive since the {{ArrayList}} must realloc many times. Also, decoding repeating fields will box the primitive longs which must then be unboxed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7816) Unable to open webhdfs paths with +
[ https://issues.apache.org/jira/browse/HDFS-7816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357491#comment-14357491 ] Vinod Kumar Vavilapalli commented on HDFS-7816: --- Bump. [~daryn] / [~kihwal] / [~wheat9], can we get this going for 2.7? Tx. Unable to open webhdfs paths with + - Key: HDFS-7816 URL: https://issues.apache.org/jira/browse/HDFS-7816 Project: Hadoop HDFS Issue Type: Bug Components: webhdfs Affects Versions: 2.7.0 Reporter: Jason Lowe Assignee: Kihwal Lee Priority: Blocker Attachments: HDFS-7816.patch, HDFS-7816.patch webhdfs requests to open files with % characters in the filename fail because the filename is not being decoded properly. For example: $ hadoop fs -cat 'webhdfs://nn/user/somebody/abc%def' cat: File does not exist: /user/somebody/abc%25def -- This message was sent by Atlassian JIRA (v6.3.4#6332)
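The `abc%25def` failure above comes from decoding the path with the wrong semantics. A small sketch of the pitfall, assuming nothing about the actual webhdfs fix (the method names are illustrative): `java.net.URLDecoder` implements form decoding, where '+' means space, while plain percent-decoding, as `URI.getPath()` performs, leaves '+' intact and turns `%25` back into a literal `%`.

```java
import java.net.URI;
import java.net.URLDecoder;

// Illustration of percent-decoding vs form decoding for URL paths.
// Not the actual webhdfs patch; class and method names are hypothetical.
public class PathDecodeSketch {
    // correct for paths: percent-decoding only ('+' passes through)
    static String percentDecodePath(String rawPath) throws Exception {
        return new URI("http://nn" + rawPath).getPath();
    }

    // wrong for paths: form decoding also rewrites '+' as ' '
    static String formDecode(String rawPath) throws Exception {
        return URLDecoder.decode(rawPath, "UTF-8");
    }

    public static void main(String[] args) throws Exception {
        // a file literally named abc%def is sent on the wire as abc%25def
        System.out.println(percentDecodePath("/user/somebody/abc%25def")); // /user/somebody/abc%def
        System.out.println(formDecode("/user/somebody/a+b"));              // /user/somebody/a b
    }
}
```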
[jira] [Commented] (HDFS-7913) HADOOP_HDFS_LOG_DIR should be HDFS_LOG_DIR in deprecations
[ https://issues.apache.org/jira/browse/HDFS-7913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357318#comment-14357318 ] Allen Wittenauer commented on HDFS-7913: Oh! Wait a sec. Let me update the description. Now I see the confusion. I meant hdfs-config.sh . haha! HADOOP_HDFS_LOG_DIR should be HDFS_LOG_DIR in deprecations -- Key: HDFS-7913 URL: https://issues.apache.org/jira/browse/HDFS-7913 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0 Reporter: Allen Wittenauer Assignee: Brahma Reddy Battula Priority: Critical The wrong variable is deprecated in hdfs-env.sh. It should be HDFS_LOG_DIR, not HADOOP_HDFS_LOG_DIR. This is breaking backward compatibility. It might be worthwhile to doublecheck the other dep's to make sure they are correct as well. Also, release notes for the deprecation jira should be updated to reflect this change. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7820) Client Write fails after rolling upgrade rollback with block_id already exist in finalized state
[ https://issues.apache.org/jira/browse/HDFS-7820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357386#comment-14357386 ] Arpit Agarwal commented on HDFS-7820: - Increasing the epoch bits to 5 or 6 will permit successive rollbacks in a short time interval while avoiding collisions. This addresses all practical scenarios. 2^58 blocks is orders of magnitude more than you would hit in a real cluster so I am not concerned about ID space reduction. Client Write fails after rolling upgrade rollback with block_id already exist in finalized state Key: HDFS-7820 URL: https://issues.apache.org/jira/browse/HDFS-7820 Project: Hadoop HDFS Issue Type: Bug Reporter: J.Andreina Assignee: J.Andreina Attachments: HDFS-7820.1.patch Steps to Reproduce: === Step 1: Prepare rolling upgrade using hdfs dfsadmin -rollingUpgrade prepare Step 2: Shutdown SNN and NN Step 3: Start NN with the hdfs namenode -rollingUpgrade started option. Step 4: Executed hdfs dfsadmin -shutdownDatanode DATANODE_HOST:IPC_PORT upgrade and restarted Datanode Step 5: Write 3 files to hdfs ( block id assigned are : blk_1073741831_1007, blk_1073741832_1008,blk_1073741833_1009 ) Step 6: Shutdown both NN and DN Step 7: Start NNs with the hdfs namenode -rollingUpgrade rollback option. Start DNs with the -rollback option. Step 8: Write 2 files to hdfs. Issue: === Client write failed with below exception {noformat} 2015-02-23 16:00:12,896 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving BP-1837556285-XXX-1423130389269:blk_1073741832_1008 src: /XXX:48545 dest: /XXX:50010 2015-02-23 16:00:12,897 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: opWriteBlock BP-1837556285-XXX-1423130389269:blk_1073741832_1008 received exception org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block BP-1837556285-XXX-1423130389269:blk_1073741832_1008 already exists in state FINALIZED and thus cannot be created. {noformat} Observations: = 1. 
On the Namenode side, block invalidation has been sent for only 2 blocks.
{noformat}
15/02/23 14:59:56 INFO BlockStateChange: BLOCK* InvalidateBlocks: add blk_1073741833_1009 to XXX:50010
15/02/23 14:59:56 INFO BlockStateChange: BLOCK* InvalidateBlocks: add blk_1073741831_1007 to XXX:50010
{noformat}
2. The fsck report does not show information on blk_1073741832_1008.
{noformat}
FSCK started by Rex (auth:SIMPLE) from /XXX for path / at Mon Feb 23 16:17:57 CST 2015
/File1: Under replicated BP-1837556285-XXX-1423130389269:blk_1073741825_1001. Target Replicas is 3 but found 1 replica(s).
/File11: Under replicated BP-1837556285-XXX-1423130389269:blk_1073741827_1003. Target Replicas is 3 but found 1 replica(s).
/File2: Under replicated BP-1837556285-XXX-1423130389269:blk_1073741826_1002. Target Replicas is 3 but found 1 replica(s).
/AfterRollback_2: Under replicated BP-1837556285-XXX-1423130389269:blk_1073741831_1007. Target Replicas is 3 but found 1 replica(s).
/Test1: Under replicated BP-1837556285-XXX-1423130389269:blk_1073741828_1004. Target Replicas is 3 but found 1 replica(s).
Status: HEALTHY
 Total size: 31620 B
 Total dirs: 7
 Total files: 6
 Total symlinks: 0
 Total blocks (validated): 5 (avg. block size 6324 B)
 Minimally replicated blocks: 5 (100.0 %)
 Over-replicated blocks: 0 (0.0 %)
 Under-replicated blocks: 5 (100.0 %)
 Mis-replicated blocks: 0 (0.0 %)
 Default replication factor: 3
 Average block replication: 1.0
 Corrupt blocks: 0
 Missing replicas: 10 (66.64 %)
 Number of data-nodes: 1
 Number of racks: 1
FSCK ended at Mon Feb 23 16:17:57 CST 2015 in 3 milliseconds
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
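The ID-space arithmetic in Arpit's comment above can be checked back-of-envelope, assuming the epoch bits are carved out of the 63 usable (non-sign) bits of a 64-bit block ID: reserving 5 bits still leaves 2^58 IDs.

```java
// Illustrative arithmetic only; the bit layout is an assumption, not the
// actual HDFS block ID scheme.
public class BlockIdSpaceSketch {
    // block IDs are non-negative longs, so 63 bits are usable
    static long usableIds(int epochBits) {
        return 1L << (63 - epochBits);
    }

    public static void main(String[] args) {
        System.out.println(usableIds(5)); // 288230376151711744 == 2^58
        System.out.println(usableIds(6)); // 144115188075855872 == 2^57
    }
}
```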
[jira] [Commented] (HDFS-7913) HADOOP_HDFS_LOG_DIR should be HDFS_LOG_DIR in deprecations
[ https://issues.apache.org/jira/browse/HDFS-7913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357317#comment-14357317 ] Allen Wittenauer commented on HDFS-7913: We basically don't ship one because all the content is there in hadoop-env.sh. It's there for future considerations and if someone wants to create one and use it. HADOOP_HDFS_LOG_DIR should be HDFS_LOG_DIR in deprecations -- Key: HDFS-7913 URL: https://issues.apache.org/jira/browse/HDFS-7913 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0 Reporter: Allen Wittenauer Assignee: Brahma Reddy Battula Priority: Critical The wrong variable is deprecated in hdfs-env.sh. It should be HDFS_LOG_DIR, not HADOOP_HDFS_LOG_DIR. This is breaking backward compatibility. It might be worthwhile to doublecheck the other dep's to make sure they are correct as well. Also, release notes for the deprecation jira should be updated to reflect this change. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7913) HADOOP_HDFS_LOG_DIR should be HDFS_LOG_DIR in deprecations
[ https://issues.apache.org/jira/browse/HDFS-7913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357356#comment-14357356 ] Allen Wittenauer commented on HDFS-7913: That's the problem: HADOOP_HDFS_LOG_DIR was not supposed to be deprecated because it never existed in previous versions of hadoop. HDFS_LOG_DIR was the one that did exist, and it's supposed to be replaced with HADOOP_LOG_DIR in the new code. HADOOP_HDFS_LOG_DIR should be HDFS_LOG_DIR in deprecations -- Key: HDFS-7913 URL: https://issues.apache.org/jira/browse/HDFS-7913 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0 Reporter: Allen Wittenauer Assignee: Brahma Reddy Battula Priority: Critical Attachments: HDFS-7913.patch The wrong variable is deprecated in hdfs-config.sh. It should be HDFS_LOG_DIR, not HADOOP_HDFS_LOG_DIR. This is breaking backward compatibility. It might be worthwhile to doublecheck the other dep's to make sure they are correct as well. Also, release notes for the deprecation jira should be updated to reflect this change. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HDFS-7913) HADOOP_HDFS_LOG_DIR should be HDFS_LOG_DIR in deprecations
[ https://issues.apache.org/jira/browse/HDFS-7913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer reassigned HDFS-7913: -- Assignee: Allen Wittenauer (was: Brahma Reddy Battula) HADOOP_HDFS_LOG_DIR should be HDFS_LOG_DIR in deprecations -- Key: HDFS-7913 URL: https://issues.apache.org/jira/browse/HDFS-7913 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0 Reporter: Allen Wittenauer Assignee: Allen Wittenauer Priority: Critical Attachments: HDFS-7913.patch The wrong variable is deprecated in hdfs-config.sh. It should be HDFS_LOG_DIR, not HADOOP_HDFS_LOG_DIR. This is breaking backward compatibility. It might be worthwhile to doublecheck the other dep's to make sure they are correct as well. Also, release notes for the deprecation jira should be updated to reflect this change. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7873) OIV webhdfs premature close channel issue
[ https://issues.apache.org/jira/browse/HDFS-7873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357471#comment-14357471 ] Haohui Mai commented on HDFS-7873: -- Can you please format the code in compliance with the Hadoop coding style: http://wiki.apache.org/hadoop/CodeReviewChecklist I also don't understand how the unit test can effectively cover this issue. Can you explain? OIV webhdfs premature close channel issue - Key: HDFS-7873 URL: https://issues.apache.org/jira/browse/HDFS-7873 Project: Hadoop HDFS Issue Type: Bug Components: tools Affects Versions: 2.6.0, 2.5.2 Reporter: Benoit Perroud Priority: Minor Attachments: HDFS-7873-v1.txt, HDFS-7873-v2.txt The new Offline Image Viewer (OIV) supports loading the FSImage and _emulating_ a webhdfs server to explore the image without touching the NN. This webhdfs server does not work with folders holding a significant number of children (files or other folders):
{quote}
$ hadoop fs -ls webhdfs://127.0.0.1:5978/a/big/folder
15/03/03 04:28:19 WARN ssl.FileBasedKeyStoresFactory: The property 'ssl.client.truststore.location' has not been set, no TrustStore will be loaded
15/03/03 04:28:21 WARN security.UserGroupInformation: PriviledgedActionException as:bperroud (auth:SIMPLE) cause:java.io.IOException: Response decoding failure: java.lang.IllegalStateException: Expected one of '}'
ls: Response decoding failure: java.lang.IllegalStateException: Expected one of '}'
{quote}
The error comes from an inappropriate usage of Netty. {{e.getFuture().addListener(ChannelFutureListener.CLOSE)}} is closing the channel too early because the future attached to the channel already sent the header, so the I/O operation succeeded. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7915) The DataNode can sometimes allocate a ShortCircuitShm slot and fail to tell the DFSClient about it because of a network error
[ https://issues.apache.org/jira/browse/HDFS-7915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357324#comment-14357324 ] Yongjun Zhang commented on HDFS-7915: - Hi [~cmccabe], Thanks for reporting the issue and the solution. The patch looks good in general. I have a couple of comments: 1. Can we add a log message when doing unregisterSlot below to state that slot x is unregistered due to ...? I think this will help future debugging of similar issues.
{code}
if ((!success) && (registeredSlotId != null)) {
  datanode.shortCircuitRegistry.unregisterSlot(registeredSlotId);
}
{code}
2. I applied your patch, reverted DataXceiver, and ran the test expecting it to fail, but it did not. I wonder if I missed anything. Thanks. The DataNode can sometimes allocate a ShortCircuitShm slot and fail to tell the DFSClient about it because of a network error - Key: HDFS-7915 URL: https://issues.apache.org/jira/browse/HDFS-7915 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.7.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-7915.001.patch The DataNode can sometimes allocate a ShortCircuitShm slot and fail to tell the DFSClient about it because of a network error. In {{DataXceiver#requestShortCircuitFds}}, the DataNode can succeed at the first part (mark the slot as used) and fail at the second part (tell the DFSClient what it did). The try block for unregistering the slot only covers a failure in the first part, not the second part. In this way, a divergence can form between the views of which slots are allocated on DFSClient and on server. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
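The cleanup pattern under discussion can be sketched generically. This is a hedged illustration, not the actual DataXceiver code (all names here are hypothetical): the slot must be unregistered if either marking it used (part 1) or telling the DFSClient about it (part 2) fails, so the success flag is set only after both parts complete.

```java
// Illustrative try/finally cleanup pattern for a two-part operation.
// Names and structure are assumptions, not the HDFS-7915 patch itself.
public class SlotCleanupSketch {
    interface Registry {
        void unregister(long slotId);
    }

    static boolean requestSlot(Registry registry, long slotId,
                               Runnable sendResponse) {
        boolean success = false;
        Long registeredSlotId = null;
        try {
            registeredSlotId = slotId; // part 1: mark the slot as used
            sendResponse.run();        // part 2: tell the DFSClient
            success = true;            // only now is cleanup skipped
            return true;
        } finally {
            // runs on any failure in part 1 or part 2, keeping the
            // client and server views of allocated slots consistent
            if ((!success) && (registeredSlotId != null)) {
                registry.unregister(registeredSlotId);
            }
        }
    }

    public static void main(String[] args) {
        boolean ok = requestSlot(
            id -> System.out.println("unregistered slot " + id), 7L, () -> {});
        System.out.println(ok); // true: both parts succeeded, no cleanup ran
    }
}
```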
[jira] [Commented] (HDFS-5823) Document async audit logging
[ https://issues.apache.org/jira/browse/HDFS-5823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357338#comment-14357338 ] Benoy Antony commented on HDFS-5823: [~daryn] Can't I achieve the same by configuring an AsyncAppender as below? Do you see any downside in doing this? {code}
<appender class="org.apache.log4j.AsyncAppender" name="ASYNC_HDFS_AUDIT">
  <param name="BufferSize" value="500"/>
  <appender-ref ref="HDFS_AUDIT"/>
</appender>
<logger additivity="false" name="org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit">
  <level value="info"/>
  <appender-ref ref="ASYNC_HDFS_AUDIT"/>
</logger>
{code} Document async audit logging Key: HDFS-5823 URL: https://issues.apache.org/jira/browse/HDFS-5823 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.0.0-alpha, 3.0.0 Reporter: Daryn Sharp Assignee: Daryn Sharp HDFS-5241 added an option for async log4j audit logging. The option is considered semi-experimental and should be documented in hdfs-defaults.xml after its stability under stress is proven. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HDFS-7820) Client Write fails after rolling upgrade rollback with block_id already exist in finalized state
[ https://issues.apache.org/jira/browse/HDFS-7820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357386#comment-14357386 ] Arpit Agarwal edited comment on HDFS-7820 at 3/11/15 6:53 PM: -- Increasing the epoch bits to 5 or 6 will permit successive rollbacks in a short time interval while avoiding collisions. This addresses all practical scenarios. 2^58 blocks is orders of magnitude more than you would hit in a real cluster so I am not concerned about ID space reduction. bq. What if blocks are scheduled for deletion on the first report itself only in case of RollBack? This behavior was added before my time but I think deferring deletions speeds up cluster startup e.g. after upgrades. DN startup time is already an issue in large clusters. was (Author: arpitagarwal): Increasing the epoch bits to 5 or 6 will permit successive rollbacks in a short time interval while avoiding collisions. This addresses all practical scenarios. 2^58 blocks is orders of magnitude more than you would hit in a real cluster so I am not concerned about ID space reduction. Client Write fails after rolling upgrade rollback with block_id already exist in finalized state Key: HDFS-7820 URL: https://issues.apache.org/jira/browse/HDFS-7820 Project: Hadoop HDFS Issue Type: Bug Reporter: J.Andreina Assignee: J.Andreina Attachments: HDFS-7820.1.patch Steps to Reproduce: === Step 1: Prepare rolling upgrade using hdfs dfsadmin -rollingUpgrade prepare Step 2: Shutdown SNN and NN Step 3: Start NN with the hdfs namenode -rollingUpgrade started option. Step 4: Executed hdfs dfsadmin -shutdownDatanode DATANODE_HOST:IPC_PORT upgrade and restarted Datanode Step 5: Write 3 files to hdfs ( block id assigned are : blk_1073741831_1007, blk_1073741832_1008,blk_1073741833_1009 ) Step 6: Shutdown both NN and DN Step 7: Start NNs with the hdfs namenode -rollingUpgrade rollback option. Start DNs with the -rollback option. Step 8: Write 2 files to hdfs. 
Issue: === Client write failed with below exception {noformat} 2015-02-23 16:00:12,896 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving BP-1837556285-XXX-1423130389269:blk_1073741832_1008 src: /XXX:48545 dest: /XXX:50010 2015-02-23 16:00:12,897 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: opWriteBlock BP-1837556285-XXX-1423130389269:blk_1073741832_1008 received exception org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block BP-1837556285-XXX-1423130389269:blk_1073741832_1008 already exists in state FINALIZED and thus cannot be created. {noformat} Observations: = 1. At Namenode side block invalidate is been sent only to 2 blocks. {noformat} 15/02/23 14:59:56 INFO BlockStateChange: BLOCK* InvalidateBlocks: add blk_1073741833_1009 to XXX:50010 15/02/23 14:59:56 INFO BlockStateChange: BLOCK* InvalidateBlocks: add blk_1073741831_1007 to XXX:50010 {noformat} 2. fsck report does not show information on blk_1073741832_1008 {noformat} FSCK started by Rex (auth:SIMPLE) from /XXX for path / at Mon Feb 23 16:17:57 CST 2015 /File1: Under replicated BP-1837556285-XXX-1423130389269:blk_1073741825_1001. Target Replicas is 3 but found 1 replica(s). /File11: Under replicated BP-1837556285-XXX-1423130389269:blk_1073741827_1003. Target Replicas is 3 but found 1 replica(s). /File2: Under replicated BP-1837556285-XXX-1423130389269:blk_1073741826_1002. Target Replicas is 3 but found 1 replica(s). /AfterRollback_2: Under replicated BP-1837556285-XXX-1423130389269:blk_1073741831_1007. Target Replicas is 3 but found 1 replica(s). /Test1: Under replicated BP-1837556285-XXX-1423130389269:blk_1073741828_1004. Target Replicas is 3 but found 1 replica(s). Status: HEALTHY Total size:31620 B Total dirs:7 Total files: 6 Total symlinks:0 Total blocks (validated): 5 (avg. 
block size 6324 B) Minimally replicated blocks: 5 (100.0 %) Over-replicated blocks: 0 (0.0 %) Under-replicated blocks: 5 (100.0 %) Mis-replicated blocks: 0 (0.0 %) Default replication factor: 3 Average block replication: 1.0 Corrupt blocks: 0 Missing replicas: 10 (66.64 %) Number of data-nodes: 1 Number of racks: 1 FSCK ended at Mon Feb 23 16:17:57 CST 2015 in 3 milliseconds {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
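The epoch-bits idea discussed above can be sketched as follows. This is a hypothetical layout for illustration, not the actual HDFS block ID format: reserve the top bits of the 64-bit ID as a rollback epoch that is bumped on each rollback, and use the remaining bits as a sequential counter, which with 6 epoch bits still leaves 2^58 IDs per epoch:

```java
public class Main {
    static final int EPOCH_BITS = 6;               // assumption, per the comment above
    static final int SEQ_BITS = 64 - EPOCH_BITS;   // 58 bits of sequential IDs

    // Pack an epoch and a sequence number into one 64-bit block ID.
    static long makeBlockId(long epoch, long seq) {
        return (epoch << SEQ_BITS) | seq;
    }

    // Recover the epoch: IDs minted after a rollback (higher epoch) can
    // never collide with IDs minted before it.
    static long epochOf(long blockId) {
        return blockId >>> SEQ_BITS;
    }

    public static void main(String[] args) {
        long idsPerEpoch = 1L << SEQ_BITS;         // 2^58
        System.out.println("ids per epoch: " + idsPerEpoch);
        long id = makeBlockId(3, 12345);
        System.out.println("epoch of id: " + epochOf(id));   // prints 3
    }
}
```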
[jira] [Commented] (HDFS-7913) HADOOP_HDFS_LOG_DIR should be HDFS_LOG_DIR in deprecations
[ https://issues.apache.org/jira/browse/HDFS-7913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357309#comment-14357309 ] Brahma Reddy Battula commented on HDFS-7913: Yes, [~aw], can you please update? Did not see hdfs-env.sh... HADOOP_HDFS_LOG_DIR should be HDFS_LOG_DIR in deprecations -- Key: HDFS-7913 URL: https://issues.apache.org/jira/browse/HDFS-7913 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0 Reporter: Allen Wittenauer Assignee: Brahma Reddy Battula Priority: Critical The wrong variable is deprecated in hdfs-env.sh. It should be HDFS_LOG_DIR, not HADOOP_HDFS_LOG_DIR. This is breaking backward compatibility. It might be worthwhile to doublecheck the other dep's to make sure they are correct as well. Also, release notes for the deprecation jira should be updated to reflect this change. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7913) HADOOP_HDFS_LOG_DIR should be HDFS_LOG_DIR in deprecations
[ https://issues.apache.org/jira/browse/HDFS-7913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated HDFS-7913: --- Attachment: HDFS-7913.patch HADOOP_HDFS_LOG_DIR should be HDFS_LOG_DIR in deprecations -- Key: HDFS-7913 URL: https://issues.apache.org/jira/browse/HDFS-7913 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0 Reporter: Allen Wittenauer Assignee: Brahma Reddy Battula Priority: Critical Attachments: HDFS-7913.patch The wrong variable is deprecated in hdfs-config.sh. It should be HDFS_LOG_DIR, not HADOOP_HDFS_LOG_DIR. This is breaking backward compatibility. It might be worthwhile to doublecheck the other dep's to make sure they are correct as well. Also, release notes for the deprecation jira should be updated to reflect this change. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Work started] (HDFS-7913) HADOOP_HDFS_LOG_DIR should be HDFS_LOG_DIR in deprecations
[ https://issues.apache.org/jira/browse/HDFS-7913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HDFS-7913 started by Brahma Reddy Battula. -- HADOOP_HDFS_LOG_DIR should be HDFS_LOG_DIR in deprecations -- Key: HDFS-7913 URL: https://issues.apache.org/jira/browse/HDFS-7913 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0 Reporter: Allen Wittenauer Assignee: Brahma Reddy Battula Priority: Critical Attachments: HDFS-7913.patch The wrong variable is deprecated in hdfs-config.sh. It should be HDFS_LOG_DIR, not HADOOP_HDFS_LOG_DIR. This is breaking backward compatibility. It might be worthwhile to doublecheck the other dep's to make sure they are correct as well. Also, release notes for the deprecation jira should be updated to reflect this change. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7587) Edit log corruption can happen if append fails with a quota violation
[ https://issues.apache.org/jira/browse/HDFS-7587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357500#comment-14357500 ] Vinod Kumar Vavilapalli commented on HDFS-7587: --- [~kihwal] / [~daryn] / [~szetszwo], is this still a blocker for 2.7? Can progress be made in the next few days? I plan to cut an RC end of this week. Please update. Tx. Edit log corruption can happen if append fails with a quota violation - Key: HDFS-7587 URL: https://issues.apache.org/jira/browse/HDFS-7587 Project: Hadoop HDFS Issue Type: Bug Components: namenode Reporter: Kihwal Lee Assignee: Daryn Sharp Priority: Blocker Attachments: HDFS-7587.patch We have seen a standby namenode crashing due to edit log corruption. It was complaining that {{OP_CLOSE}} cannot be applied because the file is not under-construction. When a client was trying to append to the file, the remaining space quota was very small. This caused a failure in {{prepareFileForWrite()}}, but after the inode was already converted for writing and a lease added. Since these were not undone when the quota violation was detected, the file was left in under-construction with an active lease without edit logging {{OP_ADD}}. A subsequent {{append()}} eventually caused a lease recovery after the soft limit period. This resulted in {{commitBlockSynchronization()}}, which closed the file with {{OP_CLOSE}} being logged. Since there was no corresponding {{OP_ADD}}, edit replaying could not apply this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7847) Modify NNThroughputBenchmark to be able to operate on a remote NameNode
[ https://issues.apache.org/jira/browse/HDFS-7847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Charles Lamb updated HDFS-7847: --- Attachment: HDFS-7847.001.patch @cmccabe, @stack, thanks for the review! bq. DFSClient.java: this change adds three new fields to DFSClient. But they only seem to be used by unit tests. It seems like we should just put these inside the unit test(s) that are using these-- if necessary, by adding a helper method. There's no reason to add more fields to DFSClient. Also remember that when using FileContext, we create new DFSClients all the time. Good point. I've left the existing {code}ClientProtocol namenode{code} field alone. The other 3 proxies are created on-demand by their getters. That means no change in DFSClient instance size. bq. It seems kind of odd to have NameNodeProxies#createProxy create a proxy to the datanode. It's actually a proxy to the NN for the DatanodeProtocol. That's the same protocol that the DN uses to speak with the NN when it's sending (among other things) block reports. bq. In general, when you see NameNodeProxies I think proxies used by the NameNode and this doesn't fit with that. These are actually proxies used to talk to the NN, not proxies used by the NN. I didn't make the name. bq. Can you give a little more context about why this is a good idea (as opposed to just having some custom code in the unit test or in a unit test util class that creates a proxy) While the name DatanodeProtocol makes us think of an RPC protocol to the datanode, it is in fact yet another one of the many protocols to the namenode which is embodied in the NamenodeProtocols (plural) omnibus interface. The problem this is addressing is that when we are talking to an in-process NN in the NNThroughputBenchmark, then it's easy to get our hands on a NamenodeProtocols instance -- you simply call NameNode.getRpcServer(). 
However, the idea of this patch is to let you run the benchmark against a non-in-process NN, so there's no NameNode instance to use. That means we have to create RPC proxy objects for each of the NN protocols that we need to use. It would be nice if we could create a single proxy for the omnibus NamenodeProtocols interface, but we can't. Instead, we have to pick and choose the different namenode protocols that we want to use -- ClientProtocol, NamenodeProtocol, RefreshUserMappingProtocol, and DatanodeProtocol -- and create proxies for them. Code to create proxies for the first three of these already existed in NameNodeProxies.java, but we have to add a few new lines to create the DatanodeProtocol proxy. @stack I looked into your (offline) suggestion to try calling through the TinyDatanode, but it's just doing the same thing that my patch does -- it uses the same ClientProtocol instance that the rest of the test uses. TinyDatanode is really just a skeleton and doesn't borrow much code from the real DN. bq. Of course the NameNode may or may not be remote here. It seems like --nnuri or just --namenode or something like that would be more descriptive. Yeah, I agree. I changed it to -namenode. bq. Instead of this boilerplate, just use StringUtils#popOptionWithArgument. Changed. I was just trying to match the existing code, but using StringUtils is better. {code}
- replication, BLOCK_SIZE, null);
+ replication, BLOCK_SIZE, CryptoProtocolVersion.supported());
{code} bq. This fix is a little bit separate, right? I suppose we can do it in this JIRA, though. Without this, the relevant PBHelper.convert code throws NPE on the supportVersions arg. 
Modify NNThroughputBenchmark to be able to operate on a remote NameNode --- Key: HDFS-7847 URL: https://issues.apache.org/jira/browse/HDFS-7847 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: HDFS-7836 Reporter: Colin Patrick McCabe Assignee: Charles Lamb Attachments: HDFS-7847.000.patch, HDFS-7847.001.patch, make_blocks.tar.gz Modify NNThroughputBenchmark to be able to operate on a NN that is not in process. A followon Jira will modify it some more to allow quantifying native and java heap sizes, and some latency numbers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7913) HADOOP_HDFS_LOG_DIR should be HDFS_LOG_DIR in deprecations
[ https://issues.apache.org/jira/browse/HDFS-7913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357864#comment-14357864 ] Brahma Reddy Battula commented on HDFS-7913: Should we change HADOOP_MAPRED to MAPRED for the following..? If yes, I will provide a patch... Correct me if I am wrong.. {code}
hadoop_deprecate_envvar HADOOP_MAPRED_LOG_DIR HADOOP_LOG_DIR
hadoop_deprecate_envvar HADOOP_MAPRED_LOGFILE HADOOP_LOGFILE
hadoop_deprecate_envvar HADOOP_MAPRED_NICENESS HADOOP_NICENESS
hadoop_deprecate_envvar HADOOP_MAPRED_STOP_TIMEOUT HADOOP_STOP_TIMEOUT
hadoop_deprecate_envvar HADOOP_MAPRED_PID_DIR HADOOP_PID_DIR
hadoop_deprecate_envvar HADOOP_MAPRED_ROOT_LOGGER HADOOP_ROOT_LOGGER
{code} HADOOP_HDFS_LOG_DIR should be HDFS_LOG_DIR in deprecations -- Key: HDFS-7913 URL: https://issues.apache.org/jira/browse/HDFS-7913 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0 Reporter: Allen Wittenauer Assignee: Allen Wittenauer Priority: Critical Attachments: HDFS-7913-01.patch, HDFS-7913.patch The wrong variable is deprecated in hdfs-config.sh. It should be HDFS_LOG_DIR, not HADOOP_HDFS_LOG_DIR. This is breaking backward compatibility. It might be worthwhile to doublecheck the other dep's to make sure they are correct as well. Also, release notes for the deprecation jira should be updated to reflect this change. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
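For reference, a deprecation shim of this shape does roughly the following. This is a minimal re-implementation for illustration only; the real {{hadoop_deprecate_envvar}} lives in the Hadoop shell function library and differs in detail. It warns on stderr when the old variable is set, then carries the old value into the new variable:

```shell
# Minimal sketch of an envvar-deprecation shim (illustrative, not Hadoop's code).
hadoop_deprecate_envvar() {
  local oldvar=$1
  local newvar=$2
  # ${!oldvar} is bash indirect expansion: the value of the variable named $oldvar.
  if [[ -n "${!oldvar}" ]]; then
    echo "WARNING: ${oldvar} has been replaced by ${newvar}. Using value of ${oldvar}." >&2
    eval "${newvar}=\"\${${oldvar}}\""
  fi
}

HADOOP_MAPRED_LOG_DIR=/var/log/mapred
hadoop_deprecate_envvar HADOOP_MAPRED_LOG_DIR HADOOP_LOG_DIR
echo "${HADOOP_LOG_DIR}"
```

So renaming a deprecated variable is not just a string substitution: both the old and the new names must line up with what the shim maps, which is why getting the HDFS_LOG_DIR / HADOOP_HDFS_LOG_DIR pair wrong breaks backward compatibility.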
[jira] [Commented] (HDFS-7722) DataNode#checkDiskError should also remove Storage when error is found.
[ https://issues.apache.org/jira/browse/HDFS-7722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357758#comment-14357758 ] Hadoop QA commented on HDFS-7722: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12703499/HDFS-7722.003.patch against trunk revision 7a346bc. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9844//console This message is automatically generated. DataNode#checkDiskError should also remove Storage when error is found. --- Key: HDFS-7722 URL: https://issues.apache.org/jira/browse/HDFS-7722 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.6.0 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Attachments: HDFS-7722.000.patch, HDFS-7722.001.patch, HDFS-7722.002.patch, HDFS-7722.003.patch When {{DataNode#checkDiskError}} found disk errors, it removes all block metadatas from {{FsDatasetImpl}}. However, it does not removed the corresponding {{DataStorage}} and {{BlockPoolSliceStorage}}. The result is that, we could not directly run {{reconfig}} to hot swap the failure disks without changing the configure file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7285) Erasure Coding Support inside HDFS
[ https://issues.apache.org/jira/browse/HDFS-7285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357784#comment-14357784 ] Andrew Wang commented on HDFS-7285: --- I think we could pack all of this into a single xattr (i.e. {{system.storagePolicy}}) as a protobuf. This will be more efficient, and also standardize the serde, since xattr values are just bytes. We could also leave the storage type in the file header the way it is, since that's zero overhead, and just store the additional parameters in the xattr. Erasure Coding Support inside HDFS -- Key: HDFS-7285 URL: https://issues.apache.org/jira/browse/HDFS-7285 Project: Hadoop HDFS Issue Type: New Feature Reporter: Weihua Jiang Assignee: Zhe Zhang Attachments: ECAnalyzer.py, ECParser.py, HDFS-7285-initial-PoC.patch, HDFSErasureCodingDesign-20141028.pdf, HDFSErasureCodingDesign-20141217.pdf, HDFSErasureCodingDesign-20150204.pdf, HDFSErasureCodingDesign-20150206.pdf, fsimage-analysis-20150105.pdf Erasure Coding (EC) can greatly reduce the storage overhead without sacrifice of data reliability, comparing to the existing HDFS 3-replica approach. For example, if we use a 10+4 Reed Solomon coding, we can allow loss of 4 blocks, with storage overhead only being 40%. This makes EC a quite attractive alternative for big data storage, particularly for cold data. Facebook had a related open source project called HDFS-RAID. It used to be one of the contribute packages in HDFS but had been removed since Hadoop 2.0 for maintain reason. The drawbacks are: 1) it is on top of HDFS and depends on MapReduce to do encoding and decoding tasks; 2) it can only be used for cold files that are intended not to be appended anymore; 3) the pure Java EC coding implementation is extremely slow in practical use. Due to these, it might not be a good idea to just bring HDFS-RAID back. 
We (Intel and Cloudera) are working on a design to build EC into HDFS that gets rid of any external dependencies, makes it self-contained and independently maintained. This design lays the EC feature on the storage type support and considers compatible with existing HDFS features like caching, snapshot, encryption, high availability and etc. This design will also support different EC coding schemes, implementations and policies for different deployment scenarios. By utilizing advanced libraries (e.g. Intel ISA-L library), an implementation can greatly improve the performance of EC encoding/decoding and makes the EC solution even more attractive. We will post the design document soon. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
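The single-xattr idea from the comment above can be sketched like this. Protobuf is the suggested serde; the example below uses {{DataOutputStream}} purely as a stand-in so it stays self-contained, and the field names and values are illustrative, not real HDFS policy identifiers:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class Main {
    // Pack several policy dimensions into one byte[] value, stored under a
    // single xattr key (e.g. system.storagePolicy). Xattr values are just bytes.
    static byte[] encodePolicy(String storageType, String ecSchema, boolean striped)
            throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bos);
        out.writeUTF(storageType);
        out.writeUTF(ecSchema);
        out.writeBoolean(striped);
        out.flush();
        return bos.toByteArray();
    }

    // Decode back into the three dimensions; one round trip, one xattr lookup.
    static String[] decodePolicy(byte[] value) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(value));
        return new String[] { in.readUTF(), in.readUTF(), String.valueOf(in.readBoolean()) };
    }

    public static void main(String[] args) throws IOException {
        byte[] xattrValue = encodePolicy("HOT", "RS-6-3", true);
        String[] decoded = decodePolicy(xattrValue);
        System.out.println(decoded[0] + " " + decoded[1] + " " + decoded[2]);
    }
}
```

A real implementation would gain versioning and optional fields for free by using protobuf instead of this hand-rolled framing.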
[jira] [Commented] (HDFS-7285) Erasure Coding Support inside HDFS
[ https://issues.apache.org/jira/browse/HDFS-7285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357797#comment-14357797 ] Kai Zheng commented on HDFS-7285: - Thanks [~zhz] for the great post. This documents the existing relevant discussions well and gives a good proposal summary that unifies storage policy so EC and striping can fit in a much cleaner and more elegant way. I would explicitly point out that in this way we would not use EC ZONE as previously discussed here and in other issues. We don't need to explicitly create and manage EC ZONEs. What is needed now is all about a storage policy for a file or folder. Once we all agree on this approach, we need to update the overall design here and rebase relevant issues as well. It would be great if we could gather as much feedback and as many ideas as possible this time. Erasure Coding Support inside HDFS -- Key: HDFS-7285 URL: https://issues.apache.org/jira/browse/HDFS-7285 Project: Hadoop HDFS Issue Type: New Feature Reporter: Weihua Jiang Assignee: Zhe Zhang Attachments: ECAnalyzer.py, ECParser.py, HDFS-7285-initial-PoC.patch, HDFSErasureCodingDesign-20141028.pdf, HDFSErasureCodingDesign-20141217.pdf, HDFSErasureCodingDesign-20150204.pdf, HDFSErasureCodingDesign-20150206.pdf, fsimage-analysis-20150105.pdf Erasure Coding (EC) can greatly reduce the storage overhead without sacrifice of data reliability, comparing to the existing HDFS 3-replica approach. For example, if we use a 10+4 Reed Solomon coding, we can allow loss of 4 blocks, with storage overhead only being 40%. This makes EC a quite attractive alternative for big data storage, particularly for cold data. Facebook had a related open source project called HDFS-RAID. It used to be one of the contribute packages in HDFS but had been removed since Hadoop 2.0 for maintain reason. 
The drawbacks are: 1) it is on top of HDFS and depends on MapReduce to do encoding and decoding tasks; 2) it can only be used for cold files that are intended not to be appended anymore; 3) the pure Java EC coding implementation is extremely slow in practical use. Due to these, it might not be a good idea to just bring HDFS-RAID back. We (Intel and Cloudera) are working on a design to build EC into HDFS that gets rid of any external dependencies, makes it self-contained and independently maintained. This design lays the EC feature on the storage type support and considers compatible with existing HDFS features like caching, snapshot, encryption, high availability and etc. This design will also support different EC coding schemes, implementations and policies for different deployment scenarios. By utilizing advanced libraries (e.g. Intel ISA-L library), an implementation can greatly improve the performance of EC encoding/decoding and makes the EC solution even more attractive. We will post the design document soon. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7285) Erasure Coding Support inside HDFS
[ https://issues.apache.org/jira/browse/HDFS-7285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357754#comment-14357754 ] Zhe Zhang commented on HDFS-7285: - We have been discussing how to fit EC with other storage policies since the first [meetup | https://issues.apache.org/jira/browse/HDFS-7285?focusedCommentId=14192480page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14192480] and haven't reached a clear conclusion. This design is now blocking several ongoing JIRAs: HDFS-7068, HDFS-7349, HDFS-7839, HDFS-7866. I'd like to propose the following potential solution based on the ideas we have exchanged. To reiterate the challenge: multiple dimensions of storage policies could be applied to the same file. Across these dimensions we could have a large number of combinations -- easily over 50, possibly over 100. Fitting them into a single-dimension policy space is inefficient for the system to manage and inconvenient for admins to set / get.
* Storage-type preference: HOT / WARM / COLD
* Erasure coding schema: ReedSolomon-6-3 / XOR-2-1 (targeting 5~10)
* Block layout: Striping / contiguous
* Other potential policies, e.g. compression
We can set up a family of storage policy XAttrs, where each dimension can be independently set / get:
* {{system.hdfs.storagePolicy.type}}
* {{system.hdfs.storagePolicy.erasurecoding}}
* {{system.hdfs.storagePolicy.layout}}
Each dimension has a default value. So if an admin only wants to change the EC schema, the following command can be used. The {{getStoragePolicy}} should return policies on all dimensions unless an optional argument like {{-erasureCoding}} is used.
{code}
setStoragePolicy -erasureCoding RS63 /home/zhezhang/foo
getStoragePolicy /home/zhezhang/foo
{code}
Like the current storage policy semantics, the initial policy of a file or dir is inherited from its parent. Nested policy setting is allowed (/home is not ECed but /home/zhezhang is). 
A single file can have a storage policy without being in a zone. Any feedback is very welcome. [~jingzhao], [~szetszwo], I think we should have another meetup to sync on this (and several other issues)? Erasure Coding Support inside HDFS -- Key: HDFS-7285 URL: https://issues.apache.org/jira/browse/HDFS-7285 Project: Hadoop HDFS Issue Type: New Feature Reporter: Weihua Jiang Assignee: Zhe Zhang Attachments: ECAnalyzer.py, ECParser.py, HDFS-7285-initial-PoC.patch, HDFSErasureCodingDesign-20141028.pdf, HDFSErasureCodingDesign-20141217.pdf, HDFSErasureCodingDesign-20150204.pdf, HDFSErasureCodingDesign-20150206.pdf, fsimage-analysis-20150105.pdf Erasure Coding (EC) can greatly reduce the storage overhead without sacrifice of data reliability, comparing to the existing HDFS 3-replica approach. For example, if we use a 10+4 Reed Solomon coding, we can allow loss of 4 blocks, with storage overhead only being 40%. This makes EC a quite attractive alternative for big data storage, particularly for cold data. Facebook had a related open source project called HDFS-RAID. It used to be one of the contribute packages in HDFS but had been removed since Hadoop 2.0 for maintain reason. The drawbacks are: 1) it is on top of HDFS and depends on MapReduce to do encoding and decoding tasks; 2) it can only be used for cold files that are intended not to be appended anymore; 3) the pure Java EC coding implementation is extremely slow in practical use. Due to these, it might not be a good idea to just bring HDFS-RAID back. We (Intel and Cloudera) are working on a design to build EC into HDFS that gets rid of any external dependencies, makes it self-contained and independently maintained. This design lays the EC feature on the storage type support and considers compatible with existing HDFS features like caching, snapshot, encryption, high availability and etc. This design will also support different EC coding schemes, implementations and policies for different deployment scenarios. 
By utilizing advanced libraries (e.g. Intel ISA-L library), an implementation can greatly improve the performance of EC encoding/decoding and makes the EC solution even more attractive. We will post the design document soon. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7917) Use file to replace data dirs in test to simulate a disk failure.
Lei (Eddy) Xu created HDFS-7917: --- Summary: Use file to replace data dirs in test to simulate a disk failure. Key: HDFS-7917 URL: https://issues.apache.org/jira/browse/HDFS-7917 Project: Hadoop HDFS Issue Type: Improvement Components: test Affects Versions: 2.6.0 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Priority: Minor Currently, in several tests, e.g., {{TestDataNodeVolumeFailureXXX}} and {{TestDataNodeHotSwapVolumes}}, we simulate a disk failure by setting a directory's executable permission as false. However, this raises the risk that if the cleanup code is not executed, the directory cannot be easily removed by the Jenkins job. Since in {{DiskChecker#checkDirAccess}}: {code}
private static void checkDirAccess(File dir) throws DiskErrorException {
  if (!dir.isDirectory()) {
    throw new DiskErrorException("Not a directory: " + dir.toString());
  }
  checkAccessByFileMethods(dir);
}
{code} We can replace the DN data directory with a file to achieve the same fault injection goal, while being safer to clean up in any circumstance. Additionally, as [~cnauroth] suggested: bq. That might even let us enable some of these tests that are skipped on Windows, because Windows allows access for the owner even after permissions have been stripped. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
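The proposed fault injection can be sketched as below. The sketch is self-contained: {{DiskErrorException}} is re-declared locally so it runs without Hadoop, and the check mirrors the shape of the {{checkDirAccess}} snippet quoted above. A plain file standing in for the data directory fails the {{isDirectory()}} check regardless of permissions, and is trivially cleaned up:

```java
import java.io.File;
import java.io.IOException;

public class Main {
    // Local stand-in for org.apache.hadoop.util.DiskChecker.DiskErrorException.
    static class DiskErrorException extends IOException {
        DiskErrorException(String msg) { super(msg); }
    }

    // Mirrors the shape of DiskChecker#checkDirAccess shown above.
    static void checkDirAccess(File dir) throws DiskErrorException {
        if (!dir.isDirectory()) {
            throw new DiskErrorException("Not a directory: " + dir);
        }
    }

    // Returns true when the "data dir" is rejected, i.e. the fault is seen,
    // without touching any permission bits.
    static boolean simulateFailure(File dataDir) {
        try {
            checkDirAccess(dataDir);
            return false;
        } catch (DiskErrorException e) {
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        // A regular file standing in for the DN data directory.
        File fakeDataDir = File.createTempFile("dn-data-dir", ".fault");
        fakeDataDir.deleteOnExit();   // trivially cleaned up, unlike a chmod'd dir
        System.out.println("fault injected: " + simulateFailure(fakeDataDir));
    }
}
```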
[jira] [Updated] (HDFS-2842) fuse-dfs fails to build from source on recent versions of Ubuntu
[ https://issues.apache.org/jira/browse/HDFS-2842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-2842: --- Resolution: Unresolved Status: Resolved (was: Patch Available) stale fuse-dfs fails to build from source on recent versions of Ubuntu Key: HDFS-2842 URL: https://issues.apache.org/jira/browse/HDFS-2842 Project: Hadoop HDFS Issue Type: Bug Components: fuse-dfs Affects Versions: 0.20.205.0, 0.23.0, 1.0.0 Environment: Ubuntu Precise, Ubuntu 11.10 (Oneiric) Reporter: James Page Attachments: HDFS-2842-branch-1.0.patch, HDFS-2842.patch I hit this issue when trying to compile fuse-dfs for the current development version of Ubuntu; the link ordering is important in more recent versions of Ubuntu due to the default use of -Wl,--as-needed. As a result the linker gets confused. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7261) storageMap is accessed without synchronization in DatanodeDescriptor#updateHeartbeatState()
[ https://issues.apache.org/jira/browse/HDFS-7261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357875#comment-14357875 ] Brahma Reddy Battula commented on HDFS-7261: [~cmccabe] Kindly review the patch, and can you please kick Jenkins once? storageMap is accessed without synchronization in DatanodeDescriptor#updateHeartbeatState() --- Key: HDFS-7261 URL: https://issues.apache.org/jira/browse/HDFS-7261 Project: Hadoop HDFS Issue Type: Bug Reporter: Ted Yu Assignee: Brahma Reddy Battula Attachments: HDFS-7261-001.patch, HDFS-7261.patch Here is the code: {code}
failedStorageInfos = new HashSet<DatanodeStorageInfo>(
    storageMap.values());
{code} In other places, the lock on DatanodeDescriptor.storageMap is held: {code}
synchronized (storageMap) {
  final Collection<DatanodeStorageInfo> storages = storageMap.values();
  return storages.toArray(new DatanodeStorageInfo[storages.size()]);
}
{code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
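The locking fix under review can be sketched as follows, with simplified stand-in types rather than the real DatanodeDescriptor: take the snapshot of {{storageMap.values()}} while holding the map's own monitor, matching the pattern the other accessors already use, then work on the snapshot outside the lock:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class Main {
    // Stand-in for DatanodeDescriptor#storageMap (String instead of
    // DatanodeStorageInfo to keep the example self-contained).
    static final Map<String, String> storageMap = new HashMap<>();

    // Iterating values() of a HashMap that other threads mutate can throw
    // ConcurrentModificationException or miss entries; copying under the
    // map's monitor makes the snapshot consistent.
    static Set<String> snapshotStorages() {
        synchronized (storageMap) {
            return new HashSet<>(storageMap.values());
        }
    }

    public static void main(String[] args) {
        storageMap.put("DS-1", "storage1");
        storageMap.put("DS-2", "storage2");
        Set<String> failedStorageInfos = snapshotStorages();
        storageMap.clear();   // later mutation does not affect the snapshot
        System.out.println(failedStorageInfos.size());   // prints 2
    }
}
```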
[jira] [Commented] (HDFS-7838) Expose truncate API for libhdfs
[ https://issues.apache.org/jira/browse/HDFS-7838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357879#comment-14357879 ] Yi Liu commented on HDFS-7838: -- {quote} +1 once those are addressed. {quote} Hi, [~cmccabe], will you take another look at the latest patch? Expose truncate API for libhdfs --- Key: HDFS-7838 URL: https://issues.apache.org/jira/browse/HDFS-7838 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, namenode Affects Versions: 2.7.0 Reporter: Yi Liu Assignee: Yi Liu Attachments: HDFS-7838.001.patch, HDFS-7838.002.patch, HDFS-7838.003.patch It's good to expose truncate in libhdfs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7435) PB encoding of block reports is very inefficient
[ https://issues.apache.org/jira/browse/HDFS-7435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357719#comment-14357719 ] Hadoop QA commented on HDFS-7435: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12704033/HDFS-7435.patch against trunk revision 7a346bc. {color:red}-1 patch{color}. Trunk compilation may be broken. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9843//console This message is automatically generated. PB encoding of block reports is very inefficient Key: HDFS-7435 URL: https://issues.apache.org/jira/browse/HDFS-7435 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, namenode Affects Versions: 2.0.0-alpha, 3.0.0 Reporter: Daryn Sharp Assignee: Daryn Sharp Priority: Critical Attachments: HDFS-7435.000.patch, HDFS-7435.001.patch, HDFS-7435.002.patch, HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch Block reports are encoded as a PB repeating long. Repeating fields use an {{ArrayList}} with default capacity of 10. A block report containing tens or hundreds of thousand of longs (3 for each replica) is extremely expensive since the {{ArrayList}} must realloc many times. Also, decoding repeating fields will box the primitive longs which must then be unboxed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
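The cost described in the issue can be illustrated with a small model. This is an approximation, not the HDFS code path: {{ArrayList}} grows by roughly 1.5x per reallocation, so a list starting at capacity 10 must reallocate dozens of times to absorb a large block report, while a pre-sized primitive array needs no reallocations and no boxing:

```java
public class Main {
    // Count how many times a list starting at capacity 10 must grow
    // (~1.5x each time, mirroring ArrayList's growth policy) to hold n elements.
    static int growthsToHold(int n) {
        int capacity = 10;
        int growths = 0;
        while (capacity < n) {
            capacity += capacity >> 1;   // newCapacity = old + old/2
            growths++;
        }
        return growths;
    }

    public static void main(String[] args) {
        int replicas = 100_000;
        int longs = replicas * 3;        // 3 longs per replica in a block report
        System.out.println("reallocations for " + longs + " boxed longs: "
            + growthsToHold(longs));
        // A primitive long[] sized up front avoids both the copies and the boxing.
        long[] flat = new long[longs];
        System.out.println("flat array length: " + flat.length);
    }
}
```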
[jira] [Commented] (HDFS-7435) PB encoding of block reports is very inefficient
[ https://issues.apache.org/jira/browse/HDFS-7435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357734#comment-14357734 ] Jing Zhao commented on HDFS-7435: - nit-pick: # s/blocksBuffer/blocksBuf/ in the javadoc
{code}
/**
 * Prepare an instance to in-place decode the given ByteString buffer
 * @param numBlocks - blocks in the buffer
 * @param blocksBuffer - ByteString encoded varints
 * @return BlockListAsLongs
 */
public static BlockListAsLongs decodeBuffer(final int numBlocks, final ByteString blocksBuf) {
  return new BufferDecoder(numBlocks, blocksBuf);
}
{code}
+1 after fixing it.
[jira] [Assigned] (HDFS-7068) Support multiple block placement policies
[ https://issues.apache.org/jira/browse/HDFS-7068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang reassigned HDFS-7068: --- Assignee: Zhe Zhang (was: Walter Su) Support multiple block placement policies - Key: HDFS-7068 URL: https://issues.apache.org/jira/browse/HDFS-7068 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: 2.5.1 Reporter: Zesheng Wu Assignee: Zhe Zhang Attachments: HDFS-7068.patch According to the code, the current implementation of HDFS only supports one specific type of block placement policy, which is BlockPlacementPolicyDefault by default. The default policy is enough for most circumstances, but under some special circumstances it does not work well. For example, on a shared cluster we may want to erasure-encode all the files under certain specified directories, so the files under these directories need to use a new placement policy, while at the same time other files still use the default placement policy. Hence we need to support multiple placement policies in HDFS. One plain thought is that the default placement policy remains configured as the default, and HDFS lets users specify a customized placement policy through extended attributes (xattrs). When HDFS chooses the replica targets, it first checks the customized placement policy and, if none is specified, falls back to the default one. Any thoughts? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
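A minimal sketch of the xattr-based fallback proposed in the description. All names here are hypothetical, not HDFS API; a plain map stands in for the namenode's per-directory xattr store.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of the proposal (hypothetical names, not HDFS code):
// look up a per-directory placement policy stored as an extended attribute
// and fall back to the default policy when none is set.
public class PlacementPolicyResolver {
    static final String DEFAULT = "BlockPlacementPolicyDefault";

    // Stand-in for the namenode's xattr store: directory path -> policy name.
    final Map<String, String> xattrPolicies = new HashMap<>();

    String resolve(String dir) {
        // First check the customized policy; if absent, fall back to default.
        return xattrPolicies.getOrDefault(dir, DEFAULT);
    }

    public static void main(String[] args) {
        PlacementPolicyResolver r = new PlacementPolicyResolver();
        r.xattrPolicies.put("/ec-data", "EcSpreadAcrossRacksPolicy"); // hypothetical policy name
        System.out.println(r.resolve("/ec-data"));
        System.out.println(r.resolve("/user/logs"));  // falls back to the default
    }
}
```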
[jira] [Assigned] (HDFS-7068) Support multiple block placement policies
[ https://issues.apache.org/jira/browse/HDFS-7068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang reassigned HDFS-7068: --- Assignee: Walter Su (was: Zhe Zhang) Sorry, I clicked the wrong button.
[jira] [Commented] (HDFS-7285) Erasure Coding Support inside HDFS
[ https://issues.apache.org/jira/browse/HDFS-7285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357821#comment-14357821 ] Zhe Zhang commented on HDFS-7285: - [~andrew.wang] Thanks for the comment. I forgot to include those from our latest discussion. Yes, leaving the HSM policies as-is will work with this proposal and will just add some logic to combine data from the XAttr and the file header. [~drankye] Good point. The semantics of EC configurations more closely resemble storage policies than _zones_. As mentioned above, an EC policy can exist for a single file, and can be configured in a nesting manner. Erasure Coding Support inside HDFS -- Key: HDFS-7285 URL: https://issues.apache.org/jira/browse/HDFS-7285 Project: Hadoop HDFS Issue Type: New Feature Reporter: Weihua Jiang Assignee: Zhe Zhang Attachments: ECAnalyzer.py, ECParser.py, HDFS-7285-initial-PoC.patch, HDFSErasureCodingDesign-20141028.pdf, HDFSErasureCodingDesign-20141217.pdf, HDFSErasureCodingDesign-20150204.pdf, HDFSErasureCodingDesign-20150206.pdf, fsimage-analysis-20150105.pdf Erasure Coding (EC) can greatly reduce storage overhead without sacrificing data reliability, compared to the existing HDFS 3-replica approach. For example, if we use a 10+4 Reed-Solomon code, we can tolerate the loss of 4 blocks with a storage overhead of only 40%. This makes EC quite an attractive alternative for big data storage, particularly for cold data. Facebook had a related open source project called HDFS-RAID. It used to be one of the contributed packages in HDFS but was removed after Hadoop 2.0 for maintenance reasons. Its drawbacks are: 1) it sits on top of HDFS and depends on MapReduce to do encoding and decoding tasks; 2) it can only be used for cold files that will not be appended anymore; 3) the pure Java EC coding implementation is extremely slow in practical use. Because of these drawbacks, it might not be a good idea to just bring HDFS-RAID back.
We (Intel and Cloudera) are working on a design to build EC into HDFS that gets rid of external dependencies, making it self-contained and independently maintainable. This design builds the EC feature on top of storage-type support and aims to be compatible with existing HDFS features such as caching, snapshots, encryption, and high availability. The design will also support different EC coding schemes, implementations, and policies for different deployment scenarios. By utilizing advanced libraries (e.g. the Intel ISA-L library), an implementation can greatly improve the performance of EC encoding/decoding and make the EC solution even more attractive. We will post the design document soon. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
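The overhead figures in the description follow from simple arithmetic: a k+m Reed-Solomon scheme stores m parity cells for every k data cells, so the extra storage is m/k of the original data, i.e. 40% for 10+4, versus 200% extra for 3-way replication. A toy check:

```java
// Back-of-the-envelope check of the storage-overhead claim above: for a
// k+m Reed-Solomon scheme, overhead = m / k; for n-way replication,
// overhead = n - 1 (extra full copies beyond the original).
public class EcOverhead {
    static double ecOverhead(int k, int m) {
        return (double) m / k;
    }

    static double replicationOverhead(int replicas) {
        return replicas - 1;  // extra copies beyond the original data
    }

    public static void main(String[] args) {
        System.out.printf("RS(10,4) overhead: %.0f%%%n", 100 * ecOverhead(10, 4));
        System.out.printf("3x replication overhead: %.0f%%%n", 100 * replicationOverhead(3));
    }
}
```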
[jira] [Commented] (HDFS-7285) Erasure Coding Support inside HDFS
[ https://issues.apache.org/jira/browse/HDFS-7285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357832#comment-14357832 ] Jing Zhao commented on HDFS-7285: - Thanks for the summary, Zhe! One question for the EC policy or EC zone is whether we allow users to change the policy/schema of a file/dir. Currently for storage policies like COLD, WARM, and HOT, users can change a file/directory's policy; the change will be applied to newly created/appended data and can be enforced on existing files later by external tools like Mover. This lazy-enforcement semantic also applies to renamed files. However, for EC, since whether a file is EC'ed and its EC schema directly determine its read/write/append pattern, things become different and more complicated. If we allow changing the EC schema associated with a directory, we need to make sure the old EC schema can still be found for all the files inside it, which means we may need to associate the schema directly with the files or even the blocks (which can be inefficient). How to handle newly appended data and file renames also becomes a challenge. If we disallow schema changes or renames across directories with different EC policies, in the end we may have a design like an EC zone.
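A sketch of the nearest-ancestor lookup that EC-zone semantics imply (hypothetical names, not HDFS code): a schema is attached to a zone root directory, and every file under it inherits the schema from its nearest ancestor. Disallowing per-subtree schema changes and cross-zone renames is what keeps this lookup cheap and unambiguous.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of EC-zone semantics (hypothetical names, not HDFS code): find a
// file's schema by walking up to the nearest ancestor directory that has
// a schema attached.
public class EcZoneLookup {
    // Stand-in for zone metadata: zone root path -> schema name.
    final Map<String, String> zoneSchemas = new HashMap<>();

    /** Walk up from the path to find the nearest ancestor with a schema. */
    String schemaFor(String path) {
        String p = path;
        while (!p.isEmpty()) {
            String schema = zoneSchemas.get(p);
            if (schema != null) return schema;
            int slash = p.lastIndexOf('/');
            p = (slash <= 0) ? "" : p.substring(0, slash);
        }
        return zoneSchemas.get("/");  // may be null: not in any EC zone
    }

    public static void main(String[] args) {
        EcZoneLookup zones = new EcZoneLookup();
        zones.zoneSchemas.put("/cold", "RS-10-4");
        System.out.println(zones.schemaFor("/cold/2015/jan/part-0")); // prints RS-10-4
        System.out.println(zones.schemaFor("/hot/data"));            // prints null (replicated)
    }
}
```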
[jira] [Commented] (HDFS-7068) Support multiple block placement policies
[ https://issues.apache.org/jira/browse/HDFS-7068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357845#comment-14357845 ] Jing Zhao commented on HDFS-7068: - I'm also +1 for #1.
[jira] [Commented] (HDFS-7894) Rolling upgrade readiness is not updated in jmx until query command is issued.
[ https://issues.apache.org/jira/browse/HDFS-7894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357870#comment-14357870 ] Brahma Reddy Battula commented on HDFS-7894: sure, I will update soon..thanks!! Rolling upgrade readiness is not updated in jmx until query command is issued. -- Key: HDFS-7894 URL: https://issues.apache.org/jira/browse/HDFS-7894 Project: Hadoop HDFS Issue Type: Bug Reporter: Kihwal Lee Assignee: Brahma Reddy Battula Priority: Critical Attachments: HDFS-7894-002.patch, HDFS-7894.patch When a hdfs rolling upgrade is started and a rollback image is created/uploaded, the active NN does not update its {{rollingUpgradeInfo}} until it receives a query command via RPC. This results in inconsistent info being showing up in the web UI and its jmx page. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HDFS-7838) Expose truncate API for libhdfs
[ https://issues.apache.org/jira/browse/HDFS-7838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357879#comment-14357879 ] Yi Liu edited comment on HDFS-7838 at 3/12/15 12:40 AM: {quote} +1 once those are addressed. {quote} Hi, [~cmccabe], will you take another look at the latest patch or I can commit it? Thanks. was (Author: hitliuyi): {quote} +1 once those are addressed. {quote} Hi, [~cmccabe], will you take another look at the latest patch?
[jira] [Commented] (HDFS-7915) The DataNode can sometimes allocate a ShortCircuitShm slot and fail to tell the DFSClient about it because of a network error
[ https://issues.apache.org/jira/browse/HDFS-7915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357899#comment-14357899 ] Hadoop QA commented on HDFS-7915: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12704011/HDFS-7915.002.patch against trunk revision 344d7cb. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.shortcircuit.TestShortCircuitCache org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/9840//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9840//console This message is automatically generated. 
The DataNode can sometimes allocate a ShortCircuitShm slot and fail to tell the DFSClient about it because of a network error - Key: HDFS-7915 URL: https://issues.apache.org/jira/browse/HDFS-7915 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.7.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-7915.001.patch, HDFS-7915.002.patch The DataNode can sometimes allocate a ShortCircuitShm slot and fail to tell the DFSClient about it because of a network error. In {{DataXceiver#requestShortCircuitFds}}, the DataNode can succeed at the first part (mark the slot as used) and fail at the second part (tell the DFSClient what it did). The try block for unregistering the slot only covers a failure in the first part, not the second part. In this way, a divergence can form between the views of which slots are allocated on DFSClient and on server. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
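The fix direction the description implies can be sketched as follows (stub interfaces, not the actual {{DataXceiver}} code): once the slot is marked used, a failure in *either* remaining step, including a network error while telling the DFSClient, must unregister the slot so the client and server views cannot diverge.

```java
// Sketch of the fix direction (hypothetical types, not DataXceiver code):
// the slot must be unregistered if telling the client fails, not only if
// the allocation itself fails.
public class SlotLifecycle {
    interface Registry { void unregister(String slotId); }
    interface Client { void notify(String slotId) throws java.io.IOException; }

    static void requestSlot(String slotId, Registry registry, Client client)
            throws java.io.IOException {
        // Step 1: at this point the slot is already marked as used.
        try {
            // Step 2: tell the DFSClient; a network error here must also
            // release the slot, or the two sides' views diverge.
            client.notify(slotId);
        } catch (java.io.IOException e) {
            registry.unregister(slotId);
            throw e;
        }
    }

    public static void main(String[] args) {
        java.util.List<String> released = new java.util.ArrayList<>();
        try {
            requestSlot("slot-1", released::add,
                    id -> { throw new java.io.IOException("simulated network error"); });
        } catch (java.io.IOException e) {
            System.out.println("released after failure: " + released);
        }
    }
}
```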
[jira] [Commented] (HDFS-7880) Remove the tests for legacy Web UI in branch-2
[ https://issues.apache.org/jira/browse/HDFS-7880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357925#comment-14357925 ] Akira AJISAKA commented on HDFS-7880: - bq. will I raise another issue for this removal of classes..? Yeah, let's do it in a separate jira. Remove the tests for legacy Web UI in branch-2 -- Key: HDFS-7880 URL: https://issues.apache.org/jira/browse/HDFS-7880 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 2.7.0 Reporter: Akira AJISAKA Assignee: Brahma Reddy Battula Priority: Blocker Attachments: HDFS-7880-002.patch, HDFS-7880.patch These tests fail in branch-2 because they assert that the legacy UI exists. * TestJournalNode.testHttpServer:174 expected:200 but was:404 * TestNNWithQJM.testWebPageHasQjmInfo:229 expected:200 but was:404 * TestHAWebUI.testLinkAndClusterSummary:50 expected:200 but was:404 * TestHostsFiles.testHostsExcludeDfshealthJsp:130 expected:200 but was:404 * TestSecondaryWebUi.testSecondaryWebUiJsp:87 expected:200 but was:404 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7435) PB encoding of block reports is very inefficient
[ https://issues.apache.org/jira/browse/HDFS-7435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daryn Sharp updated HDFS-7435: -- Attachment: HDFS-7435.patch Yes, "older" is in relation to the future NN. I tried to clarify it in this patch and updated the javadocs. {{isSupported}} is intended to allow flipping off unsupported/deprecated capabilities so they aren't advertised to the DN.
[jira] [Commented] (HDFS-7915) The DataNode can sometimes allocate a ShortCircuitShm slot and fail to tell the DFSClient about it because of a network error
[ https://issues.apache.org/jira/browse/HDFS-7915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357747#comment-14357747 ] Yongjun Zhang commented on HDFS-7915: - Hi [~cmccabe], Thanks for the updated patch. Would it be possible to also include the exception that caused the failure, and hence the need to unregister the slot? I will try to look more closely at the test shortly.
[jira] [Commented] (HDFS-7369) Erasure coding: distribute block recovery work to DataNode
[ https://issues.apache.org/jira/browse/HDFS-7369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357774#comment-14357774 ] Jing Zhao commented on HDFS-7369: - Thanks for working on this, Zhe! The patch looks good to me. Some comments and thoughts: # For a striped block, I think it will be better to use BlockInfo(Striped), instead of its individual blocks, as the basic unit for recovery. E.g., suppose we lose 2 blocks of a 6+3 EC block. For recovery, I guess we want these two blocks to be recovered in a single recovery work instead of two. # As you mentioned in HDFS-7912, {{BlockManager}} and {{ReplicationMonitor}} never see individual data/parity blocks currently. But it may be better to have a stricter type restriction in {{UnderReplicatedBlocks}}, {{ReplicationWork}}, {{ErasureCodingWork}}, and {{computeRecoveryWorkForBlocks}}'s parameter. # Also because of #1, we may want to define a list of target DNs in {{BlockCodecInfo}}? Erasure coding: distribute block recovery work to DataNode -- Key: HDFS-7369 URL: https://issues.apache.org/jira/browse/HDFS-7369 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Zhe Zhang Assignee: Zhe Zhang Attachments: HDFS-7369-000-part1.patch, HDFS-7369-000-part2.patch, HDFS-7369-001.patch This JIRA updates NameNode to handle background / offline recovery of erasure coded blocks. It includes 2 parts: # Extend {{UnderReplicatedBlocks}} to recognize EC blocks and insert them to appropriate priority levels. # Update {{ReplicationMonitor}} to distinguish block codec tasks and send a new DataNode command. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
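Comment #1 above can be sketched like this (hypothetical types, not the HDFS patch): if the striped block group is the recovery unit, all missing internal blocks of a group land in a single recovery task regardless of how many were lost.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of comment #1 (hypothetical types, not the HDFS patch): treat the
// striped block group, not its individual internal blocks, as the recovery
// unit, so two lost blocks of a 6+3 group yield one recovery task.
public class StripedRecoveryPlanner {
    /** One recovery task covering all missing indices of a block group. */
    static class RecoveryTask {
        final String blockGroupId;
        final List<Integer> missingIndices;
        RecoveryTask(String id, List<Integer> missing) {
            this.blockGroupId = id;
            this.missingIndices = missing;
        }
    }

    static List<RecoveryTask> plan(String blockGroupId, boolean[] reported) {
        List<Integer> missing = new ArrayList<>();
        for (int i = 0; i < reported.length; i++) {
            if (!reported[i]) missing.add(i);
        }
        List<RecoveryTask> tasks = new ArrayList<>();
        if (!missing.isEmpty()) {
            tasks.add(new RecoveryTask(blockGroupId, missing));  // single task
        }
        return tasks;
    }

    public static void main(String[] args) {
        // 6+3 group with internal blocks 2 and 6 missing.
        boolean[] reported = {true, true, false, true, true, true, false, true, true};
        System.out.println(plan("blk_group_1", reported).get(0).missingIndices); // prints [2, 6]
    }
}
```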
[jira] [Commented] (HDFS-7722) DataNode#checkDiskError should also remove Storage when error is found.
[ https://issues.apache.org/jira/browse/HDFS-7722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1435#comment-1435 ] Hadoop QA commented on HDFS-7722: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12704044/HDFS-7722.004.patch against trunk revision 7a346bc. {color:red}-1 patch{color}. Trunk compilation may be broken. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9845//console This message is automatically generated. DataNode#checkDiskError should also remove Storage when error is found. --- Key: HDFS-7722 URL: https://issues.apache.org/jira/browse/HDFS-7722 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.6.0 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Attachments: HDFS-7722.000.patch, HDFS-7722.001.patch, HDFS-7722.002.patch, HDFS-7722.003.patch, HDFS-7722.004.patch When {{DataNode#checkDiskError}} finds disk errors, it removes all block metadata from {{FsDatasetImpl}}. However, it does not remove the corresponding {{DataStorage}} and {{BlockPoolSliceStorage}}. The result is that we cannot directly run {{reconfig}} to hot-swap the failed disks without changing the configuration file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
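A minimal sketch of the missing cleanup this issue describes. The maps here are stand-ins for {{FsDatasetImpl}}'s block metadata and {{DataStorage}}'s volume records; none of this is DataNode code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of the fix direction (stub maps, not DataNode code): on a disk
// error, remove the volume's storage record as well as its block metadata,
// so a later reconfig can cleanly re-add the same path.
public class DiskErrorHandler {
    // volume path -> storage ID, standing in for DataStorage's records
    final Map<String, String> storages = new LinkedHashMap<>();
    // volume path -> block count, standing in for FsDatasetImpl's metadata
    final Map<String, Integer> blocksPerVolume = new LinkedHashMap<>();

    void checkDiskError(String failedVolume) {
        blocksPerVolume.remove(failedVolume);  // what the old code already did
        storages.remove(failedVolume);         // the missing cleanup
    }

    public static void main(String[] args) {
        DiskErrorHandler h = new DiskErrorHandler();
        h.storages.put("/data/1", "DS-1");
        h.blocksPerVolume.put("/data/1", 42);
        h.checkDiskError("/data/1");
        System.out.println("storage records left: " + h.storages.keySet());
    }
}
```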
[jira] [Commented] (HDFS-7068) Support multiple block placement policies
[ https://issues.apache.org/jira/browse/HDFS-7068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357806#comment-14357806 ] Zhe Zhang commented on HDFS-7068: - Very good thoughts [~walter.k.su]! And thanks Kai for the helpful comments. I think option #1 is the lightest in terms of dev effort. Assuming _all EC files use a single placement policy_, that should work for us. Right now I don't see a need for multiple EC placement policies. The basic logic is just to spread across as many racks as possible based on m and k. So maybe we should start with implementing option #1. If we all agree with this option, then I imagine the change should look like HDFS-3601.
[jira] [Commented] (HDFS-7886) TestFileTruncate#testTruncateWithDataNodesRestart runs timeout sometimes
[ https://issues.apache.org/jira/browse/HDFS-7886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357877#comment-14357877 ] Yi Liu commented on HDFS-7886: -- Yes Konst, I agree with you, thanks. TestFileTruncate#testTruncateWithDataNodesRestart runs timeout sometimes Key: HDFS-7886 URL: https://issues.apache.org/jira/browse/HDFS-7886 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 2.7.0 Reporter: Yi Liu Assignee: Plamen Jeliazkov Priority: Minor Attachments: HDFS-7886.patch https://builds.apache.org/job/PreCommit-HDFS-Build/9730//testReport/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7864) Erasure Coding: Update safemode calculation for striped blocks
[ https://issues.apache.org/jira/browse/HDFS-7864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] GAO Rui updated HDFS-7864: -- Attachment: HDFS-7864.2.patch The safemode calculation has been modified for striped blocks by checking whether the storedBlock is a striped block ({{instanceof BlockInfoStriped}}). Erasure Coding: Update safemode calculation for striped blocks -- Key: HDFS-7864 URL: https://issues.apache.org/jira/browse/HDFS-7864 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Jing Zhao Assignee: GAO Rui Attachments: HDFS-7864.1.patch, HDFS-7864.2.patch We need to update the safemode calculation for striped blocks. Specifically, each striped block now consists of multiple data/parity blocks stored in corresponding DataNodes. The current code's calculation is thus inconsistent: each striped block is only counted as 1 expected block, while each of its member blocks may increase the number of received blocks by 1. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-5356) MiniDFSCluster shoud close all open FileSystems when shutdown()
[ https://issues.apache.org/jira/browse/HDFS-5356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14356413#comment-14356413 ] Rakesh R commented on HDFS-5356: Thanks [~cmccabe] for the explanation; it helped me understand this better. bq.I would be OK with the change you have now if you checked whether the FS was in the fileSystems list prior to adding it. Currently it appears like we may call close twice on the same fs if MiniDFSCluster#getFileSystemInstance was called more than once. Yeah, that's a good point. I think changing the data structure from {{java.util.List}} to {{java.util.Set}} would be sufficient to avoid the duplicates. Could you please have a look at the new patch when you get some time? MiniDFSCluster shoud close all open FileSystems when shutdown() --- Key: HDFS-5356 URL: https://issues.apache.org/jira/browse/HDFS-5356 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 3.0.0, 2.2.0 Reporter: haosdent Assignee: Rakesh R Priority: Critical Attachments: HDFS-5356-1.patch, HDFS-5356-2.patch, HDFS-5356-3.patch, HDFS-5356-4.patch, HDFS-5356.patch After adding some metrics functions to DFSClient, I found that some metrics-related unit tests failed. Because MiniDFSCluster never closes open FileSystems, DFSClients are still alive after MiniDFSCluster shutdown(). The metrics of those DFSClients still exist in DefaultMetricsSystem, which makes other unit tests fail. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
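The Set-based tracking proposed above can be sketched as follows (stub types; {{java.io.Closeable}} stands in for Hadoop's FileSystem, and this is not the MiniDFSCluster code): a Set records an instance handed out twice only once, so shutdown closes it exactly once.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.LinkedHashSet;
import java.util.Set;

// Sketch of the comment above (stub types, not MiniDFSCluster code):
// tracking handed-out FileSystem instances in a Set rather than a List
// means a second getFileSystem() call records the instance once, so
// shutdown() closes it once.
public class FsTracker {
    private final Set<Closeable> fileSystems = new LinkedHashSet<>();

    <T extends Closeable> T track(T fs) {
        fileSystems.add(fs);  // Set semantics: duplicate adds are no-ops
        return fs;
    }

    void shutdown() throws IOException {
        for (Closeable fs : fileSystems) {
            fs.close();
        }
        fileSystems.clear();
    }

    public static void main(String[] args) throws IOException {
        final int[] closes = {0};
        Closeable fs = () -> closes[0]++;
        FsTracker tracker = new FsTracker();
        tracker.track(fs);
        tracker.track(fs);  // same instance obtained a second time
        tracker.shutdown();
        System.out.println("close() called " + closes[0] + " time(s)");
    }
}
```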
[jira] [Commented] (HDFS-7864) Erasure Coding: Update safemode calculation for striped blocks
[ https://issues.apache.org/jira/browse/HDFS-7864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14356421#comment-14356421 ] GAO Rui commented on HDFS-7864: --- [~jingzhao] Thanks for your comment; I have uploaded a new patch. All striped blocks are instances of BlockInfoStriped, so I added the logic into the current {{incrementSafeBlockCount}}. I used {{numNodes}} to judge whether a striped block is safe. Hope it works. Please review the new patch when you are free, thank you very much.
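A simplified sketch of the counting rule under discussion (hypothetical class, not the patch itself): it assumes a striped block counts as safe once its number of data blocks has been reported, which is one possible reading of the {{numNodes}} check, while a replicated block uses the usual safe-replication threshold.

```java
// Simplified sketch (hypothetical class, not the HDFS-7864 patch): a
// replicated block is "safe" at the safe-replication count; a striped
// block needs at least its data-block count reported, since fewer than
// that is unrecoverable.
public class SafeModeCounter {
    private long safeBlocks = 0;
    private final int safeReplication;  // e.g. 1 for minimal replication

    SafeModeCounter(int safeReplication) {
        this.safeReplication = safeReplication;
    }

    /**
     * @param reported   storages reported so far for this block
     * @param striped    whether the block is a BlockInfoStriped
     * @param dataBlocks data blocks in the stripe (e.g. 6 for RS-6-3)
     */
    void incrementSafeBlockCount(int reported, boolean striped, int dataBlocks) {
        int threshold = striped ? dataBlocks : safeReplication;
        if (reported == threshold) {  // count each block exactly once, on
            safeBlocks++;             // its transition to "safe"
        }
    }

    long getSafeBlocks() { return safeBlocks; }

    public static void main(String[] args) {
        SafeModeCounter counter = new SafeModeCounter(1);
        counter.incrementSafeBlockCount(1, false, 0);  // replicated block safe
        counter.incrementSafeBlockCount(5, true, 6);   // striped, not yet safe
        counter.incrementSafeBlockCount(6, true, 6);   // striped block safe
        System.out.println("safe blocks: " + counter.getSafeBlocks());
    }
}
```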
[jira] [Updated] (HDFS-7864) Erasure Coding: Update safemode calculation for striped blocks
[ https://issues.apache.org/jira/browse/HDFS-7864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] GAO Rui updated HDFS-7864: -- Status: Patch Available (was: Open)
[jira] [Commented] (HDFS-5356) MiniDFSCluster shoud close all open FileSystems when shutdown()
[ https://issues.apache.org/jira/browse/HDFS-5356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14356440#comment-14356440 ] Vinayakumar B commented on HDFS-5356: - Hi [~rakeshr], thanks for the patch. Here are some comments. Here you need to pass down {{deleteDfsDir}}: {code} public void shutdown(boolean deleteDfsDir) { +shutdown(false, true); + }{code} I don't think the below change is necessary in TestFileCreation.java. {code}+DistributedFileSystem fs1 = null;{code} In fact, you can remove the original re-assignment code after cluster restart. That should ideally work, as the NameNode restarted with the same ports and data. I have verified ;) {code}fs = cluster.getFileSystem();{code} Changes in TestRenameWithSnapshots.java are not required at all, as {{hdfs}} is re-initialized every time the cluster is restarted. Let it close the FS during shutdown. MiniDFSCluster shoud close all open FileSystems when shutdown() --- Key: HDFS-5356 URL: https://issues.apache.org/jira/browse/HDFS-5356 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 3.0.0, 2.2.0 Reporter: haosdent Assignee: Rakesh R Priority: Critical Attachments: HDFS-5356-1.patch, HDFS-5356-2.patch, HDFS-5356-3.patch, HDFS-5356-4.patch, HDFS-5356.patch After adding some metrics functions to DFSClient, I found that some unit tests related to metrics failed, because MiniDFSCluster never closes open FileSystems: DFSClients remain alive after MiniDFSCluster shutdown(). The metrics of those DFSClients still exist in DefaultMetricsSystem, which makes other unit tests fail.
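The first review point, that the one-argument {{shutdown}} must forward its flag instead of hard-coding {{shutdown(false, true)}}, can be illustrated with a minimal sketch. The method shapes below are assumptions modeled on MiniDFSCluster, not the actual patch; the static fields only record the flags for demonstration.

```java
// Minimal sketch of the delegation fix asked for above: the one-argument
// overload must pass its own deleteDfsDir flag down instead of a
// hard-coded false. The fields stand in for the real shutdown work.
public class ShutdownDelegation {
    static Boolean receivedDeleteDfsDir;
    static Boolean receivedCloseFileSystems;

    public static void shutdown(boolean deleteDfsDir) {
        // Fixed: forward the caller's flag (the reviewed patch passed false).
        shutdown(deleteDfsDir, true);
    }

    public static void shutdown(boolean deleteDfsDir, boolean closeFileSystems) {
        receivedDeleteDfsDir = deleteDfsDir;
        receivedCloseFileSystems = closeFileSystems;
    }
}
```

With the hard-coded `shutdown(false, true)`, a caller asking for the DFS directory to be deleted would silently keep it; forwarding the parameter restores the intended behavior.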
[jira] [Updated] (HDFS-5356) MiniDFSCluster shoud close all open FileSystems when shutdown()
[ https://issues.apache.org/jira/browse/HDFS-5356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rakesh R updated HDFS-5356: --- Attachment: HDFS-5356-4.patch
[jira] [Updated] (HDFS-7915) The DataNode can sometimes allocate a ShortCircuitShm slot and fail to tell the DFSClient about it because of a network error
[ https://issues.apache.org/jira/browse/HDFS-7915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-7915: --- Attachment: HDFS-7915.001.patch Fix up the error handling in DataXceiver. A nested try block seemed like the cleanest way to go. The hardest part was writing the unit test. I verified that the new unit test fails without the fix. The DataNode can sometimes allocate a ShortCircuitShm slot and fail to tell the DFSClient about it because of a network error - Key: HDFS-7915 URL: https://issues.apache.org/jira/browse/HDFS-7915 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.7.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-7915.001.patch The DataNode can sometimes allocate a ShortCircuitShm slot and fail to tell the DFSClient about it because of a network error. In {{DataXceiver#requestShortCircuitFds}}, the DataNode can succeed at the first part (mark the slot as used) and fail at the second part (tell the DFSClient what it did). The try block for unregistering the slot only covers a failure in the first part, not the second part. In this way, a divergence can form between the views of which slots are allocated on DFSClient and on server.
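The nested-try shape described in the issue can be sketched as below. This is an assumed structure, not the actual DataXceiver code: the method and helper names are illustrative, and the failing network reply is simulated with a flag.

```java
import java.io.IOException;

// Sketch of the error-handling fix described above: the try block must
// cover *both* marking the slot used and replying to the DFSClient, so a
// network error during the reply still releases the slot and the client
// and server views of allocated slots cannot diverge.
public class SlotErrorHandling {
    static boolean slotAllocated;

    static void allocateSlot() { slotAllocated = true; }
    static void releaseSlot() { slotAllocated = false; }

    public static void requestShortCircuitFds(boolean replyFails)
            throws IOException {
        allocateSlot();                       // part 1: mark the slot as used
        try {
            if (replyFails) {                 // part 2: tell the DFSClient
                throw new IOException("network error while replying to client");
            }
        } catch (IOException e) {
            releaseSlot();                    // undo part 1 on any failure
            throw e;
        }
    }
}
```

The bug was that only part 1 sat inside the cleanup scope; moving the client reply under the same catch is what keeps the slot bookkeeping consistent.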
[jira] [Issue Comment Deleted] (HDFS-5356) MiniDFSCluster shoud close all open FileSystems when shutdown()
[ https://issues.apache.org/jira/browse/HDFS-5356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinayakumar B updated HDFS-5356: Comment: was deleted (was: the patch review comments posted earlier on this issue)
[jira] [Commented] (HDFS-7864) Erasure Coding: Update safemode calculation for striped blocks
[ https://issues.apache.org/jira/browse/HDFS-7864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14356369#comment-14356369 ] Hadoop QA commented on HDFS-7864: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12703849/HDFS-7864.2.patch against trunk revision 30c428a. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9830//console This message is automatically generated.
[jira] [Updated] (HDFS-7915) The DataNode can sometimes allocate a ShortCircuitShm slot and fail to tell the DFSClient about it because of a network error
[ https://issues.apache.org/jira/browse/HDFS-7915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-7915: --- Attachment: HDFS-7915.001.patch
[jira] [Updated] (HDFS-7915) The DataNode can sometimes allocate a ShortCircuitShm slot and fail to tell the DFSClient about it because of a network error
[ https://issues.apache.org/jira/browse/HDFS-7915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-7915: --- Status: Patch Available (was: Open)
[jira] [Updated] (HDFS-7915) The DataNode can sometimes allocate a ShortCircuitShm slot and fail to tell the DFSClient about it because of a network error
[ https://issues.apache.org/jira/browse/HDFS-7915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-7915: --- Attachment: (was: HDFS-7915.001.patch)
[jira] [Commented] (HDFS-7877) Support maintenance state for datanodes
[ https://issues.apache.org/jira/browse/HDFS-7877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14356375#comment-14356375 ] Ming Ma commented on HDFS-7877: --- Thanks Eddy for the review and suggestions. Please find my responses below. Chris might have more to add. bq. Why is the node state the combination of live|dead and In service|Decommissioned|In maintenance..? There are two state machines for a datanode. One is the liveness state; the other is the admin state. HDFS-7521 has some discussion around that. A datanode can be in any combination of these two states. That is why we have the case where, if a node becomes dead while it is being decommissioned, it remains in the {{DECOMMISSION_IN_PROGRESS}} state until all the blocks are properly replicated. bq. After NN re-starts, I think NN could not find out whether DN is in enter_maintenance or in_maintenance mode? The design handles datanode state management for {{ENTERING_MAINTENANCE}} and {{IN_MAINTENANCE}} similarly to {{DECOMMISSION_IN_PROGRESS}} and {{DECOMMISSIONED}}, in the following ways. 1. When a node registers with the NN (which could happen on datanode restart or NN restart), it first transitions to DECOMMISSION_IN_PROGRESS if it is in the exclude file, or to ENTERING_MAINTENANCE if it is in the maintenance file. 2. Only after the target replication has been reached does it transition to the final state, DECOMMISSIONED or IN_MAINTENANCE. bq. Moreover, after NN restarts, if a DN is actually in the maintenance mode (DN is shutting down for maintenance), NN could not receive block reports from this DN. After the NN restarts, if a DN in the maintenance file doesn't register with the NN, it won't be in {{DatanodeManager}}'s {{datanodeMap}} and thus its state won't be tracked. So it should be similar to how decommission is handled.
If the DN does register with the NN, there is a bug in the patch: it doesn't check whether the NN has received a block report from the DN, so it may prematurely transition the DN to the {{in_maintenance}} state. bq. Is put the dead node into maintenance mode necessary? Good question; it would be fine to keep the node in the {{dead, normal}} state when admins add the node to the maintenance file. The intention is to stay consistent with the actual content of the maintenance file. It is similar to how decommission is handled: if you add a dead node to the exclude file, the node goes directly into the {{DECOMMISSIONED}} state. For replica processing, the {{dead, in_maintenance}} -> {{live, in_maintenance}} transition won't trigger excess block removal; {{live, in_maintenance}} -> {{live, normal}} will. bq. Timeout support Good suggestion. We discussed this topic during the design discussion. We feel the admin script can handle that outside HDFS: upon timeout, the admin script can remove the node from the maintenance file and thus trigger replication. If we support timeout in HDFS, nodes in the maintenance file won't necessarily be in maintenance states. Alternatively, we can add another state called maintenance_timeout, but that might be too complicated. I can understand the benefit of having a timeout here, so we would like to hear others' suggestions. There are two new topics we want to bring up. * The original design doc uses the cluster's default minimal replication factor to decide whether a node can exit the {{ENTERING_MAINTENANCE}} state. We might want to use a new config value so that we can set the value to two. For a scenario like a Hadoop software upgrade, if used together with upgrade domains, two replicas will be met right away for most blocks. For a scenario like a rack repair, two replicas give us better data availability. At the least, we can test out different values independently of the cluster's minimal replication factor. * Whether read is allowed on a node in the {{ENTERING_MAINTENANCE}} state. Perhaps we should support that.
That will handle the case where that is the only replica available. We can put such a replica at the end of LocatedBlock. Support maintenance state for datanodes --- Key: HDFS-7877 URL: https://issues.apache.org/jira/browse/HDFS-7877 Project: Hadoop HDFS Issue Type: New Feature Reporter: Ming Ma Attachments: HDFS-7877.patch, Supportmaintenancestatefordatanodes.pdf This requirement came up during the design for HDFS-7541. Given this feature is mostly independent of the upgrade domain feature, it is better to track it under a separate jira. The design and draft patch will be available soon.
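The registration and transition rules described above can be modeled with a small sketch. The enum and method names are assumptions for illustration, not the HDFS-7877 patch; only the state names and the two-phase transition come from the comment.

```java
// Illustrative model of the datanode admin state machine described above:
// a registering node (DN restart or NN restart) first enters the
// transitional state taken from the exclude / maintenance host files, and
// reaches the final state only once target replication is met.
public class DatanodeAdminState {
    public enum Admin {
        NORMAL, DECOMMISSION_IN_PROGRESS, DECOMMISSIONED,
        ENTERING_MAINTENANCE, IN_MAINTENANCE
    }

    /** Pick the transitional admin state at registration time. */
    public static Admin onRegister(boolean inExcludeFile, boolean inMaintenanceFile) {
        if (inExcludeFile) {
            return Admin.DECOMMISSION_IN_PROGRESS;
        }
        if (inMaintenanceFile) {
            return Admin.ENTERING_MAINTENANCE;
        }
        return Admin.NORMAL;
    }

    /** Move to the final state only after target replication is reached. */
    public static Admin onTargetReplicationReached(Admin current) {
        switch (current) {
            case DECOMMISSION_IN_PROGRESS: return Admin.DECOMMISSIONED;
            case ENTERING_MAINTENANCE:     return Admin.IN_MAINTENANCE;
            default:                       return current;
        }
    }
}
```

The liveness state (live/dead) is tracked separately, which is why any combination such as {{dead, in_maintenance}} is representable in this model.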
[jira] [Commented] (HDFS-5796) The file system browser in the namenode UI requires SPNEGO.
[ https://issues.apache.org/jira/browse/HDFS-5796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14356374#comment-14356374 ] Arun Suresh commented on HDFS-5796: --- Also, please consider using HADOOP-11702, which modifies the standard {{StringSignerSecretProvider}} to read the secret from a file if present. The file system browser in the namenode UI requires SPNEGO. --- Key: HDFS-5796 URL: https://issues.apache.org/jira/browse/HDFS-5796 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.5.0 Reporter: Kihwal Lee Assignee: Ryan Sasson Priority: Blocker Attachments: HDFS-5796.1.patch, HDFS-5796.1.patch, HDFS-5796.2.patch, HDFS-5796.3.patch, HDFS-5796.3.patch, HDFS-5796.4.patch After HDFS-5382, the browser makes webhdfs REST calls directly, requiring SPNEGO to work between the user's browser and the namenode. This won't work if the cluster's security infrastructure is isolated from the regular network. Moreover, SPNEGO is not supposed to be required for user-facing web pages.
[jira] [Commented] (HDFS-5796) The file system browser in the namenode UI requires SPNEGO.
[ https://issues.apache.org/jira/browse/HDFS-5796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14356370#comment-14356370 ] Arun Suresh commented on HDFS-5796: --- bq. I'm actually inclined to say no, since the other web elements are almost all strictly interactive. In other words, if I'm using something like SAML for my normal web auth and only have Kerberos deployed for internal hadoop stuff, there's no need to put a Kerberos filter in front of those other UIs. Currently, if you configure a different auth filter via AuthFilterInitializer and a different one (Kerberos) for dfs.web.authentication, the user still has to go through the Kerberos authentication. Basically, the user has to pass through the stricter scheme anyway, so why not use a single AuthenticationFilter, as [~wheat9] suggested? Please also note, as I mentioned in an earlier comment, that there is a THIRD filter involved here, which is initialized by {{HttpServer2#initSpnego()}}. This ends up being the same filter as dfs.web.authentication, but a filter is still initialized nonetheless. I feel this should be removed, either in this JIRA or another. W.r.t. the patch, {noformat} +Reader reader = new InputStreamReader(new FileInputStream( +signatureSecretFile), Charsets.UTF_8); +int c = reader.read(); +while (c != -1) { + secret.append((char)c); + c = reader.read(); +} +reader.close(); +p.setProperty(AuthenticationFilter.SIGNATURE_SECRET, secret.toString()); {noformat} could be better written as {noformat} secret = Files.readAllBytes(new File(secretFile).toPath()) {noformat}
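The simplification suggested in the review can be written out as a self-contained helper. The class and method names here are illustrative; only the use of `java.nio.file.Files.readAllBytes` comes from the suggestion itself, and decoding as UTF-8 matches the `Charsets.UTF_8` in the original snippet.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

// Sketch of the suggested replacement for the character-by-character
// reader loop: read the whole secret file in one NIO call and decode it
// as UTF-8, matching the charset used by the original snippet.
public class SecretFileReader {
    public static String readSecret(File secretFile) throws IOException {
        return new String(Files.readAllBytes(secretFile.toPath()),
                StandardCharsets.UTF_8);
    }
}
```

Besides being shorter, this avoids the leaked `Reader` if an exception occurs mid-loop, since no stream stays open across the read.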
[jira] [Commented] (HDFS-5356) MiniDFSCluster shoud close all open FileSystems when shutdown()
[ https://issues.apache.org/jira/browse/HDFS-5356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14356690#comment-14356690 ] Hadoop QA commented on HDFS-5356: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12703854/HDFS-5356-4.patch against trunk revision 30c428a. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/9831//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9831//console This message is automatically generated.
[jira] [Updated] (HDFS-7916) 'reportBadBlocks' from datanodes to standby Node BPServiceActor goes for infinite loop
[ https://issues.apache.org/jira/browse/HDFS-7916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinayakumar B updated HDFS-7916: Status: Patch Available (was: Open) 'reportBadBlocks' from datanodes to standby Node BPServiceActor goes for infinite loop -- Key: HDFS-7916 URL: https://issues.apache.org/jira/browse/HDFS-7916 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.6.0 Reporter: Vinayakumar B Assignee: Vinayakumar B Priority: Critical Attachments: HDFS-7916-01.patch If any bad block is found, the BPServiceActor for the standby NameNode will retry reporting it an infinite number of times. {noformat}2015-03-11 19:43:41,528 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Failed to report bad block BP-1384821822-10.224.54.68-1422634566395:blk_1079544278_5812006 to namenode: stobdtserver3/10.224.54.70:18010 org.apache.hadoop.hdfs.server.datanode.BPServiceActorActionException: Failed to report bad block BP-1384821822-10.224.54.68-1422634566395:blk_1079544278_5812006 to namenode: at org.apache.hadoop.hdfs.server.datanode.ReportBadBlockAction.reportTo(ReportBadBlockAction.java:63) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.processQueueMessages(BPServiceActor.java:1020) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:762) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:856) at java.lang.Thread.run(Thread.java:745) {noformat}
[jira] [Commented] (HDFS-5356) MiniDFSCluster shoud close all open FileSystems when shutdown()
[ https://issues.apache.org/jira/browse/HDFS-5356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14356724#comment-14356724 ] Rakesh R commented on HDFS-5356: Attached a new patch addressing [~vinayrpet]'s comments.
[jira] [Commented] (HDFS-7913) HADOOP_HDFS_LOG_DIR should be HDFS_LOG_DIR in deprecations
[ https://issues.apache.org/jira/browse/HDFS-7913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14356726#comment-14356726 ] Vinayakumar B commented on HDFS-7913: - I don't see any hdfs-env.sh in trunk. Will it be added by the user? HADOOP_HDFS_LOG_DIR should be HDFS_LOG_DIR in deprecations -- Key: HDFS-7913 URL: https://issues.apache.org/jira/browse/HDFS-7913 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0 Reporter: Allen Wittenauer Assignee: Brahma Reddy Battula Priority: Critical The wrong variable is deprecated in hdfs-env.sh. It should be HDFS_LOG_DIR, not HADOOP_HDFS_LOG_DIR. This is breaking backward compatibility. It might be worthwhile to double-check the other deprecations to make sure they are correct as well. Also, the release notes for the deprecation jira should be updated to reflect this change.
[jira] [Commented] (HDFS-5356) MiniDFSCluster shoud close all open FileSystems when shutdown()
[ https://issues.apache.org/jira/browse/HDFS-5356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14356687#comment-14356687 ] Vinayakumar B commented on HDFS-5356: - bq. It is failing when comparing SnapshotTestHelper.compareDumpedTreeInFile(fsnBefore, fsnMiddle, compareQuota);. fsnBefore and fsnMiddle differ in their storagespace values. This seems unrelated to closure of the filesystem. OK, now I get it. If you are closing the filesystem, then open files are being closed, which results in a diff in these image sizes. I didn't check that the image file sizes were being compared in the test; I just commented on the functionality.
[jira] [Updated] (HDFS-5356) MiniDFSCluster shoud close all open FileSystems when shutdown()
[ https://issues.apache.org/jira/browse/HDFS-5356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rakesh R updated HDFS-5356: --- Attachment: HDFS-5356-5.patch
[jira] [Commented] (HDFS-5356) MiniDFSCluster shoud close all open FileSystems when shutdown()
[ https://issues.apache.org/jira/browse/HDFS-5356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14356696#comment-14356696 ] Rakesh R commented on HDFS-5356: OK. Do you agree to go ahead with the changes in {{TestRenameWithSnapshots}}?
[jira] [Created] (HDFS-7916) 'reportBadBlocks' from datanodes to standby Node BPServiceActor goes for infinite loop
Vinayakumar B created HDFS-7916: --- Summary: 'reportBadBlocks' from datanodes to standby Node BPServiceActor goes for infinite loop Key: HDFS-7916 URL: https://issues.apache.org/jira/browse/HDFS-7916 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.6.0 Reporter: Vinayakumar B Assignee: Vinayakumar B Priority: Critical
[jira] [Commented] (HDFS-5356) MiniDFSCluster shoud close all open FileSystems when shutdown()
[ https://issues.apache.org/jira/browse/HDFS-5356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14356703#comment-14356703 ] Vinayakumar B commented on HDFS-5356: - Yes, sure.
[jira] [Commented] (HDFS-7916) 'reportBadBlocks' from datanodes to standby Node BPServiceActor goes for infinite loop
[ https://issues.apache.org/jira/browse/HDFS-7916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14356698#comment-14356698 ] Vinayakumar B commented on HDFS-7916: - {{BPServiceActor#processQueueMessages()}} tries to execute {{ReportBadBlockAction#reportTo(..)}} and, on any exception, adds the action back to the queue. So in the case of a standby node, adding the bad-block action back to the queue should be skipped.
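The retry behavior described above can be sketched with a small queue model. The structure is assumed, not the actual BPServiceActor code: the nested exception class, the functional `Action` interface, and the `fromStandby` flag are all illustrative stand-ins for how the real code might distinguish a standby failure.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Sketch of the fix described above: a failed action is normally put back
// on the queue for retry, but a failure that indicates the NameNode is
// standby must NOT be re-queued, or the bad-block report loops forever.
public class BadBlockQueue {
    /** Stand-in for BPServiceActorActionException with a standby marker. */
    static class ActionException extends Exception {
        final boolean fromStandby;
        ActionException(boolean fromStandby) { this.fromStandby = fromStandby; }
    }

    interface Action {
        void reportTo() throws ActionException;
    }

    final Queue<Action> queue = new ArrayDeque<>();

    /** Drain the queue once; re-queue only retriable failures. */
    public void processQueueMessages() {
        Queue<Action> current = new ArrayDeque<>(queue);
        queue.clear();
        for (Action a : current) {
            try {
                a.reportTo();
            } catch (ActionException e) {
                if (!e.fromStandby) {
                    queue.add(a);   // retry later against the active NN
                }
                // standby failure: drop the action instead of looping
            }
        }
    }
}
```

The active NameNode learns about the bad block through its own report, so dropping the action on the standby side loses nothing.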
[jira] [Updated] (HDFS-7916) 'reportBadBlocks' from datanodes to standby Node BPServiceActor goes for infinite loop
[ https://issues.apache.org/jira/browse/HDFS-7916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinayakumar B updated HDFS-7916: Attachment: HDFS-7916-01.patch Attaching the patch for the same.