[jira] [Commented] (HDFS-5844) Fix broken link in WebHDFS.apt.vm

2014-01-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13885240#comment-13885240
 ] 

Hudson commented on HDFS-5844:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #465 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/465/])
HDFS-5844. Fix broken link in WebHDFS.apt.vm (Contributed by Akira Ajisaka) 
(arp: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1562357)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/WebHDFS.apt.vm


 Fix broken link in WebHDFS.apt.vm
 -

 Key: HDFS-5844
 URL: https://issues.apache.org/jira/browse/HDFS-5844
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: documentation
Affects Versions: 2.2.0
Reporter: Akira AJISAKA
Assignee: Akira AJISAKA
Priority: Minor
  Labels: newbie
 Fix For: 3.0.0, 2.3.0

 Attachments: HDFS-5844.patch


 There is one broken link in WebHDFS.apt.vm.
 {code}
 {{{RemoteException JSON Schema}}}
 {code}
 should be
 {code}
 {{RemoteException JSON Schema}}
 {code}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Work started] (HDFS-5702) FsShell Cli: Add XML based End-to-End test for getfacl and setfacl commands

2014-01-29 Thread Vinay (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HDFS-5702 started by Vinay.

 FsShell Cli: Add XML based End-to-End test for getfacl and setfacl commands
 ---

 Key: HDFS-5702
 URL: https://issues.apache.org/jira/browse/HDFS-5702
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: hdfs-client, namenode, security
Reporter: Vinay
Assignee: Vinay
 Attachments: HDFS-5702.patch, HDFS-5702.patch


 FsShell Cli: Add XML based End-to-End test for getfacl and setfacl commands



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5702) FsShell Cli: Add XML based End-to-End test for getfacl and setfacl commands

2014-01-29 Thread Vinay (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay updated HDFS-5702:


Attachment: HDFS-5702.patch

Added the 4 tests mentioned. 
Please review.
I couldn't avoid the long line in the expected message, as it's necessary to 
compare the exact output.

 FsShell Cli: Add XML based End-to-End test for getfacl and setfacl commands
 ---

 Key: HDFS-5702
 URL: https://issues.apache.org/jira/browse/HDFS-5702
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: hdfs-client, namenode, security
Reporter: Vinay
Assignee: Vinay
 Attachments: HDFS-5702.patch, HDFS-5702.patch


 FsShell Cli: Add XML based End-to-End test for getfacl and setfacl commands



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5754) Split LayoutVersion into NamenodeLayoutVersion and DatanodeLayoutVersion

2014-01-29 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13885317#comment-13885317
 ] 

Tsz Wo (Nicholas), SZE commented on HDFS-5754:
--

- In DataStorage, BPServiceActor and BlockPoolSliceStorage, it should not 
compare DATANODE_LAYOUT_VERSION with nsInfo.getLayoutVersion() anymore.

- Map<Integer, TreeSet<LayoutFeature>> should be Map<Integer, 
Set<LayoutFeature>>.  We should declare with the interface Set (or should we use 
SortedSet?) instead of the particular implementation TreeSet; a small sketch 
follows after these comments.

- In PBHelper, could we use null (i.e. unknown) instead of NodeType.NAME_NODE 
as default?  Or we could add a setStorageType(NodeType) method so that we could 
set it when it is null.

- The type parameter below is not used.  Should it be removed?
{code}
//Storage.java
   protected Storage(NodeType type, StorageInfo storageInfo) {
     super(storageInfo);
-    this.storageType = type;
   }
{code}

- I suggest to move the layout version related code out from NameNode and 
DataNode to new classes, say NameNodeLayoutVersion and DataNodeLayoutVersion.
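
A minimal sketch of the declaration suggested above, assuming the map key is the 
layout version; the variable name and the -55 key are placeholders, not code from 
the patch:
{code}
// Declare against the interface (Set or SortedSet); TreeSet remains only as
// the concrete implementation chosen at construction time.
Map<Integer, SortedSet<LayoutFeature>> featuresByLayoutVersion =
    new HashMap<Integer, SortedSet<LayoutFeature>>();
featuresByLayoutVersion.put(-55, new TreeSet<LayoutFeature>());
{code}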


 Split LayoutVersion into NamenodeLayoutVersion and DatanodeLayoutVersion 
 

 Key: HDFS-5754
 URL: https://issues.apache.org/jira/browse/HDFS-5754
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode, namenode
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Brandon Li
 Attachments: FeatureInfo.patch, HDFS-5754.001.patch, 
 HDFS-5754.002.patch, HDFS-5754.003.patch, HDFS-5754.004.patch, 
 HDFS-5754.006.patch, HDFS-5754.007.patch, HDFS-5754.008.patch, 
 HDFS-5754.009.patch, HDFS-5754.010.patch


 Currently, LayoutVersion defines the on-disk data format and supported 
 features of the entire cluster including NN and DNs.  LayoutVersion is 
 persisted in both NN and DNs.  When a NN/DN starts up, it checks its 
 supported LayoutVersion against the on-disk LayoutVersion.  Also, a DN with a 
 different LayoutVersion than NN cannot register with the NN.
 We propose to split LayoutVersion into two independent values that are local 
 to the nodes:
 - NamenodeLayoutVersion - defines the on-disk data format in NN, including 
 the format of FSImage, editlog and the directory structure.
 - DatanodeLayoutVersion - defines the on-disk data format in DN, including 
 the format of block data file, metadata file, block pool layout, and the 
 directory structure.  
 The LayoutVersion check will be removed in DN registration.  If 
 NamenodeLayoutVersion or DatanodeLayoutVersion is changed in a rolling 
 upgrade, then only rollback is supported and downgrade is not.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5844) Fix broken link in WebHDFS.apt.vm

2014-01-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13885331#comment-13885331
 ] 

Hudson commented on HDFS-5844:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1682 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1682/])
HDFS-5844. Fix broken link in WebHDFS.apt.vm (Contributed by Akira Ajisaka) 
(arp: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1562357)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/WebHDFS.apt.vm


 Fix broken link in WebHDFS.apt.vm
 -

 Key: HDFS-5844
 URL: https://issues.apache.org/jira/browse/HDFS-5844
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: documentation
Affects Versions: 2.2.0
Reporter: Akira AJISAKA
Assignee: Akira AJISAKA
Priority: Minor
  Labels: newbie
 Fix For: 3.0.0, 2.3.0

 Attachments: HDFS-5844.patch


 There is one broken link in WebHDFS.apt.vm.
 {code}
 {{{RemoteException JSON Schema}}}
 {code}
 should be
 {code}
 {{RemoteException JSON Schema}}
 {code}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5844) Fix broken link in WebHDFS.apt.vm

2014-01-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13885336#comment-13885336
 ] 

Hudson commented on HDFS-5844:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1657 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1657/])
HDFS-5844. Fix broken link in WebHDFS.apt.vm (Contributed by Akira Ajisaka) 
(arp: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1562357)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/WebHDFS.apt.vm


 Fix broken link in WebHDFS.apt.vm
 -

 Key: HDFS-5844
 URL: https://issues.apache.org/jira/browse/HDFS-5844
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: documentation
Affects Versions: 2.2.0
Reporter: Akira AJISAKA
Assignee: Akira AJISAKA
Priority: Minor
  Labels: newbie
 Fix For: 3.0.0, 2.3.0

 Attachments: HDFS-5844.patch


 There is one broken link in WebHDFS.apt.vm.
 {code}
 {{{RemoteException JSON Schema}}}
 {code}
 should be
 {code}
 {{RemoteException JSON Schema}}
 {code}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5585) Provide admin commands for data node upgrade

2014-01-29 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-5585:
-

Assignee: Kihwal Lee
  Status: Patch Available  (was: Open)

 Provide admin commands for data node upgrade
 

 Key: HDFS-5585
 URL: https://issues.apache.org/jira/browse/HDFS-5585
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode, ha, hdfs-client, namenode
Reporter: Kihwal Lee
Assignee: Kihwal Lee
 Attachments: HDFS-5585.patch


 Several new methods to ClientDatanodeProtocol may need to be added to support 
 querying version, initiating upgrade, etc.  The admin CLI needs to be added 
 as well. This primary use case is for rolling upgrade, but this can be used 
 for preparing for a graceful restart of a data node for any reasons.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5585) Provide admin commands for data node upgrade

2014-01-29 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-5585:
-

Attachment: HDFS-5585.patch

 Provide admin commands for data node upgrade
 

 Key: HDFS-5585
 URL: https://issues.apache.org/jira/browse/HDFS-5585
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode, ha, hdfs-client, namenode
Reporter: Kihwal Lee
 Attachments: HDFS-5585.patch


 Several new methods to ClientDatanodeProtocol may need to be added to support 
 querying version, initiating upgrade, etc.  The admin CLI needs to be added 
 as well. This primary use case is for rolling upgrade, but this can be used 
 for preparing for a graceful restart of a data node for any reasons.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5585) Provide admin commands for data node upgrade

2014-01-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13885338#comment-13885338
 ] 

Hadoop QA commented on HDFS-5585:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12625858/HDFS-5585.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5976//console

This message is automatically generated.

 Provide admin commands for data node upgrade
 

 Key: HDFS-5585
 URL: https://issues.apache.org/jira/browse/HDFS-5585
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode, ha, hdfs-client, namenode
Reporter: Kihwal Lee
Assignee: Kihwal Lee
 Attachments: HDFS-5585.patch


 Several new methods to ClientDatanodeProtocol may need to be added to support 
 querying version, initiating upgrade, etc.  The admin CLI needs to be added 
 as well. This primary use case is for rolling upgrade, but this can be used 
 for preparing for a graceful restart of a data node for any reasons.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5776) Support 'hedged' reads in DFSClient

2014-01-29 Thread Liang Xie (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liang Xie updated HDFS-5776:


Attachment: HDFS-5776-v11.txt

Attached v11:
1) modify isHedgedReadsEnabled() to consider the pool size as well
2) make setThreadsNumForHedgedReads private so the thread count can not be 
changed dynamically from the client side, and remove synchronized as well.

 Support 'hedged' reads in DFSClient
 ---

 Key: HDFS-5776
 URL: https://issues.apache.org/jira/browse/HDFS-5776
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs-client
Affects Versions: 3.0.0
Reporter: Liang Xie
Assignee: Liang Xie
 Attachments: HDFS-5776-v10.txt, HDFS-5776-v11.txt, HDFS-5776-v2.txt, 
 HDFS-5776-v3.txt, HDFS-5776-v4.txt, HDFS-5776-v5.txt, HDFS-5776-v6.txt, 
 HDFS-5776-v7.txt, HDFS-5776-v8.txt, HDFS-5776-v9.txt, HDFS-5776.txt


 This is a placeholder for backporting the HDFS-related parts of 
 https://issues.apache.org/jira/browse/HBASE-7509
 The quorum read ability should be helpful especially to optimize read outliers. 
 We can utilize dfs.dfsclient.quorum.read.threshold.millis and 
 dfs.dfsclient.quorum.read.threadpool.size to enable/disable the hedged read 
 ability from the client side (e.g. HBase), and by using DFSQuorumReadMetrics, we 
 could export the metric values of interest into the client system (e.g. HBase's 
 regionserver metrics).
 The core logic is in the pread code path: we decide to go to the original 
 fetchBlockByteRange or the newly introduced fetchBlockByteRangeSpeculative per 
 the above config items.
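 For illustration only, a rough sketch of that decision using a plain 
 CompletionService; the pool, task, and variable names are assumptions (it assumes 
 java.util.concurrent imports) and this is not the DFSClient implementation:
 {code}
 // Start the normal read; if it does not return within the configured
 // threshold, submit a second ("hedged") read and take whichever finishes first.
 CompletionService<ByteBuffer> cs =
     new ExecutorCompletionService<ByteBuffer>(hedgedReadPool);
 cs.submit(readFromFirstDatanode);                         // original pread
 Future<ByteBuffer> first = cs.poll(thresholdMillis, TimeUnit.MILLISECONDS);
 if (first != null) {
   return first.get();                                     // fast path
 }
 cs.submit(readFromAnotherDatanode);                       // speculative read
 return cs.take().get();                                   // first finisher wins
 {code}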



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5776) Support 'hedged' reads in DFSClient

2014-01-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13885512#comment-13885512
 ] 

Hadoop QA commented on HDFS-5776:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12625869/HDFS-5776-v11.txt
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  
org.apache.hadoop.hdfs.qjournal.client.TestQuorumJournalManager

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/5977//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/5977//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5977//console

This message is automatically generated.

 Support 'hedged' reads in DFSClient
 ---

 Key: HDFS-5776
 URL: https://issues.apache.org/jira/browse/HDFS-5776
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs-client
Affects Versions: 3.0.0
Reporter: Liang Xie
Assignee: Liang Xie
 Attachments: HDFS-5776-v10.txt, HDFS-5776-v11.txt, HDFS-5776-v2.txt, 
 HDFS-5776-v3.txt, HDFS-5776-v4.txt, HDFS-5776-v5.txt, HDFS-5776-v6.txt, 
 HDFS-5776-v7.txt, HDFS-5776-v8.txt, HDFS-5776-v9.txt, HDFS-5776.txt


 This is a placeholder for backporting the HDFS-related parts of 
 https://issues.apache.org/jira/browse/HBASE-7509
 The quorum read ability should be helpful especially to optimize read outliers. 
 We can utilize dfs.dfsclient.quorum.read.threshold.millis and 
 dfs.dfsclient.quorum.read.threadpool.size to enable/disable the hedged read 
 ability from the client side (e.g. HBase), and by using DFSQuorumReadMetrics, we 
 could export the metric values of interest into the client system (e.g. HBase's 
 regionserver metrics).
 The core logic is in the pread code path: we decide to go to the original 
 fetchBlockByteRange or the newly introduced fetchBlockByteRangeSpeculative per 
 the above config items.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5492) Port HDFS-2069 (Incorrect default trash interval in the docs) to trunk

2014-01-29 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13885557#comment-13885557
 ] 

Arpit Agarwal commented on HDFS-5492:
-

Thanks for cleaning up the doc, needs one fix.

{code}
+   small portions (4 KB, configurable), writes each portion to its local
{code}
The default packet size is 64KB. We can just avoid mentioning the exact size.

Thanks, Arpit.

 Port HDFS-2069 (Incorrect default trash interval in the docs) to trunk
 --

 Key: HDFS-5492
 URL: https://issues.apache.org/jira/browse/HDFS-5492
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: documentation
Affects Versions: 2.2.0
Reporter: Akira AJISAKA
Assignee: Akira AJISAKA
Priority: Minor
  Labels: documentation, newbie
 Attachments: HDFS-5492.patch, HDFS-5492.patch


 HDFS-2069 is not ported to the current document.
 The description of HDFS-2069 is as follows:
 {quote}
 Current HDFS architecture information about Trash is incorrectly documented 
 as -
 The current default policy is to delete files from /trash that are more than 
 6 hours old. In the future, this policy will be configurable through a well 
 defined interface.
 It should be something like -
 Current default trash interval is set to 0 (Deletes file without storing in 
 trash ) . This value is configurable parameter stored as fs.trash.interval 
 stored in core-site.xml .
 {quote}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5585) Provide admin commands for data node upgrade

2014-01-29 Thread Vinay (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13885590#comment-13885590
 ] 

Vinay commented on HDFS-5585:
-

Changes look good, Kihwal.

Some minor suggestions:
you might want to add \n at the end of these lines so the usage output looks 
better (see the sketch after the quoted lines),
bq. +String shutdownDatanode = -shutdownDatanode datanode_host:ipc_port 
\[upgrade\] +

bq. +String pingDatanode = -pingDatanode datanode_host:ipc_port +
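
A hedged illustration of that suggestion; the quotes and angle brackets below are 
reconstructed from the quoted hunks, so treat the exact strings as approximate:
{code}
// Terminate each usage line with \n so the help text prints one option per line.
String shutdownDatanode = "-shutdownDatanode <datanode_host:ipc_port> [upgrade]\n";
String pingDatanode = "-pingDatanode <datanode_host:ipc_port>\n";
{code}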

 Provide admin commands for data node upgrade
 

 Key: HDFS-5585
 URL: https://issues.apache.org/jira/browse/HDFS-5585
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode, ha, hdfs-client, namenode
Reporter: Kihwal Lee
Assignee: Kihwal Lee
 Attachments: HDFS-5585.patch


 Several new methods to ClientDatanodeProtocol may need to be added to support 
 querying version, initiating upgrade, etc.  The admin CLI needs to be added 
 as well. This primary use case is for rolling upgrade, but this can be used 
 for preparing for a graceful restart of a data node for any reasons.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5586) Add quick-restart option for datanode

2014-01-29 Thread Vinay (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13885593#comment-13885593
 ] 

Vinay commented on HDFS-5586:
-

I think this is being covered in HDFS-5585. 

Can we mark it as a duplicate?

 Add quick-restart option for datanode
 -

 Key: HDFS-5586
 URL: https://issues.apache.org/jira/browse/HDFS-5586
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode, ha, hdfs-client, namenode
Reporter: Kihwal Lee

 This feature, combined with the graceful shutdown feature, will enable data 
 nodes to come back up and start serving quickly.  This is likely a command 
 line option for the data node, which triggers it to look for saved state 
 information in its local storage.  If the information is present and 
 reasonably up-to-date, the data node would skip some of the startup steps.
 Ideally it should be able to do quick registration without requiring removal 
 of all blocks from the datanode descriptor on the name node and 
 reconstructing it with the initial full block report. This implies that all 
 RBW blocks are recorded during shutdown and on start-up they are not turned 
 into RWR. Other than the quick registration, the name node should treat the 
 restart as if a few heartbeats were lost from the node. There should be no 
 unexpected replica state changes.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5842) Cannot create hftp filesystem when using a proxy user ugi and a doAs on a secure cluster

2014-01-29 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13885626#comment-13885626
 ] 

Jing Zhao commented on HDFS-5842:
-

The failed test has been reported in HDFS-5718 and should be unrelated.

 Cannot create hftp filesystem when using a proxy user ugi and a doAs on a 
 secure cluster
 

 Key: HDFS-5842
 URL: https://issues.apache.org/jira/browse/HDFS-5842
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: security
Affects Versions: 2.2.0
Reporter: Arpit Gupta
Assignee: Jing Zhao
 Attachments: HADOOP-10215.000.patch, HADOOP-10215.001.patch, 
 HADOOP-10215.002.patch, HADOOP-10215.002.patch


 Noticed this while debugging issues in another application. We saw an error 
 when trying to do a FileSystem.get using an hftp file system on a secure 
 cluster using a proxy user ugi.
 This is the small snippet used:
 {code}
 FileSystem testFS = ugi.doAs(new PrivilegedExceptionAction<FileSystem>() {
   @Override
   public FileSystem run() throws IOException {
     return FileSystem.get(hadoopConf);
   }
 });
 {code}
 The same code worked for hdfs and webhdfs but not for hftp when the ugi used 
 was UserGroupInformation.createProxyUser
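 For reference, a hedged sketch of the proxy-user setup being described; the user 
 name, namenode host, and port below are placeholders, not values from the report:
 {code}
 final Configuration hadoopConf = new Configuration();
 UserGroupInformation realUser = UserGroupInformation.getLoginUser();
 UserGroupInformation ugi =
     UserGroupInformation.createProxyUser("someUser", realUser);
 FileSystem testFS = ugi.doAs(new PrivilegedExceptionAction<FileSystem>() {
   @Override
   public FileSystem run() throws IOException {
     // fails for hftp:// on a secure cluster; works for hdfs:// and webhdfs://
     return FileSystem.get(URI.create("hftp://namenode.example.com:50070"), hadoopConf);
   }
 });
 {code}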



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5776) Support 'hedged' reads in DFSClient

2014-01-29 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HDFS-5776:


Attachment: HDFS-5776-v12.txt

Address the findbugs warning.

[~jingzhao] Does this patch address your concerns?  (Thanks for the review)

 Support 'hedged' reads in DFSClient
 ---

 Key: HDFS-5776
 URL: https://issues.apache.org/jira/browse/HDFS-5776
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs-client
Affects Versions: 3.0.0
Reporter: Liang Xie
Assignee: Liang Xie
 Attachments: HDFS-5776-v10.txt, HDFS-5776-v11.txt, HDFS-5776-v12.txt, 
 HDFS-5776-v2.txt, HDFS-5776-v3.txt, HDFS-5776-v4.txt, HDFS-5776-v5.txt, 
 HDFS-5776-v6.txt, HDFS-5776-v7.txt, HDFS-5776-v8.txt, HDFS-5776-v9.txt, 
 HDFS-5776.txt


 This is a placeholder for backporting the HDFS-related parts of 
 https://issues.apache.org/jira/browse/HBASE-7509
 The quorum read ability should be helpful especially to optimize read outliers. 
 We can utilize dfs.dfsclient.quorum.read.threshold.millis and 
 dfs.dfsclient.quorum.read.threadpool.size to enable/disable the hedged read 
 ability from the client side (e.g. HBase), and by using DFSQuorumReadMetrics, we 
 could export the metric values of interest into the client system (e.g. HBase's 
 regionserver metrics).
 The core logic is in the pread code path: we decide to go to the original 
 fetchBlockByteRange or the newly introduced fetchBlockByteRangeSpeculative per 
 the above config items.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HDFS-5846) Assigning DEFAULT_RACK in resolveNetworkLocation method can break data resiliency

2014-01-29 Thread Nikola Vujic (JIRA)
Nikola Vujic created HDFS-5846:
--

 Summary: Assigning DEFAULT_RACK in resolveNetworkLocation method 
can break data resiliency
 Key: HDFS-5846
 URL: https://issues.apache.org/jira/browse/HDFS-5846
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Nikola Vujic
Assignee: Nikola Vujic


The method CachedDNSToSwitchMapping#resolve() can return NULL, which requires 
careful handling. Null can be returned in two cases:
• An error occurred during topology script execution (the script crashed).
• The script returned a wrong number of values (other than expected).

The critical handling is in the DN registration code, which is responsible for 
assigning proper topology paths to all registered datanodes. The existing code 
handles this NULL pointer in the following way 
({{resolveNetworkLocation}} method):
{code}
// resolve its network location
List<String> rName = dnsToSwitchMapping.resolve(names);
String networkLocation;
if (rName == null) {
  LOG.error("The resolve call returned null! Using " + 
      NetworkTopology.DEFAULT_RACK + " for host " + names);
  networkLocation = NetworkTopology.DEFAULT_RACK;
} else {
  networkLocation = rName.get(0);
}
return networkLocation;
{code}

The line of code that assigns the default rack:
{code} networkLocation = NetworkTopology.DEFAULT_RACK; {code} 
can cause a serious problem. If we somehow got NULL, then the default rack will 
be assigned as the DN's network location and the DN's registration will finish 
successfully. Under these circumstances, we will be able to load data into a 
cluster that is working with a wrong topology. A wrong topology means that 
fault domains are not honored. 

For the end user, it means that two data replicas can end up in the same fault 
domain and a single failure can cause the loss of two or more replicas. The 
cluster would be in an inconsistent state, but it would not be aware of that and 
the whole thing would work as if everything was fine. We can notice that 
something went wrong almost only by looking in the log for the error:
{code}
LOG.error("The resolve call returned null! Using " + 
    NetworkTopology.DEFAULT_RACK + " for host " + names);
{code}
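
Below is a hedged sketch of the stricter handling this report argues for, i.e. 
failing the registration when resolution returns null instead of silently 
falling back to the default rack; it is only an illustration, not a proposed patch:
{code}
// Sketch only: reject the registration when topology resolution fails,
// rather than assigning NetworkTopology.DEFAULT_RACK and continuing.
List<String> rName = dnsToSwitchMapping.resolve(names);
if (rName == null || rName.isEmpty()) {
  throw new IOException(            // or a dedicated exception type
      "Unable to resolve network location for host " + names);
}
return rName.get(0);
{code}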
 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HDFS-5847) Consolidate INodeReference into a separate section

2014-01-29 Thread Haohui Mai (JIRA)
Haohui Mai created HDFS-5847:


 Summary: Consolidate INodeReference into a separate section
 Key: HDFS-5847
 URL: https://issues.apache.org/jira/browse/HDFS-5847
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: HDFS-5698 (FSImage in protobuf)
Reporter: Haohui Mai
Assignee: Jing Zhao


Currently each INodeDirectorySection.Entry contains a variable number of 
INodeReference entries. The INodeReference entries are inlined, therefore it is 
difficult to quickly navigate through an INodeDirectorySection.Entry. Skipping 
through an INodeDirectorySection.Entry without parsing it is essential for 
parsing these entries in parallel.

This jira proposes to consolidate the INodeReferences into a separate section and 
give each of them an ID. The INodeDirectorySection.Entry can then store the list 
of IDs as a repeated field. That way we can leverage the existing code in 
protobuf to quickly skip through an INodeDirectorySection.Entry.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5842) Cannot create hftp filesystem when using a proxy user ugi and a doAs on a secure cluster

2014-01-29 Thread Jitendra Nath Pandey (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13885661#comment-13885661
 ] 

Jitendra Nath Pandey commented on HDFS-5842:


checkTGTAndReloginFromKeytab is removed; this will cause issues once the TGT expires.

 Cannot create hftp filesystem when using a proxy user ugi and a doAs on a 
 secure cluster
 

 Key: HDFS-5842
 URL: https://issues.apache.org/jira/browse/HDFS-5842
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: security
Affects Versions: 2.2.0
Reporter: Arpit Gupta
Assignee: Jing Zhao
 Attachments: HADOOP-10215.000.patch, HADOOP-10215.001.patch, 
 HADOOP-10215.002.patch, HADOOP-10215.002.patch


 Noticed this while debugging issues in another application. We saw an error 
 when trying to do a FileSystem.get using an hftp file system on a secure 
 cluster using a proxy user ugi.
 This is the small snippet used:
 {code}
 FileSystem testFS = ugi.doAs(new PrivilegedExceptionAction<FileSystem>() {
   @Override
   public FileSystem run() throws IOException {
     return FileSystem.get(hadoopConf);
   }
 });
 {code}
 The same code worked for hdfs and webhdfs but not for hftp when the ugi used 
 was UserGroupInformation.createProxyUser



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5842) Cannot create hftp filesystem when using a proxy user ugi and a doAs on a secure cluster

2014-01-29 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13885668#comment-13885668
 ] 

Jing Zhao commented on HDFS-5842:
-

Thanks for the review, Jitendra. So checkTGTAndReloginFromKeytab is always 
called in URLConnectionFactory#openConnection, which is called by 
getDT/renewDT/cancelDT. Thus I think we do not need to call 
checkTGTAndReloginFromKeytab multiple times here.

 Cannot create hftp filesystem when using a proxy user ugi and a doAs on a 
 secure cluster
 

 Key: HDFS-5842
 URL: https://issues.apache.org/jira/browse/HDFS-5842
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: security
Affects Versions: 2.2.0
Reporter: Arpit Gupta
Assignee: Jing Zhao
 Attachments: HADOOP-10215.000.patch, HADOOP-10215.001.patch, 
 HADOOP-10215.002.patch, HADOOP-10215.002.patch


 Noticed this while debugging issues in another application. We saw an error 
 when trying to do a FileSystem.get using an hftp file system on a secure 
 cluster using a proxy user ugi.
 This is the small snippet used:
 {code}
 FileSystem testFS = ugi.doAs(new PrivilegedExceptionAction<FileSystem>() {
   @Override
   public FileSystem run() throws IOException {
     return FileSystem.get(hadoopConf);
   }
 });
 {code}
 The same code worked for hdfs and webhdfs but not for hftp when the ugi used 
 was UserGroupInformation.createProxyUser



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5585) Provide admin commands for data node upgrade

2014-01-29 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13885681#comment-13885681
 ] 

Kihwal Lee commented on HDFS-5585:
--

Sorry my bad. I thought I fixed all missing newlines while testing. I will 
revise the patch soon.

 Provide admin commands for data node upgrade
 

 Key: HDFS-5585
 URL: https://issues.apache.org/jira/browse/HDFS-5585
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode, ha, hdfs-client, namenode
Reporter: Kihwal Lee
Assignee: Kihwal Lee
 Attachments: HDFS-5585.patch


 Several new methods to ClientDatanodeProtocol may need to be added to support 
 querying version, initiating upgrade, etc.  The admin CLI needs to be added 
 as well. This primary use case is for rolling upgrade, but this can be used 
 for preparing for a graceful restart of a data node for any reasons.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5771) Track progress when loading fsimage

2014-01-29 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-5771:
-

Attachment: HDFS-5771.002.patch

Thanks Chris for the review. The v2 patch makes sure that {{beginStep()}} and 
{{endStep()}} are called exactly once for each step.

It also records the storage path in the step.
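
For context, here is a hedged sketch of the begin/end pairing being described; the 
Phase, StepType, {{storagePath}}, and loader call below are illustrative 
assumptions, not code from the patch:
{code}
StartupProgress prog = NameNode.getStartupProgress();
Step step = new Step(StepType.INODES, storagePath);   // the step records the storage path
prog.beginStep(Phase.LOADING_FSIMAGE, step);
try {
  loadSection(in);                                     // hypothetical loader call
} finally {
  prog.endStep(Phase.LOADING_FSIMAGE, step);           // end the step exactly once
}
{code}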

 Track progress when loading fsimage
 ---

 Key: HDFS-5771
 URL: https://issues.apache.org/jira/browse/HDFS-5771
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: HDFS-5698 (FSImage in protobuf)
Reporter: Haohui Mai
Assignee: Haohui Mai
 Attachments: HDFS-5771.000.patch, HDFS-5771.001.patch, 
 HDFS-5771.002.patch


 The old code that loads the fsimage tracks the progress during loading. This 
 jira proposes to implement the same functionality in the new code which 
 serializes the fsimage using protobuf.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5585) Provide admin commands for data node upgrade

2014-01-29 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-5585:
-

Attachment: HDFS-5585.patch

 Provide admin commands for data node upgrade
 

 Key: HDFS-5585
 URL: https://issues.apache.org/jira/browse/HDFS-5585
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode, ha, hdfs-client, namenode
Reporter: Kihwal Lee
Assignee: Kihwal Lee
 Attachments: HDFS-5585.patch, HDFS-5585.patch


 Several new methods to ClientDatanodeProtocol may need to be added to support 
 querying version, initiating upgrade, etc.  The admin CLI needs to be added 
 as well. This primary use case is for rolling upgrade, but this can be used 
 for preparing for a graceful restart of a data node for any reasons.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5842) Cannot create hftp filesystem when using a proxy user ugi and a doAs on a secure cluster

2014-01-29 Thread Jitendra Nath Pandey (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13885685#comment-13885685
 ] 

Jitendra Nath Pandey commented on HDFS-5842:


bq. URLConnectionFactory#openConnection, which is called by 
getDT/renewDT/cancelDT. Thus I think we do not need to call 
checkTGTAndReloginFromKeytab multiple times here.

Okay, sounds good. +1 for the patch.

 Cannot create hftp filesystem when using a proxy user ugi and a doAs on a 
 secure cluster
 

 Key: HDFS-5842
 URL: https://issues.apache.org/jira/browse/HDFS-5842
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: security
Affects Versions: 2.2.0
Reporter: Arpit Gupta
Assignee: Jing Zhao
 Attachments: HADOOP-10215.000.patch, HADOOP-10215.001.patch, 
 HADOOP-10215.002.patch, HADOOP-10215.002.patch


 Noticed this while debugging issues in another application. We saw an error 
 when trying to do a FileSystem.get using an hftp file system on a secure 
 cluster using a proxy user ugi.
 This is the small snippet used:
 {code}
 FileSystem testFS = ugi.doAs(new PrivilegedExceptionAction<FileSystem>() {
   @Override
   public FileSystem run() throws IOException {
     return FileSystem.get(hadoopConf);
   }
 });
 {code}
 The same code worked for hdfs and webhdfs but not for hftp when the ugi used 
 was UserGroupInformation.createProxyUser



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5585) Provide admin commands for data node upgrade

2014-01-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13885693#comment-13885693
 ] 

Hadoop QA commented on HDFS-5585:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12625935/HDFS-5585.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5980//console

This message is automatically generated.

 Provide admin commands for data node upgrade
 

 Key: HDFS-5585
 URL: https://issues.apache.org/jira/browse/HDFS-5585
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode, ha, hdfs-client, namenode
Reporter: Kihwal Lee
Assignee: Kihwal Lee
 Attachments: HDFS-5585.patch, HDFS-5585.patch


 Several new methods to ClientDatanodeProtocol may need to be added to support 
 querying version, initiating upgrade, etc.  The admin CLI needs to be added 
 as well. This primary use case is for rolling upgrade, but this can be used 
 for preparing for a graceful restart of a data node for any reasons.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5771) Track progress when loading fsimage

2014-01-29 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai updated HDFS-5771:
-

Attachment: HDFS-5771.003.patch

The v3 patch places the {{currentStep}} variable correctly.

 Track progress when loading fsimage
 ---

 Key: HDFS-5771
 URL: https://issues.apache.org/jira/browse/HDFS-5771
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: HDFS-5698 (FSImage in protobuf)
Reporter: Haohui Mai
Assignee: Haohui Mai
 Attachments: HDFS-5771.000.patch, HDFS-5771.001.patch, 
 HDFS-5771.002.patch, HDFS-5771.003.patch


 The old code that loads the fsimage tracks the progress during loading. This 
 jira proposes to implement the same functionality in the new code which 
 serializes the fsimage using protobuf.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5796) The file system browser in the namenode UI requires SPNEGO.

2014-01-29 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated HDFS-5796:
-

Target Version/s: 2.4.0  (was: )

 The file system browser in the namenode UI requires SPNEGO.
 ---

 Key: HDFS-5796
 URL: https://issues.apache.org/jira/browse/HDFS-5796
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.3.0
Reporter: Kihwal Lee
Priority: Critical

 After HDFS-5382, the browser makes webhdfs REST calls directly, requiring 
 SPNEGO to work between the user's browser and the namenode.  This won't work if 
 the cluster's security infrastructure is isolated from the regular network.  
 Moreover, SPNEGO is not supposed to be required for user-facing web pages.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5356) MiniDFSCluster should close all open FileSystems when shutdown()

2014-01-29 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated HDFS-5356:
-

Target Version/s: 2.4.0  (was: )

 MiniDFSCluster should close all open FileSystems when shutdown()
 ---

 Key: HDFS-5356
 URL: https://issues.apache.org/jira/browse/HDFS-5356
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Affects Versions: 3.0.0, 2.2.0
Reporter: haosdent
Priority: Critical
 Attachments: HDFS-5356.patch


 After adding some metrics functions to DFSClient, I found that some unit tests 
 related to metrics failed. Because MiniDFSCluster never closes open 
 FileSystems, DFSClients are alive after MiniDFSCluster shutdown(). The 
 metrics of those DFSClients still exist in DefaultMetricsSystem, and this makes 
 other unit tests fail.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5500) Critical datanode threads may terminate silently on uncaught exceptions

2014-01-29 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated HDFS-5500:
-

Target Version/s: 2.4.0  (was: )

 Critical datanode threads may terminate silently on uncaught exceptions
 ---

 Key: HDFS-5500
 URL: https://issues.apache.org/jira/browse/HDFS-5500
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Kihwal Lee
Priority: Critical

 We've seen the refreshUsed (DU) thread disappear on uncaught exceptions. This 
 can go unnoticed for a long time.  If OOM occurs, more things can go wrong.  
 On one occasion, the Timer, multiple refreshUsed, and DataXceiverServer threads 
 had terminated.  
 DataXceiverServer catches OutOfMemoryError and sleeps for 30 seconds, but I 
 am not sure it is really helpful. In one case, the thread did it multiple 
 times then terminated. I suspect another OOM was thrown while in a catch 
 block.  As a result, the server socket was not closed and clients hung on 
 connect. If it had at least closed the socket, the client side would have been 
 impacted less.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5138) Support HDFS upgrade in HA

2014-01-29 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated HDFS-5138:
-

Target Version/s: 2.4.0  (was: )

 Support HDFS upgrade in HA
 --

 Key: HDFS-5138
 URL: https://issues.apache.org/jira/browse/HDFS-5138
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.1.1-beta
Reporter: Kihwal Lee
Assignee: Aaron T. Myers
Priority: Blocker
 Fix For: 3.0.0

 Attachments: HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, 
 HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, 
 HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch, 
 hdfs-5138-branch-2.txt


 With HA enabled, the NN won't start with -upgrade. Since there has been a layout 
 version change between 2.0.x and 2.1.x, starting the NN in upgrade mode was 
 necessary when deploying 2.1.x to an existing 2.0.x cluster. But the only way 
 to get around this was to disable HA and upgrade. 
 The NN and the cluster cannot be flipped back to HA until the upgrade is 
 finalized. If HA is disabled only on the NN for the layout upgrade and HA is 
 turned back on without involving DNs, things will work, but finalizeUpgrade 
 won't work (the NN is in HA and it cannot be in upgrade mode) and the DNs' 
 upgrade snapshots won't get removed.
 We will need a different way of doing the layout upgrade and upgrade snapshot.  
 I am marking this as a 2.1.1-beta blocker based on feedback from others.  If 
 there is a reasonable workaround that does not increase the maintenance window 
 greatly, we can lower its priority from blocker to critical.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5293) Symlink resolution requires unnecessary RPCs

2014-01-29 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated HDFS-5293:
-

Target Version/s: 3.0.0, 2.4.0  (was: 3.0.0)

 Symlink resolution requires unnecessary RPCs
 

 Key: HDFS-5293
 URL: https://issues.apache.org/jira/browse/HDFS-5293
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Daryn Sharp
Priority: Critical

 When the NN encounters a symlink, it throws an {{UnresolvedLinkException}}.  
 This exception contains only the path that is a symlink.  The client issues 
 another RPC to obtain the link target, followed by another RPC with the link 
 target + remainder of the original path.
 {{UnresolvedLinkException}} should be returning both the link and the target 
 to avoid a costly and unnecessary intermediate RPC to obtain the link target.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-782) dynamic replication

2014-01-29 Thread Jordan Mendelson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13885774#comment-13885774
 ] 

Jordan Mendelson commented on HDFS-782:
---

Could this not be implemented in response to a client reading a remote block? 
The client will already be copying the block across the network in order to 
operate on it. A replication storm shouldn't happen unnecessarily in this case 
since it isn't proactively copying. Since the client is reading the remote 
block, we can be reasonably sure that the block could use an extra replica. 

This could also speed up the case of replicating a recently written block, since 
we can reuse the data that has just been copied (even if it is a sub-optimal 
location for the block, it would at least increase data availability until it 
can be replicated properly). Deletion of over-replicated blocks could happen 
when free space becomes low.

The downside seems to be the potential for extra disk writes. If every remote 
read of a complete block leads to storage of that block on the machine doing 
the read, we could end up writing a lot of data. Though it seems like this 
could be somewhat mitigated with some sort of upper-replica limit.

 dynamic replication
 ---

 Key: HDFS-782
 URL: https://issues.apache.org/jira/browse/HDFS-782
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Ning Zhang

 In a large and busy cluster, a block can be requested by many clients at the 
 same time. HDFS-767 tries to solve the failing case when the # of retries 
 exceeds the maximum # of retries. However, that patch doesn't solve the 
 performance issue since all failing clients have to wait a certain period 
 before retry, and the # of retries could be high. 
 One solution to the performance issue is to increase the # of replicas 
 for this hot block dynamically when it is requested many times within a short 
 period. The name node needs to be aware of such a situation and only clean up 
 extra replicas when they have not been accessed recently. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HDFS-5848) Add a DatanodeCommand to inform datanodes that rolling upgrade is in progress

2014-01-29 Thread Tsz Wo (Nicholas), SZE (JIRA)
Tsz Wo (Nicholas), SZE created HDFS-5848:


 Summary: Add a DatanodeCommand to inform datanodes that rolling 
upgrade is in progress
 Key: HDFS-5848
 URL: https://issues.apache.org/jira/browse/HDFS-5848
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Tsz Wo (Nicholas), SZE


When a rolling upgrade is in progress, the NN should inform datanodes via heartbeat 
responses so that the datanodes create hardlinks when deleting blocks.  We 
need to add a new DatanodeCommand here.  The datanode change will be done in a 
separate JIRA.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5848) Add a DatanodeCommand to inform datanodes that rolling upgrade is in progress

2014-01-29 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated HDFS-5848:
-

Attachment: h5848_20130130.patch

h5848_20130130.patch: adds RollingUpgradeCommand.

 Add a DatanodeCommand to inform datanodes that rolling upgrade is in progress
 -

 Key: HDFS-5848
 URL: https://issues.apache.org/jira/browse/HDFS-5848
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Tsz Wo (Nicholas), SZE
 Attachments: h5848_20130130.patch


 When a rolling upgrade is in progress, the NN should inform datanodes via heartbeat 
 responses so that the datanodes create hardlinks when deleting blocks.  We 
 need to add a new DatanodeCommand here.  The datanode change will be done in 
 a separate JIRA.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5843) DFSClient.getFileChecksum() throws IOException if checksum is disabled

2014-01-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13885793#comment-13885793
 ] 

Hadoop QA commented on HDFS-5843:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12625529/hdfs-5843.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/5978//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5978//console

This message is automatically generated.

 DFSClient.getFileChecksum() throws IOException if checksum is disabled
 --

 Key: HDFS-5843
 URL: https://issues.apache.org/jira/browse/HDFS-5843
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Reporter: Laurent Goujon
 Attachments: hdfs-5843.patch


 If a file is created with checksum disabled (using {{ChecksumOpt.disabled()}} 
 for example), calling {{FileSystem.getFileChecksum()}} throws the following 
 IOException:
 {noformat}
 java.io.IOException: Fail to get block MD5 for 
 BP-341493254-192.168.1.10-1390888724459:blk_1073741825_1001
   at org.apache.hadoop.hdfs.DFSClient.getFileChecksum(DFSClient.java:1965)
   at org.apache.hadoop.hdfs.DFSClient.getFileChecksum(DFSClient.java:1771)
   at 
 org.apache.hadoop.hdfs.DistributedFileSystem$21.doCall(DistributedFileSystem.java:1186)
   at 
 org.apache.hadoop.hdfs.DistributedFileSystem$21.doCall(DistributedFileSystem.java:1)
   at 
 org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
   at 
 org.apache.hadoop.hdfs.DistributedFileSystem.getFileChecksum(DistributedFileSystem.java:1194)
 [...]
 {noformat}
 From the logs, the datanode is doing some wrong arithmetic because of 
 crcPerBlock:
 {noformat}
 2014-01-27 21:58:46,329 ERROR datanode.DataNode (DataXceiver.java:run(225)) - 
 127.0.0.1:52398:DataXceiver error processing BLOCK_CHECKSUM operation  src: 
 /127.0.0.1:52407 dest: /127.0.0.1:52398
 java.lang.ArithmeticException: / by zero
   at 
 org.apache.hadoop.hdfs.server.datanode.DataXceiver.blockChecksum(DataXceiver.java:658)
   at 
 org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opBlockChecksum(Receiver.java:169)
   at 
 org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:77)
   at 
 org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:221)
   at java.lang.Thread.run(Thread.java:695)
 {noformat}
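 A hedged illustration of the kind of guard that would avoid the division above; 
 this is not the actual DataXceiver code and the variable names are assumptions:
 {code}
 // When checksums are disabled the per-checksum size is 0, so guard the
 // division used to compute crcPerBlock instead of dividing unconditionally.
 int checksumSize = checksum.getChecksumSize();   // 0 when checksums are disabled
 long crcPerBlock = checksumSize <= 0
     ? 0
     : (metadataFileLength - headerSize) / checksumSize;
 {code}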



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HDFS-5849) Removing ACL from an inode fails if it has only a default ACL.

2014-01-29 Thread Chris Nauroth (JIRA)
Chris Nauroth created HDFS-5849:
---

 Summary: Removing ACL from an inode fails if it has only a default 
ACL.
 Key: HDFS-5849
 URL: https://issues.apache.org/jira/browse/HDFS-5849
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: HDFS ACLs (HDFS-4685)
Reporter: Chris Nauroth
Assignee: Chris Nauroth


When removing an ACL, the logic must restore the group permission previously 
stored in an ACL entry back into the group permission bits.  The logic for this 
in {{AclTransformation#removeINodeAcl}} assumes that the group entry must be 
found in the former ACL.  This is not the case when removing the ACL from an 
inode that only had a default ACL and not an access ACL.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5746) add ShortCircuitSharedMemorySegment

2014-01-29 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-5746:
---

Attachment: HDFS-5746.005.patch

* rename 'anchor' and 'unanchor' to 'addAnchor' and 'removeAnchor'

* add a stress test for DomainSocketWatcher, which also includes some remove 
operations.

* make some messages TRACE that were formerly INFO; we don't want INFO logs 
when handling every event

* fix a bug in the native code where we were reallocating the fd_set_data 
structure, but writing the new length to the old structure

* put addNotificationSocket in a new function to avoid cluttering the main 
loop.  Remember to increment the reference count on the notificationSocket so 
that we don't get logs about mismatched reference counts when shutting down the 
watcher

 add ShortCircuitSharedMemorySegment
 ---

 Key: HDFS-5746
 URL: https://issues.apache.org/jira/browse/HDFS-5746
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode, hdfs-client
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Fix For: 3.0.0

 Attachments: HDFS-5746.001.patch, HDFS-5746.002.patch, 
 HDFS-5746.003.patch, HDFS-5746.004.patch, HDFS-5746.005.patch


 Add ShortCircuitSharedMemorySegment, which will be used to communicate 
 information between the datanode and the client about whether a replica is 
 mlocked.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5399) Revisit SafeModeException and corresponding retry policies

2014-01-29 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13885808#comment-13885808
 ] 

Colin Patrick McCabe commented on HDFS-5399:


It seems that after HDFS-5291, the client tries to fail over to the other 
namenode after getting a SafeMode exception.  This seems wrong, since it means 
that the client will keep retrying forever (and hang) until the namespace comes 
out of safemode.

Formerly, we did not retry safe mode exceptions, whether or not we were in HA 
mode.  This is the correct behavior, right?

 Revisit SafeModeException and corresponding retry policies
 --

 Key: HDFS-5399
 URL: https://issues.apache.org/jira/browse/HDFS-5399
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 3.0.0
Reporter: Jing Zhao
Assignee: Jing Zhao

 Currently for NN SafeMode, we have the following corresponding retry policies:
 # In a non-HA setup, for certain API calls (e.g., create), the client will retry 
 if the NN is in SafeMode. Specifically, the client side's RPC adopts the 
 MultipleLinearRandomRetry policy for a wrapped SafeModeException when retry 
 is enabled.
 # In HA setup, the client will retry if the NN is Active and in SafeMode. 
 Specifically, the SafeModeException is wrapped as a RetriableException in the 
 server side. Client side's RPC uses FailoverOnNetworkExceptionRetry policy 
 which recognizes RetriableException (see HDFS-5291).
 There are several possible issues in the current implementation:
 # The NN SafeMode can be a Manual SafeMode (i.e., started by administrator 
 through CLI), and the clients may not want to retry on this type of SafeMode.
 # Client may want to retry on other API calls in non-HA setup.
 # We should have a single generic strategy to address the mapping between 
 SafeMode and retry policy for both HA and non-HA setups. A possible 
 straightforward solution is to always wrap the SafeModeException in a 
 RetriableException to indicate that the clients should retry (a sketch of this 
 idea follows below).
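 A hedged sketch of that last option; the surrounding names ({{src}}, 
 {{safeMode}}, {{isInSafeMode()}}) are assumptions for illustration only:
 {code}
 // Server side: wrap the safe-mode failure so the client-side retry policy
 // (e.g. FailoverOnNetworkExceptionRetry) treats the call as retriable.
 if (isInSafeMode()) {
   throw new RetriableException(
       new SafeModeException("Cannot create file " + src, safeMode));
 }
 {code}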



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5776) Support 'hedged' reads in DFSClient

2014-01-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13885813#comment-13885813
 ] 

Hadoop QA commented on HDFS-5776:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12625915/HDFS-5776-v12.txt
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.server.namenode.TestAuditLogs

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/5979//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5979//console

This message is automatically generated.

 Support 'hedged' reads in DFSClient
 ---

 Key: HDFS-5776
 URL: https://issues.apache.org/jira/browse/HDFS-5776
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs-client
Affects Versions: 3.0.0
Reporter: Liang Xie
Assignee: Liang Xie
 Attachments: HDFS-5776-v10.txt, HDFS-5776-v11.txt, HDFS-5776-v12.txt, 
 HDFS-5776-v2.txt, HDFS-5776-v3.txt, HDFS-5776-v4.txt, HDFS-5776-v5.txt, 
 HDFS-5776-v6.txt, HDFS-5776-v7.txt, HDFS-5776-v8.txt, HDFS-5776-v9.txt, 
 HDFS-5776.txt


 This is a placeholder for the HDFS-related part of the backport from 
 https://issues.apache.org/jira/browse/HBASE-7509
 The quorum read ability should be helpful, especially to optimize read outliers.
 We can utilize dfs.dfsclient.quorum.read.threshold.millis and 
 dfs.dfsclient.quorum.read.threadpool.size to enable/disable the hedged read 
 ability from the client side (e.g. HBase), and by using DFSQuorumReadMetrics, we 
 could export the metrics of interest into the client system (e.g. HBase's 
 regionserver metrics).
 The core logic is in the pread code path: we decide whether to go to the 
 original fetchBlockByteRange or the newly introduced 
 fetchBlockByteRangeSpeculative per the above config items.
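
To make the pread decision above concrete, here is a minimal, hedged Java
sketch of the hedged-read pattern. It is not the DFSClient code: the
thresholdMillis parameter stands in for
dfs.dfsclient.quorum.read.threshold.millis and the fixed pool for
dfs.dfsclient.quorum.read.threadpool.size.

{code}
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Illustrative hedged-read pattern: start a backup read only if the primary
// read has not finished within the threshold, then return whichever attempt
// completes first.
public class HedgedReadSketch {
  private static final ExecutorService POOL = Executors.newFixedThreadPool(4);

  static byte[] hedgedRead(Callable<byte[]> primary, Callable<byte[]> backup,
      long thresholdMillis) throws Exception {
    CompletionService<byte[]> ecs = new ExecutorCompletionService<byte[]>(POOL);
    ecs.submit(primary);
    Future<byte[]> first = ecs.poll(thresholdMillis, TimeUnit.MILLISECONDS);
    if (first != null) {
      return first.get();          // primary finished within the threshold
    }
    ecs.submit(backup);            // hedge: fire the speculative read
    return ecs.take().get();       // whichever attempt finishes first wins
  }
}
{code}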



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HDFS-5850) DNS Issues during TrashEmptier initialization can silently leave it non-functional

2014-01-29 Thread Kihwal Lee (JIRA)
Kihwal Lee created HDFS-5850:


 Summary: DNS Issues during TrashEmptier initialization can 
silently leave it non-functional
 Key: HDFS-5850
 URL: https://issues.apache.org/jira/browse/HDFS-5850
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Kihwal Lee
Priority: Critical


[~knoguchi] once noticed that the trash directories of a restarted cluster are 
not cleaned up. It turned out that it was caused by a transient DNS problem 
during initialization.

TrashEmptier thread in namenode is actually a FileSystem client running in a 
loop, which makes RPC calls to itself in order  to list, rename and delete 
trash files.  In a secure setup, the client needs to create the right service 
principal name for the namenode for making a RPC connection. If there is a DNS 
issue at that moment, the SPN ends up with the IP address, not the fqdn.

Since KDC does not recognize this SPN, TrashEmptier does not work from that 
point on. I verified that the SPN with the IP address was what the TrashEmptier 
thread asked KDC for a service ticket for.
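
A small, self-contained Java sketch of this failure mode (the names are
illustrative, not the actual Hadoop code): the SPN is built by substituting
_HOST with the canonical hostname, and when reverse DNS fails the canonical
name silently degrades to the textual IP, producing an SPN the KDC does not
recognize.

{code}
import java.net.InetAddress;
import java.net.UnknownHostException;

// Illustrative only: shows how a transient reverse-DNS failure turns
// hdfs/_HOST@REALM into hdfs/<ip-address>@REALM.
public class ServicePrincipalSketch {
  static String buildSpn(String pattern, InetAddress nnAddr, String realm) {
    // getCanonicalHostName() falls back to the textual IP address when the
    // reverse lookup fails, so the substituted SPN ends up with the IP.
    String host = nnAddr.getCanonicalHostName();
    return pattern.replace("_HOST", host) + "@" + realm;
  }

  public static void main(String[] args) throws UnknownHostException {
    InetAddress nn = InetAddress.getByName("127.0.0.1");
    System.out.println(buildSpn("hdfs/_HOST", nn, "EXAMPLE.COM"));
  }
}
{code}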



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5399) Revisit SafeModeException and corresponding retry policies

2014-01-29 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13885837#comment-13885837
 ] 

Jing Zhao commented on HDFS-5399:
-

The client will not fail over. It will retry the same NN (and this NN throws 
RetriableException only when it's in the active state). But I think we may want 
to add a maximum number of retries there.

bq. Formerly, we did not retry safe mode exceptions, whether or not we were in 
HA mode.
The issue with an HA setup is that the SBN may stay in safemode for a long time, 
and when it transitions to the active state, it needs at least 30s to come out 
of safemode. This makes the actual failover time long, since the old 
behavior was that the client would retry only once. This can then cause the HBase 
region server to time out and kill itself. Thus we need to let the client wait 
and retry for a longer time.

In the meantime, I think we should revisit this safemode extension and see 
if we can avoid having the NN go into unnecessary safemode, and shorten the 
safemode period.
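
For illustration, a minimal sketch of the wrapping approach under discussion,
using stand-in classes rather than the real NameNode internals: a safe mode
failure is wrapped in a retriable exception so that a retry-aware RPC policy
keeps retrying the same NN instead of giving up immediately.

{code}
import java.io.IOException;

// Stand-ins for the real exceptions; only the wrapping pattern matters here.
class SafeModeException extends IOException {
  SafeModeException(String msg) { super(msg); }
}

class RetriableException extends IOException {
  RetriableException(IOException cause) { super(cause); }
}

class SafeModeCheckSketch {
  private volatile boolean inSafeMode = true;

  void checkOperation(String op) throws IOException {
    if (inSafeMode) {
      // Wrapping signals to the client's retry policy that the condition is
      // transient and the same NN should simply be retried later.
      throw new RetriableException(
          new SafeModeException("Cannot " + op + ": NameNode is in safe mode"));
    }
  }
}
{code}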

 Revisit SafeModeException and corresponding retry policies
 --

 Key: HDFS-5399
 URL: https://issues.apache.org/jira/browse/HDFS-5399
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 3.0.0
Reporter: Jing Zhao
Assignee: Jing Zhao

 Currently for NN SafeMode, we have the following corresponding retry policies:
 # In non-HA setup, for certain API call (create), the client will retry if 
 the NN is in SafeMode. Specifically, the client side's RPC adopts 
 MultipleLinearRandomRetry policy for a wrapped SafeModeException when retry 
 is enabled.
 # In HA setup, the client will retry if the NN is Active and in SafeMode. 
 Specifically, the SafeModeException is wrapped as a RetriableException in the 
 server side. Client side's RPC uses FailoverOnNetworkExceptionRetry policy 
 which recognizes RetriableException (see HDFS-5291).
 There are several possible issues in the current implementation:
 # The NN SafeMode can be a Manual SafeMode (i.e., started by administrator 
 through CLI), and the clients may not want to retry on this type of SafeMode.
 # Client may want to retry on other API calls in non-HA setup.
 # We should have a single generic strategy to address the mapping between 
 SafeMode and retry policy for both HA and non-HA setup. A possible 
 straightforward solution is to always wrap the SafeModeException in the 
 RetriableException to indicate that the clients should retry.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5841) Update HDFS caching documentation with new changes

2014-01-29 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13885838#comment-13885838
 ] 

Colin Patrick McCabe commented on HDFS-5841:


+1 pending jenkins

 Update HDFS caching documentation with new changes
 --

 Key: HDFS-5841
 URL: https://issues.apache.org/jira/browse/HDFS-5841
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.3.0
Reporter: Andrew Wang
Assignee: Andrew Wang
  Labels: caching
 Attachments: hdfs-5841-1.patch, hdfs-5841-2.patch, hdfs-5841-3.patch


 The caching documentation is a little out of date, since it's missing 
 description of features like TTL and expiration.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5850) DNS Issues during TrashEmptier initialization can silently leave it non-functional

2014-01-29 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-5850:
-

Description: 
[~knoguchi] recently noticed that the trash directories of a restarted cluster 
are not cleaned up. It turned out that it was caused by a transient DNS problem 
during initialization.

TrashEmptier thread in namenode is actually a FileSystem client running in a 
loop, which makes RPC calls to itself in order  to list, rename and delete 
trash files.  In a secure setup, the client needs to create the right service 
principal name for the namenode for making a RPC connection. If there is a DNS 
issue at that moment, the SPN ends up with the IP address, not the fqdn.

Since KDC does not recognize this SPN, TrashEmptier does not work from that 
point on. I verified that the SPN with the IP address was what the TrashEmptier 
thread asked KDC for a service ticket for.

  was:
[~knoguchi] once noticed that the trash directories of a restarted cluster are 
not cleaned up. It turned out that it was caused by a transient DNS problem 
during initialization.

TrashEmptier thread in namenode is actually a FileSystem client running in a 
loop, which makes RPC calls to itself in order  to list, rename and delete 
trash files.  In a secure setup, the client needs to create the right service 
principal name for the namenode for making a RPC connection. If there is a DNS 
issue at that moment, the SPN ends up with the IP address, not the fqdn.

Since KDC does not recognize this SPN, TrashEmptier does not work from that 
point on. I verified that the SPN with the IP address was what the TrashEmptier 
thread asked KDC for a service ticket for.


 DNS Issues during TrashEmptier initialization can silently leave it 
 non-functional
 --

 Key: HDFS-5850
 URL: https://issues.apache.org/jira/browse/HDFS-5850
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Kihwal Lee
Priority: Critical

 [~knoguchi] recently noticed that the trash directories of a restarted 
 cluster are not cleaned up. It turned out that it was caused by a transient 
 DNS problem during initialization.
 TrashEmptier thread in namenode is actually a FileSystem client running in a 
 loop, which makes RPC calls to itself in order  to list, rename and delete 
 trash files.  In a secure setup, the client needs to create the right service 
 principal name for the namenode for making a RPC connection. If there is a 
 DNS issue at that moment, the SPN ends up with the IP address, not the fqdn.
 Since KDC does not recognize this SPN, TrashEmptier does not work from that 
 point on. I verified that the SPN with the IP address was what the 
 TrashEmptier thread asked KDC for a service ticket for.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5850) DNS Issues during TrashEmptier initialization can silently leave it non-functional

2014-01-29 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-5850:
-

Description: 
[~knoguchi] recently noticed that the trash directories of a restarted cluster 
were not cleaned up. It turned out that it was caused by a transient DNS 
problem during initialization.

TrashEmptier thread in namenode is actually a FileSystem client running in a 
loop, which makes RPC calls to itself in order  to list, rename and delete 
trash files.  In a secure setup, the client needs to create the right service 
principal name for the namenode for making a RPC connection. If there is a DNS 
issue at that moment, the SPN ends up with the IP address, not the fqdn.

Since KDC does not recognize this SPN, TrashEmptier does not work from that 
point on. I verified that the SPN with the IP address was what the TrashEmptier 
thread asked KDC for a service ticket for.

  was:
[~knoguchi] recently noticed that the trash directories of a restarted cluster 
are not cleaned up. It turned out that it was caused by a transient DNS problem 
during initialization.

TrashEmptier thread in namenode is actually a FileSystem client running in a 
loop, which makes RPC calls to itself in order  to list, rename and delete 
trash files.  In a secure setup, the client needs to create the right service 
principal name for the namenode for making a RPC connection. If there is a DNS 
issue at that moment, the SPN ends up with the IP address, not the fqdn.

Since KDC does not recognize this SPN, TrashEmptier does not work from that 
point on. I verified that the SPN with the IP address was what the TrashEmptier 
thread asked KDC for a service ticket for.


 DNS Issues during TrashEmptier initialization can silently leave it 
 non-functional
 --

 Key: HDFS-5850
 URL: https://issues.apache.org/jira/browse/HDFS-5850
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Kihwal Lee
Priority: Critical

 [~knoguchi] recently noticed that the trash directories of a restarted 
 cluster were not cleaned up. It turned out that it was caused by a transient 
 DNS problem during initialization.
 TrashEmptier thread in namenode is actually a FileSystem client running in a 
 loop, which makes RPC calls to itself in order  to list, rename and delete 
 trash files.  In a secure setup, the client needs to create the right service 
 principal name for the namenode for making a RPC connection. If there is a 
 DNS issue at that moment, the SPN ends up with the IP address, not the fqdn.
 Since KDC does not recognize this SPN, TrashEmptier does not work from that 
 point on. I verified that the SPN with the IP address was what the 
 TrashEmptier thread asked KDC for a service ticket for.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5848) Add a DatanodeCommand to inform datanodes that rolling upgrade is in progress

2014-01-29 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13885875#comment-13885875
 ] 

Suresh Srinivas commented on HDFS-5848:
---

Why is this a command and not a state that is always sent to DataNode? 

 Add a DatanodeCommand to inform datanodes that rolling upgrade is in progress
 -

 Key: HDFS-5848
 URL: https://issues.apache.org/jira/browse/HDFS-5848
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Tsz Wo (Nicholas), SZE
 Attachments: h5848_20130130.patch


 When rolling upgrade is in progress, the NN should inform datanodes via heartbeat 
 responses so that datanodes create hardlinks when deleting blocks.  We 
 need to add a new DatanodeCommand here.  The datanode change will be done in 
 a separate JIRA.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5842) Cannot create hftp filesystem when using a proxy user ugi and a doAs on a secure cluster

2014-01-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13885893#comment-13885893
 ] 

Hudson commented on HDFS-5842:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5061 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5061/])
HDFS-5842. Cannot create hftp filesystem when using a proxy user ugi and a doAs 
on a secure cluster. Contributed by Jing Zhao. (jing9: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1562603)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DelegationTokenFetcher.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/HftpFileSystem.java


 Cannot create hftp filesystem when using a proxy user ugi and a doAs on a 
 secure cluster
 

 Key: HDFS-5842
 URL: https://issues.apache.org/jira/browse/HDFS-5842
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: security
Affects Versions: 2.2.0
Reporter: Arpit Gupta
Assignee: Jing Zhao
 Attachments: HADOOP-10215.000.patch, HADOOP-10215.001.patch, 
 HADOOP-10215.002.patch, HADOOP-10215.002.patch


 Noticed this while debugging issues in another application. We saw an error 
 when trying to do a FileSystem.get using an hftp file system on a secure 
 cluster using a proxy user ugi.
 This is a small snippet used
 {code}
  FileSystem testFS = ugi.doAs(new PrivilegedExceptionAction<FileSystem>() {
    @Override
    public FileSystem run() throws IOException {
      return FileSystem.get(hadoopConf);
    }
  });
 {code}
 The same code worked for hdfs and webhdfs but not for hftp when the ugi used 
 was UserGroupInformation.createProxyUser
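
For completeness, a hedged sketch of how such a proxy-user UGI is typically
built before the doAs above; the user name and configuration contents are
illustrative, and a kerberized cluster with an hftp:// default filesystem is
assumed.

{code}
import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.security.UserGroupInformation;

public class ProxyUserHftpSketch {
  public static void main(String[] args) throws Exception {
    // fs.defaultFS is assumed to point at an hftp:// URI for this sketch.
    final Configuration hadoopConf = new Configuration();
    UserGroupInformation realUser = UserGroupInformation.getLoginUser();
    UserGroupInformation ugi =
        UserGroupInformation.createProxyUser("testuser", realUser);
    FileSystem fs = ugi.doAs(new PrivilegedExceptionAction<FileSystem>() {
      @Override
      public FileSystem run() throws Exception {
        return FileSystem.get(hadoopConf);
      }
    });
    System.out.println(fs.getUri());
  }
}
{code}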



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5842) Cannot create hftp filesystem when using a proxy user ugi and a doAs on a secure cluster

2014-01-29 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-5842:


   Resolution: Fixed
Fix Version/s: 2.4.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

Thanks for the review, Jitendra! I've committed this to trunk and branch-2.

 Cannot create hftp filesystem when using a proxy user ugi and a doAs on a 
 secure cluster
 

 Key: HDFS-5842
 URL: https://issues.apache.org/jira/browse/HDFS-5842
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: security
Affects Versions: 2.2.0
Reporter: Arpit Gupta
Assignee: Jing Zhao
 Fix For: 2.4.0

 Attachments: HADOOP-10215.000.patch, HADOOP-10215.001.patch, 
 HADOOP-10215.002.patch, HADOOP-10215.002.patch


 Noticed this while debugging issues in another application. We saw an error 
 when trying to do a FileSystem.get using an hftp file system on a secure 
 cluster using a proxy user ugi.
 This is a small snippet used
 {code}
  FileSystem testFS = ugi.doAs(new PrivilegedExceptionAction<FileSystem>() {
    @Override
    public FileSystem run() throws IOException {
      return FileSystem.get(hadoopConf);
    }
  });
 {code}
 The same code worked for hdfs and webhdfs but not for hftp when the ugi used 
 was UserGroupInformation.createProxyUser



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5399) Revisit SafeModeException and corresponding retry policies

2014-01-29 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13885898#comment-13885898
 ] 

Aaron T. Myers commented on HDFS-5399:
--

bq. The issue with HA setup is that the SBN may stay in safemode for a long 
time and when it transitions to the active state, it needs at least 30s to 
come out of the safemode.

I don't follow this. Why is the SBN staying in safemode for a long time in an 
HA setup? Being in safemode and being in either the active or standby states 
should be orthogonal.

 Revisit SafeModeException and corresponding retry policies
 --

 Key: HDFS-5399
 URL: https://issues.apache.org/jira/browse/HDFS-5399
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 3.0.0
Reporter: Jing Zhao
Assignee: Jing Zhao

 Currently for NN SafeMode, we have the following corresponding retry policies:
 # In non-HA setup, for certain API call (create), the client will retry if 
 the NN is in SafeMode. Specifically, the client side's RPC adopts 
 MultipleLinearRandomRetry policy for a wrapped SafeModeException when retry 
 is enabled.
 # In HA setup, the client will retry if the NN is Active and in SafeMode. 
 Specifically, the SafeModeException is wrapped as a RetriableException in the 
 server side. Client side's RPC uses FailoverOnNetworkExceptionRetry policy 
 which recognizes RetriableException (see HDFS-5291).
 There are several possible issues in the current implementation:
 # The NN SafeMode can be a Manual SafeMode (i.e., started by administrator 
 through CLI), and the clients may not want to retry on this type of SafeMode.
 # Client may want to retry on other API calls in non-HA setup.
 # We should have a single generic strategy to address the mapping between 
 SafeMode and retry policy for both HA and non-HA setup. A possible 
 straightforward solution is to always wrap the SafeModeException in the 
 RetriableException to indicate that the clients should retry.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5776) Support 'hedged' reads in DFSClient

2014-01-29 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HDFS-5776:


Attachment: HDFS-5776-v12.txt

Failure seems unrelated.  Let me try again to be sure.

 Support 'hedged' reads in DFSClient
 ---

 Key: HDFS-5776
 URL: https://issues.apache.org/jira/browse/HDFS-5776
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs-client
Affects Versions: 3.0.0
Reporter: Liang Xie
Assignee: Liang Xie
 Attachments: HDFS-5776-v10.txt, HDFS-5776-v11.txt, HDFS-5776-v12.txt, 
 HDFS-5776-v12.txt, HDFS-5776-v2.txt, HDFS-5776-v3.txt, HDFS-5776-v4.txt, 
 HDFS-5776-v5.txt, HDFS-5776-v6.txt, HDFS-5776-v7.txt, HDFS-5776-v8.txt, 
 HDFS-5776-v9.txt, HDFS-5776.txt


 This is a placeholder for the HDFS-related part of the backport from 
 https://issues.apache.org/jira/browse/HBASE-7509
 The quorum read ability should be helpful, especially to optimize read outliers.
 We can utilize dfs.dfsclient.quorum.read.threshold.millis and 
 dfs.dfsclient.quorum.read.threadpool.size to enable/disable the hedged read 
 ability from the client side (e.g. HBase), and by using DFSQuorumReadMetrics, we 
 could export the metrics of interest into the client system (e.g. HBase's 
 regionserver metrics).
 The core logic is in the pread code path: we decide whether to go to the 
 original fetchBlockByteRange or the newly introduced 
 fetchBlockByteRangeSpeculative per the above config items.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5771) Track progress when loading fsimage

2014-01-29 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated HDFS-5771:


 Component/s: namenode
Hadoop Flags: Reviewed

+1 for the v3 patch.  Thanks for incorporating those changes, Haohui.

 Track progress when loading fsimage
 ---

 Key: HDFS-5771
 URL: https://issues.apache.org/jira/browse/HDFS-5771
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Affects Versions: HDFS-5698 (FSImage in protobuf)
Reporter: Haohui Mai
Assignee: Haohui Mai
 Attachments: HDFS-5771.000.patch, HDFS-5771.001.patch, 
 HDFS-5771.002.patch, HDFS-5771.003.patch


 The old code that loads the fsimage tracks the progress during loading. This 
 jira proposes to implement the same functionality in the new code, which 
 serializes the fsimage using protobuf.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5399) Revisit SafeModeException and corresponding retry policies

2014-01-29 Thread Arpit Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13885916#comment-13885916
 ] 

Arpit Gupta commented on HDFS-5399:
---

We had run into this issue while testing HA. You can see in HDFS-5291 that the 
standby NN after transitioning to active went into safemode. We saw issues 
where Resource Manager and Region Servers would crash/complain because of this. 
We ran into this frequently before HDFS-5291 was fixed.

 Revisit SafeModeException and corresponding retry policies
 --

 Key: HDFS-5399
 URL: https://issues.apache.org/jira/browse/HDFS-5399
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 3.0.0
Reporter: Jing Zhao
Assignee: Jing Zhao

 Currently for NN SafeMode, we have the following corresponding retry policies:
 # In non-HA setup, for certain API call (create), the client will retry if 
 the NN is in SafeMode. Specifically, the client side's RPC adopts 
 MultipleLinearRandomRetry policy for a wrapped SafeModeException when retry 
 is enabled.
 # In HA setup, the client will retry if the NN is Active and in SafeMode. 
 Specifically, the SafeModeException is wrapped as a RetriableException in the 
 server side. Client side's RPC uses FailoverOnNetworkExceptionRetry policy 
 which recognizes RetriableException (see HDFS-5291).
 There are several possible issues in the current implementation:
 # The NN SafeMode can be a Manual SafeMode (i.e., started by administrator 
 through CLI), and the clients may not want to retry on this type of SafeMode.
 # Client may want to retry on other API calls in non-HA setup.
 # We should have a single generic strategy to address the mapping between 
 SafeMode and retry policy for both HA and non-HA setup. A possible 
 straightforward solution is to always wrap the SafeModeException in the 
 RetriableException to indicate that the clients should retry.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5399) Revisit SafeModeException and corresponding retry policies

2014-01-29 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13885930#comment-13885930
 ] 

Aaron T. Myers commented on HDFS-5399:
--

On that JIRA I asked the following question:

bq. Is my understanding of this issue correct that the only thing we're trying 
to fix here is the fact the clients are not retrying attempting to talk to the 
active NN when it receives a safemode exception? i.e. it's not the case that 
the standby NN is somehow incorrectly going into safemode after a failover?

I concluded (perhaps incorrectly) based on Jing's response that I was correct 
in my understanding of the issue, but it seems that I was not. If so, the fact 
that the former standby NN is going into safemode upon transition to active is 
the real bug here, not that clients don't retry when the NN is in safemode, and 
that's what we should be fixing, not the client RPC retry behavior.

Jing/Arpit - do either of you have any insight as to why you observed the NN 
going into safemode upon transition to active? If we can figure that out, then 
we should fix that, and perhaps revert or modify the new behavior introduced in 
HDFS-5291.

 Revisit SafeModeException and corresponding retry policies
 --

 Key: HDFS-5399
 URL: https://issues.apache.org/jira/browse/HDFS-5399
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 3.0.0
Reporter: Jing Zhao
Assignee: Jing Zhao

 Currently for NN SafeMode, we have the following corresponding retry policies:
 # In non-HA setup, for certain API call (create), the client will retry if 
 the NN is in SafeMode. Specifically, the client side's RPC adopts 
 MultipleLinearRandomRetry policy for a wrapped SafeModeException when retry 
 is enabled.
 # In HA setup, the client will retry if the NN is Active and in SafeMode. 
 Specifically, the SafeModeException is wrapped as a RetriableException in the 
 server side. Client side's RPC uses FailoverOnNetworkExceptionRetry policy 
 which recognizes RetriableException (see HDFS-5291).
 There are several possible issues in the current implementation:
 # The NN SafeMode can be a Manual SafeMode (i.e., started by administrator 
 through CLI), and the clients may not want to retry on this type of SafeMode.
 # Client may want to retry on other API calls in non-HA setup.
 # We should have a single generic strategy to address the mapping between 
 SafeMode and retry policy for both HA and non-HA setup. A possible 
 straightforward solution is to always wrap the SafeModeException in the 
 RetriableException to indicate that the clients should retry.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Resolved] (HDFS-5771) Track progress when loading fsimage

2014-01-29 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved HDFS-5771.
-

   Resolution: Fixed
Fix Version/s: HDFS-5698 (FSImage in protobuf)

I committed the patch to the HDFS-5698 feature branch.  Thanks again, Haohui.

 Track progress when loading fsimage
 ---

 Key: HDFS-5771
 URL: https://issues.apache.org/jira/browse/HDFS-5771
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Affects Versions: HDFS-5698 (FSImage in protobuf)
Reporter: Haohui Mai
Assignee: Haohui Mai
 Fix For: HDFS-5698 (FSImage in protobuf)

 Attachments: HDFS-5771.000.patch, HDFS-5771.001.patch, 
 HDFS-5771.002.patch, HDFS-5771.003.patch


 The old code that loads the fsimage tracks the progress during loading. This 
 jira proposes to implement the same functionality in the new code, which 
 serializes the fsimage using protobuf.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5754) Split LayoutVerion into NamenodeLayoutVersion and DatanodeLayoutVersion

2014-01-29 Thread Brandon Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13885944#comment-13885944
 ] 

Brandon Li commented on HDFS-5754:
--

{quote} In DataStorage, BPServiceActor and BlockPoolSliceStorage, it should not 
compare DATANODE_LAYOUT_VERSION with nsInfo.getLayoutVersion() anymore.{quote}
removed. 

{quote} Map<Integer, TreeSet<LayoutFeature>> should be Map<Integer, 
Set<LayoutFeature>>. We should declare it with the interface Set (or should we 
use SortedSet?) instead of the particular implementation TreeSet.{quote}
yes.
{quote} In PBHelper, could we use null (i.e. unknown) instead of 
NodeType.NAME_NODE as default? Or we could add a setStorageType(NodeType) 
method so that we could set it when it is null. {quote}
If we use null as the default and add a new method setStorageType() to set 
storageType in a few places after receiving StorageInfo from the wire, the code 
is not as clean as just sending StorageType in the RPC payload. But I will 
upload a patch with the default null first to show the change. 
 
{quote}The type parameter below is not used. Should it be removed?{quote}
yes.
 
 {quote} I suggest to move the layout version related code out from NameNode 
and DataNode to new classes, say NameNodeLayoutVersion and 
DataNodeLayoutVersion. {quote}
Agree. It's better to hide the maps in these two classes than exposing them 
everywhere.
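
A hedged sketch of those two points, using a stand-in LayoutFeature enum
rather than the real HDFS type: the feature map is declared against the
Set/SortedSet interfaces and hidden inside a NameNodeLayoutVersion-style
holder instead of exposing a TreeSet everywhere.

{code}
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

class NameNodeLayoutVersionSketch {
  enum LayoutFeature { FEATURE_A, FEATURE_B }

  // Declared against the interfaces; callers never see the TreeMap/TreeSet.
  private static final Map<Integer, SortedSet<LayoutFeature>> FEATURES =
      new TreeMap<Integer, SortedSet<LayoutFeature>>();

  static synchronized void addFeature(int layoutVersion, LayoutFeature f) {
    SortedSet<LayoutFeature> set = FEATURES.get(layoutVersion);
    if (set == null) {
      set = new TreeSet<LayoutFeature>();
      FEATURES.put(layoutVersion, set);
    }
    set.add(f);
  }

  static synchronized Set<LayoutFeature> getFeatures(int layoutVersion) {
    SortedSet<LayoutFeature> set = FEATURES.get(layoutVersion);
    if (set == null) {
      return Collections.emptySet();
    }
    return Collections.unmodifiableSortedSet(set);
  }
}
{code}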


 Split LayoutVerion into NamenodeLayoutVersion and DatanodeLayoutVersion 
 

 Key: HDFS-5754
 URL: https://issues.apache.org/jira/browse/HDFS-5754
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode, namenode
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Brandon Li
 Attachments: FeatureInfo.patch, HDFS-5754.001.patch, 
 HDFS-5754.002.patch, HDFS-5754.003.patch, HDFS-5754.004.patch, 
 HDFS-5754.006.patch, HDFS-5754.007.patch, HDFS-5754.008.patch, 
 HDFS-5754.009.patch, HDFS-5754.010.patch


 Currently, LayoutVersion defines the on-disk data format and supported 
 features of the entire cluster including NN and DNs.  LayoutVersion is 
 persisted in both NN and DNs.  When a NN/DN starts up, it checks its 
 supported LayoutVersion against the on-disk LayoutVersion.  Also, a DN with a 
 different LayoutVersion than NN cannot register with the NN.
 We propose to split LayoutVersion into two independent values that are local 
 to the nodes:
 - NamenodeLayoutVersion - defines the on-disk data format in NN, including 
 the format of FSImage, editlog and the directory structure.
 - DatanodeLayoutVersion - defines the on-disk data format in DN, including 
 the format of block data file, metadata file, block pool layout, and the 
 directory structure.  
 The LayoutVersion check will be removed in DN registration.  If 
 NamenodeLayoutVersion or DatanodeLayoutVersion is changed in a rolling 
 upgrade, then only rollback is supported and downgrade is not.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5781) Use an array to record the mapping between FSEditLogOpCode and the corresponding byte value

2014-01-29 Thread Andrew Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wang updated HDFS-5781:
--

Fix Version/s: (was: 2.3.0)
   2.4.0

JIRA fix versions are weird right now, I think this is only in branch-2 and not 
also branch-2.3. I think this is minor enough that it's okay to leave it out, 
but please merge it to branch-2.3 and update the fix version if you feel 
otherwise.

 Use an array to record the mapping between FSEditLogOpCode and the 
 corresponding byte value
 ---

 Key: HDFS-5781
 URL: https://issues.apache.org/jira/browse/HDFS-5781
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Affects Versions: 2.3.0
Reporter: Jing Zhao
Assignee: Jing Zhao
Priority: Minor
 Fix For: 2.4.0

 Attachments: HDFS-5781.000.patch, HDFS-5781.001.patch, 
 HDFS-5781.002.patch, HDFS-5781.002.patch


 HDFS-5674 uses Enum.values and enum.ordinal to identify an editlog op for a 
 given byte value. While improving the efficiency, it may cause issues. E.g., 
 when several new editlog ops are added to trunk around the same time (for 
 several different new features), it is hard to backport the editlog ops with 
 larger byte values to branch-2 before those with smaller values, since there 
 will be gaps in the byte values of the enum. 
 This jira plans to still use an array to record the mapping between editlog 
 ops and their byte values, and allow gaps between valid ops. 
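
A hedged illustration of that array-based mapping with a toy OpCode enum (not
the real FSEditLogOpCodes): the byte value itself indexes a fixed array, so
gaps between valid opcodes are harmless and an unknown byte simply maps to
null.

{code}
// Toy example of an array-indexed opcode lookup that tolerates gaps.
class OpCodeLookupSketch {
  enum OpCode {
    OP_ADD((byte) 0),
    OP_DELETE((byte) 2);   // deliberate gap at byte value 1

    final byte value;
    OpCode(byte value) { this.value = value; }
  }

  private static final OpCode[] BY_VALUE = new OpCode[256];
  static {
    for (OpCode op : OpCode.values()) {
      BY_VALUE[op.value & 0xFF] = op;
    }
  }

  static OpCode fromByte(byte b) {
    return BY_VALUE[b & 0xFF];   // null for unknown or gap values
  }
}
{code}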



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5688) Wire-encryption in QJM

2014-01-29 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13885962#comment-13885962
 ] 

Suresh Srinivas commented on HDFS-5688:
---

[~jucaf], please provide the information required for verifying if this is 
indeed a bug. I will close this jira after a week or so, if information 
required is not posted to the jira.

 Wire-encryption in QJM
 --

 Key: HDFS-5688
 URL: https://issues.apache.org/jira/browse/HDFS-5688
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: ha, journal-node, security
Affects Versions: 2.2.0
Reporter: Juan Carlos Fernandez
Priority: Blocker
  Labels: security

 When HA is implemented with QJM and Kerberos, it's not possible to enable 
 wire encryption of the data.
 If the property hadoop.rpc.protection is set to anything other than 
 authentication, it doesn't work properly, producing the error:
 ERROR security.UserGroupInformation: PriviledgedActionException 
 as:principal@REALM (auth:KERBEROS) cause:javax.security.sasl.SaslException: 
 No common protection layer between client and server
 With NFS as shared storage, everything works like a charm



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5614) NameNode: implement handling of ACLs in combination with snapshots.

2014-01-29 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated HDFS-5614:


Attachment: HDFS-5614.1.patch

I'm uploading the patch.  Here is a summary of the changes.
# {{DFSClient}}: Testing revealed that we weren't unwrapping 
{{NSQuotaExceededException}} in the ACL modification APIs.  This exception can 
be thrown when changing an ACL on a file that is a child of a directory that 
was previously snapshotted, because the change requires consuming more 
namespace quota.
# {{AclFeature}}: I made instances of this class immutable (see the sketch 
below).  This fixed a lot 
of bugs related to copying the instance around inside a snapshot and then 
mutating the original through the ACL modification APIs.
# {{AclStorage}}: Calls to inode methods for getting and setting the ACL now 
pass snapshot ID.
# {{FSDirectory}}: Added special case handling for .snapshot path and changed 
{{getAclStatus}} to get the correct snapshot ID.
# {{FSImageFormat}}/{{FSImageSerialization}}: The current ACL is now written in 
snapshot diff lists and restored into the {{SnapshotCopy}} on load.
# {{INode}} and subclasses and related interfaces: Previously, we had the 
methods for getting and setting the {{AclFeature}} in 
{{INodeWithAdditionalFields}}.  I've now made the necessary changes throughout 
the inode class hierarchy to define these methods in the {{INode}} base class 
and return the correct results in subclasses.
# {{INodeDirectory}}: I added a special case in the copy constructor to 
preserve the ACL even if we aren't copying the other inode features.
# {{TestNameNodeAcl}}: New test suite covering the various interactions between 
ACLs and snapshots.
# I cleaned up multiple places in the code in {{FSDirectory}}, 
{{FSPermissionChecker}} and {{AclStorage}} that previously had been downcasting 
to {{INodeWithAdditionalFields}}.

I've also verified that other ACL tests in the branch are still passing with 
this patch.
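
Regarding item 2, a minimal sketch of the immutability pattern with stand-in
types (not the actual AclFeature/AclEntry classes): the entry list is
defensively copied and made unmodifiable at construction, so a snapshot can
safely keep a reference while the live inode swaps in a brand-new instance on
every ACL change.

{code}
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Stand-in for an immutable ACL feature: every modification builds a new
// instance, so copies held by snapshots can never be mutated afterwards.
final class ImmutableAclFeatureSketch {
  private final List<String> entries;

  ImmutableAclFeatureSketch(List<String> entries) {
    // Defensive copy plus unmodifiable view: callers cannot mutate it later.
    this.entries = Collections.unmodifiableList(new ArrayList<String>(entries));
  }

  List<String> getEntries() {
    return entries;
  }

  ImmutableAclFeatureSketch withEntry(String entry) {
    List<String> copy = new ArrayList<String>(entries);
    copy.add(entry);
    return new ImmutableAclFeatureSketch(copy);   // replace, never mutate
  }
}
{code}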


 NameNode: implement handling of ACLs in combination with snapshots.
 ---

 Key: HDFS-5614
 URL: https://issues.apache.org/jira/browse/HDFS-5614
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Affects Versions: HDFS ACLs (HDFS-4685)
Reporter: Chris Nauroth
Assignee: Chris Nauroth
 Attachments: HDFS-5614.1.patch


 Within a snapshot, all ACLs are frozen at the moment that the snapshot was 
 created.  ACL changes in the parent of the snapshot are not applied to the 
 snapshot.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5810) Unify mmap cache and short-circuit file descriptor cache

2014-01-29 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-5810:
---

Attachment: HDFS-5810.006.patch

* ShortCircuitCache#fetchOrCreate: retry here if we get a stale replica 
(sketched below).
* ShortCircuitCache#obliterate: must set refCount to 0 here.
* fix up some logs, add more trace logs
* fix findbugs issues
* add more descriptive failure message to some asserts
* TestBlockTokenWithDFS: fix test control flow.  fix longstanding DFSClient 
leak.
* move getConfiguration and getUGI out of the RemotePeerFactory interface. 
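
A hedged sketch of the fetchOrCreate/refCount behavior noted above; the class
is a stand-in, not the actual ShortCircuitCache, and it only shows the two
details called out in the list: retrying past a stale replica and resetting
the reference count when an entry is destroyed.

{code}
import java.util.HashMap;
import java.util.Map;

class ShortCircuitCacheSketch {
  static class Replica {
    int refCount;
    boolean stale;
  }

  private final Map<Long, Replica> replicas = new HashMap<Long, Replica>();

  synchronized Replica fetchOrCreate(long blockId) {
    while (true) {
      Replica r = replicas.get(blockId);
      if (r == null) {
        r = new Replica();
        r.refCount = 1;
        replicas.put(blockId, r);
        return r;
      }
      if (r.stale) {
        // Never hand out a stale replica: drop it and retry the lookup.
        replicas.remove(blockId);
        continue;
      }
      r.refCount++;
      return r;
    }
  }

  synchronized void obliterate(long blockId) {
    Replica r = replicas.remove(blockId);
    if (r != null) {
      r.refCount = 0;   // the count must be reset when the entry is destroyed
    }
  }
}
{code}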

 Unify mmap cache and short-circuit file descriptor cache
 

 Key: HDFS-5810
 URL: https://issues.apache.org/jira/browse/HDFS-5810
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: hdfs-client
Affects Versions: 2.3.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Attachments: HDFS-5810.001.patch, HDFS-5810.004.patch, 
 HDFS-5810.006.patch


 We should unify the client mmap cache and the client file descriptor cache.  
 Since mmaps are granted corresponding to file descriptors in the cache 
 (currently FileInputStreamCache), they have to be tracked together to do 
 smarter things like HDFS-5182.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5850) DNS Issues during TrashEmptier initialization can silently leave it non-functional

2014-01-29 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13885974#comment-13885974
 ] 

Daryn Sharp commented on HDFS-5850:
---

I'm not sure this issue affects 2.x.  In 0.23, the client pre-constructs the 
kerberos service principal and caches it in the ConnectionId.  All subsequent 
connections use a cached Connection which in turn reuses the cached principal 
in the ConnectionId.  Thus, if the principal is misconstructed it will never 
recover.

RPCv9 in 2.x should recover.  The client no longer preconstructs and caches the 
principal.  It verifies the principal advertised by the server.  If a transient 
DNS resolve failure occurs, the _HOST substitution in the service principal key 
will indeed yield a principal with an IP.  The client will reject the 
advertised principal because it doesn't match (ip vs hostname).  However, 
subsequent connections will attempt to reverify the advertised principal which 
involves a new DNS resolve.  The client should recover when DNS recovers.

 DNS Issues during TrashEmptier initialization can silently leave it 
 non-functional
 --

 Key: HDFS-5850
 URL: https://issues.apache.org/jira/browse/HDFS-5850
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Kihwal Lee
Priority: Critical

 [~knoguchi] recently noticed that the trash directories of a restarted 
 cluster were not cleaned up. It turned out that it was caused by a transient 
 DNS problem during initialization.
 TrashEmptier thread in namenode is actually a FileSystem client running in a 
 loop, which makes RPC calls to itself in order  to list, rename and delete 
 trash files.  In a secure setup, the client needs to create the right service 
 principal name for the namenode for making a RPC connection. If there is a 
 DNS issue at that moment, the SPN ends up with the IP address, not the fqdn.
 Since KDC does not recognize this SPN, TrashEmptier does not work from that 
 point on. I verified that the SPN with the IP address was what the 
 TrashEmptier thread asked KDC for a service ticket for.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5850) DNS Issues during TrashEmptier initialization can silently leave it non-functional

2014-01-29 Thread Daryn Sharp (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daryn Sharp updated HDFS-5850:
--

Affects Version/s: (was: 2.4.0)
   0.23.0

 DNS Issues during TrashEmptier initialization can silently leave it 
 non-functional
 --

 Key: HDFS-5850
 URL: https://issues.apache.org/jira/browse/HDFS-5850
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.23.0
Reporter: Kihwal Lee
Priority: Critical

 [~knoguchi] recently noticed that the trash directories of a restarted 
 cluster were not cleaned up. It turned out that it was caused by a transient 
 DNS problem during initialization.
 TrashEmptier thread in namenode is actually a FileSystem client running in a 
 loop, which makes RPC calls to itself in order  to list, rename and delete 
 trash files.  In a secure setup, the client needs to create the right service 
 principal name for the namenode for making a RPC connection. If there is a 
 DNS issue at that moment, the SPN ends up with the IP address, not the fqdn.
 Since KDC does not recognize this SPN, TrashEmptier does not work from that 
 point on. I verified that the SPN with the IP address was what the 
 TrashEmptier thread asked KDC for a service ticket for.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5842) Cannot create hftp filesystem when using a proxy user ugi and a doAs on a secure cluster

2014-01-29 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13885986#comment-13885986
 ] 

Andrew Wang commented on HDFS-5842:
---

Should this be included in branch-2.3 as well?

 Cannot create hftp filesystem when using a proxy user ugi and a doAs on a 
 secure cluster
 

 Key: HDFS-5842
 URL: https://issues.apache.org/jira/browse/HDFS-5842
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: security
Affects Versions: 2.2.0
Reporter: Arpit Gupta
Assignee: Jing Zhao
 Fix For: 2.4.0

 Attachments: HADOOP-10215.000.patch, HADOOP-10215.001.patch, 
 HADOOP-10215.002.patch, HADOOP-10215.002.patch


 Noticed this while debugging issues in another application. We saw an error 
 when trying to do a FileSystem.get using an hftp file system on a secure 
 cluster using a proxy user ugi.
 This is a small snippet used
 {code}
  FileSystem testFS = ugi.doAs(new PrivilegedExceptionAction<FileSystem>() {
    @Override
    public FileSystem run() throws IOException {
      return FileSystem.get(hadoopConf);
    }
  });
 {code}
 The same code worked for hdfs and webhdfs but not for hftp when the ugi used 
 was UserGroupInformation.createProxyUser



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5848) Add a DatanodeCommand to inform datanodes that rolling upgrade is in progress

2014-01-29 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13885990#comment-13885990
 ] 

Tsz Wo (Nicholas), SZE commented on HDFS-5848:
--

The heartbeat response carries DatanodeCommands in the code.  The NN will keep 
sending a RollingUpgradeCommand in every heartbeat response while the rolling 
upgrade is in progress.
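
A minimal sketch of that heartbeat-driven approach with stand-in types (not
the actual HDFS classes): while a rolling upgrade is in progress, every
heartbeat response carries a RollingUpgradeCommand so the datanode knows to
hardlink block files instead of deleting them outright.

{code}
import java.util.ArrayList;
import java.util.List;

class HeartbeatResponseSketch {
  interface DatanodeCommand {}

  static class RollingUpgradeCommand implements DatanodeCommand {
    final long startTimeMillis;
    RollingUpgradeCommand(long startTimeMillis) {
      this.startTimeMillis = startTimeMillis;
    }
  }

  private volatile boolean rollingUpgradeInProgress;

  void setRollingUpgrade(boolean inProgress) {
    this.rollingUpgradeInProgress = inProgress;
  }

  // Commands ride on every heartbeat response while the upgrade is active.
  List<DatanodeCommand> buildHeartbeatResponse() {
    List<DatanodeCommand> cmds = new ArrayList<DatanodeCommand>();
    if (rollingUpgradeInProgress) {
      cmds.add(new RollingUpgradeCommand(System.currentTimeMillis()));
    }
    return cmds;
  }
}
{code}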

 Add a DatanodeCommand to inform datanodes that rolling upgrade is in progress
 -

 Key: HDFS-5848
 URL: https://issues.apache.org/jira/browse/HDFS-5848
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Tsz Wo (Nicholas), SZE
 Attachments: h5848_20130130.patch


 When rolling upgrade is in progress, the NN should inform datanodes via heartbeat 
 responses so that datanodes create hardlinks when deleting blocks.  We 
 need to add a new DatanodeCommand here.  The datanode change will be done in 
 a separate JIRA.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5810) Unify mmap cache and short-circuit file descriptor cache

2014-01-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13885998#comment-13885998
 ] 

Hadoop QA commented on HDFS-5810:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12626008/HDFS-5810.006.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 18 new 
or modified test files.

{color:red}-1 javac{color}.  The patch appears to cause the build to 
fail.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5985//console

This message is automatically generated.

 Unify mmap cache and short-circuit file descriptor cache
 

 Key: HDFS-5810
 URL: https://issues.apache.org/jira/browse/HDFS-5810
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: hdfs-client
Affects Versions: 2.3.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Attachments: HDFS-5810.001.patch, HDFS-5810.004.patch, 
 HDFS-5810.006.patch


 We should unify the client mmap cache and the client file descriptor cache.  
 Since mmaps are granted corresponding to file descriptors in the cache 
 (currently FileInputStreamCache), they have to be tracked together to do 
 smarter things like HDFS-5182.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5845) SecondaryNameNode dies when checkpointing with cache pools

2014-01-29 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13886011#comment-13886011
 ] 

Andrew Wang commented on HDFS-5845:
---

I'll also note that I bumped the timeout on that seemingly unrelated test since 
it flaked twice for me at 30s.

 SecondaryNameNode dies when checkpointing with cache pools
 --

 Key: HDFS-5845
 URL: https://issues.apache.org/jira/browse/HDFS-5845
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.3.0
Reporter: Andrew Wang
Assignee: Andrew Wang
Priority: Blocker
  Labels: caching
 Attachments: hdfs-5845-1.patch


 The SecondaryNameNode clears and reloads its FSNamesystem when doing 
 checkpointing. However, FSNamesystem#clear does not clear CacheManager state 
 during this reload. This leads to an error like the following:
 {noformat}
 org.apache.hadoop.fs.InvalidRequestException: Cache pool pool1 already exists.
 {noformat}
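
A minimal sketch of the bug pattern, with stand-in classes rather than the
real FSNamesystem/CacheManager: if clear() skips one sub-component, reloading
the same image collides with entries that are still present and fails with
exactly this kind of "already exists" error.

{code}
import java.util.HashSet;
import java.util.Set;

class CacheManagerSketch {
  private final Set<String> pools = new HashSet<String>();

  void addCachePool(String name) {
    if (!pools.add(name)) {
      throw new IllegalStateException("Cache pool " + name + " already exists.");
    }
  }

  void clear() {
    pools.clear();
  }
}

class NamesystemSketch {
  private final CacheManagerSketch cacheManager = new CacheManagerSketch();

  void clear() {
    // ... reset other namespace state ...
    cacheManager.clear();   // without this call the next image load collides
  }

  void loadPoolFromImage(String name) {
    cacheManager.addCachePool(name);
  }
}
{code}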



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5746) add ShortCircuitSharedMemorySegment

2014-01-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13886012#comment-13886012
 ] 

Hadoop QA commented on HDFS-5746:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12625965/HDFS-5746.005.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

  {color:red}-1 javac{color}.  The applied patch generated 1546 javac 
compiler warnings (more than the trunk's current 1541 warnings).

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 2 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.TestPersistBlocks

  The following test timeouts occurred in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs:

org.apache.hadoop.net.unix.TestDomainSocketWatcher

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/5981//testReport/
Javac warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/5981//artifact/trunk/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5981//console

This message is automatically generated.

 add ShortCircuitSharedMemorySegment
 ---

 Key: HDFS-5746
 URL: https://issues.apache.org/jira/browse/HDFS-5746
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode, hdfs-client
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Fix For: 3.0.0

 Attachments: HDFS-5746.001.patch, HDFS-5746.002.patch, 
 HDFS-5746.003.patch, HDFS-5746.004.patch, HDFS-5746.005.patch


 Add ShortCircuitSharedMemorySegment, which will be used to communicate 
 information between the datanode and the client about whether a replica is 
 mlocked.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-4239) Means of telling the datanode to stop using a sick disk

2014-01-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13886014#comment-13886014
 ] 

Hadoop QA commented on HDFS-4239:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12625766/hdfs-4239_v2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.TestInjectionForSimulatedStorage
  org.apache.hadoop.hdfs.TestPread
  org.apache.hadoop.hdfs.TestReplication
  org.apache.hadoop.hdfs.TestSmallBlock
  
org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup
  org.apache.hadoop.hdfs.server.datanode.TestDataNodeMetrics
  
org.apache.hadoop.hdfs.server.balancer.TestBalancerWithEncryptedTransfer
  org.apache.hadoop.hdfs.TestFileCreation
  org.apache.hadoop.hdfs.TestSetrepIncreasing
  
org.apache.hadoop.hdfs.server.balancer.TestBalancerWithHANameNodes
  
org.apache.hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes
  
org.apache.hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFS
  org.apache.hadoop.hdfs.server.namenode.TestFileLimit
  org.apache.hadoop.hdfs.server.balancer.TestBalancer

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/5983//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5983//console

This message is automatically generated.

 Means of telling the datanode to stop using a sick disk
 ---

 Key: HDFS-4239
 URL: https://issues.apache.org/jira/browse/HDFS-4239
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: stack
Assignee: Jimmy Xiang
 Attachments: hdfs-4239.patch, hdfs-4239_v2.patch


 If a disk has been deemed 'sick' -- i.e. not dead but wounded, failing 
 occasionally, or just exhibiting high latency -- your choices are:
 1. Decommission the total datanode.  If the datanode is carrying 6 or 12 
 disks of data, especially on a cluster that is smallish -- 5 to 20 nodes -- 
 the rereplication of the downed datanode's data can be pretty disruptive, 
 especially if the cluster is doing low latency serving: e.g. hosting an hbase 
 cluster.
 2. Stop the datanode, unmount the bad disk, and restart the datanode (You 
 can't unmount the disk while it is in use).  This latter is better in that 
 only the bad disk's data is rereplicated, not all datanode data.
 Is it possible to do better, say, send the datanode a signal to tell it to stop 
 using a disk an operator has designated 'bad'?  This would be like option #2 
 above minus the need to stop and restart the datanode.  Ideally the disk 
 would become unmountable after a while.
 Nice to have would be being able to tell the datanode to restart using a disk 
 after it's been replaced.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5842) Cannot create hftp filesystem when using a proxy user ugi and a doAs on a secure cluster

2014-01-29 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13886017#comment-13886017
 ] 

Jing Zhao commented on HDFS-5842:
-

Yeah, that will be great. Thanks Andrew!

 Cannot create hftp filesystem when using a proxy user ugi and a doAs on a 
 secure cluster
 

 Key: HDFS-5842
 URL: https://issues.apache.org/jira/browse/HDFS-5842
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: security
Affects Versions: 2.2.0
Reporter: Arpit Gupta
Assignee: Jing Zhao
 Fix For: 2.4.0

 Attachments: HADOOP-10215.000.patch, HADOOP-10215.001.patch, 
 HADOOP-10215.002.patch, HADOOP-10215.002.patch


 Noticed this while debugging issues in another application. We saw an error 
 when trying to do a FileSystem.get using an hftp file system on a secure 
 cluster using a proxy user ugi.
 This is a small snippet used
 {code}
  FileSystem testFS = ugi.doAs(new PrivilegedExceptionAction<FileSystem>() {
    @Override
    public FileSystem run() throws IOException {
      return FileSystem.get(hadoopConf);
    }
  });
 {code}
 The same code worked for hdfs and webhdfs but not for hftp when the ugi used 
 was UserGroupInformation.createProxyUser



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HDFS-5851) Support memory as a storage medium

2014-01-29 Thread Arpit Agarwal (JIRA)
Arpit Agarwal created HDFS-5851:
---

 Summary: Support memory as a storage medium
 Key: HDFS-5851
 URL: https://issues.apache.org/jira/browse/HDFS-5851
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode
Affects Versions: 3.0.0
Reporter: Arpit Agarwal
Assignee: Arpit Agarwal


Memory can be used as a storage medium for smaller/transient files for fast 
write throughput.

More information/design will be added later.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5841) Update HDFS caching documentation with new changes

2014-01-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13886027#comment-13886027
 ] 

Hadoop QA commented on HDFS-5841:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12625670/hdfs-5841-3.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/5982//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5982//console

This message is automatically generated.

 Update HDFS caching documentation with new changes
 --

 Key: HDFS-5841
 URL: https://issues.apache.org/jira/browse/HDFS-5841
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.3.0
Reporter: Andrew Wang
Assignee: Andrew Wang
  Labels: caching
 Attachments: hdfs-5841-1.patch, hdfs-5841-2.patch, hdfs-5841-3.patch


 The caching documentation is a little out of date, since it's missing 
 description of features like TTL and expiration.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5845) SecondaryNameNode dies when checkpointing with cache pools

2014-01-29 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13886031#comment-13886031
 ] 

Colin Patrick McCabe commented on HDFS-5845:


Looks good to me.  +1

 SecondaryNameNode dies when checkpointing with cache pools
 --

 Key: HDFS-5845
 URL: https://issues.apache.org/jira/browse/HDFS-5845
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.3.0
Reporter: Andrew Wang
Assignee: Andrew Wang
Priority: Blocker
  Labels: caching
 Attachments: hdfs-5845-1.patch


 The SecondaryNameNode clears and reloads its FSNamesystem when doing 
 checkpointing. However, FSNamesystem#clear does not clear CacheManager state 
 during this reload. This leads to an error like the following:
 {noformat}
 org.apache.hadoop.fs.InvalidRequestException: Cache pool pool1 already exists.
 {noformat}
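 To illustrate the failure mode, a minimal self-contained sketch (a hypothetical toy class, 
 not the committed patch): any pool registry that survives the checkpoint reload will reject 
 the re-added pool, so the reload path must also clear caching state.
 {code}
 import java.util.HashMap;
 import java.util.Map;

 // Hedged sketch: a toy cache-pool registry. If clear() is not called on the
 // checkpoint reload path, re-adding "pool1" throws, mirroring the error above.
 class CachePoolRegistry {
   private final Map<String, Object> pools = new HashMap<>();

   void addPool(String name) {
     if (pools.containsKey(name)) {
       throw new IllegalStateException("Cache pool " + name + " already exists.");
     }
     pools.put(name, new Object());
   }

   // Must be invoked wherever the rest of the namesystem state is reset.
   void clear() {
     pools.clear();
   }
 }
 {code}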



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5845) SecondaryNameNode dies when checkpointing with cache pools

2014-01-29 Thread Andrew Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wang updated HDFS-5845:
--

   Resolution: Fixed
Fix Version/s: 2.3.0
   Status: Resolved  (was: Patch Available)

Thanks Colin, I committed this to branch-2.3, branch-2, and trunk.

 SecondaryNameNode dies when checkpointing with cache pools
 --

 Key: HDFS-5845
 URL: https://issues.apache.org/jira/browse/HDFS-5845
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.3.0
Reporter: Andrew Wang
Assignee: Andrew Wang
Priority: Blocker
  Labels: caching
 Fix For: 2.3.0

 Attachments: hdfs-5845-1.patch


 The SecondaryNameNode clears and reloads its FSNamesystem when doing 
 checkpointing. However, FSNamesystem#clear does not clear CacheManager state 
 during this reload. This leads to an error like the following:
 {noformat}
 org.apache.hadoop.fs.InvalidRequestException: Cache pool pool1 already exists.
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5399) Revisit SafeModeException and corresponding retry policies

2014-01-29 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13886040#comment-13886040
 ] 

Jing Zhao commented on HDFS-5399:
-

bq. If so, the fact that the former standby NN is going into safemode upon 
transition to active is the real bug here
It's not quite like that. The SBN will not put itself into safemode because of 
transitioning to the active state. What we saw in our test is that the SBN cannot come 
out of safemode, thus the safemode object is not null when failover happens. 
And when the SBN becomes active, it can quickly go into the safemode extension 
period, but this still adds an extra 30 seconds to the no-service time. 

Thus the question is: why can the NN quickly go into the safemode extension 
period while in the active state, but keeps staying in safemode while in the standby state? 
In our test we have a lot of file creation/deletion happening. Is it possible 
that the SBN keeps tailing the editlog while holding the FSN lock, so that the 
SafeModeMonitor thread could not get the lock to leave safemode?

 Revisit SafeModeException and corresponding retry policies
 --

 Key: HDFS-5399
 URL: https://issues.apache.org/jira/browse/HDFS-5399
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 3.0.0
Reporter: Jing Zhao
Assignee: Jing Zhao

 Currently for NN SafeMode, we have the following corresponding retry policies:
 # In non-HA setup, for certain API call (create), the client will retry if 
 the NN is in SafeMode. Specifically, the client side's RPC adopts 
 MultipleLinearRandomRetry policy for a wrapped SafeModeException when retry 
 is enabled.
 # In HA setup, the client will retry if the NN is Active and in SafeMode. 
 Specifically, the SafeModeException is wrapped as a RetriableException in the 
 server side. Client side's RPC uses FailoverOnNetworkExceptionRetry policy 
 which recognizes RetriableException (see HDFS-5291).
 There are several possible issues in the current implementation:
 # The NN SafeMode can be a Manual SafeMode (i.e., started by administrator 
 through CLI), and the clients may not want to retry on this type of SafeMode.
 # Client may want to retry on other API calls in non-HA setup.
 # We should have a single generic strategy to address the mapping between 
 SafeMode and retry policy for both HA and non-HA setup. A possible 
 straightforward solution is to always wrap the SafeModeException in the 
 RetriableException to indicate that the clients should retry.
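 A minimal sketch of item 3 above, assuming hypothetical helper classes (the real HDFS types 
 live in org.apache.hadoop.hdfs and org.apache.hadoop.ipc): wrap the safemode error in a 
 retriable wrapper, except when safemode was entered manually.
 {code}
 import java.io.IOException;

 // Hedged sketch, not the actual NameNode code: a server-side gate that wraps
 // SafeModeException in RetriableException so a generic client retry policy can
 // recognize it, while manual safemode is surfaced unwrapped (no point retrying).
 class SafeModeGate {
   static class SafeModeException extends IOException {
     SafeModeException(String msg) { super(msg); }
   }
   static class RetriableException extends IOException {
     RetriableException(IOException cause) { super(cause); }
   }

   private volatile boolean inSafeMode;
   private volatile boolean manualSafeMode;

   void checkOperation(String op) throws IOException {
     if (!inSafeMode) {
       return;
     }
     SafeModeException sme =
         new SafeModeException("Cannot " + op + ": name node is in safe mode");
     if (manualSafeMode) {
       throw sme;                          // administrator asked for safemode; do not retry
     }
     throw new RetriableException(sme);    // startup safemode; client should retry
   }
 }
 {code}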



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-198) org.apache.hadoop.dfs.LeaseExpiredException during dfs write

2014-01-29 Thread sukhendu chakraborty (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13886051#comment-13886051
 ] 

sukhendu chakraborty commented on HDFS-198:
---

I am seeing the lease expired error for partitioned hive tables in CDH 
4.5 MR1. I have a similar use case to Sujesh's above: I am using dynamic date 
partitioning for a year (365 partitions), but have 1B rows (300GB of data for 
that year). I also want to cluster the data in each partition into 32 buckets.

Here is part of the error trace:
3:58:18.531 PM  ERROR   org.apache.hadoop.hdfs.DFSClient
Failed to close file 
/tmp/hive-user/hive_2014-01-29_15-33-51_510_4099525102053071439/_task_tmp.-ext-1/trn_dt=20090531/_tmp.12_0
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
 No lease on 
/tmp/hive-user/hive_2014-01-29_15-33-51_510_4099525102053071439/_task_tmp.-ext-1/trn_dt=20090531/_tmp.12_0:
 File does not exist. Holder DFSClient_NONMAPREDUCE_-1745484980_1 does not have 
any open files.
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2543)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2535)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:2601)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:2578)
at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:556)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:337)
at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44958)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1002)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1752)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1748)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1746)

at org.apache.hadoop.ipc.Client.call(Client.java:1238)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
at $Proxy10.complete(Unknown Source)
at sun.reflect.GeneratedMethodAccessor19.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
at $Proxy10.complete(Unknown Source)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.complete(ClientNamenodeProtocolTranslatorPB.java:330)
at 
org.apache.hadoop.hdfs.DFSOutputStream.completeFile(DFSOutputStream.java:1796)
at 
org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:1783)
at 
org.apache.hadoop.hdfs.DFSClient.closeAllFilesBeingWritten(DFSClient.java:709)
at org.apache.hadoop.hdfs.DFSClient.close(DFSClient.java:726)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.close(DistributedFileSystem.java:561)
at org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:2399)
at 
org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer.run(FileSystem.java:2415)
at 
org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)

 org.apache.hadoop.dfs.LeaseExpiredException during dfs write
 

 Key: HDFS-198
 URL: https://issues.apache.org/jira/browse/HDFS-198
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client, namenode
Reporter: Runping Qi

 Many long running cpu intensive map tasks failed due to 
 org.apache.hadoop.dfs.LeaseExpiredException.
 See [a comment 
 below|https://issues.apache.org/jira/browse/HDFS-198?focusedCommentId=12910298page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12910298]
  for the exceptions from the log:



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5399) Revisit SafeModeException and corresponding retry policies

2014-01-29 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13886053#comment-13886053
 ] 

Aaron T. Myers commented on HDFS-5399:
--

I see, so it sounds like the bug is that the NN is not leaving safemode (after 
startup?) automatically while it's in the standby state even though it's 
received sufficient block reports to cause it to leave safemode. It will then 
automatically enter the extension period and subsequently leave safemode only 
on transition to the active state. Is that correct?

bq. Is it possible that the SBN keeps tailing the editlog while holding the FSN 
lock, so that the SafeModeMonitor thread could not get the lock to leave 
safemode?

I don't think this is possible. The EditLogTailer only takes the FSN lock when 
it wakes up periodically to tail edits.

 Revisit SafeModeException and corresponding retry policies
 --

 Key: HDFS-5399
 URL: https://issues.apache.org/jira/browse/HDFS-5399
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 3.0.0
Reporter: Jing Zhao
Assignee: Jing Zhao

 Currently for NN SafeMode, we have the following corresponding retry policies:
 # In non-HA setup, for certain API call (create), the client will retry if 
 the NN is in SafeMode. Specifically, the client side's RPC adopts 
 MultipleLinearRandomRetry policy for a wrapped SafeModeException when retry 
 is enabled.
 # In HA setup, the client will retry if the NN is Active and in SafeMode. 
 Specifically, the SafeModeException is wrapped as a RetriableException in the 
 server side. Client side's RPC uses FailoverOnNetworkExceptionRetry policy 
 which recognizes RetriableException (see HDFS-5291).
 There are several possible issues in the current implementation:
 # The NN SafeMode can be a Manual SafeMode (i.e., started by administrator 
 through CLI), and the clients may not want to retry on this type of SafeMode.
 # Client may want to retry on other API calls in non-HA setup.
 # We should have a single generic strategy to address the mapping between 
 SafeMode and retry policy for both HA and non-HA setup. A possible 
 straightforward solution is to always wrap the SafeModeException in the 
 RetriableException to indicate that the clients should retry.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5845) SecondaryNameNode dies when checkpointing with cache pools

2014-01-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13886049#comment-13886049
 ] 

Hudson commented on HDFS-5845:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5063 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5063/])
HDFS-5845. SecondaryNameNode dies when checkpointing with cache pools. (wang: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1562644)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/CacheManager.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/SecondaryNameNode.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestCacheDirectives.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestCheckpoint.java


 SecondaryNameNode dies when checkpointing with cache pools
 --

 Key: HDFS-5845
 URL: https://issues.apache.org/jira/browse/HDFS-5845
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.3.0
Reporter: Andrew Wang
Assignee: Andrew Wang
Priority: Blocker
  Labels: caching
 Fix For: 2.3.0

 Attachments: hdfs-5845-1.patch


 The SecondaryNameNode clears and reloads its FSNamesystem when doing 
 checkpointing. However, FSNamesystem#clear does not clear CacheManager state 
 during this reload. This leads to an error like the following:
 {noformat}
 org.apache.hadoop.fs.InvalidRequestException: Cache pool pool1 already exists.
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5702) FsShell Cli: Add XML based End-to-End test for getfacl and setfacl commands

2014-01-29 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated HDFS-5702:


 Target Version/s: HDFS ACLs (HDFS-4685)
Affects Version/s: HDFS ACLs (HDFS-4685)
 Hadoop Flags: Reviewed

+1 for the patch.  I'll commit this later today.

 FsShell Cli: Add XML based End-to-End test for getfacl and setfacl commands
 ---

 Key: HDFS-5702
 URL: https://issues.apache.org/jira/browse/HDFS-5702
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: hdfs-client, namenode, security
Affects Versions: HDFS ACLs (HDFS-4685)
Reporter: Vinay
Assignee: Vinay
 Attachments: HDFS-5702.patch, HDFS-5702.patch


 FsShell Cli: Add XML based End-to-End test for getfacl and setfacl commands



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5841) Update HDFS caching documentation with new changes

2014-01-29 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13886060#comment-13886060
 ] 

Andrew Wang commented on HDFS-5841:
---

No tests as this is a doc change. I'm going to commit this shortly based on 
Colin's +1, thanks Colin!

 Update HDFS caching documentation with new changes
 --

 Key: HDFS-5841
 URL: https://issues.apache.org/jira/browse/HDFS-5841
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.3.0
Reporter: Andrew Wang
Assignee: Andrew Wang
  Labels: caching
 Attachments: hdfs-5841-1.patch, hdfs-5841-2.patch, hdfs-5841-3.patch


 The caching documentation is a little out of date, since it's missing 
 description of features like TTL and expiration.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HDFS-5852) Change the colors on the hdfs UI

2014-01-29 Thread stack (JIRA)
stack created HDFS-5852:
---

 Summary: Change the colors on the hdfs UI
 Key: HDFS-5852
 URL: https://issues.apache.org/jira/browse/HDFS-5852
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: stack
Priority: Blocker
 Fix For: 2.3.0


The HDFS UI colors are too close to HWX green.

Here is a patch that steers clear of vendor colors.

I made it a blocker thinking this is something we'd want to fix before we release 
apache hadoop 2.3.0.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5841) Update HDFS caching documentation with new changes

2014-01-29 Thread Andrew Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wang updated HDFS-5841:
--

   Resolution: Fixed
Fix Version/s: 2.3.0
   Status: Resolved  (was: Patch Available)

Committed to trunk, branch-2, branch-2.3.

 Update HDFS caching documentation with new changes
 --

 Key: HDFS-5841
 URL: https://issues.apache.org/jira/browse/HDFS-5841
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.3.0
Reporter: Andrew Wang
Assignee: Andrew Wang
  Labels: caching
 Fix For: 2.3.0

 Attachments: hdfs-5841-1.patch, hdfs-5841-2.patch, hdfs-5841-3.patch


 The caching documentation is a little out of date, since it's missing 
 description of features like TTL and expiration.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5852) Change the colors on the hdfs UI

2014-01-29 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HDFS-5852:


Attachment: hdfs-5852.txt

Patch that changes our basis from 'green' to 'orange'.  Screen shot coming...

 Change the colors on the hdfs UI
 

 Key: HDFS-5852
 URL: https://issues.apache.org/jira/browse/HDFS-5852
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: stack
Priority: Blocker
 Fix For: 2.3.0

 Attachments: hdfs-5852.txt


 The HDFS UI colors are too close to HWX green.
 Here is a patch that steers clear of vendor colors.
 I made it a blocker thinking this is something we'd want to fix before we 
 release apache hadoop 2.3.0.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5852) Change the colors on the hdfs UI

2014-01-29 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HDFS-5852:


Attachment: new_hdfsui_colors.png

Here is what the patch looks like.

The colors used are 'International Orange (Aerospace)
#FF4F00' for the banner background and 'International Orange (Golden Gate
Bridge) #C0362C' for highlighting when an item is selected in the banner.
A lighter hue of 'International Orange (Aerospace)', courtesy of
http://www.colorhexa.com/ff4f00, is also used for the ui-tabs div.  See
http://en.wikipedia.org/wiki/International_orange for more on IO.

 Change the colors on the hdfs UI
 

 Key: HDFS-5852
 URL: https://issues.apache.org/jira/browse/HDFS-5852
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: stack
Priority: Blocker
 Fix For: 2.3.0

 Attachments: hdfs-5852.txt, new_hdfsui_colors.png


 The HDFS UI colors are too close to HWX green.
 Here is a patch that steers clear of vendor colors.
 I made it a blocker thinking this is something we'd want to fix before we 
 release apache hadoop 2.3.0.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5776) Support 'hedged' reads in DFSClient

2014-01-29 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13886072#comment-13886072
 ] 

Jing Zhao commented on HDFS-5776:
-

Thanks for updating the patch, [~xieliang007] and [~stack].

[~stack], so the latest patch changes setThreadsNumForHedgedReads to private 
and aims to make users unable to change the thread number from client side 
dynamically. However, users can still create their own configuration object, 
change the configuration for the thread pool size, create a DFSClient instance, 
and change the thread number? So I think we may want to make it more clean 
here. Specifically,
# the first DFSClient who tries to enable the hedged read will initialize the 
thread pool (in the DFSClient constructor or in the enable method), so that the 
enable can be a real enable
# changing of the thread pool size (if it is necessary) should still go through 
a setThreadsNumForHedgedReads method (instead of the constructor of DFSClient), 
so that a client cannot silently change the size of the thread pool

Besides, the current patch has not addressed the comment for 
enoughNodesForHedgedRead/chooseDataNode.

 Support 'hedged' reads in DFSClient
 ---

 Key: HDFS-5776
 URL: https://issues.apache.org/jira/browse/HDFS-5776
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs-client
Affects Versions: 3.0.0
Reporter: Liang Xie
Assignee: Liang Xie
 Attachments: HDFS-5776-v10.txt, HDFS-5776-v11.txt, HDFS-5776-v12.txt, 
 HDFS-5776-v12.txt, HDFS-5776-v2.txt, HDFS-5776-v3.txt, HDFS-5776-v4.txt, 
 HDFS-5776-v5.txt, HDFS-5776-v6.txt, HDFS-5776-v7.txt, HDFS-5776-v8.txt, 
 HDFS-5776-v9.txt, HDFS-5776.txt


 This is a placeholder of hdfs related stuff backport from 
 https://issues.apache.org/jira/browse/HBASE-7509
 The quorum read ability should be helpful especially for optimizing read outliers.
 We can use dfs.dfsclient.quorum.read.threshold.millis and 
 dfs.dfsclient.quorum.read.threadpool.size to enable/disable the hedged read 
 ability from the client side (e.g. HBase), and by using DFSQuorumReadMetrics, we 
 could export the metrics of interest into the client system (e.g. HBase's 
 regionserver metrics).
 The core logic is in the pread code path: we decide whether to go to the original 
 fetchBlockByteRange or the newly introduced fetchBlockByteRangeSpeculative based on 
 the above config items.
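 For readers following the thread-pool discussion above, a hedged sketch (hypothetical 
 class and method names, not the DFSClient internals) of the basic pattern: lazily create 
 one shared pool, then race a second read if the first does not return within the threshold.
 {code}
 import java.util.concurrent.Callable;
 import java.util.concurrent.CompletionService;
 import java.util.concurrent.ExecutorCompletionService;
 import java.util.concurrent.ExecutorService;
 import java.util.concurrent.Executors;
 import java.util.concurrent.Future;
 import java.util.concurrent.TimeUnit;

 // Hedged sketch of speculative ("hedged") reads; names and structure are illustrative only.
 class HedgedReader {
   private static volatile ExecutorService hedgedReadPool;

   // Called once with the configured pool size (e.g. dfs.dfsclient.quorum.read.threadpool.size).
   static void allocatePoolIfNeeded(int poolSize) {
     if (poolSize > 0 && hedgedReadPool == null) {
       synchronized (HedgedReader.class) {
         if (hedgedReadPool == null) {
           hedgedReadPool = Executors.newFixedThreadPool(poolSize);
         }
       }
     }
   }

   // Race the primary read against a second replica if it is slower than thresholdMillis.
   byte[] read(Callable<byte[]> primary, Callable<byte[]> alternate, long thresholdMillis)
       throws Exception {
     CompletionService<byte[]> cs = new ExecutorCompletionService<>(hedgedReadPool);
     cs.submit(primary);
     Future<byte[]> first = cs.poll(thresholdMillis, TimeUnit.MILLISECONDS);
     if (first != null) {
       return first.get();    // primary came back within the threshold
     }
     cs.submit(alternate);     // hedge: start a read against another replica
     return cs.take().get();   // whichever read finishes first wins; the loser is ignored
   }
 }
 {code}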



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5852) Change the colors on the hdfs UI

2014-01-29 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13886077#comment-13886077
 ] 

Andrew Wang commented on HDFS-5852:
---

I'm +1, but let's give others some time to comment before committing.

 Change the colors on the hdfs UI
 

 Key: HDFS-5852
 URL: https://issues.apache.org/jira/browse/HDFS-5852
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: stack
Priority: Blocker
  Labels: webui
 Fix For: 2.3.0

 Attachments: hdfs-5852.txt, new_hdfsui_colors.png


 The HDFS UI colors are too close to HWX green.
 Here is a patch that steers clear of vendor colors.
 I made it a blocker thinking this is something we'd want to fix before we 
 release apache hadoop 2.3.0.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5852) Change the colors on the hdfs UI

2014-01-29 Thread Andrew Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wang updated HDFS-5852:
--

Labels: webui  (was: )

 Change the colors on the hdfs UI
 

 Key: HDFS-5852
 URL: https://issues.apache.org/jira/browse/HDFS-5852
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: stack
Assignee: stack
Priority: Blocker
  Labels: webui
 Fix For: 2.3.0

 Attachments: hdfs-5852.txt, new_hdfsui_colors.png


 The HDFS UI colors are too close to HWX green.
 Here is a patch that steers clear of vendor colors.
 I made it a blocker thinking this is something we'd want to fix before we 
 release apache hadoop 2.3.0.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5852) Change the colors on the hdfs UI

2014-01-29 Thread Andrew Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wang updated HDFS-5852:
--

Assignee: stack

 Change the colors on the hdfs UI
 

 Key: HDFS-5852
 URL: https://issues.apache.org/jira/browse/HDFS-5852
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: stack
Assignee: stack
Priority: Blocker
  Labels: webui
 Fix For: 2.3.0

 Attachments: hdfs-5852.txt, new_hdfsui_colors.png


 The HDFS UI colors are too close to HWX green.
 Here is a patch that steers clear of vendor colors.
 I made it a blocker thinking this is something we'd want to fix before we 
 release apache hadoop 2.3.0.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5841) Update HDFS caching documentation with new changes

2014-01-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13886081#comment-13886081
 ] 

Hudson commented on HDFS-5841:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5064 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5064/])
HDFS-5841. Update HDFS caching documentation with new changes. (wang) (wang: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1562649)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/CacheAdmin.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/CentralizedCacheManagement.apt.vm


 Update HDFS caching documentation with new changes
 --

 Key: HDFS-5841
 URL: https://issues.apache.org/jira/browse/HDFS-5841
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.3.0
Reporter: Andrew Wang
Assignee: Andrew Wang
  Labels: caching
 Fix For: 2.3.0

 Attachments: hdfs-5841-1.patch, hdfs-5841-2.patch, hdfs-5841-3.patch


 The caching documentation is a little out of date, since it's missing 
 description of features like TTL and expiration.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5399) Revisit SafeModeException and corresponding retry policies

2014-01-29 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13886083#comment-13886083
 ] 

Jing Zhao commented on HDFS-5399:
-

bq. even though it's received sufficient block reports to cause it to leave 
safemode
I'm not sure about this "even though" part, because we did not see the 
corresponding log in our test.

bq. I don't think this is possible. The EditLogTailer only takes the FSN lock 
when it wakes up periodically to tail edits.
What if a lot of file creation/deletion requests keep coming? If the editlog 
keeps growing, is it possible that the SBN keeps tailing the editlog in a 
single session and cannot get a chance to go back to sleep?

 Revisit SafeModeException and corresponding retry policies
 --

 Key: HDFS-5399
 URL: https://issues.apache.org/jira/browse/HDFS-5399
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 3.0.0
Reporter: Jing Zhao
Assignee: Jing Zhao

 Currently for NN SafeMode, we have the following corresponding retry policies:
 # In non-HA setup, for certain API call (create), the client will retry if 
 the NN is in SafeMode. Specifically, the client side's RPC adopts 
 MultipleLinearRandomRetry policy for a wrapped SafeModeException when retry 
 is enabled.
 # In HA setup, the client will retry if the NN is Active and in SafeMode. 
 Specifically, the SafeModeException is wrapped as a RetriableException in the 
 server side. Client side's RPC uses FailoverOnNetworkExceptionRetry policy 
 which recognizes RetriableException (see HDFS-5291).
 There are several possible issues in the current implementation:
 # The NN SafeMode can be a Manual SafeMode (i.e., started by administrator 
 through CLI), and the clients may not want to retry on this type of SafeMode.
 # Client may want to retry on other API calls in non-HA setup.
 # We should have a single generic strategy to address the mapping between 
 SafeMode and retry policy for both HA and non-HA setup. A possible 
 straightforward solution is to always wrap the SafeModeException in the 
 RetriableException to indicate that the clients should retry.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5399) Revisit SafeModeException and corresponding retry policies

2014-01-29 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13886085#comment-13886085
 ] 

Jing Zhao commented on HDFS-5399:
-

I will try to setup the test again to see if I can regenerate the issue and 
find out the cause of the problem.

 Revisit SafeModeException and corresponding retry policies
 --

 Key: HDFS-5399
 URL: https://issues.apache.org/jira/browse/HDFS-5399
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 3.0.0
Reporter: Jing Zhao
Assignee: Jing Zhao

 Currently for NN SafeMode, we have the following corresponding retry policies:
 # In non-HA setup, for certain API call (create), the client will retry if 
 the NN is in SafeMode. Specifically, the client side's RPC adopts 
 MultipleLinearRandomRetry policy for a wrapped SafeModeException when retry 
 is enabled.
 # In HA setup, the client will retry if the NN is Active and in SafeMode. 
 Specifically, the SafeModeException is wrapped as a RetriableException in the 
 server side. Client side's RPC uses FailoverOnNetworkExceptionRetry policy 
 which recognizes RetriableException (see HDFS-5291).
 There are several possible issues in the current implementation:
 # The NN SafeMode can be a Manual SafeMode (i.e., started by administrator 
 through CLI), and the clients may not want to retry on this type of SafeMode.
 # Client may want to retry on other API calls in non-HA setup.
 # We should have a single generic strategy to address the mapping between 
 SafeMode and retry policy for both HA and non-HA setup. A possible 
 straightforward solution is to always wrap the SafeModeException in the 
 RetriableException to indicate that the clients should retry.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5842) Cannot create hftp filesystem when using a proxy user ugi and a doAs on a secure cluster

2014-01-29 Thread Andrew Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wang updated HDFS-5842:
--

Fix Version/s: (was: 2.4.0)
   2.3.0

No prob, merged to branch-2.3. Thanks Jing!

 Cannot create hftp filesystem when using a proxy user ugi and a doAs on a 
 secure cluster
 

 Key: HDFS-5842
 URL: https://issues.apache.org/jira/browse/HDFS-5842
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: security
Affects Versions: 2.2.0
Reporter: Arpit Gupta
Assignee: Jing Zhao
 Fix For: 2.3.0

 Attachments: HADOOP-10215.000.patch, HADOOP-10215.001.patch, 
 HADOOP-10215.002.patch, HADOOP-10215.002.patch


 Noticed this while debugging issues in another application. We saw an error 
 when trying to do a FileSystem.get using an hftp file system on a secure 
 cluster using a proxy user ugi.
 Here is a small snippet of the code used:
 {code}
  FileSystem testFS = ugi.doAs(new PrivilegedExceptionAction<FileSystem>() {
 @Override
 public FileSystem run() throws IOException {
 return FileSystem.get(hadoopConf);
 }
 });
 {code}
 The same code worked for hdfs and webhdfs but not for hftp when the ugi used 
 was UserGroupInformation.createProxyUser



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-4239) Means of telling the datanode to stop using a sick disk

2014-01-29 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HDFS-4239:
--

Status: Open  (was: Patch Available)

 Means of telling the datanode to stop using a sick disk
 ---

 Key: HDFS-4239
 URL: https://issues.apache.org/jira/browse/HDFS-4239
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: stack
Assignee: Jimmy Xiang
 Attachments: hdfs-4239.patch, hdfs-4239_v2.patch


 If a disk has been deemed 'sick' -- i.e. not dead but wounded, failing 
 occasionally, or just exhibiting high latency -- your choices are:
 1. Decommission the total datanode.  If the datanode is carrying 6 or 12 
 disks of data, especially on a cluster that is smallish -- 5 to 20 nodes -- 
 the rereplication of the downed datanode's data can be pretty disruptive, 
 especially if the cluster is doing low latency serving: e.g. hosting an hbase 
 cluster.
 2. Stop the datanode, unmount the bad disk, and restart the datanode (You 
 can't unmount the disk while it is in use).  This latter is better in that 
 only the bad disk's data is rereplicated, not all datanode data.
 Is it possible to do better, say, send the datanode a signal to tell it to stop 
 using a disk an operator has designated 'bad'?  This would be like option #2 
 above minus the need to stop and restart the datanode.  Ideally the disk 
 would become unmountable after a while.
 Nice to have would be being able to tell the datanode to restart using a disk 
 after it's been replaced.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5492) Port HDFS-2069 (Incorrect default trash interval in the docs) to trunk

2014-01-29 Thread Akira AJISAKA (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira AJISAKA updated HDFS-5492:


Attachment: HDFS-5492.2.patch

Thank you for your comment.
Removed the mention of the exact packet size.

 Port HDFS-2069 (Incorrect default trash interval in the docs) to trunk
 --

 Key: HDFS-5492
 URL: https://issues.apache.org/jira/browse/HDFS-5492
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: documentation
Affects Versions: 2.2.0
Reporter: Akira AJISAKA
Assignee: Akira AJISAKA
Priority: Minor
  Labels: documentation, newbie
 Attachments: HDFS-5492.2.patch, HDFS-5492.patch, HDFS-5492.patch


 HDFS-2069 is not ported to current document.
 The description of HDFS-2069 is as follows:
 {quote}
 Current HDFS architecture information about Trash is incorrectly documented 
 as -
 The current default policy is to delete files from /trash that are more than 
 6 hours old. In the future, this policy will be configurable through a well 
 defined interface.
 It should be something like -
 Current default trash interval is set to 0 (deletes files without storing them in 
 trash). This value is a configurable parameter, fs.trash.interval, 
 stored in core-site.xml.
 {quote}
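 For reference, a hedged example (not part of the patch) of setting the trash interval 
 programmatically; the value is in minutes, and 0, the default, disables trash entirely.
 {code}
 import org.apache.hadoop.conf.Configuration;

 // Hedged example: fs.trash.interval is the number of minutes a deleted file is kept
 // under .Trash before permanent removal; 0 (the default) deletes immediately.
 public class TrashConfigExample {
   public static void main(String[] args) {
     Configuration conf = new Configuration();
     conf.setLong("fs.trash.interval", 360);   // keep trash for 6 hours
     System.out.println("fs.trash.interval = " + conf.getLong("fs.trash.interval", 0));
   }
 }
 {code}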



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5842) Cannot create hftp filesystem when using a proxy user ugi and a doAs on a secure cluster

2014-01-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13886100#comment-13886100
 ] 

Hudson commented on HDFS-5842:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5065 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5065/])
Update CHANGES.txt to move HDFS-5842 to 2.3.0 (wang: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1562656)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


 Cannot create hftp filesystem when using a proxy user ugi and a doAs on a 
 secure cluster
 

 Key: HDFS-5842
 URL: https://issues.apache.org/jira/browse/HDFS-5842
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: security
Affects Versions: 2.2.0
Reporter: Arpit Gupta
Assignee: Jing Zhao
 Fix For: 2.3.0

 Attachments: HADOOP-10215.000.patch, HADOOP-10215.001.patch, 
 HADOOP-10215.002.patch, HADOOP-10215.002.patch


 Noticed this while debugging issues in another application. We saw an error 
 when trying to do a FileSystem.get using an hftp file system on a secure 
 cluster using a proxy user ugi.
 Here is a small snippet of the code used:
 {code}
  FileSystem testFS = ugi.doAs(new PrivilegedExceptionAction<FileSystem>() {
 @Override
 public FileSystem run() throws IOException {
 return FileSystem.get(hadoopConf);
 }
 });
 {code}
 The same code worked for hdfs and webhdfs but not for hftp when the ugi used 
 was UserGroupInformation.createProxyUser



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5848) Add a DatanodeCommand to inform datanodes that rolling upgrade is in progress

2014-01-29 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated HDFS-5848:
-

Attachment: h5848_20130130b.patch

h5848_20130130b.patch: add a rollingUpgradeInfo field to HeartbeatResponse 
instead of a new DatanodeCommand.

 Add a DatanodeCommand to inform datanodes that rolling upgrade is in progress
 -

 Key: HDFS-5848
 URL: https://issues.apache.org/jira/browse/HDFS-5848
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Tsz Wo (Nicholas), SZE
 Attachments: h5848_20130130.patch, h5848_20130130b.patch


 When rolling upgrade is in progress, the NN should inform datanodes via heartbeat 
 responses so that datanodes create hardlinks when deleting blocks.  We 
 need to add a new DatanodeCommand here.  The datanode change will be done in 
 a separate JIRA.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5754) Split LayoutVersion into NamenodeLayoutVersion and DatanodeLayoutVersion

2014-01-29 Thread Brandon Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Li updated HDFS-5754:
-

Attachment: HDFS-5754.012.patch

Uploaded a new patch to address the comments.

 Split LayoutVersion into NamenodeLayoutVersion and DatanodeLayoutVersion 
 

 Key: HDFS-5754
 URL: https://issues.apache.org/jira/browse/HDFS-5754
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode, namenode
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Brandon Li
 Attachments: FeatureInfo.patch, HDFS-5754.001.patch, 
 HDFS-5754.002.patch, HDFS-5754.003.patch, HDFS-5754.004.patch, 
 HDFS-5754.006.patch, HDFS-5754.007.patch, HDFS-5754.008.patch, 
 HDFS-5754.009.patch, HDFS-5754.010.patch, HDFS-5754.012.patch


 Currently, LayoutVersion defines the on-disk data format and supported 
 features of the entire cluster including NN and DNs.  LayoutVersion is 
 persisted in both NN and DNs.  When a NN/DN starts up, it checks its 
 supported LayoutVersion against the on-disk LayoutVersion.  Also, a DN with a 
 different LayoutVersion than NN cannot register with the NN.
 We propose to split LayoutVersion into two independent values that are local 
 to the nodes:
 - NamenodeLayoutVersion - defines the on-disk data format in NN, including 
 the format of FSImage, editlog and the directory structure.
 - DatanodeLayoutVersion - defines the on-disk data format in DN, including 
 the format of block data file, metadata file, block pool layout, and the 
 directory structure.  
 The LayoutVersion check will be removed in DN registration.  If 
 NamenodeLayoutVersion or DatanodeLayoutVersion is changed in a rolling 
 upgrade, then only rollback is supported and downgrade is not.
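 A hedged sketch of the intent (hypothetical names and values, not the actual patch): after 
 the split, each node type validates only its own on-disk layout version at startup, and the 
 layout-version comparison is dropped from DN registration.
 {code}
 // Hedged sketch; the constant value is illustrative only. HDFS layout versions are
 // negative integers that decrease as the layout evolves.
 final class DataNodeLayoutVersion {
   static final int CURRENT_LAYOUT_VERSION = -52;   // hypothetical value

   private DataNodeLayoutVersion() {}

   // Startup check against the DN's own on-disk state; the NN's layout version is not consulted.
   static void checkOnDiskVersion(int onDiskVersion) {
     if (onDiskVersion < CURRENT_LAYOUT_VERSION) {
       throw new IllegalStateException("On-disk layout " + onDiskVersion
           + " is newer than the software layout " + CURRENT_LAYOUT_VERSION
           + "; refusing to start");
     }
   }
 }
 {code}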



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5848) Add rolling upgrade information to heartbeat response

2014-01-29 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated HDFS-5848:
-

Description: When rolling upgrade is in progress, the NN should inform 
datanodes via heartbeat responses so that datanodes create hardlinks when 
deleting blocks.  We only change the heartbeat response here.  The datanode change 
will be done in a separate JIRA.  (was: When rolling upgrade is in progress, 
the NN should inform datanodes via heartbeat responses so that datanodes 
create hardlinks when deleting blocks.  We need to add a new DatanodeCommand 
here.  The datanode change will be done in a separate JIRA.)
Summary: Add rolling upgrade information to heartbeat response  (was: 
Add a DatanodeCommand to inform datanodes that rolling upgrade is in progress)

 Add rolling upgrade information to heartbeat response
 

 Key: HDFS-5848
 URL: https://issues.apache.org/jira/browse/HDFS-5848
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Tsz Wo (Nicholas), SZE
 Attachments: h5848_20130130.patch, h5848_20130130b.patch


 When rolling upgrade is in progress, the NN should inform datanodes via heartbeat 
 responses so that datanodes create hardlinks when deleting blocks.  We 
 only change the heartbeat response here.  The datanode change will be done in a 
 separate JIRA.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5848) Add rolling upgrade information to heartbeat response

2014-01-29 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13886108#comment-13886108
 ] 

Suresh Srinivas commented on HDFS-5848:
---

I actually think this should just be an upgrade state that is part of 
HeartbeatResponse instead of a separate command, much like the {{haStatus}} 
member it currently has.

 Add rolling upgrade information to heartbeat response
 

 Key: HDFS-5848
 URL: https://issues.apache.org/jira/browse/HDFS-5848
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Tsz Wo (Nicholas), SZE
 Attachments: h5848_20130130.patch, h5848_20130130b.patch


 When rolling upgrade is in progress, the NN should inform datanodes via heartbeat 
 responses so that datanodes create hardlinks when deleting blocks.  We 
 only change the heartbeat response here.  The datanode change will be done in a 
 separate JIRA.
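 A hedged sketch of the suggestion above (hypothetical types, not the real protocol classes): 
 the heartbeat reply carries an optional rolling-upgrade status next to the HA status, and a 
 null value means no rolling upgrade is in progress.
 {code}
 // Hedged sketch only; field and type names are illustrative.
 class RollingUpgradeStatus {
   final String blockPoolId;
   final long startTimeMillis;
   RollingUpgradeStatus(String blockPoolId, long startTimeMillis) {
     this.blockPoolId = blockPoolId;
     this.startTimeMillis = startTimeMillis;
   }
 }

 class HeartbeatResponse {
   final String haStatus;                            // existing state, simplified here
   final RollingUpgradeStatus rollingUpgradeStatus;  // new; null when no upgrade is running

   HeartbeatResponse(String haStatus, RollingUpgradeStatus rollingUpgradeStatus) {
     this.haStatus = haStatus;
     this.rollingUpgradeStatus = rollingUpgradeStatus;
   }

   // A DN that sees true here hardlinks blocks instead of deleting them outright.
   boolean isRollingUpgradeInProgress() {
     return rollingUpgradeStatus != null;
   }
 }
 {code}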



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5776) Support 'hedged' reads in DFSClient

2014-01-29 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13886107#comment-13886107
 ] 

stack commented on HDFS-5776:
-

[~jingzhao] Thanks for the new input.  Please help me better understand what 
you mean by making it "more clean" so we can adjust the patch accordingly.

Hedged reads are set on or off in the client configuration xml and can be 
enabled/disabled per DFSClient instance as you go.  Yes, you could read the code 
and figure out that it is possible to do some heavyweight gymnastics creating your 
own Configuration -- expensive -- and a new DFSClient -- ditto -- if you wanted 
to work around whatever is out in the configuration xml.  That seems fine by me, 
especially as there is no real means of shutting down this access route.

Pardon me, but I do not follow what you are asking for in 1.  Maybe you are 
referring to a 'hole' where if the thread count is <= 0 on construction, the 
enable will have no effect -- and you want it to have an 'effect' post 
construction?

For 2., you are suggesting that setThreadsNumForHedgedReads not be private but 
be an available API for the DFSClient to toggle as it sees fit?

I'll let @liang xie address your enoughNodesForHedgedRead comment.

Thanks for checking back.

 Support 'hedged' reads in DFSClient
 ---

 Key: HDFS-5776
 URL: https://issues.apache.org/jira/browse/HDFS-5776
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs-client
Affects Versions: 3.0.0
Reporter: Liang Xie
Assignee: Liang Xie
 Attachments: HDFS-5776-v10.txt, HDFS-5776-v11.txt, HDFS-5776-v12.txt, 
 HDFS-5776-v12.txt, HDFS-5776-v2.txt, HDFS-5776-v3.txt, HDFS-5776-v4.txt, 
 HDFS-5776-v5.txt, HDFS-5776-v6.txt, HDFS-5776-v7.txt, HDFS-5776-v8.txt, 
 HDFS-5776-v9.txt, HDFS-5776.txt


 This is a placeholder of hdfs related stuff backport from 
 https://issues.apache.org/jira/browse/HBASE-7509
 The quorum read ability should be helpful especially for optimizing read outliers.
 We can use dfs.dfsclient.quorum.read.threshold.millis and 
 dfs.dfsclient.quorum.read.threadpool.size to enable/disable the hedged read 
 ability from the client side (e.g. HBase), and by using DFSQuorumReadMetrics, we 
 could export the metrics of interest into the client system (e.g. HBase's 
 regionserver metrics).
 The core logic is in the pread code path: we decide whether to go to the original 
 fetchBlockByteRange or the newly introduced fetchBlockByteRangeSpeculative based on 
 the above config items.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5852) Change the colors on the hdfs UI

2014-01-29 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13886109#comment-13886109
 ] 

Tsz Wo (Nicholas), SZE commented on HDFS-5852:
--

What if there is a vendor using orange (or any color you chosen)?

 Change the colors on the hdfs UI
 

 Key: HDFS-5852
 URL: https://issues.apache.org/jira/browse/HDFS-5852
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: stack
Assignee: stack
Priority: Blocker
  Labels: webui
 Fix For: 2.3.0

 Attachments: hdfs-5852.txt, new_hdfsui_colors.png


 The HDFS UI colors are too close to HWX green.
 Here is a patch that steers clear of vendor colors.
 I made it a blocker thinking this is something we'd want to fix before we 
 release apache hadoop 2.3.0.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HDFS-5767) Nfs implementation assumes userName userId mapping to be unique, which is not true sometimes

2014-01-29 Thread Brandon Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Li updated HDFS-5767:
-

Priority: Blocker  (was: Major)

 Nfs implementation assumes userName userId mapping to be unique, which is not 
 true sometimes
 

 Key: HDFS-5767
 URL: https://issues.apache.org/jira/browse/HDFS-5767
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: nfs
Affects Versions: 2.3.0
 Environment: With LDAP enabled
Reporter: Yongjun Zhang
Assignee: Brandon Li
Priority: Blocker

 I'm seeing that the nfs implementation assumes a unique <userName, userId> pair 
 to be returned by the command "getent passwd". That is, for a given userName, 
 there should be a single userId, and for a given userId, there should be a 
 single userName.  The reason is explained in the following message:
  private static final String DUPLICATE_NAME_ID_DEBUG_INFO = "NFS gateway 
 can't start with duplicate name or id on the host system.\n"
   + "This is because HDFS (non-kerberos cluster) uses name as the only 
 way to identify a user or group.\n"
   + "The host system with duplicated user/group name or id might work 
 fine most of the time by itself.\n"
   + "However when NFS gateway talks to HDFS, HDFS accepts only user and 
 group name.\n"
   + "Therefore, same name means the same user or same group. To find the 
 duplicated names/ids, one can do:\n"
   + "<getent passwd | cut -d: -f1,3> and <getent group | cut -d: -f1,3> 
 on Linux systems,\n"
   + "<dscl . -list /Users UniqueID> and <dscl . -list /Groups 
 PrimaryGroupID> on MacOS.";
 This requirement can not be met sometimes (e.g. because of the use of LDAP). 
 Let's do some examination:
 What exists in /etc/passwd:
 $ more /etc/passwd | grep ^bin
 bin:x:2:2:bin:/bin:/bin/sh
 $ more /etc/passwd | grep ^daemon
 daemon:x:1:1:daemon:/usr/sbin:/bin/sh
 The above result says userName "bin" has userId 2, and "daemon" has userId 
 1.
  
 What we can see with the "getent passwd" command due to LDAP:
 $ getent passwd | grep ^bin
 bin:x:2:2:bin:/bin:/bin/sh
 bin:x:1:1:bin:/bin:/sbin/nologin
 $ getent passwd | grep ^daemon
 daemon:x:1:1:daemon:/usr/sbin:/bin/sh
 daemon:x:2:2:daemon:/sbin:/sbin/nologin
 We can see that there are multiple entries for the same userName with 
 different userIds, and the same userId could be associated with different 
 userNames.
 So the assumption stated in the above DEBUG_INFO message can not be met here. 
 The DEBUG_INFO also stated that HDFS uses name as the only way to identify 
 user/group. I'm filing this JIRA for a solution.
 Hi [~brandonli], since you implemented most of the nfs feature, would you 
 please comment? 
 Thanks.
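 To make the DEBUG_INFO check concrete, a hedged sketch (not the NFS gateway code) of 
 detecting duplicated names or ids in "getent passwd"-style name:x:uid:gid:... lines:
 {code}
 import java.util.HashMap;
 import java.util.Map;

 // Hedged sketch: fail fast when the same name maps to two ids or the same id maps
 // to two names, which is exactly the situation described in this issue.
 public class IdMappingCheck {
   public static void check(Iterable<String> passwdLines) {
     Map<String, Integer> nameToId = new HashMap<>();
     Map<Integer, String> idToName = new HashMap<>();
     for (String line : passwdLines) {
       String[] fields = line.split(":");
       String name = fields[0];
       int uid = Integer.parseInt(fields[2]);
       Integer previousId = nameToId.put(name, uid);
       if (previousId != null && previousId != uid) {
         throw new IllegalStateException(
             "Duplicate name " + name + " with ids " + previousId + " and " + uid);
       }
       String previousName = idToName.put(uid, name);
       if (previousName != null && !previousName.equals(name)) {
         throw new IllegalStateException(
             "Duplicate id " + uid + " for names " + previousName + " and " + name);
       }
     }
   }
 }
 {code}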



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5767) Nfs implementation assumes userName userId mapping to be unique, which is not true sometimes

2014-01-29 Thread Brandon Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13886116#comment-13886116
 ] 

Brandon Li commented on HDFS-5767:
--

Making it a blocker for the 2.3 release. I just ran into a couple of real-life 
examples where completely duplicated accounts exist in both the local database and the LDAP 
server, and administrators don't want to clean up the dups.

 Nfs implementation assumes userName userId mapping to be unique, which is not 
 true sometimes
 

 Key: HDFS-5767
 URL: https://issues.apache.org/jira/browse/HDFS-5767
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: nfs
Affects Versions: 2.3.0
 Environment: With LDAP enabled
Reporter: Yongjun Zhang
Assignee: Brandon Li
Priority: Blocker

 I'm seeing that the nfs implementation assumes a unique <userName, userId> pair 
 to be returned by the command "getent passwd". That is, for a given userName, 
 there should be a single userId, and for a given userId, there should be a 
 single userName.  The reason is explained in the following message:
  private static final String DUPLICATE_NAME_ID_DEBUG_INFO = "NFS gateway 
 can't start with duplicate name or id on the host system.\n"
   + "This is because HDFS (non-kerberos cluster) uses name as the only 
 way to identify a user or group.\n"
   + "The host system with duplicated user/group name or id might work 
 fine most of the time by itself.\n"
   + "However when NFS gateway talks to HDFS, HDFS accepts only user and 
 group name.\n"
   + "Therefore, same name means the same user or same group. To find the 
 duplicated names/ids, one can do:\n"
   + "<getent passwd | cut -d: -f1,3> and <getent group | cut -d: -f1,3> 
 on Linux systems,\n"
   + "<dscl . -list /Users UniqueID> and <dscl . -list /Groups 
 PrimaryGroupID> on MacOS.";
 This requirement can not be met sometimes (e.g. because of the use of LDAP). 
 Let's do some examination:
 What exists in /etc/passwd:
 $ more /etc/passwd | grep ^bin
 bin:x:2:2:bin:/bin:/bin/sh
 $ more /etc/passwd | grep ^daemon
 daemon:x:1:1:daemon:/usr/sbin:/bin/sh
 The above result says userName "bin" has userId 2, and "daemon" has userId 
 1.
  
 What we can see with the "getent passwd" command due to LDAP:
 $ getent passwd | grep ^bin
 bin:x:2:2:bin:/bin:/bin/sh
 bin:x:1:1:bin:/bin:/sbin/nologin
 $ getent passwd | grep ^daemon
 daemon:x:1:1:daemon:/usr/sbin:/bin/sh
 daemon:x:2:2:daemon:/sbin:/sbin/nologin
 We can see that there are multiple entries for the same userName with 
 different userIds, and the same userId could be associated with different 
 userNames.
 So the assumption stated in the above DEBUG_INFO message can not be met here. 
 The DEBUG_INFO also stated that HDFS uses name as the only way to identify 
 user/group. I'm filing this JIRA for a solution.
 Hi [~brandonli], since you implemented most of the nfs feature, would you 
 please comment? 
 Thanks.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

