[jira] [Commented] (HDFS-6621) Hadoop Balancer prematurely exits iterations
[ https://issues.apache.org/jira/browse/HDFS-6621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14124812#comment-14124812 ] Yongjun Zhang commented on HDFS-6621: - Hi [~ravwojdyla], I studied it a bit more, and it seems to me that on top of the changes you made, we need to replace the {{Dispatcher.this}} in the following code with {{this}}:
{code}
try {
  synchronized (Dispatcher.this) {
    Dispatcher.this.wait(1000); // wait for targets/sources to be idle
  }
} catch (InterruptedException ignored) {
}
{code}
This would make a scheduling thread whose five transfer threads are all occupied/unfinished block on its {{source}}; later, when one transfer thread finishes, it would notify this blocked scheduling thread (via your change for problem 2) that a slot is now available. If this makes sense to you, would you please try it out with the testing you have done? Again, the first problem seems to be the important one to fix, but I don't know how important the second one is (see the question asked in my last comment). If the fix for problem 1 is good enough, then we can go with it alone. Otherwise, my suggested change above can be explored. Would you please comment? Thanks a lot. Hadoop Balancer prematurely exits iterations Key: HDFS-6621 URL: https://issues.apache.org/jira/browse/HDFS-6621 Project: Hadoop HDFS Issue Type: Bug Components: balancer Affects Versions: 2.2.0, 2.4.0 Environment: Red Hat Enterprise Linux Server release 5.8 with Hadoop 2.4.0 Reporter: Benjamin Bowman Labels: balancer Attachments: HDFS-6621.patch, HDFS-6621.patch_2 I have been having an issue with the balancing being too slow. The issue was not with the speed with which blocks were moved, but rather that the balancer would prematurely exit out of its balancing iterations. It would move ~10 blocks or 100 MB and then exit the current iteration (in which it said it was planning on moving about 10 GB). I looked in the Balancer.java code and believe I found and solved the issue. In the dispatchBlocks() function there is a variable, noPendingBlockIteration, which counts the number of iterations in which a pending block to move cannot be found. Once this number gets to 5, the balancer exits the overall balancing iteration. I believe the desired functionality is 5 consecutive no-pending-block iterations - however, this variable is never reset to 0 upon block moves. So once this number reaches 5 - even if there have been thousands of blocks moved in between these no-pending-block iterations - the overall balancing iteration will prematurely end. The fix I applied was to set noPendingBlockIteration = 0 when a pending block is found and scheduled. In this way, my iterations do not prematurely exit unless there are 5 consecutive no-pending-block iterations. Below is a copy of my dispatchBlocks() function with the change I made.
{code}
private void dispatchBlocks() {
  long startTime = Time.now();
  long scheduledSize = getScheduledSize();
  this.blocksToReceive = 2 * scheduledSize;
  boolean isTimeUp = false;
  int noPendingBlockIteration = 0;
  while (!isTimeUp && getScheduledSize() > 0
      && (!srcBlockList.isEmpty() || blocksToReceive > 0)) {
    PendingBlockMove pendingBlock = chooseNextBlockToMove();
    if (pendingBlock != null) {
      noPendingBlockIteration = 0;
      // move the block
      pendingBlock.scheduleBlockMove();
      continue;
    }
    /* Since we can not schedule any block to move,
     * filter any moved blocks from the source block list and
     * check if we should fetch more blocks from the namenode
     */
    filterMovedBlocks(); // filter already moved blocks
    if (shouldFetchMoreBlocks()) {
      // fetch new blocks
      try {
        blocksToReceive -= getBlockList();
        continue;
      } catch (IOException e) {
        LOG.warn("Exception while getting block list", e);
        return;
      }
    } else {
      // source node cannot find a pendingBlockToMove, iteration +1
      noPendingBlockIteration++;
      // in case no blocks can be moved for source node's task,
      // jump out of while-loop after 5 iterations.
      if (noPendingBlockIteration >= MAX_NO_PENDING_BLOCK_ITERATIONS) {
        setScheduledSize(0);
      }
    }
    // check if time is up or not
    if (Time.now() - startTime > MAX_ITERATION_TIME) {
      isTimeUp = true;
      continue;
    }
    /* Now we can not schedule any block to move and there are
     * no new blocks added to the source block list, so we wait.
     */
    try {
      synchronized (Balancer.this) {
        Balancer.this.wait(1000); // wait for targets/sources to be idle
      }
    } catch (InterruptedException ignored) {
    }
  }
}
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
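For readers following the wait/notify discussion above, here is a minimal, self-contained sketch of the pattern being proposed: the scheduling thread waits on its own source object rather than on a global Dispatcher/Balancer lock, so a finishing transfer thread can wake exactly that thread. The class and member names below are illustrative assumptions, not the actual Hadoop Dispatcher code.
{code}
// Simplified sketch (not the actual Hadoop code): a scheduling thread waits
// on its own Source object when all transfer slots are busy, and a finishing
// transfer thread notifies that same object.
class Source {
  private static final int MAX_SLOTS = 5;
  private int busySlots = 0;

  synchronized void scheduleMoves() throws InterruptedException {
    while (busySlots >= MAX_SLOTS) {
      // wait on "this" (the source), not on a shared Dispatcher/Balancer lock,
      // so the notify below wakes exactly this scheduling thread
      wait(1000);
    }
    busySlots++;
    // ... hand the block move off to a transfer thread ...
  }

  synchronized void onTransferFinished() {
    busySlots--;
    notifyAll(); // wake the scheduling thread blocked in scheduleMoves()
  }
}
{code}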
[jira] [Commented] (HDFS-7025) HDFS Credential Provider related Unit Test Failure
[ https://issues.apache.org/jira/browse/HDFS-7025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14124830#comment-14124830 ] Hadoop QA commented on HDFS-7025: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12667066/HDFS-7025.1.patch against trunk revision d1fa582. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.web.TestWebHdfsFileSystemContract org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7933//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7933//console This message is automatically generated. HDFS Credential Provider related Unit Test Failure --- Key: HDFS-7025 URL: https://issues.apache.org/jira/browse/HDFS-7025 Project: Hadoop HDFS Issue Type: Test Components: encryption Affects Versions: 2.4.1 Reporter: Xiaoyu Yao Assignee: Xiaoyu Yao Attachments: HDFS-7025.0.patch, HDFS-7025.1.patch Reported by: Xiaomara and investigated by [~cnauroth]. The credential provider related unit tests failed on Windows. The tests try to set up a URI by taking the build test directory and concatenating it with other strings containing the rest of the URI format, i.e.:
{code}
public void testFactory() throws Exception {
  Configuration conf = new Configuration();
  conf.set(CredentialProviderFactory.CREDENTIAL_PROVIDER_PATH,
      UserProvider.SCHEME_NAME + ":///," +
      JavaKeyStoreProvider.SCHEME_NAME + "://file" + tmpDir + "/test.jks");
{code}
This logic is incorrect on Windows, because the file path separator will be '\', which violates URI syntax; only the forward slash is permitted. The proper fix is to always do path/URI construction through the org.apache.hadoop.fs.Path class, specifically using the constructors that take explicit parent and child arguments. The affected unit tests are:
{code}
* TestCryptoAdminCLI
* TestDFSUtil
* TestEncryptionZones
* TestReservedRawPaths
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
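As a rough illustration of the fix described in HDFS-7025 above, the sketch below builds the keystore provider URI through org.apache.hadoop.fs.Path parent/child constructors instead of raw string concatenation, so Windows backslashes are normalized before the value reaches the configuration. This is a simplified sketch under those assumptions, not the committed test code; the helper class name is made up.
{code}
import java.io.File;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

// Sketch only: derive the provider URI from a Path so that a Windows
// directory such as C:\build\test never leaks backslashes into the URI.
public class CredentialProviderUriSketch {
  static Configuration configure(File tmpDir) {
    Configuration conf = new Configuration();
    Path jksPath = new Path(tmpDir.toString(), "test.jks"); // explicit parent + child
    String providerUri = "jceks://file" + jksPath.toUri();  // Path normalizes '\' to '/'
    conf.set("hadoop.security.credential.provider.path", providerUri);
    return conf;
  }
}
{code}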
[jira] [Commented] (HDFS-6940) Initial refactoring to allow ConsensusNode implementation
[ https://issues.apache.org/jira/browse/HDFS-6940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14124869#comment-14124869 ] Hudson commented on HDFS-6940: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #673 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/673/]) HDFS-6940. Refactoring to allow ConsensusNode implementation. (shv: rev 88209ce181b5ecc55c0ae2bceff4893ab4817e88) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NameNodeAdapter.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/HostFileManager.java Initial refactoring to allow ConsensusNode implementation - Key: HDFS-6940 URL: https://issues.apache.org/jira/browse/HDFS-6940 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: 2.0.6-alpha, 2.5.0 Reporter: Konstantin Shvachko Assignee: Konstantin Shvachko Fix For: 2.6.0 Attachments: HDFS-6940.patch Minor refactoring of FSNamesystem to open private methods that are needed for CNode implementation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6898) DN must reserve space for a full block when an RBW block is created
[ https://issues.apache.org/jira/browse/HDFS-6898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14124868#comment-14124868 ] Hudson commented on HDFS-6898: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #673 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/673/]) HDFS-6898. DN must reserve space for a full block when an RBW block is created. (Contributed by Arpit Agarwal) (arp: rev d1fa58292e87bc29b4ef1278368c2be938a0afc4) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/ReplicaInfo.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/ReplicaInPipeline.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/BlockPoolSlice.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/ReplicaBeingWritten.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/HdfsConstants.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDirectoryScanner.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/TestRbwSpaceReservation.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/TestWriteToReplica.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsVolumeImpl.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/FsVolumeSpi.java DN must reserve space for a full block when an RBW block is created --- Key: HDFS-6898 URL: https://issues.apache.org/jira/browse/HDFS-6898 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.5.0 Reporter: Gopal V Assignee: Arpit Agarwal Attachments: HDFS-6898.01.patch, HDFS-6898.03.patch, HDFS-6898.04.patch, HDFS-6898.05.patch, HDFS-6898.06.patch, HDFS-6898.07.patch DN will successfully create two RBW blocks on the same volume even if the free space is sufficient for just one full block. One or both block writers may subsequently get a DiskOutOfSpace exception. This can be avoided by allocating space up front. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
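To make the up-front reservation idea in HDFS-6898 concrete, here is a small, self-contained sketch of reserving a full block's worth of space when an RBW replica is created and releasing the unused remainder when it is finalized. The class and member names are illustrative assumptions and do not correspond to the actual FsVolumeImpl API.
{code}
import java.util.concurrent.atomic.AtomicLong;

// Illustrative sketch of up-front block space reservation on a volume;
// names are hypothetical, not the actual FsVolumeImpl members.
class VolumeSketch {
  private final long capacity;
  private final AtomicLong used = new AtomicLong(0);
  private final AtomicLong reservedForRbw = new AtomicLong(0);

  VolumeSketch(long capacity) { this.capacity = capacity; }

  /** Reserve space for a full block before accepting an RBW replica. */
  boolean tryReserve(long blockSize) {
    while (true) {
      long reserved = reservedForRbw.get();
      if (used.get() + reserved + blockSize > capacity) {
        return false; // not enough room for a full block; reject the writer now
      }
      if (reservedForRbw.compareAndSet(reserved, reserved + blockSize)) {
        return true;
      }
    }
  }

  /** Release whatever part of the reservation the finalized replica did not use. */
  void release(long unusedBytes) {
    reservedForRbw.addAndGet(-unusedBytes);
  }
}
{code}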
[jira] [Commented] (HDFS-6940) Initial refactoring to allow ConsensusNode implementation
[ https://issues.apache.org/jira/browse/HDFS-6940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14124901#comment-14124901 ] Hudson commented on HDFS-6940: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1864 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1864/]) HDFS-6940. Refactoring to allow ConsensusNode implementation. (shv: rev 88209ce181b5ecc55c0ae2bceff4893ab4817e88) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/HostFileManager.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NameNodeAdapter.java Initial refactoring to allow ConsensusNode implementation - Key: HDFS-6940 URL: https://issues.apache.org/jira/browse/HDFS-6940 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: 2.0.6-alpha, 2.5.0 Reporter: Konstantin Shvachko Assignee: Konstantin Shvachko Fix For: 2.6.0 Attachments: HDFS-6940.patch Minor refactoring of FSNamesystem to open private methods that are needed for CNode implementation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6898) DN must reserve space for a full block when an RBW block is created
[ https://issues.apache.org/jira/browse/HDFS-6898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14124900#comment-14124900 ] Hudson commented on HDFS-6898: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1864 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1864/]) HDFS-6898. DN must reserve space for a full block when an RBW block is created. (Contributed by Arpit Agarwal) (arp: rev d1fa58292e87bc29b4ef1278368c2be938a0afc4) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/FsVolumeSpi.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/HdfsConstants.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/TestRbwSpaceReservation.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsVolumeImpl.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/ReplicaInfo.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/BlockPoolSlice.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/TestWriteToReplica.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDirectoryScanner.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/ReplicaBeingWritten.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/ReplicaInPipeline.java DN must reserve space for a full block when an RBW block is created --- Key: HDFS-6898 URL: https://issues.apache.org/jira/browse/HDFS-6898 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.5.0 Reporter: Gopal V Assignee: Arpit Agarwal Attachments: HDFS-6898.01.patch, HDFS-6898.03.patch, HDFS-6898.04.patch, HDFS-6898.05.patch, HDFS-6898.06.patch, HDFS-6898.07.patch DN will successfully create two RBW blocks on the same volume even if the free space is sufficient for just one full block. One or both block writers may subsequently get a DiskOutOfSpace exception. This can be avoided by allocating space up front. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6621) Hadoop Balancer prematurely exits iterations
[ https://issues.apache.org/jira/browse/HDFS-6621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14124903#comment-14124903 ] Rafal Wojdyla commented on HDFS-6621: - Hi [~yzhangal], Thanks for the comments, sorry for the delay. First of all - I agree that the first problem is more important, and we should just merge it in. About the solution to the second problem: do we agree that the problem exists? Especially with a big number of threads, such wake-ups for some threads may be lethal even with the fix for the first problem. Is that correct? It's been a while since I made this change, but as far as I recall I tested both problems/solutions and they were separate problems; both of them cause premature exits. The first problem was more lethal, though. About your comment on the waiting - you are completely right! I missed this in the patch. Now I see even more the value of pushing patches/creating tickets right away ... not waiting till you have a bunch of changes. Hadoop Balancer prematurely exits iterations Key: HDFS-6621 URL: https://issues.apache.org/jira/browse/HDFS-6621 Project: Hadoop HDFS Issue Type: Bug Components: balancer Affects Versions: 2.2.0, 2.4.0 Environment: Red Hat Enterprise Linux Server release 5.8 with Hadoop 2.4.0 Reporter: Benjamin Bowman Labels: balancer Attachments: HDFS-6621.patch, HDFS-6621.patch_2 I have been having an issue with the balancing being too slow. The issue was not with the speed with which blocks were moved, but rather that the balancer would prematurely exit out of its balancing iterations. It would move ~10 blocks or 100 MB and then exit the current iteration (in which it said it was planning on moving about 10 GB). I looked in the Balancer.java code and believe I found and solved the issue. In the dispatchBlocks() function there is a variable, noPendingBlockIteration, which counts the number of iterations in which a pending block to move cannot be found. Once this number gets to 5, the balancer exits the overall balancing iteration. I believe the desired functionality is 5 consecutive no-pending-block iterations - however, this variable is never reset to 0 upon block moves. So once this number reaches 5 - even if there have been thousands of blocks moved in between these no-pending-block iterations - the overall balancing iteration will prematurely end. The fix I applied was to set noPendingBlockIteration = 0 when a pending block is found and scheduled. In this way, my iterations do not prematurely exit unless there are 5 consecutive no-pending-block iterations. Below is a copy of my dispatchBlocks() function with the change I made.
{code}
private void dispatchBlocks() {
  long startTime = Time.now();
  long scheduledSize = getScheduledSize();
  this.blocksToReceive = 2 * scheduledSize;
  boolean isTimeUp = false;
  int noPendingBlockIteration = 0;
  while (!isTimeUp && getScheduledSize() > 0
      && (!srcBlockList.isEmpty() || blocksToReceive > 0)) {
    PendingBlockMove pendingBlock = chooseNextBlockToMove();
    if (pendingBlock != null) {
      noPendingBlockIteration = 0;
      // move the block
      pendingBlock.scheduleBlockMove();
      continue;
    }
    /* Since we can not schedule any block to move,
     * filter any moved blocks from the source block list and
     * check if we should fetch more blocks from the namenode
     */
    filterMovedBlocks(); // filter already moved blocks
    if (shouldFetchMoreBlocks()) {
      // fetch new blocks
      try {
        blocksToReceive -= getBlockList();
        continue;
      } catch (IOException e) {
        LOG.warn("Exception while getting block list", e);
        return;
      }
    } else {
      // source node cannot find a pendingBlockToMove, iteration +1
      noPendingBlockIteration++;
      // in case no blocks can be moved for source node's task,
      // jump out of while-loop after 5 iterations.
      if (noPendingBlockIteration >= MAX_NO_PENDING_BLOCK_ITERATIONS) {
        setScheduledSize(0);
      }
    }
    // check if time is up or not
    if (Time.now() - startTime > MAX_ITERATION_TIME) {
      isTimeUp = true;
      continue;
    }
    /* Now we can not schedule any block to move and there are
     * no new blocks added to the source block list, so we wait.
     */
    try {
      synchronized (Balancer.this) {
        Balancer.this.wait(1000); // wait for targets/sources to be idle
      }
    } catch (InterruptedException ignored) {
    }
  }
}
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6994) libhdfs3 - A native C/C++ HDFS client
[ https://issues.apache.org/jira/browse/HDFS-6994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhanwei Wang updated HDFS-6994: --- Description: Hi All, I just got the permission to open source libhdfs3, which is a native C/C++ HDFS client based on the Hadoop RPC protocol and the HDFS Data Transfer Protocol. libhdfs3 provides the libhdfs-style C interface and a C++ interface. It supports both Hadoop RPC versions 8 and 9, and supports Namenode HA and Kerberos authentication. libhdfs3 is currently used by HAWQ of Pivotal. I'd like to integrate libhdfs3 into the HDFS source code to benefit others. You can find the libhdfs3 code on GitHub: https://github.com/PivotalRD/libhdfs3 http://pivotalrd.github.io/libhdfs3/ was: Hi All, I just got the permission to open source libhdfs3, which is a native C/C++ HDFS client based on the Hadoop RPC protocol and the HDFS Data Transfer Protocol. libhdfs3 provides the libhdfs-style C interface and a C++ interface. It supports both Hadoop RPC versions 8 and 9, and supports Namenode HA and Kerberos authentication. libhdfs3 is currently used by HAWQ of Pivotal. I'd like to integrate libhdfs3 into the HDFS source code to benefit others. You can find the libhdfs3 code on GitHub: https://github.com/PivotalRD/libhdfs3 libhdfs3 - A native C/C++ HDFS client - Key: HDFS-6994 URL: https://issues.apache.org/jira/browse/HDFS-6994 Project: Hadoop HDFS Issue Type: Task Components: hdfs-client Reporter: Zhanwei Wang Attachments: HDFS-6994-rpc-8.patch, HDFS-6994.patch Hi All, I just got the permission to open source libhdfs3, which is a native C/C++ HDFS client based on the Hadoop RPC protocol and the HDFS Data Transfer Protocol. libhdfs3 provides the libhdfs-style C interface and a C++ interface. It supports both Hadoop RPC versions 8 and 9, and supports Namenode HA and Kerberos authentication. libhdfs3 is currently used by HAWQ of Pivotal. I'd like to integrate libhdfs3 into the HDFS source code to benefit others. You can find the libhdfs3 code on GitHub: https://github.com/PivotalRD/libhdfs3 http://pivotalrd.github.io/libhdfs3/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6898) DN must reserve space for a full block when an RBW block is created
[ https://issues.apache.org/jira/browse/HDFS-6898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14124908#comment-14124908 ] Hudson commented on HDFS-6898: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1889 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1889/]) HDFS-6898. DN must reserve space for a full block when an RBW block is created. (Contributed by Arpit Agarwal) (arp: rev d1fa58292e87bc29b4ef1278368c2be938a0afc4) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/BlockPoolSlice.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/HdfsConstants.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/TestWriteToReplica.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/TestRbwSpaceReservation.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDirectoryScanner.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/ReplicaInfo.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/ReplicaBeingWritten.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/ReplicaInPipeline.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsVolumeImpl.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/FsVolumeSpi.java DN must reserve space for a full block when an RBW block is created --- Key: HDFS-6898 URL: https://issues.apache.org/jira/browse/HDFS-6898 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.5.0 Reporter: Gopal V Assignee: Arpit Agarwal Attachments: HDFS-6898.01.patch, HDFS-6898.03.patch, HDFS-6898.04.patch, HDFS-6898.05.patch, HDFS-6898.06.patch, HDFS-6898.07.patch DN will successfully create two RBW blocks on the same volume even if the free space is sufficient for just one full block. One or both block writers may subsequently get a DiskOutOfSpace exception. This can be avoided by allocating space up front. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6940) Initial refactoring to allow ConsensusNode implementation
[ https://issues.apache.org/jira/browse/HDFS-6940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14124909#comment-14124909 ] Hudson commented on HDFS-6940: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1889 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1889/]) HDFS-6940. Refactoring to allow ConsensusNode implementation. (shv: rev 88209ce181b5ecc55c0ae2bceff4893ab4817e88) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NameNodeAdapter.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/HostFileManager.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java Initial refactoring to allow ConsensusNode implementation - Key: HDFS-6940 URL: https://issues.apache.org/jira/browse/HDFS-6940 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: 2.0.6-alpha, 2.5.0 Reporter: Konstantin Shvachko Assignee: Konstantin Shvachko Fix For: 2.6.0 Attachments: HDFS-6940.patch Minor refactoring of FSNamesystem to open private methods that are needed for CNode implementation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6621) Hadoop Balancer prematurely exits iterations
[ https://issues.apache.org/jira/browse/HDFS-6621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rafal Wojdyla updated HDFS-6621: Attachment: HDFS-6621.patch_3 Hadoop Balancer prematurely exits iterations Key: HDFS-6621 URL: https://issues.apache.org/jira/browse/HDFS-6621 Project: Hadoop HDFS Issue Type: Bug Components: balancer Affects Versions: 2.2.0, 2.4.0 Environment: Red Hat Enterprise Linux Server release 5.8 with Hadoop 2.4.0 Reporter: Benjamin Bowman Labels: balancer Attachments: HDFS-6621.patch, HDFS-6621.patch_2, HDFS-6621.patch_3 I have been having an issue with the balancing being too slow. The issue was not with the speed with which blocks were moved, but rather that the balancer would prematurely exit out of its balancing iterations. It would move ~10 blocks or 100 MB and then exit the current iteration (in which it said it was planning on moving about 10 GB). I looked in the Balancer.java code and believe I found and solved the issue. In the dispatchBlocks() function there is a variable, noPendingBlockIteration, which counts the number of iterations in which a pending block to move cannot be found. Once this number gets to 5, the balancer exits the overall balancing iteration. I believe the desired functionality is 5 consecutive no-pending-block iterations - however, this variable is never reset to 0 upon block moves. So once this number reaches 5 - even if there have been thousands of blocks moved in between these no-pending-block iterations - the overall balancing iteration will prematurely end. The fix I applied was to set noPendingBlockIteration = 0 when a pending block is found and scheduled. In this way, my iterations do not prematurely exit unless there are 5 consecutive no-pending-block iterations. Below is a copy of my dispatchBlocks() function with the change I made.
{code}
private void dispatchBlocks() {
  long startTime = Time.now();
  long scheduledSize = getScheduledSize();
  this.blocksToReceive = 2 * scheduledSize;
  boolean isTimeUp = false;
  int noPendingBlockIteration = 0;
  while (!isTimeUp && getScheduledSize() > 0
      && (!srcBlockList.isEmpty() || blocksToReceive > 0)) {
    PendingBlockMove pendingBlock = chooseNextBlockToMove();
    if (pendingBlock != null) {
      noPendingBlockIteration = 0;
      // move the block
      pendingBlock.scheduleBlockMove();
      continue;
    }
    /* Since we can not schedule any block to move,
     * filter any moved blocks from the source block list and
     * check if we should fetch more blocks from the namenode
     */
    filterMovedBlocks(); // filter already moved blocks
    if (shouldFetchMoreBlocks()) {
      // fetch new blocks
      try {
        blocksToReceive -= getBlockList();
        continue;
      } catch (IOException e) {
        LOG.warn("Exception while getting block list", e);
        return;
      }
    } else {
      // source node cannot find a pendingBlockToMove, iteration +1
      noPendingBlockIteration++;
      // in case no blocks can be moved for source node's task,
      // jump out of while-loop after 5 iterations.
      if (noPendingBlockIteration >= MAX_NO_PENDING_BLOCK_ITERATIONS) {
        setScheduledSize(0);
      }
    }
    // check if time is up or not
    if (Time.now() - startTime > MAX_ITERATION_TIME) {
      isTimeUp = true;
      continue;
    }
    /* Now we can not schedule any block to move and there are
     * no new blocks added to the source block list, so we wait.
     */
    try {
      synchronized (Balancer.this) {
        Balancer.this.wait(1000); // wait for targets/sources to be idle
      }
    } catch (InterruptedException ignored) {
    }
  }
}
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6584) Support Archival Storage
[ https://issues.apache.org/jira/browse/HDFS-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-6584: -- Attachment: h6997_20140907.patch h6997_20140907.patch: synced with new commits. Support Archival Storage Key: HDFS-6584 URL: https://issues.apache.org/jira/browse/HDFS-6584 Project: Hadoop HDFS Issue Type: New Feature Components: balancer, namenode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Attachments: HDFS-6584.000.patch, HDFSArchivalStorageDesign20140623.pdf, HDFSArchivalStorageDesign20140715.pdf, h6997_20140907.patch In most of the Hadoop clusters, as more and more data is stored for longer time, the demand for storage is outstripping the compute. Hadoop needs a cost effective and easy to manage solution to meet this demand for storage. Current solution is: - Delete the old unused data. This comes at operational cost of identifying unnecessary data and deleting them manually. - Add more nodes to the clusters. This adds along with storage capacity unnecessary compute capacity to the cluster. Hadoop needs a solution to decouple growing storage capacity from compute capacity. Nodes with higher density and less expensive storage with low compute power are becoming available and can be used as cold storage in the clusters. Based on policy the data from hot storage can be moved to cold storage. Adding more nodes to the cold storage can grow the storage independent of the compute capacity in the cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6584) Support Archival Storage
[ https://issues.apache.org/jira/browse/HDFS-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14124916#comment-14124916 ] Hadoop QA commented on HDFS-6584: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12667081/h6997_20140907.patch against trunk revision d1fa582. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7935//console This message is automatically generated. Support Archival Storage Key: HDFS-6584 URL: https://issues.apache.org/jira/browse/HDFS-6584 Project: Hadoop HDFS Issue Type: New Feature Components: balancer, namenode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Attachments: HDFS-6584.000.patch, HDFSArchivalStorageDesign20140623.pdf, HDFSArchivalStorageDesign20140715.pdf, h6997_20140907.patch In most of the Hadoop clusters, as more and more data is stored for longer time, the demand for storage is outstripping the compute. Hadoop needs a cost effective and easy to manage solution to meet this demand for storage. Current solution is: - Delete the old unused data. This comes at operational cost of identifying unnecessary data and deleting them manually. - Add more nodes to the clusters. This adds along with storage capacity unnecessary compute capacity to the cluster. Hadoop needs a solution to decouple growing storage capacity from compute capacity. Nodes with higher density and less expensive storage with low compute power are becoming available and can be used as cold storage in the clusters. Based on policy the data from hot storage can be moved to cold storage. Adding more nodes to the cold storage can grow the storage independent of the compute capacity in the cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6584) Support Archival Storage
[ https://issues.apache.org/jira/browse/HDFS-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14124919#comment-14124919 ] Hadoop QA commented on HDFS-6584: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12667081/h6997_20140907.patch against trunk revision d1fa582. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7936//console This message is automatically generated. Support Archival Storage Key: HDFS-6584 URL: https://issues.apache.org/jira/browse/HDFS-6584 Project: Hadoop HDFS Issue Type: New Feature Components: balancer, namenode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Attachments: HDFS-6584.000.patch, HDFSArchivalStorageDesign20140623.pdf, HDFSArchivalStorageDesign20140715.pdf, h6997_20140907.patch In most of the Hadoop clusters, as more and more data is stored for longer time, the demand for storage is outstripping the compute. Hadoop needs a cost effective and easy to manage solution to meet this demand for storage. Current solution is: - Delete the old unused data. This comes at operational cost of identifying unnecessary data and deleting them manually. - Add more nodes to the clusters. This adds along with storage capacity unnecessary compute capacity to the cluster. Hadoop needs a solution to decouple growing storage capacity from compute capacity. Nodes with higher density and less expensive storage with low compute power are becoming available and can be used as cold storage in the clusters. Based on policy the data from hot storage can be moved to cold storage. Adding more nodes to the cold storage can grow the storage independent of the compute capacity in the cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6584) Support Archival Storage
[ https://issues.apache.org/jira/browse/HDFS-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14124921#comment-14124921 ] Hadoop QA commented on HDFS-6584: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12667081/h6997_20140907.patch against trunk revision d1fa582. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7938//console This message is automatically generated. Support Archival Storage Key: HDFS-6584 URL: https://issues.apache.org/jira/browse/HDFS-6584 Project: Hadoop HDFS Issue Type: New Feature Components: balancer, namenode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Attachments: HDFS-6584.000.patch, HDFSArchivalStorageDesign20140623.pdf, HDFSArchivalStorageDesign20140715.pdf, h6997_20140907.patch In most of the Hadoop clusters, as more and more data is stored for longer time, the demand for storage is outstripping the compute. Hadoop needs a cost effective and easy to manage solution to meet this demand for storage. Current solution is: - Delete the old unused data. This comes at operational cost of identifying unnecessary data and deleting them manually. - Add more nodes to the clusters. This adds along with storage capacity unnecessary compute capacity to the cluster. Hadoop needs a solution to decouple growing storage capacity from compute capacity. Nodes with higher density and less expensive storage with low compute power are becoming available and can be used as cold storage in the clusters. Based on policy the data from hot storage can be moved to cold storage. Adding more nodes to the cold storage can grow the storage independent of the compute capacity in the cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6584) Support Archival Storage
[ https://issues.apache.org/jira/browse/HDFS-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14124920#comment-14124920 ] Hadoop QA commented on HDFS-6584: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12667081/h6997_20140907.patch against trunk revision d1fa582. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7937//console This message is automatically generated. Support Archival Storage Key: HDFS-6584 URL: https://issues.apache.org/jira/browse/HDFS-6584 Project: Hadoop HDFS Issue Type: New Feature Components: balancer, namenode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Attachments: HDFS-6584.000.patch, HDFSArchivalStorageDesign20140623.pdf, HDFSArchivalStorageDesign20140715.pdf, h6997_20140907.patch In most of the Hadoop clusters, as more and more data is stored for longer time, the demand for storage is outstripping the compute. Hadoop needs a cost effective and easy to manage solution to meet this demand for storage. Current solution is: - Delete the old unused data. This comes at operational cost of identifying unnecessary data and deleting them manually. - Add more nodes to the clusters. This adds along with storage capacity unnecessary compute capacity to the cluster. Hadoop needs a solution to decouple growing storage capacity from compute capacity. Nodes with higher density and less expensive storage with low compute power are becoming available and can be used as cold storage in the clusters. Based on policy the data from hot storage can be moved to cold storage. Adding more nodes to the cold storage can grow the storage independent of the compute capacity in the cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6584) Support Archival Storage
[ https://issues.apache.org/jira/browse/HDFS-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-6584: -- Attachment: (was: h6997_20140907.patch) Support Archival Storage Key: HDFS-6584 URL: https://issues.apache.org/jira/browse/HDFS-6584 Project: Hadoop HDFS Issue Type: New Feature Components: balancer, namenode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Attachments: HDFS-6584.000.patch, HDFSArchivalStorageDesign20140623.pdf, HDFSArchivalStorageDesign20140715.pdf In most of the Hadoop clusters, as more and more data is stored for longer time, the demand for storage is outstripping the compute. Hadoop needs a cost effective and easy to manage solution to meet this demand for storage. Current solution is: - Delete the old unused data. This comes at operational cost of identifying unnecessary data and deleting them manually. - Add more nodes to the clusters. This adds along with storage capacity unnecessary compute capacity to the cluster. Hadoop needs a solution to decouple growing storage capacity from compute capacity. Nodes with higher density and less expensive storage with low compute power are becoming available and can be used as cold storage in the clusters. Based on policy the data from hot storage can be moved to cold storage. Adding more nodes to the cold storage can grow the storage independent of the compute capacity in the cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6584) Support Archival Storage
[ https://issues.apache.org/jira/browse/HDFS-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-6584: -- Attachment: h6584_20140907.patch Oops, uploaded a wrong file. The file should be h6584_20140907.patch. Support Archival Storage Key: HDFS-6584 URL: https://issues.apache.org/jira/browse/HDFS-6584 Project: Hadoop HDFS Issue Type: New Feature Components: balancer, namenode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Attachments: HDFS-6584.000.patch, HDFSArchivalStorageDesign20140623.pdf, HDFSArchivalStorageDesign20140715.pdf, h6584_20140907.patch In most of the Hadoop clusters, as more and more data is stored for longer time, the demand for storage is outstripping the compute. Hadoop needs a cost effective and easy to manage solution to meet this demand for storage. Current solution is: - Delete the old unused data. This comes at operational cost of identifying unnecessary data and deleting them manually. - Add more nodes to the clusters. This adds along with storage capacity unnecessary compute capacity to the cluster. Hadoop needs a solution to decouple growing storage capacity from compute capacity. Nodes with higher density and less expensive storage with low compute power are becoming available and can be used as cold storage in the clusters. Based on policy the data from hot storage can be moved to cold storage. Adding more nodes to the cold storage can grow the storage independent of the compute capacity in the cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7025) HDFS Credential Provider related Unit Test Failure
[ https://issues.apache.org/jira/browse/HDFS-7025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-7025: Hadoop Flags: Reviewed +1 for the patch. I'll commit this. HDFS Credential Provider related Unit Test Failure --- Key: HDFS-7025 URL: https://issues.apache.org/jira/browse/HDFS-7025 Project: Hadoop HDFS Issue Type: Test Components: encryption Affects Versions: 2.4.1 Reporter: Xiaoyu Yao Assignee: Xiaoyu Yao Attachments: HDFS-7025.0.patch, HDFS-7025.1.patch Reported by: Xiaomara and investigated by [~cnauroth]. The credential provider related unit tests failed on Windows. The tests try to set up a URI by taking the build test directory and concatenating it with other strings containing the rest of the URI format, i.e.:
{code}
public void testFactory() throws Exception {
  Configuration conf = new Configuration();
  conf.set(CredentialProviderFactory.CREDENTIAL_PROVIDER_PATH,
      UserProvider.SCHEME_NAME + ":///," +
      JavaKeyStoreProvider.SCHEME_NAME + "://file" + tmpDir + "/test.jks");
{code}
This logic is incorrect on Windows, because the file path separator will be '\', which violates URI syntax; only the forward slash is permitted. The proper fix is to always do path/URI construction through the org.apache.hadoop.fs.Path class, specifically using the constructors that take explicit parent and child arguments. The affected unit tests are:
{code}
* TestCryptoAdminCLI
* TestDFSUtil
* TestEncryptionZones
* TestReservedRawPaths
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7025) HDFS Credential Provider related Unit Test Failure
[ https://issues.apache.org/jira/browse/HDFS-7025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-7025: Resolution: Fixed Fix Version/s: 2.6.0 Status: Resolved (was: Patch Available) I committed this to trunk and branch-2. Xiaoyu, thank you for contributing this fix. HDFS Credential Provider related Unit Test Failure --- Key: HDFS-7025 URL: https://issues.apache.org/jira/browse/HDFS-7025 Project: Hadoop HDFS Issue Type: Test Components: encryption Affects Versions: 2.4.1 Reporter: Xiaoyu Yao Assignee: Xiaoyu Yao Fix For: 2.6.0 Attachments: HDFS-7025.0.patch, HDFS-7025.1.patch Reported by: Xiaomara and investigated by [~cnauroth]. The credential provider related unit tests failed on Windows. The tests try to set up a URI by taking the build test directory and concatenating it with other strings containing the rest of the URI format, i.e.:
{code}
public void testFactory() throws Exception {
  Configuration conf = new Configuration();
  conf.set(CredentialProviderFactory.CREDENTIAL_PROVIDER_PATH,
      UserProvider.SCHEME_NAME + ":///," +
      JavaKeyStoreProvider.SCHEME_NAME + "://file" + tmpDir + "/test.jks");
{code}
This logic is incorrect on Windows, because the file path separator will be '\', which violates URI syntax; only the forward slash is permitted. The proper fix is to always do path/URI construction through the org.apache.hadoop.fs.Path class, specifically using the constructors that take explicit parent and child arguments. The affected unit tests are:
{code}
* TestCryptoAdminCLI
* TestDFSUtil
* TestEncryptionZones
* TestReservedRawPaths
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6506) Newly moved block replica been invalidated and deleted in TestBalancer
[ https://issues.apache.org/jira/browse/HDFS-6506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HDFS-6506: Attachment: HDFS-6506.v3.patch Rebase patch to latest trunk. Newly moved block replica been invalidated and deleted in TestBalancer -- Key: HDFS-6506 URL: https://issues.apache.org/jira/browse/HDFS-6506 Project: Hadoop HDFS Issue Type: Bug Reporter: Binglin Chang Assignee: Binglin Chang Attachments: HDFS-6506.v1.patch, HDFS-6506.v2.patch, HDFS-6506.v3.patch TestBalancerWithNodeGroup#testBalancerWithNodeGroup has been failing recently: https://builds.apache.org/job/PreCommit-HDFS-Build/7045//testReport/ From the error log, the reason seems to be that newly moved block replicas have been invalidated and deleted, so some of the balancer's work is reversed.
{noformat}
2014-06-06 18:15:51,681 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741834_1010 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159
2014-06-06 18:15:51,683 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741833_1009 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159
2014-06-06 18:15:51,683 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741830_1006 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159
2014-06-06 18:15:51,683 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741831_1007 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159
2014-06-06 18:15:51,682 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741832_1008 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159
2014-06-06 18:15:54,702 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741827_1003 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159
2014-06-06 18:15:54,702 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741828_1004 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159
2014-06-06 18:15:54,701 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741829_1005 with size=100 fr
2014-06-06 18:15:54,706 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741833_1009) is added to invalidated blocks set
2014-06-06 18:15:54,709 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741834_1010) is added to invalidated blocks set
2014-06-06 18:15:56,421 INFO BlockStateChange (BlockManager.java:invalidateWorkForOneNode(3242)) - BLOCK* BlockManager: ask 127.0.0.1:55468 to delete [blk_1073741833_1009, blk_1073741834_1010]
2014-06-06 18:15:57,717 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741832_1008) is added to invalidated blocks set
2014-06-06 18:15:57,720 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741827_1003) is added to invalidated blocks set
2014-06-06 18:15:57,721 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741830_1006) is added to invalidated blocks set
2014-06-06 18:15:57,722 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741831_1007) is added to invalidated blocks set
2014-06-06 18:15:57,723 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741829_1005) is added to invalidated blocks set
2014-06-06 18:15:59,422 INFO BlockStateChange (BlockManager.java:invalidateWorkForOneNode(3242)) - BLOCK* BlockManager: ask 127.0.0.1:55468 to delete [blk_1073741827_1003, blk_1073741829_1005, blk_1073741830_1006, blk_1073741831_1007, blk_1073741832_1008]
2014-06-06 18:16:02,423 INFO BlockStateChange (BlockManager.java:invalidateWorkForOneNode(3242)) - BLOCK* BlockManager: ask 127.0.0.1:55468 to delete [blk_1073741845_1021]
{noformat}
Normally this should not happen: when moving a block from src to dest, the replica on src should be invalidated, not the one on dest; there must be a bug in the related logic. I don't think TestBalancerWithNodeGroup#testBalancerWithNodeGroup caused this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6776) distcp from insecure cluster (source) to secure cluster (destination) doesn't work via webhdfs
[ https://issues.apache.org/jira/browse/HDFS-6776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14124935#comment-14124935 ] Yongjun Zhang commented on HDFS-6776: - [~wheat9], Hope the example I gave is convincing that webhdfs is the right place to fix. I think we wouldn't want to tell user that sorry, webhdfs contract doesn't allow accessing insecure cluster from secure cluster, if you need to, please hack your application like how distcp does. Would you please comment at your earliest convenience? Thanks. distcp from insecure cluster (source) to secure cluster (destination) doesn't work via webhdfs -- Key: HDFS-6776 URL: https://issues.apache.org/jira/browse/HDFS-6776 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.3.0, 2.5.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Attachments: HDFS-6776.001.patch, HDFS-6776.002.patch, HDFS-6776.003.patch, HDFS-6776.004.patch, HDFS-6776.004.patch, HDFS-6776.005.patch, HDFS-6776.006.NullToken.patch, HDFS-6776.006.NullToken.patch, HDFS-6776.007.patch, HDFS-6776.008.patch, HDFS-6776.009.patch, HDFS-6776.010.patch, HDFS-6776.011.patch, dummy-token-proxy.js Issuing distcp command at the secure cluster side, trying to copy stuff from insecure cluster to secure cluster, and see the following problem: {code} hadoopuser@yjc5u-1 ~]$ hadoop distcp webhdfs://insure-cluster:port/tmp hdfs://sure-cluster:8020/tmp/tmptgt 14/07/30 20:06:19 INFO tools.DistCp: Input Options: DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, ignoreFailures=false, maxMaps=20, sslConfigurationFile='null', copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[webhdfs://insecure-cluster:port/tmp], targetPath=hdfs://secure-cluster:8020/tmp/tmptgt, targetPathExists=true} 14/07/30 20:06:19 INFO client.RMProxy: Connecting to ResourceManager at secure-clister:8032 14/07/30 20:06:20 WARN ssl.FileBasedKeyStoresFactory: The property 'ssl.client.truststore.location' has not been set, no TrustStore will be loaded 14/07/30 20:06:20 WARN security.UserGroupInformation: PriviledgedActionException as:hadoopu...@xyz.com (auth:KERBEROS) cause:java.io.IOException: Failed to get the token for hadoopuser, user=hadoopuser 14/07/30 20:06:20 WARN security.UserGroupInformation: PriviledgedActionException as:hadoopu...@xyz.com (auth:KERBEROS) cause:java.io.IOException: Failed to get the token for hadoopuser, user=hadoopuser 14/07/30 20:06:20 ERROR tools.DistCp: Exception encountered java.io.IOException: Failed to get the token for hadoopuser, user=hadoopuser at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:526) at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106) at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.toIOException(WebHdfsFileSystem.java:365) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$600(WebHdfsFileSystem.java:84) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.shouldRetry(WebHdfsFileSystem.java:618) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:584) at 
org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:438) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner$1.run(WebHdfsFileSystem.java:466) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:462) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getDelegationToken(WebHdfsFileSystem.java:1132) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getDelegationToken(WebHdfsFileSystem.java:218) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getAuthParameters(WebHdfsFileSystem.java:403) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.toUrl(WebHdfsFileSystem.java:424) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractFsPathRunner.getUrl(WebHdfsFileSystem.java:640) at
[jira] [Created] (HDFS-7026) Introduce a string constant for Failed to obtain user group info...
Yongjun Zhang created HDFS-7026: --- Summary: Introduce a string constant for Failed to obtain user group info... Key: HDFS-7026 URL: https://issues.apache.org/jira/browse/HDFS-7026 Project: Hadoop HDFS Issue Type: Bug Reporter: Yongjun Zhang Priority: Trivial There are multiple places that refer to hard-coded string {{Failed to obtain user group information:}}, which serves as a contract between different places. Filing this jira to replace the hardcoded string with a constant to make it easier to maintain. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
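As a minimal sketch of the kind of change HDFS-7026 proposes, one possibility is shown below; the constant name and holding class are hypothetical, not the actual patch.
{code}
// Hypothetical illustration only; the real constant name and location are decided in the patch.
public final class SecurityMessages {
  public static final String FAILED_TO_OBTAIN_USER_GROUP_INFO =
      "Failed to obtain user group information: ";

  private SecurityMessages() {}
}

// Callers would then reference the constant instead of repeating the literal:
//   throw new SecurityException(SecurityMessages.FAILED_TO_OBTAIN_USER_GROUP_INFO + cause);
{code}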
[jira] [Assigned] (HDFS-7026) Introduce a string constant for Failed to obtain user group info...
[ https://issues.apache.org/jira/browse/HDFS-7026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongjun Zhang reassigned HDFS-7026: --- Assignee: Yongjun Zhang Introduce a string constant for Failed to obtain user group info... - Key: HDFS-7026 URL: https://issues.apache.org/jira/browse/HDFS-7026 Project: Hadoop HDFS Issue Type: Bug Reporter: Yongjun Zhang Assignee: Yongjun Zhang Priority: Trivial There are multiple places that refer to hard-coded string {{Failed to obtain user group information:}}, which serves as a contract between different places. Filing this jira to replace the hardcoded string with a constant to make it easier to maintain. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7026) Introduce a string constant for Failed to obtain user group info...
[ https://issues.apache.org/jira/browse/HDFS-7026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongjun Zhang updated HDFS-7026: Issue Type: Improvement (was: Bug) Introduce a string constant for Failed to obtain user group info... - Key: HDFS-7026 URL: https://issues.apache.org/jira/browse/HDFS-7026 Project: Hadoop HDFS Issue Type: Improvement Reporter: Yongjun Zhang Assignee: Yongjun Zhang Priority: Trivial There are multiple places that refer to hard-coded string {{Failed to obtain user group information:}}, which serves as a contract between different places. Filing this jira to replace the hardcoded string with a constant to make it easier to maintain. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7026) Introduce a string constant for Failed to obtain user group info...
[ https://issues.apache.org/jira/browse/HDFS-7026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongjun Zhang updated HDFS-7026: Attachment: HDFS-7206.001.patch Uploaded patch rev 001. Thanks for the review. Introduce a string constant for Failed to obtain user group info... - Key: HDFS-7026 URL: https://issues.apache.org/jira/browse/HDFS-7026 Project: Hadoop HDFS Issue Type: Improvement Reporter: Yongjun Zhang Assignee: Yongjun Zhang Priority: Trivial Attachments: HDFS-7206.001.patch There are multiple places that refer to hard-coded string {{Failed to obtain user group information:}}, which serves as a contract between different places. Filing this jira to replace the hardcoded string with a constant to make it easier to maintain. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7007) Interfaces to plugin ConsensusNode.
[ https://issues.apache.org/jira/browse/HDFS-7007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14124953#comment-14124953 ] Steve Loughran commented on HDFS-7007: -- The current NN code isn't suitable for subclassing, and the fact that BackupNode does exactly that is a bit dangerous. Specifically, the NN ctor calls {{initialize()}}, which appears designed to be overridden ... but subclasses won't yet be fully constructed when this happens. The DNs are worse - they start threads in their ctors, which is one of the big forbidden actions of Java. I'd propose making the NN and DN YARN services first, so we have a nice consistent override model. As with the RM, we can make them subclasses of CompositeService, making it easy to add children. This does not have to be done in the consensus node branch ... it can be done in trunk. Interfaces to plugin ConsensusNode. --- Key: HDFS-7007 URL: https://issues.apache.org/jira/browse/HDFS-7007 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: 3.0.0 Reporter: Konstantin Shvachko This is to introduce interfaces in NameNode and namesystem, which are needed to plugin ConsensusNode. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
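For readers unfamiliar with the hazard Steve describes, here is a minimal, self-contained Java illustration (not HDFS code) of why calling an overridable {{initialize()}} from a constructor is dangerous: the override runs before the subclass's own state exists.
{code}
// Self-contained demo (not HDFS code) of calling an overridable method
// from a constructor: the subclass override runs before the subclass's
// field initializers, so it observes a half-constructed object.
public class CtorHazardDemo {
  static class BaseNode {
    BaseNode() {
      initialize(); // invokes the subclass override too early
    }
    protected void initialize() {}
  }

  static class SubNode extends BaseNode {
    private final String name = "backup";

    @Override
    protected void initialize() {
      // The SubNode constructor has not run yet, so 'name' is still null.
      System.out.println("initialize() sees name = " + name);
    }
  }

  public static void main(String[] args) {
    new SubNode(); // prints: initialize() sees name = null
  }
}
{code}
Starting threads in a constructor has the same shape of problem, with the added risk that the thread publishes the half-constructed object to other threads.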
[jira] [Updated] (HDFS-7026) Introduce a string constant for Failed to obtain user group info...
[ https://issues.apache.org/jira/browse/HDFS-7026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongjun Zhang updated HDFS-7026: Affects Version/s: 2.6.0 Status: Patch Available (was: Open) Introduce a string constant for Failed to obtain user group info... - Key: HDFS-7026 URL: https://issues.apache.org/jira/browse/HDFS-7026 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.6.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Priority: Trivial Attachments: HDFS-7206.001.patch There are multiple places that refer to hard-coded string {{Failed to obtain user group information:}}, which serves as a contract between different places. Filing this jira to replace the hardcoded string with a constant to make it easier to maintain. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6981) DN upgrade with layout version change should not use trash
[ https://issues.apache.org/jira/browse/HDFS-6981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14124958#comment-14124958 ] James Thomas commented on HDFS-6981: +1, looks good to me. Getting a 404 when I try to look at the Findbugs warnings -- any idea what's causing those? DN upgrade with layout version change should not use trash -- Key: HDFS-6981 URL: https://issues.apache.org/jira/browse/HDFS-6981 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 3.0.0 Reporter: James Thomas Assignee: Arpit Agarwal Attachments: HDFS-6981.01.patch, HDFS-6981.02.patch, HDFS-6981.03.patch, HDFS-6981.04.patch, HDFS-6981.05.patch, HDFS-6981.06.patch, HDFS-6981.07.patch Post HDFS-6800, we can encounter the following scenario: # We start with DN software version -55 and initiate a rolling upgrade to version -56 # We delete some blocks, and they are moved to trash # We roll back to DN software version -55 using the -rollback flag – since we are running the old code (prior to this patch), we will restore the previous directory but will not delete the trash # We append to some of the blocks that were deleted in step 2 # We then restart a DN that contains blocks that were appended to – since the trash still exists, it will be restored at this point, the appended-to blocks will be overwritten, and we will lose the appended data So I think we need to avoid writing anything to the trash directory if we have a previous directory. Thanks to [~james.thomas] for reporting this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
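A sketch of the rule being argued for, with hypothetical method and directory names rather than the real DataNode storage code: once a {{previous}} directory exists for a layout-version upgrade, deleted blocks should be removed outright instead of being copied to trash, since rollback is served from {{previous}} and a restored trash copy could overwrite appended data.
{code}
// Hypothetical helper, not the real DataNode storage code: choose between
// trash and outright deletion based on whether a 'previous' directory exists.
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

public class TrashPolicySketch {
  static void removeDeletedBlock(File blockFile, File previousDir, File trashDir)
      throws IOException {
    if (previousDir.exists()) {
      // Layout-version upgrade in progress: rollback restores 'previous',
      // so the deleted block must not linger in trash.
      Files.delete(blockFile.toPath());
    } else {
      // Rolling upgrade without a layout change: keep the block in trash
      // so it can be restored if the upgrade is rolled back.
      Files.move(blockFile.toPath(),
          new File(trashDir, blockFile.getName()).toPath());
    }
  }
}
{code}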
[jira] [Updated] (HDFS-6777) Supporting consistent edit log reads when in-progress edit log segments are included
[ https://issues.apache.org/jira/browse/HDFS-6777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Thomas updated HDFS-6777: --- Resolution: Fixed Status: Resolved (was: Patch Available) Committed as part of HDFS-6634. Supporting consistent edit log reads when in-progress edit log segments are included Key: HDFS-6777 URL: https://issues.apache.org/jira/browse/HDFS-6777 Project: Hadoop HDFS Issue Type: Sub-task Components: qjm Reporter: James Thomas Assignee: James Thomas Attachments: 6777-design.2.pdf, 6777-design.pdf, HDFS-6777.patch For inotify, we want to be able to read transactions from in-progress edit log segments so we can serve transactions to listeners soon after they are committed. This JIRA works toward ensuring that we do not send unsync'ed transactions back to the client by 1) discarding in-progress segments if we have a finalized segment starting at the same transaction ID and 2) if there are no finalized segments at the same transaction ID, using only the in-progress segments with the largest seen lastWriterEpoch. See the design document for more background and details. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6777) Supporting consistent edit log reads when in-progress edit log segments are included
[ https://issues.apache.org/jira/browse/HDFS-6777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Thomas updated HDFS-6777: --- Fix Version/s: 2.6.0 Supporting consistent edit log reads when in-progress edit log segments are included Key: HDFS-6777 URL: https://issues.apache.org/jira/browse/HDFS-6777 Project: Hadoop HDFS Issue Type: Sub-task Components: qjm Reporter: James Thomas Assignee: James Thomas Fix For: 2.6.0 Attachments: 6777-design.2.pdf, 6777-design.pdf, HDFS-6777.patch For inotify, we want to be able to read transactions from in-progress edit log segments so we can serve transactions to listeners soon after they are committed. This JIRA works toward ensuring that we do not send unsync'ed transactions back to the client by 1) discarding in-progress segments if we have a finalized segment starting at the same transaction ID and 2) if there are no finalized segments at the same transaction ID, using only the in-progress segments with the largest seen lastWriterEpoch. See the design document for more background and details. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6777) Supporting consistent edit log reads when in-progress edit log segments are included
[ https://issues.apache.org/jira/browse/HDFS-6777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Thomas updated HDFS-6777: --- Description: For inotify, we want to be able to read transactions from in-progress edit log segments so we can serve transactions to listeners soon after they are committed. This JIRA works toward ensuring that we do not send unsync'ed transactions back to the client by discarding in-progress segments if we have a finalized segment starting at the same transaction ID. See the design document for more background and details. (was: For inotify, we want to be able to read transactions from in-progress edit log segments so we can serve transactions to listeners soon after they are committed. This JIRA works toward ensuring that we do not send unsync'ed transactions back to the client by 1) discarding in-progress segments if we have a finalized segment starting at the same transaction ID and 2) if there are no finalized segments at the same transaction ID, using only the in-progress segments with the largest seen lastWriterEpoch. See the design document for more background and details.) Supporting consistent edit log reads when in-progress edit log segments are included Key: HDFS-6777 URL: https://issues.apache.org/jira/browse/HDFS-6777 Project: Hadoop HDFS Issue Type: Sub-task Components: qjm Reporter: James Thomas Assignee: James Thomas Fix For: 2.6.0 Attachments: 6777-design.2.pdf, 6777-design.pdf, HDFS-6777.patch For inotify, we want to be able to read transactions from in-progress edit log segments so we can serve transactions to listeners soon after they are committed. This JIRA works toward ensuring that we do not send unsync'ed transactions back to the client by discarding in-progress segments if we have a finalized segment starting at the same transaction ID. See the design document for more background and details. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
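The updated description boils down to a simple selection rule. Below is a minimal sketch of that rule using a simplified segment record, not the real EditLogInputStream/JournalManager types:
{code}
// Simplified segment records, not the real edit-log classes.
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SegmentSelectionSketch {
  static final class Segment {
    final long firstTxId;
    final boolean inProgress;
    Segment(long firstTxId, boolean inProgress) {
      this.firstTxId = firstTxId;
      this.inProgress = inProgress;
    }
  }

  /** Drop any in-progress segment whose firstTxId also has a finalized segment. */
  static List<Segment> selectReadable(List<Segment> segments) {
    Set<Long> finalizedStarts = new HashSet<>();
    for (Segment s : segments) {
      if (!s.inProgress) {
        finalizedStarts.add(s.firstTxId);
      }
    }
    List<Segment> readable = new ArrayList<>();
    for (Segment s : segments) {
      if (!(s.inProgress && finalizedStarts.contains(s.firstTxId))) {
        readable.add(s);
      }
    }
    return readable;
  }
}
{code}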
[jira] [Commented] (HDFS-6621) Hadoop Balancer prematurely exits iterations
[ https://issues.apache.org/jira/browse/HDFS-6621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14124963#comment-14124963 ] Hadoop QA commented on HDFS-6621: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12667080/HDFS-6621.patch_3 against trunk revision d1fa582. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.web.TestWebHdfsFileSystemContract org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7934//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/7934//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7934//console This message is automatically generated. Hadoop Balancer prematurely exits iterations Key: HDFS-6621 URL: https://issues.apache.org/jira/browse/HDFS-6621 Project: Hadoop HDFS Issue Type: Bug Components: balancer Affects Versions: 2.2.0, 2.4.0 Environment: Red Hat Enterprise Linux Server release 5.8 with Hadoop 2.4.0 Reporter: Benjamin Bowman Labels: balancer Attachments: HDFS-6621.patch, HDFS-6621.patch_2, HDFS-6621.patch_3 I have been having an issue with the balancing being too slow. The issue was not with the speed with which blocks were moved, but rather the balancer would prematurely exit out of it's balancing iterations. It would move ~10 blocks or 100 MB then exit the current iteration (in which it said it was planning on moving about 10 GB). I looked in the Balancer.java code and believe I found and solved the issue. In the dispatchBlocks() function there is a variable, noPendingBlockIteration, which counts the number of iterations in which a pending block to move cannot be found. Once this number gets to 5, the balancer exits the overall balancing iteration. I believe the desired functionality is 5 consecutive no pending block iterations - however this variable is never reset to 0 upon block moves. So once this number reaches 5 - even if there have been thousands of blocks moved in between these no pending block iterations - the overall balancing iteration will prematurely end. The fix I applied was to set noPendingBlockIteration = 0 when a pending block is found and scheduled. In this way, my iterations do not prematurely exit unless there is 5 consecutive no pending block iterations. Below is a copy of my dispatchBlocks() function with the change I made. 
{code} private void dispatchBlocks() { long startTime = Time.now(); long scheduledSize = getScheduledSize(); this.blocksToReceive = 2*scheduledSize; boolean isTimeUp = false; int noPendingBlockIteration = 0; while(!isTimeUp && getScheduledSize()>0 && (!srcBlockList.isEmpty() || blocksToReceive>0)) { PendingBlockMove pendingBlock = chooseNextBlockToMove(); if (pendingBlock != null) { noPendingBlockIteration = 0; // move the block pendingBlock.scheduleBlockMove(); continue; } /* Since we can not schedule any block to move, * filter any moved blocks from the source block list and * check if we should fetch more blocks from the namenode */ filterMovedBlocks(); // filter already moved blocks if (shouldFetchMoreBlocks()) { // fetch new blocks try { blocksToReceive -= getBlockList(); continue; } catch (IOException e) {
[jira] [Commented] (HDFS-6621) Hadoop Balancer prematurely exits iterations
[ https://issues.apache.org/jira/browse/HDFS-6621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14124980#comment-14124980 ] Yongjun Zhang commented on HDFS-6621: - Hi [~ravwojdyla], Thanks a lot for the info and new rev. I meant to change both occurrences of {{Dispather.this}} in the quoted code but seems you only changed one. The unchanged one is actually the key, because it's where the block is synchronized upon. Would you please make that change? Since you mentioned that both problems are real, I think it's worth pursuing both. It would be great if you could still reproduce and test the fix in real clusters. I hope this is still feasible, would you please comment? Thanks. Hadoop Balancer prematurely exits iterations Key: HDFS-6621 URL: https://issues.apache.org/jira/browse/HDFS-6621 Project: Hadoop HDFS Issue Type: Bug Components: balancer Affects Versions: 2.2.0, 2.4.0 Environment: Red Hat Enterprise Linux Server release 5.8 with Hadoop 2.4.0 Reporter: Benjamin Bowman Labels: balancer Attachments: HDFS-6621.patch, HDFS-6621.patch_2, HDFS-6621.patch_3 I have been having an issue with the balancing being too slow. The issue was not with the speed with which blocks were moved, but rather the balancer would prematurely exit out of it's balancing iterations. It would move ~10 blocks or 100 MB then exit the current iteration (in which it said it was planning on moving about 10 GB). I looked in the Balancer.java code and believe I found and solved the issue. In the dispatchBlocks() function there is a variable, noPendingBlockIteration, which counts the number of iterations in which a pending block to move cannot be found. Once this number gets to 5, the balancer exits the overall balancing iteration. I believe the desired functionality is 5 consecutive no pending block iterations - however this variable is never reset to 0 upon block moves. So once this number reaches 5 - even if there have been thousands of blocks moved in between these no pending block iterations - the overall balancing iteration will prematurely end. The fix I applied was to set noPendingBlockIteration = 0 when a pending block is found and scheduled. In this way, my iterations do not prematurely exit unless there is 5 consecutive no pending block iterations. Below is a copy of my dispatchBlocks() function with the change I made. {code} private void dispatchBlocks() { long startTime = Time.now(); long scheduledSize = getScheduledSize(); this.blocksToReceive = 2*scheduledSize; boolean isTimeUp = false; int noPendingBlockIteration = 0; while(!isTimeUp getScheduledSize()0 (!srcBlockList.isEmpty() || blocksToReceive0)) { PendingBlockMove pendingBlock = chooseNextBlockToMove(); if (pendingBlock != null) { noPendingBlockIteration = 0; // move the block pendingBlock.scheduleBlockMove(); continue; } /* Since we can not schedule any block to move, * filter any moved blocks from the source block list and * check if we should fetch more blocks from the namenode */ filterMovedBlocks(); // filter already moved blocks if (shouldFetchMoreBlocks()) { // fetch new blocks try { blocksToReceive -= getBlockList(); continue; } catch (IOException e) { LOG.warn(Exception while getting block list, e); return; } } else { // source node cannot find a pendingBlockToMove, iteration +1 noPendingBlockIteration++; // in case no blocks can be moved for source node's task, // jump out of while-loop after 5 iterations. 
if (noPendingBlockIteration >= MAX_NO_PENDING_BLOCK_ITERATIONS) { setScheduledSize(0); } } // check if time is up or not if (Time.now()-startTime > MAX_ITERATION_TIME) { isTimeUp = true; continue; } /* Now we can not schedule any block to move and there are * no new blocks added to the source block list, so we wait. */ try { synchronized(Balancer.this) { Balancer.this.wait(1000); // wait for targets/sources to be idle } } catch (InterruptedException ignored) { } } } } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
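For readers following the wait/notify discussion above, here is a small self-contained sketch of the pairing being proposed (a hypothetical class, not the Balancer/Dispatcher code): the scheduling thread must block on the same monitor that the transfer threads notify when a slot frees up, otherwise the wakeup is lost.
{code}
// Hypothetical illustration, not the Balancer/Dispatcher classes: the
// scheduler waits on the same object that a finishing transfer notifies.
public class SlotWaitSketch {
  private static final int MAX_SLOTS = 5;
  private int busySlots = 0;

  synchronized boolean tryAcquireSlot() {
    if (busySlots < MAX_SLOTS) {
      busySlots++;
      return true;
    }
    return false;
  }

  /** Called by a transfer thread when it finishes. */
  synchronized void releaseSlot() {
    busySlots--;
    notifyAll(); // wakes the scheduler blocked in awaitSlot()
  }

  /** Called by the scheduling thread when all slots are busy. */
  synchronized void awaitSlot() throws InterruptedException {
    while (busySlots >= MAX_SLOTS) {
      wait(1000); // same monitor as releaseSlot()'s notifyAll()
    }
  }
}
{code}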
[jira] [Commented] (HDFS-6621) Hadoop Balancer prematurely exits iterations
[ https://issues.apache.org/jira/browse/HDFS-6621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14124981#comment-14124981 ] Yongjun Zhang commented on HDFS-6621: - BTW, thanks [~andrew.wang] for the review and comments, which helped me to look further. Hadoop Balancer prematurely exits iterations Key: HDFS-6621 URL: https://issues.apache.org/jira/browse/HDFS-6621 Project: Hadoop HDFS Issue Type: Bug Components: balancer Affects Versions: 2.2.0, 2.4.0 Environment: Red Hat Enterprise Linux Server release 5.8 with Hadoop 2.4.0 Reporter: Benjamin Bowman Labels: balancer Attachments: HDFS-6621.patch, HDFS-6621.patch_2, HDFS-6621.patch_3 I have been having an issue with the balancing being too slow. The issue was not with the speed with which blocks were moved, but rather the balancer would prematurely exit out of it's balancing iterations. It would move ~10 blocks or 100 MB then exit the current iteration (in which it said it was planning on moving about 10 GB). I looked in the Balancer.java code and believe I found and solved the issue. In the dispatchBlocks() function there is a variable, noPendingBlockIteration, which counts the number of iterations in which a pending block to move cannot be found. Once this number gets to 5, the balancer exits the overall balancing iteration. I believe the desired functionality is 5 consecutive no pending block iterations - however this variable is never reset to 0 upon block moves. So once this number reaches 5 - even if there have been thousands of blocks moved in between these no pending block iterations - the overall balancing iteration will prematurely end. The fix I applied was to set noPendingBlockIteration = 0 when a pending block is found and scheduled. In this way, my iterations do not prematurely exit unless there is 5 consecutive no pending block iterations. Below is a copy of my dispatchBlocks() function with the change I made. {code} private void dispatchBlocks() { long startTime = Time.now(); long scheduledSize = getScheduledSize(); this.blocksToReceive = 2*scheduledSize; boolean isTimeUp = false; int noPendingBlockIteration = 0; while(!isTimeUp getScheduledSize()0 (!srcBlockList.isEmpty() || blocksToReceive0)) { PendingBlockMove pendingBlock = chooseNextBlockToMove(); if (pendingBlock != null) { noPendingBlockIteration = 0; // move the block pendingBlock.scheduleBlockMove(); continue; } /* Since we can not schedule any block to move, * filter any moved blocks from the source block list and * check if we should fetch more blocks from the namenode */ filterMovedBlocks(); // filter already moved blocks if (shouldFetchMoreBlocks()) { // fetch new blocks try { blocksToReceive -= getBlockList(); continue; } catch (IOException e) { LOG.warn(Exception while getting block list, e); return; } } else { // source node cannot find a pendingBlockToMove, iteration +1 noPendingBlockIteration++; // in case no blocks can be moved for source node's task, // jump out of while-loop after 5 iterations. if (noPendingBlockIteration = MAX_NO_PENDING_BLOCK_ITERATIONS) { setScheduledSize(0); } } // check if time is up or not if (Time.now()-startTime MAX_ITERATION_TIME) { isTimeUp = true; continue; } /* Now we can not schedule any block to move and there are * no new blocks added to the source block list, so we wait. 
*/ try { synchronized(Balancer.this) { Balancer.this.wait(1000); // wait for targets/sources to be idle } } catch (InterruptedException ignored) { } } } } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6621) Hadoop Balancer prematurely exits iterations
[ https://issues.apache.org/jira/browse/HDFS-6621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14124982#comment-14124982 ] Rafal Wojdyla commented on HDFS-6621: - [~yzhangal] you're correct :D Sorry. Reproducing error on real cluster - that's still feasible, reproducing this in unit tests is kinda hard, I will try to come back with proof based on logs - is that fine? Hadoop Balancer prematurely exits iterations Key: HDFS-6621 URL: https://issues.apache.org/jira/browse/HDFS-6621 Project: Hadoop HDFS Issue Type: Bug Components: balancer Affects Versions: 2.2.0, 2.4.0 Environment: Red Hat Enterprise Linux Server release 5.8 with Hadoop 2.4.0 Reporter: Benjamin Bowman Labels: balancer Attachments: HDFS-6621.patch, HDFS-6621.patch_2, HDFS-6621.patch_3 I have been having an issue with the balancing being too slow. The issue was not with the speed with which blocks were moved, but rather the balancer would prematurely exit out of it's balancing iterations. It would move ~10 blocks or 100 MB then exit the current iteration (in which it said it was planning on moving about 10 GB). I looked in the Balancer.java code and believe I found and solved the issue. In the dispatchBlocks() function there is a variable, noPendingBlockIteration, which counts the number of iterations in which a pending block to move cannot be found. Once this number gets to 5, the balancer exits the overall balancing iteration. I believe the desired functionality is 5 consecutive no pending block iterations - however this variable is never reset to 0 upon block moves. So once this number reaches 5 - even if there have been thousands of blocks moved in between these no pending block iterations - the overall balancing iteration will prematurely end. The fix I applied was to set noPendingBlockIteration = 0 when a pending block is found and scheduled. In this way, my iterations do not prematurely exit unless there is 5 consecutive no pending block iterations. Below is a copy of my dispatchBlocks() function with the change I made. {code} private void dispatchBlocks() { long startTime = Time.now(); long scheduledSize = getScheduledSize(); this.blocksToReceive = 2*scheduledSize; boolean isTimeUp = false; int noPendingBlockIteration = 0; while(!isTimeUp getScheduledSize()0 (!srcBlockList.isEmpty() || blocksToReceive0)) { PendingBlockMove pendingBlock = chooseNextBlockToMove(); if (pendingBlock != null) { noPendingBlockIteration = 0; // move the block pendingBlock.scheduleBlockMove(); continue; } /* Since we can not schedule any block to move, * filter any moved blocks from the source block list and * check if we should fetch more blocks from the namenode */ filterMovedBlocks(); // filter already moved blocks if (shouldFetchMoreBlocks()) { // fetch new blocks try { blocksToReceive -= getBlockList(); continue; } catch (IOException e) { LOG.warn(Exception while getting block list, e); return; } } else { // source node cannot find a pendingBlockToMove, iteration +1 noPendingBlockIteration++; // in case no blocks can be moved for source node's task, // jump out of while-loop after 5 iterations. if (noPendingBlockIteration = MAX_NO_PENDING_BLOCK_ITERATIONS) { setScheduledSize(0); } } // check if time is up or not if (Time.now()-startTime MAX_ITERATION_TIME) { isTimeUp = true; continue; } /* Now we can not schedule any block to move and there are * no new blocks added to the source block list, so we wait. 
*/ try { synchronized(Balancer.this) { Balancer.this.wait(1000); // wait for targets/sources to be idle } } catch (InterruptedException ignored) { } } } } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6621) Hadoop Balancer prematurely exits iterations
[ https://issues.apache.org/jira/browse/HDFS-6621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rafal Wojdyla updated HDFS-6621: Attachment: HDFS-6621.patch_4 Hadoop Balancer prematurely exits iterations Key: HDFS-6621 URL: https://issues.apache.org/jira/browse/HDFS-6621 Project: Hadoop HDFS Issue Type: Bug Components: balancer Affects Versions: 2.2.0, 2.4.0 Environment: Red Hat Enterprise Linux Server release 5.8 with Hadoop 2.4.0 Reporter: Benjamin Bowman Labels: balancer Attachments: HDFS-6621.patch, HDFS-6621.patch_2, HDFS-6621.patch_3, HDFS-6621.patch_4 I have been having an issue with the balancing being too slow. The issue was not with the speed with which blocks were moved, but rather the balancer would prematurely exit out of it's balancing iterations. It would move ~10 blocks or 100 MB then exit the current iteration (in which it said it was planning on moving about 10 GB). I looked in the Balancer.java code and believe I found and solved the issue. In the dispatchBlocks() function there is a variable, noPendingBlockIteration, which counts the number of iterations in which a pending block to move cannot be found. Once this number gets to 5, the balancer exits the overall balancing iteration. I believe the desired functionality is 5 consecutive no pending block iterations - however this variable is never reset to 0 upon block moves. So once this number reaches 5 - even if there have been thousands of blocks moved in between these no pending block iterations - the overall balancing iteration will prematurely end. The fix I applied was to set noPendingBlockIteration = 0 when a pending block is found and scheduled. In this way, my iterations do not prematurely exit unless there is 5 consecutive no pending block iterations. Below is a copy of my dispatchBlocks() function with the change I made. {code} private void dispatchBlocks() { long startTime = Time.now(); long scheduledSize = getScheduledSize(); this.blocksToReceive = 2*scheduledSize; boolean isTimeUp = false; int noPendingBlockIteration = 0; while(!isTimeUp getScheduledSize()0 (!srcBlockList.isEmpty() || blocksToReceive0)) { PendingBlockMove pendingBlock = chooseNextBlockToMove(); if (pendingBlock != null) { noPendingBlockIteration = 0; // move the block pendingBlock.scheduleBlockMove(); continue; } /* Since we can not schedule any block to move, * filter any moved blocks from the source block list and * check if we should fetch more blocks from the namenode */ filterMovedBlocks(); // filter already moved blocks if (shouldFetchMoreBlocks()) { // fetch new blocks try { blocksToReceive -= getBlockList(); continue; } catch (IOException e) { LOG.warn(Exception while getting block list, e); return; } } else { // source node cannot find a pendingBlockToMove, iteration +1 noPendingBlockIteration++; // in case no blocks can be moved for source node's task, // jump out of while-loop after 5 iterations. if (noPendingBlockIteration = MAX_NO_PENDING_BLOCK_ITERATIONS) { setScheduledSize(0); } } // check if time is up or not if (Time.now()-startTime MAX_ITERATION_TIME) { isTimeUp = true; continue; } /* Now we can not schedule any block to move and there are * no new blocks added to the source block list, so we wait. */ try { synchronized(Balancer.this) { Balancer.this.wait(1000); // wait for targets/sources to be idle } } catch (InterruptedException ignored) { } } } } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6506) Newly moved block replica been invalidated and deleted in TestBalancer
[ https://issues.apache.org/jira/browse/HDFS-6506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14124995#comment-14124995 ] Hadoop QA commented on HDFS-6506: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12667086/HDFS-6506.v3.patch against trunk revision a23144f. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.web.TestWebHdfsFileSystemContract org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7940//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7940//console This message is automatically generated. Newly moved block replica been invalidated and deleted in TestBalancer -- Key: HDFS-6506 URL: https://issues.apache.org/jira/browse/HDFS-6506 Project: Hadoop HDFS Issue Type: Bug Reporter: Binglin Chang Assignee: Binglin Chang Attachments: HDFS-6506.v1.patch, HDFS-6506.v2.patch, HDFS-6506.v3.patch TestBalancerWithNodeGroup#testBalancerWithNodeGroup fails recently https://builds.apache.org/job/PreCommit-HDFS-Build/7045//testReport/ from the error log, the reason seems to be that newly moved block replicas been invalidated and deleted, so some work of the balancer are reversed. 
{noformat} 2014-06-06 18:15:51,681 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741834_1010 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:51,683 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741833_1009 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:51,683 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741830_1006 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:51,683 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741831_1007 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:51,682 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741832_1008 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:54,702 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741827_1003 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:54,702 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741828_1004 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:54,701 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741829_1005 with size=100 fr 2014-06-06 18:15:54,706 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741833_1009) is added to invalidated blocks set 2014-06-06 18:15:54,709 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741834_1010) is added to invalidated blocks set 2014-06-06 18:15:56,421 INFO BlockStateChange (BlockManager.java:invalidateWorkForOneNode(3242)) - BLOCK* BlockManager: ask 127.0.0.1:55468 to delete [blk_1073741833_1009, blk_1073741834_1010] 2014-06-06 18:15:57,717 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741832_1008) is added to invalidated blocks set 2014-06-06 18:15:57,720 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates:
[jira] [Commented] (HDFS-6621) Hadoop Balancer prematurely exits iterations
[ https://issues.apache.org/jira/browse/HDFS-6621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14124997#comment-14124997 ] Yongjun Zhang commented on HDFS-6621: - Hi Rafal, Thanks for the quick response, and that's great to hear! What about we do this: 1. Have a setup to see the problem with the fix at all, to demonstrate the symptom. 2. Try the fix of problem 1 only, to see if there is still problem, and we should try to demonstrate the remaining problem 3. Try the fix of both problems (rev 4), to see if all problems are gone Thanks again! Hadoop Balancer prematurely exits iterations Key: HDFS-6621 URL: https://issues.apache.org/jira/browse/HDFS-6621 Project: Hadoop HDFS Issue Type: Bug Components: balancer Affects Versions: 2.2.0, 2.4.0 Environment: Red Hat Enterprise Linux Server release 5.8 with Hadoop 2.4.0 Reporter: Benjamin Bowman Labels: balancer Attachments: HDFS-6621.patch, HDFS-6621.patch_2, HDFS-6621.patch_3, HDFS-6621.patch_4 I have been having an issue with the balancing being too slow. The issue was not with the speed with which blocks were moved, but rather the balancer would prematurely exit out of it's balancing iterations. It would move ~10 blocks or 100 MB then exit the current iteration (in which it said it was planning on moving about 10 GB). I looked in the Balancer.java code and believe I found and solved the issue. In the dispatchBlocks() function there is a variable, noPendingBlockIteration, which counts the number of iterations in which a pending block to move cannot be found. Once this number gets to 5, the balancer exits the overall balancing iteration. I believe the desired functionality is 5 consecutive no pending block iterations - however this variable is never reset to 0 upon block moves. So once this number reaches 5 - even if there have been thousands of blocks moved in between these no pending block iterations - the overall balancing iteration will prematurely end. The fix I applied was to set noPendingBlockIteration = 0 when a pending block is found and scheduled. In this way, my iterations do not prematurely exit unless there is 5 consecutive no pending block iterations. Below is a copy of my dispatchBlocks() function with the change I made. {code} private void dispatchBlocks() { long startTime = Time.now(); long scheduledSize = getScheduledSize(); this.blocksToReceive = 2*scheduledSize; boolean isTimeUp = false; int noPendingBlockIteration = 0; while(!isTimeUp getScheduledSize()0 (!srcBlockList.isEmpty() || blocksToReceive0)) { PendingBlockMove pendingBlock = chooseNextBlockToMove(); if (pendingBlock != null) { noPendingBlockIteration = 0; // move the block pendingBlock.scheduleBlockMove(); continue; } /* Since we can not schedule any block to move, * filter any moved blocks from the source block list and * check if we should fetch more blocks from the namenode */ filterMovedBlocks(); // filter already moved blocks if (shouldFetchMoreBlocks()) { // fetch new blocks try { blocksToReceive -= getBlockList(); continue; } catch (IOException e) { LOG.warn(Exception while getting block list, e); return; } } else { // source node cannot find a pendingBlockToMove, iteration +1 noPendingBlockIteration++; // in case no blocks can be moved for source node's task, // jump out of while-loop after 5 iterations. 
if (noPendingBlockIteration >= MAX_NO_PENDING_BLOCK_ITERATIONS) { setScheduledSize(0); } } // check if time is up or not if (Time.now()-startTime > MAX_ITERATION_TIME) { isTimeUp = true; continue; } /* Now we can not schedule any block to move and there are * no new blocks added to the source block list, so we wait. */ try { synchronized(Balancer.this) { Balancer.this.wait(1000); // wait for targets/sources to be idle } } catch (InterruptedException ignored) { } } } } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6621) Hadoop Balancer prematurely exits iterations
[ https://issues.apache.org/jira/browse/HDFS-6621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14125007#comment-14125007 ] Yongjun Zhang commented on HDFS-6621: - Sorry, one typo in item 1 of last comment: with meant without. Hadoop Balancer prematurely exits iterations Key: HDFS-6621 URL: https://issues.apache.org/jira/browse/HDFS-6621 Project: Hadoop HDFS Issue Type: Bug Components: balancer Affects Versions: 2.2.0, 2.4.0 Environment: Red Hat Enterprise Linux Server release 5.8 with Hadoop 2.4.0 Reporter: Benjamin Bowman Labels: balancer Attachments: HDFS-6621.patch, HDFS-6621.patch_2, HDFS-6621.patch_3, HDFS-6621.patch_4 I have been having an issue with the balancing being too slow. The issue was not with the speed with which blocks were moved, but rather the balancer would prematurely exit out of it's balancing iterations. It would move ~10 blocks or 100 MB then exit the current iteration (in which it said it was planning on moving about 10 GB). I looked in the Balancer.java code and believe I found and solved the issue. In the dispatchBlocks() function there is a variable, noPendingBlockIteration, which counts the number of iterations in which a pending block to move cannot be found. Once this number gets to 5, the balancer exits the overall balancing iteration. I believe the desired functionality is 5 consecutive no pending block iterations - however this variable is never reset to 0 upon block moves. So once this number reaches 5 - even if there have been thousands of blocks moved in between these no pending block iterations - the overall balancing iteration will prematurely end. The fix I applied was to set noPendingBlockIteration = 0 when a pending block is found and scheduled. In this way, my iterations do not prematurely exit unless there is 5 consecutive no pending block iterations. Below is a copy of my dispatchBlocks() function with the change I made. {code} private void dispatchBlocks() { long startTime = Time.now(); long scheduledSize = getScheduledSize(); this.blocksToReceive = 2*scheduledSize; boolean isTimeUp = false; int noPendingBlockIteration = 0; while(!isTimeUp getScheduledSize()0 (!srcBlockList.isEmpty() || blocksToReceive0)) { PendingBlockMove pendingBlock = chooseNextBlockToMove(); if (pendingBlock != null) { noPendingBlockIteration = 0; // move the block pendingBlock.scheduleBlockMove(); continue; } /* Since we can not schedule any block to move, * filter any moved blocks from the source block list and * check if we should fetch more blocks from the namenode */ filterMovedBlocks(); // filter already moved blocks if (shouldFetchMoreBlocks()) { // fetch new blocks try { blocksToReceive -= getBlockList(); continue; } catch (IOException e) { LOG.warn(Exception while getting block list, e); return; } } else { // source node cannot find a pendingBlockToMove, iteration +1 noPendingBlockIteration++; // in case no blocks can be moved for source node's task, // jump out of while-loop after 5 iterations. if (noPendingBlockIteration = MAX_NO_PENDING_BLOCK_ITERATIONS) { setScheduledSize(0); } } // check if time is up or not if (Time.now()-startTime MAX_ITERATION_TIME) { isTimeUp = true; continue; } /* Now we can not schedule any block to move and there are * no new blocks added to the source block list, so we wait. */ try { synchronized(Balancer.this) { Balancer.this.wait(1000); // wait for targets/sources to be idle } } catch (InterruptedException ignored) { } } } } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7027) Archival Storage: Mover does not terminate when some storage type is out of space
Tsz Wo Nicholas Sze created HDFS-7027: - Summary: Archival Storage: Mover does not terminate when some storage type is out of space Key: HDFS-7027 URL: https://issues.apache.org/jira/browse/HDFS-7027 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor Suppose DISK has run out of space and there are some block replicas that need to be moved to DISK. In this case, it is impossible to move any replica to DISK, yet Mover may not terminate, since it keeps trying to schedule those moves in each iteration. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6893) crypto subcommand is not sorted properly in hdfs's hadoop_usage
[ https://issues.apache.org/jira/browse/HDFS-6893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Luo updated HDFS-6893: Attachment: HDFS-6893.patch HDFS-6893.patch Moving crypto command to after classpath crypto subcommand is not sorted properly in hdfs's hadoop_usage --- Key: HDFS-6893 URL: https://issues.apache.org/jira/browse/HDFS-6893 Project: Hadoop HDFS Issue Type: Bug Components: scripts Reporter: Allen Wittenauer Priority: Trivial Labels: newbie Attachments: HDFS-6893.patch crypto subcommand should be after classpath/before datanode, not after zkfc, in the hdfs usage output. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6893) crypto subcommand is not sorted properly in hdfs's hadoop_usage
[ https://issues.apache.org/jira/browse/HDFS-6893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Luo updated HDFS-6893: Affects Version/s: 3.0.0 Status: Patch Available (was: Open) crypto subcommand is not sorted properly in hdfs's hadoop_usage --- Key: HDFS-6893 URL: https://issues.apache.org/jira/browse/HDFS-6893 Project: Hadoop HDFS Issue Type: Bug Components: scripts Affects Versions: 3.0.0 Reporter: Allen Wittenauer Priority: Trivial Labels: newbie Attachments: HDFS-6893.patch crypto subcommand should be after classpath/before datanode, not after zkfc, in the hdfs usage output. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6584) Support Archival Storage
[ https://issues.apache.org/jira/browse/HDFS-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14125021#comment-14125021 ] Hadoop QA commented on HDFS-6584: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12667084/h6584_20140907.patch against trunk revision d1fa582. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 23 new or modified test files. {color:red}-1 javac{color}. The applied patch generated 1268 javac compiler warnings (more than the trunk's current 1264 warnings). {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.balancer.TestBalancerWithHANameNodes org.apache.hadoop.hdfs.server.namenode.TestINodeFile org.apache.hadoop.hdfs.server.namenode.TestNameNodeXAttr org.apache.hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes org.apache.hadoop.hdfs.server.namenode.TestFsck org.apache.hadoop.hdfs.TestEncryptionZones org.apache.hadoop.hdfs.server.namenode.TestFileContextAcl org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA org.apache.hadoop.hdfs.server.mover.TestStorageMover org.apache.hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFS org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer org.apache.hadoop.fs.TestSymlinkHdfsFileContext org.apache.hadoop.hdfs.TestDistributedFileSystem org.apache.hadoop.hdfs.server.balancer.TestBalancer org.apache.hadoop.fs.TestSymlinkHdfsFileSystem org.apache.hadoop.hdfs.server.namenode.TestCheckpoint org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover org.apache.hadoop.hdfs.server.balancer.TestBalancerWithSaslDataTransfer org.apache.hadoop.hdfs.server.datanode.TestBlockReplacement org.apache.hadoop.hdfs.server.balancer.TestBalancerWithEncryptedTransfer org.apache.hadoop.hdfs.TestListFilesInFileContext org.apache.hadoop.hdfs.TestDFSInotifyEventInputStream org.apache.hadoop.hdfs.server.namenode.TestFSImage org.apache.hadoop.hdfs.server.namenode.TestNameNodeAcl org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7939//testReport/ Javac warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/7939//artifact/trunk/patchprocess/diffJavacWarnings.txt Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7939//console This message is automatically generated. Support Archival Storage Key: HDFS-6584 URL: https://issues.apache.org/jira/browse/HDFS-6584 Project: Hadoop HDFS Issue Type: New Feature Components: balancer, namenode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Attachments: HDFS-6584.000.patch, HDFSArchivalStorageDesign20140623.pdf, HDFSArchivalStorageDesign20140715.pdf, h6584_20140907.patch In most of the Hadoop clusters, as more and more data is stored for longer time, the demand for storage is outstripping the compute. 
Hadoop needs a cost-effective and easy-to-manage solution to meet this demand for storage. The current solutions are: - Delete old, unused data. This comes at the operational cost of identifying unnecessary data and deleting it manually. - Add more nodes to the cluster. This adds unnecessary compute capacity along with the storage capacity. Hadoop needs a solution that decouples growing storage capacity from compute capacity. Nodes with higher-density, less expensive storage and low compute power are becoming available and can be used as cold storage in clusters. Based on policy, data can be moved from hot storage to cold storage. Adding more nodes to the cold storage can grow the storage independent of the
[jira] [Commented] (HDFS-7026) Introduce a string constant for Failed to obtain user group info...
[ https://issues.apache.org/jira/browse/HDFS-7026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14125029#comment-14125029 ] Hadoop QA commented on HDFS-7026: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12667090/HDFS-7206.001.patch against trunk revision a23144f. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.ha.TestZKFailoverControllerStress org.apache.hadoop.hdfs.web.TestWebHdfsFileSystemContract org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover The following test timeouts occurred in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.TestReplication {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7941//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7941//console This message is automatically generated. Introduce a string constant for Failed to obtain user group info... - Key: HDFS-7026 URL: https://issues.apache.org/jira/browse/HDFS-7026 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.6.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Priority: Trivial Attachments: HDFS-7206.001.patch There are multiple places that refer to hard-coded string {{Failed to obtain user group information:}}, which serves as a contract between different places. Filing this jira to replace the hardcoded string with a constant to make it easier to maintain. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6538) Element comment format error in org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry
[ https://issues.apache.org/jira/browse/HDFS-6538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Luo updated HDFS-6538: Attachment: HDFS-6538.patch HDFS-6538 Changed the comment to javadoc. Element comment format error in org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry --- Key: HDFS-6538 URL: https://issues.apache.org/jira/browse/HDFS-6538 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.4.0 Reporter: debugging Priority: Trivial Labels: documentation Attachments: HDFS-6538.patch Original Estimate: 1h Remaining Estimate: 1h A javadoc element comment should start with {noformat}/**{noformat}, but the comment for class ShortCircuitRegistry starts with only {noformat}/*{noformat}. So I think a {noformat}*{noformat} is omitted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
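For reference, the attached change is presumably of this shape (the comment text below is illustrative, not the actual class comment):
{code}
// Before: an ordinary block comment, which the javadoc tool ignores.
/* Manages short-circuit shared-memory segments on the DataNode. */
class BeforeFix {}

// After: a javadoc comment, picked up for the generated documentation.
/** Manages short-circuit shared-memory segments on the DataNode. */
class AfterFix {}
{code}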
[jira] [Updated] (HDFS-7027) Archival Storage: Mover does not terminate when some storage type is out of space
[ https://issues.apache.org/jira/browse/HDFS-7027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-7027: -- Attachment: h7027_20140908.patch h7027_20140908.patch: when no move is scheduled, processFile(..) now returns false. Archival Storage: Mover does not terminate when some storage type is out of space - Key: HDFS-7027 URL: https://issues.apache.org/jira/browse/HDFS-7027 Project: Hadoop HDFS Issue Type: Sub-task Components: balancer, namenode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor Attachments: h7027_20140908.patch Suppose DISK has run out of space and there are some block replicas that need to be moved to DISK. In this case, it is impossible to move any replica to DISK, yet Mover may not terminate, since it keeps trying to schedule those moves in each iteration. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
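A simplified sketch of the termination logic implied by the patch summary, with illustrative names rather than the real Mover API: each pass reports whether it scheduled any move, and the outer loop exits once a pass schedules nothing.
{code}
// Illustrative names only, not the real Mover API.
import java.util.List;

public class MoverLoopSketch {
  interface FileScan {
    /** Returns true only if at least one replica move was scheduled. */
    boolean processFile(String path);
  }

  static void run(FileScan scan, List<String> paths) {
    boolean scheduledSomething;
    do {
      scheduledSomething = false;
      for (String p : paths) {
        // If every required target (e.g. DISK) is out of space, processFile
        // schedules nothing and returns false for every path.
        scheduledSomething |= scan.processFile(p);
      }
    } while (scheduledSomething); // terminate once a pass schedules no move
  }
}
{code}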
[jira] [Commented] (HDFS-6621) Hadoop Balancer prematurely exits iterations
[ https://issues.apache.org/jira/browse/HDFS-6621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14125054#comment-14125054 ] Hadoop QA commented on HDFS-6621: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12667093/HDFS-6621.patch_4 against trunk revision a23144f. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFS {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7942//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/7942//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7942//console This message is automatically generated. Hadoop Balancer prematurely exits iterations Key: HDFS-6621 URL: https://issues.apache.org/jira/browse/HDFS-6621 Project: Hadoop HDFS Issue Type: Bug Components: balancer Affects Versions: 2.2.0, 2.4.0 Environment: Red Hat Enterprise Linux Server release 5.8 with Hadoop 2.4.0 Reporter: Benjamin Bowman Labels: balancer Attachments: HDFS-6621.patch, HDFS-6621.patch_2, HDFS-6621.patch_3, HDFS-6621.patch_4 I have been having an issue with the balancing being too slow. The issue was not with the speed with which blocks were moved, but rather the balancer would prematurely exit out of it's balancing iterations. It would move ~10 blocks or 100 MB then exit the current iteration (in which it said it was planning on moving about 10 GB). I looked in the Balancer.java code and believe I found and solved the issue. In the dispatchBlocks() function there is a variable, noPendingBlockIteration, which counts the number of iterations in which a pending block to move cannot be found. Once this number gets to 5, the balancer exits the overall balancing iteration. I believe the desired functionality is 5 consecutive no pending block iterations - however this variable is never reset to 0 upon block moves. So once this number reaches 5 - even if there have been thousands of blocks moved in between these no pending block iterations - the overall balancing iteration will prematurely end. The fix I applied was to set noPendingBlockIteration = 0 when a pending block is found and scheduled. In this way, my iterations do not prematurely exit unless there is 5 consecutive no pending block iterations. Below is a copy of my dispatchBlocks() function with the change I made. 
{code} private void dispatchBlocks() { long startTime = Time.now(); long scheduledSize = getScheduledSize(); this.blocksToReceive = 2*scheduledSize; boolean isTimeUp = false; int noPendingBlockIteration = 0; while(!isTimeUp getScheduledSize()0 (!srcBlockList.isEmpty() || blocksToReceive0)) { PendingBlockMove pendingBlock = chooseNextBlockToMove(); if (pendingBlock != null) { noPendingBlockIteration = 0; // move the block pendingBlock.scheduleBlockMove(); continue; } /* Since we can not schedule any block to move, * filter any moved blocks from the source block list and * check if we should fetch more blocks from the namenode */ filterMovedBlocks(); // filter already moved blocks if (shouldFetchMoreBlocks()) { // fetch new blocks try { blocksToReceive -= getBlockList(); continue; } catch (IOException e) { LOG.warn(Exception while getting block list, e);
[jira] [Commented] (HDFS-6799) The invalidate method in SimulatedFSDataset.java failed to remove (invalidate) blocks from the file system.
[ https://issues.apache.org/jira/browse/HDFS-6799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14125090#comment-14125090 ] Megasthenis Asteris commented on HDFS-6799: --- _unfinalizeBlock_ indeed needs to be fixed, but I am not sure what the expected behavior should be. The way it is structured now, it seems that it would suffice to delete the block from the SimulatedFSDataset's map of blocks. However, note that this is not exactly the opposite of _finalizeBlock_, as one might expect. I also realized that _TestSimulatedFSDataset_ has bugs: _checkInvalidBlock(ExtendedBlock b)_ creates a new simulated dataset every time it is called. Clearly, b will never be in the new dataset, so _checkInvalidBlock_ effectively always concludes that b is an invalid block. Should I submit this as a separate bug, or fix it here? The invalidate method in SimulatedFSDataset.java failed to remove (invalidate) blocks from the file system. --- Key: HDFS-6799 URL: https://issues.apache.org/jira/browse/HDFS-6799 Project: Hadoop HDFS Issue Type: Bug Components: datanode, test Affects Versions: 2.4.1 Reporter: Megasthenis Asteris Assignee: Megasthenis Asteris Priority: Minor Attachments: HDFS-6799.patch The invalidate(String bpid, Block[] invalidBlks) method in SimulatedFSDataset.java should remove all invalidBlks from the simulated file system. It currently fails to do that. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
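A schematic of the test flaw described above, using simplified types rather than the real SimulatedFSDataset API: because the check builds a fresh dataset, the block is trivially absent and the assertion can never fail.
{code}
// Simplified types, not the real SimulatedFSDataset/TestSimulatedFSDataset API.
import java.util.HashSet;
import java.util.Set;

public class InvalidBlockCheckSketch {
  static final class FakeDataset {
    private final Set<Long> blockIds = new HashSet<>();
    void addBlock(long id) { blockIds.add(id); }
    boolean contains(long id) { return blockIds.contains(id); }
  }

  // Buggy shape: always "passes" because the freshly created dataset is empty.
  static boolean checkInvalidBlockBuggy(long blockId) {
    FakeDataset fresh = new FakeDataset();
    return !fresh.contains(blockId);
  }

  // Fixed shape: the check must run against the dataset under test.
  static boolean checkInvalidBlockFixed(FakeDataset underTest, long blockId) {
    return !underTest.contains(blockId);
  }
}
{code}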
[jira] [Resolved] (HDFS-6988) Make RAM disk eviction thresholds configurable
[ https://issues.apache.org/jira/browse/HDFS-6988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal resolved HDFS-6988. - Resolution: Duplicate Fix Version/s: HDFS-6581 Assignee: Arpit Agarwal The fix for HDFS-6991 adds config keys for eviction parameters. Make RAM disk eviction thresholds configurable -- Key: HDFS-6988 URL: https://issues.apache.org/jira/browse/HDFS-6988 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Affects Versions: HDFS-6581 Reporter: Arpit Agarwal Assignee: Arpit Agarwal Fix For: HDFS-6581 Per feedback from [~cmccabe] on HDFS-6930, we can make the eviction thresholds configurable. The hard-coded thresholds may not be appropriate for very large RAM disks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Work started] (HDFS-6991) Notify NN of evicted block before deleting it from RAM disk
[ https://issues.apache.org/jira/browse/HDFS-6991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HDFS-6991 started by Arpit Agarwal. --- Notify NN of evicted block before deleting it from RAM disk --- Key: HDFS-6991 URL: https://issues.apache.org/jira/browse/HDFS-6991 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, namenode Affects Versions: HDFS-6581 Reporter: Arpit Agarwal Assignee: Arpit Agarwal Attachments: HDFS-6991.01.patch, HDFS-6991.02.patch, HDFS-6991.03.patch When evicting a block from RAM disk to persistent storage, the DN should notify the NN of the persistent replica before deleting the replica from RAM disk. Else there can be a window of time during which the block is considered 'missing' by the NN. Found by [~xyao] via HDFS-6950. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
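The ordering requirement described in HDFS-6991 can be summarized in a short sketch; the helper names below are assumptions for illustration only and do not correspond to the attached patches.
{code}
// Hedged sketch of the required ordering: the NameNode must learn about the
// persistent replica before the RAM disk copy disappears, so the block never
// looks "missing" to the NN. Both helper names are hypothetical.
void evictBlockFromRamDisk(ExtendedBlock block) {
  // The replica has already been lazily persisted to disk at this point.
  notifyNamenodeOfPersistedReplica(block);  // hypothetical: report on-disk replica first
  deleteRamDiskReplica(block);              // hypothetical: only then drop the RAM copy
}
{code}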
[jira] [Commented] (HDFS-6893) crypto subcommand is not sorted properly in hdfs's hadoop_usage
[ https://issues.apache.org/jira/browse/HDFS-6893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14125105#comment-14125105 ] Hadoop QA commented on HDFS-6893: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12667098/HDFS-6893.patch against trunk revision a23144f. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.TestEncryptionZones org.apache.hadoop.hdfs.server.datanode.TestBPOfferService {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7943//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7943//console This message is automatically generated. crypto subcommand is not sorted properly in hdfs's hadoop_usage --- Key: HDFS-6893 URL: https://issues.apache.org/jira/browse/HDFS-6893 Project: Hadoop HDFS Issue Type: Bug Components: scripts Affects Versions: 3.0.0 Reporter: Allen Wittenauer Priority: Trivial Labels: newbie Attachments: HDFS-6893.patch crypto subcommand should be after classpath/before datanode, not after zkfc, in the hdfs usage output. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6981) DN upgrade with layout version change should not use trash
[ https://issues.apache.org/jira/browse/HDFS-6981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14125108#comment-14125108 ] Hadoop QA commented on HDFS-6981: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12667061/HDFS-6981.07.patch against trunk revision a23144f. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 2 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.web.TestWebHdfsFileSystemContract org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7944//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/7944//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7944//console This message is automatically generated. DN upgrade with layout version change should not use trash -- Key: HDFS-6981 URL: https://issues.apache.org/jira/browse/HDFS-6981 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 3.0.0 Reporter: James Thomas Assignee: Arpit Agarwal Attachments: HDFS-6981.01.patch, HDFS-6981.02.patch, HDFS-6981.03.patch, HDFS-6981.04.patch, HDFS-6981.05.patch, HDFS-6981.06.patch, HDFS-6981.07.patch Post HDFS-6800, we can encounter the following scenario: # We start with DN software version -55 and initiate a rolling upgrade to version -56 # We delete some blocks, and they are moved to trash # We roll back to DN software version -55 using the -rollback flag – since we are running the old code (prior to this patch), we will restore the previous directory but will not delete the trash # We append to some of the blocks that were deleted in step 2 # We then restart a DN that contains blocks that were appended to – since the trash still exists, it will be restored at this point, the appended-to blocks will be overwritten, and we will lose the appended data So I think we need to avoid writing anything to the trash directory if we have a previous directory. Thanks to [~james.thomas] for reporting this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
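A minimal sketch of the guard implied by the HDFS-6981 description follows, assuming hypothetical helper names on BlockPoolSliceStorage; the attached patches may implement this differently.
{code}
// Hedged sketch: trash and the upgrade "previous" directory are alternative restore
// mechanisms, so deleted blocks must not also go to trash once previous/ exists,
// otherwise restoring trash later can clobber blocks appended to after rollback.
// Both helper methods are assumptions for illustration.
private boolean shouldMoveDeletedBlocksToTrash(BlockPoolSliceStorage bpStorage) {
  if (bpStorage.hasPreviousDirectory()) {  // layout-version upgrade in progress
    return false;                          // rollback restores previous/, not trash
  }
  return bpStorage.isTrashAllowed();       // normal rolling-upgrade case
}
{code}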
[jira] [Commented] (HDFS-6940) Initial refactoring to allow ConsensusNode implementation
[ https://issues.apache.org/jira/browse/HDFS-6940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14125126#comment-14125126 ] Suresh Srinivas commented on HDFS-6940: --- [~atm] had specifically asked not to commit this to trunk. Why was this committed to trunk and branch-2 without any discussion? I agree with him that we should not be making methods public or protected, due to the burden of maintaining that contract. You mentioned two backward incompatible changes (I would like to know what they are). But there are numerous others that are never detected because the committers in the project take care of them. Let's not lose sight of that. We have also had difficulty removing other dead code due to vetoes, such as BackupNode. So I want to be careful about committing code before there is a decision on whether the feature this refactoring is being done for should even be in HDFS. Without addressing the comments from [~atm], who has been participating in this discussion, this patch should not have been committed to trunk. [~cos], please be respectful. One thing I had held off commenting on: your committership was based on your fault-injection related work in Hadoop. I believe you have not contributed enough to the other parts of the system that this patch is touching. One of the honor rules is that a committer refrains from voting +1 on patches in areas he has not contributed to. But I have seen that in many jiras, including this one, this is not followed by you. I am -1 on this patch going into trunk and branch-2. Let's do this in the feature branch. This is not a big enough refactor to make merges difficult. I think we should revert this change. I would also like to hear from other committers on this issue and get their thoughts. Initial refactoring to allow ConsensusNode implementation - Key: HDFS-6940 URL: https://issues.apache.org/jira/browse/HDFS-6940 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: 2.0.6-alpha, 2.5.0 Reporter: Konstantin Shvachko Assignee: Konstantin Shvachko Fix For: 2.6.0 Attachments: HDFS-6940.patch Minor refactoring of FSNamesystem to open private methods that are needed for CNode implementation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7027) Archival Storage: Mover does not terminate when some storage type is out of space
[ https://issues.apache.org/jira/browse/HDFS-7027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-7027: -- Attachment: h7027_20140908b.patch h7027_20140908b.patch: increase the capacities set in the tests to account for the changes from HDFS-6898. Archival Storage: Mover does not terminate when some storage type is out of space - Key: HDFS-7027 URL: https://issues.apache.org/jira/browse/HDFS-7027 Project: Hadoop HDFS Issue Type: Sub-task Components: balancer, namenode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor Attachments: h7027_20140908.patch, h7027_20140908b.patch Suppose DISK has run out of space and some block replicas need to be moved to DISK. In this case, it is impossible to move any replica to DISK. Then the Mover may never terminate, since it keeps trying to schedule moves of those replicas to DISK in each iteration. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
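One way to read the non-termination problem is as a missing per-iteration progress check. The sketch below only illustrates that idea; the type and method names are hypothetical and this is not the attached patch.
{code}
// Illustrative sketch (hypothetical names): stop the Mover when an iteration still has
// replicas to migrate but cannot schedule a single move, e.g. because the target
// storage type is out of space on every candidate node.
boolean runOneIteration(List<PendingMove> pendingMoves) {
  boolean scheduledAnything = false;
  for (PendingMove m : pendingMoves) {
    if (trySchedule(m)) {            // hypothetical: false when no target has room
      scheduledAnything = true;
    }
  }
  // No progress while work remains: further iterations cannot succeed either.
  return scheduledAnything || pendingMoves.isEmpty();
}
{code}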
[jira] [Updated] (HDFS-6875) Archival Storage: support migration for a list of specified paths
[ https://issues.apache.org/jira/browse/HDFS-6875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-6875: -- Component/s: balancer Hadoop Flags: Reviewed The patch does not apply anymore. Need to fix the imports. +1 patch looks good other than that. Archival Storage: support migration for a list of specified paths - Key: HDFS-6875 URL: https://issues.apache.org/jira/browse/HDFS-6875 Project: Hadoop HDFS Issue Type: Sub-task Components: balancer Reporter: Jing Zhao Assignee: Jing Zhao Priority: Minor Attachments: HDFS-6875.000.patch Currently the migration tool processes the whole namespace. It will be helpful if we can allow users to migrate data only for a list of specified paths. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7028) Archival Storage: FSDirectory should not get storage policy id from symlinks
Tsz Wo Nicholas Sze created HDFS-7028: - Summary: Archival Storage: FSDirectory should not get storage policy id from symlinks Key: HDFS-7028 URL: https://issues.apache.org/jira/browse/HDFS-7028 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor
{noformat}
java.lang.UnsupportedOperationException: Storage policy are not supported on symlinks
    at org.apache.hadoop.hdfs.server.namenode.INodeSymlink.getStoragePolicyID(INodeSymlink.java:151)
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.getFileInfo(FSDirectory.java:1506)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFileInfo(FSNamesystem.java:3992)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getLinkTarget(NameNodeRpcServer.java:1028)
    at org.apache.hadoop.hdfs.server.namenode.TestINodeFile.testValidSymlinkTarget(TestINodeFile.java:683)
    at org.apache.hadoop.hdfs.server.namenode.TestINodeFile.testInodeIdBasedPaths(TestINodeFile.java:622)
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
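The stack trace suggests guarding the policy lookup so symlinks are never asked for a storage policy. A hedged sketch follows; the fallback constant name is an assumption, not necessarily what the eventual patch uses.
{code}
// Hedged sketch of the guard the stack trace calls for: skip the lookup for symlinks
// instead of letting INodeSymlink#getStoragePolicyID throw UnsupportedOperationException.
private static byte getStoragePolicyIdSafely(INode inode) {
  if (inode.isSymlink()) {
    return BlockStoragePolicySuite.ID_UNSPECIFIED;  // assumed "no policy" id
  }
  return inode.getStoragePolicyID();
}
{code}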
[jira] [Updated] (HDFS-6584) Support Archival Storage
[ https://issues.apache.org/jira/browse/HDFS-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-6584: -- Attachment: h6584_20140908.patch h6584_20140908.patch: with HDFS-7028. Support Archival Storage Key: HDFS-6584 URL: https://issues.apache.org/jira/browse/HDFS-6584 Project: Hadoop HDFS Issue Type: New Feature Components: balancer, namenode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Attachments: HDFS-6584.000.patch, HDFSArchivalStorageDesign20140623.pdf, HDFSArchivalStorageDesign20140715.pdf, h6584_20140907.patch, h6584_20140908.patch In most of the Hadoop clusters, as more and more data is stored for longer time, the demand for storage is outstripping the compute. Hadoop needs a cost effective and easy to manage solution to meet this demand for storage. Current solution is: - Delete the old unused data. This comes at operational cost of identifying unnecessary data and deleting them manually. - Add more nodes to the clusters. This adds along with storage capacity unnecessary compute capacity to the cluster. Hadoop needs a solution to decouple growing storage capacity from compute capacity. Nodes with higher density and less expensive storage with low compute power are becoming available and can be used as cold storage in the clusters. Based on policy the data from hot storage can be moved to cold storage. Adding more nodes to the cold storage can grow the storage independent of the compute capacity in the cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6994) libhdfs3 - A native C/C++ HDFS client
[ https://issues.apache.org/jira/browse/HDFS-6994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14125197#comment-14125197 ] Colin Patrick McCabe commented on HDFS-6994: bq. Dynamically loading libjvm is a good idea, but it seems not solve all the problems you mentioned in HADOOP-10388. To make fall back feature work, users have to deploy the HDFS jars on every machine. This adds operational complexity for non-Java clients that just want to integrate with HDFS. Otherwise, fall back feature will not work. Sorry if I wasn't clear earlier. I don't think users of libhdfs3 (or ndfs, etc) should be *required* to deploy the jar files. I just said that *optionally*, if the jar files are deployed, fallback should be possible. It's great that libhdfs3 can function without jar files, and we should preserve this capability! bq. And fall back feature will finally be removed when the native client implement the full HDFS client feature. In practice, I think fallback to JNI will continue to be useful for a long, long time. Think about clients that want to interface with s3, Ceph, Azure FileSystem, or even LocalFileSystem. Currently the native code doesn't support those, and it's unlikely to get that support in the near future. So it's useful to have a library that can speak both JNI and the native HDFS protocol, depending on which is available. Users just want one library that they can use that will just work for multiple different configurations. Anyway, we can discuss the fallback code later. It might be feasible to keep the fallback code entirely outside of libhdfs3 in some separate shim library. I think we can merge libhdfs3 first and figure that out later. bq. Would you please review the code and give some comments? Thanks in advance. Thanks Zhanwei, I will take a look as soon as I get in tomorrow. bq. Naming is hard _ I don't have a problem with naming it libhdfs3, but users might wonder what libhdfs2 was :) How about libndfs++ as a name? libhdfs3 - A native C/C++ HDFS client - Key: HDFS-6994 URL: https://issues.apache.org/jira/browse/HDFS-6994 Project: Hadoop HDFS Issue Type: Task Components: hdfs-client Reporter: Zhanwei Wang Attachments: HDFS-6994-rpc-8.patch, HDFS-6994.patch Hi All I just got the permission to open source libhdfs3, which is a native C/C++ HDFS client based on Hadoop RPC protocol and HDFS Data Transfer Protocol. libhdfs3 provide the libhdfs style C interface and a C++ interface. Support both HADOOP RPC version 8 and 9. Support Namenode HA and Kerberos authentication. libhdfs3 is currently used by HAWQ of Pivotal I'd like to integrate libhdfs3 into HDFS source code to benefit others. You can find libhdfs3 code from github https://github.com/PivotalRD/libhdfs3 http://pivotalrd.github.io/libhdfs3/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6981) DN upgrade with layout version change should not use trash
[ https://issues.apache.org/jira/browse/HDFS-6981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14125200#comment-14125200 ] Arpit Agarwal commented on HDFS-6981: - bq. -1 findbugs. The patch appears to introduce 2 new Findbugs (version 2.0.3) warnings. I think Jenkins is hitting a bug. findbugs passed for me locally and the link to the warnings is broken. {color:green}+1 overall{color}. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version ) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. DN upgrade with layout version change should not use trash -- Key: HDFS-6981 URL: https://issues.apache.org/jira/browse/HDFS-6981 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 3.0.0 Reporter: James Thomas Assignee: Arpit Agarwal Attachments: HDFS-6981.01.patch, HDFS-6981.02.patch, HDFS-6981.03.patch, HDFS-6981.04.patch, HDFS-6981.05.patch, HDFS-6981.06.patch, HDFS-6981.07.patch Post HDFS-6800, we can encounter the following scenario: # We start with DN software version -55 and initiate a rolling upgrade to version -56 # We delete some blocks, and they are moved to trash # We roll back to DN software version -55 using the -rollback flag – since we are running the old code (prior to this patch), we will restore the previous directory but will not delete the trash # We append to some of the blocks that were deleted in step 2 # We then restart a DN that contains blocks that were appended to – since the trash still exists, it will be restored at this point, the appended-to blocks will be overwritten, and we will lose the appended data So I think we need to avoid writing anything to the trash directory if we have a previous directory. Thanks to [~james.thomas] for reporting this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6705) Create an XAttr that disallows the HDFS admin from accessing a file
[ https://issues.apache.org/jira/browse/HDFS-6705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14125209#comment-14125209 ] Andrew Wang commented on HDFS-6705: --- Hi Charles, thanks for working on this, a few review comments: * Why did the exception messages change in TestDistributedFileSystem and SymlinkBaseTest? This is mildly incompatible, so I'd like to understand why it's necessary. * We're still doing another path resolution to do checkUnreadableBySuperuser. Can we try to reuse the inode from the IIP just below? This would also let us avoid throwing IOException in the check method. * Consider folding FSPermissionChecker#checkUnreadableBySuperuser into the FSN method, it's pretty simple. * FSN#checkXAttrChangeAccess has unrelated change? * Indentation of FSN#checkUnreadableBySuperuser is off Doc: * Extra whitespace change * Text is still kinda verbose and still mentions preventing read access to other xattrs and writing to the file. I'd prefer something like: {noformat} The security namespace is reserved for internal HDFS use. This namespace is generally not accessible through userspace methods. One particular use of security is the security.hdfs.unreadable.by.superuser extended attribute. This xattr can only be set on files, and it will prevent the superuser from reading the file's contents. The superuser can still read and modify file metadata, such as the owner, permissions, etc. This xattr can be set and accessed by any user, assuming normal filesystem permissions. This xattr is also write-once, and cannot be removed once set. This xattr does not allow a value to be set. {noformat} * Unrelated changes in TestXAttrCLI, TestSymlinkHdfsFileSystem FSXattrBaseTest * High-level comment, I'd like to pare down the new tests to focus on this new functionality * I still see references to MAX_XATTR_SIZE which should be unrelated here. It also involves an extra mini cluster stop and start. * I'd like to avoid doing extra minicluster start/stops to test persistence too. It'd be better to add some security xattrs to the existing restart tests instead. * The vanilla xattrs test, it doesn't have a matching call to {{fail}}. I don't think this needs to be tested anyway, since UBS doesn't affect xattr operations. * verifyFileAccess also still has testing for append and create, which isn't valid anymore. * I see a hardcoded security.hdfs.unreadable.by.superuser still, sub in the string constant instead? * Is RemoteException is being thrown by DistributedFileSystem for the new AccessControlException? I see it being unwrapped in DFSClient, so I would expect to see an ACE here. Tests: * Mention of special xattr is non-specific, could we say unreadable by superuser or UBS or something instead? Create an XAttr that disallows the HDFS admin from accessing a file --- Key: HDFS-6705 URL: https://issues.apache.org/jira/browse/HDFS-6705 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode, security Affects Versions: 3.0.0 Reporter: Charles Lamb Assignee: Charles Lamb Attachments: HDFS-6705.001.patch, HDFS-6705.002.patch, HDFS-6705.003.patch There needs to be an xattr that specifies that the HDFS admin can not access a file. This is needed for m/r delegation tokens and data at rest encryption. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
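For reference, a hedged usage sketch of the xattr described in the proposed doc text above. The class name and path are examples only, and it assumes a null value is accepted since the xattr does not allow a value to be set.
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hedged usage sketch (not from the patch): any user with normal filesystem access can
// set the write-once xattr; afterwards the superuser can no longer read the file's data.
public class UnreadableBySuperuserExample {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path sensitive = new Path("/user/alice/secret.dat");  // example path
    fs.setXAttr(sensitive, "security.hdfs.unreadable.by.superuser", null);
  }
}
{code}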
[jira] [Commented] (HDFS-6940) Initial refactoring to allow ConsensusNode implementation
[ https://issues.apache.org/jira/browse/HDFS-6940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14125224#comment-14125224 ] Todd Lipcon commented on HDFS-6940: --- I'll only comment on the technical issue at hand here: I strongly agree that implementation inheritance/subclassing is not a maintainable extension mechanism for the NameNode. The issue is that, while composition through interfaces and peer class relationships can be well defined and documented and typically does not expose implementation details, making previously private methods public is doing exactly that. When we later want to reorganize the (implementation-specific) code of the NameNode, the existence of subclasses makes this very difficult. This is not an abstract argument. I experienced this pain first hand several years ago when working on HDFS-1073, and then again working on HDFS-1623. When methods and members are protected, then doing these kind of refactors becomes quite arduous -- the implementations of the base (NameNode) and the plugin (BackupNode in that case) are very tightly coupled, and tight coupling makes changes difficult. Can we lay out which specific plug points you need to make ConsensusNode work and define interfaces for them instead of using overriding/subclasses? Initial refactoring to allow ConsensusNode implementation - Key: HDFS-6940 URL: https://issues.apache.org/jira/browse/HDFS-6940 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: 2.0.6-alpha, 2.5.0 Reporter: Konstantin Shvachko Assignee: Konstantin Shvachko Fix For: 2.6.0 Attachments: HDFS-6940.patch Minor refactoring of FSNamesystem to open private methods that are needed for CNode implementation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
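To make the composition-versus-inheritance point concrete, here is a purely illustrative sketch; none of these types exist in the NameNode, and the actual plug points ConsensusNode needs would still have to be enumerated as Todd asks.
{code}
import java.util.ArrayList;
import java.util.List;

// Purely illustrative: a narrow, documented plug point (composition) instead of
// opening FSNamesystem internals to subclasses. These interfaces do not exist in HDFS.
interface NamespaceEventListener {
  void editLogged(long txId);  // the single hook an external implementation needs
}

class NamesystemSketch {
  private final List<NamespaceEventListener> listeners = new ArrayList<>();

  void register(NamespaceEventListener l) {
    listeners.add(l);          // the plugin couples only to the interface
  }
  // Internal methods can stay private and be refactored freely.
}
{code}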
[jira] [Commented] (HDFS-6843) Create FileStatus isEncrypted() method
[ https://issues.apache.org/jira/browse/HDFS-6843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14125228#comment-14125228 ] Andrew Wang commented on HDFS-6843: --- Hi Charles, thanks for sticking with this. Hopefully we're finally closing in on the right solution. * FileStatus#isEncrypted, the javadoc is incorrect in that directories can be encrypted too. * When accessed from within /.reserved/raw, I think things should still show up as encrypted. It's a little inconsistent right now, since files wouldn't show up as isEncrypted, while dirs would. This would be a good thing to have in a unit test. * Would like a test similar to what's in FSAclBaseTest that makes sure we can't set the isEncrypted bit * GNU {{ls}} uses {{*}} to indicate that a file is executable. I'd prefer not to overload this meaning in our webui. * Can you comment on manual testing done for the webUI? I think we used to have unit testing for the webui, but that might have gone away with the JS rewrite. Create FileStatus isEncrypted() method -- Key: HDFS-6843 URL: https://issues.apache.org/jira/browse/HDFS-6843 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode, security Affects Versions: 3.0.0 Reporter: Charles Lamb Assignee: Charles Lamb Attachments: HDFS-6843.001.patch, HDFS-6843.002.patch, HDFS-6843.003.patch, HDFS-6843.004.patch, HDFS-6843.005.patch, HDFS-6843.005.patch FileStatus should have a 'boolean isEncrypted()' method. (It came up in the context of discussing with Andrew about FileStatus being a Writable.) Having this method would allow the MR JobSubmitter to do the following: - BOOLEAN intermediateEncryption = false IF jobconf.contains(mr.intermidate.encryption) THEN intermediateEncryption = jobConf.getBoolean(mr.intermidate.encryption) ELSE IF (I/O)Format INSTANCEOF File(I/O)Format THEN intermediateEncryption = ANY File(I/O)Format HAS a Path with status isEncrypted()==TRUE FI jobConf.setBoolean(mr.intermidate.encryption, intermediateEncryption) FI -- This message was sent by Atlassian JIRA (v6.3.4#6332)
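A hedged Java rendering of the pseudocode in the description above: the config key is kept exactly as written there (it is not a real MR property), and the surrounding variables (jobConf, fs, jobPaths) are assumed context rather than actual JobSubmitter fields.
{code}
// Hedged rendering of the description's pseudocode, built on the new FileStatus API.
boolean intermediateEncryption;
if (jobConf.get("mr.intermidate.encryption") != null) {
  intermediateEncryption = jobConf.getBoolean("mr.intermidate.encryption", false);
} else {
  intermediateEncryption = false;
  for (Path p : jobPaths) {                    // paths used by the File(I/O)Formats
    if (fs.getFileStatus(p).isEncrypted()) {   // the new method this issue adds
      intermediateEncryption = true;
      break;
    }
  }
}
jobConf.setBoolean("mr.intermidate.encryption", intermediateEncryption);
{code}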
[jira] [Commented] (HDFS-6951) Saving namespace and restarting NameNode will remove existing encryption zones
[ https://issues.apache.org/jira/browse/HDFS-6951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14125230#comment-14125230 ] Andrew Wang commented on HDFS-6951: --- Blech, it looks like test-patch doesn't like binary diffs. It used to just ignore the binary part of the patch, which is the behavior when I use {{patch}}. Maybe now that we're on git, it's time to revisit HADOOP-10926. I'll take a look at this, but let's go back to non-binary diff to get test runs. Sorry for the bad instructions on my part. Saving namespace and restarting NameNode will remove existing encryption zones -- Key: HDFS-6951 URL: https://issues.apache.org/jira/browse/HDFS-6951 Project: Hadoop HDFS Issue Type: Sub-task Components: encryption Affects Versions: 3.0.0 Reporter: Stephen Chu Assignee: Charles Lamb Attachments: HDFS-6951-prelim.002.patch, HDFS-6951-testrepo.patch, HDFS-6951.001.patch, HDFS-6951.002.patch, HDFS-6951.003.patch, HDFS-6951.004.patch, HDFS-6951.005.patch, HDFS-6951.006.patch, editsStored Currently, when users save namespace and restart the NameNode, pre-existing encryption zones will be wiped out. I could reproduce this on a pseudo-distributed cluster: * Create an encryption zone * List encryption zones and verify the newly created zone is present * Save the namespace * Kill and restart the NameNode * List the encryption zones and you'll find the encryption zone is missing I've attached a test case for {{TestEncryptionZones}} that reproduces this as well. Removing the saveNamespace call will get the test to pass. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
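The reproduction steps listed in the HDFS-6951 description translate roughly into the following hedged sketch; it assumes a running MiniDFSCluster `cluster` with DistributedFileSystem `dfs` and an existing encryption key "testKey", and it is not the attached test case.
{code}
// Hedged reproduction sketch of the steps above (assumed test context, see lead-in).
Path zone = new Path("/zone");
dfs.mkdirs(zone);
dfs.createEncryptionZone(zone, "testKey");
dfs.setSafeMode(HdfsConstants.SafeModeAction.SAFEMODE_ENTER);
dfs.saveNamespace();
dfs.setSafeMode(HdfsConstants.SafeModeAction.SAFEMODE_LEAVE);
cluster.restartNameNode(true);
// Before the fix, no zones are listed here even though /zone was created above.
assertTrue(dfs.listEncryptionZones().hasNext());
{code}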
[jira] [Commented] (HDFS-6584) Support Archival Storage
[ https://issues.apache.org/jira/browse/HDFS-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14125237#comment-14125237 ] Hadoop QA commented on HDFS-6584: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12667122/h6584_20140908.patch against trunk revision 0974f43. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 23 new or modified test files. {color:red}-1 javac{color}. The applied patch generated 1268 javac compiler warnings (more than the trunk's current 1264 warnings). {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup org.apache.hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover org.apache.hadoop.hdfs.server.balancer.TestBalancerWithSaslDataTransfer org.apache.hadoop.hdfs.server.balancer.TestBalancer org.apache.hadoop.hdfs.server.balancer.TestBalancerWithEncryptedTransfer org.apache.hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFS org.apache.hadoop.hdfs.server.balancer.TestBalancerWithHANameNodes org.apache.hadoop.hdfs.TestDistributedFileSystem org.apache.hadoop.hdfs.server.mover.TestStorageMover org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer org.apache.hadoop.hdfs.server.datanode.TestBlockReplacement org.apache.hadoop.hdfs.TestDFSInotifyEventInputStream {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7945//testReport/ Javac warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/7945//artifact/trunk/patchprocess/diffJavacWarnings.txt Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7945//console This message is automatically generated. Support Archival Storage Key: HDFS-6584 URL: https://issues.apache.org/jira/browse/HDFS-6584 Project: Hadoop HDFS Issue Type: New Feature Components: balancer, namenode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Attachments: HDFS-6584.000.patch, HDFSArchivalStorageDesign20140623.pdf, HDFSArchivalStorageDesign20140715.pdf, h6584_20140907.patch, h6584_20140908.patch In most of the Hadoop clusters, as more and more data is stored for longer time, the demand for storage is outstripping the compute. Hadoop needs a cost effective and easy to manage solution to meet this demand for storage. Current solution is: - Delete the old unused data. This comes at operational cost of identifying unnecessary data and deleting them manually. - Add more nodes to the clusters. This adds along with storage capacity unnecessary compute capacity to the cluster. Hadoop needs a solution to decouple growing storage capacity from compute capacity. Nodes with higher density and less expensive storage with low compute power are becoming available and can be used as cold storage in the clusters. 
Based on policy the data from hot storage can be moved to cold storage. Adding more nodes to the cold storage can grow the storage independent of the compute capacity in the cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)