[jira] [Updated] (HDFS-4931) Extend the block placement policy interface to utilize the location information of previously stored files

2013-06-25 Thread Jihoon Son (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jihoon Son updated HDFS-4931:
-

Attachment: HDFS-4931.patch

 Extend the block placement policy interface to utilize the location 
 information of previously stored files  
 

 Key: HDFS-4931
 URL: https://issues.apache.org/jira/browse/HDFS-4931
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Jihoon Son
 Attachments: HDFS-4931.patch


 I'm currently implementing a locality-preserving block placement policy that 
 stores all files under a directory on the same datanode. That is to say, given 
 a root directory, files under it are grouped by the paths of their parent 
 directories; the files of each group are then stored on the same datanode. 
 When a new file is stored in HDFS, the block placement policy chooses the 
 target datanode considering the locations of previously stored files. 
 The current block placement policy interface has some problems. The first 
 problem is that there is no interface to restore the locations of previously 
 stored files when HDFS is restarted. To restore the location information of 
 all files, this process should be done while the namenode is in safe mode.
 To solve the first problem, I modified the block placement policy interface 
 and FSNamesystem. Before leaving safe mode, all necessary location 
 information is sent to the block placement policy. 
 However, my implementation changes too many access modifiers from private to 
 public. This may violate the design of the interface. 
 The second problem occurs when some blocks are moved by the balancer or by 
 node failures. In this case, the block placement policy should recognize the 
 current status and return a new datanode to move the blocks to. However, the 
 current interface does not support this. 
 The attached patch solves the first problem, but as mentioned above, it 
 may violate the design of the interface. 
 Do you have any good ideas?
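
The grouping described above can be sketched in plain Java (an illustration only: `chooseNodeFor` and the hash-based scheme are assumptions for this sketch, not the actual BlockPlacementPolicy API or the attached patch). The key idea is that the target node is a deterministic function of the parent directory, so sibling files land together:

```java
import java.nio.file.Paths;
import java.util.List;

// Illustrative sketch only -- not the real BlockPlacementPolicy interface.
// Files are grouped by parent directory, and each group is mapped to one
// datanode by hashing the parent path, so siblings share a node.
class DirectoryGroupingSketch {
    // Deterministically pick a datanode index for a file path based on
    // its parent directory.
    static int chooseNodeFor(String filePath, int numDatanodes) {
        String parent = Paths.get(filePath).getParent().toString();
        // Math.floorMod keeps the index non-negative for any hash value.
        return Math.floorMod(parent.hashCode(), numDatanodes);
    }

    public static void main(String[] args) {
        List<String> files =
            List.of("/data/logs/a.txt", "/data/logs/b.txt", "/data/img/c.png");
        for (String f : files) {
            System.out.println(f + " -> datanode " + chooseNodeFor(f, 5));
        }
    }
}
```

A real policy would also have to handle replica counts, full disks, and node failures, which is exactly where the interface gaps described above show up.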

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-4931) Extend the block placement policy interface to utilize the location information of previously stored files

2013-06-25 Thread Jihoon Son (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jihoon Son updated HDFS-4931:
-

Attachment: (was: HDFS-4931.patch)



[jira] [Commented] (HDFS-4931) Extend the block placement policy interface to utilize the location information of previously stored files

2013-06-25 Thread Fengdong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13692883#comment-13692883
 ] 

Fengdong Yu commented on HDFS-4931:
---

I don't think this is good. If data is placed on only a few datanodes, it's 
likely that more map tasks will run on the same node.



[jira] [Commented] (HDFS-1172) Blocks in newly completed files are considered under-replicated too quickly

2013-06-25 Thread Fengdong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13692884#comment-13692884
 ] 

Fengdong Yu commented on HDFS-1172:
---

bq. This litters the task logs with the NotReplicatedYetException

This does look like the client requesting a new block before the previous 
block's pipeline has finished.

 Blocks in newly completed files are considered under-replicated too quickly
 ---

 Key: HDFS-1172
 URL: https://issues.apache.org/jira/browse/HDFS-1172
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 0.21.0
Reporter: Todd Lipcon
 Fix For: 0.24.0

 Attachments: HDFS-1172.patch, hdfs-1172.txt, hdfs-1172.txt, 
 replicateBlocksFUC1.patch, replicateBlocksFUC1.patch, replicateBlocksFUC.patch


 I've seen this for a long time, and imagine it's a known issue, but couldn't 
 find an existing JIRA. It often happens that we see the NN schedule 
 replication on the last block of files very quickly after they're completed, 
 before the other DNs in the pipeline have a chance to report the new block. 
 This results in a lot of extra replication work on the cluster, as we 
 replicate the block and then end up with multiple excess replicas which are 
 very quickly deleted.



[jira] [Commented] (HDFS-4931) Extend the block placement policy interface to utilize the location information of previously stored files

2013-06-25 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13692897#comment-13692897
 ] 

Steve Loughran commented on HDFS-4931:
--

I can see the benefits of this in some applications - though MR jobs aren't 
necessarily it, as scattering the blocks gives you better bandwidth. By keeping 
them all on one node, the max bandwidth is the # of HDDs on that node, minus 
all other work going on on those disks. If scattered, the bandwidth is the # of 
blocks of the file, minus other work going on against the same blocks. To make 
things worse, any other code that is trying to access another file on the same 
machine is going to fight for exactly the same set of hard disks.

# The failure mode of the cluster will change. You should look at that 
carefully. 
# You aren't going to handle a full disk very well, as at that point your 
constraints don't get satisfied. 
# Rebalance and recovery time will increase, as now all the rebalanced blocks 
are being directed to a single server, limited by both the HDD and network 
bandwidth of that device, rather than the aggregate bandwidth of the cluster. 
Assuming all three copies of a file's blocks are stored on only 3 machines, you 
get hurt at both ends: as the time to recover increases, exposure to multiple 
HDD/node failures increases too. 

I think it may be an interesting experiment, but you need to start looking at 
the impact of failures, and at the performance problems. Overall, though, I'm 
not convinced it scales well, either to large files or large clusters - the 
latter offering the IO and network bandwidth this policy would fail to exploit, 
and the highest failure rates. Normally that failure rate is background noise, 
but with this placement policy it may be more visible.

What may be more useful is revisiting Facebook's work on a sub-cluster 
placement policy, where all blocks of a file are stored in the same set of 
racks in a larger cluster. You get more chance of rack locality for multiple 
blocks, and when a rack fails, while some files suffer more, a lot of files 
suffer less - and recovery bandwidth is restricted to a fraction of the 
network, which, on a multi-layered network, may protect the backbone.

Because it's experimental and has scale issues, I don't see a rush to commit 
patches to support it unless it's backed up by the theory and the data 
justifying this tactic.
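
Steve's bandwidth argument can be made concrete with a toy model (the numbers below are illustrative assumptions, not measurements): concentrated placement caps read bandwidth at one node's disks, while scattered placement lets each block stream from a different spindle.

```java
// Toy model of the bandwidth argument above (illustrative numbers only).
// Concentrated placement: parallel reads are capped by one node's disks.
// Scattered placement: one stream per block, up to the cluster's disks.
class ReadBandwidthModel {
    static final double MB_PER_DISK = 100.0; // assumed per-disk streaming rate

    // All blocks on one node: at most `disksPerNode` streams in parallel.
    static double concentratedMBps(int disksPerNode) {
        return disksPerNode * MB_PER_DISK;
    }

    // Blocks scattered across the cluster: parallelism is bounded by the
    // number of blocks or by the total number of disks, whichever is less.
    static double scatteredMBps(int blocks, int nodes, int disksPerNode) {
        return Math.min(blocks, nodes * disksPerNode) * MB_PER_DISK;
    }

    public static void main(String[] args) {
        // A 128-block file on a 40-node cluster with 4 disks per node:
        System.out.println("concentrated: " + concentratedMBps(4) + " MB/s");
        System.out.println("scattered:    " + scatteredMBps(128, 40, 4) + " MB/s");
    }
}
```

Under these assumptions the scattered layout reads roughly 32x faster, which is why the policy trades read bandwidth for locality.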




[jira] [Created] (HDFS-4932) Avoid a long line on the name node webUI if we have more Journal nodes

2013-06-25 Thread Fengdong Yu (JIRA)
Fengdong Yu created HDFS-4932:
-

 Summary: Avoid a long line on the name node webUI if we have more 
Journal nodes
 Key: HDFS-4932
 URL: https://issues.apache.org/jira/browse/HDFS-4932
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: ha, namenode
Reporter: Fengdong Yu
Assignee: Fengdong Yu
Priority: Minor
 Fix For: 2.1.0-beta


If we have many journal nodes, the namenode web UI shows one long line; this 
patch wraps the line to show three journal nodes per line.
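
The wrapping can be sketched as a small helper (a hypothetical `wrap` method for illustration, not code taken from the attached patch): emit a line break after every third address instead of joining them all on one line.

```java
import java.util.List;

// Sketch of the line-wrapping described above (assumed helper, not the
// patch itself): render journal node addresses three per line.
class JournalNodeLineWrap {
    static String wrap(List<String> nodes, int perLine) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < nodes.size(); i++) {
            sb.append(nodes.get(i));
            // Break the line after every `perLine` entries (and after the
            // last one); separate entries on the same line with ", ".
            if ((i + 1) % perLine == 0 || i == nodes.size() - 1) {
                sb.append('\n');
            } else {
                sb.append(", ");
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> jns = List.of("jn1:8485", "jn2:8485", "jn3:8485",
                                   "jn4:8485", "jn5:8485");
        System.out.print(wrap(jns, 3));
        // Output:
        // jn1:8485, jn2:8485, jn3:8485
        // jn4:8485, jn5:8485
    }
}
```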



[jira] [Updated] (HDFS-4932) Avoid a long line on the name node webUI if we have more Journal nodes

2013-06-25 Thread Fengdong Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fengdong Yu updated HDFS-4932:
--

Attachment: HDFS-4932.patch



[jira] [Updated] (HDFS-4932) Avoid a long line on the name node webUI if we have more Journal nodes

2013-06-25 Thread Fengdong Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fengdong Yu updated HDFS-4932:
--

Attachment: (was: HDFS-4932.patch)



[jira] [Updated] (HDFS-4932) Avoid a long line on the name node webUI if we have more Journal nodes

2013-06-25 Thread Fengdong Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fengdong Yu updated HDFS-4932:
--

Description: If we have more Journal nodes, It shows a long line on the 
name node webUI, this patch wrapped line. just show four journal nodes on each 
line.  (was: If we have more Journal nodes, It shows a long line on the name 
node webUI, this patch wrapped line. just show fourjournal nodes on each line.)



[jira] [Commented] (HDFS-4927) CreateEditsLog creates inodes with an invalid inode ID, which then cannot be loaded by a namenode.

2013-06-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13692916#comment-13692916
 ] 

Hudson commented on HDFS-4927:
--

Integrated in Hadoop-Yarn-trunk #251 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/251/])
HDFS-4927. CreateEditsLog creates inodes with an invalid inode ID, which 
then cannot be loaded by a namenode. Contributed by Chris Nauroth. (Revision 
1496350)

 Result = FAILURE
cnauroth : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1496350
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/CreateEditsLog.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestCreateEditsLog.java


 CreateEditsLog creates inodes with an invalid inode ID, which then cannot be 
 loaded by a namenode.
 --

 Key: HDFS-4927
 URL: https://issues.apache.org/jira/browse/HDFS-4927
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Affects Versions: 3.0.0, 2.1.0-beta
Reporter: Chris Nauroth
Assignee: Chris Nauroth
Priority: Minor
 Attachments: HDFS-4927.1.patch


 {{CreateEditsLog#addFiles}} always creates inodes with ID hard-coded to 
 {{INodeId#GRANDFATHER_INODE_ID}}.  At initialization time, namenode will not 
 load the resulting edits, because this is an invalid inode ID.



[jira] [Updated] (HDFS-4932) Avoid a long line on the name node webUI if we have more Journal nodes

2013-06-25 Thread Fengdong Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fengdong Yu updated HDFS-4932:
--

Description: If we have more Journal nodes, It shows a long line on the 
name node webUI, this patch wrapped line. just show fourjournal nodes on each 
line.  (was: If we have more Journal nodes, It shows a long line on the name 
node webUI, this patch wrapped line. just show three journal nodes on each 
line.)



[jira] [Updated] (HDFS-4932) Avoid a long line on the name node webUI if we have more Journal nodes

2013-06-25 Thread Fengdong Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fengdong Yu updated HDFS-4932:
--

Description: If we have more Journal nodes, It shows a long line on the 
name node webUI, this patch wrapped line. just show three journal nodes on each 
line.  (was: If we have more Journal nodes, It shows a long line on the name 
node webUI, this patch wrapped line. just show four journal nodes on each line.)



[jira] [Updated] (HDFS-4932) Avoid a long line on the name node webUI if we have more Journal nodes

2013-06-25 Thread Fengdong Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fengdong Yu updated HDFS-4932:
--

Attachment: HDFS-4932.patch



[jira] [Updated] (HDFS-4932) Avoid a long line on the name node webUI if we have more Journal nodes

2013-06-25 Thread Fengdong Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fengdong Yu updated HDFS-4932:
--

Status: Patch Available  (was: Open)



[jira] [Updated] (HDFS-4932) Avoid a long line on the name node webUI if we have more Journal nodes

2013-06-25 Thread Fengdong Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fengdong Yu updated HDFS-4932:
--

Description: If we have more Journal nodes, It shows a long line on the 
name node webUI, this patch wrapped line. just show three journal nodes on each 
line. I don't change CSS because I don't want to affect other related web 
styles.  (was: If we have more Journal nodes, It shows a long line on the name 
node webUI, this patch wrapped line. just show three journal nodes on each 
line.)



[jira] [Commented] (HDFS-4932) Avoid a long line on the name node webUI if we have more Journal nodes

2013-06-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13692980#comment-13692980
 ] 

Hadoop QA commented on HDFS-4932:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12589585/HDFS-4932.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/4563//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/4563//console

This message is automatically generated.



[jira] [Created] (HDFS-4934) add symlink support to WebHDFS server side

2013-06-25 Thread Alejandro Abdelnur (JIRA)
Alejandro Abdelnur created HDFS-4934:


 Summary: add symlink support to WebHDFS server side
 Key: HDFS-4934
 URL: https://issues.apache.org/jira/browse/HDFS-4934
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: webhdfs
Affects Versions: 2.2.0
 Environment: followup on HADOOP-8040
Reporter: Alejandro Abdelnur


follow up on HADOOP-8040



[jira] [Created] (HDFS-4935) add symlink support to HttpFS server side

2013-06-25 Thread Alejandro Abdelnur (JIRA)
Alejandro Abdelnur created HDFS-4935:


 Summary: add symlink support to HttpFS server side
 Key: HDFS-4935
 URL: https://issues.apache.org/jira/browse/HDFS-4935
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.2.0
 Environment: followup on HADOOP-8040
Reporter: Alejandro Abdelnur


follow up on HADOOP-8040



[jira] [Commented] (HDFS-4927) CreateEditsLog creates inodes with an invalid inode ID, which then cannot be loaded by a namenode.

2013-06-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13693006#comment-13693006
 ] 

Hudson commented on HDFS-4927:
--

Integrated in Hadoop-Hdfs-trunk #1441 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1441/])
HDFS-4927. CreateEditsLog creates inodes with an invalid inode ID, which 
then cannot be loaded by a namenode. Contributed by Chris Nauroth. (Revision 
1496350)

 Result = FAILURE
cnauroth : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1496350
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/CreateEditsLog.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestCreateEditsLog.java




[jira] [Created] (HDFS-4936) Handle overflow condition for txid going over Long.MAX_VALUE

2013-06-25 Thread Harsh J (JIRA)
Harsh J created HDFS-4936:
-

 Summary: Handle overflow condition for txid going over 
Long.MAX_VALUE
 Key: HDFS-4936
 URL: https://issues.apache.org/jira/browse/HDFS-4936
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.0.0-alpha
Reporter: Harsh J
Priority: Minor


Hat tip to [~fengdon...@gmail.com] for the question (on the mailing lists) 
that led to this.

I hacked up my local NN's txids manually to go very large (close to the max) 
and decided to see whether this causes any harm. I basically bumped the freshly 
formatted files' starting txid up to 9223372036854775805 (and ensured the image 
references the same by hex-editing it):

{code}
➜  current  ls
VERSION
fsimage_9223372036854775805.md5
fsimage_9223372036854775805
seen_txid
➜  current  cat seen_txid
9223372036854775805
{code}

NameNode started up as expected.

{code}
13/06/25 18:30:08 INFO namenode.FSImage: Image file of size 129 loaded in 0 
seconds.
13/06/25 18:30:08 INFO namenode.FSImage: Loaded image for txid 
9223372036854775805 from 
/temp-space/tmp-default/dfs-cdh4/name/current/fsimage_9223372036854775805
13/06/25 18:30:08 INFO namenode.FSEditLog: Starting log segment at 
9223372036854775806
{code}

I could create a bunch of files and do regular ops (with the txid counter 
incrementing well past the long max). I created over 10 files, just to make it 
go well over Long.MAX_VALUE.

Quitting NameNode and restarting fails though, with the following error:

{code}
13/06/25 18:31:08 INFO namenode.FileJournalManager: Recovering unfinalized 
segments in 
/Users/harshchouraria/Work/installs/temp-space/tmp-default/dfs-cdh4/name/current
13/06/25 18:31:08 INFO namenode.FileJournalManager: Finalizing edits file 
/Users/harshchouraria/Work/installs/temp-space/tmp-default/dfs-cdh4/name/current/edits_inprogress_9223372036854775806
 -> 
/Users/harshchouraria/Work/installs/temp-space/tmp-default/dfs-cdh4/name/current/edits_9223372036854775806-9223372036854775807
13/06/25 18:31:08 FATAL namenode.NameNode: Exception in namenode join
java.io.IOException: Gap in transactions. Expected to be able to read up until 
at least txid 9223372036854775806 but unable to find any edit logs containing 
txid -9223372036854775808
at 
org.apache.hadoop.hdfs.server.namenode.FSEditLog.checkForGaps(FSEditLog.java:1194)
at 
org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1152)
at 
org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:616)
at 
org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:267)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:592)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:435)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:397)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:399)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:433)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.init(NameNode.java:609)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.init(NameNode.java:590)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1141)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1205)
{code}

Looks like we also lose some edits when we restart, as noted by the finalized 
edits filename:

{code}
VERSION
edits_9223372036854775806-9223372036854775807
fsimage_9223372036854775805
fsimage_9223372036854775805.md5
seen_txid
{code}

It seems like we won't be able to handle the case where the txid overflows. 
It's a very, very large number, so that's not an immediate concern, but it 
seemed worthy of a report.
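The negative txid in the restart error is plain long wraparound; a tiny demo (illustrative, not HDFS code):

```java
public class TxidOverflowDemo {
    // Why the restart error mentions txid -9223372036854775808: once the
    // txid counter passes Long.MAX_VALUE, the next increment wraps around
    // to Long.MIN_VALUE.
    static long nextTxid(long txid) {
        return txid + 1; // no overflow check, just a plain counter
    }

    public static void main(String[] args) {
        System.out.println(nextTxid(Long.MAX_VALUE)); // -9223372036854775808
    }
}
```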

--


[jira] [Commented] (HDFS-4927) CreateEditsLog creates inodes with an invalid inode ID, which then cannot be loaded by a namenode.

2013-06-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13693036#comment-13693036
 ] 

Hudson commented on HDFS-4927:
--

Integrated in Hadoop-Mapreduce-trunk #1468 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1468/])
HDFS-4927. CreateEditsLog creates inodes with an invalid inode ID, which 
then cannot be loaded by a namenode. Contributed by Chris Nauroth. (Revision 
1496350)

 Result = FAILURE
cnauroth : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1496350
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/CreateEditsLog.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestCreateEditsLog.java


 CreateEditsLog creates inodes with an invalid inode ID, which then cannot be 
 loaded by a namenode.
 --

 Key: HDFS-4927
 URL: https://issues.apache.org/jira/browse/HDFS-4927
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Affects Versions: 3.0.0, 2.1.0-beta
Reporter: Chris Nauroth
Assignee: Chris Nauroth
Priority: Minor
 Attachments: HDFS-4927.1.patch


 {{CreateEditsLog#addFiles}} always creates inodes with ID hard-coded to 
 {{INodeId#GRANDFATHER_INODE_ID}}.  At initialization time, namenode will not 
 load the resulting edits, because this is an invalid inode ID.

--


[jira] [Commented] (HDFS-4931) Extend the block placement policy interface to utilize the location information of previously stored files

2013-06-25 Thread Jihoon Son (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13693047#comment-13693047
 ] 

Jihoon Son commented on HDFS-4931:
--

Thanks for your comments.

I'll think more about this idea.

 Extend the block placement policy interface to utilize the location 
 information of previously stored files  
 

 Key: HDFS-4931
 URL: https://issues.apache.org/jira/browse/HDFS-4931
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Jihoon Son
 Attachments: HDFS-4931.patch


 Recently, I have been implementing a locality-preserving block placement policy which 
 stores files in a directory in the same datanode. That is to say, given a 
 root directory, files under the root directory are grouped by paths of their 
 parent directories. After that, files of a group are stored in the same 
 datanode. 
 When a new file is stored in HDFS, the block placement policy chooses the 
 target datanode considering the locations of previously stored files. 
 In the current block placement policy interface, there are some problems. The 
 first problem is that there is no interface to keep the previously stored 
 files when HDFS is restarted. To restore the location information of all 
 files, this process should be done during the safe mode of the namenode.
 To solve the first problem, I modified the block placement policy interface 
 and FSNamesystem. Before leaving the safe mode, every necessary location 
 information is sent to the block placement policy. 
 However, there are too many changes of access modifiers from private to 
 public in my implementation. This may violate the design of the interface. 
 The second problem occurs when some blocks are moved by the balancer or by 
 node failures. In this case, the block placement policy should recognize the 
 current status and return a new datanode to move blocks to. However, the 
 current interface does not support it. 
 The attached patch is to solve the first problem, but as mentioned above, it 
 may violate the design of the interface. 
 Do you have any good ideas?

--


[jira] [Commented] (HDFS-4936) Handle overflow condition for txid going over Long.MAX_VALUE

2013-06-25 Thread Harsh J (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13693112#comment-13693112
 ] 

Harsh J commented on HDFS-4936:
---

Expected this response. Resolving.

(From [~tlipcon] over hdfs-dev@)
{code}
I did some back of the envelope math when implementing txids, and
determined that overflow is not ever going to happen... A busy namenode
does 1000 write transactions/second (2^10). MAX_LONG is 2^63. So, we can
run for 2^53 seconds. A year is about 2^25 seconds. So, at 1k tps, you can
run your namenode for 2^(63-10-25) = 268 million years.

Hadoop is great software and I'm sure it will be around for years to come,
but if it's still running in 268 million years, that will be a pretty
depressing rate of technological progress!

-Todd
{code}
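Todd's back-of-the-envelope figure checks out; a quick sketch of the arithmetic (the 1k tps rate is his assumption, not a measurement):

```java
public class TxidHeadroom {
    // At an assumed 1000 write transactions/second (~2^10), how many years
    // until a signed 64-bit txid reaches Long.MAX_VALUE?
    static double yearsUntilOverflow(long tps) {
        double secondsPerYear = 365.0 * 24 * 60 * 60; // ~2^25 seconds
        return (double) Long.MAX_VALUE / tps / secondsPerYear;
    }

    public static void main(String[] args) {
        // Roughly 2^(63-10-25) = 268 million years at 1k tps.
        System.out.printf("%.0f years%n", yearsUntilOverflow(1000));
    }
}
```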

 Handle overflow condition for txid going over Long.MAX_VALUE
 

 Key: HDFS-4936
 URL: https://issues.apache.org/jira/browse/HDFS-4936
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.0.0-alpha
Reporter: Harsh J
Priority: Minor


--

[jira] [Resolved] (HDFS-4936) Handle overflow condition for txid going over Long.MAX_VALUE

2013-06-25 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HDFS-4936.
---

Resolution: Not A Problem

 Handle overflow condition for txid going over Long.MAX_VALUE
 

 Key: HDFS-4936
 URL: https://issues.apache.org/jira/browse/HDFS-4936
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.0.0-alpha
Reporter: Harsh J
Priority: Minor


--


[jira] [Commented] (HDFS-4888) Refactor and fix FSNamesystem.getTurnOffTip to sanity

2013-06-25 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13693124#comment-13693124
 ] 

Kihwal Lee commented on HDFS-4888:
--

Ravi, here are my review comments:

* {{//Automatic safemode}}: the resource-low case is also automatic. Perhaps 
you meant start-up safe mode?
* Removal of the {{datanodeThreshold > 0}} condition: by setting this threshold 
to 0, the check is disabled. So we still want this check in the original place, 
and also added around the message "needs additional x live datanodes".
* When printing out the extension period, {{reached + extension - now()}} can 
become negative if repl queue init takes more than the extension (30 seconds by 
default). Use of Linux THP makes this a lot faster, but it can still exceed 30 
seconds for large name spaces. If that happens, the time-to-exit increases 
every time the message is printed. Rather than Math.abs(), it should say 
something like "soon".
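The third point can be sketched as follows (method and field names here are hypothetical, not the patch's):

```java
public class SafeModeTip {
    // Sketch of the review suggestion: if replication-queue initialization
    // outlasts the extension, reached + extension - now() goes negative;
    // rather than printing Math.abs() of it, say something like "soon".
    static String timeToLeaveTip(long reachedMs, long extensionMs, long nowMs) {
        long remaining = reachedMs + extensionMs - nowMs;
        return remaining > 0
                ? "Safe mode will be turned off in " + remaining + " ms."
                : "Safe mode will be turned off soon.";
    }

    public static void main(String[] args) {
        // Repl queue init took 45s against a 30s extension: negative remainder.
        System.out.println(timeToLeaveTip(0, 30000, 45000));
    }
}
```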

 Refactor and fix FSNamesystem.getTurnOffTip to sanity
 -

 Key: HDFS-4888
 URL: https://issues.apache.org/jira/browse/HDFS-4888
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0, 2.0.4-alpha, 0.23.9
Reporter: Ravi Prakash
Assignee: Ravi Prakash
 Attachments: HDFS-4888.patch, HDFS-4888.patch


 e.g. When resources are low, the command to leave safe mode is not printed.
 This method is unnecessarily complex.

--


[jira] [Updated] (HDFS-4762) Provide HDFS based NFSv3 and Mountd implementation

2013-06-25 Thread Brandon Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Li updated HDFS-4762:
-

Attachment: HDFS-4762.patch.3

Loaded the patch again to run findbugs.

 Provide HDFS based NFSv3 and Mountd implementation
 --

 Key: HDFS-4762
 URL: https://issues.apache.org/jira/browse/HDFS-4762
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: 3.0.0
Reporter: Brandon Li
Assignee: Brandon Li
 Attachments: HDFS-4762.patch, HDFS-4762.patch.2, HDFS-4762.patch.3, 
 HDFS-4762.patch.3


 This is to track the implementation of NFSv3 to HDFS.

--


[jira] [Commented] (HDFS-4762) Provide HDFS based NFSv3 and Mountd implementation

2013-06-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13693308#comment-13693308
 ] 

Hadoop QA commented on HDFS-4762:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12589647/HDFS-4762.patch.3
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 7 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to cause Findbugs 
(version 1.3.9) to fail.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-common-project/hadoop-nfs hadoop-hdfs-project/hadoop-hdfs-nfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/4564//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/4564//console

This message is automatically generated.

 Provide HDFS based NFSv3 and Mountd implementation
 --

 Key: HDFS-4762
 URL: https://issues.apache.org/jira/browse/HDFS-4762
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: 3.0.0
Reporter: Brandon Li
Assignee: Brandon Li
 Attachments: HDFS-4762.patch, HDFS-4762.patch.2, HDFS-4762.patch.3, 
 HDFS-4762.patch.3


 This is to track the implementation of NFSv3 to HDFS.

--


[jira] [Commented] (HDFS-4762) Provide HDFS based NFSv3 and Mountd implementation

2013-06-25 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13693315#comment-13693315
 ] 

Tsz Wo (Nicholas), SZE commented on HDFS-4762:
--

 load the patch again to run findbugs.

Brandon, there are errors when running findbugs on hadoop-hdfs-nfs; see 
https://builds.apache.org/job/PreCommit-HDFS-Build/4564/artifact/trunk/patchprocess/patchFindBugsOutputhadoop-hdfs-nfs.txt


[ERROR] Could not find resource 
'/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/trunk/hadoop-hdfs-project/hadoop-hdfs-nfs/dev-support/findbugsExcludeFile.xml'.
 - [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please 
read the following articles:
[ERROR] [Help 1] 
http://cwiki.apache.org/confluence/display/MAVEN/ResourceNotFoundException

 Provide HDFS based NFSv3 and Mountd implementation
 --

 Key: HDFS-4762
 URL: https://issues.apache.org/jira/browse/HDFS-4762
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: 3.0.0
Reporter: Brandon Li
Assignee: Brandon Li
 Attachments: HDFS-4762.patch, HDFS-4762.patch.2, HDFS-4762.patch.3, 
 HDFS-4762.patch.3


 This is to track the implementation of NFSv3 to HDFS.

--


[jira] [Commented] (HDFS-4762) Provide HDFS based NFSv3 and Mountd implementation

2013-06-25 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13693362#comment-13693362
 ] 

Tsz Wo (Nicholas), SZE commented on HDFS-4762:
--

Some more comments:
- In READDIR3Response and READDIRPLUS3Response, use System.arraycopy(..) for 
copying entries.
{code}
+  for (int i = 0; i < entries.length; i++) {
+this.entries[i] = entries[i];
+  }
{code}
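The suggested replacement might look like this (the element type is illustrative; the real entries are NFS response entries):

```java
import java.util.Arrays;

public class EntryCopySketch {
    // Review suggestion: copy the entries array with System.arraycopy
    // instead of an element-by-element loop.
    static int[] copyEntries(int[] entries) {
        int[] copy = new int[entries.length];
        System.arraycopy(entries, 0, copy, 0, entries.length);
        return copy;
    }

    public static void main(String[] args) {
        int[] entries = {1, 2, 3};
        System.out.println(Arrays.equals(entries, copyEntries(entries))); // true
    }
}
```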

- Use the following for Nfs3Utils.bytesToLong(..) and do not use ByteBuffer.
{code}
  public static long bytesToLong(byte[] data) {
long n = 0xffL & data[0];
for(int i = 1; i < 8; i++) {
  n = (n << 8) | (0xffL & data[i]);
}
return n;
  }
{code}
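For what it's worth, the loop-based version decodes big-endian exactly like the ByteBuffer approach it replaces:

```java
import java.nio.ByteBuffer;

public class BytesToLongCheck {
    // The suggested loop-based decoder: big-endian, masking each byte
    // with 0xffL to avoid sign extension.
    public static long bytesToLong(byte[] data) {
        long n = 0xffL & data[0];
        for (int i = 1; i < 8; i++) {
            n = (n << 8) | (0xffL & data[i]);
        }
        return n;
    }

    public static void main(String[] args) {
        long value = -123456789L; // a negative value exercises the masking
        byte[] data = ByteBuffer.allocate(8).putLong(value).array();
        System.out.println(bytesToLong(data) == value); // true
    }
}
```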

- In OffsetRange.hasOverlap(..), it seems that the case min == rangeMax and the 
case max == rangeMin are not handled correctly.  Indeed, hasOverlap(..) can be 
considered as !(noOverlap), and noOverlap must be min > rangeMax || max < 
rangeMin, i.e.
{code}
  boolean noOverlap(OffsetRange range) {
return min > range.max || max < range.min;
  }

  boolean hasOverlap(OffsetRange range) {
return !noOverlap(range);
  }
{code}
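Under that definition, touching at an endpoint counts as overlap; a minimal runnable sketch (the field handling is simplified relative to the patch):

```java
public class OffsetRangeSketch {
    final long min, max;

    OffsetRangeSketch(long min, long max) {
        this.min = min;
        this.max = max;
    }

    // noOverlap: the ranges are disjoint only when one lies strictly
    // beyond the other, so touching endpoints count as overlapping.
    boolean noOverlap(OffsetRangeSketch range) {
        return min > range.max || max < range.min;
    }

    boolean hasOverlap(OffsetRangeSketch range) {
        return !noOverlap(range);
    }

    public static void main(String[] args) {
        OffsetRangeSketch a = new OffsetRangeSketch(0, 10);
        System.out.println(a.hasOverlap(new OffsetRangeSketch(10, 20))); // true
        System.out.println(a.hasOverlap(new OffsetRangeSketch(11, 20))); // false
    }
}
```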

- OffsetRange may not implement Comparable correctly.  The return value of 
compareTo has to follow the rules in the [compareTo 
javadoc|http://docs.oracle.com/javase/6/docs/api/java/lang/Comparable.html#compareTo%28T%29].
  We cannot define it as 0: identical, -1: on the left, 1: on the right, 2: 
overlapped.  In particular, if x.compareTo(y) returns 2 (i.e. when x and y 
overlap), then y.compareTo(x) also returns 2.  Such x and y do not follow the 
compareTo rules.
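One lawful alternative (a sketch, not from the patch) orders ranges by min, then max, and leaves overlap to a separate predicate:

```java
public class ComparableRange implements Comparable<ComparableRange> {
    final long min, max;

    ComparableRange(long min, long max) {
        this.min = min;
        this.max = max;
    }

    // Total order by (min, max): antisymmetric and transitive, so it
    // satisfies the Comparable contract, unlike the 0/-1/1/2 scheme.
    @Override
    public int compareTo(ComparableRange o) {
        return min != o.min ? Long.compare(min, o.min)
                            : Long.compare(max, o.max);
    }

    public static void main(String[] args) {
        ComparableRange x = new ComparableRange(0, 10);
        ComparableRange y = new ComparableRange(0, 5);
        // sgn(x.compareTo(y)) == -sgn(y.compareTo(x)), as required.
        System.out.println(x.compareTo(y) > 0 && y.compareTo(x) < 0); // true
    }
}
```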


 Provide HDFS based NFSv3 and Mountd implementation
 --

 Key: HDFS-4762
 URL: https://issues.apache.org/jira/browse/HDFS-4762
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: 3.0.0
Reporter: Brandon Li
Assignee: Brandon Li
 Attachments: HDFS-4762.patch, HDFS-4762.patch.2, HDFS-4762.patch.3, 
 HDFS-4762.patch.3


 This is to track the implementation of NFSv3 to HDFS.

--


[jira] [Commented] (HDFS-4932) Avoid a long line on the name node webUI if we have more Journal nodes

2013-06-25 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13693393#comment-13693393
 ] 

Chris Nauroth commented on HDFS-4932:
-

Hi, Fengdong.  The patch looks good.  A couple of minor comments:

# Can you please add a comment stating that this code inserts a line break 
every 3 entries to prevent very wide lines?
# Minor style issue for each of the following lines: can you please add a space 
before opening braces?  Also, can you please add a space after {{for}}?
{code}
if (null != manager){
{code}
{code}
  for(int i = 0; i < managers.length; ++i){
{code}
{code}
if (i < managers.length - 1){
{code}
{code}
if ((i+1) % 3 == 0){
{code}
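Put together with the requested spacing, the wrapping might look like this (variable names are assumed from the quoted fragments):

```java
public class JournalNodeList {
    // Sketch of the patch's idea with the reviewer's style fixes applied:
    // a space before opening braces, a space after "for", and a line
    // break every 3 entries to keep the web UI row narrow.
    static String render(String[] managers) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < managers.length; ++i) {
            out.append(managers[i]);
            if (i < managers.length - 1) {
                out.append(", ");
                // Insert a line break every 3 entries to prevent wide lines.
                if ((i + 1) % 3 == 0) {
                    out.append("\n");
                }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(render(new String[]{"jn1", "jn2", "jn3", "jn4"}));
    }
}
```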

{quote}
-1 tests included. The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.
{quote}

I think this is OK if manual testing was done.  You'd have to do a lot of 
awkward deep mocking to be able to call this method from a test, so it's 
probably not worth it.

 Avoid a long line on the name node webUI if we have more Journal nodes
 --

 Key: HDFS-4932
 URL: https://issues.apache.org/jira/browse/HDFS-4932
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: ha, namenode
Reporter: Fengdong Yu
Assignee: Fengdong Yu
Priority: Minor
 Fix For: 2.1.0-beta

 Attachments: HDFS-4932.patch


 If we have more Journal nodes, it shows a long line on the name node webUI; 
 this patch wraps the line, showing just three journal nodes on each line. I 
 don't change the CSS because I don't want to affect other related web styles.

--


[jira] [Created] (HDFS-4937) ReplicationMonitor can infinite-loop in BlockPlacementPolicyDefault#chooseRandom()

2013-06-25 Thread Kihwal Lee (JIRA)
Kihwal Lee created HDFS-4937:


 Summary: ReplicationMonitor can infinite-loop in 
BlockPlacementPolicyDefault#chooseRandom()
 Key: HDFS-4937
 URL: https://issues.apache.org/jira/browse/HDFS-4937
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 0.23.8, 2.0.4-alpha
Reporter: Kihwal Lee


When a large number of nodes are removed by refreshing node lists, the network 
topology is updated. If the refresh happens at the right moment, the 
replication monitor thread may get stuck in the while loop of {{chooseRandom()}}. 
This is because the old cluster size is used in the terminal condition check of 
the loop. This usually happens when a block with a high replication factor is 
being processed. Since replicas/rack is calculated beforehand, no node choice 
may satisfy the goodness criteria if refreshing removed racks. 

All nodes will end up in the excluded list, but the size will still be less 
than the previously recorded cluster size, so it will loop infinitely. This has 
been seen in a production environment.
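The failure mode can be illustrated with a toy loop (not the HDFS code; the names here are made up):

```java
public class StaleSizeLoopDemo {
    // Toy model of the bug described above: the loop's terminal condition
    // compares the excluded-node count against a cluster size cached
    // before a refresh shrank the cluster, so it can never be reached.
    // Returns the number of iterations, or -1 where the real code spins.
    static int choose(int cachedClusterSize, int actualClusterSize) {
        int excluded = 0;
        int iterations = 0;
        while (excluded < cachedClusterSize) {
            iterations++;
            if (excluded >= actualClusterSize) {
                return -1; // every live node excluded: would loop forever
            }
            excluded++; // each failed choice adds a node to the excluded list
        }
        return iterations;
    }

    public static void main(String[] args) {
        System.out.println(choose(6, 6));  // terminates: 6
        System.out.println(choose(10, 6)); // stuck: a refresh removed 4 nodes
    }
}
```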

--


[jira] [Updated] (HDFS-4937) ReplicationMonitor can infinite-loop in BlockPlacementPolicyDefault#chooseRandom()

2013-06-25 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-4937:
-

Description: 
When a large number of nodes are removed by refreshing node lists, the network 
topology is updated. If the refresh happens at the right moment, the 
replication monitor thread may get stuck in the while loop of {{chooseRandom()}}. 
This is because the cached cluster size is used in the terminal condition check 
of the loop. This usually happens when a block with a high replication factor 
is being processed. Since replicas/rack is also calculated beforehand, no node 
choice may satisfy the goodness criteria if refreshing removed racks. 

All nodes will end up in the excluded list, but the size will still be less 
than the cached cluster size, so it will loop infinitely. This has been seen in 
a production environment.

  was:
When a large number of nodes are removed by refreshing node lists, the network 
topology is updated. If the refresh happens at the right moment, the 
replication monitor thread may get stuck in the while loop of {{chooseRandom()}}. 
This is because the old cluster size is used in the terminal condition check of 
the loop. This usually happens when a block with a high replication factor is 
being processed. Since replicas/rack is calculated beforehand, no node choice 
may satisfy the goodness criteria if refreshing removed racks. 

All nodes will end up in the excluded list, but the size will still be less 
than the previously recorded cluster size, so it will loop infinitely. This has 
been seen in a production environment.


 ReplicationMonitor can infinite-loop in 
 BlockPlacementPolicyDefault#chooseRandom()
 --

 Key: HDFS-4937
 URL: https://issues.apache.org/jira/browse/HDFS-4937
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.0.4-alpha, 0.23.8
Reporter: Kihwal Lee

 When a large number of nodes are removed by refreshing node lists, the 
 network topology is updated. If the refresh happens at the right moment, the 
 replication monitor thread may get stuck in the while loop of {{chooseRandom()}}. 
 This is because the cached cluster size is used in the terminal condition 
 check of the loop. This usually happens when a block with a high replication 
 factor is being processed. Since replicas/rack is also calculated beforehand, 
 no node choice may satisfy the goodness criteria if refreshing removed racks. 
 All nodes will end up in the excluded list, but the size will still be less 
 than the cached cluster size, so it will loop infinitely. This has been seen 
 in a production environment.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-4937) ReplicationMonitor can infinite-loop in BlockPlacementPolicyDefault#chooseRandom()

2013-06-25 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-4937:
-

Description: 
When a large number of nodes are removed by refreshing node lists, the network 
topology is updated. If the refresh happens at the right moment, the 
replication monitor thread may get stuck in the while loop of {{chooseRandom()}}. 
This is because the cached cluster size is used in the terminal condition check 
of the loop. This usually happens when a block with a high replication factor 
is being processed. Since replicas/rack is also calculated beforehand, no node 
choice may satisfy the goodness criteria if the refresh removed racks. 

All nodes will end up in the excluded list, but the size will still be less 
than the cached cluster size, so it will loop infinitely. This was observed in 
a production environment.

  was:
When a large number of nodes are removed by refreshing node lists, the network 
topology is updated. If the refresh happens at the right moment, the 
replication monitor thread may get stuck in the while loop of {{chooseRandom()}}. 
This is because the cached cluster size is used in the terminal condition check 
of the loop. This usually happens when a block with a high replication factor 
is being processed. Since replicas/rack is also calculated beforehand, no node 
choice may satisfy the goodness criteria if the refresh removed racks. 

All nodes will end up in the excluded list, but the size will still be less 
than the cached cluster size, so it will loop infinitely. This has been seen in 
a production environment.


 ReplicationMonitor can infinite-loop in 
 BlockPlacementPolicyDefault#chooseRandom()
 --

 Key: HDFS-4937
 URL: https://issues.apache.org/jira/browse/HDFS-4937
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.0.4-alpha, 0.23.8
Reporter: Kihwal Lee

 When a large number of nodes are removed by refreshing node lists, the 
 network topology is updated. If the refresh happens at the right moment, the 
 replication monitor thread may get stuck in the while loop of {{chooseRandom()}}. 
 This is because the cached cluster size is used in the terminal condition 
 check of the loop. This usually happens when a block with a high replication 
 factor is being processed. Since replicas/rack is also calculated beforehand, 
 no node choice may satisfy the goodness criteria if the refresh removed racks. 
 All nodes will end up in the excluded list, but the size will still be less 
 than the cached cluster size, so it will loop infinitely. This was observed 
 in a production environment.



[jira] [Commented] (HDFS-4937) ReplicationMonitor can infinite-loop in BlockPlacementPolicyDefault#chooseRandom()

2013-06-25 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13693424#comment-13693424
 ] 

Kihwal Lee commented on HDFS-4937:
--

This can mostly be avoided by decommissioning nodes in smaller batches, which 
is the recommended practice. But in this particular case, the operator added 
a large number of new nodes and decommissioned the old nodes.

 ReplicationMonitor can infinite-loop in 
 BlockPlacementPolicyDefault#chooseRandom()
 --

 Key: HDFS-4937
 URL: https://issues.apache.org/jira/browse/HDFS-4937
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.0.4-alpha, 0.23.8
Reporter: Kihwal Lee

 When a large number of nodes are removed by refreshing node lists, the 
 network topology is updated. If the refresh happens at the right moment, the 
 replication monitor thread may get stuck in the while loop of {{chooseRandom()}}. 
 This is because the cached cluster size is used in the terminal condition 
 check of the loop. This usually happens when a block with a high replication 
 factor is being processed. Since replicas/rack is also calculated beforehand, 
 no node choice may satisfy the goodness criteria if the refresh removed racks. 
 All nodes will end up in the excluded list, but the size will still be less 
 than the cached cluster size, so it will loop infinitely. This was observed 
 in a production environment.



[jira] [Created] (HDFS-4938) Reduce redundant information in edit logs and image files

2013-06-25 Thread Arpit Agarwal (JIRA)
Arpit Agarwal created HDFS-4938:
---

 Summary: Reduce redundant information in edit logs and image files
 Key: HDFS-4938
 URL: https://issues.apache.org/jira/browse/HDFS-4938
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 3.0.0
Reporter: Arpit Agarwal
Assignee: Arpit Agarwal


Generation stamps are logged as edits and in image files on checkpoint. This is 
potentially redundant, as the generation stamp is also logged with block 
creation/append. This JIRA is to investigate and remove any redundant fields.



[jira] [Updated] (HDFS-4932) Avoid a long line on the name node webUI if we have more Journal nodes

2013-06-25 Thread Fengdong Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fengdong Yu updated HDFS-4932:
--

Attachment: HDFS-4932.PNG

 Avoid a long line on the name node webUI if we have more Journal nodes
 --

 Key: HDFS-4932
 URL: https://issues.apache.org/jira/browse/HDFS-4932
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: ha, namenode
Reporter: Fengdong Yu
Assignee: Fengdong Yu
Priority: Minor
 Fix For: 2.1.0-beta

 Attachments: HDFS-4932-002.patch, HDFS-4932.patch, HDFS-4932.PNG


 If we have many Journal nodes, the name node webUI shows a long line. This 
 patch wraps the line, showing just three journal nodes per line. I don't 
 change the CSS because I don't want to affect other related web styles.
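The wrapping described above can be sketched as a small helper (hypothetical names, not the actual NameNode JSP code) that emits an HTML line break after every third journal node instead of touching the CSS:

```java
// Hypothetical sketch: join journal node names with commas, but break the
// line with <br/> after every third entry so the webUI row stays short.
import java.util.List;

public class JournalNodeLine {
    static String render(List<String> journalNodes) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < journalNodes.size(); i++) {
            sb.append(journalNodes.get(i));
            if (i < journalNodes.size() - 1) {
                // Every third node ends a line; otherwise separate with ", ".
                sb.append((i + 1) % 3 == 0 ? "<br/>" : ", ");
            }
        }
        return sb.toString();
    }
}
```

For four nodes {{a, b, c, d}} this renders {{a, b, c<br/>d}}, which is the "three per line" layout the patch describes.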



[jira] [Updated] (HDFS-4932) Avoid a long line on the name node webUI if we have more Journal nodes

2013-06-25 Thread Fengdong Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fengdong Yu updated HDFS-4932:
--

Attachment: HDFS-4932-002.patch

 Avoid a long line on the name node webUI if we have more Journal nodes
 --

 Key: HDFS-4932
 URL: https://issues.apache.org/jira/browse/HDFS-4932
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: ha, namenode
Reporter: Fengdong Yu
Assignee: Fengdong Yu
Priority: Minor
 Fix For: 2.1.0-beta

 Attachments: HDFS-4932-002.patch, HDFS-4932.patch, HDFS-4932.PNG


 If we have many Journal nodes, the name node webUI shows a long line. This 
 patch wraps the line, showing just three journal nodes per line. I don't 
 change the CSS because I don't want to affect other related web styles.



[jira] [Commented] (HDFS-4932) Avoid a long line on the name node webUI if we have more Journal nodes

2013-06-25 Thread Fengdong Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13693546#comment-13693546
 ] 

Fengdong Yu commented on HDFS-4932:
---

Thanks Chris, the new patch is uploaded. 

I compiled the change and restarted our testing cluster; the manual testing 
result was also uploaded.

 Avoid a long line on the name node webUI if we have more Journal nodes
 --

 Key: HDFS-4932
 URL: https://issues.apache.org/jira/browse/HDFS-4932
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: ha, namenode
Reporter: Fengdong Yu
Assignee: Fengdong Yu
Priority: Minor
 Fix For: 2.1.0-beta

 Attachments: HDFS-4932-002.patch, HDFS-4932.patch, HDFS-4932.PNG


 If we have many Journal nodes, the name node webUI shows a long line. This 
 patch wraps the line, showing just three journal nodes per line. I don't 
 change the CSS because I don't want to affect other related web styles.



[jira] [Updated] (HDFS-4936) Handle overflow condition for txid going over Long.MAX_VALUE

2013-06-25 Thread Fengdong Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fengdong Yu updated HDFS-4936:
--

Description: 
Hat tip to [~azury...@gmail.com] for the question that led to this (on mailing 
lists).

I hacked up my local NN's txids manually to go very large (close to max) and 
decided to see whether this causes any harm. I basically bumped up the freshly 
formatted files' starting txid to 9223372036854775805 (and ensured the image 
references the same by hex-editing it):

{code}
➜  current  ls
VERSION
fsimage_9223372036854775805.md5
fsimage_9223372036854775805
seen_txid
➜  current  cat seen_txid
9223372036854775805
{code}

NameNode started up as expected.

{code}
13/06/25 18:30:08 INFO namenode.FSImage: Image file of size 129 loaded in 0 
seconds.
13/06/25 18:30:08 INFO namenode.FSImage: Loaded image for txid 
9223372036854775805 from 
/temp-space/tmp-default/dfs-cdh4/name/current/fsimage_9223372036854775805
13/06/25 18:30:08 INFO namenode.FSEditLog: Starting log segment at 
9223372036854775806
{code}

I could create a bunch of files and do regular ops (the txid counting well 
past the long max as it increments). I created over 10 files, just to make it 
go well over Long.MAX_VALUE.

Quitting NameNode and restarting fails though, with the following error:

{code}
13/06/25 18:31:08 INFO namenode.FileJournalManager: Recovering unfinalized 
segments in 
/Users/harshchouraria/Work/installs/temp-space/tmp-default/dfs-cdh4/name/current
13/06/25 18:31:08 INFO namenode.FileJournalManager: Finalizing edits file 
/Users/harshchouraria/Work/installs/temp-space/tmp-default/dfs-cdh4/name/current/edits_inprogress_9223372036854775806
 - 
/Users/harshchouraria/Work/installs/temp-space/tmp-default/dfs-cdh4/name/current/edits_9223372036854775806-9223372036854775807
13/06/25 18:31:08 FATAL namenode.NameNode: Exception in namenode join
java.io.IOException: Gap in transactions. Expected to be able to read up until 
at least txid 9223372036854775806 but unable to find any edit logs containing 
txid -9223372036854775808
at 
org.apache.hadoop.hdfs.server.namenode.FSEditLog.checkForGaps(FSEditLog.java:1194)
at 
org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1152)
at 
org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:616)
at 
org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:267)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:592)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:435)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:397)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:399)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:433)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.init(NameNode.java:609)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.init(NameNode.java:590)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1141)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1205)
{code}

Looks like we also lose some edits when we restart, as noted by the finalized 
edits filename:

{code}
VERSION
edits_9223372036854775806-9223372036854775807
fsimage_9223372036854775805
fsimage_9223372036854775805.md5
seen_txid
{code}

It seems we won't be able to handle the case where the txid overflows. It's a 
very, very large number, so this isn't an immediate concern, but it seemed 
worthy of a report.
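The negative txid in the restart failure is plain two's-complement wraparound; a minimal illustration (not HDFS code) of where -9223372036854775808 comes from:

```java
// Long.MAX_VALUE + 1 silently wraps to Long.MIN_VALUE, which is exactly the
// "txid -9223372036854775808" that checkForGaps() reports above.
public class TxidOverflow {
    public static void main(String[] args) {
        long lastTxid = Long.MAX_VALUE;   // 9223372036854775807
        long nextTxid = lastTxid + 1;     // wraps, no exception in Java
        System.out.println(nextTxid);     // prints -9223372036854775808
        // One way to detect this (JDK 8+): Math.addExact throws an
        // ArithmeticException on overflow instead of wrapping silently.
        // long checked = Math.addExact(lastTxid, 1L);
    }
}
```

Because Java long arithmetic wraps silently, the edit log machinery never sees an error at increment time; the inconsistency only surfaces later as a "gap in transactions" during recovery.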

  was:
Hat tip to [~fengdon...@gmail.com] for the question that led to this (on 
mailing lists).

I hacked up my local NN's txids manually to go very large (close to max) and 
decided to see whether this causes any harm. I basically bumped up the freshly 
formatted files' starting txid to 9223372036854775805 (and ensured the image 
references the same by hex-editing it):

{code}
➜  current  ls
VERSION
fsimage_9223372036854775805.md5
fsimage_9223372036854775805
seen_txid
➜  current  cat seen_txid
9223372036854775805
{code}

NameNode started up as expected.

{code}
13/06/25 18:30:08 INFO namenode.FSImage: Image file of size 129 loaded in 0 
seconds.
13/06/25 18:30:08 INFO namenode.FSImage: Loaded image for txid 
9223372036854775805 from 
/temp-space/tmp-default/dfs-cdh4/name/current/fsimage_9223372036854775805
13/06/25 18:30:08 INFO namenode.FSEditLog: Starting log segment at 
9223372036854775806
{code}

I could create a bunch of files and do regular ops (the txid counting well 
past the long max as it increments). I created over 10 files, just to make it 
go well over Long.MAX_VALUE.

Quitting NameNode and restarting fails though, with the following error:

{code}
13/06/25 18:31:08 INFO namenode.FileJournalManager: Recovering unfinalized 

[jira] [Updated] (HDFS-4936) Handle overflow condition for txid going over Long.MAX_VALUE

2013-06-25 Thread Fengdong Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fengdong Yu updated HDFS-4936:
--

Description: 
Hat tip to [~azuryy] for the question that led to this (on mailing lists).

I hacked up my local NN's txids manually to go very large (close to max) and 
decided to see whether this causes any harm. I basically bumped up the freshly 
formatted files' starting txid to 9223372036854775805 (and ensured the image 
references the same by hex-editing it):

{code}
➜  current  ls
VERSION
fsimage_9223372036854775805.md5
fsimage_9223372036854775805
seen_txid
➜  current  cat seen_txid
9223372036854775805
{code}

NameNode started up as expected.

{code}
13/06/25 18:30:08 INFO namenode.FSImage: Image file of size 129 loaded in 0 
seconds.
13/06/25 18:30:08 INFO namenode.FSImage: Loaded image for txid 
9223372036854775805 from 
/temp-space/tmp-default/dfs-cdh4/name/current/fsimage_9223372036854775805
13/06/25 18:30:08 INFO namenode.FSEditLog: Starting log segment at 
9223372036854775806
{code}

I could create a bunch of files and do regular ops (the txid counting well 
past the long max as it increments). I created over 10 files, just to make it 
go well over Long.MAX_VALUE.

Quitting NameNode and restarting fails though, with the following error:

{code}
13/06/25 18:31:08 INFO namenode.FileJournalManager: Recovering unfinalized 
segments in 
/Users/harshchouraria/Work/installs/temp-space/tmp-default/dfs-cdh4/name/current
13/06/25 18:31:08 INFO namenode.FileJournalManager: Finalizing edits file 
/Users/harshchouraria/Work/installs/temp-space/tmp-default/dfs-cdh4/name/current/edits_inprogress_9223372036854775806
 - 
/Users/harshchouraria/Work/installs/temp-space/tmp-default/dfs-cdh4/name/current/edits_9223372036854775806-9223372036854775807
13/06/25 18:31:08 FATAL namenode.NameNode: Exception in namenode join
java.io.IOException: Gap in transactions. Expected to be able to read up until 
at least txid 9223372036854775806 but unable to find any edit logs containing 
txid -9223372036854775808
at 
org.apache.hadoop.hdfs.server.namenode.FSEditLog.checkForGaps(FSEditLog.java:1194)
at 
org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1152)
at 
org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:616)
at 
org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:267)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:592)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:435)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:397)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:399)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:433)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.init(NameNode.java:609)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.init(NameNode.java:590)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1141)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1205)
{code}

Looks like we also lose some edits when we restart, as noted by the finalized 
edits filename:

{code}
VERSION
edits_9223372036854775806-9223372036854775807
fsimage_9223372036854775805
fsimage_9223372036854775805.md5
seen_txid
{code}

It seems we won't be able to handle the case where the txid overflows. It's a 
very, very large number, so this isn't an immediate concern, but it seemed 
worthy of a report.

  was:
Hat tip to [~azury...@gmail.com] for the question that led to this (on mailing 
lists).

I hacked up my local NN's txids manually to go very large (close to max) and 
decided to see whether this causes any harm. I basically bumped up the freshly 
formatted files' starting txid to 9223372036854775805 (and ensured the image 
references the same by hex-editing it):

{code}
➜  current  ls
VERSION
fsimage_9223372036854775805.md5
fsimage_9223372036854775805
seen_txid
➜  current  cat seen_txid
9223372036854775805
{code}

NameNode started up as expected.

{code}
13/06/25 18:30:08 INFO namenode.FSImage: Image file of size 129 loaded in 0 
seconds.
13/06/25 18:30:08 INFO namenode.FSImage: Loaded image for txid 
9223372036854775805 from 
/temp-space/tmp-default/dfs-cdh4/name/current/fsimage_9223372036854775805
13/06/25 18:30:08 INFO namenode.FSEditLog: Starting log segment at 
9223372036854775806
{code}

I could create a bunch of files and do regular ops (the txid counting well 
past the long max as it increments). I created over 10 files, just to make it 
go well over Long.MAX_VALUE.

Quitting NameNode and restarting fails though, with the following error:

{code}
13/06/25 18:31:08 INFO namenode.FileJournalManager: Recovering unfinalized 
segments in 

[jira] [Commented] (HDFS-4932) Avoid a long line on the name node webUI if we have more Journal nodes

2013-06-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13693548#comment-13693548
 ] 

Hadoop QA commented on HDFS-4932:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12589691/HDFS-4932.PNG
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/4565//console

This message is automatically generated.

 Avoid a long line on the name node webUI if we have more Journal nodes
 --

 Key: HDFS-4932
 URL: https://issues.apache.org/jira/browse/HDFS-4932
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: ha, namenode
Reporter: Fengdong Yu
Assignee: Fengdong Yu
Priority: Minor
 Fix For: 2.1.0-beta

 Attachments: HDFS-4932-002.patch, HDFS-4932.patch, HDFS-4932.PNG


 If we have many Journal nodes, the name node webUI shows a long line. This 
 patch wraps the line, showing just three journal nodes per line. I don't 
 change the CSS because I don't want to affect other related web styles.



[jira] [Updated] (HDFS-4932) Avoid a long line on the name node webUI if we have more Journal nodes

2013-06-25 Thread Fengdong Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fengdong Yu updated HDFS-4932:
--

Attachment: (was: HDFS-4932.PNG)

 Avoid a long line on the name node webUI if we have more Journal nodes
 --

 Key: HDFS-4932
 URL: https://issues.apache.org/jira/browse/HDFS-4932
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: ha, namenode
Reporter: Fengdong Yu
Assignee: Fengdong Yu
Priority: Minor
 Fix For: 2.1.0-beta

 Attachments: HDFS-4932-002.patch, HDFS-4932.patch


 If we have many Journal nodes, the name node webUI shows a long line. This 
 patch wraps the line, showing just three journal nodes per line. I don't 
 change the CSS because I don't want to affect other related web styles.



[jira] [Updated] (HDFS-4932) Avoid a long line on the name node webUI if we have more Journal nodes

2013-06-25 Thread Fengdong Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fengdong Yu updated HDFS-4932:
--

Attachment: scree-short.PNG

 Avoid a long line on the name node webUI if we have more Journal nodes
 --

 Key: HDFS-4932
 URL: https://issues.apache.org/jira/browse/HDFS-4932
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: ha, namenode
Reporter: Fengdong Yu
Assignee: Fengdong Yu
Priority: Minor
 Fix For: 2.1.0-beta

 Attachments: HDFS-4932-002.patch, HDFS-4932.patch, scree-short.PNG


 If we have many Journal nodes, the name node webUI shows a long line. This 
 patch wraps the line, showing just three journal nodes per line. I don't 
 change the CSS because I don't want to affect other related web styles.



[jira] [Commented] (HDFS-4932) Avoid a long line on the name node webUI if we have more Journal nodes

2013-06-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13693560#comment-13693560
 ] 

Hadoop QA commented on HDFS-4932:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12589694/scree-short.PNG
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/4566//console

This message is automatically generated.

 Avoid a long line on the name node webUI if we have more Journal nodes
 --

 Key: HDFS-4932
 URL: https://issues.apache.org/jira/browse/HDFS-4932
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: ha, namenode
Reporter: Fengdong Yu
Assignee: Fengdong Yu
Priority: Minor
 Fix For: 2.1.0-beta

 Attachments: HDFS-4932-002.patch, HDFS-4932.patch, scree-short.PNG


 If we have many Journal nodes, the name node webUI shows a long line. This 
 patch wraps the line, showing just three journal nodes per line. I don't 
 change the CSS because I don't want to affect other related web styles.
