[jira] [Updated] (HDFS-7834) Allow HDFS to bind to ipv6 conditionally

2015-02-24 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated HDFS-7834:
---
Component/s: scripts

 Allow HDFS to bind to ipv6 conditionally
 

 Key: HDFS-7834
 URL: https://issues.apache.org/jira/browse/HDFS-7834
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: scripts
Affects Versions: 2.6.0
Reporter: Elliott Clark
Assignee: Elliott Clark
 Attachments: HDFS-7834-branch-2-0.patch


 Currently the bash scripts unconditionally add -Djava.net.preferIPv4Stack=true.
 While this was needed a while ago, IPv6 on Java works much better now and 
 there should be a way to allow it to bind dual stack if needed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HDFS-7834) Allow HDFS to bind to ipv6 conditionally

2015-02-24 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335423#comment-14335423
 ] 

Allen Wittenauer edited comment on HDFS-7834 at 2/24/15 9:00 PM:
-

In trunk, you can set HADOOP_OPTS to something (blank, for example) and set 
HADOOP_ALLOW_IPV6 to yes.  


was (Author: aw):
In trunk, you can set HADOOP_ALLOW_IPV6. 

 Allow HDFS to bind to ipv6 conditionally
 

 Key: HDFS-7834
 URL: https://issues.apache.org/jira/browse/HDFS-7834
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.6.0
Reporter: Elliott Clark
Assignee: Elliott Clark
 Fix For: 2.7.0


 Currently the bash scripts unconditionally add -Djava.net.preferIPv4Stack=true.
 While this was needed a while ago, IPv6 on Java works much better now and 
 there should be a way to allow it to bind dual stack if needed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7834) Allow HDFS to bind to ipv6 conditionally

2015-02-24 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated HDFS-7834:
---
Fix Version/s: (was: 2.7.0)

 Allow HDFS to bind to ipv6 conditionally
 

 Key: HDFS-7834
 URL: https://issues.apache.org/jira/browse/HDFS-7834
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: scripts
Affects Versions: 2.6.0
Reporter: Elliott Clark
Assignee: Elliott Clark
 Attachments: HDFS-7834-branch-2-0.patch


 Currently the bash scripts unconditionally add -Djava.net.preferIPv4Stack=true.
 While this was needed a while ago, IPv6 on Java works much better now and 
 there should be a way to allow it to bind dual stack if needed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7834) Allow HDFS to bind to ipv6 conditionally

2015-02-24 Thread Elliott Clark (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elliott Clark updated HDFS-7834:

Attachment: HDFS-7834-branch-2-0.patch

Here's a patch for branch-2.

 Allow HDFS to bind to ipv6 conditionally
 

 Key: HDFS-7834
 URL: https://issues.apache.org/jira/browse/HDFS-7834
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: scripts
Affects Versions: 2.6.0
Reporter: Elliott Clark
Assignee: Elliott Clark
 Attachments: HDFS-7834-branch-2-0.patch


 Currently the bash scripts unconditionally add -Djava.net.preferIPv4Stack=true.
 While this was needed a while ago, IPv6 on Java works much better now and 
 there should be a way to allow it to bind dual stack if needed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7818) DataNode throws NPE if the WebHdfs URL does not contain the offset parameter

2015-02-24 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335483#comment-14335483
 ] 

Eric Payne commented on HDFS-7818:
--

Now that I look at it, the patch in HDFS-7818.v3.txt is not exactly correct 
either. I think that if we want to keep the NULL check in a constructor, it 
should be done in {{OffsetParam(final Long value)}} instead of 
{{OffsetParam(final String str)}}, since the latter invokes the former.
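A minimal standalone sketch of this idea (hypothetical class name, not the actual 
webhdfs {{OffsetParam}} source): with the guard in the Long-based constructor, the 
String-based constructor is covered automatically because it delegates to it.

{code}
public class OffsetParamSketch {
  private final long value;

  OffsetParamSketch(final Long value) {
    // Guard here: a missing offset defaults to 0 instead of causing an NPE later.
    this.value = (value == null) ? 0L : value;
  }

  OffsetParamSketch(final String str) {
    // The String constructor invokes the Long one, so it inherits the guard.
    this((str == null || str.isEmpty()) ? null : Long.valueOf(str));
  }

  long getValue() {
    return value;
  }

  public static void main(String[] args) {
    // A request with no offset parameter still yields 0 rather than an NPE.
    System.out.println(new OffsetParamSketch((String) null).getValue());
  }
}
{code}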

 DataNode throws NPE if the WebHdfs URL does not contain the offset parameter
 

 Key: HDFS-7818
 URL: https://issues.apache.org/jira/browse/HDFS-7818
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: webhdfs
Affects Versions: 2.7.0
Reporter: Eric Payne
Assignee: Eric Payne
 Attachments: HDFS-7818.v1.txt, HDFS-7818.v2.txt, HDFS-7818.v3.txt


 This is a regression in 2.7 and later.
 {{hadoop fs -cat}} over webhdfs works, but {{hadoop fs -text}} does not:
 {code}
 $ hadoop fs -cat webhdfs://myhost.com/tmp/test.1
 ... output ...
 $ hadoop fs -text webhdfs://myhost.com/tmp/test.1
 text: org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): 
 null
   at 
 org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:165)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:358)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$200(WebHdfsFileSystem.java:91)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:615)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:463)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner$1.run(WebHdfsFileSystem.java:492)
 ...
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7834) Allow HDFS to bind to ipv6 conditionally

2015-02-24 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335423#comment-14335423
 ] 

Allen Wittenauer commented on HDFS-7834:


In trunk, you can set HADOOP_ALLOW_IPV6. 

 Allow HDFS to bind to ipv6 conditionally
 

 Key: HDFS-7834
 URL: https://issues.apache.org/jira/browse/HDFS-7834
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Elliott Clark
Assignee: Elliott Clark

 Currently the bash scripts unconditionally add -Djava.net.preferIPv4Stack=true.
 While this was needed a while ago, IPv6 on Java works much better now and 
 there should be a way to allow it to bind dual stack if needed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager

2015-02-24 Thread Andrew Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wang updated HDFS-7411:
--
Attachment: hdfs-7411.011.patch

Sorry for the delay on this everyone, I was on vacation last week.

I've implemented Chris D's suggestion (with unit test) that does a per node 
limit. If the deprecated config key is set, it is used preferentially over the 
default for the new config key.

Nicholas, does this satisfy your criteria?
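A minimal sketch of the precedence rule described above; the config key names and 
the default value are assumptions for illustration, not necessarily what 
hdfs-7411.011.patch uses.

{code}
import org.apache.hadoop.conf.Configuration;

public class DecommissionLimitSketch {
  // Assumed key names: the deprecated per-node key and the newer per-interval key.
  static final String DEPRECATED_KEY = "dfs.namenode.decommission.nodes.per.interval";
  static final String NEW_KEY = "dfs.namenode.decommission.blocks.per.interval";

  static int resolveLimit(Configuration conf) {
    if (conf.get(DEPRECATED_KEY) != null) {
      // An explicitly set deprecated key wins over the new key's default.
      return conf.getInt(DEPRECATED_KEY, Integer.MAX_VALUE);
    }
    return conf.getInt(NEW_KEY, 500000);  // assumed default, for illustration only
  }

  public static void main(String[] args) {
    Configuration conf = new Configuration(false);
    System.out.println(resolveLimit(conf));   // falls back to the new key's default
    conf.setInt(DEPRECATED_KEY, 4);
    System.out.println(resolveLimit(conf));   // deprecated key takes precedence: 4
  }
}
{code}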

 Refactor and improve decommissioning logic into DecommissionManager
 ---

 Key: HDFS-7411
 URL: https://issues.apache.org/jira/browse/HDFS-7411
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Andrew Wang
Assignee: Andrew Wang
 Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, 
 hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch, 
 hdfs-7411.006.patch, hdfs-7411.007.patch, hdfs-7411.008.patch, 
 hdfs-7411.009.patch, hdfs-7411.010.patch, hdfs-7411.011.patch


 Would be nice to split out decommission logic from DatanodeManager to 
 DecommissionManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-6962) ACLs inheritance conflict with umaskmode

2015-02-24 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated HDFS-6962:

Target Version/s: 3.0.0  (was: 2.7.0)

Hello, [~usrikanth].  Thank you for posting a prototype patch and providing a 
great written summary.

I'm now certain that it's impossible to make this change in a 
backwards-compatible way in the 2.x line.  The biggest challenge is what 
happens if someone upgrades the client ahead of the NameNode.  In that case, 
neither the client nor the NameNode would apply the umask.  Effectively, that 
means the upgraded client would start creating directories with 777 and files 
with 666, which of course would compromise security.

Another potential issue is that existing users may be accustomed to the 
behavior of the current implementation, despite this deviation from the POSIX 
ACL spec.  The effect of the proposed change would be to widen access, because 
it would stop applying umask in certain cases.  Users might find it surprising 
if their default ACLs stopped restricting access after an upgrade, and some 
would argue that this is a form of incompatibility with existing persistent 
data (metadata).  This is always a fine line, but I do suspect some would see 
it as an incompatibility.

I'm retargeting this to 3.0.0.  That means we'll also have the option of 
creating a much simpler patch, because we'll have freedom to make 
backwards-incompatible changes.

Here are a few notes on the prototype patch, although I suspect it will go in a 
very different direction for 3.0.0 anyway.
# {{CommandWithDestination}}: This change also probably would have constituted 
a backwards incompatibility.  Prior versions create files as 666 filtered by 
{{fs.permissions.umask-mode}}, not based on the permissions from the source 
file system.  I see from your notes that you were aiming to replicate the 
behavior you saw on Linux.  It might be worthwhile for us to consider doing 
that for consistency with other file systems, but it would be 
backwards-incompatible in 2.x.
# {{FSDirectory}}: Here, the NameNode is applying umask based on its configured 
value for {{fs.permissions.umask-mode}}.  Unfortunately, this won't work in the 
general case, because it's not guaranteed that the client and the NameNode are 
running with the same set of configuration files.  They might have different 
values configured for {{fs.permissions.umask-mode}}, or the client might have 
overridden it with a -D option on the command line.
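To illustrate the FSDirectory point above, a minimal sketch of how the umask is 
resolved from whichever {{Configuration}} happens to be loaded locally, which is why 
the client and the NameNode can end up applying different values:

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.permission.FsPermission;

public class UmaskSourceSketch {
  public static void main(String[] args) {
    // Loads the *local* core-site.xml/hdfs-site.xml; a client and the NameNode may
    // therefore resolve different values for fs.permissions.umask-mode, and the
    // client can override it again with -D on the command line.
    Configuration conf = new Configuration();
    FsPermission umask = FsPermission.getUMask(conf);         // e.g. 022 or 027
    FsPermission dirDefault = FsPermission.getDirDefault();   // 777 for new directories
    System.out.println("effective directory mode: " + dirDefault.applyUMask(umask));
  }
}
{code}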

 ACLs inheritance conflict with umaskmode
 

 Key: HDFS-6962
 URL: https://issues.apache.org/jira/browse/HDFS-6962
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: security
Affects Versions: 2.4.1
 Environment: CentOS release 6.5 (Final)
Reporter: LINTE
Assignee: Srikanth Upputuri
  Labels: hadoop, security
 Attachments: HDFS-6962.1.patch


 In hdfs-site.xml 
 <property>
   <name>dfs.umaskmode</name>
   <value>027</value>
 </property>
 1/ Create a directory as superuser
 bash# hdfs dfs -mkdir  /tmp/ACLS
 2/ set default ACLs on this directory rwx access for group readwrite and user 
 toto
 bash# hdfs dfs -setfacl -m default:group:readwrite:rwx /tmp/ACLS
 bash# hdfs dfs -setfacl -m default:user:toto:rwx /tmp/ACLS
 3/ check ACLs /tmp/ACLS/
 bash# hdfs dfs -getfacl /tmp/ACLS/
 # file: /tmp/ACLS
 # owner: hdfs
 # group: hadoop
 user::rwx
 group::r-x
 other::---
 default:user::rwx
 default:user:toto:rwx
 default:group::r-x
 default:group:readwrite:rwx
 default:mask::rwx
 default:other::---
 user::rwx | group::r-x | other::--- matches with the umaskmode defined in 
 hdfs-site.xml, everything ok !
 default:group:readwrite:rwx allows the readwrite group rwx access for 
 inheritance.
 default:user:toto:rwx allows the toto user rwx access for inheritance.
 default:mask::rwx the inheritance mask is rwx, so no mask
 4/ Create a subdir to test inheritance of ACL
 bash# hdfs dfs -mkdir  /tmp/ACLS/hdfs
 5/ check ACLs /tmp/ACLS/hdfs
 bash# hdfs dfs -getfacl /tmp/ACLS/hdfs
 # file: /tmp/ACLS/hdfs
 # owner: hdfs
 # group: hadoop
 user::rwx
 user:toto:rwx   #effective:r-x
 group::r-x
 group:readwrite:rwx #effective:r-x
 mask::r-x
 other::---
 default:user::rwx
 default:user:toto:rwx
 default:group::r-x
 default:group:readwrite:rwx
 default:mask::rwx
 default:other::---
 Here we can see that the readwrite group has an rwx ACL but only r-x is effective 
 because the mask is r-x (mask::r-x), even though the default mask for inheritance 
 is set to default:mask::rwx on /tmp/ACLS/
 6/ Modify hdfs-site.xml and restart the namenode
 <property>
   <name>dfs.umaskmode</name>
   <value>010</value>
 </property>
 7/ Create a subdir to test inheritance of ACL with new parameter umaskmode
 bash# hdfs dfs -mkdir  /tmp/ACLS/hdfs2
 8/ Check ACL on /tmp/ACLS/hdfs2
 bash# hdfs dfs -getfacl 

[jira] [Commented] (HDFS-7818) DataNode throws NPE if the WebHdfs URL does not contain the offset parameter

2015-02-24 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335438#comment-14335438
 ] 

Eric Payne commented on HDFS-7818:
--

Thank you for your review, [~wheat9]
bq. Maybe it might make more sense to introduce a new method {{getOffset()}} in 
{{OffsetParam}}.
If a {{getOffset()}} method is created instead of handling the NULL case in the 
constructor as is done in the HDFS-7818.V3.txt patch, won't I also have to 
change all of the {{offset.getValue()}} calls to {{offset.getOffset()}} in the 
{{NamenodeWebHdfsMethods}} class?

The change in the current patch seems less risky because it catches the NULL 
case during construction of the object and involves less code change.
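For contrast, a minimal standalone sketch of the reviewer's {{getOffset()}} 
alternative (again a hypothetical class, not the real {{OffsetParam}}): the 
constructor stays as-is and callers in {{NamenodeWebHdfsMethods}} would switch from 
{{getValue()}} to {{getOffset()}}.

{code}
public class OffsetParamAccessorSketch {
  private final Long value;              // may be null if the URL had no offset param

  OffsetParamAccessorSketch(final Long value) {
    this.value = value;                  // no guard in this variant
  }

  Long getValue() {
    return value;                        // existing behaviour: may return null
  }

  long getOffset() {
    return value == null ? 0L : value;   // new accessor hides the null case
  }

  public static void main(String[] args) {
    System.out.println(new OffsetParamAccessorSketch(null).getOffset());  // prints 0
  }
}
{code}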

 DataNode throws NPE if the WebHdfs URL does not contain the offset parameter
 

 Key: HDFS-7818
 URL: https://issues.apache.org/jira/browse/HDFS-7818
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: webhdfs
Affects Versions: 2.7.0
Reporter: Eric Payne
Assignee: Eric Payne
 Attachments: HDFS-7818.v1.txt, HDFS-7818.v2.txt, HDFS-7818.v3.txt


 This is a regression in 2.7 and later.
 {{hadoop fs -cat}} over webhdfs works, but {{hadoop fs -text}} does not:
 {code}
 $ hadoop fs -cat webhdfs://myhost.com/tmp/test.1
 ... output ...
 $ hadoop fs -text webhdfs://myhost.com/tmp/test.1
 text: org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): 
 null
   at 
 org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:165)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:358)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$200(WebHdfsFileSystem.java:91)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:615)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:463)
   at 
 org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner$1.run(WebHdfsFileSystem.java:492)
 ...
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-778) DistributedFileSystem.getFileBlockLocations() may occasionally return numeric ips as hostnames.

2015-02-24 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated HDFS-778:
--
Labels: ipv6  (was: )

 DistributedFileSystem.getFileBlockLocations() may occasionally return numeric 
 ips as hostnames.
 ---

 Key: HDFS-778
 URL: https://issues.apache.org/jira/browse/HDFS-778
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Hong Tang
  Labels: ipv6

 DistributedFileSystem.getFileBlockLocations() may occasionally return numeric 
 ips as hostnames. This seems to be a breach of the 
 FileSystem.getFileBlockLocation() contract:
 {noformat}
   /**
* Return an array containing hostnames, offset and size of 
* portions of the given file.  For a nonexistent 
* file or regions, null will be returned.
*
* This call is most helpful with DFS, where it returns 
* hostnames of machines that contain the given file.
*
* The FileSystem will simply return an elt containing 'localhost'.
*/
   public BlockLocation[] getFileBlockLocations(FileStatus file, 
   long start, long len) throws IOException
 {noformat}
 One (maybe minor) consequence of this issue is: When a job includes such 
 numeric ips in its splits' locations, JobTracker would not be able to 
 assign the job's map tasks local to the file blocks.
 We should either fix the implementation or change the contract. In the latter 
 case, JobTracker needs to be fixed to maintain both the hostnames and ips of 
 the TaskTrackers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7835) make initial sleeptime in locateFollowingBlock configurable for DFSClient.

2015-02-24 Thread zhihai xu (JIRA)
zhihai xu created HDFS-7835:
---

 Summary: make initial sleeptime in locateFollowingBlock 
configurable for DFSClient.
 Key: HDFS-7835
 URL: https://issues.apache.org/jira/browse/HDFS-7835
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: dfsclient
Reporter: zhihai xu
Assignee: zhihai xu


Make initial sleeptime in locateFollowingBlock configurable for DFSClient.
Currently the sleeptime/localTimeout in locateFollowingBlock/completeFile from 
DFSOutputStream is hard-coded as 400 ms, but retries can be configured by 
dfs.client.block.write.locateFollowingBlock.retries. We should also make the 
initial sleeptime configurable to give users more flexibility to control both 
retry and delay.
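A standalone sketch of the retry-with-backoff behaviour being discussed (not the 
actual DFSOutputStream code); the two values below stand in for the existing retries 
setting and the proposed, currently hypothetical, configurable initial sleep.

{code}
import java.util.concurrent.Callable;

public class LocateFollowingBlockRetrySketch {
  static <T> T retryWithBackoff(Callable<T> call, int retries, long initialSleepMs)
      throws Exception {
    long sleepMs = initialSleepMs;        // today this is hard-coded to 400 ms
    while (true) {
      try {
        return call.call();
      } catch (Exception e) {
        if (retries-- <= 0) {
          throw e;                        // out of retries, surface the failure
        }
        Thread.sleep(sleepMs);
        sleepMs *= 2;                     // exponential backoff between attempts
      }
    }
  }

  public static void main(String[] args) throws Exception {
    int retries = 5;             // dfs.client.block.write.locateFollowingBlock.retries
    long initialSleepMs = 400L;  // the value this JIRA proposes making configurable
    System.out.println(retryWithBackoff(() -> "block located", retries, initialSleepMs));
  }
}
{code}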




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7835) make initial sleeptime in locateFollowingBlock configurable for DFSClient.

2015-02-24 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated HDFS-7835:

Status: Patch Available  (was: Open)

 make initial sleeptime in locateFollowingBlock configurable for DFSClient.
 --

 Key: HDFS-7835
 URL: https://issues.apache.org/jira/browse/HDFS-7835
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: dfsclient
Reporter: zhihai xu
Assignee: zhihai xu
 Attachments: HDFS-7835.000.patch


 Make initial sleeptime in locateFollowingBlock configurable for DFSClient.
 Currently the sleeptime/localTimeout in locateFollowingBlock/completeFile from 
 DFSOutputStream is hard-coded as 400 ms, but retries can be configured by 
 dfs.client.block.write.locateFollowingBlock.retries. We should also make 
 the initial sleeptime configurable to give users more flexibility to control 
 both retry and delay.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7835) make initial sleeptime in locateFollowingBlock configurable for DFSClient.

2015-02-24 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated HDFS-7835:

Attachment: HDFS-7835.000.patch

 make initial sleeptime in locateFollowingBlock configurable for DFSClient.
 --

 Key: HDFS-7835
 URL: https://issues.apache.org/jira/browse/HDFS-7835
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: dfsclient
Reporter: zhihai xu
Assignee: zhihai xu
 Attachments: HDFS-7835.000.patch


 Make initial sleeptime in locateFollowingBlock configurable for DFSClient.
 Currently the sleeptime/localTimeout in locateFollowingBlock/completeFile from 
 DFSOutputStream is hard-coded as 400 ms, but retries can be configured by 
 dfs.client.block.write.locateFollowingBlock.retries. We should also make 
 the initial sleeptime configurable to give users more flexibility to control 
 both retry and delay.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7537) fsck is confusing when dfs.namenode.replication.min > 1 && missing replicas && NN restart

2015-02-24 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335523#comment-14335523
 ] 

Allen Wittenauer commented on HDFS-7537:


bq. When numUnderMinimalRelicatedBlocks > 0 and there is no missing/corrupted 
block, all under minimal replicated blocks have at least one good replica so 
that they can be replicated and there is no data loss. It makes sense to 
consider the file system as healthy.

Exactly this.

I made a prototype to play with.  One of the things I did was surround the number 
of blocks that didn't meet the replication minimum with the same asterisks that 
the corrupted output uses.  This made it absolutely crystal clear why the 
NN wasn't coming out of safemode.

 fsck is confusing when dfs.namenode.replication.min > 1 && missing replicas 
  && NN restart
 -

 Key: HDFS-7537
 URL: https://issues.apache.org/jira/browse/HDFS-7537
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Reporter: Allen Wittenauer
Assignee: GAO Rui
 Attachments: HDFS-7537.1.patch, dfs-min-2-fsck.png, dfs-min-2.png


 If minimum replication is set to 2 or higher and some of those replicas are 
 missing and the namenode restarts, it isn't always obvious that the missing 
 replicas are the reason why the namenode isn't leaving safemode.  We should 
 improve the output of fsck and the web UI to make it obvious that the missing 
 blocks are from unmet replicas vs. completely/totally missing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7280) Use netty 4 in WebImageViewer

2015-02-24 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335365#comment-14335365
 ] 

Yongjun Zhang commented on HDFS-7280:
-

Hi [~wheat9],

Thanks for your work on this jira. I have some questions: in general, what 
impact, if any, is there on the user side when we change from using netty 3 to 
netty 4? Is there anything special users need to do? Any compatibility issues 
with tools that interface with Hadoop?  Thanks.





 Use netty 4 in WebImageViewer
 -

 Key: HDFS-7280
 URL: https://issues.apache.org/jira/browse/HDFS-7280
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Haohui Mai
Assignee: Haohui Mai
 Fix For: 2.7.0

 Attachments: HDFS-7280.000.patch, HDFS-7280.001.patch, 
 HDFS-7280.002.patch, HDFS-7280.003.patch, HDFS-7280.004.patch


 This jira changes WebImageViewer to use netty 4 instead of netty 3.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7834) Allow HDFS to bind to ipv6 conditionally

2015-02-24 Thread Elliott Clark (JIRA)
Elliott Clark created HDFS-7834:
---

 Summary: Allow HDFS to bind to ipv6 conditionally
 Key: HDFS-7834
 URL: https://issues.apache.org/jira/browse/HDFS-7834
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Elliott Clark
Assignee: Elliott Clark


Currently the bash scripts unconditionally add -Djava.net.preferIPv4Stack=true.

While this was needed a while ago, IPv6 on Java works much better now and there 
should be a way to allow it to bind dual stack if needed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7834) Allow HDFS to bind to ipv6 conditionally

2015-02-24 Thread Elliott Clark (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elliott Clark updated HDFS-7834:

Affects Version/s: 2.6.0
Fix Version/s: 2.7.0

 Allow HDFS to bind to ipv6 conditionally
 

 Key: HDFS-7834
 URL: https://issues.apache.org/jira/browse/HDFS-7834
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.6.0
Reporter: Elliott Clark
Assignee: Elliott Clark
 Fix For: 2.7.0


 Currently the bash scripts unconditionally add -Djava.net.preferIPv4Stack=true.
 While this was needed a while ago, IPv6 on Java works much better now and 
 there should be a way to allow it to bind dual stack if needed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7302) namenode -rollingUpgrade downgrade may finalize a rolling upgrade

2015-02-24 Thread Kai Sasaki (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14334594#comment-14334594
 ] 

Kai Sasaki commented on HDFS-7302:
--

[~szetszwo] I may have some misunderstanding. I found there were some 
dependencies on FSImage, FSNamesystem and so on. Can I remove all of these 
dependencies? Or is this downgrade option also unnecessary for these internal 
classes?

 namenode -rollingUpgrade downgrade may finalize a rolling upgrade
 -

 Key: HDFS-7302
 URL: https://issues.apache.org/jira/browse/HDFS-7302
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Reporter: Tsz Wo Nicholas Sze
Assignee: Kai Sasaki
  Labels: document, hdfs
 Attachments: HADOOP-7302.1.patch


 The namenode startup option -rollingUpgrade downgrade is originally 
 designed for downgrading a cluster.  However, running namenode -rollingUpgrade 
 downgrade with the new software could result in finalizing the ongoing 
 rolling upgrade.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7537) fsck is confusing when dfs.namenode.replication.min > 1 && missing replicas && NN restart

2015-02-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14334686#comment-14334686
 ] 

Hadoop QA commented on HDFS-7537:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12700346/HDFS-7537.1.patch
  against trunk revision 1dba572.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9653//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9653//console

This message is automatically generated.

 fsck is confusing when dfs.namenode.replication.min > 1 && missing replicas 
  && NN restart
 -

 Key: HDFS-7537
 URL: https://issues.apache.org/jira/browse/HDFS-7537
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Reporter: Allen Wittenauer
Assignee: GAO Rui
 Attachments: HDFS-7537.1.patch, dfs-min-2-fsck.png, dfs-min-2.png


 If minimum replication is set to 2 or higher and some of those replicas are 
 missing and the namenode restarts, it isn't always obvious that the missing 
 replicas are the reason why the namenode isn't leaving safemode.  We should 
 improve the output of fsck and the web UI to make it obvious that the missing 
 blocks are from unmet replicas vs. completely/totally missing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7439) Add BlockOpResponseProto's message to DFSClient's exception message

2015-02-24 Thread Takanobu Asanuma (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14334703#comment-14334703
 ] 

Takanobu Asanuma commented on HDFS-7439:


Excuse me, how can I trigger a rebuild for this patch? I can't log in to the Jenkins WebUI.

 Add BlockOpResponseProto's message to DFSClient's exception message
 ---

 Key: HDFS-7439
 URL: https://issues.apache.org/jira/browse/HDFS-7439
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Ming Ma
Assignee: Takanobu Asanuma
Priority: Minor
 Attachments: HDFS-7439.1.patch


 When (BlockOpResponseProto#getStatus() != SUCCESS), it helps with debugging 
 if DFSClient can add BlockOpResponseProto's message to the exception message 
 applications will get. For example, instead of
 {noformat}
 throw new IOException("Got error for OP_READ_BLOCK, self="
 + peer.getLocalAddressString() + ", remote="
 + peer.getRemoteAddressString() + ", for file " + file
 + ", for pool " + block.getBlockPoolId() + " block " 
 + block.getBlockId() + "_" + block.getGenerationStamp());
 {noformat}
 It could be,
 {noformat}
 throw new IOException("Got error for OP_READ_BLOCK, self="
 + peer.getLocalAddressString() + ", remote="
 + peer.getRemoteAddressString() + ", for file " + file
 + ", for pool " + block.getBlockPoolId() + " block " 
 + block.getBlockId() + "_" + block.getGenerationStamp()
 + ", status message " + status.getMessage());
 {noformat}
 We might want to check out all the references to BlockOpResponseProto in 
 DFSClient.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7537) fsck is confusing when dfs.namenode.replication.min > 1 && missing replicas && NN restart

2015-02-24 Thread GAO Rui (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14334742#comment-14334742
 ] 

GAO Rui commented on HDFS-7537:
---

 Thank you very much for your review and comment.
 1. I think minReplication may get its value from 
DFSConfigKeys.DFS_NAMENODE_REPLICATION_MIN_KEY in the first place. I’ll try to 
figure this out and add it to the output.
 2. In Allen's comment, the mock-up output shows status as HEALTHY when 
numUnderMinimalRelicatedBlocks > 0. Is that a careless mistake, or does he have a 
reason to keep the status as HEALTHY while showing numUnderMinimalRelicatedBlocks 
at the same time?
 3. I haven't added a unit test before, but I'll try to do that.
 4. Sorry, I'll fix it and avoid this kind of mistake in future code.


 fsck is confusing when dfs.namenode.replication.min > 1 && missing replicas 
  && NN restart
 -

 Key: HDFS-7537
 URL: https://issues.apache.org/jira/browse/HDFS-7537
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Reporter: Allen Wittenauer
Assignee: GAO Rui
 Attachments: HDFS-7537.1.patch, dfs-min-2-fsck.png, dfs-min-2.png


 If minimum replication is set to 2 or higher and some of those replicas are 
 missing and the namenode restarts, it isn't always obvious that the missing 
 replicas are the reason why the namenode isn't leaving safemode.  We should 
 improve the output of fsck and the web UI to make it obvious that the missing 
 blocks are from unmet replicas vs. completely/totally missing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7056) Snapshot support for truncate

2015-02-24 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14334562#comment-14334562
 ] 

Konstantin Shvachko commented on HDFS-7056:
---

The patch is up for review in HDFS-7831.

 Snapshot support for truncate
 -

 Key: HDFS-7056
 URL: https://issues.apache.org/jira/browse/HDFS-7056
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Affects Versions: 3.0.0
Reporter: Konstantin Shvachko
Assignee: Plamen Jeliazkov
 Fix For: 2.7.0

 Attachments: HDFS-3107-HDFS-7056-combined-13.patch, 
 HDFS-3107-HDFS-7056-combined-15.patch, HDFS-3107-HDFS-7056-combined.patch, 
 HDFS-3107-HDFS-7056-combined.patch, HDFS-3107-HDFS-7056-combined.patch, 
 HDFS-3107-HDFS-7056-combined.patch, HDFS-3107-HDFS-7056-combined.patch, 
 HDFS-3107-HDFS-7056-combined.patch, HDFS-3107-HDFS-7056-combined.patch, 
 HDFS-3107-HDFS-7056-combined.patch, HDFS-3107-HDFS-7056-combined.patch, 
 HDFS-7056-13.patch, HDFS-7056-15.patch, HDFS-7056.15_branch2.patch, 
 HDFS-7056.patch, HDFS-7056.patch, HDFS-7056.patch, HDFS-7056.patch, 
 HDFS-7056.patch, HDFS-7056.patch, HDFS-7056.patch, HDFS-7056.patch, 
 HDFSSnapshotWithTruncateDesign.docx, HDFSSnapshotWithTruncateDesign.docx, 
 editsStored, editsStored.xml


 Implementation of truncate in HDFS-3107 does not allow truncating files which 
 are in a snapshot. It is desirable to be able to truncate and still keep the 
 old file state of the file in the snapshot.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7831) Fix the starting index of the loop in FileDiffList.findEarlierSnapshotBlocks().

2015-02-24 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HDFS-7831:
--
Assignee: Konstantin Shvachko
  Status: Patch Available  (was: Open)

 Fix the starting index of the loop in 
 FileDiffList.findEarlierSnapshotBlocks().
 ---

 Key: HDFS-7831
 URL: https://issues.apache.org/jira/browse/HDFS-7831
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Konstantin Shvachko
Assignee: Konstantin Shvachko
 Attachments: HDFS-7831-01.patch


 Currently the loop in {{FileDiffList.findEarlierSnapshotBlocks()}} starts 
 from {{insertPoint + 1}}. It should start from {{insertPoint - 1}}. As noted 
 in [Jing's 
 comment|https://issues.apache.org/jira/browse/HDFS-7056?focusedCommentId=14333864page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14333864]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7308) DFSClient write packet size may > 64kB

2015-02-24 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14334569#comment-14334569
 ] 

Tsz Wo Nicholas Sze commented on HDFS-7308:
---

Patch looks good to me.

[~stack], I wonder if you could repeat the test you have done for HDFS-7276 
with the patch here to see if the packet size can go over 65536?

 DFSClient write packet size may > 64kB
 --

 Key: HDFS-7308
 URL: https://issues.apache.org/jira/browse/HDFS-7308
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs-client
Reporter: Tsz Wo Nicholas Sze
Assignee: Takuya Fukudome
Priority: Minor
 Attachments: HDFS-7308.1.patch


 In DFSOutputStream.computePacketChunkSize(..),
 {code}
   private void computePacketChunkSize(int psize, int csize) {
     final int chunkSize = csize + getChecksumSize();
     chunksPerPacket = Math.max(psize/chunkSize, 1);
     packetSize = chunkSize*chunksPerPacket;
     if (DFSClient.LOG.isDebugEnabled()) {
       ...
     }
   }
 {code}
 We have the following
 || variables || usual values ||
 | psize | dfsClient.getConf().writePacketSize = 64kB |
 | csize | bytesPerChecksum = 512B |
 | getChecksumSize(), i.e. CRC size | 32B |
 | chunkSize = csize + getChecksumSize() | 544B (not a power of two) |
 | psize/chunkSize | 120.47 |
 | chunksPerPacket = max(psize/chunkSize, 1) | 120 |
 | packetSize = chunkSize*chunksPerPacket (not including header) | 65280B |
 | PacketHeader.PKT_MAX_HEADER_LEN | 33B |
 | actual packet size | 65280 + 33 = *65313* < 65536 = 64k |
 It is fortunate that the usual packet size = 65313 < 64k although the 
 calculation above does not guarantee it always happens (e.g. if 
 PKT_MAX_HEADER_LEN=257, then actual packet size=65537 > 64k.)  We should fix 
 the computation in order to guarantee actual packet size <= 64k.
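A standalone sketch reproducing the arithmetic in the table and showing one possible 
way to keep the total under the 64 kB target by budgeting the header length into 
psize; this is an assumption about the fix, not necessarily what the attached patch 
does.

{code}
public class PacketSizeSketch {
  public static void main(String[] args) {
    final int psize = 64 * 1024;  // write packet size target
    final int csize = 512;        // bytesPerChecksum
    final int checksumSize = 32;  // per-chunk checksum size used in the table above
    final int headerLen = 33;     // PacketHeader.PKT_MAX_HEADER_LEN

    final int chunkSize = csize + checksumSize;            // 544
    int chunksPerPacket = Math.max(psize / chunkSize, 1);  // 120, as computed today
    System.out.println("current: " + (chunkSize * chunksPerPacket + headerLen)); // 65313

    // Possible fix: reserve room for the header before choosing chunksPerPacket,
    // so header + body can never exceed psize.
    chunksPerPacket = Math.max((psize - headerLen) / chunkSize, 1);
    System.out.println("capped:  " + (chunkSize * chunksPerPacket + headerLen));
  }
}
{code}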



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7831) Fix the starting index of the loop in FileDiffList.findEarlierSnapshotBlocks().

2015-02-24 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HDFS-7831:
--
Attachment: HDFS-7831-01.patch

Fixed the starting index for the loop. Also we do not need to check that 
{{i < diffs.size()}}, because now it always is.
This should be treated as an optimization, so there are no additional test 
cases.
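A standalone illustration of the indexing point (hypothetical data, not the 
{{FileDiffList}} code itself), assuming {{insertPoint}} is the insertion point from a 
binary search over the sorted snapshot ids: the nearest earlier diff sits at 
{{insertPoint - 1}}, and walking downward from there stays within {{diffs.size()}}.

{code}
import java.util.Arrays;

public class EarlierSnapshotSketch {
  public static void main(String[] args) {
    int[] diffSnapshotIds = {2, 5, 9};   // snapshot ids of the file diffs, sorted
    int queryId = 7;                     // snapshot we are looking backwards from

    int found = Arrays.binarySearch(diffSnapshotIds, queryId);
    int insertPoint = found < 0 ? -found - 1 : found;   // 2 for queryId = 7

    // Earlier snapshots live below insertPoint, so start at insertPoint - 1 and
    // walk down; the index stays within bounds by construction.
    for (int i = insertPoint - 1; i >= 0; i--) {
      System.out.println("earlier diff, snapshot id " + diffSnapshotIds[i]); // 5 then 2
    }
  }
}
{code}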

 Fix the starting index of the loop in 
 FileDiffList.findEarlierSnapshotBlocks().
 ---

 Key: HDFS-7831
 URL: https://issues.apache.org/jira/browse/HDFS-7831
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Konstantin Shvachko
 Attachments: HDFS-7831-01.patch


 Currently the loop in {{FileDiffList.findEarlierSnapshotBlocks()}} starts 
 from {{insertPoint + 1}}. It should start from {{insertPoint - 1}}. As noted 
 in [Jing's 
 comment|https://issues.apache.org/jira/browse/HDFS-7056?focusedCommentId=14333864page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14333864]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7832) Show 'Last Modified' in Namenode's 'Browse Filesystem'

2015-02-24 Thread Vinayakumar B (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinayakumar B updated HDFS-7832:

Attachment: HDFS-7832-001.patch

Attaching the changes
Please review

 Show 'Last Modified' in Namenode's 'Browse Filesystem'
 --

 Key: HDFS-7832
 URL: https://issues.apache.org/jira/browse/HDFS-7832
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Reporter: Vinayakumar B
Assignee: Vinayakumar B
 Attachments: HDFS-7832-001.patch


 The new UI no longer shows the last modified time for a path while browsing.
 This could be added to make browsing the file system more useful.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7807) libhdfs htable.c: fix htable resizing, add unit test

2015-02-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14334808#comment-14334808
 ] 

Hudson commented on HDFS-7807:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #848 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/848/])
HDFS-7807. libhdfs htable.c: fix htable resizing, add unit test (cmccabe) 
(cmccabe: rev 585768667e443f56c2f97068276ec8768dc49cf8)
* hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/test/test_htable.c
* hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/common/htable.c
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* hadoop-hdfs-project/hadoop-hdfs/src/CMakeLists.txt


 libhdfs htable.c: fix htable resizing, add unit test
 

 Key: HDFS-7807
 URL: https://issues.apache.org/jira/browse/HDFS-7807
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: native
Affects Versions: 2.7.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Fix For: 2.7.0

 Attachments: HDFS-7807.001.patch, HDFS-7807.002.patch


 libhdfs htable.c: fix htable resizing, add unit test



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7009) Active NN and standby NN have different live nodes

2015-02-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14334810#comment-14334810
 ] 

Hudson commented on HDFS-7009:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #848 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/848/])
HDFS-7009. Active NN and standby NN have different live nodes. Contributed by 
Ming Ma. (cnauroth: rev 769507bd7a501929d9a2fd56c72c3f50673488a4)
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDatanodeProtocolRetryPolicy.java
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Client.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


 Active NN and standby NN have different live nodes
 --

 Key: HDFS-7009
 URL: https://issues.apache.org/jira/browse/HDFS-7009
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 2.6.0
Reporter: Ming Ma
Assignee: Ming Ma
 Fix For: 2.7.0

 Attachments: HDFS-7009-2.patch, HDFS-7009-3.patch, HDFS-7009-4.patch, 
 HDFS-7009.patch


 To follow up on https://issues.apache.org/jira/browse/HDFS-6478, in most 
 cases, given DN sends HB and BR to NN regularly, if a specific RPC call 
 fails, it isn't a big deal.
 However, there are cases where DN fails to register with NN during initial 
 handshake due to exceptions not covered by RPC client's connection retry. 
 When this happens, the DN won't talk to that NN until the DN restarts.
 {noformat}
 BPServiceActor
   public void run() {
     LOG.info(this + " starting to offer service");
     try {
       // init stuff
       try {
         // setup storage
         connectToNNAndHandshake();
       } catch (IOException ioe) {
         // Initial handshake, storage recovery or registration failed
         // End BPOfferService thread
         LOG.fatal("Initialization failed for block pool " + this, ioe);
         return;
       }
       initialized = true; // bp is initialized;

       while (shouldRun()) {
         try {
           offerService();
         } catch (Exception ex) {
           LOG.error("Exception in BPOfferService for " + this, ex);
           sleepAndLogInterrupts(5000, "offering service");
         }
       }
       ...
 {noformat}
 Here is an example of the call stack.
 {noformat}
 java.io.IOException: Failed on local exception: java.io.IOException: Response 
 is null.; Host Details : local host is: xxx; destination host is: 
 yyy:8030;
 at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:761)
 at org.apache.hadoop.ipc.Client.call(Client.java:1239)
 at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
 at com.sun.proxy.$Proxy9.registerDatanode(Unknown Source)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
 at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
 at com.sun.proxy.$Proxy9.registerDatanode(Unknown Source)
 at 
 org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.registerDatanode(DatanodeProtocolClientSideTranslatorPB.java:146)
 at 
 org.apache.hadoop.hdfs.server.datanode.BPServiceActor.register(BPServiceActor.java:623)
 at 
 org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:225)
 at 
 org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:664)
 at java.lang.Thread.run(Thread.java:745)
 Caused by: java.io.IOException: Response is null.
 at 
 org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:949)
 at org.apache.hadoop.ipc.Client$Connection.run(Client.java:844)
 {noformat}
 This will create discrepancy between active NN and standby NN in terms of 
 live nodes.
  
 Here is a possible scenario of missing blocks after failover.
 1. DN A, B set up handshakes with active NN, but not with standby NN.
 2. A block is replicated to DN A, B and C.
 3. From standby NN's point of view, given A and B are dead nodes, the block 
 is under replicated.
 4. DN C is down.
 5. Before active NN detects DN C is down, it fails over.
 6. The new active NN considers the block is missing. Even though there are 
 two replicas on DN A and B.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7805) NameNode recovery prompt should be printed on console

2015-02-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14334811#comment-14334811
 ] 

Hudson commented on HDFS-7805:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #848 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/848/])
HDFS-7805. NameNode recovery prompt should be printed on console (Surendra 
Singh Lilhore via Colin P. McCabe) (cmccabe: rev 
faaddb6ecb44cdc9ef82a2ab392f64fc2561e938)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/MetaRecoveryContext.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


 NameNode recovery prompt should be printed on console
 -

 Key: HDFS-7805
 URL: https://issues.apache.org/jira/browse/HDFS-7805
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.6.0
Reporter: surendra singh lilhore
Assignee: surendra singh lilhore
 Fix For: 2.7.0

 Attachments: HDFS-7805.patch, HDFS-7805_1.patch


 In my cluster the root logger is not the console, so when I run the namenode 
 recovery tool, the MetaRecoveryContext.java prompt message is logged to the log 
 file. Actually it should be displayed on the console.
 Currently it is like this
 {code}
 LOG.info(prompt);
 {code}
 It should be 
 {code}
 System.err.print(prompt);
 {code}
 NameNode recovery prompt should be printed on console



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7831) Fix the starting index of the loop in FileDiffList.findEarlierSnapshotBlocks().

2015-02-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14334785#comment-14334785
 ] 

Hadoop QA commented on HDFS-7831:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12700372/HDFS-7831-01.patch
  against trunk revision b610c68.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  
org.apache.hadoop.hdfs.server.namenode.ha.TestFailureToReadEdits

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9656//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9656//console

This message is automatically generated.

 Fix the starting index of the loop in 
 FileDiffList.findEarlierSnapshotBlocks().
 ---

 Key: HDFS-7831
 URL: https://issues.apache.org/jira/browse/HDFS-7831
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Konstantin Shvachko
Assignee: Konstantin Shvachko
 Attachments: HDFS-7831-01.patch


 Currently the loop in {{FileDiffList.findEarlierSnapshotBlocks()}} starts 
 from {{insertPoint + 1}}. It should start from {{insertPoint - 1}}. As noted 
 in [Jing's 
 comment|https://issues.apache.org/jira/browse/HDFS-7056?focusedCommentId=14333864page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14333864]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7832) Show 'Last Modified' in Namenode's 'Browse Filesystem'

2015-02-24 Thread Vinayakumar B (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinayakumar B updated HDFS-7832:

Status: Patch Available  (was: Open)

 Show 'Last Modified' in Namenode's 'Browse Filesystem'
 --

 Key: HDFS-7832
 URL: https://issues.apache.org/jira/browse/HDFS-7832
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Reporter: Vinayakumar B
Assignee: Vinayakumar B
 Attachments: HDFS-7832-001.patch


 The new UI no longer shows the last modified time for a path while browsing.
 This could be added to make browsing the file system more useful.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7824) GetContentSummary API and its namenode implementation for Storage Type Quota/Usage

2015-02-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14334769#comment-14334769
 ] 

Hadoop QA commented on HDFS-7824:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12700363/HDFS-7824.02.patch
  against trunk revision b610c68.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

  {color:red}-1 javac{color}.  The applied patch generated 1156 javac 
compiler warnings (more than the trunk's current 1155 warnings).

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs 
hadoop-hdfs-project/hadoop-hdfs-httpfs:

  org.apache.hadoop.hdfs.server.namenode.TestHDFSConcat
  org.apache.hadoop.fs.viewfs.TestViewFsDefaultValue
  org.apache.hadoop.hdfs.TestClientProtocolForPipelineRecovery
  org.apache.hadoop.hdfs.TestEncryptedTransfer
  org.apache.hadoop.hdfs.TestPersistBlocks
  org.apache.hadoop.fs.permission.TestStickyBit
  org.apache.hadoop.hdfs.server.namenode.TestFSImageWithSnapshot
  org.apache.hadoop.hdfs.TestPipelines
  org.apache.hadoop.fs.TestHDFSFileContextMainOperations
  org.apache.hadoop.cli.TestHDFSCLI
  org.apache.hadoop.hdfs.TestReplaceDatanodeOnFailure

  The following test timeouts occurred in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs 
hadoop-hdfs-project/hadoop-hdfs-httpfs:

org.apache.hadoop.fs.viewfs.TestViewFsWithAuthorityLocalFs

  The test build failed in 
hadoop-hdfs-project/hadoop-hdfs-httpfs 

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9654//testReport/
Javac warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9654//artifact/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9654//console

This message is automatically generated.

 GetContentSummary API and its namenode implementation for Storage Type 
 Quota/Usage
 -

 Key: HDFS-7824
 URL: https://issues.apache.org/jira/browse/HDFS-7824
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode, namenode
Reporter: Xiaoyu Yao
Assignee: Xiaoyu Yao
 Fix For: 2.7.0

 Attachments: HDFS-7824.00.patch, HDFS-7824.01.patch, 
 HDFS-7824.02.patch


 This JIRA is opened to provide API support of GetContentSummary with storage 
 type quota and usage information. It includes namenode implementation, client 
 namenode RPC protocol and Content.Counts refactoring. It is required by 
 HDFS-7701 (CLI to display storage type quota and usage).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7439) Add BlockOpResponseProto's message to DFSClient's exception message

2015-02-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14334763#comment-14334763
 ] 

Hadoop QA commented on HDFS-7439:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12700317/HDFS-7439.1.patch
  against trunk revision b610c68.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.server.namenode.TestFileTruncate

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9655//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9655//console

This message is automatically generated.

 Add BlockOpResponseProto's message to DFSClient's exception message
 ---

 Key: HDFS-7439
 URL: https://issues.apache.org/jira/browse/HDFS-7439
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Ming Ma
Assignee: Takanobu Asanuma
Priority: Minor
 Attachments: HDFS-7439.1.patch


 When (BlockOpResponseProto#getStatus() != SUCCESS), it helps with debugging 
 if DFSClient can add BlockOpResponseProto's message to the exception message 
 applications will get. For example, instead of
 {noformat}
 throw new IOException("Got error for OP_READ_BLOCK, self="
 + peer.getLocalAddressString() + ", remote="
 + peer.getRemoteAddressString() + ", for file " + file
 + ", for pool " + block.getBlockPoolId() + " block " 
 + block.getBlockId() + "_" + block.getGenerationStamp());
 {noformat}
 It could be,
 {noformat}
 throw new IOException("Got error for OP_READ_BLOCK, self="
 + peer.getLocalAddressString() + ", remote="
 + peer.getRemoteAddressString() + ", for file " + file
 + ", for pool " + block.getBlockPoolId() + " block " 
 + block.getBlockId() + "_" + block.getGenerationStamp()
 + ", status message " + status.getMessage());
 {noformat}
 We might want to check out all the references to BlockOpResponseProto in 
 DFSClient.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7009) Active NN and standby NN have different live nodes

2015-02-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14334792#comment-14334792
 ] 

Hudson commented on HDFS-7009:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #114 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/114/])
HDFS-7009. Active NN and standby NN have different live nodes. Contributed by 
Ming Ma. (cnauroth: rev 769507bd7a501929d9a2fd56c72c3f50673488a4)
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Client.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDatanodeProtocolRetryPolicy.java


 Active NN and standby NN have different live nodes
 --

 Key: HDFS-7009
 URL: https://issues.apache.org/jira/browse/HDFS-7009
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 2.6.0
Reporter: Ming Ma
Assignee: Ming Ma
 Fix For: 2.7.0

 Attachments: HDFS-7009-2.patch, HDFS-7009-3.patch, HDFS-7009-4.patch, 
 HDFS-7009.patch


 To follow up on https://issues.apache.org/jira/browse/HDFS-6478, in most 
 cases, given DN sends HB and BR to NN regularly, if a specific RPC call 
 fails, it isn't a big deal.
 However, there are cases where DN fails to register with NN during initial 
 handshake due to exceptions not covered by RPC client's connection retry. 
 When this happens, the DN won't talk to that NN until the DN restarts.
 {noformat}
 BPServiceActor
   public void run() {
     LOG.info(this + " starting to offer service");
     try {
       // init stuff
       try {
         // setup storage
         connectToNNAndHandshake();
       } catch (IOException ioe) {
         // Initial handshake, storage recovery or registration failed
         // End BPOfferService thread
         LOG.fatal("Initialization failed for block pool " + this, ioe);
         return;
       }
       initialized = true; // bp is initialized;

       while (shouldRun()) {
         try {
           offerService();
         } catch (Exception ex) {
           LOG.error("Exception in BPOfferService for " + this, ex);
           sleepAndLogInterrupts(5000, "offering service");
         }
       }
       ...
 {noformat}
 Here is an example of the call stack.
 {noformat}
 java.io.IOException: Failed on local exception: java.io.IOException: Response 
 is null.; Host Details : local host is: xxx; destination host is: 
 yyy:8030;
 at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:761)
 at org.apache.hadoop.ipc.Client.call(Client.java:1239)
 at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
 at com.sun.proxy.$Proxy9.registerDatanode(Unknown Source)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
 at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
 at com.sun.proxy.$Proxy9.registerDatanode(Unknown Source)
 at 
 org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.registerDatanode(DatanodeProtocolClientSideTranslatorPB.java:146)
 at 
 org.apache.hadoop.hdfs.server.datanode.BPServiceActor.register(BPServiceActor.java:623)
 at 
 org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:225)
 at 
 org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:664)
 at java.lang.Thread.run(Thread.java:745)
 Caused by: java.io.IOException: Response is null.
 at 
 org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:949)
 at org.apache.hadoop.ipc.Client$Connection.run(Client.java:844)
 {noformat}
 This will create discrepancy between active NN and standby NN in terms of 
 live nodes.
  
 Here is a possible scenario of missing blocks after failover.
 1. DN A, B set up handshakes with active NN, but not with standby NN.
 2. A block is replicated to DN A, B and C.
 3. From standby NN's point of view, given A and B are dead nodes, the block 
 is under replicated.
 4. DN C is down.
 5. Before active NN detects DN C is down, it fails over.
 6. The new active NN considers the block is missing. Even though there are 
 two replicas on DN A and B.
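 
 One way to picture the desired behavior (a loose sketch only, not the committed 
 change, which instead adjusts the IPC retry handling): keep retrying the initial 
 handshake rather than ending the BPOfferService thread on the first IOException:
 {code}
 // Illustrative only: retry registration instead of giving up permanently.
 while (shouldRun()) {
   try {
     connectToNNAndHandshake();
     break; // registered with this NN successfully
   } catch (IOException ioe) {
     LOG.warn("Initial handshake with NN failed, will retry", ioe);
     sleepAndLogInterrupts(5000, "initial handshake");
   }
 }
 {code}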



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7832) Show 'Last Modified' in Namenode's 'Browse Filesystem'

2015-02-24 Thread Vinayakumar B (JIRA)
Vinayakumar B created HDFS-7832:
---

 Summary: Show 'Last Modified' in Namenode's 'Browse Filesystem'
 Key: HDFS-7832
 URL: https://issues.apache.org/jira/browse/HDFS-7832
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Reporter: Vinayakumar B
Assignee: Vinayakumar B


The new UI no longer shows the last modified time for a path while browsing.
This could be added to make browsing the file system more useful.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7805) NameNode recovery prompt should be printed on console

2015-02-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14334793#comment-14334793
 ] 

Hudson commented on HDFS-7805:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #114 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/114/])
HDFS-7805. NameNode recovery prompt should be printed on console (Surendra 
Singh Lilhore via Colin P. McCabe) (cmccabe: rev 
faaddb6ecb44cdc9ef82a2ab392f64fc2561e938)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/MetaRecoveryContext.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


 NameNode recovery prompt should be printed on console
 -

 Key: HDFS-7805
 URL: https://issues.apache.org/jira/browse/HDFS-7805
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.6.0
Reporter: surendra singh lilhore
Assignee: surendra singh lilhore
 Fix For: 2.7.0

 Attachments: HDFS-7805.patch, HDFS-7805_1.patch


 In my cluster the root logger is not the console, so when I run the namenode 
 recovery tool, the MetaRecoveryContext.java prompt message goes to the log file.
 It should actually be displayed on the console.
 Currently it is like this
 {code}
 LOG.info(prompt);
 {code}
 It should be 
 {code}
 System.err.print(prompt);
 {code}
 NameNode recovery prompt should be printed on console
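 
 A minimal sketch of the intended behavior (assuming we still want a record in the 
 log as well as the interactive prompt):
 {code}
 // Sketch: show the prompt on the operator's console and keep a copy in the log.
 System.err.print(prompt);
 LOG.info(prompt);
 {code}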



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7807) libhdfs htable.c: fix htable resizing, add unit test

2015-02-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14334790#comment-14334790
 ] 

Hudson commented on HDFS-7807:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #114 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/114/])
HDFS-7807. libhdfs htable.c: fix htable resizing, add unit test (cmccabe) 
(cmccabe: rev 585768667e443f56c2f97068276ec8768dc49cf8)
* hadoop-hdfs-project/hadoop-hdfs/src/CMakeLists.txt
* hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/test/test_htable.c
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/common/htable.c


 libhdfs htable.c: fix htable resizing, add unit test
 

 Key: HDFS-7807
 URL: https://issues.apache.org/jira/browse/HDFS-7807
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: native
Affects Versions: 2.7.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Fix For: 2.7.0

 Attachments: HDFS-7807.001.patch, HDFS-7807.002.patch


 libhdfs htable.c: fix htable resizing, add unit test



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7807) libhdfs htable.c: fix htable resizing, add unit test

2015-02-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14334976#comment-14334976
 ] 

Hudson commented on HDFS-7807:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #114 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/114/])
HDFS-7807. libhdfs htable.c: fix htable resizing, add unit test (cmccabe) 
(cmccabe: rev 585768667e443f56c2f97068276ec8768dc49cf8)
* hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/common/htable.c
* hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/test/test_htable.c
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* hadoop-hdfs-project/hadoop-hdfs/src/CMakeLists.txt


 libhdfs htable.c: fix htable resizing, add unit test
 

 Key: HDFS-7807
 URL: https://issues.apache.org/jira/browse/HDFS-7807
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: native
Affects Versions: 2.7.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Fix For: 2.7.0

 Attachments: HDFS-7807.001.patch, HDFS-7807.002.patch


 libhdfs htable.c: fix htable resizing, add unit test



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7805) NameNode recovery prompt should be printed on console

2015-02-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335000#comment-14335000
 ] 

Hudson commented on HDFS-7805:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2064 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2064/])
HDFS-7805. NameNode recovery prompt should be printed on console (Surendra 
Singh Lilhore via Colin P. McCabe) (cmccabe: rev 
faaddb6ecb44cdc9ef82a2ab392f64fc2561e938)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/MetaRecoveryContext.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


 NameNode recovery prompt should be printed on console
 -

 Key: HDFS-7805
 URL: https://issues.apache.org/jira/browse/HDFS-7805
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.6.0
Reporter: surendra singh lilhore
Assignee: surendra singh lilhore
 Fix For: 2.7.0

 Attachments: HDFS-7805.patch, HDFS-7805_1.patch


 In my cluster the root logger is not the console, so when I run the namenode 
 recovery tool, the MetaRecoveryContext.java prompt message goes to the log file.
 It should actually be displayed on the console.
 Currently it is like this
 {code}
 LOG.info(prompt);
 {code}
 It should be 
 {code}
 System.err.print(prompt);
 {code}
 NameNode recovery prompt should be printed on console



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7008) xlator should be closed upon exit from DFSAdmin#genericRefresh()

2015-02-24 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated HDFS-7008:
-
  Resolution: Fixed
   Fix Version/s: 2.7.0
Target Version/s: 2.7.0
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Committed this to trunk and branch-2. Thanks [~ted_yu] for your report and 
review.

 xlator should be closed upon exit from DFSAdmin#genericRefresh()
 

 Key: HDFS-7008
 URL: https://issues.apache.org/jira/browse/HDFS-7008
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Ted Yu
Assignee: Tsuyoshi OZAWA
Priority: Minor
 Fix For: 2.7.0

 Attachments: HDFS-7008.1.patch, HDFS-7008.2.patch


 {code}
 GenericRefreshProtocol xlator =
   new GenericRefreshProtocolClientSideTranslatorPB(proxy);
 // Refresh
 Collection<RefreshResponse> responses = xlator.refresh(identifier, args);
 {code}
 GenericRefreshProtocolClientSideTranslatorPB#close() should be called on 
 xlator before return.
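 
 The shape of the fix, sketched here with a {{finally}} block (the 
 {{printResponses}} call is a stand-in for whatever DFSAdmin does with the results, 
 not a real method name):
 {code}
 GenericRefreshProtocolClientSideTranslatorPB xlator =
     new GenericRefreshProtocolClientSideTranslatorPB(proxy);
 try {
   // Refresh
   Collection<RefreshResponse> responses = xlator.refresh(identifier, args);
   return printResponses(responses); // hypothetical follow-up handling
 } finally {
   xlator.close(); // release the underlying RPC proxy on every exit path
 }
 {code}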



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7805) NameNode recovery prompt should be printed on console

2015-02-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14334979#comment-14334979
 ] 

Hudson commented on HDFS-7805:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #114 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/114/])
HDFS-7805. NameNode recovery prompt should be printed on console (Surendra 
Singh Lilhore via Colin P. McCabe) (cmccabe: rev 
faaddb6ecb44cdc9ef82a2ab392f64fc2561e938)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/MetaRecoveryContext.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


 NameNode recovery prompt should be printed on console
 -

 Key: HDFS-7805
 URL: https://issues.apache.org/jira/browse/HDFS-7805
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.6.0
Reporter: surendra singh lilhore
Assignee: surendra singh lilhore
 Fix For: 2.7.0

 Attachments: HDFS-7805.patch, HDFS-7805_1.patch


 In my cluster the root logger is not the console, so when I run the namenode 
 recovery tool, the MetaRecoveryContext.java prompt message goes to the log file.
 It should actually be displayed on the console.
 Currently it is like this
 {code}
 LOG.info(prompt);
 {code}
 It should be 
 {code}
 System.err.print(prompt);
 {code}
 NameNode recovery prompt should be printed on console



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7009) Active NN and standby NN have different live nodes

2015-02-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14334978#comment-14334978
 ] 

Hudson commented on HDFS-7009:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #114 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/114/])
HDFS-7009. Active NN and standby NN have different live nodes. Contributed by 
Ming Ma. (cnauroth: rev 769507bd7a501929d9a2fd56c72c3f50673488a4)
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDatanodeProtocolRetryPolicy.java
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Client.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


 Active NN and standby NN have different live nodes
 --

 Key: HDFS-7009
 URL: https://issues.apache.org/jira/browse/HDFS-7009
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 2.6.0
Reporter: Ming Ma
Assignee: Ming Ma
 Fix For: 2.7.0

 Attachments: HDFS-7009-2.patch, HDFS-7009-3.patch, HDFS-7009-4.patch, 
 HDFS-7009.patch


 To follow up on https://issues.apache.org/jira/browse/HDFS-6478, in most 
 cases, given DN sends HB and BR to NN regularly, if a specific RPC call 
 fails, it isn't a big deal.
 However, there are cases where DN fails to register with NN during initial 
 handshake due to exceptions not covered by RPC client's connection retry. 
 When this happens, the DN won't talk to that NN until the DN restarts.
 {noformat}
 BPServiceActor
   public void run() {
     LOG.info(this + " starting to offer service");
     try {
       // init stuff
       try {
         // setup storage
         connectToNNAndHandshake();
       } catch (IOException ioe) {
         // Initial handshake, storage recovery or registration failed
         // End BPOfferService thread
         LOG.fatal("Initialization failed for block pool " + this, ioe);
         return;
       }
       initialized = true; // bp is initialized;

       while (shouldRun()) {
         try {
           offerService();
         } catch (Exception ex) {
           LOG.error("Exception in BPOfferService for " + this, ex);
           sleepAndLogInterrupts(5000, "offering service");
         }
       }
   ...
 {noformat}
 Here is an example of the call stack.
 {noformat}
 java.io.IOException: Failed on local exception: java.io.IOException: Response 
 is null.; Host Details : local host is: xxx; destination host is: 
 yyy:8030;
 at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:761)
 at org.apache.hadoop.ipc.Client.call(Client.java:1239)
 at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
 at com.sun.proxy.$Proxy9.registerDatanode(Unknown Source)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
 at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
 at com.sun.proxy.$Proxy9.registerDatanode(Unknown Source)
 at 
 org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.registerDatanode(DatanodeProtocolClientSideTranslatorPB.java:146)
 at 
 org.apache.hadoop.hdfs.server.datanode.BPServiceActor.register(BPServiceActor.java:623)
 at 
 org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:225)
 at 
 org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:664)
 at java.lang.Thread.run(Thread.java:745)
 Caused by: java.io.IOException: Response is null.
 at 
 org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:949)
 at org.apache.hadoop.ipc.Client$Connection.run(Client.java:844)
 {noformat}
 This will create discrepancy between active NN and standby NN in terms of 
 live nodes.
  
 Here is a possible scenario of missing blocks after failover.
 1. DN A, B set up handshakes with active NN, but not with standby NN.
 2. A block is replicated to DN A, B and C.
 3. From standby NN's point of view, given A and B are dead nodes, the block 
 is under replicated.
 4. DN C is down.
 5. Before active NN detects DN C is down, it fails over.
 6. The new active NN considers the block is missing. Even though there are 
 two replicas on DN A and B.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7008) xlator should be closed upon exit from DFSAdmin#genericRefresh()

2015-02-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14334984#comment-14334984
 ] 

Hudson commented on HDFS-7008:
--

FAILURE: Integrated in Hadoop-trunk-Commit #7186 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7186/])
HDFS-7008. xlator should be closed upon exit from DFSAdmin#genericRefresh(). 
(ozawa) (ozawa: rev b53fd7163bc3a4eef4632afb55e5513c7c592fcf)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSAdmin.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


 xlator should be closed upon exit from DFSAdmin#genericRefresh()
 

 Key: HDFS-7008
 URL: https://issues.apache.org/jira/browse/HDFS-7008
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Ted Yu
Assignee: Tsuyoshi OZAWA
Priority: Minor
 Fix For: 2.7.0

 Attachments: HDFS-7008.1.patch, HDFS-7008.2.patch


 {code}
 GenericRefreshProtocol xlator =
   new GenericRefreshProtocolClientSideTranslatorPB(proxy);
 // Refresh
 Collection<RefreshResponse> responses = xlator.refresh(identifier, args);
 {code}
 GenericRefreshProtocolClientSideTranslatorPB#close() should be called on 
 xlator before return.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7009) Active NN and standby NN have different live nodes

2015-02-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14334999#comment-14334999
 ] 

Hudson commented on HDFS-7009:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2064 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2064/])
HDFS-7009. Active NN and standby NN have different live nodes. Contributed by 
Ming Ma. (cnauroth: rev 769507bd7a501929d9a2fd56c72c3f50673488a4)
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDatanodeProtocolRetryPolicy.java
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Client.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


 Active NN and standby NN have different live nodes
 --

 Key: HDFS-7009
 URL: https://issues.apache.org/jira/browse/HDFS-7009
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 2.6.0
Reporter: Ming Ma
Assignee: Ming Ma
 Fix For: 2.7.0

 Attachments: HDFS-7009-2.patch, HDFS-7009-3.patch, HDFS-7009-4.patch, 
 HDFS-7009.patch


 To follow up on https://issues.apache.org/jira/browse/HDFS-6478, in most 
 cases, given DN sends HB and BR to NN regularly, if a specific RPC call 
 fails, it isn't a big deal.
 However, there are cases where DN fails to register with NN during initial 
 handshake due to exceptions not covered by RPC client's connection retry. 
 When this happens, the DN won't talk to that NN until the DN restarts.
 {noformat}
 BPServiceActor
   public void run() {
     LOG.info(this + " starting to offer service");
     try {
       // init stuff
       try {
         // setup storage
         connectToNNAndHandshake();
       } catch (IOException ioe) {
         // Initial handshake, storage recovery or registration failed
         // End BPOfferService thread
         LOG.fatal("Initialization failed for block pool " + this, ioe);
         return;
       }
       initialized = true; // bp is initialized;

       while (shouldRun()) {
         try {
           offerService();
         } catch (Exception ex) {
           LOG.error("Exception in BPOfferService for " + this, ex);
           sleepAndLogInterrupts(5000, "offering service");
         }
       }
   ...
 {noformat}
 Here is an example of the call stack.
 {noformat}
 java.io.IOException: Failed on local exception: java.io.IOException: Response 
 is null.; Host Details : local host is: xxx; destination host is: 
 yyy:8030;
 at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:761)
 at org.apache.hadoop.ipc.Client.call(Client.java:1239)
 at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
 at com.sun.proxy.$Proxy9.registerDatanode(Unknown Source)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
 at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
 at com.sun.proxy.$Proxy9.registerDatanode(Unknown Source)
 at 
 org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.registerDatanode(DatanodeProtocolClientSideTranslatorPB.java:146)
 at 
 org.apache.hadoop.hdfs.server.datanode.BPServiceActor.register(BPServiceActor.java:623)
 at 
 org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:225)
 at 
 org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:664)
 at java.lang.Thread.run(Thread.java:745)
 Caused by: java.io.IOException: Response is null.
 at 
 org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:949)
 at org.apache.hadoop.ipc.Client$Connection.run(Client.java:844)
 {noformat}
 This will create discrepancy between active NN and standby NN in terms of 
 live nodes.
  
 Here is a possible scenario of missing blocks after failover.
 1. DN A, B set up handshakes with active NN, but not with standby NN.
 2. A block is replicated to DN A, B and C.
 3. From standby NN's point of view, given A and B are dead nodes, the block 
 is under replicated.
 4. DN C is down.
 5. Before active NN detects DN C is down, it fails over.
 6. The new active NN considers the block is missing. Even though there are 
 two replicas on DN A and B.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7832) Show 'Last Modified' in Namenode's 'Browse Filesystem'

2015-02-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14334954#comment-14334954
 ] 

Hadoop QA commented on HDFS-7832:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12700483/HDFS-7832-001.patch
  against trunk revision b610c68.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  
org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9657//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9657//console

This message is automatically generated.

 Show 'Last Modified' in Namenode's 'Browse Filesystem'
 --

 Key: HDFS-7832
 URL: https://issues.apache.org/jira/browse/HDFS-7832
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Reporter: Vinayakumar B
Assignee: Vinayakumar B
 Attachments: HDFS-7832-001.patch


 The new UI no longer shows the last modified time for a path while browsing.
 This could be added to make browsing the file system more useful.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7807) libhdfs htable.c: fix htable resizing, add unit test

2015-02-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14334997#comment-14334997
 ] 

Hudson commented on HDFS-7807:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #2064 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2064/])
HDFS-7807. libhdfs htable.c: fix htable resizing, add unit test (cmccabe) 
(cmccabe: rev 585768667e443f56c2f97068276ec8768dc49cf8)
* hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/test/test_htable.c
* hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/common/htable.c
* hadoop-hdfs-project/hadoop-hdfs/src/CMakeLists.txt
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


 libhdfs htable.c: fix htable resizing, add unit test
 

 Key: HDFS-7807
 URL: https://issues.apache.org/jira/browse/HDFS-7807
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: native
Affects Versions: 2.7.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Fix For: 2.7.0

 Attachments: HDFS-7807.001.patch, HDFS-7807.002.patch


 libhdfs htable.c: fix htable resizing, add unit test



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7807) libhdfs htable.c: fix htable resizing, add unit test

2015-02-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14334904#comment-14334904
 ] 

Hudson commented on HDFS-7807:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #105 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/105/])
HDFS-7807. libhdfs htable.c: fix htable resizing, add unit test (cmccabe) 
(cmccabe: rev 585768667e443f56c2f97068276ec8768dc49cf8)
* hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/common/htable.c
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* hadoop-hdfs-project/hadoop-hdfs/src/CMakeLists.txt
* hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/test/test_htable.c


 libhdfs htable.c: fix htable resizing, add unit test
 

 Key: HDFS-7807
 URL: https://issues.apache.org/jira/browse/HDFS-7807
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: native
Affects Versions: 2.7.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Fix For: 2.7.0

 Attachments: HDFS-7807.001.patch, HDFS-7807.002.patch


 libhdfs htable.c: fix htable resizing, add unit test



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7009) Active NN and standby NN have different live nodes

2015-02-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14334906#comment-14334906
 ] 

Hudson commented on HDFS-7009:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #105 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/105/])
HDFS-7009. Active NN and standby NN have different live nodes. Contributed by 
Ming Ma. (cnauroth: rev 769507bd7a501929d9a2fd56c72c3f50673488a4)
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDatanodeProtocolRetryPolicy.java
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Client.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


 Active NN and standby NN have different live nodes
 --

 Key: HDFS-7009
 URL: https://issues.apache.org/jira/browse/HDFS-7009
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 2.6.0
Reporter: Ming Ma
Assignee: Ming Ma
 Fix For: 2.7.0

 Attachments: HDFS-7009-2.patch, HDFS-7009-3.patch, HDFS-7009-4.patch, 
 HDFS-7009.patch


 To follow up on https://issues.apache.org/jira/browse/HDFS-6478, in most 
 cases, given DN sends HB and BR to NN regularly, if a specific RPC call 
 fails, it isn't a big deal.
 However, there are cases where DN fails to register with NN during initial 
 handshake due to exceptions not covered by RPC client's connection retry. 
 When this happens, the DN won't talk to that NN until the DN restarts.
 {noformat}
 BPServiceActor
   public void run() {
     LOG.info(this + " starting to offer service");
     try {
       // init stuff
       try {
         // setup storage
         connectToNNAndHandshake();
       } catch (IOException ioe) {
         // Initial handshake, storage recovery or registration failed
         // End BPOfferService thread
         LOG.fatal("Initialization failed for block pool " + this, ioe);
         return;
       }
       initialized = true; // bp is initialized;

       while (shouldRun()) {
         try {
           offerService();
         } catch (Exception ex) {
           LOG.error("Exception in BPOfferService for " + this, ex);
           sleepAndLogInterrupts(5000, "offering service");
         }
       }
   ...
 {noformat}
 Here is an example of the call stack.
 {noformat}
 java.io.IOException: Failed on local exception: java.io.IOException: Response 
 is null.; Host Details : local host is: xxx; destination host is: 
 yyy:8030;
 at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:761)
 at org.apache.hadoop.ipc.Client.call(Client.java:1239)
 at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
 at com.sun.proxy.$Proxy9.registerDatanode(Unknown Source)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
 at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
 at com.sun.proxy.$Proxy9.registerDatanode(Unknown Source)
 at 
 org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.registerDatanode(DatanodeProtocolClientSideTranslatorPB.java:146)
 at 
 org.apache.hadoop.hdfs.server.datanode.BPServiceActor.register(BPServiceActor.java:623)
 at 
 org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:225)
 at 
 org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:664)
 at java.lang.Thread.run(Thread.java:745)
 Caused by: java.io.IOException: Response is null.
 at 
 org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:949)
 at org.apache.hadoop.ipc.Client$Connection.run(Client.java:844)
 {noformat}
 This will create discrepancy between active NN and standby NN in terms of 
 live nodes.
  
 Here is a possible scenario of missing blocks after failover.
 1. DN A, B set up handshakes with active NN, but not with standby NN.
 2. A block is replicated to DN A, B and C.
 3. From standby NN's point of view, given A and B are dead nodes, the block 
 is under replicated.
 4. DN C is down.
 5. Before active NN detects DN C is down, it fails over.
 6. The new active NN considers the block is missing. Even though there are 
 two replicas on DN A and B.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7805) NameNode recovery prompt should be printed on console

2015-02-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14334907#comment-14334907
 ] 

Hudson commented on HDFS-7805:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #105 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/105/])
HDFS-7805. NameNode recovery prompt should be printed on console (Surendra 
Singh Lilhore via Colin P. McCabe) (cmccabe: rev 
faaddb6ecb44cdc9ef82a2ab392f64fc2561e938)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/MetaRecoveryContext.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


 NameNode recovery prompt should be printed on console
 -

 Key: HDFS-7805
 URL: https://issues.apache.org/jira/browse/HDFS-7805
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.6.0
Reporter: surendra singh lilhore
Assignee: surendra singh lilhore
 Fix For: 2.7.0

 Attachments: HDFS-7805.patch, HDFS-7805_1.patch


 In my cluster the root logger is not the console, so when I run the namenode 
 recovery tool, the MetaRecoveryContext.java prompt message goes to the log file.
 It should actually be displayed on the console.
 Currently it is like this
 {code}
 LOG.info(prompt);
 {code}
 It should be 
 {code}
 System.err.print(prompt);
 {code}
 NameNode recovery prompt should be printed on console



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7807) libhdfs htable.c: fix htable resizing, add unit test

2015-02-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14334926#comment-14334926
 ] 

Hudson commented on HDFS-7807:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #2046 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2046/])
HDFS-7807. libhdfs htable.c: fix htable resizing, add unit test (cmccabe) 
(cmccabe: rev 585768667e443f56c2f97068276ec8768dc49cf8)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* hadoop-hdfs-project/hadoop-hdfs/src/CMakeLists.txt
* hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/common/htable.c
* hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/test/test_htable.c


 libhdfs htable.c: fix htable resizing, add unit test
 

 Key: HDFS-7807
 URL: https://issues.apache.org/jira/browse/HDFS-7807
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: native
Affects Versions: 2.7.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
 Fix For: 2.7.0

 Attachments: HDFS-7807.001.patch, HDFS-7807.002.patch


 libhdfs htable.c: fix htable resizing, add unit test



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7805) NameNode recovery prompt should be printed on console

2015-02-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14334929#comment-14334929
 ] 

Hudson commented on HDFS-7805:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #2046 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2046/])
HDFS-7805. NameNode recovery prompt should be printed on console (Surendra 
Singh Lilhore via Colin P. McCabe) (cmccabe: rev 
faaddb6ecb44cdc9ef82a2ab392f64fc2561e938)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/MetaRecoveryContext.java


 NameNode recovery prompt should be printed on console
 -

 Key: HDFS-7805
 URL: https://issues.apache.org/jira/browse/HDFS-7805
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.6.0
Reporter: surendra singh lilhore
Assignee: surendra singh lilhore
 Fix For: 2.7.0

 Attachments: HDFS-7805.patch, HDFS-7805_1.patch


 In my cluster the root logger is not the console, so when I run the namenode 
 recovery tool, the MetaRecoveryContext.java prompt message goes to the log file.
 It should actually be displayed on the console.
 Currently it is like this
 {code}
 LOG.info(prompt);
 {code}
 It should be 
 {code}
 System.err.print(prompt);
 {code}
 NameNode recovery prompt should be printed on console



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7009) Active NN and standby NN have different live nodes

2015-02-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14334928#comment-14334928
 ] 

Hudson commented on HDFS-7009:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #2046 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2046/])
HDFS-7009. Active NN and standby NN have different live nodes. Contributed by 
Ming Ma. (cnauroth: rev 769507bd7a501929d9a2fd56c72c3f50673488a4)
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDatanodeProtocolRetryPolicy.java
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Client.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


 Active NN and standby NN have different live nodes
 --

 Key: HDFS-7009
 URL: https://issues.apache.org/jira/browse/HDFS-7009
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 2.6.0
Reporter: Ming Ma
Assignee: Ming Ma
 Fix For: 2.7.0

 Attachments: HDFS-7009-2.patch, HDFS-7009-3.patch, HDFS-7009-4.patch, 
 HDFS-7009.patch


 To follow up on https://issues.apache.org/jira/browse/HDFS-6478, in most 
 cases, given DN sends HB and BR to NN regularly, if a specific RPC call 
 fails, it isn't a big deal.
 However, there are cases where DN fails to register with NN during initial 
 handshake due to exceptions not covered by RPC client's connection retry. 
 When this happens, the DN won't talk to that NN until the DN restarts.
 {noformat}
 BPServiceActor
   public void run() {
     LOG.info(this + " starting to offer service");
     try {
       // init stuff
       try {
         // setup storage
         connectToNNAndHandshake();
       } catch (IOException ioe) {
         // Initial handshake, storage recovery or registration failed
         // End BPOfferService thread
         LOG.fatal("Initialization failed for block pool " + this, ioe);
         return;
       }
       initialized = true; // bp is initialized;

       while (shouldRun()) {
         try {
           offerService();
         } catch (Exception ex) {
           LOG.error("Exception in BPOfferService for " + this, ex);
           sleepAndLogInterrupts(5000, "offering service");
         }
       }
   ...
 {noformat}
 Here is an example of the call stack.
 {noformat}
 java.io.IOException: Failed on local exception: java.io.IOException: Response 
 is null.; Host Details : local host is: xxx; destination host is: 
 yyy:8030;
 at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:761)
 at org.apache.hadoop.ipc.Client.call(Client.java:1239)
 at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
 at com.sun.proxy.$Proxy9.registerDatanode(Unknown Source)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
 at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
 at com.sun.proxy.$Proxy9.registerDatanode(Unknown Source)
 at 
 org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.registerDatanode(DatanodeProtocolClientSideTranslatorPB.java:146)
 at 
 org.apache.hadoop.hdfs.server.datanode.BPServiceActor.register(BPServiceActor.java:623)
 at 
 org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:225)
 at 
 org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:664)
 at java.lang.Thread.run(Thread.java:745)
 Caused by: java.io.IOException: Response is null.
 at 
 org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:949)
 at org.apache.hadoop.ipc.Client$Connection.run(Client.java:844)
 {noformat}
 This will create discrepancy between active NN and standby NN in terms of 
 live nodes.
  
 Here is a possible scenario of missing blocks after failover.
 1. DN A, B set up handshakes with active NN, but not with standby NN.
 2. A block is replicated to DN A, B and C.
 3. From standby NN's point of view, given A and B are dead nodes, the block 
 is under replicated.
 4. DN C is down.
 5. Before active NN detects DN C is down, it fails over.
 6. The new active NN considers the block is missing. Even though there are 
 two replicas on DN A and B.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7817) libhdfs3: fix strerror_r detection

2015-02-24 Thread Thanh Do (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335048#comment-14335048
 ] 

Thanh Do commented on HDFS-7817:


Hi [~cmccabe]. Thanks for pointing out the code. I was grepping the 
{{hadoop-hdfs}} folder but not {{hadoop-common}}. 

So this Jira is about using {{sys_errlist}} instead of {{strerror_r}} for 
libhdfs3 right?

 libhdfs3: fix strerror_r detection
 --

 Key: HDFS-7817
 URL: https://issues.apache.org/jira/browse/HDFS-7817
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: hdfs-client
Reporter: Colin Patrick McCabe

 The signature of strerror_r is not quite detected correctly in libhdfs3.  The 
 code assumes that {{int foo = strerror_r}} will fail to compile with the GNU 
 type signature, but this is not the case (C\+\+ will coerce the char* to an 
 int in this case).  Instead, we should do what the libhdfs {{terror}} 
 (threaded error) function does here.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7302) namenode -rollingUpgrade downgrade may finalize a rolling upgrade

2015-02-24 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335036#comment-14335036
 ] 

Tsz Wo Nicholas Sze commented on HDFS-7302:
---

 ... Can I remove all dependencies? ...

Yes, we should remove all dependencies since -rollingUpgrade downgrade is no 
longer a valid option.

 namenode -rollingUpgrade downgrade may finalize a rolling upgrade
 -

 Key: HDFS-7302
 URL: https://issues.apache.org/jira/browse/HDFS-7302
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Reporter: Tsz Wo Nicholas Sze
Assignee: Kai Sasaki
  Labels: document, hdfs
 Attachments: HADOOP-7302.1.patch


 The namenode startup option -rollingUpgrade downgrade is originally 
 designed for downgrading the cluster.  However, running namenode -rollingUpgrade 
 downgrade with the new software could result in finalizing the ongoing 
 rolling upgrade.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7645) Rolling upgrade is restoring blocks from trash multiple times

2015-02-24 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335069#comment-14335069
 ] 

Arpit Agarwal commented on HDFS-7645:
-

Hi [~ogikei], thank you for posting a patch. This fix looks incomplete.
# The trash must be restored on rollback. Fairly easy to fix this in the same 
function. If the rollback option was passed and previous exists we call 
{{doRollback}}. If previous does not exist, restore trash.
# On finalize, the trash directories must be deleted. I think this will be 
handled by {{signalRollingUpgrade}} but I'd have to check it to make sure.

TestDataNodeRollingUpgrade should flag both these issues.
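
As a sketch of the first point (following the description above, not the final
patch), the restore could be tied to the rollback path instead of the default path:
{code}
// Illustrative shape only.
if (startOpt == StartupOption.ROLLBACK) {
  if (sd.getPreviousDir().exists()) {
    doRollback(sd, nsInfo); // regular layout rollback
  } else {
    // Rolling-upgrade rollback: bring the trashed block files back.
    int restored = restoreBlockFilesFromTrash(getTrashRootDir(sd));
    LOG.info("Restored " + restored + " block files from trash.");
  }
}
// A normal (non-rollback) startup would no longer restore trash unconditionally.
{code}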

 Rolling upgrade is restoring blocks from trash multiple times
 -

 Key: HDFS-7645
 URL: https://issues.apache.org/jira/browse/HDFS-7645
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 2.6.0
Reporter: Nathan Roberts
Assignee: Keisuke Ogiwara
 Attachments: HDFS-7645.01.patch


 When performing an HDFS rolling upgrade, the trash directory is getting 
 restored twice when under normal circumstances it shouldn't need to be 
 restored at all. iiuc, the only time these blocks should be restored is if we 
 need to rollback a rolling upgrade. 
 On a busy cluster, this can cause significant and unnecessary block churn 
 both on the datanodes, and more importantly in the namenode.
 The two times this happens are:
 1) restart of DN onto new software
 {code}
   private void doTransition(DataNode datanode, StorageDirectory sd,
       NamespaceInfo nsInfo, StartupOption startOpt) throws IOException {
     if (startOpt == StartupOption.ROLLBACK && sd.getPreviousDir().exists()) {
       Preconditions.checkState(!getTrashRootDir(sd).exists(),
           sd.getPreviousDir() + " and " + getTrashRootDir(sd) +
           " should not both be present.");
       doRollback(sd, nsInfo); // rollback if applicable
     } else {
       // Restore all the files in the trash. The restored files are retained
       // during rolling upgrade rollback. They are deleted during rolling
       // upgrade downgrade.
       int restored = restoreBlockFilesFromTrash(getTrashRootDir(sd));
       LOG.info("Restored " + restored + " block files from trash.");
     }
 {code}
 2) When heartbeat response no longer indicates a rollingupgrade is in progress
 {code}
   /**
    * Signal the current rolling upgrade status as indicated by the NN.
    * @param inProgress true if a rolling upgrade is in progress
    */
   void signalRollingUpgrade(boolean inProgress) throws IOException {
     String bpid = getBlockPoolId();
     if (inProgress) {
       dn.getFSDataset().enableTrash(bpid);
       dn.getFSDataset().setRollingUpgradeMarker(bpid);
     } else {
       dn.getFSDataset().restoreTrash(bpid);
       dn.getFSDataset().clearRollingUpgradeMarker(bpid);
     }
   }
 {code}
 HDFS-6800 and HDFS-6981 were modifying this behavior making it not completely 
 clear whether this is somehow intentional. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7537) fsck is confusing when dfs.namenode.replication.min > 1 && missing replicas && NN restart

2015-02-24 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335054#comment-14335054
 ] 

Tsz Wo Nicholas Sze commented on HDFS-7537:
---

 In Allen’s comment, the Mock-up output shows status as HEALTHY when 
 numUnderMinimalRelicatedBlocks > 0. ...

I see.  Let's keep showing HEALTHY for the moment.  When 
numUnderMinimalRelicatedBlocks > 0 and there is no missing/corrupted block, all 
under minimal replicated blocks have at least one good replica so that they can 
be replicated and there is no data loss.  It makes sense to consider the file 
system as healthy.  Currently, we only have two statuses, HEALTHY and CORRUPT.  
In the future, we may want to add one more status for this case.

BTW, there is a typo: numUnderMinimalRelicatedBlocks should be 
numUnderMinimalReplicatedBlocks

 fsck is confusing when dfs.namenode.replication.min > 1 && missing replicas 
  && NN restart
 -

 Key: HDFS-7537
 URL: https://issues.apache.org/jira/browse/HDFS-7537
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Reporter: Allen Wittenauer
Assignee: GAO Rui
 Attachments: HDFS-7537.1.patch, dfs-min-2-fsck.png, dfs-min-2.png


 If minimum replication is set to 2 or higher and some of those replicas are 
 missing and the namenode restarts, it isn't always obvious that the missing 
 replicas are the reason why the namenode isn't leaving safemode.  We should 
 improve the output of fsck and the web UI to make it obvious that the missing 
 blocks are from unmet replicas vs. completely/totally missing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7645) Rolling upgrade is restoring blocks from trash multiple times

2015-02-24 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335072#comment-14335072
 ] 

Arpit Agarwal commented on HDFS-7645:
-

Also the restore from signalRollingUpgrade pointed out by Nathan can probably 
be deleted.

 Rolling upgrade is restoring blocks from trash multiple times
 -

 Key: HDFS-7645
 URL: https://issues.apache.org/jira/browse/HDFS-7645
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 2.6.0
Reporter: Nathan Roberts
Assignee: Keisuke Ogiwara
 Attachments: HDFS-7645.01.patch


 When performing an HDFS rolling upgrade, the trash directory is getting 
 restored twice when under normal circumstances it shouldn't need to be 
 restored at all. iiuc, the only time these blocks should be restored is if we 
 need to rollback a rolling upgrade. 
 On a busy cluster, this can cause significant and unnecessary block churn 
 both on the datanodes, and more importantly in the namenode.
 The two times this happens are:
 1) restart of DN onto new software
 {code}
   private void doTransition(DataNode datanode, StorageDirectory sd,
       NamespaceInfo nsInfo, StartupOption startOpt) throws IOException {
     if (startOpt == StartupOption.ROLLBACK && sd.getPreviousDir().exists()) {
       Preconditions.checkState(!getTrashRootDir(sd).exists(),
           sd.getPreviousDir() + " and " + getTrashRootDir(sd) +
           " should not both be present.");
       doRollback(sd, nsInfo); // rollback if applicable
     } else {
       // Restore all the files in the trash. The restored files are retained
       // during rolling upgrade rollback. They are deleted during rolling
       // upgrade downgrade.
       int restored = restoreBlockFilesFromTrash(getTrashRootDir(sd));
       LOG.info("Restored " + restored + " block files from trash.");
     }
 {code}
 2) When heartbeat response no longer indicates a rollingupgrade is in progress
 {code}
   /**
    * Signal the current rolling upgrade status as indicated by the NN.
    * @param inProgress true if a rolling upgrade is in progress
    */
   void signalRollingUpgrade(boolean inProgress) throws IOException {
     String bpid = getBlockPoolId();
     if (inProgress) {
       dn.getFSDataset().enableTrash(bpid);
       dn.getFSDataset().setRollingUpgradeMarker(bpid);
     } else {
       dn.getFSDataset().restoreTrash(bpid);
       dn.getFSDataset().clearRollingUpgradeMarker(bpid);
     }
   }
 {code}
 HDFS-6800 and HDFS-6981 were modifying this behavior making it not completely 
 clear whether this is somehow intentional. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7831) Fix the starting index and end condition of the loop in FileDiffList.findEarlierSnapshotBlocks()

2015-02-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335213#comment-14335213
 ] 

Hudson commented on HDFS-7831:
--

FAILURE: Integrated in Hadoop-trunk-Commit #7189 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7189/])
HDFS-7831. Fix the starting index and end condition of the loop in 
FileDiffList.findEarlierSnapshotBlocks(). Contributed by Konstantin Shvachko. 
(jing9: rev 73bcfa99af61e5202f030510db8954c17cba43cc)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/FileDiffList.java


 Fix the starting index and end condition of the loop in 
 FileDiffList.findEarlierSnapshotBlocks()
 

 Key: HDFS-7831
 URL: https://issues.apache.org/jira/browse/HDFS-7831
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Konstantin Shvachko
Assignee: Konstantin Shvachko
 Fix For: 2.7.0

 Attachments: HDFS-7831-01.patch


 Currently the loop in {{FileDiffList.findEarlierSnapshotBlocks()}} starts 
 from {{insertPoint + 1}}. It should start from {{insertPoint - 1}}. As noted 
 in [Jing's 
 comment|https://issues.apache.org/jira/browse/HDFS-7056?focusedCommentId=14333864page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14333864]
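 
 For reference, the corrected iteration would walk backwards from the insertion 
 point, roughly like this (simplified; the element type and accessor names are 
 approximate, not quoted from the patch):
 {code}
 for (int i = insertPoint - 1; i >= 0; i--) {
   BlockInfoContiguous[] blocks = diffs.get(i).getBlocks();
   if (blocks != null) {
     return blocks; // nearest earlier snapshot that recorded a block list
   }
 }
 return null; // no earlier snapshot carries blocks
 {code}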



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7830) DataNode does not release the volume lock when adding a volume fails.

2015-02-24 Thread Lei (Eddy) Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335216#comment-14335216
 ] 

Lei (Eddy) Xu commented on HDFS-7830:
-

[~cnauroth] Would you mind filing a separate JIRA and assigning it to me? Thanks!

 DataNode does not release the volume lock when adding a volume fails.
 -

 Key: HDFS-7830
 URL: https://issues.apache.org/jira/browse/HDFS-7830
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 2.6.0
Reporter: Lei (Eddy) Xu
Assignee: Lei (Eddy) Xu

 When there is a failure in the add-volume process, the {{in_use.lock}} is not 
 released. Also, doing another {{-reconfig}} to remove the new dir in order to 
 clean up doesn't remove the lock. lsof still shows the datanode holding on to the 
 lock file. 
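 
 A sketch of the missing cleanup (names such as {{prepareVolume}} are hypothetical, 
 not the eventual patch): if preparing the new volume fails partway, the freshly 
 locked storage directory should be unlocked before the error is propagated:
 {code}
 StorageDirectory sd = null;
 try {
   sd = prepareVolume(location); // hypothetical: takes in_use.lock and loads the dir
   addVolume(sd);
 } catch (IOException e) {
   if (sd != null) {
     sd.unlock(); // release in_use.lock so a later -reconfig or restart can proceed
   }
   throw e;
 }
 {code}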



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7435) PB encoding of block reports is very inefficient

2015-02-24 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335212#comment-14335212
 ] 

Jing Zhao commented on HDFS-7435:
-

Thanks for sharing the thoughts, [~daryn]. Why not post your current patch 
first so that we can also get a better understanding of why bumping the 
DN's min NN version is necessary?

 PB encoding of block reports is very inefficient
 

 Key: HDFS-7435
 URL: https://issues.apache.org/jira/browse/HDFS-7435
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode, namenode
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp
Priority: Critical
 Attachments: HDFS-7435.000.patch, HDFS-7435.001.patch, 
 HDFS-7435.002.patch, HDFS-7435.patch


 Block reports are encoded as a PB repeating long.  Repeating fields use an 
 {{ArrayList}} with default capacity of 10.  A block report containing tens or 
 hundreds of thousand of longs (3 for each replica) is extremely expensive 
 since the {{ArrayList}} must realloc many times.  Also, decoding repeating 
 fields will box the primitive longs which must then be unboxed.
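 
 To make the cost concrete, a toy comparison (plain Java, not Hadoop code) of boxed 
 decoding into a default-capacity list versus a pre-sized primitive buffer:
 {code}
 int n = 3 * 200_000; // 3 longs per replica, 200k replicas

 // Roughly what repeated-field decoding does today: boxing plus repeated realloc.
 List<Long> boxed = new ArrayList<>(); // default capacity 10, grows many times
 for (long v = 0; v < n; v++) {
   boxed.add(v); // each add boxes the primitive
 }

 // What a primitive, pre-sized representation avoids.
 long[] primitive = new long[n];
 for (int i = 0; i < n; i++) {
   primitive[i] = i;
 }
 {code}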



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7833) DataNode reconfiguration does not recalculate valid volumes required, based on configured failed volumes tolerated.

2015-02-24 Thread Chris Nauroth (JIRA)
Chris Nauroth created HDFS-7833:
---

 Summary: DataNode reconfiguration does not recalculate valid 
volumes required, based on configured failed volumes tolerated.
 Key: HDFS-7833
 URL: https://issues.apache.org/jira/browse/HDFS-7833
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 2.6.0
Reporter: Chris Nauroth
Assignee: Lei (Eddy) Xu


DataNode reconfiguration never recalculates 
{{FsDatasetImpl#validVolsRequired}}.  This may cause incorrect behavior of the 
{{dfs.datanode.failed.volumes.tolerated}} property if reconfiguration causes 
the DataNode to run with a different total number of volumes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7830) DataNode does not release the volume lock when adding a volume fails.

2015-02-24 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335230#comment-14335230
 ] 

Chris Nauroth commented on HDFS-7830:
-

Thank you, Eddy.  I filed HDFS-7833.

 DataNode does not release the volume lock when adding a volume fails.
 -

 Key: HDFS-7830
 URL: https://issues.apache.org/jira/browse/HDFS-7830
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 2.6.0
Reporter: Lei (Eddy) Xu
Assignee: Lei (Eddy) Xu

 When there is a failure in the add-volume process, the {{in_use.lock}} is not 
 released. Also, doing another {{-reconfig}} to remove the new dir in order to 
 clean up doesn't remove the lock. lsof still shows the datanode holding on to the 
 lock file. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7833) DataNode reconfiguration does not recalculate valid volumes required, based on configured failed volumes tolerated.

2015-02-24 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335228#comment-14335228
 ] 

Chris Nauroth commented on HDFS-7833:
-

This is a repeat of the comment I mentioned on HDFS-7830.  Thank you to 
[~eddyxu] for volunteering to take assignment of the issue.

Another potential problem that I've noticed in the DataNode reconfiguration 
code is that it never recalculates {{FsDatasetImpl#validVolsRequired}}. This is 
a final variable calculated as (# volumes configured) - (# volume failures 
tolerated):
{code}
this.validVolsRequired = volsConfigured - volFailuresTolerated;
{code}
If this variable is not updated for DataNode reconfigurations, then it could 
lead to some unexpected situations. For example:
# DataNode starts running with 6 volumes (all healthy) and 
{{dfs.datanode.failed.volumes.tolerated}} set to 2.
# {{FsDatasetImpl#validVolsRequired}} is set to 6 - 2 = 4.
# DataNode is reconfigured to run with 8 volumes (all still healthy).
# Now 3 volumes fail. The admin would expect the DataNode to abort, but there 
are 8 - 3 = 5 good volumes left, and {{FsDatasetImpl#validVolsRequired}} is 
still 4, so {{FsDatasetImpl#hasEnoughResource}} returns true.
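
A hedged sketch of the recalculation this comment argues for; the class and method names below are stand-ins rather than the actual FsDatasetImpl code or the eventual fix.
{code}
/**
 * Illustrative only: recompute the valid-volumes threshold whenever the
 * DataNode is reconfigured with a different number of volumes, instead of
 * capturing it once in a final field.
 */
public class VolumeFailureTracker {
  private final int volFailuresTolerated;   // dfs.datanode.failed.volumes.tolerated
  private volatile int validVolsRequired;   // no longer final

  public VolumeFailureTracker(int volsConfigured, int volFailuresTolerated) {
    this.volFailuresTolerated = volFailuresTolerated;
    this.validVolsRequired = volsConfigured - volFailuresTolerated;
  }

  /** Called after a reconfiguration adds or removes volumes. */
  public synchronized void onVolumesReconfigured(int volsConfigured) {
    this.validVolsRequired = volsConfigured - volFailuresTolerated;
  }

  /** Plays the role of FsDatasetImpl#hasEnoughResource in the scenario above. */
  public boolean hasEnoughResource(int healthyVolumes) {
    return healthyVolumes >= validVolsRequired;
  }

  public static void main(String[] args) {
    VolumeFailureTracker t = new VolumeFailureTracker(6, 2); // 6 volumes, tolerate 2
    t.onVolumesReconfigured(8);                               // reconfigured to 8 volumes
    // 3 volumes fail, leaving 5 healthy; the recalculated threshold (8 - 2 = 6)
    // now correctly reports "not enough", unlike the stale value of 4.
    System.out.println(t.hasEnoughResource(8 - 3));           // prints false
  }
}
{code}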



 DataNode reconfiguration does not recalculate valid volumes required, based 
 on configured failed volumes tolerated.
 ---

 Key: HDFS-7833
 URL: https://issues.apache.org/jira/browse/HDFS-7833
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 2.6.0
Reporter: Chris Nauroth
Assignee: Lei (Eddy) Xu

 DataNode reconfiguration never recalculates 
 {{FsDatasetImpl#validVolsRequired}}.  This may cause incorrect behavior of 
 the {{dfs.datanode.failed.volumes.tolerated}} property if reconfiguration 
 causes the DataNode to run with a different total number of volumes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7439) Add BlockOpResponseProto's message to DFSClient's exception message

2015-02-24 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335080#comment-14335080
 ] 

Tsz Wo Nicholas Sze commented on HDFS-7439:
---

 ... how can I rebuild for this patch? I can’t log in Jenkins WebUI.

I mean I have already started another Jenkins build for the patch.  For 
rebuilding, you may click "Cancel Patch" and then "Submit Patch".  That will 
trigger a new build.

 Add BlockOpResponseProto's message to DFSClient's exception message
 ---

 Key: HDFS-7439
 URL: https://issues.apache.org/jira/browse/HDFS-7439
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Ming Ma
Assignee: Takanobu Asanuma
Priority: Minor
 Attachments: HDFS-7439.1.patch


 When (BlockOpResponseProto#getStatus() != SUCCESS), it helps with debugging 
 if DFSClient can add BlockOpResponseProto's message to the exception message 
 applications will get. For example, instead of
 {noformat}
 throw new IOException("Got error for OP_READ_BLOCK, self="
 + peer.getLocalAddressString() + ", remote="
 + peer.getRemoteAddressString() + ", for file " + file
 + ", for pool " + block.getBlockPoolId() + " block " 
 + block.getBlockId() + "_" + block.getGenerationStamp());
 {noformat}
 It could be,
 {noformat}
 throw new IOException("Got error for OP_READ_BLOCK, self="
 + peer.getLocalAddressString() + ", remote="
 + peer.getRemoteAddressString() + ", for file " + file
 + ", for pool " + block.getBlockPoolId() + " block " 
 + block.getBlockId() + "_" + block.getGenerationStamp()
 + ", status message " + status.getMessage());
 {noformat}
 We might want to check out all the references to BlockOpResponseProto in 
 DFSClient.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7439) Add BlockOpResponseProto's message to DFSClient's exception message

2015-02-24 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335088#comment-14335088
 ] 

Tsz Wo Nicholas Sze commented on HDFS-7439:
---

There are other places having similar problem:
- DFSOutputStream.DataStreamer.createBlockOutputStream(..)
- DFSOutputStream.DataStreamer.transfer(..)
- RemoteBlockReader2.checkSuccess(..)
- Dispatcher.PendingMove.receiveResponse(..)
- DataXceiver.replaceBlock(..)

The code has a similar format:
{code}
if (status != SUCCESS) {
  if (status == Status.ERROR_ACCESS_TOKEN) {
    throw new InvalidBlockTokenException(..);
  } else {
    throw new IOException(..);
  }
}
{code}
How about we add a utility method?
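
A sketch of the sort of shared helper being suggested; the enum and response type below are simplified stand-ins for the real Status and BlockOpResponseProto classes, and the method name is hypothetical rather than a committed API.
{code}
import java.io.IOException;

public class BlockOpStatusCheck {
  enum Status { SUCCESS, ERROR, ERROR_ACCESS_TOKEN }

  /** Stand-in for BlockOpResponseProto: just a status and its message. */
  static class BlockOpResponse {
    final Status status;
    final String message;
    BlockOpResponse(Status status, String message) {
      this.status = status;
      this.message = message;
    }
  }

  static class InvalidBlockTokenException extends IOException {
    InvalidBlockTokenException(String msg) { super(msg); }
  }

  /** One place to turn a non-SUCCESS response into the appropriate exception. */
  static void checkBlockOpStatus(BlockOpResponse response, String logInfo)
      throws IOException {
    if (response.status != Status.SUCCESS) {
      String detail = "Got error"
          + ", status message " + response.message
          + ", " + logInfo;
      if (response.status == Status.ERROR_ACCESS_TOKEN) {
        throw new InvalidBlockTokenException(detail);
      } else {
        throw new IOException(detail);
      }
    }
  }

  public static void main(String[] args) throws IOException {
    // Each caller (createBlockOutputStream, transfer, checkSuccess, ...) would
    // pass its own context -- file, block, peer addresses -- via logInfo.
    checkBlockOpStatus(new BlockOpResponse(Status.SUCCESS, ""), "demo call");
    System.out.println("SUCCESS response passes the check");
  }
}
{code}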



 Add BlockOpResponseProto's message to DFSClient's exception message
 ---

 Key: HDFS-7439
 URL: https://issues.apache.org/jira/browse/HDFS-7439
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Ming Ma
Assignee: Takanobu Asanuma
Priority: Minor
 Attachments: HDFS-7439.1.patch


 When (BlockOpResponseProto#getStatus() != SUCCESS), it helps with debugging 
 if DFSClient can add BlockOpResponseProto's message to the exception message 
 applications will get. For example, instead of
 {noformat}
 throw new IOException("Got error for OP_READ_BLOCK, self="
 + peer.getLocalAddressString() + ", remote="
 + peer.getRemoteAddressString() + ", for file " + file
 + ", for pool " + block.getBlockPoolId() + " block " 
 + block.getBlockId() + "_" + block.getGenerationStamp());
 {noformat}
 It could be,
 {noformat}
 throw new IOException("Got error for OP_READ_BLOCK, self="
 + peer.getLocalAddressString() + ", remote="
 + peer.getRemoteAddressString() + ", for file " + file
 + ", for pool " + block.getBlockPoolId() + " block " 
 + block.getBlockId() + "_" + block.getGenerationStamp()
 + ", status message " + status.getMessage());
 {noformat}
 We might want to check out all the references to BlockOpResponseProto in 
 DFSClient.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7435) PB encoding of block reports is very inefficient

2015-02-24 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335128#comment-14335128
 ] 

Daryn Sharp commented on HDFS-7435:
---

Do I have the luxury of bumping the DN's minimum NN version?  That would 
greatly simplify the implementation.  It's easy for the NN to use the presence 
of the protobuf fields to determine whether a report is old or new.  However, the 
prior patches illustrate that it's not so easy for the DN to auto-detect.

I believe the standard upgrade procedure is: upgrade the NN, then rolling 
upgrade the DNs.  Per above, the upgraded NN supports old/new reports from DNs.  
The only scenario in which a problem can occur is when the cluster is fully or 
partially upgraded and the NN is then downgraded.  The new DNs won't be able to 
communicate with the old NN, which is why I'd like to bump the minimum version so 
the DN doesn't continue to send block reports that appear to be empty to the 
old NN.  I'd argue that if the NN is downgraded, there's going to be downtime, 
so you might as well roll back the DNs too.

Thoughts?
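
For context, a rough sketch of what "bumping the DN's minimum NN version" amounts to; the constant name, version strings, and comparison helper below are illustrative stand-ins, not the actual registration/handshake code.
{code}
public class MinNnVersionGate {
  // Raised so a DN that only emits the new block-report encoding refuses to
  // register with a NameNode that cannot decode it (hypothetical value).
  static final String MIN_SUPPORTED_NAMENODE_VERSION = "2.7.0";

  static boolean isAtLeast(String actual, String required) {
    String[] a = actual.split("\\."), r = required.split("\\.");
    for (int i = 0; i < Math.max(a.length, r.length); i++) {
      int ai = i < a.length ? Integer.parseInt(a[i]) : 0;
      int ri = i < r.length ? Integer.parseInt(r[i]) : 0;
      if (ai != ri) {
        return ai > ri;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    String nnVersion = "2.6.0";  // as reported by the NN during registration
    if (!isAtLeast(nnVersion, MIN_SUPPORTED_NAMENODE_VERSION)) {
      System.out.println("Refusing to register: NN " + nnVersion
          + " is older than the required " + MIN_SUPPORTED_NAMENODE_VERSION);
    }
  }
}
{code}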

 PB encoding of block reports is very inefficient
 

 Key: HDFS-7435
 URL: https://issues.apache.org/jira/browse/HDFS-7435
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode, namenode
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp
Priority: Critical
 Attachments: HDFS-7435.000.patch, HDFS-7435.001.patch, 
 HDFS-7435.002.patch, HDFS-7435.patch


 Block reports are encoded as a PB repeating long.  Repeating fields use an 
 {{ArrayList}} with default capacity of 10.  A block report containing tens or 
 hundreds of thousands of longs (3 for each replica) is extremely expensive 
 since the {{ArrayList}} must realloc many times.  Also, decoding repeating 
 fields will box the primitive longs which must then be unboxed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7831) Fix the starting index of the loop in FileDiffList.findEarlierSnapshotBlocks().

2015-02-24 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335192#comment-14335192
 ] 

Jing Zhao commented on HDFS-7831:
-

Thanks for the fix, [~shv]. +1. I will commit it shortly.

 Fix the starting index of the loop in 
 FileDiffList.findEarlierSnapshotBlocks().
 ---

 Key: HDFS-7831
 URL: https://issues.apache.org/jira/browse/HDFS-7831
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Konstantin Shvachko
Assignee: Konstantin Shvachko
 Attachments: HDFS-7831-01.patch


 Currently the loop in {{FileDiffList.findEarlierSnapshotBlocks()}} starts 
 from {{insertPoint + 1}}. It should start from {{insertPoint - 1}}. As noted 
 in [Jing's 
 comment|https://issues.apache.org/jira/browse/HDFS-7056?focusedCommentId=14333864page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14333864]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7789) DFsck should resolve the path to support cross-FS symlinks

2015-02-24 Thread Gera Shegalov (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335182#comment-14335182
 ] 

Gera Shegalov commented on HDFS-7789:
-

[~lohit], can you review this patch?

 DFsck should resolve the path to support cross-FS symlinks
 --

 Key: HDFS-7789
 URL: https://issues.apache.org/jira/browse/HDFS-7789
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: tools
Affects Versions: 2.6.0
Reporter: Gera Shegalov
Assignee: Gera Shegalov
 Attachments: HDFS-7789.001.patch


 DFsck should resolve the specified path such that it can be used in with 
 viewfs and other cross-filesystem symlinks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7830) DataNode does not release the volume lock when adding a volume fails.

2015-02-24 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335185#comment-14335185
 ] 

Chris Nauroth commented on HDFS-7830:
-

Hi [~eddyxu].  Another potential problem that I've noticed in the DataNode 
reconfiguration code is that it never recalculates 
{{FsDatasetImpl#validVolsRequired}}.  This is a {{final}} variable calculated 
as (# volumes configured) - (# volume failures tolerated):
{code}
this.validVolsRequired = volsConfigured - volFailuresTolerated;
{code}
If this variable is not updated for DataNode reconfigurations, then it could 
lead to some unexpected situations.  For example:
# DataNode starts running with 6 volumes (all healthy) and 
{{dfs.datanode.failed.volumes.tolerated}} set to 2.
# {{FsDatasetImpl#validVolsRequired}} is set to 6 - 2 = 4.
# DataNode is reconfigured to run with 8 volumes (all still healthy).
# Now 3 volumes fail.  The admin would expect the DataNode to abort, but there 
are 8 - 3 = 5 good volumes left, and {{FsDatasetImpl#validVolsRequired}} is 
still 4, so {{FsDatasetImpl#hasEnoughResource}} returns {{true}}.

Is this something that makes sense for you to address as part of the patch 
you're working on now, or would you prefer I file a separate jira to track 
this?  Thanks!

 DataNode does not release the volume lock when adding a volume fails.
 -

 Key: HDFS-7830
 URL: https://issues.apache.org/jira/browse/HDFS-7830
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 2.6.0
Reporter: Lei (Eddy) Xu
Assignee: Lei (Eddy) Xu

 When there is a failure in the add-volume process, the {{in_use.lock}} is not 
 released. Also, doing another {{-reconfig}} to remove the new dir in order to 
 clean up doesn't remove the lock. lsof still shows the datanode holding on to the 
 lock file. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7831) Fix the starting index and end condition of the loop in FileDiffList.findEarlierSnapshotBlocks()

2015-02-24 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-7831:

Summary: Fix the starting index and end condition of the loop in 
FileDiffList.findEarlierSnapshotBlocks()  (was: Fix the starting index of the 
loop in FileDiffList.findEarlierSnapshotBlocks().)

 Fix the starting index and end condition of the loop in 
 FileDiffList.findEarlierSnapshotBlocks()
 

 Key: HDFS-7831
 URL: https://issues.apache.org/jira/browse/HDFS-7831
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Konstantin Shvachko
Assignee: Konstantin Shvachko
 Attachments: HDFS-7831-01.patch


 Currently the loop in {{FileDiffList.findEarlierSnapshotBlocks()}} starts 
 from {{insertPoint + 1}}. It should start from {{insertPoint - 1}}. As noted 
 in [Jing's 
 comment|https://issues.apache.org/jira/browse/HDFS-7056?focusedCommentId=14333864page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14333864]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7831) Fix the starting index and end condition of the loop in FileDiffList.findEarlierSnapshotBlocks()

2015-02-24 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-7831:

   Resolution: Fixed
Fix Version/s: 2.7.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

I've committed this to trunk and branch-2.

 Fix the starting index and end condition of the loop in 
 FileDiffList.findEarlierSnapshotBlocks()
 

 Key: HDFS-7831
 URL: https://issues.apache.org/jira/browse/HDFS-7831
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Konstantin Shvachko
Assignee: Konstantin Shvachko
 Fix For: 2.7.0

 Attachments: HDFS-7831-01.patch


 Currently the loop in {{FileDiffList.findEarlierSnapshotBlocks()}} starts 
 from {{insertPoint + 1}}. It should start from {{insertPoint - 1}}. As noted 
 in [Jing's 
 comment|https://issues.apache.org/jira/browse/HDFS-7056?focusedCommentId=14333864page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14333864]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6133) Make Balancer support exclude specified path

2015-02-24 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335287#comment-14335287
 ] 

Yongjun Zhang commented on HDFS-6133:
-

Hi [~szetszwo],

Thanks for your explanation, and sorry for late reply. 

I agree with your assessment.  I wonder if we can update the config property 
description to say that enabling it is not recommended before the rolling upgrade 
is finished?

Thanks.

 

 Make Balancer support exclude specified path
 

 Key: HDFS-6133
 URL: https://issues.apache.org/jira/browse/HDFS-6133
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: balancer & mover, datanode
Reporter: zhaoyunjiong
Assignee: zhaoyunjiong
 Fix For: 2.7.0

 Attachments: HDFS-6133-1.patch, HDFS-6133-10.patch, 
 HDFS-6133-11.patch, HDFS-6133-2.patch, HDFS-6133-3.patch, HDFS-6133-4.patch, 
 HDFS-6133-5.patch, HDFS-6133-6.patch, HDFS-6133-7.patch, HDFS-6133-8.patch, 
 HDFS-6133-9.patch, HDFS-6133.patch


 Currently, running the Balancer will destroy the RegionServer's data locality.
 If getBlocks could exclude blocks belonging to files which have a specific path 
 prefix, like /hbase, then we could run the Balancer without destroying the 
 RegionServer's data locality.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7467) Provide storage tier information for a directory via fsck

2015-02-24 Thread Benoy Antony (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335606#comment-14335606
 ] 

Benoy Antony commented on HDFS-7467:


Thanks for the review [~szetszwo].
If there are no further comments, I'll commit the patch tomorrow.



 Provide storage tier information for a directory via fsck
 -

 Key: HDFS-7467
 URL: https://issues.apache.org/jira/browse/HDFS-7467
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: balancer & mover
Affects Versions: 2.6.0
Reporter: Benoy Antony
Assignee: Benoy Antony
 Attachments: HDFS-7467-002.patch, HDFS-7467-003.patch, 
 HDFS-7467-004.patch, HDFS-7467.patch, storagepolicydisplay.pdf


 Currently _fsck_  provides information regarding blocks for a directory.
 It should be augmented to provide storage tier information (optionally). 
 The sample report could be as follows:
 {code}
 Storage Tier Combination    # of blocks    % of blocks
 DISK:1,ARCHIVE:2                 340730       97.7393%
 ARCHIVE:3                          3928        1.1268%
 DISK:2,ARCHIVE:2                   3122        0.8956%
 DISK:2,ARCHIVE:1                    748        0.2146%
 DISK:1,ARCHIVE:3                     44        0.0126%
 DISK:3,ARCHIVE:2                     30        0.0086%
 DISK:3,ARCHIVE:1                      9        0.0026%
 {code}
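
As a hedged illustration of how such a summary could be produced (not the attached patch's code), fsck would aggregate each block's replica storage-type combination and report counts plus percentages; all names and numbers below are hypothetical.
{code}
import java.util.HashMap;
import java.util.Map;

public class StorageTierSummary {
  public static void main(String[] args) {
    // Each entry is one block's replica storage types, e.g. 1 DISK + 2 ARCHIVE.
    String[] blocks = {
        "DISK:1,ARCHIVE:2", "DISK:1,ARCHIVE:2", "ARCHIVE:3", "DISK:2,ARCHIVE:2"
    };

    // Count how many blocks share each storage tier combination.
    Map<String, Integer> counts = new HashMap<>();
    for (String combo : blocks) {
      counts.merge(combo, 1, Integer::sum);
    }

    // Print one line per combination with its share of all blocks.
    System.out.printf("%-24s %12s %12s%n",
        "Storage Tier Combination", "# of blocks", "% of blocks");
    for (Map.Entry<String, Integer> e : counts.entrySet()) {
      double pct = 100.0 * e.getValue() / blocks.length;
      System.out.printf("%-24s %12d %11.4f%%%n", e.getKey(), e.getValue(), pct);
    }
  }
}
{code}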
  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7668) Convert site documentation from apt to markdown

2015-02-24 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-7668:
---
 Target Version/s: 2.7.0  (was: 3.0.0)
Affects Version/s: (was: 3.0.0)
   2.7.0
Fix Version/s: (was: 3.0.0)
   2.7.0

 Convert site documentation from apt to markdown
 ---

 Key: HDFS-7668
 URL: https://issues.apache.org/jira/browse/HDFS-7668
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: documentation
Affects Versions: 2.7.0
Reporter: Allen Wittenauer
Assignee: Masatake Iwasaki
 Fix For: 2.7.0

 Attachments: HDFS-7668-00.patch, HDFS-7668-01.patch, 
 HDFS-7668-b2.001.patch


 HDFS analog to HADOOP-11495



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7537) fsck is confusing when dfs.namenode.replication.min > 1 && missing replicas && NN restart

2015-02-24 Thread GAO Rui (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14336111#comment-14336111
 ] 

GAO Rui commented on HDFS-7537:
---

I have attached a new patch which adds 
DFSConfigKeys.DFS_NAMENODE_REPLICATION_MIN_KEY to the output of fsck, along with a 
unit test to confirm this change. Please review it when you are free; thanks 
a lot.
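
As a rough, purely hypothetical illustration of the kind of output change described (not the actual patch), the fsck summary could carry an extra line with the configured minimal replication so operators can see why blocks are counted as below the minimum:
{code}
public class FsckMinReplicationLine {
  public static void main(String[] args) {
    int minReplication = 2;            // value of dfs.namenode.replication.min
    int underMinBlocks = 12;           // made-up count for the example
    StringBuilder report = new StringBuilder();
    report.append(" Minimal block replication:\t").append(minReplication).append('\n');
    report.append(" Blocks below minimal replication:\t").append(underMinBlocks).append('\n');
    System.out.print(report);
  }
}
{code}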

 fsck is confusing when dfs.namenode.replication.min > 1 && missing replicas 
  && NN restart
 -

 Key: HDFS-7537
 URL: https://issues.apache.org/jira/browse/HDFS-7537
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Reporter: Allen Wittenauer
Assignee: GAO Rui
 Attachments: HDFS-7537.1.patch, HDFS-7537.2.patch, 
 dfs-min-2-fsck.png, dfs-min-2.png


 If minimum replication is set to 2 or higher and some of those replicas are 
 missing and the namenode restarts, it isn't always obvious that the missing 
 replicas are the reason why the namenode isn't leaving safemode.  We should 
 improve the output of fsck and the web UI to make it obvious that the missing 
 blocks are from unmet replicas vs. completely/totally missing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7827) Erasure Coding: support striped blocks in non-protobuf fsimage

2015-02-24 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14336134#comment-14336134
 ] 

Jing Zhao commented on HDFS-7827:
-

Sure. Assign the jira to you. Thanks for working on this, Hui!

 Erasure Coding: support striped blocks in non-protobuf fsimage
 --

 Key: HDFS-7827
 URL: https://issues.apache.org/jira/browse/HDFS-7827
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Jing Zhao
Assignee: Hui Zheng

 HDFS-7749 only adds code to persist striped blocks to protobuf-based fsimage. 
 We should also add this support to the non-protobuf fsimage since it is still 
 used for use cases like offline image processing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7302) namenode -rollingUpgrade downgrade may finalize a rolling upgrade

2015-02-24 Thread Kai Sasaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kai Sasaki updated HDFS-7302:
-
Attachment: HDFS-7302.2.patch

 namenode -rollingUpgrade downgrade may finalize a rolling upgrade
 -

 Key: HDFS-7302
 URL: https://issues.apache.org/jira/browse/HDFS-7302
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Reporter: Tsz Wo Nicholas Sze
Assignee: Kai Sasaki
  Labels: document, hdfs
 Attachments: HADOOP-7302.1.patch, HDFS-7302.2.patch


 The namenode startup option -rollingUpgrade downgrade is originally 
 designed for downgrading the cluster.  However, running namenode -rollingUpgrade 
 downgrade with the new software could result in finalizing the ongoing 
 rolling upgrade.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HDFS-7749) Erasure Coding: Add striped block support in INodeFile

2015-02-24 Thread Hui Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hui Zheng reassigned HDFS-7749:
---

Assignee: Hui Zheng  (was: Jing Zhao)

 Erasure Coding: Add striped block support in INodeFile
 --

 Key: HDFS-7749
 URL: https://issues.apache.org/jira/browse/HDFS-7749
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Jing Zhao
Assignee: Hui Zheng
 Attachments: HDFS-7749.000.patch


 This jira plans to add a new INodeFile feature to store the striped block 
 information in case the INodeFile is erasure coded.
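
A hedged sketch of the shape such an INodeFile feature could take; the class, field, and method names below are hypothetical and only illustrate keeping striped block groups alongside a file, not the patch's actual API.
{code}
import java.util.ArrayList;
import java.util.List;

public class FileWithStripedBlocksFeatureSketch {
  /** Minimal stand-in for a striped block group (id plus data/parity layout). */
  static class StripedBlock {
    final long blockGroupId;
    final short dataBlocks;
    final short parityBlocks;
    StripedBlock(long id, short data, short parity) {
      this.blockGroupId = id;
      this.dataBlocks = data;
      this.parityBlocks = parity;
    }
  }

  private final List<StripedBlock> blocks = new ArrayList<>();

  void addBlock(StripedBlock b) {
    blocks.add(b);
  }

  StripedBlock lastBlock() {
    return blocks.isEmpty() ? null : blocks.get(blocks.size() - 1);
  }

  public static void main(String[] args) {
    FileWithStripedBlocksFeatureSketch f = new FileWithStripedBlocksFeatureSketch();
    f.addBlock(new StripedBlock(-16L, (short) 6, (short) 3));  // e.g. a 6+3 layout
    System.out.println("last block group id: " + f.lastBlock().blockGroupId);
  }
}
{code}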



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7537) fsck is confusing when dfs.namenode.replication.min > 1 && missing replicas && NN restart

2015-02-24 Thread GAO Rui (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

GAO Rui updated HDFS-7537:
--
Attachment: HDFS-7537.2.patch

 fsck is confusing when dfs.namenode.replication.min > 1 && missing replicas 
  && NN restart
 -

 Key: HDFS-7537
 URL: https://issues.apache.org/jira/browse/HDFS-7537
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Reporter: Allen Wittenauer
Assignee: GAO Rui
 Attachments: HDFS-7537.1.patch, HDFS-7537.2.patch, 
 dfs-min-2-fsck.png, dfs-min-2.png


 If minimum replication is set to 2 or higher and some of those replicas are 
 missing and the namenode restarts, it isn't always obvious that the missing 
 replicas are the reason why the namenode isn't leaving safemode.  We should 
 improve the output of fsck and the web UI to make it obvious that the missing 
 blocks are from unmet replicas vs. completely/totally missing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7827) Erasure Coding: support striped blocks in non-protobuf fsimage

2015-02-24 Thread Hui Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14336117#comment-14336117
 ] 

Hui Zheng commented on HDFS-7827:
-

Hi Jing
I would like to work on this jira. Could you assign it to me?

 Erasure Coding: support striped blocks in non-protobuf fsimage
 --

 Key: HDFS-7827
 URL: https://issues.apache.org/jira/browse/HDFS-7827
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Jing Zhao
Assignee: Jing Zhao

 HDFS-7749 only adds code to persist striped blocks to protobuf-based fsimage. 
 We should also add this support to the non-protobuf fsimage since it is still 
 used for use cases like offline image processing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7827) Erasure Coding: support striped blocks in non-protobuf fsimage

2015-02-24 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-7827:

Assignee: Hui Zheng  (was: Jing Zhao)

 Erasure Coding: support striped blocks in non-protobuf fsimage
 --

 Key: HDFS-7827
 URL: https://issues.apache.org/jira/browse/HDFS-7827
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Jing Zhao
Assignee: Hui Zheng

 HDFS-7749 only adds code to persist striped blocks to protobuf-based fsimage. 
 We should also add this support to the non-protobuf fsimage since it is still 
 used for use cases like offline image processing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HDFS-7749) Erasure Coding: Add striped block support in INodeFile

2015-02-24 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao reassigned HDFS-7749:
---

Assignee: Jing Zhao  (was: Hui Zheng)

 Erasure Coding: Add striped block support in INodeFile
 --

 Key: HDFS-7749
 URL: https://issues.apache.org/jira/browse/HDFS-7749
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Jing Zhao
Assignee: Jing Zhao
 Attachments: HDFS-7749.000.patch


 This jira plans to add a new INodeFile feature to store the striped block 
 information in case the INodeFile is erasure coded.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7439) Add BlockOpResponseProto's message to DFSClient's exception message

2015-02-24 Thread Takanobu Asanuma (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335824#comment-14335824
 ] 

Takanobu Asanuma commented on HDFS-7439:


Sorry for my misunderstanding. I understand how to rebuild now.
But the test failed again. Is my patch the cause?

I will also try to add a utility method. Thank you!

 Add BlockOpResponseProto's message to DFSClient's exception message
 ---

 Key: HDFS-7439
 URL: https://issues.apache.org/jira/browse/HDFS-7439
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Ming Ma
Assignee: Takanobu Asanuma
Priority: Minor
 Attachments: HDFS-7439.1.patch


 When (BlockOpResponseProto#getStatus() != SUCCESS), it helps with debugging 
 if DFSClient can add BlockOpResponseProto's message to the exception message 
 applications will get. For example, instead of
 {noformat}
 throw new IOException("Got error for OP_READ_BLOCK, self="
 + peer.getLocalAddressString() + ", remote="
 + peer.getRemoteAddressString() + ", for file " + file
 + ", for pool " + block.getBlockPoolId() + " block " 
 + block.getBlockId() + "_" + block.getGenerationStamp());
 {noformat}
 It could be,
 {noformat}
 throw new IOException("Got error for OP_READ_BLOCK, self="
 + peer.getLocalAddressString() + ", remote="
 + peer.getRemoteAddressString() + ", for file " + file
 + ", for pool " + block.getBlockPoolId() + " block " 
 + block.getBlockId() + "_" + block.getGenerationStamp()
 + ", status message " + status.getMessage());
 {noformat}
 We might want to check out all the references to BlockOpResponseProto in 
 DFSClient.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7839) Erasure coding: move EC policies from file header to XAttr

2015-02-24 Thread Zhe Zhang (JIRA)
Zhe Zhang created HDFS-7839:
---

 Summary: Erasure coding: move EC policies from file header to XAttr
 Key: HDFS-7839
 URL: https://issues.apache.org/jira/browse/HDFS-7839
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Zhe Zhang
Assignee: Zhe Zhang






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7537) fsck is confusing when dfs.namenode.replication.min > 1 && missing replicas && NN restart

2015-02-24 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335528#comment-14335528
 ] 

Allen Wittenauer commented on HDFS-7537:


Also: I'm not sure what to do about the web UI component.  It may not be 
necessary; the better practice would be to run fsck in situations like these.

 fsck is confusing when dfs.namenode.replication.min > 1 && missing replicas 
  && NN restart
 -

 Key: HDFS-7537
 URL: https://issues.apache.org/jira/browse/HDFS-7537
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Reporter: Allen Wittenauer
Assignee: GAO Rui
 Attachments: HDFS-7537.1.patch, dfs-min-2-fsck.png, dfs-min-2.png


 If minimum replication is set to 2 or higher and some of those replicas are 
 missing and the namenode restarts, it isn't always obvious that the missing 
 replicas are the reason why the namenode isn't leaving safemode.  We should 
 improve the output of fsck and the web UI to make it obvious that the missing 
 blocks are from unmet replicas vs. completely/totally missing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7836) BlockManager Scalability Improvements

2015-02-24 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-7836:

Issue Type: Improvement  (was: Bug)

 BlockManager Scalability Improvements
 -

 Key: HDFS-7836
 URL: https://issues.apache.org/jira/browse/HDFS-7836
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Charles Lamb
Assignee: Charles Lamb

 Improvements to BlockManager scalability.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7722) DataNode#checkDiskError should also remove Storage when error is found.

2015-02-24 Thread Lei (Eddy) Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335752#comment-14335752
 ] 

Lei (Eddy) Xu commented on HDFS-7722:
-

{{TestDataNodeVolumeFailureReporting}} is relevant. I will work on fixing it.

 DataNode#checkDiskError should also remove Storage when error is found.
 ---

 Key: HDFS-7722
 URL: https://issues.apache.org/jira/browse/HDFS-7722
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 2.6.0
Reporter: Lei (Eddy) Xu
Assignee: Lei (Eddy) Xu
 Attachments: HDFS-7722.000.patch, HDFS-7722.001.patch


 When {{DataNode#checkDiskError}} finds disk errors, it removes all block 
 metadata from {{FsDatasetImpl}}. However, it does not remove the 
 corresponding {{DataStorage}} and {{BlockPoolSliceStorage}}. 
 The result is that we cannot directly run {{reconfig}} to hot swap the 
 failed disks without changing the configuration file. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7668) Convert site documentation from apt to markdown

2015-02-24 Thread Masatake Iwasaki (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335753#comment-14335753
 ] 

Masatake Iwasaki commented on HDFS-7668:


Thanks, [~cmccabe]!

 Convert site documentation from apt to markdown
 ---

 Key: HDFS-7668
 URL: https://issues.apache.org/jira/browse/HDFS-7668
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: documentation
Affects Versions: 2.7.0
Reporter: Allen Wittenauer
Assignee: Masatake Iwasaki
 Fix For: 2.7.0

 Attachments: HDFS-7668-00.patch, HDFS-7668-01.patch, 
 HDFS-7668-b2.001.patch


 HDFS analog to HADOOP-11495



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7838) Expose truncate API for libhdfs

2015-02-24 Thread Yi Liu (JIRA)
Yi Liu created HDFS-7838:


 Summary: Expose truncate API for libhdfs
 Key: HDFS-7838
 URL: https://issues.apache.org/jira/browse/HDFS-7838
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: 2.7.0
Reporter: Yi Liu
Assignee: Yi Liu


It's good to expose truncate in libhdfs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7495) Remove updatePosition argument from DFSInputStream#getBlockAt()

2015-02-24 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335809#comment-14335809
 ] 

Yi Liu commented on HDFS-7495:
--

Good catch.
+1 for the latest patch. Thanks Ted and Colin.

 Remove updatePosition argument from DFSInputStream#getBlockAt()
 ---

 Key: HDFS-7495
 URL: https://issues.apache.org/jira/browse/HDFS-7495
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Ted Yu
Assignee: Colin Patrick McCabe
Priority: Minor
 Attachments: HDFS-7495.002.patch, hdfs-7495-001.patch


 There're two locks: one on DFSInputStream.this, one on 
 DFSInputStream.infoLock.
 Normally a lock is obtained on DFSInputStream.this first, then on DFSInputStream.infoLock.
 However, such order is not observed in DFSInputStream#getBlockAt():
 {code}
 synchronized(infoLock) {
 ...
   if (updatePosition) {
 // synchronized not strictly needed, since we only get here
 // from synchronized caller methods
 synchronized(this) {
 {code}
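
A hedged sketch of the direction the title suggests (not the committed change): drop the updatePosition flag so getBlockAt() only needs infoLock, and let the already-synchronized callers update the position themselves. The class below is a greatly simplified stand-in for DFSInputStream.
{code}
public class LockOrderSketch {
  private final Object infoLock = new Object();
  private long pos;           // guarded by "this"
  private long blockStart;    // guarded by infoLock (simplified)

  // Before: getBlockAt(offset, updatePosition) took infoLock and then, when
  // updatePosition was true, also synchronized(this) inside it, reversing the
  // usual this -> infoLock order.
  long getBlockAt(long offset) {
    synchronized (infoLock) {                 // only infoLock is needed now
      blockStart = (offset / 128) * 128;      // stand-in for locating the block
      return blockStart;
    }
  }

  synchronized void seekToBlock(long offset) {
    long start = getBlockAt(offset);  // "this" is held first, then infoLock
    pos = start;                      // the caller updates the position
  }

  public static void main(String[] args) {
    LockOrderSketch s = new LockOrderSketch();
    s.seekToBlock(300);
    System.out.println("pos = " + s.pos);     // 256 with the stand-in math
  }
}
{code}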



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7835) make initial sleeptime in locateFollowingBlock configurable for DFSClient.

2015-02-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335590#comment-14335590
 ] 

Hadoop QA commented on HDFS-7835:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12700586/HDFS-7835.000.patch
  against trunk revision 9a37247.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

  {color:red}-1 javac{color}.  The applied patch generated 1155 javac 
compiler warnings (more than the trunk's current 185 warnings).

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 
47 warning messages.
See 
https://builds.apache.org/job/PreCommit-HDFS-Build/9659//artifact/patchprocess/diffJavadocWarnings.txt
 for details.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9659//testReport/
Javac warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9659//artifact/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9659//console

This message is automatically generated.

 make initial sleeptime in locateFollowingBlock configurable for DFSClient.
 --

 Key: HDFS-7835
 URL: https://issues.apache.org/jira/browse/HDFS-7835
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: dfsclient
Reporter: zhihai xu
Assignee: zhihai xu
 Attachments: HDFS-7835.000.patch


 Make initial sleeptime in locateFollowingBlock configurable for DFSClient.
 Currently the sleeptime/localTimeout in locateFollowingBlock/completeFile from 
 DFSOutputStream is hard-coded as 400 ms, but the number of retries can be 
 configured via dfs.client.block.write.locateFollowingBlock.retries. We should 
 also make the initial sleeptime configurable to give users more flexibility to 
 control both the retries and the delay.
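
A minimal sketch of how a configurable initial delay could feed the existing retry loop; the initial-delay key name below is a guess rather than the property the patch introduces, and system properties stand in for the Hadoop Configuration.
{code}
public class LocateFollowingBlockRetrySketch {
  private static int calls = 0;

  /** Stand-in for the complete-file RPC that may not succeed immediately. */
  private static boolean tryCompleteFile() {
    return ++calls >= 3;   // pretend the third attempt succeeds
  }

  public static void main(String[] args) throws InterruptedException {
    int retries = Integer.getInteger(
        "dfs.client.block.write.locateFollowingBlock.retries", 5);
    long sleepMs = Long.getLong(
        "dfs.client.block.write.locateFollowingBlock.initial.delay.ms", 400L);

    for (int attempt = 1; attempt <= retries; attempt++) {
      if (tryCompleteFile()) {
        System.out.println("completed after " + attempt + " attempt(s)");
        return;
      }
      Thread.sleep(sleepMs);
      sleepMs *= 2;   // the delay doubles between attempts, as it does today
    }
    System.out.println("gave up after " + retries + " attempts");
  }
}
{code}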



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-1447) Make getGenerationStampFromFile() more efficient, so it doesn't reprocess full directory listing for every block

2015-02-24 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated HDFS-1447:
---
Status: Patch Available  (was: Open)

 Make getGenerationStampFromFile() more efficient, so it doesn't reprocess 
 full directory listing for every block
 

 Key: HDFS-1447
 URL: https://issues.apache.org/jira/browse/HDFS-1447
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode
Affects Versions: 0.20.2
Reporter: Matt Foley
Assignee: Matt Foley
 Attachments: HDFS-1447.patch, Test_HDFS_1447_NotForCommitt.java.patch


 Make getGenerationStampFromFile() more efficient. Currently this routine is 
 called by addToReplicasMap() for every blockfile in the directory tree, and 
 it walks each file's containing directory on every call. There is a simple 
 refactoring that should make it more efficient.
 This work item is one of four sub-tasks for HDFS-1443, Improve Datanode 
 startup time.
 The fix will probably be folded into sibling task HDFS-1446, which is already 
 refactoring the method that calls getGenerationStampFromFile().



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7668) Convert site documentation from apt to markdown

2015-02-24 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335714#comment-14335714
 ] 

Colin Patrick McCabe commented on HDFS-7668:


+1 for the backport. Thanks, [~iwasakims].

 Convert site documentation from apt to markdown
 ---

 Key: HDFS-7668
 URL: https://issues.apache.org/jira/browse/HDFS-7668
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: documentation
Affects Versions: 3.0.0
Reporter: Allen Wittenauer
Assignee: Masatake Iwasaki
 Fix For: 3.0.0

 Attachments: HDFS-7668-00.patch, HDFS-7668-01.patch, 
 HDFS-7668-b2.001.patch


 HDFS analog to HADOOP-11495



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7837) Erasure Coding: allocate and persist striped blocks in FSNamesystem

2015-02-24 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-7837:

Summary: Erasure Coding: allocate and persist striped blocks in 
FSNamesystem  (was: Allocate and persist striped blocks in FSNamesystem)

 Erasure Coding: allocate and persist striped blocks in FSNamesystem
 ---

 Key: HDFS-7837
 URL: https://issues.apache.org/jira/browse/HDFS-7837
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Jing Zhao
Assignee: Jing Zhao

 Try to finish the remaining work from HDFS-7339 (except the 
 ClientProtocol/DFSClient part):
 # Allow FSNamesystem#getAdditionalBlock to create striped blocks and persist 
 striped blocks to editlog
 # Update FSImage for max allocated striped block ID
 # Update the block commit/complete logic in BlockManager



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager

2015-02-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335738#comment-14335738
 ] 

Hadoop QA commented on HDFS-7411:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12700583/hdfs-7411.011.patch
  against trunk revision 9a37247.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 6 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9658//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9658//console

This message is automatically generated.

 Refactor and improve decommissioning logic into DecommissionManager
 ---

 Key: HDFS-7411
 URL: https://issues.apache.org/jira/browse/HDFS-7411
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.5.1
Reporter: Andrew Wang
Assignee: Andrew Wang
 Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, 
 hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch, 
 hdfs-7411.006.patch, hdfs-7411.007.patch, hdfs-7411.008.patch, 
 hdfs-7411.009.patch, hdfs-7411.010.patch, hdfs-7411.011.patch


 Would be nice to split out decommission logic from DatanodeManager to 
 DecommissionManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7837) Erasure Coding: allocate and persist striped blocks in FSNamesystem

2015-02-24 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-7837:

Attachment: HDFS-7837.000.patch

Patch depending on HDFS-7749. The patch also includes a unit test to make sure 
the striped blocks are correctly written to and loaded from the editlog and fsimage.

 Erasure Coding: allocate and persist striped blocks in FSNamesystem
 ---

 Key: HDFS-7837
 URL: https://issues.apache.org/jira/browse/HDFS-7837
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Jing Zhao
Assignee: Jing Zhao
 Attachments: HDFS-7837.000.patch


 Try to finish the remaining work from HDFS-7339 (except the 
 ClientProtocol/DFSClient part):
 # Allow FSNamesystem#getAdditionalBlock to create striped blocks and persist 
 striped blocks to editlog
 # Update FSImage for max allocated striped block ID
 # Update the block commit/complete logic in BlockManager



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7763) fix zkfc hung issue due to not catching exception in a corner case

2015-02-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335675#comment-14335675
 ] 

Hudson commented on HDFS-7763:
--

FAILURE: Integrated in Hadoop-trunk-Commit #7193 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7193/])
HDFS-7763. fix zkfc hung issue due to not catching exception in a corner case. 
Contributed by Liang Xie. (wang: rev 7105ebaa9f370db04962a1e19a67073dc080433b)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSZKFailoverController.java


 fix zkfc hung issue due to not catching exception in a corner case
 --

 Key: HDFS-7763
 URL: https://issues.apache.org/jira/browse/HDFS-7763
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: ha
Affects Versions: 2.6.0
Reporter: Liang Xie
Assignee: Liang Xie
 Fix For: 2.7.0

 Attachments: HDFS-7763-001.txt, HDFS-7763-002.txt, jstack.4936


 In our production cluster, both zkfc processes hung after a ZK 
 network outage.
 The zkfc log said:
 {code}
 2015-02-07,17:40:11,875 INFO org.apache.zookeeper.ClientCnxn: Client session 
 timed out, have not heard from server in 3334ms for sessionid 
 0x4a61bacdd9dfb2, closing socket connection and attempting reconnect
 2015-02-07,17:40:11,977 FATAL org.apache.hadoop.ha.ActiveStandbyElector: 
 Received stat error from Zookeeper. code:CONNECTIONLOSS. Not retrying further 
 znode monitoring connection errors.
 2015-02-07,17:40:12,425 INFO org.apache.zookeeper.ZooKeeper: Session: 
 0x4a61bacdd9dfb2 closed
 2015-02-07,17:40:12,425 FATAL org.apache.hadoop.ha.ZKFailoverController: 
 Fatal error occurred:Received stat error from Zookeeper. code:CONNECTIONLOSS. 
 Not retrying further znode monitoring connection errors.
 2015-02-07,17:40:12,425 INFO org.apache.hadoop.ipc.Server: Stopping server on 
 11300
 2015-02-07,17:40:12,425 WARN org.apache.hadoop.ha.ActiveStandbyElector: 
 Ignoring stale result from old client with sessionId 0x4a61bacdd9dfb2
 2015-02-07,17:40:12,426 WARN org.apache.hadoop.ha.ActiveStandbyElector: 
 Ignoring stale result from old client with sessionId 0x4a61bacdd9dfb2
 2015-02-07,17:40:12,426 WARN org.apache.hadoop.ha.ActiveStandbyElector: 
 Ignoring stale result from old client with sessionId 0x4a61bacdd9dfb2
 2015-02-07,17:40:12,426 WARN org.apache.hadoop.ha.ActiveStandbyElector: 
 Ignoring stale result from old client with sessionId 0x4a61bacdd9dfb2
 2015-02-07,17:40:12,426 WARN org.apache.hadoop.ha.ActiveStandbyElector: 
 Ignoring stale result from old client with sessionId 0x4a61bacdd9dfb2
 2015-02-07,17:40:12,426 WARN org.apache.hadoop.ha.ActiveStandbyElector: 
 Ignoring stale result from old client with sessionId 0x4a61bacdd9dfb2
 2015-02-07,17:40:12,426 WARN org.apache.hadoop.ha.ActiveStandbyElector: 
 Ignoring stale result from old client with sessionId 0x4a61bacdd9dfb2
 2015-02-07,17:40:12,426 WARN org.apache.hadoop.ha.ActiveStandbyElector: 
 Ignoring stale result from old client with sessionId 0x4a61bacdd9dfb2
 2015-02-07,17:40:12,426 WARN org.apache.hadoop.ha.ActiveStandbyElector: 
 Ignoring stale result from old client with sessionId 0x4a61bacdd9dfb2
 2015-02-07,17:40:12,426 WARN org.apache.hadoop.ha.ActiveStandbyElector: 
 Ignoring stale result from old client with sessionId 0x4a61bacdd9dfb2
 2015-02-07,17:40:12,426 WARN org.apache.hadoop.ha.ActiveStandbyElector: 
 Ignoring stale result from old client with sessionId 0x4a61bacdd9dfb2
 2015-02-07,17:40:12,426 WARN org.apache.hadoop.ha.ActiveStandbyElector: 
 Ignoring stale result from old client with sessionId 0x4a61bacdd9dfb2
 2015-02-07,17:40:12,426 INFO org.apache.zookeeper.ClientCnxn: EventThread 
 shut down
 2015-02-07,17:40:12,426 INFO org.apache.hadoop.ha.ActiveStandbyElector: 
 Yielding from election
 2015-02-07,17:40:12,426 INFO org.apache.hadoop.ipc.Server: Stopping IPC 
 Server Responder
 2015-02-07,17:40:12,426 INFO org.apache.hadoop.ha.HealthMonitor: Stopping 
 HealthMonitor thread
 2015-02-07,17:40:12,426 INFO org.apache.hadoop.ipc.Server: Stopping IPC 
 Server listener on 11300
 {code}
 The thread dump is also uploaded as an attachment.
 From the dump, we can see that because of the unknown non-daemon 
 threads (pool-*-thread-*), the process did not exit, but the critical threads, 
 like the health monitor and RPC threads, had been stopped, so our 
 watchdog (supervisord) did not observe that the zkfc process was down or 
 abnormal, and the subsequent namenode failover could not be done as expected.
 There are two possible fixes here: 1) figure out where the unnamed threads, like 
 pool-7-thread-1, came from and close them or set the daemon property (I 
 tried to search but got nothing so far); 2) catch the exception from 
 ZKFailoverController.run() so we 
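
As a hedged sketch, the second fix mentioned (catching the exception from ZKFailoverController.run()) amounts to making sure any error escaping the run path forces the process to exit, so the watchdog can notice and restart it; the names below are simplified stand-ins, not the committed DFSZKFailoverController change.
{code}
public class ZkfcRunGuardSketch {
  interface FailoverController {
    int run() throws Exception;
  }

  static int runGuarded(FailoverController zkfc) {
    try {
      return zkfc.run();
    } catch (Throwable t) {
      // Without this, a fatal error can stop the critical threads while stray
      // non-daemon threads keep the JVM alive and the watchdog stays blind.
      System.err.println("Got a fatal error, exiting now: " + t);
      return 1;   // the real fix would terminate the process here
    }
  }

  public static void main(String[] args) {
    int rc = runGuarded(() -> {
      throw new IllegalStateException("zk connection loss");
    });
    System.exit(rc);
  }
}
{code}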

[jira] [Commented] (HDFS-7836) BlockManager Scalability Improvements

2015-02-24 Thread Charles Lamb (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335747#comment-14335747
 ] 

Charles Lamb commented on HDFS-7836:


Problem Statement

The number of blocks stored by the largest HDFS clusters continues to increase. 
 This increase adds pressure to the BlockManager, that part of the NameNode 
which handles block data from across the cluster.

Full block reports are problematic.  The more blocks each DataNode has, the 
longer it takes to process a full block report from that DataNode.  Storage 
densities have roughly doubled each year for the past few years.  Meanwhile, 
increases in CPU power have come mostly in the form of additional cores rather 
than faster clock speeds.  Currently, the NameNode cannot use these additional 
cores because full block reports are processed while holding the namesystem 
lock.

The BlockManager stores all blocks in memory and this contributes to a large 
heap size.  As the NameNode Java heap size has grown, full garbage collection 
events have started to take several minutes.  Although it is often possible to 
avoid full GCs by re-using Java objects, they remain an operational concern for 
administrators.  They also contribute to a long NameNode startup time, 
sometimes measured in tens of minutes for the biggest clusters.


Goals
We need to improve the BlockManager to handle the challenges of the next few 
years.  Our specific goals for this project are to:

* Reduce lock contention for the FSNamesystem lock
* Enable concurrent processing of block reports
* Reduce the Java heap size of the NameNode
* Optimize the use of network resources

[~cmccabe] and I will be working on this Jira. We propose doing this work on a 
separate branch. If there is interest in a community meeting to discuss these 
changes, then perhaps Tuesday 3/10/15 at Cloudera in Palo Alto, CA would work? 
I suggest that date because I will be in the bay area that day and would like 
to meet with other interested community members in person. I'll also be around 
3/11 and 3/12 if we need an alternate date.


 BlockManager Scalability Improvements
 -

 Key: HDFS-7836
 URL: https://issues.apache.org/jira/browse/HDFS-7836
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Charles Lamb
Assignee: Charles Lamb

 Improvements to BlockManager scalability.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7722) DataNode#checkDiskError should also remove Storage when error is found.

2015-02-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335745#comment-14335745
 ] 

Hadoop QA commented on HDFS-7722:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12700593/HDFS-7722.001.patch
  against trunk revision 9a37247.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 5 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.server.datanode.TestFsDatasetCache
  
org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting

  The following test timeouts occurred in 
hadoop-hdfs-project/hadoop-hdfs:

org.apache.hadoop.hdfs.qjournal.client.TestQJMWithFaults

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/9660//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9660//console

This message is automatically generated.

 DataNode#checkDiskError should also remove Storage when error is found.
 ---

 Key: HDFS-7722
 URL: https://issues.apache.org/jira/browse/HDFS-7722
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 2.6.0
Reporter: Lei (Eddy) Xu
Assignee: Lei (Eddy) Xu
 Attachments: HDFS-7722.000.patch, HDFS-7722.001.patch


 When {{DataNode#checkDiskError}} finds disk errors, it removes all block 
 metadata from {{FsDatasetImpl}}. However, it does not remove the 
 corresponding {{DataStorage}} and {{BlockPoolSliceStorage}}. 
 The result is that we cannot directly run {{reconfig}} to hot swap the 
 failed disks without changing the configuration file. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

