[jira] [Commented] (HDFS-8483) Erasure coding: test DataNode reporting bad/corrupted blocks which belong to a striped block.
[ https://issues.apache.org/jira/browse/HDFS-8483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14629092#comment-14629092 ] Hadoop QA commented on HDFS-8483: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12745565/HDFS-8483.0.patch | | Optional Tests | javac unit findbugs checkstyle | | git revision | trunk / 3ec0a04 | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11724/console | This message was automatically generated. Erasure coding: test DataNode reporting bad/corrupted blocks which belong to a striped block. -- Key: HDFS-8483 URL: https://issues.apache.org/jira/browse/HDFS-8483 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Takanobu Asanuma Assignee: Takanobu Asanuma Fix For: HDFS-7285 Attachments: HDFS-8483.0.patch We can mimic one/several DataNode(s) reporting bad block(s) (which belong to a striped block) to the NameNode (through the DatanodeProtocol#reportBadBlocks call), and check if the recovery/invalidation work can be correctly scheduled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8202) Improve end to end striping file test to add erasure recovery test
[ https://issues.apache.org/jira/browse/HDFS-8202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinwei Qin updated HDFS-8202: -- Attachment: HDFS-8202-HDFS-7285.003.patch [~zhz], I have updated the patch to include tests that read and write an EC file with failures; please help to review. Improve end to end striping file test to add erasure recovery test - Key: HDFS-8202 URL: https://issues.apache.org/jira/browse/HDFS-8202 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kai Zheng Assignee: Xinwei Qin Attachments: HDFS-8202-HDFS-7285.003.patch, HDFS-8202.001.patch, HDFS-8202.002.patch This follows on HDFS-8201 to add an erasure recovery test to the end-to-end striping file test: * After writing certain blocks to the test file, delete some block files; * Read the file content back and compare, to check for any recovery issues and verify that erasure recovery works. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
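The write/delete/read-back workflow described in the two bullets above can be illustrated with a tiny, self-contained XOR-parity simulation in plain Java (the class and method names are illustrative only, not the HDFS-7285 codec API): write data units, compute a parity unit, "delete" one unit, rebuild it from the survivors, and compare.

```java
import java.util.Arrays;

public class XorRecoveryDemo {
    // Compute a single XOR parity unit over all data units.
    static byte[] parity(byte[][] data) {
        byte[] p = new byte[data[0].length];
        for (byte[] unit : data)
            for (int i = 0; i < p.length; i++) p[i] ^= unit[i];
        return p;
    }

    // Reconstruct the unit at index `lost` from the surviving units plus parity.
    static byte[] recover(byte[][] data, byte[] parity, int lost) {
        byte[] r = parity.clone();
        for (int u = 0; u < data.length; u++) {
            if (u == lost) continue;
            for (int i = 0; i < r.length; i++) r[i] ^= data[u][i];
        }
        return r;
    }

    public static void main(String[] args) {
        byte[][] data = { {1, 2, 3}, {4, 5, 6}, {7, 8, 9} };
        byte[] p = parity(data);
        byte[] rebuilt = recover(data, p, 1);            // pretend unit 1 was deleted
        System.out.println(Arrays.equals(rebuilt, data[1]));  // true
    }
}
```

A real end-to-end test does the same comparison against the file content read through the client, with the actual RS codec in place of XOR.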
[jira] [Updated] (HDFS-8483) Erasure coding: test DataNode reporting bad/corrupted blocks which belong to a striped block.
[ https://issues.apache.org/jira/browse/HDFS-8483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takanobu Asanuma updated HDFS-8483: --- Status: Patch Available (was: Open) Erasure coding: test DataNode reporting bad/corrupted blocks which belong to a striped block. -- Key: HDFS-8483 URL: https://issues.apache.org/jira/browse/HDFS-8483 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Takanobu Asanuma Assignee: Takanobu Asanuma Fix For: HDFS-7285 Attachments: HDFS-8483.0.patch We can mimic one/several DataNode(s) reporting bad block(s) (which belong to a striped block) to the NameNode (through the DatanodeProtocol#reportBadBlocks call), and check if the recovery/invalidation work can be correctly scheduled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8483) Erasure coding: test DataNode reporting bad/corrupted blocks which belong to a striped block.
[ https://issues.apache.org/jira/browse/HDFS-8483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takanobu Asanuma updated HDFS-8483: --- Attachment: HDFS-8483.0.patch I uploaded an initial patch. Please review it. Erasure coding: test DataNode reporting bad/corrupted blocks which belong to a striped block. -- Key: HDFS-8483 URL: https://issues.apache.org/jira/browse/HDFS-8483 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Takanobu Asanuma Assignee: Takanobu Asanuma Fix For: HDFS-7285 Attachments: HDFS-8483.0.patch We can mimic one/several DataNode(s) reporting bad block(s) (which belong to a striped block) to the NameNode (through the DatanodeProtocol#reportBadBlocks call), and check if the recovery/invalidation work can be correctly scheduled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
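As a rough model of the decision the NameNode has to make when bad internal blocks are reported for a striped block group (names here are hypothetical; the real logic lives in the branch's BlockManager): with an RS(6,3) schema, recovery work can still be scheduled as long as at least 6 of the 9 internal blocks remain healthy.

```java
public class StripedRecoveryCheck {
    static final int DATA_UNITS = 6, PARITY_UNITS = 3;   // RS(6,3) schema

    // After `reportedBad` distinct internal blocks of one group are reported
    // corrupt, recovery is schedulable iff enough healthy blocks remain to
    // decode the group, i.e. at least DATA_UNITS of them.
    static boolean recoverable(int reportedBad) {
        return DATA_UNITS + PARITY_UNITS - reportedBad >= DATA_UNITS;
    }

    public static void main(String[] args) {
        System.out.println(recoverable(3));  // true: 6 healthy internal blocks left
        System.out.println(recoverable(4));  // false: only 5 remain, group is lost
    }
}
```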
[jira] [Updated] (HDFS-8787) Erasure coding: rename BlockInfoContiguousUC and BlockInfoStripedUC to be consistent with trunk
[ https://issues.apache.org/jira/browse/HDFS-8787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang updated HDFS-8787: Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: HDFS-7285 Status: Resolved (was: Patch Available) Thanks Jing for reviewing! {{TestEditLog}} passes fine locally. I just committed the patch to EC branch. Erasure coding: rename BlockInfoContiguousUC and BlockInfoStripedUC to be consistent with trunk --- Key: HDFS-8787 URL: https://issues.apache.org/jira/browse/HDFS-8787 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: HDFS-7285 Reporter: Zhe Zhang Assignee: Zhe Zhang Fix For: HDFS-7285 Attachments: HDFS-8787-HDFS-7285.00.patch As Nicholas suggested under HDFS-8728, we should split the patch on {{BlockInfo}} structure into smaller pieces. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8767) RawLocalFileSystem.listStatus() returns null for UNIX pipefile
[ https://issues.apache.org/jira/browse/HDFS-8767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14629184#comment-14629184 ] kanaka kumar avvaru commented on HDFS-8767: --- Looks fine to me. Thanks for updating the test [~wheat9]. I will post a patch for the pending checkstyle issues. RawLocalFileSystem.listStatus() returns null for UNIX pipefile -- Key: HDFS-8767 URL: https://issues.apache.org/jira/browse/HDFS-8767 Project: Hadoop HDFS Issue Type: Bug Reporter: Haohui Mai Assignee: kanaka kumar avvaru Priority: Critical Attachments: HDFS-8767-00.patch, HDFS-8767-01.patch, HDFS-8767-02.patch, HDFS-8767.003.patch Calling FileSystem.listStatus() on a UNIX pipe file returns null instead of the file. The bug breaks Hive when Hive loads data from a UNIX pipe file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8688) replace shouldCheckForEnoughRacks with hasClusterEverBeenMultiRack
[ https://issues.apache.org/jira/browse/HDFS-8688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14629210#comment-14629210 ] Ming Ma commented on HDFS-8688: --- Thanks [~walter.k.su]! Overall it looks good. {{ScriptBasedMapping#isSingleSwitch}} still checks if the script name is null. But it appears that method is only used by test code. Does that mean {{AbstractDNSToSwitchMapping#isSingleSwitch}} isn't necessary anymore? Trying to understand whether all of the "script name == null implies single rack" assumptions in the code can be removed. replace shouldCheckForEnoughRacks with hasClusterEverBeenMultiRack -- Key: HDFS-8688 URL: https://issues.apache.org/jira/browse/HDFS-8688 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Walter Su Assignee: Walter Su Attachments: HDFS-8688.01.patch, HDFS-8688.02.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7613) Block placement policy for erasure coding groups
[ https://issues.apache.org/jira/browse/HDFS-7613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14629070#comment-14629070 ] Ming Ma commented on HDFS-7613: --- Interesting work. There are a couple of issues around the extensibility of block placement policies. They aren't EC specific, so we don't have to tackle them here, but we would like to raise them if it helps make future refactoring easier. * Balancer and Mover have a built-in assumption that the default block placement policy is in use. So every time we have a new block placement policy, we need to modify those tools. One suggestion mentioned in HDFS-1431 is to run Balancer and Mover inside NN. * BlockManager has built-in assumptions about the rack policy in functions such as useDelHint and blockHasEnoughRacks. That means when we have a new block placement policy, we need to modify BlockManager to account for it. HDFS-8647 should improve that. * Ability to reuse or compose new policies based on existing block placement policies. This is different from HDFS-4894, which is about supporting different policies for different files. For example, HDFS-7541 adds an upgrade domain policy. It would be nice if we could support both the upgrade domain policy and the EC policy for a given file without any code change at run time. https://issues.apache.org/jira/secure/attachment/12687808/SupportforfastHDFSdatanoderollingupgrade.pdf's "Support for non-topology based policy" section suggested a more flexible API. * As we add new policies, migration from an old policy to a new policy on production clusters becomes necessary. [~ctrezzo] has worked in this area and plans to open a new jira for that. 
Block placement policy for erasure coding groups Key: HDFS-7613 URL: https://issues.apache.org/jira/browse/HDFS-7613 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Zhe Zhang Assignee: Walter Su Attachments: HDFS-7613.001.patch Blocks in an erasure coding group should be placed in different failure domains -- different DataNodes at the minimum, and different racks ideally. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
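The "reuse or compose new policies" point in the comment above can be sketched as a composite constraint, where each placement policy is a predicate over a candidate node and the nodes already chosen. These are purely illustrative interfaces, not the actual BlockPlacementPolicy API:

```java
import java.util.List;
import java.util.function.BiPredicate;

public class CompositePlacementDemo {
    // A placement constraint: may `candidate` be added given the nodes already
    // chosen for the block group? (Stand-in for the real pluggable policy API.)
    interface Constraint extends BiPredicate<String, List<String>> {}

    // Compose policies: a candidate is acceptable only if every policy agrees.
    static Constraint allOf(Constraint... cs) {
        return (candidate, chosen) -> {
            for (Constraint c : cs)
                if (!c.test(candidate, chosen)) return false;
            return true;
        };
    }

    public static void main(String[] args) {
        // Toy node id format: "rack/upgradeDomain/host".
        Constraint distinctRacks = (cand, chosen) ->
            chosen.stream().noneMatch(n -> n.split("/")[0].equals(cand.split("/")[0]));
        Constraint distinctDomains = (cand, chosen) ->
            chosen.stream().noneMatch(n -> n.split("/")[1].equals(cand.split("/")[1]));

        Constraint upgradeDomainPlusEc = allOf(distinctRacks, distinctDomains);
        System.out.println(upgradeDomainPlusEc.test("r2/d1/h3", List.of("r1/d0/h1")));  // true
        System.out.println(upgradeDomainPlusEc.test("r1/d2/h4", List.of("r1/d0/h1")));  // false: same rack
    }
}
```

The composition means a file could opt into "upgrade domain + EC" placement without either policy knowing about the other, which is the run-time flexibility the comment asks for.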
[jira] [Commented] (HDFS-8202) Improve end to end striping file test to add erasure recovery test
[ https://issues.apache.org/jira/browse/HDFS-8202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14629094#comment-14629094 ] Xinwei Qin commented on HDFS-8202: --- Hi, [~zhz], thanks for your clarification. I will move the HDFS-8259 and HDFS-8260 patches here. Improve end to end striping file test to add erasure recovery test - Key: HDFS-8202 URL: https://issues.apache.org/jira/browse/HDFS-8202 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kai Zheng Assignee: Xinwei Qin Attachments: HDFS-8202.001.patch, HDFS-8202.002.patch This follows on HDFS-8201 to add an erasure recovery test to the end-to-end striping file test: * After writing certain blocks to the test file, delete some block files; * Read the file content back and compare, to check for any recovery issues and verify that erasure recovery works. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8776) Decom manager should not be active on standby
[ https://issues.apache.org/jira/browse/HDFS-8776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14629236#comment-14629236 ] Ming Ma commented on HDFS-8776: --- Makes sense. There might be some operational impact with disabling DecommissionManager on standby. Admins usually update dfs.namenode.hosts.exclude and then call dfsadmin -refreshNodes on both active and standby around the same time; that way, if the NN fails over, decommissioning can continue. If DecommissionManager isn't running on standby, nodes will stay in decommission_inprogress state without any progress on the standby. As long as admins know to ignore decommission state on standby, that should be ok (even if we keep DecommissionManager running, decommission states between active and standby could be different at any given time). Decom manager should not be active on standby - Key: HDFS-8776 URL: https://issues.apache.org/jira/browse/HDFS-8776 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.6.0 Reporter: Daryn Sharp Assignee: Daryn Sharp The decommission manager should not be actively processing on the standby. The decomm manager goes through the costly computation of determining that every block on the node requires replication, yet doesn't queue them for replication because it's in standby. The decomm manager is holding the namesystem write lock, causing DNs to time out on heartbeats or IBRs; the NN purges the call queue of timed-out clients, and the NN processes some heartbeats/IBRs before the decomm manager locks up the namesystem again. Nodes attempting to register will be sending full BRs, which are more costly to send and discard than a heartbeat. If a failover is required, the standby will likely have to struggle very hard to not GC while catching up on its queued IBRs while DNs continue to fill the call queue and time out. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HDFS-8313) Erasure Coding: DFSStripedOutputStream#close throws a NullPointerException in some cases
[ https://issues.apache.org/jira/browse/HDFS-8313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Bo resolved HDFS-8313. - Resolution: Cannot Reproduce Erasure Coding: DFSStripedOutputStream#close throws a NullPointerException in some cases Key: HDFS-8313 URL: https://issues.apache.org/jira/browse/HDFS-8313 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Yi Liu Assignee: Li Bo {code} java.io.IOException: java.lang.NullPointerException at org.apache.hadoop.hdfs.DataStreamer$LastException.check(DataStreamer.java:193) at org.apache.hadoop.hdfs.DFSStripedOutputStream.closeImpl(DFSStripedOutputStream.java:422) {code} DFSStripedOutputStream#close throws a NullPointerException in some cases -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8670) Better to exclude decommissioned nodes for namenode NodeUsage JMX
[ https://issues.apache.org/jira/browse/HDFS-8670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14629123#comment-14629123 ] J.Andreina commented on HDFS-8670: -- Testcase failures are not related to this patch. Better to exclude decommissioned nodes for namenode NodeUsage JMX - Key: HDFS-8670 URL: https://issues.apache.org/jira/browse/HDFS-8670 Project: Hadoop HDFS Issue Type: Bug Reporter: Ming Ma Assignee: J.Andreina Attachments: HDFS-8670.1.patch, HDFS-8670.2.patch, HDFS-8670.3.patch The namenode NodeUsage JMX has Max, Median, Min and Standard Deviation of DataNodes usage, it currently includes decommissioned nodes for the calculation. However, given balancer doesn't work on decommissioned nodes and sometimes we could have nodes stay in decommissioned states for a long time; it might be better to exclude decommissioned nodes for the metrics calculation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
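The effect on the NodeUsage metrics described above is easy to see with a small, made-up calculation (plain Java; the usage percentages and node counts are hypothetical):

```java
import java.util.Arrays;

public class NodeUsageStats {
    // Median of the given usage percentages.
    static double median(double[] usages) {
        double[] s = usages.clone();
        Arrays.sort(s);
        int n = s.length;
        return n % 2 == 1 ? s[n / 2] : (s[n / 2 - 1] + s[n / 2]) / 2;
    }

    static double max(double[] usages) {
        return Arrays.stream(usages).max().orElse(0);
    }

    public static void main(String[] args) {
        // Live nodes hover around 60%; two long-decommissioned nodes sit at 95%
        // because the balancer never touches them.
        double[] all  = { 58, 60, 62, 95, 95 };
        double[] live = { 58, 60, 62 };
        System.out.println(max(all)  + " / " + median(all));   // 95.0 / 62.0
        System.out.println(max(live) + " / " + median(live));  // 62.0 / 60.0
    }
}
```

Including the stuck decommissioned nodes makes the cluster look far more imbalanced (Max 95%) than it actually is for the nodes the balancer can act on.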
[jira] [Assigned] (HDFS-8784) BlockInfo#numNodes should be numStorages
[ https://issues.apache.org/jira/browse/HDFS-8784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kanaka kumar avvaru reassigned HDFS-8784: - Assignee: kanaka kumar avvaru BlockInfo#numNodes should be numStorages Key: HDFS-8784 URL: https://issues.apache.org/jira/browse/HDFS-8784 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.7.1 Reporter: Zhe Zhang Assignee: kanaka kumar avvaru The method actually returns the number of storages holding a block. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8670) Better to exclude decommissioned nodes for namenode NodeUsage JMX
[ https://issues.apache.org/jira/browse/HDFS-8670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14629217#comment-14629217 ] Ming Ma commented on HDFS-8670: --- Thanks [~andreina]. Overall it looks good. The two test cases have quite an amount of overlap. It seems the only differences between the decommissioned test case and the decommission_inprogress test case are the number of datanodes and the expected decommission state the test cases wait for. Maybe these two test cases could call a common function that takes the number of datanodes and the expected decommission state as parameters? Better to exclude decommissioned nodes for namenode NodeUsage JMX - Key: HDFS-8670 URL: https://issues.apache.org/jira/browse/HDFS-8670 Project: Hadoop HDFS Issue Type: Bug Reporter: Ming Ma Assignee: J.Andreina Attachments: HDFS-8670.1.patch, HDFS-8670.2.patch, HDFS-8670.3.patch The namenode NodeUsage JMX has Max, Median, Min and Standard Deviation of DataNodes usage, it currently includes decommissioned nodes for the calculation. However, given balancer doesn't work on decommissioned nodes and sometimes we could have nodes stay in decommissioned states for a long time; it might be better to exclude decommissioned nodes for the metrics calculation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8787) Erasure coding: rename BlockInfoContiguousUC and BlockInfoStripedUC to be consistent with trunk
[ https://issues.apache.org/jira/browse/HDFS-8787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14629075#comment-14629075 ] Hadoop QA commented on HDFS-8787: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 15m 28s | Findbugs (version ) appears to be broken on HDFS-7285. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 12 new or modified test files. | | {color:green}+1{color} | javac | 7m 38s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 30s | There were no new javadoc warning messages. | | {color:red}-1{color} | release audit | 0m 15s | The applied patch generated 1 release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 37s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 3s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 34s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 3m 23s | The patch appears to introduce 5 new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | native | 3m 16s | Pre-build of native portion | | {color:red}-1{color} | hdfs tests | 169m 53s | Tests failed in hadoop-hdfs. 
| | | | 212m 45s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-hdfs | | Failed unit tests | hadoop.hdfs.server.namenode.TestFileTruncate | | Timed out tests | org.apache.hadoop.hdfs.server.namenode.TestEditLog | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12745540/HDFS-8787-HDFS-7285.00.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | HDFS-7285 / 7e091de | | Release Audit | https://builds.apache.org/job/PreCommit-HDFS-Build/11723/artifact/patchprocess/patchReleaseAuditProblems.txt | | Findbugs warnings | https://builds.apache.org/job/PreCommit-HDFS-Build/11723/artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11723/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/11723/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11723/console | This message was automatically generated. Erasure coding: rename BlockInfoContiguousUC and BlockInfoStripedUC to be consistent with trunk --- Key: HDFS-8787 URL: https://issues.apache.org/jira/browse/HDFS-8787 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: HDFS-7285 Reporter: Zhe Zhang Assignee: Zhe Zhang Attachments: HDFS-8787-HDFS-7285.00.patch As Nicholas suggested under HDFS-8728, we should split the patch on {{BlockInfo}} structure into smaller pieces. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8483) Erasure coding: test DataNode reporting bad/corrupted blocks which belong to a striped block.
[ https://issues.apache.org/jira/browse/HDFS-8483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takanobu Asanuma updated HDFS-8483: --- Status: Open (was: Patch Available) Erasure coding: test DataNode reporting bad/corrupted blocks which belong to a striped block. -- Key: HDFS-8483 URL: https://issues.apache.org/jira/browse/HDFS-8483 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Takanobu Asanuma Assignee: Takanobu Asanuma Fix For: HDFS-7285 Attachments: HDFS-8483.0.patch We can mimic one/several DataNode(s) reporting bad block(s) (which belong to a striped block) to the NameNode (through the DatanodeProtocol#reportBadBlocks call), and check if the recovery/invalidation work can be correctly scheduled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6697) Make NN lease soft and hard limits configurable
[ https://issues.apache.org/jira/browse/HDFS-6697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] J.Andreina updated HDFS-6697: - Attachment: HDFS-6697.2.patch Updated the patch. Please review. Make NN lease soft and hard limits configurable --- Key: HDFS-6697 URL: https://issues.apache.org/jira/browse/HDFS-6697 Project: Hadoop HDFS Issue Type: Improvement Reporter: Ming Ma Assignee: J.Andreina Attachments: HDFS-6697.1.patch, HDFS-6697.2.patch For testing, NameNodeAdapter allows test code to specify the lease soft and hard limits via setLeasePeriod directly on LeaseManager. But NamenodeProxies.java still uses the default values. It would be useful if we could make the NN lease soft and hard limits configurable via Configuration. That would allow NamenodeProxies.java to use the configured values. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
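The pattern being requested above is the usual Hadoop one: a configuration key backed by a compiled-in default. A minimal stand-in using java.util.Properties, with hypothetical key names (the actual keys would be whatever the patch defines); the defaults mirror the well-known 1-minute soft / 1-hour hard lease periods:

```java
import java.util.Properties;

public class LeaseLimitsConfig {
    // Compiled-in defaults matching the standard soft/hard lease periods.
    static final long SOFT_DEFAULT_MS = 60 * 1000L;        // 1 minute
    static final long HARD_DEFAULT_MS = 60 * 60 * 1000L;   // 1 hour

    // Return the configured value for `key`, or the default if unset.
    static long getMs(Properties conf, String key, long dflt) {
        String v = conf.getProperty(key);
        return v == null ? dflt : Long.parseLong(v.trim());
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty("dfs.namenode.lease.soft-limit-ms", "30000");  // hypothetical key
        System.out.println(getMs(conf, "dfs.namenode.lease.soft-limit-ms", SOFT_DEFAULT_MS)); // 30000
        System.out.println(getMs(conf, "dfs.namenode.lease.hard-limit-ms", HARD_DEFAULT_MS)); // 3600000
    }
}
```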
[jira] [Commented] (HDFS-8697) Refactor DecommissionManager: more generic method names and misc cleanup
[ https://issues.apache.org/jira/browse/HDFS-8697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14629011#comment-14629011 ] Hadoop QA commented on HDFS-8697: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 17m 56s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 8m 2s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 3s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 21s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 1m 27s | There were no new checkstyle issues. | | {color:red}-1{color} | whitespace | 0m 0s | The patch has 3 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 18s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 2m 37s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | native | 3m 10s | Pre-build of native portion | | {color:red}-1{color} | hdfs tests | 159m 58s | Tests failed in hadoop-hdfs. 
| | | | 205m 29s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.hdfs.server.namenode.ha.TestInitializeSharedEdits | | | hadoop.hdfs.TestDistributedFileSystem | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12745524/HDFS-8697.01.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 3ec0a04 | | whitespace | https://builds.apache.org/job/PreCommit-HDFS-Build/11719/artifact/patchprocess/whitespace.txt | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11719/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/11719/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11719/console | This message was automatically generated. Refactor DecommissionManager: more generic method names and misc cleanup Key: HDFS-8697 URL: https://issues.apache.org/jira/browse/HDFS-8697 Project: Hadoop HDFS Issue Type: New Feature Components: namenode Affects Versions: 2.7.0 Reporter: Zhe Zhang Assignee: Zhe Zhang Attachments: HDFS-8697.00.patch, HDFS-8697.01.patch This JIRA merges the changes in {{DecommissionManager}} from the HDFS-7285 branch, including changing a few method names to be more generic ({{replicated}} - {{stored}}), and some cleanups. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8483) Erasure coding: test DataNode reporting bad/corrupted blocks which belong to a striped block.
[ https://issues.apache.org/jira/browse/HDFS-8483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14629017#comment-14629017 ] Takanobu Asanuma commented on HDFS-8483: I'm going to submit a patch today or tomorrow. Thanks for working on HDFS-8619. Erasure coding: test DataNode reporting bad/corrupted blocks which belong to a striped block. -- Key: HDFS-8483 URL: https://issues.apache.org/jira/browse/HDFS-8483 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Takanobu Asanuma Assignee: Takanobu Asanuma Fix For: HDFS-7285 We can mimic one/several DataNode(s) reporting bad block(s) (which belong to a striped block) to the NameNode (through the DatanodeProtocol#reportBadBlocks call), and check if the recovery/invalidation work can be correctly scheduled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7443) Datanode upgrade to BLOCKID_BASED_LAYOUT fails if duplicate block files are present in the same volume
[ https://issues.apache.org/jira/browse/HDFS-7443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated HDFS-7443: -- Labels: 2.6.1-candidate (was: ) Datanode upgrade to BLOCKID_BASED_LAYOUT fails if duplicate block files are present in the same volume -- Key: HDFS-7443 URL: https://issues.apache.org/jira/browse/HDFS-7443 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: Kihwal Lee Assignee: Colin Patrick McCabe Priority: Blocker Labels: 2.6.1-candidate Fix For: 2.6.1 Attachments: HDFS-7443.001.patch, HDFS-7443.002.patch When we did an upgrade from 2.5 to 2.6 in a medium-size cluster, about 4% of datanodes were not coming up. They tried the data file layout upgrade for BLOCKID_BASED_LAYOUT introduced in HDFS-6482, but failed. All failures were caused by {{NativeIO.link()}} throwing IOException saying {{EEXIST}}. The data nodes didn't die right away, but the upgrade was soon retried when the block pool initialization was retried whenever {{BPServiceActor}} was registering with the namenode. After many retries, datanodes terminated. This would leave {{previous.tmp}} and {{current}} with no {{VERSION}} file in the block pool slice storage directory. Although {{previous.tmp}} contained the old {{VERSION}} file, the content was in the new layout and the subdirs were all newly created ones. This shouldn't have happened because the upgrade-recovery logic in {{Storage}} removes {{current}} and renames {{previous.tmp}} to {{current}} before retrying. All successfully upgraded volumes had their old state preserved in their {{previous}} directory. In summary there were two observed issues. - Upgrade failure with {{link()}} failing with {{EEXIST}} - {{previous.tmp}} contained not the content of the original {{current}}, but a half-upgraded one. We did not see this in smaller-scale test clusters. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
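The reported failure mode, {{link()}} failing with {{EEXIST}} when the block-id-based target name already exists, can be reproduced with plain NIO hard links. A sketch under the assumption that two duplicate copies of the same block file map to one destination name in the new layout (requires a filesystem that supports hard links):

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

public class HardLinkEexistDemo {
    // Try to hard-link `src` into the new layout; returns false on EEXIST,
    // which is how a duplicate block file in the same volume shows up.
    static boolean linkIntoNewLayout(Path src, Path dest) throws IOException {
        try {
            Files.createLink(dest, src);   // link(2): dest is the new name
            return true;
        } catch (FileAlreadyExistsException e) {  // link(2) failing with EEXIST
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("upgrade-demo");
        Path copy1 = Files.writeString(dir.resolve("blk_42.copy1"), "data");
        Path copy2 = Files.writeString(dir.resolve("blk_42.copy2"), "data");
        Path dest  = dir.resolve("blk_42");        // block-id-based target name

        System.out.println(linkIntoNewLayout(copy1, dest));  // true
        System.out.println(linkIntoNewLayout(copy2, dest));  // false: EEXIST
    }
}
```

The fix space is then about what the upgrade should do on that second call: fail the whole volume (the observed behavior) or detect and resolve the duplicate.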
[jira] [Updated] (HDFS-7489) Incorrect locking in FsVolumeList#checkDirs can hang datanodes
[ https://issues.apache.org/jira/browse/HDFS-7489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated HDFS-7489: -- Labels: 2.6.1-candidate (was: ) Incorrect locking in FsVolumeList#checkDirs can hang datanodes -- Key: HDFS-7489 URL: https://issues.apache.org/jira/browse/HDFS-7489 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.5.0, 2.6.0 Reporter: Noah Lorang Assignee: Noah Lorang Priority: Critical Labels: 2.6.1-candidate Fix For: 2.6.1 Attachments: HDFS-7489-v1.patch, HDFS-7489-v2.patch, HDFS-7489-v2.patch.1 Starting after upgrading to 2.5.0 (CDH 5.2.1), we started to see datanodes hanging their heartbeat and requests from clients. After some digging, I identified the culprit as being the checkDiskError() triggered by catching IOExceptions (in our case, SocketExceptions being triggered on one datanode by ReplicaAlreadyExistsExceptions on another datanode). Thread dumps reveal that the checkDiskErrors() thread is holding a lock on the FsVolumeList: {code} Thread-409 daemon prio=10 tid=0x7f4e50200800 nid=0x5b8e runnable [0x7f4e2f855000] java.lang.Thread.State: RUNNABLE at java.io.UnixFileSystem.list(Native Method) at java.io.File.list(File.java:973) at java.io.File.listFiles(File.java:1051) at org.apache.hadoop.util.DiskChecker.checkDirs(DiskChecker.java:89) at org.apache.hadoop.util.DiskChecker.checkDirs(DiskChecker.java:91) at org.apache.hadoop.util.DiskChecker.checkDirs(DiskChecker.java:91) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.checkDirs(BlockPoolSlice.java:257) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.checkDirs(FsVolumeImpl.java:210) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.checkDirs(FsVolumeList.java:180) - locked 0x00063b182ea0 (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList) at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.checkDataDir(FsDatasetImpl.java:1396) at org.apache.hadoop.hdfs.server.datanode.DataNode$5.run(DataNode.java:2832) at java.lang.Thread.run(Thread.java:662) {code} Other things would then lock the FsDatasetImpl while waiting for the FsVolumeList, e.g.: {code} DataXceiver for client at /10.10.0.52:46643 [Receiving block BP-1573746465-127.0.1.1-1352244533715:blk_1073770670_106962574] daemon prio=10 tid=0x7f4e55561000 nid=0x406d waiting for monitor entry [0x7f4e3106d000] java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.getNextVolume(FsVolumeList.java:64) - waiting to lock 0x00063b182ea0 (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:927) - locked 0x00063b1f9a48 (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:101) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.init(BlockReceiver.java:167) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:604) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:126) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:72) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:225) at java.lang.Thread.run(Thread.java:662) {code} That lock on the FsDatasetImpl then causes other threads to block: {code} Thread-127 daemon prio=10 tid=0x7f4e4c67d800 nid=0x2e02 waiting for monitor entry [0x7f4e3339] java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.hadoop.hdfs.server.datanode.BlockSender.init(BlockSender.java:228) - waiting to lock 0x00063b1f9a48 (a 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner.verifyBlock(BlockPoolSliceScanner.java:436) at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner.verifyFirstBlock(BlockPoolSliceScanner.java:523) at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner.scan(BlockPoolSliceScanner.java:684) at
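The lock chain in the dumps above is the classic "slow I/O under a shared monitor" problem: the disk check holds the FsVolumeList lock for the entire scan, so heartbeats and DataXceivers queue up behind it. A minimal sketch of the lock-narrowing pattern such a fix pursues (names invented here, not the actual HDFS-7489 patch):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: copy the volume list while holding the monitor,
// then run the slow directory scans outside it, so other threads that
// need the FsVolumeList are not blocked for the duration of the check.
class VolumeListSketch {
    private final List<String> volumes = new ArrayList<>();

    synchronized void addVolume(String v) {
        volumes.add(v);
    }

    List<String> checkDirs() {
        final List<String> snapshot;
        synchronized (this) {          // lock held only long enough to copy
            snapshot = new ArrayList<>(volumes);
        }
        List<String> failed = new ArrayList<>();
        for (String v : snapshot) {    // slow I/O runs without the lock
            if (!isHealthy(v)) {
                failed.add(v);
            }
        }
        return failed;
    }

    // Stand-in for DiskChecker.checkDirs() walking a real directory tree.
    private boolean isHealthy(String v) {
        return !v.contains("bad");
    }
}
```

The trade-off is that a volume added or removed mid-check is not seen by that check, which is acceptable because the next scheduled check will see it.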
[jira] [Updated] (HDFS-7425) NameNode block deletion logging uses incorrect appender.
[ https://issues.apache.org/jira/browse/HDFS-7425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated HDFS-7425: -- Labels: 2.6.1-candidate (was: ) NameNode block deletion logging uses incorrect appender. Key: HDFS-7425 URL: https://issues.apache.org/jira/browse/HDFS-7425 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.6.0 Reporter: Chris Nauroth Assignee: Chris Nauroth Priority: Minor Labels: 2.6.1-candidate Fix For: 2.6.1 Attachments: HDFS-7425-branch-2.1.patch The NameNode uses 2 separate Log4J appenders for tracking state changes. The appenders are named org.apache.hadoop.hdfs.StateChange and BlockStateChange. The intention of BlockStateChange is to separate more verbose block state change logging and allow it to be configured separately. In branch-2, there is some block state change logging that incorrectly goes to the org.apache.hadoop.hdfs.StateChange appender though. The bug is not present in trunk. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7503) Namenode restart after large deletions can cause slow processReport (due to logging)
[ https://issues.apache.org/jira/browse/HDFS-7503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated HDFS-7503: -- Labels: 2.6.1-candidate (was: ) Namenode restart after large deletions can cause slow processReport (due to logging) Key: HDFS-7503 URL: https://issues.apache.org/jira/browse/HDFS-7503 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 1.2.1, 2.6.0 Reporter: Arpit Agarwal Assignee: Arpit Agarwal Labels: 2.6.1-candidate Fix For: 1.3.0, 2.6.1 Attachments: HDFS-7503.branch-1.02.patch, HDFS-7503.branch-1.patch, HDFS-7503.trunk.01.patch, HDFS-7503.trunk.02.patch If a large directory is deleted and the namenode is immediately restarted, there are a lot of blocks that do not belong to any file. This results in a log line per block: {code} 2014-11-08 03:11:45,584 INFO BlockStateChange (BlockManager.java:processReport(1901)) - BLOCK* processReport: blk_1074250282_509532 on 172.31.44.17:1019 size 6 does not belong to any file. {code} This log is printed within the FSNamesystem lock. This can cause the namenode to take a long time to come out of safe mode. One solution is to downgrade the logging level. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
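The proposed mitigation — downgrade the per-block message — can be sketched as follows (java.util.logging stands in for Hadoop's Log4J appenders; names are illustrative, not the actual patch):

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Sketch: guard the per-block message behind a cheap level check and emit
// a single summary line instead, so millions of orphan blocks no longer
// cause millions of INFO lines to be formatted while the lock is held.
class BlockReportLogging {
    private static final Logger LOG = Logger.getLogger("BlockStateChange");

    static int processOrphans(long[] orphanBlockIds) {
        int queued = 0;
        for (long id : orphanBlockIds) {
            if (LOG.isLoggable(Level.FINE)) {  // was an unconditional INFO
                LOG.fine("BLOCK* processReport: blk_" + id
                    + " does not belong to any file.");
            }
            queued++;
        }
        // One summary line keeps operators informed at negligible cost.
        LOG.info("processReport: queued " + queued
            + " blocks not belonging to any file for invalidation");
        return queued;
    }
}
```

The level check matters as much as the downgrade: even a suppressed log call pays for string concatenation unless it is guarded.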
[jira] [Updated] (HDFS-7575) Upgrade should generate a unique storage ID for each volume
[ https://issues.apache.org/jira/browse/HDFS-7575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated HDFS-7575: -- Labels: 2.6.1-candidate (was: ) Upgrade should generate a unique storage ID for each volume --- Key: HDFS-7575 URL: https://issues.apache.org/jira/browse/HDFS-7575 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.4.0, 2.5.0, 2.6.0 Reporter: Lars Francke Assignee: Arpit Agarwal Priority: Critical Labels: 2.6.1-candidate Fix For: 2.7.0 Attachments: HDFS-7575.01.patch, HDFS-7575.02.patch, HDFS-7575.03.binary.patch, HDFS-7575.03.patch, HDFS-7575.04.binary.patch, HDFS-7575.04.patch, HDFS-7575.05.binary.patch, HDFS-7575.05.patch, testUpgrade22via24GeneratesStorageIDs.tgz, testUpgradeFrom22GeneratesStorageIDs.tgz, testUpgradeFrom24PreservesStorageId.tgz Before HDFS-2832 each DataNode would have a unique storageId which included its IP address. Since HDFS-2832 the DataNodes have a unique storageId per storage directory which is just a random UUID. They send reports per storage directory in their heartbeats. This heartbeat is processed on the NameNode in the {{DatanodeDescriptor#updateHeartbeatState}} method. Pre HDFS-2832 this would just store the information per Datanode. After the patch though each DataNode can have multiple different storages so it's stored in a map keyed by the storage Id. This works fine for all clusters that have been installed post HDFS-2832 as they get a UUID for their storage Id. So a DN with 8 drives has a map with 8 different keys. 
On each Heartbeat the Map is searched and updated ({{DatanodeStorageInfo storage = storageMap.get(s.getStorageID());}}): {code:title=DatanodeStorageInfo} void updateState(StorageReport r) { capacity = r.getCapacity(); dfsUsed = r.getDfsUsed(); remaining = r.getRemaining(); blockPoolUsed = r.getBlockPoolUsed(); } {code} On clusters that were upgraded from a pre-HDFS-2832 version, though, the storage Id has not been rewritten (at least not on the four clusters I checked), so each directory will have the exact same storageId. That means there'll be only a single entry in the {{storageMap}} and it'll be overwritten by a random {{StorageReport}} from the DataNode. This can be seen in the {{updateState}} method above. This just assigns the capacity from the received report; instead it should probably sum it up per received heartbeat. The Balancer seems to be one of the only things that actually uses this information, so it now considers the utilization of a random drive per DataNode for balancing purposes. Things get even worse when a drive has been added or replaced, as this will now get a new storage Id, so there'll be two entries in the storageMap. As new drives are usually empty, it skews the balancer's decision in a way that this node will never be considered over-utilized. Another problem is that old StorageReports are never removed from the storageMap. So if I replace a drive and it gets a new storage Id, the old one will still be in place and used for all calculations by the Balancer until a restart of the NameNode. I can try providing a patch that does the following: * Instead of using a Map I could just store the array we receive or instead of storing an array sum up the values for reports with the same Id * On each heartbeat clear the map (so we know we have up to date information) Does that sound sensible? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
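The collapse described above is easy to reproduce in miniature (simplified types, not the actual DatanodeDescriptor code): when every volume reports the same legacy storage ID, the map keeps one entry and the last report wins.

```java
import java.util.HashMap;
import java.util.Map;

// Toy reproduction: capacity visible to the NameNode when storage reports
// are keyed by storage ID. Duplicate IDs overwrite each other, so the
// NameNode ends up seeing the capacity of one random drive only.
class StorageMapDemo {
    static long visibleCapacity(String[] storageIds, long[] capacities) {
        Map<String, Long> storageMap = new HashMap<>();
        for (int i = 0; i < storageIds.length; i++) {
            // Mirrors updateState(): assign from the report, never sum.
            storageMap.put(storageIds[i], capacities[i]);
        }
        long total = 0;
        for (long c : storageMap.values()) {
            total += c;
        }
        return total;
    }
}
```

With unique post-HDFS-2832 UUIDs the capacities add up; with a shared pre-upgrade ID only the last report survives, which is exactly the skew the Balancer then acts on.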
[jira] [Updated] (HDFS-7579) Improve log reporting during block report rpc failure
[ https://issues.apache.org/jira/browse/HDFS-7579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated HDFS-7579: -- Labels: 2.6.1-candidate supportability (was: supportability) Improve log reporting during block report rpc failure - Key: HDFS-7579 URL: https://issues.apache.org/jira/browse/HDFS-7579 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 2.7.0 Reporter: Charles Lamb Assignee: Charles Lamb Priority: Minor Labels: 2.6.1-candidate, supportability Fix For: 2.7.0 Attachments: HDFS-7579.000.patch, HDFS-7579.001.patch During block reporting, if the block report RPC fails, for example because it exceeded the max rpc len, we should still produce some sort of LOG.info output to help with debugging. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7733) NFS: readdir/readdirplus return null directory attribute on failure
[ https://issues.apache.org/jira/browse/HDFS-7733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated HDFS-7733: -- Labels: 2.6.1-candidate (was: ) NFS: readdir/readdirplus return null directory attribute on failure --- Key: HDFS-7733 URL: https://issues.apache.org/jira/browse/HDFS-7733 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 2.6.0 Reporter: Arpit Agarwal Assignee: Arpit Agarwal Labels: 2.6.1-candidate Fix For: 2.6.1 Attachments: HDFS-7733.01.patch NFS readdir and readdirplus operations return a null directory attribute on some failure paths. This causes clients to get a 'Stale file handle' error which can only be fixed by unmounting and remounting the share. The issue can be reproduced by running 'ls' against a large directory which is being actively modified, triggering the 'cookie mismatch' failure path. {code} } else { LOG.error("cookieverf mismatch. request cookieverf: " + cookieVerf + " dir cookieverf: " + dirStatus.getModificationTime()); return new READDIRPLUS3Response(Nfs3Status.NFS3ERR_BAD_COOKIE); } {code} Thanks to [~brandonli] for catching the issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7596) NameNode should prune dead storages from storageMap
[ https://issues.apache.org/jira/browse/HDFS-7596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated HDFS-7596: -- Labels: 2.6.1-candidate (was: ) NameNode should prune dead storages from storageMap --- Key: HDFS-7596 URL: https://issues.apache.org/jira/browse/HDFS-7596 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.6.0 Reporter: Arpit Agarwal Assignee: Arpit Agarwal Labels: 2.6.1-candidate Fix For: 2.7.0 Attachments: HDFS-7596.01.patch, HDFS-7596.02.patch The NameNode must be able to prune storages that are no longer reported by the DataNode and that have no blocks associated. These stale storages can skew the balancer behavior. Detailed discussion on HDFS-7575. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7885) Datanode should not trust the generation stamp provided by client
[ https://issues.apache.org/jira/browse/HDFS-7885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated HDFS-7885: -- Labels: 2.6.1-candidate (was: ) Datanode should not trust the generation stamp provided by client - Key: HDFS-7885 URL: https://issues.apache.org/jira/browse/HDFS-7885 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.2.0 Reporter: vitthal (Suhas) Gogate Assignee: Tsz Wo Nicholas Sze Priority: Critical Labels: 2.6.1-candidate Fix For: 2.7.0 Attachments: h7885_20150305.patch, h7885_20150306.patch The datanode should not trust the generation stamp provided by the client, since it is prefetched and buffered in the client, and a concurrent append may have increased it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7831) Fix the starting index and end condition of the loop in FileDiffList.findEarlierSnapshotBlocks()
[ https://issues.apache.org/jira/browse/HDFS-7831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated HDFS-7831: -- Labels: 2.6.1-candidate (was: ) Fix the starting index and end condition of the loop in FileDiffList.findEarlierSnapshotBlocks() Key: HDFS-7831 URL: https://issues.apache.org/jira/browse/HDFS-7831 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.7.0 Reporter: Konstantin Shvachko Assignee: Konstantin Shvachko Labels: 2.6.1-candidate Fix For: 2.7.0 Attachments: HDFS-7831-01.patch Currently the loop in {{FileDiffList.findEarlierSnapshotBlocks()}} starts from {{insertPoint + 1}}. It should start from {{insertPoint - 1}}. As noted in [Jing's comment|https://issues.apache.org/jira/browse/HDFS-7056?focusedCommentId=14333864page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14333864] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8072) Reserved RBW space is not released if client terminates while writing block
[ https://issues.apache.org/jira/browse/HDFS-8072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated HDFS-8072: -- Labels: 2.6.1-candidate (was: ) Reserved RBW space is not released if client terminates while writing block --- Key: HDFS-8072 URL: https://issues.apache.org/jira/browse/HDFS-8072 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.6.0 Reporter: Arpit Agarwal Assignee: Arpit Agarwal Labels: 2.6.1-candidate Fix For: 2.7.0 Attachments: HDFS-8072.01.patch, HDFS-8072.02.patch The DataNode reserves space for a full block when creating an RBW block (introduced in HDFS-6898). The reserved space is released incrementally as data is written to disk, and fully when the block is finalized. However, if the client process terminates unexpectedly mid-write, then the reserved space is not released until the DN is restarted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
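The invariant such a fix must restore can be sketched as follows (hypothetical class, not the actual FsVolumeImpl code): whatever balance of the up-front reservation is still outstanding when the writer stops — normally or abruptly — must be released on that exit path.

```java
// Sketch: a try/finally owns the outstanding reservation, so an aborted
// client write releases the remainder just like a finalized block does.
class RbwReservation {
    private long reservedBytes;

    long reservedBytes() {
        return reservedBytes;
    }

    // bytesReceived models how far the client got before finishing or dying.
    void receiveBlock(long blockSize, long bytesReceived) {
        reservedBytes += blockSize;        // reserve a full block at creation
        long written = 0;
        try {
            while (written < bytesReceived) {
                written++;                 // stand-in for persisting one byte
                reservedBytes--;           // release incrementally as data lands
            }
        } finally {
            // Release the remainder even if the write never completed.
            reservedBytes -= (blockSize - written);
        }
    }
}
```

After a full write the incremental releases cover the whole reservation; after a mid-write abort the finally block returns the rest, leaving nothing leaked until a DN restart.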
[jira] [Updated] (HDFS-8127) NameNode Failover during HA upgrade can cause DataNode to finalize upgrade
[ https://issues.apache.org/jira/browse/HDFS-8127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated HDFS-8127: -- Labels: 2.6.1-candidate (was: ) NameNode Failover during HA upgrade can cause DataNode to finalize upgrade -- Key: HDFS-8127 URL: https://issues.apache.org/jira/browse/HDFS-8127 Project: Hadoop HDFS Issue Type: Bug Components: ha Affects Versions: 2.4.0 Reporter: Jing Zhao Assignee: Jing Zhao Priority: Blocker Labels: 2.6.1-candidate Fix For: 2.7.1 Attachments: HDFS-8127.000.patch, HDFS-8127.001.patch Currently for HA upgrade (enabled by HDFS-5138), we use {{-bootstrapStandby}} to initialize the standby NameNode. The standby NameNode does not have the {{previous}} directory and thus does not know that the cluster is in the upgrade state. If NN failover happens, in response to block reports the new ANN will tell DNs to finalize the upgrade, thus making it impossible to roll back again. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7960) The full block report should prune zombie storages even if they're not empty
[ https://issues.apache.org/jira/browse/HDFS-7960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated HDFS-7960: -- Labels: 2.6.1-candidate (was: ) The full block report should prune zombie storages even if they're not empty Key: HDFS-7960 URL: https://issues.apache.org/jira/browse/HDFS-7960 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: Lei (Eddy) Xu Assignee: Colin Patrick McCabe Priority: Critical Labels: 2.6.1-candidate Fix For: 2.7.0 Attachments: HDFS-7960.002.patch, HDFS-7960.003.patch, HDFS-7960.004.patch, HDFS-7960.005.patch, HDFS-7960.006.patch, HDFS-7960.007.patch, HDFS-7960.008.patch The full block report should prune zombie storages even if they're not empty. We have seen cases in production where zombie storages have not been pruned subsequent to HDFS-7575. This could arise any time the NameNode thinks there is a block in some old storage which is actually not there. In this case, the block will not show up in the new storage (once old is renamed to new) and the old storage will linger forever as a zombie, even with the HDFS-7596 fix applied. This also happens with datanode hotplug, when a drive is removed. In this case, an entire storage (volume) goes away but the blocks do not show up in another storage on the same datanode. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8760) Erasure Coding: reuse BlockReader when reading the same block in pread
[ https://issues.apache.org/jira/browse/HDFS-8760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-8760: Attachment: HDFS-8760.000.patch Erasure Coding: reuse BlockReader when reading the same block in pread -- Key: HDFS-8760 URL: https://issues.apache.org/jira/browse/HDFS-8760 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Jing Zhao Assignee: Jing Zhao Attachments: HDFS-8760.000.patch Currently in pread, we create a new block reader for each aligned stripe even though these stripes belong to the same block. It's better to reuse them to avoid unnecessary block reader creation overhead. This can also avoid reading from the same bad DataNode. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8760) Erasure Coding: reuse BlockReader when reading the same block in pread
[ https://issues.apache.org/jira/browse/HDFS-8760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-8760: Attachment: (was: HDFS-8760.000.patch) Erasure Coding: reuse BlockReader when reading the same block in pread -- Key: HDFS-8760 URL: https://issues.apache.org/jira/browse/HDFS-8760 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Jing Zhao Assignee: Jing Zhao Attachments: HDFS-8760.000.patch Currently in pread, we create a new block reader for each aligned stripe even though these stripes belong to the same block. It's better to reuse them to avoid unnecessary block reader creation overhead. This can also avoid reading from the same bad DataNode. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
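The reuse idea above can be sketched in a few lines (invented names, not the actual patch): keep the reader opened for a block and hand the same instance back for later aligned stripes of that block, so pread creates one reader per block rather than one per stripe.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a per-block reader cache. Reusing the reader also
// keeps the stripe on the DataNode it already chose, instead of re-picking
// (and possibly hitting the same bad node again through a fresh reader).
class BlockReaderCache {
    static final class Reader {
        final long blockId;
        Reader(long blockId) { this.blockId = blockId; }
    }

    private final Map<Long, Reader> readers = new HashMap<>();
    private int created = 0;

    Reader readerFor(long blockId) {
        return readers.computeIfAbsent(blockId, id -> {
            created++;                 // only pay creation cost once per block
            return new Reader(id);
        });
    }

    int readersCreated() {
        return created;
    }
}
```

A real implementation would also have to close cached readers when the pread finishes and evict a reader whose DataNode turns out to be bad, neither of which this sketch models.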
[jira] [Commented] (HDFS-8783) enable socket timeout for balancer's target connection
[ https://issues.apache.org/jira/browse/HDFS-8783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14629036#comment-14629036 ] Hadoop QA commented on HDFS-8783: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 15m 7s | Findbugs (version ) appears to be broken on trunk. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 7m 34s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 40s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 34s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 28s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 2m 30s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | native | 3m 1s | Pre-build of native portion | | {color:red}-1{color} | hdfs tests | 160m 57s | Tests failed in hadoop-hdfs. 
| | | | 201m 50s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.hdfs.TestDistributedFileSystem | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12745489/HDFS-8783.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 3ec0a04 | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11722/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/11722/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11722/console | This message was automatically generated. enable socket timeout for balancer's target connection -- Key: HDFS-8783 URL: https://issues.apache.org/jira/browse/HDFS-8783 Project: Hadoop HDFS Issue Type: Bug Reporter: Chang Li Assignee: Chang Li Attachments: HDFS-8783.patch We have met a real case where the balancer connected to a black-hole target datanode which accepted the connection but never sent any response back; the balancer then hung. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
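The fix direction named in the title can be sketched as follows (hypothetical helper and timeout value, not the actual patch): give the balancer's connection to the target a read timeout, so a black-hole peer that accepts the connection but never responds raises SocketTimeoutException instead of blocking the thread forever.

```java
import java.net.Socket;
import java.net.SocketException;

// Sketch: a socket with SO_TIMEOUT 0 (the default) blocks indefinitely on
// read; setting a finite timeout turns a silent peer into an exception the
// balancer can handle and retry elsewhere.
class BalancerSocketConfig {
    static final int IO_TIMEOUT_MS = 60_000;  // assumed value, for illustration

    static Socket withReadTimeout(Socket sock) throws SocketException {
        sock.setSoTimeout(IO_TIMEOUT_MS);
        return sock;
    }
}
```

Connect timeouts are a separate knob (`Socket.connect(addr, timeout)`); the hang reported here is specifically a read that never returns after the connection is established.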
[jira] [Commented] (HDFS-7483) Display information per tier on the Namenode UI
[ https://issues.apache.org/jira/browse/HDFS-7483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14629045#comment-14629045 ] Benoy Antony commented on HDFS-7483: [~wheat9], As I mentioned, there is no good way of displaying percentage using math helper and fmt_percentage filter. If you have no further comments, I'll commit this patch by end of day tomorrow. Display information per tier on the Namenode UI --- Key: HDFS-7483 URL: https://issues.apache.org/jira/browse/HDFS-7483 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Benoy Antony Assignee: Benoy Antony Attachments: HDFS-7483-001.patch, HDFS-7483-002.patch, overview.png, storagetypes.png, storagetypes_withnostorage.png, withOneStorageType.png, withTwoStorageType.png If cluster has different types of storage, it is useful to display the storage information per type. The information will be available via JMX (HDFS-7390) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8728) Erasure coding: revisit and simplify BlockInfoStriped and INodeFile
[ https://issues.apache.org/jira/browse/HDFS-8728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14629048#comment-14629048 ] Zhe Zhang commented on HDFS-8728: - Good point Nicholas. I should perhaps change the title of the JIRA. The latest patch is pretty much for the purpose of merging HDFS-8499 to the branch. But as part of the merging we need to change the {{BIStriped}} logic, which needs additional review in the context of the branch. Erasure coding: revisit and simplify BlockInfoStriped and INodeFile --- Key: HDFS-8728 URL: https://issues.apache.org/jira/browse/HDFS-8728 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Zhe Zhang Assignee: Zhe Zhang Attachments: HDFS-8728-HDFS-7285.00.patch, HDFS-8728-HDFS-7285.01.patch, HDFS-8728-HDFS-7285.02.patch, HDFS-8728-HDFS-7285.03.patch, HDFS-8728.00.patch, HDFS-8728.01.patch, HDFS-8728.02.patch, Merge-1-codec.patch, Merge-2-ecZones.patch, Merge-3-blockInfo.patch, Merge-4-blockmanagement.patch, Merge-5-blockPlacementPolicies.patch, Merge-6-locatedStripedBlock.patch, Merge-7-replicationMonitor.patch, Merge-8-inodeFile.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8704) Erasure Coding: client fails to write large file when one datanode fails
[ https://issues.apache.org/jira/browse/HDFS-8704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14627671#comment-14627671 ] Walter Su commented on HDFS-8704: - Hi [~libo-intel]! The test failed. Seems like the issue still exists. Could you update the patch? This jira has higher priority. Thanks. Erasure Coding: client fails to write large file when one datanode fails Key: HDFS-8704 URL: https://issues.apache.org/jira/browse/HDFS-8704 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Li Bo Assignee: Li Bo Attachments: HDFS-8704-000.patch I tested the current code on a 5-node cluster using RS(3,2). When a datanode is corrupt, the client succeeds in writing a file smaller than a block group but fails to write a large one. {{TestDFSStripeOutputStreamWithFailure}} only tests files smaller than a block group; this jira will add more test situations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8779) WebUI can't display randomly generated block ID
[ https://issues.apache.org/jira/browse/HDFS-8779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Walter Su updated HDFS-8779: Status: Patch Available (was: Open) WebUI can't display randomly generated block ID --- Key: HDFS-8779 URL: https://issues.apache.org/jira/browse/HDFS-8779 Project: Hadoop HDFS Issue Type: Bug Components: webhdfs Reporter: Walter Su Assignee: Walter Su Priority: Minor Attachments: HDFS-8779.01.patch Old releases used randomly generated block IDs (HDFS-4645). The max value of Long in Java is 2^63-1; the max value of Number in Javascript is 2^53-1. (See [Link|https://developer.mozilla.org/en/docs/Web/JavaScript/Reference/Global_Objects/Number/MAX_SAFE_INTEGER]) This means almost every randomly generated block ID exceeds MAX_SAFE_INTEGER. An integer which exceeds MAX_SAFE_INTEGER cannot be represented in Javascript. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8344) NameNode doesn't recover lease for files with missing blocks
[ https://issues.apache.org/jira/browse/HDFS-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14627801#comment-14627801 ] Masatake Iwasaki commented on HDFS-8344: Hi, [~raviprak]. {code} private int recoveryAttemptsBeforeMarkingBlockMissing = 5; {code} Should this be configurable? I think infinite is a conservative and preferable default value, in order to avoid data loss and keep the current behavior. 5 could be used as a threshold to show a warning message, as [~kihwal] suggested. NameNode doesn't recover lease for files with missing blocks Key: HDFS-8344 URL: https://issues.apache.org/jira/browse/HDFS-8344 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.7.0 Reporter: Ravi Prakash Assignee: Ravi Prakash Attachments: HDFS-8344.01.patch, HDFS-8344.02.patch, HDFS-8344.03.patch, HDFS-8344.04.patch, HDFS-8344.05.patch I found another(?) instance in which the lease is not recovered. This is reproducible easily on a pseudo-distributed single node cluster # Before you start, it helps if you set the following. This is not necessary, but simply reduces how long you have to wait {code} public static final long LEASE_SOFTLIMIT_PERIOD = 30 * 1000; public static final long LEASE_HARDLIMIT_PERIOD = 2 * LEASE_SOFTLIMIT_PERIOD; {code} # Client starts to write a file. (could be less than 1 block, but it hflushed, so some of the data has landed on the datanodes) (I'm copying the client code I am using. I generate a jar and run it using $ hadoop jar TestHadoop.jar) # Client crashes. (I simulate this by kill -9 the $(hadoop jar TestHadoop.jar) process after it has printed Wrote to the bufferedWriter # Shoot the datanode. (Since I ran on a pseudo-distributed cluster, there was only 1) I believe the lease should be recovered and the block should be marked missing. However this is not happening. The lease is never recovered. The effect of this bug for us was that nodes could not be decommissioned cleanly.
Although we knew that the client had crashed, the Namenode never released the leases (even after restarting the Namenode, even months afterwards). There are actually several other cases too where we don't consider what happens if ALL the datanodes die while the file is being written, but I am going to punt on that for another time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
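The configurability suggestion in the comment above could look something like this (the key name and helper are invented for illustration; the real change would go through Hadoop's Configuration class): read the retry budget from configuration and treat a non-positive value as "retry forever", which preserves the current conservative behavior.

```java
import java.util.Properties;

// Sketch: a finite limit is opt-in; the default of -1 means the NameNode
// keeps attempting lease recovery indefinitely rather than marking the
// block missing, avoiding data loss by default.
class LeaseRecoveryLimit {
    static final String KEY = "dfs.namenode.lease.recovery.max-attempts"; // hypothetical key
    static final int INFINITE = -1;

    static boolean shouldMarkBlockMissing(Properties conf, int attemptsSoFar) {
        int max = Integer.parseInt(
            conf.getProperty(KEY, String.valueOf(INFINITE)));
        if (max <= 0) {
            return false;  // infinite retries: never give up on recovery
        }
        return attemptsSoFar >= max;
    }
}
```

Per the comment, a fixed count like 5 would then serve only as a warning threshold, not as the point where the block is declared missing.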
[jira] [Created] (HDFS-8779) WebUI can't display randomly generated block ID
Walter Su created HDFS-8779: --- Summary: WebUI can't display randomly generated block ID Key: HDFS-8779 URL: https://issues.apache.org/jira/browse/HDFS-8779 Project: Hadoop HDFS Issue Type: Bug Components: webhdfs Reporter: Walter Su Assignee: Walter Su Priority: Minor Old releases used randomly generated block IDs (HDFS-4645). The max value of Long in Java is 2^63-1; the max value of Number in Javascript is 2^53-1. (See [Link|https://developer.mozilla.org/en/docs/Web/JavaScript/Reference/Global_Objects/Number/MAX_SAFE_INTEGER]) This means almost every randomly generated block ID exceeds MAX_SAFE_INTEGER. An integer which exceeds MAX_SAFE_INTEGER cannot be represented in Javascript. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
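The overflow described above is easy to demonstrate on the JVM: JavaScript numbers are IEEE-754 doubles, so an integer survives display only if it fits under 2^53-1 (and therefore round-trips through a double). This sketch (illustrative, not part of the patch) shows the check:

```java
// A block ID is displayable as a JavaScript Number only if it is at most
// MAX_SAFE_INTEGER; larger values lose precision when coerced to a double,
// which is what the WebUI's JSON-to-Number path effectively does.
class BlockIdPrecision {
    static final long MAX_SAFE_INTEGER = (1L << 53) - 1;  // 2^53 - 1

    static boolean displayableAsJsNumber(long blockId) {
        return blockId <= MAX_SAFE_INTEGER
            && (long) (double) blockId == blockId;  // round-trip check
    }
}
```

Since randomly generated block IDs are drawn from the full 63-bit range, almost all of them fail this check, which is why the fix has to render the ID as a string rather than a number.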
[jira] [Commented] (HDFS-8762) Erasure Coding: the log of each streamer should show its index
[ https://issues.apache.org/jira/browse/HDFS-8762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14627809#comment-14627809 ] Hadoop QA commented on HDFS-8762: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 15m 11s | Findbugs (version ) appears to be broken on HDFS-7285. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 33s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 47s | There were no new javadoc warning messages. | | {color:red}-1{color} | release audit | 0m 15s | The applied patch generated 1 release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 36s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 1s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 37s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 3m 27s | The patch appears to introduce 6 new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | native | 3m 15s | Pre-build of native portion | | {color:red}-1{color} | hdfs tests | 174m 24s | Tests failed in hadoop-hdfs. 
| | | | 216m 44s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-hdfs | | Failed unit tests | hadoop.hdfs.server.datanode.TestTransferRbw | | | hadoop.hdfs.server.namenode.TestFileTruncate | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12745387/HDFS-8762-HDFS-7285-001.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | HDFS-7285 / 0a93712 | | Release Audit | https://builds.apache.org/job/PreCommit-HDFS-Build/11709/artifact/patchprocess/patchReleaseAuditProblems.txt | | Findbugs warnings | https://builds.apache.org/job/PreCommit-HDFS-Build/11709/artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11709/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/11709/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf900.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11709/console | This message was automatically generated. Erasure Coding: the log of each streamer should show its index -- Key: HDFS-8762 URL: https://issues.apache.org/jira/browse/HDFS-8762 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Li Bo Assignee: Li Bo Attachments: HDFS-8762-HDFS-7285-001.patch The log in {{DataStreamer}} doesn't show which streamer it's generated from. In order to make log information more convenient for debugging, each log should include the index of the streamer it's generated from. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7966) New Data Transfer Protocol via HTTP/2
[ https://issues.apache.org/jira/browse/HDFS-7966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14627827#comment-14627827 ] Duo Zhang commented on HDFS-7966: - Small read using {{PerformanceTest}}. Unit is millisecond. {noformat} ./bin/hadoop org.apache.hadoop.hdfs.web.http2.PerformanceTest tcp /test 1(thread number) 100(read count per thread) 1024(bytes per read) pread(use pread) {noformat} {noformat} ./bin/hadoop org.apache.hadoop.hdfs.web.http2.PerformanceTest tcp /test 1 100 1024 pread *** time based on tcp 242730 ./bin/hadoop org.apache.hadoop.hdfs.web.http2.PerformanceTest http2 /test 1 100 1024 pread *** time based on http2 324491 ./bin/hadoop org.apache.hadoop.hdfs.web.http2.PerformanceTest tcp /test 10 10 1024 pread *** time based on tcp 40688 ./bin/hadoop org.apache.hadoop.hdfs.web.http2.PerformanceTest http2 /test 10 10 1024 pread *** time based on http2 82819 ./bin/hadoop org.apache.hadoop.hdfs.web.http2.PerformanceTest tcp /test 100 1 1024 pread *** time based on tcp 21612 ./bin/hadoop org.apache.hadoop.hdfs.web.http2.PerformanceTest http2 /test 100 1 1024 pread *** time based on http2 69658 ./bin/hadoop org.apache.hadoop.hdfs.web.http2.PerformanceTest tcp /test 500 2000 1024 pread *** time based on tcp 19931 ./bin/hadoop org.apache.hadoop.hdfs.web.http2.PerformanceTest http2 /test 500 2000 1024 pread *** time based on http2 151727 ./bin/hadoop org.apache.hadoop.hdfs.web.http2.PerformanceTest http2 /test 1000 1000 1024 pread *** time based on http2 251735 {noformat} For the single-threaded test, 324491/242730=1.34, so http2 is about 34% slower than tcp. Will try to find the overhead later. And for the multi-threaded tests, http2 is much slower than tcp. And tcp failed the 1000 threads test. I think the problem is that I only use one connection in http2, so there is only one EventLoop (which means only one thread) which sends or receives data. And for tcp, the thread number is the same as the connection number.
The {{%CPU}} of the datanode when using http2 is always around 100%, no matter whether the thread count is 10, 100 or 1000. But when using tcp, the {{%CPU}} can be higher than 1500% as the number of threads increases. Next I will write a new test which can use multiple http2 connections. Thanks. New Data Transfer Protocol via HTTP/2 - Key: HDFS-7966 URL: https://issues.apache.org/jira/browse/HDFS-7966 Project: Hadoop HDFS Issue Type: New Feature Reporter: Haohui Mai Assignee: Qianqian Shi Labels: gsoc, gsoc2015, mentor Attachments: GSoC2015_Proposal.pdf, TestHttp2LargeReadPerformance.svg, TestHttp2Performance.svg The current Data Transfer Protocol (DTP) implements a rich set of features that span across multiple layers, including: * Connection pooling and authentication (session layer) * Encryption (presentation layer) * Data writing pipeline (application layer) All these features are HDFS-specific and defined by implementation. As a result it requires a non-trivial amount of work to implement HDFS clients and servers. This jira explores delegating the responsibilities of the session and presentation layers to the HTTP/2 protocol. Particularly, HTTP/2 handles connection multiplexing, QoS, authentication and encryption, reducing the scope of DTP to the application layer only. By leveraging existing HTTP/2 libraries, it should simplify the implementation of both HDFS clients and servers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
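A quick sanity check on the slowdown figures quoted in the comment above (plain Java; the millisecond timings are copied from the {{PerformanceTest}} runs, the class name is mine):

```java
public class Http2Slowdown {
    // How many times slower the http2 run was than the matching tcp run.
    static double ratio(long http2Millis, long tcpMillis) {
        return (double) http2Millis / tcpMillis;
    }

    public static void main(String[] args) {
        // Timings (ms) from the runs above: http2 vs tcp at the same thread count.
        System.out.printf("1 thread:    %.2fx%n", ratio(324491, 242730)); // ~1.34x, i.e. ~34% slower
        System.out.printf("10 threads:  %.2fx%n", ratio(82819, 40688));   // ~2.04x
        System.out.printf("100 threads: %.2fx%n", ratio(69658, 21612));   // ~3.22x
    }
}
```

The widening gap as the thread count grows is consistent with the comment's diagnosis: all http2 traffic shares a single EventLoop thread, while tcp gets one thread per connection.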
[jira] [Updated] (HDFS-8058) Erasure coding: use BlockInfo[] for both striped and contiguous blocks in INodeFile
[ https://issues.apache.org/jira/browse/HDFS-8058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang updated HDFS-8058: Attachment: HDFS-8058-HDFS-7285.010.patch Thanks Jing for noticing the {{TestDFSStripedOutputStreamWithFailure}} timeout. It turns out to be a tricky bug that crept in between the 06 and 07 patches: basically I forgot to carry over the {{setFileReplication((short) 0)}} logic in the new {{INodeFile}} constructor. The new patch addresses this issue: {code} // Replication factor for striped files is zero if (isStriped) { h = REPLICATION.BITS.combine(0L, h); h = IS_STRIPED.BITS.combine(1L, h); } else { h = REPLICATION.BITS.combine(replication, h); h = IS_STRIPED.BITS.combine(0L, h); } {code} Erasure coding: use BlockInfo[] for both striped and contiguous blocks in INodeFile --- Key: HDFS-8058 URL: https://issues.apache.org/jira/browse/HDFS-8058 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: HDFS-7285 Reporter: Yi Liu Assignee: Zhe Zhang Attachments: HDFS-8058-HDFS-7285.003.patch, HDFS-8058-HDFS-7285.004.patch, HDFS-8058-HDFS-7285.005.patch, HDFS-8058-HDFS-7285.006.patch, HDFS-8058-HDFS-7285.007.patch, HDFS-8058-HDFS-7285.008.patch, HDFS-8058-HDFS-7285.009.patch, HDFS-8058-HDFS-7285.010.patch, HDFS-8058.001.patch, HDFS-8058.002.patch This JIRA is to use {{BlockInfo[] blocks}} for both striped and contiguous blocks in INodeFile. Currently {{FileWithStripedBlocksFeature}} keeps a separate list for striped blocks, its methods duplicate those in INodeFile, and the current code needs to check {{isStriped}} and then do different things. Also, if a file is striped, the {{blocks}} field in INodeFile still occupies the memory of a reference. None of this is necessary, and we can use the same {{blocks}} to make the code clearer. 
I keep {{FileWithStripedBlocksFeature}} empty for future use: I will file a new JIRA to move {{dataBlockNum}} and {{parityBlockNum}} from *BlockInfoStriped* to INodeFile, since ideally they are the same for all striped blocks in a file, and storing them in each block would waste NN memory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8768) Erasure Coding: block group ID displayed in WebUI is not consistent with fsck
[ https://issues.apache.org/jira/browse/HDFS-8768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Walter Su updated HDFS-8768: Attachment: screen-shot-with-HDFS-8779-patch.PNG Erasure Coding: block group ID displayed in WebUI is not consistent with fsck - Key: HDFS-8768 URL: https://issues.apache.org/jira/browse/HDFS-8768 Project: Hadoop HDFS Issue Type: Sub-task Reporter: GAO Rui Attachments: Screen Shot 2015-07-14 at 15.33.08.png, screen-shot-with-HDFS-8779-patch.PNG For example, in the WebUI (usually namenode port 50070), one Erasure Coding file with one block group was displayed as in the attached screenshot [^Screen Shot 2015-07-14 at 15.33.08.png]. But with the fsck command, the block group of the same file was displayed as: {{0. BP-1130999596-172.23.38.10-1433791629728:blk_-9223372036854740160_3384 len=6438256640}} After checking block file names on the datanodes, we believe the WebUI may have a problem with Erasure Coding block group display. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8762) Erasure Coding: the log of each streamer should show its index
[ https://issues.apache.org/jira/browse/HDFS-8762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14627666#comment-14627666 ] Li Bo commented on HDFS-8762: - Adding {{this}} to the log string is also a solution, but it adds too much to the log string; I think the index alone is enough. Some log strings are generated in static functions, so I would have to change them to non-static. Is there a better way to handle this? Erasure Coding: the log of each streamer should show its index -- Key: HDFS-8762 URL: https://issues.apache.org/jira/browse/HDFS-8762 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Li Bo Assignee: Li Bo Attachments: HDFS-8762-HDFS-7285-001.patch The log in {{DataStreamer}} doesn't show which streamer it's generated from. In order to make log information more convenient for debugging, each log should include the index of the streamer it's generated from. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
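One way to avoid converting the static methods to non-static: a static helper that takes the streamer index explicitly, so both static and instance contexts can tag their log strings. A minimal sketch; the class name and message format are illustrative, not part of the patch:

```java
/** Illustrative helper: prefix DataStreamer log messages with the streamer's index. */
public class StreamerLog {
    // Static on purpose: log strings built inside static functions can be tagged
    // too, as long as the caller threads the index through as a parameter.
    static String tag(int streamerIndex, String message) {
        return "[streamer #" + streamerIndex + "] " + message;
    }
}
```

For example, {{StreamerLog.tag(3, "waiting for ack")}} yields {{[streamer #3] waiting for ack}}, so grep-by-streamer works even for messages emitted from static code.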
[jira] [Commented] (HDFS-8704) Erasure Coding: client fails to write large file when one datanode fails
[ https://issues.apache.org/jira/browse/HDFS-8704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14627677#comment-14627677 ] Li Bo commented on HDFS-8704: - I am still working on this jira. The error is random, and it now succeeds most of the time. I still need several days to get it working completely. Erasure Coding: client fails to write large file when one datanode fails Key: HDFS-8704 URL: https://issues.apache.org/jira/browse/HDFS-8704 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Li Bo Assignee: Li Bo Attachments: HDFS-8704-000.patch I tested the current code on a 5-node cluster using RS(3,2). When a datanode is corrupt, the client succeeds in writing a file smaller than a block group but fails to write a larger one. {{TestDFSStripeOutputStreamWithFailure}} only tests files smaller than a block group; this jira will add more test situations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8779) WebUI can't display randomly generated block ID
[ https://issues.apache.org/jira/browse/HDFS-8779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Walter Su updated HDFS-8779: Attachment: HDFS-8779.01.patch WebUI can't display randomly generated block ID --- Key: HDFS-8779 URL: https://issues.apache.org/jira/browse/HDFS-8779 Project: Hadoop HDFS Issue Type: Bug Components: webhdfs Reporter: Walter Su Assignee: Walter Su Priority: Minor Attachments: HDFS-8779.01.patch Old releases used randomly generated block IDs (HDFS-4645). The max value of a Long in Java is 2^63-1, while the max safe integer in JavaScript is 2^53-1. (See [Link|https://developer.mozilla.org/en/docs/Web/JavaScript/Reference/Global_Objects/Number/MAX_SAFE_INTEGER]) This means almost every randomly generated block ID exceeds MAX_SAFE_INTEGER, and an integer which exceeds MAX_SAFE_INTEGER cannot be represented exactly in JavaScript. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
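The display problem can be reproduced without a browser: a JavaScript number is an IEEE-754 double, so any Java {{long}} that does not survive a round-trip through {{double}} cannot be shown exactly by the WebUI. A small sketch (the helper name is mine; the sample block ID is the one from the fsck output in HDFS-8768):

```java
public class SafeIntegerCheck {
    // Same bound as JavaScript's Number.MAX_SAFE_INTEGER.
    static final long MAX_SAFE_INTEGER = (1L << 53) - 1;

    /** True if the value survives a round-trip through an IEEE-754 double. */
    static boolean fitsInDouble(long value) {
        return (long) (double) value == value;
    }

    public static void main(String[] args) {
        System.out.println(fitsInDouble(MAX_SAFE_INTEGER));      // true
        System.out.println(fitsInDouble(MAX_SAFE_INTEGER + 2));  // false: rounds to a neighbouring double
        // A randomly generated block ID has magnitude close to 2^63:
        System.out.println(fitsInDouble(-9223372036854740160L)); // false
    }
}
```

Doubles above 2^53 are spaced more than 1 apart, so nearly every randomly generated block ID lands between two representable values and gets silently rounded by the JavaScript UI.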
[jira] [Commented] (HDFS-8716) introduce a new config specifically for safe mode block count
[ https://issues.apache.org/jira/browse/HDFS-8716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14627589#comment-14627589 ] Hadoop QA commented on HDFS-8716: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 17m 3s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 37s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 44s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 1m 20s | The applied patch generated 1 new checkstyle issues (total was 676, now 676). | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 20s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 2m 28s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | native | 3m 4s | Pre-build of native portion | | {color:red}-1{color} | hdfs tests | 160m 56s | Tests failed in hadoop-hdfs. 
| | | | 204m 32s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.hdfs.TestAppendSnapshotTruncate | | | hadoop.hdfs.server.namenode.ha.TestDNFencing | | | hadoop.hdfs.TestDistributedFileSystem | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12745376/HDFS-8716.7.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 0a16ee6 | | checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/11708/artifact/patchprocess/diffcheckstylehadoop-hdfs.txt | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11708/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/11708/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11708/console | This message was automatically generated. introduce a new config specifically for safe mode block count - Key: HDFS-8716 URL: https://issues.apache.org/jira/browse/HDFS-8716 Project: Hadoop HDFS Issue Type: Bug Reporter: Chang Li Assignee: Chang Li Attachments: HDFS-8716.1.patch, HDFS-8716.2.patch, HDFS-8716.3.patch, HDFS-8716.4.patch, HDFS-8716.5.patch, HDFS-8716.6.patch, HDFS-8716.7.patch, HDFS-8716.7.patch During the start up, namenode waits for n replicas of each block to be reported by datanodes before exiting the safe mode. Currently n is tied to the min replicas config. We could set min replicas to more than one but we might want to exit safe mode as soon as each block has one replica reported. This can be worked out by introducing a new config variable for safe mode block count -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8762) Erasure Coding: the log of each streamer should show its index
[ https://issues.apache.org/jira/browse/HDFS-8762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Bo updated HDFS-8762: Status: Patch Available (was: Open) Erasure Coding: the log of each streamer should show its index -- Key: HDFS-8762 URL: https://issues.apache.org/jira/browse/HDFS-8762 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Li Bo Assignee: Li Bo Attachments: HDFS-8762-HDFS-7285-001.patch The log in {{DataStreamer}} doesn't show which streamer it's generated from. In order to make log information more convenient for debugging, each log should include the index of the streamer it's generated from. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8762) Erasure Coding: the log of each streamer should show its index
[ https://issues.apache.org/jira/browse/HDFS-8762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Bo updated HDFS-8762: Attachment: HDFS-8762-HDFS-7285-001.patch Erasure Coding: the log of each streamer should show its index -- Key: HDFS-8762 URL: https://issues.apache.org/jira/browse/HDFS-8762 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Li Bo Assignee: Li Bo Attachments: HDFS-8762-HDFS-7285-001.patch The log in {{DataStreamer}} doesn't show which streamer it's generated from. In order to make log information more convenient for debugging, each log should include the index of the streamer it's generated from. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8619) Erasure Coding: revisit replica counting for striped blocks
[ https://issues.apache.org/jira/browse/HDFS-8619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14627656#comment-14627656 ] Walter Su commented on HDFS-8619: - LGTM. +1 Erasure Coding: revisit replica counting for striped blocks --- Key: HDFS-8619 URL: https://issues.apache.org/jira/browse/HDFS-8619 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Jing Zhao Assignee: Jing Zhao Attachments: HDFS-8619-HDFS-7285.001.patch, HDFS-8619.000.patch Currently we use the same {{BlockManager#countNodes}} method for striped blocks, which simply treat each internal block as a replica. However, for a striped block, we may have more complicated scenario, e.g., we have multiple replicas of the first internal block while we miss some other internal blocks. Using the current {{countNodes}} methods can lead to wrong decision in these scenarios. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8784) BlockInfo#numNodes should be numStorages
Zhe Zhang created HDFS-8784: --- Summary: BlockInfo#numNodes should be numStorages Key: HDFS-8784 URL: https://issues.apache.org/jira/browse/HDFS-8784 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.7.1 Reporter: Zhe Zhang The method actually returns the number of storages holding a block. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8694) Expose the stats of IOErrors on each FsVolume through JMX
[ https://issues.apache.org/jira/browse/HDFS-8694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14628371#comment-14628371 ] Lei (Eddy) Xu commented on HDFS-8694: - Thanks for the reviews, [~andrew.wang] bq. I have a hard time understanding when we should call handle the disk error vs. just bubbling up, since it bubbles there seems like a danger of handling the same root IOE more than once. What's the methodology here? Is it possible to move handling to the top-level somewhere? I can manually examine all the current callsites and callers, but that's not very future-proof. The reason to call {{volume#handleIOErrors()}} is that when the {{IOE}} pops up to the place where we used to call {{DataNode#checkDiskErrorAsync()}}, the context (which volume the IOs were on) is usually missing. My intention was to call {{volume#handleIOErrors()}} at the highest level that manages the {{volume}} object's lifetime. I will try to get rid of the {{DataNode#checkDiskErrorAsync()}} call in a follow-up JIRA. bq. Since we now have the volume as context, we should really move the disk checker to be per-volume rather than DN wide. One volume throwing an error is no reason to check all of them. This can be deferred to a follow-up; I think it's a slam dunk. Yes. That is the reason to put {{handleIOErrors()}} into {{FsVolumeSpi}}. I was thinking of using a per-volume thread to do {{checkDirs()}} and also using {{numOfErrors()}} as a trigger. I will do that in a follow-up JIRA as well. Working on the rest of the comments. Thanks a lot for these great comments. 
Expose the stats of IOErrors on each FsVolume through JMX - Key: HDFS-8694 URL: https://issues.apache.org/jira/browse/HDFS-8694 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, HDFS Affects Versions: 2.7.0 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Attachments: HDFS-8694.000.patch, HDFS-8694.001.patch Currently, once the DataNode hits an {{IOError}} while writing / reading block files, it starts a background {{DiskChecker.checkDirs()}} thread. But if this thread finishes successfully, the DN does not record the {{IOError}}. We need a measurement that counts all {{IOErrors}} for each volume. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
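The per-volume counter discussed above — {{numOfErrors()}} serving both as a JMX stat and as a trigger for a per-volume {{checkDirs()}} — could be sketched as a thread-safe counter owned by each volume. Names below are illustrative, not the patch's actual API:

```java
import java.util.concurrent.atomic.AtomicLong;

/** Illustrative sketch of a per-volume IOError counter exposed through JMX. */
public class VolumeErrorStats {
    private final AtomicLong ioErrors = new AtomicLong();

    /** Called wherever an IOException surfaces while the volume is still in context. */
    public void handleIOError() {
        ioErrors.incrementAndGet();
    }

    /** Read by the JMX bean; could also trigger a per-volume checkDirs() past a threshold. */
    public long numOfErrors() {
        return ioErrors.get();
    }
}
```

Keeping the counter on the volume (rather than DN-wide) means one bad disk cannot inflate the stats of its healthy siblings, and a per-volume checker thread has an obvious signal to watch.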
[jira] [Commented] (HDFS-8779) WebUI can't display randomly generated block ID
[ https://issues.apache.org/jira/browse/HDFS-8779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14628439#comment-14628439 ] Hadoop QA commented on HDFS-8779: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 16m 55s | Findbugs (version 3.0.0) appears to be broken on trunk. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 7m 39s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 43s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 2m 1s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 37s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 4m 28s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | native | 3m 1s | Pre-build of native portion | | {color:red}-1{color} | hdfs tests | 160m 48s | Tests failed in hadoop-hdfs. | | {color:green}+1{color} | hdfs tests | 0m 28s | Tests passed in hadoop-hdfs-client. 
| | | | 207m 37s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.hdfs.TestDistributedFileSystem | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12745448/HDFS-8779.02.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / edcaae4 | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11714/artifact/patchprocess/testrun_hadoop-hdfs.txt | | hadoop-hdfs-client test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11714/artifact/patchprocess/testrun_hadoop-hdfs-client.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/11714/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf900.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11714/console | This message was automatically generated. WebUI can't display randomly generated block ID --- Key: HDFS-8779 URL: https://issues.apache.org/jira/browse/HDFS-8779 Project: Hadoop HDFS Issue Type: Bug Components: webhdfs Reporter: Walter Su Assignee: Walter Su Priority: Minor Attachments: HDFS-8779.01.patch, HDFS-8779.02.patch Old releases used randomly generated block IDs (HDFS-4645). The max value of a Long in Java is 2^63-1, while the max safe integer in JavaScript is 2^53-1. (See [Link|https://developer.mozilla.org/en/docs/Web/JavaScript/Reference/Global_Objects/Number/MAX_SAFE_INTEGER]) This means almost every randomly generated block ID exceeds MAX_SAFE_INTEGER, and an integer which exceeds MAX_SAFE_INTEGER cannot be represented exactly in JavaScript. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8433) blockToken is not set in constructInternalBlock and parseStripedBlockGroup in StripedBlockUtil
[ https://issues.apache.org/jira/browse/HDFS-8433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14628459#comment-14628459 ] Jing Zhao commented on HDFS-8433: - In the 03 patch, when checking the block token, the {{BlockTokenSecretManager}} still uses a {{BlockTokenIdentifier}} to parse the ID of the token, thus if both sides are new DataNodes, the ID range information cannot be retrieved. blockToken is not set in constructInternalBlock and parseStripedBlockGroup in StripedBlockUtil -- Key: HDFS-8433 URL: https://issues.apache.org/jira/browse/HDFS-8433 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Tsz Wo Nicholas Sze Assignee: Walter Su Attachments: HDFS-8433-HDFS-7285.02.patch, HDFS-8433.00.patch, HDFS-8433.01.patch, HDFS-8433.03.PoC.patch The blockToken provided in LocatedStripedBlock is not used to create LocatedBlock in constructInternalBlock and parseStripedBlockGroup in StripedBlockUtil. We should also add ec tests with security on. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8783) enable socket timeout for balancer's target connection
[ https://issues.apache.org/jira/browse/HDFS-8783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chang Li updated HDFS-8783: --- Attachment: HDFS-8783.patch enable socket timeout for balancer's target connection -- Key: HDFS-8783 URL: https://issues.apache.org/jira/browse/HDFS-8783 Project: Hadoop HDFS Issue Type: Bug Reporter: Chang Li Assignee: Chang Li Attachments: HDFS-8783.patch We hit a real case where the balancer connected to a black-hole target datanode which accepted the connection but never sent a response back; the balancer then hung. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8728) Erasure coding: revisit and simplify BlockInfoStriped and INodeFile
[ https://issues.apache.org/jira/browse/HDFS-8728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang updated HDFS-8728: Attachment: HDFS-8728-HDFS-7285.02.patch Uploading branch-based {{HDFS-8728-HDFS-7285.02.patch}} to address some of Andrew's comments: # Since the patch is already large, will leave the ideas of {{getOp}} and saving reference in {{StripedBlockStorageOp}} as follow-ons. # I filed HDFS-8784 to rename {{numNodes}}. This patch basically makes necessary changes to merge trunk's {{BlockInfo}} hierarchy back to HDFS-7285 branch (as well as adding the striped counterparts). If we agree upon this direction I will create another patch to replace all unnecessary usages of {{BIC}}, {{BIS}}, {{BIUCC}}, {{BIUCS}} with {{BlockInfo}} and {{BlockInfoUC}}. After reaching a conclusion here, I plan to update {{Merge-1}} to {{Merge-14}} patches accordingly, and then rebase the HDFS-7285 branch to catch up with trunk. [~jingzhao] Could you share some advice here? Thanks! Erasure coding: revisit and simplify BlockInfoStriped and INodeFile --- Key: HDFS-8728 URL: https://issues.apache.org/jira/browse/HDFS-8728 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Zhe Zhang Assignee: Zhe Zhang Attachments: HDFS-8728-HDFS-7285.00.patch, HDFS-8728-HDFS-7285.01.patch, HDFS-8728-HDFS-7285.02.patch, HDFS-8728.00.patch, HDFS-8728.01.patch, HDFS-8728.02.patch, Merge-1-codec.patch, Merge-2-ecZones.patch, Merge-3-blockInfo.patch, Merge-4-blockmanagement.patch, Merge-5-blockPlacementPolicies.patch, Merge-6-locatedStripedBlock.patch, Merge-7-replicationMonitor.patch, Merge-8-inodeFile.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8058) Erasure coding: use BlockInfo[] for both striped and contiguous blocks in INodeFile
[ https://issues.apache.org/jira/browse/HDFS-8058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang updated HDFS-8058: Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: HDFS-7285 Status: Resolved (was: Patch Available) Since the 10 patch only makes a minor change from 09, I am committing it based on Jing's review. I tested {{TestFileLengthOnClusterRestart}} and {{TestFileAppend3}} locally and they passed. Thanks Yi for the initial work, and Jing / Walter for the helpful reviews! Erasure coding: use BlockInfo[] for both striped and contiguous blocks in INodeFile --- Key: HDFS-8058 URL: https://issues.apache.org/jira/browse/HDFS-8058 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: HDFS-7285 Reporter: Yi Liu Assignee: Zhe Zhang Fix For: HDFS-7285 Attachments: HDFS-8058-HDFS-7285.003.patch, HDFS-8058-HDFS-7285.004.patch, HDFS-8058-HDFS-7285.005.patch, HDFS-8058-HDFS-7285.006.patch, HDFS-8058-HDFS-7285.007.patch, HDFS-8058-HDFS-7285.008.patch, HDFS-8058-HDFS-7285.009.patch, HDFS-8058-HDFS-7285.010.patch, HDFS-8058.001.patch, HDFS-8058.002.patch This JIRA is to use {{BlockInfo[] blocks}} for both striped and contiguous blocks in INodeFile. Currently {{FileWithStripedBlocksFeature}} keeps a separate list for striped blocks, its methods duplicate those in INodeFile, and the current code needs to check {{isStriped}} and then do different things. Also, if a file is striped, the {{blocks}} field in INodeFile still occupies the memory of a reference. None of this is necessary, and we can use the same {{blocks}} to make the code clearer. I keep {{FileWithStripedBlocksFeature}} empty for future use: I will file a new JIRA to move {{dataBlockNum}} and {{parityBlockNum}} from *BlockInfoStriped* to INodeFile, since ideally they are the same for all striped blocks in a file, and storing them in each block would waste NN memory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8747) Provide Better Scratch Space and Soft Delete Support for HDFS Encryption Zones
[ https://issues.apache.org/jira/browse/HDFS-8747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao updated HDFS-8747: - Attachment: HDFS-8747-07152015.pdf Provide Better Scratch Space and Soft Delete Support for HDFS Encryption Zones -- Key: HDFS-8747 URL: https://issues.apache.org/jira/browse/HDFS-8747 Project: Hadoop HDFS Issue Type: Bug Components: encryption Affects Versions: 2.6.0 Reporter: Xiaoyu Yao Assignee: Xiaoyu Yao Attachments: HDFS-8747-07092015.pdf, HDFS-8747-07152015.pdf HDFS Transparent Data Encryption At-Rest was introduced in Hadoop 2.6 to allow creating an encryption zone on top of a single HDFS directory. Files under the root directory of the encryption zone are encrypted/decrypted transparently upon HDFS client write or read operations. Generally, it does not support rename (without data copying) across encryption zones or between an encryption zone and a non-encryption zone because of the different security settings of encryption zones. However, there are certain use cases where efficient rename support is desired. This JIRA proposes better support for two such use cases, “Scratch Space” (a.k.a. staging area) and “Soft Delete” (a.k.a. trash), with HDFS encryption zones. “Scratch Space” is widely used in Hadoop jobs, which requires efficient rename support. Temporary files from MR jobs are usually stored in a staging area outside the encryption zone, such as the “/tmp” directory, and then renamed to the target directories once the data is ready to be further processed. Below is a summary of supported/unsupported cases in the latest Hadoop: * Rename within the encryption zone is supported. * Renaming the entire encryption zone by moving the root directory of the zone is allowed. * Renaming a sub-directory/file from an encryption zone to a non-encryption zone is not allowed. * Renaming a sub-directory/file from encryption zone A to encryption zone B is not allowed. * Renaming from a non-encryption zone to an encryption zone is not allowed. 
“Soft delete” (a.k.a. trash) is a client-side feature that helps prevent accidental deletion of files and directories. If trash is enabled and a file or directory is deleted using the Hadoop shell, the file is moved to the .Trash directory of the user's home directory instead of being deleted. Deleted files are initially moved (renamed) to the Current sub-directory of the .Trash directory with the original path preserved. Files and directories in the trash can be restored simply by moving them to a location outside the .Trash directory. Due to the limited rename support, deleting a sub-directory/file within an encryption zone with the trash feature enabled is not allowed; clients have to use the -skipTrash option to work around this. HADOOP-10902 and HDFS-6767 improved the error message but did not completely solve the problem. We propose to solve the problem by generalizing the mapping between an encryption zone and its underlying HDFS directories from 1:1 today to 1:N. The encryption zone should allow non-overlapping directories such as scratch space or soft-delete trash locations to be added/removed dynamically after creation. This way, rename for scratch space and soft delete can be better supported without breaking the assumption that rename is only supported within the zone. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8670) Better to exclude decommissioned nodes for namenode NodeUsage JMX
[ https://issues.apache.org/jira/browse/HDFS-8670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14627934#comment-14627934 ] J.Andreina commented on HDFS-8670: -- Thanks, [~mingma]. I have updated the patch as per your review comments. bq.Any reason it changes to call fetchDatanodes with parameter removeDecommissionNode set to false? It seems there is an issue in the logic for removing decommissioned nodes from the live/dead node lists; I have raised a separate jira for that (HDFS-8780). Please review the patch. Better to exclude decommissioned nodes for namenode NodeUsage JMX - Key: HDFS-8670 URL: https://issues.apache.org/jira/browse/HDFS-8670 Project: Hadoop HDFS Issue Type: Bug Reporter: Ming Ma Assignee: J.Andreina Attachments: HDFS-8670.1.patch, HDFS-8670.2.patch, HDFS-8670.3.patch The namenode NodeUsage JMX has the Max, Median, Min and Standard Deviation of DataNode usage; it currently includes decommissioned nodes in the calculation. However, given that the balancer doesn't work on decommissioned nodes and nodes can sometimes stay in the decommissioned state for a long time, it might be better to exclude decommissioned nodes from the metrics calculation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8670) Better to exclude decommissioned nodes for namenode NodeUsage JMX
[ https://issues.apache.org/jira/browse/HDFS-8670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] J.Andreina updated HDFS-8670: - Attachment: HDFS-8670.3.patch Better to exclude decommissioned nodes for namenode NodeUsage JMX - Key: HDFS-8670 URL: https://issues.apache.org/jira/browse/HDFS-8670 Project: Hadoop HDFS Issue Type: Bug Reporter: Ming Ma Assignee: J.Andreina Attachments: HDFS-8670.1.patch, HDFS-8670.2.patch, HDFS-8670.3.patch The namenode NodeUsage JMX has the Max, Median, Min and Standard Deviation of DataNode usage; it currently includes decommissioned nodes in the calculation. However, given that the balancer doesn't work on decommissioned nodes and nodes can sometimes stay in the decommissioned state for a long time, it might be better to exclude decommissioned nodes from the metrics calculation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8058) Erasure coding: use BlockInfo[] for both striped and contiguous blocks in INodeFile
[ https://issues.apache.org/jira/browse/HDFS-8058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14627865#comment-14627865 ] Hadoop QA commented on HDFS-8058: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 16m 8s | Findbugs (version ) appears to be broken on HDFS-7285. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 8 new or modified test files. | | {color:green}+1{color} | javac | 7m 59s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 8s | There were no new javadoc warning messages. | | {color:red}-1{color} | release audit | 0m 14s | The applied patch generated 1 release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 40s | There were no new checkstyle issues. | | {color:red}-1{color} | whitespace | 0m 8s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 39s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 3m 36s | The patch appears to introduce 5 new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | native | 3m 13s | Pre-build of native portion | | {color:red}-1{color} | hdfs tests | 107m 19s | Tests failed in hadoop-hdfs. 
| | | | 151m 45s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-hdfs | | Timed out tests | org.apache.hadoop.hdfs.TestFileLengthOnClusterRestart | | | org.apache.hadoop.hdfs.TestFileAppend3 | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12745406/HDFS-8058-HDFS-7285.010.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | HDFS-7285 / 0a93712 | | Release Audit | https://builds.apache.org/job/PreCommit-HDFS-Build/11710/artifact/patchprocess/patchReleaseAuditProblems.txt | | whitespace | https://builds.apache.org/job/PreCommit-HDFS-Build/11710/artifact/patchprocess/whitespace.txt | | Findbugs warnings | https://builds.apache.org/job/PreCommit-HDFS-Build/11710/artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11710/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/11710/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11710/console | This message was automatically generated. 
Erasure coding: use BlockInfo[] for both striped and contiguous blocks in INodeFile --- Key: HDFS-8058 URL: https://issues.apache.org/jira/browse/HDFS-8058 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: HDFS-7285 Reporter: Yi Liu Assignee: Zhe Zhang Attachments: HDFS-8058-HDFS-7285.003.patch, HDFS-8058-HDFS-7285.004.patch, HDFS-8058-HDFS-7285.005.patch, HDFS-8058-HDFS-7285.006.patch, HDFS-8058-HDFS-7285.007.patch, HDFS-8058-HDFS-7285.008.patch, HDFS-8058-HDFS-7285.009.patch, HDFS-8058-HDFS-7285.010.patch, HDFS-8058.001.patch, HDFS-8058.002.patch This JIRA is to use {{BlockInfo[] blocks}} for both striped and contiguous blocks in INodeFile. Currently {{FileWithStripedBlocksFeature}} keeps a separate list for striped blocks, the methods there duplicate those in INodeFile, and the current code needs to check {{isStriped}} and then do different things. Also, if the file is striped, the {{blocks}} field in INodeFile still occupies the memory of a reference. These are unnecessary, and we can use the same {{blocks}} to make the code clearer. I keep {{FileWithStripedBlocksFeature}} empty for future use: I will file a new JIRA to move {{dataBlockNum}} and {{parityBlockNum}} from *BlockInfoStriped* to INodeFile, since ideally they are the same for all striped blocks in a file, and storing them in each block would waste NN memory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
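The unified-array idea can be pictured with a minimal, self-contained sketch (hypothetical classes standing in for the real HDFS types, which carry far more state): the file keeps one {{BlockInfo[]}} and callers query the block subtype instead of consulting a separate striped-blocks feature.

```java
public class UnifiedBlocksSketch {
    abstract static class BlockInfo {
        abstract boolean isStriped();
    }

    static class BlockInfoContiguous extends BlockInfo {
        boolean isStriped() { return false; }
    }

    static class BlockInfoStriped extends BlockInfo {
        boolean isStriped() { return true; }
    }

    static class INodeFile {
        final BlockInfo[] blocks;   // one array serves striped and contiguous files

        INodeFile(BlockInfo... blocks) { this.blocks = blocks; }

        // A file is striped iff its blocks are striped; no duplicate list needed.
        boolean isStriped() {
            return blocks.length > 0 && blocks[0].isStriped();
        }
    }
}
```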
[jira] [Updated] (HDFS-8768) Erasure Coding: block group ID displayed in WebUI is not consistent with fsck
[ https://issues.apache.org/jira/browse/HDFS-8768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] GAO Rui updated HDFS-8768: -- Description: This is duplicated by [HDFS-8779]. For example, in the WebUI (usually namenode port 50070), one Erasure Code file with one block group was displayed as in the attached screenshot [^Screen Shot 2015-07-14 at 15.33.08.png]. But with the fsck command, the block group of the same file was displayed as: {{0. BP-1130999596-172.23.38.10-1433791629728:blk_-9223372036854740160_3384 len=6438256640}} After checking block file names in datanodes, we believe the WebUI may have a problem with Erasure Code block group display. was: For example, in the WebUI (usually namenode port 50070), one Erasure Code file with one block group was displayed as in the attached screenshot [^Screen Shot 2015-07-14 at 15.33.08.png]. But with the fsck command, the block group of the same file was displayed as: {{0. BP-1130999596-172.23.38.10-1433791629728:blk_-9223372036854740160_3384 len=6438256640}} After checking block file names in datanodes, we believe the WebUI may have a problem with Erasure Code block group display. Erasure Coding: block group ID displayed in WebUI is not consistent with fsck - Key: HDFS-8768 URL: https://issues.apache.org/jira/browse/HDFS-8768 Project: Hadoop HDFS Issue Type: Sub-task Reporter: GAO Rui Attachments: Screen Shot 2015-07-14 at 15.33.08.png, screen-shot-with-HDFS-8779-patch.PNG This is duplicated by [HDFS-8779]. For example, in the WebUI (usually namenode port 50070), one Erasure Code file with one block group was displayed as in the attached screenshot [^Screen Shot 2015-07-14 at 15.33.08.png]. But with the fsck command, the block group of the same file was displayed as: {{0. BP-1130999596-172.23.38.10-1433791629728:blk_-9223372036854740160_3384 len=6438256640}} After checking block file names in datanodes, we believe the WebUI may have a problem with Erasure Code block group display. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8768) Erasure Coding: block group ID displayed in WebUI is not consistent with fsck
[ https://issues.apache.org/jira/browse/HDFS-8768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14627889#comment-14627889 ] GAO Rui commented on HDFS-8768: --- Thank you [~walter.k.su] very much! Erasure Coding: block group ID displayed in WebUI is not consistent with fsck - Key: HDFS-8768 URL: https://issues.apache.org/jira/browse/HDFS-8768 Project: Hadoop HDFS Issue Type: Sub-task Reporter: GAO Rui Attachments: Screen Shot 2015-07-14 at 15.33.08.png, screen-shot-with-HDFS-8779-patch.PNG For example, in the WebUI (usually namenode port 50070), one Erasure Code file with one block group was displayed as in the attached screenshot [^Screen Shot 2015-07-14 at 15.33.08.png]. But with the fsck command, the block group of the same file was displayed as: {{0. BP-1130999596-172.23.38.10-1433791629728:blk_-9223372036854740160_3384 len=6438256640}} After checking block file names in datanodes, we believe the WebUI may have a problem with Erasure Code block group display. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8780) Fetching live/dead datanode list with arg true for removeDecommissionNode, returns list with decom node.
J.Andreina created HDFS-8780: Summary: Fetching live/dead datanode list with arg true for removeDecommissionNode, returns list with decom node. Key: HDFS-8780 URL: https://issues.apache.org/jira/browse/HDFS-8780 Project: Hadoop HDFS Issue Type: Bug Reporter: J.Andreina Assignee: J.Andreina Priority: Critical Current implementation: == In DatanodeManager#removeDecomNodeFromList(), a decommissioned node will be removed from the dead/live node list only if the conditions below are met: I. The include list is not empty. II. Neither the include nor the exclude list contains the node, and the node state is decommissioned. {code} if (!hostFileManager.hasIncludes()) { return; } if ((!hostFileManager.isIncluded(node)) && (!hostFileManager.isExcluded(node)) && node.isDecommissioned()) { // Include list is not empty, an existing datanode does not appear // in both include or exclude lists and it has been decommissioned. // Remove it from the node list. it.remove(); } {code} As mentioned in the javadoc, a datanode cannot already be in the decommissioned state. Following the steps mentioned in the javadoc, the datanode state is dead, not decommissioned. *Can we avoid the unnecessary checks and simply remove the node from the list when it is in the decommissioned state?* Please provide your feedback. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
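The condition quoted in the {code} block above can be distilled into a small truth-table helper (a hypothetical method; the booleans stand in for the HostFileManager and node-state calls):

```java
public class DecomNodeFilter {
    // Mirrors the quoted condition: the node is dropped from the live/dead
    // list only when the include list is non-empty, the node appears in
    // neither the include nor the exclude list, and it is decommissioned.
    public static boolean shouldRemove(boolean hasIncludes, boolean included,
                                       boolean excluded, boolean decommissioned) {
        if (!hasIncludes) {
            return false;   // include list empty: never remove
        }
        return !included && !excluded && decommissioned;
    }
}
```

The JIRA's question, then, is whether the first three inputs are ever relevant, or whether `decommissioned` alone should decide removal.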
[jira] [Updated] (HDFS-8767) RawLocalFileSystem.listStatus() returns null for UNIX pipefile
[ https://issues.apache.org/jira/browse/HDFS-8767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kanaka kumar avvaru updated HDFS-8767: -- Attachment: HDFS-8767-02.patch RawLocalFileSystem.listStatus() returns null for UNIX pipefile -- Key: HDFS-8767 URL: https://issues.apache.org/jira/browse/HDFS-8767 Project: Hadoop HDFS Issue Type: Bug Reporter: Haohui Mai Assignee: kanaka kumar avvaru Priority: Critical Attachments: HDFS-8767-00.patch, HDFS-8767-01.patch, HDFS-8767-02.patch Calling FileSystem.listStatus() on a UNIX pipe file returns null instead of the file. The bug breaks Hive when Hive loads data from UNIX pipe file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
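The root cause can be reproduced without Hadoop at all: java.io.File#list() returns null for any path that is not a directory, a FIFO included, so a listStatus built on it must special-case non-directories. A minimal sketch of that shape of fix (not the actual RawLocalFileSystem code):

```java
import java.io.File;

public class ListStatusSketch {
    // For a non-directory that exists (regular file, FIFO, socket, ...),
    // return the entry itself instead of the null that File#list() yields;
    // for a directory, delegate to File#list() as before.
    public static String[] listStatus(File f) {
        if (!f.exists()) {
            return null;                          // nothing to report
        }
        if (!f.isDirectory()) {
            return new String[] { f.getName() };  // the path is its own listing
        }
        return f.list();
    }
}
```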
[jira] [Assigned] (HDFS-8771) If IPCLoggerChannel#purgeLogsOlderThan takes too long, Namenode cannot send other RPC calls to Journalnodes
[ https://issues.apache.org/jira/browse/HDFS-8771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kanaka kumar avvaru reassigned HDFS-8771: - Assignee: kanaka kumar avvaru If IPCLoggerChannel#purgeLogsOlderThan takes too long, Namenode cannot send other RPC calls to Journalnodes Key: HDFS-8771 URL: https://issues.apache.org/jira/browse/HDFS-8771 Project: Hadoop HDFS Issue Type: Bug Reporter: Takuya Fukudome Assignee: kanaka kumar avvaru In our cluster, the edit logs had accidentally become huge (about 50GB) and our Journalnodes' disks were busy; therefore {{purgeLogsOlderThan}} took more than 30 seconds. If {{IPCLoggerChannel#purgeLogsOlderThan}} takes too much time, the Namenode cannot send other RPC calls to the Journalnodes because {{o.a.h.hdfs.qjournal.client.IPCLoggerChannel}}'s executor is single-threaded. This can cause the namenode to shut down. I think IPCLoggerChannel#purgeLogsOlderThan should not block other RPC calls like sendEdits. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7608) hdfs dfsclient newConnectedPeer has no write timeout
[ https://issues.apache.org/jira/browse/HDFS-7608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14627887#comment-14627887 ] Hudson commented on HDFS-7608: -- FAILURE: Integrated in Hadoop-Yarn-trunk #987 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/987/]) HDFS-7608: hdfs dfsclient newConnectedPeer has no write timeout (Xiaoyu Yao via Colin P. McCabe) (cmccabe: rev 1d74ccececaefffaa90c0c18b40a3645dbc819d9) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDistributedFileSystem.java HDFS-7608: add CHANGES.txt (cmccabe: rev b7fb6ec4513de7d342c541eb3d9e14642286e2cf) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt hdfs dfsclient newConnectedPeer has no write timeout - Key: HDFS-7608 URL: https://issues.apache.org/jira/browse/HDFS-7608 Project: Hadoop HDFS Issue Type: Bug Components: fuse-dfs, hdfs-client Affects Versions: 2.3.0, 2.6.0 Environment: hdfs 2.3.0 hbase 0.98.6 Reporter: zhangshilong Assignee: Xiaoyu Yao Fix For: 2.8.0 Attachments: HDFS-7608.0.patch, HDFS-7608.1.patch, HDFS-7608.2.patch Original Estimate: 24h Remaining Estimate: 24h problem: hbase compactSplitThread may lock forever on reading datanode blocks. debug found: the epollwait timeout was set to 0, so epollwait can never time out. cause: in hdfs 2.3.0, hbase uses DFSClient to read and write blocks. DFSClient creates one socket using newConnectedPeer(addr), but with no read or write timeout. In v2.6.0, newConnectedPeer added a readTimeout to deal with the problem, but did not add a writeTimeout. Why was a write timeout not added? I think NioInetPeer needs a default socket timeout, so applications will not need to force a timeout themselves. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8742) Inotify: Support event for OP_TRUNCATE
[ https://issues.apache.org/jira/browse/HDFS-8742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14627884#comment-14627884 ] Hudson commented on HDFS-8742: -- FAILURE: Integrated in Hadoop-Yarn-trunk #987 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/987/]) HDFS-8742. Inotify: Support event for OP_TRUNCATE. Contributed by Surendra Singh Lilhore. (aajisaka: rev 979c9ca2ca89e99dc7165abfa29c78d66de43d9a) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/inotify/Event.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSInotifyEventInputStream.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/InotifyFSEditLogOpTranslator.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/proto/inotify.proto * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelper.java Inotify: Support event for OP_TRUNCATE -- Key: HDFS-8742 URL: https://issues.apache.org/jira/browse/HDFS-8742 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: 2.7.0 Reporter: Surendra Singh Lilhore Assignee: Surendra Singh Lilhore Fix For: 2.8.0 Attachments: HDFS-8742-001.patch, HDFS-8742.patch Currently inotify does not emit any event for the truncate operation. The NN should send an event for truncate. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8722) Optimize datanode writes for small writes and flushes
[ https://issues.apache.org/jira/browse/HDFS-8722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14627885#comment-14627885 ] Hudson commented on HDFS-8722: -- FAILURE: Integrated in Hadoop-Yarn-trunk #987 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/987/]) HDFS-8722. Optimize datanode writes for small writes and flushes. Contributed by Kihwal Lee (kihwal: rev 59388a801514d6af64ef27fbf246d8054f1dcc74) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockReceiver.java Optimize datanode writes for small writes and flushes - Key: HDFS-8722 URL: https://issues.apache.org/jira/browse/HDFS-8722 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.7.1 Reporter: Kihwal Lee Assignee: Kihwal Lee Priority: Critical Fix For: 2.7.2 Attachments: HDFS-8722.patch, HDFS-8722.v1.patch After the data corruption fix by HDFS-4660, the CRC recalculation for a partial chunk is executed more frequently if the client repeatedly writes a few bytes and calls hflush/hsync. This is because the generic logic forces CRC recalculation if the on-disk data is not CRC chunk aligned. Prior to HDFS-4660, the datanode blindly accepted whatever CRC the client provided if the incoming data was chunk-aligned. This was the source of the corruption. We can still optimize for the most common case, where a client repeatedly writes a small number of bytes followed by hflush/hsync with no pipeline recovery or append, by allowing the previous behavior for this specific case. If the incoming data has a duplicate portion that ends at the last chunk boundary before the partial chunk on disk, the datanode can use the checksum supplied by the client without redoing the checksum on its own. This reduces disk reads as well as CPU load for the checksum calculation. 
If the incoming packet data goes back further than the last on-disk chunk boundary, the datanode will still do a recalculation, but this occurs rarely, during pipeline recoveries. Thus the optimization for this specific case should be sufficient to speed up the vast majority of cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
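The boundary condition described above can be distilled into a small check (a hypothetical helper; the real BlockReceiver logic involves packet buffers and checksum streams):

```java
public class CrcReuseSketch {
    // The client's checksum for the partial chunk can be trusted when the
    // incoming packet's data does not reach back past the last chunk
    // boundary at or below the on-disk length; otherwise the datanode must
    // recompute the CRC itself (e.g. after a pipeline recovery).
    public static boolean canReuseClientChecksum(long onDiskLen,
                                                 long packetStartOffset,
                                                 int bytesPerChecksum) {
        long lastChunkBoundary = (onDiskLen / bytesPerChecksum) * bytesPerChecksum;
        return packetStartOffset >= lastChunkBoundary;
    }
}
```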
[jira] [Commented] (HDFS-8771) If IPCLoggerChannel#purgeLogsOlderThan takes too long, Namenode cannot send other RPC calls to Journalnodes
[ https://issues.apache.org/jira/browse/HDFS-8771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14627895#comment-14627895 ] kanaka kumar avvaru commented on HDFS-8771: --- In my view, all the write-related calls are handled in a single thread to ensure the ordering of requests from the NN. So, the journal node could perform the purge operation in a separate thread instead of blocking the caller. Please correct me if another approach would be better. If IPCLoggerChannel#purgeLogsOlderThan takes too long, Namenode cannot send other RPC calls to Journalnodes Key: HDFS-8771 URL: https://issues.apache.org/jira/browse/HDFS-8771 Project: Hadoop HDFS Issue Type: Bug Reporter: Takuya Fukudome Assignee: kanaka kumar avvaru In our cluster, the edit logs had accidentally become huge (about 50GB) and our Journalnodes' disks were busy; therefore {{purgeLogsOlderThan}} took more than 30 seconds. If {{IPCLoggerChannel#purgeLogsOlderThan}} takes too much time, the Namenode cannot send other RPC calls to the Journalnodes because {{o.a.h.hdfs.qjournal.client.IPCLoggerChannel}}'s executor is single-threaded. This can cause the namenode to shut down. I think IPCLoggerChannel#purgeLogsOlderThan should not block other RPC calls like sendEdits. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
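The suggestion above — running the purge on its own executor so sendEdits is never queued behind it — can be sketched as follows (a hypothetical class; the real IPCLoggerChannel wraps asynchronous RPC proxies):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class JournalChannelSketch {
    // Keep the single-thread executor for ordered write calls such as
    // sendEdits, but route purgeLogsOlderThan to a separate executor so a
    // slow disk purge cannot stall edit shipping.
    private final ExecutorService writeExecutor = Executors.newSingleThreadExecutor();
    private final ExecutorService purgeExecutor = Executors.newSingleThreadExecutor();

    public Future<String> sendEdits() {
        return writeExecutor.submit(() -> "edits-sent");
    }

    public Future<String> purgeLogsOlderThan(long simulatedDiskMillis) {
        return purgeExecutor.submit(() -> {
            Thread.sleep(simulatedDiskMillis);   // stands in for a busy-disk purge
            return "purged";
        });
    }

    public void shutdown() {
        writeExecutor.shutdownNow();
        purgeExecutor.shutdownNow();
    }
}
```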
[jira] [Commented] (HDFS-8722) Optimize datanode writes for small writes and flushes
[ https://issues.apache.org/jira/browse/HDFS-8722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14627877#comment-14627877 ] Hudson commented on HDFS-8722: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #257 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/257/]) HDFS-8722. Optimize datanode writes for small writes and flushes. Contributed by Kihwal Lee (kihwal: rev 59388a801514d6af64ef27fbf246d8054f1dcc74) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockReceiver.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Optimize datanode writes for small writes and flushes - Key: HDFS-8722 URL: https://issues.apache.org/jira/browse/HDFS-8722 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.7.1 Reporter: Kihwal Lee Assignee: Kihwal Lee Priority: Critical Fix For: 2.7.2 Attachments: HDFS-8722.patch, HDFS-8722.v1.patch After the data corruption fix by HDFS-4660, the CRC recalculation for a partial chunk is executed more frequently if the client repeatedly writes a few bytes and calls hflush/hsync. This is because the generic logic forces CRC recalculation if the on-disk data is not CRC chunk aligned. Prior to HDFS-4660, the datanode blindly accepted whatever CRC the client provided if the incoming data was chunk-aligned. This was the source of the corruption. We can still optimize for the most common case, where a client repeatedly writes a small number of bytes followed by hflush/hsync with no pipeline recovery or append, by allowing the previous behavior for this specific case. If the incoming data has a duplicate portion that ends at the last chunk boundary before the partial chunk on disk, the datanode can use the checksum supplied by the client without redoing the checksum on its own. This reduces disk reads as well as CPU load for the checksum calculation. 
If the incoming packet data goes back further than the last on-disk chunk boundary, the datanode will still do a recalculation, but this occurs rarely, during pipeline recoveries. Thus the optimization for this specific case should be sufficient to speed up the vast majority of cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7608) hdfs dfsclient newConnectedPeer has no write timeout
[ https://issues.apache.org/jira/browse/HDFS-7608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14627879#comment-14627879 ] Hudson commented on HDFS-7608: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #257 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/257/]) HDFS-7608: hdfs dfsclient newConnectedPeer has no write timeout (Xiaoyu Yao via Colin P. McCabe) (cmccabe: rev 1d74ccececaefffaa90c0c18b40a3645dbc819d9) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDistributedFileSystem.java HDFS-7608: add CHANGES.txt (cmccabe: rev b7fb6ec4513de7d342c541eb3d9e14642286e2cf) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt hdfs dfsclient newConnectedPeer has no write timeout - Key: HDFS-7608 URL: https://issues.apache.org/jira/browse/HDFS-7608 Project: Hadoop HDFS Issue Type: Bug Components: fuse-dfs, hdfs-client Affects Versions: 2.3.0, 2.6.0 Environment: hdfs 2.3.0 hbase 0.98.6 Reporter: zhangshilong Assignee: Xiaoyu Yao Fix For: 2.8.0 Attachments: HDFS-7608.0.patch, HDFS-7608.1.patch, HDFS-7608.2.patch Original Estimate: 24h Remaining Estimate: 24h problem: hbase compactSplitThread may lock forever on reading datanode blocks. debug found: the epollwait timeout was set to 0, so epollwait can never time out. cause: in hdfs 2.3.0, hbase uses DFSClient to read and write blocks. DFSClient creates one socket using newConnectedPeer(addr), but with no read or write timeout. In v2.6.0, newConnectedPeer added a readTimeout to deal with the problem, but did not add a writeTimeout. Why was a write timeout not added? I think NioInetPeer needs a default socket timeout, so applications will not need to force a timeout themselves. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8742) Inotify: Support event for OP_TRUNCATE
[ https://issues.apache.org/jira/browse/HDFS-8742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14627876#comment-14627876 ] Hudson commented on HDFS-8742: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #257 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/257/]) HDFS-8742. Inotify: Support event for OP_TRUNCATE. Contributed by Surendra Singh Lilhore. (aajisaka: rev 979c9ca2ca89e99dc7165abfa29c78d66de43d9a) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelper.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/proto/inotify.proto * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/inotify/Event.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/InotifyFSEditLogOpTranslator.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSInotifyEventInputStream.java Inotify: Support event for OP_TRUNCATE -- Key: HDFS-8742 URL: https://issues.apache.org/jira/browse/HDFS-8742 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: 2.7.0 Reporter: Surendra Singh Lilhore Assignee: Surendra Singh Lilhore Fix For: 2.8.0 Attachments: HDFS-8742-001.patch, HDFS-8742.patch Currently inotify does not emit any event for the truncate operation. The NN should send an event for truncate. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6290) File is not closed in OfflineImageViewerPB#run()
[ https://issues.apache.org/jira/browse/HDFS-6290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14628554#comment-14628554 ] Akira AJISAKA commented on HDFS-6290: - Hi [~hapandya], what's going on with this issue? I'd like to take it over. File is not closed in OfflineImageViewerPB#run() Key: HDFS-6290 URL: https://issues.apache.org/jira/browse/HDFS-6290 Project: Hadoop HDFS Issue Type: Bug Components: tools Reporter: Ted Yu Priority: Minor {code} } else if (processor.equals(XML)) { new PBImageXmlWriter(conf, out).visit(new RandomAccessFile(inputFile, "r")); {code} The RandomAccessFile instance should be closed before the method returns. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
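One way to guarantee the file is closed before the method returns — shown here as a sketch rather than the committed fix — is try-with-resources; file.length() stands in for the PBImageXmlWriter#visit call:

```java
import java.io.IOException;
import java.io.RandomAccessFile;

public class CloseOnReturnSketch {
    // try-with-resources closes the RandomAccessFile even when the visitor
    // throws; the real code would pass 'file' to
    // new PBImageXmlWriter(conf, out).visit(file) inside the try block.
    static long visitImage(String inputFile) throws IOException {
        try (RandomAccessFile file = new RandomAccessFile(inputFile, "r")) {
            return file.length();   // stand-in for the visit() call
        }                           // file.close() runs here automatically
    }
}
```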
[jira] [Commented] (HDFS-8778) TestBlockReportRateLimiting#testLeaseExpiration can deadlock
[ https://issues.apache.org/jira/browse/HDFS-8778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14628590#comment-14628590 ] Hadoop QA commented on HDFS-8778: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 8m 11s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 43s | There were no new javac warning messages. | | {color:green}+1{color} | release audit | 0m 20s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 1m 21s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 17s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 31s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 2m 26s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | native | 1m 5s | Pre-build of native portion | | {color:red}-1{color} | hdfs tests | 159m 25s | Tests failed in hadoop-hdfs. 
| | | | 182m 22s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.hdfs.TestDistributedFileSystem | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12745471/HDFS-8778.02.patch | | Optional Tests | javac unit findbugs checkstyle | | git revision | trunk / edcaae4 | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11716/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/11716/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11716/console | This message was automatically generated. TestBlockReportRateLimiting#testLeaseExpiration can deadlock Key: HDFS-8778 URL: https://issues.apache.org/jira/browse/HDFS-8778 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 2.7.1 Reporter: Arpit Agarwal Assignee: Arpit Agarwal Attachments: HDFS-8778.01.patch, HDFS-8778.02.patch {{requestBlockReportLease}} blocks on DataNode registration while holding the NameSystem read lock. DataNode registration can block on the NameSystem read lock if a writer gets in the queue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8785) TestDistributedFileSystem is failing in trunk
Arpit Agarwal created HDFS-8785: --- Summary: TestDistributedFileSystem is failing in trunk Key: HDFS-8785 URL: https://issues.apache.org/jira/browse/HDFS-8785 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 3.0.0, 2.8.0 Reporter: Arpit Agarwal A newly added test case {{TestDistributedFileSystem#testDFSClientPeerWriteTimeout}} is failing in trunk. e.g. run https://builds.apache.org/job/PreCommit-HDFS-Build/11716/testReport/org.apache.hadoop.hdfs/TestDistributedFileSystem/testDFSClientPeerWriteTimeout/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8716) introduce a new config specifically for safe mode block count
[ https://issues.apache.org/jira/browse/HDFS-8716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14628601#comment-14628601 ] Chang Li commented on HDFS-8716: those test failures are not related to my change. I have applied the latest patch to trunk and run all the unit tests, and they pass. [~kihwal], could you please help review the latest patch? Thanks! introduce a new config specifically for safe mode block count - Key: HDFS-8716 URL: https://issues.apache.org/jira/browse/HDFS-8716 Project: Hadoop HDFS Issue Type: Bug Reporter: Chang Li Assignee: Chang Li Attachments: HDFS-8716.1.patch, HDFS-8716.2.patch, HDFS-8716.3.patch, HDFS-8716.4.patch, HDFS-8716.5.patch, HDFS-8716.6.patch, HDFS-8716.7.patch, HDFS-8716.7.patch During start up, the namenode waits for n replicas of each block to be reported by datanodes before exiting safe mode. Currently n is tied to the min replicas config. We could set min replicas to more than one, but we might want to exit safe mode as soon as each block has one replica reported. This can be addressed by introducing a new config variable for the safe mode block count. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
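The proposal can be illustrated with a toy safe-mode check (a hypothetical method and parameter names, not the actual FSNamesystem code): a block counts as "safe" once it has the new, decoupled safe-mode replica minimum, regardless of dfs.namenode.replication.min.

```java
public class SafeModeThresholdSketch {
    // Count a block as safe once it has safeModeMinReplicas reported
    // replicas (the proposed new config), and leave safe mode when the
    // safe fraction reaches the threshold percentage.
    public static boolean canLeaveSafeMode(int[] reportedReplicasPerBlock,
                                           int safeModeMinReplicas,
                                           double thresholdPct) {
        int safe = 0;
        for (int replicas : reportedReplicasPerBlock) {
            if (replicas >= safeModeMinReplicas) {
                safe++;
            }
        }
        return safe >= thresholdPct * reportedReplicasPerBlock.length;
    }
}
```

With the minimum decoupled, an operator can keep min replication at 2 for durability while still leaving safe mode as soon as every block has one reported replica.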
[jira] [Commented] (HDFS-8433) blockToken is not set in constructInternalBlock and parseStripedBlockGroup in StripedBlockUtil
[ https://issues.apache.org/jira/browse/HDFS-8433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14628883#comment-14628883 ] Zhe Zhang commented on HDFS-8433: - Thanks for clarifying, I missed the {{BlockManager}} change. blockToken is not set in constructInternalBlock and parseStripedBlockGroup in StripedBlockUtil -- Key: HDFS-8433 URL: https://issues.apache.org/jira/browse/HDFS-8433 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Tsz Wo Nicholas Sze Assignee: Walter Su Attachments: HDFS-8433-HDFS-7285.02.patch, HDFS-8433.00.patch, HDFS-8433.01.patch, HDFS-8433.03.PoC.patch The blockToken provided in LocatedStripedBlock is not used to create LocatedBlock in constructInternalBlock and parseStripedBlockGroup in StripedBlockUtil. We should also add ec tests with security on. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8767) RawLocalFileSystem.listStatus() returns null for UNIX pipefile
[ https://issues.apache.org/jira/browse/HDFS-8767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14628903#comment-14628903 ] Hadoop QA commented on HDFS-8767: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 19m 3s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 8m 43s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 55s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 1m 18s | The applied patch generated 1 new checkstyle issues (total was 21, now 21). | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 25s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 36s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 2m 7s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | common tests | 24m 1s | Tests passed in hadoop-common. 
| | | | 68m 35s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12745529/HDFS-8767.003.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 3ec0a04 | | checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/11721/artifact/patchprocess/diffcheckstylehadoop-common.txt | | hadoop-common test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11721/artifact/patchprocess/testrun_hadoop-common.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/11721/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11721/console | This message was automatically generated. RawLocalFileSystem.listStatus() returns null for UNIX pipefile -- Key: HDFS-8767 URL: https://issues.apache.org/jira/browse/HDFS-8767 Project: Hadoop HDFS Issue Type: Bug Reporter: Haohui Mai Assignee: kanaka kumar avvaru Priority: Critical Attachments: HDFS-8767-00.patch, HDFS-8767-01.patch, HDFS-8767-02.patch, HDFS-8767.003.patch Calling FileSystem.listStatus() on a UNIX pipe file returns null instead of the file. The bug breaks Hive when Hive loads data from UNIX pipe file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8787) Erasure coding: rename BlockInfoContiguousUC and BlockInfoStripedUC to be consistent with trunk
[ https://issues.apache.org/jira/browse/HDFS-8787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14628917#comment-14628917 ] Jing Zhao commented on HDFS-8787: - +1 pending Jenkins. Erasure coding: rename BlockInfoContiguousUC and BlockInfoStripedUC to be consistent with trunk --- Key: HDFS-8787 URL: https://issues.apache.org/jira/browse/HDFS-8787 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: HDFS-7285 Reporter: Zhe Zhang Assignee: Zhe Zhang Attachments: HDFS-8787-HDFS-7285.00.patch As Nicholas suggested under HDFS-8728, we should split the patch on {{BlockInfo}} structure into smaller pieces. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7483) Display information per tier on the Namenode UI
[ https://issues.apache.org/jira/browse/HDFS-7483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14628406#comment-14628406 ] Benoy Antony commented on HDFS-7483: To add on, I had tried that approach while working on the patch. _math_ is a helper whereas fmt_percentage is a filter. We cannot do something like helper | filter. Some helpers support a filters attribute, but the math helper does not. So I could not reuse the math helper together with the fmt_percentage filter; that's why I wrote a new percentage helper. Display information per tier on the Namenode UI --- Key: HDFS-7483 URL: https://issues.apache.org/jira/browse/HDFS-7483 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Benoy Antony Assignee: Benoy Antony Attachments: HDFS-7483-001.patch, HDFS-7483-002.patch, overview.png, storagetypes.png, storagetypes_withnostorage.png, withOneStorageType.png, withTwoStorageType.png If the cluster has different types of storage, it is useful to display the storage information per type. The information will be available via JMX (HDFS-7390) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8783) enable socket timeout for balancer's target connection
Chang Li created HDFS-8783: -- Summary: enable socket timeout for balancer's target connection Key: HDFS-8783 URL: https://issues.apache.org/jira/browse/HDFS-8783 Project: Hadoop HDFS Issue Type: Bug Reporter: Chang Li Assignee: Chang Li We have hit a real case where the balancer connected to a black-hole target datanode that accepted the connection but never sent any response back, and the balancer hung -- This message was sent by Atlassian JIRA (v6.3.4#6332)
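The fix direction the summary suggests can be sketched as follows. This is my own hypothetical demo, not the Balancer's actual code (class and parameter names are mine): a connect timeout alone does not protect against a peer that accepts the connection and then goes silent, so SO_TIMEOUT must be set as well to bound each blocking read.

```java
// Hypothetical sketch, not Balancer code: connect with a bounded connect
// timeout, then set SO_TIMEOUT so a "black hole" peer that accepts the
// connection but never responds makes reads fail with SocketTimeoutException
// instead of hanging forever.
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

class TimedSocketFactory {
    static Socket connect(String host, int port,
                          int connectTimeoutMs, int readTimeoutMs)
            throws IOException {
        Socket s = new Socket();
        s.connect(new InetSocketAddress(host, port), connectTimeoutMs);
        s.setSoTimeout(readTimeoutMs); // bounds every blocking read on the socket
        return s;
    }

    // Self-check: connect to a local listener and report the applied SO_TIMEOUT.
    static int demoSoTimeout() {
        try (ServerSocket server = new ServerSocket(0);
             Socket s = connect("127.0.0.1", server.getLocalPort(), 1000, 2000)) {
            return s.getSoTimeout();
        } catch (IOException e) {
            return -1;
        }
    }

    public static void main(String[] args) {
        System.out.println(demoSoTimeout()); // 2000
    }
}
```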
[jira] [Updated] (HDFS-8697) Refactor DecommissionManager: more generic method names and misc cleanup
[ https://issues.apache.org/jira/browse/HDFS-8697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang updated HDFS-8697: Attachment: HDFS-8697.01.patch Thanks Andrew for pointing out the issue. Uploading new patch to revise {{replicated}} and {{stored}} related naming (generalizing them with {{redundancy}}). I filed HDFS-8786 as a follow-on to avoid reconstruction after decomm. The change could be large because it breaks the implicit assumption that each internal block should have only 1 replica in common cases. Refactor DecommissionManager: more generic method names and misc cleanup Key: HDFS-8697 URL: https://issues.apache.org/jira/browse/HDFS-8697 Project: Hadoop HDFS Issue Type: New Feature Components: namenode Affects Versions: 2.7.0 Reporter: Zhe Zhang Assignee: Zhe Zhang Attachments: HDFS-8697.00.patch, HDFS-8697.01.patch This JIRA merges the changes in {{DecommissionManager}} from the HDFS-7285 branch, including changing a few method names to be more generic ({{replicated}} - {{stored}}), and some cleanups. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8785) TestDistributedFileSystem is failing in trunk
[ https://issues.apache.org/jira/browse/HDFS-8785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14628707#comment-14628707 ] Andrew Wang commented on HDFS-8785: --- Possibly related to HDFS-7608? [~cmccabe] thoughts? TestDistributedFileSystem is failing in trunk - Key: HDFS-8785 URL: https://issues.apache.org/jira/browse/HDFS-8785 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 3.0.0, 2.8.0 Reporter: Arpit Agarwal Assignee: Xiaoyu Yao A newly added test case {{TestDistributedFileSystem#testDFSClientPeerWriteTimeout}} is failing in trunk. e.g. run https://builds.apache.org/job/PreCommit-HDFS-Build/11716/testReport/org.apache.hadoop.hdfs/TestDistributedFileSystem/testDFSClientPeerWriteTimeout/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8778) TestBlockReportRateLimiting#testLeaseExpiration can deadlock
[ https://issues.apache.org/jira/browse/HDFS-8778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-8778: Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 2.8.0 Target Version/s: (was: 2.7.2) Status: Resolved (was: Patch Available) Thanks for the review Andrew. Committed for 2.8.0. TestBlockReportRateLimiting#testLeaseExpiration can deadlock Key: HDFS-8778 URL: https://issues.apache.org/jira/browse/HDFS-8778 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 2.7.1 Reporter: Arpit Agarwal Assignee: Arpit Agarwal Fix For: 2.8.0 Attachments: HDFS-8778.01.patch, HDFS-8778.02.patch {{requestBlockReportLease}} blocks on DataNode registration while holding the NameSystem read lock. DataNode registration can block on the NameSystem read lock if a writer gets in the queue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
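The queueing behavior behind this deadlock can be shown in isolation. A minimal, self-contained sketch (my own demo code, not HDFS's lock implementation): once a writer is parked in a fair read-write lock's queue, a fresh reader from another thread waits behind it even though the lock is only read-held, so a read-lock holder that blocks waiting on another would-be reader can never be released.

```java
// Demonstrates the deadlock ingredient: with a writer queued on a fair
// ReentrantReadWriteLock, a new reader (on a thread that holds nothing)
// cannot acquire the read lock even though only readers hold it.
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class ReadLockQueueDemo {
    static boolean secondReaderGetsLock() {
        final ReentrantReadWriteLock rwl = new ReentrantReadWriteLock(true); // fair
        rwl.readLock().lock();                      // reader A (this thread)
        Thread writer = new Thread(() -> {
            rwl.writeLock().lock();                 // parks behind reader A
            rwl.writeLock().unlock();
        });
        writer.start();
        try {
            while (!rwl.hasQueuedThreads()) {       // wait until the writer parks
                Thread.sleep(10);
            }
            final boolean[] got = new boolean[1];
            Thread reader = new Thread(() -> {
                try {
                    // Reader B: times out, because the writer is ahead in the queue.
                    got[0] = rwl.readLock().tryLock(500, TimeUnit.MILLISECONDS);
                    if (got[0]) {
                        rwl.readLock().unlock();
                    }
                } catch (InterruptedException ignored) {
                }
            });
            reader.start();
            reader.join();
            return got[0];
        } catch (InterruptedException e) {
            throw new AssertionError(e);
        } finally {
            rwl.readLock().unlock();                // release A so the writer finishes
        }
    }

    public static void main(String[] args) {
        System.out.println(secondReaderGetsLock()); // false
    }
}
```

In the JIRA, reader A is `requestBlockReportLease` holding the NameSystem read lock, and the registration it waits on is reader B.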
[jira] [Commented] (HDFS-8767) RawLocalFileSystem.listStatus() returns null for UNIX pipefile
[ https://issues.apache.org/jira/browse/HDFS-8767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14628756#comment-14628756 ] Haohui Mai commented on HDFS-8767: -- Thanks for the work. The fix looks good. bq. The format looks fine in eclipse. Fixing this will reduce the readability Readability is subjective. It might make more sense to fix it to avoid the checkstyle warnings. {code}
+  @Test
+  public void testFileStatusPipeFile() throws Exception {
+    Assume.assumeTrue(SystemUtils.IS_OS_UNIX);
+    String path = TEST_ROOT_DIR + "/testfifofile";
+    new File(path).delete();
+    File fifoFile = new File(path);
+    fifoFile.getParentFile().mkdirs();
+    String fullPath = fifoFile.getAbsolutePath();
+    Process process = Runtime.getRuntime().exec("mkfifo " + fullPath);
+    process.waitFor();
+
+    String input = org.apache.commons.io.IOUtils.toString(process
+        .getInputStream());
+    String errors = org.apache.commons.io.IOUtils.toString(process
+        .getErrorStream());
+    assertTrue("Expected empty but got " + input, "".equals(input));
+    assertTrue("Expected empty but got " + errors, "".equals(errors));
+
+    fifoFile = new File(fullPath);
+    assertTrue("FIFO file should present", fifoFile.exists());
+    assertFalse(fifoFile.isFile());
+    assertFalse(fifoFile.isDirectory());
+
+    Path fsPath = new Path(path);
+    FileSystem fs = fileSys.getRawFileSystem();
+    assertTrue(fs.exists(fsPath));
+    assertNotNull(fs.listStatus(fsPath));
+    fifoFile.delete();
+  }
 }
{code} To me it seems that it makes more sense to test it through mockito instead of creating a real pipe file. I'll upload a patch later to demonstrate the proposed approach.
RawLocalFileSystem.listStatus() returns null for UNIX pipefile -- Key: HDFS-8767 URL: https://issues.apache.org/jira/browse/HDFS-8767 Project: Hadoop HDFS Issue Type: Bug Reporter: Haohui Mai Assignee: kanaka kumar avvaru Priority: Critical Attachments: HDFS-8767-00.patch, HDFS-8767-01.patch, HDFS-8767-02.patch Calling FileSystem.listStatus() on a UNIX pipe file returns null instead of the file. The bug breaks Hive when Hive loads data from UNIX pipe file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HDFS-8785) TestDistributedFileSystem is failing in trunk
[ https://issues.apache.org/jira/browse/HDFS-8785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao reassigned HDFS-8785: Assignee: Xiaoyu Yao TestDistributedFileSystem is failing in trunk - Key: HDFS-8785 URL: https://issues.apache.org/jira/browse/HDFS-8785 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 3.0.0, 2.8.0 Reporter: Arpit Agarwal Assignee: Xiaoyu Yao A newly added test case {{TestDistributedFileSystem#testDFSClientPeerWriteTimeout}} is failing in trunk. e.g. run https://builds.apache.org/job/PreCommit-HDFS-Build/11716/testReport/org.apache.hadoop.hdfs/TestDistributedFileSystem/testDFSClientPeerWriteTimeout/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8785) TestDistributedFileSystem is failing in trunk
[ https://issues.apache.org/jira/browse/HDFS-8785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14628654#comment-14628654 ] Xiaoyu Yao commented on HDFS-8785: -- Thanks [~arpitagarwal] for reporting this, I will take a look at it. TestDistributedFileSystem is failing in trunk - Key: HDFS-8785 URL: https://issues.apache.org/jira/browse/HDFS-8785 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 3.0.0, 2.8.0 Reporter: Arpit Agarwal Assignee: Xiaoyu Yao A newly added test case {{TestDistributedFileSystem#testDFSClientPeerWriteTimeout}} is failing in trunk. e.g. run https://builds.apache.org/job/PreCommit-HDFS-Build/11716/testReport/org.apache.hadoop.hdfs/TestDistributedFileSystem/testDFSClientPeerWriteTimeout/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8778) TestBlockReportRateLimiting#testLeaseExpiration can deadlock
[ https://issues.apache.org/jira/browse/HDFS-8778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14628703#comment-14628703 ] Andrew Wang commented on HDFS-8778: --- LGTM +1, thanks Arpit for finding and fixing. Test failure looks unrelated. TestBlockReportRateLimiting#testLeaseExpiration can deadlock Key: HDFS-8778 URL: https://issues.apache.org/jira/browse/HDFS-8778 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 2.7.1 Reporter: Arpit Agarwal Assignee: Arpit Agarwal Attachments: HDFS-8778.01.patch, HDFS-8778.02.patch {{requestBlockReportLease}} blocks on DataNode registration while holding the NameSystem read lock. DataNode registration can block on the NameSystem read lock if a writer gets in the queue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8666) speedup TestMover
[ https://issues.apache.org/jira/browse/HDFS-8666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14628705#comment-14628705 ] Tsz Wo Nicholas Sze commented on HDFS-8666: --- With the patch, the time reduces from 5m24s to 49s per local test. The result is great. Thanks! speedup TestMover - Key: HDFS-8666 URL: https://issues.apache.org/jira/browse/HDFS-8666 Project: Hadoop HDFS Issue Type: Bug Components: test Reporter: Walter Su Assignee: Walter Su Fix For: 2.8.0 Attachments: HDFS-8666.01.patch TestMover is one of the most time consuming tests.(See [TestReport#1|https://builds.apache.org/job/PreCommit-HDFS-Build/11450/testReport/] ) It often timeout. (See [TestReport#2|https://issues.apache.org/jira/browse/HDFS-8652?focusedCommentId=14598394page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14598394] ) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8779) WebUI can't display randomly generated block ID
[ https://issues.apache.org/jira/browse/HDFS-8779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14628709#comment-14628709 ] Andrew Wang commented on HDFS-8779: --- Hi [~walter.k.su], is it possible to add a test for this? Otherwise looks good :) WebUI can't display randomly generated block ID --- Key: HDFS-8779 URL: https://issues.apache.org/jira/browse/HDFS-8779 Project: Hadoop HDFS Issue Type: Bug Components: webhdfs Reporter: Walter Su Assignee: Walter Su Priority: Minor Attachments: HDFS-8779.01.patch, HDFS-8779.02.patch Old releases used randomly generated block IDs (HDFS-4645). The max value of a Long in Java is 2^63-1, while the max safe number value in Javascript is 2^53-1. (See [Link|https://developer.mozilla.org/en/docs/Web/JavaScript/Reference/Global_Objects/Number/MAX_SAFE_INTEGER]) This means almost every randomly generated block ID exceeds MAX_SAFE_INTEGER, and an integer which exceeds MAX_SAFE_INTEGER cannot be represented exactly in Javascript. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
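The precision loss described above can be demonstrated without a browser. This is hypothetical demo code, not from the patch: a JavaScript Number is an IEEE-754 double, so a Java long displays correctly in the WebUI only if it survives a round trip through double.

```java
// Hypothetical demo, not HDFS code: a long renders exactly in JavaScript
// only if casting it to double and back is lossless.
class BlockIdPrecisionDemo {
    static final long MAX_SAFE_INTEGER = (1L << 53) - 1; // 9007199254740991

    static boolean safeInJavascript(long id) {
        return (long) (double) id == id;
    }

    public static void main(String[] args) {
        long sequentialId = 1073741825L;      // a small, sequentially assigned id
        long randomId = MAX_SAFE_INTEGER + 2; // 2^53 + 1, like most random ids
        System.out.println(safeInJavascript(sequentialId)); // true
        System.out.println(safeInJavascript(randomId));     // false
    }
}
```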
[jira] [Commented] (HDFS-8778) TestBlockReportRateLimiting#testLeaseExpiration can deadlock
[ https://issues.apache.org/jira/browse/HDFS-8778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14628747#comment-14628747 ] Hudson commented on HDFS-8778: -- FAILURE: Integrated in Hadoop-trunk-Commit #8169 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8169/]) HDFS-8778. TestBlockReportRateLimiting#testLeaseExpiration can deadlock. (Contributed by Arpit Agarwal) (arp: rev 3ec0a0444f75c8743289ec7c8645d4bdf51fc45a) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlockReportRateLimiting.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt TestBlockReportRateLimiting#testLeaseExpiration can deadlock Key: HDFS-8778 URL: https://issues.apache.org/jira/browse/HDFS-8778 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 2.7.1 Reporter: Arpit Agarwal Assignee: Arpit Agarwal Fix For: 2.8.0 Attachments: HDFS-8778.01.patch, HDFS-8778.02.patch {{requestBlockReportLease}} blocks on DataNode registration while holding the NameSystem read lock. DataNode registration can block on the NameSystem read lock if a writer gets in the queue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8078) HDFS client gets errors trying to to connect to IPv6 DataNode
[ https://issues.apache.org/jira/browse/HDFS-8078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14628763#comment-14628763 ] Elliott Clark commented on HDFS-8078: - PING? HDFS client gets errors trying to to connect to IPv6 DataNode - Key: HDFS-8078 URL: https://issues.apache.org/jira/browse/HDFS-8078 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.6.0 Reporter: Nate Edel Assignee: Nate Edel Labels: BB2015-05-TBR, ipv6 Attachments: HDFS-8078.10.patch, HDFS-8078.9.patch 1st exception, on put: 15/03/23 18:43:18 WARN hdfs.DFSClient: DataStreamer Exception java.lang.IllegalArgumentException: Does not contain a valid host:port authority: 2401:db00:1010:70ba:face:0:8:0:50010 at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:212) at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:164) at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:153) at org.apache.hadoop.hdfs.DFSOutputStream.createSocketForPipeline(DFSOutputStream.java:1607) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1408) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1361) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:588) Appears to actually stem from code in DataNodeID which assumes it's safe to append together (ipaddr + ":" + port) -- which is OK for IPv4 and not OK for IPv6. NetUtils.createSocketAddr() assembles a Java URI object, which requires the format proto://[2401:db00:1010:70ba:face:0:8:0]:50010 Currently using InetAddress.getByName() to validate IPv6 (guava InetAddresses.forString has been flaky) but could also use our own parsing.
(From logging this, it seems like a low-enough frequency call that the extra object creation shouldn't be problematic, and for me the slight risk of passing in bad input that is not actually an IPv4 or IPv6 address and thus calling an external DNS lookup is outweighed by getting the address normalized and avoiding rewriting parsing.) Alternatively, sun.net.util.IPAddressUtil.isIPv6LiteralAddress() --- 2nd exception (on datanode) 15/04/13 13:18:07 ERROR datanode.DataNode: dev1903.prn1.facebook.com:50010:DataXceiver error processing unknown operation src: /2401:db00:20:7013:face:0:7:0:54152 dst: /2401:db00:11:d010:face:0:2f:0:50010 java.io.EOFException at java.io.DataInputStream.readShort(DataInputStream.java:315) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.readOp(Receiver.java:58) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:226) at java.lang.Thread.run(Thread.java:745) Which also comes as client error -get: 2401 is not an IP string literal. This one has existing parsing logic which needs to shift to the last colon rather than the first. Should also be a tiny bit faster by using lastIndexOf rather than split. Could alternatively use the techniques above. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
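The two fixes the report describes can be sketched together. This is a hypothetical illustration, not the DatanodeID or NetUtils API (class and method names are mine): bracket IPv6 literals when composing host:port, and split on the last colon (lastIndexOf rather than split) when parsing.

```java
// Hypothetical sketch, not HDFS code: compose and parse host:port strings so
// the colons inside an IPv6 literal do not break parsing.
class HostPortUtil {
    // Wrap IPv6 literals in brackets, as URI syntax requires:
    // proto://[2401:db00::1]:50010
    static String toHostPort(String ipAddr, int port) {
        if (ipAddr.indexOf(':') >= 0) {
            return "[" + ipAddr + "]:" + port;
        }
        return ipAddr + ":" + port;
    }

    // Split on the *last* colon so IPv6 literals stay intact.
    static String hostOf(String hostPort) {
        String host = hostPort.substring(0, hostPort.lastIndexOf(':'));
        if (host.startsWith("[") && host.endsWith("]")) {
            host = host.substring(1, host.length() - 1); // strip brackets
        }
        return host;
    }

    static int portOf(String hostPort) {
        return Integer.parseInt(hostPort.substring(hostPort.lastIndexOf(':') + 1));
    }

    public static void main(String[] args) {
        System.out.println(toHostPort("2401:db00:1010:70ba:face:0:8:0", 50010));
        System.out.println(hostOf("[2401:db00:1010:70ba:face:0:8:0]:50010"));
        System.out.println(portOf("[2401:db00:1010:70ba:face:0:8:0]:50010"));
    }
}
```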
[jira] [Created] (HDFS-8786) Erasure coding: DataNode should transfer striped blocks before being decommissioned
Zhe Zhang created HDFS-8786: --- Summary: Erasure coding: DataNode should transfer striped blocks before being decommissioned Key: HDFS-8786 URL: https://issues.apache.org/jira/browse/HDFS-8786 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Zhe Zhang Per [discussion | https://issues.apache.org/jira/browse/HDFS-8697?focusedCommentId=14609004page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14609004] under HDFS-8697, it's too expensive to reconstruct block groups for decomm purpose. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HDFS-8738) Limit Exceptions thrown by DataNode when a client makes socket connection and sends an empty message
[ https://issues.apache.org/jira/browse/HDFS-8738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Kartha reassigned HDFS-8738: --- Assignee: Rajesh Kartha Limit Exceptions thrown by DataNode when a client makes socket connection and sends an empty message Key: HDFS-8738 URL: https://issues.apache.org/jira/browse/HDFS-8738 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.7.1 Reporter: Rajesh Kartha Assignee: Rajesh Kartha Priority: Minor When a client creates a socket connection to the Datanode and sends an empty message, the datanode logs have exceptions like these: 2015-07-08 20:00:55,427 ERROR datanode.DataNode (DataXceiver.java:run(278)) - bidev17.rtp.ibm.com:50010:DataXceiver error processing unknown operation src: /127.0.0.1:41508 dst: /127.0.0.1:50010 java.io.EOFException at java.io.DataInputStream.readShort(DataInputStream.java:315) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.readOp(Receiver.java:58) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:227) at java.lang.Thread.run(Thread.java:745) 2015-07-08 20:00:56,671 ERROR datanode.DataNode (DataXceiver.java:run(278)) - bidev17.rtp.ibm.com:50010:DataXceiver error processing unknown operation src: /127.0.0.1:41509 dst: /127.0.0.1:50010 java.io.EOFException at java.io.DataInputStream.readShort(DataInputStream.java:315) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.readOp(Receiver.java:58) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:227) at java.lang.Thread.run(Thread.java:745) These can fill up the logs and was recently noticed with an Ambari 2.1 based install which tries to check if the datanode is up. 
Can be easily reproduced with a simple Java client creating a Socket connection:
{code}
public static void main(String[] args) {
  Socket DNClient;
  try {
    DNClient = new Socket("127.0.0.1", 50010);
    DataOutputStream os = new DataOutputStream(DNClient.getOutputStream());
    os.writeBytes("");
    os.close();
  } catch (UnknownHostException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
  } catch (IOException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
  }
}
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8728) Erasure coding: revisit and simplify BlockInfoStriped and INodeFile
[ https://issues.apache.org/jira/browse/HDFS-8728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang updated HDFS-8728: Status: Open (was: Patch Available) Erasure coding: revisit and simplify BlockInfoStriped and INodeFile --- Key: HDFS-8728 URL: https://issues.apache.org/jira/browse/HDFS-8728 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Zhe Zhang Assignee: Zhe Zhang Attachments: HDFS-8728-HDFS-7285.00.patch, HDFS-8728-HDFS-7285.01.patch, HDFS-8728-HDFS-7285.02.patch, HDFS-8728.00.patch, HDFS-8728.01.patch, HDFS-8728.02.patch, Merge-1-codec.patch, Merge-2-ecZones.patch, Merge-3-blockInfo.patch, Merge-4-blockmanagement.patch, Merge-5-blockPlacementPolicies.patch, Merge-6-locatedStripedBlock.patch, Merge-7-replicationMonitor.patch, Merge-8-inodeFile.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8728) Erasure coding: revisit and simplify BlockInfoStriped and INodeFile
[ https://issues.apache.org/jira/browse/HDFS-8728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14628965#comment-14628965 ] Zhe Zhang commented on HDFS-8728: - Yes it does. Erasure coding: revisit and simplify BlockInfoStriped and INodeFile --- Key: HDFS-8728 URL: https://issues.apache.org/jira/browse/HDFS-8728 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Zhe Zhang Assignee: Zhe Zhang Attachments: HDFS-8728-HDFS-7285.00.patch, HDFS-8728-HDFS-7285.01.patch, HDFS-8728-HDFS-7285.02.patch, HDFS-8728-HDFS-7285.03.patch, HDFS-8728.00.patch, HDFS-8728.01.patch, HDFS-8728.02.patch, Merge-1-codec.patch, Merge-2-ecZones.patch, Merge-3-blockInfo.patch, Merge-4-blockmanagement.patch, Merge-5-blockPlacementPolicies.patch, Merge-6-locatedStripedBlock.patch, Merge-7-replicationMonitor.patch, Merge-8-inodeFile.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8788) Implement unit tests for remote block reader in libhdfspp
[ https://issues.apache.org/jira/browse/HDFS-8788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-8788: - Attachment: HDFS-8788.000.patch Implement unit tests for remote block reader in libhdfspp - Key: HDFS-8788 URL: https://issues.apache.org/jira/browse/HDFS-8788 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-8788.000.patch This jira proposes to implement unit tests for the remote block reader in gmock. -- This message was sent by Atlassian JIRA (v6.3.4#6332)