[jira] [Commented] (HDFS-13032) Make AvailableSpaceBlockPlacementPolicy more adaptive

2018-06-19 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16517763#comment-16517763
 ] 

genericqa commented on HDFS-13032:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
16s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 
48s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
57s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
0s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 14s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
52s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
47s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
 9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
57s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 2 line(s) that end in whitespace. Use git 
apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply 
{color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 10s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  
2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
46s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 82m 11s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
33s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}145m 19s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.tools.TestHdfsConfigFields |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | HDFS-13032 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12907063/HDFS-13032.002.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 1bdfd96ae9f2 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 
10:45:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 2d87592 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
| whitespace | 
https://builds.apache.org/job/PreCommit-HDFS-Build/24478/artifact/out/whitespace-eol.txt
 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/24478/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/24478/testReport/ |
| Max. process+thread count | 3312 (vs. ulimit of 1) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: 

[jira] [Updated] (HDDS-166) Create a landing page for Ozone

2018-06-19 Thread Elek, Marton (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elek, Marton updated HDDS-166:
--
Status: Patch Available  (was: Open)

Uploaded both the source and the rendered version (which also includes the 
current version of the ozone docs from the hadoop-ozone/docs project). 

It is supposed to be committed to the ozone-site repository.

> Create a landing page for Ozone
> ---
>
> Key: HDDS-166
> URL: https://issues.apache.org/jira/browse/HDDS-166
> Project: Hadoop Distributed Data Store
>  Issue Type: Task
>  Components: document
>Reporter: Elek, Marton
>Assignee: Elek, Marton
>Priority: Major
> Attachments: ozone-site-rendered.tar.gz, ozone-site-source.tar.gz
>
>
> As the Ozone release cycle is separated from Hadoop, we need a separate page to 
> publish the releases.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-166) Create a landing page for Ozone

2018-06-19 Thread Elek, Marton (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elek, Marton updated HDDS-166:
--
Attachment: ozone-site-source.tar.gz

> Create a landing page for Ozone
> ---
>
> Key: HDDS-166
> URL: https://issues.apache.org/jira/browse/HDDS-166
> Project: Hadoop Distributed Data Store
>  Issue Type: Task
>  Components: document
>Reporter: Elek, Marton
>Assignee: Elek, Marton
>Priority: Major
> Attachments: ozone-site-rendered.tar.gz, ozone-site-source.tar.gz
>
>
> As the Ozone release cycle is separated from Hadoop, we need a separate page to 
> publish the releases.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-166) Create a landing page for Ozone

2018-06-19 Thread Elek, Marton (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elek, Marton updated HDDS-166:
--
Attachment: (was: ozone-site.tar.gz)

> Create a landing page for Ozone
> ---
>
> Key: HDDS-166
> URL: https://issues.apache.org/jira/browse/HDDS-166
> Project: Hadoop Distributed Data Store
>  Issue Type: Task
>  Components: document
>Reporter: Elek, Marton
>Assignee: Elek, Marton
>Priority: Major
> Attachments: ozone-site-rendered.tar.gz
>
>
> As the Ozone release cycle is separated from Hadoop, we need a separate page to 
> publish the releases.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-166) Create a landing page for Ozone

2018-06-19 Thread Elek, Marton (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elek, Marton updated HDDS-166:
--
Attachment: ozone-site-rendered.tar.gz

> Create a landing page for Ozone
> ---
>
> Key: HDDS-166
> URL: https://issues.apache.org/jira/browse/HDDS-166
> Project: Hadoop Distributed Data Store
>  Issue Type: Task
>  Components: document
>Reporter: Elek, Marton
>Assignee: Elek, Marton
>Priority: Major
> Attachments: ozone-site-rendered.tar.gz, ozone-site.tar.gz
>
>
> As the Ozone release cycle is separated from Hadoop, we need a separate page to 
> publish the releases.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-175) Refactor ContainerInfo to remove Pipeline object from it

2018-06-19 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16517712#comment-16517712
 ] 

genericqa commented on HDDS-175:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
20s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 18 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  2m 
18s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 
45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 28m 
47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
19s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  4m 
17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 33s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site hadoop-ozone/integration-test 
{color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
42s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  4m 
23s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
55s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  3m 
 7s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 28m  
2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 28m  
2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 28m  
2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  4m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  
9m 51s{color} | {color:green} patch has no errors when building and testing our 
client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site hadoop-ozone/integration-test 
{color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  1m  
0s{color} | {color:red} hadoop-hdds/common generated 1 new + 0 unchanged - 0 
fixed = 1 total (was 0) {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
44s{color} | {color:red} hadoop-hdds/server-scm generated 2 new + 0 unchanged - 
0 fixed = 2 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  4m 
48s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
21s{color} | {color:green} hadoop-yarn-site in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m  
9s{color} | {color:green} common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
24s{color} | {color:green} client in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m 
23s{color} | {color:green} server-scm in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | 

[jira] [Assigned] (HDFS-13032) Make AvailableSpaceBlockPlacementPolicy more adaptive

2018-06-19 Thread Tao Jie (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Jie reassigned HDFS-13032:
--

Assignee: Tao Jie

> Make AvailableSpaceBlockPlacementPolicy more adaptive
> -
>
> Key: HDFS-13032
> URL: https://issues.apache.org/jira/browse/HDFS-13032
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.8.2
>Reporter: Tao Jie
>Assignee: Tao Jie
>Priority: Major
> Attachments: HDFS-13032.001.patch, HDFS-13032.002.patch
>
>
> In a heterogeneous HDFS cluster, datanode capacities and usage vary widely.
> Now we can use HDFS-8131, a usage-aware block placement policy, to deal with 
> the problem. However, this policy could be more flexible.
> 1. The probability of a node with high usage being chosen is fixed once the 
> parameter is set. That is, the probability is always the same whether its 
> usage is 90% or 70%. When the usage of a node is close to full, its 
> probability of being chosen should be lower.
> 2. When the difference in usage is below 5% (hard-coded), the two nodes are 
> considered to have the same usage. I think it's OK when usage is 30% and 35%, 
> but when usage is 93% and 98%, they should not be treated equally. The 
> correction of probability could be smoother.
> In my opinion, when we choose one node from two candidates (A: usage 30%, B: 
> usage 60%), we can calculate the probability according to the available 
> storage: p(A) = 70%/(70% + 40%), p(B) = 40%/(70% + 40%). When a node is close 
> to full, its probability would be very small.
> We could also have another factor to weaken this correction, to make the 
> modification less aggressive.
> Any thought? [~liushaohui]
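
For illustration, a minimal Java sketch of the weighting described above, assuming 
two candidate datanodes and using remaining-capacity fractions as weights; the 
class and method names are made up, not actual policy code:

{code}
import java.util.concurrent.ThreadLocalRandom;

// Illustrative only: pick between two candidates with probability proportional
// to their free-space fraction, per the p(A)/p(B) example above.
public final class AvailableSpaceChooserSketch {
  /** @return true if candidate A (with usageA) should be chosen over B. */
  static boolean chooseA(double usageA, double usageB) {
    double freeA = 1.0 - usageA;         // usage 30% -> weight 0.70
    double freeB = 1.0 - usageB;         // usage 60% -> weight 0.40
    double pA = freeA / (freeA + freeB); // p(A) = 70% / (70% + 40%)
    return ThreadLocalRandom.current().nextDouble() < pA;
  }
}
{code}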



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-13609) [Edit Tail Fast Path Pt 3] NameNode-side changes to support tailing edits via RPC

2018-06-19 Thread Erik Krogen (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16517672#comment-16517672
 ] 

Erik Krogen edited comment on HDFS-13609 at 6/20/18 1:05 AM:
-

Thanks for the review [~shv]!
# I agree that this would be much cleaner. In many cases, {{inProgressOk}} will 
be equivalent to {{optimizeLatency}}. However there are a few cases where this 
is not currently true. If you agree with my assessments below about these cases 
being OK with RPC, and that we can handle {{BackupImage}} as I describe below, 
I think I can remove this new parameter and limit the scope of the change.
** {{FSEditLog#openForWrite()}} - It is using {{selectInputStreams}} to confirm 
that no one else is writing new transactions. It seems fine to allow this to 
use the RPC mechanism.
** {{BootstrapStandby#checkLogsAvailableForRead()}} - It is confirming that a 
range of transaction IDs are available. Seems fine to allow this to use the RPC 
mechanism.
** {{NameNodeRpcServer#getEventBatchList()}} - Serves ranges of transactions 
for INotify feature. Seems fine (actually, seems desirable) to let this use the 
RPC mechanism. However, on a slightly unrelated note, one portion of this will 
need to be changed to work properly in a read-from-standby environment... Filed 
HDFS-13689 for this.
** {{NameNode#copyEditLogSegmentsToSharedDir()}} - This is only called on 
{{NameNode#initializeSharedEdits()}}, i.e. a separate startup flag for the 
NameNode. I don't think it's necessary to optimize for this situation.
** {{BackupImage#tryConvergeJournalSpool()}} - This code is doing some sketchy 
things and making assumptions about the streams returned that will not be true 
when using the RPC mechanism. We need to prevent this from using the RPC 
mechanism, but given that this is only for the BackupNode, I recommend we avoid 
adding a new API / parameter just for this situation and disable the RPC 
mechanism on the BackupNode entirely. I instead propose that we add a way for 
the BackupNode to disable RPC reads on the {{QuorumJournalManager}}. This could 
take the form of an undocumented config parameter, or, my preference, add a 
static method {{QuorumJournalManager.disableRPCJournalStreams()}} which the 
BackupNode can call.
# Agreed. I will fix this in the next patch.
# I thought more about why an operator might want to change this config. I 
determined that I can imagine situations when I would want to increase it, if 
the situation arises that RPC response time from the JournalNodes is high and 
the number of transactions per second is very high (say, a very high write 
workload). But I can't think of a reason to lower it; this is more about just 
setting a sanity-check upper bound. This makes me think we should (a) raise the 
default limit to 5000 -> even with an RPC round-trip time of 100ms, which is 
quite high, this would allow 50k transactions per second, and (b) make it 
undocumented as you described. I will incorporate this into the next patch.


was (Author: xkrogen):
Thanks for the review [~shv]!
# I agree that this would be much cleaner. In many cases, {{inProgressOk}} will 
be equivalent to {{optimizeLatency}}. However there are a few cases where this 
is not currently true: 
** {{FSEditLog#openForWrite()}} - It is using {{selectInputStreams}} to confirm 
that no one else is writing new transactions. It seems fine to allow this to 
use the RPC mechanism.
** {{BootstrapStandby#checkLogsAvailableForRead()}} - It is confirming that a 
range of transaction IDs are available. Seems fine to allow this to use the RPC 
mechanism.
** {{NameNodeRpcServer#getEventBatchList()}} - Serves ranges of transactions 
for INotify feature. Seems fine (actually, seems desirable) to let this use the 
RPC mechanism. However, on a slightly unrelated note, one portion of this will 
need to be changed to work properly in a read-from-standby environment... Filed 
HDFS-13689 for this.
** {{NameNode#copyEditLogSegmentsToSharedDir()}} - This is only called on 
{{NameNode#initializeSharedEdits()}}, i.e. a separate startup flag for the 
NameNode. I don't think it's necessary to optimize for this situation.
** {{BackupImage#tryConvergeJournalSpool()}} - This code is doing some sketchy 
things and making assumptions about the streams returned that will not be true 
when using the RPC mechanism. We need to prevent this from using the RPC 
mechanism, but given that this is only for the BackupNode, I recommend we avoid 
adding a new API / parameter just for this situation and disable the RPC 
mechanism on the BackupNode entirely. I instead propose that we add a way for 
the BackupNode to disable RPC reads on the {{QuorumJournalManager}}. This could 
take the form of an undocumented config parameter, or, my preference, add a 
static method {{QuorumJournalManager.disableRPCJournalStreams()}} which the 
BackupNode can call.
If you agree that we can handle {{BackupImage}} as I described, I think I can 
remove this new parameter and limit the scope of the change.

[jira] [Commented] (HDFS-13609) [Edit Tail Fast Path Pt 3] NameNode-side changes to support tailing edits via RPC

2018-06-19 Thread Erik Krogen (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16517672#comment-16517672
 ] 

Erik Krogen commented on HDFS-13609:


Thanks for the review [~shv]!
# I agree that this would be much cleaner. In many cases, {{inProgressOk}} will 
be equivalent to {{optimizeLatency}}. However there are a few cases where this 
is not currently true: 
** {{FSEditLog#openForWrite()}} - It is using {{selectInputStreams}} to confirm 
that no one else is writing new transactions. It seems fine to allow this to 
use the RPC mechanism.
** {{BootstrapStandby#checkLogsAvailableForRead()}} - It is confirming that a 
range of transaction IDs are available. Seems fine to allow this to use the RPC 
mechanism.
** {{NameNodeRpcServer#getEventBatchList()}} - Serves ranges of transactions 
for INotify feature. Seems fine (actually, seems desirable) to let this use the 
RPC mechanism. However, on a slightly unrelated note, one portion of this will 
need to be changed to work properly in a read-from-standby environment... Filed 
HDFS-13689 for this.
** {{NameNode#copyEditLogSegmentsToSharedDir()}} - This is only called on 
{{NameNode#initializeSharedEdits()}}, i.e. a separate startup flag for the 
NameNode. I don't think it's necessary to optimize for this situation.
** {{BackupImage#tryConvergeJournalSpool()}} - This code is doing some sketchy 
things and making assumptions about the streams returned that will not be true 
when using the RPC mechanism. We need to prevent this from using the RPC 
mechanism, but given that this is only for the BackupNode, I recommend we avoid 
adding a new API / parameter just for this situation and disable the RPC 
mechanism on the BackupNode entirely. I instead propose that we add a way for 
the BackupNode to disable RPC reads on the {{QuorumJournalManager}}. This could 
take the form of an undocumented config parameter, or, my preference, add a 
static method {{QuorumJournalManager.disableRPCJournalStreams()}} which the 
BackupNode can call.
If you agree that we can handle {{BackupImage}} as I described, I think I can 
remove this new parameter and limit the scope of the change.
# Agreed. I will fix this in the next patch.
# I thought more about why an operator might want to change this config. I 
determined that I can imagine situations when I would want to increase it, if 
the situation arises that RPC response time from the JournalNodes is high and 
the number of transactions per second is very high (say, a very high write 
workload). But I can't think of a reason to lower it; this is more about just 
setting a sanity-check upper bound. This makes me think we should (a) raise the 
default limit to 5000 -> even with an RPC round-trip time of 100ms, which is 
quite high, this would allow 50k transactions per second, and (b) make it 
undocumented as you described. I will incorporate this into the next patch.
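
A minimal sketch of the BackupNode opt-out proposed above, assuming a simple 
static flag inside {{QuorumJournalManager}}; the flag name and the branching 
shown are illustrative (Hadoop types referenced without imports), not the 
actual patch:

{code}
import java.io.IOException;
import java.util.Collection;

// Illustrative sketch: a static kill switch for RPC-based edit tailing, so the
// BackupNode can force the streaming path without a new API parameter.
public class QuorumJournalManager /* existing class, members elided */ {
  private static volatile boolean rpcJournalStreamsEnabled = true; // assumed flag

  /** Proposed hook for the BackupNode to disable RPC reads entirely. */
  public static void disableRPCJournalStreams() {
    rpcJournalStreamsEnabled = false;
  }

  public void selectInputStreams(Collection<EditLogInputStream> streams,
      long fromTxnId, boolean inProgressOk) throws IOException {
    if (rpcJournalStreamsEnabled && inProgressOk) {
      // ... try the low-latency RPC tailing path first ...
    }
    // ... otherwise stream finalized segments from the JournalNodes ...
  }
}
{code}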

> [Edit Tail Fast Path Pt 3] NameNode-side changes to support tailing edits via 
> RPC
> -
>
> Key: HDFS-13609
> URL: https://issues.apache.org/jira/browse/HDFS-13609
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ha, namenode
>Reporter: Erik Krogen
>Assignee: Erik Krogen
>Priority: Major
> Attachments: HDFS-13609-HDFS-12943.000.patch, 
> HDFS-13609-HDFS-12943.001.patch, HDFS-13609-HDFS-12943.002.patch
>
>
> See HDFS-13150 for the full design.
> This JIRA is targeted at the NameNode-side changes to enable tailing 
> in-progress edits via the RPC mechanism added in HDFS-13608. Most changes are 
> in the QuorumJournalManager.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-13689) NameNodeRpcServer getEditsFromTxid assumes it is run on active NameNode

2018-06-19 Thread Erik Krogen (JIRA)
Erik Krogen created HDFS-13689:
--

 Summary: NameNodeRpcServer getEditsFromTxid assumes it is run on 
active NameNode
 Key: HDFS-13689
 URL: https://issues.apache.org/jira/browse/HDFS-13689
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: hdfs, namenode
Reporter: Erik Krogen
Assignee: Erik Krogen


{{NameNodeRpcServer#getEditsFromTxid}} currently decides which transactions are 
able to be served, i.e. which transactions are durable, using the following 
logic:
{code}
long syncTxid = log.getSyncTxId();
// If we haven't synced anything yet, we can only read finalized
// segments since we can't reliably determine which txns in in-progress
// segments have actually been committed (e.g. written to a quorum of JNs).
// If we have synced txns, we can definitely read up to syncTxid since
// syncTxid is only updated after a transaction is committed to all
// journals. (In-progress segments written by old writers are already
// discarded for us, so if we read any in-progress segments they are
// guaranteed to have been written by this NameNode.)
boolean readInProgress = syncTxid > 0;
{code}
This assumes that the NameNode serving this request is the current 
writer/active NameNode, which may not be true in the ObserverNode situation. 
Since {{selectInputStreams}} now has an {{onlyDurableTxns}} flag, which, if 
enabled, will only return durable/committed transactions, we can instead 
leverage this to provide the same functionality. We should utilize this to 
avoid consistency issues when serving this request from the ObserverNode.
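
A hedged sketch of the proposed replacement, leaning on the {{onlyDurableTxns}} 
flag described above so durability no longer depends on this NameNode being the 
active writer; the exact signature and parameter order are assumptions:

{code}
// Instead of inferring durability from syncTxid, request in-progress streams
// restricted to transactions already committed to a quorum of JournalNodes.
Collection<EditLogInputStream> streams = new ArrayList<>();
log.selectInputStreams(streams, fromTxid,
    true /* inProgressOk */, true /* onlyDurableTxns */);
// Everything in 'streams' is now safe to serve, even from an ObserverNode.
{code}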



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-177) Create a releasable ozonefs artifact

2018-06-19 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16517647#comment-16517647
 ] 

genericqa commented on HDDS-177:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
17s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 31 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  2m 
27s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 29m 
14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 32m  
6s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
35s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 26m 
29s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 31s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-project hadoop-ozone hadoop-tools/hadoop-tools-dist hadoop-tools 
hadoop-dist hadoop-ozone/acceptance-test . {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
42s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  8m 
16s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
31s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 40m 
58s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 32m 
47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 32m 
47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 21m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} shellcheck {color} | {color:green}  0m 
 2s{color} | {color:green} There were no new shellcheck issues. {color} |
| {color:green}+1{color} | {color:green} shelldocs {color} | {color:green}  0m 
14s{color} | {color:green} There were no new shelldocs issues. {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
8s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 15s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-project hadoop-tools/hadoop-tools-dist hadoop-tools hadoop-ozone 
hadoop-ozone/acceptance-test . hadoop-dist {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red}  6m 
54s{color} | {color:red} root in the patch failed. {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 21m 14s{color} 
| {color:red} root in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  2m 
 9s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}253m 36s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | 

[jira] [Commented] (HDDS-175) Refactor ContainerInfo to remove Pipeline object from it

2018-06-19 Thread Ajay Kumar (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16517625#comment-16517625
 ] 

Ajay Kumar commented on HDDS-175:
-

patch v3 to remove some unused imports.

> Refactor ContainerInfo to remove Pipeline object from it 
> -
>
> Key: HDDS-175
> URL: https://issues.apache.org/jira/browse/HDDS-175
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Affects Versions: 0.2.1
>Reporter: Ajay Kumar
>Assignee: Ajay Kumar
>Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-175.00.patch, HDDS-175.01.patch, HDDS-175.02.patch, 
> HDDS-175.03.patch
>
>
> Refactor ContainerInfo to remove the Pipeline object from it. We can add the 
> 4 fields below to ContainerInfo to recreate the pipeline if required:
> # pipelineId
> # replication type
> # expected replication count
> # DataNodes where its replicas exist
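
A rough Java sketch of what the slimmed-down {{ContainerInfo}} could carry; the 
placeholder types below stand in for the real HDDS classes (e.g. 
{{DatanodeDetails}}), so this is illustrative only:

{code}
import java.util.List;

// Illustrative sketch: keep just enough state to recreate the Pipeline lazily.
public final class ContainerInfoSketch {
  enum ReplicationType { RATIS, STAND_ALONE } // placeholder for the HDDS enum

  private final String pipelineId;               // 1. pipelineId
  private final ReplicationType replicationType; // 2. replication type
  private final int expectedReplicaCount;        // 3. expected replication count
  private final List<String> replicaDatanodes;   // 4. DataNodes holding replicas

  ContainerInfoSketch(String pipelineId, ReplicationType replicationType,
      int expectedReplicaCount, List<String> replicaDatanodes) {
    this.pipelineId = pipelineId;
    this.replicationType = replicationType;
    this.expectedReplicaCount = expectedReplicaCount;
    this.replicaDatanodes = replicaDatanodes;
  }
}
{code}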



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-175) Refactor ContainerInfo to remove Pipeline object from it

2018-06-19 Thread Ajay Kumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajay Kumar updated HDDS-175:

Attachment: HDDS-175.03.patch

> Refactor ContainerInfo to remove Pipeline object from it 
> -
>
> Key: HDDS-175
> URL: https://issues.apache.org/jira/browse/HDDS-175
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Affects Versions: 0.2.1
>Reporter: Ajay Kumar
>Assignee: Ajay Kumar
>Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-175.00.patch, HDDS-175.01.patch, HDDS-175.02.patch, 
> HDDS-175.03.patch
>
>
> Refactor ContainerInfo to remove the Pipeline object from it. We can add the 
> 4 fields below to ContainerInfo to recreate the pipeline if required:
> # pipelineId
> # replication type
> # expected replication count
> # DataNodes where its replicas exist



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-175) Refactor ContainerInfo to remove Pipeline object from it

2018-06-19 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16517596#comment-16517596
 ] 

Hudson commented on HDDS-175:
-

SUCCESS: Integrated in Jenkins build Hadoop-precommit-ozone-acceptance #19 (See 
[https://builds.apache.org/job/Hadoop-precommit-ozone-acceptance/19/])


> Refactor ContainerInfo to remove Pipeline object from it 
> -
>
> Key: HDDS-175
> URL: https://issues.apache.org/jira/browse/HDDS-175
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: SCM
>Affects Versions: 0.2.1
>Reporter: Ajay Kumar
>Assignee: Ajay Kumar
>Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-175.00.patch, HDDS-175.01.patch, HDDS-175.02.patch
>
>
> Refactor ContainerInfo to remove the Pipeline object from it. We can add the 
> 4 fields below to ContainerInfo to recreate the pipeline if required:
> # pipelineId
> # replication type
> # expected replication count
> # DataNodes where its replicas exist



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-13532) RBF: Adding security

2018-06-19 Thread Xiao Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16517575#comment-16517575
 ] 

Xiao Chen edited comment on HDFS-13532 at 6/19/18 10:00 PM:


Thanks for the work here [~zhengxg3] and all. The last page of the doc looks 
familiar. :)

Some high level questions from the doc. I have not followed RBF closely and my 
apologies if these are stupid comments/questions...
 * I second what Inigo said above. It's not clear to me how DTr is used.
 * It looks like we'll add the same mechanism to the router, so clients can 
auth with kerberos, then get a delegation token for subsequent authentications. 
Is this understanding correct?
 * I'm not a very security person - the router proxying as client part seems 
fine. But IMO that should only work if the client auth'ed via kerberos; if 
client->router auth is dt, then router should not auth to NN via kerberos, but 
only via the provided DTnn.
 * Who's gonna renew the router tokens? Tokens from different NNs may have 
different expiration time, hence need to be renewed at different intervals. RM 
currently does this, it's kinda nice to reuse RM to handle the DTr token 
renewal / cancelation.
 * [~daryn] at one point mentioned he's working on some token issuer interface. 
Not sure if it will benefit/collide with the work here.


was (Author: xiaochen):
Thanks for the work here [~zhengxg3] and all. The last page of the doc looks 
familiar. :)

Some high level questions from the doc. I have not followed RBF closely and my 
apologies if these are stupid questions...
 * I second what Inigo said above. It's not clear to me how DTr is used.
 * It looks like we'll add the same mechanism to the router, so clients can 
auth with kerberos, then get a delegation token for subsequent authentications. 
Is this understanding correct?
 * I'm not a very security person - the router proxying as client part seems 
fine. But IMO that should only work if the client auth'ed via kerberos; if 
client->router auth is dt, then router should not auth to NN via kerberos, but 
only via the provided DTnn.
 * Who's gonna renew the router tokens? Tokens from different NNs may have 
different expiration time, hence need to be renewed at different intervals. RM 
currently does this, it's kinda nice to reuse RM to handle the DTr token 
renewal / cancelation.
 * [~daryn] at one point mentioned he's working on some token issuer interface. 
Not sure if it will benefit/collide with the work here.

> RBF: Adding security
> 
>
> Key: HDFS-13532
> URL: https://issues.apache.org/jira/browse/HDFS-13532
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Íñigo Goiri
>Assignee: Sherwood Zheng
>Priority: Major
> Attachments: Security_for_Router-based Federation_design_doc.pdf
>
>
> HDFS Router based federation should support security. This includes 
> authentication and delegation tokens.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13532) RBF: Adding security

2018-06-19 Thread Xiao Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16517575#comment-16517575
 ] 

Xiao Chen commented on HDFS-13532:
--

Thanks for the work here [~zhengxg3] and all. The last page of the doc looks 
familiar. :)

Some high level questions from the doc. I have not followed RBF closely and my 
apologies if these are stupid questions...
 * I second what Inigo said above. It's not clear to me how DTr is used.
 * It looks like we'll add the same mechanism to the router, so clients can 
auth with kerberos, then get a delegation token for subsequent authentications. 
Is this understanding correct?
 * I'm not a very security person - the router proxying as client part seems 
fine. But IMO that should only work if the client auth'ed via kerberos; if 
client->router auth is dt, then router should not auth to NN via kerberos, but 
only via the provided DTnn.
 * Who's gonna renew the router tokens? Tokens from different NNs may have 
different expiration time, hence need to be renewed at different intervals. RM 
currently does this, it's kinda nice to reuse RM to handle the DTr token 
renewal / cancelation.
 * [~daryn] at one point mentioned he's working on some token issuer interface. 
Not sure if it will benefit/collide with the work here.

> RBF: Adding security
> 
>
> Key: HDFS-13532
> URL: https://issues.apache.org/jira/browse/HDFS-13532
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Íñigo Goiri
>Assignee: Sherwood Zheng
>Priority: Major
> Attachments: Security_for_Router-based Federation_design_doc.pdf
>
>
> HDFS Router based federation should support security. This includes 
> authentication and delegation tokens.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-177) Create a releasable ozonefs artifact

2018-06-19 Thread Elek, Marton (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16517570#comment-16517570
 ] 

Elek, Marton commented on HDDS-177:
---

Finally the patch has proven to be good. We had a good run of the acceptance 
tests (including the new ozonefs tests):

https://builds.apache.org/job/Hadoop-precommit-ozone-acceptance/18 (You can 
also check the end of the console output.)

The dependency and tab problems are also fixed in the latest patch (waiting 
for Jenkins).

> Create a releasable ozonefs artifact 
> -
>
> Key: HDDS-177
> URL: https://issues.apache.org/jira/browse/HDDS-177
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Filesystem
>Reporter: Elek, Marton
>Assignee: Elek, Marton
>Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-177.001.patch, HDDS-177.002.patch, 
> HDDS-177.003.patch, HDDS-177.004.patch, HDDS-177.005.patch, 
> HDDS-177.006.patch, HDDS-177.007.patch
>
>
> The current ozonefs implementation is under hadoop-tools/hadoop-ozone and uses 
> the Hadoop version (currently 3.2.0-SNAPSHOT), which is wrong.
> The other problem is that we have no single Hadoop-independent artifact for 
> ozonefs which could be used with any Hadoop version.
> In this patch I propose the following modifications:
> * Move hadoop-tools/hadoop-ozone to hadoop-ozone/ozonefs and use the hdds 
> version (0.2.1-SNAPSHOT)
> * Create a shaded artifact which includes all the required jar files to use 
> ozonefs (hdds/ozone client)
> * Create an ozonefs acceptance test to test it with the latest stable hadoop 
> version



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13680) Httpfs does not support custom authentication

2018-06-19 Thread Joris Nogneng (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joris Nogneng updated HDFS-13680:
-
Resolution: Not A Problem
Status: Resolved  (was: Patch Available)

There is no need to set the Authentication Handler at this level. The value 
read from the property file will be used by default.

> Httpfs does not support custom authentication
> -
>
> Key: HDFS-13680
> URL: https://issues.apache.org/jira/browse/HDFS-13680
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: httpfs
>Reporter: Joris Nogneng
>Priority: Major
> Attachments: HDFS-13680.01.patch
>
>
> Currently Httpfs Authentication Filter does not support any custom 
> authentication: the Authentication Handler can only be 
> PseudoAuthenticationHandler or KerberosDelegationTokenAuthenticationHandler.
> We should allow other authentication handlers to manage custom authentication.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-177) Create a releasable ozonefs artifact

2018-06-19 Thread Elek, Marton (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elek, Marton updated HDDS-177:
--
Attachment: HDDS-177.007.patch

> Create a releasable ozonefs artifact 
> -
>
> Key: HDDS-177
> URL: https://issues.apache.org/jira/browse/HDDS-177
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Filesystem
>Reporter: Elek, Marton
>Assignee: Elek, Marton
>Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-177.001.patch, HDDS-177.002.patch, 
> HDDS-177.003.patch, HDDS-177.004.patch, HDDS-177.005.patch, 
> HDDS-177.006.patch, HDDS-177.007.patch
>
>
> The current ozonefs implementation is under hadoop-tools/hadoop-ozone and uses 
> the Hadoop version (currently 3.2.0-SNAPSHOT), which is wrong.
> The other problem is that we have no single Hadoop-independent artifact for 
> ozonefs which could be used with any Hadoop version.
> In this patch I propose the following modifications:
> * Move hadoop-tools/hadoop-ozone to hadoop-ozone/ozonefs and use the hdds 
> version (0.2.1-SNAPSHOT)
> * Create a shaded artifact which includes all the required jar files to use 
> ozonefs (hdds/ozone client)
> * Create an ozonefs acceptance test to test it with the latest stable hadoop 
> version



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-177) Create a releasable ozonefs artifact

2018-06-19 Thread Elek, Marton (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16517536#comment-16517536
 ] 

Elek, Marton commented on HDDS-177:
---

FYI: I am trying to stabilize a Jenkins job to test every new patch with the 
acceptance test suite: 
https://builds.apache.org/job/Hadoop-precommit-ozone-acceptance. Ideally this 
patch will be committed after a good build. 

> Create a releasable ozonefs artifact 
> -
>
> Key: HDDS-177
> URL: https://issues.apache.org/jira/browse/HDDS-177
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Filesystem
>Reporter: Elek, Marton
>Assignee: Elek, Marton
>Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-177.001.patch, HDDS-177.002.patch, 
> HDDS-177.003.patch, HDDS-177.004.patch, HDDS-177.005.patch, HDDS-177.006.patch
>
>
> The current ozonefs implementation is under hadoop-tools/hadoop-ozone and uses 
> the Hadoop version (currently 3.2.0-SNAPSHOT), which is wrong.
> The other problem is that we have no single Hadoop-independent artifact for 
> ozonefs which could be used with any Hadoop version.
> In this patch I propose the following modifications:
> * Move hadoop-tools/hadoop-ozone to hadoop-ozone/ozonefs and use the hdds 
> version (0.2.1-SNAPSHOT)
> * Create a shaded artifact which includes all the required jar files to use 
> ozonefs (hdds/ozone client)
> * Create an ozonefs acceptance test to test it with the latest stable hadoop 
> version



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-177) Create a releasable ozonefs artifact

2018-06-19 Thread Elek, Marton (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elek, Marton updated HDDS-177:
--
Attachment: HDDS-177.006.patch

> Create a releasable ozonefs artifact 
> -
>
> Key: HDDS-177
> URL: https://issues.apache.org/jira/browse/HDDS-177
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Filesystem
>Reporter: Elek, Marton
>Assignee: Elek, Marton
>Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-177.001.patch, HDDS-177.002.patch, 
> HDDS-177.003.patch, HDDS-177.004.patch, HDDS-177.005.patch, HDDS-177.006.patch
>
>
> The current ozonefs implementation is under hadoop-tools/hadoop-ozone and uses 
> the Hadoop version (currently 3.2.0-SNAPSHOT), which is wrong.
> The other problem is that we have no single Hadoop-independent artifact for 
> ozonefs which could be used with any Hadoop version.
> In this patch I propose the following modifications:
> * Move hadoop-tools/hadoop-ozone to hadoop-ozone/ozonefs and use the hdds 
> version (0.2.1-SNAPSHOT)
> * Create a shaded artifact which includes all the required jar files to use 
> ozonefs (hdds/ozone client)
> * Create an ozonefs acceptance test to test it with the latest stable hadoop 
> version



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13672) clearCorruptLazyPersistFiles could crash NameNode

2018-06-19 Thread Wei-Chiu Chuang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16517514#comment-16517514
 ] 

Wei-Chiu Chuang commented on HDFS-13672:


I've given it some thought.
Instead of breaking down the iteration by the number of blocks, would it make 
sense to have a time-based configuration? For example, break out of the loop 
after 1 second?
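
A minimal sketch of that time-based idea against the loop quoted below; the 
1-second budget, {{Time.monotonicNow()}} usage, and lock yielding are 
illustrative, not an actual patch:

{code}
// Bound how long the write lock is held by yielding it after a time budget.
final long budgetMs = 1000; // assumed 1-second budget (would be configurable)
long start = Time.monotonicNow(); // org.apache.hadoop.util.Time
while (it.hasNext()) {
  Block b = it.next();
  // ... existing per-block work ...
  if (Time.monotonicNow() - start > budgetMs) {
    writeUnlock(); // let queued operations (and ZKFC health checks) through
    writeLock();   // then resume scanning from where the iterator left off
    start = Time.monotonicNow();
  }
}
{code}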

> clearCorruptLazyPersistFiles could crash NameNode
> -
>
> Key: HDFS-13672
> URL: https://issues.apache.org/jira/browse/HDFS-13672
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Wei-Chiu Chuang
>Assignee: Gabor Bota
>Priority: Major
>
> I started a NameNode on a pretty large fsimage. Since the NameNode is started 
> without any DataNodes, all blocks (100 million) are "corrupt".
> Afterwards I observed FSNamesystem#clearCorruptLazyPersistFiles() held write 
> lock for a long time:
> {noformat}
> 18/06/12 12:37:03 INFO namenode.FSNamesystem: FSNamesystem write lock held 
> for 46024 ms via
> java.lang.Thread.getStackTrace(Thread.java:1559)
> org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:945)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:198)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.writeUnlock(FSNamesystem.java:1689)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem$LazyPersistFileScrubber.clearCorruptLazyPersistFiles(FSNamesystem.java:5532)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem$LazyPersistFileScrubber.run(FSNamesystem.java:5543)
> java.lang.Thread.run(Thread.java:748)
> Number of suppressed write-lock reports: 0
> Longest write-lock held interval: 46024
> {noformat}
> Here's the relevant code:
> {code}
>   writeLock();
>   try {
> final Iterator<Block> it =
> blockManager.getCorruptReplicaBlockIterator();
> while (it.hasNext()) {
>   Block b = it.next();
>   BlockInfo blockInfo = blockManager.getStoredBlock(b);
>   if (blockInfo.getBlockCollection().getStoragePolicyID() == 
> lpPolicy.getId()) {
> filesToDelete.add(blockInfo.getBlockCollection());
>   }
> }
> for (BlockCollection bc : filesToDelete) {
>   LOG.warn("Removing lazyPersist file " + bc.getName() + " with no 
> replicas.");
>   changed |= deleteInternal(bc.getName(), false, false, false);
> }
>   } finally {
> writeUnlock();
>   }
> {code}
> In essence, the iteration over corrupt replica list should be broken down 
> into smaller iterations to avoid a single long wait.
> Since this operation holds NameNode write lock for more than 45 seconds, the 
> default ZKFC connection timeout, it implies an extreme case like this (100 
> million corrupt blocks) could lead to NameNode failover.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13448) HDFS Block Placement - Ignore Locality for First Block Replica

2018-06-19 Thread BELUGA BEHR (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16517509#comment-16517509
 ] 

BELUGA BEHR commented on HDFS-13448:


I'm not sure why the build failed... any help please? :)

> HDFS Block Placement - Ignore Locality for First Block Replica
> --
>
> Key: HDFS-13448
> URL: https://issues.apache.org/jira/browse/HDFS-13448
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: block placement, hdfs-client
>Affects Versions: 2.9.0, 3.0.1
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Minor
> Attachments: HDFS-13448.10.patch, HDFS-13448.11.patch, 
> HDFS-13448.12.patch, HDFS-13448.13.patch, HDFS-13448.6.patch, 
> HDFS-13448.7.patch, HDFS-13448.8.patch
>
>
> According to the HDFS Block Placement Rules:
> {quote}
> /**
>  * The replica placement strategy is that if the writer is on a datanode,
>  * the 1st replica is placed on the local machine, 
>  * otherwise a random datanode. The 2nd replica is placed on a datanode
>  * that is on a different rack. The 3rd replica is placed on a datanode
>  * which is on a different node of the rack as the second replica.
>  */
> {quote}
> However, there is a hint for the hdfs-client that allows the block placement 
> request to not put a block replica on the local datanode _where 'local' means 
> the same host as the client is being run on._
> {quote}
>   /**
>* Advise that a block replica NOT be written to the local DataNode where
>* 'local' means the same host as the client is being run on.
>*
>* @see CreateFlag#NO_LOCAL_WRITE
>*/
> {quote}
> I propose that we add a new flag that allows the hdfs-client to request that 
> the first block replica be placed on a random DataNode in the cluster.  The 
> subsequent block replicas should follow the normal block placement rules.
> The issue is that when {{NO_LOCAL_WRITE}} is enabled, the first block 
> replica is not placed on the local node, but it is still placed on the local 
> rack.  Where this comes into play is when you have, for example, a Flume 
> agent that is loading data into HDFS.
> If the Flume agent is running on a DataNode, then by default, the DataNode 
> local to the Flume agent will always get the first block replica, and this 
> leads to uneven block placements, with the local node always filling up 
> faster than any other node in the cluster.
> Modifying this example, if the DataNode is removed from the host where the 
> Flume agent is running, or this {{NO_LOCAL_WRITE}} is enabled by Flume, then 
> the default block placement policy will still prefer the local rack.  This 
> remedies the situation only so far as now the first block replica will always 
> be distributed to a DataNode on the local rack.
> This new flag would allow a single Flume agent to distribute the blocks 
> randomly, evenly, over the entire cluster instead of hot-spotting the local 
> node or the local rack.
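
For illustration, a hedged sketch of how a writer like the Flume agent might 
request this once such a flag exists; {{IGNORE_LOCALITY}} is a hypothetical 
name for the proposed flag, while {{CREATE}} and {{NO_LOCAL_WRITE}} are 
existing {{CreateFlag}} values:

{code}
import java.util.EnumSet;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.CreateFlag;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

public class RandomFirstReplicaWrite {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    // NO_LOCAL_WRITE only avoids the local node; the proposed flag
    // (hypothetical IGNORE_LOCALITY) would also avoid the local rack.
    FSDataOutputStream out = fs.create(new Path("/flume/events.log"),
        FsPermission.getFileDefault(), EnumSet.of(CreateFlag.CREATE,
            CreateFlag.NO_LOCAL_WRITE /*, CreateFlag.IGNORE_LOCALITY */),
        4096, (short) 3, 128L * 1024 * 1024, null);
    out.close();
  }
}
{code}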



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13448) HDFS Block Placement - Ignore Locality for First Block Replica

2018-06-19 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16517506#comment-16517506
 ] 

genericqa commented on HDFS-13448:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
36s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
43s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 29m 
11s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 32m 
43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
22s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  3m 
30s{color} | {color:green} trunk passed {color} |
| {color:red}-1{color} | {color:red} shadedclient {color} | {color:red}  6m  
9s{color} | {color:red} branch has errors when building and testing our client 
artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  6m 
16s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
25s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
20s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  3m 
 3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 37m 
28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 37m 
28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 37m 
28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
30s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  4m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} shadedclient {color} | {color:red}  2m 
30s{color} | {color:red} patch has errors when building and testing our client 
artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  6m 
52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
53s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 10m 
27s{color} | {color:green} hadoop-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  2m  
1s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}110m  4s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
50s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}262m 45s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure |
|   | hadoop.hdfs.client.impl.TestBlockReaderLocal |
|   | hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | HDFS-13448 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12928373/HDFS-13448.13.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  cc  |
| uname | Linux 787c4544bc3e 3.13.0-143-generic 

[jira] [Commented] (HDDS-177) Create a releasable ozonefs artifact

2018-06-19 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16517467#comment-16517467
 ] 

genericqa commented on HDDS-177:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
33s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 31 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
29s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 
45s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 29m 
14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
34s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 20m 
57s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 49s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-ozone hadoop-tools/hadoop-tools-dist hadoop-tools hadoop-dist 
hadoop-ozone/acceptance-test . {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
37s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  6m 
56s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
13s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red}  0m 
15s{color} | {color:red} root in the patch failed. {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red}  0m 
12s{color} | {color:red} hadoop-dist in the patch failed. {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red}  0m 
16s{color} | {color:red} root in the patch failed. {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red}  0m 16s{color} 
| {color:red} root in the patch failed. {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
14s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} mvnsite {color} | {color:red}  0m 
17s{color} | {color:red} root in the patch failed. {color} |
| {color:green}+1{color} | {color:green} shellcheck {color} | {color:green}  0m 
 0s{color} | {color:green} There were no new shellcheck issues. {color} |
| {color:green}+1{color} | {color:green} shelldocs {color} | {color:green}  0m 
15s{color} | {color:green} There were no new shelldocs issues. {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
0s{color} | {color:red} The patch has 2 line(s) with tabs. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
8s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:red}-1{color} | {color:red} shadedclient {color} | {color:red}  0m 
22s{color} | {color:red} patch has errors when building and testing our client 
artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: . 
hadoop-dist hadoop-ozone hadoop-ozone/acceptance-test hadoop-tools 
hadoop-tools/hadoop-tools-dist {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
35s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red}  0m 
16s{color} | {color:red} root in the patch failed. {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  0m 17s{color} 
| {color:red} root in the patch failed. {color} |
| {color:blue}0{color} | {color:blue} asflicense {color} | {color:blue}  0m 
17s{color} | {color:blue} ASF License check generated no output? {color} |
| {color:black}{color} | {color:black} {color} | {color:black}106m 59s{color} | 
{color:black} 

[jira] [Commented] (HDFS-13658) fsck, dfsadmin -report, and NN WebUI should report number of blocks that have 1 replica

2018-06-19 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16517446#comment-16517446
 ] 

genericqa commented on HDFS-13658:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
33s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 6 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
19s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 27m 
14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 29m 
36s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  4m  
9s{color} | {color:green} trunk passed {color} |
| {color:red}-1{color} | {color:red} shadedclient {color} | {color:red}  6m 
47s{color} | {color:red} branch has errors when building and testing our client 
artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  6m 
27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  3m 
17s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
21s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  3m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 29m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 29m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 29m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  4m 
23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} shadedclient {color} | {color:red}  2m  
8s{color} | {color:red} patch has errors when building and testing our client 
artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  7m 
27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  3m 
19s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 10m  
5s{color} | {color:green} hadoop-common in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  1m 
47s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red}110m 51s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 18m  
6s{color} | {color:green} hadoop-hdfs-rbf in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
55s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}269m 35s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.TestBlockStoragePolicy |
|   | hadoop.hdfs.TestReadStripedFileWithMissingBlocks |
|   | hadoop.hdfs.server.namenode.TestReencryptionWithKMS |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | HDFS-13658 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12928366/HDFS-13658.005.patch |
| Optional Tests |  asflicense  mvnsite  compile  javac  

[jira] [Updated] (HDDS-177) Create a releasable ozonefs artifact

2018-06-19 Thread Elek, Marton (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elek, Marton updated HDDS-177:
--
Attachment: HDDS-177.005.patch

> Create a releasable ozonefs artifact 
> -
>
> Key: HDDS-177
> URL: https://issues.apache.org/jira/browse/HDDS-177
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Filesystem
>Reporter: Elek, Marton
>Assignee: Elek, Marton
>Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-177.001.patch, HDDS-177.002.patch, 
> HDDS-177.003.patch, HDDS-177.004.patch, HDDS-177.005.patch
>
>
> The current ozonefs implementation is under hadoop-tools/hadoop-ozone and 
> uses the Hadoop version (currently 3.2.0-SNAPSHOT), which is wrong.
> The other problem is that we have no single Hadoop-independent artifact for 
> ozonefs which could be used with any Hadoop version.
> In this patch I propose the following modifications:
> * move hadoop-tools/hadoop-ozone to hadoop-ozone/ozonefs and use the hdds 
> version (0.2.1-SNAPSHOT)
> * Create a shaded artifact which includes all the required jar files to use 
> ozonefs (hdds/ozone client)
> * Create an ozonefs acceptance test to test it with the latest stable hadoop 
> version



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-177) Create a releasable ozonefs artifact

2018-06-19 Thread Elek, Marton (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elek, Marton updated HDDS-177:
--
Attachment: HDDS-177.004.patch

> Create a releasable ozonefs artifact 
> -
>
> Key: HDDS-177
> URL: https://issues.apache.org/jira/browse/HDDS-177
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Filesystem
>Reporter: Elek, Marton
>Assignee: Elek, Marton
>Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-177.001.patch, HDDS-177.002.patch, 
> HDDS-177.003.patch, HDDS-177.004.patch
>
>
> The current ozonefs implementation is under hadoop-tools/hadoop-ozone and 
> uses the Hadoop version (currently 3.2.0-SNAPSHOT), which is wrong.
> The other problem is that we have no single Hadoop-independent artifact for 
> ozonefs which could be used with any Hadoop version.
> In this patch I propose the following modifications:
> * move hadoop-tools/hadoop-ozone to hadoop-ozone/ozonefs and use the hdds 
> version (0.2.1-SNAPSHOT)
> * Create a shaded artifact which includes all the required jar files to use 
> ozonefs (hdds/ozone client)
> * Create an ozonefs acceptance test to test it with the latest stable hadoop 
> version



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12571) Ozone: remove spaces from the beginning of the hdfs script

2018-06-19 Thread Elek, Marton (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-12571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16517402#comment-16517402
 ] 

Elek, Marton commented on HDFS-12571:
-

Unfortunately I can't test it on a Mac currently. But if it's Mac-specific, it 
will be a different problem. If it can be reproduced, I suggest creating a 
separate issue for that.

> Ozone: remove spaces from the beginning of the hdfs script  
> 
>
> Key: HDFS-12571
> URL: https://issues.apache.org/jira/browse/HDFS-12571
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Affects Versions: HDFS-7240
>Reporter: Elek, Marton
>Assignee: Elek, Marton
>Priority: Critical
>  Labels: ozoneMerge
> Fix For: HDFS-7240
>
> Attachments: HDFS-12571-HDFS-7240.001.patch
>
>
> It seems that during one of the previous merges some unnecessary spaces have 
> been added to the hadoop-hdfs-project/hadoop-hdfs/src/main/bin/hdfs file.
> After a dist build I cannot start the server with the hdfs command:
> {code}
> /tmp/hadoop-3.1.0-SNAPSHOT/bin/../libexec/hadoop-functions.sh: line 398: 
> syntax error near unexpected token `<'
> /tmp/hadoop-3.1.0-SNAPSHOT/bin/../libexec/hadoop-functions.sh: line 398: `  
> done < <(for text in "${input[@]}"; do'
> /tmp/hadoop-3.1.0-SNAPSHOT/bin/../libexec/hadoop-config.sh: line 70: 
> hadoop_deprecate_envvar: command not found
> /tmp/hadoop-3.1.0-SNAPSHOT/bin/../libexec/hadoop-config.sh: line 87: 
> hadoop_bootstrap: command not found
> /tmp/hadoop-3.1.0-SNAPSHOT/bin/../libexec/hadoop-config.sh: line 104: 
> hadoop_parse_args: command not found
> /tmp/hadoop-3.1.0-SNAPSHOT/bin/../libexec/hadoop-config.sh: line 105: shift: 
> : numeric argument required
> /tmp/hadoop-3.1.0-SNAPSHOT/bin/../libexec/hadoop-config.sh: line 110: 
> hadoop_find_confdir: command not found
> /tmp/hadoop-3.1.0-SNAPSHOT/bin/../libexec/hadoop-config.sh: line 111: 
> hadoop_exec_hadoopenv: command not found
> /tmp/hadoop-3.1.0-SNAPSHOT/bin/../libexec/hadoop-config.sh: line 112: 
> hadoop_import_shellprofiles: command not found
> {code}
> See the space here:
> https://github.com/apache/hadoop/blob/d0bd0f623338dbb558d0dee5e747001d825d92c5/hadoop-hdfs-project/hadoop-hdfs/src/main/bin/hdfs
> Or see the latest version at:
> https://github.com/apache/hadoop/blob/HDFS-7240/hadoop-hdfs-project/hadoop-hdfs/src/main/bin/hdfs
> To be honest I don't understand how it could work for others, as it seems to 
> be an older change. Maybe some git magic removed it on OSX (I use Linux). 
> Anyway, I am uploading a patch to fix it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-177) Create a releasable ozonefs artifact

2018-06-19 Thread Elek, Marton (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elek, Marton updated HDDS-177:
--
Attachment: HDDS-177.003.patch

> Create a releasable ozonefs artifact 
> -
>
> Key: HDDS-177
> URL: https://issues.apache.org/jira/browse/HDDS-177
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Filesystem
>Reporter: Elek, Marton
>Assignee: Elek, Marton
>Priority: Major
> Fix For: 0.2.1
>
> Attachments: HDDS-177.001.patch, HDDS-177.002.patch, 
> HDDS-177.003.patch
>
>
> The current ozonefs implementation is under hadoop-tools/hadoop-ozone and 
> uses the Hadoop version (currently 3.2.0-SNAPSHOT), which is wrong.
> The other problem is that we have no single Hadoop-independent artifact for 
> ozonefs which could be used with any Hadoop version.
> In this patch I propose the following modifications:
> * move hadoop-tools/hadoop-ozone to hadoop-ozone/ozonefs and use the hdds 
> version (0.2.1-SNAPSHOT)
> * Create a shaded artifact which includes all the required jar files to use 
> ozonefs (hdds/ozone client)
> * Create an ozonefs acceptance test to test it with the latest stable hadoop 
> version



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work started] (HDFS-13672) clearCorruptLazyPersistFiles could crash NameNode

2018-06-19 Thread Gabor Bota (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HDFS-13672 started by Gabor Bota.
-
> clearCorruptLazyPersistFiles could crash NameNode
> -
>
> Key: HDFS-13672
> URL: https://issues.apache.org/jira/browse/HDFS-13672
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Wei-Chiu Chuang
>Assignee: Gabor Bota
>Priority: Major
>
> I started a NameNode on a pretty large fsimage. Since the NameNode is started 
> without any DataNodes, all blocks (100 million) are "corrupt".
> Afterwards I observed that FSNamesystem#clearCorruptLazyPersistFiles() held 
> the write lock for a long time:
> {noformat}
> 18/06/12 12:37:03 INFO namenode.FSNamesystem: FSNamesystem write lock held 
> for 46024 ms via
> java.lang.Thread.getStackTrace(Thread.java:1559)
> org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:945)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:198)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.writeUnlock(FSNamesystem.java:1689)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem$LazyPersistFileScrubber.clearCorruptLazyPersistFiles(FSNamesystem.java:5532)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem$LazyPersistFileScrubber.run(FSNamesystem.java:5543)
> java.lang.Thread.run(Thread.java:748)
> Number of suppressed write-lock reports: 0
> Longest write-lock held interval: 46024
> {noformat}
> Here's the relevant code:
> {code}
>   writeLock();
>   try {
> final Iterator<Block> it =
> blockManager.getCorruptReplicaBlockIterator();
> while (it.hasNext()) {
>   Block b = it.next();
>   BlockInfo blockInfo = blockManager.getStoredBlock(b);
>   if (blockInfo.getBlockCollection().getStoragePolicyID() == 
> lpPolicy.getId()) {
> filesToDelete.add(blockInfo.getBlockCollection());
>   }
> }
> for (BlockCollection bc : filesToDelete) {
>   LOG.warn("Removing lazyPersist file " + bc.getName() + " with no 
> replicas.");
>   changed |= deleteInternal(bc.getName(), false, false, false);
> }
>   } finally {
> writeUnlock();
>   }
> {code}
> In essence, the iteration over the corrupt replica list should be broken 
> down into smaller iterations to avoid a single long wait.
> Since this operation holds the NameNode write lock for more than 45 seconds 
> (the default ZKFC connection timeout), an extreme case like this (100 
> million corrupt blocks) could lead to a NameNode failover.
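
A rough sketch of the chunked approach (editor's illustration of an 
FSNamesystem fragment, not the actual patch; {{BLOCKS_PER_LOCK_CHUNK}} is a 
hypothetical constant, and a real fix would also have to confirm that the 
corrupt-replica iterator tolerates releasing the lock mid-scan):
{code}
// Hypothetical batch size; could be made configurable, as asked in a
// later comment on this issue.
private static final int BLOCKS_PER_LOCK_CHUNK = 10000;

private void clearCorruptLazyPersistFiles() throws IOException {
  List<BlockCollection> filesToDelete = new ArrayList<>();
  Iterator<Block> it = blockManager.getCorruptReplicaBlockIterator();
  boolean more = true;
  while (more) {
    writeLock();
    try {
      // Scan at most one chunk per lock acquisition, then release the
      // lock so RPC handlers (and the ZKFC health check) can get in.
      for (int i = 0; i < BLOCKS_PER_LOCK_CHUNK && it.hasNext(); i++) {
        BlockInfo blockInfo = blockManager.getStoredBlock(it.next());
        if (blockInfo.getBlockCollection().getStoragePolicyID()
            == lpPolicy.getId()) {
          filesToDelete.add(blockInfo.getBlockCollection());
        }
      }
      more = it.hasNext();
    } finally {
      writeUnlock();
    }
  }
  // The deletions themselves would then also be issued in bounded chunks.
}
{code}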



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13672) clearCorruptLazyPersistFiles could crash NameNode

2018-06-19 Thread Gabor Bota (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16517378#comment-16517378
 ] 

Gabor Bota commented on HDFS-13672:
---

Hi [~jojochuang],
What would be a reasonable number of elements to iterate through in one pass?
Should this number (iterations per pass) be configurable from outside? 

Thanks,
Gabor

> clearCorruptLazyPersistFiles could crash NameNode
> -
>
> Key: HDFS-13672
> URL: https://issues.apache.org/jira/browse/HDFS-13672
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Wei-Chiu Chuang
>Assignee: Gabor Bota
>Priority: Major
>
> I started a NameNode on a pretty large fsimage. Since the NameNode is started 
> without any DataNodes, all blocks (100 million) are "corrupt".
> Afterwards I observed that FSNamesystem#clearCorruptLazyPersistFiles() held 
> the write lock for a long time:
> {noformat}
> 18/06/12 12:37:03 INFO namenode.FSNamesystem: FSNamesystem write lock held 
> for 46024 ms via
> java.lang.Thread.getStackTrace(Thread.java:1559)
> org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:945)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:198)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.writeUnlock(FSNamesystem.java:1689)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem$LazyPersistFileScrubber.clearCorruptLazyPersistFiles(FSNamesystem.java:5532)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem$LazyPersistFileScrubber.run(FSNamesystem.java:5543)
> java.lang.Thread.run(Thread.java:748)
> Number of suppressed write-lock reports: 0
> Longest write-lock held interval: 46024
> {noformat}
> Here's the relevant code:
> {code}
>   writeLock();
>   try {
> final Iterator<Block> it =
> blockManager.getCorruptReplicaBlockIterator();
> while (it.hasNext()) {
>   Block b = it.next();
>   BlockInfo blockInfo = blockManager.getStoredBlock(b);
>   if (blockInfo.getBlockCollection().getStoragePolicyID() == 
> lpPolicy.getId()) {
> filesToDelete.add(blockInfo.getBlockCollection());
>   }
> }
> for (BlockCollection bc : filesToDelete) {
>   LOG.warn("Removing lazyPersist file " + bc.getName() + " with no 
> replicas.");
>   changed |= deleteInternal(bc.getName(), false, false, false);
> }
>   } finally {
> writeUnlock();
>   }
> {code}
> In essence, the iteration over the corrupt replica list should be broken 
> down into smaller iterations to avoid a single long wait.
> Since this operation holds the NameNode write lock for more than 45 seconds 
> (the default ZKFC connection timeout), an extreme case like this (100 
> million corrupt blocks) could lead to a NameNode failover.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-170) Fix TestBlockDeletingService#testBlockDeletionTimeout

2018-06-19 Thread Lokesh Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lokesh Jain updated HDDS-170:
-
Attachment: HDDS-170.001.patch

> Fix TestBlockDeletingService#testBlockDeletionTimeout
> -
>
> Key: HDDS-170
> URL: https://issues.apache.org/jira/browse/HDDS-170
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: SCM
>Reporter: Mukul Kumar Singh
>Assignee: Lokesh Jain
>Priority: Major
> Attachments: HDDS-170.001.patch
>
>
> TestBlockDeletingService#testBlockDeletionTimeout times out while waiting 
> for the expected error message.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDDS-170) Fix TestBlockDeletingService#testBlockDeletionTimeout

2018-06-19 Thread Lokesh Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lokesh Jain reassigned HDDS-170:


Assignee: Lokesh Jain  (was: Mukul Kumar Singh)

> Fix TestBlockDeletingService#testBlockDeletionTimeout
> -
>
> Key: HDDS-170
> URL: https://issues.apache.org/jira/browse/HDDS-170
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: SCM
>Reporter: Mukul Kumar Singh
>Assignee: Lokesh Jain
>Priority: Major
>
> TestBlockDeletingService#testBlockDeletionTimeout times out while waiting 
> for the expected error message.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12857) StoragePolicyAdmin should support schema based path

2018-06-19 Thread Xiao Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-12857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16517320#comment-16517320
 ] 

Xiao Chen commented on HDFS-12857:
--

Cherry-picked to branch-3.0. Ran the changed test before pushing.

> StoragePolicyAdmin should support schema based path
> ---
>
> Key: HDFS-12857
> URL: https://issues.apache.org/jira/browse/HDFS-12857
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.1.0
>Reporter: Surendra Singh Lilhore
>Assignee: Surendra Singh Lilhore
>Priority: Major
> Fix For: 3.1.0, 3.0.4
>
> Attachments: HDFS-12857.001.patch
>
>
> When we execute the storage policy admin command with a full scheme path, it 
> throws this exception:
> {noformat}
> ./hdfs storagepolicies -getstoragepolicy -path 
> hdfs://localhost:39133/user1/bar
> java.lang.IllegalArgumentException: Wrong FS: 
> hdfs://localhost:39133/user1/bar, expected: viewfs://cluster/
> {noformat}
> This is because the path scheme does not match {{fs.defaultFS}}: 
> {{fs.defaultFS}} is configured with {{viewFs}} while the file path uses 
> {{hdfs}}. This is broken because of HDFS-11968.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-12857) StoragePolicyAdmin should support schema based path

2018-06-19 Thread Xiao Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-12857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen updated HDFS-12857:
-
Fix Version/s: 3.0.4

> StoragePolicyAdmin should support schema based path
> ---
>
> Key: HDFS-12857
> URL: https://issues.apache.org/jira/browse/HDFS-12857
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.1.0
>Reporter: Surendra Singh Lilhore
>Assignee: Surendra Singh Lilhore
>Priority: Major
> Fix For: 3.1.0, 3.0.4
>
> Attachments: HDFS-12857.001.patch
>
>
> When we execute the storage policy admin command with a full scheme path, it 
> throws this exception:
> {noformat}
> ./hdfs storagepolicies -getstoragepolicy -path 
> hdfs://localhost:39133/user1/bar
> java.lang.IllegalArgumentException: Wrong FS: 
> hdfs://localhost:39133/user1/bar, expected: viewfs://cluster/
> {noformat}
> This is because the path scheme does not match {{fs.defaultFS}}: 
> {{fs.defaultFS}} is configured with {{viewFs}} while the file path uses 
> {{hdfs}}. This is broken because of HDFS-11968.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13683) HDFS StoragePolicy commands should work with Federation

2018-06-19 Thread Xiao Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16517319#comment-16517319
 ] 

Xiao Chen commented on HDFS-13683:
--

Yup, will be done via HDFS-12857

> HDFS StoragePolicy commands should work with Federation
> ---
>
> Key: HDFS-13683
> URL: https://issues.apache.org/jira/browse/HDFS-13683
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs, tools
>Affects Versions: 3.0.0
>Reporter: Gabor Bota
>Assignee: Gabor Bota
>Priority: Major
>
> In a federated cluster (defaultFS = viewfs://cluster3), getStoragePolicy 
> with an hdfs URI runs into the following error:
> {noformat}
> [root@cdh6-0-0-centos73-17443-1 ~]# hdfs storagepolicies -getStoragePolicy 
> -path 
> hdfs://ns2/hbase/WALs/cdh6-0-0-centos73-17443-4.vpc.cloudera.com,22101,1527272361896
> IllegalArgumentException: Wrong FS: 
> hdfs://ns2/hbase/WALs/cdh6-0-0-centos73-17443-4.vpc.cloudera.com,22101,1527272361896,
>  expected: viewfs://cluster3/
> {noformat}
> Taking a quick look at the code, I think 
> [this|https://github.com/apache/hadoop/blob/3e37a9a70ba93430da1b47f2a8b50358348396b0/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/StoragePolicyAdmin.java#L106]
>  is the culprit:
> {code}
>  final FileSystem fs = FileSystem.get(conf);
> // should do: final FileSystem fs = p.getFileSystem(conf);
> {code}
> We should review all the shell commands and see if anything else is missing. 
> At the minimum, we should fix all the places in StoragePolicyAdmin.
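
A minimal sketch of the suggested fix (editor's illustration): resolve the 
FileSystem from the path's own URI via {{Path#getFileSystem}} instead of 
{{FileSystem.get(conf)}}, so an {{hdfs://}} path works even when 
{{fs.defaultFS}} is a {{viewfs://}} URI.
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockStoragePolicySpi;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class GetStoragePolicySketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration(); // fs.defaultFS may be viewfs://
    Path p = new Path(args[0]);               // e.g. an hdfs://ns2/... path
    // Resolve against the path's scheme and authority, not fs.defaultFS.
    FileSystem fs = p.getFileSystem(conf);
    BlockStoragePolicySpi policy = fs.getStoragePolicy(p);
    System.out.println(p + ": "
        + (policy == null ? "unspecified" : policy.getName()));
  }
}
{code}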



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13683) HDFS StoragePolicy commands should work with Federation

2018-06-19 Thread Xiao Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16517309#comment-16517309
 ] 

Xiao Chen commented on HDFS-13683:
--

Thanks, Gabor, for the investigation. Yes, this looks to be a dup of 
HDFS-12857 :)

branch-3.0 is still active and will produce 3.0.x releases (the latest is 
3.0.3). More details at: http://hadoop.apache.org/versioning.html

> HDFS StoragePolicy commands should work with Federation
> ---
>
> Key: HDFS-13683
> URL: https://issues.apache.org/jira/browse/HDFS-13683
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs, tools
>Affects Versions: 3.0.0
>Reporter: Gabor Bota
>Assignee: Gabor Bota
>Priority: Major
>
> In a federated cluster (defaultFS = viewfs://cluster3), getStoragePolicy 
> with an hdfs URI runs into the following error:
> {noformat}
> [root@cdh6-0-0-centos73-17443-1 ~]# hdfs storagepolicies -getStoragePolicy 
> -path 
> hdfs://ns2/hbase/WALs/cdh6-0-0-centos73-17443-4.vpc.cloudera.com,22101,1527272361896
> IllegalArgumentException: Wrong FS: 
> hdfs://ns2/hbase/WALs/cdh6-0-0-centos73-17443-4.vpc.cloudera.com,22101,1527272361896,
>  expected: viewfs://cluster3/
> {noformat}
> Taking a quick look at the code, I think 
> [this|https://github.com/apache/hadoop/blob/3e37a9a70ba93430da1b47f2a8b50358348396b0/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/StoragePolicyAdmin.java#L106]
>  is the culprit:
> {code}
>  final FileSystem fs = FileSystem.get(conf);
> // should do: final FileSystem fs = p.getFileSystem(conf);
> {code}
> We should review all the shell commands and see if anything else is missing. 
> At the minimum, we should fix all the places in StoragePolicyAdmin.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13683) HDFS StoragePolicy commands should work with Federation

2018-06-19 Thread Gabor Bota (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16517315#comment-16517315
 ] 

Gabor Bota commented on HDFS-13683:
---

Then maybe it would be a good idea to backport this to branch-3.0.

> HDFS StoragePolicy commands should work with Federation
> ---
>
> Key: HDFS-13683
> URL: https://issues.apache.org/jira/browse/HDFS-13683
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs, tools
>Affects Versions: 3.0.0
>Reporter: Gabor Bota
>Assignee: Gabor Bota
>Priority: Major
>
> In a federated cluster (defaultFS = viewfs://cluster3), getStoragePolicy 
> with an hdfs URI runs into the following error:
> {noformat}
> [root@cdh6-0-0-centos73-17443-1 ~]# hdfs storagepolicies -getStoragePolicy 
> -path 
> hdfs://ns2/hbase/WALs/cdh6-0-0-centos73-17443-4.vpc.cloudera.com,22101,1527272361896
> IllegalArgumentException: Wrong FS: 
> hdfs://ns2/hbase/WALs/cdh6-0-0-centos73-17443-4.vpc.cloudera.com,22101,1527272361896,
>  expected: viewfs://cluster3/
> {noformat}
> Taking a quick look at the code, I think 
> [this|https://github.com/apache/hadoop/blob/3e37a9a70ba93430da1b47f2a8b50358348396b0/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/StoragePolicyAdmin.java#L106]
>  is the culprit:
> {code}
>  final FileSystem fs = FileSystem.get(conf);
> // should do: final FileSystem fs = p.getFileSystem(conf);
> {code}
> We should review all the shell commands and see if anything else is missing. 
> At the minimum, we should fix all the places in StoragePolicyAdmin.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-9457) Datanode Logging Improvement

2018-06-19 Thread BELUGA BEHR (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-9457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated HDFS-9457:
--
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Datanode Logging Improvement
> 
>
> Key: HDFS-9457
> URL: https://issues.apache.org/jira/browse/HDFS-9457
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 2.7.0, 3.0.0-alpha1
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Minor
> Attachments: logging.patch
>
>
> Please accept my patch for some minor clean-up of logging. Patch is against 
> 3.0.0 but applies to previous versions as well.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13448) HDFS Block Placement - Ignore Locality for First Block Replica

2018-06-19 Thread BELUGA BEHR (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated HDFS-13448:
---
Status: Patch Available  (was: Open)

Submitted new patch with [~templedf]'s suggestions.

I am accustomed to a different mocking framework that throws an error if a 
mock's signature doesn't actually match any calls during the test, so my 
previous work assumed that if the source node was 'null', the arguments of my 
mock wouldn't match and an exception would be raised.  I see that in Mockito 
this is not the case.
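
For reference, a tiny illustration (editor's sketch, unrelated to the patch) 
of the default Mockito behavior described above: a stubbing whose arguments 
never match does not fail the test; the unmatched call simply returns the 
type's default value.
{code}
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.when;

import java.util.List;

public class LenientStubSketch {
  @SuppressWarnings("unchecked")
  public static void main(String[] args) {
    List<String> stubbed = mock(List.class);
    when(stubbed.get(0)).thenReturn("matched");

    // The stub's argument (0) never matches this call, yet no error is
    // raised; the call returns null, the default for object types.
    System.out.println(stubbed.get(1)); // prints "null"
  }
}
{code}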

> HDFS Block Placement - Ignore Locality for First Block Replica
> --
>
> Key: HDFS-13448
> URL: https://issues.apache.org/jira/browse/HDFS-13448
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: block placement, hdfs-client
>Affects Versions: 3.0.1, 2.9.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Minor
> Attachments: HDFS-13448.10.patch, HDFS-13448.11.patch, 
> HDFS-13448.12.patch, HDFS-13448.13.patch, HDFS-13448.6.patch, 
> HDFS-13448.7.patch, HDFS-13448.8.patch
>
>
> According to the HDFS Block Placement Rules:
> {quote}
> /**
>  * The replica placement strategy is that if the writer is on a datanode,
>  * the 1st replica is placed on the local machine, 
>  * otherwise a random datanode. The 2nd replica is placed on a datanode
>  * that is on a different rack. The 3rd replica is placed on a datanode
>  * which is on a different node of the rack as the second replica.
>  */
> {quote}
> However, there is a hint for the hdfs-client that allows the block placement 
> request to not put a block replica on the local datanode _where 'local' means 
> the same host as the client is being run on._
> {quote}
>   /**
>* Advise that a block replica NOT be written to the local DataNode where
>* 'local' means the same host as the client is being run on.
>*
>* @see CreateFlag#NO_LOCAL_WRITE
>*/
> {quote}
> I propose that we add a new flag that allows the hdfs-client to request that 
> the first block replica be placed on a random DataNode in the cluster.  The 
> subsequent block replicas should follow the normal block placement rules.
> The issue is that when {{NO_LOCAL_WRITE}} is enabled, the first block 
> replica is not placed on the local node, but it is still placed on the local 
> rack.  This comes into play when you have, for example, a Flume agent 
> loading data into HDFS.
> If the Flume agent is running on a DataNode, then by default the DataNode 
> local to the Flume agent will always get the first block replica, and this 
> leads to uneven block placement, with the local node always filling up 
> faster than any other node in the cluster.
> Modifying this example, if the DataNode is removed from the host where the 
> Flume agent is running, or {{NO_LOCAL_WRITE}} is enabled by Flume, then 
> the default block placement policy will still prefer the local rack.  This 
> remedies the situation only insofar as the first block replica will now 
> always be distributed to a DataNode on the local rack.
> This new flag would allow a single Flume agent to distribute blocks randomly 
> and evenly over the entire cluster instead of hot-spotting the local node or 
> the local rack.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13448) HDFS Block Placement - Ignore Locality for First Block Replica

2018-06-19 Thread BELUGA BEHR (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated HDFS-13448:
---
Attachment: HDFS-13448.13.patch

> HDFS Block Placement - Ignore Locality for First Block Replica
> --
>
> Key: HDFS-13448
> URL: https://issues.apache.org/jira/browse/HDFS-13448
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: block placement, hdfs-client
>Affects Versions: 2.9.0, 3.0.1
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Minor
> Attachments: HDFS-13448.10.patch, HDFS-13448.11.patch, 
> HDFS-13448.12.patch, HDFS-13448.13.patch, HDFS-13448.6.patch, 
> HDFS-13448.7.patch, HDFS-13448.8.patch
>
>
> According to the HDFS Block Placement Rules:
> {quote}
> /**
>  * The replica placement strategy is that if the writer is on a datanode,
>  * the 1st replica is placed on the local machine, 
>  * otherwise a random datanode. The 2nd replica is placed on a datanode
>  * that is on a different rack. The 3rd replica is placed on a datanode
>  * which is on a different node of the rack as the second replica.
>  */
> {quote}
> However, there is a hint for the hdfs-client that allows the block placement 
> request to not put a block replica on the local datanode _where 'local' means 
> the same host as the client is being run on._
> {quote}
>   /**
>* Advise that a block replica NOT be written to the local DataNode where
>* 'local' means the same host as the client is being run on.
>*
>* @see CreateFlag#NO_LOCAL_WRITE
>*/
> {quote}
> I propose that we add a new flag that allows the hdfs-client to request that 
> the first block replica be placed on a random DataNode in the cluster.  The 
> subsequent block replicas should follow the normal block placement rules.
> The issue is that when {{NO_LOCAL_WRITE}} is enabled, the first block 
> replica is not placed on the local node, but it is still placed on the local 
> rack.  This comes into play when you have, for example, a Flume agent 
> loading data into HDFS.
> If the Flume agent is running on a DataNode, then by default the DataNode 
> local to the Flume agent will always get the first block replica, and this 
> leads to uneven block placement, with the local node always filling up 
> faster than any other node in the cluster.
> Modifying this example, if the DataNode is removed from the host where the 
> Flume agent is running, or {{NO_LOCAL_WRITE}} is enabled by Flume, then 
> the default block placement policy will still prefer the local rack.  This 
> remedies the situation only insofar as the first block replica will now 
> always be distributed to a DataNode on the local rack.
> This new flag would allow a single Flume agent to distribute blocks randomly 
> and evenly over the entire cluster instead of hot-spotting the local node or 
> the local rack.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-13683) HDFS StoragePolicy commands should work with Federation

2018-06-19 Thread Gabor Bota (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabor Bota resolved HDFS-13683.
---
Resolution: Duplicate

> HDFS StoragePolicy commands should work with Federation
> ---
>
> Key: HDFS-13683
> URL: https://issues.apache.org/jira/browse/HDFS-13683
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs, tools
>Affects Versions: 3.0.0
>Reporter: Gabor Bota
>Assignee: Gabor Bota
>Priority: Major
>
> In a federated cluster (defaultFS = viewfs://cluster3), getStoragePolicy 
> with an hdfs URI runs into the following error:
> {noformat}
> [root@cdh6-0-0-centos73-17443-1 ~]# hdfs storagepolicies -getStoragePolicy 
> -path 
> hdfs://ns2/hbase/WALs/cdh6-0-0-centos73-17443-4.vpc.cloudera.com,22101,1527272361896
> IllegalArgumentException: Wrong FS: 
> hdfs://ns2/hbase/WALs/cdh6-0-0-centos73-17443-4.vpc.cloudera.com,22101,1527272361896,
>  expected: viewfs://cluster3/
> {noformat}
> Taking a quick look at the code, I think 
> [this|https://github.com/apache/hadoop/blob/3e37a9a70ba93430da1b47f2a8b50358348396b0/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/StoragePolicyAdmin.java#L106]
>  is the culprit:
> {code}
>  final FileSystem fs = FileSystem.get(conf);
> // should do: final FileSystem fs = p.getFileSystem(conf);
> {code}
> We should review all the shell commands and see if anything else is missing. 
> At the minimum, we should fix all the places in StoragePolicyAdmin.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13658) fsck, dfsadmin -report, and NN WebUI should report number of blocks that have 1 replica

2018-06-19 Thread Kitti Nanasi (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16517228#comment-16517228
 ] 

Kitti Nanasi commented on HDFS-13658:
-

Thanks for the comments, [~xiaochen]! I uploaded patch v005, which addresses 
most of your comments:
 * I added a "listonereplicablocks" flag to fsck.
 * The isStriped check is removed.
 * I added test coverage in TestNameNodeMetrics.
 * I verify the new metric in the existing test cases of 
TestLowRedundancyBlockQueues, and that covers the test cases you suggested. 
However, I kept the TestOneReplicaBlocksAlert integration test as well, 
because it checks that everything works together. Do you think I should keep 
or remove the integration test?
 * About using a set instead of a single long: if I use a long, the metric 
increment still works fine, because I know the current number of replicas. 
However, when the metric decrement happens, I would need to know what the 
replica count was at the previous increment. The decrement can happen for 
various reasons, for example if the whole file was removed or if more 
replicas were created for the block, and in some cases there is no 
information on what the previous replica count was. But I agree with you that 
I shouldn't store the block infos. Do you have any suggestions on how to fix 
that?
I can only think of creating another priority queue in LowRedundancyBlocks, 
but that would probably break a bunch of other things; alternatively, I could 
store not the whole block info but just its id, for example (see the sketch 
below).
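
A hedged sketch of that last idea (editor's illustration; the class and 
method names are hypothetical, not from the patch): tracking only the ids of 
blocks that currently have exactly one replica makes the decrement path 
independent of the previous replica count, at the cost of one long per 
tracked block.
{code}
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical tracker; not the actual LowRedundancyBlocks change.
public class OneReplicaBlockTracker {
  private final Set<Long> oneReplicaBlockIds = ConcurrentHashMap.newKeySet();

  /** Called whenever a block's live replica count is (re)computed. */
  public void update(long blockId, int liveReplicas) {
    if (liveReplicas == 1) {
      oneReplicaBlockIds.add(blockId);    // idempotent "increment"
    } else {
      // Covers both re-replication and file deletion: no need to know
      // what the count was at the previous increment.
      oneReplicaBlockIds.remove(blockId);
    }
  }

  public long getOneReplicaBlockCount() {
    return oneReplicaBlockIds.size();
  }
}
{code}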

> fsck, dfsadmin -report, and NN WebUI should report number of blocks that have 
> 1 replica
> ---
>
> Key: HDFS-13658
> URL: https://issues.apache.org/jira/browse/HDFS-13658
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.1.0
>Reporter: Kitti Nanasi
>Assignee: Kitti Nanasi
>Priority: Major
> Attachments: HDFS-13658.001.patch, HDFS-13658.002.patch, 
> HDFS-13658.003.patch, HDFS-13658.004.patch, HDFS-13658.005.patch
>
>
> fsck, dfsadmin -report, and NN WebUI should report the number of blocks that 
> have 1 replica. We have had many cases opened in which a customer lost 
> files/blocks after losing a disk or a DN, because they had blocks with only 
> 1 replica. We need to make customers better aware of this situation and that 
> they should take action.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13658) fsck, dfsadmin -report, and NN WebUI should report number of blocks that have 1 replica

2018-06-19 Thread Kitti Nanasi (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kitti Nanasi updated HDFS-13658:

Attachment: HDFS-13658.005.patch

> fsck, dfsadmin -report, and NN WebUI should report number of blocks that have 
> 1 replica
> ---
>
> Key: HDFS-13658
> URL: https://issues.apache.org/jira/browse/HDFS-13658
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.1.0
>Reporter: Kitti Nanasi
>Assignee: Kitti Nanasi
>Priority: Major
> Attachments: HDFS-13658.001.patch, HDFS-13658.002.patch, 
> HDFS-13658.003.patch, HDFS-13658.004.patch, HDFS-13658.005.patch
>
>
> fsck, dfsadmin -report, and NN WebUI should report the number of blocks that 
> have 1 replica. We have had many cases opened in which a customer lost 
> files/blocks after losing a disk or a DN, because they had blocks with only 
> 1 replica. We need to make customers better aware of this situation and that 
> they should take action.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-13448) HDFS Block Placement - Ignore Locality for First Block Replica

2018-06-19 Thread BELUGA BEHR (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BELUGA BEHR updated HDFS-13448:
---
Status: Open  (was: Patch Available)

> HDFS Block Placement - Ignore Locality for First Block Replica
> --
>
> Key: HDFS-13448
> URL: https://issues.apache.org/jira/browse/HDFS-13448
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: block placement, hdfs-client
>Affects Versions: 3.0.1, 2.9.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Minor
> Attachments: HDFS-13448.10.patch, HDFS-13448.11.patch, 
> HDFS-13448.12.patch, HDFS-13448.6.patch, HDFS-13448.7.patch, 
> HDFS-13448.8.patch
>
>
> According to the HDFS Block Placement Rules:
> {quote}
> /**
>  * The replica placement strategy is that if the writer is on a datanode,
>  * the 1st replica is placed on the local machine, 
>  * otherwise a random datanode. The 2nd replica is placed on a datanode
>  * that is on a different rack. The 3rd replica is placed on a datanode
>  * which is on a different node of the rack as the second replica.
>  */
> {quote}
> However, there is a hint for the hdfs-client that allows the block placement 
> request to not put a block replica on the local datanode _where 'local' means 
> the same host as the client is being run on._
> {quote}
>   /**
>* Advise that a block replica NOT be written to the local DataNode where
>* 'local' means the same host as the client is being run on.
>*
>* @see CreateFlag#NO_LOCAL_WRITE
>*/
> {quote}
> I propose that we add a new flag that allows the hdfs-client to request that 
> the first block replica be placed on a random DataNode in the cluster.  The 
> subsequent block replicas should follow the normal block placement rules.
> The issue is that when {{NO_LOCAL_WRITE}} is enabled, the first block 
> replica is not placed on the local node, but it is still placed on the local 
> rack.  This comes into play when you have, for example, a Flume agent 
> loading data into HDFS.
> If the Flume agent is running on a DataNode, then by default the DataNode 
> local to the Flume agent will always get the first block replica, and this 
> leads to uneven block placement, with the local node always filling up 
> faster than any other node in the cluster.
> Modifying this example, if the DataNode is removed from the host where the 
> Flume agent is running, or {{NO_LOCAL_WRITE}} is enabled by Flume, then 
> the default block placement policy will still prefer the local rack.  This 
> remedies the situation only insofar as the first block replica will now 
> always be distributed to a DataNode on the local rack.
> This new flag would allow a single Flume agent to distribute blocks randomly 
> and evenly over the entire cluster instead of hot-spotting the local node or 
> the local rack.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-13671) Namenode deletes large dir slowly caused by FoldedTreeSet#removeAndGet

2018-06-19 Thread Yiqun Lin (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16517173#comment-16517173
 ] 

Yiqun Lin edited comment on HDFS-13671 at 6/19/18 2:37 PM:
---

[~jojochuang], the link to the FoldedTreeSet design doc: 
https://issues.apache.org/jira/secure/attachment/12767102/HDFS%20Block%20and%20Replica%20Management%2020151013.pdf
{quote}Yiqun Lin, do you happen to know what's the deletion rate in your 
cluster before HDFS-9260? (I doubt if it's at 344k blocks/sec)
{quote}
[~xiaochen], I haven't tested the case without the patch of HDFS-9260. I can 
run a test if I have some free time :).


was (Author: linyiqun):
[~jojochuang], the link to the FoldedTreeSet design doc: [^HDFS Block and 
Replica Management 20151013.pdf]
{quote}Yiqun Lin, do you happen to know what's the deletion rate in your 
cluster before HDFS-9260? (I doubt if it's at 344k blocks/sec)
{quote}
[~xiaochen], I haven't tested the case without the patch of HDFS-9260. I can 
run a test if I have some free time :).

> Namenode deletes large dir slowly caused by FoldedTreeSet#removeAndGet
> --
>
> Key: HDFS-13671
> URL: https://issues.apache.org/jira/browse/HDFS-13671
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.1.0, 3.0.3
>Reporter: Yiqun Lin
>Priority: Major
>
> NameNode hung when deleting large files/blocks. The stack info:
> {code}
> "IPC Server handler 4 on 8020" #87 daemon prio=5 os_prio=0 
> tid=0x7fb505b27800 nid=0x94c3 runnable [0x7fa861361000]
>java.lang.Thread.State: RUNNABLE
>   at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet.compare(FoldedTreeSet.java:474)
>   at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet.removeAndGet(FoldedTreeSet.java:849)
>   at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet.remove(FoldedTreeSet.java:911)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.removeBlock(DatanodeStorageInfo.java:252)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.removeBlock(BlocksMap.java:194)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.removeBlock(BlocksMap.java:108)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.removeBlockFromMap(BlockManager.java:3813)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.removeBlock(BlockManager.java:3617)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.removeBlocks(FSNamesystem.java:4270)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInternal(FSNamesystem.java:4244)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInt(FSNamesystem.java:4180)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:4164)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:871)
>   at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.delete(AuthorizationProviderProxyClientProtocol.java:311)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.delete(ClientNamenodeProtocolServerSideTranslatorPB.java:625)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
> {code}
> In the current deletion logic in the NameNode, there are mainly two steps:
> * Collect the INodes and all the blocks to be deleted, then delete the INodes.
> * Remove the blocks chunk by chunk in a loop.
> Actually, the first step should be the more expensive operation and should take 
> more time. However, we now always see the NN hang during the remove-block 
> operation.
> Looking into this: we introduced the new structure {{FoldedTreeSet}} to get 
> better performance when dealing with FBRs/IBRs. But compared with the earlier 
> implementation of the remove-block logic, {{FoldedTreeSet}} seems slower, 
> since it takes additional time to rebalance tree nodes. When there are many 
> blocks to be removed/deleted, it performs badly.
> For the get-type operations in {{DatanodeStorageInfo}}, we only provide 
> {{getBlockIterator}} to return a block iterator, and no other get operation 
> for a specified block. Do we still need to use {{FoldedTreeSet}} in 
> {{DatanodeStorageInfo}}? As we know, {{FoldedTreeSet}} benefits Get, not 
> Update. Maybe we can revert this to the earlier implementation.
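To make the cost difference concrete, here is a rough sketch (not Hadoop code) 
of why the earlier triplets-style storage is cheap on the remove path: an 
intrusive doubly-linked list unlinks a known node in O(1), while a balanced 
tree pays O(log n) plus rebalancing work on every remove:
{code}
/**
 * Minimal sketch: constant-time unlink from an intrusive doubly-linked
 * list, roughly the shape of the pre-FoldedTreeSet block storage. A
 * tree-based set must instead search and then rebalance on each remove.
 */
class BlockNode {
  long blockId;
  BlockNode prev, next;

  /** Unlink this node from its list in O(1). */
  void remove() {
    if (prev != null) {
      prev.next = next;
    }
    if (next != null) {
      next.prev = prev;
    }
    prev = next = null;
  }
}
{code}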



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Commented] (HDFS-13671) Namenode deletes large dir slowly caused by FoldedTreeSet#removeAndGet

2018-06-19 Thread Yiqun Lin (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16517173#comment-16517173
 ] 

Yiqun Lin commented on HDFS-13671:
--

[~jojochuang], the link to the FoldedTreeSet design doc: [^HDFS Block and Replica 
Management 20151013.pdf]
{quote}Yiqun Lin, do you happen to know what's the deletion rate in your 
cluster before HDFS-9260? (I doubt if it's at 344k blocks/sec)
{quote}
[~xiaochen], I haven't tested the case without the HDFS-9260 patch. I can 
run a test when I have some free time. :)

> Namenode deletes large dir slowly caused by FoldedTreeSet#removeAndGet
> --
>
> Key: HDFS-13671
> URL: https://issues.apache.org/jira/browse/HDFS-13671
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.1.0, 3.0.3
>Reporter: Yiqun Lin
>Priority: Major
>
> NameNode hung when deleting large files/blocks. The stack info:
> {code}
> "IPC Server handler 4 on 8020" #87 daemon prio=5 os_prio=0 
> tid=0x7fb505b27800 nid=0x94c3 runnable [0x7fa861361000]
>java.lang.Thread.State: RUNNABLE
>   at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet.compare(FoldedTreeSet.java:474)
>   at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet.removeAndGet(FoldedTreeSet.java:849)
>   at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet.remove(FoldedTreeSet.java:911)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.removeBlock(DatanodeStorageInfo.java:252)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.removeBlock(BlocksMap.java:194)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.removeBlock(BlocksMap.java:108)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.removeBlockFromMap(BlockManager.java:3813)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.removeBlock(BlockManager.java:3617)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.removeBlocks(FSNamesystem.java:4270)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInternal(FSNamesystem.java:4244)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInt(FSNamesystem.java:4180)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:4164)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:871)
>   at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.delete(AuthorizationProviderProxyClientProtocol.java:311)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.delete(ClientNamenodeProtocolServerSideTranslatorPB.java:625)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
> {code}
> In the current deletion logic in the NameNode, there are mainly two steps:
> * Collect the INodes and all the blocks to be deleted, then delete the INodes.
> * Remove the blocks chunk by chunk in a loop.
> Actually, the first step should be the more expensive operation and should take 
> more time. However, we now always see the NN hang during the remove-block 
> operation.
> Looking into this: we introduced the new structure {{FoldedTreeSet}} to get 
> better performance when dealing with FBRs/IBRs. But compared with the earlier 
> implementation of the remove-block logic, {{FoldedTreeSet}} seems slower, 
> since it takes additional time to rebalance tree nodes. When there are many 
> blocks to be removed/deleted, it performs badly.
> For the get-type operations in {{DatanodeStorageInfo}}, we only provide 
> {{getBlockIterator}} to return a block iterator, and no other get operation 
> for a specified block. Do we still need to use {{FoldedTreeSet}} in 
> {{DatanodeStorageInfo}}? As we know, {{FoldedTreeSet}} benefits Get, not 
> Update. Maybe we can revert this to the earlier implementation.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13616) Batch listing of multiple directories

2018-06-19 Thread lpstudy (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16517070#comment-16517070
 ] 

lpstudy commented on HDFS-13616:


I am working on erasure coding and have encountered a question with no answer. I 
do not know where to post the question.
Question: Hadoop 3.0 supports striped-layout erasure coding, which requires 
creating multiple output streams so as to write data into a file. However, to 
my knowledge, Hadoop doesn't support writing to one file simultaneously. So my 
question is: how is this achieved?

> Batch listing of multiple directories
> -
>
> Key: HDFS-13616
> URL: https://issues.apache.org/jira/browse/HDFS-13616
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Affects Versions: 3.2.0
>Reporter: Andrew Wang
>Assignee: Andrew Wang
>Priority: Major
> Attachments: BenchmarkListFiles.java, HDFS-13616.001.patch, 
> HDFS-13616.002.patch
>
>
> One of the dominant workloads for external metadata services is listing of 
> partition directories. This can end up being bottlenecked on RTT when 
> partition directories contain a small number of files. This is fairly common, 
> since fine-grained partitioning is used for partition pruning by the query 
> engines.
> A batched listing API that takes multiple paths amortizes the RTT cost. 
> Initial benchmarks show a 10-20x improvement in metadata loading performance.
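To make the idea concrete, client code against such an API could look roughly 
like the sketch below; {{batchedListStatus}} is a hypothetical method shape, 
not the committed interface:
{code}
import java.io.IOException;
import java.util.Arrays;
import java.util.List;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

class BatchedListingSketch {
  // Hypothetical API shape; the patch on this issue defines the real one.
  interface BatchedLister {
    RemoteIterator<FileStatus> batchedListStatus(List<Path> paths)
        throws IOException;
  }

  static void listPartitions(BatchedLister fs) throws IOException {
    List<Path> partitions = Arrays.asList(          // illustrative paths
        new Path("/warehouse/t/dt=2018-06-18"),
        new Path("/warehouse/t/dt=2018-06-19"));
    // One round trip (or a few) covers many directories, instead of one
    // listing RPC per small partition directory.
    RemoteIterator<FileStatus> it = fs.batchedListStatus(partitions);
    while (it.hasNext()) {
      System.out.println(it.next().getPath());
    }
  }
}
{code}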



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13671) Namenode deletes large dir slowly caused by FoldedTreeSet#removeAndGet

2018-06-19 Thread Wei-Chiu Chuang (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16517057#comment-16517057
 ] 

Wei-Chiu Chuang commented on HDFS-13671:


[~linyiqun] thanks for reporting the issue. It seems you've tried to attach a 
file (HDFS Block and Replica Management 20151013.pdf) but it wasn't uploaded. 
Would you please share the file again?

> Namenode deletes large dir slowly caused by FoldedTreeSet#removeAndGet
> --
>
> Key: HDFS-13671
> URL: https://issues.apache.org/jira/browse/HDFS-13671
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.1.0, 3.0.3
>Reporter: Yiqun Lin
>Priority: Major
>
> NameNode hung when deleting large files/blocks. The stack info:
> {code}
> "IPC Server handler 4 on 8020" #87 daemon prio=5 os_prio=0 
> tid=0x7fb505b27800 nid=0x94c3 runnable [0x7fa861361000]
>java.lang.Thread.State: RUNNABLE
>   at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet.compare(FoldedTreeSet.java:474)
>   at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet.removeAndGet(FoldedTreeSet.java:849)
>   at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet.remove(FoldedTreeSet.java:911)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.removeBlock(DatanodeStorageInfo.java:252)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.removeBlock(BlocksMap.java:194)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.removeBlock(BlocksMap.java:108)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.removeBlockFromMap(BlockManager.java:3813)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.removeBlock(BlockManager.java:3617)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.removeBlocks(FSNamesystem.java:4270)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInternal(FSNamesystem.java:4244)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInt(FSNamesystem.java:4180)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:4164)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:871)
>   at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.delete(AuthorizationProviderProxyClientProtocol.java:311)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.delete(ClientNamenodeProtocolServerSideTranslatorPB.java:625)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
> {code}
> In the current deletion logic in the NameNode, there are mainly two steps:
> * Collect the INodes and all the blocks to be deleted, then delete the INodes.
> * Remove the blocks chunk by chunk in a loop.
> Actually, the first step should be the more expensive operation and should take 
> more time. However, we now always see the NN hang during the remove-block 
> operation.
> Looking into this: we introduced the new structure {{FoldedTreeSet}} to get 
> better performance when dealing with FBRs/IBRs. But compared with the earlier 
> implementation of the remove-block logic, {{FoldedTreeSet}} seems slower, 
> since it takes additional time to rebalance tree nodes. When there are many 
> blocks to be removed/deleted, it performs badly.
> For the get-type operations in {{DatanodeStorageInfo}}, we only provide 
> {{getBlockIterator}} to return a block iterator, and no other get operation 
> for a specified block. Do we still need to use {{FoldedTreeSet}} in 
> {{DatanodeStorageInfo}}? As we know, {{FoldedTreeSet}} benefits Get, not 
> Update. Maybe we can revert this to the earlier implementation.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-178) DeleteBlocks should not be handled by open containers

2018-06-19 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16516996#comment-16516996
 ] 

genericqa commented on HDDS-178:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
37s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
24s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 29m 
 7s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 29m 
38s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
24s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
9s{color} | {color:green} trunk passed {color} |
| {color:red}-1{color} | {color:red} shadedclient {color} | {color:red}  4m  
0s{color} | {color:red} branch has errors when building and testing our client 
artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-ozone/integration-test {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
46s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
0s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
21s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 28m 
27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 28m 
27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} shadedclient {color} | {color:red}  2m 
21s{color} | {color:red} patch has errors when building and testing our client 
artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-ozone/integration-test {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
56s{color} | {color:red} hadoop-hdds/container-service generated 2 new + 0 
unchanged - 0 fixed = 2 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
59s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
42s{color} | {color:green} container-service in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 22m 29s{color} 
| {color:red} integration-test in the patch failed. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 
42s{color} | {color:red} The patch generated 1 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}126m 37s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-hdds/container-service |
|  |  Format-string method String.format(String, Object[]) called with format 
string "Ignoring delete blocks for containerId: {}. Outdated delete 
transactionId {} < {}" wants 0 arguments but is given 3 in 
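The underlying mistake flagged here is mixing SLF4J-style {} placeholders with 
{{String.format}}, which only understands %-style conversions. A minimal 
reproduction with illustrative values:
{code}
public class FormatStringBug {
  public static void main(String[] args) {
    long containerId = 7, txId = 3, committed = 5;
    // Bug: String.format sees no % conversions, so the three arguments are
    // silently dropped ("{}" is SLF4J syntax, not a format conversion).
    String bad = String.format(
        "Ignoring delete blocks for containerId: {}. "
            + "Outdated delete transactionId {} < {}",
        containerId, txId, committed);
    // Fix: use %d conversions with String.format (or pass the "{}" string
    // straight to an SLF4J logger instead).
    String good = String.format(
        "Ignoring delete blocks for containerId: %d. "
            + "Outdated delete transactionId %d < %d",
        containerId, txId, committed);
    System.out.println(bad);
    System.out.println(good);
  }
}
{code}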

[jira] [Commented] (HDDS-178) DeleteBlocks should not be handled by open containers

2018-06-19 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16516956#comment-16516956
 ] 

genericqa commented on HDDS-178:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
25s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
25s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 
48s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 29m 
13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
14s{color} | {color:green} trunk passed {color} |
| {color:red}-1{color} | {color:red} shadedclient {color} | {color:red}  4m 
28s{color} | {color:red} branch has errors when building and testing our client 
artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-ozone/integration-test {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
48s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
8s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
23s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 27m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 27m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} shadedclient {color} | {color:red}  2m 
17s{color} | {color:red} patch has errors when building and testing our client 
artifacts. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Skipped patched modules with no Java source: 
hadoop-ozone/integration-test {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  0m 
56s{color} | {color:red} hadoop-hdds/container-service generated 2 new + 0 
unchanged - 0 fixed = 2 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
6s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
46s{color} | {color:green} container-service in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 27m 42s{color} 
| {color:red} integration-test in the patch failed. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 
45s{color} | {color:red} The patch generated 1 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}129m 44s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-hdds/container-service |
|  |  Format-string method String.format(String, Object[]) called with format 
string "Ignoring delete blocks for containerId: {}. Outdated delete 
transactionId {} < {}" wants 0 arguments but is given 3 in 

[jira] [Comment Edited] (HDFS-13683) HDFS StoragePolicy commands should work with Federation

2018-06-19 Thread Gabor Bota (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16516916#comment-16516916
 ] 

Gabor Bota edited comment on HDFS-13683 at 6/19/18 10:31 AM:
-

But the fix is missing from 
[branch-3.0|https://github.com/apache/hadoop/blob/d1dcc39222b6d1d8ba10f38a3f2fb69e4d6548b3/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/StoragePolicyAdmin.java#L156]
 (which is ok, since there will be no new upstream release from the branch)


was (Author: gabor.bota):
But the fix is missing from 
[branch-3.0|https://github.com/apache/hadoop/blob/d1dcc39222b6d1d8ba10f38a3f2fb69e4d6548b3/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/StoragePolicyAdmin.java#L156]
 (which is ok, since there will be no upstream release from the branch)

> HDFS StoragePolicy commands should work with Federation
> ---
>
> Key: HDFS-13683
> URL: https://issues.apache.org/jira/browse/HDFS-13683
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs, tools
>Affects Versions: 3.0.0
>Reporter: Gabor Bota
>Assignee: Gabor Bota
>Priority: Major
>
> In a federated cluster (defaultFS = viewfs://cluster3), getStoragePolicy with 
> an hdfs URI will run into the following error:
> {noformat}
> [root@cdh6-0-0-centos73-17443-1 ~]# hdfs storagepolicies -getStoragePolicy 
> -path 
> hdfs://ns2/hbase/WALs/cdh6-0-0-centos73-17443-4.vpc.cloudera.com,22101,1527272361896
> IllegalArgumentException: Wrong FS: 
> hdfs://ns2/hbase/WALs/cdh6-0-0-centos73-17443-4.vpc.cloudera.com,22101,1527272361896,
>  expected: viewfs://cluster3/
> {noformat}
> Taking a quick look at the code, I think 
> [this|https://github.com/apache/hadoop/blob/3e37a9a70ba93430da1b47f2a8b50358348396b0/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/StoragePolicyAdmin.java#L106]
>  is the culprit:
> {code}
>  final FileSystem fs = FileSystem.get(conf);
> // should do: final FileSystem fs = p.getFileSystem(conf);
> {code}
> We should review all the shell commands and see if anything else is missing. At 
> the minimum, we should fix all the places in StoragePolicyAdmin.
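The suggested fix resolves the FileSystem from the path's own scheme and 
authority instead of from the defaultFS. A minimal sketch of the difference:
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

class ResolveFsFromPath {
  static FileSystem resolve(Configuration conf, String uri) throws Exception {
    Path p = new Path(uri);
    // Path.getFileSystem() honors the path's scheme/authority (hdfs://ns2
    // above), whereas FileSystem.get(conf) always returns the defaultFS
    // (viewfs://cluster3), which triggers the "Wrong FS" exception.
    return p.getFileSystem(conf);
  }
}
{code}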



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-13683) HDFS StoragePolicy commands should work with Federation

2018-06-19 Thread Gabor Bota (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16516916#comment-16516916
 ] 

Gabor Bota edited comment on HDFS-13683 at 6/19/18 10:31 AM:
-

But the fix is missing from 
[branch-3.0|https://github.com/apache/hadoop/blob/d1dcc39222b6d1d8ba10f38a3f2fb69e4d6548b3/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/StoragePolicyAdmin.java#L156]
 (which is ok, since there will be no upstream release from the branch)


was (Author: gabor.bota):
But the fix is missing from 
[branch-3.0|https://github.com/apache/hadoop/blob/d1dcc39222b6d1d8ba10f38a3f2fb69e4d6548b3/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/StoragePolicyAdmin.java#L156]

> HDFS StoragePolicy commands should work with Federation
> ---
>
> Key: HDFS-13683
> URL: https://issues.apache.org/jira/browse/HDFS-13683
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs, tools
>Affects Versions: 3.0.0
>Reporter: Gabor Bota
>Assignee: Gabor Bota
>Priority: Major
>
> In a federated cluster (defaultFS = viewfs://cluster3), getStoragePolicy with 
> an hdfs URI will run into the following error:
> {noformat}
> [root@cdh6-0-0-centos73-17443-1 ~]# hdfs storagepolicies -getStoragePolicy 
> -path 
> hdfs://ns2/hbase/WALs/cdh6-0-0-centos73-17443-4.vpc.cloudera.com,22101,1527272361896
> IllegalArgumentException: Wrong FS: 
> hdfs://ns2/hbase/WALs/cdh6-0-0-centos73-17443-4.vpc.cloudera.com,22101,1527272361896,
>  expected: viewfs://cluster3/
> {noformat}
> Taking a quick look at the code, I think 
> [this|https://github.com/apache/hadoop/blob/3e37a9a70ba93430da1b47f2a8b50358348396b0/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/StoragePolicyAdmin.java#L106]
>  is the culprit:
> {code}
>  final FileSystem fs = FileSystem.get(conf);
> // should do: final FileSystem fs = p.getFileSystem(conf);
> {code}
> We should review all the shell commands and see if anything else is missing. At 
> the minimum, we should fix all the places in StoragePolicyAdmin.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDDS-180) CloseContainer should commit all pending open keys for a container

2018-06-19 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee resolved HDDS-180.
--
Resolution: Fixed

> CloseContainer should commit all pending open keys for a container
> --
>
> Key: HDDS-180
> URL: https://issues.apache.org/jira/browse/HDDS-180
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.2.1
>
>
> When a close container command gets executed, it will first mark the 
> container in the CLOSING state. All the open keys for the container will now 
> have to be committed. This requires us to track all pending open keys for a 
> container on a DataNode. This Jira aims to address all of this.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13683) HDFS StoragePolicy commands should work with Federation

2018-06-19 Thread Gabor Bota (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16516916#comment-16516916
 ] 

Gabor Bota commented on HDFS-13683:
---

But the fix is missing from 
[branch-3.0|https://github.com/apache/hadoop/blob/d1dcc39222b6d1d8ba10f38a3f2fb69e4d6548b3/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/StoragePolicyAdmin.java#L156]

> HDFS StoragePolicy commands should work with Federation
> ---
>
> Key: HDFS-13683
> URL: https://issues.apache.org/jira/browse/HDFS-13683
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs, tools
>Affects Versions: 3.0.0
>Reporter: Gabor Bota
>Assignee: Gabor Bota
>Priority: Major
>
> In a federated cluster (defaultFS = viewfs://cluster3), getStoragePolicy with 
> an hdfs URI will run into the following error:
> {noformat}
> [root@cdh6-0-0-centos73-17443-1 ~]# hdfs storagepolicies -getStoragePolicy 
> -path 
> hdfs://ns2/hbase/WALs/cdh6-0-0-centos73-17443-4.vpc.cloudera.com,22101,1527272361896
> IllegalArgumentException: Wrong FS: 
> hdfs://ns2/hbase/WALs/cdh6-0-0-centos73-17443-4.vpc.cloudera.com,22101,1527272361896,
>  expected: viewfs://cluster3/
> {noformat}
> Taking a quick look at the code, I think 
> [this|https://github.com/apache/hadoop/blob/3e37a9a70ba93430da1b47f2a8b50358348396b0/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/StoragePolicyAdmin.java#L106]
>  is the culprit:
> {code}
>  final FileSystem fs = FileSystem.get(conf);
> // should do: final FileSystem fs = p.getFileSystem(conf);
> {code}
> We should review all the shell commands and see if anything else is missing. At 
> the minimum, we should fix all the places in StoragePolicyAdmin.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-181) CloseContainer should commit all pending open Keys on a datanode

2018-06-19 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-181:
-
Description: 
A close container command arrives at the Datanode via the SCM heartbeat 
response. It will then be queued up over the Ratis pipeline. Once command 
execution starts inside the Datanode, it will mark the container in the CLOSING 
state. All the pending open keys for the container will then be committed, 
followed by the transition of the container state from CLOSING to CLOSED. To 
achieve this, all the open keys for a container need to be tracked.

This Jira aims to address this.

  was:
A close container command arrives at the Datanode via the SCM heartbeat response.

It will then be queued up over the Ratis pipeline. Once command execution 
starts inside the Datanode, it will mark the container in the CLOSING state. All 
the pending open keys for the container will then be committed, followed by the 
transition of the container state from CLOSING to CLOSED state. To achieve 
this, all the open keys for a container need to be tracked.

This Jira aims to address this.


> CloseContainer should commit all pending open Keys on a datanode
> ---
>
> Key: HDDS-181
> URL: https://issues.apache.org/jira/browse/HDDS-181
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.2.1
>
>
> A close container command arrives at the Datanode via the SCM heartbeat 
> response. It will then be queued up over the Ratis pipeline. Once command 
> execution starts inside the Datanode, it will mark the container in the 
> CLOSING state. All the pending open keys for the container will then be 
> committed, followed by the transition of the container state from CLOSING to 
> CLOSED. To achieve this, all the open keys for a container need to be tracked.
> This Jira aims to address this.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-181) CloseContainer should commit all pending open Keys on a datanode

2018-06-19 Thread Shashikant Banerjee (JIRA)
Shashikant Banerjee created HDDS-181:


 Summary: CloseContainer should commit all pending open Keys on a 
datanode
 Key: HDDS-181
 URL: https://issues.apache.org/jira/browse/HDDS-181
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: Ozone Datanode
Reporter: Shashikant Banerjee
Assignee: Shashikant Banerjee
 Fix For: 0.2.1


A close container command arrives at the Datanode via the SCM heartbeat response.

It will then be queued up over the Ratis pipeline. Once command execution 
starts inside the Datanode, it will mark the container in the CLOSING state. All 
the pending open keys for the container will then be committed, followed by the 
transition of the container state from CLOSING to CLOSED state. To achieve 
this, all the open keys for a container need to be tracked.

This Jira aims to address this.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work stopped] (HDFS-13393) Improve OOM logging

2018-06-19 Thread Gabor Bota (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HDFS-13393 stopped by Gabor Bota.
-
> Improve OOM logging
> ---
>
> Key: HDFS-13393
> URL: https://issues.apache.org/jira/browse/HDFS-13393
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: balancer & mover, datanode
>Reporter: Wei-Chiu Chuang
>Assignee: Gabor Bota
>Priority: Major
>
> It is not uncommon to find "java.lang.OutOfMemoryError: unable to create new 
> native thread" errors in an HDFS cluster. Most often this happens when the 
> DataNode creates DataXceiver threads, or when the Balancer creates threads for 
> moving blocks around.
> In most cases, the "OOM" is a symptom of the number of threads reaching a 
> system limit, rather than of actually running out of memory, and the current 
> logging of this message is usually misleading (suggesting it is due to 
> insufficient memory).
> How about capturing the OOM and, if it is due to "unable to create new native 
> thread", printing a more helpful message like "bump your ulimit" or "take a 
> jstack of the process"?
> Even better, surface this error to make it more visible. It usually takes a 
> while for an in-depth investigation after users notice some job fails; by that 
> time the evidence may already be gone (like jstack output).
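A rough sketch of the suggested handling; the message text and helper shape are 
illustrative, not the eventual patch:
{code}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

class OomLoggingSketch {
  private static final Logger LOG =
      LoggerFactory.getLogger(OomLoggingSketch.class);

  static void run(Runnable task) {
    try {
      task.run();
    } catch (OutOfMemoryError e) {
      if (String.valueOf(e.getMessage())
          .contains("unable to create new native thread")) {
        // Thread-count exhaustion, not heap exhaustion: point operators at
        // the actual knobs instead of a misleading "out of memory" hint.
        LOG.error("Cannot create new native thread: the process has likely "
            + "hit its thread/ulimit ceiling. Check `ulimit -u` and consider "
            + "taking a jstack of the process.", e);
      }
      throw e; // preserve the original failure semantics
    }
  }
}
{code}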



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-180) CloseContainer should commit all pending open keys for a container

2018-06-19 Thread Shashikant Banerjee (JIRA)
Shashikant Banerjee created HDDS-180:


 Summary: CloseContainer should commit all pending open keys for a 
container
 Key: HDDS-180
 URL: https://issues.apache.org/jira/browse/HDDS-180
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: Ozone Datanode
Reporter: Shashikant Banerjee
Assignee: Shashikant Banerjee
 Fix For: 0.2.1


When a close container command gets executed, it will first mark the container 
in the CLOSING state. All the open keys for the container will now have to be 
committed. This requires us to track all pending open keys for a container on a 
DataNode. This Jira aims to address all of this.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDFS-13672) clearCorruptLazyPersistFiles could crash NameNode

2018-06-19 Thread Gabor Bota (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabor Bota reassigned HDFS-13672:
-

Assignee: Gabor Bota

> clearCorruptLazyPersistFiles could crash NameNode
> -
>
> Key: HDFS-13672
> URL: https://issues.apache.org/jira/browse/HDFS-13672
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Wei-Chiu Chuang
>Assignee: Gabor Bota
>Priority: Major
>
> I started a NameNode on a pretty large fsimage. Since the NameNode is started 
> without any DataNodes, all blocks (100 million) are "corrupt".
> Afterwards I observed that FSNamesystem#clearCorruptLazyPersistFiles() held the 
> write lock for a long time:
> {noformat}
> 18/06/12 12:37:03 INFO namenode.FSNamesystem: FSNamesystem write lock held 
> for 46024 ms via
> java.lang.Thread.getStackTrace(Thread.java:1559)
> org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:945)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:198)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.writeUnlock(FSNamesystem.java:1689)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem$LazyPersistFileScrubber.clearCorruptLazyPersistFiles(FSNamesystem.java:5532)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem$LazyPersistFileScrubber.run(FSNamesystem.java:5543)
> java.lang.Thread.run(Thread.java:748)
> Number of suppressed write-lock reports: 0
> Longest write-lock held interval: 46024
> {noformat}
> Here's the relevant code:
> {code}
>   writeLock();
>   try {
>   final Iterator<Block> it =
> blockManager.getCorruptReplicaBlockIterator();
> while (it.hasNext()) {
>   Block b = it.next();
>   BlockInfo blockInfo = blockManager.getStoredBlock(b);
>   if (blockInfo.getBlockCollection().getStoragePolicyID() == 
> lpPolicy.getId()) {
> filesToDelete.add(blockInfo.getBlockCollection());
>   }
> }
> for (BlockCollection bc : filesToDelete) {
>   LOG.warn("Removing lazyPersist file " + bc.getName() + " with no 
> replicas.");
>   changed |= deleteInternal(bc.getName(), false, false, false);
> }
>   } finally {
> writeUnlock();
>   }
> {code}
> In essence, the iteration over the corrupt replica list should be broken down 
> into smaller iterations to avoid a single long wait.
> Since this operation holds the NameNode write lock for more than 45 seconds, 
> the default ZKFC connection timeout, it implies that an extreme case like this 
> (100 million corrupt blocks) could lead to a NameNode failover.
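One possible shape for that, sketched under the assumption that candidates can 
be snapshotted once and then processed in bounded chunks; the constant and 
helper names are illustrative, not HDFS configuration or API:
{code}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Sketch: bound each write-lock hold by snapshotting candidates once and
// then deleting in fixed-size chunks, re-acquiring the lock per chunk.
// CHUNK is an illustrative constant, not an HDFS config key.
class ChunkedScrubberSketch<B> {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private static final int CHUNK = 10_000;

  interface Deleter<B> { void delete(B block); }

  void scrub(Iterable<B> corruptBlocks, Deleter<B> deleter) {
    // Phase 1: one bounded hold to snapshot the candidates. Still O(n),
    // but much cheaper than holding the lock across every deletion.
    List<B> snapshot = new ArrayList<>();
    lock.writeLock().lock();
    try {
      for (B b : corruptBlocks) {
        snapshot.add(b);
      }
    } finally {
      lock.writeLock().unlock();
    }
    // Phase 2: delete in chunks, yielding the lock between chunks so other
    // operations (and the ZKFC health probe) can interleave.
    for (int i = 0; i < snapshot.size(); i += CHUNK) {
      lock.writeLock().lock();
      try {
        for (B b : snapshot.subList(i, Math.min(i + CHUNK, snapshot.size()))) {
          deleter.delete(b);
        }
      } finally {
        lock.writeLock().unlock();
      }
    }
  }
}
{code}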



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work started] (HDFS-13683) HDFS StoragePolicy commands should work with Federation

2018-06-19 Thread Gabor Bota (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HDFS-13683 started by Gabor Bota.
-
> HDFS StoragePolicy commands should work with Federation
> ---
>
> Key: HDFS-13683
> URL: https://issues.apache.org/jira/browse/HDFS-13683
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs, tools
>Affects Versions: 3.0.0
>Reporter: Gabor Bota
>Assignee: Gabor Bota
>Priority: Major
>
> In a federated cluster (defaultFS = viewfs://cluster3), getStoragePolicy with 
> an hdfs URI will run into the following error:
> {noformat}
> [root@cdh6-0-0-centos73-17443-1 ~]# hdfs storagepolicies -getStoragePolicy 
> -path 
> hdfs://ns2/hbase/WALs/cdh6-0-0-centos73-17443-4.vpc.cloudera.com,22101,1527272361896
> IllegalArgumentException: Wrong FS: 
> hdfs://ns2/hbase/WALs/cdh6-0-0-centos73-17443-4.vpc.cloudera.com,22101,1527272361896,
>  expected: viewfs://cluster3/
> {noformat}
> Taking a quick look at the code, I think 
> [this|https://github.com/apache/hadoop/blob/3e37a9a70ba93430da1b47f2a8b50358348396b0/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/StoragePolicyAdmin.java#L106]
>  is the culprit:
> {code}
>  final FileSystem fs = FileSystem.get(conf);
> // should do: final FileSystem fs = p.getFileSystem(conf);
> {code}
> We should review all the shell commands and see if anything else is missing. At 
> the minimum, we should fix all the places in StoragePolicyAdmin.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13683) HDFS StoragePolicy commands should work with Federation

2018-06-19 Thread Gabor Bota (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16516910#comment-16516910
 ] 

Gabor Bota commented on HDFS-13683:
---

This seems very much like a duplicate of HDFS-12857.
[~xiaochen], what do you think?

> HDFS StoragePolicy commands should work with Federation
> ---
>
> Key: HDFS-13683
> URL: https://issues.apache.org/jira/browse/HDFS-13683
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs, tools
>Affects Versions: 3.0.0
>Reporter: Gabor Bota
>Assignee: Gabor Bota
>Priority: Major
>
> In a federated cluster (defaultFS = viewfs://cluster3), getStoragePolicy with 
> an hdfs URI will run into the following error:
> {noformat}
> [root@cdh6-0-0-centos73-17443-1 ~]# hdfs storagepolicies -getStoragePolicy 
> -path 
> hdfs://ns2/hbase/WALs/cdh6-0-0-centos73-17443-4.vpc.cloudera.com,22101,1527272361896
> IllegalArgumentException: Wrong FS: 
> hdfs://ns2/hbase/WALs/cdh6-0-0-centos73-17443-4.vpc.cloudera.com,22101,1527272361896,
>  expected: viewfs://cluster3/
> {noformat}
> Taking a quick look at the code, I think 
> [this|https://github.com/apache/hadoop/blob/3e37a9a70ba93430da1b47f2a8b50358348396b0/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/StoragePolicyAdmin.java#L106]
>  is the culprit:
> {code}
>  final FileSystem fs = FileSystem.get(conf);
> // should do: final FileSystem fs = p.getFileSystem(conf);
> {code}
> We should review all the shell commands and see if anything else is missing. At 
> the minimum, we should fix all the places in StoragePolicyAdmin.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-179) CloseContainer command should be executed only if all the prior "Write" type container requests get executed

2018-06-19 Thread Shashikant Banerjee (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-179:
-
Description: 
When a close Container command request comes to a Datanode (via SCM hearbeat 
response) through the Ratis protocol, all the prior enqueued "Write" type of 
request like putKey, WriteChunk, DeleteKey, CompactChunk etc should be executed 
first before CloseContainer request gets executed. This synchronization needs 
to be handled in the containerStateMachine. This Jira aims to address this.

 

  was:
When a close container command request comes to a Datanode (via the SCM 
heartbeat response) through the Ratis protocol, all previously enqueued "Write" 
type requests, such as putKey, WriteChunk, DeleteKey, CompactChunk, etc., 
should be executed before the CloseContainer request gets executed. This 
synchronization needs to be handled in the containerStateMachine. This Jira 
aims to address this.

 


> CloseContainer command should be executed only if all the  prior "Write" type 
> container requests get executed
> -
>
> Key: HDDS-179
> URL: https://issues.apache.org/jira/browse/HDDS-179
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client, Ozone Datanode
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.2.1
>
>
> When a close container command request comes to a Datanode (via the SCM 
> heartbeat response) through the Ratis protocol, all previously enqueued 
> "Write" type requests, such as putKey, WriteChunk, DeleteKey, CompactChunk, 
> etc., should be executed before the CloseContainer request gets executed. 
> This synchronization needs to be handled in the containerStateMachine. This 
> Jira aims to address this.
>  
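Conceptually this is a per-container write barrier: the close applies only 
after every previously enqueued write for that container has executed. A 
generic sketch of such a barrier (names are illustrative, not the 
containerStateMachine API):
{code}
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Sketch: each write chains onto its container's "tail" future; a close
// simply chains behind all prior writes, so it cannot run before them.
class ContainerOrderingSketch {
  private final ConcurrentMap<Long, CompletableFuture<Void>> tails =
      new ConcurrentHashMap<>();

  CompletableFuture<Void> submitWrite(long containerId, Runnable write) {
    return chain(containerId, write);
  }

  CompletableFuture<Void> submitClose(long containerId, Runnable close) {
    // Runs only after every previously chained write for this container.
    return chain(containerId, close);
  }

  private CompletableFuture<Void> chain(long containerId, Runnable op) {
    return tails.compute(containerId,
        (id, tail) -> (tail == null
            ? CompletableFuture.runAsync(op)
            : tail.thenRun(op)));
  }
}
{code}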



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-179) CloseContainer command should be executed only if all the prior "Write" type container requests get executed

2018-06-19 Thread Shashikant Banerjee (JIRA)
Shashikant Banerjee created HDDS-179:


 Summary: CloseContainer command should be executed only if all the 
 prior "Write" type container requests get executed
 Key: HDDS-179
 URL: https://issues.apache.org/jira/browse/HDDS-179
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: Ozone Client, Ozone Datanode
Reporter: Shashikant Banerjee
Assignee: Shashikant Banerjee
 Fix For: 0.2.1


When a close container command request comes to a Datanode (via the SCM 
heartbeat response) through the Ratis protocol, all previously enqueued "Write" 
type requests, such as putKey, WriteChunk, DeleteKey, CompactChunk, etc., 
should be executed before the CloseContainer request gets executed. This 
synchronization needs to be handled in the containerStateMachine. This Jira 
aims to address this.

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-178) DeleteBlocks should not be handled by open containers

2018-06-19 Thread Lokesh Jain (JIRA)


[ 
https://issues.apache.org/jira/browse/HDDS-178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16516867#comment-16516867
 ] 

Lokesh Jain commented on HDDS-178:
--

v2 patch fixes a few comments and debug logs.

> DeleteBlocks should not be handled by open containers
> -
>
> Key: HDDS-178
> URL: https://issues.apache.org/jira/browse/HDDS-178
> Project: Hadoop Distributed Data Store
>  Issue Type: Task
>  Components: Ozone Datanode
>Reporter: Lokesh Jain
>Assignee: Lokesh Jain
>Priority: Major
> Attachments: HDDS-178.001.patch, HDDS-178.002.patch
>
>
> In the case of open containers, the deleteBlocks command just adds an entry in 
> the log but does not delete the blocks. These blocks are deleted only when the 
> container is closed.
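The deferred handling described above can be sketched generically as follows 
(all names are illustrative, not the HDDS code):
{code}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: while a container is open, a DeleteBlocks command only records
// the block IDs; the actual delete runs when the container is closed.
class DeferredDeleteSketch {
  enum State { OPEN, CLOSED }

  private final Map<Long, State> containerState = new HashMap<>();
  private final Map<Long, List<Long>> pendingDeletes = new HashMap<>();

  void onDeleteBlocks(long containerId, List<Long> blockIds) {
    if (containerState.getOrDefault(containerId, State.OPEN) == State.OPEN) {
      // Open container: just log the request for later.
      pendingDeletes.computeIfAbsent(containerId, k -> new ArrayList<>())
          .addAll(blockIds);
    } else {
      deleteNow(containerId, blockIds);
    }
  }

  void onClose(long containerId) {
    containerState.put(containerId, State.CLOSED);
    List<Long> pending = pendingDeletes.remove(containerId);
    if (pending != null) {
      deleteNow(containerId, pending); // apply deferred deletes at close
    }
  }

  private void deleteNow(long containerId, List<Long> blockIds) {
    // ... the actual chunk/block removal would go here ...
  }
}
{code}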



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-178) DeleteBlocks should not be handled by open containers

2018-06-19 Thread Lokesh Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lokesh Jain updated HDDS-178:
-
Attachment: HDDS-178.002.patch

> DeleteBlocks should not be handled by open containers
> -
>
> Key: HDDS-178
> URL: https://issues.apache.org/jira/browse/HDDS-178
> Project: Hadoop Distributed Data Store
>  Issue Type: Task
>  Components: Ozone Datanode
>Reporter: Lokesh Jain
>Assignee: Lokesh Jain
>Priority: Major
> Attachments: HDDS-178.001.patch, HDDS-178.002.patch
>
>
> In the case of open containers, the deleteBlocks command just adds an entry in 
> the log but does not delete the blocks. These blocks are deleted only when the 
> container is closed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-178) DeleteBlocks should not be handled by open containers

2018-06-19 Thread Lokesh Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lokesh Jain updated HDDS-178:
-
Status: Patch Available  (was: Open)

> DeleteBlocks should not be handled by open containers
> -
>
> Key: HDDS-178
> URL: https://issues.apache.org/jira/browse/HDDS-178
> Project: Hadoop Distributed Data Store
>  Issue Type: Task
>  Components: Ozone Datanode
>Reporter: Lokesh Jain
>Assignee: Lokesh Jain
>Priority: Major
> Attachments: HDDS-178.001.patch
>
>
> In the case of open containers, the deleteBlocks command just adds an entry in 
> the log but does not delete the blocks. These blocks are deleted only when the 
> container is closed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-178) DeleteBlocks should not be handled by open containers

2018-06-19 Thread Lokesh Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDDS-178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lokesh Jain updated HDDS-178:
-
Attachment: HDDS-178.001.patch

> DeleteBlocks should not be handled by open containers
> -
>
> Key: HDDS-178
> URL: https://issues.apache.org/jira/browse/HDDS-178
> Project: Hadoop Distributed Data Store
>  Issue Type: Task
>  Components: Ozone Datanode
>Reporter: Lokesh Jain
>Assignee: Lokesh Jain
>Priority: Major
> Attachments: HDDS-178.001.patch
>
>
> In the case of open containers, the deleteBlocks command just adds an entry in 
> the log but does not delete the blocks. These blocks are deleted only when the 
> container is closed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-178) DeleteBlocks should not be handled by open containers

2018-06-19 Thread Lokesh Jain (JIRA)
Lokesh Jain created HDDS-178:


 Summary: DeleteBlocks should not be handled by open containers
 Key: HDDS-178
 URL: https://issues.apache.org/jira/browse/HDDS-178
 Project: Hadoop Distributed Data Store
  Issue Type: Task
  Components: Ozone Datanode
Reporter: Lokesh Jain
Assignee: Lokesh Jain


In the case of open containers, the deleteBlocks command just adds an entry in 
the log but does not delete the blocks. These blocks are deleted only when the 
container is closed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13687) ConfiguredFailoverProxyProvider could direct requests to SBN

2018-06-19 Thread Chao Sun (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16516754#comment-16516754
 ] 

Chao Sun commented on HDFS-13687:
-

Thanks [~xkrogen] and [~shv] for the very valuable comments! Regarding 
Konstantin's comments:

1. Very good point. Sorry, I didn't know that {{getServiceStatus}} requires 
super privilege. Another option might be to add another interface/protocol to 
get the active/standby state from the NN, [as proposed in the original 
JIRA|https://issues.apache.org/jira/browse/HDFS-2917?focusedCommentId=13204178=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-13204178].
2. Yes, good point. Perhaps we can reuse {{NameNodeHAProxyFactory}} to create 
the factory needed in our case.
3. Can you elaborate more on this? Currently 
{{ConfiguredFailoverProxyProvider}} does assume all remote addresses are NNs, 
right?

I will go back to HDFS-12976 in the meantime.

> ConfiguredFailoverProxyProvider could direct requests to SBN
> 
>
> Key: HDFS-13687
> URL: https://issues.apache.org/jira/browse/HDFS-13687
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Minor
> Attachments: HDFS-13687.000.patch
>
>
> In case there are multiple SBNs and {{dfs.ha.allow.stale.reads}} is set to 
> true, failover could go to an SBN, which may then serve read requests from the 
> client. This may not be the expected behavior. This issue arose while we were 
> working on HDFS-12943 and HDFS-12976.
> A better approach could be to check the {{HAServiceState}} and find the 
> active NN when performing failover. This can also reduce the number of 
> failovers the client has to do in the case of multiple SBNs.
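A rough sketch of the probing idea: try each configured NameNode for its HA 
state and prefer the one reporting active. The state lookup is shown as a 
hypothetical {{getHAServiceState()}}, since the thread above is still 
discussing which RPC could back it:
{code}
import java.util.List;

// Sketch only: probe candidates and pick the active one, falling back to
// the usual failover order if no probe succeeds. getHAServiceState() is
// hypothetical; assumes a non-empty candidate list.
class ActiveProbingSketch<T> {
  interface Candidate<T> {
    T proxy();
    String getHAServiceState() throws Exception; // hypothetical lookup
  }

  T chooseActive(List<Candidate<T>> candidates) {
    for (Candidate<T> c : candidates) {
      try {
        if ("active".equals(c.getHAServiceState())) {
          return c.proxy(); // avoid directing reads at a standby
        }
      } catch (Exception e) {
        // Unreachable or unauthorized; try the next candidate.
      }
    }
    return candidates.get(0).proxy(); // fall back to existing behavior
  }
}
{code}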



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13682) Cannot create encryption zone after KMS auth token expires

2018-06-19 Thread genericqa (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16516709#comment-16516709
 ] 

genericqa commented on HDFS-13682:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
29s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
56s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 
54s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 29m 
36s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
23s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
25s{color} | {color:green} trunk passed {color} |
| {color:red}-1{color} | {color:red} shadedclient {color} | {color:red}  5m 
10s{color} | {color:red} branch has errors when building and testing our client 
artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
44s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
55s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
18s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 28m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 28m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} shadedclient {color} | {color:red}  2m 
14s{color} | {color:red} patch has errors when building and testing our client 
artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
52s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  9m 
15s{color} | {color:green} hadoop-common in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 94m 33s{color} 
| {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
45s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}217m 37s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.namenode.TestReencryptionWithKMS |
|   | hadoop.hdfs.qjournal.server.TestJournalNodeSync |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | HDFS-13682 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12928286/HDFS-13682.02.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  |
| uname | Linux bbd5b16ce9bf 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 
10:45:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / f386e78 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_171 |
| findbugs | v3.1.0-RC1 |
| unit |