[jira] [Commented] (HDFS-14999) Avoid Potential Infinite Loop in DFSNetworkTopology

2020-05-18 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17110463#comment-17110463
 ] 

Hudson commented on HDFS-14999:
---

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #18271 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/18271/])
HDFS-14999. Avoid Potential Infinite Loop in DFSNetworkTopology. (ayushsaxena: 
rev c84e6beada4e604175f7f138c9878a29665a8c47)
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/net/DFSNetworkTopology.java


> Avoid Potential Infinite Loop in DFSNetworkTopology
> ---
>
> Key: HDFS-14999
> URL: https://issues.apache.org/jira/browse/HDFS-14999
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: HDFS-14999-01.patch
>
>
> {code:java}
> do {
>   chosen = chooseRandomWithStorageTypeAndExcludeRoot(root, excludeRoot,
>   type);
>   if (excludedNodes == null || !excludedNodes.contains(chosen)) {
> break;
>   } else {
> LOG.debug("Node {} is excluded, continuing.", chosen);
>   }
> } while (true);
> {code}
> Observed this loop getting stuck as part of testing HDFS-14913.
> There should be some exit condition or max retries here



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14999) Avoid Potential Infinite Loop in DFSNetworkTopology

2020-05-18 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17110452#comment-17110452
 ] 

Ayush Saxena commented on HDFS-14999:
-

Committed to trunk.
Thanx Everyone!!!

> Avoid Potential Infinite Loop in DFSNetworkTopology
> ---
>
> Key: HDFS-14999
> URL: https://issues.apache.org/jira/browse/HDFS-14999
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
> Attachments: HDFS-14999-01.patch
>
>
> {code:java}
> do {
>   chosen = chooseRandomWithStorageTypeAndExcludeRoot(root, excludeRoot,
>   type);
>   if (excludedNodes == null || !excludedNodes.contains(chosen)) {
> break;
>   } else {
> LOG.debug("Node {} is excluded, continuing.", chosen);
>   }
> } while (true);
> {code}
> Observed this loop getting stuck as part of testing HDFS-14913.
> There should be some exit condition or max retries here



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14999) Avoid Potential Infinite Loop in DFSNetworkTopology

2020-05-16 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17109230#comment-17109230
 ] 

Ayush Saxena commented on HDFS-14999:
-

Thanx [~vinayakumarb] for the review.
If no further comments will commit by tomorrow EOD.

> Avoid Potential Infinite Loop in DFSNetworkTopology
> ---
>
> Key: HDFS-14999
> URL: https://issues.apache.org/jira/browse/HDFS-14999
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
> Attachments: HDFS-14999-01.patch
>
>
> {code:java}
> do {
>   chosen = chooseRandomWithStorageTypeAndExcludeRoot(root, excludeRoot,
>   type);
>   if (excludedNodes == null || !excludedNodes.contains(chosen)) {
> break;
>   } else {
> LOG.debug("Node {} is excluded, continuing.", chosen);
>   }
> } while (true);
> {code}
> Observed this loop getting stuck as part of testing HDFS-14913.
> There should be some exit condition or max retries here



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14999) Avoid Potential Infinite Loop in DFSNetworkTopology

2020-05-16 Thread Vinayakumar B (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17108986#comment-17108986
 ] 

Vinayakumar B commented on HDFS-14999:
--

+1

> Avoid Potential Infinite Loop in DFSNetworkTopology
> ---
>
> Key: HDFS-14999
> URL: https://issues.apache.org/jira/browse/HDFS-14999
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
> Attachments: HDFS-14999-01.patch
>
>
> {code:java}
> do {
>   chosen = chooseRandomWithStorageTypeAndExcludeRoot(root, excludeRoot,
>   type);
>   if (excludedNodes == null || !excludedNodes.contains(chosen)) {
> break;
>   } else {
> LOG.debug("Node {} is excluded, continuing.", chosen);
>   }
> } while (true);
> {code}
> Observed this loop getting stuck as part of testing HDFS-14913.
> There should be some exit condition or max retries here



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14999) Avoid Potential Infinite Loop in DFSNetworkTopology

2020-05-15 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17108486#comment-17108486
 ] 

Wei-Chiu Chuang commented on HDFS-14999:


Looks similar to HADOOP-15317

> Avoid Potential Infinite Loop in DFSNetworkTopology
> ---
>
> Key: HDFS-14999
> URL: https://issues.apache.org/jira/browse/HDFS-14999
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
> Attachments: HDFS-14999-01.patch
>
>
> {code:java}
> do {
>   chosen = chooseRandomWithStorageTypeAndExcludeRoot(root, excludeRoot,
>   type);
>   if (excludedNodes == null || !excludedNodes.contains(chosen)) {
> break;
>   } else {
> LOG.debug("Node {} is excluded, continuing.", chosen);
>   }
> } while (true);
> {code}
> Observed this loop getting stuck as part of testing HDFS-14913.
> There should be some exit condition or max retries here



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14999) Avoid Potential Infinite Loop in DFSNetworkTopology

2020-05-15 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17108465#comment-17108465
 ] 

Ayush Saxena commented on HDFS-14999:
-

[~elgoiri] [~vinayakumarb] can you help review 

> Avoid Potential Infinite Loop in DFSNetworkTopology
> ---
>
> Key: HDFS-14999
> URL: https://issues.apache.org/jira/browse/HDFS-14999
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
> Attachments: HDFS-14999-01.patch
>
>
> {code:java}
> do {
>   chosen = chooseRandomWithStorageTypeAndExcludeRoot(root, excludeRoot,
>   type);
>   if (excludedNodes == null || !excludedNodes.contains(chosen)) {
> break;
>   } else {
> LOG.debug("Node {} is excluded, continuing.", chosen);
>   }
> } while (true);
> {code}
> Observed this loop getting stuck as part of testing HDFS-14913.
> There should be some exit condition or max retries here



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14999) Avoid Potential Infinite Loop in DFSNetworkTopology

2020-05-14 Thread bright.zhou (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17106944#comment-17106944
 ] 

bright.zhou commented on HDFS-14999:


Is it the final solution? can the patch be merged? [~vinayakumarb]  [~brahma]

 

> Avoid Potential Infinite Loop in DFSNetworkTopology
> ---
>
> Key: HDFS-14999
> URL: https://issues.apache.org/jira/browse/HDFS-14999
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
> Attachments: HDFS-14999-01.patch
>
>
> {code:java}
> do {
>   chosen = chooseRandomWithStorageTypeAndExcludeRoot(root, excludeRoot,
>   type);
>   if (excludedNodes == null || !excludedNodes.contains(chosen)) {
> break;
>   } else {
> LOG.debug("Node {} is excluded, continuing.", chosen);
>   }
> } while (true);
> {code}
> Observed this loop getting stuck as part of testing HDFS-14913.
> There should be some exit condition or max retries here



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14999) Avoid Potential Infinite Loop in DFSNetworkTopology

2020-05-08 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17102860#comment-17102860
 ] 

Ayush Saxena commented on HDFS-14999:
-

Thanx [~vinayakumarb] for the review.
I ran the said test. The performance isn’t getting affected. Both took similar 
times in multiple runs, with the newer one taking negligible number of 
milliseconds less. 

This shouldn’t impact performance ideally. 
Please review!!’

> Avoid Potential Infinite Loop in DFSNetworkTopology
> ---
>
> Key: HDFS-14999
> URL: https://issues.apache.org/jira/browse/HDFS-14999
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
> Attachments: HDFS-14999-01.patch
>
>
> {code:java}
> do {
>   chosen = chooseRandomWithStorageTypeAndExcludeRoot(root, excludeRoot,
>   type);
>   if (excludedNodes == null || !excludedNodes.contains(chosen)) {
> break;
>   } else {
> LOG.debug("Node {} is excluded, continuing.", chosen);
>   }
> } while (true);
> {code}
> Observed this loop getting stuck as part of testing HDFS-14913.
> There should be some exit condition or max retries here



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14999) Avoid Potential Infinite Loop in DFSNetworkTopology

2020-05-08 Thread Vinayakumar B (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17102653#comment-17102653
 ] 

Vinayakumar B commented on HDFS-14999:
--

changes looks fine to me.

It would be better to have a benchmark done on this.

Perhaps the you can check \{{TestDFSNetworkTopologyPerformance}} class for 
changes with excluded nodes.

Results before and after patch would help.

> Avoid Potential Infinite Loop in DFSNetworkTopology
> ---
>
> Key: HDFS-14999
> URL: https://issues.apache.org/jira/browse/HDFS-14999
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
> Attachments: HDFS-14999-01.patch
>
>
> {code:java}
> do {
>   chosen = chooseRandomWithStorageTypeAndExcludeRoot(root, excludeRoot,
>   type);
>   if (excludedNodes == null || !excludedNodes.contains(chosen)) {
> break;
>   } else {
> LOG.debug("Node {} is excluded, continuing.", chosen);
>   }
> } while (true);
> {code}
> Observed this loop getting stuck as part of testing HDFS-14913.
> There should be some exit condition or max retries here



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14999) Avoid Potential Infinite Loop in DFSNetworkTopology

2020-05-07 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17101688#comment-17101688
 ] 

Ayush Saxena commented on HDFS-14999:
-

The last patch was from our internal code, I wrote it couple of days back, when 
bright's cluster got this issue. But wasn't complete solution. 
Have uploaded the complete solution for trunk. Handled the javadoc and comment 
issues too.

[~weichiu] Actually the namenode got stuck for the first time on this loop for 
40 mins and could come out only post failover, And post that this was 
continuously repeating sometimes but the namenode was getting stuck here for 
around 1 to 2 minutes. 

This present logic of infinite loop is logically Ok, but if the cluster size is 
huge, which was actually in our case, and if the excluded nodes are also high. 
The Random function may keep on returning the excluded nodes and we will be 
looping around and there won't be any exit to this. This may happen even if 
there is no bug, just the randomization always returning the excluded nodes. 
So, I think we should try eliminate this loop, at best or if this doesn't sound 
well, atleast have a breaking condition. 
Let me know your thoughts on this.

[~vinayakumarb] can you too have a look once.

> Avoid Potential Infinite Loop in DFSNetworkTopology
> ---
>
> Key: HDFS-14999
> URL: https://issues.apache.org/jira/browse/HDFS-14999
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
> Attachments: HDFS-14999-01.patch
>
>
> {code:java}
> do {
>   chosen = chooseRandomWithStorageTypeAndExcludeRoot(root, excludeRoot,
>   type);
>   if (excludedNodes == null || !excludedNodes.contains(chosen)) {
> break;
>   } else {
> LOG.debug("Node {} is excluded, continuing.", chosen);
>   }
> } while (true);
> {code}
> Observed this loop getting stuck as part of testing HDFS-14913.
> There should be some exit condition or max retries here



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14999) Avoid Potential Infinite Loop in DFSNetworkTopology

2020-05-07 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17101662#comment-17101662
 ] 

Hadoop QA commented on HDFS-14999:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  2m  
3s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 
54s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
35s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
58s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
41s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
18m 53s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
41s{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  3m  
4s{color} | {color:blue} Used deprecated FindBugs config; considering switching 
to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m  
2s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
5s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
5s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
11s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
16m  5s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
32s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}122m 12s{color} 
| {color:red} hadoop-hdfs in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
37s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}202m 38s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.TestDecommissionWithStripedBackoffMonitor |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | ClientAPI=1.40 ServerAPI=1.40 base: 
https://builds.apache.org/job/PreCommit-HDFS-Build/29248/artifact/out/Dockerfile
 |
| JIRA Issue | HDFS-14999 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/13002255/HDFS-14999-01.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite 
unit shadedclient findbugs checkstyle |
| uname | Linux 820939054d11 4.15.0-74-generic #84-Ubuntu SMP Thu Dec 19 
08:06:28 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/hadoop.sh |
| git revision | trunk / 99840aaba66 |
| Default Java | Private Build-1.8.0_252-8u252-b09-1~18.04-b09 |
| unit | 

[jira] [Commented] (HDFS-14999) Avoid Potential Infinite Loop in DFSNetworkTopology

2020-05-06 Thread bright.zhou (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17101316#comment-17101316
 ] 

bright.zhou commented on HDFS-14999:


[~weichiu] Encountered this problem on our cluster and used this method to 
avoid this issue. I am sorry that the patch was uploaded but not assigned to 
me. I have deleted the patch.

> Avoid Potential Infinite Loop in DFSNetworkTopology
> ---
>
> Key: HDFS-14999
> URL: https://issues.apache.org/jira/browse/HDFS-14999
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>
> {code:java}
> do {
>   chosen = chooseRandomWithStorageTypeAndExcludeRoot(root, excludeRoot,
>   type);
>   if (excludedNodes == null || !excludedNodes.contains(chosen)) {
> break;
>   } else {
> LOG.debug("Node {} is excluded, continuing.", chosen);
>   }
> } while (true);
> {code}
> Observed this loop getting stuck as part of testing HDFS-14913.
> There should be some exit condition or max retries here



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14999) Avoid Potential Infinite Loop in DFSNetworkTopology

2020-05-06 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17101109#comment-17101109
 ] 

Wei-Chiu Chuang commented on HDFS-14999:


[~zgw] i saw you posted a patch. Is that the fix for this issue, and is it 
ready for precommit test?

Some quick comments:
* do you have a test?
* would you please update the comments / javadocs in the code?
{code}
// to this point, it is guaranteed that there is at least one node
// that satisfies the requirement, keep trying until we found one.
Node chosen;
{code}

{code}
  /**
   * Choose a random node that has the required storage type, under the given
   * root, with an excluded subtree root (could also just be a leaf node).
   *
   * Note that excludedNode is checked after a random node, so it is not being
   * handled here.
{code}
both are not accurate after this patch.

> Avoid Potential Infinite Loop in DFSNetworkTopology
> ---
>
> Key: HDFS-14999
> URL: https://issues.apache.org/jira/browse/HDFS-14999
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
> Attachments: InfiniteLoop.patch
>
>
> {code:java}
> do {
>   chosen = chooseRandomWithStorageTypeAndExcludeRoot(root, excludeRoot,
>   type);
>   if (excludedNodes == null || !excludedNodes.contains(chosen)) {
> break;
>   } else {
> LOG.debug("Node {} is excluded, continuing.", chosen);
>   }
> } while (true);
> {code}
> Observed this loop getting stuck as part of testing HDFS-14913.
> There should be some exit condition or max retries here



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14999) Avoid Potential Infinite Loop in DFSNetworkTopology

2019-11-21 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16979528#comment-16979528
 ] 

Ayush Saxena commented on HDFS-14999:
-

Thanx [~weichiu] for the pointers.
Yes, HADOOP-151317 discussed this problem of infinite loop only, but it 
concluded in a way not eliminating the loop. [~xiaochen] started discussion 
with having a termination condition for the loop, but in the end, they didn't 
conclude.
Any suggestions on having a termination condition? Maybe some considerably 
large number? Or if you have any idea or suggestion on this or remember 
something from the discussions that time, could really help. :)

> Avoid Potential Infinite Loop in DFSNetworkTopology
> ---
>
> Key: HDFS-14999
> URL: https://issues.apache.org/jira/browse/HDFS-14999
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>
> {code:java}
> do {
>   chosen = chooseRandomWithStorageTypeAndExcludeRoot(root, excludeRoot,
>   type);
>   if (excludedNodes == null || !excludedNodes.contains(chosen)) {
> break;
>   } else {
> LOG.debug("Node {} is excluded, continuing.", chosen);
>   }
> } while (true);
> {code}
> Observed this loop getting stuck as part of testing HDFS-14913.
> There should be some exit condition or max retries here



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14999) Avoid Potential Infinite Loop in DFSNetworkTopology

2019-11-20 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16978779#comment-16978779
 ] 

Wei-Chiu Chuang commented on HDFS-14999:


It might be related to HADOOP-15317. (not sure. the stacktrace looks familiar 
to me)
We couldn't find the root cause of the infinite loop but the code was rewritten 
to eliminate a while loop.

> Avoid Potential Infinite Loop in DFSNetworkTopology
> ---
>
> Key: HDFS-14999
> URL: https://issues.apache.org/jira/browse/HDFS-14999
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>
> {code:java}
> do {
>   chosen = chooseRandomWithStorageTypeAndExcludeRoot(root, excludeRoot,
>   type);
>   if (excludedNodes == null || !excludedNodes.contains(chosen)) {
> break;
>   } else {
> LOG.debug("Node {} is excluded, continuing.", chosen);
>   }
> } while (true);
> {code}
> Observed this loop getting stuck as part of testing HDFS-14913.
> There should be some exit condition or max retries here



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14999) Avoid Potential Infinite Loop in DFSNetworkTopology

2019-11-20 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16978766#comment-16978766
 ] 

Ayush Saxena commented on HDFS-14999:
-

I don't have a fair idea, what exactly a exit condition should be. May be some 
configurable number of retries? some hard coded value? or equal to number of 
nodes?
This isn't a bug post HDFS-14913, but logically has a potential of getting 
stuck long, if choose random keeps on returning the excluded node
Any suggestions?

> Avoid Potential Infinite Loop in DFSNetworkTopology
> ---
>
> Key: HDFS-14999
> URL: https://issues.apache.org/jira/browse/HDFS-14999
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
>
> {code:java}
> do {
>   chosen = chooseRandomWithStorageTypeAndExcludeRoot(root, excludeRoot,
>   type);
>   if (excludedNodes == null || !excludedNodes.contains(chosen)) {
> break;
>   } else {
> LOG.debug("Node {} is excluded, continuing.", chosen);
>   }
> } while (true);
> {code}
> Observed this loop getting stuck as part of testing HDFS-14913.
> There should be some exit condition or max retries here



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org