[jira] [Commented] (HDFS-14999) Avoid Potential Infinite Loop in DFSNetworkTopology
[ https://issues.apache.org/jira/browse/HDFS-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17110463#comment-17110463 ] Hudson commented on HDFS-14999:
---
SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #18271 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/18271/])
HDFS-14999. Avoid Potential Infinite Loop in DFSNetworkTopology. (ayushsaxena: rev c84e6beada4e604175f7f138c9878a29665a8c47)
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/net/DFSNetworkTopology.java

> Avoid Potential Infinite Loop in DFSNetworkTopology
> ---
>
> Key: HDFS-14999
> URL: https://issues.apache.org/jira/browse/HDFS-14999
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Ayush Saxena
> Assignee: Ayush Saxena
> Priority: Major
> Fix For: 3.4.0
>
> Attachments: HDFS-14999-01.patch
>
>
> {code:java}
> do {
>   chosen = chooseRandomWithStorageTypeAndExcludeRoot(root, excludeRoot,
>       type);
>   if (excludedNodes == null || !excludedNodes.contains(chosen)) {
>     break;
>   } else {
>     LOG.debug("Node {} is excluded, continuing.", chosen);
>   }
> } while (true);
> {code}
> Observed this loop getting stuck as part of testing HDFS-14913.
> There should be some exit condition or max retries here.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
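The unbounded do/while(true) loop quoted in the issue description can be given the requested exit condition by capping the number of attempts. The sketch below is a simplified stand-alone illustration, not the committed HDFS-14999 patch: `chooseWithRetries`, the `Supplier` that stands in for `chooseRandomWithStorageTypeAndExcludeRoot(root, excludeRoot, type)`, and the `maxRetries` parameter are all hypothetical names.

```java
import java.util.Random;
import java.util.Set;
import java.util.function.Supplier;

public class BoundedChoose {

  /**
   * Hypothetical bounded variant of the retry loop: draws candidates until one
   * is not excluded, but gives up after maxRetries attempts instead of spinning
   * forever. Returns null when the budget is exhausted; callers must handle it.
   */
  static <N> N chooseWithRetries(Supplier<N> randomChoice, Set<N> excludedNodes,
      int maxRetries) {
    for (int attempt = 0; attempt < maxRetries; attempt++) {
      N chosen = randomChoice.get();
      if (excludedNodes == null || !excludedNodes.contains(chosen)) {
        return chosen; // acceptable node found
      }
      // In the original loop, this is where "Node {} is excluded, continuing." is logged.
    }
    return null; // exit condition: no acceptable node within the retry budget
  }

  public static void main(String[] args) {
    Random rnd = new Random();
    String[] nodes = {"dn1", "dn2", "dn3"};
    Supplier<String> pick = () -> nodes[rnd.nextInt(nodes.length)];

    // Every node excluded: the original loop never terminates, this prints "null".
    System.out.println(chooseWithRetries(pick, Set.of("dn1", "dn2", "dn3"), 100));
    // Only dn2 allowed: prints dn2 unless all 100 draws hit excluded nodes.
    System.out.println(chooseWithRetries(pick, Set.of("dn1", "dn3"), 100));
  }
}
```

Returning null pushes the "nothing found" decision to the caller; the cap itself (hard-coded, configurable, or derived from cluster size) is exactly the open question discussed later in the thread.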
[jira] [Commented] (HDFS-14999) Avoid Potential Infinite Loop in DFSNetworkTopology
[ https://issues.apache.org/jira/browse/HDFS-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17110452#comment-17110452 ] Ayush Saxena commented on HDFS-14999:
---
Committed to trunk. Thanks, everyone!
[jira] [Commented] (HDFS-14999) Avoid Potential Infinite Loop in DFSNetworkTopology
[ https://issues.apache.org/jira/browse/HDFS-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17109230#comment-17109230 ] Ayush Saxena commented on HDFS-14999:
---
Thanks [~vinayakumarb] for the review. If there are no further comments, I will commit by tomorrow EOD.
[jira] [Commented] (HDFS-14999) Avoid Potential Infinite Loop in DFSNetworkTopology
[ https://issues.apache.org/jira/browse/HDFS-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17108986#comment-17108986 ] Vinayakumar B commented on HDFS-14999:
---
+1
[jira] [Commented] (HDFS-14999) Avoid Potential Infinite Loop in DFSNetworkTopology
[ https://issues.apache.org/jira/browse/HDFS-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17108486#comment-17108486 ] Wei-Chiu Chuang commented on HDFS-14999:
---
Looks similar to HADOOP-15317
[jira] [Commented] (HDFS-14999) Avoid Potential Infinite Loop in DFSNetworkTopology
[ https://issues.apache.org/jira/browse/HDFS-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17108465#comment-17108465 ] Ayush Saxena commented on HDFS-14999:
---
[~elgoiri] [~vinayakumarb], can you help review?
[jira] [Commented] (HDFS-14999) Avoid Potential Infinite Loop in DFSNetworkTopology
[ https://issues.apache.org/jira/browse/HDFS-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17106944#comment-17106944 ] bright.zhou commented on HDFS-14999:
---
Is this the final solution? Can the patch be merged? [~vinayakumarb] [~brahma]
[jira] [Commented] (HDFS-14999) Avoid Potential Infinite Loop in DFSNetworkTopology
[ https://issues.apache.org/jira/browse/HDFS-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17102860#comment-17102860 ] Ayush Saxena commented on HDFS-14999:
---
Thanks [~vinayakumarb] for the review. I ran the suggested test, and performance is not affected: across multiple runs both versions took similar time, with the patched version a negligible number of milliseconds faster. Ideally, this should not affect performance. Please review!
[jira] [Commented] (HDFS-14999) Avoid Potential Infinite Loop in DFSNetworkTopology
[ https://issues.apache.org/jira/browse/HDFS-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17102653#comment-17102653 ] Vinayakumar B commented on HDFS-14999:
---
The changes look fine to me. It would be better to have a benchmark done on this, though. Perhaps you can use the {{TestDFSNetworkTopologyPerformance}} class to measure the change with excluded nodes; results from before and after the patch would help.
[jira] [Commented] (HDFS-14999) Avoid Potential Infinite Loop in DFSNetworkTopology
[ https://issues.apache.org/jira/browse/HDFS-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17101688#comment-17101688 ] Ayush Saxena commented on HDFS-14999:
---
The last patch came from our internal code. I wrote it a couple of days back, when bright's cluster hit this issue, but it wasn't a complete solution. I have uploaded the complete solution for trunk and handled the javadoc and comment issues too.

[~weichiu] The first time, the NameNode got stuck in this loop for 40 minutes and recovered only after a failover. After that, the problem kept recurring intermittently, with the NameNode stuck here for around 1 to 2 minutes each time.

The present loop is logically OK, but if the cluster is huge (as it was in our case) and the number of excluded nodes is also high, the random selection may keep returning excluded nodes, and we will keep looping with no exit. This can happen even when there is no bug, purely because the randomization keeps returning excluded nodes. So I think we should ideally eliminate this loop or, if that doesn't sound good, at least add a breaking condition. Let me know your thoughts on this. [~vinayakumarb], can you have a look too?
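The argument that random selection alone can keep the loop spinning can be made concrete with a simple model. Assuming each draw is independent and uniform over the candidates, with a fraction p of them excluded, the number of draws until a hit is geometrically distributed with mean 1/(1-p), and the chance that k consecutive draws all hit excluded nodes is p^k. The uniform, independent-draw model and the 990-of-1000 figure below are illustrative assumptions; the real selection in DFSNetworkTopology need not be uniform.

```java
import java.util.Locale;

public class ExcludedDrawOdds {

  /** Expected draws until a non-excluded node, for excluded fraction p (0 <= p < 1). */
  static double expectedDraws(double p) {
    return 1.0 / (1.0 - p);
  }

  /** Probability that k consecutive independent draws all land on excluded nodes. */
  static double allExcluded(double p, int k) {
    return Math.pow(p, k);
  }

  public static void main(String[] args) {
    // Hypothetical cluster: 1000 candidate nodes, 990 of them excluded.
    double p = 990.0 / 1000.0;
    System.out.printf(Locale.ROOT, "expected draws: %.1f%n", expectedDraws(p));
    System.out.printf(Locale.ROOT, "P(1000 misses in a row): %.2e%n", allExcluded(p, 1000));
  }
}
```

Under these assumptions the loop needs 100 draws on average, and about once in 23,000 calls it still has not found a node after 1000 draws; each draw is a full random walk down the topology, which is how a large, heavily excluded cluster can stall for noticeable time. If every eligible node is excluded (p = 1), the loop never exits at all.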
[jira] [Commented] (HDFS-14999) Avoid Potential Infinite Loop in DFSNetworkTopology
[ https://issues.apache.org/jira/browse/HDFS-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17101662#comment-17101662 ] Hadoop QA commented on HDFS-14999:
---
(x) *-1 overall*

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 2m 3s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | dupname | 0m 0s | No case conflicting files found. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 26m 54s | trunk passed |
| +1 | compile | 1m 35s | trunk passed |
| +1 | checkstyle | 0m 58s | trunk passed |
| +1 | mvnsite | 1m 41s | trunk passed |
| +1 | shadedclient | 18m 53s | branch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 0m 41s | trunk passed |
| 0 | spotbugs | 3m 4s | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 | findbugs | 3m 2s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 1m 9s | the patch passed |
| +1 | compile | 1m 5s | the patch passed |
| +1 | javac | 1m 5s | the patch passed |
| +1 | checkstyle | 0m 41s | the patch passed |
| +1 | mvnsite | 1m 11s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 16m 5s | patch has no errors when building and testing our client artifacts. |
| +1 | javadoc | 0m 42s | the patch passed |
| +1 | findbugs | 3m 32s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 122m 12s | hadoop-hdfs in the patch passed. |
| +1 | asflicense | 0m 37s | The patch does not generate ASF License warnings. |
| | | 202m 38s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.TestDecommissionWithStripedBackoffMonitor |

|| Subsystem || Report/Notes ||
| Docker | ClientAPI=1.40 ServerAPI=1.40 base: https://builds.apache.org/job/PreCommit-HDFS-Build/29248/artifact/out/Dockerfile |
| JIRA Issue | HDFS-14999 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/13002255/HDFS-14999-01.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 820939054d11 4.15.0-74-generic #84-Ubuntu SMP Thu Dec 19 08:06:28 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/hadoop.sh |
| git revision | trunk / 99840aaba66 |
| Default Java | Private Build-1.8.0_252-8u252-b09-1~18.04-b09 |
| unit |
[jira] [Commented] (HDFS-14999) Avoid Potential Infinite Loop in DFSNetworkTopology
[ https://issues.apache.org/jira/browse/HDFS-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17101316#comment-17101316 ] bright.zhou commented on HDFS-14999:
---
[~weichiu] We encountered this problem on our cluster and used this method to work around it. I am sorry for uploading the patch even though the issue was not assigned to me; I have deleted the patch.
[jira] [Commented] (HDFS-14999) Avoid Potential Infinite Loop in DFSNetworkTopology
[ https://issues.apache.org/jira/browse/HDFS-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17101109#comment-17101109 ] Wei-Chiu Chuang commented on HDFS-14999:
---
[~zgw] I saw you posted a patch. Is that the fix for this issue, and is it ready for the precommit test? Some quick comments:
* Do you have a test?
* Would you please update the comments/javadocs in the code?
{code}
// to this point, it is guaranteed that there is at least one node
// that satisfies the requirement, keep trying until we found one.
Node chosen;
{code}
{code}
/**
 * Choose a random node that has the required storage type, under the given
 * root, with an excluded subtree root (could also just be a leaf node).
 *
 * Note that excludedNode is checked after a random node, so it is not being
 * handled here.
{code}
Neither is accurate after this patch.
[jira] [Commented] (HDFS-14999) Avoid Potential Infinite Loop in DFSNetworkTopology
[ https://issues.apache.org/jira/browse/HDFS-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16979528#comment-16979528 ] Ayush Saxena commented on HDFS-14999:
---
Thanks [~weichiu] for the pointers. Yes, HADOOP-15317 discussed exactly this infinite-loop problem, but it was concluded without eliminating the loop. [~xiaochen] started a discussion about adding a termination condition to the loop, but it never reached a conclusion. Any suggestions for a termination condition? Maybe some considerably large number? Any idea or suggestion on this, or anything you remember from those discussions, could really help. :)
[jira] [Commented] (HDFS-14999) Avoid Potential Infinite Loop in DFSNetworkTopology
[ https://issues.apache.org/jira/browse/HDFS-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16978779#comment-16978779 ] Wei-Chiu Chuang commented on HDFS-14999:
---
It might be related to HADOOP-15317 (not sure; the stack trace looks familiar to me). We couldn't find the root cause of the infinite loop there, but the code was rewritten to eliminate a while loop.
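Eliminating the loop, rather than bounding it, generally means drawing only from nodes that are already known to be acceptable. The sketch below shows that idea in its simplest form and is not the actual HADOOP-15317 or HDFS-14999 change: `chooseWithoutLoop` and the flat `List` of candidates are hypothetical, whereas the real topology is a tree of racks and datanodes.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Random;
import java.util.Set;

public class LoopFreeChoose {

  /**
   * Hypothetical loop-free selection: collects the eligible candidates once,
   * then picks one uniformly at random with a single draw.
   * Returns null if every candidate is excluded.
   */
  static <N> N chooseWithoutLoop(Collection<N> candidates, Set<N> excludedNodes,
      Random rnd) {
    List<N> eligible = new ArrayList<>();
    for (N node : candidates) {
      if (excludedNodes == null || !excludedNodes.contains(node)) {
        eligible.add(node);
      }
    }
    if (eligible.isEmpty()) {
      return null; // nothing to choose: report it instead of looping forever
    }
    return eligible.get(rnd.nextInt(eligible.size()));
  }

  public static void main(String[] args) {
    List<String> nodes = List.of("dn1", "dn2", "dn3", "dn4");
    // dn4 is the only eligible node, so it is printed whatever the random draw is.
    System.out.println(chooseWithoutLoop(nodes, Set.of("dn1", "dn2", "dn3"), new Random()));
  }
}
```

The trade-off is that filtering costs O(n) on every call, even when almost nothing is excluded, while the retry loop is cheap in that common case but unbounded in the worst case; this is why performance was a review concern for this change.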
[jira] [Commented] (HDFS-14999) Avoid Potential Infinite Loop in DFSNetworkTopology
[ https://issues.apache.org/jira/browse/HDFS-14999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16978766#comment-16978766 ] Ayush Saxena commented on HDFS-14999:
---
I don't have a clear idea of what exactly the exit condition should be. Maybe a configurable number of retries? A hard-coded value? Or a value equal to the number of nodes? This isn't a bug after HDFS-14913, but logically the loop can get stuck for a long time if the random choice keeps returning excluded nodes. Any suggestions?
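One of the options listed here, a retry budget equal to the number of nodes, can be paired with a deterministic fallback so that the call still returns a valid node whenever one exists, instead of giving up. This is a hypothetical sketch: `chooseHybrid` and its parameters are illustrative names, not Hadoop APIs, and it assumes the candidates can be enumerated once the random draws are exhausted.

```java
import java.util.List;
import java.util.Random;
import java.util.Set;

public class HybridChoose {

  /**
   * Hypothetical hybrid: up to candidates.size() random draws (cheap when few
   * nodes are excluded), then one linear scan, so the call always terminates.
   */
  static <N> N chooseHybrid(List<N> candidates, Set<N> excludedNodes, Random rnd) {
    int budget = candidates.size(); // retry cap tied to cluster size
    for (int attempt = 0; attempt < budget; attempt++) {
      N chosen = candidates.get(rnd.nextInt(candidates.size()));
      if (excludedNodes == null || !excludedNodes.contains(chosen)) {
        return chosen;
      }
    }
    // Budget exhausted: scan deterministically instead of drawing again.
    for (N node : candidates) {
      if (excludedNodes == null || !excludedNodes.contains(node)) {
        return node;
      }
    }
    return null; // every candidate is excluded (or there are no candidates)
  }

  public static void main(String[] args) {
    List<String> nodes = List.of("dn1", "dn2", "dn3");
    // Prints dn3: either a random draw hits it or the fallback scan finds it.
    System.out.println(chooseHybrid(nodes, Set.of("dn1", "dn2"), new Random()));
    // Prints null: no eligible node exists.
    System.out.println(chooseHybrid(nodes, Set.of("dn1", "dn2", "dn3"), new Random()));
  }
}
```

The fallback scan always returns the first eligible node in list order, so it slightly biases placement in the rare case where the random phase fails; a fallback that picks uniformly among the eligible nodes would avoid that at the cost of building the filtered list.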