[ https://issues.apache.org/jira/browse/YARN-4478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15257541#comment-15257541 ]
Rohith Sharma K S commented on YARN-4478:
-----------------------------------------

Thanks [~vinodkv] for your attention on this JIRA. Yes, I will close this umbrella JIRA since it has been running for a very long time, and I think test case runs look better now than when we started tracking.

Recently I had a discussion with [~sunilg] about the root causes of various test failures and made some observations from this umbrella JIRA. Some of the test case failures that are said to be *random* seem to be fixable. I will share some of the observations made while fixing/reviewing/committing test cases in this umbrella JIRA.

Types of failures seen
# Yarn event model - AsyncDispatcher : Most of the random test failures fall in this category. For example, after registering a node with the RM, the test immediately asserts on the cluster resource from the scheduler.
{code}
rm.start();
rm.registerNode("h1:1234", 5120);
// Racy: the node-added event may not have been processed by the scheduler yet.
assertEquals(5120, rm.getResourceScheduler().getClusterResource().getMemory());
{code}
Contributors often forget while writing test cases that yarn events are async. Many random failures are caused by delays in processing these events, even though the same tests seem to run fine locally in Eclipse.
# System Settings : We have seen a few test cases fail regularly, mainly because of DNS configuration. See HADOOP-12687 and INFRA-11150. I am not sure how this should be resolved, since a code check-in is not preferred either, as it breaks RFC standards.
# As test cases were made to run in parallel, we ran into "Address bind exception" issues.
# MockRM APIs : MockRM has many APIs, added over time, to submit a job and launch an AM. A few of these methods wait internally for certain events to happen, while others require the test case to wait for those events explicitly (contributors have to take care of this). When writing test cases, contributors should be aware of what each MockRM API does internally. We have seen a mix of these APIs causing random failures.

For handling the open issues in this umbrella, I will detach them and mark each one as a Test bug.
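To illustrate the async-event point above, here is a minimal, self-contained sketch in plain Java. It does not use MockRM or AsyncDispatcher; the class name AsyncAssertSketch, the waitFor helper, and the single-thread executor standing in for the dispatcher are all assumptions made for illustration. The idea is only that a test should wait for the asynchronously applied state before asserting on it, rather than asserting right after the call that queues the event.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

public class AsyncAssertSketch {

    /** Hypothetical helper: polls the condition until it holds or the timeout elapses. */
    public static void waitFor(BooleanSupplier condition, long intervalMs, long timeoutMs)
            throws InterruptedException, TimeoutException {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                throw new TimeoutException("condition not met within " + timeoutMs + " ms");
            }
            Thread.sleep(intervalMs);
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the scheduler's view of cluster memory (MB).
        AtomicInteger clusterMemoryMb = new AtomicInteger(0);
        // Stand-in for the dispatcher thread that processes events asynchronously.
        ExecutorService dispatcher = Executors.newSingleThreadExecutor();
        try {
            // "Register a node": the update is only queued here and applied later.
            dispatcher.execute(() -> {
                try {
                    Thread.sleep(50); // simulated event-processing delay
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                clusterMemoryMb.addAndGet(5120);
            });

            // Asserting immediately at this point would race with the dispatcher.
            // Instead, wait until the event has been applied, then assert.
            waitFor(() -> clusterMemoryMb.get() == 5120, 10, 5000);
            if (clusterMemoryMb.get() != 5120) {
                throw new AssertionError("unexpected cluster memory: " + clusterMemoryMb.get());
            }
            System.out.println("cluster memory = " + clusterMemoryMb.get());
        } finally {
            dispatcher.shutdownNow();
        }
    }
}
```

In a real YARN test the equivalent step would be to drain the dispatcher or poll on the scheduler state before the assertion; which helper to use depends on the test's setup, so the polling loop here should be read as one possible pattern and not as the project's prescribed fix.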
Let us handle these separately.

> [Umbrella] : Track all the Test failures in YARN
> ------------------------------------------------
>
>                 Key: YARN-4478
>                 URL: https://issues.apache.org/jira/browse/YARN-4478
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn
>            Reporter: Rohith Sharma K S
>
> Recently many test cases have been failing, either because they timed out or because a new bug fix caused an impact. Many test failure JIRAs have been raised and are in progress.
> This is to track all the test failure JIRAs.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)