[ https://issues.apache.org/jira/browse/YARN-4478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15257541#comment-15257541 ]

Rohith Sharma K S commented on YARN-4478:
-----------------------------------------

Thanks [~vinodkv] for looking at this JIRA. Yes, I will close this umbrella
JIRA, since it has been open for a long time, and I think test runs are in
better shape now than when we started this tracking.

Recently I discussed the root causes of various test failures with [~sunilg]
and made some observations from this umbrella JIRA. Some of the test failures
that are described as *random* seem to be fixable. I will share some of the
observations made while fixing, reviewing and committing test fixes in this
umbrella JIRA.

Types of failures seen:
# YARN event model (AsyncDispatcher): Most of the random test failures fall into
this category. For example, a test registers a node with the RM and then
immediately asserts the cluster resource reported by the scheduler:
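{code}
    rm.start();
    rm.registerNode("h1:1234", 5120);
    // Racy: the scheduler may not have processed the node-added event yet,
    // because YARN dispatches events asynchronously.
    assertEquals(5120,
        rm.getResourceScheduler().getClusterResource().getMemory());
{code}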
Contributors often forget, when writing test cases, that YARN events are
processed asynchronously. Many random failures are caused by delays in event
processing, which is why such tests tend to pass when run locally in Eclipse
but fail intermittently in other runs.
# System settings: A few test cases fail regularly, mainly because of DNS
configuration. See HADOOP-12687 and INFRA-11150. I am not sure how this should
be resolved, since a code change is not preferred either: it would break RFC
standards.
# Parallel test runs: Since test cases were made to run in parallel, we have
run into "Address bind exception" issues.
# MockRM APIs: MockRM has many APIs, added over time, to submit an application
and launch its AM. Some of these methods wait internally for certain events to
happen, while others require the test case to wait for those events explicitly
(contributors have to take care of this). When writing tests, contributors
should know what each MockRM API does internally. We have seen a mix of these
APIs cause random failures.
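The AsyncDispatcher and MockRM items above share one remedy: instead of asserting right after triggering an event, poll until the expected state appears or a timeout expires. Below is a minimal, self-contained Java sketch of that pattern. `FakeScheduler`, `addNodeAsync` and `waitFor` are hypothetical names standing in for the RM scheduler and its dispatcher thread; they are not YARN APIs. Hadoop's own test code already has similar helpers (for example `GenericTestUtils.waitFor` in hadoop-common's test utilities, and MockRM's `waitForState` methods), and those should be preferred where they fit.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.BooleanSupplier;

public class AsyncWaitSketch {

  /** Hypothetical stand-in for a scheduler whose state is updated on a dispatcher thread. */
  static class FakeScheduler {
    private final AtomicLong clusterMemoryMb = new AtomicLong();
    private final ExecutorService dispatcher = Executors.newSingleThreadExecutor();

    /** Returns immediately; the update is applied later, like an async node-added event. */
    void addNodeAsync(long memoryMb) {
      dispatcher.submit(() -> {
        sleepQuietly(50); // simulated event-processing delay
        clusterMemoryMb.addAndGet(memoryMb);
      });
    }

    long getClusterMemoryMb() {
      return clusterMemoryMb.get();
    }

    void stop() {
      dispatcher.shutdownNow();
    }
  }

  /** Re-checks the condition until it holds, failing only after the timeout elapses. */
  static void waitFor(BooleanSupplier condition, long checkEveryMs, long timeoutMs)
      throws InterruptedException, TimeoutException {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
    while (!condition.getAsBoolean()) {
      if (System.nanoTime() - deadline >= 0) {
        throw new TimeoutException("condition not met within " + timeoutMs + " ms");
      }
      Thread.sleep(checkEveryMs);
    }
  }

  static void sleepQuietly(long ms) {
    try {
      Thread.sleep(ms);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  public static void main(String[] args) throws Exception {
    FakeScheduler scheduler = new FakeScheduler();
    try {
      scheduler.addNodeAsync(5120);
      // Asserting right here would be racy: the event is most likely still queued.
      waitFor(() -> scheduler.getClusterMemoryMb() == 5120, 10, 2000);
      System.out.println("cluster memory = " + scheduler.getClusterMemoryMb() + " MB");
    } finally {
      scheduler.stop();
    }
  }
}
```

Running `main` prints `cluster memory = 5120 MB`. The timeout keeps a genuine bug from hanging the build, while the short polling interval keeps the test fast when the event is processed quickly.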

For the issues still open under this umbrella, I will detach them and convert
them into Test bugs. Let us handle them separately.
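For the "Address bind exception" failures seen with parallel test runs, one common remedy is to avoid hard-coded ports and bind to port 0, letting the OS assign a free ephemeral port. The Java sketch below shows the idea with plain `ServerSocket`s; `EphemeralPortSketch` and `bindEphemeral` are hypothetical names, and how a particular YARN service is configured to listen on port 0 is not shown here.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;

public class EphemeralPortSketch {

  /** Binds to port 0 on loopback so the OS picks a free port for this socket. */
  static ServerSocket bindEphemeral() throws IOException {
    return new ServerSocket(0, 50, InetAddress.getLoopbackAddress());
  }

  public static void main(String[] args) throws IOException {
    // Two sockets bound this way never collide, unlike two tests sharing a hard-coded port.
    try (ServerSocket first = bindEphemeral(); ServerSocket second = bindEphemeral()) {
      System.out.println("first=" + first.getLocalPort() + ", second=" + second.getLocalPort());
    }
  }
}
```

The service should bind to port 0 itself and then report the port it received. Probing for a free port, closing the probe socket, and passing the number on later can still race with another test JVM running in parallel.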

> [Umbrella] : Track all the Test failures in YARN
> ------------------------------------------------
>
>                 Key: YARN-4478
>                 URL: https://issues.apache.org/jira/browse/YARN-4478
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn
>            Reporter: Rohith Sharma K S
>
> Recently many test cases have been failing, either by timing out or because of
> the impact of new bug fixes. Many JIRAs for these test failures have been raised
> and are in progress. This is to track all the test failure JIRAs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
