[ 
https://issues.apache.org/jira/browse/YARN-9052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16977947#comment-16977947
 ] 

Szilard Nemeth commented on YARN-9052:
--------------------------------------

Thank god, only 5 testcases failed.
Thanks to the exception outputs and the toString outputs, I was able to track 
what is the difference between the runs of my refactored code (with mock RM 
builder) and the old code, in terms of diff of the outputs.
See the attachments.

I also copied the test logs of these 5 testcases to 2 files: before my change 
and after my change. This way, I was able to see if the objects passed by the 
test framework are different or not.

Here are the Jenkins test results: 

{code:java}
[INFO] Results:
[INFO] 
[ERROR] Failures: 
[ERROR]   
TestRMWebServicesSchedulerActivities.testQueueSkippedBecauseOfHeadroom:1556 
expected:<[Queue does not have enough headroom for inner highest-priority 
request]> but was:<[]>
[ERROR] Errors: 
[ERROR]   
TestNodeBlacklistingOnAMFailures.testNodeBlacklistingOnAMFailureRelaxedNodeLocality:332->submitAppWithAMResourceRequests:85
 ? NullPointer
[ERROR]   
TestNodeBlacklistingOnAMFailures.testNodeBlacklistingOnAMFailureStrictNodeLocality:245->submitAppWithAMResourceRequests:85
 ? NullPointer
[ERROR]   TestRMRestart.testAppReportNodeLabelRMRestart:524->submitApp:224 ? 
NullPointer
[ERROR]   
TestRMWebServicesSchedulerActivitiesWithMultiNodesEnabled.testAppAssignContainer:267
 ? TestTimedOut
{code}


Let's check the failures one by one:

1. TestRMWebServicesSchedulerActivities.testQueueSkippedBecauseOfHeadroom: This 
is a straightforward one. 
Original code was: 

{code:java}
RMApp app1 = rm.submitApp(10, "app1", "user1", null, "a1a");
{code}


New code is: 

{code:java}
  RMApp app1 = MockRMAppSubmitter.submit(rm,
                  MockRMAppSubmissionData.Builder.createWithDefaults(10, rm)
                      .withName("app1")
                      .withUser("user1")
                      .withAcls(null)
                      .withQueue("b1")
                      .build());

{code}

Queue name was wrong, possibly due to copy-pasted code.

2. This is a bit more difficult. The following 2 testcases failed due to a 
common reason, an NPE: 
TestNodeBlacklistingOnAMFailures.testNodeBlacklistingOnAMFailureRelaxedNodeLocality
TestNodeBlacklistingOnAMFailures.testNodeBlacklistingOnAMFailureStrictNodeLocality

Since these testcases are the only ones calling 
TestNodeBlacklistingOnAMFailures#submitAppWithAMResourceRequests, and the NPE 
is easily reproducible, something had to be gone wrong with the AM resource 
requests. A quick debug revealed that:
In SchedulerUtils#validateResourceRequest, the method receives a resReq 
parameter (type of ResourceRequest). We are reading resReq.getCapability(), but 
it was null. 

I checked how the capability of AM resource requests can be null and found out 
that in MockRMAppSubmitter#submit, there is a block of code that placed a 
logical negate operator wrongly: 


{code:java}
 List<ResourceRequest> amResourceRequests = data.getAmResourceRequests();
    if (amResourceRequests == null || !amResourceRequests.isEmpty()) {
      ResourceRequest amResReq = ResourceRequest.newInstance(
          priority, ResourceRequest.ANY, data.getResource(), 1);
      amResourceRequests = Collections.singletonList(amResReq);
    }
{code}

So in this case, we don't need the negate at the second half of the if 
condition.

3. TestRMRestart.testAppReportNodeLabelRMRestart: Yet another NPE here :(
Very similar location to the ones listed with point 2 and fails because of the 
same reason as well so this is also fixed.

4. 
TestRMWebServicesSchedulerActivitiesWithMultiNodesEnabled.testAppAssignContainer:
 I checked the diff of the logged objects and I can't see anything different 
here. The test timed out, so let's consider this as an intermittent failure for 
now.

Attached new patch to let jenkins run again.

> Replace all MockRM submit method definitions with a builder
> -----------------------------------------------------------
>
>                 Key: YARN-9052
>                 URL: https://issues.apache.org/jira/browse/YARN-9052
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Szilard Nemeth
>            Assignee: Szilard Nemeth
>            Priority: Minor
>         Attachments: 
> YARN-9052-004withlogs-patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt,
>  YARN-9052-testlogs003-justfailed.txt, 
> YARN-9052-testlogs003-patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt,
>  YARN-9052-testlogs004-justfailed.txt, YARN-9052.001.patch, 
> YARN-9052.002.patch, YARN-9052.003.patch, YARN-9052.004.patch, 
> YARN-9052.004.withlogs.patch, YARN-9052.005.patch, 
> YARN-9052.testlogs.002.patch, YARN-9052.testlogs.002.patch, 
> YARN-9052.testlogs.003.patch, YARN-9052.testlogs.patch
>
>
> MockRM has 31 definitions of submitApp, most of them having more than 
> acceptable number of parameters, ranging from 2 to even 22 parameters, which 
> makes the code completely unreadable.
> On top of unreadability, it's very hard to follow what RmApp will be produced 
> for tests as they often pass a lot of empty / null values as parameters.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

Reply via email to