[ https://issues.apache.org/jira/browse/YARN-9052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16977947#comment-16977947 ]
Szilard Nemeth commented on YARN-9052: -------------------------------------- Thank god, only 5 testcases failed. Thanks to the exception outputs and the toString outputs, I was able to track what is the difference between the runs of my refactored code (with mock RM builder) and the old code, in terms of diff of the outputs. See the attachments. I also copied the test logs of these 5 testcases to 2 files: before my change and after my change. This way, I was able to see if the objects passed by the test framework are different or not. Here are the Jenkins test results: {code:java} [INFO] Results: [INFO] [ERROR] Failures: [ERROR] TestRMWebServicesSchedulerActivities.testQueueSkippedBecauseOfHeadroom:1556 expected:<[Queue does not have enough headroom for inner highest-priority request]> but was:<[]> [ERROR] Errors: [ERROR] TestNodeBlacklistingOnAMFailures.testNodeBlacklistingOnAMFailureRelaxedNodeLocality:332->submitAppWithAMResourceRequests:85 ? NullPointer [ERROR] TestNodeBlacklistingOnAMFailures.testNodeBlacklistingOnAMFailureStrictNodeLocality:245->submitAppWithAMResourceRequests:85 ? NullPointer [ERROR] TestRMRestart.testAppReportNodeLabelRMRestart:524->submitApp:224 ? NullPointer [ERROR] TestRMWebServicesSchedulerActivitiesWithMultiNodesEnabled.testAppAssignContainer:267 ? TestTimedOut {code} Let's check the failures one by one: 1. TestRMWebServicesSchedulerActivities.testQueueSkippedBecauseOfHeadroom: This is a straightforward one. Original code was: {code:java} RMApp app1 = rm.submitApp(10, "app1", "user1", null, "a1a"); {code} New code is: {code:java} RMApp app1 = MockRMAppSubmitter.submit(rm, MockRMAppSubmissionData.Builder.createWithDefaults(10, rm) .withName("app1") .withUser("user1") .withAcls(null) .withQueue("b1") .build()); {code} Queue name was wrong, possibly due to copy-pasted code. 2. This is a bit more difficult. The following 2 testcases failed due to a common reason, an NPE: TestNodeBlacklistingOnAMFailures.testNodeBlacklistingOnAMFailureRelaxedNodeLocality TestNodeBlacklistingOnAMFailures.testNodeBlacklistingOnAMFailureStrictNodeLocality Since these testcases are the only ones calling TestNodeBlacklistingOnAMFailures#submitAppWithAMResourceRequests, and the NPE is easily reproducible, something had to be gone wrong with the AM resource requests. A quick debug revealed that: In SchedulerUtils#validateResourceRequest, the method receives a resReq parameter (type of ResourceRequest). We are reading resReq.getCapability(), but it was null. I checked how the capability of AM resource requests can be null and found out that in MockRMAppSubmitter#submit, there is a block of code that placed a logical negate operator wrongly: {code:java} List<ResourceRequest> amResourceRequests = data.getAmResourceRequests(); if (amResourceRequests == null || !amResourceRequests.isEmpty()) { ResourceRequest amResReq = ResourceRequest.newInstance( priority, ResourceRequest.ANY, data.getResource(), 1); amResourceRequests = Collections.singletonList(amResReq); } {code} So in this case, we don't need the negate at the second half of the if condition. 3. TestRMRestart.testAppReportNodeLabelRMRestart: Yet another NPE here :( Very similar location to the ones listed with point 2 and fails because of the same reason as well so this is also fixed. 4. TestRMWebServicesSchedulerActivitiesWithMultiNodesEnabled.testAppAssignContainer: I checked the diff of the logged objects and I can't see anything different here. The test timed out, so let's consider this as an intermittent failure for now. Attached new patch to let jenkins run again. > Replace all MockRM submit method definitions with a builder > ----------------------------------------------------------- > > Key: YARN-9052 > URL: https://issues.apache.org/jira/browse/YARN-9052 > Project: Hadoop YARN > Issue Type: Improvement > Reporter: Szilard Nemeth > Assignee: Szilard Nemeth > Priority: Minor > Attachments: > YARN-9052-004withlogs-patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt, > YARN-9052-testlogs003-justfailed.txt, > YARN-9052-testlogs003-patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt, > YARN-9052-testlogs004-justfailed.txt, YARN-9052.001.patch, > YARN-9052.002.patch, YARN-9052.003.patch, YARN-9052.004.patch, > YARN-9052.004.withlogs.patch, YARN-9052.005.patch, > YARN-9052.testlogs.002.patch, YARN-9052.testlogs.002.patch, > YARN-9052.testlogs.003.patch, YARN-9052.testlogs.patch > > > MockRM has 31 definitions of submitApp, most of them having more than > acceptable number of parameters, ranging from 2 to even 22 parameters, which > makes the code completely unreadable. > On top of unreadability, it's very hard to follow what RmApp will be produced > for tests as they often pass a lot of empty / null values as parameters. -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org