[
https://issues.apache.org/jira/browse/YARN-10421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17252051#comment-17252051
]
Szilard Nemeth edited comment on YARN-10421 at 12/19/20, 12:13 AM:
-------------------------------------------------------------------
Thanks [~bteke] for working on this.
{quote}1-2-7-9-11-12-15: corrected them.
{quote}
Good, the fixes are okay.
{quote}3. Other similar methods of this class use the same message so I didn't
want to introduce something different.
{quote}
Okay, then it's fine like this.
{quote}I added some TODO comments. This class is used for testing purposes and
the REST testing related tasks are part of YARN-10434.
{quote}
Okay. In general TODOs are not allowed AFAIK, but as you referred to a follow
up YARN-10434, it's okay.
Can you add a reference to YARN-10434 in the code as well. Just add it to the
TODO message.
{quote}6. Thanks, I would have missed this document. As the overall
implementation is subject to changes I'll create a new Jira for the
documentation updates.
{quote}
Thanks.
{quote}8. This item is to be changed in the tests, and in YARN-10432 it will be
replaced by a configuration entry. Should I uppercase it regardless?
{quote}
No, I think in this case it is fine.
{quote}10. It is currently a placeholder, in YARN-10422 I'll bundle the script
and after that I'll update the location.
{quote}
Okay, got it.
{quote}13-14. I refactored the tests.
{quote}
Let me restart the numbering here:
1. I noticed one more thing:
In RMWebServices#getCommonIssueList and RMWebServices#getCommonIssueData: Both
catch blocks catch all the Exceptions, but the newly created WebAppException
doesn't have the expception (variable 'e') passed as the 2nd parameter (cause).
This is bad, because the cause exception won't be propagated up and won't get
printed properly as only the message is printed here.
2. Looking at FederationInterceptorREST once again:
There are two new methods throwing NotImplementedException with message: "Code
is not implemented". Can you add a YARN jira as a reference for the planned
implementor of these methods?
3. Nit:
org.apache.hadoop.yarn.server.resourcemanager.DiagnosticsServiceTest#testParseIssueTypeValidCases
For me, this would be more readable if the line variable would be inlined for
all usages.
4. Please add assertion messages for each assertion in method:
org.apache.hadoop.yarn.server.resourcemanager.DiagnosticsServiceTest#assertIssueEquality
was (Author: snemeth):
Thanks [~bteke] for working on this.
{quote}
1-2-7-9-11-12-15: corrected them.
{quote}
Good, the fixes are okay.
{quote}
3. Other similar methods of this class use the same message so I didn't want to
introduce something different.
{quote}
Okay, then it's fine like this.
{quote}
I added some TODO comments. This class is used for testing purposes and the
REST testing related tasks are part of YARN-10434.
{quote}
Okay. In general TODOs are not allowed AFAIK, but as you referred to a follow
up YARN-10434, it's okay.
Can you add a reference to YARN-10434 in the code as well. Just add it to the
TODO message.
{quote}
6. Thanks, I would have missed this document. As the overall implementation is
subject to changes I'll create a new Jira for the documentation updates.
{quote}
Thanks.
{quote}
8. This item is to be changed in the tests, and in YARN-10432 it will be
replaced by a configuration entry. Should I uppercase it regardless?
{quote}
No, I think in this cas it is fine.
{quote}
10. It is currently a placeholder, in YARN-10422 I'll bundle the script and
after that I'll update the location.
{quote}
Okay, got it.
{quote}
13-14. I refactored the tests.
{quote}
Let me restart the numbering here:
1. I noticed one more thing:
In RMWebServices#getCommonIssueList and RMWebServices#getCommonIssueData: Both
catch blocks catch all the Exceptions, but the newly created WebAppException
doesn't have the expception (variable 'e') passed as the 2nd parameter (cause).
This is bad, because the cause exception won't be propagated up and won't get
printed properly as only the message is printed here.
2. Looking at FederationInterceptorREST once again:
There are two new methods throwing NotImplementedException with message: "Code
is not implemented". Can you add a YARN jira as a reference for the planned
implementor of these methods?
3. Nit:
org.apache.hadoop.yarn.server.resourcemanager.DiagnosticsServiceTest#testParseIssueTypeValidCases
For me, this would be more readable if the line variable would be inlined for
all usages.
4. Please add assertion messages for each assertion in method:
org.apache.hadoop.yarn.server.resourcemanager.DiagnosticsServiceTest#assertIssueEquality
> Create YarnDiagnosticsService to serve diagnostic queries
> ----------------------------------------------------------
>
> Key: YARN-10421
> URL: https://issues.apache.org/jira/browse/YARN-10421
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Benjamin Teke
> Assignee: Benjamin Teke
> Priority: Major
> Attachments: YARN-10421.001.patch, YARN-10421.002.patch,
> YARN-10421.003.patch, YARN-10421.004.patch
>
>
> YarnDiagnosticsServlet should run inside ResourceManager Daemon. The servlet
> forks a separate process, which executes a shell/Python/etc script. Based on
> the use-cases listed below the script collects information, bundles it and
> sends it to UI2. The diagnostic options are the following:
> # Application hanging:
> ** Application logs
> ** Find the hanging container and get multiple Jstacks
> ** ResourceManager logs during job lifecycle
> ** NodeManager logs from NodeManager where the hanging containers of the
> jobs ran
> ** Job configuration from MapReduce HistoryServer, Spark HistoryServer, Tez
> History URL
> # Application failed:
> ** Application logs
> ** ResourceManager logs during job lifecycle.
> ** NodeManager logs from NodeManager where the hanging containers of the
> jobs ran
> ** Job Configuration from MapReduce HistoryServer, Spark HistoryServer, Tez
> History URL.
> ** Job related metrics like container, attempts.
> # Scheduler related issue:
> ** ResourceManager Scheduler logs with DEBUG enabled for 2 minutes.
> ** Multiple Jstacks of ResourceManager
> ** YARN and Scheduler Configuration
> ** Cluster Scheduler API _/ws/v1/cluster/scheduler_ and Cluster Nodes API
> _/ws/v1/cluster/nodes response_
> ** Scheduler Activities _/ws/v1/cluster/scheduler/bulkactivities_ response
> (YARN-10319)
> # ResourceManager / NodeManager daemon fails to start:
> ** ResourceManager and NodeManager out and log file
> ** YARN and Scheduler Configuration
> Two new endpoints should be added to the RM web service: one for listing the
> available diagnostic options (_/common-issue/list_), and one for calling a
> selected option with the user provided parameters (_/common-issue/collect_).
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]