[ 
https://issues.apache.org/jira/browse/YARN-10304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17126732#comment-17126732
 ] 

Adam Antal commented on YARN-10304:
-----------------------------------

Thanks for the draft [~gandras]!

If we want a general remote log directory handling class, then we should build 
it in a way to handle all log-related path properties like appId, bucket, 
nodeId. Also it does not make sense to create a n+1 class which interacts with 
these paths, so this class should be used everywhere throughout the code where 
log aggregation is mentioned. The scope of such kind of refactor is massive, 
and is much bigger than this issue necessitates. However this refactor would be 
quite useful though, as the path building is not centralized and can be 
incoherent in different classes: so the benefit is clear, just does not fit 
into this issue. 

For this particular issue we only need the first part of the path, so I think 
it's a better decision to refrain from such refactor. That being said you can 
still file a refactor issue that handles this problem, and I'd be more than 
happy to see your contribution there.

I took a closer look to this patch and have some suggestion to the taken 
approach:
- In {{RemoteLogPathFactory}} (especially in {{initialize()}}, 
{{loadRootLogDirPath()}}, {{loadRootLogDirSuffixPath()}}) you have partially 
recreated the logic from {{LogAggregationFileControllerFactory}} (extracting 
and instantiating controllers) and {{LogAggregationFileController}} (parsing 
configurations and building root log directory). I think this is dangerous: if 
the file controller path is changed they might become inconsistent. I'd like to 
propose the following: use the {{factoryInstance}} in {{LogServlet}} to obtain 
the controllers, and call the {{getRemoteRootLogDir()}} and 
{{getRemoteRootLogDirSuffix()}} function to work with it. In this way you use 
their functions and is not going to be out of sync. You can keep the 
{{RemoteLogPathFactory}} object to handle the work or you can do it directly in 
{{LogServlet}} as well.
- Also {{LogAggregationUtils#getRemoteLogSuffixedDir()}} does exactly what you 
need (construct the path from the remote root log dir, user and the suffix). We 
can use that, and leave the refactor of that function to YARN-10030
- One application of the endpoint would be to let YARN handle the composition 
of the path, if we search for the aggregated logs of another user. So 
{{UserGroupInformation .getCurrentUser()}} might not always be the user we want 
to be querying. A straightforward option is to provide a query parameter to 
that endpoint. I'd like to go that way.
- Please move the {{RemoteLogPathResult}} and {{RemoteLogPath}} DAO objects to 
their own classes from {{RemoteLogPathFactory}}. I think they should be put to 
org.apache.hadoop.yarn.webapp.dao or .log inside hadoop-yarn-common.

> Create an endpoint for remote application log directory path query
> ------------------------------------------------------------------
>
>                 Key: YARN-10304
>                 URL: https://issues.apache.org/jira/browse/YARN-10304
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Andras Gyori
>            Assignee: Andras Gyori
>            Priority: Minor
>         Attachments: YARN-10304.001.patch
>
>
> The logic of the aggregated log directory path determination (currently based 
> on configuration) is scattered around the codebase and duplicated multiple 
> times. By providing a separate class for creating the path for a specific 
> user, it allows for an abstraction over this logic. This could be used in 
> place of the previously duplicated logic, moreover, we could provide an 
> endpoint to query this path.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to