[ 
https://issues.apache.org/jira/browse/YARN-11644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17806806#comment-17806806
 ] 

ASF GitHub Bot commented on YARN-11644:
---------------------------------------

laysfire opened a new pull request, #6452:
URL: https://github.com/apache/hadoop/pull/6452

   …cation finished
   
   
   ### Description of PR
   JIRA: [YARN-11644](https://issues.apache.org/jira/browse/YARN-11644) 
LogAggregationService can't upload log in time when application finished.
   The current implementation of AppLogAggregatorImpl spins in an empty while 
loop until the application finishes, holding its worker thread the whole time.
   When too many applications are running, these loops occupy every thread in 
the pool and block later applications from uploading their logs.
   This PR makes AppLogAggregatorImpl return when the application has not 
finished and resubmit itself to the thread pool. This fixes the blocking issue 
and also allows the thread pool size (default: 100) to be reduced.
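
   A minimal sketch of the return-and-resubmit pattern described above, not the code in this PR. The class and method names are hypothetical, and the use of a `ScheduledExecutorService` with a 10 ms re-check delay is an assumption made here to keep the sketch from busy-spinning through the queue; the PR only states that the aggregator resubmits itself to the thread pool.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

final class ResubmittingAggregatorSketch {

  /** Hypothetical stand-in for an aggregator task; not the real AppLogAggregatorImpl. */
  static final class Aggregator implements Runnable {
    private final ScheduledExecutorService pool;
    private final AtomicBoolean appFinished;
    private final CountDownLatch uploaded;

    Aggregator(ScheduledExecutorService pool, AtomicBoolean appFinished,
        CountDownLatch uploaded) {
      this.pool = pool;
      this.appFinished = appFinished;
      this.uploaded = uploaded;
    }

    @Override
    public void run() {
      if (!appFinished.get()) {
        // Application still running: queue another check and return,
        // which frees this worker thread for other aggregators.
        // The 10 ms delay is an assumption of this sketch.
        pool.schedule(this, 10, TimeUnit.MILLISECONDS);
        return;
      }
      // Application finished: the log upload would happen here.
      uploaded.countDown();
    }
  }

  /**
   * Runs one unfinished and one finished application on a single worker
   * thread. Returns true if the finished application "uploads" while the
   * other is still running, and the other uploads once it finishes.
   */
  static boolean shortAppUploadsWhileLongAppRuns() {
    ScheduledExecutorService pool = Executors.newScheduledThreadPool(1);
    AtomicBoolean longAppFinished = new AtomicBoolean(false);
    CountDownLatch longUploaded = new CountDownLatch(1);
    CountDownLatch shortUploaded = new CountDownLatch(1);
    try {
      pool.execute(new Aggregator(pool, longAppFinished, longUploaded));
      pool.execute(new Aggregator(pool, new AtomicBoolean(true), shortUploaded));
      boolean shortWentFirst = shortUploaded.await(5, TimeUnit.SECONDS)
          && longUploaded.getCount() == 1;
      longAppFinished.set(true);
      return shortWentFirst && longUploaded.await(5, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    } finally {
      pool.shutdownNow();
    }
  }

  public static void main(String[] args) {
    System.out.println("short app uploaded while long app was running: "
        + shortAppUploadsWhileLongAppRuns());
  }
}
```

   Because an unfinished aggregator never holds a worker for longer than one state check, even a single thread can serve both applications here, which is the reasoning behind the smaller pool size mentioned above.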
   
   ### How was this patch tested?
   
   




> LogAggregationService can't upload log in time when application finished
> ------------------------------------------------------------------------
>
>                 Key: YARN-11644
>                 URL: https://issues.apache.org/jira/browse/YARN-11644
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: log-aggregation
>    Affects Versions: 3.3.6
>            Reporter: Xie YiFan
>            Assignee: Xie YiFan
>            Priority: Minor
>         Attachments: image-2024-01-10-11-03-57-553.png
>
>
> LogAggregationService is responsible for uploading logs to HDFS. It uses a 
> thread pool to execute upload tasks.
> The log upload workflow is as follows:
>  # The NM constructs an Application object when the first container of an 
> application launches, then notifies LogAggregationService to initialize an 
> AppLogAggregatorImpl.
>  # LogAggregationService submits the AppLogAggregatorImpl to the task queue.
>  # An idle worker of the thread pool pulls the AppLogAggregatorImpl from the 
> task queue.
>  # The AppLogAggregatorImpl loops, checking the application state, and 
> uploads the logs once the application has finished.
> Suppose the following scenario:
>  * LogAggregationService initializes its thread pool with 4 threads.
>  * 4 long-running applications start on this NM, so all threads are occupied 
> by aggregators.
>  * The next short application starts on this NM and finishes quickly, but no 
> thread is idle to upload its logs.
> As a result, later applications have to wait for the earlier applications 
> to finish before uploading their logs.
> !image-2024-01-10-11-03-57-553.png|width=599,height=195!
> h4. Solution
> Change the spinning behavior of AppLogAggregatorImpl. If the application has 
> not finished, the aggregator returns, yielding the current thread, and 
> resubmits itself to the executor service. LogAggregationService can then 
> cycle through the task queue, and the logs of finished applications can be 
> uploaded immediately.
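
The starvation in the scenario above can be reproduced outside YARN with a plain fixed-size pool. The following is a sketch under stated assumptions: the class and method names are hypothetical, and a `CountDownLatch` stands in for the aggregator's wait-until-finished loop so that the example does not burn CPU. It only illustrates why a queued task for a finished application cannot run while all 4 workers are held.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class SpinningAggregatorStarvationSketch {

  /**
   * Returns {uploadedWhileBlocked, uploadedAfterRelease} for a short
   * application whose upload task is queued behind 4 long-running ones.
   */
  static boolean[] observe() {
    ExecutorService pool = Executors.newFixedThreadPool(4);
    CountDownLatch longAppsFinished = new CountDownLatch(1);
    CountDownLatch workersBusy = new CountDownLatch(4);
    CountDownLatch shortUploaded = new CountDownLatch(1);
    try {
      // 4 long-running applications: each task holds its worker thread
      // until the application finishes (latch used instead of a spin loop).
      for (int i = 0; i < 4; i++) {
        pool.execute(() -> {
          workersBusy.countDown();
          try {
            longAppsFinished.await();
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
          }
        });
      }
      workersBusy.await();

      // The short application has already finished, but its upload task
      // can only sit in the queue because no worker is idle.
      pool.execute(shortUploaded::countDown);
      boolean uploadedWhileBlocked =
          shortUploaded.await(200, TimeUnit.MILLISECONDS);

      // Once the long applications finish, a worker frees up and the
      // queued upload finally runs.
      longAppsFinished.countDown();
      boolean uploadedAfterRelease =
          shortUploaded.await(5, TimeUnit.SECONDS);

      return new boolean[] {uploadedWhileBlocked, uploadedAfterRelease};
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return new boolean[] {false, false};
    } finally {
      pool.shutdownNow();
    }
  }

  public static void main(String[] args) {
    boolean[] result = observe();
    System.out.println("uploaded while workers were held: " + result[0]);
    System.out.println("uploaded after long apps finished: " + result[1]);
  }
}
```

The first flag stays false for as long as the 4 workers are held, regardless of how long the caller waits; the 200 ms window is just an arbitrary observation period chosen for this sketch.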



