[ 
https://issues.apache.org/jira/browse/YARN-7952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16387707#comment-16387707
 ] 

Xuan Gong edited comment on YARN-7952 at 3/6/18 12:52 PM:
----------------------------------------------------------

Thanks for the review. [~leftnoteasy]
{quote}It looks like pullLock should be a readLock and updateLocker should be a 
writeLock since pull doesn't change internal data, correct?
{quote}
No, it is not. Let me try to explain this. For one application in one NM, we 
only have one AppLogAggregator which would do the log aggregation which means 
it would call updateLogAggregationStatus. And we should allow different 
AppLogAggregator to update their own log aggregation status. In this case, I 
think that the readLock can be used here.

But for the pull call, we need to make sure when we do the pull, there is no 
update. (or when we do update, we should block pull)

bq.  For overall workflow of this class, IMO it should be:
 This is right.(a. When log aggregation starts, application will be added to 
{{NMLogAggregationStatusTracker}} with timestamp.)
 This is not correct (b. When RM acknowledges application finished, application 
will be removed from the {{NMLogAggregationStatusTracker) Because as I 
described in option One, currently, RM will not figure out the log aggregation 
status automatically by itself until some CLI or web API call happens. So, RM 
will not know whether the status is final immediately that is why NM would 
check and remove it periodically without RM.}}

This is correct.(c. If configured timeout reaches, application will be removed 
from the {{NMLogAggregationStatusTracker}}.)

 
{quote} Since the currentTime always almost equal to updateTime. The if 
statement should be false, correct? And I think if application is null in NM 
side, we should not add it to the track, correct?
{quote}
This if is mainly for handling the mis-order log aggregation update call. So, 
first of all, the currentTime should be always larger than updateTime. If this 
is the mis-order log aggregation update call, and it is out of timeOut period, 
we should ignore it.

 
{quote}  I'm not sure if we should ever change the {{lastModifiedTime}}. IIUC, 
the RM will timeout log aggregation status after configured timeout. What's the 
purpose of updating the {{lastModifiedTime}}? Will this cause OOM issue 
potentially?
{quote}
 

This check is also used to handle the mis-order log aggregation update call. We 
should always handle the latest update call. If the call is from previous, we 
need to ignore it.

 
{quote}  the {{long updateTime}} parameter is not necessary, can always use 
System.currentTime instead.
{quote}
 

We need, See my previous comment.

 

For other re-name comment, i will handle them together after your next comment.

 

  


was (Author: xgong):
Thanks for the review. [~leftnoteasy]

bq. It looks like pullLock should be a readLock and updateLocker should be a 
writeLock since pull doesn't change internal data, correct?

No, it is not. Let me try to explain this. For one application in one NM, we 
only have one AppLogAggregator which would do the log aggregation which means 
it would call updateLogAggregationStatus. And we should allow different 
AppLogAggregator to update their own log aggregation status. In this case, I 
think that the readLock can be used here.

But for the pull call, we need to make sure when we do the pull, there is no 
update. (or when we do update, we should block pull)

bq .For overall workflow of this class, IMO it should be:
This is right.(a. When log aggregation starts, application will be added to 
{{NMLogAggregationStatusTracker}} with timestamp.)
This is not correct (b. When RM acknowledges application finished, application 
will be removed from the {{NMLogAggregationStatusTracker) Because as I 
described in option One, currently, RM will not figure out the log aggregation 
status automatically by itself until some CLI or web API call happens. So, RM 
will not know whether the status is final immediately that is why NM would 
check and remove it periodically without RM.}}

This is correct.(c. If configured timeout reaches, application will be removed 
from the {{NMLogAggregationStatusTracker}}.)

 

bq. Since the currentTime always almost equal to updateTime. The if statement 
should be false, correct? And I think if application is null in NM side, we 
should not add it to the track, correct?

This if is mainly for handling the mis-order log aggregation update call. So, 
first of all, the currentTime should be always larger than updateTime. If this 
is the mis-order log aggregation update call, and it is out of timeOut period, 
we should ignore it.

 

bq.  I'm not sure if we should ever change the {{lastModifiedTime}}. IIUC, the 
RM will timeout log aggregation status after configured timeout. What's the 
purpose of updating the {{lastModifiedTime}}? Will this cause OOM issue 
potentially?

 

This check is also used to handle the mis-order log aggregation update call. We 
should always handle the latest update call. If the call is from previous, we 
need to ignore it.

 

bq.  the {{long updateTime}} parameter is not necessary, can always use 
System.currentTime instead.

 

We need, See my previous comment.

 

For other re-name comment, i will handle them together after your next comment.

 

  

> Find a way to persist the log aggregation status
> ------------------------------------------------
>
>                 Key: YARN-7952
>                 URL: https://issues.apache.org/jira/browse/YARN-7952
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Xuan Gong
>            Assignee: Xuan Gong
>            Priority: Major
>         Attachments: YARN-7952-poc.patch, YARN-7952.1.patch, YARN-7952.2.patch
>
>
> In MAPREDUCE-6415, we have created a CLI to har the aggregated logs, and In 
> YARN-4946: RM should write out Aggregated Log Completion file flag next to 
> logs, we have a discussion on how we can get the log aggregation status: make 
> a client call to RM or get it directly from the Distributed file system(HDFS).
> No matter which approach we would like to choose, we need to figure out a way 
> to persist the log aggregation status first. This ticket is used to track the 
> working progress for this purpose.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

Reply via email to