Ming Ma commented on YARN-2082:

Folks, thanks for the feedbacks and other jiras; quite useful. We have been 
discussing internally how to mitigate log aggregation's impact on NN for some 
time. Here are more context and comments.

1. We discussed Vinod's suggestion of post processing before. The issue is that 
the write RPC hit on NN has already happened. Even more, post processing 
introduces more hits on NN.

2. Agree that HDFS needs to be more scalable. Some improvements have been done; 
some are still being worked on. My opinion is we should do both if possible, 
improve HDFS and improve how applications use HDFS.

3. Reducing the impact on NN in large cluster is our primary motivation, 
similar to what Jason mentioned in YARN-1440. We use the approach mentioned by 
[~jira.shegalov], [~ctrezzo] in YARN-221 to mitigate the issue.

4. YARN-1440 also suggested making it pluggable, but it seems the primary 
motivation is to make it easy for other tools to integrate with yarn logs. If 
that is the case, we have two requirements to make it make log aggregation 
pluggable, easy to integrate with other tools and reduce pressure on NN.

5. We discussed writing logs to key-value store before. At that point, we 
didn't go with that approach as it introduces yarn depend on external component 
like HBase. Based on recent discussion with [~jira.shegalov] and [~ctrezzo], it 
sounds like a reasonable approach.  a) timeline store has dependency on HBase 
and b) the size of logs is small and suitable for HBase scenario.

6. Regarding Zhijie's suggestion of using timeline store, that sounds like an 
interesting idea, if timeline store is highly available.

7. Regarding Steve' comment for the long running job support. It wasn't our 
primary goal; just want to make sure if we do end up changing log aggregation, 
the framework needs to support that scenario as well. If there is long running 
container and we rotate the logs, is there a plan to aggregate them before the 
container finishes?

> Support for alternative log aggregation mechanism
> -------------------------------------------------
>                 Key: YARN-2082
>                 URL: https://issues.apache.org/jira/browse/YARN-2082
>             Project: Hadoop YARN
>          Issue Type: New Feature
>            Reporter: Ming Ma
> I will post a more detailed design later. Here is the brief summary and would 
> like to get early feedback.
> Problem Statement:
> Current implementation of log aggregation create one HDFS file for each 
> {application, nodemanager }. These files are relative small, in the range of 
> 1-2 MB. In a large cluster with lots of application and many nodemanagers, it 
> ends up creating lots of small files in HDFS. This creates pressure on HDFS 
> NN on the following ways.
> 1. It increases NN Memory size. It is mitigated by having history server 
> deletes old log files in HDFS.
> 2. Runtime RPC hit on HDFS. Each log aggregation file introduced several NN 
> RPCs such as create, getAdditionalBlock, complete, rename. When the cluster 
> is busy, such RPC hit has impact on NN performance.
> In addition, to support non-MR applications on YARN, we might need to support 
> aggregation for long running applications.
> Design choices:
> 1. Don't aggregate all the logs, as in YARN-221.
> 2. Create a dedicated HDFS namespace used only for log aggregation.
> 3. Write logs to some key-value store like HBase. HBase's RPC hit on NN will 
> be much less.
> 4. Decentralize the application level log aggregation to NMs. All logs for a 
> given application are aggregated first by a dedicated NM before it is pushed 
> to HDFS.
> 5. Have NM aggregate logs on a regular basis; each of these log files will 
> have data from different applications and there needs to be some index for 
> quick lookup.
> Proposal:
> 1. Make yarn log aggregation pluggable for both read and write path. Note 
> that Hadoop FileSystem provides an abstraction and we could ask alternative 
> log aggregator implement compatable FileSystem, but that seems to an overkill.
> 2. Provide a log aggregation plugin that write to HBase. The scheme design 
> needs to support efficient read on a per application as well as per 
> application+container basis; in addition, it shouldn't create hotspot in a 
> cluster where certain users might create more jobs than others. For example, 
> we can use hash($user+$applicationId} + containerid as the row key.

This message was sent by Atlassian JIRA

Reply via email to