Joep Rottinghuis commented on YARN-3908:

bq. In fact, I'm wondering if we should put info and events into a separate
column family, like what we did for configs/metrics?

In principle, we should keep everything in the same column family (fewer store
files) unless:
a) The items that we store require a different TTL, compression, etc. This is
the case for metrics, where we need a separate TTL.
b) The columns are rather significant in size, and many queries will skip them
(in particular, they are not used in push-down predicates, i.e. column value
filters, etc.). This is the case for configuration. If we have many queries
that retrieve only info fields and skip configs, then scanning just the info
column family has the benefit of not needing to access the config store files.

Otherwise, a separate column family just results in more store files and
doesn't really gain us anything.
Given the current code setup, switching column families is almost trivial, and
since there is no functional difference, I'd say let's not even try to optimize
this further until we have much more code in place.
Then we can run large batches of historical job history files and other inputs
(perhaps porting data from ATS v1) and see whether there is any real benefit.
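The two criteria for a separate column family, (a) different family-level
settings such as TTL or compression, and (b) bulky columns that most queries
skip and that are not used in push-down filters, can be sketched as a small
decision helper. This is a hypothetical, stdlib-only illustration (Java 16+ for
records): `ColumnFamilyAdvisor`, `ColumnGroup` and `needsSeparateFamily` are
made-up names, not part of HBase or the timeline service code. Real family
settings would be declared through HBase's own column family descriptors.

```java
import java.util.Objects;

/**
 * Hypothetical helper encoding the two rules of thumb for deciding whether a
 * group of columns deserves its own column family. Nothing here is an HBase API.
 */
public final class ColumnFamilyAdvisor {

    /** Storage and access traits of a candidate group of columns (illustrative model). */
    public record ColumnGroup(
            String name,
            long ttlSeconds,              // time-to-live the data needs
            String compression,           // desired compression codec name
            boolean largeInSize,          // are the values significant in size?
            boolean skippedByMostQueries, // do most reads not need these columns?
            boolean usedInFilters) {      // used in push-down predicates (e.g. column value filters)?
    }

    private ColumnFamilyAdvisor() {
    }

    /**
     * Returns true if {@code candidate} should get a column family separate from
     * the one holding {@code defaultGroup}; false if it should share that family.
     */
    public static boolean needsSeparateFamily(ColumnGroup defaultGroup, ColumnGroup candidate) {
        Objects.requireNonNull(defaultGroup, "defaultGroup");
        Objects.requireNonNull(candidate, "candidate");
        // Rule (a): family-level settings such as TTL or compression differ.
        boolean differentSettings = candidate.ttlSeconds() != defaultGroup.ttlSeconds()
                || !candidate.compression().equals(defaultGroup.compression());
        // Rule (b): bulky columns that most queries skip and that no filter reads,
        // so scans of the default family can avoid touching their store files.
        boolean bulkyAndSkipped = candidate.largeInSize()
                && candidate.skippedByMostQueries()
                && !candidate.usedInFilters();
        return differentSettings || bulkyAndSkipped;
    }
}
```

With an info group as the default, a metrics group with a shorter TTL, or a
large config group that info-only queries skip, yields true; an events group
with the same settings and access pattern as info yields false, i.e. it can stay
in the info family.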

The other reason not to do premature optimization is that I'm still thinking of
adding a few more perf tweaks. Those would also just be performance
optimizations, with no functional difference, so they are not a priority now
either. We should look at tuning all those things much later, together, in a
coherent way. Additional settings that we need to test are RPC compression and
the encoding and/or compression of the store files.

In short, let's focus on completing functionality and then tinker with these 
settings later. 

> Bugs in HBaseTimelineWriterImpl
> -------------------------------
>                 Key: YARN-3908
>                 URL: https://issues.apache.org/jira/browse/YARN-3908
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Zhijie Shen
>            Assignee: Vrushali C
>         Attachments: YARN-3908-YARN-2928.001.patch, 
> YARN-3908-YARN-2928.002.patch, YARN-3908-YARN-2928.003.patch
>
> 1. In HBaseTimelineWriterImpl, the info column family contains the basic
> fields of a timeline entity plus events. However, the entity#info map is not
> stored at all.
> 2. event#timestamp is also not persisted.

This message was sent by Atlassian JIRA
