[
https://issues.apache.org/jira/browse/YARN-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14630275#comment-14630275
]
Joep Rottinghuis commented on YARN-3908:
----------------------------------------
bq. In fact, I'm wondering if we should put info and events into a separate
column family like what we did for configs/metrics?
In principle we should keep everything in the same column family (fewer store
files) unless:
a) The items that we store require a different TTL, compression, etc. This is
the case for metrics where we need a separate TTL.
b) The columns are rather significant in size, and in many queries they'll be
skipped (and specifically not used in push-down predicates, i.e. column value
filters etc.). This is the case for configuration. If we have many queries to
just retrieve info fields and we skip configs in these, then iterating over
just the rows in the info column family will have a benefit of not needing to
access the config store files.
Otherwise a separate column family just results in more store files and doesn't
really gain us anything.
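To make the two criteria above concrete, here is a rough sketch in plain Java. It is only an illustration of the decision rule: the class name, method name and boolean flags are all hypothetical, it does not use the HBase API, and it is not part of the patch.

```java
// Hypothetical helper that encodes criteria (a) and (b); illustration only.
final class ColumnFamilyChoice {
    private ColumnFamilyChoice() {}

    /**
     * Returns true when a group of columns would justify its own column family.
     *
     * @param differentStorageSettings criterion (a): needs its own TTL, compression, etc.
     * @param largeColumns             criterion (b): the columns are significant in size
     * @param skippedByMostQueries     criterion (b): many queries do not read them
     * @param usedInValueFilters       criterion (b): they appear in push-down predicates
     */
    static boolean needsOwnFamily(boolean differentStorageSettings,
                                  boolean largeColumns,
                                  boolean skippedByMostQueries,
                                  boolean usedInValueFilters) {
        // Large columns only pay off in a separate family if queries can
        // really avoid touching that family's store files.
        boolean rarelyReadBulk = largeColumns && skippedByMostQueries && !usedInValueFilters;
        return differentStorageSettings || rarelyReadBulk;
    }

    public static void main(String[] args) {
        // Flag values below are assumptions chosen to mirror the cases named above.
        System.out.println("metrics (separate TTL): " + needsOwnFamily(true, false, false, false));
        System.out.println("configs (large, skipped): " + needsOwnFamily(false, true, true, false));
        System.out.println("events (neither): " + needsOwnFamily(false, false, false, false));
    }
}
```

With these assumed flags, metrics and configs qualify for their own family and events do not, which matches the reasoning above.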
Given the current code setup, switching column family is almost trivial, and
since there are no functionality differences, I'd say let's not even try
to further optimize this until we have way more code in place.
Then we can run large batches of historical job history files and other inputs
(perhaps porting data from ATS v1) and then we can see the potential benefit or
downside.
The other reason to not do premature optimization is that I'm still thinking of
adding a few more perf tweaks. Those would also just be performance
optimizations, and not any functional difference, so also not a priority now.
We should look at tuning all those things much later and together in a coherent
way. Additional settings that we need to test are RPC compression, encoding of
the store files and/or compression of the same.
In short, let's focus on completing functionality and then tinker with these
settings later.
> Bugs in HBaseTimelineWriterImpl
> -------------------------------
>
> Key: YARN-3908
> URL: https://issues.apache.org/jira/browse/YARN-3908
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: timelineserver
> Reporter: Zhijie Shen
> Assignee: Vrushali C
> Attachments: YARN-3908-YARN-2928.001.patch,
> YARN-3908-YARN-2928.002.patch, YARN-3908-YARN-2928.003.patch
>
>
> 1. In HBaseTimelineWriterImpl, the info column family contains the basic
> fields of a timeline entity plus events. However, entity#info map is not
> stored at all.
> 2. event#timestamp is also not persisted.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)