[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496538#comment-14496538 ]
Junping Du commented on YARN-3411:
----------------------------------

Thanks [~vrushalic] for delivering the proposal and the POC patch; excellent job! Some quick comments from walking through the proposal:

bq. Entity Table - primary key components - putting the UserID first helps to distribute writes across the regions in the hbase cluster. Pros: avoids single region hotspotting. Cons: connections would be open to several region servers during writes from per node ATS.

It looks like we are trying to get rid of region server hotspotting, and I agree this design helps. However, it is still possible for one user to submit far more applications than anyone else, in which case the region hotspot issue will still appear, won't it? I think the more general way to solve this is to salt the keys with a prefix. Thoughts?

bq. Entity Table - column families - config needs to be stored as key value, not as a blob to enable efficient key based querying based on config param name. storing it in a separate column family helps to avoid scanning over config while reading metrics and vice versa

+1. This leverages the strength of a columnar database. We should also avoid storing default values for config keys, although that sounds challenging if the TimelineClient only has a Configuration object.

bq. Entity Table - metrics are written to with an hbase cell timestamp set to top of the minute or top of the 5 minute interval or whatever is decided. This helps in timeseries storage and retrieval in case of querying at the entity level.

Can we also let the TimelineCollector aggregate metrics over a similar time interval, rather than sending every metric to HBase/Phoenix as soon as it is received? That may ease some pressure on the backend.

bq. Flow by application id table

I still think we should figure out some way to store application attempt info. The typical use case: for some reason (a bug, or a hardware capability issue), some flow's or application's AM may fail more often than others'. Keeping this info can help us track down such issues, can't it?

bq. flow summary daily table (aggregation table managed by Phoenix) - could be triggered via coprocessor with each put in flow table or a cron run once per day to aggregate for yesterday (with catchup functionality in case of backlog etc)

Triggering on each put in the flow table sounds a little expensive, especially when put activity is very frequent. Maybe we should use some batch mode here? In addition, I think we can leverage the per node TimelineCollector to do some first level aggregation, which would help relieve the workload on the backend.

A few rough sketches of these ideas (key salting, per-key config columns with interval-aligned cell timestamps, collector-side aggregation, and batched writes) follow below.
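To illustrate the salting idea, here is a minimal sketch (my own illustration, not something from the proposal) that prefixes the natural row key with a one-byte hash-derived salt; the bucket count and the userid-first key layout below are made up for the example:

{code:java}
import org.apache.hadoop.hbase.util.Bytes;

public class RowKeySalter {
  // Number of salt buckets; an assumption for this sketch. It should be in
  // the same ballpark as the number of regions the table is split into.
  private static final int SALT_BUCKETS = 16;

  /**
   * Prefixes the natural row key (e.g. userid!clusterid!flowid!...) with a
   * one-byte salt derived from a hash of the whole key, so that even a single
   * very active user's writes are spread across up to SALT_BUCKETS regions.
   * The cost is on the read side: scans have to fan out across all buckets.
   */
  public static byte[] salt(String naturalKey) {
    byte salt = (byte) ((naturalKey.hashCode() & Integer.MAX_VALUE) % SALT_BUCKETS);
    return Bytes.add(new byte[] { salt }, Bytes.toBytes(naturalKey));
  }

  public static void main(String[] args) {
    System.out.println(Bytes.toStringBinary(
        salt("user1!cluster1!flow_wordcount!1428950000000!app_001")));
  }
}
{code}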
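To make the column family point concrete, a minimal sketch of one entity write, assuming hypothetical "c" (config) and "m" (metrics) families and a placeholder table name "timeline.entity"; each config param becomes its own cell, and the metric cell timestamp is rounded down to a 5-minute boundary:

{code:java}
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class EntityWriteSketch {
  private static final byte[] CONFIG_FAMILY = Bytes.toBytes("c"); // config cells only
  private static final byte[] METRIC_FAMILY = Bytes.toBytes("m"); // metric cells only
  private static final long INTERVAL_MS = 5 * 60 * 1000L;         // 5-minute buckets

  public static void main(String[] args) throws IOException {
    // Non-default config params for the application; filtering out defaults
    // is the hard part when only a full Configuration object is available.
    Map<String, String> configs = new HashMap<>();
    configs.put("mapreduce.map.memory.mb", "2048");
    configs.put("mapreduce.job.queuename", "default");

    try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
         Table table = conn.getTable(TableName.valueOf("timeline.entity"))) {

      Put put = new Put(Bytes.toBytes("user1!cluster1!flow!run1!app_1!entity"));

      // Config: one cell per parameter (qualifier = param name), so readers
      // can fetch or filter by config key without deserializing a blob, and
      // metric scans never have to touch the config family.
      for (Map.Entry<String, String> e : configs.entrySet()) {
        put.addColumn(CONFIG_FAMILY, Bytes.toBytes(e.getKey()),
            Bytes.toBytes(e.getValue()));
      }

      // Metric: cell timestamp rounded down to the top of the 5-minute
      // interval, so successive cell versions form a coarse time series.
      long alignedTs = (System.currentTimeMillis() / INTERVAL_MS) * INTERVAL_MS;
      put.addColumn(METRIC_FAMILY, Bytes.toBytes("HDFS_BYTES_READ"),
          alignedTs, Bytes.toBytes(1024L));

      table.put(put);
    }
  }
}
{code}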
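On collector-side aggregation, a plain-Java sketch of what I mean by accumulating per-interval sums in the TimelineCollector and flushing once per interval; the class and the MetricSink interface are hypothetical, not existing ATS APIs:

{code:java}
import java.util.HashMap;
import java.util.Map;

/**
 * Sketch: accumulate metric samples in the collector, keyed by metric name
 * and the 5-minute interval they fall into, and emit one aggregated value
 * per key when the interval is flushed. Fewer, coarser writes reach the
 * storage backend at the cost of a little extra latency.
 */
public class CollectorIntervalAggregator {
  private static final long INTERVAL_MS = 5 * 60 * 1000L;

  /** Stand-in for whatever actually writes to HBase/Phoenix. */
  public interface MetricSink {
    void write(String metricName, long intervalStart, long aggregatedValue);
  }

  // key = metricName + "@" + intervalStart, value = running sum for that interval
  private final Map<String, Long> sums = new HashMap<>();

  public synchronized void record(String metricName, long timestamp, long value) {
    long intervalStart = (timestamp / INTERVAL_MS) * INTERVAL_MS;
    sums.merge(metricName + "@" + intervalStart, value, Long::sum);
  }

  /** Invoked by a timer once per interval (or on shutdown). */
  public synchronized void flush(MetricSink sink) {
    for (Map.Entry<String, Long> e : sums.entrySet()) {
      int at = e.getKey().lastIndexOf('@');
      sink.write(e.getKey().substring(0, at),
          Long.parseLong(e.getKey().substring(at + 1)), e.getValue());
    }
    sums.clear();
  }
}
{code}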
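And for the "batch mode" suggestion on the flow summary writes, one possible reading is simply buffering puts client-side (e.g. with a BufferedMutator) so they reach the region servers in batched RPCs rather than one round trip per put; the table name, column family, and buffer size below are made up for the example:

{code:java}
import java.io.IOException;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.BufferedMutator;
import org.apache.hadoop.hbase.client.BufferedMutatorParams;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class BatchedFlowWriter {
  public static void main(String[] args) throws IOException {
    try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create())) {
      // Buffer puts client-side and ship them in batched RPCs instead of
      // paying one round trip (and one aggregation trigger) per put.
      BufferedMutatorParams params =
          new BufferedMutatorParams(TableName.valueOf("timeline.flow"))
              .writeBufferSize(2 * 1024 * 1024);
      try (BufferedMutator mutator = conn.getBufferedMutator(params)) {
        for (int run = 0; run < 1000; run++) {
          Put put = new Put(Bytes.toBytes("cluster1!user1!flow_wordcount!" + run));
          put.addColumn(Bytes.toBytes("i"), Bytes.toBytes("CPU_MILLIS"),
              Bytes.toBytes(12345L));
          mutator.mutate(put); // queued locally, flushed when the buffer fills
        }
      } // close() flushes whatever is still buffered
    }
  }
}
{code}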
> [Storage implementation] explore the native HBase write schema for storage
> --------------------------------------------------------------------------
>
> Key: YARN-3411
> URL: https://issues.apache.org/jira/browse/YARN-3411
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: timelineserver
> Reporter: Sangjin Lee
> Assignee: Vrushali C
> Priority: Critical
> Attachments: ATSv2BackendHBaseSchemaproposal.pdf, YARN-3411.poc.txt
>
> There is work that's in progress to implement the storage based on a Phoenix schema (YARN-3134).
> In parallel, we would like to explore an implementation based on a native HBase schema for the write path. Such a schema does not exclude using Phoenix, especially for reads and offline queries.
> Once we have basic implementations of both options, we could evaluate them in terms of performance, scalability, usability, etc. and make a call.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)