[ 
https://issues.apache.org/jira/browse/YARN-3448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14484363#comment-14484363
 ] 

Zhijie Shen edited comment on YARN-3448 at 4/7/15 11:56 PM:
------------------------------------------------------------

Jonathan, I have several high-level questions about the design and the implementation:

bq. Split the 5 sections of the leveldb database (domain, owner, start time, 
entity, index) into 5 separate databases.

According to the official [document|https://github.com/google/leveldb], a LevelDb 
database may only be accessed by a single process (possibly multi-threaded). 
Therefore, instead of 5 separate (logical) tables, 5 separate databases are used 
to increase concurrency, right? 

However, this approach may raise an inconsistency issue. For example, if I 
upload an entity with a primary filter defined, I may run into a scenario where 
some I/O exception happens when the timeline server tries to write into the 
entity db, while the index record is persisted without any problem. In that 
scenario, the entity is searchable by primary filter, but cannot be fetched by 
its identifier.
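To make the failure mode concrete, here is a minimal in-memory sketch (not the patch's code; all class, field, and method names are hypothetical) of what happens when the index write succeeds but the entity write fails, since the two databases are no longer covered by one atomic write batch:

```java
import java.util.HashMap;
import java.util.Map;

// Illustration only: the two maps stand in for the separate index and entity
// LevelDB instances; a simulated I/O failure between the two puts leaves a
// dangling index record.
public class SplitDbWriteSketch {
    static Map<String, String> indexDb = new HashMap<>();   // primaryFilter -> entityId
    static Map<String, String> entityDb = new HashMap<>();  // entityId -> entity blob

    static void putEntity(String entityId, String primaryFilter, String blob,
                          boolean failEntityWrite) {
        // The index write succeeds first...
        indexDb.put(primaryFilter, entityId);
        // ...then the entity write fails (simulated I/O exception).
        if (failEntityWrite) {
            return; // entity db write lost
        }
        entityDb.put(entityId, blob);
    }

    public static void main(String[] args) {
        putEntity("e1", "user=foo", "{}", true);
        // Inconsistency: searchable via the index, but the entity itself is missing.
        System.out.println(indexDb.containsKey("user=foo")); // true
        System.out.println(entityDb.containsKey("e1"));      // false
    }
}
```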

bq. Rolling DBs for entity and index DBs. 99.9% of the data are in these two 
sections, at a 4:1 ratio (index to entity), at least for Tez.

If I understand it correctly, ownerdb can be treated as the secondary index of 
domaindb. If we want to look up the domains of one owner, we take two steps: 
1) get all domain IDs from ownerdb, and then 2) pull each individual domain 
from domaindb.

I think we could adopt a similar approach for entitydb and indexdb. Instead 
of keeping a full copy of the entity content in indexdb, we could just record 
the entity identifier there, and do a two-step lookup to answer the query. By 
doing this, we should be able to significantly shrink the indexdb size and 
improve write performance. In contrast, the previous leveldb index 
implementation seems to optimize for the query side.

3. I'm wondering whether we need a separate configuration for the rolling 
period, or whether we should use the ttl as the rolling period. The reason is 
that if we set the ttl smaller than the rolling period, old data will still 
exist in the most recent database. Therefore, we still need the deletion thread 
to remove these entities/index entries, or the query has to exclude them from 
the result set.

On the other hand, it may also not be good to set the ttl greater than the 
rolling period. This is because if the rolling period is smaller than the ttl, 
we still need to wait until the ttl expires before deleting a database. 
Therefore, setting a small rolling period alone won't shrink the total database 
size if the ttl is kept large.

Combining the two points above, it seems better to let rolling period = ttl. 
And I think it may simplify the implementation, because we would know the 
current database holds all the live data, while the previous databases are sure 
to hold the old data to be discarded. Thoughts?
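A conservative sketch of the bucketing arithmetic behind this idea (hypothetical names, not the patch's code): each entity is routed to a rolling-db bucket derived from its start time, and a bucket whose entire time range has aged past the ttl can be dropped as a whole, with no per-record deletion thread. Note that with period == ttl the immediately previous bucket may still contain records younger than the ttl, so this sketch keeps the current and previous buckets and deletes everything older:

```java
// Sketch of the "rolling period == ttl" scheme.
public class RollingDbSketch {
    final long ttlMillis; // also the rolling period, per the proposal

    RollingDbSketch(long ttlMillis) { this.ttlMillis = ttlMillis; }

    // Which rolling db instance an entity with this start time belongs to.
    long bucketFor(long startTimeMillis) {
        return startTimeMillis / ttlMillis;
    }

    // Bucket covers [bucket*ttl, (bucket+1)*ttl); everything in it is older
    // than ttl once now >= (bucket + 2) * ttl, so the whole db can be removed
    // via the file system instead of record-at-a-time deletes.
    boolean isFullyExpired(long bucket, long nowMillis) {
        return nowMillis >= (bucket + 2) * ttlMillis;
    }

    public static void main(String[] args) {
        RollingDbSketch s = new RollingDbSketch(3_600_000L); // 1h ttl == 1h period
        long now = 10 * 3_600_000L;
        System.out.println(s.isFullyExpired(s.bucketFor(now - 7_200_000L), now)); // true
        System.out.println(s.isFullyExpired(s.bucketFor(now), now));              // false
    }
}
```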

4. I assume that the {{roll()}} method is going to be processed quickly, right? 
Otherwise, during the transient state of rolling a database, write performance 
will degrade somewhat.
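One way {{roll()}} could stay fast, sketched below with hypothetical names (this is an assumption about the implementation, not the patch's actual code): do the expensive new-database open off the write path, so writers only ever observe a cheap reference swap:

```java
import java.util.concurrent.atomic.AtomicReference;

// Sketch: if rolling only swaps the "current db" handle, writers stall for no
// more than a reference swap rather than a long transition.
public class FastRollSketch {
    static class Db {
        final long bucket;
        Db(long bucket) { this.bucket = bucket; }
    }

    private final AtomicReference<Db> current = new AtomicReference<>(new Db(0));

    void roll(long newBucket) {
        Db next = new Db(newBucket); // expensive open would happen here, off the write path
        current.set(next);           // the only step writers can observe: O(1)
    }

    long currentBucket() { return current.get().bucket; }

    public static void main(String[] args) {
        FastRollSketch s = new FastRollSketch();
        s.roll(7);
        System.out.println(s.currentBucket()); // 7
    }
}
```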



> Add Rolling Time To Lives Level DB Plugin Capabilities
> ------------------------------------------------------
>
>                 Key: YARN-3448
>                 URL: https://issues.apache.org/jira/browse/YARN-3448
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Jonathan Eagles
>            Assignee: Jonathan Eagles
>         Attachments: YARN-3448.1.patch, YARN-3448.2.patch
>
>
> For large applications, the majority of the time in LeveldbTimelineStore is 
> spent deleting old entities one record at a time. An exclusive write lock is 
> held during the entire deletion phase, which in practice can take hours. If 
> we are willing to relax some of the consistency constraints, other 
> performance enhancing techniques can be employed to maximize throughput and 
> minimize locking time.
> Split the 5 sections of the leveldb database (domain, owner, start time, 
> entity, index) into 5 separate databases. This allows each database to 
> maximize the read cache effectiveness based on the unique usage patterns of 
> each database. With 5 separate databases each lookup is much faster. This can 
> also help with I/O to have the entity and index databases on separate disks.
> Rolling DBs for entity and index DBs. 99.9% of the data are in these two 
> sections, at a 4:1 ratio (index to entity), at least for Tez. We can replace 
> record-at-a-time DB deletion with file system removal if we create a rolling 
> set of databases that age out and can be efficiently removed. To do this we 
> must place a constraint to always place an entity's events into its correct 
> rolling db instance based on start time. This allows us to stitch the data 
> back together while reading and to do artificial paging.
> Relax the synchronous write constraints. If we are willing to accept losing 
> some records that were not flushed by the operating system during a crash, 
> we can use async writes, which can be much faster.
> Prefer sequential writes. Sequential writes can be several times faster than 
> random writes. Spend some small effort arranging the writes in such a way 
> that they trend towards sequential write performance over random write 
> performance.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
