[
https://issues.apache.org/jira/browse/YARN-1730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13914665#comment-13914665
]
Billie Rinaldi commented on YARN-1730:
--------------------------------------
bq. Why do we need separate start-time caches for read and write calls?
The write cache is essential for good write throughput: without it, every
write would have to hit disk to look up the entity's start time. The write
cache should be sized to the maximum number of active entities (entities that
are still receiving writes). That number may vary, so [~zjshen] suggested
making it configurable.
The read cache is only there to improve read performance, so a read doesn't
have to hit disk twice per entity. It holds the most recently queried entities
rather than the most recently written ones. On a write I add the entity to both
the read cache and the write cache, because recently written entities are also
generally more likely to be queried. There is no upper bound on the useful size
of this cache; it can be as large as you like.
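As a rough illustration of the two caches described above, here is a minimal
sketch using only the JDK. It is not the code from the attached patches: the
class name StartTimeCaches, the methods onWrite/onRead, and the use of an
access-ordered LinkedHashMap as the LRU structure are all assumptions made for
the example.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: two bounded LRU caches mapping entity id -> start time.
public class StartTimeCaches {

    /** Builds a synchronized LRU map that evicts the least recently accessed entry. */
    static <K, V> Map<K, V> lruCache(final int maxSize) {
        return Collections.synchronizedMap(
            // accessOrder=true makes iteration order least-recently-accessed first
            new LinkedHashMap<K, V>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                    return size() > maxSize;
                }
            });
    }

    final Map<String, Long> writeCache; // sized to the max number of active entities
    final Map<String, Long> readCache;  // any size; larger just saves more disk lookups

    StartTimeCaches(int writeCacheSize, int readCacheSize) {
        writeCache = lruCache(writeCacheSize);
        readCache = lruCache(readCacheSize);
    }

    /** A write populates both caches: recently written entities are likely to be queried. */
    void onWrite(String entityId, long startTime) {
        writeCache.put(entityId, startTime);
        readCache.put(entityId, startTime);
    }

    /** A read populates only the read cache. */
    void onRead(String entityId, long startTime) {
        readCache.put(entityId, startTime);
    }
}
```

With this shape, a burst of reads can never push an active entity out of the
write cache, which is the reason for keeping the two separate.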
bq. Shouldn't we do the write-locking at the put(..) API level itself and not
just when creating the start time? Or at least when the actual write happens
for a given entity?
It's a good question; we could decide to do this. Locking when determining the
start time is essential so that two writes for the same entity can't come up
with different start times. The writes to leveldb in a put are atomic, so that
part isn't an issue. The question is whether we care about the following
interleaving: writes 1 and 2 come in for the same entity, write 1 sets the
entity's start time, write 2 uses that start time, but write 2's put completes
before write 1's.
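To make the scope of the lock concrete, here is a minimal sketch of locking
only around start-time determination. It is an illustration under assumptions,
not the patch itself: StartTimeResolver and getOrCreateStartTime are
hypothetical names, and an in-memory map stands in for the cache plus leveldb
lookup.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch: per-entity lock held only while the start time is decided.
public class StartTimeResolver {

    // One lock per entity id. This sketch never removes locks; a real
    // implementation would need to clean up locks for inactive entities.
    private final Map<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    // Stand-in for "check the cache, then check leveldb" (assumption).
    private final Map<String, Long> startTimes = new ConcurrentHashMap<>();

    /**
     * Returns the entity's start time, storing the candidate if none exists.
     * Under the lock, two concurrent writers for the same entity always
     * agree on a single start time.
     */
    long getOrCreateStartTime(String entityId, long candidate) {
        ReentrantLock lock = locks.computeIfAbsent(entityId, k -> new ReentrantLock());
        lock.lock();
        try {
            Long existing = startTimes.get(entityId);
            if (existing != null) {
                return existing;
            }
            startTimes.put(entityId, candidate);
            return candidate;
        } finally {
            lock.unlock();
        }
        // The batched put happens after this method returns, outside the lock,
        // so the order in which two puts complete is not constrained.
    }
}
```

Because the lock is released before the put, the interleaving described above
(write 2's put finishing before write 1's) remains possible; holding the lock
across the whole put would rule it out at the cost of serializing writes per
entity.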
> Leveldb timeline store needs simple write locking
> -------------------------------------------------
>
> Key: YARN-1730
> URL: https://issues.apache.org/jira/browse/YARN-1730
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Billie Rinaldi
> Assignee: Billie Rinaldi
> Attachments: YARN-1730.1.patch, YARN-1730.2.patch, YARN-1730.3.patch,
> YARN-1730.4.patch, YARN-1730.5.patch
>
>
> The actual data writes are performed atomically in a batch, but a lock should
> be held while identifying a start time for the entity, which precedes every
> write.