Todd Lipcon has posted comments on this change.

Change subject: KUDU-1549: compact LBM container metadata files at startup
......................................................................


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/6826/2/src/kudu/fs/log_block_manager.cc
File src/kudu/fs/log_block_manager.cc:

PS2, Line 87: DEFINE_double(log_container_live_metadata_before_compact_ratio, 
0.10,
            :               "Desired ratio of live block metadata in log 
containers. If a "
            :               "container's live to total block ratio dips below 
this value, "
            :               "the container's metadata file will be compacted at 
startup.");
> I'll tag it advanced. I don't think experimental is appropriate given that 
hmm, I guess i was thinking experimental because who knows if a ratio is even 
the right metric? It's pretty internal-facing, so seems like the kind of thing 
we may never want to allow a user to tweak? (is it reasonable to have something 
be experimental forever?)

As for a better value, I was thinking something like 0.5 -- it's worth 
compacting if we think we can reduce startup time by at least 50%, perhaps?

Do we know how to map a metadata file size/block-count to a time, 
approximately? ie is the timing O(size) or O(record count)?


http://gerrit.cloudera.org:8080/#/c/6826/2/src/kudu/fs/log_block_manager.h
File src/kudu/fs/log_block_manager.h:

PS2, Line 295:   // 3. Containers in 'low_live_block_containers' will have 
their metadata
             :   //    files compacted.
             :  
> I went back and forth on this. I do like the idea of not making any on-disk
hmm, trying to follow this suggestion. So, you're saying that since we already 
have the list of live blocks for each container, we could just dump that out 
rather than rereading the original block *records*?

Also maybe I misread the original code. Is the vector<BlockRecordPB> here just 
the remaining _live_ blocks? If so, maybe it's not so bad, since it's bounded 
at 2x the normal "steady state" usage of the LBM, and we're always assuming 
that is a small fraction of overall TS heap. In the first read through I was 
thinking this woudl be _all_ records, not just te live ones.


-- 
To view, visit http://gerrit.cloudera.org:8080/6826
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I981f7d9e7eb96fb40cef30ad96c5960b72d07756
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: David Ribeiro Alves <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-HasComments: Yes

Reply via email to