David Ribeiro Alves has posted comments on this change.

Change subject: KUDU-1755 Part 2: Improve tablet on-disk size metric
......................................................................


Patch Set 19:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6968/19/src/kudu/tablet/diskrowset.h
File src/kudu/tablet/diskrowset.h:

PS19, Line 348:   // A rowset's space-occupying components are as follows:
              :   // - cfile set
              :   //   - base data
              :   //   - bloom file
              :   //   - ad hoc index
              :   // - delta files
              :   //   - UNDO deltas
              :   //   - REDO deltas
              : 
              :   // Return the size on-disk of this rowset, in bytes.
              :   uint64_t OnDiskSize() const OVERRIDE;
              : 
              :   // Return the on-disk size of this rowset's cfile set, in 
bytes.
              :   uint64_t CFileSetOnDiskSize() const;
              : 
              :   // Return the size on-disk of the base data (no deltas) in 
this rowset's cfile set, in bytes.
              :   uint64_t OnDiskDataSize() const OVERRIDE;
              : 
              :   // Return the size on-disk of this rowset's REDO deltas, in 
bytes.
              :   uint64_t RedoDeltaOnDiskSize() const;
              : 
              :   // Return the size on-disk of this rowset's cfile set's base 
data and this rowset's REDO deltas.
              :   uint64_t OnDiskDataSizeNoUndos() const OVERRIDE;
              : 
              :   size_t DeltaMemStoreSize() const OVERRIDE;
> I don't think it's worth it, for the following reasons:
my problem with these functions are they often return overlapping info and the 
names are confusing. for instance it takes some digging/thinking to be able to 
distinguish: CFileSetOnDiskSize() from OnDiskDataSize().

maybe just list the base ones then add some helpers to the struct (or let the 
caller decide what to add)?
regarding locking these are all taking the component lock, so it's not like 
they're interfering with reads or writes, just flushes/compactions (which are 
asynch, and in any case I don't believe these methods get called often enough 
for this to be problematic). 

So if you have a struct that has:

struct DiskRowSetSpace {
   int64_t delta_memstore_size;
   int64_t base_data_size;
   int64_t bloom_size;
   int64_t ad_hoc_index_size;
   int64_t redo_deltas_size;
   int64_t undo_deltas_size;

   .. and then whatever helpers to add stuff here.
}

then theres no duplicate info and the caller can just choose which info is 
relevant.


-- 
To view, visit http://gerrit.cloudera.org:8080/6968
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia83f8aae0a544abbd19f5a34520b4828b89b6315
Gerrit-PatchSet: 19
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley <wdberke...@gmail.com>
Gerrit-Reviewer: Adar Dembo <a...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <davidral...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mpe...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
Gerrit-Reviewer: Will Berkeley <wdberke...@gmail.com>
Gerrit-HasComments: Yes

Reply via email to