[ https://issues.apache.org/jira/browse/KUDU-2263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Henke updated KUDU-2263:
------------------------------
    Target Version/s: 1.8.0  (was: 1.7.0)

> Consider removing PB descriptors from PBC header
> ------------------------------------------------
>
>                 Key: KUDU-2263
>                 URL: https://issues.apache.org/jira/browse/KUDU-2263
>             Project: Kudu
>          Issue Type: Improvement
>          Components: util
>    Affects Versions: 1.7.0
>            Reporter: Todd Lipcon
>            Priority: Major
>
> Looking at a cmeta file on disk, it seems the vast majority of the bytes are 
> in the supplemental header. We currently serialize the entire descriptor set 
> of the referenced file and its dependencies. This means that in each cmeta 
> file, we end up serializing even things like the definition of SchemaPB, 
> which is quite large and not needed to describe the type at hand.
>
> At a minimum we can prune the descriptors serialized to only include those 
> that are transitively referenced by the PB type in the file. I think we 
> should also consider doing away with this information entirely and instead 
> allow 'kudu pbc dump' to take a descriptor set as external input. It's easy 
> enough to generate a descriptor set from any Kudu version's source tree using 
> the protoc command line.
>
> One potential major improvement, if we can get these files down to under 
> 4 KB, is that we could atomically rewrite them in a single disk IO using 
> O_DIRECT rather than doing a rewrite-rename-fsync dance.
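
For illustration only, the "prune to what is transitively referenced" idea from the description could look roughly like the sketch below. It is not Kudu code: the function name `transitive_types` and the reference graph are hypothetical, and the message names in the graph are placeholders rather than something copied from the actual .proto files. It only shows the reachability walk that would decide which descriptors stay in the header.

```python
from collections import deque


def transitive_types(root, references):
    """Return the set of message types reachable from `root`.

    `references` maps a message type name to the names of the message
    types used by its fields (hypothetical input format).
    """
    seen = {root}
    queue = deque([root])
    while queue:
        current = queue.popleft()
        for dep in references.get(current, ()):
            if dep not in seen:  # also guards against reference cycles
                seen.add(dep)
                queue.append(dep)
    return seen


# Illustrative reference graph; an assumption, not taken from Kudu's sources.
REFERENCES = {
    "ConsensusMetadataPB": ["RaftConfigPB"],
    "RaftConfigPB": ["RaftPeerPB"],
    "RaftPeerPB": ["HostPortPB"],
    "HostPortPB": [],
    "SchemaPB": ["ColumnSchemaPB"],
    "ColumnSchemaPB": [],
}

# Types whose descriptors would be kept for a file holding ConsensusMetadataPB.
needed = transitive_types("ConsensusMetadataPB", REFERENCES)
# Types that would be dropped from the header under this scheme.
pruned = sorted(set(REFERENCES) - needed)
```

With this placeholder graph, the SchemaPB branch lands in `pruned` because nothing reachable from the root type refers to it, which is the saving the description is after.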



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
