Adar has told me it's fine to run the new 'kudu fs check' tool against a Kudu 1.2 server. It will require building Kudu from source locally, though.
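For anyone following along, a rough sketch of what invoking the locally built tool might look like. The flag names and the `build/latest/bin` location are assumptions based on how Kudu binaries are typically built and configured, and the directory paths are placeholders for your cluster's actual WAL and data directories; the sketch only prints the command rather than running it, since it needs real Kudu data directories (and a stopped tablet server) to do anything useful.

```shell
# Placeholders -- substitute the tablet server's actual directories.
WAL_DIR=/path/to/tablet_wal
DATA_DIRS=/path/to/tablet_data

# Assumed invocation: a locally built 'kudu' binary (typically under
# build/latest/bin in a source checkout) run against the server's
# filesystem layout. Stop the tablet server before running a check.
CMD="kudu fs check --fs_wal_dir=$WAL_DIR --fs_data_dirs=$DATA_DIRS"

# Print the command instead of executing it in this sketch.
echo "$CMD"
```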
- Dan

On Wed, Apr 12, 2017 at 10:59 AM, Dan Burkert <[email protected]> wrote:

> Hi Jason,
>
> First question: what filesystem and OS are you running?
>
> This has been an ongoing area of work; we fixed a few major issues in 1.2,
> a few more major issues in 1.3, and have a new tool ('kudu fs check'),
> to be released in 1.4, that diagnoses and fixes further issues. In some
> cases we underestimate the true size of the data, and in some cases we
> keep around data that could be cleaned up. I've included a list of
> relevant JIRAs below if you are interested in the specifics. It should be
> possible to get early access to the 'kudu fs check' tool by compiling
> Kudu locally, but I'm going to defer to Adar on that, since he's the
> resident expert on the subject.
>
> KUDU-1755 <https://issues.apache.org/jira/browse/KUDU-1755>
> KUDU-1853 <https://issues.apache.org/jira/browse/KUDU-1853>
> KUDU-1856 <https://issues.apache.org/jira/browse/KUDU-1856>
> KUDU-1769 <https://issues.apache.org/jira/browse/KUDU-1769>
>
> On Wed, Apr 12, 2017 at 5:02 AM, Jason Heo <[email protected]> wrote:
>
>> Hello.
>>
>> I'm using Apache Kudu 1.2 on CDH 1.2.
>>
>> I'm estimating how many servers are needed to store my data.
>>
>> After loading my test data sets, total_kudu_on_disk_size_across_kudu_replicas
>> in the chart library at CDH is 27.9TB, whereas the sum of
>> `du -sh /path/to/tablet_data/data` on each node is 39.9TB, which is 43%
>> bigger than what the chart library reports.
>>
>> I also observed the same difference on another Kudu test cluster.
>>
>> I'm curious whether this is normal, and I'd like to know if there is a
>> way to reduce the physical file size.
>>
>> Thanks,
>>
>> Jason.
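As a quick sanity check on the 43% figure quoted in the thread, the arithmetic comparing the two totals works out as follows (the numbers are the ones Jason reported; `awk` is just used here for the floating-point division):

```shell
metric_tb=27.9   # total_kudu_on_disk_size_across_kudu_replicas (chart library)
du_tb=39.9       # sum of `du -sh /path/to/tablet_data/data` over all nodes

# Percentage by which the raw on-disk total exceeds the reported metric:
# (39.9 - 27.9) / 27.9 * 100 = ~43
pct=$(awk -v m="$metric_tb" -v d="$du_tb" 'BEGIN { printf "%.0f", (d - m) / m * 100 }')
echo "${pct}%"   # prints 43%
```

Note that `du` also counts WALs, consensus metadata, and not-yet-reclaimed blocks, which is part of why the two numbers can legitimately diverge even before the bugs in the JIRAs above are accounted for.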
