Re: About data file size and on-disk size

Todd Lipcon Wed, 23 Nov 2016 19:56:19 -0800

On Wed, Nov 23, 2016 at 2:30 PM, Adar Dembo <[email protected]> wrote:


> The difference between du with --apparent-size and without suggests
> that hole punching is working properly. Quick back of the envelope
> math shows that with 8133 containers, each container is just over 10G
> of "apparent size", which means nearly all of the containers were full
> at one point or another. That makes sense; it means that Kudu is
> generally writing to a small number of containers at any given time,
> but is filling them up over time.
>
> I took a look at the tablet disk estimation code and found that it
> excludes the size of all of the UNDO data blocks. I think this is
> because the size estimation is also used to drive decisions regarding
> delta compaction, but with an UPSERT-only workload like yours, we'd
> expect to see many UNDO data blocks over time as updated (and now
> historical) data is further and further compacted. I filed
> https://issues.apache.org/jira/browse/KUDU-1755 to track these issues.
> However, if this were the case, I'd expect the "tablet history GC"
> feature (new in Kudu 1.0) to remove old data that was mutated in an
> UPSERT. The default value for --tablet_history_max_age_sec (which
> controls how old the data must be before it is removed) is 15 minutes;
> have you changed the value of this flag? If not, could you look at
> your tserver log for the presence of major delta compactions? Look for
> references to MajorDeltaCompactionOp. If there aren't any, that means
> Kudu isn't getting opportunities to age out old data.
>

Worth noting that major delta compaction doesn't actually remove old UNDOs.
There are still some open JIRAs about scheduling tasks to age-off UNDOs,
but as it stands today, they only get collected during a normal compaction.

If the workload doesn't involve normal (merging) compactions, then UNDOs
won't be GCed at all. So, if you have a relatively static set of keys, and
are just updating them without causing many new inserts, this could be the
problem.
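
To check which kind of compaction is actually running, something like the rough Python sketch below could tally maintenance op names in a tserver log. MajorDeltaCompactionOp is the name Adar mentions above. MinorDeltaCompactionOp and CompactRowSetsOp (which I am assuming is how normal merging compactions show up) are not taken from this thread, so verify them against your own log. The helper names and the log path argument are hypothetical.

```python
from collections import Counter
from typing import Iterable

# Op names to look for. Only MajorDeltaCompactionOp comes from this thread;
# the other two are assumptions about how the ops appear in the log.
OPS = ("MajorDeltaCompactionOp", "MinorDeltaCompactionOp", "CompactRowSetsOp")


def count_ops(lines: Iterable[str], ops=OPS) -> Counter:
    """Count how many lines mention each op name (plain substring match)."""
    counts = Counter({op: 0 for op in ops})
    for line in lines:
        for op in ops:
            if op in line:
                counts[op] += 1
    return counts


def scan_log(path: str) -> Counter:
    """Tally op mentions in one log file (path is supplied by the caller)."""
    with open(path, errors="replace") as f:
        return count_ops(f)
```

If the count for the merging compaction op stays at zero while the delta compaction counts grow, that would be consistent with UNDOs never being collected.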


>
> It's also possible that simply not accounting for the composite index
> and bloom blocks (see KUDU-1755) is the reason. Take a look at
> https://issues.apache.org/jira/browse/KUDU-624?focusedCommentId=15165054&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15165054
> and run the same two commands to compare the total on-disk size of all
> the .data files to the number of bytes that the tserver is aware of.
> If the two numbers are close, it's a sign that, at the very least,
> Kudu is aware of and actively managing all that disk space (i.e.
> there's no "orphaned" data).
>
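
For the on-disk half of that comparison, here is a minimal Python sketch that totals the .data files in the same two ways du does: apparent size (st_size) and space actually allocated (st_blocks). These are not the commands from the JIRA comment, just an illustration. The function name is hypothetical, and it assumes a Linux filesystem where st_blocks is reported in 512-byte units.

```python
import os


def data_file_usage(root: str, suffix: str = ".data"):
    """Return (file count, apparent bytes, allocated bytes) under root."""
    count = apparent = allocated = 0
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if not name.endswith(suffix):
                continue
            st = os.stat(os.path.join(dirpath, name))
            count += 1
            apparent += st.st_size           # what du --apparent-size sums
            allocated += st.st_blocks * 512  # what plain du sums (assumed 512-byte units)
    return count, apparent, allocated
```

Calling data_file_usage("/data/kudu/tserver/data") should give numbers in line with the du output quoted below. A large gap between the apparent and allocated totals is what you would expect when hole punching works.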

-Todd


>
>
>
> On Wed, Nov 23, 2016 at 12:39 AM, 阿香 <[email protected]> wrote:
> > Hi,
> >
> >> Can you tell us a little bit more about your table, as well as any deleted
> >> tables you once had? How many columns did they have?
> >
> > I have not deleted any tables.
> > There is only one table, with 12 columns (string and int), in the Kudu cluster.
> > This cluster has three tablet servers.
> >
> > I use the upsert operation to insert and update rows.
> >
> >> what version of Kudu are you using?
> >
> > kudu -version
> > kudu 1.0.0
> > revision 6f6e49ca98c3e3be7d81f88ab8a0f9173959b191
> > build type RELEASE
> > built by jenkins at 16 Sep 2016 00:23:10 PST on
> > impala-ec2-pkg-centos-7-0dc0.vpc.cloudera.com
> > build id 2016-09-16_00-03-04
> >
> >> It's conceivable that there's a pathological case wherein each of the 8133
> >> data files is used, one at a time, to store data blocks, which would cause
> >> each to allocate 32 MB of disk space (totaling about 254G).
> >
> > Can the number of data files be decreased? The SSD disk is almost out of
> > space now.
> >
> >> Can you try running du with --apparent-size and compare the results?
> >
> > # du -sh /data/kudu/tserver/data/
> > 213G /data/kudu/tserver/data/
> > # du -sh --apparent-size  /data/kudu/tserver/data/
> > 81T /data/kudu/tserver/data/
> >
> >> What filesystem is being used for /data/kudu/tserver/data?
> >
> > # file -s /dev/vdb1
> > /dev/vdb1: Linux rev 1.0 ext4 filesystem data,
> > UUID=9f95ba79-f387-42be-a43f-d1421c83e2e5 (needs journal recovery) (extents)
> > (64bit) (large files) (huge files)
> >
> >
> > Thanks.
> >
> >
> > ------------------ Original Message ------------------
> > From: "Adar Dembo";<[email protected]>;
> > Sent: Wednesday, November 23, 2016, 9:35 AM
> > To: "user"<[email protected]>;
> > Subject: Re: About data file size and on-disk size
> >
> > Also, if you haven't explicitly disabled it, each .data file is going
> > to preallocate 32 MB of data when used. It's conceivable that there's
> > a pathological case wherein each of the 8133 data files is used, one
> > at a time, to store data blocks, which would cause each to allocate 32
> > MB of disk space (totaling about 254G).
> >
> > Can you tell us a little bit more about your table, as well as any
> > deleted tables you once had? How many columns did they have? Also,
> > what version of Kudu are you using?
> >
> > On Tue, Nov 22, 2016 at 11:39 AM, Adar Dembo <[email protected]> wrote:
> >> The files in /data/kudu/tserver/data are supposed to be sparse; that
> >> is, when Kudu decides to delete data, it'll punch a hole in one of
> >> those files, allowing the filesystem to reclaim the space in that
> >> hole. Yet, 'du' should reflect that because it measures real space
> >> usage. Can you try running du with --apparent-size and compare the
> >> results? If they're the same or similar, it suggests that the hole
> >> punching behavior isn't working properly. What distribution are you
> >> using? What filesystem is being used for /data/kudu/tserver/data?
> >>
> >> You should also check if maybe Kudu has failed to delete the data
> >> belonging to deleted tables. Has this tserver hosted any tablets
> >> belonging to tables that have since been deleted? Does the tserver log
> >> describe any errors when trying to delete the data belonging to those
> >> tablets?
> >>
> >> On Tue, Nov 22, 2016 at 7:19 AM, 阿香 <[email protected]> wrote:
> >>> Hi,
> >>>
> >>>
> >>> I have a table with 16 buckets spread over 3 physical machines. Each
> >>> tablet has only one replica.
> >>>
> >>>
> >>> The Tablets Web UI shows that each tablet has an on-disk size of about 4.5G.
> >>>
> >>> On one machine there are 8 tablets in total, so the on-disk size should be
> >>> about 4.5*8 = 36G.
> >>>
> >>> However, on the same machine, the disk space actually used is about 211G.
> >>>
> >>>
> >>> # du -sh /data/kudu/tserver/data/
> >>>
> >>> 210G /data/kudu/tserver/data/
> >>>
> >>>
> >>> # find /data/kudu/tserver/data/ -name "*.data" | wc -l
> >>>
> >>> 8133
> >>>
> >>>
> >>>
> >>> What's the difference between the data file size and the on-disk size?
> >>>
> >>> Can the files in /data/kudu/tserver/data/ be compacted or purged, or can
> >>> some of them be deleted?
> >>>
> >>>
> >>> Thanks very much.
> >>>
> >>>
> >>> BR
> >>>
> >>> Brooks
> >>>
> >>>
> >>>
>



-- 
Todd Lipcon
Software Engineer, Cloudera
