On Wed, Nov 23, 2016 at 2:30 PM, Adar Dembo <[email protected]> wrote:
> The difference between du with --apparent-size and without suggests
> that hole punching is working properly. Quick back of the envelope
> math shows that with 8133 containers, each container is just over 10G
> of "apparent size", which means nearly all of the containers were full
> at one point or another. That makes sense; it means that Kudu is
> generally writing to a small number of containers at any given time,
> but is filling them up over time.
>
> I took a look at the tablet disk estimation code and found that it
> excludes the size of all of the UNDO data blocks. I think this is
> because the size estimation is also used to drive decisions regarding
> delta compaction, but with an UPSERT-only workload like yours, we'd
> expect to see many UNDO data blocks over time as updated (and now
> historical) data is further and further compacted. I filed
> https://issues.apache.org/jira/browse/KUDU-1755 to track these issues.
> However, if this were the case, I'd expect the "tablet history GC"
> feature (new in Kudu 1.0) to remove old data that was mutated in an
> UPSERT. The default value for --tablet_history_max_age_sec (which
> controls how old the data must be before it is removed) is 15 minutes;
> have you changed the value of this flag? If not, could you look at
> your tserver log for the presence of major delta compactions? Look for
> references to MajorDeltaCompactionOp. If there aren't any, that means
> Kudu isn't getting opportunities to age out old data.
>

Worth noting that major delta compaction doesn't actually remove old UNDOs.
There are still some open JIRAs about scheduling tasks to age-off UNDOs, but
as it stands today, they only get collected during a normal compaction. If
the workload doesn't involve normal (merging) compactions, then UNDOs won't
be GCed at all. So, if you have a relatively static set of keys, and are
just updating them without causing many new inserts, this could be the
problem.
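
For anyone who wants to re-check the back-of-the-envelope numbers in this
thread, here is a small Python sketch. It only uses figures already quoted
in the thread (8133 containers, 81T apparent vs. 213G actual, 32 MB of
preallocation per container, 8 tablets at roughly 4.5G each). The function
names are made up for illustration, and it assumes that du's "G"/"T"
suffixes and the "32 MB" figure are binary units, so treat it as a rough
sanity check and not as a description of how Kudu accounts for space.

```python
# Rough sanity check of the sizes quoted in this thread.
# Assumption: du's "G"/"T" suffixes and "32 MB" are binary units
# (GiB/TiB/MiB). All function names here are hypothetical.

MIB_PER_GIB = 1024
GIB_PER_TIB = 1024


def apparent_gib_per_container(apparent_tib: float, containers: int) -> float:
    """Average apparent size of one container (.data) file, in GiB."""
    return apparent_tib * GIB_PER_TIB / containers


def preallocation_total_gib(containers: int, prealloc_mib: float) -> float:
    """Space consumed if every container preallocated exactly once, in GiB."""
    return containers * prealloc_mib / MIB_PER_GIB


def unaccounted_gib(actual_gib: float, tablets: int, gib_per_tablet: float) -> float:
    """Actual disk usage minus the total the web UI reports for the tablets."""
    return actual_gib - tablets * gib_per_tablet


if __name__ == "__main__":
    print(f"{apparent_gib_per_container(81, 8133):.1f} GiB apparent per container")
    print(f"{preallocation_total_gib(8133, 32):.0f} GiB if each container preallocates once")
    print(f"{unaccounted_gib(213, 8, 4.5):.0f} GiB not reflected in the web UI estimate")
```

With the thread's numbers this works out to about 10.2 GiB of apparent size
per container, about 254 GiB for the worst-case preallocation scenario, and
about 177 GiB of real usage beyond the 36G that the web UI reports.
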
>
> It's also possible that simply not accounting for the composite index
> and bloom blocks (see KUDU-1755) is the reason. Take a look at
> https://issues.apache.org/jira/browse/KUDU-624?focusedCommentId=15165054&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15165054
> and run the same two commands to compare the total on-disk size of all
> the .data files to the number of bytes that the tserver is aware of.
> If the two numbers are close, it's a sign that, at the very least,
> Kudu is aware of and actively managing all that disk space (i.e.
> there's no "orphaned" data).
>

-Todd

>
> On Wed, Nov 23, 2016 at 12:39 AM, 阿香 <[email protected]> wrote:
> > Hi,
> >
> >> Can you tell us a little bit more about your table, as well as any deleted
> >> tables you once had? How many columns did they have?
> >
> > I have not deleted any tables.
> > There is only one table, with 12 columns (string and int), in the Kudu cluster.
> > This cluster has three tablet servers.
> >
> > I use the upsert operation to insert and update rows.
> >
> >> what version of Kudu are you using?
> >
> > kudu -version
> > kudu 1.0.0
> > revision 6f6e49ca98c3e3be7d81f88ab8a0f9173959b191
> > build type RELEASE
> > built by jenkins at 16 Sep 2016 00:23:10 PST on
> > impala-ec2-pkg-centos-7-0dc0.vpc.cloudera.com
> > build id 2016-09-16_00-03-04
> >
> >> It's conceivable that there's a pathological case wherein each of the 8133
> >> data files is used, one at a time, to store data blocks, which would cause
> >> each to allocate 32 MB of disk space (totaling about 254G).
> >
> > Can the number of data files be decreased? The SSD is almost out of
> > space now.
> >
> >> Can you try running du with --apparent-size and compare the results?
> >
> > # du -sh /data/kudu/tserver/data/
> > 213G /data/kudu/tserver/data/
> > # du -sh --apparent-size /data/kudu/tserver/data/
> > 81T /data/kudu/tserver/data/
> >
> >> What filesystem is being used for /data/kudu/tserver/data?
> >
> > # file -s /dev/vdb1
> > /dev/vdb1: Linux rev 1.0 ext4 filesystem data,
> > UUID=9f95ba79-f387-42be-a43f-d1421c83e2e5 (needs journal recovery) (extents)
> > (64bit) (large files) (huge files)
> >
> > Thanks.
> >
> > ------------------ Original Message ------------------
> > From: "Adar Dembo";<[email protected]>;
> > Sent: Wednesday, November 23, 2016, 9:35 AM
> > To: "user"<[email protected]>;
> > Subject: Re: About data file size and on-disk size
> >
> > Also, if you haven't explicitly disabled it, each .data file is going
> > to preallocate 32 MB of data when used. It's conceivable that there's
> > a pathological case wherein each of the 8133 data files is used, one
> > at a time, to store data blocks, which would cause each to allocate 32
> > MB of disk space (totaling about 254G).
> >
> > Can you tell us a little bit more about your table, as well as any
> > deleted tables you once had? How many columns did they have? Also,
> > what version of Kudu are you using?
> >
> > On Tue, Nov 22, 2016 at 11:39 AM, Adar Dembo <[email protected]> wrote:
> >> The files in /data/kudu/tserver/data are supposed to be sparse; that
> >> is, when Kudu decides to delete data, it'll punch a hole in one of
> >> those files, allowing the filesystem to reclaim the space in that
> >> hole. Yet, 'du' should reflect that because it measures real space
> >> usage. Can you try running du with --apparent-size and compare the
> >> results? If they're the same or similar, it suggests that the hole
> >> punching behavior isn't working properly. What distribution are you
> >> using? What filesystem is being used for /data/kudu/tserver/data?
> >>
> >> You should also check if maybe Kudu has failed to delete the data
> >> belonging to deleted tables.
> >> Has this tserver hosted any tablets
> >> belonging to tables that have since been deleted? Does the tserver log
> >> describe any errors when trying to delete the data belonging to those
> >> tablets?
> >>
> >> On Tue, Nov 22, 2016 at 7:19 AM, 阿香 <[email protected]> wrote:
> >>> Hi,
> >>>
> >>> I have a table with 16 buckets spread over 3 physical machines. Each
> >>> tablet has only one replica.
> >>>
> >>> The Tablets web UI shows that each tablet has an on-disk size of about 4.5G.
> >>>
> >>> On one machine there are 8 tablets in total, so the on-disk size is about
> >>> 4.5*8 = 36G.
> >>>
> >>> However, on the same machine, the disk space actually used is about 211G.
> >>>
> >>> # du -sh /data/kudu/tserver/data/
> >>> 210G /data/kudu/tserver/data/
> >>>
> >>> # find /data/kudu/tserver/data/ -name "*.data" | wc -l
> >>> 8133
> >>>
> >>> What's the difference between the data file size and the on-disk size?
> >>>
> >>> Can files in /data/kudu/tserver/data/ be compacted or purged, or can some
> >>> of them be deleted?
> >>>
> >>> Thanks very much.
> >>>
> >>> BR
> >>> Brooks
> >>>
>

--
Todd Lipcon
Software Engineer, Cloudera
