Hi, Sorry for the late response here -- things got busy during the holidays.
Yes, the upcoming 1.2 release will include this fix.

-Todd

On Mon, Dec 12, 2016 at 1:41 AM, 阿香 <1654407...@qq.com> wrote:

> Todd,
>
> Thanks.
> I have not yet tried to get back the empty space from the container.
> I will try it later this month.
>
> By the way, when will Kudu's next release come out? Will the 1.2 release in
> mid-January include this fix?
>
> Thanks.
> BR
> -GU
>
> ------------------ Original Message ------------------
> *From:* "Todd Lipcon";<t...@cloudera.com>;
> *Sent:* Monday, December 12, 2016, 2:26 PM
> *To:* "user"<user@kudu.apache.org>;
> *Subject:* Re: About data file size and on-disk size
>
> Just a follow-up note here: if you did end up cherry-picking that change,
> you should also be sure to cherry-pick
> faa587c639aa9e5dcf3fac04259f46ba1921140a
> to avoid a potential data loss bug.
>
> On Wed, Nov 30, 2016 at 9:00 AM, Adar Dembo <a...@cloudera.com> wrote:
>
>> If you're comfortable rebuilding Kudu from source, you can apply
>> https://gerrit.cloudera.org/#/c/5254, rebuild the tserver, and restart
>> it. Once the tserver is done restarting, it should trim the empty space off
>> of the ends of all of your container data files.
>>
>> Otherwise, you'll have to wait until the next Kudu release.
>>
>> On Tue, Nov 29, 2016 at 5:48 PM, 阿香 <1654407...@qq.com> wrote:
>>
>>> Hi Todd,
>>>
>>> Thanks.
>>> From the results, I think you have found the bug.
>>> By the way, can I get back the wasted disk space?
>>>
>>> # du -sm 542d51e55d524034a5274600c31abd11.data
>>> 29 542d51e55d524034a5274600c31abd11.data
>>>
>>> # filefrag -v -b 542d51e55d524034a5274600c31abd11.data
>>> filefrag: -b needs a blocksize option, assuming 1024-byte blocks.
>>> Filesystem type is: ef53
>>> File size of 542d51e55d524034a5274600c31abd11.data is 10767867904 (10515496 blocks of 1024 bytes)
>>> ext: logical_offset:     physical_offset:       length: expected:  flags:
>>>   0: 10486144..10497543: 278086588.. 278097987:  11400:            unwritten
>>>   1: 10497544..10514191: 278691588.. 278708235:  16648: 278097988: unwritten
>>>   2: 10514192..10514199: 279581160.. 279581167:      8: 278708236: unwritten
>>>   3: 10514200..10514203: 280291284.. 280291287:      4: 279581168: unwritten
>>>   4: 10514204..10514227: 280652252.. 280652275:     24: 280291288: unwritten
>>>   5: 10514228..10515259: 281289216.. 281290247:   1032: 280652276: unwritten
>>>   6: 10515260..10515263: 282068816.. 282068819:      4: 281290248: unwritten
>>>   7: 10515264..10515495: 283429184.. 283429415:    232: 282068820: unwritten,eof
>>> 542d51e55d524034a5274600c31abd11.data: 8 extents found
>>>
>>> # echo $[11400 + 16648 + 1032 + 232]
>>> 29312
>>>
>>> # ls -l 542d51e55d524034a5274600c31abd11.data
>>> -rw-r--r-- 1 kudu kudu 10767867904 Oct 26 06:51 542d51e55d524034a5274600c31abd11.data
>>>
>>> # ls -lh 542d51e55d524034a5274600c31abd11.data
>>> -rw-r--r-- 1 kudu kudu 11G Oct 26 06:51 542d51e55d524034a5274600c31abd11.data
>>>
>>> BR
>>> -GU
>>>
>>> ------------------ Original Message ------------------
>>> *From:* "Todd Lipcon";<t...@cloudera.com>;
>>> *Sent:* Tuesday, November 29, 2016, 4:15 AM
>>> *To:* "user"<user@kudu.apache.org>;
>>> *Subject:* Re: About data file size and on-disk size
>>>
>>> Hi Xiang,
>>>
>>> Adar and I did some investigation and came up with a likely cause:
>>> https://issues.apache.org/jira/browse/KUDU-1764
>>>
>>> Can you please try the following on one of your .data files (preferably
>>> one which has a modification time a few weeks old)?
>>>
>>> $ du -sm abcdef.data
>>> $ filefrag -v -b abcdef.data
>>> $ ls -l abcdef.data
>>>
>>> We can use this to confirm whether you are hitting the same bug we just
>>> discovered.
>>>
>>> Thanks
>>> -Todd
>>>
>>> On Thu, Nov 24, 2016 at 6:57 AM, 阿香 <1654407...@qq.com> wrote:
>>>
>>>> > If the workload doesn't involve normal (merging) compactions, then
>>>> > UNDOs won't be GCed at all. So, if you have a relatively static set of
>>>> > keys, and are just updating them without causing many new inserts, this
>>>> > could be the problem.
>>>>
>>>> The keys are not relatively static; they are increasing all the time.
>>>> The key of the table is a UUID string with hash partitioning (16 buckets).
>>>> Currently there are about 1,000,000,000 rows in this cluster.
>>>>
>>>> Will these big data files increase the latency of the upsert
>>>> operation?
>>>>
>>>> I saw metrics like the following in the Kudu web UI.
>>>>
>>>> {
>>>>   "name": "write_op_duration_client_propagated_consistency",
>>>>   "total_count": 8568729,
>>>>   "min": 116,
>>>>   "mean": 2499.56,
>>>>   "percentile_75": 2176,
>>>>   "percentile_95": 7680,
>>>>   "percentile_99": 29568,
>>>>   "percentile_99_9": 78336,
>>>>   "percentile_99_99": 123904,
>>>>   "max": 1562967,
>>>>   "total_sum": 21418050385
>>>> }
>>>>
>>>> ------------------ Original Message ------------------
>>>> *From:* "Todd Lipcon";<t...@cloudera.com>;
>>>> *Sent:* Thursday, November 24, 2016, 11:55 AM
>>>> *To:* "user"<user@kudu.apache.org>;
>>>> *Subject:* Re: About data file size and on-disk size
>>>>
>>>> On Wed, Nov 23, 2016 at 2:30 PM, Adar Dembo <a...@cloudera.com> wrote:
>>>>
>>>>> The difference between du with --apparent-size and without suggests
>>>>> that hole punching is working properly. Quick back of the envelope
>>>>> math shows that with 8133 containers, each container is just over 10G
>>>>> of "apparent size", which means nearly all of the containers were full
>>>>> at one point or another. That makes sense; it means that Kudu is
>>>>> generally writing to a small number of containers at any given time,
>>>>> but is filling them up over time.
>>>>>
>>>>> I took a look at the tablet disk estimation code and found that it
>>>>> excludes the size of all of the UNDO data blocks. I think this is
>>>>> because the size estimation is also used to drive decisions regarding
>>>>> delta compaction, but with an UPSERT-only workload like yours, we'd
>>>>> expect to see many UNDO data blocks over time as updated (and now
>>>>> historical) data is further and further compacted.
>>>>> I filed https://issues.apache.org/jira/browse/KUDU-1755 to track these issues.
>>>>> However, if this were the case, I'd expect the "tablet history GC"
>>>>> feature (new in Kudu 1.0) to remove old data that was mutated in an
>>>>> UPSERT. The default value for --tablet_history_max_age_sec (which
>>>>> controls how old the data must be before it is removed) is 15 minutes;
>>>>> have you changed the value of this flag? If not, could you look at
>>>>> your tserver log for the presence of major delta compactions? Look for
>>>>> references to MajorDeltaCompactionOp. If there aren't any, that means
>>>>> Kudu isn't getting opportunities to age out old data.
>>>>>
>>>>
>>>> Worth noting that major delta compaction doesn't actually remove old
>>>> UNDOs. There are still some open JIRAs about scheduling tasks to age-off
>>>> UNDOs, but as it stands today, they only get collected during a normal
>>>> compaction.
>>>>
>>>> If the workload doesn't involve normal (merging) compactions, then
>>>> UNDOs won't be GCed at all. So, if you have a relatively static set of
>>>> keys, and are just updating them without causing many new inserts, this
>>>> could be the problem.
>>>>
>>>>>
>>>>> It's also possible that simply not accounting for the composite index
>>>>> and bloom blocks (see KUDU-1755) is the reason. Take a look at
>>>>> https://issues.apache.org/jira/browse/KUDU-624?focusedCommentId=15165054&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15165054
>>>>> and run the same two commands to compare the total on-disk size of all
>>>>> the .data files to the number of bytes that the tserver is aware of.
>>>>> If the two numbers are close, it's a sign that, at the very least,
>>>>> Kudu is aware of and actively managing all that disk space (i.e.
>>>>> there's no "orphaned" data).
>>>>>
>>>>
>>>> -Todd
>>>>
>>>>>
>>>>> On Wed, Nov 23, 2016 at 12:39 AM, 阿香 <1654407...@qq.com> wrote:
>>>>> > Hi,
>>>>> >
>>>>> >> Can you tell us a little bit more about your table, as well as any
>>>>> >> deleted tables you once had? How many columns did they have?
>>>>> >
>>>>> > I have not deleted any tables before.
>>>>> > There is only one table with 12 columns (string and int) in the Kudu
>>>>> > cluster.
>>>>> > This cluster has three tablet servers.
>>>>> >
>>>>> > I use the upsert operation to insert and update rows.
>>>>> >
>>>>> >> what version of Kudu are you using?
>>>>> >
>>>>> > kudu -version
>>>>> > kudu 1.0.0
>>>>> > revision 6f6e49ca98c3e3be7d81f88ab8a0f9173959b191
>>>>> > build type RELEASE
>>>>> > built by jenkins at 16 Sep 2016 00:23:10 PST on
>>>>> > impala-ec2-pkg-centos-7-0dc0.vpc.cloudera.com
>>>>> > build id 2016-09-16_00-03-04
>>>>> >
>>>>> >> It's conceivable that there's a pathological case wherein each of
>>>>> >> the 8133 data files is used, one at a time, to store data blocks,
>>>>> >> which would cause each to allocate 32 MB of disk space (totaling
>>>>> >> about 254G).
>>>>> >
>>>>> > Can the number of data files be decreased? The SSD disk is almost
>>>>> > out of space now.
>>>>> >
>>>>> >> Can you try running du with --apparent-size and compare the results?
>>>>> >
>>>>> > # du -sh /data/kudu/tserver/data/
>>>>> > 213G /data/kudu/tserver/data/
>>>>> > # du -sh --apparent-size /data/kudu/tserver/data/
>>>>> > 81T /data/kudu/tserver/data/
>>>>> >
>>>>> >> What filesystem is being used for /data/kudu/tserver/data?
>>>>> >
>>>>> > # file -s /dev/vdb1
>>>>> > /dev/vdb1: Linux rev 1.0 ext4 filesystem data,
>>>>> > UUID=9f95ba79-f387-42be-a43f-d1421c83e2e5 (needs journal recovery)
>>>>> > (extents) (64bit) (large files) (huge files)
>>>>> >
>>>>> > Thanks.
>>>>> >
>>>>> > ------------------ Original Message ------------------
>>>>> > From: "Adar Dembo";<a...@cloudera.com>;
>>>>> > Sent: Wednesday, November 23, 2016, 9:35 AM
>>>>> > To: "user"<user@kudu.apache.org>;
>>>>> > Subject: Re: About data file size and on-disk size
>>>>> >
>>>>> > Also, if you haven't explicitly disabled it, each .data file is going
>>>>> > to preallocate 32 MB of data when used. It's conceivable that there's
>>>>> > a pathological case wherein each of the 8133 data files is used, one
>>>>> > at a time, to store data blocks, which would cause each to allocate
>>>>> > 32 MB of disk space (totaling about 254G).
>>>>> >
>>>>> > Can you tell us a little bit more about your table, as well as any
>>>>> > deleted tables you once had? How many columns did they have? Also,
>>>>> > what version of Kudu are you using?
>>>>> >
>>>>> > On Tue, Nov 22, 2016 at 11:39 AM, Adar Dembo <a...@cloudera.com> wrote:
>>>>> >> The files in /data/kudu/tserver/data are supposed to be sparse; that
>>>>> >> is, when Kudu decides to delete data, it'll punch a hole in one of
>>>>> >> those files, allowing the filesystem to reclaim the space in that
>>>>> >> hole. Yet, 'du' should reflect that because it measures real space
>>>>> >> usage. Can you try running du with --apparent-size and compare the
>>>>> >> results? If they're the same or similar, it suggests that the hole
>>>>> >> punching behavior isn't working properly. What distribution are you
>>>>> >> using? What filesystem is being used for /data/kudu/tserver/data?
>>>>> >>
>>>>> >> You should also check if maybe Kudu has failed to delete the data
>>>>> >> belonging to deleted tables. Has this tserver hosted any tablets
>>>>> >> belonging to tables that have since been deleted? Does the tserver
>>>>> >> log describe any errors when trying to delete the data belonging to
>>>>> >> those tablets?
>>>>> >>
>>>>> >> On Tue, Nov 22, 2016 at 7:19 AM, 阿香 <1654407...@qq.com> wrote:
>>>>> >>> Hi,
>>>>> >>>
>>>>> >>> I have a table with 16 buckets over 3 physical machines. The
>>>>> >>> tablet only has one replica.
>>>>> >>>
>>>>> >>> The Tablets web UI shows that each tablet has around ~4.5G on-disk
>>>>> >>> size.
>>>>> >>>
>>>>> >>> On one machine, there are 8 tablets in total, so the on-disk size
>>>>> >>> is about 4.5*8 = 36G.
>>>>> >>>
>>>>> >>> However, on the same machine, the disk space actually used is
>>>>> >>> about 211G.
>>>>> >>>
>>>>> >>> # du -sh /data/kudu/tserver/data/
>>>>> >>> 210G /data/kudu/tserver/data/
>>>>> >>>
>>>>> >>> # find /data/kudu/tserver/data/ -name "*.data" | wc -l
>>>>> >>> 8133
>>>>> >>>
>>>>> >>> What’s the difference between the data file size and the on-disk
>>>>> >>> size?
>>>>> >>>
>>>>> >>> Can files in /data/kudu/tserver/data/ be compacted or purged, or
>>>>> >>> can some of them be deleted?
>>>>> >>>
>>>>> >>> Thanks very much.
>>>>> >>>
>>>>> >>> BR
>>>>> >>>
>>>>> >>> Brooks
>>>>> >>>
>>>>>
>>>>
>>>> --
>>>> Todd Lipcon
>>>> Software Engineer, Cloudera
>>>>
>>>
>>> --
>>> Todd Lipcon
>>> Software Engineer, Cloudera
>>>
>>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>

--
Todd Lipcon
Software Engineer, Cloudera
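
For anyone who wants to repeat the `du` versus `du --apparent-size` comparison from this thread in a script, here is a minimal sketch. It is not part of Kudu; the function names (`file_sizes`, `summarize`, `make_sparse_demo`) are hypothetical, and it assumes Linux, where `st_blocks` is reported in 512-byte units. The demo file is created in a temporary directory, not in a real Kudu data directory.

```python
import os
import tempfile


def file_sizes(path):
    """Return (apparent_bytes, allocated_bytes) for a single file."""
    st = os.stat(path)
    # st_size is the apparent size (what `ls -l` and `du --apparent-size` show).
    # st_blocks counts 512-byte units actually allocated (what plain `du` reports).
    return st.st_size, st.st_blocks * 512


def summarize(data_dir, suffix=".data"):
    """Sum apparent and allocated sizes of all files ending in `suffix`."""
    count = apparent = allocated = 0
    for root, _dirs, files in os.walk(data_dir):
        for name in files:
            if name.endswith(suffix):
                a, b = file_sizes(os.path.join(root, name))
                apparent += a
                allocated += b
                count += 1
    return count, apparent, allocated


def make_sparse_demo(directory, size=10 * 1024 * 1024):
    """Create a file whose apparent size is `size` without writing any data."""
    path = os.path.join(directory, "demo.data")
    with open(path, "wb") as f:
        f.truncate(size)
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        make_sparse_demo(d)
        count, apparent, allocated = summarize(d)
        print(f"{count} file(s): apparent={apparent} bytes, "
              f"allocated={allocated} bytes")
```

Pointing `summarize` at a tserver data directory should give totals comparable to the two `du` invocations quoted above; a large gap between the two numbers is expected for sparse container files.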