Re: WAL size estimation

Pavel Martynov Wed, 26 Jun 2019 23:59:22 -0700

Hi Todd,

This tablet has disappeared from the WAL path. I think it belonged to a time
partition that we have already removed.


On Thu, Jun 27, 2019 at 08:58, Todd Lipcon <[email protected]> wrote:

> Hey Pavel,
>
> I went back and looked at the source here. It appears that 24MB is the
> expected size for an index file -- each entry is 24 bytes and the index
> file should keep 1M entries.
>
> That said, for a "cold tablet" (in which you'd have only a small number of
> actual WAL files) I would expect only a single index file. The example you
> gave where you have 12 index files but only one WAL segment seems quite
> fishy to me. Having 12 index files indicates you have 12M separate WAL
> entries, but given you have only 8MB of WAL, that indicates each entry is
> less than one byte large, which doesn't make much sense at all.
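
The arithmetic in the two paragraphs above can be checked with a short sketch. It uses only the figures quoted in this thread (24 bytes per index entry, 1M entries per index file, 12 index files next to one 8 MB segment); reading "1M" as 10**6 is an assumption, and the function names are made up for illustration.

```python
# Figures quoted in the thread: 24 bytes per index entry, 1M entries per file.
ENTRY_BYTES = 24
ENTRIES_PER_INDEX_FILE = 1_000_000  # assumption: "1M" means 10**6


def index_file_bytes(entry_bytes=ENTRY_BYTES, entries=ENTRIES_PER_INDEX_FILE):
    """Apparent size of one fully populated index file."""
    return entry_bytes * entries


def implied_bytes_per_entry(wal_bytes, index_files, entries=ENTRIES_PER_INDEX_FILE):
    """Average WAL bytes per entry if every index file were completely full."""
    return wal_bytes / (index_files * entries)


size = index_file_bytes()
print(f"{size} bytes = {size / 2**20:.1f} MiB per index file")
print(f"{implied_bytes_per_entry(8 * 2**20, 12):.2f} WAL bytes per entry")
```

Under these assumptions a full index file comes to 24,000,000 bytes, in line with the 23M that du reports further down the thread, and 12 full index files against 8 MB of WAL would indeed mean less than one byte per entry.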
>
> If you go back and look at that same tablet now, did it eventually GC
> those log index files?
>
> -Todd
>
>
>
> On Wed, Jun 19, 2019 at 1:53 AM Pavel Martynov <[email protected]> wrote:
>
>> > Try adding the '-p' flag here? That should show preallocated extents.
>> > Would be interesting to run it on some index file which is larger than 1MB,
>> > for example.
>>
>> # du -h --apparent-size index.000000108
>> 23M     index.000000108
>>
>> # du -h index.000000108
>> 23M     index.000000108
>>
>> # xfs_bmap -v -p index.000000108
>> index.000000108:
>>  EXT: FILE-OFFSET      BLOCK-RANGE            AG AG-OFFSET          TOTAL FLAGS
>>    0: [0..2719]:       1175815920..1175818639  2 (3704560..3707279)  2720 00000
>>    1: [2720..5111]:    1175828904..1175831295  2 (3717544..3719935)  2392 00000
>>    2: [5112..7767]:    1175835592..1175838247  2 (3724232..3726887)  2656 00000
>>    3: [7768..10567]:   1175849896..1175852695  2 (3738536..3741335)  2800 00000
>>    4: [10568..15751]:  1175877808..1175882991  2 (3766448..3771631)  5184 00000
>>    5: [15752..18207]:  1175898864..1175901319  2 (3787504..3789959)  2456 00000
>>    6: [18208..20759]:  1175909192..1175911743  2 (3797832..3800383)  2552 00000
>>    7: [20760..23591]:  1175921616..1175924447  2 (3810256..3813087)  2832 00000
>>    8: [23592..26207]:  1175974872..1175977487  2 (3863512..3866127)  2616 00000
>>    9: [26208..28799]:  1175989496..1175992087  2 (3878136..3880727)  2592 00000
>>   10: [28800..31199]:  1175998552..1176000951  2 (3887192..3889591)  2400 00000
>>   11: [31200..33895]:  1176008336..1176011031  2 (3896976..3899671)  2696 00000
>>   12: [33896..36591]:  1176031696..1176034391  2 (3920336..3923031)  2696 00000
>>   13: [36592..39191]:  1176037440..1176040039  2 (3926080..3928679)  2600 00000
>>   14: [39192..41839]:  1176072008..1176074655  2 (3960648..3963295)  2648 00000
>>   15: [41840..44423]:  1176097752..1176100335  2 (3986392..3988975)  2584 00000
>>   16: [44424..46879]:  1176132144..1176134599  2 (4020784..4023239)  2456 00000
>>
>>
>>
>>
>>
>> On Wed, Jun 19, 2019 at 10:56, Todd Lipcon <[email protected]> wrote:
>>
>>>
>>>
>>> On Wed, Jun 19, 2019 at 12:49 AM Pavel Martynov <[email protected]>
>>> wrote:
>>>
>>>> Hi Todd, thanks for the answer!
>>>>
>>>> > Any chance you've done something like copy the files away and back
>>>> > that might cause them to lose their sparseness?
>>>>
>>>> No, I don't think so. Recently we had some stability problems with Kudu
>>>> and ran a rebalance a couple of times, in case that is related. But we
>>>> have never used filesystem commands like cp/mv on the Kudu directories.
>>>>
>>>> I ran du on the directory that holds all the WALs:
>>>> # du -sh /mnt/data01/kudu-tserver-wal/
>>>> 12G     /mnt/data01/kudu-tserver-wal/
>>>>
>>>> # du -sh --apparent-size /mnt/data01/kudu-tserver-wal/
>>>> 25G     /mnt/data01/kudu-tserver-wal/
>>>>
>>>> And on the WAL directory with many index files:
>>>> # du -sh --apparent-size /mnt/data01/kudu-tserver-wal/wals/779a382ea4e6464aa80ea398070a391f
>>>> 306M    /mnt/data01/kudu-tserver-wal/wals/779a382ea4e6464aa80ea398070a391f
>>>>
>>>> # du -sh /mnt/data01/kudu-tserver-wal/wals/779a382ea4e6464aa80ea398070a391f
>>>> 296M    /mnt/data01/kudu-tserver-wal/wals/779a382ea4e6464aa80ea398070a391f
>>>>
>>>>
>>>> > Also, any chance you're using XFS here?
>>>>
>>>> Yes, exactly XFS. We use CentOS 7.6.
>>>>
>>>> Interestingly, there are not many holes in the index files in
>>>> /mnt/data01/kudu-tserver-wal/wals/779a382ea4e6464aa80ea398070a391f (the WAL
>>>> dir that I mentioned before). There is only a single hole, in one index
>>>> file out of 13:
>>>> # xfs_bmap -v index.000000120
>>>>
>>>
>>> Try adding the '-p' flag here? That should show preallocated extents.
>>> Would be interesting to run it on some index file which is larger than 1MB,
>>> for example.
>>>
>>>
>>>> index.000000120:
>>>>  EXT: FILE-OFFSET      BLOCK-RANGE            AG AG-OFFSET          TOTAL
>>>>    0: [0..4231]:       1176541248..1176545479  2 (4429888..4434119)  4232
>>>>    1: [4232..9815]:    1176546592..1176552175  2 (4435232..4440815)  5584
>>>>    2: [9816..11583]:   1176552832..1176554599  2 (4441472..4443239)  1768
>>>>    3: [11584..13319]:  1176558672..1176560407  2 (4447312..4449047)  1736
>>>>    4: [13320..15239]:  1176565336..1176567255  2 (4453976..4455895)  1920
>>>>    5: [15240..17183]:  1176570776..1176572719  2 (4459416..4461359)  1944
>>>>    6: [17184..18999]:  1176575856..1176577671  2 (4464496..4466311)  1816
>>>>    7: [19000..20927]:  1176593552..1176595479  2 (4482192..4484119)  1928
>>>>    8: [20928..22703]:  1176599128..1176600903  2 (4487768..4489543)  1776
>>>>    9: [22704..24575]:  1176602704..1176604575  2 (4491344..4493215)  1872
>>>>   10: [24576..26495]:  1176611936..1176613855  2 (4500576..4502495)  1920
>>>>   11: [26496..26655]:  1176615040..1176615199  2 (4503680..4503839)   160
>>>>   12: [26656..46879]:  hole                                         20224
>>>>
>>>> But in some other WALs I see something like this:
>>>> # xfs_bmap -v /mnt/data01/kudu-tserver-wal/wals/508ecdfa89054bdb97a02078a91822af/index.000000000
>>>>
>>>> /mnt/data01/kudu-tserver-wal/wals/508ecdfa89054bdb97a02078a91822af/index.000000000:
>>>>  EXT: FILE-OFFSET      BLOCK-RANGE            AG AG-OFFSET        TOTAL
>>>>    0: [0..7]:          1758753776..1758753783  3 (586736..586743)     8
>>>>    1: [8..46879]:      hole                                       46872
>>>>
>>>> It looks like only 8 blocks are actually used and all the remaining
>>>> blocks are a hole.
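
A quick way to total the allocated and hole extents in such a listing is sketched below. It parses the text of `xfs_bmap -v` output as shown in this thread; the sample string is copied from the listing above, the function name is made up for illustration, and the 512-byte unit is the one xfs_bmap documents for its offsets.

```python
import re

BLOCK_BYTES = 512  # xfs_bmap reports file offsets in 512-byte units

# Matches extent rows such as "   0: [0..7]:   1758753776..1758753783 ..."
EXTENT_RE = re.compile(r"^\s*\d+:\s+\[(\d+)\.\.(\d+)\]:\s+(\S+)")


def summarize_bmap(text):
    """Return (allocated_blocks, hole_blocks) from `xfs_bmap -v` output."""
    allocated = holes = 0
    for line in text.splitlines():
        match = EXTENT_RE.match(line)
        if not match:
            continue  # header line or file-name line
        start, end, kind = int(match.group(1)), int(match.group(2)), match.group(3)
        length = end - start + 1
        if kind == "hole":
            holes += length
        else:
            allocated += length
    return allocated, holes


SAMPLE = """\
 EXT: FILE-OFFSET      BLOCK-RANGE            AG AG-OFFSET        TOTAL
   0: [0..7]:          1758753776..1758753783  3 (586736..586743)     8
   1: [8..46879]:      hole                                       46872
"""

allocated, holes = summarize_bmap(SAMPLE)
print(f"allocated: {allocated * BLOCK_BYTES} bytes, hole: {holes * BLOCK_BYTES} bytes")
```

For this sample that works out to 8 allocated blocks (4 KiB on disk) against 46,872 blocks of hole.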
>>>>
>>>>
>>>> So it looks like I can use the formulas with confidence.
>>>> Worst case: 8 MB/segment * 80 max segments * 2000 tablets = 1,280,000
>>>> MB = ~1.3 TB (+ some minor index overhead)
>>>> Best case: 8 MB/segment * 1 segment * 2000 tablets = 16,000 MB =
>>>> ~16 GB (+ some minor index overhead)
>>>>
>>>> Right?
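
The two estimates above differ only in the number of segments retained per tablet, so they can be written as one small function. This is a sketch of the arithmetic from this message only; the function and parameter names are invented here, and the optional index term is a placeholder for the "minor index overhead" rather than a figure from Kudu.

```python
def wal_capacity_mb(segment_mb, segments_per_tablet, tablets, index_mb_per_tablet=0):
    """Estimated WAL footprint in MB: segment files plus an optional index allowance."""
    return (segment_mb * segments_per_tablet + index_mb_per_tablet) * tablets


SEGMENT_MB = 8   # segment size used in the thread
TABLETS = 2000   # tablet count used in the thread

upper = wal_capacity_mb(SEGMENT_MB, 80, TABLETS)  # 80 segments per tablet
lower = wal_capacity_mb(SEGMENT_MB, 1, TABLETS)   # 1 segment per tablet

print(f"80 segments per tablet: {upper:,} MB (~{upper / 1e6:.2f} TB)")
print(f" 1 segment per tablet:  {lower:,} MB (~{lower / 1e3:.0f} GB)")
```

With these inputs the results are 1,280,000 MB and 16,000 MB respectively.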
>>>>
>>>>
>>>> On Wed, Jun 19, 2019 at 09:35, Todd Lipcon <[email protected]> wrote:
>>>>
>>>>> Hi Pavel,
>>>>>
>>>>> That's not quite expected. For example, on one of our test clusters
>>>>> here, we have about 65GB of WALs and about 1GB of index files. If I recall
>>>>> correctly, the index files store 8 bytes per WAL entry, so typically a
>>>>> couple orders of magnitude smaller than the WALs themselves.
>>>>>
>>>>> One thing is that the index files are sparse. Any chance you've done
>>>>> something like copy the files away and back that might cause them to lose
>>>>> their sparseness? If I use du --apparent-size on mine, it's a total of
>>>>> about 180GB vs the 1GB of actual size.
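
The difference between the apparent and the actual size of a sparse file can be reproduced with a small sketch. It creates a throwaway file in a temporary directory (the file name is hypothetical) and compares `st_size` with `st_blocks`, assuming a Linux-style filesystem where `st_blocks` counts 512-byte units and sparse files are supported; on other platforms the allocated figure may differ or be unavailable.

```python
import os
import tempfile


def disk_usage(path):
    """Return (apparent_bytes, allocated_bytes), roughly du --apparent-size vs du."""
    st = os.stat(path)
    # Assumption: st_blocks is in 512-byte units, as on Linux.
    return st.st_size, st.st_blocks * 512


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "index.demo")  # hypothetical file name
    with open(path, "wb") as f:
        f.write(b"x" * 4096)    # a little real data at the start
        f.truncate(24_000_000)  # extend without writing: the tail is a hole
    apparent, allocated = disk_usage(path)

print(f"apparent={apparent} bytes, allocated={allocated} bytes")
```

Where sparse files are supported, the allocated figure should come out far below the apparent 24,000,000 bytes; copying such a file with a tool that writes the zeros out explicitly is one way the two numbers can converge.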
>>>>>
>>>>> Also, any chance you're using XFS here? XFS sometimes likes to
>>>>> preallocate large amounts of data into files while they're open, and only
>>>>> frees it up if disk space is contended. I think you can use 'xfs_bmap' on
>>>>> an index file to see the allocation status, which might be interesting.
>>>>>
>>>>> -Todd
>>>>>
>>>>> On Tue, Jun 18, 2019 at 11:12 PM Pavel Martynov <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hi guys!
>>>>>>
>>>>>> We want to buy SSDs for the TServer WALs in our cluster. I'm working on
>>>>>> capacity estimation for these SSDs using the "Getting Started with Kudu"
>>>>>> book, Chapter 4, Write-Ahead Log (
>>>>>> https://www.oreilly.com/library/view/getting-started-with/9781491980248/ch04.html#idm139738927926240
>>>>>> ).
>>>>>>
>>>>>> NB: we use default Kudu WAL configuration settings.
>>>>>>
>>>>>> There is a formula for the worst case:
>>>>>> 8 MB/segment * 80 max segments * 2000 tablets = 1,280,000 MB = ~1.3 TB
>>>>>>
>>>>>> So this formula takes into account only the segment files. But in our
>>>>>> cluster, I see that every segment file has >= 1 corresponding index
>>>>>> files, and every index file is actually larger than the segment file.
>>>>>>
>>>>>> Numbers from one of our nodes.
>>>>>> WALs count:
>>>>>> $ ls /mnt/data01/kudu-tserver-wal/wals/ | wc -l
>>>>>> 711
>>>>>>
>>>>>> Overall WAL size:
>>>>>> $ du -d 0 -h /mnt/data01/kudu-tserver-wal/
>>>>>> 13G     /mnt/data01/kudu-tserver-wal/
>>>>>>
>>>>>> Size of all segment files:
>>>>>> $ find /mnt/data01/kudu-tserver-wal/ -type f -name 'wal-*' -exec du -ch {} + | grep total$
>>>>>> 6.1G    total
>>>>>>
>>>>>> Size of all index files:
>>>>>> $ find /mnt/data01/kudu-tserver-wal/ -type f -name 'index*' -exec du -ch {} + | grep total$
>>>>>> 6.5G    total
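
The same breakdown can be gathered in a single pass over the WAL directory. The sketch below groups on-disk usage by file-name prefix (`wal-` and `index`, as in the find commands above); the function name and the demo directory layout are invented for illustration, and `st_blocks` is again assumed to count 512-byte units as on Linux.

```python
import os
import tempfile
from collections import defaultdict


def wal_usage_by_kind(root):
    """Sum allocated bytes under root, grouped as 'segment', 'index' or 'other'."""
    totals = defaultdict(int)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if name.startswith("wal-"):
                kind = "segment"
            elif name.startswith("index"):
                kind = "index"
            else:
                kind = "other"
            st = os.stat(os.path.join(dirpath, name))
            totals[kind] += st.st_blocks * 512  # on-disk size, like plain du
    return dict(totals)


# Demo on a throwaway tree; point the function at the real WAL dir instead.
with tempfile.TemporaryDirectory() as root:
    tablet_dir = os.path.join(root, "wals", "tablet-demo")  # hypothetical layout
    os.makedirs(tablet_dir)
    for name, size in [("wal-000000001", 8192), ("index.000000000", 4096)]:
        with open(os.path.join(tablet_dir, name), "wb") as f:
            f.write(b"\0" * size)
    usage = wal_usage_by_kind(root)

print(usage)
```

Replacing `st.st_blocks * 512` with `st.st_size` would give the apparent sizes instead, which is the comparison discussed later in the thread.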
>>>>>>
>>>>>> So I have questions.
>>>>>>
>>>>>> 1. How can I estimate the size of index files?
>>>>>> It looks like in our cluster the total size of the index files is
>>>>>> approximately equal to the total size of the segment files.
>>>>>>
>>>>>> 2. There are some WALs with more than one index file. For example:
>>>>>> $ ls -lh /mnt/data01/kudu-tserver-wal/wals/779a382ea4e6464aa80ea398070a391f/
>>>>>> total 296M
>>>>>> -rw-r--r-- 1 root root  23M Jun 18 21:31 index.000000108
>>>>>> -rw-r--r-- 1 root root  23M Jun 18 21:41 index.000000109
>>>>>> -rw-r--r-- 1 root root  23M Jun 18 21:52 index.000000110
>>>>>> -rw-r--r-- 1 root root  23M Jun 18 22:10 index.000000111
>>>>>> -rw-r--r-- 1 root root  23M Jun 18 22:22 index.000000112
>>>>>> -rw-r--r-- 1 root root  23M Jun 18 22:35 index.000000113
>>>>>> -rw-r--r-- 1 root root  23M Jun 18 22:48 index.000000114
>>>>>> -rw-r--r-- 1 root root  23M Jun 18 23:01 index.000000115
>>>>>> -rw-r--r-- 1 root root  23M Jun 18 23:14 index.000000116
>>>>>> -rw-r--r-- 1 root root  23M Jun 18 23:27 index.000000117
>>>>>> -rw-r--r-- 1 root root  23M Jun 18 23:40 index.000000118
>>>>>> -rw-r--r-- 1 root root  23M Jun 18 23:52 index.000000119
>>>>>> -rw-r--r-- 1 root root  23M Jun 19 01:13 index.000000120
>>>>>> -rw-r--r-- 1 root root 8.0M Jun 19 01:13 wal-000007799
>>>>>>
>>>>>> Is this a normal situation?
>>>>>>
>>>>>> 3. Not a question. Please consider adding documentation about
>>>>>> estimating WAL storage. Also, I couldn't find any mention of index
>>>>>> files, except here:
>>>>>> https://kudu.apache.org/docs/scaling_guide.html#file_descriptors.
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>> --
>>>>>> with best regards, Pavel Martynov
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Todd Lipcon
>>>>> Software Engineer, Cloudera
>>>>>
>>>>
>>>>
>>>> --
>>>> with best regards, Pavel Martynov
>>>>
>>>
>>>
>>> --
>>> Todd Lipcon
>>> Software Engineer, Cloudera
>>>
>>
>>
>> --
>> with best regards, Pavel Martynov
>>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>


-- 
with best regards, Pavel Martynov
