On Wed, Jun 19, 2019 at 12:49 AM Pavel Martynov <[email protected]> wrote:

> Hi Todd, thanks for the answer!
>
> > Any chance you've done something like copy the files away and back that
> might cause them to lose their sparseness?
>
> No, I don't think so. We recently had some stability problems with Kudu and
> ran a rebalance a couple of times, in case that is related. But we never
> used filesystem commands like cp/mv against the Kudu directories.
>
> I ran du on the directory that holds all the WALs:
> # du -sh /mnt/data01/kudu-tserver-wal/
> 12G     /mnt/data01/kudu-tserver-wal/
>
> # du -sh --apparent-size /mnt/data01/kudu-tserver-wal/
> 25G     /mnt/data01/kudu-tserver-wal/
>
> And on a WAL with many index files:
> # du -sh --apparent-size
> /mnt/data01/kudu-tserver-wal/wals/779a382ea4e6464aa80ea398070a391f
> 306M    /mnt/data01/kudu-tserver-wal/wals/779a382ea4e6464aa80ea398070a391f
>
> # du -sh /mnt/data01/kudu-tserver-wal/wals/779a382ea4e6464aa80ea398070a391f
> 296M    /mnt/data01/kudu-tserver-wal/wals/779a382ea4e6464aa80ea398070a391f
>
>
> > Also, any chance you're using XFS here?
>
> Yes, exactly XFS. We use CentOS 7.6.
>
> Interestingly, there are not many holes in the index files in
> /mnt/data01/kudu-tserver-wal/wals/779a382ea4e6464aa80ea398070a391f (the WAL
> directory I mentioned before). There is only a single hole, in a single index
> file (out of 13 files):
> # xfs_bmap -v index.000000120
>

Try adding the '-p' flag here? That should show preallocated extents. Would
be interesting to run it on some index file which is larger than 1MB, for
example.
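
As a complement to xfs_bmap, here is a minimal Python sketch that compares apparent size with allocated size for every index file, roughly what `du --apparent-size` and plain `du` report. The function names are made up for illustration, and it assumes a Linux-style stat where st_blocks is counted in 512-byte units. It only gives per-file totals and does not show individual extents the way xfs_bmap does.

```python
import os


def allocation_report(path):
    """Return (apparent_bytes, allocated_bytes) for a single file."""
    st = os.stat(path)
    # Assumption: st_blocks is in 512-byte units (true on Linux),
    # independent of the filesystem's own block size.
    return st.st_size, st.st_blocks * 512


def report_index_files(wal_dir):
    """Yield (path, apparent_bytes, allocated_bytes) for each index.* file."""
    for root, _dirs, files in os.walk(wal_dir):
        for name in sorted(files):
            if name.startswith("index."):
                path = os.path.join(root, name)
                apparent, allocated = allocation_report(path)
                yield path, apparent, allocated
```

Calling `list(report_index_files("/mnt/data01/kudu-tserver-wal"))` would list each index file with both sizes. A file whose allocated size is close to its apparent size is not benefiting much from sparseness.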


> index.000000120:
>  EXT: FILE-OFFSET      BLOCK-RANGE            AG AG-OFFSET          TOTAL
>    0: [0..4231]:       1176541248..1176545479  2 (4429888..4434119)  4232
>    1: [4232..9815]:    1176546592..1176552175  2 (4435232..4440815)  5584
>    2: [9816..11583]:   1176552832..1176554599  2 (4441472..4443239)  1768
>    3: [11584..13319]:  1176558672..1176560407  2 (4447312..4449047)  1736
>    4: [13320..15239]:  1176565336..1176567255  2 (4453976..4455895)  1920
>    5: [15240..17183]:  1176570776..1176572719  2 (4459416..4461359)  1944
>    6: [17184..18999]:  1176575856..1176577671  2 (4464496..4466311)  1816
>    7: [19000..20927]:  1176593552..1176595479  2 (4482192..4484119)  1928
>    8: [20928..22703]:  1176599128..1176600903  2 (4487768..4489543)  1776
>    9: [22704..24575]:  1176602704..1176604575  2 (4491344..4493215)  1872
>   10: [24576..26495]:  1176611936..1176613855  2 (4500576..4502495)  1920
>   11: [26496..26655]:  1176615040..1176615199  2 (4503680..4503839)   160
>   12: [26656..46879]:  hole                                         20224
>
> But in another WAL I see this:
> # xfs_bmap -v
> /mnt/data01/kudu-tserver-wal/wals/508ecdfa89054bdb97a02078a91822af/index.000000000
>
> /mnt/data01/kudu-tserver-wal/wals/508ecdfa89054bdb97a02078a91822af/index.000000000:
>  EXT: FILE-OFFSET      BLOCK-RANGE            AG AG-OFFSET        TOTAL
>    0: [0..7]:          1758753776..1758753783  3 (586736..586743)     8
>    1: [8..46879]:      hole                                       46872
>
> It looks like only 8 blocks are actually used and the rest of the file is a
> hole.
>
>
> So it looks like I can use the formulas with confidence.
> Normal case: 8 MB/segment * 80 max segments * 2000 tablets = 1,280,000 MB
> = ~1.3 TB (+ some minor index overhead)
> Worse case: 8 MB/segment * 1 segment * 2000 tablets = 16,000 MB = ~16 GB
> (+ some minor index overhead)
>
> Right?
>
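The segment-only arithmetic from the two formulas can be checked with a short Python sketch. The function name is made up for illustration. The inputs are the figures quoted in this thread (8 MB segments, 1 to 80 segments per tablet, 2000 tablets) with decimal units, and index file overhead is deliberately left out.

```python
def wal_segment_capacity_mb(segment_mb, segments_per_tablet, tablets):
    """Space taken by WAL segment files only, in MB (no index files)."""
    return segment_mb * segments_per_tablet * tablets


# 80 segments per tablet: the upper bound used in the book's formula.
upper_mb = wal_segment_capacity_mb(8, 80, 2000)
# 1 segment per tablet: the lower bound.
lower_mb = wal_segment_capacity_mb(8, 1, 2000)

print(f"{upper_mb} MB (~{upper_mb / 1_000_000:.2f} TB)")
print(f"{lower_mb} MB (~{lower_mb / 1000:.0f} GB)")
```

With these inputs the bounds come out to 1,280,000 MB and 16,000 MB respectively.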
>
> On Wed, Jun 19, 2019 at 09:35, Todd Lipcon <[email protected]> wrote:
>
>> Hi Pavel,
>>
>> That's not quite expected. For example, on one of our test clusters here,
>> we have about 65GB of WALs and about 1GB of index files. If I recall
>> correctly, the index files store 8 bytes per WAL entry, so typically a
>> couple orders of magnitude smaller than the WALs themselves.
>>
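For a rough feel for the "8 bytes per WAL entry" figure, here is a small Python sketch. The constant is taken from the recollection above and has not been checked against the Kudu source. The function names and the 800-byte average entry size in the usage note are made up for illustration.

```python
# Assumption from this thread ("if I recall correctly"), not verified.
INDEX_BYTES_PER_ENTRY = 8


def estimated_index_bytes(num_wal_entries):
    """Bytes of real index data needed for the given number of WAL entries."""
    return num_wal_entries * INDEX_BYTES_PER_ENTRY


def index_to_wal_ratio(avg_entry_bytes):
    """Index data as a fraction of WAL data, for a given average entry size."""
    return INDEX_BYTES_PER_ENTRY / avg_entry_bytes
```

For example, `index_to_wal_ratio(800)` gives 0.01, which is the "couple orders of magnitude smaller" relationship. Only actually written index data counts here, not the apparent size of a sparse file.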
>> One thing is that the index files are sparse. Any chance you've done
>> something like copy the files away and back that might cause them to lose
>> their sparseness? If I use du --apparent-size on mine, it's a total of about
>> 180GB vs the 1GB of actual size.
>>
>> Also, any chance you're using XFS here? XFS sometimes likes to
>> preallocate large amounts of data into files while they're open, and only
>> frees it up if disk space is contended. I think you can use 'xfs_bmap' on
>> an index file to see the allocation status, which might be interesting.
>>
>> -Todd
>>
>> On Tue, Jun 18, 2019 at 11:12 PM Pavel Martynov <[email protected]>
>> wrote:
>>
>>> Hi guys!
>>>
>>> We want to buy SSDs for the TServer WALs in our cluster. I'm working on
>>> capacity estimation for these SSDs using the "Getting Started with Kudu" book,
>>> Chapter 4, Write-Ahead Log (
>>> https://www.oreilly.com/library/view/getting-started-with/9781491980248/ch04.html#idm139738927926240
>>> ).
>>>
>>> NB: we use default Kudu WAL configuration settings.
>>>
>>> There is a formula for the worst case:
>>> 8 MB/segment * 80 max segments * 2000 tablets = 1,280,000 MB = ~1.3 TB
>>>
>>> So this formula takes into account only segment files. But in our
>>> cluster, I see that every segment file has one or more corresponding index
>>> files, and every index file is actually larger than the segment file.
>>>
>>> Numbers from one of our nodes.
>>> WALs count:
>>> $ ls /mnt/data01/kudu-tserver-wal/wals/ | wc -l
>>> 711
>>>
>>> Overall WAL size:
>>> $ du -d 0 -h /mnt/data01/kudu-tserver-wal/
>>> 13G     /mnt/data01/kudu-tserver-wal/
>>>
>>> Size of all segment files:
>>> $ find /mnt/data01/kudu-tserver-wal/ -type f -name 'wal-*' -exec du -ch
>>> {} + | grep total$
>>> 6.1G    total
>>>
>>> Size of all index files:
>>> $ find /mnt/data01/kudu-tserver-wal/ -type f -name 'index*' -exec du -ch
>>> {} + | grep total$
>>> 6.5G    total
>>>
>>> So I have questions.
>>>
>>> 1. How can I estimate the size of the index files?
>>> It looks like in our cluster the total size of the index files is
>>> approximately equal to the total size of the segment files.
>>>
>>> 2. There are some WALs with more than one index file. For example:
>>> $ ls -lh
>>> /mnt/data01/kudu-tserver-wal/wals/779a382ea4e6464aa80ea398070a391f/
>>> total 296M
>>> -rw-r--r-- 1 root root  23M Jun 18 21:31 index.000000108
>>> -rw-r--r-- 1 root root  23M Jun 18 21:41 index.000000109
>>> -rw-r--r-- 1 root root  23M Jun 18 21:52 index.000000110
>>> -rw-r--r-- 1 root root  23M Jun 18 22:10 index.000000111
>>> -rw-r--r-- 1 root root  23M Jun 18 22:22 index.000000112
>>> -rw-r--r-- 1 root root  23M Jun 18 22:35 index.000000113
>>> -rw-r--r-- 1 root root  23M Jun 18 22:48 index.000000114
>>> -rw-r--r-- 1 root root  23M Jun 18 23:01 index.000000115
>>> -rw-r--r-- 1 root root  23M Jun 18 23:14 index.000000116
>>> -rw-r--r-- 1 root root  23M Jun 18 23:27 index.000000117
>>> -rw-r--r-- 1 root root  23M Jun 18 23:40 index.000000118
>>> -rw-r--r-- 1 root root  23M Jun 18 23:52 index.000000119
>>> -rw-r--r-- 1 root root  23M Jun 19 01:13 index.000000120
>>> -rw-r--r-- 1 root root 8.0M Jun 19 01:13 wal-000007799
>>>
>>> Is this a normal situation?
>>>
>>> 3. Not a question. Please consider adding documentation about
>>> estimating WAL storage. Also, I couldn't find any mention of index
>>> files, except here
>>> https://kudu.apache.org/docs/scaling_guide.html#file_descriptors.
>>>
>>> Thanks!
>>>
>>> --
>>> with best regards, Pavel Martynov
>>>
>>
>>
>> --
>> Todd Lipcon
>> Software Engineer, Cloudera
>>
>
>
> --
> with best regards, Pavel Martynov
>


-- 
Todd Lipcon
Software Engineer, Cloudera
