Hi guys!
We want to buy SSDs for the TServer WALs in our cluster. I'm working on
capacity estimation for these SSDs using the "Getting Started with Kudu" book,
Chapter 4, Write-Ahead Log (
https://www.oreilly.com/library/view/getting-started-with/9781491980248/ch04.html#idm139738927926240
).
NB: we use default Kudu WAL configuration settings.
The book gives a formula for the worst case:
8 MB/segment * 80 max segments * 2000 tablets = 1,280,000 MB = ~1.3 TB
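
To make it easy to redo this arithmetic with other numbers, here is a minimal Python sketch of the same formula. The function and parameter names are my own for illustration, not Kudu flag names, and the defaults are just the values from the book's example.

```python
def worst_case_wal_mb(segment_mb=8, max_segments=80, tablets=2000):
    """Worst-case size of WAL segment files only, in MB.

    Index files are NOT included; this mirrors the book's formula.
    """
    return segment_mb * max_segments * tablets


total_mb = worst_case_wal_mb()
# Decimal units, as in the book: 1 TB = 1,000,000 MB.
print(f"{total_mb:,} MB = ~{total_mb / 1_000_000:.1f} TB")
```

With the default arguments this reproduces the 1,280,000 MB figure above.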
So, this formula takes into account only segment files. But in our cluster,
I see that every segment file has one or more corresponding index files, and
every index file is actually larger than its segment file.
Numbers from one of our nodes.
WALs count:
$ ls /mnt/data01/kudu-tserver-wal/wals/ | wc -l
711
Overall WAL size:
$ du -d 0 -h /mnt/data01/kudu-tserver-wal/
13G /mnt/data01/kudu-tserver-wal/
Size of all segment files:
$ find /mnt/data01/kudu-tserver-wal/ -type f -name 'wal-*' -exec du -ch {} + | grep total$
6.1G total
Size of all index files:
$ find /mnt/data01/kudu-tserver-wal/ -type f -name 'index*' -exec du -ch {} + | grep total$
6.5G total
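
In case someone wants to collect the same breakdown on their own nodes, the measurements above can be sketched as one Python helper. This is only an illustration: the function name and the "segment"/"index"/"other" labels are mine, and the grouping relies solely on the `wal-` and `index` file-name prefixes seen in the listings here. It reports both the apparent size (what `ls -l` shows) and the allocated size (`st_blocks * 512`, which is what `du` counts on Linux), since the two can differ if files are sparse.

```python
import os
from collections import defaultdict


def wal_usage_by_kind(wal_root):
    """Sum file sizes under wal_root, grouped by file-name prefix."""
    totals = defaultdict(lambda: {"files": 0, "apparent": 0, "allocated": 0})
    for dirpath, _dirs, files in os.walk(wal_root):
        for name in files:
            if name.startswith("wal-"):
                kind = "segment"
            elif name.startswith("index"):
                kind = "index"
            else:
                kind = "other"
            st = os.stat(os.path.join(dirpath, name))
            totals[kind]["files"] += 1
            totals[kind]["apparent"] += st.st_size          # bytes, as in ls -l
            totals[kind]["allocated"] += st.st_blocks * 512  # bytes, as in du (Linux)
    return dict(totals)


if __name__ == "__main__":
    # Path taken from the commands above; adjust for your node.
    usage = wal_usage_by_kind("/mnt/data01/kudu-tserver-wal/")
    for kind, t in sorted(usage.items()):
        print(f"{kind:8s} files={t['files']:6d} "
              f"apparent={t['apparent'] / 2**30:.1f}G "
              f"allocated={t['allocated'] / 2**30:.1f}G")
```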
So I have questions.
1. How can I estimate the size of index files?
It looks like in our cluster the total size of index files is approximately
equal to the total size of segment files.
2. There are some WALs with more than one index file. For example:
$ ls -lh /mnt/data01/kudu-tserver-wal/wals/779a382ea4e6464aa80ea398070a391f/
total 296M
-rw-r--r-- 1 root root 23M Jun 18 21:31 index.000000108
-rw-r--r-- 1 root root 23M Jun 18 21:41 index.000000109
-rw-r--r-- 1 root root 23M Jun 18 21:52 index.000000110
-rw-r--r-- 1 root root 23M Jun 18 22:10 index.000000111
-rw-r--r-- 1 root root 23M Jun 18 22:22 index.000000112
-rw-r--r-- 1 root root 23M Jun 18 22:35 index.000000113
-rw-r--r-- 1 root root 23M Jun 18 22:48 index.000000114
-rw-r--r-- 1 root root 23M Jun 18 23:01 index.000000115
-rw-r--r-- 1 root root 23M Jun 18 23:14 index.000000116
-rw-r--r-- 1 root root 23M Jun 18 23:27 index.000000117
-rw-r--r-- 1 root root 23M Jun 18 23:40 index.000000118
-rw-r--r-- 1 root root 23M Jun 18 23:52 index.000000119
-rw-r--r-- 1 root root 23M Jun 19 01:13 index.000000120
-rw-r--r-- 1 root root 8.0M Jun 19 01:13 wal-000007799
Is this a normal situation?
3. Not a question. Please consider adding documentation about estimating
WAL storage. Also, I can't find any mention of index files, except here
https://kudu.apache.org/docs/scaling_guide.html#file_descriptors.
Thanks!
--
with best regards, Pavel Martynov