You are right, Kurt, that's what I was trying to do: lowering the compression
chunk size and the device read-ahead.
Column-family settings: "compression = {'chunk_length_kb': '16',
'sstable_compression': 'org.apache.cassandra.io.compress.SnappyCompressor'}"
Device read-ahead: blockdev --setra 8
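(A sketch of how I mean both settings to be applied; my_ks.my_table is just a
placeholder here. Note that blockdev --setra counts 512-byte sectors, so
8 = 4 KiB and 256 = 128 KiB:)
# blockdev --setra 8 /dev/xvdb
# cqlsh -e "ALTER TABLE my_ks.my_table WITH compression =
  {'sstable_compression': 'SnappyCompressor', 'chunk_length_kb': '16'};"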
I had to fall back to the default RA of 256, after which I got large merged
reads and low IOPS with good MB/s.
I believe it's not caused by C* settings, but by filesystem / IO-related
kernel settings (or is it by design?).
I tried to emulate C* reads during compaction with dd:
** RA=8 (4k)
# blockdev --setra 8 /dev/xvdb
# dd if=/dev/zero of=/data/ZZZ
^C16980952+0 records in
16980951+0 records out
8694246912 bytes (8.7 GB, 8.1 GiB) copied, 36.4651 s, 238 MB/s
# sync
# echo 3 > /proc/sys/vm/drop_caches
# dd if=/data/ZZZ of=/dev/null
^C846513+0 records in
846512+0 records out
433414144 bytes (433 MB, 413 MiB) copied, 21.4604 s, 20.2 MB/s <<<<<
High IOPS in this case; IO size = 4k.
Interestingly, setting bs=128k in dd didn't decrease IOPS; the IO size was
still 4k.
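(To confirm the request size at the device, I watched iostat in a second
terminal while dd was running; avgrq-sz is in 512-byte sectors, so ~8 means
~4k requests:)
# iostat -dkx 1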
** RA=256 (128k):
# blockdev --setra 256 /dev/xvdb
# echo 3 > /proc/sys/vm/drop_caches
# dd if=/data/ZZZ of=/dev/null
^C15123937+0 records in
15123936+0 records out
7743455232 bytes (7.7 GB, 7.2 GiB) copied, 60.8407 s, 127 MB/s <<<<<<
IO size 128k, low IOPS, good throughput (limited by EBS bandwidth).
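(Back-of-the-envelope, both runs are consistent with the request size:
RA=8:   20.2 MB/s / 4 KiB   ~ 5,000 reads/s - IOPS-bound
RA=256: 127 MB/s  / 128 KiB ~ 1,000 reads/s - bandwidth-bound)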
Writes were fine in both cases: IO size 128k, good throughput limited only by
EBS bandwidth.
Is the above situation typical for small read-ahead (the "price of small fast
reads"), or is something wrong with my setup?
[This isn't the XFS mailing list, but somebody here may know:] Why, with a
small RA, are even large reads (bs=128k) converted into multiple small reads?
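(One experiment I can think of to isolate the page cache / readahead path,
untested: O_DIRECT bypasses the cache, so if readahead is responsible for the
splitting, this should issue real 128k requests:)
# echo 3 > /proc/sys/vm/drop_caches
# dd if=/data/ZZZ of=/dev/null bs=128k iflag=direct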
Regards,
Kyrill
From: kurt greaves <k...@instaclustr.com>
Sent: Tuesday, May 8, 2018 2:12:40 AM
To: User
Subject: Re: compaction: huge number of random reads
If you've got small partitions/small reads, you should test lowering the
compression chunk size on the table and disabling read-ahead. This sounds
like it might just be a case of read amplification.
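Rough numbers, assuming you're still on the 2.1 default chunk_length_kb of
64: a ~200 B row read decompresses a full 64 KB chunk (~320x amplification at
the compression layer); at chunk_length_kb=16 the same read touches 16 KB
(~80x).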
On Tue., 8 May 2018, 05:43 Kyrylo Lebediev <kyrylo_lebed...@epam.com> wrote:
Dear Experts,
I'm observing strange behavior on a 2.1.20 cluster during compactions.
My setup is:
12 nodes m4.2xlarge (8 vCPU, 32G RAM), Ubuntu 16.04, 2T EBS gp2.
Filesystem: XFS, block size 4k, device read-ahead 4k
/sys/block/xvdb/queue/nomerges = 0
SizeTieredCompactionStrategy
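(For anyone who wants to cross-check the block-device settings: blockdev
reports RA in 512-byte sectors, sysfs in KiB, and nomerges = 0 means merges
are enabled:)
# blockdev --getra /dev/xvdb
# cat /sys/block/xvdb/queue/read_ahead_kb
# cat /sys/block/xvdb/queue/nomerges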
After data loads, when effectively nothing else is talking to the cluster and
compaction is the only activity, I see something like this:
$ iostat -dkx 1
...
Device:  rrqm/s wrqm/s     r/s    w/s    rkB/s    wkB/s avgrq-sz avgqu-sz await r_await w_await svctm  %util
xvda       0.00   0.00    0.00   0.00     0.00     0.00     0.00     0.00  0.00    0.00    0.00  0.00   0.00
xvdb       0.00   0.00 4769.00 213.00 19076.00 26820.00    18.42     7.95  1.17    1.06    3.76  0.20 100.00

Device:  rrqm/s wrqm/s     r/s    w/s    rkB/s    wkB/s avgrq-sz avgqu-sz await r_await w_await svctm  %util
xvda       0.00   0.00    0.00   0.00     0.00     0.00     0.00     0.00  0.00    0.00    0.00  0.00   0.00
xvdb       0.00   0.00 6098.00 177.00 24392.00 22076.00    14.81     6.46  1.36    0.96   15.16  0.16 100.00
Writes are fine: 177 writes/s <-> ~22 MB/s.
But for some reason compaction generates a huge number of small reads:
6098 reads/s <-> ~24 MB/s ===> read size is 4k.
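(Arithmetic from the iostat sample above: 24392 kB/s / 6098 r/s ~ 4.0 kB per
read; likewise 19076 / 4769 ~ 4.0 kB in the first sample.)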
Why am I getting a huge number of 4k reads instead of a much smaller number
of large reads?
What could be the reason?
Thanks,
Kyrill