Re: compaction: huge number of random reads

2018-05-08 Thread Kyrylo Lebediev
You are right, Kurt, that's what I was trying to do: lowering the compression chunk
size and the device read-ahead.

Column-family settings: "compression = {'chunk_length_kb': '16', 
'sstable_compression': 'org.apache.cassandra.io.compress.SnappyCompressor'}"
Device read-ahead: blockdev --setra 8 /dev/xvdb
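
For completeness, here's roughly how both were applied (ks.tbl below is a placeholder for the actual keyspace/table; Cassandra 2.1-style compression option names assumed):

# read-ahead value is in 512-byte sectors, so 8 = 4 KB
blockdev --setra 8 /dev/xvdb

# lower the compression chunk size on the table
cqlsh -e "ALTER TABLE ks.tbl WITH compression = {'sstable_compression': 'org.apache.cassandra.io.compress.SnappyCompressor', 'chunk_length_kb': 16};"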

I had to fall back to the default RA of 256 and after that got large merged reads and
low IOPS with good MB/s.
I believe it's not caused by C* settings, but by something in the filesystem /
IO-related kernel settings (or is it by design?).


I tried to emulate C* reads during compactions with dd:


** RA=8 (4k):

# blockdev --setra 8 /dev/xvdb
# dd if=/dev/zero of=/data/ZZZ
^C16980952+0 records in
16980951+0 records out
8694246912 bytes (8.7 GB, 8.1 GiB) copied, 36.4651 s, 238 MB/s
# sync

# echo 3 > /proc/sys/vm/drop_caches
# dd if=/data/ZZZ of=/dev/null
^C846513+0 records in
846512+0 records out
433414144 bytes (433 MB, 413 MiB) copied, 21.4604 s, 20.2 MB/s   <<<<<

High IOPS in this case, IO size = 4k.
What's interesting is that setting bs=128k in dd didn't decrease the IOPS; the IO size
was still 4k.
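
One way to check whether the splitting happens in the buffered/read-ahead path would be to repeat the read with O_DIRECT, which bypasses the page cache and kernel read-ahead entirely (a diagnostic sketch, not something I've run here):

# drop caches, then read with O_DIRECT so requests go straight to the block layer
echo 3 > /proc/sys/vm/drop_caches
dd if=/data/ZZZ of=/dev/null bs=128k iflag=direct

# in another terminal, watch the resulting request sizes (avgrq-sz is in 512-byte sectors)
iostat -dkx 1 /dev/xvdb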


** RA=256 (128k):
# blockdev --setra 256 /dev/xvdb
# echo 3 > /proc/sys/vm/drop_caches
# dd if=/data/ZZZ of=/dev/null
^C15123937+0 records in
15123936+0 records out
7743455232 bytes (7.7 GB, 7.2 GiB) copied, 60.8407 s, 127 MB/s  <<<<<<

IO size is 128k, low IOPS, good throughput (limited by EBS bandwidth).

Writes were fine in both cases: IO size 128k, good throughput limited by EBS
bandwidth only.
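
(Side note on the numbers above: blockdev reports read-ahead in 512-byte sectors, while sysfs reports it in KB, so RA=8 and RA=256 correspond to 4k and 128k respectively:)

blockdev --getra /dev/xvdb                  # 8 or 256 (512-byte sectors)
cat /sys/block/xvdb/queue/read_ahead_kb     # 4 or 128 (same value in KB)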

Is the situation above typical for small read-ahead ("the price for small fast reads"),
or is something wrong with my setup?
[This isn't an XFS mailing list, but somebody here may know:] Why, with a small RA, are
even large reads (bs=128k) split into multiple small reads?

Regards,
Kyrill



From: kurt greaves <k...@instaclustr.com>
Sent: Tuesday, May 8, 2018 2:12:40 AM
To: User
Subject: Re: compaction: huge number of random reads

If you've got small partitions/small reads you should test lowering your 
compression chunk size on the table and disabling read ahead. This sounds like 
it might just be a case of read amplification.

On Tue., 8 May 2018, 05:43 Kyrylo Lebediev
<kyrylo_lebed...@epam.com> wrote:

Dear Experts,


I'm observing strange behavior on a cluster 2.1.20 during compactions.


My setup is:

12 nodes  m4.2xlarge (8 vCPU, 32G RAM) Ubuntu 16.04, 2T EBS gp2.

Filesystem: XFS, blocksize 4k, device read-ahead - 4k

/sys/block/xvdb/queue/nomerges = 0

SizeTieredCompactionStrategy


After data loads, when effectively nothing else is talking to the cluster and
compactions are the only activity, I see something like this:
$ iostat -dkx 1
...


Device:   rrqm/s   wrqm/s      r/s      w/s     rkB/s     wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
xvda        0.00     0.00     0.00     0.00      0.00      0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
xvdb        0.00     0.00  4769.00   213.00  19076.00  26820.00    18.42     7.95    1.17    1.06    3.76   0.20 100.00

Device:   rrqm/s   wrqm/s      r/s      w/s     rkB/s     wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
xvda        0.00     0.00     0.00     0.00      0.00      0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
xvdb        0.00     0.00  6098.00   177.00  24392.00  22076.00    14.81     6.46    1.36    0.96   15.16   0.16 100.00

Writes are fine: 177 writes/sec <-> ~22 MB/s.

But for some reason compactions generate a huge number of small reads:
6098 reads/sec <-> ~24 MB/s  ===>  read size is 4k.


Why am I getting a huge number of 4k reads instead of a much smaller number of large
reads?

What could be the reason?


Thanks,

Kyrill



