Hah! Found the problem!

After setting read_ahead to 0 and the compression chunk size to 4 KB on all CFs,
the situation was nearly PERFECT (please see below)! I have scrubbed some CFs
but not the whole dataset yet. I knew it wasn't a lack of RAM.
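
For reference, a minimal sketch of both changes (device, keyspace and table
names are placeholders; Cassandra 2.x uses 'sstable_compression' /
'chunk_length_kb' instead of the 3.x options shown here):

    # disable read-ahead on the data disk (0 sectors; device is a placeholder)
    sudo blockdev --setra 0 /dev/sdb

    # set a 4 KB compression chunk size per CF (Cassandra 3.x syntax)
    cqlsh -e "ALTER TABLE my_keyspace.my_table
        WITH compression = {'class': 'LZ4Compressor', 'chunk_length_in_kb': 4};"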

Some stats:
- Latency of a quite large CF: https://cl.ly/1r3e0W0S393L
- Disk throughput: https://cl.ly/2a0Z250S1M3c
- Dstat: https://gist.github.com/brstgt/c92bbd46ab76283e534b853b88ad3b26
- This shows that the request distribution remained the same, so no
dynamic-snitch magic: https://cl.ly/3E0t1T1z2c0J

Btw, I stumbled across this one:
https://groups.google.com/forum/#!topic/scylladb-dev/j_qXSP-6-gY
Maybe we should also think about lowering the default chunk length.

*Unfortunately, the schema changes had a disturbing effect:*
- I changed the chunk size with a script (a minimal sketch of such a loop
follows after this list), so there were a lot of schema changes within a
short period.
- After all tables were changed, one of the seed hosts (cas1) went TOTALLY
crazy.
- Latency on this host was 10x that of all other hosts.
- There were more ParNew GCs.
- Load was very high (load average up to 80, 100% CPU).
- The whole system was unstable due to unpredictable latencies and
backpressure (https://cl.ly/1m022g2W1Q3d).
- Even SELECT * FROM system_schema.tables etc. appeared as a slow query in
the logs.
- cas1 was the first server in the connect-host list for the PHP client.
- A Cassandra restart didn't help. A reboot didn't help either (the cold page
cache probably made it worse).
- All other nodes were totally ok.
- Stopping Cassandra on cas1 helped to keep the system stable and brought
latency down again, but that was no real solution.
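
A minimal sketch of such a loop (keyspace and table names are placeholders;
Cassandra 3.x syntax), including a pause between ALTERs to let schema
agreement settle, which in hindsight would probably have been wise:

    #!/bin/bash
    # Sketch: set a 4 KB compression chunk size on a list of CFs, pausing
    # between schema changes so migrations can settle.
    # Keyspace and table names are placeholders.
    KEYSPACE="my_keyspace"
    TABLES="table_a table_b table_c"
    for T in $TABLES; do
        echo "Altering ${KEYSPACE}.${T}"
        cqlsh -e "ALTER TABLE ${KEYSPACE}.${T} WITH compression = {'class': 'LZ4Compressor', 'chunk_length_in_kb': 4};"
        sleep 30  # wait a bit for schema agreement before the next ALTER
    done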

=> Only replacing the node in the connect-host list (with a newer, faster
node) resolved the situation.

Any ideas why changing schemas and/or chunk size could have such an effect?
For some time the situation was really critical.


2017-02-20 10:48 GMT+01:00 Bhuvan Rawal <bhu1ra...@gmail.com>:

> Hi Benjamin,
>
> Yes, a read ahead of 8 would imply a higher IO count from disk, but it should not
> cause more data to be read off the disk, as is happening in your case.
>
> One probable reason for the high disk IO would be that the 512-vnode node has
> a lower page-cache-to-data ratio of 22% (100G buff / 437G data) compared to 46%
> (100G / 237G). And since your avg record size is in bytes, for every disk IO you
> are fetching a complete 64K block to get a row.
>
> Perhaps you can balance the node by adding equivalent RAM?
>
> Regards,
> Bhuvan
>
