Hah! Found the problem! After setting read_ahead to 0 and the compression chunk size to 4 KB on all CFs, the situation was PERFECT (nearly, please see below)! I have scrubbed some CFs but not the whole dataset yet. I knew it was not a lack of RAM.
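For reference, a minimal sketch of how those two changes can be applied (this is not the exact script used here; the contact point, device path and compressor class are assumptions, it needs the DataStax cassandra-driver, root for blockdev, and Cassandra 3.x compression syntax):

    #!/usr/bin/env python
    # Sketch: disable read ahead on the data device and lower the compression
    # chunk size on every non-system table. DATA_DEVICE and the contact point
    # are placeholders and must be adapted to the actual cluster.
    import subprocess
    from cassandra.cluster import Cluster

    DATA_DEVICE = '/dev/sdb'  # assumed: the device backing the Cassandra data dir
    SYSTEM_KEYSPACES = {'system', 'system_schema', 'system_auth',
                        'system_distributed', 'system_traces'}

    # 1) Per node: drop the kernel read ahead to 0 sectors (needs root).
    subprocess.check_call(['blockdev', '--setra', '0', DATA_DEVICE])

    # 2) Once per cluster: set chunk_length_in_kb = 4 on all user tables.
    cluster = Cluster(['127.0.0.1'])
    session = cluster.connect()
    for ks_name, ks_meta in cluster.metadata.keyspaces.items():
        if ks_name in SYSTEM_KEYSPACES:
            continue
        for table_name in ks_meta.tables:
            session.execute(
                'ALTER TABLE "%s"."%s" WITH compression = '
                "{'class': 'LZ4Compressor', 'chunk_length_in_kb': 4}"
                % (ks_name, table_name))
    cluster.shutdown()

Keep in mind that existing SSTables only pick up the new chunk size once they are rewritten (e.g. nodetool upgradesstables -a, or the scrub mentioned above); until then reads still hit the old 64 KB chunks.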
Some stats:

- Latency of a quite large CF: https://cl.ly/1r3e0W0S393L
- Disk throughput: https://cl.ly/2a0Z250S1M3c
- Dstat: https://gist.github.com/brstgt/c92bbd46ab76283e534b853b88ad3b26
- This shows that the request distribution remained the same, so no dynamic-snitch magic: https://cl.ly/3E0t1T1z2c0J

Btw, I stumbled across this one: https://groups.google.com/forum/#!topic/scylladb-dev/j_qXSP-6-gY
Maybe we should also think about lowering the default chunk length.

*Unfortunately, the schema changes had a disturbing side effect:*

- I changed the chunk size with a script, so there were a lot of schema changes within a short period.
- After all tables were changed, one of the seed hosts (cas1) went TOTALLY crazy.
- Latency on this host was 10x that of all the other hosts.
- There were more ParNew GCs.
- Load was very high (up to 80, 100% CPU).
- The whole system was unstable due to unpredictable latencies and backpressure (https://cl.ly/1m022g2W1Q3d).
- Even SELECT * FROM system_schema.tables etc. showed up as slow queries in the logs.
- It was the first server in the connect-host list for the PHP client.
- Restarting Cassandra didn't help. A reboot did not help either (the cold page cache probably made it worse).
- All other nodes were totally OK.
- Stopping Cassandra on cas1 helped to keep the system stable and brought latency down again, but was no solution.

=> Only replacing the node in the connect-host list with a newer, faster node helped in that situation.

Any ideas why changing schemas and/or the chunk size could have such an effect? For some time the situation was really critical.

2017-02-20 10:48 GMT+01:00 Bhuvan Rawal <bhu1ra...@gmail.com>:

> Hi Benjamin,
>
> Yes, a read_ahead of 8 would imply a higher IO count from disk, but it should
> not cause more data to be read off the disk, as is happening in your case.
>
> One probable reason for the high disk IO would be that the 512-vnode node has
> a lower page-cache-to-data ratio of 22% (100G buff / 437G data) as compared to
> 46% (100G / 237G). And as your avg record size is in the byte range, for every
> disk IO you are fetching a complete 64K block to get a single row.
>
> Perhaps you can balance the node by adding equivalent RAM?
>
> Regards,
> Bhuvan
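To make the 64K-block argument above concrete, a quick back-of-the-envelope calculation (the row size is an assumed placeholder, not a measured value from this cluster):

    # Back-of-the-envelope for the 64K-block point above: a single-row read has
    # to read and decompress at least one whole compression chunk, so with
    # sub-KB rows the chunk size dominates the IO volume per point read.
    row_size_bytes = 300  # assumed placeholder; we only know rows are in the byte range
    for chunk_kb in (64, 4):
        amplification = chunk_kb * 1024 / float(row_size_bytes)
        print("%2d KB chunks -> ~%.0fx the row size per point read" % (chunk_kb, amplification))
    # ~218x with the old 64 KB chunks vs ~14x with 4 KB chunks,
    # i.e. roughly 16x less data read and decompressed per read after the change.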