On Thu, Dec 6, 2018 at 11:14 AM Riccardo Ferrari <ferra...@gmail.com> wrote:
>
> I had few instances in the past that were showing that unresponsiveness
> behaviour. Back then I saw with iotop/htop/dstat ... the system was stuck
> on a single thread processing (full throttle) for seconds. According to
> iotop that was the kswapd0 process. That system was an ubuntu 16.04
> actually "Ubuntu 16.04.4 LTS".
>

Riccardo,

Did you by chance also observe Linux OOM? How long did the
unresponsiveness last in your case?

> From there I started to dig what kswap process was involved in a system
> with no swap and found that is used for mmapping. This erratic (allow me to
> say erratic) behaviour was not showing up when I was on 3.0.6 but started
> to right after upgrading to 3.0.17.
>
> By "load" I refer to the load as reported by the `nodetool status`. On my
> systems, when disk_access_mode is auto (read mmap), it is the sum of the
> node load plus the JVM heap size. Of course this is just what I noted on my
> systems not really sure if that should be the case on yours too.
>

I've checked and indeed we are using disk_access_mode=auto (well,
implicitly, because it's not even part of the config file anymore):

DiskAccessMode 'auto' determined to be mmap, indexAccessMode is mmap.

> I hope someone with more experience than me will add a comment about your
> settings. Reading the configuration file, writers and compactors should be
> 2 at minimum. I can confirm when I tried in the past to change the
> concurrent_compactors to 1 I had really bad things happening (high system
> load, high message drop rate, ...)
>

As I've mentioned, we did not observe any other issues with the current
setup: system load is reasonable, no dropped messages, no big number of
hints, request latencies are OK, no big number of pending compactions.
Also during repair everything looks fine.

> I have the "feeling", when running on constrained hardware the underlying
> kernel optimization is a must. I agree with Jonathan H. that you should
> think about increasing the instance size, CPU and memory matter a lot.
>

How did you solve your issue in the end? You didn't roll back to 3.0.6?
Did you tune kernel parameters? Which ones?

Thank you!
--
Alex