On Thu, Dec 6, 2018 at 11:14 AM Riccardo Ferrari <ferra...@gmail.com> wrote:

>
> I had few instances in the past that were showing that unresponsivveness
> behaviour. Back then I saw with iotop/htop/dstat ... the system was stuck
> on a single thread processing (full throttle) for seconds. According to
> iotop that was the kswapd0 process. That system was an ubuntu 16.04
> actually "Ubuntu 16.04.4 LTS".
>

Riccardo,

Did you by chance also observe Linux OOM?  How long did the
unresponsiveness last in your case?

>From there I started to dig what kswap process was involved in a system
> with no swap and found that is used for mmapping. This erratic (allow me to
> say erratic) behaviour was not showing up when I was on 3.0.6 but started
> to right after upgrading to 3.0.17.
>
> By "load" I refer to the load as reported by the `nodetool status`. On my
> systems, when disk_access_mode is auto (read mmap), it is the sum of the
> node load plus the jmv heap size. Of course this is just what I noted on my
> systems not really sure if that should be the case on yours too.
>

I've checked and indeed we are using disk_access_mode=auto (well,
implicitly because it's not even part of config file anymore):
DiskAccessMode 'auto' determined to be mmap, indexAccessMode is mmap.

I hope someone with more experience than me will add a comment about your
> settings. Reading the configuration file, writers and compactors should be
> 2 at minimum. I can confirm when I tried in the past to change the
> concurrent_compactors to 1 I had really bad things happenings (high system
> load, high message drop rate, ...)
>

As I've mentioned, we did not observe any other issues with the current
setup: system load is reasonable, no dropped messages, no big number of
hints, request latencies are OK, no big number of pending compactions.
Also during repair everything looks fine.

I have the "feeling", when running on constrained hardware the underlaying
> kernel optimization is a must. I agree with Jonathan H. that you should
> think about increasing the instance size, CPU and memory mathters a lot.
>

How did you solve your issue in the end?  You didn't rollback to 3.0.6?
Did you tune kernel parameters?  Which ones?

Thank you!
--
Alex

Reply via email to