Re: Cassandra 4.1 compaction thread no longer low priority (cpu nice)

2024-02-23 Thread Pierre Fersing
Hi,

Thanks for your detailed answers. I understand the reason why using low 
priority compaction may not be a great idea in the general case (the example 
with too high CPU for reading).

I’ll give a try with the compaction throughput which I total forgot that this 
option exists. It may fix the issue I see after my upgrade. I’ll see.
If it’s not fixing my trouble, I’ll try to confirm that my issue is indeed due 
to compaction putting too much pressure on CPU because going further as it will 
probably require more works that initial though and likely being a configurable 
option.

Regards,
Pierre Fersing

De : Dmitry Konstantinov 
Date : jeudi, 22 février 2024 à 20:39
À : user@cassandra.apache.org 
Objet : Re: Cassandra 4.1 compaction thread no longer low priority (cpu nice)
Hi all,

I was not participating in the changes but I analyzed the question some time 
ago from another side.
There were also changes related to -XX:ThreadPriorityPolicy JVM option. When 
you set a thread priority for a Java thread it does not mean it is always 
propagated as a native OS thread priority. To propagate the priority you should 
use  -XX:ThreadPriorityPolicy and -XX:JavaPriorityN_To_OSPriority JVM options, 
but there is an issue with them because JVM wants to be executed under root to 
set -XX:ThreadPriorityPolicy=1 which enables the priorities usage. A hack was 
invented long time ago to workaround it by setting -XX:ThreadPriorityPolicy=42 
value (any value not equal to 0 or 1) and bypass the not so needed and annoying 
grants validation logic (see 
http://tech.stolsvik.com/2010/01/linux-java-thread-priorities-workaround.html 
for more details about).
It worked for Java 8 but then there was a change in Java 9 about adding extra 
validation for JVM option values (JEP 245: Validate JVM Command-Line Flag 
Arguments  - https://bugs.openjdk.org/browse/JDK-8059557) and the hack stopped 
to work and started to cause JVM failure with a validation error. As a reaction 
to it - the flag was removed from Cassandra JVM configuration files in 
https://issues.apache.org/jira/browse/CASSANDRA-13107. After it the lower 
priority value for compaction threads have not had any actual effect.
The interesting story is that the JVM logic has been changed to support the 
ability to set -XX:ThreadPriorityPolicy=1 for non-root users in Java 13 
(https://bugs.openjdk.org/browse/JDK-8215962) and the change was backported to 
Java 11 as well (https://bugs.openjdk.org/browse/JDK-8217494).
So, from this point of view I think it would be nice to return back the ability 
to set thread priority for compaction threads. At the same time I would not 
expect too much improvement by enabling it.

P.S. There was also an idea about using ionice 
(https://issues.apache.org/jira/browse/CASSANDRA-9946) but the current Linux IO 
schedulers do not take that into account anymore. It looks like the only 
scheduler that supported ionice was CFQ 
(https://issues.apache.org/jira/browse/CASSANDRA-9946?focusedCommentId=14648616=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14648616)
 and it was deprecated and removed since Linux kernel 5.x 
(https://github.com/torvalds/linux/commit/f382fb0bcef4c37dc049e9f6963e3baf204d815c).

Regards,
Dmitry


On Thu, 22 Feb 2024 at 15:30, Bowen Song via user 
mailto:user@cassandra.apache.org>> wrote:

Hi Pierre,

Is there anything stopping you from using the 
compaction_throughput<https://github.com/apache/cassandra/blob/f9e033f519c14596da4dc954875756a69aea4e78/conf/cassandra.yaml#L989>
 option in the cassandra.yaml file to manage the performance impact of 
compaction operations?

With thread priority, there's a failure scenario on busy nodes when the read 
operations uses too much CPU. If the compaction thread has lower priority, it 
does not get enough CPU time to run, and SSTable files will build up, causing 
read to become slower and more expensive, which in turn result in compaction 
gets even less CPU time. At the end, one of the following three will happen:

  *   the node becomes too slow and most queries time out
  *   the Java process crashes due to too many open files or OOM because JVM GC 
can't keep up
  *   the filesystem run out of free space or inodes

However, I'm unsure whether the compaction thread priority was intentionally 
removed from 4.1.0. Someone familiar with this matter may be able to answer 
that.

Cheers,
Bowen


On 22/02/2024 13:26, Pierre Fersing wrote:
Hello all,

I've recently upgraded to Cassandra 4.1 and see a change in compaction behavior 
that seems unwanted:

* With Cassandra 3.11 compaction was run by thread in low priority and thus 
using CPU nice (visible using top) (I believe Cassandra 4.0 also had this 
behavior)

* With Cassandra 4.1, compactions are no longer run as low priority thread 
(compaction now use "normal" CPU).

This means that when the server had limited CPU, Cassandra compaction now 
compete for the CPU with other process (probab

Re: Cassandra 4.1 compaction thread no longer low priority (cpu nice)

2024-02-22 Thread Dmitry Konstantinov
Thank you for highlighting this, it looks like I need to refresh my
knowledge about IO schedulers :-)

Cheers,
Dmitry

On Thu, 22 Feb 2024 at 22:18, Bowen Song via user 
wrote:

> On the IO scheduler point, cfq WAS the only scheduler supporting IO
> priorities (such as ionice) shipped by default with the Linux kernel, but
> that has changed since bfq and mq-deadline were added to the Linux kernel.
> Both bfq and mq-deadline supports IO priority, as documented here:
> https://docs.kernel.org/block/ioprio.html
>
>
> On 22/02/2024 19:39, Dmitry Konstantinov wrote:
>
> Hi all,
>
> I was not participating in the changes but I analyzed the question some
> time ago from another side.
> There were also changes related to -XX:ThreadPriorityPolicy JVM option.
> When you set a thread priority for a Java thread it does not mean it is
> always propagated as a native OS thread priority. To propagate the priority
> you should use  -XX:ThreadPriorityPolicy and -XX:JavaPriorityN_To_OSPriority
>  JVM options, but there is an issue with them because JVM wants to be
> executed under root to set -XX:ThreadPriorityPolicy=1 which enables the
> priorities usage. A hack was invented long time ago to workaround it by
> setting -XX:ThreadPriorityPolicy=42 value (any value not equal to 0 or 1)
> and bypass the not so needed and annoying grants validation logic (see
> http://tech.stolsvik.com/2010/01/linux-java-thread-priorities-workaround.html
> for more details about).
> It worked for Java 8 but then there was a change in Java 9 about adding
> extra validation for JVM option values (JEP 245: Validate JVM Command-Line
> Flag Arguments  - https://bugs.openjdk.org/browse/JDK-8059557) and the
> hack stopped to work and started to cause JVM failure with a validation
> error. As a reaction to it - the flag was removed from Cassandra JVM
> configuration files in
> https://issues.apache.org/jira/browse/CASSANDRA-13107. After it the lower
> priority value for compaction threads have not had any actual effect.
> The interesting story is that the JVM logic has been changed to support
> the ability to set -XX:ThreadPriorityPolicy=1 for non-root users in Java 13
> (https://bugs.openjdk.org/browse/JDK-8215962) and the change was
> backported to Java 11 as well (https://bugs.openjdk.org/browse/JDK-8217494
> ).
> So, from this point of view I think it would be nice to return back the
> ability to set thread priority for compaction threads. At the same time I
> would not expect too much improvement by enabling it.
>
> P.S. There was also an idea about using ionice (
> https://issues.apache.org/jira/browse/CASSANDRA-9946) but the current
> Linux IO schedulers do not take that into account anymore. It looks like
> the only scheduler that supported ionice was CFQ (
> https://issues.apache.org/jira/browse/CASSANDRA-9946?focusedCommentId=14648616=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14648616)
> and it was deprecated and removed since Linux kernel 5.x (
> https://github.com/torvalds/linux/commit/f382fb0bcef4c37dc049e9f6963e3baf204d815c
> ).
>
> Regards,
> Dmitry
>
>
> On Thu, 22 Feb 2024 at 15:30, Bowen Song via user <
> user@cassandra.apache.org> wrote:
>
>> Hi Pierre,
>>
>> Is there anything stopping you from using the compaction_throughput
>> 
>> option in the cassandra.yaml file to manage the performance impact of
>> compaction operations?
>>
>> With thread priority, there's a failure scenario on busy nodes when the
>> read operations uses too much CPU. If the compaction thread has lower
>> priority, it does not get enough CPU time to run, and SSTable files will
>> build up, causing read to become slower and more expensive, which in turn
>> result in compaction gets even less CPU time. At the end, one of the
>> following three will happen:
>>
>>- the node becomes too slow and most queries time out
>>- the Java process crashes due to too many open files or OOM because
>>JVM GC can't keep up
>>- the filesystem run out of free space or inodes
>>
>> However, I'm unsure whether the compaction thread priority was
>> intentionally removed from 4.1.0. Someone familiar with this matter may be
>> able to answer that.
>>
>> Cheers,
>> Bowen
>>
>>
>> On 22/02/2024 13:26, Pierre Fersing wrote:
>>
>> Hello all,
>>
>> I've recently upgraded to Cassandra 4.1 and see a change in compaction
>> behavior that seems unwanted:
>>
>> * With Cassandra 3.11 compaction was run by thread in low priority and
>> thus using CPU nice (visible using top) (I believe Cassandra 4.0 also had
>> this behavior)
>>
>> * With Cassandra 4.1, compactions are no longer run as low priority
>> thread (compaction now use "normal" CPU).
>>
>> This means that when the server had limited CPU, Cassandra compaction now
>> compete for the CPU with other process (probably including Cassandra
>> itself) that need CPU. When it was using CPU 

Re: Cassandra 4.1 compaction thread no longer low priority (cpu nice)

2024-02-22 Thread Bowen Song via user
On the IO scheduler point, cfq WAS the only scheduler supporting IO 
priorities (such as ionice) shipped by default with the Linux kernel, 
but that has changed since bfq and mq-deadline were added to the Linux 
kernel. Both bfq and mq-deadline supports IO priority, as documented 
here: https://docs.kernel.org/block/ioprio.html



On 22/02/2024 19:39, Dmitry Konstantinov wrote:

Hi all,

I was not participating in the changes but I analyzed the question 
some time ago from another side.
There were also changes related to -XX:ThreadPriorityPolicy JVM 
option. When you set a thread priority for a Java thread it does not 
mean it is always propagated as a native OS thread priority. To 
propagate the priority you should use  -XX:ThreadPriorityPolicy and 
-XX:JavaPriorityN_To_OSPriorityJVM options, but there is an issue with 
them because JVM wants to be executed under root to 
set -XX:ThreadPriorityPolicy=1 which enables the priorities usage. A 
hack was invented long time ago to workaround it by setting 
-XX:ThreadPriorityPolicy=42 value (any value not equal to 0 or 1) and 
bypass the not so needed and annoying grants validation logic (see 
http://tech.stolsvik.com/2010/01/linux-java-thread-priorities-workaround.html 
for more details about).
It worked for Java 8 but then there was a change in Java 9 about 
adding extra validation for JVM option values (JEP 245: Validate JVM 
Command-Line Flag Arguments  - 
https://bugs.openjdk.org/browse/JDK-8059557) and the hack stopped to 
work and started to cause JVM failure with a validation error. As a 
reaction to it - the flag was removed from Cassandra JVM configuration 
files in https://issues.apache.org/jira/browse/CASSANDRA-13107. After 
it the lower priority value for compaction threads have not had any 
actual effect.
The interesting story is that the JVM logic has been changed to 
support the ability to set -XX:ThreadPriorityPolicy=1 for non-root 
users in Java 13 (https://bugs.openjdk.org/browse/JDK-8215962) and the 
change was backported to Java 11 as well 
(https://bugs.openjdk.org/browse/JDK-8217494).
So, from this point of view I think it would be nice to return back 
the ability to set thread priority for compaction threads. At the same 
time I would not expect too much improvement by enabling it.


P.S. There was also an idea about using ionice 
(https://issues.apache.org/jira/browse/CASSANDRA-9946) but the current 
Linux IO schedulers do not take that into account anymore. It looks 
like the only scheduler that supported ionice was CFQ 
(https://issues.apache.org/jira/browse/CASSANDRA-9946?focusedCommentId=14648616=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14648616 
) 
and it was deprecated and removed since Linux kernel 5.x 
(https://github.com/torvalds/linux/commit/f382fb0bcef4c37dc049e9f6963e3baf204d815c).


Regards,
Dmitry


On Thu, 22 Feb 2024 at 15:30, Bowen Song via user 
 wrote:


Hi Pierre,

Is there anything stopping you from using the
compaction_throughput


option in the cassandra.yaml file to manage the performance impact
of compaction operations?

With thread priority, there's a failure scenario on busy nodes
when the read operations uses too much CPU. If the compaction
thread has lower priority, it does not get enough CPU time to run,
and SSTable files will build up, causing read to become slower and
more expensive, which in turn result in compaction gets even less
CPU time. At the end, one of the following three will happen:

  * the node becomes too slow and most queries time out
  * the Java process crashes due to too many open files or OOM
because JVM GC can't keep up
  * the filesystem run out of free space or inodes

However, I'm unsure whether the compaction thread priority was
intentionally removed from 4.1.0. Someone familiar with this
matter may be able to answer that.

Cheers,
Bowen


On 22/02/2024 13:26, Pierre Fersing wrote:


Hello all,

I've recently upgraded to Cassandra 4.1 and see a change in
compaction behavior that seems unwanted:

* With Cassandra 3.11 compaction was run by thread in low
priority and thus using CPU nice (visible using top) (I believe
Cassandra 4.0 also had this behavior)

* With Cassandra 4.1, compactions are no longer run as low
priority thread (compaction now use "normal" CPU).

This means that when the server had limited CPU, Cassandra
compaction now compete for the CPU with other process (probably
including Cassandra itself) that need CPU. When it was using CPU
nice, the compaction only competed for CPU with other lower
priority process which was great as 

Re: Cassandra 4.1 compaction thread no longer low priority (cpu nice)

2024-02-22 Thread Dmitry Konstantinov
Hi all,

I was not participating in the changes but I analyzed the question some
time ago from another side.
There were also changes related to -XX:ThreadPriorityPolicy JVM option.
When you set a thread priority for a Java thread it does not mean it is
always propagated as a native OS thread priority. To propagate the priority
you should use  -XX:ThreadPriorityPolicy and -XX:JavaPriorityN_To_OSPriority
 JVM options, but there is an issue with them because JVM wants to be
executed under root to set -XX:ThreadPriorityPolicy=1 which enables the
priorities usage. A hack was invented long time ago to workaround it by
setting -XX:ThreadPriorityPolicy=42 value (any value not equal to 0 or 1)
and bypass the not so needed and annoying grants validation logic (see
http://tech.stolsvik.com/2010/01/linux-java-thread-priorities-workaround.html
for more details about).
It worked for Java 8 but then there was a change in Java 9 about adding
extra validation for JVM option values (JEP 245: Validate JVM Command-Line
Flag Arguments  - https://bugs.openjdk.org/browse/JDK-8059557) and the hack
stopped to work and started to cause JVM failure with a validation error.
As a reaction to it - the flag was removed from Cassandra JVM configuration
files in https://issues.apache.org/jira/browse/CASSANDRA-13107. After it
the lower priority value for compaction threads have not had any actual
effect.
The interesting story is that the JVM logic has been changed to support the
ability to set -XX:ThreadPriorityPolicy=1 for non-root users in Java 13 (
https://bugs.openjdk.org/browse/JDK-8215962) and the change was backported
to Java 11 as well (https://bugs.openjdk.org/browse/JDK-8217494).
So, from this point of view I think it would be nice to return back the
ability to set thread priority for compaction threads. At the same time I
would not expect too much improvement by enabling it.

P.S. There was also an idea about using ionice (
https://issues.apache.org/jira/browse/CASSANDRA-9946) but the current Linux
IO schedulers do not take that into account anymore. It looks like the only
scheduler that supported ionice was CFQ (
https://issues.apache.org/jira/browse/CASSANDRA-9946?focusedCommentId=14648616=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14648616)
and it was deprecated and removed since Linux kernel 5.x (
https://github.com/torvalds/linux/commit/f382fb0bcef4c37dc049e9f6963e3baf204d815c
).

Regards,
Dmitry


On Thu, 22 Feb 2024 at 15:30, Bowen Song via user 
wrote:

> Hi Pierre,
>
> Is there anything stopping you from using the compaction_throughput
> 
> option in the cassandra.yaml file to manage the performance impact of
> compaction operations?
>
> With thread priority, there's a failure scenario on busy nodes when the
> read operations uses too much CPU. If the compaction thread has lower
> priority, it does not get enough CPU time to run, and SSTable files will
> build up, causing read to become slower and more expensive, which in turn
> result in compaction gets even less CPU time. At the end, one of the
> following three will happen:
>
>- the node becomes too slow and most queries time out
>- the Java process crashes due to too many open files or OOM because
>JVM GC can't keep up
>- the filesystem run out of free space or inodes
>
> However, I'm unsure whether the compaction thread priority was
> intentionally removed from 4.1.0. Someone familiar with this matter may be
> able to answer that.
>
> Cheers,
> Bowen
>
>
> On 22/02/2024 13:26, Pierre Fersing wrote:
>
> Hello all,
>
> I've recently upgraded to Cassandra 4.1 and see a change in compaction
> behavior that seems unwanted:
>
> * With Cassandra 3.11 compaction was run by thread in low priority and
> thus using CPU nice (visible using top) (I believe Cassandra 4.0 also had
> this behavior)
>
> * With Cassandra 4.1, compactions are no longer run as low priority thread
> (compaction now use "normal" CPU).
>
> This means that when the server had limited CPU, Cassandra compaction now
> compete for the CPU with other process (probably including Cassandra
> itself) that need CPU. When it was using CPU nice, the compaction only
> competed for CPU with other lower priority process which was great as it
> leaves CPU available for processes that need to kept small response time
> (like an API used by human).
>
> Is it wanted to lose this feature in Cassandra 4.1 or was it just a forget
> during re-write of compaction executor ? Should I open a bug to
> re-introduce this feature in Cassandra ?
>
>
> I've done few searches, and:
>
> * I believe compaction used CPU nice because the compactor executor was
> created with minimal priority:
> https://github.com/apache/cassandra/blob/cassandra-3.11.16/src/java/org/apache/cassandra/db/compaction/CompactionManager.java#L1906
>
> * I think it was dropped by commit
> 

Re: Cassandra 4.1 compaction thread no longer low priority (cpu nice)

2024-02-22 Thread Bowen Song via user

Hi Pierre,

Is there anything stopping you from using the compaction_throughput 
 
option in the cassandra.yaml file to manage the performance impact of 
compaction operations?


With thread priority, there's a failure scenario on busy nodes when the 
read operations uses too much CPU. If the compaction thread has lower 
priority, it does not get enough CPU time to run, and SSTable files will 
build up, causing read to become slower and more expensive, which in 
turn result in compaction gets even less CPU time. At the end, one of 
the following three will happen:


 * the node becomes too slow and most queries time out
 * the Java process crashes due to too many open files or OOM because
   JVM GC can't keep up
 * the filesystem run out of free space or inodes

However, I'm unsure whether the compaction thread priority was 
intentionally removed from 4.1.0. Someone familiar with this matter may 
be able to answer that.


Cheers,
Bowen


On 22/02/2024 13:26, Pierre Fersing wrote:


Hello all,

I've recently upgraded to Cassandra 4.1 and see a change in compaction 
behavior that seems unwanted:


* With Cassandra 3.11 compaction was run by thread in low priority and 
thus using CPU nice (visible using top) (I believe Cassandra 4.0 also 
had this behavior)


* With Cassandra 4.1, compactions are no longer run as low priority 
thread (compaction now use "normal" CPU).


This means that when the server had limited CPU, Cassandra compaction 
now compete for the CPU with other process (probably including 
Cassandra itself) that need CPU. When it was using CPU nice, the 
compaction only competed for CPU with other lower priority process 
which was great as it leaves CPU available for processes that need to 
kept small response time (like an API used by human).


Is it wanted to lose this feature in Cassandra 4.1 or was it just a 
forget during re-write of compaction executor ? Should I open a bug to 
re-introduce this feature in Cassandra ?



I've done few searches, and:

* I believe compaction used CPU nice because the compactor executor 
was created with minimal priority: 
https://github.com/apache/cassandra/blob/cassandra-3.11.16/src/java/org/apache/cassandra/db/compaction/CompactionManager.java#L1906 



* I think it was dropped by commit 
https://github.com/apache/cassandra/commit/be1f050bc8c0cd695a42952e3fc84625ad48d83a 



* It looks doable to set the thread priority with new executor, I 
think adding ".withThreadPriority(Thread.MIN_PRIORITY)" when using 
executorFactory in 
https://github.com/apache/cassandra/blob/77a3e0e818df3cce45a974ecc977ad61bdcace47/src/java/org/apache/cassandra/db/compaction/CompactionManager.java#L2028 
should 
do it.



Did I miss a reason to no longer use low priority threads for 
compaction ? Should I open a bug for re-adding this feature / submit a 
PR ?


Regards,

Pierre Fersing