Re: Compaction task priority

2022-09-02 Thread onmstester onmstester via user
I was there too! I found nothing to work around it except stopping
big/unnecessary compactions manually (using nodetool stop) whenever they
appear, via shell scripts run from crontab.
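For illustration, a minimal sketch of that kind of workaround; the script name,
log path and cron schedule are assumptions, not something from the original setup:

  #!/bin/bash
  # stop-compactions.sh (hypothetical name): log what is currently running, then
  # ask Cassandra to stop every task of type COMPACTION on this node. VALIDATION,
  # CLEANUP etc. are left alone, because nodetool stop is type-based.
  nodetool compactionstats -H >> /var/log/cassandra/stop-compactions.log 2>&1
  nodetool stop COMPACTION    >> /var/log/cassandra/stop-compactions.log 2>&1

  # Example crontab entry (assumed schedule: every 15 minutes during peak hours):
  # */15 8-20 * * * /usr/local/bin/stop-compactions.sh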


 On Fri, 02 Sep 2022 10:59:22 +0430 Gil Ganz  wrote ---



Hey,
When deciding which sstables to compact together, how is the priority
determined between tasks, and can I do something about it?



In some cases (mostly after removing a node), it takes a while for compactions
to keep up with the new data that came from the removed nodes, and I see the
node is busy with huge compaction tasks while a lot of small sstables from new
application writes pile up. Read performance is not good because the new data
is scattered across many sstables, and combining the big sstables probably
won't help reduce that fragmentation as much (I think).



Another thing that comes to mind: perhaps I have a table that is very big but
not read that much; it would be nice to be able to give other tables higher
compaction priority (to help in a case like the one described above).



Version is 4.0.4



Gil

Re: Compaction task priority

2022-09-02 Thread onmstester onmstester via user
Another thing that comes to my mind: increase the minimum number of sstables
required to trigger a compaction (min_threshold) from 4 to 32 for the big
table that isn't read much, although you should watch out for the sstable
count growing too high.
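A sketch of that change for an STCS table, assuming a hypothetical keyspace ks
and table big_table; min_threshold is the STCS option behind the default of 4
mentioned above:

  # Raise the minimum number of similar-sized sstables STCS needs before it
  # will trigger a compaction on this table (default 4).
  cqlsh -e "ALTER TABLE ks.big_table
            WITH compaction = {'class': 'SizeTieredCompactionStrategy',
                               'min_threshold': '32'};"

  # Keep an eye on the sstable count afterwards, as warned above:
  nodetool tablestats ks.big_table | grep -i 'SSTable count'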



Re: Compaction task priority

2022-09-06 Thread onmstester onmstester via user
Use nodetool stop -id COMPACTION_UUID (the UUID is reported in nodetool
compactionstats); you can also find the details with nodetool help stop.
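For example (the UUID below is a made-up placeholder):

  # The id column of compactionstats identifies each running task.
  nodetool compactionstats

  # Stop just that one task instead of every COMPACTION-type task:
  nodetool stop -id f6c2dbe0-2a3b-11ed-9a6a-6f1b2c3d4e5f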


 On Mon, 05 Sep 2022 10:18:52 +0430 Gil Ganz  wrote ---




onmstester - how can you stop a specific compaction task? The stop command
stops all compactions of a given type (it would be nice to be able to stop a
specific one).

Jim - in my case the solution was actually to limit concurrent compactors, not
increase them. Too many concurrent tasks slowed the server down so much that
it could not keep up.
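A sketch of applying that limit at runtime on 4.0 (the value 2 is just an
example; the matching cassandra.yaml setting is concurrent_compactors):

  # Check the current number of compaction threads, then lower it so fewer
  # compaction tasks run at once. Set concurrent_compactors in cassandra.yaml
  # as well, otherwise the value reverts on restart.
  nodetool getconcurrentcompactors
  nodetool setconcurrentcompactors 2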




On Fri, Sep 2, 2022 at 4:55 PM Jim Shaw <jxys...@gmail.com> wrote:





If capacity allows, increase compaction_throughput_mb_per_sec as the first
tuning step, and if compactions still fall behind, increase
concurrent_compactors as the second step.
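Those two steps as runtime commands, with placeholder numbers (the
corresponding cassandra.yaml settings are compaction_throughput_mb_per_sec and
concurrent_compactors):

  # 1st tuning: raise the node-wide compaction throughput cap (MB/s, 0 = unthrottled).
  nodetool getcompactionthroughput
  nodetool setcompactionthroughput 128

  # 2nd tuning, only if compactions still fall behind and CPU/disk allow it:
  nodetool setconcurrentcompactors 4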



Regards,



Jim

Re: Using zstd compression on Cassandra 3.x

2022-09-12 Thread onmstester onmstester via user
I patched this on 3.11.2 easily:

1. build the jar file from source and put it in the cassandra/lib directory

2. restart the cassandra service

3. alter the table to use zstd compression and rebuild its sstables

But that was at a time when 4.0 was not available yet; after that I upgraded
to 4.0 immediately.
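For reference, on 4.0 (where ZstdCompressor ships in-tree) the alter-and-rebuild
step might look like the sketch below; the keyspace/table names and chunk length
are placeholders, and the compressor class name for the third-party 3.x jar will
differ:

  # Switch the table to zstd; only newly written sstables pick it up...
  cqlsh -e "ALTER TABLE ks.my_table
            WITH compression = {'class': 'ZstdCompressor', 'chunk_length_in_kb': '64'};"

  # ...so rewrite the existing sstables with the new compressor.
  nodetool upgradesstables -a ks my_table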


 On Tue, 13 Sep 2022 06:38:08 +0430 Eunsu Kim  
wrote ---



Hi all,

Zstd is a very good compression algorithm and is available in Cassandra 4.0,
because its overall performance and compression ratio are excellent.

There is an open-source implementation available for Cassandra 3.x:
https://github.com/MatejTymes/cassandra-zstd

Do you have any experience applying this to production?

I want to improve performance and disk usage by applying it to a running
Cassandra cluster.

Thanks.

Re: Fwd: Re: Problem on setup Cassandra v4.0.1 cluster

2022-10-08 Thread onmstester onmstester via user
I encountered the same problem again with the same error logs (this time with
Apache Cassandra 4.0.6 and a new cluster), but unlike the previous time, the
hostname config was fine. After days of trial and error, I finally found the
root cause: the clock on the faulty server was about 2 minutes off and not in
sync with the other servers in the cluster. I simply synced the time and the
problem was fixed.
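A quick way to spot that kind of drift, assuming systemd hosts reachable over
ssh (the node names are placeholders):

  # Compare wall-clock time and NTP sync status across the nodes.
  for host in node1 node2 node3; do
      echo "== $host =="
      ssh "$host" 'date -u && timedatectl | grep -i synchronized'
  done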

I wonder if the community could provide more information at the log level for
such problems (to save users from struggling to debug this sort of thing),
because these two problems (a faulty hostname config and an unsynchronized
server clock) are common with manual configuration, and nobody expects them to
prevent a Cassandra node from joining the cluster.


 On Mon, 31 Jan 2022 16:35:50 +0330 onmstester onmstester 
 wrote ---





Once again it was related to hostname configuration (I remember having had
problems with this multiple times before, even with different applications).
This time the root cause was a typo in one of the several hostname config
files: the name in /etc/hostname differed from the one in /etc/hosts. I fixed
that and now there is no problem.
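A small sanity check for that kind of mismatch (nothing Cassandra-specific,
just the usual Linux files):

  # All three views of the node's name should agree.
  hostname
  cat /etc/hostname
  grep -w "$(hostname)" /etc/hosts || echo "hostname missing from /etc/hosts"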



I wonder how Cassandra-3.11 worked?!



P.S.: The default dc name in version 4 was changed to datacenter1 (from dc1),
and it seems to cause a bit of a problem with previous configs (the default in
the rack-dc properties file is still dc1).
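If GossipingPropertyFileSnitch is in use, a quick check for that mismatch might
look like this (the config path is the package default and may differ):

  # The dc/rack values here must match what the rest of the cluster reports.
  grep -E '^(dc|rack)=' /etc/cassandra/cassandra-rackdc.properties

  # Compare with the Datacenter headings printed by:
  nodetool status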



Thank you



Best Regards


Forwarded message
From: Erick Ramirez
Date: Mon, 31 Jan 2022 15:06:21 +0330
Subject: Re: Problem on setup Cassandra v4.0.1 cluster

TP stats indicate pending gossip. Check that the times are synchronised on
both nodes (use NTP), since unsynchronised clocks can prevent gossip from
working.



I'd also suggest looking at the logs on both nodes to see what other WARN and 
ERROR messages are being reported. Cheers!
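For example (the log path assumes a package install):

  # Pending or blocked gossip stages show up in the thread-pool statistics.
  nodetool tpstats | grep -i -E 'gossip|pending'

  # Scan each node for other warnings and errors.
  grep -E 'WARN|ERROR' /var/log/cassandra/system.log | tail -n 50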

RE: Best compaction strategy for rarely used data

2023-01-06 Thread onmstester onmstester via user
Another solution: distribute the data across more tables. For example, you
could create multiple tables based on the value or a hash bucket of one of the
columns; that way the current data volume and compaction overhead would be
divided by the number of underlying tables. Note that there is a practical
limit on the number of tables in Cassandra (a few hundred).
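A sketch of that bucketing idea, assuming a hypothetical events table split
into 16 bucket tables keyed by a hash of one column; all names and columns are
made up:

  # Create ks.events_00 .. ks.events_15 with the same (assumed) schema.
  for i in $(seq 0 15); do
      cqlsh -e "CREATE TABLE IF NOT EXISTS ks.events_$(printf '%02d' "$i") (
                    device_id text,
                    ts timestamp,
                    payload blob,
                    PRIMARY KEY (device_id, ts));"
  done

  # The application would then route each write with something like:
  #   bucket = hash(device_id) % 16   ->   write to ks.events_<bucket>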

I wish STCS simply had a maximum sstable size limit, so that sstables bigger
than the limit would not be compacted at all; that would have solved most of
these kinds of problems.



 On Fri, 30 Dec 2022 21:43:27 +0330 Durity, Sean R via user 
 wrote ---




Yes, clean-up will reduce the disk space on the existing nodes by re-writing 
only the data that the node now owns into new sstables.
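The command itself is one line per node; the keyspace name is a placeholder,
and the rewrite is I/O-heavy, so it is usually run node by node after the new
node has finished joining:

  # Rewrite this node's sstables, dropping the data it no longer owns.
  # -j limits how many sstables are cleaned in parallel (0 = all compaction threads).
  nodetool cleanup -j 1 ks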

 

 

Sean R. Durity

DB Solutions

Staff Systems Engineer – Cassandra

 

From: Lapo Luchini  
 Sent: Friday, December 30, 2022 4:12 AM
 To: user@cassandra.apache.org
 Subject: [EXTERNAL] Re: Best compaction strategy for rarely used data

 



On 2022-12-29 21:54, Durity, Sean R via user wrote:
> At some point you will end up with large sstables (like 1 TB) that won't
> compact because there are not 4 similar-sized ones able to be compacted

Yes, that's exactly what's happening.

I'll see maybe just one more compaction, since the biggest sstable is already
more than 20% of the residual free space.

> For me, the backup strategy shouldn't drive the rest.

Mhh, yes, that makes sense.

> And if your data is ever-growing and never deleted, you will be adding nodes
> to handle the extra data as time goes by (and running clean-up on the
> existing nodes).

What will happen when adding new nodes, as you say, though? If I have a 1 TB
sstable in which 250 GB of data will no longer be useful (as a new node will
be the new owner), will that sstable be reduced to 750 GB by "cleanup", or
will it retain the old data?

Thanks,

--
Lapo Luchini
l...@lapo.it

 

Fwd: Re: Cassandra uneven data repartition

2023-01-06 Thread onmstester onmstester via user
Isn't there a very big (>40 GB) sstable in /volumes/cassandra/data/data1? If
there is, you could split it or change your data model to prevent such
sstables.
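A sketch of how to confirm and split such an sstable; the path pattern and the
10 GB target size are assumptions, and sstablesplit is an offline tool, so the
Cassandra service must be stopped before running it:

  # Look for unusually large sstables on the nearly full disk.
  find /volumes/cassandra/data/data1 -name '*-Data.db' -size +40G -exec ls -lh {} \;

  # With Cassandra stopped, split the offender into ~10 GB pieces
  # (sstablesplit lives in tools/bin of the installation):
  sstablesplit --no-snapshot -s 10240 /volumes/cassandra/data/data1/ks/table-*/*-Data.db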



Forwarded message
From: Loïc CHANEL via user
Date: Fri, 06 Jan 2023 12:58:11 +0330
Subject: Re: Cassandra uneven data repartition



Hi team,



Does anyone know how to even out the data between several data disks?

Another approach could be to prevent Cassandra from writing to a disk that is
90% full, but is there a way to do that?

Thanks,





Loïc CHANEL
System Big Data engineer
SoftAtHome (Lyon, France)


On Mon, Dec 19, 2022 at 11:07, Loïc CHANEL wrote:





Hi team,



I had a disk space issue on a Cassandra server, and I noticed that the data
was not evenly distributed across my 15 disks.

Here is the distribution:

/dev/vde1        99G   89G  4.7G  96% /volumes/cassandra/data/data1
/dev/vdd1        99G   51G   44G  54% /volumes/cassandra/data/data2
/dev/vdf1        99G   57G   38G  61% /volumes/cassandra/data/data3
/dev/vdg1        99G   51G   44G  54% /volumes/cassandra/data/data4
/dev/vdh1        99G   50G   44G  54% /volumes/cassandra/data/data5
/dev/vdi1        99G   50G   44G  53% /volumes/cassandra/data/data6
/dev/vdj1        99G   77G   17G  83% /volumes/cassandra/data/data7
/dev/vdk1        99G   49G   45G  53% /volumes/cassandra/data/data8
/dev/vdl1        99G   52G   42G  56% /volumes/cassandra/data/data9
/dev/vdm1        99G   50G   45G  53% /volumes/cassandra/data/data10
/dev/vdn1        99G   47G   47G  51% /volumes/cassandra/data/data11
/dev/vdo1        99G   50G   44G  54% /volumes/cassandra/data/data12
/dev/vdp1        99G   52G   43G  55% /volumes/cassandra/data/data13
/dev/vdq1        99G   49G   45G  52% /volumes/cassandra/data/data14
/dev/vdr1        99G   50G   44G  53% /volumes/cassandra/data/data15



Do you know what could cause this, and how to even out the data between the
disks a little to avoid any disk saturation? I noticed that when free space on
one disk approaches 5%, Cassandra performance drops.

Thanks,





Loïc CHANEL
System Big Data engineer
SoftAtHome (Lyon, France)