Re: Mixed Cluster 4.0 and 4.1

2024-04-24 Thread Paul Chandler
Hi Bowen,

Thanks for your quick reply. 

Sorry, I used the wrong term there: it is a maintenance window rather than an 
outage. This is a key system, and its vital nature means that the customer is 
rightly very risk averse, so we will only ever get permission to upgrade one DC 
per night via a rolling upgrade, meaning this will always take more than a 
week. 

So we can’t shorten the time the cluster is in mixed mode, but I am concerned 
about having a schema mismatch for that long. Should I be concerned, or have 
others upgraded in a similar way?
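
For reference, a quick way to keep an eye on the schema situation while the cluster is mixed (a sketch; run from any node):

nodetool describecluster   # lists the schema versions and which nodes hold each
nodetool status            # confirms which nodes are up in each DC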

Thanks

Paul

> On 24 Apr 2024, at 17:02, Bowen Song via user  
> wrote:
> 
> Hi Paul,
> 
> You don't need to plan for or introduce an outage for a rolling upgrade, 
> which is the preferred route. It isn't advisable to take down an entire DC to 
> do the upgrade.
> 
> You should aim to complete upgrading the entire cluster and finish a full 
> repair within the shortest gc_grace_seconds (default to 10 days) of all 
> tables. Failing to do that may cause data resurrections.
> 
> During the rolling upgrade, you should not run repair or any DDL query (such 
> as ALTER TABLE, TRUNCATE, etc.).
> 
> You don't need to do the rolling upgrade node by node. You can do it rack by 
> rack. Stopping all nodes in a single rack and upgrading them concurrently is 
> much faster. The number of nodes doesn't matter that much to the time 
> required to complete a rolling upgrade; it's the number of DCs and racks 
> that matters.
> 
> Cheers,
> Bowen
> 
> On 24/04/2024 16:16, Paul Chandler wrote:
>> Hi all,
>> 
>> We have some large clusters ( 1000+  nodes ), these are across multiple 
>> datacenters.
>> 
>> When we perform upgrades we would normally upgrade a DC at a time during a 
>> planned outage for one DC. This means that a cluster might be in a mixed 
>> mode with multiple versions for a week or 2.
>> 
>> We have noticed during our testing that upgrading to 4.1 causes a 
>> schema mismatch due to the new tables added to the system keyspace.
>> 
>> Is this going to be an issue if this schema mismatch lasts for maybe several 
>> weeks? I assume that running any DDL during that time would be a bad idea; 
>> are there any other issues to look out for?
>> 
>> Thanks
>> 
>> Paul Chandler



Mixed Cluster 4.0 and 4.1

2024-04-24 Thread Paul Chandler
Hi all,

We have some large clusters (1000+ nodes) spread across multiple 
datacenters. 

When we perform upgrades we would normally upgrade one DC at a time during a 
planned outage for that DC. This means that a cluster might be in mixed mode, 
with multiple versions, for a week or two.

We have noticed during our testing that upgrading to 4.1 causes a schema 
mismatch due to the new tables added to the system keyspace.

Is this going to be an issue if this schema mismatch lasts for maybe several 
weeks? I assume that running any DDL during that time would be a bad idea; are 
there any other issues to look out for?

Thanks

Paul Chandler

Re: Best compaction strategy for rarely used data

2022-12-29 Thread Paul Chandler
Hi Lapo

Take a look at TWCS; I think it could help your use case: 
https://thelastpickle.com/blog/2016/12/08/TWCS-part1.html
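
Switching an existing table over looks roughly like this (a sketch; the keyspace/table names and window settings are illustrative, not from your setup):

cqlsh -e "ALTER TABLE my_ks.audit_data WITH compaction = {'class': 'TimeWindowCompactionStrategy', 'compaction_window_unit': 'DAYS', 'compaction_window_size': '7'};"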

Regards 

Paul Chandler

Sent from my iPhone

> On 29 Dec 2022, at 08:55, Lapo Luchini  wrote:
> 
> Hi, I have a table which gets (a lot of) data that is written once and very 
> rarely read (it is used for data that is mandatory for regulatory reasons), 
> and almost never deleted.
> 
> I'm using the default STCS as at the time I didn't know any better, but 
> SSTable sizes are getting huge, which is a problem both because they are 
> approaching the size of the available disk and because I'm using a 
> snapshot-based system to back up the node (and thus compacting a huge SSTable 
> into an even bigger one generates a lot of traffic for mostly old data).
> 
> I'm thinking about switching to LCS (mainly to solve the size issue), but I 
> read that it is "optimized for read heavy workloads […] not a good choice for 
> immutable time series data". Given that I don't really care about write or 
> read speed, but would like SSTable sizes to have an upper limit, would this 
> strategy still be the best?
> 
> PS: Googling around, a strategy called "incremental compaction" (ICS) keeps 
> coming up in results, but that's only available in ScyllaDB, right?
> 
> -- 
> Lapo Luchini
> l...@lapo.it
> 


Re: Wrong Consistency level seems to be used

2022-07-21 Thread Paul Chandler
I came across this problem a few years ago, and had long conversations with 
Datastax support about it. 

In my case it turned out that the error message is misleading, and I was 
pointed to the ticket: https://issues.apache.org/jira/browse/CASSANDRA-14715

I can't remember much about it now, but see if that ticket applies to your 
experience. 

Thanks 

Paul Chandler

> On 21 Jul 2022, at 15:12, pwozniak  wrote:
> 
> Yes, I did it. Nothing like this in my code. Consistency level is set only in 
> one place (shown below).
> 
> 
> 
> On 7/21/22 4:08 PM, manish khandelwal wrote:
>> Consistency can also be set on a statement basis. So please check in your 
>> code that you might be setting consistency 'ALL' for some queries.
>> 
>> On Thu, Jul 21, 2022 at 7:23 PM pwozniak > <mailto:pwozn...@man.poznan.pl>> wrote:
>> Hi,
>> 
>> we have the following code (java driver):
>> 
>> cluster = Cluster.builder().addContactPoints(contactPoints).withPort(port)
>> .withProtocolVersion(ProtocolVersion.V3)
>> .withQueryOptions(new QueryOptions()
>> .setConsistencyLevel(ConsistencyLevel.QUORUM))
>> .withTimestampGenerator(new AtomicMonotonicTimestampGenerator())
>> .withCredentials(userName, password).build();
>> 
>> session = cluster.connect(keyspaceName);
>> 
>> where ConsistencyLevel.QUORUM is our default consistency level. But we keep 
>> receiving the following exceptions:
>> 
>> 
>> 
>> com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra timeout 
>> during read query at consistency ALL (3 responses were required but only 2 
>> replica responded)
>> 
>> 
>> 
>> Why is the consistency level ALL there? The availability of our cluster is 
>> reduced because of that. We verified all our source code and haven't found 
>> any place where ALL is set. 
>> We also did heap dump and found only ConsistencyLevel.QUORUM there.
>> 
>> 
>> 
>> Regards,
>> 
>> Pawel
>> 



Re: sstables changing in snapshots

2022-03-22 Thread Paul Chandler
Hi Yifan,

It looks like you are right. I can reproduce this: when creating the second 
snapshot, the ctime does get updated to the time of the second snapshot.

I guess this is what is causing tar to produce the error.
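
For the record, the same behaviour reproduces on Linux with GNU stat (a sketch; the file names are just examples):

touch nb-1-big-Data.db
stat -c 'ctime: %z  links: %h' nb-1-big-Data.db
ln nb-1-big-Data.db snapshot-link-Data.db        # what taking a new snapshot does via hardlinking
stat -c 'ctime: %z  links: %h' nb-1-big-Data.db  # ctime (and link count) changed, so tar sees the file as modified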

Paul 

> On 22 Mar 2022, at 17:12, Yifan Cai  wrote:
> 
> I am wondering if the cause is tarring when creating hardlinks, i.e. creating 
> a new snapshot. 
> 
> A quick experiment on my Mac indicates the file status (ctime) is updated 
> when creating hardlink. 
> 
> ➜ stat -f "Access (atime): %Sa%nModify (mtime): %Sm%nChange (ctime): %Sc" a
> Access (atime): Mar 22 10:03:43 2022
> Modify (mtime): Mar 22 10:03:43 2022
> Change (ctime): Mar 22 10:05:43 2022
> 
> On Tue, Mar 22, 2022 at 10:01 AM Jeff Jirsa  <mailto:jji...@gmail.com>> wrote:
> The most useful thing that folks can provide is an indication of what was 
> writing to those data files when you were doing backups.
> 
> It's almost certainly one of:
> - Memtable flush 
> - Compaction
> - Streaming from repair/move/bootstrap
> 
> If you have logs that indicate compaction starting/finishing with those 
> sstables, or memtable flushing those sstables, or if the .log file is 
> included in your backup, pasting the contents of that .log file into a ticket 
> will make this much easier to debug.
> 
> 
> 
> On Tue, Mar 22, 2022 at 9:49 AM Yifan Cai  <mailto:yc25c...@gmail.com>> wrote:
> I do not think there is a ticket already. Feel free to create one. 
> https://issues.apache.org/jira/projects/CASSANDRA/issues/ 
> <https://issues.apache.org/jira/projects/CASSANDRA/issues/>
> 
> It would be helpful to provide
> 1. The version of the cassandra
> 2. The options used for snapshotting 
> 
> - Yifan
> 
> On Tue, Mar 22, 2022 at 9:41 AM Paul Chandler  <mailto:p...@redshots.com>> wrote:
> Hi all,
> 
> Was there any further progress made on this? Did a Jira get created?
> 
> I have been debugging our backup scripts and seem to have found the same 
> problem. 
> 
> As far as I can work out so far, it seems that this happens when a new 
> snapshot is created and the old snapshot is being tarred.
> 
> I get a similar message:
> 
> /bin/tar: 
> var/lib/cassandra/backup/keyspacename/tablename-4eec3b01aba811e896342351775ccc66/snapshots/csbackup_2022-03-22T14\\:04\\:05/nb-523601-big-Data.db:
>  file changed as we read it
> 
> Thanks 
> 
> Paul 
> 
> 
> 
>> On 19 Mar 2022, at 02:41, Dinesh Joshi > <mailto:djo...@apache.org>> wrote:
>> 
>> Do you have a repro that you can share with us? If so, please file a jira 
>> and we'll take a look.
>> 
>>> On Mar 18, 2022, at 12:15 PM, James Brown >> <mailto:jbr...@easypost.com>> wrote:
>>> 
>>> This in 4.0.3 after running nodetool snapshot that we're seeing sstables 
>>> change, yes.
>>> 
>>> James Brown
>>> Infrastructure Architect @ easypost.com <http://easypost.com/>
>>> 
>>> On 2022-03-18 at 12:06:00, Jeff Jirsa >> <mailto:jji...@gmail.com>> wrote:
>>>> This is nodetool snapshot yes? 3.11 or 4.0?
>>>> 
>>>> In versions prior to 3.0, sstables would be written with -tmp- in the 
>>>> name, then renamed when complete, so an sstable definitely never changed 
>>>> once it had the final file name. With the new transaction log mechanism, 
>>>> we use one name and a transaction log to note what's in flight and what's 
>>>> not, so if the snapshot system is including sstables being written (from 
>>>> flush, from compaction, or from streaming), those aren't final and should 
>>>> be skipped.
>>>> 
>>>> 
>>>> 
>>>> 
>>>> On Fri, Mar 18, 2022 at 11:46 AM James Brown >>> <mailto:jbr...@easypost.com>> wrote:
>>>> We use the boring combo of cassandra snapshots + tar to backup our 
>>>> cassandra nodes; every once in a while, we'll notice tar failing with the 
>>>> following:
>>>> 
>>>> tar: 
>>>> data/addresses/addresses-eb0196100b7d11ec852b1541747d640a/snapshots/backup20220318183708/nb-167-big-Data.db:
>>>>  file changed as we read it
>>>> 
>>>> I find this a bit perplexing; what would cause an sstable inside a 
>>>> snapshot to change? The only thing I can think of is an incremental repair 
>>>> changing the "repaired_at" flag on the sstable, but it seems like that 
>>>> should "un-share" the hardlinked sstable rather than running the risk of 
>>>> mutating a snapshot.
>>>> 
>>>> 
>>>> James Brown
>>>> Cassandra admin @ easypost.com <http://easypost.com/>
> 



Re: sstables changing in snapshots

2022-03-22 Thread Paul Chandler
I will do a few more tests to see if I can pinpoint what is causing this, then 
I will create a Jira ticket.

This is actually a copy of a cluster that I am testing with, so the only writes 
happening to the cluster are internal ones, so I would be surprised if it is 
compaction or memtable flushes on the offending tables. There could be repairs 
going on on the cluster, though.

This is a 4.0.0 cluster, but I think I have the same problem on a 3.11.6 
cluster; I have not tested the 3.11.6 version yet.

So I will try and get as much detail together before creating the ticket.
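
As a starting point for the detail Jeff asked for, something along these lines should show whether the sstable was touched by a flush, compaction or stream (a sketch; the sstable name and paths are from my earlier example and will differ):

grep 'nb-523601-big' /var/log/cassandra/system.log /var/log/cassandra/debug.log
grep -l 'nb-523601-big' /var/lib/cassandra/data/keyspacename/tablename-*/*_txn_*.log   # any in-flight transaction logs mentioning it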

Thanks 

Paul


> On 22 Mar 2022, at 17:01, Jeff Jirsa  wrote:
> 
> The most useful thing that folks can provide is an indication of what was 
> writing to those data files when you were doing backups.
> 
> It's almost certainly one of:
> - Memtable flush 
> - Compaction
> - Streaming from repair/move/bootstrap
> 
> If you have logs that indicate compaction starting/finishing with those 
> sstables, or memtable flushing those sstables, or if the .log file is 
> included in your backup, pasting the contents of that .log file into a ticket 
> will make this much easier to debug.
> 
> 
> 
> On Tue, Mar 22, 2022 at 9:49 AM Yifan Cai  <mailto:yc25c...@gmail.com>> wrote:
> I do not think there is a ticket already. Feel free to create one. 
> https://issues.apache.org/jira/projects/CASSANDRA/issues/ 
> <https://issues.apache.org/jira/projects/CASSANDRA/issues/>
> 
> It would be helpful to provide
> 1. The version of the cassandra
> 2. The options used for snapshotting 
> 
> - Yifan
> 
> On Tue, Mar 22, 2022 at 9:41 AM Paul Chandler  <mailto:p...@redshots.com>> wrote:
> Hi all,
> 
> Was there any further progress made on this? Did a Jira get created?
> 
> I have been debugging our backup scripts and seem to have found the same 
> problem. 
> 
> As far as I can work out so far, it seems that this happens when a new 
> snapshot is created and the old snapshot is being tarred.
> 
> I get a similar message:
> 
> /bin/tar: 
> var/lib/cassandra/backup/keyspacename/tablename-4eec3b01aba811e896342351775ccc66/snapshots/csbackup_2022-03-22T14\\:04\\:05/nb-523601-big-Data.db:
>  file changed as we read it
> 
> Thanks 
> 
> Paul 
> 
> 
> 
>> On 19 Mar 2022, at 02:41, Dinesh Joshi > <mailto:djo...@apache.org>> wrote:
>> 
>> Do you have a repro that you can share with us? If so, please file a jira 
>> and we'll take a look.
>> 
>>> On Mar 18, 2022, at 12:15 PM, James Brown >> <mailto:jbr...@easypost.com>> wrote:
>>> 
>>> This in 4.0.3 after running nodetool snapshot that we're seeing sstables 
>>> change, yes.
>>> 
>>> James Brown
>>> Infrastructure Architect @ easypost.com <http://easypost.com/>
>>> 
>>> On 2022-03-18 at 12:06:00, Jeff Jirsa >> <mailto:jji...@gmail.com>> wrote:
>>>> This is nodetool snapshot yes? 3.11 or 4.0?
>>>> 
>>>> In versions prior to 3.0, sstables would be written with -tmp- in the 
>>>> name, then renamed when complete, so an sstable definitely never changed 
>>>> once it had the final file name. With the new transaction log mechanism, 
>>>> we use one name and a transaction log to note what's in flight and what's 
>>>> not, so if the snapshot system is including sstables being written (from 
>>>> flush, from compaction, or from streaming), those aren't final and should 
>>>> be skipped.
>>>> 
>>>> 
>>>> 
>>>> 
>>>> On Fri, Mar 18, 2022 at 11:46 AM James Brown >>> <mailto:jbr...@easypost.com>> wrote:
>>>> We use the boring combo of cassandra snapshots + tar to backup our 
>>>> cassandra nodes; every once in a while, we'll notice tar failing with the 
>>>> following:
>>>> 
>>>> tar: 
>>>> data/addresses/addresses-eb0196100b7d11ec852b1541747d640a/snapshots/backup20220318183708/nb-167-big-Data.db:
>>>>  file changed as we read it
>>>> 
>>>> I find this a bit perplexing; what would cause an sstable inside a 
>>>> snapshot to change? The only thing I can think of is an incremental repair 
>>>> changing the "repaired_at" flag on the sstable, but it seems like that 
>>>> should "un-share" the hardlinked sstable rather than running the risk of 
>>>> mutating a snapshot.
>>>> 
>>>> 
>>>> James Brown
>>>> Cassandra admin @ easypost.com <http://easypost.com/>
> 



Re: sstables changing in snapshots

2022-03-22 Thread Paul Chandler
Hi all,

Was there any further progress made on this? Did a Jira get created?

I have been debugging our backup scripts and seem to have found the same 
problem. 

As far as I can work out so far, it seems that this happens when a new snapshot 
is created and the old snapshot is being tarred.

I get a similar message:

/bin/tar: 
var/lib/cassandra/backup/keyspacename/tablename-4eec3b01aba811e896342351775ccc66/snapshots/csbackup_2022-03-22T14\\:04\\:05/nb-523601-big-Data.db:
 file changed as we read it

Thanks 

Paul 



> On 19 Mar 2022, at 02:41, Dinesh Joshi  wrote:
> 
> Do you have a repro that you can share with us? If so, please file a jira and 
> we'll take a look.
> 
>> On Mar 18, 2022, at 12:15 PM, James Brown > > wrote:
>> 
>> This in 4.0.3 after running nodetool snapshot that we're seeing sstables 
>> change, yes.
>> 
>> James Brown
>> Infrastructure Architect @ easypost.com 
>> 
>> On 2022-03-18 at 12:06:00, Jeff Jirsa > > wrote:
>>> This is nodetool snapshot yes? 3.11 or 4.0?
>>> 
>>> In versions prior to 3.0, sstables would be written with -tmp- in the name, 
>>> then renamed when complete, so an sstable definitely never changed once it 
>>> had the final file name. With the new transaction log mechanism, we use one 
>>> name and a transaction log to note what's in flight and what's not, so if 
>>> the snapshot system is including sstables being written (from flush, from 
>>> compaction, or from streaming), those aren't final and should be skipped.
>>> 
>>> 
>>> 
>>> 
>>> On Fri, Mar 18, 2022 at 11:46 AM James Brown >> > wrote:
>>> We use the boring combo of cassandra snapshots + tar to backup our 
>>> cassandra nodes; every once in a while, we'll notice tar failing with the 
>>> following:
>>> 
>>> tar: 
>>> data/addresses/addresses-eb0196100b7d11ec852b1541747d640a/snapshots/backup20220318183708/nb-167-big-Data.db:
>>>  file changed as we read it
>>> 
>>> I find this a bit perplexing; what would cause an sstable inside a snapshot 
>>> to change? The only thing I can think of is an incremental repair changing 
>>> the "repaired_at" flag on the sstable, but it seems like that should 
>>> "un-share" the hardlinked sstable rather than running the risk of mutating 
>>> a snapshot.
>>> 
>>> 
>>> James Brown
>>> Cassandra admin @ easypost.com 



Re: Cassandra 4.0 hanging on restart

2022-01-27 Thread Paul Chandler
Thanks Erick and Bowen

I do find all the different parameters for repairs confusing, and even reading 
up on it now, I see Datastax warns against incremental repairs with -pr, but 
then the code here seems to negate the need for this warning.

Anyway, running it like this produces data in the system.repairs table, so I 
assume it is doing incremental repairs.

nodetool -h localhost -p 7199 repair -pr  -st +02596488670266845384 -et 
+02613877898679419724

Then running it like this produces no data in the table, so again I assume that 
means it is doing full repairs.

nodetool -h localhost -p 7199 repair -pr -full -st +02596488670266845384 -et 
+02613877898679419724
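
For reference, this is how I was checking whether rows appeared in the table (a sketch):

cqlsh -e "SELECT count(*) FROM system.repairs;"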

Yesterday I recompiled the Cassandra 4.0.0 code with extra logging in the 
following method

https://github.com/apache/cassandra/blob/6709111ed007a54b3e42884853f89cabd38e4316/src/java/org/apache/cassandra/repair/consistent/LocalSessions.java#L338
 


This showed me that the extra 10 minutes (and more) on some clusters is being 
taken up in the for loop reading the rows from the system.repairs table.

So this does seem to be an issue if you are trying to do incremental range 
repairs in 4.0.

Thanks 

Paul 

> On 27 Jan 2022, at 10:27, Bowen Song  wrote:
> 
> Hi Erick,
> 
> 
> 
> From the source code: 
> https://github.com/apache/cassandra/blob/6709111ed007a54b3e42884853f89cabd38e4316/src/java/org/apache/cassandra/service/StorageService.java#L4042
>  
> 
> The -pr option has no effect if -st and -et are specified. Therefore, the 
> command results in an incremental repair.
> 
> 
> 
> Cheers,
> 
> Bowen
> 
> On 27/01/2022 01:32, Erick Ramirez wrote:
>> I just came across this thread and noted that you're running repairs with 
>> -pr which are not incremental repairs. Was that a typo? Cheers!



Re: Cassandra 4.0 hanging on restart

2022-01-26 Thread Paul Chandler
We don’t expose the JMX port outside localhost, so last time we looked it was 
not possible. I see now there is the sidecar option, but it sounds like there 
are a number of caveats, particularly around resources, that may cause some 
issues with our setup. So at the moment Reaper does not seem like a quick win 
for us.

Thanks 

Paul

 

> On 26 Jan 2022, at 10:33, Bowen Song  wrote:
> 
> I'm glad that it fixed the problem. Now, may I interest you with Cassandra 
> Reaper <http://cassandra-reaper.io/>? In my experience it has managed the 
> load fairly well on large clusters.
> 
> On 26/01/2022 10:19, Paul Chandler wrote:
>> I changed the range repair to be a full repair, reset the repairedAt for 
>> all SSTables and deleted the old data out of the system.repairs table. 
>> 
>> This then did not create any new rows in the system.repairs table, and the 
>> node was able to restart without any problem, so this seems to be a solution
>> 
>> I am concerned about turning on full repair for some of our cluster and what 
>> that will do to the load, so I am now going to experiment with larger 
>> incremental range repairs to see if there is a sweet spot where this works 
>> ok.
>> 
>> Paul
>> 
>>> On 25 Jan 2022, at 13:36, Bowen Song mailto:bo...@bso.ng>> 
>>> wrote:
>>> 
>>> That would indicate the "isSuperseded(session)" call returned false. After 
>>> looking at the source code, it seems the subrange incremental repair is 
>>> likely causing this.
>>> 
>>> Would you mind to try either subrange full repair or full range incremental 
>>> repair? You may need to reset the "repairedAt" value in all SSTables using 
>>> the "sstablerepairedset" tool if you decide to move on to use subrange full 
>>> repairs.
>>> 
>>> 
>>> 
>>> On 25/01/2022 12:39, Paul Chandler wrote:
>>>> Hi Bowen,
>>>> 
>>>> Yes there are a large number of "Skipping delete of FINALIZED 
>>>> LocalSession” messages.
>>>> 
>>>> We have a script that repairs ranges, stepping through the complete range 
>>>> in 5 days, this should create 1600 ranges over the 5 days, this runs 
>>>> commands like this:
>>>> 
>>>> nodetool -h localhost -p 7199 repair -pr  -st +09152150533683801432 -et 
>>>> +09154639946886262655
>>>> 
>>>> I am also seeing lots of "Auto deleting repair session LocalSession” 
>>>> messages - these seem to be deleting the rows with a repairedAt value of 
>>>> more than 5 days, so it seems like that part is working correctly, but 
>>>> just taking 5 days to delete them. 
>>>> 
>>>> Thanks 
>>>> 
>>>> Paul
>>>> 
>>>> 
>>>> 
>>>> 
>>>>> On 24 Jan 2022, at 22:12, Bowen Song mailto:bo...@bso.ng>> 
>>>>> wrote:
>>>>> 
>>>>> From the source code I've read, by default Cassandra will run a clean up 
>>>>> for the system.repairs table every 10 minutes, any row related to a 
>>>>> repair that has completed over 1 day ago will be automatically removed. I 
>>>>> highly doubt that you have ran 75,000 repairs in the 24 hours prior to 
>>>>> shutting down that node, because that's nearly one repair every second.
>>>>> 
>>>>> Do you see any logs like these?
>>>>> 
>>>>> Auto failing timed out repair session...
>>>>> Skipping delete of FINALIZED LocalSession ... because it has not been 
>>>>> superseded by a more recent session
>>>>> Skipping delete of LocalSession ... because it still contains sstables
>>>>> They are the logs from the cleanup() method in 
>>>>> https://github.com/apache/cassandra/blob/6709111ed007a54b3e42884853f89cabd38e4316/src/java/org/apache/cassandra/repair/consistent/LocalSessions.java#L416
>>>>>  
>>>>> <https://github.com/apache/cassandra/blob/6709111ed007a54b3e42884853f89cabd38e4316/src/java/org/apache/cassandra/repair/consistent/LocalSessions.java#L416>
>>>>>  which indicates a record was not deleted during the cleaned up for a 
>>>>> number of reasons.
>>>>> 
>>>>> On 24/01/2022 19:45, Paul Chandler wrote:
>>>>>> Hi Bowen,
>>>>>> 
>>>>>> Yes, there does seem to be a lot of rows, on one of the upgraded 
>>>>>> c

Re: Cassandra 4.0 hanging on restart

2022-01-26 Thread Paul Chandler
I changed the range repair to be a full repair, reset the repairedAt for all 
SSTables, and deleted the old data out of the system.repairs table.

This then did not create any new rows in the system.repairs table, and the node 
was able to restart without any problem, so this seems to be a solution.

I am concerned about turning on full repair for some of our clusters and what 
that will do to the load, so I am now going to experiment with larger 
incremental range repairs to see if there is a sweet spot where this works ok.
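
In case it is useful to anyone else, resetting repairedAt looked roughly like this (a sketch; the path is illustrative, and the tool needs the node to be stopped):

find /var/lib/cassandra/data/my_ks -name '*-Data.db' > /tmp/sstables.txt
sstablerepairedset --really-set --is-unrepaired -f /tmp/sstables.txt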

Paul

> On 25 Jan 2022, at 13:36, Bowen Song  wrote:
> 
> That would indicate the "isSuperseded(session)" call returned false. After 
> looking at the source code, it seems the subrange incremental repair is 
> likely causing this.
> 
> Would you mind to try either subrange full repair or full range incremental 
> repair? You may need to reset the "repairedAt" value in all SSTables using 
> the "sstablerepairedset" tool if you decide to move on to use subrange full 
> repairs.
> 
> 
> 
> On 25/01/2022 12:39, Paul Chandler wrote:
>> Hi Bowen,
>> 
>> Yes there are a large number of "Skipping delete of FINALIZED LocalSession” 
>> messages.
>> 
>> We have a script that repairs ranges, stepping through the complete range in 
>> 5 days, this should create 1600 ranges over the 5 days, this runs commands 
>> like this:
>> 
>> nodetool -h localhost -p 7199 repair -pr  -st +09152150533683801432 -et 
>> +09154639946886262655
>> 
>> I am also seeing lots of "Auto deleting repair session LocalSession” 
>> messages - these seem to be deleting the rows with a repairedAt value of 
>> more than 5 days, so it seems like that part is working correctly, but just 
>> taking 5 days to delete them. 
>> 
>> Thanks 
>> 
>> Paul
>> 
>> 
>> 
>> 
>>> On 24 Jan 2022, at 22:12, Bowen Song mailto:bo...@bso.ng>> 
>>> wrote:
>>> 
>>> From the source code I've read, by default Cassandra will run a clean up 
>>> for the system.repairs table every 10 minutes, any row related to a repair 
>>> that has completed over 1 day ago will be automatically removed. I highly 
>>> doubt that you have ran 75,000 repairs in the 24 hours prior to shutting 
>>> down that node, because that's nearly one repair every second.
>>> 
>>> Do you see any logs like these?
>>> 
>>> Auto failing timed out repair session...
>>> Skipping delete of FINALIZED LocalSession ... because it has not been 
>>> superseded by a more recent session
>>> Skipping delete of LocalSession ... because it still contains sstables
>>> They are the logs from the cleanup() method in 
>>> https://github.com/apache/cassandra/blob/6709111ed007a54b3e42884853f89cabd38e4316/src/java/org/apache/cassandra/repair/consistent/LocalSessions.java#L416
>>>  
>>> <https://github.com/apache/cassandra/blob/6709111ed007a54b3e42884853f89cabd38e4316/src/java/org/apache/cassandra/repair/consistent/LocalSessions.java#L416>
>>>  which indicates a record was not deleted during the cleaned up for a 
>>> number of reasons.
>>> 
>>> On 24/01/2022 19:45, Paul Chandler wrote:
>>>> Hi Bowen,
>>>> 
>>>> Yes, there does seem to be a lot of rows, on one of the upgraded clusters 
>>>> there 75,000 rows.
>>>> 
>>>> I have been experimenting on a test cluster, this has about a 5 minute 
>>>> pause, and around 15,000 rows. 
>>>> 
>>>> If I clear the system.repairs table ( by deleting the sstables ) then this 
>>>> does not pause at all, so seems to fix the problem. However I don’t really 
>>>> understand what the implications are of just removing that data.
>>>> 
>>>> Thanks 
>>>> 
>>>> Paul
>>>> 
>>>>> On 24 Jan 2022, at 18:50, Bowen Song mailto:bo...@bso.ng>> 
>>>>> wrote:
>>>>> 
>>>>> Hmm, interesting... Try "select * from system.repairs;" in cqlsh on a 
>>>>> slow starting node, do you get a lots of rows? This is the most obvious 
>>>>> loop run (indirectly) by the ActiveRepairService.start(). 
>>>>> 
>>>>> On 24/01/2022 13:30, Romain Anselin wrote:
>>>>>> 
>>>>>> Hi everyone,
>>>>>> 
>>>>>> We generated a JFR profile of the startup phase of Cassandra with Paul, 
>>>>>> and it would appear that the time is spent in the ActiveRepairSe

Re: Unsolicited emails from IRONMAN Monttremblant

2022-01-25 Thread Paul Chandler
Could it be that email address is in the  user group? If so your email could 
have triggered that automatic response, however I have not received anything 
after my recent emails 

There was an email on the dev list on 14/4/2020 that said the following, so 
they do seem to have had an interest in Cassandra in the past:


--

PLEASE delete my email from your communication, please
monttrembl...@ironman.com 



Merci

JACYNTHE



Coordonnatrice, Service aux Athlètes

Coordinator, Athlete Services



[image: http://ironmanmonttremblantmedia.com/wp-content/uploads/2018/05/signatureIronman2018.jpg]

---

> On 25 Jan 2022, at 13:23, Bowen Song  wrote:
> 
> Hello everyone,
> 
> 
> 
> Did anyone who has been participating discussions in this mailing list 
> receive unsolicited emails from the following email address?
> 
> IRONMAN Monttremblant  
> 
> The emails have subject line looks like this:
> 
> A ticket has been opened for your request [ TKT--] ref:##
> 
> I've noticed this since 18th January, and it took me a while to find the 
> connection between sending emails to this mailing list and receiving one of 
> those unsolicited emails. I want to know whether this is affecting other 
> peoples, and I would also like to know what's the best practice dealing with 
> this?
> 
> BTW, I have attempted to contact the sender and the company directly since 
> the 18th, but I have yet to receive any response. So I doubt this route is 
> going to help resolve the issue.
> 
> 
> 
> Regards,
> 
> Bowen
> 



Re: Cassandra 4.0 hanging on restart

2022-01-25 Thread Paul Chandler
Hi Bowen,

Yes there are a large number of "Skipping delete of FINALIZED LocalSession” 
messages.

We have a script that repairs ranges, stepping through the complete range in 5 
days; this should create 1600 ranges over the 5 days, and it runs commands like 
this:

nodetool -h localhost -p 7199 repair -pr  -st +09152150533683801432 -et 
+09154639946886262655

I am also seeing lots of "Auto deleting repair session LocalSession" messages - 
these seem to be deleting the rows with a repairedAt value more than 5 days old, 
so it seems like that part is working correctly, but it is just taking 5 days to 
delete them. 

Thanks 

Paul




> On 24 Jan 2022, at 22:12, Bowen Song  wrote:
> 
> From the source code I've read, by default Cassandra will run a clean up for 
> the system.repairs table every 10 minutes, any row related to a repair that 
> has completed over 1 day ago will be automatically removed. I highly doubt 
> that you have ran 75,000 repairs in the 24 hours prior to shutting down that 
> node, because that's nearly one repair every second.
> 
> Do you see any logs like these?
> 
> Auto failing timed out repair session...
> Skipping delete of FINALIZED LocalSession ... because it has not been 
> superseded by a more recent session
> Skipping delete of LocalSession ... because it still contains sstables
> They are the logs from the cleanup() method in 
> https://github.com/apache/cassandra/blob/6709111ed007a54b3e42884853f89cabd38e4316/src/java/org/apache/cassandra/repair/consistent/LocalSessions.java#L416
>  
> <https://github.com/apache/cassandra/blob/6709111ed007a54b3e42884853f89cabd38e4316/src/java/org/apache/cassandra/repair/consistent/LocalSessions.java#L416>
>  which indicates a record was not deleted during the cleaned up for a number 
> of reasons.
> 
> On 24/01/2022 19:45, Paul Chandler wrote:
>> Hi Bowen,
>> 
>> Yes, there does seem to be a lot of rows, on one of the upgraded clusters 
>> there 75,000 rows.
>> 
>> I have been experimenting on a test cluster, this has about a 5 minute 
>> pause, and around 15,000 rows. 
>> 
>> If I clear the system.repairs table ( by deleting the sstables ) then this 
>> does not pause at all, so seems to fix the problem. However I don’t really 
>> understand what the implications are of just removing that data.
>> 
>> Thanks 
>> 
>> Paul
>> 
>>> On 24 Jan 2022, at 18:50, Bowen Song mailto:bo...@bso.ng>> 
>>> wrote:
>>> 
>>> Hmm, interesting... Try "select * from system.repairs;" in cqlsh on a slow 
>>> starting node, do you get a lots of rows? This is the most obvious loop run 
>>> (indirectly) by the ActiveRepairService.start(). 
>>> 
>>> On 24/01/2022 13:30, Romain Anselin wrote:
>>>> 
>>>> Hi everyone,
>>>> 
>>>> We generated a JFR profile of the startup phase of Cassandra with Paul, 
>>>> and it would appear that the time is spent in the ActiveRepairSession 
>>>> within the main thread (11mn of execution of the "main" thread in his 
>>>> environment, vs 15s in mine), which has been introduced in CASSANDRA-9143 
>>>> based on a "view git blame" of the source code
>>>> https://github.com/apache/cassandra/blob/6709111ed007a54b3e42884853f89cabd38e4316/src/java/org/apache/cassandra/service/CassandraDaemon.java#L381
>>>>  
>>>> <https://github.com/apache/cassandra/blob/6709111ed007a54b3e42884853f89cabd38e4316/src/java/org/apache/cassandra/service/CassandraDaemon.java#L381>
>>>> 
>>>> That seem to match with the gap we see in the logs, where the time is 
>>>> spent just before the "Preloaded" statement in the logs which comes just 
>>>> after in the CassandraDaemon code.
>>>> 
>>>> INFO  [main] 2022-01-20 09:44:40,198  StorageService.java:830 - Token 
>>>> metadata: Normal Tokens:
>>>> ...  ...
>>>> WARN  [Messaging-EventLoop-3-1] 2022-01-20 09:45:13,243  
>>>> NoSpamLogger.java:95 - /IP1:7000->/IP1:7000-SMALL_MESSAGES-[no-channel] 
>>>> dropping message of type SCHEMA_VERSION_REQ whose timeout expired before 
>>>> reaching the network
>>>> INFO  [main] 2022-01-20 09:55:01,134  QueryProcessor.java:150 - Preloaded 
>>>> 0 prepared statements
>>>> 
>>>> Remains to determine what that is doing more in details and why it's 
>>>> taking longer and longer on startup. 
>>>> 
>>>> We also exported the sstablemetadata from all sstables in one node, and at 
>>>> this stage, we can see we have 300 sstables o

Re: Cassandra 4.0 hanging on restart

2022-01-24 Thread Paul Chandler
Hi Bowen,

Yes, there do seem to be a lot of rows; on one of the upgraded clusters there 
are 75,000 rows.

I have been experimenting on a test cluster; this has about a 5 minute pause, 
and around 15,000 rows. 

If I clear the system.repairs table (by deleting the sstables) then it does not 
pause at all, so this seems to fix the problem. However, I don’t really 
understand what the implications are of just removing that data.

Thanks 

Paul

> On 24 Jan 2022, at 18:50, Bowen Song  wrote:
> 
> Hmm, interesting... Try "select * from system.repairs;" in cqlsh on a slow 
> starting node, do you get a lots of rows? This is the most obvious loop run 
> (indirectly) by the ActiveRepairService.start(). 
> 
> On 24/01/2022 13:30, Romain Anselin wrote:
>> 
>> Hi everyone,
>> 
>> We generated a JFR profile of the startup phase of Cassandra with Paul, and 
>> it would appear that the time is spent in the ActiveRepairSession within the 
>> main thread (11mn of execution of the "main" thread in his environment, vs 
>> 15s in mine), which has been introduced in CASSANDRA-9143 based on a "view 
>> git blame" of the source code
>> https://github.com/apache/cassandra/blob/6709111ed007a54b3e42884853f89cabd38e4316/src/java/org/apache/cassandra/service/CassandraDaemon.java#L381
>>  
>> <https://github.com/apache/cassandra/blob/6709111ed007a54b3e42884853f89cabd38e4316/src/java/org/apache/cassandra/service/CassandraDaemon.java#L381>
>> 
>> That seem to match with the gap we see in the logs, where the time is spent 
>> just before the "Preloaded" statement in the logs which comes just after in 
>> the CassandraDaemon code.
>> 
>> INFO  [main] 2022-01-20 09:44:40,198  StorageService.java:830 - Token 
>> metadata: Normal Tokens:
>> ...  ...
>> WARN  [Messaging-EventLoop-3-1] 2022-01-20 09:45:13,243  
>> NoSpamLogger.java:95 - /IP1:7000->/IP1:7000-SMALL_MESSAGES-[no-channel] 
>> dropping message of type SCHEMA_VERSION_REQ whose timeout expired before 
>> reaching the network
>> INFO  [main] 2022-01-20 09:55:01,134  QueryProcessor.java:150 - Preloaded 0 
>> prepared statements
>> 
>> Remains to determine what that is doing more in details and why it's taking 
>> longer and longer on startup. 
>> 
>> We also exported the sstablemetadata from all sstables in one node, and at 
>> this stage, we can see we have 300 sstables out of 577 with "Repaired at:" 
>> set.
>> 
>> cd /var/lib/cassandra/data
>> find . -name '*Data*' | while read datf; do echo  $datf ; sudo -u 
>> cassandra sstablemetadata $datf; done >> ~/sstablemetadata.txt
>> cqlsh -e "paging off; select * from system.repairs" >> ~/repairs.out
>> 
>> $ egrep 'Repaired at: 1' sstablemetadata.txt | wc -l
>> 300
>> $ egrep 'Repaired at:' sstablemetadata.txt | wc -l
>> 577
>> 
>> More info to come
>> 
>> Regards - Romain
>> 
>> On 19/01/2022 13:10, Paul Chandler wrote:
>>> Hi Bowen,
>>> 
>>> Thanks for the reply, these have been our normal shutdowns, so we do a 
>>> nodetool drain before restarting the service, so I would have thought there 
>>> should not be any commtlogs
>>> 
>>> However there is these messages for one commit log, But looks like it has 
>>> finished quickly and correctly:
>>> 
>>> INFO  [main] 2022-01-19 10:08:22,811  CommitLog.java:173 - Replaying 
>>> /var/lib/cassandra/commitlog/CommitLog-7-1642094921295.log
>>> WARN  [main] 2022-01-19 10:08:22,839  CommitLogReplayer.java:305 - Origin 
>>> of 2 sstables is unknown or doesn't match the local node; 
>>> commitLogIntervals for them were ignored
>>> Repeated about 10 times
>>> WARN  [main] 2022-01-19 10:08:22,842  CommitLogReplayer.java:305 - Origin 
>>> of 3 sstables is unknown or doesn't match the local node; 
>>> commitLogIntervals for them were ignored
>>> INFO  [main] 2022-01-19 10:08:22,853  CommitLogReader.java:256 - Finished 
>>> reading /var/lib/cassandra/commitlog/CommitLog-7-1642094921295.log
>>> INFO  [main] 2022-01-19 10:08:22,882  CommitLog.java:175 - Log replay 
>>> complete, 0 replayed mutations 
>>> 
>>> Thanks 
>>> 
>>> Paul
>>> 
>>>> On 19 Jan 2022, at 13:03, Bowen Song  <mailto:bo...@bso.ng> 
>>>> wrote:
>>>> 
>>>> Nothing obvious from the logs you posted.
>>>> 
>>>> Generally speaking, replaying commit log is often the culprit when a node 
>>>> takes a long time to start. I h

Re: Cassandra 4.0 hanging on restart

2022-01-19 Thread Paul Chandler
Hi Bowen,

Thanks for the reply. These have been our normal shutdowns, so we do a nodetool 
drain before restarting the service, so I would have thought there should not 
be any commit logs.
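
(For clarity, the restart sequence referred to here is roughly the following; a sketch, and the service name may differ on other setups:)

nodetool drain
sudo systemctl restart cassandra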

However, there are these messages for one commit log, but it looks like it has 
finished quickly and correctly:

INFO  [main] 2022-01-19 10:08:22,811  CommitLog.java:173 - Replaying 
/var/lib/cassandra/commitlog/CommitLog-7-1642094921295.log
WARN  [main] 2022-01-19 10:08:22,839  CommitLogReplayer.java:305 - Origin of 2 
sstables is unknown or doesn't match the local node; commitLogIntervals for 
them were ignored
Repeated about 10 times
WARN  [main] 2022-01-19 10:08:22,842  CommitLogReplayer.java:305 - Origin of 3 
sstables is unknown or doesn't match the local node; commitLogIntervals for 
them were ignored
INFO  [main] 2022-01-19 10:08:22,853  CommitLogReader.java:256 - Finished 
reading /var/lib/cassandra/commitlog/CommitLog-7-1642094921295.log
INFO  [main] 2022-01-19 10:08:22,882  CommitLog.java:175 - Log replay complete, 
0 replayed mutations 

Thanks 

Paul

> On 19 Jan 2022, at 13:03, Bowen Song  wrote:
> 
> Nothing obvious from the logs you posted.
> 
> Generally speaking, replaying commit log is often the culprit when a node 
> takes a long time to start. I have seen many nodes with large memtable and 
> commit log size limit spending over half an hour replaying the commit log. I 
> usually do a "nodetool flush" before shutting down the node to help speed up 
> the start time if the shutdown was planned. There isn't much you can do about 
> unexpected shutdown, such as server crashes. When that happens, the only 
> reasonable thing to do is wait for the commit log replay to finish. You 
> should see log entries related to replaying commit logs if this is the case.
> 
> However, if you don't find any logs related to replaying commit logs, the 
> cause may be completely different.
> 
> 
> On 19/01/2022 11:54, Paul Chandler wrote:
>> Hi all,
>> 
>> We have upgraded a couple of clusters from 3.11.6, now we are having issues 
>> when we restart the nodes.
>> 
>> The node will either hang or take 10-30 minute to restart, these are the 
>> last messages we have in the system.log:
>> 
>> INFO  [NonPeriodicTasks:1] 2022-01-19 10:08:23,267  FileUtils.java:545 - 
>> Deleting file during startup: 
>> /var/lib/cassandra/data/system/table_estimates-176c39cdb93d33a5a2188eb06a56f66e/nb-184-big-Summary.db
>> INFO  [NonPeriodicTasks:1] 2022-01-19 10:08:23,268  LogTransaction.java:240 
>> - Unfinished transaction log, deleting 
>> /var/lib/cassandra/data/system/table_estimates-176c39cdb93d33a5a2188eb06a56f66e/nb-185-big-Data.db
>> INFO  [NonPeriodicTasks:1] 2022-01-19 10:08:23,268  FileUtils.java:545 - 
>> Deleting file during startup: 
>> /var/lib/cassandra/data/system/table_estimates-176c39cdb93d33a5a2188eb06a56f66e/nb-185-big-Summary.db
>> INFO  [NonPeriodicTasks:1] 2022-01-19 10:08:23,269  LogTransaction.java:240 
>> - Unfinished transaction log, deleting 
>> /var/lib/cassandra/data/system/table_estimates-176c39cdb93d33a5a2188eb06a56f66e/nb-186-big-Data.db
>> INFO  [NonPeriodicTasks:1] 2022-01-19 10:08:23,270  FileUtils.java:545 - 
>> Deleting file during startup: 
>> /var/lib/cassandra/data/system/table_estimates-176c39cdb93d33a5a2188eb06a56f66e/nb-186-big-Summary.db
>> INFO  [NonPeriodicTasks:1] 2022-01-19 10:08:23,272  LogTransaction.java:240 
>> - Unfinished transaction log, deleting 
>> /var/lib/cassandra/data/system/table_estimates-176c39cdb93d33a5a2188eb06a56f66e/nb_txn_unknowncompactiontype_bc501d00-790f-11ec-9f80-85
>> 8854746758.log
>> INFO  [MemtableFlushWriter:2] 2022-01-19 10:08:23,289  
>> LogTransaction.java:240 - Unfinished transaction log, deleting 
>> /var/lib/cassandra/data/system/local-7ad54392bcdd35a684174e047860b377/nb_txn_flush_bc52dc20-790f-11ec-9f80-858854746758.log
>> 
>> The debug log has messages from DiskBoundaryManager.java at the same time, 
>> then it just has the following messages:||
>> 
>> DEBUG [ScheduledTasks:1] 2022-01-19 10:28:09,430  SSLFactory.java:354 - 
>> Checking whether certificates have been updated []
>> DEBUG [ScheduledTasks:1] 2022-01-19 10:38:09,431  SSLFactory.java:354 - 
>> Checking whether certificates have been updated []
>> DEBUG [ScheduledTasks:1] 2022-01-19 10:48:09,431  SSLFactory.java:354 - 
>> Checking whether certificates have been updated []
>> DEBUG [ScheduledTasks:1] 2022-01-19 10:58:09,431  SSLFactory.java:354 - 
>> Checking whether certificates have been updated []
>> 
>> 
>> It seems to get worse after each restart, and then it gets to the state 
>> where it just hangs, then the only thing to do is to re bootstrap the node.
>> 
>> Once I had re bootstrapped all the nodes in the cluster, I thought the 
>> cluster was stable, but I have now got the case where the one of the nodes 
>> is hanging again.
>> 
>> Does anyone have an ideas what is causing the problems ?
>> 
>> 
>> Thanks
>> 
>> Paul Chandler
>> 



Cassandra 4.0 hanging on restart

2022-01-19 Thread Paul Chandler
Hi all,

We have upgraded a couple of clusters from 3.11.6, and now we are having issues 
when we restart the nodes.

The node will either hang or take 10-30 minutes to restart; these are the last 
messages we have in the system.log:

INFO  [NonPeriodicTasks:1] 2022-01-19 10:08:23,267  FileUtils.java:545 - 
Deleting file during startup: 
/var/lib/cassandra/data/system/table_estimates-176c39cdb93d33a5a2188eb06a56f66e/nb-184-big-Summary.db
INFO  [NonPeriodicTasks:1] 2022-01-19 10:08:23,268  LogTransaction.java:240 - 
Unfinished transaction log, deleting 
/var/lib/cassandra/data/system/table_estimates-176c39cdb93d33a5a2188eb06a56f66e/nb-185-big-Data.db
INFO  [NonPeriodicTasks:1] 2022-01-19 10:08:23,268  FileUtils.java:545 - 
Deleting file during startup: 
/var/lib/cassandra/data/system/table_estimates-176c39cdb93d33a5a2188eb06a56f66e/nb-185-big-Summary.db
INFO  [NonPeriodicTasks:1] 2022-01-19 10:08:23,269  LogTransaction.java:240 - 
Unfinished transaction log, deleting 
/var/lib/cassandra/data/system/table_estimates-176c39cdb93d33a5a2188eb06a56f66e/nb-186-big-Data.db
INFO  [NonPeriodicTasks:1] 2022-01-19 10:08:23,270  FileUtils.java:545 - 
Deleting file during startup: 
/var/lib/cassandra/data/system/table_estimates-176c39cdb93d33a5a2188eb06a56f66e/nb-186-big-Summary.db
INFO  [NonPeriodicTasks:1] 2022-01-19 10:08:23,272  LogTransaction.java:240 - 
Unfinished transaction log, deleting 
/var/lib/cassandra/data/system/table_estimates-176c39cdb93d33a5a2188eb06a56f66e/nb_txn_unknowncompactiontype_bc501d00-790f-11ec-9f80-85
8854746758.log
INFO  [MemtableFlushWriter:2] 2022-01-19 10:08:23,289  LogTransaction.java:240 
- Unfinished transaction log, deleting 
/var/lib/cassandra/data/system/local-7ad54392bcdd35a684174e047860b377/nb_txn_flush_bc52dc20-790f-11ec-9f80-858854746758.log

The debug log has messages from DiskBoundaryManager.java at the same time, then 
it just has the following messages:

DEBUG [ScheduledTasks:1] 2022-01-19 10:28:09,430  SSLFactory.java:354 - 
Checking whether certificates have been updated []
DEBUG [ScheduledTasks:1] 2022-01-19 10:38:09,431  SSLFactory.java:354 - 
Checking whether certificates have been updated []
DEBUG [ScheduledTasks:1] 2022-01-19 10:48:09,431  SSLFactory.java:354 - 
Checking whether certificates have been updated []
DEBUG [ScheduledTasks:1] 2022-01-19 10:58:09,431  SSLFactory.java:354 - 
Checking whether certificates have been updated []


It seems to get worse after each restart, and then it gets to the state where 
it just hangs; then the only thing to do is to re-bootstrap the node. 

Once I had re-bootstrapped all the nodes in the cluster, I thought the cluster 
was stable, but I now have a case where one of the nodes is hanging 
again. 
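
One way to see where the start-up is stuck while a node hangs (a sketch, assuming the JDK tools are installed on the node):

jstack -l "$(pgrep -f CassandraDaemon)" > /tmp/cassandra-startup-threads.txt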

Does anyone have any ideas what is causing the problems? 


Thanks 

Paul Chandler



Re: Anyone connecting the Cassandra on a server

2021-11-19 Thread Paul Chandler
I wrote a blog post describing how to do this a few years ago: 
http://www.redshots.com/who-is-connecting-to-a-cassandra-cluster/
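
A quick OS-level check can complement that (a sketch, assuming the default native transport port 9042):

ss -tn state established '( sport = :9042 )'   # established client connections to this node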


Sent from my iPhone

> On 19 Nov 2021, at 18:13, Saha, Sushanta K 
>  wrote:
> 
> 
> I need to shutdown an old Apache Cassandra server for good. Running 3.0.x. 
> Any way I can determine if anyone is still connecting to the Cassandra 
> instance running on this server?
> 
> Thanks
>  Sushanta
> 


Hint file getting stuck

2021-11-15 Thread Paul Chandler
Hi all

We keep having a problem with hint files on one of our Cassandra nodes 
(v3.11.6); the following error messages keep being repeated for the same file.

INFO [HintsDispatcher:25] 2021-11-02 08:55:29,830 
HintsDispatchExecutor.java:289 - Finished hinted handoff of file 
72a18469-b7d2-499a-aed3-fd4e2cda9678-1635838529279-1.hints to endpoint 
/10.29.49.210: 72a18469-b7d2-499a-aed3-fd4e2cda9678, partially
INFO [HintsDispatcher:24] 2021-11-02 08:55:39,812 
HintsDispatchExecutor.java:289 - Finished hinted handoff of file 
72a18469-b7d2-499a-aed3-fd4e2cda9678-1635838529279-1.hints to endpoint 
/10.29.49.210: 72a18469-b7d2-499a-aed3-fd4e2cda9678, partially
INFO [HintsDispatcher:25] 2021-11-02 08:55:49,822 
HintsDispatchExecutor.java:289 - Finished hinted handoff of file 
72a18469-b7d2-499a-aed3-fd4e2cda9678-1635838529279-1.hints to endpoint 
/10.29.49.210: 72a18469-b7d2-499a-aed3-fd4e2cda9678, partially

On the receiving node (cassandra0) we see the CPU shoot up; this is how we 
notice we have a problem.

This has happened several times with different files, and we find the only way 
to stop this is to delete the offending hint files.
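
(For anyone hitting the same thing, the manual file deletion can also be done through nodetool; a sketch, using the endpoint from the log above:)

nodetool truncatehints 10.29.49.210   # drop the hints queued for that endpoint
nodetool statushandoff                # check whether hinted handoff is running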

The cluster can be a bit overloaded, and this is what is causing the hint files 
to be generated in the first place; we are working to get that stopped. However, 
the question I don’t know the answer to is: what is causing this “partially” 
hint processing, and how can we stop it happening?

Thanks

Paul

Re: TTL and disk space releasing

2021-10-06 Thread Paul Chandler
Hi Michael, 

I have had similar problems in the past, and found this Last Pickle post very 
useful: https://thelastpickle.com/blog/2016/12/08/TWCS-part1.html

This should help you pinpoint what is stopping the SSTables being deleted.
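
In particular, the bundled sstableexpiredblockers tool shows which SSTables are blocking fully expired ones from being dropped (a sketch; the keyspace and table names are illustrative):

sstableexpiredblockers my_keyspace my_table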

Assuming you are never manually deleting records from the table, there is no 
need to have a large gc_grace_seconds: a large value is there to ensure 
tombstones are replicated correctly, and you won’t have any tombstones to worry 
about.
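
If that is the case, something like this would bring it down (a sketch; the keyspace/table names and the value are illustrative):

cqlsh -e "ALTER TABLE my_keyspace.my_table WITH gc_grace_seconds = 3600;"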

If you are doing manual deletes, then that could be the cause of the issue; I 
wrote a post here about why that would be an issue: 
http://www.redshots.com/cassandra-twcs-must-have-ttls/

After reading these if you are still having problems please let us know. 

Thanks 

Paul

 

> On 6 Oct 2021, at 09:42, Michel Barret  wrote:
> 
> Hello,
> 
> I am trying to use Cassandra (3.11.5) with 8 nodes (in a single datacenter). 
> I use one simple table; all data is inserted with a 31-day TTL (the data is 
> never updated).
> 
> I use the TWCS strategy with:
> - 'compaction_window_size': '24'
> - 'compaction_window_unit': 'HOURS'
> - 'max_threshold': '32'
> - 'min_threshold': '4'
> 
> Each node runs a 'nodetool repair' once a week, and our gc_grace_seconds is 
> set to 10 days.
> 
> I track the storage on the nodes, and the partition used for Cassandra data 
> (used only for this) reaches ~40% usage after one month.
> 
> But Cassandra continuously consumes more space, and if I read the sstables 
> with sstabledump I find very old tombstones like this:
> 
> "liveness_info" : { "tstamp" : "2021-07-26T08:15:00.092897Z", "ttl" : 
> 2678400, "expires_at" : "2021-08-26T08:15:00Z", "expired" : true }
> 
> I don't understand why this tombstone isn't erased. I believe I have applied 
> everything I found on the internet, without improvement.
> 
> Does anybody have a clue how to fix my problem?
> 
> Have a nice day



Re: Cassandra 4.0 and python

2021-04-29 Thread Paul Chandler
Thanks Kane,

If anyone else is interested in this, I created a Jira ticket: 
https://issues.apache.org/jira/browse/CASSANDRA-16641 
<https://issues.apache.org/jira/browse/CASSANDRA-16641> but the response is 
that 3.6 is the minimum officially supported version, although 2.7 should still 
work. This is why the Debian packaging was changed. 
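
For anyone stuck on an older Ubuntu, the tarball route looks roughly like this (a sketch; the version and mirror URL are illustrative):

curl -LO https://archive.apache.org/dist/cassandra/4.0.0/apache-cassandra-4.0.0-bin.tar.gz
tar xzf apache-cassandra-4.0.0-bin.tar.gz
apache-cassandra-4.0.0/bin/cassandra -f   # run in the foreground from the unpacked directory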

Thanks 

Paul  



> On 29 Apr 2021, at 00:27, Kane Wilson  wrote:
> 
> No, I suspect the deb package dependencies haven't been updated correctly, as 
> 2.7 should definitely still work. Could you raise a JIRA for this issue?
> 
> Not sure if apt has some way to force install/ignore dependencies, however if 
> you do that it may work, otherwise your only workaround would be to install 
> from the tarball.
> 
> raft.so <https://raft.so/> - Cassandra consulting, support, and managed 
> services
> 
> 
> On Thu, Apr 29, 2021 at 2:24 AM Paul Chandler  <mailto:p...@redshots.com>> wrote:
> Hi all,
> 
> We have been testing with 4.0~beta2 in our setup for a few weeks and all has 
> gone very smoothly, however when tried to install 4.0~rc1 we ran into 
> problems with python versions.
> 
> We are on Ubuntu 16.04.7 LTS so use apt to install Cassandra, and this now 
> gives the following error:
> 
> The following packages have unmet dependencies:
>  cassandra : Depends: python3 (>= 3.6) but 3.5.1-3 is to be installed
> E: Unable to correct problems, you have held broken packages.
> 
> Looking at the apt packaging the requirement for python has changed from 2.7 
> to 3.6 between beta4 and rc1. 
> 
> I have found https://issues.apache.org/jira/browse/CASSANDRA-16396 
> <https://issues.apache.org/jira/browse/CASSANDRA-16396> which says it needed 
> to be python 3.6, however reading this ticket this seems to imply 2.7 is 
> still supported https://issues.apache.org/jira/browse/CASSANDRA-15659 
> <https://issues.apache.org/jira/browse/CASSANDRA-15659>
> 
> Also the code for for cqlsh says it supports 2.7 as well:  
> https://github.com/apache/cassandra/blob/b0c50c10dbc443a05662b111a971a65cafa258d5/bin/cqlsh#L65
>  
> <https://github.com/apache/cassandra/blob/b0c50c10dbc443a05662b111a971a65cafa258d5/bin/cqlsh#L65>
> 
> All our clusters are currently on Ubuntu 16.04 which does not come with 
> python 3.6, so this is going to be a major pain to upgrade them to 4.0.
> 
> Does the apt packaging really need to specify 3.6 ?
> 
> Thanks 
> 
> Paul Chandler



Cassandra 4.0 and python

2021-04-28 Thread Paul Chandler
Hi all,

We have been testing with 4.0~beta2 in our setup for a few weeks and all has 
gone very smoothly; however, when we tried to install 4.0~rc1 we ran into 
problems with Python versions.

We are on Ubuntu 16.04.7 LTS so use apt to install Cassandra, and this now 
gives the following error:

The following packages have unmet dependencies:
 cassandra : Depends: python3 (>= 3.6) but 3.5.1-3 is to be installed
E: Unable to correct problems, you have held broken packages.

Looking at the apt packaging the requirement for python has changed from 2.7 to 
3.6 between beta4 and rc1. 

I have found https://issues.apache.org/jira/browse/CASSANDRA-16396 
<https://issues.apache.org/jira/browse/CASSANDRA-16396> which says it needed to 
be python 3.6, however reading this ticket this seems to imply 2.7 is still 
supported https://issues.apache.org/jira/browse/CASSANDRA-15659 
<https://issues.apache.org/jira/browse/CASSANDRA-15659>

Also the code for for cqlsh says it supports 2.7 as well:  
https://github.com/apache/cassandra/blob/b0c50c10dbc443a05662b111a971a65cafa258d5/bin/cqlsh#L65
 
<https://github.com/apache/cassandra/blob/b0c50c10dbc443a05662b111a971a65cafa258d5/bin/cqlsh#L65>

All our clusters are currently on Ubuntu 16.04 which does not come with python 
3.6, so this is going to be a major pain to upgrade them to 4.0.

Does the apt packaging really need to specify 3.6 ?

Thanks 

Paul Chandler

Re: No node was available to execute query error

2021-03-12 Thread Paul Chandler
Hi Joe

This could also be caused by the replication settings of the keyspace: if you 
use NetworkTopologyStrategy and it doesn’t list a replication factor for the 
datacenter datacenter1, then you will get this error message too. 
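
A quick way to check and fix this (a sketch; the keyspace and datacenter names are illustrative):

cqlsh -e "DESCRIBE KEYSPACE my_keyspace;"   # shows the current replication settings
cqlsh -e "ALTER KEYSPACE my_keyspace WITH replication = {'class': 'NetworkTopologyStrategy', 'datacenter1': 3};"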

Paul

> On 12 Mar 2021, at 13:07, Erick Ramirez  wrote:
> 
> Does it get returned by the driver every single time? The 
> NoNodeAvailableException gets thrown when (1) all nodes are down, or (2) all 
> the contact points are invalid from the driver's perspective.
> 
> Is it possible there's no route/connectivity from your app server(s) to the 
> 172.16.x.x network? If you post the full error message + full stacktrace, it 
> might provide clues. Cheers!



Re: Cassandra 4.0 and changing DC setting

2021-02-22 Thread Paul Chandler
Yes, I am only running this on test clusters; I don’t run anything like this 
without lots of tests first.

Anyway this worked well, so thanks for the info.

For anyone else who needs this, the cql statement to do this was:

insert into system_schema.keyspaces ( keyspace_name , durable_writes, 
replication ) values ( 'system_auth', true, {'OLD_DC': '3', 'NEW_DC': '3', 
'class': 'org.apache.cassandra.locator.NetworkTopologyStrategy'} );

This will then allow you to log in after the nodes come back up, and the other 
keyspaces can be changed as normal afterwards.

Thanks for your help.

Paul


> On 21 Feb 2021, at 22:30, Kane Wilson  wrote:
> 
> Make sure you test it on a practice cluster. Messing with the system tables 
> is risky business!
> 
> raft.so <https://raft.so/> - Cassandra consulting, support, and managed 
> services
> 
> 
> On Sun, Feb 21, 2021 at 11:12 PM Paul Chandler  <mailto:p...@redshots.com>> wrote:
> Hi Kane,
> 
> That sounds a good idea, I will give it a try on Monday.
> 
> Thanks 
> 
> Paul
> 
>> On 21 Feb 2021, at 11:33, Kane Wilson mailto:k...@raft.so>> 
>> wrote:
>> 
>> There has been proposals to add a force/unsafe flag to alter DC but it 
>> hasn't been actioned and at this rate seems unlikely to make it into 4.0. 
>> There is however a workaround, albeit not very user friendly. You should be 
>> able to modify the system_schema tables directly to do your DC updates. I am 
>> yet to test it out myself so don't know the updates you'd need to make but 
>> you should be able to get a good idea by querying, doing updates and 
>> observing the effect.
>> 
>> raft.so - Cassandra consulting, support, managed services
>> 
>> On Sat., 20 Feb. 2021, 02:29 Paul Chandler, > <mailto:p...@redshots.com>> wrote:
>> All,
>> 
>> We have a use case where we need to change the datacenter name for a 
>> cassandra cluster, we have a script to do this that involves a short 
>> downtime. This does the following 
>> 
>> 
>> 1) Change replication factor for the system key spaces to be { ‘OLD_DC’ : 
>> ‘3’, ’NEW_DC”: ‘3’  }
>> 2) Change the dc value in cassandra-rackdc.properties to NEW_DC for each 
>> node 
>> 3) Add -Dcassandra.ignore_dc=true in cassandra-env.sh for each node
>> 4) Stop all nodes
>> 5) Start each seed node, then start the rest of the nodes
>> 6) change the replication factor for all the keyspaces to {  ’NEW_DC”: ‘3’  }
>> 
>> In 3.11.x this all works fine and the cluster is good to be used again after 
>> step 6.
>> 
>> However in 4.0 step 1 is now blocked by the following change "Cassandra will 
>> no longer allow invalid keyspace replication options, such as invalid 
>> datacenter names for NetworkTopologyStrategy”
>> 
>> If you skip step 1) then when the nodes come back up, you cannot login 
>> because the system_auth keyspace still has a replication factor of ‘OLD_DC’ 
>> : ‘3’ but there are no nodes in the dc OLD_DC so the keyspace cannot be 
>> accessed.
>> 
>> Is there a way around this to change the name of the datacenter?
>> 
>> 
>> Thanks 
>> 
>> Paul
>> 
>> 
>> 
>> 
>> -
>> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org 
>> <mailto:user-unsubscr...@cassandra.apache.org>
>> For additional commands, e-mail: user-h...@cassandra.apache.org 
>> <mailto:user-h...@cassandra.apache.org>
>> 
> 



Re: Cassandra 4.0 and changing DC setting

2021-02-21 Thread Paul Chandler
Hi Kane,

That sounds a good idea, I will give it a try on Monday.

Thanks 

Paul

> On 21 Feb 2021, at 11:33, Kane Wilson  wrote:
> 
> There has been proposals to add a force/unsafe flag to alter DC but it hasn't 
> been actioned and at this rate seems unlikely to make it into 4.0. There is 
> however a workaround, albeit not very user friendly. You should be able to 
> modify the system_schema tables directly to do your DC updates. I am yet to 
> test it out myself so don't know the updates you'd need to make but you 
> should be able to get a good idea by querying, doing updates and observing 
> the effect.
> 
> raft.so - Cassandra consulting, support, managed services
> 
> On Sat., 20 Feb. 2021, 02:29 Paul Chandler,  <mailto:p...@redshots.com>> wrote:
> All,
> 
> We have a use case where we need to change the datacenter name for a 
> cassandra cluster, we have a script to do this that involves a short 
> downtime. This does the following 
> 
> 
> 1) Change replication factor for the system key spaces to be { ‘OLD_DC’ : 
> ‘3’, ’NEW_DC”: ‘3’  }
> 2) Change the dc value in cassandra-rackdc.properties to NEW_DC for each node 
> 3) Add -Dcassandra.ignore_dc=true in cassandra-env.sh for each node
> 4) Stop all nodes
> 5) Start each seed node, then start the rest of the nodes
> 6) change the replication factor for all the keyspaces to {  ’NEW_DC”: ‘3’  }
> 
> In 3.11.x this all works fine and the cluster is good to be used again after 
> step 6.
> 
> However in 4.0 step 1 is now blocked by the following change "Cassandra will 
> no longer allow invalid keyspace replication options, such as invalid 
> datacenter names for NetworkTopologyStrategy”
> 
> If you skip step 1) then when the nodes come back up, you cannot login 
> because the system_auth keyspace still has a replication factor of ‘OLD_DC’ : 
> ‘3’ but there are no nodes in the dc OLD_DC so the keyspace cannot be 
> accessed.
> 
> Is there a way around this to change the name of the datacenter?
> 
> 
> Thanks 
> 
> Paul
> 
> 
> 
> 
> -
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org 
> <mailto:user-unsubscr...@cassandra.apache.org>
> For additional commands, e-mail: user-h...@cassandra.apache.org 
> <mailto:user-h...@cassandra.apache.org>
> 



Cassandra 4.0 and changing DC setting

2021-02-19 Thread Paul Chandler
All,

We have a use case where we need to change the datacenter name for a cassandra 
cluster, we have a script to do this that involves a short downtime. This does 
the following 


1) Change replication factor for the system keyspaces to be { 'OLD_DC': '3', 'NEW_DC': '3' }
2) Change the dc value in cassandra-rackdc.properties to NEW_DC for each node 
3) Add -Dcassandra.ignore_dc=true in cassandra-env.sh for each node
4) Stop all nodes
5) Start each seed node, then start the rest of the nodes
6) Change the replication factor for all the keyspaces to { 'NEW_DC': '3' }

In 3.11.x this all works fine and the cluster is good to be used again after 
step 6.

However in 4.0 step 1 is now blocked by the following change: "Cassandra will no 
longer allow invalid keyspace replication options, such as invalid datacenter 
names for NetworkTopologyStrategy".

If you skip step 1), then when the nodes come back up you cannot log in, because 
the system_auth keyspace still has a replication factor of 'OLD_DC': '3' but 
there are no nodes in the DC OLD_DC, so the keyspace cannot be accessed.

Is there a way around this to change the name of the datacenter?


Thanks 

Paul




-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



Re: username/password error when using nodetool flush

2021-01-01 Thread Paul Chandler
Take a look here and check what you have the JMX authentication set to, and see 
if that helps you.

https://www.guru99.com/cassandra-security.html#7

Sorry for the short answer; it's New Year's Day and I'm away from my computer, 
so I can't look things up properly.

Sent from my iPhone

> On 1 Jan 2021, at 23:14, Manu Chadha  wrote:
> 
>  Just nodetool doesn't work unfortunately 
> 
> Sent from my iPhone
> 
>>> On 1 Jan 2021, at 21:28, Paul Chandler  wrote:
>>> 
>>  Hi Manu,
>> 
>> nodetool uses the JMX user and password, I think the normal default for that 
>> is for it not being required, but not sure if that is the case for the setup 
>> you are using. So just try nodetool flush and see if that works.
>> 
>> Regards 
>> 
>> Paul 
>> 
>> Sent from my iPhone
>> 
>>> On 1 Jan 2021, at 20:41, Manu Chadha  wrote:
>>> 
>>> 
>>> In fact, I notice that I can’t run any nodetool command. I get the error 
>>> even when running nodetool status command
>>>  
>>> Sent from Mail for Windows 10
>>>  
>>> From: Manu Chadha
>>> Sent: 01 January 2021 20:36
>>> To: user@cassandra.apache.org
>>> Subject: username/password error when using nodetool flush
>>>  
>>> Hi
>>>  
>>> Happy New Year.
>>>  
>>> I am trying to use `nodetool flush -u username` but I get error `nodetool: 
>>> Failed to connect to '127.0.0.1:7199' - FailedLoginException: 'Invalid 
>>> username or password'.`
>>>  
>>> I am using the same credentials which I use in `cqlsh -u -p`.  As far as I 
>>> can observe, I am using the right values.
>>>  
>>> I am running `cassandra` using `kubernetes` and `K8ssandra`. I have `SSH`ed 
>>>  to a node and am running `nodetool flush` in the container by getting a 
>>> shell in the container using ` docker exec -it 
>>> k8s_cassandra_k8ssandra-dc1-default-sts-0_default_00b0d72a-c124-4b04-b25d-9e0f17edc582_0
>>>  /bin/bash `.
>>>  
>>> What mistake I might be making? Is there some other credential I need to 
>>> use?
>>>  
>>> Thanks
>>> Manu
>>>  
>>> Sent from Mail for Windows 10
>>>  
>>>  


Re: username/password error when using nodetool flush

2021-01-01 Thread Paul Chandler
Hi Manu,

nodetool uses the JMX user and password. I think the normal default is for that 
not to be required, but I'm not sure if that is the case for the setup you are 
using. So just try nodetool flush without any credentials and see if that works.
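
If JMX authentication does turn out to be enabled, nodetool takes the JMX 
credentials with -u and -pw (the values below are placeholders):

nodetool -u jmxUser -pw jmxPassword flush

Note these are the JMX credentials (e.g. from a jmxremote.password file, 
depending on your setup), not the cqlsh role and password.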

Regards 

Paul 

Sent from my iPhone

> On 1 Jan 2021, at 20:41, Manu Chadha  wrote:
> 
> 
> In fact, I notice that I can’t run any nodetool command. I get the error even 
> when running nodetool status command
>  
> Sent from Mail for Windows 10
>  
> From: Manu Chadha
> Sent: 01 January 2021 20:36
> To: user@cassandra.apache.org
> Subject: username/password error when using nodetool flush
>  
> Hi
>  
> Happy New Year.
>  
> I am trying to use `nodetool flush -u username` but I get error `nodetool: 
> Failed to connect to '127.0.0.1:7199' - FailedLoginException: 'Invalid 
> username or password'.`
>  
> I am using the same credentials which I use in `cqlsh -u -p`.  As far as I 
> can observe, I am using the right values.
>  
> I am running `cassandra` using `kubernetes` and `K8ssandra`. I have `SSH`ed  
> to a node and am running `nodetool flush` in the container by getting a shell 
> in the container using ` docker exec -it 
> k8s_cassandra_k8ssandra-dc1-default-sts-0_default_00b0d72a-c124-4b04-b25d-9e0f17edc582_0
>  /bin/bash `.
>  
> What mistake I might be making? Is there some other credential I need to use?
>  
> Thanks
> Manu
>  
> Sent from Mail for Windows 10
>  
>  


Re: Tool for schema upgrades

2020-10-09 Thread Paul Chandler
Thanks Alex and Alex

That script checks that the schema is in agreement after each statement, so that 
is exactly what I am looking for.

Thanks 

Paul

> On 8 Oct 2020, at 18:21, Alexander DEJANOVSKI  wrote:
> 
> I second Alex's recommendation.
> We use https://github.com/patka/cassandra-migration 
> <https://github.com/patka/cassandra-migration> to manage schema migrations in 
> Reaper and it has a consensus feature to prevent concurrent migrations from 
> clashing.
> 
> Cheers,
> 
> Alex
> 
> Le jeu. 8 oct. 2020 à 19:10, Alex Ott  <mailto:alex...@gmail.com>> a écrit :
> Hi
> 
> Look at https://github.com/patka/cassandra-migration 
> <https://github.com/patka/cassandra-migration> - it should be good. 
> 
> P.S. Here is the list of tools that I assembled over the years:
> 
>- [ ] https://github.com/hhandoko/cassandra-migration 
> <https://github.com/hhandoko/cassandra-migration>
>- [ ] https://github.com/Contrast-Security-OSS/cassandra-migration 
> <https://github.com/Contrast-Security-OSS/cassandra-migration>
>- [ ] https://github.com/juxt/joplin <https://github.com/juxt/joplin>
>- [ ] https://github.com/o19s/trireme <https://github.com/o19s/trireme>
>- [ ] https://github.com/golang-migrate/migrate 
> <https://github.com/golang-migrate/migrate>
>- [ ] https://github.com/Cobliteam/cassandra-migrate 
> <https://github.com/Cobliteam/cassandra-migrate>
>- [ ] https://github.com/patka/cassandra-migration 
> <https://github.com/patka/cassandra-migration>
>- [ ] https://github.com/comeara/pillar <https://github.com/comeara/pillar>
> 
> On Thu, Oct 8, 2020 at 5:45 PM Paul Chandler  <mailto:p...@redshots.com>> wrote:
> Hi all,
> 
> Can anyone recommend a tool to perform schema DDL upgrades, that follows best 
> practice to ensure you don’t get schema mismatches if running multiple 
> upgrade statements in one migration ?
> 
> Thanks 
> 
> Paul
> -
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org 
> <mailto:user-unsubscr...@cassandra.apache.org>
> For additional commands, e-mail: user-h...@cassandra.apache.org 
> <mailto:user-h...@cassandra.apache.org>
> 
> 
> 
> -- 
> With best wishes,Alex Ott
> http://alexott.net/ <http://alexott.net/>
> Twitter: alexott_en (English), alexott (Russian)



Tool for schema upgrades

2020-10-08 Thread Paul Chandler
Hi all,

Can anyone recommend a tool to perform schema DDL upgrades that follows best 
practice to ensure you don't get schema mismatches when running multiple upgrade 
statements in one migration?

Thanks 

Paul
-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



Stopping a Nodetool move

2020-09-07 Thread Paul Chandler
Hi all,


Is there a way to stop a nodetool move that is currently in progress? 

It is not moving the data between the nodes as expected and I would like to 
stop it before it completes.

Thank you 

Paul


-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



Re: disable debug message on read repair

2020-03-10 Thread Paul Chandler
Hi Gil,

All the logging is controlled via logback. You can change the level of any type 
of message.

Take a look here for some more details: 
https://docs.datastax.com/en/cassandra-oss/3.0/cassandra/configuration/configLoggingLevels.html
 
<https://docs.datastax.com/en/cassandra-oss/3.0/cassandra/configuration/configLoggingLevels.html>
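
If you only want to silence the digest mismatch messages while keeping the rest 
of DEBUG, you can raise the level for just that logger; the class name below 
comes from the stack trace in your mail:

nodetool setlogginglevel org.apache.cassandra.service.ReadCallback INFO

The runtime change does not survive a restart, so add an equivalent logger 
entry to logback.xml if you want it to be permanent.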

Thanks 

Paul Chandler
www.redshots.com

> On 10 Mar 2020, at 08:56, Gil Ganz  wrote:
> 
> That's one option, I wish I there was a way to disable just that and not the 
> entire debug log level, there are some things there I would like to keep.
> 
> On Sun, Mar 8, 2020 at 6:41 PM Jeff Jirsa  <mailto:jji...@gmail.com>> wrote:
> There are likely two log configs - one for debug.log and one for system.log. 
> Disable the debug.log one, or change org.apache.cassandra.service to log at 
> INFO instead 
> 
> Nobody needs to see every digest mismatch and that someone thought this was a 
> good idea is amazing to me. Someone should jira that to be a trace.
> 
> 
>> On Mar 8, 2020, at 3:25 AM, Gil Ganz > <mailto:gilg...@gmail.com>> wrote:
>> 
>> 
>> Thanks Shalom, I know why these read repairs are happening, and they will 
>> continue to happen for some time, even if I will run a full repair. 
>> I would like to disable these warning messages.
>> 
>> On Sun, Mar 8, 2020 at 10:19 AM Shalom Sagges > <mailto:shalomsag...@gmail.com>> wrote:
>> Hi Gil, 
>> 
>> You can run a full repair on your cluster. But if these messages come back 
>> again, you need to check what's causing these data inconsistencies. 
>> 
>> 
>> On Sun, Mar 8, 2020 at 10:11 AM Gil Ganz > <mailto:gilg...@gmail.com>> wrote:
>> Hey all
>> I have a lot of debug message about read repairs in my debug log :
>> 
>> DEBUG [ReadRepairStage:346] 2020-03-08 08:09:12,959 ReadCallback.java:242 - 
>> Digest mismatch:
>> org.apache.cassandra.service.DigestMismatchException: Mismatch for key 
>> DecoratedKey(-28476014476640, 
>> 000400871130303a33613a37643a33613a62643a383000) 
>> (38d3509295a283326a71887113fc vs 033cbba98c43e7ba6f15c0ba462f5fcc)
>> at 
>> org.apache.cassandra.service.DigestResolver.compareResponses(DigestResolver.java:92)
>>  ~[apache-cassandra-3.11.5.jar:3.11.5]
>> at 
>> org.apache.cassandra.service.ReadCallback$AsyncRepairRunner.run(ReadCallback.java:233)
>>  ~[apache-cassandra-3.11.5.jar:3.11.5]
>> at 
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>>  [na:1.8.0_201]
>> at 
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>>  [na:1.8.0_201]
>> at 
>> org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:84)
>>  [apache-cassandra-3.11.5.jar:3.11.5]
>> at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_201]
>> 
>> Version is 3.11.5, anyone knows how to disable just these warnings?
>> Thanks
>> Gil
>> 



Re: Corruption of frozen UDT during upgrade

2020-02-15 Thread Paul Chandler
Thanks Erick, it looks like I have a bit of detective work to do on Monday, to 
work out which of my clusters started out as 2.* or DSE 4.* and whether they had 
UDTs at that time. 

> On 15 Feb 2020, at 00:50, Erick Ramirez  wrote:
> 
> I am still having problems reproducing this, so I am wondering if I have 
> created the tables correctly to create this issue.
> 
> Paul, I've since had clarification on the bug and I hope I can explain it 
> correctly here (happy to be corrected if anyone else has insight on the 
> issue). When you create a table with a frozen UDT in C* 2.1 or 2.2, the 
> serialisation header does not get written correctly when you upgrade to C* 
> 3.0. So if you want to replicate it, you'd need to create the table in C* 
> 2.1/2.2.
> 
> Consequently, you would have seen the notification from Michael Shuler a few 
> minutes ago about C* 3.11.6 being released which contains the fix for 
> CASSANDRA-15035 
> . 
> This means that you can use the C* 3.11.6 sstablescrub which contains the new 
> header-fix flag for your testing. Cheers!



Re: Corruption of frozen UDT during upgrade

2020-02-14 Thread Paul Chandler
Erick,

Thank you for your help. 

I am still having problems reproducing this, so I am wondering if I have 
created the tables correctly to trigger this issue.

I have looked at the sstabledumps and they seem exactly the same.

This is the pre upgrade version 3.0.14 ( a snapshotted version )

sstabledump mc-1-big-Data.db
[
  {
"partition" : {
  "key" : [ "1" ],
  "position" : 0
},
"rows" : [
  {
"type" : "row",
"position" : 48,
"liveness_info" : { "tstamp" : "2020-02-13T10:48:53.856469Z" },
"cells" : [
  { "name" : "calendar", "value" : {"holiday_type": 1, "holiday_start": 
"2020-01-01", "holiday_end": "2020-01-02"} }
]
  }
]
  }
]

This is the post-upgrade 3.11.4 version:

sstabledump md-2-big-Data.db
[
  {
"partition" : {
  "key" : [ "1" ],
  "position" : 0
},
"rows" : [
  {
"type" : "row",
"position" : 48,
"liveness_info" : { "tstamp" : "2020-02-13T10:48:53.856469Z" },
"cells" : [
  { "name" : "calendar", "value" : {"holiday_type": 1, "holiday_start": 
"2020-01-01", "holiday_end": "2020-01-02"} }
]
  }
]
  }
]


I can see the sstable has been upgraded, as the name has changed from 
mc-1-big-Data.db to md-2-big-Data.db

Thanks 

Paul




> On 14 Feb 2020, at 00:47, Erick Ramirez  wrote:
> 
> Paul, if you do a sstabledump in C* 3.0 (before upgrading) and compare it to 
> the dump output after upgrading to C* 3.11 then you will see that the cell 
> names in the outputs are different. This is the symptom of the broken 
> serialization header which leads to various exceptions during compactions and 
> reads.
> 
> CASSANDRA-15035 <https://issues.apache.org/jira/browse/CASSANDRA-15035> has 
> been fixed but is not yet included in a released version of C* (earmarked for 
> C* 3.11.6, 4.0). The patched version of sstablescrub includes a new flag "-e" 
> which rewrites the SSTable serialization headers to include the missing info 
> for the frozen UDTs. See NEWS.txt 
> <https://github.com/apache/cassandra/blob/trunk/NEWS.txt#L116-L133> for more 
> details.
> 
> If you want to run a verification test on your SSTables, you can follow this 
> procedure as a workaround:
> - copy the SSTables to another server that's not part of any C* cluster
> - download the DSE 5.1 (equivalent to C* 3.11) tarball from 
> https://downloads.datastax.com/enterprise/dse-5.1.17-bin.tar.gz 
> <https://downloads.datastax.com/enterprise/dse-5.1.17-bin.tar.gz>
> - unpack the tarball (more details here 
> <https://docs.datastax.com/en/dse/5.1/dse-admin/datastax_enterprise/install/installTARdse.html>)
> - run sstablescrub -e fix-only to just fix the headers without doing a normal 
> scrub
> 
> If the headers are fine, the scrub will be a no-op. Otherwise, it will report 
> that new metadata files are being written. For more details, see 
> https://support.datastax.com/hc/en-us/articles/360025955351 
> <https://support.datastax.com/hc/en-us/articles/360025955351>. Cheers!
> 
> Erick Ramirez  |  Developer Relations 
> erick.rami...@datastax.com <mailto:erick.rami...@datastax.com> | datastax.com 
> <http://www.datastax.com/> <https://www.linkedin.com/company/datastax>  
> <https://www.facebook.com/datastax>  <https://twitter.com/datastax>  
> <http://feeds.feedburner.com/datastax>  <https://github.com/datastax/>
>  <https://www.datastax.com/accelerate>
> 
> 
> On Fri, 14 Feb 2020 at 01:43, Paul Chandler  <mailto:p...@redshots.com>> wrote:
> Hi all,
> 
> I have looked at the release notes for the up coming release 3.11.6 and seen 
> the part about corruption of frozen UDT types during upgrade from 3.0.
> 
> We have a number of cluster using UDT and have been upgrading to 3.11.4 and 
> haven’t noticed any problems.
> 
> In the ticket ( CASSANDRA-15035 ) it does not seem to specify how to 
> reproduce this problem, so I tried using the following definition:
> 
> CREATE TYPE supplier_holiday_udt (
> holiday_type int,
> holiday_start date,
> holiday_end date
> );
> 
> CREATE TABLE supplier (
> supplier_id int PRIMARY KEY,
> calendar frozen<supplier_holiday_udt>
> )
> 
> I performed an upgrade from 3.0.15 to 3.11.4, including running a nodetool 
> upgradesstables.
> 
> There were no errors during the process and I can still read the data in 
> supplier table.
> 
> Can anyone tell me how I reproduce this problem, or check that the clusters 
> we have already upgraded do not have any problems .
> 
> Thanks 
> 
> Paul 
> -
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org 
> <mailto:user-unsubscr...@cassandra.apache.org>
> For additional commands, e-mail: user-h...@cassandra.apache.org 
> <mailto:user-h...@cassandra.apache.org>
> 



Corruption of frozen UDT during upgrade

2020-02-13 Thread Paul Chandler
Hi all,

I have looked at the release notes for the up coming release 3.11.6 and seen 
the part about corruption of frozen UDT types during upgrade from 3.0.

We have a number of cluster using UDT and have been upgrading to 3.11.4 and 
haven’t noticed any problems.

In the ticket ( CASSANDRA-15035 ) it does not seem to specify how to reproduce 
this problem, so I tried using the following definition:

CREATE TYPE supplier_holiday_udt (
holiday_type int,
holiday_start date,
holiday_end date
);

CREATE TABLE supplier (
supplier_id int PRIMARY KEY,
calendar frozen<supplier_holiday_udt>
)

I performed an upgrade from 3.0.15 to 3.11.4, including running a nodetool 
upgradesstables.

There were no errors during the process and I can still read the data in 
supplier table.

Can anyone tell me how I reproduce this problem, or check that the clusters we 
have already upgraded do not have any problems .

Thanks 

Paul 
-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



Re: Cassandra going OOM due to tombstones (heapdump screenshots provided)

2020-01-29 Thread Paul Chandler
Hi Behroz,

It looks like the number of tables is the problem: with 5,000 - 10,000 tables, 
you are way above the recommendations.

Take a look here: 
https://docs.datastax.com/en/dse-planning/doc/planning/planningAntiPatterns.html#planningAntiPatterns__AntiPatTooManyTables
 


This suggests that 5-10GB of heap is going to be taken up just with the table 
information ( 1MB per table )

Thanks 

Paul

> On 29 Jan 2020, at 14:50, Behroz Sikander  wrote:
> 
>>> Some environment details like Cassandra version, amount of physical RAM, 
> JVM configs (heap and others), and any other non-default cassandra.yaaml 
> configs would help. The amount of data, number of keyspaces & tables, 
> since you mention "clients", would also be helpful for people to suggest 
> tuning improvements.
> 
> We are more or less using the default properties.
> Here are some more details
> 
> - Total nodes in the cluster - 9
> - Disk for each node is 2 TB
> - Number of keyspaces - 1000
> - Each keyspace has 5-10 tables
> - We observed this problem on a c4.4xlarge (AWS EC2) instance having 30GB RAM 
> with 8GB heap
> - We observed the same problem on a c4.8xlarge having 60GB RAM with 12GB heap
> 
> 
> -
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org
> 



Re: Ram & Space...

2019-10-23 Thread Paul Chandler
We had what sounds like a similar problem with a DSE cluster a little while 
ago. It was not being used and had no tables in it, but the memory kept rising 
until it was killed by the oom-killer.

We spent a long time trying to get to the bottom of the problem, but it suddenly 
stopped when the developers started using the cluster. Perhaps the same will 
happen when you start using yours.

Thanks 

Paul

> On 23 Oct 2019, at 18:26, A  wrote:
> 
> Thank you. But I have added any tables yet. It’s empty...
> 
> 
> Sent from Yahoo Mail for iPhone 
> 
> On Tuesday, October 22, 2019, 1:15 AM, Matthias Pfau 
>  wrote:
> 
> Did you check nodetool status and logs? If so, what is reported?
> 
> Regarding that more and more memory is used. This might be a problem with 
> your table design. I would start analyzing nodetool tablestats output. It 
> reports how much memory (especially off heap) is used by which table.
> 
> Best,
> Matthias
> 
> 
> Oct 19, 2019, 18:46 by htt...@yahoo.com.INVALID 
> :
> What are minimum and recommended ram and space requirements to run Cassandra 
> in AWS?
> 
> Every like 24 hours Cassandra stops working. Even though the service is 
> active, it’s dead and non responsive until I restart the service.
> 
> Top shows %MEM slowly creeping upwards. Yesterday it showed 75%. 
> 
> In the logs it throws that Cassandra is running in degraded mode and that I 
> should consider adding more space to the free 25G...
> 
> Thanks in advance for your help. Newbie here... lots to learn.
> 
> Angel
> 
> 
> Sent from Yahoo Mail for iPhone
> 
> 
> 
> -
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org 
> 
> For additional commands, e-mail: user-h...@cassandra.apache.org 
> 
> 



Re: TWCS and gc_grace_seconds

2019-10-18 Thread Paul Chandler
Hi Adarsh,

You will have problems if you manually delete data when using TWCS.

To fully understand why, I recommend reading this The Last Pickle post: 
https://thelastpickle.com/blog/2016/12/08/TWCS-part1.html
And this post I wrote that dives deeper into the problems with deletes: 
http://www.redshots.com/cassandra-twcs-must-have-ttls/

Thanks 

Paul

> On 18 Oct 2019, at 14:22, Adarsh Kumar  wrote:
> 
> Thanks Jeff,
> 
> 
> I just checked with business and we have differences in having TTL. So it 
> will be manula purging always. We do not want to use LCS due to high IOs.
> So:
> As the use case is of time series data model, TWCS will be give some benefit 
> (without TTL) and with frequent deleted data
> Are there any best practices/recommendations to handle high number of 
> tombstones 
> Can we handle this use case  with STCS also (with some configurations)
> 
> Thanks in advance
> 
> Adarsh Kumar
> 
> On Fri, Oct 18, 2019 at 11:46 AM Jeff Jirsa  > wrote:
> Is everything in the table TTL’d? 
> 
> Do you do explicit deletes before the data is expected to expire ? 
> 
> Generally speaking, gcgs exists to prevent data resurrection. But ttl’d data 
> can’t be resurrected once it expires, so gcgs has no purpose unless you’re 
> deleting it before the ttl expires. If you’re doing that, twcs won’t be able 
> to drop whole sstables anyway, so maybe LCS will be less disk usage (but much 
> higher IO)
> 
>> On Oct 17, 2019, at 10:36 PM, Adarsh Kumar > > wrote:
>> 
>> 
>> Hi,
>> 
>> We have a use case of time series data with TTL where we want to use 
>> TimeWindowCompactionStrategy because of its better management for TTL and 
>> tombstones. In this case, data we have is frequently deleted so we want to 
>> reduce gc_grace_seconds to reduce the tombstones' life and reduce pressure 
>> on storage. I have following questions:
>> Do we always need to run repair for the table in reduced gc_grace_seconds or 
>> there is any other way to manage repairs in this vase
>> Do we have any other strategy (or combination of strategies) to manage 
>> frequently deleted time-series data
>> Thanks in advance.
>> 
>> Adarsh Kumar



Re: Understanding TRACE logging

2019-09-26 Thread Paul Chandler
Hi Shalom,

When tracking down specific queries I have used ngrep and fed the results into 
Wireshark; this will let you find out everything about the requests coming 
into the node from the client, as long as the connection is not encrypted.

I wrote this up here a few months ago: 
http://www.redshots.com/finding-rogue-cassandra-queries/
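
The capture itself is a one-liner; a rough sketch, assuming the default native 
protocol port 9042 and interface eth0 (adjust both for your environment):

sudo ngrep -d eth0 -q -O cql_traffic.pcap '' 'port 9042'

The resulting pcap file can then be opened in Wireshark to inspect the requests, 
as described in the post above.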

I hope this helps.

Paul





> On 26 Sep 2019, at 10:21, Laxmikant Upadhyay  wrote:
> 
> One of the way to figure out  what queries have run is to use audit logging  
> plugin supported in 3.x, 2.2
> https://github.com/Ericsson/ecaudit   
> 
> On Thu, Sep 26, 2019 at 2:19 PM shalom sagges  > wrote:
> Thanks for the quick response Jeff!
> 
> The EXECUTE lines are a prepared statement with the specified number of 
> parameters. 
> Is it possible to find out on which keyspace/table these prepared statements 
> run?
> Can I get additional information from the prepared statement's ID? e.g. 
> EXECUTE d67e6a07c24b675f492686078b46c997
> 
> Thanks!
> 
> On Thu, Sep 26, 2019 at 11:14 AM Jeff Jirsa  > wrote:
> The EXECUTE lines are a prepared statement with the specified number of 
> parameters. 
> 
> 
> On Wed, Sep 25, 2019 at 11:38 PM shalom sagges  > wrote:
> Hi All, 
> 
> I've been trying to find which queries are run on a Cassandra node. 
> I've enabled DEBUG and ran nodetool setlogginglevel 
> org.apache.cassandra.transport TRACE
> 
> I did get some queries, but it's definitely not all the queries that are run 
> on this database. 
> I've also found a lot of DEBUG [SharedPool-Worker-72] 2019-09-25 06:29:16,674 
> Message.java:437 - Received: EXECUTE 2a6022010ffaf55229262de917657d0f with 6 
> values at consistency LOCAL_QUORUM, v=3 but I don't understand what 
> information I can gain from that and why it appears many times (a lot more 
> then the queries I wish to track). 
> 
> Can someone help me understand this type of logging? 
> Thanks!
> DEBUG [SharedPool-Worker-88] 2019-09-25 06:29:16,793 Message.java:437 - 
> Received: EXECUTE 2a6022010ffaf55229262de917657d0f with 6 values at 
> consistency LOCAL_QUORUM, v=3
> DEBUG [SharedPool-Worker-87] 2019-09-25 06:29:16,780 Message.java:437 - 
> Received: EXECUTE 447fdb9c8dfae53fafd78c7583aeb0f1 with 3 values at 
> consistency LOCAL_QUORUM, v=3
> DEBUG [SharedPool-Worker-86] 2019-09-25 06:29:16,770 Message.java:437 - 
> Received: EXECUTE db812ac40b66c326f728452350eb0ab2 with 3 values at 
> consistency LOCAL_QUORUM, v=3
> DEBUG [SharedPool-Worker-84] 2019-09-25 06:29:16,761 Message.java:437 - 
> Received: EXECUTE 7119db57e0a2041206f62c6d48fb4329 with 3 values at 
> consistency LOCAL_QUORUM, v=3
> DEBUG [SharedPool-Worker-82] 2019-09-25 06:29:16,759 Message.java:437 - 
> Received: QUERY UPDATE tbl1 SET col6=?,col7=?,col8=?,col9=? WHERE col1=? AND 
> col2=? AND col3=? AND col4=? AND col5=?;, v=3
> DEBUG [SharedPool-Worker-85] 2019-09-25 06:29:16,751 Message.java:437 - 
> Received: EXECUTE 2cddc1f6af3c6efbeaf435f9b7ec1c8a with 4 values at 
> consistency LOCAL_ONE, v=3
> DEBUG [SharedPool-Worker-83] 2019-09-25 06:29:16,745 Message.java:437 - 
> Received: EXECUTE db812ac40b66c326f728452350eb0ab2 with 3 values at 
> consistency LOCAL_QUORUM, v=3
> DEBUG [SharedPool-Worker-81] 2019-09-25 06:29:16,734 Message.java:437 - 
> Received: EXECUTE 7119db57e0a2041206f62c6d48fb4329 with 3 values at 
> consistency LOCAL_QUORUM, v=3
> DEBUG [SharedPool-Worker-79] 2019-09-25 06:29:16,732 Message.java:437 - 
> Received: EXECUTE e779e97bc0de5e5e121db71c5cb2b727 with 11 values at 
> consistency LOCAL_QUORUM, v=3
> DEBUG [SharedPool-Worker-80] 2019-09-25 06:29:16,731 Message.java:437 - 
> Received: EXECUTE 91af551f94a4394b96ef9afff71dfcc1 with 2 values at 
> consistency LOCAL_QUORUM, v=3
> DEBUG [SharedPool-Worker-78] 2019-09-25 06:29:16,731 Message.java:437 - 
> Received: EXECUTE 2a6022010ffaf55229262de917657d0f with 6 values at 
> consistency LOCAL_QUORUM, v=3
> DEBUG [SharedPool-Worker-75] 2019-09-25 06:29:16,720 Message.java:437 - 
> Received: EXECUTE b665e5f576dfe70845269d63b485c8ee with 2 values at 
> consistency LOCAL_QUORUM, v=3
> DEBUG [SharedPool-Worker-77] 2019-09-25 06:29:16,715 Message.java:437 - 
> Received: EXECUTE ce545d85a7ee7c8ad58875afa72d9cf6 with 3 values at 
> consistency LOCAL_QUORUM, v=3
> DEBUG [SharedPool-Worker-74] 2019-09-25 06:29:16,703 Message.java:437 - 
> Received: EXECUTE 7119db57e0a2041206f62c6d48fb4329 with 3 values at 
> consistency LOCAL_QUORUM, v=3
> DEBUG [SharedPool-Worker-76] 2019-09-25 06:29:16,686 Message.java:437 - 
> Received: EXECUTE b665e5f576dfe70845269d63b485c8ee with 2 values at 
> consistency LOCAL_QUORUM, v=3
> DEBUG [SharedPool-Worker-71] 2019-09-25 06:29:16,682 Message.java:437 - 
> Received: EXECUTE 2a6022010ffaf55229262de917657d0f with 6 values at 
> consistency LOCAL_QUORUM, v=3
> DEBUG [SharedPool-Worker-73] 2019-09-25 06:29:16,675 Message.java:437 - 
> Received: 

Re: Differing snitches in different datacenters

2019-07-29 Thread Paul Chandler
Hi Voytek,

I looked into this a little while ago, and couldn't really find a definitive 
answer. We ended up keeping the GossipingPropertyFileSnitch in our GCP 
datacenter; the only downside that I could see is that you have to manually 
specify the rack and DC. But doing it that way does allow you to create a 
multi-vendor cluster in the future if you wish. 
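
For reference, the manual part is just two properties in 
cassandra-rackdc.properties on each node; a minimal sketch with illustrative 
names:

# cassandra-rackdc.properties on each node in the new GCP datacenter
dc=gcp-dc1
rack=rack1

GossipingPropertyFileSnitch then gossips those values to the rest of the 
cluster.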

I would also be interested if anyone has the definitive answer on this.

Thanks 

Paul
www.redshots.com

> On 29 Jul 2019, at 17:06, Voytek Jarnot  wrote:
> 
> Just a quick bump - hoping someone can shed some light on whether running 
> different snitches in different datacenters is a terrible idea or no. It'd be 
> fairly temporary, once the new DC is stood up and nodes are rebuilt, the old 
> DC will be decomissioned.
> 
> On Thu, Jul 25, 2019 at 12:36 PM Voytek Jarnot  > wrote:
> Quick and hopefully easy question for the list. Background is existing 
> cluster (1 DC) will be migrated to AWS-hosted cluster via standing up a 
> second datacenter, existing cluster will be subsequently decommissioned.
> 
> We currently use GossipingPropertyFileSnitch and are thinking about using 
> Ec2MultiRegionSnitch in the new AWS DC - that'd position us nicely if in the 
> future we want to run a multi-DC cluster in AWS. My question is: are there 
> any issues with one DC using GossipingPropertyFileSnitch and the other using 
> Ec2MultiRegionSnitch? This setup would be temporary, existing until the new 
> DC nodes have rebuilt and the old DC is decommissioned.
> 
> Thanks,
> Voytek Jarnot



Re: Cheat Sheet for Unix based OS, Performance troubleshooting

2019-07-27 Thread Paul Chandler
I have always found Amy's Cassandra 2.1 tuning guide great for the Linux 
performance tuning: 
https://tobert.github.io/pages/als-cassandra-21-tuning-guide.html

Sent from my iPhone

> On 26 Jul 2019, at 23:49, Krish Donald  wrote:
> 
> Any one has  Cheat Sheet for Unix based OS, Performance troubleshooting ?


Re: Expanding from 1 to 2 datacenters

2019-06-26 Thread Paul Chandler
Hi Voytek,

We moved a large number of clusters from Rackspace to Google, doing it in a 
similar way to the one you are suggesting, so to answer your questions:

1) We created the nodes first with seeds from just the original datacenters, 
then updated the seed list to include seeds from the new DC afterwards.

2) Don’t create the keyspace again, it will be there already, but with no data 
until you change the replication factor for the keyspace. Once you have done 
this you will need to do a nodetool rebuild on the new nodes to stream the data 
to the new DC.

3) We kept auto_bootstrap to true, but only changed the replication factor of 
the system keyspaces before creating the new nodes, therefore the bootstrapping 
was quick, and the real data was streamed later during the nodetool rebuild.

4) We just kept the GossipingPropertyFileSnitch; this gives you the most 
flexibility and allows you to change cloud providers later.
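
Putting points 2 and 3 together, the sequence once the new DC's nodes are up 
looks roughly like this (keyspace and DC names are placeholders):

ALTER KEYSPACE my_keyspace
  WITH replication = {'class': 'NetworkTopologyStrategy', 'old_dc': '3', 'new_dc': '3'};

then, on each node in the new datacenter:

nodetool rebuild -- old_dc

The rebuild streams the existing data over from the old DC, which is why the 
bootstrap itself stays quick.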

We presented how we did this at Datastax Accelerate earlier in the year, and 
although that presentation is not yet online, I also created a series of blog 
posts that show our steps in detail. I recommend reading these, as they are the 
combined learnings of moving 90+ clusters: http://www.redshots.com/accelerate/ 
<http://www.redshots.com/accelerate/>

Happy to answer any more questions.

Regards 

Paul Chandler
www.redshots.com

> On 26 Jun 2019, at 16:19, Voytek Jarnot  wrote:
> 
> I started an higher-level thread years ago about moving a cluster by 
> expanding from 1 to 2 datacenters, replicating over, then decommissioning the 
> original DC. Corporate plans being what they are, we're finally getting into 
> this; I'm largely following the writeup here: 
> https://docs.datastax.com/en/archived/cassandra/3.0/cassandra/operations/opsAddDCToCluster.html
>  
> <https://docs.datastax.com/en/archived/cassandra/3.0/cassandra/operations/opsAddDCToCluster.html>
>  , but have a few more-specific questions:
> 
> current setup: 1 DC, 4 nodes, RF=3, 1 keyspace
> new DC will be 4 nodes as well, RF=3
> 
> 1) We currently have 2 seed nodes, I'd like to confirm that the correct 
> course is to make 1-2 (let's say 2) of the new-DC nodes seeds as well, and 
> update all nodes in both DCs to point at all 4 seeds before I get into 
> altering the keyspace.
> 
> 2) Prior to altering the replication on my keyspace to include the new DC, I 
> do not need/want to create the keyspace in the new DC, correct?
> 
> 3) The datastax docs mention the auto_bootstrap=false setting, but don't go 
> into much detail - I'm leaning toward setting it to false on all the new 
> nodes, sound reasonable?
> 
> 4) One of the three environments in which this'll happen is slightly more 
> complicated due to the existing DC living in AWS, whereas the new DC will be 
> in a different AZ. Do I need to get into switching from 
> GossipingPropertyFileSnitch to Ec2MultiRegionSnitch? If so, could someone 
> shed a bit of light on that process, and the associated changes needed for 
> listen_address and broadcast_address?
> 
> Thanks for getting this far,
> Voytek Jarnot



Re: Understanding output of read/write histogram using opscenter API

2019-06-04 Thread Paul Chandler
Hi Rahul,

OpsCenter is a DataStax product; have you raised a support request with them 
(https://support.datastax.com)? They should be able to answer this sort of 
question.


Regards 

Paul Chandler

> On 4 Jun 2019, at 12:24, Bhardwaj, Rahul  wrote:
> 
> Hi All,
> 
> Do we have any document to understand output of read/write histogram using 
> opscenter API. We need them to ingest it to create one of our dashboards. We 
> are facing difficulty in understanding its output if we relate it with 5 
> values like max,min,median,90th percentile,etc. Attaching one sample output. 
> we could not find the way to get different percentile’s data using API for 
> read-histogram and write-histogram. Kindly help or provide some doc link 
> related to its explanation.
> 
>  
>  
> Thanks and Regards,
> Rahul Bhardwaj
>  
> 
> -
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org 
> <mailto:user-unsubscr...@cassandra.apache.org>
> For additional commands, e-mail: user-h...@cassandra.apache.org 
> <mailto:user-h...@cassandra.apache.org>


Re: TWCS sstables not dropping even though all data is expired

2019-06-04 Thread Paul Chandler
Mike,

It has taken me sometime, but I have now written this up in more detail on my 
blog: http://www.redshots.com/cassandra-twcs-must-have-ttls/ 
<http://www.redshots.com/cassandra-twcs-must-have-ttls/>

However I couldn’t get the tombstone compaction subproperties to work as 
expected.

If I use the following properties:

gc_grace_seconds = 60
  AND default_time_to_live = 300
  AND compaction = {'compaction_window_size': '1',
                    'compaction_window_unit': 'MINUTES',
                    'tombstone_compaction_interval': '60',
                    'tombstone_threshold': '0.01',
                    'unchecked_tombstone_compaction': 'true',
                    'class': 'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy'}


When a row without a TTL is blocking the sstable being deleted, I would expect 
the later sstables to be deleted with these settings.

What actually happens is that the sstables are compacted every 60 seconds, but 
the new sstable is exactly the same as the previous one, even though the rows 
have expired and we are way past the gc_grace_seconds.

I have included an sstabledump below.

Does anyone know what I am doing wrong ?


Thanks 

Paul


sstabledump md-463-big-Data.db
[
  {
"partition" : {
  "key" : [ "1" ],
  "position" : 0
},
"rows" : [
  {
"type" : "row",
"position" : 33,
"clustering" : [ 3 ],
"liveness_info" : { "tstamp" : "2019-06-03T14:56:41.120579Z", "ttl" : 
300, "expires_at" : "2019-06-03T15:01:41Z", "expired" : true },
"cells" : [
  { "name" : "when", "deletion_info" : { "local_delete_time" : 
"2019-06-03T14:56:41Z" }
  }
]
  },
  {
"type" : "row",
"position" : 33,
"clustering" : [ 4 ],
"liveness_info" : { "tstamp" : "2019-06-03T14:56:45.499467Z", "ttl" : 
300, "expires_at" : "2019-06-03T15:01:45Z", "expired" : true },
"cells" : [
  { "name" : "when", "deletion_info" : { "local_delete_time" : 
"2019-06-03T14:56:45Z" }
  }
]
  },
  {
"type" : "row",
"position" : 51,
"clustering" : [ 5 ],
"liveness_info" : { "tstamp" : "2019-06-03T14:56:50.009615Z", "ttl" : 
300, "expires_at" : "2019-06-03T15:01:50Z", "expired" : true },
"cells" : [
  { "name" : "when", "deletion_info" : { "local_delete_time" : 
"2019-06-03T14:56:50Z" }
  }
]
  },
  {
"type" : "row",
"position" : 69,
"clustering" : [ 6 ],
"liveness_info" : { "tstamp" : "2019-06-03T14:56:54.926536Z", "ttl" : 
300, "expires_at" : "2019-06-03T15:01:54Z", "expired" : true },
"cells" : [
  { "name" : "when", "deletion_info" : { "local_delete_time" : 
"2019-06-03T14:56:54Z" }
  }
]
  },
  {
"type" : "row",
"position" : 87,
"clustering" : [ 7 ],
"liveness_info" : { "tstamp" : "2019-06-03T14:57:00.600615Z", "ttl" : 
300, "expires_at" : "2019-06-03T15:02:00Z", "expired" : true },
"cells" : [
  { "name" : "when", "deletion_info" : { "local_delete_time" : 
"2019-06-03T14:57:00Z" }
  }
]
  }
]
  }
]


> On 3 May 2019, at 19:59, Mike Torra  wrote:
> 
> Thx for the help Paul - there are definitely some details here I still don't 
> fully understand, but this helped me resolve the problem and know what to 
> look for in the future :)
> 
> On Fri, May 3, 2019 at 12:44 PM Paul Chandler  <mailto:p...@redshots.com>> wrote:
> Hi Mike,
> 
> For TWCS the sstable can only be deleted when all the data has expired in 
> that sstable, but you had a record without a ttl in it, so that sstable could 
> never be deleted.
> 
> That bit is straight forward, the next bit I remember reading somewhere but 
> can’t find it at the moment to confirm my thinking.
> 
> An sstable can only be deleted if it is the earliest sstable. I think 

Re: Collecting Latency Metrics

2019-05-29 Thread Paul Chandler
There are various attributes under 
org.apache.cassandra.metrics.ClientRequest.Latency.Read; these measure the 
latency in milliseconds.

Thanks 

Paul
www.redshots.com

> On 29 May 2019, at 15:31, shalom sagges  wrote:
> 
> Hi All,
> 
> I'm creating a dashboard that should collect read/write latency metrics on C* 
> 3.x. 
> In older versions (e.g. 2.0) I used to divide the total read latency in 
> microseconds with the read count. 
> 
> Is there a metric attribute that shows read/write latency without the need to 
> do the math, such as in nodetool tablestats "Local read latency" output?
> I saw there's a Mean attribute in org.apache.cassandra.metrics.ReadLatency 
> but I'm not sure this is the right one. 
> 
> I'd really appreciate your help on this one. 
> Thanks!
> 
> 


-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



Re: Counter table in Cassandra

2019-05-29 Thread Paul Chandler
Hi Garvit,

When updating counters, Cassandra does a read then a write, so there is an 
overhead of using counters. This is all explained here: 
https://www.datastax.com/dev/blog/whats-new-in-cassandra-2-1-a-better-implementation-of-counters
 
<https://www.datastax.com/dev/blog/whats-new-in-cassandra-2-1-a-better-implementation-of-counters>

There is a design pattern that can be used instead of counters. It will not 
work if you need instant accuracy, but if you are happy for a count value to 
be "eventually" correct, then it is a lot less taxing on Cassandra. I have 
outlined this pattern in a blog post; when I wrote it I was advising a team 
that was performing a lot of count(*) queries on a table, so it starts from 
that premise rather than from counters, but the result is the same. It can be 
found here: http://www.redshots.com/cassandra-counting-without-using-counters/
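
A very rough sketch of the idea (the table, column names and key are purely 
illustrative, they are not taken from the post):

CREATE TABLE popularity_events (
    id uuid,
    event_time timeuuid,
    PRIMARY KEY (id, event_time)
);

-- one plain insert per "increment"; no read-before-write involved
INSERT INTO popularity_events (id, event_time)
VALUES (6ab09bec-e68e-48d9-a5f8-97e6fb4c9b47, now());

-- a background job periodically counts a partition and stores the result
SELECT count(*) FROM popularity_events
WHERE id = 6ab09bec-e68e-48d9-a5f8-97e6fb4c9b47;

The count is only eventually correct, but each write is an ordinary append, so 
concurrent threads never contend the way they do on a single counter cell.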

I hope these links help.

Regards

Paul Chandler



> On 29 May 2019, at 10:18, Attila Wind  wrote:
> 
> Hi Garvit,
> 
> I can not answer your main question but when I read your lines one thing was 
> popping up constantly: "why do you ask this?" 
> 
> So what is the background of this question? Do you see anything smelly?
> 
> Actually
> a) I always assumed so naturally there are of course lots of in-parallel 
> activities (writes) against any tables includin counters. So of course there 
> is a race condition and probably threads yes
> 
> b) Cassandra do not have isolated transactions so of course in a complex flow 
> (using multiple tables) there is no business data consistency guarantee for 
> sure
> 
> c) until you are doing just +/- ops it is a mathematical fact that execution 
> order of writes is not really important. Repeating +1 increase 5 times will 
> result in higher counter by num 5...
> 
> Please share your background I am interested in it!
> 
> Cheers
> Attila
> 
> 2019. máj. 29., Sze 2:34 dátummal Garvit Sharma  <mailto:garvit...@gmail.com>> ezt írta:
> Hi,
> 
> I am using counter tables in Cassandra and I want to understand how the 
> concurrent updates to counter table are handled in Cassandra.
> 
> There are more than one threads who are responsible for updating the counter 
> for a partition key. Multiple threads can also update the counter for the 
> same key.
> 
> In case when more than one threads updating the counter for the same key, how 
> Cassandra is handling the race condition?
> 
> UPDATE cycling.popular_count
>  SET popularity = popularity + 1
>  WHERE id = 6ab09bec-e68e-48d9-a5f8-97e6fb4c9b47;
> 
> Are there overheads of using counter tables? 
> Are there alternatives to counter tables?
> 
> Thanks,
> -- 
> 
> Garvit Sharma
> github.com/garvitlnmiit/ <http://github.com/garvitlnmiit/>
> 
> No Body is a Scholar by birth, its only hard work and strong determination 
> that makes him master.



Re: Corrupted sstables

2019-05-07 Thread Paul Chandler
Roy, we spent a long time trying to fix it but didn't find a solution; it was a 
test cluster, so we ended up rebuilding it rather than spending any more time 
trying to fix the corruption. We worked out what had caused it, so we were happy 
it wasn't going to occur in production. Sorry that is not much help, but I am 
not even sure it is the same issue you have. 

Paul



> On 7 May 2019, at 07:14, Roy Burstein  wrote:
> 
> I can say that it happens now as well ,currently no node has been 
> added/removed . 
> Corrupted sstables are usually the index files and in some machines the 
> sstable even does not exist on the filesystem.
> On one machine I was able to dump the sstable to dump file without any issue  
> . Any idea how to tackle this issue ? 
>  
> 
> On Tue, May 7, 2019 at 12:32 AM Paul Chandler  <mailto:p...@redshots.com>> wrote:
> Roy,
> 
> I have seen this exception before when a column had been dropped then re 
> added with the same name but a different type. In particular we dropped a 
> column and re created it as static, then had this exception from the old 
> sstables created prior to the ddl change.
> 
> Not sure if this applies in your case.
> 
> Thanks 
> 
> Paul
> 
>> On 6 May 2019, at 21:52, Nitan Kainth > <mailto:nitankai...@gmail.com>> wrote:
>> 
>> can Disk have bad sectors? fccheck or something similar can help.
>> 
>> Long shot: repair or any other operation conflicting. Would leave that to 
>> others.
>> 
>> On Mon, May 6, 2019 at 3:50 PM Roy Burstein > <mailto:burstein@gmail.com>> wrote:
>> It happens on the same column families and they have the same ddl (as 
>> already posted) . I did not check it after cleanup 
>> .
>> 
>> On Mon, May 6, 2019, 23:43 Nitan Kainth > <mailto:nitankai...@gmail.com>> wrote:
>> This is strange, never saw this. does it happen to same column family?
>> 
>> Does it happen after cleanup?
>> 
>> On Mon, May 6, 2019 at 3:41 PM Roy Burstein > <mailto:burstein@gmail.com>> wrote:
>> Yes.
>> 
>> On Mon, May 6, 2019, 23:23 Nitan Kainth > <mailto:nitankai...@gmail.com>> wrote:
>> Roy,
>> 
>> You mean all nodes show corruption when you add a node to cluster??
>> 
>> 
>> Regards,
>> Nitan
>> Cell: 510 449 9629 
>> 
>> On May 6, 2019, at 2:48 PM, Roy Burstein > <mailto:burstein@gmail.com>> wrote:
>> 
>>> It happened  on all the servers in the cluster every time I have added node
>>> .
>>> This is new cluster nothing was upgraded here , we have a similar cluster
>>> running on C* 2.1.15 with no issues .
>>> We are aware to the scrub utility just it reproduce every time we added
>>> node to the cluster .
>>> 
>>> We have many tables there
> 



Re: Corrupted sstables

2019-05-06 Thread Paul Chandler
Roy,

I have seen this exception before when a column had been dropped and then 
re-added with the same name but a different type. In particular, we dropped a 
column and re-created it as static, then had this exception from the old 
sstables created prior to the DDL change.

Not sure if this applies in your case.

Thanks 

Paul

> On 6 May 2019, at 21:52, Nitan Kainth  wrote:
> 
> can Disk have bad sectors? fccheck or something similar can help.
> 
> Long shot: repair or any other operation conflicting. Would leave that to 
> others.
> 
> On Mon, May 6, 2019 at 3:50 PM Roy Burstein  > wrote:
> It happens on the same column families and they have the same ddl (as already 
> posted) . I did not check it after cleanup 
> .
> 
> On Mon, May 6, 2019, 23:43 Nitan Kainth  > wrote:
> This is strange, never saw this. does it happen to same column family?
> 
> Does it happen after cleanup?
> 
> On Mon, May 6, 2019 at 3:41 PM Roy Burstein  > wrote:
> Yes.
> 
> On Mon, May 6, 2019, 23:23 Nitan Kainth  > wrote:
> Roy,
> 
> You mean all nodes show corruption when you add a node to cluster??
> 
> 
> Regards,
> Nitan
> Cell: 510 449 9629 
> 
> On May 6, 2019, at 2:48 PM, Roy Burstein  > wrote:
> 
>> It happened  on all the servers in the cluster every time I have added node
>> .
>> This is new cluster nothing was upgraded here , we have a similar cluster
>> running on C* 2.1.15 with no issues .
>> We are aware to the scrub utility just it reproduce every time we added
>> node to the cluster .
>> 
>> We have many tables there



Re: TWCS sstables not dropping even though all data is expired

2019-05-03 Thread Paul Chandler
Hi Mike,

For TWCS the sstable can only be deleted when all the data has expired in that 
sstable, but you had a record without a ttl in it, so that sstable could never 
be deleted.

That bit is straightforward; the next bit I remember reading somewhere but 
can't find at the moment to confirm my thinking.

An sstable can only be deleted if it is the earliest sstable. I think this is 
due to the fact that deleting later sstables may expose old versions of the 
data stored in the stuck sstable which had been superseded. For example, if 
there was a tombstone in a later sstable for the non-TTLed record causing the 
problem in this instance, then deleting that sstable would cause the deleted 
data to reappear. (Someone please correct me if I have this wrong.) 

Because sstables in different time buckets are never compacted together, this 
problem only went away when you did the major compaction.

This would happen on all replicas of the data, hence the reason you see this 
problem on 3 nodes.
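
For anyone hitting the same thing, the two commands involved are worth knowing 
(keyspace and table names are placeholders):

sstableexpiredblockers my_keyspace my_table   # shows which sstable is blocking fully expired ones
nodetool compact my_keyspace my_table         # major compaction, as a last resort, merges the buckets

Bear in mind the major compaction undoes the time-bucket layout TWCS normally 
gives you, so treat it as a recovery step rather than something to schedule.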

Thanks 

Paul
www.redshots.com

> On 3 May 2019, at 15:35, Mike Torra  wrote:
> 
> This does indeed seem to be a problem of overlapping sstables, but I don't 
> understand why the data (and number of sstables) just continues to grow 
> indefinitely. I also don't understand why this problem is only appearing on 
> some nodes. Is it just a coincidence that the one rogue test row without a 
> ttl is at the 'root' sstable causing the problem (ie, from the output of 
> `sstableexpiredblockers`)?
> 
> Running a full compaction via `nodetool compact` reclaims the disk space, but 
> I'd like to figure out why this happened and prevent it. Understanding why 
> this problem would be isolated the way it is (ie only one CF even though I 
> have a few others that share a very similar schema, and only some nodes) 
> seems like it will help me prevent it.
> 
> 
> On Thu, May 2, 2019 at 1:00 PM Paul Chandler  <mailto:p...@redshots.com>> wrote:
> Hi Mike,
> 
> It sounds like that record may have been deleted, if that is the case then it 
> would still be shown in this sstable, but the deleted tombstone record would 
> be in a later sstable. You can use nodetool getsstables to work out which 
> sstables contain the data.
> 
> I recommend reading The Last Pickle post on this: 
> http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html 
> <http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html> the sections 
> towards the bottom of this post may well explain why the sstable is not being 
> deleted.
> 
> Thanks 
> 
> Paul
> www.redshots.com <http://www.redshots.com/>
> 
>> On 2 May 2019, at 16:08, Mike Torra > <mailto:mto...@salesforce.com.INVALID>> wrote:
>> 
>> I'm pretty stumped by this, so here is some more detail if it helps.
>> 
>> Here is what the suspicious partition looks like in the `sstabledump` output 
>> (some pii etc redacted):
>> ```
>> {
>> "partition" : {
>>   "key" : [ "some_user_id_value", "user_id", "demo-test" ],
>>   "position" : 210
>> },
>> "rows" : [
>>   {
>> "type" : "row",
>> "position" : 1132,
>> "clustering" : [ "2019-01-22 15:27:45.000Z" ],
>> "liveness_info" : { "tstamp" : "2019-01-22T15:31:12.415081Z" },
>> "cells" : [
>>   { "some": "data" }
>> ]
>>   }
>> ]
>>   }
>> ```
>> 
>> And here is what every other partition looks like:
>> ```
>> {
>> "partition" : {
>>   "key" : [ "some_other_user_id", "user_id", "some_site_id" ],
>>   "position" : 1133
>> },
>> "rows" : [
>>   {
>> "type" : "row",
>> "position" : 1234,
>> "clustering" : [ "2019-01-22 17:59:35.547Z" ],
>> "liveness_info" : { "tstamp" : "2019-01-22T17:59:35.708Z", "ttl" : 
>> 86400, "expires_at" : "2019-01-23T17:59:35Z", "expired" : true },
>> "cells" : [
>>   { "name" : "activity_data", "deletion_info" : { 
>> "local_delete_time" : "2019-01-22T17:59:35Z" }
>>   }
>> ]
>>   }
>> ]
>>   }
>> ```
>> 
>> As expected, almost all of the data except this one suspicious partition has 
>> a ttl and is alre

Re: TWCS sstables not dropping even though all data is expired

2019-05-02 Thread Paul Chandler
Hi Mike,

It sounds like that record may have been deleted; if that is the case then it 
would still be shown in this sstable, but the tombstone for the delete would be 
in a later sstable. You can use nodetool getsstables to work out which sstables 
contain the data.
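
Usage is roughly as follows (keyspace, table and key are placeholders):

nodetool getsstables my_keyspace my_table some_partition_key

It lists every sstable on that node holding the given partition, so you can see 
whether the live row and a later tombstone are sitting in different files.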

I recommend reading The Last Pickle post on this: 
http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html the sections towards 
the bottom of this post may well explain why the sstable is not being deleted.

Thanks 

Paul
www.redshots.com

> On 2 May 2019, at 16:08, Mike Torra  wrote:
> 
> I'm pretty stumped by this, so here is some more detail if it helps.
> 
> Here is what the suspicious partition looks like in the `sstabledump` output 
> (some pii etc redacted):
> ```
> {
> "partition" : {
>   "key" : [ "some_user_id_value", "user_id", "demo-test" ],
>   "position" : 210
> },
> "rows" : [
>   {
> "type" : "row",
> "position" : 1132,
> "clustering" : [ "2019-01-22 15:27:45.000Z" ],
> "liveness_info" : { "tstamp" : "2019-01-22T15:31:12.415081Z" },
> "cells" : [
>   { "some": "data" }
> ]
>   }
> ]
>   }
> ```
> 
> And here is what every other partition looks like:
> ```
> {
> "partition" : {
>   "key" : [ "some_other_user_id", "user_id", "some_site_id" ],
>   "position" : 1133
> },
> "rows" : [
>   {
> "type" : "row",
> "position" : 1234,
> "clustering" : [ "2019-01-22 17:59:35.547Z" ],
> "liveness_info" : { "tstamp" : "2019-01-22T17:59:35.708Z", "ttl" : 
> 86400, "expires_at" : "2019-01-23T17:59:35Z", "expired" : true },
> "cells" : [
>   { "name" : "activity_data", "deletion_info" : { "local_delete_time" 
> : "2019-01-22T17:59:35Z" }
>   }
> ]
>   }
> ]
>   }
> ```
> 
> As expected, almost all of the data except this one suspicious partition has 
> a ttl and is already expired. But if a partition isn't expired and I see it 
> in the sstable, why wouldn't I see it executing a CQL query against the CF? 
> Why would this sstable be preventing so many other sstable's from getting 
> cleaned up?
> 
> On Tue, Apr 30, 2019 at 12:34 PM Mike Torra  > wrote:
> Hello -
> 
> I have a 48 node C* cluster spread across 4 AWS regions with RF=3. A few 
> months ago I started noticing disk usage on some nodes increasing 
> consistently. At first I solved the problem by destroying the nodes and 
> rebuilding them, but the problem returns.
> 
> I did some more investigation recently, and this is what I found:
> - I narrowed the problem down to a CF that uses TWCS, by simply looking at 
> disk space usage
> - in each region, 3 nodes have this problem of growing disk space (matches 
> replication factor)
> - on each node, I tracked down the problem to a particular SSTable using 
> `sstableexpiredblockers`
> - in the SSTable, using `sstabledump`, I found a row that does not have a ttl 
> like the other rows, and appears to be from someone else on the team testing 
> something and forgetting to include a ttl
> - all other rows show "expired: true" except this one, hence my suspicion
> - when I query for that particular partition key, I get no results
> - I tried deleting the row anyways, but that didn't seem to change anything
> - I also tried `nodetool scrub`, but that didn't help either
> 
> Would this rogue row without a ttl explain the problem? If so, why? If not, 
> does anyone have any other ideas? Why does the row show in `sstabledump` but 
> not when I query for it?
> 
> I appreciate any help or suggestions!
> 
> - Mike



Re: Using Cassandra as an object store

2019-04-19 Thread Paul Chandler
Gene,

I have found that clusters used as object stores have caused me more problems 
than normal in the past, so I recommend using a separate object store if 
possible.

However, it certainly can be done; there are just a few things to consider:

1) Deletion policy: How are these objects going to be deleted? We have had 
problems in the past where deleted objects didn’t get removed from disk. This 
was because by the time they were deleted they had been compacted into very 
large sstables that were rarely compacted again. So think about the compaction 
strategy and any tombstone issues you may come across.
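
As a very rough sketch, if the objects have a natural lifetime, a TTL plus 
TimeWindowCompactionStrategy layout avoids relying on explicit deletes at all. 
The keyspace, table and column names and the TTL value here are invented for 
illustration:

```
CREATE TABLE object_store.blobs (
    object_id uuid,
    chunk_id  int,
    data      blob,
    PRIMARY KEY (object_id, chunk_id)
) WITH compaction = {'class': 'TimeWindowCompactionStrategy',
                     'compaction_window_unit': 'DAYS',
                     'compaction_window_size': '7'}
  AND default_time_to_live = 2592000;  -- 30 days, purely illustrative
```

If objects are deleted explicitly rather than expiring, TWCS is usually a poor 
fit, so treat this only as a starting point for your own tests.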

2) Compression: Are the objects already compressed before they are stored, e.g. 
JPEGs? If so, turn compression off on the table; this reduces the amount of data 
read into memory when reading the data, reducing pressure on the heap. We did 
some trials with one system, and found much better performance when the 
compression was performed on the client side. So try some tests with that.
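
Turning table compression off is a one line change; the table name below is 
made up, and on older versions (2.1/2.2) the option is spelt with 
sstable_compression rather than enabled:

```
ALTER TABLE object_store.blobs WITH compression = {'enabled': 'false'};
```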

3) How often is the data read? There will be completely different hardware 
requirements depending on whether this is an image store for an e-commerce site 
or a PDF store holding client invoices. With a small number of reads per 
object, you can specify machines with smaller CPUs and less memory but a large 
amount of storage. If there are a large number of reads, then you need to think 
much more carefully about memory and CPU, as per the Walmart article you 
referenced.

Thanks 

Paul Chandler
www.redshots.com



> On 19 Apr 2019, at 09:04, DuyHai Doan  wrote:
> 
> Idea: 
> 
> To guarantee data integrity, you can store an MD5 of all chunks data as 
> static column in the partition that contains the chunks
> 
> On Fri, Apr 19, 2019 at 9:18 AM cclive1601你  <mailto:cclive1...@gmail.com>> wrote:
> We have used Cassandra as an object store for some years; you can just split 
> the object into small pieces. The object gets a PK, and each of the small 
> pieces gets its own PK. The object's PK and the pieces' PKs can be stored in a 
> meta table in Cassandra, and the pieces' PKs and the pieces themselves in a 
> data table. We store videos, pictures and other unstructured data.
> 
> Gene  wrote on Fri, 19 Apr 2019 at 1:25 PM:
> Howdy
> 
> I'm looking at the possibility of using cassandra as an object store to 
> offload image/blob data from an Oracle database.  I've seen mentions of it 
> being used as an object store in a large scale fashion, like with Walmart:
> 
> https://medium.com/walmartlabs/building-object-store-storing-images-in-cassandra-walmart-scale-a6b9c02af593
> 
> However I have found little on small scale setups and if it's even worth 
> using Cassandra in place of something else that's meant to be used for object 
> storage, like Ceph.
> 
> Additionally, I've read that cassandra struggles with storing objects 10MB or 
> larger and it's recommended to break objects up into smaller chunks, which 
> either requires some kind of middleware between our application and 
> cassandra, or it would require our application to split objects into smaller 
> chunks and recombine them as needed.
> 
> I've looked into pithos and astyanax, but those are both no longer developed 
> and I'm not seeing anything that might replace them in the long term.
> 
> https://github.com/exoscale/pithos
> https://github.com/Netflix/astyanax
> 
> Any helpful information or advice would be greatly appreciated.
> 
> Thanks in advance.
> 
> -Gene
> 
> 
> -- 
> you are the apple of my eye !



Re: multiple snitches in the same cluster

2019-04-16 Thread Paul Chandler
Hi Shravan,

We did not see any downsides to using the GossipingPropertyFileSnitch. In fact, 
being able to handcraft the DC allowed us to easily create “virtual” Cassandra 
datacenters within the Google DC, something we have used for splitting 
multi-tenanted clusters. I am not sure how easy that would have been with the 
GoogleCloudSnitch.
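
For reference, hand-crafting that “virtual” DC is just a couple of lines in 
cassandra-rackdc.properties on each node; the dc and rack values below are 
invented for illustration:

```
# cassandra-rackdc.properties (read by GossipingPropertyFileSnitch)
dc=gcp-europe-west2-tenant-a
rack=rack1
```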

Looking at the code for the various snitches, obviously the loading of the dc 
and rack information is different, but once it is loaded, the data is used in 
the same way. 

So I can’t see any problem with continuing to use the 
GossipingPropertyFileSnitch, although that comes with the one caveat that I 
have never tried it on Amazon, only GCP.

 Thanks 

Paul Chandler
www.redshots.com

> On 16 Apr 2019, at 15:02, Shravan R  wrote:
> 
> Thanks Paul. Glad to know that you are speaking on the very subject soon. 
> 
>  Even though you are on GCP, you are still tied with 
> GossipingPropertyFileSnitch and not enjoying the GoogleCloudSnitch. If it is 
> not too much to ask, other than hand stitching RACK and DC in 
> cassandra-rackdc.properties are there any down sides of not using 
> GoogleCloudSnitch in your case and EC2Snitch in my case? 
> 
> Thanks,
> Shravan
> 
> On Tue, Apr 16, 2019 at 4:05 AM Paul Chandler  <mailto:p...@redshots.com>> wrote:
> Sravan,
> 
> When we migrated to google we just continued using the 
> GossipingPropertyFileSnitch, we did not try and use the google version of the 
> snitch.
> 
> The only downside is the need to manually setup the 
> cassandra-rackdc.properties file, but for us that was controlled by puppet 
> anyway, so it was not an issue.
> 
> Thanks 
> 
> Paul Chandler
> www.redshots.com <http://www.redshots.com/>
> 
> PS Myself and Gilberto Müeller are presenting at Datastax Accelerate on this 
> very subject, how we migrated 91 clusters to Google, including what problems 
> we had along the way. It would be worth you attending that session if you are 
> there.
> 
> > On 16 Apr 2019, at 03:46, Shravan R  > <mailto:skr...@gmail.com>> wrote:
> > 
> > Can we have multiple snitches within the same cluster?
> > 
> > Reason for the ask: I need to find an optimal migration path from on prem 
> > DC to AWS.
> > Currently the cluster has TWO on prem DCs and the end cluster state should 
> > be TWO DCs in same region/multiple zones in AWS.
> > 
> > I know there is EC2Snitch - but that is if I have all the servers in AWS
> > 
> > I am thinking of using existing property file snitch but hand stitch the 
> > rack and dc from AWS region/zone and add two more DC's. Down side with this 
> > approach - I am really not using EC2Snitch which would have been nice. 
> > 
> > BTW I am running apache cassandra 2.1.9 (yeah old).
> > 
> > Appreciate your thoughts on this. 
> > 
> > -Sravan
> 
> 
> -
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org 
> <mailto:user-unsubscr...@cassandra.apache.org>
> For additional commands, e-mail: user-h...@cassandra.apache.org 
> <mailto:user-h...@cassandra.apache.org>
> 



Re: multiple snitches in the same cluster

2019-04-16 Thread Paul Chandler
Sravan,

When we migrated to google we just continued using the 
GossipingPropertyFileSnitch, we did not try and use the google version of the 
snitch.

The only downside is the need to manually setup the cassandra-rackdc.properties 
file, but for us that was controlled by puppet anyway, so it was not an issue.

Thanks 

Paul Chandler
www.redshots.com

PS Myself and Gilberto Müeller are presenting at Datastax Accelerate on this 
very subject, how we migrated 91 clusters to Google, including what problems we 
had along the way. It would be worth you attending that session if you are 
there.

> On 16 Apr 2019, at 03:46, Shravan R  wrote:
> 
> Can we have multiple snitches within the same cluster?
> 
> Reason for the ask: I need to find an optimal migration path from on prem DC 
> to AWS.
> Currently the cluster has TWO on prem DCs and the end cluster state should be 
> TWO DCs in same region/multiple zones in AWS.
> 
> I know there is EC2Snitch - but that is if I have all the servers in AWS
> 
> I am thinking of using existing property file snitch but hand stitch the rack 
> and dc from AWS region/zone and add two more DC's. Down side with this 
> approach - I am really not using EC2Snitch which would have been nice. 
> 
> BTW I am running apache cassandra 2.1.9 (yeah old).
> 
> Appreciate your thoughts on this. 
> 
> -Sravan


-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



Re: cass-2.2 trigger - how to get clustering columns and value?

2019-04-11 Thread Paul Chandler
Hi Carl,

I know this is not exactly answering your question, but it may help with the 
split.

I have split a multi-tenancy cluster several times using a similar process to 
TLP’s Data Centre Switch: 
http://thelastpickle.com/blog/2019/02/26/data-center-switch.html

However, instead of phase 3, we have split the cluster by changing the seeds 
definition to only point at nodes within their own DC, and changing the cluster 
name of the new DC. This last step does require a short downtime of the cluster.
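
As a sketch of what changes on the nodes being split off (the cluster name and 
addresses below are invented; note that Cassandra also keeps the saved cluster 
name in system.local, so the rename has to be handled during that downtime):

```
# cassandra.yaml on each node in the DC being split off
cluster_name: 'TenantA Cluster'
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          # seeds now list only nodes inside this DC
          - seeds: "10.10.0.11,10.10.0.12,10.10.0.13"
```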

We have had success with this method, and if you only want to track the updates 
to feed into the new cluster, then this will work; however, if you want it for 
anything else then it doesn’t help at all.

I can supply more details later if this method is of interest. 

Thanks 

Paul Chandler 

> On 10 Apr 2019, at 22:52, Carl Mueller  
> wrote:
> 
> We have a multitenant cluster that we can't upgrade to 3.x easily, and we'd 
> like to migrate some apps off of the shared cluster to dedicated clusters.
> 
> This is a 2.2 cluster.
> 
> So I'm trying a trigger to track updates while we transition and will send 
> via kafka. Right now I'm just trying to extract all the data from the 
> incoming updates
> 
> so for 
> 
> public Collection augment(ByteBuffer key, ColumnFamily update) {
> 
> the names returned by the update.getColumnNames() for an update of a table 
> with two clustering columns and had a regular column update produced two 
> CellName/Cells: 
> 
> one has no name, and no apparent raw value (bytebuffer is empty)
> 
> the other is the data column. 
> 
> I can extract the primary key from the key field
> 
> But how do I get the values of the two clustering columns? They aren't listed 
> in the iterator, and they don't appear to be in the key field. Since 
> clustering columns are encoded into the name of a cell, I'd imagine there 
> might be some "unpacking" trick to that. 


-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



Re: All time blocked in nodetool tpstats

2019-04-11 Thread Paul Chandler
Hi Abdul,

That all depends on the cluster, so it really is best to experiment.

By adding more threads you will use more of the system resources, so before you 
start you need to know if there is spare capacity in the CPU usage and the disk 
throughput. If there is spare capacity then increase the threads in steps; I 
normally go in steps of 32, but that is based on the size of machines I 
normally work with. 
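
Concretely, that means stepping the thread pool settings up in cassandra.yaml 
and restarting each node; the values below are only an illustrative starting 
point for a read-heavy workload, not a recommendation for your cluster:

```
# cassandra.yaml
concurrent_reads: 96     # default is 32; raise in steps while watching CPU and disk
concurrent_writes: 128   # default is 32
```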

But as Anthony said, if it is a high read system, then it could easily be 
tombstones or garbage collection. 

Thanks 

Paul Chandler

> On 11 Apr 2019, at 03:57, Abdul Patel  wrote:
> 
> Do we have any recommendations on concurrent reads and writes settings?
> Mine is an 18 node, 3 DC cluster with 20 core CPUs
> 
> On Wednesday, April 10, 2019, Anthony Grasso  <mailto:anthony.gra...@gmail.com>> wrote:
> Hi Abdul,
> 
> Usually we get no noticeable improvement at tuning concurrent_reads and 
> concurrent_writes above 128. I generally try to keep concurrent_reads to no 
> higher than 64 and concurrent_writes to no higher than 128. Increasing the 
> values beyond that, you might start running into issues where the kernel IO 
> scheduler and/or the disk become saturated. As Paul mentioned, it will depend 
> on the size of your nodes though.
> 
> If the client is timing out, it is likely that the node that is selected as 
> the coordinator for the read has a resource contention somewhere. The root 
> cause is usually due to a number of things going on though. As Paul 
> mentioned, one of the issues could be the query design. It is worth 
> investigating if a particular read query is timing out.
> 
> I would also inspect the Cassandra logs and garbage collection logs on the 
> node where you are seeing the timeouts. The things to look out for is high 
> garbage collection frequency, long garbage collection pauses, and high 
> tombstone read warnings.
> 
> Regards,
> Anthony
> 
> On Thu, 11 Apr 2019 at 06:01, Abdul Patel  <mailto:abd786...@gmail.com>> wrote:
> Yes, the queries are all select queries, as it is more of a read intensive app.
> Last night I rebooted the cluster and today they are fine (I know it's temporary), 
> as I still see all time blocked values.
> I am thinking of increasing concurrent 
> 
> On Wednesday, April 10, 2019, Paul Chandler  <mailto:p...@redshots.com>> wrote:
> Hi Abdul,
> 
> When I have seen dropped messages, I normally double check to ensure the node 
> is not CPU bound. 
> 
> If you have a high CPU idle value, then it is likely that tuning the thread 
> counts will help.
> 
> I normally start with concurrent_reads and concurrent_writes, so in your case, 
> as reads are being dropped, increase concurrent_reads; I normally change 
> it to 96 to start with, but it will depend on the size of your nodes.
> 
> Otherwise it might be badly designed queries, have you investigated which 
> queries are producing the client timeouts?
> 
> Regards 
> 
> Paul Chandler 
> 
> 
> 
> > On 9 Apr 2019, at 18:58, Abdul Patel  > <mailto:abd786...@gmail.com>> wrote:
> > 
> > Hi,
> > 
> > My nodetool tpstats are showing all time blocked high numbers and also read 
> > dropped messages at 400.
> > Clients are experiencing high timeouts.
> > Checked a few online forums; they recommend increasing 
> > native_transport_max_threads.
> > As of now it's commented out with 128.
> > Is it advisable to increase this, and can this also fix the timeout issue?
> > 
> 
> 
> -
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org 
> <mailto:user-unsubscr...@cassandra.apache.org>
> For additional commands, e-mail: user-h...@cassandra.apache.org 
> <mailto:user-h...@cassandra.apache.org>
> 



Re: All time blocked in nodetool tpstats

2019-04-10 Thread Paul Chandler
Hi Abdul,

When I have seen dropped messages, I normally double check to ensure the node is 
not CPU bound. 

If you have a high CPU idle value, then it is likely that tuning the thread 
counts will help.

I normally start with concurrent_reads and concurrent_writes, so in your case, 
as reads are being dropped, increase concurrent_reads; I normally change it 
to 96 to start with, but it will depend on the size of your nodes.

Otherwise it might be badly designed queries, have you investigated which 
queries are producing the client timeouts?

Regards 

Paul Chandler 



> On 9 Apr 2019, at 18:58, Abdul Patel  wrote:
> 
> Hi,
> 
> My nodetool tpstats are showing all time blocked high numbers and also read 
> dropped messages at 400.
> Clients are experiencing high timeouts.
> Checked a few online forums; they recommend increasing 
> native_transport_max_threads.
> As of now it's commented out with 128.
> Is it advisable to increase this, and can this also fix the timeout issue?
> 


-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



Re: New user on Ubuntu 18.04 laptop, nodetest status throws NullPointerException

2019-04-03 Thread Paul Chandler
On further reading it does look like there may be a problem with your Java 
setup, as others are reporting this with Java 9 and above.

You could try the 3rd answer here and see if this helps: 
https://stackoverflow.com/questions/48193965/cassandra-nodetool-java-lang-nullpointerexception
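
If it is the JVM, one quick thing to try is forcing nodetool onto the JDK 8 
runtime, since the nodetool script will use $JAVA_HOME/bin/java when JAVA_HOME 
is set; the install path below is the usual Ubuntu location but may differ on 
your machine:

```
# confirm which JVM is currently the default
java -version

# run nodetool against OpenJDK 8 explicitly
JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 nodetool status
```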



> On 3 Apr 2019, at 16:55, David Taylor  wrote:
> 
> Hi Paul thanks for responding.
> 
> I created a ~/.cassandra directory and chmodded it to 777
> 
> in /var/log/cassandra/system.log the only non-INFO items are:
> WARN  [main] 2019-04-03 11:47:54,172 StartupChecks.java:136 - jemalloc shared 
> library could not be preloaded to speed up memory allocations
> WARN  [main] 2019-04-03 11:47:54,172 StartupChecks.java:169 - JMX is not 
> enabled to receive remote connections. Please see cassandra-env.sh for more 
> info.
> 
> Indeed, I meant nodetool, not nodetest.
> 
> Running nodetool status (or nodetool --help) results in the same stack trace 
> as before.
> 
> On Wed, Apr 3, 2019 at 11:34 AM Paul Chandler  <mailto:p...@redshots.com>> wrote:
> David,
> 
> When you start cassandra all the logs go to system.log normally in the 
> /var/log/cassandra directory, so you should look there once it has started, 
> to check everything is ok.
> 
> I assume you mean you ran nodetool status rather than nodetest.
> 
> The nodetool command stores a history of commands in the directory 
> ~/.cassandra, and from the stack trace you supply it looks like it is failing 
> to create that directory. So I would check the file system permissions there.
> 
> Thanks 
> 
> Paul Chandler
> 
> 
>> On 3 Apr 2019, at 15:15, David Taylor > <mailto:prooffrea...@gmail.com>> wrote:
>> 
>> I am running a System87 Oryx Pro laptop with Ubuntu 18.04
>> 
>> I had only Oracle Java 11 installed for Hadoop, so I also installed OpenJDK8 
>>  with:
>> $ sudo apt-get install openjdk-8-jre
>> and switched to it with
>> $ sudo update-java-alternatives --set 
>> path/shown/with/"update-java-alternatives --list"
>> 
>> $ java-version
>> openjdk version "1.8.0_191"
>> OpenJDK Runtime Environment (build 1.8.0_191-8u191-b12-2ubuntu0.18.04.1-b12)
>> OpenJDK 64-Bit Server VM (build 25.191-b12, mixed mode)
>> 
>> I installed Cassandra according to the directions on 
>> http://cassandra.apache.org/doc/latest/getting_started/installing.html, 
>> using the "Install from debian packages" instructions.
>> 
>> Now when I run
>> $ sudo service cassandra start
>> There are no errors, no feedback to stdout. I assume that's expected 
>> behavior?
>> 
>> However, this fails:
>> $ nodetest status
>> error: null
>> -- StackTrace --
>> java.lang.NullPointerException
>>  at 
>> org.apache.cassandra.config.DatabaseDescriptor.getDiskFailurePolicy(DatabaseDescriptor.java:1892)
>>  at 
>> org.apache.cassandra.utils.JVMStabilityInspector.inspectThrowable(JVMStabilityInspector.java:82)
>>  at org.apache.cassandra.io.util.FileUtils.(FileUtils.java:79)
>>  at 
>> org.apache.cassandra.utils.FBUtilities.getToolsOutputDirectory(FBUtilities.java:860)
>>  at org.apache.cassandra.tools.NodeTool.printHistory(NodeTool.java:200)
>>  at org.apache.cassandra.tools.NodeTool.main(NodeTool.java:168)
>> 
>> Can anyone help me fix this?
>> 
> 



Re: New user on Ubuntu 18.04 laptop, nodetest status throws NullPointerException

2019-04-03 Thread Paul Chandler
David,

When you start Cassandra, all the logs go to system.log, normally in the 
/var/log/cassandra directory, so you should look there once it has started to 
check everything is OK.

I assume you mean you ran nodetool status rather than nodetest.

The nodetool command stores a history of commands in the directory 
~/.cassandra, and from the stack trace you supply it looks like it is failing 
to create that directory. So I would check the file system permissions there.
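
A quick way to check, and if necessary fix, that directory is something like 
the following; the chown is only needed if the directory exists but ended up 
owned by root, for example from running nodetool under sudo:

```
ls -ld ~/.cassandra
mkdir -p ~/.cassandra
sudo chown -R "$USER":"$USER" ~/.cassandra
```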

Thanks 

Paul Chandler


> On 3 Apr 2019, at 15:15, David Taylor  wrote:
> 
> I am running a System87 Oryx Pro laptop with Ubuntu 18.04
> 
> I had only Oracle Java 11 installed for Hadoop, so I also installed OpenJDK8  
> with:
> $ sudo apt-get install openjdk-8-jre
> and switched to it with
> $ sudo update-java-alternatives --set 
> path/shown/with/"update-java-alternatives --list"
> 
> $ java-version
> openjdk version "1.8.0_191"
> OpenJDK Runtime Environment (build 1.8.0_191-8u191-b12-2ubuntu0.18.04.1-b12)
> OpenJDK 64-Bit Server VM (build 25.191-b12, mixed mode)
> 
> I installed Cassandra according to the directions on 
> http://cassandra.apache.org/doc/latest/getting_started/installing.html, 
> using the "Install from debian packages" instructions.
> 
> Now when I run
> $ sudo service cassandra start
> There are no errors, no feedback to stdout. I assume that's expected behavior?
> 
> However, this fails:
> $ nodetest status
> error: null
> -- StackTrace --
> java.lang.NullPointerException
>   at 
> org.apache.cassandra.config.DatabaseDescriptor.getDiskFailurePolicy(DatabaseDescriptor.java:1892)
>   at 
> org.apache.cassandra.utils.JVMStabilityInspector.inspectThrowable(JVMStabilityInspector.java:82)
>   at org.apache.cassandra.io.util.FileUtils.(FileUtils.java:79)
>   at 
> org.apache.cassandra.utils.FBUtilities.getToolsOutputDirectory(FBUtilities.java:860)
>   at org.apache.cassandra.tools.NodeTool.printHistory(NodeTool.java:200)
>   at org.apache.cassandra.tools.NodeTool.main(NodeTool.java:168)
> 
> Can anyone help me fix this?
> 



Re: Procedures for moving part of a C* cluster to a different datacenter

2019-04-03 Thread Paul Chandler
Saleil,

Are you performing any regular repairs on the existing cluster?

If you are, you could set this repair up on the Tampa cluster, then after all 
the applications have been switched to Tampa, wait for a complete repair cycle; 
then it will be safe to decommission Orlando. However, there could be missing 
data in Tampa until the repairs are completed. 
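
A minimal sketch of that cycle, using the DC names from this thread (a repair 
scheduler such as Reaper would normally drive the repair step rather than 
running it by hand):

```
# on each new node in Tampa, stream the existing data across from Orlando
nodetool rebuild -- Orlando

# after the applications have switched to Tampa, run a full repair, node by node
nodetool repair -pr --full
```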

If you are not performing any regular repairs, then you could already have data 
inconsistencies between the 2 existing clusters, so it won’t be any worse.

Having said that, we moved more than 50 clusters from the UK to Belgium, using 
a similar process, but we didn’t do any additional repairs apart from the ones 
performed by Opscenter, and we didn’t have any reports of missing data.

One thing I definitely would not do is have a “logical” datacenter in 
Cassandra actually span two different physical datacenters. If there is any 
connection issue between the datacenters, including long latencies, then any 
local_quorum may not be serviced, due to 2 replicas being in the inaccessible 
datacenter.

Finally, we quite often had problems at the rebuild stage, and needed different 
settings depending on the type of cluster. So be prepared to fail at that point 
and experiment with different settings, but the good thing about this process 
is the fact that you can rollback at any stage without affecting the original 
cluster.

Paul Chandler


> On 3 Apr 2019, at 10:46, Stefan Miklosovic 
>  wrote:
> 
> On Wed, 3 Apr 2019 at 18:38, Oleksandr Shulgin
> mailto:oleksandr.shul...@zalando.de>> wrote:
>> 
>> On Wed, Apr 3, 2019 at 12:28 AM Saleil Bhat (BLOOMBERG/ 731 LEX) 
>>  wrote:
>>> 
>>> 
>>> The standard procedure for doing this seems to be add a 3rd datacenter to 
>>> the cluster, stream data to the new datacenter via nodetool rebuild, then 
>>> decommission the old datacenter. A more detailed review of this procedure 
>>> can be found here:
> http://thelastpickle.com/blog/2019/02/26/data-center-switch.html
>>> 
>>> 
> 
> However, I see two problems with the above protocol. First, it requires 
> changes on the application layer because of the datacenter name change; e.g. 
> all applications referring to the datacenter ‘Orlando’ will now have to be 
> changed to refer to ‘Tampa’.
>> 
>> 
>> Alternatively, you may omit DC specification in the client and provide 
>> internal network addresses as the contact points.
> 
> I am afraid you are mixing two things together. I believe OP means
> that he has to change local dc in DCAwareRoundRobinPolicy. I am not
> sure what contact points have to do with that. If there is at least
> one contact point from DC nobody removes all should be fine.
> 
> The process in the article is right. Before transitioning to new DC
> one has to be sure that all writes and reads still target old dc too
> after you alter a keyspace and add new dc there so you dont miss any
> write when something goes south and you have to switch it back. Thats
> achieved by local_one / local_quorum and DCAwareRoundRobinPolicy with
> localDc pointing to the old one.
> 
> Then you do rebuild and you restart your app in such way that new DC
> will be in that policy so new writes and reads are going primarily to
> that DC and once all is fine you drop the old one (you can do maybe
> additional repair to be sure). I think the rolling restart of the app
> is inevitable but if services are in some kind of HA setup I dont see
> a problem with that. From outside it would look like there is not any
> downtime.
> 
> OP has a problem with repair on nodes and it is true that can be time
> consuming, even not doable, but there are workarounds around that and
> I do not want to go into here. You can speed this process
> significantly when you are smart about that and you repair in smaller
> chunks so you dont clog your cluster completely, its called subrange
> repair.
> 
>>> As such, I was wondering what peoples’ thoughts were on the following 
>>> alternative procedure:
>>> 1) Kill one node in the old datacenter
>>> 2) Add a new node in the new datacenter but indicate that it is to REPLACE 
>>> the one just shutdown; this node will bootstrap, and all the data which it 
>>> is supposed to be responsible for will be streamed to it
>> 
>> 
>> I don't think this is going to work.  First, I believe streaming for 
>> bootstrap or for replacing a node is DC-local, so the first node won't have 
>> any peers to stream from.  Even if it would stream from the remote DC, this 
>> single node will own 100% of the ring and will most likely die of the load 
>> well before it finishes streaming.
>> 
>> Regards,
>> --
>> Alex
>> 
> 
> -
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org 
> <mailto:user-unsubscr...@cassandra.apache.org>
> For additional commands, e-mail: user-h...@cassandra.apache.org 
> <mailto:user-h...@cassandra.apache.org>