RE: Assassinate fails

2019-04-04 Thread Kenneth Brotman
Alex,

According to this TLP article 
http://thelastpickle.com/blog/2018/09/18/assassinate.html :

Note that the LEFT status should stick around for 72 hours to ensure all nodes 
come to the consensus that the node has been removed. So please don’t rush 
things if that’s the case. Again, it’s only cosmetic.

If a gossip state will not forget a node that was removed from the cluster more 
than a week ago:

1. Log in to each node within the Cassandra cluster.
2. Download jmxterm on each node, if nodetool assassinate is not an option.
3. Run nodetool assassinate, or the unsafeAssassinateEndpoint command, 
   multiple times in quick succession.
   - I typically recommend running the command 3-5 times within 2 seconds.
   - I understand that sometimes the command takes time to return, so the "2 
     seconds" suggestion is less of a requirement than it is a mindset.
   - Also, sometimes 3-5 times isn't enough. In such cases, shoot for the 
     moon and try 20 assassination attempts in quick succession.

What we are trying to do is to create a flood of messages requesting all nodes 
completely forget there used to be an entry within the gossip state for the 
given IP address. If each node can prune its own gossip state and broadcast 
that to the rest of the nodes, we should eliminate any race conditions that may 
exist where at least one node still remembers the given IP address.

As soon as all nodes come to agreement that they don't remember the deprecated 
node, the cosmetic issue will no longer be a concern in any system.log, 
nodetool describecluster, or nodetool gossipinfo output.
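The flood described above can be sketched as a small script. This is only a 
sketch: the dead node's IP and the attempt count are placeholders, and 
nodetool is stubbed out when it is not on PATH so the loop can be dry-run 
anywhere.

```shell
#!/usr/bin/env bash
# Flood the cluster with assassinate requests for a dead node's gossip entry,
# per the advice above. DEAD_IP and ATTEMPTS are placeholders.
DEAD_IP="${1:-192.168.1.18}"
ATTEMPTS="${2:-5}"

# Dry-run stub so this sketch can run outside a Cassandra node.
if ! command -v nodetool >/dev/null 2>&1; then
  nodetool() { echo "[dry-run] nodetool $*"; }
fi

for i in $(seq 1 "$ATTEMPTS"); do
  # Each call asks the cluster to forget the gossip entry for DEAD_IP.
  nodetool assassinate "$DEAD_IP" &
done
wait  # all attempts fired within a couple of seconds, as recommended
```

Running the attempts in the background is what keeps them inside the "2 
seconds" window even when individual calls are slow to return.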





-Original Message-
From: Kenneth Brotman [mailto:kenbrot...@yahoo.com.INVALID] 
Sent: Thursday, April 04, 2019 10:40 AM
To: user@cassandra.apache.org
Subject: RE: Assassinate fails

Alex,

Did you remove the option JVM_OPTS="$JVM_OPTS 
-Dcassandra.replace_address=address_of_dead_node" after the node started and 
then restart the node again?

Are you sure there isn't a typo in the file?

Ken
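For reference (not from Ken's message): the option in question lives in 
conf/cassandra-env.sh and typically looks like the fragment below, with a 
placeholder IP. The usual guidance is to delete this line once the 
replacement node has finished bootstrapping, then restart the node.

```shell
# cassandra-env.sh fragment (sketch; the IP is a placeholder for the dead node)
JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address=192.168.1.18"
echo "$JVM_OPTS"  # confirm the flag is present before starting the node
```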


-Original Message-
From: Kenneth Brotman [mailto:kenbrot...@yahoo.com.INVALID] 
Sent: Thursday, April 04, 2019 10:31 AM
To: user@cassandra.apache.org
Subject: RE: Assassinate fails

I see; system_auth is a separate keyspace.

-Original Message-
From: Jon Haddad [mailto:j...@jonhaddad.com] 
Sent: Thursday, April 04, 2019 10:17 AM
To: user@cassandra.apache.org
Subject: Re: Assassinate fails

No, it can't.  As Alain (and I) have said, since the system keyspace
is local strategy, it's not replicated, and thus can't be repaired.

On Thu, Apr 4, 2019 at 9:54 AM Kenneth Brotman
 wrote:
>
> Right, could be similar issue, same type of fix though.
>
> -Original Message-
> From: Jon Haddad [mailto:j...@jonhaddad.com]
> Sent: Thursday, April 04, 2019 9:52 AM
> To: user@cassandra.apache.org
> Subject: Re: Assassinate fails
>
> System != system_auth.
>
> On Thu, Apr 4, 2019 at 9:43 AM Kenneth Brotman
>  wrote:
> >
> > From Mastering Cassandra:
> >
> >
> > Forcing read repairs at consistency – ALL
> >
> > The type of repair isn't really part of the Apache Cassandra repair 
> > paradigm at all. When it was discovered that a read repair will trigger 
> > 100% of the time when a query is run at ALL consistency, this method of 
> > repair started to gain popularity in the community. In some cases, this 
> > method of forcing data consistency provided better results than normal, 
> > scheduled repairs.
> >
> > Let's assume, for a second, that an application team is having a hard time 
> > logging into a node in a new data center. You try to cqlsh out to these 
> > nodes, and notice that you are also experiencing intermittent failures, 
> > leading you to suspect that the system_auth tables might be missing a 
> > replica or two. On one node you do manage to connect successfully using 
> > cqlsh. One quick way to fix consistency on the system_auth tables is to set 
> > consistency to ALL, and run an unbound SELECT on every table, tickling each 
> > record:
> >
> > use system_auth ;
> > consistency ALL;
> > consistency level set to ALL.
> >
> > SELECT COUNT(*) FROM resource_role_permissons_index ;
> > SELECT COUNT(*) FROM role_permissions ;
> > SELECT COUNT(*) FROM role_members ;
> > SELECT COUNT(*) FROM roles;
> >
> > This problem is often seen when logging in with the default cassandra user. 
> > Within cqlsh, there is code that forces the default cassandra user to 
> > connect by querying system_auth at QUORUM consistency. This can be 
> > problematic in larger clusters, and is another reason why you should never 
> > use the default cassandra user.
> >
> >
> >




RE: Assassinate fails

2019-04-04 Thread Kenneth Brotman

-Original Message-
From: Jon Haddad [mailto:j...@jonhaddad.com] 
Sent: Thursday, April 04, 2019 9:21 AM
To: user@cassandra.apache.org
Subject: Re: Assassinate fails

Ken,

Alain is right about the system tables.  What you're describing only
works on non-local tables.  Changing the CL doesn't help with
keyspaces that use LocalStrategy.  Here's the definition of the system
keyspace:

CREATE KEYSPACE system WITH replication = {'class': 'LocalStrategy'}
AND durable_writes = true;

Jon




RE: Assassinate fails

2019-04-04 Thread Kenneth Brotman
The trick below I got from the book Mastering Cassandra.  You have to set the 
consistency to ALL for it to work. I thought you guys knew that one.

 

From: Alain RODRIGUEZ [mailto:arodr...@gmail.com] 
Sent: Thursday, April 04, 2019 8:46 AM
To: user@cassandra.apache.org
Subject: Re: Assassinate fails

 

Hi Alex,

 

About the previous advice:

 

You might have inconsistent data in your system tables.  Try setting the 
consistency level to ALL, then do read query of system tables to force repair.

 

System tables use the 'LocalStrategy', thus I don't think any repair would 
happen for the system.* tables. Regardless the consistency you use. It should 
not harm, but I really think it won't help.
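One quick way to see which keyspaces are actually replicated (and therefore 
repairable) is to inspect each keyspace's replication strategy. A sketch, 
assuming cqlsh is available on a node; it is stubbed out otherwise so the 
snippet can be dry-run:

```shell
# Show each keyspace's replication strategy: anything using LocalStrategy is
# node-local and cannot be repaired across replicas. Falls back to a dry-run
# stub when no cqlsh is on PATH (e.g. outside a Cassandra node).
if ! command -v cqlsh >/dev/null 2>&1; then
  cqlsh() { echo "[dry-run] cqlsh $*"; }
fi

cqlsh -e "SELECT keyspace_name, replication FROM system_schema.keyspaces;"
```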

 



RE: Assassinate fails

2019-04-04 Thread Kenneth Brotman
Hi Alex,

 

You might have inconsistent data in your system tables.  Try setting the 
consistency level to ALL, then do read query of system tables to force repair.

 

Kenneth Brotman

 

From: Alex [mailto:m...@aca-o.com] 
Sent: Thursday, April 04, 2019 1:58 AM
To: user@cassandra.apache.org
Subject: Re: Assassinate fails

 

Hi Anthony,

Thanks for your help.

I tried to run it multiple times in quick succession but it fails with:

-- StackTrace --
java.lang.RuntimeException: Endpoint still alive: /192.168.1.18 generation 
changed while trying to assassinate it
at 
org.apache.cassandra.gms.Gossiper.assassinateEndpoint(Gossiper.java:592)

I can see that the generation number for this node increases by 1 every time I 
call nodetool assassinate; and the command itself waits for 30 seconds before 
assassinating the node. When run multiple times in quick succession, the 
command fails because the generation number has been changed by the previous 
instance.
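The race Alex describes can be modeled with a toy sketch. This is only an 
illustration of the generation check, not Cassandra's actual Gossiper code, 
and the generation value is made up:

```shell
# Toy model of the generation check behind the error above (not Cassandra code).
gen=5                      # current gossip generation for the dead node

begin_attempt() {          # calling assassinate bumps the generation...
  gen=$((gen + 1))
  eval "$1=\$gen"          # ...and records the value this attempt expects later
}

finish_attempt() {         # after the ~30 s safety wait, verify nothing changed
  if [ "$gen" -ne "$1" ]; then
    echo "Endpoint still alive: generation changed"
  else
    echo "assassinated"
  fi
}

begin_attempt first
finish_attempt "$first"    # a lone attempt passes the check

begin_attempt first
begin_attempt second       # a second attempt fires during the first one's wait
finish_attempt "$first"    # the first attempt now fails the check
```

In this model, any attempt that overlaps another attempt's 30-second wait 
trips the check, which is why back-to-back serial calls fail while a 
simultaneous flood from all nodes can still succeed.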

 

In 'nodetool gossipinfo', the node is marked as "LEFT" on every node.

However, in 'nodetool describecluster', this node is marked as "unreachable" 
on 3 nodes out of 5.

 

Alex

 

Le 04.04.2019 00:56, Anthony Grasso a écrit :

Hi Alex, 

 

We wrote a blog post on this topic late last year: 
http://thelastpickle.com/blog/2018/09/18/assassinate.html.

 

In short, you will need to run the assassinate command on each node 
simultaneously a number of times in quick succession. This will generate a 
number of messages requesting all nodes completely forget there used to be an 
entry within the gossip state for the given IP address.

 

Regards,

Anthony

 

On Thu, 4 Apr 2019 at 03:32, Alex  wrote:

Same result it seems:
Welcome to JMX terminal. Type "help" for available commands.
$>open localhost:7199
#Connection to localhost:7199 is opened
$>bean org.apache.cassandra.net:type=Gossiper
#bean is set to org.apache.cassandra.net:type=Gossiper
$>run unsafeAssassinateEndpoint 192.168.1.18
#calling operation unsafeAssassinateEndpoint of mbean 
org.apache.cassandra.net:type=Gossiper
#RuntimeMBeanException: java.lang.NullPointerException


There not much more to see in log files :
WARN  [RMI TCP Connection(10)-127.0.0.1] 2019-04-03 16:25:13,626 
Gossiper.java:575 - Assassinating /192.168.1.18 via gossip
INFO  [RMI TCP Connection(10)-127.0.0.1] 2019-04-03 16:25:13,627 
Gossiper.java:585 - Sleeping for 30000ms to ensure /192.168.1.18 does 
not change
INFO  [RMI TCP Connection(10)-127.0.0.1] 2019-04-03 16:25:43,628 
Gossiper.java:1029 - InetAddress /192.168.1.18 is now DOWN
INFO  [RMI TCP Connection(10)-127.0.0.1] 2019-04-03 16:25:43,631 
StorageService.java:2324 - Removing tokens [..] for /192.168.1.18




Le 03.04.2019 17:10, Nick Hatfield a écrit :
> Run assassinate the old way. It works very well...
> 
> wget -q -O jmxterm.jar
> http://downloads.sourceforge.net/cyclops-group/jmxterm-1.0-alpha-4-uber.jar
> 
> java -jar ./jmxterm.jar
> 
> $>open localhost:7199
> 
> $>bean org.apache.cassandra.net:type=Gossiper
> 
> $>run unsafeAssassinateEndpoint 192.168.1.18
> 
> $>quit
> 
> 
> Happy deleting
> 
> -Original Message-
> From: Alex [mailto:m...@aca-o.com]
> Sent: Wednesday, April 03, 2019 10:42 AM
> To: user@cassandra.apache.org
> Subject: Assassinate fails
> 
> Hello,
> 
> Short story:
> - I had to replace a dead node in my cluster
> - 1 week after, dead node is still seen as DN by 3 out of 5 nodes
> - dead node has null host_id
> - assassinate on dead node fails with error
> 
> How can I get rid of this dead node ?
> 
> 
> Long story:
> I had a 3 nodes cluster (Cassandra 3.9) ; one node went dead. I built
> a new node from scratch and "replaced" the dead node using the
> information from this page
> https://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsReplaceNode.html.
> It looked like the replacement went ok.
> 
> I added two more nodes to strengthen the cluster.
> 
> A few days have passed and the dead node is still visible and marked
> as "down" on 3 of 5 nodes in nodetool status:
> 
> --  Address       Load       Tokens  Owns (effective)  Host ID                               Rack
> UN  192.168.1.9   16 GiB     256     35.0%             76223d4c-9d9f-417f-be27-cebb791cddcc  rack1
> UN  192.168.1.12  16.09 GiB  256     34.0%             719601e2-54a6-440e-a379-c9cf2dc20564  rack1
> UN  192.168.1.14  14.16 GiB  256     32.6%             d8017a03-7e4e-47b7-89b9-cd9ec472d74f  rack1
> UN  192.168.1.17  15.4 GiB   256     34.1%             fa238b21-1db1-47dc-bfb7-beedc6c9967a  rack1
> DN  192.168.1.18  24.3 GiB   256     33.7%             null                                  rack1
> UN  192.168.1.22  19.06 GiB  256     30.7%             09d24557-4e98-44c3-8c9d-53c4c31066e1  rack1
> 
> Its host ID is null.

RE: Five Questions for Cassandra Users

2019-03-28 Thread Kenneth Brotman
Yes, absolutely!

 

From: Jonathan Koppenhofer [mailto:j...@koppedomain.com] 
Sent: Thursday, March 28, 2019 1:18 PM
To: user@cassandra.apache.org
Subject: Re: Five Questions for Cassandra Users

 

I think it would also be interesting to hear how people are handling automation 
(which to me is different than AI) and config management.

 

For us it is a combo of custom Java workflows and Saltstack.

 

On Thu, Mar 28, 2019, 5:03 AM Kenneth Brotman  
wrote:

I’m looking to get a better feel for how people use Cassandra in practice.  I 
thought others would benefit as well so may I ask you the following five 
questions:

 

1.   Do the same people where you work operate the cluster and write the 
code to develop the application?

 

2.   Do you have a metrics stack that allows you to see graphs of various 
metrics with all the nodes displayed together?

 

3.   Do you have a log stack that allows you to see the logs for all the 
nodes together?

 

4.   Do you regularly repair your clusters - such as by using Reaper?

 

5.   Do you use artificial intelligence to help manage your clusters?

 

 

Thank you for taking your time to share this information!

 

Kenneth Brotman






RE: Looking for feedback on automated root-cause system

2019-03-05 Thread Kenneth Brotman
I found their YouTube video, Machine Learning & The future of DevOps – An Intro 
to Vorstella: https://www.youtube.com/watch?v=YZ5_LAXvUUo

 

 

From: Kenneth Brotman [mailto:kenbrot...@yahoo.com.INVALID] 
Sent: Tuesday, March 05, 2019 11:50 AM
To: user@cassandra.apache.org
Subject: RE: Looking for feedback on automated root-cause system

 

You are the real deal. I know you’ve been a top notch person in the community 
for a long time.  Glad to hear that this is coming.  It’s very exciting!

 

From: Matthew Stump [mailto:mst...@vorstella.com] 
Sent: Tuesday, March 05, 2019 11:47 AM
To: user@cassandra.apache.org
Subject: Re: Looking for feedback on automated root-cause system

 

We probably will, that'll come soon-ish (a couple of weeks perhaps). Right now 
we're limited by who we can engage with in order to collect feedback.

 

On Tue, Mar 5, 2019 at 11:34 AM Kenneth Brotman  
wrote:

Simulators will never get you there.  Why don’t you let everyone plug in to the 
NOC in exchange for standard features at limited scale, and make some money on 
the big accounts where you can make the value proposition attractive anyway?  
You get the data you have to have, for free, and everyone’s Cassandra cluster 
gets smart!

 

 

From: Matthew Stump [mailto:mst...@vorstella.com] 
Sent: Tuesday, March 05, 2019 11:12 AM
To: user@cassandra.apache.org
Subject: Re: Looking for feedback on automated root-cause system

 

Getting people to send data to us can be a little bit of a PITA, but it's 
doable. We've got data from regulated/secure environments streaming in. None of 
the data we collect is a risk, but the default is to say no and you've got to 
overcome that barrier. We've been through the audit a bunch of times, it gets 
easier each time because everyone asks more or less the same questions and 
requires the same set of disclosures.

 

Cold start for AI is always an issue but we overcame it via two routes:

 

We had customers from a pre-existing line of business. We were probably the 
first ones to run production Cassandra workloads at scale in k8s. We funded the 
work behind the some of the initial blog posts and had to figure out most of 
the ins-and-outs of making it work. This data is good for helping to identify 
edge cases and bugs that you wouldn't normally encounter, but it's super noisy 
and you've got to do a lot to isolate and/or derive value from data in the 
beginning if you're attempting to do root cause.

 

Leveraging the above we built out an extensive simulations pipeline. It 
initially started as python scripts targeting k8s, but it's since been fully 
automated with Spinnaker.  We have a couple of simulations running all the time 
doing continuous integration with the models, collectors and pipeline code, but 
will burst out to a couple hundred clusters if we need to test something 
complicated. It takes just a couple of minutes to have it spin up hundreds of 
different load generators, targeting different versions of C*, running with 
different topologies, using clean disks or restoring from previous snapshots.

 

As the corpus grows, simulations matter less, and it's easier to get signal 
from noise in a customer cluster.

 

On Tue, Mar 5, 2019 at 10:15 AM Kenneth Brotman  
wrote:

Matt,

 

Do you anticipate having trouble getting clients to allow the collector to send 
data up to your NOC?  Wouldn’t a lot of companies be unable or uneasy about 
that?

 

Your ML can only work if it’s got LOTS of data from many different scenarios.  
How are you addressing that?  How are you able to get that much good quality 
data?

 

Kenneth Brotman

 

From: Kenneth Brotman [mailto:kenbrot...@yahoo.com] 
Sent: Tuesday, March 05, 2019 10:01 AM
To: 'user@cassandra.apache.org'
Subject: RE: Looking for feedback on automated root-cause system

 

I see they have a website now at https://vorstella.com/

 

 

From: Matt Stump [mailto:mrevilgn...@gmail.com] 
Sent: Friday, February 22, 2019 7:56 AM
To: user
Subject: Re: Looking for feedback on automated root-cause system

 

For some reason responses to the thread didn't hit my work email, I didn't see 
the responses until I check from my personal. 

 

The way that the system works is that we install a collector that pulls a bunch 
of metrics from each node and sends it up to our NOC every minute. We've got a 
bunch of stream processors that take this data and do a bunch of things with 
it. We've got some dumb ones that check for common misconfigurations, bugs, 
etc.; they also populate dashboards and a couple of minimal graphs. The more 
intelligent agents take a look at the metrics and they start generating a bunch 
of calculated/scaled metrics and events. If one of these triggers a threshold 
then we kick off the ML that does classification using the stored data to 
classify the root cause, and point you to the correct knowledge base article 
with remediation steps. Because we've got the cluster history we can identify a 
breach, and give you an SLA in about 1 minute.

RE: data modelling

2019-03-05 Thread Kenneth Brotman
You definitely don’t need a secondary index.  An MV might be the answer.  

 

How many tagids does a sensor have ?

Do you have to use a collection for tagids?

How many sensors would you expect to have a particular tagid?

Would you know the customerid and sensorid and be able to specify that in the 
query?

 

If you could have tagid not be a collection, and make it part of the primary 
key, that would help a lot.
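A minimal sketch of that approach, assuming the application can maintain a second table by hand (the table name `sensors_by_tagid` and the exact key layout are invented for illustration, not taken from this thread):

```cql
-- Inverted lookup table: one row per (tagid, customerid, sensorid).
-- The application inserts a row here whenever it adds a tag to a sensor,
-- and deletes the row when the tag is removed.
CREATE TABLE keyspace.sensors_by_tagid (
    tagid      bigint,
    customerid bigint,
    sensorid   bigint,
    PRIMARY KEY ((tagid), customerid, sensorid)
);

-- The CONTAINS query then becomes a single-partition read:
SELECT customerid, sensorid
FROM keyspace.sensors_by_tagid
WHERE tagid = 11358097;
```

Note the trade-off: the application must keep both tables in sync on the write path, since a materialized view cannot be keyed on an element of a collection.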

  

 

From: Kenneth Brotman [mailto:kenbrot...@yahoo.com.INVALID] 
Sent: Tuesday, March 05, 2019 4:33 PM
To: user@cassandra.apache.org
Subject: RE: data modelling

 

Hi Bobbie,

 

You’re not giving enough information to model the data.  With Cassandra it’s 
based on the queries you are going to need.  This link to Jeffrey Carpenter’s 
book, Cassandra the Definitive Guide, Chapter 5, which is on how to do data 
modeling for Cassandra, should be of help to you: 
https://books.google.com/books?id=uW-PDAAAQBAJ 

 

 

 

From: Bobbie Haynes [mailto:haynes30...@gmail.com] 
Sent: Tuesday, March 05, 2019 4:19 PM
To: user@cassandra.apache.org
Subject: data modelling

 

Hi 

   Could you help with modelling this use case?

 

   I have the table below. I will update the tagids column (a set<bigint>) based 
on the PK. I have created a secondary index on tagids to query like below:

 

Select * from keyspace.customer_sensor_tagids where tagids CONTAINS 11358097;

 

This query is doing a range scan because of the secondary index, and it is 
causing performance issues. 

 

If I create an MV on tagids, will I be able to query like the above? Please 
suggest a data model for this scenario. I appreciate your help on this.

---

---

example of Tagids for each row:-

   4608831, 608886, 608890, 609164, 615024, 679579, 814791, 830404, 71756, 
8538307, 9936868, 10883336, 10954034, 10958062, 10976553, 10976554, 10980255, 
11009971, 11043805, 11075379, 11078819, 11167844, 11358097, 11479340, 11481769, 
11481770, 11481771, 11481772, 11693597, 11709012, 12193230, 12421500, 12421516, 
12421781, 12422011, 12422368, 12422501, 12422512, 12422553, 12422555, 12423381, 
12423382

 

   
---

---
 

 

   CREATE TABLE keyspace.customer_sensor_tagids (

customerid bigint,

sensorid bigint,

XXX frozen,

XXX frozen,

XXX text,

XXX text,

XXX frozen,

XXX bigint,

XXX bigint,

XXX list>,

XXX frozen,

XXX boolean,

XXX bigint,

XXX list>,

XXX frozen,

XXX bigint,

XXX bigint,

XXX list>,

XXX list>,

XXX set>,

XXX set,

XXX set,

tagids set<bigint>,

XXX bigint,

XXX list>,

PRIMARY KEY ((customerid, sensorid))

) WITH bloom_filter_fp_chance = 0.01

AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}

AND comment = ''

AND compaction = {'class': 
'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 
'max_threshold': '32', 'min_threshold': '4'}

AND compression = {'chunk_length_in_kb': '64', 'class': 
'org.apache.cassandra.io.compress.LZ4Compressor'}

AND crc_check_chance = 1.0

AND dclocal_read_repair_chance = 0.1

AND default_time_to_live = 0

AND gc_grace_seconds = 864000

AND max_index_interval = 2048

AND memtable_flush_period_in_ms = 0

AND min_index_interval = 128

AND read_repair_chance = 0.0

AND speculative_retry = '99PERCENTILE';

CREATE INDEX XXX ON keyspace.customer_sensor_tagids (values(tagids));

CREATE INDEX XXX ON keyspace.customer_sensor_tagids (values(XXX));

CREATE INDEX XXX ON keyspace.customer_sensor_tagids (XXX);

CREATE INDEX XXX ON keyspace.customer_sensor_tagids (XXX);

CREATE INDEX XXX ON keyspace.customer_sensor_tagids (XXX);

CREATE INDEX XXX ON keyspace.customer_sensor_tagids (XXX);

CREATE INDEX XXX ON keyspace.customer_sensor_tagids (values(XXX));

CREATE INDEX XXX ON keyspace.customer_sensor_tagids (XXX);



RE: data modelling

2019-03-05 Thread Kenneth Brotman
Hi Bobbie,

 

You’re not giving enough information to model the data.  With Cassandra it’s 
based on the queries you are going to need.  This link to Jeffrey Carpenter’s 
book, Cassandra the Definitive Guide, Chapter 5, which is on how to do data 
modeling for Cassandra, should be of help to you: 
https://books.google.com/books?id=uW-PDAAAQBAJ 

 

 

 

 

From: Bobbie Haynes [mailto:haynes30...@gmail.com] 
Sent: Tuesday, March 05, 2019 4:19 PM
To: user@cassandra.apache.org
Subject: data modelling

 

Hi 

   Could you help with modelling this use case?

 

   I have the table below. I will update the tagids column (a set<bigint>) based 
on the PK. I have created a secondary index on tagids to query like below:

 

Select * from keyspace.customer_sensor_tagids where tagids CONTAINS 11358097;

 

This query is doing a range scan because of the secondary index, and it is 
causing performance issues. 

 

If I create an MV on tagids, will I be able to query like the above? Please 
suggest a data model for this scenario. I appreciate your help on this.

---

---

example of Tagids for each row:-

   4608831, 608886, 608890, 609164, 615024, 679579, 814791, 830404, 71756, 
8538307, 9936868, 10883336, 10954034, 10958062, 10976553, 10976554, 10980255, 
11009971, 11043805, 11075379, 11078819, 11167844, 11358097, 11479340, 11481769, 
11481770, 11481771, 11481772, 11693597, 11709012, 12193230, 12421500, 12421516, 
12421781, 12422011, 12422368, 12422501, 12422512, 12422553, 12422555, 12423381, 
12423382

 

   
---

---
 

 

   CREATE TABLE keyspace.customer_sensor_tagids (

customerid bigint,

sensorid bigint,

XXX frozen,

XXX frozen,

XXX text,

XXX text,

XXX frozen,

XXX bigint,

XXX bigint,

XXX list>,

XXX frozen,

XXX boolean,

XXX bigint,

XXX list>,

XXX frozen,

XXX bigint,

XXX bigint,

XXX list>,

XXX list>,

XXX set>,

XXX set,

XXX set,

tagids set<bigint>,

XXX bigint,

XXX list>,

PRIMARY KEY ((customerid, sensorid))

) WITH bloom_filter_fp_chance = 0.01

AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}

AND comment = ''

AND compaction = {'class': 
'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 
'max_threshold': '32', 'min_threshold': '4'}

AND compression = {'chunk_length_in_kb': '64', 'class': 
'org.apache.cassandra.io.compress.LZ4Compressor'}

AND crc_check_chance = 1.0

AND dclocal_read_repair_chance = 0.1

AND default_time_to_live = 0

AND gc_grace_seconds = 864000

AND max_index_interval = 2048

AND memtable_flush_period_in_ms = 0

AND min_index_interval = 128

AND read_repair_chance = 0.0

AND speculative_retry = '99PERCENTILE';

CREATE INDEX XXX ON keyspace.customer_sensor_tagids (values(tagids));

CREATE INDEX XXX ON keyspace.customer_sensor_tagids (values(XXX));

CREATE INDEX XXX ON keyspace.customer_sensor_tagids (XXX);

CREATE INDEX XXX ON keyspace.customer_sensor_tagids (XXX);

CREATE INDEX XXX ON keyspace.customer_sensor_tagids (XXX);

CREATE INDEX XXX ON keyspace.customer_sensor_tagids (XXX);

CREATE INDEX XXX ON keyspace.customer_sensor_tagids (values(XXX));

CREATE INDEX XXX ON keyspace.customer_sensor_tagids (XXX);



RE: AxonOps - Cassandra operational management tool

2019-03-05 Thread Kenneth Brotman
Hayato,

 

I agree with what you are addressing. I’ve always thought the big elephant in 
the room regarding Cassandra was that you had to use all these other tools, 
each of which requires updating and configuration changes, and that too much 
attention had to be paid to those tools instead of what you’re trying to 
accomplish; whereas, if it were addressed, it all could be centralized or 
internalized, and clearly that was quite doable.  

 

Questions regarding where things are at:

 

Are you using AxonOps in any of your clients Apache Cassandra production 
clusters?

 

What is the largest Cassandra cluster in which you use it?

 

Would you recommend NOT using AxonOps on production clusters for now or do you 
consider it safe to do so?

 

What is the largest Cassandra cluster you would recommend using AxonOps on?

 

Can it handle multi-cloud clusters?

 

Which clouds does it play nice with?

 

Is it good for use for on-prem nodes (or cloud only)?

 

Which versions of Cassandra does it play nice with?

 

Any rough idea when a download will be available?

 

Your blog post at https://digitalis.io/blog/apache-cassandra-management-tool/ 
provides a lot of answers already!  Really very promising!

 

Thanks,

 

Kenneth Brotman

 

 

 

From: AxonOps [mailto:axon...@digitalis.io] 
Sent: Sunday, March 03, 2019 7:51 AM
To: user@cassandra.apache.org
Subject: Re: AxonOps - Cassandra operational management tool

 

Hi Kenneth,

 

Thanks for your great feedback! We're not trying to be secretive, but just not 
amazing at promoting ourselves!

 

AxonOps was built by digitalis.io (https://digitalis.io), a company based in 
the UK providing consulting and managed services for Cassandra, Kafka and 
Spark. digitalis.io was founded 3 years ago by 2 ex-DataStax architects, but 
their experience with Cassandra predates their tenure at DataStax.

 

We have been looking after a lot of Cassandra clusters for our customers, but 
found ourselves spending more time maintaining monitoring and operational tools 
than Cassandra clusters themselves. The motivation was to build a management 
platform to make our lives easier. You can read my blog here - 
https://digitalis.io/blog/apache-cassandra-management-tool/

 

We have not yet created any videos but that's in our backlog so people can see 
AxonOps in action. No testimonials yet either since the customer of the product 
has been ourselves, and we only just released it to the public as a beta a few weeks 
ago. We've decided to share it for free to anybody using up to 6 nodes, as we 
see a lot of clusters out there within this range.

 

The only investment would be a minimum amount of your time to install it. We 
have made the installation process as easy as possible. Hopefully you will find 
it immensely quicker and easier than installing and configuring ELK, 
Prometheus, Grafana, Nagios, custom backups and repair scheduling. It has 
certainly made our lives easier for sure.

 

We are fully aware of the new features going into 4.0 and beyond. As mentioned 
earlier, we built this for ourselves - a product that does everything we want 
in one solution providing a single pane of glass. It's free and we're sharing 
this with you.

 

Enjoy!

 

Hayato Shimizu

 

 

On Sun, 3 Mar 2019 at 06:05, Kenneth Brotman  
wrote:

Sorry, Nitan was only making a comment about this post; the comments I’m 
making are directed to AxonOps.  

 

It appears we don’t have a name for anyone at AxonOps at all then!  You guys 
are going to need to be more open.

 

From: Kenneth Brotman [mailto:kenbrot...@yahoo.com.INVALID] 
Sent: Saturday, March 02, 2019 10:02 PM
To: user@cassandra.apache.org
Subject: RE: AxonOps - Cassandra operational management tool

 

Nitan,

 

A few thoughts:


Isn’t it a lot to expect folks to download, install and evaluate the product 
considering,

· You aren’t being very clear about who you are,

· You don’t have any videos demonstrating the product,

· You don’t provide any testimonials, 

· You have no case studies with repeatable results, ROI, etc.  All the 
normal stuff.

· What about testing?  No one knows how well tested it is.  Why would 
we download it?

 

Don’t forget that the open source Cassandra community is already addressing 
ways in which Cassandra itself will be able to do several of the things that 
you listed.  

 

How much added value are you providing with this product?  It’s up to you to 
make the case.  You’ll have to spend more time on the business side of things 
if you want to do any business.

 

Kenneth Brotman

 

From: AxonOps [mailto:axon...@digitalis.io] 
Sent: Saturday, March 02, 2019 3:47 AM
To: user@cassandra.apache.org
Subject: Re: AxonOps - Cassandra operational management tool

 

It's not an open source product but free up to 6 nodes for now. We're actively 
adding more features to it but it currently supports the following features:

 

- Metrics collection and dashboards

- Logs / events

RE: Looking for feedback on automated root-cause system

2019-03-05 Thread Kenneth Brotman
Matt,

 

Do you anticipate having trouble getting clients to allow the collector to send 
data up to your NOC?  Wouldn’t a lot of companies be unable or uneasy about 
that?

 

Your ML can only work if it’s got LOTS of data from many different scenarios.  
How are you addressing that?  How are you able to get that much good quality 
data?

 

Kenneth Brotman

 

From: Kenneth Brotman [mailto:kenbrot...@yahoo.com] 
Sent: Tuesday, March 05, 2019 10:01 AM
To: 'user@cassandra.apache.org'
Subject: RE: Looking for feedback on automated root-cause system

 

I see they have a website now at https://vorstella.com/

 

 

From: Matt Stump [mailto:mrevilgn...@gmail.com] 
Sent: Friday, February 22, 2019 7:56 AM
To: user
Subject: Re: Looking for feedback on automated root-cause system

 

For some reason responses to the thread didn't hit my work email; I didn't see 
the responses until I checked from my personal account. 

 

The way that the system works is that we install a collector that pulls a bunch 
of metrics from each node and sends it up to our NOC every minute. We've got a 
bunch of stream processors that take this data and do a bunch of things with 
it. We've got some dumb ones that check for common misconfigurations, bugs, 
etc.; they also populate dashboards and a couple of minimal graphs. The more 
intelligent agents take a look at the metrics and they start generating a bunch 
of calculated/scaled metrics and events. If one of these triggers a threshold 
then we kick off the ML that does classification using the stored data to 
classify the root cause, and point you to the correct knowledge base article 
with remediation steps. Because we've got the cluster history we can identify a 
breach, and give you an SLA in about 1 minute. The goal is to get you from 0 to 
resolution as quickly as possible. 
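As a rough illustration of the flow described above (all names and thresholds here are invented; the real system's interfaces are not public), the cheap threshold checks gate the expensive ML step:

```python
# Hypothetical sketch of the pipeline: collectors push per-node metrics every
# minute; inexpensive rule checks run first, and the classifier is invoked
# only when a derived metric crosses its threshold.

def process_minute(metrics, thresholds, classify):
    """Return the classifier's (root_cause, kb_article), or None if nothing triggered."""
    events = {name: value for name, value in metrics.items()
              if name in thresholds and value > thresholds[name]}
    if not events:
        return None          # no threshold breached -> no ML invocation
    return classify(events)  # classification over the stored cluster history

# Toy usage with a stub classifier:
result = process_minute(
    metrics={"pending_compactions": 250, "p99_read_ms": 4.0},
    thresholds={"pending_compactions": 100, "p99_read_ms": 50.0},
    classify=lambda ev: ("compaction_backlog", "kb/compaction-backlog"),
)
# result -> ("compaction_backlog", "kb/compaction-backlog")
```

The point of the gate is cost: dashboards and rule checks run on every sample, while classification runs only on the minority of minutes that breach a threshold.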

 

We're looking for feedback on the existing system, do these events make sense, 
do I need to beef up a knowledge base article, did it classify correctly, or is 
there some big bug that everyone is running into that needs to be publicized. 
We're also looking for where to go next, which models are going to make your 
life easier?

 

The system works for C*, Elastic and Kafka. We'll be doing some blog posts 
explaining in more detail how it works and some of the interesting things we've 
found. For example everything everyone thought they knew about Cassandra thread 
pool tuning is wrong, nobody really knows how to tune Kafka for large messages, 
or that there are major issues with the Kubernetes charts that people are using.

 

 

 

On Tue, Feb 19, 2019 at 4:40 PM Kenneth Brotman  
wrote:

Any information you can share on the inputs it needs/uses would be helpful.

 

Kenneth Brotman

 

From: daemeon reiydelle [mailto:daeme...@gmail.com] 
Sent: Tuesday, February 19, 2019 4:27 PM
To: user
Subject: Re: Looking for feedback on automated root-cause system

 

Welcome to the world of testing predictive analytics. I will pass this on to my 
folks at Accenture, know of a couple of C* clients we run, wondering what you 
had in mind?

 

 

Daemeon C.M. Reiydelle

email: daeme...@gmail.com

San Francisco 1.415.501.0198/London 44 020 8144 9872/Skype daemeon.c.mreiydelle

 

 

On Tue, Feb 19, 2019 at 3:35 PM Matthew Stump  wrote:

Howdy,

I’ve been engaged in the Cassandra user community for a long time, almost 8 
years, and have worked on hundreds of Cassandra deployments. One of the things 
I’ve noticed in myself and a lot of my peers that have done consulting, support 
or worked on really big deployments is that we get burnt out. We fight a lot of 
the same fires over and over again, and don’t get to work on new or interesting 
stuff. Also, what we do is really hard to transfer to other people because it’s 
based on experience. 

Over the past year my team and I have been working to overcome that gap, 
creating an assistant that’s able to scale some of this knowledge. We’ve got it 
to the point where it’s able to classify known root causes for an outage or an 
SLA breach in Cassandra with an accuracy greater than 90%. It can accurately 
diagnose bugs, data-modeling issues, or misuse of certain features and when it 
does give you specific remediation steps with links to knowledge base articles. 

 

We think we’ve seeded our database with enough root causes that it’ll catch the 
vast majority of issues but there is always the possibility that we’ll run into 
something previously unknown like CASSANDRA-11170 (one of the issues our system 
found in the wild).

We’re looking for feedback and would like to know if anyone is interested in 
giving the product a trial. The process would be a collaboration, where we both 
get to learn from each other and improve how we’re doing things.

Thanks,
Matt Stump



RE: Looking for feedback on automated root-cause system

2019-03-05 Thread Kenneth Brotman
I see they have a website now at https://vorstella.com/

 

 

From: Matt Stump [mailto:mrevilgn...@gmail.com] 
Sent: Friday, February 22, 2019 7:56 AM
To: user
Subject: Re: Looking for feedback on automated root-cause system

 

For some reason responses to the thread didn't hit my work email; I didn't see 
the responses until I checked from my personal account. 

 

The way that the system works is that we install a collector that pulls a bunch 
of metrics from each node and sends it up to our NOC every minute. We've got a 
bunch of stream processors that take this data and do a bunch of things with 
it. We've got some dumb ones that check for common misconfigurations, bugs, 
etc.; they also populate dashboards and a couple of minimal graphs. The more 
intelligent agents take a look at the metrics and they start generating a bunch 
of calculated/scaled metrics and events. If one of these triggers a threshold 
then we kick off the ML that does classification using the stored data to 
classify the root cause, and point you to the correct knowledge base article 
with remediation steps. Because we've got the cluster history we can identify a 
breach, and give you an SLA in about 1 minute. The goal is to get you from 0 to 
resolution as quickly as possible. 

 

We're looking for feedback on the existing system, do these events make sense, 
do I need to beef up a knowledge base article, did it classify correctly, or is 
there some big bug that everyone is running into that needs to be publicized. 
We're also looking for where to go next, which models are going to make your 
life easier?

 

The system works for C*, Elastic and Kafka. We'll be doing some blog posts 
explaining in more detail how it works and some of the interesting things we've 
found. For example everything everyone thought they knew about Cassandra thread 
pool tuning is wrong, nobody really knows how to tune Kafka for large messages, 
or that there are major issues with the Kubernetes charts that people are using.

 

 

 

On Tue, Feb 19, 2019 at 4:40 PM Kenneth Brotman  
wrote:

Any information you can share on the inputs it needs/uses would be helpful.

 

Kenneth Brotman

 

From: daemeon reiydelle [mailto:daeme...@gmail.com] 
Sent: Tuesday, February 19, 2019 4:27 PM
To: user
Subject: Re: Looking for feedback on automated root-cause system

 

Welcome to the world of testing predictive analytics. I will pass this on to my 
folks at Accenture, know of a couple of C* clients we run, wondering what you 
had in mind?

 

 

Daemeon C.M. Reiydelle

email: daeme...@gmail.com

San Francisco 1.415.501.0198/London 44 020 8144 9872/Skype daemeon.c.mreiydelle

 

 

On Tue, Feb 19, 2019 at 3:35 PM Matthew Stump  wrote:

Howdy,

I’ve been engaged in the Cassandra user community for a long time, almost 8 
years, and have worked on hundreds of Cassandra deployments. One of the things 
I’ve noticed in myself and a lot of my peers that have done consulting, support 
or worked on really big deployments is that we get burnt out. We fight a lot of 
the same fires over and over again, and don’t get to work on new or interesting 
stuff. Also, what we do is really hard to transfer to other people because it’s 
based on experience. 

Over the past year my team and I have been working to overcome that gap, 
creating an assistant that’s able to scale some of this knowledge. We’ve got it 
to the point where it’s able to classify known root causes for an outage or an 
SLA breach in Cassandra with an accuracy greater than 90%. It can accurately 
diagnose bugs, data-modeling issues, or misuse of certain features and when it 
does give you specific remediation steps with links to knowledge base articles. 

 

We think we’ve seeded our database with enough root causes that it’ll catch the 
vast majority of issues but there is always the possibility that we’ll run into 
something previously unknown like CASSANDRA-11170 (one of the issues our system 
found in the wild).

We’re looking for feedback and would like to know if anyone is interested in 
giving the product a trial. The process would be a collaboration, where we both 
get to learn from each other and improve how we’re doing things.

Thanks,
Matt Stump



RE: A Question About Hints

2019-03-04 Thread Kenneth Brotman
Since you are in the process of upgrading, I’d do nothing on the settings right 
now.  But if you wanted to do something on the settings in the meantime, based 
on my read of the information available, I’d maybe double the default settings. 
The upgrade will help a lot of things as you know.

 

Everyone really should move off of the 2.x versions just like you are doing.

 

From: shalom sagges [mailto:shalomsag...@gmail.com] 
Sent: Monday, March 04, 2019 12:34 PM
To: user@cassandra.apache.org
Subject: Re: A Question About Hints

 

See my comments inline. 

 

Do the 8 nodes clusters have the problem too?

Yes

 

To the same extent?  

It depends on the throughput, but basically the smaller clusters get low 
throughput, so the problem is naturally smaller. 

 

Is it any cluster across multi-DC’s?

Yes

 

Do all the clusters use nodes with similar specs?

All nodes have similar specs within a cluster but different specs on different 
clusters. 

 

The version of Cassandra you are on can make a difference.  What version are 
you on?

Currently I'm on various versions, 2.0.14, 2.1.15 and 3.0.12. In the process of 
upgrading to 3.11.4 

 

Did you see Edward Capriolo’s presentation at 26:19 into the YouTube video at: 
https://www.youtube.com/watch?v=uN4FtAjYmLU where he briefly mentions you can 
get into trouble if you go too fast or too slow?

I guess you can say it about almost any parameter you change :)

 

BTW, I thought the comments at the end of the article you mentioned were really 
good.

The entire article is very good, but I wonder if it's still valid since it was 
created around 4 years ago. 

 

Thanks!

 

 

 

 

On Mon, Mar 4, 2019 at 9:37 PM Kenneth Brotman  
wrote:

Makes sense.  If you have time and don’t mind, could you answer the following:

Do the 8 nodes clusters have the problem too? 

To the same extent?  

Is it just the clusters with the large node count? 

Is it any cluster across multi-DC’s?

Do all the clusters use nodes with similar specs?

 

The version of Cassandra you are on can make a difference.  What version are 
you on?

 

Did you see Edward Capriolo’s presentation at 26:19 into the YouTube video at: 
https://www.youtube.com/watch?v=uN4FtAjYmLU where he briefly mentions you can 
get into trouble if you go too fast or too slow?

BTW, I thought the comments at the end of the article you mentioned were really 
good.

 

 

 

From: shalom sagges [mailto:shalomsag...@gmail.com] 
Sent: Monday, March 04, 2019 11:04 AM
To: user@cassandra.apache.org
Subject: Re: A Question About Hints

 

It varies...

Some clusters have 48 nodes, others 24 nodes and some 8 nodes. 

Both settings are on default. 

 

I’d try making a single conservative change to one or the other, measure and 
reassess.  Then do same to other setting.

That's the plan, but I thought I might first get some valuable information from 
someone in the community that has already experienced in this type of change. 

 

Thanks!

 

On Mon, Mar 4, 2019 at 8:27 PM Kenneth Brotman  
wrote:

It sounds like your use case might be appropriate for tuning those two settings 
some. 

 

How many nodes are in the cluster?

Are both settings definitely on the default values currently?

 

I’d try making a single conservative change to one or the other, measure and 
reassess.  Then do same to other setting.

 

Then of course share your results with us.

 

From: shalom sagges [mailto:shalomsag...@gmail.com] 
Sent: Monday, March 04, 2019 9:54 AM
To: user@cassandra.apache.org
Subject: Re: A Question About Hints

 

Hi Kenneth, 

 

The concern is that in some cases, hints accumulate on nodes, and it takes a 
while until they are delivered (multi DCs). 

I see that whenever there are a lot of hints in play, like after a rolling 
restart, the cluster works harder. That's why I want to decrease the hints 
delivery time. 

I didn't want to change the configuration blindly and thought the community 
might have some experience on this subject. 

 

I went over the cassandra.yaml file but didn't find any information on 
optimizing these attributes, just that the max_throttle is divided between 
nodes in the cluster and that I should increase the max_hints_delivery_threads 
because I have multi-dc deployments.   

 

# Maximum throttle in KBs per second, per delivery thread.  This will be
# reduced proportionally to the number of nodes in the cluster.  (If there
# are two nodes in the cluster, each delivery thread will use the maximum
# rate; if there are three, each will throttle to half of the maximum,
# since we expect two nodes to be delivering hints simultaneously)
hinted_handoff_throttle_in_kb: 1024

# Number of threads with which to deliver hints;
# Consider increasing this number when you have multi-dc deployments, since
# cross-dc handoff tends to be slower
max_hints_delivery_threads: 2
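To make the proportional reduction concrete, here is a small sketch of the arithmetic implied by the yaml comment above (the formula is inferred from that comment, not read out of the Cassandra source):

```python
# Per the cassandra.yaml comment: the configured throttle is divided by the
# number of nodes that may be delivering hints to a single recipient at once,
# i.e. cluster_size - 1 (two nodes -> full rate, three nodes -> half rate).

def effective_hint_throttle_kb(hinted_handoff_throttle_in_kb, cluster_size):
    """Approximate per-thread hint delivery rate in KB/s for a given cluster size."""
    if cluster_size < 2:
        return hinted_handoff_throttle_in_kb  # no other nodes to share with
    return hinted_handoff_throttle_in_kb / (cluster_size - 1)

# With the default 1024 KB/s:
#   2-node cluster  -> 1024 KB/s per thread
#   3-node cluster  ->  512 KB/s per thread
#   48-node cluster -> ~21.8 KB/s per thread
```

This is why a 48-node cluster drains hints so slowly at the defaults: each thread is throttled to roughly 1024/47 ≈ 21.8 KB/s, so raising the throttle (or thread count) matters far more on large clusters.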

 

 

Thanks for your help!

 

 

On Mon, Mar 4, 2019 at 6:44 PM Kenneth Brotman  
wrote:

What is the concern?  Why are you looking there?  The cassandra.yaml file has 
some notes about it.  Did you read them?
RE: A Question About Hints

2019-03-04 Thread Kenneth Brotman
Makes sense.  If you have time and don’t mind, could you answer the following:

Do the 8 nodes clusters have the problem too? 

To the same extent?  

Is it just the clusters with the large node count? 

Is it any cluster across multi-DC’s?

Do all the clusters use nodes with similar specs?

 

The version of Cassandra you are on can make a difference.  What version are 
you on?

 

Did you see Edward Capriolo’s presentation at 26:19 into the YouTube video at: 
https://www.youtube.com/watch?v=uN4FtAjYmLU where he briefly mentions you can 
get into trouble if you go too fast or too slow?



BTW, I thought the comments at the end of the article you mentioned were really 
good.

 

 

 

From: shalom sagges [mailto:shalomsag...@gmail.com] 
Sent: Monday, March 04, 2019 11:04 AM
To: user@cassandra.apache.org
Subject: Re: A Question About Hints

 

It varies...

Some clusters have 48 nodes, others 24 nodes and some 8 nodes. 

Both settings are on default. 

 

I’d try making a single conservative change to one or the other, measure and 
reassess.  Then do same to other setting.

That's the plan, but I thought I might first get some valuable information from 
someone in the community that has already experienced in this type of change. 

 

Thanks!

 

On Mon, Mar 4, 2019 at 8:27 PM Kenneth Brotman  
wrote:

It sounds like your use case might be appropriate for tuning those two settings 
some. 

 

How many nodes are in the cluster?

Are both settings definitely on the default values currently?

 

I’d try making a single conservative change to one or the other, measure and 
reassess.  Then do same to other setting.

 

Then of course share your results with us.

 

From: shalom sagges [mailto:shalomsag...@gmail.com] 
Sent: Monday, March 04, 2019 9:54 AM
To: user@cassandra.apache.org
Subject: Re: A Question About Hints

 

Hi Kenneth, 

 

The concern is that in some cases, hints accumulate on nodes, and it takes a 
while until they are delivered (multi DCs). 

I see that whenever there are a lot of hints in play, like after a rolling 
restart, the cluster works harder. That's why I want to decrease the hints 
delivery time. 

I didn't want to change the configuration blindly and thought the community 
might have some experience on this subject. 

 

I went over the cassandra.yaml file but didn't find any information on 
optimizing these attributes, just that the max_throttle is divided between 
nodes in the cluster and that I should increase the max_hints_delivery_threads 
because I have multi-dc deployments.   

 

# Maximum throttle in KBs per second, per delivery thread.  This will be
# reduced proportionally to the number of nodes in the cluster.  (If there
# are two nodes in the cluster, each delivery thread will use the maximum
# rate; if there are three, each will throttle to half of the maximum,
# since we expect two nodes to be delivering hints simultaneously)
hinted_handoff_throttle_in_kb: 1024

# Number of threads with which to deliver hints;
# Consider increasing this number when you have multi-dc deployments, since
# cross-dc handoff tends to be slower
max_hints_delivery_threads: 2

 

 

Thanks for your help!

 

 

On Mon, Mar 4, 2019 at 6:44 PM Kenneth Brotman  
wrote:

What is the concern?  Why are you looking there?  The cassandra.yaml file has 
some notes about it.  Did you read them?

 

From: shalom sagges [mailto:shalomsag...@gmail.com] 
Sent: Monday, March 04, 2019 7:22 AM
To: user@cassandra.apache.org
Subject: A Question About Hints

 

Hi All,

 

Does anyone know what is the most optimal hints configuration (multiple DCs) in 
terms of 

max_hints_delivery_threads and hinted_handoff_throttle_in_kb? 

If it's different for various use cases, is there a rule of thumb I can work 
with?

 

I found this post but it's quite old:

http://www.uberobert.com/bandwidth-cassandra-hinted-handoff/

 

Thanks!



RE: A Question About Hints

2019-03-04 Thread Kenneth Brotman
It sounds like your use case might be appropriate for tuning those two settings 
some. 

 

How many nodes are in the cluster?

Are both settings definitely on the default values currently?

 

I’d try making a single conservative change to one or the other, measure and 
reassess.  Then do same to other setting.

 

Then of course share your results with us.

 

From: shalom sagges [mailto:shalomsag...@gmail.com] 
Sent: Monday, March 04, 2019 9:54 AM
To: user@cassandra.apache.org
Subject: Re: A Question About Hints

 

Hi Kenneth, 

 

The concern is that in some cases, hints accumulate on nodes, and it takes a 
while until they are delivered (multi DCs). 

I see that whenever there are a lot of hints in play, like after a rolling 
restart, the cluster works harder. That's why I want to decrease the hints 
delivery time. 

I didn't want to change the configuration blindly and thought the community 
might have some experience on this subject. 

 

I went over the cassandra.yaml file but didn't find any information on 
optimizing these attributes, just that the max_throttle is divided between 
nodes in the cluster and that I should increase the max_hints_delivery_threads 
because I have multi-dc deployments.   

 

# Maximum throttle in KBs per second, per delivery thread.  This will be
# reduced proportionally to the number of nodes in the cluster.  (If there
# are two nodes in the cluster, each delivery thread will use the maximum
# rate; if there are three, each will throttle to half of the maximum,
# since we expect two nodes to be delivering hints simultaneously.)
hinted_handoff_throttle_in_kb: 1024

# Number of threads with which to deliver hints;
# Consider increasing this number when you have multi-dc deployments, since
# cross-dc handoff tends to be slower
max_hints_delivery_threads: 2
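The yaml comment above can be made concrete: the configured throttle is divided among the other (n - 1) nodes that may be delivering hints to the same target. A small sketch of the arithmetic (the 9-node cluster size is a made-up example):

```shell
# Per the cassandra.yaml comment: with 2 nodes each thread gets the full
# rate, with 3 nodes half the rate, i.e. throttle / (n - 1).
THROTTLE_KB=1024   # hinted_handoff_throttle_in_kb
NODES=9            # hypothetical cluster size
EFFECTIVE_KB=$(( THROTTLE_KB / (NODES - 1) ))
echo "each delivery thread throttles to ${EFFECTIVE_KB} KB/s"
```

So on a larger cluster the per-thread rate can be far below the configured value, which is why raising the throttle (or the thread count) is often needed for multi-DC deployments.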

 

 

Thanks for your help!

 

 

On Mon, Mar 4, 2019 at 6:44 PM Kenneth Brotman  
wrote:

What is the concern?  Why are you looking there?  The cassandra.yaml file has 
some notes about it.  Did you read them?

 

From: shalom sagges [mailto:shalomsag...@gmail.com] 
Sent: Monday, March 04, 2019 7:22 AM
To: user@cassandra.apache.org
Subject: A Question About Hints

 

Hi All,

 

Does anyone know what is the most optimal hints configuration (multiple DCs) in 
terms of 

max_hints_delivery_threads and hinted_handoff_throttle_in_kb? 

If it's different for various use cases, is there a rule of thumb I can work 
with?

 

I found this post but it's quite old:

http://www.uberobert.com/bandwidth-cassandra-hinted-handoff/

 

Thanks!



RE: A Question About Hints

2019-03-04 Thread Kenneth Brotman
What is the concern?  Why are you looking there?  The cassandra.yaml file has 
some notes about it.  Did you read them?

 

From: shalom sagges [mailto:shalomsag...@gmail.com] 
Sent: Monday, March 04, 2019 7:22 AM
To: user@cassandra.apache.org
Subject: A Question About Hints

 

Hi All,

 

Does anyone know what is the most optimal hints configuration (multiple DCs) in 
terms of 

max_hints_delivery_threads and hinted_handoff_throttle_in_kb? 

If it's different for various use cases, is there a rule of thumb I can work 
with?

 

I found this post but it's quite old:

http://www.uberobert.com/bandwidth-cassandra-hinted-handoff/

 

Thanks!



RE: MV's stuck in build state

2019-03-04 Thread Kenneth Brotman
Hi Dipan,

 

What is the state of the dev and production clusters now?  

How big is the production cluster?

How many nodes on each cluster spin out of control?

 

On the production cluster, the data is 67 MB, so you'd have to use a value at
least twice that size for commitlog_segment_size_in_mb.  Of course you don't
want to leave it really high if you do change it.  

On the dev cluster the data is 18 MB, and you used a value well over twice that
size when you bumped commitlog_segment_size_in_mb to 128 MB, but in doing so
you are wasting a lot of memory capacity of course. You say it didn't fix the
problem on that cluster, so: is the message you are getting reflecting this
change by showing the limit to be 128 MB now, or is it still a different value?

 

Is there another problem too?  Perhaps you were running low on capacity on the
node.  Could low capacity be a problem on either cluster? 

 

 

From: Dipan Shah [mailto:dipan@hotmail.com] 
Sent: Monday, March 04, 2019 12:52 AM
To: Kenneth Brotman; user@cassandra.apache.org
Subject: Re: MV's stuck in build state

 

Hello Kenneth,

 

Apologies for the late reply.

 

1) On production the value of x was 67 MB and y was 16 MB, as the value of
commitlog_segment_size_in_mb is 32.

2) On Dev the value of x was 18 MB and y was 16 MB, as the value of
commitlog_segment_size_in_mb was 32 initially. I had bumped up the value of
commitlog_segment_size_in_mb to 128 when the node eventually crashed.

3) No I did not try org.apache.cassandra.db:type=CompactionManager but I did
try "nodetool stop" and "nodetool stop VIEW_BUILD".

 

Thanks,

Dipan Shah

  _____  

From: Kenneth Brotman 
Sent: Friday, March 1, 2019 8:19 PM
To: user@cassandra.apache.org
Subject: RE: MV's stuck in build state 

 

Dipan,

 

On your production cluster, when you were first getting the "Mutation of 
bytes ." message, what was the value of x and y?

How about when you got the message on the Dev Cluster, what was the value of
x and y in that message?

On the Dev cluster, did you try going into JMX and directly hitting the
org.apache.cassandra.db:type=CompactionManager mbean's stopCompaction
operation?

 

 

From: Dipan Shah [mailto:dipan@hotmail.com] 
Sent: Friday, March 01, 2019 12:56 AM
To: Kenneth Brotman; user@cassandra.apache.org
Subject: Re: MV's stuck in build state

 

Hello Kenneth,

 

Thanks for replying.

 

I had actually tried this on a Dev environment earlier and it caused the
node to spin out of control. I'll explain what I did over there:

 

1) Found "Mutation of  bytes is too large for the maximum size of "
and thus increased the value of "commitlog_segment_size_in_mb" to 64

2) This worked for a few minutes and again the view started failing when it
hit the new limits, and the messages now were "Mutation of  bytes is too
large for the maximum size of 2*"

3) So just to try I increased the value to 128

4) Now after this change the node started crashing as soon as I brought the
service online. I was not able to recover even after restoring the value of
"commitlog_segment_size_in_mb" to 32

 

Now there is a key difference between that issue and what I am facing currently:

 

The views were not dropped on the earlier environment, whereas I have already
dropped the view on the current environment (and can't experiment much as the
current environment is in production).

 

I know this is a bit tricky, but I'm pretty much stuck over here and am trying
to find a solution that doesn't create new problems.

 

Thanks,

Dipan Shah

  _  

From: Kenneth Brotman 
Sent: Friday, March 1, 2019 12:26 AM
To: user@cassandra.apache.org
Subject: RE: MV's stuck in build state 

 

Hi Dipan,

 

Did you try following the advice in the referenced DataStax article, "Mutation
of x bytes is too large for the maximum size of y"
(https://support.datastax.com/hc/en-us/articles/207267063-Mutation-of-x-bytes-is-too-large-for-the-maxiumum-size-of-y-),
as suggested in the stackoverflow.com post you cited?

 

Kenneth Brotman

 

From: Dipan Shah [mailto:dipan@hotmail.com] 
Sent: Thursday, February 28, 2019 2:23 AM
To: Dipan Shah; user@cassandra.apache.org
Subject: Re: MV's stuck in build state

 

Forgot to add version info. This is on 3.7.

 

[cqlsh 5.0.1 | Cassandra 3.7 | CQL spec 3.4.2 | Native protocol v4]

 

Thanks,

Dipan Shah

  _  

From: Dipan Shah 
Sent: Thursday, February 28, 2019 3:38 PM
To: user@cassandra.apache.org
Subject: MV's stuck in build state 

 

Hello All,

 

I have a few MV's that are stuck in build state because of a bad schema
design and thus getting a lot of messages like this "Mutation xxx is too
large for maximum size of 16.000MiB".

 



 

I have dropped those MV's and I can no longer see their schema in the
keyspace. But they are visible under "system.views_build_in_progress" and
"nodetool viewbuildstatus"

 

I have tried "nodetool stop VIEW_BUILD" as suggested here:
h

RE: AxonOps - Cassandra operational management tool

2019-03-02 Thread Kenneth Brotman
Sorry, Nitan was only making a comment about this post, but the comments I’m 
making are directed to AxonOps.  

 

It appears we don’t have a name for anyone at AxonOps at all then!  You guys 
are going to need to be more open.

 

From: Kenneth Brotman [mailto:kenbrot...@yahoo.com.INVALID] 
Sent: Saturday, March 02, 2019 10:02 PM
To: user@cassandra.apache.org
Subject: RE: AxonOps - Cassandra operational management tool

 

Nitan,

 

A few thoughts:


Isn’t it a lot to expect folks to download, install and evaluate the product 
considering,

· You aren’t being very clear about who you are,

· You don’t have any videos demonstrating the product,

· You don’t provide any testimonials, 

· You have no case studies with repeatable results, ROI, etc.  All the 
normal stuff.

· What about testing?  No one knows how well tested it is.  Why would 
we download it?

 

Don’t forget that the open source Cassandra community is already addressing 
ways in which Cassandra itself will be able to do several of the things that 
you listed.  

 

How much added value are you providing with this product?  It’s up to you to 
make the case.  You’ll have to spend more time on the business side of things 
if you want to do any business.

 

Kenneth Brotman

 

From: AxonOps [mailto:axon...@digitalis.io] 
Sent: Saturday, March 02, 2019 3:47 AM
To: user@cassandra.apache.org
Subject: Re: AxonOps - Cassandra operational management tool

 

It's not an open source product but free up to 6 nodes for now. We're actively 
adding more features to it but it currently supports the following features:

 

- Metrics collection and dashboards

- Logs / events collection and dashboards

- User configurable health checks

- Alerts / notifications integrations to Slack, Email, PagerDuty (and more to 
come)

- Cassandra backups and scheduling against local filesystem, AWS S3, Azure 
Storage, Google Cloud Storage (and we're adding more cloud vendors)

- Cassandra Adaptive Repair - repair speed is dynamically controlled based on 
the cluster performance

 

We'll be adding more Cassandra operational features as well as security 
enhancements.

 

The installation instructions are available from https://docs.axonops.com.

 

Let us know what you think.

 

AxonOps Team

 

 

 

 

 

On Thu, 14 Feb 2019 at 18:52, Nitan Kainth  wrote:

This is really cool!

 

will it be open source or licensed in near future?

 

On Thu, Feb 14, 2019 at 12:15 PM AxonOps  wrote:

Hi folks,

 

We are excited to announce AxonOps, an operational management tool for Apache 
Cassandra, is now ready for Beta testing.

 

We'd be interested to hear you try this and let us know what you think!

 

Please read the installation instructions on https://www.axonops.com

 

AxonOps Team



RE: AxonOps - Cassandra operational management tool

2019-03-02 Thread Kenneth Brotman
Nitan,

 

A few thoughts:


Isn’t it a lot to expect folks to download, install and evaluate the product 
considering,

· You aren’t being very clear about who you are,

· You don’t have any videos demonstrating the product,

· You don’t provide any testimonials, 

· You have no case studies with repeatable results, ROI, etc.  All the 
normal stuff.

· What about testing?  No one knows how well tested it is.  Why would 
we download it?

 

Don’t forget that the open source Cassandra community is already addressing 
ways in which Cassandra itself will be able to do several of the things that 
you listed.  

 

How much added value are you providing with this product?  It’s up to you to 
make the case.  You’ll have to spend more time on the business side of things 
if you want to do any business.

 

Kenneth Brotman

 

From: AxonOps [mailto:axon...@digitalis.io] 
Sent: Saturday, March 02, 2019 3:47 AM
To: user@cassandra.apache.org
Subject: Re: AxonOps - Cassandra operational management tool

 

It's not an open source product but free up to 6 nodes for now. We're actively 
adding more features to it but it currently supports the following features:

 

- Metrics collection and dashboards

- Logs / events collection and dashboards

- User configurable health checks

- Alerts / notifications integrations to Slack, Email, PagerDuty (and more to 
come)

- Cassandra backups and scheduling against local filesystem, AWS S3, Azure 
Storage, Google Cloud Storage (and we're adding more cloud vendors)

- Cassandra Adaptive Repair - repair speed is dynamically controlled based on 
the cluster performance

 

We'll be adding more Cassandra operational features as well as security 
enhancements.

 

The installation instructions are available from https://docs.axonops.com.

 

Let us know what you think.

 

AxonOps Team

 

 

 

 

 

On Thu, 14 Feb 2019 at 18:52, Nitan Kainth  wrote:

This is really cool!

 

will it be open source or licensed in near future?

 

On Thu, Feb 14, 2019 at 12:15 PM AxonOps  wrote:

Hi folks,

 

We are excited to announce AxonOps, an operational management tool for Apache 
Cassandra, is now ready for Beta testing.

 

We'd be interested to hear you try this and let us know what you think!

 

Please read the installation instructions on https://www.axonops.com

 

AxonOps Team



RE: MV's stuck in build state

2019-03-01 Thread Kenneth Brotman
Dipan,

 

On your production cluster, when you were first getting the "Mutation of 
bytes ." message, what was the value of x and y?

How about when you got the message on the Dev Cluster, what was the value of
x and y in that message?

On the Dev cluster, did you try going into JMX and directly hitting the
org.apache.cassandra.db:type=CompactionManager mbean's stopCompaction
operation?
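For reference, that JMX operation can also be driven from the shell with jmxterm when nodetool is unavailable. This is only a sketch; localhost:7199 (the default JMX port) and the jmxterm jar filename are assumptions to adjust for your environment:

```shell
# Commands jmxterm would run to invoke CompactionManager's stopCompaction
# for a view build.  Host/port and jar name below are assumptions.
JMX_CMDS='open localhost:7199
bean org.apache.cassandra.db:type=CompactionManager
run stopCompaction VIEW_BUILD
close'
echo "$JMX_CMDS"
# To actually run it (requires the jmxterm jar on the node):
# echo "$JMX_CMDS" | java -jar jmxterm-1.0.2-uber.jar -n
```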

 

 

From: Dipan Shah [mailto:dipan@hotmail.com] 
Sent: Friday, March 01, 2019 12:56 AM
To: Kenneth Brotman; user@cassandra.apache.org
Subject: Re: MV's stuck in build state

 

Hello Kenneth,

 

Thanks for replying.

 

I had actually tried this on a Dev environment earlier and it caused the
node to spin out of control. I'll explain what I did over there:

 

1) Found "Mutation of  bytes is too large for the maximum size of "
and thus increased the value of "commitlog_segment_size_in_mb" to 64

2) This worked for a few minutes and again the view started failing when it
hit the new limits, and the messages now were "Mutation of  bytes is too
large for the maximum size of 2*"

3) So just to try I increased the value to 128

4) Now after this change the node started crashing as soon as I brought the
service online. I was not able to recover even after restoring the value of
"commitlog_segment_size_in_mb" to 32

 

Now there is a key difference between that issue and what I am facing currently:

 

The views were not dropped on the earlier environment, whereas I have already
dropped the view on the current environment (and can't experiment much as the
current environment is in production).

 

I know this is a bit tricky, but I'm pretty much stuck over here and am trying
to find a solution that doesn't create new problems.

 

Thanks,

Dipan Shah

  _  

From: Kenneth Brotman 
Sent: Friday, March 1, 2019 12:26 AM
To: user@cassandra.apache.org
Subject: RE: MV's stuck in build state 

 

Hi Dipan,

 

Did you try following the advice in the referenced DataStax article, "Mutation
of x bytes is too large for the maximum size of y"
(https://support.datastax.com/hc/en-us/articles/207267063-Mutation-of-x-bytes-is-too-large-for-the-maxiumum-size-of-y-),
as suggested in the stackoverflow.com post you cited?

 

Kenneth Brotman

 

From: Dipan Shah [mailto:dipan@hotmail.com] 
Sent: Thursday, February 28, 2019 2:23 AM
To: Dipan Shah; user@cassandra.apache.org
Subject: Re: MV's stuck in build state

 

Forgot to add version info. This is on 3.7.

 

[cqlsh 5.0.1 | Cassandra 3.7 | CQL spec 3.4.2 | Native protocol v4]

 

Thanks,

Dipan Shah

  _  

From: Dipan Shah 
Sent: Thursday, February 28, 2019 3:38 PM
To: user@cassandra.apache.org
Subject: MV's stuck in build state 

 

Hello All,

 

I have a few MV's that are stuck in build state because of a bad schema
design and thus getting a lot of messages like this "Mutation xxx is too
large for maximum size of 16.000MiB".

 



 

I have dropped those MV's and I can no longer see their schema in the
keyspace. But they are visible under "system.views_build_in_progress" and
"nodetool viewbuildstatus".

 

I have tried "nodetool stop VIEW_BUILD" as suggested here:
https://stackoverflow.com/questions/40553499/stop-cassandra-materialized-view-build
and have also rebooted a few nodes in the cluster. This has also not helped.

 

Is there anything else that can be done over here?


 
<https://stackoverflow.com/questions/40553499/stop-cassandra-materialized-vi
ew-build> 

 
<https://stackoverflow.com/questions/40553499/stop-cassandra-materialized-view-build> Stop Cassandra Materialized View Build - Stack Overflow

It's not documented, but nodetool stop actually takes any compaction type,
not just the ones listed (which the view build is one of). So you can
simply: nodetool stop VIEW_BUILD. Or you can hit JMX directly with the
org.apache.cassandra.db:type=CompactionManager mbean's stopCompaction
operation. All that's really gonna do is set a flag for the view builder to
stop on its next loop.

stackoverflow.com

 

 

Thanks,

Dipan Shah



RE: MV's stuck in build state

2019-02-28 Thread Kenneth Brotman
Hi Dipan,

 

Did you try following the advice in the referenced DataStax article, "Mutation
of x bytes is too large for the maximum size of y"
(https://support.datastax.com/hc/en-us/articles/207267063-Mutation-of-x-bytes-is-too-large-for-the-maxiumum-size-of-y-),
as suggested in the stackoverflow.com post you cited?

 

Kenneth Brotman

 

From: Dipan Shah [mailto:dipan@hotmail.com] 
Sent: Thursday, February 28, 2019 2:23 AM
To: Dipan Shah; user@cassandra.apache.org
Subject: Re: MV's stuck in build state

 

Forgot to add version info. This is on 3.7.

 

[cqlsh 5.0.1 | Cassandra 3.7 | CQL spec 3.4.2 | Native protocol v4]

 

Thanks,

Dipan Shah

  _  

From: Dipan Shah 
Sent: Thursday, February 28, 2019 3:38 PM
To: user@cassandra.apache.org
Subject: MV's stuck in build state 

 

Hello All,

 

I have a few MV's that are stuck in build state because of a bad schema
design and thus getting a lot of messages like this "Mutation xxx is too
large for maximum size of 16.000MiB".

 



 

I have dropped those MV's and I can no longer see their schema in the
keyspace. But they are visible under "system.views_build_in_progress" and
"nodetool viewbuildstatus".

 

I have tried "nodetool stop VIEW_BUILD" as suggested here:
https://stackoverflow.com/questions/40553499/stop-cassandra-materialized-view-build
and have also rebooted a few nodes in the cluster. This has also not helped.

 

Is there anything else that can be done over here?


 
<https://stackoverflow.com/questions/40553499/stop-cassandra-materialized-vi
ew-build> 

 
<https://stackoverflow.com/questions/40553499/stop-cassandra-materialized-view-build> Stop Cassandra Materialized View Build - Stack Overflow

It's not documented, but nodetool stop actually takes any compaction type,
not just the ones listed (which the view build is one of). So you can
simply: nodetool stop VIEW_BUILD. Or you can hit JMX directly with the
org.apache.cassandra.db:type=CompactionManager mbean's stopCompaction
operation. All that's really gonna do is set a flag for the view builder to
stop on its next loop.

stackoverflow.com

 

 

Thanks,

Dipan Shah



RE: Upgrade 3.11.1 to 3.11.4

2019-02-28 Thread Kenneth Brotman
Hi John,

 

Was the cluster running ok before decommissioning the node? 

Why were you decommissioning the node?

Were you upgrading from 3.11.1 to 3.11.4?

 

 

From: Ioannis Zafiropoulos [mailto:john...@gmail.com] 
Sent: Wednesday, February 27, 2019 7:33 AM
To: user@cassandra.apache.org
Subject: Upgrade 3.11.1 to 3.11.4

 

Hi all,

 

During a decommission on a production cluster (9 nodes) we have some issues on 
the remaining nodes regarding compaction, and I have some questions about that:

 

One remaining node, which has stopped compacting due to some bug in 3.11.1, has 
received all the streaming files from the decommissioned node (decommissioning is 
still in progress for the rest of the cluster). Could I upgrade this node to 
3.11.4 and restart it?

 

Some other nodes which are still receiving files appear, from nodetool tpstats, to 
be doing very little to no auto-compaction. Should I wait for streaming to complete, 
or should I upgrade these nodes as well and restart them? What would happen if 
I bounce such a node? Will the whole process of decommissioning fail?

 

Do you recommend eventually doing a rolling upgrade to 3.11.4, or choosing another 
version?

 

Thanks in advance for your help,

John Zaf



RE: Insert constant value for all the rows for a given column

2019-02-27 Thread Kenneth Brotman
Here is the link to Presto: http://prestodb.github.io/  

 

Yes, it’s definitely a distributed SQL query engine for big data.

 

If you are expecting a lot of different values to come in, then you’re only 
trying to add a value to the existing records?  I don’t understand where the 
data is coming from.  This seems straightforward. I must be missing some aspect 
to it still.

 

From: Goutham reddy [mailto:goutham.chiru...@gmail.com] 
Sent: Wednesday, February 27, 2019 5:19 PM
To: user@cassandra.apache.org
Subject: Re: Insert constant value for all the rows for a given column

 

The value is constant as of now, but we are expecting a lot of different values 
to come in the future.

 

Secondly, is Presto a distributed system ?

 

Thanks and Regards,

Goutham.

 

On Wed, Feb 27, 2019 at 5:09 PM Kenneth Brotman  
wrote:

If you know the value already, why do you need to store it in every row of a 
table?  Seems like something is wrong.  Why do you need to do that, if you can 
share that information?

 

From: Kenneth Brotman [mailto:kenbrot...@yahoo.com] 
Sent: Wednesday, February 27, 2019 5:08 PM
To: 'user@cassandra.apache.org'
Subject: RE: Insert constant value for all the rows for a given column

 

Yup

 

From: Goutham reddy [mailto:goutham.chiru...@gmail.com] 
Sent: Wednesday, February 27, 2019 5:06 PM
To: user@cassandra.apache.org
Subject: Re: Insert constant value for all the rows for a given column

 

Are you talking about the SQL engine Presto ;)

 

On Wed, Feb 27, 2019 at 4:59 PM Kenneth Brotman  
wrote:

Who are Presto?

 

From: Goutham reddy [mailto:goutham.chiru...@gmail.com] 
Sent: Wednesday, February 27, 2019 4:52 PM
To: user@cassandra.apache.org
Subject: Re: Insert constant value for all the rows for a given column

 

Thanks Kenneth, writing Spark application is our last option and we are looking 
out for some hack way to update the column.

 

On Wed, Feb 27, 2019 at 4:48 PM Kenneth Brotman  
wrote:

Ouch!  I was sure I saw it in some material I studied but no.  It looks like 
you have to provide the value before Cassandra, maybe through your application 
or something in the stream before Cassandra, or add it after Cassandra or use 
something like Spark to process it.

 

Kenneth Brotman

 

From: Goutham reddy [mailto:goutham.chiru...@gmail.com] 
Sent: Wednesday, February 27, 2019 4:32 PM
To: user@cassandra.apache.org
Subject: Re: Insert constant value for all the rows for a given column

 

True Bharath, there is no option to add a default value in Cassandra for a 
column.


Regards

Goutham Reddy

 

 

On Wed, Feb 27, 2019 at 4:31 PM kumar bharath  
wrote:

Kenneth, 

 

I didn't see any such option  for adding a default value on a column.

 

Thanks,

Bharath Kumar B

 

On Wed, Feb 27, 2019 at 4:25 PM Kenneth Brotman  
wrote:

How about using a default value on the column?

 

From: Kenneth Brotman [mailto:kenbrot...@yahoo.com.INVALID] 
Sent: Wednesday, February 27, 2019 4:23 PM
To: user@cassandra.apache.org
Subject: RE: Insert constant value for all the rows for a given column

 

Good point.

 

From: Goutham reddy [mailto:goutham.chiru...@gmail.com] 
Sent: Wednesday, February 27, 2019 4:11 PM
To: user@cassandra.apache.org
Subject: Re: Insert constant value for all the rows for a given column

 

Kenneth,

I believe "static column" applies for one partition key. Correct me if my 
understanding is wrong.
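Goutham's point is right: a static column holds one value per partition, shared by all of that partition's rows, not a table-wide default. A hypothetical cqlsh sketch (keyspace, table, and column names are made up; this needs a live cluster to run):

```shell
# site_code is stored once per sensor_id partition and shared by all rows
# in that partition -- it is NOT a default for the whole table.
cqlsh -e "
CREATE TABLE demo.readings (
  sensor_id    text,
  reading_time timestamp,
  value        double,
  site_code    text STATIC,
  PRIMARY KEY (sensor_id, reading_time)
);"
```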

 

Regards

Goutham Reddy

 

 

On Wed, Feb 27, 2019 at 4:08 PM Kenneth Brotman  
wrote:

Sounds like what’s called a “static column”.

 

From: Goutham reddy [mailto:goutham.chiru...@gmail.com] 

Sent: Wednesday, February 27, 2019 4:06 PM
To: u...@cassandra.apache.org <mailto:user@cassandra.apache.org> 
Subject: Insert constant value for all the rows for a given column

 

Hi,

We have a requirement to add constant value to all the rows for a particular 
column, and we could not find any solution. Can anybody provide standard 
procedure for the problem. Appreciate your help.

 

Regards

Goutham Reddy

-- 

Regards

Goutham Reddy

-- 

Regards

Goutham Reddy

-- 

Regards

Goutham Reddy



RE: Insert constant value for all the rows for a given column

2019-02-27 Thread Kenneth Brotman
If you know the value already, why do you need to store it in every row of a 
table?  Seems like something is wrong.  Why do you need to do that, if you can 
share that information?

 

From: Kenneth Brotman [mailto:kenbrot...@yahoo.com] 
Sent: Wednesday, February 27, 2019 5:08 PM
To: 'user@cassandra.apache.org'
Subject: RE: Insert constant value for all the rows for a given column

 

Yup

 

From: Goutham reddy [mailto:goutham.chiru...@gmail.com] 
Sent: Wednesday, February 27, 2019 5:06 PM
To: user@cassandra.apache.org
Subject: Re: Insert constant value for all the rows for a given column

 

Are you talking about the SQL engine Presto ;)

 

On Wed, Feb 27, 2019 at 4:59 PM Kenneth Brotman  
wrote:

Who are Presto?

 

From: Goutham reddy [mailto:goutham.chiru...@gmail.com] 
Sent: Wednesday, February 27, 2019 4:52 PM
To: user@cassandra.apache.org
Subject: Re: Insert constant value for all the rows for a given column

 

Thanks Kenneth, writing Spark application is our last option and we are looking 
out for some hack way to update the column.

 

On Wed, Feb 27, 2019 at 4:48 PM Kenneth Brotman  
wrote:

Ouch!  I was sure I saw it in some material I studied but no.  It looks like 
you have to provide the value before Cassandra, maybe through your application 
or something in the stream before Cassandra, or add it after Cassandra or use 
something like Spark to process it.

 

Kenneth Brotman

 

From: Goutham reddy [mailto:goutham.chiru...@gmail.com] 
Sent: Wednesday, February 27, 2019 4:32 PM
To: user@cassandra.apache.org
Subject: Re: Insert constant value for all the rows for a given column

 

True Bharath, there is no option to add a default value in Cassandra for a 
column.


Regards

Goutham Reddy

 

 

On Wed, Feb 27, 2019 at 4:31 PM kumar bharath  
wrote:

Kenneth, 

 

I didn't see any such option  for adding a default value on a column.

 

Thanks,

Bharath Kumar B

 

On Wed, Feb 27, 2019 at 4:25 PM Kenneth Brotman  
wrote:

How about using a default value on the column?

 

From: Kenneth Brotman [mailto:kenbrot...@yahoo.com.INVALID] 
Sent: Wednesday, February 27, 2019 4:23 PM
To: user@cassandra.apache.org
Subject: RE: Insert constant value for all the rows for a given column

 

Good point.

 

From: Goutham reddy [mailto:goutham.chiru...@gmail.com] 
Sent: Wednesday, February 27, 2019 4:11 PM
To: user@cassandra.apache.org
Subject: Re: Insert constant value for all the rows for a given column

 

Kenneth,

I believe "static column" applies for one partition key. Correct me if my 
understanding is wrong.

 

Regards

Goutham Reddy

 

 

On Wed, Feb 27, 2019 at 4:08 PM Kenneth Brotman  
wrote:

Sounds like what’s called a “static column”.

 

From: Goutham reddy [mailto:goutham.chiru...@gmail.com] 

Sent: Wednesday, February 27, 2019 4:06 PM
To: u...@cassandra.apache.org <mailto:user@cassandra.apache.org> 
Subject: Insert constant value for all the rows for a given column

 

Hi,

We have a requirement to add constant value to all the rows for a particular 
column, and we could not find any solution. Can anybody provide standard 
procedure for the problem. Appreciate your help.

 

Regards

Goutham Reddy

-- 

Regards

Goutham Reddy

-- 

Regards

Goutham Reddy



RE: Insert constant value for all the rows for a given column

2019-02-27 Thread Kenneth Brotman
I meant to say, what about using Presto?

 

From: Kenneth Brotman [mailto:kenbrot...@yahoo.com.INVALID] 
Sent: Wednesday, February 27, 2019 4:59 PM
To: user@cassandra.apache.org
Subject: RE: Insert constant value for all the rows for a given column

 

Who are Presto?

 

From: Goutham reddy [mailto:goutham.chiru...@gmail.com] 
Sent: Wednesday, February 27, 2019 4:52 PM
To: user@cassandra.apache.org
Subject: Re: Insert constant value for all the rows for a given column

 

Thanks Kenneth, writing Spark application is our last option and we are looking 
out for some hack way to update the column.

 

On Wed, Feb 27, 2019 at 4:48 PM Kenneth Brotman  
wrote:

Ouch!  I was sure I saw it in some material I studied but no.  It looks like 
you have to provide the value before Cassandra, maybe through your application 
or something in the stream before Cassandra, or add it after Cassandra or use 
something like Spark to process it.

 

Kenneth Brotman

 

From: Goutham reddy [mailto:goutham.chiru...@gmail.com] 
Sent: Wednesday, February 27, 2019 4:32 PM
To: user@cassandra.apache.org
Subject: Re: Insert constant value for all the rows for a given column

 

True Bharath, there is no option to add a default value in Cassandra for a 
column.


Regards

Goutham Reddy

 

 

On Wed, Feb 27, 2019 at 4:31 PM kumar bharath  
wrote:

Kenneth, 

 

I didn't see any such option  for adding a default value on a column.

 

Thanks,

Bharath Kumar B

 

On Wed, Feb 27, 2019 at 4:25 PM Kenneth Brotman  
wrote:

How about using a default value on the column?

 

From: Kenneth Brotman [mailto:kenbrot...@yahoo.com.INVALID] 
Sent: Wednesday, February 27, 2019 4:23 PM
To: user@cassandra.apache.org
Subject: RE: Insert constant value for all the rows for a given column

 

Good point.

 

From: Goutham reddy [mailto:goutham.chiru...@gmail.com] 
Sent: Wednesday, February 27, 2019 4:11 PM
To: user@cassandra.apache.org
Subject: Re: Insert constant value for all the rows for a given column

 

Kenneth,

I believe "static column" applies for one partition key. Correct me if my 
understanding is wrong.

 

Regards

Goutham Reddy

 

 

On Wed, Feb 27, 2019 at 4:08 PM Kenneth Brotman  
wrote:

Sounds like what’s called a “static column”.

 

From: Goutham reddy [mailto:goutham.chiru...@gmail.com] 

Sent: Wednesday, February 27, 2019 4:06 PM
To: u...@cassandra.apache.org <mailto:user@cassandra.apache.org> 
Subject: Insert constant value for all the rows for a given column

 

Hi,

We have a requirement to add constant value to all the rows for a particular 
column, and we could not find any solution. Can anybody provide standard 
procedure for the problem. Appreciate your help.

 

Regards

Goutham Reddy

-- 

Regards

Goutham Reddy



RE: Insert constant value for all the rows for a given column

2019-02-27 Thread Kenneth Brotman
Yup

 

From: Goutham reddy [mailto:goutham.chiru...@gmail.com] 
Sent: Wednesday, February 27, 2019 5:06 PM
To: user@cassandra.apache.org
Subject: Re: Insert constant value for all the rows for a given column

 

Are you talking about the SQL engine Presto ;)

 

On Wed, Feb 27, 2019 at 4:59 PM Kenneth Brotman  
wrote:

Who are Presto?

 

From: Goutham reddy [mailto:goutham.chiru...@gmail.com] 
Sent: Wednesday, February 27, 2019 4:52 PM
To: user@cassandra.apache.org
Subject: Re: Insert constant value for all the rows for a given column

 

Thanks Kenneth, writing Spark application is our last option and we are looking 
out for some hack way to update the column.

 

On Wed, Feb 27, 2019 at 4:48 PM Kenneth Brotman  
wrote:

Ouch!  I was sure I saw it in some material I studied but no.  It looks like 
you have to provide the value before Cassandra, maybe through your application 
or something in the stream before Cassandra, or add it after Cassandra or use 
something like Spark to process it.

 

Kenneth Brotman

 

From: Goutham reddy [mailto:goutham.chiru...@gmail.com] 
Sent: Wednesday, February 27, 2019 4:32 PM
To: user@cassandra.apache.org
Subject: Re: Insert constant value for all the rows for a given column

 

True Bharath, there is no option to add a default value in Cassandra for a 
column.


Regards

Goutham Reddy

 

 

On Wed, Feb 27, 2019 at 4:31 PM kumar bharath  
wrote:

Kenneth, 

 

I didn't see any such option  for adding a default value on a column.

 

Thanks,

Bharath Kumar B

 

On Wed, Feb 27, 2019 at 4:25 PM Kenneth Brotman  
wrote:

How about using a default value on the column?

 

From: Kenneth Brotman [mailto:kenbrot...@yahoo.com.INVALID] 
Sent: Wednesday, February 27, 2019 4:23 PM
To: user@cassandra.apache.org
Subject: RE: Insert constant value for all the rows for a given column

 

Good point.

 

From: Goutham reddy [mailto:goutham.chiru...@gmail.com] 
Sent: Wednesday, February 27, 2019 4:11 PM
To: user@cassandra.apache.org
Subject: Re: Insert constant value for all the rows for a given column

 

Kenneth,

I believe "static column" applies for one partition key. Correct me if my 
understanding is wrong.

 

Regards

Goutham Reddy

 

 

On Wed, Feb 27, 2019 at 4:08 PM Kenneth Brotman  
wrote:

Sounds like what’s called a “static column”.

 

From: Goutham reddy [mailto:goutham.chiru...@gmail.com] 

Sent: Wednesday, February 27, 2019 4:06 PM
To: user@cassandra.apache.org 
Subject: Insert constant value for all the rows for a given column

 

Hi,

We have a requirement to add constant value to all the rows for a particular 
column, and we could not find any solution. Can anybody provide standard 
procedure for the problem. Appreciate your help.

 

Regards

Goutham Reddy

-- 

Regards

Goutham Reddy

-- 

Regards

Goutham Reddy



RE: Insert constant value for all the rows for a given column

2019-02-27 Thread Kenneth Brotman
Who is Presto?

 

From: Goutham reddy [mailto:goutham.chiru...@gmail.com] 
Sent: Wednesday, February 27, 2019 4:52 PM
To: user@cassandra.apache.org
Subject: Re: Insert constant value for all the rows for a given column

 

Thanks Kenneth, writing Spark application is our last option and we are looking 
out for some hack way to update the column.

 

On Wed, Feb 27, 2019 at 4:48 PM Kenneth Brotman  
wrote:

Ouch!  I was sure I saw it in some material I studied but no.  It looks like 
you have to provide the value before Cassandra, maybe through your application 
or something in the stream before Cassandra, or add it after Cassandra or use 
something like Spark to process it.

 

Kenneth Brotman

 

From: Goutham reddy [mailto:goutham.chiru...@gmail.com] 
Sent: Wednesday, February 27, 2019 4:32 PM
To: user@cassandra.apache.org
Subject: Re: Insert constant value for all the rows for a given column

 

True Bharath, there is no option to add a default value in Cassandra for a 
column.


Regards

Goutham Reddy

 

 

On Wed, Feb 27, 2019 at 4:31 PM kumar bharath  
wrote:

Kenneth, 

 

I didn't see any such option  for adding a default value on a column.

 

Thanks,

Bharath Kumar B

 

On Wed, Feb 27, 2019 at 4:25 PM Kenneth Brotman  
wrote:

How about using a default value on the column?

 

From: Kenneth Brotman [mailto:kenbrot...@yahoo.com.INVALID] 
Sent: Wednesday, February 27, 2019 4:23 PM
To: user@cassandra.apache.org
Subject: RE: Insert constant value for all the rows for a given column

 

Good point.

 

From: Goutham reddy [mailto:goutham.chiru...@gmail.com] 
Sent: Wednesday, February 27, 2019 4:11 PM
To: user@cassandra.apache.org
Subject: Re: Insert constant value for all the rows for a given column

 

Kenneth,

I believe "static column" applies for one partition key. Correct me if my 
understanding is wrong.

 

Regards

Goutham Reddy

 

 

On Wed, Feb 27, 2019 at 4:08 PM Kenneth Brotman  
wrote:

Sounds like what’s called a “static column”.

 

From: Goutham reddy [mailto:goutham.chiru...@gmail.com] 

Sent: Wednesday, February 27, 2019 4:06 PM
To: user@cassandra.apache.org 
Subject: Insert constant value for all the rows for a given column

 

Hi,

We have a requirement to add constant value to all the rows for a particular 
column, and we could not find any solution. Can anybody provide standard 
procedure for the problem. Appreciate your help.

 

Regards

Goutham Reddy

-- 

Regards

Goutham Reddy



RE: Insert constant value for all the rows for a given column

2019-02-27 Thread Kenneth Brotman
Ouch!  I was sure I saw it in some material I studied but no.  It looks like 
you have to provide the value before Cassandra, maybe through your application 
or something in the stream before Cassandra, or add it after Cassandra or use 
something like Spark to process it.

 

Kenneth Brotman

 

From: Goutham reddy [mailto:goutham.chiru...@gmail.com] 
Sent: Wednesday, February 27, 2019 4:32 PM
To: user@cassandra.apache.org
Subject: Re: Insert constant value for all the rows for a given column

 

True Bharath, there is no option to add a default value in Cassandra for a 
column.


Regards

Goutham Reddy

 

 

On Wed, Feb 27, 2019 at 4:31 PM kumar bharath  
wrote:

Kenneth, 

 

I didn't see any such option  for adding a default value on a column.

 

Thanks,

Bharath Kumar B

 

On Wed, Feb 27, 2019 at 4:25 PM Kenneth Brotman  
wrote:

How about using a default value on the column?

 

From: Kenneth Brotman [mailto:kenbrot...@yahoo.com.INVALID] 
Sent: Wednesday, February 27, 2019 4:23 PM
To: user@cassandra.apache.org
Subject: RE: Insert constant value for all the rows for a given column

 

Good point.

 

From: Goutham reddy [mailto:goutham.chiru...@gmail.com] 
Sent: Wednesday, February 27, 2019 4:11 PM
To: user@cassandra.apache.org
Subject: Re: Insert constant value for all the rows for a given column

 

Kenneth,

I believe "static column" applies for one partition key. Correct me if my 
understanding is wrong.

 

Regards

Goutham Reddy

 

 

On Wed, Feb 27, 2019 at 4:08 PM Kenneth Brotman  
wrote:

Sounds like what’s called a “static column”.

 

From: Goutham reddy [mailto:goutham.chiru...@gmail.com] 
Sent: Wednesday, February 27, 2019 4:06 PM
To: user@cassandra.apache.org 
Subject: Insert constant value for all the rows for a given column

 

Hi,

We have a requirement to add constant value to all the rows for a particular 
column, and we could not find any solution. Can anybody provide standard 
procedure for the problem. Appreciate your help.

 

Regards

Goutham Reddy



RE: Insert constant value for all the rows for a given column

2019-02-27 Thread Kenneth Brotman
How about using a default value on the column?

 

From: Kenneth Brotman [mailto:kenbrot...@yahoo.com.INVALID] 
Sent: Wednesday, February 27, 2019 4:23 PM
To: user@cassandra.apache.org
Subject: RE: Insert constant value for all the rows for a given column

 

Good point.

 

From: Goutham reddy [mailto:goutham.chiru...@gmail.com] 
Sent: Wednesday, February 27, 2019 4:11 PM
To: user@cassandra.apache.org
Subject: Re: Insert constant value for all the rows for a given column

 

Kenneth,

I believe "static column" applies for one partition key. Correct me if my 
understanding is wrong.

 

Regards

Goutham Reddy

 

 

On Wed, Feb 27, 2019 at 4:08 PM Kenneth Brotman  
wrote:

Sounds like what’s called a “static column”.

 

From: Goutham reddy [mailto:goutham.chiru...@gmail.com] 
Sent: Wednesday, February 27, 2019 4:06 PM
To: user@cassandra.apache.org
Subject: Insert constant value for all the rows for a given column

 

Hi,

We have a requirement to add constant value to all the rows for a particular 
column, and we could not find any solution. Can anybody provide standard 
procedure for the problem. Appreciate your help.

 

Regards

Goutham Reddy



RE: Insert constant value for all the rows for a given column

2019-02-27 Thread Kenneth Brotman
Good point.

 

From: Goutham reddy [mailto:goutham.chiru...@gmail.com] 
Sent: Wednesday, February 27, 2019 4:11 PM
To: user@cassandra.apache.org
Subject: Re: Insert constant value for all the rows for a given column

 

Kenneth,

I believe "static column" applies for one partition key. Correct me if my 
understanding is wrong.

 

Regards

Goutham Reddy

 

 

On Wed, Feb 27, 2019 at 4:08 PM Kenneth Brotman  
wrote:

Sounds like what’s called a “static column”.

 

From: Goutham reddy [mailto:goutham.chiru...@gmail.com] 
Sent: Wednesday, February 27, 2019 4:06 PM
To: user@cassandra.apache.org
Subject: Insert constant value for all the rows for a given column

 

Hi,

We have a requirement to add constant value to all the rows for a particular 
column, and we could not find any solution. Can anybody provide standard 
procedure for the problem. Appreciate your help.

 

Regards

Goutham Reddy



RE: Disable Truststore CA check for internode_encryption

2019-02-27 Thread Kenneth Brotman
Hello,

 

Why would you want to do that?  

 

From: Jai Bheemsen Rao Dhanwada [mailto:jaibheem...@gmail.com] 
Sent: Wednesday, February 27, 2019 3:57 PM
To: user@cassandra.apache.org
Subject: Disable Truststore CA check for internode_encryption

 

Hello,

 

Is it possible to disable truststore CA check for the cassandra 
internode_encyrption? if yes, is there a config property to do that?



RE: Insert constant value for all the rows for a given column

2019-02-27 Thread Kenneth Brotman
Here’s a DataStax article called Sharing a static column: 
https://docs.datastax.com/en/cql/3.3/cql/cql_using/refStaticCol.html
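A minimal static-column sketch for reference (keyspace/table names are hypothetical); note that the shared value is per partition, not table-wide:

```cql
-- A static column is stored once per partition; every row in that
-- partition sees the same value (hypothetical table for illustration).
CREATE TABLE demo.orders (
    customer_id uuid,
    order_id    timeuuid,
    balance     decimal STATIC,   -- one value per customer_id partition
    amount      decimal,
    PRIMARY KEY (customer_id, order_id)
);

-- Setting the static column once updates it for all rows of the partition:
UPDATE demo.orders SET balance = 100.00
WHERE customer_id = 1b4e28ba-2fa1-11d2-883f-0016d3cca427;
```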

 

 

From: Kenneth Brotman [mailto:kenbrot...@yahoo.com] 
Sent: Wednesday, February 27, 2019 4:08 PM
To: 'user@cassandra.apache.org'
Subject: RE: Insert constant value for all the rows for a given column

 

Sounds like what’s called a “static column”.

 

From: Goutham reddy [mailto:goutham.chiru...@gmail.com] 
Sent: Wednesday, February 27, 2019 4:06 PM
To: user@cassandra.apache.org
Subject: Insert constant value for all the rows for a given column

 

Hi,

We have a requirement to add constant value to all the rows for a particular 
column, and we could not find any solution. Can anybody provide standard 
procedure for the problem. Appreciate your help.

 

Regards

Goutham Reddy



RE: Insert constant value for all the rows for a given column

2019-02-27 Thread Kenneth Brotman
Sounds like what’s called a “static column”.

 

From: Goutham reddy [mailto:goutham.chiru...@gmail.com] 
Sent: Wednesday, February 27, 2019 4:06 PM
To: user@cassandra.apache.org
Subject: Insert constant value for all the rows for a given column

 

Hi,

We have a requirement to add constant value to all the rows for a particular 
column, and we could not find any solution. Can anybody provide standard 
procedure for the problem. Appreciate your help.

 

Regards

Goutham Reddy



RE: Determine disc space that will be freed after expansion cleanup

2019-02-25 Thread Kenneth Brotman
A real quick way to get an idea might be to run nodetool status and look at the 
imbalance of the data on each node assuming all the nodes have the same specs.

 

From: Cameron Gandevia [mailto:cameron.gande...@globalrelay.net] 
Sent: Monday, February 25, 2019 3:00 PM
To: user@cassandra.apache.org
Subject: Determine disc space that will be freed after expansion cleanup

 

Hi

Some Cassandra nodes could have rows that are associated with tokens that 
aren't owned by those nodes anymore as a result of expansion; this data will 
remain until a cleanup compaction is run.

We would like to know the best way to calculate (or closely estimate) the amount 
of data that is essentially dead data on each node, to determine how much disk 
space will be freed once the expansion is complete. One possible approach we 
considered is to identify the rows no longer owned by each node and their size 
by scanning the sstables.

 

 



RE: Feedback wanted for Knowledge base for all things cassandra (cassandra.link)

2019-02-25 Thread Kenneth Brotman
Hi Rahul!

 

A truly outstanding effort for the community.  I think it should be tied to a 
complete what’s known as a “Body of Knowledge” which is the best way to 
organize it as a learning resource.  Otherwise, you are just trying to 
out-Google Google.  Lots of luck.  They will always have superior search and A.I.  
You’d be just looking up subject matter sources on Google to try to keep your 
data current and complete.

 

But, if you seek to describe the competencies of various roles related to 
Cassandra – role based is best structure and describing competencies is the 
next step flowing from the body of knowledge - then you have a specialized 
resource for all learners of Cassandra at all levels of competence and for all 
roles related to Cassandra.

 

Regarding the resource directories, for several reasons all resource 
directories should show:

  Type of medium: e.g. video, book, article, course, etc.

  Date of production

  Author/Presenter

  Publisher/Producer/Event

  

Thank you for the continuing effort you have made on this project Rahul!

 

Kenneth Brotman

 

From: Rahul Singh [mailto:rahul.xavier.si...@gmail.com] 
Sent: Monday, February 25, 2019 7:05 AM
To: user
Subject: Feedback wanted for Knowledge base for all things cassandra 
(cassandra.link)

 

Folks, 


I've been scrounging time to work on a knowledge resource for all things 
Cassandra ( Cassandra, DSE, Scylla, YugaByte, Elassandra)

I feel like the Cassandra core community still has the most knowledge even 
though people are fragmenting into their brands. 

 

Would love to get your feedback on what you guys would want as a go to resource 
for Cassandra development, administration, architecture, etc. resources. 


MVP  1

https://anant.github.io/awesome-cassandra 

MVP  2

https://cassandra.netlify.com/ 

MVP  3

https://leaves-search.netlify.com/documents.html#/q=*:* 

 

Each of these were iterated with feedback from the community, so would love to 
get your feedback to make it better. 

 

Up next is to add the RSS feeds from the major Cassandra folks like on 
https://cassandra.alteroot.org 

 

Thanks for your feedback in advance. 


  



RE: Cassandra config in table

2019-02-25 Thread Kenneth Brotman
Hi Abdul,

 

system.local I believe has the info you would want.  Here is a link about 
querying the system keyspace: 
https://docs.datastax.com/en/cql/3.3/cql/cql_using/useQuerySystem.html
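For example, a few of the node-level values exposed by the system keyspace (on 3.x; as an aside, 4.0+ also adds virtual tables such as system_views.settings for the yaml settings themselves):

```cql
-- Runtime/topology values for the local node:
SELECT cluster_name, release_version, partitioner, data_center, rack
FROM system.local;
```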

 

Kenneth Brotman 

 

From: Abdul Patel [mailto:abd786...@gmail.com] 
Sent: Monday, February 25, 2019 7:23 AM
To: User@cassandra.apache.org
Subject: Cassandra config in table

 

Do we have any system table which stores all the config details which we have in 
the yaml or cassandra-env.sh? 



RE: Looking for feedback on automated root-cause system

2019-02-24 Thread Kenneth Brotman
Sounds like a promising step forward.  I’d certainly like to know when the blog 
posts are up. 

 

Kenneth Brotman

 

From: Matt Stump [mailto:mrevilgn...@gmail.com] 
Sent: Friday, February 22, 2019 7:56 AM
To: user
Subject: Re: Looking for feedback on automated root-cause system

 

For some reason responses to the thread didn't hit my work email, I didn't see 
the responses until I check from my personal. 

 

The way that the system works is that we install a collector that pulls a bunch 
of metrics from each node and sends it up to our NOC every minute. We've got a 
bunch of stream processors that take this data and do a bunch of things with 
it. We've got some dumb ones that check for common misconfigurations, bugs, 
etc.; they also populate dashboards and a couple of minimal graphs. The more 
intelligent agents take a look at the metrics and they start generating a bunch 
of calculated/scaled metrics and events. If one of these triggers a threshold 
then we kick off the ML that does classification using the stored data to 
classify the root cause, and point you to the correct knowledge base article 
with remediation steps. Because we've got the cluster history we can identify a 
breach, and give you an SLA in about 1 minute. The goal is to get you from 0 to 
resolution as quickly as possible. 

 

We're looking for feedback on the existing system, do these events make sense, 
do I need to beef up a knowledge base article, did it classify correctly, or is 
there some big bug that everyone is running into that needs to be publicized. 
We're also looking for where to go next, which models are going to make your 
life easier?

 

The system works for C*, Elastic and Kafka. We'll be doing some blog posts 
explaining in more detail how it works and some of the interesting things we've 
found. For example: everything everyone thought they knew about Cassandra thread 
pool tuning is wrong, nobody really knows how to tune Kafka for large messages, 
and there are major issues with the Kubernetes charts that people are using.

 

 

 

On Tue, Feb 19, 2019 at 4:40 PM Kenneth Brotman  
wrote:

Any information you can share on the inputs it needs/uses would be helpful.

 

Kenneth Brotman

 

From: daemeon reiydelle [mailto:daeme...@gmail.com] 
Sent: Tuesday, February 19, 2019 4:27 PM
To: user
Subject: Re: Looking for feedback on automated root-cause system

 

Welcome to the world of testing predictive analytics. I will pass this on to my 
folks at Accenture, know of a couple of C* clients we run, wondering what you 
had in mind?

 

 

Daemeon C.M. Reiydelle

email: daeme...@gmail.com

San Francisco 1.415.501.0198/London 44 020 8144 9872/Skype daemeon.c.mreiydelle

 

 

On Tue, Feb 19, 2019 at 3:35 PM Matthew Stump  wrote:

Howdy,

I’ve been engaged in the Cassandra user community for a long time, almost 8 
years, and have worked on hundreds of Cassandra deployments. One of the things 
I’ve noticed in myself and a lot of my peers that have done consulting, support 
or worked on really big deployments is that we get burnt out. We fight a lot of 
the same fires over and over again, and don’t get to work on new or interesting 
stuff Also, what we do is really hard to transfer to other people because it’s 
based on experience. 

Over the past year my team and I have been working to overcome that gap, 
creating an assistant that’s able to scale some of this knowledge. We’ve got it 
to the point where it’s able to classify known root causes for an outage or an 
SLA breach in Cassandra with an accuracy greater than 90%. It can accurately 
diagnose bugs, data-modeling issues, or misuse of certain features and when it 
does give you specific remediation steps with links to knowledge base articles. 

 

We think we’ve seeded our database with enough root causes that it’ll catch the 
vast majority of issues but there is always the possibility that we’ll run into 
something previously unknown like CASSANDRA-11170 (one of the issues our system 
found in the wild).

We’re looking for feedback and would like to know if anyone is interested in 
giving the product a trial. The process would be a collaboration, where we both 
get to learn from each other and improve how we’re doing things.

Thanks,
Matt Stump



RE: tombstones threshold warning

2019-02-24 Thread Kenneth Brotman
, "deletion_info" : { "marked_deleted" : 
"2019-02-21T21:34:20.024605Z", "local_delete_time" : "2019-02-21T21:34:20Z" } },

  { "name" : "events_list", "deletion_info" : { "marked_deleted" : 
"2019-02-21T21:34:20.024605Z", "local_delete_time" : "2019-02-21T21:34:20Z" } },

  { "name" : "frozen_race_list", "deletion_info" : { "marked_deleted" : 
"2019-02-21T21:34:20.024605Z", "local_delete_time" : "2019-02-21T21:34:20Z" } },

  { "name" : "teams_map", "deletion_info" : { "marked_deleted" : 
"2019-02-21T21:34:20.024605Z", "local_delete_time" : "2019-02-21T21:34:20Z" } },

  { "name" : "teams_set", "deletion_info" : { "marked_deleted" : 
"2019-02-21T21:34:20.024605Z", "local_delete_time" : "2019-02-21T21:34:20Z" } }

 

cassandra@cqlsh:dev_ticket> select * from collsndudt where id = 
cb07baad-eac8-4f65-b28a-bddc06a0de23;

 id                                   | lastname | basics_udt | events_list | frozen_race | frozen_race_list | teams_map | teams_set
--------------------------------------+----------+------------+-------------+-------------+------------------+-----------+-----------
 cb07baad-eac8-4f65-b28a-bddc06a0de23 |    ADAMS |       null |        null |        null |             null |      null |      null

 

Tracing session: 63b75250-3621-11e9-81ac-87caa6eca935

activity                                                                                         | timestamp                  | source        | source_elapsed | client
-------------------------------------------------------------------------------------------------+----------------------------+---------------+----------------+-----------
Execute CQL3 query                                                                               | 2019-02-21 21:41:04.629000 | 10.216.87.180 |              0 | 127.0.0.1
Parsing select * from collsndudt where id = cb07baad-eac8-4f65-b28a-bddc06a0de23; [CoreThread-8] | 2019-02-21 21:41:04.629000 | 10.216.87.180 |            118 | 127.0.0.1
Preparing statement [CoreThread-8]                                                               | 2019-02-21 21:41:04.629000 | 10.216.87.180 |            177 | 127.0.0.1
Reading data from [/10.216.87.180] [CoreThread-8]                                                | 2019-02-21 21:41:04.629000 | 10.216.87.180 |            318 | 127.0.0.1
Executing single-partition query on collsndudt [CoreThread-6]                                    | 2019-02-21 21:41:04.629001 | 10.216.87.180 |            460 | 127.0.0.1
Acquiring sstable references [CoreThread-6]                                                      | 2019-02-21 21:41:04.629001 | 10.216.87.180 |            460 | 127.0.0.1
Skipped 0/1 non-slice-intersecting sstables, included 0 due to tombstones [CoreThread-6]         | 2019-02-21 21:41:04.63     | 10.216.87.180 |            543 | 127.0.0.1
Merged data from memtables and 1 sstables [CoreThread-6]                                         | 2019-02-21 21:41:04.63     | 10.216.87.180 |            611 | 127.0.0.1
Read 1 live rows and 1 tombstone cells [CoreThread-6]                                            | 2019-02-21 21:41:04.63     | 10.216.87.180 |            611 | 127.0.0.1
Request complete                                                                                 | 2019-02-21 
When it reports 1 tombstone cell, does it mean 1 record? Otherwise it read 
more than one tombstone cell.

 

On Wed, Feb 20, 2019 at 1:30 AM Kenneth Brotman  
wrote:

There is another good article called Common Problems with Cassandra Tombstones 
by Alla Babkina at 
https://opencredo.com/blogs/cassandra-tombstones-common-issues/ .  It says 
interesting stuff like:

 

1.   You can get tombstones without deleting anything

2.   Inserting null values causes tombstones

3.   Inserting values into collection columns results in tombstones even if 
you never delete a value

4.   Expiring Data with TTL results in tombstones (of course)

5.   The Invisible Column Range Tombstones – resolved in CASSANDRA-11166, 
though (I should have said that in the last email too, not CASSANDRA-8527).  But 
it shouldn’t be this since you are on 3.11.3.
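On point 2, a sketch of the difference (hypothetical table; the key is that an unset column writes nothing, while an explicit null writes a tombstone):

```cql
-- Hypothetical table for illustration.
-- Writing an explicit null creates a tombstone for that cell:
INSERT INTO demo.users (user_id, name, email)
VALUES (1b4e28ba-2fa1-11d2-883f-0016d3cca427, 'alice', null);  -- tombstone on email

-- Simply omitting the column leaves it unset and writes no tombstone:
INSERT INTO demo.users (user_id, name)
VALUES (1b4e28ba-2fa1-11d2-883f-0016d3cca427, 'alice');
```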

 

I think number three above answers your questions based on your original post.  
See the article for the details.  It’s really good.

 

Kenneth Brotman 

 

From: Kenneth Brotman [mailto:kenbrot...@yahoo.com] 
Sent: Tuesday, February 19, 2019 10:12 PM
To: 'user@cassandra.apache.org'
Subject: RE: tombstones threshold warning

 

Hi Ayub,

 

Is everything flushing to SSTables?  It has to be somewhere right?  So is it in 
the memtables?

 

Or is it that there are tombstones that are sometimes detected and sometimes 
not detected as descr

RE: Tombstones in memtable

2019-02-23 Thread Kenneth Brotman
eyspace="mykeyspace",Table="tablename",function="Count"}[5m])

 

during peak hours. we only have couple of hundred inserts and 5-8k reads/s per 
node.

```

 

```tablestats

Read Count: 605231874

Read Latency: 0.021268529760215503 ms.

Write Count: 2763352

Write Latency: 0.027924007871599422 ms.

Pending Flushes: 0

Table: name

SSTable count: 1

Space used (live): 1413203

Space used (total): 1413203

Space used by snapshots (total): 0

Off heap memory used (total): 28813

SSTable Compression Ratio: 0.5015090954531143

Number of partitions (estimate): 19568

Memtable cell count: 573

Memtable data size: 22971

Memtable off heap memory used: 0

Memtable switch count: 6

Local read count: 529868919

Local read latency: 0.020 ms

Local write count: 2707371

Local write latency: 0.024 ms

Pending flushes: 0

Percent repaired: 0.0

Bloom filter false positives: 1

Bloom filter false ratio: 0.0

Bloom filter space used: 23888

Bloom filter off heap memory used: 23880

Index summary off heap memory used: 4717

Compression metadata off heap memory used: 216

Compacted partition minimum bytes: 73

Compacted partition maximum bytes: 124

Compacted partition mean bytes: 99

Average live cells per slice (last five minutes): 1.0

Maximum live cells per slice (last five minutes): 1

Average tombstones per slice (last five minutes): 1.0

Maximum tombstones per slice (last five minutes): 1

Dropped Mutations: 0

histograms

Percentile  SSTables     Write Latency      Read Latency    Partition Size        Cell Count
                              (micros)          (micros)           (bytes)
50%             0.00             20.50             17.08                86                 1
75%             0.00             24.60             20.50               124                 1
95%             0.00             35.43             29.52               124                 1
98%             0.00             35.43             42.51               124                 1
99%             0.00             42.51             51.01               124                 1
Min             0.00              8.24              5.72                73                 0
Max             1.00             42.51            152.32               124                 1

```

 

3 nodes in dc1 and 3 nodes in dc2 cluster, with instance type AWS EC2 m4.xlarge.

 

On Sat, Feb 23, 2019, 7:47 PM Jeff Jirsa  wrote:

Would also be good to see your schema (anonymized if needed) and the select 
queries you’re running

 

-- 

Jeff Jirsa

 


On Feb 23, 2019, at 4:37 PM, Rahul Reddy  wrote:

Thanks Jeff,

 

I have gcgs set to 10 mins and changed the table TTL to 5 hours, compared to 
the insert TTL of 4 hours.  With tracing on, I don't see any tombstone scans 
for the reads, and the log doesn't show tombstone scan alerts either.  But as 
the reads happen at 5-8k reads per node during the peak hours, it shows a 
tombstone scan count of 1M per read. 

 

On Fri, Feb 22, 2019, 11:46 AM Jeff Jirsa  wrote:

If all of your data is TTL’d and you never explicitly delete a cell without 
using a TTL, you can probably drop your GCGS to 1 hour (or less).

 

Which compaction strategy are you using? You need a way to clear out those 
tombstones. There exist tombstone compaction sub properties that can help 
encourage compaction to grab sstables just because they’re full of tombstones 
which will probably help you.
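Those sub-properties can be set like this (keyspace/table names are hypothetical and the thresholds are only illustrative; check the compaction documentation for your version):

```cql
ALTER TABLE demo.events
WITH compaction = {
    'class': 'TimeWindowCompactionStrategy',   -- or whichever strategy is in use
    'unchecked_tombstone_compaction': 'true',  -- allow single-SSTable tombstone compactions
    'tombstone_threshold': '0.2'               -- target SSTables that are ~20% tombstones
}
AND gc_grace_seconds = 3600;                   -- 1 hour, per the TTL-only advice above
```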

 

-- 

Jeff Jirsa

 


On Feb 22, 2019, at 8:37 AM, Kenneth Brotman  
wrote:

Can we see the histogram?  Why wouldn’t you at times have that many tombstones? 
 Makes sense.

 

Kenneth Brotman

 

From: Rahul Reddy [mailto:rahulreddy1...@gmail.com] 
Sent: Thursday, February 21, 2019 7:06 AM
To: user@cassandra.apache.org
Subject: Tombstones in memtable

 

We have a small table; records are about 5k.

All the inserts come with a 4hr TTL, the table-level TTL is 1 day, and gc grace 
seconds is 3 hours.  We do 5k reads a second during peak load.  During the peak 
load we are seeing alerts for the tombstone scanned histogram reaching a million.

Cassandra version 3.11.1.  Please let me know how this tombstone scan can be 
avoided in the memtable.



RE: Tombstones in memtable

2019-02-23 Thread Kenneth Brotman
  
Max             1.00             42.51            152.32               124                 1

```

 

3 nodes in dc1 and 3 nodes in dc2 cluster, with instance type AWS EC2 m4.xlarge.

 

On Sat, Feb 23, 2019, 7:47 PM Jeff Jirsa  wrote:

Would also be good to see your schema (anonymized if needed) and the select 
queries you’re running

 

-- 

Jeff Jirsa

 


On Feb 23, 2019, at 4:37 PM, Rahul Reddy  wrote:

Thanks Jeff,

 

I have gcgs set to 10 mins and changed the table TTL to 5 hours, compared to 
the insert TTL of 4 hours.  With tracing on, I don't see any tombstone scans 
for the reads, and the log doesn't show tombstone scan alerts either.  But as 
the reads happen at 5-8k reads per node during the peak hours, it shows a 
tombstone scan count of 1M per read. 

 

On Fri, Feb 22, 2019, 11:46 AM Jeff Jirsa  wrote:

If all of your data is TTL’d and you never explicitly delete a cell without 
using a TTL, you can probably drop your GCGS to 1 hour (or less).

 

Which compaction strategy are you using? You need a way to clear out those 
tombstones. There exist tombstone compaction sub properties that can help 
encourage compaction to grab sstables just because they’re full of tombstones 
which will probably help you.

 

-- 

Jeff Jirsa

 


On Feb 22, 2019, at 8:37 AM, Kenneth Brotman  
wrote:

Can we see the histogram?  Why wouldn’t you at times have that many tombstones? 
 Makes sense.

 

Kenneth Brotman

 

From: Rahul Reddy [mailto:rahulreddy1...@gmail.com] 
Sent: Thursday, February 21, 2019 7:06 AM
To: user@cassandra.apache.org
Subject: Tombstones in memtable

 

We have a small table; records are about 5k.

All the inserts come with a 4hr TTL, the table-level TTL is 1 day, and gc grace 
seconds is 3 hours.  We do 5k reads a second during peak load.  During the peak 
load we are seeing alerts for the tombstone scanned histogram reaching a million.

Cassandra version 3.11.1.  Please let me know how this tombstone scan can be 
avoided in the memtable.



RE: Tombstones in memtable

2019-02-23 Thread Kenneth Brotman
Rahul,

 

Please see this DataStax article which suggests you might be using Cassandra as 
a queue-like dataset – and that’s an anti-pattern for Cassandra.  It could be 
you need to use a different database.  It could be your data model is wrong:

https://www.datastax.com/dev/blog/cassandra-anti-patterns-queues-and-queue-like-datasets

 

Kenneth Brotman

 

From: Jeff Jirsa [mailto:jji...@gmail.com] 
Sent: Saturday, February 23, 2019 4:47 PM
To: user@cassandra.apache.org
Subject: Re: Tombstones in memtable

 

Would also be good to see your schema (anonymized if needed) and the select 
queries you’re running

 

-- 

Jeff Jirsa

 


On Feb 23, 2019, at 4:37 PM, Rahul Reddy  wrote:

Thanks Jeff,

 

I have gcgs set to 10 mins and changed the table TTL to 5 hours, compared to 
the insert TTL of 4 hours.  With tracing on, I don't see any tombstone scans 
for the reads, and the log doesn't show tombstone scan alerts either.  But as 
the reads happen at 5-8k reads per node during the peak hours, it shows a 
tombstone scan count of 1M per read. 

 

On Fri, Feb 22, 2019, 11:46 AM Jeff Jirsa  wrote:

If all of your data is TTL’d and you never explicitly delete a cell without 
using a TTL, you can probably drop your GCGS to 1 hour (or less).

 

Which compaction strategy are you using? You need a way to clear out those 
tombstones. There exist tombstone compaction sub properties that can help 
encourage compaction to grab sstables just because they’re full of tombstones 
which will probably help you.

 

-- 

Jeff Jirsa

 


On Feb 22, 2019, at 8:37 AM, Kenneth Brotman  
wrote:

Can we see the histogram?  Why wouldn’t you at times have that many tombstones? 
 Makes sense.

 

Kenneth Brotman

 

From: Rahul Reddy [mailto:rahulreddy1...@gmail.com] 
Sent: Thursday, February 21, 2019 7:06 AM
To: user@cassandra.apache.org
Subject: Tombstones in memtable

 

We have a small table; records are about 5k.

All the inserts come with a 4hr TTL, the table-level TTL is 1 day, and gc grace 
seconds is 3 hours.  We do 5k reads a second during peak load.  During the peak 
load we are seeing alerts for the tombstone scanned histogram reaching a million.

Cassandra version 3.11.1.  Please let me know how this tombstone scan can be 
avoided in the memtable.



RE: Tombstones in memtable

2019-02-22 Thread Kenneth Brotman
Can we see the histogram?  Why wouldn’t you at times have that many tombstones? 
 Makes sense.

 

Kenneth Brotman

 

From: Rahul Reddy [mailto:rahulreddy1...@gmail.com] 
Sent: Thursday, February 21, 2019 7:06 AM
To: user@cassandra.apache.org
Subject: Tombstones in memtable

 

We have a small table; records are about 5k.

All the inserts come with a 4hr TTL, the table-level TTL is 1 day, and gc grace 
seconds is 3 hours.  We do 5k reads a second during peak load.  During the peak 
load we are seeing alerts for the tombstone scanned histogram reaching a million.

Cassandra version 3.11.1.  Please let me know how this tombstone scan can be 
avoided in the memtable.



RE: Group By Does Not Follow Clustering Order

2019-02-21 Thread Kenneth Brotman
Hey Joseph,

 

Also, in your materialized view I think the “clustering order by” should have 
started with policy_id, so it would have been:   …WITH CLUSTERING ORDER 
BY(policy_id, device_id, date_created DESC);

however that doesn’t seem very performant. 

 

Maybe in the materialized view the primary key should have a compound partition 
key of (project_id, policy_id, device_id).

 

 

From: Kenneth Brotman [mailto:kenbrot...@yahoo.com.INVALID] 
Sent: Thursday, February 21, 2019 3:38 PM
To: user@cassandra.apache.org
Subject: RE: Group By Does Not Follow Clustering Order

 

Joseph,

 

In this statement from your email:

SELECT * FROM m_ps_project_policy_device4 where project_id=1337 and 
policy_id=7331 and device_id='1234567890' group by policy_id, device_id limit 1;

 

…why would you have the “group by policy_id, device_id” section at all when you 
are already doing “policy_id=7331 and device_id=’1234567890’” and returning one 
(limit 1)?

 

Kenneth Brotman

 

From: Joseph Wonesh [mailto:joseph.won...@sticknfind.com] 
Sent: Thursday, February 21, 2019 10:39 AM
To: user@cassandra.apache.org
Subject: Re: Group By Does Not Follow Clustering Order

 

Hi all,

 

I am bumping this email hoping that it can reach a larger audience.

 

Thanks,

Joseph

 

On Tue, Feb 12, 2019 at 11:45 AM Joseph Wonesh  
wrote:

Hello,

 

I have a materialized view defined by the following:

 

CREATE MATERIALIZED VIEW m_ps_project_policy_device0 AS
   SELECT policy_id, device_id, project_id, namespace, metric_type, 
blufi_id, beacon_id, event_uuid, state, date_created, policy_name, beacon_name, 
blufi_name, value, duration FROM policy_state0
   WHERE policy_id IS NOT NULL AND device_id IS NOT NULL AND project_id IS 
NOT NULL AND namespace IS NOT NULL AND metric_type IS NOT NULL AND blufi_id IS 
NOT NULL AND beacon_id IS NOT NULL AND event_uuid IS NOT NULL AND state IS NOT 
NULL AND date_created IS NOT NULL
   PRIMARY KEY ((project_id), policy_id, device_id, date_created, blufi_id, 
beacon_id, state, namespace, metric_type, event_uuid)
   WITH CLUSTERING ORDER BY (date_created DESC);

 

This view works fine if i run a query like the following:

 

SELECT * FROM m_ps_project_policy_device4 where project_id=1337 and 
policy_id=7331 and device_id='1234567890' limit 1;

 

The result of this query gives me the most recent due to the date_created desc 
clustering order.

 

However, this query does not behave as expected:

 

SELECT * FROM m_ps_project_policy_device4 where project_id=1337 and 
policy_id=7331 and device_id='1234567890' group by policy_id, device_id limit 1;

 

The result of this query gives me the FIRST record from the partition, which is 
the OLDEST record due to the clustering order desc.

 

Is this a natural result due to my ordering? Would I need to use a view that 
has order by ASC to achieve what I want to do using the built-in group by 
aggregations? I am hoping there is a way to achieve what I want to do (getting 
the most recent record for each of the  
tuples using the built-in aggregation functions.
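In the meantime, the "most recent record per (policy_id, device_id)" result can be computed client-side once the rows are fetched; a minimal sketch with made-up rows (not the real schema, and date_created is just an int for brevity):

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical rows as a driver might return them.
rows = [
    {"policy_id": 7331, "device_id": "123", "date_created": 1, "state": "old"},
    {"policy_id": 7331, "device_id": "123", "date_created": 9, "state": "new"},
    {"policy_id": 7331, "device_id": "456", "date_created": 5, "state": "mid"},
]

def latest_per_group(rows):
    """Return the most recent row for each (policy_id, device_id) pair."""
    key = itemgetter("policy_id", "device_id")
    # groupby requires its input to be sorted by the grouping key
    return [max(group, key=itemgetter("date_created"))
            for _, group in groupby(sorted(rows, key=key), key=key)]

print(latest_per_group(rows))
```

This sidesteps the server-side GROUP BY ordering question entirely, at the cost of pulling all rows for the partition to the client.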

 

Thanks,

Joseph Wonesh


This message is private and confidential. If you have received this message in 
error, please notify us and remove it from your system. 






RE: Unable to track compaction completion

2019-02-19 Thread Kenneth Brotman
Hi Rajsekhar,

 

I think monitoring the CompactionManagerMBean is what you need.

 

Kenneth Brotman

 

From: Rajsekhar Mallick [mailto:raj.mallic...@gmail.com] 
Sent: Friday, February 15, 2019 8:59 AM
To: user@cassandra.apache.org
Subject: Unable to track compaction completion

 

Hello team,

 

I have been trying to figure out how to track the completion of a compaction 
on a node.

Nodetool compactionstats shows instantaneous results.

 

I found that system.compaction_in_progress gets me the same details as 
compactionstats. It also gets me an id for the running compaction.

I was of the view that checking for the same id in system.compaction_history 
would fetch me the compaction details after a running compaction ends.

But I see that no such relationship exists.

Please do confirm on the above.

 

Thanks,

Rajsekhar Mallick



RE: tombstones threshold warning

2019-02-19 Thread Kenneth Brotman
There is another good article called Common Problems with Cassandra Tombstones 
by Alla Babkina at 
https://opencredo.com/blogs/cassandra-tombstones-common-issues/ .   It says 
interesting stuff like:

 

1.   You can get tombstones without deleting anything

2.   Inserting null values causes tombstones

3.   Inserting values into collection columns results in tombstones even if 
you never delete a value

4.   Expiring Data with TTL results in tombstones (of course)

5.   The Invisible Column Range Tombstones – resolved in CASSANDRA-11166 
though; I should have said that in the last email too, not CASSANDRA-8527.  But 
it shouldn’t be this since you are on 3.11.3.

 

I think number three above answers your questions based on your original post.  
See the article for the details.  It’s really good.
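As a toy illustration of why "live rows" and "tombstone cells" are counted separately in that warning (a simplified model, not Cassandra's actual read path):

```python
# Toy model of the read-path counters behind the tombstone_warn_threshold
# warning: a cell either carries a value or a deletion marker (tombstone).
def scan(cells):
    live = sum(1 for c in cells if c.get("value") is not None)
    tombstones = sum(1 for c in cells if c.get("value") is None)
    return live, tombstones

# Inserting a null value writes a deletion marker rather than nothing,
# which is cause #2 in the article above.
cells = [{"name": "c1", "value": 3}, {"name": "c2", "value": None}]
live, dead = scan(cells)
print(f"Read {live} live and {dead} tombstone cells")
```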

 

Kenneth Brotman 

 


RE: tombstones threshold warning

2019-02-19 Thread Kenneth Brotman
Hi Ayub,

 

Is everything flushing to SSTables?  It has to be somewhere right?  So is it in 
the memtables?

 

Or is it that there are tombstones that are sometimes detected and sometimes 
not detected as described in the very detailed article on The Last Pickle by 
Alex Dejanovski called Undetectable tombstones in Apache Cassandra: 
http://thelastpickle.com/blog/2018/07/05/undetectable-tombstones-in-apache-cassandra.html
 . 

 

I thought that was resolved in 3.11.2 by CASSANDRA-8527; and you are running 
3.11.3!  Is there still an outstanding issue?

 

Kenneth Brotman

 

 

From: Ayub M [mailto:hia...@gmail.com] 
Sent: Saturday, February 16, 2019 9:58 PM
To: user@cassandra.apache.org
Subject: tombstones threshold warning

 

In the logs I see tombstone warning threshold.

Read 411 live rows and 1644 tombstone cells for query SELECT * FROM ks.tbl 
WHERE key = XYZ LIMIT 5000 (see tombstone_warn_threshold)

This is Cassandra 3.11.3, I see there are 2 sstables for this table and the 
partition XYZ exists in only one file. Now I dumped this sstable into json 
using sstabledump. I extracted the data of only this partition and I see there 
are only 411 rows in it, and all of them are active/live records, so I do not 
understand where these tombstones are coming from.

This table has collection columns and there are cell tombstones for the 
collection columns when they were inserted. Do collection cell tombstones get 
counted as tombstone cells in the warning displayed?

Did a small test to see if collection tombstones are counted as tombstones and 
it does not seem so. So wondering where are those tombstones coming from in my 
above query.

CREATE TABLE tbl (
col1 text,
col2 text,
c1 int,
col3 map<text, text>,
PRIMARY KEY (col1, col2)
) WITH CLUSTERING ORDER BY (col2 ASC)
 
cassandra@cqlsh:dev_test> insert into tbl (col1 , col2 , c1, col3 ) 
values('3','3',3,{'key':'value'});
cassandra@cqlsh:dev_test> select * from tbl where col1 = '3';
 col1 | col2 | c1 | col3
+--++--
  3 |3 |  3 | {'key': 'value'}
(1 rows)
 
Tracing session: 4c2a1894-3151-11e9-838d-29ed5fcf59ee
 activity                                                                                 | timestamp                  | source     | source_elapsed | client
------------------------------------------------------------------------------------------+----------------------------+------------+----------------+-----------
 Execute CQL3 query                                                                       | 2019-02-15 18:41:25.145000 | 10.216.1.1 |              0 | 127.0.0.1
 Parsing select * from tbl where col1 = '3'; [CoreThread-3]                               | 2019-02-15 18:41:25.145000 | 10.216.1.1 |            177 | 127.0.0.1
 Preparing statement [CoreThread-3]                                                       | 2019-02-15 18:41:25.145001 | 10.216.1.1 |            295 | 127.0.0.1
 Reading data from [/10.216.1.1] [CoreThread-3]                                           | 2019-02-15 18:41:25.146000 | 10.216.1.1 |            491 | 127.0.0.1
 Executing single-partition query on tbl [CoreThread-2]                                   | 2019-02-15 18:41:25.146000 | 10.216.1.1 |            770 | 127.0.0.1
 Acquiring sstable references [CoreThread-2]                                              | 2019-02-15 18:41:25.146000 | 10.216.1.1 |            897 | 127.0.0.1
 Skipped 0/1 non-slice-intersecting sstables, included 0 due to tombstones [CoreThread-2] | 2019-02-15 18:41:25.146000 | 10.216.1.1 |           1096 | 127.0.0.1
 Merged data from memtables and 1 sstables [CoreThread-2]                                 | 2019-02-15 18:41:25.146000 | 10.216.1.1 |           1235 | 127.0.0.1
 Read 1 live rows and 0 tombstone cells [CoreThread-2]                                    | 2019-02-15 18:41:25.146000 | 10.216.1.1 |           1317 | 127.0.0.1
 Request complete                                                                         | 2019-02-15 18:41:25.146529 | 10.216.1.1 |           1529 | 127.0.0.1
[root@localhost tbl-8aaa6bc1315011e991e523330936276b]# sstabledump 
aa-1-bti-Data.db 
[
  {
"partition" : {
  "key" : [ "3" ],
  "position" : 0
},
"rows" : [
  {
"type" : "row",
"position" : 41,
"clustering" : [ "3" ],
"liveness_info" : { "tstamp" : "2019-02-15T18:36:16.838103Z" },
"cells" : [
  { "name" : "c1", "value" : 3 },
  { "name" : "col3", "deletion_info" : { "marked_deleted" : 
"2019-02-15T18:36:16.838102Z", "local_delete_time" : "2019-02-15T18:36:17Z" } },
  { "name" : "col3", "path" : [ "key" ], "value" : "value" }
]
  }
]
  }
]

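One way to answer "where are the tombstones?" from a dump like the one above is to count the deletion_info markers programmatically; a sketch over a trimmed, hand-written fragment of the same shape (not the full dump):

```python
import json

# A trimmed sstabledump-style fragment; the collection column carries a
# deletion_info marker written by the insert itself.
dump = json.loads("""
[
  {"partition": {"key": ["3"]},
   "rows": [
     {"type": "row",
      "cells": [
        {"name": "c1", "value": 3},
        {"name": "col3", "deletion_info": {"marked_deleted": "2019-02-15T18:36:16.838102Z"}},
        {"name": "col3", "path": ["key"], "value": "value"}
      ]}
   ]}
]
""")

def count_deletion_markers(dump):
    """Count cells that carry a deletion_info marker in a parsed dump."""
    return sum(1
               for part in dump
               for row in part.get("rows", [])
               for cell in row.get("cells", [])
               if "deletion_info" in cell)

print(count_deletion_markers(dump))
```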

RE: Looking for feedback on automated root-cause system

2019-02-19 Thread Kenneth Brotman
Any information you can share on the inputs it needs/uses would be helpful.

 

Kenneth Brotman

 

From: daemeon reiydelle [mailto:daeme...@gmail.com] 
Sent: Tuesday, February 19, 2019 4:27 PM
To: user
Subject: Re: Looking for feedback on automated root-cause system

 

Welcome to the world of testing predictive analytics. I will pass this on to my 
folks at Accenture; I know of a couple of C* clients we run, and I am wondering 
what you had in mind.

 

 

Daemeon C.M. Reiydelle

email: daeme...@gmail.com

San Francisco 1.415.501.0198/London 44 020 8144 9872/Skype daemeon.c.mreiydelle

 

 

On Tue, Feb 19, 2019 at 3:35 PM Matthew Stump  wrote:

Howdy,

I’ve been engaged in the Cassandra user community for a long time, almost 8 
years, and have worked on hundreds of Cassandra deployments. One of the things 
I’ve noticed in myself and a lot of my peers that have done consulting, support 
or worked on really big deployments is that we get burnt out. We fight a lot of 
the same fires over and over again, and don’t get to work on new or interesting 
stuff. Also, what we do is really hard to transfer to other people because it’s 
based on experience. 

Over the past year my team and I have been working to overcome that gap, 
creating an assistant that’s able to scale some of this knowledge. We’ve got it 
to the point where it’s able to classify known root causes for an outage or an 
SLA breach in Cassandra with an accuracy greater than 90%. It can accurately 
diagnose bugs, data-modeling issues, or misuse of certain features, and when it 
does, it gives you specific remediation steps with links to knowledge base articles. 

 

We think we’ve seeded our database with enough root causes that it’ll catch the 
vast majority of issues but there is always the possibility that we’ll run into 
something previously unknown like CASSANDRA-11170 (one of the issues our system 
found in the wild).

We’re looking for feedback and would like to know if anyone is interested in 
giving the product a trial. The process would be a collaboration, where we both 
get to learn from each other and improve how we’re doing things.

Thanks,
Matt Stump



RE: Bootstrap stuck in JOINING state?

2019-02-18 Thread Kenneth Brotman


https://stackoverflow.com/questions/39823972/cassandra-node-cant-complete-joining-operation

https://stackoverflow.com/questions/27251504/cassandra-2-1-2-node-stuck-on-joining-the-cluster


-Original Message-
From: Kenneth Brotman [mailto:kenbrot...@yahoo.com.INVALID] 
Sent: Monday, February 18, 2019 7:43 AM
To: user@cassandra.apache.org
Subject: RE: Bootstrap stuck in JOINING state?

Hi Troels,

Stackoverflow.com is a good resource for such situations.  Have you seen
these posts?  I think they are same Cassandra version:

https://stackoverflow.com/questions/39823972/cassandra-node-cant-complete-joining-operation
https://stackoverflow.com/questions/27251504/cassandra-2-1-2-node-stuck-on-joining-the-cluster

Kenneth Brotman

-Original Message-
From: Troels Arvin [mailto:tro...@arvin.dk] 
Sent: Monday, February 18, 2019 2:54 AM
To: user@cassandra.apache.org
Subject: Bootstrap stuck in JOINING state?

Hello,

Nine days ago, I bootstrapped a Cassandra node; let's call it 
10.1.2.11. The node is part of a six-node Cassandra 2.1.11 cluster 
spread across two datacenters. The bootstrapping was conducted the 
following way on the node:
1. Shut down Cassandra.
2. Deleted its data.
3. Temporarily adjusted its configuration to be in bootstrapping mode:
  - In cassandra-env.sh:
 - JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address=ipaddressofnode"
  - In cassandra.yaml:
  - auto_bootstrap: true
  - Made sure the node's IP address was not listed in seed_provider
4. Started Cassandra again.

That seemed to go well: In system.log the following was seen:
INFO  [main] 2019-02-08 14:49:17,223 StorageService.java:1120 - JOINING: 
Starting to bootstrap...

No errors seen in system.log.
A graph of the server's data filesystem shows fine growth during the 
next ~24 hours.

However, it seems the joining is stuck without completing.

Parts of output from "nodetool netstats", note the "JOINING" mode:

[cas@nodename ~]$ nodetool netstats
Mode: JOINING
Bootstrap 53d1a8a0-2ba8-11e9-9212-2d89290b6701
 /10.1.2.10
 /10.1.1.10
 /10.1.1.12
 /10.1.2.12
 Receiving 225 files, 461265115573 bytes total. Already received 
225 files, 461265115573 bytes total
 /data/cassandra/ksname/...db 43392372/43392372 bytes(100%) 
received from idx:0/10.1.2.12
[... lots of lines like this ...]
 /data/cassandra/ksname/...db 131825151/131825151 bytes(100%) 
received from idx:0/10.1.2.12
 /10.1.1.11
Read Repair Statistics:
Attempted: 0
Mismatch (Blocking): 0
Mismatch (Background): 0
Pool NameActive   Pending  Completed
Commandsn/a 0  5
Responses   n/a 03153171


All the "...received from..." lines show 100% bytes received.


Parts of output from "nodetool status":

[cas@nodename ~]$ nodetool status
Datacenter: DC1
===
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Tokens  Owns  Host ID       Rack
UN  10.1.1.12  454.14 GB  256 ?   c100e77d-...  RAC1
UN  10.1.1.11  449.99 GB  256 ?   0e06dc69-...  RAC1
UN  10.1.1.10  725.81 GB  256 ?   f266489b-...  RAC1
Datacenter: DC2
===
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Tokens  Owns  Host ID       Rack
UN  10.1.2.12  474.59 GB  256 ?   89b71027-...  RAC1
UN  10.1.2.11  455.1 GB   256 ?   c08b8e99-...  RAC1
UN  10.1.2.10  475 GB 256 ?   e17bae55-...  RAC1


How come "nodetool netstats" is still showing "Mode: JOINING"? Is there 
a way I can push the node to complete its joining?

-- 
Kind regards,
Troels Arvin

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org







RE: Cassandra vnodes Streaming Reliability Calculator

2019-02-16 Thread Kenneth Brotman
Hi James,

 

Thanks for doing that.  Very interesting.  I haven’t had a chance to check the 
math.  Did you look at this white paper by Lynch and Snyder called Cassandra 
Availability with Virtual Nodes: 
https://github.com/jolynch/python_performance_toolkit/blob/master/notebooks/cassandra_availability/whitepaper/cassandra-availability-virtual.pdf

 

Are the calculations consistent with your online calculator?

 

Thanks again,

 

Kenneth Brotman

 

From: James Briggs [mailto:james.bri...@yahoo.com.INVALID] 
Sent: Friday, February 15, 2019 7:42 PM
To: user@cassandra.apache.org
Subject: Cassandra vnodes Streaming Reliability Calculator

 

Hi folks.







Please check out my online vnodes reliability calculator and reply with any 
feedback:

http://www.jebriggs.com/blog/2019/02/cassandra-vnodes-reliability-calculator/

 

Thanks, James Briggs.
--
Cassandra/MySQL DBA. Available in Bay Area or remote.
cass_top: https://github.com/jamesbriggs/cassandra-top
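For intuition about what such a calculator estimates, here is a hedged Monte Carlo sketch (simplified SimpleStrategy-style placement with adjacent distinct-node replicas, ignoring racks and datacenters; not James's actual model):

```python
import random

def p_range_loss(nodes, rf, failures, vnodes, trials=1000, seed=7):
    """Monte Carlo estimate of the probability that `failures` simultaneous
    node failures wipe out all `rf` replicas of at least one token range."""
    random.seed(seed)
    lost = 0
    for _ in range(trials):
        # Random token ring: each node owns `vnodes` random tokens.
        ring = sorted((random.random(), n) for n in range(nodes) for _ in range(vnodes))
        owners = [n for _, n in ring]
        down = set(random.sample(range(nodes), failures))
        for i in range(len(owners)):
            # Replicas of range i: the next rf distinct nodes on the ring.
            replicas, j = [], i
            while len(replicas) < rf:
                n = owners[j % len(owners)]
                if n not in replicas:
                    replicas.append(n)
                j += 1
            if all(n in down for n in replicas):
                lost += 1
                break
    return lost / trials

print(p_range_loss(nodes=12, rf=3, failures=2, vnodes=16))  # 0.0: two failures cannot kill RF=3
print(p_range_loss(nodes=12, rf=3, failures=3, vnodes=16))
```

The point the whitepaper makes is visible here: with many vnodes per node, almost any set of RF concurrent failures covers some range's full replica set.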



RE: Bootstrap keeps failing

2019-02-14 Thread Kenneth Brotman
Those aren’t the same error messages so I think progress has been made.  

 

What version of C* are you running?

How did you clear out the space?

 

Kenneth Brotman

 

From: Léo FERLIN SUTTON [mailto:lfer...@mailjet.com.INVALID] 
Sent: Thursday, February 14, 2019 7:54 AM
To: user@cassandra.apache.org
Subject: Re: Bootstrap keeps failing

 

Hello again!

 

I have managed to free a lot of disk space and now most nodes hover between 50% 
and 80%.

I am still getting bootstrapping failures :(

 

Here are some logs: 

2019-02-14T15:23:05+00:00 cass02-0001.c.company.internal user err cassandra  
[org.apache.cassandra.streaming.StreamSession] [onError] - [Stream 
#ea8ae230-2f8f-11e9-8418-6d4f57de615d] Streaming error occurred

2019-02-14T15:23:05+00:00 cass02-0001.c.company.internal user info cassandra  
[org.apache.cassandra.streaming.StreamResultFuture] [handleSessionComplete] - 
[Stream #ea8ae230-2f8f-11e9-8418-6d4f57de615d] Session with /10.10.23.155 is complete

2019-02-14T15:23:05+00:00 cass02-0001.c.company.internal user warning 
cassandra  [org.apache.cassandra.streaming.StreamResultFuture] [maybeComplete] 
- [Stream #ea8ae230-2f8f-11e9-8418-6d4f57de615d] Stream failed

2019-02-14T15:23:05+00:00 cass02-0001.c.company.internal user warning 
cassandra  [org.apache.cassandra.service.StorageService] [onFailure] - Error 
during bootstrap.

2019-02-14T15:23:05+00:00 cass02-0001.c.company.internal user err cassandra  
[org.apache.cassandra.service.StorageService] [bootstrap] - Error while waiting 
on bootstrap to complete. Bootstrap will have to be restarted.

2019-02-14T15:23:05+00:00 cass02-0001.c.company.internal user warning 
cassandra  [org.apache.cassandra.service.StorageService] [joinTokenRing] - Some 
data streaming failed. Use nodetool to check bootstrap state and resume. For 
more, see `nodetool help bootstrap`. IN_PROGRESS

 

 

I can see a `Streaming error occurred` for all of the nodes it is trying to 
stream from. Is there a way to get more logs to know why the failures occurred?

I have set `` but it doesn't seem to give me more details; is there another 
class I should set to DEBUG?

 

Finally, I have also noticed a lot of:

[org.apache.cassandra.db.compaction.LeveledManifest] [getCompactionCandidates] 
- Bootstrapping - doing STCS in L0

in my log files; it might be important.

 

Regards,

 

Leo

 

On Fri, Feb 8, 2019 at 3:59 PM Léo FERLIN SUTTON  wrote:

On Fri, Feb 8, 2019 at 3:37 PM Kenneth Brotman  
wrote:

Thanks for the details; that helps us understand the situation.  I’m pretty sure 
you’ve exceeded the working capacity of some of those nodes.  Going over 50% - 
75% depending on compaction strategy is ill-advised.

50% free disk space is a steep price to pay for disk space not used. We have 
about 90 terabytes of data on SSD and we are paying about $100 per terabyte of 
SSD storage (on Google Cloud). 

Maybe we can get closer to 75%.
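The trade-off can be put in numbers using the figures from this thread (~90 TB of data, ~$100 per TB-month of SSD); a back-of-envelope sketch:

```python
# Back-of-envelope cost of disk headroom, using the figures from the thread.
data_tb = 90        # terabytes of live data
price = 100         # dollars per TB-month of SSD

def monthly_cost(data_tb, max_utilization):
    """Provisioned capacity (and its cost) needed to keep disks at or
    below the given utilization ceiling."""
    provisioned = data_tb / max_utilization
    return provisioned * price

print(monthly_cost(data_tb, 0.50))  # 50% full: 180 TB provisioned
print(monthly_cost(data_tb, 0.75))  # 75% full: 120 TB provisioned
```

Moving the ceiling from 50% to 75% utilization cuts the provisioned storage by a third.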

 

Our compaction strategy is `LeveledCompactionStrategy` on our two biggest 
tables (90% of the data).

 

You need to clear out as much room as possible to add more nodes.  

Are the tombstones clearing out?  

I think we don't have a lot of tombstones :

We have 0 deletes on our two biggest tables. 

One of them gets updated with new data (messages.messages), but the updates are 
filling columns that were previously empty; I am unsure, but I think this 
doesn't cause any tombstones.

I have joined the info from `nodetool tablestats` for our two largest tables.

 

We are using cassandra-reaper that manages our repairs. A full repair takes 
about 13 days. So if we have tombstones they should not be older than 13 days.

 

Are there old snapshots that you can delete?  And so on.  

Unfortunately no. We take a daily snapshot that we backup then drop.  

 

You have to make more room on the existing nodes.

 

I am trying to run `nodetool cleanup` on our most "critical" nodes to see if it 
helps. If that doesn't do the trick we will only have two solutions :

*   Add more disk space on each node
*   Adding new nodes

We have looked at some other companies case studies and it looks like we have a 
few very big nodes instead of a lot of smaller ones.

We are currently trying to add nodes, and are hoping to eventually transition 
to a "lot of small nodes" model and be able to add nodes a lot faster.

 

Thank you again for your interest,

 

Regards,

 

Leo

 

From: Léo FERLIN SUTTON [mailto:lfer...@mailjet.com.INVALID] 
Sent: Friday, February 08, 2019 6:16 AM
To: user@cassandra.apache.org
Subject: Re: Bootstrap keeps failing

 

On Thu, Feb 7, 2019 at 10:11 PM Kenneth Brotman  
wrote:

Lots of things come to mind. We need more information from you to help us 
understand:

How long have you had your cluster running?

A bit more than a year old. But it has been constantly growing (3 nodes to 6 
nodes to 12 nodes, etc).

We have a replication_factor of 3 on all keyspaces and 3 racks with an equal 
amount of nodes. 

Ansible scripts for Cassandra to help with automation needs

2019-02-13 Thread Kenneth Brotman
I want to generate a variety of Ansible scripts to share with the Apache
Cassandra community.  I'll put them in a Github repository.  Just email me
offline what scripts would help the most. 

 

Does this exist already?  I can't find it.  Let me know if it does.

 

If not, let's put it together for the community.  Maybe we'll end up with a
download right on the Apache Cassandra web site or packaged with future
releases of Cassandra.

 

Kenneth Brotman

 

P.S.  Terraform is next!



RE: Maximum memory usage

2019-02-10 Thread Kenneth Brotman
Can we see the “nodetool tablestats” for the biggest table as well?

 

From: Kenneth Brotman [mailto:kenbrot...@yahoo.com.INVALID] 
Sent: Sunday, February 10, 2019 7:21 AM
To: user@cassandra.apache.org
Subject: RE: Maximum memory usage

 

Okay, that’s at the moment it was calculated.  Still need to see histograms.

 

From: Rahul Reddy [mailto:rahulreddy1...@gmail.com] 
Sent: Sunday, February 10, 2019 7:09 AM
To: user@cassandra.apache.org
Subject: Re: Maximum memory usage

 

Thanks Kenneth,

 

110mb is the biggest partition in our db

 

On Sun, Feb 10, 2019, 9:55 AM Kenneth Brotman  
wrote:

Rahul,

 

Those partitions are tiny.  Could you give us the table histograms for the 
biggest tables.

 

Thanks,

 

Kenneth Brotman

 

From: Rahul Reddy [mailto:rahulreddy1...@gmail.com] 
Sent: Sunday, February 10, 2019 6:43 AM
To: user@cassandra.apache.org
Subject: Re: Maximum memory usage

 

```
Percentile  SSTables  Write Latency  Read Latency  Partition Size  Cell Count
                           (micros)      (micros)         (bytes)
50%             1.00          24.60        219.34             258           4
75%             1.00          24.60        379.02             446           8
95%             1.00          35.43        379.02             924          17
98%             1.00          51.01        379.02            1109          24
99%             1.00          61.21        379.02            1331          29
Min             0.00           8.24        126.94             104           0
Max             1.00         152.32        379.02         8409007      152321
```

 

On Wed, Feb 6, 2019, 8:34 PM Kenneth Brotman  
wrote:

Can you give us the “nodetool tablehistograms”

 

Kenneth Brotman

 

From: Rahul Reddy [mailto:rahulreddy1...@gmail.com] 
Sent: Wednesday, February 06, 2019 6:19 AM
To: user@cassandra.apache.org
Subject: Maximum memory usage

 

Hello,

 

I see maximum memory usage alerts in my system.log a couple of times a day, logged as INFO. So far I haven't seen any issue with the db. Why are those messages logged in system.log, do those warnings have any impact on reads/writes, and what needs to be looked at?

 

INFO  [RMI TCP Connection(170917)-127.0.0.1] 2019-02-05 23:15:47,408 
NoSpamLogger.java:91 - Maximum memory usage reached (512.000MiB), cannot 
allocate chunk of 1.000MiB

 

Thanks in advance
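That INFO line typically means Cassandra's off-heap buffer pool (the chunk/file cache) has hit its cap, which is controlled by file_cache_size_in_mb in cassandra.yaml (default 512). A sketch of checking and raising it, assuming the common Debian config path:

```shell
# Sketch (config path is an assumption): find the current cap.
grep -n "file_cache_size_in_mb" /etc/cassandra/cassandra.yaml
# If the messages are frequent and memory allows, raise it, e.g.:
#   file_cache_size_in_mb: 1024
# then restart the node for the change to take effect.
```

The message itself is informational; reads simply fall back to allocating outside the pool when the cap is reached.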



RE: Maximum memory usage

2019-02-10 Thread Kenneth Brotman
Okay, that’s at the moment it was calculated.  Still need to see histograms.

 

From: Rahul Reddy [mailto:rahulreddy1...@gmail.com] 
Sent: Sunday, February 10, 2019 7:09 AM
To: user@cassandra.apache.org
Subject: Re: Maximum memory usage

 

Thanks Kenneth,

 

110mb is the biggest partition in our db

 

On Sun, Feb 10, 2019, 9:55 AM Kenneth Brotman  
wrote:

Rahul,

 

Those partitions are tiny.  Could you give us the table histograms for the 
biggest tables.

 

Thanks,

 

Kenneth Brotman

 

From: Rahul Reddy [mailto:rahulreddy1...@gmail.com] 
Sent: Sunday, February 10, 2019 6:43 AM
To: user@cassandra.apache.org
Subject: Re: Maximum memory usage

 

```
Percentile  SSTables  Write Latency  Read Latency  Partition Size  Cell Count
                           (micros)      (micros)         (bytes)
50%             1.00          24.60        219.34             258           4
75%             1.00          24.60        379.02             446           8
95%             1.00          35.43        379.02             924          17
98%             1.00          51.01        379.02            1109          24
99%             1.00          61.21        379.02            1331          29
Min             0.00           8.24        126.94             104           0
Max             1.00         152.32        379.02         8409007      152321
```

 

On Wed, Feb 6, 2019, 8:34 PM Kenneth Brotman  
wrote:

Can you give us the “nodetool tablehistograms”

 

Kenneth Brotman

 

From: Rahul Reddy [mailto:rahulreddy1...@gmail.com] 
Sent: Wednesday, February 06, 2019 6:19 AM
To: user@cassandra.apache.org
Subject: Maximum memory usage

 

Hello,

 

I see maximum memory usage alerts in my system.log a couple of times a day, logged as INFO. So far I haven't seen any issue with the db. Why are those messages logged in system.log, do those warnings have any impact on reads/writes, and what needs to be looked at?

 

INFO  [RMI TCP Connection(170917)-127.0.0.1] 2019-02-05 23:15:47,408 
NoSpamLogger.java:91 - Maximum memory usage reached (512.000MiB), cannot 
allocate chunk of 1.000MiB

 

Thanks in advance



RE: Maximum memory usage

2019-02-10 Thread Kenneth Brotman
Rahul,

 

Those partitions are tiny.  Could you give us the table histograms for the 
biggest tables.

 

Thanks,

 

Kenneth Brotman

 

From: Rahul Reddy [mailto:rahulreddy1...@gmail.com] 
Sent: Sunday, February 10, 2019 6:43 AM
To: user@cassandra.apache.org
Subject: Re: Maximum memory usage

 

```
Percentile  SSTables  Write Latency  Read Latency  Partition Size  Cell Count
                           (micros)      (micros)         (bytes)
50%             1.00          24.60        219.34             258           4
75%             1.00          24.60        379.02             446           8
95%             1.00          35.43        379.02             924          17
98%             1.00          51.01        379.02            1109          24
99%             1.00          61.21        379.02            1331          29
Min             0.00           8.24        126.94             104           0
Max             1.00         152.32        379.02         8409007      152321
```

 

On Wed, Feb 6, 2019, 8:34 PM Kenneth Brotman  
wrote:

Can you give us the “nodetool tablehistograms”

 

Kenneth Brotman

 

From: Rahul Reddy [mailto:rahulreddy1...@gmail.com] 
Sent: Wednesday, February 06, 2019 6:19 AM
To: user@cassandra.apache.org
Subject: Maximum memory usage

 

Hello,

 

I see maximum memory usage alerts in my system.log a couple of times a day, logged as INFO. So far I haven't seen any issue with the db. Why are those messages logged in system.log, do those warnings have any impact on reads/writes, and what needs to be looked at?

 

INFO  [RMI TCP Connection(170917)-127.0.0.1] 2019-02-05 23:15:47,408 
NoSpamLogger.java:91 - Maximum memory usage reached (512.000MiB), cannot 
allocate chunk of 1.000MiB

 

Thanks in advance



RE: How to read the Index.db file

2019-02-08 Thread Kenneth Brotman
This link 
https://www.datastax.com/dev/blog/debugging-sstables-in-3-0-with-sstabledump 
explains how to read an SSTable with sstabledump for 3.x and sstable2json for 2.x.
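For reference, a sketch of invoking the 3.x tool directly (the data path below is a placeholder):

```shell
# Dump the partitions of one SSTable as JSON (Cassandra 3.x tooling).
sstabledump /var/lib/cassandra/data/mykeyspace/mytable-<table-uuid>/mc-1-big-Data.db
# Add -e to enumerate only the partition keys instead of full rows.
sstabledump -e /var/lib/cassandra/data/mykeyspace/mytable-<table-uuid>/mc-1-big-Data.db
```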

 

Kenneth Brotman

 

From: Ben Slater [mailto:ben.sla...@instaclustr.com] 
Sent: Thursday, February 07, 2019 1:19 PM
To: Cassandra User
Subject: Re: How to read the Index.db file

 

They don’t do exactly what you want but depending on why you are trying to get 
this info you might find our sstable-tools useful: 
https://github.com/instaclustr/cassandra-sstable-tools


--- 

Ben Slater
Chief Product Officer

  

Read our latest technical blog posts  <https://www.instaclustr.com/blog/> here.

This email has been sent on behalf of Instaclustr Pty. Limited (Australia) and 
Instaclustr Inc (USA).

This email and any attachments may contain confidential and legally privileged 
information.  If you are not the intended recipient, do not copy or disclose 
its content, but please reply to this email immediately and highlight the error 
to the sender and then immediately delete the message.

 

 

On Fri, 8 Feb 2019 at 08:14, Kenneth Brotman  
wrote:

When you say you’re trying to get all the partitions of a particular SSTable, 
I’m not sure what you mean.  Do you want to make a copy of it?  I don’t 
understand.

 

Kenneth Brotman

 

From: Pranay akula [mailto:pranay.akula2...@gmail.com] 
Sent: Wednesday, February 06, 2019 7:51 PM
To: user@cassandra.apache.org
Subject: How to read the Index.db file

 

I was trying to get all the partitions of a particular SSTable. I have tried 
reading the Index.db file; I can read some of it but not all of it. Is there 
any way to convert it to a readable format?

 

 

Thanks

Pranay



RE: Bootstrap keeps failing

2019-02-08 Thread Kenneth Brotman
Thanks for the details; that helps us understand the situation.  I’m pretty sure 
you’ve exceeded the working capacity of some of those nodes.  Going over 50%–75% 
disk usage, depending on compaction strategy, is ill-advised.

 

You need to clear out as much room as possible to add more nodes.  Are the 
tombstones clearing out?  Are there old snapshots you can delete?  And so 
on.  You have to make more room on the existing nodes.

 

 

From: Léo FERLIN SUTTON [mailto:lfer...@mailjet.com.INVALID] 
Sent: Friday, February 08, 2019 6:16 AM
To: user@cassandra.apache.org
Subject: Re: Bootstrap keeps failing

 

On Thu, Feb 7, 2019 at 10:11 PM Kenneth Brotman  
wrote:

Lots of things come to mind. We need more information from you to help us 
understand:

How long have you had your cluster running?

A bit more than a year old. But it has been constantly growing (3 nodes to 6 
nodes to 12 nodes, etc).

We have a replication_factor of 3 on all keyspaces and 3 racks with an equal 
amount of nodes. 

 

Is it generally working ok?

Works fine. Good performance, repairs managed by cassandra-reaper.

 

Is it just one node that is misbehaving at a time?

We only bootstrap nodes one at a time. Sometimes it works flawlessly, sometimes 
it fails. When it fails it tends to fail a lot in a row before we manage to get 
it bootstrapped. 

 

How many nodes do you need to replace?

I am adding nodes, not replacing any. Our nodes are starting to get very full 
and we wish to add at least 6 more nodes (short-term).

Adding a new node is quite slow (48 to 72 hours), and that's when the bootstrap 
process works on the first try.

 

Are you doing rolling restarts instead of simultaneously?

Yes.

 

Do you have enough capacity on your machines?  Did you say some of the nodes 
are at 90% capacity?

Disk usage fluctuates but is generally between 80% and 90%, which is why we are 
planning to add a lot more nodes.

 

When did this problem begin?

 Not sure about this one. Probably since our nodes have more than 2 TB of data; I 
don't remember it being an issue when our nodes were smaller.

 

Could something be causing a racing condition?

We have schema changes every day. 

We have temporary data stored in cassandra, only used for 6 days then 
destroyed. 

 

In order to avoid tombstones we have a table rotation, every day we create a 
new table to contain the data for the next day, and we drop the oldest 
temporary table.

 

This means that when the node starts to bootstrap, it will ask other nodes for 
data that will almost certainly be dropped before the bootstrap process is 
finished.

 

Did you recheck the commands you used to make sure they are correct?

What procedure do you use?

 

Our procedure is :

1.  We provision a brand new instance (Debian).
2.  We install Cassandra.
3.  We stop the default Cassandra instance (launched by the Debian package).
4.  We empty these directories:
/var/lib/cassandra/commitlog
/var/lib/cassandra/data
/var/lib/cassandra/saved_caches
5.  We put our configuration in place of the default one.
6.  We start Cassandra.

If after 3 days the node hasn't joined the cluster, we check `nodetool netstats` 
to see if the node is still streaming data. If it is not, we launch 
`nodetool bootstrap resume` on the instance.

 

Thank you for you interest in our issue !

 

Regards,

 

Leo

 

 

From: Léo FERLIN SUTTON [mailto:lfer...@mailjet.com.INVALID] 
Sent: Thursday, February 07, 2019 9:16 AM
To: user@cassandra.apache.org
Subject: Re: [EXTERNAL] Re: Bootstrap keeps failing

 

Thank you for the recommendation. 

 

We are already using datastax's recommended settings for tcp_keepalive

 

Regards,

 

Leo

 

On Thu, Feb 7, 2019 at 5:49 PM Durity, Sean R  
wrote:

I have seen unreliable streaming (streaming that doesn’t finish) because of TCP 
timeouts from firewalls or switches. The default tcp_keepalive kernel 
parameters are usually not tuned for that. See 
https://docs.datastax.com/en/dse-trblshoot/doc/troubleshooting/idleFirewallLinux.html
 for more details. These “remote” timeouts are difficult to detect or prove if 
you don’t have access to the intermediate network equipment.
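The tuning from that page can be sketched as follows (the values are the ones commonly recommended for Cassandra streaming behind stateful firewalls; verify them against the linked doc for your kernel):

```shell
# Probe idle connections after 60 s, retry 3 times, 10 s apart,
# so dead/legitimately-idle streams are detected before firewall timeouts.
sysctl -w net.ipv4.tcp_keepalive_time=60
sysctl -w net.ipv4.tcp_keepalive_probes=3
sysctl -w net.ipv4.tcp_keepalive_intvl=10
# Persist the settings in /etc/sysctl.conf (or /etc/sysctl.d/) to survive reboots.
```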

 

Sean Durity

From: Léo FERLIN SUTTON  
Sent: Thursday, February 07, 2019 10:26 AM
To: user@cassandra.apache.org; dinesh.jo...@yahoo.com
Subject: [EXTERNAL] Re: Bootstrap keeps failing

 

Hello ! 

Thank you for your answers.

 

So I have tried, multiple times, to start bootstrapping from scratch. I often 
have the same problem (on other nodes as well) but sometimes it works and I can 
move on to another node.

 

I have attached a jstack dump and some logs.

 

Our node was shut down at around 97% disk space used

I turned it back on and it started the bootstrap process again.

 

The log file is the log from this attempt, same for the thread dump.

 

Small warning, I have somewhat anonymised the log files so there may be some 
inconsistencies.

 

Regards

RE: range repairs multiple dc

2019-02-07 Thread Kenneth Brotman
This webpage has relevant information on procedures you need to use: 
https://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsAddDCToCluster.html

 

 

From: Kenneth Brotman [mailto:kenbrot...@yahoo.com.INVALID] 
Sent: Thursday, February 07, 2019 1:31 PM
To: user@cassandra.apache.org
Subject: RE: range repairs multiple dc

 

A nice article on The Last Pickle blog at 
http://thelastpickle.com/blog/2017/12/14/should-you-use-incremental-repair.html 
should be helpful to you.  A line in the comments following the article states:

 

“So restricting a -pr repair on a specific datacenter will be forbidden by 
Cassandra to prevent an incomplete repair from being performed.”

 

Give it a read.

 

Kenneth Brotman

 

From: CPC [mailto:acha...@gmail.com] 
Sent: Wednesday, February 06, 2019 11:59 PM
To: user@cassandra.apache.org
Subject: range repairs multiple dc

 

Hi All,

 

I searched the documentation but could not find a clear reference on the -pr 
option. Some documentation says you have to cover the whole ring; other places 
say you have to run it on every node, regardless of whether you have multiple DCs.

 

In our case we have three DCs (DC1, DC2, DC3), each with 4 nodes, for a 12-node 
cluster in total. If I run "nodetool repair -pr --full" on every node in DC1, 
does that mean DC1 is consistent but DC2 and DC3 are not, or that DC1 is not 
consistent at all? In our case we added DC3 to the cluster and will remove DC2, 
so I don't care whether DC2 has consistent data. I don't want to repair DC2.

 

Also, can I run "nodetool repair -pr --full" in parallel? I mean, run it at the 
same time in each DC, or on more than one node in the same DC? Does the -dcpar 
option do the same thing?

 

Best Regards...
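As the replies below explain, a -pr (primary-range) full repair is only complete if it runs on every node in every DC. A sketch, with placeholder hostnames:

```shell
# Full primary-range repair must cover EVERY node in ALL DCs for the
# repaired token ranges to be complete; hostnames are placeholders.
for host in dc1-node1 dc1-node2 dc1-node3 dc1-node4 \
            dc2-node1 dc2-node2 dc2-node3 dc2-node4 \
            dc3-node1 dc3-node2 dc3-node3 dc3-node4; do
  nodetool -h "$host" repair -pr --full
done
# -dcpar changes how one repair run coordinates replicas across DCs
# (parallel vs sequential); it does not remove the need to run the
# command on each node.
```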



RE: range repairs multiple dc

2019-02-07 Thread Kenneth Brotman
A nice article on The Last Pickle blog at 
http://thelastpickle.com/blog/2017/12/14/should-you-use-incremental-repair.html 
should be helpful to you.  A line in the comments following the article states:

 

“So restricting a -pr repair on a specific datacenter will be forbidden by 
Cassandra to prevent an incomplete repair from being performed.”

 

Give it a read.

 

Kenneth Brotman

 

From: CPC [mailto:acha...@gmail.com] 
Sent: Wednesday, February 06, 2019 11:59 PM
To: user@cassandra.apache.org
Subject: range repairs multiple dc

 

Hi All,

 

I searched the documentation but could not find a clear reference on the -pr 
option. Some documentation says you have to cover the whole ring; other places 
say you have to run it on every node, regardless of whether you have multiple DCs.

 

In our case we have three DCs (DC1, DC2, DC3), each with 4 nodes, for a 12-node 
cluster in total. If I run "nodetool repair -pr --full" on every node in DC1, 
does that mean DC1 is consistent but DC2 and DC3 are not, or that DC1 is not 
consistent at all? In our case we added DC3 to the cluster and will remove DC2, 
so I don't care whether DC2 has consistent data. I don't want to repair DC2.

 

Also, can I run "nodetool repair -pr --full" in parallel? I mean, run it at the 
same time in each DC, or on more than one node in the same DC? Does the -dcpar 
option do the same thing?

 

Best Regards...



RE: How to read the Index.db file

2019-02-07 Thread Kenneth Brotman
When you say you’re trying to get all the partitions of a particular SSTable, 
I’m not sure what you mean.  Do you want to make a copy of it?  I don’t 
understand.

 

Kenneth Brotman

 

From: Pranay akula [mailto:pranay.akula2...@gmail.com] 
Sent: Wednesday, February 06, 2019 7:51 PM
To: user@cassandra.apache.org
Subject: How to read the Index.db file

 

I was trying to get all the partitions of a particular SSTable. I have tried 
reading the Index.db file; I can read some of it but not all of it. Is there 
any way to convert it to a readable format?

 

 

Thanks

Pranay



Re: Bootstrap keeps failing

2019-02-07 Thread Kenneth Brotman
Lots of things come to mind. We need more information from you to help us 
understand:

How long have you had your cluster running?

Is it generally working ok?
Is it just one node that is misbehaving at a time?

How many nodes do you need to replace?

Are you doing rolling restarts instead of simultaneously?

Do you have enough capacity on your machines?  Did you say some of the nodes 
are at 90% capacity?

When did this problem begin?

Could something be causing a racing condition?

Did you recheck the commands you used to make sure they are correct?

What procedure do you use?

 

 

From: Léo FERLIN SUTTON [mailto:lfer...@mailjet.com.INVALID] 
Sent: Thursday, February 07, 2019 9:16 AM
To: user@cassandra.apache.org
Subject: Re: [EXTERNAL] Re: Bootstrap keeps failing

 

Thank you for the recommendation. 

 

We are already using datastax's recommended settings for tcp_keepalive

 

Regards,

 

Leo

 

On Thu, Feb 7, 2019 at 5:49 PM Durity, Sean R  
wrote:

I have seen unreliable streaming (streaming that doesn’t finish) because of TCP 
timeouts from firewalls or switches. The default tcp_keepalive kernel 
parameters are usually not tuned for that. See 
https://docs.datastax.com/en/dse-trblshoot/doc/troubleshooting/idleFirewallLinux.html
 for more details. These “remote” timeouts are difficult to detect or prove if 
you don’t have access to the intermediate network equipment.

 

Sean Durity

From: Léo FERLIN SUTTON  
Sent: Thursday, February 07, 2019 10:26 AM
To: user@cassandra.apache.org; dinesh.jo...@yahoo.com
Subject: [EXTERNAL] Re: Bootstrap keeps failing

 

Hello ! 

Thank you for your answers.

 

So I have tried, multiple times, to start bootstrapping from scratch. I often 
have the same problem (on other nodes as well) but sometimes it works and I can 
move on to another node.

 

I have attached a jstack dump and some logs.

 

Our node was shut down at around 97% disk space used

I turned it back on and it started the bootstrap process again.

 

The log file is the log from this attempt, same for the thread dump.

 

Small warning, I have somewhat anonymised the log files so there may be some 
inconsistencies.

 

Regards,

 

Leo

 

On Thu, Feb 7, 2019 at 8:13 AM dinesh.jo...@yahoo.com.INVALID wrote:

Would it be possible for you to take a thread dump & logs and share them?

 

Dinesh

 

 

On Wednesday, February 6, 2019, 10:09:11 AM PST, Léo FERLIN SUTTON 
 wrote: 

 

 

Hello !

 

I am having a recurrent problem when trying to bootstrap a few new nodes.

 

Some general info : 

*   I am running cassandra 3.0.17
*   We have about 30 nodes in our cluster
*   All healthy nodes have between 60% and 90% used disk space on 
/var/lib/cassandra

So I create a new node and let auto_bootstrap do its job. After a few days the 
bootstrapping node stops streaming new data but is still not a member of the 
cluster.

 

`nodetool status` says the node is still joining.

 

When this happens I run `nodetool bootstrap resume`. This usually ends in one 
of two ways:

1.  The node fills up to 100% disk space and crashes.
2.  The bootstrap resume finishes with errors

When I look at `nodetool netstats -H` it looks like `bootstrap resume` does 
not resume but restarts a full transfer of all data from every node.

 

This is the output I get from `nodetool bootstrap resume`:

[2019-02-06 01:39:14,369] received file 
/var/lib/cassandra/raw/raw_17930-d7cc0590230d11e9bc0af381b0ee7ac6/mc-225-big-Data.db
 (progress: 2113%)

[2019-02-06 01:39:16,821] received file 
/var/lib/cassandra/data/system_distributed/repair_history-759fffad624b318180eefa9a52d1f627/mc-88-big-Data.db
 (progress: 2113%)

[2019-02-06 01:39:17,003] received file 
/var/lib/cassandra/data/system_distributed/repair_history-759fffad624b318180eefa9a52d1f627/mc-89-big-Data.db
 (progress: 2113%)

[2019-02-06 01:39:17,032] session with /10.16.XX.YYY complete (progress: 2113%)

[2019-02-06 01:41:15,160] received file 
/var/lib/cassandra/raw/raw_17930-d7cc0590230d11e9bc0af381b0ee7ac6/mc-220-big-Data.db
 (progress: 2113%)

[2019-02-06 01:42:02,864] received file 
/var/lib/cassandra/raw/raw_17930-d7cc0590230d11e9bc0af381b0ee7ac6/mc-226-big-Data.db
 (progress: 2113%)

[2019-02-06 01:42:09,284] received file 
/var/lib/cassandra/raw/raw_17930-d7cc0590230d11e9bc0af381b0ee7ac6/mc-227-big-Data.db
 (progress: 2113%)

[2019-02-06 01:42:10,522] received file 
/var/lib/cassandra/raw/raw_17930-d7cc0590230d11e9bc0af381b0ee7ac6/mc-228-big-Data.db
 (progress: 2113%)

[2019-02-06 01:42:10,622] received file 
/var/lib/cassandra/raw/raw_17930-d7cc0590230d11e9bc0af381b0ee7ac6/mc-229-big-Data.db
 (progress: 2113%)

[2019-02-06 01:42:11,925] received file 
/var/lib/cassandra/data/system_distributed/repair_history-759fffad624b318180eefa9a52d1f627/mc-90-big-Data.db
 (progress: 2114%)

[2019-02-06 01:42:14,887] received file 

RE: SASI queries- cqlsh vs java driver

2019-02-07 Thread Kenneth Brotman
Peter,

 

Sounds like you may need a different architecture; perhaps something like Presto 
or Kafka as part of the solution.  If the data from the legacy system is in the 
wrong shape for Cassandra, it’s an ETL problem: you’d have to transform the data 
you want to use with Cassandra so that a proper Cassandra data model can be used.

 

From: Peter Heitman [mailto:pe...@heitman.us] 
Sent: Wednesday, February 06, 2019 10:05 PM
To: user@cassandra.apache.org
Subject: Re: SASI queries- cqlsh vs java driver

 

Yes, I have read the material. The problem is that the application has a query 
facility where the user can type in "(A = foo AND B = bar) OR C = chex", where 
A, B, and C are from a defined list of terms, many of which are columns in the 
mytable below, while others are from other tables. This query facility was 
implemented and shipped years before we decided to move to Cassandra.

On Thu, Feb 7, 2019, 8:21 AM Kenneth Brotman  
wrote:

The problem is you’re not using a query first design.  I would recommend first 
reading chapter 5 of Cassandra: The Definitive Guide by Jeff Carpenter and Eben 
Hewitt.  It’s available free online at this link 
<https://books.google.com/books?id=uW-PDAAAQBAJ=PA79=PA79=jeff+carpenter+chapter+5=bl=58bUYyNM-J=ACfU3U22U58-QPlz6kzo0zziNF-bP30l4Q=en=X=2ahUKEwi0n-nWzajgAhXnHzQIHf6jBJIQ6AEwAXoECAgQAQ#v=onepage=jeff%20carpenter%20chapter%205=false>
 .

 

Kenneth Brotman

 

From: Peter Heitman [mailto:pe...@heitman.us] 
Sent: Wednesday, February 06, 2019 6:33 PM


To: user@cassandra.apache.org
Subject: Re: SASI queries- cqlsh vs java driver

 

Yes, I "know" that allow filtering is a sign of a (possibly fatal) inefficient 
data model. I haven't figured out how to do it correctly yet.

On Thu, Feb 7, 2019, 7:59 AM Kenneth Brotman  
wrote:

Exactly.  When you design your data model correctly you shouldn’t have to use 
ALLOW FILTERING in the queries.  That is not recommended.

 

Kenneth Brotman

 

From: Peter Heitman [mailto:pe...@heitman.us] 
Sent: Wednesday, February 06, 2019 6:09 PM
To: user@cassandra.apache.org
Subject: Re: SASI queries- cqlsh vs java driver

 

You are completely right! My problem is that I am trying to port code for SQL 
to CQL for an application that provides the user with a relatively general 
search facility. The original implementation didn't worry about secondary 
indexes - it just took advantage of the ability to create arbitrarily complex 
queries with inner joins, left joins, etc. I am reimplementing it to create a 
parse tree of CQL queries and doing the ANDs and ORs in the application. Of 
course once I get enough of this implemented I will have to load up the table 
with a large data set and see if it gives acceptable performance for our use 
case. 
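A query-first alternative to the SASI + ALLOW FILTERING approach is one denormalized table per search term, written at the same time as the base row. A hypothetical sketch (keyspace, table, and column choices are placeholders based on the schema quoted below):

```shell
# One lookup table per query pattern, instead of a SASI index plus
# ALLOW FILTERING; names are illustrative only.
cqlsh -e "
  CREATE TABLE IF NOT EXISTS mykeyspace.mytable_by_serial (
    serial        text,
    id            text,
    sql_id        bigint,
    type          text,
    cpe_id        text,
    product_class text,
    manufacturer  text,
    sw_version    text,
    PRIMARY KEY (serial, id)
  );"
# The search then becomes a plain partition read, no filtering needed:
cqlsh -e "SELECT sql_id, type, cpe_id FROM mykeyspace.mytable_by_serial
          WHERE serial = 'ABC123';"
```

The application writes to the base table and each lookup table together (e.g. in a logged batch), trading write amplification for predictable reads.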

On Wed, Feb 6, 2019, 8:52 PM Kenneth Brotman  
wrote:

Isn’t that a lot of SASI indexes for one table?  Could you denormalize more, to 
reduce both columns per table and SASI indexes per table?  Eight SASI indexes 
on one table seems like a lot.

 

Kenneth Brotman

 

From: Peter Heitman [mailto:pe...@heitman.us] 
Sent: Tuesday, February 05, 2019 6:59 PM
To: user@cassandra.apache.org
Subject: Re: SASI queries- cqlsh vs java driver

 

The table and secondary indexes look generally like this Note that I have 
changed the names of many of the columns to be generic since they aren't 
important to the question as far as I know. I left the actual names for those 
columns that I've created SASI indexes for. The query I use to try to create a 
PreparedStatement is:

 

SELECT sql_id, type, cpe_id, serial, product_class, manufacturer, sw_version 
FROM mytable WHERE serial IN :v0 LIMIT :limit0 ALLOW FILTERING

 

the schema cql statements are:

 

CREATE TABLE IF NOT EXISTS mykeyspace.mytable ( 

  id text,

  sql_id bigint,

  cpe_id text,

  sw_version text,

  hw_version text,

  manufacturer text,

  product_class text,

  manufacturer_oui text,

  description text,

  periodic_inform_interval text,

  restricted_mode_enabled text,

  restricted_mode_reason text,

  type text,

  model_name text,

  serial text,

  mac text,

   text,

  generic0 timestamp, 

  household_id text,

  generic1 int, 

  generic2 text,

  generic3 text,

  generic4 int,

  generic5 int,

  generic6 text,

  generic7 text,

  generic8 text,

  generic9 text,

  generic10 text,

  generic11 timestamp,

  generic12 text,

  generic13 text,

  generic14 timestamp,

  generic15 text,

  generic16 text,

  generic17 text,

  generic18 text,

  generic19 text,

  generic20 text,

  generic21 text,

  generic22 text,

  generic23 text,

  generic24 text,

  generic25 text,

  generic26 text,

  generic27 text,

  generic28 int,

  generic29 int,

  generic30 text,

  generic31 text,

  generic32 text,

  generic33 text,

  generic34 text,

  generic35 int,

  generic36 int,

  generic37 int,

  generic38 int,

  generic39 text,

  generic40 text,

  generic41 text,

  generic

RE: SASI queries- cqlsh vs java driver

2019-02-06 Thread Kenneth Brotman
The problem is you’re not using a query first design.  I would recommend first 
reading chapter 5 of Cassandra: The Definitive Guide by Jeff Carpenter and Eben 
Hewitt.  It’s available free online at this link 
<https://books.google.com/books?id=uW-PDAAAQBAJ=PA79=PA79=jeff+carpenter+chapter+5=bl=58bUYyNM-J=ACfU3U22U58-QPlz6kzo0zziNF-bP30l4Q=en=X=2ahUKEwi0n-nWzajgAhXnHzQIHf6jBJIQ6AEwAXoECAgQAQ#v=onepage=jeff%20carpenter%20chapter%205=false>
 .

 

Kenneth Brotman

 

From: Peter Heitman [mailto:pe...@heitman.us] 
Sent: Wednesday, February 06, 2019 6:33 PM
To: user@cassandra.apache.org
Subject: Re: SASI queries- cqlsh vs java driver

 

Yes, I "know" that allow filtering is a sign of a (possibly fatal) inefficient 
data model. I haven't figured out how to do it correctly yet.

On Thu, Feb 7, 2019, 7:59 AM Kenneth Brotman  
wrote:

Exactly.  When you design your data model correctly you shouldn’t have to use 
ALLOW FILTERING in the queries.  That is not recommended.

 

Kenneth Brotman

 

From: Peter Heitman [mailto:pe...@heitman.us] 
Sent: Wednesday, February 06, 2019 6:09 PM
To: user@cassandra.apache.org
Subject: Re: SASI queries- cqlsh vs java driver

 

You are completely right! My problem is that I am trying to port code for SQL 
to CQL for an application that provides the user with a relatively general 
search facility. The original implementation didn't worry about secondary 
indexes - it just took advantage of the ability to create arbitrarily complex 
queries with inner joins, left joins, etc. I am reimplementing it to create a 
parse tree of CQL queries and doing the ANDs and ORs in the application. Of 
course once I get enough of this implemented I will have to load up the table 
with a large data set and see if it gives acceptable performance for our use 
case. 

On Wed, Feb 6, 2019, 8:52 PM Kenneth Brotman  
wrote:

Isn’t that a lot of SASI indexes for one table?  Could you denormalize more, to 
reduce both columns per table and SASI indexes per table?  Eight SASI indexes 
on one table seems like a lot.

 

Kenneth Brotman

 

From: Peter Heitman [mailto:pe...@heitman.us] 
Sent: Tuesday, February 05, 2019 6:59 PM
To: user@cassandra.apache.org
Subject: Re: SASI queries- cqlsh vs java driver

 

The table and secondary indexes look generally like this. Note that I have 
changed the names of many of the columns to be generic since they aren't 
important to the question as far as I know. I left the actual names for those 
columns that I've created SASI indexes for. The query I use to try to create a 
PreparedStatement is:

 

SELECT sql_id, type, cpe_id, serial, product_class, manufacturer, sw_version 
FROM mytable WHERE serial IN :v0 LIMIT :limit0 ALLOW FILTERING

 

the schema cql statements are:

 

CREATE TABLE IF NOT EXISTS mykeyspace.mytable ( 

  id text,

  sql_id bigint,

  cpe_id text,

  sw_version text,

  hw_version text,

  manufacturer text,

  product_class text,

  manufacturer_oui text,

  description text,

  periodic_inform_interval text,

  restricted_mode_enabled text,

  restricted_mode_reason text,

  type text,

  model_name text,

  serial text,

  mac text,

   text,

  generic0 timestamp, 

  household_id text,

  generic1 int, 

  generic2 text,

  generic3 text,

  generic4 int,

  generic5 int,

  generic6 text,

  generic7 text,

  generic8 text,

  generic9 text,

  generic10 text,

  generic11 timestamp,

  generic12 text,

  generic13 text,

  generic14 timestamp,

  generic15 text,

  generic16 text,

  generic17 text,

  generic18 text,

  generic19 text,

  generic20 text,

  generic21 text,

  generic22 text,

  generic23 text,

  generic24 text,

  generic25 text,

  generic26 text,

  generic27 text,

  generic28 int,

  generic29 int,

  generic30 text,

  generic31 text,

  generic32 text,

  generic33 text,

  generic34 text,

  generic35 int,

  generic36 int,

  generic37 int,

  generic38 int,

  generic39 text,

  generic40 text,

  generic41 text,

  generic42 text,

  generic43 text,

  generic44 text,

  generic45 text,

  PRIMARY KEY (id)

);

 

CREATE INDEX IF NOT EXISTS bv_sql_id_idx ON mykeyspace.mytable (sql_id);

 

CREATE CUSTOM INDEX IF NOT EXISTS bv_serial_idx ON mykeyspace.mytable (serial)

   USING 'org.apache.cassandra.index.sasi.SASIIndex'

   WITH OPTIONS = {'mode': 'CONTAINS', 'analyzer_class': 
'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer', 
'case_sensitive': 'false'};

 

CREATE CUSTOM INDEX IF NOT EXISTS bv_cpe_id_idx ON mykeyspace.mytable (cpe_id)

   USING 'org.apache.cassandra.index.sasi.SASIIndex'

   WITH OPTIONS = {'mode': 'CONTAINS', 'analyzer_class': 
'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer', 
'case_sensitive': 'false'};

 

CREATE CUSTOM INDEX IF NOT EXISTS bv_mac_idx ON mykeyspace.mytable (mac)

   USING 'org.apache.cassandra.index.sasi.SASIIndex'

   WITH OPTIONS = {'mode': 'CONTAINS',

RE: SASI queries- cqlsh vs java driver

2019-02-06 Thread Kenneth Brotman
Exactly.  When you design your data model correctly you shouldn’t have to use 
ALLOW FILTERING in the queries.  That is not recommended.

 

Kenneth Brotman

 

From: Peter Heitman [mailto:pe...@heitman.us] 
Sent: Wednesday, February 06, 2019 6:09 PM
To: user@cassandra.apache.org
Subject: Re: SASI queries- cqlsh vs java driver

 

You are completely right! My problem is that I am trying to port code for SQL 
to CQL for an application that provides the user with a relatively general 
search facility. The original implementation didn't worry about secondary 
indexes - it just took advantage of the ability to create arbitrarily complex 
queries with inner joins, left joins, etc. I am reimplementing it to create a 
parse tree of CQL queries and doing the ANDs and ORs in the application. Of 
course once I get enough of this implemented I will have to load up the table 
with a large data set and see if it gives acceptable performance for our use 
case. 

On Wed, Feb 6, 2019, 8:52 PM Kenneth Brotman  
wrote:

Isn’t that a lot of SASI indexes for one table?  Could you denormalize more, to 
reduce both columns per table and SASI indexes per table?  Eight SASI indexes 
on one table seems like a lot.

 

Kenneth Brotman

 

From: Peter Heitman [mailto:pe...@heitman.us] 
Sent: Tuesday, February 05, 2019 6:59 PM
To: user@cassandra.apache.org
Subject: Re: SASI queries- cqlsh vs java driver

 

The table and secondary indexes look generally like this. Note that I have 
changed the names of many of the columns to be generic since they aren't 
important to the question as far as I know. I left the actual names for those 
columns that I've created SASI indexes for. The query I use to try to create a 
PreparedStatement is:

 

SELECT sql_id, type, cpe_id, serial, product_class, manufacturer, sw_version 
FROM mytable WHERE serial IN :v0 LIMIT :limit0 ALLOW FILTERING

 

the schema cql statements are:

 

CREATE TABLE IF NOT EXISTS mykeyspace.mytable ( 

  id text,

  sql_id bigint,

  cpe_id text,

  sw_version text,

  hw_version text,

  manufacturer text,

  product_class text,

  manufacturer_oui text,

  description text,

  periodic_inform_interval text,

  restricted_mode_enabled text,

  restricted_mode_reason text,

  type text,

  model_name text,

  serial text,

  mac text,

   text,

  generic0 timestamp, 

  household_id text,

  generic1 int, 

  generic2 text,

  generic3 text,

  generic4 int,

  generic5 int,

  generic6 text,

  generic7 text,

  generic8 text,

  generic9 text,

  generic10 text,

  generic11 timestamp,

  generic12 text,

  generic13 text,

  generic14 timestamp,

  generic15 text,

  generic16 text,

  generic17 text,

  generic18 text,

  generic19 text,

  generic20 text,

  generic21 text,

  generic22 text,

  generic23 text,

  generic24 text,

  generic25 text,

  generic26 text,

  generic27 text,

  generic28 int,

  generic29 int,

  generic30 text,

  generic31 text,

  generic32 text,

  generic33 text,

  generic34 text,

  generic35 int,

  generic36 int,

  generic37 int,

  generic38 int,

  generic39 text,

  generic40 text,

  generic41 text,

  generic42 text,

  generic43 text,

  generic44 text,

  generic45 text,

  PRIMARY KEY (id)

);

 

CREATE INDEX IF NOT EXISTS bv_sql_id_idx ON mykeyspace.mytable (sql_id);

 

CREATE CUSTOM INDEX IF NOT EXISTS bv_serial_idx ON mykeyspace.mytable (serial)

   USING 'org.apache.cassandra.index.sasi.SASIIndex'

   WITH OPTIONS = {'mode': 'CONTAINS', 'analyzer_class': 
'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer', 
'case_sensitive': 'false'};

 

CREATE CUSTOM INDEX IF NOT EXISTS bv_cpe_id_idx ON mykeyspace.mytable (cpe_id)

   USING 'org.apache.cassandra.index.sasi.SASIIndex'

   WITH OPTIONS = {'mode': 'CONTAINS', 'analyzer_class': 
'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer', 
'case_sensitive': 'false'};

 

CREATE CUSTOM INDEX IF NOT EXISTS bv_mac_idx ON mykeyspace.mytable (mac)

   USING 'org.apache.cassandra.index.sasi.SASIIndex'

   WITH OPTIONS = {'mode': 'CONTAINS', 'analyzer_class': 
'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer', 
'case_sensitive': 'false'};

 

CREATE CUSTOM INDEX IF NOT EXISTS bv_manufacturer_idx ON mykeyspace.mytable 
(manufacturer)

   USING 'org.apache.cassandra.index.sasi.SASIIndex'

   WITH OPTIONS = {'mode': 'CONTAINS', 'analyzer_class': 
'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer', 
'case_sensitive': 'false'};

 

CREATE CUSTOM INDEX IF NOT EXISTS bv_manufacturer_oui_idx ON mykeyspace.mytable 
(manufacturer_oui)

   USING 'org.apache.cassandra.index.sasi.SASIIndex'

   WITH OPTIONS = {'mode': 'CONTAINS', 'analyzer_class': 
'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer', 
'case_sensitive': 'false'};

 

CREATE CUSTOM INDEX IF NOT EXISTS bv_hw_version_idx ON mykeyspace.mytable 
(hw_version)

   USING 'org.apache.cassandra.index.sasi.SASIIndex'

RE: Bootstrap keeps failing

2019-02-06 Thread Kenneth Brotman
Not sure off hand why that is happening but could you try bootstrapping that 
node from scratch again or try a different new node?
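
For reference, a from-scratch re-bootstrap usually means wiping the joining node's state first; a rough sketch, assuming a default package install (paths and service name may differ on your systems):

```shell
# Hedged runbook sketch for restarting a stuck bootstrap from scratch.
# Only safe on a node that never finished joining (UJ in `nodetool status`).
sudo service cassandra stop
sudo rm -rf /var/lib/cassandra/data/* \
            /var/lib/cassandra/commitlog/* \
            /var/lib/cassandra/saved_caches/*
sudo service cassandra start    # with auto_bootstrap on, streaming restarts cleanly
```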

 

Kenneth Brotman

 

From: Léo FERLIN SUTTON [mailto:lfer...@mailjet.com.INVALID] 
Sent: Wednesday, February 06, 2019 9:15 AM
To: user@cassandra.apache.org
Subject: Bootstrap keeps failing

 

Hello !

 

I am having a recurrent problem when trying to bootstrap a few new nodes.

 

Some general info : 

*   I am running cassandra 3.0.17
*   We have about 30 nodes in our cluster
*   All healthy nodes have between 60% to 90% used disk space on 
/var/lib/cassandra

So I create a new node and let auto_bootstrap do its job. After a few days the 
bootstrapping node stops streaming new data but is still not a member of the 
cluster.

 

`nodetool status` says the node is still joining.

 

When this happens I run `nodetool bootstrap resume`. This usually ends up in 
two different ways :

1.  The node fills up to 100% disk space and crashes.
2.  The bootstrap resume finishes with errors

When I look at `nodetool netstats -H` it looks like `bootstrap resume` does 
not resume but restarts a full transfer of every data from every node.

 

This is the output I get from `nodetool bootstrap resume`:

[2019-02-06 01:39:14,369] received file /var/lib/cassandra/raw/raw_17930-d7cc0590230d11e9bc0af381b0ee7ac6/mc-225-big-Data.db (progress: 2113%)
[2019-02-06 01:39:16,821] received file /var/lib/cassandra/data/system_distributed/repair_history-759fffad624b318180eefa9a52d1f627/mc-88-big-Data.db (progress: 2113%)
[2019-02-06 01:39:17,003] received file /var/lib/cassandra/data/system_distributed/repair_history-759fffad624b318180eefa9a52d1f627/mc-89-big-Data.db (progress: 2113%)
[2019-02-06 01:39:17,032] session with /10.16.XX.YYY complete (progress: 2113%)
[2019-02-06 01:41:15,160] received file /var/lib/cassandra/raw/raw_17930-d7cc0590230d11e9bc0af381b0ee7ac6/mc-220-big-Data.db (progress: 2113%)
[2019-02-06 01:42:02,864] received file /var/lib/cassandra/raw/raw_17930-d7cc0590230d11e9bc0af381b0ee7ac6/mc-226-big-Data.db (progress: 2113%)
[2019-02-06 01:42:09,284] received file /var/lib/cassandra/raw/raw_17930-d7cc0590230d11e9bc0af381b0ee7ac6/mc-227-big-Data.db (progress: 2113%)
[2019-02-06 01:42:10,522] received file /var/lib/cassandra/raw/raw_17930-d7cc0590230d11e9bc0af381b0ee7ac6/mc-228-big-Data.db (progress: 2113%)
[2019-02-06 01:42:10,622] received file /var/lib/cassandra/raw/raw_17930-d7cc0590230d11e9bc0af381b0ee7ac6/mc-229-big-Data.db (progress: 2113%)
[2019-02-06 01:42:11,925] received file /var/lib/cassandra/data/system_distributed/repair_history-759fffad624b318180eefa9a52d1f627/mc-90-big-Data.db (progress: 2114%)
[2019-02-06 01:42:14,887] received file /var/lib/cassandra/data/system_distributed/repair_history-759fffad624b318180eefa9a52d1f627/mc-91-big-Data.db (progress: 2114%)
[2019-02-06 01:42:14,980] session with /10.16.XX.ZZZ complete (progress: 2114%)
[2019-02-06 01:42:14,980] Stream failed
[2019-02-06 01:42:14,982] Error during bootstrap: Stream failed
[2019-02-06 01:42:14,982] Resume bootstrap complete

  

The bootstrap `progress` goes way over 100% and eventually fails.

 

 

Right now I have a node with this output from `nodetool status` : 

`UJ  10.16.XX.YYY  2.93 TB256  ? 
5788f061-a3c0-46af-b712-ebeecd397bf7  c`

 

It is almost filled with data, yet if I look at `nodetool netstats` :

Receiving 480 files, 325.39 GB total. Already received 5 files, 68.32 MB total
Receiving 499 files, 328.96 GB total. Already received 1 files, 1.32 GB total
Receiving 506 files, 345.33 GB total. Already received 6 files, 24.19 MB total
Receiving 362 files, 206.73 GB total. Already received 7 files, 34 MB total
Receiving 424 files, 281.25 GB total. Already received 1 files, 1.3 GB total
Receiving 581 files, 349.26 GB total. Already received 8 files, 45.96 MB total
Receiving 443 files, 337.26 GB total. Already received 6 files, 96.15 MB total
Receiving 424 files, 275.23 GB total. Already received 5 files, 42.67 MB total

 

It is trying to pull all the data again.
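
For what it's worth, you can total those `Receiving` lines to compare what the node expects against what has actually arrived. A small sketch (assumes the standard one-line `netstats` format and only MB/GB units; other units would need extra cases):

```shell
# Sum expected vs received sizes from `nodetool netstats` "Receiving" lines.
# Assumes the standard line shape:
#   Receiving N files, X GB total. Already received M files, Y MB total
sum_netstats() {
  awk '$1 == "Receiving" {
    want = $4; if ($5 == "MB") want /= 1024    # normalize to GB
    got = $11; if ($12 == "MB") got /= 1024
    w += want; g += got
  }
  END { printf "expected %.2f GB, received %.2f GB\n", w, g }'
}
```

Piping `nodetool netstats -H` through `sum_netstats` then prints a single summary line per run.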

 

Am I missing something about the way `nodetool bootstrap resume` is supposed to 
be used ?

 

Regards,

 

Leo

 



RE: Maximum memory usage

2019-02-06 Thread Kenneth Brotman
Can you give us the “nodetool tablehistograms” output?

 

Kenneth Brotman

 

From: Rahul Reddy [mailto:rahulreddy1...@gmail.com] 
Sent: Wednesday, February 06, 2019 6:19 AM
To: user@cassandra.apache.org
Subject: Maximum memory usage

 

Hello,

 

I see maximum memory usage alerts in my system.log a couple of times a day, logged as 
INFO. So far I haven't seen any issue with the db. Why are those messages logged in 
system.log? Do they have any impact on reads/writes? And what needs to be looked at?

 

INFO  [RMI TCP Connection(170917)-127.0.0.1] 2019-02-05 23:15:47,408 
NoSpamLogger.java:91 - Maximum memory usage reached (512.000MiB), cannot 
allocate chunk of 1.000MiB

 

Thanks in advance
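
For context: that INFO line comes from Cassandra's off-heap chunk cache (buffer pool), whose cap is `file_cache_size_in_mb` (default 512). Reaching the cap just means later reads bypass the cache; it is not an error. If you have spare RAM, raising it is the usual knob — a sketch, not a sizing recommendation:

```yaml
# cassandra.yaml
# Cap for the off-heap chunk cache that backs sstable reads; the
# "Maximum memory usage reached (512.000MiB)" message fires when it is full.
file_cache_size_in_mb: 1024
```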



RE: Two datacenters with one cassandra node in each datacenter

2019-02-06 Thread Kenneth Brotman
Hi Kunal,

 

The short answer is absolutely not; that’s not what Cassandra is for.  
Cassandra is a distributed database for when you have too much data for one 
machine.

 

Kenneth Brotman

 

From: Kunal [mailto:kunal.v...@gmail.com] 
Sent: Wednesday, February 06, 2019 3:47 PM
To: user@cassandra.apache.org
Subject: Two datacenters with one cassandra node in each datacenter

 

HI All,

 

I need some recommendation on using two datacenters with one node in each 
datacenter. 

In our organization, we are trying to have two Cassandra datacenters with only 1 
node on each side. From the preliminary investigation, I see replication is 
happening but I want to know if we can use this deployment in production? Will 
there be any performance issue with replication ?

We have already setup 2 datacenters with one node on each datacenter and 
replication is working fine. 

Can you please let me know if this kind of setup is recommended for production 
deployment?
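
For context, the replication setup being described would look roughly like this (keyspace and DC names hypothetical):

```sql
-- One replica per datacenter; with a single node per DC this is RF=1 in
-- each DC, so losing either node loses the only copy in that DC.
CREATE KEYSPACE IF NOT EXISTS myapp
WITH replication = {
    'class': 'NetworkTopologyStrategy',
    'DC1': 1,
    'DC2': 1
};
```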

Thanks in anticipation. 

 

Regards,

Kunal Vaid



RE: SASI queries- cqlsh vs java driver

2019-02-06 Thread Kenneth Brotman
Isn’t that a lot of SASI indexes for one table?  Could you denormalize more to 
reduce both columns per table and SASI indexes per table?  Eight SASI indexes 
on one table seems like a lot.

 

Kenneth Brotman

 

From: Peter Heitman [mailto:pe...@heitman.us] 
Sent: Tuesday, February 05, 2019 6:59 PM
To: user@cassandra.apache.org
Subject: Re: SASI queries- cqlsh vs java driver

 

The table and secondary indexes look generally like this. Note that I have 
changed the names of many of the columns to be generic since they aren't 
important to the question as far as I know. I left the actual names for those 
columns that I've created SASI indexes for. The query I use to try to create a 
PreparedStatement is:

 

SELECT sql_id, type, cpe_id, serial, product_class, manufacturer, sw_version 
FROM mytable WHERE serial IN :v0 LIMIT :limit0 ALLOW FILTERING

 

the schema cql statements are:

 

CREATE TABLE IF NOT EXISTS mykeyspace.mytable ( 

  id text,

  sql_id bigint,

  cpe_id text,

  sw_version text,

  hw_version text,

  manufacturer text,

  product_class text,

  manufacturer_oui text,

  description text,

  periodic_inform_interval text,

  restricted_mode_enabled text,

  restricted_mode_reason text,

  type text,

  model_name text,

  serial text,

  mac text,

   text,

  generic0 timestamp, 

  household_id text,

  generic1 int, 

  generic2 text,

  generic3 text,

  generic4 int,

  generic5 int,

  generic6 text,

  generic7 text,

  generic8 text,

  generic9 text,

  generic10 text,

  generic11 timestamp,

  generic12 text,

  generic13 text,

  generic14 timestamp,

  generic15 text,

  generic16 text,

  generic17 text,

  generic18 text,

  generic19 text,

  generic20 text,

  generic21 text,

  generic22 text,

  generic23 text,

  generic24 text,

  generic25 text,

  generic26 text,

  generic27 text,

  generic28 int,

  generic29 int,

  generic30 text,

  generic31 text,

  generic32 text,

  generic33 text,

  generic34 text,

  generic35 int,

  generic36 int,

  generic37 int,

  generic38 int,

  generic39 text,

  generic40 text,

  generic41 text,

  generic42 text,

  generic43 text,

  generic44 text,

  generic45 text,

  PRIMARY KEY (id)

);

 

CREATE INDEX IF NOT EXISTS bv_sql_id_idx ON mykeyspace.mytable (sql_id);

 

CREATE CUSTOM INDEX IF NOT EXISTS bv_serial_idx ON mykeyspace.mytable (serial)

   USING 'org.apache.cassandra.index.sasi.SASIIndex'

   WITH OPTIONS = {'mode': 'CONTAINS', 'analyzer_class': 
'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer', 
'case_sensitive': 'false'};

 

CREATE CUSTOM INDEX IF NOT EXISTS bv_cpe_id_idx ON mykeyspace.mytable (cpe_id)

   USING 'org.apache.cassandra.index.sasi.SASIIndex'

   WITH OPTIONS = {'mode': 'CONTAINS', 'analyzer_class': 
'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer', 
'case_sensitive': 'false'};

 

CREATE CUSTOM INDEX IF NOT EXISTS bv_mac_idx ON mykeyspace.mytable (mac)

   USING 'org.apache.cassandra.index.sasi.SASIIndex'

   WITH OPTIONS = {'mode': 'CONTAINS', 'analyzer_class': 
'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer', 
'case_sensitive': 'false'};

 

CREATE CUSTOM INDEX IF NOT EXISTS bv_manufacturer_idx ON mykeyspace.mytable 
(manufacturer)

   USING 'org.apache.cassandra.index.sasi.SASIIndex'

   WITH OPTIONS = {'mode': 'CONTAINS', 'analyzer_class': 
'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer', 
'case_sensitive': 'false'};

 

CREATE CUSTOM INDEX IF NOT EXISTS bv_manufacturer_oui_idx ON mykeyspace.mytable 
(manufacturer_oui)

   USING 'org.apache.cassandra.index.sasi.SASIIndex'

   WITH OPTIONS = {'mode': 'CONTAINS', 'analyzer_class': 
'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer', 
'case_sensitive': 'false'};

 

CREATE CUSTOM INDEX IF NOT EXISTS bv_hw_version_idx ON mykeyspace.mytable 
(hw_version)

   USING 'org.apache.cassandra.index.sasi.SASIIndex'

   WITH OPTIONS = {'mode': 'CONTAINS', 'analyzer_class': 
'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer', 
'case_sensitive': 'false'};

 

CREATE CUSTOM INDEX IF NOT EXISTS bv_sw_version_idx ON mykeyspace.mytable 
(sw_version)

   USING 'org.apache.cassandra.index.sasi.SASIIndex'

   WITH OPTIONS = {'mode': 'CONTAINS', 'analyzer_class': 
'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer', 
'case_sensitive': 'false'};

 

CREATE CUSTOM INDEX IF NOT EXISTS bv_household_id_idx ON mykeyspace.mytable 
(household_id)

   USING 'org.apache.cassandra.index.sasi.SASIIndex'

   WITH OPTIONS = {'mode': 'CONTAINS', 'analyzer_class': 
'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer', 
'case_sensitive': 'false'};

 

 

On Tue, Feb 5, 2019 at 3:33 PM Oleksandr Petrov  
wrote:

Could you post full table schema (names obfuscated, if required) with index 
creation statements and queries?

 

On Mon, Feb 4, 2019 at 10:04 AM Jacques-Henri Berthemet 
 wrote:

I’m

RE: Help with sudden spike in read requests

2019-02-01 Thread Kenneth Brotman
If it’s a legacy write table why does it write 10% of the time?  Maybe it’s the 
design of the big legacy table you mentioned.  It could be so many things.  

 

Is it the same time of day? 

Same days of the week or month?  

Are there analytics run at that time?  

What are you using for monitoring and how did you find out it was happening?  

Is this a DSE cluster or OSS Cassandra cluster?

 

Kenneth Brotman

 

From: Subroto Barua [mailto:sbarua...@yahoo.com.INVALID] 
Sent: Friday, February 01, 2019 10:48 AM
To: user@cassandra.apache.org
Subject: Re: Help with sudden spike in read requests

 

We migrated one of the application from on-Prem to aws; the queries are very 
light, more like registration info;

 

Queries from the new app are via a pk of data type “text”, no clustering columns 
(this table has about 200 rows); however the legacy table (more like a reference 
table) has several million rows, about 800 sstables per node, using LCS (9:1 
read-write ratio)

Subroto 


On Feb 1, 2019, at 10:33 AM, Kenneth Brotman  
wrote:

Do you have that many queries?  You could just review them and your data model 
to see if there was an error of some kind.  How long has it been happening?  
What changed since it started happening?

 

Kenneth Brotman

 

From: Subroto Barua [mailto:sbarua...@yahoo.com.INVALID] 
Sent: Friday, February 01, 2019 10:13 AM
To: user@cassandra.apache.org
Subject: Re: Help with sudden spike in read requests

 

Vnode is 256

C*: 3.0.15 on m4.4xlarge gp2 vol

 

There are 2 more DCs on bare metal (raid 10 and older machines) attached to 
this cluster and we have not seen this behavior on on-prem servers 

 

If this event is triggered by some bad query/queries, what is the best way to 
trap it?

Subroto 
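
One low-overhead way to trap occasional bad queries is probabilistic tracing; a sketch (the 0.1% rate is an arbitrary starting point — keep it low, since traces are written to `system_traces`):

```shell
# Trace roughly 1 in 1000 requests on this node.
nodetool settraceprobability 0.001

# Later, look for unusually long sessions (duration is in microseconds).
cqlsh -e "SELECT session_id, started_at, duration, request
          FROM system_traces.sessions LIMIT 50;"

# Turn tracing back off when done.
nodetool settraceprobability 0
```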


On Feb 1, 2019, at 8:55 AM, Kenneth Brotman  
wrote:

If you had a query that went across the partitions and especially if you had 
vNodes set high, that would do it.

 

Kenneth Brotman

 

From: Subroto Barua [mailto:sbarua...@yahoo.com.INVALID] 
Sent: Friday, February 01, 2019 8:45 AM
To: user@cassandra.apache.org
Subject: Help with sudden spike in read requests

 

In our production cluster, we observed sudden spike (over 160 MB/s) in read 
requests on *all* Cassandra nodes for a very short period (less than a min); 
this event happens few times a day.

 

I am not able to get to the bottom of this issue, nothing interesting in 
system.log or from app level; repair was not running

 

Does anyone have any thoughts on what could have triggered this event? Under 
what condition C* (if it is tied to c*) will trigger this type of event?

 

Thanks!

 

Subroto



RE: Help with sudden spike in read requests

2019-02-01 Thread Kenneth Brotman
Do you have that many queries?  You could just review them and your data model 
to see if there was an error of some kind.  How long has it been happening?  
What changed since it started happening?

 

Kenneth Brotman

 

From: Subroto Barua [mailto:sbarua...@yahoo.com.INVALID] 
Sent: Friday, February 01, 2019 10:13 AM
To: user@cassandra.apache.org
Subject: Re: Help with sudden spike in read requests

 

Vnode is 256

C*: 3.0.15 on m4.4xlarge gp2 vol

 

There are 2 more DCs on bare metal (raid 10 and older machines) attached to 
this cluster and we have not seen this behavior on on-prem servers 

 

If this event is triggered by some bad query/queries, what is the best way to 
trap it?

Subroto 


On Feb 1, 2019, at 8:55 AM, Kenneth Brotman  
wrote:

If you had a query that went across the partitions and especially if you had 
vNodes set high, that would do it.

 

Kenneth Brotman

 

From: Subroto Barua [mailto:sbarua...@yahoo.com.INVALID] 
Sent: Friday, February 01, 2019 8:45 AM
To: user@cassandra.apache.org
Subject: Help with sudden spike in read requests

 

In our production cluster, we observed sudden spike (over 160 MB/s) in read 
requests on *all* Cassandra nodes for a very short period (less than a min); 
this event happens few times a day.

 

I am not able to get to the bottom of this issue, nothing interesting in 
system.log or from app level; repair was not running

 

Does anyone have any thoughts on what could have triggered this event? Under 
what condition C* (if it is tied to c*) will trigger this type of event?

 

Thanks!

 

Subroto



RE: Help with sudden spike in read requests

2019-02-01 Thread Kenneth Brotman
If you had a query that went across the partitions and especially if you had 
vNodes set high, that would do it.

 

Kenneth Brotman

 

From: Subroto Barua [mailto:sbarua...@yahoo.com.INVALID] 
Sent: Friday, February 01, 2019 8:45 AM
To: user@cassandra.apache.org
Subject: Help with sudden spike in read requests

 

In our production cluster, we observed sudden spike (over 160 MB/s) in read 
requests on *all* Cassandra nodes for a very short period (less than a min); 
this event happens few times a day.

 

I am not able to get to the bottom of this issue, nothing interesting in 
system.log or from app level; repair was not running

 

Does anyone have any thoughts on what could have triggered this event? Under 
what condition C* (if it is tied to c*) will trigger this type of event?

 

Thanks!

 

Subroto



RE: Urgent Problem - Disk full

2018-04-04 Thread Kenneth Brotman
Agreed that you tend to add capacity to nodes or add nodes once you know you 
have no unneeded data in the cluster.

 

From: Alain RODRIGUEZ [mailto:arodr...@gmail.com] 
Sent: Wednesday, April 04, 2018 9:10 AM
To: user@cassandra.apache.org
Subject: Re: Urgent Problem - Disk full

 

Hi,

 

When the disks are full, here are the options I can think of depending on the 
situation and how 'full' the disk really is:

- Add capacity - Add a disk, use JBOD adding a second location folder for the 
sstables and move some of them around then restart Cassandra. Or add a new node.
- Reduce disk space used. Some options come to my mind to reduce space used:


1 - Clean tombstones if any (use sstablemetadata for example to check the 
number of tombstones). If you have some not being purged, my first guess would 
be to set 'unchecked_tombstone_compaction' to 'true' at the node level. Yet be 
aware that this will trigger some compactions, that before freeing space, start 
by taking some more temporary!
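
Alain mentions making the change at the node level (via JMX); the schema-level equivalent, which applies cluster-wide, is a compaction subproperty — a sketch using TWCS since that is the strategy in the related threads (keyspace/table names hypothetical):

```sql
-- Allow single-sstable tombstone compactions even when the sstable
-- overlaps others; applies to the whole cluster, unlike a JMX change.
ALTER TABLE myks.mytable
WITH compaction = {
    'class': 'TimeWindowCompactionStrategy',
    'unchecked_tombstone_compaction': 'true'
};
```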

 

If remaining space is really low on one node, you can choose to compact only 
the sstables with the highest tombstone ratio (after making the change above) 
that still fit in the disk space you have left. It can even be scripted. It 
worked for me in the past with disks 100% full. If you do so, you might have to 
disable/re-enable automatic compactions at key moments as well.

 

2 -  If you added nodes recently to the data center you can consider running a 
'nodetool cleanup', but here again, it will start by using more space for 
temporary sstables, and might have no positive impact if the node only owns 
data for its token ranges.

 

3 - Another common way to easily claim space is to clear snapshots that are not 
needed and might have been forgotten or taken by Cassandra: 'nodetool 
clearsnapshot'. This has no other risk than removing a useful backup.
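
The snapshot check can be sketched as (the tag name is a placeholder):

```shell
nodetool listsnapshots             # shows each snapshot and the space it holds
nodetool clearsnapshot -t mytag    # remove one named snapshot
nodetool clearsnapshot             # or remove all snapshots on this node
```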


4 - Delete data from this table or another table (effectively), directly 
removing the sstables indeed - as you use TWCS. If you don't need the data 
anyway.

 

5 - Truncate one of those other tables we tend to have that are written 'just 
in case' and actually never used and never read for months. It has been a 
powerful way out of this situation for me in the past too :). I would say: be 
sure that the disk space is used properly.

 

 

There is zero reason to believe a full repair would make this better and a lot 
of reason to believe it’ll make it worse

 

I second that too, just in case. Really, do not run a repair. The only thing it 
could do is bring more data to a node that really doesn't need it for now.

 

Finally, when this is behind you, the disk size is something you could consider 
monitoring as it is way easier to fix it when the disk is not completely full 
and it can be fixed preemptively. Usually, 20 to 50% free disk is 
recommended depending on your use case.

 

C*heers,

---

Alain Rodriguez - @arodream - al...@thelastpickle.com

France / Spain

 

The Last Pickle - Apache Cassandra Consulting

http://www.thelastpickle.com

 

2018-04-04 15:34 GMT+01:00 Kenneth Brotman <kenbrot...@yahoo.com.invalid>:

There's also the old snapshots to remove, which could account for a significant 
amount of disk space.


-Original Message-
From: Kenneth Brotman [mailto:kenbrot...@yahoo.com.INVALID]
Sent: Wednesday, April 04, 2018 7:28 AM
To: user@cassandra.apache.org
Subject: RE: Urgent Problem - Disk full

Jeff,

Just wondering: why wouldn't the answer be to:
1. move anything you want to archive to colder storage off the cluster,
2. nodetool cleanup
3. snapshot
4. use delete command to remove archived data.

Kenneth Brotman

-Original Message-
From: Jeff Jirsa [mailto:jji...@gmail.com]
Sent: Wednesday, April 04, 2018 7:10 AM
To: user@cassandra.apache.org
Subject: Re: Urgent Problem - Disk full

Yes, this works in TWCS.

Note though that if you have tombstone compaction subproperties set, there may 
be sstables with newer filesystem timestamps that actually hold older Cassandra 
data, in which case sstablemetadata can help finding the sstables with truly 
old timestamps

Also if you’ve expanded the cluster over time and you see an imbalance of disk 
usage on the oldest hosts, “nodetool cleanup” will likely free up some of that 
data
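
A rough sketch of using `sstablemetadata` to rank sstables by the age of their newest data before touching anything (paths are examples; the exact output label varies between versions, so check yours first):

```shell
# Print "max-timestamp path" per sstable, oldest data first.
for f in /var/lib/cassandra/data/myks/mytable-*/mc-*-big-Data.db; do
    ts=$(sstablemetadata "$f" 2>/dev/null | awk '/Maximum timestamp/ {print $3; exit}')
    printf '%s %s\n' "${ts:-unknown}" "$f"
done | sort -n | head -20
```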



--
Jeff Jirsa


> On Apr 4, 2018, at 4:32 AM, Jürgen Albersdorfer 
> <juergen.albersdor...@zweiradteile.net> wrote:
>
> Hi,
>
> I have an urgent Problem. - I will run out of disk space in near future.
> Largest Table is a Time-Series Table with TimeWindowCompactionStrategy (TWCS) 
> and default_time_to_live = 0
> Keyspace Replication Factor RF=3. I run C* Version 3.11.2
> We have grown the Cluster over time, so SSTable files have different Dates on 
> different Nodes.
>
> From Application Standpoint it would be safe to lose some of the oldest Data.
>
> Is it safe to delet

RE: Urgent Problem - Disk full

2018-04-04 Thread Kenneth Brotman
There's also the old snapshots to remove, which could account for a significant 
amount of disk space.

-Original Message-
From: Kenneth Brotman [mailto:kenbrot...@yahoo.com.INVALID] 
Sent: Wednesday, April 04, 2018 7:28 AM
To: user@cassandra.apache.org
Subject: RE: Urgent Problem - Disk full

Jeff,

Just wondering: why wouldn't the answer be to:
1. move anything you want to archive to colder storage off the cluster, 
2. nodetool cleanup
3. snapshot
4. use delete command to remove archived data.

Kenneth Brotman

-Original Message-
From: Jeff Jirsa [mailto:jji...@gmail.com] 
Sent: Wednesday, April 04, 2018 7:10 AM
To: user@cassandra.apache.org
Subject: Re: Urgent Problem - Disk full

Yes, this works in TWCS. 

Note though that if you have tombstone compaction subproperties set, there may 
be sstables with newer filesystem timestamps that actually hold older Cassandra 
data, in which case sstablemetadata can help finding the sstables with truly 
old timestamps

Also if you’ve expanded the cluster over time and you see an imbalance of disk 
usage on the oldest hosts, “nodetool cleanup” will likely free up some of that 
data



-- 
Jeff Jirsa


> On Apr 4, 2018, at 4:32 AM, Jürgen Albersdorfer 
> <juergen.albersdor...@zweiradteile.net> wrote:
> 
> Hi,
> 
> I have an urgent Problem. - I will run out of disk space in near future.
> Largest Table is a Time-Series Table with TimeWindowCompactionStrategy (TWCS) 
> and default_time_to_live = 0
> Keyspace Replication Factor RF=3. I run C* Version 3.11.2
> We have grown the Cluster over time, so SSTable files have different Dates on 
> different Nodes.
> 
> From Application Standpoint it would be safe to lose some of the oldest Data.
> 
> Is it safe to delete some of the oldest SSTable Files, which will no longer 
> get touched by TWCS Compaction any more, while Node is clean Shutdown? - And 
> doing so for one Node after another?
> 
> Or maybe there is a different way to free some disk space? - Any suggestions?
> 
> best regards
> Jürgen Albersdorfer
> 
> -
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org


-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org


-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



RE: Urgent Problem - Disk full

2018-04-04 Thread Kenneth Brotman
Jeff,

Just wondering: why wouldn't the answer be to:
1. move anything you want to archive to colder storage off the cluster, 
2. nodetool cleanup
3. snapshot
4. use delete command to remove archived data.

Kenneth Brotman

-Original Message-
From: Jeff Jirsa [mailto:jji...@gmail.com] 
Sent: Wednesday, April 04, 2018 7:10 AM
To: user@cassandra.apache.org
Subject: Re: Urgent Problem - Disk full

Yes, this works in TWCS. 

Note though that if you have tombstone compaction subproperties set, there may 
be sstables with newer filesystem timestamps that actually hold older Cassandra 
data, in which case sstablemetadata can help finding the sstables with truly 
old timestamps

Also if you’ve expanded the cluster over time and you see an imbalance of disk 
usage on the oldest hosts, “nodetool cleanup” will likely free up some of that 
data



-- 
Jeff Jirsa


> On Apr 4, 2018, at 4:32 AM, Jürgen Albersdorfer 
> <juergen.albersdor...@zweiradteile.net> wrote:
> 
> Hi,
> 
> I have an urgent Problem. - I will run out of disk space in near future.
> Largest Table is a Time-Series Table with TimeWindowCompactionStrategy (TWCS) 
> and default_time_to_live = 0
> Keyspace Replication Factor RF=3. I run C* Version 3.11.2
> We have grown the Cluster over time, so SSTable files have different Dates on 
> different Nodes.
> 
> From Application Standpoint it would be safe to lose some of the oldest Data.
> 
> Is it safe to delete some of the oldest SSTable Files, which will no longer 
> get touched by TWCS Compaction any more, while Node is clean Shutdown? - And 
> doing so for one Node after another?
> 
> Or maybe there is a different way to free some disk space? - Any suggestions?
> 
> best regards
> Jürgen Albersdorfer
> 
> -
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org


-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



RE: Urgent Problem - Disk full

2018-04-04 Thread Kenneth Brotman
Assuming the data model is good and there haven’t been any sudden jumps in 
disk use, it seems like the normal thing to do is archive some of the old 
time-series data that you don’t care about.

 

Kenneth Brotman

 

From: Rahul Singh [mailto:rahul.xavier.si...@gmail.com] 
Sent: Wednesday, April 04, 2018 4:38 AM
To: user@cassandra.apache.org; user@cassandra.apache.org
Subject: Re: Urgent Problem - Disk full

 

Nothing a full repair won’t be able to fix. 


On Apr 4, 2018, 7:32 AM -0400, Jürgen Albersdorfer 
<juergen.albersdor...@zweiradteile.net>, wrote:



Hi,

I have an urgent Problem. - I will run out of disk space in near future.
Largest Table is a Time-Series Table with TimeWindowCompactionStrategy (TWCS) 
and default_time_to_live = 0
Keyspace Replication Factor RF=3. I run C* Version 3.11.2
We have grown the Cluster over time, so SSTable files have different Dates on 
different Nodes.

From Application Standpoint it would be safe to lose some of the oldest Data.

Is it safe to delete some of the oldest SSTable Files, which will no longer get 
touched by TWCS Compaction any more, while Node is clean Shutdown? - And doing 
so for one Node after another?

Or maybe there is a different way to free some disk space? - Any suggestions?

best regards
Jürgen Albersdorfer

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



RE: Roadmap for 4.0

2018-03-30 Thread Kenneth Brotman
Someone with an Apache email address has insisted this topic be moved to the 
dev list and not on a Jira, so in an effort to help the group concentrate on 
making progress, I’ll post this topic there.  

 

Kenneth Brotman

 

From: Kenneth Brotman [mailto:kenbrot...@yahoo.com] 
Sent: Friday, March 30, 2018 2:44 PM
To: 'user@cassandra.apache.org'
Subject: RE: Roadmap for 4.0

 

Jira 14357:

https://issues.apache.org/jira/browse/CASSANDRA-14357

 

Just list any desired new major features for 4.0 that you want added.  I will 
maintain a compiled list somewhere on this Jira as well.  Don't worry about any 
steps beyond this.  Don't make any judgements about or make any comments at all 
about what others add. 

No judgments at this point.  This is a list of everyone's suggestions.  Add 
your suggestions for new major features you desire be added for version 4.0 
only.

 

Trust me.  This will get the ball rolling.

 

Kenneth Brotman

 

From: Kenneth Brotman [mailto:kenbrot...@yahoo.com.INVALID] 
Sent: Friday, March 30, 2018 2:33 PM
To: user@cassandra.apache.org
Subject: RE: Roadmap for 4.0

 

Does anyone have a simple list of new major features desired for 4.0?  It 
should be a list of things desired regardless of judgements of any kind beyond 
that.  Just start with that if you want to get anywhere.

 

Kenneth Brotman

 

From: Kenneth Brotman [mailto:kenbrot...@yahoo.com.INVALID] 
Sent: Friday, March 30, 2018 7:30 AM
To: user@cassandra.apache.org
Subject: RE: Roadmap for 4.0

 

Thanks Ben!  

 

I’m reading a book on Cassandra right now that says “The 4.0 release series is 
scheduled to begin in Fall 2016.”  This is one of the group’s first big tests 
since things changed.  

 

Kenneth Brotman

 

From: Ben Bromhead [mailto:b...@instaclustr.com] 
Sent: Friday, March 30, 2018 6:57 AM
To: user@cassandra.apache.org
Subject: Re: Roadmap for 4.0

 

After some further discussions with folks offline, I'd like to revive this 
discussion. 

 

As Kurt mentioned, to keep it simple, if we can build consensus around
what is in for 4.0 and what is out, then we can start the process of working
off a 4.0 branch towards betas and release candidates. Again as Kurt mentioned, 
assigning a timeline to it right now is difficult, but having a firm line in 
the sand around what features/patches are in, then limiting future 4.0 work to 
bug fixes will give folks a less nebulous target to work on. 

 

The other thing to mention is that once we have a 4.0 branch to work off, we at 
Instaclustr have a commitment to dogfooding the release candidates on our 
internal staging and internal production workloads before 4.0 becomes generally 
available. I know other folks have similar commitments and simply having a 4.0 
branch with a clear list of things that are in or out will allow everyone to 
start testing and driving towards a quality release. 

 

The other thing is that there are already a large number of changes ready for 
4.0, I would suggest not recommending tickets for 4.0 that have not yet been 
finished/have outstanding work unless you are the person working on it (or are 
offering to work on it instead) and can get it ready for review in a timely 
fashion. That way we can build a more realistic working target. For major 
breaking changes, there is always 5.0 or 4.1 or whatever we end up doing :)

 

Cheers

 

Ben

 

On Thu, Feb 15, 2018 at 9:39 PM kurt greaves <k...@instaclustr.com> wrote:

I don't believe Q3/Q4 is realistic, but I may be biased (or jaded). It's 
possible Q3/Q4 alpha/beta is realistic, but definitely not a release. 

Well, this mostly depends on how much stuff to include in 4.0. Either way it's 
not terribly important. If people think 2019 is more realistic we can aim for 
that. As I said, it's just a rough timeframe to keep in mind.

 

3.10 was released in January 2017, and we've got around 180 changes for 4.0 so 
far, and let's be honest, 3.11 is still pretty young so it's going to be a 
significant effort to properly test and verify 4.0. 

Let's just stick to getting a list of changes for the moment. I probably 
shouldn't have mentioned timeframes, let's just keep in mind that we shouldn't 
have such a large set of changes for 4.0 that it takes us years to complete.

 

All that said, what I really care about is building confidence in the release, 
which means an extended testing cycle. If all of those patches landed tomorrow, 
I'd still expect us to be months away from a release, because we need to bake 
the next major - there's too many changes to throw out an alpha/beta/rc and 
hope someone actually runs it. 

Yep. As I said, I'll follow up about testing after we sort out what we're 
actually going to include in 4.0. No point trying to come up with a testing 
plan for 

 

On 13 February 2018 at 04:25, Jeff Jirsa <jji...@gmail.com> wrote:

 

Advantages of cutting a release sooner than later:

1) The project needs to constantly progress forward. Releases are the most visible part of that.

RE: Roadmap for 4.0

2018-03-30 Thread Kenneth Brotman
Jira 14357:

https://issues.apache.org/jira/browse/CASSANDRA-14357

 

Just list any desired new major features for 4.0 that you want added.  I will 
maintain a compiled list somewhere on this Jira as well.  Don't worry about any 
steps beyond this.  Don't make any judgements about or make any comments at all 
about what others add. 

No judgments at this point.  This is a list of everyone's suggestions.  Add 
your suggestions for new major features you desire be added for version 4.0 
only.

 

Trust me.  This will get the ball rolling.

 

Kenneth Brotman

 

From: Kenneth Brotman [mailto:kenbrot...@yahoo.com.INVALID] 
Sent: Friday, March 30, 2018 2:33 PM
To: user@cassandra.apache.org
Subject: RE: Roadmap for 4.0

 

Does anyone have a simple list of new major features desired for 4.0?  It 
should be a list of things desired regardless of judgements of any kind beyond 
that.  Just start with that if you want to get anywhere.

 

Kenneth Brotman

 

From: Kenneth Brotman [mailto:kenbrot...@yahoo.com.INVALID] 
Sent: Friday, March 30, 2018 7:30 AM
To: user@cassandra.apache.org
Subject: RE: Roadmap for 4.0

 

Thanks Ben!  

 

I’m reading a book on Cassandra right now that says “The 4.0 release series is 
scheduled to begin in Fall 2016.”  This is one of the group’s first big tests 
since things changed.  

 

Kenneth Brotman

 

From: Ben Bromhead [mailto:b...@instaclustr.com] 
Sent: Friday, March 30, 2018 6:57 AM
To: user@cassandra.apache.org
Subject: Re: Roadmap for 4.0

 

After some further discussions with folks offline, I'd like to revive this 
discussion. 

 

As Kurt mentioned, to keep it simple, if we can build consensus around
what is in for 4.0 and what is out, then we can start the process of working
off a 4.0 branch towards betas and release candidates. Again as Kurt mentioned, 
assigning a timeline to it right now is difficult, but having a firm line in 
the sand around what features/patches are in, then limiting future 4.0 work to 
bug fixes will give folks a less nebulous target to work on. 

 

The other thing to mention is that once we have a 4.0 branch to work off, we at 
Instaclustr have a commitment to dogfooding the release candidates on our 
internal staging and internal production workloads before 4.0 becomes generally 
available. I know other folks have similar commitments and simply having a 4.0 
branch with a clear list of things that are in or out will allow everyone to 
start testing and driving towards a quality release. 

 

The other thing is that there are already a large number of changes ready for 
4.0, I would suggest not recommending tickets for 4.0 that have not yet been 
finished/have outstanding work unless you are the person working on it (or are 
offering to work on it instead) and can get it ready for review in a timely 
fashion. That way we can build a more realistic working target. For major 
breaking changes, there is always 5.0 or 4.1 or whatever we end up doing :)

 

Cheers

 

Ben

 

On Thu, Feb 15, 2018 at 9:39 PM kurt greaves <k...@instaclustr.com> wrote:

I don't believe Q3/Q4 is realistic, but I may be biased (or jaded). It's 
possible Q3/Q4 alpha/beta is realistic, but definitely not a release. 

Well, this mostly depends on how much stuff to include in 4.0. Either way it's 
not terribly important. If people think 2019 is more realistic we can aim for 
that. As I said, it's just a rough timeframe to keep in mind.

 

3.10 was released in January 2017, and we've got around 180 changes for 4.0 so 
far, and let's be honest, 3.11 is still pretty young so it's going to be a 
significant effort to properly test and verify 4.0. 

Let's just stick to getting a list of changes for the moment. I probably 
shouldn't have mentioned timeframes, let's just keep in mind that we shouldn't 
have such a large set of changes for 4.0 that it takes us years to complete.

 

All that said, what I really care about is building confidence in the release, 
which means an extended testing cycle. If all of those patches landed tomorrow, 
I'd still expect us to be months away from a release, because we need to bake 
the next major - there's too many changes to throw out an alpha/beta/rc and 
hope someone actually runs it. 

Yep. As I said, I'll follow up about testing after we sort out what we're 
actually going to include in 4.0. No point trying to come up with a testing 
plan for 

 

On 13 February 2018 at 04:25, Jeff Jirsa <jji...@gmail.com> wrote:

 

Advantages of cutting a release sooner than later:

1) The project needs to constantly progress forward. Releases are the most 
visible part of that.

2) Having a huge changelog in a release increases the likelihood of bugs that 
take time to find.

 

Advantages of a slower release:

1) We don't do major versions often, and when we do breaking changes (protocol, 
file format, etc), we should squeeze in as many as possible to avoid having to 
roll new majors 

2) There are probably few people actually running 3.11 at scale, so probably few people actually testing trunk.

RE: Roadmap for 4.0

2018-03-30 Thread Kenneth Brotman
Does anyone have a simple list of new major features desired for 4.0?  It 
should be a list of things desired regardless of judgements of any kind beyond 
that.  Just start with that if you want to get anywhere.

 

Kenneth Brotman

 

From: Kenneth Brotman [mailto:kenbrot...@yahoo.com.INVALID] 
Sent: Friday, March 30, 2018 7:30 AM
To: user@cassandra.apache.org
Subject: RE: Roadmap for 4.0

 

Thanks Ben!  

 

I’m reading a book on Cassandra right now that says “The 4.0 release series is 
scheduled to begin in Fall 2016.”  This is one of the group’s first big tests 
since things changed.  

 

Kenneth Brotman

 

From: Ben Bromhead [mailto:b...@instaclustr.com] 
Sent: Friday, March 30, 2018 6:57 AM
To: user@cassandra.apache.org
Subject: Re: Roadmap for 4.0

 

After some further discussions with folks offline, I'd like to revive this 
discussion. 

 

As Kurt mentioned, to keep it simple, if we can build consensus around
what is in for 4.0 and what is out, then we can start the process of working
off a 4.0 branch towards betas and release candidates. Again as Kurt mentioned, 
assigning a timeline to it right now is difficult, but having a firm line in 
the sand around what features/patches are in, then limiting future 4.0 work to 
bug fixes will give folks a less nebulous target to work on. 

 

The other thing to mention is that once we have a 4.0 branch to work off, we at 
Instaclustr have a commitment to dogfooding the release candidates on our 
internal staging and internal production workloads before 4.0 becomes generally 
available. I know other folks have similar commitments and simply having a 4.0 
branch with a clear list of things that are in or out will allow everyone to 
start testing and driving towards a quality release. 

 

The other thing is that there are already a large number of changes ready for 
4.0, I would suggest not recommending tickets for 4.0 that have not yet been 
finished/have outstanding work unless you are the person working on it (or are 
offering to work on it instead) and can get it ready for review in a timely 
fashion. That way we can build a more realistic working target. For major 
breaking changes, there is always 5.0 or 4.1 or whatever we end up doing :)

 

Cheers

 

Ben

 

On Thu, Feb 15, 2018 at 9:39 PM kurt greaves <k...@instaclustr.com> wrote:

I don't believe Q3/Q4 is realistic, but I may be biased (or jaded). It's 
possible Q3/Q4 alpha/beta is realistic, but definitely not a release. 

Well, this mostly depends on how much stuff to include in 4.0. Either way it's 
not terribly important. If people think 2019 is more realistic we can aim for 
that. As I said, it's just a rough timeframe to keep in mind.

 

3.10 was released in January 2017, and we've got around 180 changes for 4.0 so 
far, and let's be honest, 3.11 is still pretty young so it's going to be a 
significant effort to properly test and verify 4.0. 

Let's just stick to getting a list of changes for the moment. I probably 
shouldn't have mentioned timeframes, let's just keep in mind that we shouldn't 
have such a large set of changes for 4.0 that it takes us years to complete.

 

All that said, what I really care about is building confidence in the release, 
which means an extended testing cycle. If all of those patches landed tomorrow, 
I'd still expect us to be months away from a release, because we need to bake 
the next major - there's too many changes to throw out an alpha/beta/rc and 
hope someone actually runs it. 

Yep. As I said, I'll follow up about testing after we sort out what we're 
actually going to include in 4.0. No point trying to come up with a testing 
plan for 

 

On 13 February 2018 at 04:25, Jeff Jirsa <jji...@gmail.com> wrote:

 

Advantages of cutting a release sooner than later:

1) The project needs to constantly progress forward. Releases are the most 
visible part of that.

2) Having a huge changelog in a release increases the likelihood of bugs that 
take time to find.

 

Advantages of a slower release:

1) We don't do major versions often, and when we do breaking changes (protocol, 
file format, etc), we should squeeze in as many as possible to avoid having to 
roll new majors 

2) There are probably few people actually running 3.11 at scale, so probably 
few people actually testing trunk.

 

In terms of "big" changes I'd like to see land, the ones that come to mind are: 

 

https://issues.apache.org/jira/browse/CASSANDRA-9754 - "Birch" (changes file 
format)

https://issues.apache.org/jira/browse/CASSANDRA-13442 - Transient Replicas 
(probably adds new replication strategy or similar)

https://issues.apache.org/jira/browse/CASSANDRA-13628 - Rest of the internode 
netty stuff (no idea if this changes internode stuff, but I bet it's a lot 
easier if it lands on a major)

https://issues.apache.org/jira/browse/CASSANDRA-7622 - Virtual Tables (selfish 
inclusion, probably doesn't need to be a major at all, and I wouldn't even lose 
sleep if it slips, but I'd like to see it land)

RE: Roadmap for 4.0

2018-03-30 Thread Kenneth Brotman
Thanks Ben!  

 

I’m reading a book on Cassandra right now that says “The 4.0 release series is 
scheduled to begin in Fall 2016.”  This is one of the group’s first big tests 
since things changed.  

 

Kenneth Brotman

 

From: Ben Bromhead [mailto:b...@instaclustr.com] 
Sent: Friday, March 30, 2018 6:57 AM
To: user@cassandra.apache.org
Subject: Re: Roadmap for 4.0

 

After some further discussions with folks offline, I'd like to revive this 
discussion. 

 

As Kurt mentioned, to keep it simple, if we can build consensus around
what is in for 4.0 and what is out, then we can start the process of working
off a 4.0 branch towards betas and release candidates. Again as Kurt mentioned, 
assigning a timeline to it right now is difficult, but having a firm line in 
the sand around what features/patches are in, then limiting future 4.0 work to 
bug fixes will give folks a less nebulous target to work on. 

 

The other thing to mention is that once we have a 4.0 branch to work off, we at 
Instaclustr have a commitment to dogfooding the release candidates on our 
internal staging and internal production workloads before 4.0 becomes generally 
available. I know other folks have similar commitments and simply having a 4.0 
branch with a clear list of things that are in or out will allow everyone to 
start testing and driving towards a quality release. 

 

The other thing is that there are already a large number of changes ready for 
4.0, I would suggest not recommending tickets for 4.0 that have not yet been 
finished/have outstanding work unless you are the person working on it (or are 
offering to work on it instead) and can get it ready for review in a timely 
fashion. That way we can build a more realistic working target. For major 
breaking changes, there is always 5.0 or 4.1 or whatever we end up doing :)

 

Cheers

 

Ben

 

On Thu, Feb 15, 2018 at 9:39 PM kurt greaves <k...@instaclustr.com> wrote:

I don't believe Q3/Q4 is realistic, but I may be biased (or jaded). It's 
possible Q3/Q4 alpha/beta is realistic, but definitely not a release. 

Well, this mostly depends on how much stuff to include in 4.0. Either way it's 
not terribly important. If people think 2019 is more realistic we can aim for 
that. As I said, it's just a rough timeframe to keep in mind.

 

3.10 was released in January 2017, and we've got around 180 changes for 4.0 so 
far, and let's be honest, 3.11 is still pretty young so it's going to be a 
significant effort to properly test and verify 4.0. 

Let's just stick to getting a list of changes for the moment. I probably 
shouldn't have mentioned timeframes, let's just keep in mind that we shouldn't 
have such a large set of changes for 4.0 that it takes us years to complete.

 

All that said, what I really care about is building confidence in the release, 
which means an extended testing cycle. If all of those patches landed tomorrow, 
I'd still expect us to be months away from a release, because we need to bake 
the next major - there's too many changes to throw out an alpha/beta/rc and 
hope someone actually runs it. 

Yep. As I said, I'll follow up about testing after we sort out what we're 
actually going to include in 4.0. No point trying to come up with a testing 
plan for 

 

On 13 February 2018 at 04:25, Jeff Jirsa <jji...@gmail.com> wrote:

 

Advantages of cutting a release sooner than later:

1) The project needs to constantly progress forward. Releases are the most 
visible part of that.

2) Having a huge changelog in a release increases the likelihood of bugs that 
take time to find.

 

Advantages of a slower release:

1) We don't do major versions often, and when we do breaking changes (protocol, 
file format, etc), we should squeeze in as many as possible to avoid having to 
roll new majors 

2) There are probably few people actually running 3.11 at scale, so probably 
few people actually testing trunk.

 

In terms of "big" changes I'd like to see land, the ones that come to mind are: 

 

https://issues.apache.org/jira/browse/CASSANDRA-9754 - "Birch" (changes file 
format)

https://issues.apache.org/jira/browse/CASSANDRA-13442 - Transient Replicas 
(probably adds new replication strategy or similar)

https://issues.apache.org/jira/browse/CASSANDRA-13628 - Rest of the internode 
netty stuff (no idea if this changes internode stuff, but I bet it's a lot 
easier if it lands on a major)

https://issues.apache.org/jira/browse/CASSANDRA-7622 - Virtual Tables (selfish 
inclusion, probably doesn't need to be a major at all, and I wouldn't even lose 
sleep if it slips, but I'd like to see it land)

 

Stuff I'm ok with slipping to 4.X or 5.0, but probably needs to land on a major 
because we'll change something big (like gossip, or the way schema is passed, 
etc):

 

https://issues.apache.org/jira/browse/CASSANDRA-9667 - Strongly consistent 
membership 

https://issues.apache.org/jira/browse/CASSANDRA-10699 - Strongly cons

RE: Re: Re: A node down every day in a 6 nodes cluster

2018-03-28 Thread Kenneth Brotman
Properly Sizing Your Heap to Prevent OutOfMemoryErrors

https://support.datastax.com/hc/en-us/articles/204225929-Properly-Sizing-Your-Heap-to-Prevent-OutOfMemoryErrors
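For context when revisiting these settings, the automatic heap sizing that ships in cassandra-env.sh (its calculate_heap_sizes function) can be approximated in a few lines; this is a simplified reimplementation for illustration, with all values in MB:

```python
def default_heap_sizes(system_memory_mb, cpu_cores):
    """Approximation of calculate_heap_sizes in cassandra-env.sh:
    MAX_HEAP_SIZE = max(min(1/2 RAM, 1024MB), min(1/4 RAM, 8192MB)),
    HEAP_NEWSIZE  = min(100MB * cores, 1/4 * MAX_HEAP_SIZE)."""
    half_mem = system_memory_mb // 2
    quarter_mem = system_memory_mb // 4
    max_heap = max(min(half_mem, 1024), min(quarter_mem, 8192))
    new_size = min(100 * cpu_cores, max_heap // 4)
    return max_heap, new_size
```

By that default, a 16GB, 4-core node like the ones in this thread would get a 4096MB heap and 400MB new generation, so the 8192m/512m in use was an explicit override.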

 

 

From: Kenneth Brotman [mailto:kenbrot...@yahoo.com.INVALID] 
Sent: Wednesday, March 28, 2018 5:35 AM
To: user@cassandra.apache.org
Subject: RE: Re: Re: A node down every day in a 6 nodes cluster

 

If you think that will fix the problem, maybe you could add a little more 
memory to each machine as a short term fix.

 

From: Xiangfei Ni [mailto:xiangfei...@cm-dt.com] 
Sent: Wednesday, March 28, 2018 5:24 AM
To: user@cassandra.apache.org
Subject: Re: Re: Re: A node down every day in a 6 nodes cluster

 

Yes, we discussed it and plan to fix the data model issue and upgrade to
version 3.11.3.

 

Best Regards, 

 

倪项菲/ David Ni

中移德电网络科技有限公司

Virtue Intelligent Network Ltd, co.

Add: 2003,20F No.35 Luojia creative city,Luoyu Road,Wuhan,HuBei

Mob: +86 13797007811|Tel: + 86 27 5024 2516

 

From: Kenneth Brotman <kenbrot...@yahoo.com.INVALID>
Sent: March 28, 2018 20:16
To: user@cassandra.apache.org
Subject: RE: Re: Re: A node down every day in a 6 nodes cluster

 

David, 

 

Did you figure out what to do about the data model problem?  It could be that 
your data files finally grew to the point where the data model problem caused 
the Java heap space issue – in which case everything is actually working as 
it’s supposed to; you just have to fix the data model.

 

Kenneth Brotman

 

 

From: Kenneth Brotman [mailto:kenbrot...@yahoo.com]
Sent: Wednesday, March 28, 2018 4:46 AM
To: 'user@cassandra.apache.org'
Subject: RE: 答复: 答复: A node down every day in a 6 nodes cluster

 

Was any change to hardware done around the time the problem started?

Was any change to the client software done around the time the problem started?

Was any change to the database schema done around the time the problem started?

 

Kenneth Brotman

 

From: Xiangfei Ni [mailto:xiangfei...@cm-dt.com]
Sent: Wednesday, March 28, 2018 4:40 AM
To: user@cassandra.apache.org
Subject: Re: Re: Re: A node down every day in a 6 nodes cluster

 

Hi Kenneth,

The cluster has been running for 4 months.

The problem started occurring last week.

 

Best Regards, 

 

倪项菲/ David Ni

中移德电网络科技有限公司

Virtue Intelligent Network Ltd, co.

Add: 2003,20F No.35 Luojia creative city,Luoyu Road,Wuhan,HuBei

Mob: +86 13797007811|Tel: + 86 27 5024 2516

 

From: Kenneth Brotman <kenbrot...@yahoo.com.INVALID>
Sent: March 28, 2018 19:34
To: user@cassandra.apache.org
Subject: RE: Re: Re: A node down every day in a 6 nodes cluster

 

David,

 

How long has the cluster been operating?

How long has the problem been occurring?

 

Kenneth Brotman

 

From: Jeff Jirsa [mailto:jji...@gmail.com]
Sent: Tuesday, March 27, 2018 7:00 PM
To: Xiangfei Ni
Cc: user@cassandra.apache.org
Subject: Re: Re: Re: A node down every day in a 6 nodes cluster

 

 

java.lang.OutOfMemoryError: Java heap space

 

 

You’re OOMing.

 

-- 

Jeff Jirsa

 


On Mar 27, 2018, at 6:45 PM, Xiangfei Ni <xiangfei...@cm-dt.com> wrote:

Hi Jeff,

Today another node was shut down. I have attached the exception log file; 
could you please help analyze it? Thanks.

 

Best Regards, 

 

倪项菲/ David Ni

中移德电网络科技有限公司

Virtue Intelligent Network Ltd, co.

Add: 2003,20F No35 Luojia creative city,Luoyu Road,Wuhan,HuBei

Mob: +86 13797007811|Tel: + 86 27 5024 2516

 

From: Jeff Jirsa <jji...@gmail.com>
Sent: March 27, 2018 11:50
To: Xiangfei Ni <xiangfei...@cm-dt.com>
Cc: user@cassandra.apache.org
Subject: Re: Re: A node down every day in a 6 nodes cluster

 

Only one node having the problem is suspicious. It may be that your application 
is improperly pooling connections, or you have a hardware problem.

 

I don't see anything in nodetool that explains it, though you certainly have a 
data model likely to cause problems over time (the cardinality of 
rt_ac_stat.idx_rt_ac_stat_prot_ver is such that you have very wide partitions 
and it'll be difficult to read).
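As an illustration of the bucketing fix that very wide time-series partitions usually call for (the real rt_ac_stat schema is not shown in the thread, so the key shape below is invented), the partition key can include a time bucket so each partition covers a bounded window:

```python
from datetime import datetime, timezone

def partition_key(device_id, ts, fmt="%Y-%m-%d"):
    """Illustrative only: derive a (device_id, day-bucket) partition key
    for a time-series table instead of keying on device_id alone. The
    bucket bounds each partition to one day of rows, which keeps
    partitions narrow enough to read efficiently."""
    day = datetime.fromtimestamp(ts, tz=timezone.utc).strftime(fmt)
    return (device_id, day)
```

On the CQL side this corresponds to a compound partition key along the lines of `PRIMARY KEY ((device_id, day), ts)`.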
 
 

 

On Mon, Mar 26, 2018 at 8:26 PM, Xiangfei Ni <xiangfei...@cm-dt.com> wrote:

Hi Jeff,

I need to restart the node manually every time; only one node has this 
problem.

I have attached the nodetool output, thanks.

 

Best Regards, 

 

倪项菲/ David Ni

中移德电网络科技有限公司

Virtue Intelligent Network Ltd, co.

Add: 2003,20F No.35 Luojia creative city,Luoyu Road,Wuhan,HuBei

Mob: +86 13797007811 | Tel: +86 27 5024 2516

 

From: Jeff Jirsa <jji...@gmail.com>

RE: Re: Re: A node down every day in a 6 nodes cluster

2018-03-28 Thread Kenneth Brotman
If you think that will fix the problem, maybe you could add a little more 
memory to each machine as a short term fix.

 

From: Xiangfei Ni [mailto:xiangfei...@cm-dt.com] 
Sent: Wednesday, March 28, 2018 5:24 AM
To: user@cassandra.apache.org
Subject: Re: Re: Re: A node down every day in a 6 nodes cluster

 

Yes, we discussed it and plan to fix the data model issue and upgrade to
version 3.11.3.

 

Best Regards, 

 

倪项菲/ David Ni

中移德电网络科技有限公司

Virtue Intelligent Network Ltd, co.

Add: 2003,20F No.35 Luojia creative city,Luoyu Road,Wuhan,HuBei

Mob: +86 13797007811|Tel: + 86 27 5024 2516

 

From: Kenneth Brotman <kenbrot...@yahoo.com.INVALID>
Sent: March 28, 2018 20:16
To: user@cassandra.apache.org
Subject: RE: Re: Re: A node down every day in a 6 nodes cluster

 

David, 

 

Did you figure out what to do about the data model problem?  It could be that 
your data files finally grew to the point where the data model problem caused 
the Java heap space issue – in which case everything is actually working as 
it’s supposed to; you just have to fix the data model.

 

Kenneth Brotman

 

 

From: Kenneth Brotman [mailto:kenbrot...@yahoo.com]
Sent: Wednesday, March 28, 2018 4:46 AM
To: 'user@cassandra.apache.org'
Subject: RE: Re: Re: A node down every day in a 6 nodes cluster

 

Was any change to hardware done around the time the problem started?

Was any change to the client software done around the time the problem started?

Was any change to the database schema done around the time the problem started?

 

Kenneth Brotman

 

From: Xiangfei Ni [mailto:xiangfei...@cm-dt.com]
Sent: Wednesday, March 28, 2018 4:40 AM
To: user@cassandra.apache.org
Subject: Re: Re: Re: A node down every day in a 6 nodes cluster

 

Hi Kenneth,

The cluster has been running for 4 months.

The problem started occurring last week.

 

Best Regards, 

 

倪项菲/ David Ni

中移德电网络科技有限公司

Virtue Intelligent Network Ltd, co.

Add: 2003,20F No.35 Luojia creative city,Luoyu Road,Wuhan,HuBei

Mob: +86 13797007811|Tel: + 86 27 5024 2516

 

From: Kenneth Brotman <kenbrot...@yahoo.com.INVALID>
Sent: March 28, 2018 19:34
To: user@cassandra.apache.org
Subject: RE: Re: Re: A node down every day in a 6 nodes cluster

 

David,

 

How long has the cluster been operating?

How long has the problem been occurring?

 

Kenneth Brotman

 

From: Jeff Jirsa [mailto:jji...@gmail.com]
Sent: Tuesday, March 27, 2018 7:00 PM
To: Xiangfei Ni
Cc: user@cassandra.apache.org
Subject: Re: Re: Re: A node down every day in a 6 nodes cluster

 

 

java.lang.OutOfMemoryError: Java heap space

 

 

You’re OOMing.

 

-- 

Jeff Jirsa

 


On Mar 27, 2018, at 6:45 PM, Xiangfei Ni <xiangfei...@cm-dt.com> wrote:

Hi Jeff,

Today another node was shut down. I have attached the exception log file; 
could you please help analyze it? Thanks.

 

Best Regards, 

 

倪项菲/ David Ni

中移德电网络科技有限公司

Virtue Intelligent Network Ltd, co.

Add: 2003,20F No35 Luojia creative city,Luoyu Road,Wuhan,HuBei

Mob: +86 13797007811|Tel: + 86 27 5024 2516

 

From: Jeff Jirsa <jji...@gmail.com>
Sent: March 27, 2018 11:50
To: Xiangfei Ni <xiangfei...@cm-dt.com>
Cc: user@cassandra.apache.org
Subject: Re: Re: A node down every day in a 6 nodes cluster

 

Only one node having the problem is suspicious. It may be that your application 
is improperly pooling connections, or you have a hardware problem.

 

I don't see anything in nodetool that explains it, though you certainly have a 
data model likely to cause problems over time (the cardinality of 
rt_ac_stat.idx_rt_ac_stat_prot_ver is such that you have very wide partitions 
and it'll be difficult to read).
 
 

 

On Mon, Mar 26, 2018 at 8:26 PM, Xiangfei Ni <xiangfei...@cm-dt.com> wrote:

Hi Jeff,

I need to restart the node manually every time; only one node has this 
problem.

I have attached the nodetool output, thanks.

 

Best Regards, 

 

倪项菲/ David Ni

中移德电网络科技有限公司

Virtue Intelligent Network Ltd, co.

Add: 2003,20F No.35 Luojia creative city,Luoyu Road,Wuhan,HuBei

Mob: +86 13797007811 | Tel: +86 27 5024 2516

 

From: Jeff Jirsa <jji...@gmail.com>
Sent: March 27, 2018 11:03
To: user@cassandra.apache.org
Subject: Re: A node down every day in a 6 nodes cluster

 

That warning isn’t sufficient to understand why the node is going down

 

 

Cassandra 3.9 has some pretty serious known issues - upgrading to 3.11.3 is 
likely a good idea

 

A

RE: Re: Re: A node down every day in a 6 nodes cluster

2018-03-28 Thread Kenneth Brotman
David, 

 

Did you figure out what to do about the data model problem?  It could be that 
your data files finally grew to the point where the data model problem caused 
the Java heap space issue – in which case everything is actually working as 
it’s supposed to; you just have to fix the data model.

 

Kenneth Brotman

 

 

From: Kenneth Brotman [mailto:kenbrot...@yahoo.com] 
Sent: Wednesday, March 28, 2018 4:46 AM
To: 'user@cassandra.apache.org'
Subject: RE: Re: Re: A node down every day in a 6 nodes cluster

 

Was any change to hardware done around the time the problem started?

Was any change to the client software done around the time the problem started?

Was any change to the database schema done around the time the problem started?

 

Kenneth Brotman

 

From: Xiangfei Ni [mailto:xiangfei...@cm-dt.com] 
Sent: Wednesday, March 28, 2018 4:40 AM
To: user@cassandra.apache.org
Subject: Re: Re: Re: A node down every day in a 6 nodes cluster

 

Hi Kenneth,

The cluster has been running for 4 months.

The problem started occurring last week.

 

Best Regards, 

 

倪项菲/ David Ni

中移德电网络科技有限公司

Virtue Intelligent Network Ltd, co.

Add: 2003,20F No.35 Luojia creative city,Luoyu Road,Wuhan,HuBei

Mob: +86 13797007811|Tel: + 86 27 5024 2516

 

From: Kenneth Brotman <kenbrot...@yahoo.com.INVALID>
Sent: March 28, 2018 19:34
To: user@cassandra.apache.org
Subject: RE: Re: Re: A node down every day in a 6 nodes cluster

 

David,

 

How long has the cluster been operating?

How long has the problem been occurring?

 

Kenneth Brotman

 

From: Jeff Jirsa [mailto:jji...@gmail.com] 
Sent: Tuesday, March 27, 2018 7:00 PM
To: Xiangfei Ni
Cc: user@cassandra.apache.org
Subject: Re: Re: Re: A node down every day in a 6 nodes cluster

 

 

java.lang.OutOfMemoryError: Java heap space

 

 

You’re OOMing.

 

-- 

Jeff Jirsa

 


On Mar 27, 2018, at 6:45 PM, Xiangfei Ni <xiangfei...@cm-dt.com> wrote:

Hi Jeff,

Today another node was shut down. I have attached the exception log file; 
could you please help analyze it? Thanks.

 

Best Regards, 

 

倪项菲/ David Ni

中移德电网络科技有限公司

Virtue Intelligent Network Ltd, co.

Add: 2003,20F No35 Luojia creative city,Luoyu Road,Wuhan,HuBei

Mob: +86 13797007811|Tel: + 86 27 5024 2516

 

From: Jeff Jirsa <jji...@gmail.com>
Sent: March 27, 2018 11:50
To: Xiangfei Ni <xiangfei...@cm-dt.com>
Cc: user@cassandra.apache.org
Subject: Re: Re: A node down every day in a 6 nodes cluster

 

Only one node having the problem is suspicious. It may be that your application 
is improperly pooling connections, or you have a hardware problem.

 

I dont see anything in nodetool that explains it, though you certainly have a 
data model likely to cause problems over time (the cardinality of 

rt_ac_stat.idx_rt_ac_stat_prot_verrt_ac_stat.idx_rt_ac_stat_prot_ver is such 
that you have very wide partitions and it'll be difficult to read).
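To confirm the wide-partition suspicion, partition sizes can be inspected per table. A sketch of the usual commands — the keyspace name `my_keyspace` is a placeholder, and on older 3.x builds the commands may be spelled `cfhistograms`/`cfstats` rather than `tablehistograms`/`tablestats`:

```shell
# Partition size distribution (percentiles and max) for the suspect table
nodetool tablehistograms my_keyspace rt_ac_stat

# "Compacted partition maximum bytes" / "mean bytes" also appear here
nodetool cfstats my_keyspace.rt_ac_stat
```

Partitions in the hundreds of megabytes or larger are a common cause of read timeouts and GC pressure of the kind shown in this thread.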
 
 

 

On Mon, Mar 26, 2018 at 8:26 PM, Xiangfei Ni <xiangfei...@cm-dt.com> wrote:

Hi Jeff,

I need to restart the node manually every time; only one node has this 
problem.

I have attached the nodetool output, thanks.

 

Best Regards, 

 

倪项菲/ David Ni

中移德电网络科技有限公司

Virtue Intelligent Network Ltd, co.

Add: 2003,20F No.35 Luojia creative city,Luoyu Road,Wuhan,HuBei

Mob: +86 13797007811 <tel:+86%20137%209700%207811> |Tel: + 86 27 5024 2516 
<tel:+86%2027%205024%202516> 

 

From: Jeff Jirsa <jji...@gmail.com> 
Sent: March 27, 2018 11:03
To: user@cassandra.apache.org
Subject: Re: A node down every day in a 6 nodes cluster

 

That warning isn’t sufficient to understand why the node is going down

 

 

Cassandra 3.9 has some pretty serious known issues - upgrading to 3.11.3 is 
likely a good idea

 

Are the nodes coming up on their own? Or are you restarting them?

 

Paste the output of nodetool tpstats and nodetool cfstats
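Capturing the requested diagnostics so they can be attached to a reply can be sketched as follows (the output file names are arbitrary):

```shell
# Thread pool stats: look for pending/blocked/dropped counts
nodetool tpstats > tpstats.txt

# Per-table stats: look for partition sizes, tombstones, SSTable counts
nodetool cfstats > cfstats.txt
```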

 

 

 

-- 

Jeff Jirsa

 


On Mar 26, 2018, at 7:56 PM, Xiangfei Ni <xiangfei...@cm-dt.com> wrote:

Hi Cassandra experts,

  I am facing an issue: a node goes down every day in a 6-node cluster, and the 
cluster is in just one DC.

  Every node has 4C 16G, and the heap configuration is MAX_HEAP_SIZE=8192m 
HEAP_NEWSIZE=512m. Every node holds about 200G of data, the RF for the business CF is 
3, and a node goes down once every day. The system.log shows the info below:

WARN  [Native-Transport-Requests-19] 2018-03-26 18:53:17,128 
CassandraAuthorizer.java:101 - CassandraAuthorizer failed to authorize # for 

ERROR [Native-Transport-Requests-19] 2018-03-26 18:53:17,129 
QueryMessage.java:128 - Unexpected error during query

com.google.common.util.concurrent.UncheckedExecutionException: 
java.lang.RuntimeException: 
org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - 
received only 0 responses.

at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2203) 
~[guava-18.0.jar:na]

at com.google.common.cache.LocalCache.get(LocalCache.java:3937) 
~[guava-18.0.jar:na]

at com.go
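For reference, the MAX_HEAP_SIZE/HEAP_NEWSIZE settings quoted in this thread live in conf/cassandra-env.sh. A sketch of how they are set explicitly — the values shown are the ones from this cluster, not a recommendation; both must be set together or both left commented so the script auto-calculates from system memory:

```shell
# conf/cassandra-env.sh
# Set both together, or comment out both to use the auto-calculated sizes.
MAX_HEAP_SIZE="8192M"
HEAP_NEWSIZE="512M"
```

On a 16G node an 8G heap is within the usual guidance, so repeated OutOfMemoryError despite it usually points at wide partitions or oversized reads rather than the heap setting itself.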

RE: Re: Re: A node down every day in a 6 nodes cluster

2018-03-28 Thread Kenneth Brotman
Was any change to hardware done around the time the problem started ?

Was any change to the client software done around the time the problem started?

Was any change to the database schema done around the time the problem started?

 

Kenneth Brotman

 

From: Xiangfei Ni [mailto:xiangfei...@cm-dt.com] 
Sent: Wednesday, March 28, 2018 4:40 AM
To: user@cassandra.apache.org
Subject: Re: Re: Re: A node down every day in a 6 nodes cluster

 

Hi Kenneth,

The cluster has been running for 4 months.

The problem started last week.

 

Best Regards, 

 

倪项菲/ David Ni

中移德电网络科技有限公司

Virtue Intelligent Network Ltd, co.

Add: 2003,20F No.35 Luojia creative city,Luoyu Road,Wuhan,HuBei

Mob: +86 13797007811|Tel: + 86 27 5024 2516

 

From: Kenneth Brotman <kenbrot...@yahoo.com.INVALID> 
Sent: March 28, 2018 19:34
To: user@cassandra.apache.org
Subject: RE: Re: Re: A node down every day in a 6 nodes cluster

 

David,

 

How long has the cluster been operating?

How long has the problem been occurring?

 

Kenneth Brotman

 

From: Jeff Jirsa [mailto:jji...@gmail.com] 
Sent: Tuesday, March 27, 2018 7:00 PM
To: Xiangfei Ni
Cc: user@cassandra.apache.org
Subject: Re: Re: Re: A node down every day in a 6 nodes cluster

 

 

java.lang.OutOfMemoryError: Java heap space

 

 

You’re OOMing 

 

-- 

Jeff Jirsa

 


On Mar 27, 2018, at 6:45 PM, Xiangfei Ni <xiangfei...@cm-dt.com> wrote:

Hi Jeff,

Today another node was shut down. I have attached the exception log 
file; could you please help to analyze? Thanks.

 

Best Regards, 

 

倪项菲/ David Ni

中移德电网络科技有限公司

Virtue Intelligent Network Ltd, co.

Add: 2003,20F No.35 Luojia creative city,Luoyu Road,Wuhan,HuBei

Mob: +86 13797007811|Tel: + 86 27 5024 2516

 

From: Jeff Jirsa <jji...@gmail.com> 
Sent: March 27, 2018 11:50
To: Xiangfei Ni <xiangfei...@cm-dt.com>
Cc: user@cassandra.apache.org
Subject: Re: Re: A node down every day in a 6 nodes cluster

 

Only one node having the problem is suspicious. It may be that your application is 
improperly pooling connections, or you have a hardware problem.

 

I don't see anything in nodetool that explains it, though you certainly have a 
data model likely to cause problems over time (the cardinality of 
rt_ac_stat.idx_rt_ac_stat_prot_ver is such 
that you have very wide partitions and it'll be difficult to read).
 
 

 

On Mon, Mar 26, 2018 at 8:26 PM, Xiangfei Ni <xiangfei...@cm-dt.com> wrote:

Hi Jeff,

I need to restart the node manually every time; only one node has this 
problem.

I have attached the nodetool output, thanks.

 

Best Regards, 

 

倪项菲/ David Ni

中移德电网络科技有限公司

Virtue Intelligent Network Ltd, co.

Add: 2003,20F No.35 Luojia creative city,Luoyu Road,Wuhan,HuBei

Mob: +86 13797007811 <tel:+86%20137%209700%207811> |Tel: + 86 27 5024 2516 
<tel:+86%2027%205024%202516> 

 

From: Jeff Jirsa <jji...@gmail.com> 
Sent: March 27, 2018 11:03
To: user@cassandra.apache.org
Subject: Re: A node down every day in a 6 nodes cluster

 

That warning isn’t sufficient to understand why the node is going down

 

 

Cassandra 3.9 has some pretty serious known issues - upgrading to 3.11.3 is 
likely a good idea

 

Are the nodes coming up on their own? Or are you restarting them?

 

Paste the output of nodetool tpstats and nodetool cfstats

 

 

 

-- 

Jeff Jirsa

 


On Mar 26, 2018, at 7:56 PM, Xiangfei Ni <xiangfei...@cm-dt.com> wrote:

Hi Cassandra experts,

  I am facing an issue: a node goes down every day in a 6-node cluster, and the 
cluster is in just one DC.

  Every node has 4C 16G, and the heap configuration is MAX_HEAP_SIZE=8192m 
HEAP_NEWSIZE=512m. Every node holds about 200G of data, the RF for the business CF is 
3, and a node goes down once every day. The system.log shows the info below:

WARN  [Native-Transport-Requests-19] 2018-03-26 18:53:17,128 
CassandraAuthorizer.java:101 - CassandraAuthorizer failed to authorize # for 

ERROR [Native-Transport-Requests-19] 2018-03-26 18:53:17,129 
QueryMessage.java:128 - Unexpected error during query

com.google.common.util.concurrent.UncheckedExecutionException: 
java.lang.RuntimeException: 
org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - 
received only 0 responses.

at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2203) 
~[guava-18.0.jar:na]

at com.google.common.cache.LocalCache.get(LocalCache.java:3937) 
~[guava-18.0.jar:na]

at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3941) 
~[guava-18.0.jar:na]

at 
com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4824) 
~[guava-18.0.jar:na]

at org.apache.cassandra.auth.AuthCache.get(AuthCache.java:108) 
~[apache-cassandra-3.9.jar:3.9]

at 
org.apache.cassandra.auth.PermissionsCache.getPermissions(PermissionsCache.java:45)
 ~[apache-cassandra-3.9.jar:3.9]

at 
org.apache.cassandra.auth.AuthenticatedUser.getPermissions(Authe

RE: Re: Re: A node down every day in a 6 nodes cluster

2018-03-28 Thread Kenneth Brotman
David,

 

How long has the cluster been operating?

How long has the problem been occurring?

 

Kenneth Brotman

 

From: Jeff Jirsa [mailto:jji...@gmail.com] 
Sent: Tuesday, March 27, 2018 7:00 PM
To: Xiangfei Ni
Cc: user@cassandra.apache.org
Subject: Re: 答复: 答复: A node down every day in a 6 nodes cluster

 

 

java.lang.OutOfMemoryError: Java heap space

 

 

You’re OOMing 

 

-- 

Jeff Jirsa

 


On Mar 27, 2018, at 6:45 PM, Xiangfei Ni <xiangfei...@cm-dt.com> wrote:

Hi Jeff,

Today another node was shut down. I have attached the exception log 
file; could you please help to analyze? Thanks.

 

Best Regards, 

 

倪项菲/ David Ni

中移德电网络科技有限公司

Virtue Intelligent Network Ltd, co.

Add: 2003,20F No.35 Luojia creative city,Luoyu Road,Wuhan,HuBei

Mob: +86 13797007811|Tel: + 86 27 5024 2516

 

From: Jeff Jirsa <jji...@gmail.com> 
Sent: March 27, 2018 11:50
To: Xiangfei Ni <xiangfei...@cm-dt.com>
Cc: user@cassandra.apache.org
Subject: Re: Re: A node down every day in a 6 nodes cluster

 

Only one node having the problem is suspicious. It may be that your application is 
improperly pooling connections, or you have a hardware problem.

 

I don't see anything in nodetool that explains it, though you certainly have a 
data model likely to cause problems over time (the cardinality of 
rt_ac_stat.idx_rt_ac_stat_prot_ver is such 
that you have very wide partitions and it'll be difficult to read).
 
 

 

On Mon, Mar 26, 2018 at 8:26 PM, Xiangfei Ni <xiangfei...@cm-dt.com> wrote:

Hi Jeff,

I need to restart the node manually every time; only one node has this 
problem.

I have attached the nodetool output, thanks.

 

Best Regards, 

 

倪项菲/ David Ni

中移德电网络科技有限公司

Virtue Intelligent Network Ltd, co.

Add: 2003,20F No.35 Luojia creative city,Luoyu Road,Wuhan,HuBei

Mob: +86 13797007811 <tel:+86%20137%209700%207811> |Tel: + 86 27 5024 2516 
<tel:+86%2027%205024%202516> 

 

From: Jeff Jirsa <jji...@gmail.com> 
Sent: March 27, 2018 11:03
To: user@cassandra.apache.org
Subject: Re: A node down every day in a 6 nodes cluster

 

That warning isn’t sufficient to understand why the node is going down

 

 

Cassandra 3.9 has some pretty serious known issues - upgrading to 3.11.3 is 
likely a good idea

 

Are the nodes coming up on their own? Or are you restarting them?

 

Paste the output of nodetool tpstats and nodetool cfstats

 

 

 

-- 

Jeff Jirsa

 


On Mar 26, 2018, at 7:56 PM, Xiangfei Ni <xiangfei...@cm-dt.com> wrote:

Hi Cassandra experts,

  I am facing an issue: a node goes down every day in a 6-node cluster, and the 
cluster is in just one DC.

  Every node has 4C 16G, and the heap configuration is MAX_HEAP_SIZE=8192m 
HEAP_NEWSIZE=512m. Every node holds about 200G of data, the RF for the business CF is 
3, and a node goes down once every day. The system.log shows the info below:

WARN  [Native-Transport-Requests-19] 2018-03-26 18:53:17,128 
CassandraAuthorizer.java:101 - CassandraAuthorizer failed to authorize # for 

ERROR [Native-Transport-Requests-19] 2018-03-26 18:53:17,129 
QueryMessage.java:128 - Unexpected error during query

com.google.common.util.concurrent.UncheckedExecutionException: 
java.lang.RuntimeException: 
org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - 
received only 0 responses.

at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2203) 
~[guava-18.0.jar:na]

at com.google.common.cache.LocalCache.get(LocalCache.java:3937) 
~[guava-18.0.jar:na]

at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3941) 
~[guava-18.0.jar:na]

at 
com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4824) 
~[guava-18.0.jar:na]

at org.apache.cassandra.auth.AuthCache.get(AuthCache.java:108) 
~[apache-cassandra-3.9.jar:3.9]

at 
org.apache.cassandra.auth.PermissionsCache.getPermissions(PermissionsCache.java:45)
 ~[apache-cassandra-3.9.jar:3.9]

at 
org.apache.cassandra.auth.AuthenticatedUser.getPermissions(AuthenticatedUser.java:104)
 ~[apache-cassandra-3.9.jar:3.9]

at 
org.apache.cassandra.service.ClientState.authorize(ClientState.java:419) 
~[apache-cassandra-3.9.jar:3.9]

at 
org.apache.cassandra.service.ClientState.checkPermissionOnResourceChain(ClientState.java:352)
 ~[apache-cassandra-3.9.jar:3.9]

at 
org.apache.cassandra.service.ClientState.ensureHasPermission(ClientState.java:329)
 ~[apache-cassandra-3.9.jar:3.9]

at 
org.apache.cassandra.service.ClientState.hasAccess(ClientState.java:316) 
~[apache-cassandra-3.9.jar:3.9]

at 
org.apache.cassandra.service.ClientState.hasColumnFamilyAccess(ClientState.java:300)
 ~[apache-cassandra-3.9.jar:3.9]

at 
org.apache.cassandra.cql3.statements.ModificationStatement.checkAccess(ModificationStatement.java:211)
 ~[apache-cassandra-3.9.jar:3.9]

at 
org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryPro

RE: RE: Re: A node down every day in a 6 nodes cluster

2018-03-27 Thread Kenneth Brotman
First, anything Jeff Jirsa says is likely very accurate, like it being a really 
good idea to also get off the version you’re on and onto a version that fixes 
some of that version’s known problems.

 

Replacing a running node:

https://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsReplaceLiveNode.html
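The procedure behind that link boils down to adding a fresh node and then retiring the old one. A hedged sketch of the steps — host names are placeholders, and replacing a seed node additionally requires removing its address from the seed list on every node first:

```shell
# 1. On the new host: configure cassandra.yaml (cluster_name, seeds,
#    listen_address, snitch) to match the cluster, then start Cassandra
#    so it bootstraps as a normal new node.
sudo service cassandra start

# 2. Wait until the new node shows UN (Up/Normal):
nodetool status

# 3. On the old, misbehaving node: stream its data away and leave the ring.
nodetool decommission
```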

 

Kenneth Brotman

 

 

From: Xiangfei Ni [mailto:xiangfei...@cm-dt.com] 
Sent: Tuesday, March 27, 2018 5:44 AM
To: user@cassandra.apache.org
Subject: Re: RE: Re: A node down every day in a 6 nodes cluster

 

Thanks, Kenneth. This is a production database, and it is one of three seed nodes; do 
you have a doc for replacing a seed node?

 

 

 

Sent from my Xiaomi phone

On March 27, 2018, 7:45 PM, Kenneth Brotman <kenbrot...@yahoo.com.INVALID> wrote:

David,

 

Can you replace the misbehaving node to see if that resolves the problem?

 

Kenneth Brotman

 

From: Xiangfei Ni [mailto:xiangfei...@cm-dt.com] 
Sent: Tuesday, March 27, 2018 3:27 AM
To: Jeff Jirsa
Cc: user@cassandra.apache.org
Subject: Re: Re: A node down every day in a 6 nodes cluster

 

Thanks Jeff,

   So your suggestion is to first resolve the data model issue which 
causes the wide partitions, right?

 

Best Regards, 

 

倪项菲/ David Ni

中移德电网络科技有限公司

Virtue Intelligent Network Ltd, co.

Add: 2003,20F No.35 Luojia creative city,Luoyu Road,Wuhan,HuBei

Mob: +86 13797007811|Tel: + 86 27 5024 2516

 

From: Jeff Jirsa <jji...@gmail.com> 
Sent: March 27, 2018 11:50
To: Xiangfei Ni <xiangfei...@cm-dt.com>
Cc: user@cassandra.apache.org
Subject: Re: Re: A node down every day in a 6 nodes cluster

 

Only one node having the problem is suspicious. It may be that your application is 
improperly pooling connections, or you have a hardware problem.

 

I don't see anything in nodetool that explains it, though you certainly have a 
data model likely to cause problems over time (the cardinality of 
rt_ac_stat.idx_rt_ac_stat_prot_ver is such 
that you have very wide partitions and it'll be difficult to read).
 
 

 

On Mon, Mar 26, 2018 at 8:26 PM, Xiangfei Ni <xiangfei...@cm-dt.com> wrote:

Hi Jeff,

I need to restart the node manually every time; only one node has this 
problem.

I have attached the nodetool output, thanks.

 

Best Regards, 

 

倪项菲/ David Ni

中移德电网络科技有限公司

Virtue Intelligent Network Ltd, co.

Add: 2003,20F No.35 Luojia creative city,Luoyu Road,Wuhan,HuBei

Mob: +86 13797007811 <tel:+86%20137%209700%207811> |Tel: + 86 27 5024 2516 
<tel:+86%2027%205024%202516> 

 

From: Jeff Jirsa <jji...@gmail.com> 
Sent: March 27, 2018 11:03
To: user@cassandra.apache.org
Subject: Re: A node down every day in a 6 nodes cluster

 

That warning isn’t sufficient to understand why the node is going down

 

 

Cassandra 3.9 has some pretty serious known issues - upgrading to 3.11.3 is 
likely a good idea

 

Are the nodes coming up on their own? Or are you restarting them?

 

Paste the output of nodetool tpstats and nodetool cfstats

 

 

 

-- 

Jeff Jirsa

 


On Mar 26, 2018, at 7:56 PM, Xiangfei Ni <xiangfei...@cm-dt.com> wrote:

Hi Cassandra experts,

  I am facing an issue: a node goes down every day in a 6-node cluster, and the 
cluster is in just one DC.

  Every node has 4C 16G, and the heap configuration is MAX_HEAP_SIZE=8192m 
HEAP_NEWSIZE=512m. Every node holds about 200G of data, the RF for the business CF is 
3, and a node goes down once every day. The system.log shows the info below:

WARN  [Native-Transport-Requests-19] 2018-03-26 18:53:17,128 
CassandraAuthorizer.java:101 - CassandraAuthorizer failed to authorize # for 

ERROR [Native-Transport-Requests-19] 2018-03-26 18:53:17,129 
QueryMessage.java:128 - Unexpected error during query

com.google.common.util.concurrent.UncheckedExecutionException: 
java.lang.RuntimeException: 
org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - 
received only 0 responses.

at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2203) 
~[guava-18.0.jar:na]

at com.google.common.cache.LocalCache.get(LocalCache.java:3937) 
~[guava-18.0.jar:na]

at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3941) 
~[guava-18.0.jar:na]

at 
com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4824) 
~[guava-18.0.jar:na]

at org.apache.cassandra.auth.AuthCache.get(AuthCache.java:108) 
~[apache-cassandra-3.9.jar:3.9]

at 
org.apache.cassandra.auth.PermissionsCache.getPermissions(PermissionsCache.java:45)
 ~[apache-cassandra-3.9.jar:3.9]

at 
org.apache.cassandra.auth.AuthenticatedUser.getPermissions(AuthenticatedUser.java:104)
 ~[apache-cassandra-3.9.jar:3.9]

at 
org.apache.cassandra.service.ClientState.authorize(ClientState.java:419) 
~[apache-cassandra-3.9.jar:3.9]

at 
org.apache.cassandra.service.ClientState.checkPermissionOnResourceChain(ClientState.java:352)
 ~[apache-cassandra-3.9.jar:3.9]

at 
org.apache.cassandra.service.

RE: Re: A node down every day in a 6 nodes cluster

2018-03-27 Thread Kenneth Brotman
David,

 

Can you replace the misbehaving node to see if that resolves the problem?

 

Kenneth Brotman

 

From: Xiangfei Ni [mailto:xiangfei...@cm-dt.com] 
Sent: Tuesday, March 27, 2018 3:27 AM
To: Jeff Jirsa
Cc: user@cassandra.apache.org
Subject: Re: Re: A node down every day in a 6 nodes cluster

 

Thanks Jeff,

   So your suggestion is to first resolve the data model issue which 
causes the wide partitions, right?

 

Best Regards, 

 

倪项菲/ David Ni

中移德电网络科技有限公司

Virtue Intelligent Network Ltd, co.

Add: 2003,20F No.35 Luojia creative city,Luoyu Road,Wuhan,HuBei

Mob: +86 13797007811|Tel: + 86 27 5024 2516

 

From: Jeff Jirsa <jji...@gmail.com> 
Sent: March 27, 2018 11:50
To: Xiangfei Ni <xiangfei...@cm-dt.com>
Cc: user@cassandra.apache.org
Subject: Re: Re: A node down every day in a 6 nodes cluster

 

Only one node having the problem is suspicious. It may be that your application is 
improperly pooling connections, or you have a hardware problem.

 

I don't see anything in nodetool that explains it, though you certainly have a 
data model likely to cause problems over time (the cardinality of 
rt_ac_stat.idx_rt_ac_stat_prot_ver is such 
that you have very wide partitions and it'll be difficult to read).
 
 

 

On Mon, Mar 26, 2018 at 8:26 PM, Xiangfei Ni <xiangfei...@cm-dt.com> wrote:

Hi Jeff,

I need to restart the node manually every time; only one node has this 
problem.

I have attached the nodetool output, thanks.

 

Best Regards, 

 

倪项菲/ David Ni

中移德电网络科技有限公司

Virtue Intelligent Network Ltd, co.

Add: 2003,20F No.35 Luojia creative city,Luoyu Road,Wuhan,HuBei

Mob: +86 13797007811 <tel:+86%20137%209700%207811> |Tel: + 86 27 5024 2516 
<tel:+86%2027%205024%202516> 

 

From: Jeff Jirsa <jji...@gmail.com> 
Sent: March 27, 2018 11:03
To: user@cassandra.apache.org
Subject: Re: A node down every day in a 6 nodes cluster

 

That warning isn’t sufficient to understand why the node is going down

 

 

Cassandra 3.9 has some pretty serious known issues - upgrading to 3.11.3 is 
likely a good idea

 

Are the nodes coming up on their own? Or are you restarting them?

 

Paste the output of nodetool tpstats and nodetool cfstats

 

 

 

-- 

Jeff Jirsa

 


On Mar 26, 2018, at 7:56 PM, Xiangfei Ni <xiangfei...@cm-dt.com> wrote:

Hi Cassandra experts,

  I am facing an issue: a node goes down every day in a 6-node cluster, and the 
cluster is in just one DC.

  Every node has 4C 16G, and the heap configuration is MAX_HEAP_SIZE=8192m 
HEAP_NEWSIZE=512m. Every node holds about 200G of data, the RF for the business CF is 
3, and a node goes down once every day. The system.log shows the info below:

WARN  [Native-Transport-Requests-19] 2018-03-26 18:53:17,128 
CassandraAuthorizer.java:101 - CassandraAuthorizer failed to authorize # for 

ERROR [Native-Transport-Requests-19] 2018-03-26 18:53:17,129 
QueryMessage.java:128 - Unexpected error during query

com.google.common.util.concurrent.UncheckedExecutionException: 
java.lang.RuntimeException: 
org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - 
received only 0 responses.

at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2203) 
~[guava-18.0.jar:na]

at com.google.common.cache.LocalCache.get(LocalCache.java:3937) 
~[guava-18.0.jar:na]

at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3941) 
~[guava-18.0.jar:na]

at 
com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4824) 
~[guava-18.0.jar:na]

at org.apache.cassandra.auth.AuthCache.get(AuthCache.java:108) 
~[apache-cassandra-3.9.jar:3.9]

at 
org.apache.cassandra.auth.PermissionsCache.getPermissions(PermissionsCache.java:45)
 ~[apache-cassandra-3.9.jar:3.9]

at 
org.apache.cassandra.auth.AuthenticatedUser.getPermissions(AuthenticatedUser.java:104)
 ~[apache-cassandra-3.9.jar:3.9]

at 
org.apache.cassandra.service.ClientState.authorize(ClientState.java:419) 
~[apache-cassandra-3.9.jar:3.9]

at 
org.apache.cassandra.service.ClientState.checkPermissionOnResourceChain(ClientState.java:352)
 ~[apache-cassandra-3.9.jar:3.9]

at 
org.apache.cassandra.service.ClientState.ensureHasPermission(ClientState.java:329)
 ~[apache-cassandra-3.9.jar:3.9]

at 
org.apache.cassandra.service.ClientState.hasAccess(ClientState.java:316) 
~[apache-cassandra-3.9.jar:3.9]

at 
org.apache.cassandra.service.ClientState.hasColumnFamilyAccess(ClientState.java:300)
 ~[apache-cassandra-3.9.jar:3.9]

at 
org.apache.cassandra.cql3.statements.ModificationStatement.checkAccess(ModificationStatement.java:211)
 ~[apache-cassandra-3.9.jar:3.9]

at 
org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:185)
 ~[apache-cassandra-3.9.jar:3.9]

at 
org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:219) 
~[apache-cassandra-3.9.jar:3.9]

at 
org.ap

A JIRA on a third party web site to be curated by Rahul Singh to Gather / Curate / Organize Cassandra Best Practice and Patterns

2018-03-17 Thread Kenneth Brotman
JIRA CASSANDRA-14321: https://issues.apache.org/jira/browse/CASSANDRA-14321

 

It is clear that various types of users would find invaluable a third party
website that has up-to-date, complete and organized information (and links)
on various Cassandra resources, including third party providers. Such a
website would need to be managed independently, to address the concerns of
many that having links to third parties on the official Apache Cassandra
website is problematic.

 

I'm a quick "+1" for Rahul Singh as the curator of such a site should he be
willing to do it.  He already has the awesome Cassandra list
https://github.com/anant/awesome-cassandra .  He has put a lot of work into
learning how to do this type of thing.  It would solve a lot of problems for
us.

 

Kenneth Brotman

 



RE: [EXTERNAL] RE: What versions should the documentation support now?

2018-03-14 Thread Kenneth Brotman
I don’t think it’s acceptable to have a site that’s “just poor with holes all 
over, goofy examples.”  The documents are a reflection of the quality 
standards of the group.  Why would the testing of the software be any better?  
It sends up red flags to me, Sean.  I’m very concerned about whether the group 
can manage this project when I read things like that!

  

Kenneth Brotman

 

From: Durity, Sean R [mailto:sean_r_dur...@homedepot.com] 
Sent: Wednesday, March 14, 2018 12:40 PM
To: user@cassandra.apache.org
Subject: RE: [EXTERNAL] RE: What versions should the documentation support now?

 

The DataStax documentation is far superior to the Apache Cassandra attempts. 
Apache is just poor with holes all over, goofy examples, etc. It would take a 
team of people working full time to try and catch up with DataStax. I have met 
the DataStax team; they are doing good work. I think it would be far more 
effective to support/encourage the DataStax documentation efforts. I think they 
accept corrections/suggestions – perhaps publish that email address…

 

What is missing most from DataStax (and most software) is the discussions of 
why/when you would change a particular parameter and what should change if the 
parameter changes. If DataStax created a community comments section (somewhat 
similar to what MySQL tried), that would be something worth contributing to. I 
love good docs (like DataStax); Apache Cassandra is hopelessly behind.

 

And, yes, the good documentation from DataStax was a strong reason why our 
company pursued Cassandra as a data technology. It was better than almost any 
other open source project we knew.

 

(Please, let’s refrain from the high pri emails to the user group list…)

 

 

Sean Durity

 

From: Kenneth Brotman [mailto:kenbrot...@yahoo.com.INVALID] 
Sent: Wednesday, March 14, 2018 3:02 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] RE: What versions should the documentation support now?
Importance: High

 

This went nowhere quick.  Come on everyone.  The website has to support users 
who are on “supported” versions of the software.  That’s more than one version. 
 There was a JIRA on this months ago.  You are smart people.  I just gave a 
perfect answer and ended up burning a bunch of time for nothing.  Now it’s back 
on you.  Are you going to properly support the software you create or not!

 

Kenneth Brotman

 

From: Kenneth Brotman [mailto:kenbrot...@yahoo.com.INVALID] 
Sent: Tuesday, March 13, 2018 11:03 PM
To: user@cassandra.apache.org
Subject: RE: What versions should the documentation support now?

 

I made sub directories “2_x” and “3_x” under docs and put a copy of the doc in 
each.  No links were changed yet.  We can work on the files first and discuss 
how we want to change the template and links.  I did the pull request already.

 

Kenneth Brotman

 

From: Jonathan Haddad [mailto:j...@jonhaddad.com] 
Sent: Tuesday, March 13, 2018 6:19 PM
To: user@cassandra.apache.org
Subject: Re: What versions should the documentation support now?

 

Yes, I agree, we should host versioned docs.  I don't think anyone is against 
it, it's a matter of someone having the time to do it.

 

On Tue, Mar 13, 2018 at 6:14 PM kurt greaves <k...@instaclustr.com> wrote:

I’ve never heard of anyone shipping docs for multiple versions, I don’t know 
why we’d do that.  You can get the docs for any version you need by downloading 
C*, the docs are included.  I’m a firm -1 on changing that process.

We should still host versioned docs on the website however. Either that or we 
specify "since version x" for each component in the docs with notes on 
behaviour.

​

 

  _  


The information in this Internet Email is confidential and may be legally 
privileged. It is intended solely for the addressee. Access to this Email by 
anyone else is unauthorized. If you are not the intended recipient, any 
disclosure, copying, distribution or any action taken or omitted to be taken in 
reliance on it, is prohibited and may be unlawful. When addressed to our 
clients any opinions or advice contained in this Email are subject to the terms 
and conditions expressed in any applicable governing The Home Depot terms of 
business or client engagement letter. The Home Depot disclaims all 
responsibility and liability for the accuracy and content of this attachment 
and for any damages or losses arising from any inaccuracies, errors, viruses, 
e.g., worms, trojan horses, etc., or other items of a destructive nature, which 
may be contained in this attachment and shall not be liable for direct, 
indirect, consequential or special damages in connection with this e-mail 
message or its attachment.



RE: What versions should the documentation support now?

2018-03-14 Thread Kenneth Brotman
This went nowhere quick.  Come on everyone.  The website has to support users 
who are on “supported” versions of the software.  That’s more than one version. 
 There was a JIRA on this months ago.  You are smart people.  I just gave a 
perfect answer and ended up burning a bunch of time for nothing.  Now it’s back 
on you.  Are you going to properly support the software you create or not!

 

Kenneth Brotman

 

From: Kenneth Brotman [mailto:kenbrot...@yahoo.com.INVALID] 
Sent: Tuesday, March 13, 2018 11:03 PM
To: user@cassandra.apache.org
Subject: RE: What versions should the documentation support now?

 

I made sub directories “2_x” and “3_x” under docs and put a copy of the doc in 
each.  No links were changed yet.  We can work on the files first and discuss 
how we want to change the template and links.  I did the pull request already.

 

Kenneth Brotman

 

From: Jonathan Haddad [mailto:j...@jonhaddad.com] 
Sent: Tuesday, March 13, 2018 6:19 PM
To: user@cassandra.apache.org
Subject: Re: What versions should the documentation support now?

 

Yes, I agree, we should host versioned docs.  I don't think anyone is against 
it, it's a matter of someone having the time to do it.

 

On Tue, Mar 13, 2018 at 6:14 PM kurt greaves <k...@instaclustr.com> wrote:

I’ve never heard of anyone shipping docs for multiple versions, I don’t know 
why we’d do that.  You can get the docs for any version you need by downloading 
C*, the docs are included.  I’m a firm -1 on changing that process.

We should still host versioned docs on the website however. Either that or we 
specify "since version x" for each component in the docs with notes on 
behaviour.

​



RE: RE: What versions should the documentation support now?

2018-03-14 Thread Kenneth Brotman
I show a 3.0 and a 3.11 branch but no 4.0.  I’m at 
https://github.com/apache/cassandra .

 

 

From: Dinesh Joshi [mailto:dinesh.jo...@yahoo.com.INVALID] 
Sent: Tuesday, March 13, 2018 11:30 PM
To: user@cassandra.apache.org
Subject: Re: RE: What versions should the documentation support now?

 

Kenneth,

 

The only 4.x docs should go in trunk. If you would like to contribute docs to 
the 2.x and/or 3.x releases, please make pull requests against branches for 
those versions.

 

During normal development process, the docs should be updated in trunk. When a 
release is cut from trunk, any further fixes to the docs pertaining to that 
release should go into that branch. This is in principle the same process that 
the code follows. So the docs will live with their respective branches. You 
should not put the documentation for older releases in trunk because it will 
end up confusing the user.

 

It looks like the in-tree docs were introduced in 4.x. They seem to also be 
present in the 3.11 branch. If you're inclined, you might back port them to the 
older 3.x & 2.x releases and update them.

 

Personally, I think focusing on making the 4.x docs awesome is a better use of 
your time.

 

Thanks,

 

Dinesh

 

 

On Tuesday, March 13, 2018, 11:03:04 PM PDT, Kenneth Brotman 
<kenbrot...@yahoo.com.INVALID> wrote: 

 

 

I made sub directories “2_x” and “3_x” under docs and put a copy of the doc in 
each.  No links were changed yet.  We can work on the files first and discuss 
how we want to change the template and links.  I did the pull request already.

 

Kenneth Brotman

 

From: Jonathan Haddad [mailto:j...@jonhaddad.com] 
Sent: Tuesday, March 13, 2018 6:19 PM
To: user@cassandra.apache.org
Subject: Re: What versions should the documentation support now?

 

Yes, I agree, we should host versioned docs.  I don't think anyone is against 
it, it's a matter of someone having the time to do it.

 

On Tue, Mar 13, 2018 at 6:14 PM kurt greaves <k...@instaclustr.com> wrote:

I’ve never heard of anyone shipping docs for multiple versions, I don’t know 
why we’d do that.  You can get the docs for any version you need by downloading 
C*, the docs are included.  I’m a firm -1 on changing that process.

We should still host versioned docs on the website however. Either that or we 
specify "since version x" for each component in the docs with notes on 
behaviour.

​



RE: What versions should the documentation support now?

2018-03-14 Thread Kenneth Brotman


RE: [EXTERNAL] RE: Adding new DC?

2018-03-12 Thread Kenneth Brotman
Kunal,

 

Also to check:

 

You should use the same seed list in every node's yaml file, probably two seeds per data center if you 
will have five nodes in each.  The seed node addresses from all the data centers 
should be listed in each yaml file where it says 
“- seeds:”.  I’m not sure from your previous replies if you’re doing that.
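
As a minimal sketch (the addresses are placeholders, not taken from this thread), the identical seed_provider block, with two seeds drawn from each data center, would go into every node's cassandra.yaml:

```yaml
# Identical on every node in both data centers; two seeds per DC.
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          # Placeholder IPs: two GCE seeds followed by two AWS seeds.
          - seeds: "10.0.0.1,10.0.0.2,172.31.0.1,172.31.0.2"
```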

 

Let us know your results.

 

Kenneth Brotman

 

From: Kenneth Brotman [mailto:kenbrot...@yahoo.com] 
Sent: Monday, March 12, 2018 7:14 PM
To: 'user@cassandra.apache.org'
Subject: RE: [EXTERNAL] RE: Adding new DC?

 

Kunal,

 

Sorry for asking you things you already answered.  You provided a lot of good 
information and you know what you’re doing.  It’s going to be something 
really simple to figure out.  While I read through the thread more closely, I’m 
guessing we are right on top of it, so could I ask you:

 

Please read through 
https://docs.datastax.com/en/dse/5.1/dse-admin/datastax_enterprise/config/configMultiNetworks.html
 as it probably has the answer.  

 

One of things it says specifically is: 

Additional cassandra.yaml configuration for non-EC2 implementations

If multiple network interfaces are used in a non-EC2 implementation, enable 
the listen_on_broadcast_address option.

listen_on_broadcast_address: true

In non-EC2 environments, the public address to private address routing is not 
automatically enabled. Enabling listen_on_broadcast_address allows DSE to 
listen on both listen_address and broadcast_address with two network interfaces.
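
For illustration only (all addresses invented), the relevant cassandra.yaml fragment on a non-EC2 node with a private and a public interface might look like:

```yaml
listen_address: 10.0.0.1             # private interface, intra-DC traffic
broadcast_address: 203.0.113.10      # public IP the other data center connects to
listen_on_broadcast_address: true    # bind both interfaces (non-EC2 NAT setups)
broadcast_rpc_address: 203.0.113.10  # public IP advertised to client drivers
```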

 

Please consider that especially, and be sure everything else it mentions is done.

 

You said you changed the broadcast_rpc_address on one of the instances in GCE 
and saw a change.  Did you update the other nodes in GCE, and then restart 
each one (in a rolling manner)?

 

Did you restart each node in each datacenter starting with the seed nodes since 
you last updated a yaml file?

 

Could the client in your application be causing the problem?  

 

Kenneth Brotman

 

 

From: Kunal Gangakhedkar [mailto:kgangakhed...@gmail.com] 
Sent: Monday, March 12, 2018 4:43 PM
To: user@cassandra.apache.org
Cc: Nikhil Soman
Subject: Re: [EXTERNAL] RE: Adding new DC?

 

Yes, that's correct. The customer wants us to migrate the cassandra setup in 
their AWS account.

 

Thanks,


Kunal

 

On 13 March 2018 at 04:56, Kenneth Brotman <kenbrot...@yahoo.com.invalid> wrote:

I didn’t understand something.  Are you saying you are using one data center on 
Google and one on Amazon?

 

Kenneth Brotman

 

From: Kunal Gangakhedkar [mailto:kgangakhed...@gmail.com] 
Sent: Monday, March 12, 2018 4:24 PM
To: user@cassandra.apache.org
Cc: Nikhil Soman
Subject: Re: [EXTERNAL] RE: Adding new DC?

 

 

On 13 March 2018 at 03:28, Kenneth Brotman <kenbrot...@yahoo.com.invalid> wrote:

You can’t migrate and upgrade at the same time perhaps, but you could do one and 
then the other so as to end up on the new version.  I’m guessing it’s an error in 
the yaml file or a port not open.  Is there any good reason for a production 
cluster to still be on version 2.1.x?

 

I'm not trying to migrate AND upgrade at the same time. However, the apt repo 
shows only 2.1.20 as the available version.

This is the output from the new node in AWS

 

ubuntu@ip-10-0-43-213:~$ apt-cache policy cassandra 
cassandra: 
 Installed: 2.1.20 
 Candidate: 2.1.20 
 Version table: 
*** 2.1.20 500 
   500 http://www.apache.org/dist/cassandra/debian 21x/main amd64 Packages 
   100 /var/lib/dpkg/status
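
For context, the `21x/main` component in that output maps to the configured repository line; the Apache Debian repo names each release series separately (e.g. 21x for 2.1.x), so a sources.list entry like the following, which matches the output above, can only ever offer 2.1.x packages:

```
deb http://www.apache.org/dist/cassandra/debian 21x main
```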

Regarding open ports, I can cqlsh into the GCE node(s) from the AWS node.

As I mentioned earlier, I've opened the ports 9042, 7000, 7001 in GCE firewall 
for the public IP of the AWS instance.

 

As I mentioned earlier, there are some differences in the column types - for 
example, date (>= 2.2) vs. timestamp (2.1.x).

The application has not been updated yet.

Hence sticking to 2.1.x for now.

 

And, so far, 2.1.x has been serving the purpose.



Kunal

 

 

Kenneth Brotman

 

From: Durity, Sean R [mailto:sean_r_dur...@homedepot.com] 
Sent: Monday, March 12, 2018 11:36 AM
To: user@cassandra.apache.org
Subject: RE: [EXTERNAL] RE: Adding new DC?

 

You cannot migrate and upgrade at the same time across major versions. 
Streaming is (usually) not compatible between versions.

 

As to the migration question, I would expect that you may need to put the 
external-facing ip addresses in several places in the cassandra.yaml file. And, 
yes, it would require a restart. Why is a non-restart more desirable? Most 
Cassandra changes require a restart, but you can do a rolling restart and not 
impact your application. This is fairly normal admin work and can/should be 
automated.
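
The rolling restart Sean describes can be partially automated. As a rough sketch (not an official tool), the gating step is to parse `nodetool status` output and wait until every node reports UN (Up/Normal) before moving on to the next node:

```python
# Rough sketch of the gating check for an automated rolling restart (not an
# official tool). After restarting a node, poll `nodetool status` and proceed
# only once every node line reports "UN" (Up/Normal). Parsing assumes the
# stock status layout: node lines start with a two-letter state code.

def all_nodes_up_normal(status_output: str) -> bool:
    """True iff every node line in `nodetool status` output shows state UN."""
    states = []
    for line in status_output.splitlines():
        parts = line.split()
        # Node lines begin with Up/Down + Normal/Leaving/Joining/Moving codes.
        if parts and len(parts[0]) == 2 and parts[0][0] in "UD" and parts[0][1] in "NLJM":
            states.append(parts[0])
    return bool(states) and all(s == "UN" for s in states)


sample = """\
Datacenter: GCE
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load       Tokens  Owns   Host ID                               Rack
UN  10.0.0.1     1.2 GB     256     33.3%  11111111-1111-1111-1111-111111111111  rack1
UN  10.0.0.2     1.1 GB     256     33.3%  22222222-2222-2222-2222-222222222222  rack1
DN  10.0.0.3     1.3 GB     256     33.4%  33333333-3333-3333-3333-333333333333  rack1
"""

print(all_nodes_up_normal(sample))  # prints False while 10.0.0.3 is down
```

In a real script this check would sit in a retry loop between `nodetool drain`, the service restart, and the move to the next node.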

 

How large is the cluster to migrate (# of nodes and size of data)? The 
preferred method might depend on how much data needs to move. Is any 
application outage acceptable?

 

Sean Durity

lord of the (C*) rings (Staff Systems Engineer – Cassandra)

From: Kun

RE: [EXTERNAL] RE: Adding new DC?

2018-03-12 Thread Kenneth Brotman
Kunal,

 

While we are looking into all this I feel compelled to ask you to check your 
security configurations now that you are using public addresses to communicate 
inter-node across data centers.  Are you sure you are using best practices?  

 

Kenneth Brotman

 


RE: [EXTERNAL] RE: Adding new DC?

2018-03-12 Thread Kenneth Brotman
Kunal,

 


From: Kunal Gangakhedkar [mailto:kgangakhed...@gmail.com] 
Sent: Sunday, March 11, 2018 10:20 PM
To: user@cassandra.apache.org
Subject: [EXTERNAL] RE: Adding new DC?

 

Hi Kenneth,

 

Replies inline below.

 

On 12-Mar-2018 3:40 AM, "Kenneth Brotman" <kenbrot...@yahoo.com.invalid> wrote:

Hi Kunal,

 

That version of Cassandra is too far before me so I’ll let others answer.  I 
was wondering why you wouldn’t want to end up on 3.0.x if you’re going through all 
the trouble of migrating anyway?  

 

 

Application side constraints - some data types are d

RE: command to view yaml file setting in use on console

2018-03-12 Thread Kenneth Brotman
You say the nicest things!

 

From: Jeff Jirsa [mailto:jji...@gmail.com] 
Sent: Monday, March 12, 2018 6:43 PM
To: user@cassandra.apache.org
Subject: Re: command to view yaml file setting in use on console

 

CASSANDRA-7622 went Patch Available today.

-- 

Jeff Jirsa

 


On Mar 12, 2018, at 6:40 PM, Kenneth Brotman <kenbrot...@yahoo.com.INVALID> 
wrote:

Is there a command, perhaps a nodetool command to view the actual yaml settings 
a node is using so you can confirm it is using the changes to a yaml file you 
made?

 

Kenneth Brotman
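
There is no stock nodetool command for this in 2.x/3.x as far as I know. As a stopgap, a small script (a hypothetical helper, not an official tool) can at least surface the uncommented top-level settings in the yaml file on disk; note this shows the file, not what the running JVM loaded, though the node also logs its parsed configuration to system.log at startup:

```python
# Hypothetical helper (no such nodetool command exists in 2.1): list the
# uncommented top-level settings in a cassandra.yaml so you can eyeball what
# the node should be running with. This reads the file text on disk; the
# running node separately logs its parsed configuration at startup.

def active_settings(yaml_text: str) -> dict:
    """Collect top-level `key: value` pairs, skipping comments and nesting."""
    settings = {}
    for line in yaml_text.splitlines():
        # Skip nested keys, comments, blanks, and anything without a colon.
        if line.startswith((" ", "\t", "#")) or ":" not in line:
            continue
        key, _, value = line.partition(":")
        settings[key.strip()] = value.strip() or None  # None => nested block
    return settings


sample = """\
cluster_name: 'Test Cluster'
# num_tokens: 256
listen_address: 10.0.0.1
broadcast_rpc_address: 203.0.113.10
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
"""

for key, value in active_settings(sample).items():
    print(key, "=", value)
```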



command to view yaml file setting in use on console

2018-03-12 Thread Kenneth Brotman


