RE: [EXTERNAL] Re: Cassandra migration from 1.25 to 3.x

2019-06-17 Thread Durity, Sean R
The advice so far is exactly correct for an in-place kind of upgrade. The blog post you mentioned is different. They decided to jump versions in Cassandra by standing up a new cluster and using a dual-write/dual-read process for their app. They also wrote code to read and interpret sstables in

Re: [EXTERNAL] Re: Sstableloader

2019-05-30 Thread Goetz, Anthony
It appears you have two goals you are trying to accomplish at the same time. My recommendation is to break it into two different steps. You need to decide if you are going to upgrade DSE or OSS. * Upgrade DSE then migrate to OSS * Upgrade DSE to version that matches OSS 3.11.3

RE: [EXTERNAL] Re: Select in allow filtering stalls whole cluster. How to prevent such behavior?

2019-05-28 Thread Durity, Sean R
This may sound a bit harsh, but I teach my developers that if they are trying to use ALLOW FILTERING – they are doing it wrong! We often choose Cassandra for its high availability and scalability characteristics. We love no downtime. ALLOW FILTERING is breaking the rules of availability and

RE: [EXTERNAL] Re: Python driver concistency problem

2019-05-28 Thread Durity, Sean R
This is a stretch, but are you using authentication and/or authorization? In my understanding the queries executed for you to do the authentication and/or authorization are usually done at LOCAL_ONE (or QUORUM for cassandra user), but maybe there is something that is changed in the security

Re: [EXTERNAL] Two separate rows for the same partition !!

2019-05-16 Thread Ahmed Eljami
The issue is fixed with nodetool scrub; now both rows are under the same clustering. I'll open a Jira to analyze the source of this issue with Cassandra 3.11.3. Thanks. On Thu, May 16, 2019 at 04:53, Jeff Jirsa wrote: > I don’t have a good answer for you - I don’t know if scrub will fix this

Re: [EXTERNAL] Two separate rows for the same partition !!

2019-05-15 Thread Jeff Jirsa
I don’t have a good answer for you - I don’t know if scrub will fix this (you could copy an sstable offline and try it locally in ccm) - you may need to delete and reinsert, though I’m really interested in knowing how this happened if you weren’t ever exposed to #14008. Can you open a JIRA?

Re: [EXTERNAL] Two separate rows for the same partition !!

2019-05-15 Thread Ahmed Eljami
Jeff, In this case, is there any solution to resolve that directly in the sstable (compact, scrub...) or do we have to apply a batch at the client level (delete a partition and rewrite it)? Thank you for your reply. On Wed, May 15, 2019 at 18:09, Ahmed Eljami wrote: > effectively, this was

Re: [EXTERNAL] Two separate rows for the same partition !!

2019-05-15 Thread Ahmed Eljami
Indeed, this was written in 2.1.14 and we upgraded to 3.11.3, so we should not be impacted by this issue?! Thanks.

Re: [EXTERNAL] Two separate rows for the same partition !!

2019-05-15 Thread Jeff Jirsa
https://issues.apache.org/jira/browse/CASSANDRA-14008 If this was written in 2.1/2.2 and you upgraded to 3.0.x (x < 16) or 3.1-3.11.1, could be this issue. -- Jeff Jirsa > On May 15, 2019, at 8:43 AM, Ahmed Eljami wrote: > > What about this part of the dump: > > "type" : "row", >

Re: [EXTERNAL] Two separate rows for the same partition !!

2019-05-15 Thread Ahmed Eljami
What about this part of the dump: "type" : "row", "position" : 4123, "clustering" : [ "", "Token", "abcd", "" ], "cells" : [ { "name" : "dvalue", "value" : "", "tstamp" : "2019-04-26T17:20:39.910Z", "ttl" : 31708792, "expires_at" : "2020-04-27T17:20:31Z",

Re: [EXTERNAL] Two separate rows for the same partition !!

2019-05-15 Thread Ahmed Eljami
Hi Sean, Thanks for the reply. I agree with you about uniqueness, but when the output of sstabledump shows that we have the same value for the column g => "clustering" : [ "", "Token", "abcd", "" ], and when we select with the whole primary key with the values which I see in the sstable, cqlsh

RE: [EXTERNAL] Two separate rows for the same partition !!

2019-05-15 Thread Durity, Sean R
Uniqueness is determined by the partition key PLUS the clustering columns. Hard to tell from your data below, but is it possible that one of the clustering columns (perhaps g) has different values? That would easily explain the 2 rows returned – because they ARE different rows in the same
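The uniqueness rule described in this reply can be pictured with a toy model. This is a minimal sketch, assuming a hypothetical table with a partition key `pk` and a single clustering column; the Python dict only stands in for the storage engine and is not how Cassandra is implemented.

```python
# Toy model: a row is identified by (partition key, clustering values).
# The names table, upsert, and the sample keys are hypothetical.
table = {}

def upsert(pk, clustering, cells):
    """Insert or overwrite the row identified by pk plus clustering values."""
    table.setdefault((pk, tuple(clustering)), {}).update(cells)

upsert("p1", ["abcd"], {"dvalue": "x"})
upsert("p1", ["abcd"], {"dvalue": "y"})  # same full key: overwrites the row
upsert("p1", ["abce"], {"dvalue": "z"})  # different clustering value: new row

# Both rows live in the same partition but are distinct rows.
rows_in_partition = [key for key in table if key[0] == "p1"]
```

Two rows coming back for one partition key is therefore expected whenever any clustering value differs; two rows with an identical full primary key, as reported in this thread, is not.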

RE: [EXTERNAL] Re: Using Cassandra as an object store

2019-04-19 Thread Durity, Sean R
Object stores are some of our largest and oldest use cases. Cassandra has been a good choice for us. We do chunk the objects into 64k chunks (I think), so that partitions are not too large and it scales predictably. For us, the choice was more about high availability and scalability, which
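The chunking approach mentioned here might look roughly like the following. This is a sketch only: the 64 KiB size comes from the post ("I think"), while the function names and the (object id, chunk index) layout are assumptions, not the poster's actual schema.

```python
CHUNK_SIZE = 64 * 1024  # 64 KiB, as recalled in the post; treat as an assumption

def chunk_object(object_id, data, chunk_size=CHUNK_SIZE):
    """Yield (object_id, chunk_index, chunk_bytes) for one stored object."""
    for index, offset in enumerate(range(0, len(data), chunk_size)):
        yield (object_id, index, data[offset:offset + chunk_size])

def reassemble(chunks):
    """Rebuild the object by concatenating chunks in chunk_index order."""
    ordered = sorted(chunks, key=lambda chunk: chunk[1])
    return b"".join(body for _, _, body in ordered)

payload = bytes(150 * 1024)  # a 150 KiB object for illustration
chunks = list(chunk_object("obj-1", payload))
```

Each tuple would map to one row, so no single cell or partition has to hold the whole object.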

Re: [EXTERNAL] multiple Cassandra instances per server, possible?

2019-04-18 Thread Jonathan Koppenhofer
y for your ... > > > -- > > *From:* Jon Haddad > *Sent:* Thursday, April 18, 2019 6:43:15 PM > *To:* user@cassandra.apache.org > *Subject:* Re: [EXTERNAL] multiple Cassandra instances per server, > possible? > > Agreed with Jeff here.

Re: [EXTERNAL] multiple Cassandra instances per server, possible?

2019-04-18 Thread William R
> > From: Jon Haddad > Sent: Thursday, April 18, 2019 6:43:15 PM > To: user@cassandra.apache.org > Subject: Re: [EXTERNAL] multiple Cassandra instances per server, possible? > > Agreed with Jeff here. The whole "community recommends no more than > 1TB" has been ar

Re: [EXTERNAL] multiple Cassandra instances per server, possible?

2019-04-18 Thread Jacques-Henri Berthemet
he.org Subject: Re: [EXTERNAL] multiple Cassandra instances per server, possible? Agreed with Jeff here. The whole "community recommends no more than 1TB" has been around, and inaccurate, for a long time. The biggest issue with dense nodes is how long it takes to replace them.

Re: [EXTERNAL] multiple Cassandra instances per server, possible?

2019-04-18 Thread Jon Haddad
Agreed with Jeff here. The whole "community recommends no more than 1TB" has been around, and inaccurate, for a long time. The biggest issue with dense nodes is how long it takes to replace them. 4.0 should help with that under certain circumstances. On Thu, Apr 18, 2019 at 6:57 AM Jeff Jirsa

Re: [EXTERNAL] multiple Cassandra instances per server, possible?

2019-04-18 Thread Jeff Jirsa
Agreed that you can go larger than 1T on ssd. You can do this safely with both instances in the same cluster if you guarantee two replicas aren’t on the same machine. Cassandra provides a primitive to do this - rack awareness through the network topology snitch. The limitation (until 4.0) is

RE: [EXTERNAL] multiple Cassandra instances per server, possible?

2019-04-18 Thread Durity, Sean R
What is the data problem that you are trying to solve with Cassandra? Is it high availability? Low latency queries? Large data volumes? High concurrent users? I would design the solution to fit the problem(s) you are solving. For example, if high availability is the goal, I would be very

Re: [EXTERNAL] Re: Caused by: com.datastax.driver.core.exceptions.ReadTimeoutException:

2019-04-17 Thread Krishnanand Khambadkone
Thank you gentlemen for all your responses. Reading through them I was able to resolve the issue by doing the following: a. Creating an index on one of the query fields b. Setting page size to 200. Now the query runs instantaneously. On Wednesday, April 17, 2019, 7:12:21 AM PDT, Shaurya

Re: [EXTERNAL] Re: Caused by: com.datastax.driver.core.exceptions.ReadTimeoutException:

2019-04-17 Thread Shaurya Gupta
As already mentioned in this thread, ALLOW FILTERING should be avoided in any scenario. It seems to work in test scenarios, but as soon as the data increases to a certain size (a few MBs), it starts failing miserably and fails almost always. Thanks, Shaurya On Wed, Apr 17, 2019, 6:44 PM Durity,

RE: [EXTERNAL] Re: Caused by: com.datastax.driver.core.exceptions.ReadTimeoutException:

2019-04-17 Thread Durity, Sean R
If you are just trying to get a sense of the data, you could try adding a limit clause to limit the amount of results and hopefully beat the timeout. However, ALLOW FILTERING really means "ALLOW ME TO DESTROY MY APPLICATION AND CLUSTER." It means the data model does not support the query and

Re: [EXTERNAL] Re: Getting Consistency level TWO when it is requested LOCAL_ONE

2019-04-12 Thread Jean Carlo
I think this Jira, https://issues.apache.org/jira/browse/CASSANDRA-9895, answers my question. Regards, Jean Carlo "The best way to predict the future is to invent it" Alan Kay On Fri, Apr 12, 2019 at 10:04 AM Jean Carlo wrote: > Hello Sean > > Well this is a little bit confusing. After digging

Re: [EXTERNAL] Re: Getting Consistency level TWO when it is requested LOCAL_ONE

2019-04-12 Thread Jean Carlo
Hello Sean, Well, this is a little bit confusing. After digging into the doc, I found this old Datastax documentation that says "First, we can dynamically adjust behavior depending on the cluster size and arrangement. Cassandra prefers to perform batchlog writes to two different replicas in the

RE: [EXTERNAL] Re: Getting Consistency level TWO when it is requested LOCAL_ONE

2019-04-11 Thread Durity, Sean R
https://issues.apache.org/jira/browse/CASSANDRA-9620 has something similar that was determined to be a driver error. I would start with looking at the driver version and also the RetryPolicy that is in effect for the Cluster. Secondly, I would look at whether a batch is really needed for the

Re: [EXTERNAL] Issue while updating a record in 3 node cassandra cluster deployed using kubernetes

2019-04-09 Thread Mahesh Daksha
Thank you Sean for your response. We are also suspecting the same and analyzing/troubleshooting it around queries associated timestamp. Thanks, Mahesh Daksha On Tue, Apr 9, 2019 at 7:08 PM Durity, Sean R wrote: > My first suspicion would be to look at the server times in the cluster. It >

RE: [EXTERNAL] Issue while updating a record in 3 node cassandra cluster deployed using kubernetes

2019-04-09 Thread Durity, Sean R
My first suspicion would be to look at the server times in the cluster. It looks like other cases where a write occurs (with no errors) but the data is not retrieved as expected. If the write occurs with an earlier timestamp than the existing data, this is the behavior you would see. The write
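The behavior described here follows from last-write-wins conflict resolution. Below is a minimal sketch under simplifying assumptions: a single cell, integer timestamps, and a hypothetical `apply_write` helper. Real Cassandra also has tie-breaking rules for equal timestamps, which this toy ignores.

```python
def apply_write(cell, value, timestamp):
    """Return the cell after a write; a cell is (value, timestamp) or None.

    Only a strictly newer timestamp replaces the existing value.
    """
    if cell is None or timestamp > cell[1]:
        return (value, timestamp)
    return cell

# A node with a correct clock writes first.
cell = apply_write(None, "old", 1000)
# A node whose clock runs behind then "updates" the value without any error.
cell = apply_write(cell, "new", 900)
```

After both writes the cell still holds `("old", 1000)`: the update was accepted but is shadowed by the existing data, which is why checking clock synchronization across the servers is a sensible first step.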

Re: [EXTERNAL] Re: Garbage Collector

2019-03-22 Thread Ahmed Eljami
Thanks, guys, for sharing your experiences with G1. Since I sent you my question about GC, we have updated the Java version, still with CMS/Java 8, going from u9x to u201. Just with that, we observe a gain of 66% (150ms ==> 50ms of STW) :) We are planning a second tuning, this time with

RE: [EXTERNAL] Re: Garbage Collector

2019-03-19 Thread Durity, Sean R
My default is G1GC using 50% of available RAM (so typically a minimum of 16 GB for the JVM). That has worked in just about every case I’m familiar with. In the old days we used CMS, but tuning that beast is a black art with few wizards available (though several on this mailing list). Today, I

Re: [EXTERNAL] Re: Migrate large volume of data from one table to another table within the same cluster when COPY is not an option.

2019-03-14 Thread Rahul Singh
> - Do this until the data has been copied from old to new (with > dsbulk or custom code or Spark) > > - Drop the double writes and conditional reads > > > > > > Sean > > > > *From:* Stefan Miklosovic > *Sent:* Wednesday, March 13,

RE: [EXTERNAL] Re: Default TTL on CF

2019-03-14 Thread Durity, Sean R
I spent a month of my life on similar problem... There wasn't an easy answer, but this is what I did #1 - Stop the problem from growing further. Get new inserts using a TTL (or set the default on the table so they get it). App team had to do this one. #2 - Delete any data that should already

RE: [EXTERNAL] Re: Migrate large volume of data from one table to another table within the same cluster when COPY is not an option.

2019-03-14 Thread Durity, Sean R
ser@cassandra.apache.org Subject: Re: [EXTERNAL] Re: Migrate large volume of data from one table to another table within the same cluster when COPY is not an option. Hi Leena, as already suggested in my previous email, you could use Apache Spark and Cassandra Spark connector (1). I have checked TTLs an

Re: [EXTERNAL] Re: Cluster size "limit"

2019-03-14 Thread Ahmed Eljami
So fewer vnodes allow more nodes, I understand. But it is still hard to implement on an existing cluster with more than 10 keyspaces with different RFs...

Re: [EXTERNAL] Re: Cluster size "limit"

2019-03-13 Thread Gregory Raevski
nd Regards, *Gregory Raevski* On Thu, 14 Mar 2019 at 05:11, Durity, Sean R wrote: > Rebuild the DCs with a new number of vnodes… I have done it. > > > > Sean > > > > *From:* Ahmed Eljami > *Sent:* Wednesday, March 13, 2019 2:09 PM > *To:* user@cassandra.apache.org

Re: [EXTERNAL] Re: Migrate large volume of data from one table to another table within the same cluster when COPY is not an option.

2019-03-13 Thread Stefan Miklosovic
copy 70M rows from current table to the 2nd table with ttl set > on each record as the first table? > > -- > *From:* Durity, Sean R > *Sent:* Wednesday, March 13, 2019 8:17 AM > *To:* user@cassandra.apache.org > *Subject:* RE: [EXTERNAL] Re: Migrate l

RE: [EXTERNAL] Re: Cluster size "limit"

2019-03-13 Thread Durity, Sean R
Rebuild the DCs with a new number of vnodes… I have done it. Sean From: Ahmed Eljami Sent: Wednesday, March 13, 2019 2:09 PM To: user@cassandra.apache.org Subject: Re: [EXTERNAL] Re: Cluster size "limit" Is not possible with an existing cluster! On Wed, Mar 13, 2019 at 18:39, Duri

Re: [EXTERNAL] Re: Cluster size "limit"

2019-03-13 Thread Ahmed Eljami
It is not possible with an existing cluster! On Wed, Mar 13, 2019 at 18:39, Durity, Sean R wrote: > If you can change to 8 vnodes, it will be much better for repairs and > other kinds of streaming operations. The old advice of 256 per node is now > not very helpful. > > > > Sean > > > > *From:*

RE: [EXTERNAL] Re: Cluster size "limit"

2019-03-13 Thread Durity, Sean R
If you can change to 8 vnodes, it will be much better for repairs and other kinds of streaming operations. The old advice of 256 per node is now not very helpful. Sean From: Ahmed Eljami Sent: Wednesday, March 13, 2019 1:27 PM To: user@cassandra.apache.org Subject: [EXTERNAL] Re: Cluster size

Re: [EXTERNAL] Re: Migrate large volume of data from one table to another table within the same cluster when COPY is not an option.

2019-03-13 Thread Leena Ghatpande
@cassandra.apache.org Subject: RE: [EXTERNAL] Re: Migrate large volume of data from one table to another table within the same cluster when COPY is not an option. Correct, there is no current flag. I think there SHOULD be one. From: Dieudonné Madishon NGAYA Sent: Tuesday, March 12, 2019 7:17 PM

RE: [EXTERNAL] Re: Migrate large volume of data from one table to another table within the same cluster when COPY is not an option.

2019-03-13 Thread Durity, Sean R
Correct, there is no current flag. I think there SHOULD be one. From: Dieudonné Madishon NGAYA Sent: Tuesday, March 12, 2019 7:17 PM To: user@cassandra.apache.org Subject: [EXTERNAL] Re: Migrate large volume of data from one table to another table within the same cluster when COPY is not an

RE: [EXTERNAL] Re: A Question About Hints

2019-03-05 Thread Durity, Sean R
Versions 2.0 and 2.1 were generally very stable, so I can understand a reticence to move when there are so many other things competing for time and attention. Sean Durity From: shalom sagges Sent: Monday, March 04, 2019 4:21 PM To: user@cassandra.apache.org Subject: [EXTERNAL] Re: A

Re: [EXTERNAL] RE: SASI queries- cqlsh vs java driver

2019-02-27 Thread Peter Heitman
I appreciate the thoughtful replies. We will have to evaluate whether cassandra is the right datastore for us. It was chosen because our primary requirement is to store lots of data about lots of devices at a high rate. The search requirements are very secondary but required for the management of

Re: [EXTERNAL] RE: SASI queries- cqlsh vs java driver

2019-02-27 Thread Jonathan Haddad
If the goal is arbitrary queries, I'd avoid Cassandra altogether. Don't use DSE Search or Ellesandra, they're two solutions designed to solve problems that are Cassandra first, search second. I'd go straight to elastic search for workloads that are primarily search driven, like you listed above.

Re: [EXTERNAL] RE: SASI queries- cqlsh vs java driver

2019-02-27 Thread Rahul Singh
+1 on Datastax and could consider looking at Elassandra. On Thu, Feb 7, 2019 at 9:14 AM Durity, Sean R wrote: > Kenneth is right. Trying to port/support a relational model to a CQL model > the way you are doing it is not going to go well. You won’t be able to > scale or get the search

RE: [EXTERNAL] Re: Question on changing node IP address

2019-02-27 Thread Durity, Sean R
. From: Alexander Dejanovski Sent: Wednesday, February 27, 2019 9:22 AM To: user@cassandra.apache.org Subject: Re: [EXTERNAL] Re: Question on changing node IP address It has to be balanced with the dangers related to the PropertyFileSnitch. I've seen such incidents happen twice in the last few

Re: [EXTERNAL] Re: Question on changing node IP address

2019-02-27 Thread Oleksandr Shulgin
On Wed, Feb 27, 2019 at 3:11 PM Durity, Sean R wrote: > We use the PropertyFileSnitch precisely because it is the same on every > node. If each node has to have a different file (for GPFS) – deployment is > more complicated. (And for any automated configuration you would have a > list of hosts

Re: [EXTERNAL] Re: Question on changing node IP address

2019-02-27 Thread Alexander Dejanovski
It has to be balanced with the dangers related to the PropertyFileSnitch. I've seen such incidents happen twice in the last few months in different places and both times recovery was difficult and hazardous. I still strongly recommend against it. On Wed, Feb 27, 2019 at 3:11 PM Durity, Sean R

RE: [EXTERNAL] Re: Question on changing node IP address

2019-02-27 Thread Durity, Sean R
We use the PropertyFileSnitch precisely because it is the same on every node. If each node has to have a different file (for GPFS) – deployment is more complicated. (And for any automated configuration you would have a list of hosts and DC/rack information to compile anyway) I do put UNKNOWN

Re: [EXTERNAL] Re: Question on changing node IP address

2019-02-26 Thread Oleksandr Shulgin
On Tue, Feb 26, 2019 at 3:26 PM Durity, Sean R wrote: > This has not been my experience. Changing IP address is one of the worst > admin tasks for Cassandra. System.peers and other information on each nodes > is stored by ip address. And gossip is really good at sending around the > old

RE: [EXTERNAL] Re: Question on changing node IP address

2019-02-26 Thread Durity, Sean R
This has not been my experience. Changing IP address is one of the worst admin tasks for Cassandra. System.peers and other information on each nodes is stored by ip address. And gossip is really good at sending around the old information mixed with new… Sean Durity From: Oleksandr Shulgin

Re: [EXTERNAL] Re: Make large partitons lighter on select without changing primary partition formation.

2019-02-14 Thread Jeff Jirsa
If increase the value, it will affect only newly created indexes. Will repair > rebuilds old indexes with new , larger, size, or leave them with the same > size? > > Best regards, Ilya > > > > Original message > Subject: Re: [EXTERNAL] R

Re: [EXTERNAL] Re: Make large partitons lighter on select without changing primary partition formation.

2019-02-13 Thread ishib...@gmail.com
Hi Jeff, If I increase the value, it will affect only newly created indexes. Will repair rebuild old indexes with the new, larger size, or leave them with the same size? Best regards, Ilya Original message Subject: Re: [EXTERNAL] Re: Make large partitons lighter on select without

Re: [EXTERNAL] Re: Make large partitons lighter on select without changing primary partition formation.

2019-02-13 Thread Jeff Jirsa
> a row only? > > Best regards, Ilya > > > > Original message > Subject: Re: [EXTERNAL] Re: Make large partitons lighter on select without > changing primary partition formation. > From: Jeff Jirsa > To: user@cassandr

Re: [EXTERNAL] Re: Make large partitons lighter on select without changing primary partition formation.

2019-02-13 Thread ishib...@gmail.com
Hello! Increase column_index_size_in_kb so that index entries are created more rarely, am I correct? But will it be used in every read request, or is the column index for queries within a row only? Best regards, Ilya Original message Subject: Re: [EXTERNAL] Re: Make large partitons lighter on select without

Re: [EXTERNAL] Re: Make large partitons lighter on select without changing primary partition formation.

2019-02-13 Thread Vsevolod Filaretov
@all, thank you for your answers, Jeff, Thank you very much, will look into it. On Wed, Feb 13, 2019, 18:38 Jeff Jirsa jji...@gmail.com wrote: > Cassandra-11206 (https://issues.apache.org/jira/browse/CASSANDRA-11206) > is in 3.11 and does have a few knobs to make this less painful > > You can also

Re: [EXTERNAL] Re: Make large partitons lighter on select without changing primary partition formation.

2019-02-13 Thread Jeff Jirsa
Cassandra-11206 (https://issues.apache.org/jira/browse/CASSANDRA-11206) is in 3.11 and does have a few knobs to make this less painful You can also increase the column index size from 64kb to something significantly higher to decrease the cost of those reads on the JVM (shifting cost to the

RE: [EXTERNAL] Re: Make large partitons lighter on select without changing primary partition formation.

2019-02-13 Thread Durity, Sean R
Agreed. It’s pretty close to impossible to administrate your way out of a data model that doesn’t play to Cassandra’s strengths. Which is true for other data storage technologies – you need to model the data the way that the engine is designed to work. Sean Durity From: DuyHai Doan Sent:

Re: [EXTERNAL] Re: Bootstrap keeps failing

2019-02-07 Thread Léo FERLIN SUTTON
Thank you for the recommendation. We are already using datastax's recommended settings for tcp_keepalive. Regards, Leo On Thu, Feb 7, 2019 at 5:49 PM Durity, Sean R wrote: > I have seen unreliable streaming (streaming that doesn’t finish) because > of TCP timeouts from firewalls or switches.

RE: [EXTERNAL] Re: Bootstrap keeps failing

2019-02-07 Thread Durity, Sean R
I have seen unreliable streaming (streaming that doesn’t finish) because of TCP timeouts from firewalls or switches. The default tcp_keepalive kernel parameters are usually not tuned for that. See https://docs.datastax.com/en/dse-trblshoot/doc/troubleshooting/idleFirewallLinux.html for more

RE: [EXTERNAL] RE: SASI queries- cqlsh vs java driver

2019-02-07 Thread Durity, Sean R
Kenneth is right. Trying to port/support a relational model to a CQL model the way you are doing it is not going to go well. You won’t be able to scale or get the search flexibility that you want. It will make Cassandra seem like a bad fit. You want to play to Cassandra’s strengths –

Re: [EXTERNAL] fine tuning for wide rows and mixed worload system

2019-01-11 Thread Marco Gasparini
Hi Sean, > I will start – knowing that others will have additional help/questions I hope that, I really need help with this :) > What heap size are you using? Sounds like you are using the CMS garbage collector. Yes, I'm using CMS garbage Collector. I have not used G1 because I read it isn't

RE: [EXTERNAL] fine tuning for wide rows and mixed worload system

2019-01-11 Thread Durity, Sean R
I will start – knowing that others will have additional help/questions. What heap size are you using? Sounds like you are using the CMS garbage collector. That takes some arcane knowledge and lots of testing to tune. I would start with G1 and using ½ the available RAM as the heap size. I would

RE: [EXTERNAL] Re: Good way of configuring Apache spark with Apache Cassandra

2019-01-10 Thread Durity, Sean R
: Dor Laor Sent: Wednesday, January 09, 2019 11:23 PM To: user@cassandra.apache.org Subject: Re: [EXTERNAL] Re: Good way of configuring Apache spark with Apache Cassandra On Wed, Jan 9, 2019 at 7:28 AM Durity, Sean R wrote: I think you could consider op

RE: [EXTERNAL] Re: Good way of configuring Apache spark with Apache Cassandra

2019-01-10 Thread Durity, Sean R
. Sean Durity From: Goutham reddy Sent: Wednesday, January 09, 2019 11:29 AM To: user@cassandra.apache.org Subject: Re: [EXTERNAL] Re: Good way of configuring Apache spark with Apache Cassandra Thanks Sean. But what if I want to have both Spark and elasticsearch with Cassandra as separate data

Re: [EXTERNAL] Re: Good way of configuring Apache spark with Apache Cassandra

2019-01-09 Thread Dor Laor
On Wed, Jan 9, 2019 at 7:28 AM Durity, Sean R wrote: > I think you could consider option C: Create a (new) analytics DC in > Cassandra and run your spark nodes there. Then you can address the scaling > just on that DC. You can also use less vnodes, only replicate certain > keyspaces, etc. in

Re: [EXTERNAL] Howto avoid tombstones when inserting NULL values

2019-01-09 Thread Jonathan Haddad
> I’m still not sure if having tombstones vs. empty values / frozen UDTs will have the same results. When in doubt, benchmark. Good luck, Jon On Wed, Jan 9, 2019 at 3:02 PM Tomas Bartalos wrote: > Losing atomic updates is a good point, but in my use case it's not a > problem, since I always

Re: [EXTERNAL] Howto avoid tombstones when inserting NULL values

2019-01-09 Thread Tomas Bartalos
Losing atomic updates is a good point, but in my use case it's not a problem, since I always overwrite the whole record (no partial updates). I’m still not sure if having tombstones vs. empty values / frozen UDTs will have the same results. When I update one row with 10 null columns it will

Re: [EXTERNAL] Re: Good way of configuring Apache spark with Apache Cassandra

2019-01-09 Thread Goutham reddy
Thanks Sean. But what if I want to have both Spark and elasticsearch with Cassandra as separate data centers? Does that cause any overhead? On Wed, Jan 9, 2019 at 7:28 AM Durity, Sean R wrote: > I think you could consider option C: Create a (new) analytics DC in > Cassandra and run your spark

RE: [EXTERNAL] Re: Good way of configuring Apache spark with Apache Cassandra

2019-01-09 Thread Durity, Sean R
I think you could consider option C: Create a (new) analytics DC in Cassandra and run your spark nodes there. Then you can address the scaling just on that DC. You can also use less vnodes, only replicate certain keyspaces, etc. in order to perform the analytics more efficiently. Sean Durity

Re: [EXTERNAL] Howto avoid tombstones when inserting NULL values

2019-01-04 Thread DuyHai Doan
The idea of storing your data as a single blob can be dangerous. Indeed, you lose the ability to perform atomic updates on each column. In Cassandra, LWW is the rule. Suppose 2 concurrent updates on the same row, 1st update changes column Firstname (let's say it's a Person record) and 2nd update

Re: [EXTERNAL] Howto avoid tombstones when inserting NULL values

2019-01-04 Thread Jonathan Haddad
Those are two different cases though. It *sounds like* (again, I may be missing the point) you're trying to overwrite a value with another value. You're either going to serialize a blob and overwrite a single cell, or you're going to overwrite all the cells and include a tombstone. When you do a

Re: [EXTERNAL] Howto avoid tombstones when inserting NULL values

2019-01-04 Thread Tomas Bartalos
Hello Jon, I thought having tombstones is much higher overhead than just overwriting values. The compaction overhead can be similar, but I think the read performance is much worse. Tombstones accumulate and hang for 10 days (by default) before they are eligible for compaction. Also we have

Re: [EXTERNAL] Howto avoid tombstones when inserting NULL values

2019-01-04 Thread Jonathan Haddad
If you're overwriting values, it really doesn't matter much if it's a tombstone or any other value, they still need to be compacted and have the same overhead at read time. Tombstones are problematic when you try to use Cassandra as a queue (or something like a queue) and you need to scan over

Re: [EXTERNAL] Howto avoid tombstones when inserting NULL values

2019-01-04 Thread Tomas Bartalos
Hello, I believe your approach is the same as using Spark with "spark.cassandra.output.ignoreNulls=true". This will not cover the situation when a value has to be overwritten with null. I found one possible solution - change the schema to keep only primary key fields and move all other fields to

Re: [EXTERNAL] Howto avoid tombstones when inserting NULL values

2019-01-04 Thread DuyHai Doan
"The problem is I can't know the combination of set/unset values" --> Just for this requirement, Achilles has had a working solution for many years using the INSERT_NOT_NULL_FIELDS strategy: https://github.com/doanduyhai/Achilles/wiki/Insert-Strategy Or you can use the Update API that by design only

Re: [EXTERNAL] Writes and Reads with high latency

2018-12-28 Thread Marco Gasparini
nes. I try to design my data partitions so that deletes are for a > full partition. Then I won’t be reading through 1000s (or more) tombstones > trying to find the live data. > > > > > > Sean Durity > > > > *From:* Marco Gasparini > *Sent:* Thursday, December 27

Re: [EXTERNAL] Howto avoid tombstones when inserting NULL values

2018-12-27 Thread Tomas Bartalos
Hello, The problem is I can't know the combination of set/unset values. From my perspective every value should be set. The event from Kafka represents the complete state of the happening at certain point in time. In my table I want to store the latest event so the most recent state of the

Re: [EXTERNAL] Howto avoid tombstones when inserting NULL values

2018-12-27 Thread Eric Stevens
Depending on the use case, creating separate prepared statements for each combination of set / unset values in large INSERT/UPDATE statements may be prohibitive. Instead, you can look into driver level support for UNSET values. Requires Cassandra 2.2 or later IIRC. See: Java Driver:

RE: [EXTERNAL] Howto avoid tombstones when inserting NULL values

2018-12-27 Thread Durity, Sean R
You say the events are incremental updates. I am interpreting this to mean only some columns are updated. Others should keep their original values. You are correct that inserting null creates a tombstone. Can you only insert the columns that actually have new values? Just skip the columns with
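The suggestion to insert only the columns that have values could be sketched as follows. This is an illustration under assumptions: `build_insert`, the table name `ks.events`, and the column names are hypothetical, the output is a statement string with `?` bind markers plus a parameter list, and no driver API is involved.

```python
def build_insert(table, row):
    """Build an INSERT that names only columns whose value is not None.

    Omitted columns are simply not written, so no tombstone is created
    for them and any previously stored value is kept.
    """
    present = {column: value for column, value in row.items() if value is not None}
    columns = ", ".join(present)
    markers = ", ".join("?" for _ in present)
    statement = f"INSERT INTO {table} ({columns}) VALUES ({markers})"
    return statement, list(present.values())

statement, params = build_insert(
    "ks.events", {"id": 1, "name": None, "state": "open"}
)
```

Two caveats raised elsewhere in this thread apply: each distinct combination of present columns yields a different statement to prepare, and skipping a column cannot overwrite an existing value with null.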

RE: [EXTERNAL] Writes and Reads with high latency

2018-12-27 Thread Durity, Sean R
data. Sean Durity From: Marco Gasparini Sent: Thursday, December 27, 2018 3:01 AM To: user@cassandra.apache.org Subject: Re: [EXTERNAL] Writes and Reads with high latency Hello Sean, here is my schema and RF: CREATE

Re: [EXTERNAL] Writes and Reads with high latency

2018-12-27 Thread Marco Gasparini
Hello Sean, here is my schema and RF: CREATE KEYSPACE my_keyspace WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': '1'} AND durable_writes = true; CREATE TABLE my_keyspace.my_table ( pkey text,

RE: [EXTERNAL] Writes and Reads with high latency

2018-12-21 Thread Durity, Sean R
Can you provide the schema and the queries? What is the RF of the keyspace for the data? Are you using any Retry policy on your Cluster object? Sean Durity From: Marco Gasparini Sent: Friday, December 21, 2018 10:45 AM To: user@cassandra.apache.org Subject: [EXTERNAL] Writes and Reads with

RE: [EXTERNAL] Re: upgrade Apache Cassandra 2.1.9 to 3.0.9

2018-12-05 Thread Durity, Sean R
is completed. So, I push to get upgradesstables completed as soon as possible. Sean Durity From: Shravan R Sent: Tuesday, December 04, 2018 3:39 PM To: user@cassandra.apache.org Subject: Re: [EXTERNAL] Re: upgrade Apache Cassandra 2.1.9 to 3.0.9 Thanks Sean. I have automation in place that can put

Re: [EXTERNAL] Re: upgrade Apache Cassandra 2.1.9 to 3.0.9

2018-12-04 Thread Shravan R
Thanks Sean. I have automation in place that can put the new binary and restart the node to a newer version as quickly as possible. upgradesstables is I/O intensive and it takes time and is proportional to the data on the node. Given these constraints, is there a risk due to prolonged

RE: [EXTERNAL] Cassandra Upgrade Plan 2.2.4 to 3.11.3

2018-12-04 Thread Durity, Sean R
See my recent post for some additional points. But I wanted to encourage you to look at the in-place upgrade on your existing hardware. No need to add a DC to try and upgrade. The cluster will handle reads and writes with nodes of different versions – no problems. I have done this many times on

RE: [EXTERNAL] Re: upgrade Apache Cassandra 2.1.9 to 3.0.9

2018-12-04 Thread Durity, Sean R
We have had great success with Cassandra upgrades with applications staying on-line. It is one of the strongest benefits of Cassandra. A couple things I incorporate into upgrades: - The main task is getting the new binaries loaded, then restarting the node – in a rolling fashion. Get

Re: [EXTERNAL] Availability issues for write/update/read workloads (up to 100s downtime) in case of a Cassandra node failure

2018-11-23 Thread Daniel Seybold
Hi Alexander, thanks a lot for the pointers, I checked the mentioned issue. While the reported issue seems to match our problem, it only occurs for reads and not for writes (according to the Datastax Jira). But we experience downtimes for writes and reads. Which version of the Datastax Driver

Re: [EXTERNAL] Availability issues for write/update/read workloads (up to 100s downtime) in case of a Cassandra node failure

2018-11-16 Thread Alexander Dejanovski
Hi Daniel, it seems like the driver isn't detecting that the node went down, which is probably due to the way the node is being killed. If I remember correctly, in some cases the Netty transport is still up in the client, which still allows queries to be sent without them being answered:

Re: [EXTERNAL] Availability issues for write/update/read workloads (up to 100s downtime) in case of a Cassandra node failure

2018-11-16 Thread Daniel Seybold
Hi Sean, thanks for your comments, find below some more details with respect to (1) the VM sizing and (2) the replication factor: (1) VM sizing: We selected the small VMs as the initial setup to run our experiments. We have also executed the same experiments (5 nodes) on larger VMs with 6 cores

Re: [EXTERNAL] Is Apache Cassandra supports Data at rest

2018-11-14 Thread Jonathan Haddad
Just because Cassandra doesn't do it doesn't mean you aren't able to encrypt your data at rest, and you definitely don't need DSE to do it. I recommend checking out the LUKS project. https://gitlab.com/cryptsetup/cryptsetup/blob/master/README.md This, IMO, is a better option than having the

Re: [EXTERNAL] Is Apache Cassandra supports Data at rest

2018-11-14 Thread Ben Slater
I wrote a blog post a while ago on the pros and cons of encrypting in your application for use with Cassandra that you might find useful background on this subject: https://www.instaclustr.com/securing-apache-cassandra-with-application-level-encryption/ Cheers Ben On Wed, 14 Nov 2018 at 13:47

RE: [EXTERNAL] Is Apache Cassandra supports Data at rest

2018-11-14 Thread Durity, Sean R
I think you are asking about *encryption* at rest. To my knowledge, open source Cassandra does not support this natively. There are options, like encrypting the data in the application before it gets to Cassandra. Some companies offer other solutions. IMO, if you need the increased security, it

RE: [EXTERNAL] Re: Multiple cluster for a single application

2018-11-08 Thread Durity, Sean R
We have a cluster over 100 nodes that performs just fine for its use case. In our case, we needed the disk space and did not want the admin headache of very dense nodes. It does take more automation and process to handle a larger cluster, but those are all good things to solve anyway. But

RE: [EXTERNAL] Re: rolling version upgrade, upgradesstables, and vulnerability window

2018-10-30 Thread Durity, Sean R
Just to pile on: I agree. On our upgrades, I always aim to get the binary part done on all nodes before worrying about upgradesstables. Upgrade is one node at a time (precautionary). Upgradesstables depends on cluster size, data size, compactionthroughput, etc. I usually start with running

RE: [EXTERNAL] Re: [E] Re: nodetool status and node maintenance

2018-10-29 Thread Durity, Sean R
I have wrapped nodetool info into my own script that strips out and interprets the information I care about. That script also sets a return code based on the health of that node (which protocols are up, etc.). Then I can monitor the individual health of the node – as that node sees itself. I
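A wrapper of the kind described above might look roughly like the following Python sketch. It is not the author's script: the function names, the sample text, and the choice of which fields count as "healthy" are assumptions, and the field labels (`Gossip active`, `Native Transport active`) are assumed to match what your Cassandra version prints for `nodetool info`.

```python
# Fields that must report "true" for the node to count as healthy.
# Assumption: these labels match the `nodetool info` output of your version.
REQUIRED = ("Gossip active", "Native Transport active")


def parse_nodetool_info(text):
    """Turn 'Key   : value' lines into a dict, ignoring lines without a colon."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def health_code(text, required=REQUIRED):
    """Return (exit_code, down): 0 if every required field is 'true', else 1."""
    info = parse_nodetool_info(text)
    down = [name for name in required if info.get(name, "").lower() != "true"]
    return (1 if down else 0), down


# Illustrative sample only, not captured from a real node.
SAMPLE = """\
Gossip active          : true
Thrift active          : false
Native Transport active: true
"""

code, down = health_code(SAMPLE)
print(code, down)
```

In a real script the text would come from something like `subprocess.run(["nodetool", "info"], capture_output=True, text=True).stdout`, and the first element of the result would be passed to `sys.exit` so that a monitoring tool can act on the return code.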

RE: [EXTERNAL] Re: Installing a Cassandra cluster with multiple Linux OSs (Ubuntu+CentOS)

2018-10-23 Thread Durity, Sean R
Agreed. I have run clusters with both RHEL5 and RHEL6 nodes. Sean Durity From: Jeff Jirsa Sent: Sunday, October 14, 2018 12:40 PM To: user@cassandra.apache.org Subject: [EXTERNAL] Re: Installing a Cassandra cluster with multiple Linux OSs (Ubuntu+CentOS) Should be fine, just get the java and

Re: [EXTERNAL] Re: Tracing in cassandra

2018-10-12 Thread Nitan Kainth
Try the query with the partition key in the WHERE clause. But time for limit 11 shouldn’t fail. Are all nodes up? Do you see any corruption in any sstable? Sent from my iPhone > On Oct 12, 2018, at 11:40 AM, Abdul Patel wrote: > > Sean, > > here it is : > CREATE TABLE Keyspave.tblname ( >

Re: [EXTERNAL] Re: Tracing in cassandra

2018-10-12 Thread Abdul Patel
Sean, here it is : CREATE TABLE Keyspave.tblname ( user_id bigint, session_id text, application_guid text, last_access_time timestamp, login_time timestamp, status int, terminated_by text, update_time timestamp, PRIMARY KEY (user_id, session_id) ) WITH

Re: [EXTERNAL] Upcoming Cassandra-related Conferences

2018-10-08 Thread Peter Corless
Hey folks! Sean: I did a blog on Distributed Data Summit. On top of the Scylla-oriented content, I covered Nate's keynote and highlighted the sidecar talk by Netflix (incl. YouTube video for anyone who wanted to watch it

RE: [EXTERNAL] Upcoming Cassandra-related Conferences

2018-10-08 Thread Durity, Sean R
Thank you. I do want to hear about future conferences. I would also love to hear reports/summaries/highlights from folks who went to Distributed Data Summit (or other conferences). I think user conferences are great! Sean Durity From: Max C. Sent: Friday, October 05, 2018 8:33 PM To:
