RE: [EXTERNAL] Cassandra vs MySQL

2018-03-20 Thread Durity, Sean R
I’m not sure there is a fair comparison. MySQL and Cassandra have different ways of solving related (but not necessarily the same) problems of storing and retrieving data. The data model between MySQL and Cassandra is likely to be very different. The key for Cassandra is that you need to model

RE: [EXTERNAL] Re: Cassandra downgrade version

2018-04-19 Thread Durity, Sean R
This answer surprises me, because I would expect NOT to be able to downgrade if there are any changes in the sstable structure. I assume:
- Upgrade is done while the application is up and writing data (so any new data is written in the new format)
- Any compactions that happen

RE: [EXTERNAL] Re: How to configure Cassandra to NOT use SSLv2?

2018-04-24 Thread Durity, Sean R
I think I would start with the JVM. Sometimes, for export purposes, the cryptography extensions (JCE) are in a separate jar or package from the standard JRE or JVM. I haven’t used the IBM JDK, so I don’t know specifically about that one. Also, perhaps the error is correct – SSLv2Hello is not a

RE: [EXTERNAL] Re: Cassandra reaper

2018-04-26 Thread Durity, Sean R
Wait, isn’t this the Apache Cassandra mailing list? Shouldn’t this be on the pickle users list or something? (Just kidding, everyone. I think there should be room for reaper and DataStax inquiries here.) Sean Durity From: Joaquin Casares [mailto:joaq...@thelastpickle.com] Sent: Tuesday, April

RE: [EXTERNAL] Cassandra limitations

2018-05-04 Thread Durity, Sean R
The issue is more with the number of tables, not the number of keyspaces. Because each table has a memtable, there is a practical limit to the number of memtables that a node can hold in its memory. (And scaling out doesn’t help, because every node still has a memtable for every table.) The prac

RE: [EXTERNAL] Re: Error after 3.1.0 to 3.11.2 upgrade

2018-05-14 Thread Durity, Sean R
A couple of additional things:
- Make sure that you ran repair on the system_auth keyspace on all nodes after changing the RF
- If you are not often changing roles/permissions, you might consider increasing permissions_validity_in_ms and roles_validity_in_ms so they are not being

RE: [EXTERNAL] IN clause of prepared statement

2018-05-21 Thread Durity, Sean R
One of the columns you are selecting is a list or map or other kind of collection. You can’t do that with an IN clause against a clustering column. Either don’t select the collection column OR don’t use the IN clause. Cassandra is trying to protect itself (and you) from a query that won’t scale

RE: [EXTERNAL] Re: apache-cassandra 2.2.8 rpm

2018-06-11 Thread Durity, Sean R
> Finally can I run mixed Datastax and Apache nodes in the same cluster same version?
> Thank you for all your help.
I have run DSE and Apache Cassandra in the same cluster while migrating to DSE. The versions of Cassandra were the same. It was relatively brief -- just during the upgrade proce

RE: [EXTERNAL] Cluster is unbalanced

2018-06-18 Thread Durity, Sean R
Are you using any rack-aware topology? What are your partition keys? Is it possible that your partition keys do not divide up as cleanly as you would like across the cluster because the data is not evenly distributed (by partition key)? Sean Durity lord of the (C*) rings (Staff Systems Enginee

RE: [EXTERNAL] Re: Tombstone

2018-06-19 Thread Durity, Sean R
This sounds like a queue pattern, which is typically an anti-pattern for Cassandra. I would say that it is very difficult to get the access patterns, tombstones, and everything else lined up properly to solve a queue problem. Sean Durity From: Abhishek Singh Sent: Tuesday, June 19, 2018 10:41

RE: RE: [EXTERNAL] Cluster is unbalanced

2018-06-19 Thread Durity, Sean R
ons (estimate): 15839280 On Monday, June 18, 2018, 5:39:08 PM EDT, Durity, Sean R <sean_r_dur...@homedepot.com> wrote: Are you using any rack aware topology? What are your partition keys? Is it possible that your partition keys do not divide up as cleanly as you would lik

RE: [EXTERNAL] Re: consultant recommendations

2018-06-29 Thread Durity, Sean R
I haven’t ever hired a Cassandra consultant, but the company named The Last Pickle (yes, an odd name) has some outstanding Cassandra experts. Not sure how they work, but worth a mention here. Nothing against Instaclustr. There are great folks there, too. Sean Durity From: Evelyn Smith Sent:

RE: [EXTERNAL] Re: JVM Heap erratic

2018-07-03 Thread Durity, Sean R
THIS! A well-reasoned and clear explanation of a very difficult topic. This is the kind of gold that a user mailing list can provide. Thank you, Alain! Sean Durity From: Alain RODRIGUEZ Sent: Tuesday, July 03, 2018 6:37 AM To: user cassandra.apache.org Subject: [EXTERNAL] Re: JVM Heap erratic

RE: [EXTERNAL] New cluster vs Increasing nodes to already existed cluster

2018-07-16 Thread Durity, Sean R
In most cases, we separate clusters by application. This does help with isolating problems. A bad query in one application won’t affect other applications. Also, you can then scale each cluster as required by the data demands. You can also upgrade separately, which may be a huge help. You only

RE: [EXTERNAL] Re: Cassandra recommended server uptime?

2018-07-17 Thread Durity, Sean R
We do not have any scheduled, periodic node restarts. I have been working on Cassandra across many versions, and I have not seen a case where periodic restarts would solve any problem that I saw. There are certainly times when a node needs a restart – but those are because of specific reasons.

RE: [EXTERNAL] optimization to cassandra-env.sh

2018-07-26 Thread Durity, Sean R
Check the archives for CMS or G1 (whichever garbage collector you are using). There has been significant and good advice on both. In general, though, G1 has one basic number to set and does very well in our use cases. CMS has lots of black art/science tuning and configuration, but you can test o

RE: [EXTERNAL] optimization to cassandra-env.sh

2018-07-26 Thread Durity, Sean R
This is a very good explanation of CMS tuning for Cassandra: http://thelastpickle.com/blog/2018/04/11/gc-tuning.html (author Jon Haddad has extensive Cassandra experience – a superstar in our field) Sean Durity From: Durity, Sean R Sent: Thursday, July 26, 2018 2:08 PM To: user

RE: [EXTERNAL] Server kernal Parameters for cassandra

2018-07-30 Thread Durity, Sean R
Here are some to review and test for Cassandra 3.x from DataStax: https://docs.datastax.com/en/dse/5.1/dse-admin/datastax_enterprise/config/configRecommendedSettings.html Al Tobey has done extensive work in this area, too. This is dated (Cassandra 2.1), but is worth mining for information: https:

RE: [EXTERNAL] full text search on some text columns

2018-07-31 Thread Durity, Sean R
That sounds like a problem tailor-made for the DataStax Search (embedded SOLR) solution. I think that would be the fastest path to success. Sean Durity From: onmstester onmstester Sent: Tuesday, July 31, 2018 10:46 AM To: user Subject: [EXTERNAL] full text search on some text columns I need

RE: [EXTERNAL] Re: Cassandra rate dropping over long term test

2018-08-03 Thread Durity, Sean R
I wonder if you are building up tombstones with the deletes. Can you share your data model? Are the deleted rows using the same partition key as new rows? Any warnings in your system.log for reading through too many tombstones? Sean Durity From: Mihai Stanescu Sent: Friday, August 03, 2018 12

RE: [EXTERNAL] Re: ETL options from Hive/Presto/s3 to cassandra

2018-08-09 Thread Durity, Sean R
DataStax Enterprise 6.0 has a new bulk loader tool. DSE is a commercial product, but maybe your needs are worth the investigation. Sean Durity From: Rahul Singh Sent: Tuesday, August 07, 2018 9:37 AM To: user@cassandra.apache.org Subject: [EXTERNAL] Re: ETL options from Hive/Presto/s3 to cassa

RE: [EXTERNAL] Re: Data Corruption due to multiple Cassandra 2.1 processes?

2018-08-13 Thread Durity, Sean R
I have definitely seen corruption, especially in system tables, when there are multiple instances of Cassandra running/trying to start. We had an internal tool that was supposed to restart processes (like Cassandra) if they were down, but it often re-checked before Cassandra was fully up and sta

RE: [EXTERNAL] Re: Improve data load performance

2018-08-15 Thread Durity, Sean R
Might also help to know:
- Size of cluster
- How much data is being loaded (# of inserts/actual data size)
- Single table or multiple tables?
- Is this a one-time or occasional load or more frequently?
- Is the data located in the same physical data center as the cluster? (any network latency?)
On the clie

RE: [EXTERNAL] Re: Nodetool refresh v/s sstableloader

2018-08-29 Thread Durity, Sean R
sstableloader, though, could require a lot more disk space – until compaction can reduce it. For example, if your RF=3, you will essentially be loading 3 copies of the data. Then it will get replicated 3 more times as it is being loaded. Thus, you could need up to 9x disk space. Sean Durity From:
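A rough sketch of the worst-case arithmetic above, assuming the backup being loaded contains every replica's sstables and that nothing has compacted yet. The function name is illustrative; it only multiplies sizes and does not call sstableloader.

```python
def worst_case_load_gb(unique_data_gb: float, rf: int) -> float:
    """Upper bound on cluster-wide disk used while sstableloader streams a full backup.

    Assumption: the backup holds every replica's sstables (rf copies of the
    data), and each copy is written to rf replicas again on load, so the
    cluster can briefly hold rf * rf copies until compaction merges them.
    """
    if rf < 1:
        raise ValueError("replication factor must be at least 1")
    copies_loaded = rf                   # one copy per source replica
    copies_on_disk = copies_loaded * rf  # each loaded copy lands on rf replicas
    return unique_data_gb * copies_on_disk


# 100 GB of unique data at RF=3: up to 900 GB before compaction catches up.
print(worst_case_load_gb(100.0, 3))  # 900.0
```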

RE: [EXTERNAL] Re: Re: bigger data density with Cassandra 4.0?

2018-08-29 Thread Durity, Sean R
If you are going to compare vs commercial offerings like Scylla and CosmosDB, you should be looking at DataStax Enterprise. They are moving more quickly than open source (IMO) on adding features and tools that enterprises really need. I think they have some emerging tech for large/dense nodes, i

RE: [EXTERNAL] Re: adding multiple node to a cluster, cleanup and num_tokens

2018-09-04 Thread Durity, Sean R
I would only run the clean-up (on all nodes) after all new nodes are added. I would also look at increasing RF to 3 (and running repair) once there are plenty of nodes. (This is assuming that availability matters and that your queries use QUORUM or LOCAL_QUORUM for consistency level. Longer ter
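To see why RF=3 pairs well with QUORUM or LOCAL_QUORUM: the quorum size is floor(RF/2) + 1 replicas (for LOCAL_QUORUM, using the local DC's RF). This small sketch, with illustrative function names, prints how many replicas of a partition can be down while quorum requests still succeed:

```python
def quorum(rf: int) -> int:
    """Replicas that must answer a QUORUM read or write."""
    return rf // 2 + 1


def replicas_allowed_down(rf: int) -> int:
    """Replicas of a partition that can be unavailable while QUORUM still succeeds."""
    return rf - quorum(rf)


for rf in (1, 2, 3, 5):
    print(f"RF={rf}: quorum={quorum(rf)}, tolerates {replicas_allowed_down(rf)} down")
```

RF=2 gives no headroom at QUORUM (both replicas must answer), which is why moving to RF=3 matters once availability is a requirement.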

RE: [EXTERNAL] Regarding migrating data from Oracle to Cassandra.migrate data from Oracle to Cassandra.

2018-09-05 Thread Durity, Sean R
3 starting points:
- DO NOT migrate your tables as they are in Oracle to Cassandra. In most cases, you need a different model for Cassandra
- DO take the (free) DataStax Academy courses to learn much more about Cassandra as you dive in. It is a systematic and bite-size approac

RE: [EXTERNAL] Re: cold vs hot data

2018-09-17 Thread Durity, Sean R
An idea: On initial insert, insert into 2 tables:
- Hot with short TTL
- Cold/archive with a longer (or no) TTL
Then your hot data is always in the same table, but being expired. And you can access the archive table only for the more rare circumstances. Then you could have the HOT table on a differe
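A sketch of the hot/archive idea as statement text, assuming hypothetical tables events_hot and events_archive with identical columns; the TTL values are placeholders. It only builds CQL strings, which a driver would then prepare and execute:

```python
from typing import Optional, Sequence, Tuple


def dual_write_cql(columns: Sequence[str], hot_ttl_s: int,
                   archive_ttl_s: Optional[int] = None,
                   hot_table: str = "events_hot",           # hypothetical table name
                   archive_table: str = "events_archive",   # hypothetical table name
                   ) -> Tuple[str, str]:
    """Return the two INSERT statements for one logical write (hot + archive)."""
    cols = ", ".join(columns)
    marks = ", ".join("?" for _ in columns)
    # Hot copy always expires; the archive copy expires only if a TTL is given.
    hot = f"INSERT INTO {hot_table} ({cols}) VALUES ({marks}) USING TTL {hot_ttl_s}"
    archive = f"INSERT INTO {archive_table} ({cols}) VALUES ({marks})"
    if archive_ttl_s is not None:
        archive += f" USING TTL {archive_ttl_s}"
    return hot, archive


hot, archive = dual_write_cql(["id", "ts", "payload"], hot_ttl_s=7 * 24 * 3600)
print(hot)      # INSERT INTO events_hot (id, ts, payload) VALUES (?, ?, ?) USING TTL 604800
print(archive)  # INSERT INTO events_archive (id, ts, payload) VALUES (?, ?, ?)
```

The two inserts are separate writes; making them atomic would mean a logged batch, with the write overhead a reply in this thread points out.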

RE: [EXTERNAL] Re: cold vs hot data

2018-09-18 Thread Durity, Sean R
The only solution I see is using logged batch, with a huge overhead and perf hit on for the writes On Mon, Sep 17, 2018 at 8:28 PM, Durity, Sean R <sean_r_dur...@homedepot.com> wrote: An idea: On initial insert, insert into 2 tables: Hot with short TTL Cold/archive with a longer

RE: [EXTERNAL] Re: Adding datacenter and data verification

2018-09-18 Thread Durity, Sean R
You are correct that altering the keyspace replication settings does not actually move any data. It only affects new writes or reads. system_auth is one keyspace that needs to be repaired quickly OR, if your number of users/permissions is relatively small, you can just reinsert them after the alter to th

RE: [EXTERNAL] Re: Rolling back Cassandra upgrades (tarball)

2018-10-01 Thread Durity, Sean R
Version choices aside, I am an advocate for forward-only (in most cases). Here is my reasoning, so that you can evaluate for your situation:
- upgrades are done while the application is up and live and writing data (no app downtime)
- the upgrade usually includes a change to the sstable version (

RE: [EXTERNAL] Upcoming Cassandra-related Conferences

2018-10-08 Thread Durity, Sean R
Thank you. I do want to hear about future conferences. I would also love to hear reports/summaries/highlights from folks who went to Distributed Data Summit (or other conferences). I think user conferences are great! Sean Durity From: Max C. Sent: Friday, October 05, 2018 8:33 PM To: user@cas

RE: [EXTERNAL] Re: Installing a Cassandra cluster with multiple Linux OSs (Ubuntu+CentOS)

2018-10-23 Thread Durity, Sean R
Agreed. I have run clusters with both RHEL5 and RHEL6 nodes. Sean Durity From: Jeff Jirsa Sent: Sunday, October 14, 2018 12:40 PM To: user@cassandra.apache.org Subject: [EXTERNAL] Re: Installing a Cassandra cluster with multiple Linux OSs (Ubuntu+CentOS) Should be fine, just get the java and k

RE: [EXTERNAL] Re: [E] Re: nodetool status and node maintenance

2018-10-29 Thread Durity, Sean R
I have wrapped nodetool info into my own script that strips out and interprets the information I care about. That script also sets a return code based on the health of that node (which protocols are up, etc.). Then I can monitor the individual health of the node – as that node sees itself. I hav
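The script itself is not shown in the post. A minimal sketch of the same idea, assuming the `Key : value` layout that `nodetool info` prints and treating gossip plus native transport as the fields that must be up (an assumption; pick the ones that matter to you). The sample text is abbreviated and illustrative, not captured output:

```python
from typing import Dict, Iterable

# Fields this sketch requires to read "true"; which protocols matter is an assumption.
REQUIRED_TRUE = ("Gossip active", "Native Transport active")


def parse_info(text: str) -> Dict[str, str]:
    """Split 'Key : value' lines into a dict, splitting on the first colon only."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def health_code(text: str, required: Iterable[str] = REQUIRED_TRUE) -> int:
    """0 if every required field is 'true', otherwise 1 (usable as an exit code)."""
    info = parse_info(text)
    return 0 if all(info.get(k, "").lower() == "true" for k in required) else 1


SAMPLE = """\
ID                     : 5f3c0d2e-0000-0000-0000-000000000000
Gossip active          : true
Thrift active          : false
Native Transport active: true
Load                   : 1.2 GiB
"""
print(health_code(SAMPLE))  # 0
```

In a real wrapper, the text would come from running `nodetool info` (for example with `subprocess.run`) and the result would be passed to `sys.exit`, so a monitoring tool can read the node's view of its own health.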

RE: [EXTERNAL] Re: rolling version upgrade, upgradesstables, and vulnerability window

2018-10-30 Thread Durity, Sean R
Just to pile on: I agree. On our upgrades, I always aim to get the binary part done on all nodes before worrying about upgradesstables. Upgrade is one node at a time (precautionary). How long upgradesstables takes depends on cluster size, data size, compaction throughput, etc. I usually start with running upgr

RE: Cassandra 2.1 bootstrap - No streaming progress from one node

2018-11-07 Thread Durity, Sean R
I would wipe the new node and bootstrap again. I do not know of any way to resume the streaming that was previously in progress. Sean Durity From: Steinmaurer, Thomas Sent: Wednesday, November 07, 2018 5:13 AM To: user@cassandra.apache.org Subject: [EXTERNAL] Cassandra 2.1 bootstrap - No stream

RE: [EXTERNAL] Re: Multiple cluster for a single application

2018-11-08 Thread Durity, Sean R
We have a cluster over 100 nodes that performs just fine for its use case. In our case, we needed the disk space and did not want the admin headache of very dense nodes. It does take more automation and process to handle a larger cluster, but those are all good things to solve anyway. But count

RE: [EXTERNAL] Availability issues for write/update/read workloads (up to 100s downtime) in case of a Cassandra node failure

2018-11-09 Thread Durity, Sean R
The VMs’ memory (4 GB) seems pretty small for Cassandra. What heap size are you using? Which garbage collector? Are you seeing long GC times on the nodes? The basic rule of thumb is to give the Cassandra heap 50% of the RAM on the host. 2 GB isn’t very much. Also, I wouldn’t set the replication

RE: [EXTERNAL] Is Apache Cassandra supports Data at rest

2018-11-14 Thread Durity, Sean R
I think you are asking about *encryption* at rest. To my knowledge, open source Cassandra does not support this natively. There are options, like encrypting the data in the application before it gets to Cassandra. Some companies offer other solutions. IMO, if you need the increased security, it

RE: [EXTERNAL] Re: upgrade Apache Cassandra 2.1.9 to 3.0.9

2018-12-04 Thread Durity, Sean R
We have had great success with Cassandra upgrades with applications staying online. It is one of the strongest benefits of Cassandra. A couple of things I incorporate into upgrades:
- The main task is getting the new binaries loaded, then restarting the node – in a rolling fashion. Get t

RE: [EXTERNAL] Cassandra Upgrade Plan 2.2.4 to 3.11.3

2018-12-04 Thread Durity, Sean R
See my recent post for some additional points. But I wanted to encourage you to look at the in-place upgrade on your existing hardware. No need to add a DC to try and upgrade. The cluster will handle reads and writes with nodes of different versions – no problems. I have done this many times on

RE: [EXTERNAL] Re: upgrade Apache Cassandra 2.1.9 to 3.0.9

2018-12-05 Thread Durity, Sean R
the new binary and restart the node to a newer version as quickly as possible. upgradesstables is I/O intensive and it takes time and is proportional to the data on the node. Given these constraints, is there a risk due to prolonged upgradesstables? On Tue, Dec 4, 2018 at 12:20 PM Durity, Sean R

RE: [EXTERNAL] Writes and Reads with high latency

2018-12-21 Thread Durity, Sean R
Can you provide the schema and the queries? What is the RF of the keyspace for the data? Are you using any Retry policy on your Cluster object? Sean Durity From: Marco Gasparini Sent: Friday, December 21, 2018 10:45 AM To: user@cassandra.apache.org Subject: [EXTERNAL] Writes and Reads with hig

RE: [EXTERNAL] Writes and Reads with high latency

2018-12-27 Thread Durity, Sean R
datetime, pkey, agent, some_id, ft, ftt) values (?,?,?,?,?,?); About Retry policy, the answer is yes, actually when a write fails I store it somewhere else and, after a period, a try to write it to Cassandra again. This way I can store almost all my data, but when the problem is the read I don'

RE: [EXTERNAL] Howto avoid tombstones when inserting NULL values

2018-12-27 Thread Durity, Sean R
You say the events are incremental updates. I am interpreting this to mean only some columns are updated. Others should keep their original values. You are correct that inserting null creates a tombstone. Can you only insert the columns that actually have new values? Just skip the columns with
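A sketch of skipping null columns, assuming a hypothetical events table and positional `?` bind markers. Primary key columns must always be present, so callers should never pass them as None:

```python
from typing import Any, Dict, List, Tuple


def insert_without_nulls(table: str, row: Dict[str, Any]) -> Tuple[str, List[Any]]:
    """Build an INSERT naming only columns that carry a value.

    Binding None writes a tombstone for that column; omitting the column
    leaves the previously stored value untouched.
    """
    present = {col: val for col, val in row.items() if val is not None}
    if not present:
        raise ValueError("row has no non-null columns")
    cols = ", ".join(present)
    marks = ", ".join("?" for _ in present)
    return f"INSERT INTO {table} ({cols}) VALUES ({marks})", list(present.values())


stmt, params = insert_without_nulls("events", {"id": 42, "status": "open", "owner": None})
print(stmt)    # INSERT INTO events (id, status) VALUES (?, ?)
print(params)  # [42, 'open']
```

One trade-off: each distinct set of non-null columns produces a different statement, so an application that prepares statements may end up caching several variants per table.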

RE: [EXTERNAL] Re: Good way of configuring Apache spark with Apache Cassandra

2019-01-09 Thread Durity, Sean R
I think you could consider option C: Create a (new) analytics DC in Cassandra and run your Spark nodes there. Then you can address the scaling just on that DC. You can also use fewer vnodes, only replicate certain keyspaces, etc. in order to perform the analytics more efficiently. Sean Durity

RE: [EXTERNAL] Re: Good way of configuring Apache spark with Apache Cassandra

2019-01-10 Thread Durity, Sean R
center. Does that cause any overhead ? On Wed, Jan 9, 2019 at 7:28 AM Durity, Sean R <sean_r_dur...@homedepot.com> wrote: I think you could consider option C: Create a (new) analytics DC in Cassandra and run your spark nodes there. Then you can address the scaling just on that DC. You ca

RE: [EXTERNAL] Re: Good way of configuring Apache spark with Apache Cassandra

2019-01-10 Thread Durity, Sean R
: Dor Laor Sent: Wednesday, January 09, 2019 11:23 PM To: user@cassandra.apache.org Subject: Re: [EXTERNAL] Re: Good way of configuring Apache spark with Apache Cassandra On Wed, Jan 9, 2019 at 7:28 AM Durity, Sean R <sean_r_dur...@homedepot.com> wrote: I think you could consider op

RE: [EXTERNAL] fine tuning for wide rows and mixed worload system

2019-01-11 Thread Durity, Sean R
I will start – knowing that others will have additional help/questions. What heap size are you using? Sounds like you are using the CMS garbage collector. That takes some arcane knowledge and lots of testing to tune. I would start with G1 and using ½ the available RAM as the heap size. I would

RE: [EXTERNAL] RE: SASI queries- cqlsh vs java driver

2019-02-07 Thread Durity, Sean R
Kenneth is right. Trying to port/support a relational model to a CQL model the way you are doing it is not going to go well. You won’t be able to scale or get the search flexibility that you want. It will make Cassandra seem like a bad fit. You want to play to Cassandra’s strengths – availabilit

RE: [EXTERNAL] Re: Bootstrap keeps failing

2019-02-07 Thread Durity, Sean R
I have seen unreliable streaming (streaming that doesn’t finish) because of TCP timeouts from firewalls or switches. The default tcp_keepalive kernel parameters are usually not tuned for that. See https://docs.datastax.com/en/dse-trblshoot/doc/troubleshooting/idleFirewallLinux.html for more det

RE: [EXTERNAL] Re: Make large partitons lighter on select without changing primary partition formation.

2019-02-13 Thread Durity, Sean R
Agreed. It’s pretty close to impossible to administrate your way out of a data model that doesn’t play to Cassandra’s strengths. The same is true for other data storage technologies – you need to model the data the way that the engine is designed to work. Sean Durity From: DuyHai Doan Sent: Wed

RE: [EXTERNAL] Re: Question on changing node IP address

2019-02-26 Thread Durity, Sean R
This has not been my experience. Changing IP address is one of the worst admin tasks for Cassandra. The system.peers table and other information on each node is stored by IP address. And gossip is really good at sending around the old information mixed with new… Sean Durity From: Oleksandr Shulgin S

RE: [EXTERNAL] Re: Question on changing node IP address

2019-02-27 Thread Durity, Sean R
We use the PropertyFileSnitch precisely because it is the same on every node. If each node has to have a different file (for GossipingPropertyFileSnitch, GPFS), deployment is more complicated. (And for any automated configuration, you would have a list of hosts and DC/rack information to compile anyway.) I do put UNKNOWN as
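For illustration, one file can be generated from the host list and copied unchanged to every node. This sketch renders the `IP=DC:RACK` lines of cassandra-topology.properties (the file PropertyFileSnitch reads), with an optional `default=` entry; the addresses and DC/rack names are made up:

```python
from typing import Dict, Optional, Tuple


def render_topology(hosts: Dict[str, Tuple[str, str]],
                    default: Optional[Tuple[str, str]] = None) -> str:
    """Render IP=DC:RACK lines, plus an optional default=DC:RACK line."""
    # Sorting keeps the output identical across runs (string order, not numeric IP order).
    lines = [f"{ip}={dc}:{rack}" for ip, (dc, rack) in sorted(hosts.items())]
    if default is not None:
        lines.append(f"default={default[0]}:{default[1]}")
    return "\n".join(lines) + "\n"


hosts = {"10.0.0.2": ("DC1", "RAC1"), "10.0.0.1": ("DC1", "RAC2"), "10.0.1.1": ("DC2", "RAC1")}
print(render_topology(hosts), end="")
```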

RE: [EXTERNAL] Re: Question on changing node IP address

2019-02-27 Thread Durity, Sean R
t few months in different places and both times recovery was difficult and hazardous. I still strongly recommend against it. On Wed, Feb 27, 2019 at 3:11 PM Durity, Sean R <sean_r_dur...@homedepot.com> wrote: We use the PropertyFileSnitch precisely because it is the same on every nod

RE: [EXTERNAL] Re: A Question About Hints

2019-03-05 Thread Durity, Sean R
Versions 2.0 and 2.1 were generally very stable, so I can understand a reticence to move when there are so many other things competing for time and attention. Sean Durity From: shalom sagges Sent: Monday, March 04, 2019 4:21 PM To: user@cassandra.apache.org Subject: [EXTERNAL] Re: A Questio

RE: Migrate large volume of data from one table to another table within the same cluster when COPY is not an option.

2019-03-12 Thread Durity, Sean R
If there are 2 access patterns, I would consider having 2 tables. The first one with the ID, which you say is the majority use case. Then have a second table that uses a time-bucket approach as others have suggested: (time bucket, id) as primary key. Choose a time bucket (day, week, hour, month,
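A sketch of computing the bucket value for a hypothetical second table with `PRIMARY KEY ((bucket), id)`. The bucket sizes offered and the UTC convention are assumptions; week buckets are left out here:

```python
from datetime import datetime, timezone

# strftime patterns per bucket size (week omitted to keep the sketch portable).
BUCKET_FORMATS = {"hour": "%Y-%m-%d-%H", "day": "%Y-%m-%d", "month": "%Y-%m"}


def time_bucket(ts: datetime, size: str = "day") -> str:
    """Bucket value for the partition key of a (time bucket, id) table."""
    if ts.tzinfo is None:
        raise ValueError("use timezone-aware timestamps so buckets do not shift")
    return ts.astimezone(timezone.utc).strftime(BUCKET_FORMATS[size])


ts = datetime(2019, 3, 12, 14, 30, tzinfo=timezone.utc)
print(time_bucket(ts), time_bucket(ts, "hour"), time_bucket(ts, "month"))
# 2019-03-12 2019-03-12-14 2019-03
```

Reading a time range then means querying each bucket in the range, so the bucket size trades partition size against the number of queries.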

RE: [EXTERNAL] Re: Migrate large volume of data from one table to another table within the same cluster when COPY is not an option.

2019-03-13 Thread Durity, Sean R
egards On Wed, 13 Mar 2019 at 06:57, Durity, Sean R <sean_r_dur...@homedepot.com> wrote: If there are 2 access patterns, I would consider having 2 tables. The first one with the ID, which you say is the majority use case. Then have a second table that uses a time-bucket approach as

RE: [EXTERNAL] Re: Cluster size "limit"

2019-03-13 Thread Durity, Sean R
If you can change to 8 vnodes, it will be much better for repairs and other kinds of streaming operations. The old advice of 256 per node is now not very helpful. Sean From: Ahmed Eljami Sent: Wednesday, March 13, 2019 1:27 PM To: user@cassandra.apache.org Subject: [EXTERNAL] Re: Cluster size

RE: [EXTERNAL] Re: Cluster size "limit"

2019-03-13 Thread Durity, Sean R
Rebuild the DCs with a new number of vnodes… I have done it. Sean From: Ahmed Eljami Sent: Wednesday, March 13, 2019 2:09 PM To: user@cassandra.apache.org Subject: Re: [EXTERNAL] Re: Cluster size "limit" Is not possible with an existing cluster! On Wed, Mar 13, 2019 at 18:39, Duri

RE: [EXTERNAL] Re: Migrate large volume of data from one table to another table within the same cluster when COPY is not an option.

2019-03-14 Thread Durity, Sean R
rows from current table to the 2nd table with ttl set on each record as the first table? From: Durity, Sean R <sean_r_dur...@homedepot.com> Sent: Wednesday, March 13, 2019 8:17 AM To: user@cassandra.apache.org

RE: [EXTERNAL] Re: Default TTL on CF

2019-03-14 Thread Durity, Sean R
I spent a month of my life on a similar problem... There wasn't an easy answer, but this is what I did:
#1 - Stop the problem from growing further. Get new inserts using a TTL (or set the default on the table so they get it). App team had to do this one.
#2 - Delete any data that should already be

RE: [EXTERNAL] Re: Garbage Collector

2019-03-19 Thread Durity, Sean R
My default is G1GC using 50% of available RAM (so typically a minimum of 16 GB for the JVM). That has worked in just about every case I’m familiar with. In the old days we used CMS, but tuning that beast is a black art with few wizards available (though several on this mailing list). Today, I ju
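The same rule of thumb expressed as JVM flags, as a sketch. The flags are standard HotSpot options; where they belong (jvm.options or cassandra-env.sh) depends on the Cassandra version, and any CMS flags already present need to be removed, because the JVM typically refuses to start with conflicting collector options:

```python
from typing import List


def g1_heap_options(ram_gb: int) -> List[str]:
    """G1 plus a fixed heap of half the host's RAM (the rule of thumb in this thread)."""
    heap_gb = ram_gb // 2
    # Setting -Xms equal to -Xmx avoids heap resizing at runtime.
    return ["-XX:+UseG1GC", f"-Xms{heap_gb}G", f"-Xmx{heap_gb}G"]


print(g1_heap_options(32))  # ['-XX:+UseG1GC', '-Xms16G', '-Xmx16G']
print(g1_heap_options(4))   # ['-XX:+UseG1GC', '-Xms2G', '-Xmx2G']
```

On a 4 GB VM the same rule yields only a 2 GB heap, which is why such hosts look small for Cassandra.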

RE: [EXTERNAL] Issue while updating a record in 3 node cassandra cluster deployed using kubernetes

2019-04-09 Thread Durity, Sean R
My first suspicion would be to look at the server times in the cluster. It looks like other cases where a write occurs (with no errors) but the data is not retrieved as expected. If the write occurs with an earlier timestamp than the existing data, this is the behavior you would see. The write w
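A toy model of the last-write-wins behavior described here. It is a simplification, not Cassandra's code: each cell carries a write timestamp, and the one with the higher timestamp survives regardless of arrival order, so a write stamped by a node with a lagging clock is silently discarded:

```python
from typing import NamedTuple


class Cell(NamedTuple):
    value: str
    write_ts_us: int  # write timestamp in microseconds, set by the client or the coordinator


def reconcile(existing: Cell, incoming: Cell) -> Cell:
    """Last-write-wins: keep the cell with the higher write timestamp.

    Equal timestamps are not modeled here (the existing cell is kept);
    Cassandra applies its own tie-break rule in that case.
    """
    return incoming if incoming.write_ts_us > existing.write_ts_us else existing


stored = Cell("v1", 1_554_800_000_000_000)
# An update stamped by a node whose clock runs 10 s behind: no error, but it loses.
late = Cell("v2", 1_554_800_000_000_000 - 10_000_000)
print(reconcile(stored, late).value)  # v1
```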

RE: [EXTERNAL] Re: Getting Consistency level TWO when it is requested LOCAL_ONE

2019-04-11 Thread Durity, Sean R
https://issues.apache.org/jira/browse/CASSANDRA-9620 has something similar that was determined to be a driver error. I would start with looking at the driver version and also the RetryPolicy that is in effect for the Cluster. Secondly, I would look at whether a batch is really needed for the sta

RE: [EXTERNAL] Re: Caused by: com.datastax.driver.core.exceptions.ReadTimeoutException:

2019-04-17 Thread Durity, Sean R
If you are just trying to get a sense of the data, you could try adding a limit clause to limit the amount of results and hopefully beat the timeout. However, ALLOW FILTERING really means "ALLOW ME TO DESTROY MY APPLICATION AND CLUSTER." It means the data model does not support the query and wil

RE: [EXTERNAL] multiple Cassandra instances per server, possible?

2019-04-18 Thread Durity, Sean R
What is the data problem that you are trying to solve with Cassandra? Is it high availability? Low latency queries? Large data volumes? High concurrent users? I would design the solution to fit the problem(s) you are solving. For example, if high availability is the goal, I would be very cautiou

RE: [EXTERNAL] Re: Using Cassandra as an object store

2019-04-19 Thread Durity, Sean R
Object stores are some of our largest and oldest use cases. Cassandra has been a good choice for us. We do chunk the objects into 64k chunks (I think), so that partitions are not too large and it scales predictably. For us, the choice was more about high availability and scalability, which Cassa
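A sketch of chunking, assuming one row per chunk keyed by a hypothetical (object_id, chunk_index) pair; one possible layout puts each chunk in its own partition with `PRIMARY KEY ((object_id, chunk_index))`. The 64 * 1024 byte size follows the post's recollection and is treated as tunable:

```python
from typing import Iterable, Iterator, Tuple

CHUNK_SIZE = 64 * 1024  # the post recalls "64k chunks"; treat the exact size as tunable

Row = Tuple[str, int, bytes]  # (object_id, chunk_index, chunk_bytes)


def chunk_object(object_id: str, data: bytes, chunk_size: int = CHUNK_SIZE) -> Iterator[Row]:
    """Split one object into fixed-size chunks, one row per chunk."""
    for index, start in enumerate(range(0, len(data), chunk_size)):
        yield object_id, index, data[start:start + chunk_size]


def reassemble(rows: Iterable[Row]) -> bytes:
    """Concatenate chunks in index order, whatever order they were read in."""
    return b"".join(chunk for _, _, chunk in sorted(rows, key=lambda r: r[1]))


blob = bytes(range(256)) * 600  # 153,600 bytes
rows = list(chunk_object("obj-1", blob))
print(len(rows), [len(c) for _, _, c in rows])  # 3 [65536, 65536, 22528]
```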

RE: [EXTERNAL] Two separate rows for the same partition !!

2019-05-15 Thread Durity, Sean R
Uniqueness is determined by the partition key PLUS the clustering columns. Hard to tell from your data below, but is it possible that one of the clustering columns (perhaps g) has different values? That would easily explain the 2 rows returned – because they ARE different rows in the same partit
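A small in-memory model of that rule, using a hypothetical schema `PRIMARY KEY ((pk), c1, g)`: two writes land on the same row only when the partition key and every clustering column match.

```python
from collections import defaultdict

# Hypothetical schema: PRIMARY KEY ((pk), c1, g) -> partition key pk, clustering columns c1, g
partitions = defaultdict(dict)  # pk -> {(c1, g): row}


def upsert(row: dict) -> None:
    """Writes with the same partition key AND clustering values hit the same row."""
    clustering = (row["c1"], row["g"])
    partitions[row["pk"]].setdefault(clustering, {}).update(row)


upsert({"pk": 1, "c1": "x", "g": "a", "val": 10})
upsert({"pk": 1, "c1": "x", "g": "b", "val": 20})  # g differs -> a second row
upsert({"pk": 1, "c1": "x", "g": "a", "val": 11})  # same full key -> overwrites val
print(len(partitions[1]), partitions[1][("x", "a")]["val"])  # 2 11
```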

RE: [EXTERNAL] Re: Python driver consistency problem

2019-05-28 Thread Durity, Sean R
This is a stretch, but are you using authentication and/or authorization? In my understanding, the queries executed for you to do the authentication and/or authorization are usually done at LOCAL_ONE (or QUORUM for the default cassandra user), but maybe there is something that is changed in the security setu

RE: [EXTERNAL] Re: Select in allow filtering stalls whole cluster. How to prevent such behavior?

2019-05-28 Thread Durity, Sean R
This may sound a bit harsh, but I teach my developers that if they are trying to use ALLOW FILTERING – they are doing it wrong! We often choose Cassandra for its high availability and scalability characteristics. We love no downtime. ALLOW FILTERING is breaking the rules of availability and scal

RE: Recover lost node from backup or evict/re-add?

2019-06-12 Thread Durity, Sean R
I’m not sure it is correct to say, “you cannot.” However, that is a more complicated restore and more likely to lead to inconsistent data and take longer to do. You are basically trying to start from a backup point and roll everything forward and catch up to current. Replacing/re-streaming is t

RE: [EXTERNAL] Re: Cassandra migration from 1.25 to 3.x

2019-06-17 Thread Durity, Sean R
The advice so far is exactly correct for an in-place kind of upgrade. The blog post you mentioned is different. They decided to jump versions in Cassandra by standing up a new cluster and using a dual-write/dual-read process for their app. They also wrote code to read and interpret sstables in o

RE: [EXTERNAL] Re: Bursts of Thrift threads make cluster unresponsive

2019-06-28 Thread Durity, Sean R
This sounds like a bad query or large partition. If a large partition is requested on multiple nodes (because of consistency level), it will pressure all those replica nodes. Then, as the cluster tries to adjust the rest of the load, the other nodes can get overwhelmed, too. Look at cfstats to

RE: [EXTERNAL] Apache Cassandra upgrade path

2019-07-26 Thread Durity, Sean R
What you have seen is totally expected. You can’t stream between different major versions of Cassandra. Get the upgrade done, then worry about any down hardware. If you are using DCs, upgrade one DC at a time, so that there is an available environment in case of any disasters. My advice, though

RE: [EXTERNAL] Apache Cassandra upgrade path

2019-07-26 Thread Durity, Sean R
l the upgrade is done and then change it back once the upgrade is completed? On Fri, Jul 26, 2019 at 11:42 AM Durity, Sean R <sean_r_dur...@homedepot.com> wrote: What you have seen is totally expected. You can’t stream between different major versions of Cassandra. Get the upgrade do

RE: [EXTERNAL] Re: loading big amount of data to Cassandra

2019-08-05 Thread Durity, Sean R
DataStax has a very fast bulk load tool - dsbulk. Not sure if it is available for open source or not. In my experience so far, I am very impressed with it. Sean Durity – Staff Systems Engineer, Cassandra -----Original Message----- From: p...@xvalheru.org Sent: Saturday, August 3, 2019 6:06 A

RE: [EXTERNAL] Cassandra Export error in COPY command

2019-09-22 Thread Durity, Sean R
The COPY command tries to export all rows in the table, not just the ones on the node. It will eventually time out if the table is large. It is really built for something under 5 million rows or so. dsbulk (from DataStax) is great for this, if you are a customer. Otherwise, you will probably need to

RE: [EXTERNAL] Re: GC Tuning https://thelastpickle.com/blog/2018/04/11/gc-tuning.html

2019-10-21 Thread Durity, Sean R
I don’t disagree with Jon, who has all kinds of performance tuning experience. But for ease of operation, we only use G1GC (on Java 8), because the tuning of ParNew+CMS requires a high degree of knowledge and very repeatable testing harnesses. It isn’t worth our time. As a previous writer mentio

RE: merge two cluster

2019-10-23 Thread Durity, Sean R
Beneficial to whom? The apps, the admins, the developers? I suggest that app teams have separate clusters per application. This prevents the noisy neighbor problem, isolates any security issues, and helps when it is time for maintenance, upgrade, performance testing, etc. to not have to coordin

RE: Cassandra Rack - Datacenter Load Balancing relations

2019-10-25 Thread Durity, Sean R
+1 for removing complexity to be able to create (and maintain!) “reasoned” systems! Sean Durity – Staff Systems Engineer, Cassandra From: Reid Pinchback Sent: Thursday, October 24, 2019 10:28 AM To: user@cassandra.apache.org Subject: [EXTERNAL] Re: Cassandra Rack - Datacenter Load Balancing re

RE: [EXTERNAL] n00b q re UPDATE v. INSERT in CQL

2019-10-25 Thread Durity, Sean R
Everything in Cassandra is an insert. So, an update and an insert are functionally equivalent. An update doesn't go update the existing data on disk; it is a new write of the columns involved. So, the difference in your scenario is that with the "targeted" update, you are writing less of the col

RE: [EXTERNAL] Re: Cassandra 3.11.4 Node the load starts to increase after few minutes to 40 on 4 CPU machine

2019-10-31 Thread Durity, Sean R
There is definitely a resource risk to having thousands of open connections to each node. Some of the drivers have (had?) less-than-optimal default settings, like acquiring 50 connections per Cassandra node. This is usually overkill. I think 5-10 per node is much more reasonable. It depends on your
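
Back-of-the-envelope arithmetic for why per-node pool size matters. It assumes every client process keeps a pool open to every node, so the connections a single node holds are roughly client processes × connections per node. The 200-process figure is a made-up example; 50 is the per-node figure mentioned above, and 8 is a value inside the suggested 5-10 range.

```python
def connections_per_node(client_processes, pool_size_per_node):
    """Open connections one node holds if every client process pools to every node."""
    return client_processes * pool_size_per_node


for pool in (50, 8):
    print(pool, connections_per_node(200, pool))
# 50 10000
# 8 1600
```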

RE: results differ on two queries, based on secondary index key and partition key

2017-03-29 Thread Durity, Sean R
This looks more like a problem for a graph-based model. Have you looked at DSE Graph as a possibility? Sean Durity From: ferit baver elhuseyni [mailto:feritba...@gmail.com] Sent: Tuesday, March 14, 2017 11:40 AM To: user@cassandra.apache.org Subject: results differ on two queries, based on secon

RE: Random slow read times in Cassandra

2017-03-29 Thread Durity, Sean R
Does a later query on the same ID also behave poorly? (Or perhaps it gets cached and is fast the next time…) My first thought was that perhaps the slow records had many updates that were being read (or tombstones being read over) to assemble the record. A trace on the query with that key might r

RE: Issue with Cassandra consistency in results

2017-03-29 Thread Durity, Sean R
There have been many instances of supposed inconsistency noted on this list when nodes do not have the same system time. Make sure the clocks match on all nodes (NTP or similar). Sean Durity From: Shubham Jaju [mailto:shub...@vassarlabs.com] Sent: Tuesday, March 21, 2017 9:58 PM To: use
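
A minimal illustration of why clocks matter. It assumes last-write-wins reconciliation by write timestamp, with each timestamp taken from the writer's local clock. If one writer's clock runs ahead, its older write carries a larger timestamp and silently beats a genuinely later write. The times and skew are invented.

```python
def reconcile(*writes):
    """Last write wins: return the value whose timestamp is largest."""
    return max(writes, key=lambda w: w[1])[0]


real_time_a = 1_000_000   # writer A writes first (microseconds, real time)
real_time_b = 1_000_500   # writer B writes 0.5 ms later
skew_a = 2_000            # hypothetical: A's clock runs 2 ms fast

write_a = ("value-from-A", real_time_a + skew_a)
write_b = ("value-from-B", real_time_b)

print(reconcile(write_a, write_b))  # value-from-A, even though B wrote later
```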

RE: Can we get username and timestamp in cqlsh_history?

2017-04-03 Thread Durity, Sean R
Sounds like you want full auditing of CQL in the cluster. I have not seen anything built into the open source version for that (but I could be missing something). DataStax Enterprise does have an auditing feature. Sean Durity From: anuja jain [mailto:anujaja...@gmail.com] Sent: Wednesday, Marc

RE: cassandra OOM

2017-04-25 Thread Durity, Sean R
We have seen much better stability (and MUCH less GC pausing) from G1 with a variety of heap sizes. I don’t even consider CMS anymore. Sean Durity From: Gopal, Dhruva [mailto:dhruva.go...@aspect.com] Sent: Tuesday, April 04, 2017 5:34 PM To: user@cassandra.apache.org Subject: Re: cassandra OOM

RE: Starting Cassandra after restore of Data - get error

2017-07-07 Thread Durity, Sean R
I have seen Windows-format line endings cause problems. Run dos2unix on the cassandra.yaml file (on the Linux box) and see if it helps. Sean Durity lord of the (C*) rings (Staff Systems Engineer - Cassandra) MTC 2250 #cassandra - for the latest news and updates From: Jonathan Baynes [mailto:jonathan.bay...@
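
If dos2unix is not installed, a few lines of Python can do the same line-ending conversion. This is a sketch: it rewrites the file in place, so keep a copy first, and the path you pass is wherever your cassandra.yaml actually lives.

```python
from pathlib import Path


def crlf_to_lf(path):
    """Rewrite a file in place, replacing Windows CRLF line endings with Unix LF.

    Returns the number of CRLF sequences that were replaced.
    """
    p = Path(path)
    data = p.read_bytes()
    count = data.count(b"\r\n")
    if count:
        p.write_bytes(data.replace(b"\r\n", b"\n"))
    return count
```

A second call on the same file returns 0, which is an easy way to confirm the conversion took.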

RE: READ Queries timing out.

2017-07-07 Thread Durity, Sean R
A 1 GB heap is very small. Why not try increasing it to 50% of RAM and see if that helps you track down the real issue? It is hard to tune around a bad data model, if that is indeed the issue. Seeing your tables and queries would help. Sean Durity From: Pranay akula [mailto:pranay.akula2...@gmail.

RE: Node failure Due To Very high GC pause time

2017-07-13 Thread Durity, Sean R
I like Bryan’s terminology of an “antagonistic use case.” If I am reading this correctly, you are putting 5 (or 10) million records in a partition and then trying to delete them in the same order they are stored. This is not a good data model for Cassandra, in fact a dangerous data model. That p
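
A toy simulation of why deleting rows in storage order from the front of a huge partition is dangerous. Each delete leaves a tombstone at the head, so every query for the "next" live row has to read past all the tombstones written so far, until they are purged after gc_grace_seconds. The row counts are scaled down, and the model ignores caching and compaction details.

```python
def simulate_queue(rows, consumed):
    """Consume (delete) the first live row `consumed` times, scanning from the start.

    Returns a list with the number of tombstones each read had to skip.
    """
    partition = [{"ck": i, "deleted": False} for i in range(rows)]
    skipped_per_read = []
    for _ in range(consumed):
        skipped = 0
        for row in partition:              # reads start at the head of the partition
            if row["deleted"]:
                skipped += 1               # tombstone still has to be read over
                continue
            row["deleted"] = True          # delete the first live row
            break
        skipped_per_read.append(skipped)
    return skipped_per_read


costs = simulate_queue(rows=10_000, consumed=1_000)
print(costs[0], costs[-1], sum(costs))  # 0 999 499500
```

The total work grows quadratically with the number of deletes, which is why this pattern gets worse the longer it runs.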

RE: nodetool removenode causing the schema out of sync

2017-07-13 Thread Durity, Sean R
Late to this party, but Jeff is talking about nodetool setstreamthroughput. The default in most versions is 200 Mb/s (set in the yaml file as stream_throughput_outbound_megabits_per_sec). This is an outbound throttle only. So, if streams from multiple nodes are going to one node, it can get inundated. The
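
The inbound arithmetic behind that warning: the throttle caps each sender, not the receiver, so a node receiving from several peers at once can see the sum. This sketch assumes every sender streams at the full 200 Mb/s default; the sender count is a hypothetical example.

```python
def inbound_load(senders, outbound_cap_mbits=200):
    """Worst-case inbound stream rate at one node: per-sender cap times sender count.

    Returns (megabits per second, megabytes per second).
    """
    mbits = senders * outbound_cap_mbits
    return mbits, mbits / 8   # 8 bits per byte


print(inbound_load(6))  # (1200, 150.0)
```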

RE: Adding a new node with the double of disk space

2017-08-18 Thread Durity, Sean R
I am doing some on-the-job learning on this newer feature of the 3.x line, where the token generation algorithm will compensate for different-size nodes in a cluster. In fact, it is one of the main reasons I upgraded to 3.0.13, because I have a number of original nodes in a cluster that are abou
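
A rough sketch of the capacity arithmetic. It assumes the common practice of giving a larger node proportionally more vnodes (num_tokens) and treats each node's share of the ring as its token count over the cluster total. That is an approximation of the outcome, not the 3.x allocation algorithm itself. The node names and token counts are hypothetical.

```python
def expected_shares(num_tokens_by_node):
    """Approximate fraction of the ring each node owns when ownership tracks vnode count."""
    total = sum(num_tokens_by_node.values())
    return {node: n / total for node, n in num_tokens_by_node.items()}


# Hypothetical cluster: three original nodes plus one with double the disk.
shares = expected_shares({"old1": 128, "old2": 128, "old3": 128, "new": 256})
for node, share in shares.items():
    print(f"{node}: {share:.1%}")
# old1: 20.0%  old2: 20.0%  old3: 20.0%  new: 40.0%
```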

RE: Getting all unique keys

2017-08-23 Thread Durity, Sean R
DataStax Enterprise bundles Spark and the Spark connector on the DSE nodes and handles much of the plumbing work (and monitoring, etc.). Worth a look. Sean Durity From: Avi Levi [mailto:a...@indeni.com] Sent: Tuesday, August 22, 2017 2:46 AM To: user@cassandra.apache.org Subject: Re: Getting all un

RE: AWS Cassandra backup/Restore tools

2017-09-12 Thread Durity, Sean R
Datos IO has a backup/restore product for Cassandra that another team here has used successfully. It solves many of the problems inherent with sstable captures. Without something like it, restores are a nightmare with any volume of data. The downtime required and the loss of data since the snaps

RE: Reg:- DSE 5.1.0 Issue

2017-09-12 Thread Durity, Sean R
In an attempt to help close the loop for future readers… I don’t think an upgrade from DSE 4.8 straight to 5.1 is supported. I think you have to go through 5.0.x first. And, yes, you should contact DataStax support for help, but I’m ok with DSE-related questions. They may be more Cassandra-rela

RE: Can I have multiple datacenter with different versions of Cassandra

2017-09-12 Thread Durity, Sean R
No – the general answer is that you cannot stream between major versions of Cassandra. I would upgrade the existing ring, then add the new DC. Sean Durity From: Chuck Reynolds [mailto:creyno...@ancestry.com] Sent: Thursday, May 18, 2017 11:20 AM To: user@cassandra.apache.org Subject: Can I have

Massive deletes -> major compaction?

2017-09-21 Thread Durity, Sean R
Cassandra version 2.0.17 (yes, it's old - waiting for new hardware/new OS to upgrade) In a long-running system with billions of rows, TTL was not set. So a one-time purge is being planned to reduce disk usage. Records older than a certain date will be deleted. The table uses size-tiered compact
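
For planning a purge like this, the earliest time a delete's tombstones can be dropped is the delete time plus the table's gc_grace_seconds (864000 seconds, i.e. 10 days, by default). Even then, compaction can drop a tombstone only if no sstable outside that compaction may still hold older data it shadows. The delete date below is hypothetical.

```python
from datetime import datetime, timedelta

DEFAULT_GC_GRACE_SECONDS = 864_000  # Cassandra default: 10 days


def earliest_purge(deleted_at, gc_grace_seconds=DEFAULT_GC_GRACE_SECONDS):
    """Earliest moment a tombstone written at `deleted_at` becomes eligible for removal."""
    return deleted_at + timedelta(seconds=gc_grace_seconds)


print(earliest_purge(datetime(2017, 9, 25, 8, 0)))  # 2017-10-05 08:00:00
```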

RE: Massive deletes -> major compaction?

2017-09-21 Thread Durity, Sean R
ved. -- Jeff Jirsa On Sep 21, 2017, at 11:27 AM, Durity, Sean R <sean_r_dur...@homedepot.com> wrote: Cassandra version 2.0.17 (yes, it’s old – waiting for new hardware/new OS to upgrade) In a long-running system with billions of rows, TTL was not set. So a one-time purge is being p

RE: Massive deletes -> major compaction?

2017-09-22 Thread Durity, Sean R
n each sstable in reverse generational order (oldest first) and as long as the data is minimally overlapping it’ll purge tombstones that way as well - takes longer but much less disk involved. -- Jeff Jirsa On Sep 21, 2017, at 11:27 AM, Durity, Sean R <sean_r_dur...@homedepot.com>
