I’m not sure there is a fair comparison. MySQL and Cassandra have different
ways of solving related (but not necessarily the same) problems of storing and
retrieving data.
The data model between MySQL and Cassandra is likely to be very different. The
key for Cassandra is that you need to model
This answer surprises me, because I would expect NOT to be able to downgrade if
there are any changes in the sstable structure. I assume:
- Upgrade is done while the application is up and writing data (so any
new data is written in the new format)
- Any compactions that happen
I think I would start with the JVM. Sometimes, for export purposes, the
cryptography extensions (JCE), are in a separate jar or package from the
standard JRE or JVM. I haven’t used the IBM JDK, so I don’t know specifically
about that one.
Also, perhaps the error is correct – SSLv2Hello is not a
Wait, isn’t this the Apache Cassandra mailing list? Shouldn’t this be on the
pickle users list or something?
(Just kidding, everyone. I think there should be room for reaper and DataStax
inquiries here.)
Sean Durity
From: Joaquin Casares [mailto:joaq...@thelastpickle.com]
Sent: Tuesday, April
The issue is more with the number of tables, not the number of keyspaces.
Because each table has a memtable, there is a practical limit to the number of
memtables that a node can hold in its memory. (And scaling out doesn’t help,
because every node still has a memtable for every table.) The prac
A couple additional things:
- Make sure that you ran repair on the system_auth keyspace on all
nodes after changing the RF
- If you are not often changing roles/permissions, you might look to
increase permissions_validity_in_ms and roles_validity_in_ms so they are not
being
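For reference, those two settings live in cassandra.yaml; the values below are illustrative only (the shipped default for both is 2000 ms), and the right number depends on how stale you can tolerate cached permissions being:

```yaml
# cassandra.yaml -- cache auth lookups longer to reduce reads on system_auth
# (illustrative values, not a recommendation)
permissions_validity_in_ms: 60000   # default 2000
roles_validity_in_ms: 60000         # default 2000
```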
One of the columns you are selecting is a list or map or other kind of
collection. You can’t do that with an IN clause against a clustering column.
Either don’t select the collection column OR don’t use the IN clause. Cassandra
is trying to protect itself (and you) from a query that won’t scale
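One workaround, if the collection column is required, is to split the IN list into one query per clustering-key value on the client side. A minimal sketch (table and column names here are hypothetical, not from the original question):

```python
# Sketch: instead of one SELECT with an IN clause on a clustering
# column (disallowed when a collection column is selected), issue
# one query per key. Table/column names are hypothetical.
def split_in_query(keys):
    """Build one SELECT per clustering-key value."""
    template = ("SELECT id, bucket, tags FROM events "
                "WHERE id = 'abc' AND bucket = {b};")
    return [template.format(b=k) for k in keys]

queries = split_in_query([1, 2, 3])
print(len(queries))  # one statement per key
```

The driver can then run these statements asynchronously, which scales better than forcing the coordinator to assemble one large result.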
>Finally can I run mixed Datastax and Apache nodes in the same cluster same
>version?
>Thank you for all your help.
I have run DSE and Apache Cassandra in the same cluster while migrating to DSE.
The versions of Cassandra were the same. It was relatively brief -- just during
the upgrade process.
Are you using any rack aware topology? What are your partition keys? Is it
possible that your partition keys do not divide up as cleanly as you would like
across the cluster because the data is not evenly distributed (by partition
key)?
Sean Durity
lord of the (C*) rings (Staff Systems Engineer - Cassandra)
This sounds like a queue pattern, which is typically an anti-pattern for
Cassandra. I would say that it is very difficult to get the access patterns,
tombstones, and everything else lined up properly to solve a queue problem.
Sean Durity
From: Abhishek Singh
Sent: Tuesday, June 19, 2018 10:41
Number of partitions (estimate): 15839280
On Monday, June 18, 2018, 5:39:08 PM EDT, Durity, Sean R
mailto:sean_r_dur...@homedepot.com>> wrote:
Are you using any rack aware topology? What are your partition keys? Is it
possible that your partition keys do not divide up as cleanly as you would lik
I haven’t ever hired a Cassandra consultant, but the company named The Last
Pickle (yes, an odd name) has some outstanding Cassandra experts. Not sure how
they work, but worth a mention here.
Nothing against Instaclustr. There are great folks there, too.
Sean Durity
From: Evelyn Smith
Sent:
THIS! A well-reasoned and clear explanation of a very difficult topic. This is
the kind of gold that a user mailing list can provide. Thank you, Alain!
Sean Durity
From: Alain RODRIGUEZ
Sent: Tuesday, July 03, 2018 6:37 AM
To: user cassandra.apache.org
Subject: [EXTERNAL] Re: JVM Heap erratic
In most cases, we separate clusters by application. This does help with
isolating problems. A bad query in one application won’t affect other
applications. Also, you can then scale each cluster as required by the data
demands. You can also upgrade separately, which may be a huge help. You only
We do not have any scheduled, periodic node restarts. I have been working on
Cassandra across many versions, and I have not seen a case where periodic
restarts would solve any problem that I saw.
There are certainly times when a node needs a restart – but those are because
of specific reasons.
Check the archives for CMS or G1 (whichever garbage collector you are using).
There has been significant and good advice on both. In general, though, G1 has
one basic number to set and does very well in our use cases. CMS has lots of
black art/science tuning and configuration, but you can test o
This is a very good explanation of CMS tuning for Cassandra:
http://thelastpickle.com/blog/2018/04/11/gc-tuning.html
(author Jon Haddad has extensive Cassandra experience – a super star in our
field)
Sean Durity
From: Durity, Sean R
Sent: Thursday, July 26, 2018 2:08 PM
To: user
Here are some to review and test for Cassandra 3.x from DataStax:
https://docs.datastax.com/en/dse/5.1/dse-admin/datastax_enterprise/config/configRecommendedSettings.html
Al Tobey has done extensive work in this area, too. This is dated (Cassandra
2.1), but is worth mining for information:
https:
That sounds like a problem tailor-made for the DataStax Search (embedded SOLR)
solution. I think that would be the fastest path to success.
Sean Durity
From: onmstester onmstester
Sent: Tuesday, July 31, 2018 10:46 AM
To: user
Subject: [EXTERNAL] full text search on some text columns
I need
I wonder if you are building up tombstones with the deletes. Can you share your
data model? Are the deleted rows using the same partition key as new rows? Any
warnings in your system.log for reading through too many tombstones?
Sean Durity
From: Mihai Stanescu
Sent: Friday, August 03, 2018 12
DataStax Enterprise 6.0 has a new bulk loader tool. DSE is a commercial
product, but maybe your needs are worth the investigation.
Sean Durity
From: Rahul Singh
Sent: Tuesday, August 07, 2018 9:37 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: ETL options from Hive/Presto/s3 to cassa
I have definitely seen corruption, especially in system tables, when there are
multiple instances of Cassandra running/trying to start. We had an internal
tool that was supposed to restart processes (like Cassandra) if they were down,
but it often re-checked before Cassandra was fully up and started.
Might also help to know:
Size of cluster
How much data is being loaded (# of inserts/actual data size)
Single table or multiple tables?
Is this a one-time or occasional load or more frequently?
Is the data located in the same physical data center as the cluster? (any
network latency?)
On the clie
Sstableloader, though, could require a lot more disk space – until compaction
can reduce. For example, if your RF=3, you will essentially be loading 3 copies
of the data. Then it will get replicated 3 more times as it is being loaded.
Thus, you could need up to 9x disk space.
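The arithmetic behind that worst case can be sketched as follows (a rough upper bound, not an exact sizing formula; compaction reclaims the excess afterward):

```python
# Rough sketch of the worst-case disk math for sstableloader:
# the source snapshot already holds RF copies of the data, and the
# loader streams each copy to RF replicas again.
def sstableloader_worst_case(snapshot_size_gb, rf):
    """Upper bound on cluster disk used before compaction reduces it."""
    return snapshot_size_gb * rf

snapshot_gb = 100 * 3      # e.g. 100 GB of unique data captured at RF=3
print(sstableloader_worst_case(snapshot_gb, 3))  # up to 900 GB = 9x the unique data
```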
Sean Durity
From:
If you are going to compare vs commercial offerings like Scylla and CosmosDB,
you should be looking at DataStax Enterprise. They are moving more quickly than
open source (IMO) on adding features and tools that enterprises really need. I
think they have some emerging tech for large/dense nodes, i
I would only run the clean-up (on all nodes) after all new nodes are added. I
would also look at increasing RF to 3 (and running repair) once there are
plenty of nodes. (This is assuming that availability matters and that your
queries use QUORUM or LOCAL_QUORUM for consistency level.)
Longer ter
3 starting points:
- DO NOT migrate your tables as they are in Oracle to Cassandra. In
most cases, you need a different model for Cassandra
- DO take the (free) DataStax Academy courses to learn much more about
Cassandra as you dive in. It is a systematic and bite-size approach.
An idea:
On initial insert, insert into 2 tables:
Hot with short TTL
Cold/archive with a longer (or no) TTL
Then your hot data is always in the same table, but being expired. And you can
access the archive table only for the more rare circumstances. Then you could
have the HOT table on a differe
The only solution I see is using logged batch, with a huge overhead and perf
hit for the writes
On Mon, Sep 17, 2018 at 8:28 PM, Durity, Sean R
mailto:sean_r_dur...@homedepot.com>> wrote:
An idea:
On initial insert, insert into 2 tables:
Hot with short TTL
Cold/archive with a longer
You are correct that altering the keyspace replication settings does not
actually move any data. It only affects new writes or reads. System_auth is one
that needs to be repaired quickly OR, if your number of users/permissions is
relatively small, you can just reinsert them after the alter to th
Version choices aside, I am an advocate for forward-only (in most cases). Here
is my reasoning, so that you can evaluate for your situation:
- upgrades are done while the application is up and live and writing data (no
app downtime)
- the upgrade usually includes a change to the sstable version (
Thank you. I do want to hear about future conferences. I would also love to
hear reports/summaries/highlights from folks who went to Distributed Data
Summit (or other conferences). I think user conferences are great!
Sean Durity
From: Max C.
Sent: Friday, October 05, 2018 8:33 PM
To: user@cas
Agreed. I have run clusters with both RHEL5 and RHEL6 nodes.
Sean Durity
From: Jeff Jirsa
Sent: Sunday, October 14, 2018 12:40 PM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: Installing a Cassandra cluster with multiple Linux OSs
(Ubuntu+CentOS)
Should be fine, just get the java and k
I have wrapped nodetool info into my own script that strips out and interprets
the information I care about. That script also sets a return code based on the
health of that node (which protocols are up, etc.). Then I can monitor the
individual health of the node – as that node sees itself. I hav
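A minimal sketch of that kind of wrapper, parsing `nodetool info` output and turning the protocol status lines into a return code (the sample text is illustrative of the 3.x output format, and the keys checked are an assumption you should verify against your version):

```python
# Sketch of a health check that parses `nodetool info` output and
# derives a return code from the protocol status lines.
def node_health(info_text):
    """Return 0 if gossip and native transport are both active, else 2."""
    status = {}
    for line in info_text.splitlines():
        if ":" in line:
            key, _, value = line.partition(":")
            status[key.strip()] = value.strip()
    ok = (status.get("Gossip active") == "true"
          and status.get("Native Transport active") == "true")
    return 0 if ok else 2

sample = """ID : 7d7e...
Gossip active : true
Native Transport active : true
Load : 512.2 GiB"""
print(node_health(sample))  # 0 when both protocols are up
```

In a real script you would feed it `subprocess.run(["nodetool", "info"], ...)` output and exit with the returned code so your monitoring system can alert on it.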
Just to pile on:
I agree. On our upgrades, I always aim to get the binary part done on all nodes
before worrying about upgradesstables. Upgrade is one node at a time
(precautionary). Upgradesstables depends on cluster size, data size,
compactionthroughput, etc. I usually start with running upgr
I would wipe the new node and bootstrap again. I do not know of any way to
resume the streaming that was previously in progress.
Sean Durity
From: Steinmaurer, Thomas
Sent: Wednesday, November 07, 2018 5:13 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Cassandra 2.1 bootstrap - No stream
We have a cluster over 100 nodes that performs just fine for its use case. In
our case, we needed the disk space and did not want the admin headache of very
dense nodes. It does take more automation and process to handle a larger
cluster, but those are all good things to solve anyway.
But count
The VMs’ memory (4 GB) seems pretty small for Cassandra. What heap size are you
using? Which garbage collector? Are you seeing long GC times on the nodes? The
basic rule of thumb is to give the Cassandra heap 50% of the RAM on the host. 2
GB isn’t very much.
Also, I wouldn’t set the replication
I think you are asking about *encryption* at rest. To my knowledge, open source
Cassandra does not support this natively. There are options, like encrypting
the data in the application before it gets to Cassandra. Some companies offer
other solutions. IMO, if you need the increased security, it
We have had great success with Cassandra upgrades with applications staying
on-line. It is one of the strongest benefits of Cassandra. A couple things I
incorporate into upgrades:
- The main task is getting the new binaries loaded, then restarting
the node – in a rolling fashion. Get t
See my recent post for some additional points. But I wanted to encourage you to
look at the in-place upgrade on your existing hardware. No need to add a DC to
try and upgrade. The cluster will handle reads and writes with nodes of
different versions – no problems. I have done this many times on
the new binary and restart
the node to a newer version as quickly as possible. upgradesstables is I/O
intensive and it takes time and is proportional to the data on the node. Given
these constraints, is there a risk due to prolonged upgradesstables?
On Tue, Dec 4, 2018 at 12:20 PM Durity, Sean R
Can you provide the schema and the queries? What is the RF of the keyspace for
the data? Are you using any Retry policy on your Cluster object?
Sean Durity
From: Marco Gasparini
Sent: Friday, December 21, 2018 10:45 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Writes and Reads with hig
datetime, pkey, agent, some_id, ft,
ftt) values (?,?,?,?,?,?);
About Retry policy, the answer is yes, actually when a write fails I store it
somewhere else and, after a period, I try to write it to Cassandra again. This
way I can store almost all my data, but when the problem is the read I don'
You say the events are incremental updates. I am interpreting this to mean only
some columns are updated. Others should keep their original values.
You are correct that inserting null creates a tombstone.
Can you only insert the columns that actually have new values? Just skip the
columns with no new values.
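A sketch of building such a statement on the client side, naming only the columns that carry new values instead of binding nulls (table and column names are hypothetical):

```python
# Sketch: build an INSERT that names only the columns carrying new
# values. Columns left out of the statement are simply not written
# (no tombstone), unlike columns explicitly bound to null.
def build_insert(table, values):
    cols = {k: v for k, v in values.items() if v is not None}
    names = ", ".join(cols)
    marks = ", ".join("?" for _ in cols)
    return f"INSERT INTO {table} ({names}) VALUES ({marks})", list(cols.values())

stmt, params = build_insert("events", {"id": 1, "status": "open", "note": None})
print(stmt)   # INSERT INTO events (id, status) VALUES (?, ?)
```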
I think you could consider option C: Create a (new) analytics DC in Cassandra
and run your spark nodes there. Then you can address the scaling just on that
DC. You can also use less vnodes, only replicate certain keyspaces, etc. in
order to perform the analytics more efficiently.
Sean Durity
center. Does that cause any overhead ?
On Wed, Jan 9, 2019 at 7:28 AM Durity, Sean R
mailto:sean_r_dur...@homedepot.com>> wrote:
I think you could consider option C: Create a (new) analytics DC in Cassandra
and run your spark nodes there. Then you can address the scaling just on that
DC. You ca
: Dor Laor
Sent: Wednesday, January 09, 2019 11:23 PM
To: user@cassandra.apache.org
Subject: Re: [EXTERNAL] Re: Good way of configuring Apache spark with Apache
Cassandra
On Wed, Jan 9, 2019 at 7:28 AM Durity, Sean R
mailto:sean_r_dur...@homedepot.com>> wrote:
I think you could consider op
I will start – knowing that others will have additional help/questions.
What heap size are you using? Sounds like you are using the CMS garbage
collector. That takes some arcane knowledge and lots of testing to tune. I
would start with G1 and using ½ the available RAM as the heap size. I would
Kenneth is right. Trying to port/support a relational model to a CQL model the
way you are doing it is not going to go well. You won’t be able to scale or get
the search flexibility that you want. It will make Cassandra seem like a bad
fit. You want to play to Cassandra’s strengths – availabilit
I have seen unreliable streaming (streaming that doesn’t finish) because of TCP
timeouts from firewalls or switches. The default tcp_keepalive kernel
parameters are usually not tuned for that. See
https://docs.datastax.com/en/dse-trblshoot/doc/troubleshooting/idleFirewallLinux.html
for more details.
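The fix centers on the kernel keepalive settings; a sketch of the sysctl tuning commonly cited for this situation (verify the exact values against the linked DataStax page before applying):

```
# /etc/sysctl.conf -- send TCP keepalives before idle firewall timeouts fire
net.ipv4.tcp_keepalive_time = 60
net.ipv4.tcp_keepalive_probes = 3
net.ipv4.tcp_keepalive_intvl = 10
```

With these settings an idle connection is probed after 60 seconds and declared dead after three failed probes 10 seconds apart, well inside most firewall idle-timeout windows.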
Agreed. It’s pretty close to impossible to administrate your way out of a data
model that doesn’t play to Cassandra’s strengths. Which is true for other data
storage technologies – you need to model the data the way that the engine is
designed to work.
Sean Durity
From: DuyHai Doan
Sent: Wed
This has not been my experience. Changing IP address is one of the worst admin
tasks for Cassandra. System.peers and other information on each nodes is stored
by ip address. And gossip is really good at sending around the old information
mixed with new…
Sean Durity
From: Oleksandr Shulgin
S
We use the PropertyFileSnitch precisely because it is the same on every node.
If each node has to have a different file (for GPFS) – deployment is more
complicated. (And for any automated configuration you would have a list of
hosts and DC/rack information to compile anyway)
I do put UNKNOWN as
t few months in different
places and both times recovery was difficult and hazardous.
I still strongly recommend against it.
On Wed, Feb 27, 2019 at 3:11 PM Durity, Sean R
mailto:sean_r_dur...@homedepot.com>> wrote:
We use the PropertyFileSnitch precisely because it is the same on every node
Versions 2.0 and 2.1 were generally very stable, so I can understand a
reticence to move when there are so many other things competing for time and
attention.
Sean Durity
From: shalom sagges
Sent: Monday, March 04, 2019 4:21 PM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: A Questio
If there are 2 access patterns, I would consider having 2 tables. The first one
with the ID, which you say is the majority use case. Then have a second table
that uses a time-bucket approach as others have suggested:
(time bucket, id) as primary key
Choose a time bucket (day, week, hour, month,
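A day-granularity bucket for that second table's partition key can be sketched like this (the bucket width is a design choice you would tune so partitions stay a reasonable size):

```python
from datetime import datetime

# Sketch: derive a day-granularity time bucket for the partition key
# of the second table; (day_bucket, id) becomes the primary key.
def day_bucket(ts: datetime) -> str:
    return ts.strftime("%Y-%m-%d")

print(day_bucket(datetime(2019, 3, 13, 10, 30)))  # 2019-03-13
```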
Regards
On Wed, 13 Mar 2019 at 06:57, Durity, Sean R
mailto:sean_r_dur...@homedepot.com>> wrote:
If there are 2 access patterns, I would consider having 2 tables. The first one
with the ID, which you say is the majority use case. Then have a second table
that uses a time-bucket approach as
If you can change to 8 vnodes, it will be much better for repairs and other
kinds of streaming operations. The old advice of 256 per node is now not very
helpful.
Sean
From: Ahmed Eljami
Sent: Wednesday, March 13, 2019 1:27 PM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: Cluster size
Rebuild the DCs with a new number of vnodes… I have done it.
Sean
From: Ahmed Eljami
Sent: Wednesday, March 13, 2019 2:09 PM
To: user@cassandra.apache.org
Subject: Re: [EXTERNAL] Re: Cluster size "limit"
It is not possible with an existing cluster!
Le mer. 13 mars 2019 à 18:39, Duri
rows from current table to the 2nd table with ttl set on each
record as the first table?
From: Durity, Sean R
mailto:sean_r_dur...@homedepot.com>>
Sent: Wednesday, March 13, 2019 8:17 AM
To: user@cassandra.apache.org<mailto:user@cassandra.apache.org>
I spent a month of my life on similar problem... There wasn't an easy answer,
but this is what I did
#1 - Stop the problem from growing further. Get new inserts using a TTL (or set
the default on the table so they get it). App team had to do this one.
#2 - Delete any data that should already be
My default is G1GC using 50% of available RAM (so typically a minimum of 16 GB
for the JVM). That has worked in just about every case I’m familiar with. In
the old days we used CMS, but tuning that beast is a black art with few wizards
available (though several on this mailing list). Today, I ju
My first suspicion would be to look at the server times in the cluster. It
looks like other cases where a write occurs (with no errors) but the data is
not retrieved as expected. If the write occurs with an earlier timestamp than
the existing data, this is the behavior you would see. The write w
https://issues.apache.org/jira/browse/CASSANDRA-9620 has something similar that
was determined to be a driver error. I would start with looking at the driver
version and also the RetryPolicy that is in effect for the Cluster. Secondly, I
would look at whether a batch is really needed for the sta
If you are just trying to get a sense of the data, you could try adding a limit
clause to limit the amount of results and hopefully beat the timeout.
However, ALLOW FILTERING really means "ALLOW ME TO DESTROY MY APPLICATION AND
CLUSTER." It means the data model does not support the query and wil
What is the data problem that you are trying to solve with Cassandra? Is it
high availability? Low latency queries? Large data volumes? High concurrent
users? I would design the solution to fit the problem(s) you are solving.
For example, if high availability is the goal, I would be very cautiou
Object stores are some of our largest and oldest use cases. Cassandra has been
a good choice for us. We do chunk the objects into 64k chunks (I think), so
that partitions are not too large and it scales predictably. For us, the choice
was more about high availability and scalability, which Cassa
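The chunking itself is straightforward; a sketch (the 64 KB size matches the figure mentioned above, and each chunk would be stored under its own key, e.g. object id plus chunk index, as one possible layout):

```python
# Sketch: split a blob into fixed-size chunks before insert, so no
# single value (and no single row) grows with object size.
CHUNK_SIZE = 64 * 1024

def chunk_object(blob: bytes):
    """Return (chunk_index, chunk_bytes) pairs for a blob."""
    return [(i // CHUNK_SIZE, blob[i:i + CHUNK_SIZE])
            for i in range(0, len(blob), CHUNK_SIZE)]

chunks = chunk_object(b"x" * (150 * 1024))   # a 150 KB object
print(len(chunks))  # 3 chunks: 64 + 64 + 22 KB
```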
Uniqueness is determined by the partition key PLUS the clustering columns. Hard
to tell from your data below, but is it possible that one of the clustering
columns (perhaps g) has different values? That would easily explain the 2 rows
returned – because they ARE different rows in the same partit
This is a stretch, but are you using authentication and/or authorization? In my
understanding the queries executed for you to do the authentication and/or
authorization are usually done at LOCAL_ONE (or QUORUM for cassandra user), but
maybe there is something that is changed in the security setu
This may sound a bit harsh, but I teach my developers that if they are trying
to use ALLOW FILTERING – they are doing it wrong! We often choose Cassandra for
its high availability and scalability characteristics. We love no downtime.
ALLOW FILTERING is breaking the rules of availability and scal
I’m not sure it is correct to say, “you cannot.” However, that is a more
complicated restore and more likely to lead to inconsistent data and take
longer to do. You are basically trying to start from a backup point and roll
everything forward and catch up to current.
Replacing/re-streaming is t
The advice so far is exactly correct for an in-place kind of upgrade. The blog
post you mentioned is different. They decided to jump versions in Cassandra by
standing up a new cluster and using a dual-write/dual-read process for their
app. They also wrote code to read and interpret sstables in o
This sounds like a bad query or large partition. If a large partition is
requested on multiple nodes (because of consistency level), it will pressure
all those replica nodes. Then, as the cluster tries to adjust the rest of the
load, the other nodes can get overwhelmed, too.
Look at cfstats to
What you have seen is totally expected. You can’t stream between different
major versions of Cassandra. Get the upgrade done, then worry about any down
hardware. If you are using DCs, upgrade one DC at a time, so that there is an
available environment in case of any disasters.
My advice, though
l the upgrade is done and
then change it back once the upgrade is completed?
On Fri, Jul 26, 2019 at 11:42 AM Durity, Sean R
mailto:sean_r_dur...@homedepot.com>> wrote:
What you have seen is totally expected. You can’t stream between different
major versions of Cassandra. Get the upgrade do
DataStax has a very fast bulk load tool - dsbulk. Not sure if it is available
for open source or not. In my experience so far, I am very impressed with it.
Sean Durity – Staff Systems Engineer, Cassandra
-----Original Message-----
From: p...@xvalheru.org
Sent: Saturday, August 3, 2019 6:06 A
Copy command tries to export all rows in the table, not just the ones on the
node. It will eventually timeout if the table is large. It is really built for
something under 5 million rows or so. Dsbulk (from DataStax) is great for this,
if you are a customer. Otherwise, you will probably need to
I don’t disagree with Jon, who has all kinds of performance tuning experience.
But for ease of operation, we only use G1GC (on Java 8), because the tuning of
ParNew+CMS requires a high degree of knowledge and very repeatable testing
harnesses. It isn’t worth our time. As a previous writer mentio
Beneficial to whom? The apps, the admins, the developers?
I suggest that app teams have separate clusters per application. This prevents
the noisy neighbor problem, isolates any security issues, and helps when it is
time for maintenance, upgrade, performance testing, etc. to not have to
coordin
+1 for removing complexity to be able to create (and maintain!) “reasoned”
systems!
Sean Durity – Staff Systems Engineer, Cassandra
From: Reid Pinchback
Sent: Thursday, October 24, 2019 10:28 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: Cassandra Rack - Datacenter Load Balancing re
Everything in Cassandra is an insert. So, an update and an insert are
functionally equivalent. An update doesn't go update the existing data on disk;
it is a new write of the columns involved. So, the difference in your scenario
is that with the "targeted" update, you are writing less of the col
There is definitely a resource risk to having thousands of open connections to
each node. Some of the drivers have (had?) less than optimal default settings,
like acquiring 50 connections per Cassandra node. This is usually overkill. I
think 5-10/node is much more reasonable. It depends on your
This looks more like a problem for a graph-based model. Have you looked at DSE
Graph as a possibility?
Sean Durity
From: ferit baver elhuseyni [mailto:feritba...@gmail.com]
Sent: Tuesday, March 14, 2017 11:40 AM
To: user@cassandra.apache.org
Subject: results differ on two queries, based on secon
Does a later query on the same ID also behave poorly? (Or perhaps it gets
cached and is fast the next time…) My first thought was that perhaps the slow
records had many updates that were being read (or tombstones being read over)
to assemble the record. A trace on the query with that key might r
There have been many instances of supposed inconsistency noted on this list if
nodes do not have the same system time. Make sure you have a matching clock on
all nodes (ntp or similar).
Sean Durity
From: Shubham Jaju [mailto:shub...@vassarlabs.com]
Sent: Tuesday, March 21, 2017 9:58 PM
To: use
Sounds like you want full auditing of CQL in the cluster. I have not seen
anything built into the open source version for that (but I could be missing
something). DataStax Enterprise does have an auditing feature.
Sean Durity
From: anuja jain [mailto:anujaja...@gmail.com]
Sent: Wednesday, Marc
We have seen much better stability (and MUCH less GC pauses) from G1 with a
variety of heap sizes. I don’t even consider CMS any more.
Sean Durity
From: Gopal, Dhruva [mailto:dhruva.go...@aspect.com]
Sent: Tuesday, April 04, 2017 5:34 PM
To: user@cassandra.apache.org
Subject: Re: cassandra OOM
I have seen Windows format cause problems. Run dos2unix on the cassandra.yaml
file (on the linux box) and see if it helps.
Sean Durity
lord of the (C*) rings (Staff Systems Engineer - Cassandra)
MTC 2250
#cassandra - for the latest news and updates
From: Jonathan Baynes [mailto:jonathan.bay...@
1 GB heap is very small. Why not try increasing it to 50% of RAM and see if it
helps you track down the real issue. It is hard to tune around a bad data
model, if that is indeed the issue. Seeing your tables and queries would help.
Sean Durity
From: Pranay akula [mailto:pranay.akula2...@gmail.
I like Bryan’s terminology of an “antagonistic use case.” If I am reading this
correctly, you are putting 5 (or 10) million records in a partition and then
trying to delete them in the same order they are stored. This is not a good
data model for Cassandra, in fact a dangerous data model. That p
Late to this party, but Jeff is talking about nodetool setstreamthroughput. The
default in most versions is 200 Mb/s (set in yaml file as
stream_throughput_outbound_megabits_per_sec). This is outbound throttle only.
So, if streams from multiple nodes are going to one, it can get inundated.
The
I am doing some on-the-job-learning on this newer feature of the 3.x line,
where the token generation algorithm will compensate for different size nodes
in a cluster. In fact, it is one of the main reasons I upgraded to 3.0.13,
because I have a number of original nodes in a cluster that are abou
DataStax Enterprise bundles spark and spark connector on the DSE nodes and
handles much of the plumbing work (and monitoring, etc.). Worth a look.
Sean Durity
From: Avi Levi [mailto:a...@indeni.com]
Sent: Tuesday, August 22, 2017 2:46 AM
To: user@cassandra.apache.org
Subject: Re: Getting all un
Datos IO has a backup/restore product for Cassandra that another team here has
used successfully. It solves many of the problems inherent with sstable
captures. Without something like it, restores are a nightmare with any volume
of data. The downtime required and the loss of data since the snaps
In an attempt to help close the loop for future readers… I don’t think an
upgrade from DSE 4.8 straight to 5.1 is supported. I think you have to go
through 5.0.x first.
And, yes, you should contact DataStax support for help, but I’m ok with
DSE-related questions. They may be more Cassandra-rela
No – the general answer is that you cannot stream between major versions of
Cassandra. I would upgrade the existing ring, then add the new DC.
Sean Durity
From: Chuck Reynolds [mailto:creyno...@ancestry.com]
Sent: Thursday, May 18, 2017 11:20 AM
To: user@cassandra.apache.org
Subject: Can I have
Cassandra version 2.0.17 (yes, it's old - waiting for new hardware/new OS to
upgrade)
In a long-running system with billions of rows, TTL was not set. So a one-time
purge is being planned to reduce disk usage. Records older than a certain date
will be deleted. The table uses size-tiered compaction.
ved.
--
Jeff Jirsa
On Sep 21, 2017, at 11:27 AM, Durity, Sean R
mailto:sean_r_dur...@homedepot.com>> wrote:
Cassandra version 2.0.17 (yes, it’s old – waiting for new hardware/new OS to
upgrade)
In a long-running system with billions of rows, TTL was not set. So a one-time
purge is being p
n each sstable in reverse
generational order (oldest first) and as long as the data is minimally
overlapping it’ll purge tombstones that way as well - takes longer but much
less disk involved.
--
Jeff Jirsa
On Sep 21, 2017, at 11:27 AM, Durity, Sean R
mailto:sean_r_dur...@homedepot.com>>