Determining active sstables and table dir

2018-04-27 Thread Carl Mueller
In cases where a table was dropped and re-added, there are now two table directories with different uuids, each containing sstables. If you don't have knowledge of which one is active, how do you determine which is the active table directory? I have tried cf_id from system.schema_columnfamilies and that can
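One way to resolve this is to compare the table's cf_id against the hex suffix of each candidate directory. Below is a minimal sketch assuming the on-disk convention `<table>-<cf_id with dashes stripped>`; the directory names and the UUID are illustrative, not from the thread:

```python
import uuid

def active_table_dir(candidate_dirs, cf_id):
    """Return the dir(s) whose 32-hex suffix matches the table's cf_id.

    candidate_dirs: names like 'mytable-5a1c395ee9d411e8823a83e546e0e743'
    cf_id: uuid.UUID from system.schema_columnfamilies (2.x) or
           system_schema.tables (3.x+)
    """
    want = cf_id.hex  # the UUID with dashes stripped, as used in dir names
    return [d for d in candidate_dirs if d.rsplit("-", 1)[-1] == want]

dirs = [
    "mytable-5a1c395ee9d411e8823a83e546e0e743",
    "mytable-9f8b2c10e9d511e8823a83e546e0e743",
]
live = active_table_dir(dirs, uuid.UUID("5a1c395e-e9d4-11e8-823a-83e546e0e743"))
```

With two stale/live candidates as above, only the directory matching the schema's cf_id survives the filter.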

Rapid scaleup of cassandra nodes with snapshots and initial_token in the yaml

2018-02-14 Thread Carl Mueller
https://stackoverflow.com/questions/48776589/cassandra-cant-one-use-snapshots-to-rapidly-scale-out-a-cluster/48778179#48778179 So the basic question is, if one records tokens and snapshots from an existing node, via: nodetool ring | grep ip_address_of_node | awk '{print $NF ","}' | xargs for
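The shell pipeline in the snippet can be mirrored in a few lines of Python. This sketch assumes each `nodetool ring` line starts with the node's IP and ends with its token (the `ring` output below is a shortened, made-up sample):

```python
def tokens_for_node(ring_output, ip):
    """Collect the tokens owned by one node from `nodetool ring` output,
    joined with commas for cassandra.yaml's initial_token setting."""
    toks = [line.split()[-1] for line in ring_output.splitlines()
            if line.strip() and line.split()[0] == ip]
    return ",".join(toks)

sample = """\
10.0.0.1  rack1  Up  Normal  120 GB  33%  -9100000000000000000
10.0.0.2  rack1  Up  Normal  118 GB  33%  -3000000000000000000
10.0.0.1  rack1  Up  Normal  120 GB  33%  3100000000000000000
"""
print(tokens_for_node(sample, "10.0.0.1"))
```

The result would go into the replacement node's yaml as `initial_token`, alongside restoring the snapshot data.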

Re: Rapid scaleup of cassandra nodes with snapshots and initial_token in the yaml

2018-02-15 Thread Carl Mueller
Or could we do a rapid clone to a new cluster, then add that as another datacenter? On Wed, Feb 14, 2018 at 11:40 AM, Carl Mueller <carl.muel...@smartthings.com > wrote: > https://stackoverflow.com/questions/48776589/cassandra- > cant-one-use-snapshots-to-rapidly-scale-out-a-clus

Re: Memtable flush -> SSTable: customizable or same for all compaction strategies?

2018-02-21 Thread Carl Mueller
to explicitly exclude the loadup of any files/sstable components that are CUSTOM in SStable.java On Wed, Feb 21, 2018 at 10:05 AM, Carl Mueller <carl.muel...@smartthings.com > wrote: > jon: I am planning on writing a custom compaction strategy. That's why the > question is here, I figured t

Re: Memtable flush -> SSTable: customizable or same for all compaction strategies?

2018-02-21 Thread Carl Mueller
Also, I was wondering if the key cache maintains a count of how many local accesses a key undergoes. Such information might be very useful for compactions of sstables by splitting data by frequency of use so that those can be preferentially compacted. On Wed, Feb 21, 2018 at 5:08 PM, Carl Mueller

Re: Rapid scaleup of cassandra nodes with snapshots and initial_token in the yaml

2018-02-20 Thread Carl Mueller
nge? If we are splitting the old node's primary range, then the replicas would travel with it and the new node would instantly become a replica of the old node. the next primary ranges also have the replicas. On Fri, Feb 16, 2018 at 3:58 PM, Carl Mueller <carl.muel...@smartthings.com> wrote:

Re: Cassandra Needs to Grow Up by Version Five!

2018-02-20 Thread Carl Mueller
I think what is really necessary is providing table-level recipes for storing data. We need a lot of real world examples and the resulting schema, compaction strategies, and tunings that were performed for them. Right now I don't see such crucial cookbook data in the project. AI is a bit

Re: vnode random token assignment and replicated data antipatterns

2018-02-20 Thread Carl Mueller
get rid of old data ranges not needed anymore. In practice, is this possible? I have heard Priam can double clusters and they do not use vnodes. I am assuming they do a similar approach but they only have to calculate single tokens? On Tue, Feb 20, 2018 at 11:21 AM, Carl Mueller <carl.m

Re: Cassandra counter readtimeout error

2018-02-20 Thread Carl Mueller
How "hot" are your partition keys in these counters? I would think, theoretically, if specific partition keys are getting thousands of counter increment/mutation updates, then compaction won't "compact" those together into the final value, and you'll start experiencing the problems people get
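The effect being described can be illustrated with a toy model (this is not Cassandra internals, just a sketch of the merge-on-read idea): each flush leaves another delta for the hot key on disk, and a read has to touch every un-compacted sstable holding one.

```python
from collections import defaultdict

class CounterTableModel:
    """Toy LSM model: increments buffer in a memtable, flushes create
    per-sstable deltas, and reads must merge all deltas for a key."""
    def __init__(self):
        self.sstables = []               # each sstable: {key: delta}
        self.memtable = defaultdict(int)

    def increment(self, key, by=1):
        self.memtable[key] += by

    def flush(self):
        self.sstables.append(dict(self.memtable))
        self.memtable.clear()

    def read(self, key):
        # read cost grows with the number of sstables holding deltas
        deltas = [t[key] for t in self.sstables if key in t]
        return sum(deltas) + self.memtable.get(key, 0), len(deltas)

m = CounterTableModel()
for _ in range(3):                       # three flush cycles of a hot key
    for _ in range(1000):
        m.increment("hot")
    m.flush()
value, sstables_touched = m.read("hot")  # value 3000, 3 sstables merged
```

Until compaction collapses those three deltas into one cell, every read of the hot key pays the merge cost, which is one plausible source of the timeouts discussed in the thread.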

vnode random token assignment and replicated data antipatterns

2018-02-20 Thread Carl Mueller
As I understand it: Replicas of data are replicated to the next primary range owner. As tokens are randomly generated (at least in 2.1.x that I am on), can't we have this situation: Say we have RF3, but the tokens happen to line up where: NodeA handles 0-10 NodeB handles 11-20 NodeA handles
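The placement rule in question can be sketched with a toy ring (SimpleStrategy-style walk; real placement also depends on the snitch and NetworkTopologyStrategy): the walk skips a node that is already a replica, which is why a node owning two adjacent ranges does not end up holding two copies.

```python
# Toy ring: (token, owner) pairs; node A owns two adjacent ranges.
ring = [(10, "A"), (20, "B"), (30, "A"), (40, "C"), (50, "B"), (60, "C")]

def replicas(key_token, ring, rf):
    """Walk clockwise from the key's primary range owner, collecting
    distinct nodes until RF replicas are found."""
    ordered = sorted(ring)
    owners = [node for _, node in ordered]
    start = next((i for i, (tok, _) in enumerate(ordered) if key_token <= tok), 0)
    picked = []
    for i in range(len(owners)):
        node = owners[(start + i) % len(owners)]
        if node not in picked:
            picked.append(node)
        if len(picked) == rf:
            break
    return picked

replicas(5, ring, 3)   # walk hits A, B, A (skipped), C
```

For a key in range 0-10 the walk visits A, B, then skips the repeated A and continues to C, so all three replicas are still distinct nodes.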

Re: vnode random token assignment and replicated data antipatterns

2018-02-20 Thread Carl Mueller
ess, would the next primary ranges take the replica ranges? On Tue, Feb 20, 2018 at 11:45 AM, Jon Haddad <j...@jonhaddad.com> wrote: > That’s why you use a NTS + a snitch, it picks replicas based on rack > awareness. > > > On Feb 20, 2018, at 9:33 AM, Carl Mueller <carl.muel..

Re: Cluster Repairs 'nodetool repair -pr' Cause Severe IncreaseinRead Latency After Shrinking Cluster

2018-02-22 Thread Carl Mueller
> [quoted `nodetool cfhistograms` output for key_space_01/cf_01: columns Percentile, SSTables, Write Latency (micros), Read Latency (micros), Partition Size (bytes), Cell Count; snippet truncated at the 50% row]

Re: Rapid scaleup of cassandra nodes with snapshots and initial_token in the yaml

2018-02-16 Thread Carl Mueller
Thanks. Yeah, it appears this would only be doable if we didn't have vnodes and used old single token clusters. I guess Priam has something where you increase the cluster by whole number multiples. Then there's the issue of doing quorum read/writes if there suddenly is a new replica range with
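The whole-number-multiple doubling being attributed to Priam is straightforward token math on a single-token ring: each new node takes the midpoint of an existing range. A sketch, assuming the Murmur3Partitioner range of -2^63 to 2^63-1 (the tiny example tokens are illustrative):

```python
def doubling_tokens(old_tokens, ring_min=-2**63, ring_max=2**63 - 1):
    """For each existing range, compute the bisecting token a new node
    would take, so a cluster of N nodes grows to exactly 2N."""
    old = sorted(old_tokens)
    new = []
    for i, tok in enumerate(old):
        nxt = old[(i + 1) % len(old)]
        # range size, accounting for the wrap-around range
        span = nxt - tok if nxt > tok else (ring_max - tok) + (nxt - ring_min) + 1
        mid = tok + span // 2
        if mid > ring_max:               # wrap the midpoint back into range
            mid -= ring_max - ring_min + 1
        new.append(mid)
    return new

doubling_tokens([-8, 0, 8])
```

Because each old range is split exactly in half, each new node can be seeded from a snapshot of the node whose range it bisects, which is the part that random vnode tokens make impossible.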

Re: Memtable flush -> SSTable: customizable or same for all compaction strategies?

2018-02-21 Thread Carl Mueller
>> This is also the wrong mailing list, please direct future user questions >> to the user list. The dev list is for development of Cassandra itself. >> >> Jon >> >> On Feb 20, 2018, at 1:10 PM, Carl Mueller <carl.muel...@smartthings.com> >&

Re: Memtable flush -> SSTable: customizable or same for all compaction strategies?

2018-02-21 Thread Carl Mueller
21, 2018 at 9:59 AM, Carl Mueller <carl.muel...@smartthings.com> wrote: > Thank you all! > > On Tue, Feb 20, 2018 at 7:35 PM, kurt greaves <k...@instaclustr.com> > wrote: > >> Probably a lot of work but it would be incredibly useful for vnodes if >

Re: Best approach to Replace existing 8 smaller nodes in production cluster with New 8 nodes that are bigger in capacity, without a downtime

2018-02-21 Thread Carl Mueller
DCs can be stood up with snapshotted data. Stand up a new cluster with your old cluster snapshots: https://docs.datastax.com/en/cassandra/2.1/cassandra/operations/ops_snapshot_restore_new_cluster.html Then link the DCs together. Disclaimer: I've never done this in real life. On Wed, Feb 21,

Re: Cluster Repairs 'nodetool repair -pr' Cause Severe Increase in Read Latency After Shrinking Cluster

2018-02-21 Thread Carl Mueller
What is your replication factor? Single datacenter, three availability zones, is that right? You removed one node at a time or three at once? On Wed, Feb 21, 2018 at 10:20 AM, Fd Habash wrote: > We have had a 15 node cluster across three zones and cluster repairs using >

Re: Best approach to Replace existing 8 smaller nodes in production cluster with New 8 nodes that are bigger in capacity, without a downtime

2018-02-21 Thread Carl Mueller
e and replacing it in the cluster. Bringing up a new DC with > snapshots is going to be a nightmare in comparison. > > On Wed, Feb 21, 2018 at 8:16 AM Carl Mueller <carl.muel...@smartthings.com> > wrote: > >> DCs can be stood up with snapshotted data. >> >> >

Re: Cluster Repairs 'nodetool repair -pr' Cause Severe Increase in Read Latency After Shrinking Cluster

2018-02-21 Thread Carl Mueller
sorry for the idiot questions... data was allowed to fully rebalance/repair/drain before the next node was taken off? did you take 1 off per rack/AZ? On Wed, Feb 21, 2018 at 12:29 PM, Fred Habash <fmhab...@gmail.com> wrote: > One node at a time > > On Feb 21, 2018 10:23 AM

Re: Cluster Repairs 'nodetool repair -pr' Cause Severe Increase in Read Latency After Shrinking Cluster

2018-02-21 Thread Carl Mueller
also entail cross-az streaming and queries and repair. On Wed, Feb 21, 2018 at 3:30 PM, Carl Mueller <carl.muel...@smartthings.com> wrote: > sorry for the idiot questions... > > data was allowed to fully rebalance/repair/drain before the next node was > taken off? > > did

Re: Performance Of IN Queries On Wide Rows

2018-02-21 Thread Carl Mueller
Cass 2.1.14 is missing some wide row optimizations done in later cass releases IIRC. Speculation: IN won't matter, it will load the entire wide row into memory regardless which might spike your GC/heap and overflow the rowcache On Wed, Feb 21, 2018 at 2:16 PM, Gareth Collins
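A common workaround for the behavior speculated about here is to fan the IN list out into smaller queries and merge client-side, so no single coordinator materializes the whole set. A sketch with a stand-in for the driver call (`run_query` is hypothetical, not a real driver API):

```python
def fan_out(keys, run_query, chunk_size=1):
    """Split one big IN(...) into per-chunk queries and merge the rows."""
    results = []
    for i in range(0, len(keys), chunk_size):
        results.extend(run_query(keys[i:i + chunk_size]))
    return results

# toy stand-in for session.execute(...) against a wide row
fake_rows = {k: [f"row-{k}"] for k in range(10)}
rows = fan_out(list(range(10)), lambda ks: [r for k in ks for r in fake_rows[k]])
```

In a real application the per-chunk queries would be issued asynchronously in parallel; the chunking logic is the same either way.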

Re: Data Deleted After a few days of being off

2018-02-27 Thread Carl Mueller
Does cassandra still function if the commitlog dir has no writes? Will the data still go into the memtable and serve queries? On Tue, Feb 27, 2018 at 1:37 AM, Oleksandr Shulgin < oleksandr.shul...@zalando.de> wrote: > On Tue, Feb 27, 2018 at 7:37 AM, A wrote: > >> >> I

Re: Filling in the blank To Do sections on the Apache Cassandra web site

2018-02-27 Thread Carl Mueller
hings simple. Did I miss > something? What does it matter right now? > > > > Thanks Carl, > > > > Kenneth Brotman > > > > *From:* Carl Mueller [mailto:carl.muel...@smartthings.com] > *Sent:* Tuesday, February 27, 2018 8:50 AM > *To:* user@cassandra.apache

Re: Filling in the blank To Do sections on the Apache Cassandra web site

2018-02-27 Thread Carl Mueller
so... are those pages in the code tree of github? I don't see them or a directory structure under /doc. Is mirroring the documentation between the apache site and a github source a big issue? On Tue, Feb 27, 2018 at 7:50 AM, Kenneth Brotman < kenbrot...@yahoo.com.invalid> wrote: > I was debating

Re: Version Rollback

2018-02-27 Thread Carl Mueller
My speculation is that IF (big if) the sstable formats are compatible between the versions, which probably isn't the case for major versions, then you could drop back. If the sstables changed format, then you'll probably need to figure out how to rewrite the sstables in the older format and then

Re: Filling in the blank To Do sections on the Apache Cassandra web site

2018-02-27 Thread Carl Mueller
a docker image to build them so you don’t need to mess with > sphinx. Check the README for instructions. > > Jon > > > On Feb 27, 2018, at 9:49 AM, Carl Mueller <carl.muel...@smartthings.com> > wrote: > > > If there was a github for the docs, we could start

Re: Cassandra at Instagram with Dikang Gu interview by Jeff Carpenter

2018-03-12 Thread Carl Mueller
Again, I'd really like to get a feel for scylla vs rocksandra vs cassandra. Isn't the driver binary protocol the easiest / least redesign level of storage engine swapping? Scylla and Cassandra and Rocksandra are currently three options. Rocksandra can expand out its non-java footprint without

Re: Cassandra vs MySQL

2018-03-14 Thread Carl Mueller
THERE ARE NO JOINS WITH CASSANDRA. CQL != SQL. Same for aggregation, subqueries, etc. And effectively multi-table transactions are out. If you have simple single-table queries and updates, or can convert the app to do so, then you're in business. On Tue, Mar 13, 2018 at 5:02 AM, Rahul Singh

Re: Rocksandra blog post

2018-03-06 Thread Carl Mueller
Basically they are avoiding gc, right? Not necessarily improving on the theoreticals of sstables and LSM trees. Why didn't they use/try scylla? I'd be interested to see that benchmark. On Tue, Mar 6, 2018 at 3:48 AM, Romain Hardouin wrote: > Rocksandra is very

Re: data types storage saving

2018-03-06 Thread Carl Mueller
If you're willing to do the data type conversion in insert and retrieval, then you could use blobs as a sort of "adaptive length int" AFAIK On Tue, Mar 6, 2018 at 6:02 AM, onmstester onmstester wrote: > I'm using int data type for one of my columns but for 99.99...% its data
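The "adaptive length int" conversion is a couple of lines on each side of the driver call. A sketch assuming non-negative values (signed ints would need zig-zag encoding or an explicit sign scheme):

```python
def int_to_blob(n: int) -> bytes:
    """Encode an int as a minimal-length big-endian blob for insert."""
    return n.to_bytes(max(1, (n.bit_length() + 7) // 8), "big")

def blob_to_int(b: bytes) -> int:
    """Decode the blob back to an int on read."""
    return int.from_bytes(b, "big")

len(int_to_blob(7))          # 1 byte instead of int's fixed 4
len(int_to_blob(2**31 - 1))  # 4 bytes, same as int, only for large values
```

One caveat worth noting: blobs sort lexicographically, so if the column is ever used as a clustering key, variable-length encodings change the ordering relative to plain ints.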

Re: [EXTERNAL] Cassandra vs MySQL

2018-03-20 Thread Carl Mueller
Yes, cassandra's big win is that once you get your data and applications adapted to the platform, you have a clear path to very very large scale and resiliency. Um, assuming you have the dollars. It scales out on commodity hardware, but isn't exactly efficient in the use of that hardware. I like

Re: One time major deletion/purge vs periodic deletion

2018-03-20 Thread Carl Mueller
It's possible you'll run into compaction headaches. Likely actually. If you have time-bucketed purge/archives, I'd implement a time bucketing strategy using rotating tables dedicated to a time period so that when an entire table is ready for archiving you just snapshot its sstables and then
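The rotating-table strategy comes down to a naming convention plus a cutoff rule. A sketch with month buckets; the `events_YYYY_MM` naming and the retention window are illustrative assumptions, not from the thread:

```python
from datetime import date

def bucket_table(base: str, day: date) -> str:
    """Table name for the month bucket a write belongs to."""
    return f"{base}_{day.year}_{day.month:02d}"

def buckets_to_archive(base, today, keep_months):
    """Buckets older than `keep_months` whole months are candidates for
    snapshot-then-drop; check a few of the oldest here."""
    months = today.year * 12 + (today.month - 1)
    out = []
    for back in range(keep_months, keep_months + 3):
        m = months - back
        out.append(f"{base}_{m // 12}_{m % 12 + 1:02d}")
    return out

bucket_table("events", date(2018, 3, 20))                      # current bucket
buckets_to_archive("events", date(2018, 3, 20), keep_months=2) # aged-out buckets
```

Dropping a whole aged-out table sidesteps tombstones and the compaction churn that per-row deletes would create, which is the point being made above.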

cass 2.1.x seed node update via JMX

2018-03-22 Thread Carl Mueller
We have a cluster that is subject to the one-year gossip bug. We'd like to update the seed node list via JMX without restart, since our foolishly single-seed-node in this forsaken cluster is being autoculled in AWS. Is this possible? It is not marked volatile in the Config of the source code, so

Re: cass 2.1.x seed node update via JMX

2018-03-22 Thread Carl Mueller
> Previously (as described in the ticket above), the seed node list is only > updated when doing a shadow round, removing an endpoint or restarting (look > for callers of o.a.c.gms.Gossiper#buildSeedsList() if you're curious). > > A rolling restart is the usual SOP for that. > > On F