Re: Cassandra Needs to Grow Up by Version Five!

2018-02-21 Thread Oleksandr Shulgin
On Wed, Feb 21, 2018 at 7:54 PM, Durity, Sean R wrote: > > > However, I think the shots at Cassandra are generally unfair. When I > started working with it, the DataStax documentation was some of the best > documentation I had seen on any project, especially an open

Re: Memtable flush -> SSTable: customizable or same for all compaction strategies?

2018-02-21 Thread kurt greaves
> > Also, I was wondering if the key cache maintains a count of how many local > accesses a key undergoes. Such information might be very useful for > compactions of sstables by splitting data by frequency of use so that those > can be preferentially compacted. No, we don't currently have metrics

RE: Cassandra Needs to Grow Up by Version Five!

2018-02-21 Thread Kenneth Brotman
Jeff, I already addressed everything you said. Boy! Would I like to bring up the out of date articles on the web that trip people up and the lousy documentation on the Apache website but I can’t because a lot of folks don’t know me or why I’m saying these things. I will be making

Re: Cassandra Needs to Grow Up by Version Five!

2018-02-21 Thread kurt greaves
> > Instead of saying "Make X better" you can quantify "Here's how we can make > X better" in a jira and the conversation will continue with interested > parties (opening jiras is free!). Being combative and insulting the project on > the mailing list may help vent some frustrations but it is counter

Re: Cassandra Needs to Grow Up by Version Five!

2018-02-21 Thread Chris Lohfink
Instead of saying "Make X better" you can quantify "Here's how we can make X better" in a jira and the conversation will continue with interested parties (opening jiras is free!). Being combative and insulting the project on the mailing list may help vent some frustrations, but it is counterproductive

Re: Cassandra Needs to Grow Up by Version Five!

2018-02-21 Thread Jason Brown
Hi all, I'd like to de-escalate a bit here. Since this is an Apache and an OSS project, contributions come in many forms: code, speaking/advocacy, documentation, support, project management, and so on. None of these things come for free. Ken, I appreciate you bringing up these usability topics;

Re: Memtable flush -> SSTable: customizable or same for all compaction strategies?

2018-02-21 Thread Carl Mueller
Also, I was wondering if the key cache maintains a count of how many local accesses a key undergoes. Such information might be very useful for compactions of sstables by splitting data by frequency of use so that those can be preferentially compacted. On Wed, Feb 21, 2018 at 5:08 PM, Carl Mueller

Re: Cassandra Needs to Grow Up by Version Five!

2018-02-21 Thread Jeff Jirsa
On Wed, Feb 21, 2018 at 2:53 PM, Kenneth Brotman < kenbrot...@yahoo.com.invalid> wrote: > Hi Akash, > > I get the part about outside work which is why in replying to Jeff Jirsa I > was suggesting the big companies could justify taking it on easy enough and > you know actually pay the people who

Re: Cassandra Needs to Grow Up by Version Five!

2018-02-21 Thread Brandon Williams
The only progress from this point is what Jon said: enumerate and detail your issues in jira tickets. On Wed, Feb 21, 2018 at 4:53 PM, Kenneth Brotman < kenbrot...@yahoo.com.invalid> wrote: > Hi Akash, > > I get the part about outside work which is why in replying to Jeff Jirsa I > was

Re: Memtable flush -> SSTable: customizable or same for all compaction strategies?

2018-02-21 Thread Carl Mueller
Looking through the 2.1.X code I see this: org.apache.cassandra.io.sstable.Component.java In the enum for component types there is a CUSTOM enum value which seems to indicate a catchall for providing metadata for sstables. Has this been exploited... ever? I noticed in some of the patches for

RE: Cassandra Needs to Grow Up by Version Five!

2018-02-21 Thread Kenneth Brotman
Hi Akash, I get the part about outside work which is why in replying to Jeff Jirsa I was suggesting the big companies could justify taking it on easy enough and you know actually pay the people who would be working at it so those people could have a life. The part I don't get is the aversion

Re: Cassandra Needs to Grow Up by Version Five!

2018-02-21 Thread Akash Gangil
I would second Jon in the arguments he made. Contributing outside work is draining and really requires a lot of commitment. If someone requires features around usability etc, just pay for it, period. On Wed, Feb 21, 2018 at 2:20 PM, Kenneth Brotman < kenbrot...@yahoo.com.invalid> wrote: > Jon, >

RE: Cassandra Needs to Grow Up by Version Five!

2018-02-21 Thread Kenneth Brotman
Jon, Very sorry that you don't see the value of the time I'm taking for this. I don't have demands; I do have a stern warning and I'm right Jon. Please be very careful not to mischaracterize my words Jon. You suggest I put things in JIRAs, then seem to suggest that I'd be lucky if anyone

Re: Performance Of IN Queries On Wide Rows

2018-02-21 Thread Jeff Jirsa
Slight nuance: we don't load the whole row into memory, but the column index (and the result set, and the tombstones in the partition), which can still spike your GC/heap (and potentially overflow the row cache, if you have it on, which is atypical). On Wed, Feb 21, 2018 at 1:35 PM, Carl Mueller

Re: Performance Of IN Queries On Wide Rows

2018-02-21 Thread Carl Mueller
Cass 2.1.14 is missing some wide row optimizations done in later cass releases IIRC. Speculation: IN won't matter, it will load the entire wide row into memory regardless, which might spike your GC/heap and overflow the row cache. On Wed, Feb 21, 2018 at 2:16 PM, Gareth Collins

Re: Cluster Repairs 'nodetool repair -pr' Cause Severe Increase in Read Latency After Shrinking Cluster

2018-02-21 Thread Carl Mueller
Hm, nodetool decommission performs the streamout of the replicated data, and you said that was apparently without error... But if you dropped three nodes in one AZ/rack on a five node with RF3, then we have a missing RF factor unless NetworkTopologyStrategy fails over to another AZ. But that would

Re: Cluster Repairs 'nodetool repair -pr' Cause Severe Increase in Read Latency After Shrinking Cluster

2018-02-21 Thread Carl Mueller
sorry for the idiot questions... data was allowed to fully rebalance/repair/drain before the next node was taken off? did you take 1 off per rack/AZ? On Wed, Feb 21, 2018 at 12:29 PM, Fred Habash wrote: > One node at a time > > On Feb 21, 2018 10:23 AM, "Carl Mueller"

Re: Cassandra Needs to Grow Up by Version Five!

2018-02-21 Thread Jon Haddad
Ken, Maybe it’s not clear how open source projects work, so let me try to explain. There’s a bunch of us who either get paid by someone or volunteer in our free time. The folks that get paid (yay!) usually take direction on what the priorities are, and work on projects that directly affect

Re: Cassandra Needs to Grow Up by Version Five!

2018-02-21 Thread Durity, Sean R
It is instructive to listen to the concerns of new and existing users in order to improve a product like Cassandra, but I think the school yard taunt model isn’t the most effective. In my experience with open and closed source databases, there are always things that could be improved. Many

Re: Performance Of IN Queries On Wide Rows

2018-02-21 Thread Gareth Collins
Thanks for the response! I could understand that being the case if the Cassandra cluster is not loaded. Splitting the work across multiple nodes would obviously make the query faster. But if this was just a single node, shouldn't one IN query be faster than multiple due to the fact that, if I

Re: Cassandra Needs to Grow Up by Version Five!

2018-02-21 Thread DuyHai Doan
So before buying any marketing claims from Microsoft or whoever, maybe you should try to use it extensively? And talking about backup, have a look at DynamoDB: http://i68.tinypic.com/n1b6yr.jpg From my POV, if a multi-billion-dollar company like Amazon doesn't get it right or can't make it easy for

RE: Cassandra Needs to Grow Up by Version Five!

2018-02-21 Thread Kenneth Brotman
Josh, To say nothing is indifference. If you care about your community, sometimes don't you have to bring up a subject even though you know it's also temporarily adding some discomfort? As to opening a JIRA, I've got a very specific topic to try in mind now. An easy one I'll work on and

Re: Cluster Repairs 'nodetool repair -pr' Cause Severe Increase in Read Latency After Shrinking Cluster

2018-02-21 Thread Fred Habash
RF of 3 with three racks/AZs in a single region. On Feb 21, 2018 10:23 AM, "Carl Mueller" wrote: > What is your replication factor? > Single datacenter, three availability zones, is that right? > You removed one node at a time or three at once? > > On Wed, Feb 21,

Re: Cluster Repairs 'nodetool repair -pr' Cause Severe Increase in Read Latency After Shrinking Cluster

2018-02-21 Thread Fred Habash
One node at a time On Feb 21, 2018 10:23 AM, "Carl Mueller" wrote: > What is your replication factor? > Single datacenter, three availability zones, is that right? > You removed one node at a time or three at once? > > On Wed, Feb 21, 2018 at 10:20 AM, Fd Habash

Re: Missing 3.11.X cassandra debian packages

2018-02-21 Thread Michael Shuler
On 02/21/2018 11:56 AM, Zachary Marois wrote: > Starting in the last two weeks (I successfully installed cassandra > sometime in the last two weeks), I'm guessing on 2/19 when version > 3.11.2 was released, the cassandra apt package version 3.11.1 became > unstable. It doesn't seem to be

Missing 3.11.X cassandra debian packages

2018-02-21 Thread Zachary Marois
Starting in the last two weeks (I successfully installed cassandra sometime in the last two weeks), I'm guessing on 2/19 when version 3.11.2 was released, the cassandra apt package version 3.11.1 became unstable. It doesn't seem to be published in the

FINAL REMINDER: CFP for Apache EU Roadshow Closes 25th February

2018-02-21 Thread Sharan F
Hello Apache Supporters and Enthusiasts This is your FINAL reminder that the Call for Papers (CFP) for the Apache EU Roadshow is closing soon. Our Apache EU Roadshow will focus on Cloud, IoT, Apache Tomcat, Apache Http and will run from 13-14 June 2018 in Berlin. Note that the CFP deadline

Re: Best approach to Replace existing 8 smaller nodes in production cluster with New 8 nodes that are bigger in capacity, without a downtime

2018-02-21 Thread Carl Mueller
I don't disagree with Jon. On Wed, Feb 21, 2018 at 10:27 AM, Jonathan Haddad wrote: > The easiest way to do this is replacing one node at a time by using > rsync. I don't know why it has to be more complicated than copying data to > a new machine and replacing it in the

Re: Cassandra Needs to Grow Up by Version Five!

2018-02-21 Thread Josh McKenzie
There's a disheartening amount of "here's where Cassandra is bad, and here's what it needs to do for me for free" happening in this thread. This is open-source software. Everyone is *strongly encouraged* to submit a patch to move the needle on *any* of these things being complained about in this

Re: Best approach to Replace existing 8 smaller nodes in production cluster with New 8 nodes that are bigger in capacity, without a downtime

2018-02-21 Thread Jonathan Haddad
The easiest way to do this is replacing one node at a time by using rsync. I don't know why it has to be more complicated than copying data to a new machine and replacing it in the cluster. Bringing up a new DC with snapshots is going to be a nightmare in comparison. On Wed, Feb 21, 2018 at

Re: Cluster Repairs 'nodetool repair -pr' Cause Severe Increase in Read Latency After Shrinking Cluster

2018-02-21 Thread Jeff Jirsa
nodetool cfhistograms and nodetool compactionstats would be helpful. Compaction is probably behind from streaming, and reads are touching many sstables. -- Jeff Jirsa > On Feb 21, 2018, at 8:20 AM, Fd Habash wrote: > > We have had a 15 node cluster across three zones and

Re: Cluster Repairs 'nodetool repair -pr' Cause Severe Increase in Read Latency After Shrinking Cluster

2018-02-21 Thread Carl Mueller
What is your replication factor? Single datacenter, three availability zones, is that right? You removed one node at a time or three at once? On Wed, Feb 21, 2018 at 10:20 AM, Fd Habash wrote: > We have had a 15 node cluster across three zones and cluster repairs using >

Cluster Repairs 'nodetool repair -pr' Cause Severe Increase in Read Latency After Shrinking Cluster

2018-02-21 Thread Fd Habash
We have had a 15 node cluster across three zones and cluster repairs using ‘nodetool repair -pr’ took about 3 hours to finish. Recently, we shrank the cluster to 12. Since then, the same repair job has taken up to 12 hours to finish and, most times, it never does. More importantly, at some point

Re: Best approach to Replace existing 8 smaller nodes in production cluster with New 8 nodes that are bigger in capacity, without a downtime

2018-02-21 Thread Carl Mueller
DCs can be stood up with snapshotted data. Stand up a new cluster with your old cluster snapshots: https://docs.datastax.com/en/cassandra/2.1/cassandra/operations/ops_snapshot_restore_new_cluster.html Then link the DCs together. Disclaimer: I've never done this in real life. On Wed, Feb 21,

Re: Memtable flush -> SSTable: customizable or same for all compaction strategies?

2018-02-21 Thread Carl Mueller
Jon: I am planning on writing a custom compaction strategy. That's why the question is here; I figured the specifics of memtable -> sstable and cassandra internals are not a user question. If that still isn't deep enough for the dev thread, I will move all those questions to user. On Wed, Feb 21,

Re: Memtable flush -> SSTable: customizable or same for all compaction strategies?

2018-02-21 Thread Carl Mueller
Thank you all! On Tue, Feb 20, 2018 at 7:35 PM, kurt greaves wrote: > Probably a lot of work but it would be incredibly useful for vnodes if > flushing was range aware (to be used with RangeAwareCompactionStrategy). > The writers are already range aware for JBOD, but

Re: Best approach to Replace existing 8 smaller nodes in production cluster with New 8 nodes that are bigger in capacity, without a downtime

2018-02-21 Thread Nitan Kainth
A new DC will be faster but may impact cluster performance due to streaming. Sent from my iPhone > On Feb 21, 2018, at 8:53 AM, Leena Ghatpande wrote: > > We do use LOCAL_ONE and LOCAL_QUORUM currently. But these 8 nodes need to be > in 2 different DCs, so we would end up

Re: Best approach to Replace existing 8 smaller nodes in production cluster with New 8 nodes that are bigger in capacity, without a downtime

2018-02-21 Thread Leena Ghatpande
We do use LOCAL_ONE and LOCAL_QUORUM currently. But these 8 nodes need to be in 2 different DCs, so we would end up creating 2 additional new DCs and dropping 2. Are there any advantages to adding a DC over replacing one node at a time? From: Jeff Jirsa

Re: Installing the common service to start Cassandra

2018-02-21 Thread Rahul Singh
Jeff, Check the service configuration to see what path it’s using for the JRE execution and if it’s specifying any class path parameters. The system user may not have the environment variables available whereas your user may have them. -- Rahul Singh rahul.si...@anant.us Anant Corporation On

Re: Performance Of IN Queries On Wide Rows

2018-02-21 Thread Rahul Singh
That depends on the driver you use, but separate queries issued asynchronously across the cluster would be faster. -- Rahul Singh rahul.si...@anant.us Anant Corporation On Feb 20, 2018, 6:48 PM -0500, Eric Stevens wrote: > Someone can correct me if I'm wrong, but I believe if you
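
[Editor's sketch] The suggestion in this reply (one query per key, issued concurrently, instead of a single IN query) can be illustrated roughly as below. This is only a minimal sketch: FAKE_TABLE, fetch_one and fetch_many are hypothetical names, and the in-memory dictionary stands in for a real table. A real client would replace fetch_one with its driver's own asynchronous query call, and whether this beats a single IN query depends on the driver and the cluster, as the reply notes.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-in for a table; not a real Cassandra session.
FAKE_TABLE = {"a": 1, "b": 2, "c": 3}


def fetch_one(key):
    """Stand-in for a single-key query (assumption: one round trip per key)."""
    return key, FAKE_TABLE.get(key)


def fetch_many(keys, max_workers=8):
    """Issue one query per key concurrently and collect the results by key."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order; dict() pairs each key with its row.
        return dict(pool.map(fetch_one, keys))


if __name__ == "__main__":
    print(fetch_many(["a", "c", "missing"]))
```

The design choice here is that each key becomes an independent request, so in a real cluster each one could be routed to a replica that owns that key rather than funnelled through one coordinator.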

Re: Cassandra Needs to Grow Up by Version Five!

2018-02-21 Thread Oleksandr Shulgin
On Mon, Feb 19, 2018 at 10:01 AM, Kenneth Brotman < kenbrot...@yahoo.com.invalid> wrote: > > >> Cluster wide management should be a big theme in any next major release. > >> > >Na. Stability and testing should be a big theme in the next major release. > > > > Double Na on that one Jeff. I think

Re: Cassandra Needs to Grow Up by Version Five!

2018-02-21 Thread Ben Slater
I’ve been biting my tongue because I don’t normally like to directly plug our service on the mailing list but if you’re going to compare Cassandra to a fully managed service from Microsoft then you really should check out Instaclustr (www.instaclustr.com) and you’ll find that we take care of many

Re: LEAK DETECTED while minor compaction

2018-02-21 Thread Дарья Меленцова
Bloom filter settings have not changed, they are default. In the table settings bloom_filter_fp_chance = 0.01. Should I increase it? DESC TABLE "PerBoxEventSeriesEventIds" CREATE TABLE "EventsKeyspace"."PerBoxEventSeriesEventIds" ( key blob, column1 text, value blob, PRIMARY KEY
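
[Editor's sketch] For the "Should I increase it?" question, the trade-off behind bloom_filter_fp_chance can be estimated with the textbook Bloom filter sizing formula, bits per key = -ln(p) / (ln 2)^2. This is a rough sketch under that assumption only: the function names are hypothetical and Cassandra's actual filter sizing may differ from the textbook value. It shows that a higher false-positive chance means a smaller filter, at the cost of more unnecessary sstable reads.

```python
import math


def bits_per_key(fp_chance):
    """Textbook Bloom filter size in bits per stored key for a target
    false-positive probability (assumption: optimal number of hashes)."""
    if not 0.0 < fp_chance < 1.0:
        raise ValueError("fp_chance must be strictly between 0 and 1")
    return -math.log(fp_chance) / (math.log(2) ** 2)


def filter_size_mib(num_keys, fp_chance):
    """Approximate filter size in MiB for num_keys keys."""
    return num_keys * bits_per_key(fp_chance) / 8 / (1024 ** 2)


if __name__ == "__main__":
    for p in (0.01, 0.1):
        print(p, round(bits_per_key(p), 2), round(filter_size_mib(10**9, p), 1))
```

Under this formula, moving from 0.01 to 0.1 roughly halves the bits needed per key.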

Re: Cassandra Needs to Grow Up by Version Five!

2018-02-21 Thread DuyHai Doan
For UI and interactive data exploration there is already the Cassandra interpreter for Apache Zeppelin that is more than decent for the job On Wed, Feb 21, 2018 at 9:19 AM, Daniel Hölbling-Inzko < daniel.hoelbling-in...@bitmovin.com> wrote: > But what does this video really show? That Microsoft

Re: Cassandra Needs to Grow Up by Version Five!

2018-02-21 Thread Daniel Hölbling-Inzko
But what does this video really show? That Microsoft managed to run Cassandra as a SaaS product with a nice UI? Google did that years ago with BigTable and Amazon with DynamoDB. I agree that we need more tools, but not so much for querying (although that would also help a bit), but just in general