One time major deletion/purge vs periodic deletion

2018-03-05 Thread Charulata Sharma (charshar)
Hi, Wanted the community’s feedback on deciding the schedule of Archive and Purge job. Is it better to Purge a large volume of data at regular intervals (like run A jobs once in 3 months ) or purge smaller amounts more frequently (run the job weekly??) Some estimates on the number of

Re: Cassandra Daemon not coming up

2018-03-05 Thread mahesh rajamani
I did not add any user and disk space was fine. On Tue, Feb 27, 2018, 11:33 Rahul Singh wrote: > Were there any changes to the system such as permissions, etc. Did you add > users / change auth scheme? > > On Feb 27, 2018, 10:27 AM -0600, ZAIDI, ASAD A

paging through cql query on django

2018-03-05 Thread Daniel Santos
I have two queries: one that gives me the first page from a Cassandra table, and another that retrieves the successive pages. The first one is like: select * from images_by_user where token(iduser) = token(5) limit 10 allow filtering; The successive ones are: select * from

Rocksandra blog post

2018-03-05 Thread Dikang Gu
As some of you already know, the Instagram Cassandra team is working on a project to use RocksDB as Cassandra's storage engine. Today, we published a blog post about the work we have done and, more excitingly, we published the benchmark metrics in an AWS environment. Check it out here:

Re: Read latency

2018-03-05 Thread D. Salvatore
Hi Jeff, Thank you very much for your response. Your considerations are definitely right but, at this point, I just want to consider the Cassandra response time on different Azure VM sizes. Yes, the YCSB GC can impact it, but the total time that YCSB spent with the GC is ~ 3% of the total

Re: system.size_estimates - safe to remove sstables?

2018-03-05 Thread Chris Lohfink
Any chance space used by snapshots? What files exist there that are taking up space? > On Mar 5, 2018, at 1:02 AM, Kunal Gangakhedkar > wrote: > > Hi all, > > I have a 2-node cluster running cassandra 2.1.18. > One of the nodes has run out of disk space and died -

Re: system.size_estimates - safe to remove sstables?

2018-03-05 Thread Chris Lohfink
Unless you are using Spark or Hadoop, nothing consumes the data in that table (unless you have tooling that may use it, like OpsCenter or something), so you're safe to just truncate it or rm the sstables while the instance is offline. If you do use that table, you can then do a `nodetool

Re: Seed nodes of DC2 creating own versions of system keyspaces

2018-03-05 Thread Jeff Jirsa
> On Mar 5, 2018, at 6:40 AM, Oleksandr Shulgin > wrote: > > Hi, > > We were deploying a second DC today with 3 seed nodes (30 nodes in total) and > we have noticed that all seed nodes reported the following: > > INFO 10:20:50 Create new Keyspace:

Re: Read latency

2018-03-05 Thread Jeff Jirsa
> On Mar 5, 2018, at 6:52 AM, D. Salvatore wrote: > > Hello everyone, > I am benchmarking a Cassandra installation on Azure composed of 4 nodes > (Standard_D2S_V3 - 2vCPU and 8GB ram) with a replication factor of 2. Bit smaller than most people would want to run in

Read latency

2018-03-05 Thread D. Salvatore
Hello everyone, I am benchmarking a Cassandra installation on Azure composed of 4 nodes (Standard_D2S_V3 - 2vCPU and 8GB ram) with a replication factor of 2. To benchmark this testbed, I am using a single YCSB instance with the workload C (100% read request), a Consistency level ONE and only 10

Seed nodes of DC2 creating own versions of system keyspaces

2018-03-05 Thread Oleksandr Shulgin
Hi, We were deploying a second DC today with 3 seed nodes (30 nodes in total) and we have noticed that all seed nodes reported the following: INFO 10:20:50 Create new Keyspace: KeyspaceMetadata{name=system_traces, params=KeyspaceParams{durable_writes=true,

Re: How do counter updates work?

2018-03-05 Thread Hannu Kröger
So just to clarify we have two different use cases: - TIMEUUID is there for client side generation of unique row ids. It’s great for that. - Cassandra counters are not very good for row id generation and suited better to e.g. those use cases I listed before Hannu > On 5 Mar 2018, at 16:34,

Re: How do counter updates work?

2018-03-05 Thread Javier Pareja
Doesn't cassandra have TIMEUUID for these use cases? Anyways, hopefully someone can help me better understand possible delays when writing a counter. F Javier Pareja On Mon, Mar 5, 2018 at 1:54 PM, Hannu Kröger wrote: > Traditionally auto increment counters have been used

Re: How do counter updates work?

2018-03-05 Thread Hannu Kröger
Traditionally, auto-increment counters have been used to generate SQL row IDs. This is probably what Kyrylo is referring to here. Cassandra counters are better for tracking e.g. usage patterns, web site visitors, statistics, etc. For accurate counting (e.g. for generating IDs) those counters are

Re: How do counter updates work?

2018-03-05 Thread Javier Pareja
Hi Kyrylo, I don't understand how UUIDs are related to counters, but I use counters to increment the value of a cell in an atomic manner. I could try reading the value and then writing to the cell but then I would lose the atomicity of the update. F Javier Pareja On Mon, Mar 5, 2018 at 1:32 PM,

Re: How do counter updates work?

2018-03-05 Thread Kyrylo Lebediev
Hello! Can't answer your question but there is another one: "why do we need to maintain counters with their known limitations (and I've heard of some issues with implementation of counters in Cassandra), when there exist really effective uuid generation algorithms which allow us to generate

Re: [External] Re: Whch version is the best version to run now?

2018-03-05 Thread Tom van der Woerdt
We run on the order of a thousand Cassandra nodes in production. Most of that is 3.0.16, but new clusters are defaulting to 3.11.2 and some older clusters have been upgraded to it as well. All of the bugs I encountered in 3.11.x were also seen in 3.0.x, but 3.11.x seems to get more love from the

How do counter updates work?

2018-03-05 Thread Javier Pareja
Hello everyone, I am trying to understand how Cassandra counter writes work in more detail but all that I could find is this: https://www.datastax.com/dev/blog/whats-new-in-cassandra-2-1-a-better-implementation-of-counters From there I was able to extract the following process: (click here to

Re: vnodes: high availability

2018-03-05 Thread Kyrylo Lebediev
What's the reason behind this negative effect of dynamic_snitch being enabled? Is this true for all C* versions for which this feature is implemented? Is that because node latencies change too dynamically/sporadically while values in dynamic_snitch tune slower 'than required' and can't keep up with