When is anti-entropy repair required?

2017-03-27 Thread eugene miretsky
Hi, Trying to get some clarifications on this post: https://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsRepairNodesWhen.html As far as I understand it, repairs are needed to account for the fact that nodes can go down (for a short or long period of time). The 2 main reasons for repairing

Issues while using TWCS compaction and Bulkloader

2017-03-27 Thread eugene miretsky
Hi, We have a Cassandra 3.0.8 cluster, and we use the Bulkloader to upload time series data nightly. The data has a 3day TTL, and the compaction window unit is 1h. Generally the data fits into memory, all reads are served
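To make the numbers in this setup concrete, here is a toy model of time-window bucketing with the values from the post (1h windows, 3-day TTL). It is a sketch, not Cassandra's code; the assumption that a whole SSTable is bucketed by its newest timestamp is our reading of TWCS and is labeled as such in the code.

```python
from datetime import datetime, timedelta, timezone

# Values from the post: 1-hour compaction windows and a 3-day TTL.
WINDOW = timedelta(hours=1)
TTL = timedelta(days=3)
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def window_start(ts: datetime, window: timedelta = WINDOW) -> datetime:
    """Floor a timezone-aware timestamp to the start of its compaction window."""
    return EPOCH + ((ts - EPOCH) // window) * window


def windows_holding_live_data(ttl: timedelta = TTL, window: timedelta = WINDOW) -> int:
    """How many full windows fit inside the TTL, i.e. roughly how many hold unexpired data."""
    return ttl // window


def sstable_window(cell_timestamps: list[datetime], window: timedelta = WINDOW) -> datetime:
    """ASSUMPTION: the whole SSTable is assigned to the window of its newest timestamp."""
    return window_start(max(cell_timestamps), window)
```

Under that assumption, a nightly bulk-loaded SSTable whose cells span a full day would be assigned to a single 1h window (the last one), which is one way windows can end up overlapping in time.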

Why are automatic anti-entropy repairs required when hinted hand-off is enabled?

2017-04-06 Thread eugene miretsky
Hi, As I see it, if hinted handoff is enabled, the only time data can be inconsistent is when: 1. A node is down for longer than the max_hint_window 2. The coordinator node crashes before all the hints have been replayed Why is it still recommended to perform frequent automatic repairs,
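The two cases listed in the post can be written down as a small predicate. This toy model encodes only those two cases and nothing else; the 3-hour constant is the default for `max_hint_window_in_ms` in cassandra.yaml, and the `Outage` type is hypothetical.

```python
from dataclasses import dataclass

# Default max_hint_window_in_ms in cassandra.yaml (3 hours).
MAX_HINT_WINDOW_MS = 3 * 60 * 60 * 1000


@dataclass
class Outage:
    """Hypothetical description of one replica outage."""
    down_ms: int               # how long the replica was unreachable
    coordinator_crashed: bool  # coordinator died before replaying its stored hints


def hints_cover(outage: Outage, max_hint_window_ms: int = MAX_HINT_WINDOW_MS) -> bool:
    """True if, per the two cases in the post, hints alone would restore consistency."""
    within_window = outage.down_ms <= max_hint_window_ms
    return within_window and not outage.coordinator_crashed
```

Any outage for which this returns `False` leaves the replica missing writes until something else (read repair or anti-entropy repair) fixes it.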

Re: Why are automatic anti-entropy repairs required when hinted hand-off is enabled?

2017-04-20 Thread eugene miretsky
www.pythian.com/blog/effective-anti-entropy-repair-cassandra/ https://docs.datastax.com/en/cassandra/2.1/cassandra/tools/toolsRepair.html https://www.datastax.com/dev/blog/repair-in-cassandra From: eugene

Downside to running multiple nodetool repairs at the same time?

2017-04-20 Thread eugene miretsky
In Cassandra 3.0 the default nodetool repair behaviour is incremental and parallel. Is there a downside to triggering repair from multiple nodes at the same time? Basically, instead of scheduling a cron job on one node to run repair, I want to schedule the job on every node (this way, I don't
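One common alternative to firing repairs on all nodes at once is to give every node its own cron slot. The sketch below assumes each node repairs only its primary range (`nodetool repair -pr`), so the ring is covered once per day, and that each run finishes within its slot; neither assumption comes from the thread, and the node names are hypothetical.

```python
def stagger(nodes: list[str]) -> dict[str, tuple[int, int]]:
    """Give each node its own (hour, minute) daily start slot so runs don't overlap."""
    slot_minutes = 24 * 60 // len(nodes)
    schedule = {}
    for i, node in enumerate(sorted(nodes)):
        start = i * slot_minutes
        schedule[node] = (start // 60, start % 60)
    return schedule


def cron_line(hour: int, minute: int) -> str:
    """Daily crontab entry (minute hour day month weekday) for a primary-range repair."""
    return f"{minute} {hour} * * * nodetool repair -pr"
```

With three nodes this yields 8-hour slots; whether a slot is long enough depends entirely on data volume and has to be checked per cluster.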

How to stress test collections in Cassandra Stress

2017-04-13 Thread eugene miretsky
Hi, I'm trying to do a stress test on a table with a collection column, but cannot figure out how to do that. I tried table_definition: | CREATE TABLE list ( customer_id bigint, items list, PRIMARY KEY (customer_id)); columnspec: - name: customer_id size: fixed(64)

Re: Downside to running multiple nodetool repairs at the same time?

2017-04-21 Thread eugene miretsky
On 21 Apr 2017, at 00:57, eugene miretsky <eugene.miret...@gmail.com> wrote: In Cassandra 3.0 the default nodetool repair behaviour is incremental and parallel. Is there a downside to triggering repair from multiple nodes at the same time?

DataStax Spark driver performance for analytics workload

2017-10-06 Thread eugene miretsky
Hello, When doing analytics in Spark, a common pattern is to load either the whole table into memory or filter on some columns. This is a good pattern for column-oriented files (Parquet) but seems to be a huge anti-pattern in C*. Most common spark operations will result in one of (a) query
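A full-table load from C* is typically parallelized by splitting the token ring and issuing one range query per split, rather than by scanning column files. The sketch below assumes the Murmur3Partitioner token range (-2^63 to 2^63 - 1); the table and key names in the usage are hypothetical, and this is not the Spark connector's own code.

```python
MIN_TOKEN = -(2**63)
MAX_TOKEN = 2**63 - 1


def split_token_ring(n: int) -> list[tuple[int, int]]:
    """Split the Murmur3 token range into n contiguous (start, end] ranges."""
    span = MAX_TOKEN - MIN_TOKEN
    bounds = [MIN_TOKEN + span * i // n for i in range(n + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


def range_query(table: str, pk: str, start: int, end: int) -> str:
    """CQL for one split: every partition whose token falls in (start, end]."""
    return f"SELECT * FROM {table} WHERE token({pk}) > {start} AND token({pk}) <= {end}"
```

Each split still reads whole partitions, so a filter on a non-key column only helps after the data has left C*, which is the cost the post is pointing at.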

Re: How do TTLs generate tombstones

2017-10-05 Thread eugene miretsky
60 seconds later, the live nodes will see that data as deleted, but when that dead node comes back to life, it needs to learn of the deletion. On Wed, Oct 4, 2017 at 2:05 PM, eugene miretsky <eugene.miret...@gmail.com> wrote: Hello, The

What is performance gain of clustering columns

2017-10-03 Thread eugene miretsky
Hi, Clustering columns are used to order the data in a partition. However, since data is split into SSTables, the rows are ordered by clustering key only within each SSTable. Cassandra still needs to check all SSTables, and merge the data if it is found in several SSTables. The only scenario
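Even when a partition is spread over several SSTables, the per-SSTable ordering still pays off: the read can stream-merge already-sorted runs instead of re-sorting. This is a toy model with hypothetical `(clustering_key, write_timestamp, value)` rows; real reads also reconcile per cell and apply tombstones, which it ignores.

```python
import heapq
from itertools import groupby


def read_partition(sstables):
    """Merge per-SSTable rows (each list sorted by clustering key) into one ordered result.

    For a clustering key present in several SSTables, the highest write timestamp wins.
    """
    merged = heapq.merge(*sstables, key=lambda row: row[0])  # linear merge of sorted runs
    result = []
    for ck, versions in groupby(merged, key=lambda row: row[0]):
        newest = max(versions, key=lambda row: row[1])  # last write wins
        result.append((ck, newest[2]))
    return result
```

Because every input is already sorted, the merge touches each row once and the output comes back in clustering order.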

Re: How do TTLs generate tombstones

2017-10-09 Thread eugene miretsky
No, it's never safe to set it to 0, as you'll disable hinted handoff for the table. If you are never doing updates and manual deletes and you always insert with a TTL, you can get away with setting it to the hinted handoff period. On 6 Oct. 2

How do TTLs generate tombstones

2017-10-04 Thread eugene miretsky
Hello, The following link says that TTLs generate tombstones - https://docs.datastax.com/en/cql/3.3/cql/cql_using/useExpire.html. What exactly is the process that converts the TTL into a tombstone? 1. Is an actual new tombstone cell created when the TTL expires? 2. Or, is the TTLed cell
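Our understanding of the lifecycle behind this question, consistent with the replies in this thread: once its TTL passes, a cell is treated as deleted, and like a tombstone it can only be dropped by compaction after `gc_grace_seconds`, so a replica that was down can still learn of the deletion. The sketch is a toy model with times in seconds; 864000 (10 days) is the default `gc_grace_seconds`, and it ignores the other conditions real compaction checks before purging.

```python
from enum import Enum


class CellState(Enum):
    LIVE = "live"
    EXPIRED = "expired (treated as a tombstone)"
    PURGEABLE = "purgeable by compaction"


def cell_state(write_time: int, ttl: int, now: int, gc_grace_seconds: int = 864000) -> CellState:
    """Classify a TTLed cell at time `now` (all values in seconds)."""
    expires_at = write_time + ttl
    if now < expires_at:
        return CellState.LIVE
    if now <= expires_at + gc_grace_seconds:
        return CellState.EXPIRED  # still kept so the deletion can reach lagging replicas
    return CellState.PURGEABLE
```

This is also why the reply above ties a safe `gc_grace_seconds` to the hinted handoff period: the expired marker must outlive the window in which a missed write could still arrive.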

Re: CQL Map vs clustering keys

2017-11-16 Thread eugene miretsky
clustering keys is performance and you can use WAY more K/V pairs. Jon On Nov 15, 2017, at 8:12 AM, eugene miretsky <eugene.miret...@gmail.com> wrote: Hi, What would be the tradeoffs between using 1) Map ( id U

CQL Map vs clustering keys

2017-11-15 Thread eugene miretsky
Hi, What would be the tradeoffs between using 1) Map ( id UUID PRIMARY KEY, myMap map ); 2) Clustering key ( id UUID, key int, val text, PRIMARY KEY (id, key) ); My understanding is that maps are stored very similarly to clustering columns, where the map key

Re: How do TTLs generate tombstones

2017-10-31 Thread eugene miretsky
resurrection as long as all the writes carry TTL with them. We faced similar overlapping issues with TWCS (it was due to dclocal_read_repair_chance); we developed an SSTable tool that would give topN or bottomN keys in an SSTable based on writeti
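The tool mentioned here is not shown in the thread. As a hypothetical sketch of the idea only, assuming `(partition_key, writetime)` pairs have already been extracted from one SSTable, the newest and oldest keys can be found with a bounded heap:

```python
import heapq
from typing import Sequence


def top_and_bottom_by_writetime(rows: Sequence[tuple[str, int]], n: int):
    """Return the n newest and n oldest (key, writetime) pairs from one SSTable's rows."""
    newest = heapq.nlargest(n, rows, key=lambda r: r[1])
    oldest = heapq.nsmallest(n, rows, key=lambda r: r[1])
    return newest, oldest
```

A wide gap between the oldest and newest writetimes in an SSTable that TWCS placed in a single window is the kind of overlap such a report would surface.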