sharding vs what cassandra does

2015-01-19 Thread Adaryl Bob Wakefield, MBA
It’s my understanding that the way Cassandra replicates data across nodes is NOT sharding. Can someone provide a better explanation or correct my understanding? B.

Re: How do replica become out of sync

2015-01-19 Thread Flavien Charlon
Thanks Andi. The reason I was asking is that even though my nodes have been 100% available and no write has been rejected, when running an incremental repair the logs still indicate that some ranges are out of sync (which then results in large amounts of compaction). How can this be possible? I

RE: How do replica become out of sync

2015-01-19 Thread Andreas Finke
Hi, right, QUORUM means that data is written to all replicas but the coordinator waits for QUORUM responses before returning to the client. If a replica is out of sync due to a network or internal issue, then consistency is ensured through: - HintedHandoff (Automatically
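As a minimal cqlsh sketch (the keyspace and table names are hypothetical): the coordinator still sends the write to every replica; QUORUM only controls how many acknowledgements it waits for before replying to the client.

  CONSISTENCY QUORUM;
  -- Sent to all replicas; the coordinator returns once a majority acknowledge.
  INSERT INTO ks.events (event_id, payload) VALUES (uuid(), 'example');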

Re: “Not enough replica available” when consistency is ONE?

2015-01-19 Thread Sylvain Lebresne
On Mon, Jan 19, 2015 at 2:29 AM, Kevin Burton bur...@spinn3r.com wrote: So ConsistencyLevel.ONE and if not exists are essentially mutually incompatible and shouldn’t the driver throw an exception if the user requests this configuration? The subtlety is that this consistency level (CL.ONE in

Cassandra fetches complete partition

2015-01-19 Thread nitin padalia
Hi, does Cassandra fetch the complete partition if I include the clustering key in the WHERE clause? What is the difference between: 1. SELECT * FROM column_family WHERE partition_key = 'somekey' LIMIT 1; 2. SELECT * FROM column_family WHERE partition_key = 'somekey' AND clustering_key = 'some_clustering_key';
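As an illustration with hypothetical names: query 1 returns the first row of the partition (in clustering order, because of LIMIT 1), while query 2 asks for the single row with the given clustering key.

  CREATE TABLE ks.events (
    partition_key text,
    clustering_key text,
    value text,
    PRIMARY KEY (partition_key, clustering_key)
  );
  SELECT * FROM ks.events WHERE partition_key = 'somekey' LIMIT 1;
  SELECT * FROM ks.events WHERE partition_key = 'somekey' AND clustering_key = 'some_clustering_key';

Whether the whole partition is read to serve these is version dependent; see the row-cache discussion later in this digest.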

RE: sharding vs what cassandra does

2015-01-19 Thread Mohammed Guller
Partitioning is similar to sharding. Mohammed

Re: sharding vs what cassandra does

2015-01-19 Thread Nagesh
Sharding is a type of database partitioning. The sweet spot of Cassandra is supporting fast random reads. This is achieved by grouping data based on a partition key and replicating it to different nodes. Queries should be written so that they look up data from one partition at a time. Grouping data

RE: sharding vs what cassandra does

2015-01-19 Thread Job Thomas
Hi, if we think of it from the perspective of a column family (table), its rows are split across different nodes (sharding) based on the ring concept in Cassandra. But the core unit of data storage (a row) is not split across nodes; only copies of it are maintained on different nodes. All columns associated with a single
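As a rough illustration of the ring concept (keyspace and table names are hypothetical), the token() function exposes the partitioner hash of the partition key, which is what determines the nodes that own a given row; each row is stored whole on its replica nodes.

  SELECT token(user_id), user_id FROM ks.users LIMIT 3;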

How do replica become out of sync

2015-01-19 Thread Flavien Charlon
Hi, When writing to Cassandra using CL = QUORUM (or anything less than ALL), is it correct to say that Cassandra tries to write to all the replicas, but only waits for a quorum? If so, what can cause some replicas to become out of sync when they're all online? Thanks Flavien

Re: Compaction failing to trigger

2015-01-19 Thread Flavien Charlon
Thanks Roland. Good to know, I will try that. Do you know the JIRA ticket number of that bug? Thanks, Flavien On 19 January 2015 at 06:15, Roland Etzenhammer r.etzenham...@t-online.de wrote: Hi Flavien, I hit some problems with minor compactions recently (just some days ago) - but with many

Re: number of replicas per data center?

2015-01-19 Thread Laing, Michael
Since our workload is spread globally, we spread our nodes across AWS regions as well: 2 nodes per zone, 6 nodes per region (datacenter) (RF 3), 12 nodes total (except during upgrade migrations). We autodeploy into VPCs. If a region goes bad we can route all traffic to another and bring up a
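A hedged sketch of such a keyspace, with three replicas in each of two datacenters (the keyspace and datacenter names are hypothetical and must match those reported by the snitch):

  CREATE KEYSPACE app WITH replication =
    {'class': 'NetworkTopologyStrategy', 'us-east': 3, 'eu-west': 3};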

Re: Cassandra fetches complete partition

2015-01-19 Thread nitin padalia
e.g. CREATE TABLE usertable_cache ( user_id uuid, dept_id uuid, location_id text, locationmap_id uuid, PRIMARY KEY ((user_id, dept_id), location_id) ) WITH bloom_filter_fp_chance=0.01 AND caching='{keys:ALL, rows_per_partition:1000}' AND comment='' AND

Re: Cassandra fetches complete partition

2015-01-19 Thread nitin padalia
My question is specifically about the row cache. In Cassandra 2.1.2, when I populate a column family with 1000 rows in a partition and the rows_per_partition setting is 1000 for the column family, then for the first and last row it reports a cache miss if I mention a specific row key in the query. If I increase

Re: number of replicas per data center?

2015-01-19 Thread Eric Stevens
Ah.. six replicas. At least it's super inexpensive that way (sarcasm!) Well, it's up to you to decide what your data locality and fault tolerance requirements are. If you want to run two DCs, costs are going to increase since each DC has a full set of replicas within itself. But you get the

Nodetool removenode stuck

2015-01-19 Thread Artur Kronenberg
Hi, we have had an issue with one of our nodes today: 1. Due to a wrong setup, the starting node failed to bootstrap properly. It was shown as UN in the cluster, however it did not contain any data, and we shut it down to fix our configuration issue. 2. We figured we needed to remove the node from

Re: “Not enough replica available” when consistency is ONE?

2015-01-19 Thread Panagiotis Garefalakis
Hello, my feeling is you got the whole CAS operations concept wrong. CAS operations, a.k.a. lightweight transactions, are not meant to be used everywhere, only in specific parts of your application where serialisable consistency is necessary. For any other case there is a variety of consistency levels
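A minimal sketch of the distinction, with hypothetical names: the IF NOT EXISTS clause always runs a Paxos round, independent of the regular consistency level set for the write.

  CONSISTENCY ONE;
  -- The conditional write below still needs a quorum of replicas reachable for Paxos,
  -- which is why it can fail with "not enough replica available" even at CL.ONE.
  INSERT INTO ks.users (user_id, email) VALUES (uuid(), 'user@example.com') IF NOT EXISTS;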

Re: Cassandra fetches complete partition

2015-01-19 Thread Eric Stevens
It depends on your version of Cassandra. I would suggest starting with this, which describes the differences between 2.0 and 2.1: http://www.datastax.com/dev/blog/row-caching-in-cassandra-2-1 In particular: In previous releases, this cache has required storing the entire partition in memory,
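For reference, a hedged sketch of the 2.1-style per-table setting discussed in this thread (reusing the usertable_cache table from the earlier message); rows_per_partition keeps only the first N rows of each partition, in clustering order, in the row cache:

  ALTER TABLE usertable_cache
    WITH caching = '{"keys":"ALL", "rows_per_partition":"100"}';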

Re: Nodetool removenode stuck

2015-01-19 Thread Eric Stevens
I've seen removenode hang indefinitely also (per CASSANDRA-6542). Generally speaking, if a node is in good health and you want to take it out of the cluster for whatever reason (including the one you mentioned), nodetool decommission is a better choice. Removenode is for when a node is