Re: Replication factor, LOCAL_QUORUM write consistency and materialized views

2024-05-17 Thread Gábor Auth
Hi,

On Fri, May 17, 2024 at 6:18 PM Jon Haddad  wrote:

> I strongly suggest you don't use materialized views at all.  There are
> edge cases that in my opinion make them unsuitable for production, both in
> terms of cluster stability as well as data integrity.
>

Oh, there is already an open and recent Jira ticket about it:
https://issues.apache.org/jira/browse/CASSANDRA-19383

Bye,
Gábor AUTH


Re: Replication factor, LOCAL_QUORUM write consistency and materialized views

2024-05-17 Thread Gábor Auth
Hi,

On Fri, May 17, 2024 at 6:18 PM Jon Haddad  wrote:

> I strongly suggest you don't use materialized views at all.  There are
> edge cases that in my opinion make them unsuitable for production, both in
> terms of cluster stability as well as data integrity.
>

I totally agree with you about it. But it looks like a strange and
interesting issue... the affected table has only ~1300 rows and less than
200 kB of data. :)

Also, I found the same issue reported here:
https://dba.stackexchange.com/questions/325140/single-node-failure-in-cassandra-4-0-7-causes-cluster-to-run-into-high-cpu

Bye,
Gábor AUTH


> On Fri, May 17, 2024 at 8:58 AM Gábor Auth  wrote:
>
>> Hi,
>>
>> I know, I know, the materialized view is experimental... :)
>>
>> So, I ran into a strange error. Among others, I have a very small 4-node
>> cluster with very minimal data (~100 MB in total), the keyspace's
>> replication factor is 3, and everything works fine... except: if I restart a
>> node, I get a lot of errors about materialized views and consistency level
>> ONE, but only for those tables that have more than one materialized view.
>>
>> Tables without a materialized view don't show the problem; they work fine.
>> Tables with only one materialized view also work fine.
>> But with a table that has more than one materialized view, whoops, the
>> cluster crashes temporarily, and I can also see on the calling side (Java
>> backend) that no nodes are responding:
>>
>> Caused by: com.datastax.driver.core.exceptions.WriteFailureException:
>> Cassandra failure during write query at consistency LOCAL_QUORUM (2
>> responses were required but only 1 replica responded, 2 failed)
>>
>> I am surprised by this behavior, because there is so little data
>> involved, and it only occurs when there is more than one materialized
>> view, so it might be a concurrency issue under the hood.
>>
>> Have you seen an issue like this?
>>
>> Here is a stack trace from the Cassandra side:
>>
>> [cassandra-dc03-1] ERROR [MutationStage-1] 2024-05-17 08:51:47,425
>> Keyspace.java:652 - Unknown exception caught while attempting to update
>> MaterializedView! pope.unit
>> [cassandra-dc03-1] org.apache.cassandra.exceptions.UnavailableException:
>> Cannot achieve consistency level ONE
>> [cassandra-dc03-1]  at
>> org.apache.cassandra.exceptions.UnavailableException.create(UnavailableException.java:37)
>> [cassandra-dc03-1]  at
>> org.apache.cassandra.locator.ReplicaPlans.assureSufficientLiveReplicas(ReplicaPlans.java:170)
>> [cassandra-dc03-1]  at
>> org.apache.cassandra.locator.ReplicaPlans.assureSufficientLiveReplicasForWrite(ReplicaPlans.java:113)
>> [cassandra-dc03-1]  at
>> org.apache.cassandra.locator.ReplicaPlans.forWrite(ReplicaPlans.java:354)
>> [cassandra-dc03-1]  at
>> org.apache.cassandra.locator.ReplicaPlans.forWrite(ReplicaPlans.java:345)
>> [cassandra-dc03-1]  at
>> org.apache.cassandra.locator.ReplicaPlans.forWrite(ReplicaPlans.java:339)
>> [cassandra-dc03-1]  at
>> org.apache.cassandra.service.StorageProxy.wrapViewBatchResponseHandler(StorageProxy.java:1312)
>> [cassandra-dc03-1]  at
>> org.apache.cassandra.service.StorageProxy.mutateMV(StorageProxy.java:1004)
>> [cassandra-dc03-1]  at
>> org.apache.cassandra.db.view.TableViews.pushViewReplicaUpdates(TableViews.java:167)
>> [cassandra-dc03-1]  at
>> org.apache.cassandra.db.Keyspace.applyInternal(Keyspace.java:647)
>> [cassandra-dc03-1]  at
>> org.apache.cassandra.db.Keyspace.applyFuture(Keyspace.java:477)
>> [cassandra-dc03-1]  at
>> org.apache.cassandra.db.Mutation.applyFuture(Mutation.java:210)
>> [cassandra-dc03-1]  at
>> org.apache.cassandra.db.MutationVerbHandler.doVerb(MutationVerbHandler.java:58)
>> [cassandra-dc03-1]  at
>> org.apache.cassandra.net.InboundSink.lambda$new$0(InboundSink.java:78)
>> [cassandra-dc03-1]  at
>> org.apache.cassandra.net.InboundSink.accept(InboundSink.java:97)
>> [cassandra-dc03-1]  at
>> org.apache.cassandra.net.InboundSink.accept(InboundSink.java:45)
>> [cassandra-dc03-1]  at
>> org.apache.cassandra.net.InboundMessageHandler$ProcessMessage.run(InboundMessageHandler.java:432)
>> [cassandra-dc03-1]  at
>> java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown
>> Source)
>> [cassandra-dc03-1]  at
>> org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:165)
>> [cassandra-dc03-1]  at
>> org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:137)
>> [cassandra-dc03-1]  at
>> org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:119)
>> [cassandra-dc03-1]  at
>> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>> [cassandra-dc03-1]  at java.base/java.lang.Thread.run(Unknown Source)
>>
>> --
>> Bye,
>> Gábor AUTH
>>
>
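
A minimal sketch (not from the thread) of how this failure surfaces in the Java
backend, using the DataStax Java driver 3.x that the quoted WriteFailureException
comes from. The contact point and the columns of pope.unit are made up for
illustration:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;
import com.datastax.driver.core.exceptions.WriteFailureException;

public class MvWriteExample {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect("pope")) {
            // Hypothetical columns; the base-table write itself is done at LOCAL_QUORUM.
            SimpleStatement insert = new SimpleStatement(
                    "INSERT INTO unit (id, name) VALUES (uuid(), 'example')");
            insert.setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM);
            try {
                session.execute(insert);
            } catch (WriteFailureException e) {
                // Matches the quoted error: 2 responses required, 1 received, 2 failed.
                // The "Cannot achieve consistency level ONE" lines are server-side,
                // raised while the replicas push the paired materialized-view updates.
                System.err.printf("write failed: received=%d required=%d failures=%d%n",
                        e.getReceivedAcknowledgements(),
                        e.getRequiredAcknowledgements(),
                        e.getFailures());
            }
        }
    }
}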


Re: Replication factor, LOCAL_QUORUM write consistency and materialized views

2024-05-17 Thread Jon Haddad
I strongly suggest you don't use materialized views at all.  There are edge
cases that in my opinion make them unsuitable for production, both in terms
of cluster stability as well as data integrity.

Jon



Re: Replication Factor Change

2015-11-05 Thread Yulian Oifa
Hello
OK, I got it, so I should set CL to ALL for reads, otherwise data may be
retrieved from nodes that do not yet have the current record.
Thanks for the help.
Yulian Oifa

On Thu, Nov 5, 2015 at 5:33 PM, Eric Stevens  wrote:

> If you switch reads to CL=ALL, you should be able to increase RF,
> then run repair, and after repair is complete, go back to your old
> consistency level.  However, while you're operating at ALL consistency, you
> have no tolerance for a node failure (but at RF=1 you already have no
> tolerance for a node failure, so that doesn't really change your
> availability model).
>
> On Thu, Nov 5, 2015 at 8:01 AM Yulian Oifa  wrote:
>
>> Hello to all.
>> I am planning to change replication factor from 1 to 3.
>> Will it cause data read errors while the nodes are being repaired?
>>
>> Best regards
>> Yulian Oifa
>>
>


Re: Replication Factor Change

2015-11-05 Thread Eric Stevens
If you switch reads to CL=ALL, you should be able to increase RF,
then run repair, and after repair is complete, go back to your old
consistency level.  However, while you're operating at ALL consistency, you
have no tolerance for a node failure (but at RF=1 you already have no
tolerance for a node failure, so that doesn't really change your
availability model).

On Thu, Nov 5, 2015 at 8:01 AM Yulian Oifa  wrote:

> Hello to all.
> I am planning to change replication factor from 1 to 3.
> Will it cause data read errors while the nodes are being repaired?
>
> Best regards
> Yulian Oifa
>
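
A sketch of the sequence Eric describes (illustration only; the keyspace name
"ks", datacenter name "dc1", table and key are assumptions, and the driver is
the DataStax Java 3.x one that appears elsewhere in this digest):

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;

public class RaiseReplicationFactor {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect()) {

            // 1) While the change is in flight, read at CL=ALL so the single
            //    pre-existing replica is always consulted.
            SimpleStatement read =
                    new SimpleStatement("SELECT * FROM ks.my_table WHERE id = 1");
            read.setConsistencyLevel(ConsistencyLevel.ALL);
            session.execute(read);

            // 2) Raise the replication factor (per datacenter with NetworkTopologyStrategy).
            session.execute("ALTER KEYSPACE ks WITH replication = "
                    + "{'class': 'NetworkTopologyStrategy', 'dc1': 3}");

            // 3) Run "nodetool repair -full ks" on every node (outside this program)
            //    so the new replicas actually receive the data.

            // 4) Only after repair completes, move reads back to the normal
            //    consistency level (e.g. LOCAL_QUORUM or ONE).
        }
    }
}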


RE: Replication Factor Change

2015-11-05 Thread aeljami.ext
Hello,

If the current CL = ONE, be careful in production at the time of the replication
factor change: any of the 3 replicas may be queried while the data is still being
replicated ==> so data read errors!
From: Yulian Oifa [mailto:oifa.yul...@gmail.com]
Sent: Thursday, 5 November 2015 16:02
To: user@cassandra.apache.org
Subject: Replication Factor Change

Hello to all.
I am planning to change replication factor from 1 to 3.
Will it cause data read errors while the nodes are being repaired?
Best regards
Yulian Oifa




Re: Re : Replication factor for system_auth keyspace

2015-10-16 Thread Victor Chen
To elaborate on what Robert said, I think with most things technology
related, the answer with these sorts of questions (i.e. "ideal settings")
is usually "it depends." Remember that technology is a tool that we use to
accomplish something we want. It's just a mechanism that we as humans use
to exert our wishes on other things. In this case, cassandra allows us to
exert our wishes on the data we need to have available. So think for a
second about what you want. To be less philosophical and more practical:
how many nodes are you comfortable losing, or likely to lose? How many
copies of your system_auth keyspace do you want to have always available?

Also, what do you mean by "really long?" What version of cassandra are you
using? If you are on 2.1, look at migrating to incremental repair. That it
takes so long for such a small keyspace leads me to believe you're using
sequential repair ...

-V

On Thu, Oct 15, 2015 at 7:46 PM, Robert Coli  wrote:

> On Thu, Oct 15, 2015 at 10:24 AM, sai krishnam raju potturi <
> pskraj...@gmail.com> wrote:
>
>>   we are deploying a new cluster with 2 datacenters, 48 nodes in each DC.
>> For the system_auth keyspace, what should be the ideal replication_factor
>> set?
>>
>> We tried setting the replication factor equal to the number of nodes in a
>> datacenter, and the repair for the system_auth keyspace took really long.
>> Your suggestions would be of great help.
>>
>
> More than 1 and a lot less than 48.
>
> =Rob
>
>


Re: Re : Replication factor for system_auth keyspace

2015-10-16 Thread sai krishnam raju potturi
Thanks guys for the advice. We were running parallel repairs earlier, with
Cassandra version 2.0.14. As pointed out, having set the replication factor
really high for system_auth was causing the repair to take really long.

thanks
Sai

On Fri, Oct 16, 2015 at 9:56 AM, Victor Chen 
wrote:

> To elaborate on what Robert said, I think with most things technology
> related, the answer with these sorts of questions (i.e. "ideal settings")
> is usually "it depends." Remember that technology is a tool that we use to
> accomplish something we want. It's just a mechanism that we as humans use
> to exert our wishes on other things. In this case, cassandra allows us to
> exert our wishes on the data we need to have available. So think for a
> second about what you want? To be less philosophical and more practical,
> how many nodes you are comfortable losing or likely to lose? How many
> copies of your system_auth keyspace do you want to have always available?
>
> Also, what do you mean by "really long?" What version of cassandra are you
> using? If you are on 2.1, look at migrating to incremental repair. That it
> takes so long for such a small keyspace leads me to believe you're using
> sequential repair ...
>
> -V
>
> On Thu, Oct 15, 2015 at 7:46 PM, Robert Coli  wrote:
>
>> On Thu, Oct 15, 2015 at 10:24 AM, sai krishnam raju potturi <
>> pskraj...@gmail.com> wrote:
>>
>>>   we are deploying a new cluster with 2 datacenters, 48 nodes in each
>>> DC. For the system_auth keyspace, what should be the ideal
>>> replication_factor set?
>>>
>>> We tried setting the replication factor equal to the number of nodes in
>>> a datacenter, and the repair for the system_auth keyspace took really long.
>>> Your suggestions would be of great help.
>>>
>>
>> More than 1 and a lot less than 48.
>>
>> =Rob
>>
>>
>


Re: Re : Replication factor for system_auth keyspace

2015-10-15 Thread Robert Coli
On Thu, Oct 15, 2015 at 10:24 AM, sai krishnam raju potturi <
pskraj...@gmail.com> wrote:

>   we are deploying a new cluster with 2 datacenters, 48 nodes in each DC.
> For the system_auth keyspace, what should be the ideal replication_factor
> set?
>
> We tried setting the replication factor equal to the number of nodes in a
> datacenter, and the repair for the system_auth keyspace took really long.
> Your suggestions would be of great help.
>

More than 1 and a lot less than 48.

=Rob


Re : Replication factor for system_auth keyspace

2015-10-15 Thread sai krishnam raju potturi
hi;
  we are deploying a new cluster with 2 datacenters, 48 nodes in each DC.
For the system_auth keyspace, what should the ideal replication_factor be?

We tried setting the replication factor equal to the number of nodes in a
datacenter, and the repair for the system_auth keyspace took really long.
Your suggestions would be of great help.

thanks
Sai
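
A sketch of the kind of setting the replies above converge on (illustration
only; the datacenter names "dc1"/"dc2" and the choice of 3 replicas per DC are
assumptions, not a recommendation from this thread beyond "more than 1 and a
lot less than 48"):

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class SystemAuthReplication {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect()) {
            // A small per-DC replication factor for system_auth instead of RF = node count.
            session.execute("ALTER KEYSPACE system_auth WITH replication = "
                    + "{'class': 'NetworkTopologyStrategy', 'dc1': 3, 'dc2': 3}");
            // Then run "nodetool repair system_auth" on each node so the new
            // replica placement is actually populated.
        }
    }
}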


Re: Replication factor 2 with immutable data

2014-07-25 Thread Robert Coli
On Fri, Jul 25, 2014 at 10:46 AM, Jon Travis jtra...@p00p.org wrote:

 I have a couple questions regarding the availability of my data in a RF=2
 scenario.


You have just explained why consistency does not work with a Replication
Factor of fewer than 3 and a Consistency Level of less than QUORUM.

Basically, with RF=2, QUORUM is the same as ALL, and you can't stay
available through a node failure at ALL, because that's impossible.

=Rob
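
A quick check of the arithmetic (illustration only, not part of Rob's message):
quorum = floor(RF/2) + 1, so with RF=2 quorum equals RF and a single down
replica blocks quorum (and ALL) writes for its ranges.

public class QuorumRf2 {
    public static void main(String[] args) {
        int rf = 2;
        int quorum = rf / 2 + 1; // = 2, i.e. every replica
        System.out.println("RF=" + rf + " quorum=" + quorum
                + " tolerated down replicas=" + (rf - quorum)); // = 0
    }
}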


Re: Replication Factor question

2014-04-17 Thread Robert Coli
On Wed, Apr 16, 2014 at 1:47 AM, Markus Jais markus.j...@yahoo.de wrote:

 thanks. How many nodes do you have running in those 5 racks and RF 5? Only
 5 nodes or more?


While I haven't contemplated it too much, I'd think the absolute minimum
would be RF=N=5, sure. The real minimum with headroom would depend on
workload, but would probably be at least a few nodes greater than 5.

=Rob


Re: Replication Factor question

2014-04-16 Thread Markus Jais
Hi Rob,

thanks. How many nodes do you have running in those 5 racks and RF 5? Only 5
nodes or more?

Markus

Robert Coli rc...@eventbrite.com wrote on Tuesday, 15 April 2014 at 20:36:

On Tue, Apr 15, 2014 at 6:14 AM, Ken Hancock ken.hanc...@schange.com wrote:

Keep in mind if you lose the wrong two, you can't satisfy quorum.  In a 5-node 
cluster with RF=3, it would be impossible to lose 2 nodes without affecting 
quorum for at least some of your data. In a 6 node cluster, once you've lost 
one node, if you were to lose another, you only have a 1-in-5 chance of not 
affecting quorum for some of your data.



This is why the real highly available way to run Cassandra with QUORUM is 
RF=5, with 5 racks.


Briefly, any given node running a JVM based distributed application should be 
assumed to potentially become transiently unavailable for a short time, for 
example during long GC pauses or rolling restarts. There is also a chance of 
non-transient failure (hard down) at any time, and a much smaller chance of 
two simultaneous non-transient failures. If you have RF=3 and lose two nodes 
(one transient, the other non-transient) in a range, that range is now 
unavailable because quorum is 2 and 3-2 is 1, which is less than 2. If you 
have RF=5 and lose two nodes in the same way, quorum is 3 and 5-2 is 3, which 
is equal to 3.


AFAICT, no one actually runs Cassandra in this way because keeping 5 copies of 
your already denormalized data seems excessive and is difficult to justify to 
management.


=Rob



Re: Replication Factor question

2014-04-15 Thread Markus Jais
Hi all,

thanks for your answers. Very helpful. We plan to use enough nodes so that the
failure of 1 or 2 machines is no problem. E.g. for a workload that can be handled
by 3 nodes all the time, we would use at least 5, or better 6, nodes to survive the
failure of at least 2 nodes, even when the 2 nodes fail at the same time. This
should allow the cluster to rebuild the missing nodes and still serve all
requests with RF=3 and Quorum reads.

All the best,

Markus





Tupshin Harper tups...@tupshin.com wrote on Monday, 14 April 2014 at 21:23:
 
tl;dr make sure you have enough capacity in the event of node failure. For 
light workloads, that can be fulfilled with nodes=rf. 
-Tupshin
On Apr 14, 2014 2:35 PM, Robert Coli rc...@eventbrite.com wrote:

On Mon, Apr 14, 2014 at 2:25 AM, Markus Jais markus.j...@yahoo.de wrote:

It is generally not recommended to set a replication factor of 3 if you have 
fewer than six nodes in a data center.


I have a detailed post about this somewhere in the archives of this list 
(which I can't seem to find right now..) but briefly, the 6-for-3 advice 
relates to the percentage of capacity you have remaining when you have a node 
down. It has become slightly less accurate over time because vnodes reduce 
bootstrap time and there have been other improvements to node startup time.


If you have fewer than 6 nodes with RF=3, you lose 1/6th of capacity when 
you lose a single node, which is a significant percentage of total cluster 
capacity. You then lose another meaningful percentage of your capacity when 
your existing nodes participate in rebuilding the missing node. If you are 
then unlucky enough to lose another node, you are missing a very significant 
percentage of your cluster capacity and have to use a relatively small 
fraction of it to rebuild the now two down nodes.


I wouldn't generalize the rule of thumb as don't run under N=RF*2, but 
rather as probably don't run RF=3 under about 6 nodes. IOW, in my view, the 
most operationally sane initial number of nodes for RF=3 is likely closer to 
6 than 3.


=Rob





Re: Replication Factor question

2014-04-15 Thread Ken Hancock
Keep in mind if you lose the wrong two, you can't satisfy quorum.  In a
5-node cluster with RF=3, it would be impossible to lose 2 nodes without
affecting quorum for at least some of your data. In a 6 node cluster, once
you've lost one node, if you were to lose another, you only have a 1-in-5
chance of not affecting quorum for some of your data.

In much larger clusters, it becomes less probable that you will lose
multiple nodes within a RF group.





On Tue, Apr 15, 2014 at 4:37 AM, Markus Jais markus.j...@yahoo.de wrote:

 Hi all,

 thanks for your answers. Very helpful. We plan to use enough nodes so that
 the failure of 1 or 2 machines is no problem. E.g. for a workload to can be
 handled by 3 nodes all the time, we would use at least 5, better 6 nodes to
 survive the failure of at least 2 nodes, even when the 2 nodes fail at the
 same time. This should allow the cluster to rebuild the missing nodes and
 still serve all requests with a RF=3 and Quorum reads.

 All the best,

 Markus





   Tupshin Harper tups...@tupshin.com wrote on Monday, 14 April 2014 at 21:23:

 tl;dr make sure you have enough capacity in the event of node failure. For
 light workloads, that can be fulfilled with nodes=rf.
 -Tupshin
 On Apr 14, 2014 2:35 PM, Robert Coli rc...@eventbrite.com wrote:

 On Mon, Apr 14, 2014 at 2:25 AM, Markus Jais markus.j...@yahoo.de wrote:

 It is generally not recommended to set a replication factor of 3 if you
 have fewer than six nodes in a data center.


 I have a detailed post about this somewhere in the archives of this list
 (which I can't seem to find right now..) but briefly, the 6-for-3 advice
 relates to the percentage of capacity you have remaining when you have a
 node down. It has become slightly less accurate over time because vnodes
 reduce bootstrap time and there have been other improvements to node
 startup time.

 If you have fewer than 6 nodes with RF=3, you lose 1/6th of capacity when
 you lose a single node, which is a significant percentage of total cluster
 capacity. You then lose another meaningful percentage of your capacity when
 your existing nodes participate in rebuilding the missing node. If you are
 then unlucky enough to lose another node, you are missing a very
 significant percentage of your cluster capacity and have to use a
 relatively small fraction of it to rebuild the now two down nodes.

 I wouldn't generalize the rule of thumb as don't run under N=RF*2, but
 rather as probably don't run RF=3 under about 6 nodes. IOW, in my view,
 the most operationally sane initial number of nodes for RF=3 is likely
 closer to 6 than 3.

 =Rob
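
Ken's 1-in-5 figure can be checked with a small brute-force sketch (illustration
only; it assumes classic single-token placement where each range is replicated
to a node and its next two ring neighbours):

public class SecondNodeLossOdds {
    public static void main(String[] args) {
        int n = 6, rf = 3;
        int first = 0; // the node that is already down (any node, by symmetry)
        int safe = 0;
        for (int second = 1; second < n; second++) {
            boolean shareRange = false;
            for (int owner = 0; owner < n; owner++) {
                boolean firstIsReplica = false, secondIsReplica = false;
                for (int k = 0; k < rf; k++) {
                    int replica = (owner + k) % n;
                    if (replica == first) firstIsReplica = true;
                    if (replica == second) secondIsReplica = true;
                }
                if (firstIsReplica && secondIsReplica) shareRange = true;
            }
            if (!shareRange) safe++;
        }
        // Prints "1 of 5": only one choice of second node shares no range with the
        // first, i.e. a 1-in-5 chance that quorum stays intact for all data.
        System.out.println(safe + " of " + (n - 1)
                + " possible second failures keep quorum available everywhere");
    }
}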






Re: Replication Factor question

2014-04-15 Thread Markus Jais
Hi Ken,

thanks. Good point. 

Markus
Ken Hancock ken.hanc...@schange.com wrote on Tuesday, 15 April 2014 at 15:15:
 
Keep in mind if you lose the wrong two, you can't satisfy quorum.  In a 5-node 
cluster with RF=3, it would be impossible to lose 2 nodes without affecting 
quorum for at least some of your data. In a 6 node cluster, once you've lost 
one node, if you were to lose another, you only have a 1-in-5 chance of not 
affecting quorum for some of your data.

In much larger clusters, it becomes less probable that you will lose multiple 
nodes within a RF group.








On Tue, Apr 15, 2014 at 4:37 AM, Markus Jais markus.j...@yahoo.de wrote:

Hi all,


thanks for your answers. Very helpful. We plan to use enough nodes so that 
the failure of 1 or 2 machines is no problem. E.g. for a workload to can be 
handled by 3 nodes all the time, we would use at least 5, better 6 nodes to 
survive the failure of at least 2 nodes, even when the 2 nodes fail at the 
same time. This should allow the cluster to rebuild the missing nodes and 
still serve all requests with a RF=3 and Quorum reads.


All the best,


Markus










Tupshin Harper tups...@tupshin.com wrote on Monday, 14 April 2014 at 21:23:
 
tl;dr make sure you have enough capacity in the event of node failure. For 
light workloads, that can be fulfilled with nodes=rf. 
-Tupshin
On Apr 14, 2014 2:35 PM, Robert Coli rc...@eventbrite.com wrote:

On Mon, Apr 14, 2014 at 2:25 AM, Markus Jais markus.j...@yahoo.de wrote:

It is generally not recommended to set a replication factor of 3 if you 
have fewer than six nodes in a data center.


I have a detailed post about this somewhere in the archives of this list 
(which I can't seem to find right now..) but briefly, the 6-for-3 advice 
relates to the percentage of capacity you have remaining when you have a 
node down. It has become slightly less accurate over time because vnodes 
reduce bootstrap time and there have been other improvements to node 
startup time.


If you have fewer than 6 nodes with RF=3, you lose 1/6th of capacity when 
you lose a single node, which is a significant percentage of total cluster 
capacity. You then lose another meaningful percentage of your capacity when 
your existing nodes participate in rebuilding the missing node. If you are 
then unlucky enough to lose another node, you are missing a very 
significant percentage of your cluster capacity and have to use a 
relatively small fraction of it to rebuild the now two down nodes.


I wouldn't generalize the rule of thumb as don't run under N=RF*2, but 
rather as probably don't run RF=3 under about 6 nodes. IOW, in my view, 
the most operationally sane initial number of nodes for RF=3 is likely 
closer to 6 than 3.


=Rob







 
 

 

 



Re: Replication Factor question

2014-04-15 Thread Robert Coli
On Tue, Apr 15, 2014 at 6:14 AM, Ken Hancock ken.hanc...@schange.comwrote:

 Keep in mind if you lose the wrong two, you can't satisfy quorum.  In a
 5-node cluster with RF=3, it would be impossible to lose 2 nodes without
 affecting quorum for at least some of your data. In a 6 node cluster, once
 you've lost one node, if you were to lose another, you only have a 1-in-5
 chance of not affecting quorum for some of your data.


This is why the real highly available way to run Cassandra with QUORUM is
RF=5, with 5 racks.

Briefly, any given node running a JVM based distributed application should
be assumed to potentially become transiently unavailable for a short time,
for example during long GC pauses or rolling restarts. There is also a
chance of non-transient failure (hard down) at any time, and a much smaller
chance of two simultaneous non-transient failures. If you have RF=3 and
lose two nodes (one transient, the other non-transient) in a range, that
range is now unavailable because quorum is 2 and 3-2 is 1, which is less
than 2. If you have RF=5 and lose two nodes in the same way, quorum is 3
and 5-2 is 3, which is equal to 3.

AFAICT, no one actually runs Cassandra in this way because keeping 5 copies
of your already denormalized data seems excessive and is difficult to
justify to management.

=Rob
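
Worked out (illustration only), Rob's arithmetic is: quorum = floor(RF/2) + 1,
and the number of replicas a range can lose while still reaching quorum is
RF - quorum, which is 1 for RF=3 and 2 for RF=5:

public class QuorumTolerance {
    public static void main(String[] args) {
        for (int rf : new int[] {3, 5}) {
            int quorum = rf / 2 + 1;
            System.out.printf("RF=%d  quorum=%d  down replicas tolerated=%d%n",
                    rf, quorum, rf - quorum);
        }
        // RF=3 -> quorum=2, tolerates 1 down replica per range
        // RF=5 -> quorum=3, tolerates 2 down replicas per range
    }
}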


Re: Replication Factor question

2014-04-15 Thread Tupshin Harper
It is not common,  but I know of multiple organizations running with RF=5,
in at least one DC, for HA reasons.

-Tupshin
On Apr 15, 2014 2:36 PM, Robert Coli rc...@eventbrite.com wrote:

 On Tue, Apr 15, 2014 at 6:14 AM, Ken Hancock ken.hanc...@schange.comwrote:

 Keep in mind if you lose the wrong two, you can't satisfy quorum.  In a
 5-node cluster with RF=3, it would be impossible to lose 2 nodes without
 affecting quorum for at least some of your data. In a 6 node cluster, once
 you've lost one node, if you were to lose another, you only have a 1-in-5
 chance of not affecting quorum for some of your data.


 This is why the real highly available way to run Cassandra with QUORUM is
 RF=5, with 5 racks.

 Briefly, any given node running a JVM based distributed application should
 be assumed to potentially become transiently unavailable for a short time,
 for example during long GC pauses or rolling restarts. There is also a
 chance of non-transient failure (hard down) at any time, and a much smaller
 chance of two simultaneous non-transient failures. If you have RF=3 and
 lose two nodes (one transient, the other non-transient) in a range, that
 range is now unavailable because quorum is 2 and 3-2 is 1, which is less
 than 2. If you have RF=5 and lose two nodes in the same way, quorum is 3
 and 5-2 is 3, which is equal to 3.

 AFAICT, no one actually runs Cassandra in this way because keeping 5
 copies of your already denormalized data seems excessive and is difficult
 to justify to management.

 =Rob




Re: Replication Factor question

2014-04-14 Thread Sergey Murylev
Hi Markus,
 It is generally not recommended to set a replication factor of 3 if
 you have fewer than six nodes in a data center.
Actually, you can create a cluster with 3 nodes and replication factor 3,
but in that case, if one of them fails, the cluster becomes inconsistent.
So the minimum reasonable number of nodes for replication factor 3 is 4;
with 4 nodes we can tolerate a single node failure, but then each node
would hold 3/4 of all the data, which is not very good. Six nodes are
recommended because then each node holds 1/2 of all the data, which is a
much more reasonable overhead.

Typically Cassandra clusters don't use a big replication factor; it is
usually 3 (the failure of any single node doesn't bring down the cluster)
or 5 (the failure of any two nodes doesn't bring down the cluster).

For more details, have a look at the replication factor calculator:
http://www.ecyrd.com/cassandracalculator/.

--
Thanks,
Sergey

On 14/04/14 13:25, Markus Jais wrote:
 Hello,

 currently reading the Practical Cassandra. In the section about
 replication factors the book says:

 It is generally not recommended to set a replication factor of 3 if
 you have fewer than six nodes in a data center.

 Why is that? What problems would arise if I had a replication factor
 of 3 and only 5 nodes?

 Does that mean that for a replication of 4 I would need at least 8
 nodes and for a factor of 5 at least 10 nodes?

 Not saying that I would use a factor of 5 and 10 nodes, just curious about how
 this works.

 All the best,

 Markus
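
Sergey's 3/4 and 1/2 figures come from the fraction RF/N of the data set that
each node ends up holding; a small sketch of that arithmetic (illustration
only, assuming evenly balanced ownership):

public class DataPerNode {
    public static void main(String[] args) {
        int rf = 3;
        for (int nodes : new int[] {3, 4, 6}) {
            System.out.printf("N=%d RF=%d -> each node holds ~%.0f%% of the data%n",
                    nodes, rf, 100.0 * rf / nodes);
        }
        // N=3 -> 100%, N=4 -> 75%, N=6 -> 50%, matching the figures above.
    }
}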





Re: Replication Factor question

2014-04-14 Thread Tupshin Harper
I do not agree with this advice.  It can be perfectly reasonable to have
#nodes < 2*RF.

It is common to deploy a 3 node cluster with RF=3 and it works fine as long
as each node can handle 100% of your data, and keep up with the workload.

-Tupshin
On Apr 14, 2014 5:25 AM, Markus Jais markus.j...@yahoo.de wrote:

 Hello,

 currently reading the Practical Cassandra. In the section about
 replication factors the book says:

 It is generally not recommended to set a replication factor of 3 if you
 have fewer than six nodes in a data center.

 Why is that? What problems would arise if I had a replication factor of 3
 and only 5 nodes?

 Does that mean that for a replication of 4 I would need at least 8 nodes
 and for a factor of 5 at least 10 nodes?

 Not saying that I would use a factor of 5 and 10 nodes, just curious about how
 this works.

 All the best,

 Markus



Re: Replication Factor question

2014-04-14 Thread Markus Jais
Hi all,

thanks. Very helpful.

@Tupshin: With a 3 node cluster and RF 3, isn't it a problem if one node fails
(due to hardware problems, for example)? According to the C* docs, writes fail
if the number of nodes is smaller than the RF.
I agree that it will run fine as long as all nodes are up and they can handle
the load, but eventually hardware will fail.

Markus





Tupshin Harper tups...@tupshin.com wrote on Monday, 14 April 2014 at 13:44:
 
I do not agree with this advice.  It can be perfectly reasonable to have #nodes 
< 2*RF.
It is common to deploy a 3 node cluster with RF=3 and it works fine as long as 
each node can handle 100% of your data, and keep up with the workload. 
-Tupshin 
On Apr 14, 2014 5:25 AM, Markus Jais markus.j...@yahoo.de wrote:

Hello,


currently reading the Practical Cassandra. In the section about replication 
factors the book says:


It is generally not recommended to set a replication factor of 3 if you have 
fewer than six nodes in a data center.


Why is that? What problems would arise if I had a replication factor of 3 and 
only 5 nodes?


Does that mean that for a replication of 4 I would need at least 8 nodes and 
for a factor of 5 at least 10 nodes?


Not saying that I would use a factor of 5 and 10 nodes, just curious about how this
works.


All the best,


Markus



Re: Replication Factor question

2014-04-14 Thread Tupshin Harper
With 3 nodes, and RF=3, you can always use CL=ALL if all nodes are up,
QUORUM if 1 node is down, and ONE if any two nodes are down.

The exact same thing is true if you have more nodes.

-Tupshin
On Apr 14, 2014 7:51 AM, Markus Jais markus.j...@yahoo.de wrote:

 Hi all,

 thanks. Very helpful.

 @Tupshin: With a 3 node cluster and RF 3 isn't it a problem if one node
 fails (due to hardware problems, for example). According to the C* docs,
 writes fail if the number of nodes is smaller than the RF.
 I agree that it will run fine as long as all nodes are up and they can
 handle the load but eventually hardware will fail.

 Markus





   Tupshin Harper tups...@tupshin.com wrote on Monday, 14 April 2014 at 13:44:

 I do not agree with this advice.  It can be perfectly reasonable to have
 #nodes < 2*RF.
 It is common to deploy a 3 node cluster with RF=3 and it works fine as
 long as each node can handle 100% of your data, and keep up with the
 workload.
 -Tupshin
 On Apr 14, 2014 5:25 AM, Markus Jais markus.j...@yahoo.de wrote:

 Hello,

 currently reading the Practical Cassandra. In the section about
 replication factors the book says:

 It is generally not recommended to set a replication factor of 3 if you
 have fewer than six nodes in a data center.

 Why is that? What problems would arise if I had a replication factor of 3
 and only 5 nodes?

 Does that mean that for a replication of 4 I would need at least 8 nodes
 and for a factor of 5 at least 10 nodes?

  Not saying that I would use a factor of 5 and 10 nodes, just curious about how
 this works.

 All the best,

 Markus






Re: Replication Factor question

2014-04-14 Thread Robert Coli
On Mon, Apr 14, 2014 at 2:25 AM, Markus Jais markus.j...@yahoo.de wrote:

 It is generally not recommended to set a replication factor of 3 if you
 have fewer than six nodes in a data center.


I have a detailed post about this somewhere in the archives of this list
(which I can't seem to find right now..) but briefly, the 6-for-3 advice
relates to the percentage of capacity you have remaining when you have a
node down. It has become slightly less accurate over time because vnodes
reduce bootstrap time and there have been other improvements to node
startup time.

If you have fewer than 6 nodes with RF=3, you lose at least 1/6th of capacity when
you lose a single node, which is a significant percentage of total cluster
capacity. You then lose another meaningful percentage of your capacity when
your existing nodes participate in rebuilding the missing node. If you are
then unlucky enough to lose another node, you are missing a very
significant percentage of your cluster capacity and have to use a
relatively small fraction of it to rebuild the now two down nodes.

I wouldn't generalize the rule of thumb as "don't run under N=RF*2", but
rather as "probably don't run RF=3 under about 6 nodes". IOW, in my view,
the most operationally sane initial number of nodes for RF=3 is likely
closer to 6 than 3.

=Rob


Re: Replication Factor question

2014-04-14 Thread Tupshin Harper
tl;dr make sure you have enough capacity in the event of node failure. For
light workloads, that can be fulfilled with nodes=rf.

-Tupshin
On Apr 14, 2014 2:35 PM, Robert Coli rc...@eventbrite.com wrote:

 On Mon, Apr 14, 2014 at 2:25 AM, Markus Jais markus.j...@yahoo.de wrote:

 It is generally not recommended to set a replication factor of 3 if you
 have fewer than six nodes in a data center.


 I have a detailed post about this somewhere in the archives of this list
 (which I can't seem to find right now..) but briefly, the 6-for-3 advice
 relates to the percentage of capacity you have remaining when you have a
 node down. It has become slightly less accurate over time because vnodes
 reduce bootstrap time and there have been other improvements to node
 startup time.

 If you have fewer than 6 nodes with RF=3, you lose 1/6th of capacity when
 you lose a single node, which is a significant percentage of total cluster
 capacity. You then lose another meaningful percentage of your capacity when
 your existing nodes participate in rebuilding the missing node. If you are
 then unlucky enough to lose another node, you are missing a very
 significant percentage of your cluster capacity and have to use a
 relatively small fraction of it to rebuild the now two down nodes.

 I wouldn't generalize the rule of thumb as don't run under N=RF*2, but
 rather as probably don't run RF=3 under about 6 nodes. IOW, in my view,
 the most operationally sane initial number of nodes for RF=3 is likely
 closer to 6 than 3.

 =Rob




Re: replication factor is zero

2013-06-06 Thread Tyler Hobbs
On Thu, Jun 6, 2013 at 1:28 PM, Daning Wang dan...@netseer.com wrote:

 could we set the replication factor to 0 for the other data center? What is the
 best way to avoid syncing some data in a cluster?


Yes, you can set it to 0, and that's the recommended way to handle this.


-- 
Tyler Hobbs
DataStax http://datastax.com/


Re: replication factor is zero

2013-06-06 Thread Alain RODRIGUEZ
But AFAIK you can set the RF only per keyspace, so you will have to pull
those tables out into a different keyspace.


2013/6/6 Tyler Hobbs ty...@datastax.com


 On Thu, Jun 6, 2013 at 1:28 PM, Daning Wang dan...@netseer.com wrote:

  could we set the replication factor to 0 for the other data center? What is the
  best way to avoid syncing some data in a cluster?


 Yes, you can set it to 0, and that's the recommended way to handle this.


 --
 Tyler Hobbs
 DataStax http://datastax.com/
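
A sketch of what Tyler and Alain describe (illustration only; keyspace, table
and datacenter names are made up): put the tables that must not be replicated
to the other data center into their own keyspace whose replication settings
simply leave that DC out (or set it to 0):

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class SingleDcKeyspace {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect()) {
            // Only dc1 gets replicas; dc2 is omitted, so nothing is synced there.
            session.execute("CREATE KEYSPACE IF NOT EXISTS local_only WITH replication = "
                    + "{'class': 'NetworkTopologyStrategy', 'dc1': 3}");
            session.execute("CREATE TABLE IF NOT EXISTS local_only.events "
                    + "(id uuid PRIMARY KEY, payload text)");
        }
    }
}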



Re: Replication Factor and Consistency Level Confusion

2012-12-20 Thread Vasileios Vlachos
Hello,

Thank you very much for your quick responses.

Initially we were thinking the same thing, that an explanation would
be that the wrong node could be down, but then isn't this something
that hinted handoff sorts out? So actually, Consistency Level refers
to the number of replicas, not the total number of nodes in a cluster.
Keeping that in mind and assuming that hinted handoff has nothing to
do with that as I thought, I could explain some results but not all.
Let me explain:

Test 1 (3/3 Nodes UP):
CL  : ANY  ONE  TWO  THREE  QUORUM  ALL
RF 3: OK   OK   OK   OK     OK      OK

Test 2 (2/3 Nodes UP):
CL  : ANY  ONE  TWO  THREE  QUORUM  ALL
RF 2: OK   OK   x    x      OK      x

Test 3 (2/3 Nodes UP):
CL  : ANY  ONE  TWO  THREE  QUORUM  ALL
RF 3: OK   OK   x    x      OK      OK

Test 1:
Everything was fine because all nodes were up and the RF does not
exceed the total number of nodes, in which case writes would be
blocked.

Test 2:
CL=TWO did not work because we were unlucky and the wrong node, the one
responsible for the key range we were trying to insert, was DOWN (I can
accept that for now, although I do not quite understand why this isn't
sorted out by hinted handoff). My explanation might be wrong again, but
CL=THREE should fail because we have only set RF=2, so there isn't a 3rd
replica anywhere anyway. Why did CL=QUORUM not fail then? Since
QUORUM=(RF/2)+1=2 in this case, the write operation should try to write
to 2 replicas, one of which, the one responsible for that range as we
said, is DOWN. I would expect CL=TWO and CL=QUORUM to have the same
outcome in this case. Why is that not the case? CL=ALL fails for the
same reason as CL=TWO, I presume.

Test 3:
I was expecting only CL=ANY and CL=ONE to work in this case. CL=TWO
does not work because, just like in Test 2, the node responsible for
that particular key range is DOWN. If that's the case, why was
CL=QUORUM successful??? The only explanation I can think of at the
moment is that QUORUM refers to the total number of nodes in the
cluster rather than the number of replicas determined by the RF.
CL=THREE seems easy; it fails because one of the three replicas is
DOWN. CL=ALL is confusing as well. If my understanding is correct and
ALL means all replicas, 3 in this case, then the operation should fail
because one replica is DOWN, and I cannot get lucky with the right
node being DOWN, because RF=3 means every node should have a copy of
the data.

Furthermore, with regards to being unlucky with the wrong node, if
this is actually what is happening, how is it possible to ever have a
node-failure-resilient Cassandra cluster? My understanding of this
implies that even with 100 nodes, every 1/100 writes would fail until
the node is replaced/repaired.

Thank you very much in advance.

Vasilis

On Wed, Dec 19, 2012 at 4:18 PM, Roland Gude roland.g...@ez.no wrote:

 Hi

 RF 2 means that 2 nodes are responsible for any given row (no matter how
 many nodes are in the cluster)
 For your cluster with three nodes let's just assume the following
 responsibilities

>  Node          A      B      C
>  Primary keys  0-5    6-10   11-15
>  Replica keys  11-15  0-5    6-10

 Assume node 'C' is down
 Writing any key in range 0-5 with consistency TWO is possible (A and B are
 up)
 Writing any key in range 11-15 with consistency TWO will fail (C is down
 and 11-15 is its primary range)
 Writing any key in range 6-10 with consistency TWO will fail (C is down
 and it is the replica for this range)

 I hope this explains it.

>  -----Original Message-----
>  From: Vasileios Vlachos [mailto:vasileiosvlac...@gmail.com]
>  Sent: Wednesday, 19 December 2012 17:07
>  To: user@cassandra.apache.org
>  Subject: Replication Factor and Consistency Level Confusion

 Hello All,

 We have a 3-node cluster and we created a keyspace (say Test_1) with
>  Replication Factor set to 3. I know it is not great but we wanted to test
 different behaviors. So, we created a Column Family (say cf_1) and we tried
 writing something with Consistency Level ANY, ONE, TWO, THREE, QUORUM and
 ALL. We did that while all nodes were in UP state, so we had no problems at
 all. No matter what the Consistency Level was, we were able to insert a
 value.

 Same cluster, different keyspace (say Test_2) with Replication Factor set
 to 2 this time and one of the 3 nodes deliberately DOWN. Again, we created a
 Column Family (say cf_1) and we tried writing something with different
 Consistency Levels. Here is what we got:
 ANY: worked (expected...)
 ONE: worked (expected...)
>  TWO: did not work (WHAT???)
 THREE: did not work (expected...)
 QUORUM: worked (expected...)
 ALL: did not work (expected I guess...)

 Now, we know that QUORUM derives from (RF/2)+1, so we were expecting that
 to work, after all only 1 node was DOWN. Why did Consistency Level TWO not
 

Re: Replication Factor and Consistency Level Confusion

2012-12-20 Thread Henrik Schröder
Don't run with a replication factor of 2, use 3 instead, and do all reads
and writes using quorum consistency.

That way, if a single node is down, all your operations will complete. In
fact, if every third node is down, you'll still be fine and able to handle
all requests.

However, if two adjacent nodes are down at the same time, operations
against keys that are stored on both those servers will fail because quorum
can't be satisfied.

To gain a better understanding, repeat your tests, but with multiple random
keys, and keep track of how many operations fail in each case.


/Henrik

On Thu, Dec 20, 2012 at 10:26 AM, Vasileios Vlachos 
vasileiosvlac...@gmail.com wrote:

 Furthermore, with regards to being unlucky with the wrong node if
 this actually what is happening, how is it possible to ever have a
 node-failure resiliant cassandra cluster? My understanding of this
 implies that even with 100 nodes, every 1/100 writes would fail until
 the node is replaced/repaired.



Re: Replication Factor and Consistency Level Confusion

2012-12-20 Thread Tristan Seligmann
On Thu, Dec 20, 2012 at 11:26 AM, Vasileios Vlachos
vasileiosvlac...@gmail.com wrote:
 Initially we were thinking the same thing, that an explanation would
 be that the wrong node could be down, but then isn't this something
 that hinted handoff sorts out?

If a node is partitioned from the rest of the cluster (i.e. the node
goes down, but later comes back with the same data it had), it will
obviously be out of date with regard to any writes that happened while
it was down. Anti-entropy (nodetool repair) and read repair will
repair this inconsistency over time, but not right away; hinted
handoff is an optimization that will allow the node to become mostly
consistent right away on rejoining the cluster, as the other nodes will
have stored hints for it while it was down, and will send them to it
once the node is back up.

However, the important thing to note is that this is an
/optimization/. If a replica is down, then it will not be able to
satisfy any consistency level requirements, except for the special
case of CL=ANY. If you use another CL like TWO, then two actual
replica nodes must be up for the ranges you are writing to, a node
that is not a replica but will write a hint does not count.

 Test 2 (2/3 Nodes UP):
 CL  : ANY  ONE  TWO  THREE  QUORUM  ALL
 RF 2: OK   OK   x    x      OK      x

For this test, QUORUM = RF/2+1 = 2/2+1 = 2. A write at QUORUM should
have succeeded if both of the replicas for the range were up, but if
one of the replicas for the range was the downed node, then it would
have failed. I think you can use the 'nodetool getendpoints' command
to list the nodes that are replicas for the given row key.

I am unable to explain how a write at QUORUM could succeed if a write
at TWO for the same key failed.

 Test 3 (2/3 Nodes UP):
 CL  : ANY  ONE  TWO  THREE  QUORUM  ALL
 RF 3: OK   OK   x    x      OK      OK

For this test, QUORUM = RF/2+1 = 3/2+1 = 2. Again, I am unable to
explain why a write at QUORUM would succeed if a write at TWO failed,
and I am also unable to explain how a write at ALL could succeed, for
any key, if one of the nodes is down.

I would suggest double-checking your test setup; also, make sure you
use the same row keys every time (if this is not already the case) so
that you have repeatable results.

 Furthermore, with regards to being unlucky with the wrong node if
 this actually what is happening, how is it possible to ever have a
 node-failure resiliant cassandra cluster? My understanding of this
 implies that even with 100 nodes, every 1/100 writes would fail until
 the node is replaced/repaired.

RF is the important number when considering fault-tolerance in your
cluster, not the number of nodes. If RF=3, and you read and write at
quorum, then you can tolerate one node being down in the range you are
operating on. If you need to be able to tolerate two nodes being down,
RF=5 and QUORUM would work. In other words, if you need better fault
tolerance, RF is what you need to increase; if you need better
performance, or you need to store more data, then N (number of nodes
in cluster) is what you need to increase. Of course, N must be at
least as big as RF...
--
mithrandi, i Ainil en-Balandor, a faer Ambar


Re: Replication Factor and Consistency Level Confusion

2012-12-20 Thread aaron morton
 this actually what is happening, how is it possible to ever have a
 node-failure resiliant cassandra cluster? 
Background http://thelastpickle.com/2011/06/13/Down-For-Me/

 I would suggest double-checking your test setup; also, make sure you
 use the same row keys every time (if this is not already the case) so
 that you have repeatable results.
Take a look at the nodetool getendpoints command. It will tell you which nodes 
a key is stored on. Though for RF 3 and N=3 it's all of them :)

Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 21/12/2012, at 12:54 AM, Tristan Seligmann mithra...@mithrandi.net wrote:

 On Thu, Dec 20, 2012 at 11:26 AM, Vasileios Vlachos
 vasileiosvlac...@gmail.com wrote:
 Initially we were thinking the same thing, that an explanation would
 be that the wrong node could be down, but then isn't this something
 that hinted handoff sorts out?
 
 If a node is partitioned from the rest of the cluster (ie. the node
 goes down, but later comes back with the same data it had), it will
 obviously be out of data with regard to any writes that happened while
 it was down. Anti-entropy (nodetool repair) and read repair will
 repair this inconsistency over time, but not right away; hinted
 handoff is an optimization that will allow the node to become mostly
 consistent right away on rejoining the cluster, as the nodes will have
 stored hints for it while it was down, and will send it them once the
 node is back up.
 
 However, the important thing to note is that this is an
 /optimization/. If a replica is down, then it will not be able to
 satisfy any consistency level requirements, except for the special
 case of CL=ANY. If you use another CL like TWO, then two actual
 replica nodes must be up for the ranges you are writing to, a node
 that is not a replica but will write a hint does not count.
 
 Test 2 (2/3 Nodes UP):
  CL  : ANY  ONE  TWO  THREE  QUORUM  ALL
  RF 2: OK   OK   x    x      OK      x
 
 For this test, QUORUM = RF/2+1 = 2/2+1 = 2. A write at QUORUM should
 have succeded if both of the replicas for the range were up, but if
 one of the replicas for the range was the downed node, then it would
 have failed. I think you can use the 'nodetool getendpoints' command
 to list the nodes that are replicas for the given row key.
 
 I am unable to explain how a write at QUORUM could succeed if a write
 at TWO for the same key failed.
 
 Test 3 (2/3 Nodes UP):
  CL  : ANY  ONE  TWO  THREE  QUORUM  ALL
  RF 3: OK   OK   x    x      OK      OK
 
 For this test, QUORUM = RF/2+1 = 3/2+1 = 2. Again, I am unable to
 explain why a write at QUORUM would succeed if a write at TWO failed,
 and I am also unable to explain how a write at ALL could succeed, for
 any key, if one of the nodes is down.
 
 I would suggest double-checking your test setup; also, make sure you
 use the same row keys every time (if this is not already the case) so
 that you have repeatable results.
 
 Furthermore, with regards to being unlucky with the wrong node if
 this actually what is happening, how is it possible to ever have a
 node-failure resiliant cassandra cluster? My understanding of this
 implies that even with 100 nodes, every 1/100 writes would fail until
 the node is replaced/repaired.
 
 RF is the important number when considering fault-tolerance in your
 cluster, not the number of nodes. If RF=3, and you read and write at
 quorum, then you can tolerate one node being down in the range you are
 operating on. If you need to be able to tolerate two nodes being down,
 RF=5 and QUORUM would work. In other words, if you need better fault
 tolerance, RF is what you need to increase; if you need better
 performance, or you need to store more data, then N (number of nodes
 in cluster) is what you need to increase. Of course, N must be at
 least as big as RF...
 --
 mithrandi, i Ainil en-Balandor, a faer Ambar



Re: Replication Factor and Consistency Level Confusion

2012-12-19 Thread Hiller, Dean
ANY: worked (expected...)
ONE: worked (expected...)
TWO: did not work (WHAT???)

This is sometimes expected and sometimes not. It depends on which 2 of the
3 nodes have the data. Since you have one node down, that might be
the one where that data goes ;).

THREE: did not work (expected...)
QUORUM: worked (expected...)
ALL: did not work (expected I guess...)



On 12/19/12 9:07 AM, Vasileios Vlachos vasileiosvlac...@gmail.com
wrote:

ANY: worked (expected...)
ONE: worked (expected...)
TWO: did not work (WHAT???)
THREE: did not work (expected...)
QUORUM: worked (expected...)
ALL: did not work (expected I guess...)



Re: Replication Factor and Consistency Level Confusion

2012-12-19 Thread Hiller, Dean
Ps, you may be getting a bit confused, by the way.  Just think: if you have
a 10 node cluster and one node is down and you do CL=2... if the node that
is down is where your data goes, yes, you will fail.  If you do CL=quorum
and RF=3 you can tolerate one node being down... If you use astyanax, I think
they have an implementation that will switch the CL to a lower level when a
node is out so the writes will still work, which is quite nice (i.e. the write
fails, switch to a lower CL and write again, or the same with a read).

Dean

On 12/19/12 9:07 AM, Vasileios Vlachos vasileiosvlac...@gmail.com
wrote:

Hello All,

We have a 3-node cluster and we created a keyspace (say Test_1) with
Replication Factor set to 3. I know it is not great but we wanted to test
different behaviors. So, we created a Column Family (say cf_1) and we
tried writing something with Consistency Level ANY, ONE, TWO, THREE,
QUORUM and ALL. We did that while all nodes were in UP state, so we
had no problems at all. No matter what the Consistency Level was, we
were able to insert a value.

Same cluster, different keyspace (say Test_2) with Replication Factor
set to 2 this time and one of the 3 nodes deliberately DOWN. Again, we
created a Column Family (say cf_1) and we tried writing something with
different Consistency Levels. Here is what we got:
ANY: worked (expected...)
ONE: worked (expected...)
TWO: did not work (WHAT???)
THREE: did not work (expected...)
QUORUM: worked (expected...)
ALL: did not work (expected I guess...)

Now, we know that QUORUM derives from (RF/2)+1, so we were expecting
that to work, after all only 1 node was DOWN. Why did Consistency
Level TWO not work then???

Third test... Same cluster again, different keyspace (say Test_3) with
Replication Factor set to 3 this time and 1 of the 3 nodes
deliberately DOWN again. Same approach again, created different Column
Family (say cf_1) and different Consistency Level settings resulted in
the following:
ANY: worked (what???)
ONE: worked (what???)
TWO: did not work (what???)
THREE: did not work (expected...)
QUORUM: worked (what???)
ALL: worked (what???)

We thought that if the Replication Factor is greater than the number
of nodes in the cluster, writes are blocked.

Apparently we are completely missing a level of understanding
here, so we would appreciate any help!

Thank you in advance!

Vasilis
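
Dean mentions that Astyanax can drop to a lower consistency level and retry; the
DataStax Java driver 3.x (the one whose exceptions appear earlier in this digest)
has a comparable built-in policy. A sketch only, with the usual caveat that
silently downgrading trades consistency for availability:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.policies.DowngradingConsistencyRetryPolicy;
import com.datastax.driver.core.policies.LoggingRetryPolicy;

public class DowngradingRetryExample {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder()
                .addContactPoint("127.0.0.1")
                // Retries failed requests at a lower consistency level and logs it.
                .withRetryPolicy(new LoggingRetryPolicy(DowngradingConsistencyRetryPolicy.INSTANCE))
                .build();
             Session session = cluster.connect()) {
            session.execute("SELECT release_version FROM system.local");
        }
    }
}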



Re: Replication factor and performance questions

2012-11-10 Thread B. Todd Burruss
@oleg, to answer your last question: a Cassandra node should never ask
another node for information it doesn't have. It uses the key and the
partitioner to determine where the data is located before ever
contacting another node.

On Mon, Nov 5, 2012 at 9:45 AM, Andrey Ilinykh ailin...@gmail.com wrote:
 You will have one extra hop. Not big deal, actually. And many client
 libraries (astyanax for example) are token aware, so they are smart
 enough to call the right node.

 On Mon, Nov 5, 2012 at 9:12 AM, Oleg Dulin oleg.du...@gmail.com wrote:
 Should be all under 400Gig on each.

 My question is -- is there additional overhead with replicas making requests
 to one another for keys they don't have ? how much of an overhead is that ?


 On 2012-11-05 17:00:37 +, Michael Kjellman said:

 Rule of thumb is to try to keep nodes under 400GB.
 Compactions/Repairs/Move operations etc become a nightmare otherwise. How
 much data do you expect to have on each node? Also depends on caches,
 bloom filters etc

 On 11/5/12 8:57 AM, Oleg Dulin oleg.du...@gmail.com wrote:

 I have 4 nodes at my disposal.

 I can configure them like this:

 1) RF=1, each node has 25% of the data. On random-reads, how big is the
 performance penalty if a node needs to look for data on another replica
 ?

 2) RF=2, each node has 50% of the data. Same question ?



 --
 Regards,
 Oleg Dulin
 NYC Java Big Data Engineer
 http://www.olegdulin.com/









 --
 Regards,
 Oleg Dulin
 NYC Java Big Data Engineer
 http://www.olegdulin.com/




Re: Replication factor and performance questions

2012-11-05 Thread Michael Kjellman
Rule of thumb is to try to keep nodes under 400GB.
Compactions/Repairs/Move operations etc become a nightmare otherwise. How
much data do you expect to have on each node? Also depends on caches,
bloom filters etc

On 11/5/12 8:57 AM, Oleg Dulin oleg.du...@gmail.com wrote:

I have 4 nodes at my disposal.

I can configure them like this:

1) RF=1, each node has 25% of the data. On random-reads, how big is the
performance penalty if a node needs to look for data on another replica
?

2) RF=2, each node has 50% of the data. Same question ?



-- 
Regards,
Oleg Dulin
NYC Java Big Data Engineer
http://www.olegdulin.com/








Re: Replication factor and performance questions

2012-11-05 Thread Bryan
Our compactions/repairs have already become nightmares and we have not 
approached the levels of data you describe here (~200 GB). Have any 
pointers/case studies for optimizing this?


On Nov 5, 2012, at 12:00 PM, Michael Kjellman wrote:

 Rule of thumb is to try to keep nodes under 400GB.
 Compactions/Repairs/Move operations etc become a nightmare otherwise. How
 much data do you expect to have on each node? Also depends on caches,
 bloom filters etc
 
 On 11/5/12 8:57 AM, Oleg Dulin oleg.du...@gmail.com wrote:
 
 I have 4 nodes at my disposal.
 
 I can configure them like this:
 
 1) RF=1, each node has 25% of the data. On random-reads, how big is the
 performance penalty if a node needs to look for data on another replica
 ?
 
 2) RF=2, each node has 50% of the data. Same question ?
 
 
 
 -- 
 Regards,
 Oleg Dulin
 NYC Java Big Data Engineer
 http://www.olegdulin.com/
 
 
 
 
 
 



Re: Replication factor and performance questions

2012-11-05 Thread Oleg Dulin

Should be all under 400Gig on each.

My question is -- is there additional overhead with replicas making 
requests to one another for keys they don't have ? how much of an 
overhead is that ?


On 2012-11-05 17:00:37 +, Michael Kjellman said:


Rule of thumb is to try to keep nodes under 400GB.
Compactions/Repairs/Move operations etc become a nightmare otherwise. How
much data do you expect to have on each node? Also depends on caches,
bloom filters etc

On 11/5/12 8:57 AM, Oleg Dulin oleg.du...@gmail.com wrote:


I have 4 nodes at my disposal.

I can configure them like this:

1) RF=1, each node has 25% of the data. On random-reads, how big is the
performance penalty if a node needs to look for data on another replica
?

2) RF=2, each node has 50% of the data. Same question ?



--
Regards,
Oleg Dulin
NYC Java Big Data Engineer
http://www.olegdulin.com/











--
Regards,
Oleg Dulin
NYC Java Big Data Engineer
http://www.olegdulin.com/




Re: Replication factor and performance questions

2012-11-05 Thread Andrey Ilinykh
You will have one extra hop. Not big deal, actually. And many client
libraries (astyanax for example) are token aware, so they are smart
enough to call the right node.

On Mon, Nov 5, 2012 at 9:12 AM, Oleg Dulin oleg.du...@gmail.com wrote:
 Should be all under 400Gig on each.

 My question is -- is there additional overhead with replicas making requests
 to one another for keys they don't have ? how much of an overhead is that ?


 On 2012-11-05 17:00:37 +, Michael Kjellman said:

 Rule of thumb is to try to keep nodes under 400GB.
 Compactions/Repairs/Move operations etc become a nightmare otherwise. How
 much data do you expect to have on each node? Also depends on caches,
 bloom filters etc

 On 11/5/12 8:57 AM, Oleg Dulin oleg.du...@gmail.com wrote:

 I have 4 nodes at my disposal.

 I can configure them like this:

 1) RF=1, each node has 25% of the data. On random-reads, how big is the
 performance penalty if a node needs to look for data on another replica
 ?

 2) RF=2, each node has 50% of the data. Same question ?



 --
 Regards,
 Oleg Dulin
 NYC Java Big Data Engineer
 http://www.olegdulin.com/









 --
 Regards,
 Oleg Dulin
 NYC Java Big Data Engineer
 http://www.olegdulin.com/




Re: Replication factor 2, consistency and failover

2012-09-12 Thread Sergey Tryuber
Aaron, thank you! Your message was exactly what we wanted to see: that we
didn't miss something critical. We'll share our Astyanax patch in the
future.

On 10 September 2012 03:44, aaron morton aa...@thelastpickle.com wrote:

 In general we want to achieve strong consistency.

 You need to have R + W > N

 LOCAL_QUORUM and reads with ONE.

 Gives you 2 + 1 > 2 when you use it. When you drop back to ONE / ONE you
 no longer have strong consistency.

 may be advise on how to improve it.

 Sounds like you know how to improve it :)

 Things you could play with:

 * hinted_handoff_throttle_delay_in_ms in YAML to reduce the time it takes
 for HH delay to deliver the messages.
 * increase the read_repair_chance for the CF's. This will increase the
 chance of RR repairing an inconsistency behind the scenes, so the next read
 is consistent. This will also increase the IO load on the system.

 With the RF 2 restriction you are probably doing the best you can. You are
 giving up consistency for availability and partition tolerance. The best
 thing to do is to get peeps to agree that "we will accept reduced consistency
 for high availability" rather than say "in general we want to achieve
 strong consistency".

 Hope that helps.

   -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 9/09/2012, at 9:09 PM, Sergey Tryuber stryu...@gmail.com wrote:

 Hi

 We have to use Cassandra with RF=2 (don't ask why...). There are two
 datacenters (RF=2 in each datacenter). Also we use Astyanax as a client
 library. In general we want to achieve strong consistency. Read performance
 is important for us, that's why we perform writes with LOCAL_QUORUM and
 reads with ONE. If one server is down, we automatically switch to
 Writes.ONE, Reads.ONE only for that replica which has failed node (we
 modified Astyanax to achieve that). When the server comes back, we turn
 back Writes.LOCAL_QUORUM and Reads.ONE, but, of course, we see some
 inconsistencies during the switching process and some time after (when
 hinted handoff works).

 Basically I don't have any questions, just want to share our ugly
 failover algorithm, to hear your criticism and may be advise on how to
 improve it. Unfortunately we can't change replication factor and most of
 the time we have to read with consistency level ONE (because we have strict
 requirements on read performance).

 Thank you!





Re: Replication factor 2, consistency and failover

2012-09-09 Thread aaron morton
 In general we want to achieve strong consistency. 
You need to have R + W > N

 LOCAL_QUORUM and reads with ONE.
Gives you 2 + 1 > 2 when you use it. When you drop back to ONE / ONE you no 
longer have strong consistency. 
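
Put as a tiny check with the numbers from this setup (RF=2 per datacenter), the rule works out as follows; this is just arithmetic, nothing driver-specific:

    public class ConsistencyCheck {
        // Strong consistency holds when read and write replica sets must overlap: R + W > N.
        static boolean strong(int readReplicas, int writeReplicas, int totalReplicas) {
            return readReplicas + writeReplicas > totalReplicas;
        }

        public static void main(String[] args) {
            int rf = 2;  // replicas per datacenter in this thread
            System.out.println("LOCAL_QUORUM writes (2) + ONE reads (1): " + strong(1, 2, rf)); // true:  1 + 2 > 2
            System.out.println("ONE writes (1) + ONE reads (1):          " + strong(1, 1, rf)); // false: 1 + 1 = 2
        }
    }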

 may be advise on how to improve it. 
Sounds like you know how to improve it :)

Things you could play with:

* hinted_handoff_throttle_delay_in_ms in YAML to reduce the time it takes for 
HH delay to deliver the messages.
* increase the read_repair_chance for the CF's. This will increase the chance 
of RR repairing an inconsistency behind the scenes, so the next read is 
consistent. This will also increase the IO load on the system. 

With the RF 2 restriction you are probably doing the best you can. You are 
giving up consistency for availability and partition tolerance. The best thing 
to do is to get peeps to agree that "we will accept reduced consistency for high 
availability" rather than say "in general we want to achieve strong 
consistency".
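
On the read_repair_chance suggestion above: in the CQL that later replaced the CLI, that knob is a per-table option (it was removed again in Cassandra 4.0). A minimal sketch, with placeholder keyspace and table names:

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Session;

    public class BumpReadRepairChance {
        public static void main(String[] args) {
            try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
                 Session session = cluster.connect()) {
                // Raise the chance that a read also triggers background read repair
                // across the other replicas, at the cost of extra IO.
                session.execute("ALTER TABLE demo_ks.demo_cf WITH read_repair_chance = 0.2");
            }
        }
    }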

Hope that helps. 

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 9/09/2012, at 9:09 PM, Sergey Tryuber stryu...@gmail.com wrote:

 Hi
 
 We have to use Cassandra with RF=2 (don't ask why...). There are two 
 datacenters (RF=2 in each datacenter). Also we use Astyanax as a client 
 library. In general we want to achieve strong consistency. Read performance 
 is important for us, that's why we perform writes with LOCAL_QUORUM and reads 
 with ONE. If one server is down, we automatically switch to Writes.ONE, 
 Reads.ONE only for that replica which has failed node (we modified Astyanax 
 to achieve that). When the server comes back, we turn back 
 Writes.LOCAL_QUORUM and Reads.ONE, but, of course, we see some 
 inconsistencies during the switching process and some time after (when hinted 
 handoff works).
 
 Basically I don't have any questions, just want to share our ugly failover 
 algorithm, to hear your criticism and may be advise on how to improve it. 
 Unfortunately we can't change replication factor and most of the time we have 
 to read with consistency level ONE (because we have strict requirements on 
 read performance). 
 
 Thank you!
 



Re: Replication factor - Consistency Questions

2012-07-20 Thread aaron morton
 But isn't QUORUM on a 2-node cluster still 2 nodes?
Yes.

3 is where you start to get some redundancy - 
http://thelastpickle.com/2011/06/13/Down-For-Me/

Cheers


-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 20/07/2012, at 10:24 AM, Kirk True wrote:

 But isn't QUORUM on a 2-node cluster still 2 nodes?
 
 On 07/17/2012 11:50 PM, Jason Tang wrote:
 Yes, ALL is not good for HA, and because we met problems when using 
 QUORUM, our current solution is to switch to Write:QUORUM / Read:QUORUM when we 
 get an UnavailableException.
 
 2012/7/18 Jay Parashar jparas...@itscape.com
 Thanks, but write ALL will fail for any downed nodes. I am thinking of 
 QUORUM.
 
  
 From: Jason Tang [mailto:ares.t...@gmail.com] 
 Sent: Tuesday, July 17, 2012 8:24 PM
 To: user@cassandra.apache.org
 Subject: Re: Replication factor - Consistency Questions
 
  
 Hi
 
  
 I am starting using Cassandra for not a long time, and also have problems in 
 consistency.
 
  
 Here is some thinking.
 
 If you have Write:Any / Read:One, it will have consistency problem, and if 
 you want to repair, check your schema, and check the parameter Read repair 
 chance: 
 
 http://wiki.apache.org/cassandra/StorageConfiguration 
 
  
 And if you want consistent results, my suggestion is to have
 Write:ALL / Read:One, since for Cassandra, writes are faster than reads.
 
  
 For performance impact, you need to test your traffic, and if your memory 
 can not cache all your data, or your network is not fast enough, then yes, 
 it will impact to write one more node.
 
  
 BRs
 
  
 2012/7/18 Jay Parashar jparas...@itscape.com
 
 Hello all,
 
 There is a lot of material on Replication factor and Consistency level but I
 am a little confused by what is happening on my setup. (Cassandra 1.1.2). I
 would appreciate any answers.
 
 My Setup: A cluster of 2 nodes evenly balanced. My RF =2, Consistency Level;
 Write = ANY and Read = 1
 
 I know that my consistency is Weak but since my RF = 2, I thought data would
 be just duplicated in both the nodes but sometimes, querying does not give
 me the correct (or gives partial) results. In other times, it gives me the
 right results
 Is the Read Repair going on after the first query? But as RF = 2, data is
 duplicated then why the repair?
 Note: My query is done a while after the Writes so data should have been in
 both the nodes. Or is this not the case (flushing not happening etc)?
 
 I am thinking of making the Write as 1 and Read as QUORUM so R + W > RF (1 +
 2 > 2) to give strong consistency. Will that affect performance a lot
 (generally speaking)?
 
 Thanks in advance
 Regards
 
 Jay
 
 
  
 
 



Re: Replication factor - Consistency Questions

2012-07-19 Thread Kirk True

But isn't QUORUM on a 2-node cluster still 2 nodes?

On 07/17/2012 11:50 PM, Jason Tang wrote:
Yes, ALL is not good for HA, and because we met problems when 
using QUORUM, our current solution is to switch to Write:QUORUM / Read:QUORUM 
when we get an UnavailableException.


2012/7/18 Jay Parashar jparas...@itscape.com


Thanks, but write ALL will fail for any downed nodes. I am
thinking of QUORUM.

From: Jason Tang [mailto:ares.t...@gmail.com]
Sent: Tuesday, July 17, 2012 8:24 PM
To: user@cassandra.apache.org
Subject: Re: Replication factor - Consistency Questions

Hi

I am starting using Cassandra for not a long time, and also have
problems in consistency.

Here is some thinking.

If you have Write:Any / Read:One, it will have consistency
problem, and if you want to repair, check your schema, and check
the parameter Read repair chance: 

http://wiki.apache.org/cassandra/StorageConfiguration

And if you want consistent results, my suggestion is to
have Write:ALL / Read:One, since for Cassandra, writes are
faster than reads.

For performance impact, you need to test your traffic, and if your
memory can not cache all your data, or your network is not fast
enough, then yes, it will impact to write one more node.

BRs

2012/7/18 Jay Parashar jparas...@itscape.com

Hello all,

There is a lot of material on Replication factor and Consistency
level but I
am a little confused by what is happening on my setup. (Cassandra
1.1.2). I
would appreciate any answers.

My Setup: A cluster of 2 nodes evenly balanced. My RF =2,
Consistency Level;
Write = ANY and Read = 1

I know that my consistency is Weak but since my RF = 2, I thought
data would
be just duplicated in both the nodes but sometimes, querying does
not give
me the correct (or gives partial) results. In other times, it
gives me the
right results
Is the Read Repair going on after the first query? But as RF = 2,
data is
duplicated then why the repair?
Note: My query is done a while after the Writes so data should
have been in
both the nodes. Or is this not the case (flushing not happening etc)?

I am thinking of making the Write as 1 and Read as QUORUM so R + W > RF (1 +
2 > 2) to give strong consistency. Will that affect performance a lot
(generally speaking)?

Thanks in advance
Regards

Jay






Re: Replication factor - Consistency Questions

2012-07-18 Thread Jason Tang
Yes, ALL is not good for HA, and because we met problems when using
QUORUM, our current solution is to switch to Write:QUORUM / Read:QUORUM when we
get an UnavailableException.

2012/7/18 Jay Parashar jparas...@itscape.com

 Thanks, but write ALL will fail for any downed nodes. I am thinking of
 QUORUM.


 From: Jason Tang [mailto:ares.t...@gmail.com]
 Sent: Tuesday, July 17, 2012 8:24 PM
 To: user@cassandra.apache.org
 Subject: Re: Replication factor - Consistency Questions


 Hi


 I am starting using Cassandra for not a long time, and also have problems
 in consistency.


 Here is some thinking.

 If you have Write:Any / Read:One, it will have consistency problem, and if
 you want to repair, check your schema, and check the parameter Read repair
 chance: 

 http://wiki.apache.org/cassandra/StorageConfiguration 


 And if you want consistent results, my suggestion is to have
 Write:ALL / Read:One, since for Cassandra, writes are faster than reads.
 


 For performance impact, you need to test your traffic, and if your memory
 can not cache all your data, or your network is not fast enough, then yes,
 it will impact to write one more node.


 BRs


 2012/7/18 Jay Parashar jparas...@itscape.com

 Hello all,

 There is a lot of material on Replication factor and Consistency level but
 I
 am a little confused by what is happening on my setup. (Cassandra 1.1.2). I
 would appreciate any answers.

 My Setup: A cluster of 2 nodes evenly balanced. My RF =2, Consistency
 Level;
 Write = ANY and Read = 1

 I know that my consistency is Weak but since my RF = 2, I thought data
 would
 be just duplicated in both the nodes but sometimes, querying does not give
 me the correct (or gives partial) results. In other times, it gives me the
 right results
 Is the Read Repair going on after the first query? But as RF = 2, data is
 duplicated then why the repair?
 Note: My query is done a while after the Writes so data should have been in
 both the nodes. Or is this not the case (flushing not happening etc)?

 I am thinking of making the Write as 1 and Read as QUORUM so R + W > RF (1 +
 2 > 2) to give strong consistency. Will that affect performance a lot
 (generally speaking)?

 Thanks in advance
 Regards

 Jay

 




Re: Replication factor - Consistency Questions

2012-07-17 Thread Jason Tang
Hi

I am starting using Cassandra for not a long time, and also have problems
in consistency.

Here is some thinking.
If you have Write:Any / Read:One, it will have consistency problem, and if
you want to repair, check your schema, and check the parameter Read repair
chance: 
http://wiki.apache.org/cassandra/StorageConfiguration

And if you want consistent results, my suggestion is to have
Write:ALL / Read:One, since for Cassandra, writes are faster than reads.

For performance impact, you need to test your traffic, and if your memory
can not cache all your data, or your network is not fast enough, then yes,
it will impact to write one more node.

BRs
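
A minimal sketch of that Write:ALL / Read:ONE combination, setting the consistency level per statement with the DataStax Java driver (the thread itself predates it; keyspace and table names are placeholders):

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.ConsistencyLevel;
    import com.datastax.driver.core.Row;
    import com.datastax.driver.core.Session;
    import com.datastax.driver.core.SimpleStatement;

    public class WriteAllReadOne {
        public static void main(String[] args) {
            try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
                 Session session = cluster.connect()) {
                // Write at ALL: every replica must acknowledge, so any single replica
                // can later serve a consistent read at ONE.
                session.execute(new SimpleStatement(
                        "INSERT INTO demo_ks.demo_cf (id, val) VALUES (42, 'hello')")
                        .setConsistencyLevel(ConsistencyLevel.ALL));

                // Read at ONE: cheap, and consistent as long as the ALL write succeeded.
                Row row = session.execute(new SimpleStatement(
                        "SELECT val FROM demo_ks.demo_cf WHERE id = 42")
                        .setConsistencyLevel(ConsistencyLevel.ONE)).one();
                System.out.println(row.getString("val"));
            }
        }
    }

Note the trade-off raised elsewhere in this thread: a write at ALL fails as soon as any replica is down.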


2012/7/18 Jay Parashar jparas...@itscape.com

 Hello all,

 There is a lot of material on Replication factor and Consistency level but
 I
 am a little confused by what is happening on my setup. (Cassandra 1.1.2). I
 would appreciate any answers.

 My Setup: A cluster of 2 nodes evenly balanced. My RF =2, Consistency
 Level;
 Write = ANY and Read = 1

 I know that my consistency is Weak but since my RF = 2, I thought data
 would
 be just duplicated in both the nodes but sometimes, querying does not give
 me the correct (or gives partial) results. In other times, it gives me the
 right results
 Is the Read Repair going on after the first query? But as RF = 2, data is
 duplicated then why the repair?
 Note: My query is done a while after the Writes so data should have been in
 both the nodes. Or is this not the case (flushing not happening etc)?

 I am thinking of making the Write as 1 and Read as QUORUM so R + W > RF (1 +
 2 > 2) to give strong consistency. Will that affect performance a lot
 (generally speaking)?

 Thanks in advance
 Regards

 Jay





RE: Replication factor - Consistency Questions

2012-07-17 Thread Jay Parashar
Thanks, but write ALL will fail for any downed nodes. I am thinking of
QUORUM.

 

From: Jason Tang [mailto:ares.t...@gmail.com] 
Sent: Tuesday, July 17, 2012 8:24 PM
To: user@cassandra.apache.org
Subject: Re: Replication factor - Consistency Questions

 

Hi

 

I am starting using Cassandra for not a long time, and also have problems in
consistency.

 

Here is some thinking.

If you have Write:Any / Read:One, it will have consistency problem, and if
you want to repair, check your schema, and check the parameter Read repair
chance: 

http://wiki.apache.org/cassandra/StorageConfiguration 

 

And if you want consistent results, my suggestion is to have
Write:ALL / Read:One, since for Cassandra, writes are faster than reads.

 

For performance impact, you need to test your traffic, and if your memory
can not cache all your data, or your network is not fast enough, then yes,
it will impact to write one more node.

 

BRs

 

2012/7/18 Jay Parashar jparas...@itscape.com

Hello all,

There is a lot of material on Replication factor and Consistency level but I
am a little confused by what is happening on my setup. (Cassandra 1.1.2). I
would appreciate any answers.

My Setup: A cluster of 2 nodes evenly balanced. My RF =2, Consistency Level;
Write = ANY and Read = 1

I know that my consistency is Weak but since my RF = 2, I thought data would
be just duplicated in both the nodes but sometimes, querying does not give
me the correct (or gives partial) results. In other times, it gives me the
right results
Is the Read Repair going on after the first query? But as RF = 2, data is
duplicated then why the repair?
Note: My query is done a while after the Writes so data should have been in
both the nodes. Or is this not the case (flushing not happening etc)?

I am thinking of making the Write as 1 and Read as QUORUM so R + W > RF (1 +
2 > 2) to give strong consistency. Will that affect performance a lot
(generally speaking)?

Thanks in advance
Regards

Jay



 



Re: Replication factor

2012-05-30 Thread aaron morton
Ah. The lack of page cache hits after compaction makes sense. But I don't think 
the drastic effect it appears to have is expected. Do you have an idea of how 
much slower local reads get ?

If you are selecting coordinators based on token ranges the DS does not matter 
as much. It still has some utility, as the Digest reads will be happening on 
other nodes and it should help with selecting them. 

Thanks for the extra info. 

Aaron

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 25/05/2012, at 1:24 AM, Viktor Jevdokimov wrote:

 All data is in the page cache. No repairs. Compactions not hitting disk for 
 read. CPU 50%. ParNew GC 100 ms in average.
  
 After one compaction completes, new sstable is not in page cache, there may 
 be a disk usage spike before data is cached, so local reads gets slower for a 
 moment, comparing with other nodes. Redirecting almost all requests to other 
 nodes finally ends up with a huge latency spike almost on all nodes, 
 especially when ParNew GC may spike on one node (200ms). We call it “cluster 
 hiccup”, when incoming and outgoing network traffic drops for a moment.
  
 And such hiccups happen several times an hour, a few seconds long. Playing 
 with the badness threshold did not give much better results, but turning DS off 
 completely fixed all problems with latencies, node spikes, cluster hiccups 
 and network traffic drops.
  
 In our case, our client is selecting endpoints for a key by calculating a 
 token, so we always hit a replica.
  
  
 
 
 
 From: aaron morton [mailto:aa...@thelastpickle.com] 
 Sent: Thursday, May 24, 2012 13:00
 To: user@cassandra.apache.org
 Subject: Re: Replication factor
  
 Your experience is when using CL ONE the Dynamic Snitch is moving local reads 
 off to other nodes and this is causing spikes in read latency ? 
  
 Did you notice what was happening on the node for the DS to think it was so 
 slow ? Was compaction or repair going on ? 
  
 Have you played with the badness threshold 
 https://github.com/apache/cassandra/blob/trunk/conf/cassandra.yaml#L472 ? 
  
 Cheers
  
  
 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com
  
 On 24/05/2012, at 5:28 PM, Viktor Jevdokimov wrote:
 
 
 Depends on use case. For ours we have another experience and statistics, when 
 turning dynamic snitch off makes overall latency and spikes much, much lower.
  
  
  
 
  
 From: Brandon Williams [mailto:dri...@gmail.com] 
 Sent: Thursday, May 24, 2012 02:35
 To: user@cassandra.apache.org
 Subject: Re: Replication factor
  
 On Wed, May 23, 2012 at 5:51 AM, Viktor Jevdokimov 
 viktor.jevdoki...@adform.com wrote:
  When RF == number of nodes, and you read at CL ONE you will always be 
  reading locally.
 “always be reading locally” – only if Dynamic Snitch is “off”. With dynamic 
 snitch “on” request may be redirected to other node, which may introduce 
 latency spikes.
  
 Actually it's preventing spikes, since if it won't read locally that means 
 the local replica is in worse shape than the rest (compacting, repairing, 
 etc.)
  
 -Brandon 



Re: Replication factor

2012-05-24 Thread aaron morton
ReadRepair means including all UP replicas in the request, waiting 
asynchronously after the read has completed, resolving and repairing 
differences. If you read at QUORUM with RR running, all (replica) nodes will 
perform a read. 

At any CL > ONE the responses from CL nodes are reconciled and differences are 
repaired using a mechanism similar to RR. This has to happen before the 
response can be sent to the client.

The naming does not help, but they are different. RR is a background process 
designed to reduce the chance of an inconsistent read. It's needed less since 
the changes to Hinted Handoff in 1.0.

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 24/05/2012, at 5:47 AM, Daning Wang wrote:

 Thanks guys.
 
 Aaron, I am confused about this. From the wiki 
 (http://wiki.apache.org/cassandra/ReadRepair) it looks like, for any consistency 
 level, Read Repair will be done either before or after responding with data.
 
   Read Repair does not run at CL ONE
 
 Daning
 
 On Wed, May 23, 2012 at 3:51 AM, Viktor Jevdokimov 
 viktor.jevdoki...@adform.com wrote:
  When RF == number of nodes, and you read at CL ONE you will always be 
  reading locally.
 
 “always be reading locally” – only if Dynamic Snitch is “off”. With dynamic 
 snitch “on” request may be redirected to other node, which may introduce 
 latency spikes.
 
  
 
  
 
 
 
 
 From: aaron morton [mailto:aa...@thelastpickle.com] 
 Sent: Wednesday, May 23, 2012 13:00
 To: user@cassandra.apache.org
 Subject: Re: Replication factor
 
  
 
 RF is normally adjusted to modify availability (see 
 http://thelastpickle.com/2011/06/13/Down-For-Me/)
 
  
 
 for example, if I have 4 nodes cluster in one data center, how can RF=2 vs 
 RF=4 affect read performance? If consistency level is ONE, looks reading does 
 not need to go to another hop to get data if RF=4, but it would do more work 
 on read repair in the background.
 
 Read Repair does not run at CL ONE.
 
 When RF == number of nodes, and you read at CL ONE you will always be reading 
 locally. But with a low consistency.
 
 If you read with QUORUM when RF == number of nodes you will still get some 
 performance benefit from the data being read locally.
 
  
 
 Cheers
 
  
 
  
 
 -
 
 Aaron Morton
 
 Freelance Developer
 
 @aaronmorton
 
 http://www.thelastpickle.com
 
  
 
 On 23/05/2012, at 9:34 AM, Daning Wang wrote:
 
 
 
 
 Hello,
 
 What is the pros and cons to choose different number of replication factor in 
 term of performance? if space is not a concern.
 
 for example, if I have 4 nodes cluster in one data center, how can RF=2 vs 
 RF=4 affect read performance? If consistency level is ONE, looks reading does 
 not need to go to another hop to get data if RF=4, but it would do more work 
 on read repair in the background.
 
 Can you share some insights about this?
 
 Thanks in advance,
 
 Daning
 
  
 
 



Re: Replication factor

2012-05-24 Thread aaron morton
Your experience is when using CL ONE the Dynamic Snitch is moving local reads 
off to other nodes and this is causing spikes in read latency ? 

Did you notice what was happening on the node for the DS to think it was so 
slow ? Was compaction or repair going on ? 
 
Have you played with the badness threshold 
https://github.com/apache/cassandra/blob/trunk/conf/cassandra.yaml#L472 ? 

Cheers


-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 24/05/2012, at 5:28 PM, Viktor Jevdokimov wrote:

 Depends on use case. For ours we have another experience and statistics, when 
 turning dynamic snitch off makes overall latency and spikes much, much lower.
  
  
 
 
 
 From: Brandon Williams [mailto:dri...@gmail.com] 
 Sent: Thursday, May 24, 2012 02:35
 To: user@cassandra.apache.org
 Subject: Re: Replication factor
  
 On Wed, May 23, 2012 at 5:51 AM, Viktor Jevdokimov 
 viktor.jevdoki...@adform.com wrote:
  When RF == number of nodes, and you read at CL ONE you will always be 
  reading locally.
 “always be reading locally” – only if Dynamic Snitch is “off”. With dynamic 
 snitch “on” request may be redirected to other node, which may introduce 
 latency spikes.
  
 Actually it's preventing spikes, since if it won't read locally that means 
 the local replica is in worse shape than the rest (compacting, repairing, 
 etc.)
  
 -Brandon 



RE: Replication factor

2012-05-24 Thread Viktor Jevdokimov
All data is in the page cache. No repairs. Compactions not hitting disk for 
read. CPU 50%. ParNew GC 100 ms in average.

After one compaction completes, new sstable is not in page cache, there may be 
a disk usage spike before data is cached, so local reads gets slower for a 
moment, comparing with other nodes. Redirecting almost all requests to other 
nodes finally ends up with a huge latency spike almost on all nodes, especially 
when ParNew GC may spike on one node (200ms). We call it cluster hiccup, 
when incoming and outgoing network traffic drops for a moment.

And such hiccups happen several times an hour, a few seconds long. Playing with 
the badness threshold did not give much better results, but turning DS off 
completely fixed all problems with latencies, node spikes, cluster hiccups and 
network traffic drops.

In our case, our client is selecting endpoints for a key by calculating a 
token, so we always hit a replica.





From: aaron morton [mailto:aa...@thelastpickle.com]
Sent: Thursday, May 24, 2012 13:00
To: user@cassandra.apache.org
Subject: Re: Replication factor

Your experience is when using CL ONE the Dynamic Snitch is moving local reads 
off to other nodes and this is causing spikes in read latency ?

Did you notice what was happening on the node for the DS to think it was so 
slow ? Was compaction or repair going on ?

Have you played with the badness threshold 
https://github.com/apache/cassandra/blob/trunk/conf/cassandra.yaml#L472 ?

Cheers


-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 24/05/2012, at 5:28 PM, Viktor Jevdokimov wrote:


Depends on use case. For ours we have another experience and statistics, when 
turning dynamic snitch off makes overall latency and spikes much, much lower.




From: Brandon Williams [mailto:dri...@gmail.com]
Sent: Thursday, May 24, 2012 02:35
To: user@cassandra.apache.org
Subject: Re: Replication factor

On Wed, May 23, 2012 at 5:51 AM, Viktor Jevdokimov 
viktor.jevdoki...@adform.com wrote:
 When RF == number of nodes, and you read at CL ONE you will always be reading 
 locally.
"always be reading locally" - only if Dynamic Snitch is "off". With dynamic 
snitch "on" a request may be redirected to another node, which may introduce 
latency spikes.

Actually it's preventing spikes, since if it won't read locally that means the 
local replica is in worse shape than the rest (compacting, repairing, etc.)

-Brandon


Re: Replication factor

2012-05-23 Thread aaron morton
RF is normally adjusted to modify availability (see 
http://thelastpickle.com/2011/06/13/Down-For-Me/)

 for example, if I have 4 nodes cluster in one data center, how can RF=2 vs 
 RF=4 affect read performance? If consistency level is ONE, looks reading does 
 not need to go to another hop to get data if RF=4, but it would do more work 
 on read repair in the background.
Read Repair does not run at CL ONE.
When RF == number of nodes, and you read at CL ONE you will always be reading 
locally. But with a low consistency.
If you read with QUORUM when RF == number of nodes you will still get some 
performance benefit from the data being read locally.

Cheers


-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 23/05/2012, at 9:34 AM, Daning Wang wrote:

 Hello,
 
 What is the pros and cons to choose different number of replication factor in 
 term of performance? if space is not a concern.
 
 for example, if I have 4 nodes cluster in one data center, how can RF=2 vs 
 RF=4 affect read performance? If consistency level is ONE, looks reading does 
 not need to go to another hop to get data if RF=4, but it would do more work 
 on read repair in the background.
 
 Can you share some insights about this?
 
 Thanks in advance,
 
 Daning 
 



RE: Replication factor

2012-05-23 Thread Viktor Jevdokimov
 When RF == number of nodes, and you read at CL ONE you will always be reading 
 locally.
"always be reading locally" - only if Dynamic Snitch is "off". With dynamic 
snitch "on" a request may be redirected to another node, which may introduce 
latency spikes.





From: aaron morton [mailto:aa...@thelastpickle.com]
Sent: Wednesday, May 23, 2012 13:00
To: user@cassandra.apache.org
Subject: Re: Replication factor

RF is normally adjusted to modify availability (see 
http://thelastpickle.com/2011/06/13/Down-For-Me/)

for example, if I have 4 nodes cluster in one data center, how can RF=2 vs RF=4 
affect read performance? If consistency level is ONE, looks reading does not 
need to go to another hop to get data if RF=4, but it would do more work on 
read repair in the background.
Read Repair does not run at CL ONE.
When RF == number of nodes, and you read at CL ONE you will always be reading 
locally. But with a low consistency.
If you read with QUORUM when RF == number of nodes you will still get some 
performance benefit from the data being read locally.

Cheers


-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 23/05/2012, at 9:34 AM, Daning Wang wrote:


Hello,

What is the pros and cons to choose different number of replication factor in 
term of performance? if space is not a concern.

for example, if I have 4 nodes cluster in one data center, how can RF=2 vs RF=4 
affect read performance? If consistency level is ONE, looks reading does not 
need to go to another hop to get data if RF=4, but it would do more work on 
read repair in the background.

Can you share some insights about this?

Thanks in advance,

Daning


Re: Replication factor

2012-05-23 Thread Daning Wang
Thanks guys.

Aaron, I am confused about this. From the wiki
(http://wiki.apache.org/cassandra/ReadRepair) it looks like, for any consistency
level, Read Repair will be done either before or after responding with data.

  Read Repair does not run at CL ONE

Daning

On Wed, May 23, 2012 at 3:51 AM, Viktor Jevdokimov 
viktor.jevdoki...@adform.com wrote:

   When RF == number of nodes, and you read at CL ONE you will always be
 reading locally.

 “always be reading locally” – only if Dynamic Snitch is “off”. With
 dynamic snitch “on” request may be redirected to other node, which may
 introduce latency spikes.





  From: aaron morton [mailto:aa...@thelastpickle.com]
 Sent: Wednesday, May 23, 2012 13:00
 To: user@cassandra.apache.org
 Subject: Re: Replication factor


 RF is normally adjusted to modify availability (see
 http://thelastpickle.com/2011/06/13/Down-For-Me/)


 for example, if I have 4 nodes cluster in one data center, how can RF=2 vs
 RF=4 affect read performance? If consistency level is ONE, looks reading
 does not need to go to another hop to get data if RF=4, but it would do
 more work on read repair in the background.

  Read Repair does not run at CL ONE.

 When RF == number of nodes, and you read at CL ONE you will always be
 reading locally. But with a low consistency.

 If you read with QUORUM when RF == number of nodes you will still get some
 performance benefit from the data being read locally.


 Cheers



 -

 Aaron Morton

 Freelance Developer

 @aaronmorton

 http://www.thelastpickle.com


 On 23/05/2012, at 9:34 AM, Daning Wang wrote:



 

 Hello,

 What is the pros and cons to choose different number of replication factor
 in term of performance? if space is not a concern.

 for example, if I have 4 nodes cluster in one data center, how can RF=2 vs
 RF=4 affect read performance? If consistency level is ONE, looks reading
 does not need to go to another hop to get data if RF=4, but it would do
 more work on read repair in the background.

 Can you share some insights about this?

 Thanks in advance,

 Daning 



Re: Replication factor

2012-05-23 Thread Brandon Williams
On Wed, May 23, 2012 at 5:51 AM, Viktor Jevdokimov 
viktor.jevdoki...@adform.com wrote:

   When RF == number of nodes, and you read at CL ONE you will always be
 reading locally.

 “always be reading locally” – only if Dynamic Snitch is “off”. With
 dynamic snitch “on” request may be redirected to other node, which may
 introduce latency spikes.


Actually it's preventing spikes, since if it won't read locally that means
the local replica is in worse shape than the rest (compacting, repairing,
etc.)

-Brandon


RE: Replication factor

2012-05-23 Thread Viktor Jevdokimov
Depends on use case. For ours we have another experience and statistics, when 
turning dynamic snitch off makes overall latency and spikes much, much lower.





From: Brandon Williams [mailto:dri...@gmail.com]
Sent: Thursday, May 24, 2012 02:35
To: user@cassandra.apache.org
Subject: Re: Replication factor

On Wed, May 23, 2012 at 5:51 AM, Viktor Jevdokimov 
viktor.jevdoki...@adform.com wrote:
 When RF == number of nodes, and you read at CL ONE you will always be reading 
 locally.
"always be reading locally" - only if Dynamic Snitch is "off". With dynamic 
snitch "on" a request may be redirected to another node, which may introduce 
latency spikes.

Actually it's preventing spikes, since if it won't read locally that means the 
local replica is in worse shape than the rest (compacting, repairing, etc.)

-Brandon

Re: Replication factor per column family

2012-02-17 Thread R. Verlangen
Ok, that's clear, thank you for your time!

2012/2/16 aaron morton aa...@thelastpickle.com

 yes.

 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 16/02/2012, at 10:15 PM, R. Verlangen wrote:

 Hmm ok. This means if I want to have a CF with RF = 3 and another CF with
 RF = 1 (e.g. some debug logging) I will have to create 2 keyspaces?

 2012/2/16 aaron morton aa...@thelastpickle.com

 Multiple CF mutations for a row are treated atomically in the commit log,
 and they are sent together to the replicas. Replication occurs at the row
 level, not the row+cf level.

 If each CF had it's own RF, odd things may happen. Like sending a batch
 mutation for one row and two CF's that fails because there is not enough
 nodes for one of the CF's.

 Would be other reasons as well. In short it's baked in.

 Cheers

   -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 16/02/2012, at 9:54 PM, R. Verlangen wrote:

 Hi there,

 As the subject states: Is it possible to set a replication factor per
 column family?

 Could not find anything of recent releases. I'm running Cassandra 1.0.7
 and I think it should be possible on a per CF basis instead of the whole
 keyspace.

 With kind regards,
 Robin







Re: Replication factor per column family

2012-02-16 Thread aaron morton
Multiple CF mutations for a row are treated atomically in the commit log, and 
they are sent together to the replicas. Replication occurs at the row level, 
not the row+cf level. 

If each CF had it's own RF, odd things may happen. Like sending a batch 
mutation for one row and two CF's that fails because there is not enough nodes 
for one of the CF's. 

Would be other reasons as well. In short it's baked in. 

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 16/02/2012, at 9:54 PM, R. Verlangen wrote:

 Hi there,
 
 As the subject states: Is it possible to set a replication factor per column 
 family?
 
 Could not find anything of recent releases. I'm running Cassandra 1.0.7 and I 
 think it should be possible on a per CF basis instead of the whole keyspace.
 
 With kind regards,
 Robin
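
Since the replication factor is fixed per keyspace, the two-keyspace workaround discussed in this thread looks like the following in the CQL syntax that arrived after 1.0 (keyspace names are placeholders; on 1.0.7 the same thing was done through the CLI):

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Session;

    public class TwoKeyspacesTwoRFs {
        public static void main(String[] args) {
            try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
                 Session session = cluster.connect()) {
                // Main data: RF=3 for redundancy.
                session.execute("CREATE KEYSPACE IF NOT EXISTS app_data WITH replication = "
                        + "{'class': 'SimpleStrategy', 'replication_factor': 3}");
                // Debug logging: disposable, so RF=1 is enough.
                session.execute("CREATE KEYSPACE IF NOT EXISTS debug_logs WITH replication = "
                        + "{'class': 'SimpleStrategy', 'replication_factor': 1}");
            }
        }
    }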



Re: Replication factor per column family

2012-02-16 Thread R. Verlangen
Hmm ok. This means if I want to have a CF with RF = 3 and another CF with
RF = 1 (e.g. some debug logging) I will have to create 2 keyspaces?

2012/2/16 aaron morton aa...@thelastpickle.com

 Multiple CF mutations for a row are treated atomically in the commit log,
 and they are sent together to the replicas. Replication occurs at the row
 level, not the row+cf level.

 If each CF had it's own RF, odd things may happen. Like sending a batch
 mutation for one row and two CF's that fails because there is not enough
 nodes for one of the CF's.

 Would be other reasons as well. In short it's baked in.

 Cheers

 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 16/02/2012, at 9:54 PM, R. Verlangen wrote:

 Hi there,

 As the subject states: Is it possible to set a replication factor per
 column family?

 Could not find anything of recent releases. I'm running Cassandra 1.0.7
 and I think it should be possible on a per CF basis instead of the whole
 keyspace.

 With kind regards,
 Robin





Re: Replication factor per column family

2012-02-16 Thread aaron morton
yes.

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 16/02/2012, at 10:15 PM, R. Verlangen wrote:

 Hmm ok. This means if I want to have a CF with RF = 3 and another CF with RF 
 = 1 (e.g. some debug logging) I will have to create 2 keyspaces?
 
 2012/2/16 aaron morton aa...@thelastpickle.com
 Multiple CF mutations for a row are treated atomically in the commit log, and 
 they are sent together to the replicas. Replication occurs at the row level, 
 not the row+cf level. 
 
 If each CF had it's own RF, odd things may happen. Like sending a batch 
 mutation for one row and two CF's that fails because there is not enough 
 nodes for one of the CF's. 
 
 Would be other reasons as well. In short it's baked in. 
 
 Cheers
 
 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com
 
 On 16/02/2012, at 9:54 PM, R. Verlangen wrote:
 
 Hi there,
 
 As the subject states: Is it possible to set a replication factor per 
 column family?
 
 Could not find anything of recent releases. I'm running Cassandra 1.0.7 and 
 I think it should be possible on a per CF basis instead of the whole 
 keyspace.
 
 With kind regards,
 Robin
 
 



Re: Replication factor and other schema changes in = 0.7

2010-08-20 Thread Gary Dusbabek
It is coming.  In fact, I started working on this ticket yesterday.
Most of the settings that you could change before will be modifiable.
Unfortunately, you must still manually perform the repair operations,
etc., afterward.

https://issues.apache.org/jira/browse/CASSANDRA-1285

Gary.
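
For reference, once keyspace settings became modifiable the flow Gary describes boils down to a schema change followed by a manual repair. A sketch in the CQL form that later replaced the 0.7 schema-migration API (keyspace name is a placeholder):

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Session;

    public class RaiseReplicationFactor {
        public static void main(String[] args) {
            try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
                 Session session = cluster.connect()) {
                // Bump the replication factor of an existing keyspace.
                session.execute("ALTER KEYSPACE app_data WITH replication = "
                        + "{'class': 'SimpleStrategy', 'replication_factor': 3}");
                // The new replicas do not receive existing data automatically:
                // run `nodetool repair app_data` on each node afterwards.
            }
        }
    }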


On Thu, Aug 19, 2010 at 18:00, Andres March ama...@qualcomm.com wrote:
 How should we go about changing the replication factor and other keyspace
 settings now that it and other KSMetaData are no longer managed in
 cassandra.yaml?

 I found makeDefinitionMutation() in the Migration class and see that it is
 called for the other schema migrations.  There just seems to be a big gap in
 the management API for the KSMetaData we might want to change.
 --
 Andres March
 ama...@qualcomm.com
 Qualcomm Internet Services


Re: Replication factor and other schema changes in = 0.7

2010-08-20 Thread Andres March

 Cool, thanks.  I suspected the same, including the repair.

On 08/20/2010 06:05 AM, Gary Dusbabek wrote:

It is coming.  In fact, I started working on this ticket yesterday.
Most of the settings that you could change before will be modifiable.
Unfortunately, you must still manually perform the repair operations,
etc., afterward.

https://issues.apache.org/jira/browse/CASSANDRA-1285

Gary.


On Thu, Aug 19, 2010 at 18:00, Andres Marchama...@qualcomm.com  wrote:

How should we go about changing the replication factor and other keyspace
settings now that it and other KSMetaData are no longer managed in
cassandra.yaml?

I found makeDefinitionMutation() in the Migration class and see that it is
called for the other schema migrations.  There just seems to be a big gap in
the management API for the KSMetaData we might want to change.
--
Andres March
ama...@qualcomm.com
Qualcomm Internet Services


--
Andres March
ama...@qualcomm.com
Qualcomm Internet Services


Re: Replication Factor and Data Centers

2010-06-15 Thread Jonathan Ellis
(moving to user@)

On Mon, Jun 14, 2010 at 10:43 PM, Masood Mortazavi
masoodmortaz...@gmail.com wrote:
 Is the clearer interpretation of this statement (in
 conf/datacenters.properties) given anywhere else?

 # The sum of all the datacenter replication factor values should equal
 # the replication factor of the keyspace (i.e. sum(dc_rf) = RF)

 # keyspace\:datacenter=replication factor
 Keyspace1\:DC1=3
 Keyspace1\:DC2=2
 Keyspace1\:DC3=1

 Does the above example configuration imply that Keyspace1 has an RF of 6, and
 that of these, 3 replicas will go to DC1, 2 to DC2 and 1 to DC3?

Yes.

 What will happen if datacenters.properties and cassandra-rack.properties are
 simply empty?

You have an illegal configuration.
https://issues.apache.org/jira/browse/CASSANDRA-1191 is open to have
Cassandra raise an error under this condition and others.

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
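
For comparison, the same per-datacenter layout as the datacenters.properties example above (3 in DC1, 2 in DC2, 1 in DC3, a total RF of 6) is expressed directly in the keyspace definition once NetworkTopologyStrategy is used; in later CQL it looks roughly like this, with DC names that must match what the snitch reports:

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Session;

    public class PerDatacenterReplication {
        public static void main(String[] args) {
            try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
                 Session session = cluster.connect()) {
                // 3 replicas in DC1, 2 in DC2, 1 in DC3, i.e. a total RF of 6.
                session.execute("CREATE KEYSPACE IF NOT EXISTS keyspace1 WITH replication = "
                        + "{'class': 'NetworkTopologyStrategy', 'DC1': 3, 'DC2': 2, 'DC3': 1}");
            }
        }
    }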