Re: Is it possible to bootstrap the 1st node of a new DC?

2015-09-08 Thread horschi
Hi Tom, "The idea of join_ring=false is that other nodes are not aware of the new node, and therefore never send requests to it. The new node can then be repaired" Nicely explained, but I still see the issue that this node would not receive writes during that time. So after the repair the node

Re: How to prevent queries being routed to new DC?

2015-09-08 Thread Tom van den Berge
Hi Anuj, That could indeed explain reads on my new DC. However, what I'm seeing in my client application is that every now and then, a read query does not produce any result, while I'm sure that it should. If I understand the read repair process correctly, it will never cause a read query to fail to

Re: Some love for multi-partition LWT?

2015-09-08 Thread Marek Lewandowski
Are you absolutely sure that a lock is required? I could imagine that multiple Paxos rounds could be played for different partitions and that these rounds would be dependent on each other. Performance aside, can you please elaborate on where you see such a need for a lock? On 8 Sep 2015 00:05, "DuyHai Doan"

Re: Is it possible to bootstrap the 1st node of a new DC?

2015-09-08 Thread Tom van den Berge
> > "one drawback: the node joins the cluster as soon as the bootstrapping > begins." > I am not sure I understand this correctly. It will get tokens, but not > load data if you combine it with autobootstrap=false. > Joining the cluster means that all other nodes become aware of the new node, and

Re: Some love for multi-partition LWT?

2015-09-08 Thread woolfel
There's quite a bit of literature on the topic. Look at what is in ACM Queue and you'll see that what others are saying is accurate. To get that guarantee you need a distributed lock or a different design like Datomic. Look at what Rich Hickey has done with Datomic. Sent from my iPhone > On Sep 8, 2015,

Re: Some love for multi-partition LWT?

2015-09-08 Thread Marek Lewandowski
"This simple example shows how hard it is to implement multi-partition Paxos rounds. The fact that you have multiple Paxos rounds that are dependent on each other break the safety guarantees of the original Paxos paper. " What if this dependency is explicitly specified in proposal. Assume that

Re: Some love for multi-partition LWT?

2015-09-08 Thread Peter Lin
I would caution against using Paxos for distributed transactions in an inappropriate way. The model has to be logically and mathematically correct, otherwise you end up with corrupt data. In the worst case, it could cause a cascading failure that brings down the cluster. I've seen distributed systems come to

Re: Some love for multi-partition LWT?

2015-09-08 Thread DuyHai Doan
"Do you think it could work?" At first glance, maybe, but it would involve a huge number of round trips and a lot of contentions. You'll risk serious deadlocks. Second, to prove that a solution works, you'll need to prove that it works for ALL situations, not just a few. Proving something wrong

Re: Some love for multi-partition LWT?

2015-09-08 Thread DuyHai Doan
"I could imagine that multiple paxos rounds could be played for different partitions and these rounds would be dependent on each other" Example of cluster of 10 nodes (N1 ... N10) and RF=3. Suppose a LWT with 2 partitions and 2 mutations M1 & M2, coordinator C1. It will imply 2 Paxos rounds with

Re: Replacing dead node and cassandra.replace_address

2015-09-08 Thread Maciek Sakrejda
On Tue, Sep 8, 2015 at 11:14 AM, sai krishnam raju potturi < pskraj...@gmail.com> wrote: > Once the new node is bootstrapped, you could remove replacement_address > from the env.sh file > Thanks, but how do I know when bootstrapping is completed?

Re: Old SSTables lying around

2015-09-08 Thread Vidur Malik
I did add a comment to that ticket, but the problem seems slightly different; moreover the compaction strategies are different. I'm also not seeing the error that abliss is reporting. On Tue, Sep 8, 2015 at 2:47 PM, Robert Coli wrote: > On Tue, Sep 8, 2015 at 10:32 AM,

Re: Trace evidence for LOCAL_QUORUM ending up in remote DC

2015-09-08 Thread Bryan Cheng
Tom, I don't believe so; it seems the symptom would be an indefinite (or very long) hang. To clarify, is this issue restricted to LOCAL_QUORUM? Can you issue a LOCAL_ONE SELECT and retrieve the expected data back? On Tue, Sep 8, 2015 at 12:02 PM, Tom van den Berge < tom.vandenbe...@gmail.com>
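
A quick way to run that check from cqlsh (keyspace, table, and key below are placeholders):

  CONSISTENCY LOCAL_ONE;
  SELECT * FROM myks.mytable WHERE id = 42;

Comparing the result against the same SELECT at LOCAL_QUORUM helps isolate whether the local replicas actually hold the data.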

Re: Trace evidence for LOCAL_QUORUM ending up in remote DC

2015-09-08 Thread Tom van den Berge
Just to be sure: can this bug result in a 0-row result while it should be > 0 ? Op 8 sep. 2015 6:29 PM schreef "Tyler Hobbs" : > See https://issues.apache.org/jira/browse/CASSANDRA-9753 > > On Tue, Sep 8, 2015 at 10:22 AM, Tom van den Berge < > tom.vandenbe...@gmail.com>

Re: Is it possible to bootstrap the 1st node of a new DC?

2015-09-08 Thread Tom van den Berge
> Running nodetool rebuild on a node that was started with join_ring=false >> does not work, unfortunately. The nodetool command returns immediately, >> after a message appears in the log that the streaming of data has started. >> After that, nothing happens. > > > Per driftx, the author of

Re: Is it possible to bootstrap the 1st node of a new DC?

2015-09-08 Thread Robert Coli
On Tue, Sep 8, 2015 at 1:39 AM, horschi wrote: > "The idea of join_ring=false is that other nodes are not aware of the new > node, and therefore never send requests to it. The new node can then be > repaired" > Nicely explained, but I still see the issue that this node would

Re: Trace evidence for LOCAL_QUORUM ending up in remote DC

2015-09-08 Thread Nate McCall
> > Just to be sure: can this bug result in a 0-row result while it should be > > 0 ? > Per Tyler's reference to CASSANDRA-9753 , you would see this if the read was routed by speculative retry to the nodes that were not yet finished being

Re: Trace evidence for LOCAL_QUORUM ending up in remote DC

2015-09-08 Thread Tom van den Berge
Nate, I've disabled it, and it's been running for about an hour now without problems, while before, the problem occurred roughly every few minutes. I guess it's safe to say that this proves that CASSANDRA-9753 is the cause of the problem.
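
For anyone hitting the same problem, the setting is per table and can be switched off with plain CQL (the table name below is a placeholder):

  ALTER TABLE myks.mytable WITH speculative_retry = 'NONE';

and set back to the default of '99percentile' once the fix for CASSANDRA-9753 is in place.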

Re: Is it possible to bootstrap the 1st node of a new DC?

2015-09-08 Thread horschi
Hi Robert, I tried to set up a new node with join_ring=false once. In my test that node did not pick a token in the ring. I assume running repair or rebuild would not do anything in that case: No tokens = no data. But I must admit: I have not tried running rebuild. Is a new node with

Re: who does generate timestamp during the write?

2015-09-08 Thread ibrahim El-sanosi
Yes, thank you a lot. On Tue, Sep 8, 2015 at 5:25 PM, Tyler Hobbs wrote: > > On Sat, Sep 5, 2015 at 8:32 AM, ibrahim El-sanosi < > ibrahimsaba...@gmail.com> wrote: > >> So in this scenario, the latest data that wrote to the replicas is [K1, >> V2] which should be the correct

Re: Is it possible to bootstrap the 1st node of a new DC?

2015-09-08 Thread Robert Coli
On Tue, Sep 8, 2015 at 2:39 PM, horschi wrote: > I tried to set up a new node with join_ring=false once. In my test that > node did not pick a token in the ring. I assume running repair or rebuild > would not do anything in that case: No tokens = no data. But I must admit: > I

Question about consistency

2015-09-08 Thread Eric Plowe
I'm using Cassandra as a storage mechanism for session state persistence for an ASP.NET web application. I am seeing issues where the session state is persisted on one page (setting a value: Session["key"] = "value") and when it redirects to another (from a post back event) and checks for the

Re: Question about consistency

2015-09-08 Thread Robert Coli
On Tue, Sep 8, 2015 at 4:40 PM, Eric Plowe wrote: > I'm using Cassandra as a storage mechanism for session state persistence > for an ASP.NET web application. I am seeing issues where the session > state is persisted on one page (setting a value: Session["key"] = > "value"

Re: Replacing dead node and cassandra.replace_address

2015-09-08 Thread Vasileios Vlachos
I think you should be able to see the streaming process by running nodetool netstats. I also think system.log displays similar information about streaming/when streaming is finished. Shouldn't the state of the node change to UP when bootstrap is completed as well? People, correct me if I'm wrong
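
A concrete way to watch it, assuming the default packaged log location:

  nodetool netstats      # shows active streams; reports "Mode: NORMAL" once joined
  grep -Ei 'bootstrap|stream' /var/log/cassandra/system.log

nodetool status run on another node should also show the new node move from UJ (up/joining) to UN (up/normal).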

Re: Question about consistency

2015-09-08 Thread Eric Plowe
Rob, All writes/reads are happening from DC1. DC2 is a backup. The web app does not handle live requests from DC2. Regards, Eric Plowe On Tuesday, September 8, 2015, Robert Coli wrote: > On Tue, Sep 8, 2015 at 4:40 PM, Eric Plowe

Re: Question about consistency

2015-09-08 Thread Eric Plowe
To further expand: we have two data centers, Miami and Dallas. Dallas is our disaster recovery data center. The cluster has 12 nodes, 6 in Miami and 6 in Dallas. The servers in Miami only read/write to Miami using the data-center-aware load balancing policy of the driver. We have the problem when
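
One thing worth checking in a setup like this: read-your-own-writes within Miami only holds when the write and the subsequent read overlap on at least one Miami replica. As an illustration, assuming RF=3 per data center: a LOCAL_QUORUM write touches 2 local replicas and a LOCAL_QUORUM read consults 2, and 2 + 2 = 4 > 3, so they always overlap; with LOCAL_ONE on both sides, 1 + 1 = 2 <= 3, so the read can hit a replica the write has not reached yet.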

Re: who does generate timestamp during the write?

2015-09-08 Thread Tyler Hobbs
On Sat, Sep 5, 2015 at 8:32 AM, ibrahim El-sanosi wrote: > So in this scenario, the latest data that wrote to the replicas is [K1, > V2] which should be the correct one, but it reads [K1,V1] because of a skewed > clock. > > Can such a scenario occur? > Yes, it most
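
For context, the write timestamp is normally assigned by the coordinator (or by the client when the driver uses client-side timestamps), so a skewed clock can make an older value win. An application can also pin it explicitly; the table and value below are illustrative:

  INSERT INTO myks.kv (k, v) VALUES ('K1', 'V2') USING TIMESTAMP 1441724461000000;

where the timestamp is in microseconds since the Unix epoch.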

Re: Trace evidence for LOCAL_QUORUM ending up in remote DC

2015-09-08 Thread Tyler Hobbs
See https://issues.apache.org/jira/browse/CASSANDRA-9753 On Tue, Sep 8, 2015 at 10:22 AM, Tom van den Berge < tom.vandenbe...@gmail.com> wrote: > I've been bugging you a few times, but now I've got trace data for a query > with LOCAL_QUORUM that is being sent to a remote data center. > > The

Trace evidence for LOCAL_QUORUM ending up in remote DC

2015-09-08 Thread Tom van den Berge
I've been bugging you a few times, but now I've got trace data for a query with LOCAL_QUORUM that is being sent to a remote data center. The setup is as follows: NetworkTopologyStrategy: {"DC1":"1","DC2":"2"} Both DC1 and DC2 have 2 nodes. In DC2, one node is currently being rebuilt, and
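
In CQL terms that replication setup corresponds to something like the following (the keyspace name is a placeholder):

  CREATE KEYSPACE myks WITH replication =
    {'class': 'NetworkTopologyStrategy', 'DC1': 1, 'DC2': 2};

i.e. one replica of every row in DC1 and two in DC2.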

Re: Old SSTables lying around

2015-09-08 Thread Vidur Malik
Bump on this. Anybody have any insight/need more info? On Fri, Sep 4, 2015 at 5:09 PM, Vidur Malik wrote: > Hey, > > We're running a Cassandra 2.2.0 cluster with 8 nodes. We are doing > frequent updates to our data and we have very few reads, and we are using > Leveled

Replacing dead node and cassandra.replace_address

2015-09-08 Thread Maciek Sakrejda
According to the docs [1], when replacing a Cassandra node, I should start the replacement with cassandra.replace_address specified. Does that just become part of the replacement node's startup configuration? Can I (or do I have to) stop specifying it at some point? Does this affect subsequent
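
For what it's worth, it is a JVM system property set at startup, e.g. in cassandra-env.sh (the address below is just a placeholder):

  JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address=10.0.0.12"

or passed directly as cassandra -Dcassandra.replace_address=10.0.0.12, and removed again once the replacement has finished bootstrapping, per the reply earlier in this thread.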