Re: When are hints written?

2016-04-21 Thread Jan
Hi Bo,

you raised two questions:
20% system utilization
Hints

20% system utilization: for a node or a cluster to have 20% utilization is
normal during peak write operations.
Hints: hints are written when a node is unreachable; C* 3.0 has a
complete overhaul of the way hints are implemented.

I recommend reading this blog article:
http://www.datastax.com/dev/blog/whats-coming-to-cassandra-in-3-0-improved-hint-storage-and-delivery
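For reference, these are the cassandra.yaml settings that govern this behaviour (a
minimal sketch with illustrative values, not a recommendation; check the defaults
shipped with your own version):

hinted_handoff_enabled: true        # coordinator stores a hint when a replica write fails or times out
max_hint_window_in_ms: 10800000     # stop storing hints once a node has been down for 3 hours
write_request_timeout_in_ms: 2000   # how long the coordinator waits for a replica ack before hinting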

hope this helps
Jan/



On Thu, 4/21/16, Jens Rantil  wrote:

 Subject: Re: When are hints written?
 To: user@cassandra.apache.org
 Date: Thursday, April 21, 2016, 8:57 AM
 
 Hi again Bo,
 I assume this is the piece of
 documentation you are referring to? 
http://docs.datastax.com/en/cassandra/2.0/cassandra/dml/dml_about_hh_c.html?scroll=concept_ds_ifg_jqx_zj__performance
 
 > If a
 replica node is overloaded or unavailable, and the failure
 detector has not yet marked it down, then expect most or all
 writes to that node to fail after the timeout triggered by
 write_request_timeout_in_ms,
 which defaults to 10 seconds. During that time, Cassandra
 writes the hint when the timeout is reached.
 I'm not an expert on this, but the way I've seen it is that hints are stored as
 soon as there is _any_ issue writing a mutation (insert/update/delete) to a node.
 By "issue", that essentially means that a node hasn't acknowledged back to the
 coordinator that the write succeeded within write_request_timeout_in_ms. This
 includes TCP/socket timeouts, connection issues or that the node is down. The
 hints are stored for a maximum timespan defaulting to 3 hours.
 
 Cheers,
 Jens
 On Thu, Apr 21, 2016 at 8:06 AM Bo Finnerup Madsen wrote:
 Hi Jens,
 Thank you for the tip! ALL would definitely cure our hints issue, but as you note,
 it is not optimal as we are unable to take down nodes without clients failing.
 I am most probably overlooking
 something in the documentation, but I cannot see any
 description of when hints are written other than when a node
 is marked as being down. And since none of our nodes have
 been marked as being down (at least according to the logs),
 I suspect that there is some timeout that governs when hints
 are written?
 Regarding
 your other post: Yes, 3.0.3 is pretty new. But we are new to
 this cassandra game, and our schema-fu is not strong enough
 for us to create a schema without using materialized views
 :)
 
 On Wed, 20 Apr 2016 at 17:09, Jens Rantil wrote:
 Hi Bo,
 > In our case, I would like for the
 cluster to wait for the write to be persisted on the
 relevant nodes before returning an ok to the client. But I don't know which
 knobs to turn to accomplish this? or if it is even possible
 :)
 This is what write consistency
 option is for. Have a look at 
https://docs.datastax.com/en/cassandra/2.0/cassandra/dml/dml_config_consistency_c.html.
 Note, however that if you use ALL, your clients will fail
 (throw exception, depending on language) as soon as a single
 partition can't be written. This means you can't do
 online maintenance of a Cassandra node (such as upgrading it
 etc.) without experiencing write issues.
 Cheers,
 Jens
 On Wed, Apr 20, 2016 at 3:39 PM Bo Finnerup Madsen
 
 wrote:
 Hi,
 We have a small 5 node cluster of m4.xlarge instances that receives writes from ~20
 clients. The clients will write as fast as they can, and the whole process is limited
 by the write performance of the cassandra cluster. After we have tweaked our schema to
 avoid large partitions, the load is going ok and we don't see any warnings or errors
 in the cassandra logs. But we do see quite a lot of hint handoff activity. During the
 load, the cassandra nodes are quite loaded, with linux reporting a load as high as 20.
 I have read the available documentation on how hints work, and to my understanding
 hints should only be written if a node is down. But as far as I can see, none of the
 nodes are marked as down during the load. So I suspect I am missing something :)
 We have configured the servers with write_request_timeout_in_ms: 12 and the clients
 with a timeout of 13, but still get hints stored.
 In our case, I
 would like for the cluster to wait for the write to be
 persisted on the relevant nodes before returning an ok to
 the client. But I don't know which knobs to turn to
 accomplish this? or if it is even possible :)
 We are running cassandra 3.0.3, with
 8Gb heap and a replication factor of 3.
 Thank you in advance!
 Yours sincerely,  Bo
 Madsen
 -- 
 Jens Rantil
 Backend Developer @ Tink
 Tink AB, Wallingatan 5, 111 60 Stockholm, Sweden
 For urgent matters you can reach me at +46-708-84 18 32.

 -- 
 Jens Rantil
 Backend Developer @ Tink
 Tink AB, Wallingatan 5, 111 60 Stockholm, Sweden
 For urgent matters you can reach me at +46-708-84 18 32.


RE: Problem Replacing a Dead Node

2016-04-21 Thread Jan
Mir, 

You can take a live node out of the cluster with nodetool decommission (run on that 
node), or remove a dead one with nodetool removetoken (run from any other machine). 
This will assign the ranges the old node was responsible for to other nodes, 
and replicate the appropriate data there. If decommission is used, the data 
will stream from the decommissioned node. If removetoken is used, the data will 
stream from the remaining replicas.


Hope this helps
Jan/


On Thu, 4/21/16, Anubhav Kale  wrote:

 Subject: RE: Problem Replacing a Dead Node
 To: "user@cassandra.apache.org" 
 Date: Thursday, April 21, 2016, 6:34 PM
 
 Reusing the bootstrapping node could have caused this, but hard to tell. Since you
 have only 7 nodes, have you tried doing a few rolling restarts of all nodes to let
 gossip settle? Also, the node is pingable from other nodes even though it says
 Unreachable below. Correct?

 Based on nodetool status, it appears the node has streamed all the data it needs,
 but it doesn't think it has joined the ring yet. Does cqlsh work on that node?

 From: Mir Tanvir Hossain [mailto:mir.tanvir.hoss...@gmail.com]
 Sent: Thursday, April 21, 2016 11:51 AM
 To: user@cassandra.apache.org
 Subject: Re: Problem Replacing a Dead Node

 Here is a bit more detail of the whole situation. I am hoping someone can help me
 out here.

 We have a seven node cluster. One of the nodes started to have issues but it was
 running. We decided to add a new node, and remove the problematic node after the
 new node joins. However, the new node did not join the cluster even after three
 days. Hence, we decided to go with the replacement option. We shut down the
 problematic node. After that, we stopped cassandra on the bootstrapping node,
 deleted all the data, and restarted that node as the replacement node for the
 problematic node.

 Since we reused the bootstrapping node as the replacement node, I am wondering
 whether that is causing any issue. Any insights are appreciated.

 This is the output of nodetool describecluster from the replacement node, and two
 other nodes.

 mhossain@cassandra-24:~$ nodetool describecluster
 Cluster Information:
Name: App
Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
Schema versions:
80649e67-8ed9-38a4-8afa-560be7c694f4: [10.0.7.80, 10.0.7.4, 10.0.7.190, 10.0.7.100, 10.0.7.195, 10.0.7.160, 10.0.7.176]

 mhossain@cassandra-13:~$ nodetool describecluster
 Cluster Information:
Name: App
Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
Schema versions:
80649e67-8ed9-38a4-8afa-560be7c694f4: [10.0.7.80, 10.0.7.190, 10.0.7.100, 10.0.7.195, 10.0.7.160, 10.0.7.176]

UNREACHABLE: [10.0.7.91, 10.0.7.4]

 mhossain@cassandra-09:~$ nodetool describecluster
 Cluster Information:
Name: App
Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
Schema versions:
80649e67-8ed9-38a4-8afa-560be7c694f4: [10.0.7.80, 10.0.7.190, 10.0.7.100, 10.0.7.195, 10.0.7.160, 10.0.7.176]

UNREACHABLE: [10.0.7.91, 10.0.7.4]

 cassandra-24 (10.0.7.4) is the replacement node. 10.0.7.91 is the ip address of the
 dead node.

 -Mir

 On Thu, Apr 21, 2016 at 10:02 AM, Mir Tanvir Hossain wrote:

 Hi, I am trying to replace a

Re: Combining two clusters/keyspaces into single cluster

2016-04-21 Thread Jan
Hi, 

Your objective is to add keyspace2 to cluster1. 
The documentation link being referred to is for adding a new datacenter [not 
applicable to you].

You need to: 
a. take a snapshot of keyspace2 on cluster2
b. use sstableloader to copy keyspace2 onto cluster1 (see the sketch below)
c. run a 'nodetool repair' on cluster1
d. decommission cluster2.

You are ready to use cluster 1 [with both keyspaces within it]
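As a rough sketch of steps a-c (host names and paths below are placeholders, and
keyspace2's schema must already exist on cluster1 before sstableloader can stream
into it):

# on a cluster2 node: snapshot keyspace2
nodetool snapshot -t migrate keyspace2

# lay each table's snapshot out as <dir>/keyspace2/<table>/ and stream it to cluster1
# (-d takes any live cluster1 node)
sstableloader -d cluster1-node1 /tmp/load/keyspace2/<table>/

# on cluster1: repair the newly loaded keyspace
nodetool repair keyspace2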

Hope this helps
Jan



On Thu, 4/21/16, Arlington Albertson  wrote:

 Subject: Combining two clusters/keyspaces into single cluster
 To: user@cassandra.apache.org
 Date: Thursday, April 21, 2016, 6:15 PM
 
 Hey Folks,
 I've been looking through various documentations, but I'm either overlooking
 something obvious or not wording it correctly, but the gist of my problem is this:
 I have two cassandra clusters, with two separate keyspaces on EC2. We'll call them
 as follows:
 cluster1 (DC name, cluster name, etc...)
 keyspace1 (only exists on cluster1)
 cluster2 (DC name, cluster name, etc...)
 keyspace2 (only exists on cluster2)
 I need to perform the following:
 - take keyspace2, and add it to cluster1 so that all nodes can serve the traffic
 - needs to happen "live" so that I can repoint new instances to the cluster1
   endpoints and they'll just start working, and no longer directly use cluster2
 - eventually, tear down cluster2 (easy with a `nodetool decommission` after
   verifying all seeds have been changed, etc...)

 This doc seems to be the closest I've found thus far:
 https://docs.datastax.com/en/cassandra/2.1/cassandra/operations/ops_add_dc_to_cluster_t.html

 Is that the appropriate guide for this and I'm just over thinking it? Or is there
 something else I should be looking at?
 Also, this is DSC C* 2.1.13. 
 TIA!
 -AA


Re: Problem Replacing a Dead Node

2016-04-21 Thread Mir Tanvir Hossain
I will try a rolling restart to see whether that helps. The replacement
node is pingable from other cassandra nodes. I also was able to telnet to
the storage port (7000) of the replacement node as well from another node.
cqlsh doesn't work on the new node. When does gossip settle?

Is there any way to force the node to join the ring?

-Mir

On Thu, Apr 21, 2016 at 4:34 PM, Anubhav Kale 
wrote:

> Reusing the bootstrapping node could have caused this, but hard to tell.
> Since you have only 7 nodes, have you tried doing a few rolling restarts of
> all nodes to let gossip settle ? Also, the node is pingable from other
> nodes even though it says Unreachable below. Correct ?
>
>
>
> Based on nodetool status, it appears the node has streamed all the data it
> needs, but it doesn’t think it has joined the ring yet. Does cqlsh work on
> that node ?
>
>
>
> *From:* Mir Tanvir Hossain [mailto:mir.tanvir.hoss...@gmail.com]
> *Sent:* Thursday, April 21, 2016 11:51 AM
> *To:* user@cassandra.apache.org
> *Subject:* Re: Problem Replacing a Dead Node
>
>
>
> Here is a bit more detail of the whole situation. I am hoping someone can
> help me out here.
>
>
>
> We have a seven node cluster. One of the nodes started to have issues but it
> was running. We decided to add a new node, and remove the problematic node
> after the new node joins. However, the new node did not join the cluster
> even after three days. Hence, we decided to go with the replacement option.
> We shut down the problematic node. After that, we stopped cassandra on the
> bootstrapping node, deleted all the data, and restarted that node as the
> replacement node for the problematic node.
>
>
>
> Since we reused the bootstrapping node as the replacement node, I am
> wondering whether that is causing any issue. Any insights are appreciated.
>
>
>
> This is the output of nodetool describecluster from the replacement node,
> and two other nodes.
>
>
>
> mhossain@cassandra-24:~$ nodetool describecluster
>
> Cluster Information:
>
> Name: App
>
> Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
>
> Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
>
> Schema versions:
>
> 80649e67-8ed9-38a4-8afa-560be7c694f4: [10.0.7.80,
> 10.0.7.4, 10.0.7.190, 10.0.7.100, 10.0.7.195, 10.0.7.160, 10.0.7.176]
>
>
>
>
>
> mhossain@cassandra-13:~$ nodetool describecluster
>
> Cluster Information:
>
> Name: App
>
> Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
>
> Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
>
> Schema versions:
>
> 80649e67-8ed9-38a4-8afa-560be7c694f4: [10.0.7.80,
> 10.0.7.190, 10.0.7.100, 10.0.7.195, 10.0.7.160, 10.0.7.176]
>
>
>
> UNREACHABLE: [10.0.7.91, 10.0.7.4]
>
>
>
>
>
> mhossain@cassandra-09:~$ nodetool describecluster
>
> Cluster Information:
>
> Name: App
>
> Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
>
> Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
>
> Schema versions:
>
> 80649e67-8ed9-38a4-8afa-560be7c694f4: [10.0.7.80,
> 10.0.7.190, 10.0.7.100, 10.0.7.195, 10.0.7.160, 10.0.7.176]
>
>
>
> UNREACHABLE: [10.0.7.91, 10.0.7.4]
>
>
>
>
>
> cassandra-24 (10.0.7.4) is the replacement node. 10.0.7.91 is the ip
> address of the dead node.
>
>
>
> -Mir
>
>
>
> On Thu, Apr 21, 2016 at 10:02 AM, Mir Tanvir Hossain <
> mir.tanvir.hoss...@gmail.com> wrote:
>
> Hi, I am trying to replace a dead node by following
> https://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_replace_node_t.html
> .
> It's been 3 full days since the replacement node started, and the node is
> still not showing up as part of the cluster on OpsCenter. I was wondering
> whether the delay is due to the fact that I have a test keyspace with
> replication factor of one? If I delete that keyspace, would the new node
> successfully replace the dead node? Any general insight will be hugely
> appreciated.
>
>
>
> Thanks,
>
> Mir
>
>
>
>
>
>
>


RE: Problem Replacing a Dead Node

2016-04-21 Thread Anubhav Kale
Reusing the bootstrapping node could have caused this, but hard to tell. Since 
you have only 7 nodes, have you tried doing a few rolling restarts of all nodes 
to let gossip settle ? Also, the node is pingable from other nodes even though 
it says Unreachable below. Correct ?

Based on nodetool status, it appears the node has streamed all the data it 
needs, but it doesn’t think it has joined the ring yet. Does cqlsh work on that 
node ?

From: Mir Tanvir Hossain [mailto:mir.tanvir.hoss...@gmail.com]
Sent: Thursday, April 21, 2016 11:51 AM
To: user@cassandra.apache.org
Subject: Re: Problem Replacing a Dead Node

Here is a bit more detail of the whole situation. I am hoping someone can help 
me out here.

We have a seven node cluster. One of the nodes started to have issues but it was 
running. We decided to add a new node, and remove the problematic node after 
the new node joins. However, the new node did not join the cluster even after 
three days. Hence, we decided to go with the replacement option. We shut down 
the problematic node. After that, we stopped cassandra on the bootstrapping 
node, deleted all the data, and restarted that node as the replacement node for 
the problematic node.

Since we reused the bootstrapping node as the replacement node, I am wondering 
whether that is causing any issue. Any insights are appreciated.

This is the output of nodetool describecluster from the replacement node, and 
two other nodes.

mhossain@cassandra-24:~$ nodetool describecluster
Cluster Information:
Name: App
Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
Schema versions:
80649e67-8ed9-38a4-8afa-560be7c694f4: [10.0.7.80, 
10.0.7.4, 10.0.7.190, 10.0.7.100, 10.0.7.195, 10.0.7.160, 10.0.7.176]


mhossain@cassandra-13:~$ nodetool describecluster
Cluster Information:
Name: App
Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
Schema versions:
80649e67-8ed9-38a4-8afa-560be7c694f4: [10.0.7.80, 
10.0.7.190, 10.0.7.100, 10.0.7.195, 10.0.7.160, 10.0.7.176]

UNREACHABLE: [10.0.7.91, 10.0.7.4]


mhossain@cassandra-09:~$ nodetool describecluster
Cluster Information:
Name: App
Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
Schema versions:
80649e67-8ed9-38a4-8afa-560be7c694f4: [10.0.7.80, 
10.0.7.190, 10.0.7.100, 10.0.7.195, 10.0.7.160, 10.0.7.176]

UNREACHABLE: [10.0.7.91, 10.0.7.4]


cassandra-24 (10.0.7.4) is the replacement node. 10.0.7.91 is the ip address of 
the dead node.

-Mir

On Thu, Apr 21, 2016 at 10:02 AM, Mir Tanvir Hossain wrote:
Hi, I am trying to replace a dead node by following 
https://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_replace_node_t.html.
 It's been 3 full days since the replacement node started, and the node is 
still not showing up as part of the cluster on OpsCenter. I was wondering 
whether the delay is due to the fact that I have a test keyspace with 
replication factor of one? If I delete that keyspace, would the new node 
successfully replace the dead node? Any general insight will be hugely 
appreciated.

Thanks,
Mir





Combining two clusters/keyspaces into single cluster

2016-04-21 Thread Arlington Albertson
Hey Folks,

I've been looking through various documentations, but I'm either
overlooking something obvious or not wording it correctly, but the gist of
my problem is this:

I have two cassandra clusters, with two separate keyspaces on EC2. We'll
call them as follows:

*cluster1* (DC name, cluster name, etc...)
*keyspace1* (only exists on cluster1)

*cluster2* (DC name, cluster name, etc...)
*keyspace2* (only exists on cluster2)

I need to perform the following:
- take keyspace2, and add it to cluster1 so that all nodes can serve the
traffic
- needs to happen "live" so that I can repoint new instances to the
cluster1 endpoints and they'll just start working, and no longer directly
use cluster2
- eventually, tear down cluster2 (easy with a `nodetool decommission` after
verifying all seeds have been changed, etc...)

This doc seems to be the closest I've found thus far:
https://docs.datastax.com/en/cassandra/2.1/cassandra/operations/ops_add_dc_to_cluster_t.html

Is that the appropriate guide for this and I'm just over thinking it? Or is
there something else I should be looking at?

Also, this is DSC C* 2.1.13.

TIA!

-AA


Re: Limit 1

2016-04-21 Thread Bryan Cheng
As far as I know, the answer is yes; however, it is unlikely that the cursor
will have to probe very far to find a valid row unless your data is highly
bursty. The key cache (assuming you have it enabled) will allow the query
to skip unrelated rows in its search.

However I would caution against TTL'ing the world and generating a 1-to-1
ratio of writes to deletes.

One approach you can try is to compound your primary key with the hour.
Then the latest hour of events can be retrieved by PK lookup. If you delete
older rows that are outside of the partition you're operating on, your
cursor will not have to skip tombstones to find valid results.
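A minimal sketch of that idea against the table quoted below (the hour_bucket column,
its format and the table name are made up here for illustration):

CREATE TABLE mytable_by_hour (
    object_id text,
    hour_bucket text,   -- e.g. '2016042113', the event's UTC hour
    created timeuuid,
    my_data text,
    PRIMARY KEY ((object_id, hour_bucket), created)
) WITH CLUSTERING ORDER BY (created DESC);

-- latest event, only ever touching the current hour's partition
SELECT * FROM mytable_by_hour
WHERE object_id = '...' AND hour_bucket = '2016042113'
LIMIT 1;

Old hours can then be dropped a whole partition at a time, so the read never has to
wade through their tombstones.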

On Wed, Apr 20, 2016 at 11:37 AM, Jimmy Lin  wrote:

> I have the following table (using default size-tiered compaction) whose
> columns get TTLed every hour (as we want to keep only the last 1 hour of events)
>
> And I do
> Select * from mytable where object_id = '' LIMIT 1;
>
> And since the query is only interested in the last/latest value, will cassandra need
> to scan multiple sstables or potentially skip tombstoned data just to
> get to the latest data?
>
> Or is it smart enough to know the beginning of the sstables and get the
> result very efficiently?
>
>
> CREATE TABLE mytable (
> object_id text,
> created timeuuid,
> my_data text,
> PRIMARY KEY (object_id, created)
> ) WITH CLUSTERING ORDER BY (created DESC)
>


Re: Problem Replacing a Dead Node

2016-04-21 Thread Mir Tanvir Hossain
Here is a bit more detail of the whole situation. I am hoping someone can
help me out here.

We have a seven node cluster. One of the nodes started to have issues but it
was running. We decided to add a new node, and remove the problematic node
after the new node joins. However, the new node did not join the cluster
even after three days. Hence, we decided to go with the replacement option.
We shut down the problematic node. After that, we stopped cassandra on the
bootstrapping node, deleted all the data, and restarted that node as the
replacement node for the problematic node.

Since we reused the bootstrapping node as the replacement node, I am wondering
wondering whether that is causing any issue. Any insights are appreciated.

This is the output of nodetool describecluster from the replacement node,
and two other nodes.

mhossain@cassandra-24:~$ nodetool describecluster
Cluster Information:
Name: App
Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
Schema versions:
80649e67-8ed9-38a4-8afa-560be7c694f4: [10.0.7.80, 10.0.7.4, 10.0.7.190,
10.0.7.100, 10.0.7.195, 10.0.7.160, 10.0.7.176]


mhossain@cassandra-13:~$ nodetool describecluster
Cluster Information:
Name: App
Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
Schema versions:
80649e67-8ed9-38a4-8afa-560be7c694f4: [10.0.7.80, 10.0.7.190, 10.0.7.100,
10.0.7.195, 10.0.7.160, 10.0.7.176]

UNREACHABLE: [10.0.7.91, 10.0.7.4]


mhossain@cassandra-09:~$ nodetool describecluster
Cluster Information:
Name: App
Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
Schema versions:
80649e67-8ed9-38a4-8afa-560be7c694f4: [10.0.7.80, 10.0.7.190, 10.0.7.100,
10.0.7.195, 10.0.7.160, 10.0.7.176]

UNREACHABLE: [10.0.7.91, 10.0.7.4]


cassandra-24 (10.0.7.4) is the replacement node. 10.0.7.91 is the ip
address of the dead node.

-Mir

On Thu, Apr 21, 2016 at 10:02 AM, Mir Tanvir Hossain <
mir.tanvir.hoss...@gmail.com> wrote:

> Hi, I am trying to replace a dead node by following
> https://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_replace_node_t.html.
> It's been 3 full days since the replacement node started, and the node is
> still not showing up as part of the cluster on OpsCenter. I was wondering
> whether the delay is due to the fact that I have a test keyspace with
> replication factor of one? If I delete that keyspace, would the new node
> successfully replace the dead node? Any general insight will be hugely
> appreciated.
>
> Thanks,
> Mir
>
>
>


Re: Problem Replacing a Dead Node

2016-04-21 Thread Mir Tanvir Hossain
Hi Jeff, thanks for getting back to me.

I have gone through the output of nodetool netstats, and it seems all the
streams are 100% completed as per the output. What else do you think is
going wrong?

-Mir

On Thu, Apr 21, 2016 at 10:27 AM, Jeff Jirsa 
wrote:

> The keyspace with RF=1 may lose data, but isn’t blocking the replacement.
>
> The most likely cause of the delay is hung streaming. Run `nodetool
> netstats` on the joining (replacement) node. Do the byte counters change?
> If not, streaming is hung, and you’ll likely need to restart the process.
> If so, use the byte counters to calculate streaming percentage complete and
> extrapolate.
>
>
>
> From: Mir Tanvir Hossain
> Reply-To: "user@cassandra.apache.org"
> Date: Thursday, April 21, 2016 at 10:02 AM
> To: "user@cassandra.apache.org"
> Subject: Problem Replacing a Dead Node
>
> Hi, I am trying to replace a dead node by following
> https://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_replace_node_t.html.
> It's been 3 full days since the replacement node started, and the node is
> still not showing up as part of the cluster on OpsCenter. I was wondering
> whether the delay is due to the fact that I have a test keyspace with
> replication factor of one? If I delete that keyspace, would the new node
> successfully replace the dead node? Any general insight will be hugely
> appreciated.
>
> Thanks,
> Mir
>
>
>


Re: What does Cassandra use (JNI?) that triggers GCLocker Initiated GCs?

2016-04-21 Thread Sotirios Delimanolis
We're running G1 at the moment, both young and mixed collections. 

On Thursday, April 21, 2016 11:07 AM, Jake Luciani  wrote:
 

 What kind of collection? If it's ParNew I wouldn't worry.
On Thu, Apr 21, 2016 at 2:02 PM, Sotirios Delimanolis  
wrote:

Should this be of any concern? Are the corresponding threads spending too long 
in this JNI critical region and delaying GC?
I don't get that impression at all from the GC log timings. They're very 
reasonable.
On Thursday, April 21, 2016 10:57 AM, Jake Luciani  wrote:
 

 It's only used by the Snappy and LZ4 Compressors
On Thu, Apr 21, 2016 at 1:54 PM, Sotirios Delimanolis  
wrote:

According to this Oracle document, GCLocker Initiated GC is triggered when a JNI
critical region was released. GC is blocked when any thread is in the JNI critical
region. If GC was requested during that period, that GC is invoked after all the
threads come out of the JNI critical region.

What part of Cassandra's implementation does anything with JNI?
In our GC logs, this is by far the most common reason for GC pauses.




-- 
http://twitter.com/tjake

   



-- 
http://twitter.com/tjake

  

Re: What does Cassandra use (JNI?) that triggers GCLocker Initiated GCs?

2016-04-21 Thread Jake Luciani
What kind of collection? If it's ParNew I wouldn't worry.

On Thu, Apr 21, 2016 at 2:02 PM, Sotirios Delimanolis 
wrote:

> Should this be of any concern? Are the corresponding threads spending too
> long in this JNI critical region and delaying GC?
>
> I don't get that impression at all from the GC log timings. They're very
> reasonable.
>
> On Thursday, April 21, 2016 10:57 AM, Jake Luciani 
> wrote:
>
>
> It's only used by the Snappy and LZ4 Compressors
>
> On Thu, Apr 21, 2016 at 1:54 PM, Sotirios Delimanolis <
> sotodel...@yahoo.com> wrote:
>
> According to this Oracle document, GCLocker Initiated GC is triggered when a JNI
> critical region was released. GC is blocked when any thread is in the JNI critical
> region. If GC was requested during that period, that GC is invoked after all the
> threads come out of the JNI critical region.
>
> What part of Cassandra's implementation does anything with JNI?
>
> In our GC logs, this is by far the most common reason for GC pauses.
>
>
>
>
> --
> http://twitter.com/tjake
>
>
>


-- 
http://twitter.com/tjake


Re: Problem Replacing a Dead Node

2016-04-21 Thread Mir Tanvir Hossain
Hi Anubhav, thanks for getting back to me. here is the information that you
requested.

datastax agent is running on the node. However, in the agent log I see

ERROR [clojure-agent-send-off-pool-4] 2016-04-21 17:51:46,055 Can't connect
to Cassandra (All host(s) tried for query failed (tried: /10.0.7.4:9042
(com.datastax.driver.core.TransportException: [/10.0.7.4:9042] Cannot
connect))), retrying soon.
ERROR [clojure-agent-send-off-pool-5] 2016-04-21 17:51:46,056 Can't connect
to Cassandra (All host(s) tried for query failed (tried: /10.0.7.4:9042
(com.datastax.driver.core.TransportException: [/10.0.7.4:9042] Cannot
connect))), retrying soon.

I am guessing this is because the node is not accepting any reads yet.


here is the output of nodetool status from the replacement node

mhossain@cassandra-24:~$ nodetool status
Note: Ownership information does not include topology; for complete
information, specify a keyspace
Datacenter: us-east
===
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address Load   Tokens  Owns   Host ID
Rack
UN  10.0.7.80   117.32 GB  256 14.2%
 508b6503-e342-41bf-9baf-1f16ed1ebfc8  1a
UN  10.0.7.4    68.46 GB   256 15.1%
 b9a99507-ef83-441a-a007-da91144fae8f  1a
UN  10.0.7.190  106.54 GB  256 15.0%
 32cb119e-e13d-45db-89d1-a4385c47cee2  1a
UN  10.0.7.100  80.99 GB   256 13.5%
 efe3f327-48e8-4105-b096-a7f5c85736f9  1a
UN  10.0.7.195  105.9 GB   256 14.1%
 96403b7e-57fd-4b84-9607-745ec2d826df  1a
UN  10.0.7.160  98.42 GB   256 13.9%
 3a788a95-63f9-44f2-af91-9f49de75db63  1a
UN  10.0.7.176  93.04 GB   256 14.3%
 d9124ced-847d-474e-a230-ea67ba46dfa8  1a


here is the output of nodetool status from another existing node on the
cluster:

mhossain@cassandra-13:~$ nodetool status
Note: Ownership information does not include topology; for complete
information, specify a keyspace
Datacenter: us-east
===
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address Load   Tokens  Owns   Host ID
Rack
UN  10.0.7.80   117.32 GB  256 14.2%
 508b6503-e342-41bf-9baf-1f16ed1ebfc8  1a
UN  10.0.7.190  106.54 GB  256 15.0%
 32cb119e-e13d-45db-89d1-a4385c47cee2  1a
UN  10.0.7.100  80.99 GB   256 13.5%
 efe3f327-48e8-4105-b096-a7f5c85736f9  1a
UN  10.0.7.195  105.9 GB   256 14.1%
 96403b7e-57fd-4b84-9607-745ec2d826df  1a
DN  10.0.7.91   115.97 GB  256 15.1%
 b9a99507-ef83-441a-a007-da91144fae8f  1a
UN  10.0.7.160  98.42 GB   256 13.9%
 3a788a95-63f9-44f2-af91-9f49de75db63  1a
UN  10.0.7.176  93.04 GB   256 14.3%
 d9124ced-847d-474e-a230-ea67ba46dfa8  1a


10.0.7.91 is the node I am trying to replace.


Here is the output of tail -n 50 /var/log/cassandra/system.log

INFO [GossipStage:1] 2016-04-21 17:58:01,812 StorageService.java (line
1665) Nodes /10.0.7.91 and /10.0.7.4 have the same token
8885324152940404221.  Ignoring /10.0.7.91
DEBUG [GossipStage:1] 2016-04-21 17:58:01,812 StorageService.java (line
1671) Relocating ranges:
 INFO [GossipStage:1] 2016-04-21 17:58:01,813 StorageService.java (line
1665) Nodes /10.0.7.91 and /10.0.7.4 have the same token
8894597858951527418.  Ignoring /10.0.7.91
DEBUG [GossipStage:1] 2016-04-21 17:58:01,813 StorageService.java (line
1671) Relocating ranges:
 INFO [GossipStage:1] 2016-04-21 17:58:01,813 StorageService.java (line
1665) Nodes /10.0.7.91 and /10.0.7.4 have the same token
8895886558199074220.  Ignoring /10.0.7.91
DEBUG [GossipStage:1] 2016-04-21 17:58:01,813 StorageService.java (line
1671) Relocating ranges:
 INFO [GossipStage:1] 2016-04-21 17:58:01,813 StorageService.java (line
1665) Nodes /10.0.7.91 and /10.0.7.4 have the same token
8943679396315445898.  Ignoring /10.0.7.91
DEBUG [GossipStage:1] 2016-04-21 17:58:01,813 StorageService.java (line
1671) Relocating ranges:
 INFO [GossipStage:1] 2016-04-21 17:58:01,813 StorageService.java (line
1665) Nodes /10.0.7.91 and /10.0.7.4 have the same token
8971093763454238578.  Ignoring /10.0.7.91
DEBUG [GossipStage:1] 2016-04-21 17:58:01,813 StorageService.java (line
1671) Relocating ranges:
 INFO [GossipStage:1] 2016-04-21 17:58:01,813 StorageService.java (line
1665) Nodes /10.0.7.91 and /10.0.7.4 have the same token
9147567932890414079.  Ignoring /10.0.7.91
DEBUG [GossipStage:1] 2016-04-21 17:58:01,813 StorageService.java (line
1671) Relocating ranges:
 INFO [GossipStage:1] 2016-04-21 17:58:01,813 StorageService.java (line
1665) Nodes /10.0.7.91 and /10.0.7.4 have the same token
9201669617284985565.  Ignoring /10.0.7.91
DEBUG [GossipStage:1] 2016-04-21 17:58:01,814 StorageService.java (line
1671) Relocating ranges:
 INFO [GossipStage:1] 2016-04-21 17:58:01,814 StorageService.java (line
1665) Nodes /10.0.7.91 and /10.0.7.4 have the same token
930880968512921941.  Ignoring /10.0.7.91
DEBUG [GossipStage:1] 2016-04-21 17:58:01,814 StorageService.java (line
1671) Relocating ranges:
DEBUG [PendingRangeCalculator:1] 2016-04-21 17:58:01,814
PendingRangeCalculatorService.java (line 128) No 

Re: What does Cassandra use (JNI?) that triggers GCLocker Initiated GCs?

2016-04-21 Thread Sotirios Delimanolis
Should this be of any concern? Are the corresponding threads spending too long 
in this JNI critical region and delaying GC?
I don't get that impression at all from the GC log timings. They're very 
reasonable.
On Thursday, April 21, 2016 10:57 AM, Jake Luciani  wrote:
 

 It's only used by the Snappy and LZ4 Compressors
On Thu, Apr 21, 2016 at 1:54 PM, Sotirios Delimanolis  
wrote:

According to this Oracle document, GCLocker Initiated GC is triggered when a JNI
critical region was released. GC is blocked when any thread is in the JNI critical
region. If GC was requested during that period, that GC is invoked after all the
threads come out of the JNI critical region.

What part of Cassandra's implementation does anything with JNI?
In our GC logs, this is by far the most common reason for GC pauses.




-- 
http://twitter.com/tjake

  

Re: What does Cassandra use (JNI?) that triggers GCLocker Initiated GCs?

2016-04-21 Thread Jake Luciani
It's only used by the Snappy and LZ4 Compressors
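For context, the compressor is a per-table setting, so this is the knob that decides
whether those JNI-backed codecs are in play at all; a hedged 2.x-style example (table
name is a placeholder, and this is only to show where LZ4/Snappy get configured, not a
suggestion to change it):

ALTER TABLE ks.tbl WITH compression = {'sstable_compression': 'LZ4Compressor'};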

On Thu, Apr 21, 2016 at 1:54 PM, Sotirios Delimanolis 
wrote:

> According to this Oracle document, GCLocker Initiated GC is triggered when a JNI
> critical region was released. GC is blocked when any thread is in the JNI critical
> region. If GC was requested during that period, that GC is invoked after all the
> threads come out of the JNI critical region.
>
> What part of Cassandra's implementation does anything with JNI?
>
> In our GC logs, this is by far the most common reason for GC pauses.
>
>


-- 
http://twitter.com/tjake


What does Cassandra use (JNI?) that triggers GCLocker Initiated GCs?

2016-04-21 Thread Sotirios Delimanolis
According to this Oracle document, GCLocker Initiated GC is triggered when a JNI
critical region was released. GC is blocked when any thread is in the JNI critical
region. If GC was requested during that period, that GC is invoked after all the
threads come out of the JNI critical region.

What part of Cassandra's implementation does anything with JNI?
In our GC logs, this is by far the most common reason for GC pauses.


Re: Problem Replacing a Dead Node

2016-04-21 Thread Jeff Jirsa
The keyspace with RF=1 may lose data, but isn’t blocking the replacement.

The most likely cause of the delay is hung streaming. Run `nodetool netstats` 
on the joining (replacement) node. Do the byte counters change? If not, 
streaming is hung, and you’ll likely need to restart the process. If so, use 
the byte counters to calculate streaming percentage complete and extrapolate.
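A quick way to eyeball that (a sketch; `watch -d` is a standard Linux utility that
highlights whatever changed between samples):

watch -d -n 60 nodetool netstats

If the streaming byte counters never move between samples, the stream is stuck.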



From:  Mir Tanvir Hossain
Reply-To:  "user@cassandra.apache.org"
Date:  Thursday, April 21, 2016 at 10:02 AM
To:  "user@cassandra.apache.org"
Subject:  Problem Replacing a Dead Node

Hi, I am trying to replace a dead node by following 
https://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_replace_node_t.html.
 It's been 3 full days since the replacement node started, and the node is 
still not showing up as part of the cluster on OpsCenter. I was wondering 
whether the delay is due to the fact that I have a test keyspace with 
replication factor of one? If I delete that keyspace, would the new node 
successfully replace the dead node? Any general insight will be hugely 
appreciated. 

Thanks,
Mir







RE: Problem Replacing a Dead Node

2016-04-21 Thread Anubhav Kale
Is the datastax-agent running fine on the node ? What does nodetool status and 
system.log show ?

From: Mir Tanvir Hossain [mailto:mir.tanvir.hoss...@gmail.com]
Sent: Thursday, April 21, 2016 10:02 AM
To: user@cassandra.apache.org
Subject: Problem Replacing a Dead Node

Hi, I am trying to replace a dead node by following 
https://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_replace_node_t.html.
 It's been 3 full days since the replacement node started, and the node is 
still not showing up as part of the cluster on OpsCenter. I was wondering 
whether the delay is due to the fact that I have a test keyspace with 
replication factor of one? If I delete that keyspace, would the new node 
successfully replace the dead node? Any general insight will be hugely 
appreciated.

Thanks,
Mir




Problem Replacing a Dead Node

2016-04-21 Thread Mir Tanvir Hossain
Hi, I am trying to replace a dead node by following
https://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_replace_node_t.html.
It's been 3 full days since the replacement node started, and the node is
still not showing up as part of the cluster on OpsCenter. I was wondering
whether the delay is due to the fact that I have a test keyspace with
replication factor of one? If I delete that keyspace, would the new node
successfully replace the dead node? Any general insight will be hugely
appreciated.

Thanks,
Mir


Re: Alternative approach to setting up new DC

2016-04-21 Thread Jan
Jens;

I am not sure you need to both enable replication and also use sstableloader.
You could load the data into the new DC and subsequently alter the keyspace to 
replicate from the older DC. 
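For example, something roughly like this (keyspace, DC names and replication factors
are placeholders):

ALTER KEYSPACE my_keyspace
WITH replication = {'class': 'NetworkTopologyStrategy', 'old_dc': 3, 'new_dc': 3};

After that, the backup can be loaded into the new DC and a repair run, as Jens
outlines below.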

Cheers
Jan



On Thu, 4/21/16, Jens Rantil  wrote:

 Subject: Re: Alternative approach to setting up new DC
 To: user@cassandra.apache.org
 Date: Thursday, April 21, 2016, 9:00 AM
 
 Hi,
 I never got
 any response here, but just wanted to share that I went to a
 Cassandra meet-up in Stockholm yesterday where I talked to
 two knowledgable Cassandra people that verified that the
 approach below should work. The most important thing is that
 the backup must be fully imported before gc_grace_seconds
 after when the backup is taken.
 As of me, I managed to a get a more
 stable VPN setup and did not have to go down this
 path.
 Cheers,Jens
 
 On Mon, Apr
 18, 2016 at 10:15 AM Jens Rantil 
 wrote:
 Hi,
 I am
 provisioning a new datacenter for an existing cluster. A
 rather shaky VPN connection is hindering me from making a
 "nodetool rebuild" bootstrap on the new DC.
 Interestingly, I have a full fresh database snapshot/backup
 at the same location as the new DC (transferred outside of
 the VPN). I am now considering the following
 approach:Make
 sure my clients are using the old DC.
 Provision the new nodes in new
 DC.
 ALTER the keyspace to enable
 replicas on the new DC. This will start replicating all
 writes from old DC to new DC.
 Before
 gc_grace_seconds after operation 3) above, use sstableloader
 to stream my backup to the new nodes.
 For
 safety precaution, do a full repair.
 Could you see any issues with
 doing this?
 Cheers,Jens-- 
 
 
 
 
 
 
 
 
 Jens Rantil
 Backend Developer @ Tink
 Tink AB, Wallingatan 5, 111 60 Stockholm, Sweden
 For urgent matters you can reach me at
 +46-708-84 18
 32.-- 
 
 
 
 
 
 
 
 
 Jens Rantil
 Backend Developer @ Tink
 Tink AB, Wallingatan 5, 111 60
 Stockholm, Sweden
 For urgent matters you can
 reach me at +46-708-84 18 32.


Unable to reliably count keys on a thrift CF

2016-04-21 Thread Carlos Alonso
Hi guys.

I've been struggling for the last days to find a reliable and stable way to
count keys in a thrift column family.

My idea is to basically iterate the whole ring using the token function, as
documented here:
https://docs.datastax.com/en/cql/3.1/cql/cql_using/paging_c.html in batches
of 1 records

The only corner case is that if there were more than 1 records in a
single partition (not the case, but the program should still handle it) it
explores the partition in depth by getting all records for that particular
token (see below). In the end, all keys are saved into a hash to guarantee
uniqueness. The count of unique keys is always different (and random,
sometimes more keys, sometimes less are retrieved) and, of course, I'm sure
no activity is going on in that cf.

I'm running Cassandra 2.1.11 with MurMur3 partitioner. RF=3 and CL=QUORUM

the column family structure is

CREATE TABLE tbl (
key blob,
column1 ascii,
value blob,
PRIMARY KEY(key, column1)
)

and I'm running the following script

connection = open_cql_connection
results = connection.execute("SELECT token(key), key FROM tbl LIMIT 1")

keys_hash = {} # Hash to save the keys to guarantee uniqueness
last_token = nil
token = nil

while results != nil
  results.each do |row|
    keys_hash[row['key']] = true
    token = row['token(key)']
  end
  if token == last_token
    results = connection.execute("SELECT token(key), key FROM tbl WHERE token(key) = #{token}")
  else
    results = connection.execute("SELECT token(key), key FROM tbl WHERE token(key) >= #{token} LIMIT 1")
  end
  last_token = token
end

puts keys_hash.keys.count

What am I missing?

Thanks!

Carlos Alonso | Software Engineer | @calonso 


Re: Alternative approach to setting up new DC

2016-04-21 Thread Jens Rantil
Hi,

I never got any response here, but just wanted to share that I went to a
Cassandra meet-up in Stockholm yesterday where I talked to two knowledgable
Cassandra people that verified that the approach below should work. The
most important thing is that the backup must be fully imported before
gc_grace_seconds after when the backup is taken.

As of me, I managed to a get a more stable VPN setup and did not have to go
down this path.

Cheers,
Jens

On Mon, Apr 18, 2016 at 10:15 AM Jens Rantil  wrote:

> Hi,
>
> I am provisioning a new datacenter for an existing cluster. A rather shaky
> VPN connection is hindering me from making a "nodetool rebuild" bootstrap
> on the new DC. Interestingly, I have a full fresh database snapshot/backup
> at the same location as the new DC (transferred outside of the VPN). I am
> now considering the following approach:
>
>1. Make sure my clients are using the old DC.
>2. Provision the new nodes in new DC.
>3. ALTER the keyspace to enable replicas on the new DC. This will
>start replicating all writes from old DC to new DC.
>4. Before gc_grace_seconds after operation 3) above, use sstableloader
>to stream my backup to the new nodes.
>5. For safety precaution, do a full repair.
>
> Could you see any issues with doing this?
>
> Cheers,
> Jens
> --
>
> Jens Rantil
> Backend Developer @ Tink
>
> Tink AB, Wallingatan 5, 111 60 Stockholm, Sweden
> For urgent matters you can reach me at +46-708-84 18 32.
>
-- 

Jens Rantil
Backend Developer @ Tink

Tink AB, Wallingatan 5, 111 60 Stockholm, Sweden
For urgent matters you can reach me at +46-708-84 18 32.


Re: When are hints written?

2016-04-21 Thread Jens Rantil
Hi again Bo,

I assume this is the piece of documentation you are referring to?
http://docs.datastax.com/en/cassandra/2.0/cassandra/dml/dml_about_hh_c.html?scroll=concept_ds_ifg_jqx_zj__performance

> If a replica node is overloaded or unavailable, and the failure detector
has not yet marked it down, then expect most or all writes to that node to
fail after the timeout triggered by write_request_timeout_in_ms, which defaults to
10 seconds. During that time, Cassandra writes the hint when the timeout is reached.

I'm not an expert on this, but the way I've seen it is that hints are
stored as soon as there is _any_ issue writing a mutation
(insert/update/delete) to a node. By "issue", that essentially means that a
node hasn't acknowledged back to the coordinator that the write succeeded
within write_request_timeout_in_ms. This includes TCP/socket timeouts,
connection issues or that the node is down. The hints are stored for a
maximum timespan defaulting to 3 hours.

Cheers,
Jens

On Thu, Apr 21, 2016 at 8:06 AM Bo Finnerup Madsen 
wrote:

> Hi Jens,
>
> Thank you for the tip!
> ALL would definitely cure our hints issue, but as you note, it is not
> optimal as we are unable to take down nodes without clients failing.
>
> I am most probably overlooking something in the documentation, but I
> cannot see any description of when hints are written other than when a node
> is marked as being down. And since none of our nodes have been marked as
> being down (at least according to the logs), I suspect that there is some
> timeout that governs when hints are written?
>
> Regarding your other post: Yes, 3.0.3 is pretty new. But we are new to
> this cassandra game, and our schema-fu is not strong enough for us to
> create a schema without using materialized views :)
>
>
> On Wed, 20 Apr 2016 at 17:09, Jens Rantil wrote:
>
>> Hi Bo,
>>
>> > In our case, I would like for the cluster to wait for the write to be
>> persisted on the relevant nodes before returning an ok to the client.
>> But I don't know which knobs to turn to accomplish this? or if it is even
>> possible :)
>>
>> This is what write consistency option is for. Have a look at
>> https://docs.datastax.com/en/cassandra/2.0/cassandra/dml/dml_config_consistency_c.html.
>> Note, however that if you use ALL, your clients will fail (throw exception,
>> depending on language) as soon as a single partition can't be written. This
>> means you can't do online maintenance of a Cassandra node (such as
>> upgrading it etc.) without experiencing write issues.
>>
>> Cheers,
>> Jens
>>
>> On Wed, Apr 20, 2016 at 3:39 PM Bo Finnerup Madsen <
>> bo.gunder...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> We have a small 5 node cluster of m4.xlarge instances that receives writes
>>> from ~20 clients. The clients will write as fast as they can, and the whole
>>> process is limited by the write performance of the cassandra cluster.
>>> After we have tweaked our schema to avoid large partitions, the load is
>>> going ok and we don't see any warnings or errors in the cassandra logs. But
>>> we do see quite a lot of hint handoff activity. During the load, the
>>> cassandra nodes are quite loaded, with linux reporting a load as high as 20.
>>>
>>> I have read the available documentation on how hints works, and to my
>>> understanding hints should only be written if a node is down. But as far as
>>> I can see, none of the nodes are marked as down during the load. So I
>>> suspect I am missing something :)
>>> We have configured the servers with write_request_timeout_in_ms: 12
>>> and the clients with a timeout of 13, but still get hints stored.
>>>
>>> In our case, I would like for the cluster to wait for the write to be
>>> persisted on the relevant nodes before returning an ok to the client. But I
>>> don't know which knobs to turn to accomplish this? or if it is even
>>> possible :)
>>>
>>> We are running cassandra 3.0.3, with 8Gb heap and a replication factor
>>> of 3.
>>>
>>> Thank you in advance!
>>>
>>> Yours sincerely,
>>>   Bo Madsen
>>>
>> --
>>
>> Jens Rantil
>> Backend Developer @ Tink
>>
>> Tink AB, Wallingatan 5, 111 60 Stockholm, Sweden
>> For urgent matters you can reach me at +46-708-84 18 32.
>>
> --

Jens Rantil
Backend Developer @ Tink

Tink AB, Wallingatan 5, 111 60 Stockholm, Sweden
For urgent matters you can reach me at +46-708-84 18 32.


Re: A few misbehaving nodes

2016-04-21 Thread Erik Forsberg



On 2016-04-19 15:54, sai krishnam raju potturi wrote:

hi;
   do we see any hung processes, like repairs, on those 3 nodes? What does 
"nodetool netstats" show?


No hung process from what I can see.

root@cssa02-06:~# nodetool tpstats
Pool Name                    Active   Pending   Completed   Blocked  All time blocked
ReadStage                         0         0     1530227         0                 0
RequestResponseStage              0         0    19230947         0                 0
MutationStage                     0         0    37059234         0                 0
ReadRepairStage                   0         0       80178         0                 0
ReplicateOnWriteStage             0         0           0         0                 0
GossipStage                       0         0       43003         0                 0
CacheCleanupExecutor              0         0           0         0                 0
MigrationStage                    0         0           0         0                 0
MemoryMeter                       0         0         267         0                 0
FlushWriter                       0         0         202         0                 5
ValidationExecutor                0         0         212         0                 0
InternalResponseStage             0         0           0         0                 0
AntiEntropyStage                  0         0         427         0                 0
MemtablePostFlusher               0         0         669         0                 0
MiscStage                         0         0         212         0                 0
PendingRangeCalculator            0         0          70         0                 0
CompactionExecutor                0         0        1206         0                 0
commitlog_archiver                0         0           0         0                 0
HintedHandoff                     0         1         113         0                 0


Message type   Dropped
RANGE_SLICE  1
READ_REPAIR  0
PAGED_RANGE  0
BINARY   0
READ   219
MUTATION 3
_TRACE   0
REQUEST_RESPONSE 2
COUNTER_MUTATION 0

root@cssa02-06:~# nodetool netstats
Mode: NORMAL
Not sending any streams.
Read Repair Statistics:
Attempted: 75317
Mismatch (Blocking): 0
Mismatch (Background): 11
Pool Name     Active   Pending   Completed
Commands         n/a         1    19248846
Responses        n/a         0    19875699



RE: Cassandra 2.0.x OOM during bootstrap

2016-04-21 Thread Michael Fong
Hi, all,

Here is some more information on what happened before the OOM on the rebooted node 
in a 2-node test cluster:


1.   It seems the schema version has changed on the rebooted node after 
reboot, i.e.
Before reboot,
Node 1: DEBUG [MigrationStage:1] 2016-04-19 11:09:42,326 MigrationManager.java 
(line 328) Gossiping my schema version 4cb463f8-5376-3baf-8e88-a5cc6a94f58f
Node 2: DEBUG [MigrationStage:1] 2016-04-19 11:09:42,122 MigrationManager.java 
(line 328) Gossiping my schema version 4cb463f8-5376-3baf-8e88-a5cc6a94f58f

After rebooting node 2,
Node 2: DEBUG [main] 2016-04-19 11:18:18,016 MigrationManager.java (line 328) 
Gossiping my schema version f5270873-ba1f-39c7-ab2e-a86db868b09b



2.   After reboot, both nodes repeatedly send MigrationTask to each other - 
we suspect it is related to the schema version (Digest) mismatch after Node 2 
rebooted:
Node 2 keeps submitting the migration task over 100+ times to the other 
node.
INFO [GossipStage:1] 2016-04-19 11:18:18,261 Gossiper.java (line 1011) Node 
/192.168.88.33 has restarted, now UP
INFO [GossipStage:1] 2016-04-19 11:18:18,262 TokenMetadata.java (line 414) 
Updating topology for /192.168.88.33
INFO [GossipStage:1] 2016-04-19 11:18:18,263 StorageService.java (line 1544) 
Node /192.168.88.33 state jump to normal
INFO [GossipStage:1] 2016-04-19 11:18:18,264 TokenMetadata.java (line 414) 
Updating topology for /192.168.88.33
DEBUG [GossipStage:1] 2016-04-19 11:18:18,265 MigrationManager.java (line 102) 
Submitting migration task for /192.168.88.33
DEBUG [GossipStage:1] 2016-04-19 11:18:18,265 MigrationManager.java (line 102) 
Submitting migration task for /192.168.88.33
DEBUG [MigrationStage:1] 2016-04-19 11:18:18,268 MigrationTask.java (line 62) 
Can't send schema pull request: node /192.168.88.33 is down.
DEBUG [MigrationStage:1] 2016-04-19 11:18:18,268 MigrationTask.java (line 62) 
Can't send schema pull request: node /192.168.88.33 is down.
DEBUG [RequestResponseStage:1] 2016-04-19 11:18:18,353 Gossiper.java (line 977) 
removing expire time for endpoint : /192.168.88.33
INFO [RequestResponseStage:1] 2016-04-19 11:18:18,353 Gossiper.java (line 978) 
InetAddress /192.168.88.33 is now UP
DEBUG [RequestResponseStage:1] 2016-04-19 11:18:18,353 MigrationManager.java 
(line 102) Submitting migration task for /192.168.88.33
DEBUG [RequestResponseStage:1] 2016-04-19 11:18:18,355 Gossiper.java (line 977) 
removing expire time for endpoint : /192.168.88.33
INFO [RequestResponseStage:1] 2016-04-19 11:18:18,355 Gossiper.java (line 978) 
InetAddress /192.168.88.33 is now UP
DEBUG [RequestResponseStage:1] 2016-04-19 11:18:18,355 MigrationManager.java 
(line 102) Submitting migration task for /192.168.88.33
DEBUG [RequestResponseStage:2] 2016-04-19 11:18:18,355 Gossiper.java (line 977) 
removing expire time for endpoint : /192.168.88.33
INFO [RequestResponseStage:2] 2016-04-19 11:18:18,355 Gossiper.java (line 978) 
InetAddress /192.168.88.33 is now UP
DEBUG [RequestResponseStage:2] 2016-04-19 11:18:18,356 MigrationManager.java 
(line 102) Submitting migration task for /192.168.88.33
.


On the other hand, Node 1 keeps updating its gossip information, followed by 
receiving and submitting MigrationTasks afterwards:
DEBUG [RequestResponseStage:3] 2016-04-19 11:18:18,332 Gossiper.java (line 977) 
removing expire time for endpoint : /192.168.88.34
INFO [RequestResponseStage:3] 2016-04-19 11:18:18,333 Gossiper.java (line 978) 
InetAddress /192.168.88.34 is now UP
DEBUG [RequestResponseStage:4] 2016-04-19 11:18:18,335 Gossiper.java (line 977) 
removing expire time for endpoint : /192.168.88.34
INFO [RequestResponseStage:4] 2016-04-19 11:18:18,335 Gossiper.java (line 978) 
InetAddress /192.168.88.34 is now UP
DEBUG [RequestResponseStage:3] 2016-04-19 11:18:18,335 Gossiper.java (line 977) 
removing expire time for endpoint : /192.168.88.34
INFO [RequestResponseStage:3] 2016-04-19 11:18:18,335 Gossiper.java (line 978) 
InetAddress /192.168.88.34 is now UP
..
DEBUG [MigrationStage:1] 2016-04-19 11:18:18,496 
MigrationRequestVerbHandler.java (line 41) Received migration request from 
/192.168.88.34.
DEBUG [MigrationStage:1] 2016-04-19 11:18:18,595 
MigrationRequestVerbHandler.java (line 41) Received migration request from 
/192.168.88.34.
DEBUG [MigrationStage:1] 2016-04-19 11:18:18,843 
MigrationRequestVerbHandler.java (line 41) Received migration request from 
/192.168.88.34.
DEBUG [MigrationStage:1] 2016-04-19 11:18:18,878 
MigrationRequestVerbHandler.java (line 41) Received migration request from 
/192.168.88.34.
..
DEBUG [OptionalTasks:1] 2016-04-19 11:19:18,337 MigrationManager.java (line 
127) submitting migration task for /192.168.88.34
DEBUG [OptionalTasks:1] 2016-04-19 11:19:18,337 MigrationManager.java (line 
127) submitting migration task for /192.168.88.34
DEBUG [OptionalTasks:1] 2016-04-19 11:19:18,337 MigrationManager.java (line 
127) submitting migration task for /192.168.88.34
.

Has anyone experienced this scenario? Thanks in 

Re: When are hints written?

2016-04-21 Thread Bo Finnerup Madsen
Hi Jens,

Thank you for the tip!
ALL would definitely cure our hints issue, but as you note, it is not
optimal as we are unable to take down nodes without clients failing.

I am most probably overlooking something in the documentation, but I cannot
see any description of when hints are written other than when a node is
marked as being down. And since none of our nodes have been marked as being
down (at least according to the logs), I suspect that there is some timeout
that governs when hints are written?

Regarding your other post: Yes, 3.0.3 is pretty new. But we are new to this
cassandra game, and our schema-fu is not strong enough for us to create a
schema without using materialized views :)

On Wed, 20 Apr 2016 at 17:09, Jens Rantil wrote:

> Hi Bo,
>
> > In our case, I would like for the cluster to wait for the write to be
> persisted on the relevant nodes before returning an ok to the client. But
> I don't know which knobs to turn to accomplish this? or if it is even
> possible :)
>
> This is what write consistency option is for. Have a look at
> https://docs.datastax.com/en/cassandra/2.0/cassandra/dml/dml_config_consistency_c.html.
> Note, however that if you use ALL, your clients will fail (throw exception,
> depending on language) as soon as a single partition can't be written. This
> means you can't do online maintenance of a Cassandra node (such as
> upgrading it etc.) without experiencing write issues.
>
> Cheers,
> Jens
>
> On Wed, Apr 20, 2016 at 3:39 PM Bo Finnerup Madsen 
> wrote:
>
>> Hi,
>>
>> We have a small 5 node cluster of m4.xlarge instances that receives writes
>> from ~20 clients. The clients will write as fast as they can, and the whole
>> process is limited by the write performance of the cassandra cluster.
>> After we have tweaked our schema to avoid large partitions, the load is
>> going ok and we don't see any warnings or errors in the cassandra logs. But
>> we do see quite a lot of hint handoff activity. During the load, the
>> cassandra nodes are quite loaded, with linux reporting a load as high as 20.
>>
>> I have read the available documentation on how hints works, and to my
>> understanding hints should only be written if a node is down. But as far as
>> I can see, none of the nodes are marked as down during the load. So I
>> suspect I am missing something :)
>> We have configured the servers with write_request_timeout_in_ms: 12
>> and the clients with a timeout of 13, but still get hints stored.
>>
>> In our case, I would like for the cluster to wait for the write to be
>> persisted on the relevant nodes before returning an ok to the client. But I
>> don't know which knobs to turn to accomplish this? or if it is even
>> possible :)
>>
>> We are running cassandra 3.0.3, with 8Gb heap and a replication factor of
>> 3.
>>
>> Thank you in advance!
>>
>> Yours sincerely,
>>   Bo Madsen
>>
> --
>
> Jens Rantil
> Backend Developer @ Tink
>
> Tink AB, Wallingatan 5, 111 60 Stockholm, Sweden
> For urgent matters you can reach me at +46-708-84 18 32.
>