Running repair while Cassandra upgrade 2.0.X to 2.1.X

2017-12-05 Thread shini gupta
If we have upgraded the Cassandra binaries from 2.0 to 2.1 on ALL the nodes but
upgradesstables is still pending, what is the impact of the following
scenarios?



1. Running nodetool repair on one of the nodes while upgradesstables is
still executing on one or more nodes in the cluster.

2. Running nodetool repair when upgradesstables failed abruptly on some of
the nodes such that some sstable files are in new format while other
sstable files are still in old format.



Even though it may not be recommended to run I/O-intensive operations like
repair and upgradesstables simultaneously, can we assume that both of the
above scenarios are now supported and will not break anything, especially
after https://issues.apache.org/jira/browse/CASSANDRA-5772 was fixed in 2.0?
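
One way to see whether a node still has 2.0-format sstables on disk is to count
the format identifiers in the data file names (a sketch, assuming the default
data directory; "jb" is the 2.0 on-disk format, "ka" the 2.1 format):

```
# Count sstables per on-disk format version on this node.
# "jb" files are still in the 2.0 format and await upgradesstables;
# "ka" files have already been rewritten in the 2.1 format.
find /var/lib/cassandra/data -name '*-Data.db' \
  | sed -E 's/.*-(jb|ka)-[0-9]+-Data\.db$/\1/' \
  | sort | uniq -c
```

A mix of both identifiers on a node would correspond to scenario 2 above.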


Regards
Shini


Re: When Replacing a Node, How to Force a Consistent Bootstrap

2017-12-05 Thread Fred Habash
Or, do a full repair after bootstrapping completes?



On Dec 5, 2017 4:43 PM, "Jeff Jirsa"  wrote:

> You can't ask Cassandra to stream from the node with the "most recent
> data", because for some rows B may be most recent, and for others C may be
> most recent - you'd have to stream from both (which we don't support).
>
> You'll need to repair (and you can repair before you do the replace to
> avoid the window of time where you violate consistency - use the -hosts
> option to allow repair with a down host, you'll repair A+C, so when B
> starts it'll definitely have all of the data).
>
>
> On Tue, Dec 5, 2017 at 1:38 PM, Fd Habash  wrote:
>
>> Assume I have a cluster of 3 nodes (A, B, C). Row x was written with
>> CL=LOCAL_QUORUM to nodes A and B. Before it was written to C, node B crashed.
>> I replaced B and it bootstrapped data from node C.
>>
>>
>>
>> Now, row x is missing from C and B.  If node A crashes, it will be
>> replaced and it will bootstrap from either C or B. As such, row x is now
>> completely gone from the entire ring.
>>
>>
>>
>> Is this scenario possible at all (at least in C* < 3.0)?
>>
>>
>>
>> How can a newly replaced node be forced to bootstrap from the node in the
>> replica set that has the most recent data?
>>
>>
>>
>> Otherwise, we have to repair a node immediately after bootstrapping it
>> for a node replacement.
>>
>>
>>
>> Thank you
>>
>>
>>
>
>


Re: Connection refused - 127.0.0.1-Gossip

2017-12-05 Thread Lerh Chuan Low
As Jeff mentioned, it sounds like a configuration issue. Are you sure you are
using the same ConfigMap (or however the configuration is being passed in)?
Just throwing out ideas: maybe the pods are behind an HTTP proxy and you
forgot to pass in the proxy environment variables? A quick check is sketched
below.
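
A quick way to test the proxy idea inside one of the pods (the pod name is just
the one from the log excerpt below; adjust to your release):

```
# Look for stray proxy-related environment variables inside the pod.
kubectl exec cassandra-cassandra-0 -- env | grep -i proxy
```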

On 6 December 2017 at 08:45, Jeff Jirsa  wrote:

> I don't have any k8 clusters to test with, but do you know how your yaml
> translates to cassandra.yaml? What are the listen/broadcast addresses
> being set?
>
>
> On Tue, Dec 5, 2017 at 6:09 AM, Marek Kadek -T (mkadek - CONSOL PARTNERS
> LTD at Cisco)  wrote:
>
>> We are experiencing the following issue with Cassandra on our Kubernetes
>> clusters:
>>
>> ```
>>
>> @ kubectl exec -it cassandra-cassandra-0 -- tail /var/log/cassandra/debug.log
>>
>> DEBUG [MessagingService-Outgoing-localhost/127.0.0.1-Gossip] 2017-12-05 09:02:06,560 OutboundTcpConnection.java:545 - Unable to connect to localhost/127.0.0.1
>> java.net.ConnectException: Connection refused
>>     at sun.nio.ch.Net.connect0(Native Method) ~[na:1.8.0_131]
>>     at sun.nio.ch.Net.connect(Net.java:454) ~[na:1.8.0_131]
>>     at sun.nio.ch.Net.connect(Net.java:446) ~[na:1.8.0_131]
>>     at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:648) ~[na:1.8.0_131]
>>     at org.apache.cassandra.net.OutboundTcpConnectionPool.newSocket(OutboundTcpConnectionPool.java:146) ~[apache-cassandra-3.11.0.jar:3.11.0]
>>     at org.apache.cassandra.net.OutboundTcpConnectionPool.newSocket(OutboundTcpConnectionPool.java:132) ~[apache-cassandra-3.11.0.jar:3.11.0]
>>     at org.apache.cassandra.net.OutboundTcpConnection.connect(OutboundTcpConnection.java:433) [apache-cassandra-3.11.0.jar:3.11.0]
>>     at org.apache.cassandra.net.OutboundTcpConnection.run(OutboundTcpConnection.java:262) [apache-cassandra-3.11.0.jar:3.11.0]
>>
>> ```
>>
>>
>>
>> Basically, it’s tons and tons of the same message over and over (on all
>> clusters, all C* nodes). It tries roughly 4-5 times a second to open a TCP
>> connection to localhost (?) for gossiping.
>>
>> What we know:
>>
>> - It does not happen on Cassandra 3.0.15, but it does happen on 3.11.1
>> (same configuration).
>>
>> - It happens even on a single-Cassandra minikube “cluster”.
>>
>> - It does not happen on a docker-compose Cassandra cluster, only on the
>> kubernetes one.
>>
>> Our configuration is pretty much this helm chart:
>> https://github.com/kubernetes/charts/blob/master/incubator/cassandra/values.yaml
>>
>>
>>
>> Do you have any idea what it could be related to?
>>
>>
>>
>
>


Re: Connection refused - 127.0.0.1-Gossip

2017-12-05 Thread Jeff Jirsa
I don't have any k8 clusters to test with, but do you know how your yaml
translates to cassandra.yaml? What are the listen/broadcast addresses being
set? One way to check is sketched below.
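
For example, something like this shows what actually ended up inside the
container (the pod name and config path are assumptions based on the stock
Cassandra image used by that chart):

```
# Effective address-related settings inside the pod...
kubectl exec cassandra-cassandra-0 -- \
  grep -E 'listen_address|broadcast_address|rpc_address|seeds' /etc/cassandra/cassandra.yaml

# ...and what gossip is currently advertising for this node.
kubectl exec cassandra-cassandra-0 -- nodetool gossipinfo | head -n 20
```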


On Tue, Dec 5, 2017 at 6:09 AM, Marek Kadek -T (mkadek - CONSOL PARTNERS
LTD at Cisco)  wrote:

> We are experiencing the following issue with Cassandra on our Kubernetes
> clusters:
>
> ```
>
> @ kubectl exec -it cassandra-cassandra-0 -- tail /var/log/cassandra/debug.log
>
> DEBUG [MessagingService-Outgoing-localhost/127.0.0.1-Gossip] 2017-12-05 09:02:06,560 OutboundTcpConnection.java:545 - Unable to connect to localhost/127.0.0.1
> java.net.ConnectException: Connection refused
>     at sun.nio.ch.Net.connect0(Native Method) ~[na:1.8.0_131]
>     at sun.nio.ch.Net.connect(Net.java:454) ~[na:1.8.0_131]
>     at sun.nio.ch.Net.connect(Net.java:446) ~[na:1.8.0_131]
>     at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:648) ~[na:1.8.0_131]
>     at org.apache.cassandra.net.OutboundTcpConnectionPool.newSocket(OutboundTcpConnectionPool.java:146) ~[apache-cassandra-3.11.0.jar:3.11.0]
>     at org.apache.cassandra.net.OutboundTcpConnectionPool.newSocket(OutboundTcpConnectionPool.java:132) ~[apache-cassandra-3.11.0.jar:3.11.0]
>     at org.apache.cassandra.net.OutboundTcpConnection.connect(OutboundTcpConnection.java:433) [apache-cassandra-3.11.0.jar:3.11.0]
>     at org.apache.cassandra.net.OutboundTcpConnection.run(OutboundTcpConnection.java:262) [apache-cassandra-3.11.0.jar:3.11.0]
>
> ```
>
>
>
> Basically, it’s tons and tons of the same message over and over (on all
> clusters, all C* nodes). It tries roughly 4-5 times a second to open a TCP
> connection to localhost (?) for gossiping.
>
> What we know:
>
> - It does not happen on Cassandra 3.0.15, but it does happen on 3.11.1
> (same configuration).
>
> - It happens even on a single-Cassandra minikube “cluster”.
>
> - It does not happen on a docker-compose Cassandra cluster, only on the
> kubernetes one.
>
> Our configuration is pretty much this helm chart:
> https://github.com/kubernetes/charts/blob/master/incubator/cassandra/values.yaml
>
>
>
> Do you have any idea what it could be related to?
>
>
>


Re: When Replacing a Node, How to Force a Consistent Bootstrap

2017-12-05 Thread Jeff Jirsa
You can't ask Cassandra to stream from the node with the "most recent data",
because for some rows B may be most recent and for others C may be most
recent - you'd have to stream from both (which we don't support).

You'll need to repair (and you can repair before you do the replace to avoid
the window of time where you violate consistency - use the -hosts option to
allow repair with a down host; you'll repair A+C, so when B starts it'll
definitely have all of the data). A sketch of the command is below.
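
Roughly like this, run while B is still down (the IPs are placeholders for
nodes A and C; check `nodetool repair --help` on your version, the flag is also
spelled --in-hosts):

```
# Repair the keyspace using only the surviving replicas A and C, so the
# down node B does not cause the repair session to abort.
nodetool repair -hosts 10.0.0.1 -hosts 10.0.0.3 my_keyspace
```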


On Tue, Dec 5, 2017 at 1:38 PM, Fd Habash  wrote:

> Assume I have a cluster of 3 nodes (A, B, C). Row x was written with
> CL=LOCAL_QUORUM to nodes A and B. Before it was written to C, node B crashed.
> I replaced B and it bootstrapped data from node C.
>
>
>
> Now, row x is missing from C and B.  If node A crashes, it will be
> replaced and it will bootstrap from either C or B. As such, row x is now
> completely gone from the entire ring.
>
>
>
> Is this scenario possible at all (at least in C* < 3.0)?
>
>
>
> How can a newly replaced node be forced to bootstrap from the node in the
> replica set that has the most recent data?
>
>
>
> Otherwise, we have to repair a node immediately after bootstrapping it for
> a node replacement.
>
>
>
> Thank you
>
>
>


When Replacing a Node, How to Force a Consistent Bootstrap

2017-12-05 Thread Fd Habash
Assume I have a cluster of 3 nodes (A, B, C). Row x was written with
CL=LOCAL_QUORUM to nodes A and B. Before it was written to C, node B crashed.
I replaced B and it bootstrapped data from node C.

Now, row x is missing from C and B. If node A crashes, it will be replaced and
it will bootstrap from either C or B. As such, row x is now completely gone
from the entire ring.

Is this scenario possible at all (at least in C* < 3.0)?

How can a newly replaced node be forced to bootstrap from the node in the 
replica set that has the most recent data? 

Otherwise, we have to repair a node immediately after bootstrapping it for a 
node replacement.
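
For reference, that replace-then-repair flow looks roughly like this (a sketch;
the dead node's address, the keyspace name and the file locations are
placeholders):

```
# On the replacement node, before its first start (e.g. in cassandra-env.sh):
# take over the dead node's tokens and bootstrap from the surviving replicas.
JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address=10.0.0.2"   # address of dead node B

# Once the replacement has finished bootstrapping, repair it to pull in
# anything its streaming sources were missing.
nodetool repair my_keyspace
```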

Thank you



SASI index usage and limitation

2017-12-05 Thread Nicolas Henneaux
Hi,

For a project I am working on, I have a use case involving a column that holds
an expiration date, which has to be updated on a regular basis and filtered
using a slice query.

The table is used to maintain a list of elements to process. First, a list of
candidates is retrieved (those with an expiration date in the past), and then
each element is updated with a new expiration date corresponding to the
maximum processing time of the element. When the processing of an element is
finished, its expiration is updated to the maximum long value, i.e. it never
expires. I know it looks like a queue, which is an anti-pattern in Cassandra;
the full model is more complex, but I have simplified it to the slice-query
problem.

The table definition is below.
CREATE TABLE IF NOT EXISTS element_status (
    partitionkey text,
    elementid text,
    lockexpirationinmillissinceepoch bigint,
    PRIMARY KEY ((partitionkey), elementid)
) WITH default_time_to_live = 604800;
The initial insert:
INSERT INTO element_status (partitionkey, elementid, lockexpirationinmillissinceepoch)
VALUES ('mypartition', '06e6668c-ebad-4e16-9329-a8854ebf1c32', 123455);
The query to retrieve the candidates:
SELECT * FROM element_status WHERE
    partitionKey = 'partitionKey' AND
    lockExpirationInMillisSinceEpoch < 123456
LIMIT 123 ALLOW FILTERING;
The query to “lock” an element, i.e. update the expiration date:
UPDATE element_status SET
    lockExpirationInMillisSinceEpoch = 135456
  WHERE partitionKey = 'partitionKey' AND
    elementId = 'elementId';
The query to “finish” the element:
UPDATE element_status SET
    lockExpirationInMillisSinceEpoch = 9223372036854775807
  WHERE partitionKey = 'partitionKey' AND
    elementId = 'elementId';

For the slice query, a SASI index has been created with the following 
definition.
CREATE CUSTOM INDEX IF NOT EXISTS element_status_lock_expiration_in_millis_since_epoch_index
ON element_status (lockExpirationInMillisSinceEpoch)
USING 'org.apache.cassandra.index.sasi.SASIIndex';

However, SASI indexes are not recommended for production use. I would like to
evaluate which problems can occur with this index.

  *   Can it happen that an element is never retrieved, even if the initial
insertion date is in the past?
  *   Same question after one or several updates, with an expiration date in
the past?
  *   Can any performance issues occur?

The expected volume per partition is around 200k records per day, and the TTL
is one week.
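
One way I can evaluate the behaviour empirically is to run the candidate query
with tracing enabled and look at how much work the index does per call (a
sketch; the keyspace name is a placeholder):

```
# Run the candidate query with tracing on and inspect the reported trace
# to see how many sstables and rows the SASI index touches.
cqlsh -k my_keyspace -e "TRACING ON;
  SELECT * FROM element_status
   WHERE partitionKey = 'partitionKey'
     AND lockExpirationInMillisSinceEpoch < 123456
   LIMIT 123 ALLOW FILTERING;"
```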

Thank you for your support,

Best regards,

Nicolas HENNEAUX
IT Architect

Email: nicolas.henne...@arhs-developments.com
Tel.:  +32 2 774 01 49
Fax:   +32 2 774 88 31




Woluwedal 30
B-1932 Zaventem
www.arhs-dev-be.com






Connection refused - 127.0.0.1-Gossip

2017-12-05 Thread Marek Kadek -T (mkadek - CONSOL PARTNERS LTD at Cisco)
We are experiencing the following issue with Cassandra on our Kubernetes clusters:

```

@ kubectl exec -it cassandra-cassandra-0 -- tail /var/log/cassandra/debug.log

DEBUG [MessagingService-Outgoing-localhost/127.0.0.1-Gossip] 2017-12-05 09:02:06,560 OutboundTcpConnection.java:545 - Unable to connect to localhost/127.0.0.1
java.net.ConnectException: Connection refused
    at sun.nio.ch.Net.connect0(Native Method) ~[na:1.8.0_131]
    at sun.nio.ch.Net.connect(Net.java:454) ~[na:1.8.0_131]
    at sun.nio.ch.Net.connect(Net.java:446) ~[na:1.8.0_131]
    at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:648) ~[na:1.8.0_131]
    at org.apache.cassandra.net.OutboundTcpConnectionPool.newSocket(OutboundTcpConnectionPool.java:146) ~[apache-cassandra-3.11.0.jar:3.11.0]
    at org.apache.cassandra.net.OutboundTcpConnectionPool.newSocket(OutboundTcpConnectionPool.java:132) ~[apache-cassandra-3.11.0.jar:3.11.0]
    at org.apache.cassandra.net.OutboundTcpConnection.connect(OutboundTcpConnection.java:433) [apache-cassandra-3.11.0.jar:3.11.0]
    at org.apache.cassandra.net.OutboundTcpConnection.run(OutboundTcpConnection.java:262) [apache-cassandra-3.11.0.jar:3.11.0]

```



Basically, it’s tons and tons of the same message over and over (on all
clusters, all C* nodes). It tries roughly 4-5 times a second to open a TCP
connection to localhost (?) for gossiping.

What we know:

- It does not happen on Cassandra 3.0.15, but it does happen on 3.11.1 (same
configuration).

- It happens even on a single-Cassandra minikube “cluster”.

- It does not happen on a docker-compose Cassandra cluster, only on the
kubernetes one.



Our configuration is pretty much this helm chart: 
https://github.com/kubernetes/charts/blob/master/incubator/cassandra/values.yaml



Do you have any idea what it could be related to?
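
One thing we could still try is diffing the rendered configuration between a
3.0.15 pod and a 3.11.1 pod (purely illustrative; the pod names, kubectl
contexts and config path are assumptions):

```
# Dump the rendered cassandra.yaml from each version and compare the
# address- and seed-related settings.
kubectl exec cassandra-cassandra-0 -- cat /etc/cassandra/cassandra.yaml > cassandra-3.11.1.yaml
kubectl --context=cluster-3015 exec cassandra-cassandra-0 -- cat /etc/cassandra/cassandra.yaml > cassandra-3.0.15.yaml
diff cassandra-3.0.15.yaml cassandra-3.11.1.yaml | grep -Ei 'listen|broadcast|rpc_address|seed'
```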