Cassandra tombstones being created by updating rows with TTL's

2015-04-21 Thread Walsh, Stephen
We were chatting with Jon Haddad about a week ago about our tombstone issue 
using Cassandra 2.0.14.
To summarize:

We have a 3 node cluster with replication-factor=3 and compaction = SizeTiered
We use 1 keyspace with 1 table
Each row has about 40 columns
Each row has a TTL of 10 seconds

We insert about 500 rows per second in a prepared batch** (about 3 MB in network 
overhead)
We query the entire table once per second

**This is to enable consistent data, i.e. the batch is transactional, so we get 
all queried data from one insert and not a mix of 2 or more.
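For reference, a minimal CQL sketch of this write pattern (keyspace, table and 
column names are hypothetical; the real rows have ~40 columns):

CREATE TABLE ks.readings (   -- hypothetical schema
    id   text PRIMARY KEY,
    col1 text,
    col2 text
);

BEGIN BATCH
    INSERT INTO ks.readings (id, col1, col2) VALUES ('row-1', 'a', 'b') USING TTL 10;
    INSERT INTO ks.readings (id, col1, col2) VALUES ('row-2', 'c', 'd') USING TTL 10;
    -- ... one such insert per row, ~500 per second
APPLY BATCH;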


It seemed that, although we insert every second, the rows were never deleted by 
the TTL - or so we thought.
After some time we got this message on the query side


###
ERROR [ReadStage:91] 2015-04-21 12:27:03,902 SliceQueryFilter.java (line 206) 
Scanned over 10 tombstones in keyspace.table; query aborted (see 
tombstone_failure_threshold)
ERROR [ReadStage:91] 2015-04-21 12:27:03,931 CassandraDaemon.java (line 199) 
Exception in thread Thread[ReadStage:91,5,main]
java.lang.RuntimeException: 
org.apache.cassandra.db.filter.TombstoneOverwhelmingException
at 
org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2008)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.cassandra.db.filter.TombstoneOverwhelmingException
###


So we know tombstones are in fact being created.
Our solution was to alter the table schema and set gc_grace_seconds to 60 
seconds.
This worked for 20 seconds, then we saw this


###
Read 500 live and 3 tombstoned cells in keyspace.table (see 
tombstone_warn_threshold). 1 columns was requested, slices=[-], 
delInfo={deletedAt=-9223372036854775808, localDeletion=2147483647}
###

So we hit it every 20 seconds (500 inserts x 20 seconds = 10,000 tombstones).
So now we have gc_grace_seconds set to 10 seconds.
But it feels very wrong to have it at a low number, especially if we move to a 
larger cluster. This just won't fly.
What are we doing wrong?

We shouldn't increase the tombstone threshold as that is extremely dangerous.


Best Regards
Stephen Walsh






This email (including any attachments) is proprietary to Aspect Software, Inc. 
and may contain information that is confidential. If you have received this 
message in error, please do not read, copy or forward this message. Please 
notify the sender immediately, delete it from your system and destroy any 
copies. You may not further disclose or distribute this email or its 
attachments.


RE: Cassandra tombstones being created by updating rows with TTL's

2015-04-21 Thread Walsh, Stephen
Many thanks Michael,
I will give these settings a go.
How do you do your periodic nodetool repairs in this situation? From what I read 
we need to start doing this also.

https://wiki.apache.org/cassandra/Operations#Frequency_of_nodetool_repair


From: Laing, Michael [mailto:michael.la...@nytimes.com]
Sent: 21 April 2015 16:26
To: user@cassandra.apache.org
Subject: Re: Cassandra tombstones being created by updating rows with TTL's

If you never delete except by ttl, and always write with the same ttl (or 
monotonically increasing), you can set gc_grace_seconds to 0.
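For example (keyspace/table hypothetical):

ALTER TABLE ks.readings WITH gc_grace_seconds = 0;
-- safe only under the never-delete, uniform-TTL pattern described above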

That's what we do. There have been discussions on the list over the last few 
years re this topic.

ml

On Tue, Apr 21, 2015 at 11:14 AM, Walsh, Stephen 
stephen.wa...@aspect.com wrote:
We were chatting with Jon Haddad about a week ago about our tombstone issue 
using Cassandra 2.0.14.
To summarize:

We have a 3 node cluster with replication-factor=3 and compaction = SizeTiered
We use 1 keyspace with 1 table
Each row has about 40 columns
Each row has a TTL of 10 seconds

We insert about 500 rows per second in a prepared batch** (about 3 MB in network 
overhead)
We query the entire table once per second

**This is to enable consistent data, i.e. the batch is transactional, so we get 
all queried data from one insert and not a mix of 2 or more.


It seemed that, although we insert every second, the rows were never deleted by 
the TTL - or so we thought.
After some time we got this message on the query side


###
ERROR [ReadStage:91] 2015-04-21 12:27:03,902 SliceQueryFilter.java (line 206) 
Scanned over 10 tombstones in keyspace.table; query aborted (see 
tombstone_failure_threshold)
ERROR [ReadStage:91] 2015-04-21 12:27:03,931 CassandraDaemon.java (line 199) 
Exception in thread Thread[ReadStage:91,5,main]
java.lang.RuntimeException: 
org.apache.cassandra.db.filter.TombstoneOverwhelmingException
at 
org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2008)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.cassandra.db.filter.TombstoneOverwhelmingException
###


So we know tombstones are in fact being created.
Our solution was to alter the table schema and set gc_grace_seconds to 60 
seconds.
This worked for 20 seconds, then we saw this


###
Read 500 live and 3 tombstoned cells in keyspace.table (see 
tombstone_warn_threshold). 1 columns was requested, slices=[-], 
delInfo={deletedAt=-9223372036854775808, localDeletion=2147483647}
###

So we hit it every 20 seconds (500 inserts x 20 seconds = 10,000 tombstones).
So now we have gc_grace_seconds set to 10 seconds.
But it feels very wrong to have it at a low number, especially if we move to a 
larger cluster. This just won't fly.
What are we doing wrong?

We shouldn’t increase the tombstone threshold as that is extremely dangerous.


Best Regards
Stephen Walsh








RE: Cassandra tombstones being created by updating rows with TTL's

2015-04-22 Thread Walsh, Stephen


From: Laing, Michael michael.la...@nytimes.com
Date: Tue, 21 Apr 2015 at 10:21 pm
Subject:Re: Cassandra tombstones being created by updating rows with TTL's
Hmm - we read/write with Local Quorum always - I'd recommend that as that is 
your 'consistency' defense.

We use python, so I am not familiar with the java driver - but 'file not found' 
indicates something is inconsistent.
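For example, to try LOCAL_QUORUM from cqlsh before changing the application (a 
sketch; the table is the hypothetical one from earlier):

CONSISTENCY LOCAL_QUORUM;
SELECT * FROM ks.readings WHERE id = 'row-1';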

On Tue, Apr 21, 2015 at 12:22 PM, Walsh, Stephen 
stephen.wa...@aspect.com wrote:
Thanks for all your help Michael,

Our data will change through the day, so data with a TTL will eventually get 
dropped, and new data will appear.
I'd imagine the entire table may expire and start over 7-10 times a day.



But on the GC topic, the Java driver now gives this error on the query.
I also get "Request did not complete within rpc_timeout." in cqlsh.

#
com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra timeout 
during read query at consistency ONE (1 responses were required but only 0 
replica responded)
at 
com.datastax.driver.core.exceptions.ReadTimeoutException.copy(ReadTimeoutException.java:69)
 ~[cassandra-driver-core-2.1.4.jar:na]
at 
com.datastax.driver.core.Responses$Error.asException(Responses.java:100) 
~[cassandra-driver-core-2.1.4.jar:na]
at 
com.datastax.driver.core.DefaultResultSetFuture.onSet(DefaultResultSetFuture.java:140)
 ~[cassandra-driver-core-2.1.4.jar:na]
at 
com.datastax.driver.core.RequestHandler.setFinalResult(RequestHandler.java:249) 
~[cassandra-driver-core-2.1.4.jar:na]
at 
com.datastax.driver.core.RequestHandler.onSet(RequestHandler.java:433) 
~[cassandra-driver-core-2.1.4.jar:na]
Caused by: com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra 
timeout during read query at consistency ONE (1 responses were required but 
only 0 replica responded)
at com.datastax.driver.core.Responses$Error$1.decode(Responses.java:61) 
~[cassandra-driver-core-2.1.4.jar:na]
at com.datastax.driver.core.Responses$Error$1.decode(Responses.java:38) 
~[cassandra-driver-core-2.1.4.jar:na]
at 
com.datastax.driver.core.Message$ProtocolDecoder.decode(Message.java:168) 
~[cassandra-driver-core-2.1.4.jar:na]
at 
com.datastax.shaded.netty.handler.codec.oneone.OneToOneDecoder.handleUpstream(OneToOneDecoder.java:66)
 ~[cassandra-driver-core-2.1.4.jar:na]
at 
com.datastax.shaded.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
 ~[cassandra-driver-core-2.1.4.jar:na]
#


These queries were taking about 1 second to run when gc_grace_seconds was at 10 
seconds (the same duration as the TTL).

Also seeing a lot of this stuff in the log file:

#
ERROR [ReadStage:71] 2015-04-21 17:11:07,597 CassandraDaemon.java (line 199) 
Exception in thread Thread[ReadStage:71,5,main]
java.lang.RuntimeException: java.lang.RuntimeException: 
java.io.FileNotFoundException: 
/var/lib/cassandra/data/keyspace/table/keyspace-table-jb-5-Data.db (No such 
file or directory)
at 
org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2008)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: 
/var/lib/cassandra/data/keyspace/table/keyspace-table-jb-5-Data.db



Maybe this is a one step back, two steps forward approach?
Any ideas?




From: Laing, Michael [mailto:michael.la...@nytimes.com]
Sent: 21 April 2015 17:09

To: user@cassandra.apache.org
Subject: Re: Cassandra tombstones being created by updating rows with TTL's

Discussions previously on the list show why this is not a problem in much more 
detail.

If something changes in your cluster: node down, new node, etc - you run repair 
for sure.

We also run periodic repairs prophylactically.

But if you never delete and always ttl by the same amount, you do not have to 
worry about zombie data being resurrected - the main reason for running repair 
within gc_grace_seconds.
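A minimal sketch of such a periodic repair as a cron entry (the schedule, 
keyspace name and log path are assumptions; stagger it so only one node repairs 
at a time):

# weekly primary-range repair, Sundays at 02:00
0 2 * * 0 /usr/bin/nodetool repair -pr ks >> /var/log/cassandra/repair.log 2>&1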



On Tue, Apr 21, 2015 at 11:49 AM, Walsh, Stephen 
stephen.wa...@aspect.com wrote:
Many thanks Michael,
I will give these settings a go.
How do you do your periodic nodetool repairs in this situation? From what I read 
we need to start doing this also.

https://wiki.apache.org/cassandra/Operations#Frequency_of_nodetool_repair


From: Laing, Michael [mailto:michael.la...@nytimes.com]
Sent: 21 April 2015 16:26
To: user@cassandra.apache.org
Subject: Re: Cassandra tombstones being created by updating rows with TTL's

If you

RE: RE: Cassandra tombstones being created by updating rows with TTL's

2015-04-23 Thread Walsh, Stephen
:1307)
at 
javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1399)
at 
javax.management.remote.rmi.RMIConnectionImpl.getAttribute(RMIConnectionImpl.java:637)
at sun.reflect.GeneratedMethodAccessor21.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:483)
at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:323)
at sun.rmi.transport.Transport$1.run(Transport.java:200)
at sun.rmi.transport.Transport$1.run(Transport.java:197)
at java.security.AccessController.doPrivileged(Native Method)
at sun.rmi.transport.Transport.serviceCall(Transport.java:196)
at 
sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:568)
at 
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:826)
at 
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$87(TCPTransport.java:683)
at 
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler$$Lambda$2/1019453949.run(Unknown
 Source)
at java.security.AccessController.doPrivileged(Native Method)
at 
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:682)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)



From: Anuj Wadehra [mailto:anujw_2...@yahoo.co.in]
Sent: 21 April 2015 19:04
To: user@cassandra.apache.org
Subject: Re: Cassandra tombstones being created by updating rows with TTL's

What's your sstable count for the CF? I hope compactions are working fine. Also 
check the full stacktrace of the FileNotFoundException - if it's related to 
compaction, you can try cleaning the compactions_in_progress folder in the 
system folder in the data directory; there are JIRA issues relating to that.
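A sketch of that workaround, assuming the default data directory layout (stop 
the node first, and back the folder up before deleting anything):

sudo service cassandra stop
# remove the stale compaction markers; the folder name carries a UUID suffix
rm -f /var/lib/cassandra/data/system/compactions_in_progress-*/*
sudo service cassandra start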

Thanks
Anuj Wadehra


Sent from Yahoo Mail on Android


From: Laing, Michael michael.la...@nytimes.com
Date: Tue, 21 Apr 2015 at 10:21 pm
Subject:Re: Cassandra tombstones being created by updating rows with TTL's
Hmm - we read/write with Local Quorum always - I'd recommend that as that is 
your 'consistency' defense.

We use python, so I am not familiar with the java driver - but 'file not found' 
indicates something is inconsistent.

On Tue, Apr 21, 2015 at 12:22 PM, Walsh, Stephen 
stephen.wa...@aspect.com wrote:
Thanks for all your help Michael,

Our data will change through the day, so data with a TTL will eventually get 
dropped, and new data will appear.
I'd imagine the entire table may expire and start over 7-10 times a day.



But on the GC topic, the Java driver now gives this error on the query.
I also get "Request did not complete within rpc_timeout." in cqlsh.

#
com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra timeout 
during read query at consistency ONE (1 responses were required but only 0 
replica responded)
at 
com.datastax.driver.core.exceptions.ReadTimeoutException.copy(ReadTimeoutException.java:69)
 ~[cassandra-driver-core-2.1.4.jar:na]
at 
com.datastax.driver.core.Responses$Error.asException(Responses.java:100) 
~[cassandra-driver-core-2.1.4.jar:na]
at 
com.datastax.driver.core.DefaultResultSetFuture.onSet(DefaultResultSetFuture.java:140)
 ~[cassandra-driver-core-2.1.4.jar:na]
at 
com.datastax.driver.core.RequestHandler.setFinalResult(RequestHandler.java:249) 
~[cassandra-driver-core-2.1.4.jar:na]
at 
com.datastax.driver.core.RequestHandler.onSet(RequestHandler.java:433) 
~[cassandra-driver-core-2.1.4.jar:na]
Caused by: com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra 
timeout during read query at consistency ONE (1 responses were required but 
only 0 replica responded)
at com.datastax.driver.core.Responses$Error$1.decode(Responses.java:61) 
~[cassandra-driver-core-2.1.4.jar:na]
at com.datastax.driver.core.Responses$Error$1.decode(Responses.java:38) 
~[cassandra-driver-core-2.1.4.jar:na]
at 
com.datastax.driver.core.Message$ProtocolDecoder.decode(Message.java:168) 
~[cassandra-driver-core-2.1.4.jar:na]
at 
com.datastax.shaded.netty.handler.codec.oneone.OneToOneDecoder.handleUpstream(OneToOneDecoder.java:66)
 ~[cassandra-driver-core-2.1.4.jar:na]
at 
com.datastax.shaded.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
 ~[cassandra-driver-core-2.1.4.jar:na]
#


These queries were taking about 1 second to run when gc_grace_seconds was at 10 
seconds (the same duration as the TTL).

Also seeing a lot

RE: Cassandra tombstones being created by updating rows with TTL's

2015-04-21 Thread Walsh, Stephen
Thanks for all your help Michael,

Our data will change through the day, so data with a TTL will eventually get 
dropped, and new data will appear.
I'd imagine the entire table may expire and start over 7-10 times a day.



But on the GC topic, the Java driver now gives this error on the query.
I also get "Request did not complete within rpc_timeout." in cqlsh.

#
com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra timeout 
during read query at consistency ONE (1 responses were required but only 0 
replica responded)
at 
com.datastax.driver.core.exceptions.ReadTimeoutException.copy(ReadTimeoutException.java:69)
 ~[cassandra-driver-core-2.1.4.jar:na]
at 
com.datastax.driver.core.Responses$Error.asException(Responses.java:100) 
~[cassandra-driver-core-2.1.4.jar:na]
at 
com.datastax.driver.core.DefaultResultSetFuture.onSet(DefaultResultSetFuture.java:140)
 ~[cassandra-driver-core-2.1.4.jar:na]
at 
com.datastax.driver.core.RequestHandler.setFinalResult(RequestHandler.java:249) 
~[cassandra-driver-core-2.1.4.jar:na]
at 
com.datastax.driver.core.RequestHandler.onSet(RequestHandler.java:433) 
~[cassandra-driver-core-2.1.4.jar:na]
Caused by: com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra 
timeout during read query at consistency ONE (1 responses were required but 
only 0 replica responded)
at com.datastax.driver.core.Responses$Error$1.decode(Responses.java:61) 
~[cassandra-driver-core-2.1.4.jar:na]
at com.datastax.driver.core.Responses$Error$1.decode(Responses.java:38) 
~[cassandra-driver-core-2.1.4.jar:na]
at 
com.datastax.driver.core.Message$ProtocolDecoder.decode(Message.java:168) 
~[cassandra-driver-core-2.1.4.jar:na]
at 
com.datastax.shaded.netty.handler.codec.oneone.OneToOneDecoder.handleUpstream(OneToOneDecoder.java:66)
 ~[cassandra-driver-core-2.1.4.jar:na]
at 
com.datastax.shaded.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
 ~[cassandra-driver-core-2.1.4.jar:na]
#


These queries were taking about 1 second to run when gc_grace_seconds was at 10 
seconds (the same duration as the TTL).

Also seeing a lot of this stuff in the log file:

#
ERROR [ReadStage:71] 2015-04-21 17:11:07,597 CassandraDaemon.java (line 199) 
Exception in thread Thread[ReadStage:71,5,main]
java.lang.RuntimeException: java.lang.RuntimeException: 
java.io.FileNotFoundException: 
/var/lib/cassandra/data/keyspace/table/keyspace-table-jb-5-Data.db (No such 
file or directory)
at 
org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2008)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: 
/var/lib/cassandra/data/keyspace/table/keyspace-table-jb-5-Data.db



Maybe this is a one step back, two steps forward approach?
Any ideas?




From: Laing, Michael [mailto:michael.la...@nytimes.com]
Sent: 21 April 2015 17:09
To: user@cassandra.apache.org
Subject: Re: Cassandra tombstones being created by updating rows with TTL's

Discussions previously on the list show why this is not a problem in much more 
detail.

If something changes in your cluster: node down, new node, etc - you run repair 
for sure.

We also run periodic repairs prophylactically.

But if you never delete and always ttl by the same amount, you do not have to 
worry about zombie data being resurrected - the main reason for running repair 
within gc_grace_seconds.



On Tue, Apr 21, 2015 at 11:49 AM, Walsh, Stephen 
stephen.wa...@aspect.com wrote:
Many thanks Michael,
I will give these settings a go.
How do you do your periodic nodetool repairs in this situation? From what I read 
we need to start doing this also.

https://wiki.apache.org/cassandra/Operations#Frequency_of_nodetool_repair


From: Laing, Michael 
[mailto:michael.la...@nytimes.com]
Sent: 21 April 2015 16:26
To: user@cassandra.apache.org
Subject: Re: Cassandra tombstones being created by updating rows with TTL's

If you never delete except by ttl, and always write with the same ttl (or 
monotonically increasing), you can set gc_grace_seconds to 0.

That's what we do. There have been discussions on the list over the last few 
years re this topic.

ml

On Tue, Apr 21, 2015 at 11:14 AM, Walsh, Stephen 
stephen.wa...@aspect.com wrote:
We were chatting with Jon Haddad about a week ago about our tombstone issue 
using Cassandra 2.0.14.
To summarize:

We have a 3 node cluster with replication-factor=3 and compaction = SizeTiered

Insert Vs Updates - Both create tombstones

2015-05-13 Thread Walsh, Stephen
Quick Question,

Our team is under much debate; we are trying to find out if an update on a row 
with a TTL will create a tombstone.

E.G

We have one row with a TTL; if we keep updating that row before the TTL is 
hit, will a tombstone be created?
I believe it will, but want to confirm.

So if that is true,
and if our TTL is 10 seconds and we update the row every second, will 10 
tombstones be created after 10 seconds, or just 1?
(And does the same apply for insert?)

Regards
Stephen Walsh




RE: Insert Vs Updates - Both create tombstones

2015-05-14 Thread Walsh, Stephen
Thank you,

We are updating the entire row (all columns) each second via the "insert" 
command.
So if we did updates, no tombstones would be created?
But because we are doing inserts, are we creating tombstones for each column on 
each insert?

From: Ali Akhtar [mailto:ali.rac...@gmail.com]
Sent: 13 May 2015 12:10
To: user@cassandra.apache.org
Subject: Re: Insert Vs Updates - Both create tombstones

Sorry, wrong thread. Disregard the above

On Wed, May 13, 2015 at 4:08 PM, Ali Akhtar 
ali.rac...@gmail.com wrote:
If specifying 'USING TIMESTAMP', the docs say to provide microseconds, but 
where are these microseconds obtained from? I have regular java.util.Date 
objects; I can get the time in milliseconds (i.e. the unix timestamp), but how 
would I convert that to microseconds?
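A sketch of the usual answer - CQL's USING TIMESTAMP expects microseconds since 
the epoch, and multiplying the millisecond value by 1000 works (the 
sub-millisecond digits are simply zero); the values and table here are 
hypothetical:

-- 1431513600000 ms * 1000 = 1431513600000000 us
INSERT INTO ks.readings (id, col1) VALUES ('row-1', 'a')
    USING TIMESTAMP 1431513600000000;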

On Wed, May 13, 2015 at 3:45 PM, Peer, Oded 
oded.p...@rsa.com wrote:
Under the assumption that when you update the columns you also update the TTL 
for the columns, a tombstone won't be created for those columns.
Remember that TTL is set on columns (or “cells”), not on rows, so your 
description of updating a row is slightly misleading. If every query updates 
different columns then different columns might expire at different times.

From: Walsh, Stephen 
[mailto:stephen.wa...@aspect.com]
Sent: Wednesday, May 13, 2015 1:35 PM
To: user@cassandra.apache.org
Subject: Insert Vs Updates - Both create tombstones

Quick Question,

Our team is under much debate; we are trying to find out if an update on a row 
with a TTL will create a tombstone.

E.G

We have one row with a TTL; if we keep "updating" that row before the TTL is 
hit, will a tombstone be created?
I believe it will, but want to confirm.

So if that is true,
and if our TTL is 10 seconds and we "update" the row every second, will 10 
tombstones be created after 10 seconds, or just 1?
(And does the same apply for "insert"?)

Regards
Stephen Walsh




RE: Insert Vs Updates - Both create tombstones

2015-05-14 Thread Walsh, Stephen
Thanks ☺

I think you might have got your T's and V's mixed up?

So we insert V2 @ T2, then insert V1 @ T1, where T1 is earlier than T2 = V2

Should it not be the other way around?

So we insert V1 @ T1, then insert V2 @ T2, where T1 is earlier than T2 = V2


So in tombstone terms, over 5 seconds we are looking at this:

Second 1
Insert V1, T1 with TTL = 5

Second 2
V1, T1 (TTL 4)
Insert V1, T2 with TTL = 5

Second 3
V1, T1 (TTL 3)
V1, T2 (TTL 4)
Insert V1, T3 with TTL = 5

Second 4
V1, T1 (TTL 2)
V1, T2 (TTL 3)
V1, T3 (TTL 4)
Insert V1, T4 with TTL = 5

Second 5
V1, T1 (TTL 1)
V1, T2 (TTL 2)
V1, T3 (TTL 3)
V1, T4 (TTL 4)
Insert V1, T5 with TTL = 5

Second 6
V1, T1 (Tombstoned)
V1, T2 (TTL 1)
V1, T3 (TTL 2)
V1, T4 (TTL 3)
V1, T5 (TTL 4)

Second 7
V1, T1 (Tombstoned)
V1, T2 (Tombstoned)
V1, T3 (TTL 1)
V1, T4 (TTL 2)
V1, T5 (TTL 3)

Second 8
V1, T1 (Tombstoned)
V1, T2 (Tombstoned)
V1, T3 (Tombstoned)
V1, T4 (TTL 1)
V1, T5 (TTL 2)

Second 9
V1, T1 (Tombstoned)
V1, T2 (Tombstoned)
V1, T3 (Tombstoned)
V1, T4 (Tombstoned)
V1, T5 (TTL 1)

Second 10
V1, T1 (Tombstoned)
V1, T2 (Tombstoned)
V1, T3 (Tombstoned)
V1, T4 (Tombstoned)
V1, T5 (Tombstoned)

Second 11
(Minor compaction runs to clean up tombstones)


And if I did an "update", the result would be the same.
And like you mentioned, if I did a query at "Second 5", the query would have 5 
versions of V1 to query against, and the one with the highest T value would be 
returned.




From: Peer, Oded [mailto:oded.p...@rsa.com]
Sent: 14 May 2015 11:12
To: user@cassandra.apache.org
Subject: RE: Insert Vs Updates - Both create tombstones

If this is how you update then you are not creating tombstones.

If you used UPDATE it's the same behavior. You are simply inserting a new value 
for the cell, which does not create a tombstone.
When you modify data by using either the INSERT or the UPDATE command the value 
is stored along with a timestamp indicating when it was written.
Assume timestamp T1 is before T2 (T1 < T2) and you stored value V2 with 
timestamp T2. Then you store V1 with timestamp T1.
Now you have two values of V in the DB: <V2,T2> and <V1,T1>.
When you read the value of V from the DB you read both <V2,T2> and <V1,T1>, 
which may be in different sstables. Cassandra resolves the conflict by 
comparing the timestamps and returns V2.
Compaction will later take care of it and remove <V1,T1> from the DB.
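A small cqlsh demonstration of this reconciliation (hypothetical table; 
timestamps in microseconds):

INSERT INTO ks.t (id, v) VALUES ('k', 'V2') USING TIMESTAMP 2000000;
INSERT INTO ks.t (id, v) VALUES ('k', 'V1') USING TIMESTAMP 1000000;
SELECT v FROM ks.t WHERE id = 'k';  -- returns 'V2': highest timestamp wins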


From: Walsh, Stephen [mailto:stephen.wa...@aspect.com]
Sent: Thursday, May 14, 2015 11:39 AM
To: user@cassandra.apache.org
Subject: RE: Insert Vs Updates - Both create tombstones

Thank you,

We are updating the entire row (all columns) each second via the "insert" 
command.
So if we did updates, no tombstones would be created?
But because we are doing inserts, are we creating tombstones for each column on 
each insert?


From: Ali Akhtar [mailto:ali.rac...@gmail.com]
Sent: 13 May 2015 12:10
To: user@cassandra.apache.org
Subject: Re: Insert Vs Updates - Both create tombstones

Sorry, wrong thread. Disregard the above

On Wed, May 13, 2015 at 4:08 PM, Ali Akhtar 
ali.rac...@gmail.com wrote:
If specifying 'USING TIMESTAMP', the docs say to provide microseconds, but 
where are these microseconds obtained from? I have regular java.util.Date 
objects; I can get the time in milliseconds (i.e. the unix timestamp), but how 
would I convert that to microseconds?

On Wed, May 13, 2015 at 3:45 PM, Peer, Oded 
oded.p...@rsa.com wrote:
Under the assumption that when you update the columns you also update the TTL 
for the columns, a tombstone won't be created for those columns.
Remember that TTL is set on columns (or “cells”), not on rows, so your 
description of updating a row is slightly misleading. If every query updates 
different columns then different columns might expire at different times.

From: Walsh, Stephen 
[mailto:stephen.wa...@aspect.com]
Sent: Wednesday, May 13, 2015 1:35 PM
To: user@cassandra.apache.org
Subject: Insert Vs Updates - Both create tombstones

Quick Question,

Our team is under much debate; we are trying to find out if an update on a row 
with a TTL will create a tombstone.

E.G

We have one row with a TTL; if we keep "updating" that row before the TTL is 
hit, will a tombstone be created?
I believe it will, but want to confirm.

So if that is true,
and if our TTL is 10 seconds and we "update" the row every second, will 10 
tombstones be created after 10 seconds, or just 1?
(And does the same apply for "insert"?)

Regards
Stephen Walsh



RE: Drop/Create table with same CF Name

2015-05-25 Thread Walsh, Stephen
Totally agree with this.

From: Ken Hancock [mailto:ken.hanc...@schange.com]
Sent: 22 May 2015 17:10
To: user@cassandra.apache.org
Subject: Re: Drop/Create table with same CF Name

This issue really needs to be strongly highlighted in the documentation.  
Imagine someone noticing similarities between SQL and CQL and assuming that one 
could actually drop a table and recreate the table as a method of deleting all 
the data...totally crazy, I know...

On Fri, May 22, 2015 at 11:06 AM, Walsh, Stephen 
stephen.wa...@aspect.com wrote:
Thanks for the link,

I don't think your link is what I had in mind - considering it mentions being 
fixed in 2.0.13.

I was referring to this "won't fix" issue:
https://issues.apache.org/jira/browse/CASSANDRA-4857

We've seen this a few times, where we drop a keyspace and re-create it and get 
inconsistency issues.
It even happened to me mid-thread on these boards.

http://www.mail-archive.com/user%40cassandra.apache.org/msg42139.html



From: Sebastian Estevez 
[mailto:sebastian.este...@datastax.com]
Sent: 22 May 2015 14:46

To: user@cassandra.apache.org
Subject: Re: Drop/Create table with same CF Name

I'm aware of issues where recreating keyspaces can cause inconsistency in 
2.0.13 if memtables are not flushed beforehand - is this the issue that is 
resolved?

Yep, that's https://issues.apache.org/jira/browse/CASSANDRA-7511


All the best,



Sebastián Estévez

Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com

DataStax is the fastest, most scalable distributed database technology, 
delivering Apache Cassandra to the world’s most innovative enterprises. 
Datastax is built to be agile, always-on, and predictably scalable to any size. 
With more than 500 customers in 45 countries, DataStax is the database 
technology and transactional backbone of choice for the world's most innovative 
companies such as Netflix, Adobe, Intuit, and eBay.

On Fri, May 22, 2015 at 7:53 AM, Walsh, Stephen 
stephen.wa...@aspect.com wrote:
Can someone share the content of this link please? I'm aware of issues where 
recreating keyspaces can cause inconsistency in 2.0.13 if memtables are not 
flushed beforehand - is this the issue that is resolved?


From: Ken Hancock 
[mailto:ken.hanc...@schange.com]
Sent: 21 May 2015 17:13
To: user@cassandra.apache.org
Subject: Re: Drop/Create table with same CF Name

Thanks Mark (though that article doesn't appear publicly accessible for others).
Truncate would have been the tool of choice, however my understanding is 
truncate fails unless all nodes are up and running which makes it a 
non-workable choice since we can't determine when failures will occur.
Ken

On Thu, May 21, 2015 at 11:00 AM, Mark Reddy 
mark.l.re...@gmail.com wrote:
Yes, it's a known issue. For more information on the topic see this support 
post from DataStax:

https://support.datastax.com/hc/en-us/articles/204226339-How-to-drop-and-recreate-a-table-in-Cassandra-versions-older-than-2-1

Mark

On 21 May 2015 at 15:31, Ken Hancock 
ken.hanc...@schange.com wrote:

We've been running into the reused key cache issue (CASSANDRA-5202) when 
dropping and recreating the same table in Cassandra 1.2.18, so we've been 
testing with key caches disabled, which does not seem to solve the issue. In 
the latest logs it seems that old SSTable metadata gets read after the tables 
have been deleted by the previous drop, eventually causing an exception and the 
Thrift interface to shut down.

At this point is it a known issue that one CANNOT reuse a table name prior to 
Cassandra 2.1 ?








RE: Nodetool on 2.1.5

2015-05-22 Thread Walsh, Stephen
Thanks all,

Looks like the long way of doing nodetool cleanup / repair on each machine is 
what needs to be done now.
So what about the graphs in Ops Center not working - has anyone seen this before?

Steve

From: Jason Wee [mailto:peich...@gmail.com]
Sent: 22 May 2015 04:27
To: user@cassandra.apache.org
Subject: Re: Nodetool on 2.1.5

yeah, you can confirm in the log such as the one below.
WARN  [main] 2015-05-22 11:23:25,584 CassandraDaemon.java:81 - JMX is not 
enabled to receive remote connections. Please see cassandra-env.sh for more 
info.
We are running C* with IPv6; cqlsh works superbly, but not on the link-local address.
$ nodetool -h fe80::224:1ff:fed7:82ea cfstats system.hints;
nodetool: Failed to connect to 'fe80::224:1ff:fed7:82ea:7199' - 
ConnectException: 'Connection refused'.


On Fri, May 22, 2015 at 12:39 AM, Yuki Morishita 
mor.y...@gmail.com wrote:
For security reasons, Cassandra changes JMX to listen on localhost only
since version 2.0.14/2.1.4.
From NEWS.txt:

The default JMX config now listens to localhost only. You must enable
the other JMX flags in cassandra-env.sh manually. 
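The relevant switch lives in cassandra-env.sh; a sketch of opting back in to 
remote JMX (enable authentication too - the password file path here is an 
assumption):

LOCAL_JMX=no
JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.authenticate=true"
JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.password.file=/etc/cassandra/jmxremote.password"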

On Thu, May 21, 2015 at 11:05 AM, Walsh, Stephen
stephen.wa...@aspect.com wrote:
 Just wondering if anyone else is seeing this issue on the nodetool after
 installing 2.1.5





 This works

 nodetool -h 127.0.0.1 cfstats keyspace.table



 This works

 nodetool -h localhost cfstats keyspace.table



 This works

 nodetool cfstats keyspace.table



 This doesn’t work

 nodetool -h 192.168.1.10 cfstats keyspace.table

 nodetool: Failed to connect to '192.168.1.10:7199' - ConnectException:
 'Connection refused'.



 Where 192.168.1.10 is the machine IP,

 All firewalls are disabled and it worked fine on version 2.0.13



 This has happened on both of our upgraded clusters.

 Also no longer able to view the "CF: Total MemTable Size" & "flushes
 pending" in Ops Center 5.1.1, related issue?





--
Yuki Morishita
 t:yukim (http://twitter.com/yukim)



RE: Drop/Create table with same CF Name

2015-05-22 Thread Walsh, Stephen
Can someone share the content of this link please? I'm aware of issues where 
recreating keyspaces can cause inconsistency in 2.0.13 if memtables are not 
flushed beforehand - is this the issue that is resolved?


From: Ken Hancock [mailto:ken.hanc...@schange.com]
Sent: 21 May 2015 17:13
To: user@cassandra.apache.org
Subject: Re: Drop/Create table with same CF Name

Thanks Mark (though that article doesn't appear publicly accessible for others).
Truncate would have been the tool of choice, however my understanding is 
truncate fails unless all nodes are up and running which makes it a 
non-workable choice since we can't determine when failures will occur.
Ken

On Thu, May 21, 2015 at 11:00 AM, Mark Reddy 
mark.l.re...@gmail.com wrote:
Yes, it's a known issue. For more information on the topic see this support 
post from DataStax:

https://support.datastax.com/hc/en-us/articles/204226339-How-to-drop-and-recreate-a-table-in-Cassandra-versions-older-than-2-1

Mark

On 21 May 2015 at 15:31, Ken Hancock 
ken.hanc...@schange.com wrote:

We've been running into the reused key cache issue (CASSANDRA-5202) when 
dropping and recreating the same table in Cassandra 1.2.18, so we've been 
testing with key caches disabled, which does not seem to solve the issue. In 
the latest logs it seems that old SSTable metadata gets read after the tables 
have been deleted by the previous drop, eventually causing an exception and the 
Thrift interface to shut down.

At this point is it a known issue that one CANNOT reuse a table name prior to 
Cassandra 2.1 ?









RE: Drop/Create table with same CF Name

2015-05-22 Thread Walsh, Stephen
Thanks for the link,

I don't think your link is what I had in mind - considering it mentions being 
fixed in 2.0.13.

I was referring to this "won't fix" issue:
https://issues.apache.org/jira/browse/CASSANDRA-4857

We've seen this a few times, where we drop a keyspace and re-create it and get 
inconsistency issues.
It even happened to me mid-thread on these boards.

http://www.mail-archive.com/user%40cassandra.apache.org/msg42139.html



From: Sebastian Estevez [mailto:sebastian.este...@datastax.com]
Sent: 22 May 2015 14:46
To: user@cassandra.apache.org
Subject: Re: Drop/Create table with same CF Name

I'm aware of issues where recreating keyspaces can cause inconsistency in 
2.0.13 if memtables are not flushed beforehand - is this the issue that is 
resolved?

Yep, that's https://issues.apache.org/jira/browse/CASSANDRA-7511


All the best,



Sebastián Estévez

Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com

DataStax is the fastest, most scalable distributed database technology, 
delivering Apache Cassandra to the world’s most innovative enterprises. 
Datastax is built to be agile, always-on, and predictably scalable to any size. 
With more than 500 customers in 45 countries, DataStax is the database 
technology and transactional backbone of choice for the world's most innovative 
companies such as Netflix, Adobe, Intuit, and eBay.

On Fri, May 22, 2015 at 7:53 AM, Walsh, Stephen 
stephen.wa...@aspect.com wrote:
Can someone share the content of this link please? I'm aware of issues where 
recreating keyspaces can cause inconsistency in 2.0.13 if memtables are not 
flushed beforehand - is this the issue that is resolved?


From: Ken Hancock 
[mailto:ken.hanc...@schange.com]
Sent: 21 May 2015 17:13
To: user@cassandra.apache.org
Subject: Re: Drop/Create table with same CF Name

Thanks Mark (though that article doesn't appear publicly accessible for others).
Truncate would have been the tool of choice, however my understanding is 
truncate fails unless all nodes are up and running which makes it a 
non-workable choice since we can't determine when failures will occur.
Ken

On Thu, May 21, 2015 at 11:00 AM, Mark Reddy 
mark.l.re...@gmail.com wrote:
Yes, it's a known issue. For more information on the topic see this support 
post from DataStax:

https://support.datastax.com/hc/en-us/articles/204226339-How-to-drop-and-recreate-a-table-in-Cassandra-versions-older-than-2-1

Mark

On 21 May 2015 at 15:31, Ken Hancock 
ken.hanc...@schange.com wrote:

We've been running into the reused key cache issue (CASSANDRA-5202) when 
dropping and recreating the same table in Cassandra 1.2.18, so we've been 
testing with key caches disabled, which does not seem to solve the issue. In 
the latest logs it seems that old SSTable metadata gets read after the tables 
have been deleted by the previous drop, eventually causing an exception and the 
Thrift interface to shut down.

At this point is it a known issue that one CANNOT reuse a table name prior to 
Cassandra 2.1 ?









Nodetool on 2.1.5

2015-05-21 Thread Walsh, Stephen
Just wondering if anyone else is seeing this issue on the nodetool after 
installing 2.1.5


This works
nodetool -h 127.0.0.1 cfstats keyspace.table

This works
nodetool -h localhost cfstats keyspace.table

This works
nodetool cfstats keyspace.table

This doesn't work
nodetool -h 192.168.1.10 cfstats keyspace.table
nodetool: Failed to connect to '192.168.1.10:7199' - ConnectException: 
'Connection refused'.

Where 192.168.1.10 is the machine IP,
All firewalls are disabled and it worked fine on version 2.0.13

This has happened on both of our upgraded clusters.
Also no longer able to view the "CF: Total MemTable Size" & "flushes pending" 
in Ops Center 5.1.1, related issue?



RE: SSTables are not getting removed

2015-11-02 Thread Walsh, Stephen
Thanks to both Nate and Jeff for highlighting the bug and the configuration 
issues.

We've upgraded to 2.1.11
Lowered our memtable_cleanup_threshold to 0.11
Lowered our thrift_framed_transport_size_in_mb to 15

We kicked off another run.

The result was that Cassandra failed after 1 hour.
SSTables grew to about 8,000 per node before we lost the JMX connection
(so that's about 32,000 SSTables in total over all nodes).
Major GC happened every 3-5 minutes.

We then reset for a direct comparison between 2.1.6 & 2.1.11.

There was no difference in the output between 2.1.6 and 2.1.11.




From: Nate McCall [mailto:n...@thelastpickle.com]
Sent: 30 October 2015 22:06
To: Cassandra Users 
Subject: Re: SSTables are not getting removed


memtable_offheap_space_in_mb: 4096
memtable_cleanup_threshold: 0.99

^ What led to this setting? You are basically telling Cassandra to not flush 
the highest-traffic memtable until the memtable space is 99% full. With that 
many tables and keyspaces, you are basically locking up everything on the flush 
queue, causing substantial back pressure. If you run 'nodetool tpstats' you 
will probably see a massive number of 'All Time Blocked' for FlushWriter and 
'Dropped' for Mutations.

Actually, this is probably why you are seeing a lot of small tables: commit log 
segments are being filled and blocked from flushing due to the above, so they 
have to attempt to flush repeatedly with whatever is there whenever they get 
the chance.
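For comparison, the shipped default derives from the flush writer count; a 
sketch of the baseline Nate suggests returning to (cassandra.yaml):

# default: 1 / (memtable_flush_writers + 1),
# i.e. 0.2 with the 4 flush writers configured above
memtable_cleanup_threshold: 0.2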

thrift_framed_transport_size_in_mb: 150

^ This is also a super bad idea. Thrift buffers grow as needed to accommodate 
larger results, but they don't ever shrink. This will lead to a bunch of open 
connections holding onto large, empty byte arrays. This will show up 
immediately in a heap dump inspection.

concurrent_compactors: 4
compaction_throughput_mb_per_sec: 0
endpoint_snitch: GossipingPropertyFileSnitch

This grinds our system to a halt and causes a major GC nearly every second.

So far the only way to get around this is to run a cron job every hour that 
does a "nodetool compact".

What's the output of 'nodetool compactionstats'? CASSANDRA-9882 and 
CASSANDRA-9592 could be to blame (both fixed in recent versions) or this could 
just be a side effect of the memory pressure from the above settings.

Start back at the default settings (except snitch - GPFS is always a good place 
to start) and change settings serially and in small increments based on 
feedback gleaned from monitoring runtimes.


--
-
Nate McCall
Austin, TX
@zznate

Co-Founder & Sr. Technical Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com


SSTables are not getting removed

2015-10-30 Thread Walsh, Stephen
Hey all,

First off, thank you for taking the time to read this.

---
SYSTEM SPEC
---

We're using Cassandra version 2.1.6 (please don't ask us to upgrade just yet 
unless you are aware of an existing bug for this issue)
We are running on AWS 4-core, 16 GB servers
We are running a 4 node cluster
We are writing about 1220 rows per second
We are querying about 3400 rows per second
There are 100 keyspaces, each with 10 CFs
Our Replication factor is 3
All our data has a TTL of 10 seconds
Our "gc_grace_seconds" is 0
Durable writes is disabled
Read & write Consistency is ONE
Our HEAP size is 8GB (rest of the heap is default)
Our Compaction Strategy is Level Tiered
-XX:CMSInitiatingOccupancyFraction=50

---
Cassandra.yaml
---
Below are the values we've changed

hinted_handoff_enabled: false
data_file_directories:
- /var/lib/cassandra/data
commitlog_directory: /cassandra
saved_caches_directory: /var/lib/cassandra/saved_caches
memtable_offheap_space_in_mb: 4096
memtable_cleanup_threshold: 0.99
memtable_allocation_type: offheap_objects
memtable_flush_writers: 4
thrift_framed_transport_size_in_mb: 150
concurrent_compactors: 4
compaction_throughput_mb_per_sec: 0
endpoint_snitch: GossipingPropertyFileSnitch


---
Issue
---
Over time all our CFs develop about 1 SSTable per 30 minutes.
Each SSTable contains only 36 rows, with data that is all "marked for deletion".
Over about 15 hours we could have up to 112,000 SSTables over all nodes for all 
CFs.
This grinds our system to a halt and causes a major GC nearly every second.

So far the only way to get around this is to run a cron job every hour that 
does a "nodetool compact".

Is there any reason that these SSTables are not being deleted during normal 
compaction  / minor GC / major GC ?

Regards
Stephen Walsh






Replication of data over 2 datacentres - when one node fails we get replica issues

2015-11-18 Thread Walsh, Stephen
Hey all,

We're testing Cassandra failover over 2 datacentres.

There are 3 nodes in each.
All CFs have a replication factor of 2 in both datacentres (DC1:2, DC2:2).

When one datacentre goes down, all queries go to the other.
This works fine for LOCAL_QUORUM queries, as 2 replicas of the data exist in 
this datacentre.

However, in the scenario where the 2 datacentres are up and one node goes down, 
all queries to that datacentre will fail for LOCAL_QUORUM.
This is because the node that failed held replica data, and there is only 1 
remaining node with the data. So LOCAL_QUORUM, which requires 2 nodes with 
replica data, will fail.
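The arithmetic, for reference:

quorum = floor(RF / 2) + 1 = floor(2 / 2) + 1 = 2

So with a per-DC replication factor of 2, LOCAL_QUORUM needs both local 
replicas of a row, and losing either of them fails those queries.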

Is there a way to not send these queries to the incomplete datacentre?
What's the best way to handle this?


Regards
Stephen Walsh


RE: Cassandra shutdown during large number of compactions - now fails to start with OOM Exception

2015-09-17 Thread Walsh, Stephen
Some more info,

Looking at the Java Memory Dump file.

I see about 400 SSTableScanners - one for each of our column families.
Each is about 200 MB in size.
And (from what I can see) all of them are reading from a 
"compactions_in_progress-ka-00-Data.db" file:

dfile  org.apache.cassandra.io.compress.CompressedRandomAccessReader path = 
"/var/lib/cassandra/data/system/compactions_in_progress-55080ab05d9c388690a4acb25fe1f77b/system-compactions_in_progress-ka-71661-Data.db"
 131840 104

Steve


From: Walsh, Stephen
Sent: 17 September 2015 15:33
To: user@cassandra.apache.org
Subject: Cassandra shutdown during large number of compactions - now fails to 
start with OOM Exception

Hey all, I was hoping someone had a similar issue.
We're using 2.1.6 and shut down a testbed in AWS thinking we were finished with 
it.
We started it back up today and saw that only 2 of 4 nodes came up.

It seems there was a lot of compaction happening at the time it was shut down; 
Cassandra tries to start up and we get an OutOfMemory exception.


INFO  13:45:57 Initializing system.range_xfers
INFO  13:45:57 Initializing system.schema_keyspaces
INFO  13:45:57 Opening 
/var/lib/cassandra/data/system/schema_keyspaces-b0f2235744583cdb9631c43e59ce3676/system-schema_keyspaces-ka-21807
 (19418 bytes)
java.lang.OutOfMemoryError: Java heap space
Dumping heap to /var/log/cassandra/java_pid3011.hprof ...
Heap dump file created [7751760805 bytes in 52.439 secs]
ERROR 13:47:11 Exception encountered during startup
java.lang.OutOfMemoryError: Java heap space


It's not related to the key_cache; we removed this and the issue is still 
present.
So we believe it's re-trying all the compactions that were in progress when it 
went down.

We've modified the HEAP size to be half of the system's RAM (8 GB in this case)

At the moment the only workaround we have is to empty the data / saved_cache / 
commit_log folders and let it re-sync with the other nodes.

Has anyone seen this before and what have they done to solve it?
Can we remove unfinished compactions?

Steve





Cassandra shutdown during large number of compactions - now fails to start with OOM Exception

2015-09-17 Thread Walsh, Stephen
Hey all, I was hoping someone had a similar issue.
We're using 2.1.6 and shut down a testbed in AWS thinking we were finished with 
it.
We started it back up today and saw that only 2 of 4 nodes came up.

It seems there was a lot of compaction happening at the time it was shut down; 
Cassandra tries to start up and we get an OutOfMemory exception.


INFO  13:45:57 Initializing system.range_xfers
INFO  13:45:57 Initializing system.schema_keyspaces
INFO  13:45:57 Opening 
/var/lib/cassandra/data/system/schema_keyspaces-b0f2235744583cdb9631c43e59ce3676/system-schema_keyspaces-ka-21807
 (19418 bytes)
java.lang.OutOfMemoryError: Java heap space
Dumping heap to /var/log/cassandra/java_pid3011.hprof ...
Heap dump file created [7751760805 bytes in 52.439 secs]
ERROR 13:47:11 Exception encountered during startup
java.lang.OutOfMemoryError: Java heap space


It's not related to the key_cache; we removed this and the issue is still 
present.
So we believe it's re-trying all the compactions that were in progress when it 
went down.

We've modified the HEAP size to be half of the system's RAM (8 GB in this case)

At the moment the only workaround we have is to empty the data / saved_cache / 
commit_log folders and let it re-sync with the other nodes.

Has anyone seen this before and what have they done to solve it?
Can we remove unfinished compactions?

Steve





RE: Consistency Issues

2015-10-01 Thread Walsh, Stephen
No such thing as a stupid question ☺
I know they exist on some nodes, but whether they replicated correctly is a 
different story.
I'm checking this one now.

OK, I hooked up OpsCenter to see what it was saying. Out of the 100 keyspaces created:
9 are missing one CF
2 are missing two CF’s
1 is missing three CF’s

It looks like the replication of the tables did not complete to all nodes?

Looking at each of the 4 nodes at the keyspace with 3 missing CF's
(via CQLSH_HOST=x.x.x.x cqlsh and "DESCRIBE KEYSPACE XXX;" – see the scripted check below)

Node 1 : has all CF’s
Node 2 : has all CF’s
Node 3 : has all CF’s
Node 4 : has all CF’s
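A quick way to script that per-node check (hostnames here are hypothetical, and this assumes cqlsh's -e flag from 2.1):

    for host in node1 node2 node3 node4; do
      # count the CREATE TABLE statements each node reports for the keyspace
      echo "$host: $(CQLSH_HOST=$host cqlsh -e 'DESCRIBE KEYSPACE xxx;' | grep -c 'CREATE TABLE')"
    done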


This is indeed very strange….


From: Carlos Alonso [mailto:i...@mrcalonso.com]
Sent: 01 October 2015 12:05
To: user@cassandra.apache.org
Subject: Re: Consistency Issues

And that's a stupid one, I know, but does the column you're trying to access 
actually exist?

Carlos Alonso | Software Engineer | @calonso<https://twitter.com/calonso>

On 1 October 2015 at 11:09, Walsh, Stephen 
<stephen.wa...@aspect.com<mailto:stephen.wa...@aspect.com>> wrote:
I did think of that and they are all the same version ☺


From: Carlos Alonso [mailto:i...@mrcalonso.com<mailto:i...@mrcalonso.com>]
Sent: 01 October 2015 10:11

To: user@cassandra.apache.org<mailto:user@cassandra.apache.org>
Subject: Re: Consistency Issues

Hi Stephen.

The UnknownColumnFamilyException made me think of a possible schema disagreement, in which one of your nodes has a different version and therefore you cannot reach quorum.

Can you run nodetool describecluster and see if all nodes have the same schema 
versions?

Cheers!

Carlos Alonso | Software Engineer | @calonso<https://twitter.com/calonso>

On 1 October 2015 at 09:49, Walsh, Stephen 
<stephen.wa...@aspect.com<mailto:stephen.wa...@aspect.com>> wrote:
If you're looking for the clean-up of the old gen in the JVM heap, it doesn't happen.
We have objects tenuring through the new gen 15 times before they are pushed to the old gen.
Since all our data only has a TTL of 10 seconds, very little data is sent to the old gen.

Add in a heap size of 8GB with a new gen size of 2GB, and I don't think GC is our issue.
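For reference, a sketch of how those numbers map onto conf/cassandra-env.sh (the settings are real, but the values are this cluster's rather than defaults):

    MAX_HEAP_SIZE="8G"   # total JVM heap
    HEAP_NEWSIZE="2G"    # new gen size
    # survive 15 young collections before promotion to the old gen
    JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=15"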


I'm more worried about error messages in the Cassandra log file that state:


UnknownColumnFamilyException reading from socket; closing
org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find 
cfId=cf411b50-6785-11e5-a435-e7be20c92086

and

cassandra OutboundTcpConnection.java:313 - error writing to Connection.



But I really need to understand this best practice that was mentioned (on the number of CF's) by Jack Krupansky.
Does anyone have more information on this?


Many thanks for all your help guys keep it coming ☺
Steve

From: Ricardo Sancho 
[mailto:sancho.rica...@gmail.com<mailto:sancho.rica...@gmail.com>]
Sent: 01 October 2015 09:39
To: user@cassandra.apache.org<mailto:user@cassandra.apache.org>
Subject: RE: Consistency Issues


Can you tell us how much time your gcs are taking?
Do you see any especially long ones?
On 1 Oct 2015 09:37, "Walsh, Stephen" 
<stephen.wa...@aspect.com<mailto:stephen.wa...@aspect.com>> wrote:
There is no load balancer in front of Cassandra; it's in front of our application.
Everyone seems hung up on this point, but it's not the root cause of the inconsistency issue.

Can anyone verify the best practice for the number of CF's?


From: Robert Coli [mailto:rc...@eventbrite.com<mailto:rc...@eventbrite.com>]
Sent: 30 September 2015 18:45
To: user@cassandra.apache.org<mailto:user@cassandra.apache.org>
Subject: Re: Consistency Issues

On Wed, Sep 30, 2015 at 9:06 AM, Walsh, Stephen 
<stephen.wa...@aspect.com<mailto:stephen.wa...@aspect.com>> wrote:

We never had these issues with our first run. It's only when we added another 25% of writes.

As Jack said, you are probably pushing your GC over a threshold, leading to 
long pause times and inability to meet quorum.

As Sebastian said, you probably shouldn't need a load balancer in front of 
Cassandra.

=Rob


RE: Consistency Issues

2015-10-01 Thread Walsh, Stephen
I did think of that and they are all the same version ☺


From: Carlos Alonso [mailto:i...@mrcalonso.com]
Sent: 01 October 2015 10:11
To: user@cassandra.apache.org
Subject: Re: Consistency Issues

Hi Stephen.

The UnknownColumnFamilyException made me think of a possible schema disagreement, in which one of your nodes has a different version and therefore you cannot reach quorum.

Can you run nodetool describecluster and see if all nodes have the same schema 
versions?

Cheers!

Carlos Alonso | Software Engineer | @calonso<https://twitter.com/calonso>

On 1 October 2015 at 09:49, Walsh, Stephen 
<stephen.wa...@aspect.com<mailto:stephen.wa...@aspect.com>> wrote:
If you're looking for the clean-up of the old gen in the JVM heap, it doesn't happen.
We have objects tenuring through the new gen 15 times before they are pushed to the old gen.
Since all our data only has a TTL of 10 seconds, very little data is sent to the old gen.

Add in a heap size of 8GB with a new gen size of 2GB, and I don't think GC is our issue.


I'm more worried about error messages in the Cassandra log file that state:


UnknownColumnFamilyException reading from socket; closing
org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find 
cfId=cf411b50-6785-11e5-a435-e7be20c92086

and

cassandra OutboundTcpConnection.java:313 - error writing to Connection.



But I really need to understand this best practice that was mentioned (on the number of CF's) by Jack Krupansky.
Does anyone have more information on this?


Many thanks for all your help guys keep it coming ☺
Steve

From: Ricardo Sancho 
[mailto:sancho.rica...@gmail.com<mailto:sancho.rica...@gmail.com>]
Sent: 01 October 2015 09:39
To: user@cassandra.apache.org<mailto:user@cassandra.apache.org>
Subject: RE: Consistency Issues


Can you tell us how much time your gcs are taking?
Do you see any especially long ones?
On 1 Oct 2015 09:37, "Walsh, Stephen" 
<stephen.wa...@aspect.com<mailto:stephen.wa...@aspect.com>> wrote:
There is no load balancer in front of Cassandra; it's in front of our application.
Everyone seems hung up on this point, but it's not the root cause of the inconsistency issue.

Can anyone verify the best practice for the number of CF's?


From: Robert Coli [mailto:rc...@eventbrite.com<mailto:rc...@eventbrite.com>]
Sent: 30 September 2015 18:45
To: user@cassandra.apache.org<mailto:user@cassandra.apache.org>
Subject: Re: Consistency Issues

On Wed, Sep 30, 2015 at 9:06 AM, Walsh, Stephen 
<stephen.wa...@aspect.com<mailto:stephen.wa...@aspect.com>> wrote:

We never had these issues with our first run. It's only when we added another 25% of writes.

As Jack said, you are probably pushing your GC over a threshold, leading to 
long pause times and inability to meet quorum.

As Sebastian said, you probably shouldn't need a load balancer in front of 
Cassandra.

=Rob



RE: Consistency Issues

2015-10-01 Thread Walsh, Stephen
Thanks Jake, I'll try out 2.1.9 to see if it resolves the issue, and I'll try "nodetool resetlocalschema" now to see if it helps.

Cassandra is 2.1.6
OpsCenter is 5.2.1

From: Jake Luciani [mailto:jak...@gmail.com]
Sent: 01 October 2015 14:00
To: user 
Subject: Re: Consistency Issues

Onur, I was responding to Stephen's issue.


On Thu, Oct 1, 2015 at 8:56 AM, Onur Yalazı 
> wrote:
Thank you Jake.

The issue is I do not have missing CF's and upgrading beyond 2.1.3 is not a 
possibility because of the deprecation of cql dialects. Our application is 
using Hector and migrating to cql3 is a huge refactoring.



On 01/10/15 15:48, Jake Luciani wrote:
Couple things to try.

1. nodetool resetlocalschema on the nodes with missing CFs. This will refresh 
the schema on the local node.
2. upgrade to 2.1.9. There are some pretty major issues in 2.1.6 (nothing 
specific to this problem but worth upgrading)




--
http://twitter.com/tjake


RE: Consistency Issues

2015-10-05 Thread Walsh, Stephen
It did, but I ran it again on one node – that node never recovered. ☹

From: Robert Coli [mailto:rc...@eventbrite.com]
Sent: 02 October 2015 21:20
To: user@cassandra.apache.org
Subject: Re: Consistency Issues

On Fri, Oct 2, 2015 at 1:32 AM, Walsh, Stephen 
<stephen.wa...@aspect.com<mailto:stephen.wa...@aspect.com>> wrote:
Sorry for the late reply. I ran nodetool resetlocalschema on all nodes, but in the end it just removed all the schemas and crashed the applications.
I need to reset and try again. I'll try to get you the GC stats today ☺

FTR, running resetlocalschema on all nodes (especially simultaneously) seems 
likely to nuke all of your schema.

=Rob



RE: Consistency Issues

2015-10-02 Thread Walsh, Stephen
Sorry for the late reply. I ran nodetool resetlocalschema on all nodes, but in the end it just removed all the schemas and crashed the applications.
I need to reset and try again. I'll try to get you the GC stats today ☺


From: Jonathan Haddad [mailto:j...@jonhaddad.com]
Sent: 01 October 2015 16:01
To: user@cassandra.apache.org
Subject: Re: Consistency Issues

You say that you don't think GC is your issue... but did you actually check?  
The reasons you suggest aren't very convincing.  Can you provide your GC 
settings, and take a look at jstat --gccause?

http://docs.oracle.com/javase/7/docs/technotes/tools/share/jstat.html#gccause_option


On Thu, Oct 1, 2015 at 4:50 AM Walsh, Stephen 
<stephen.wa...@aspect.com<mailto:stephen.wa...@aspect.com>> wrote:
If you're looking for the clean-up of the old gen in the JVM heap, it doesn't happen.
We have objects tenuring through the new gen 15 times before they are pushed to the old gen.
Since all our data only has a TTL of 10 seconds, very little data is sent to the old gen.

Add in a heap size of 8GB with a new gen size of 2GB, and I don't think GC is our issue.


I'm more worried about error messages in the Cassandra log file that state:


UnknownColumnFamilyException reading from socket; closing
org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find 
cfId=cf411b50-6785-11e5-a435-e7be20c92086

and

cassandra OutboundTcpConnection.java:313 - error writing to Connection.



But I really need to understand this best practice that was mentioned (on the number of CF's) by Jack Krupansky.
Does anyone have more information on this?


Many thanks for all your help guys keep it coming ☺
Steve

From: Ricardo Sancho 
[mailto:sancho.rica...@gmail.com<mailto:sancho.rica...@gmail.com>]
Sent: 01 October 2015 09:39
To: user@cassandra.apache.org<mailto:user@cassandra.apache.org>
Subject: RE: Consistency Issues


Can you tell us how much time your gcs are taking?
Do you see any especially long ones?
On 1 Oct 2015 09:37, "Walsh, Stephen" 
<stephen.wa...@aspect.com<mailto:stephen.wa...@aspect.com>> wrote:
There is no load balancer in front of Cassandra; it's in front of our application.
Everyone seems hung up on this point, but it's not the root cause of the inconsistency issue.

Can anyone verify the best practice for the number of CF's?


From: Robert Coli [mailto:rc...@eventbrite.com<mailto:rc...@eventbrite.com>]
Sent: 30 September 2015 18:45
To: user@cassandra.apache.org<mailto:user@cassandra.apache.org>
Subject: Re: Consistency Issues

On Wed, Sep 30, 2015 at 9:06 AM, Walsh, Stephen 
<stephen.wa...@aspect.com<mailto:stephen.wa...@aspect.com>> wrote:

We never had these issues with our first run. It's only when we added another 25% of writes.

As Jack said, you are probably pushing your GC over a threshold, leading to 
long pause times and inability to meet quorum.

As Sebastian said, you probably shouldn't need a load balancer in front of 
Cassandra.

=Rob



RE: Consistency Issues

2015-10-02 Thread Walsh, Stephen

Using the following cmd – sudo su cassandra -c "jstat -gccause 4162" – gave the output below (not sure if it will present correctly on the webpage).

During load we only see data move between the survivor spaces in the new gen, and the old gen never really grows:

  S0     S1      E      O      M      CCS    YGC   YGCT     FGC   FGCT    GCT      LGCC                  GCC
  0.00   70.57   48.69  26.29  97.86  96.62  119   14.087   2     0.100   14.187   Allocation Failure    No GC
  0.00   70.57   49.02  26.29  97.86  96.62  119   14.087   2     0.100   14.187   Allocation Failure    No GC
  0.00   70.57   78.38  26.29  97.86  96.62  119   14.087   2     0.100   14.187   Allocation Failure    No GC
  0.00   70.57   83.99  26.29  97.86  96.62  119   14.087   2     0.100   14.187   Allocation Failure    No GC
  0.00   70.57   90.07  26.29  97.86  96.62  119   14.087   2     0.100   14.187   Allocation Failure    No GC
  0.00   70.57   90.30  26.29  97.86  96.62  119   14.087   2     0.100   14.187   Allocation Failure    No GC
  0.00   70.57   90.40  26.29  97.86  96.62  119   14.087   2     0.100   14.187   Allocation Failure    No GC

From: Walsh, Stephen [mailto:stephen.wa...@aspect.com]
Sent: 02 October 2015 09:32
To: user@cassandra.apache.org
Subject: RE: Consistency Issues

Sorry for the late reply. I ran nodetool resetlocalschema on all nodes, but in the end it just removed all the schemas and crashed the applications.
I need to reset and try again. I'll try to get you the GC stats today ☺


From: Jonathan Haddad [mailto:j...@jonhaddad.com]
Sent: 01 October 2015 16:01
To: user@cassandra.apache.org<mailto:user@cassandra.apache.org>
Subject: Re: Consistency Issues

You say that you don't think GC is your issue... but did you actually check?  
The reasons you suggest aren't very convincing.  Can you provide your GC 
settings, and take a look at jstat --gccause?

http://docs.oracle.com/javase/7/docs/technotes/tools/share/jstat.html#gccause_option


On Thu, Oct 1, 2015 at 4:50 AM Walsh, Stephen 
<stephen.wa...@aspect.com<mailto:stephen.wa...@aspect.com>> wrote:
If you're looking for the clean-up of the old gen in the JVM heap, it doesn't happen.
We have objects tenuring through the new gen 15 times before they are pushed to the old gen.
Since all our data only has a TTL of 10 seconds, very little data is sent to the old gen.

Add in a heap size of 8GB with a new gen size of 2GB, and I don't think GC is our issue.


I'm more worried about error messages in the Cassandra log file that state:


UnknownColumnFamilyException reading from socket; closing
org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find 
cfId=cf411b50-6785-11e5-a435-e7be20c92086

and

cassandra OutboundTcpConnection.java:313 - error writing to Connection.



But I really need to understand this best practice that was mentioned (on the number of CF's) by Jack Krupansky.
Does anyone have more information on this?


Many thanks for all your help guys keep it coming ☺
Steve

From: Ricardo Sancho 
[mailto:sancho.rica...@gmail.com<mailto:sancho.rica...@gmail.com>]
Sent: 01 October 2015 09:39
To: user@cassandra.apache.org<mailto:user@cassandra.apache.org>
Subject: RE: Consistency Issues


Can you tell us how much time your gcs are taking?
Do you see any especially long ones?
On 1 Oct 2015 09:37, "Walsh, Stephen" 
<stephen.wa...@aspect.com<mailto:stephen.wa...@aspect.com>> wrote:
There is no load balancer in front of Cassandra; it's in front of our application.
Everyone seems hung up on this point, but it's not the root cause of the inconsistency issue.

Can anyone verify the best practice for the number of CF's?


From: Robert Coli [mailto:rc...@eventbrite.com<mailto:rc...@eventbrite.com>]
Sent: 30 September 2015 18:45
To: user@cassandra.apache.org<mailto:user@cassandra.apache.org>
Subject: Re: Consistency Issues

On Wed, Sep 30, 2015 at 9:06 AM, Walsh, Stephen 
<stephen.wa...@aspect.com<mailto:stephen.wa...@aspect.com>> wrote:

We never had these issues with our first run. It's only when we added another 25% of writes.

As Jack said, you are probably pushing your GC over a threshold, leading to 
long pause times and inability to meet quorum.

As Sebastian said, you probably shouldn't need a load balancer in front of 
Cassandra.

=Rob


RE: Consistency Issues

2015-10-01 Thread Walsh, Stephen
There is no load balancer in front of Cassandra; it's in front of our application.
Everyone seems hung up on this point, but it's not the root cause of the inconsistency issue.

Can anyone verify the best practice for the number of CF's?


From: Robert Coli [mailto:rc...@eventbrite.com]
Sent: 30 September 2015 18:45
To: user@cassandra.apache.org
Subject: Re: Consistency Issues

On Wed, Sep 30, 2015 at 9:06 AM, Walsh, Stephen 
<stephen.wa...@aspect.com<mailto:stephen.wa...@aspect.com>> wrote:

We never had these issues with our first run. It's only when we added another 25% of writes.

As Jack said, you are probably pushing your GC over a threshold, leading to 
long pause times and inability to meet quorum.

As Sebastian said, you probably shouldn't need a load balancer in front of 
Cassandra.

=Rob



RE: Consistency Issues

2015-10-01 Thread Walsh, Stephen
If you're looking for the clean-up of the old gen in the JVM heap, it doesn't happen.
We have objects tenuring through the new gen 15 times before they are pushed to the old gen.
Since all our data only has a TTL of 10 seconds, very little data is sent to the old gen.

Add in a heap size of 8GB with a new gen size of 2GB, and I don't think GC is our issue.


I'm more worried about error messages in the Cassandra log file that state:


UnknownColumnFamilyException reading from socket; closing
org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find 
cfId=cf411b50-6785-11e5-a435-e7be20c92086

and

cassandra OutboundTcpConnection.java:313 - error writing to Connection.



But I really need to understand this best practice that was mentioned (on the number of CF's) by Jack Krupansky.
Does anyone have more information on this?


Many thanks for all your help guys keep it coming ☺
Steve

From: Ricardo Sancho [mailto:sancho.rica...@gmail.com]
Sent: 01 October 2015 09:39
To: user@cassandra.apache.org
Subject: RE: Consistency Issues


Can you tell us how much time your gcs are taking?
Do you see any especially long ones?
On 1 Oct 2015 09:37, "Walsh, Stephen" 
<stephen.wa...@aspect.com<mailto:stephen.wa...@aspect.com>> wrote:
There is no load balancer in front of Cassandra; it's in front of our application.
Everyone seems hung up on this point, but it's not the root cause of the inconsistency issue.

Can anyone verify the best practice for the number of CF's?


From: Robert Coli [mailto:rc...@eventbrite.com<mailto:rc...@eventbrite.com>]
Sent: 30 September 2015 18:45
To: user@cassandra.apache.org<mailto:user@cassandra.apache.org>
Subject: Re: Consistency Issues

On Wed, Sep 30, 2015 at 9:06 AM, Walsh, Stephen 
<stephen.wa...@aspect.com<mailto:stephen.wa...@aspect.com>> wrote:

We never had these issues with our first run. It's only when we added another 25% of writes.

As Jack said, you are probably pushing your GC over a threshold, leading to 
long pause times and inability to meet quorum.

As Sebastian said, you probably shouldn't need a load balancer in front of 
Cassandra.

=Rob



Consistency Issues

2015-09-30 Thread Walsh, Stephen
Hi there,

We are having some issues with consistency. I'll try my best to explain.

We have an application that was able to
Write ~1000 p/s
Read ~300 p/s
Total CF created: 400
Total Keyspaces created : 80

On a 4 node Cassandra Cluster with
Version 2.1.6
Replication : 3
Consistency  (Read & Write) : LOCAL_QUORUM
Cores : 4
Ram : 15 GB
Heap Size 8GB

This was fine and worked, but was pushing our application to the max.

-

Next we added a load balancer (HaProxy) to our application.
So now we have 3 of our nodes talking to 4 Cassandra Nodes with a load of
Write ~1250 p/s
Read 0p/s
Total CF created: 450
Total Keyspaces created : 100

On our application we now see:
"Cassandra timeout during write query at consistency LOCAL_QUORUM (2 replica were required but only 1 acknowledged the write)"
(we are using the Java Cassandra driver 2.1.6)
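For context, a minimal sketch of such a write with the 2.1 Java driver (the contact point, keyspace, and table here are hypothetical):

    import com.datastax.driver.core.*;

    public class QuorumWrite {
        public static void main(String[] args) {
            Cluster cluster = Cluster.builder()
                    .addContactPoint("10.0.0.1")   // hypothetical Cassandra node
                    .build();
            Session session = cluster.connect();
            // LOCAL_QUORUM with RF=3 needs 2 replica acks in the local DC
            Statement stmt = new SimpleStatement(
                    "INSERT INTO ks.tbl (id, val) VALUES ('a', 'b')")
                    .setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM);
            session.execute(stmt);
            cluster.close();
        }
    }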

So we increased the number of Cassandra nodes to 5, then 6, and each time got the same replication error.

So then we doubled the spec of every node to:
8 cores
30GB  RAM
Heap size 15GB

And we still get this replication error (2 replica were required but only 1 
acknowledged the write)

We know that when we introduce the HaProxy load balancer with 3 of our nodes, it hits Cassandra 3 times quicker.
But we've now increased the Cassandra spec nearly 3-fold for only an extra 250 writes p/s, and it still doesn't work.

We're having a hard time finding out why replication is an issue with a cluster of this size.

We tried to get OpsCenter working to monitor the nodes, but due to the number of CF's in Cassandra the datastax-agent takes 90% of the CPU on every node.

Any suggestion / recommendation would be very welcome.

Regards
Stephen Walsh





RE: Consistency Issues

2015-09-30 Thread Walsh, Stephen
Many thanks for your reply Sebastian,
But the load balancer is being used with our applications, not with Cassandra.
It just allows us to increase the throughput to Cassandra.

Our Generation Tool  -> Load Balancer -> Our Processing Application -> Cassandra

Sebastian, is Jack correct about best practices for the number of CF's?


From: Sebastian Estevez [mailto:sebastian.este...@datastax.com]
Sent: 30 September 2015 17:29
To: user@cassandra.apache.org
Subject: Re: Consistency Issues

Can you provide exact details on where your load balancer is? Like Michael 
said, you shouldn't need one between your client and the c* cluster if you're 
using a DataStax driver.


All the best,




Sebastián Estévez

Solutions Architect | 954 905 8615 | 
sebastian.este...@datastax.com<mailto:sebastian.este...@datastax.com>


On Wed, Sep 30, 2015 at 12:06 PM, Walsh, Stephen 
<stephen.wa...@aspect.com<mailto:stephen.wa...@aspect.com>> wrote:
Many thanks all,

The load balancers are only between our own nodes, not a middleman to Cassandra. It's just so we can push more data into Cassandra.
The only reason we are not using 2.1.9 is time; we haven't had time to test upgrades.

I wasn't able to find any best practices for the number of CF's – where do you see this documented?
I see a lot of comments on 1,000 CF's vs. 1,000 keyspaces.

Errors occur a few times a second, about 10 or so.
They are constant.

Our TTL is 10 seconds on data with gc_grace_seconds set to 0 on each CF.
We don’t seem to get any OOM errors.
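For concreteness, a hedged CQL sketch of that setup (keyspace, table, and columns are hypothetical; the gc_grace_seconds and TTL values are the ones above):

    CREATE TABLE ks.presence (
        id text PRIMARY KEY,
        payload text
    ) WITH gc_grace_seconds = 0;

    INSERT INTO ks.presence (id, payload) VALUES ('row1', 'v1') USING TTL 10;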

We never had these issues with our first run. It's only when we added another 25% of writes.

Many thanks for taking the time to reply Jack



From: Jack Krupansky 
[mailto:jack.krupan...@gmail.com<mailto:jack.krupan...@gmail.com>]
Sent: 30 September 2015 16:53
To: user@cassandra.apache.org<mailto:user@cassandra.apache.org>
Subject: Re: Consistency Issues

More than "low hundreds" (200 or 300 max, and preferably under 100) of 
tables/column families is not exactly a recommended best practice. You may be 
able to get it to work, but probably only with very heavy tuning (i.e., lots of 
time and playing with options) on your own part. IOW, no quick and easy 
solution.

The only immediate issue that pops to mind is that you are hitting a GC pause 
due to the large heap size and high volume.

How frequent are these errors occurring? Like, how much data can you load 
before the first one pops up, and are they then frequent/constant or just 
occasionally/rarely?

Can you test to see if you can see similar timeouts with say only 100 or 50 
tables? At least that might isolate whether the issue relates at all to the 
number of tables vs. raw data rate or GC pause.

Sometimes you can reduce/eliminate the GC pause issue by reducing the heap so 
that it is only modestly above the minimum required to avoid OOM.


-- Jack Krupansky

On Wed, Sep 30, 2015 at 11:22 AM, Walsh, Stephen 
<stephen.wa...@aspect.com<mailto:stephen.wa...@aspect.com>> wrote:
More information,

I've just set up an NTP server to rule out any timing issues.
And I also see this in the Cassandra node log files:

MessagingService-Incoming-/172.31.22.4<http://172.31.22.4>] 2015-09-30 
15:19:14,769 IncomingTcpConnection.java:97 - UnknownColumnFamilyException 
reading from socket; closing
org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find 
cfId=cf411b50-6785-11e5-a435-e7be20c92086

Any idea what this is related to?
All these tests are run with a clean setup of Cassandra nodes followed by a nodetool repair, before any data hits them.


From: Walsh, Stephen 
[mailto:stephen.wa...@aspect.com<mailto:stephen.wa...@aspect.com>]
Sent: 30 September 2015 15:17
To: user@cassandra.apache.org<mailto:user@cassandra.apache.org>
Subject: Consistency Issues

Hi there,

We are having some issues with consistency. I’ll try my best to explain.

RE: Consistency Issues

2015-09-30 Thread Walsh, Stephen
Many thanks all,

The load balancers are only between our own nodes, not a middleman to Cassandra. It's just so we can push more data into Cassandra.
The only reason we are not using 2.1.9 is time; we haven't had time to test upgrades.

I wasn't able to find any best practices for the number of CF's – where do you see this documented?
I see a lot of comments on 1,000 CF's vs. 1,000 keyspaces.

Errors occur a few times a second, about 10 or so.
They are constant.

Our TTL is 10 seconds on data with gc_grace_seconds set to 0 on each CF.
We don’t seem to get any OOM errors.

We never had these issues with our first run. It's only when we added another 25% of writes.

Many thanks for taking the time to reply Jack



From: Jack Krupansky [mailto:jack.krupan...@gmail.com]
Sent: 30 September 2015 16:53
To: user@cassandra.apache.org
Subject: Re: Consistency Issues

More than "low hundreds" (200 or 300 max, and preferably under 100) of 
tables/column families is not exactly a recommended best practice. You may be 
able to get it to work, but probably only with very heavy tuning (i.e., lots of 
time and playing with options) on your own part. IOW, no quick and easy 
solution.

The only immediate issue that pops to mind is that you are hitting a GC pause 
due to the large heap size and high volume.

How frequent are these errors occurring? Like, how much data can you load 
before the first one pops up, and are they then frequent/constant or just 
occasionally/rarely?

Can you test to see if you can see similar timeouts with say only 100 or 50 
tables? At least that might isolate whether the issue relates at all to the 
number of tables vs. raw data rate or GC pause.

Sometimes you can reduce/eliminate the GC pause issue by reducing the heap so 
that it is only modestly above the minimum required to avoid OOM.


-- Jack Krupansky

On Wed, Sep 30, 2015 at 11:22 AM, Walsh, Stephen 
<stephen.wa...@aspect.com<mailto:stephen.wa...@aspect.com>> wrote:
More information,

I've just set up an NTP server to rule out any timing issues.
And I also see this in the Cassandra node log files:

MessagingService-Incoming-/172.31.22.4<http://172.31.22.4>] 2015-09-30 
15:19:14,769 IncomingTcpConnection.java:97 - UnknownColumnFamilyException 
reading from socket; closing
org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find 
cfId=cf411b50-6785-11e5-a435-e7be20c92086

Any idea what this is related to?
All these tests are run with a clean setup of Cassandra nodes followed by a nodetool repair, before any data hits them.


From: Walsh, Stephen 
[mailto:stephen.wa...@aspect.com<mailto:stephen.wa...@aspect.com>]
Sent: 30 September 2015 15:17
To: user@cassandra.apache.org<mailto:user@cassandra.apache.org>
Subject: Consistency Issues

Hi there,

We are having some issues with consistency. I’ll try my best to explain.

We have an application that was able to
Write ~1000 p/s
Read ~300 p/s
Total CF created: 400
Total Keyspaces created : 80

On a 4 node Cassandra Cluster with
Version 2.1.6
Replication : 3
Consistency  (Read & Write) : LOCAL_QUORUM
Cores : 4
Ram : 15 GB
Heap Size 8GB

This was fine and worked, but was pushing our application to the max.

-

Next we added a load balancer (HaProxy) to our application.
So now we have 3 of our nodes talking to 4 Cassandra Nodes with a load of
Write ~1250 p/s
Read 0p/s
Total CF created: 450
Total Keyspaces created : 100

On our application we now see:
"Cassandra timeout during write query at consistency LOCAL_QUORUM (2 replica were required but only 1 acknowledged the write)"
(we are using the Java Cassandra driver 2.1.6)

So we increased the number of Cassandra nodes to 5, then 6, and each time got the same replication error.

So then we doubled the spec of every node to:
8 cores
30GB  RAM
Heap size 15GB

And we still get this replication error (2 replica were required but only 1 
acknowledged the write)

We know that when we introduce the HaProxy load balancer with 3 of our nodes, it hits Cassandra 3 times quicker.
But we've now increased the Cassandra spec nearly 3-fold for only an extra 250 writes p/s, and it still doesn't work.

We're having a hard time finding out why replication is an issue with a cluster of this size.

We tried to get OpsCenter working to monitor the nodes, but due to the number of CF's in Cassandra the datastax-agent takes 90% of the CPU on every node.

Any suggestion / recommendation would be very welcome.

Regards
Stephen Walsh




RE: Consistency Issues

2015-09-30 Thread Walsh, Stephen
More information,

I've just set up an NTP server to rule out any timing issues.
And I also see this in the Cassandra node log files:

MessagingService-Incoming-/172.31.22.4] 2015-09-30 15:19:14,769 
IncomingTcpConnection.java:97 - UnknownColumnFamilyException reading from 
socket; closing
org.apache.cassandra.db.UnknownColumnFamilyException: Couldn't find 
cfId=cf411b50-6785-11e5-a435-e7be20c92086

Any idea what this is related to?
All these tests are run with a clean setup of Cassandra nodes followed by a nodetool repair, before any data hits them.


From: Walsh, Stephen [mailto:stephen.wa...@aspect.com]
Sent: 30 September 2015 15:17
To: user@cassandra.apache.org
Subject: Consistency Issues

Hi there,

We are having some issues with consistency. I'll try my best to explain.

We have an application that was able to
Write ~1000 p/s
Read ~300 p/s
Total CF created: 400
Total Keyspaces created : 80

On a 4 node Cassandra Cluster with
Version 2.1.6
Replication : 3
Consistency  (Read & Write) : LOCAL_QUORUM
Cores : 4
Ram : 15 GB
Heap Size 8GB

This was fine and worked, but was pushing our application to the max.

-

Next we added a load balancer (HaProxy) to our application.
So now we have 3 of our nodes talking to 4 Cassandra Nodes with a load of
Write ~1250 p/s
Read 0p/s
Total CF created: 450
Total Keyspaces created : 100

On our application we now see:
"Cassandra timeout during write query at consistency LOCAL_QUORUM (2 replica were required but only 1 acknowledged the write)"
(we are using the Java Cassandra driver 2.1.6)

So we increased the number of Cassandra nodes to 5, then 6, and each time got the same replication error.

So then we doubled the spec of every node to:
8 cores
30GB  RAM
Heap size 15GB

And we still get this replication error (2 replica were required but only 1 
acknowledged the write)

We know that when we introduce the HaProxy load balancer with 3 of our nodes, it hits Cassandra 3 times quicker.
But we've now increased the Cassandra spec nearly 3-fold for only an extra 250 writes p/s, and it still doesn't work.

We're having a hard time finding out why replication is an issue with a cluster of this size.

We tried to get OpsCenter working to monitor the nodes, but due to the number of CF's in Cassandra the datastax-agent takes 90% of the CPU on every node.

Any suggestion / recommendation would be very welcome.

Regards
Stephen Walsh





RE: Cassandra shutdown during large number of compactions - now fails to start with OOM Exception

2015-09-21 Thread Walsh, Stephen
Although I didn't get an answer on this, it's worth noting that removing the compactions_in_progress folder resolved the issue.
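For anyone hitting the same thing, a sketch of that fix (assuming the default data directory; the path matches the heap dump quoted below):

    sudo service cassandra stop
    # drop only the unfinished-compaction log, not user data
    sudo rm -rf /var/lib/cassandra/data/system/compactions_in_progress-*
    sudo service cassandra start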

From: Walsh, Stephen
Sent: 17 September 2015 16:37
To: 'user@cassandra.apache.org' <user@cassandra.apache.org>
Subject: RE: Cassandra shutdown during large number of compactions - now fails 
to start with OOM Exception

Some more info,

Looking at the Java memory dump file:

I see about 400 SSTableScanners – one for each of our column families.
Each is about 200MB in size.
And (from what I can see) all of them are reading from a "compactions_in_progress-ka-00-Data.db" file:

dfile  org.apache.cassandra.io.compress.CompressedRandomAccessReader path = 
"/var/lib/cassandra/data/system/compactions_in_progress-55080ab05d9c388690a4acb25fe1f77b/system-compactions_in_progress-ka-71661-Data.db"
 131840 104

Steve


From: Walsh, Stephen
Sent: 17 September 2015 15:33
To: user@cassandra.apache.org<mailto:user@cassandra.apache.org>
Subject: Cassandra shutdown during large number of compactions - now fails to 
start with OOM Exception

Hey all, I was hoping someone had seen a similar issue.
We're using 2.1.6 and shut down a testbed in AWS, thinking we were finished with it.
We started it back up today and saw that only 2 of the 4 nodes came up.

It seems there was a lot of compaction happening at the time it was shut down; Cassandra tries to start up and we get an OutOfMemoryError.


INFO  13:45:57 Initializing system.range_xfers
INFO  13:45:57 Initializing system.schema_keyspaces
INFO  13:45:57 Opening 
/var/lib/cassandra/data/system/schema_keyspaces-b0f2235744583cdb9631c43e59ce3676/system-schema_keyspaces-ka-21807
 (19418 bytes)
java.lang.OutOfMemoryError: Java heap space
Dumping heap to /var/log/cassandra/java_pid3011.hprof ...
Heap dump file created [7751760805 bytes in 52.439 secs]
ERROR 13:47:11 Exception encountered during startup
java.lang.OutOfMemoryError: Java heap space


It's not related to the key_cache; we removed this and the issue is still present.
So we believe it's retrying all the compactions that were in place when it went down.

We've modified the heap size to be half of the system's RAM (8GB in this case).

At the moment the only workaround we have is to empty the data / saved_caches / commitlog folders and let the node re-sync with the others.

Has anyone seen this before and what have they done to solve it?
Can we remove unfinished compactions?

Steve





RE: cassandra reads are unbalanced

2015-12-02 Thread Walsh, Stephen
Very good questions.

We have reads and writes at LOCAL_ONE.
There are 2 applications (1 for each DC) that read and write at the same rate to their local DC.
(All reads/writes started perfectly even and degraded over time.)

We use DCAwareRoundRobin policy

An update on the nodetool cleanup – it has helped but hasn't balanced all nodes.
Node 1 on DC2 is still quite high:

Node 1 (DC1)  =  1.35k(seeder)
Node 2 (DC1)  =  1.54k
Node 3 (DC1)  =  1.45k

Node 1 (DC2)  =  2.06k   (seeder)
Node 2 (DC2)  =  1.38k
Node 3 (DC2)  =  1.43k


From: DuyHai Doan [mailto:doanduy...@gmail.com]
Sent: 02 December 2015 14:22
To: user@cassandra.apache.org
Subject: Re: cassandra reads are unbalanced

Which consistency level do you use for reads? ONE? Are you reading from only DC1 or from both DCs?
What is the LoadBalancingStrategy you have configured for your driver? TokenAware wrapped on DCAwareRoundRobin?





On Wed, Dec 2, 2015 at 3:36 PM, Walsh, Stephen 
<stephen.wa...@aspect.com<mailto:stephen.wa...@aspect.com>> wrote:
Hey all,

Thanks for taking the time to help.

So we have 6 cassandra nodes in 2 Data Centers.
Both Data Centers have a replication of 3 – so all nodes have all the data.

Over the last 2 days we've noticed that data reads / writes have shifted from balanced to unbalanced.
(Nodetool status still shows 100% ownership on every node, with similar sizes)


For Example

We monitor the number of reads / writes of every table via the cassandra JMX 
metrics. (cassandra.db.read_count)
Over the last hour of this run

Reads
Node 1 (DC1)  =  1.79k(seeder)
Node 2 (DC1)  =  1.92k
Node 3 (DC1)  =  1.97k

Node 1 (DC2)  =  2.90k   (seeder)
Node 2 (DC2)  =  1.76k
Node 3 (DC2)  =  1.19k

As you see on DC1, everything is pretty well balanced, but on DC2 the reads 
favour Node1 over Node 3.
I ran a nodetool repair yesterday – it ran for 6 hours and, when completed, didn't change the read balance.

Write levels are similar on DC2, but not as bad as reads.

Does anyone have a suggestion on how to rebalance? I'm thinking of maybe running a nodetool cleanup in case some of the keys have shifted?
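One cheap cross-check of those JMX counters is nodetool cfstats on each node (keyspace/table names hypothetical):

    nodetool cfstats ks.tbl | grep "Read Count"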

Regards
Stephen Walsh




RE: cassandra reads are unbalanced

2015-12-03 Thread Walsh, Stephen
Thanks, but keep in mind that both DCs should be getting the same load; our production applications are behind a round-robin load balancer, so each of our local applications talks to its local Cassandra data center.

It took about 4 hours, but the nodetool cleanup eventually balanced all nodes.

From: DuyHai Doan [mailto:doanduy...@gmail.com]
Sent: 02 December 2015 16:27
To: user@cassandra.apache.org
Subject: Re: cassandra reads are unbalanced

If you're using the Java driver with LOCAL_ONE and the default load balancing 
strategy (TokenAware wrapped on DCAwareRoundRobin), the driver will always 
select the primary replica. To change this behavior and introduce some 
randomness so that non primary replicas get a chance to serve a read:

new TokenAwarePolicy(new DCAwareRoundRobinPolicy("local_DC"), true).

The second parameter (true) asks the TokenAware policy to "shuffle" replica on 
each request to avoid always returning the primary replica.
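A minimal sketch of that suggestion with the 2.1 Java driver (the contact point and DC name are hypothetical):

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.policies.DCAwareRoundRobinPolicy;
    import com.datastax.driver.core.policies.TokenAwarePolicy;

    Cluster cluster = Cluster.builder()
            .addContactPoint("10.0.0.1")   // hypothetical seed node
            .withLoadBalancingPolicy(
                    // second argument = true shuffles replicas instead of
                    // always picking the primary one
                    new TokenAwarePolicy(new DCAwareRoundRobinPolicy("DC2"), true))
            .build();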

On Wed, Dec 2, 2015 at 6:44 PM, Walsh, Stephen 
<stephen.wa...@aspect.com<mailto:stephen.wa...@aspect.com>> wrote:
Very good questions.

We have reads and writes at LOCAL_ONE.
There are 2 applications (1 for each DC) that read and write at the same rate to their local DC.
(All reads/writes started perfectly even and degraded over time.)

We use DCAwareRoundRobin policy

An update on the nodetool cleanup – it has helped but hasn't balanced all nodes.
Node 1 on DC2 is still quite high:

Node 1 (DC1)  =  1.35k(seeder)
Node 2 (DC1)  =  1.54k
Node 3 (DC1)  =  1.45k

Node 1 (DC2)  =  2.06k   (seeder)
Node 2 (DC2)  =  1.38k
Node 3 (DC2)  =  1.43k


From: DuyHai Doan [mailto:doanduy...@gmail.com<mailto:doanduy...@gmail.com>]
Sent: 02 December 2015 14:22
To: user@cassandra.apache.org<mailto:user@cassandra.apache.org>
Subject: Re: cassandra reads are unbalanced

Which consistency level do you use for reads? ONE? Are you reading from only DC1 or from both DCs?
What is the LoadBalancingStrategy you have configured for your driver? TokenAware wrapped on DCAwareRoundRobin?





On Wed, Dec 2, 2015 at 3:36 PM, Walsh, Stephen 
<stephen.wa...@aspect.com<mailto:stephen.wa...@aspect.com>> wrote:
Hey all,

Thanks for taking the time to help.

So we have 6 cassandra nodes in 2 Data Centers.
Both Data Centers have a replication of 3 – so all nodes have all the data.

Over the last 2 days we've noticed that data reads / writes have shifted from balanced to unbalanced.
(Nodetool status still shows 100% ownership on every node, with similar sizes)


For Example

We monitor the number of reads / writes of every table via the cassandra JMX 
metrics. (cassandra.db.read_count)
Over the last hour of this run

Reads
Node 1 (DC1)  =  1.79k(seeder)
Node 2 (DC1)  =  1.92k
Node 3 (DC1)  =  1.97k

Node 1 (DC2)  =  2.90k   (seeder)
Node 2 (DC2)  =  1.76k
Node 3 (DC2)  =  1.19k

As you see on DC1, everything is pretty well balanced, but on DC2 the reads 
favour Node1 over Node 3.
I ran a nodetool repair yesterday – it ran for 6 hours and, when completed, didn't change the read balance.

Write levels are similar on DC2, but not as bad as reads.

Does anyone have a suggestion on how to rebalance? I'm thinking of maybe running a nodetool cleanup in case some of the keys have shifted?

Regards
Stephen Walsh




RE: cassandra reads are unbalanced

2015-12-04 Thread Walsh, Stephen
Thanks for your input, but I think I’ve already answered most of your questions.


How many clients do you have performing reads?

--
On Wed, Dec 2, 2015 at 6:44 PM, Walsh, Stephen 
<stephen.wa...@aspect.com<mailto:stephen.wa...@aspect.com>> wrote
….
There are 2 application (1 for each DC) who read and write at the same rate to 
their local DC
….







Is your load balancer in front of your clients or between your clients and 
Cassandra?

--
On Thu, Dec 3, 2015 at 4:58 AM, Walsh, Stephen 
<stephen.wa...@aspect.com<mailto:stephen.wa...@aspect.com>> wrote:
…
our production applications are behind a round robin load balancer
…
--

No load balancers talk to Cassandra – I'm only mentioning this to show that the writes/reads are evenly distributed over the 2 DCs.






Does Node1 of DC2 have the exact same configuration of hardware of the other 
nodes
Yes





Is it in the same rack
It's in AWS, but we have it configured via the GossipingPropertyFileSnitch so that they are all on unique racks.
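With GossipingPropertyFileSnitch each node carries its own conf/cassandra-rackdc.properties; a sketch for one DC2 node (values hypothetical):

    dc=DC2
    rack=rack1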






Maybe your load balancer thinks that node is more capable and handles requests 
faster so that it looks less loaded than the other two nodes
Unlikely; it's all TCP SSL pass-through connections. It doesn't balance on load, it just round-robins each request.





You might also check the read counts after a very short interval of time to see 
if Node1 is uniformly getting more requests or just occasionally
--
On Wed, Dec 2, 2015 at 3:36 PM, Walsh, Stephen 
<stephen.wa...@aspect.com<mailto:stephen.wa...@aspect.com>> wrote …
We monitor the number of reads / writes of every table via the cassandra JMX 
metrics. (cassandra.db.read_count)
…
--
We can only monitor in a 1-hour moving window.




Maybe the other two nodes are in a different rack that occasionally has net 
connectivity issues
Unlikely, since it's AWS.






From: Jack Krupansky [mailto:jack.krupan...@gmail.com]
Sent: 03 December 2015 16:11
To: user@cassandra.apache.org
Subject: Re: cassandra reads are unbalanced

How many clients do you have performing reads?

Is your load balancer in front of your clients or between your clients and 
Cassandra?

Does Node1 of DC2 have the exact same configuration of hardware of the other 
nodes? Is it in the same rack? Maybe your load balancer thinks that node is 
more capable and handles requests faster so that it looks less loaded than the 
other two nodes.

You might also check the read counts after a very short interval of time to see 
if Node1 is uniformly getting more requests or just occasionally. Maybe the 
other two nodes are in a different rack that occasionally has net connectivity 
issues so that the requests get diverted by the client/load balancer to Node1 
during those times.


-- Jack Krupansky

On Thu, Dec 3, 2015 at 4:58 AM, Walsh, Stephen 
<stephen.wa...@aspect.com<mailto:stephen.wa...@aspect.com>> wrote:
Thanks, but keep in mind that both DCs should be getting the same load; our production applications are behind a round-robin load balancer, so each of our local applications talks to its local Cassandra data center.

It took about 4 hours, but the nodetool cleanup eventually balanced all nodes.

From: DuyHai Doan [mailto:doanduy...@gmail.com<mailto:doanduy...@gmail.com>]
Sent: 02 December 2015 16:27

To: user@cassandra.apache.org<mailto:user@cassandra.apache.org>
Subject: Re: cassandra reads are unbalanced

If you're using the Java driver with LOCAL_ONE and the default load balancing 
strategy (TokenAware wrapped on DCAwareRoundRobin), the driver will always 
select the primary replica. To change this behavior and introduce some 
randomness so that non primary replicas get a chance to serve a read:

new TokenAwarePolicy(new DCAwareRoundRobinPolicy("local_DC"), true).

The second parameter (true) asks the TokenAware policy to "shuffle" replica on 
each request to avoid always returning the primary replica.

On Wed, Dec 2, 2015 at 6:44 PM, Walsh, Stephen 
<stephen.wa...@aspect.com<mailto:stephen.wa...@aspect.com>> wrote:
Very good questions.

We have reads and writes at LOCAL_ONE.
There are 2 applications (1 for each DC) that read and write at the same rate to their local DC.
(All reads/writes started perfectly even and degraded over time.)

We use DCAwareRoundRobin policy

An update on the nodetool cleanup – it has helped but hasn't balanced all nodes.
Node 1 on DC2 is still quite high:

Node 1 (DC1)  =  1.35k(seeder)
Node 2 (DC1)  =  1.54k
Node 3 (DC1)  =  1.45k

Node 1 (DC2)  =  2.06k   (seeder)
Node 2 (DC2)  =  1.38k
Node 3 (DC2)  =  1.43k


From: DuyHai Doan [mailto:doanduy...@gmail.com<mailto:doanduy...@gmail.com>]
Sent: 02 December 2015 14:22
To: user@cassandra.apache.org<mailto:user@cassandra.apache.org>
Subject: Re: cassandra reads are unbalanced

Which Consistency level do you use for reads ? ONE 

RE: cassandra reads are unbalanced

2015-12-07 Thread Walsh, Stephen
What client Cassandra driver are you using? Java?
Java driver 2.1.8

Is there only a single thread in each client or are there multiple threads
Multiple, in parallel.

What does your connection code look like
It's a very large class based on config files, but I believe you're interested in this line:


cluster.withLoadBalancingPolicy(
    new DCAwareRoundRobinPolicy(
        config.getString(ConfigurationKeys.CassandraDataCenterName),
        config.getInt(ConfigurationKeys.CassandraFailoverDataCenterNodesToLookAt),
        true))

with each of our applications having a different (local) data center name.

Steve

From: Jack Krupansky [mailto:jack.krupan...@gmail.com]
Sent: 04 December 2015 16:46
To: user@cassandra.apache.org
Subject: Re: cassandra reads are unbalanced

Thanks for the elaboration. A few more questions...

Is there only a single thread in each client or are there multiple threads 
doing reading in parallel? IOW, does a read need to complete before the next 
read is issued.

What client Cassandra driver are you using? Java?

What does your connection code look like, say compared to the example in the 
doc:
http://docs.datastax.com/en/developer/java-driver/2.0/java-driver/quick_start/qsSimpleClientCreate_t.html

Just to make sure it really is connecting only to the local cluster and using 
round robin and whether it is token aware.


-- Jack Krupansky

On Fri, Dec 4, 2015 at 10:51 AM, Walsh, Stephen 
<stephen.wa...@aspect.com<mailto:stephen.wa...@aspect.com>> wrote:
Thanks for your input, but I think I’ve already answered most of your questions.


How many clients do you have performing reads?

--
On Wed, Dec 2, 2015 at 6:44 PM, Walsh, Stephen 
<stephen.wa...@aspect.com<mailto:stephen.wa...@aspect.com>> wrote
….
There are 2 applications (1 for each DC) that read and write at the same rate 
to their local DC
….







Is your load balancer in front of your clients or between your clients and 
Cassandra?

--
On Thu, Dec 3, 2015 at 4:58 AM, Walsh, Stephen 
<stephen.wa...@aspect.com<mailto:stephen.wa...@aspect.com>> wrote:
…
our production applications are behind a round robin load balancer
…
--

No load balancers talk to Cassandra – I'm only mentioning this to show that the 
writes / reads are evenly distributed over the 2 DCs






Does Node1 of DC2 have the exact same hardware configuration as the other 
nodes?
Yes





Is it in the same rack?
It's in AWS – but we have it configured via the GossipingPropertyFileSnitch so 
that they are all on unique racks
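
(For reference, that boils down to a cassandra-rackdc.properties per node along 
these lines – the values here are illustrative:)

# cassandra-rackdc.properties on one DC2 node
dc=DC2
rack=RAC4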






Maybe your load balancer thinks that node is more capable and handles requests 
faster so that it looks less loaded than the other two nodes?
Unlikely, it's all TCP SSL pass-through connections. It doesn't balance on 
load, it just round-robins each request





You might also check the read counts after a very short interval of time to see 
if Node1 is uniformly getting more requests or just occasionally
------
On Wed, Dec 2, 2015 at 3:36 PM, Walsh, Stephen 
<stephen.wa...@aspect.com<mailto:stephen.wa...@aspect.com>> wrote …
We monitor the number of reads / writes of every table via the cassandra JMX 
metrics. (cassandra.db.read_count)
…
--
We can only monitor in a 1-hour moving window




Maybe the other two nodes are in a different rack that occasionally has net 
connectivity issues?
Unlikely, since it's AWS






From: Jack Krupansky 
[mailto:jack.krupan...@gmail.com<mailto:jack.krupan...@gmail.com>]
Sent: 03 December 2015 16:11

To: user@cassandra.apache.org<mailto:user@cassandra.apache.org>
Subject: Re: cassandra reads are unbalanced

How many clients do you have performing reads?

Is your load balancer in front of your clients or between your clients and 
Cassandra?

Does Node1 of DC2 have the exact same hardware configuration as the other 
nodes? Is it in the same rack? Maybe your load balancer thinks that node is 
more capable and handles requests faster so that it looks less loaded than the 
other two nodes.

You might also check the read counts after a very short interval of time to see 
if Node1 is uniformly getting more requests or just occasionally. Maybe the 
other two nodes are in a different rack that occasionally has net connectivity 
issues so that the requests get diverted by the client/load balancer to Node1 
during those times.


-- Jack Krupansky

On Thu, Dec 3, 2015 at 4:58 AM, Walsh, Stephen 
<stephen.wa...@aspect.com<mailto:stephen.wa...@aspect.com>> wrote:
Thanks, but keep in mind that both DCs should be getting the same load; our 
production applications are behind a round robin load balancer, so each of our 
local applications talks to its local Cassandra DataCenter.

It took about 4 hours, but the nodetool cleanup eventually balanced all nodes.

From: DuyHai Doan [mailto:doanduy...@gmail.com<mailto:doanduy...@gmail.com>]
Sent: 02 December 2015 16:27

To: user@cassa

cassandra reads are unbalanced

2015-12-02 Thread Walsh, Stephen
Hey all,

Thanks for taking the time to help.

So we have 6 cassandra nodes in 2 Data Centers.
Both Data Centers have a replication of 3 - so all nodes have all the data.

Over the last 2 days we've noticed that data reads / writes have shifted from 
balanced to unbalanced
(Nodetool status still shows 100% ownership on every node, with similar sizes)


For Example

We monitor the number of reads / writes of every table via the cassandra JMX 
metrics. (cassandra.db.read_count)
Over the last hour of this run

Reads
Node 1 (DC1)  =  1.79k(seeder)
Node 2 (DC1)  =  1.92k
Node 3 (DC1)  =  1.97k

Node 1 (DC2)  =  2.90k   (seeder)
Node 2 (DC2)  =  1.76k
Node 3 (DC2)  =  1.19k

As you can see, on DC1 everything is pretty well balanced, but on DC2 the reads 
favour Node 1 over Node 3.
I ran a nodetool repair yesterday – it ran for 6 hours and, when completed, 
didn't change the read balance.

Write levels are similar on DC2, but not as bad as reads.

Does anyone have any suggestions on how to rebalance? I'm thinking of maybe 
running a nodetool cleanup in case some of the keys have shifted.
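
(If anyone wants to cross-check counters like the above, the equivalent can be 
spot-checked per node with nodetool – the keyspace/table names here are 
placeholders:)

# run on each node; counts are cumulative since the node started
nodetool cfstats my_keyspace.my_table | grep -E 'Read Count|Write Count'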

Regards
Stephen Walsh


This email (including any attachments) is proprietary to Aspect Software, Inc. 
and may contain information that is confidential. If you have received this 
message in error, please do not read, copy or forward this message. Please 
notify the sender immediately, delete it from your system and destroy any 
copies. You may not further disclose or distribute this email or its 
attachments.


RE: Unable to start one Cassandra node: OutOfMemoryError

2015-12-17 Thread Walsh, Stephen
Glad to help :P


From: Mikhail Strebkov [mailto:streb...@gmail.com]
Sent: 10 December 2015 22:35
To: user@cassandra.apache.org
Subject: Re: Unable to start one Cassandra node: OutOfMemoryError

Steve, thanks a ton! Removing compactions_in_progress helped! Now the node is 
running again.

p.s. Sorry for referring to you by the last name in my last email, I got 
confused.

On Thu, Dec 10, 2015 at 2:09 AM, Walsh, Stephen 
<stephen.wa...@aspect.com<mailto:stephen.wa...@aspect.com>> wrote:
8GB is the max recommended for heap size and that’s if you have 32GB or more 
available.

We use 6GB on our 16GB machines and it's very stable

The out-of-memory error could be coming from Cassandra reloading 
compactions_in_progress into memory; you can check this in the log files if 
need be.
You can safely delete that folder inside the data directory.

This can happen if you didn't stop Cassandra with a drain command and wait for 
the compactions to finish.
The last time we hit it was while testing HA, when we force-killed an entire 
cluster.
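
Roughly, the clean shutdown sequence looks like this (paths and service 
commands depend on your install, so treat these as illustrative):

nodetool drain                   # flush memtables; node stops accepting writes
sudo service cassandra stop      # or however your install stops the daemon
# only if the node then fails to start on half-written compaction state:
rm -rf /var/lib/cassandra/data/system/compactions_in_progress*
sudo service cassandra start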

Steve



From: Jeff Jirsa 
[mailto:jeff.ji...@crowdstrike.com<mailto:jeff.ji...@crowdstrike.com>]
Sent: 10 December 2015 02:49
To: user@cassandra.apache.org<mailto:user@cassandra.apache.org>
Subject: Re: Unable to start one Cassandra node: OutOfMemoryError

8G is probably too small for a G1 heap. Raise your heap or try CMS instead.

71% of your heap is collections – may be a weird data model quirk, but try CMS 
first and see if that behaves better.
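
With a stock cassandra-env.sh, switching back to CMS is usually just a matter 
of dropping the G1 flags and setting the usual sizes (illustrative values):

# cassandra-env.sh
MAX_HEAP_SIZE="8G"     # or larger if you stay on G1
HEAP_NEWSIZE="800M"    # CMS young gen; ~100MB per core is the usual guidance
# remove/comment any -XX:+UseG1GC lines so the default CMS flags take effect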



From: Mikhail Strebkov
Reply-To: "user@cassandra.apache.org<mailto:user@cassandra.apache.org>"
Date: Wednesday, December 9, 2015 at 5:26 PM
To: "user@cassandra.apache.org<mailto:user@cassandra.apache.org>"
Subject: Unable to start one Cassandra node: OutOfMemoryError

Hi everyone,

While upgrading our 5-machine cluster from DSE version 4.7.1 (Cassandra 2.1.8) 
to DSE version 4.8.2 (Cassandra 2.1.11), one of the nodes can't start due to an 
OutOfMemoryError.

We're using HotSpot 64-Bit Server VM/1.8.0_45 and G1 garbage collector with 8 
GiB heap.

Average node size is 300 GiB.

I looked at the heap dump with the YourKit profiler 
(www.yourkit.com<http://www.yourkit.com>) and it was quite hard since it's so 
big, but I can't get much out of it: http://i.imgur.com/fIRImma.png

As far as I understand the report, there are 1,332,812 instances of 
org.apache.cassandra.db.Row which retain 8 GiB. I don't understand why all of 
them are still strongly reachable.

Please help me to debug this. I don't even know where to start.
I feel very uncomfortable with 1 node running 4.8.2, 1 node down and 3 nodes 
running 4.7.1 at the same time.

Thanks,
Mikhail


This email (including any attachments) is proprietary to Aspect Software, Inc. 
and may contain information that is confidential. If you have received this 
message in error, please do not read, copy or forward this message. Please 
notify the sender immediately, delete it from your system and destroy any 
copies. You may not further disclose or distribute this email or its 
attachments.



RE: Unable to start one Cassandra node: OutOfMemoryError

2015-12-10 Thread Walsh, Stephen
8GB is the max recommended for heap size and that’s if you have 32GB or more 
available.

We use 6GB on our 16GB machines and it's very stable

The out-of-memory error could be coming from Cassandra reloading 
compactions_in_progress into memory; you can check this in the log files if 
need be.
You can safely delete that folder inside the data directory.

This can happen if you didn't stop Cassandra with a drain command and wait for 
the compactions to finish.
The last time we hit it was while testing HA, when we force-killed an entire 
cluster.

Steve



From: Jeff Jirsa [mailto:jeff.ji...@crowdstrike.com]
Sent: 10 December 2015 02:49
To: user@cassandra.apache.org
Subject: Re: Unable to start one Cassandra node: OutOfMemoryError

8G is probably too small for a G1 heap. Raise your heap or try CMS instead.

71% of your heap is collections – may be a weird data model quirk, but try CMS 
first and see if that behaves better.



From: Mikhail Strebkov
Reply-To: "user@cassandra.apache.org"
Date: Wednesday, December 9, 2015 at 5:26 PM
To: "user@cassandra.apache.org"
Subject: Unable to start one Cassandra node: OutOfMemoryError

Hi everyone,

While upgrading our 5-machine cluster from DSE version 4.7.1 (Cassandra 2.1.8) 
to DSE version 4.8.2 (Cassandra 2.1.11), one of the nodes can't start due to an 
OutOfMemoryError.

We're using HotSpot 64-Bit Server VM/1.8.0_45 and G1 garbage collector with 8 
GiB heap.

Average node size is 300 GiB.

I looked at the heap dump with the YourKit profiler 
(www.yourkit.com) and it was quite hard since it's so 
big, but I can't get much out of it: http://i.imgur.com/fIRImma.png

As far as I understand the report, there are 1,332,812 instances of 
org.apache.cassandra.db.Row which retain 8 GiB. I don't understand why all of 
them are still strongly reachable.

Please help me to debug this. I don't even know where to start.
I feel very uncomfortable with 1 node running 4.8.2, 1 node down and 3 nodes 
running 4.7.1 at the same time.

Thanks,
Mikhail


This email (including any attachments) is proprietary to Aspect Software, Inc. 
and may contain information that is confidential. If you have received this 
message in error, please do not read, copy or forward this message. Please 
notify the sender immediately, delete it from your system and destroy any 
copies. You may not further disclose or distribute this email or its 
attachments.


Balancing tokens over 2 datacenter

2016-04-13 Thread Walsh, Stephen
Hi there,

So we have 2 datacenters with 3 nodes each.
Replication factor is 3 per DC (so each node has all the data).

We have an application in each DC that writes to that Cassandra DC.

Now, due to a misconfiguration in our application, we saw that our 
applications in both DCs were pointing to DC1.

As such, all keyspaces and tables were created on DC1.
The effect of this is that all reads are now going to DC1 and ignoring DC2.

We’ve tried doing nodetool repair / cleanup, but the reads always go to DC1.

Does anyone know how to rebalance the tokens over the DCs?


Regards
Steve


P.S. I know about this article:
http://www.datastax.com/dev/blog/balancing-your-cassandra-cluster
but it doesn’t answer my question regarding token balancing across 2 DCs.

This email (including any attachments) is proprietary to Aspect Software, Inc. 
and may contain information that is confidential. If you have received this 
message in error, please do not read, copy or forward this message. Please 
notify the sender immediately, delete it from your system and destroy any 
copies. You may not further disclose or distribute this email or its 
attachments.


Re: Balancing tokens over 2 datacenter

2016-04-13 Thread Walsh, Stephen
Right again, Alain.
We use the DCAwareRoundRobinPolicy in our Java DataStax driver in each DC 
application to point at that Cassandra DC.



From: Alain RODRIGUEZ <arodr...@gmail.com<mailto:arodr...@gmail.com>>
Reply-To: "user@cassandra.apache.org<mailto:user@cassandra.apache.org>" 
<user@cassandra.apache.org<mailto:user@cassandra.apache.org>>
Date: Wednesday, 13 April 2016 at 15:52
To: "user@cassandra.apache.org<mailto:user@cassandra.apache.org>" 
<user@cassandra.apache.org<mailto:user@cassandra.apache.org>>
Subject: Re: Balancing tokens over 2 datacenter

Steve,

This cluster looks just great.

Now, due to a misconfiguration in our application, we saw that our 
applications in both DCs were pointing to DC1.

This is the only thing to solve, and it happens in the client-side 
configuration.

What client do you use?

Are you using something like 'new DCAwareRoundRobinPolicy("DC1"));' as pointed 
out in Bhuvan's link 
http://stackoverflow.com/questions/22813045/ability-to-write-to-a-particular-cassandra-node
 ? You can use some other policies too.

Then make sure to deploy this on clients that need to use 'DC1' and 'new 
DCAwareRoundRobinPolicy("DC2")' on clients that should be using 'DC2'.
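
A minimal sketch of the DC1 client, assuming the Java driver (swap the DC name 
and contact point for the DC2 deployment):

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.policies.DCAwareRoundRobinPolicy;

Cluster cluster = Cluster.builder()
    .addContactPoint("x.0.0.149")    // a node in the local DC; placeholder
    .withLoadBalancingPolicy(new DCAwareRoundRobinPolicy("DC1"))
    .build();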

Make sure ports are open.

This should be it,

C*heers,
---
Alain Rodriguez - al...@thelastpickle.com<mailto:al...@thelastpickle.com>
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com



2016-04-13 16:28 GMT+02:00 Walsh, Stephen 
<stephen.wa...@aspect.com<mailto:stephen.wa...@aspect.com>>:
Thanks for your help, guys,

As you guessed our schema is

{'class': 'NetworkTopologyStrategy', 'DC1': '3', 'DC2': '3'}  AND 
durable_writes = false;
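
(Spelled out in full, the definition is along these lines – the keyspace name 
is a placeholder:)

CREATE KEYSPACE my_keyspace
  WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': '3', 'DC2': '3'}
  AND durable_writes = false;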


Our reads and writes are on LOCAL_ONE, with each application (now) using its 
own DC as its preferred DC.

Here is the nodetool status for one of our tables (all tables are created the 
same way):


Datacenter: DC1

===

Status=Up/Down

|/ State=Normal/Leaving/Joining/Moving

--  Address Load   Tokens  Owns (effective)  Host ID
   Rack

UN  X.0.0.149  14.6 MB256 100.0%
0f497235-a0bb-4e47-9434-dd0e126aa432  RAC3

UN  X.0.0.251  12.33 MB   256 100.0%
a1307717-4b61-4d57-8658-50460d6d54a1  RAC1

UN  X.0.0.79   21.54 MB   256 100.0%
f353c8f3-6b7c-483b-ad9a-3d66d469079e  RAC2

Datacenter: DC2

===

Status=Up/Down

|/ State=Normal/Leaving/Joining/Moving

--  Address Load   Tokens  Owns (effective)  Host ID
   Rack

UN  X.0.2.32   18.08 MB   256 100.0%
103a1cb3-6580-44bd-bf97-28ae160e1119  RAC6

UN  X.0.2.211  12.46 MB   256 100.0%
8c8dd5ba-806d-43eb-9ee5-af463e443f46  RAC5

UN  X.0.2.186  12.58 MB   256 100.0%
aef904ba-aaab-47f1-9bdc-cc1e0c676f61  RAC4


We ran the nodetool repair and cleanup in case the nodes were balanced but 
just needed cleaning up – this was not the case :(


Steve


From: Alain RODRIGUEZ <arodr...@gmail.com<mailto:arodr...@gmail.com>>
Reply-To: "user@cassandra.apache.org<mailto:user@cassandra.apache.org>" 
<user@cassandra.apache.org<mailto:user@cassandra.apache.org>>
Date: Wednesday, 13 April 2016 at 14:48
To: "user@cassandra.apache.org<mailto:user@cassandra.apache.org>" 
<user@cassandra.apache.org<mailto:user@cassandra.apache.org>>
Subject: Re: Balancing tokens over 2 datacenter

Hi Steve,

As such, all keyspaces and tables were created on DC1.
The effect of this is that all reads are now going to DC1 and ignoring DC2.

I think this is not exactly true. When tables are created, they are created in 
a specific keyspace; no matter where you send the schema change, the schema 
will propagate to all the datacenters the keyspace is replicated to.

So the question is: Is your keyspace using 'DC1: 3, DC2: 3' as replication 
factors? Could you show us the schema and a nodetool status as well?

We’ve tried doing nodetool repair / cleanup – but the reads always go to DC1

Trying to do random things is often not a good idea. For example, as each node 
holds 100% of the data, cleanup is an expensive no-op :-).

Does anyone know how to rebalance the tokens over the DCs?

Yes, I can help on that, but I need to know your current status.

Basically, your keyspace(s) must be using an RF of 3 on the 2 DCs as mentioned, 
your clients need to be configured to stick to the DC in their zone (use a 
DCAware policy with the DC name + LOCAL_ONE/QUORUM, see Bhuvan's links), and 
things should be better.
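
A sketch of how those two pieces fit together in the Java driver (contact 
point and names are placeholders):

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.QueryOptions;
import com.datastax.driver.core.policies.DCAwareRoundRobinPolicy;
import com.datastax.driver.core.policies.TokenAwarePolicy;

Cluster cluster = Cluster.builder()
    .addContactPoint("x.0.0.149")        // a node in the client's local DC
    .withLoadBalancingPolicy(
        new TokenAwarePolicy(new DCAwareRoundRobinPolicy("DC1")))
    .withQueryOptions(new QueryOptions()
        .setConsistencyLevel(ConsistencyLevel.LOCAL_ONE))
    .build();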

If you need more detailed help, let us know what is unclear to you and provide 
us with 'nodetool status' output and with your schema (at least keyspaces 
config).

C*heers,
---
Alain Rodriguez - al...@thelastpickle.com<mailto:al...@thelastpickle.com>
France

The Last Pickle - Apache C

Re: Balancing tokens over 2 datacenter

2016-04-13 Thread Walsh, Stephen
Thanks for your help, guys,

As you guessed our schema is

{'class': 'NetworkTopologyStrategy', 'DC1': '3', 'DC2': '3'}  AND 
durable_writes = false;


Our reads and writes are on LOCAL_ONE, with each application (now) using its 
own DC as its preferred DC.

Here is the nodetool status for one of our tables (all tables are created the 
same way):


Datacenter: DC1

===

Status=Up/Down

|/ State=Normal/Leaving/Joining/Moving

--  Address Load   Tokens  Owns (effective)  Host ID
   Rack

UN  X.0.0.149  14.6 MB256 100.0%
0f497235-a0bb-4e47-9434-dd0e126aa432  RAC3

UN  X.0.0.251  12.33 MB   256 100.0%
a1307717-4b61-4d57-8658-50460d6d54a1  RAC1

UN  X.0.0.79   21.54 MB   256 100.0%
f353c8f3-6b7c-483b-ad9a-3d66d469079e  RAC2

Datacenter: DC2

===

Status=Up/Down

|/ State=Normal/Leaving/Joining/Moving

--  Address Load   Tokens  Owns (effective)  Host ID
   Rack

UN  X.0.2.32   18.08 MB   256 100.0%
103a1cb3-6580-44bd-bf97-28ae160e1119  RAC6

UN  X.0.2.211  12.46 MB   256 100.0%
8c8dd5ba-806d-43eb-9ee5-af463e443f46  RAC5

UN  X.0.2.186  12.58 MB   256 100.0%
aef904ba-aaab-47f1-9bdc-cc1e0c676f61  RAC4


We ran the nodetool repair and cleanup in case the nodes were balanced but 
just needed cleaning up – this was not the case :(


Steve


From: Alain RODRIGUEZ <arodr...@gmail.com<mailto:arodr...@gmail.com>>
Reply-To: "user@cassandra.apache.org<mailto:user@cassandra.apache.org>" 
<user@cassandra.apache.org<mailto:user@cassandra.apache.org>>
Date: Wednesday, 13 April 2016 at 14:48
To: "user@cassandra.apache.org<mailto:user@cassandra.apache.org>" 
<user@cassandra.apache.org<mailto:user@cassandra.apache.org>>
Subject: Re: Balancing tokens over 2 datacenter

Hi Steve,

As such, all keyspaces and tables were created on DC1.
The effect of this is that all reads are now going to DC1 and ignoring DC2.

I think this is not exactly true. When tables are created, they are created in 
a specific keyspace; no matter where you send the schema change, the schema 
will propagate to all the datacenters the keyspace is replicated to.

So the question is: Is your keyspace using 'DC1: 3, DC2: 3' as replication 
factors? Could you show us the schema and a nodetool status as well?

We’ve tried doing nodetool repair / cleanup – but the reads always go to DC1

Trying to do random things is often not a good idea. For example, as each node 
holds 100% of the data, cleanup is an expensive no-op :-).

Does anyone know how to rebalance the tokens over the DCs?

Yes, I can help on that, but I need to know your current status.

Basically, your keyspace(s) must be using an RF of 3 on the 2 DCs as mentioned, 
your clients need to be configured to stick to the DC in their zone (use a 
DCAware policy with the DC name + LOCAL_ONE/QUORUM, see Bhuvan's links), and 
things should be better.

If you need more detailed help, let us know what is unclear to you and provide 
us with 'nodetool status' output and with your schema (at least keyspaces 
config).

C*heers,
---
Alain Rodriguez - al...@thelastpickle.com<mailto:al...@thelastpickle.com>
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com







2016-04-13 15:32 GMT+02:00 Bhuvan Rawal 
<bhu1ra...@gmail.com<mailto:bhu1ra...@gmail.com>>:
This could be because of the way you have configured the policy; have a look 
at the links below for configuring it:

https://datastax.github.io/python-driver/api/cassandra/policies.html

http://stackoverflow.com/questions/22813045/ability-to-write-to-a-particular-cassandra-node

Regards,
Bhuvan

On Wed, Apr 13, 2016 at 6:54 PM, Walsh, Stephen 
<stephen.wa...@aspect.com<mailto:stephen.wa...@aspect.com>> wrote:
Hi there,

So we have 2 datacenters with 3 nodes each.
Replication factor is 3 per DC (so each node has all the data).

We have an application in each DC that writes to that Cassandra DC.

Now, due to a misconfiguration in our application, we saw that our 
applications in both DCs were pointing to DC1.

As such, all keyspaces and tables were created on DC1.
The effect of this is that all reads are now going to DC1 and ignoring DC2.

We’ve tried doing nodetool repair / cleanup, but the reads always go to DC1.

Does anyone know how to rebalance the tokens over the DCs?


Regards
Steve


P.S. I know about this article:
http://www.datastax.com/dev/blog/balancing-your-cassandra-cluster
but it doesn’t answer my question regarding token balancing across 2 DCs.

This email (including any attachments) is proprietary to Aspect Software, Inc. 
and may contain information that is confidential. If you have received this 
message in error, please do not read, copy or forward this message. Please 
notify the sender immediately, delete it from your system and destroy an

Re: Balancing tokens over 2 datacenter

2016-04-14 Thread Walsh, Stephen
Thanks Guys,

I tend to agree that it’s a viable configuration (but I’m biased).
We use Datadog monitoring to view reads / writes per node.

We see all the writes are balanced (due to the replication factor) but all 
reads only go to DC1.
So that confirms the configuration I described :)

Any way to balance the primary tokens over the two DCs? :)

Steve

From: Jeff Jirsa <jeff.ji...@crowdstrike.com<mailto:jeff.ji...@crowdstrike.com>>
Reply-To: <user@cassandra.apache.org<mailto:user@cassandra.apache.org>>
Date: Thursday, 14 April 2016 at 03:05
To: "user@cassandra.apache.org<mailto:user@cassandra.apache.org>" 
<user@cassandra.apache.org<mailto:user@cassandra.apache.org>>
Subject: Re: Balancing tokens over 2 datacenter

100% ownership on all nodes isn’t wrong with 3 nodes in each of 2 DCs and RF=3 
in both of those DCs. That’s exactly what you’d expect it to be, and a 
perfectly viable production config for many workloads.



From: Anuj Wadehra
Reply-To: "user@cassandra.apache.org<mailto:user@cassandra.apache.org>"
Date: Wednesday, April 13, 2016 at 6:02 PM
To: "user@cassandra.apache.org<mailto:user@cassandra.apache.org>"
Subject: Re: Balancing tokens over 2 datacenter

Hi Stephen Walsh,

As per the nodetool output, every node owns 100% of the range. This indicates 
a wrong configuration. It would be good if you could verify and share the 
following properties of the yaml on all nodes:

num_tokens, seeds, cluster_name, listen_address, initial_token.

Also, which snitch are you using? If you use PropertyFileSnitch, please share 
cassandra-topology.properties too.
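
For reference, the relevant cassandra.yaml section looks roughly like this 
(values are illustrative):

cluster_name: 'MyCluster'            # must be identical on every node
num_tokens: 256                      # leave initial_token unset with vnodes
listen_address: x.0.0.251            # this node's own address
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      - seeds: "x.0.0.149,x.0.2.32"  # same list on every node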



Thanks
Anuj

Sent from Yahoo Mail on 
Android<https://overview.mail.yahoo.com/mobile/?.src=Android>

On Wed, 13 Apr, 2016 at 9:46 PM, Walsh, Stephen
<stephen.wa...@aspect.com<mailto:stephen.wa...@aspect.com>> wrote:
Right again, Alain.
We use the DCAwareRoundRobinPolicy in our Java DataStax driver in each DC 
application to point at that Cassandra DC.



From: Alain RODRIGUEZ <arodr...@gmail.com>
Reply-To: "user@cassandra.apache.org" 
<user@cassandra.apache.org>
Date: Wednesday, 13 April 2016 at 15:52
To: "user@cassandra.apache.org" 
<user@cassandra.apache.org>
Subject: Re: Balancing tokens over 2 datacenter

Steve,

This cluster looks just great.

Now, due to a misconfiguration in our application, we saw that our 
applications in both DCs were pointing to DC1.

This is the only thing to solve, and it happens in the client-side 
configuration.

What client do you use?

Are you using something like 'new DCAwareRoundRobinPolicy("DC1"));' as pointed 
out in Bhuvan's link 
http://stackoverflow.com/questions/22813045/ability-to-write-to-a-particular-cassandra-node
 ? You can use some other policies too.

Then make sure to deploy this on clients that need to use 'DC1' and 'new 
DCAwareRoundRobinPolicy("DC2")' on clients that should be using 'DC2'.

Make sure ports are open.

This should be it,

C*heers,
---
Alain Rodriguez - al...@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com



2016-04-13 16:28 GMT+02:00 Walsh, Stephen 
<stephen.wa...@aspect.com>:
Thanks for your help, guys,

As you guessed our schema is

{'class': 'NetworkTopologyStrategy', 'DC1': '3', 'DC2': '3'}  AND 
durable_writes = false;


Our reads and writes are on LOCAL_ONE, with each application (now) using its 
own DC as its preferred DC.

Here is the nodetool status for one of our tables (all tables are created the 
same way):


Datacenter: DC1

===

Status=Up/Down

|/ State=Normal/Leaving/Joining/Moving

--  Address Load   Tokens  Owns (effective)  Host ID
   Rack

UN  X.0.0.149  14.6 MB256 100.0%
0f497235-a0bb-4e47-9434-dd0e126aa432  RAC3

UN  X.0.0.251  12.33 MB   256 100.0%
a1307717-4b61-4d57-8658-50460d6d54a1  RAC1

UN  X.0.0.79   21.54 MB   256 100.0%
f353c8f3-6b7c-483b-ad9a-3d66d469079e  RAC2

Datacenter: DC2

===

Status=Up/Down

|/ State=Normal/Leaving/Joining/Moving

--  Address Load   Tokens  Owns (effective)  Host ID
   Rack

UN  X.0.2.32   18.08 MB   256 100.0%
103a1cb3-6580-44bd-bf97-28ae160e1119  RAC6

UN  X.0.2.211  12.46 MB   256 100.0%
8c8dd5ba-806d-43eb-9ee5-af463e443f46  RAC5

UN  X.0.2.186  12.58 MB   256 100.0%
aef904ba-aaab-47f1-9bdc-cc1e0c676f61  RAC4


We ran the nodetool repair and cleanup in case the nodes were balanced but 
just needed cleaning up – this was not the case :(


Steve


From: Alain RODRIGUEZ <arodr...@gmail.com>
Reply-To: "user@cassandra.apache.org" 
<user@cassandra.apache.org>
Date: Wednesday, 13 April 2016 at 14:48
To: "user@cassandra.apache.org" 
<user@cassandra.apache.org>
Subject: Re: Balancing tokens over 2 datacent