Re: when a node is dead in Cassandra cluster

2015-09-21 Thread Jeff Ferland
A dead node should remain in the ring until it is replaced. If you remove a node 
without a replacement, that replica's ownership will be placed onto another node 
without the data having been transferred, and queries against that range will 
falsely return empty results until a repair is completed. I believe the most 
correct action is to start up a replacement node using the 
-Dcassandra.replace_address=ip_of_dead_node command line argument. Reference:
http://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_replace_node_t.html
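
For example, on the replacement host (a sketch; the IP below is a placeholder for 
the dead node's address, and file locations depend on your packaging):

  # in cassandra-env.sh on the not-yet-started replacement node
  JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address=10.0.0.12"   # dead node's IP (placeholder)
  # then start Cassandra normally; the new node streams the dead node's ranges
  sudo service cassandra start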

-Jeff

> On Sep 21, 2015, at 5:32 PM, Shenghua(Daniel) Wan  
> wrote:
> 
> Hi, 
> When a node is dead, is it supposed to exist in the ring? When I find a node 
> is lost and check with nodetool and OpsCenter, I still see the lost node 
> in the token ring. When I call describe_ring, the lost node is also returned. Is 
> this how it is supposed to be? Why doesn't the C* server hide the lost nodes 
> from the clients?
> 
> Thanks a lot!
> 
> -- 
> 
> Regards,
> Shenghua (Daniel) Wan



Re: when a node is dead in Cassandra cluster

2015-09-21 Thread John Wong
On Mon, Sep 21, 2015 at 8:32 PM, Shenghua(Daniel) Wan  wrote:

> Hi,
> When a node is dead, is it supposed to exist in the ring?
>

It is still considered part of the cluster. Imagine a rolling restart: a node
would be temporarily out of service for anywhere from a few minutes to a few
hours, and it would show as DN in the status column.
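
For example, nodetool status on a live node will keep listing the down peer, just
flagged DN (illustrative output, addresses made up):

  $ nodetool status
  --  Address     Load    Tokens  Owns  Host ID  Rack
  UN  10.0.0.11   1.2 GB  256     ?     ...      r1
  DN  10.0.0.12   1.1 GB  256     ?     ...      r1    <- down, but still a member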

> Why doesn't the C* server hide the lost nodes from the clients?


Is it the client contacting this dead node directly, or is it the Cassandra
coordinator unable to get a quorum?


when a node is dead in Cassandra cluster

2015-09-21 Thread Shenghua(Daniel) Wan
Hi,
When a node is dead, is it supposed to exist in the ring? When I find a
node is lost and check with nodetool and OpsCenter, I still see the
lost node in the token ring. When I call describe_ring, the lost node is also
returned. Is this how it is supposed to be? Why doesn't the C* server hide the
lost nodes from the clients?

Thanks a lot!

-- 

Regards,
Shenghua (Daniel) Wan


High read latency

2015-09-21 Thread Jaydeep Chovatia
Hi,

My application issues more read requests than writes. Under load, I see that the
cfstats read latency for one of the tables is quite high, around 43 ms:

Local read count: 114479357
Local read latency: 43.442 ms
Local write count: 22288868
Local write latency: 0.609 ms


Here is my node configuration:
RF=3, read/write with QUORUM, 64 GB RAM, 48 CPU cores. I have only 5 GB of
data on each node (and for experimental purposes I stored the data on tmpfs).

I've tried increasing the concurrent_reads setting up to 512, but it did not help
read latency. CPU/memory/IO look fine on the system.
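
For reference, the 43 ms number above came from nodetool cfstats; commands like
the following (2.1-era syntax, keyspace/table names are placeholders) show the
rest of the picture:

  nodetool cfstats my_keyspace.my_table        # per-table read/write latency, SSTable count
  nodetool cfhistograms my_keyspace my_table   # latency percentiles and SSTables read per query
  nodetool tpstats                             # pending/blocked ReadStage tasks under load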

Any idea what should I tune?

Jaydeep


Re: Help with tombstones and compaction

2015-09-21 Thread Jeff Jirsa
The timestamp involved here isn't the one defined in the schema; it's the 
timestamp written on each cell when you apply a mutation (a write).

That timestamp is the one returned by WRITETIME(), and visible in 
sstablemetadata – it’s not visible in the schema directly. 

Failing to have the proper unit (milliseconds vs microseconds) could certainly 
cause confusion.
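
A quick way to check what your writes actually carry (a sketch; the SSTable path
is a placeholder, and sstablemetadata lives under the Cassandra tools/bin
directory):

  # cell write timestamps as Cassandra sees them (microseconds since epoch by default)
  cqlsh -e "SELECT column1, WRITETIME(value) FROM kairosdb.data_points LIMIT 5;"
  # min/max cell timestamps recorded in a given SSTable
  sstablemetadata /var/lib/cassandra/data/kairosdb/data_points-*/kairosdb-data_points-ka-1-Data.db | grep -i timestamp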



From:  Venkatesh Arivazhagan
Reply-To:  "user@cassandra.apache.org"
Date:  Monday, September 21, 2015 at 1:41 PM
To:  "user@cassandra.apache.org"
Subject:  Re: Help with tombstones and compaction

Thank you for your reply Jeff! 

I will switch to Cassandra 2.1.9. 

Quick follow-up question: do the schema and settings I have set up look alright? 
My timestamp column's type is blob - I was wondering if this could confuse DTCS.

On Sun, Sep 20, 2015 at 3:37 PM, Jeff Jirsa  wrote:
2.1.4 is getting pretty old. There’s a DTCS deletion tweak in 2.1.5 ( 
https://issues.apache.org/jira/browse/CASSANDRA-8359 ) that may help you.

2.1.5 and 2.1.6 have some memory leak issues in DTCS, so go to 2.1.7 or newer 
(probably 2.1.9 unless you have a compelling reason not to go to 2.1.9)


From: Venkatesh Arivazhagan
Reply-To: "user@cassandra.apache.org"
Date: Sunday, September 20, 2015 at 2:48 PM
To: "user@cassandra.apache.org"
Subject: Help with tombstones and compaction

Hi Guys,

I have a Cassandra 2.1.4 cluster with 14 nodes. I am using it primarily for 
storing time series data collected via KairosDB.
The default TTL for data inserted into the column family named data_points is 
12 hours. I have also set gc_grace_seconds to 12 hours.
In spite of this, my disk space keeps increasing and it looks like tombstones 
are never dropped.

It looks like compactions are happening on a regular basis. The SSTable count 
does not seem outrageous either. It is constantly between ~10 and ~22.

Am I doing anything wrong? Is there a way to mitigate this?

Attached:
* DESC output for my keyspace
* Disk usage graph
* LiveSSTable Count graph

--

CREATE KEYSPACE kairosdb WITH replication = {'class': 'SimpleStrategy', 
'replication_factor': '3'}  AND durable_writes = true;

CREATE TABLE kairosdb.data_points (
key blob,
column1 blob,
value blob,
PRIMARY KEY (key, column1)
) WITH COMPACT STORAGE
AND CLUSTERING ORDER BY (column1 ASC)
AND bloom_filter_fp_chance = 0.01
AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
AND comment = ''
AND compaction = {'max_sstable_age_days': '365', 'base_time_seconds': 
'3600', 'max_threshold': '32', 'timestamp_resolution': 'MILLISECONDS', 
'enabled': 'true', 'tombstone_compaction_interval': '1', 'min_threshold': '4', 
'tombstone_threshold': '.1', 'class': 
'org.apache.cassandra.db.compaction.DateTieredCompactionStrategy'}
AND compression = {'sstable_compression': 
'org.apache.cassandra.io.compress.LZ4Compressor'}
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 43200
AND gc_grace_seconds = 43200
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.1
AND speculative_retry = 'NONE';

CREATE TABLE kairosdb.row_key_index (
key blob,
column1 blob,
value blob,
PRIMARY KEY (key, column1)
) WITH COMPACT STORAGE
AND CLUSTERING ORDER BY (column1 ASC)
AND bloom_filter_fp_chance = 0.01
AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
AND comment = ''
AND compaction = {'min_threshold': '4', 'class': 
'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 
'max_threshold': '32'}
AND compression = {'sstable_compression': 
'org.apache.cassandra.io.compress.LZ4Compressor'}
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 43200
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.1
AND speculative_retry = 'NONE';

CREATE TABLE kairosdb.string_index (
key blob,
column1 text,
value blob,
PRIMARY KEY (key, column1)
) WITH COMPACT STORAGE
AND CLUSTERING ORDER BY (column1 ASC)
AND bloom_filter_fp_chance = 0.01
AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
AND comment = ''
AND compaction = {'min_threshold': '4', 'class': 
'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 
'max_threshold': '32'}
AND compression = {'sstable_compression': 
'org.apache.cassandra.io.compress.LZ4Compressor'}
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 43200
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.1
AND speculative_retry = 'NONE';
-

Re: Help with tombstones and compaction

2015-09-21 Thread Venkatesh Arivazhagan
Thank you for your reply Jeff!

I will switch to Cassandra 2.1.9.

Quick follow-up question: do the schema and settings I have set up look
alright? My timestamp column's type is blob - I was wondering if this could
confuse DTCS.

On Sun, Sep 20, 2015 at 3:37 PM, Jeff Jirsa 
wrote:

> 2.1.4 is getting pretty old. There’s a DTCS deletion tweak in 2.1.5 (
> https://issues.apache.org/jira/browse/CASSANDRA-8359 ) that may help you.
>
> 2.1.5 and 2.1.6 have some memory leak issues in DTCS, so go to 2.1.7 or
> newer (probably 2.1.9 unless you have a compelling reason not to go to
> 2.1.9)
>
>
> From: Venkatesh Arivazhagan
> Reply-To: "user@cassandra.apache.org"
> Date: Sunday, September 20, 2015 at 2:48 PM
> To: "user@cassandra.apache.org"
> Subject: Help with tombstones and compaction
>
> Hi Guys,
>
> I have a Cassandra 2.1.4 cluster with 14 nodes. I am using it primarily
> for storing time series data collected via KairosDB.
> The default TTL for data inserted into the column family named data_points
> is 12 hours. I have also set gc_grace_seconds to 12 hours.
> In spite of this, my disk space keeps increasing and it looks like
> tombstones are never dropped.
>
> It looks like compactions are happening on a regular basis. The SSTable
> count does not seem outrageous either. It is constantly between ~10 and ~22.
>
> Am I doing anything wrong? Is there a way to mitigate this?
>
> Attached:
> * DESC output for my keyspace
> * Disk usage graph
> * LiveSSTable Count graph
>
>
> --
>
> CREATE KEYSPACE kairosdb WITH replication = {'class': 'SimpleStrategy',
> 'replication_factor': '3'}  AND durable_writes = true;
>
> CREATE TABLE kairosdb.data_points (
> key blob,
> column1 blob,
> value blob,
> PRIMARY KEY (key, column1)
> ) WITH COMPACT STORAGE
> AND CLUSTERING ORDER BY (column1 ASC)
> AND bloom_filter_fp_chance = 0.01
> AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
> AND comment = ''
> AND compaction = {'max_sstable_age_days': '365', 'base_time_seconds':
> '3600', 'max_threshold': '32', 'timestamp_resolution': 'MILLISECONDS',
> 'enabled': 'true', 'tombstone_compaction_interval': '1', 'min_threshold':
> '4', 'tombstone_threshold': '.1', 'class':
> 'org.apache.cassandra.db.compaction.DateTieredCompactionStrategy'}
> AND compression = {'sstable_compression':
> 'org.apache.cassandra.io.compress.LZ4Compressor'}
> AND dclocal_read_repair_chance = 0.1
> AND default_time_to_live = 43200
> AND gc_grace_seconds = 43200
> AND max_index_interval = 2048
> AND memtable_flush_period_in_ms = 0
> AND min_index_interval = 128
> AND read_repair_chance = 0.1
> AND speculative_retry = 'NONE';
>
> CREATE TABLE kairosdb.row_key_index (
> key blob,
> column1 blob,
> value blob,
> PRIMARY KEY (key, column1)
> ) WITH COMPACT STORAGE
> AND CLUSTERING ORDER BY (column1 ASC)
> AND bloom_filter_fp_chance = 0.01
> AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
> AND comment = ''
> AND compaction = {'min_threshold': '4', 'class':
> 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy',
> 'max_threshold': '32'}
> AND compression = {'sstable_compression':
> 'org.apache.cassandra.io.compress.LZ4Compressor'}
> AND dclocal_read_repair_chance = 0.1
> AND default_time_to_live = 0
> AND gc_grace_seconds = 43200
> AND max_index_interval = 2048
> AND memtable_flush_period_in_ms = 0
> AND min_index_interval = 128
> AND read_repair_chance = 0.1
> AND speculative_retry = 'NONE';
>
> CREATE TABLE kairosdb.string_index (
> key blob,
> column1 text,
> value blob,
> PRIMARY KEY (key, column1)
> ) WITH COMPACT STORAGE
> AND CLUSTERING ORDER BY (column1 ASC)
> AND bloom_filter_fp_chance = 0.01
> AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
> AND comment = ''
> AND compaction = {'min_threshold': '4', 'class':
> 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy',
> 'max_threshold': '32'}
> AND compression = {'sstable_compression':
> 'org.apache.cassandra.io.compress.LZ4Compressor'}
> AND dclocal_read_repair_chance = 0.1
> AND default_time_to_live = 0
> AND gc_grace_seconds = 43200
> AND max_index_interval = 2048
> AND memtable_flush_period_in_ms = 0
> AND min_index_interval = 128
> AND read_repair_chance = 0.1
> AND speculative_retry = 'NONE';
>
> --
>


Re: What is your backup strategy for Cassandra?

2015-09-21 Thread Sanjay Baronia
John,

Yes, the Trilio solution is private, and today it is for Cassandra running in 
VMware and OpenStack environments. AWS support is on the roadmap. I will reach out 
separately to give you a demo after the summit.

Thanks,

Sanjay
_
Sanjay Baronia
VP of Product & Solutions Management
Trilio Data
(c) 508-335-2306
sanjay.baro...@triliodata.com

Experience Trilio in action, please click here to request a demo today!

From: John Wong <gokoproj...@gmail.com>
Reply-To: Cassandra Maillist <user@cassandra.apache.org>
Date: Friday, September 18, 2015 at 8:02 PM
To: Cassandra Maillist <user@cassandra.apache.org>
Subject: Re: What is your backup strategy for Cassandra?



On Fri, Sep 18, 2015 at 3:02 PM, Sanjay Baronia 
<sanjay.baro...@triliodata.com> wrote:

Will be at the Cassandra summit next week if any of you would like a demo.


Sanjay, is Trilio Data's work private? Unfortunately I will not attend the 
Summit, but maybe Trilio can also talk about this in, say, a Cassandra Planet 
blog post? I'd like to see a demo or get a little more technical. If it were open 
source, that would be cool.

I didn't implement our solution, but the current one is based on full snapshot 
copies to a remote server using rsync (which only transfers what is needed). On 
the remote server we have a complete backup for every hour, so if you cd into the 
data directory you can get every node's exact moment-in-time data, as if you were 
browsing the actual nodes.
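
I didn't set it up myself, but the shape of it is roughly the following, run once
per node per hour (hostnames and paths are just examples; --link-dest hard-links
files that haven't changed against the previous hour's copy, so only new SSTables
are transferred and stored):

  rsync -a --delete \
    --link-dest=/backups/node1/2015091812 \
    node1.example.com:/var/lib/cassandra/data/ \
    /backups/node1/2015091813/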

We are an AWS shop, so we can further optimize our cost by using EBS snapshots so 
that the provisioned volume size can be reduced (currently we provisioned 4000 GB, 
which is too much). We did try S3, and it is an okay solution; the downsides are 
performance and the ability to quickly go back in time. With EBS I can create a 
dozen volumes from the same snapshot, attach one to each of my nodes, and cp -r 
the files over.

John

From: Maciek Sakrejda <mac...@heroku.com>
Reply-To: Cassandra Maillist <user@cassandra.apache.org>
Date: Friday, September 18, 2015 at 2:09 PM
To: Cassandra Maillist <user@cassandra.apache.org>
Subject: Re: What is your backup strategy for Cassandra?

On Thu, Sep 17, 2015 at 7:46 PM, Marc Tamsky <mtam...@gmail.com> wrote:
This seems like an apt time to quote [1]:

> Remember that you get 1 point for making a backup and 10,000 points for 
> restoring one.

Restoring from backups is my goal.

The commonly recommended tools (tablesnap, cassandra_snapshotter) all seem to 
leave the restore operation as a pretty complicated exercise for the operator.

Do any include a working way to restore, on a different host, all of node X's 
data from backups to the correct directories, such that the restored files are 
in the proper places and the node restart method [2] "just works"?
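
For concreteness, the node restart method amounts to roughly the following, with
paths and keyspace names as placeholders; the question is which tools will lay the
files out in the right places for you:

  sudo service cassandra stop
  rsync -a /backups/nodeX/my_keyspace/ /var/lib/cassandra/data/my_keyspace/
  sudo service cassandra start
  nodetool repair my_keyspace   # reconcile anything written since the backup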

As someone getting started with Cassandra, I'm very much interested in this as 
well. It seems that, for the most part, folks rely on replication and node 
replacement to recover from failures, and perhaps that is a testament to how well 
this works, but as long as we're hauling out aphorisms, "RAID is not a backup" 
seems to (partially) apply here too.

I'd love to hear more about how the community does restores, too. This isn't 
complaining about shoddy tooling: this is trying to understand--and hopefully, 
in time, improve--the status quo re: disaster recovery. E.g., given that 
tableslurp operates on a single table at a time, do people normally just 
restore single tables? Is that used when there's filesystem or disk corruption? 
Bugs? Other issues? Looking forward to learning more.

Thanks,
Maciek



Re: Unable to remove dead node from cluster.

2015-09-21 Thread Dikang Gu
I have tried all of them; none of them worked.
1. decommission: the host had a hardware issue, and I cannot connect to it.
2. removenode: there is no Host ID, so removenode did not work.
3. unsafeAssassinateEndpoint: it throws the NPE I pasted before. Can we fix it?

Thanks
Dikang.

On Mon, Sep 21, 2015 at 11:11 AM, Sebastian Estevez <
sebastian.este...@datastax.com> wrote:

> Order is decommission, remove, assassinate.
>
> Which have you tried?
> On Sep 21, 2015 10:47 AM, "Dikang Gu"  wrote:
>
>> Hi there,
>>
>> I have a dead node in our cluster, which is in a weird state right now and
>> cannot be removed from the cluster.
>>
>> The nodetool status output shows:
>> Datacenter: DC1
>> ===
>> Status=Up/Down
>> |/ State=Normal/Leaving/Joining/Moving
>> --  Address  Load   Tokens  OwnsHost ID
>> Rack
>> DN  10.210.165.55?  256 ?   null
>>  r1
>>
>> I tried the unsafeAssassinateEndpoint, but got exception like:
>> 2015-09-18_23:21:40.79760 INFO  23:21:40 InetAddress /10.210.165.55 is
>> now DOWN
>> 2015-09-18_23:21:40.80667 ERROR 23:21:40 Exception in thread
>> Thread[GossipStage:1,5,main]
>> 2015-09-18_23:21:40.80668 java.lang.NullPointerException: null
>> 2015-09-18_23:21:40.80669   at
>> org.apache.cassandra.service.StorageService.getApplicationStateValue(StorageService.java:1584)
>> ~[apache-cassandra-2.1.8+git20150804.076b0b1.jar:2.1.8+git20150804.076b0b1]
>> 2015-09-18_23:21:40.80669   at
>> org.apache.cassandra.service.StorageService.getTokensFor(StorageService.java:1592)
>> ~[apache-cassandra-2.1.8+git20150804.076b0b1.jar:2.1.8+git20150804.076b0b1]
>> 2015-09-18_23:21:40.80670   at
>> org.apache.cassandra.service.StorageService.handleStateLeft(StorageService.java:1822)
>> ~[apache-cassandra-2.1.8+git20150804.076b0b1.jar:2.1.8+git20150804.076b0b1]
>> 2015-09-18_23:21:40.80671   at
>> org.apache.cassandra.service.StorageService.onChange(StorageService.java:1495)
>> ~[apache-cassandra-2.1.8+git20150804.076b0b1.jar:2.1.8+git20150804.076b0b1]
>> 2015-09-18_23:21:40.80671   at
>> org.apache.cassandra.service.StorageService.onJoin(StorageService.java:2121)
>> ~[apache-cassandra-2.1.8+git20150804.076b0b1.jar:2.1.8+git20150804.076b0b1]
>> 2015-09-18_23:21:40.80672   at
>> org.apache.cassandra.gms.Gossiper.handleMajorStateChange(Gossiper.java:1009)
>> ~[apache-cassandra-2.1.8+git20150804.076b0b1.jar:2.1.8+git20150804.076b0b1]
>> 2015-09-18_23:21:40.80673   at
>> org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:1113)
>> ~[apache-cassandra-2.1.8+git20150804.076b0b1.jar:2.1.8+git20150804.076b0b1]
>> 2015-09-18_23:21:40.80673   at
>> org.apache.cassandra.gms.GossipDigestAck2VerbHandler.doVerb(GossipDigestAck2VerbHandler.java:49)
>> ~[apache-cassandra-2.1.8+git20150804.076b0b1.jar:2.1.8+git20150804.076b0b1]
>> 2015-09-18_23:21:40.80673   at
>> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:62)
>> ~[apache-cassandra-2.1.8+git20150804.076b0b1.jar:2.1.8+git20150804.076b0b1]
>> 2015-09-18_23:21:40.80674   at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>> ~[na:1.7.0_45]
>> 2015-09-18_23:21:40.80674   at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> ~[na:1.7.0_45]
>> 2015-09-18_23:21:40.80674   at java.lang.Thread.run(Thread.java:744)
>> ~[na:1.7.0_45]
>> 2015-09-18_23:21:40.85812 WARN  23:21:40 Not marking nodes down due to
>> local pause of 10852378435 > 50
>>
>> Any suggestions about how to remove it?
>> Thanks.
>>
>> --
>> Dikang
>>
>>


-- 
Dikang


Re: Unable to remove dead node from cluster.

2015-09-21 Thread Sebastian Estevez
Order is decommission, remove, assassinate.

Which have you tried?
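
Concretely (the host ID is a placeholder; a nodetool assassinate subcommand only
exists in 2.2+, so on 2.1 the last resort is the unsafeAssassinateEndpoint JMX
operation on the Gossiper MBean, e.g. via jmxterm):

  nodetool decommission            # run on the node being removed, while it is still up
  nodetool removenode <host-id>    # run from any live node, Host ID taken from nodetool status
  # last resort, via JMX:
  #   bean org.apache.cassandra.net:type=Gossiper
  #   run unsafeAssassinateEndpoint 10.210.165.55
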
On Sep 21, 2015 10:47 AM, "Dikang Gu"  wrote:

> Hi there,
>
> I have a dead node in our cluster, which is in a weird state right now and
> cannot be removed from the cluster.
>
> The nodetool status output shows:
> Datacenter: DC1
> ===
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address  Load   Tokens  OwnsHost ID
> Rack
> DN  10.210.165.55?  256 ?   null
>r1
>
> I tried the unsafeAssassinateEndpoint, but got exception like:
> 2015-09-18_23:21:40.79760 INFO  23:21:40 InetAddress /10.210.165.55 is
> now DOWN
> 2015-09-18_23:21:40.80667 ERROR 23:21:40 Exception in thread
> Thread[GossipStage:1,5,main]
> 2015-09-18_23:21:40.80668 java.lang.NullPointerException: null
> 2015-09-18_23:21:40.80669   at
> org.apache.cassandra.service.StorageService.getApplicationStateValue(StorageService.java:1584)
> ~[apache-cassandra-2.1.8+git20150804.076b0b1.jar:2.1.8+git20150804.076b0b1]
> 2015-09-18_23:21:40.80669   at
> org.apache.cassandra.service.StorageService.getTokensFor(StorageService.java:1592)
> ~[apache-cassandra-2.1.8+git20150804.076b0b1.jar:2.1.8+git20150804.076b0b1]
> 2015-09-18_23:21:40.80670   at
> org.apache.cassandra.service.StorageService.handleStateLeft(StorageService.java:1822)
> ~[apache-cassandra-2.1.8+git20150804.076b0b1.jar:2.1.8+git20150804.076b0b1]
> 2015-09-18_23:21:40.80671   at
> org.apache.cassandra.service.StorageService.onChange(StorageService.java:1495)
> ~[apache-cassandra-2.1.8+git20150804.076b0b1.jar:2.1.8+git20150804.076b0b1]
> 2015-09-18_23:21:40.80671   at
> org.apache.cassandra.service.StorageService.onJoin(StorageService.java:2121)
> ~[apache-cassandra-2.1.8+git20150804.076b0b1.jar:2.1.8+git20150804.076b0b1]
> 2015-09-18_23:21:40.80672   at
> org.apache.cassandra.gms.Gossiper.handleMajorStateChange(Gossiper.java:1009)
> ~[apache-cassandra-2.1.8+git20150804.076b0b1.jar:2.1.8+git20150804.076b0b1]
> 2015-09-18_23:21:40.80673   at
> org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:1113)
> ~[apache-cassandra-2.1.8+git20150804.076b0b1.jar:2.1.8+git20150804.076b0b1]
> 2015-09-18_23:21:40.80673   at
> org.apache.cassandra.gms.GossipDigestAck2VerbHandler.doVerb(GossipDigestAck2VerbHandler.java:49)
> ~[apache-cassandra-2.1.8+git20150804.076b0b1.jar:2.1.8+git20150804.076b0b1]
> 2015-09-18_23:21:40.80673   at
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:62)
> ~[apache-cassandra-2.1.8+git20150804.076b0b1.jar:2.1.8+git20150804.076b0b1]
> 2015-09-18_23:21:40.80674   at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> ~[na:1.7.0_45]
> 2015-09-18_23:21:40.80674   at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> ~[na:1.7.0_45]
> 2015-09-18_23:21:40.80674   at java.lang.Thread.run(Thread.java:744)
> ~[na:1.7.0_45]
> 2015-09-18_23:21:40.85812 WARN  23:21:40 Not marking nodes down due to
> local pause of 10852378435 > 50
>
> Any suggestions about how to remove it?
> Thanks.
>
> --
> Dikang
>
>


Unable to remove dead node from cluster.

2015-09-21 Thread Dikang Gu
Hi there,

I have a dead node in our cluster, which is in a weird state right now and
cannot be removed from the cluster.

The nodetool status output shows:
Datacenter: DC1
===
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address  Load   Tokens  OwnsHost ID
  Rack
DN  10.210.165.55?  256 ?   null
   r1

I tried the unsafeAssassinateEndpoint, but got exception like:
2015-09-18_23:21:40.79760 INFO  23:21:40 InetAddress /10.210.165.55 is now
DOWN
2015-09-18_23:21:40.80667 ERROR 23:21:40 Exception in thread
Thread[GossipStage:1,5,main]
2015-09-18_23:21:40.80668 java.lang.NullPointerException: null
2015-09-18_23:21:40.80669   at
org.apache.cassandra.service.StorageService.getApplicationStateValue(StorageService.java:1584)
~[apache-cassandra-2.1.8+git20150804.076b0b1.jar:2.1.8+git20150804.076b0b1]
2015-09-18_23:21:40.80669   at
org.apache.cassandra.service.StorageService.getTokensFor(StorageService.java:1592)
~[apache-cassandra-2.1.8+git20150804.076b0b1.jar:2.1.8+git20150804.076b0b1]
2015-09-18_23:21:40.80670   at
org.apache.cassandra.service.StorageService.handleStateLeft(StorageService.java:1822)
~[apache-cassandra-2.1.8+git20150804.076b0b1.jar:2.1.8+git20150804.076b0b1]
2015-09-18_23:21:40.80671   at
org.apache.cassandra.service.StorageService.onChange(StorageService.java:1495)
~[apache-cassandra-2.1.8+git20150804.076b0b1.jar:2.1.8+git20150804.076b0b1]
2015-09-18_23:21:40.80671   at
org.apache.cassandra.service.StorageService.onJoin(StorageService.java:2121)
~[apache-cassandra-2.1.8+git20150804.076b0b1.jar:2.1.8+git20150804.076b0b1]
2015-09-18_23:21:40.80672   at
org.apache.cassandra.gms.Gossiper.handleMajorStateChange(Gossiper.java:1009)
~[apache-cassandra-2.1.8+git20150804.076b0b1.jar:2.1.8+git20150804.076b0b1]
2015-09-18_23:21:40.80673   at
org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:1113)
~[apache-cassandra-2.1.8+git20150804.076b0b1.jar:2.1.8+git20150804.076b0b1]
2015-09-18_23:21:40.80673   at
org.apache.cassandra.gms.GossipDigestAck2VerbHandler.doVerb(GossipDigestAck2VerbHandler.java:49)
~[apache-cassandra-2.1.8+git20150804.076b0b1.jar:2.1.8+git20150804.076b0b1]
2015-09-18_23:21:40.80673   at
org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:62)
~[apache-cassandra-2.1.8+git20150804.076b0b1.jar:2.1.8+git20150804.076b0b1]
2015-09-18_23:21:40.80674   at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
~[na:1.7.0_45]
2015-09-18_23:21:40.80674   at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
~[na:1.7.0_45]
2015-09-18_23:21:40.80674   at java.lang.Thread.run(Thread.java:744)
~[na:1.7.0_45]
2015-09-18_23:21:40.85812 WARN  23:21:40 Not marking nodes down due to
local pause of 10852378435 > 50

Any suggestions about how to remove it?
Thanks.

-- 
Dikang


Re: [RELEASE] Apache Cassandra 3.0.0-rc1 released

2015-09-21 Thread Aleksey Yeschenko
Compatible versions of the python-driver (3.0.0a3) and java-driver (3.0.0-alpha3) 
will be published to PyPI and Maven later today.

In the meantime, you can use the versions bundled with Cassandra 3.0.0-rc1.

-- 
AY

On September 21, 2015 at 09:04:57, Jake Luciani (j...@apache.org) wrote:

The Cassandra team is pleased to announce the release of Apache Cassandra  
version 3.0.0-rc1.  

Apache Cassandra is a fully distributed database. It is the right choice  
when you need scalability and high availability without compromising  
performance.  

http://cassandra.apache.org/  

Downloads of source and binary distributions are listed in our download  
section:  

http://cassandra.apache.org/download/  

This version is a release candidate[1] on the 3.0 series. As always, please pay  
attention to the release notes[2] and let us know[3] if you encounter any problems.  

Enjoy!  

[1]: http://goo.gl/Oppn3S (CHANGES.txt)  
[2]: http://goo.gl/zQFaj4 (NEWS.txt)  
[3]: https://issues.apache.org/jira/browse/CASSANDRA  


[RELEASE] Apache Cassandra 3.0.0-rc1 released

2015-09-21 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra
version 3.0.0-rc1.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a release candidate[1] on the 3.0 series. As always, please pay
attention to the release notes[2] and let us know[3] if you encounter any problems.

Enjoy!

[1]: http://goo.gl/Oppn3S (CHANGES.txt)
[2]: http://goo.gl/zQFaj4 (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


[RELEASE] Apache Cassandra 2.0.17 released

2015-09-21 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra
version 2.0.17.

This is most likely the final release for the 2.0 release series.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a bug fix release[1] on the 2.0 series. As always, please pay
attention to the release notes[2] and let us know[3] if you encounter any problems.

Enjoy!

[1]: http://goo.gl/QwruFc (CHANGES.txt)
[2]: http://goo.gl/fHlSqL (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


Re: Should replica placement change after a topology change?

2015-09-21 Thread Robert Coli
On Wed, Sep 16, 2015 at 3:39 AM, Richard Dawe 
wrote:

> In that mixed non-EC2/EC2 environment, with GossipingPropertyFileSnitch,
> it seems like you would need to simulate what Ec2Snitch does, and manually
> configure GPFS to treat each Availability Zone as a rack.
>

Yes, you configure GPFS with the same identifiers EC2Snitch would use, and
then rack awareness takes care of the rest.
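
For example, a node sitting in us-east-1a would get a cassandra-rackdc.properties
along these lines (region/AZ here are just an example):

  # cassandra-rackdc.properties, matching what Ec2Snitch would report
  dc=us-east
  rack=1a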

=Rob


RE: Cassandra shutdown during large number of compactions - now fails to start with OOM Exception

2015-09-21 Thread Walsh, Stephen
Although I didn't get an answer on this, it's worth noting that removing the 
system compactions_in_progress folder resolved the issue.
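
For anyone who hits the same thing, what that amounted to on the affected node was
roughly (default data directory and packaged init script assumed):

  sudo service cassandra stop
  # drop the node-local record of partially finished compactions
  sudo rm -rf /var/lib/cassandra/data/system/compactions_in_progress-*/
  sudo service cassandra start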

From: Walsh, Stephen
Sent: 17 September 2015 16:37
To: 'user@cassandra.apache.org' 
Subject: RE: Cassandra shutdown during large number of compactions - now fails 
to start with OOM Exception

Some more info,

Looking at the Java Memory Dump file.

I see about 400 SSTableScanners - one for each of our column families.
Each is about 200 MB in size.
And (from what I can see) all of them are reading from a 
"compactions_in_progress-ka-00-Data.db" file

dfile  org.apache.cassandra.io.compress.CompressedRandomAccessReader path = 
"/var/lib/cassandra/data/system/compactions_in_progress-55080ab05d9c388690a4acb25fe1f77b/system-compactions_in_progress-ka-71661-Data.db"
 131840 104

Steve


From: Walsh, Stephen
Sent: 17 September 2015 15:33
To: user@cassandra.apache.org
Subject: Cassandra shutdown during large number of compactions - now fails to 
start with OOM Exception

Hey all, I was hoping someone had seen a similar issue.
We're using 2.1.6 and shut down a testbed in AWS thinking we were finished with it.
We started it back up today and saw that only 2 of the 4 nodes came up.

It seems there was a lot of compaction happening at the time it was shut down; 
Cassandra tries to start up and we get an OutOfMemory exception.


INFO  13:45:57 Initializing system.range_xfers
INFO  13:45:57 Initializing system.schema_keyspaces
INFO  13:45:57 Opening 
/var/lib/cassandra/data/system/schema_keyspaces-b0f2235744583cdb9631c43e59ce3676/system-schema_keyspaces-ka-21807
 (19418 bytes)
java.lang.OutOfMemoryError: Java heap space
Dumping heap to /var/log/cassandra/java_pid3011.hprof ...
Heap dump file created [7751760805 bytes in 52.439 secs]
ERROR 13:47:11 Exception encountered during startup
java.lang.OutOfMemoryError: Java heap space


It's not related to the key cache; we removed it and the issue is still present.
So we believe it's re-trying all the compactions that were in progress when it 
went down.

We've modified the heap size to be half of the system's RAM (8 GB in this case).

At the moment the only workaround we have is to empty the data / saved_cache / 
commit_log folders and let the node re-sync with the others.

Has anyone seen this before and what have they done to solve it?
Can we remove unfinished compactions?

Steve


