RE: management and monitoring nodetool repair

2015-10-19 Thread aeljami.ext
Thx Carlos,

How can I get information on errors during repair?

Thx.
De : Carlos Alonso [mailto:i...@mrcalonso.com]
Envoyé : lundi 19 octobre 2015 11:09
À : user@cassandra.apache.org
Objet : Re: management and monitoring nodetool repair

So the repair process has two phases:

The first one is all about calculating Merkle trees and comparing them with the
other replicas'. This phase can be monitored with nodetool compactionstats.
The second one is about streaming data files. That one can be monitored with
nodetool netstats.

Hope it helps.
Cheers!

Carlos Alonso | Software Engineer | @calonso

On 16 October 2015 at 14:09, 
> wrote:
Hi,
I'm looking for a tool for managing and monitoring the status of nodetool
repair.

Currently I am trying out cassandra-reaper, but if you have tested other tools,
please share.

Thanks




C* 2.1.10 failed to start

2015-10-19 Thread Kai Wang
It seems the same as https://issues.apache.org/jira/browse/CASSANDRA-8544.
It started to happen after bulkloading ~100G data and restarting.

Windows 2008 R2, JVM 1.8.0_60. It feels like C* didn't shut down cleanly. Is
there any way to work around this?

Thanks.


Re: management and monitoring nodetool repair

2015-10-19 Thread Carlos Alonso
So the repair process has two phases:

The first one is all about calculating Merkle trees and comparing them with the
other replicas'. This phase can be monitored with nodetool compactionstats.
The second one is about streaming data files. That one can be monitored with
nodetool netstats.
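
A rough illustration (an added sketch, not from the original mail; the 10-second
interval is arbitrary) of watching both phases from a shell while a repair runs:

# phase 1: Merkle tree (validation) compactions typically show up as "Validation" tasks
# phase 2: streaming of SSTable sections between replicas shows up in netstats
while true; do
  date
  nodetool compactionstats
  nodetool netstats
  sleep 10
done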

Hope it helps.
Cheers!

Carlos Alonso | Software Engineer | @calonso 

On 16 October 2015 at 14:09,  wrote:

> Hi,
>
> I'm looking for a tool for managing and monitoring the status of
> nodetool repair.
>
> Currently I am trying out cassandra-reaper, but if you have tested other
> tools, please share.
>
>
>
> Thanks
>
>
>


Re: C* 2.1.10 failed to start

2015-10-19 Thread Kai Wang
I fixed this by deleting everything in system\compactions_in_progress-.
I wonder if there are any side effects from doing this.
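
Roughly, the workaround looks like this (a sketch, not Kai's exact commands; the
path below is the Linux packaged default, while on Windows the directory lives
under whatever data_file_directories points to):

# Stop the Cassandra service first, then clear the system table that tracks
# in-progress compactions; Cassandra recreates it as needed.
rm -rf /var/lib/cassandra/data/system/compactions_in_progress-*/*
# Restart Cassandra afterwards.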

On Mon, Oct 19, 2015 at 8:56 AM, Kai Wang  wrote:

> It seems the same as https://issues.apache.org/jira/browse/CASSANDRA-8544.
> It started to happen after bulkloading ~100G data and restarting.
>
> Windows 2008 R2, JVM 1.8.0_60. It feels like C* didn't shut down cleanly.
> Is there any way to work around this?
>
> Thanks.
>


Re: management and monitoring nodetool repair

2015-10-19 Thread Carlos Alonso
I'd say the logs will tell you pretty much all you need. Find the class that
logs repair status (RepairTask.java?), then tail the logs grepping for it while
the repair is running; eventually you'll see the errors, most likely as Java
stack traces.
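
A minimal sketch of that (the log path is the packaged default and an assumption
here; adjust to your install):

# Watch repair-related lines and errors while the repair runs.
tail -F /var/log/cassandra/system.log | grep -iE 'repair|error|exception'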

Carlos Alonso | Software Engineer | @calonso 

On 19 October 2015 at 14:00,  wrote:

> Thx Carlos,
>
>
>
> How can I get information on error during repair ?
>
>
>
> Thx.
>
> *De :* Carlos Alonso [mailto:i...@mrcalonso.com]
> *Envoyé :* lundi 19 octobre 2015 11:09
> *À :* user@cassandra.apache.org
> *Objet :* Re: management and monitoring nodetool repair
>
>
>
> So the repair process has two phases:
>
> The first one is all about calculating Merkle trees and comparing them with
> the other replicas'. This phase can be monitored with nodetool compactionstats.
>
> The second one is about streaming data files. That one can be monitored
> with nodetool netstats.
>
>
>
> Hope it helps.
>
> Cheers!
>
>
> Carlos Alonso | Software Engineer | @calonso 
>
>
>
> On 16 October 2015 at 14:09,  wrote:
>
> Hi,
>
> I'm looking for a tool for managing and monitoring the status of
> nodetool repair.
>
> Currently I am trying out cassandra-reaper, but if you have tested other
> tools, please share.
>
>
>
> Thanks
>
>
>


Re: Read query taking a long time

2015-10-19 Thread Carlos Alonso
Could you send cfhistograms and cfstats relevant to the read column family?

That could help
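
For reference, something like this produces them (a sketch using the 2.1-era
nodetool command names; the keyspace and table are taken from the schema quoted
below):

nodetool cfstats akka.messages
nodetool cfhistograms akka messages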

Carlos Alonso | Software Engineer | @calonso 

On 17 October 2015 at 16:15, Brice Figureau <
brice+cassan...@daysofwonder.com> wrote:

> Hi,
>
> I've read all I could find on how Cassandra works, but I'm still wondering why
> the following query takes more than 5s to return on a simple (and modest)
> 3-node Cassandra 2.1.9 cluster:
>
> SELECT sequence_nr, used
> FROM messages
> WHERE persistence_id = 'session-SW' AND partition_nr = 0;
>
> The schema of this table:
> 
> CREATE TABLE akka.messages (
> persistence_id text,
> partition_nr bigint,
> sequence_nr bigint,
> message blob,
> used boolean static,
> PRIMARY KEY ((persistence_id, partition_nr), sequence_nr)
> ) WITH CLUSTERING ORDER BY (sequence_nr ASC)
> AND bloom_filter_fp_chance = 0.01
> AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
> AND comment = ''
> AND compaction = {'class':
> 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'}
> AND compression = {'sstable_compression':
> 'org.apache.cassandra.io.compress.LZ4Compressor'}
> AND dclocal_read_repair_chance = 0.1
> AND default_time_to_live = 0
> AND gc_grace_seconds = 864000
> AND max_index_interval = 2048
> AND memtable_flush_period_in_ms = 0
> AND min_index_interval = 128
> AND read_repair_chance = 0.0
> AND speculative_retry = '99.0PERCENTILE';
>
>
> This is a query from Akka cassandra journal (
> https://github.com/krasserm/akka-persistence-cassandra) which returns 33
> rows for the moment.
>
> The time doesn't vary if I use QUORUM or ONE consistency.
> Note that the same query with a different `persistence_id` is much
> faster (it might depend on the number of rows returned).
>
> Here's the (redacted) trace taken with cqlsh:
>
> activity | timestamp | source | source_elapsed
> Execute CQL3 query | 2015-10-17 17:01:05.681000 | 192.168.168.26 | 0
> Parsing .. [SharedPool-Worker-1] | 2015-10-17 17:01:05.683000 | 192.168.168.26 | 84
> READ message received from /192.168.168.26 [MessagingService-Incoming-/192.168.168.26] | 2015-10-17 17:01:05.683000 | 192.168.168.29 | 69
> Preparing statement [SharedPool-Worker-1] | 2015-10-17 17:01:05.683000 | 192.168.168.26 | 215
> reading data from /192.168.168.29 [SharedPool-Worker-1] | 2015-10-17 17:01:05.683000 | 192.168.168.26 | 1189
> Sending READ message to /192.168.168.29 [MessagingService-Outgoing-/192.168.168.29] | 2015-10-17 17:01:05.683000 | 192.168.168.26 | 1317
> Executing single-partition query on messages [SharedPool-Worker-1] | 2015-10-17 17:01:05.684000 | 192.168.168.29 | 175
> Acquiring sstable references [SharedPool-Worker-1] | 2015-10-17 17:01:05.684000 | 192.168.168.29 | 189
> Merging memtable tombstones [SharedPool-Worker-1] | 2015-10-17 17:01:05.685000 | 192.168.168.29 | 204
> Key cache hit for sstable 434 [SharedPool-Worker-1] | 2015-10-17 17:01:05.685000 | 192.168.168.29 | 257
> Seeking to partition beginning in data file [SharedPool-Worker-1] | 2015-10-17 17:01:05.685000 | 192.168.168.29 | 269
> Key cache hit for sstable 432 [SharedPool-Worker-1] | 2015-10-17 17:01:05.685000 | 192.168.168.29 | 514
> Seeking to partition beginning in data file [SharedPool-Worker-1] | 2015-10-17 17:01:05.686000 | 192.168.168.29 | 527
> Key cache hit for sstable 433 [SharedPool-Worker-1] | 2015-10-17 17:01:05.686000 | 192.168.168.29 | 779
> Seeking to partition beginning in data file [SharedPool-Worker-1] | 2015-10-17 17:01:05.686000 | 192.168.168.29 | 789
> Skipped 0/3 non-slice-intersecting sstables, included 0 due to tombstones [SharedPool-Worker-1] | 2015-10-17 17:01:05.686000 | 192.168.168.29 | 929
> Merging data from memtables and 3 sstables [SharedPool-Worker-1] | 2015-10-17 17:01:05.687000 | 192.168.168.29 | 956
> speculating read ...

Re: Read query taking a long time

2015-10-19 Thread Jon Haddad
I wrote a blog post a while back that you may find helpful on diagnosing problems
in production. There are a lot of potential things that could be wrong with your
cluster, and going back and forth on the ML to pin down the right one will take
forever.

http://rustyrazorblade.com/2014/09/cassandra-summit-recap-diagnosing-problems-in-production/
 


Once you've figured out what's wrong (and fixed it), you should read Al's 
tuning guide:

https://tobert.github.io/pages/als-cassandra-21-tuning-guide.html 


Jon


> On Oct 19, 2015, at 7:38 AM, Carlos Alonso  wrote:
> 
> Could you send cfhistograms and cfstats relevant to the read column family?
> 
> That could help
> 
> Carlos Alonso | Software Engineer | @calonso 
> 
> On 17 October 2015 at 16:15, Brice Figureau  > wrote:
> Hi,
> 
> I've read all I could find on how Cassandra works, but I'm still wondering why
> the following query takes more than 5s to return on a simple (and modest)
> 3-node Cassandra 2.1.9 cluster:
> 
> SELECT sequence_nr, used
> FROM messages
> WHERE persistence_id = 'session-SW' AND partition_nr = 0;
> 
> The schema of this table:
> 
> CREATE TABLE akka.messages (
> persistence_id text,
> partition_nr bigint,
> sequence_nr bigint,
> message blob,
> used boolean static,
> PRIMARY KEY ((persistence_id, partition_nr), sequence_nr)
> ) WITH CLUSTERING ORDER BY (sequence_nr ASC)
> AND bloom_filter_fp_chance = 0.01
> AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
> AND comment = ''
> AND compaction = {'class': 
> 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'}
> AND compression = {'sstable_compression': 
> 'org.apache.cassandra.io.compress.LZ4Compressor'}
> AND dclocal_read_repair_chance = 0.1
> AND default_time_to_live = 0
> AND gc_grace_seconds = 864000
> AND max_index_interval = 2048
> AND memtable_flush_period_in_ms = 0
> AND min_index_interval = 128
> AND read_repair_chance = 0.0
> AND speculative_retry = '99.0PERCENTILE';
> 
> 
> This is a query from the Akka Cassandra journal
> (https://github.com/krasserm/akka-persistence-cassandra), which returns 33
> rows for the moment.
> 
> The time doesn't vary if I use QUORUM or ONE consistency.
> Note that the same query with a different `persistence_id` is much
> faster (it might depend on the number of rows returned).
> 
> Here's the (redacted) trace taken with cqlsh:
> 
> activity | timestamp | source | source_elapsed
> Execute CQL3 query | 2015-10-17 17:01:05.681000 | 192.168.168.26 | 0
> Parsing .. [SharedPool-Worker-1] | 2015-10-17 17:01:05.683000 | 192.168.168.26 | 84
> READ message received from /192.168.168.26 [MessagingService-Incoming-/192.168.168.26] | 2015-10-17 17:01:05.683000 | 192.168.168.29 | 69
> Preparing statement [SharedPool-Worker-1] | 2015-10-17 17:01:05.683000 | 192.168.168.26 | 215
> reading data from /192.168.168.29 [SharedPool-Worker-1] | 2015-10-17 17:01:05.683000 | 192.168.168.26 | 1189
> Sending READ message to /192.168.168.29 [MessagingService-Outgoing-/192.168.168.29] | 2015-10-17 17:01:05.683000 | 192.168.168.26 | 1317
> Executing single-partition query on messages [SharedPool-Worker-1] | 2015-10-17 17:01:05.684000 | 192.168.168.29 | 175
> Acquiring sstable references [SharedPool-Worker-1] | 2015-10-17 17:01:05.684000 | 192.168.168.29 | 189
> Merging memtable tombstones [SharedPool-Worker-1] | 2015-10-17 17:01:05.685000 | 192.168.168.29 | 204
> Key cache hit for sstable 434 [SharedPool-Worker-1] | 2015-10-17 17:01:05.685000 | 192.168.168.29 | 257
> Seeking to ...

Re: compact/repair shouldn't compete for normal compaction resources.

2015-10-19 Thread Sebastian Estevez
The validation compaction part of repair is subject to the compaction throughput
throttle (`nodetool getcompactionthroughput` / `nodetool setcompactionthroughput`),
and you can use that to tune down the resources being used by repair.

Check out this post by driftx on advanced repair techniques.
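
For example (a sketch; the values are arbitrary, not recommendations):

nodetool getcompactionthroughput        # show the current cap in MB/s
nodetool setcompactionthroughput 16     # lower it while repair runs (0 = unthrottled)
nodetool setstreamthroughput 50         # separately throttles repair streaming, in Mbit/s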

Given your other question, I agree with Raj that it might be a good idea to
decommission the new nodes rather than repairing, depending on how much data
has made it to them and how tight you were on resources before adding nodes.


All the best,


Sebastián Estévez

Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com

On Sun, Oct 18, 2015 at 8:18 PM, Kevin Burton  wrote:

> I'm doing a big nodetool repair right now and I'm pretty sure the added
> overhead is impacting our performance.
>
> Shouldn't you be able to throttle repair so that normal compactions can
> use most of the resources?
>
> --
>
> We’re hiring if you know of any awesome Java Devops or Linux Operations
> Engineers!
>
> Founder/CEO Spinn3r.com
> Location: *San Francisco, CA*
> blog: http://burtonator.wordpress.com
> … or check out my Google+ profile
> 
>
>


BEWARE https://issues.apache.org/jira/browse/CASSANDRA-9504

2015-10-19 Thread Graham Sanderson
If you had Cassandra 2.0.x (possibly before) and upgraded to Cassandra 2.1, you 
may have had

commitlog_sync: batch
commitlog_sync_batch_window_in_ms: 25

in your cassandra.yaml

It turned out that this was pretty much broken in 2.0 (i.e. fsyncs just
happened immediately) but fixed in 2.1, which meant that every mutation
blocked its writer thread for 25ms, meaning that at 80 mutations/sec per writer
thread you'd start DROPPING mutations if your write timeout is 2000ms.

This turns out to be a massive problem if you write fast, and the default 
commitlog_sync_batch_window_in_ms was changed to 2 ms in 2.1.6 as a way of 
addressing this (with some suggesting 1ms)

Neither of these changes got much fanfare except an eventual reference in 
CHANGES.TXT

With 2.1.9 if you aren’t doing periodic sync, then I think the new behavior is 
just to sync whenever the commit logs have a consistent/complete set of 
mutations ready.

Note this is hard to diagnose because CPU is idle and pretty much all latency 
metrics (except the overall coordinator write) do not count this time (and you 
probably weren’t noticing the 25ms write ACK time). It turned out for us that 
one of our nodes was getting more writes (> 20k mutations per second) which was 
about the magic number… anything shy of that and everything looked fine, but 
just by going slightly over, this node was dropping lots of mutations.
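
A quick way to check whether a node is exposed (a sketch; the yaml path is the
packaged default and an assumption):

grep -E '^commitlog_sync' /etc/cassandra/cassandra.yaml   # batch vs periodic, and the window
nodetool tpstats    # the "Dropped" section at the bottom shows MUTATION drops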








Re: BEWARE https://issues.apache.org/jira/browse/CASSANDRA-9504

2015-10-19 Thread Michael Shuler

On 10/19/2015 10:55 AM, Graham Sanderson wrote:

If you had Cassandra 2.0.x (possibly before) and upgraded to Cassandra
2.1, you may have had

commitlog_sync: batch

commitlog_sync_batch_window_in_ms: 25


in your cassandra.yaml

It turned out that this was pretty much broken in 2.0 (i.e. fsyncs just
happened immediately), but fixed in 2.1, *which meant that every
mutation blocked its writer thread for 25ms meaning at 80
mutations/sec/writer thread you’d start DROPPING mutations if your write
timeout is 2000ms.*

This turns out to be a massive problem if you write fast, and the
default commitlog_sync_batch_window_in_ms was changed to 2 ms in 2.1.6
as a way of addressing this (with some suggesting 1ms)

Neither of these changes got much fanfare except an eventual reference
in CHANGES.TXT

With 2.1.9 if you aren’t doing periodic sync, then I think the new
behavior is just to sync whenever the commit logs have a
consistent/complete set of mutations ready.

Note this is hard to diagnose because CPU is idle and pretty much all
latency metrics (except the overall coordinator write) do not count this
time (and you probably weren’t noticing the 25ms write ACK time). It
turned out for us that one of our nodes was getting more writes (> 20k
mutations per second) which was about the magic number… anything shy of
that and everything looked fine, but just by going slightly over, this
node was dropping lots of mutations.


If you would be kind enough to submit a patch to JIRA for NEWS.txt 
(aligned with the right versions you're warning about) that includes the 
info upgrading users might need, that would be great!


--
Kind regards,
Michael


Re: Would we have data corruption if we bootstrapped 10 nodes at once?

2015-10-19 Thread Eric Stevens
It seems to me that as long as cleanup hasn't happened, if you
*decommission* the newly joined nodes, they'll stream whatever writes they
took back to the original replicas.  Presumably that should be pretty quick
as they won't have nearly as much data as the original nodes (as they only
hold data written while they were online).  Then as long as cleanup hasn't
happened, your cluster should have returned to a consistent view of the
data.  You can now bootstrap the new nodes again.

If you have done a cleanup, then the data is probably irreversibly
corrupted, you will have to figure out how to restore the missing data
incrementally from backups if they are available.
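
In shell terms the recovery Eric describes would look roughly like this (a
sketch under the assumptions above; run on each newly joined node, one at a
time, and do not run cleanup):

nodetool decommission      # streams this node's writes back to the proper replicas
# Fix the config (remove auto_bootstrap: false, or set it to true), then restart
# Cassandra so the node bootstraps properly; wait for nodetool status to show it
# as UN before moving on to the next node.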

On Sun, Oct 18, 2015 at 10:37 PM Raj Chudasama 
wrote:

> In this case does it make sense to remove the newly added nodes, correct the
> configuration, and have them rejoin one at a time?
>
> Thx
>
> Sent from my iPhone
>
> On Oct 18, 2015, at 11:19 PM, Jeff Jirsa 
> wrote:
>
> Take a snapshot now, before you get rid of any data (whatever you do,
> don’t run cleanup).
>
> If you identify missing data, you can go back to those snapshots, find the
> nodes that had the data previously (sstable2json, for example), and either
> re-stream that data into the cluster with sstableloader or copy it to a new
> host and `nodetool refresh` it into the new system.
>
>
>
> From:  on behalf of Kevin Burton
> Reply-To: "user@cassandra.apache.org"
> Date: Sunday, October 18, 2015 at 8:10 PM
> To: "user@cassandra.apache.org"
> Subject: Re: Would we have data corruption if we bootstrapped 10 nodes at
> once?
>
> ouch.. OK.. I think I really shot myself in the foot here then.  This
> might be bad.
>
> I'm not sure if I would have missing data.  I mean basically the data is
> on the other nodes.. but the cluster has been running with 10 nodes
> accidentally bootstrapped with auto_bootstrap=false.
>
> So they have new data and seem to be missing values.
>
> this is somewhat misleading... Initially if you start it up and run
> nodetool status , it only returns one node.
>
> So I assumed auto_bootstrap=false meant that it just doesn't join the
> cluster.
>
> I'm running a nodetool repair now to hopefully fix this.
>
>
>
> On Sun, Oct 18, 2015 at 7:25 PM, Jeff Jirsa 
> wrote:
>
>> auto_bootstrap=false tells it to join the cluster without running
>> bootstrap – the node assumes it has all of the necessary data, and won’t
>> stream any missing data.
>>
>> This generally violates consistency guarantees, but if done on a single
>> node, is typically correctable with `nodetool repair`.
>>
>> If you do it on many  nodes at once, it’s possible that the new nodes
>> could represent all 3 replicas of the data, but don’t physically have any
>> of that data, leading to missing records.
>>
>>
>>
>> From:  on behalf of Kevin Burton
>> Reply-To: "user@cassandra.apache.org"
>> Date: Sunday, October 18, 2015 at 3:44 PM
>> To: "user@cassandra.apache.org"
>> Subject: Re: Would we have data corruption if we bootstrapped 10 nodes
>> at once?
>>
>> Ah shit.. I think we're seeing corruption.. missing records :-/
>>
>> On Sat, Oct 17, 2015 at 10:45 AM, Kevin Burton 
>> wrote:
>>
>>> We just migrated from a 30 node cluster to a 45 node cluster. (so 15 new
>>> nodes)
>>>
>>> By default we have auto_bootstrap = false
>>>
>>> so we just push our config to the cluster, the cassandra daemons
>>> restart, and they're not cluster members and are the only nodes in the
>>> cluster.
>>>
>>> Anyway.  While I was about 1/2 way done adding the 15 nodes,  I had
>>> about 7 members of the cluster and 8 not yet joined.
>>>
>>> We are only doing 1 at a time because apparently bootstrapping more than
>>> 1 is unsafe.
>>>
>>> I did a rolling restart whereby I went through and restarted all the
>>> cassandra boxes.
>>>
>>> Somehow the new nodes auto boostrapped themselves EVEN though
>>> auto_bootstrap=false.
>>>
>>> We don't have any errors.  Everything seems functional.  I'm just
>>> worried about data loss.
>>>
>>> Thoughts?
>>>
>>> Kevin
>>>
>>> --
>>>
>>> We’re hiring if you know of any awesome Java Devops or Linux Operations
>>> Engineers!
>>>
>>> Founder/CEO Spinn3r.com
>>> Location: *San Francisco, CA*
>>> blog: http://burtonator.wordpress.com
>>> … or check out my Google+ profile
>>> 
>>>
>>>
>>
>>
>> --
>>
>> We’re hiring if you know of any awesome Java Devops or Linux Operations
>> Engineers!
>>
>> Founder/CEO Spinn3r.com
>> Location: *San Francisco, CA*
>> blog: http://burtonator.wordpress.com
>> … or check out my Google+ profile
>> 
>>
>>
>
>
> --
>
> We’re hiring if you know of any awesome Java Devops or Linux Operations
> Engineers!
>
> Founder/CEO Spinn3r.com
> Location: *San Francisco, CA*
> blog: http://burtonator.wordpress.com
> … or check out 

[RELEASE] Apache Cassandra 3.0.0-rc2 released

2015-10-19 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra
version 3.0.0-rc2.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a release candidate[1] for the 3.0 series. As always, please
pay attention to the release notes[2] and let us know[3] if you encounter any
problems.

Enjoy!

[1]: http://goo.gl/mLK41h (CHANGES.txt)
[2]: http://goo.gl/JO8474 (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


Re: BEWARE https://issues.apache.org/jira/browse/CASSANDRA-9504

2015-10-19 Thread Graham Sanderson
But basically, if you were on 2.1.0 through 2.1.5 you probably didn't know to
change your config.
If you were on 2.1.6 through 2.1.8 you may not have noticed the NEWS.txt change
and so didn't change your config.
If you are on 2.1.9+ you are probably OK.

if you are using periodic fsync then you don’t have an issue

> On Oct 19, 2015, at 11:37 AM, Graham Sanderson  wrote:
> 
> - commitlog_sync_batch_window_in_ms behavior has changed from the
>   maximum time to wait between fsync to the minimum time.  We are 
>   working on making this more user-friendly (see CASSANDRA-9533) but in the
>   meantime, this means 2.1 needs a much smaller batch window to keep
>   writer threads from starving.  The suggested default is now 2ms.
> was added retroactively to NEWS.txt in 2.1.6 which is why it is not obvious
> 
>> On Oct 19, 2015, at 11:03 AM, Michael Shuler > > wrote:
>> 
>> On 10/19/2015 10:55 AM, Graham Sanderson wrote:
>>> If you had Cassandra 2.0.x (possibly before) and upgraded to Cassandra
>>> 2.1, you may have had
>>> 
>>> commitlog_sync: batch
>>> 
>>> commitlog_sync_batch_window_in_ms: 25
>>> 
>>> 
>>> in your cassandra.yaml
>>> 
>>> It turned out that this was pretty much broken in 2.0 (i.e. fsyncs just
>>> happened immediately), but fixed in 2.1, *which meant that every
>>> mutation blocked its writer thread for 25ms meaning at 80
>>> mutations/sec/writer thread you’d start DROPPING mutations if your write
>>> timeout is 2000ms.*
>>> 
>>> This turns out to be a massive problem if you write fast, and the
>>> default commitlog_sync_batch_window_in_ms was changed to 2 ms in 2.1.6
>>> as a way of addressing this (with some suggesting 1ms)
>>> 
>>> Neither of these changes got much fanfare except an eventual reference
>>> in CHANGES.TXT
>>> 
>>> With 2.1.9 if you aren’t doing periodic sync, then I think the new
>>> behavior is just to sync whenever the commit logs have a
>>> consistent/complete set of mutations ready.
>>> 
>>> Note this is hard to diagnose because CPU is idle and pretty much all
>>> latency metrics (except the overall coordinator write) do not count this
>>> time (and you probably weren’t noticing the 25ms write ACK time). It
>>> turned out for us that one of our nodes was getting more writes (> 20k
>>> mutations per second) which was about the magic number… anything shy of
>>> that and everything looked fine, but just by going slightly over, this
>>> node was dropping lots of mutations.
>> 
>> If you would be kind enough to submit a patch to JIRA for NEWS.txt (aligned 
>> with the right versions you're warning about) that includes the info 
>> upgrading users might need, that would be great!
>> 
>> -- 
>> Kind regards,
>> Michael
> 





Re: Would we have data corruption if we bootstrapped 10 nodes at once?

2015-10-19 Thread Jeff Jirsa
Worth noting that repair may not work: it's possible that none of the nodes
that have the data (for some given row) are still valid replicas according to
the DHT/tokens, so repair will not find any replica that has the data.



From:  Robert Coli
Reply-To:  "user@cassandra.apache.org"
Date:  Monday, October 19, 2015 at 3:40 PM
To:  "user@cassandra.apache.org"
Subject:  Re: Would we have data corruption if we bootstrapped 10 nodes at once?

On Sun, Oct 18, 2015 at 8:10 PM, Kevin Burton  wrote:
ouch.. OK.. I think I really shot myself in the foot here then.  This might be 
bad.

Yep.

https://issues.apache.org/jira/browse/CASSANDRA-7069 - "Prevent operator 
mistakes due to simultaneous bootstrap"

But this doesn't handle your case, where you force joined a bunch of nodes with 
auto_bootstrap=false.

Probably if I were in your case (and realized it immediately) I would 
decommission all nodes and then start again. I probably would not run repair, 
though that would also work. I agree with jeffj down-thread that you should not 
run cleanup.

=Rob

 






Re: Would we have data corruption if we bootstrapped 10 nodes at once?

2015-10-19 Thread Robert Coli
On Sun, Oct 18, 2015 at 8:10 PM, Kevin Burton  wrote:

> ouch.. OK.. I think I really shot myself in the foot here then.  This
> might be bad.
>

Yep.

https://issues.apache.org/jira/browse/CASSANDRA-7069 - "Prevent operator
mistakes due to simultaneous bootstrap"

But this doesn't handle your case, where you force joined a bunch of nodes
with auto_bootstrap=false.

Probably if I were in your case (and realized it immediately) I would
decommission all nodes and then start again. I probably would not run
repair, though that would also work. I agree with jeffj down-thread that
you should not run cleanup.

=Rob



>
>


Re: compact/repair shouldn't compete for normal compaction resources.

2015-10-19 Thread Robert Coli
On Mon, Oct 19, 2015 at 9:30 AM, Kevin Burton  wrote:

> I think the point I was trying to make is that on highly loaded boxes,
>  repair should take lower priority than normal compactions.
>

You can manually do this by changing the thread priority of compaction
threads which you somehow identify as doing repair-related compaction...

... but incoming streamed SStables are compacted just as if they were
flushed, so I'm pretty sure what you're asking for is not currently
possible?

=Rob


Is there any configuration so that local program on C* node can connect using localhost and remote program using IP/name?

2015-10-19 Thread Ravi
I have a two-node C* cluster, and on these two nodes I want to run Spark jobs
locally. Inside the Spark job I have to set the connection URL to localhost so
that it inserts data into the local C* instance (I am using the same Cassandra
nodes as the Spark job's slaves, executed via Mesos).

The problem is that if I set rpc_address: localhost in cassandra.yaml, then I
can connect locally using the Spark job (with localhost as the connection URL)
or cqlsh localhost, but remote applications cannot connect to the node using
its IP in the connection URL.

I am using apache-cassandra-2.2.0.

Is there any configuration so that a local program on the C* node can connect
using localhost as the connection URL while remote programs use the IP/name in
the connection URL?

Is this the correct approach for connecting a local Spark job to its local C*
node for RDD operations?
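
A configuration sketch of one possible approach (an assumption added here, not
an answer given in the thread): bind the client port on all interfaces and
advertise the node's real address.

# In cassandra.yaml (192.0.2.10 is a placeholder for the node's real IP):
#   rpc_address: 0.0.0.0
#   broadcast_rpc_address: 192.0.2.10   # required when rpc_address is 0.0.0.0
# After a restart, both of these should work:
cqlsh localhost -e "SELECT release_version FROM system.local;"
cqlsh 192.0.2.10 -e "SELECT release_version FROM system.local;"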

Thanks,
Ravi


Re: Would we have data corruption if we bootstrapped 10 nodes at once?

2015-10-19 Thread Robert Coli
On Mon, Oct 19, 2015 at 9:20 AM, Branton Davis 
wrote:

> Is that also true if you're standing up multiple nodes from backups that
> already have data?  Could you not stand up more than one at a time since
> they already have the data?
>

An operator probably almost never wants to add multiple
not-previously-joined nodes to an active cluster via auto_bootstrap:false.

The one case I can imagine is when you are starting a cluster which is not
receiving any write traffic and does contain snapshots.

=Rob


unusual GC log

2015-10-19 Thread 曹志富
INFO  [Service Thread] 2015-10-20 10:42:47,854 GCInspector.java:252 -
ParNew GC in 476ms.  CMS Old Gen: 4288526240 -> 4725514832; Par Eden Space:
671088640 -> 0;
INFO  [Service Thread] 2015-10-20 10:42:50,870 GCInspector.java:252 -
ParNew GC in 423ms.  CMS Old Gen: 4725514832 -> 5114687560; Par Eden Space:
671088640 -> 0;
INFO  [Service Thread] 2015-10-20 10:42:53,847 GCInspector.java:252 -
ParNew GC in 406ms.  CMS Old Gen: 5114688368 -> 5513119264; Par Eden Space:
671088640 -> 0;
INFO  [Service Thread] 2015-10-20 10:42:57,118 GCInspector.java:252 -
ParNew GC in 421ms.  CMS Old Gen: 5513119264 -> 5926324736; Par Eden Space:
671088640 -> 0;
INFO  [Service Thread] 2015-10-20 10:43:00,041 GCInspector.java:252 -
ParNew GC in 437ms.  CMS Old Gen: 5926324736 -> 6324793584; Par Eden Space:
671088640 -> 0;
INFO  [Service Thread] 2015-10-20 10:43:03,029 GCInspector.java:252 -
ParNew GC in 429ms.  CMS Old Gen: 6324793584 -> 6693672608; Par Eden Space:
671088640 -> 0;
INFO  [Service Thread] 2015-10-20 10:43:05,566 GCInspector.java:252 -
ParNew GC in 339ms.  CMS Old Gen: 6693672608 -> 6989128592; Par Eden Space:
671088640 -> 0;
INFO  [Service Thread] 2015-10-20 10:43:08,431 GCInspector.java:252 -
ParNew GC in 421ms.  CMS Old Gen: 6266493464 -> 6662041272; Par Eden Space:
671088640 -> 0;
INFO  [Service Thread] 2015-10-20 10:43:11,131 GCInspector.java:252 -
ConcurrentMarkSweep GC in 215ms.  CMS Old Gen: 5926324736 -> 4574418480;
CMS Perm Gen: 33751256 -> 33751192; Par Eden Space: 7192 -> 611360336;
INFO  [Service Thread] 2015-10-20 10:43:11,848 GCInspector.java:252 -
ParNew GC in 511ms.  CMS Old Gen: 4574418480 -> 4996166672; Par Eden Space:
671088640 -> 0;
INFO  [Service Thread] 2015-10-20 10:43:14,915 GCInspector.java:252 -
ParNew GC in 395ms.  CMS Old Gen: 4996167912 -> 5380926744; Par Eden Space:
671088640 -> 0;
INFO  [Service Thread] 2015-10-20 10:43:18,335 GCInspector.java:252 -
ParNew GC in 432ms.  CMS Old Gen: 5380926744 -> 5811659120; Par Eden Space:
671088640 -> 0;
INFO  [Service Thread] 2015-10-20 10:43:21,492 GCInspector.java:252 -
ParNew GC in 439ms.  CMS Old Gen: 5811659120 -> 6270861936; Par Eden Space:
671088640 -> 0;
INFO  [Service Thread] 2015-10-20 10:43:24,698 GCInspector.java:252 -
ParNew GC in 490ms.  CMS Old Gen: 6270861936 -> 6668734208; Par Eden Space:
671088640 -> 0; Par Survivor Space: 83886080 -> 83886072
INFO  [Service Thread] 2015-10-20 10:43:27,963 GCInspector.java:252 -
ParNew GC in 457ms.  CMS Old Gen: 6668734208 -> 7072885208; Par Eden Space:
671088640 -> 0; Par Survivor Space: 83886072 -> 83886080

A few seconds after this, the node is marked down.

My node config is: 8 GB heap, NEW_HEAP size 800 MB.

Node hardware is: 4 cores, 32 GB RAM.

--
Ranger Tsao


Re: compact/repair shouldn't compete for normal compaction resources.

2015-10-19 Thread Kevin Burton
Yes... it's not currently possible :)

I think it should be.

Say the IO on your C* cluster is at 60% utilization.

If you do a repair, this would require 120% utilization (obviously not
possible), so now your app is down/offline until the repair finishes.

If you could throttle repair separately, this would resolve the problem.

If anyone else thinks this is an issue, I'll create a JIRA.

On Mon, Oct 19, 2015 at 3:38 PM, Robert Coli  wrote:

> On Mon, Oct 19, 2015 at 9:30 AM, Kevin Burton  wrote:
>
>> I think the point I was trying to make is that on highly loaded boxes,
>>  repair should take lower priority than normal compactions.
>>
>
> You can manually do this by changing the thread priority of compaction
> threads which you somhow identify as doing repair related compaction...
>
> ... but incoming streamed SStables are compacted just as if they were
> flushed, so I'm pretty sure what you're asking for is not currently
> possible?
>
> =Rob
>
>


-- 

We’re hiring if you know of any awesome Java Devops or Linux Operations
Engineers!

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile



Re: Would we have data corruption if we bootstrapped 10 nodes at once?

2015-10-19 Thread Branton Davis
Is that also true if you're standing up multiple nodes from backups that
already have data?  Could you not stand up more than one at a time since
they already have the data?

On Mon, Oct 19, 2015 at 10:48 AM, Eric Stevens  wrote:

> It seems to me that as long as cleanup hasn't happened, if you
> *decommission* the newly joined nodes, they'll stream whatever writes
> they took back to the original replicas.  Presumably that should be pretty
> quick as they won't have nearly as much data as the original nodes (as they
> only hold data written while they were online).  Then as long as cleanup
> hasn't happened, your cluster should have returned to a consistent view of
> the data.  You can now bootstrap the new nodes again.
>
> If you have done a cleanup, then the data is probably irreversibly
> corrupted, you will have to figure out how to restore the missing data
> incrementally from backups if they are available.
>
> On Sun, Oct 18, 2015 at 10:37 PM Raj Chudasama 
> wrote:
>
>> In this case does it make sense to remove the newly added nodes, correct the
>> configuration, and have them rejoin one at a time?
>>
>> Thx
>>
>> Sent from my iPhone
>>
>> On Oct 18, 2015, at 11:19 PM, Jeff Jirsa 
>> wrote:
>>
>> Take a snapshot now, before you get rid of any data (whatever you do,
>> don’t run cleanup).
>>
>> If you identify missing data, you can go back to those snapshots, find
>> the nodes that had the data previously (sstable2json, for example), and
>> either re-stream that data into the cluster with sstableloader or copy it
>> to a new host and `nodetool refresh` it into the new system.
>>
>>
>>
>> From:  on behalf of Kevin Burton
>> Reply-To: "user@cassandra.apache.org"
>> Date: Sunday, October 18, 2015 at 8:10 PM
>> To: "user@cassandra.apache.org"
>> Subject: Re: Would we have data corruption if we bootstrapped 10 nodes
>> at once?
>>
>> ouch.. OK.. I think I really shot myself in the foot here then.  This
>> might be bad.
>>
>> I'm not sure if I would have missing data.  I mean basically the data is
>> on the other nodes.. but the cluster has been running with 10 nodes
>> accidentally bootstrapped with auto_bootstrap=false.
>>
>> So they have new data and seem to be missing values.
>>
>> this is somewhat misleading... Initially if you start it up and run
>> nodetool status , it only returns one node.
>>
>> So I assumed auto_bootstrap=false meant that it just doesn't join the
>> cluster.
>>
>> I'm running a nodetool repair now to hopefully fix this.
>>
>>
>>
>> On Sun, Oct 18, 2015 at 7:25 PM, Jeff Jirsa 
>> wrote:
>>
>>> auto_bootstrap=false tells it to join the cluster without running
>>> bootstrap – the node assumes it has all of the necessary data, and won’t
>>> stream any missing data.
>>>
>>> This generally violates consistency guarantees, but if done on a single
>>> node, is typically correctable with `nodetool repair`.
>>>
>>> If you do it on many  nodes at once, it’s possible that the new nodes
>>> could represent all 3 replicas of the data, but don’t physically have any
>>> of that data, leading to missing records.
>>>
>>>
>>>
>>> From:  on behalf of Kevin Burton
>>> Reply-To: "user@cassandra.apache.org"
>>> Date: Sunday, October 18, 2015 at 3:44 PM
>>> To: "user@cassandra.apache.org"
>>> Subject: Re: Would we have data corruption if we bootstrapped 10 nodes
>>> at once?
>>>
>>> An shit.. I think we're seeing corruption.. missing records :-/
>>>
>>> On Sat, Oct 17, 2015 at 10:45 AM, Kevin Burton 
>>> wrote:
>>>
 We just migrated from a 30 node cluster to a 45 node cluster. (so 15
 new nodes)

 By default we have auto_bootstrap = false

 so we just push our config to the cluster, the cassandra daemons
 restart, and they're not cluster members and are the only nodes in the
 cluster.

 Anyway.  While I was about 1/2 way done adding the 15 nodes,  I had
 about 7 members of the cluster and 8 not yet joined.

 We are only doing 1 at a time because apparently bootstrapping more
 than 1 is unsafe.

 I did a rolling restart whereby I went through and restarted all the
 cassandra boxes.

 Somehow the new nodes auto boostrapped themselves EVEN though
 auto_bootstrap=false.

 We don't have any errors.  Everything seems functional.  I'm just
 worried about data loss.

 Thoughts?

 Kevin

 --

 We’re hiring if you know of any awesome Java Devops or Linux Operations
 Engineers!

 Founder/CEO Spinn3r.com
 Location: *San Francisco, CA*
 blog: http://burtonator.wordpress.com
 … or check out my Google+ profile
 


>>>
>>>
>>> --
>>>
>>> We’re hiring if you know of any awesome Java Devops or Linux Operations
>>> Engineers!
>>>
>>> Founder/CEO 

Re: compact/repair shouldn't compete for normal compaction resources.

2015-10-19 Thread Kevin Burton
I think the point I was trying to make is that on highly loaded boxes,
 repair should take lower priority than normal compactions.

Having a throttle on *both* doesn't solve the problem.

So I need a

setcompactionthroughput

and a

setrepairthroughput

and total throughput would be the sum of both.
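
For contrast, a sketch of the knobs that exist today (there is no
setrepairthroughput; these are the closest existing throttles):

nodetool setcompactionthroughput 16   # MB/s, also covers validation compactions
nodetool setstreamthroughput 50       # Mbit/s, covers repair/bootstrap streaming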

On Mon, Oct 19, 2015 at 8:30 AM, Sebastian Estevez <
sebastian.este...@datastax.com> wrote:

> The validation compaction part of repair is subject to the compaction throughput
> throttle (`nodetool getcompactionthroughput` / `nodetool setcompactionthroughput`),
> and you can use that to tune down the resources being used by repair.
>
> Check out this post by driftx on advanced repair techniques.
>
> Given your other question, I agree with Raj that it might be a good idea
> to decommission the new nodes rather than repairing depending on how much
> data has made it to them and how tight you were on resources before adding
> nodes.
>
>
> All the best,
>
>
> Sebastián Estévez
>
> Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com
>
> On Sun, Oct 18, 2015 at 8:18 PM, Kevin Burton  wrote:
>
>> I'm doing a big nodetool repair right now and I'm pretty sure the added
>> overhead is impacting our performance.
>>
>> Shouldn't you be able to throttle repair so that normal compactions can
>> use most of the resources?
>>
>> --
>>
>> We’re hiring if you know of any awesome Java Devops or Linux Operations
>> Engineers!
>>
>> Founder/CEO Spinn3r.com
>> Location: *San Francisco, CA*
>> blog: http://burtonator.wordpress.com
>> … or check out my Google+ profile
>> 
>>
>>
>


-- 

We’re hiring if you know of any awesome Java Devops or Linux Operations
Engineers!

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile



Re: BEWARE https://issues.apache.org/jira/browse/CASSANDRA-9504

2015-10-19 Thread Graham Sanderson
- commitlog_sync_batch_window_in_ms behavior has changed from the
  maximum time to wait between fsync to the minimum time.  We are 
  working on making this more user-friendly (see CASSANDRA-9533) but in the
  meantime, this means 2.1 needs a much smaller batch window to keep
  writer threads from starving.  The suggested default is now 2ms.
was added retroactively to NEWS.txt in 2.1.6 which is why it is not obvious

> On Oct 19, 2015, at 11:03 AM, Michael Shuler  wrote:
> 
> On 10/19/2015 10:55 AM, Graham Sanderson wrote:
>> If you had Cassandra 2.0.x (possibly before) and upgraded to Cassandra
>> 2.1, you may have had
>> 
>> commitlog_sync: batch
>> 
>> commitlog_sync_batch_window_in_ms: 25
>> 
>> 
>> in your cassandra.yaml
>> 
>> It turned out that this was pretty much broken in 2.0 (i.e. fsyncs just
>> happened immediately), but fixed in 2.1, *which meant that every
>> mutation blocked its writer thread for 25ms meaning at 80
>> mutations/sec/writer thread you’d start DROPPING mutations if your write
>> timeout is 2000ms.*
>> 
>> This turns out to be a massive problem if you write fast, and the
>> default commitlog_sync_batch_window_in_ms was changed to 2 ms in 2.1.6
>> as a way of addressing this (with some suggesting 1ms)
>> 
>> Neither of these changes got much fanfare except an eventual reference
>> in CHANGES.TXT
>> 
>> With 2.1.9 if you aren’t doing periodic sync, then I think the new
>> behavior is just to sync whenever the commit logs have a
>> consistent/complete set of mutations ready.
>> 
>> Note this is hard to diagnose because CPU is idle and pretty much all
>> latency metrics (except the overall coordinator write) do not count this
>> time (and you probably weren’t noticing the 25ms write ACK time). It
>> turned out for us that one of our nodes was getting more writes (> 20k
>> mutations per second) which was about the magic number… anything shy of
>> that and everything looked fine, but just by going slightly over, this
>> node was dropping lots of mutations.
> 
> If you would be kind enough to submit a patch to JIRA for NEWS.txt (aligned 
> with the right versions you're warning about) that includes the info 
> upgrading users might need, that would be great!
> 
> -- 
> Kind regards,
> Michael


