Understanding Cassandra Architecture

2015-08-04 Thread Thouraya TH
Hi all,





1-  I have 4 nodes A, B, C, and D (A is the seed node). I have modified the
seed node address and the listen address in each cassandra.yaml file (I have
modified nothing else).



2-

Then, from A:

CREATE KEYSPACE my_keyspace WITH REPLICATION = {'class':'SimpleStrategy',
'replication_factor':2};

CREATE TABLE my_table  (text varchar PRIMARY KEY);

USE my_keyspace;

INSERT INTO my_table (text)VALUES('text1');



I am trying to understand the architecture of Cassandra and what exactly
happens in this case.

As I understand:

*Step 1:* Create the ring

1)  The seed node A bootstraps the gossip protocol and waits, listening for
the other clients (B, C, D)

2)  The other nodes join (contact) the seed node and run the gossip process
(to exchange information across the cluster every second)

*Question 1:* At this step, does the seed node store some details about the
clients? Does it broadcast those details to the other clients?

Does each node exchange information across the cluster every second?



*Step 2:* Creating the keyspace and storing rows (replication factor = 2)



Thank you so much for your help.

Best Regards.


auto_bootstrap=false broken?

2015-08-04 Thread horschi
Hi everyone,

I'll just ask my question as provocatively as possible ;-)

Isn't auto_bootstrap=false broken the way it is currently implemented?

What currently happens:
A new node starts with auto_bootstrap=false and starts serving reads
immediately without having any data.

Would the following be more correct?
- The new node stays in a joining state
- The operator loads data (e.g. using nodetool rebuild or by putting in
backed-up files or whatever)
- The operator has to manually switch the node from joining to normal state
using nodetool (only then does it start serving reads)

Wouldn't this behaviour be more consistent?
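
The proposal above can be pictured as a small state machine. The following is only a toy sketch of the suggested behaviour, not of how Cassandra is implemented; the class and method names (ProposedNode, load, mark_ready) are made up for illustration.

```python
from enum import Enum


class NodeState(Enum):
    JOINING = "joining"
    NORMAL = "normal"


class ProposedNode:
    """Toy model of the proposed lifecycle; this is not Cassandra's code."""

    def __init__(self):
        self.state = NodeState.JOINING
        self.data = {}

    def write(self, key, value):
        # Writes are accepted in both states so the node does not miss new data.
        self.data[key] = value

    def load(self, rows):
        # Stands in for the operator loading data (rebuild, restored files, ...).
        # Rows that were already written live are kept, not overwritten.
        for key, value in rows.items():
            self.data.setdefault(key, value)

    def mark_ready(self):
        # Hypothetical manual switch from joining to normal.
        self.state = NodeState.NORMAL

    def read(self, key):
        if self.state is not NodeState.NORMAL:
            raise RuntimeError("node is still joining and does not serve reads")
        return self.data.get(key)
```

In this sketch a read against a joining node fails loudly instead of returning an empty result, which is the difference the question is asking about.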

kind regards,
Christian


RE: auto_bootstrap=false broken?

2015-08-04 Thread aeljami.ext
I had problems with write_survey.
I opened a bug: https://issues.apache.org/jira/browse/CASSANDRA-9934

This message and its attachments may contain confidential or privileged 
information that may be protected by law;
they should not be distributed, used or copied without authorisation.
If you have received this email in error, please notify the sender and delete 
this message and its attachments.
As emails may be altered, Orange is not liable for messages that have been 
modified, changed or falsified.
Thank you.



Re: auto_bootstrap=false broken?

2015-08-04 Thread horschi
Hi Paulo,

thanks for your feedback, but I think this is not what I am looking for.

Starting with join_ring=false does not take any tokens in the ring, and
running nodetool join afterwards will again do token selection and data
loading in one step.

I would like to separate these steps:
1. assign tokens
2. have the node in a joining state, so that I can copy in data
3. mark the node as ready


I just saw that perhaps write_survey could be misused for that.

Did anyone ever use write_survey for such a partial bootstrapping?
Do I have to worry about data-loss when using multiple write_survey nodes
in one cluster?

kind regards,
Christian




Re: auto_bootstrap=false broken?

2015-08-04 Thread Paulo Motta
Hello Christian,

You may use the start-up parameter -Dcassandra.join_ring=false if you don't
want the node to join the ring on startup. More about this parameter here:
http://docs.datastax.com/en/cassandra/2.0/cassandra/tools/toolsCUtility_t.html

You can later join the ring via nodetool join command:
http://docs.datastax.com/en/cassandra/2.0/cassandra/tools/toolsJoin.html
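
As a rough sketch of the two steps just described, the snippet below only builds and prints the command lines; it does not run anything. The binary names, the -h option, and the host address are assumptions about a typical installation, so adjust them to your environment.

```python
import shlex


def start_without_joining(cassandra_bin="cassandra"):
    # Start-up parameter quoted in the message above.
    return [cassandra_bin, "-Dcassandra.join_ring=false"]


def join_later(nodetool_bin="nodetool", host="127.0.0.1"):
    # The host is a placeholder; point it at the node that should join.
    return [nodetool_bin, "-h", host, "join"]


for command in (start_without_joining(), join_later()):
    print(shlex.join(command))
```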

auto_bootstrap=false is typically used to bootstrap new datacenters or
clusters, or nodes that already have data on them before the process starts.

Cheers,

Paulo


Re: Long joining node

2015-08-04 Thread Sebastian Estevez
That's the one. I set it to an hour to be safe (if a stream goes above the
timeout it will get restarted) but it can probably be lower.
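
Since the setting in question, streaming_socket_timeout_in_ms, is expressed in milliseconds, here is a small sketch of the conversion for the one-hour value mentioned above. The helper name timeout_ms is made up for illustration, and the printed line is meant to be copied into cassandra.yaml by hand.

```python
from datetime import timedelta


def timeout_ms(duration):
    """Convert a timedelta into whole milliseconds."""
    return int(duration.total_seconds() * 1000)


one_hour_ms = timeout_ms(timedelta(hours=1))
yaml_line = f"streaming_socket_timeout_in_ms: {one_hour_ms}"
print(yaml_line)
```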

All the best,


Sebastián Estévez

Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com


On Tue, Aug 4, 2015 at 2:21 PM, Stan Lemon sle...@salesforce.com wrote:

 Sebastian,
 You're referring to streaming_socket_timeout_in_ms correct?  What value do
 you recommend?  All of my nodes are currently at the default 0.

 Thanks,
 Stan


Re: auto_bootstrap=false broken?

2015-08-04 Thread horschi
Hi Robert,

sorry for the confusion. Perhaps write_survey is not my solution
(unfortunately I can't get it to work, so I don't really know). I just
thought that it *could* be my solution.


What I actually want:
I want to be able to start a new node without it starting to serve reads
prematurely. I want Cassandra to wait for me to confirm "everything is OK,
now serve reads".



Possible solutions so far:

A) When starting a new node with auto_bootstrap=false, I get a node
that has no data but serves reads. In my opinion it would be cleaner if it
stayed in a joining state where it only receives writes.

B) Disabling join_ring on my new node does nothing. The new node will not
have a token, and I can't see it in nodetool status. Therefore I assume it
will not receive any writes.

C) write_survey unfortunately does not seem to work for me: my new node,
which I start in survey mode, gets writes from other nodes and shows as
joining in the ring, which is good! But it does not get a schema, so it
throws exceptions when receiving these writes. I assume it's just a bug in
2.0.




Disclaimer: I am using C* 2.0, with which I can't get the desired behaviour
(or at least I don't know how).

kind regards,
Christian




Re: auto_bootstrap=false broken?

2015-08-04 Thread Jonathan Haddad
You're trying to solve a problem that doesn't exist.  Cassandra only starts
serving reads when it's ready.

Long joining node

2015-08-04 Thread Stan Lemon
Hello,
I have a cluster with 12 nodes in each of 2 datacenters, for a total of 24
nodes.

I am attempting to add a 13th node in one of the datacenters. I have been
monitoring this process from the node itself with nodetool netstats and
from one of the existing nodes using nodetool status.

On the existing node I see the new node as UJ. I have watched the load
steadily climb to about 203.4 GB, and then over the last two hours it has
fluctuated a bit and has been steadily dropping to about 203.1 GB.
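
For anyone who wants to watch for the UJ state from a script, here is a minimal sketch that picks joining nodes out of nodetool status output. The sample text, including the 10.1.82.170 address and the elided host IDs, is made up for illustration; real output may differ between versions, so treat the pattern as an assumption.

```python
import re

STATUS = {"U": "Up", "D": "Down"}
STATE = {"N": "Normal", "L": "Leaving", "J": "Joining", "M": "Moving"}

# Matches rows such as "UJ  10.1.82.170  203.1 GB ..."; header lines do not match.
ROW = re.compile(r"^([UD])([NLJM])\s+(\S+)")


def parse_status(text):
    """Return (address, status, state) for every node row in the output."""
    nodes = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            status, state, address = match.groups()
            nodes.append((address, STATUS[status], STATE[state]))
    return nodes


def joining_nodes(text):
    return [addr for addr, _, state in parse_status(text) if state == "Joining"]


# Hypothetical sample, not taken from the cluster discussed in this thread.
SAMPLE = """\
Datacenter: dc1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load       Tokens  Owns   Host ID   Rack
UN  10.1.82.160  198.7 GB   256     8.1%   (elided)  rack1
UJ  10.1.82.170  203.1 GB   256     ?      (elided)  rack1
"""

print(joining_nodes(SAMPLE))
```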

On the node that I am adding I watched over several hours as nodetool
netstats received data; however, for the last couple of hours nodetool
netstats simply shows the IPs of the other nodes in the cluster. It looks
something like this:

Mode: JOINING
Bootstrap 659153b0-3ab6-11e5-8c94-5dd79366f3d9
/10.1.82.160
/10.1.82.162
/10.1.82.80
/10.2.123.74
/10.1.82.166
/10.1.82.158
/10.1.82.168
/10.1.82.150
/10.1.82.148
/10.2.123.2
/10.1.82.152
/10.1.82.156
/10.84.78.120
/10.2.123.80
/10.2.123.78
/10.81.122.64
/10.2.123.82
/10.2.123.84
/10.1.82.164
/10.81.122.62
/10.2.123.76
Read Repair Statistics:
Attempted: 0
Mismatch (Blocking): 0
Mismatch (Background): 0
Pool Name                    Active   Pending      Completed
Commands                        n/a         0             24
Responses                       n/a         0        1090793


So I'm trying to figure out... What is the node doing? Why is it still
joining? How long should I wait before being concerned?

Also...

The UUID next to the word 'Bootstrap' is NOT the host ID of the node
joining, it's actually the UUID of a different node already in the cluster.
This seems concerning to me, but again I'm not sure if this is expected
behavior or not.

ANY help would be greatly appreciated.

Thanks,
Stan


Re: Long joining node

2015-08-04 Thread Sebastian Estevez
It helps to set stream socket timeout in the yaml so that you don't hang
forever on a lost / broken stream.

All the best,


Sebastián Estévez

Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com



Re: auto_bootstrap=false broken?

2015-08-04 Thread Robert Coli
On Tue, Aug 4, 2015 at 6:19 AM, horschi hors...@gmail.com wrote:

 I would like to separate these steps:
 1. assign tokens
 2. have the node in a joining state, so that I can copy in data
 3. mark the node as ready



 Did anyone ever use write_survey for such a partial bootstrapping?


What you're asking doesn't make sense to me.

What does partial bootstrap mean? Where are you getting the data from?
How are you copying in data and why do you need the node to be in a
joining state to do that?

https://issues.apache.org/jira/browse/CASSANDRA-6961

Explains a method by which you can repair a partially joined node. In what
way does this differ from what you want?

=Rob


Re: Long joining node

2015-08-04 Thread Stan Lemon
Sebastian,
You're referring to streaming_socket_timeout_in_ms correct?  What value do
you recommend?  All of my nodes are currently at the default 0.

Thanks,
Stan


Re: Long joining node

2015-08-04 Thread Stan Lemon
I'm running 2.0.11. These nodes are on bare metal at SoftLayer.

After I sent my first post, a RuntimeException popped up in the logs. I'm
not sure whether it is related:

ERROR 14:13:17,648 Exception in thread Thread[CompactionExecutor:505,1,main]
java.lang.RuntimeException: Last written key
DecoratedKey(7750448305743929847,
003076697369746f725f706167655f76696561316261636466313364626531633637326636336366646636636138333032314930303030616336342d303030302d303033302d343030302d3030303030303030616336343a64323065323964632d643836342d313165342d383030302d303030303063366500)
>= current key DecoratedKey(-8457751561836812744,
002c6f626a6563745f617564697438616462666462383532646536373263393534633831326130663837326466375230303030326263322d303030302d303033302d343030302d3030303030303030326263323a30303030326263322d303030302d303033302d343030302d3030303030336331373630303a50726f737065637400)
writing into
/var/lib/cassandra/data/pi/__shardindex/pi-__shardindex-tmp-jb-657-Data.db
at org.apache.cassandra.io.sstable.SSTableWriter.beforeAppend(SSTableWriter.java:143)
at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:166)
at org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:170)
at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:60)
at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:59)
at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:198)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
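
The trace comes from a check that rows are appended to an SSTable in sorted key order: the token of the last written key (7750448305743929847) is larger than the token of the key being appended (-8457751561836812744). The following is a simplified sketch of that kind of ordering check, not Cassandra's actual code; it compares tokens only, and the class name and file path are made up for illustration.

```python
class ToySSTableWriter:
    """Simplified model of an append-only writer that requires increasing tokens."""

    def __init__(self, path):
        self.path = path
        self.last_token = None
        self.rows = []

    def before_append(self, token):
        # Reject any key that does not sort strictly after the previous one.
        if self.last_token is not None and self.last_token >= token:
            raise RuntimeError(
                f"Last written key {self.last_token} >= current key {token} "
                f"writing into {self.path}"
            )

    def append(self, token, row):
        self.before_append(token)
        self.rows.append((token, row))
        self.last_token = token


writer = ToySSTableWriter("toy-tmp-Data.db")  # hypothetical path
writer.append(7750448305743929847, "row 1")
try:
    writer.append(-8457751561836812744, "row 2")
except RuntimeError as error:
    print(error)
```

In this model the second append fails because its token sorts before the first one, which mirrors the two tokens shown in the log.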



Re: auto_bootstrap=false broken?

2015-08-04 Thread horschi
Hi Aeljami,

thanks for the ticket. I'll keep an eye on it.

I can't get the survey to work at all on 2.0 (I am not getting any schema
on the survey node). So I guess the survey is not going to be a solution
for now.

kind regards,
Christian


Re: Long joining node

2015-08-04 Thread Robert Coli
On Tue, Aug 4, 2015 at 11:02 AM, Stan Lemon sle...@salesforce.com wrote:

 I am attempting to add a 13th node in one of the datacenters. I have been
 monitoring this process from the node itself with nodetool netstats and
 from one of the existing nodes using nodetool status.

 On the existing node I see the new node as UJ. I have watched the load
 steadily climb up to about 203.4gb, and then over the last two hours it has
 fluctuated a bit and has been steadily dropping to about 203.1gb


It's probably hung. If I were you I'd probably wipe the node and
re-bootstrap.

(what version of cassandra/what network are you on (AWS?)/etc.)

=Rob


Re: auto_bootstrap=false broken?

2015-08-04 Thread horschi
Hi Jonathan,

unless you specify auto_bootstrap=false :)

kind regards,
Christian

On Tue, Aug 4, 2015 at 7:54 PM, Jonathan Haddad j...@jonhaddad.com wrote:

 You're trying to solve a problem that doesn't exist.  Cassandra only
 starts serving reads when it's ready.

Re: auto_bootstrap=false broken?

2015-08-04 Thread Robert Coli
On Tue, Aug 4, 2015 at 11:40 AM, horschi hors...@gmail.com wrote:

 unless you specify auto_bootstrap=false :)


... so why are you doing that?

Two experts are confused as to what you're trying to do; why do you think
you need to do it?

=Rob


Re: Understanding Cassandra Architecture

2015-08-04 Thread Patrick McFadin
Thouraya,

Assuming that you have started all 4 nodes in step 1, then your cluster is
set up and running. Gossip just relays state about the cluster members, not
much else.

The seed node isn't special in any way; it's just the node you have
designated for the other nodes to contact if they need to get cluster
information. Keep in mind, there are no master nodes or leaders. This is
peer-to-peer, so each node should have a complete picture of the cluster
stored in the system keyspace after joining the cluster.

One other thing. You will want to

USE my_keyspace;

Before

CREATE TABLE my_table  (text varchar PRIMARY KEY);
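
Putting the statements from the question in that order gives something like the sketch below. It assumes the replication class was meant to be SimpleStrategy (one word, in plain single quotes); the keyspace, table, and column names are the ones from the question.

```cql
-- Create the keyspace first; with SimpleStrategy and replication_factor 2,
-- each row is stored on two of the four nodes.
CREATE KEYSPACE my_keyspace
  WITH REPLICATION = {'class': 'SimpleStrategy', 'replication_factor': 2};

-- Select the keyspace before creating tables in it.
USE my_keyspace;

CREATE TABLE my_table (text varchar PRIMARY KEY);

INSERT INTO my_table (text) VALUES ('text1');
```

Writing the table name as my_keyspace.my_table in the CREATE TABLE and INSERT statements should work as well, without the USE.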

In your question, the clients will ask the node they connect to for cluster
information, but client state is not stored in Cassandra.

Patrick



On Tue, Aug 4, 2015 at 7:09 AM, Thouraya TH thouray...@gmail.com wrote:

 Hi all,





 1-  I have 4 nodes A, B, C, and D (A is the seed node). I have modified
 seed node address and listen address of each Cassandra.yaml file (I have
 modified nothing else)



 2-

 Then, from A:

 CREATE KEYSPACE my_keyspace WITH REPLICATION = {'class':'SimpleStrategy',
 'replication_factor':2};

 CREATE TABLE my_table  (text varchar PRIMARY KEY);

 USE my_keyspace;

 INSERT INTO my_table (text)VALUES('text1');



 Please, I am trying to understand the architecture of Cassandra and what
 happens exactly in this case:

 As I understand:

 *Step 1:* Create the ring

 1)  The seed node A bootstraps the gossip protocol and waits, listening
 for the other clients (B, C, D)

 2)  The other nodes join (contact) the seed node and run the gossip process
 (to exchange information across the cluster every second)

 *Question 1:* at that moment (at this step), does the seed node store some
 details about clients? Does it broadcast details to other clients?

 Does each node exchange information across the cluster every second?



 *Step 2:* Creating the key space and Storing rows (replication factor =2)



 Thank you so much for help.

 Best Regards.



TTLs on tables with *only* primary keys?

2015-08-04 Thread Kevin Burton
I have a table which just has primary keys.

basically:

create table foo (
    sequence bigint,
    signature text,
    primary key (sequence, signature)
);

I need these to eventually get GC’d; however, it doesn’t seem to work.

If I then run:

select ttl(sequence) from foo;

I get:

Cannot use selection function ttl on PRIMARY KEY part sequence

…

I get the same thing if I do it on the second column .. (signature).

And the value doesn’t seem to be TTLd.

What’s the best way to proceed here?


-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
https://plus.google.com/102718274791889610666/posts


Re: Cassandra Data Stax java driver Snappy Compression library

2015-08-04 Thread Sachin Nikam
Janne,
A little clarification: I found snappy-java-1.0.4.1.jar on the class path. But
the other questions still remain.

On Tue, Aug 4, 2015 at 8:24 PM, Sachin Nikam skni...@gmail.com wrote:

 Janne,
 Thanks for continuing to take the time to answer my queries. We noticed
 that write latency (tp99) from Services S1 and S2 is 50% of the write
 latency (tp99) for Service S3. I also noticed that S1 and S2, which use the
 Astyanax client library, also have compress-lzf.jar on their class path,
 although the table is defined to use Snappy compression. Is this
 compression library, or some other transitive dependency pulled in by
 Astyanax, enabling compression of the payload that is sent over the wire,
 and does that account for the difference in tp99?
 Regards
 Sachin

 On Mon, Aug 3, 2015 at 12:14 AM, Janne Jalkanen janne.jalka...@ecyrd.com
 wrote:


 Correct. Note that you may lose some performance this way though; in a
 typical case saving bandwidth by increasing CPU usage is good. However, it
 always depends on your usecase and whether you’re running your cluster to
 the max. It’s a good, low-hanging optimization to keep in mind though for
 production environments, if you choose not to enable compression now.

 /Janne

 On 3 Aug 2015, at 08:40, Sachin Nikam skni...@gmail.com wrote:

 Thanks Janne...
 To clarify, Service S3 should not run into any issues and I may choose
 to not fix the issue?
 Regards
 Sachin

 On Sat, Aug 1, 2015 at 11:50 PM, Janne Jalkanen janne.jalka...@ecyrd.com
  wrote:

 No, this just tells you that your client (S3 using the Datastax driver) cannot
 communicate with the Cassandra cluster using a compressed protocol, since the
 necessary libraries are missing on the client side.  Servers will still
 compress the data they receive when they write it to disk.

 In other words

 Client -> [uncompressed data] -> Server -> [compressed data] -> Disk.

 To fix, make sure that the Snappy libraries are in the classpath of your
 S3 service application.  As always, there’s no guarantee that this improves
 your performance, since if your app is already CPU-heavy, the extra CPU
 overhead of compression *may* be a problem.  So measure :-)

 /Janne

 On 02 Aug 2015, at 02:17 , Sachin Nikam skni...@gmail.com wrote:

 I am currently running a Cassandra 1.2 cluster. This cluster has 2
 tables i.e.
 TableA and TableB.

 TableA is read and written to by Services S1 and S2 which use Astyanax
 client library.

 TableB is read and written by Service S3 which uses the datastax java
 driver 2.1. S3 also reads data from TableA.

 Both TableA and TableB are defined on the Cassandra nodes to use
 SnappyCompressor.

 On start-up, Service S3 logs the following WARN message.
 The service is able to continue its normal operation thereafter.

 **
 [main] WARN loggerClass=com.datastax.driver.core.FrameCompressor;Cannot
 find Snappy class, you should make sure the Snappy library is in the
 classpath if you intend to use it. Snappy compression will not be
 available for the protocol.
 ***


 My questions are as follows--
 #1. Does the compression happen on the cassandra client side or within
 cassandra server side itself?
 #2. Does Service S3 need to pull in additional dependencies for Snappy
 Compressions as mentioned here --

 http://stackoverflow.com/questions/21784149/getting-cassandra-connection-error
 #3. What happens if this additional library is not present on the
 class path of Service S3? Will any data that S3 writes to TableB not be
 compressed?
 Regards
 Sachin








Retrieve all the columnfamily / tables of thrift and CQL from the keyspace in cassandra

2015-08-04 Thread Shuo Chen
Hi,

I use Hector to work with Cassandra 2.1.8 and want to retrieve
all the tables from a certain keyspace in an application. I use
KeyspaceDefinition.getCfDefs() to retrieve the column family list in a
keyspace.

However, I found that getCfDefs() only retrieves the column families
created through the Thrift API, such as with
me.prettyprint.hector.api.Cluster.updateColumnFamily, but not tables
created through CQL, such as from the cqlsh client.

So how can I retrieve all the tables from a certain keyspace?

I asked the same question on stackoverflow

http://stackoverflow.com/questions/31804797/retrieve-all-the-columnfamily-tables-of-thrift-and-cql-from-the-keyspace-in-ca



Shuo Chen


Re: Cassandra Data Stax java driver Snappy Compression library

2015-08-04 Thread Sachin Nikam
Janne,
Thanks for continuing to take the time to answer my queries. We noticed
that write latency (tp99) from Services S1 and S2 is 50% of the write
latency (tp99) for Service S3. I also noticed that S1 and S2, which use the
Astyanax client library, also have compress-lzf.jar on their class path,
although the table is defined to use Snappy compression. Is this
compression library, or some other transitive dependency pulled in by
Astyanax, enabling compression of the payload that is sent over the wire,
and does that account for the difference in tp99?
Regards
Sachin

On Mon, Aug 3, 2015 at 12:14 AM, Janne Jalkanen janne.jalka...@ecyrd.com
wrote:


 Correct. Note that you may lose some performance this way though; in a
 typical case saving bandwidth by increasing CPU usage is good. However, it
 always depends on your usecase and whether you’re running your cluster to
 the max. It’s a good, low-hanging optimization to keep in mind though for
 production environments, if you choose not to enable compression now.

 /Janne

 On 3 Aug 2015, at 08:40, Sachin Nikam skni...@gmail.com wrote:

 Thanks Janne...
 To clarify, Service S3 should not run into any issues and I may choose to
 not fix the issue?
 Regards
 Sachin

 On Sat, Aug 1, 2015 at 11:50 PM, Janne Jalkanen janne.jalka...@ecyrd.com
 wrote:

 No, this just tells you that your client (S3 using the Datastax driver) cannot
 communicate with the Cassandra cluster using a compressed protocol, since the
 necessary libraries are missing on the client side.  Servers will still
 compress the data they receive when they write it to disk.

 In other words

 Client -> [uncompressed data] -> Server -> [compressed data] -> Disk.

 To fix, make sure that the Snappy libraries are in the classpath of your
 S3 service application.  As always, there’s no guarantee that this improves
 your performance, since if your app is already CPU-heavy, the extra CPU
 overhead of compression *may* be a problem.  So measure :-)

 /Janne

 On 02 Aug 2015, at 02:17 , Sachin Nikam skni...@gmail.com wrote:

 I am currently running a Cassandra 1.2 cluster. This cluster has 2 tables
 i.e.
 TableA and TableB.

 TableA is read and written to by Services S1 and S2 which use Astyanax
 client library.

 TableB is read and written by Service S3 which uses the datastax java
 driver 2.1. S3 also reads data from TableA.

 Both TableA and TableB are defined on the Cassandra nodes to use
 SnappyCompressor.

 On start-up, Service S3 logs the following WARN message.
 The service is able to continue its normal operation thereafter.

 **
 [main] WARN loggerClass=com.datastax.driver.core.FrameCompressor;Cannot
 find Snappy class, you should make sure the Snappy library is in the
 classpath if you intend to use it. Snappy compression will not be
 available for the protocol.
 ***


 My questions are as follows--
 #1. Does the compression happen on the cassandra client side or within
 cassandra server side itself?
 #2. Does Service S3 need to pull in additional dependencies for Snappy
 Compressions as mentioned here --

 http://stackoverflow.com/questions/21784149/getting-cassandra-connection-error
 #3. What happens if this additional library is not present on the
 class path of Service S3? Will any data that S3 writes to TableB not be
 compressed?
 Regards
 Sachin