Re: Cassandra Triggers - Cassandra internally creating trigger.jar files in /tmp/lib/ directory

2016-10-12 Thread sudheer k
Thanks, Vladimir!! I did that and it's working well now. It picks up the new
location when I run reloadtriggers. Really appreciate your support!

Regards
Sudheer

On Thursday, October 13, 2016, Vladimir Yudovin wrote:

> Yes, pass this argument to bin/cassandra script:
> *bin/cassandra -Djava.io.tmpdir=/path/to/tmpdir*
>
>
> Best regards, Vladimir Yudovin,
>
>
> *Winguzone - Hosted Cloud Cassandra on Azure and SoftLayer. Launch your
> cluster in minutes.*

--
Regards
Sudheer


Re: Cassandra Triggers - Cassandra internally creating trigger.jar files in /tmp/lib/ directory

2016-10-12 Thread Vladimir Yudovin
Yes, pass this argument to the bin/cassandra script:

bin/cassandra -Djava.io.tmpdir=/path/to/tmpdir
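To make the setting permanent, a minimal sketch of the same idea in
conf/cassandra-env.sh (assuming the stock JVM_OPTS pattern; the directory is a
placeholder and must exist and be writable by the Cassandra user):

# redirect the JVM temp directory away from /tmp so OS cleanup
# jobs cannot delete the trigger jars Cassandra copies there
JVM_OPTS="$JVM_OPTS -Djava.io.tmpdir=/var/lib/cassandra/tmp"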





Best regards, Vladimir Yudovin, 

Winguzone - Hosted Cloud Cassandra on Azure and SoftLayer.
Launch your cluster in minutes.






 On Thu, 13 Oct 2016 00:36:35 -0400 sudheer k sudheer.hdp...@gmail.com wrote 




Sorry for the confusion. I didn't see the command-line argument you mentioned
in the mail. So this argument needs to be passed when I start Cassandra?




Re: Cassandra Triggers - Cassandra internally creating trigger.jar files in /tmp/lib/ directory

2016-10-12 Thread sudheer k
Sorry for the confusion. I didn't see the command-line argument you mentioned
in the mail. So this argument needs to be passed when I start Cassandra?

Regards
Sudheer

On Thursday, October 13, 2016, sudheer k wrote:

> Appreciate your reply, Vladimir! Is this the configuration I need to
> include in the cassandra-env.sh file?
>
> Regards
> Sudheer

--
Regards
Sudheer


Re: Inserting list data

2016-10-12 Thread Vladimir Yudovin
"The data is actually appended, not overwritten."

Strange; can you send the exact statements?



Here is an example I ran:

CREATE KEYSPACE events WITH replication = {'class': 'SimpleStrategy', 
'replication_factor': 1};

CREATE TABLE events.data (id int primary key, events list<text>);

INSERT INTO events.data (id, events) VALUES ( 0, ['a']);

SELECT * FROM events.data ;

 id | events
----+--------
  0 |  ['a']

(1 rows)



INSERT INTO events.data (id, events) VALUES ( 0, ['b']);

SELECT * FROM events.data ;

 id | events
----+--------
  0 |  ['b']

(1 rows)



As you see, 'a' was overwritten by 'b'.





Best regards, Vladimir Yudovin, 

Winguzone - Hosted Cloud Cassandra on Azure and SoftLayer.
Launch your cluster in minutes.





 On Wed, 12 Oct 2016 23:58:23 -0400 Aoi Kadoya cadyan@gmail.com wrote 






Re: Cassandra Triggers - Cassandra internally creating trigger.jar files in /tmp/lib/ directory

2016-10-12 Thread sudheer k
Appreciate your reply, Vladimir! Is this the configuration I need to
include in the cassandra-env.sh file?

Regards
Sudheer

On Thursday, October 13, 2016, Vladimir Yudovin wrote:

> Hi,
>
> where can I change that default location /tmp/lib it is using for creating
> the jar files?
> Cassandra uses Java property java.io.tmpdir as temporary folder. By
> default it's /tmp but can be changed with command line arguments:
> *cassandra -Djava.io.tmpdir=/path/to/tmpdir*
>
> Best regards, Vladimir Yudovin,
>
>
> *Winguzone - Hosted Cloud Cassandra on Azure and SoftLayer. Launch your
> cluster in minutes.*
>
>

--
Regards
Sudheer


Re: Cassandra Triggers - Cassandra internally creating trigger.jar files in /tmp/lib/ directory

2016-10-12 Thread Vladimir Yudovin
Hi,



where can I change that default location /tmp/lib it is using for creating the 
jar files? 

Cassandra uses the Java property java.io.tmpdir as its temporary folder. By
default it's /tmp, but it can be changed with a command-line argument:

cassandra -Djava.io.tmpdir=/path/to/tmpdir



Best regards, Vladimir Yudovin, 

Winguzone - Hosted Cloud Cassandra on Azure and SoftLayer.
Launch your cluster in minutes.






 On Wed, 12 Oct 2016 23:54:58 -0400 sudheer k sudheer.hdp...@gmail.com wrote 





Re: Inserting list data

2016-10-12 Thread Aoi Kadoya
Yes, that's what I thought. But when I use these forms,
INSERT ... ['A']
INSERT ... ['B']
the data is actually appended, not overwritten.
So I guess this is something unexpected?

Thanks,
Aoi

2016-10-12 20:55 GMT-07:00 Vladimir Yudovin:


Re: Inserting list data

2016-10-12 Thread Vladimir Yudovin
If you use the form

INSERT ... ['A']

INSERT ... ['B']



the latest INSERT will overwrite the first, because it inserts the whole list.
It's better to use UPDATE, like:

UPDATE ... SET events = events + ['A']

UPDATE ... SET events = events + ['B']

These operations add new elements to the end of existing list.
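For example, reusing the events.data table from elsewhere in this thread, a
sketch of what the append form produces:

UPDATE events.data SET events = events + ['a'] WHERE id = 0;
UPDATE events.data SET events = events + ['b'] WHERE id = 0;
SELECT * FROM events.data;

 id | events
----+------------
  0 | ['a', 'b']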





From here https://docs.datastax.com/en/cql/3.0/cql/cql_using/use_list_t.html :

These update operations are implemented internally without any 
read-before-write. Appending and prepending a new element to the list writes 
only the new element.




Best regards, Vladimir Yudovin, 

Winguzone - Hosted Cloud Cassandra on Azure and SoftLayer.
Launch your cluster in minutes.






 On Wed, 12 Oct 2016 17:39:46 -0400 Aoi Kadoya cadyan@gmail.com wrote 











Cassandra Triggers - Cassandra internally creating trigger.jar files in /tmp/lib/ directory

2016-10-12 Thread sudheer k
Hi All,

I faced an issue with triggers today. Below is a description of the
issue:

1) When we planned to use triggers, we placed the triggers.jar file in the
/conf/triggers folder in Cassandra, restarted the Cassandra service, and
created a trigger in CQLSH. Everything was working as expected.

2) Recently we started getting the below error in the Cassandra logs, saying
the trigger jar is missing in the /tmp/lib directory:

ServerError: ErrorMessage code= [Server error]
message="com.sun.jersey.spi.service.ServiceConfigurationError:
com.sun.jersey.spi.inject.InjectableProvider: : java.io.FileNotFoundException:
/tmp/lib/cassandra-8906616690931579554.jar (No such file or directory)"

3) We thought the jar had somehow been corrupted, so we placed a newly
created jar, reloaded the triggers again, and it worked fine.

4) We saw a similar error in other environments and found that the /tmp/lib
folder was not present on the server: our OpenStack servers have a 15-day
retention policy that deletes files under /tmp/. Once the /tmp/ files are
deleted, Cassandra still points to the deleted trigger jar, throws errors,
and prevents the application from inserting records into the table.

5) So we learned that whenever we run reloadtriggers, Cassandra creates a new
jar file in the /tmp/lib/ directory, such as *cassandra-8754700968157790389.jar*
(the numbers keep changing), and uses that jar rather than the jar we placed
in the /conf/triggers folder.

6) I just need to know: where can I change the default location (/tmp/lib) it
uses for creating these jar files?
Can I change that directory to something else for my Cassandra servers alone?
If there is any other solution, it is highly appreciated.

Note: Our management doesn't want to change the /tmp/ directory retention
policy as it is common for all our servers in all the environments.

--
Regards
Sudheer


Inserting list data

2016-10-12 Thread Aoi Kadoya
Hi,

When inserting different data into a list type column from different
clients at the same time, is data supposed to be combined into one
list?

For example, if these two queries were issued from clients at the same
time, what should the events list look like afterwards?

INSERT INTO cycling.upcoming_calendar (year, month, events) VALUES
(2015, 06, ['A']);
INSERT INTO cycling.upcoming_calendar (year, month, events) VALUES
(2015, 06, ['B']);

In my understanding, each operation should be treated as atomic, which
makes me think that even if clients send the queries at the same time,
Cassandra would apply them separately and the last insert would replace
the events list. (= data should be either ['A'] or ['B'])

In my environment, I found that some data was saved as ['A','B'] in
cases like the above.
Is this the expected behavior of the list data type?

I am still new to Cassandra and am trying to understand how this happened.
I'd appreciate your help figuring this out!

Thanks,
Aoi
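A plausible mechanism for the merged result, with a sketch to reproduce it:
each list element is stored as a separate cell, and a whole-list INSERT clears
the previous list with a tombstone written at (write timestamp - 1). If two
INSERTs carry the identical timestamp, as can happen with concurrent clients,
neither tombstone covers the other's element cells, so both elements survive
the merge. Forcing equal timestamps with USING TIMESTAMP makes this
reproducible (the timestamp value is arbitrary):

INSERT INTO cycling.upcoming_calendar (year, month, events)
VALUES (2015, 06, ['A']) USING TIMESTAMP 1476321000000000;
INSERT INTO cycling.upcoming_calendar (year, month, events)
VALUES (2015, 06, ['B']) USING TIMESTAMP 1476321000000000;
-- the read can now return ['A', 'B'] (element order depends on
-- the internal cell ids), matching the behavior described above
SELECT events FROM cycling.upcoming_calendar WHERE year = 2015 AND month = 06;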


Re: Repair in Multi Datacenter - Should you use -dc Datacenter repair or repair with -pr

2016-10-12 Thread Harikrishnan Pillai
In my experience, DC-local repair node by node with the -pr and -par options
is best. Full repair increases sstables a lot and takes days to compact back.
Another easy option is to use a Spark job: read all data with consistency ALL
and increase read_repair_chance to 100%, or use the Netflix tickler.

Sent from my iPhone

On Oct 12, 2016, at 11:44 AM, Anuj Wadehra wrote:





Re: Repair in Multi Datacenter - Should you use -dc Datacenter repair or repair with -pr

2016-10-12 Thread Anuj Wadehra
Hi Leena,

The first thing to be concerned about is: why does the repair -pr operation
not complete?
The second question is: which repair option is best?


One probable cause of stuck repairs: if the firewall between DCs closes idle
TCP connections and Cassandra tries to use such connections, repairs will
hang. Please refer to
https://docs.datastax.com/en/cassandra/2.0/cassandra/troubleshooting/trblshootIdleFirewall.html
. We faced that.

Also make sure you comply with basic bandwidth requirement between DCs. 
Recommended is 1000 Mb/s (1 gigabit) or greater.

Answers to your specific questions:
1. As per my understanding, all replicas will not participate in DC-local
repairs, and thus the repair would be ineffective. You need to make sure that
all replicas of the data in all DCs are in sync.

2. A single DC is not a ring; all DCs together form one token ring. So yes, I
think you should run repair -pr on all nodes.

3. Yes. I don't have experience with incremental repairs, but you can run
repair -pr on all nodes of all DCs.

Regarding the best approach to repair, you should watch some of the repair
presentations from Cassandra Summit 2016; all are online now.

I attended the summit, and people running large clusters generally use
subrange repairs. But such large deployments are on older Cassandra versions
and generally don't use vnodes, so people easily know which nodes hold which
token range.
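A rough sketch of the node-by-node routine described above (keyspace name and
token values are placeholders; run on every node of every DC, one node at a
time):

# primary-range repair, once per node across all DCs
nodetool repair -pr my_keyspace

# or repair one slice of the ring at a time (subrange repair)
nodetool repair -st -9223372036854775808 -et -9100000000000000000 my_keyspace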



Thanks
Anuj


Re: unsubscrible

2016-10-12 Thread Matija Gobec
Steven,

Send an empty email to user-unsubscr...@cassandra.apache.org to unsubscribe.

See you

On Wed, Oct 12, 2016 at 8:15 PM, zhao yi  wrote:

>
>
> Best regards,
>
> Steven Zhao
>
>


Re: Unsubscribe

2016-10-12 Thread Matija Gobec
Omar,

Send an empty email to user-unsubscr...@cassandra.apache.org to unsubscribe.

See you

On Wed, Oct 12, 2016 at 1:33 PM, Omar Mambelli 
wrote:

> Unsubscribe
>
> --
> Inviato da iPhone
>


unsubscrible

2016-10-12 Thread zhao yi


Best regards,

Steven Zhao



Re: VNode Streaming Math

2016-10-12 Thread Vladimir Yudovin
Hi,



The calculation in general is very simple: each node keeps a
replication_factor/number_of_nodes share of the data (the replicas are spread
over all nodes). I.e., if you have 100 nodes and the replication factor is
three, each node keeps 0.03 of the table size.



But you can go with an even simpler approach: each node keeps more or less
the same amount of data. If you add a new node to the cluster, it should
receive about the same data volume as is stored on any other node.
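A rough worked example under the assumptions in the question: with 100 nodes,
RF = 3 and 10 GB on disk per node, the cluster holds about
100 x 10 GB / 3 ~= 333 GB of unique data, and each node's 10 GB is
3/100 = 3% of the table. Rebuilding the dead node therefore streams roughly
10 GB in total, with each of its roughly num_tokens x RF = 96 replicated
ranges fetched from one surviving replica, so the load is spread thinly
across many peers. Raising num_tokens splits the same ~10 GB into more,
smaller transfers; it does not change the total volume streamed.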





Best regards, Vladimir Yudovin, 

Winguzone - Hosted Cloud Cassandra on Azure and SoftLayer.
Launch your cluster in minutes.






 On Wed, 12 Oct 2016 12:13:25 -0400 Anubhav Kale anubhav.k...@microsoft.com wrote 





corrupted gossip generation

2016-10-12 Thread Yucheng Liu
*Env: * apache cassandra 2.1.8, 6-nodes

*Problem:* one node had a kernel panic and crashed twice this morning. It
seems the gossip generation got messed up: all nodes are flooded with
"received an invalid gossip generation for peer" warning messages. Multiple
rolling restarts only fixed "nodetool status"; the warnings are still
occurring, and gossipinfo shows "shutdown" for some nodes.

*Question:* Does anyone know how to get rid of these warning messages?
Taking the whole cluster down is not desirable, as this is production.

*$ nodetool gossipinfo|grep STATUS*
  STATUS:NORMAL,3074457345618258600
  STATUS:NORMAL,-3074457345618258604
  STATUS:NORMAL,6148914691236517202
  STATUS:shutdown,true
  STATUS:shutdown,true
  STATUS:NORMAL,-9223372036854775808

*system.log:*

WARN  [GossipStage:1] 2016-10-12 09:58:02,913 Gossiper.java:1078 - received
an invalid gossip generation for peer /10.150.12.118; local generation =
144263, received generation = 1476286723

cqlsh> select gossip_generation from system.local;

*1476286662   (I don't see where 144263 is from...)*


Re: [Marketing Mail] Re: [Marketing Mail] Re: sstableloader question

2016-10-12 Thread Osman YOZGATLIOGLU
Hello,

It's about 2,500 sstables worth 25 TB of data.
The -t parameter doesn't change anything; -t 1000 and -t 1 behave the same.
Most probably I am hitting some limitation on the target cluster.
I'm preparing to split the sstables and run up to ten parallel sstableloader
sessions.
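A sketch of one such session (hosts, paths and the throttle value are
placeholders; -t is in Mbit/s, and the path must end in the keyspace/table
directory):

# one of several parallel sessions, each fed a distinct subset of sstables
sstableloader -d 10.0.0.1,10.0.0.2 -t 600 /staging/part1/my_keyspace/my_table

# on each target node, make sure inbound streaming isn't the bottleneck
nodetool setstreamthroughput 600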

Regards,
Osman

On 11-10-2016 21:46, Rajath Subramanyam wrote:
How many sstables are you trying to load ? Running sstableloaders in parallel 
will help. Did you try setting the "-t" parameter and see if you are getting 
the expected throughput ?

- Rajath


Rajath Subramanyam


On Mon, Oct 10, 2016 at 2:02 PM, Osman YOZGATLIOGLU wrote:
Hello,

Thank you Adam and Rajath.

I'll split input sstables and run parallel jobs for each.
I tested this approach and ran 3 parallel sstableloader jobs without the -t
parameter.
I raised stream_throughput_outbound_megabits_per_sec parameter from 200 to 600 
Mbit/sec at all of target nodes.
But each job runs at only about 10 MB/sec and generates about 100 Mbit/sec of
network traffic.
In total this could be much more; the source and target servers have plenty
of unused CPU, I/O and network resources.
Do you have any idea how I can increase the speed of the sstableloader jobs?
Do you have any idea how can I increase speed of sstableloader job?

Regards,
Osman

On 10-10-2016 22:05, Rajath Subramanyam wrote:
Hi Osman,

You cannot restart the streaming only to the failed nodes specifically. You can 
restart the sstableloader job itself. Compaction will eventually take care of 
the redundant rows.

- Rajath


Rajath Subramanyam


On Sun, Oct 9, 2016 at 7:38 PM, Adam Hutson wrote:
It'll start over from the beginning.


On Sunday, October 9, 2016, Osman YOZGATLIOGLU wrote:
Hello,

I have running a sstableloader job.
Unfortunately some of nodes restarted since beginnig streaming.
I see streaming stop for those nodes.
Can I restart those streaming somehow?
Or if I restart sstableloader job, will it start from beginning?

Regards,
Osman




--
Adam Hutson
Data Architect | DataScale
+1 (417) 224-5212
a...@datascale.io






RE: Repair in Multi Datacenter - Should you use -dc Datacenter repair or repair with -pr

2016-10-12 Thread Anubhav Kale
Agree.

However, if we go from a world where repairs don't run (or run so unreliably
that C* can't mark the SSTables as repaired anyway) to a world where repairs
run more reliably (the Spark / Tickler approach), the impact on tombstone
removal doesn't get any worse (because SSTables aren't marked either way).

From: Jeff Jirsa [mailto:jeff.ji...@crowdstrike.com]
Sent: Wednesday, October 12, 2016 9:25 AM
To: user@cassandra.apache.org
Subject: Re: Repair in Multi Datacenter - Should you use -dc Datacenter repair 
or repair with -pr


Re: Repair in Multi Datacenter - Should you use -dc Datacenter repair or repair with -pr

2016-10-12 Thread Jeff Jirsa
Note that the tickle approach doesn’t mark sstables as repaired (it’s a simpler 
version of mutation based repair in a sense), so Cassandra has no way to prove 
that the data has been repaired. 

 

With tickets like https://issues.apache.org/jira/browse/CASSANDRA-6434, this 
has implications on tombstone removal.

 

 

From: Anubhav Kale 
Reply-To: "user@cassandra.apache.org" 
Date: Wednesday, October 12, 2016 at 9:17 AM
To: "user@cassandra.apache.org" 
Subject: RE: Repair in Multi Datacenter - Should you use -dc Datacenter repair 
or repair with -pr

 




RE: Repair in Multi Datacenter - Should you use -dc Datacenter repair or repair with -pr

2016-10-12 Thread Anubhav Kale
The default repair process doesn't usually work at scale, unfortunately.

Depending on your data size, you have the following options.


Netflix Tickler: https://github.com/ckalantzis/cassTickler (Read at CL.ALL via 
CQL continuously :: Python)

Spotify Reaper: https://github.com/spotify/cassandra-reaper (Subrange repair, 
provides a REST endpoint and calls APIs through JMX :: Java)

List subranges: https://github.com/pauloricardomg/cassandra-list-subranges 
(Tool to get subranges for a given node. :: Java)

Subrange Repair: https://github.com/BrianGallew/cassandra_range_repair (Tool
to do subrange repair :: Python)

Mutation Based Repair (Not ready yet): 
https://issues.apache.org/jira/browse/CASSANDRA-8911 (C* is thinking of doing 
this - hot off the press)

If you have Spark in your system, you could use it to do what the Netflix
Tickler does. We're experimenting with this, and it seems to be the best fit
for our datasets over all the other options.
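The core of the Tickler idea fits in a couple of cqlsh lines; a sketch with
placeholder names (a real run would page through the full token range rather
than one unbounded SELECT):

cqlsh> CONSISTENCY ALL;
cqlsh> -- every row read at ALL compares all replicas and repairs mismatches
cqlsh> SELECT id FROM my_keyspace.my_table;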

From: Leena Ghatpande [mailto:lghatpa...@hotmail.com]
Sent: Wednesday, October 12, 2016 7:16 AM
To: user@cassandra.apache.org
Subject: Repair in Multi Datacenter - Should you use -dc Datacenter repair or 
repair with -pr







VNode Streaming Math

2016-10-12 Thread Anubhav Kale
Hello,

Suppose I have a 100 node ring, with num_tokens=32 (thus, 32 VNodes per 
physical machine). Assume this cluster has just one keyspace having one table. 
There are 10 SS Tables on each node, and size on disk is 10GB on each node. For 
simplicity, assume each SSTable is 1GB.

Now, a node went down, and I need to rebuild it. Can you please explain to me 
the math around how many SS Table files (and size) each node would stream to 
this node ? How does that math change as #VNodes change ?

I am looking for rough calculations to understand this process better. I am 
guessing I might have missed some variables in here (amount of data per token 
range ?), so please let me know that too !

Thanks much !


Re: Why does Cassandra need to have 2B column limit? why can't we have unlimited ?

2016-10-12 Thread Edward Capriolo
The "2 billion column limit" press clipping "puffery". This statement
seemingly became popular because highly traffic traffic-ed story, in which
a tech reporter embellished on a statement to make a splashy article.

The effect is something like this:
http://www.healthnewsreview.org/2012/08/iced-tea-kidney-stones-and-the-study-that-never-existed/

Iced tea does not cause kidney stones! Cassandra does not store rows with 2
billion columns! It is just not true.






On Wed, Oct 12, 2016 at 4:57 AM, Kant Kodali wrote:

> Well 1) I have not sent it to postgresql mailing lists 2) I thought this
> is an open ended question as it can involve ideas from everywhere including
> the Cassandra java driver mailing lists so sorry If that bothered you for
> some reason.
>
> On Wed, Oct 12, 2016 at 1:41 AM, Dorian Hoxha 
> wrote:
>
>> Also, I'm not sure, but I don't think it's "cool" to write to multiple
>> lists in the same message. (based on postgresql mailing lists rules).
>> Example I'm not subscribed to those, and now the messages are separated.
>>
>> On Wed, Oct 12, 2016 at 10:37 AM, Dorian Hoxha 
>> wrote:
>>
>>> There are some issues working on larger partitions.
>>> HBase doesn't do what you say! You also have to be careful in HBase not
>>> to create large rows! But since they are globally sorted, you can easily
>>> sort between them and create small rows.
>>>
>>> In my opinion, cassandra people are wrong, in that they say "globally
>>> sorted is the devil!" while all fb/google/etc actually use globally-sorted
>>> most of the time! You have to be careful though (just like with random
>>> partition)
>>>
>>> Can you tell what rowkey1, page1, col(x) actually are ? Maybe there is a
>>> way.
>>> The most "recent", means there's a timestamp in there ?
>>>
>>> On Wed, Oct 12, 2016 at 9:58 AM, Kant Kodali  wrote:
>>>
 Hi All,

 I understand Cassandra can have a maximum of 2B rows per partition but
 in practice some people seem to suggest the magic number is 100K. why not
 create another partition/rowkey automatically (whenever we reach a safe
 limit that  we consider would be efficient)  with auto increment bigint  as
 a suffix appended to the new rowkey? so that the driver can return the new
 rowkey  indicating that there is a new partition and so on...Now I
 understand this would involve allowing partial row key searches which
 currently Cassandra wouldn't do (but I believe HBASE does) and thinking
 about token ranges and potentially many other things..

 My current problem is this

 I have a row key followed by bunch of columns (this is not time series
 data)
 and these columns can grow to any number so since I have 100K limit (or
 whatever the number is. say some limit) I want to break the partition into
 level/pages

 rowkey1, page1->col1, col2, col3..
 rowkey1, page2->col1, col2, col3..

 now say my Cassandra db is populated with data and say my application
 just got booted up and I want to most recent value of a certain partition
 but I don't know which page it belongs to since my application just got
 booted up? how do I solve this in the most efficient that is possible in
 Cassandra today? I understand I can create MV, other tables that can hold
 some auxiliary data such as number of pages per partition and so on..but
 that involves the maintenance cost of that other table which I cannot
 afford really because I have MV's, secondary indexes for other good
 reasons. so it would be great if someone can explain the best way possible
 as of today with Cassandra? By best way I mean is it possible with one
 request? If Yes, then how? If not, then what is the next best way to solve
 this?

 Thanks,
 kant

>>>
>>>
>>
>
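For what it's worth, the paging scheme Kant describes is usually expressed as
an explicit bucket in the partition key; a sketch with placeholder names (it
does not solve the "find the latest bucket on boot" problem without some
extra lookup):

CREATE TABLE ks.wide_rows (
    rowkey text,
    bucket int,      -- the "page" suffix, bumped when a bucket fills up
    col text,
    value text,
    PRIMARY KEY ((rowkey, bucket), col)
);

-- a booting client that knows an upper bound can probe buckets downward
SELECT * FROM ks.wide_rows WHERE rowkey = 'rowkey1' AND bucket = 3 LIMIT 1;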


Re: Gossip status: hibernate

2016-10-12 Thread Joel Knighton
1. A hibernating node is participating in gossip but intentionally hasn't
yet joined the ring. The two cases where a node would set a hibernating
status are when the node was started with "-Dcassandra.join_ring=False" and
has tokens or when the node was started to replace another node (using
"-Dcassandra.replace_address" or "-Dcassandra.replace_address_first_boot").

2. A rolling restart is probably your best bet. You may have more luck with
an assassinate in the case that you connect to a node that is not
continuously removing/adding the state. I suspect that this node will have
an alive status for this endpoint state. As usual, you should wield
assassinate with lots of caution.

This issue sounds most similar to CASSANDRA-10371. If you provide debugging
information similar to that requested on the above ticket as well as what
operation you were performing on the node (was it a failed attempt at
replacing? etc) on a JIRA ticket, someone might have a chance to look into
this further.
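For reference, on 3.x the assassinate route is a one-liner against the stale
endpoint from gossipinfo (again, use with caution, and only for a node that
is truly gone for good):

nodetool assassinate 172.31.137.65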

On Wed, Oct 12, 2016 at 9:48 AM, Kasper Petersen wrote:



Gossip status: hibernate

2016-10-12 Thread Kasper Petersen
Hi,

I've recently upgraded our Cassandra cluster from 2.1 to 3.9. By default(?)
3.9 creates a debug.log file containing a ton of lines (a new one every
second) with:

DEBUG [GossipTasks:1] 2016-10-12 14:43:38,761 Gossiper.java:337 -
> Convicting /172.31.137.65 with status hibernate - alive false


That node has not been around for a very long time now.

It does not show up in nodetool status and nodetool gossipinfo returns the
following output about that node:

/172.31.137.65
>   generation:1433571405
>   heartbeat:232
>   STATUS:3:hibernate,true
>   LOAD:225:96445.0
>   SCHEMA:53:e2d1a288-581c-3f35-b492-1b9d5a803631
>   DC:9:us-east
>   RACK:11:1b
>   RELEASE_VERSION:7:2.1.5
>   RPC_ADDRESS:6:172.31.137.65
>   SEVERITY:231:0.2512562870979309
>   NET_VERSION:4:8
>   HOST_ID:5:7988d3c9-dec8-4b71-b5a9-0b962aad0680
>   TOKENS:2:


nodetool removenode 7988d3c9-dec8-4b71-b5a9-0b962aad0680 resulted in:

error: Host ID not found.
>

Now my questions are:

   1. What does it mean for a node to be "hibernating"? How does it end up
   in that state?
   2. How do I get rid of it? It's not coming back.



-- 
Best regards,
Kasper Middelboe Petersen

*Lead Backend Developer*

*SYBO Games ApS*
Jorcks Passage 1A, 4th.
1162 Copenhagen K


Re: Run sstablescrub in parallel

2016-10-12 Thread Eric Evans
On Wed, Oct 12, 2016 at 2:38 AM, Oleg Krayushkin wrote:
> Is there any way to run sstablescrub on one CF in parallel?

I don't think so, but you can use `nodetool scrub' which has concurrency.

If you need to do this "offline" you can use `nodetool
disable{thrift,binary}` to prevent client connections and `nodetool
disablegossip` to leave the ring.

Cheers,

-- 
Eric Evans
john.eric.ev...@gmail.com


Repair in Multi Datacenter - Should you use -dc Datacenter repair or repair with -pr

2016-10-12 Thread Leena Ghatpande
Please advise. I cannot find any clear documentation on the best strategy
for repairing nodes on a regular basis when multiple datacenters are involved.


We are running Cassandra 3.7 in multiple datacenters with 4 nodes in each
data center. We are trying to run repairs every other night to keep the nodes
in a good state. We currently run repair with the -pr option, but the repair
process gets hung and does not complete gracefully. We don't see any errors
in the logs either.


What is the best way to perform repairs on large tables across multiple data
centers?

1. Can we run Datacenter repair using -dc option for each data center? Do we 
need to run repair on each node in that case or will it repair all nodes within 
the datacenter?

2. Is running repair with -pr across all nodes required, if we perform step 1
every night?

3. Is cross-datacenter repair required, and if so, what's the best option?


Thanks


Leena




Re: Multiple Network Interfaces in non-EC2

2016-10-12 Thread Anuj Wadehra
Hi Amir,

I would like to understand your requirement first. Why do you need the
multiple-network-interface configuration described at
http://docs.datastax.com/en/cassandra/3.x/cassandra/configuration/configMultiNetworks.html
with a single-DC setup?

As per my understanding, you could simply set listen_address to the private
IP and not set the broadcast_address and listen_on_broadcast_address
properties at all. You can use your private IP everywhere because you don't
have any other DC that would connect using the public IP.

With multiple DCs, you need a public IP for communicating with nodes in
other DCs; that's when you use the private IP for internal communication and
the public IP for cross-DC communication.

Let me know if using private IP solves your problem.

Also, if you have a specific use case for the multiple-interface
configuration, you could add a NAT rule that routes traffic hitting the
public IP to your private IP (for the Cassandra port only). This could act
as a workaround until the JIRA is fixed. Let me know if you see any issues
with the workaround.
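For reference, a minimal cassandra.yaml sketch of the multi-DC case those
properties are meant for (addresses are placeholders):

listen_address: 10.10.1.5           # private interface, intra-DC traffic
broadcast_address: 203.0.113.5      # public IP advertised to remote DCs
listen_on_broadcast_address: true   # also bind the public interface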


Thanks
Anuj

WARN [SharedPool-Worker-3] AbstractTracingAwareExecutorService.java

2016-10-12 Thread James Joseph
I am seeing the following WARN in system.log. As a temporary workaround I
can increase commitlog_segment_size_in_mb in cassandra.yaml to 64, but how
can I trace it down? Which application is issuing such large writes, and to
which keyspace and table is it writing?


WARN  [SharedPool-Worker-3] 2016-10-05 03:46:22,363
 AbstractTracingAwareExecutorService.java:169 - Uncaught exception on
thread Thread[SharedPool-Worker-3,5,main]: {}
java.lang.IllegalArgumentException: Mutation of 19711728 bytes is too large
for the maxiumum size of 16777216
  at
org.apache.cassandra.db.commitlog.CommitLog.add(CommitLog.java:221)
~[cassandra-all-2.1.8.689.jar:2.1.8.689]
  at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:383)
~[cassandra-all-2.1.8.689.jar:2.1.8.689]
  at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:363)
~[cassandra-all-2.1.8.689.jar:2.1.8.689]
  at org.apache.cassandra.db.Mutation.apply(Mutation.java:214)
~[cassandra-all-2.1.8.689.jar:2.1.8.689]
  at
org.apache.cassandra.db.MutationVerbHandler.doVerb(MutationVerbHandler.java:54)
~[cassandra-all-2.1.8.689.jar:2.1.8.689]
  at
org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:64)
~[cassandra-all-2.1.8.689.jar:2.1.8.689]
  at java.util.concurrent.Executors$RunnableAdapter.call(Unknown
Source) ~[na:1.8.0_92]
  at
org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService$FutureTask.run(AbstractTracingAwareExecutorService.java:164)
~[cassandra-all-2.1.8.689.jar:2.1.8.689]
  at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105)
[cassandra-all-2.1.8.689.jar:2.1.8.689]
  at java.lang.Thread.run(Unknown Source) [na:1.8.0_92]
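
For reference, a minimal sketch of both angles, with the caveat that on 2.1 
the warning itself does not name the keyspace/table, so tracing relies on 
sampling (the property name is per the stock cassandra.yaml; the probability 
value is an arbitrary example):

    # cassandra.yaml: the maximum mutation size is half the commit log
    # segment size, so 64 MB segments allow mutations up to 32 MB.
    commitlog_segment_size_in_mb: 64

    # Sample a small fraction of requests into system_traces, then look at
    # the sessions/events recorded around the time of the warning:
    nodetool settraceprobability 0.001
    # ...later, in cqlsh:
    # SELECT * FROM system_traces.sessions LIMIT 50;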


Thanks
James.


Re: JConsole Support for SSL in C* 2.0

2016-10-12 Thread Vladimir Yudovin
Hi,



I didn't try, but I guess it's possible.



Look at conf/cassandra-env.sh in latest versions:



  JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.ssl=false"

#  JVM_OPTS="$JVM_OPTS -Djavax.net.ssl.keyStore=/path/to/keystore"

#  JVM_OPTS="$JVM_OPTS 
-Djavax.net.ssl.keyStorePassword=keystore-password"

#  JVM_OPTS="$JVM_OPTS -Djavax.net.ssl.trustStore=/path/to/truststore"

#  JVM_OPTS="$JVM_OPTS 
-Djavax.net.ssl.trustStorePassword=truststore-password"

#  JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.ssl.need.client.auth=true"

#  JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.registry.ssl=true"

#  JVM_OPTS="$JVM_OPTS 
-Dcom.sun.management.jmxremote.ssl.enabled.protocols=enabled-protocols"

#  JVM_OPTS="$JVM_OPTS 
-Dcom.sun.management.jmxremote.ssl.enabled.cipher.suites=enabled-cipher-suites"




You can see the SSL and keystore options there.



In 2.0.17, by contrast, the only SSL option is:

JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.ssl=false"



Though this config file for version 2.0 doesn't contain the keystore options, 
I think it's worth trying to add them in v2.0, enabling SSL, and checking 
whether it works.
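
On the client side, a hypothetical JConsole invocation against a node with JMX 
SSL enabled might look like this (the paths and host are placeholders; 7199 is 
the default JMX port):

    jconsole -J-Djavax.net.ssl.trustStore=/path/to/truststore \
             -J-Djavax.net.ssl.trustStorePassword=truststore-password \
             service:jmx:rmi:///jndi/rmi://node1:7199/jmxrmi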





Best regards, Vladimir Yudovin, 

Winguzone - Hosted Cloud Cassandra on Azure and SoftLayer.
Launch your cluster in minutes.





 On Wed, 12 Oct 2016 08:08:34 -0400, Amit Singh F 
amit.f.si...@ericsson.com wrote:




Hi All,

 

While looking through the Security documentation for C* 2.0, I noticed that 
there is no mention of JConsole over SSL, whereas the latest 3.x docs cover it:

http://docs.datastax.com/en/cassandra_win/3.0/cassandra/configuration/secureJconsoleSSL.html

So what I infer from this is that only in C* 3.x can we secure JConsole over 
SSL?
Also, in C* 2.0, can SSL only be used by clients, i.e. not by nodetool or 
JConsole?

Please correct me if I am on the wrong track.

 

Regards

Amit Singh

Datastax Certified Developer









JConsole Support for SSL in C* 2.0

2016-10-12 Thread Amit Singh F
Hi All,

While looking through the Security documentation for C* 2.0, I noticed that 
there is no mention of JConsole over SSL, whereas the latest 3.x docs cover it:

http://docs.datastax.com/en/cassandra_win/3.0/cassandra/configuration/secureJconsoleSSL.html

So what I infer from this is that only in C* 3.x can we secure JConsole over 
SSL?
Also, in C* 2.0, can SSL only be used by clients, i.e. not by nodetool or 
JConsole?

Please correct me if I am on the wrong track.

Regards
Amit Singh
Datastax Certified Developer


Unsubscribe

2016-10-12 Thread Omar Mambelli
Unsubscribe

-- 
Sent from iPhone


Re: Why does Cassandra need to have 2B column limit? why can't we have unlimited ?

2016-10-12 Thread Kant Kodali
Well, 1) I have not sent it to the PostgreSQL mailing lists, and 2) I thought
this was an open-ended question that could draw ideas from everywhere,
including the Cassandra Java driver mailing lists, so sorry if that bothered
you for some reason.

On Wed, Oct 12, 2016 at 1:41 AM, Dorian Hoxha 
wrote:

> Also, I'm not sure, but I don't think it's "cool" to write to multiple
> lists in the same message (based on the PostgreSQL mailing list rules).
> For example, I'm not subscribed to those, and now the messages are separated.
>
> On Wed, Oct 12, 2016 at 10:37 AM, Dorian Hoxha 
> wrote:
>
>> There are some issues when working with larger partitions.
>> HBase doesn't do what you say! You also have to be careful on HBase not
>> to create large rows! But since they are globally sorted, you can easily
>> sort between them and create small rows.
>>
>> In my opinion, Cassandra people are wrong in that they say "globally
>> sorted is the devil!" while fb/google/etc actually use globally-sorted
>> storage most of the time! You have to be careful, though (just like with
>> random partitioning).
>>
>> Can you tell us what rowkey1, page1, col(x) actually are? Maybe there is
>> a way.
>> The most "recent" means there's a timestamp in there?
>>
>> On Wed, Oct 12, 2016 at 9:58 AM, Kant Kodali  wrote:
>>
>>> Hi All,
>>>
>>> I understand Cassandra can have a maximum of 2B columns (cells) per
>>> partition, but in practice some people seem to suggest the magic number
>>> is 100K. Why not create another partition/rowkey automatically (whenever
>>> we reach a safe limit that we consider efficient), with an
>>> auto-incrementing bigint appended as a suffix to the new rowkey, so that
>>> the driver can return the new rowkey, indicating that there is a new
>>> partition, and so on? Now I understand this would involve allowing
>>> partial row-key searches, which Cassandra currently doesn't do (but I
>>> believe HBase does), and thinking about token ranges and potentially
>>> many other things...
>>>
>>> My current problem is this:
>>>
>>> I have a row key followed by a bunch of columns (this is not time-series
>>> data), and these columns can grow to any number. So, since I have a 100K
>>> limit (or whatever the number is; say, some limit), I want to break the
>>> partition into levels/pages:
>>>
>>> rowkey1, page1->col1, col2, col3..
>>> rowkey1, page2->col1, col2, col3..
>>>
>>> Now say my Cassandra DB is populated with data, my application has just
>>> booted up, and I want the most recent value of a certain partition, but
>>> I don't know which page it belongs to. How do I solve this in the most
>>> efficient way possible in Cassandra today? I understand I can create MVs
>>> or other tables that hold auxiliary data, such as the number of pages
>>> per partition, and so on, but that involves maintaining that other
>>> table, which I really cannot afford because I already have MVs and
>>> secondary indexes for other good reasons. So it would be great if
>>> someone could explain the best way possible as of today with Cassandra.
>>> By the best way I mean: is it possible with one request? If yes, then
>>> how? If not, what is the next best way to solve this?
>>>
>>> Thanks,
>>> kant
>>>
>>
>>
>


Re: Why does Cassandra need to have 2B column limit? why can't we have unlimited ?

2016-10-12 Thread Kant Kodali
I did mention this in my previous email: this is not time-series data. I
understand how to structure it if it were time-series data.

What do you mean by globally sorted? Do you mean keeping every partition
sorted (since I come from the Cassandra world)?

rowkey1 -> blob
page    -> int or long or bigint
col1    -> text
col2    -> blob
col3    -> bigint
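
For what it's worth, a minimal CQL sketch of one way to model these pages 
(the table and column names are placeholders): clustering on page in 
descending order makes the newest page of a partition retrievable in a single 
query.

    CREATE TABLE data_by_page (
        rowkey blob,
        page   bigint,
        col1   text,
        col2   blob,
        col3   bigint,
        PRIMARY KEY (rowkey, page)
    ) WITH CLUSTERING ORDER BY (page DESC);

    -- Most recent page of a given partition in one request:
    SELECT * FROM data_by_page WHERE rowkey = ? LIMIT 1;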

On Wed, Oct 12, 2016 at 1:37 AM, Dorian Hoxha 
wrote:

> There are some issues when working with larger partitions.
> HBase doesn't do what you say! You also have to be careful on HBase not
> to create large rows! But since they are globally sorted, you can easily
> sort between them and create small rows.
>
> In my opinion, Cassandra people are wrong in that they say "globally
> sorted is the devil!" while fb/google/etc actually use globally-sorted
> storage most of the time! You have to be careful, though (just like with
> random partitioning).
>
> Can you tell us what rowkey1, page1, col(x) actually are? Maybe there is
> a way.
> The most "recent" means there's a timestamp in there?
>
> On Wed, Oct 12, 2016 at 9:58 AM, Kant Kodali  wrote:
>
>> Hi All,
>>
>> I understand Cassandra can have a maximum of 2B columns (cells) per
>> partition, but in practice some people seem to suggest the magic number
>> is 100K. Why not create another partition/rowkey automatically (whenever
>> we reach a safe limit that we consider efficient), with an
>> auto-incrementing bigint appended as a suffix to the new rowkey, so that
>> the driver can return the new rowkey, indicating that there is a new
>> partition, and so on? Now I understand this would involve allowing
>> partial row-key searches, which Cassandra currently doesn't do (but I
>> believe HBase does), and thinking about token ranges and potentially
>> many other things...
>>
>> My current problem is this:
>>
>> I have a row key followed by a bunch of columns (this is not time-series
>> data), and these columns can grow to any number. So, since I have a 100K
>> limit (or whatever the number is; say, some limit), I want to break the
>> partition into levels/pages:
>>
>> rowkey1, page1->col1, col2, col3..
>> rowkey1, page2->col1, col2, col3..
>>
>> Now say my Cassandra DB is populated with data, my application has just
>> booted up, and I want the most recent value of a certain partition, but
>> I don't know which page it belongs to. How do I solve this in the most
>> efficient way possible in Cassandra today? I understand I can create MVs
>> or other tables that hold auxiliary data, such as the number of pages
>> per partition, and so on, but that involves maintaining that other
>> table, which I really cannot afford because I already have MVs and
>> secondary indexes for other good reasons. So it would be great if
>> someone could explain the best way possible as of today with Cassandra.
>> By the best way I mean: is it possible with one request? If yes, then
>> how? If not, what is the next best way to solve this?
>>
>> Thanks,
>> kant
>>
>
>


Re: Why does Cassandra need to have 2B column limit? why can't we have unlimited ?

2016-10-12 Thread Dorian Hoxha
Also, I'm not sure, but I don't think it's "cool" to write to multiple
lists in the same message (based on the PostgreSQL mailing list rules).
For example, I'm not subscribed to those, and now the messages are separated.

On Wed, Oct 12, 2016 at 10:37 AM, Dorian Hoxha 
wrote:

> There are some issues when working with larger partitions.
> HBase doesn't do what you say! You also have to be careful on HBase not
> to create large rows! But since they are globally sorted, you can easily
> sort between them and create small rows.
>
> In my opinion, Cassandra people are wrong in that they say "globally
> sorted is the devil!" while fb/google/etc actually use globally-sorted
> storage most of the time! You have to be careful, though (just like with
> random partitioning).
>
> Can you tell us what rowkey1, page1, col(x) actually are? Maybe there is
> a way.
> The most "recent" means there's a timestamp in there?
>
> On Wed, Oct 12, 2016 at 9:58 AM, Kant Kodali  wrote:
>
>> Hi All,
>>
>> I understand Cassandra can have a maximum of 2B columns (cells) per
>> partition, but in practice some people seem to suggest the magic number
>> is 100K. Why not create another partition/rowkey automatically (whenever
>> we reach a safe limit that we consider efficient), with an
>> auto-incrementing bigint appended as a suffix to the new rowkey, so that
>> the driver can return the new rowkey, indicating that there is a new
>> partition, and so on? Now I understand this would involve allowing
>> partial row-key searches, which Cassandra currently doesn't do (but I
>> believe HBase does), and thinking about token ranges and potentially
>> many other things...
>>
>> My current problem is this:
>>
>> I have a row key followed by a bunch of columns (this is not time-series
>> data), and these columns can grow to any number. So, since I have a 100K
>> limit (or whatever the number is; say, some limit), I want to break the
>> partition into levels/pages:
>>
>> rowkey1, page1->col1, col2, col3..
>> rowkey1, page2->col1, col2, col3..
>>
>> Now say my Cassandra DB is populated with data, my application has just
>> booted up, and I want the most recent value of a certain partition, but
>> I don't know which page it belongs to. How do I solve this in the most
>> efficient way possible in Cassandra today? I understand I can create MVs
>> or other tables that hold auxiliary data, such as the number of pages
>> per partition, and so on, but that involves maintaining that other
>> table, which I really cannot afford because I already have MVs and
>> secondary indexes for other good reasons. So it would be great if
>> someone could explain the best way possible as of today with Cassandra.
>> By the best way I mean: is it possible with one request? If yes, then
>> how? If not, what is the next best way to solve this?
>>
>> Thanks,
>> kant
>>
>
>


Re: Why does Cassandra need to have 2B column limit? why can't we have unlimited ?

2016-10-12 Thread Dorian Hoxha
There are some issues when working with larger partitions.
HBase doesn't do what you say! You also have to be careful on HBase not to
create large rows! But since they are globally sorted, you can easily sort
between them and create small rows.

In my opinion, Cassandra people are wrong in that they say "globally
sorted is the devil!" while fb/google/etc actually use globally-sorted
storage most of the time! You have to be careful, though (just like with
random partitioning).

Can you tell us what rowkey1, page1, col(x) actually are? Maybe there is a
way.
The most "recent" means there's a timestamp in there?

On Wed, Oct 12, 2016 at 9:58 AM, Kant Kodali  wrote:

> Hi All,
>
> I understand Cassandra can have a maximum of 2B columns (cells) per
> partition, but in practice some people seem to suggest the magic number
> is 100K. Why not create another partition/rowkey automatically (whenever
> we reach a safe limit that we consider efficient), with an
> auto-incrementing bigint appended as a suffix to the new rowkey, so that
> the driver can return the new rowkey, indicating that there is a new
> partition, and so on? Now I understand this would involve allowing
> partial row-key searches, which Cassandra currently doesn't do (but I
> believe HBase does), and thinking about token ranges and potentially
> many other things...
>
> My current problem is this:
>
> I have a row key followed by a bunch of columns (this is not time-series
> data), and these columns can grow to any number. So, since I have a 100K
> limit (or whatever the number is; say, some limit), I want to break the
> partition into levels/pages:
>
> rowkey1, page1->col1, col2, col3..
> rowkey1, page2->col1, col2, col3..
>
> Now say my Cassandra DB is populated with data, my application has just
> booted up, and I want the most recent value of a certain partition, but
> I don't know which page it belongs to. How do I solve this in the most
> efficient way possible in Cassandra today? I understand I can create MVs
> or other tables that hold auxiliary data, such as the number of pages
> per partition, and so on, but that involves maintaining that other
> table, which I really cannot afford because I already have MVs and
> secondary indexes for other good reasons. So it would be great if
> someone could explain the best way possible as of today with Cassandra.
> By the best way I mean: is it possible with one request? If yes, then
> how? If not, what is the next best way to solve this?
>
> Thanks,
> kant
>


Why does Cassandra need to have 2B column limit? why can't we have unlimited ?

2016-10-12 Thread Kant Kodali
Hi All,

I understand Cassandra can have a maximum of 2B columns (cells) per
partition, but in practice some people seem to suggest the magic number is
100K. Why not create another partition/rowkey automatically (whenever we
reach a safe limit that we consider efficient), with an auto-incrementing
bigint appended as a suffix to the new rowkey, so that the driver can
return the new rowkey, indicating that there is a new partition, and so on?
Now I understand this would involve allowing partial row-key searches,
which Cassandra currently doesn't do (but I believe HBase does), and
thinking about token ranges and potentially many other things...

My current problem is this:

I have a row key followed by a bunch of columns (this is not time-series
data), and these columns can grow to any number. So, since I have a 100K
limit (or whatever the number is; say, some limit), I want to break the
partition into levels/pages:

rowkey1, page1->col1, col2, col3..
rowkey1, page2->col1, col2, col3..

Now say my Cassandra DB is populated with data, my application has just
booted up, and I want the most recent value of a certain partition, but I
don't know which page it belongs to. How do I solve this in the most
efficient way possible in Cassandra today? I understand I can create MVs or
other tables that hold auxiliary data, such as the number of pages per
partition, and so on, but that involves maintaining that other table, which
I really cannot afford because I already have MVs and secondary indexes for
other good reasons. So it would be great if someone could explain the best
way possible as of today with Cassandra. By the best way I mean: is it
possible with one request? If yes, then how? If not, what is the next best
way to solve this?

Thanks,
kant


Run sstablescrub in parallel

2016-10-12 Thread Oleg Krayushkin
Hello,

Is there any way to run sstablescrub on one CF in parallel?

Thanks!

--

Oleg Krayushkin