Re: Limit on having number of nodes in C* cluster

2017-08-22 Thread Vladimir Yudovin
Perhaps decreasing the number of tokens can help to manage a big cluster?
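For example (the value is illustrative, not a recommendation), new nodes could be 
brought up with fewer vnodes by lowering num_tokens in cassandra.yaml:

num_tokens: 16

Fewer tokens per node can reduce the token-range bookkeeping and repair overhead 
that grow with cluster size.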



Best regards, Vladimir Yudovin, 

Winguzone - Cloud Cassandra Hosting






 On Mon, 21 Aug 2017 19:38:37 -0400 Eduard Tudenhoefner 
<eduard.tudenhoef...@datastax.com> wrote 




We've been doing successful testing with multi-DC setups and 500 nodes per DC. 
However, I agree with Jon here. Certain things are easier/faster with, e.g., 
five 100-node clusters than one 500-node cluster.



Cheers



On Mon, Aug 21, 2017 at 10:16 AM, Jon Haddad <jonathan.had...@gmail.com> 
wrote:

As far as I know, those 75K nodes are not in a single cluster.  If memory 
serves correctly (and this article seems to indicate that it does 
http://www.techrepublic.com/article/apples-secret-nosql-sauce-includes-a-hefty-dose-of-cassandra/),
 you’ll see clusters of 1,000 nodes.  



Things start to get a little hairy once you go above a couple hundred nodes.  I 
would rather run five 100-node clusters than a single 500-node cluster.  In 
theory, once you've built out the tooling to manage 2 clusters you should be 
able to apply it to manage 20 (reality always gets in the way, though...)



Jon



On Aug 21, 2017, at 9:15 AM, techpyaasa . <techpya...@gmail.com> wrote:



Thanks a lot for the reply :)



On Aug 21, 2017 6:44 PM, "Vladimir Yudovin" <vla...@winguzone.com> wrote:



Actually there are clusters of thousands of nodes. Some of the largest 
production deployments include Apple's, with over 75,000 nodes storing over 10 
PB of data.



Best regards, Vladimir Yudovin, 

Winguzone - Cloud Cassandra Hosting






 On Mon, 21 Aug 2017 08:35:37 -0400 techpyaasa . 
<techpya...@gmail.com> wrote 




Hi 



Is there any limit on the number of nodes in a C* cluster?

Right now we have a C* 2.1.17 cluster with 3 DCs; each DC has 3 groups, and each 
group has 21 nodes.



We wanted to increase the cluster capacity by adding 6 nodes per group, as many 
of the nodes' disk usage has crossed 65%.



So I just wanted to clarify: is there any limit/drawback to having a huge cluster 
or too many nodes in a C* cluster?



Thanks in advance

TechPyaasa



Re: Limit on having number of nodes in C* cluster

2017-08-21 Thread Vladimir Yudovin
Actually there are clusters of thousands of nodes. Some of the largest 
production deployments include Apple's, with over 75,000 nodes storing over 10 
PB of data.



Best regards, Vladimir Yudovin, 

Winguzone - Cloud Cassandra Hosting






 On Mon, 21 Aug 2017 08:35:37 -0400 techpyaasa . 
<techpya...@gmail.com> wrote 




Hi 



Is there any limit on the number of nodes in a C* cluster?

Right now we have a C* 2.1.17 cluster with 3 DCs; each DC has 3 groups, and each 
group has 21 nodes.



We wanted to increase the cluster capacity by adding 6 nodes per group, as many 
of the nodes' disk usage has crossed 65%.



So I just wanted to clarify: is there any limit/drawback to having a huge cluster 
or too many nodes in a C* cluster?



Thanks in advance

TechPyaasa










SASI index returns no results

2017-08-15 Thread Vladimir Yudovin
Hi,



I recently encountered a strange issue.

Assuming there is a table:



id PRIMARY KEY

indexed text

column text



CREATE custom index on table(indexed) using '...SASIIndex'
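A concrete, runnable version of this setup (keyspace, table, and index names 
assumed for illustration) would be something like:

CREATE TABLE ks.t (
    id int PRIMARY KEY,
    indexed text,
    column text
);

CREATE CUSTOM INDEX t_indexed_idx ON ks.t (indexed)
    USING 'org.apache.cassandra.index.sasi.SASIIndex';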



I inserted a row like id=0, indexed='string1', column='just string'.



When I did SELECT * FROM table WHERE id=0 AND indexed='string1', no rows were 
returned, while SELECT * FROM table WHERE id=0 AND column='just string' ALLOW 
FILTERING returned the row.



In reality this table consists of about 70 columns and 20M rows, probably 30G total 
on disk, with RF=2 and three nodes. I guess this issue is somehow linked to memory 
usage: as I ran some other heavy queries, the cluster became unresponsive to cqlsh 
and nodetool. I saw GC messages in gc.log, so I restarted the cluster and everything 
returned to normal.



I would expect the whole query to fail with a timeout or some error message, but 
not to silently return zero rows.

Any thoughts?



Best regards, Vladimir Yudovin, 

Winguzone - Cloud Cassandra Hosting







Re: Help in c* Data modelling

2017-07-23 Thread Vladimir Yudovin
Hi,



unfortunately, ORDER BY is supported for clustering columns only...

Best regards, Vladimir Yudovin, 

Winguzone - Cloud Cassandra Hosting






 On Sun, 23 Jul 2017 12:49:36 -0400 techpyaasa . 
 wrote 




Hi Varun,



Thanks a lot for your reply.



In this case, if I want to update status (status can be updated for a given 
account_id, pid), I need to delete the existing row in the 2nd table and add a new 
one...  :( :(



It's like hitting Cassandra twice for 1 change... :(



 





On Sun, Jul 23, 2017 at 8:42 PM, Varun Barala  
wrote:

Hi,


You can create a pseudo-index table.




IMO, the structure can be:





CREATE TABLE IF NOT EXISTS test.user (
    account_id bigint,
    pid bigint,
    disp_name text,
    status int,
    PRIMARY KEY (account_id, pid)
) WITH CLUSTERING ORDER BY (pid ASC);

CREATE TABLE IF NOT EXISTS test.user_index (
    account_id bigint,
    pid bigint,
    disp_name text,
    status int,
    PRIMARY KEY ((account_id, status), disp_name)
) WITH CLUSTERING ORDER BY (disp_name ASC);

to support the query: select * from site24x7.wm_current_status where uid=1 order 
by dispName asc;


You can use an IN condition on status, the last partition-key column, in table test.user_index (see the sketch below).
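For example, a sketch against the test.user_index table above (values assumed):

SELECT * FROM test.user_index WHERE account_id = 1 AND status = 0;
-- rows come back ordered by disp_name, the clustering column

SELECT * FROM test.user_index WHERE account_id = 1 AND status IN (0, 1);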


It depends on your use case and amount of data as well. It can be optimized 
more...


Thanks!!




On Sun, Jul 23, 2017 at 2:48 AM, techpyaasa .  
wrote:

Hi ,



We have a table like below :



CREATE TABLE ks.cf (
    accountId bigint,
    pid bigint,
    dispName text,
    status int,
    PRIMARY KEY (accountId, pid)
) WITH CLUSTERING ORDER BY (pid ASC);




We would like to have following queries possible on the above table:



select * from site24x7.wm_current_status where uid=1 and mid=1;

select * from site24x7.wm_current_status where uid=1 order by dispName asc;

select * from site24x7.wm_current_status where uid=1 and status=0 order by 
dispName asc;




I know the first query is possible by default, but I want the last 2 queries to 
work as well.



So can someone please let me know how I can achieve the same in 
Cassandra (C* 2.1.17)? I'm OK with applying indexes, etc.



Thanks
TechPyaasa















Re: secondary index use case

2017-07-20 Thread Vladimir Yudovin
Hi,



You didn't mention your C* version, but starting from 3.4, SASI indexes are 
available. You can try one with the SPARSE option, as each uuid corresponds to only 
one row.
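A minimal sketch (keyspace, table, and index names assumed; as far as I recall, 
SPARSE mode applies only to non-literal column types such as uuid):

CREATE CUSTOM INDEX rec_uuid_idx ON ks.records (uuid)
    USING 'org.apache.cassandra.index.sasi.SASIIndex'
    WITH OPTIONS = {'mode': 'SPARSE'};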



Best regards, Vladimir Yudovin, 

Winguzone - Cloud Cassandra Hosting






 On Thu, 20 Jul 2017 05:21:31 -0400 Micha <mich...@fantasymail.de> 
wrote 




Hi, 

 

even after reading much about secondary index usage I'm not sure if I 

have the correct use case for it. 

 

My table will contain about 150'000'000 records (each about 2KB data). 

There are two uuids used to identify a row. One uuid is unique for each 

row, the other uuid is something like a groupid, which gives mostly 20 

records when queried. 

 

So, if I define my primary key as (groupuuid, uuid) then: 

"select * ... where groupuuid = X" gives me 0 - 20 rows 

 

"select * ... where groupuuid = X and uuid = Y" gives me 0 | 1 row 

 

now, sometimes I want to query only with uuid: 

"select * ... where uuid = X" to get exactly one row (without using 

groupid) 

 

Is this a good use case for a secondary index on uuid? 

 

 

Thanks for helping, 

 Michael 

 

 

 

 

 

 


 








Re: Stable cassandra version with frozen UDTs

2017-06-26 Thread Vladimir Yudovin
So according to what Michael wrote, 3.11.x will be the stable branch now.

You can stick with it or revert to 3.0.x (latest is 3.0.14) if you don't need 
3.x features.



Best regards, Vladimir Yudovin, 

Winguzone - Cloud Cassandra Hosting






 On Mon, 26 Jun 2017 13:51:32 -0400 Ali Akhtar <ali.rac...@gmail.com> 
wrote 




So, which cassandra version is the most stable / production ready currently? 
I'm fine with reverting to 2.x if needed.



On Mon, Jun 26, 2017 at 8:37 PM, Michael Shuler <mich...@pbandjelly.org> 
wrote:

On 06/26/2017 10:17 AM, Vladimir Yudovin wrote:
 >
 > In terms of tick-tock releases odd releases (e.g. 3.11) are bug fixes.
 
 The last tick-tock feature release was 3.10. Tick-tock releases are no
 more. The project has moved back to development on stable release series.

 

 The bug-fixes on top of the features developed during the tick-tock

 cycle will live on as the ongoing 3.11 release series. 3.11.0 was *just*

 released and announced here on the user@ list. The cassandra-3.11 branch

 will get ongoing 3.11.x releases as a stable production series.

 

 Y'all know where to post bugs[0] while testing out and deploying this

 new Apache Cassandra release series. :)

 
 --
 Warm regards,
 Michael
 
 [0] https://issues.apache.org/jira/browse/CASSANDRA
 
 









Re: Stable cassandra version with frozen UDTs

2017-06-26 Thread Vladimir Yudovin


Is that version going to be stable for production?

Well, the $1M question )). There is a lot of discussion on this topic.

In terms of tick-tock releases, odd releases (e.g. 3.11) are bug-fix releases. But I 
guess the latest 3.0.x (3.0.14 now) should be more stable.



Best regards, Vladimir Yudovin, 

Winguzone - Cloud Cassandra Hosting






 On Mon, 26 Jun 2017 07:09:08 -0400 Ali Akhtar <ali.rac...@gmail.com> 
wrote 




Is that version going to be stable for production? I'm looking for something 
that I can just install, add nodes when needed, but otherwise not have to worry 
or think about, even if it means downgrading to a lower version and rewriting 
some of the code involving UDTs.



On Mon, Jun 26, 2017 at 3:51 PM, Vladimir Yudovin <vla...@winguzone.com> 
wrote:








Latest comment in this JIRA is "I've committed to 3.11". 3.11 change log also 
contains "* Fix validation of non-frozen UDT cells (CASSANDRA-12916)" (merged 
from 3.10)

So try version 3.11



Best regards, Vladimir Yudovin, 

Winguzone - Cloud Cassandra Hosting






 On Thu, 22 Jun 2017 10:17:15 -0400 Ali Akhtar <ali.rac...@gmail.com> 
wrote 




I'm running Cassandra 3.9, but it doesn't seem stable. E.g., one of my nodes 
recently crashed with the message 



'org.apache.cassandra.db.commitlog.CommitLogReadHandler$CommitLogReadException: 
Unexpected error deserializing mutation; saved to 
/tmp/mutation3976606415170694683dat.  This may be caused by replaying a 
mutation against a table with the same name but incompatible schema.  Exception 
follows: org.apache.cassandra.serializers.MarshalException: Not enough bytes to 
read 0th field board_id'



It looks like this particular bug is fixed in 3.10: 
https://issues.apache.org/jira/browse/CASSANDRA-12916



Is there a stable version with support for frozen UDTs that I should use? If 
not, should I change my UDT code to use text, and revert to a 2.x version which 
is stable? I'm still in development, so it will be a pain, but I can revert to 
non frozen UDTs.
















Re: Question: Behavior of inserting a list multiple times with same timestamp

2017-06-26 Thread Vladimir Yudovin
Hi,

INSERT INTO test.test (k , v ) VALUES ( 1 ,[10]) USING TIMESTAMP 1000;

INSERT INTO test.test (k , v ) VALUES ( 1 ,[20]) USING TIMESTAMP 1000;

INSERT INTO test.test (k , v ) VALUES ( 1 ,[30]) USING TIMESTAMP 1000;

SELECT * FROM test.test ;



cqlsh> SELECT * FROM test.test ;



k | v

---+-

1 | [3]

// = WHY ?? =




TIMESTAMP is measured in microseconds since the epoch, so USING TIMESTAMP 1000 is 
like inserting at some point on 1970-01-01; thus it has no effect after an insert 
without a timestamp, which got the current time.
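For scale (the value below is an arbitrary mid-2017 moment, assumed for 
illustration), an insert stamped with a realistic "now" would look like:

INSERT INTO test.test (k, v) VALUES (1, [10]) USING TIMESTAMP 1497916800000000;

i.e. about 1.5 * 10^15 microseconds since 1970, so a bare TIMESTAMP 1000 will 
always lose to writes stamped with the current time.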



Regarding your original question: it really looks strange; maybe you should 
file a JIRA about this.



Best regards, Vladimir Yudovin, 

Winguzone - Cloud Cassandra Hosting






 On Tue, 20 Jun 2017 10:19:08 -0400 Thakrar, Jayesh 
<jthak...@conversantmedia.com> wrote 




Ok, tried the test again, w/o the TIMESTAMP, and got the expected behavior.

Apparently, the INSERT does replace the entire list if no timestamp is 
specified (as expected).

However, if the TIMESTAMP is specified, then it does (what appears to be) an 
append.

But I found an even weirder issue - see later below!

 

===

 

cqlsh> CREATE TABLE test.test (k int PRIMARY KEY , v list<int>);   
   

cqlsh> INSERT INTO test.test (k , v ) VALUES ( 1 ,[1]) ; 

cqlsh> INSERT INTO test.test (k , v ) VALUES ( 1 ,[1]) ;

cqlsh> INSERT INTO test.test (k , v ) VALUES ( 1 ,[1]) ;

cqlsh> SELECT * FROM test.test ;

 

 k | v

---+-

 1 | [1]

 

(1 rows)

cqlsh> 

 

===

 

 

DROP TABLE IF EXISTS test.test ;

CREATE TABLE test.test (k int PRIMARY KEY , v list<int>);

INSERT INTO test.test (k , v ) VALUES ( 1 ,[1]) ;

INSERT INTO test.test (k , v ) VALUES ( 1 ,[2]) ;

INSERT INTO test.test (k , v ) VALUES ( 1 ,[3]) ;

SELECT * FROM test.test ;

 

cqlsh> SELECT * FROM test.test ;

 

 k | v

---+-

 1 | [3]

 

// = EXPECTED RESULT =

 

INSERT INTO test.test (k , v ) VALUES ( 1 ,[10]) USING TIMESTAMP 1000;

INSERT INTO test.test (k , v ) VALUES ( 1 ,[20]) USING TIMESTAMP 1000;

INSERT INTO test.test (k , v ) VALUES ( 1 ,[30]) USING TIMESTAMP 1000;

SELECT * FROM test.test ;

 

cqlsh> SELECT * FROM test.test ;

 

 k | v

---+-

 1 | [3]

 

// = WHY ?? =

 

 

DROP TABLE IF EXISTS test.test ;

CREATE TABLE test.test (k int PRIMARY KEY , v list<int>);

INSERT INTO test.test (k , v ) VALUES ( 1 ,[10]) USING TIMESTAMP 1000;

INSERT INTO test.test (k , v ) VALUES ( 1 ,[20]) USING TIMESTAMP 1000;

INSERT INTO test.test (k , v ) VALUES ( 1 ,[30]) USING TIMESTAMP 1000;

SELECT * FROM test.test ;

 

cqlsh> SELECT * FROM test.test ;

 

 k | v

---+--

 1 | [10, 20, 30]

 

// = WHY ?? Probably the server-timestamp-uuid playing a role?! =

 

INSERT INTO test.test (k , v ) VALUES ( 1 ,[1]) ;

INSERT INTO test.test (k , v ) VALUES ( 1 ,[2]) ;

INSERT INTO test.test (k , v ) VALUES ( 1 ,[3]) ;

SELECT * FROM test.test ;

 

cqlsh> SELECT * FROM test.test ;

 

 k | v

---+-

 1 | [3]

 

// = EXPECTED RESULT =

 

 

 

 

 

From:  Subroto Barua <sbarua...@yahoo.com>
 Date: Monday, June 19, 2017 at 11:09 PM
 To: "Thakrar, Jayesh" <jthak...@conversantmedia.com>, Subroto Barua 
<sbarua...@yahoo.com.INVALID>, Zhongxiang Zheng 
<zzh...@yahoo-corp.jp>
 Cc: "user@cassandra.apache.org" <user@cassandra.apache.org>
 Subject: Re: Question: Behavior of inserting a list multiple times with same 
timestamp

 


here is the response from Datastax support/dev:



 


 


In a list, each item is its own cell. Append adds a new cell sorted at basically 
"current server time uuid"; prepend adds at "-current server time uuid". 
User-supplied timestamps are used for the cell timestamp when specified.

Inserting the entire list deletes and then inserts

Reading reads out the entire list

Positional access reads the entire list and gets/puts at the spot specified



 


Basically, lists are not idempotent


 


 


On Monday, June 19, 2017, 6:55:40 AM PDT, Thakrar, Jayesh 
<jthak...@conversantmedia.com> wrote:


 


 


Subroto,
 
 Cassandra docs say otherwise.
 
 Writing list data is accomplished with a JSON-style syntax. To write a record 
using INSERT, specify the entire list as a JSON array. Note: An INSERT will 
always replace the entire list.
 
 Maybe you can elaborate/shed some more light?
 
 Thanks,
 Jayesh
 
 
 Lists
 
 A list is a typed collection of non-unique values where elements are ordered 
by their position in the list. To create a column of type list, use the list 
keyword suffixed with the value type enclosed in angle brackets. For example:
 
 CREATE TABLE plays (
 id text PRIMARY KEY,
 game text,
 players int,
 scores list<int>
 )
 Do note that as

Re: Stable cassandra version with frozen UDTs

2017-06-26 Thread Vladimir Yudovin
Latest comment in this JIRA is "I've committed to 3.11". 3.11 change log also 
contains "* Fix validation of non-frozen UDT cells (CASSANDRA-12916)" (merged 
from 3.10)

So try version 3.11



Best regards, Vladimir Yudovin, 

Winguzone - Cloud Cassandra Hosting






 On Thu, 22 Jun 2017 10:17:15 -0400 Ali Akhtar <ali.rac...@gmail.com> 
wrote 




I'm running Cassandra 3.9, but it doesn't seem stable. E.g., one of my nodes 
recently crashed with the message 



'org.apache.cassandra.db.commitlog.CommitLogReadHandler$CommitLogReadException: 
Unexpected error deserializing mutation; saved to 
/tmp/mutation3976606415170694683dat.  This may be caused by replaying a 
mutation against a table with the same name but incompatible schema.  Exception 
follows: org.apache.cassandra.serializers.MarshalException: Not enough bytes to 
read 0th field board_id'



It looks like this particular bug is fixed in 3.10: 
https://issues.apache.org/jira/browse/CASSANDRA-12916



Is there a stable version with support for frozen UDTs that I should use? If 
not, should I change my UDT code to use text, and revert to a 2.x version which 
is stable? I'm still in development, so it will be a pain, but I can revert to 
non frozen UDTs.









Re: Secondary Index

2017-06-25 Thread Vladimir Yudovin
Hi,



beyond the scope of your question (as you use 2.1.17), but starting from v3.4 SASI 
is available; the doc is about DSE, but it is applicable to the free version as 
well. 



Best regards, Vladimir Yudovin, 

Winguzone - Cloud Cassandra Hosting






 On Mon, 19 Jun 2017 14:00:40 -0400 techpyaasa . 
<techpya...@gmail.com> wrote 




Hi,



I want to create an index on an already existing table which has more than 3 GB/node.

We are using C* 2.1.17 with 2 DCs, each DC with 3 groups, and each group has 7 
nodes (42 nodes total in the cluster).



So is it OK to create an index on this table now, or will it cause any problems?

If it's OK, how much time would this process take?





Thanks in advance,

TechPyaasa









Re: Count limit

2017-06-21 Thread Vladimir Yudovin
Hi,



>Somebody said it's because the count returns a 1-row result

He is right.
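LIMIT caps the number of result rows, and an aggregate like COUNT(*) returns 
exactly one row, so the limit has no effect on how many rows are counted. A 
workaround sketch (column name assumed) is to fetch a bounded number of rows and 
count them client-side:

SELECT id FROM product LIMIT 5000;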



Best regards, Vladimir Yudovin, 

Winguzone - Cloud Cassandra Hosting






 On Wed, 21 Jun 2017 02:43:32 -0400 web master 
<socketman2...@gmail.com> wrote 




According to 
http://www.maigfrga.ntweb.co/counting-indexing-and-ordering-cassandra

SELECT COUNT(*) FROM product limit 5000;

must return no more than 5000, but why doesn't it work, and why does it count the whole table?

Somebody said it's because the count returns a 1-row result, and somebody else said 
that it is a bug in the new version of Cassandra.



How can I stop counting and limit it?










Re: Pagination

2017-06-21 Thread Vladimir Yudovin
Hi,

can this help you: 
https://docs.datastax.com/en/developer/java-driver/2.1/manual/paging/ ?
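At the CQL level, a common substitute for OFFSET is token-based paging: pass the 
last key seen instead of an offset (a sketch; table and column names assumed, id 
being the partition key):

SELECT id, name FROM items WHERE token(id) > token(42) LIMIT 20;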



Best regards, Vladimir Yudovin, 

Winguzone - Cloud Cassandra Hosting






 On Wed, 21 Jun 2017 02:44:17 -0400 web master 
<socketman2...@gmail.com> wrote 




I am migrating from MySQL to Cassandra. In MySQL I use OFFSET and LIMIT to 
paginate; the problem is that we have an Android client that requests the next page 
and POSTs OFFSET and LIMIT to the server, so I don't know how I can migrate to 
Cassandra and keep backward compatibility. 

Is there any technique for this problem?









Re: Write / read cost of *QUORUM

2017-06-18 Thread Vladimir Yudovin
Hi,



yes, you are right. Actually a write with QUORUM will cause the coordinator to 
wait for replies from other nodes, but I guess that's negligible.



Best regards, Vladimir Yudovin, 

Winguzone - Cloud Cassandra Hosting






 On Sun, 18 Jun 2017 10:07:06 -0400 Jan Algermissen 
<algermissen1...@icloud.com> wrote 




Hi, 

 

my understanding is that 

 

- for writes using any of the quorum CLs will not put more overall load 

on the cluster because writes will be sent to all nodes responsible for 

a partition anyhow. So quorum only increases response time of the 

coordinator, not cluster load. 

 

Correct? 

 

- for reads all quorum CLs will yield more requests sent by the 

coordinator to other nodes and hence *QUORUM reads definitely increase 

cluster load. (And of course response time of the coordinator, too). 

 

Correct? 

 

Jan 

 


 








Re: Decommissioned nodes show as DOWN in Cassandra version 3.10

2017-06-12 Thread Vladimir Yudovin
Hi,

you can use 

http://docs.datastax.com/en/cassandra/3.0/cassandra/tools/toolsRemoveNode.html



or if this doesn't work ("It is a last resort tool if you cannot successfully 
use nodetool removenode.")

http://docs.datastax.com/en/cassandra/3.0/cassandra/tools/toolsAssassinate.html



Best regards, Vladimir Yudovin, 

Winguzone - Cloud Cassandra Hosting






 On Mon, 12 Jun 2017 15:15:33 -0400 pabbireddy avinash 
<pabbireddyavin...@gmail.com> wrote 




Hi

In Cassandra version 3.10, after we decommission a node or datacenter, we 
observe the decommissioned nodes marked as DOWN in the cluster when we do a 
"nodetool describecluster". The nodes, however, do not show up in the "nodetool 
status" command.
The decommissioned node also does not show up in the "system_peers" table on 
the nodes.
The workaround we follow is a rolling restart of the cluster, which removes the 
decommissioned nodes from the "UNREACHABLE" state and shows the actual state 
of the cluster. The workaround is tedious for huge clusters.



Has anybody in the community observed a similar issue?

Below are the observed logs

2017-06-12 18:23:29,209 [RMI TCP Connection(8)-127.0.0.1] INFO 
StorageService.java:3938 - Announcing that I have left the ring for 3ms
 2017-06-12 18:23:59,210 [RMI TCP Connection(8)-127.0.0.1] INFO 
ThriftServer.java:139 - Stop listening to thrift clients
 2017-06-12 18:23:59,215 [RMI TCP Connection(8)-127.0.0.1] INFO Server.java:176 
- Stop listening for CQL clients
 2017-06-12 18:23:59,216 [RMI TCP Connection(8)-127.0.0.1] WARN 
Gossiper.java:1514 - No local state, state is in silent shutdown, or node 
hasn't joined, not announcing shutdown
 2017-06-12 18:23:59,216 [RMI TCP Connection(8)-127.0.0.1] INFO 
MessagingService.java:964 - Waiting for messaging service to quiesce
 2017-06-12 18:23:59,217 [ACCEPT-/96.115.209.228] INFO 
MessagingService.java:1314 - MessagingService has terminated the accept() thread
 2017-06-12 18:23:59,263 [RMI TCP Connection(8)-127.0.0.1] INFO 
StorageService.java:1435 - DECOMMISSIONED




Regards,
Avinash.








Re: Data in multi disks is not evenly distributed

2017-06-11 Thread Vladimir Yudovin
Hi,



Do your disks have the same size? AFAIK Cassandra distributes data in proportion 
to disk size, i.e. keeps the same percentage of busy space on each disk.



Best regards, Vladimir Yudovin, 

Winguzone - Cloud Cassandra Hosting






 On Wed, 07 Jun 2017 06:15:48 -0400 Xihui He <xihu...@gmail.com> 
wrote 




Dear All,



We are using multiple disks per node and find that the data is not evenly 
distributed (data01 uses 1.1T, but data02 uses 353G). Is this expected? If 
data01 becomes full, would the node still be writable? We are using 2.2.6.



Thanks,

Xihui



data_file_directories:

- /data00/cassandra

- /data01/cassandra

- /data02/cassandra

- /data03/cassandra

- /data04/cassandra




df

/dev/sde1   1.8T  544G  1.2T  32% /data03

/dev/sdc1   1.8T  1.1T  683G  61% /data01

/dev/sdf1   1.8T  491G  1.3T  29% /data04

/dev/sdd1   1.8T  353G  1.4T  21% /data02

/dev/sdb1   1.8T  285G  1.5T  17% /data00





root@n9-016-015:~# du -sh /data01/cassandra/album_media_feature/*

143M 
/data01/cassandra/album_media_feature/media_feature_blur-066e5700c41511e5beacf197ae340934

4.4G 
/data01/cassandra/album_media_feature/media_feature_c1-dbadf930c41411e5974743d3a691d887

56K 
/data01/cassandra/album_media_feature/media_feature_duplicate-09d4b380c41511e58501e9aa37be91a5

16K 
/data01/cassandra/album_media_feature/media_feature_emotion-b8570470054d11e69fb88f073bab8267

240M 
/data01/cassandra/album_media_feature/media_feature_exposure-f55449c0c41411e58f5c9b66773b60c3

649M 
/data01/cassandra/album_media_feature/media_feature_group-f8de0cc0c41411e5827b995f709095c8

22G 
/data01/cassandra/album_media_feature/media_feature_multi_class-cf3bb72006c511e69fb88f073bab8267

44K 
/data01/cassandra/album_media_feature/media_feature_pool5-1185b200c41511e5b7d8757e25e34d67

15G 
/data01/cassandra/album_media_feature/media_feature_poster-fcf45850c41411e597bb1507d1856305

8.0K 
/data01/cassandra/album_media_feature/media_feature_quality-155d9500c41511e5974743d3a691d887

17G 
/data01/cassandra/album_media_feature/media_feature_quality_rc-51babf50dba811e59fb88f073bab8267

8.7G 
/data01/cassandra/album_media_feature/media_feature_scene-008a5050c41511e59ebcc3582d286c8d

8.0K 
/data01/cassandra/album_media_feature/media_region_features_v4-29a0cd10150611e6bd3e3f41faa2612a

971G 
/data01/cassandra/album_media_feature/media_region_features_v5-1b805470a3d711e68121757e9ac51b7b



root@n9-016-015:~# du -sh /data02/cassandra/album_media_feature/*

1.6G 
/data02/cassandra/album_media_feature/media_feature_blur-066e5700c41511e5beacf197ae340934

44G 
/data02/cassandra/album_media_feature/media_feature_c1-dbadf930c41411e5974743d3a691d887

64K 
/data02/cassandra/album_media_feature/media_feature_duplicate-09d4b380c41511e58501e9aa37be91a5

75G 
/data02/cassandra/album_media_feature/media_feature_emotion-b8570470054d11e69fb88f073bab8267

2.0G 
/data02/cassandra/album_media_feature/media_feature_exposure-f55449c0c41411e58f5c9b66773b60c3

21G 
/data02/cassandra/album_media_feature/media_feature_group-f8de0cc0c41411e5827b995f709095c8

336M 
/data02/cassandra/album_media_feature/media_feature_multi_class-cf3bb72006c511e69fb88f073bab8267

44K 
/data02/cassandra/album_media_feature/media_feature_pool5-1185b200c41511e5b7d8757e25e34d67

2.0G 
/data02/cassandra/album_media_feature/media_feature_poster-fcf45850c41411e597bb1507d1856305

8.0K 
/data02/cassandra/album_media_feature/media_feature_quality-155d9500c41511e5974743d3a691d887

17G 
/data02/cassandra/album_media_feature/media_feature_quality_rc-51babf50dba811e59fb88f073bab8267

141M 
/data02/cassandra/album_media_feature/media_feature_scene-008a5050c41511e59ebcc3582d286c8d

8.0K 
/data02/cassandra/album_media_feature/media_region_features_v4-29a0cd10150611e6bd3e3f41faa2612a

93G 
/data02/cassandra/album_media_feature/media_region_features_v5-1b805470a3d711e68121757e9ac51b7b



root@n9-016-015:~# du -sh /data03/cassandra/album_media_feature/*

4.3G 
/data03/cassandra/album_media_feature/media_feature_blur-066e5700c41511e5beacf197ae340934

19G 
/data03/cassandra/album_media_feature/media_feature_c1-dbadf930c41411e5974743d3a691d887

72K 
/data03/cassandra/album_media_feature/media_feature_duplicate-09d4b380c41511e58501e9aa37be91a5

2.8G 
/data03/cassandra/album_media_feature/media_feature_emotion-b8570470054d11e69fb88f073bab8267

105M 
/data03/cassandra/album_media_feature/media_feature_exposure-f55449c0c41411e58f5c9b66773b60c3

15G 
/data03/cassandra/album_media_feature/media_feature_group-f8de0cc0c41411e5827b995f709095c8

23G 
/data03/cassandra/album_media_feature/media_feature_multi_class-cf3bb72006c511e69fb88f073bab8267

44K 
/data03/cassandra/album_media_feature/media_feature_pool5-1185b200c41511e5b7d8757e25e34d67

17G 
/data03/cassandra/album_media_feature/media_feature_poster-fcf45850c41411e597bb1507d1856305

8.0K 
/data03/cassandra/album_media_feature/media_feature_quality-155d9500c41511e5974743d3a691d887

294M 
/data03/cassandra/album_media_f

Re: certificate pinning feature

2017-06-11 Thread Vladimir Yudovin
Hi,



Cassandra uses standard Java API for SSL security, so in general Java options 
are available.



Best regards, Vladimir Yudovin, 

Winguzone - Cloud Cassandra Hosting






 On Fri, 09 Jun 2017 10:13:27 -0400 Victor Ashik 
<ash...@microsoft.com.INVALID> wrote 




Hello,

 

Is it possible to have CA certificates in truststores, but do some kind of 
certificate pinning, i.e. add extra requirements for certificates (matching 
hostname or thumbprint) to be trusted by Cassandra for internode and/or client 
communication?

 

The only way to achieve this I have been able to find so far is to have only 
trusted certificates in the truststores and no CA certificates there at all, but 
this requires changing the truststores and restarting nodes to add new 
certificates.

 

 

--

Regards,

Victor Ashik









Re: Replicated data size

2017-06-11 Thread Vladimir Yudovin
Hi Vasu,



I'm not sure Cassandra can provide this, but actually the whole volume of data 
inserted in one DC should go to the other DC. You can check the network traffic 
with any available tools.



Best regards, Vladimir Yudovin, 

Winguzone - Cloud Cassandra Hosting






 On Sat, 10 Jun 2017 14:18:17 -0400 vasu gunja <vasu.no...@gmail.com> 
wrote 




Hi All,





I have a unique requirement from my management. Here is the details of it

 

Quick idea about my environment:

 we have multi-DC( 2 dc's) setup 10 nodes each.

 keyspace RF of 3 each.





We need to know how much data is replicated across data centers on a per-day basis. 
Is there any way to calculate that? 



any help really appreciated here.





thanks,











Re: Convert single node C* to cluster (rebalancing problem)

2017-06-01 Thread Vladimir Yudovin
Did you run "nodetool cleanup" on the first node after the second was bootstrapped? 
It should clean out rows no longer belonging to the node after the tokens changed.



Best regards, Vladimir Yudovin, 

Winguzone - Cloud Cassandra Hosting






 On Wed, 31 May 2017 03:55:54 -0400 Junaid Nasir <jna...@an10.io> 
wrote 




Cassandra ensures that adding or removing nodes is very easy and that load is 
balanced between nodes when a change is made, but it's not working in my case.

I have a single node C* deployment (with 270 GB of data) and want to load 
balance the data on multiple nodes, I followed this guide 

`nodetool status` shows 2 nodes but the load is not balanced between them:

Datacenter: dc1
===============
Status=Up/Down |/ State=Normal/Leaving/Joining/Moving
--  Address      Load        Tokens  Owns (effective)  Host ID                               Rack
UN  10.128.0.7   270.75 GiB  256     48.6%             1a3f6faa-4376-45a8-9c20-11480ae5664c  rack1
UN  10.128.0.14  414.36 KiB  256     51.4%             66a89fbf-08ba-4b5d-9f10-55d52a199b41  rack1

I also ran 'nodetool repair' on the new node but the result is the same. Any 
pointers would be appreciated :)



conf file of new node

cluster_name: 'cluster1'
  - seeds: "10.128.0.7"
num_tokens: 256
endpoint_snitch: GossipingPropertyFileSnitch
Thanks,

Junaid









Re: Amazon linux upgrade

2017-05-18 Thread Vladimir Yudovin
Hi,

actually, any Linux with an appropriate Java version (Java 8 for Cassandra 3) 
should be suitable (unless you would like to install with deb/rpm and not just a 
tarball).





Best regards, Vladimir Yudovin, 

Winguzone - Cloud Cassandra Hosting






 On Mon, 15 May 2017 14:34:56 -0400 Nitan Kainth <ni...@bamlabs.com> 
wrote 




Hi, 

 

We are planning to upgrade Amazon Linux to 2017.03. Can someone please point us 
to a compatibility matrix for Linux with C*? 

 

We are currently at C* 3.0.10.1443 and Linux: NAME="Amazon Linux AMI” 
VERSION=“2016.03" 

 

Thank you 


 








Re: Will query on PK read entire partition?

2017-04-25 Thread Vladimir Yudovin
Hi,



if you provide the primary key, C* will not scan the whole partition, but will 
use a Bloom filter to determine the SSTable:

Cassandra uses Bloom filters to determine whether an SSTable has data for a 
particular row. Bloom filters are unused for range scans, but are used for 
index scans.






Best regards, Vladimir Yudovin, 

Winguzone - Cloud Cassandra Hosting






 On Fri, 21 Apr 2017 07:56:08 -0400 Alain RODRIGUEZ 
<arodr...@gmail.com> wrote 




Hi Oskar,



My guess (wait for confirmation maybe): when you read from a primary key + 
specific clustering key (or range of clustering keys), Apache Cassandra will 
look for these specific values and not read the whole row. Yet it is important to 
know that a minimal block size of 64 KB is read from the disk (not configurable 
in C* 2.0). Or, if the table is compressed, the minimal read size is a chunk, 
for which you can manually set the size. That's why, when using small rows, it 
is sometimes interesting to enable compression even if you don't care about 
the data size... This has all been improved a bit in 2.1 / 2.2 and greatly in 
C* 3.0+.
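For instance, setting the chunk size by hand would look something like this (a 
sketch using the 2.0/2.1-era property names; the chunk size is illustrative):

ALTER TABLE devices
    WITH compression = {'sstable_compression': 'LZ4Compressor', 'chunk_length_kb': 4};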



I might write a post about this, if I do, I will let you know. It's an 
interesting topic I have been working on recently.



C*heers,

---

Alain Rodriguez - @arodream - al...@thelastpickle.com

France



The Last Pickle - Apache Cassandra Consulting

http://www.thelastpickle.com










2017-04-21 10:44 GMT+02:00 Oskar Kjellin <oskar.kjel...@gmail.com>:

If I have a table like this:



PRIMARY KEY ((userid),deviceid)



And I query

SELECT * FROM devices where userid= ? and deviceid = ?



Will Cassandra read the entire partition for the userid? So if I have lots of 
tombstones for the userid, will they get scanned?



I guess this depends on how the Bloom filter is working. Does it contain the 
partitioning key or the primary key?



We're using 2.0.17 if it matters.



/Oskar











Re: Cassandra Cluster Doubts

2017-04-25 Thread Vladimir Yudovin
Hi Luis,



I don't think it's possible to achieve this with a custom Snitch. As far as I 
understand, the Snitch only provides cluster topology, and connectivity is handled 
by another component/layer. And each cluster node should be able to connect to 
every other node. So I would go with Michael's option a) - "establish network 
communication for the entire cluster". 



Best regards, Vladimir Yudovin, 

Winguzone - Cloud Cassandra Hosting






 On Fri, 21 Apr 2017 15:42:17 -0400 Luis Miguel <arb...@hotmail.com> 
wrote 




Hi Michael! 

 

Thanks for your answer. I feared that would be the answer... do you know if 
implementing my own Snitch would make it possible to handle this situation? 

 

From: Michael Shuler <mshu...@pbandjelly.org> on behalf of Michael Shuler 
<mich...@pbandjelly.org> 

Sent: Friday, 21 April 2017 19:16:43 

To: user@cassandra.apache.org 

Subject: Re: Cassandra Cluster Doubts 

 

You have one cluster that is comprised of N nodes that may be 

distributed in racks and data centers. All the nodes of your cluster 

need to be able to communicate - they are one cluster. 

 

I think your options would be to a) establish network communication for 

the entire cluster, or b) set up a new cluster for DCR2 and sync data 

snapshots of Keyspace2 in some manner, or c) figure out a second cluster 

that contains the data centers that do have network connectivity and 

adjust application to query the appropriate cluster. There may be some 

other creative ideas that pop up. 

 

-- 

Kind regards, 

Michael 

 

On 04/21/2017 07:26 AM, Luis Miguel wrote: 

> Hello! 

> 

> 

> I have three DC: 

> 

> DC1 -> 3 nodes, Keyspace1:3 

> DC2 -> 3 nodes, Keyspace2:3 

> DCR1 -> 3 nodes, Keyspace1:2, Keyspace2:2 

> 

> now I am trying to add a new datacenter to the cluster: 

> 

> DCR2 -> 1 node (for now), Keyspace2:1, whose network configuration can 

> access DC2 and DCR1, but it will never have access to DC1. 

> 

> when I try to start the node in DCR2, it does everything right with 

> Keyspace2...but Gossips DCR1 and DC1... and crashes with 

> RuntimeException because it can't move data consistently from DC1 nodes 

> (obviously I don't have network connection to those nodes from this 

> datacenter)... 

> when I try to use -Dcassandra.consistent.rangemovement= false 

> option ...It also crashes with IllegalStateException: unable to find 

> sufficient sources for streaming range..etc..etc.. 

> 

> Is it possible to have that kind of topology in Cassandra? I mean... can 

> I have a cluster where some datacenters will never "connect" to other 

> datacenters? 

> 

> Thanks in advance!!! 

 








Re: Can we get username and timestamp in cqlsh_history?

2017-04-01 Thread Vladimir Yudovin
Hi anuja,



I don't think there is a way to do this without creating a custom Cassandra 
build.

There are mutation logs, and somewhere on the list there was a thread about 
parsing them, but I'm not sure that's what you need.



Best regards, Vladimir Yudovin, 

Winguzone - Cloud Cassandra Hosting






 On Wed, 29 Mar 2017 07:37:15 -0400 anuja jain <anujaja...@gmail.com> 
wrote 




Hi,

I have a Cassandra cluster with a lot of keyspaces and users. I want to get 
the history of CQL commands along with the username and the time at which each 
command was run.

Also, if we are running some commands from GUI tools like DevCenter or DBeaver, 
can we log those commands too? If yes, how?



Thanks,

Anuja









Re: nodes are always out of sync

2017-04-01 Thread Vladimir Yudovin
Hi,



did you try to read the data with consistency ALL immediately after a write with 
consistency ONE? Does it succeed?
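In cqlsh that would look something like (keyspace and key column assumed):

CONSISTENCY ALL;
SELECT * FROM ks.ad_event_history WHERE id = 1;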



Best regards, Vladimir Yudovin, 

Winguzone - Cloud Cassandra Hosting






 On Thu, 30 Mar 2017 04:22:28 -0400 Roland Otta 
<roland.o...@willhaben.at> wrote 




hi, 

 

we see the following behaviour in our environment: 

 

cluster consists of 6 nodes (cassandra version 3.0.7). keyspace has a 

replication factor 3. 

clients are writing data to the keyspace with consistency one. 

 

we are doing parallel, incremental repairs with cassandra reaper. 

 

even if a repair just finished and we are starting a new one 

immediately, we can see the following entries in our logs: 

 

INFO  [RepairJobTask:1] 2017-03-30 10:14:00,782 SyncTask.java:73 - 

[repair #d0f651f6-1520-11e7-a443-d9f5b942818e] Endpoints /192.168.0.188 

and /192.168.0.191 have 1 range(s) out of sync for ad_event_history 

INFO  [RepairJobTask:2] 2017-03-30 10:14:00,782 SyncTask.java:73 - 

[repair #d0f651f6-1520-11e7-a443-d9f5b942818e] Endpoints /192.168.0.188 

and /192.168.0.189 have 1 range(s) out of sync for ad_event_history 

INFO  [RepairJobTask:4] 2017-03-30 10:14:00,782 SyncTask.java:73 - 

[repair #d0f651f6-1520-11e7-a443-d9f5b942818e] Endpoints /192.168.0.189 

and /192.168.0.191 have 1 range(s) out of sync for ad_event_history 

INFO  [RepairJobTask:2] 2017-03-30 10:14:03,997 SyncTask.java:73 - 

[repair #d0fa70a1-1520-11e7-a443-d9f5b942818e] Endpoints /192.168.0.26 

and /192.168.0.189 have 2 range(s) out of sync for ad_event_history 

INFO  [RepairJobTask:1] 2017-03-30 10:14:03,997 SyncTask.java:73 - 

[repair #d0fa70a1-1520-11e7-a443-d9f5b942818e] Endpoints /192.168.0.26 

and /192.168.0.191 have 2 range(s) out of sync for ad_event_history 

INFO  [RepairJobTask:4] 2017-03-30 10:14:03,997 SyncTask.java:73 - 

[repair #d0fa70a1-1520-11e7-a443-d9f5b942818e] Endpoints /192.168.0.189 

and /192.168.0.191 have 2 range(s) out of sync for ad_event_history 

INFO  [RepairJobTask:1] 2017-03-30 10:14:05,375 SyncTask.java:73 - 

[repair #d0fbd033-1520-11e7-a443-d9f5b942818e] Endpoints /192.168.0.189 

and /192.168.0.191 have 1 range(s) out of sync for ad_event_history 

INFO  [RepairJobTask:2] 2017-03-30 10:14:05,375 SyncTask.java:73 - 

[repair #d0fbd033-1520-11e7-a443-d9f5b942818e] Endpoints /192.168.0.189 

and /192.168.0.190 have 1 range(s) out of sync for ad_event_history 

INFO  [RepairJobTask:4] 2017-03-30 10:14:05,375 SyncTask.java:73 - 

[repair #d0fbd033-1520-11e7-a443-d9f5b942818e] Endpoints /192.168.0.190 

and /192.168.0.191 have 1 range(s) out of sync for ad_event_history 

 

we can't see any hints on the systems... so we thought everything was 

running smoothly with the writes. 

 

do we have to be concerned about the nodes always being out of sync, or 

is this normal behaviour for a write-intensive table (as the tables 

will never be 100% in sync for the latest inserts)? 

 

bg, 

roland 

 

 








Re: Weird error: InvalidQueryException: unconfigured table table2

2017-03-27 Thread Vladimir Yudovin
>Just wish that an error like:   "Table x not found in keyspace y"

You are welcome to open a JIRA with type Improvement.





Best regards, Vladimir Yudovin, 

Winguzone - Cloud Cassandra Hosting






 On Sun, 26 Mar 2017 13:31:33 -0400 S G <sg.online.em...@gmail.com> 
wrote 




Thanks, got it working now :)



Just wish that an error like:

   "Table x not found in keyspace y"

would have been much better than:

   "Table x not configured".







On Sat, Mar 25, 2017 at 6:13 AM, Arvydas Jonusonis 
<arvydas.jonuso...@gmail.com> wrote:






Make sure to prefix the table with the keyspace. 
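For example (keyspace name assumed):

SELECT * FROM mykeyspace.table2 WHERE id1 = ? AND id = ?;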

On Sat, Mar 25, 2017 at 13:28 Anuj Wadehra <anujw_2...@yahoo.co.in> wrote:

Ensure that all the nodes are on the same schema version, so that the table2 schema 
is replicated properly on all the nodes.



Thanks

Anuj



Sent from Yahoo Mail on Android






On Sat, Mar 25, 2017 at 3:19 AM, S G

<sg.online.em...@gmail.com> wrote:



Hi,



I have a keyspace with two tables.



I run a different query for each table:



Table 1:

  Select * from table1 where id = ?



Table 2:

  Select * from table2 where id1 = ? and id = ?





My code using datastax fires above two queries one after the other.

While it never fails for table 1, it never succeeds for table 2

And gives an error:





com.datastax.driver.core.exceptions.InvalidQueryException: unconfigured table 
table2

at com.datastax.driver.core.Responses$Error.asException(Responses.java:136) 

at 
com.datastax.driver.core.DefaultResultSetFuture.onSet(DefaultResultSetFuture.java:179)
 

at 
com.datastax.driver.core.RequestHandler.setFinalResult(RequestHandler.java:177) 

at com.datastax.driver.core.RequestHandler.access$2500(RequestHandler.java:46) 

at 
com.datastax.driver.core.RequestHandler$SpeculativeExecution.setFinalResult(RequestHandler.java:799)
 

at 
com.datastax.driver.core.RequestHandler$SpeculativeExecution.onSet(RequestHandler.java:633)
 

at 
com.datastax.driver.core.Connection$Dispatcher.channelRead0(Connection.java:1070)
 

at 
com.datastax.driver.core.Connection$Dispatcher.channelRead0(Connection.java:993)
 

at 
io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
 

at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
 

at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:328)
 

at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:321)
 

at 
io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:266)
 

at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
 

at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:328)
 

at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:321)
 

at 
io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102)
 

at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
 

at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:328)
 

at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:321)
 

at 
io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:293)
 

at 
io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:267)
 

at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
 

at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:328)
 

at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:321)
 

at 
io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1280)
 

at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
 

at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:328)
 

at 
io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:890)
 

at 
io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
 

at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:564) 

at 
io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:505)
 

at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:419) 

at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:391) 




Any idea what might be wrong?



I have confirmed that all table-names and columns names are lowercase.

Datastax java version tried : 3.1.2  and 3.1.4

Cassandra version: 3.10





Thanks

SG


















Re: Weird error: InvalidQueryException: unconfigured table table2

2017-03-24 Thread Vladimir Yudovin
>Wish the error message were a little more helpful.

Actually "unconfigured" means "does not exist", "was never created".





Best regards, Vladimir Yudovin, 

Winguzone - Cloud Cassandra Hosting






 On Fri, 24 Mar 2017 21:39:45 -0400 S G <sg.online.em...@gmail.com> 
wrote 




Ah, the keyspace for table2 was somehow getting hardcoded to a wrong keyspace.



Wish the error message were a little more helpful.






On Fri, Mar 24, 2017 at 2:48 PM, S G <sg.online.em...@gmail.com> wrote:






Hi,



I have a keyspace with two tables.



I run a different query for each table:



Table 1:

  Select * from table1 where id = ?



Table 2:

  Select * from table2 where id1 = ? and id = ?





My code using datastax fires above two queries one after the other.

While it never fails for table 1, it never succeeds for table 2

And gives an error:





com.datastax.driver.core.exceptions.InvalidQueryException: unconfigured table 
table2

at com.datastax.driver.core.Responses$Error.asException(Responses.java:136) 

at 
com.datastax.driver.core.DefaultResultSetFuture.onSet(DefaultResultSetFuture.java:179)
 

at 
com.datastax.driver.core.RequestHandler.setFinalResult(RequestHandler.java:177) 

at com.datastax.driver.core.RequestHandler.access$2500(RequestHandler.java:46) 

at 
com.datastax.driver.core.RequestHandler$SpeculativeExecution.setFinalResult(RequestHandler.java:799)
 

at 
com.datastax.driver.core.RequestHandler$SpeculativeExecution.onSet(RequestHandler.java:633)
 

at 
com.datastax.driver.core.Connection$Dispatcher.channelRead0(Connection.java:1070)
 

at 
com.datastax.driver.core.Connection$Dispatcher.channelRead0(Connection.java:993)
 

at 
io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
 

at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
 

at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:328)
 

at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:321)
 

at 
io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:266)
 

at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
 

at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:328)
 

at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:321)
 

at 
io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102)
 

at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
 

at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:328)
 

at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:321)
 

at 
io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:293)
 

at 
io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:267)
 

at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
 

at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:328)
 

at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:321)
 

at 
io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1280)
 

at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
 

at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:328)
 

at 
io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:890)
 

at 
io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
 

at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:564) 

at 
io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:505)
 

at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:419) 

at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:391) 




Any idea what might be wrong?



I have confirmed that all table-names and columns names are lowercase.

Datastax java version tried : 3.1.2  and 3.1.4

Cassandra version: 3.10





Thanks

SG












Re: Altering of types is not allowed

2017-03-24 Thread Vladimir Yudovin
As the error message says, altering of types is not allowed in Cassandra. It's 
NoSQL, but still not a schemaless database.
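A common workaround (a sketch; the new column name is made up) is to add a column 
of the desired type, migrate the data in the application, and then drop the old 
column:

ALTER TABLE users ADD bio_text text;
-- copy each row's bio into bio_text from the application, then:
ALTER TABLE users DROP bio;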



Best regards, Vladimir Yudovin, 

Winguzone - Cloud Cassandra Hosting






 On Wed, 22 Mar 2017 18:57:18 -0400 Ryan Flynn <ry...@splunk.com> 
wrote 




Hi,

 

I’m fairly new to this and am attempting a pretty basic “change column type” 
example.  I’m receiving an error saying InvalidRequest: Error from server: 
code=2200 [Invalid query] message="Altering of types is not allowed" upon 
attempting to change any column type.

 

For example, from the  DataStax ALTER TABLE CQL Reference Page, (assuming 
already USEing a particular keyspace)

 

CREATE TABLE users (user_name varchar PRIMARY KEY, bio ascii);

 

Then,

 

ALTER TABLE users ALTER bio TYPE text;

 

Results in the error mentioned above.  I saw this JIRA Issue, and it’s the only 
thing I can find related to this problem, although that use case is specific to 
changing time-related types.  Could someone tell me if I’m doing something 
wrong or if this is a known issue?  I originally saw this on version 3.10, then 
tried using 3.0.12, and the issue is present there as well.  And I’m just 
running these via cqlsh, if it makes any difference.

 

Thanks in advance!

 

Ryan Flynn









Re: Using datastax driver, how can I read a non-primitive column as a JSON string?

2017-03-24 Thread Vladimir Yudovin
Hi,



why not use SELECT JSON * FROM ... as described here: 
https://www.datastax.com/dev/blog/whats-new-in-cassandra-2-2-json-support ?
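For example (keyspace, table, and key assumed), each row comes back as a single 
JSON-encoded text column, with UDT values rendered as nested JSON objects:

SELECT JSON * FROM ks.users WHERE id = 1;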



Best regards, Vladimir Yudovin, 

Winguzone - Cloud Cassandra Hosting






 On Thu, 23 Mar 2017 13:08:30 -0400 S G <sg.online.em...@gmail.com> 
wrote 




Hi,



I have several non-primitive columns in my cassandra tables.

Some of them are user-defined-types UDTs.



While querying them through datastax driver, I want to convert such UDTs into 
JSON values.

More specifically, I want to get JSON string for the value object below:

Row row = itr.next();
ColumnDefinitions cds = row.getColumnDefinitions();
cds.asList().forEach((ColumnDefinitions.Definition cd) -> {
    String name = cd.getName();
    Object value = row.getObject(name);  // for a UDT column this is a UDTValue, not JSON
});

I have gone through 
http://docs.datastax.com/en/developer/java-driver/3.1/manual/custom_codecs/

But I do not want to add a codec for every UDT I have.



Can the driver somehow return me direct JSON without explicit meddling with 
codecs and all?



Thanks

SG

















Re: Changed node ID?

2017-03-07 Thread Vladimir Yudovin
Hi,

Why did the host ID change?



probably this node's data folder (at least the system keyspace) was erased. Or the 
nodes changed their IPs; do you use dynamic IPs?



Best regards, Vladimir Yudovin, 

Winguzone - Cloud Cassandra Hosting






 On Mon, 06 Mar 2017 22:44:50 -0500 Joe Olson 
<technol...@nododos.com> wrote 






I have a 9 node cluster I had shut down (cassandra stopped on all nodes, all 
nodes shutdown) that I just tried to start back up. I have done this several 
times successfully. However, on this attempt, one of the nodes failed to join 
the cluster. Upon inspection of /var/log/cassandra/system.log, I found the 
following:



WARN  [GossipStage:1] 2017-03-06 21:06:36,648 TokenMetadata.java:252 - Changing 
/192.168.211.82's host ID from cff3ef25-9a47-4ea4-9519-b85d20bef3ee to 
59f2da9f-0b85-452f-b61a-fa990de53e4b



further down:



ERROR [main] 2017-03-06 21:20:14,718 CassandraDaemon.java:747 - Exception 
encountered during startup

java.lang.RuntimeException: A node with address /192.168.211.82 already exists, 
cancelling join. Use cassandra.replace_address if you want to replace this node.

at 
org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:491)
 ~[apache-cassandra-3.9.0.jar:3.9.0]

at 
org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:778)
 ~[apache-cassandra-3.9.0.jar:3.9.0]

at 
org.apache.cassandra.service.StorageService.initServer(StorageService.java:648) 
~[apache-cassandra-3.9.0.jar:3.9.0]

at 
org.apache.cassandra.service.StorageService.initServer(StorageService.java:548) 
~[apache-cassandra-3.9.0.jar:3.9.0]

at 
org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:385) 
[apache-cassandra-3.9.0.jar:3.9.0]

at 
org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:601) 
[apache-cassandra-3.9.0.jar:3.9.0]

at 
org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:730) 
[apache-cassandra-3.9.0.jar:3.9.0]




nodetool status:



UN  192.168.211.88  2.58 TiB   256  32.0% 
9de2d3ef-5ae1-4c7f-8560-730757a6d1ae  rack1

UN  192.168.211.80  2.26 TiB   256  33.9% 
d83829d3-a1d3-4e6c-b014-7cfe45e22d67  rack1

UN  192.168.211.81  2.91 TiB   256  34.1% 
0cafd24e-d3ed-4e51-b586-0b496835a931  rack1

DN  192.168.211.82  551.45 KiB  256  31.9% 
59f2da9f-0b85-452f-b61a-fa990de53e4b  rack1

UN  192.168.211.83  2.32 TiB   256  32.7% 
db006e31-03fa-486a-8512-f88eb583bd0c  rack1

UN  192.168.211.84  2.54 TiB   256  34.3% 
a9a50a74-2fc2-4866-a03a-ec95a7866183  rack1

UN  192.168.211.85  2.4 TiB256  35.9% 
733e6703-c18f-432f-a787-3731f80ba42d  rack1

UN  192.168.211.86  2.34 TiB   256  32.1% 
0daa06fa-708f-4ff8-a15e-861f1a53113a  rack1

UN  192.168.211.87  4.07 TiB   256  33.1% 
2aa578c6-1332-4b94-81c6-c3ce005a52ef  rack1




My questions:

1. Why did the host ID change?

2. If I modify cassandra-env.sh to include 

JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address=192.168.211.82", will I recover 
the data on the original node? It is still on the node's hard drive. I really 
don't want to have to restream 2.6TB of data onto a "new" node.










Re: Limit on number of keyspaces/tables

2017-03-05 Thread Vladimir Yudovin
From source code and measurement.

Try creating a lot of tables with a small write to each of them and monitor the 
Java heap. Each table takes somewhat more than 1M.



Best regards, Vladimir Yudovin, 

Winguzone - Cloud Cassandra Hosting






 On Sun, 05 Mar 2017 14:40:42 -0500 benjamin roth <brs...@gmail.com> 
wrote 




Why do you think 1 table consumes 1M??



Am 05.03.2017 20:36 schrieb "Vladimir Yudovin" <vla...@winguzone.com>:








Hi,



there is no such hard limit, but each table consumes at least 1M of memory, so 1000 
tables take at least 1G.



Best regards, Vladimir Yudovin, 

Winguzone - Cloud Cassandra Hosting






 On Sun, 05 Mar 2017 05:57:48 -0500 Lata Kannan 
<lata.kan...@oracle.com> wrote 




Hi 



I just wanted to check if there is any known limit to the number of 

keyspaces one can create in a Cassandra cluster? Alternatively is there 

a max on the number of tables that can be created in a cluster? 





-- 

Thanks 

--lata 















Re: Limit on number of keyspaces/tables

2017-03-05 Thread Vladimir Yudovin
Hi,



there is no such hard limit, but each table consumes at least 1M of memory, so 1000 
tables take at least 1G.



Best regards, Vladimir Yudovin, 

Winguzone - Cloud Cassandra Hosting






 On Sun, 05 Mar 2017 05:57:48 -0500 Lata Kannan 
<lata.kan...@oracle.com> wrote 




Hi 

 

I just wanted to check if there is any known limit to the number of 

keyspaces one can create in a Cassandra cluster? Alternatively is there 

a max on the number of tables that can be created in a cluster? 

 

 

-- 

Thanks 

--lata 

 








Re: Backups eating up disk space

2017-02-27 Thread Vladimir Yudovin
Yes, you can. They're just hardlinks to table files, so if some file is still active it will remain intact.



Best regards, Vladimir Yudovin, 

Winguzone - Cloud Cassandra Hosting






 On Mon, 27 Feb 2017 09:27:50 -0500 Kunal Gangakhedkar 
<kgangakhed...@gmail.com> wrote 




Hi all,



Is it safe to delete the backup folders from various CFs from 'system' keyspace 
too?

I seem to have missed them in the last cleanup - and now, the size_estimates 
and compactions_in_progress seem to have grown large ( >200G and ~6G 
respectively).



Can I remove them too?



Thanks,

Kunal




On 13 January 2017 at 18:30, Kunal Gangakhedkar <kgangakhed...@gmail.com> 
wrote:








Great, thanks a lot to all for the help :)



I finally took the dive and went with Razi's suggestions.

In summary, this is what I did:

turn off incremental backups on each of the nodes in rolling fashion

remove the 'backups' directory from each keyspace on each node.

This ended up freeing up almost 350GB on each node - yay :)




Again, thanks a lot for the help, guys.



Kunal




On 12 January 2017 at 21:15, Khaja, Raziuddin (NIH/NLM/NCBI) [C] 
<raziuddin.kh...@nih.gov> wrote:

snapshots are slightly different than backups.

 

In my explanation of the hardlinks created in the backups folder, notice that 
compacted sstables, never end up in the backups folder.

 

On the other hand, a snapshot is meant to represent the data at a particular 
moment in time. Thus, the snapshots directory contains hardlinks to all active 
sstables at the time the snapshot was taken, which would include: compacted 
sstables; and any sstables from memtable flush or streamed from other nodes 
that both exist in the table directory and the backups directory.

 

So, that would be the difference between snapshots and backups.

 

Best regards,

-Razi

 

 

From:  Alain RODRIGUEZ <arodr...@gmail.com>
 Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
 Date: Thursday, January 12, 2017 at 9:16 AM


To: "user@cassandra.apache.org" <user@cassandra.apache.org>

 Subject: Re: Backups eating up disk space






 


My 2 cents, 

 


As I mentioned earlier, we're not currently using snapshots - it's only the 
backups that are bothering me right now.

 


I believe the backups folder is just the new name for what was previously called the snapshots folder. But I could be completely wrong; I haven't played that much with snapshots in recent versions yet.


 


Anyway, some operations in Apache Cassandra can trigger a snapshot:


 


- Repair (when not using parallel option but sequential repairs instead)


- Truncating a table (by default)


- Dropping a table (by default)


- Maybe other I can't think of... ?


 


If you want to clean space but still keep a backup you can run:


 


"nodetool clearsnapshots"


"nodetool snapshot <whatever>"


 


This way, and for a while, data won't be taking up space, as old files will be cleaned and new files will be only hardlinks, as detailed above. Then you might want to work on a proper backup policy, which probably implies getting data out of the production server (a lot of people use S3 or similar services). Or just do that from time to time, meaning you only keep one backup and disk space behaviour will be hard to predict.


 


C*heers,


---


Alain Rodriguez - @arodream -  al...@thelastpickle.com


France


 


The Last Pickle - Apache Cassandra Consulting


http://www.thelastpickle.com




 

2017-01-12 6:42 GMT+01:00 Prasenjit Sarkar <prasenjit.sar...@datos.io>:

Hi Kunal, 

 


Razi's post does give a very lucid description of how cassandra manages the 
hard links inside the backup directory.


 


Where it needs clarification is the following:


--> incremental backups is a system wide setting and so its an all or 
nothing approach


 


--> as multiple people have stated, incremental backups do not create hard 
links to compacted sstables. however, this can bloat the size of your backups


 


--> again as stated, it is a general industry practice to place backups in a 
different secondary storage location than the main production site. So best to 
move it to the secondary storage before applying rm on the backups folder


 


In my experience with production clusters, managing the backups folder across 
multiple nodes can be painful if the objective is to ever recover data. With 
the usual disclaimers, better to rely on third party vendors to accomplish the 
needful rather than scripts/tablesnap.


 


Regards


Prasenjit 

 

 

On Wed, Jan 11, 2017 at 7:49 AM, Khaja, Raziuddin (NIH/NLM/NCBI) [C] 
<raziuddin.kh...@nih.gov> wrote:

Hello Kunal,

 

Caveat: I am not a super-expert on Cassandra, but it helps to explain to 
others, in order to eventually become an expert, so if my explanation is wrong, 
I would hope others would correct

Re: Which compaction strategy when modeling a dumb set

2017-02-27 Thread Vladimir Yudovin
Do you also store the events in Cassandra? If yes, why not add a "processed" flag to the existing table(s) and fetch non-processed events with a single SELECT?
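
A rough sketch of that idea, assuming the events really are in Cassandra (the schema here is hypothetical, "session" is an open DataStax Java driver session, and the non-processed filter is done client-side to avoid a secondary index):

// Assumed schema: events partitioned by day, with a "processed" flag.
session.execute("CREATE TABLE IF NOT EXISTS ks.events ("
        + "day text, id timeuuid, payload text, processed boolean, "
        + "PRIMARY KEY (day, id))");

// One SELECT per day partition; skip rows already marked processed.
for (Row row : session.execute(
        "SELECT id, payload, processed FROM ks.events WHERE day = ?", "2017-02-24")) {
    if (row.getBool("processed")) {
        continue;
    }
    handleEvent(row.getString("payload"));   // handleEvent is a hypothetical handler
    session.execute("UPDATE ks.events SET processed = true WHERE day = ? AND id = ?",
            "2017-02-24", row.getUUID("id"));
}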



Best regards, Vladimir Yudovin, 

Winguzone - Cloud Cassandra Hosting






 On Fri, 24 Feb 2017 06:24:09 -0500 Vincent Rischmann 
<m...@vrischmann.me> wrote 




Hello,



I'm using a table like this:



   CREATE TABLE myset (id uuid PRIMARY KEY)



which is basically a set I use for deduplication, id is a unique id for an 
event, when I process the event I insert the id, and before processing I check 
if it has already been processed for deduplication.



It works well enough, but I'm wondering which compaction strategy I should use. 
I expect maybe 1% or less of events will end up duplicated (thus not generating 
an insert), so the workload will probably be 50% writes 50% read.



Is LCS a good strategy here or should I stick with STCS ?








Re: Reminder: don't listen on public addresses

2017-02-26 Thread Vladimir Yudovin
I would add: use SSL and internode certificate authentication (a password for CQL goes without saying, of course).



Best regards, Vladimir Yudovin, 

Winguzone - Cloud Cassandra Hosting






 On Fri, 20 Jan 2017 14:14:56 -0500 Richard L. Burton III 
<mrbur...@gmail.com> wrote 




I'm often asked "How should I setup my security for Cassandra?" My answer is 
simple "Do not expose them to the outside world! If using AWS, setup your VPC 
and block any IP address that's not in your range and restrict what machines 
can access them." 






On Fri, Jan 20, 2017 at 12:29 PM, Jonathan Ellis <jbel...@gmail.com> 
wrote:

MongoDB has been in the news for hackers deleting unsecured databases and 
demanding money to return the data.


Now copycats are starting to look at other targets too like the thousands of 
unsecured Cassandra databases.




Preventing this is very simple: don't allow Cassandra to listen on public 
interfaces.  




Of course additional security measures are useful as defense in depth, but 
bottom line if the bad guys can't connect to your cluster they can't harm it.





-- 

Jonathan Ellis

co-founder, http://www.datastax.com

@spyced


















-- 

-Richard L. Burton III

@rburton















Re: Not timing out some queries (Java driver)

2016-12-22 Thread Vladimir Yudovin
What is the replication factor? Why not use CONSISTENCY QUORUM? It's faster and safe enough.



Best regards, Vladimir Yudovin, 

Winguzone - Cloud Cassandra Hosting






 On Thu, 22 Dec 2016 10:14:14 -0500 Ali Akhtar <ali.rac...@gmail.com> 
wrote 




Is it possible to provide these options per query rather than set them globally?



On Thu, Dec 22, 2016 at 7:15 AM, Voytek Jarnot <voytek.jar...@gmail.com> 
wrote:

cassandra.yaml has various timeouts such as read_request_timeout, 
range_request_timeout, write_request_timeout, etc.  The driver does as well 
(via Cluster -> Configuration -> SocketOptions -> 
setReadTimeoutMillis).



Not sure if you can (or would want to) set them to "forever", but it's a 
starting point.
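
On Ali's per-query question: recent 3.x versions of the Java driver also let you override the read timeout per statement, rather than only globally. A sketch (contact point, table and timeout values are illustrative):

import com.datastax.driver.core.*;

// Global default for every query on this cluster:
Cluster cluster = Cluster.builder()
        .addContactPoint("127.0.0.1")
        .withSocketOptions(new SocketOptions().setReadTimeoutMillis(12000))
        .build();
Session session = cluster.connect();

// Per-statement override: this query alone may wait up to two minutes.
// Note the coordinator still applies its own cassandra.yaml timeouts
// (read_request_timeout_in_ms etc.), so "never time out" needs both sides raised.
Statement stmt = new SimpleStatement("SELECT * FROM ks.tbl WHERE id = ?", 1)
        .setConsistencyLevel(ConsistencyLevel.ALL)
        .setReadTimeoutMillis(120000);
session.execute(stmt);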




On Wed, Dec 21, 2016 at 7:10 PM, Ali Akhtar <ali.rac...@gmail.com> wrote:

I have some queries which need to be processed in a consistent manner. I'm setting the consistency level = ALL option on these queries.



However, I've noticed that sometimes these queries fail because of a timeout (2 
seconds).



In my use case, for certain queries, I want them to never time out and block 
until they have been acknowledged by all nodes.



Is that possible thru the Datastax Java driver, or another way?















Re: Openstack and Cassandra

2016-12-22 Thread Vladimir Yudovin
Hi Shalom,



I don't see any reason why it wouldn't work,  but obviously, any resource 
sharing affects performance. You can expect less degradation with SSD disks, I 
guess.





Best regards, Vladimir Yudovin, 

Winguzone - Cloud Cassandra Hosting






 On Wed, 21 Dec 2016 13:31:22 -0500 Shalom Sagges 
<shal...@liveperson.com> wrote 




Hi Everyone, 



I am looking into the option of deploying a Cassandra cluster on Openstack 
nodes instead of physical nodes due to resource management considerations. 



Does anyone have any insights regarding this?

Can this combination work properly? 

Since the disks (HDDs) are part of one physical machine that divides their capacity among various instances (not only Cassandra), will this affect performance, especially when the commitlog directory will probably reside alongside the data directory?



I'm at a loss here and don't have any answers for that matter. 



Can anyone assist please? 



Thanks!





 


 
Shalom Sagges
 
DBA
 
T: +972-74-700-4035
 

 
 
 
 We Create Meaningful Connections
 
 

 

 

















Re: Cqlsh timeout and schema refresh exceptions

2016-12-19 Thread Vladimir Yudovin
Regarding schema agreement - try to increase the time between CF creations.

Also, the stress tool waits for schema agreement; look at its code, it probably uses some method to ensure schema distribution.
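
For example, instead of a fixed sleep between DDL statements, the driver itself can be polled for agreement; a sketch (assumes an open Cluster "cluster" and Session "session", and Metadata.checkSchemaAgreement from the 3.x Java driver):

// Run one DDL statement, then wait until all reachable nodes report
// the same schema version before issuing the next one.
void executeDdlAndWait(String ddl) throws InterruptedException {
    session.execute(ddl);
    while (!cluster.getMetadata().checkSchemaAgreement()) {
        Thread.sleep(500);
    }
}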



Best regards, Vladimir Yudovin, 

Winguzone - Cloud Cassandra Hosting






 On Mon, 19 Dec 2016 14:35:00 -0500 Saumitra S 
<saumitra.srivast...@gmail.com> wrote 




Thanks Vladimir!



Is there any known issue in 3.0.10 where creating a CF with a large number of columns, or creating a large number of CFs quickly one after the other, gives schema agreement issues?

What other things can I try to support ~12000 CFs without hitting schema agreement related issues? I can put in more RAM and increase the heap size (even if I need to spend time on GC tuning for such a large heap), but the issue which I get with 2400-column CFs starts happening after just a few keyspaces (fewer than 200 CFs). What can I try to fix that?













On Tue, Dec 20, 2016 at 12:53 AM, Vladimir Yudovin <vla...@winguzone.com> 
wrote:








>I want to dig deeper into what all things happen in C* at time of CF 
creation



It starts somewhere in MigrationManager.announceNewColumnFamily function, I 
guess. 





>limitation of number of keyspaces which can be created.



Actually it's CF limitation, not keyspaces. 





>if you can also point me to this 1MB per CF thingy, it would be great.



Look at http://www.mail-archive.com/user@cassandra.apache.org/msg46359.html, 
CASSANDRA-5935, CASSANDRA-2252

In source look at SlabAllocator.REGION_SIZE definition.





Best regards, Vladimir Yudovin, 

Winguzone - Cloud Cassandra Hosting








 On Mon, 19 Dec 2016 14:10:37 -0500 Saumitra S 
<saumitra.srivast...@gmail.com> wrote 




Hi Vladimir,



Thanks for the response.



When I see "com.datastax.driver.core.ControlConnection" exceptions, I see that 
keyspaces and CF are created. But when I create CF with large number of 
columns(2400 cols) quickly one after the other(with 2 seconds gap between 
CREATE TABLE queries), I get schema agreement timeout errors ( 
com.datastax.driver.core.Cluster | Error while waiting for schema agreement). 
This happens even with a clean slate(empty data directory), just after creating 
4 keyspaces. Timeout is set to 30 seconds. Please note that CREATE TABLE 
queries are NOT fired in parallel. I wait for 1 query to complete(with schema 
agreement) before firing another one.



I want to dig deeper into what all things happen in C* at time of CF creation 
to understand more about the limitation of number of keyspaces which can be 
created. Can you please point me to the corresponding source code? Specifically 
if you can also point me to this 1MB per CF thingy, it would be great.





Best Regards,

Saumitra




















On Mon, Dec 19, 2016 at 11:41 PM, Vladimir Yudovin <vla...@winguzone.com> 
wrote:








Hi,



Question: Does C* reads some schema/metadata on calling cqlsh, which is causing 
timeout with large number of keyspaces?



A lot :). cqlsh reads schemas, cluster topology, each node's tokens, etc. You can just capture TCP port 9042 (unless you use SSL) and view all the negotiation between cqlsh and the node.





Question: Can a single C* cluster of 5 nodes(32gb/8cpu each) support upto 500 
keyspaces each having 25 CFs. What kind of issues I can expect?



You have 500*25 = 12,500 tables; that's a huge number. Each CF takes at least 1 MB of heap memory, so it needs 12 GB of heap just as a starting point. Run a test on a one- or two-node cluster first.





Question: What is the effect of below exception?



Are the keyspaces created despite the exception or not?



Best regards, Vladimir Yudovin, 

Winguzone - Cloud Cassandra Hosting






 On Mon, 19 Dec 2016 10:24:20 -0500 Saumitra S 
<saumitra.srivast...@gmail.com> wrote 




Hi All,



I have a 2 node cluster(32gb ram/8cpu) running 3.0.10 and I created 50 
keyspaces in it. Each keyspace has 25 CF. Column count in each CF ranges 
between 5 to 30. 



I am getting few issues once keyspace count reaches ~50. 



Issue 1:



When I try to use cqlsh, I get timeout.



$ cqlsh `hostname -i`

Connection error: ('Unable to connect to any servers', {'10.0.20.220': 
OperationTimedOut('errors=None, last_host=None',)})




If I increase connect timeout, I am able to access cluster through cqlsh



$ cqlsh --connect-timeout 20  `hostname -i   //this works fine



Question: Does C* reads some schema/metadata on calling cqlsh, which is causing 
timeout with large number of keyspaces?





Issue 2:



If I create keyspaces which have 3 large CF(each having around 2500 cols), then 
I start to see schema agreement timeout in my logs. I have set schema agreement 
timeout to 30 seconds in driver.



2016-12-13 08:37:02.733 | gbd-std-01 | WARN | cluster2-worker-194 | 
com.datastax.driver.core.Cluster | Error while waiting for schema agreement




Question: Can a single C* cluste

Re: Cqlsh timeout and schema refresh exceptions

2016-12-19 Thread Vladimir Yudovin
>I want to dig deeper into what all things happen in C* at time of CF 
creation

It starts somewhere in MigrationManager.announceNewColumnFamily function, I 
guess. 





>limitation of number of keyspaces which can be created.

Actually it's CF limitation, not keyspaces. 





>if you can also point me to this 1MB per CF thingy, it would be great.

Look at http://www.mail-archive.com/user@cassandra.apache.org/msg46359.html, 
CASSANDRA-5935, CASSANDRA-2252

In source look at SlabAllocator.REGION_SIZE definition.





Best regards, Vladimir Yudovin, 

Winguzone - Cloud Cassandra Hosting






 On Mon, 19 Dec 2016 14:10:37 -0500 Saumitra S 
<saumitra.srivast...@gmail.com> wrote 




Hi Vladimir,



Thanks for the response.



When I see "com.datastax.driver.core.ControlConnection" exceptions, I see that 
keyspaces and CF are created. But when I create CF with large number of 
columns(2400 cols) quickly one after the other(with 2 seconds gap between 
CREATE TABLE queries), I get schema agreement timeout errors ( 
com.datastax.driver.core.Cluster | Error while waiting for schema agreement). 
This happens even with a clean slate(empty data directory), just after creating 
4 keyspaces. Timeout is set to 30 seconds. Please note that CREATE TABLE 
queries are NOT fired in parallel. I wait for 1 query to complete(with schema 
agreement) before firing another one.



I want to dig deeper into what all things happen in C* at time of CF creation 
to understand more about the limitation of number of keyspaces which can be 
created. Can you please point me to the corresponding source code? Specifically 
if you can also point me to this 1MB per CF thingy, it would be great.





Best Regards,

Saumitra




















On Mon, Dec 19, 2016 at 11:41 PM, Vladimir Yudovin <vla...@winguzone.com> 
wrote:








Hi,



Question: Does C* reads some schema/metadata on calling cqlsh, which is causing 
timeout with large number of keyspaces?



A lot :). cqlsh reads schemas, cluster topology, each node's tokens, etc. You can just capture TCP port 9042 (unless you use SSL) and view all the negotiation between cqlsh and the node.





Question: Can a single C* cluster of 5 nodes(32gb/8cpu each) support upto 500 
keyspaces each having 25 CFs. What kind of issues I can expect?



You have 500*25 = 12,500 tables; that's a huge number. Each CF takes at least 1 MB of heap memory, so it needs 12 GB of heap just as a starting point. Run a test on a one- or two-node cluster first.





Question: What is the effect of below exception?



Are the keyspaces created despite the exception or not?



Best regards, Vladimir Yudovin, 

Winguzone - Cloud Cassandra Hosting






 On Mon, 19 Dec 2016 10:24:20 -0500 Saumitra S 
<saumitra.srivast...@gmail.com> wrote 




Hi All,



I have a 2 node cluster(32gb ram/8cpu) running 3.0.10 and I created 50 
keyspaces in it. Each keyspace has 25 CF. Column count in each CF ranges 
between 5 to 30. 



I am getting few issues once keyspace count reaches ~50. 



Issue 1:



When I try to use cqlsh, I get timeout.



$ cqlsh `hostname -i`

Connection error: ('Unable to connect to any servers', {'10.0.20.220': 
OperationTimedOut('errors=None, last_host=None',)})




If I increase connect timeout, I am able to access cluster through cqlsh



$ cqlsh --connect-timeout 20  `hostname -i   //this works fine



Question: Does C* reads some schema/metadata on calling cqlsh, which is causing 
timeout with large number of keyspaces?





Issue 2:



If I create keyspaces which have 3 large CF(each having around 2500 cols), then 
I start to see schema agreement timeout in my logs. I have set schema agreement 
timeout to 30 seconds in driver.



2016-12-13 08:37:02.733 | gbd-std-01 | WARN | cluster2-worker-194 | 
com.datastax.driver.core.Cluster | Error while waiting for schema agreement




Question: Can a single C* cluster of 5 nodes(32gb/8cpu each) support upto 500 
keyspaces each having 25 CFs. What kind of issues I can expect?





Issue 3:



I am creating keyspaces and CFs through datastax driver. I see following 
exception in my log after reaching ~50 keyspaces.



Question: What is the effect of below exception?



2016-12-19 13:55:35.615 | gbd-std-01 | ERROR | cluster1-worker-147 | com.datastax.driver.core.ControlConnection | [Control connection] Unexpected error while refreshing schema
java.util.concurrent.ExecutionException: com.datastax.driver.core.exceptions.OperationTimedOutException: [gbd-cass-20.ec2-east1.hidden.com/10.0.20.220] Operation timed out
at com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:299) ~[com.google.guava.guava-18.0.jar:na]
at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286) ~[com.google.guava.guava-18.0.jar:na]
at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116) ~[com.google.g

Re: Cqlsh timeout and schema refresh exceptions

2016-12-19 Thread Vladimir Yudovin
Hi,



Question: Does C* reads some schema/metadata on calling cqlsh, which is causing 
timeout with large number of keyspaces?

A lot :). cqlsh reads schemas, cluster topology, each node's tokens, etc. You can just capture TCP port 9042 (unless you use SSL) and view all the negotiation between cqlsh and the node.





Question: Can a single C* cluster of 5 nodes(32gb/8cpu each) support upto 500 
keyspaces each having 25 CFs. What kind of issues I can expect?

You have 500*25 = 12,500 tables; that's a huge number. Each CF takes at least 1 MB of heap memory, so it needs 12 GB of heap just as a starting point. Run a test on a one- or two-node cluster first.





Question: What is the effect of below exception?

Are the keyspaces created despite the exception or not?



Best regards, Vladimir Yudovin, 

Winguzone - Cloud Cassandra Hosting






 On Mon, 19 Dec 2016 10:24:20 -0500 Saumitra S 
<saumitra.srivast...@gmail.com> wrote 




Hi All,



I have a 2 node cluster(32gb ram/8cpu) running 3.0.10 and I created 50 
keyspaces in it. Each keyspace has 25 CF. Column count in each CF ranges 
between 5 to 30. 



I am getting few issues once keyspace count reaches ~50. 



Issue 1:



When I try to use cqlsh, I get timeout.



$ cqlsh `hostname -i`

Connection error: ('Unable to connect to any servers', {'10.0.20.220': 
OperationTimedOut('errors=None, last_host=None',)})




If I increase connect timeout, I am able to access cluster through cqlsh



$ cqlsh --connect-timeout 20  `hostname -i   //this works fine



Question: Does C* reads some schema/metadata on calling cqlsh, which is causing 
timeout with large number of keyspaces?





Issue 2:



If I create keyspaces which have 3 large CF(each having around 2500 cols), then 
I start to see schema agreement timeout in my logs. I have set schema agreement 
timeout to 30 seconds in driver.



2016-12-13 08:37:02.733 | gbd-std-01 | WARN | cluster2-worker-194 | 
com.datastax.driver.core.Cluster | Error while waiting for schema agreement




Question: Can a single C* cluster of 5 nodes(32gb/8cpu each) support upto 500 
keyspaces each having 25 CFs. What kind of issues I can expect?





Issue 3:



I am creating keyspaces and CFs through datastax driver. I see following 
exception in my log after reaching ~50 keyspaces.



Question: What is the effect of below exception?



2016-12-19 13:55:35.615 | gbd-std-01 | ERROR | cluster1-worker-147 | com.datastax.driver.core.ControlConnection | [Control connection] Unexpected error while refreshing schema
java.util.concurrent.ExecutionException: com.datastax.driver.core.exceptions.OperationTimedOutException: [gbd-cass-20.ec2-east1.hidden.com/10.0.20.220] Operation timed out
at com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:299) ~[com.google.guava.guava-18.0.jar:na]
at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286) ~[com.google.guava.guava-18.0.jar:na]
at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116) ~[com.google.guava.guava-18.0.jar:na]
at com.datastax.driver.core.SchemaParser.get(SchemaParser.java:467) ~[com.datastax.cassandra.cassandra-driver-core-3.0.0.jar:na]
at com.datastax.driver.core.SchemaParser.access$400(SchemaParser.java:30) ~[com.datastax.cassandra.cassandra-driver-core-3.0.0.jar:na]
at com.datastax.driver.core.SchemaParser$V3SchemaParser.fetchSystemRows(SchemaParser.java:632) ~[com.datastax.cassandra.cassandra-driver-core-3.0.0.jar:na]
at com.datastax.driver.core.SchemaParser.refresh(SchemaParser.java:56) ~[com.datastax.cassandra.cassandra-driver-core-3.0.0.jar:na]
at com.datastax.driver.core.ControlConnection.refreshSchema(ControlConnection.java:341) ~[com.datastax.cassandra.cassandra-driver-core-3.0.0.jar:na]
at com.datastax.driver.core.ControlConnection.refreshSchema(ControlConnection.java:306) ~[com.datastax.cassandra.cassandra-driver-core-3.0.0.jar:na]
at com.datastax.driver.core.Cluster$Manager$SchemaRefreshRequestDeliveryCallback$1.runMayThrow(Cluster.java:2570) [com.datastax.cassandra.cassandra-driver-core-3.0.0.jar:na]
at com.datastax.driver.core.ExceptionCatchingRunnable.run(ExceptionCatchingRunnable.java:32) [com.datastax.cassandra.cassandra-driver-core-3.0.0.jar:na]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_45]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_45]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_45]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_45]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_45]
Caused by: com.datastax.driver.core.exceptions.OperationTimedOutException: [gbd-cass-20.ec2-east1.hidden.com/10.

Re: Cassandra: maximum size of collection list type

2016-12-01 Thread Vladimir Yudovin
As the doc says:

 The maximum size of an item in a collection is 64K.


Keep collections small to prevent delays during querying because Cassandra 
reads a collection in its entirety. The collection is not paged internally.
As discussed earlier, collections are designed to store only a small amount of 
data.


Never insert more than 64K items in a collection. 
If you insert more than 64K items into a collection, only 64K of them will be 
queryable, resulting in data loss.
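
When a list may grow anywhere near those limits, the usual alternative is to model it as clustering rows, which Cassandra pages server-side; a sketch (schema is illustrative, and "session" is an open DataStax Java driver session):

import java.util.UUID;

// One row per item instead of one ever-growing list cell.
session.execute("CREATE TABLE IF NOT EXISTS ks.items ("
        + "owner uuid, seq timeuuid, value text, "
        + "PRIMARY KEY (owner, seq))");

// Appending is a plain INSERT, and reads are paged automatically by the driver.
UUID ownerId = UUID.randomUUID();   // hypothetical owner key
session.execute("INSERT INTO ks.items (owner, seq, value) VALUES (?, now(), ?)",
        ownerId, "some value");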




Best regards, Vladimir Yudovin, 

Winguzone - Cloud Cassandra Hosting






 On Thu, 01 Dec 2016 12:17:23 -0500 Benjamin Roth 
<benjamin.r...@jaumo.com> wrote 




You can read it in the docs, but I think it was 2^16, aka 64K



Am 01.12.2016 18:00 schrieb "Selvam Raman" <sel...@gmail.com>:

Hi,



What is the maximum size which can be stored into collection list(in a row ) in 
cassandra.



-- 

Selvam Raman

"லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"













Re: Cassandra 2.x Stability

2016-11-30 Thread Vladimir Yudovin
You should also consider the end-of-support terms, as the Cassandra page says:



Apache Cassandra 2.2 is supported until November 2016.

Apache Cassandra 2.1 is supported until November 2016 with critical fixes only



So 2.1 actually doesn't get any fixes, even critical ones.



Best regards, Vladimir Yudovin, 

Winguzone - Cloud Cassandra Hosting






 On Wed, 30 Nov 2016 07:38:46 -0500 kurt Greaves 
<k...@instaclustr.com> wrote 




Latest release in 2.2. 2.1 is borderline EOL and from my experience 2.2 is 
quite stable and has some handy bugfixes that didn't actually make it into 2.1



On 30 November 2016 at 10:41, Shalom Sagges <shal...@liveperson.com> 
wrote:

Hi Everyone, 



I'm about to upgrade our 2.0.14 version to a newer 2.x version. 

At first I thought of upgrading to 2.2.8, but I'm not sure how stable it is, as 
I understand the 2.2 version was supposed to be a sort of beta version for 3.0 
feature-wise, whereas 3.0 upgrade will mainly handle the storage modifications 
(please correct me if I'm wrong). 



So my question is, if I need a 2.x version (can't upgrade to 3 due to client 
considerations), which one should I choose, 2.1.x or 2.2.x? (I'm don't require 
any new features available in 2.2). 



Thanks!




 
Shalom Sagges
 
DBA
 
T: +972-74-700-4035
 

 
 
 
 We Create Meaningful Connections
 
 

 





















Re: Java GC pauses, reality check

2016-11-25 Thread Vladimir Yudovin
Hi Ahmed,



obviously, a 20-30 sec pause is unacceptable.

I suggest checking the following:

- disable swapping completely

- check the Java version; v8 is desirable (depending on the Cassandra version)

- use a multiprocessor machine (it allows concurrent GC)



Best regards, Vladimir Yudovin, 

Winguzone - Cloud Cassandra Hosting






 On Fri, 25 Nov 2016 16:25:15 -0500 S Ahmed <sahmed1...@gmail.com> 
wrote 




Hello!



From what I understand, Java GC pauses are pretty much a fact of life, but you can tune the JVM to reduce the frequency and length of GC pauses.



When using Cassandra, how frequent or long have these pauses been known to be? Even with tuning, is it safe to assume they cannot be eliminated?



Would a 20-30 second pause be something out of the ordinary?



Thanks.









Re: generate different sizes of request from single client

2016-11-24 Thread Vladimir Yudovin
>doesn't have any option for mixed request sizes

As a workaround you can run two parallel tests, each with its own request size.
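
If it must come from a single client process, a tiny driver-based generator can also mix the two sizes itself; a rough sketch (the table, row count and 50/50 mix are illustrative, "session" is an open DataStax Java driver session, and ks.payloads(id uuid PRIMARY KEY, body text) is assumed to exist):

import java.util.Arrays;
import java.util.Random;

char[] small = new char[1024];        // 1 KB payload
char[] large = new char[10 * 1024];   // 10 KB payload
Arrays.fill(small, 'x');
Arrays.fill(large, 'x');
Random rnd = new Random();
for (int i = 0; i < 100000; i++) {
    String body = new String(rnd.nextBoolean() ? small : large);
    session.execute("INSERT INTO ks.payloads (id, body) VALUES (uuid(), ?)", body);
}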



Best regards, Vladimir Yudovin, 

Winguzone - Cloud Cassandra Hosting






 On Thu, 24 Nov 2016 16:54:08 -0500 Vikas Jaiman 
<er.vikasjai...@gmail.com> wrote 




Hi Vladimir,



It has an option for a mixed read/write request ratio, but doesn't have any option for mixed request sizes where I can issue requests of different sizes.



Thanks,

Vikas  



On Thu, Nov 24, 2016 at 7:46 PM, Vladimir Yudovin <vla...@winguzone.com> 
wrote:



You can use cassandra stress-tool.

It has options to set different load patterns.



Best regards, Vladimir Yudovin, 

Winguzone - Cloud Cassandra Hosting






 On Thu, 24 Nov 2016 13:27:59 -0500Vikas Jaiman 
<er.vikasjai...@gmail.com> wrote 




Hi all,


I want to generate two different sizes of request (let's say 1 KB and 10 KB) from a single client for benchmarking Cassandra. Does any tool exist for this type of scenario?



Vikas




















Re: generate different sizes of request from single client

2016-11-24 Thread Vladimir Yudovin
You can use cassandra stress-tool.

It has options to set different load patterns.



Best regards, Vladimir Yudovin, 

Winguzone - Cloud Cassandra Hosting






 On Thu, 24 Nov 2016 13:27:59 -0500Vikas Jaiman 
<er.vikasjai...@gmail.com> wrote 




Hi all,


I want to generate two different sizes of request (let's say 1 KB and 10 KB) from a single client for benchmarking Cassandra. Does any tool exist for this type of scenario?



Vikas










Re: OperationTimedOutException (NoHostAvailableException)

2016-11-24 Thread Vladimir Yudovin
>rpc_address: 0.0.0.0  , broadcast_address: 1.2.3.4

Did you try setting rpc_address to the node's IP rather than 0.0.0.0?



Best regards, Vladimir Yudovin, 

Winguzone - Cloud Cassandra Hosting






 On Thu, 24 Nov 2016 04:50:08 -0500Jeff Jirsa 
<jeff.ji...@crowdstrike.com> wrote 




Did you already try doing what the error message indicates you should try?

 

Is there anything in the logs on the 3 cassandra boxes listed (192.168.198.168, 
192.168.198.169, 192.168.198.75) that indicates they had problems at that time, 
perhaps GCInspector or StatusLogger messages about pauses, or any drops in 
network utilization to indicate a networking problem?

 



 

From: "techpyaasa ." <techpya...@gmail.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Thursday, November 24, 2016 at 1:43 AM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: OperationTimedOutException (NoHostAvailableException)

 


Hi all,

The following exception is thrown sometimes even though all nodes are up.

SEVERE: This error occurs if there are not enough Cassandra nodes for the required QUORUM to persist data. Please make sure enough nodes are up at this point of time. Error Count is at 150.
Exception com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried:
/192.168.198.168:9042 (com.datastax.driver.core.exceptions.DriverException: Timeout while trying to acquire available connection (you may want to increase the driver number of per-host connections)),
/192.168.198.169:9042 (com.datastax.driver.core.exceptions.DriverException: Timeout while trying to acquire available connection (you may want to increase the driver number of per-host connections)),
/192.168.198.75:9042 (com.datastax.driver.core.OperationTimedOutException: [/192.168.198.75:9042] Operation timed out))
at com.datastax.driver.core.exceptions.NoHostAvailableException.copy(NoHostAvailableException.java:84)
at com.datastax.driver.core.DriverThrowables.propagateCause(DriverThrowables.java:37)
at com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:214)
at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:52)
at


We are using c*-2.0.17 , datastax java driver - cassandra-driver-core-2.1.8.jar.

In cassandra.yaml the following were set:
rpc_address: 0.0.0.0, broadcast_address: 1.2.3.4

This exception is thrown for both READ & WRITE queries. Can someone please help me out in debugging this?


Thanks 
Techpyaasa












Re: how to effectively drop table

2016-11-24 Thread Vladimir Yudovin
Actually you shouldn't drop the tables prior to dropping the keyspace. Just drop the keyspace with all its tables.



Best regards, Vladimir Yudovin, 

Winguzone - Cloud Cassandra Hosting






 On Thu, 24 Nov 2016 05:38:36 -0500joseph gao 
<gaojf.bok...@gmail.com> wrote 




Hi Vladimir,

I have to do the whole thing in 2 steps. DROP of the whole keyspace works in step 2. But in step 1, I have to drop 2000 tables. Maybe I can wait, skip step 1, and do all the jobs in step 2.

Anyway, thanks very much!




2016-11-24 16:25 GMT+08:00 Vladimir Yudovin <vla...@winguzone.com>:



Hi,



Is DROP whole keyspace an option?



Best regards, Vladimir Yudovin, 

Winguzone - Cloud Cassandra Hosting






 On Thu, 24 Nov 2016 03:00:40 -0500joseph gao 
<gaojf.bok...@gmail.com> wrote 




Hi all,

I've had a very bad system design before. This left a huge number of tables in my cassandra cluster, and the cluster is very unstable. Now I want to redesign the system, but it's painful to drop the former tables. Dropping a table may cost 30 seconds or even worse (170s) using cqlsh or a driver client.

So is there an effective way to drop these unused tables? Thanks very much.



-- 

--

Joseph Gao

PhoneNum:18136950721

QQ: 409343351

















-- 

--

Joseph Gao

PhoneNum:15210513582

QQ: 409343351








Re: how to effectively drop table

2016-11-24 Thread Vladimir Yudovin
Hi,



Is DROP whole keyspace an option?



Best regards, Vladimir Yudovin, 

Winguzone - Cloud Cassandra Hosting






 On Thu, 24 Nov 2016 03:00:40 -0500joseph gao 
<gaojf.bok...@gmail.com> wrote 




Hi all,

I've had a very bad system design before. This left a huge number of tables in my cassandra cluster, and the cluster is very unstable. Now I want to redesign the system, but it's painful to drop the former tables. Dropping a table may cost 30 seconds or even worse (170s) using cqlsh or a driver client.

So is there an effective way to drop these unused tables? Thanks very much.



-- 

--

Joseph Gao

PhoneNum:18136950721

QQ: 409343351









Re: Row and column level tombstones

2016-11-23 Thread Vladimir Yudovin
You are right, only new inserts after delete are taken into account:

CREATE KEYSPACE ks WITH replication = {'class': 'SimpleStrategy', 
'replication_factor': 1};

CREATE TABLE ks.tb (id int PRIMARY KEY , str text);

INSERT INTO ks.tb (id, str) VALUES ( 0,'');

DELETE from ks.tb WHERE id =0;

INSERT INTO ks.tb (id) VALUES ( 0);

SELECT * FROM ks.tb ;



 id | str
----+------
  0 | null

(1 rows)




Best regards, Vladimir Yudovin, 

Winguzone - Cloud Cassandra Hosting






 On Wed, 23 Nov 2016 11:59:51 -0500Andrew Cooper 
<andrew.coo...@nisc.coop> wrote 




What would be returned in the following example?

 

Row with columns exists

Row is deleted (row tombstone)

Row key is recreated

 

Would columns that existed before the row delete/tombstone show back up in a 
read if the row key is recreated?

My assumption is that the row key tombstone timestamp is taken into 
consideration on the read path and all columns with timestamp less than key 
tombstone are ignored in the response.

I have not dug into the codebase yet.  If anyone can shed light on this 
question from their own experiences that would be helpful.

 

Thanks,

 

-Andrew









RE: Cassandra Config as per server hardware for heavy write

2016-11-23 Thread Vladimir Yudovin
Try building the cluster with .withPoolingOptions.
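
For example (values are illustrative; the setMaxRequestsPerConnection call assumes the 3.x driver, and hostAddresses is the array from your earlier snippet):

import com.datastax.driver.core.*;

PoolingOptions pooling = new PoolingOptions()
        .setConnectionsPerHost(HostDistance.LOCAL, 2, 8)        // core / max connections per host
        .setMaxRequestsPerConnection(HostDistance.LOCAL, 1024); // concurrent requests per connection

Cluster cluster = Cluster.builder()
        .addContactPoints(hostAddresses)
        .withPoolingOptions(pooling)
        .build();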



Best regards, Vladimir Yudovin, 

Winguzone - Cloud Cassandra Hosting






 On Wed, 23 Nov 2016 05:57:58 -0500Abhishek Kumar Maheshwari 
<abhishek.maheshw...@timesinternet.in> wrote 




Yes, I also tried async mode but got a max speed of 2500 requests/sec per server.

 

ExecutorService service = Executors.newFixedThreadPool(1000);
for (final AdLog adLog : li) {
    service.submit(() -> {
        session.executeAsync(ktest.adImprLogToStatement(adLog.getAdLogType(), adLog.getAdImprLog()));
        inte.incrementAndGet();
    });
}

 

Thanks & Regards,
 Abhishek Kumar Maheshwari
 +91- 805591 (Mobile)
Times Internet Ltd. | A Times of India Group Company

FC - 6, Sector 16A, Film City,  Noida,  U.P. 201301 | INDIA

P Please do not print this email unless it is absolutely necessary. Spread 
environmental awareness.

 

From: Benjamin Roth [mailto:benjamin.r...@jaumo.com] 
 Sent: Wednesday, November 23, 2016 4:09 PM
 To: user@cassandra.apache.org
 Subject: Re: Cassandra Config as per server hardware for heavy write
 

This has nothing to do with sync/async operations. An async operation is also 
replayable. You receive the result in a future instead.

Have you ever dealt with async programming techniques like promises, futures, 
callbacks?


Async programming does not change the fact that you get a result of your operation, only WHERE and WHEN you get it.


Doing sync operations means the result is available in the "next line of code" 
whereas async operation means that some handler is called when the result is 
there.


 


There are tons of articles around this in the web.



 

2016-11-23 11:29 GMT+01:00 Abhishek Kumar Maheshwari 
<abhishek.maheshw...@timesinternet.in>:

But I need to do it in sync mode as per the business requirement. If something goes wrong then it should be replayable. That's why I am using sync mode.

 

Thanks & Regards,
 Abhishek Kumar Maheshwari
 +91- 805591 (Mobile)
Times Internet Ltd. | A Times of India Group Company

FC - 6, Sector 16A, Film City,  Noida,  U.P. 201301 | INDIA

P Please do not print this email unless it is absolutely necessary. Spread 
environmental awareness.


 

From: Vladimir Yudovin [mailto:vla...@winguzone.com] 
 Sent: Wednesday, November 23, 2016 3:47 PM
 To: user <user@cassandra.apache.org>
 Subject: RE: Cassandra Config as per server hardware for heavy write


 

session.execute is coming from Session session = cluster.connect(); I guess?


 


So actually all threads work with the same TCP connection. It's worth trying the async API with a connection pool.


 


Best regards, Vladimir Yudovin,


Winguzone - Cloud Cassandra Hosting



 


 


 On Wed, 23 Nov 2016 04:49:18 -0500Abhishek Kumar Maheshwari 
<abhishek.maheshw...@timesinternet.in> wrote 



 


Hi

 

I am submitting record to Executor service and below is my client config and 
code:

 

cluster = Cluster.builder().addContactPoints(hostAddresses)
        .withRetryPolicy(DefaultRetryPolicy.INSTANCE)
        .withReconnectionPolicy(new ConstantReconnectionPolicy(3L))
        .withLoadBalancingPolicy(new TokenAwarePolicy(new DCAwareRoundRobinPolicy()))
        .build();

ExecutorService service = Executors.newFixedThreadPool(1000);
for (final AdLog adLog : li) {
    service.submit(() -> {
        session.execute(ktest.adImprLogToStatement(adLog.getAdLogType(), adLog.getAdImprLog()));
        inte.incrementAndGet();
    });
}

 

 

 

Thanks & Regards,
 Abhishek Kumar Maheshwari
 +91- 805591 (Mobile)
Times Internet Ltd. | A Times of India Group Company

FC - 6, Sector 16A, Film City,  Noida,  U.P. 201301 | INDIA

P Please do not print this email unless it is absolutely necessary. Spread 
environmental awareness.


 

From: Vladimir Yudovin [mailto:vla...@winguzone.com] 
 Sent: Wednesday, November 23, 2016 3:15 PM
 To: user <user@cassandra.apache.org>
 Subject: RE: Cassandra Config as per server hardware for heavy write


 

>I have a list with 1cr record. I am just iterating on it and executing the 
query. Also, I try with 200 thread


Do you fetch each list item and hand it to a separate thread to perform the CQL query? Also, how exactly do you connect to Cassandra?

If you use the synchronous API it's better to create a connection pool (with TokenAwarePolicy on each) and then pass each item to a separate thread.


 


 


Best regards, Vladimir Yudovin,


Winguzone - Cloud Cassandra Hosting



 


 


 On Wed, 23 Nov 2016 04:23:13 -0500Abhishek Kumar Maheshwari 
<abhishek.maheshw...@timesinternet.in> wrote 



 


Hi Siddharth,

 

For me it seems to be on the Cassandra side, because I have a list with 1cr records and I am just iterating over it and executing the query.

Also, I tried with 200 threads but the speed still doesn't increase as much as expected. On grafana the write latency is near about 10

RE: Cassandra Config as per server hardware for heavy write

2016-11-23 Thread Vladimir Yudovin
session.execute is coming from Session session = cluster.connect(); I guess?



So actually all threads work with the same TCP connection. It's worth trying the async API with a connection pool.
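
With the async API the usual pattern is to bound the number of in-flight requests with a semaphore instead of a huge thread pool; a sketch reusing the identifiers from the quoted snippet below (the permit count of 256 is illustrative):

import java.util.concurrent.Semaphore;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.ResultSetFuture;
import com.google.common.util.concurrent.FutureCallback;
import com.google.common.util.concurrent.Futures;

final Semaphore permits = new Semaphore(256);   // max concurrent in-flight requests
for (final AdLog adLog : li) {
    permits.acquireUninterruptibly();           // back-pressure instead of 1000 threads
    ResultSetFuture f = session.executeAsync(
            ktest.adImprLogToStatement(adLog.getAdLogType(), adLog.getAdImprLog()));
    Futures.addCallback(f, new FutureCallback<ResultSet>() {
        @Override
        public void onSuccess(ResultSet rs) { permits.release(); inte.incrementAndGet(); }
        @Override
        public void onFailure(Throwable t) { permits.release(); /* log or retry here */ }
    });
}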



Best regards, Vladimir Yudovin, 

Winguzone - Cloud Cassandra Hosting






 On Wed, 23 Nov 2016 04:49:18 -0500Abhishek Kumar Maheshwari 
<abhishek.maheshw...@timesinternet.in> wrote 




Hi

 

I am submitting record to Executor service and below is my client config and 
code:

 

cluster = Cluster.builder().addContactPoints(hostAddresses)
        .withRetryPolicy(DefaultRetryPolicy.INSTANCE)
        .withReconnectionPolicy(new ConstantReconnectionPolicy(3L))
        .withLoadBalancingPolicy(new TokenAwarePolicy(new DCAwareRoundRobinPolicy()))
        .build();

ExecutorService service = Executors.newFixedThreadPool(1000);
for (final AdLog adLog : li) {
    service.submit(() -> {
        session.execute(ktest.adImprLogToStatement(adLog.getAdLogType(), adLog.getAdImprLog()));
        inte.incrementAndGet();
    });
}

 

 

 

Thanks & Regards,
 Abhishek Kumar Maheshwari
 +91- 805591 (Mobile)
Times Internet Ltd. | A Times of India Group Company

FC - 6, Sector 16A, Film City,  Noida,  U.P. 201301 | INDIA

P Please do not print this email unless it is absolutely necessary. Spread 
environmental awareness.


 

From: Vladimir Yudovin [mailto:vla...@winguzone.com] 
 Sent: Wednesday, November 23, 2016 3:15 PM
 To: user <user@cassandra.apache.org>
 Subject: RE: Cassandra Config as per server hardware for heavy write


 

>I have a list with 1cr record. I am just iterating on it and executing the 
query. Also, I try with 200 thread


Do you fetch each list item and hand it to a separate thread to perform the CQL query? Also, how exactly do you connect to Cassandra?

If you use the synchronous API it's better to create a connection pool (with TokenAwarePolicy on each) and then pass each item to a separate thread.


 


 


Best regards, Vladimir Yudovin,


Winguzone - Cloud Cassandra Hosting



 


 


 On Wed, 23 Nov 2016 04:23:13 -0500Abhishek Kumar Maheshwari 
<abhishek.maheshw...@timesinternet.in> wrote 



 


Hi Siddharth,

 

For me it seems to be on the Cassandra side, because I have a list with 1cr records and I am just iterating over it and executing the query.

Also, I tried with 200 threads but the speed still doesn't increase as much as expected. On grafana the write latency is near about 10 ms.

 

Thanks & Regards,
 Abhishek Kumar Maheshwari
 +91- 805591 (Mobile)
Times Internet Ltd. | A Times of India Group Company

FC - 6, Sector 16A, Film City,  Noida,  U.P. 201301 | INDIA

P Please do not print this email unless it is absolutely necessary. Spread 
environmental awareness.

 

From: siddharth verma [mailto:sidd.verma29.l...@gmail.com] 
 Sent: Wednesday, November 23, 2016 2:23 PM
 To:  user@cassandra.apache.org
 Subject: Re: Cassandra Config as per server hardware for heavy write
 

Hi Abhishek,

You could check whether you are throttling on client side queries or on 
cassandra side.


You could also use grafana to monitor the cluster as well.


As you said, you are using 100 threads, it can't be sure whether you are 
throttling cassandra cluster to its max limit.


 


As Benjamin suggested, you could use cassandra stress tool.


 


Lastly, if after everything( and you are sure, that cassandra seems slow) the 
TPS comes out to be the numbers as you suggested, you could check you schema, 
many rows in one partition key, read queries, read write load, write queries 
with Batch/LWT, compactions running etc.


 


 


For checking ONLY cassandra throughput, you could use cassandra-stress with any 
schema of your choice.


 


Regards


 



 

On Wed, Nov 23, 2016 at 2:07 PM, Vladimir Yudovin <vla...@winguzone.com> 
wrote:

So do you see write speed saturation at this number of threads? Does doubling to 200 bring an increase?


 


 


Best regards, Vladimir Yudovin,


Winguzone - Cloud Cassandra Hosting, Zero production time



 


 


 On Wed, 23 Nov 2016 03:31:32 -0500Abhishek Kumar Maheshwari 
<abhishek.maheshw...@timesinternet.in> wrote 



 


No I am using 100 threads.

 

Thanks & Regards,
 Abhishek Kumar Maheshwari
 +91- 805591 (Mobile)
Times Internet Ltd. | A Times of India Group Company

FC - 6, Sector 16A, Film City,  Noida,  U.P. 201301 | INDIA

P Please do not print this email unless it is absolutely necessary. Spread 
environmental awareness.


 

From: Vladimir Yudovin [mailto:vla...@winguzone.com] 
 Sent: Wednesday, November 23, 2016 2:00 PM
 To: user <user@cassandra.apache.org>
 Subject: RE: Cassandra Config as per server hardware for heavy write


 

>I have 1Cr records in my Java ArrayList and yes I am writing in sync mode.


Is your Java program single threaded?


 


Best regards, Vladimir Yudovin,


Winguzone - Cloud Cassandra Hosting, Zero producti

RE: Cassandra Config as per server hardware for heavy write

2016-11-23 Thread Vladimir Yudovin
>I have a list with 1cr record. I am just iterating on it and executing the 
query. Also, I try with 200 thread

Do you fetch each list item and hand it to a separate thread to perform the CQL query? Also, how exactly do you connect to Cassandra?

If you use the synchronous API it's better to create a connection pool (with TokenAwarePolicy on each) and then pass each item to a separate thread.





Best regards, Vladimir Yudovin, 

Winguzone - Cloud Cassandra Hosting






 On Wed, 23 Nov 2016 04:23:13 -0500Abhishek Kumar Maheshwari 
<abhishek.maheshw...@timesinternet.in> wrote 




Hi Siddharth,

 

For me it seems to be on the Cassandra side, because I have a list with 1cr records and I am just iterating over it and executing the query.

Also, I tried with 200 threads but the speed still doesn't increase as much as expected. On grafana the write latency is near about 10 ms.

 

Thanks & Regards,
 Abhishek Kumar Maheshwari
 +91- 805591 (Mobile)
Times Internet Ltd. | A Times of India Group Company

FC - 6, Sector 16A, Film City,  Noida,  U.P. 201301 | INDIA

P Please do not print this email unless it is absolutely necessary. Spread 
environmental awareness.

 

From: siddharth verma [mailto:sidd.verma29.l...@gmail.com] 
 Sent: Wednesday, November 23, 2016 2:23 PM
 To: user@cassandra.apache.org
 Subject: Re: Cassandra Config as per server hardware for heavy write
 

Hi Abhishek,

You could check whether you are throttling on client side queries or on 
cassandra side.


You could also use grafana to monitor the cluster as well.


As you said, you are using 100 threads, it can't be sure whether you are 
throttling cassandra cluster to its max limit.


 


As Benjamin suggested, you could use cassandra stress tool.


 


Lastly, if after everything( and you are sure, that cassandra seems slow) the 
TPS comes out to be the numbers as you suggested, you could check you schema, 
many rows in one partition key, read queries, read write load, write queries 
with Batch/LWT, compactions running etc.


 


 


For checking ONLY cassandra throughput, you could use cassandra-stress with any 
schema of your choice.


 


Regards


 



 

On Wed, Nov 23, 2016 at 2:07 PM, Vladimir Yudovin <vla...@winguzone.com> 
wrote:

So do you see write speed saturation at this number of threads? Does doubling to 200 bring an increase?


 


 


Best regards, Vladimir Yudovin,


Winguzone - Cloud Cassandra Hosting, Zero production time



 


 


 On Wed, 23 Nov 2016 03:31:32 -0500Abhishek Kumar Maheshwari 
<abhishek.maheshw...@timesinternet.in> wrote 



 


No I am using 100 threads.

 

Thanks & Regards,
 Abhishek Kumar Maheshwari
 +91- 805591 (Mobile)
Times Internet Ltd. | A Times of India Group Company

FC - 6, Sector 16A, Film City,  Noida,  U.P. 201301 | INDIA

P Please do not print this email unless it is absolutely necessary. Spread 
environmental awareness.


 

From: Vladimir Yudovin [mailto:vla...@winguzone.com] 
 Sent: Wednesday, November 23, 2016 2:00 PM
 To: user <user@cassandra.apache.org>
 Subject: RE: Cassandra Config as per server hardware for heavy write


 

>I have 1Cr records in my Java ArrayList and yes I am writing in sync mode.


Is your Java program single threaded?


 


Best regards, Vladimir Yudovin,


Winguzone - Cloud Cassandra Hosting, Zero production time



 


 


 On Wed, 23 Nov 2016 03:09:29 -0500Abhishek Kumar Maheshwari 
<abhishek.maheshw...@timesinternet.in> wrote 



 


Hi Benjamin,

 

I have 1Cr records in my Java ArrayList and yes I am writing in sync mode. My 
table is as below:

 

CREATE TABLE XXX_YY_MMS (

date timestamp,

userid text,

time timestamp,

xid text,

addimid text,

advcid bigint,

algo bigint,

alla text,

aud text,

bmid text,

ctyid text,

bid double,

ctxid text,

devipid text,

gmid text,

ip text,

itcid bigint,

iid text,

metid bigint,

osdid text,

paid int,

position text,

pcid bigint,

refurl text,

sec text,

siid bigint,

tmpid bigint,

xforwardedfor text,

PRIMARY KEY (date, userid, time, xid)

) WITH CLUSTERING ORDER BY (userid ASC, time ASC, xid ASC)

AND bloom_filter_fp_chance = 0.01

AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'

AND comment = ''

AND compaction = {'class': 
'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'}

AND compression = {'sstable_compression': 
'org.apache.cassandra.io.compress.LZ4Compressor'}

AND dclocal_read_repair_chance = 0.1

AND default_time_to_live = 0

AND gc_grace_seconds = 864000

AND max_index_interval = 2048

AND memtable_flush_period_in_ms = 0

AND min_index_interval = 128

AND read_repair_chance = 0.0

AND speculative_retry = '99.0PERCENTILE';


RE: Cassandra Config as per server hardware for heavy write

2016-11-23 Thread Vladimir Yudovin
So do you see write speed saturation at this number of threads? Does doubling to 200 bring an increase?





Best regards, Vladimir Yudovin, 

Winguzone - Cloud Cassandra Hosting, Zero production time






 On Wed, 23 Nov 2016 03:31:32 -0500Abhishek Kumar Maheshwari 
<abhishek.maheshw...@timesinternet.in> wrote 




No I am using 100 threads.

 

Thanks & Regards,
 Abhishek Kumar Maheshwari
 +91- 805591 (Mobile)
Times Internet Ltd. | A Times of India Group Company

FC - 6, Sector 16A, Film City,  Noida,  U.P. 201301 | INDIA

P Please do not print this email unless it is absolutely necessary. Spread 
environmental awareness.


 

From: Vladimir Yudovin [mailto:vla...@winguzone.com] 
 Sent: Wednesday, November 23, 2016 2:00 PM
 To: user <user@cassandra.apache.org>
 Subject: RE: Cassandra Config as per server hardware for heavy write


 

>I have 1Cr records in my Java ArrayList and yes I am writing in sync mode.


Is your Java program single threaded?


 


Best regards, Vladimir Yudovin,


Winguzone - Cloud Cassandra Hosting, Zero production time



 


 


 On Wed, 23 Nov 2016 03:09:29 -0500Abhishek Kumar Maheshwari 
<abhishek.maheshw...@timesinternet.in> wrote 



 


Hi Benjamin,

 

I have 1Cr records in my Java ArrayList and yes I am writing in sync mode. My 
table is as below:

 

CREATE TABLE XXX_YY_MMS (

date timestamp,

userid text,

time timestamp,

xid text,

addimid text,

advcid bigint,

algo bigint,

alla text,

aud text,

bmid text,

ctyid text,

bid double,

ctxid text,

devipid text,

gmid text,

ip text,

itcid bigint,

iid text,

metid bigint,

osdid text,

paid int,

position text,

pcid bigint,

refurl text,

sec text,

siid bigint,

tmpid bigint,

xforwardedfor text,

PRIMARY KEY (date, userid, time, xid)

) WITH CLUSTERING ORDER BY (userid ASC, time ASC, xid ASC)

AND bloom_filter_fp_chance = 0.01

AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'

AND comment = ''

AND compaction = {'class': 
'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'}

AND compression = {'sstable_compression': 
'org.apache.cassandra.io.compress.LZ4Compressor'}

AND dclocal_read_repair_chance = 0.1

AND default_time_to_live = 0

AND gc_grace_seconds = 864000

AND max_index_interval = 2048

AND memtable_flush_period_in_ms = 0

AND min_index_interval = 128

AND read_repair_chance = 0.0

AND speculative_retry = '99.0PERCENTILE';

 

So please let me know what I miss?

 

And for this hardware below config is fine?

 

concurrent_reads: 32

concurrent_writes: 64

concurrent_counter_writes: 32

compaction_throughput_mb_per_sec: 32

concurrent_compactors: 8

 

thanks,

Abhishek

 

From: Benjamin Roth [mailto:benjamin.r...@jaumo.com] 
 Sent: Wednesday, November 23, 2016 12:56 PM
 To:  user@cassandra.apache.org
 Subject: Re: Cassandra Config as per server hardware for heavy write
 

This is ridiculously slow for that hardware setup. Sounds like you benchmark 
with a single thread and / or sync queries or very large writes.

A setup like this should be easily able to handle tens of thousands of writes / 
s



 

2016-11-23 8:02 GMT+01:00 Jonathan Haddad <j...@jonhaddad.com>:

How are you benchmarking that?

On Tue, Nov 22, 2016 at 9:16 PM Abhishek Kumar Maheshwari 
<abhishek.maheshw...@timesinternet.in> wrote:


Hi,

 

I have 8 servers in my Cassandra Cluster. Each server has 64 GB ram and 40 
Cores and 8 SSD. Currently I have below config in Cassandra.yaml:

 

concurrent_reads: 32

concurrent_writes: 64

concurrent_counter_writes: 32

compaction_throughput_mb_per_sec: 32

concurrent_compactors: 8

 

With this configuration, I can write 1700 Request/Sec per server.

 

But our desired write performance is 3000-4000 Request/Sec per server. As per 
my Understanding Max value for these parameters can be as below:

concurrent_reads: 32

concurrent_writes: 128(8*16 Corew)

concurrent_counter_writes: 32

compaction_throughput_mb_per_sec: 128

concurrent_compactors: 8 or 16 (as I have 8 SSD and 16 core reserve for this)

 

Please let me know this is fine or I need to tune some other parameters for 
speedup write.

 

 

Thanks & Regards,
 Abhishek Kumar Maheshwari
 +91- 805591 (Mobile)
Times Internet Ltd. | A Times of India Group Company

FC - 6, Sector 16A, Film City,  Noida,  U.P. 201301 | INDIA

P Please do not print this email unless it is absolutely necessary. Spread 
environmental awareness.

 



RE: Cassandra Config as per server hardware for heavy write

2016-11-23 Thread Vladimir Yudovin
>I have 1Cr records in my Java ArrayList and yes I am writing in sync mode.

Is your Java program single threaded?



Best regards, Vladimir Yudovin, 

Winguzone - Cloud Cassandra Hosting, Zero production time






 On Wed, 23 Nov 2016 03:09:29 -0500Abhishek Kumar Maheshwari 
<abhishek.maheshw...@timesinternet.in> wrote 




Hi Benjamin,

 

I have 1Cr records in my Java ArrayList and yes I am writing in sync mode. My 
table is as below:

 

CREATE TABLE XXX_YY_MMS (

date timestamp,

userid text,

time timestamp,

xid text,

addimid text,

advcid bigint,

algo bigint,

alla text,

aud text,

bmid text,

ctyid text,

bid double,

ctxid text,

devipid text,

gmid text,

ip text,

itcid bigint,

iid text,

metid bigint,

osdid text,

paid int,

position text,

pcid bigint,

refurl text,

sec text,

siid bigint,

tmpid bigint,

xforwardedfor text,

PRIMARY KEY (date, userid, time, xid)

) WITH CLUSTERING ORDER BY (userid ASC, time ASC, xid ASC)

AND bloom_filter_fp_chance = 0.01

AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'

AND comment = ''

AND compaction = {'class': 
'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'}

AND compression = {'sstable_compression': 
'org.apache.cassandra.io.compress.LZ4Compressor'}

AND dclocal_read_repair_chance = 0.1

AND default_time_to_live = 0

AND gc_grace_seconds = 864000

AND max_index_interval = 2048

AND memtable_flush_period_in_ms = 0

AND min_index_interval = 128

AND read_repair_chance = 0.0

AND speculative_retry = '99.0PERCENTILE';

 

So please let me know what I am missing.

 

And for this hardware below config is fine?

 

concurrent_reads: 32

concurrent_writes: 64

concurrent_counter_writes: 32

compaction_throughput_mb_per_sec: 32

concurrent_compactors: 8

 

thanks,

Abhishek

 

From: Benjamin Roth [mailto:benjamin.r...@jaumo.com] 
 Sent: Wednesday, November 23, 2016 12:56 PM
 To: user@cassandra.apache.org
 Subject: Re: Cassandra Config as per server hardware for heavy write
 

This is ridiculously slow for that hardware setup. Sounds like you're benchmarking with a single thread and/or synchronous queries, or with very large writes.

A setup like this should easily be able to handle tens of thousands of writes/s.



 

2016-11-23 8:02 GMT+01:00 Jonathan Haddad <j...@jonhaddad.com>:

How are you benchmarking that?

On Tue, Nov 22, 2016 at 9:16 PM Abhishek Kumar Maheshwari 
<abhishek.maheshw...@timesinternet.in> wrote:


Hi,

 

I have 8 servers in my Cassandra Cluster. Each server has 64 GB ram and 40 
Cores and 8 SSD. Currently I have below config in Cassandra.yaml:

 

concurrent_reads: 32

concurrent_writes: 64

concurrent_counter_writes: 32

compaction_throughput_mb_per_sec: 32

concurrent_compactors: 8

 

With this configuration, I can write 1700 Request/Sec per server.

 

But our desired write performance is 3000-4000 Request/Sec per server. As per 
my Understanding Max value for these parameters can be as below:

concurrent_reads: 32

concurrent_writes: 128 (8 * 16 cores)

concurrent_counter_writes: 32

compaction_throughput_mb_per_sec: 128

concurrent_compactors: 8 or 16 (as I have 8 SSD and 16 core reserve for this)

 

Please let me know if this is fine or if I need to tune some other parameters to speed up writes.

 

 

Thanks & Regards,
 Abhishek Kumar Maheshwari
 +91- 805591 (Mobile)
Times Internet Ltd. | A Times of India Group Company

FC - 6, Sector 16A, Film City,  Noida,  U.P. 201301 | INDIA


 














 


--



Benjamin Roth

Prokurist



Jaumo GmbH · www.jaumo.com

Wehrstraße 46 · 73035 Göppingen · Germany

Phone +49 7161 304880-6 · Fax +49 7161 304880-1

AG Ulm · HRB 731058 · Managing Director: Jens Kammerer














Re: How to Choose a Version for Upgrade

2016-11-23 Thread Vladimir Yudovin
Hi Shalom,



there has been a lot of discussion on this topic, but it seems that for now we can call the 3.0.x line the most stable. If you don't need a specific feature from the 3.x line, take 3.0.10.





Best regards, Vladimir Yudovin, 

Winguzone - Cloud Cassandra Hosting






 On Wed, 23 Nov 2016 03:14:37 -0500 Shalom Sagges 
<shal...@liveperson.com> wrote 




Hi Everyone, 



I was wondering how to choose the proper, most stable Cassandra version for a 
Production environment. 

Should I follow the version that's used in Datastax Enterprise (in this case 
3.0.10) or is there a better way of figuring this out?



Thanks!



 


 

















Re: single instance failover

2016-11-22 Thread Vladimir Yudovin
Sorry, probably I didn't catch your setup fully.



Would you like to use a shared data folder for both nodes, assuming you never run two Cassandra processes simultaneously?

Well, I guess it's possible. Running two Cassandra instances on the same data folder together won't work, so prevent this situation, maybe with some sort of file locking.
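For illustration, a minimal sketch of such a guard with java.nio file locking — the path is hypothetical, and note that lock semantics on network filesystems are not always reliable:

import java.io.File;
import java.io.RandomAccessFile;
import java.nio.channels.FileLock;

// Sketch: take an exclusive lock on a marker file in the shared data
// directory before starting Cassandra, so a second instance bails out.
public class SharedDirGuard {
    public static void main(String[] args) throws Exception {
        RandomAccessFile raf =
            new RandomAccessFile(new File("/cassandra/state/database/.lock"), "rw");
        FileLock lock = raf.getChannel().tryLock();
        if (lock == null) {                      // held by another process
            System.err.println("Another node holds the lock; refusing to start.");
            System.exit(1);
        }
        // ... launch Cassandra here; the lock is released when this JVM exits.
    }
}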



>multinode Cassandra for Node B is not free

Sure, but besides higher reliability you also get an increase in read query speed (with consistency ONE).



Best regards, Vladimir Yudovin, 

Winguzone - Cloud Cassandra Hosting, Zero production time






 On Tue, 22 Nov 2016 14:28:33 -0500 Lou DeGenaro 
<lou.degen...@gmail.com> wrote 




Yes, change rpc_address to node B.


Immutability aside, if Node A Cassandra and Node B Cassandra are using the same 
directory on the same shared filesystem, let's call it 
/cassandra/state/database, would that not be a problem?  Or said differently, 
does not Node A need its own writable place /cassandra/state/database/nodeA and 
likewise /cassandra/state/database/nodeB for Node B's writable place?



Multinode Cassandra may not always be available due to resource constraints.  
Presumably multinode Cassandra for Node B is not free: it takes up network, 
cpu, and replicated disk space, no?


Lou.



On 2016-11-22 11:10 (-0500), Vladimir Yudovin <v...@winguzone.com> wrote: 

> Hi Lou,
>
> do you mean you set rpc_address (or broadcast_rpc_address) to Node_B_IP on
> the second machine?
>
> >there would be potential database corruption, no?
>
> Well, since SSTables are immutable, it can lead to unpredictable behavior, I
> guess. I don't believe anybody has tested such a setup before.
>
> >Is there any guidance on single instance failover?
>
> I never saw one; the main Cassandra idea is that you build a multinode
> cluster.
>
> Any specific reason why you can't use two nodes as a single cluster?











Re: single instance failover

2016-11-22 Thread Vladimir Yudovin
Hi Lou,



do you mean you set rpc_address (or broadcast_rpc_address) to Node_B_IP on the second machine?



>there would be potential database corruption, no?

Well, since SSTables are immutable, it can lead to unpredictable behavior, I guess. I don't believe anybody has tested such a setup before.



>Is there any guidance on single instance failover?

I never saw one; the main Cassandra idea is that you build a multinode cluster.



Any specific reason why you can't use two nodes as a single cluster?



Best regards, Vladimir Yudovin, 

Winguzone - Cloud Cassandra Hosting, zero production time.






 On Tue, 22 Nov 2016 09:25:52 -0500 Lou DeGenaro 
<lou.degen...@gmail.com> wrote 




We use a single instance of Cassandra on Node A that employs a shared file 
system to keep its data and logs.


Let's say we want to fail-over to Node B, by editing the yaml file by changing 
Node A to Node B.  If we now (mistakenly) bring up Cassandra on Node B whilst 
the Cassandra on Node A is still running, there would be potential database 
corruption, no?


Is there any guidance on single instance failover?


Thanks.


Lou.









RE: cassandra documentation (Multiple datacenter write requests) question

2016-11-22 Thread Vladimir Yudovin
Can the Apache Cassandra community update this documentation?

I don't think so; it's hosted on the DataStax website and it's not a public wiki.



Anyway, now you know what the right quorum calculation formula is ))).
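For reference, the documented formula as a tiny Java sketch — quorum(3, 3) evaluates to 4, which is why the graph showing only 3 nodes is wrong:

// The quorum formula from the docs; integer division matches the docs.
public class Quorum {
    static int quorum(int... dcReplicationFactors) {
        int sum = 0;
        for (int rf : dcReplicationFactors) sum += rf;
        return sum / 2 + 1;
    }

    public static void main(String[] args) {
        System.out.println(quorum(3, 3)); // prints 4
    }
}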



Best regards, Vladimir Yudovin, 

Winguzone - Hosted Cloud Cassandra
Launch your cluster in minutes.





 On Tue, 22 Nov 2016 09:01:32 -0500 CHAUMIER, RAPHAËL 
<racha...@bouyguestelecom.fr> <racha...@bouyguestelecom.fr> wrote 





Thank you Hannu,

 

Can the Apache Cassandra community update this documentation?

 

From: Hannu Kröger [mailto:hkro...@gmail.com] 
 Sent: Tuesday, 22 November 2016 14:48
 To: user@cassandra.apache.org
 Subject: Re: cassandra documentation (Multiple datacenter write requests) question


 

Looks like the graph is wrong.

 


Hannu


 

On 22 Nov 2016, at 15.43, CHAUMIER, RAPHAËL <racha...@bouyguestelecom.fr> 
wrote:


 

Hello everyone,


 


I don’t know if you have access to DataStax documentation. I don’t understand 
the example about Multiple datacenter write requests 
(http://docs.datastax.com/en/cassandra/3.0/cassandra/dml/dmlClientRequestsMultiDCWrites.html).
 The graph shows there’s 3 nodes making up of QUORUM, but based on the quorum 
computation rule 
(http://docs.datastax.com/en/cassandra/3.0/cassandra/dml/dmlConfigConsistency.html#dmlConfigConsistency__about-the-quorum-level)


 


quorum = (sum_of_replication_factors / 2) + 1

sum_of_replication_factors = datacenter1_RF + datacenter2_RF + . . . + 
datacentern_RF
 


If I have 2 DCs of 3 replica nodes, then the quorum should be = ((3+3)/2) + 1 = (6/2) + 1 = 3 + 1 = 4


 


Am I missing something ?


 


Thanks for your response.


 


Regards,


 


 






 










Re: Cassandra Encryption

2016-11-22 Thread Vladimir Yudovin
>if I use the same certificate how does it help?

This certificate will be recognized by all existing nodes, and no restart will 
be needed.



Or, as Nate suggested, you can use a trusted root certificate to issue the nodes' certificates.





Best regards, Vladimir Yudovin, 

Winguzone - Hosted Cloud Cassandra
Launch your cluster in minutes.





 On Tue, 22 Nov 2016 03:07:28 -0500 Jai Bheemsen Rao Dhanwada 
<jaibheem...@gmail.com> wrote 




Yes, I am generating a separate certificate for each node.

Even if I use the same certificate, how does it help?




On Mon, Nov 21, 2016 at 9:02 PM, Vladimir Yudovin <vla...@winguzone.com> 
wrote:








Hi Jai,



so do you generate a separate certificate for each node? Why not use one certificate for all nodes?



Best regards, Vladimir Yudovin, 

Winguzone - Hosted Cloud Cassandra
Launch your cluster in minutes.





 On Mon, 21 Nov 2016 17:25:11 -0500 Jai Bheemsen Rao Dhanwada 
<jaibheem...@gmail.com> wrote 




Hello,



I am setting up encryption on one of my Cassandra clusters using the below procedure.



server_encryption_options:

internode_encryption: all

keystore: /etc/keystore

keystore_password: x

truststore: /etc/truststore

truststore_password: x




http://docs.oracle.com/javase/6/docs/technotes/guides/security/jsse/JSSERefGuide.html#CreateKeystore



However, one difficulty with this approach is that whenever I add a new node I have to do a rolling restart of all the C* nodes in the cluster, so that the truststore is updated with the new server's information.



Is there a way to automatically trigger a reload so that the truststore is updated on the existing machines without a restart?



Can someone please help ?
















Re: Cassandra Encryption

2016-11-21 Thread Vladimir Yudovin
Hi Jai,



so do you generate a separate certificate for each node? Why not use one certificate for all nodes?



Best regards, Vladimir Yudovin, 

Winguzone - Hosted Cloud Cassandra
Launch your cluster in minutes.





 On Mon, 21 Nov 2016 17:25:11 -0500 Jai Bheemsen Rao Dhanwada 
<jaibheem...@gmail.com> wrote 




Hello,



I am setting up encryption on one of my Cassandra clusters using the below procedure.



server_encryption_options:

internode_encryption: all

keystore: /etc/keystore

keystore_password: x

truststore: /etc/truststore

truststore_password: x




http://docs.oracle.com/javase/6/docs/technotes/guides/security/jsse/JSSERefGuide.html#CreateKeystore



However, one difficulty with this approach is that whenever I add a new node I have to do a rolling restart of all the C* nodes in the cluster, so that the truststore is updated with the new server's information.



Is there a way to automatically trigger a reload so that the truststore is updated on the existing machines without a restart?



Can someone please help ?









Re: NoHostAvailableException

2016-11-21 Thread Vladimir Yudovin
Hi,



as I mentioned about rpc_address: 0.0.0.0, the YAML says:

"it is allowed to specify 0.0.0.0 ... but that will break clients that rely on node auto-discovery."

Try setting rpc_address to the node's external IP.





Best regards, Vladimir Yudovin, 

Winguzone - Hosted Cloud Cassandra
Launch your cluster in minutes.





 On Mon, 21 Nov 2016 11:52:31 -0500 techpyaasa . 
<techpya...@gmail.com> wrote 




Hi Vladimir,



I have attached the cassandra.yaml we have in our setup, please check once.

- do you have native port 9042 open in firewall?
Yes, 9042 is opened on our firewall, checked with our team.

- Can you connect to cluster with cqlsh?

Yes, I'm able to connect to the cluster using cqlsh.

What else could be the issue? :(








On Mon, Nov 21, 2016 at 7:23 PM, Vladimir Yudovin <vla...@winguzone.com> 
wrote:








Yaml in 2.0.17 says

# The address to bind the Thrift RPC service and native transport

# server -- clients connect here.

#

# Leaving this blank has the same effect it does for ListenAddress,

# (i.e. it will be based on the configured hostname of the node).

#

# Note that unlike ListenAddress above, it is allowed to specify 0.0.0.0

# here if you want to listen on all interfaces, but that will break clients 

# that rely on node auto-discovery.

#

# For security reasons, you should not expose this port to the internet.  
Firewall it if needed.

rpc_address: localhost

# port for Thrift to listen for clients on

rpc_port: 9160




So probably rpc_address: 0.0.0.0 is a problem. Also, do you have native port 9042 open in the firewall (if there is one)?

Can you connect to cluster with cqlsh?



Best regards, Vladimir Yudovin, 

Winguzone - Hosted Cloud Cassandra
Launch your cluster in minutes.






 On Mon, 21 Nov 2016 08:26:54 -0500 techpyaasa . 
<techpya...@gmail.com> wrote 




Sorry, it was a typo.

It is broadcast_address and not broadcast_rpc_address.
Also, there is no broadcast_rpc_address configuration in cassandra.yaml in C*-2.0.17.
Very sorry once again.



This is the configuration I have in cassandra.yaml:



listen_address: [external IP]

# Address to broadcast to other Cassandra nodes

# Leaving this blank will set it to the same value as listen_address

# broadcast_address: 1.2.3.4  # It is commented out; I have not made any changes to it

rpc_address: 0.0.0.0
rpc_port: 9160

Thanks
TechPyaasa







On Mon, Nov 21, 2016 at 6:48 PM, Vladimir Yudovin <vla...@winguzone.com> 
wrote:








Not broadcast_address, but broadcast_rpc_address (you gave this example: rpc_address: 0.0.0.0, broadcast_rpc_address: 1.2.3.4)





Best regards, Vladimir Yudovin, 

Winguzone - Hosted Cloud Cassandra
Launch your cluster in minutes.






 On Mon, 21 Nov 2016 08:14:38 -0500 techpyaasa . 
<techpya...@gmail.com> wrote 




Hi Vladimir,



I have not modified anything for broadcast_address, I left it as it was:


# Leaving this blank will set it to the same value as listen_address
# broadcast_address: 1.2.3.4



So the comment above says "Leaving this blank will set it to the same value as listen_address", so it should be set to listen_address, and I have set listen_address to the external IP on all nodes.

So I guess that should not be a problem... :(



What else could be the issue...??  :( :(





On Mon, Nov 21, 2016 at 4:21 PM, Vladimir Yudovin <vla...@winguzone.com> 
wrote:








Try to set broadcast_rpc_address on each node to its real external IP address.



Best regards, Vladimir Yudovin, 

Winguzone - Hosted Cloud Cassandra
Launch your cluster in minutes.





 On Mon, 21 Nov 2016 05:47:00 -0500 techpyaasa . 
<techpya...@gmail.com> wrote 




The following exception is intermittently thrown by the DataStax Java driver even though all nodes are up. (Happening for both read & write queries.)

"Exception com.datastax.driver.core.exceptions.NoHostAvailableException: All 
host(s) tried for query failed (no host was tried) at 
com.datastax.driver.core.exceptions.NoHostAvailableException.copy(NoHostAvailableException.java:84)
 at 
com.datastax.driver.core.DriverThrowables.propagateCause(DriverThrowables.java:37)
 at 
com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:214)
 at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:52) 
at"

Using c*-2.0.17 , datastax java driver - cassandra-driver-core-2.1.8.jar.

In cassandra.yaml the following were set:
rpc_address: 0.0.0.0, broadcast_rpc_address: 1.2.3.4

Has anyone faced such an issue? What could be the reason and fix for it?

Thanks in advance


Techpyaasa.





























Re: NoHostAvailableException

2016-11-21 Thread Vladimir Yudovin
Yaml in 2.0.17 says

# The address to bind the Thrift RPC service and native transport

# server -- clients connect here.

#

# Leaving this blank has the same effect it does for ListenAddress,

# (i.e. it will be based on the configured hostname of the node).

#

# Note that unlike ListenAddress above, it is allowed to specify 0.0.0.0

# here if you want to listen on all interfaces, but that will break clients 

# that rely on node auto-discovery.

#

# For security reasons, you should not expose this port to the internet.  
Firewall it if needed.

rpc_address: localhost

# port for Thrift to listen for clients on

rpc_port: 9160




So probably rpc_address: 0.0.0.0 is a problem. Also, do you have native port 9042 open in the firewall (if there is one)?

Can you connect to cluster with cqlsh?



Best regards, Vladimir Yudovin, 

Winguzone - Hosted Cloud Cassandra
Launch your cluster in minutes.





 On Mon, 21 Nov 2016 08:26:54 -0500 techpyaasa . 
<techpya...@gmail.com> wrote 




Sorry, it was a typo.

It is broadcast_address and not broadcast_rpc_address.
Also, there is no broadcast_rpc_address configuration in cassandra.yaml in C*-2.0.17.
Very sorry once again.



This is the configuration I have in cassandra.yaml:



listen_address: [external IP]

# Address to broadcast to other Cassandra nodes

# Leaving this blank will set it to the same value as listen_address

# broadcast_address: 1.2.3.4  # It is commented out; I have not made any changes to it

rpc_address: 0.0.0.0
rpc_port: 9160

Thanks
TechPyaasa







On Mon, Nov 21, 2016 at 6:48 PM, Vladimir Yudovin <vla...@winguzone.com> 
wrote:








Not broadcast_address, but broadcast_rpc_address (you gave this example: rpc_address: 0.0.0.0, broadcast_rpc_address: 1.2.3.4)





Best regards, Vladimir Yudovin, 

Winguzone - Hosted Cloud Cassandra
Launch your cluster in minutes.






 On Mon, 21 Nov 2016 08:14:38 -0500 techpyaasa . 
<techpya...@gmail.com> wrote 




Hi Vladimir,



I have not modified anything for broadcast_address, I left it as it was:


# Leaving this blank will set it to the same value as listen_address
# broadcast_address: 1.2.3.4



So the comment above says "Leaving this blank will set it to the same value as listen_address", so it should be set to listen_address, and I have set listen_address to the external IP on all nodes.

So I guess that should not be a problem... :(



What else could be the issue...??  :( :(





On Mon, Nov 21, 2016 at 4:21 PM, Vladimir Yudovin <vla...@winguzone.com> 
wrote:








Try to set broadcast_rpc_address on each node to its real external IP address.



Best regards, Vladimir Yudovin, 

Winguzone - Hosted Cloud Cassandra
Launch your cluster in minutes.





 On Mon, 21 Nov 2016 05:47:00 -0500 techpyaasa . 
<techpya...@gmail.com> wrote 




The following exception is intermittently thrown by the DataStax Java driver even though all nodes are up. (Happening for both read & write queries.)

"Exception com.datastax.driver.core.exceptions.NoHostAvailableException: All 
host(s) tried for query failed (no host was tried) at 
com.datastax.driver.core.exceptions.NoHostAvailableException.copy(NoHostAvailableException.java:84)
 at 
com.datastax.driver.core.DriverThrowables.propagateCause(DriverThrowables.java:37)
 at 
com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:214)
 at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:52) 
at"

Using c*-2.0.17 , datastax java driver - cassandra-driver-core-2.1.8.jar.

In cassandra.yaml the following were set:
rpc_address: 0.0.0.0, broadcast_rpc_address: 1.2.3.4

Has anyone faced such an issue? What could be the reason and fix for it?

Thanks in advance


Techpyaasa.






















Re: NoHostAvailableException

2016-11-21 Thread Vladimir Yudovin
Not broadcast_address, but broadcast_rpc_address (you gave this example: rpc_address: 0.0.0.0, broadcast_rpc_address: 1.2.3.4)





Best regards, Vladimir Yudovin, 

Winguzone - Hosted Cloud Cassandra
Launch your cluster in minutes.





 On Mon, 21 Nov 2016 08:14:38 -0500 techpyaasa . 
<techpya...@gmail.com> wrote 




Hi Vladimir,



I have not modified anything for broadcast_address, I left it as it was:


# Leaving this blank will set it to the same value as listen_address
# broadcast_address: 1.2.3.4



So the comment above says "Leaving this blank will set it to the same value as listen_address", so it should be set to listen_address, and I have set listen_address to the external IP on all nodes.

So I guess that should not be a problem... :(



What else could be the issue...??  :( :(





On Mon, Nov 21, 2016 at 4:21 PM, Vladimir Yudovin <vla...@winguzone.com> 
wrote:








Try to set broadcast_rpc_address on each node to its real external IP address.



Best regards, Vladimir Yudovin, 

Winguzone - Hosted Cloud Cassandra
Launch your cluster in minutes.





 On Mon, 21 Nov 2016 05:47:00 -0500 techpyaasa . 
<techpya...@gmail.com> wrote 




The following exception is intermittently thrown by the DataStax Java driver even though all nodes are up. (Happening for both read & write queries.)

"Exception com.datastax.driver.core.exceptions.NoHostAvailableException: All 
host(s) tried for query failed (no host was tried) at 
com.datastax.driver.core.exceptions.NoHostAvailableException.copy(NoHostAvailableException.java:84)
 at 
com.datastax.driver.core.DriverThrowables.propagateCause(DriverThrowables.java:37)
 at 
com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:214)
 at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:52) 
at"

Using c*-2.0.17 , datastax java driver - cassandra-driver-core-2.1.8.jar.

In cassandra.yaml the following were set:
rpc_address: 0.0.0.0, broadcast_rpc_address: 1.2.3.4

Has anyone faced such an issue? What could be the reason and fix for it?

Thanks in advance


Techpyaasa.















Re: Out of memory and/or OOM kill on a cluster

2016-11-21 Thread Vladimir Yudovin
Did you try any value in the range 8-20 GB (e.g. 60-70% of physical memory)?

Also, how many tables do you have across all keyspaces? Each table can consume a minimum of ~1 MB of Java heap.
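One quick way to check is to count the tables defined on the cluster. A minimal sketch with the Java driver against the 2.1 schema tables — the contact point is hypothetical, and in 3.x the equivalent table is system_schema.tables:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

// Sketch: count tables across all keyspaces on a 2.1 cluster.
public class TableCount {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("10.0.0.1").build();
             Session session = cluster.connect()) {
            long n = session.execute(
                "SELECT count(*) FROM system.schema_columnfamilies").one().getLong(0);
            System.out.println("Tables defined: " + n);
        }
    }
}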



Best regards, Vladimir Yudovin, 

Winguzone - Hosted Cloud Cassandra
Launch your cluster in minutes.





 On Mon, 21 Nov 2016 05:13:12 -0500 Vincent Rischmann 
<m...@vrischmann.me> wrote 




Hello,



we have a 8 node Cassandra 2.1.15 cluster at work which is giving us a lot of 
trouble lately.



The problem is simple: nodes regularly die because of an out of memory 
exception or the Linux OOM killer decides to kill the process.

A couple of weeks ago we increased the heap to 20Gb, hoping it would solve the out of memory errors, but in fact it didn't; instead of getting an out of memory exception, the OOM killer killed the JVM.



We reduced the heap on some nodes to 8Gb to see if it would work better, but 
some nodes crashed again with out of memory exception.



I suspect some of our tables are badly modelled, which would cause Cassandra to allocate a lot of data; however, I don't know how to prove that and/or find which table is bad, and which query is responsible.



I tried looking at metrics in JMX, and tried profiling using mission control 
but it didn't really help; it's possible I missed it because I have no idea 
what to look for exactly.



Anyone have some advice for troubleshooting this ?



Thanks.








Re: NoHostAvailableException

2016-11-21 Thread Vladimir Yudovin
Try to set broadcast_rpc_address on each node to its real external IP address.



Best regards, Vladimir Yudovin, 

Winguzone - Hosted Cloud Cassandra
Launch your cluster in minutes.





 On Mon, 21 Nov 2016 05:47:00 -0500 techpyaasa . 
<techpya...@gmail.com> wrote 




The following exception is intermittently thrown by the DataStax Java driver even though all nodes are up. (Happening for both read & write queries.)

"Exception com.datastax.driver.core.exceptions.NoHostAvailableException: All 
host(s) tried for query failed (no host was tried) at 
com.datastax.driver.core.exceptions.NoHostAvailableException.copy(NoHostAvailableException.java:84)
 at 
com.datastax.driver.core.DriverThrowables.propagateCause(DriverThrowables.java:37)
 at 
com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:214)
 at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:52) 
at"

Using c*-2.0.17 , datastax java driver - cassandra-driver-core-2.1.8.jar.

In cassandra.yaml the following were set:
rpc_address: 0.0.0.0, broadcast_rpc_address: 1.2.3.4

Has anyone faced such an issue? What could be the reason and fix for it?

Thanks in advance


Techpyaasa.








Re: data not replicated on new node

2016-11-20 Thread Vladimir Yudovin
>try SELECT for some undoubtedly old data with consistency ALL

It's worth turning tracing on for this query.
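A minimal sketch of such a traced read at consistency ALL with the Java driver — table, key and contact point are hypothetical:

import com.datastax.driver.core.*;

// Sketch: read one old row at CL ALL with tracing enabled, to see which
// replicas answered and whether the new node served the data.
public class TracedRead {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("10.0.0.1").build();
             Session session = cluster.connect("mydata")) {
            Statement st = new SimpleStatement("SELECT * FROM mytable WHERE id = 42")
                .setConsistencyLevel(ConsistencyLevel.ALL)
                .enableTracing();
            ResultSet rs = session.execute(st);
            QueryTrace trace = rs.getExecutionInfo().getQueryTrace();
            for (QueryTrace.Event e : trace.getEvents())
                System.out.println(e.getSource() + " " + e.getDescription());
        }
    }
}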



Best regards, Vladimir Yudovin, 

Winguzone - Hosted Cloud Cassandra
Launch your cluster in minutes.





 On Mon, 21 Nov 2016 02:05:35 -0500 Vladimir Yudovin 
<vla...@winguzone.com> wrote 




>can the node "talk" with the others? (i.e. telnet to the other nodes on port 7000).

According to nodetool status all nodes are joined and UP, so they seem to be able to talk to one another (though there can still be connectivity issues). Are all nodes on an internal network (e.g. 10.x.x.x) in the same location?



>I suspect that ... the old data stored on the first 2 nodes are not replicated on the new node.

Can you try a SELECT for some undoubtedly old data with consistency ALL (just one row is enough)?



Also, the whole event sequence is not clear: you had two nodes with data, then added a third. When did you change the replication factor for the existing keyspace? Or was it created with factor three?



Best regards, Vladimir Yudovin, 

Winguzone - Hosted Cloud Cassandra
Launch your cluster in minutes.





 On Mon, 21 Nov 2016 01:28:54 -0500 Shalom Sagges 
<shal...@liveperson.com> wrote 




I believe the logs should show you what the issue is. 

Also, can the node "talk" with the others? (i.e. telnet to the other nodes on 
port 7000). 







 









On Sun, Nov 20, 2016 at 8:50 PM, Bertrand Brelier 
<bertrand.brel...@gmail.com> wrote:











Hello Jonathan,



No, the new node is not a seed in my cluster. 



When I ran nodetool bootstrap resume

Node is already bootstrapped.



Cheers,



Bertrand




On Sun, Nov 20, 2016 at 1:43 PM, Jonathan Haddad <j...@jonhaddad.com> 
wrote:

Did you add the new node as a seed? If you did, it wouldn't bootstrap, and you 
should run repair. 

On Sun, Nov 20, 2016 at 10:36 AM Bertrand Brelier 
<bertrand.brel...@gmail.com> wrote:

Hello everybody,



I am using a 3-node Cassandra cluster with Cassandra 3.0.10.



I recently added a new node (to make it a 3-node cluster).



I am using a replication factor of 3 , so I expected to have a copy of

the same data on each node :



CREATE KEYSPACE mydata WITH replication = {'class': 'SimpleStrategy',

'replication_factor': '3'}  AND durable_writes = true;



But the new node has less data than the other 2:



Datacenter: datacenter1

===

Status=Up/Down

|/ State=Normal/Leaving/Joining/Moving

--  Address   Load   Tokens   Owns (effective)  Host

ID   Rack

UN  XXX.XXX.XXX.XXX  53.28 GB   256  100.0% xx  rack1

UN  XXX.XXX.XXX.XXX  64.7 GB256  100.0% xx  rack1

UN  XXX.XXX.XXX.XXX  1.28 GB256  100.0% xx  rack1





On the new node :



/XX/data-6d674a40efab11e5b67e6d75503d5d02/:

total 1.2G



on one of the old nodes :



/XX/data-6d674a40efab11e5b67e6d75503d5d02/:

total 52G





I am monitoring the amount of data on each node, and they grow at the

same rate. So I suspect that my new data are replicated on the 3 nodes

but the old data stored on the first 2 nodes are not replicated on the

new node.



I ran nodetool repair (on each node, one at a time), but the new node

still does not have a copy of the old data.



Could you please help me understand why the old data is not replicated

to the new node ? Please let me know if you need further information.



Thank you,



Cheers,



Bertrand



















Re: data not replicated on new node

2016-11-20 Thread Vladimir Yudovin
>can the node "talk" with the others? (i.e. telnet to the other nodes on port 7000).

According to nodetool status all nodes are joined and UP, so they seem to be able to talk to one another (though there can still be connectivity issues). Are all nodes on an internal network (e.g. 10.x.x.x) in the same location?



>I suspect that ... the old data stored on the first 2 nodes are not replicated on the new node.

Can you try a SELECT for some undoubtedly old data with consistency ALL (just one row is enough)?



Also, the whole event sequence is not clear: you had two nodes with data, then added a third. When did you change the replication factor for the existing keyspace? Or was it created with factor three?



Best regards, Vladimir Yudovin, 

Winguzone - Hosted Cloud Cassandra
Launch your cluster in minutes.





 On Mon, 21 Nov 2016 01:28:54 -0500 Shalom Sagges 
<shal...@liveperson.com> wrote 




I believe the logs should show you what the issue is. 

Also, can the node "talk" with the others? (i.e. telnet to the other nodes on 
port 7000). 




 


 

 







On Sun, Nov 20, 2016 at 8:50 PM, Bertrand Brelier 
<bertrand.brel...@gmail.com> wrote:











Hello Jonathan,



No, the new node is not a seed in my cluster. 



When I ran nodetool bootstrap resume

Node is already bootstrapped.



Cheers,



Bertrand




On Sun, Nov 20, 2016 at 1:43 PM, Jonathan Haddad <j...@jonhaddad.com> 
wrote:

Did you add the new node as a seed? If you did, it wouldn't bootstrap, and you 
should run repair. 

On Sun, Nov 20, 2016 at 10:36 AM Bertrand Brelier 
<bertrand.brel...@gmail.com> wrote:

Hello everybody,

 

 I am using a 3-node Cassandra cluster with Cassandra 3.0.10.

 

 I recently added a new node (to make it a 3-node cluster).

 

 I am using a replication factor of 3 , so I expected to have a copy of

 the same data on each node :

 

 CREATE KEYSPACE mydata WITH replication = {'class': 'SimpleStrategy',

 'replication_factor': '3'}  AND durable_writes = true;

 

But the new node has less data than the other 2:

 

 Datacenter: datacenter1

 ===

 Status=Up/Down

 |/ State=Normal/Leaving/Joining/Moving

 --  Address   Load   Tokens   Owns (effective)  Host

 ID   Rack

 UN  XXX.XXX.XXX.XXX  53.28 GB   256  100.0% xx  rack1

 UN  XXX.XXX.XXX.XXX  64.7 GB256  100.0% xx  rack1

 UN  XXX.XXX.XXX.XXX  1.28 GB256  100.0% xx  rack1

 

 

 On the new node :

 

 /XX/data-6d674a40efab11e5b67e6d75503d5d02/:

 total 1.2G

 

 on one of the old nodes :

 

 /XX/data-6d674a40efab11e5b67e6d75503d5d02/:

 total 52G

 

 

 I am monitoring the amount of data on each node, and they grow at the

 same rate. So I suspect that my new data are replicated on the 3 nodes

 but the old data stored on the first 2 nodes are not replicated on the

 new node.

 

 I ran nodetool repair (on each node, one at a time), but the new node

 still does not have a copy of the old data.

 

 Could you please help me understand why the old data is not replicated

 to the new node ? Please let me know if you need further information.

 

 Thank you,

 

 Cheers,

 

 Bertrand

 












Re: Is Centos 7 Supported for Version 3.0

2016-11-20 Thread Vladimir Yudovin
Hi,



CentOS 7 has Java 8 available, so there shouldn't be any problem running Cassandra.



Best regards, Vladimir Yudovin, 

Winguzone - Hosted Cloud Cassandra
Launch your cluster in minutes.





 On Sun, 20 Nov 2016 11:14:07 -0500 Shalom Sagges 
<shal...@liveperson.com> wrote 




Hi Guys, 



A simple question for which I couldn't find an answer in the docs. 

Is Centos 7 supported on DataStax Community Edition v3.0.9?



Thanks!



 


 

 

















Re: [RELEASE] Apache Cassandra 3.0.10 released

2016-11-17 Thread Vladimir Yudovin
>My question was about a different option named "offheap_objects".



Sorry.



Best regards, Vladimir Yudovin, 

Winguzone - Hosted Cloud Cassandra
Launch your cluster in minutes.





 On Thu, 17 Nov 2016 07:56:10 -0500 Oleksandr Shulgin 
<oleksandr.shul...@zalando.de> wrote 




On Thu, Nov 17, 2016 at 1:05 PM, Vladimir Yudovin <vla...@winguzone.com> 
wrote:



Hi,

>Does this mean that offheap_objects is still available or that there is no 
longer support for offheap memtables in version 3.0?





If you set offheap_buffers in cassandra.yaml in 3.0.10, you'll get the exception

"offheap_buffers are not available in 3.0. They will be re-introduced in a future release, see https://issues.apache.org/jira/browse/CASSANDRA-9472 for details"

on Cassandra start.





Thank you.  My question was about a different option named "offheap_objects".  
I've already figured out that it was removed even earlier with the release of 
3.0.



--

Alex













Re: [RELEASE] Apache Cassandra 3.0.10 released

2016-11-17 Thread Vladimir Yudovin
Hi,

>Does this mean that offheap_objects is still available or that there is no 
longer support for offheap memtables in version 3.0?



If you set offheap_buffers in cassandra.yaml in 3.0.10, you'll get the exception

"offheap_buffers are not available in 3.0. They will be re-introduced in a future release, see https://issues.apache.org/jira/browse/CASSANDRA-9472 for details"

on Cassandra start.


Best regards, Vladimir Yudovin, 

Winguzone - Hosted Cloud Cassandra
Launch your cluster in minutes.





 On Thu, 17 Nov 2016 06:02:15 -0500 Oleksandr Shulgin 
<oleksandr.shul...@zalando.de> wrote 




On Wed, Nov 16, 2016 at 9:17 PM, Michael Shuler <mich...@pbandjelly.org> 
wrote:

>

> The Cassandra team is pleased to announce the release of Apache

> Cassandra version 3.0.10.

>

> Apache Cassandra is a fully distributed database. It is the right choice

> when you need scalability and high availability without compromising

> performance.

>

>  http://cassandra.apache.org/

>

> Downloads of source and binary distributions are listed in our download

> section:

>

>  http://cassandra.apache.org/download/

>

> This version is a bug fix release[1] on the 3.0 series. As always,

> please pay attention to the release notes[2] and Let us know[3] if you

> were to encounter any problem.



Hello,



From the NEWS file:




3.0.10

=

Upgrading

-

   - memtable_allocation_type: offheap_buffers is no longer allowed to be 
specified in the 3.0 series.

 This was an oversight that can cause segfaults. Offheap was re-introduced 
in 3.4 see CASSANDRA-11039

 and CASSANDRA-9472 for details.





Does this mean that offheap_objects is still available or that there is no 
longer support for offheap memtables in version 3.0?



--

Alex











Re: Some questions to updating and tombstone

2016-11-14 Thread Vladimir Yudovin
Hi Boying,



UPDATE writes the new value with a new timestamp. The old value is not a tombstone, but remains until compaction. gc_grace_period is not related to this.



Best regards, Vladimir Yudovin, 

Winguzone - Hosted Cloud Cassandra
Launch your cluster in minutes.





 On Mon, 14 Nov 2016 03:02:21 -0500 Lu, Boying <boying...@dell.com> 
wrote 




Hi, All,

 

Will Cassandra generate a new tombstone when updating a column using a CQL UPDATE statement?

And is there any way to get the number of tombstones of a column family, since we want to avoid generating too many tombstones within gc_grace_period?

 

Thanks

 

Boying









Re: Consistency when adding data to collections concurrently?

2016-11-12 Thread Vladimir Yudovin
If I used consistency = ALL both when getting the record, and when saving the 
record, will that avoid the race condition?

If I use consistency level = all, will that cause it to end up with [1,2]?

No. Even if you have only one host it's possible that two threads both read the data first and then overwrite the existing value one after the other.



The list is actually of a list<frozen<my_udt>> and not a text (I 
used text for simplification, apologies).

In that case, will updates still merge the list values instead of overwriting 
them?

Do you mean the UPDATE CQL operation? Yes, it adds new values to the list, allowing duplicates.



When setting a new value to a list, C* will do a read-delete-write internally, e.g. read the current list, remove all its values (by a range tombstone) and then write the new list.

As I mentioned, duplicates are allowed in LIST, and as the docs say:

These update operations are implemented internally without any 
read-before-write. Appending and prepending a new element to the list writes 
only the new element.


Only when setting an element by index:

When you add an element at a particular position, Cassandra reads the entire 
list, and then writes only the updated element. Consequently, adding an element 
at a particular position results in greater latency than appending or prefixing 
an element to a list.




Best regards, Vladimir Yudovin, 

Winguzone - Hosted Cloud Cassandra
Launch your cluster in minutes.





 On Sat, 12 Nov 2016 07:57:36 -0500 Ali Akhtar 
wrote 




The labels collection is of the type set<frozen<label>> , where 
label is a udt containing: id, name, description , all text fields.



On Sat, Nov 12, 2016 at 5:54 PM, Ali Akhtar <ali.rac...@gmail.com> wrote:






The problem isn't just the update / insert though, right? Don't frozen entities 
get overwritten completely? So if I had [1] [2] being written as updates, won't 
each update overwrite the set completely, so i'll end up with either one of 
them instead of [1,2]?



On Sat, Nov 12, 2016 at 5:50 PM, DuyHai Doan <doanduy...@gmail.com> wrote:

Maybe you should use my Achilles mapper, which does generate UPDATE statements on collections and not only INSERT.

On 12 Nov 2016 13:08, "Ali Akhtar" <ali.rac...@gmail.com> wrote:

I am using the Java Cassandra mapper for all of these cases, so my code looks 
like this:



Item myItem = myaccessor.get( itemId );

Mapper<Item> mapper = mappingManager.mapper( Item.class );



myItem.labels.add( newLabel );

mapper.save( myItem );




On Sat, Nov 12, 2016 at 5:06 PM, Ali Akhtar <ali.rac...@gmail.com> wrote:

Thanks DuyHai, I will switch to using a set.



But I'm still not sure how to resolve the original question.



- Original labels = []

- Request 1 arrives with label = 1, and request 2 arrives with label = 2

- Updates are sent to c* with labels = [1] and labels = [2] simultaneously.



What will happen in the above case? Will it cause the labels to end up as [1,2] 
(what I want) or either [1] or [2]?



If I use consistency level = all, will that cause it to end up with [1,2]?




On Sat, Nov 12, 2016 at 4:59 PM, DuyHai Doan <doanduy...@gmail.com> wrote:

Don't use list, use set instead. If you need ordering of insertion, use a 
map<timeuuid,text> where timeuuid is generated by the client to guarantee 
insertion order



When setting a new value to a list, C* will do a read-delete-write internally, e.g. read the current list, remove all its values (by a range tombstone) and then write the new list. Please note that prepend & append operations on a list do not require this read-delete-write and thus perform slightly better.




On Sat, Nov 12, 2016 at 11:34 AM, Ali Akhtar <ali.rac...@gmail.com> wrote:

I have a table where each record contains a list<string> of labels.



I have an endpoint which responds to new labels being added to a record by the 
user.



Consider the following scenario:



- Record X, labels = []

- User selects 2 labels, clicks a button, and 2 http requests are generated.

- The server receives request for Label 1 and Label 2 at the same time.

- Both requests see the labels as empty, add 1 label to the collection, and 
send it.

- Record state as label 1 request sees it: [1], as label 2 sees it: [2]



How will the above conflict be resolved? What can I do so I end up with [1, 2] 
instead of either [1] or [2] after both requests have been processed?

























Re: Consistency when adding data to collections concurrently?

2016-11-12 Thread Vladimir Yudovin
Hi Ali,



>What can I do so I end up with [1, 2] instead of either [1] or [2] after 
both requests have been processed?

Use UPDATE, not INSERT. Thus new labels will be added to the collection without overwriting old ones. Also consider using SET instead of LIST to avoid duplicates.
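A minimal sketch with the Java driver, assuming the simplified text-labels schema — all names are hypothetical, and with the real set<frozen<label>> you would bind a UDTValue instead of strings:

import com.datastax.driver.core.*;
import java.util.Collections;

// Sketch: append to the set server-side, so two concurrent requests both
// land (the set is merged) instead of overwriting each other.
public class AddLabel {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("10.0.0.1").build();
             Session session = cluster.connect("my_ks")) {
            PreparedStatement add = session.prepare(
                "UPDATE records SET labels = labels + ? WHERE id = ?");
            session.execute(add.bind(Collections.singleton("label-1"), "record-x"));
            session.execute(add.bind(Collections.singleton("label-2"), "record-x"));
            // After both, labels contains {label-1, label-2} regardless of order.
        }
    }
}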



Best regards, Vladimir Yudovin, 

Winguzone - Hosted Cloud Cassandra
Launch your cluster in minutes.





 On Sat, 12 Nov 2016 05:34:24 -0500 Ali Akhtar 
wrote 




I have a table where each record contains a list<string> of labels.



I have an endpoint which responds to new labels being added to a record by the 
user.



Consider the following scenario:



- Record X, labels = []

- User selects 2 labels, clicks a button, and 2 http requests are generated.

- The server receives request for Label 1 and Label 2 at the same time.

- Both requests see the labels as empty, add 1 label to the collection, and 
send it.

- Record state as label 1 request sees it: [1], as label 2 sees it: [2]



How will the above conflict be resolved? What can I do so I end up with [1, 2] 
instead of either [1] or [2] after both requests have been processed?









Re: Can a Select Count(*) Affect Writes in Cassandra?

2016-11-10 Thread Vladimir Yudovin
As I said, I'm not sure about it, but it would be interesting to check the memory heap state with any JMX tool, e.g. https://github.com/patric-r/jvmtop
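The heap numbers such tools show come from the standard MemoryMXBean, so you can also read them yourself; a minimal sketch (run in-process, or through a remote JMX connection):

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Sketch: print current heap usage of this JVM.
public class HeapCheck {
    public static void main(String[] args) {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        System.out.printf("heap used=%dMB committed=%dMB max=%dMB%n",
            heap.getUsed() >> 20, heap.getCommitted() >> 20, heap.getMax() >> 20);
    }
}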



By the way, why Cassandra 2.0.14? It's a quite old and unsupported version. Even in the 2.0 branch there is 2.0.17 available.


Best regards, Vladimir Yudovin, 

Winguzone - Hosted Cloud Cassandra
Launch your cluster in minutes.





 On Thu, 10 Nov 2016 05:47:37 -0500 Shalom Sagges 
<shal...@liveperson.com> wrote 




Thanks for the quick reply Vladimir. 

Is it really possible that ~12,500 writes per second (per node in a 12-node DC) are caused by memory flushes?










 


 

 







On Thu, Nov 10, 2016 at 11:02 AM, Vladimir Yudovin <vla...@winguzone.com> 
wrote:













Hi Shalom,



I'm not so sure, but probably excessive memory consumption by this SELECT causes C* to flush tables to free memory.



Best regards, Vladimir Yudovin, 

Winguzone - Hosted Cloud Cassandra
Launch your cluster in minutes.





 On Thu, 10 Nov 2016 03:36:59 -0500 Shalom Sagges 
<shal...@liveperson.com> wrote 




Hi There!



I'm using C* 2.0.14. 

I experienced a scenario where a "select count(*)" that ran every minute on a 
table with practically no results limit (yes, this should definitely be 
avoided), caused a huge increase in Cassandra writes to around 150 thousand 
writes per second for that particular table.



Can anyone explain this behavior? Why would a Select query significantly 
increase write count in Cassandra?



Thanks!




 
Shalom Sagges
 

 

 
 
 
 We Create Meaningful Connections
 
 

 


























Re: Can a Select Count(*) Affect Writes in Cassandra?

2016-11-10 Thread Vladimir Yudovin
Hi Shalom,



I'm not so sure, but probably excessive memory consumption by this SELECT causes C* to flush tables to free memory.
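If the count really has to run every minute, paging it with a bounded fetch size at least caps the memory used per request. A minimal sketch with the Java driver — the names are hypothetical:

import com.datastax.driver.core.*;

// Sketch: page through the table with a small fetch size and count
// client-side, instead of an unbounded count(*).
public class PagedCount {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("10.0.0.1").build();
             Session session = cluster.connect("my_ks")) {
            Statement st = new SimpleStatement("SELECT id FROM mytable")
                .setFetchSize(1000);             // 1000 rows per page
            long count = 0;
            for (Row ignored : session.execute(st)) count++;
            System.out.println("rows: " + count);
        }
    }
}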


Best regards, Vladimir Yudovin, 

Winguzone - Hosted Cloud Cassandra
Launch your cluster in minutes.





 On Thu, 10 Nov 2016 03:36:59 -0500 Shalom Sagges 
<shal...@liveperson.com> wrote 




Hi There!



I'm using C* 2.0.14. 

I experienced a scenario where a "select count(*)" that ran every minute on a 
table with practically no results limit (yes, this should definitely be 
avoided), caused a huge increase in Cassandra writes to around 150 thousand 
writes per second for that particular table.



Can anyone explain this behavior? Why would a Select query significantly 
increase write count in Cassandra?



Thanks!

 


 

 

















Re: Re: Re: A difficult data model with C*

2016-11-10 Thread Vladimir Yudovin
>Do you mean the oldest one should be removed when a new play is added?

Sure. As you described the issue: "the last ten items may be adequate for the business".



Best regards, Vladimir Yudovin, 

Winguzone - Hosted Cloud Cassandra
Launch your cluster in minutes.





 On Wed, 09 Nov 2016 20:47:05 -0500 Diamond ben 
<diamond@outlook.com> wrote 




The solution may work. However, the play list will grow over time, and somebody may have tens of thousands of entries, which will slow down the query and sort. Do you mean the oldest one should be removed when a new play is added?

BTW, the version is 2.1.16 in our live system.



BRs,

BEN




From: Vladimir Yudovin <vla...@winguzone.com>
 Sent: 9 November 2016 18:11:26
 To: user
 Subject: Re: Re: A difficult data model with C* 
 


You are welcome! )



>recent ten movies watched by the user within 30 days.

In this case you can't use PRIMARY KEY (user_name, video_id), as video_id would be required to fetch the row, so all this stuff may be:

CREATE TYPE play (video_id text, position int, last_time timestamp);

CREATE TABLE recent (user_name text PRIMARY KEY, play_list 
LIST<frozen<play>>);


You can easily retrieve the play list for a specific user by their ID. Instead of LIST you can use MAP; I don't think it matters for ten entries.
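Since the requirement is "within 30 days", one option is to write each entry with a 30-day TTL (2592000 s) so old entries expire on their own. A minimal sketch using the driver 3.x API — keyspace and contact point are hypothetical:

import com.datastax.driver.core.*;
import java.util.Collections;
import java.util.Date;

// Sketch: append one play entry with a 30-day TTL; the TTL applies to the
// appended list element.
public class AppendPlay {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("10.0.0.1").build();
             Session session = cluster.connect("my_ks")) {
            UserType playType =
                cluster.getMetadata().getKeyspace("my_ks").getUserType("play");
            UDTValue play = playType.newValue()
                .setString("video_id", "great video")
                .setInt("position", 1234)
                .setTimestamp("last_time", new Date());
            PreparedStatement add = session.prepare(
                "UPDATE recent USING TTL 2592000 " +
                "SET play_list = play_list + ? WHERE user_name = ?");
            session.execute(add.bind(Collections.singletonList(play), "some user"));
        }
    }
}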





Best regards, Vladimir Yudovin, 

Winguzone - Hosted Cloud Cassandra
 Launch your cluster in minutes.





 On Tue, 08 Nov 2016 22:29:48 -0500 ben ben 
wrote 




Hi Vladimir Yudovin,



Thank you very much for your detailed explanation. Maybe I didn't describe the requirement clearly. The use cases should be:

1. a user login our app.

2. show the recent ten movies watched by the user within 30 days.

3. the user can click any one of the ten movie and continue to watch from the 
last position she/he did. BTW, a movie can be watched several times by a user 
and the last positon is needed indeed.



BRs,

BEN




From: Vladimir Yudovin <vla...@winguzone.com>
 Sent: 8 November 2016 22:35:48
 To: user
 Subject: Re: A difficult data model with C*
 


Hi Ben,



if you need a very limited number of positions (as you said, ten) maybe you can store them in a LIST of UDTs? Or just as a JSON string?

So you'll have one row for each user-video pair.



It can be something like this:



CREATE TYPE play (position int, last_time timestamp);

CREATE TABLE recent (user_name text, video_id text, review 
LIST<frozen<play>>, PRIMARY KEY (user_name, video_id));



UPDATE recent set review = review + [(1234,12345)] where user_name='some user' 
AND video_id='great video';

UPDATE recent set review = review + [(1234,123456)] where user_name='some user' 
AND video_id='great video';

UPDATE recent set review = review + [(1234,1234567)] where user_name='some 
user' AND video_id='great video';



You can delete the oldest entry by index:

DELETE review[0] FROM recent WHERE user_name='some user' AND video_id='great 
video';



or by value, if you know the oldest entry:



UPDATE recent SET review = review - [(1234,12345)]  WHERE user_name='some user' 
AND video_id='great video';



Best regards, Vladimir Yudovin, 

Winguzone - Hosted Cloud Cassandra
 Launch your cluster in minutes.





 On Mon, 07 Nov 2016 21:54:08 -0500 ben ben 
wrote 






Hi guys,



We are maintaining a system for an on-line video service. ALL users' viewing 
records of every movie are stored in C*. So she/he can continue to enjoy the 
movie from the last point next time. The table is designed as below:

CREATE TABLE recent (

user_name text,

vedio_id text,

position int,

last_time timestamp,

PRIMARY KEY (user_name, vedio_id)

)



It worked well before. However, the records increase every day, and the last ten items may be adequate for the business. The current model uses vedio_id as the clustering key to keep one row per movie, but as you know, the business prefers to order by last_time desc. If we use last_time as the clustering key, there will be many records for a single movie and only the most recent one is actually desired. So how to model that? Do you have any suggestions?

Thanks!





BRs,

BEN
























Re: Log traces of debug logs

2016-11-09 Thread Vladimir Yudovin
Hi,



you can change the log level with the nodetool setlogginglevel command, e.g. nodetool setlogginglevel org.apache.cassandra.db TRACE



Best regards, Vladimir Yudovin, 

Winguzone - Hosted Cloud Cassandra
Launch your cluster in minutes.





 On Wed, 09 Nov 2016 10:17:37 -0500 Benjamin Roth 
<benjamin.r...@jaumo.com> wrote 




Hi!

Is there a way to tell logback to log the stack trace of a debug log? The background is that I'd like to know where a table flush is triggered from.

Thanks guys!








Re: Having Counters in a Collection, like a map?

2016-11-09 Thread Vladimir Yudovin
>The keys however are dynamic in my case.

Why is it a problem for you? As you said, "if something related to 5 happened, then i'd get the counter for 5 and increment / decrement it."

So do "UPDATE cnt SET value = value + SOMETHING WHERE id = 5;"

If the counter for event 5 exists it will be changed; if not, it will be created and set to the initial value.




Best regards, Vladimir Yudovin, 

Winguzone - Hosted Cloud Cassandra
Launch your cluster in minutes.





 On Wed, 09 Nov 2016 07:52:29 -0500 Ali Akhtar 
wrote 




The only issue with the last 2 solutions is that they require knowing the key in advance in order to look up the counters.



The keys however are dynamic in my case.




On Wed, Nov 9, 2016 at 5:47 PM, DuyHai Doan <doanduy...@gmail.com> wrote:






"Is there a way to do this in c* which doesn't require creating 1 table per 
type of map<int, counter> that i need?"



You're lucky, it's possible with some tricks





CREATE TABLE my_counters_map (

 id uuid,

 map_name text,

 map_key int,

 count counter,

 PRIMARY KEY ((id), map_name, map_key)

);



This table can be seen as:



Map<partition_key, SortedMap<map_name, SortedMap<map_key, counter>>>



The couple (map_key, counter) simulates your map



The clustering column map_name allows you to have multiple maps of counters for 
a single partition_key
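A minimal sketch of bumping one entry of such a simulated map with the Java driver — contact point, keyspace, map name and key are hypothetical:

import com.datastax.driver.core.*;
import java.util.UUID;

// Sketch: counters can only be changed through UPDATE; the bind value for
// "count + ?" is a long delta.
public class BumpCounter {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("10.0.0.1").build();
             Session session = cluster.connect("my_ks")) {
            PreparedStatement inc = session.prepare(
                "UPDATE my_counters_map SET count = count + ? " +
                "WHERE id = ? AND map_name = ? AND map_key = ?");
            session.execute(inc.bind(1L, UUID.randomUUID(), "events", 5));
            // One whole "map" can be read back with:
            // SELECT map_key, count FROM my_counters_map WHERE id = ? AND map_name = ?
        }
    }
}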








On Wed, Nov 9, 2016 at 1:32 PM, Vladimir Yudovin <vla...@winguzone.com> 
wrote:



Unfortunately it's impossible either to use counters inside collections or to mix them with other non-counter columns:



CREATE TABLE cnt (id int PRIMARY KEY , cntmap MAP<int,counter>);

InvalidRequest: Error from server: code=2200 [Invalid query] message="Counters 
are not allowed inside collections: map<int, counter>"



CREATE TABLE cnt (id int PRIMARY KEY , cnt1 counter, txt text);

InvalidRequest: Error from server: code=2200 [Invalid query] message="Cannot 
mix counter and non counter columns in the same table"





>Is there a way to do this in c* which doesn't require creating 1 table per 
type of map<int, counter> that i need?



But you don't need to create a separate table for each counter, just use one row per counter:



CREATE TABLE cnt (id int PRIMARY KEY , value counter);



Best regards, Vladimir Yudovin, 

Winguzone - Hosted Cloud Cassandra
Launch your cluster in minutes.





 On Wed, 09 Nov 2016 07:17:53 -0500 Ali Akhtar 
wrote 




I have a use-case where I need to have a dynamic number of counters.



The easiest way to do this would be to have a map<int, counter> where the 
int is the key, and the counter is the value which is incremented / 
decremented. E.g if something related to 5 happened, then i'd get the counter 
for 5 and increment / decrement it.



I also need to have multiple map<int, counter>s of this type, where each 
int is a key referring to something different.



Is there a way to do this in c* which doesn't require creating 1 table per type 
of map<int, counter> that i need?




















Re: Having Counters in a Collection, like a map?

2016-11-09 Thread Vladimir Yudovin
Unfortunately it's impossible either to use counters inside collections or to mix them with other non-counter columns:



CREATE TABLE cnt (id int PRIMARY KEY , cntmap MAP<int,counter>);

InvalidRequest: Error from server: code=2200 [Invalid query] message="Counters 
are not allowed inside collections: map<int, counter>"



CREATE TABLE cnt (id int PRIMARY KEY , cnt1 counter, txt text);

InvalidRequest: Error from server: code=2200 [Invalid query] message="Cannot 
mix counter and non counter columns in the same table"





>Is there a way to do this in c* which doesn't require creating 1 table per 
type of map<int, counter> that i need?

But you don't need to create a separate table for each counter, just use one row per counter:



CREATE TABLE cnt (id int PRIMARY KEY , value counter);



Best regards, Vladimir Yudovin, 

Winguzone - Hosted Cloud Cassandra
Launch your cluster in minutes.





 On Wed, 09 Nov 2016 07:17:53 -0500 Ali Akhtar 
wrote 




I have a use-case where I need to have a dynamic number of counters.



The easiest way to do this would be to have a map<int, counter> where the 
int is the key, and the counter is the value which is incremented / 
decremented. E.g if something related to 5 happened, then i'd get the counter 
for 5 and increment / decrement it.



I also need to have multiple map<int, counter>s of this type, where each 
int is a key referring to something different.



Is there a way to do this in c* which doesn't require creating 1 table per type 
of map<int, counter> that i need?









Re: Reply: A difficult data model with C*

2016-11-09 Thread Vladimir Yudovin
You are welcome! )



>recent ten movies watched by the user within 30 days.

In this case you can't use PRIMARY KEY (user_name, video_id), as video_id would be required to fetch the row, so the whole model could be:

CREATE TYPE play (video_id text, position int, last_time timestamp);

CREATE TABLE recent (user_name text PRIMARY KEY, play_list 
LIST<frozen<play>>);


You can easily retrieve the play list for a specific user by their ID. Instead of a LIST you could use a MAP; for ten entries I don't think it matters.
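
A usage sketch under these definitions (the values are illustrative; note that UDT literals are written with field names):

INSERT INTO recent (user_name, play_list) VALUES ('some user', [{video_id: 'video1', position: 1234, last_time: '2016-11-09 12:00:00+0000'}]);

UPDATE recent SET play_list = play_list + [{video_id: 'video2', position: 42, last_time: '2016-11-09 12:05:00+0000'}] WHERE user_name = 'some user';

SELECT play_list FROM recent WHERE user_name = 'some user';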




Best regards, Vladimir Yudovin, 

Winguzone - Hosted Cloud Cassandra
Launch your cluster in minutes.





 On Tue, 08 Nov 2016 22:29:48 -0500ben ben <diamond@outlook.com> 
wrote 




Hi Vladimir Yudovin,



Thank you very much for your detailed explanation. Maybe I didn't describe the requirement clearly. The use cases should be:

1. a user logs in to our app.

2. show the recent ten movies watched by the user within 30 days.

3. the user can click any one of the ten movies and continue to watch from the last position she/he reached. BTW, a movie can be watched several times by a user, and the last position is indeed needed.



BRs,

BEN




From: Vladimir Yudovin <vla...@winguzone.com>
 Sent: 8 November 2016, 22:35:48
 To: user
 Subject: Re: A difficult data model with C* 
 


Hi Ben,



if you need a very limited number of positions (as you said, ten), maybe you can store them in a LIST of UDTs? Or just as a JSON string?

So you'll have one row per each pair user-video. 



It can be something like this:



CREATE TYPE play (position int, last_time timestamp);

CREATE TABLE recent (user_name text, video_id text, review 
LIST<frozen<play>>, PRIMARY KEY (user_name, video_id));



UPDATE recent SET review = review + [{position: 1234, last_time: 12345}] WHERE user_name='some user' AND video_id='great video';

UPDATE recent SET review = review + [{position: 1234, last_time: 123456}] WHERE user_name='some user' AND video_id='great video';

UPDATE recent SET review = review + [{position: 1234, last_time: 1234567}] WHERE user_name='some user' AND video_id='great video';



You can delete the oldest entry by index:

DELETE review[0] FROM recent WHERE user_name='some user' AND video_id='great 
video';



or by value, if you know the oldest entry:



UPDATE recent SET review = review - [{position: 1234, last_time: 12345}] WHERE user_name='some user' AND video_id='great video';



Best regards, Vladimir Yudovin, 

Winguzone - Hosted Cloud Cassandra
 Launch your cluster in minutes.





 On Mon, 07 Nov 2016 21:54:08 -0500ben ben <diamond@outlook.com> 
wrote 






Hi guys,



We are maintaining a system for an on-line video service. ALL users' viewing 
records of every movie are stored in C*. So she/he can continue to enjoy the 
movie from the last point next time. The table is designed as below:

CREATE TABLE recent (

user_name text,

vedio_id text,

position int,

last_time timestamp,

PRIMARY KEY (user_name, vedio_id)

)



It worked well before. However, the records increase every day and the last ten items may be adequate for the business. The current model uses vedio_id as the clustering key to keep one row per movie, but as you know, the business prefers to order by last_time desc. If we use last_time as the clustering key, there will be many records for a single movie while only the most recent one is actually desired. So how should we model that? Do you have any suggestions? 

Thanks!





BRs,

BEN


















Re: A difficult data model with C*

2016-11-08 Thread Vladimir Yudovin
Hi Ben,



if you need a very limited number of positions (as you said, ten), maybe you can store them in a LIST of UDTs? Or just as a JSON string?

So you'll have one row per each pair user-video. 



It can be something like this:



CREATE TYPE play (position int, last_time timestamp);

CREATE TABLE recent (user_name text, video_id text, review 
LIST<frozen<play>>, PRIMARY KEY (user_name, video_id));



UPDATE recent SET review = review + [{position: 1234, last_time: 12345}] WHERE user_name='some user' AND video_id='great video';

UPDATE recent SET review = review + [{position: 1234, last_time: 123456}] WHERE user_name='some user' AND video_id='great video';

UPDATE recent SET review = review + [{position: 1234, last_time: 1234567}] WHERE user_name='some user' AND video_id='great video';



You can delete the oldest entry by index:

DELETE review[0] FROM recent WHERE user_name='some user' AND video_id='great 
video';



or by value, if you know the oldest entry:



UPDATE recent SET review = review - [{position: 1234, last_time: 12345}] WHERE user_name='some user' AND video_id='great video';



Best regards, Vladimir Yudovin, 

Winguzone - Hosted Cloud Cassandra
Launch your cluster in minutes.





 On Mon, 07 Nov 2016 21:54:08 -0500ben ben <diamond@outlook.com> 
wrote 






Hi guys,

 

 We are maintaining a system for an on-line video service. ALL users' viewing 
records of every movie are stored in C*. So she/he can continue to enjoy the 
movie from the last point next time. The table is designed as below:

 CREATE TABLE recent (

 user_name text,

 vedio_id text,

 position int,

 last_time timestamp,

 PRIMARY KEY (user_name, vedio_id)

 )

 

 It worked well before. However, the records increase every day and the last ten items may be adequate for the business. The current model uses vedio_id as the clustering key to keep one row per movie, but as you know, the business prefers to order by last_time desc. If we use last_time as the clustering key, there will be many records for a single movie while only the most recent one is actually desired. So how should we model that? Do you have any suggestions? 

 Thanks!

 

 

 BRs,

 BEN












Re: Improving performance where a lot of updates and deletes are required?

2016-11-08 Thread Vladimir Yudovin
Yes, as doc says "Expired data is marked with a tombstone" but you save 
communication with host and processing of DELETE operator.
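
A minimal sketch, assuming a hypothetical table of daily events:

INSERT INTO events (id, payload) VALUES (1, 'new data') USING TTL 86400;

-- or set a table-wide default instead of a per-insert TTL:
ALTER TABLE events WITH default_time_to_live = 86400;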





Best regards, Vladimir Yudovin, 

Winguzone - Hosted Cloud Cassandra
Launch your cluster in minutes.





 On Tue, 08 Nov 2016 09:32:16 -0500Ali Akhtar <ali.rac...@gmail.com> 
wrote 




Does TTL also cause tombstones?



On Tue, Nov 8, 2016 at 6:57 PM, Vladimir Yudovin <vla...@winguzone.com> 
wrote:








>The deletes will be done at a scheduled time, probably at the end of the 
day, each day.





Probably you can use TTL? 
http://docs.datastax.com/en/cql/3.1/cql/cql_using/use_expire_c.html



Best regards, Vladimir Yudovin, 

Winguzone - Hosted Cloud Cassandra
Launch your cluster in minutes.





 On Tue, 08 Nov 2016 05:04:12 -0500Ali Akhtar <ali.rac...@gmail.com> 
wrote 




I have a use case where a lot of updates and deletes to a table will be 
necessary.



The deletes will be done at a scheduled time, probably at the end of the day, 
each day.



Updates will be done throughout the day, as new data comes in.



Are there any guidelines on improving cassandra's performance for this use 
case? Any caveats to be aware of? Any tips, like running nodetool repair every 
X days?




Thanks.
















Re: Designing a table in cassandra

2016-11-08 Thread Vladimir Yudovin
Hi Sathish,



probably I didn't catch your requirements exactly, but why not create a single table for all devices and represent each device as rows, storing both the user and the network configuration per device? You can use a MAP for a flexible storage model.



If you have thousands of devices, creating a separate table for each device can be quite a heavy solution. 
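
A sketch of such a single table (the attributes map is an assumption; device_name and updated_by come from option 3 of the question below):

CREATE TABLE devices (
    device_name text,
    updated_by text,          -- e.g. 'user' or 'network'
    attributes map<text, text>,
    PRIMARY KEY (device_name, updated_by)
);

-- compare the user-configured values against the network-reported values
SELECT updated_by, attributes FROM devices WHERE device_name = 'dev1';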



Best regards, Vladimir Yudovin, 

Winguzone - Hosted Cloud Cassandra
Launch your cluster in minutes.





 On Sun, 06 Nov 2016 19:23:20 -0500sat <sathish.al...@gmail.com> 
wrote 




Hi,



We are new to Cassandra. For our POC, we tried creating tables and inserting rows as JSON, and all of this went fine. Now we are trying to implement one of the application scenarios, and I am having difficulty coming up with the best approach. 



Scenario:

We have a Device POJO which has some attributes/fields that are read/written by users as well as by the network, and some attributes/fields that only the network can modify. When users need to configure a device they will create an instance of the Device POJO and set/configure the applicable fields; however, the network can later update those attributes. We want to know the discrepancy between the values configured by users and the values updated by the network. Hence we have thought of 3 different approaches



1) Create multiple tables for the same Device like Device_Users and 
Device_Network so that we can see the difference.



2) Create different Keyspace as multiple objects like Device can have the same 
requirement



3) Create one "Device" table and insert one row for user configuration and 
another row for network update. We will create this table with multiple primary 
key (device_name, updated_by)



Please let us know which is the best option (with their pros and cons if 
possible) among these 3, and also let us know if there are other options.



Thanks and Regards

A.SathishKumar 









Re: Improving performance where a lot of updates and deletes are required?

2016-11-08 Thread Vladimir Yudovin
>The deletes will be done at a scheduled time, probably at the end of the 
day, each day.



Probably you can use TTL? 
http://docs.datastax.com/en/cql/3.1/cql/cql_using/use_expire_c.html



Best regards, Vladimir Yudovin, 

Winguzone - Hosted Cloud Cassandra
Launch your cluster in minutes.





 On Tue, 08 Nov 2016 05:04:12 -0500Ali Akhtar <ali.rac...@gmail.com> 
wrote 




I have a use case where a lot of updates and deletes to a table will be 
necessary.



The deletes will be done at a scheduled time, probably at the end of the day, 
each day.



Updates will be done throughout the day, as new data comes in.



Are there any guidelines on improving cassandra's performance for this use 
case? Any caveats to be aware of? Any tips, like running nodetool repair every 
X days?




Thanks.









Re: store individual inventory items in a table, how to assign them correctly

2016-11-08 Thread Vladimir Yudovin
Hi,



can you elaborate a little on your data model?

Would you like to create 100 rows for each product and then remove one row and assign it to the customer?



Best regards, Vladimir Yudovin, 

Winguzone - Hosted Cloud Cassandra
Launch your cluster in minutes.





 On Mon, 07 Nov 2016 14:51:56 -0500S Ahmed <sahmed1...@gmail.com> 
wrote 




Say I have 100 products in inventory, instead of having a counter I want to 
create 100 rows per inventory item.



When someone purchases a product, how can I correctly assign that customer a 
product from inventory without having any race conditions etc?



Thanks.









Re: operation and maintenance tools

2016-11-08 Thread Vladimir Yudovin
For memory usage you can use a small command line tool, 
https://github.com/patric-r/jvmtop

Also there are a number of GUI tools that connect to the JMX port, like jvisualvm



Best regards, Vladimir Yudovin, 

Winguzone - Hosted Cloud Cassandra
Launch your cluster in minutes.





 On Mon, 07 Nov 2016 22:25:47 -0500<wxn...@zjqunshuo.com> wrote 




Hi All,



I need to do maintenance work for a C* cluster with about 10 nodes. Please recommend the C* operation and maintenance tools you are using.

I also noticed my C* daemon using a lot of memory while doing nothing. Is there any convenient tool to deeply analyze the C* node memory?



Cheers,

Simon








Re: Cassandra on Cloud platforms experience

2016-11-04 Thread Vladimir Yudovin
Hi,



>1. No native snitch

It's not a great problem. GossipingPropertyFileSnitch is good enough.



>2. No concept of availability zones.

Azure does have such a concept - the Availability Set. It provides three fault domains (availability zones in Amazon terms) and 20 update domains.



>4. Even running SSDs will give you poor performance.

It depends on disk size. 1T SSD provides 5000 IOPS.





So in short:

Amazon - provides data at rest encryption, flexible EBS storage (or local 
disks), availability zones.

Azure - provides data at rest encryption, less flexible storage (or local 
disks), availability zones.

SoftLayer - no data-at-rest encryption, but they have a unique feature - connectivity between different data centers (they call this VLAN spanning) without need for a VPN or other tunneling. They don't have explicit availability zones, but you can put nodes in different DCs in the same region (in some locations) with relatively low latency of 1-1.5 ms, or purchase another VLAN in a different pod for $25 per month in the same DC.



We provide Cassandra clusters on all of these providers in many worldwide locations.



Best regards, Vladimir Yudovin, 

Winguzone - Hosted Cloud Cassandra
Launch your cluster in minutes.





 On Fri, 04 Nov 2016 05:10:06 -0400Oskar Kjellin 
<oskar.kjel...@gmail.com> wrote 




So I've run Cassandra on both Aws and azure. I would strongly suggest that if 
you have the option, run as far away from azure as you can. 



Here's a list of issues I have running Cassandra on azure: 

1. No native snitch 

2. No concept of availability zones. This makes it impossible for Cassandra to 
put replicas in different AZs. This will hurt your uptime and might incur loss 
of data. (They have something called a fault domain tho) 

3. The disks have iops that land in the floppy disk range 

4. Even running SSDs will give you poor performance. 

5. Beware of the global storage account limit. This makes scaling out hurt performance if you put the nodes on the same storage account. Which, if you're using images, is your only choice. 



Sent from my iPhone 



> On 4 nov. 2016, at 00:22, cass savy <casss...@gmail.com> wrote: 

> 

> I would like to hear from the community on their experiences or lessons learnt on hosting Cassandra in cloud platforms like 

> 

> 1. Google Cloud Platform 

> 2. AWS 

> 3. Azure 

> 

> 1. Which cloud hosting is better and Why? 

> 2. What are the differences between C* and vendor-provided NoSQL DBs like (Bigtable, Dynamo, Azure Document DB)? 

> 3. AWS is more mature in its offerings, and Azure is getting there or is already there, based on what I have been investigating so far? 

> 

> 4. What is drive to pick one vs another -Is it cost, infrastructure, 
hardware SKU, availability, scalability, performance,ease of deployment and 
maintenance,..etc? 

> 

> Please let me know your thoughts and suggestions if somebody has done a 
deep dive into these 3 cloud platforms for C*. 

> 

> 

> We use datastax cassandra and exploring new usecases in AWS and also 
evaluating or POC it in Azure/GCP 

> 








Re: failing bootstraps with OOM

2016-11-02 Thread Vladimir Yudovin
Hi,



probably you can try to start the new node with auto_bootstrap: false and then repair keyspaces, or even tables, one by one with nodetool repair 
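
For example (keyspace and table names are hypothetical):

nodetool repair my_keyspace
nodetool repair my_keyspace my_table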



Best regards, Vladimir Yudovin, 

Winguzone - Hosted Cloud Cassandra
Launch your cluster in minutes.





 On Wed, 02 Nov 2016 10:35:45 -0400Mike Torra <mto...@demandware.com> 
wrote 




Hi All -



I am trying to bootstrap a replacement node in a cluster, but it consistently 
fails to bootstrap because of OOM exceptions. For almost a week I've been going 
through cycles of bootstrapping, finding errors, then restarting / resuming 
bootstrap, and I am struggling to move forward. Sometimes the bootstrapping 
node itself fails, which usually manifests first as very high GC times 
(sometimes 30s+!), then nodetool commands start to fail with timeouts, then the 
node will crash with an OOM exception. Other times, a node streaming data to 
this bootstrapping node will have a similar failure. In either case, when it 
happens I need to restart the crashed node, then resume the bootstrap.



On top of these issues, when I do need to restart a node it takes a long time 
(http://stackoverflow.com/questions/40141739/why-does-cassandra-sometimes-take-a-hours-to-start).
 This exacerbates the problem because it takes so long to find out if a change to the cluster helps or if it still fails. I am in the process of upgrading all nodes in the cluster from m4.xlarge to c4.4xlarge, and I am running Cassandra DDC 3.5 on all nodes. The cluster has 26 nodes spread across 4 regions in EC2. Here is some other relevant cluster info (also in the stack overflow post):



Cluster Info

Cassandra DDC 3.5

EC2MultiRegionSnitch

m4.xlarge, moving to c4.4xlarge

Schema Info

3 CF's, all 'write once' (ie no updates), 1 week ttl, STCS (default)

no secondary indexes

I am unsure what to try next. The node that is currently having this bootstrap 
problem is a pretty beefy box, with 16 cores, 30G of ram, and a 3.2T EBS 
volume. The slow startup time might be because of the issues with a high number 
of SSTables that Jeff Jirsa mentioned in a comment on the SO post, but I am at 
a loss for the OOM issues. I've tried:


Changing from CMS to G1 GC, which seemed to have helped a bit

Upgrading from 3.5 to 3.9, which did not seem to help

Upgrading instance types from m4.xlarge to c4.4xlarge, which seems to help, but 
I'm still having issues

I'd appreciate any suggestions on what else I can try to track down the cause 
of these OOM exceptions.



- Mike








Re: commit log on NFS volume

2016-11-01 Thread Vladimir Yudovin
Hi,



it's not only a performance issue. In case of a network problem the writer thread can be blocked; also, in case of a failure, loss of data can occur.



Best regards, Vladimir Yudovin, 

Winguzone - Hosted Cloud Cassandra
Launch your cluster in minutes.





 On Tue, 01 Nov 2016 14:10:10 -0400John Sanda <john.sa...@gmail.com> 
wrote 




I know that using NFS is discouraged, particularly for the commit log. Can anyone shed some light on what kinds of problems I might encounter aside from performance? The reason for my inquiry is that I have some deployments with Cassandra 2.2.1 that use NFS and are experiencing problems like recurring corrupted commit log segments on start up:



 ERROR 19:38:42 Exiting due to error while processing commit log during initialization.
 org.apache.cassandra.db.commitlog.CommitLogReplayer$CommitLogReplayException: Mutation checksum failure at 33296351 in CommitLog-5-1474325237114.log
   at org.apache.cassandra.db.commitlog.CommitLogReplayer.handleReplayError(CommitLogReplayer.java:622) [apache-cassandra-2.2.1.jar]
   at org.apache.cassandra.db.commitlog.CommitLogReplayer.replaySyncSection(CommitLogReplayer.java:492) [apache-cassandra-2.2.1.jar]
   at org.apache.cassandra.db.commitlog.CommitLogReplayer.recover(CommitLogReplayer.java:388) [apache-cassandra-2.2.1.jar]
   at org.apache.cassandra.db.commitlog.CommitLogReplayer.recover(CommitLogReplayer.java:147) [apache-cassandra-2.2.1.jar]
   at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:189) [apache-cassandra-2.2.1.jar]
   at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:169) [apache-cassandra-2.2.1.jar]
   at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:266) [apache-cassandra-2.2.1.jar]
   at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:488) [apache-cassandra-2.2.1.jar]
   at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:595) [apache-cassandra-2.2.1.jar]



In one deployment after removing all of corrupted commit log segments I got a 
different error:



Exception (java.lang.RuntimeException) encountered during startup: java.nio.file.NoSuchFileException: /cassandra_data/data/hawkular_metrics/metrics_idx-1ea881506ee311e6890b131c1bc89929/la-86-big-Statistics.db
java.lang.RuntimeException: java.nio.file.NoSuchFileException: /cassandra_data/data/hawkular_metrics/metrics_idx-1ea881506ee311e6890b131c1bc89929/la-86-big-Statistics.db
   at org.apache.cassandra.io.util.ChannelProxy.openChannel(ChannelProxy.java:55)
   at org.apache.cassandra.io.util.ChannelProxy.<init>(ChannelProxy.java:66)
   at org.apache.cassandra.io.util.RandomAccessReader.open(RandomAccessReader.java:78)
   at org.apache.cassandra.io.sstable.metadata.MetadataSerializer.deserialize(MetadataSerializer.java:91)
   at org.apache.cassandra.io.sstable.metadata.MetadataSerializer.deserialize(MetadataSerializer.java:101)
   at org.apache.cassandra.db.ColumnFamilyStore.removeUnfinishedCompactionLeftovers(ColumnFamilyStore.java:672)
   at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:216)
   at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:488)
   at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:595)
Caused by: java.nio.file.NoSuchFileException: /cassandra_data/data/hawkular_metrics/metrics_idx-1ea881506ee311e6890b131c1bc89929/la-86-big-Statistics.db
   at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
   at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
   at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
   at sun.nio.fs.UnixFileSystemProvider.newFileChannel(UnixFileSystemProvider.java:177)
   at java.nio.channels.FileChannel.open(FileChannel.java:287)
   at java.nio.channels.FileChannel.open(FileChannel.java:335)
   at org.apache.cassandra.io.util.ChannelProxy.openChannel(ChannelProxy.java:51)
   ... 8 more



The latter error looks like it involves compaction and might be unrelated. I 
don't know if it matters, but I have commit log compression enabled in these 
environments.



-- 



- John











Re: Specifying multiple conditions for lightweight conditions?

2016-11-01 Thread Vladimir Yudovin
Hi,



Unfortunately, CQL syntax doesn't allow the use of the OR operator in the condition list: 



UPDATE [ keyspace_name. ] table_name [ USING TTL time_value | USING TIMESTAMP 
timestamp_value ] SET assignment [ , assignment ] . . . WHERE row_specification 
[ IF EXISTS | IF NOT EXISTS | IF condition [ AND condition ] . . . ] ;



The simplest solution is to make two different requests, the first with a null check:



UPDATE project SET last_due_at = '2013-01-01 00:00:00+0200' WHERE id = '1' IF 
last_due_at = null;

then test the result; if [applied] = False, make a second request:



UPDATE project SET last_due_at = '2013-01-01 00:00:00+0200' WHERE id = '1' IF 
last_due_at < '2013-01-01 00:00:00+0200'



Sure, it's less efficient than an OR condition would be. Probably you can use IF NOT EXISTS in the first request (depending on your application logic); maybe it will be slightly faster (not sure).





Best regards, Vladimir Yudovin, 

Winguzone - Hosted Cloud Cassandra
Launch your cluster in minutes.





 On Tue, 01 Nov 2016 08:22:31 -0400Ali Akhtar <ali.rac...@gmail.com> 
wrote 




In the following query:



UPDATE project SET last_due_at = '2013-01-01 00:00:00+0200'

WHERE id = '1' 

IF last_due_at < '2013-01-01 00:00:00+0200';




The intent is to change the value of 'last_due_at' as long as 'last_due_at' 
isn't already set to a later date than the one I've supplied.



The problem is, last_due_at starts off with an initial value of null, so the 
above query fails.



If I try the following:





UPDATE project SET last_due_at = '2013-01-01 00:00:00+0200'


WHERE id = '1' 

IF last_due_at < '2013-01-01 00:00:00+0200' OR last_due_at = null;




That fails due to a syntax error.



Is there any other way to achieve this?









Re: Securing a Cassandra 2.2.6 Cluster

2016-10-31 Thread Vladimir Yudovin
I would set rpc_address to 0.0.0.0 and broadcast_rpc_address to each node's own IP.

This allows connecting both to 127.0.0.1 from inside and to the IP from outside.
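
For example, in cassandra.yaml on a node whose own address is 10.0.0.1 (the address is illustrative):

rpc_address: 0.0.0.0
broadcast_rpc_address: 10.0.0.1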



By the way, I see that port 7000 is bound to the external IP. Aren't both nodes in the same network? If yes, use internal IPs. 

 



Best regards, Vladimir Yudovin, 

Winguzone - Hosted Cloud Cassandra
Launch your cluster in minutes.





 On Sun, 30 Oct 2016 15:37:50 -0400Raimund Klein 
<chessra...@gmail.com> wrote 




Hi guys,



Thank you for your responses. Let me try to address them:



I just tried cqlsh directly with the IP, no change in behaviour. (I previously 
tried the hostnames, didn't work either.)

As for the "empty" ..._address: I meant that I leave these blank. Please let me 
quote from the default cassandra.yaml:

# Leaving it blank leaves it up to InetAddress.getLocalHost(). This
# will always do the Right Thing _if_ the node is properly configured
# (hostname, name resolution, etc), and the Right Thing is to use the
# address associated with the hostname (it might not be).

So what should I put instead?


Requested outputs:

 

nodetool status
Datacenter: datacenter1
===

Status=Up/Down

|/ State=Normal/Leaving/Joining/Moving

--  Address   Load   Tokens   Owns (effective)  Host ID 
  Rack

UN  <IP_1>   344.56 KB  256  100.0%
6271c749-e41d-443c-89e4-46c0fbac49af  rack1

UN  <IP_2>  266.91 KB  256  100.0%
e50a1076-7149-45f3-9001-26bb479f2a50  rack1


# netstat -lptn | grep java
tcp0  0 <IP_1>:70000.0.0.0:*   LISTEN 
 17040/java  
tcp0  0 127.0.0.1:36415 0.0.0.0:*   LISTEN  
17040/java  
tcp0  0 127.0.0.1:7199  0.0.0.0:*   LISTEN  
17040/java  
tcp6   0  0 <IP_1>:9042:::*LISTEN 
 17040/java

# netstat -lptn | grep java

tcp0  0 127.0.0.1:43569 0.0.0.0:*   LISTEN  
49349/java  
tcp0  0 <IP_2>:7000   0.0.0.0:*   LISTEN  
49349/java  
tcp0  0 127.0.0.1:7199  0.0.0.0:*   LISTEN  
49349/java  
tcp6   0  0 :::8009 :::*LISTEN  
42088/java  
tcp6   0  0 :::8080 :::*LISTEN  
42088/java  
tcp6   0  0 <IP_2>:9042   :::*LISTEN  
49349/java  
tcp6   0  0 127.0.0.1:8005  :::*LISTEN  
42088/java

Jonathan, thank you for reassuring me that I didn't misunderstand seeds 
completely. ;-)




Any ideas?



Regards

Raimund




2016-10-30 18:48 GMT+00:00 Jonathan Haddad <j...@jonhaddad.com>:






I always prefer to set the listen interface instead of the listen address.
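
For example, in cassandra.yaml (the interface name is an assumption):

listen_interface: eth0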



Both nodes can be seeds. In fact, there should be more than one seed. Having your first 2 nodes as seeds is usually the correct thing to do. 

On Sun, Oct 30, 2016 at 8:28 AM Vladimir Yudovin <vla...@winguzone.com> 
wrote:



>Empty listen_address and rpc_address.

What do you mean by "Empty"? You should set either ***_address or 
***_interface. Otherwise 

Cassandra will not listen on port 9042.





>Open ports 9042, 7000 and 7001 for external communication.



Only port 9042 should be open to the world, Port 7000 for internode 
communication, and 7001 for internode SSL communication (only one of them is 
used).





>What is the best order of steps



Order doesn't really matter.





>Define both machines as seeds.



It's wrong. Only one (started first) should be seed.





>nodetool sees both of them

cqlsh refuses to connect

Can you please give output of

nodetool status

and

netstat -lptn | grep java



Best regards, Vladimir Yudovin, 

Winguzone - Hosted Cloud Cassandra
Launch your cluster in minutes.





 On Sun, 30 Oct 2016 14:11:55 -0400Raimund Klein 
<chessra...@gmail.com> wrote 







Hi everyone,

 

We've managed to set up a Cassandra 2.2.6 cluster of two physical nodes 
(nodetool sees both of them, so I'm quite certain the cluster is indeed 
active). My steps to create the cluster were (this applies to both machines):



 - Empty listen_address and rpc_address.

 - Define a cluster_name.

 - Define both machines as seeds.

 - Open ports 9042, 7000 and 7001 for external communication.



 



Now I want to secure access to the cluster in all forms:



 - define a different database user with a new password

 - encrypt communication between clients and the cluster including client verification

 - encrypt communication between the nodes including verification



What is the best order of steps and the correct way to achieve this? I wanted to start with defining a different user, but cqlsh refuses to connect after enforcing user/password authentication.

Re: Securing a Cassandra 2.2.6 Cluster

2016-10-31 Thread Vladimir Yudovin
>Both nodes can be seeds.

Probably I misunderstood Raimund as setting each node as its own only seed. If he sets both IPs on both nodes, it's OK.



Best regards, Vladimir Yudovin, 

Winguzone - Hosted Cloud Cassandra
Launch your cluster in minutes.





 On Sun, 30 Oct 2016 14:48:00 -0400Jonathan Haddad 
<j...@jonhaddad.com> wrote 




I always prefer to set the listen interface instead of the listen address.



Both nodes can be seeds. In fact, there should be more than one seed. Having your first 2 nodes as seeds is usually the correct thing to do. 

On Sun, Oct 30, 2016 at 8:28 AM Vladimir Yudovin <vla...@winguzone.com> 
wrote:







>Empty listen_address and rpc_address.

What do you mean by "Empty"? You should set either ***_address or 
***_interface. Otherwise 

Cassandra will not listen on port 9042.





>Open ports 9042, 7000 and 7001 for external communication.



Only port 9042 should be open to the world, Port 7000 for internode 
communication, and 7001 for internode SSL communication (only one of them is 
used).





>What is the best order of steps



Order doesn't really matter.





>Define both machines as seeds.



It's wrong. Only one (started first) should be seed.





>nodetool sees both of them

cqlsh refuses to connect

Can you please give output of

nodetool status

and

netstat -lptn | grep java



Best regards, Vladimir Yudovin, 

Winguzone - Hosted Cloud Cassandra
Launch your cluster in minutes.





 On Sun, 30 Oct 2016 14:11:55 -0400Raimund Klein 
<chessra...@gmail.com> wrote 







Hi everyone,

 

We've managed to set up a Cassandra 2.2.6 cluster of two physical nodes 
(nodetool sees both of them, so I'm quite certain the cluster is indeed 
active). My steps to create the cluster were (this applies to both machines):



 - Empty listen_address and rpc_address.

 - Define a cluster_name.

 - Define both machines as seeds.

 - Open ports 9042, 7000 and 7001 for external communication.



 



Now I want to secure access to the cluster in all forms:



 - define a different database user with a new password

 - encrypt communication between clients and the cluster including client verification

 - encrypt communication between the nodes including verification



What is the best order of steps and correct way to achieve this? I wanted to 
start with defining a different user, but cqlsh refuses to connect after 
enforcing user/password authentication:



cqlsh -u cassandra -p cassandra

Connection error: ('Unable to connect to any servers', {'127.0.0.1': error(111, 
"Tried connecting to [('127.0.0.1', 9042)]. Last error: Connection refused")})



 



This happens when I run the command on either of the two machines. Any help 
would be greatly appreciated.














Re: Securing a Cassandra 2.2.6 Cluster

2016-10-30 Thread Vladimir Yudovin
>Empty listen_address and rpc_address.

What do you mean by "Empty"? You should set either ***_address or 
***_interface. Otherwise 

Cassandra will not listen on port 9042.



>Open ports 9042, 7000 and 7001 for external communication.

Only port 9042 should be open to the world, Port 7000 for internode 
communication, and 7001 for internode SSL communication (only one of them is 
used).



>What is the best order of steps

Order doesn't really matter.



>Define both machines as seeds.

It's wrong. Only one (started first) should be seed.





>nodetool sees both of them

cqlsh refuses to connect

Can you please give output of

nodetool status

and

netstat -lptn | grep java



Best regards, Vladimir Yudovin, 

Winguzone - Hosted Cloud Cassandra
Launch your cluster in minutes.





 On Sun, 30 Oct 2016 14:11:55 -0400Raimund Klein 
<chessra...@gmail.com> wrote 




Hi everyone,

 

We've managed to set up a Cassandra 2.2.6 cluster of two physical nodes 
(nodetool sees both of them, so I'm quite certain the cluster is indeed 
active). My steps to create the cluster were (this applies to both machines):



 - Empty listen_address and rpc_address.

 - Define a cluster_name.

 - Define both machines as seeds.

 - Open ports 9042, 7000 and 7001 for external communication.



 



Now I want to secure access to the cluster in all forms:



 - define a different database user with a new password

 - encrypt communication between clients and the cluster including client verification

 - encrypt communication between the nodes including verification



What is the best order of steps and correct way to achieve this? I wanted to 
start with defining a different user, but cqlsh refuses to connect after 
enforcing user/password authentication:



cqlsh -u cassandra -p cassandra

Connection error: ('Unable to connect to any servers', {'127.0.0.1': error(111, 
"Tried connecting to [('127.0.0.1', 9042)]. Last error: Connection refused")})



 



This happens when I run the command on either of the two machines. Any help 
would be greatly appreciated.











Re: how to get the size of the particular partition key belonging to an sstable ??

2016-10-27 Thread Vladimir Yudovin
Hi,



>size of a particular partition key

Can you please elucidate this? A key can be just a number, a string, or several values. 



Best regards, Vladimir Yudovin, 

Winguzone - Hosted Cloud Cassandra
Launch your cluster in minutes.





 On Thu, 27 Oct 2016 11:45:47 -0400Pranay akula 
<pranay.akula2...@gmail.com> wrote 




How can I get the size of a particular partition key belonging to an SSTable? Can we find it using the Index, Summary, or Statistics.db files? Does reading a hexdump of these files help?







Thanks

Pranay.









Re: Re : Generic keystore when enabling SSL

2016-10-27 Thread Vladimir Yudovin
Hi Jacob,



there is no problem using the same certificate (whether issued by some authority or self-signed) on all nodes, as long as it's present in the truststore. The CN doesn't matter in this case; it can be any string you want. 

Would this impact client-to-node encryption?

No, but clients should either add the nodes' certificate to their truststore or disable validation (each Cassandra driver does this in its own way).



Best regards, Vladimir Yudovin, 

Winguzone - Hosted Cloud Cassandra
Launch your cluster in minutes.





 On Thu, 27 Oct 2016 16:45:48 -0400Jacob Shadix 
<jacobsha...@gmail.com> wrote 




I am interested if anyone has taken this approach to share the same keystore 
across all the nodes with the 3rd party root/intermediate CA existing only in 
the truststore. If so, please share your experience and lessons learned. Would 
this impact client-to-node encryption as the certificates used in internode 
would not have the hostnames represented in CN?



-- Jacob Shadix 








On Wed, Sep 21, 2016 at 11:40 AM, sai krishnam raju potturi 
<pskraj...@gmail.com> wrote:

hi Evans;

   rather than having one individual certificate for every node, we are looking at getting one Comodo wild-card certificate and importing that into the keystore, along with the intermediate CA provided by Comodo. As far as the trust-store is concerned, we are looking at importing the intermediate CA provided along with the wild-card cert signed by Comodo.



   So in this case we'll have just one generic keystore and one truststore that we'll copy to all the nodes. We've run into issues, however, and are trying to iron them out. I'm interested to know if anybody in the community has taken a similar approach.

 

   We are pretty much going on the lines of following post by LastPickle 
http://thelastpickle.com/blog/2015/09/30/hardening-cassandra-step-by-step-part-1-server-to-server.html.
 Instead of creating our own CA, we are relying on Comodo.



thanks

Sai





On Wed, Sep 21, 2016 at 10:30 AM, Eric Evans <john.eric.ev...@gmail.com> 
wrote:

On Tue, Sep 20, 2016 at 12:57 PM, sai krishnam raju potturi
 <pskraj...@gmail.com> wrote:
 > Due to the security policies in our company, we were asked to use 3rd 
party
 > signed certs. Since we'll require to manage 100's of individual certs, we
 > wanted to know if there is a work around with a generic keystore and
 > truststore.
 
 Can you explain what you mean by "generic keystore"?  Are you looking
 to create keystores signed by a self-signed root CA (distributed via a

 truststore)?

 
 --
 Eric Evans
 john.eric.ev...@gmail.com















Re: High Usage of Survivor Space

2016-10-26 Thread Vladimir Yudovin
Hi,



did this high memory usage cause any problems? OOM crashes, GC pauses?



Best regards, Vladimir Yudovin, 

Winguzone - Hosted Cloud Cassandra
Launch your cluster in minutes.





 On Wed, 26 Oct 2016 21:51:26 -0400Daniel Kleviansky 
<dan...@kleviansky.com> wrote 




Before anyone wastes any time replying, I found out that most of these were not being used, and managed to drop certain keyspaces and reduce the count of column families to 158.

Though still a fair number, this seems to have relieved the issue significantly!



Hope this helps someone out at some point, otherwise, sorry for the unnecessary 
emails. ;)



Daniel




On Thu, Oct 27, 2016 at 11:25 AM, Daniel Kleviansky 
<dan...@kleviansky.com> wrote:










-- 

Daniel Kleviansky

System Engineer & CX Consultant

M: +61 (0) 499 103 043 | E: dan...@kleviansky.com | W: 
http://danielkleviansky.com







​Hi everyone,



Organisation is running Cassandra for Windows v2.2.5

One of our development (non-load testing) clusters has a total of 6 nodes 
across two DCs, with an RF = 3:3.

Each node has a total of 16GB of memory, and no JVM options have been modified.




We're seeing what I believe to be unusually high usage of survivor space across all nodes in the cluster. Some sit consistently at a 100% maxed-out state, while others are at about 90%.



My initial thought is that this is linked to the high number of keyspaces/column families running on this particular cluster. There is a total of 28 keyspaces and 412 column families. I have no empirical evidence to support this theory, hence my reaching out to the mailing list.



Is there any way to verify this theory, and/or discover potential root causes?



Please let me know what other information I can provide, and I'll be sure to 
get it ASAP.




Kindest regards,

Daniel Kleviansky








Re: Error creating pool to /IP_ADDRESS33:9042 (Proving Cassandra's NO SINGLE point of failure)

2016-10-24 Thread Vladimir Yudovin
Probably the Python driver in use can't retry the failed operation over a connection to another node. Do you provide all three IPs to the Python driver for connecting?



Best regards, Vladimir Yudovin, 

Winguzone - Hosted Cloud Cassandra
Launch your cluster in minutes.





 On Mon, 24 Oct 2016 07:48:05 -0400Rajesh Radhakrishnan 
<rajesh.radhakrish...@phe.gov.uk> wrote 




Hi,

 

 I have  3 nodes Cassandra cluster.

 Cassandra version : dsc-cassandra-2.1.5

 Python Cassandra Driver : 2.5.1

 

 Running the nodes in Red Hat virtual machines.

 

 Node ip info: 

 Node 1: IP_ADDRESS219

 Node 2: IP_ADDRESS229

 Node 3: IP_ADDRESS230

 

 

 (IP_ADDRESS219 is masked for this email; it represents something similar to 123.321.123.219)

 

 

Cassandra.yaml configuration details of node1:

 

 listen_address: IP_ADDRESS219

 broadcast_address: commented

 rpc_address: IP_ADDRESS219

 broadcast_rpc_address : commented

 

 The IP address of the node is ( using the ifconfig command ) IP_ADDRESS219.

 

 While the cluster is up and running, when I put Node 3 (IP_ADDRESS230) down, I was able to connect with cqlsh from IP_ADDRESS219 and IP_ADDRESS229. 

 

 But while I was running a Python script which just reads data from Cassandra using the cassandra-python-driver, I intentionally stopped Node 3 (while the script was still running).

 

 Then the script comes to a halt with OperationTimedOut: errors={}, last_host= 
IP_ADDRESS219.

 

 However if I run the script when node3 is already down, it runs and reads data.

 

 So if any of the nodes in the cluster goes down during a read operation, it affects the client operation? Does anyone have a similar situation? 

 

 Here we are trying to establish or prove Cassandra's 'always on' claim (NO single point of failure). Do you know why this is happening? Thank you.

 

 



Kind regards,
 Rajesh R















**

 The information contained in the EMail and any attachments is confidential and 
intended solely and for the attention and use of the named addressee(s). It may 
not be disclosed to any other person without the express authority of Public 
Health England, or the intended recipient, or both. If you are not the intended 
recipient, you must not disclose, copy, distribute or retain this message or 
any part of it. This footnote also confirms that this EMail has been swept for 
computer viruses by Symantec.Cloud, but please re-sweep any attachments before 
opening or saving. http://www.gov.uk/PHE

 **








Re: time series data model

2016-10-20 Thread Vladimir Yudovin
Hi Simon,

Why is position text and not float? Text takes much more space.
Also, speed and heading can be calculated from the latest positions, so you could avoid storing them at all. If you really need them in the database, you can store them as floats, or compose a single float value like speed.heading: 41.173 (or the opposite, heading.speed) and save the per-column storage overhead.
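
For illustration, the same table with numeric columns instead of a JSON text blob might look like this (the exact column split is an assumption):

CREATE TABLE cargts.eventdata (
    deviceid int,
    date int,
    event_time bigint,
    latitude double,
    longitude double,
    speed float,
    heading float,
    PRIMARY KEY ((deviceid, date), event_time)
);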



Best regards, Vladimir Yudovin, 

Winguzone - Hosted Cloud Cassandra
Launch your cluster in minutes.





 On Thu, 20 Oct 2016 03:29:16 -0400<wxn...@zjqunshuo.com> wrote 




Hi All,

I'm trying to migrate my time series data which is GPS trace from mysql to C*. 
I want a wide row to hold one day data. I designed the data model as below. 
Please help to see if there is any problem. Any suggestion is appreciated.



Table Model:

CREATE TABLE cargts.eventdata (
deviceid int,
date int,
event_time bigint,
position text,
PRIMARY KEY ((deviceid, date), event_time)
)


A slice of data:

cqlsh:cargts> SELECT * FROM eventdata WHERE deviceid =186628 and date = 
20160928 LIMIT 10;

 deviceid | date | event_time| position
--+--+---+-
   186628 | 20160928 | 1474992002000 |  
{"latitude":30.343443936386247,"longitude":120.08751351828943,"speed":41,"heading":48}
   186628 | 20160928 | 1474992012000 |   
{"latitude":30.34409508979662,"longitude":120.08840022183352,"speed":45,"heading":53}
   186628 | 20160928 | 1474992022000 |   
{"latitude":30.34461639856887,"longitude":120.08946100336443,"speed":28,"heading":65}
   186628 | 20160928 | 1474992032000 |   
{"latitude":30.34469478717028,"longitude":120.08973154015409,"speed":11,"heading":67}
   186628 | 20160928 | 1474992042000 |   
{"latitude":30.34494998929474,"longitude":120.09027263811151,"speed":19,"heading":47}
   186628 | 20160928 | 1474992052000 | 
{"latitude":30.346057349126617,"longitude":120.08967091817931,"speed":41,"heading":323}
   186628 | 20160928 | 1474992062000 |
{"latitude":30.346997145708,"longitude":120.08883508853253,"speed":52,"heading":323}
   186628 | 20160928 | 1474992072000 | 
{"latitude":30.348131044340988,"longitude":120.08774702315581,"speed":65,"heading":321}
   186628 | 20160928 | 1474992082000 | 
{"latitude":30.349438164412838,"longitude":120.08652612959328,"speed":68,"heading":322}


-Simon Wu








Re: How to insert "Empty" timeuuid by Cql

2016-10-19 Thread Vladimir Yudovin
Hi,



what exactly does 'empty timeuuid' mean? A UUID takes 16 bytes of storage, so it should be either null or some value. Do you mean a 'zero' UUID?
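
If 'empty' means a zero-length Thrift value rather than a zero UUID, one possible (untested) trick is the blob conversion function with an empty blob literal, e.g.:

DELETE FROM "Foo" WHERE key='test by thrift' AND column1='accessState' AND column2='' AND column3='' AND column4=blobAsTimeuuid(0x) IF EXISTS;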



Best regards, Vladimir Yudovin, 

Winguzone - Hosted Cloud Cassandra
Launch your cluster in minutes.





 On Wed, 19 Oct 2016 09:16:29 -0400coderhlj <coder...@gmail.com> 
wrote 




Hi all,



We use Cassandra 2.1.11 in our product, and we recently updated the Java driver from Astyanax (Thrift API) to the DataStax Java Driver (CQL), but we have encountered a difficult issue, as follows. Please help us; thanks in advance.



Previously we were using the Astyanax API, and we could insert an empty timeuuid as shown below, but now we can only insert a null timeuuid by CQL, not an empty one. Is there any CQL function to insert an empty timeuuid as Astyanax did?

This causes a tough problem: we cannot delete the record by specifying the primary key, like:

delete from "Foo" where column1='test' and column2='accessState' and column3='' 
and column4=(need fill empty uuid here) IF EXISTS;



key  | column1  | column2 | column3 | column4 | value

-+-+-+- 
+-+--

test by thrift | accessState |  |  |  | 0x5




key  | column1  | column2 | column3 | column4  | value

-+-+-+-+--+--

 test by cql   | accessState |  | |  null | 0x5






cqlsh:StorageOS> desc table "Foo";  



CREATE TABLE "Foo" (

  key text,

  column1 text,

  column2 text,

  column3 text,

  column4 timeuuid,

  value blob,

  PRIMARY KEY (key, column1, column2, column3, column4)

) WITH COMPACT STORAGE AND

  bloom_filter_fp_chance=0.01 AND

  caching='{"keys":"ALL", "rows_per_partition":"NONE"}' AND

  comment='' AND

  dclocal_read_repair_chance=0.10 AND

  gc_grace_seconds=432000 AND

  read_repair_chance=0.00 AND

  default_time_to_live=0 AND

  speculative_retry='NONE' AND

  memtable_flush_period_in_ms=0 AND

  compaction={'class': 'SizeTieredCompactionStrategy'} AND

  compression={'sstable_compression': 'LZ4Compressor'};




--

Thanks,

Lijun Huang












Re: quick question

2016-10-19 Thread Vladimir Yudovin
What exactly do you mean by "resource usage"? If you mean "data size on disk" - 
no.

If you mean "current CPU usage" - it depends on query. Modify query should be 
be sent to all nodes owning specific partition key.

For read queries see 
http://www.datastax.com/dev/blog/dynamic-snitching-in-cassandra-past-present-and-future



Best regards, Vladimir Yudovin, 

Winguzone - Hosted Cloud Cassandra
Launch your cluster in minutes.





 On Wed, 19 Oct 2016 06:14:27 -0400Kant Kodali <k...@peernova.com> 
wrote 




Can a Cassandra cluster direct or load-balance requests by detecting the resource usage of a particular node? 








Re: Adding disk capacity to a running node

2016-10-18 Thread Vladimir Yudovin
 On Mon, 17 Oct 2016 15:59:41 -0400Ben Bromhead 
<b...@instaclustr.com> wrote 

For the times that AWS retires an instance, you get plenty of notice and it's 
generally pretty rare. We run over 1000 instances on AWS and see one forced 
retirement a month if that. We've never had an instance pulled from under our 
feet without warning.




Yes, in the case of a planned event. But in the case of some hardware failure it can happen. And it needn't be a catastrophe affecting the whole availability zone - just the failure of a single blade. 





Best regards, Vladimir Yudovin, 

Winguzone - Hosted Cloud Cassandra on Azure and SoftLayer.
Launch your cluster in minutes.






 On Mon, 17 Oct 2016 15:59:41 -0400Ben Bromhead 
<b...@instaclustr.com> wrote 




Yup as everyone has mentioned ephemeral are fine if you run in multiple AZs... 
which is pretty much mandatory for any production deployment in AWS (and other 
cloud providers) . i2.2xls are generally your best bet for high read throughput 
applications on AWS. 



Also on AWS ephemeral storage will generally survive a user initiated restart. 
For the times that AWS retires an instance, you get plenty of notice and it's 
generally pretty rare. We run over 1000 instances on AWS and see one forced 
retirement a month if that. We've never had an instance pulled from under our 
feet without warning.



To add another option for the original question, one thing you can do is to 
attach a large EBS drive to the instance and bind mount it to the directory for 
the table that has the very large SSTables. You will need to copy data across 
to the EBS volume. Let everything compact and then copy everything back and 
detach EBS. Latency may be higher than normal on the node you are doing this on 
(especially if you are used to i2.2xl performance). 



This is something we often have to do, when we encounter pathological 
compaction situations associated with bootstrapping, adding new DCs or STCS 
with a dominant table or people ignore high disk usage warnings :)




On Mon, 17 Oct 2016 at 12:43 Jeff Jirsa <jeff.ji...@crowdstrike.com> 
wrote:




-- 

Ben Bromhead

CTO | Instaclustr

+1 650 284 9692

Managed Cassandra / Spark on AWS, Azure and Softlayer




Ephemeral is fine, you just need to have enough replicas (in enough AZs and 
enough regions) to tolerate instances being terminated.

 

 

 

From: Vladimir Yudovin <vla...@winguzone.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Monday, October 17, 2016 at 11:48 AM
To: user <user@cassandra.apache.org>




Subject: Re: Adding disk capacity to a running node






 


It's extremely unreliable to use ephemeral (local) disks. Even if you don't stop the instance yourself, it can be restarted on a different server in case of some hardware failure or an AWS-initiated update. So all node data will be lost.







 


Best regards, Vladimir Yudovin, 


Winguzone - Hosted Cloud Cassandra on Azure and SoftLayer.
Launch your cluster in minutes.


 


 


 On Mon, 17 Oct 2016 14:45:00 -0400Seth Edwards <s...@pubnub.com> 
wrote 



 


These are i2.2xlarge instances so the disks currently configured as ephemeral 
dedicated disks. 


 


On Mon, Oct 17, 2016 at 11:34 AM, Laing, Michael 
<michael.la...@nytimes.com> wrote:


 





You could just expand the size of your ebs volume and extend the file system. 
No data is lost - assuming you are running Linux.


 


 


On Monday, October 17, 2016, Seth Edwards <s...@pubnub.com> wrote:


We're running 2.0.16. We're migrating to a new data model but we've had an 
unexpected increase in write traffic that has caused us some capacity issues 
when we encounter compactions. Our old data model is on STCS. We'd like to add 
another ebs volume (we're on aws) to our JBOD config and hopefully avoid any 
situation where we run out of disk space during a large compaction. It appears 
that the behavior we are hoping to get is actually undesirable and removed in 
3.2. It still might be an option for us until we can finish the migration. 


 


I'm not familiar with LVM so it may be a bit risky to try at this point. 



 


On Mon, Oct 17, 2016 at 9:42 AM, Yabin Meng <yabinm...@gmail.com> wrote:


I assume you're talking about Cassandra JBOD (just a bunch of disk) setup 
because you do mention it as adding it to the list of data directories. If this 
is the case, you may run into issues, depending on your C* version. Check this 
out: 
https://urldefense.proofpoint.com/v2/url?u=http-3A__www.datastax.com_dev_blog_improving-2Djbod&d=DQMFaQ&c=08AGY6txKsvMOP6lYkHQpPMRA1U6kqhAwGa8-0QCg3M&r=yfYEBHVkX6l0zImlOIBID0gmhluYPD5Jje-3CtaT3ow&m=ixOxpX-xpw1dJZNpaMT3mepToWX8gzmsVaXFizQLzoU&s=e_rkkJ8RHJXe4KvyNfeRWQkdy-zZzOnaMDQle3nN808&e=.


 


Or another approach is to use LVM to manage multiple devices into a single mount point.

Re: Adding disk capacity to a running node

2016-10-17 Thread Vladimir Yudovin
But after such a restart the node should join the cluster again and restore its data, right?



Best regards, Vladimir Yudovin, 

Winguzone - Hosted Cloud Cassandra on Azure and SoftLayer.
Launch your cluster in minutes.






 On Mon, 17 Oct 2016 14:55:49 -0400Jonathan Haddad 
<j...@jonhaddad.com> wrote 




Vladimir,



*Most* people are running Cassandra are doing so using ephemeral disks.  
Instances are not arbitrarily moved to different hosts.  Yes, instances can be 
shut down, but that's why you distribute across AZs.  




On Mon, Oct 17, 2016 at 11:48 AM Vladimir Yudovin <vla...@winguzone.com> 
wrote:







It's extremely unreliable to use ephemeral (local) disks. Even if you don't stop the instance yourself, it can be restarted on a different server in case of some hardware failure or an AWS-initiated update. So all node data will be lost.





Best regards, Vladimir Yudovin, 

Winguzone - Hosted Cloud Cassandra on Azure and SoftLayer.
Launch your cluster in minutes.







 On Mon, 17 Oct 2016 14:45:00 -0400Seth Edwards <s...@pubnub.com> 
wrote 







These are i2.2xlarge instances so the disks currently configured as ephemeral 
dedicated disks. 



On Mon, Oct 17, 2016 at 11:34 AM, Laing, Michael 
<michael.la...@nytimes.com> wrote:






You could just expand the size of your ebs volume and extend the file system. 
No data is lost - assuming you are running Linux.





On Monday, October 17, 2016, Seth Edwards <s...@pubnub.com> wrote:

We're running 2.0.16. We're migrating to a new data model but we've had an 
unexpected increase in write traffic that has caused us some capacity issues 
when we encounter compactions. Our old data model is on STCS. We'd like to add 
another ebs volume (we're on aws) to our JBOD config and hopefully avoid any 
situation where we run out of disk space during a large compaction. It appears 
that the behavior we are hoping to get is actually undesirable and removed in 
3.2. It still might be an option for us until we can finish the migration. 



I'm not familiar with LVM so it may be a bit risky to try at this point. 




On Mon, Oct 17, 2016 at 9:42 AM, Yabin Meng <yabinm...@gmail.com> wrote:

I assume you're talking about Cassandra JBOD (just a bunch of disk) setup 
because you do mention it as adding it to the list of data directories. If this 
is the case, you may run into issues, depending on your C* version. Check this 
out: http://www.datastax.com/dev/blog/improving-jbod.



Or another approach is to use LVM to manage multiple devices into a single mount point. If you do so, all Cassandra sees is simply increased disk storage space, and there should be no problem.
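
An illustrative sketch of that approach (device, volume group, and logical volume names are hypothetical):

# add a new device to an existing LVM volume group and grow the filesystem
pvcreate /dev/xvdf
vgextend cassandra_vg /dev/xvdf
lvextend -l +100%FREE /dev/cassandra_vg/data
resize2fs /dev/cassandra_vg/data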



Hope this helps,



Yabin




On Mon, Oct 17, 2016 at 11:54 AM, Vladimir Yudovin <vla...@winguzone.com> 
wrote:



Yes, Cassandra should keep the percentage of disk usage equal across all disks. The compaction process and SSTable flushes will use the new disk to distribute both new and existing data.



Best regards, Vladimir Yudovin, 

Winguzone - Hosted Cloud Cassandra on Azure and SoftLayer.
Launch your cluster in minutes.





 On Mon, 17 Oct 2016 11:43:27 -0400Seth Edwards <s...@pubnub.com> 
wrote 




We have a few nodes that are running out of disk capacity at the moment, and instead of adding more nodes to the cluster we would like to add another disk to the server and add it to the list of data directories. My question is: will Cassandra use the new disk for compactions of SSTables that already exist in the primary directory? 







Thanks!



























  1   2   >