Re: Understanding cassandra data directory contents

2016-10-10 Thread Vladimir Yudovin
Snapshots are created inside of table folder (one with ID suffix):



$ nodetool snapshot music

Requested creating snapshot(s) for [music] with snapshot name [1476165047920]

Snapshot directory: 1476165047920



$ pwd

cassandra/data/data/music/songs-6060ae608dd811e68e340f08799f1f06/snapshots/1476165047920




Best regards, Vladimir Yudovin, 

Winguzone - Hosted Cloud Cassandra on Azure and SoftLayer.
Launch your cluster in minutes.





 On Mon, 10 Oct 2016 17:00:03 -0400, Nicolas Douillet 
nicolas.douil...@gmail.com wrote 




Hi Jason, 



I'm not familiar enough with Cassandra 3, but it might be snapshots. Snapshots 
are usually hard links to SSTable files.
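
The hard-link behaviour can be seen with a toy directory (a sketch only; the 
file and snapshot names below are made up and this is not a real Cassandra data 
directory):

```shell
# A snapshot entry is a hard link sharing the inode of the live SSTable,
# which is why taking a snapshot costs almost no disk space at first.
tmp=$(mktemp -d)
echo 'sstable bytes' > "$tmp/ma-1-big-Data.db"   # stand-in for a live SSTable
mkdir -p "$tmp/snapshots/1476165047920"
ln "$tmp/ma-1-big-Data.db" "$tmp/snapshots/1476165047920/ma-1-big-Data.db"
# Hard-link count is now 2 (GNU stat; the fallback is for BSD/macOS stat)
links=$(stat -c %h "$tmp/ma-1-big-Data.db" 2>/dev/null \
        || stat -f %l "$tmp/ma-1-big-Data.db")
echo "link count: $links"
```

A snapshot only starts costing real space once compaction deletes the live 
SSTable and the snapshot's link is the last one keeping the data alive, which 
is why forgotten snapshots can quietly explain one node using far more disk 
than its peers.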



Try this : 

nodetool clearsnapshot



Does it change anything?



--

Nicolas




On Sat, Oct 8, 2016 at 21:26, Jason Kania jason.ka...@ymail.com wrote:





Hi Vladimir,



Thanks for the response. I assume then that it is safe to remove the 
directories that are not current as per the system_schema.tables table. I have 
dozens of directories for the same table, and I haven't dropped and recreated 
it nearly that many times. Do any of the nodetool or other commands clean up 
these unused directories?



Thanks,



Jason Kania


From: Vladimir Yudovin vla...@winguzone.com
 To: user@cassandra.apache.org; Jason Kania jason.ka...@ymail.com 
 Sent: Saturday, October 8, 2016 2:05 PM
 Subject: Re: Understanding cassandra data directory contents







Each table has a unique id (suffix). If you drop and then recreate a table with 
the same name, it gets a new id.



Try

SELECT keyspace_name, table_name, id FROM system_schema.tables ;

to determine the actual ID.



You can limit the request to a specific keyspace or table.
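
Building on that query, here is a rough sketch (with hypothetical directory 
names and id values) for flagging table directories whose suffix no longer 
matches the current schema id. Note that the directory suffix is the table's 
UUID with the dashes removed, so strip them before comparing:

```shell
# Current id for the table, as returned by the system_schema.tables query
# above (hypothetical value).
current_id="6060ae60-8dd8-11e6-8e34-0f08799f1f06"
suffix_now=$(echo "$current_id" | tr -d '-')  # directory names drop the dashes

report=""
for dir in songs-6060ae608dd811e68e340f08799f1f06 \
           songs-453d55a0501d11e68623a9d2b6f96e86; do
  # The part after the last dash is the id suffix embedded in the name.
  if [ "${dir##*-}" = "$suffix_now" ]; then
    report="$report $dir:current"
  else
    report="$report $dir:stale"
  fi
done
echo "$report"
```

Anything flagged stale is only a candidate: take a backup and double-check 
before removing anything, since (as far as this thread establishes) no nodetool 
command cleans these leftover directories up for you.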





Best regards, Vladimir Yudovin, 

Winguzone - Hosted Cloud Cassandra on Azure and SoftLayer.
Launch your cluster in minutes.





 On Sat, 08 Oct 2016 13:42:19 -0400, Jason Kania jason.ka...@ymail.com wrote  


Hello,



I am using Cassandra 3.0.9 and I have encountered an issue where the nodes in 
my 3-node cluster have vastly different amounts of data even though they should 
be roughly the same. When I look through the data directory for my database 
on two of the nodes, I see a number of directories with the same prefix, e.g.:



periodicReading-76eb7510096811e68a7421c8b9466352,

periodicReading-453d55a0501d11e68623a9d2b6f96e86

...



Only one directory with a specific table name prefix has a current date and the 
rest are older.



In contrast, on the node with the least space used, each directory has a unique 
prefix (not shared).



I am wondering what the contents of a Cassandra data directory should look 
like. Are there supposed to be multiple directories for a given table, or just 
one?



If just one, what would be the procedure to determine whether the other 
directories for the same table are junk that can be removed?



Thanks,



Jason


Re: Where to change the datacenter name?

2016-10-10 Thread Vladimir Yudovin
Hello,

on my local machine, Cassandra is annoyingly insisting on 'datacenter1'.

I don't believe Cassandra does it on its own )))



What is the endpoint_snitch parameter in your cassandra.yaml file? As was 
mentioned, different snitches use different configuration files, and you can 
set the same data center name in both your testing and production environments.
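
For example, with GossipingPropertyFileSnitch the data center name is read from 
cassandra-rackdc.properties on each node (the values below are illustrative):

```properties
# cassandra-rackdc.properties, read at startup by GossipingPropertyFileSnitch
dc=DC1
rack=RAC1
```

With the default SimpleSnitch, by contrast, the names datacenter1 and rack1 are 
hardcoded in the snitch itself, which is why they show up in nodetool status 
without appearing in any configuration file.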



I would be careful changing datacenter name, particularly in 
production...it may result in stale data depending on token values 

Actually, tokens shouldn't change if the number of nodes remains the same.

You can change the DC name on all nodes (-Dcassandra.ignore_dc=true must be set 
on the first run) and then run nodetool repair/cleanup on each node to ensure 
data consistency.





And unless I define my replication as: '{'class': 
'NetworkTopologyStrategy', 'datacenter1' : 3}' when creating my keyspace, my 
inserts / selects don't work because it says 0 replicas available

You can probably also use SimpleStrategy instead (depending on your system 
configuration and needs).





Best regards, Vladimir Yudovin, 

Winguzone - Hosted Cloud Cassandra on Azure and SoftLayer.
Launch your cluster in minutes.






 On Mon, 10 Oct 2016 16:30:49 -0400, Ali Akhtar ali.rac...@gmail.com 
wrote 




Yeah, so what's happening is, I'm running Cassandra thru a docker image in 
production, and so over there, it is using the datacenter name that I specified 
thru an env variable.



But on my local machine, Cassandra is annoyingly insisting on 'datacenter1'.



So in order to maintain the same .cql scripts for setting up the db, I either 
need to change the dc name locally or in production.



I guess it looks like I should leave it as 'datacenter1' in production.




On Tue, Oct 11, 2016 at 1:19 AM, Amit Trivedi tria...@gmail.com wrote:






I believe it is coming from system.local. You can verify by executing


select data_center from system.local;

I would be careful changing the datacenter name, particularly in production. 
This is essentially because, if the datacenter change requires a snitch 
configuration change, it may result in stale data depending on token values and 
snitch settings, and there is a risk of a node reporting invalid or missing 
data to clients.






On Mon, Oct 10, 2016 at 4:08 PM, Ali Akhtar ali.rac...@gmail.com wrote:

So I see this:



cluster_name: 'Test Cluster'



But when I grep -i or Ctrl+F for 'datacenter1' in cassandra.yaml, I don't see 
it anywhere except in a comment.





Yet when I do nodetool status, I see: datacenter1



And unless I define my replication as: '{'class': 'NetworkTopologyStrategy', 
'datacenter1' : 3}' when creating my keyspace, my inserts / selects don't work 
because it says 0 replicas available (i.e if i use anything other than 
'datacenter1' in the above stmt)



I don't see 'datacenter1' in rackdc.properties. So my question is, which file 
contains 'datacenter1'?




On Tue, Oct 11, 2016 at 12:54 AM, Adam Hutson a...@datascale.io wrote:

There is a cluster name in the cassandra.yaml for naming the cluster, aka data 
center. Then you assign keyspaces to the data center within the CREATE KEYSPACE 
stmt with NetworkTopology. 





On Monday, October 10, 2016, Ali Akhtar ali.rac...@gmail.com wrote:

Where can I change the default name 'datacenter1'? I've looked through the 
configuration files in /etc/cassandra , and can't find where this value is 
being defined.





-- 

Adam Hutson
Data Architect | DataScale
+1 (417) 224-5212
a...@datascale.io


Re: Being asked to use frozen for UDT in 3.9

2016-10-10 Thread Jonathan Haddad
No, you can't.  Keep in mind that parts of the primary key are immutable, so
there would be no usability difference between a frozen UDT in your PK and
a non-frozen one, other than the frozen keyword.
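
Restated as a CQL sketch (the type and table names here are invented):

```sql
CREATE TYPE addr (street text, city text);

-- Cassandra 3.6+: a non-frozen UDT is accepted as a regular column ...
CREATE TABLE ok (id text PRIMARY KEY, home addr);

-- ... but any primary key component must still be frozen:
CREATE TABLE bad  (home addr PRIMARY KEY);          -- rejected: InvalidRequest
CREATE TABLE good (home frozen<addr> PRIMARY KEY);  -- accepted
```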

On Mon, Oct 10, 2016 at 10:07 PM Andrew Tolbert 
wrote:

> Is it possible to use fields on the UDT as primary / cluster keys?
>
>
> That is not supported as far as I know.  In that case it's probably best
> to either use a frozen UDT or make the field a separate column.
>
> Thanks,
> Andy
>
> On Mon, Oct 10, 2016 at 11:50 PM, Jonathan Haddad 
> wrote:
>
> Works for me.  You can see the version, CREATE TYPE, CREATE TABLE,
> insertion, and describing the table
>
> jhaddad@rustyrazorblade ~/dev/cassandra$ bin/cqlsh
>
>  c1fa214
> Connected to Test Cluster at 127.0.0.1:9042.
> [cqlsh 5.0.1 | Cassandra 3.9-SNAPSHOT | CQL spec 3.4.2 | Native protocol
> v4]
> Use HELP for help.
> cqlsh> create KEYSPACE test
> 
> cqlsh> create KEYSPACE test WITH replication = {'class': 'SimpleStrategy',
> 'replication_factor': 1};
> cqlsh> use test;
> cqlsh:test> CREATE TYPE test (
> ... foo text,
> ... bar text
> ... );
> cqlsh:test>
> cqlsh:test> CREATE TABLE test_table (
> ... id text,
> ... this_doesnt_work test,
> ... PRIMARY KEY (id)
> ... );
> cqlsh:test> insert into test_table ( id, this_doesnt_work) values ('jon',
> {foo:'a', bar:'b'});
> cqlsh:test>
> cqlsh:test> insert into test_table ( id, this_doesnt_work) values
> ('haddad', {foo:'a'});
> cqlsh:test> desc test_table;
>
> CREATE TABLE test.test_table (
> id text PRIMARY KEY,
> this_doesnt_work test
> ) WITH bloom_filter_fp_chance = 0.01
>
>
> On Mon, Oct 10, 2016 at 9:25 PM Ali Akhtar  wrote:
>
> CREATE TYPE test (
> foo text,
> bar text
> );
>
> CREATE TABLE test_table (
> id text,
> this_doesnt_work test,
> PRIMARY KEY (id)
> );
>
> On Tue, Oct 11, 2016 at 9:23 AM, Andrew Tolbert <
> andrew.tolb...@datastax.com> wrote:
>
> Can you please share an example where it doesn't work?
>
> Thanks,
> Andy
>
> On Mon, Oct 10, 2016 at 11:21 PM Ali Akhtar  wrote:
>
> Not sure I understand the question, sorry.
>
> The column isn't part of the primary key.
>
> I defined a UDT and then I tried to define a column (not primary or
> cluster key) as being of that type, but it doesn't let me do that unless i
> set it as frozen. Docs indicate otherwise though
>
> On Tue, Oct 11, 2016 at 9:09 AM, Andrew Tolbert <
> andrew.tolb...@datastax.com> wrote:
>
> Is the column that has the UDT type the primary key, or is it part of the
> primary key?  If that is the case, it still needs to be frozen (the same
> goes for list, set, tuple as part of primary key).  This is the error I get
> when I try that:
>
> InvalidRequest: Error from server: code=2200 [Invalid query]
> message="Invalid non-frozen user-defined type for PRIMARY KEY component
> basics"
>
> Andy
>
> On Mon, Oct 10, 2016 at 8:27 PM Ali Akhtar  wrote:
>
> According to
> http://docs.datastax.com/en/cql/3.3/cql/cql_using/useCreateUDT.html
>
> >  In Cassandra 3.6 and later, the frozen keyword is not required for UDTs
> that contain only non-collection fields.
>
> However if I create a type with 4-5 all text fields, and try to use that
> type in another table, I get told to use frozen , even though I'm on
> cassandra 3.9
>
> >  show VERSION
> > [cqlsh 5.0.1 | Cassandra 3.9 | CQL spec 3.4.2 | Native protocol v4]
>
> Any ideas?
>
>
>
>
>


Re: Being asked to use frozen for UDT in 3.9

2016-10-10 Thread Andrew Tolbert
>
> Is it possible to use fields on the UDT as primary / cluster keys?


That is not supported as far as I know.  In that case it's probably best to
either use a frozen UDT or make the field a separate column.

Thanks,
Andy

On Mon, Oct 10, 2016 at 11:50 PM, Jonathan Haddad  wrote:

> Works for me.  You can see the version, CREATE TYPE, CREATE TABLE,
> insertion, and describing the table
>
> jhaddad@rustyrazorblade ~/dev/cassandra$ bin/cqlsh
>
>  c1fa214
> Connected to Test Cluster at 127.0.0.1:9042.
> [cqlsh 5.0.1 | Cassandra 3.9-SNAPSHOT | CQL spec 3.4.2 | Native protocol
> v4]
> Use HELP for help.
> cqlsh> create KEYSPACE test
> 
> cqlsh> create KEYSPACE test WITH replication = {'class': 'SimpleStrategy',
> 'replication_factor': 1};
> cqlsh> use test;
> cqlsh:test> CREATE TYPE test (
> ... foo text,
> ... bar text
> ... );
> cqlsh:test>
> cqlsh:test> CREATE TABLE test_table (
> ... id text,
> ... this_doesnt_work test,
> ... PRIMARY KEY (id)
> ... );
> cqlsh:test> insert into test_table ( id, this_doesnt_work) values ('jon',
> {foo:'a', bar:'b'});
> cqlsh:test>
> cqlsh:test> insert into test_table ( id, this_doesnt_work) values
> ('haddad', {foo:'a'});
> cqlsh:test> desc test_table;
>
> CREATE TABLE test.test_table (
> id text PRIMARY KEY,
> this_doesnt_work test
> ) WITH bloom_filter_fp_chance = 0.01
>
>
> On Mon, Oct 10, 2016 at 9:25 PM Ali Akhtar  wrote:
>
>> CREATE TYPE test (
>> foo text,
>> bar text
>> );
>>
>> CREATE TABLE test_table (
>> id text,
>> this_doesnt_work test,
>> PRIMARY KEY (id)
>> );
>>
>> On Tue, Oct 11, 2016 at 9:23 AM, Andrew Tolbert <
>> andrew.tolb...@datastax.com> wrote:
>>
>> Can you please share an example where it doesn't work?
>>
>> Thanks,
>> Andy
>>
>> On Mon, Oct 10, 2016 at 11:21 PM Ali Akhtar  wrote:
>>
>> Not sure I understand the question, sorry.
>>
>> The column isn't part of the primary key.
>>
>> I defined a UDT and then I tried to define a column (not primary or
>> cluster key) as being of that type, but it doesn't let me do that unless i
>> set it as frozen. Docs indicate otherwise though
>>
>> On Tue, Oct 11, 2016 at 9:09 AM, Andrew Tolbert <
>> andrew.tolb...@datastax.com> wrote:
>>
>> Is the column that has the UDT type the primary key, or is it part of
>> the primary key?  If that is the case, it still needs to be frozen (the same
>> goes for list, set, tuple as part of primary key).  This is the error I get
>> when I try that:
>>
>> InvalidRequest: Error from server: code=2200 [Invalid query]
>> message="Invalid non-frozen user-defined type for PRIMARY KEY component
>> basics"
>>
>> Andy
>>
>> On Mon, Oct 10, 2016 at 8:27 PM Ali Akhtar  wrote:
>>
>> According to http://docs.datastax.com/en/cql/3.3/cql/cql_using/useCrea
>> teUDT.html
>>
>> >  In Cassandra 3.6 and later, the frozen keyword is not required for
>> UDTs that contain only non-collection fields.
>>
>> However if I create a type with 4-5 all text fields, and try to use that
>> type in another table, I get told to use frozen , even though I'm on
>> cassandra 3.9
>>
>> >  show VERSION
>> > [cqlsh 5.0.1 | Cassandra 3.9 | CQL spec 3.4.2 | Native protocol v4]
>>
>> Any ideas?
>>
>>
>>
>>


Re: Being asked to use frozen for UDT in 3.9

2016-10-10 Thread Ali Akhtar
Is it possible to use fields on the UDT as primary / cluster keys?

On Tue, Oct 11, 2016 at 9:49 AM, Ali Akhtar  wrote:

> Yeah, you're right, it does work if I run it thru cqlsh. I was using
> DevCenter which shows that error.
>
> On Tue, Oct 11, 2016 at 9:48 AM, Andrew Tolbert <
> andrew.tolb...@datastax.com> wrote:
>
>> That works for me.   Are you sure you are on 3.6+?  What error message
>> are you getting?
>>
>> Thanks,
>> Andy
>>
>> On Mon, Oct 10, 2016 at 11:25 PM Ali Akhtar  wrote:
>>
>>> CREATE TYPE test (
>>> foo text,
>>> bar text
>>> );
>>>
>>> CREATE TABLE test_table (
>>> id text,
>>> this_doesnt_work test,
>>> PRIMARY KEY (id)
>>> );
>>>
>>> On Tue, Oct 11, 2016 at 9:23 AM, Andrew Tolbert <
>>> andrew.tolb...@datastax.com> wrote:
>>>
>>> Can you please share an example where it doesn't work?
>>>
>>> Thanks,
>>> Andy
>>>
>>> On Mon, Oct 10, 2016 at 11:21 PM Ali Akhtar 
>>> wrote:
>>>
>>> Not sure I understand the question, sorry.
>>>
>>> The column isn't part of the primary key.
>>>
>>> I defined a UDT and then I tried to define a column (not primary or
>>> cluster key) as being of that type, but it doesn't let me do that unless i
>>> set it as frozen. Docs indicate otherwise though
>>>
>>> On Tue, Oct 11, 2016 at 9:09 AM, Andrew Tolbert <
>>> andrew.tolb...@datastax.com> wrote:
>>>
>>> Is the column that has the UDT type the primary key, or is it part of
>>> the primary key?  If that is the case, it still needs to be frozen (the same
>>> goes for list, set, tuple as part of primary key).  This is the error I get
>>> when I try that:
>>>
>>> InvalidRequest: Error from server: code=2200 [Invalid query]
>>> message="Invalid non-frozen user-defined type for PRIMARY KEY component
>>> basics"
>>>
>>> Andy
>>>
>>> On Mon, Oct 10, 2016 at 8:27 PM Ali Akhtar  wrote:
>>>
>>> According to http://docs.datastax.com/en/cql/3.3/cql/cql_using/useCrea
>>> teUDT.html
>>>
>>> >  In Cassandra 3.6 and later, the frozen keyword is not required for
>>> UDTs that contain only non-collection fields.
>>>
>>> However if I create a type with 4-5 all text fields, and try to use that
>>> type in another table, I get told to use frozen , even though I'm on
>>> cassandra 3.9
>>>
>>> >  show VERSION
>>> > [cqlsh 5.0.1 | Cassandra 3.9 | CQL spec 3.4.2 | Native protocol v4]
>>>
>>> Any ideas?
>>>
>>>
>>>
>>>
>


Re: Being asked to use frozen for UDT in 3.9

2016-10-10 Thread Jonathan Haddad
Works for me.  You can see the version, CREATE TYPE, CREATE TABLE,
insertion, and describing the table

jhaddad@rustyrazorblade ~/dev/cassandra$ bin/cqlsh

 c1fa214
Connected to Test Cluster at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 3.9-SNAPSHOT | CQL spec 3.4.2 | Native protocol v4]
Use HELP for help.
cqlsh> create KEYSPACE test

cqlsh> create KEYSPACE test WITH replication = {'class': 'SimpleStrategy',
'replication_factor': 1};
cqlsh> use test;
cqlsh:test> CREATE TYPE test (
... foo text,
... bar text
... );
cqlsh:test>
cqlsh:test> CREATE TABLE test_table (
... id text,
... this_doesnt_work test,
... PRIMARY KEY (id)
... );
cqlsh:test> insert into test_table ( id, this_doesnt_work) values ('jon',
{foo:'a', bar:'b'});
cqlsh:test>
cqlsh:test> insert into test_table ( id, this_doesnt_work) values
('haddad', {foo:'a'});
cqlsh:test> desc test_table;

CREATE TABLE test.test_table (
id text PRIMARY KEY,
this_doesnt_work test
) WITH bloom_filter_fp_chance = 0.01


On Mon, Oct 10, 2016 at 9:25 PM Ali Akhtar  wrote:

> CREATE TYPE test (
> foo text,
> bar text
> );
>
> CREATE TABLE test_table (
> id text,
> this_doesnt_work test,
> PRIMARY KEY (id)
> );
>
> On Tue, Oct 11, 2016 at 9:23 AM, Andrew Tolbert <
> andrew.tolb...@datastax.com> wrote:
>
> Can you please share an example where it doesn't work?
>
> Thanks,
> Andy
>
> On Mon, Oct 10, 2016 at 11:21 PM Ali Akhtar  wrote:
>
> Not sure I understand the question, sorry.
>
> The column isn't part of the primary key.
>
> I defined a UDT and then I tried to define a column (not primary or
> cluster key) as being of that type, but it doesn't let me do that unless i
> set it as frozen. Docs indicate otherwise though
>
> On Tue, Oct 11, 2016 at 9:09 AM, Andrew Tolbert <
> andrew.tolb...@datastax.com> wrote:
>
> Is the column that has the UDT type the primary key, or is it part of the
> primary key?  If that is the case, it still needs to be frozen (the same
> goes for list, set, tuple as part of primary key).  This is the error I get
> when I try that:
>
> InvalidRequest: Error from server: code=2200 [Invalid query]
> message="Invalid non-frozen user-defined type for PRIMARY KEY component
> basics"
>
> Andy
>
> On Mon, Oct 10, 2016 at 8:27 PM Ali Akhtar  wrote:
>
> According to
> http://docs.datastax.com/en/cql/3.3/cql/cql_using/useCreateUDT.html
>
> >  In Cassandra 3.6 and later, the frozen keyword is not required for UDTs
> that contain only non-collection fields.
>
> However if I create a type with 4-5 all text fields, and try to use that
> type in another table, I get told to use frozen , even though I'm on
> cassandra 3.9
>
> >  show VERSION
> > [cqlsh 5.0.1 | Cassandra 3.9 | CQL spec 3.4.2 | Native protocol v4]
>
> Any ideas?
>
>
>
>


Re: Being asked to use frozen for UDT in 3.9

2016-10-10 Thread Ali Akhtar
Yeah, you're right, it does work if I run it thru cqlsh. I was using
DevCenter, which shows that error.

On Tue, Oct 11, 2016 at 9:48 AM, Andrew Tolbert  wrote:

> That works for me.   Are you sure you are on 3.6+?  What error message are
> you getting?
>
> Thanks,
> Andy
>
> On Mon, Oct 10, 2016 at 11:25 PM Ali Akhtar  wrote:
>
>> CREATE TYPE test (
>> foo text,
>> bar text
>> );
>>
>> CREATE TABLE test_table (
>> id text,
>> this_doesnt_work test,
>> PRIMARY KEY (id)
>> );
>>
>> On Tue, Oct 11, 2016 at 9:23 AM, Andrew Tolbert <
>> andrew.tolb...@datastax.com> wrote:
>>
>> Can you please share an example where it doesn't work?
>>
>> Thanks,
>> Andy
>>
>> On Mon, Oct 10, 2016 at 11:21 PM Ali Akhtar  wrote:
>>
>> Not sure I understand the question, sorry.
>>
>> The column isn't part of the primary key.
>>
>> I defined a UDT and then I tried to define a column (not primary or
>> cluster key) as being of that type, but it doesn't let me do that unless i
>> set it as frozen. Docs indicate otherwise though
>>
>> On Tue, Oct 11, 2016 at 9:09 AM, Andrew Tolbert <
>> andrew.tolb...@datastax.com> wrote:
>>
>> Is the column that has the UDT type the primary key, or is it part of
>> the primary key?  If that is the case, it still needs to be frozen (the same
>> goes for list, set, tuple as part of primary key).  This is the error I get
>> when I try that:
>>
>> InvalidRequest: Error from server: code=2200 [Invalid query]
>> message="Invalid non-frozen user-defined type for PRIMARY KEY component
>> basics"
>>
>> Andy
>>
>> On Mon, Oct 10, 2016 at 8:27 PM Ali Akhtar  wrote:
>>
>> According to http://docs.datastax.com/en/cql/3.3/cql/cql_using/
>> useCreateUDT.html
>>
>> >  In Cassandra 3.6 and later, the frozen keyword is not required for
>> UDTs that contain only non-collection fields.
>>
>> However if I create a type with 4-5 all text fields, and try to use that
>> type in another table, I get told to use frozen , even though I'm on
>> cassandra 3.9
>>
>> >  show VERSION
>> > [cqlsh 5.0.1 | Cassandra 3.9 | CQL spec 3.4.2 | Native protocol v4]
>>
>> Any ideas?
>>
>>
>>
>>


Re: Being asked to use frozen for UDT in 3.9

2016-10-10 Thread Andrew Tolbert
That works for me.  Are you sure you are on 3.6+?  What error message are
you getting?

Thanks,
Andy

On Mon, Oct 10, 2016 at 11:25 PM Ali Akhtar  wrote:

> CREATE TYPE test (
> foo text,
> bar text
> );
>
> CREATE TABLE test_table (
> id text,
> this_doesnt_work test,
> PRIMARY KEY (id)
> );
>
> On Tue, Oct 11, 2016 at 9:23 AM, Andrew Tolbert <
> andrew.tolb...@datastax.com> wrote:
>
> Can you please share an example where it doesn't work?
>
> Thanks,
> Andy
>
> On Mon, Oct 10, 2016 at 11:21 PM Ali Akhtar  wrote:
>
> Not sure I understand the question, sorry.
>
> The column isn't part of the primary key.
>
> I defined a UDT and then I tried to define a column (not primary or
> cluster key) as being of that type, but it doesn't let me do that unless i
> set it as frozen. Docs indicate otherwise though
>
> On Tue, Oct 11, 2016 at 9:09 AM, Andrew Tolbert <
> andrew.tolb...@datastax.com> wrote:
>
> Is the column that has the UDT type the primary key, or is it part of the
> primary key?  If that is the case, it still needs to be frozen (the same
> goes for list, set, tuple as part of primary key).  This is the error I get
> when I try that:
>
> InvalidRequest: Error from server: code=2200 [Invalid query]
> message="Invalid non-frozen user-defined type for PRIMARY KEY component
> basics"
>
> Andy
>
> On Mon, Oct 10, 2016 at 8:27 PM Ali Akhtar  wrote:
>
> According to
> http://docs.datastax.com/en/cql/3.3/cql/cql_using/useCreateUDT.html
>
> >  In Cassandra 3.6 and later, the frozen keyword is not required for UDTs
> that contain only non-collection fields.
>
> However if I create a type with 4-5 all text fields, and try to use that
> type in another table, I get told to use frozen , even though I'm on
> cassandra 3.9
>
> >  show VERSION
> > [cqlsh 5.0.1 | Cassandra 3.9 | CQL spec 3.4.2 | Native protocol v4]
>
> Any ideas?
>
>
>
>


Re: NamingStrategy for the Java Driver for camelCase / snake_case conversion?

2016-10-10 Thread Ali Akhtar
Awesome, thank you.

Perhaps this should be updated on the docs here:
http://docs.datastax.com/en/developer/java-driver//3.1/manual/udts/



On Tue, Oct 11, 2016 at 9:27 AM, Andrew Tolbert  wrote:

> Indeed it is possible to use UDTs with the mapper (docs
> ).
> Pojos are annotated with @UDT and their fields are mapped with @Field (like
> table pojos are annotated with @Table and @Column respectively).  You are
> correct in that you can then use that type for a field on a @Table
> annotated class.
>
> Thanks,
> Andy
>
>
>
> On Mon, Oct 10, 2016 at 11:23 PM Ali Akhtar  wrote:
>
>> Thanks.
>>
>> Btw, is it possible to use UDTs and have them mapped via the java driver?
>> If so, how does that work - do I just create a pojo for the UDT, and use
>> @Column on the fields, and it will work if I define a field in the table
>> mapping class as being of that pojo type?
>>
>> On Tue, Oct 11, 2016 at 8:57 AM, Andrew Tolbert <
>> andrew.tolb...@datastax.com> wrote:
>>
>> I agree this would be a nice mechanism for the driver mapper given the
>> difference between java field name conventions and how cql column names are
>> typically defined.   I've created JAVA-1316
>>  for this.
>>
>> Thanks,
>> Andy
>>
>>
>>
>> On Mon, Oct 10, 2016 at 10:30 PM Ali Akhtar  wrote:
>>
>> Please fix this.
>>
>>
>>
>> On Tue, Oct 11, 2016 at 8:28 AM, Andrew Tolbert <
>> andrew.tolb...@datastax.com> wrote:
>>
>> Hi Ali,
>>
>> As far as I know this hasn't changed.  Either the field name on the class
>> has to match the name of the column or you have to use the @Column with the
>> name attribute to set the column name being mapped by that field.
>>
>> Thanks,
>> Andy
>>
>> On Mon, Oct 10, 2016 at 8:03 PM Ali Akhtar  wrote:
>>
>> In working with Jackson, it has a NamingStrategy which lets you
>> automatically map snake_case fields in json to camelCase fields on the Java
>> class.
>>
>> Last time I worked w/ Cassandra, I didn't find anything like that, and
>> had to define an @Column annotation for each field.
>>
>> Please tell me this has changed now?
>>
>>
>>
>>


Re: NamingStrategy for the Java Driver for camelCase / snake_case conversion?

2016-10-10 Thread Andrew Tolbert
Indeed it is possible to use UDTs with the mapper (docs
).
Pojos are annotated with @UDT and their fields are mapped with @Field (like
table pojos are annotated with @Table and @Column respectively).  You are
correct in that you can then use that type for a field on a @Table
annotated class.

Thanks,
Andy



On Mon, Oct 10, 2016 at 11:23 PM Ali Akhtar  wrote:

> Thanks.
>
> Btw, is it possible to use UDTs and have them mapped via the java driver?
> If so, how does that work - do I just create a pojo for the UDT, and use
> @Column on the fields, and it will work if I define a field in the table
> mapping class as being of that pojo type?
>
> On Tue, Oct 11, 2016 at 8:57 AM, Andrew Tolbert <
> andrew.tolb...@datastax.com> wrote:
>
> I agree this would be a nice mechanism for the driver mapper given the
> difference between java field name conventions and how cql column names are
> typically defined.   I've created JAVA-1316
>  for this.
>
> Thanks,
> Andy
>
>
>
> On Mon, Oct 10, 2016 at 10:30 PM Ali Akhtar  wrote:
>
> Please fix this.
>
>
>
> On Tue, Oct 11, 2016 at 8:28 AM, Andrew Tolbert <
> andrew.tolb...@datastax.com> wrote:
>
> Hi Ali,
>
> As far as I know this hasn't changed.  Either the field name on the class
> has to match the name of the column or you have to use the @Column with the
> name attribute to set the column name being mapped by that field.
>
> Thanks,
> Andy
>
> On Mon, Oct 10, 2016 at 8:03 PM Ali Akhtar  wrote:
>
> In working with Jackson, it has a NamingStrategy which lets you
> automatically map snake_case fields in json to camelCase fields on the Java
> class.
>
> Last time I worked w/ Cassandra, I didn't find anything like that, and had
> to define an @Column annotation for each field.
>
> Please tell me this has changed now?
>
>
>
>


Re: Being asked to use frozen for UDT in 3.9

2016-10-10 Thread Ali Akhtar
CREATE TYPE test (
foo text,
bar text
);

CREATE TABLE test_table (
id text,
this_doesnt_work test,
PRIMARY KEY (id)
);

On Tue, Oct 11, 2016 at 9:23 AM, Andrew Tolbert  wrote:

> Can you please share an example where it doesn't work?
>
> Thanks,
> Andy
>
> On Mon, Oct 10, 2016 at 11:21 PM Ali Akhtar  wrote:
>
>> Not sure I understand the question, sorry.
>>
>> The column isn't part of the primary key.
>>
>> I defined a UDT and then I tried to define a column (not primary or
>> cluster key) as being of that type, but it doesn't let me do that unless i
>> set it as frozen. Docs indicate otherwise though
>>
>> On Tue, Oct 11, 2016 at 9:09 AM, Andrew Tolbert <
>> andrew.tolb...@datastax.com> wrote:
>>
>> Is the column that has the UDT type the primary key, or is it part of
>> the primary key?  If that is the case, it still needs to be frozen (the same
>> goes for list, set, tuple as part of primary key).  This is the error I get
>> when I try that:
>>
>> InvalidRequest: Error from server: code=2200 [Invalid query]
>> message="Invalid non-frozen user-defined type for PRIMARY KEY component
>> basics"
>>
>> Andy
>>
>> On Mon, Oct 10, 2016 at 8:27 PM Ali Akhtar  wrote:
>>
>> According to http://docs.datastax.com/en/cql/3.3/cql/cql_using/
>> useCreateUDT.html
>>
>> >  In Cassandra 3.6 and later, the frozen keyword is not required for
>> UDTs that contain only non-collection fields.
>>
>> However if I create a type with 4-5 all text fields, and try to use that
>> type in another table, I get told to use frozen , even though I'm on
>> cassandra 3.9
>>
>> >  show VERSION
>> > [cqlsh 5.0.1 | Cassandra 3.9 | CQL spec 3.4.2 | Native protocol v4]
>>
>> Any ideas?
>>
>>
>>


Re: Being asked to use frozen for UDT in 3.9

2016-10-10 Thread Andrew Tolbert
Can you please share an example where it doesn't work?

Thanks,
Andy

On Mon, Oct 10, 2016 at 11:21 PM Ali Akhtar  wrote:

> Not sure I understand the question, sorry.
>
> The column isn't part of the primary key.
>
> I defined a UDT and then I tried to define a column (not primary or
> cluster key) as being of that type, but it doesn't let me do that unless i
> set it as frozen. Docs indicate otherwise though
>
> On Tue, Oct 11, 2016 at 9:09 AM, Andrew Tolbert <
> andrew.tolb...@datastax.com> wrote:
>
> Is the column that has the UDT type the primary key, or is it part of the
> primary key?  If that is the case, it still needs to be frozen (the same
> goes for list, set, tuple as part of primary key).  This is the error I get
> when I try that:
>
> InvalidRequest: Error from server: code=2200 [Invalid query]
> message="Invalid non-frozen user-defined type for PRIMARY KEY component
> basics"
>
> Andy
>
> On Mon, Oct 10, 2016 at 8:27 PM Ali Akhtar  wrote:
>
> According to
> http://docs.datastax.com/en/cql/3.3/cql/cql_using/useCreateUDT.html
>
> >  In Cassandra 3.6 and later, the frozen keyword is not required for UDTs
> that contain only non-collection fields.
>
> However if I create a type with 4-5 all text fields, and try to use that
> type in another table, I get told to use frozen , even though I'm on
> cassandra 3.9
>
> >  show VERSION
> > [cqlsh 5.0.1 | Cassandra 3.9 | CQL spec 3.4.2 | Native protocol v4]
>
> Any ideas?
>
>
>


Re: NamingStrategy for the Java Driver for camelCase / snake_case conversion?

2016-10-10 Thread Ali Akhtar
Thanks.

Btw, is it possible to use UDTs and have them mapped via the java driver?
If so, how does that work - do I just create a pojo for the UDT, and use
@Column on the fields, and it will work if I define a field in the table
mapping class as being of that pojo type?

On Tue, Oct 11, 2016 at 8:57 AM, Andrew Tolbert  wrote:

> I agree this would be a nice mechanism for the driver mapper given the
> difference between java field name conventions and how cql column names are
> typically defined.   I've created JAVA-1316
>  for this.
>
> Thanks,
> Andy
>
>
>
> On Mon, Oct 10, 2016 at 10:30 PM Ali Akhtar  wrote:
>
>> Please fix this.
>>
>>
>>
>> On Tue, Oct 11, 2016 at 8:28 AM, Andrew Tolbert <
>> andrew.tolb...@datastax.com> wrote:
>>
>> Hi Ali,
>>
>> As far as I know this hasn't changed.  Either the field name on the class
>> has to match the name of the column or you have to use the @Column with the
>> name attribute to set the column name being mapped by that field.
>>
>> Thanks,
>> Andy
>>
>> On Mon, Oct 10, 2016 at 8:03 PM Ali Akhtar  wrote:
>>
>> In working with Jackson, it has a NamingStrategy which lets you
>> automatically map snake_case fields in json to camelCase fields on the Java
>> class.
>>
>> Last time I worked w/ Cassandra, I didn't find anything like that, and
>> had to define an @Column annotation for each field.
>>
>> Please tell me this has changed now?
>>
>>
>>


Re: Being asked to use frozen for UDT in 3.9

2016-10-10 Thread Ali Akhtar
Not sure I understand the question, sorry.

The column isn't part of the primary key.

I defined a UDT and then I tried to define a column (not a primary or
clustering key) as being of that type, but it doesn't let me do that unless I
set it as frozen. The docs indicate otherwise, though.

On Tue, Oct 11, 2016 at 9:09 AM, Andrew Tolbert  wrote:

> Is the column that has the UDT type the primary key, or is it part of the
> primary key?  If that is the case, it still needs to be frozen (the same
> goes for list, set, tuple as part of primary key).  This is the error I get
> when I try that:
>
> InvalidRequest: Error from server: code=2200 [Invalid query]
> message="Invalid non-frozen user-defined type for PRIMARY KEY component
> basics"
>
> Andy
>
> On Mon, Oct 10, 2016 at 8:27 PM Ali Akhtar  wrote:
>
>> According to http://docs.datastax.com/en/cql/3.3/cql/cql_using/
>> useCreateUDT.html
>>
>> >  In Cassandra 3.6 and later, the frozen keyword is not required for
>> UDTs that contain only non-collection fields.
>>
>> However, if I create a type with 4-5 fields, all text, and try to use that
>> type in another table, I get told to use frozen, even though I'm on
>> Cassandra 3.9:
>>
>> >  show VERSION
>> > [cqlsh 5.0.1 | Cassandra 3.9 | CQL spec 3.4.2 | Native protocol v4]
>>
>> Any ideas?
>>
>>


Re: Being asked to use frozen for UDT in 3.9

2016-10-10 Thread Andrew Tolbert
Is the column that has the UDT type the primary key, or part of it?  If so,
it still needs to be frozen (the same goes for list, set, and tuple types in
a primary key).  This is the error I get when I try that:

InvalidRequest: Error from server: code=2200 [Invalid query]
message="Invalid non-frozen user-defined type for PRIMARY KEY component
basics"

Andy

On Mon, Oct 10, 2016 at 8:27 PM Ali Akhtar  wrote:

> According to
> http://docs.datastax.com/en/cql/3.3/cql/cql_using/useCreateUDT.html
>
> >  In Cassandra 3.6 and later, the frozen keyword is not required for UDTs
> that contain only non-collection fields.
>
> However, if I create a type with 4-5 fields, all text, and try to use that
> type in another table, I get told to use frozen, even though I'm on
> Cassandra 3.9:
>
> >  show VERSION
> > [cqlsh 5.0.1 | Cassandra 3.9 | CQL spec 3.4.2 | Native protocol v4]
>
> Any ideas?
>
>
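The distinction Andy describes can be sketched with hypothetical CQL (the type and table names are made up for illustration; the statements are stored in a shell variable and printed rather than executed, since running them requires a live 3.6+ cluster and an existing keyspace):

```shell
# Hypothetical CQL illustrating the frozen requirement; printed, not executed.
cql='
CREATE TYPE ks.address (street text, city text);

-- As a regular (non-key) column, Cassandra 3.6+ accepts a non-frozen UDT:
CREATE TABLE ks.users (id text PRIMARY KEY, addr address);

-- As a PRIMARY KEY component, the UDT must still be frozen:
CREATE TABLE ks.users_by_addr (addr frozen<address> PRIMARY KEY, id text);
'
printf '%s\n' "$cql"
```

The error Andy quotes ("Invalid non-frozen user-defined type for PRIMARY KEY component") comes only from the last statement when frozen is omitted.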


Re: NamingStrategy for the Java Driver for camelCase / snake_case conversion?

2016-10-10 Thread Andrew Tolbert
I agree this would be a nice mechanism for the driver mapper, given the
difference between Java field-name conventions and how CQL column names are
typically defined. I've created JAVA-1316 for this.

Thanks,
Andy



On Mon, Oct 10, 2016 at 10:30 PM Ali Akhtar  wrote:

> Please fix this.
>
>
>
> On Tue, Oct 11, 2016 at 8:28 AM, Andrew Tolbert <
> andrew.tolb...@datastax.com> wrote:
>
> Hi Ali,
>
> As far as I know this hasn't changed.  Either the field name on the class
> has to match the name of the column or you have to use the @Column with the
> name attribute to set the column name being mapped by that field.
>
> Thanks,
> Andy
>
> On Mon, Oct 10, 2016 at 8:03 PM Ali Akhtar  wrote:
>
> Jackson has a NamingStrategy which lets you automatically map snake_case
> fields in JSON to camelCase fields on the Java class.
>
> Last time I worked w/ Cassandra, I didn't find anything like that, and had
> to define an @Column annotation for each field.
>
> Please tell me this has changed now?
>
>
>


Re: NamingStrategy for the Java Driver for camelCase / snake_case conversion?

2016-10-10 Thread Ali Akhtar
Please fix this.



On Tue, Oct 11, 2016 at 8:28 AM, Andrew Tolbert  wrote:

> Hi Ali,
>
> As far as I know this hasn't changed.  Either the field name on the class
> has to match the name of the column or you have to use the @Column with the
> name attribute to set the column name being mapped by that field.
>
> Thanks,
> Andy
>
> On Mon, Oct 10, 2016 at 8:03 PM Ali Akhtar  wrote:
>
>> Jackson has a NamingStrategy which lets you automatically map snake_case
>> fields in JSON to camelCase fields on the Java class.
>>
>> Last time I worked w/ Cassandra, I didn't find anything like that, and
>> had to define an @Column annotation for each field.
>>
>> Please tell me this has changed now?
>>
>>


Re: NamingStrategy for the Java Driver for camelCase / snake_case conversion?

2016-10-10 Thread Andrew Tolbert
Hi Ali,

As far as I know this hasn't changed.  Either the field name on the class
has to match the name of the column or you have to use the @Column with the
name attribute to set the column name being mapped by that field.

Thanks,
Andy

On Mon, Oct 10, 2016 at 8:03 PM Ali Akhtar  wrote:

> Jackson has a NamingStrategy which lets you automatically map snake_case
> fields in JSON to camelCase fields on the Java class.
>
> Last time I worked w/ Cassandra, I didn't find anything like that, and had
> to define an @Column annotation for each field.
>
> Please tell me this has changed now?
>
>


Being asked to use frozen for UDT in 3.9

2016-10-10 Thread Ali Akhtar
According to
http://docs.datastax.com/en/cql/3.3/cql/cql_using/useCreateUDT.html

>  In Cassandra 3.6 and later, the frozen keyword is not required for UDTs
that contain only non-collection fields.

However, if I create a type with 4-5 fields, all text, and try to use that
type in another table, I get told to use frozen, even though I'm on
Cassandra 3.9:

>  show VERSION
> [cqlsh 5.0.1 | Cassandra 3.9 | CQL spec 3.4.2 | Native protocol v4]

Any ideas?


NamingStrategy for the Java Driver for camelCase / snake_case conversion?

2016-10-10 Thread Ali Akhtar
Jackson has a NamingStrategy which lets you automatically map snake_case
fields in JSON to camelCase fields on the Java class.

Last time I worked w/ Cassandra, I didn't find anything like that, and had
to define an @Column annotation for each field.

Please tell me this has changed now?


Re: Bootstrapping data from Cassandra 2.2.5 datacenter to 3.0.8 datacenter fails because of streaming errors

2016-10-10 Thread Abhishek Verma
Thanks Jonathan, Utkarsh and Jeff.

We will try to find a way for our Mesos framework to support upgrading the
nodes in-place.

On Mon, Oct 10, 2016 at 5:11 PM, Jonathan Haddad  wrote:

> During the upgrade you'll want to avoid the following operations that
> result in data streaming:
>
> 1. Bootstrapping nodes
> 2. Decommissioning nodes
> 3. Repair
>
>
> On Mon, Oct 10, 2016 at 5:00 PM Jeff Jirsa 
> wrote:
>
>>
>>
>> No need to cc dev@, user@ is the right list for this question.
>>
>>
>>
>> As Jon mentioned, you can’t stream (bootstrap/rebuild/repair) across
>> major versions, so don’t try to destroy the cluster – just upgrade in
>> place. It IS a good idea to do one DC at a time, but an in-place upgrade is
>> pretty straightforward – flush, drain, stop Cassandra, replace binaries,
>> start Cassandra, run nodetool upgradesstables -a.
>>
>>
>>
>> Note that you can run nodetool upgradesstables on more than one node at a
>> time if you can tolerate the hit to your read latencies.
>>
>>
>>
>> It IS common, I imagine, for there to be schema mismatches temporarily
>> while you have a mixed version cluster – this isn’t necessarily a huge
>> problem, but do try to get to 3.0.8 as quickly as possible once you start,
>> and if you can avoid administrative tasks (such as those that will change
>> the schema) during the process, that’s generally advisable.
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> *From: *Abhishek Verma 
>> *Reply-To: *"user@cassandra.apache.org" 
>> *Date: *Monday, October 10, 2016 at 4:34 PM
>> *To: *"user@cassandra.apache.org" , "
>> d...@cassandra.apache.org" 
>> *Subject: *Bootstrapping data from Cassandra 2.2.5 datacenter to 3.0.8
>> datacenter fails because of streaming errors
>>
>>
>>
>> Hi Cassandra users,
>>
>>
>>
>> We are trying to upgrade our Cassandra version from 2.2.5 to 3.0.8
>> (running on Mesos, but that's besides the point). We have two datacenters,
>> so in order to preserve our data, we are trying to upgrade one datacenter
>> at a time.
>>
>>
>>
>> Initially both DCs (dc1 and dc2) are running 2.2.5. The idea is to tear
>> down dc1 completely (delete all the data in it), bring it up with 3.0.8,
>> let data replicate from dc2 to dc1, and then tear down dc2, bring it up
>> with 3.0.8 and replicate data from dc1.
>>
>>
>>
>> I am able to reproduce the problem on bare metal clusters running on 3
>> nodes. I am using Oracle's server-jre-8u74-linux-x64 JRE.
>>
>>
>>
>> *Node A*: Downloaded 2.2.5-bin.tar.gz, changed the seeds to include its
>> own IP address, changed listen_address and rpc_address to its own IP and
>> changed endpoint_snitch to GossipingPropertyFileSnitch. I
>> changed conf/cassandra-rackdc.properties to
>>
>> dc=dc2
>>
>> rack=rack2
>>
>> This node started up fine and is UN in nodetool status in dc2.
>>
>>
>>
>> I used CQL shell to create a table and insert 3 rows:
>>
>> verma@x:~/apache-cassandra-2.2.5$ bin/cqlsh $HOSTNAME
>>
>> Connected to Test Cluster at x:9042.
>>
>> [cqlsh 5.0.1 | Cassandra 2.2.5 | CQL spec 3.3.1 | Native protocol v4]
>>
>> Use HELP for help.
>>
>> cqlsh> desc tmp
>>
>>
>>
>> CREATE KEYSPACE tmp WITH replication = {'class':
>> 'NetworkTopologyStrategy', 'dc1': '1', 'dc2': '1'}  AND durable_writes =
>> true;
>>
>>
>>
>> CREATE TABLE tmp.map (
>>
>> key text PRIMARY KEY,
>>
>> value text
>>
>> )...;
>>
>> cqlsh> select * from tmp.map;
>>
>>
>>
>>  key | value
>>
>> -+---
>>
>>   k1 |v1
>>
>>   k3 |v3
>>
>>   k2 |v2
>>
>>
>>
>>
>>
>> *Node B:* Downloaded 3.0.8-bin.tar.gz, changed the seeds to include
>> itself and node A, changed listen_address and rpc_address to its own IP,
>> changed endpoint_snitch to GossipingPropertyFileSnitch. I did not change
>> conf/cassandra-rackdc.properties and its contents are
>>
>> dc=dc1
>>
>> rack=rack1
>>
>>
>>
>> In the logs, I see:
>>
>> INFO  [main] 2016-10-10 22:42:42,850 MessagingService.java:557 - Starting
>> Messaging Service on /10.164.32.29:7000
>> (eth0)
>>
>> INFO  [main] 2016-10-10 22:42:42,864 StorageService.java:784 - This node
>> will not auto bootstrap because it is configured to be a seed node.
>>
>>
>>
>> So I start a third node:
>>
>> *Node C:* Downloaded 3.0.8-bin.tar.gz, changed the seeds to include node
>> A and node B, changed listen_address and rpc_address to its own IP, changed
>> endpoint_snitch to GossipingPropertyFileSnitch. I did not change
>> conf/cassandra-rackdc.properties.
>>
>> Now, nodetool status shows:
>>
>>
>>
>> verma@xxx:~/apache-cassandra-3.0.8$ bin/nodetool status
>>
>> Datacenter: dc1
>>
>> ===
>>
>> Status=Up/Down
>>
>> |/ 

Re: Bootstrapping data from Cassandra 2.2.5 datacenter to 3.0.8 datacenter fails because of streaming errors

2016-10-10 Thread Jonathan Haddad
During the upgrade you'll want to avoid the following operations that
result in data streaming:

1. Bootstrapping nodes
2. Decommissioning nodes
3. Repair


On Mon, Oct 10, 2016 at 5:00 PM Jeff Jirsa 
wrote:

>
>
> No need to cc dev@, user@ is the right list for this question.
>
>
>
> As Jon mentioned, you can’t stream (bootstrap/rebuild/repair) across major
> versions, so don’t try to destroy the cluster – just upgrade in place. It
> IS a good idea to do one DC at a time, but an in-place upgrade is pretty
> straightforward – flush, drain, stop Cassandra, replace binaries, start
> Cassandra, run nodetool upgradesstables -a.
>
>
>
> Note that you can run nodetool upgradesstables on more than one node at a
> time if you can tolerate the hit to your read latencies.
>
>
>
> It IS common, I imagine, for there to be schema mismatches temporarily
> while you have a mixed version cluster – this isn’t necessarily a huge
> problem, but do try to get to 3.0.8 as quickly as possible once you start,
> and if you can avoid administrative tasks (such as those that will change
> the schema) during the process, that’s generally advisable.
>
>
>
>
>
>
>
>
>
> *From: *Abhishek Verma 
> *Reply-To: *"user@cassandra.apache.org" 
> *Date: *Monday, October 10, 2016 at 4:34 PM
> *To: *"user@cassandra.apache.org" , "
> d...@cassandra.apache.org" 
> *Subject: *Bootstrapping data from Cassandra 2.2.5 datacenter to 3.0.8
> datacenter fails because of streaming errors
>
>
>
> Hi Cassandra users,
>
>
>
> We are trying to upgrade our Cassandra version from 2.2.5 to 3.0.8
> (running on Mesos, but that's besides the point). We have two datacenters,
> so in order to preserve our data, we are trying to upgrade one datacenter
> at a time.
>
>
>
> Initially both DCs (dc1 and dc2) are running 2.2.5. The idea is to tear
> down dc1 completely (delete all the data in it), bring it up with 3.0.8,
> let data replicate from dc2 to dc1, and then tear down dc2, bring it up
> with 3.0.8 and replicate data from dc1.
>
>
>
> I am able to reproduce the problem on bare metal clusters running on 3
> nodes. I am using Oracle's server-jre-8u74-linux-x64 JRE.
>
>
>
> *Node A*: Downloaded 2.2.5-bin.tar.gz, changed the seeds to include its
> own IP address, changed listen_address and rpc_address to its own IP and
> changed endpoint_snitch to GossipingPropertyFileSnitch. I
> changed conf/cassandra-rackdc.properties to
>
> dc=dc2
>
> rack=rack2
>
> This node started up fine and is UN in nodetool status in dc2.
>
>
>
> I used CQL shell to create a table and insert 3 rows:
>
> verma@x:~/apache-cassandra-2.2.5$ bin/cqlsh $HOSTNAME
>
> Connected to Test Cluster at x:9042.
>
> [cqlsh 5.0.1 | Cassandra 2.2.5 | CQL spec 3.3.1 | Native protocol v4]
>
> Use HELP for help.
>
> cqlsh> desc tmp
>
>
>
> CREATE KEYSPACE tmp WITH replication = {'class':
> 'NetworkTopologyStrategy', 'dc1': '1', 'dc2': '1'}  AND durable_writes =
> true;
>
>
>
> CREATE TABLE tmp.map (
>
> key text PRIMARY KEY,
>
> value text
>
> )...;
>
> cqlsh> select * from tmp.map;
>
>
>
>  key | value
>
> -+---
>
>   k1 |v1
>
>   k3 |v3
>
>   k2 |v2
>
>
>
>
>
> *Node B:* Downloaded 3.0.8-bin.tar.gz, changed the seeds to include
> itself and node A, changed listen_address and rpc_address to its own IP,
> changed endpoint_snitch to GossipingPropertyFileSnitch. I did not change
> conf/cassandra-rackdc.properties and its contents are
>
> dc=dc1
>
> rack=rack1
>
>
>
> In the logs, I see:
>
> INFO  [main] 2016-10-10 22:42:42,850 MessagingService.java:557 - Starting
> Messaging Service on /10.164.32.29:7000
> (eth0)
>
> INFO  [main] 2016-10-10 22:42:42,864 StorageService.java:784 - This node
> will not auto bootstrap because it is configured to be a seed node.
>
>
>
> So I start a third node:
>
> *Node C:* Downloaded 3.0.8-bin.tar.gz, changed the seeds to include node
> A and node B, changed listen_address and rpc_address to its own IP, changed
> endpoint_snitch to GossipingPropertyFileSnitch. I did not change
> conf/cassandra-rackdc.properties.
>
> Now, nodetool status shows:
>
>
>
> verma@xxx:~/apache-cassandra-3.0.8$ bin/nodetool status
>
> Datacenter: dc1
>
> ===
>
> Status=Up/Down
>
> |/ State=Normal/Leaving/Joining/Moving
>
> --  Address   Load   Tokens   Owns (effective)  Host ID
> Rack
>
> UJ 87.81 KB   256  ?
> 9064832d-ed5c-4c42-ad5a-f754b52b670c  rack1
>
> UN107.72 KB  256  100.0%
>  28b1043f-115b-46a5-b6b6-8609829cde76  rack1
>
> Datacenter: dc2
>
> ===
>
> Status=Up/Down
>
> |/ 

Re: Bootstrapping data from Cassandra 2.2.5 datacenter to 3.0.8 datacenter fails because of streaming errors

2016-10-10 Thread Jeff Jirsa
 

No need to cc dev@, user@ is the right list for this question.

 

As Jon mentioned, you can’t stream (bootstrap/rebuild/repair) across major 
versions, so don’t try to destroy the cluster – just upgrade in place. It IS a 
good idea to do one DC at a time, but an in-place upgrade is pretty 
straightforward – flush, drain, stop Cassandra, replace binaries, start 
Cassandra, run nodetool upgradesstables -a.
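The steps above can be sketched as a per-node dry run (the service name and tarball path are assumptions for illustration, not part of the original procedure; the run() wrapper only prints each command instead of executing it):

```shell
# Dry-run sketch of the per-node in-place upgrade described above.
# run() only echoes; change it to run() { "$@"; } to execute for real.
run() { echo "+ $*"; }

run nodetool flush                      # flush memtables to disk
run nodetool drain                      # stop accepting writes, flush again
run sudo service cassandra stop         # service name is an assumption
run sudo tar -xzf apache-cassandra-3.0.8-bin.tar.gz -C /opt  # replace binaries
run sudo service cassandra start
run nodetool upgradesstables -a         # rewrite sstables in the new format
```

Repeat node by node, one datacenter at a time, avoiding streaming operations (bootstrap, decommission, repair) until the whole cluster is on 3.0.8.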

 

Note that you can run nodetool upgradesstables on more than one node at a time 
if you can tolerate the hit to your read latencies.

 

It IS common, I imagine, for there to be schema mismatches temporarily while 
you have a mixed version cluster – this isn’t necessarily a huge problem, but 
do try to get to 3.0.8 as quickly as possible once you start, and if you can 
avoid administrative tasks (such as those that will change the schema) during 
the process, that’s generally advisable.

 

 

 

 

From: Abhishek Verma 
Reply-To: "user@cassandra.apache.org" 
Date: Monday, October 10, 2016 at 4:34 PM
To: "user@cassandra.apache.org" , 
"d...@cassandra.apache.org" 
Subject: Bootstrapping data from Cassandra 2.2.5 datacenter to 3.0.8 datacenter 
fails because of streaming errors

 

Hi Cassandra users, 

 

We are trying to upgrade our Cassandra version from 2.2.5 to 3.0.8 (running on 
Mesos, but that's besides the point). We have two datacenters, so in order to 
preserve our data, we are trying to upgrade one datacenter at a time. 

 

Initially both DCs (dc1 and dc2) are running 2.2.5. The idea is to tear down 
dc1 completely (delete all the data in it), bring it up with 3.0.8, let data 
replicate from dc2 to dc1, and then tear down dc2, bring it up with 3.0.8 and 
replicate data from dc1.

 

I am able to reproduce the problem on bare metal clusters running on 3 nodes. I 
am using Oracle's server-jre-8u74-linux-x64 JRE.

 

Node A: Downloaded 2.2.5-bin.tar.gz, changed the seeds to include its own IP 
address, changed listen_address and rpc_address to its own IP and changed 
endpoint_snitch to GossipingPropertyFileSnitch. I changed 
conf/cassandra-rackdc.properties to

dc=dc2

rack=rack2

This node started up fine and is UN in nodetool status in dc2.

 

I used CQL shell to create a table and insert 3 rows:

verma@x:~/apache-cassandra-2.2.5$ bin/cqlsh $HOSTNAME

Connected to Test Cluster at x:9042.

[cqlsh 5.0.1 | Cassandra 2.2.5 | CQL spec 3.3.1 | Native protocol v4]

Use HELP for help.

cqlsh> desc tmp

 

CREATE KEYSPACE tmp WITH replication = {'class': 'NetworkTopologyStrategy', 
'dc1': '1', 'dc2': '1'}  AND durable_writes = true;

 

CREATE TABLE tmp.map (

key text PRIMARY KEY,

value text

)...;

cqlsh> select * from tmp.map;

 

 key | value

-+---

  k1 |v1

  k3 |v3

  k2 |v2

 

 

Node B: Downloaded 3.0.8-bin.tar.gz, changed the seeds to include itself and 
node A, changed listen_address and rpc_address to its own IP, changed 
endpoint_snitch to GossipingPropertyFileSnitch. I did not change 
conf/cassandra-rackdc.properties and its contents are

dc=dc1

rack=rack1

 

In the logs, I see:

INFO  [main] 2016-10-10 22:42:42,850 MessagingService.java:557 - Starting 
Messaging Service on /10.164.32.29:7000 (eth0)

INFO  [main] 2016-10-10 22:42:42,864 StorageService.java:784 - This node will 
not auto bootstrap because it is configured to be a seed node.

 

So I start a third node:

Node C: Downloaded 3.0.8-bin.tar.gz, changed the seeds to include node A and 
node B, changed listen_address and rpc_address to its own IP, changed 
endpoint_snitch to GossipingPropertyFileSnitch. I did not change 
conf/cassandra-rackdc.properties.

Now, nodetool status shows:

 

verma@xxx:~/apache-cassandra-3.0.8$ bin/nodetool status

Datacenter: dc1

===

Status=Up/Down

|/ State=Normal/Leaving/Joining/Moving

--  Address   Load   Tokens   Owns (effective)  Host ID 
  Rack

UJ 87.81 KB   256  ? 
9064832d-ed5c-4c42-ad5a-f754b52b670c  rack1

UN107.72 KB  256  100.0%
28b1043f-115b-46a5-b6b6-8609829cde76  rack1

Datacenter: dc2

===

Status=Up/Down

|/ State=Normal/Leaving/Joining/Moving

--  Address   Load   Tokens   Owns (effective)  Host ID 
  Rack

UN  73.2 KB256  100.0%
09cc542c-2299-45a5-a4d1-159c239ded37  rack2

 

Nodetool describe cluster shows:

verma@xxx:~/apache-cassandra-3.0.8$ bin/nodetool describecluster

Cluster Information:

Name: Test Cluster

Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch

Partitioner: org.apache.cassandra.dht.Murmur3Partitioner

Schema versions:

c2a2bb4f-7d31-3fb8-a216-00b41a643650: [, ]

 

9770e3c5-3135-32e2-b761-65a0f6d8824e: []

 

Note that there are two schema versions and they don't match.

 

I see the following in the system.log: 

 


Re: Bootstrapping data from Cassandra 2.2.5 datacenter to 3.0.8 datacenter fails because of streaming errors

2016-10-10 Thread Utkarsh Sengar
As Jonathan said, you need to upgrade Cassandra in place and use "nodetool
upgradesstables".
DataStax has an excellent resource on upgrading Cassandra:
https://docs.datastax.com/en/latest-upgrade/upgrade/cassandra/upgdCassandra.html,
specifically
https://docs.datastax.com/en/latest-upgrade/upgrade/cassandra/upgrdCassandraDetails.html

Make sure you have a snapshot from which you can restore, taken with "nodetool
snapshot". We upgraded from 1.x to 2.x, the upgrade went south, and we had to
restore from the snapshot.
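A dry-run sketch of that snapshot step (keyspace name and snapshot tag are hypothetical; run() only prints the commands instead of executing them):

```shell
# Dry-run sketch: snapshot before upgrading so a rollback is possible.
# run() only echoes; change it to run() { "$@"; } to execute for real.
run() { echo "+ $*"; }

run nodetool snapshot -t pre-3.0.8-upgrade tmp   # tag and keyspace are examples
# Snapshots are hardlinked sstables under each table directory:
#   <data_dir>/tmp/<table>-<id>/snapshots/pre-3.0.8-upgrade/
# To restore: stop the node, copy the snapshot sstables back into the table
# directory, then restart.
run nodetool clearsnapshot -t pre-3.0.8-upgrade  # reclaim space once done
```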

Thanks,
-Utkarsh


On Mon, Oct 10, 2016 at 4:46 PM, Jonathan Haddad  wrote:

> You can't stream between major versions. Don't tear down your first data
> center, upgrade it instead.
> On Mon, Oct 10, 2016 at 4:35 PM Abhishek Verma  wrote:
>
>> Hi Cassandra users,
>>
>> We are trying to upgrade our Cassandra version from 2.2.5 to 3.0.8
>> (running on Mesos, but that's besides the point). We have two datacenters,
>> so in order to preserve our data, we are trying to upgrade one datacenter
>> at a time.
>>
>> Initially both DCs (dc1 and dc2) are running 2.2.5. The idea is to tear
>> down dc1 completely (delete all the data in it), bring it up with 3.0.8,
>> let data replicate from dc2 to dc1, and then tear down dc2, bring it up
>> with 3.0.8 and replicate data from dc1.
>>
>> I am able to reproduce the problem on bare metal clusters running on 3
>> nodes. I am using Oracle's server-jre-8u74-linux-x64 JRE.
>>
>> *Node A*: Downloaded 2.2.5-bin.tar.gz, changed the seeds to include its
>> own IP address, changed listen_address and rpc_address to its own IP and
>> changed endpoint_snitch to GossipingPropertyFileSnitch. I
>> changed conf/cassandra-rackdc.properties to
>> dc=dc2
>> rack=rack2
>> This node started up fine and is UN in nodetool status in dc2.
>>
>> I used CQL shell to create a table and insert 3 rows:
>> verma@x:~/apache-cassandra-2.2.5$ bin/cqlsh $HOSTNAME
>> Connected to Test Cluster at x:9042.
>> [cqlsh 5.0.1 | Cassandra 2.2.5 | CQL spec 3.3.1 | Native protocol v4]
>> Use HELP for help.
>> cqlsh> desc tmp
>>
>> CREATE KEYSPACE tmp WITH replication = {'class':
>> 'NetworkTopologyStrategy', 'dc1': '1', 'dc2': '1'}  AND durable_writes =
>> true;
>>
>> CREATE TABLE tmp.map (
>> key text PRIMARY KEY,
>> value text
>> )...;
>> cqlsh> select * from tmp.map;
>>
>>  key | value
>> -+---
>>   k1 |v1
>>   k3 |v3
>>   k2 |v2
>>
>>
>> *Node B:* Downloaded 3.0.8-bin.tar.gz, changed the seeds to include
>> itself and node A, changed listen_address and rpc_address to its own IP,
>> changed endpoint_snitch to GossipingPropertyFileSnitch. I did not change
>> conf/cassandra-rackdc.properties and its contents are
>> dc=dc1
>> rack=rack1
>>
>> In the logs, I see:
>> INFO  [main] 2016-10-10 22:42:42,850 MessagingService.java:557 - Starting
>> Messaging Service on /10.164.32.29:7000 (eth0)
>> INFO  [main] 2016-10-10 22:42:42,864 StorageService.java:784 - This node
>> will not auto bootstrap because it is configured to be a seed node.
>>
>> So I start a third node:
>> *Node C:* Downloaded 3.0.8-bin.tar.gz, changed the seeds to include node
>> A and node B, changed listen_address and rpc_address to its own IP, changed
>> endpoint_snitch to GossipingPropertyFileSnitch. I did not change
>> conf/cassandra-rackdc.properties.
>> Now, nodetool status shows:
>>
>> verma@xxx:~/apache-cassandra-3.0.8$ bin/nodetool status
>> Datacenter: dc1
>> ===
>> Status=Up/Down
>> |/ State=Normal/Leaving/Joining/Moving
>> --  Address   Load   Tokens   Owns (effective)  Host ID
>> Rack
>> UJ 87.81 KB   256  ?
>> 9064832d-ed5c-4c42-ad5a-f754b52b670c  rack1
>> UN107.72 KB  256  100.0%
>>  28b1043f-115b-46a5-b6b6-8609829cde76  rack1
>> Datacenter: dc2
>> ===
>> Status=Up/Down
>> |/ State=Normal/Leaving/Joining/Moving
>> --  Address   Load   Tokens   Owns (effective)  Host ID
>> Rack
>> UN  73.2 KB256  100.0%
>>  09cc542c-2299-45a5-a4d1-159c239ded37  rack2
>>
>> Nodetool describe cluster shows:
>> verma@xxx:~/apache-cassandra-3.0.8$ bin/nodetool describecluster
>> Cluster Information:
>> Name: Test Cluster
>> Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
>> Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
>> Schema versions:
>> c2a2bb4f-7d31-3fb8-a216-00b41a643650: [, ]
>>
>> 9770e3c5-3135-32e2-b761-65a0f6d8824e: []
>>
>> Note that there are two schema versions and they don't match.
>>
>> I see the following in the system.log:
>>
>> INFO  [InternalResponseStage:1] 2016-10-10 22:48:36,055
>> ColumnFamilyStore.java:390 - Initializing system_auth.roles
>> INFO  [main] 2016-10-10 22:48:36,316 StorageService.java:1149 - JOINING:
>> waiting for schema information to complete
>> INFO  [main] 2016-10-10 22:48:36,316 StorageService.java:1149 - JOINING:
>> schema complete, ready to bootstrap
>> INFO  [main] 2016-10-10 

Re: Bootstrapping data from Cassandra 2.2.5 datacenter to 3.0.8 datacenter fails because of streaming errors

2016-10-10 Thread Jonathan Haddad
You can't stream between major versions. Don't tear down your first data
center, upgrade it instead.
On Mon, Oct 10, 2016 at 4:35 PM Abhishek Verma  wrote:

> Hi Cassandra users,
>
> We are trying to upgrade our Cassandra version from 2.2.5 to 3.0.8
> (running on Mesos, but that's besides the point). We have two datacenters,
> so in order to preserve our data, we are trying to upgrade one datacenter
> at a time.
>
> Initially both DCs (dc1 and dc2) are running 2.2.5. The idea is to tear
> down dc1 completely (delete all the data in it), bring it up with 3.0.8,
> let data replicate from dc2 to dc1, and then tear down dc2, bring it up
> with 3.0.8 and replicate data from dc1.
>
> I am able to reproduce the problem on bare metal clusters running on 3
> nodes. I am using Oracle's server-jre-8u74-linux-x64 JRE.
>
> *Node A*: Downloaded 2.2.5-bin.tar.gz, changed the seeds to include its
> own IP address, changed listen_address and rpc_address to its own IP and
> changed endpoint_snitch to GossipingPropertyFileSnitch. I
> changed conf/cassandra-rackdc.properties to
> dc=dc2
> rack=rack2
> This node started up fine and is UN in nodetool status in dc2.
>
> I used CQL shell to create a table and insert 3 rows:
> verma@x:~/apache-cassandra-2.2.5$ bin/cqlsh $HOSTNAME
> Connected to Test Cluster at x:9042.
> [cqlsh 5.0.1 | Cassandra 2.2.5 | CQL spec 3.3.1 | Native protocol v4]
> Use HELP for help.
> cqlsh> desc tmp
>
> CREATE KEYSPACE tmp WITH replication = {'class':
> 'NetworkTopologyStrategy', 'dc1': '1', 'dc2': '1'}  AND durable_writes =
> true;
>
> CREATE TABLE tmp.map (
> key text PRIMARY KEY,
> value text
> )...;
> cqlsh> select * from tmp.map;
>
>  key | value
> -+---
>   k1 |v1
>   k3 |v3
>   k2 |v2
>
>
> *Node B:* Downloaded 3.0.8-bin.tar.gz, changed the seeds to include
> itself and node A, changed listen_address and rpc_address to its own IP,
> changed endpoint_snitch to GossipingPropertyFileSnitch. I did not change
> conf/cassandra-rackdc.properties and its contents are
> dc=dc1
> rack=rack1
>
> In the logs, I see:
> INFO  [main] 2016-10-10 22:42:42,850 MessagingService.java:557 - Starting
> Messaging Service on /10.164.32.29:7000 (eth0)
> INFO  [main] 2016-10-10 22:42:42,864 StorageService.java:784 - This node
> will not auto bootstrap because it is configured to be a seed node.
>
> So I start a third node:
> *Node C:* Downloaded 3.0.8-bin.tar.gz, changed the seeds to include node
> A and node B, changed listen_address and rpc_address to its own IP, changed
> endpoint_snitch to GossipingPropertyFileSnitch. I did not change
> conf/cassandra-rackdc.properties.
> Now, nodetool status shows:
>
> verma@xxx:~/apache-cassandra-3.0.8$ bin/nodetool status
> Datacenter: dc1
> ===
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address   Load   Tokens   Owns (effective)  Host ID
> Rack
> UJ 87.81 KB   256  ?
> 9064832d-ed5c-4c42-ad5a-f754b52b670c  rack1
> UN107.72 KB  256  100.0%
>  28b1043f-115b-46a5-b6b6-8609829cde76  rack1
> Datacenter: dc2
> ===
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address   Load   Tokens   Owns (effective)  Host ID
> Rack
> UN  73.2 KB256  100.0%
>  09cc542c-2299-45a5-a4d1-159c239ded37  rack2
>
> Nodetool describe cluster shows:
> verma@xxx:~/apache-cassandra-3.0.8$ bin/nodetool describecluster
> Cluster Information:
> Name: Test Cluster
> Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
> Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
> Schema versions:
> c2a2bb4f-7d31-3fb8-a216-00b41a643650: [, ]
>
> 9770e3c5-3135-32e2-b761-65a0f6d8824e: []
>
> Note that there are two schema versions and they don't match.
>
> I see the following in the system.log:
>
> INFO  [InternalResponseStage:1] 2016-10-10 22:48:36,055
> ColumnFamilyStore.java:390 - Initializing system_auth.roles
> INFO  [main] 2016-10-10 22:48:36,316 StorageService.java:1149 - JOINING:
> waiting for schema information to complete
> INFO  [main] 2016-10-10 22:48:36,316 StorageService.java:1149 - JOINING:
> schema complete, ready to bootstrap
> INFO  [main] 2016-10-10 22:48:36,316 StorageService.java:1149 - JOINING:
> waiting for pending range calculation
> INFO  [main] 2016-10-10 22:48:36,317 StorageService.java:1149 - JOINING:
> calculation complete, ready to bootstrap
> INFO  [main] 2016-10-10 22:48:36,319 StorageService.java:1149 - JOINING:
> getting bootstrap token
> INFO  [main] 2016-10-10 22:48:36,357 StorageService.java:1149 - JOINING:
> sleeping 3 ms for pending range setup
> INFO  [main] 2016-10-10 22:49:06,358 StorageService.java:1149 - JOINING:
> Starting to bootstrap...
> INFO  [main] 2016-10-10 22:49:06,494 StreamResultFuture.java:87 - [Stream
> #bfb5e470-8f3b-11e6-b69a-1b451159408e] Executing streaming plan for
> Bootstrap
> INFO  [StreamConnectionEstablisher:1] 

Bootstrapping data from Cassandra 2.2.5 datacenter to 3.0.8 datacenter fails because of streaming errors

2016-10-10 Thread Abhishek Verma
Hi Cassandra users,

We are trying to upgrade our Cassandra version from 2.2.5 to 3.0.8 (running
on Mesos, but that's beside the point). We have two datacenters, so in
order to preserve our data, we are trying to upgrade one datacenter at a
time.

Initially both DCs (dc1 and dc2) are running 2.2.5. The idea is to tear
down dc1 completely (delete all the data in it), bring it up with 3.0.8,
let data replicate from dc2 to dc1, and then tear down dc2, bring it up
with 3.0.8 and replicate data from dc1.

I am able to reproduce the problem on bare metal clusters running on 3
nodes. I am using Oracle's server-jre-8u74-linux-x64 JRE.

*Node A*: Downloaded 2.2.5-bin.tar.gz, changed the seeds to include its own
IP address, changed listen_address and rpc_address to its own IP and
changed endpoint_snitch to GossipingPropertyFileSnitch. I
changed conf/cassandra-rackdc.properties to
dc=dc2
rack=rack2
This node started up fine and is UN in nodetool status in dc2.

I used CQL shell to create a table and insert 3 rows:
verma@x:~/apache-cassandra-2.2.5$ bin/cqlsh $HOSTNAME
Connected to Test Cluster at x:9042.
[cqlsh 5.0.1 | Cassandra 2.2.5 | CQL spec 3.3.1 | Native protocol v4]
Use HELP for help.
cqlsh> desc tmp

CREATE KEYSPACE tmp WITH replication = {'class': 'NetworkTopologyStrategy',
'dc1': '1', 'dc2': '1'}  AND durable_writes = true;

CREATE TABLE tmp.map (
key text PRIMARY KEY,
value text
)...;
cqlsh> select * from tmp.map;

 key | value
-+---
  k1 |v1
  k3 |v3
  k2 |v2


*Node B:* Downloaded 3.0.8-bin.tar.gz, changed the seeds to include itself
and node A, changed listen_address and rpc_address to its own IP, changed
endpoint_snitch to GossipingPropertyFileSnitch. I did not change
conf/cassandra-rackdc.properties and its contents are
dc=dc1
rack=rack1

In the logs, I see:
INFO  [main] 2016-10-10 22:42:42,850 MessagingService.java:557 - Starting
Messaging Service on /10.164.32.29:7000 (eth0)
INFO  [main] 2016-10-10 22:42:42,864 StorageService.java:784 - This node
will not auto bootstrap because it is configured to be a seed node.

So I start a third node:
*Node C:* Downloaded 3.0.8-bin.tar.gz, changed the seeds to include node A
and node B, changed listen_address and rpc_address to its own IP, changed
endpoint_snitch to GossipingPropertyFileSnitch. I did not change
conf/cassandra-rackdc.properties.
Now, nodetool status shows:

verma@xxx:~/apache-cassandra-3.0.8$ bin/nodetool status
Datacenter: dc1
===
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address   Load   Tokens   Owns (effective)  Host ID
  Rack
UJ 87.81 KB   256  ?
9064832d-ed5c-4c42-ad5a-f754b52b670c  rack1
UN107.72 KB  256  100.0%
 28b1043f-115b-46a5-b6b6-8609829cde76  rack1
Datacenter: dc2
===
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address   Load   Tokens   Owns (effective)  Host ID
  Rack
UN  73.2 KB256  100.0%
 09cc542c-2299-45a5-a4d1-159c239ded37  rack2

Nodetool describe cluster shows:
verma@xxx:~/apache-cassandra-3.0.8$ bin/nodetool describecluster
Cluster Information:
Name: Test Cluster
Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
Schema versions:
c2a2bb4f-7d31-3fb8-a216-00b41a643650: [, ]

9770e3c5-3135-32e2-b761-65a0f6d8824e: []

Note that there are two schema versions and they don't match.

I see the following in the system.log:

INFO  [InternalResponseStage:1] 2016-10-10 22:48:36,055
ColumnFamilyStore.java:390 - Initializing system_auth.roles
INFO  [main] 2016-10-10 22:48:36,316 StorageService.java:1149 - JOINING:
waiting for schema information to complete
INFO  [main] 2016-10-10 22:48:36,316 StorageService.java:1149 - JOINING:
schema complete, ready to bootstrap
INFO  [main] 2016-10-10 22:48:36,316 StorageService.java:1149 - JOINING:
waiting for pending range calculation
INFO  [main] 2016-10-10 22:48:36,317 StorageService.java:1149 - JOINING:
calculation complete, ready to bootstrap
INFO  [main] 2016-10-10 22:48:36,319 StorageService.java:1149 - JOINING:
getting bootstrap token
INFO  [main] 2016-10-10 22:48:36,357 StorageService.java:1149 - JOINING:
sleeping 3 ms for pending range setup
INFO  [main] 2016-10-10 22:49:06,358 StorageService.java:1149 - JOINING:
Starting to bootstrap...
INFO  [main] 2016-10-10 22:49:06,494 StreamResultFuture.java:87 - [Stream
#bfb5e470-8f3b-11e6-b69a-1b451159408e] Executing streaming plan for
Bootstrap
INFO  [StreamConnectionEstablisher:1] 2016-10-10 22:49:06,495
StreamSession.java:242 - [Stream #bfb5e470-8f3b-11e6-b69a-1b451159408e]
Starting streaming to /
INFO  [StreamConnectionEstablisher:2] 2016-10-10 22:49:06,495
StreamSession.java:242 - [Stream #bfb5e470-8f3b-11e6-b69a-1b451159408e]
Starting streaming to /
INFO  [StreamConnectionEstablisher:2] 2016-10-10 22:49:06,500
StreamCoordinator.java:213 - [Stream 

Re: Ordering by multiple columns?

2016-10-10 Thread Ali Akhtar
Okay, so how would you achieve the above scenario in Cassandra?

On Tue, Oct 11, 2016 at 3:25 AM, Peddi, Praveen  wrote:

> That's not just a bad idea, it's impossible. Any field that is part
> of the primary key is immutable. You should read up on the Cassandra
> documentation and understand the basics before you start using it.
> Otherwise you could easily abuse it inadvertently.
>
> Praveen
>
> On Oct 10, 2016, at 6:22 PM, Ali Akhtar  wrote:
>
> E.g if I wanted to select * from foo where last_updated <= ?
>
> In this case, (I believe) last_updated will have to be a clustering key.
> If the record got updated and I wanted to update last_updated accordingly,
> that's a bad idea?
>
> :S
>
> On Tue, Oct 11, 2016 at 3:19 AM, Ali Akhtar  wrote:
>
>> Huh - So if I wanted to search / filter by a timestamp field, and this
>> timestamp needed to get updated, that won't be possible?
>>
>> On Tue, Oct 11, 2016 at 3:07 AM, Nicolas Douillet <
>> nicolas.douil...@gmail.com> wrote:
>>
>>> If I correctly understand the answers, the solution to your ordering
>>> question is to use clustering keys.
>>> I agree, but I just wanted to warn you about one limitation: the
>>> values of key columns can't be updated, except by a delete and then an
>>> insert.
>>> (In the case of your song example, putting the rating as a key can be
>>> tricky if the value has to be frequently updated.)
>>>
>>>
>>> On Mon, 10 Oct 2016 at 22:15, Mikhail Krupitskiy <
>>> mikhail.krupits...@jetbrains.com> wrote:
>>>
 Looks like ordering by multiple columns in Cassandra has a few sides that
 are not obvious.
 I wasn’t able to find this information in the official documentation
 but it’s quite well described here:
 http://stackoverflow.com/questions/35708118/where-and-order-
 by-clauses-in-cassandra-cql

 Thanks,
 Mikhail

 On 10 Oct 2016, at 21:55, DuyHai Doan  wrote:

 No, we didn't record the talk this time unfortunately :(

 On Mon, Oct 10, 2016 at 8:17 PM, Ali Akhtar 
 wrote:

 Really helpful slides. Is there a video to go with them?

 On Sun, Oct 9, 2016 at 11:48 AM, DuyHai Doan 
 wrote:

 Yes it is possible, read this: http://www.slideshare.ne
 t/doanduyhai/datastax-day-2016-cassandra-data-modeling-basics/24

 and the following slides

 On Sun, Oct 9, 2016 at 2:04 AM, Ali Akhtar 
 wrote:

 Is it possible to have multiple clustering keys in cassandra, or some
 other way to order by multiple columns?

 For example, say I have a table of songs, and each song has a rating
 and a date.

 I want to sort songs by rating first, and then with newer songs on top.

 So if two songs have 5 rating, and one's date is 1st Feb, the other is
 2nd Feb, then I want the 2nd feb one to be sorted above the 1st feb one.

 Like this:

 Select * from songs order by rating, createdAt

 Is this possible?






>>
>


Re: Ordering by multiple columns?

2016-10-10 Thread Peddi, Praveen
That's not just a bad idea, it's impossible. Any field that is part of the
primary key is immutable. You should read up on the Cassandra documentation and
understand the basics before you start using it. Otherwise you could easily
abuse it inadvertently.

Praveen

On Oct 10, 2016, at 6:22 PM, Ali Akhtar wrote:

E.g if I wanted to select * from foo where last_updated <= ?

In this case, (I believe) last_updated will have to be a clustering key. If the 
record got updated and I wanted to update last_updated accordingly, that's a 
bad idea?

:S

On Tue, Oct 11, 2016 at 3:19 AM, Ali Akhtar wrote:
Huh - So if I wanted to search / filter by a timestamp field, and this 
timestamp needed to get updated, that won't be possible?

On Tue, Oct 11, 2016 at 3:07 AM, Nicolas Douillet wrote:
If I correctly understand the answers, the solution to your ordering question 
is to use clustering keys.
I agree, but I just wanted to warn you about one limitation: the values of
key columns can't be updated, except by a delete and then an insert.
(In the case of your song example, putting the rating as a key can be tricky if
the value has to be frequently updated.)


On Mon, 10 Oct 2016 at 22:15, Mikhail Krupitskiy wrote:
Looks like ordering by multiple columns in Cassandra has a few sides that are
not obvious.
I wasn't able to find this information in the official documentation but it's 
quite well described here:
http://stackoverflow.com/questions/35708118/where-and-order-by-clauses-in-cassandra-cql

Thanks,
Mikhail
On 10 Oct 2016, at 21:55, DuyHai Doan wrote:

No, we didn't record the talk this time unfortunately :(

On Mon, Oct 10, 2016 at 8:17 PM, Ali Akhtar wrote:
Really helpful slides. Is there a video to go with them?

On Sun, Oct 9, 2016 at 11:48 AM, DuyHai Doan wrote:
Yes it is possible, read this: 
http://www.slideshare.net/doanduyhai/datastax-day-2016-cassandra-data-modeling-basics/24

and the following slides

On Sun, Oct 9, 2016 at 2:04 AM, Ali Akhtar wrote:
Is it possible to have multiple clustering keys in cassandra, or some other way 
to order by multiple columns?

For example, say I have a table of songs, and each song has a rating and a date.

I want to sort songs by rating first, and then with newer songs on top.

So if two songs have 5 rating, and one's date is 1st Feb, the other is 2nd Feb, 
then I want the 2nd feb one to be sorted above the 1st feb one.

Like this:

Select * from songs order by rating, createdAt

Is this possible?








Re: Ordering by multiple columns?

2016-10-10 Thread Ali Akhtar
E.g if I wanted to select * from foo where last_updated <= ?

In this case, (I believe) last_updated will have to be a clustering key. If
the record got updated and I wanted to update last_updated accordingly,
that's a bad idea?

:S

On Tue, Oct 11, 2016 at 3:19 AM, Ali Akhtar  wrote:

> Huh - So if I wanted to search / filter by a timestamp field, and this
> timestamp needed to get updated, that won't be possible?
>
> On Tue, Oct 11, 2016 at 3:07 AM, Nicolas Douillet <
> nicolas.douil...@gmail.com> wrote:
>
>> If I correctly understand the answers, the solution to your ordering
>> question is to use clustering keys.
>> I agree, but I just wanted to warn you about one limitation: the
>> values of key columns can't be updated, except by a delete and then an
>> insert.
>> (In the case of your song example, putting the rating as a key can be
>> tricky if the value has to be frequently updated.)
>>
>>
>> On Mon, 10 Oct 2016 at 22:15, Mikhail Krupitskiy <
>> mikhail.krupits...@jetbrains.com> wrote:
>>
>>> Looks like ordering by multiple columns in Cassandra has a few sides that
>>> are not obvious.
>>> I wasn’t able to find this information in the official documentation but
>>> it’s quite well described here:
>>> http://stackoverflow.com/questions/35708118/where-and-order-
>>> by-clauses-in-cassandra-cql
>>>
>>> Thanks,
>>> Mikhail
>>>
>>> On 10 Oct 2016, at 21:55, DuyHai Doan  wrote:
>>>
>>> No, we didn't record the talk this time unfortunately :(
>>>
>>> On Mon, Oct 10, 2016 at 8:17 PM, Ali Akhtar 
>>> wrote:
>>>
>>> Really helpful slides. Is there a video to go with them?
>>>
>>> On Sun, Oct 9, 2016 at 11:48 AM, DuyHai Doan 
>>> wrote:
>>>
>>> Yes it is possible, read this: http://www.slideshare.ne
>>> t/doanduyhai/datastax-day-2016-cassandra-data-modeling-basics/24
>>>
>>> and the following slides
>>>
>>> On Sun, Oct 9, 2016 at 2:04 AM, Ali Akhtar  wrote:
>>>
>>> Is it possible to have multiple clustering keys in cassandra, or some
>>> other way to order by multiple columns?
>>>
>>> For example, say I have a table of songs, and each song has a rating and
>>> a date.
>>>
>>> I want to sort songs by rating first, and then with newer songs on top.
>>>
>>> So if two songs have 5 rating, and one's date is 1st Feb, the other is
>>> 2nd Feb, then I want the 2nd feb one to be sorted above the 1st feb one.
>>>
>>> Like this:
>>>
>>> Select * from songs order by rating, createdAt
>>>
>>> Is this possible?
>>>
>>>
>>>
>>>
>>>
>>>
>


Re: Ordering by multiple columns?

2016-10-10 Thread Ali Akhtar
Huh - So if I wanted to search / filter by a timestamp field, and this
timestamp needed to get updated, that won't be possible?

On Tue, Oct 11, 2016 at 3:07 AM, Nicolas Douillet <
nicolas.douil...@gmail.com> wrote:

> If I correctly understand the answers, the solution to your ordering
> question is to use clustering keys.
> I agree, but I just wanted to warn you about one limitation: the
> values of key columns can't be updated, except by a delete and then an
> insert.
> (In the case of your song example, putting the rating as a key can be
> tricky if the value has to be frequently updated.)
>
>
> On Mon, 10 Oct 2016 at 22:15, Mikhail Krupitskiy <
> mikhail.krupits...@jetbrains.com> wrote:
>
>> Looks like ordering by multiple columns in Cassandra has a few sides that
>> are not obvious.
>> I wasn’t able to find this information in the official documentation but
>> it’s quite well described here:
>> http://stackoverflow.com/questions/35708118/where-and-
>> order-by-clauses-in-cassandra-cql
>>
>> Thanks,
>> Mikhail
>>
>> On 10 Oct 2016, at 21:55, DuyHai Doan  wrote:
>>
>> No, we didn't record the talk this time unfortunately :(
>>
>> On Mon, Oct 10, 2016 at 8:17 PM, Ali Akhtar  wrote:
>>
>> Really helpful slides. Is there a video to go with them?
>>
>> On Sun, Oct 9, 2016 at 11:48 AM, DuyHai Doan 
>> wrote:
>>
>> Yes it is possible, read this: http://www.slideshare.
>> net/doanduyhai/datastax-day-2016-cassandra-data-modeling-basics/24
>>
>> and the following slides
>>
>> On Sun, Oct 9, 2016 at 2:04 AM, Ali Akhtar  wrote:
>>
>> Is it possible to have multiple clustering keys in cassandra, or some
>> other way to order by multiple columns?
>>
>> For example, say I have a table of songs, and each song has a rating and
>> a date.
>>
>> I want to sort songs by rating first, and then with newer songs on top.
>>
>> So if two songs have 5 rating, and one's date is 1st Feb, the other is
>> 2nd Feb, then I want the 2nd feb one to be sorted above the 1st feb one.
>>
>> Like this:
>>
>> Select * from songs order by rating, createdAt
>>
>> Is this possible?
>>
>>
>>
>>
>>
>>


Re: Ordering by multiple columns?

2016-10-10 Thread Nicolas Douillet
If I correctly understand the answers, the solution to your ordering
question is to use clustering keys.
I agree, but I just wanted to warn you about one limitation: the values
of key columns can't be updated, except by a delete and then an insert.
(In the case of your song example, putting the rating as a key can be
tricky if the value has to be frequently updated.)


On Mon, 10 Oct 2016 at 22:15, Mikhail Krupitskiy <
mikhail.krupits...@jetbrains.com> wrote:

> Looks like ordering by multiple columns in Cassandra has a few sides that
> are not obvious.
> I wasn’t able to find this information in the official documentation but
> it’s quite well described here:
>
> http://stackoverflow.com/questions/35708118/where-and-order-by-clauses-in-cassandra-cql
>
> Thanks,
> Mikhail
>
> On 10 Oct 2016, at 21:55, DuyHai Doan  wrote:
>
> No, we didn't record the talk this time unfortunately :(
>
> On Mon, Oct 10, 2016 at 8:17 PM, Ali Akhtar  wrote:
>
> Really helpful slides. Is there a video to go with them?
>
> On Sun, Oct 9, 2016 at 11:48 AM, DuyHai Doan  wrote:
>
> Yes it is possible, read this:
> http://www.slideshare.net/doanduyhai/datastax-day-2016-cassandra-data-modeling-basics/24
>
> and the following slides
>
> On Sun, Oct 9, 2016 at 2:04 AM, Ali Akhtar  wrote:
>
> Is it possible to have multiple clustering keys in cassandra, or some
> other way to order by multiple columns?
>
> For example, say I have a table of songs, and each song has a rating and a
> date.
>
> I want to sort songs by rating first, and then with newer songs on top.
>
> So if two songs have 5 rating, and one's date is 1st Feb, the other is 2nd
> Feb, then I want the 2nd feb one to be sorted above the 1st feb one.
>
> Like this:
>
> Select * from songs order by rating, createdAt
>
> Is this possible?
>
>
>
>
>
>
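To make the clustering-key approach above concrete, here is a minimal sketch (table, column names, and the 'bucket' partition key are invented for illustration; all rows you want sorted together must share one partition):

```sql
-- Illustrative schema only: rating and created_at are clustering columns,
-- so each partition is stored sorted by rating DESC, then created_at DESC.
CREATE TABLE songs_by_rating (
    bucket     text,       -- partition key; rows sort only within a partition
    rating     int,
    created_at timestamp,
    song_id    uuid,
    title      text,
    PRIMARY KEY ((bucket), rating, created_at, song_id)
) WITH CLUSTERING ORDER BY (rating DESC, created_at DESC);

-- No ORDER BY needed: rows come back rating-first, newest-first.
SELECT title, rating, created_at FROM songs_by_rating WHERE bucket = 'all';
```

Note the caveat in this thread: changing a song's rating then means deleting the old row and inserting a new one, since clustering-column values are immutable.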


[RELEASE] Apache Cassandra 2.1.16 released

2016-10-10 Thread Michael Shuler
The Cassandra team is pleased to announce the release of Apache
Cassandra version 2.1.16.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a bug fix release[1] on the 2.1 series. As always,
please pay attention to the release notes[2] and let us know[3] if you
encounter any problems.

Enjoy!

[1]: (CHANGES.txt) https://goo.gl/Unwb9s
[2]: (NEWS.txt) https://goo.gl/LuZHa5
[3]: https://issues.apache.org/jira/browse/CASSANDRA


Re: Understanding cassandra data directory contents

2016-10-10 Thread Nicolas Douillet
Hi Jason,

I'm not familiar enough with Cassandra 3, but it might be snapshots.
Snapshots are usually hardlinks to sstable directories.

Try this :
nodetool clearsnapshot

Does it change anything?

--
Nicolas

On Sat, 8 Oct 2016 at 21:26, Jason Kania wrote:

> Hi Vladimir,
>
> Thanks for the response. I assume then that it is safe to remove the
> directories that are not current as per the system_schema.tables table. I
> have dozens of the same table and haven't dropped and added nearly that
> many times. Do any of the nodetool or other commands clean up these unused
> directories?
>
> Thanks,
>
> Jason Kania
>
> --
> *From:* Vladimir Yudovin 
> *To:* user@cassandra.apache.org; Jason Kania 
> *Sent:* Saturday, October 8, 2016 2:05 PM
> *Subject:* Re: Understanding cassandra data directory contents
>
> Each table has a unique id (suffix). If you drop and then recreate a table
> with the same name, it gets a new id.
>
> Try
> *SELECT keyspace_name, table_name, id FROM system_schema.tables ;*
> to determine the actual ID.
>
> You can limit request to specific keyspace or table.
>
>
> Best regards, Vladimir Yudovin,
>
>
> *Winguzone  - Hosted Cloud Cassandra on
> Azure and SoftLayer.Launch your cluster in minutes.*
>
>
>  On Sat, 08 Oct 2016 13:42:19 -0400 *Jason Kania* wrote 
>
> Hello,
>
> I am using Cassandra 3.0.9 and I have encountered an issue where the nodes
> in my 3 node cluster have vastly different amounts of data even though they
> should be roughly the same. When I looked through the data directory for my
> database on two of the nodes, I see a number of directories with the same
> prefix, eg:
>
> periodicReading-76eb7510096811e68a7421c8b9466352,
> periodicReading-453d55a0501d11e68623a9d2b6f96e86
> ...
>
> Only one directory with a specific table name prefix has a current date
> and the rest are older.
>
> In contrast, on the node with the least space used, each directory has a
> unique prefix (not shared).
>
> I am wondering what the contents of a Cassandra database directory should
> look like. Are there supposed to be multiple entries for a given table or
> just one?
>
> If just one, what would be a procedure to determine if the other
> directories with the same table are junk that can be removed.
>
> Thanks,
>
> Jason
>
>
>
>
>
>


Re: [Marketing Mail] Re: sstableloader question

2016-10-10 Thread Osman YOZGATLIOGLU
Hello,

Thank you Adam and Rajath.

I'll split input sstables and run parallel jobs for each.
I tested this approach and ran 3 parallel sstableloader jobs without the -t
parameter.
I raised the stream_throughput_outbound_megabits_per_sec parameter from 200 to
600 Mbit/sec on all of the target nodes.
But each job runs at only about 10 MB/sec and generates about 100 Mbit/sec of
network traffic.
In total this could be much higher. The source and target servers have plenty
of unused CPU, I/O and network resources.
Do you have any idea how I can increase the speed of the sstableloader jobs?

Regards,
Osman
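(For reference, a throttled invocation might look like the sketch below; the addresses and paths are placeholders rather than details from this thread, and -t/--throttle is expressed in Mbit/sec.)

```shell
# Illustrative only: one sstableloader job per pre-split sstable directory,
# each with an explicit throttle so the per-job cap is not the bottleneck.
bin/sstableloader -d 10.0.0.1,10.0.0.2,10.0.0.3 -t 600 \
    /path/to/split-01/mykeyspace/mytable
```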

On 10-10-2016 22:05, Rajath Subramanyam wrote:
Hi Osman,

You cannot restart the streaming only to the failed nodes specifically. You can 
restart the sstableloader job itself. Compaction will eventually take care of 
the redundant rows.

- Rajath


Rajath Subramanyam


On Sun, Oct 9, 2016 at 7:38 PM, Adam Hutson wrote:
It'll start over from the beginning.


On Sunday, October 9, 2016, Osman YOZGATLIOGLU wrote:
Hello,

I have running a sstableloader job.
Unfortunately some of the nodes restarted after streaming began.
I see streaming has stopped for those nodes.
Can I restart that streaming somehow?
Or if I restart the sstableloader job, will it start from the beginning?

Regards,
Osman


This e-mail message, including any attachments, is for the sole use of the 
person to whom it has been sent, and may contain information that is 
confidential or legally protected. If you are not the intended recipient or 
have received this message in error, you are not authorized to copy, 
distribute, or otherwise use this message or its attachments. Please notify the 
sender immediately by return e-mail and permanently delete this message and any 
attachments. KRON makes no warranty that this e-mail is error or virus free.


--

Adam Hutson
Data Architect | DataScale
+1 (417) 224-5212
a...@datascale.io






Re: Where to change the datacenter name?

2016-10-10 Thread Ali Akhtar
Yeah, so what's happening is, I'm running Cassandra thru a docker image in
production, and so over there, it is using the datacenter name that I
specified thru an env variable.

But on my local machine, Cassandra is annoyingly insisting on 'datacenter1'.

So in order to maintain the same .cql scripts for setting up the db, I
either need to change the dc name locally or in production.

I guess it looks like I should leave it 'datacenter1' in production.

On Tue, Oct 11, 2016 at 1:19 AM, Amit Trivedi  wrote:

> I believe it is coming from system.local. You can verify by executing
>
> select data_center from system.local;
>
> I would be careful changing the datacenter name, particularly in production.
> This is essentially because changing the datacenter requires a snitch
> configuration change, which may result in stale data depending on token
> values and snitch settings, and there is a risk of a node reporting invalid
> or missing data to clients.
>
>
>
> On Mon, Oct 10, 2016 at 4:08 PM, Ali Akhtar  wrote:
>
>> So I see this:
>>
>> cluster_name: 'Test Cluster'
>>
>> But when I grep -i or ctrl + f for 'datacenter1' in cassandra.yaml, I
>> don't see that anywhere except in a comment.
>>
>>
>> Yet when I do nodetool status, I see: datacenter1
>>
>> And unless I define my replication as: '{'class':
>> 'NetworkTopologyStrategy', 'datacenter1' : 3}' when creating my keyspace,
>> my inserts / selects don't work because it says 0 replicas available (i.e
>> if i use anything other than 'datacenter1' in the above stmt)
>>
>> I don't see 'datacenter1' in rackdc.properties. So my question is, which
>> file contains 'datacenter1'?
>>
>> On Tue, Oct 11, 2016 at 12:54 AM, Adam Hutson  wrote:
>>
>>> There is a cluster name in the cassandra.yaml for naming the cluster,
>>> aka data center. Then you assign keyspaces to the data center within the
>>> CREATE KEYSPACE stmt with NetworkTopology.
>>>
>>>
>>> On Monday, October 10, 2016, Ali Akhtar  wrote:
>>>
 Where can I change the default name 'datacenter1'? I've looked through
 the configuration files in /etc/cassandra , and can't find where this value
 is being defined.

>>>
>>>
>>> --
>>>
>>> Adam Hutson
>>> Data Architect | DataScale
>>> +1 (417) 224-5212
>>> a...@datascale.io
>>>
>>
>>
>


Re: Where to change the datacenter name?

2016-10-10 Thread Amit Trivedi
I believe it is coming from system.local. You can verify by executing

select data_center from system.local;

I would be careful changing the datacenter name, particularly in production.
This is essentially because changing the datacenter requires a snitch
configuration change, which may result in stale data depending on token values
and snitch settings, and there is a risk of a node reporting invalid or
missing data to clients.



On Mon, Oct 10, 2016 at 4:08 PM, Ali Akhtar  wrote:

> So I see this:
>
> cluster_name: 'Test Cluster'
>
> But when I grep -i or ctrl + f for 'datacenter1' in cassandra.yaml, I
> don't see that anywhere except in a comment.
>
>
> Yet when I do nodetool status, I see: datacenter1
>
> And unless I define my replication as: '{'class':
> 'NetworkTopologyStrategy', 'datacenter1' : 3}' when creating my keyspace,
> my inserts / selects don't work because it says 0 replicas available (i.e
> if i use anything other than 'datacenter1' in the above stmt)
>
> I don't see 'datacenter1' in rackdc.properties. So my question is, which
> file contains 'datacenter1'?
>
> On Tue, Oct 11, 2016 at 12:54 AM, Adam Hutson  wrote:
>
>> There is a cluster name in the cassandra.yaml for naming the cluster, aka
>> data center. Then you assign keyspaces to the data center within the CREATE
>> KEYSPACE stmt with NetworkTopology.
>>
>>
>> On Monday, October 10, 2016, Ali Akhtar  wrote:
>>
>>> Where can I change the default name 'datacenter1'? I've looked through
>>> the configuration files in /etc/cassandra , and can't find where this value
>>> is being defined.
>>>
>>
>>
>> --
>>
>> Adam Hutson
>> Data Architect | DataScale
>> +1 (417) 224-5212
>> a...@datascale.io
>>
>
>


Re: Where to change the datacenter name?

2016-10-10 Thread Surbhi Gupta
The data center name lives in one of two files. If you are using
GossipingPropertyFileSnitch in cassandra.yaml, then the data center name is in
cassandra-rackdc.properties.

If you are using PropertyFileSnitch in cassandra.yaml, then the data center
name is in the cassandra-topology.properties file.
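For illustration, the GossipingPropertyFileSnitch case looks like this (these happen to be the default values shipped with Cassandra; adjust to your topology):

```properties
# conf/cassandra-rackdc.properties -- read when cassandra.yaml has
# endpoint_snitch: GossipingPropertyFileSnitch
dc=dc1
rack=rack1
```

Whatever dc name is set here is the name that must appear in the NetworkTopologyStrategy replication options when creating keyspaces.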
On Monday, October 10, 2016, Ali Akhtar  wrote:

> So I see this:
>
> cluster_name: 'Test Cluster'
>
> But when I grep -i or ctrl + f for 'datacenter1' in cassandra.yaml, I
> don't see that anywhere except in a comment.
>
>
> Yet when I do nodetool status, I see: datacenter1
>
> And unless I define my replication as: '{'class':
> 'NetworkTopologyStrategy', 'datacenter1' : 3}' when creating my keyspace,
> my inserts / selects don't work because it says 0 replicas available (i.e
> if i use anything other than 'datacenter1' in the above stmt)
>
> I don't see 'datacenter1' in rackdc.properties. So my question is, which
> file contains 'datacenter1'?
>
> On Tue, Oct 11, 2016 at 12:54 AM, Adam Hutson wrote:
>
>> There is a cluster name in the cassandra.yaml for naming the cluster, aka
>> data center. Then you assign keyspaces to the data center within the CREATE
>> KEYSPACE stmt with NetworkTopology.
>>
>>
>> On Monday, October 10, 2016, Ali Akhtar wrote:
>>
>>> Where can I change the default name 'datacenter1'? I've looked through
>>> the configuration files in /etc/cassandra , and can't find where this value
>>> is being defined.
>>>
>>
>>
>> --
>>
>> Adam Hutson
>> Data Architect | DataScale
>> +1 (417) 224-5212
>> a...@datascale.io 
>>
>
>


Re: Ordering by multiple columns?

2016-10-10 Thread Mikhail Krupitskiy
Looks like ordering by multiple columns in Cassandra has a few sides that are
not obvious.
I wasn’t able to find this information in the official documentation but it’s 
quite well described here:
http://stackoverflow.com/questions/35708118/where-and-order-by-clauses-in-cassandra-cql
 


Thanks,
Mikhail
> On 10 Oct 2016, at 21:55, DuyHai Doan  wrote:
> 
> No, we didn't record the talk this time unfortunately :(
> 
> On Mon, Oct 10, 2016 at 8:17 PM, Ali Akhtar wrote:
> Really helpful slides. Is there a video to go with them?
> 
> On Sun, Oct 9, 2016 at 11:48 AM, DuyHai Doan wrote:
> Yes it is possible, read this: 
> http://www.slideshare.net/doanduyhai/datastax-day-2016-cassandra-data-modeling-basics/24
>  
> 
> 
> and the following slides
> 
> On Sun, Oct 9, 2016 at 2:04 AM, Ali Akhtar  > wrote:
> Is it possible to have multiple clustering keys in cassandra, or some other 
> way to order by multiple columns?
> 
> For example, say I have a table of songs, and each song has a rating and a 
> date.
> 
> I want to sort songs by rating first, and then with newer songs on top.
> 
> So if two songs have 5 rating, and one's date is 1st Feb, the other is 2nd 
> Feb, then I want the 2nd feb one to be sorted above the 1st feb one.
> 
> Like this:
> 
> Select * from songs order by rating, createdAt
> 
> Is this possible?
> 
> 
> 



Re: Where to change the datacenter name?

2016-10-10 Thread Ali Akhtar
So I see this:

cluster_name: 'Test Cluster'

But when I grep -i or ctrl + f for 'datacenter1' in cassandra.yaml, I don't
see that anywhere except in a comment.


Yet when I do nodetool status, I see: datacenter1

And unless I define my replication as: '{'class':
'NetworkTopologyStrategy', 'datacenter1' : 3}' when creating my keyspace,
my inserts / selects don't work because it says 0 replicas available (i.e
if i use anything other than 'datacenter1' in the above stmt)

I don't see 'datacenter1' in rackdc.properties. So my question is, which
file contains 'datacenter1'?

On Tue, Oct 11, 2016 at 12:54 AM, Adam Hutson  wrote:

> There is a cluster name in the cassandra.yaml for naming the cluster, aka
> data center. Then you assign keyspaces to the data center within the CREATE
> KEYSPACE stmt with NetworkTopology.
>
>
> On Monday, October 10, 2016, Ali Akhtar  wrote:
>
>> Where can I change the default name 'datacenter1'? I've looked through
>> the configuration files in /etc/cassandra , and can't find where this value
>> is being defined.
>>
>
>
> --
>
> Adam Hutson
> Data Architect | DataScale
> +1 (417) 224-5212
> a...@datascale.io
>


Re: Where to change the datacenter name?

2016-10-10 Thread Adam Hutson
There is a cluster name in the cassandra.yaml for naming the cluster, aka
data center. Then you assign keyspaces to the data center within the CREATE
KEYSPACE stmt with NetworkTopology.

On Monday, October 10, 2016, Ali Akhtar  wrote:

> Where can I change the default name 'datacenter1'? I've looked through the
> configuration files in /etc/cassandra , and can't find where this value is
> being defined.
>


-- 

Adam Hutson
Data Architect | DataScale
+1 (417) 224-5212
a...@datascale.io


Where to change the datacenter name?

2016-10-10 Thread Ali Akhtar
Where can I change the default name 'datacenter1'? I've looked through the
configuration files in /etc/cassandra , and can't find where this value is
being defined.


Re: sstableloader question

2016-10-10 Thread Rajath Subramanyam
Hi Osman,

You cannot restart the streaming only to the failed nodes specifically. You
can restart the sstableloader job itself. Compaction will eventually take
care of the redundant rows.

- Rajath


Rajath Subramanyam


On Sun, Oct 9, 2016 at 7:38 PM, Adam Hutson  wrote:

> It'll start over from the beginning.
>
>
> On Sunday, October 9, 2016, Osman YOZGATLIOGLU <
> osman.yozgatlio...@krontech.com> wrote:
>
>> Hello,
>>
>> I have running a sstableloader job.
>> Unfortunately some of the nodes restarted after streaming began.
>> I see streaming has stopped for those nodes.
>> Can I restart that streaming somehow?
>> Or if I restart the sstableloader job, will it start from the beginning?
>>
>> Regards,
>> Osman
>>
>>
>> This e-mail message, including any attachments, is for the sole use of
>> the person to whom it has been sent, and may contain information that is
>> confidential or legally protected. If you are not the intended recipient or
>> have received this message in error, you are not authorized to copy,
>> distribute, or otherwise use this message or its attachments. Please notify
>> the sender immediately by return e-mail and permanently delete this message
>> and any attachments. KRON makes no warranty that this e-mail is error or
>> virus free.
>>
>
>
> --
>
> Adam Hutson
> Data Architect | DataScale
> +1 (417) 224-5212
> a...@datascale.io
>


Re: Doing a calculation in a query?

2016-10-10 Thread DuyHai Doan
Assuming you're using Cassandra 3.0 or later, User Defined Functions (UDFs)
can help you compute the shipment_delay. For the ordering, since this
column is computed and not a clustering column, ordering server-side won't be
possible.

More details about UDF: http://www.doanduyhai.com/blog/?p=1876
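A rough sketch of what such a UDF might look like (assuming enable_user_defined_functions: true in cassandra.yaml; the function name is made up, and the table and column names simply mirror the question):

```sql
-- Hypothetical UDF: milliseconds between ordered_at and shipped_at, or
-- between ordered_at and now when the order has not shipped yet.
CREATE OR REPLACE FUNCTION shipment_delay (ordered_at timestamp, shipped_at timestamp)
    CALLED ON NULL INPUT
    RETURNS bigint
    LANGUAGE java
    AS $$
        if (ordered_at == null) return null;
        long end = (shipped_at != null) ? shipped_at.getTime()
                                        : System.currentTimeMillis();
        return end - ordered_at.getTime();
    $$;

-- The computed value cannot drive ORDER BY, so sort these results client-side:
SELECT order_id, shipment_delay(ordered_at, shipped_at) AS delay FROM orders;
```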

On Mon, Oct 10, 2016 at 6:08 PM, Ali Akhtar  wrote:

> I have a table for tracking orders. Each order has an `ordered_at` field
> (can be a timestamp, or a long with the milliseconds of the timestamp) and
> `shipped_at` field (ditto, timestamp or long).
>
> orderd_at tracks when the order was made.
>
> shipped_at tracks when the order was shipped.
>
> When retrieving the orders, I need to calculate an additional field,
> called 'shipment_delay'. This is simply, 'shipped_at - ordered_at`. I.e how
> long it took between when the order was made, and when it was shipped.
>
> The tricky part is that if an order isn't yet shipped, then it should
> just return how many days it has been since the order was made.
>
> E.g, if order was made on Jan 1 and shipped on Jan 5th, shipment_delay = 4
>  days (in milliseconds if needed)
>
> If order made on Jan 1, but not yet shipped, and today is Jan 10th, then
> shipment_delay = 10 days.
>
> I then need to sort the orders in the order of 'shipment_delay desc', i.e
> show the orders which took the longest, at the top.
>
> Is it possible to define 'shipment_delay' at the table or query level, so
> it can be used in the 'order by' clause, or will this ordering have to
> be done myself after the data is received?
>
> Thanks.
>
>


Re: Ordering by multiple columns?

2016-10-10 Thread DuyHai Doan
No, we didn't record the talk this time unfortunately :(

On Mon, Oct 10, 2016 at 8:17 PM, Ali Akhtar  wrote:

> Really helpful slides. Is there a video to go with them?
>
> On Sun, Oct 9, 2016 at 11:48 AM, DuyHai Doan  wrote:
>
>> Yes it is possible, read this: http://www.slideshare.ne
>> t/doanduyhai/datastax-day-2016-cassandra-data-modeling-basics/24
>>
>> and the following slides
>>
>> On Sun, Oct 9, 2016 at 2:04 AM, Ali Akhtar  wrote:
>>
>>> Is it possible to have multiple clustering keys in cassandra, or some
>>> other way to order by multiple columns?
>>>
>>> For example, say I have a table of songs, and each song has a rating and
>>> a date.
>>>
>>> I want to sort songs by rating first, and then with newer songs on top.
>>>
>>> So if two songs have 5 rating, and one's date is 1st Feb, the other is
>>> 2nd Feb, then I want the 2nd feb one to be sorted above the 1st feb one.
>>>
>>> Like this:
>>>
>>> Select * from songs order by rating, createdAt
>>>
>>> Is this possible?
>>>
>>
>>
>
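Within a partition, Cassandra hands rows back already sorted by the clustering
columns, e.g. for a table declared WITH CLUSTERING ORDER BY (rating DESC,
created_at DESC). The equivalent comparator is a tuple sort; a Python sketch
with made-up rows, assuming those two column names:

```python
# Hypothetical rows mirroring clustering columns (rating, created_at).
songs = [
    {"title": "early", "rating": 5, "created_at": "2016-02-01"},
    {"title": "late",  "rating": 5, "created_at": "2016-02-02"},
    {"title": "meh",   "rating": 4, "created_at": "2016-02-03"},
]
# rating DESC first; created_at DESC breaks ties, so newer songs rank higher.
ordered = sorted(songs, key=lambda s: (s["rating"], s["created_at"]), reverse=True)
```

The 5-rated song from Feb 2nd sorts above the one from Feb 1st, matching the
example in the question.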


Re: JVM safepoints, mmap, and slow disks

2016-10-10 Thread Ariel Weisberg
Hi,

> That StackOverflow headline is interesting. Based on my reading of
> Hotspot's
> code, it looks like sun.misc.unsafe is used under the hood to
> perform mmapped
> I/O. I need to learn more about Hotspot's implementation before I can
>  comment
> further.
A memory mapped file is "just" memory so it's accessed using a
ByteBuffer pointing to off heap memory. Works the same as if you had
mapped in some anonymous memory.

> Not sure what you mean here. Aren't there going to be cache and TLB
> misses for any I/O, whether via mmap or syscall?
>
The beauty of memory mapped files can be that if the data is already in
the page cache it's just a regular read of memory. If you touch each
page and it's all in memory it's going to be a slower operation that
blocks the CPU as it has to synchronously load each cache line. It's
possible you might be able to touch multiple pages in parallel if you
are clever.

So if the data is in the page cache and you just access it regularly
(sequentially) you get all the benefits of the prefetcher. If you go and
touch every page first you will not have the latency of prefetching
hidden from you.

> There is a system call to page the memory in which might be better for
> larger reads. Still no guarantee things stay cached though.
When you fault a page the kernel has no idea how much you are going to
read. If there is a mismatch then you may end up going back and forth to
the device several times and for spinning disk this is worse. If you
express up front what you want to read either by fadvise/madvise or a
buffered read it can do something "smart". Granted IO scheduling ranges
from middling to non-existent most of the time, and the fadvise/madvise
stuff for this has holes I can't recall right now.
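Both hints are reachable from Python for experimentation: os.posix_fadvise on
the file descriptor and mmap.madvise on the mapping (the latter since
Python 3.8). A rough Linux-only sketch against a throwaway file, standing in
for an SSTable segment:

```python
import mmap
import os
import tempfile

# A small scratch file to map (stand-in for an SSTable segment).
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"x" * (1 << 20))  # 1 MiB of 'x' (byte value 120)
    path = f.name

fd = os.open(path, os.O_RDONLY)
try:
    # Tell the kernel up front what we intend to read (buffered path).
    os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_WILLNEED)
    mm = mmap.mmap(fd, 0, prot=mmap.PROT_READ)
    # Same hint on the mapping itself (mmap path).
    mm.madvise(mmap.MADV_WILLNEED)
    # Touch one byte per page so later accesses hit the page cache.
    page = mmap.PAGESIZE
    total = sum(mm[i] for i in range(0, len(mm), page))
    mm.close()
finally:
    os.close(fd)
    os.unlink(path)
```

As noted above, these hints have holes and no guarantee the pages stay
resident; this only shows the mechanics.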

Ariel

On Mon, Oct 10, 2016, at 02:19 PM, Josh Snyder wrote:
> On Sat, Oct 8, 2016 at 9:02 PM, Ariel Weisberg
>  wrote:
> ...
>
> > You could use this to minimize the cost.
> > http://stackoverflow.com/questions/36298111/is-it-possible-to-use-sun-misc-unsafe-to-call-c-functions-without-jni/36309652#36309652
>
> That StackOverflow headline is interesting. Based on my reading of
> Hotspot's
> code, it looks like sun.misc.unsafe is used under the hood to perform
> mmapped
> I/O. I need to learn more about Hotspot's implementation before I can
> comment
> further.
>
> > Maybe faster than doing buffered IO. It's a lot of cache and TLB
> > misses
> > without prefetching though.
>
> Not sure what you mean here. Aren't there going to be cache and TLB
> misses for
> any I/O, whether via mmap or syscall?
>
> > There is a system call to page the memory in which might be
> > better for
> > larger reads. Still no guarantee things stay cached though.
>
> The approaches I've seen just involve something in userspace going
> through and
> touching every desired page. It works, especially if you touch
> pages in
> parallel.
>
> Thanks for the pointers. If I get anywhere with them, I'll be sure to
> let you know.
>
> Josh
>
> > On Sat, Oct 8, 2016, at 08:21 PM, Graham Sanderson wrote:
> >> I haven’t studied the read path that carefully, but there might be
> >> a spot at the C* level rather than JVM level where you could
> >> effectively do a JNI touch of the mmap region you’re going to need
> >> next.
> >>
> >>> On Oct 8, 2016, at 7:17 PM, Graham Sanderson 
> >>> wrote:
> >>>
> >>> We don’t use Azul’s Zing, but it does have the nice feature that
> >>> all threads don’t have to reach safepoints at the same time. That
> >>> said we make heavy use of Cassandra (with off heap memtables - not
> >>> directly related but allows us a lot more GC headroom) and SOLR
> >>> where we switched to mmap because it FAR out performed pread
> >>> variants - in no cases have we noticed long time to safe point
> >>> (then again our IO is lightning fast).
> >>>
>  On Oct 8, 2016, at 1:20 PM, Jonathan Haddad 
>  wrote:
> 
>  Linux automatically uses free memory as cache.  It's not swap.
> 
>  http://www.tldp.org/LDP/lki/lki-4.html
> 
>  On Sat, Oct 8, 2016 at 11:12 AM Vladimir Yudovin
>   wrote:
> > Sorry, I don't catch something. What page (memory) cache can
> > exist if there is no swap file.
> > Where are those page written/read?
> >
> >
> > Best regards, Vladimir Yudovin,
> > Winguzone (https://winguzone.com/?from=list) - Hosted Cloud
> > Cassandra on Azure and SoftLayer.
> > Launch your cluster in minutes.
> >
> >  On Sat, 08 Oct 2016 14:09:50 -0400 Ariel
> > Weisberg wrote 
> >> Hi,
> >>
> >> Nope I mean page cache. Linux doesn't call the cache it
> >> maintains using free memory a file cache. It uses free (and
> >> some of the time not so free!) memory to buffer writes and to
> >> cache recently written/read data.
> >>
> >> http://www.tldp.org/LDP/lki/lki-4.html
> >>
> >> 

Re: JVM safepoints, mmap, and slow disks

2016-10-10 Thread Josh Snyder
That's a great idea. Even if the results were immediately thrown away,
pre-reading in a JNI method would eliminate cache misses with very high
probability. The only thing I'd worry about is the increased overhead of JNI
interfering with the fast path (cache hits). I don't have enough knowledge on
the read path or about JNI latency to comment on whether this concern is "real"
or not.

Josh

On Sat, Oct 8, 2016 at 5:21 PM, Graham Sanderson  wrote:
> I haven’t studied the read path that carefully, but there might be a spot at
> the C* level rather than JVM level where you could effectively do a JNI
> touch of the mmap region you’re going to need next.
>
> On Oct 8, 2016, at 7:17 PM, Graham Sanderson  wrote:
>
> We don’t use Azul’s Zing, but it does have the nice feature that all threads
> don’t have to reach safepoints at the same time. That said we make heavy use
> of Cassandra (with off heap memtables - not directly related but allows us a
> lot more GC headroom) and SOLR where we switched to mmap because it FAR out
> performed pread variants - in no cases have we noticed long time to safe
> point (then again our IO is lightning fast).
>
> On Oct 8, 2016, at 1:20 PM, Jonathan Haddad  wrote:
>
> Linux automatically uses free memory as cache.  It's not swap.
>
> http://www.tldp.org/LDP/lki/lki-4.html
>
> On Sat, Oct 8, 2016 at 11:12 AM Vladimir Yudovin 
> wrote:
>>
>> Sorry, I don't catch something. What page (memory) cache can exist if
>> there is no swap file.
>> Where are those page written/read?
>>
>>
>> Best regards, Vladimir Yudovin,
>> Winguzone - Hosted Cloud Cassandra on Azure and SoftLayer.
>> Launch your cluster in minutes.
>>
>>
>>
>>  On Sat, 08 Oct 2016 14:09:50 -0400 Ariel Weisberg
>> wrote 
>>
>> Hi,
>>
>> Nope I mean page cache. Linux doesn't call the cache it maintains using
>> free memory a file cache. It uses free (and some of the time not so free!)
>> memory to buffer writes and to cache recently written/read data.
>>
>> http://www.tldp.org/LDP/lki/lki-4.html
>>
>> When Linux decides it needs free memory it can either evict stuff from the
>> page cache, flush dirty pages and then evict, or swap anonymous memory out.
>> When you disable swap you only disable the last behavior.
>>
>> Maybe we are talking at cross purposes? What I meant is that increasing
>> the heap size to reduce GC frequency is a legitimate thing to do and it does
>> have an impact on the performance of the page cache even if you have swap
>> disabled?
>>
>> Ariel
>>
>>
>> On Sat, Oct 8, 2016, at 01:54 PM, Vladimir Yudovin wrote:
>>
>> >Page cache is data pending flush to disk and data cached from disk.
>>
>> Do you mean file cache?
>>
>>
>> Best regards, Vladimir Yudovin,
>> Winguzone - Hosted Cloud Cassandra on Azure and SoftLayer.
>> Launch your cluster in minutes.
>>
>>
>>  On Sat, 08 Oct 2016 13:40:19 -0400 Ariel Weisberg 
>> wrote 
>>
>> Hi,
>>
>> Page cache is in use even if you disable swap. Swap is anonymous memory,
>> and whatever else the Linux kernel supports paging out. Page cache is data
>> pending flush to disk and data cached from disk.
>>
>> Given how bad the GC pauses are in C* I think it's not the high pole in
>> the tent. Until key things are off heap and C* can run with CMS and get 10
>> millisecond GCs all day long.
>>
>> You can go through tuning and hardware selection to try to get more
>> consistent IO pauses and remove outliers as you mention, and as a user I
>> think this is your best bet. Generally it's either bad device or filesystem
>> behavior if you get page faults taking more than 200 milliseconds O(G1 gc
>> collection).
>>
>> I think a JVM change to allow safe points around memory mapped file access
>> is really unlikely although I agree it would be great. I think the best hack
>> around it is to code up your memory mapped file access into JNI methods and
>> find some way to get that to work. Right now if you want to create a safe
>> point a JNI method is the way to do it. The problem is that JNI methods and
>> POJOs don't get along well.
>>
>> If you think about it the reason non-memory mapped IO works well is that
>> it's all JNI methods so they don't impact time to safe point. I think there
>> is a tradeoff between tolerance for outliers and performance.
>>
>> I don't know the state of the non-memory mapped path and how reliable that
>> is. If it were reliable and I couldn't tolerate the outliers I would use
>> that. I have to ask though, why are you not able to tolerate the outliers?
>> If you are reading and writing at quorum how is this impacting you?
>>
>> Regards,
>> Ariel
>>
>> On Sat, Oct 8, 2016, at 12:54 AM, Vladimir Yudovin wrote:
>>
>> Hi Josh,
>>
>> >Running with increased heap size would reduce GC frequency, at the cost
>> > of page cache.
>>
>> Actually it's recommended to run C* without virtual memory enabled. So if
>> there is no enough memory 

Re: JVM safepoints, mmap, and slow disks

2016-10-10 Thread Josh Snyder
Do you know if there are any publicly available benchmarks on disk_access_mode,
preferably after the fix from CASSANDRA-10249?

If it turns out that syscall I/O is not significantly slower, I'd consider
switching. If I don't know the costs, I think I'd prefer to stick with the
devil I know how to mitigate (i.e. by policing my block devices) rather than
switching to the devil that is non-standard and undocumented. :)

I may have time to do some benchmarking myself. If so, I'll be sure to inform
the list.

Josh
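For a first-order sanity check before touching disk_access_mode, the two read
paths can be contrasted on a scratch file. This toy harness (names and sizes
are arbitrary) says nothing about Cassandra's internals or safepoint behavior,
only about raw read mechanics on one machine:

```python
import mmap
import os
import tempfile
import time

def time_read(path, use_mmap):
    """Return (seconds, bytes_read) for one full pass over the file."""
    fd = os.open(path, os.O_RDONLY)
    try:
        start = time.perf_counter()
        if use_mmap:
            mm = mmap.mmap(fd, 0, prot=mmap.PROT_READ)
            data = bytes(mm)  # fault every page in through the mapping
            mm.close()
        else:
            chunks = []
            while True:
                chunk = os.read(fd, 1 << 16)  # plain read() syscalls
                if not chunk:
                    break
                chunks.append(chunk)
            data = b"".join(chunks)
        return time.perf_counter() - start, len(data)
    finally:
        os.close(fd)

# 8 MiB throwaway file; real numbers only mean something on real SSTables.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"\0" * (8 << 20))
    path = f.name
mmap_s, mmap_n = time_read(path, use_mmap=True)
read_s, read_n = time_read(path, use_mmap=False)
os.unlink(path)
```

Run it with the file cold (dropped caches) and warm to separate device cost
from syscall cost; benchmarking Cassandra itself is still the real test.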

On Sun, Oct 9, 2016 at 2:39 AM, Benedict Elliott Smith
 wrote:
> The biggest problem with pread was the issue of over reading (reading 64k
> where 4k would suffice), which was significantly improved in 2.2 iirc. I
> don't think the penalty is very significant anymore, and if you are
> experiencing time to safe point issues it's very likely a worthwhile switch
> to flip.
>
>
> On Sunday, 9 October 2016, Graham Sanderson  wrote:
>>
>> I was using the term “touch” loosely to hopefully mean prefetch, though I
>> suspect (I think Intel has been de-emphasizing it) you can still do a sensible
>> prefetch instruction in native code. Even if not, you are still better off
>> blocking in JNI code - I haven’t looked at the link to see if the correct
>> barriers are enforced by the sun.misc.Unsafe method.
>>
>> I do suspect that you’ll see up to about 5-10% sys call overhead if you
>> hit pread.
>>
>> > On Oct 8, 2016, at 11:02 PM, Ariel Weisberg  wrote:
>> >
>> > Hi,
>> >
>> > This is starting to get into dev list territory.
>> >
>> > Interesting idea to touch every 4K page you are going to read.
>> >
>> > You could use this to minimize the cost.
>> >
>> > http://stackoverflow.com/questions/36298111/is-it-possible-to-use-sun-misc-unsafe-to-call-c-functions-without-jni/36309652#36309652
>> >
>> > Maybe faster than doing buffered IO. It's a lot of cache and TLB misses
>> > without prefetching though.
>> >
>> > There is a system call to page the memory in which might be better for
>> > larger reads. Still no guarantee things stay cached though.
>> >
>> > Ariel
>> >
>> >
>> > On Sat, Oct 8, 2016, at 08:21 PM, Graham Sanderson wrote:
>> >> I haven’t studied the read path that carefully, but there might be a
>> >> spot at the C* level rather than JVM level where you could effectively do 
>> >> a
>> >> JNI touch of the mmap region you’re going to need next.
>> >>
>> >>> On Oct 8, 2016, at 7:17 PM, Graham Sanderson  wrote:
>> >>>
>> >>> We don’t use Azul’s Zing, but it does have the nice feature that all
>> >>> threads don’t have to reach safepoints at the same time. That said we 
>> >>> make
>> >>> heavy use of Cassandra (with off heap memtables - not directly related 
>> >>> but
>> >>> allows us a lot more GC headroom) and SOLR where we switched to mmap 
>> >>> because
>> >>> it FAR out performed pread variants - in no cases have we noticed long 
>> >>> time
>> >>> to safe point (then again our IO is lightning fast).
>> >>>
>>  On Oct 8, 2016, at 1:20 PM, Jonathan Haddad 
>>  wrote:
>> 
>>  Linux automatically uses free memory as cache.  It's not swap.
>> 
>>  http://www.tldp.org/LDP/lki/lki-4.html
>> 
>>  On Sat, Oct 8, 2016 at 11:12 AM Vladimir Yudovin
>>   wrote:
>> > Sorry, I don't catch something. What page (memory) cache can exist
>> > if there is no swap file.
>> > Where are those page written/read?
>> >
>> >
>> > Best regards, Vladimir Yudovin,
>> > Winguzone (https://winguzone.com/?from=list) - Hosted Cloud
>> > Cassandra on Azure and SoftLayer.
>> > Launch your cluster in minutes.
>> >
>> >  On Sat, 08 Oct 2016 14:09:50 -0400 Ariel
>> > Weisberg wrote 
>> >> Hi,
>> >>
>> >> Nope I mean page cache. Linux doesn't call the cache it maintains
>> >> using free memory a file cache. It uses free (and some of the time 
>> >> not so
>> >> free!) memory to buffer writes and to cache recently written/read 
>> >> data.
>> >>
>> >> http://www.tldp.org/LDP/lki/lki-4.html
>> >>
>> >> When Linux decides it needs free memory it can either evict stuff
>> >> from the page cache, flush dirty pages and then evict, or swap 
>> >> anonymous
>> >> memory out. When you disable swap you only disable the last behavior.
>> >>
>> >> Maybe we are talking at cross purposes? What I meant is that
>> >> increasing the heap size to reduce GC frequency is a legitimate thing 
>> >> to do
>> >> and it does have an impact on the performance of the page cache even 
>> >> if you
>> >> have swap disabled?
>> >>
>> >> Ariel
>> >>
>> >>
>> >> On Sat, Oct 8, 2016, at 01:54 PM, Vladimir Yudovin wrote:
>>  Page cache is data pending flush to disk and data cached from
>>  disk.
>> >>>
>> 

Re: JVM safepoints, mmap, and slow disks

2016-10-10 Thread Josh Snyder
On Sat, Oct 8, 2016 at 9:02 PM, Ariel Weisberg  wrote:
...

> You could use this to minimize the cost.
> http://stackoverflow.com/questions/36298111/is-it-possible-to-use-sun-misc-unsafe-to-call-c-functions-without-jni/36309652#36309652

That StackOverflow headline is interesting. Based on my reading of Hotspot's
code, it looks like sun.misc.unsafe is used under the hood to perform mmapped
I/O. I need to learn more about Hotspot's implementation before I can comment
further.

> Maybe faster than doing buffered IO. It's a lot of cache and TLB misses
> without prefetching though.

Not sure what you mean here. Aren't there going to be cache and TLB misses for
any I/O, whether via mmap or syscall?

> There is a system call to page the memory in which might be better for
> larger reads. Still no guarantee things stay cached though.

The approaches I've seen just involve something in userspace going through and
touching every desired page. It works, especially if you touch pages in
parallel.

Thanks for the pointers. If I get anywhere with them, I'll be sure to
let you know.

Josh
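The userspace warm-up loop above can be sketched with a thread pool. The
sketch below uses os.pread (which releases the GIL in CPython) rather than
dereferencing a mapping directly, so the per-page faults can genuinely
overlap; the function and file names are made up for illustration:

```python
import mmap
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

def warm(path, workers=4):
    """Fault a file's pages into the page cache, several pages in flight."""
    size = os.path.getsize(path)
    page = mmap.PAGESIZE
    fd = os.open(path, os.O_RDONLY)
    try:
        def touch(offset):
            return os.pread(fd, 1, offset)  # read one byte per page
        with ThreadPoolExecutor(max_workers=workers) as pool:
            touched = sum(1 for _ in pool.map(touch, range(0, size, page)))
    finally:
        os.close(fd)
    return touched

# Demo on a 64-page scratch file.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"\0" * (64 * mmap.PAGESIZE))
    demo_path = f.name
pages_touched = warm(demo_path)
os.unlink(demo_path)
```

As discussed above, this only hides fault latency; it can't pin the pages, so
they may be evicted again before the real read happens.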

> On Sat, Oct 8, 2016, at 08:21 PM, Graham Sanderson wrote:
>> I haven’t studied the read path that carefully, but there might be a spot at 
>> the C* level rather than JVM level where you could effectively do a JNI 
>> touch of the mmap region you’re going to need next.
>>
>>> On Oct 8, 2016, at 7:17 PM, Graham Sanderson  wrote:
>>>
>>> We don’t use Azul’s Zing, but it does have the nice feature that all 
>>> threads don’t have to reach safepoints at the same time. That said we make 
>>> heavy use of Cassandra (with off heap memtables - not directly related but 
>>> allows us a lot more GC headroom) and SOLR where we switched to mmap 
>>> because it FAR out performed pread variants - in no cases have we noticed 
>>> long time to safe point (then again our IO is lightning fast).
>>>
 On Oct 8, 2016, at 1:20 PM, Jonathan Haddad  wrote:

 Linux automatically uses free memory as cache.  It's not swap.

 http://www.tldp.org/LDP/lki/lki-4.html

 On Sat, Oct 8, 2016 at 11:12 AM Vladimir Yudovin  
 wrote:
> Sorry, I don't catch something. What page (memory) cache can exist if 
> there is no swap file.
> Where are those page written/read?
>
>
> Best regards, Vladimir Yudovin,
> Winguzone (https://winguzone.com/?from=list) - Hosted Cloud Cassandra on
> Azure and SoftLayer.
> Launch your cluster in minutes.
>
>  On Sat, 08 Oct 2016 14:09:50 -0400 Ariel
> Weisberg wrote 
>> Hi,
>>
>> Nope I mean page cache. Linux doesn't call the cache it maintains using 
>> free memory a file cache. It uses free (and some of the time not so 
>> free!) memory to buffer writes and to cache recently written/read data.
>>
>> http://www.tldp.org/LDP/lki/lki-4.html
>>
>> When Linux decides it needs free memory it can either evict stuff from 
>> the page cache, flush dirty pages and then evict, or swap anonymous 
>> memory out. When you disable swap you only disable the last behavior.
>>
>> Maybe we are talking at cross purposes? What I meant is that increasing 
>> the heap size to reduce GC frequency is a legitimate thing to do and it 
>> does have an impact on the performance of the page cache even if you 
>> have swap disabled?
>>
>> Ariel
>>
>>
>> On Sat, Oct 8, 2016, at 01:54 PM, Vladimir Yudovin wrote:
>>> >Page cache is data pending flush to disk and data cached from disk.
>>>
>>> Do you mean file cache?
>>>
>>>
>>> Best regards, Vladimir Yudovin,
>>> Winguzone (https://winguzone.com/?from=list) - Hosted Cloud Cassandra
>>> on Azure and SoftLayer.
>>> Launch your cluster in minutes.
>>>
>>>
>>>  On Sat, 08 Oct 2016 13:40:19 -0400 Ariel Weisberg
>>> wrote 
 Hi,

 Page cache is in use even if you disable swap. Swap is anonymous 
 memory, and whatever else the Linux kernel supports paging out. Page 
 cache is data pending flush to disk and data cached from disk.

 Given how bad the GC pauses are in C* I think it's not the high pole 
 in the tent. Until key things are off heap and C* can run with CMS and 
 get 10 millisecond GCs all day long.

 You can go through tuning and hardware selection to try to get more 
 consistent IO pauses and remove outliers as you mention, and as a user 
 I think this is your best bet. Generally it's either bad device or 
 filesystem behavior if you get page faults taking more than 200 
 milliseconds O(G1 gc collection).

 I think a JVM change to allow safe points around memory mapped file 
 access is really unlikely although I 

Re: Ordering by multiple columns?

2016-10-10 Thread Ali Akhtar
Really helpful slides. Is there a video to go with them?

On Sun, Oct 9, 2016 at 11:48 AM, DuyHai Doan  wrote:

> Yes it is possible, read this: http://www.slideshare.
> net/doanduyhai/datastax-day-2016-cassandra-data-modeling-basics/24
>
> and the following slides
>
> On Sun, Oct 9, 2016 at 2:04 AM, Ali Akhtar  wrote:
>
>> Is it possible to have multiple clustering keys in cassandra, or some
>> other way to order by multiple columns?
>>
>> For example, say I have a table of songs, and each song has a rating and
>> a date.
>>
>> I want to sort songs by rating first, and then with newer songs on top.
>>
>> So if two songs have 5 rating, and one's date is 1st Feb, the other is
>> 2nd Feb, then I want the 2nd feb one to be sorted above the 1st feb one.
>>
>> Like this:
>>
>> Select * from songs order by rating, createdAt
>>
>> Is this possible?
>>
>
>


Doing a calculation in a query?

2016-10-10 Thread Ali Akhtar
I have a table for tracking orders. Each order has an `ordered_at` field
(can be a timestamp, or a long with the milliseconds of the timestamp) and
`shipped_at` field (ditto, timestamp or long).

ordered_at tracks when the order was made.

shipped_at tracks when the order was shipped.

When retrieving the orders, I need to calculate an additional field, called
'shipment_delay'. This is simply `shipped_at - ordered_at`, i.e. how long
it took between when the order was made and when it was shipped.

The tricky part is that if an order isn't yet shipped, then it should just
return how many days it has been since the order was made.

E.g, if order was made on Jan 1 and shipped on Jan 5th, shipment_delay = 4
 days (in milliseconds if needed)

If order made on Jan 1, but not yet shipped, and today is Jan 10th, then
shipment_delay = 10 days.

I then need to sort the orders in the order of 'shipment_delay desc', i.e
show the orders which took the longest, at the top.

Is it possible to define 'shipment_delay' at the table or query level, so
it can be used in the 'order by' clause, or will this ordering have to
be done myself after the data is received?

Thanks.