Re: Cqlsh hangs & closes automatically

2016-02-02 Thread Tyler Hobbs
The default page size in cqlsh is 100, so perhaps something is going on
there?  Try running cqlsh with --debug to see if there are any errors.
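
For reference, the page size can also be inspected and lowered from inside cqlsh itself to narrow the problem down (a sketch; the PAGING command assumes a reasonably recent cqlsh, and the host and table names are placeholders):

```
$ cqlsh --debug <node-ip>
cqlsh> PAGING;                                  -- show current paging status
cqlsh> PAGING 10;                               -- fetch only 10 rows per page
cqlsh> SELECT * FROM my_keyspace.my_table LIMIT 100;
```

If the hang disappears at a smaller page size, the problem is likely a few very wide rows or tombstone-heavy partitions rather than cqlsh itself.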

On Tue, Feb 2, 2016 at 11:21 AM, Anuj Wadehra 
wrote:

> My cqlsh prompt hangs and closes if I try to fetch just 100 rows using
> select * query. Cassandra-cli does the job. Any solution?
>
>
>
> Thanks
> Anuj
>



-- 
Tyler Hobbs
DataStax 


Re: Moving Away from Compact Storage

2016-02-02 Thread Anuj Wadehra
Will it be possible to read the dynamic column data from compact storage and 
transform it into a collection, e.g. a map, in the new table?

Thanks
Anuj

Sent from Yahoo Mail on Android 
 
  On Wed, 3 Feb, 2016 at 12:28 am, DuyHai Doan wrote:   
So there is no "static" (in the sense of CQL static) column in your legacy 
table. 
Just define a Scala case class to match this table and use Spark to dump the 
content to a new non compact CQL table
On Tue, Feb 2, 2016 at 7:55 PM, Anuj Wadehra  wrote:

Our old table looks like this from cqlsh:
CREATE TABLE table1 (  key text,  "Col1" blob,  "Col2" text,  "Col3" 
text,  "Col4" text,  PRIMARY KEY (key)) WITH COMPACT STORAGE AND …
It will also have some dynamic text data which we are planning to move into 
collections.
Please let me know if you need more details.

Thanks
Anuj

Sent from Yahoo Mail on Android 
 
 On Wed, 3 Feb, 2016 at 12:14 am, DuyHai Doan wrote:   
Can you give the CREATE TABLE script for your old compact storage table? Or at 
least the cassandra-cli creation script?
On Tue, Feb 2, 2016 at 3:48 PM, Anuj Wadehra  wrote:

Thanks DuyHai !! We were also thinking of doing it the "Spark" way, but I was 
not sure it would be so simple :)

We have a compact storage CF with each row having some data in statically 
defined columns and other data in dynamic columns. Is the approach mentioned in 
the link adaptable to the scenario where we want to migrate the existing data 
to a non-compact CF with static columns and collections?

Thanks
Anuj


On Tue, 2/2/16, DuyHai Doan  wrote:

 Subject: Re: Moving Away from Compact Storage
 To: user@cassandra.apache.org
 Date: Tuesday, 2 February, 2016, 12:57 AM

 Use Apache
 Spark to parallelize the data migration. Look at this piece
 of code 
https://github.com/doanduyhai/Cassandra-Spark-Demo/blob/master/src/main/scala/usecases/MigrateAlbumsData.scala#L58-L60
 If your source and target tables have the SAME structure (except for the
 COMPACT STORAGE clause), migration with Spark is two lines of code.
 On Mon, Feb 1, 2016 at 8:14
 PM, Anuj Wadehra 
 wrote:
 Hi
 What's the fastest and most reliable way to migrate data from a Compact
 Storage table to a Non-Compact Storage table?
 I was not able to find any command for dropping the COMPACT STORAGE
 directive, so I think migrating data is the only way... any suggestions?
 Thanks
 Anuj


Re: Moving Away from Compact Storage

2016-02-02 Thread Anuj Wadehra
Our old table looks like this from cqlsh:
CREATE TABLE table1 (  key text,  "Col1" blob,  "Col2" text,  "Col3" 
text,  "Col4" text,  PRIMARY KEY (key)) WITH COMPACT STORAGE AND …
It will also have some dynamic text data which we are planning to move into 
collections.
Please let me know if you need more details.

Thanks
Anuj

Sent from Yahoo Mail on Android 
 
  On Wed, 3 Feb, 2016 at 12:14 am, DuyHai Doan wrote:   
Can you give the CREATE TABLE script for your old compact storage table? Or at 
least the cassandra-cli creation script?
On Tue, Feb 2, 2016 at 3:48 PM, Anuj Wadehra  wrote:

Thanks DuyHai !! We were also thinking of doing it the "Spark" way, but I was 
not sure it would be so simple :)

We have a compact storage CF with each row having some data in statically 
defined columns and other data in dynamic columns. Is the approach mentioned in 
the link adaptable to the scenario where we want to migrate the existing data 
to a non-compact CF with static columns and collections?

Thanks
Anuj


On Tue, 2/2/16, DuyHai Doan  wrote:

 Subject: Re: Moving Away from Compact Storage
 To: user@cassandra.apache.org
 Date: Tuesday, 2 February, 2016, 12:57 AM

 Use Apache
 Spark to parallelize the data migration. Look at this piece
 of code 
https://github.com/doanduyhai/Cassandra-Spark-Demo/blob/master/src/main/scala/usecases/MigrateAlbumsData.scala#L58-L60
 If your source and target tables have the SAME structure (except for the
 COMPACT STORAGE clause), migration with Spark is two lines of code.
 On Mon, Feb 1, 2016 at 8:14
 PM, Anuj Wadehra 
 wrote:
 Hi
 What's the fastest and most reliable way to migrate data from a Compact
 Storage table to a Non-Compact Storage table?
 I was not able to find any command for dropping the COMPACT STORAGE
 directive, so I think migrating data is the only way... any suggestions?
 Thanks
 Anuj




Re: Moving Away from Compact Storage

2016-02-02 Thread DuyHai Doan
Can you give the CREATE TABLE script for your old compact storage table? Or
at least the cassandra-cli creation script?

On Tue, Feb 2, 2016 at 3:48 PM, Anuj Wadehra  wrote:

> Thanks DuyHai !! We were also thinking of doing it the "Spark" way, but I
> was not sure it would be so simple :)
>
> We have a compact storage CF with each row having some data in statically
> defined columns and other data in dynamic columns. Is the approach
> mentioned in the link adaptable to the scenario where we want to migrate the
> existing data to a non-compact CF with static columns and collections?
>
> Thanks
> Anuj
>
> 
> On Tue, 2/2/16, DuyHai Doan  wrote:
>
>  Subject: Re: Moving Away from Compact Storage
>  To: user@cassandra.apache.org
>  Date: Tuesday, 2 February, 2016, 12:57 AM
>
>  Use Apache
>  Spark to parallelize the data migration. Look at this piece
>  of code
> https://github.com/doanduyhai/Cassandra-Spark-Demo/blob/master/src/main/scala/usecases/MigrateAlbumsData.scala#L58-L60
>  If your source and target tables have the SAME structure (except for the
>  COMPACT STORAGE clause), migration with Spark is two lines of code.
>  On Mon, Feb 1, 2016 at 8:14
>  PM, Anuj Wadehra 
>  wrote:
>  Hi
>  What's the fastest and most reliable way to migrate data from a Compact
>  Storage table to a Non-Compact Storage table?
>  I was not able to find any command for dropping the COMPACT STORAGE
>  directive, so I think migrating data is the only way... any suggestions?
>  Thanks
>  Anuj
>
>
>


Re: Moving Away from Compact Storage

2016-02-02 Thread DuyHai Doan
So there is no "static" (in the sense of CQL static) column in your legacy
table.

Just define a Scala case class to match this table and use Spark to dump
the content to a new non compact CQL table

On Tue, Feb 2, 2016 at 7:55 PM, Anuj Wadehra  wrote:

> Our old table looks like this from cqlsh:
>
> CREATE TABLE table1 (
>   key text,
>   "Col1" blob,
>   "Col2" text,
>   "Col3" text,
>   "Col4" text,
>   PRIMARY KEY (key)
> ) WITH COMPACT STORAGE AND …
>
> It will also have some dynamic text data which we are planning to move into
> collections.
>
> Please let me know if you need more details.
>
>
> Thanks
> Anuj
> Sent from Yahoo Mail on Android
> 
>
> On Wed, 3 Feb, 2016 at 12:14 am, DuyHai Doan
>  wrote:
> Can you give the CREATE TABLE script for your old compact storage table?
> Or at least the cassandra-cli creation script?
>
> On Tue, Feb 2, 2016 at 3:48 PM, Anuj Wadehra 
> wrote:
>
>> Thanks DuyHai !! We were also thinking of doing it the "Spark" way, but I
>> was not sure it would be so simple :)
>>
>> We have a compact storage CF with each row having some data in statically
>> defined columns and other data in dynamic columns. Is the approach
>> mentioned in the link adaptable to the scenario where we want to migrate
>> the existing data to a non-compact CF with static columns and collections?
>>
>> Thanks
>> Anuj
>>
>> 
>> On Tue, 2/2/16, DuyHai Doan  wrote:
>>
>>  Subject: Re: Moving Away from Compact Storage
>>  To: user@cassandra.apache.org
>>  Date: Tuesday, 2 February, 2016, 12:57 AM
>>
>>  Use Apache
>>  Spark to parallelize the data migration. Look at this piece
>>  of code
>> https://github.com/doanduyhai/Cassandra-Spark-Demo/blob/master/src/main/scala/usecases/MigrateAlbumsData.scala#L58-L60
>>  If your source and target tables have the SAME structure (except for the
>>  COMPACT STORAGE clause), migration with Spark is two lines of code.
>>  On Mon, Feb 1, 2016 at 8:14
>>  PM, Anuj Wadehra 
>>  wrote:
>>  Hi
>>  What's the fastest and most reliable way to migrate data from a Compact
>>  Storage table to a Non-Compact Storage table?
>>  I was not able to find any command for dropping the COMPACT STORAGE
>>  directive, so I think migrating data is the only way... any suggestions?
>>  Thanks
>>  Anuj
>>
>>
>>
>


Re: Moving Away from Compact Storage

2016-02-02 Thread DuyHai Doan
You'll need to do the transformation in Spark, although I don't understand
what you mean by "dynamic columns". Given the CREATE TABLE script you gave
earlier, there is no such thing as dynamic columns.
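
Whatever the "dynamic columns" turn out to be, the per-row reshaping itself is simple once the rows are in Spark. A minimal Python sketch of the idea (the column names and the `extras` map column are hypothetical, not taken from the thread; in a real migration this function would run inside the Spark job's map step before writing to the new table):

```python
# Fixed columns from the legacy schema; every other column in a row is
# treated as "dynamic" and folded into a single map-typed column.
FIXED = {"key", "Col1", "Col2", "Col3", "Col4"}

def to_new_row(legacy_row: dict) -> dict:
    """Split a legacy wide row into fixed columns plus one 'extras' map."""
    fixed = {k: v for k, v in legacy_row.items() if k in FIXED}
    dynamic = {k: v for k, v in legacy_row.items() if k not in FIXED}
    fixed["extras"] = dynamic  # would become a map<text, text> column
    return fixed

row = {"key": "k1", "Col2": "a", "evt:2016-01-01": "login"}
new_row = to_new_row(row)
assert new_row["extras"] == {"evt:2016-01-01": "login"}
```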

On Tue, Feb 2, 2016 at 8:01 PM, Anuj Wadehra  wrote:

> Will it be possible to read the dynamic column data from compact storage
> and transform it into a collection, e.g. a map, in the new table?
>
>
> Thanks
> Anuj
>
> Sent from Yahoo Mail on Android
> 
>
> On Wed, 3 Feb, 2016 at 12:28 am, DuyHai Doan
>  wrote:
> So there is no "static" (in the sense of CQL static) column in your legacy
> table.
>
> Just define a Scala case class to match this table and use Spark to dump
> the content to a new non compact CQL table
>
> On Tue, Feb 2, 2016 at 7:55 PM, Anuj Wadehra 
> wrote:
>
>> Our old table looks like this from cqlsh:
>>
>> CREATE TABLE table1 (
>>   key text,
>>   "Col1" blob,
>>   "Col2" text,
>>   "Col3" text,
>>   "Col4" text,
>>   PRIMARY KEY (key)
>> ) WITH COMPACT STORAGE AND …
>>
>> It will also have some dynamic text data which we are planning to move
>> into collections.
>>
>> Please let me know if you need more details.
>>
>>
>> Thanks
>> Anuj
>> Sent from Yahoo Mail on Android
>> 
>>
>> On Wed, 3 Feb, 2016 at 12:14 am, DuyHai Doan
>>  wrote:
>> Can you give the CREATE TABLE script for your old compact storage table?
>> Or at least the cassandra-cli creation script?
>>
>> On Tue, Feb 2, 2016 at 3:48 PM, Anuj Wadehra 
>> wrote:
>>
>>> Thanks DuyHai !! We were also thinking of doing it the "Spark" way, but
>>> I was not sure it would be so simple :)
>>>
>>> We have a compact storage CF with each row having some data in statically
>>> defined columns and other data in dynamic columns. Is the approach
>>> mentioned in the link adaptable to the scenario where we want to migrate
>>> the existing data to a non-compact CF with static columns and collections?
>>>
>>> Thanks
>>> Anuj
>>>
>>> 
>>> On Tue, 2/2/16, DuyHai Doan  wrote:
>>>
>>>  Subject: Re: Moving Away from Compact Storage
>>>  To: user@cassandra.apache.org
>>>  Date: Tuesday, 2 February, 2016, 12:57 AM
>>>
>>>  Use Apache
>>>  Spark to parallelize the data migration. Look at this piece
>>>  of code
>>> https://github.com/doanduyhai/Cassandra-Spark-Demo/blob/master/src/main/scala/usecases/MigrateAlbumsData.scala#L58-L60
>>>  If your source and target tables have the SAME structure (except for
>>>  the COMPACT STORAGE clause), migration with Spark is two lines of code.
>>>  On Mon, Feb 1, 2016 at 8:14
>>>  PM, Anuj Wadehra 
>>>  wrote:
>>>  Hi
>>>  What's the fastest and most reliable way to migrate data from a Compact
>>>  Storage table to a Non-Compact Storage table?
>>>  I was not able to find any command for dropping the COMPACT STORAGE
>>>  directive, so I think migrating data is the only way... any suggestions?
>>>  Thanks
>>>  Anuj
>>>
>>>
>>>
>>
>


Re : Possibility of using 2 different snitches in the Multi_DC cluster

2016-02-02 Thread sai krishnam raju potturi
Hi,
  we have a multi-DC cluster spanning our own private cloud and AWS. We are
currently using PropertyFileSnitch across the cluster.

What is the possibility of using GossipingPropertyFileSnitch on the
datacenters in our private cloud, and Ec2MultiRegionSnitch in AWS?

Thanks in advance for the help.

thanks
Sai


Re: Re : Possibility of using 2 different snitches in the Multi_DC cluster

2016-02-02 Thread Robert Coli
On Tue, Feb 2, 2016 at 1:23 PM, sai krishnam raju potturi <
pskraj...@gmail.com> wrote:

> What is the possibility of using GossipingPropertyFileSnitch on the
> datacenters in our private cloud, and Ec2MultiRegionSnitch in AWS?
>

You should just use GPFS everywhere.

This is also the reason why you should not use EC2MRS if you might ever
have a DC that is outside of AWS. Just use GPFS.

=Rob
PS - To answer your actual question... one "can" use different snitches on
a per node basis, but ONE REALLY REALLY SHOULDN'T CONSIDER THIS A VALID
APPROACH AND IF ONE TRIES AND FAILS I WILL POINT AND LAUGH AND NOT HELP
THEM :D
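
For what it's worth, moving a node to GPFS mostly amounts to giving it a cassandra-rackdc.properties that mirrors the DC and rack names the node already reports under its current snitch (the names below are placeholders, not taken from the thread):

```
# cassandra-rackdc.properties (one per node); dc and rack MUST match the
# names the node already has, or its data placement will change.
dc=DC1
rack=RAC1
# prefer_local=true   # optional: prefer private IPs for traffic within the DC
```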


Re: cassandra-stress tool - InvalidQueryException: Batch too large

2016-02-02 Thread Ralf Steppacher
I have raised https://issues.apache.org/jira/browse/CASSANDRA-11105.

Thanks!
Ralf

> On 01.02.2016, at 15:01, Jake Luciani  wrote:
> 
> Yeah that looks like a bug.  Can you open a JIRA and attach the full .yaml?
> 
> Thanks!
> 
> 
> On Mon, Feb 1, 2016 at 5:09 AM, Ralf Steppacher  > wrote:
> I am using Cassandra 2.2.4 and I am struggling to get the cassandra-stress 
> tool to work for my test scenario. I have followed the example on 
> http://www.datastax.com/dev/blog/improved-cassandra-2-1-stress-tool-benchmark-any-schema
>  
> 
>  to create a yaml file describing my test.
> 
> I am collecting events per user id (text, partition key). Events have a 
> session type (text), event type (text), and creation time (timestamp) 
> (clustering keys, in that order). Plus some more attributes required for 
> rendering the events in a UI. For testing purposes I ended up with the 
> following column spec and insert distribution:
> 
> columnspec:
>   - name: created_at
> cluster: uniform(10..1)
>   - name: event_type
> size: uniform(5..10)
> population: uniform(1..30)
> cluster: uniform(1..30)
>   - name: session_type
> size: fixed(5)
> population: uniform(1..4)
> cluster: uniform(1..4)
>   - name: user_id
> size: fixed(15)
> population: uniform(1..100)
>   - name: message
> size: uniform(10..100)
> population: uniform(1..100B)
> 
> insert:
>   partitions: fixed(1)
>   batchtype: UNLOGGED
>   select: fixed(1)/120
> 
> 
> Running stress tool for just the insert prints 
> 
> Generating batches with [1..1] partitions and [0..1] rows (of [10..120] 
> total rows in the partitions)
> 
> and then immediately starts flooding me with 
> "com.datastax.driver.core.exceptions.InvalidQueryException: Batch too large”. 
> 
> Why I should be exceeding the "batch_size_fail_threshold_in_kb: 50" in the 
> cassandra.yaml I do not understand. My understanding is that the stress tool 
> should generate one row per batch. The size of a single row should not exceed 
> 8+10*3+5*3+15*3+100*3 = 398 bytes, assuming a worst case of all text 
> characters being 3-byte Unicode characters. 
> 
> How come I end up with batches that exceed the 50kb threshold? Am I missing 
> the point about the “select” attribute?
> 
> 
> Thanks!
> Ralf
> 
> 
> 
> -- 
> http://twitter.com/tjake 
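
Ralf's worst-case arithmetic above does check out; a quick sanity check mirroring the sizes from his columnspec (Python, values copied from the message):

```python
# Worst case assumes every text character is a 3-byte UTF-8 sequence.
created_at   = 8        # timestamp
event_type   = 10 * 3   # size: uniform(5..10), max 10 chars
session_type = 5 * 3    # size: fixed(5)
user_id      = 15 * 3   # size: fixed(15)
message      = 100 * 3  # size: uniform(10..100), max 100 chars

row_bytes = created_at + event_type + session_type + user_id + message
assert row_bytes == 398
# A single such row is nowhere near the 50 KB batch failure threshold,
# so the "Batch too large" error implies many rows per generated batch:
assert row_bytes < 50 * 1024
```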


Clustering key values not distributed

2016-02-02 Thread Ralf Steppacher
I am trying to get the stress tool to generate random values for three 
clustering keys. I am trying to simulate collecting events per user id (text, 
partition key). Events have a session type (text), event type (text), and 
creation time (timestamp) (clustering keys, in that order). For testing 
purposes I ended up with the following column spec:

columnspec:
 - name: created_at
   cluster: uniform(10..10)
 - name: event_type
   size: uniform(5..10)
   population: uniform(1..30)
   cluster: uniform(1..30)
 - name: session_type
   size: fixed(5)
   population: uniform(1..4)
   cluster: uniform(1..4)
 - name: user_id
   size: fixed(15)
   population: uniform(1..100)
 - name: message
   size: uniform(10..100)
   population: uniform(1..100B)

My expectation was that this would lead to anywhere between 10 and 1200 rows to 
be created per partition key. But it seems that exactly 10 rows are being 
created, with the created_at timestamp being the only variable that is assigned 
variable values (per partition key). The session_type and event_type variables 
are assigned fixed values. This is even the case if I set the cluster 
distribution to uniform(1..30) and uniform(4..4) respectively. With this 
setting I expected 1200 rows per partition key to be created, as announced when 
running the stress tool, but it is still 10.

[rsteppac@centos bin]$ ./cassandra-stress user profile=../batch_too_large.yaml 
ops\(insert=1\) -log level=verbose 
file=~/centos_eventy_patient_session_event_timestamp_insert_only.log -node 
10.211.55.8
…
Created schema. Sleeping 1s for propagation.
Generating batches with [1..1] partitions and [1..1] rows (of [1200..1200] 
total rows in the partitions)
Improvement over 4 threadCount: 19%
...


Sample of generated data:

cqlsh> select user_id, event_type, session_type, created_at from 
stresscql.batch_too_large LIMIT 30 ;

 user_id | event_type | session_type | created_at
---------+------------+--------------+------------
 (sample rows are garbled in the archive)

If I remove the created_at clustering keys then the other two clustering keys 
are assigned variable values per partition key.

Is there a way to achieve this with the created_at clustering key being present?


Thanks!
Ralf
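
For reference, the row counts Ralf expects follow directly from multiplying the cluster distribution ranges in the columnspec (a quick Python check of the arithmetic):

```python
from math import prod

# (min, max) cluster cardinalities per partition, in clustering-key order:
# created_at uniform(10..10), event_type uniform(1..30), session_type uniform(1..4)
cluster_ranges = [(10, 10), (1, 30), (1, 4)]

min_rows = prod(lo for lo, _ in cluster_ranges)
max_rows = prod(hi for _, hi in cluster_ranges)
assert (min_rows, max_rows) == (10, 1200)
```

So observing exactly 10 rows per partition is consistent with only `created_at` actually varying while the other two clustering keys are stuck at a single value each.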

Re: Moving Away from Compact Storage

2016-02-02 Thread Anuj Wadehra
Thanks DuyHai !! We were also thinking of doing it the "Spark" way, but I was 
not sure it would be so simple :)
 
We have a compact storage CF with each row having some data in statically 
defined columns and other data in dynamic columns. Is the approach mentioned in 
the link adaptable to the scenario where we want to migrate the existing data 
to a non-compact CF with static columns and collections?

Thanks
Anuj


On Tue, 2/2/16, DuyHai Doan  wrote:

 Subject: Re: Moving Away from Compact Storage
 To: user@cassandra.apache.org
 Date: Tuesday, 2 February, 2016, 12:57 AM
 
 Use Apache
 Spark to parallelize the data migration. Look at this piece
 of code 
https://github.com/doanduyhai/Cassandra-Spark-Demo/blob/master/src/main/scala/usecases/MigrateAlbumsData.scala#L58-L60
 If your source and target tables have the SAME structure (except for the
 COMPACT STORAGE clause), migration with Spark is two lines of code.
 On Mon, Feb 1, 2016 at 8:14
 PM, Anuj Wadehra 
 wrote:
 Hi
 What's the fastest and most reliable way to migrate data from a Compact
 Storage table to a Non-Compact Storage table?
 I was not able to find any command for dropping the COMPACT STORAGE
 directive, so I think migrating data is the only way... any suggestions?
 Thanks
 Anuj
 
 


Re: automated CREATE TABLE just nuked my cluster after a 2.0 -> 2.1 upgrade....

2016-02-02 Thread Sebastian Estevez
Hi Ken,

Earlier in this thread I posted a link to
https://issues.apache.org/jira/browse/CASSANDRA-9424

That is the fix for these schema disagreement issues and, as commented there,
the plan is to use CAS. Until then we have to treat schema changes delicately.

all the best,

Sebastián
On Feb 2, 2016 9:48 AM, "Ken Hancock"  wrote:

> So this rings odd to me.  If you can accomplish the same thing by using a
> CAS operation, why not fix CREATE TABLE IF NOT EXISTS so that, if you are
> writing an application that creates the table on startup, the application
> is safe to run on multiple nodes and uses CAS to safeguard against
> multiple concurrent creations?
>
>
> On Tue, Jan 26, 2016 at 12:32 PM, Eric Stevens  wrote:
>
>> There's still a race condition there, because two clients could SELECT at
>> the same time as each other, then both INSERT.
>>
>> You'd be better served with a CAS operation, and let Paxos guarantee
>> at-most-once execution.
>>
>> On Tue, Jan 26, 2016 at 9:06 AM Francisco Reyes 
>> wrote:
>>
>>> On 01/22/2016 10:29 PM, Kevin Burton wrote:
>>>
>>> I sort of agree.. but we are also considering migrating to hourly
>>> tables.. and what if the single script doesn't run.
>>>
>>> I like having N nodes make changes like this because in my experience
>>> that central / single box will usually fail at the wrong time :-/
>>>
>>>
>>>
>>> On Fri, Jan 22, 2016 at 6:47 PM, Jonathan Haddad 
>>> wrote:
>>>
 Instead of using ZK, why not solve your concurrency problem by removing
 it?  By that, I mean simply have 1 process that creates all your tables
 instead of creating a race condition intentionally?

 On Fri, Jan 22, 2016 at 6:16 PM Kevin Burton 
 wrote:

> Not sure if this is a bug or not or kind of a *fuzzy* area.
>
> In 2.0 this worked fine.
>
> We have a bunch of automated scripts that go through and create
> tables... one per day.
>
> At midnight UTC our entire CQL went offline... it took down our whole
> app.  ;-/
>
> The resolution was a full CQL shut down and then a drop table to
> remove the bad tables...
>
> pretty sure the issue was with schema disagreement.
>
> All our CREATE TABLE statements use IF NOT EXISTS, but I think the IF NOT
> EXISTS only checks locally?
>
> My work around is going to be to use zookeeper to create a mutex lock
> during this operation.
>
> Any other things I should avoid?
>
>
> --
> We’re hiring if you know of any awesome Java Devops or Linux
> Operations Engineers!
>
> Founder/CEO Spinn3r.com
> Location: *San Francisco, CA*
> blog:  
> http://burtonator.wordpress.com
> … or check out my Google+ profile
> 
>
>
>>>
>>>
>>> --
>>> We’re hiring if you know of any awesome Java Devops or Linux Operations
>>> Engineers!
>>>
>>> Founder/CEO Spinn3r.com
>>> Location: *San Francisco, CA*
>>> blog:  http://burtonator.wordpress.com
>>> … or check out my Google+ profile
>>> 
>>>
>>>
>>> One way to accomplish both, a single process doing the work and having
>>> multiple machines be able to do it, is to have a control table.
>>>
>>> You can have a table that lists which tables have been created and force
>>> consistency ALL. In this table you list the names of the tables created.
>>> If a table name is in there, it doesn't need to be created again.
>>>
>>
>
>
> --
> *Ken Hancock *| System Architect, Advanced Advertising
> SeaChange International
> 50 Nagog Park
> Acton, Massachusetts 01720
> ken.hanc...@schange.com | www.schange.com | NASDAQ:SEAC
> 
> Office: +1 (978) 889-3329 | [image: Google Talk:] ken.hanc...@schange.com
>  | [image: Skype:]hancockks | [image: Yahoo IM:]hancockks [image:
> LinkedIn] 
>
> [image: SeaChange International]
> 
> This e-mail and any attachments may contain information which is SeaChange
> International confidential. The information enclosed is intended only for
> the addressees herein and may not be copied or forwarded without permission
> from SeaChange International.
>
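
The control-table idea combines naturally with Eric's CAS suggestion: guard the list with a lightweight transaction, along the lines of `INSERT INTO created_tables (name) VALUES (?) IF NOT EXISTS` (a hypothetical table name). A toy Python sketch of the at-most-once semantics the LWT provides; the lock here only simulates what Paxos does server-side:

```python
import threading

class CasRegistry:
    """Simulates INSERT ... IF NOT EXISTS: exactly one caller 'applies'."""
    def __init__(self):
        self._lock = threading.Lock()   # stand-in for Paxos coordination
        self._rows = set()

    def insert_if_not_exists(self, name: str) -> bool:
        with self._lock:
            if name in self._rows:
                return False            # [applied] = False
            self._rows.add(name)
            return True                 # [applied] = True

registry = CasRegistry()
results = []

def create_daily_table():
    # Every app node races to register the day's table; only one wins,
    # and only the winner would go on to issue the schema change.
    results.append(registry.insert_if_not_exists("events_2016_02_02"))

threads = [threading.Thread(target=create_daily_table) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert results.count(True) == 1
```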


Re: automated CREATE TABLE just nuked my cluster after a 2.0 -> 2.1 upgrade....

2016-02-02 Thread Ken Hancock
So this rings odd to me.  If you can accomplish the same thing by using a
CAS operation, why not fix CREATE TABLE IF NOT EXISTS so that, if you are
writing an application that creates the table on startup, the application
is safe to run on multiple nodes and uses CAS to safeguard against
multiple concurrent creations?


On Tue, Jan 26, 2016 at 12:32 PM, Eric Stevens  wrote:

> There's still a race condition there, because two clients could SELECT at
> the same time as each other, then both INSERT.
>
> You'd be better served with a CAS operation, and let Paxos guarantee
> at-most-once execution.
>
> On Tue, Jan 26, 2016 at 9:06 AM Francisco Reyes  wrote:
>
>> On 01/22/2016 10:29 PM, Kevin Burton wrote:
>>
>> I sort of agree.. but we are also considering migrating to hourly
>> tables.. and what if the single script doesn't run.
>>
>> I like having N nodes make changes like this because in my experience
>> that central / single box will usually fail at the wrong time :-/
>>
>>
>>
>> On Fri, Jan 22, 2016 at 6:47 PM, Jonathan Haddad 
>> wrote:
>>
>>> Instead of using ZK, why not solve your concurrency problem by removing
>>> it?  By that, I mean simply have 1 process that creates all your tables
>>> instead of creating a race condition intentionally?
>>>
>>> On Fri, Jan 22, 2016 at 6:16 PM Kevin Burton  wrote:
>>>
 Not sure if this is a bug or not or kind of a *fuzzy* area.

 In 2.0 this worked fine.

 We have a bunch of automated scripts that go through and create
 tables... one per day.

 At midnight UTC our entire CQL went offline... it took down our whole app.
  ;-/

 The resolution was a full CQL shut down and then a drop table to remove
 the bad tables...

 pretty sure the issue was with schema disagreement.

 All our CREATE TABLE statements use IF NOT EXISTS, but I think the IF NOT
 EXISTS only checks locally?

 My work around is going to be to use zookeeper to create a mutex lock
 during this operation.

 Any other things I should avoid?


 --
 We’re hiring if you know of any awesome Java Devops or Linux Operations
 Engineers!

 Founder/CEO Spinn3r.com
 Location: *San Francisco, CA*
 blog:  http://burtonator.wordpress.com
 … or check out my Google+ profile
 


>>
>>
>> --
>> We’re hiring if you know of any awesome Java Devops or Linux Operations
>> Engineers!
>>
>> Founder/CEO Spinn3r.com
>> Location: *San Francisco, CA*
>> blog:  http://burtonator.wordpress.com
>> … or check out my Google+ profile
>> 
>>
>>
>> One way to accomplish both, a single process doing the work and having
>> multiple machines be able to do it, is to have a control table.
>>
>> You can have a table that lists which tables have been created and force
>> consistency ALL. In this table you list the names of the tables created.
>> If a table name is in there, it doesn't need to be created again.
>>
>


-- 
*Ken Hancock *| System Architect, Advanced Advertising
SeaChange International
50 Nagog Park
Acton, Massachusetts 01720
ken.hanc...@schange.com | www.schange.com | NASDAQ:SEAC

Office: +1 (978) 889-3329 | [image: Google Talk:]
ken.hanc...@schange.com | [image:
Skype:]hancockks | [image: Yahoo IM:]hancockks [image: LinkedIn]


[image: SeaChange International]

This e-mail and any attachments may contain information which is SeaChange
International confidential. The information enclosed is intended only for
the addressees herein and may not be copied or forwarded without permission
from SeaChange International.


How are timestamps selected for LWTs?

2016-02-02 Thread Nicholas Wilson
Hi,

In the Cassandra docs I've read, it's not described how the timestamp is 
determined for LWTs. It's not possible to specify a timestamp with "USING 
TIMESTAMP ...", and my best guess is that in the "read" phase of the LWT 
(between propose and commit) the timestamp is selected based on the timestamps 
of the cells read. However, after reading through the source code (mainly 
StorageProxy::cas) I can't find any hint of that.

I'm worried about the following problem:

Node A writes (using a LWT): UPDATE table SET val = 123, version = 2 WHERE key 
= 'foo' IF version = 1
Node B writes (using a LWT): UPDATE table SET val = 234, version = 3 WHERE key 
= 'foo' IF version = 2

If the first write is completed before the second, then both updates will be 
applied, but if Node B's clock is behind Node A's clock, then the second update 
would be effectively discarded if client-generated timestamps are used. It 
wouldn't take a big clock discrepancy, the HW clocks could in fact be perfectly 
in sync, but if the kernel ticks System.currentTimeMillis() at 15ms intervals 
it's quite possible for the two nodes to be 30ms out from each other.

So, after the update query has "succeeded", do you need to do a read to find 
out whether it was actually applied? That would be surprising, since I can't 
find mention of it anywhere in the docs. You'd actually have to do a QUORUM 
read after every LWT update, just to find out whether your client chose the 
timestamp sensibly.

The ideal thing would be if Cassandra chose the timestamp for the write, using 
the timestamp of the cells read during Paxos, to guarantee that writes are 
applied if the query condition holds, rather than leaving the potential for the 
query to succeed but do nothing if the cell already has a higher timestamp.

If I've misunderstood, please do correct me!

Thanks,
Nicholas

---
Nicholas Wilson
Software developer
RealVNC
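
The failure mode Nicholas describes is easy to see with a toy last-write-wins cell (hypothetical values; this models ordinary writes with client-supplied timestamps, which is precisely what LWTs must avoid):

```python
def reconcile(cell_a, cell_b):
    """Last-write-wins: the cell with the higher timestamp survives."""
    return cell_a if cell_a[1] >= cell_b[1] else cell_b

# Node A writes first; Node B writes second but its clock is 15 ms behind.
write_a = ("version=2", 1000)  # (value, client timestamp in ms)
write_b = ("version=3", 985)

# Even though write_b happened later in real time, it loses:
assert reconcile(write_a, write_b) == ("version=2", 1000)
```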

Re: How are timestamps selected for LWTs?

2016-02-02 Thread Nicholas Wilson
Thanks, Sylvain.


I missed it because I wasn't looking in the right place! In StorageProxy::cas, 
Commit::newProposal() unpacks the ballot's UUID into a timestamp.


I think I understand how it works now, thank you.


Regards,

Nick


From: Sylvain Lebresne 
Sent: 02 February 2016 10:24
To: user@cassandra.apache.org
Subject: Re: How are timestamps selected for LWTs?

On Tue, Feb 2, 2016 at 10:46 AM, Nicholas Wilson wrote:
Hi,

In the Cassandra docs I've read, it's not described how the timestamp is 
determined for LWTs. It's not possible to specify a timestamp with "USING 
TIMESTAMP ...", and my best guess is that in the "read" phase of the LWT 
(between propose and commit) the timestamp is selected based on the timestamps 
of the cells read. However, after reading through the source code (mainly 
StorageProxy::cas) I can't find any hint of that.

It's not exactly how it works, but it yields a somewhat equivalent result. 
Internally, LWTs use a so-called "ballot", which is a timeuuid, and the 
underlying algorithm basically guarantees that the order of commit of 
operations is the order of their ballots. The timestamp used for the cells of 
a given operation is the timestamp part of that timeuuid ballot, thus 
guaranteeing that this timestamp respects the order in which operations are 
committed.

This is why you can't provide the timestamp client side: that timestamp is 
picked server side and the value picked depends on when the operation is 
committed.



I'm worried about the following problem:

Node A writes (using a LWT): UPDATE table SET val = 123, version = 2 WHERE key 
= 'foo' IF version = 1
Node B writes (using a LWT): UPDATE table SET val = 234, version = 3 WHERE key 
= 'foo' IF version = 2

If the first write is completed before the second, then both updates will be 
applied, but if Node B's clock is behind Node A's clock, then the second update 
would be effectively discarded if client-generated timestamps are used. It 
wouldn't take a big clock discrepancy, the HW clocks could in fact be perfectly 
in sync, but if the kernel ticks System.currentTimeMillis() at 15ms intervals 
it's quite possible for the two nodes to be 30ms out from each other.

So, after the update query has "succeeded", do you need to do a read to find 
out whether it was actually applied? That would be surprising, since I can't 
find mention of it anywhere in the docs. You'd actually have to do a QUORUM 
read after every LWT update, just to find out whether your client chose the 
timestamp sensibly.

The ideal thing would be if Cassandra chose the timestamp for the write, using 
the timestamp of the cells read during Paxos, to guarantee that writes are 
applied if the query condition holds, rather than leaving the potential for the 
query to succeed but do nothing if the cell already has a higher timestamp.

If I've misunderstood, please do correct me!

Thanks,
Nicholas

---
Nicholas Wilson
Software developer
RealVNC




Re: How are timestamps selected for LWTs?

2016-02-02 Thread Sylvain Lebresne
On Tue, Feb 2, 2016 at 10:46 AM, Nicholas Wilson <
nicholas.wil...@realvnc.com> wrote:

> Hi,
>
> In the Cassandra docs I've read, it's not described how the timestamp is
> determined for LWTs. It's not possible to specify a timestamp with "USING
> TIMESTAMP ...", and my best guess is that in the "read" phase of the LWT
> (between propose and commit) the timestamp is selected based on the
> timestamps of the cells read. However, after reading through the source
> code (mainly StorageProxy::cas) I can't find any hint of that.
>

It's not exactly how it works, but it yields a somewhat equivalent result.
Internally, LWTs use a so-called "ballot", which is a timeuuid, and the
underlying algorithm basically guarantees that the order of commit of
operations is the order of their ballots. The timestamp used for the cells
of a given operation is the timestamp part of that timeuuid ballot, thus
guaranteeing that this timestamp respects the order in which operations are
committed.

This is why you can't provide the timestamp client side: that timestamp is
picked server side and the value picked depends on when the operation is
committed.
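For the curious, the "timestamp part" of such a timeuuid ballot can be recovered with plain JDK calls; the sketch below (the example UUID is made up) converts a version-1 UUID's 100ns-since-1582 timestamp into microseconds since the Unix epoch, the unit Cassandra uses for cell write timestamps:

```java
import java.util.UUID;

public class BallotTimestamp {
    // 100ns intervals between the UUID epoch (1582-10-15) and the Unix epoch.
    private static final long UUID_EPOCH_OFFSET = 0x01b21dd213814000L;

    // Convert a version-1 (time-based) UUID's timestamp to microseconds
    // since the Unix epoch.
    static long microsTimestampOf(UUID ballot) {
        if (ballot.version() != 1)
            throw new IllegalArgumentException("not a timeuuid");
        return (ballot.timestamp() - UUID_EPOCH_OFFSET) / 10;
    }

    public static void main(String[] args) {
        // A fixed example timeuuid (hypothetical ballot value from Feb 2016).
        UUID ballot = UUID.fromString("5ba32890-c9f6-11e5-b8f1-0242ac110002");
        System.out.println(microsTimestampOf(ballot));
    }
}
```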



>
> I'm worried about the following problem:
>
> Node A writes (using a LWT): UPDATE table SET val = 123, version = 2 WHERE
> key = 'foo' IF version = 1
> Node B writes (using a LWT): UPDATE table SET val = 234, version = 3 WHERE
> key = 'foo' IF version = 2
>
> If the first write is completed before the second, then both updates will
> be applied, but if Node B's clock is behind Node A's clock, then the second
> update would be effectively discarded if client-generated timestamps are
> used. It wouldn't take a big clock discrepancy, the HW clocks could in fact
> be perfectly in sync, but if the kernel ticks System.currentTimeMillis() at
> 15ms intervals it's quite possible for the two nodes to be 30ms out from
> each other.
>
> So, after the update query has "succeeded", do you need to do a read to
> find out whether it was actually applied? That would be surprising, since I
> can't find mention of it anywhere in the docs. You'd actually have to do a
> QUORUM read after every LWT update, just to find out whether your client
> chose the timestamp sensibly.
>
> The ideal thing would be if Cassandra chose the timestamp for the write,
> using the timestamp of the cells read during Paxos, to guarantee that
> writes are applied if the query condition holds, rather than leaving the
> potential for the query to succeed but do nothing if the cell already has a
> higher timestamp.
>
> If I've misunderstood, please do correct me!
>
> Thanks,
> Nicholas
>
> ---
> Nicholas Wilson
> Software developer
> RealVNC


Re: Java Driver Question

2016-02-02 Thread Sylvain Lebresne
As a side note, if your email subject is "Java Driver Question", then this
almost surely belongs to the java driver mailing list. Please try to respect
other subscribers by using the most appropriate mailing list when possible.

On Tue, Feb 2, 2016 at 5:01 PM, Richard L. Burton III 
wrote:

> Very nice - Thanks Jack. I was looking at the docs and Contact Points but
> didn't see this. I'll use DNS records to manage the main contact points and
> update the DNS when those servers change.
>
> We should catch up again soon. Last time was a few years ago at the bar
> with Jake.
>
> On Tue, Feb 2, 2016 at 10:58 AM, Jack Krupansky 
> wrote:
>
>> No need to restart. As per the doc for Node Discovery:
>> "The driver discovers the nodes that constitute a cluster by querying
>> the contact points used in building the cluster object. After this it is up
>> to the cluster's load balancing policy to keep track of node events (that
>> is add, down, remove, or up) by its implementation of the
>> Host.StateListener interface."
>>
>> See:
>>
>> http://docs.datastax.com/en/developer/java-driver/3.0/common/drivers/reference/nodeDiscovery_r.html
>>
>> That said, your client would need to be modified/reconfigured and
>> restarted if the contact points changed enough that none were accessible.
>>
>>
>> -- Jack Krupansky
>>
>> On Tue, Feb 2, 2016 at 10:47 AM, Richard L. Burton III <
>> mrbur...@gmail.com> wrote:
>>
>>> In the case of adding more nodes to the cluster, would my application
>>> have to be restarted to detect the new nodes (as opposed to a node acting
>>> like a coordinator).
>>>
>>> e.g., Having the Java code connect using 3 known contact points and when
>>> a 4th and 5th node are added, the driver will become aware of these nodes
>>> without having to be restarted?
>>>
>>> --
>>> -Richard L. Burton III
>>> @rburton
>>>
>>
>>
>
>
> --
> -Richard L. Burton III
> @rburton
>


Re: Java Driver Question

2016-02-02 Thread Richard L. Burton III
Awesome! I love that.

:)

On Tue, Feb 2, 2016 at 11:14 AM, Alex Popescu  wrote:

>
> On Tue, Feb 2, 2016 at 8:12 AM, Richard L. Burton III 
> wrote:
>
>> is this behavior only related to the Java Drivers?
>
>
> All DataStax drivers for Cassandra provide the node discovery feature and
> are aware of the cluster topology.
>
>
> --
> Bests,
>
> Alex Popescu | @al3xandru
> Sen. Product Manager @ DataStax
>
>


-- 
-Richard L. Burton III
@rburton


Re: Java Driver Question

2016-02-02 Thread Jack Krupansky
No need to restart. As per the doc for Node Discovery:
"The driver discovers the nodes that constitute a cluster by querying the
contact points used in building the cluster object. After this it is up to
the cluster's load balancing policy to keep track of node events (that is
add, down, remove, or up) by its implementation of the Host.StateListener
interface."

See:
http://docs.datastax.com/en/developer/java-driver/3.0/common/drivers/reference/nodeDiscovery_r.html

That said, your client would need to be modified/reconfigured and restarted
if the contact points changed enough that none were accessible.


-- Jack Krupansky

On Tue, Feb 2, 2016 at 10:47 AM, Richard L. Burton III 
wrote:

> In the case of adding more nodes to the cluster, would my application have
> to be restarted to detect the new nodes (as opposed to a node acting like a
> coordinator).
>
> e.g., Having the Java code connect using 3 known contact points and when a
> 4th and 5th node are added, the driver will become aware of these nodes
> without having to be restarted?
>
> --
> -Richard L. Burton III
> @rburton
>


Re: Java Driver Question

2016-02-02 Thread Richard L. Burton III
I wasn't aware there was another mailing list specifically for this.
Another question, is this behavior only related to the Java Drivers?

On Tue, Feb 2, 2016 at 11:05 AM, Sylvain Lebresne 
wrote:

> As a side note, if your email subject is "Java Driver Question", then this
> almost surely belongs to the java driver mailing list. Please try to respect
> other subscribers by using the most appropriate mailing list when possible.
>
> On Tue, Feb 2, 2016 at 5:01 PM, Richard L. Burton III 
> wrote:
>
>> Very nice - Thanks Jack. I was looking at the docs and Contact Points but
>> didn't see this. I'll use DNS records to manage the main contact points and
>> update the DNS when those servers change.
>>
>> We should catch up again soon. Last time was a few years ago at the bar
>> with Jake.
>>
>> On Tue, Feb 2, 2016 at 10:58 AM, Jack Krupansky > > wrote:
>>
>>> No need to restart. As per the doc for Node Discovery:
>>> "The driver discovers the nodes that constitute a cluster by querying
>>> the contact points used in building the cluster object. After this it is up
>>> to the cluster's load balancing policy to keep track of node events (that
>>> is add, down, remove, or up) by its implementation of the
>>> Host.StateListener interface."
>>>
>>> See:
>>>
>>> http://docs.datastax.com/en/developer/java-driver/3.0/common/drivers/reference/nodeDiscovery_r.html
>>>
>>> That said, your client would need to be modified/reconfigured and
>>> restarted if the contact points changed enough that none were accessible.
>>>
>>>
>>> -- Jack Krupansky
>>>
>>> On Tue, Feb 2, 2016 at 10:47 AM, Richard L. Burton III <
>>> mrbur...@gmail.com> wrote:
>>>
 In the case of adding more nodes to the cluster, would my application
 have to be restarted to detect the new nodes (as opposed to a node acting
 like a coordinator).

 e.g., Having the Java code connect using 3 known contact points and
 when a 4th and 5th node are added, the driver will become aware of these
 nodes without having to be restarted?

 --
 -Richard L. Burton III
 @rburton

>>>
>>>
>>
>>
>> --
>> -Richard L. Burton III
>> @rburton
>>
>
>


-- 
-Richard L. Burton III
@rburton


Re: automated CREATE TABLE just nuked my cluster after a 2.0 -> 2.1 upgrade....

2016-02-02 Thread Ken Hancock
Just to close the loop on this, but am I correct that the IF NOT EXISTS
isn't the real problem?  Even multiple calls to CREATE TABLE cause the same
schema mismatch if done concurrently?  Normally, a CREATE TABLE call will
return an exception that the table already exists.
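(For what it's worth, the at-most-once guard suggested further down the thread — a conditional insert into a control table — can be sketched like this. The table name is illustrative, and a ConcurrentHashMap stands in for the Paxos-backed INSERT ... IF NOT EXISTS a real version would issue through the driver:)

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class CreateTableGuard {
    // Stand-in for a Cassandra control table written with INSERT ... IF NOT
    // EXISTS: the key set's atomic add() plays the role of Paxos here.
    private static final Set<String> created = ConcurrentHashMap.newKeySet();

    // Returns true for exactly one caller per table name -- that caller
    // (and only that caller) goes on to run the real CREATE TABLE.
    static boolean tryClaim(String tableName) {
        return created.add(tableName);
    }

    public static void main(String[] args) throws InterruptedException {
        final int[] winners = {0};
        Thread[] nodes = new Thread[8];
        for (int i = 0; i < nodes.length; i++) {
            nodes[i] = new Thread(() -> {
                if (tryClaim("events_2016_02_03")) { // hypothetical daily table
                    synchronized (winners) { winners[0]++; }
                }
            });
            nodes[i].start();
        }
        for (Thread t : nodes) t.join();
        System.out.println(winners[0]); // only one "node" creates the table
    }
}
```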

On Tue, Feb 2, 2016 at 11:06 AM, Jack Krupansky 
wrote:

> And CASSANDRA-10699  seems to be the sub-issue of CASSANDRA-9424 to do
> that:
> https://issues.apache.org/jira/browse/CASSANDRA-10699
>
>
> -- Jack Krupansky
>
> On Tue, Feb 2, 2016 at 9:59 AM, Sebastian Estevez <
> sebastian.este...@datastax.com> wrote:
>
>> Hi Ken,
>>
>> Earlier in this thread I posted a link to
>> https://issues.apache.org/jira/browse/CASSANDRA-9424
>>
>> That is the fix for these schema disagreement issues and as ay commented,
>> the plan is to use CAS. Until then we have to treat schema delicately.
>>
>> all the best,
>>
>> Sebastián
>> On Feb 2, 2016 9:48 AM, "Ken Hancock"  wrote:
>>
>>> So this rings odd to me.  If you can accomplish the same thing by using
>>> a CAS operation, why not fix create table if not exists so that if you are
>>> writing an application that creates the table on startup, that the
>>> application is safe to run on multiple nodes and uses CAS to safeguard
>>> multiple concurrent creations?
>>>
>>>
>>> On Tue, Jan 26, 2016 at 12:32 PM, Eric Stevens 
>>> wrote:
>>>
 There's still a race condition there, because two clients could SELECT
 at the same time as each other, then both INSERT.

 You'd be better served with a CAS operation, and let Paxos guarantee
 at-most-once execution.

 On Tue, Jan 26, 2016 at 9:06 AM Francisco Reyes 
 wrote:

> On 01/22/2016 10:29 PM, Kevin Burton wrote:
>
> I sort of agree.. but we are also considering migrating to hourly
> tables.. and what if the single script doesn't run.
>
> I like having N nodes make changes like this because in my experience
> that central / single box will usually fail at the wrong time :-/
>
>
>
> On Fri, Jan 22, 2016 at 6:47 PM, Jonathan Haddad 
> wrote:
>
>> Instead of using ZK, why not solve your concurrency problem by
>> removing it?  By that, I mean simply have 1 process that creates all your
>> tables instead of creating a race condition intentionally?
>>
>> On Fri, Jan 22, 2016 at 6:16 PM Kevin Burton 
>> wrote:
>>
>>> Not sure if this is a bug or not or kind of a *fuzzy* area.
>>>
>>> In 2.0 this worked fine.
>>>
>>> We have a bunch of automated scripts that go through and create
>>> tables... one per day.
>>>
>>> at midnight UTC our entire CQL went offline.. .took down our whole
>>> app.  ;-/
>>>
>>> The resolution was a full CQL shut down and then a drop table to
>>> remove the bad tables...
>>>
>>> pretty sure the issue was with schema disagreement.
>>>
>>> All our CREATE TABLE use IF NOT EXISTS but I think the IF NOT
>>> EXISTS only checks locally?
>>>
>>> My work around is going to be to use zookeeper to create a mutex
>>> lock during this operation.
>>>
>>> Any other things I should avoid?
>>>
>>>
>>> --
>>> We’re hiring if you know of any awesome Java Devops or Linux
>>> Operations Engineers!
>>>
>>> Founder/CEO Spinn3r.com
>>> Location: *San Francisco, CA*
>>> blog:  
>>> http://burtonator.wordpress.com
>>> … or check out my Google+ profile
>>> 
>>>
>>>
>
>
> --
> We’re hiring if you know of any awesome Java Devops or Linux
> Operations Engineers!
>
> Founder/CEO Spinn3r.com
> Location: *San Francisco, CA*
> blog:  
> http://burtonator.wordpress.com
> … or check out my Google+ profile
> 
>
>
> One way to accomplish both, a single process doing the work and having
> multiple machines be able to do it, is to have a control table.
>
> You can have a table that lists what tables have been created and
> force consistency ALL. In this table you list the names of tables created.
> If a table name is in there, it doesn't need to be created again.
>

>>>
>>>
>>> --
>>> *Ken Hancock *| System Architect, Advanced Advertising
>>> SeaChange International
>>> 50 Nagog Park
>>> Acton, Massachusetts 01720
>>> ken.hanc...@schange.com | www.schange.com | NASDAQ:SEAC
>>> 
>>> Office: +1 (978) 889-3329 | [image: Google Talk:]
>>> ken.hanc...@schange.com | [image: Skype:]hancockks | [image: Yahoo IM:]
>>> hancockks [image: LinkedIn] 
>>>
>>> 

Re: Java Driver Question

2016-02-02 Thread Alex Popescu
On Tue, Feb 2, 2016 at 8:12 AM, Richard L. Burton III 
wrote:

> is this behavior only related to the Java Drivers?


All DataStax drivers for Cassandra provide the node discovery feature and
are aware of the cluster topology.


-- 
Bests,

Alex Popescu | @al3xandru
Sen. Product Manager @ DataStax


Re: Cassandra's log is full of mesages reset by peers even without traffic

2016-02-02 Thread Anuj Wadehra
Hi Jean,
As mentioned in the DataStax link, your TCP connections will be marked dead 
after 300 + 75*9 = 975 seconds. Make sure that your firewall idle timeout is more 
than 975 seconds. Otherwise the firewall will drop connections and you may face 
issues. You can also try setting all three values the same as mentioned in the link 
to see whether the problem gets resolved after doing that.
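The arithmetic here is simply keepalive time plus interval times probes; a trivial sketch using the sysctl values reported in this thread:

```java
public class KeepaliveDeadline {
    // Seconds of silence before the kernel declares a TCP connection dead:
    // tcp_keepalive_time + tcp_keepalive_intvl * tcp_keepalive_probes.
    static int deadAfterSeconds(int time, int intvl, int probes) {
        return time + intvl * probes;
    }

    public static void main(String[] args) {
        // Values from the sysctl output in this thread: 300, 75, 9.
        System.out.println(deadAfterSeconds(300, 75, 9)); // 975
    }
}
```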
Thanks
Anuj

Sent from Yahoo Mail on Android 
 
  On Mon, 1 Feb, 2016 at 9:18 pm, Jean Carlo wrote:  
 Hello Anuj,

I checked my settings and this is what I got.

root@node001[SPH][BENCH][PnS3]:~$ sysctl -A | grep net.ipv4 | grep 
net.ipv4.tcp_keepalive_probes
net.ipv4.tcp_keepalive_probes = 9
root@node001[SPH][BENCH][PnS3]:~$ sysctl -A | grep net.ipv4 | grep 
net.ipv4.tcp_keepalive_intvl 
net.ipv4.tcp_keepalive_intvl = 75
root@node001[SPH][BENCH][PnS3]:~$ sysctl -A | grep net.ipv4 | grep 
net.ipv4.tcp_keepalive_time 
net.ipv4.tcp_keepalive_time = 300

The tcp_keepalive_time is quite high in comparison to what is written in the doc

https://docs.datastax.com/en/cassandra/2.1/cassandra/troubleshooting/trblshootIdleFirewall.html




Do you think that is ok?  

Best regards

Jean Carlo
"The best way to predict the future is to invent it" Alan Kay

On Fri, Jan 29, 2016 at 11:02 AM, Anuj Wadehra  wrote:

Hi Jean,
Please make sure that your firewall is not dropping TCP connections which are 
in use. The TCP keepalive time on all nodes must be less than the firewall setting. 
Please refer to 
https://docs.datastax.com/en/cassandra/2.0/cassandra/troubleshooting/trblshootIdleFirewall.html
 for details on TCP settings.

Thanks
Anuj

Sent from Yahoo Mail on Android 
 
 On Fri, 29 Jan, 2016 at 3:21 pm, Jean Carlo wrote:  
 Hello guys, 

I have a cluster cassandra 2.1.12 with 6 nodes. All the logs of my nodes are 
having these messages marked as INFO

INFO  [SharedPool-Worker-1] 2016-01-29 10:40:57,745 Message.java:532 - 
Unexpected exception during request; channel = [id: 0xff15eb8c, 
/172.16.162.4:9042]
java.io.IOException: Error while read(...): Connection reset by peer
    at io.netty.channel.epoll.Native.readAddress(Native Method) 
~[netty-all-4.0.23.Final.jar:4.0.23.Final]
    at 
io.netty.channel.epoll.EpollSocketChannel$EpollSocketUnsafe.doReadBytes(EpollSocketChannel.java:675)
 ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
    at 
io.netty.channel.epoll.EpollSocketChannel$EpollSocketUnsafe.epollInReady(EpollSocketChannel.java:714)
 ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
    at 
io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:326) 
~[netty-all-4.0.23.Final.jar:4.0.23.Final]
    at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:264) 
~[netty-all-4.0.23.Final.jar:4.0.23.Final]
    at 
io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
 ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
    at 
io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
 ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
    at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60]

This happens whether the cluster is stressed or not. Btw, it is not production. 
The IP marked there (172.16.162.4) belongs to a node of the cluster. This is 
not the only node that appears; actually, all the nodes' IPs are showing that 
reset-by-peer problem.

Our cluster is having more reads than writes, like 50 reads per second. 

Anyone got the same problem?


Best regards

Jean Carlo
"The best way to predict the future is to invent it" Alan Kay
  


  


Cqlsh hangs & closes automatically

2016-02-02 Thread Anuj Wadehra
My cqlsh prompt hangs and closes if I try to fetch just 100 rows using select * 
query. Cassandra-cli does the job. Any solution?



Thanks
Anuj

Re: Java Driver Question

2016-02-02 Thread Richard L. Burton III
Very nice - Thanks Jack. I was looking at the docs and Contact Points but
didn't see this. I'll use DNS records to manage the main contact points and
update the DNS when those servers change.

We should catch up again soon. Last time was a few years ago at the bar
with Jake.

On Tue, Feb 2, 2016 at 10:58 AM, Jack Krupansky 
wrote:

> No need to restart. As per the doc for Node Discovery:
> "The driver discovers the nodes that constitute a cluster by querying the
> contact points used in building the cluster object. After this it is up to
> the cluster's load balancing policy to keep track of node events (that is
> add, down, remove, or up) by its implementation of the Host.StateListener
> interface."
>
> See:
>
> http://docs.datastax.com/en/developer/java-driver/3.0/common/drivers/reference/nodeDiscovery_r.html
>
> That said, your client would need to be modified/reconfigured and
> restarted if the contact points changed enough that none were accessible.
>
>
> -- Jack Krupansky
>
> On Tue, Feb 2, 2016 at 10:47 AM, Richard L. Burton III  > wrote:
>
>> In the case of adding more nodes to the cluster, would my application
>> have to be restarted to detect the new nodes (as opposed to a node acting
>> like a coordinator).
>>
>> e.g., Having the Java code connect using 3 known contact points and when
>> a 4th and 5th node are added, the driver will become aware of these nodes
>> without having to be restarted?
>>
>> --
>> -Richard L. Burton III
>> @rburton
>>
>
>


-- 
-Richard L. Burton III
@rburton


Re: automated CREATE TABLE just nuked my cluster after a 2.0 -> 2.1 upgrade....

2016-02-02 Thread Jack Krupansky
And CASSANDRA-10699  seems to be the sub-issue of CASSANDRA-9424 to do that:
https://issues.apache.org/jira/browse/CASSANDRA-10699


-- Jack Krupansky

On Tue, Feb 2, 2016 at 9:59 AM, Sebastian Estevez <
sebastian.este...@datastax.com> wrote:

> Hi Ken,
>
> Earlier in this thread I posted a link to
> https://issues.apache.org/jira/browse/CASSANDRA-9424
>
> That is the fix for these schema disagreement issues and as ay commented,
> the plan is to use CAS. Until then we have to treat schema delicately.
>
> all the best,
>
> Sebastián
> On Feb 2, 2016 9:48 AM, "Ken Hancock"  wrote:
>
>> So this rings odd to me.  If you can accomplish the same thing by using a
>> CAS operation, why not fix create table if not exists so that if you are
>> writing an application that creates the table on startup, that the
>> application is safe to run on multiple nodes and uses CAS to safeguard
>> multiple concurrent creations?
>>
>>
>> On Tue, Jan 26, 2016 at 12:32 PM, Eric Stevens  wrote:
>>
>>> There's still a race condition there, because two clients could SELECT
>>> at the same time as each other, then both INSERT.
>>>
>>> You'd be better served with a CAS operation, and let Paxos guarantee
>>> at-most-once execution.
>>>
>>> On Tue, Jan 26, 2016 at 9:06 AM Francisco Reyes 
>>> wrote:
>>>
 On 01/22/2016 10:29 PM, Kevin Burton wrote:

 I sort of agree.. but we are also considering migrating to hourly
 tables.. and what if the single script doesn't run.

 I like having N nodes make changes like this because in my experience
 that central / single box will usually fail at the wrong time :-/



 On Fri, Jan 22, 2016 at 6:47 PM, Jonathan Haddad 
 wrote:

> Instead of using ZK, why not solve your concurrency problem by
> removing it?  By that, I mean simply have 1 process that creates all your
> tables instead of creating a race condition intentionally?
>
> On Fri, Jan 22, 2016 at 6:16 PM Kevin Burton 
> wrote:
>
>> Not sure if this is a bug or not or kind of a *fuzzy* area.
>>
>> In 2.0 this worked fine.
>>
>> We have a bunch of automated scripts that go through and create
>> tables... one per day.
>>
>> at midnight UTC our entire CQL went offline.. .took down our whole
>> app.  ;-/
>>
>> The resolution was a full CQL shut down and then a drop table to
>> remove the bad tables...
>>
>> pretty sure the issue was with schema disagreement.
>>
>> All our CREATE TABLE use IF NOT EXISTS but I think the IF NOT
>> EXISTS only checks locally?
>>
>> My work around is going to be to use zookeeper to create a mutex lock
>> during this operation.
>>
>> Any other things I should avoid?
>>
>>
>> --
>> We’re hiring if you know of any awesome Java Devops or Linux
>> Operations Engineers!
>>
>> Founder/CEO Spinn3r.com
>> Location: *San Francisco, CA*
>> blog:  
>> http://burtonator.wordpress.com
>> … or check out my Google+ profile
>> 
>>
>>


 --
 We’re hiring if you know of any awesome Java Devops or Linux Operations
 Engineers!

 Founder/CEO Spinn3r.com
 Location: *San Francisco, CA*
 blog:  http://burtonator.wordpress.com
 … or check out my Google+ profile
 


 One way to accomplish both, a single process doing the work and having
 multiple machines be able to do it, is to have a control table.

 You can have a table that lists what tables have been created and force
 consistency ALL. In this table you list the names of tables created. If a
 table name is in there, it doesn't need to be created again.

>>>
>>
>>
>> --
>> *Ken Hancock *| System Architect, Advanced Advertising
>> SeaChange International
>> 50 Nagog Park
>> Acton, Massachusetts 01720
>> ken.hanc...@schange.com | www.schange.com | NASDAQ:SEAC
>> 
>> Office: +1 (978) 889-3329 | [image: Google Talk:] ken.hanc...@schange.com
>>  | [image: Skype:]hancockks | [image: Yahoo IM:]hancockks [image:
>> LinkedIn] 
>>
>> [image: SeaChange International]
>> 
>> This e-mail and any attachments may contain information which is
>> SeaChange International confidential. The information enclosed is intended
>> only for the addressees herein and may not be copied or forwarded without
>> permission from SeaChange International.
>>
>


Re: Java Driver Question

2016-02-02 Thread Sebastian Estevez
Yes, topology changes get pushed to the client via the control connection:

https://github.com/datastax/java-driver/blob/2.1/driver-core/src/main/java/com/datastax/driver/core/Cluster.java#L61

all the best,

Sebastián
On Feb 2, 2016 10:47 AM, "Richard L. Burton III"  wrote:

> In the case of adding more nodes to the cluster, would my application have
> to be restarted to detect the new nodes (as opposed to a node acting like a
> coordinator).
>
> e.g., Having the Java code connect using 3 known contact points and when a
> 4th and 5th node are added, the driver will become aware of these nodes
> without having to be restarted?
>
> --
> -Richard L. Burton III
> @rburton
>


Java Driver Question

2016-02-02 Thread Richard L. Burton III
In the case of adding more nodes to the cluster, would my application have
to be restarted to detect the new nodes (as opposed to a node acting like a
coordinator).

e.g., Having the Java code connect using 3 known contact points and when a
4th and 5th node are added, the driver will become aware of these nodes
without having to be restarted?

-- 
-Richard L. Burton III
@rburton


Missing rows while scanning table using java driver

2016-02-02 Thread Priyanka Gugale
Hi,

I am using Cassandra 2.2.0 and cassandra driver 2.1.8. I am trying to scan
a table as per the suggestions given here. On running the code to fetch
records from the table, it fetches a different number of records on each run.
Sometimes it reads all records from the table, and sometimes some records are
missing. As I have observed, there is no fixed pattern for missing records.

I have tried to set the consistency level to ALL while running the select
query, but still I couldn't fetch all records. Is there any known issue? Or am
I supposed to do anything more than run a simple "select" statement?

Code snippet to fetch data:

 SimpleStatement stmt = new SimpleStatement(query);
 stmt.setConsistencyLevel(ConsistencyLevel.ALL);
 ResultSet result = session.execute(stmt);
 if (!result.isExhausted()) {
   for (Row row : result) {
 process(row);
   }
 }

Query is of the form: select * from %t where token(%p) > %s limit %l;

where t=tablename, %p=primary key, %s=token value of primary key and l=limit

I am testing on my local machine and has created a Keyspace with
replication factor of 1. Also I don't see any errors in the logs.

-Priyanka
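The token-range scan pattern referenced above is easy to get subtly wrong. Here is a self-contained sketch (with a fake, in-memory "table" instead of a real session and query) of the loop that resumes each page from the token of the last row seen, which covers all rows regardless of the page size:

```java
import java.util.ArrayList;
import java.util.List;

public class TokenScan {
    // Fake (token, key) rows, sorted by token as a partitioner would order
    // them. Stands in for:
    //   SELECT token(pk), pk FROM t WHERE token(pk) > ? LIMIT ?
    static final long[][] ROWS = {
        {-5000, 1}, {-10, 2}, {0, 3}, {42, 4}, {9000, 5}
    };

    static List<long[]> fetchPage(long afterToken, int limit) {
        List<long[]> page = new ArrayList<>();
        for (long[] row : ROWS)
            if (row[0] > afterToken && page.size() < limit)
                page.add(row);
        return page;
    }

    // Scan the whole ring: each page resumes after the last token seen.
    static List<long[]> scanAll(int pageSize) {
        List<long[]> seen = new ArrayList<>();
        long last = Long.MIN_VALUE; // Murmur3's minimum token
        while (true) {
            List<long[]> page = fetchPage(last, pageSize);
            if (page.isEmpty()) break;
            seen.addAll(page);
            last = page.get(page.size() - 1)[0]; // next page starts here
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(scanAll(2).size()); // every row, with pages of 2
    }
}
```

If the start token is instead advanced by a fixed amount, or the limit is applied without resuming from the last token, rows are skipped non-deterministically, which would match the symptom described.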


Re: Missing rows while scanning table using java driver

2016-02-02 Thread DuyHai Doan
Why don't you use server-side paging feature instead of messing with tokens
?

http://datastax.github.io/java-driver/manual/paging/

On Wed, Feb 3, 2016 at 7:36 AM, Priyanka Gugale  wrote:

> Hi,
>
> I am using Cassandra 2.2.0 and cassandra driver 2.1.8. I am trying to scan
> a table as per suggestions given here
> ,
>  On running the code to fetch records from table, it fetches different
> number of records on each run. Some times it reads all records from table,
>  and some times some records are missing. As I have observed there is no
> fixed pattern for missing records.
>
> I have tried to set consistency level to ALL while running select query
> still I couldn't fetch all records. Is there any known issue? Or am I
> suppose to do anything more than running simple "select" statement.
>
> Code snippet to fetch data:
>
>  SimpleStatement stmt = new SimpleStatement(query);
>  stmt.setConsistencyLevel(ConsistencyLevel.ALL);
>  ResultSet result = session.execute(stmt);
>  if (!result.isExhausted()) {
>for (Row row : result) {
>  process(row);
>}
>  }
>
> Query is of the form: select * from %t where token(%p) > %s limit %l;
>
> where t=tablename, %p=primary key, %s=token value of primary key and
> l=limit
>
> I am testing on my local machine and has created a Keyspace with
> replication factor of 1. Also I don't see any errors in the logs.
>
> -Priyanka
>


Re: Missing rows while scanning table using java driver

2016-02-02 Thread priyanka gugale
Thanks DuyHai for revert.

There are going to be pauses between fetching pages; for that, they do seem
to have an option to save the paging state. I will try that out.

-Priyanka

On Wed, Feb 3, 2016 at 12:14 PM, DuyHai Doan  wrote:

> Why don't you use server-side paging feature instead of messing with
> tokens ?
>
> http://datastax.github.io/java-driver/manual/paging/
>
> On Wed, Feb 3, 2016 at 7:36 AM, Priyanka Gugale  wrote:
>
>> Hi,
>>
>> I am using Cassandra 2.2.0 and cassandra driver 2.1.8. I am trying to
>> scan a table as per suggestions given here
>> ,
>>  On running the code to fetch records from table, it fetches different
>> number of records on each run. Some times it reads all records from table,
>>  and some times some records are missing. As I have observed there is no
>> fixed pattern for missing records.
>>
>> I have tried to set consistency level to ALL while running select query
>> still I couldn't fetch all records. Is there any known issue? Or am I
>> suppose to do anything more than running simple "select" statement.
>>
>> Code snippet to fetch data:
>>
>>  SimpleStatement stmt = new SimpleStatement(query);
>>  stmt.setConsistencyLevel(ConsistencyLevel.ALL);
>>  ResultSet result = session.execute(stmt);
>>  if (!result.isExhausted()) {
>>for (Row row : result) {
>>  process(row);
>>}
>>  }
>>
>> Query is of the form: select * from %t where token(%p) > %s limit %l;
>>
>> where t=tablename, %p=primary key, %s=token value of primary key and
>> l=limit
>>
>> I am testing on my local machine and has created a Keyspace with
>> replication factor of 1. Also I don't see any errors in the logs.
>>
>> -Priyanka
>>
>
>


Re: Moving Away from Compact Storage

2016-02-02 Thread Jack Krupansky
Dynamic columns are handled in CQL using either map collections or
clustering columns or a combination of the two.

-- Jack Krupansky

On Tue, Feb 2, 2016 at 10:11 PM, Anuj Wadehra 
wrote:

> By dynamic columns, I mean columns not defined in schema. In current
> scenario, every row has some data in columns which are defined in schema
> while rest of the data is in columns which are not defined in schema. We
> used Thrift for inserting data.
>
> In new schema, we want to create a collection column and put all the data
> which was there in columns NOT defined in schema to the collection.
>
>
> Thanks
> Anuj
>
> Sent from Yahoo Mail on Android
> 
>
> On Wed, 3 Feb, 2016 at 12:36 am, DuyHai Doan
>  wrote:
> You'll need to do the transformation in Spark, although I don't
> understand what you mean by "dynamic columns". Given the CREATE TABLE
> script you gave earlier, there is no such thing as dynamic columns
>
> On Tue, Feb 2, 2016 at 8:01 PM, Anuj Wadehra 
> wrote:
>
>> Will it be possible to read dynamic columns data from compact storage and
>> transform them into a collection, e.g. a map, in the new table?
>>
>>
>> Thanks
>> Anuj
>>
>> Sent from Yahoo Mail on Android
>> 
>>
>> On Wed, 3 Feb, 2016 at 12:28 am, DuyHai Doan
>>  wrote:
>> So there is no "static" (in the sense of CQL static) column in your
>> legacy table.
>>
>> Just define a Scala case class to match this table and use Spark to dump
>> the content to a new non compact CQL table
>>
>> On Tue, Feb 2, 2016 at 7:55 PM, Anuj Wadehra 
>> wrote:
>>
>>> Our old table looks like this from cqlsh:
>>>
>>> CREATE TABLE table1 (
>>>   key text,
>>>   "Col1" blob,
>>>   "Col2" text,
>>>   "Col3" text,
>>>   "Col4" text,
>>>   PRIMARY KEY (key)
>>> ) WITH COMPACT STORAGE AND …
>>>
>>> And it will have some dynamic text data which we are planning to add in
>>> collections..
>>>
>>> Please let me know if you need more details..
>>>
>>>
>>> Thanks
>>> Anuj
>>> Sent from Yahoo Mail on Android
>>> 
>>>
>>> On Wed, 3 Feb, 2016 at 12:14 am, DuyHai Doan
>>>  wrote:
>>> Can you give the CREATE TABLE script for your old compact storage table?
>>> Or at least the cassandra-client creation script
>>>
>>> On Tue, Feb 2, 2016 at 3:48 PM, Anuj Wadehra 
>>> wrote:
>>>
 Thanks DuyHai !! We were also thinking to do it the "Spark" way but I
 was not sure that it would be so simple :)

 We have a compact storage cf with each row having some data in statically
 defined columns while other data in dynamic columns. Is the approach
 mentioned in the link adaptable to the scenario where we want to migrate the
 existing data to a Non-Compact CF with static columns and collections?

 Thanks
 Anuj

 
 On Tue, 2/2/16, DuyHai Doan  wrote:

  Subject: Re: Moving Away from Compact Storage
  To: user@cassandra.apache.org
  Date: Tuesday, 2 February, 2016, 12:57 AM

  Use Apache Spark to parallelize the data migration. Look at this piece of
  code:
 https://github.com/doanduyhai/Cassandra-Spark-Demo/blob/master/src/main/scala/usecases/MigrateAlbumsData.scala#L58-L60
  If your source and target tables have the SAME structure (except for the
  COMPACT STORAGE clause), migration with Spark is 2 lines of code.

  On Mon, Feb 1, 2016 at 8:14 PM, Anuj Wadehra wrote:

  Hi
  What's the fastest and most reliable way to migrate data from a Compact
  Storage table to a Non-Compact storage table?
  I was not able to find any command for dropping the compact storage
  directive.. so I think migrating data is the only way... any suggestions?
  Thanks
  Anuj



>>>
>>
>


Re: Moving Away from Compact Storage

2016-02-02 Thread Anuj Wadehra
By dynamic columns, I mean columns not defined in schema. In current scenario, 
every row has some data in columns which are defined in schema while rest of 
the data is in columns which are not defined in schema. We used Thrift for 
inserting data.
In new schema, we want to create a collection column and put all the data which 
was there in columns NOT defined in schema to the collection. 

Thanks
Anuj

Sent from Yahoo Mail on Android 
 
On Wed, 3 Feb, 2016 at 12:36 am, DuyHai Doan wrote:

You'll need to do the transformation in Spark, although I don't understand
what you mean by "dynamic columns". Given the CREATE TABLE script you gave
earlier, there is no such thing as dynamic columns.
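As an illustration of the transformation being discussed, here is a sketch under assumptions, not anyone's actual migration code: the class and method names are hypothetical, and the Thrift-era dynamic cells are modelled as plain (rowKey, columnName, value) triples to be grouped into one map per row.

```java
import java.util.*;

// Hypothetical reshaping step for the migration discussed above:
// "dynamic" cells are modelled as (rowKey, columnName, value) triples
// and grouped into one Map per row key, ready to be bound to a
// CQL map<text, text> column in the new non-compact table.
class DynamicColumnReshape {

    // Group cells by row key; later cells win on duplicate column names,
    // mirroring last-write-wins semantics.
    static Map<String, Map<String, String>> toMaps(List<String[]> cells) {
        Map<String, Map<String, String>> rows = new LinkedHashMap<>();
        for (String[] cell : cells) {
            rows.computeIfAbsent(cell[0], k -> new LinkedHashMap<>())
                .put(cell[1], cell[2]);
        }
        return rows;
    }
}
```

In a real migration this grouping would run inside the Spark job, with each resulting map bound to an INSERT into the new table.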


Re: Missing rows while scanning table using java driver

2016-02-02 Thread Ryan Svihla
Priyanka,

This is a better question for the Cassandra user mailing list (cc’d above),
which is where many experts in the use of Cassandra are subscribed, whereas
this list is more about improving or changing Cassandra itself.

As to your issue, there can be many combined issues at once leading to this
situation. Can I suggest you respond on the user list with the following:

- Keyspace (RF especially), data center and table configuration.
- Any errors in the logs on the Cassandra nodes.

Regards,

Ryan Svihla

> On Feb 2, 2016, at 4:58 AM, Priyanka Gugale  wrote:
> 
> I am using query of the form: select * from %t where token(%p) > %s limit
> %l;
> 
> where %t = table name, %p = primary key, %s = token value of the primary
> key, and %l = limit
> 
> -Priyanka
> 
> On Mon, Feb 1, 2016 at 6:19 PM, Priyanka Gugale  wrote:
> 
>> Hi,
>> 
>> I am using Cassandra 2.2.0 and Cassandra driver 2.1.8. I am trying to
>> scan a table as per the suggestions given here. On running the code to
>> fetch records from the table, it fetches a different number of records
>> on each run. Sometimes it reads all records from the table, and sometimes
>> some records are missing. As far as I have observed, there is no fixed
>> pattern to the missing records.
>> 
>> I have tried setting the consistency level to ALL while running the select
>> query, but still I couldn't fetch all records. Is there any known issue? Or
>> am I supposed to do anything more than run a simple "select" statement?
>> 
>> Code snippet to fetch data:
>> 
>> SimpleStatement stmt = new SimpleStatement(query);
>> stmt.setConsistencyLevel(ConsistencyLevel.ALL);
>> ResultSet result = session.execute(stmt);
>> if (!result.isExhausted()) {
>>   for (Row row : result) {
>>     process(row);
>>   }
>> }
>> 
>> -Priyanka
>>
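For what it's worth, the manual paging pattern in the query above (`token(%p) > %s limit %l`) can be simulated in memory to sanity-check the loop logic. This is a sketch under assumptions (the longs stand in for `token(pk)` values, assumed distinct per key and pre-sorted), not the actual driver code. Note that a plain LIMIT query run once simply stops after the first page, and restarting from "strictly greater than the last token" skips any second key that happens to share the boundary token; either can look like missing rows.

```java
import java.util.*;

// In-memory simulation of the manual paging pattern
//   SELECT ... WHERE token(pk) > lastToken LIMIT pageSize
// The TreeMap stands in for a table sorted by token; tokens are assumed
// distinct here. Each iteration re-queries "strictly greater than the
// last token seen", exactly like the query in the thread above.
class TokenPager {
    static List<String> scanAll(TreeMap<Long, String> byToken, int pageSize) {
        List<String> seen = new ArrayList<>();
        long last = Long.MIN_VALUE;  // Murmur3's minimum token is reserved
        while (true) {
            NavigableMap<Long, String> tail = byToken.tailMap(last, false);
            if (tail.isEmpty()) {
                break;  // no tokens left beyond the last page
            }
            int taken = 0;
            for (Map.Entry<Long, String> e : tail.entrySet()) {
                if (taken++ == pageSize) {
                    break;  // page full; re-query from the new boundary
                }
                seen.add(e.getValue());
                last = e.getKey();
            }
        }
        return seen;
    }
}
```

If all tokens are distinct, the loop visits every row exactly once; with the real driver, letting it page automatically (Statement.setFetchSize) avoids hand-rolling this loop at all.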