Re: Advice for asymmetric reporting cluster architecture

2015-10-18 Thread Ryan Svihla
Don't forget SSDs for indexing joy and a reasonable amount of CPU, or those
indexes will fall far behind.
If you size the hardware correctly and avoid very silly configuration, it works
really well for this sort of purpose, especially when combined with Spark to do
any hardcore analysis on the filtered dataset.

- Ryan Svihla




On Sat, Oct 17, 2015 at 7:12 PM -0700, "Jack Krupansky"  wrote:

Yes, you can have all your normal data centers with DSE configured for 
real-time data access and then have a data center that shares the same data but 
has DSE Search (Solr indexing) enabled. Your Cassandra data will get replicated 
to the Search data center and then indexed there and only there. You do need to 
have more RAM on the DSE Search nodes for the indexing, and maybe more nodes as 
well to ensure decent latency for complex queries.
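
For a concrete feel of the moving parts, a minimal sketch using the DataStax Java
driver is below. The keyspace (reports), data center names (DC1, DC2, Search),
table and column names, and the contact point are all placeholder assumptions,
and the solr_query predicate assumes a Solr core has already been created for the
table in the Search DC:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

public class SearchDcSketch {
    public static void main(String[] args) {
        // All names here (contact point, keyspace, DC names, table, columns) are placeholders.
        try (Cluster cluster = Cluster.builder().addContactPoint("10.0.0.1").build();
             Session session = cluster.connect()) {

            // One-time schema change: replicate the keyspace into the Search DC
            // alongside the OLTP DCs, so Solr indexes are built only on the
            // nodes where DSE Search runs.
            session.execute("ALTER KEYSPACE reports WITH replication = "
                    + "{'class': 'NetworkTopologyStrategy', 'DC1': 3, 'DC2': 3, 'Search': 2}");

            // An ad-hoc filter pushed down to the Solr index via DSE Search's
            // solr_query column.
            ResultSet rs = session.execute(
                    "SELECT id, status FROM reports.events "
                            + "WHERE solr_query = 'status:FAILED' LIMIT 100");
            for (Row row : rs) {
                System.out.println(row.getString("id") + " " + row.getString("status"));
            }
        }
    }
}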
-- Jack Krupansky

On Sat, Oct 17, 2015 at 3:54 PM, Mark Lewis  wrote:


I hadn't considered it because I didn't think it could be configured just for a 
single data center; can it?
On Oct 17, 2015 8:50 AM, "Jack Krupansky"  wrote:
Did you consider DSE Search in a DC?
-- Jack Krupansky

On Sat, Oct 17, 2015 at 11:30 AM, Mark Lewis  wrote:
I've got an existing C* cluster spread across three data centers, and I'm 
wrestling with how to add some support for ad-hoc user reporting against 
(ideally) near real-time data.  
The type of reports I want to support basically boils down to allowing the user 
to select a single highly denormalized "Table" from a predefined list, pick 
some filters (ideally with arbitrary boolean logic), project out some columns, 
and allow for some simple grouping and aggregation.  I've seen several 
companies expose reporting this way, and it seems like a good way to avoid the 
complexity of joins while still providing a good deal of flexibility.
Has anybody done this or have any recommendations?
My current thinking is that I'd like to have the ad-hoc reporting 
infrastructure in separate data centers from our active production OLTP-type 
stuff, both to isolate any load away from the OLTP infrastructure and also 
because I'll likely need other stuff there (Spark?) to support ad-hoc reporting.
So I basically have two problems:
(1) Get an eventually-consistent view of the data into a data center I can query against relatively quickly (so no big batch imports)
(2) Be able to run ad-hoc user queries against it
If I just think about query flexibility, I might consider dumping data into 
PostgreSQL nodes (practical because the data that any individual user can query 
will fit onto a single node).  But then I have the problem of getting the data 
there; I looked into an architecture using Kafka to pump data from the OLTP 
data centers to PostgreSQL mirrors, but down that road lies the need to 
manually deal with the eventual consistency.  Ugh.
If I just run C* nodes in my reporting cluster that makes the problem of 
getting the data into the right place with eventual consistency easy to solve 
and I like that idea quite a lot, but then I need to run reporting against C*.  
I could make the queries I need to run reasonably performant with enough 
secondary-indexes or materialized views (we're upgrading to 3.0 soon), but I 
would need a lot of secondary-indexes and materialized views, and I'd rather 
not pay to store them in all of my data centers.  I wish there were a way to 
define secondary-indexes or materialized views to only exist in one DC of a 
cluster, but unless I've missed something it doesn't look possible.
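
For illustration, a minimal sketch of this kind of filter/project/aggregate report
run through the spark-cassandra-connector Java API against the reporting DC. The
keyspace, table, and column names (reporting.sales, region, amount, day) and the
contact point are placeholder assumptions, the filter column is assumed to be part
of the primary key or indexed so the where() clause can be pushed down, and the
API shown is the connector's 1.x Java API:

import com.datastax.spark.connector.japi.CassandraRow;
import com.datastax.spark.connector.japi.rdd.CassandraJavaRDD;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

import static com.datastax.spark.connector.japi.CassandraJavaUtil.javaFunctions;

public class AdHocReportSketch {
    public static void main(String[] args) {
        // Point Spark at a node in the reporting DC only (placeholder address).
        SparkConf conf = new SparkConf()
                .setAppName("ad-hoc-report")
                .set("spark.cassandra.connection.host", "10.0.1.1");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // "Pick a denormalized table, apply a filter, project some columns":
        // select() and where() are passed through to Cassandra.
        CassandraJavaRDD<CassandraRow> rows = javaFunctions(sc)
                .cassandraTable("reporting", "sales")
                .select("region", "amount")
                .where("day = ?", "2015-10-18");

        // Simple grouping and aggregation: total amount per region.
        JavaPairRDD<String, Double> byRegion = rows.mapToPair(
                row -> new Tuple2<String, Double>(row.getString("region"), row.getDouble("amount")));
        JavaPairRDD<String, Double> totals = byRegion.reduceByKey((a, b) -> a + b);

        totals.collect().forEach(t -> System.out.println(t._1() + " -> " + t._2()));
        sc.stop();
    }
}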
Any advice or case studies in this area would be greatly appreciated.
-- Mark

Re: How to read data from local cassandra cluster

2015-10-18 Thread Ryan Svihla
Not a Cassandra question so this isn't the right list, but you can just upload 
the file to CFS and then access it by the path "cfs://filename".
However, since you have DSE you may want to contact support for help with 
pathing in DSE using CFS and Spark.
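
For example, a minimal Java sketch of the change, assuming the file from the
question below sits at /user/testuser/words.txt in CFS and the job is launched
through DSE's Spark tooling (the exact cfs:// URI form can depend on the DSE
configuration):

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class CfsReadSketch {
    public static void main(String[] args) {
        // Assumes the job is submitted through DSE's Spark tooling, which wires
        // up the master URL and the cfs:// filesystem for the local installation.
        SparkConf conf = new SparkConf().setAppName("cfs-read-sketch");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Read the file previously uploaded to CFS by its cfs:// path
        // rather than a bare local path.
        JavaRDD<String> lines = sc.textFile("cfs:///user/testuser/words.txt");
        System.out.println("line count: " + lines.count());

        sc.stop();
    }
}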
-Ryan Svihla




On Fri, Oct 16, 2015 at 1:33 AM -0700, "Adamantios Corais"  wrote:

Hi,
I have installed Cassandra locally (DataStax Enterprise, to be specific). 
Everything seems to work OK. For example, I can upload a test file into CFS or 
open a Spark REPL.
However, when it comes to my very own Spark application, I can't understand how 
to modify sc.textFile("/user/testuser/words.txt") so that I can read the file I 
just uploaded to my local DataStax installation. 
How should I refer to the associated host?

// Adamantios

Re: Would we have data corruption if we bootstrapped 10 nodes at once?

2015-10-18 Thread Kevin Burton
ouch.. OK.. I think I really shot myself in the foot here then.  This might
be bad.

I'm not sure if I would have missing data.  I mean basically the data is on
the other nodes.. but the cluster has been running with 10 nodes
accidentally bootstrapped with auto_bootstrap=false.

So they have new data and seem to be missing values.

This is somewhat misleading... initially, if you start it up and run
nodetool status, it only returns one node.

So I assumed auto_bootstrap=false meant that it just doesn't join the
cluster.

I'm running a nodetool repair now to hopefully fix this.



On Sun, Oct 18, 2015 at 7:25 PM, Jeff Jirsa  wrote:

> auto_bootstrap=false tells it to join the cluster without running
> bootstrap – the node assumes it has all of the necessary data, and won’t
> stream any missing data.
>
> This generally violates consistency guarantees, but if done on a single
> node, is typically correctable with `nodetool repair`.
>
> If you do it on many  nodes at once, it’s possible that the new nodes
> could represent all 3 replicas of the data, but don’t physically have any
> of that data, leading to missing records.
>
>
>
> From:  on behalf of Kevin Burton
> Reply-To: "user@cassandra.apache.org"
> Date: Sunday, October 18, 2015 at 3:44 PM
> To: "user@cassandra.apache.org"
> Subject: Re: Would we have data corruption if we bootstrapped 10 nodes at
> once?
>
> An shit.. I think we're seeing corruption.. missing records :-/
>
> On Sat, Oct 17, 2015 at 10:45 AM, Kevin Burton  wrote:
>
>> We just migrated from a 30 node cluster to a 45 node cluster. (so 15 new
>> nodes)
>>
>> By default we have auto_boostrap = false
>>
>> so we just push our config to the cluster, the cassandra daemons restart,
>> and they're not cluster members and are the only nodes in the cluster.
>>
>> Anyway.  While I was about 1/2 way done adding the 15 nodes,  I had about
>> 7 members of the cluster and 8 not yet joined.
>>
>> We are only doing 1 at a time because apparently bootstrapping more than
>> 1 is unsafe.
>>
>> I did a rolling restart whereby I went through and restarted all the
>> cassandra boxes.
>>
>> Somehow the new nodes auto boostrapped themselves EVEN though
>> auto_bootstrap=false.
>>
>> We don't have any errors.  Everything seems functional.  I'm just worried
>> about data loss.
>>
>> Thoughts?
>>
>> Kevin
>>
>> --
>>
>> We’re hiring if you know of any awesome Java Devops or Linux Operations
>> Engineers!
>>
>> Founder/CEO Spinn3r.com
>> Location: *San Francisco, CA*
>> blog: http://burtonator.wordpress.com
>> … or check out my Google+ profile
>> 
>>
>>
>
>
> --
>
> We’re hiring if you know of any awesome Java Devops or Linux Operations
> Engineers!
>
> Founder/CEO Spinn3r.com
> Location: *San Francisco, CA*
> blog: http://burtonator.wordpress.com
> … or check out my Google+ profile
> 
>
>


-- 

We’re hiring if you know of any awesome Java Devops or Linux Operations
Engineers!

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile



Changing nodes ips

2015-10-18 Thread Cyril Scetbon
Hi,

I want to change the IP addresses (IPv4) that the nodes use to communicate (gossip); 
I'm trying to migrate those communications from IPv4 to IPv6. I tried to follow a 
procedure similar to the one used in CASSANDRA-8382. However, it doesn't work as 
expected. When I make the change on the first node, the nodes seem not to see each 
other, if nodetool is to be trusted:

On the first node:

Datacenter: s1
==
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address                     Load       Tokens  Owns  Host ID                               Rack
DN  10.10.12.19                 ?          256     ?     dab24e23-4b42-438e-9070-7994e329e868  i10
UN  2a01:c940:a5:2005:0:0:0:18  244.35 MB  256     ?     03c558ec-add9-4dcd-bf2b-a1b28575e06b  c10

On the second node:

Datacenter: s1
==
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load       Tokens  Owns (effective)  Host ID                               Rack
DN  10.10.12.18  244.24 MB  256     100.0%            03c558ec-add9-4dcd-bf2b-a1b28575e06b  c10
UN  10.10.12.19  244.11 MB  256     100.0%            dab24e23-4b42-438e-9070-7994e329e868  i10

I can see in the first node's logs that it tries to handshake with node 2; however, 
I see neither an error in node 1's logs nor any corresponding information in node 2's logs.

Of course, I'm trying to find a procedure that does not cause any downtime of 
the whole cluster.

Any idea ?  
 -- 
Cyril SCETBON



compact/repair shouldn't compete for normal compaction resources.

2015-10-18 Thread Kevin Burton
I'm doing a big nodetool repair right now and I'm pretty sure the added
overhead is impacting our performance.

Shouldn't you be able to throttle repair so that normal compactions can use
most of the resources?

-- 

We’re hiring if you know of any awesome Java Devops or Linux Operations
Engineers!

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile



Re: Would we have data corruption if we bootstrapped 10 nodes at once?

2015-10-18 Thread Raj Chudasama
In this case, does it make sense to remove the newly added nodes, correct the 
configuration, and have them rejoin one at a time?

Thx

Sent from my iPhone

> On Oct 18, 2015, at 11:19 PM, Jeff Jirsa  wrote:
> 
> Take a snapshot now, before you get rid of any data (whatever you do, don’t 
> run cleanup). 
> 
> If you identify missing data, you can go back to those snapshots, find the 
> nodes that had the data previously (sstable2json, for example), and either 
> re-stream that data into the cluster with sstableloader or copy it to a new 
> host and `nodetool refresh` it into the new system.
> 
> 
> 
> From:  on behalf of Kevin Burton
> Reply-To: "user@cassandra.apache.org"
> Date: Sunday, October 18, 2015 at 8:10 PM
> To: "user@cassandra.apache.org"
> Subject: Re: Would we have data corruption if we bootstrapped 10 nodes at 
> once?
> 
> ouch.. OK.. I think I really shot myself in the foot here then.  This might 
> be bad.
> 
> I'm not sure if I would have missing data.  I mean basically the data is on 
> the other nodes.. but the cluster has been running with 10 nodes accidentally 
> bootstrapped with auto_bootstrap=false.  
> 
> So they have new data and seem to be missing values. 
> 
> this is somewhat misleading... Initially if you start it up and run nodetool 
> status , it only returns one node. 
> 
> So I assumed auto_bootstrap=false meant that it just doesn't join the cluster.
> 
> I'm running a nodetool repair now to hopefully fix this.
> 
> 
> 
>> On Sun, Oct 18, 2015 at 7:25 PM, Jeff Jirsa   
>> wrote:
>> auto_bootstrap=false tells it to join the cluster without running bootstrap 
>> – the node assumes it has all of the necessary data, and won’t stream any 
>> missing data.
>> 
>> This generally violates consistency guarantees, but if done on a single 
>> node, is typically correctable with `nodetool repair`.
>> 
>> If you do it on many  nodes at once, it’s possible that the new nodes could 
>> represent all 3 replicas of the data, but don’t physically have any of that 
>> data, leading to missing records.
>> 
>> 
>> 
>> From:  on behalf of Kevin Burton
>> Reply-To: "user@cassandra.apache.org"
>> Date: Sunday, October 18, 2015 at 3:44 PM
>> To: "user@cassandra.apache.org"
>> Subject: Re: Would we have data corruption if we bootstrapped 10 nodes at 
>> once?
>> 
>> An shit.. I think we're seeing corruption.. missing records :-/
>> 
>>> On Sat, Oct 17, 2015 at 10:45 AM, Kevin Burton  wrote:
>>> We just migrated from a 30 node cluster to a 45 node cluster. (so 15 new 
>>> nodes)
>>> 
>>> By default we have auto_boostrap = false
>>> 
>>> so we just push our config to the cluster, the cassandra daemons restart, 
>>> and they're not cluster members and are the only nodes in the cluster.
>>> 
>>> Anyway.  While I was about 1/2 way done adding the 15 nodes,  I had about 7 
>>> members of the cluster and 8 not yet joined.
>>> 
>>> We are only doing 1 at a time because apparently bootstrapping more than 1 
>>> is unsafe.  
>>> 
>>> I did a rolling restart whereby I went through and restarted all the 
>>> cassandra boxes.  
>>> 
>>> Somehow the new nodes auto boostrapped themselves EVEN though 
>>> auto_bootstrap=false.
>>> 
>>> We don't have any errors.  Everything seems functional.  I'm just worried 
>>> about data loss.
>>> 
>>> Thoughts?
>>> 
>>> Kevin
>>> 
>>> -- 
>>> We’re hiring if you know of any awesome Java Devops or Linux Operations 
>>> Engineers!
>>> 
>>> Founder/CEO Spinn3r.com
>>> Location: San Francisco, CA
>>> blog: http://burtonator.wordpress.com
>>> … or check out my Google+ profile
>>> 
>> 
>> 
>> 
>> -- 
>> We’re hiring if you know of any awesome Java Devops or Linux Operations 
>> Engineers!
>> 
>> Founder/CEO Spinn3r.com
>> Location: San Francisco, CA
>> blog: http://burtonator.wordpress.com
>> … or check out my Google+ profile
>> 
> 
> 
> 
> -- 
> We’re hiring if you know of any awesome Java Devops or Linux Operations 
> Engineers!
> 
> Founder/CEO Spinn3r.com
> Location: San Francisco, CA
> blog: http://burtonator.wordpress.com
> … or check out my Google+ profile
> 


Map of UDT Using Spring Cassandra

2015-10-18 Thread Kenji Fnu
I created user-defined types in Cassandra: let's call them
param (with key text and value text as its fields) and
stats (with value int, events int and users int as its fields).

Recently I tried to insert data using the spring-cassandra driver, and it gave me
this error:
org.springframework.beans.factory.BeanCreationException: Error creating
bean with name 'insertParam': Injection of autowired dependencies failed;
nested exception is
org.springframework.beans.factory.BeanCreationException: Could not autowire
field: private com.test.repository.ParamRepository
com.test.service.impl.insertParam.ParamRepository; nested exception is
org.springframework.beans.factory.BeanCreationException: Error creating
bean with name 'ParamRepository': Invocation of init method failed; nested
exception is
org.springframework.data.cassandra.mapping.VerifierMappingExceptions:
com.test.model.CassandraEventStatsModel:
Cassandra entities must have the @Table, @Persistent or @PrimaryKeyClass
Annotation

Inside the event stats model class:
package technology.mainspring.mscassandra.model;

import com.datastax.driver.mapping.annotations.Field;
import com.datastax.driver.mapping.annotations.UDT;

@UDT(keyspace = "com_test", name = "stats")
public class CassandraEventStatsModel {

    @Field(name = "value")
    public double value;

    @Field(name = "events")
    public int total_event;

    @Field(name = "users")
    public int total_uniq_user;

    public double getValue() {
        return value;
    }

    public int getTotalEvent() {
        return total_event;
    }

    public int getTotalUniqueUser() {
        return total_uniq_user;
    }

    public void setValue(double value) {
        this.value = value;
    }

    public void setTotalEvent(int total_event) {
        this.total_event = total_event;
    }

    public void setTotalUniqueUser(int total_uniq_user) {
        this.total_uniq_user = total_uniq_user;
    }
}

I wonder what I did wrong? Thanks!
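
One hedged observation, with a minimal sketch for comparison below: the @UDT and
@Field annotations above come from the DataStax Java driver's object mapper,
whereas the exception is thrown by Spring Data Cassandra's mapping verifier, which
expects repository entities to carry @Table, @Persistent or @PrimaryKeyClass. The
table name, column names, and the simplified String column standing in for the
stats UDT are assumptions for illustration only (spring-data-cassandra releases of
that era did not, as far as I know, map UDT fields directly); the annotation
packages match the 1.x line of spring-data-cassandra:

package com.test.model;

import org.springframework.data.cassandra.mapping.Column;
import org.springframework.data.cassandra.mapping.PrimaryKey;
import org.springframework.data.cassandra.mapping.Table;

// Hypothetical Spring Data entity: names and layout are illustrative only.
@Table("event_stats")
public class EventStats {

    @PrimaryKey
    private String id;

    // Simplified stand-in for the "stats" UDT; a real mapping would need UDT
    // support in the framework or a custom converter.
    @Column("stats_json")
    private String statsJson;

    public String getId() { return id; }
    public void setId(String id) { this.id = id; }

    public String getStatsJson() { return statsJson; }
    public void setStatsJson(String statsJson) { this.statsJson = statsJson; }
}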


Re: Would we have data corruption if we bootstrapped 10 nodes at once?

2015-10-18 Thread Jeff Jirsa
Take a snapshot now, before you get rid of any data (whatever you do, don’t run 
cleanup). 

If you identify missing data, you can go back to those snapshots, find the 
nodes that had the data previously (sstable2json, for example), and either 
re-stream that data into the cluster with sstableloader or copy it to a new 
host and `nodetool refresh` it into the new system.



From:   on behalf of Kevin Burton
Reply-To:  "user@cassandra.apache.org"
Date:  Sunday, October 18, 2015 at 8:10 PM
To:  "user@cassandra.apache.org"
Subject:  Re: Would we have data corruption if we bootstrapped 10 nodes at once?

ouch.. OK.. I think I really shot myself in the foot here then.  This might be 
bad. 

I'm not sure if I would have missing data.  I mean basically the data is on the 
other nodes.. but the cluster has been running with 10 nodes accidentally 
bootstrapped with auto_bootstrap=false.  

So they have new data and seem to be missing values. 

this is somewhat misleading... Initially if you start it up and run nodetool 
status , it only returns one node. 

So I assumed auto_bootstrap=false meant that it just doesn't join the cluster.

I'm running a nodetool repair now to hopefully fix this.



On Sun, Oct 18, 2015 at 7:25 PM, Jeff Jirsa  wrote:
auto_bootstrap=false tells it to join the cluster without running bootstrap – 
the node assumes it has all of the necessary data, and won’t stream any missing 
data.

This generally violates consistency guarantees, but if done on a single node, 
is typically correctable with `nodetool repair`.

If you do it on many  nodes at once, it’s possible that the new nodes could 
represent all 3 replicas of the data, but don’t physically have any of that 
data, leading to missing records.



From:  on behalf of Kevin Burton
Reply-To: "user@cassandra.apache.org"
Date: Sunday, October 18, 2015 at 3:44 PM
To: "user@cassandra.apache.org"
Subject: Re: Would we have data corruption if we bootstrapped 10 nodes at once?

An shit.. I think we're seeing corruption.. missing records :-/

On Sat, Oct 17, 2015 at 10:45 AM, Kevin Burton  wrote:
We just migrated from a 30 node cluster to a 45 node cluster. (so 15 new nodes) 

By default we have auto_boostrap = false

so we just push our config to the cluster, the cassandra daemons restart, and 
they're not cluster members and are the only nodes in the cluster.

Anyway.  While I was about 1/2 way done adding the 15 nodes,  I had about 7 
members of the cluster and 8 not yet joined.

We are only doing 1 at a time because apparently bootstrapping more than 1 is 
unsafe.  

I did a rolling restart whereby I went through and restarted all the cassandra 
boxes.  

Somehow the new nodes auto boostrapped themselves EVEN though 
auto_bootstrap=false.

We don't have any errors.  Everything seems functional.  I'm just worried about 
data loss.

Thoughts?

Kevin

-- 
We’re hiring if you know of any awesome Java Devops or Linux Operations 
Engineers!

Founder/CEO Spinn3r.com
Location: San Francisco, CA
blog: http://burtonator.wordpress.com
… or check out my Google+ profile




-- 
We’re hiring if you know of any awesome Java Devops or Linux Operations 
Engineers!

Founder/CEO Spinn3r.com
Location: San Francisco, CA
blog: http://burtonator.wordpress.com
… or check out my Google+ profile




-- 
We’re hiring if you know of any awesome Java Devops or Linux Operations 
Engineers!

Founder/CEO Spinn3r.com
Location: San Francisco, CA
blog: http://burtonator.wordpress.com
… or check out my Google+ profile






Re: Would we have data corruption if we bootstrapped 10 nodes at once?

2015-10-18 Thread Jeff Jirsa
auto_bootstrap=false tells it to join the cluster without running bootstrap – 
the node assumes it has all of the necessary data, and won’t stream any missing 
data.

This generally violates consistency guarantees, but if done on a single node, 
is typically correctable with `nodetool repair`.

If you do it on many  nodes at once, it’s possible that the new nodes could 
represent all 3 replicas of the data, but don’t physically have any of that 
data, leading to missing records.



From:   on behalf of Kevin Burton
Reply-To:  "user@cassandra.apache.org"
Date:  Sunday, October 18, 2015 at 3:44 PM
To:  "user@cassandra.apache.org"
Subject:  Re: Would we have data corruption if we bootstrapped 10 nodes at once?

An shit.. I think we're seeing corruption.. missing records :-/

On Sat, Oct 17, 2015 at 10:45 AM, Kevin Burton  wrote:
We just migrated from a 30 node cluster to a 45 node cluster. (so 15 new nodes) 

By default we have auto_boostrap = false

so we just push our config to the cluster, the cassandra daemons restart, and 
they're not cluster members and are the only nodes in the cluster.

Anyway.  While I was about 1/2 way done adding the 15 nodes,  I had about 7 
members of the cluster and 8 not yet joined.

We are only doing 1 at a time because apparently bootstrapping more than 1 is 
unsafe.  

I did a rolling restart whereby I went through and restarted all the cassandra 
boxes.  

Somehow the new nodes auto boostrapped themselves EVEN though 
auto_bootstrap=false.

We don't have any errors.  Everything seems functional.  I'm just worried about 
data loss.

Thoughts?

Kevin

-- 
We’re hiring if you know of any awesome Java Devops or Linux Operations 
Engineers!

Founder/CEO Spinn3r.com
Location: San Francisco, CA
blog: http://burtonator.wordpress.com
… or check out my Google+ profile




-- 
We’re hiring if you know of any awesome Java Devops or Linux Operations 
Engineers!

Founder/CEO Spinn3r.com
Location: San Francisco, CA
blog: http://burtonator.wordpress.com
… or check out my Google+ profile






"invalid global counter shard detected" warning on 2.1.3 and 2.1.10

2015-10-18 Thread Branton Davis
Hey all.

We've been seeing this warning on one of our clusters:

2015-10-18 14:28:52,898 WARN  [ValidationExecutor:14]
org.apache.cassandra.db.context.CounterContext invalid global counter shard
detected; (4aa69016-4cf8-4585-8f23-e59af050d174, 1, 67158) and
(4aa69016-4cf8-4585-8f23-e59af050d174, 1, 21486) differ only in count; will
pick highest to self-heal on compaction


From what I've read and heard in the IRC channel, this warning could be
related to not running upgradesstables after upgrading from 2.0.x to
2.1.x.  I don't think we ran that then, but we've been at 2.1 since last
November.  Looking back, the warnings start appearing around June, when no
maintenance had been performed on the cluster.  At that time, we had been
on 2.1.3 for a couple of months.  We've been on 2.1.10 for the last week
(the upgrade was when we noticed this warning for the first time).

From a suggestion in IRC, I went ahead and ran upgradesstables on all the
nodes.  Our weekly repair also ran this morning.  But the warnings still
show up throughout the day.

So, we have many questions:

   - How much should we be freaking out?
   - Why is this recurring?  If I understand what's happening, this is a
   self-healing process.  So, why would it keep happening?  Are we possibly
   using counters incorrectly?
   - What does it even mean that there were multiple shards for the same
   counter?  How does that situation even occur?

We're pretty lost here, so any help would be greatly appreciated.

Thanks!