Re: what's the difference between repair CF separately and repair the entire node?

2011-09-14 Thread Sylvain Lebresne
On Wed, Sep 14, 2011 at 2:38 AM, Yan Chunlu springri...@gmail.com wrote:
 me neither don't want to repair one CF at the time.
 the node repair took a week and still running, compactionstats and
 netstream shows nothing is running on every node,  and also no error
 message, no exception, really no idea what was it doing,

To add to the list of things repair does wrong in 0.7, we'll have to add that
if one of the nodes participating in the repair (so any node that shares a range
with the node on which repair was started) goes down (even for a short time),
then the repair will simply hang forever, doing nothing. And no specific
error message will be logged. That could be what happened. Again, recent
releases of 0.8 fix that too.

--
Sylvain
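
For anyone trying to tell whether such a repair is hung or just slow, a quick
sketch of what to check on each node (the host name is a placeholder; 8080 is
the old default JMX port used elsewhere in this thread):

# any validation compactions still running?
./bin/nodetool -h node1.example.com -p 8080 compactionstats
# any streams in or out?
./bin/nodetool -h node1.example.com -p 8080 netstats
# pending/active tasks in the anti-entropy stage?
./bin/nodetool -h node1.example.com -p 8080 tpstats

If all three stay idle on every replica involved, the session is probably hung
rather than just slow.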

 I stopped yesterday.  maybe I should run repair again while disable
 compaction on all nodes?
 thanks!

 On Wed, Sep 14, 2011 at 6:57 AM, Peter Schuller
 peter.schul...@infidyne.com wrote:

  I think it is a serious problem since I can not repair.  I am
  using cassandra on production servers. is there some way to fix it
  without upgrade?  I heard of that 0.8.x is still not quite ready in
  production environment.

 It is a serious issue if you really need to repair one CF at the time.
 However, looking at your original post it seems this is not
 necessarily your issue. Do you need to, or was your concern rather the
 overall time repair took?

 There are other things that are improved in 0.8 that affect 0.7. In
 particular, (1) in 0.7 compaction, including validating compactions
 that are part of repair, is non-concurrent so if your repair starts
 while there is a long-running compaction going it will have to wait,
 and (2) semi-related is that the merkle tree calculation that is part
 of repair/anti-entropy may happen out of synch if one of the nodes
 participating happen to be busy with compaction. This in turns causes
 additional data to be sent as part of repair.

 That might be why your immediately following repair took a long time,
 but it's difficult to tell.

 If you're having issues with repair and large data sets, I would
 generally say that upgrading to 0.8 is recommended. However, if you're
 on 0.7.4, beware of
 https://issues.apache.org/jira/browse/CASSANDRA-3166

 --
 / Peter Schuller (@scode on twitter)




Re: what's the difference between repair CF separately and repair the entire node?

2011-09-14 Thread Yan Chunlu
Is 0.8 ready for production use? As far as I know, many companies,
including reddit.com, are currently using 0.7; how do they get around the repair
problem?

On Wed, Sep 14, 2011 at 2:47 PM, Sylvain Lebresne sylv...@datastax.comwrote:

 On Wed, Sep 14, 2011 at 2:38 AM, Yan Chunlu springri...@gmail.com wrote:
  me neither don't want to repair one CF at the time.
  the node repair took a week and still running, compactionstats and
  netstream shows nothing is running on every node,  and also no error
  message, no exception, really no idea what was it doing,

 To add to the list of things repair does wrong in 0.7, we'll have to add
 that
 if one of the node participating in the repair (so any node that share a
 range
 with the node on which repair was started) goes down (even for a short
 time),
 then the repair will simply hang forever doing nothing. And no specific
 error message will be logged. That could be what happened. Again, recent
 releases of 0.8 fix that too.

 --
 Sylvain

  I stopped yesterday.  maybe I should run repair again while disable
  compaction on all nodes?
  thanks!
 
  On Wed, Sep 14, 2011 at 6:57 AM, Peter Schuller
  peter.schul...@infidyne.com wrote:
 
   I think it is a serious problem since I can not repair.  I am
   using cassandra on production servers. is there some way to fix it
   without upgrade?  I heard of that 0.8.x is still not quite ready in
   production environment.
 
  It is a serious issue if you really need to repair one CF at the time.
  However, looking at your original post it seems this is not
  necessarily your issue. Do you need to, or was your concern rather the
  overall time repair took?
 
  There are other things that are improved in 0.8 that affect 0.7. In
  particular, (1) in 0.7 compaction, including validating compactions
  that are part of repair, is non-concurrent so if your repair starts
  while there is a long-running compaction going it will have to wait,
  and (2) semi-related is that the merkle tree calculation that is part
  of repair/anti-entropy may happen out of synch if one of the nodes
  participating happen to be busy with compaction. This in turns causes
  additional data to be sent as part of repair.
 
  That might be why your immediately following repair took a long time,
  but it's difficult to tell.
 
  If you're having issues with repair and large data sets, I would
  generally say that upgrading to 0.8 is recommended. However, if you're
  on 0.7.4, beware of
  https://issues.apache.org/jira/browse/CASSANDRA-3166
 
  --
  / Peter Schuller (@scode on twitter)
 
 



Re: what's the difference between repair CF separately and repair the entire node?

2011-09-14 Thread Sylvain Lebresne
On Wed, Sep 14, 2011 at 9:27 AM, Yan Chunlu springri...@gmail.com wrote:
 is 0.8 ready for production use?

some related discussion here:
http://www.mail-archive.com/user@cassandra.apache.org/msg17055.html
but my personal answer is yes.

  as I know currently many companies including reddit.com are using 0.7, how
 does they get rid of the repair problem?

Repair problems in 0.7 don't hit everyone equally. For some people it works
relatively well, even if not in the most efficient way. Also, for some workloads
(if you don't do many deletes, for instance), you can set a large gc_grace_seconds
value (say, a month) and only run repair that often, which can make repair's
inefficiencies more bearable.
That being said, I can't speak for many companies, but I do advise evaluating
an upgrade to 0.8.

--
Sylvain
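
A rough sketch of the gc_grace_seconds approach from cassandra-cli (keyspace
and column family names are made up; the CLI attribute name may differ slightly
by version - on the Thrift API the field is gc_grace_seconds):

use MyKeyspace;
update column family MyCF with gc_grace = 2592000;

The trade-off mentioned above still applies: every node then needs to be
repaired at least once within that window, or deleted data can reappear.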


 On Wed, Sep 14, 2011 at 2:47 PM, Sylvain Lebresne sylv...@datastax.com
 wrote:

 On Wed, Sep 14, 2011 at 2:38 AM, Yan Chunlu springri...@gmail.com wrote:
  me neither don't want to repair one CF at the time.
  the node repair took a week and still running, compactionstats and
  netstream shows nothing is running on every node,  and also no error
  message, no exception, really no idea what was it doing,

 To add to the list of things repair does wrong in 0.7, we'll have to add
 that
 if one of the node participating in the repair (so any node that share a
 range
 with the node on which repair was started) goes down (even for a short
 time),
 then the repair will simply hang forever doing nothing. And no specific
 error message will be logged. That could be what happened. Again, recent
 releases of 0.8 fix that too.

 --
 Sylvain

  I stopped yesterday.  maybe I should run repair again while disable
  compaction on all nodes?
  thanks!
 
  On Wed, Sep 14, 2011 at 6:57 AM, Peter Schuller
  peter.schul...@infidyne.com wrote:
 
   I think it is a serious problem since I can not repair.  I am
   using cassandra on production servers. is there some way to fix it
   without upgrade?  I heard of that 0.8.x is still not quite ready in
   production environment.
 
  It is a serious issue if you really need to repair one CF at the time.
  However, looking at your original post it seems this is not
  necessarily your issue. Do you need to, or was your concern rather the
  overall time repair took?
 
  There are other things that are improved in 0.8 that affect 0.7. In
  particular, (1) in 0.7 compaction, including validating compactions
  that are part of repair, is non-concurrent so if your repair starts
  while there is a long-running compaction going it will have to wait,
  and (2) semi-related is that the merkle tree calculation that is part
  of repair/anti-entropy may happen out of synch if one of the nodes
  participating happen to be busy with compaction. This in turns causes
  additional data to be sent as part of repair.
 
  That might be why your immediately following repair took a long time,
  but it's difficult to tell.
 
  If you're having issues with repair and large data sets, I would
  generally say that upgrading to 0.8 is recommended. However, if you're
  on 0.7.4, beware of
  https://issues.apache.org/jira/browse/CASSANDRA-3166
 
  --
  / Peter Schuller (@scode on twitter)
 
 




Re: what's the difference between repair CF separately and repair the entire node?

2011-09-14 Thread Sasha Dolgy
It was mentioned in another thread that Twitter uses 0.8 in
production... for me that was a fairly strong testimonial.
On Sep 14, 2011 9:28 AM, Yan Chunlu springri...@gmail.com wrote:
 is 0.8 ready for production use? as I know currently many companies
 including reddit.com are using 0.7, how does they get rid of the repair
 problem?

 On Wed, Sep 14, 2011 at 2:47 PM, Sylvain Lebresne sylv...@datastax.com
wrote:

 On Wed, Sep 14, 2011 at 2:38 AM, Yan Chunlu springri...@gmail.com
wrote:
  me neither don't want to repair one CF at the time.
  the node repair took a week and still running, compactionstats and
  netstream shows nothing is running on every node, and also no error
  message, no exception, really no idea what was it doing,

 To add to the list of things repair does wrong in 0.7, we'll have to add
 that
 if one of the node participating in the repair (so any node that share a
 range
 with the node on which repair was started) goes down (even for a short
 time),
 then the repair will simply hang forever doing nothing. And no specific
 error message will be logged. That could be what happened. Again, recent
 releases of 0.8 fix that too.

 --
 Sylvain

  I stopped yesterday. maybe I should run repair again while disable
  compaction on all nodes?
  thanks!
 
  On Wed, Sep 14, 2011 at 6:57 AM, Peter Schuller
  peter.schul...@infidyne.com wrote:
 
   I think it is a serious problem since I can not repair. I am
   using cassandra on production servers. is there some way to fix it
   without upgrade? I heard of that 0.8.x is still not quite ready in
   production environment.
 
  It is a serious issue if you really need to repair one CF at the time.
  However, looking at your original post it seems this is not
  necessarily your issue. Do you need to, or was your concern rather the
  overall time repair took?
 
  There are other things that are improved in 0.8 that affect 0.7. In
  particular, (1) in 0.7 compaction, including validating compactions
  that are part of repair, is non-concurrent so if your repair starts
  while there is a long-running compaction going it will have to wait,
  and (2) semi-related is that the merkle tree calculation that is part
  of repair/anti-entropy may happen out of synch if one of the nodes
  participating happen to be busy with compaction. This in turns causes
  additional data to be sent as part of repair.
 
  That might be why your immediately following repair took a long time,
  but it's difficult to tell.
 
  If you're having issues with repair and large data sets, I would
  generally say that upgrading to 0.8 is recommended. However, if you're
  on 0.7.4, beware of
  https://issues.apache.org/jira/browse/CASSANDRA-3166
 
  --
  / Peter Schuller (@scode on twitter)
 
 



Re: what's the difference between repair CF separately and repair the entire node?

2011-09-14 Thread Yan Chunlu
Thanks a lot for the help!

I have read the post and think 0.8 might be good enough for me, especially
0.8.5.

Also, changing gc_grace_seconds is an acceptable solution.



On Wed, Sep 14, 2011 at 4:03 PM, Sylvain Lebresne sylv...@datastax.comwrote:

 On Wed, Sep 14, 2011 at 9:27 AM, Yan Chunlu springri...@gmail.com wrote:
  is 0.8 ready for production use?

 some related discussion here:
 http://www.mail-archive.com/user@cassandra.apache.org/msg17055.html
 but my personal answer is yes.

   as I know currently many companies including reddit.com are using 0.7,
 how
  does they get rid of the repair problem?

 Repair problems in 0.7 don't hit everyone equally. For some people, it
 works
 relatively well even if not in the most efficient ways. Also, for some
 workload
 (if you don't do  much deletes for instance), you can set a big
 gc_grace_seconds
 value (say a month) and only run repair that often, which can make repair
 inefficiencies more bearable.
 That being said, I can't speak for many companies, but I do advise
 evaluating
 an upgrade to 0.8.

 --
 Sylvain

 
  On Wed, Sep 14, 2011 at 2:47 PM, Sylvain Lebresne sylv...@datastax.com
  wrote:
 
  On Wed, Sep 14, 2011 at 2:38 AM, Yan Chunlu springri...@gmail.com
 wrote:
   me neither don't want to repair one CF at the time.
   the node repair took a week and still running, compactionstats and
   netstream shows nothing is running on every node,  and also no error
   message, no exception, really no idea what was it doing,
 
  To add to the list of things repair does wrong in 0.7, we'll have to add
  that
  if one of the node participating in the repair (so any node that share a
  range
  with the node on which repair was started) goes down (even for a short
  time),
  then the repair will simply hang forever doing nothing. And no specific
  error message will be logged. That could be what happened. Again, recent
  releases of 0.8 fix that too.
 
  --
  Sylvain
 
   I stopped yesterday.  maybe I should run repair again while disable
   compaction on all nodes?
   thanks!
  
   On Wed, Sep 14, 2011 at 6:57 AM, Peter Schuller
   peter.schul...@infidyne.com wrote:
  
I think it is a serious problem since I can not repair.  I am
using cassandra on production servers. is there some way to fix it
without upgrade?  I heard of that 0.8.x is still not quite ready in
production environment.
  
   It is a serious issue if you really need to repair one CF at the
 time.
   However, looking at your original post it seems this is not
   necessarily your issue. Do you need to, or was your concern rather
 the
   overall time repair took?
  
   There are other things that are improved in 0.8 that affect 0.7. In
   particular, (1) in 0.7 compaction, including validating compactions
   that are part of repair, is non-concurrent so if your repair starts
   while there is a long-running compaction going it will have to wait,
   and (2) semi-related is that the merkle tree calculation that is part
   of repair/anti-entropy may happen out of synch if one of the nodes
   participating happen to be busy with compaction. This in turns causes
   additional data to be sent as part of repair.
  
   That might be why your immediately following repair took a long time,
   but it's difficult to tell.
  
   If you're having issues with repair and large data sets, I would
   generally say that upgrading to 0.8 is recommended. However, if
 you're
   on 0.7.4, beware of
   https://issues.apache.org/jira/browse/CASSANDRA-3166
  
   --
   / Peter Schuller (@scode on twitter)
  
  
 
 



segment fault with 0.8.5

2011-09-14 Thread Yan Chunlu
Just tried the Cassandra 0.8.5 binary version and got a segmentation fault.

I am using Sun JDK so this is not CASSANDRA-2441


OS is Debian 5.0


java -version

java version 1.6.0_04

Java(TM) SE Runtime Environment (build 1.6.0_04-b12)

Java HotSpot(TM) Server VM (build 10.0-b19, mixed mode)


uname -a

Linux mao 2.6.27.59 #1 SMP Mon Jul 25 14:30:33 CST 2011 i686 GNU/Linux


I also found that the format of the configuration file cassandra.yaml is
different; are they compatible?



thanks!


Re: segment fault with 0.8.5

2011-09-14 Thread Roshan Dawrani
On Wed, Sep 14, 2011 at 3:43 PM, Yan Chunlu springri...@gmail.com wrote:

 I also found that the format of configuration file cassandra.yaml is
 different, are they compatible?

The format of the 0.8.5 cassandra.yaml is different from what? You didn't mention
what you are comparing it to.

I recently migrated a simple Cassandra DB from 0.7.0 to 0.8.5 and
found quite a few differences in the structure of cassandra.yaml - the biggest
one that affected us was that cassandra.yaml can no longer hold the definition
of a keyspace, which we used for the embedded Cassandra we bring up for testing.

-- 
Roshan
Blog: http://roshandawrani.wordpress.com/
Twitter: @roshandawrani http://twitter.com/roshandawrani
Skype: roshandawrani
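
For anyone else who relied on keyspace definitions inside cassandra.yaml (a
0.7-era feature), one workaround on 0.8.x is to create the schema up front for
the embedded instance, for example with a small cassandra-cli script; a minimal
sketch (keyspace and column family names are made up):

create keyspace TestKS
  with placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy'
  and strategy_options = [{replication_factor:1}];
use TestKS;
create column family TestCF with comparator = UTF8Type;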


Re: Cassandra cluster on ec2 and ebs volumes

2011-09-14 Thread Jonathan Ellis
[moving to user@]

On Wed, Sep 14, 2011 at 6:22 AM, Giannis Neokleous
gian...@generalsentiment.com wrote:
 Hello,

 We currently have a cluster running on ec2 and all of the data are on
 the instance disks. We also have some old data which are now constant
 that we want to serve off from a different cluster still running on ec2.
 We want to have the ability to turn on/off this cluster at any time
 without having to reinsert any of the data. Is it possible to setup
 cassandra on ec2 so that the data can live on ebs volumes which can be
 attached/detached every time we want to bring down the cluster?
 Reloading the sstables will not work for us because we want to be able
 to turn on the cluster and have it serving data within minutes.

 Does anyone have this kind of setup working right now and if so how
 reliable is this?

 Thanks,

 -Giannis





-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
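
Not an answer from experience, but the general shape of such a setup would be to
keep Cassandra's data directories on the EBS volume and re-attach and mount it
before starting the node; a rough sketch (device name, mount point and init
command are assumptions):

# after attaching the node's EBS volume to the instance:
mount /dev/sdf /var/lib/cassandra
# cassandra.yaml should point data_file_directories, commitlog_directory
# and saved_caches_directory somewhere under /var/lib/cassandra
/etc/init.d/cassandra start

Since the system keyspace lives there too, the node comes back with its previous
token and schema, which is what would let the cluster serve data again within
minutes rather than re-streaming it.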


Re: segment fault with 0.8.5

2011-09-14 Thread Jonathan Ellis
That's a pretty old JDK.  You should upgrade.
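
A minimal sketch of checking and switching the JVM on Debian (paths and package
locations are assumptions; any reasonably recent Sun/Oracle 1.6.0_2x release
should do):

# see which java Cassandra will pick up
java -version
echo $JAVA_HOME
# after installing a newer Sun JDK, point Cassandra at it, e.g.
export JAVA_HOME=/usr/lib/jvm/java-6-sun
$JAVA_HOME/bin/java -version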

On Wed, Sep 14, 2011 at 5:13 AM, Yan Chunlu springri...@gmail.com wrote:
 just tried cassandra 0.8.5 binary version, and got Segment fault

 I am using Sun JDK so this is not CASSANDRA-2441

 OS is Debian 5.0

 java -version

 java version 1.6.0_04

 Java(TM) SE Runtime Environment (build 1.6.0_04-b12)

 Java HotSpot(TM) Server VM (build 10.0-b19, mixed mode)

 uname -a

 Linux mao 2.6.27.59 #1 SMP Mon Jul 25 14:30:33 CST 2011 i686 GNU/Linux

 I also found that the format of configuration file cassandra.yaml is
 different, are they compatible?

 thanks!




-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: Error in upgrading cassandra to 0.8.5

2011-09-14 Thread Jonas Borgström
On 09/13/2011 05:21 PM, Jonathan Ellis wrote:
 More or less.  NEWS.txt explains upgrade procedure in more detail.

When moving from 0.7.x to 0.8.5 do I need to scrub all sstables post
upgrade?

NEWS.txt doesn't mention anything about that but your comment here seems
to indicate so:

https://issues.apache.org/jira/browse/CASSANDRA-2739?focusedCommentId=13071490page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13071490

/ Jonas


Re: Index search in provided list of rows (list of rowKeys).

2011-09-14 Thread Evgeniy Ryabitskiy
Why is it so radical?

It would be the same get_indexed_slices search, just over a specified set of rows. So
mostly it would be one more search expression over row IDs, not only over column
values. Usually, the more restrictions you can specify in a search query, the
faster the search can be (not slower, at least).

About moving to another engine:

Sphinx has its advantages (quite fast) and disadvantages (painful
integration, lots of limitations). My company is currently using it in
production, so moving to another search engine is a big step and will have to be
considered carefully.


What I want to discuss is the common task of searching in Cassandra. Maybe I am
missing some already well-known solution for it (a silver bullet)?
I see only 2 solutions:

1) Use an external search engine that indexes all storage fields

advantages:
supports full text search
some engines have nice search features like sorting by relevance

disadvantages:
for range scans it stores column values, which means a huge part of the Cassandra
data is also stored in the search engine's metadata
engines usually come with their own set of limitations

2) Use Cassandra's embedded index search

advantages:
no need to index all the columns that are used for filtering
filtering is performed at the storage layer, close to the data

disadvantages:
no full text search support
requires creating and maintaining secondary indexes

The two solutions are mutually exclusive: you can choose only one, and there is no way
to use a combination of them (except intersection on the client side,
which is not a real solution).

So the API that was discussed would open up the possibility of using that
combination.
To me it looks like a third solution. Could it really change the way we
search in Cassandra?


Evgeny.


Get CL ONE / NTS

2011-09-14 Thread Pierre Chalamet
Hello,

I have 2 datacenters. Cassandra is configured as follows:
- RackInferringSnitch
- NetworkTopologyStrategy for CF
- strategy_options: DC1:3 DC2:3

Data are written using CL LOCAL_QUORUM so data written from one datacenter will 
eventually be replicated to the other datacenter. Data is always written 
exactly once. 

On the other side, I'd like to improve the read path. I'm actually using CL
ONE since the data is only written once (i.e. the timestamp is more or less
meaningless in my case).

This is where I have some doubts: if data is written on DC1 and tentatively 
read from DC2 while the data is not yet replicated, or only partially replicated 
(for whatever good reason, since replication is async), what is the behavior of 
Get with CL ONE / NTS? 
1/ Will I get an error because DC2 does not have any copy of the data? 
2/ Will Cassandra try to get the data from DC1 if nothing is found in DC2?
3/ In case of partial replication to DC2, will I sometimes see errors about 
servers not holding the data in DC2?
4/ Does a Get at CL ONE fail as soon as the fastest server to answer says it does 
not have the data, or does it wait until all servers say they do not have the 
data? 

Thanks a lot,
- Pierre
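
For reference, a keyspace with the replication layout described above would be
created in cassandra-cli roughly like this (the keyspace name is a placeholder;
the option syntax varies slightly between 0.7 and 0.8):

create keyspace MyKS
  with placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy'
  and strategy_options = [{DC1:3, DC2:3}];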

Nodetool removetoken taking days to run.

2011-09-14 Thread Ryan Hadley
Hi,

So, here's the backstory:

We were running Cassandra 0.7.4 and at one point in time had a node in the ring 
at 10.84.73.18. We removed this node from the ring successfully in 0.7.4. It 
stopped showing in the nodetool ring command. But occasionally we'd still get 
weird log entries about failing to write/read to IP 10.84.73.18.

We upgraded to Cassandra 0.8.4. Now, nodetool ring shows this old node:

10.84.73.18 datacenter1 rack1   Down   Leaving ?   6.71%   
32695837177645752437561450928649262701  

So I started a nodetool removetoken on 32695837177645752437561450928649262701 
last Friday. It's still going strong this morning, on day 5:

./bin/nodetool -h 10.84.73.47 -p 8080 removetoken status
RemovalStatus: Removing token (32695837177645752437561450928649262701). Waiting 
for replication confirmation from [/10.84.73.49,/10.84.73.48,/10.84.73.51].

Should I just be patient? Or is something really weird with this node?

Thanks-
ryan

Re: Error in upgrading cassandra to 0.8.5

2011-09-14 Thread Jonathan Ellis
Added to NEWS:

- After upgrading, run nodetool scrub against each node before running
  repair, moving nodes, or adding new ones. 
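
In practice that means something like the following on each node (host and JMX
port are placeholders; scrub can also be limited to a keyspace or to specific
column families):

./bin/nodetool -h node1.example.com -p 8080 scrub
# or, per keyspace / column family:
./bin/nodetool -h node1.example.com -p 8080 scrub MyKeyspace MyCF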

2011/9/14 Jonas Borgström jonas.borgst...@trioptima.com:
 On 09/13/2011 05:21 PM, Jonathan Ellis wrote:
 More or less.  NEWS.txt explains upgrade procedure in more detail.

 When moving from 0.7.x to 0.8.5 do I need to scrub all sstables post
 upgrade?

 NEWS.txt doesn't mention anything about that but your comment here seems
 to indicate so:

 https://issues.apache.org/jira/browse/CASSANDRA-2739?focusedCommentId=13071490page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13071490

 / Jonas




-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: what's the difference between repair CF separately and repair the entire node?

2011-09-14 Thread Anand Somani
On Tue, Sep 13, 2011 at 3:57 PM, Peter Schuller peter.schul...@infidyne.com
 wrote:

  I think it is a serious problem since I can not repair.  I am
  using cassandra on production servers. is there some way to fix it
  without upgrade?  I heard of that 0.8.x is still not quite ready in
  production environment.

 It is a serious issue if you really need to repair one CF at the time.

Why is it serious to repair one CF at a time? If I cannot do it at the
CF level, does that mean I cannot use more than 50% of the disk space? Is
this specific to this problem, or is that a general statement? I ask because
I am planning on doing this so I can limit the maximum disk overhead to roughly
one CF's worth (plus some factor). I am going to be testing this in the next
couple of weeks or so.

 However, looking at your original post it seems this is not
 necessarily your issue. Do you need to, or was your concern rather the
 overall time repair took?

 There are other things that are improved in 0.8 that affect 0.7. In
 particular, (1) in 0.7 compaction, including validating compactions
 that are part of repair, is non-concurrent so if your repair starts
 while there is a long-running compaction going it will have to wait,
 and (2) semi-related is that the merkle tree calculation that is part
 of repair/anti-entropy may happen out of synch if one of the nodes
 participating happen to be busy with compaction. This in turns causes
 additional data to be sent as part of repair.

 That might be why your immediately following repair took a long time,
 but it's difficult to tell.

 If you're having issues with repair and large data sets, I would
 generally say that upgrading to 0.8 is recommended. However, if you're
 on 0.7.4, beware of
 https://issues.apache.org/jira/browse/CASSANDRA-3166

 --
 / Peter Schuller (@scode on twitter)



selective replication

2011-09-14 Thread Todd Burruss
Has anyone done any work on what I'll call selective replication between DCs? 
 I want to use Cassandra to replicate data to another virtual DC (for 
analytical purposes), but only inserts, not deletes.

Picture having two data centers, DC1 for OLTP of short lived data (say 90 day 
window) and DC2 for OLAP (years of data).  DC2 would probably be a Brisk setup.

In this scenario, clients would get/insert/delete from DC1 (the OLTP system) 
and DC1 would replicate inserts only to DC2 (the OLAP system) for analytics.  I 
don't have any experience (yet) with multi-dc replication, but I don't think 
this is possible.

Thoughts?


Re: Exception in Hadoop Word Count sample

2011-09-14 Thread Jonathan Ellis
You're using a 0.8 wordcount against a 0.7 Cassandra?

On Wed, Sep 14, 2011 at 2:19 PM, Tharindu Mathew mcclou...@gmail.com wrote:
 I see $subject. Can anyone help me to rectify this?
 Stacktrace:
 Exception in thread main org.apache.thrift.TApplicationException: Required
 field 'replication_factor' was not found in serialized data! Struct:
 KsDef(name:wordcount,
 strategy_class:org.apache.cassandra.locator.SimpleStrategy,
 strategy_options:{replication_factor=1}, replication_factor:0,
 cf_defs:[CfDef(keyspace:wordcount, name:input_words, column_type:Standard,
 comparator_type:AsciiType, default_validation_class:AsciiType),
 CfDef(keyspace:wordcount, name:output_words, column_type:Standard,
 comparator_type:AsciiType, default_validation_class:AsciiType),
 CfDef(keyspace:wordcount, name:input_words_count, column_type:Standard,
 comparator_type:UTF8Type, default_validation_class:CounterColumnType)])
 at
 org.apache.thrift.TApplicationException.read(TApplicationException.java:108)
 at
 org.apache.cassandra.thrift.Cassandra$Client.recv_system_add_keyspace(Cassandra.java:1531)
 at
 org.apache.cassandra.thrift.Cassandra$Client.system_add_keyspace(Cassandra.java:1514)
 at WordCountSetup.setupKeyspace(Unknown Source)
 at WordCountSetup.main(Unknown Source)
 --
 Regards,

 Tharindu
 blog: http://mackiemathew.com/




-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: selective replication

2011-09-14 Thread Adrian Cockcroft
This has been proposed a few times, there are some good use cases for
it, and there is no current mechanism for it, but it's been discussed
as a possible enhancement.

Adrian

On Wed, Sep 14, 2011 at 11:06 AM, Todd Burruss bburr...@expedia.com wrote:
 Has anyone done any work on what I'll call selective replication between
 DCs?  I want to use Cassandra to replicate data to another virtual DC (for
 analytical purposes), but only inserts, not deletes.
 Picture having two data centers, DC1 for OLTP of short lived data (say 90
 day window) and DC2 for OLAP (years of data).  DC2 would probably be a Brisk
 setup.
 In this scenario, clients would get/insert/delete from DC1 (the OLTP system)
 and DC1 would replicate inserts only to DC2 (the OLAP system) for analytics.
  I don't have any experience (yet) with multi-dc replication, but I don't
 think this is possible.
 Thoughts?


Re: Nodetool removetoken taking days to run.

2011-09-14 Thread Brandon Williams
On Wed, Sep 14, 2011 at 8:54 AM, Ryan Hadley r...@sgizmo.com wrote:
 Hi,

 So, here's the backstory:

 We were running Cassandra 0.7.4 and at one point in time had a node in the 
 ring at 10.84.73.18. We removed this node from the ring successfully in 
 0.7.4. It stopped showing in the nodetool ring command. But occasionally we'd 
 still get weird log entries about failing to write/read to IP 10.84.73.18.

 We upgraded to Cassandra 0.8.4. Now, nodetool ring shows this old node:

 10.84.73.18     datacenter1 rack1       Down   Leaving ?               6.71%  
  32695837177645752437561450928649262701

 So I started a nodetool removetoken on 32695837177645752437561450928649262701 
 last Friday. It's still going strong this morning, on day 5:

 ./bin/nodetool -h 10.84.73.47 -p 8080 removetoken status
 RemovalStatus: Removing token (32695837177645752437561450928649262701). 
 Waiting for replication confirmation from 
 [/10.84.73.49,/10.84.73.48,/10.84.73.51].

 Should I just be patient? Or is something really weird with this node?

5 days seems excessive unless there is a very large amount of data per
node.  I would check nodetool netstats, and if the streams don't look
active issue a 'removetoken force' against 10.84.73.47 and accept that
you may possibly need to run repair to restore the replica count.

-Brandon
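
Concretely, that check and (if needed) the forced removal look like this,
reusing the host from the original message:

./bin/nodetool -h 10.84.73.47 -p 8080 netstats
# if no streams are active for the leaving node:
./bin/nodetool -h 10.84.73.47 -p 8080 removetoken force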


Re: what's the difference between repair CF separately and repair the entire node?

2011-09-14 Thread Peter Schuller
 It is a serious issue if you really need to repair one CF at the time.

 Why is it serious to do repair one CF at a time, if I cannot do that it at a
 CF level, then does it mean that I cannot use more than 50% disk space? Is
 this specific to this problem or is that a general statement? I ask because
 I am planning on doing this so I can limit the max disk overhead to be a CF
 (+ some factor) worth. I am going to be testing this in the next couple of
 weeks or so.

The bug in 0.7 causes data to be streamed for all CFs when doing a
repair on one. So, if you specifically need to repair one CF at
a time, for example because you're trying to repair a small CF quite often
while leaving a huge CF with less frequent repairs, you have an issue.

If you just want to repair the entire keyspace, it doesn't affect you.

I'm not sure how this relates to the 50% disk space bit, though.

-- 
/ Peter Schuller (@scode on twitter)
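
For reference, the per-CF repair that this bug affects is invoked like this on
0.8 (host, keyspace and column family names are placeholders):

# repair only one column family in one keyspace
./bin/nodetool -h node1.example.com -p 8080 repair MyKeyspace SmallCF
# repair the whole keyspace
./bin/nodetool -h node1.example.com -p 8080 repair MyKeyspace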


Re: Index search in provided list of rows (list of rowKeys).

2011-09-14 Thread aaron morton
The way to apply more restrictions to the query is to specify them in the 
index_clause. The index clause is applied to the set of all rows in the 
database, not to a subset; applying it to a subset is implicitly supporting a 
sub-query. Currently it does select then project; this would be select, 
then select, then project.

Right now I would use Solandra, or do the entire search in Sphinx and get the 
row keys for the result documents. In the future you may be able to use this: 
https://issues.apache.org/jira/browse/CASSANDRA-2915
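
For comparison, the built-in secondary-index path ("solution 2" quoted below)
looks roughly like this from cassandra-cli (the column family, column and value
are made up):

create column family Users with comparator = UTF8Type
  and column_metadata = [{column_name: state, validation_class: UTF8Type, index_type: KEYS}];
get Users where state = 'UT';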

Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 15/09/2011, at 12:46 AM, Evgeniy Ryabitskiy wrote:

 Why it's radically?
 
 It will be same get_indexes_slices search but in specified set of rows. So 
 mostly it will be one more Search Expression over rowIDs not only column 
 values. Usually the more restrictions you could specify in search query, the 
 faster search it can be (not slower at least).
 
 About moving to another engine:
 
 Sphinx has it's advantages (quite fast) and disadvantages (painful 
 integration, lot's of limitations). Currently my company using it on 
 production, so moving to another search engine is a big step and it will be 
 considered.
 
 
 What I want to discuss is common task of searching in Cassandra. Maybe I 
 missing some already well known solution for it (silver bullet)?
 I see only 2 solutions:
 
 1) Using external search engine that will index all storage fields
 
 advantage:
  support full text search
 some engines have nice search features like sorting by relevance
 
 disadvantage: 
 for range scans it stores column values, it mean that huge part of cassandra 
 data will be also stored at Search Engine metadata
 usually engines have set of limitations
 
 2) Use Cassandra embedded Indexing search
 advantage: 
 doesn't need to index all columns that are used for filtering. 
 Filtering performed at storage, close to data.
 
 disadvantage: 
 not full text search support
 require to create and maintain secondary indexes.
 
 Both solutions are exclusive, you could choose only one and there is no way 
 to use combination of this 2 solutions (except intersection at client side 
 which is not a solution).
 
 So API that was discussed would open some possibility to use that 
 combination. 
 For me it looks like third solution. Could it really change the way we are 
 searching in Cassandra?
 
 
 Evgeny.
  
 
 



Re: Nodetool removetoken taking days to run.

2011-09-14 Thread Ryan Hadley

On Sep 14, 2011, at 2:08 PM, Brandon Williams wrote:

 On Wed, Sep 14, 2011 at 8:54 AM, Ryan Hadley r...@sgizmo.com wrote:
 Hi,
 
 So, here's the backstory:
 
 We were running Cassandra 0.7.4 and at one point in time had a node in the 
 ring at 10.84.73.18. We removed this node from the ring successfully in 
 0.7.4. It stopped showing in the nodetool ring command. But occasionally 
 we'd still get weird log entries about failing to write/read to IP 
 10.84.73.18.
 
 We upgraded to Cassandra 0.8.4. Now, nodetool ring shows this old node:
 
 10.84.73.18 datacenter1 rack1   Down   Leaving ?   6.71% 
   32695837177645752437561450928649262701
 
 So I started a nodetool removetoken on 
 32695837177645752437561450928649262701 last Friday. It's still going strong 
 this morning, on day 5:
 
 ./bin/nodetool -h 10.84.73.47 -p 8080 removetoken status
 RemovalStatus: Removing token (32695837177645752437561450928649262701). 
 Waiting for replication confirmation from 
 [/10.84.73.49,/10.84.73.48,/10.84.73.51].
 
 Should I just be patient? Or is something really weird with this node?
 
 5 days seems excessive unless there is a very large amount of data per
 node.  I would check nodetool netstats, and if the streams don't look
 active issue a 'removetoken force' against 10.84.73.47 and accept that
 you may possibly need to run repair to restore the replica count.
 
 -Brandon

Hi Brandon,

Thanks for the reply. Quick question though:

1. We write all data to this ring with a TTL of 30 days
2. This node hasn't been in the ring for at least 90 days, more like 120 days 
since it's been in the ring.

So, if I nodetool removetoken forced it, would I still have to be concerned 
about running a repair?

Also, after this node is removed, I'm going to rebalance with nodetool move. 
Would that remove the repair requirement too?

Thanks-
Ryan

Configuring the keyspace correctly - NTS

2011-09-14 Thread Anthony Ikeda
Okay, in a previous post, it was stated that I could use a
NetworkTopologyStrategy in a single data centre by setting up my keyspace
with:

create keyspace KeyspaceDEV

with placement_strategy =
'org.apache.cassandra.locator.NetworkTopologyStrategy'

and strategy_options=[{datacenter1:3}];



Whereby my understanding is that:

[{datacenter1:3}]

represents:


   - 1 Datacentre
   - 3 nodes in that datacentre

My infrastructure team was advised, instead of using datacenter1, to
use the second value in the IP address:
x.130.x.x

[{130:3}]

However, when trying to access the keyspace, the following error was returned:

May not be enough replicas present to handle consistency level
When I rebuilt the keyspace using the datacenter1 semantic, it worked
fine.

My guess is that there is some correlation between the 130 value and
either the rpc_address or listen_address. Am I correct in thinking this?

I don't have access to these configurations, so I'm just going out on a whim
here trying to figure out why using the 130 from the IP address would
cause the error.

Anthony


Re: Get CL ONE / NTS

2011-09-14 Thread aaron morton
Your current approach to Consistency opens the door to some inconsistent 
behavior. 

 1/ Will I have an error because DC2 does not have any copy of the data ? 
If you read from DC2 at CL ONE and the data is not replicated it will not be 
returned. 

 2/ Will Cassandra try to get the data from DC1 if nothing is found in DC2 ?
Not at CL ONE. If you used CL EACH QUORUM then the read will go to all the 
DCs. If DC2 is behind DC1 then you will get the data from DC1. 

 3/ In case of partial replication to DC2, will I see sometimes errors about 
 servers not holding the data in DC2 ?
Depending on the API call and the client, working at CL ONE, you will see 
either errors or missing data. 

 4/ Does Get CL ONE failed as soon as the fastest server to answer tell it 
 does not have the data or does it waits until all servers tell they do not 
 have the data ? 
 yes

Consider:

using LOCAL QUORUM for both writes and reads. This will make things a bit more 
consistent without adding inter-DC overhead to the request latency. It is still 
possible to not get data in DC2 if it is totally disconnected from DC1. 

writing at LOCAL QUORUM and reading at EACH QUORUM. That way you can always read, 
but requests in DC2 will fail if DC1 is not reachable. 

Hope that helps. 

 
-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 15/09/2011, at 1:33 AM, Pierre Chalamet wrote:

 Hello,
 
 I have 2 datacenters. Cassandra is configured as follow:
 - RackInferringSnitch
 - NetworkTopologyStrategy for CF
 - strategy_options: DC1:3 DC2:3
 
 Data are written using CL LOCAL_QUORUM so data written from one datacenter 
 will eventually be replicated to the other datacenter. Data is always written 
 exactly once. 
 
 On the other side, I'd like to improve the read path. I'm using actually the 
 CL ONE since data is only written once (ie: timestamp is more or less 
 meaningless in my case).
 
 This is where I have some doubts: if data is written on DC1 and tentatively 
 read from DC2 while the data is still not replicated or partially replicated 
 (for whatever good reason since replication is async), what is the behavior 
 of Get with CL ONE / NTS ? 
 1/ Will I have an error because DC2 does not have any copy of the data ? 
 2/ Will Cassandra try to get the data from DC1 if nothing is found in DC2 ?
 3/ In case of partial replication to DC2, will I see sometimes errors about 
 servers not holding the data in DC2 ?
 4/ Does Get CL ONE failed as soon as the fastest server to answer tell it 
 does not have the data or does it waits until all servers tell they do not 
 have the data ? 
 
 Thanks a lot,
 - Pierre



RE: Get CL ONE / NTS

2011-09-14 Thread Pierre Chalamet
After reading the Cassandra source code, I will try to answer myself. It's kind
of a good exercise :)

1/ Will I have an error because DC2 does not have any copy of the data ? 
I've not been able to find how endpoints are determined for the read
request, but I guess endpoints are just coming from the current datacenter.

2/ Will Cassandra try to get the data from DC1 if nothing is found in DC2 ?
Probably no since 1/

3/ In case of partial replication to DC2, will I see sometimes errors about
servers not holding the data in DC2 ?
It seems to depend on RR. If read_repair_chance is set to 1 (default value),
RR happens all the time : the answer is no.
In case read_repair_chance is below 1, it seems CL.ONE will fail if the
single read request fails.

4/ Does Get CL ONE failed as soon as the fastest server to answer tell it
does not have the data or does it waits until all servers tell they do not
have the data ?
It seems to depend on RR as in 3/

Are the answers right ?

- Pierre


-Original Message-
From: Pierre Chalamet [mailto:pie...@chalamet.net] 
Sent: Wednesday, September 14, 2011 3:33 PM
To: user@cassandra.apache.org
Subject: Get CL ONE / NTS

Hello,

I have 2 datacenters. Cassandra is configured as follow:
- RackInferringSnitch
- NetworkTopologyStrategy for CF
- strategy_options: DC1:3 DC2:3

Data are written using CL LOCAL_QUORUM so data written from one datacenter
will eventually be replicated to the other datacenter. Data is always
written exactly once. 

On the other side, I'd like to improve the read path. I'm using actually the
CL ONE since data is only written once (ie: timestamp is more or less
meaningless in my case).

This is where I have some doubts: if data is written on DC1 and tentatively
read from DC2 while the data is still not replicated or partially replicated
(for whatever good reason since replication is async), what is the behavior
of Get with CL ONE / NTS ? 
1/ Will I have an error because DC2 does not have any copy of the data ? 
2/ Will Cassandra try to get the data from DC1 if nothing is found in DC2 ?
3/ In case of partial replication to DC2, will I see sometimes errors about
servers not holding the data in DC2 ?
4/ Does Get CL ONE failed as soon as the fastest server to answer tell it
does not have the data or does it waits until all servers tell they do not
have the data ? 

Thanks a lot,
- Pierre



RE: Get CL ONE / NTS

2011-09-14 Thread Pierre Chalamet
Thanks Aaron, I didn't see your answer before mine.

I do agree on 2/, I might have a read error. Good suggestion to use
EACH_QUORUM - it could be a good trade-off to read at this level if ONE
fails.

Maybe using LOCAL_QUORUM might be a good answer and will avoid headaches
after all. Are you advising that CL.ONE is not worth the trouble when considering
read performance?

By the way, I do not have a consistency problem at all - data is only written
once (and if more than once, it is always the same data) and read several times across
DCs. I only have replication problems. That's why I'm more inclined to use
CL.ONE for reads if possible.

Thanks,
- Pierre


-Original Message-
From: aaron morton [mailto:aa...@thelastpickle.com] 
Sent: Wednesday, September 14, 2011 11:48 PM
To: user@cassandra.apache.org; pie...@chalamet.net
Subject: Re: Get CL ONE / NTS

Your current approach to Consistency opens the door to some inconsistent
behavior. 

 1/ Will I have an error because DC2 does not have any copy of the data ? 
If you read from DC2 at CL ONE and the data is not replicated it will not be
returned. 

 2/ Will Cassandra try to get the data from DC1 if nothing is found in DC2
?
Not at CL ONE. If you used CL EACH QUORUM then the read will go to all the
DC's. If DC2 is behind DC1 then you will get the data form DC1. 

 3/ In case of partial replication to DC2, will I see sometimes errors
about servers not holding the data in DC2 ?
Depending on the API call and the client, working at CL ONE, you will see
either errors or missing data. 

 4/ Does Get CL ONE failed as soon as the fastest server to answer tell it
does not have the data or does it waits until all servers tell they do not
have the data ? 
 yes

Consider 

using LOCAL QUORUM for write and read, will make things a bit more
consistent but not add inter DC overhead into the request latency. Still
possible to not get data in DC2 if it is totally disconnected from the DC1 

write at LOCAL QUORUM and read at EACH QUORUM . Will so you can always read,
requests in DC2 will fail if DC1 is not reachable. 

Hope that helps. 

 
-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 15/09/2011, at 1:33 AM, Pierre Chalamet wrote:

 Hello,
 
 I have 2 datacenters. Cassandra is configured as follow:
 - RackInferringSnitch
 - NetworkTopologyStrategy for CF
 - strategy_options: DC1:3 DC2:3
 
 Data are written using CL LOCAL_QUORUM so data written from one datacenter
will eventually be replicated to the other datacenter. Data is always
written exactly once. 
 
 On the other side, I'd like to improve the read path. I'm using actually
the CL ONE since data is only written once (ie: timestamp is more or less
meaningless in my case).
 
 This is where I have some doubts: if data is written on DC1 and
tentatively read from DC2 while the data is still not replicated or
partially replicated (for whatever good reason since replication is async),
what is the behavior of Get with CL ONE / NTS ? 
 1/ Will I have an error because DC2 does not have any copy of the data ? 
 2/ Will Cassandra try to get the data from DC1 if nothing is found in DC2
?
 3/ In case of partial replication to DC2, will I see sometimes errors
about servers not holding the data in DC2 ?
 4/ Does Get CL ONE failed as soon as the fastest server to answer tell it
does not have the data or does it waits until all servers tell they do not
have the data ? 
 
 Thanks a lot,
 - Pierre




Re: Configuring the keyspace correctly - NTS

2011-09-14 Thread aaron morton
The strategy_options for NTS accept the data centre name and the rf, 
[{dc_name : dc_rf}]

Where the DC name comes from the snitch, so…

SimpleSnitch (gotta love this guy, in there day in, day out, putting in the hard 
yards) puts all the nodes in datacenter1, which is why that's in the defaults. 

RackInferringSnitch (or the Hollywood Snitch, as I call it) puts them in a 
DC named after the second octet of the IP. So 130 in your case. 

PropertyFileSnitch does what's in the cassandra-topology.properties file. 
EC2Snitch uses the EC2 region. The Brisk snitch does its thing.

If you want to use 130 you should be using the RackInferringSnitch; if you want 
to use human names, use either the SimpleSnitch or the PropertyFileSnitch. 
The PropertyFileSnitch has a default catch-all DC; see the 
cassandra-topology.properties file. 
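
A minimal sketch of that file (the IPs, DC and rack names are made up):

# cassandra-topology.properties
192.168.1.10=DC1:RAC1
192.168.1.11=DC1:RAC2
10.20.30.40=DC2:RAC1
# nodes not listed above fall into the catch-all
default=DC1:RAC1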

Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 15/09/2011, at 9:43 AM, Anthony Ikeda wrote:

 Okay, in a previous post, it was stated that I could use a 
 NetworkTopologyStrategy in a singel data centre by setting up my keyspace 
 with:
 
 create keyspace KeyspaceDEV
 
 with placement_strategy = 
 'org.apache.cassandra.locator.NetworkTopologyStrategy'
 
 and strategy_options=[{datacenter1:3}];
 
  
 Whereby my understanding is that:
 
 [{datacenter1:3}]
 
 represents:
 
 
 1 Datacentre
 3 nodes in that datacentre
 My infrastructure team were recommended to instead of use datacenter1 to 
 use the second value in the IP address:
 x.130.x.x
 
 [{130:3}]
 
 However, when trying to access the keyspace the following error was return:
 May not be enough replicas present to handle consistency level
 
 When I rebuilt the keyspace using the datacenter1 semantic, it worked fine.
 
 My guess is that there is some correlation between the 130 value and either 
 the rpc_address or listen_address. Am I correct in thinking this?
 
 I don't have access to the se configurations so I'm just going out on a whim 
 here trying to figure out why using the 130 form the IP address would cause 
 the error.
 
 Anthony
 
 
 



Re: Get CL ONE / NTS

2011-09-14 Thread aaron morton
 Are you advising CL.ONE does not worth the game when considering
 read performance ?
Consistency is not performance; it's a whole new thing to tune in your 
application. If you have performance issues, deal with those as performance 
issues: better code / data model / hardware. 

 By the way, I do not have consistency problem at all - data is only written
 once
Nobody expects a consistency problem. It's chief weapon is surprise. Surprise 
and fear. It's two weapons are fear and surprise. And so forth 
http://www.youtube.com/watch?v=Ixgc_FGam3s

If you write at LOCAL QUORUM in DC 1 and DC 2 is down at the start of the 
request, a hint will be stored in DC 1. Some time later when DC 2 comes back 
that hint will be sent to DC 2. If in the mean time you read from DC 2 at CL 
ONE you will not get that change. With Read Repair enabled it will repair in 
the background and you may get a different response on the next read (Am 
guessing here, cannot remember exactly how RR works cross DC) 

 Cheers



-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 15/09/2011, at 10:07 AM, Pierre Chalamet wrote:

 Thanks Aaron, didn't seen your answer before mine.
 
 I do agree for 2/ I might have read error. Good suggestion to use
 EACH_QUORUM  - it could be a good trade off to read at this level if ONE
 fails.
 
 Maybe using LOCAL_QUORUM might be a good answer and will avoid headache
 after all. Are you advising CL.ONE does not worth the game when considering
 read performance ?
 
 By the way, I do not have consistency problem at all - data is only written
 once (and if more it is always the same data) and read several times across
 DC. I only have replication problems. That's why I'm more inclined to use
 CL.ONE for read if possible.
 
 Thanks,
 - Pierre
 
 
 -Original Message-
 From: aaron morton [mailto:aa...@thelastpickle.com] 
 Sent: Wednesday, September 14, 2011 11:48 PM
 To: user@cassandra.apache.org; pie...@chalamet.net
 Subject: Re: Get CL ONE / NTS
 
 Your current approach to Consistency opens the door to some inconsistent
 behavior. 
 
 1/ Will I have an error because DC2 does not have any copy of the data ? 
 If you read from DC2 at CL ONE and the data is not replicated it will not be
 returned. 
 
 2/ Will Cassandra try to get the data from DC1 if nothing is found in DC2
 ?
 Not at CL ONE. If you used CL EACH QUORUM then the read will go to all the
 DC's. If DC2 is behind DC1 then you will get the data form DC1. 
 
 3/ In case of partial replication to DC2, will I see sometimes errors
 about servers not holding the data in DC2 ?
 Depending on the API call and the client, working at CL ONE, you will see
 either errors or missing data. 
 
 4/ Does Get CL ONE failed as soon as the fastest server to answer tell it
 does not have the data or does it waits until all servers tell they do not
 have the data ? 
 yes
 
 Consider 
 
 using LOCAL QUORUM for write and read, will make things a bit more
 consistent but not add inter DC overhead into the request latency. Still
 possible to not get data in DC2 if it is totally disconnected from the DC1 
 
 write at LOCAL QUORUM and read at EACH QUORUM . Will so you can always read,
 requests in DC2 will fail if DC1 is not reachable. 
 
 Hope that helps. 
 
 
 -
 Aaron Morton
 Freelance Cassandra Developer
 @aaronmorton
 http://www.thelastpickle.com
 
 On 15/09/2011, at 1:33 AM, Pierre Chalamet wrote:
 
 Hello,
 
 I have 2 datacenters. Cassandra is configured as follow:
 - RackInferringSnitch
 - NetworkTopologyStrategy for CF
 - strategy_options: DC1:3 DC2:3
 
 Data are written using CL LOCAL_QUORUM so data written from one datacenter
 will eventually be replicated to the other datacenter. Data is always
 written exactly once. 
 
 On the other side, I'd like to improve the read path. I'm using actually
 the CL ONE since data is only written once (ie: timestamp is more or less
 meaningless in my case).
 
 This is where I have some doubts: if data is written on DC1 and
 tentatively read from DC2 while the data is still not replicated or
 partially replicated (for whatever good reason since replication is async),
 what is the behavior of Get with CL ONE / NTS ? 
 1/ Will I have an error because DC2 does not have any copy of the data ? 
 2/ Will Cassandra try to get the data from DC1 if nothing is found in DC2
 ?
 3/ In case of partial replication to DC2, will I see sometimes errors
 about servers not holding the data in DC2 ?
 4/ Does Get CL ONE failed as soon as the fastest server to answer tell it
 does not have the data or does it waits until all servers tell they do not
 have the data ? 
 
 Thanks a lot,
 - Pierre
 
 



Re: Configuring the keyspace correctly - NTS

2011-09-14 Thread Anthony Ikeda
Great, that makes perfect sense - I apologise for not getting this right; it
seems I'm doing someone else's job here.

Anthony


On Wed, Sep 14, 2011 at 3:15 PM, aaron morton aa...@thelastpickle.comwrote:

 The strategy_options for NTS accept the data centre name and the rf,
 [{dc_name : dc_rf}]

 Where the DC name comes from the snitch, so…

 SimpleSnitch (gotta love this guy, in there day in day out putting in the
 hard yards) puts all the nodes in datacenter1 which is why thats in the
 defaults.

 RackInferringSnitch (or the Hollywood Snitch as I call it) puts the them
 in a DC named after the second octet of the IP. So 130 in your  case.

 PropertyFileSnitch does whats in the cassandra-topology.properties file.
 EC2Snitch uses the EC2 Region. Brisk snitch does it's thing.

 If you want to use 130 you should be using the RackInferringSnitch, if you
 want to use human names use either the SimpleSnitch or the
 PropertyFileSnitch. Property File Snitch has a default catch all DC, see the
 cassandra-topology.properties file.

 Cheers

 -
 Aaron Morton
 Freelance Cassandra Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 15/09/2011, at 9:43 AM, Anthony Ikeda wrote:

 Okay, in a previous post, it was stated that I could use a
 NetworkTopologyStrategy in a singel data centre by setting up my keyspace
 with:

 create keyspace KeyspaceDEV

 with placement_strategy =
 'org.apache.cassandra.locator.NetworkTopologyStrategy'

 and strategy_options=[{datacenter1:3}];


 Whereby my understanding is that:

 [{datacenter1:3}]

 represents:


- 1 Datacentre
- 3 nodes in that datacentre

 My infrastructure team were recommended to instead of use datacenter1 to
 use the second value in the IP address:
 x.130.x.x

 [{130:3}]

 However, when trying to access the keyspace the following error was return:

 May not be enough replicas present to handle consistency level
 When I rebuilt the keyspace using the datacenter1 semantic, it worked
 fine.

 My guess is that there is some correlation between the 130 value and
 either the rpc_address or listen_address. Am I correct in thinking this?

 I don't have access to the se configurations so I'm just going out on a
 whim here trying to figure out why using the 130 form the IP address would
 cause the error.

 Anthony







Re: Nodetool removetoken taking days to run.

2011-09-14 Thread Brandon Williams
On Wed, Sep 14, 2011 at 4:25 PM, Ryan Hadley r...@sgizmo.com wrote:
 Hi Brandon,

 Thanks for the reply. Quick question though:

 1. We write all data to this ring with a TTL of 30 days
 2. This node hasn't been in the ring for at least 90 days, more like 120 days 
 since it's been in the ring.

 So, if I nodetool removetoken forced it, would I still have to be concerned 
 about running a repair?

There have probably been some writes that thought that node was part
of the replica set, so you may still be missing a replica in that
regard.  If you're only holding the data for 30 days though, it might
not be worth the trouble of repairing and instead bet that not all of
the live replicas will die in the next month.

 Also, after this node is removed, I'm going to rebalance with nodetool move. 
 Would that remove the repair requirement too?

If you intend to replace the node, it's better to bootstrap the new
node at the dead node's token minus one, and then do the removetoken
force.  This would actually obviate the need to repair (except for one
key, you can move the node to the old token once it has been removed)
assuming that your consistency level was greater than ONE for writes,
or your clients always replayed any failures. This holds true for
moving to the old token as well.

-Brandon


Re: Configuring the keyspace correctly - NTS

2011-09-14 Thread Anthony Ikeda
Aaron, when using the RackInferringSnitch, is the octet taken from the
rpc_address or the listen_address?

I just noticed that when I tried to configure this locally on my laptop I
had to use 0 (127.0.0.1) instead of 160 (192.160.202.235).

Anthony

On Wed, Sep 14, 2011 at 3:15 PM, aaron morton aa...@thelastpickle.comwrote:

 The strategy_options for NTS accept the data centre name and the rf,
 [{dc_name : dc_rf}]

 Where the DC name comes from the snitch, so…

 SimpleSnitch (gotta love this guy, in there day in day out putting in the
 hard yards) puts all the nodes in datacenter1 which is why thats in the
 defaults.

 RackInferringSnitch (or the Hollywood Snitch as I call it) puts the them
 in a DC named after the second octet of the IP. So 130 in your  case.

 PropertyFileSnitch does whats in the cassandra-topology.properties file.
 EC2Snitch uses the EC2 Region. Brisk snitch does it's thing.

 If you want to use 130 you should be using the RackInferringSnitch, if you
 want to use human names use either the SimpleSnitch or the
 PropertyFileSnitch. Property File Snitch has a default catch all DC, see the
 cassandra-topology.properties file.

 Cheers

 -
 Aaron Morton
 Freelance Cassandra Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 15/09/2011, at 9:43 AM, Anthony Ikeda wrote:

 Okay, in a previous post, it was stated that I could use a
 NetworkTopologyStrategy in a singel data centre by setting up my keyspace
 with:

 create keyspace KeyspaceDEV

 with placement_strategy =
 'org.apache.cassandra.locator.NetworkTopologyStrategy'

 and strategy_options=[{datacenter1:3}];


 Whereby my understanding is that:

 [{datacenter1:3}]

 represents:


- 1 Datacentre
- 3 nodes in that datacentre

 My infrastructure team were recommended to instead of use datacenter1 to
 use the second value in the IP address:
 x.130.x.x

 [{130:3}]

 However, when trying to access the keyspace the following error was return:

 May not be enough replicas present to handle consistency level
 When I rebuilt the keyspace using the datacenter1 semantic, it worked
 fine.

 My guess is that there is some correlation between the 130 value and
 either the rpc_address or listen_address. Am I correct in thinking this?

 I don't have access to the se configurations so I'm just going out on a
 whim here trying to figure out why using the 130 form the IP address would
 cause the error.

 Anthony