Re: Cassandra Counters and TTL

2011-11-07 Thread Vlad Paiu

Hello,

Thanks for your answer. See my reply in-line.

On 11/04/2011 01:46 PM, Amit Chavan wrote:


Answers inline.

On Fri, Nov 4, 2011 at 4:59 PM, Vlad Paiu vladp...@opensips.org wrote:


Hello,

I'm a new user of Cassandra and I think it's great.
Still, while developing my app using Cassandra, I got stuck on
some things and I'm not really sure that Cassandra can handle them
at the moment.

So, first of all, does Cassandra allow for Counters and regular
Keys to be located in the same ColumnFamily?

What do you mean when you say regular Keys? If you are hinting at 
columns apart from counters, then the answer is *no*: only counters 
can exist in a CounterColumnFamily and other column families cannot 
hold counters.

Yes, this is what I was asking. Thanks for the answer.



Secondly, is there any way to dynamically set the TTL for a key?
In the sense that I have a key, I initially set it with no TTL,
but after a while I decide that it should expire in 100 seconds.
Can Cassandra do this?

TTL is not for one key, it is for one column.


When I was saying 'Key' I actually meant to say 'column'. It seems I'm not
yet very acquainted with Cassandra terminology. So in the end, can you
dynamically alter the TTL of a column?




3. Can counters have a TTL?

No. Currently, counters do not (or if I am correct - cannot) have TTL.



Ok. Any word on whether this will be implemented anytime soon?


4. Is there any way to atomically reset a counter? I read on the
website that the only way to do it is to read the counter value and
then increment it by -value, which seems rather bogus to me.

I think that is the only way to reset a counter. I would like to know 
if there is another way.


Ok then, waiting for someone to confirm. It's bad that you cannot
atomically reset a counter value, as a two-step reset might lead to
undetermined behaviour.


Also, can I set the counter to a specific value without keeping state
on the client? For example, suppose the client does not know that the
current counter value is 3. Can it set the counter value to 10 without
first getting the counter value and then incrementing by 7?


Background: I have been using Cassandra for the past two months. I hope the
community corrects me if I am wrong.



Regards,

-- 
Vlad Paiu

OpenSIPS Developer




--
Regards
Amit S. Chavan





Regards,

Vlad Paiu
OpenSIPS Developer




Re: Second Cassandra users survey

2011-11-07 Thread Radim Kolar

Take a look at this:

http://www.oracle.com/technetwork/database/nosqldb/overview/index.html

 I understand the limitations/advantages of the architecture.
Read this http://en.wikipedia.org/wiki/CAP_theorem



Re: Modeling big data to allow filtering with a lot of distinct combinations of dimensions, in real time and with no latency

2011-11-07 Thread Alain RODRIGUEZ
Hi again.

Did you receive my mail? It's the first time I've used this mailing list.

If you received it, has anybody faced this problem?

It looks like this subject is going to be discussed at the Cassandra NYC
meeting.

http://www.datastax.com/2011/11/joe-stein-of-medialets-to-speak-at-cassandra-nyc

Any idea of what they are going to say about this subject, or do I have to
wait? Will the video recording of this conference be public?

thanks,

Alain

2011/11/4 Alain RODRIGUEZ arodr...@gmail.com

 Hi all,

 I started this thread in the phpCassa google group, but I think its place
 is here.

 Here is my first post:

 I was wondering about a specific point of Cassandra Modeling.

 If I need to know the number of connections to my website from each
 browser, every hour, I can do:

 Row key: $browser, column key: date('YmdH', $timestamp), value: counter.

 I can increment this counter on every visit; this should work. The point is
 that I want to be able to render the results of a lot of statistics used as
 filters.

 I mean, I will have information such as browser, browser version, screen
 resolution, OS, OS version, localization... And I want to allow users to
 get data (number of views) filtering it as much as they want.

 For example, if I want to know how many people visited my website with
 Safari, Windows, and from New York, every hour, I can store:

 Row key : $browser:$os:$localization, column key : date('YmdH',
 $timestamp), value : counter.

 This can't be the best solution, because according to combinatorial
 mathematics I will have to store n! counters to be able to store data with
 all filters. If I have 10 filters I will increment 3,628,800 counters.
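
 To make the explosion concrete, here is a sketch (Python; the counter CF,
 its pycassa-style add() call, and the dimension names are all hypothetical)
 of what a single page view costs under this model -- one increment per
 non-empty combination of dimensions:

 from itertools import combinations

 def increment_hit(counter_cf, dims, hour):
     # dims is e.g. {'browser': 'safari', 'os': 'windows', 'loc': 'NY'};
     # hour is the column key, date('YmdH', $timestamp) in the PHP above.
     items = sorted(dims.items())
     for r in range(1, len(items) + 1):
         for combo in combinations(items, r):
             row_key = ':'.join(value for _, value in combo)
             counter_cf.add(row_key, hour)  # counter column +1; 2^n - 1 increments per event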

 That's not a good solution, for sure. How am I supposed to model this to
 be able to read data with any filter I want?

 Thanks,

 Alain



 And here is the first answer given (thanks to Tyler Hobbs):

 Technically, the number of potential different counters would be the
 cardinality of each field multiplied together.  (Since one of the fields
 holds a time, this number would continue to grow.) However, in practice
 you'll have far fewer than this number of counters, because not every
 possible combination of these will happen.

 That's not a good solution, for sure. How am I supposed to model this to
 be able to read data with any filter I want?

 It's a reasonable solution if you want to be able to drill down and filter
 by any attribute.  If you want to be able to filter based on all of these
 attributes, you have to store that information about every request in one
 way or another.



 I know it's a non-trivial problem, but I'm sure that some people already
 faced this problem before I did.

 I'll allow users to filter however they want, choosing dimensions with
 checkboxes. They will be able to combine dimensions and ask for any
 combination.

 So, with this solution, I will have to store every event n times, with n =
 number of possible combinations.

 I saw this yesterday : http://t.co/EXL6yAO8 (thanks to Dave Gardner).
 This company seems to do something equivalent to the idea described in my
 first post.

 Any experience to share with this kind of problem?

 thank you,

 Alain




Multiple Keyword Lookup Indexes

2011-11-07 Thread Felix Sprick
Hello,

We are implementing a Cassandra-backed user database. The challenge in
this is that there are four different sorts of user IDs that all need to
be indexed in order to access user data via them quickly. For example,
the user has a unique UUID, but also a LoginName and an email address,
which can all be used for authentication.

How do I model this in Cassandra?

My approach would be to have one main table which is indexed by the
most frequently used lookup value as its row key; let's say this is the
UUID. This table would contain all customer data. Then I would create
an index table for each of the other login alternatives, where I just
reference the UUID. So each alternative login which is not using
the UUID would require two Cassandra queries. Are there any better
approaches to model this?
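
For illustration, the two-query lookup described above as a pycassa sketch
(the keyspace, the CF names 'users' and 'users_by_email', and the column
layout are all hypothetical):

import pycassa

pool = pycassa.ConnectionPool('MyKeyspace', ['localhost:9160'])
users = pycassa.ColumnFamily(pool, 'users')
users_by_email = pycassa.ColumnFamily(pool, 'users_by_email')

def get_user_by_email(email):
    # Query 1: the index row maps the alternate login key to the UUID...
    uuid = users_by_email.get(email)['uuid']
    # Query 2: ...which is the row key of the main users CF.
    return users.get(uuid)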

Also, I read somewhere that Cassandra is not optimized for these
reference tables, which are very short, with only two columns. What is
the reason for that?

thanks,
Felix


Re: Modeling big data to allow filtering with a lot of distinct combinations of dimensions, in real time and with no latency

2011-11-07 Thread Alexander Konotop
On Mon, 7 Nov 2011 11:18:12 +0100,
Alain RODRIGUEZ arodr...@gmail.com wrote:

 Hi again.
 
 Did you receive my mail? It's the first time I've used this mailing list.
 
 If you received it, has anybody faced this problem?
 
 It looks like this subject is going to be discussed at the Cassandra NYC
 meeting.
 
 http://www.datastax.com/2011/11/joe-stein-of-medialets-to-speak-at-cassandra-nyc
 
 Any idea of what they are going to say about this subject, or do I have
 to wait? Will the video recording of this conference be public?
 
 thanks,
 
 Alain
 
 2011/11/4 Alain RODRIGUEZ arodr...@gmail.com
 
  Hi all,
 
  I started this thread in the phpCassa google group, but I think
  its place is here.
 
  Here is my first post:
 
  I was wondering about a specific point of Cassandra Modeling.
 
  If I need to know the number of connections to my website from each
  browser, every hour, I can do:
 
  Row key: $browser, column key: date('YmdH', $timestamp), value:
  counter.
 
  I can increment this counter on every visit; this should work. The
  point is that I want to be able to render the results of a lot of
  statistics used as filters.
 
  I mean, I will have information such as browser, browser version,
  screen resolution, OS, OS version, localization... And I want to
  allow users to get data (number of views) filtering it as much as
  they want.
 
  For example, if I want to know how many people visited my website
  with Safari, Windows, and from New York, every hour, I can store:
 
  Row key : $browser:$os:$localization, column key : date('YmdH',
  $timestamp), value : counter.
 
  This can't be the best solution, because according to combinatorial
  mathematics I will have to store n! counters to be able to store data
  with all filters. If I have 10 filters I will increment 3,628,800
  counters.
 
  That's not a good solution, for sure. How am I supposed to model
  this to be able to read data with any filter I want?
 
  Thanks,
 
  Alain
 
 
 
  And here is the first answer given (thanks to Tyler Hobbs):
 
  Technically, the number of potential different counters would be
  the cardinality of each field multiplied together.  (Since one of
  the fields holds a time, this number would continue to grow.)
  However, in practice you'll have far fewer than this number of
  counters, because not every possible combination of these will
  happen.
 
  That's not a good solution, for sure. How am I supposed to model
  this to be able to read data with any filter I want?
 
  It's a reasonable solution if you want to be able to drill down and
  filter by any attribute.  If you want to be able to filter based on
  all of these attributes, you have to store that information about
  every request in one way or another.
 
 
 
  I know it's a non-trivial problem, but I'm sure that some people
  already faced this problem before I did.
 
  I'll allow users to filter however they want, choosing dimensions with
  checkboxes. They will be able to combine dimensions and ask for any
  combination.
 
  So, with this solution, I will have to store every event n times,
  with n = number of possible combinations.
 
  I saw this yesterday : http://t.co/EXL6yAO8 (thanks to Dave
  Gardner). This company seems to do something equivalent to the idea
  described in my first post.
 
  Any experience to share with this kind of problem?
 
  thank you,
 
  Alain
 
 

Looks like your mail has been received, but for now nobody has an
answer. As for me - I'm a Cassandra newbie and definitely can't help :-(

Best regards
Alexander


Re: Cassandra Counters and TTL

2011-11-07 Thread Sylvain Lebresne
On Mon, Nov 7, 2011 at 10:12 AM, Vlad Paiu vladp...@opensips.org wrote:
 Hello,

 Thanks for your answer. See my reply in-line.

 On 11/04/2011 01:46 PM, Amit Chavan wrote:

 Answers inline.

 On Fri, Nov 4, 2011 at 4:59 PM, Vlad Paiu vladp...@opensips.org wrote:

 Hello,

 I'm a new user of Cassandra and I think it's great.
 Still, while developing my app using Cassandra, I got stuck on some
 things and I'm not really sure that Cassandra can handle them at the moment.

 So, first of all, does Cassandra allow for Counters and regular Keys to be
 located in the same ColumnFamily?

 What do you mean when you say regular Keys? If you are hinting at columns
 apart from counters, then the answer is *no*: only counters can exist in a
 CounterColumnFamily and other column families cannot hold counters.


 Yes, this is what I was asking. Thanks for the answer.

 Secondly, is there any way to dynamically set the TTL for a key? In the
 sense that I have a key, I initially set it with no TTL, but after a while I
 decide that it should expire in 100 seconds. Can Cassandra do this?

 TTL is not for one key, it is for one column.

 When I was saying 'Key' I actually meant to say 'column'. It seems I'm not yet
 very acquainted with Cassandra terminology. So in the end, can you
 dynamically alter the TTL of a column?

You'll have to update the column with the new TTL, which does mean that
you need to know the column value, and so may require reading the column
first.
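
For illustration, a minimal pycassa sketch of this read-then-rewrite (the
keyspace, the CF name 'data', and the key/column names are hypothetical):

import pycassa

pool = pycassa.ConnectionPool('MyKeyspace', ['localhost:9160'])
cf = pycassa.ColumnFamily(pool, 'data')

# Read the current value first, since rewriting requires it...
value = cf.get('some_key', columns=['some_column'])['some_column']
# ...then write it back with the new TTL (in seconds). There is no
# "alter TTL in place" operation; this is a fresh write of the column.
cf.insert('some_key', {'some_column': value}, ttl=100)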




 3. Can counters have a TTL?

 No. Currently, counters do not (or if I am correct - cannot) have TTL.

 Ok. Any word on whether this will be implemented anytime soon?

The current status is "not anytime soon", because we don't have a good solution
for it so far. See https://issues.apache.org/jira/browse/CASSANDRA-2103 for
more details.


 4. Is there any way to atomically reset a counter? I read on the website
 that the only way to do it is to read the counter value and then increment it
 by -value, which seems rather bogus to me.

 I think that is the only way to reset a counter. I would like to know if
 there is another way.

 Ok then, waiting for someone to confirm. It's bad that you cannot atomically
 reset a counter value, as a two-step reset might lead to undetermined
 behaviour.

There is no other way. This does mean that you need some external way
to make sure that no two clients will attempt to reset the same counter at
the same time. Or model your data so that you don't need counter resets (I'm
not saying this is always possible, but there is probably a number of cases
where resetting a counter could be replaced by switching to a brand new
counter).
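
As a sketch of the non-atomic reset being discussed (pycassa; the pool, the
counter CF 'counters', and the row/column names are hypothetical):

import pycassa

pool = pycassa.ConnectionPool('MyKeyspace', ['localhost:9160'])
counters = pycassa.ColumnFamily(pool, 'counters')

# Step 1: read the current value; step 2: add its negation. Any
# increment that lands between the two steps is silently absorbed into
# the "reset" -- exactly the race described above.
current = counters.get('page_hits', columns=['views'])['views']
counters.add('page_hits', 'views', -current)

# The alternative mentioned above: instead of resetting, start a brand
# new counter, e.g. by bumping a generation suffix in the row key:
# counters.add('page_hits:gen2', 'views')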

 Also, can I set the counter to a specific value without keeping state on
 the client? For example, suppose the client does not know that the current
 counter value is 3. Can it set the counter value to 10 without first getting
 the counter value and then incrementing by 7?

No.

--
Sylvain


 Background: I have been using Cassandra for the past two months. I hope the
 community corrects me if I am wrong.


 Regards,

 --
 Vlad Paiu
 OpenSIPS Developer




 --
 Regards
 Amit S. Chavan




 Regards,

 Vlad Paiu
 OpenSIPS Developer



Re: Debian package jna bug workaround

2011-11-07 Thread Peter Tillotson
Thanks - this is working correctly now.
I have checked the classpath in VisualVM and it contains jna, and
cassandra-cli describe reports SerializingCacheProvider.

If you add jna a second time with the Sun JVM you seem to get the exception,
which had me wondering what cache provider was active.

p

--
java.class.path=/usr/share/cassandra/lib/antlr-3.2.jar:/usr/share/cassandra/lib/avro-1.4.0-fixes.jar:/usr/share/cassandra/lib/avro-1.4.0-sources-fixes.jar:/usr/share/cassandra/lib/commons-cli-1.1.jar:/usr/share/cassandra/lib/commons-codec-1.2.jar:/usr/share/cassandra/lib/commons-lang-2.4.jar:/usr/share/cassandra/lib/compress-lzf-0.8.4.jar:/usr/share/cassandra/lib/concurrentlinkedhashmap-lru-1.2.jar:/usr/share/cassandra/lib/guava-r08.jar:/usr/share/cassandra/lib/high-scale-lib-1.1.2.jar:/usr/share/cassandra/lib/jackson-core-asl-1.4.0.jar:/usr/share/cassandra/lib/jackson-mapper-asl-1.4.0.jar:/usr/share/cassandra/lib/jamm-0.2.5.jar:/usr/share/cassandra/lib/jline-0.9.94.jar:/usr/share/cassandra/lib/json-simple-1.1.jar:/usr/share/cassandra/lib/libthrift-0.6.jar:/usr/share/cassandra/lib/log4j-1.2.16.jar:/usr/share/cassandra/lib/servlet-api-2.5-20081211.jar:/usr/share/cassandra/lib/slf4j-api-1.6.1.jar:/usr/share/cassandra/lib/slf4j-log4j12-1.6.1.jar:/usr/share/cassandra/lib/snakeyaml-1.6.jar:/usr/share/cassandra/lib/snappy-java-1.0.3.jar:/usr/share/cassandra/apache-cassandra-1.0.1.jar:/usr/share/cassandra/apache-cassandra-thrift-1.0.1.jar:/usr/share/cassandra/apache-cassandra.jar:/usr/share/java/jna.jar:/etc/cassandra:/usr/share/java/commons-daemon.jar:/usr/share/cassandra/lib/jamm-0.2.5.jar
java.class.version=50.0



From: paul cannon p...@datastax.com
To: user@cassandra.apache.org
Sent: Friday, 4 November 2011, 19:05
Subject: Re: Debian package jna bug workaround


The cassandra-cli tool will show you, if you're using at least cassandra 1.0.1, 
in a describe command.  If not, you can make a thrift describe_keyspace() 
call some other way, and check the value of the appropriate CfDef's 
row_cache_provider string.  If it's SerializingCacheProvider, it's off-heap.  
Note that I think you need to create the columnfamily while JNA is present, not 
just have JNA present when cassandra starts.  Might be wrong on that.

p



On Thu, Nov 3, 2011 at 4:10 PM, Peter Tillotson slatem...@yahoo.co.uk wrote:

Cassandra 1.0.1 and only seemed to happen with
* JAVA_HOME=/usr/lib/jvm/java-6-sun
and jna.jar copied into /usr/share/cassandra(/lib)

I then saw the detail in the init script and how it was being linked.

Is there a way I can verify which provider is being used? I want to make
sure off-heap is being used in the default config.


On 03/11/11 19:06, paul cannon wrote:
 I can't reproduce this. What version of the cassandra deb are you using,
 exactly, and why are you symlinking or copying jna.jar into
 /usr/share/cassandra?  The initscript should be adding
 /usr/share/java/jna.jar to the classpath, and that should be all you need.

 The failure you see with o.a.c.cache.FreeableMemory is not because the
 JRE can't find the class, it's just that it can't initialize the class
 (because it needs JNA, and it can't find JNA).

 p

On Wed, Nov 2, 2011 at 4:42 AM, Peter Tillotson slatem...@yahoo.co.uk wrote:

     see below
      * JAVA_HOME=/usr/lib/jvm/java-6-openjdk
     works
     --
     Reading the documentation over at Datastax
     “The Debian and RPM packages of Cassandra install JNA automatically”
     
 http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-0-improved-memory-and-disk-space-management

     And indeed the Debian package depends on jna, and the
     /etc/init.d/cassandra looks as though it adds
     /usr/share/java/jna.jar to the classpath - and here is the but.

     If I copy or symlink jna.jar into:
      * /usr/share/cassandra
      * /usr/share/cassandra/lib
     Whenever a column family initialises I get:
     java.lang.NoClassDefFoundError: Could not initialize class
     org.apache.cassandra.cache.FreeableMemory

     This suggests to me that:
      1) By default, for me at least, in Debian jna.jar is not on the
     classpath
      2) There is an additional classpath issue
          jar -tf apache-cassandra.jar | grep FreeableMemory succeeds

     I'm running on:
      * Ubuntu 10.04 x64
      * JAVA_HOME=/usr/lib/jvm/java-6-sun

     Full stack traces:
     java.lang.NoClassDefFoundError: Could not initialize class
     com.sun.jna.Native
             at com.sun.jna.Pointer.<clinit>(Pointer.java:42)
             at org.apache.cassandra.cache.SerializingCache.serialize(SerializingCache.java:92)
             at org.apache.cassandra.cache.SerializingCache.put(SerializingCache.java:154)
             at org.apache.cassandra.cache.InstrumentingCache.put(InstrumentingCache.java:63)
             at org.apache.cassandra.db.ColumnFamilyStore.cacheRow(ColumnFamilyStore.java:1150)
             at ...

Re: Will writes with ALL consistency eventually propagate?

2011-11-07 Thread Riyad Kalla
Anthony and Jaydeep, thank you for weighing in. I am glad to see that they
are two different values (makes more sense mentally to me).

Anthony, what you said caught my attention: "to ensure all nodes have a copy
you may not be able to survive the loss of a single node" -- why would
this be the case?

I assumed (incorrectly?) that a node would simply disappear off the map
until I could bring it back up again, at which point all the missing values
that it didn't get while it was down, it would slowly retrieve from other
members of the ring. Is this the wrong understanding?

If forcing a replication factor equal to the number of nodes in my ring
will cause a hard stop when one node goes down (as I understood your
comment to mean), it seems to me I should go with a much lower replication
factor... something along the lines of 3, or roughly ceiling(N / 2), and just
deal with the latency when one of the nodes has to route a request to
another server when it doesn't contain the value.

Is there a better way to accomplish what I want, or is keeping the
replication factor that aggressively high generally a bad thing and using
Cassandra in the wrong way?

Thank you for the help.

-Riyad

On Sun, Nov 6, 2011 at 11:14 PM, chovatia jaydeep 
chovatia_jayd...@yahoo.co.in wrote:

 Hi Riyad,

 You can set replication = 5 (number of replicas) and write with CL = ONE.
 There is no hard requirement from Cassandra to write with CL=ALL to
 replicate the data unless you need it. Considering your example, if you
 write with CL=ONE it will still replicate your data to all 5 replicas
 eventually.

 Thank you,
 Jaydeep
 --
 *From:* Riyad Kalla rka...@gmail.com
 *To:* user@cassandra.apache.org user@cassandra.apache.org
 *Sent:* Sunday, 6 November 2011 9:50 PM
 *Subject:* Will writes with  ALL consistency eventually propagate?

 I am new to Cassandra and was curious about the following scenario...

 Let's say I have a ring of 5 servers. Ultimately I would like each server
 to be a full replication of the next (master-master-*).

 In a presentation I watched today on Cassandra, the presenter mentioned
 that the ring members will shard data and route your requests to the right
 host when they come in to a server that doesn't physically contain the value
 you wanted. To the requesting client this is seamless, except for the added
 latency.

 If I wanted to avoid the routing and latency and ensure every server had
 the full data set, do I have to write with a consistency level of ALL and
 wait for all of those writes to return in my code, or can I write with a CL
 of 1 or 2 and let the ring propagate the rest of the copies to the other
 servers in the background after my code has continued executing?

 I don't mind eventual consistency in my case, but I do (eventually) want
 all nodes to have all values, and cannot tell if this is the default behavior,
 or if sharding is the default and I can only force duplicates onto the
 other servers explicitly with a CL of ALL.

 Best,
 Riyad




Re: Will writes with ALL consistency eventually propagate?

2011-11-07 Thread Anthony Ikeda
Riyad, I'm also just getting to know the different settings and values myself :)

I believe, and it also depends on your config, that CL.ONE should ignore the
loss of a node if your RF is 5; once you increase the CL, then if you lose a
node the CL is not met and you will get exceptions returned.

Sent from my iPhone

On 07/11/2011, at 4:32, Riyad Kalla rka...@gmail.com wrote:

 Anthony and Jaydeep, thank you for weighing in. I am glad to see that they 
 are two different values (makes more sense mentally to me).
 
 Anthony, what you said caught my attention: "to ensure all nodes have a copy
 you may not be able to survive the loss of a single node" -- why would this
 be the case?
 
 I assumed (incorrectly?) that a node would simply disappear off the map until
 I could bring it back up again, at which point all the missing values that it
 didn't get while it was down, it would slowly retrieve from other members of
 the ring. Is this the wrong understanding?
 
 If forcing a replication factor equal to the number of nodes in my ring will
 cause a hard stop when one node goes down (as I understood your comment to
 mean), it seems to me I should go with a much lower replication factor...
 something along the lines of 3, or roughly ceiling(N / 2), and just deal with
 the latency when one of the nodes has to route a request to another server
 when it doesn't contain the value.
 
 Is there a better way to accomplish what I want, or is keeping the 
 replication factor that aggressively high generally a bad thing and using 
 Cassandra in the wrong way?
 
 Thank you for the help.
 
 -Riyad
 
 On Sun, Nov 6, 2011 at 11:14 PM, chovatia jaydeep 
 chovatia_jayd...@yahoo.co.in wrote:
 Hi Riyad,
 
 You can set replication = 5 (number of replicas) and write with CL = ONE.
 There is no hard requirement from Cassandra to write with CL=ALL to replicate
 the data unless you need it. Considering your example, if you write with
 CL=ONE it will still replicate your data to all 5 replicas eventually.
 
 Thank you,
 Jaydeep
 From: Riyad Kalla rka...@gmail.com
 To: user@cassandra.apache.org user@cassandra.apache.org
 Sent: Sunday, 6 November 2011 9:50 PM
 Subject: Will writes with  ALL consistency eventually propagate?
 
 I am new to Cassandra and was curious about the following scenario...
 
 Let's say I have a ring of 5 servers. Ultimately I would like each server to
 be a full replication of the next (master-master-*).

 In a presentation I watched today on Cassandra, the presenter mentioned that
 the ring members will shard data and route your requests to the right host
 when they come in to a server that doesn't physically contain the value you
 wanted. To the requesting client this is seamless, except for the added
 latency.
 
 If I wanted to avoid the routing and latency and ensure every server had the
 full data set, do I have to write with a consistency level of ALL and wait
 for all of those writes to return in my code, or can I write with a CL of 1
 or 2 and let the ring propagate the rest of the copies to the other servers
 in the background after my code has continued executing?

 I don't mind eventual consistency in my case, but I do (eventually) want all
 nodes to have all values, and cannot tell if this is the default behavior, or
 if sharding is the default and I can only force duplicates onto the other
 servers explicitly with a CL of ALL.
 
 Best,
 Riyad
 
 


Re: Second Cassandra users survey

2011-11-07 Thread Daniel Doubleday
Allow for deterministic / manual sharding of rows.

Right now it seems that there is no way to force rows with different row keys
to be stored on the same nodes in the ring.
This is our number one reason why we get data inconsistencies when nodes fail.

Sometimes a logical transaction requires writing rows with different row keys.
If we could use something like this:

prefix.uniquekey

and let the partitioner use only the prefix, the probability that only part of
the transaction would be written could be reduced considerably.
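
A sketch of the routing rule being proposed -- not a real Cassandra
partitioner, just the hashing it would apply (Python; the key names are
invented, and MD5 is only an example hash):

import hashlib

def placement_token(row_key):
    # 'prefix.uniquekey' -> hash only 'prefix'
    prefix = row_key.split('.', 1)[0]
    return int(hashlib.md5(prefix.encode()).hexdigest(), 16)

# Both rows of one logical transaction get the same token,
# hence the same replica nodes:
assert placement_token('order42.header') == placement_token('order42.items')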



On Nov 1, 2011, at 11:59 PM, Jonathan Ellis wrote:

 Hi all,
 
 Two years ago I asked for Cassandra use cases and feature requests.
 [1]  The results [2] have been extremely useful in setting and
 prioritizing goals for Cassandra development.  But with the release of
 1.0 we've accomplished basically everything from our original wish
 list. [3]
 
 I'd love to hear from modern Cassandra users again, especially if
 you're usually a quiet lurker.  What does Cassandra do well?  What are
 your pain points?  What's your feature wish list?
 
 As before, if you're in stealth mode or don't want to say anything in
 public, feel free to reply to me privately and I will keep it off the
 record.
 
 [1] 
 http://www.mail-archive.com/cassandra-dev@incubator.apache.org/msg01148.html
 [2] 
 http://www.mail-archive.com/cassandra-user@incubator.apache.org/msg01446.html
 [3] http://www.mail-archive.com/dev@cassandra.apache.org/msg01524.html
 
 -- 
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of DataStax, the source for professional Cassandra support
 http://www.datastax.com



Re: Will writes with ALL consistency eventually propagate?

2011-11-07 Thread Riyad Kalla
Ah! Ok, I was interpreting what you were saying to mean that if my RF was
too high, then the ring would die if I lost one node.

Ultimately what I want (I think) is:

Replication Factor: 5 (aka all of my nodes)
Consistency Level: 2

Put another way, when I write a value, I want it to exist on two servers
*at least* before I consider that write successful enough for my code to
continue, but in the background I would like Cassandra to keep copying that
value around at its leisure until all the ring nodes know about it.
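
That "on two servers at least" write maps to ConsistencyLevel.TWO; a hedged
pycassa sketch (keyspace, CF, key, and column names are all invented):

import pycassa
from pycassa import ConsistencyLevel

pool = pycassa.ConnectionPool('MyKeyspace', ['localhost:9160'])
cf = pycassa.ColumnFamily(pool, 'data')

# Returns once two replicas have acknowledged; with RF=5 the remaining
# three copies are propagated in the background.
cf.insert('some_key', {'some_column': 'some_value'},
          write_consistency_level=ConsistencyLevel.TWO)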

This sounds like what I need. Thanks for pointing me in the right direction.

Best,
Riyad

On Mon, Nov 7, 2011 at 5:47 AM, Anthony Ikeda
anthony.ikeda@gmail.com wrote:

 Riyad, I'm also just getting to know the different settings and values
 myself :)

 I believe, and it also depends on your config, that CL.ONE should ignore the
 loss of a node if your RF is 5; once you increase the CL, then if you lose a
 node the CL is not met and you will get exceptions returned.

 Sent from my iPhone

 On 07/11/2011, at 4:32, Riyad Kalla rka...@gmail.com wrote:

 Anthony and Jaydeep, thank you for weighing in. I am glad to see that they
 are two different values (makes more sense mentally to me).

 Anthony, what you said caught my attention: "to ensure all nodes have a
 copy you may not be able to survive the loss of a single node" -- why
 would this be the case?

 I assumed (incorrectly?) that a node would simply disappear off the map
 until I could bring it back up again, at which point all the missing values
 that it didn't get while it was down, it would slowly retrieve from other
 members of the ring. Is this the wrong understanding?

 If forcing a replication factor equal to the number of nodes in my ring
 will cause a hard stop when one node goes down (as I understood your
 comment to mean), it seems to me I should go with a much lower replication
 factor... something along the lines of 3, or roughly ceiling(N / 2), and just
 deal with the latency when one of the nodes has to route a request to
 another server when it doesn't contain the value.

 Is there a better way to accomplish what I want, or is keeping the
 replication factor that aggressively high generally a bad thing and using
 Cassandra in the wrong way?

 Thank you for the help.

 -Riyad

 On Sun, Nov 6, 2011 at 11:14 PM, chovatia jaydeep 
 chovatia_jayd...@yahoo.co.in wrote:

 Hi Riyad,

 You can set replication = 5 (number of replicas) and write with CL = ONE.
 There is no hard requirement from Cassandra to write with CL=ALL to
 replicate the data unless you need it. Considering your example, if you
 write with CL=ONE it will still replicate your data to all 5 replicas
 eventually.

 Thank you,
 Jaydeep
 --
 *From:* Riyad Kalla rka...@gmail.com
 *To:* user@cassandra.apache.org user@cassandra.apache.org
 *Sent:* Sunday, 6 November 2011 9:50 PM
 *Subject:* Will writes with  ALL consistency eventually propagate?

 I am new to Cassandra and was curious about the following scenario...

 Let's say I have a ring of 5 servers. Ultimately I would like each server
 to be a full replication of the next (master-master-*).

 In a presentation I watched today on Cassandra, the presenter mentioned
 that the ring members will shard data and route your requests to the right
 host when they come in to a server that doesn't physically contain the value
 you wanted. To the requesting client this is seamless, except for the added
 latency.

 If I wanted to avoid the routing and latency and ensure every server had
 the full data set, do I have to write with a consistency level of ALL and
 wait for all of those writes to return in my code, or can I write with a CL
 of 1 or 2 and let the ring propagate the rest of the copies to the other
 servers in the background after my code has continued executing?

 I don't mind eventual consistency in my case, but I do (eventually) want
 all nodes to have all values, and cannot tell if this is the default behavior,
 or if sharding is the default and I can only force duplicates onto the
 other servers explicitly with a CL of ALL.

 Best,
 Riyad





Re: Second Cassandra users survey

2011-11-07 Thread Peter Lin
This feature interests me, so I thought I'd add some comments.

Having used partition features in existing databases like DB2, Oracle
and manual partitioning, one of the biggest challenges is keeping the
partitions balanced. What I've seen with manual partitioning is that
often the partitions get unbalanced. Usually the developers take a
best guess and hope it ends up balanced.

Some of the approaches I've used in the past were zip code, area code,
state and some kind of hash.

So my question related to deterministic sharding is this: what rebalance
feature(s) would be useful or needed once the partitions get
unbalanced?

Without a decent plan for rebalancing, it often ends up being a very
painful problem to solve in production. Back when I worked on mobile
apps, we saw issues with how OpenWave WAP servers partitioned the
accounts. The early versions randomly assigned a phone to a server
when it was provisioned the first time. Once the phone was associated
with that server, it was stuck on that server. If the load on that
server was heavier than the others, the only choice was to scale up
the hardware.

My understanding of Cassandra's current sharding is consistent and
random. Does the new feature sit somewhere in between? Are you
thinking of a pluggable API so that you can provide your own hash
algorithm for Cassandra to use?



On Mon, Nov 7, 2011 at 7:54 AM, Daniel Doubleday
daniel.double...@gmx.net wrote:
 Allow for deterministic / manual sharding of rows.

 Right now it seems that there is no way to force rows with different row
 keys to be stored on the same nodes in the ring.
 This is our number one reason why we get data inconsistencies when nodes fail.

 Sometimes a logical transaction requires writing rows with different row
 keys. If we could use something like this:

 prefix.uniquekey

 and let the partitioner use only the prefix, the probability that only part
 of the transaction would be written could be reduced considerably.



 On Nov 1, 2011, at 11:59 PM, Jonathan Ellis wrote:

 Hi all,

 Two years ago I asked for Cassandra use cases and feature requests.
 [1]  The results [2] have been extremely useful in setting and
 prioritizing goals for Cassandra development.  But with the release of
 1.0 we've accomplished basically everything from our original wish
 list. [3]

 I'd love to hear from modern Cassandra users again, especially if
 you're usually a quiet lurker.  What does Cassandra do well?  What are
 your pain points?  What's your feature wish list?

 As before, if you're in stealth mode or don't want to say anything in
 public, feel free to reply to me privately and I will keep it off the
 record.

 [1] 
 http://www.mail-archive.com/cassandra-dev@incubator.apache.org/msg01148.html
 [2] 
 http://www.mail-archive.com/cassandra-user@incubator.apache.org/msg01446.html
 [3] http://www.mail-archive.com/dev@cassandra.apache.org/msg01524.html

 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of DataStax, the source for professional Cassandra support
 http://www.datastax.com




order of output in get_slice

2011-11-07 Thread Roland Hänel
Does a call to

list<ColumnOrSuperColumn> get_slice(binary key, ColumnParent
column_parent, SlicePredicate predicate, ConsistencyLevel
consistency_level)

give us any guarantees on the order of the returned list? I understand that
when the predicate actually contains a SliceRange, then the order _is_
guaranteed to be increasing (decreasing if the reversed flag is set). But
when the predicate contains a list of column names instead of a range, do
we also have the guarantee that the order is increasing (no decreasing
option, because there is no reversed flag here)?
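
For reference, the two predicate shapes being contrasted, written with the
Thrift-generated Python types (the module path assumes the copy bundled
with pycassa; field names are from the Cassandra Thrift IDL):

from pycassa.cassandra.ttypes import SlicePredicate, SliceRange

# Range form: result order follows the comparator (reversed if set)...
by_range = SlicePredicate(slice_range=SliceRange(
    start='', finish='', reversed=False, count=100))

# ...name-list form: the question above is whether this one is also
# returned in comparator order.
by_names = SlicePredicate(column_names=['colA', 'colB', 'colC'])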

Greetings,
Roland


Determining Strategy options for a particular Strategy class

2011-11-07 Thread Dave Brosius
Is there a programmatic way to determine what the valid 'keys' are for 
the strategy options for a particular strategy class?


Re: Second Cassandra users survey

2011-11-07 Thread Flavio Baronti

We are using Cassandra for time series storage.
Strong points: write performance.
Pain points: dynamically adding column families as new time series come in. This caused a lot of headaches, mismatches
between nodes, etc. In the end we just put everything together in a single (huge) column family.
Wish list: A decent GUI to explore data kept in Cassandra would be very valuable. It should also be extensible, to
provide viewers for custom data.



On 11/1/2011 23:59 PM, Jonathan Ellis wrote:

Hi all,

Two years ago I asked for Cassandra use cases and feature requests.
[1]  The results [2] have been extremely useful in setting and
prioritizing goals for Cassandra development.  But with the release of
1.0 we've accomplished basically everything from our original wish
list. [3]

I'd love to hear from modern Cassandra users again, especially if
you're usually a quiet lurker.  What does Cassandra do well?  What are
your pain points?  What's your feature wish list?

As before, if you're in stealth mode or don't want to say anything in
public, feel free to reply to me privately and I will keep it off the
record.

[1] http://www.mail-archive.com/cassandra-dev@incubator.apache.org/msg01148.html
[2] 
http://www.mail-archive.com/cassandra-user@incubator.apache.org/msg01446.html
[3] http://www.mail-archive.com/dev@cassandra.apache.org/msg01524.html





Re: Will writes with ALL consistency eventually propagate?

2011-11-07 Thread Stephen Connolly
Consistency Level is a pseudo-enum...

you have the choice between

ONE
Quorum (and there are different types of this)
ALL

At CL=ONE, only one node is guaranteed to have got the write if the
operation is a success.
At CL=ALL, all nodes that the RF says it should be stored at must
confirm the write before the operation succeeds, but a partial write
will succeed eventually if at least one node recorded the write
At CL=QUORUM, at least ((N/2)+1) nodes must confirm the write for the
operation to succeed, otherwise failure, but a partial write will
succeed eventually if at least one node recorded the write.

Read repair will eventually ensure that the write is replicated across
all RF nodes in the cluster.

The N in QUORUM above depends on the type of QUORUM you choose, in
general think N=RF unless you choose a fancy QUORUM.

To have a consistent read, CL of write + CL of read must be > RF...

Write at ONE, read at ONE = may not get the most recent write if RF >
1 [fastest write, fastest read] {data loss possible if node lost
before read repair}
Write at QUORUM, read at ONE = consistent read [moderate write,
fastest read] {multiple nodes must be lost for data loss to be
possible}
Write at ALL, read at ONE = consistent read, writes may be blocked if
any node fails [slowest write, fastest read]

Write at ONE, read at QUORUM = may not get the most recent write if
RF > 2 [fastest write, moderate read]  {data loss possible if node
lost before read repair}
Write at QUORUM, read at QUORUM = consistent read [moderate write,
moderate read] {multiple nodes must be lost for data loss to be
possible}
Write at ALL, read at QUORUM = consistent read, writes may be blocked
if any node fails [slowest write, moderate read]

Write at ONE, read at ALL = consistent read, reads may fail if any
node fails [fastest write, slowest read] {data loss possible if node
lost before read repair}
Write at QUORUM, read at ALL = consistent read, reads may fail if any
node fails [moderate write, slowest read] {multiple nodes must be lost
for data loss to be possible}
Write at ALL, read at ALL = consistent read, writes may be blocked if
any node fails, reads may fail if any node fails [slowest write,
slowest read]

Note: You can choose the CL for each and every operation. This is
something that you should design into your application (unless you
exclusively use QUORUM for all operations, in which case you are
advised to bake the logic in, but it is less necessary)

The other thing to remember is that RF does not have to equal the
number of nodes in your cluster... in fact I would recommend designing
your app on the basis that RF < number of nodes in your cluster...
because at some point, when your data set grows big enough, you will
end up with RF < number of nodes.
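
The W + R > RF rule above can be checked with a few lines of arithmetic
(plain Python; the counts are numbers of replicas that must acknowledge):

def quorum(rf):
    # QUORUM acknowledges (RF/2)+1 replicas
    return rf // 2 + 1

def read_is_consistent(write_acks, read_acks, rf):
    # Consistent when the read replica set must overlap the write set
    return write_acks + read_acks > rf

rf = 3
assert read_is_consistent(quorum(rf), quorum(rf), rf)  # QUORUM + QUORUM
assert read_is_consistent(rf, 1, rf)                   # ALL + ONE
assert not read_is_consistent(1, 1, rf)                # ONE + ONE: may be stale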

-Stephen

On 7 November 2011 13:03, Riyad Kalla rka...@gmail.com wrote:
 Ah! Ok, I was interpreting what you were saying to mean that if my RF was too
 high, then the ring would die if I lost one node.
 Ultimately what I want (I think) is:
 Replication Factor: 5 (aka all of my nodes)
 Consistency Level: 2
 Put another way, when I write a value, I want it to exist on two servers *at
 least* before I consider that write successful enough for my code to
 continue, but in the background I would like Cassandra to keep copying that
 value around at its leisure until all the ring nodes know about it.
 This sounds like what I need. Thanks for pointing me in the right direction.
 Best,
 Riyad

 On Mon, Nov 7, 2011 at 5:47 AM, Anthony Ikeda anthony.ikeda@gmail.com
 wrote:

 Riyad, I'm also just getting to know the different settings and values
 myself :)
 I believe, and it also depends on your config, that CL.ONE should ignore the
 loss of a node if your RF is 5; once you increase the CL, then if you lose a
 node the CL is not met and you will get exceptions returned.
 Sent from my iPhone
 On 07/11/2011, at 4:32, Riyad Kalla rka...@gmail.com wrote:

 Anthony and Jaydeep, thank you for weighing in. I am glad to see that they
 are two different values (makes more sense mentally to me).
 Anthony, what you said caught my attention: "to ensure all nodes have a
 copy you may not be able to survive the loss of a single node" -- why would
 this be the case?
 I assumed (incorrectly?) that a node would simply disappear off the map
 until I could bring it back up again, at which point all the missing values
 that it didn't get while it was down, it would slowly retrieve from other
 members of the ring. Is this the wrong understanding?
 If forcing a replication factor equal to the number of nodes in my ring
 will cause a hard stop when one node goes down (as I understood your comment
 to mean), it seems to me I should go with a much lower replication factor...
 something along the lines of 3, or roughly ceiling(N / 2), and just deal with
 the latency when one of the nodes has to route a request to another server
 when it doesn't contain the value.
 Is there a better way to accomplish what I want, or is keeping the
 

Re: Multiple Keyword Lookup Indexes

2011-11-07 Thread Benoit Perroud
You could directly use secondary indexes on the other fields instead
of maintaining your indexes yourself:

Define your global ID (it can be a UUID), and have columns loginName, email,
etc. with a secondary index on each. Retrieval will then be fast.
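
A sketch of this with pycassa's secondary-index helpers (it assumes a
hypothetical 'users' CF whose 'email' column carries a KEYS index):

import pycassa
from pycassa.index import create_index_clause, create_index_expression

pool = pycassa.ConnectionPool('MyKeyspace', ['localhost:9160'])
users = pycassa.ColumnFamily(pool, 'users')

# Equivalent of: WHERE email = 'someone@example.com', one query,
# served by the secondary index instead of a hand-built index table.
expr = create_index_expression('email', 'someone@example.com')
clause = create_index_clause([expr], count=1)
matches = list(users.get_indexed_slices(clause))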

2011/11/7 Felix Sprick fspr...@gmail.com:
 Hello,

 We are implementing a Cassandra-backed user database. The challenge in
 this is that there are four different sorts of user IDs that all need to
 be indexed in order to access user data via them quickly. For example,
 the user has a unique UUID, but also a LoginName and an email address,
 which can all be used for authentication.

 How do I model this in Cassandra?

 My approach would be to have one main table which is indexed by the
 most frequently used lookup value as its row key; let's say this is the
 UUID. This table would contain all customer data. Then I would create
 an index table for each of the other login alternatives, where I just
 reference the UUID. So each alternative login which is not using
 the UUID would require two Cassandra queries. Are there any better
 approaches to model this?

 Also, I read somewhere that Cassandra is not optimized for these
 reference tables, which are very short, with only two columns. What is
 the reason for that?

 thanks,
 Felix




-- 
sent from my Nokia 3210


Re: Will writes with ALL consistency eventually propagate?

2011-11-07 Thread Riyad Kalla
Stephen,

Excellent breakdown; I appreciate all the detail.

Your last comment about RF being smaller than N (number of nodes) -- in my
particular case my data set isn't particularly large (a few GB) and is
distributed globally across a handful of data centers. What I am utilizing
Cassandra for is the replication in order to minimize latency for requests.

So when a request comes into any location, I want each node in the ring to
contain the full data set so it never needs to defer to another member of
the ring to answer a question (even if this means eventual consistency,
which is alright in my case).

Given that, the way I've understood this discussion so far is that I would
have an RF of N (my total node count), but my Consistency Level with all my
writes will *likely* be QUORUM -- I think that is a good/safe default for me
to use, as writes aren't the scenario I need to optimize for latency; that
being said, I also don't want to wait for a ConsistencyLevel of ALL to
complete before my code continues.

Would you agree with this assessment or am I missing the boat on something?

Best,
Riyad

On Mon, Nov 7, 2011 at 7:42 AM, Stephen Connolly 
stephen.alan.conno...@gmail.com wrote:

 Consistency Level is a pseudo-enum...

 you have the choice between

 ONE
 Quorum (and there are different types of this)
 ALL

 At CL=ONE, only one node is guaranteed to have got the write if the
 operation is a success.
 At CL=ALL, all nodes that the RF says it should be stored at must
 confirm the write before the operation succeeds, but a partial write
 will succeed eventually if at least one node recorded the write
 At CL=QUORUM, at least ((N/2)+1) nodes must confirm the write for the
 operation to succeed, otherwise failure, but a partial write will
 succeed eventually if at least one node recorded the write.

 Read repair will eventually ensure that the write is replicated across
 all RF nodes in the cluster.

 The N in QUORUM above depends on the type of QUORUM you choose, in
 general think N=RF unless you choose a fancy QUORUM.

 To have a consistent read, CL of write + CL of read must be > RF...

 Write at ONE, read at ONE = may not get the most recent write if RF >
 1 [fastest write, fastest read] {data loss possible if node lost
 before read repair}
 Write at QUORUM, read at ONE = consistent read [moderate write,
 fastest read] {multiple nodes must be lost for data loss to be
 possible}
 Write at ALL, read at ONE = consistent read, writes may be blocked if
 any node fails [slowest write, fastest read]

 Write at ONE, read at QUORUM = may not get the most recent write if
 RF > 2 [fastest write, moderate read]  {data loss possible if node
 lost before read repair}
 Write at QUORUM, read at QUORUM = consistent read [moderate write,
 moderate read] {multiple nodes must be lost for data loss to be
 possible}
 Write at ALL, read at QUORUM = consistent read, writes may be blocked
 if any node fails [slowest write, moderate read]

 Write at ONE, read at ALL = consistent read, reads may fail if any
 node fails [fastest write, slowest read] {data loss possible if node
 lost before read repair}
 Write at QUORUM, read at ALL = consistent read, reads may fail if any
 node fails [moderate write, slowest read] {multiple nodes must be lost
 for data loss to be possible}
 Write at ALL, read at ALL = consistent read, writes may be blocked if
 any node fails, reads may fail if any node fails [slowest write,
 slowest read]

 Note: You can choose the CL for each and every operation. This is
 something that you should design into your application (unless you
 exclusively use QUORUM for all operations, in which case you are
 advised to bake the logic in, but it is less necessary)

 The other thing to remember is that RF does not have to equal the
 number of nodes in your cluster... in fact I would recommend designing
 your app on the basis that RF < number of nodes in your cluster...
 because at some point, when your data set grows big enough, you will
 end up with RF < number of nodes.

 -Stephen

 On 7 November 2011 13:03, Riyad Kalla rka...@gmail.com wrote:
  Ah! Ok, I was interpreting what you were saying to mean that if my RF was
  too high, then the ring would die if I lost one node.
  Ultimately what I want (I think) is:
  Replication Factor: 5 (aka all of my nodes)
  Consistency Level: 2
  Put another way, when I write a value, I want it to exist on two servers
 *at
  least* before I consider that write successful enough for my code to
  continue, but in the background I would like Cassandra to keep copying
 that
  value around at its leisure until all the ring nodes know about it.
  This sounds like what I need. Thanks for pointing me in the right
 direction.
  Best,
  Riyad
 
  On Mon, Nov 7, 2011 at 5:47 AM, Anthony Ikeda 
 anthony.ikeda@gmail.com
  wrote:
 
  Riyad, I'm also just getting to know the different settings and values
  myself :)
  I believe, and it also depends on your config, that CL.ONE should ignore the
  loss of a node 

Counters and replication factor

2011-11-07 Thread Alain RODRIGUEZ
Hi,

I'm trying to switch from RF = 1 to RF = 3, but I get wrong values from
counters when doing so...

I have a CF that contains many counters of some events. When I'm at RF = 1
and simulate 10 events, they are counted correctly.
However, when I switch to RF = 3, my counter shows a wrong value that
sometimes changes when requested twice (it can return 7, then 5, instead of
10 all the time).

I first thought that it was a problem of CL because I seem to remember that
I read once that I had to use CL.One for reads and writes with counters. So
I tried with CL.One, without success...

What am I doing wrong? Is there some precaution to take when replicating
counters?

Alain


Re: Second Cassandra users survey

2011-11-07 Thread Radim Kolar
 So my question related to deterministic sharding is this: what
rebalance feature(s) would be useful or needed once the partitions get
unbalanced?


In current Cassandra you can use nodetool move for rebalancing. It's a
fast operation; a portion of the existing data is moved to the new server.




Re: Counters and replication factor

2011-11-07 Thread Riyad Kalla
Alain,

Try using a CL of 3 or ALL and see if the problem goes away.

Your replication factor (as I just learned) dictates how many nodes each
piece of data is replicated to; by using a RF of 3 you are saying
"replicate all my data to all my nodes" (in this case counters).

This doesn't happen immediately, but you can *force* it to happen on write
by specifying a CL of ALL. If you specify ONE, then your counter value is
written to one member of the ring, and then your command returns.

If you keep querying you will bounce around your ring, reading the values
from the different nodes until a future date at *which point* all the
values will likely agree.

If you keep all the code you have now exactly the same, just change the
code at the end where you read the counter value back to keep reading the
counter value back every second for 60 seconds, and see if all the values
eventually match up -- they should (as the counter value is replicated to
all the nodes and their old values are discarded).
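
A sketch of that read-back loop (pycassa; the pool, CF, row and column
names are all hypothetical):

import time
import pycassa
from pycassa import ConsistencyLevel

pool = pycassa.ConnectionPool('MyKeyspace', ['localhost:9160'])
counter_cf = pycassa.ColumnFamily(pool, 'counters')

for _ in range(60):
    row = counter_cf.get('events', columns=['count'],
                         read_consistency_level=ConsistencyLevel.ONE)
    print(row['count'])  # should settle on a single value as replicas agree
    time.sleep(1)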

-R

On Mon, Nov 7, 2011 at 8:15 AM, Alain RODRIGUEZ arodr...@gmail.com wrote:

 Hi,

 I'm trying to switch from RF = 1 to RF = 3, but I get wrong values from
 counters when doing so...

 I have a CF that contains many counters of some events. When I'm at RF = 1
 and simulate 10 events, they are counted correctly.
 However, when I switch to RF = 3, my counter shows a wrong value that
 sometimes changes when requested twice (it can return 7, then 5, instead of
 10 all the time).

 I first thought that it was a problem of CL because I seem to remember
 that I read once that I had to use CL.One for reads and writes with
 counters. So I tried with CL.One, without success...

 What am I doing wrong? Is there some precaution to take when replicating
 counters?

 Alain



Re: Second Cassandra users survey

2011-11-07 Thread Jeremiah Jordan
Actually, the data will be visible at QUORUM as well if you can see it 
with ONE.  QUORUM actually gives you a higher chance of seeing the new 
value than ONE does.  In the case of R=3 you have a 2/3 chance of seeing
the new value with QUORUM; with ONE you have 1/3...  And this JIRA fixed
an issue where two QUORUM reads in a row could give you the NEW value 
and then the OLD value.


https://issues.apache.org/jira/browse/CASSANDRA-2494

So a quorum read after a failure for a single row always gives consistent
results now.  For multiple rows you still have issues, but you can always
mitigate that in the app with something like giving all of the changes the
same timestamp, and then on read checking to make sure the timestamps
match, and reading the data again if they don't.
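
A sketch of that mitigation (pycassa; the keyspace, CF, keys and values
are hypothetical): write the related rows under one shared timestamp,
then compare timestamps on read.

import time
from pycassa import ConnectionPool, ColumnFamily
from pycassa.batch import Mutator

pool = ConnectionPool('MyKeyspace', ['localhost:9160'])
cf = ColumnFamily(pool, 'data')

ts = int(time.time() * 1e6)   # one shared timestamp (microseconds)
b = Mutator(pool)
b.insert(cf, 'row1', {'col': 'v1'}, timestamp=ts)
b.insert(cf, 'row2', {'col': 'v2'}, timestamp=ts)
b.send()

# include_timestamp=True yields (value, timestamp) pairs; if the
# timestamps differ, one row is stale -- read again until they match.
v1, t1 = cf.get('row1', include_timestamp=True)['col']
v2, t2 = cf.get('row2', include_timestamp=True)['col']
consistent = (t1 == t2)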


I'm not arguing against atomic batch operations, they would be nice =).  
Just clarifying how things work now.


-Jeremiah

On 11/06/2011 02:05 PM, Pierre Chalamet wrote:

- support for atomic operations or batches (if QUORUM fails, data should
not be visible with ONE)

zookeeper is solving that.

I might have screwed up a little bit since I didn't talk about isolation;
let's reformulate: support for read committed (using DB terminology).
Cassandra is more like read uncommitted.
Even if row mutations in one CF for one key are atomic on one server, stuff
is not rolled back when the CL can't be satisfied at the coordinator level.
Data won't be visible at QUORUM level, but when using weaker CL, invalid
data can appear imho.
Also, it should be possible to tell which operations failed with batch_mutate,
but unfortunately it is not.


Re: Second Cassandra users survey

2011-11-07 Thread Jeremiah Jordan

- Batch read/slice from multiple column families.


On 11/01/2011 05:59 PM, Jonathan Ellis wrote:

Hi all,

Two years ago I asked for Cassandra use cases and feature requests.
[1]  The results [2] have been extremely useful in setting and
prioritizing goals for Cassandra development.  But with the release of
1.0 we've accomplished basically everything from our original wish
list. [3]

I'd love to hear from modern Cassandra users again, especially if
you're usually a quiet lurker.  What does Cassandra do well?  What are
your pain points?  What's your feature wish list?

As before, if you're in stealth mode or don't want to say anything in
public, feel free to reply to me privately and I will keep it off the
record.

[1] http://www.mail-archive.com/cassandra-dev@incubator.apache.org/msg01148.html
[2] 
http://www.mail-archive.com/cassandra-user@incubator.apache.org/msg01446.html
[3] http://www.mail-archive.com/dev@cassandra.apache.org/msg01524.html



Re: Counters and replication factor

2011-11-07 Thread Riyad Kalla
Alain,

When you tried CL.All, was that only after you had made the change to
ReplicationFactor=3 and restarted all the servers?

If you hadn't restarted the servers with the new RF, I am not sure that
CL.All would have the intended effect.

Also, I wasn't sure what you meant by "but now every request returns me
always the same count value..." -- didn't you want the requests to always
return you the same values?

Or maybe you are saying that it always returns the same *wrong* value? Like
you do:

counter.increment (v=1)
counter.increment (v=2)
counter.increment (v=3)

counter.getValue = returns 7
counter.getValue = returns 7
counter.getValue = returns 7

or something inconsistent like that?

On Mon, Nov 7, 2011 at 9:09 AM, Alain RODRIGUEZ arodr...@gmail.com wrote:

 I've tried with CL.All, but it doesn't work better. I still have strange
 values (between 4 and 10 events counted instead of 10), but now every
 request returns me always the same count value...

 It's very strange.

 Any other idea ?

 Alain


 2011/11/7 Riyad Kalla rka...@gmail.com

 Alain,

 Try using a CL of 3 or ALL and see if that the problem goes away.

 Your replication factor (as I just learned) dictates how many nodes each
 piece of data is replicated to; by using a RF of 3 you are saying
 "replicate all my data to all my nodes" (in this case counters).

 This doesn't happen immediately, but you can *force* it to happen on
 write by specifying a CL of ALL. If you specify ONE, then your counter
 value is written to one member of the ring, and then your command returns.

 If you keep querying you will bounce around your ring, reading the values
 from the different nodes until a future date at *which point* all the
 values will likely agree.

 If you keep all the code you have now exactly the same, just change the
 code at the end where you read the counter value back to keep reading the
 counter value back every second for 60 seconds, and see if all the values
 eventually match up -- they should (as the counter value is replicated to
 all the nodes and their old values are discarded).

 -R


 On Mon, Nov 7, 2011 at 8:15 AM, Alain RODRIGUEZ arodr...@gmail.com wrote:

 Hi,

 I'm trying to switch from RF = 1 to RF = 3, but I get wrong values
 from counters when doing so...

 I have a CF that contains many counters of some events. When I'm at RF =
 1 and simulate 10 events, they are counted correctly.
 However, when I switch to RF = 3, my counter shows a wrong value that
 sometimes changes when requested twice (it can return 7, then 5, instead of
 10 all the time).

 I first thought that it was a problem of CL because I seem to remember
 that I read once that I had to use CL.One for reads and writes with
 counters. So I tried with CL.One, without success...

 What am I doing wrong ? Is that some precaution to take when replicating
 counters ?

 Alain






Re: Second Cassandra users survey

2011-11-07 Thread Ed Anuff
This is basically what entity groups are about -
https://issues.apache.org/jira/browse/CASSANDRA-1684

On Mon, Nov 7, 2011 at 5:26 AM, Peter Lin wool...@gmail.com wrote:
 This feature interests me, so I thought I'd add some comments.

 Having used partition features in existing databases like DB2, Oracle
 and manual partitioning, one of the biggest challenges is keeping the
 partitions balanced. What I've seen with manual partitioning is that
 often the partitions get unbalanced. Usually the developers take a
 best guess and hope it ends up balanced.

 Some of the approaches I've used in the past were zip code, area code,
 state and some kind of hash.

 So my question related to deterministic sharding is this: what rebalance
 feature(s) would be useful or needed once the partitions get
 unbalanced?

 Without a decent plan for rebalancing, it often ends up being a very
 painful problem to solve in production. Back when I worked mobile
 apps, we saw issues with how OpenWave WAP servers partitioned the
 accounts. The early versions randomly assigned a phone to a server
 when it is provisioned the first time. Once the phone was associated
 to that server, it was stuck on that server. If the load on that
 server was heavier than the others, the only choice was to scale up
 the hardware.

 My understanding of Cassandra's current sharding is consistent and
 random. Does the new feature sit somewhere in-between? Are you
 thinking of a pluggable API so that you can provide your own hash
 algorithm for Cassandra to use?



 On Mon, Nov 7, 2011 at 7:54 AM, Daniel Doubleday
 daniel.double...@gmx.net wrote:
 Allow for deterministic / manual sharding of rows.

 Right now it seems that there is no way to force rows with different row
 keys to be stored on the same nodes in the ring.
 This is our number one reason why we get data inconsistencies when nodes
 fail.

 Sometimes a logical transaction requires writing rows with different row
 keys. If we could use something like "prefix.uniquekey" and let the
 partitioner use only the prefix, the probability that only part of the
 transaction would be written could be reduced considerably.



 On Nov 1, 2011, at 11:59 PM, Jonathan Ellis wrote:

 Hi all,

 Two years ago I asked for Cassandra use cases and feature requests.
 [1]  The results [2] have been extremely useful in setting and
 prioritizing goals for Cassandra development.  But with the release of
 1.0 we've accomplished basically everything from our original wish
 list. [3]

 I'd love to hear from modern Cassandra users again, especially if
 you're usually a quiet lurker.  What does Cassandra do well?  What are
 your pain points?  What's your feature wish list?

 As before, if you're in stealth mode or don't want to say anything in
 public, feel free to reply to me privately and I will keep it off the
 record.

 [1] 
 http://www.mail-archive.com/cassandra-dev@incubator.apache.org/msg01148.html
 [2] 
 http://www.mail-archive.com/cassandra-user@incubator.apache.org/msg01446.html
 [3] http://www.mail-archive.com/dev@cassandra.apache.org/msg01524.html

 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of DataStax, the source for professional Cassandra support
 http://www.datastax.com





Re: Counters and replication factor

2011-11-07 Thread Alain RODRIGUEZ
I retried it after restarting all the servers.

I still have wrong results (I simulated an event 5 times and it was counted
3 times by some counters, 4 or 5 times by others).

What I meant by "but now every request returns me always the same count
value..." will be easier to explain with an example:

event 1:

counter1.increment
counter2.increment
counter3.increment

.
.
.

event 5:

counter1.increment
counter2.increment
counter3.increment

Show results:

counter1.getValue = returns 4
counter2.getValue = returns 3
counter3.getValue = returns 5

counter1.getValue = returns 5
counter2.getValue = returns 3
counter3.getValue = returns 5

counter1.getValue = returns 4
counter2.getValue = returns 4
counter3.getValue = returns 5

...

So I've got wrong values, and not always the same ones. In my previous
email I tried to tell you, by saying "but now every request returns me
always the same count value...", that I was getting the same wrong
values all the time, let us say:

counter1.getValue = returns 4
counter2.getValue = returns 3
counter3.getValue = returns 5

counter1.getValue = returns 4
counter2.getValue = returns 3
counter3.getValue = returns 5

counter1.getValue = returns 4
counter2.getValue = returns 3
counter3.getValue = returns 5

But that is not true; I still have some random wrong values. Maybe I
didn't query the counter values often enough to see it last time.

Sorry for not being clearer; this is not easy to explain, nor to
understand, for me.

Thanks for help.

Alain


2011/11/7 Riyad Kalla rka...@gmail.com

 Alain,

 When you tried CL.All was that only after you had made the change of
 ReplicationFactor=3 and restarted all the servers?

 If you hadn't restarted the servers with the new RF, I am not sure that
 CL.All would have the intended effect.

 Also, I wasn't sure what you meant by "but now every request returns me
 always the same count value..." -- didn't you want the requests to always
 return the same values?

 Or maybe you are saying that it always returns the same *wrong* value?
 Like you do:

 counter.increment (v=1)
 counter.increment (v=2)
 counter.increment (v=3)

 counter.getValue = returns 7
 counter.getValue = returns 7
 counter.getValue = returns 7

 or something inconsistent like that?

 On Mon, Nov 7, 2011 at 9:09 AM, Alain RODRIGUEZ arodr...@gmail.com wrote:

 I've tried with CL.All, but it doesn't work better. I still have strange
 values (between 4 and 10 events counted instead of 10) but now every
 request returns me always the same count value...

 It's very strange.

 Any other idea?

 Alain


 2011/11/7 Riyad Kalla rka...@gmail.com

 Alain,

 Try using a CL of 3 or ALL and see if that the problem goes away.

 Your replication factor (as I just learned) dictates how many nodes each
 piece of data is replicated to; by using an RF of 3 you are saying
 "replicate all my data to all my nodes" (in this case counters).

 This doesn't happen immediately, but you can *force* it to happen on
 write by specifying a CL of ALL. If you specify 1 then your counter
 value is written to one member of the ring, then your command returns.

 If you keep querying you will bounce around your ring, reading the
 values from the different nodes until a future date at *which point* all
 the values will likely agree.

 If you keep all your code you have now exactly the same, just change the
 code at the end where you read the counter value back, to keep reading the
 counter value back every second for 60 seconds and see if all the values
 eventually match up -- they should (as the counter value is replicated to
 all the nodes and their old values discarded).

 -R


 On Mon, Nov 7, 2011 at 8:15 AM, Alain RODRIGUEZ arodr...@gmail.com wrote:

 Hi,

 I'm trying to switch from RF = 1 to RF = 3, but I get wrong values
 from counters when doing so...

 I've got a CF that contains many counters of some events. When I'm at RF =
 1 and simulate 10 events, they are counted correctly.
 However, when I switch to RF = 3, my counters show a wrong value that
 sometimes changes when requested twice (it can return 7, then 5, instead of
 10 all the time).

 I first thought that it was a problem of CL, because I seem to remember
 reading once that I had to use CL.One for reads and writes with
 counters. So I tried with CL.One, without success...

 What am I doing wrong? Is there some precaution to take when
 replicating counters?

 Alain







Re: Counters and replication factor

2011-11-07 Thread Sylvain Lebresne
This sounds like a bug 'a priori'. Do you mind opening a ticket at
https://issues.apache.org/jira/browse/CASSANDRA?
It will help if you can specify which version you are using and the
exact procedure you did that leads to that.
If you know how to reproduce it, that would be even better.

--
Sylvain

On Mon, Nov 7, 2011 at 5:57 PM, Alain RODRIGUEZ arodr...@gmail.com wrote:
 I retried it after restarting all the servers.
 I still have wrong results (I simulated an event 5 times and it was counted
 3 times by some counters, 4 or 5 times by others).
 What I meant by "but now every request returns me always the same count
 value..." will be easier to explain with an example:
 event 1:
 counter1.increment
 counter2.increment
 counter3.increment
 .
 .
 .
 event 5:
 counter1.increment
 counter2.increment
 counter3.increment
 Show results:
 counter1.getValue = returns 4
 counter2.getValue = returns 3
 counter3.getValue = returns 5
 counter1.getValue = returns 5
 counter2.getValue = returns 3
 counter3.getValue = returns 5
 counter1.getValue = returns 4
 counter2.getValue = returns 4
 counter3.getValue = returns 5
 ...
 So I've got wrong values, and not always the same ones. In my previous email
 I tried to tell you, by saying "but now every request returns me always the
 same count value...", that I was getting the same wrong values all the time,
 let us say:
 counter1.getValue = returns 4
 counter2.getValue = returns 3
 counter3.getValue = returns 5
 counter1.getValue = returns 4
 counter2.getValue = returns 3
 counter3.getValue = returns 5
 counter1.getValue = returns 4
 counter2.getValue = returns 3
 counter3.getValue = returns 5
 But that is not true; I still have some random wrong values. Maybe I didn't
 query the counter values often enough to see it last time.
 Sorry for not being clearer; this is not easy to explain, nor to
 understand, for me.
 Thanks for help.
 Alain

 2011/11/7 Riyad Kalla rka...@gmail.com

 Alain,
 When you tried CL.All was that only after you had made the change of
 ReplicationFactor=3 and restarted all the servers?
 If you hadn't restarted the servers with the new RF, I am not sure that
 CL.All would have the intended effect.
 Also, I wasn't sure what you meant by "but now every request returns me
 always the same count value..." -- didn't you want the requests to always
 return the same values?
 Or maybe you are saying that it always returns the same *wrong* value?
 Like you do:
 counter.increment (v=1)
 counter.increment (v=2)
 counter.increment (v=3)
 counter.getValue = returns 7
 counter.getValue = returns 7
 counter.getValue = returns 7
 or something inconsistent like that?
 On Mon, Nov 7, 2011 at 9:09 AM, Alain RODRIGUEZ arodr...@gmail.com
 wrote:

 I've tried with CL.All, but it doesn't work better. I still have strange
 values (between 4 and 10 events counted instead of 10) but now every
 request returns me always the same count value...
 It's very strange.
 Any other idea?
 Alain

 2011/11/7 Riyad Kalla rka...@gmail.com

 Alain,
 Try using a CL of 3 or ALL and see if that the problem goes away.
 Your replication factor (as I just learned) dictates how many nodes each
 piece of data is replicated to; by using an RF of 3 you are saying
 "replicate all my data to all my nodes" (in this case counters).
 This doesn't happen immediately, but you can *force* it to happen on
 write by specifying a CL of ALL. If you specify 1 then your counter
 value is written to one member of the ring, then your command returns.
 If you keep querying you will bounce around your ring, reading the
 values from the different nodes until a future date at *which point* all 
 the
 values will likely agree.
 If you keep all your code you have now exactly the same, just change the
 code at the end where you read the counter value back, to keep reading the
 counter value back every second for 60 seconds and see if all the values
 eventually match up -- they should (as the counter value is replicated to
 all the nodes and their old values discarded).
 -R

 On Mon, Nov 7, 2011 at 8:15 AM, Alain RODRIGUEZ arodr...@gmail.com
 wrote:

 Hi,
 I'm trying to switch from RF = 1 to RF = 3, but I get wrong values
 from counters when doing so...
 I've got a CF that contains many counters of some events. When I'm at RF =
 1 and simulate 10 events, they are counted correctly.
 However, when I switch to RF = 3, my counters show a wrong value that
 sometimes changes when requested twice (it can return 7, then 5, instead of
 10 all the time).
 I first thought that it was a problem of CL, because I seem to remember
 reading once that I had to use CL.One for reads and writes with
 counters. So I tried with CL.One, without success...
 What am I doing wrong? Is there some precaution to take when
 replicating counters?
 Alain






Re: Counters and replication factor

2011-11-07 Thread Riyad Kalla
Alain, thank you for all the clarification; I understand exactly what you
meant now... and as a result am just as confused as you are :)

What version of Cassandra are you using? Can you share the important parts
of your config? (You double-checked that the replication factor is set to 3
on all 3 nodes?)

Also, out of curiosity: if you keep querying for up to 5 minutes (say, every
10 seconds), do counter1, 2 and 3 still show the same wrong values for
getValue, or do the values eventually converge on the correct amounts?

(I assume 5mins is a long enough window to test, maybe I'm wrong and
another Cassandra dev can correct me here).

-R

On Mon, Nov 7, 2011 at 9:57 AM, Alain RODRIGUEZ arodr...@gmail.com wrote:

 I retried it after restarting all the servers.

 I still have wrong results (I simulated an event 5 times and it was
 counted 3 times by some counters, 4 or 5 times by others).

 What I meant by "but now every request returns me always the same count
 value..." will be easier to explain with an example:

 event 1:

 counter1.increment
 counter2.increment
 counter3.increment

 .
 .
 .

 event 5:

 counter1.increment
 counter2.increment
 counter3.increment

 Show results:

 counter1.getValue = returns 4
 counter2.getValue = returns 3
 counter3.getValue = returns 5

 counter1.getValue = returns 5
 counter2.getValue = returns 3
 counter3.getValue = returns 5

 counter1.getValue = returns 4
 counter2.getValue = returns 4
 counter3.getValue = returns 5

 ...

 So I've got wrong values, and not always the same ones. In my previous
 email I tried to tell you, by saying "but now every request returns me
 always the same count value...", that I was getting the same wrong
 values all the time, let us say:

 counter1.getValue = returns 4
 counter2.getValue = returns 3
 counter3.getValue = returns 5

 counter1.getValue = returns 4
 counter2.getValue = returns 3
 counter3.getValue = returns 5

 counter1.getValue = returns 4
 counter2.getValue = returns 3
 counter3.getValue = returns 5

 But that is not true; I still have some random wrong values. Maybe I
 didn't query the counter values often enough to see it last time.

 Sorry for not being clearer; this is not easy to explain, nor to
 understand, for me.

 Thanks for help.

 Alain


 2011/11/7 Riyad Kalla rka...@gmail.com

 Alain,

 When you tried CL.All was that only after you had made the change of
 ReplicationFactor=3 and restarted all the servers?

 If you hadn't restarted the servers with the new RF, I am not sure that
 CL.All would have the intended effect.

 Also, I wasn't sure what you meant by "but now every request returns me
 always the same count value..." -- didn't you want the requests to always
 return the same values?

 Or maybe you are saying that it always returns the same *wrong* value?
 Like you do:

 counter.increment (v=1)
 counter.increment (v=2)
 counter.increment (v=3)

 counter.getValue = returns 7
 counter.getValue = returns 7
 counter.getValue = returns 7

 or something inconsistent like that?

 On Mon, Nov 7, 2011 at 9:09 AM, Alain RODRIGUEZ arodr...@gmail.com wrote:

 I've tried with CL.All, but it doesn't work better. I still have strange
 values (between 4 and 10 events counted instead of 10) but now every
 request returns me always the same count value...

 It's very strange.

 Any other idea?

 Alain


 2011/11/7 Riyad Kalla rka...@gmail.com

 Alain,

 Try using a CL of 3 or ALL and see if that the problem goes away.

 Your replication factor (as I just learned) dictates how many nodes
 each piece of data is replicated to; by using an RF of 3 you are saying
 "replicate all my data to all my nodes" (in this case counters).

 This doesn't happen immediately, but you can *force* it to happen on
 write by specifying a CL of ALL. If you specify 1 then your counter
 value is written to one member of the ring, then your command returns.

 If you keep querying you will bounce around your ring, reading the
 values from the different nodes until a future date at *which point* all
 the values will likely agree.

 If you keep all your code you have now exactly the same, just change
 the code at the end where you read the counter value back, to keep reading
 the counter value back every second for 60 seconds and see if all the
 values eventually match up -- they should (as the counter value is
 replicated to all the nodes and their old values discarded).

 -R


 On Mon, Nov 7, 2011 at 8:15 AM, Alain RODRIGUEZ arodr...@gmail.com wrote:

 Hi,

 I'm trying to switch from RF = 1 to RF = 3, but I get wrong values
 from counters when doing so...

 I've got a CF that contains many counters of some events. When I'm at RF
 = 1 and simulate 10 events, they are counted correctly.
 However, when I switch to RF = 3, my counters show a wrong value that
 sometimes changes when requested twice (it can return 7, then 5, instead of
 10 all the time).

 I first thought that it was a problem of CL, because I seem to remember
 reading once that I had to use CL.One for reads and writes 

Re: Will writes with ALL consistency eventually propagate?

2011-11-07 Thread Stephen Connolly
Plan for the future

At some point your data set will become too big for the node that it
is running on, or your load will force you to split nodes... and once
you do that, RF < N.

To solve performance issues with C* the solution is add more nodes

To solve storage issues with C* the solution is add more nodes

In most cases the solution in C* is add more nodes.

Don't assume RF=Number of nodes as a core design decision of your
application and you will not have your ass bitten

;-)

-Stephen
P.S. making the point more extreme to make it clear

On 7 November 2011 15:04, Riyad Kalla rka...@gmail.com wrote:
 Stephen,
 Excellent breakdown; I appreciate all the detail.
 Your last comment about RF being smaller than N (number of nodes) -- in my
 particular case my data set isn't particularly large (a few GB) and is
 distributed globally across a handful of data centers. What I am utilizing
 Cassandra for is the replication in order to minimize latency for requests.
 So when a request comes into any location, I want each node in the ring to
 contain the full data set so it never needs to defer to another member of
 the ring to answer a question (even if this means eventually consistency,
 that is alright in my case).
 Given that, the way I've understood this discussion so far is I would have a
 RF of N (my total node count) but my Consistency Level with all my writes
 will *likely* be QUORUM -- I think that is a good/safe default for me to use
 as writes aren't the scenario I need to optimize for latency; that being
 said, I also don't want to wait for a ConsistencyLevel of ALL to complete
 before my code continues though.
 Would you agree with this assessment or am I missing the boat on something?
 Best,
 Riyad

 On Mon, Nov 7, 2011 at 7:42 AM, Stephen Connolly
 stephen.alan.conno...@gmail.com wrote:

 Consistency Level is a pseudo-enum...

 you have the choice between

 ONE
 Quorum (and there are different types of this)
 ALL

 At CL=ONE, only one node is guaranteed to have got the write if the
 operation is a success.
 At CL=ALL, all nodes that the RF says it should be stored at must
 confirm the write before the operation succeeds, but a partial write
 will succeed eventually if at least one node recorded the write
 At CL=QUORUM, at least ((N/2)+1) nodes must confirm the write for the
 operation to succeed, otherwise failure, but a partial write will
 succeed eventually if at least one node recorded the write.

 Read repair will eventually ensure that the write is replicated across
 all RF nodes in the cluster.

 The N in QUORUM above depends on the type of QUORUM you choose, in
 general think N=RF unless you choose a fancy QUORUM.

 To have a consistent read, CL of write + CL of read must be > RF...

 Write at ONE, read at ONE => may not get the most recent write if RF >
 1 [fastest write, fastest read] {data loss possible if node lost
 before read repair}
 Write at QUORUM, read at ONE => consistent read [moderate write,
 fastest read] {multiple nodes must be lost for data loss to be
 possible}
 Write at ALL, read at ONE => consistent read, writes may be blocked if
 any node fails [slowest write, fastest read]

 Write at ONE, read at QUORUM => may not get the most recent write if
 RF > 2 [fastest write, moderate read] {data loss possible if node
 lost before read repair}
 Write at QUORUM, read at QUORUM => consistent read [moderate write,
 moderate read] {multiple nodes must be lost for data loss to be
 possible}
 Write at ALL, read at QUORUM => consistent read, writes may be blocked
 if any node fails [slowest write, moderate read]

 Write at ONE, read at ALL => consistent read, reads may fail if any
 node fails [fastest write, slowest read] {data loss possible if node
 lost before read repair}
 Write at QUORUM, read at ALL => consistent read, reads may fail if any
 node fails [moderate write, slowest read] {multiple nodes must be lost
 for data loss to be possible}
 Write at ALL, read at ALL => consistent read, writes may be blocked if
 any node fails, reads may fail if any node fails [slowest write,
 slowest read]

 Note: You can choose the CL for each and every operation. This is
 something that you should design into your application (unless you
 exclusively use QUORUM for all operations, in which case you are
 advised to bake the logic in, but it is less necessary)
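
 To restate the rule above compactly -- a tiny self-contained sketch, plain
 Java, nothing Cassandra-specific; the quorum math assumes the simple
 non-fancy QUORUM where N=RF:

 public class ConsistencyCheck {
     static int replicas(String cl, int rf) {
         if (cl.equals("ONE")) return 1;
         if (cl.equals("QUORUM")) return rf / 2 + 1;
         return rf; // ALL
     }
     // A read is consistent when write replicas + read replicas > RF,
     // i.e. the two replica sets must overlap in at least one node.
     static boolean consistentRead(String writeCl, String readCl, int rf) {
         return replicas(writeCl, rf) + replicas(readCl, rf) > rf;
     }
     public static void main(String[] args) {
         int rf = 5; // e.g. a 5-node ring with RF = N
         System.out.println(consistentRead("QUORUM", "QUORUM", rf)); // true: 3 + 3 > 5
         System.out.println(consistentRead("ONE", "ONE", rf));       // false: 1 + 1 <= 5
         System.out.println(consistentRead("ONE", "ALL", rf));       // true: 1 + 5 > 5
     }
 }

 (With RF=5, QUORUM is (5/2)+1 = 3, so QUORUM writes and QUORUM reads always
 overlap in at least one replica.)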

 The other thing to remember is that RF does not have to equal the
 number of nodes in your cluster... in fact I would recommend designing
 your app on the basis that RF < number of nodes in your cluster...
 because at some point, when your data set grows big enough, you will
 end up with RF < number of nodes.

 -Stephen

 On 7 November 2011 13:03, Riyad Kalla rka...@gmail.com wrote:
  Ah! Ok I was interpreting what you were saying to mean that if my RF was
  too
  high, then the ring would die if I lost one.
  Ultimately what I want (I think) is:
  Replication Factor: 5 (aka all of my nodes)
  Consistency Level: 2
  Put another way, when I write a 

Reminder: Cassandra Meetup, Thursday Nov. 10th in Vancouver

2011-11-07 Thread Eric Evans
Just a reminder; If you're planning to be at ApacheCon, or are
otherwise able to be in Vancouver on the 10th, we're having a
Cassandra Meetup.  There is no cost to attend (you don't even need to
be registered for the conference), and beer will be provided.

As a special treat, Chris Burroughs of Clearspring, and Paul Querna of
Rackspace have each agreed to spend a few minutes talking about their
use-cases, and answering any question you might have.

Hope to see you there!

http://wiki.apache.org/cassandra/Meetup_ApacheConNA2011

-- 
Eric Evans
Acunu | http://www.acunu.com | @acunu


Re: Reminder: Cassandra Meetup, Thursday Nov. 10th in Vancouver

2011-11-07 Thread Jake Luciani
I'll be there!

On Mon, Nov 7, 2011 at 5:23 PM, Eric Evans eev...@acunu.com wrote:

 Just a reminder; If you're planning to be at ApacheCon, or are
 otherwise able to be in Vancouver on the 10th, we're having a
 Cassandra Meetup.  There is no cost to attend (you don't even need to
 be registered for the conference), and beer will be provided.

 As a special treat, Chris Burroughs of Clearspring, and Paul Querna of
 Rackspace have each agreed to spend a few minutes talking about their
 use-cases, and answering any question you might have.

 Hope to see you there!

 http://wiki.apache.org/cassandra/Meetup_ApacheConNA2011

 --
 Eric Evans
 Acunu | http://www.acunu.com | @acunu




-- 
http://twitter.com/tjake


Re: Key count mismatch in cluster for a column family

2011-11-07 Thread Daning
Sylvain - We have a similar problem, but the discrepancy is not that big.
Do we have to do a major compaction to fix it? We did not run 'nodetool
compact', just did repair regularly, which triggers minor compactions.


Thanks,

Daning

On 10/26/2011 03:23 AM, Sylvain Lebresne wrote:

The estimate for the number of keys is computed by summing the key
estimate for each sstable of the CF. For each sstable, the estimate
should be fairly good. However, it's when we sum all the sstable estimates
that we can potentially lose a lot of precision, if there are a lot of rows
that have parts in different sstables. But that in turn would suggest a
problem with compaction lagging badly behind, especially with leveled
compaction.

--
Sylvain
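
To see why the summing can overcount, a trivial illustration with made-up
keys (not from this cluster):

import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class KeyEstimateOvercount {
    public static void main(String[] args) {
        // Two sstables share row "b": the per-sstable estimates sum to 4,
        // but the CF really holds only 3 distinct keys.
        Set<String> sstable1 = new HashSet<String>(Arrays.asList("a", "b"));
        Set<String> sstable2 = new HashSet<String>(Arrays.asList("b", "c"));
        int summed = sstable1.size() + sstable2.size();       // 4
        Set<String> distinct = new HashSet<String>(sstable1);
        distinct.addAll(sstable2);                            // {a, b, c}
        System.out.println(summed + " estimated vs " + distinct.size() + " actual");
    }
}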

On Wed, Oct 26, 2011 at 3:58 AM, Terry Cumaranatungecumar...@gmail.com  wrote:

I have a cluster of 8 nodes all running 1.0. The stats shown on the 1st node
on one of the CFs for the number of keys is much larger than expected. The
first node shows the key count estimate to be 9.2M whereas the rest report
~650K on each node. The 650K is in the correct neighborhood of the number of
keys that have been inserted. The counts are comparable for all other CFs
across the cluster. I'm using leveled compaction, but no compression.

The 'nodetool ring' shows that the load is equal across all nodes. What
could cause this large disparity in the number of keys? Is this just a stats
issue or does this suggest a functional problem?

1st node:
 Column Family: uid
 SSTable count: 395
 Space used (live): 1375262
 Space used (total): 5482088532
 Number of Keys (estimate): 9215104
 Memtable Columns Count: 514952
 Memtable Data Size: 295213448
 Memtable Switch Count: 290
 Read Count: 193102511
 Read Latency: 0.146 ms.
 Write Count: 176934874
 Write Latency: 0.018 ms.
 Pending Tasks: 0
 Key cache capacity: 8302131
 Key cache size: 8302131
 Key cache hit rate: 0.8644664668071792
 Row cache: disabled
 Compacted row minimum size: 87
 Compacted row maximum size: 7007506
 Compacted row mean size: 8944
2nd node:
 Column Family: uid
 SSTable count: 402
 Space used (live): 13723958304
 Space used (total): 4044833220
 Number of Keys (estimate): 652928
 Memtable Columns Count: 170290
 Memtable Data Size: 102378904
 Memtable Switch Count: 272
 Read Count: 192463595
 Read Latency: 0.289 ms.
 Write Count: 176527238
 Write Latency: 0.014 ms.
 Pending Tasks: 0
 Key cache capacity: 8783058
 Key cache size: 8783058
 Key cache hit rate: 0.7865727464740025
 Row cache: disabled
 Compacted row minimum size: 87
 Compacted row maximum size: 7007506
 Compacted row mean size: 12151
3rd node:
   Column Family: uid
 SSTable count: 401
 Space used (live): 13204714872
 Space used (total): 4030024144
 Number of Keys (estimate): 675968
 Memtable Columns Count: 42881
 Memtable Data Size: 30992298
 Memtable Switch Count: 304
 Read Count: 190769879
 Read Latency: 0.224 ms.
 Write Count: 175381826
 Write Latency: 0.014 ms.
 Pending Tasks: 0
 Key cache capacity: 8920108
 Key cache size: 8920108
 Key cache hit rate: 0.8053563128870577
 Row cache: disabled
 Compacted row minimum size: 87
 Compacted row maximum size: 4866323
 Compacted row mean size: 12074





Re: Cassandra NYC Summit on December 6th

2011-11-07 Thread Riyad Kalla
Very cool Nate, when will the tracks be locked in?

On Mon, Nov 7, 2011 at 11:14 AM, Nate McCall n...@datastax.com wrote:

 The first East Coast Apache Cassandra conference - Cassandra NYC -
 will be held on Tuesday, December 6, at the Lighthouse International
 conference center in New York City.

 This is a one-day, two-track event with lectures and workshops by
 leading Cassandra experts. Jonathan Ellis, head of the Apache
 Cassandra Project, will give the keynote. There is an initial list of
 speakers up, with others to be announced in the next few days.

 Continental breakfast, lunch, and continuous beverage service will be
 provided for all attendees. Following the conference there will be an
 after party worthy of NYC.

 For additional details and registration, please visit the event page:

 http://www.datastax.com/events/cassandranyc2011

 Hope to see you there!

 -Nate



Re: Cassandra NYC Summit on December 6th

2011-11-07 Thread Nate McCall
We should have the schedule set by the end of the week. The confirmed
speaker list can be found here (and should be growing by the day as
bios and summaries come in):
http://www.datastax.com/events/cassandranyc2011

Thanks,
-Nate

On Mon, Nov 7, 2011 at 12:25 PM, Riyad Kalla rka...@gmail.com wrote:
 Very cool Nate, when will the tracks be locked in?

 On Mon, Nov 7, 2011 at 11:14 AM, Nate McCall n...@datastax.com wrote:

 The first East Coast Apache Cassandra conference - Cassandra NYC -
 will be held on Tuesday, December 6, at the Lighthouse International
 conference center in New York City.

 This is a one-day, two-track event with lectures and workshops by
 leading Cassandra experts. Jonathan Ellis, head of the Apache
 Cassandra Project, will give the keynote. There is an initial list of
 speakers up, with others to be announced in the next few days.

 Continental breakfast, lunch, and continuous beverage service will be
 provided for all attendees. Following the conference there will be an
 after party worthy of NYC.

 For additional details and registration, please visit the event page:

 http://www.datastax.com/events/cassandranyc2011

 Hope to see you there!

 -Nate




RE: Second Cassandra users survey

2011-11-07 Thread Deeter, Derek
I second transparent disk encryption.
Also:
Matching column names via 'like' and %wildcards
Parameterized CQL plus support for 'AND' and 'OR'
Bulk row deletion.
Also, more clarification on various parameters and configuration - If you are 
doing this, change 

Thanks for the opportunity,
-Derek

--
Derek Deeter, Sr. Software Engineer
Intuit Financial Services
(818) 597-5932 (x76932)
5601 Lindero Canyon Rd., Westlake, CA 91362
derek.dee...@digitalinsight.com
 


-Original Message-
From: Mohit Anchlia [mailto:mohitanch...@gmail.com] 
Sent: Sunday, November 06, 2011 10:58 AM
To: user@cassandra.apache.org
Subject: Re: Second Cassandra users survey

Transparent on disk encryption with pluggable keyprovider will also be
really helpful to secure sensitive information.

On Sun, Nov 6, 2011 at 9:42 AM, Aaron Turner synfina...@gmail.com wrote:
 The intent was to have a lighter solution for common problems than
 having to go with Hadoop or streaming large quantities of data back to
 the client.  Is this feature creep?  Yeah, prolly.  Is it useful?
 Yes.  If it can't be done well, then it probably shouldn't be done,
 but it never hurts to ask. :)

 On Sun, Nov 6, 2011 at 9:13 AM, Sarah Baker sba...@mspot.com wrote:
 Isn't this sort of heading on the slippery slope of things that weigh you 
 down?
 It was my understanding that Cassandra was a "stick to your core
 competency" sort of database that really wanted to leave such utilities
 external.  At its core was "get" and "put".
 Did I miss something in my reading of intent?
 -Sarah

 -Original Message-
 From: Aaron Turner [mailto:synfina...@gmail.com]
 Sent: Sunday, November 06, 2011 8:25 AM
 To: user@cassandra.apache.org
 Subject: Re: Second Cassandra users survey

 1. Basic SQL-like summary transforms for both CQL and Thrift API clients 
 like:

 SUM
 AVG
 MIN
 MAX






 --
 Aaron Turner
 http://synfin.net/         Twitter: @synfinatic
 http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix &
 Windows
 Those who would give up essential Liberty, to purchase a little temporary
 Safety, deserve neither Liberty nor Safety.
     -- Benjamin Franklin
 carpe diem quam minimum credula postero



Re: Second Cassandra users survey

2011-11-07 Thread Ian Danforth


 Wish list: A decent GUI to explore data kept in Cassandra would be very
 valuable. It should also be extendable to
 provide viewers for custom data.


+1 to that.

@jonathan - This is what Google Moderator is really good at. Perhaps start
one and move the idea creation / voting there.


Re: Determining Strategy options for a particular Strategy class

2011-11-07 Thread Jonathan Ellis
No, there isn't.

On Mon, Nov 7, 2011 at 8:04 AM, Dave Brosius dbros...@mebigfatguy.com wrote:
 Is there a programmatic way to determine what the valid 'keys' are for the
 strategy options for a particular strategy class?




-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: Will writes with ALL consistency eventually propagate?

2011-11-07 Thread Stephen Connolly
at that point, your cluster will either have so much data on each node that
you will need to split them, keeping rf=5 so you have 10 nodes... or the
intra-cluster traffic will swamp you and you will split each node keeping
rf=5 so you have 10 nodes again.

safest thing is not to design with the assumption that rf=n

- Stephen

---
Sent from my Android phone, so random spelling mistakes, random nonsense
words and other nonsense are a direct result of using swype to type on the
screen
On 7 Nov 2011 17:47, Riyad Kalla rka...@gmail.com wrote:

 Stephen,

 I appreciate you making the point more strongly; I won't make this
 decision lightly given the stress you are putting on it, but the technical
 aspects of this make me curious...

If I start with RF=N (number of nodes) now, and in 2 years
(hypothetically) my dataset is too large and I say to myself "Dangit,
Stephen was right...", couldn't I just change the RF to some smaller value,
say 3, at that point? Or would the Cassandra ring not rebalance the data
set nicely at that point?

 More specifically, would it not know how best to slowly remove extraneous
 copies from the nodes and make the data more sparse among the ring members?
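
(From what I've read, the mechanics are doable; roughly, and with CLI syntax
that varies a little between versions:

    update keyspace MyKeyspace with strategy_options = {replication_factor:3};

followed by nodetool cleanup on every node to drop the now-extraneous
replicas -- or, going the other way and increasing RF, nodetool repair so the
new replicas get populated. "MyKeyspace" is just a placeholder.)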

 Thanks for the hand-holding; it is helping me understand the operational
 landscape quickly.

 -R

 On Mon, Nov 7, 2011 at 10:18 AM, Stephen Connolly 
 stephen.alan.conno...@gmail.com wrote:

 Plan for the future

 At some point your data set will become too big for the node that it
 is running on, or your load will force you to split nodes... and once
 you do that, RF < N.

 To solve performance issues with C* the solution is add more nodes

 To solve storage issues with C* the solution is add more nodes

 In most cases the solution in C* is add more nodes.

 Don't assume RF=Number of nodes as a core design decision of your
 application and you will not have your ass bitten

 ;-)

 -Stephen
 P.S. making the point more extreme to make it clear

 On 7 November 2011 15:04, Riyad Kalla rka...@gmail.com wrote:
  Stephen,
  Excellent breakdown; I appreciate all the detail.
  Your last comment about RF being smaller than N (number of nodes) -- in
 my
  particular case my data set isn't particularly large (a few GB) and is
  distributed globally across a handful of data centers. What I am
 utilizing
  Cassandra for is the replication in order to minimize latency for
 requests.
  So when a request comes into any location, I want each node in the ring
 to
  contain the full data set so it never needs to defer to another member
 of
  the ring to answer a question (even if this means eventually
 consistency,
  that is alright in my case).
  Given that, the way I've understood this discussion so far is I would
 have a
  RF of N (my total node count) but my Consistency Level with all my
 writes
  will *likely* be QUORUM -- I think that is a good/safe default for me
 to use
  as writes aren't the scenario I need to optimize for latency; that being
  said, I also don't want to wait for a ConsistencyLevel of ALL to
 complete
  before my code continues though.
  Would you agree with this assessment or am I missing the boat on
 something?
  Best,
  Riyad
 
  On Mon, Nov 7, 2011 at 7:42 AM, Stephen Connolly
  stephen.alan.conno...@gmail.com wrote:
 
  Consistency Level is a pseudo-enum...
 
  you have the choice between
 
  ONE
  Quorum (and there are different types of this)
  ALL
 
  At CL=ONE, only one node is guaranteed to have got the write if the
  operation is a success.
  At CL=ALL, all nodes that the RF says it should be stored at must
  confirm the write before the operation succeeds, but a partial write
  will succeed eventually if at least one node recorded the write
  At CL=QUORUM, at least ((N/2)+1) nodes must confirm the write for the
  operation to succeed, otherwise failure, but a partial write will
  succeed eventually if at least one node recorded the write.
 
  Read repair will eventually ensure that the write is replicated across
  all RF nodes in the cluster.
 
  The N in QUORUM above depends on the type of QUORUM you choose, in
  general think N=RF unless you choose a fancy QUORUM.
 
  To have a consistent read, CL of write + CL of read must be > RF...

  Write at ONE, read at ONE => may not get the most recent write if RF >
  1 [fastest write, fastest read] {data loss possible if node lost
  before read repair}
  Write at QUORUM, read at ONE => consistent read [moderate write,
  fastest read] {multiple nodes must be lost for data loss to be
  possible}
  Write at ALL, read at ONE => consistent read, writes may be blocked if
  any node fails [slowest write, fastest read]

  Write at ONE, read at QUORUM => may not get the most recent write if
  RF > 2 [fastest write, moderate read] {data loss possible if node
  lost before read repair}
  Write at QUORUM, read at QUORUM => consistent read [moderate write,
  moderate read] {multiple nodes must be lost for data loss to be
  possible}
  Write at ALL, read at QUORUM => consistent read, 

Re: Second Cassandra users survey

2011-11-07 Thread Daniel Doubleday
Well - given the example, in our case the prefix that determines the endpoints
where a row should be routed could be something like a user-id,

so with

key = userid + "." + userthingid;

instead of

// this is happening right now
getEndpoints(hash(key))

you would have

getEndpoints(userid)

Since count(users) is much larger than number of nodes in the ring we would 
still have a balanced cluster.

I guess what we would need is something like a compound row key.

You could almost do something like this with the current code base, but I
remember that there are certain assumptions about how keys translate to
tokens on the ring that make this impossible.

But in essence this would result in another partitioner implementation.
So you'd have OrderPreservingPartitioner, RandomPartitioner and maybe a
ShardedPartitioner.
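
A rough sketch of what that ShardedPartitioner's token function could look
like -- plain Java, hashing only the part of the key before the first dot.
Purely illustrative; it is not Cassandra's actual IPartitioner interface:

import java.math.BigInteger;
import java.security.MessageDigest;

public class ShardedTokenSketch {
    // Hash only the prefix, so "42.order1" and "42.order2" map to the
    // same token and therefore to the same replicas.
    static BigInteger token(String rowKey) throws Exception {
        int dot = rowKey.indexOf('.');
        String prefix = (dot < 0) ? rowKey : rowKey.substring(0, dot);
        byte[] digest = MessageDigest.getInstance("MD5").digest(prefix.getBytes("UTF-8"));
        return new BigInteger(1, digest); // same flavor of token as RandomPartitioner
    }
    public static void main(String[] args) throws Exception {
        System.out.println(token("42.order1").equals(token("42.order2"))); // true
        System.out.println(token("42.order1").equals(token("43.order1"))); // false
    }
}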


On Nov 7, 2011, at 2:26 PM, Peter Lin wrote:

 This feature interests me, so I thought I'd add some comments.
 
 Having used partition features in existing databases like DB2, Oracle
 and manual partitioning, one of the biggest challenges is keeping the
 partitions balanced. What I've seen with manual partitioning is that
 often the partitions get unbalanced. Usually the developers take a
 best guess and hope it ends up balanced.
 
 Some of the approaches I've used in the past were zip code, area code,
 state and some kind of hash.
 
 So my question related to deterministic sharding is this: what rebalance
 feature(s) would be useful or needed once the partitions get
 unbalanced?
 
 Without a decent plan for rebalancing, it often ends up being a very
 painful problem to solve in production. Back when I worked mobile
 apps, we saw issues with how OpenWave WAP servers partitioned the
 accounts. The early versions randomly assigned a phone to a server
 when it is provisioned the first time. Once the phone was associated
 to that server, it was stuck on that server. If the load on that
 server was heavier than the others, the only choice was to scale up
 the hardware.
 
 My understanding of Cassandra's current sharding is consistent and
 random. Does the new feature sit somewhere in-between? Are you
 thinking of a pluggable API so that you can provide your own hash
 algorithm for Cassandra to use?
 
 
 
 On Mon, Nov 7, 2011 at 7:54 AM, Daniel Doubleday
 daniel.double...@gmx.net wrote:
 Allow for deterministic / manual sharding of rows.

 Right now it seems that there is no way to force rows with different row
 keys to be stored on the same nodes in the ring.
 This is our number one reason why we get data inconsistencies when nodes
 fail.

 Sometimes a logical transaction requires writing rows with different row
 keys. If we could use something like "prefix.uniquekey" and let the
 partitioner use only the prefix, the probability that only part of the
 transaction would be written could be reduced considerably.
 
 
 
 On Nov 1, 2011, at 11:59 PM, Jonathan Ellis wrote:
 
 Hi all,
 
 Two years ago I asked for Cassandra use cases and feature requests.
 [1]  The results [2] have been extremely useful in setting and
 prioritizing goals for Cassandra development.  But with the release of
 1.0 we've accomplished basically everything from our original wish
 list. [3]
 
 I'd love to hear from modern Cassandra users again, especially if
 you're usually a quiet lurker.  What does Cassandra do well?  What are
 your pain points?  What's your feature wish list?
 
 As before, if you're in stealth mode or don't want to say anything in
 public, feel free to reply to me privately and I will keep it off the
 record.
 
 [1] 
 http://www.mail-archive.com/cassandra-dev@incubator.apache.org/msg01148.html
 [2] 
 http://www.mail-archive.com/cassandra-user@incubator.apache.org/msg01446.html
 [3] http://www.mail-archive.com/dev@cassandra.apache.org/msg01524.html
 
 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of DataStax, the source for professional Cassandra support
 http://www.datastax.com
 
 



Re: Will writes with ALL consistency eventually propagate?

2011-11-07 Thread Riyad Kalla
Ahh, I see your point.

Thanks for the help Stephen.

On Mon, Nov 7, 2011 at 12:43 PM, Stephen Connolly 
stephen.alan.conno...@gmail.com wrote:

 at that point, your cluster will either have so much data on each node
 that you will need to split them, keeping rf=5 so you have 10 nodes... or
 the intra-cluster traffic will swamp you and you will split each node
 keeping rf=5 so you have 10 nodes again.

 safest thing is not to design with the assumption that rf=n

 - Stephen

 ---
 Sent from my Android phone, so random spelling mistakes, random nonsense
 words and other nonsense are a direct result of using swype to type on the
 screen
 On 7 Nov 2011 17:47, Riyad Kalla rka...@gmail.com wrote:

 Stephen,

 I appreciate you making the point more strongly; I won't make this
 decision lightly given the stress you are putting on it, but the technical
 aspects of this make me curious...

 If I start with RF=N (number of nodes) now, and in 2 years
 (hypothetically) my dataset is too large and I say to myself "Dangit,
 Stephen was right...", couldn't I just change the RF to some smaller value,
 say 3, at that point? Or would the Cassandra ring not rebalance the data
 set nicely at that point?

 More specifically, would it not know how best to slowly remove extraneous
 copies from the nodes and make the data more sparse among the ring members?

 Thanks for the hand-holding; it is helping me understand the operational
 landscape quickly.

 -R

 On Mon, Nov 7, 2011 at 10:18 AM, Stephen Connolly 
 stephen.alan.conno...@gmail.com wrote:

 Plan for the future

 At some point your data set will become too big for the node that it
 is running on, or your load will force you to split nodes... and once
 you do that, RF < N.

 To solve performance issues with C* the solution is add more nodes

 To solve storage issues with C* the solution is add more nodes

 In most cases the solution in C* is add more nodes.

 Don't assume RF=Number of nodes as a core design decision of your
 application and you will not have your ass bitten

 ;-)

 -Stephen
 P.S. making the point more extreme to make it clear

 On 7 November 2011 15:04, Riyad Kalla rka...@gmail.com wrote:
  Stephen,
  Excellent breakdown; I appreciate all the detail.
  Your last comment about RF being smaller than N (number of nodes) --
 in my
  particular case my data set isn't particularly large (a few GB) and is
  distributed globally across a handful of data centers. What I am
 utilizing
  Cassandra for is the replication in order to minimize latency for
 requests.
  So when a request comes into any location, I want each node in the
 ring to
  contain the full data set so it never needs to defer to another member
 of
  the ring to answer a question (even if this means eventually
 consistency,
  that is alright in my case).
  Given that, the way I've understood this discussion so far is I would
 have a
  RF of N (my total node count) but my Consistency Level with all my
 writes
  will *likely* be QUORUM -- I think that is a good/safe default for me
 to use
  as writes aren't the scenario I need to optimize for latency; that
 being
  said, I also don't want to wait for a ConsistencyLevel of ALL to
 complete
  before my code continues though.
  Would you agree with this assessment or am I missing the boat on
 something?
  Best,
  Riyad
 
  On Mon, Nov 7, 2011 at 7:42 AM, Stephen Connolly
  stephen.alan.conno...@gmail.com wrote:
 
  Consistency Level is a pseudo-enum...
 
  you have the choice between
 
  ONE
  Quorum (and there are different types of this)
  ALL
 
  At CL=ONE, only one node is guaranteed to have got the write if the
  operation is a success.
  At CL=ALL, all nodes that the RF says it should be stored at must
  confirm the write before the operation succeeds, but a partial write
  will succeed eventually if at least one node recorded the write
  At CL=QUORUM, at least ((N/2)+1) nodes must confirm the write for the
  operation to succeed, otherwise failure, but a partial write will
  succeed eventually if at least one node recorded the write.
 
  Read repair will eventually ensure that the write is replicated across
  all RF nodes in the cluster.
 
  The N in QUORUM above depends on the type of QUORUM you choose, in
  general think N=RF unless you choose a fancy QUORUM.
 
  To have a consistent read, CL of write + CL of read must be > RF...

  Write at ONE, read at ONE => may not get the most recent write if RF >
  1 [fastest write, fastest read] {data loss possible if node lost
  before read repair}
  Write at QUORUM, read at ONE => consistent read [moderate write,
  fastest read] {multiple nodes must be lost for data loss to be
  possible}
  Write at ALL, read at ONE => consistent read, writes may be blocked if
  any node fails [slowest write, fastest read]

  Write at ONE, read at QUORUM => may not get the most recent write if
  RF > 2 [fastest write, moderate read] {data loss possible if node
  lost before read repair}
  Write at QUORUM, read at QUORUM => 

Re: Running java stress tests

2011-11-07 Thread Joe Kaiser
I managed to fix my own problem.

I had rpc_address set to localhost. Running strace on the stress
command showed it attempting to bind to an IPv6 address. Leaving rpc_address
blank fixes the problem, since host resolution works fine on my cluster.
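
i.e. in cassandra.yaml (an empty value makes Cassandra fall back to the
node's own resolved hostname instead of binding only to localhost):

    # rpc_address intentionally left empty on each node
    rpc_address: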

Thanks,

Joe


On Sun, Nov 6, 2011 at 8:15 PM, Joe Kaiser joe.kai...@stackiq.com wrote:

 Hi,

 I am attempting to run the java stress tests:

 Tests to the local machine work fine:

 # sh stress -d localhost -n 100

 Unable to create stress keyspace: Keyspace names must be
 case-insensitively unique
 total,interval_op_rate,interval_key_rate,avg_latency,elapsed_time
 321975,32197,32197,0.0011982358878794939,10
 608889,28691,28691,0.0014594442934119632,20
 897246,28835,28835,0.0014675003554621528,30

 ..


 9894395,11904,11904,0.002851321382369248,505
 1000,10560,10560,0.003761071918943232,513
 END

 Tests to a number of the remote machines do not:


 # sh stress -d 10.1.255.254 -n 1000


 Exception in thread main java.lang.RuntimeException:
 java.net.ConnectException: Connection refused
 at org.apache.cassandra.stress.Session.getClient(Unknown Source)
  at org.apache.cassandra.stress.Session.createKeySpaces(Unknown Source)
 at org.apache.cassandra.stress.StressAction.run(Unknown Source)
  at org.apache.cassandra.stress.Stress.main(Unknown Source)


 This is without firewalls, on the same private subnet, with DNS resolving
 hostnames correctly. Nothing shows up in the cassandra logs on either the
 machine where the stress test command was invoked or where it was intended
 to run.

 Has anyone seen this problem before?

 Thanks,

 Joe


 --
 Joe Kaiser
 StackIQ
 Systems Engineer




-- 
Joe Kaiser
Systems Engineer
801-477-0272
AIM: kobudojoe
GTalk: joe.kai...@stackiq.com
Skype: joe.kaiser


Re: Running java stress tests

2011-11-07 Thread Jonathan Ellis
Thanks for the update, Joe.  Good to know!

On Mon, Nov 7, 2011 at 2:44 PM, Joe Kaiser joe.kai...@stackiq.com wrote:

 I managed to fix my own problem.
 I had rpc_address set to localhost. Running strace on the stress
 command showed it attempting to bind to an IPv6 address. Leaving rpc_address
 blank fixes the problem, since host resolution works fine on my cluster.
 Thanks,
 Joe

 On Sun, Nov 6, 2011 at 8:15 PM, Joe Kaiser joe.kai...@stackiq.com wrote:

 Hi,
 I am attempting to run the java stress tests:
 Tests to the local machine work fine:
 # sh stress -d localhost -n 100
 Unable to create stress keyspace: Keyspace names must be
 case-insensitively unique
 total,interval_op_rate,interval_key_rate,avg_latency,elapsed_time
 321975,32197,32197,0.0011982358878794939,10
 608889,28691,28691,0.0014594442934119632,20
 897246,28835,28835,0.0014675003554621528,30
 ..

 9894395,11904,11904,0.002851321382369248,505
 1000,10560,10560,0.003761071918943232,513
 END
 Tests to a number of the remote machines do not:

 # sh stress -d 10.1.255.254 -n 1000

 Exception in thread main java.lang.RuntimeException:
 java.net.ConnectException: Connection refused
 at org.apache.cassandra.stress.Session.getClient(Unknown Source)
 at org.apache.cassandra.stress.Session.createKeySpaces(Unknown Source)
 at org.apache.cassandra.stress.StressAction.run(Unknown Source)
 at org.apache.cassandra.stress.Stress.main(Unknown Source)

 This is without firewalls, on the same private subnet, with DNS resolving
 hostnames correctly. Nothing shows up in the cassandra logs on either the
 machine where the stress test command was invoked or where it was intended
 to run.
 Has anyone seen this problem before?
 Thanks,
 Joe

 --
 Joe Kaiser
 StackIQ
 Systems Engineer



 --
 Joe Kaiser
 Systems Engineer
 801-477-0272
 AIM: kobudojoe
 GTalk: joe.kai...@stackiq.com
 Skype: joe.kaiser




-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Error connection to remote JMX agent during repair

2011-11-07 Thread Maxim Potekhin

Hello,

I'm trying to run repair on one of my nodes, which needs to be repopulated
after a failure of the hard drive. What I'm getting is below. Note: I'm not
loading JMX with Cassandra; it always worked before... The version is 0.8.6.

Any help will be appreciated,

Maxim


Error connection to remote JMX agent!
java.io.IOException: Failed to retrieve RMIServer stub: 
javax.naming.CommunicationException [Root exception is 
java.rmi.ConnectIOException: error during JRMP connection establishment; 
nested exception is:

java.net.SocketTimeoutException: Read timed out]
at 
javax.management.remote.rmi.RMIConnector.connect(RMIConnector.java:338)
at 
javax.management.remote.JMXConnectorFactory.connect(JMXConnectorFactory.java:248)

at org.apache.cassandra.tools.NodeProbe.connect(NodeProbe.java:140)
at org.apache.cassandra.tools.NodeProbe.<init>(NodeProbe.java:110)
at org.apache.cassandra.tools.NodeCmd.main(NodeCmd.java:582)
Caused by: javax.naming.CommunicationException [Root exception is 
java.rmi.ConnectIOException: error during JRMP connection establishment; 
nested exception is:

java.net.SocketTimeoutException: Read timed out]
at 
com.sun.jndi.rmi.registry.RegistryContext.lookup(RegistryContext.java:101)
at 
com.sun.jndi.toolkit.url.GenericURLContext.lookup(GenericURLContext.java:185)

at javax.naming.InitialContext.lookup(InitialContext.java:392)
at 
javax.management.remote.rmi.RMIConnector.findRMIServerJNDI(RMIConnector.java:1886)
at 
javax.management.remote.rmi.RMIConnector.findRMIServer(RMIConnector.java:1856)
at 
javax.management.remote.rmi.RMIConnector.connect(RMIConnector.java:257)

... 4 more
Caused by: java.rmi.ConnectIOException: error during JRMP connection 
establishment; nested exception is:

java.net.SocketTimeoutException: Read timed out
at 
sun.rmi.transport.tcp.TCPChannel.createConnection(TCPChannel.java:286)
at 
sun.rmi.transport.tcp.TCPChannel.newConnection(TCPChannel.java:184)

at sun.rmi.server.UnicastRef.newCall(UnicastRef.java:322)
at sun.rmi.registry.RegistryImpl_Stub.lookup(Unknown Source)
at 
com.sun.jndi.rmi.registry.RegistryContext.lookup(RegistryContext.java:97)

... 9 more
Caused by: java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:129)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
at java.io.DataInputStream.readByte(DataInputStream.java:248)
at 
sun.rmi.transport.tcp.TCPChannel.createConnection(TCPChannel.java:228)




[RELEASE] Apache Cassandra 1.0.2 released

2011-11-07 Thread Sylvain Lebresne
The Cassandra team is pleased to announce the release of Apache Cassandra
version 1.0.2.

Cassandra is a highly scalable second-generation distributed database,
bringing together Dynamo's fully distributed design and Bigtable's
ColumnFamily-based data model. You can read more here:

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a maintenance/bug fix release[1]. It comes quickly after the
release of 1.0.1, mainly because it fixes a bug that was making compression
unusable for some, and we preferred delivering the fix quickly. It contains a
few other fixes as well, and upgrading is encouraged. As always, please pay
attention to the release notes[2] and let us know[3] if you
encounter any problem.

Have fun!

[1]: http://goo.gl/81Xbe (CHANGES.txt)
[2]: http://goo.gl/XUedS (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


Secondary index issue, unable to query for records that should be there

2011-11-07 Thread Nate Sammons
Hello,

I'm experimenting with Cassandra (DataStax Enterprise 1.0.3), and I've got a CF 
with several secondary indexes to try out some options.  Right now I have the 
following to create my CF using the CLI:

create column family MyTest with
  key_validation_class = UTF8Type
  and comparator = UTF8Type
  and column_metadata = [
  -- absolute timestamp for this message, also indexed 
year/month/day/hour/minute
  -- index these as they are low cardinality
  {column_name:messageTimestamp, validation_class:LongType},
  {column_name:messageYear, validation_class:IntegerType, index_type: KEYS},
  {column_name:messageMonth, validation_class:IntegerType, index_type: KEYS},
  {column_name:messageDay, validation_class:IntegerType, index_type: KEYS},
  {column_name:messageHour, validation_class:IntegerType, index_type: KEYS},
  {column_name:messageMinute, validation_class:IntegerType, index_type: KEYS},

... other non-indexed columns defined

  ];


So when I insert data, I calculate a year/month/day/hour/minute and set these 
values on a Hector ColumnFamilyUpdater instance and update that way.  Then 
later I can query from the command line with CQL such as:

get MyTest where messageYear=2011 and messageMonth=6 and 
messageDay=1 and messageHour=13 and messageMinute=44;

etc.  This generally works; however, at some point queries that I know should
return data no longer return any rows.
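
For reference, the write path described above looks roughly like this with
Hector's ColumnFamilyTemplate/ColumnFamilyUpdater -- the cluster, keyspace
and row-key names are placeholders, not my actual code:

import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.cassandra.service.template.ColumnFamilyTemplate;
import me.prettyprint.cassandra.service.template.ColumnFamilyUpdater;
import me.prettyprint.cassandra.service.template.ThriftColumnFamilyTemplate;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.factory.HFactory;

public class MyTestWriter {
    public static void main(String[] args) {
        Keyspace ks = HFactory.createKeyspace("MyKeyspace",
                HFactory.getOrCreateCluster("Test Cluster", "localhost:9160"));
        ColumnFamilyTemplate<String, String> template =
                new ThriftColumnFamilyTemplate<String, String>(
                        ks, "MyTest", StringSerializer.get(), StringSerializer.get());
        long now = System.currentTimeMillis();
        ColumnFamilyUpdater<String, String> updater = template.createUpdater("row-" + now);
        // One column per time dimension, matching the indexed column_metadata.
        updater.setLong("messageTimestamp", now);
        updater.setInteger("messageYear", 2011);
        updater.setInteger("messageMonth", 6);
        updater.setInteger("messageDay", 1);
        updater.setInteger("messageHour", 13);
        updater.setInteger("messageMinute", 44);
        template.update(updater);
    }
}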

So for instance, part way through my test (inserting 250K rows), I can query 
for what should be there and get data back such as the above query, but later 
that same query returns 0 rows.  Similarly, with fewer clauses in the 
expression, like this:

get MyTest where messageYear=2011 and messageMonth=6;

Will also return 0 rows.


???
Any idea what could be going wrong?  I'm not getting any exceptions in my 
client during the write, and I don't see anything in the logs (no errors 
anyway).



A second question - is what I'm doing insane?  I'm not sure that performance on 
CQL queries with multiple indexed columns is good (does Cassandra intelligently 
use all available indexes on these queries?)



Thanks,

-nate


Re: jamm - memory meter

2011-11-07 Thread Radim Kolar
Currently cassandra/conf/cassandra-env.sh disables use of jamm if 
openjdk is detected.


I enabled it and tested it on openjdk 1.6 b23, and it works as expected.
That openjdk test can probably be removed.
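
For context, the check in question looks roughly like this in
conf/cassandra-env.sh (this is a paraphrase, not a verbatim copy; the exact
wording and the jamm jar version vary by release):

  # add the jamm javaagent, unless this looks like OpenJDK
  if [ "`java -version 2>&1 | grep OpenJDK`" = "" ]
  then
      JVM_OPTS="$JVM_OPTS -javaagent:$CASSANDRA_HOME/lib/jamm-0.2.5.jar"
  fi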


Re: jamm - memory meter

2011-11-07 Thread Jonathan Ellis
What version is shipping on debian stable / RHEL?

On Mon, Nov 7, 2011 at 4:13 PM, Radim Kolar h...@sendmail.cz wrote:
 Currently cassandra/conf/cassandra-env.sh disables use of jamm if openjdk is
 detected.

 I enabled it and tested it on openjdk 1.6 b23, and it works as expected.
 That openjdk test can probably be removed.




-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: Secondary index issue, unable to query for records that should be there

2011-11-07 Thread Riyad Kalla
Nate, is this all against a single Cassandra server, or do you have a ring
setup? If you do have a ring setup, what is your replication factor set to?
Also, what ConsistencyLevel are you writing with when storing the values?

-R

On Mon, Nov 7, 2011 at 2:43 PM, Nate Sammons nsamm...@ften.com wrote:

 Hello,

 I’m experimenting with Cassandra (DataStax Enterprise 1.0.3), and I’ve got
 a CF with several secondary indexes to try out some options.  Right now I
 have the following to create my CF using the CLI:

 create column family MyTest with
   key_validation_class = UTF8Type
   and comparator = UTF8Type
   and column_metadata = [
   -- absolute timestamp for this message, also indexed
   -- year/month/day/hour/minute
   -- index these as they are low cardinality
   {column_name:messageTimestamp, validation_class:LongType},
   {column_name:messageYear, validation_class:IntegerType, index_type: KEYS},
   {column_name:messageMonth, validation_class:IntegerType, index_type: KEYS},
   {column_name:messageDay, validation_class:IntegerType, index_type: KEYS},
   {column_name:messageHour, validation_class:IntegerType, index_type: KEYS},
   {column_name:messageMinute, validation_class:IntegerType, index_type: KEYS},

 … other non-indexed columns defined

   ];

 So when I insert data, I calculate a year/month/day/hour/minute and set
 these values on a Hector ColumnFamilyUpdater instance and update that way.
 Then later I can query from the command line with CQL such as:

 get MyTest where messageYear=2011 and messageMonth=6 and
 messageDay=1 and messageHour=13 and messageMinute=44;

 etc.  This generally works, however at some point queries that I know
 should return data no longer return any rows.

 So for instance, part way through my test (inserting 250K rows), I can
 query for what should be there and get data back such as the above query,
 but later that same query returns 0 rows.  Similarly, with fewer clauses in
 the expression, like this:

 get MyTest where messageYear=2011 and messageMonth=6;

 Will also return 0 rows.

 ???

 Any idea what could be going wrong?  I’m not getting any exceptions in my
 client during the write, and I don’t see anything in the logs (no errors
 anyway).

 A second question – is what I’m doing insane?  I’m not sure that
 performance on CQL queries with multiple indexed columns is good (does
 Cassandra intelligently use all available indexes on these queries?)

 Thanks,

 -nate



RE: Using Cli to create a column family with column name metadata question

2011-11-07 Thread Arsene Lee
Hi,

Thanks for the reply. I'm not talking about the column name; I'm talking about
the column metadata's column name. Right now the CLI cannot display the
column's metadata name correctly if the comparator type is not UTF8.

Regards,

Arsene

-Original Message-
From: Jonathan Ellis [mailto:jbel...@gmail.com] 
Sent: Friday, November 04, 2011 11:09 PM
To: user
Subject: Re: Using Cli to create a column family with column name metadata 
question

[Moving to user@]

Because Cassandra's sparse data model supports using rows as materialized 
views, having non-UTF8 column names is common and totally valid.

On Fri, Nov 4, 2011 at 5:19 AM, Arsene Lee arsene@ruckuswireless.com 
wrote:
 Hi,

 I'm trying to use a Column Family's metadata to do some validation. I found
 out that in Cassandra's CLI code (CliClient.java), when creating a column
 family with column name metadata, the CF's comparator type is used to convert
 the name String to a ByteBuffer. I'm wondering if there is any particular
 reason for this? Wouldn't it be easier to always use UTF8Type for the column
 name metadata? If the CF's comparator is anything other than UTF8Type, it is
 hard to convert the column name back.

 Regards,

 Arsene Lee




--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support 
http://www.datastax.com



Re: Using Cli to create a column family with column name metadata question

2011-11-07 Thread Brandon Williams
On Mon, Nov 7, 2011 at 7:36 PM, Arsene Lee
arsene@ruckuswireless.com wrote:
 Hi,

 Thanks for the reply. I'm not talking about the column name; I'm talking
 about the column metadata's column name. Right now the CLI cannot display
 the column's metadata name correctly if the comparator type is not UTF8.

Try 'help assume;'
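
For example (the column family name is illustrative):

  assume MyColumnFamily comparator as utf8;
  list MyColumnFamily;

Note this only changes how the CLI displays and parses names; it does not
alter the schema.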

-Brandon


propertyfilesnitch problem

2011-11-07 Thread Shu Zhang
Hi,

We have a 2 DC setup on version 0.7.9 and have observed the following:
1. Using a property file snitch with the dynamic snitch turned on, the
performance of LOCAL_QUORUM operations is poor for a while (around a minute)
after a cluster restart before drastically improving.
2. With the same setup, after each period as defined by 
dynamic_snitch_reset_interval_in_ms, the LOCAL_QUORUM performance greatly 
degrades before drastically improving again within a minute.
3. With the dynamic snitch turned off, LOCAL_QUORUM operations perform 
extremely poorly... same as the 1st minute after a restart.
4. With dynamic snitch turned on, QUORUM operations' performance is about the 
same as using LOCAL_QUORUM when the dynamic snitch is off or the first minute 
after a restart with the snitch turned on.

All of this seems to point to LOCAL_QUORUM operations not differentiating our
DCs using the property file snitch; their performance effectively degrades to
that of QUORUM when the dynamic snitch doesn't have appropriate scores.

Our main concern is the performance degradation at the periods defined by 
dynamic_snitch_reset_interval_in_ms.

The DynamicEndpointSnitch in steady state assigns scores that match the DCs
we've configured through the network topology property file.

Our network topology property file appears to be properly configured; this has
been confirmed through the EndpointSnitchInfo mbean.
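
For reference, the file follows the standard PropertyFileSnitch format; the
addresses below are illustrative, not our actual nodes:

  # cassandra-topology.properties
  10.0.0.10=DC1:RAC1
  10.0.0.11=DC1:RAC1
  10.1.0.10=DC2:RAC1
  10.1.0.11=DC2:RAC1
  # nodes not listed above
  default=DC1:RAC1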

Please advise.

Thanks,
Shu
Medio Systems

Re: Second Cassandra users survey

2011-11-07 Thread Brian O'Neill
It should be dead-simple to build a slick GUI on the REST layer.
(@Virgil: http://code.google.com/a/apache-extras.org/p/virgil/)

I had planned to crank one out this week (using ExtJS) that mimicked the
Squirrel/Toad look and feel.  The UI would have a tree-panel of keyspaces
and column families on the left. Then the main panel would be partitioned
into two.  The top of the main panel would allow a user to type in
CQL/Pig, etc.  The bottom of the main panel would show the data contained
in the column family / result set.  Any other thoughts on design before I
get started?

If we build this based on the JSON/REST interface, it should be pretty easy
to embed in other applications.

-brian

On Mon, Nov 7, 2011 at 2:36 PM, Ian Danforth idanfo...@numenta.com wrote:


  Wish list: A decent GUI to explore data kept in Cassandra would be very
  valuable. It should also be extendable to provide viewers for custom data.


 +1 to that.

 @jonathan - This is what google moderator is really good at. Perhaps start
 one and move the idea creation / voting there.





-- 
Brian ONeill
Lead Architect, Health Market Science (http://healthmarketscience.com)
mobile:215.588.6024
blog: http://weblogs.java.net/blog/boneill42/
blog: http://brianoneill.blogspot.com/


RE: Using Cli to create a column family with column name metadata question

2011-11-07 Thread Arsene Lee
Hi,

I tried assume, and the column metadata's column name is still not right. I
think the CLI shouldn't use the comparator type to convert the column metadata
string; it should always use UTF8 to convert column name metadata.

Regards,

Arsene

-Original Message-
From: Brandon Williams [mailto:dri...@gmail.com] 
Sent: Tuesday, November 08, 2011 9:49 AM
To: user@cassandra.apache.org
Subject: Re: Using Cli to create a column family with column name metadata 
question

On Mon, Nov 7, 2011 at 7:36 PM, Arsene Lee arsene@ruckuswireless.com 
wrote:
 Hi,

 Thanks for the reply. I'm not talking about the column name; I'm talking
 about the column metadata's column name. Right now the CLI cannot display
 the column's metadata name correctly if the comparator type is not UTF8.

Try 'help assume;'

-Brandon



Re: Will writes with ALL consistency eventually propagate?

2011-11-07 Thread Peter Schuller
 Given that, the way I've understood this discussion so far is I would have a
 RF of N (my total node count) but my Consistency Level with all my writes
 will *likely* be QUORUM -- I think that is a good/safe default for me to use
 as writes aren't the scenario I need to optimize for latency; that being
 said, I also don't want to wait for a ConsistencyLevel of ALL to complete
 before my code continues though.
 Would you agree with this assessment or am I missing the boat on something?

Are you *sure* you care about latency to the degree that data being
non-local actually matters to your application?

Normally you don't set RF=N unless you have particularly special
requirements. The extra latency implied by another network round-trip
is certainly greater than zero, but in many practical situations
outliers and the behavior in case of e.g. node problems is much more
important than an extra millisecond or two on the average request.
Setting RF=N causes a larger data set on each node, in addition to
causing more nodes to be involved in every request. Consider whether
it's a better use of resources to set RF to e.g. 3 instead, and let
the ring grow independently. That is what one normally does.
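
As a concrete, hedged sketch (1.0 CLI syntax, which may differ slightly across
versions; the keyspace name is made up):

  create keyspace MyKeyspace
    with placement_strategy = 'SimpleStrategy'
    and strategy_options = {replication_factor:3};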

-- 
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)


Re: Will writes with ALL consistency eventually propagate?

2011-11-07 Thread Riyad Kalla
Peter,

Thanks for the additional insight on this -- think of a CDN that needs to
respond to requests distributed around the globe. Ideally, each edge location
would respond as quickly as possible (RF=N). But if each of the ring members
keeps open/active connections to the others, and a request comes in to an
edge location that does not hold a copy of the data, does that node fetch the
data from the node that does and then cache it (in case more requests for the
same data arrive at that edge location)? Or does it reply once and forget it,
requiring *each* subsequent request to that node to phone back home to the
node that actually contains the data?

The CDN/edge-server scenario works particularly well to illustrate my
goals, if visualizing that helps.

Look forward to your thoughts.

-R

On Mon, Nov 7, 2011 at 8:05 PM, Peter Schuller
peter.schul...@infidyne.comwrote:

  Given that, the way I've understood this discussion so far is I would
 have a
  RF of N (my total node count) but my Consistency Level with all my writes
  will *likely* be QUORUM -- I think that is a good/safe default for me to
 use
  as writes aren't the scenario I need to optimize for latency; that being
  said, I also don't want to wait for a ConsistencyLevel of ALL to complete
  before my code continues though.
  Would you agree with this assessment or am I missing the boat on
 something?

 Are you *sure* you care about latency to the degree that data being
 non-local actually matters to your application?

 Normally you don't set RF=N unless you have particularly special
 requirements. The extra latency implied by another network round-trip
 is certainly greater than zero, but in many practical situations
 outliers and the behavior in case of e.g. node problems is much more
 important than an extra millisecond or two on the average request.
 Setting RF=N causes a larger data set on each node, in addition to
 causing more nodes to be involved in every request. Consider whether
 it's a better use of resources to set RF to e.g. 3 instead, and let
 the ring grow independently. That is what one normally does.

 --
 / Peter Schuller (@scode, http://worldmodscode.wordpress.com)



Re: Second Cassandra users survey

2011-11-07 Thread Colin Taylor
Decompression without compression (for lack of a better name).

We store log batches into Cassandra that come in over http either
uncompressed, deflate, or snappy. We just add a 'magic' prefix (e.g.
\0snappy) to the column value so we can decode it when we serve it
back up.

It seems like Cassandra could detect data with the appropriate magic,
store it as is, and decode it for us automatically on the way back.
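
As a hedged sketch of the client-side decode step (the magic bytes and class
name here are made up for illustration, not our actual values):

  import java.io.ByteArrayInputStream;
  import java.io.ByteArrayOutputStream;
  import java.io.IOException;
  import java.util.zip.InflaterInputStream;

  public class LogBatchCodec {
      // Illustrative prefix marking a deflate-compressed value.
      private static final byte[] DEFLATE_MAGIC = { 0, 'd', 'e', 'f' };

      private static boolean hasPrefix(byte[] value, byte[] magic) {
          if (value.length < magic.length) return false;
          for (int i = 0; i < magic.length; i++)
              if (value[i] != magic[i]) return false;
          return true;
      }

      // Returns the payload, inflating only when the deflate magic is present.
      public static byte[] decode(byte[] value) throws IOException {
          if (!hasPrefix(value, DEFLATE_MAGIC))
              return value;  // stored uncompressed, serve as is
          InflaterInputStream in = new InflaterInputStream(
              new ByteArrayInputStream(value, DEFLATE_MAGIC.length,
                                       value.length - DEFLATE_MAGIC.length));
          ByteArrayOutputStream out = new ByteArrayOutputStream();
          byte[] buf = new byte[4096];
          int n;
          while ((n = in.read(buf)) != -1)
              out.write(buf, 0, n);
          return out.toByteArray();
      }
  }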

Colin.


Re: order of output in get_slice

2011-11-07 Thread Nate McCall
The results still come back in comparator order (increasing).
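
A quick hedged illustration against the raw Thrift API (the CF, row key, and
column names are made up); the two names are requested out of comparator order
but come back sorted:

  import java.nio.ByteBuffer;
  import java.nio.charset.Charset;
  import java.util.Arrays;
  import java.util.List;
  import org.apache.cassandra.thrift.*;

  public class SliceOrderExample {
      private static final Charset UTF8 = Charset.forName("UTF-8");

      public static void run(Cassandra.Client client) throws Exception {
          SlicePredicate predicate = new SlicePredicate();
          predicate.setColumn_names(Arrays.asList(
              ByteBuffer.wrap("zeta".getBytes(UTF8)),
              ByteBuffer.wrap("alpha".getBytes(UTF8))));
          List<ColumnOrSuperColumn> result = client.get_slice(
              ByteBuffer.wrap("rowkey".getBytes(UTF8)),
              new ColumnParent("MyCF"),
              predicate,
              ConsistencyLevel.ONE);
          // With a UTF8Type comparator, the result is alpha, then zeta.
      }
  }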

2011/11/7 Roland Hänel rol...@haenel.me:
 Does a call to
 listColumnOrSuperColumn get_slice(binary key, ColumnParent column_parent, SlicePredicate predicate, ConsistencyLevel consistency_level)

 give us any guarantees on the order of the returned list? I understand that
 when the predicate actually contains a sliceRange, then the order _is_
 guaranteed to be increasing (decreasing if the reverse flag is set). But
 when the predicate contains a list of column names instead of a range, do we
 also have the guarantee that the order is increasing (no decreasing option
 because no reverse flag here)?
 Greetings,
 Roland




Re: Will writes with ALL consistency eventually propagate?

2011-11-07 Thread Peter Schuller
 Thanks for the additional insight on this -- think of a CDN that needs to
 respond to requests distributed around the globe. Ideally, each edge location
 would respond as quickly as possible (RF=N). But if each of the ring members
 keeps open/active connections to the others, and a request comes in to an
 edge location that does not hold a copy of the data, does that node fetch the
 data from the node that does and then cache it (in case more requests for the
 same data arrive at that edge location)? Or does it reply once and forget it,
 requiring *each* subsequent request to that node to phone back home to the
 node that actually contains the data?
 The CDN/edge-server scenario works particularly well to illustrate my goals,
 if visualizing that helps.
 Look forward to your thoughts.

Nodes will never cache any data. Nodes have the data that they own
according to the ring topology and the replication factor (to the
extent that the data has been replicated); the node you happen to talk
to is merely a co-ordinator of a request; essentially a proxy with
intelligent routing to the correct hosts.

In the CDN situation, if you're talking about e.g. a group of servers in one
place (a network-topologically distinct location, such as a geographically
distinct one), then a better fit than RF=N is probably to use multi-site
support: say that you want a certain number of copies for each location, and
have all clients talk to the most local site, as in the sketch below.
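
A hedged illustration on the 1.0 CLI (the keyspace and data center names are
placeholders; the DC names must match whatever your snitch reports):

  create keyspace EdgeData
    with placement_strategy = 'NetworkTopologyStrategy'
    and strategy_options = {US_EAST:3, EU_WEST:3};

Clients in each site can then read and write at LOCAL_QUORUM against their
nearest coordinators.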

But that's assuming you want to try to model this using just
Cassandra's replication to begin with. Dynamically caching wherever
data is accessed is a good idea for a CDN use-case (probably), but is
not something that Cassandra does itself, internally. It's really
difficult to know what the best solution is for a CDN; and in your
case you imply that it's really *not* a CDN and it's just an analogy
;)

-- 
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)


Re: shutdown by KILL

2011-11-07 Thread Radim Kolar



For things like rolling restarts, we do:

disablethrift
disablegossip
(...wait for all nodes to see this node go down..)
drain

I implemented this in our batch scripts for Cassandra:

disablegossip
sleep 10 seconds
disablethrift
drain
KILL -TERM

A similar thing should be added to bin/stop-server.
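
In nodetool terms, that sequence is roughly the following (the host, the pid
file location, and the sleep duration are assumptions to adapt locally):

  #!/bin/sh
  # graceful Cassandra stop: stop serving, let the ring notice, flush, kill
  nodetool -h localhost disablegossip
  sleep 10
  nodetool -h localhost disablethrift
  nodetool -h localhost drain
  kill -TERM `cat /var/run/cassandra.pid`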