Read failure when adding node + move; Or: What is the right way to add a node?

2011-09-21 Thread David Boxenhorn
Initial state: 3 nodes, RF=3, version = 0.7.8, some queries are with
CL=QUORUM

1. Add node with the correct token for 4 nodes, repair
2. Move first node to balance 4 nodes, repair
3. Move second node

=== Start getting timeouts, Hector warning: WARNING - Error:
me.prettyprint.hector.api.exceptions.HUnavailableException: : May not be
enough replicas present to handle consistency level.

What is going on? My traffic isn't high. None of my nodes' logs show
ANYTHING during the move.

4. When the node finishes moving, the timeouts stop happening

Is there some point in the above scenario at which I don't have the required
replication of at least 2?
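
For reference, a quick sketch of the quorum arithmetic behind the question (generic Java, not tied to any Cassandra version): with RF=3, CL=QUORUM needs 2 reachable replicas per key, so any range that temporarily has only one reachable replica will fail QUORUM operations with an unavailable error.

public class QuorumMath {
    // Quorum size for a given replication factor: more than half the replicas.
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;   // RF=3 -> 2, RF=4 -> 3
    }

    public static void main(String[] args) {
        for (int rf = 1; rf <= 4; rf++)
            System.out.println("RF=" + rf + " needs " + quorum(rf) + " live replicas for QUORUM");
    }
}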


Re: deleted counters keeps their value?

2011-09-21 Thread David Boxenhorn
The reason why counters work is that addition is commutative, i.e.

x + y = y + x

but deletes are not commutative, i.e.

x + delete ≠ delete + x

so the result depends on the order in which the messages arrive.
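
A toy Java sketch of the asymmetry described above: two replicas replay the same two messages in different orders and end up in different states. This is only an illustration of the ordering problem, not Cassandra's counter implementation.

public class CounterOrderDemo {
    // Apply one message to a replica's local counter state; null means "no counter".
    static Long apply(Long state, String message) {
        if (message.equals("delete")) return null;                    // delete wipes the counter
        long delta = Long.parseLong(message.substring("inc ".length()));
        return (state == null ? 0L : state) + delta;                  // increments commute with each other
    }

    static Long replay(String... messages) {
        Long state = null;
        for (String m : messages) state = apply(state, m);
        return state;
    }

    public static void main(String[] args) {
        // Same two messages, opposite arrival order, different final state:
        System.out.println(replay("inc 5", "delete"));   // null (counter gone)
        System.out.println(replay("delete", "inc 5"));   // 5    (counter resurrected)
    }
}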


2011/9/21 Radim Kolar h...@sendmail.cz

 On 21.9.2011 12:07, aaron morton wrote:

  see technical limitations for deleting counters: http://wiki.apache.org/cassandra/Counters

 For instance, if you issue very quickly the sequence increment, remove,
 increment, it is possible for the removal to be lost (if for some reason the
 remove happens to be the last received message).

 But I do not remove them very quickly. It does that even with 60 seconds
 between the delete and the increment. I do not understand what "remove
 happens to be the last received message" means.



What causes dropped messages?

2011-08-16 Thread David Boxenhorn
How can I tell what's causing dropped messages?

Is it just too much activity? I'm not getting any other, more specific
messages, just these:

WARN [ScheduledTasks:1] 2011-08-15 11:33:26,136 MessagingService.java (line
504) Dropped 1534 MUTATION messages in the last 5000ms
WARN [ScheduledTasks:1] 2011-08-15 11:33:26,137 MessagingService.java (line
504) Dropped 58 READ_REPAIR messages in the last 5000ms


Re: Changing the CLI, not a great idea!

2011-07-28 Thread David Boxenhorn
This is part of a much bigger problem, one which has many parts, among them:

1. Cassandra is complex. Getting a gestalt understanding of it makes me
think I understand how Alzheimer's patients must feel.
2. There is no official documentation. Perhaps everything is out there
somewhere, who knows?
3. Cassandra is a moving target. Books are out of date before they hit the
press.
4. Most of the important knowledge about Cassandra exists in a kind of oral
history, that is hard to keep up with, and even harder to understand once
it's long past.

I think it is clear that we need a better one-stop-shop for good
documentation. What hasn't been talked about much - but I think it's just as
important - is a good one-stop-shop for Cassandra's oral history.

(You might think this list is the place, but it's too noisy to be useful,
except at the very tip of the cowcatcher. Cassandra needs a canonized
version of its oral history.)


On Thu, Jul 28, 2011 at 7:24 AM, Edward Capriolo edlinuxg...@gmail.com wrote:



 On Thu, Jul 28, 2011 at 12:01 AM, Jonathan Ellis jbel...@gmail.com wrote:

 On Wed, Jul 27, 2011 at 10:53 PM, Edward Capriolo edlinuxg...@gmail.com
 wrote:
  You can not even put two statements on the same line. So the ';' is semi
  useless syntax.

 Nobody ever asked for that, but lots of people asked to allow
 statements spanning multiple lines.

  Is there a way to move things forward without hurting backwards
  compatibility of the CLI?

 Yes.  Create a new one based on CQL but leave the old one around.

 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of DataStax, the source for professional Cassandra support
 http://www.datastax.com


 On a semi related note. How can you update a column family and add an
 index?

 [default@app] create column family people;
 4e3310c0-b8d1-11e0--242d50cf1f9f
 Waiting for schema agreement...
 ... schemas agree across the cluster
 [default@app] update column family people with column_metadata = [{
 column_name : ascii(inserted_at), validation_class : LongType , index_type :
 0 , index_name : ins_idx}];
 org.apache.cassandra.db.marshal.MarshalException: cannot parse
 'FUNCTION_CALL' as hex bytes
 [default@app] update column family people with column_metadata = [{
 column_name : inserted_at, validation_class : LongType , index_type : 0 ,
 index_name : ins_idx}];
 org.apache.cassandra.db.marshal.MarshalException: cannot parse
 'inserted_at' as hex bytes

 Edward



Re: CompositeType for row Keys

2011-07-24 Thread David Boxenhorn
Why do you need another CF? Is there something wrong with repeating the key
as a column and indexing it?
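
For illustration, a rough Hector-style sketch of "repeating the key as a column" so it can be covered by a secondary index. The class and method names are recalled from the 0.7/0.8-era Hector API and the cluster/keyspace/CF names are invented, so treat the details as assumptions to verify:

import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.mutation.Mutator;

public class RepeatKeyAsColumn {
    public static void main(String[] args) {
        Cluster cluster = HFactory.getOrCreateCluster("Test", "localhost:9160");
        Keyspace ks = HFactory.createKeyspace("MyKeyspace", cluster);
        StringSerializer se = StringSerializer.get();

        String rowKey = "2011-07-22:sensor-17";   // composite-style row key
        Mutator<String> m = HFactory.createMutator(ks, se);
        // The actual payload column.
        m.addInsertion(rowKey, "Events", HFactory.createStringColumn("payload", "..."));
        // Repeat the interesting part of the key as a regular column; put a KEYS index on it.
        m.addInsertion(rowKey, "Events", HFactory.createStringColumn("sensor", "sensor-17"));
        m.execute();
    }
}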

On Fri, Jul 22, 2011 at 7:40 PM, Patrick Julien pjul...@gmail.com wrote:

 Exactly.  In any case, I just answered my own question.  If I need
 range queries, I can just make another column family where the column names are
 these keys.

 On Fri, Jul 22, 2011 at 12:37 PM, Nate McCall n...@datastax.com wrote:
  yes, but why would you use CompositeType if you don't need range queries?
 
  If you were doing composite keys anyway (common approach with time
  series data for example), you would not have to write parsing and
  concatenation code. Particularly useful if you had mixed types in the
  key.
 



Re: Repair taking a long, long time

2011-07-20 Thread David Boxenhorn
I have this problem too, and I don't understand why.

I can repair my nodes very quickly by looping through all my data (when you
read your data it does read-repair), but nodetool repair takes forever. I
understand that nodetool repair builds merkle trees, etc. etc., so it's a
different algorithm, but why can't nodetool repair be smart enough to choose
the best algorithm? Also, I don't understand what's special about my data
that makes nodetool repair so much slower than looping through all my data.
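
For what it's worth, "repairing by reading" can be scripted as a paged range scan that touches every row, so each read can trigger read repair. The sketch below uses Hector-style calls from memory of 0.7-era clients, with made-up names; verify the classes and methods against your client version:

import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.beans.OrderedRows;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.query.RangeSlicesQuery;

public class ReadEverything {
    // Reads every row of a column family in pages, fetching one column per row,
    // which is enough for the read path (and read repair) to run.
    static void touchAllRows(Keyspace ks, String cf, int pageSize) {
        StringSerializer se = StringSerializer.get();
        String start = "";
        while (true) {
            RangeSlicesQuery<String, String, String> q =
                HFactory.createRangeSlicesQuery(ks, se, se, se);
            q.setColumnFamily(cf);
            q.setKeys(start, "");              // from `start` to the end of the ring
            q.setRange("", "", false, 1);      // one column is enough to trigger the read
            q.setRowCount(pageSize);
            OrderedRows<String, String, String> rows = q.execute().get();
            if (rows.getCount() == 0) break;
            String lastKey = rows.peekLast().getKey();
            if (rows.getCount() < pageSize || lastKey.equals(start)) break;
            start = lastKey;                   // next page starts at the last key seen
        }
    }
}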


On Wed, Jul 20, 2011 at 12:18 AM, Maxim Potekhin potek...@bnl.gov wrote:

 Thanks Edward. I'm told by our IT that the switch connecting the nodes is
 pretty fast.
 Seriously, in my house I copy complete DVD images from my bedroom to
 the living room downstairs via WiFi, and a dozen GB does not seem like a
 problem, on dirt cheap hardware (Patriot Box Office).

 I also have just _one_ major column family but caveat emptor -- 8 indexes
 attached to
 it (and there will be more). There is one accounting CF which is small,
 can't possibly
 make a difference.

 By contrast, compaction (as in nodetool) performs quite well on this
 cluster. I start suspecting some
 sort of malfunction.

 Looked at the system log during the repair, there is some compaction
 agent doing
 work that I'm not sure makes sense (and I didn't call for it). Disk
 utilization all of a sudden goes up to 40%
 per Ganglia, and stays there, this is pretty silly considering the cluster
 is IDLE and we have SSDs. No external writes,
 no reads. There are occasional GC stoppages, but these I can live with.

 This repair debacle happens 2nd time in a row. Cr@p. I need to go to
 production soon
 and that doesn't look good at all. If I can't manage a system that simple
 (and/or get help
 on this list) I may have to cut losses i.e. stay with Oracle.

 Regards,

 Maxim




 On 7/19/2011 12:16 PM, Edward Capriolo wrote:


 Well, most SSDs are pretty fast. There is one more thing to consider. If
 Cassandra determines nodes are out of sync, it has to transfer data across
 the network. If that is the case you have to look at 'nodetool streams' and
 determine how much data is being transferred between nodes. There are some
 open tickets where, with larger tables, repair streams more than it needs
 to. But even if the transfers are only 10% of your 200GB, transferring 20 GB
 is not trivial.

 If you have multiple keyspaces and column families, repairing one at a time
 might make the process more manageable.





Re: Repair taking a long, long time

2011-07-20 Thread David Boxenhorn
As I indicated below (but didn't say explicitly), another option is to set
read repair chance to 1.0 for all your CFs and loop over all your data,
since a read triggers read repair.

On Wed, Jul 20, 2011 at 4:58 PM, Maxim Potekhin potek...@bnl.gov wrote:

 I can re-load all data that I have in the cluster, from a flat-file cache I
 have
 on NFS, many times faster than the nodetool repair takes. And that's not
 even accurate because, as others noted, nodetool repair eats up disk space
 for breakfast and takes more than 24hrs on 200GB data load, at which point
 I have to cancel. That's not acceptable. I simply don't know what to do
 now.



 On 7/20/2011 8:47 AM, David Boxenhorn wrote:

 I have this problem too, and I don't understand why.

 I can repair my nodes very quickly by looping through all my data (when you
 read your data it does read-repair), but nodetool repair takes forever. I
 understand that nodetool repair builds merkle trees, etc. etc., so it's a
 different algorithm, but why can't nodetool repair be smart enough to choose
 the best algorithm? Also, I don't understand what's special about my data
 that makes nodetool repair so much slower than looping through all my data.


 On Wed, Jul 20, 2011 at 12:18 AM, Maxim Potekhin potek...@bnl.gov wrote:

 Thanks Edward. I'm told by our IT that the switch connecting the nodes is
 pretty fast.
 Seriously, in my house I copy complete DVD images from my bedroom to
 the living room downstairs via WiFi, and a dozen of GB does not seem like
 a
 problem, on dirt cheap hardware (Patriot Box Office).

 I also have just _one_ column major family but caveat emptor -- 8 indexes
 attached to
 it (and there will be more). There is one accounting CF which is small,
 can't possibly
 make a difference.

 By contrast, compaction (as in nodetool) performs quite well on this
 cluster. I start suspecting some
 sort of malfunction.

 Looked at the system log during the repair, there is some compaction
 agent doing
 work that I'm not sure makes sense (and I didn't call for it). Disk
 utilization all of a sudden goes up to 40%
 per Ganglia, and stays there, this is pretty silly considering the cluster
 is IDLE and we have SSDs. No external writes,
 no reads. There are occasional GC stoppages, but these I can live with.

 This repair debacle happens 2nd time in a row. Cr@p. I need to go to
 production soon
 and that doesn't look good at all. If I can't manage a system that simple
 (and/or get help
 on this list) I may have to cut losses i.e. stay with Oracle.

 Regards,

 Maxim




 On 7/19/2011 12:16 PM, Edward Capriolo wrote:


 Well most SSD's are pretty fast. There is one more to consider. If
 Cassandra determines nodes are out of sync it has to transfer data across
 the network. If that is the case you have to look at 'nodetool streams' and
 determine how much data is being transferred between nodes. There are some
 open tickets where with larger tables repair is streaming more than it needs
 to. But even if the transfers are only 10% of your 200GB. Transferring 20 GB
 is not trivial.

 If you have multiple keyspaces and column families repair one at a time
 might make the process more manageable.







Re: Default behavior of generate index_name for columns...

2011-07-18 Thread David Boxenhorn
I have lots of indexes on columns with the same name. Why don't I have this
problem?

For example:

Keyspace: City:
  Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
Replication Factor: 3
  Column Families:
ColumnFamily: AttractionCheckins
  Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
  Row cache size / save period: 0.0/0
  Key cache size / save period: 0.1/14400
  Memtable thresholds: 0.3/64/60
  GC grace seconds: 864000
  Compaction min/max thresholds: 4/64
  Read repair chance: 0.01
  Column Metadata:
Column Name: 09partition (09partition)
  Validation Class: org.apache.cassandra.db.marshal.UTF8Type
  Index Type: KEYS
ColumnFamily: Attractions
  Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
  Row cache size / save period: 3.0/14400
  Key cache size / save period: 3.0/14400
  Memtable thresholds: 0.3/64/60
  GC grace seconds: 864000
  Compaction min/max thresholds: 4/64
  Read repair chance: 0.01
  Column Metadata:
Column Name: 09partition (09partition)
  Validation Class: org.apache.cassandra.db.marshal.UTF8Type
  Index Type: KEYS
ColumnFamily: CityResources
  Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
  Row cache size / save period: 5000.0/14400
  Key cache size / save period: 5000.0/14400
  Memtable thresholds: 0.3/64/60
  GC grace seconds: 864000
  Compaction min/max thresholds: 4/64
  Read repair chance: 0.01
  Column Metadata:
Column Name: 09partition (09partition)
  Validation Class: org.apache.cassandra.db.marshal.UTF8Type
  Index Type: KEYS

On Mon, Jul 18, 2011 at 8:20 AM, Boris Yen yulin...@gmail.com wrote:

 Will this have any side effect when doing a get_indexed_slices or when a
 user wants to drop an index by any means?

 Boris


 On Mon, Jul 18, 2011 at 1:13 PM, Jonathan Ellis jbel...@gmail.com wrote:

 0.8.0 didn't check for name conflicts correctly.  0.8.1 does, but it
 can't fix the ones 0.8.0 allowed, retroactively.

 On Sun, Jul 17, 2011 at 11:52 PM, Boris Yen yulin...@gmail.com wrote:
  I have tested another case, not sure if this is a bug.
  I created a few column families on 0.8.0 each has user_name column, in
  addition, I also enabled secondary index on this column.  Then, I
 upgraded
  to 0.8.1, when I used cassandra-cli: show keyspaces, I saw index name
  user_name_idx appears for different columns families. It seems the
  validation rule for index_name on 0.8.1 has been skipped completely.
 
  Is this a bug? or is it intentional?
  Regards
  Boris
  On Sat, Jul 16, 2011 at 10:38 AM, Boris Yen yulin...@gmail.com wrote:
 
  Done. It is CASSANDRA-2903.
  On Sat, Jul 16, 2011 at 9:44 AM, Jonathan Ellis jbel...@gmail.com
 wrote:
 
  Please.
 
  On Fri, Jul 15, 2011 at 7:29 PM, Boris Yen yulin...@gmail.com
 wrote:
   Hi Jonathan,
   Do I need to open a ticket for this?
   Regards
   Boris
  
   On Sat, Jul 16, 2011 at 6:29 AM, Jonathan Ellis jbel...@gmail.com
   wrote:
  
   Sounds reasonable to me.
  
   On Fri, Jul 15, 2011 at 2:55 AM, Boris Yen yulin...@gmail.com
 wrote:
Hi,
I have a few column families, each has a column called user_name.
 I
tried to
use secondary index on user_name column for each of the column
family.
However, when creating these column families, cassandra keeps
reporting
Duplicate index name... exception. I finally figured out that
 it
seems
the
default index name is column name+_idx, this make my column
family
violate the uniqueness of index name rule.
I was wondering if the default index_name generating rule could
 be
like
column name+cf name, so the index name would not collide with
each
other
that easily, if the user do not assign index_name when creating
 a
column
family.
Regards
Boris
   
  
  
  
   --
   Jonathan Ellis
   Project Chair, Apache Cassandra
   co-founder of DataStax, the source for professional Cassandra
 support
   http://www.datastax.com
  
  
 
 
 
  --
  Jonathan Ellis
  Project Chair, Apache Cassandra
  co-founder of DataStax, the source for professional Cassandra support
  http://www.datastax.com
 
 
 



 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of DataStax, the source for professional Cassandra support
 http://www.datastax.com





Re: Default behavior of generate index_name for columns...

2011-07-18 Thread David Boxenhorn
Ah, that's it. I'm on 0.7.

On Mon, Jul 18, 2011 at 1:27 PM, Boris Yen yulin...@gmail.com wrote:

 which version of cassandra do you use? What I mentioned here only happens
 on 0.8.1.


 On Mon, Jul 18, 2011 at 4:44 PM, David Boxenhorn da...@citypath.com wrote:

 I have lots of indexes on columns with the same name. Why don't I have
 this problem?

 For example:

 Keyspace: City:
   Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
 Replication Factor: 3
   Column Families:
 ColumnFamily: AttractionCheckins
   Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
   Row cache size / save period: 0.0/0
   Key cache size / save period: 0.1/14400
   Memtable thresholds: 0.3/64/60
   GC grace seconds: 864000
   Compaction min/max thresholds: 4/64
   Read repair chance: 0.01
   Column Metadata:
 Column Name: 09partition (09partition)
   Validation Class: org.apache.cassandra.db.marshal.UTF8Type
   Index Type: KEYS
 ColumnFamily: Attractions
   Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
   Row cache size / save period: 3.0/14400
   Key cache size / save period: 3.0/14400
   Memtable thresholds: 0.3/64/60
   GC grace seconds: 864000
   Compaction min/max thresholds: 4/64
   Read repair chance: 0.01
   Column Metadata:
 Column Name: 09partition (09partition)
   Validation Class: org.apache.cassandra.db.marshal.UTF8Type
   Index Type: KEYS
 ColumnFamily: CityResources
   Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
   Row cache size / save period: 5000.0/14400
   Key cache size / save period: 5000.0/14400
   Memtable thresholds: 0.3/64/60
   GC grace seconds: 864000
   Compaction min/max thresholds: 4/64
   Read repair chance: 0.01
   Column Metadata:
 Column Name: 09partition (09partition)
   Validation Class: org.apache.cassandra.db.marshal.UTF8Type
   Index Type: KEYS


 On Mon, Jul 18, 2011 at 8:20 AM, Boris Yen yulin...@gmail.com wrote:

 Will this have any side effect when doing a get_indexed_slices or when a
 user wants to drop an index by any means?

 Boris


 On Mon, Jul 18, 2011 at 1:13 PM, Jonathan Ellis jbel...@gmail.com wrote:

 0.8.0 didn't check for name conflicts correctly.  0.8.1 does, but it
 can't fix the ones 0.8.0 allowed, retroactively.

 On Sun, Jul 17, 2011 at 11:52 PM, Boris Yen yulin...@gmail.com wrote:
  I have tested another case, not sure if this is a bug.
  I created a few column families on 0.8.0 each has user_name column, in
  addition, I also enabled secondary index on this column.  Then, I
 upgraded
  to 0.8.1, when I used cassandra-cli: show keyspaces, I saw index name
  user_name_idx appears for different columns families. It seems the
  validation rule for index_name on 0.8.1 has been skipped completely.
 
  Is this a bug? or is it intentional?
  Regards
  Boris
  On Sat, Jul 16, 2011 at 10:38 AM, Boris Yen yulin...@gmail.com
 wrote:
 
  Done. It is CASSANDRA-2903.
  On Sat, Jul 16, 2011 at 9:44 AM, Jonathan Ellis jbel...@gmail.com
 wrote:
 
  Please.
 
  On Fri, Jul 15, 2011 at 7:29 PM, Boris Yen yulin...@gmail.com
 wrote:
   Hi Jonathan,
   Do I need to open a ticket for this?
   Regards
   Boris
  
   On Sat, Jul 16, 2011 at 6:29 AM, Jonathan Ellis 
 jbel...@gmail.com
   wrote:
  
   Sounds reasonable to me.
  
   On Fri, Jul 15, 2011 at 2:55 AM, Boris Yen yulin...@gmail.com
 wrote:
Hi,
I have a few column families, each has a column called
 user_name. I
tried to
use secondary index on user_name column for each of the column
family.
However, when creating these column families, cassandra keeps
reporting
Duplicate index name... exception. I finally figured out that
 it
seems
the
default index name is column name+_idx, this make my column
family
violate the uniqueness of index name rule.
I was wondering if the default index_name generating rule could
 be
like
column name+cf name, so the index name would not collide
 with
each
other
that easily, if the user do not assign index_name when
 creating a
column
family.
Regards
Boris
   
  
  
  
   --
   Jonathan Ellis
   Project Chair, Apache Cassandra
   co-founder of DataStax, the source for professional Cassandra
 support
   http://www.datastax.com
  
  
 
 
 
  --
  Jonathan Ellis
  Project Chair, Apache Cassandra
  co-founder of DataStax, the source for professional Cassandra
 support
  http://www.datastax.com
 
 
 



 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of DataStax, the source for professional Cassandra support
 http://www.datastax.com







Re: Default behavior of generate index_name for columns...

2011-07-18 Thread David Boxenhorn
It would be nice if this were fixed before I move up to 0.8...

On Mon, Jul 18, 2011 at 3:19 PM, Boris Yen yulin...@gmail.com wrote:

 If it would not cause the dev team too much trouble, I think Cassandra
 should maintain backward compatibility regarding the generation of the
 default index_name; otherwise, when people start dropping column indices,
 the result might not be what they want.


 On Mon, Jul 18, 2011 at 7:59 PM, Jonathan Ellis jbel...@gmail.com wrote:

 On Mon, Jul 18, 2011 at 12:20 AM, Boris Yen yulin...@gmail.com wrote:
  Will this have any side effect when doing a get_indexed_slices

 No

  or when a
  user wants to drop an index by any means?

 Sort of; one of the indexes with the name will be dropped, but not all.

 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of DataStax, the source for professional Cassandra support
 http://www.datastax.com





Why do Digest Queries return hash instead of timestamp?

2011-07-13 Thread David Boxenhorn
I just saw this

http://wiki.apache.org/cassandra/DigestQueries

and I was wondering why it returns a hash of the data. Wouldn't it be better
and easier to return the timestamp? You don't really care what the data is,
you only care whether it is more or less recent than another piece of data.


Re: Why do Digest Queries return hash instead of timestamp?

2011-07-13 Thread David Boxenhorn
If you have two pieces of data that are different but have the same
timestamp, how can you resolve consistency?

This is a pathological situation to begin with; why should you waste effort
to (not) solve it?

On Wed, Jul 13, 2011 at 12:05 PM, Boris Yen yulin...@gmail.com wrote:

 I guess it is because the timestamp does not guarantee data consistency,
 but hash does.

 Boris


 On Wed, Jul 13, 2011 at 4:27 PM, David Boxenhorn da...@citypath.com wrote:

 I just saw this

 http://wiki.apache.org/cassandra/DigestQueries

 and I was wondering why it returns a hash of the data. Wouldn't it be
 better and easier to return the timestamp? You don't really care what the
 data is, you only care whether it is more or less recent than another piece
 of data.





Re: Why do Digest Queries return hash instead of timestamp?

2011-07-13 Thread David Boxenhorn
How would you know which data is correct, if they both have the same
timestamp?

On Wed, Jul 13, 2011 at 12:40 PM, Boris Yen yulin...@gmail.com wrote:

 I can only say that the data does matter; that is why the developers use a hash
 instead of a timestamp. If the hash value that comes from another node is not a match, a
 read repair is performed, so that the correct data can be returned.


 On Wed, Jul 13, 2011 at 5:08 PM, David Boxenhorn da...@citypath.com wrote:

 If you have to pieces of data that are different but have the same
 timestamp, how can you resolve consistency?

 This is a pathological situation to begin with, why should you waste
 effort to (not) solve it?

 On Wed, Jul 13, 2011 at 12:05 PM, Boris Yen yulin...@gmail.com wrote:

 I guess it is because the timestamp does not guarantee data consistency,
 but hash does.

 Boris


 On Wed, Jul 13, 2011 at 4:27 PM, David Boxenhorn da...@citypath.com wrote:

 I just saw this

 http://wiki.apache.org/cassandra/DigestQueries

 and I was wondering why it returns a hash of the data. Wouldn't it be
 better and easier to return the timestamp? You don't really care what the
 data is, you only care whether it is more or less recent than another piece
 of data.







Re: Why do Digest Queries return hash instead of timestamp?

2011-07-13 Thread David Boxenhorn
Is that the actual reason?

This seems like a big inefficiency to me. For those of us who don't worry
about this extreme edge case (that probably will NEVER happen in real life,
for most applications), is there a way to turn this off?

Or am I wrong about this making the operation MUCH more expensive?


On Wed, Jul 13, 2011 at 3:20 PM, Boris Yen yulin...@gmail.com wrote:

 For a specific column, if there are two versions with the same timestamp,
 the value of the column is used to break the tie.

 if v1.value().compareTo(v2.value()) < 0, it means that v2 wins.
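
A minimal Java sketch of that reconciliation rule (highest timestamp wins; on an exact tie, the raw values are compared so every replica picks the same winner). This is an illustration, not the actual Cassandra class:

import java.nio.ByteBuffer;

public class Reconcile {
    static final class Col {
        final long timestamp;
        final ByteBuffer value;
        Col(long timestamp, byte[] value) { this.timestamp = timestamp; this.value = ByteBuffer.wrap(value); }
    }

    static Col reconcile(Col v1, Col v2) {
        if (v1.timestamp != v2.timestamp)
            return v1.timestamp > v2.timestamp ? v1 : v2;    // normal case: newer wins
        return v1.value.compareTo(v2.value) < 0 ? v2 : v1;   // tie: the "larger" value wins
    }
}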

 On Wed, Jul 13, 2011 at 7:13 PM, David Boxenhorn da...@citypath.com wrote:

 How would you know which data is correct, if they both have the same
 timestamp?

 On Wed, Jul 13, 2011 at 12:40 PM, Boris Yen yulin...@gmail.com wrote:

 I can only say, data does matter, that is why the developers use hash
 instead of timestamp. If hash value comes from other node is not a match, a
 read repair would perform. so that correct data can be returned.


 On Wed, Jul 13, 2011 at 5:08 PM, David Boxenhorn da...@citypath.com wrote:

 If you have to pieces of data that are different but have the same
 timestamp, how can you resolve consistency?

 This is a pathological situation to begin with, why should you waste
 effort to (not) solve it?

 On Wed, Jul 13, 2011 at 12:05 PM, Boris Yen yulin...@gmail.com wrote:

 I guess it is because the timestamp does not guarantee data
 consistency, but hash does.

 Boris


 On Wed, Jul 13, 2011 at 4:27 PM, David Boxenhorn 
  da...@citypath.com wrote:

 I just saw this

 http://wiki.apache.org/cassandra/DigestQueries

 and I was wondering why it returns a hash of the data. Wouldn't it be
 better and easier to return the timestamp? You don't really care what the
 data is, you only care whether it is more or less recent than another 
 piece
 of data.









Re: Why do Digest Queries return hash instead of timestamp?

2011-07-13 Thread David Boxenhorn
Got it.

Thanks!

On Wed, Jul 13, 2011 at 6:05 PM, Jonathan Ellis jbel...@gmail.com wrote:

 (1) the hash calculation is a small amount of CPU -- MD5 is
 specifically designed to be efficient in this kind of situation
 (2) we compute one hash per query, so for multiple columns the
 advantage over timestamp-per-column gets large quickly.
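
To illustrate point (2), a small sketch of hashing a whole (ordered) query result into one fixed-size digest with plain MessageDigest. The layout hashed here is invented for the example and is not Cassandra's actual digest format:

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.SortedMap;

public class ResultDigest {
    // Hash every column name and value in a deterministic (sorted) order:
    // 16 bytes go over the wire no matter how many columns were read.
    static byte[] digest(SortedMap<String, String> columns) throws NoSuchAlgorithmException {
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        for (Map.Entry<String, String> c : columns.entrySet()) {
            md5.update(c.getKey().getBytes(StandardCharsets.UTF_8));    // column name
            md5.update(c.getValue().getBytes(StandardCharsets.UTF_8));  // column value
        }
        return md5.digest();
    }
}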

 On Wed, Jul 13, 2011 at 7:31 AM, David Boxenhorn da...@citypath.com
 wrote:
  Is that the actual reason?
 
  This seems like a big inefficiency to me. For those of us who don't worry
  about this extreme edge case (that probably will NEVER happen in real
 life,
  for most applications), is there a way to turn this off?
 
  Or am I wrong about this making the operation MUCH more expensive?
 
 
  On Wed, Jul 13, 2011 at 3:20 PM, Boris Yen yulin...@gmail.com wrote:
 
  For a specific column, If there are two versions with the same
 timestamp,
  the value of the column is used to break the tie.
  if v1.value().compareTo(v2.value()) < 0, it means that v2 wins.
  On Wed, Jul 13, 2011 at 7:13 PM, David Boxenhorn da...@citypath.com
  wrote:
 
  How would you know which data is correct, if they both have the same
  timestamp?
 
  On Wed, Jul 13, 2011 at 12:40 PM, Boris Yen yulin...@gmail.com
 wrote:
 
  I can only say, data does matter, that is why the developers use
 hash
  instead of timestamp. If hash value comes from other node is not a
 match, a
  read repair would perform. so that correct data can be returned.
 
  On Wed, Jul 13, 2011 at 5:08 PM, David Boxenhorn da...@citypath.com
  wrote:
 
  If you have to pieces of data that are different but have the same
  timestamp, how can you resolve consistency?
 
  This is a pathological situation to begin with, why should you waste
  effort to (not) solve it?
 
  On Wed, Jul 13, 2011 at 12:05 PM, Boris Yen yulin...@gmail.com
 wrote:
 
  I guess it is because the timestamp does not guarantee data
  consistency, but hash does.
  Boris
 
  On Wed, Jul 13, 2011 at 4:27 PM, David Boxenhorn 
 da...@citypath.com
  wrote:
 
  I just saw this
 
  http://wiki.apache.org/cassandra/DigestQueries
 
  and I was wondering why it returns a hash of the data. Wouldn't it
 be
  better and easier to return the timestamp? You don't really care
 what the
  data is, you only care whether it is more or less recent than
 another piece
  of data.
 
 
 
 
 
 
 



 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of DataStax, the source for professional Cassandra support
 http://www.datastax.com



Re: Questions about Cassandra reads

2011-07-03 Thread David Boxenhorn
 What do you think ?

 I think you should strongly consider denormalizing so that you can
 read ranges from a single row instead.

Why do you recommend denormalizing instead of secondary indexes?


Re: Questions about Cassandra reads

2011-07-03 Thread David Boxenhorn
Ah, I get it. Your normal access pattern should be one row at a time.

On Sun, Jul 3, 2011 at 11:41 AM, David Boxenhorn da...@citypath.com wrote:
 What do you think ?

 I think you should strongly consider denormalizing so that you can
 read ranges from a single row instead.

 Why do you recommend denormalizing instead of secondary indexes?



Re: Truncate introspection

2011-06-28 Thread David Boxenhorn
Does drop work in a similar way?

When I drop a CF and add it back with a different schema, it seems to work.

But I notice that in between the drop and adding it back, when the CLI
tells me the CF doesn't exist, the old data is still there.

I've been assuming that this works, but just wanted to make sure...

On Tue, Jun 28, 2011 at 12:56 AM, Jonathan Ellis jbel...@gmail.com wrote:
 Each node (independently) has logic that guarantees that any writes
 processed before the truncate will be wiped out.

 This does not mean that each node will wipe out the same data, or even
 that each node will process the truncate (which would result in a
 timedoutexception).

 It also does not mean you can't have writes immediately after the
 truncate that would race with a "truncate, then check for zero sstables"
 procedure.

 On Mon, Jun 27, 2011 at 3:35 PM, Ethan Rowe et...@the-rowes.com wrote:
 If those went to zero, it would certainly tell me something happened.  :)  I
 guess watching that would be a way of seeing something was going on.
 Is the truncate itself propagating a ring-wide marker or anything so the CF
 is logically empty before being physically removed?  That's the impression
 I got from the docs but it wasn't totally clear to me.

 On Mon, Jun 27, 2011 at 3:33 PM, Jonathan Ellis jbel...@gmail.com wrote:

 There's a JMX method to get the number of sstables in a CF, is that
 what you're looking for?
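
For anyone hunting for that JMX attribute, a hedged sketch using only standard javax.management plumbing; the JMX port, MBean object name and attribute name below are assumptions to confirm against your Cassandra version (jconsole will show the exact names):

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class SSTableCount {
    public static void main(String[] args) throws Exception {
        JMXServiceURL url = new JMXServiceURL(
            "service:jmx:rmi:///jndi/rmi://localhost:8080/jmxrmi");    // assumed 0.7 JMX port
        try (JMXConnector jmxc = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbs = jmxc.getMBeanServerConnection();
            ObjectName cf = new ObjectName(                            // assumed object name pattern
                "org.apache.cassandra.db:type=ColumnFamilies,keyspace=MyKeyspace,columnfamily=MyCF");
            Object count = mbs.getAttribute(cf, "LiveSSTableCount");   // assumed attribute name
            System.out.println("Live SSTables: " + count);
        }
    }
}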

 On Mon, Jun 27, 2011 at 1:04 PM, Ethan Rowe et...@the-rowes.com wrote:
  Is there any straightforward means of seeing what's going on after
  issuing a
  truncate (on 0.7.5)?  I'm not seeing evidence that anything actually
  happened.  I've disabled read repair on the column family in question
  and
  don't have anything actively reading/writing at present, apart from my
  one-off tests to see if rows have disappeared.
  Thanks in advance.



 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of DataStax, the source for professional Cassandra support
 http://www.datastax.com





 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of DataStax, the source for professional Cassandra support
 http://www.datastax.com



Re: 99.999% uptime - Operations Best Practices?

2011-06-23 Thread David Boxenhorn
I think very high uptime, and very low data loss is achievable in
Cassandra, but, for new users there are TONS of gotchas. You really
have to know what you're doing, and I doubt that many people acquire
that knowledge without making a lot of mistakes.

I see above that most people are talking about configuration issues.
But, the first thing that you will probably do, before you have any
experience with Cassandra(!), is architect your system. Architecture
is not easily changed when you bump into a gotcha, and for some reason
you really have to search the literature well to find out about them.
So, my contributions:

The too many CFs problem. Cassandra doesn't do well with many column
families. If you come from a relational world, a real application can
easily have hundreds of tables. Even if you combine them into entities
(which is the Cassandra way), you can easily end up with dozens of
entities. The most natural thing for someone with a relational
background is have one CF per entity, plus indexes according to your
needs. Don't do it. You need to store multiple entities in the same
CF. Group them together according to access patterns (i.e. when you
use X,  you probably also need Y), and distinguish them by adding a
prefix to their keys (e.g. entityName@key).
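
A minimal sketch of that key convention, assuming string row keys and an '@' separator; nothing here is Cassandra-specific, it is just the naming scheme:

public final class EntityKeys {
    private static final char SEP = '@';

    // Build a row key that carries the entity name as a prefix, e.g. "User@42".
    static String key(String entityName, String id) {
        return entityName + SEP + id;
    }

    static String entityOf(String rowKey) {
        return rowKey.substring(0, rowKey.indexOf(SEP));
    }

    static String idOf(String rowKey) {
        return rowKey.substring(rowKey.indexOf(SEP) + 1);
    }

    public static void main(String[] args) {
        String k = key("User", "42");
        System.out.println(k + " -> " + entityOf(k) + " / " + idOf(k));
    }
}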

Don't use supercolumns, use composite columns. Supercolumns are
disfavored by the Cassandra community and are slowly being orphaned.
For example, secondary indexes don't work on supercolumns. Nor does
CQL. Bugs crop up with supercolumns that don't happen with regular
columns because internally there's a huge separate code base for
supercolumns, and every new feature is designed and implemented for
regular columns and then retrofitted for supercolumns (or not).

There should really be a database of gotchas somewhere, and how they
were solved...

On Thu, Jun 23, 2011 at 6:57 AM, Les Hazlewood l...@katasoft.com wrote:
 Edward,
 Thank you so much for this reply - this is great stuff, and I really
 appreciate it.
 You'll be happy to know that I've already pre-ordered your book.  I'm
 looking forward to it! (When is the ship date?)
 Best regards,
 Les

 On Wed, Jun 22, 2011 at 7:03 PM, Edward Capriolo edlinuxg...@gmail.com
 wrote:


 On Wed, Jun 22, 2011 at 8:31 PM, Les Hazlewood l...@katasoft.com wrote:

 Hi Thoku,
 You were able to more concisely represent my intentions (and their
 reasoning) in this thread than I was able to do so myself.  Thanks!

 On Wed, Jun 22, 2011 at 5:14 PM, Thoku Hansen tho...@gmail.com wrote:

 I think that Les's question was reasonable. Why *not* ask the community
 for the 'gotchas'?
 Whether the info is already documented or not, it could be an
 opportunity to improve the documentation based on users' perception.
 The you just have to learn responses are fair also, but that reminds
 me of the days when running Oracle was a black art, and accumulated wisdom
 made DBAs irreplaceable.

 Yes, this was my initial concern.  I know that Cassandra is still young,
 and I expect this to be the norm for a while, but I was hoping to make that
 process a bit easier (for me and anyone else reading this thread in the
 future).

 Some recommendations *are* documented, but they are dispersed / stale /
 contradictory / or counter-intuitive.
 Others have not been documented in the wiki nor in DataStax's doco, and
 are instead learned anecdotally or The Hard Way.
 For example, whether documented or not, some of the 'gotchas' that I
 encountered when I first started working with Cassandra were:
 * Don't use OpenJDK. Prefer the Sun JDK. (Wiki says this, Jira says
 that).
  * It's not viable to run without JNA installed.
 * Disable swap memory.
 * Need to run nodetool repair on a regular basis.
 I'm looking forward to Edward Capriolo's Cassandra book which Les will
 probably find helpful.

 Thanks for linking to this.  I'm pre-ordering right away.
 And thanks for the pointers, they are exactly the kind of enumerated
 things I was looking to elicit.  These are the kinds of things that are hard
 to track down in a single place.  I think it'd be nice for the community to
 contribute this stuff to a single page ('best practices', 'checklist',
 whatever you want to call it).  It would certainly make things easier when
 getting started.
 Thanks again,
 Les

 Since I got a plug on the book I will chip in again to the thread :)

 Some things that were mentioned already:

  Install JNA, absolutely (without JNA the snapshot command has to fork to
  hard-link the sstables; I have seen clients back off from this). Also, the
  performance-focused Cassandra devs always try to squeeze out performance by
  utilizing more native features.

  OpenJDK vs. Sun: I agree. Almost always try to do what 'most others' do in
  production; this way you get surprised less.

 Other stuff:

 RAID. You might want to go RAID 1+0 if you are aiming for uptime. RAID 0
 has better performance, but if you lose a node your capacity is diminished,
 rebuilding and rejoining a node involves more 

Re: Replication-aware compaction

2011-06-07 Thread David Boxenhorn
Thanks! I'm actually on vacation now, so I hope to look into this next week.

On Mon, Jun 6, 2011 at 10:25 PM, aaron morton aa...@thelastpickle.com wrote:
 You should consider upgrading to 0.7.6 to get a fix to Gossip. Earlier 0.7 
 releases were prone to marking nodes up and down when they should not have 
 been. See 
 https://github.com/apache/cassandra/blob/cassandra-0.7/CHANGES.txt#L22

  Are the TimedOutExceptions to the client for read or write requests? During
  the burst times, which stages are backing up (nodetool tpstats)? Compaction
  should not affect writes too much (assuming different log and data spindles).

 You could also take a look at the read and write latency stats for a 
 particular CF using nodetool cfstats or JConsole. These will give you the 
 stats for the local operations. You could also take a look at the iostats on 
 the box http://spyced.blogspot.com/2010/01/linux-performance-basics.html

 Hope that helps.

 -
 Aaron Morton
 Freelance Cassandra Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 7 Jun 2011, at 00:30, David Boxenhorn wrote:

 Version 0.7.3.

 Yes, I am talking about minor compactions. I have three nodes, RF=3.
 3G data (before replication). Not many users (yet). It seems like 3
 nodes should be plenty. But when all 3 nodes are compacting, I
 sometimes get timeouts on the client, and I see in my logs that each
 one is full of notifications that the other nodes have died (and come
 back to life after about a second). My cluster can tolerate one node
 being out of commission, so I would rather have longer compactions one
 at a time than shorter compactions all at the same time.

 I think that our usage pattern of bursty writes causes the three nodes
 to decide to compact at the same time. These bursts are followed by
 periods of relative quiet, so there should be time for the other two
 nodes to compact one at a time.


 On Mon, Jun 6, 2011 at 3:27 PM, David Boxenhorn da...@citypath.com wrote:

 Version 0.7.3.

 Yes, I am talking about minor compactions. I have three nodes, RF=3. 3G 
 data (before replication). Not many users (yet). It seems like 3 nodes 
 should be plenty. But when all 3 nodes are compacting, I sometimes get 
 timeouts on the client, and I see in my logs that each one is full of 
 notifications that the other nodes have died (and come back to life after 
 about a second). My cluster can tolerate one node being out of commission, 
 so I would rather have longer compactions one at a time than shorter 
 compactions all at the same time.

 I think that our usage pattern of bursty writes causes the three nodes to 
 decide to compact at the same time. These bursts are followed by periods of 
 relative quiet, so there should be time for the other two nodes to compact 
 one at a time.


 On Mon, Jun 6, 2011 at 2:36 PM, aaron morton aa...@thelastpickle.com 
 wrote:

  Are you talking about minor (automatic) compactions? Can you provide some 
  more information on what's happening to make the node unusable and what 
  version you are using? It's not a lightweight process, but it should not 
 hurt the node that badly. It is considered an online operation.

 Delaying compaction will only make it run for longer and take more 
 resources.

 Cheers

 -
 Aaron Morton
 Freelance Cassandra Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 6 Jun 2011, at 20:14, David Boxenhorn wrote:

 Is there some deep architectural reason why compaction can't be
 replication-aware?

 What I mean is, if one node is doing compaction, its replicas
 shouldn't be doing compaction at the same time. Or, at least a quorum
 of nodes should be available at all times.

 For example, if RF=3, and one node is doing compaction, the nodes to
 its right and left in the ring should wait on compaction until that
 node is done.

 Of course, my real problem is that compaction makes a node pretty much
 unavailable. If we can fix that problem then this is not necessary.






Replication-aware compaction

2011-06-06 Thread David Boxenhorn
Is there some deep architectural reason why compaction can't be
replication-aware?

What I mean is, if one node is doing compaction, its replicas
shouldn't be doing compaction at the same time. Or, at least a quorum
of nodes should be available at all times.

For example, if RF=3, and one node is doing compaction, the nodes to
its right and left in the ring should wait on compaction until that
node is done.

Of course, my real problem is that compaction makes a node pretty much
unavailable. If we can fix that problem then this is not necessary.


Re: [SPAM] Re: slow insertion rate with secondary index

2011-06-06 Thread David Boxenhorn
Is there really a 10x difference between indexed CFs and non-indexed CFs?

On Mon, Jun 6, 2011 at 11:05 AM, Donal Zang zan...@ihep.ac.cn wrote:

 On 06/06/2011 05:38, Jonathan Ellis wrote:

 Index updates require read-before-write (to find out what the prior
 version was, if any, and update the index accordingly).  This is
 random i/o.

 Index creation on the other hand is a lot of sequential i/o, hence
 more efficient.

 So, the classic bulk load advice to ingest data prior to creating
 indexes applies.

 Thanks for the explanation!

 --
 Donal Zang
 Computing Center, IHEP
 19B YuquanLu, Shijingshan District,Beijing, 100049
 zan...@ihep.ac.cn
 86 010 8823 6018





Re: [SPAM] Re: slow insertion rate with secondary index

2011-06-06 Thread David Boxenhorn
Jonathan, are Donal Zang's results (10x slowdown) typical?

On Mon, Jun 6, 2011 at 3:14 PM, Jonathan Ellis jbel...@gmail.com wrote:

 On Mon, Jun 6, 2011 at 6:28 AM, Donal Zang zan...@ihep.ac.cn wrote:
   Another thing I noticed: if you first do the insertion, then build the
   secondary index using update column family ..., and then do a select based
   on the index, the result is not right (it seems the index is still being built,
   though the update command returns quickly).

 That is correct. describe keyspace from the cli tells you when an
 index has finished building.

 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of DataStax, the source for professional Cassandra support
 http://www.datastax.com



Re: Replication-aware compaction

2011-06-06 Thread David Boxenhorn
Version 0.7.3.

Yes, I am talking about minor compactions. I have three nodes, RF=3.
3G data (before replication). Not many users (yet). It seems like 3
nodes should be plenty. But when all 3 nodes are compacting, I
sometimes get timeouts on the client, and I see in my logs that each
one is full of notifications that the other nodes have died (and come
back to life after about a second). My cluster can tolerate one node
being out of commission, so I would rather have longer compactions one
at a time than shorter compactions all at the same time.

I think that our usage pattern of bursty writes causes the three nodes
to decide to compact at the same time. These bursts are followed by
periods of relative quiet, so there should be time for the other two
nodes to compact one at a time.


On Mon, Jun 6, 2011 at 3:27 PM, David Boxenhorn da...@citypath.com wrote:

 Version 0.7.3.

 Yes, I am talking about minor compactions. I have three nodes, RF=3. 3G data 
 (before replication). Not many users (yet). It seems like 3 nodes should be 
 plenty. But when all 3 nodes are compacting, I sometimes get timeouts on the 
 client, and I see in my logs that each one is full of notifications that the 
 other nodes have died (and come back to life after about a second). My 
 cluster can tolerate one node being out of commission, so I would rather have 
 longer compactions one at a time than shorter compactions all at the same 
 time.

 I think that our usage pattern of bursty writes causes the three nodes to 
 decide to compact at the same time. These bursts are followed by periods of 
 relative quiet, so there should be time for the other two nodes to compact 
 one at a time.


 On Mon, Jun 6, 2011 at 2:36 PM, aaron morton aa...@thelastpickle.com wrote:

 Are you talking about minor (automatic) compactions ? Can you provide some 
 more information on what's happening to make the node unusable and what 
 version you are using? It's not lightweight process, but it should not hurt 
 the node that badly. It is considered an online operation.

 Delaying compaction will only make it run for longer and take more resources.

 Cheers

 -
 Aaron Morton
 Freelance Cassandra Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 6 Jun 2011, at 20:14, David Boxenhorn wrote:

  Is there some deep architectural reason why compaction can't be
  replication-aware?
 
  What I mean is, if one node is doing compaction, its replicas
  shouldn't be doing compaction at the same time. Or, at least a quorum
  of nodes should be available at all times.
 
  For example, if RF=3, and one node is doing compaction, the nodes to
  its right and left in the ring should wait on compaction until that
  node is done.
 
  Of course, my real problem is that compaction makes a node pretty much
  unavailable. If we can fix that problem then this is not necessary.




CQL: Select for multiple ranges

2011-05-20 Thread David Boxenhorn
In order to fully implement the functionality of super columns using
compound columns I need to be able to select multiple column ranges - this
would be functionally equivalent to selecting multiple super columns (and
more!).

I would like to request the following CQL syntax:

SELECT [FIRST N] [REVERSED] name1..nameN1, name2..nameN2... FROM ...

I am heading into my weekend here. If no one has created a JIRA ticket for
this by Sunday, and I am not talked out of it, I will create one myself.


Re: Using composite column names in the CLI

2011-05-17 Thread David Boxenhorn
This is what I'm talking about

https://issues.apache.org/jira/browse/CASSANDRA-2231

The on-disk format is

<(short) length><constituent><end byte = 0>
<(short) length><constituent><end byte = 0> ...

I would like to be able to input these kinds of keys into the CLI, something
like

set cf[key]['constituent1':'constituent2':'constituent3'] = val
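
For illustration, a small Java helper that packs constituents into the layout quoted above (a two-byte length, the constituent bytes, then a zero end-of-component byte per part). It follows the format as described in CASSANDRA-2231; verify against your version before relying on it:

import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class CompositeName {
    static byte[] pack(String... constituents) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String part : constituents) {
            byte[] bytes = part.getBytes(StandardCharsets.UTF_8);
            out.write((bytes.length >> 8) & 0xff);   // (short) length, big-endian
            out.write(bytes.length & 0xff);
            out.write(bytes, 0, bytes.length);       // the constituent itself
            out.write(0);                            // end-of-component byte
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] name = pack("constituent1", "constituent2", "constituent3");
        System.out.println(name.length + " bytes");
    }
}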


 On Tue, May 17, 2011 at 2:15 AM, Sameer Farooqui cassandral...@gmail.com wrote:

  Cassandra wouldn't know that the column name is a composite of two different
 things. So you could just request the column names and values for a specific
 key like this and then just look at the column names that get returned:

 [default@MyKeyspace] get DemoCF[ascii('key_42')];
 = (column=CA_SanJose, value=50, timestamp=1305236885112000)
 = (column=CA_PaloAlto, value=49, timestamp=1305236885192000)
 = (column=FL_Orlando, value=45, timestamp=130523688528)
 = (column=NY_NYC, value=40, timestamp=1305236885361000)


 And I'm not sure what you mean by inputting composite column names. You
 just input them like any other column name:

 [default@MyKeyspace] set DemoCF['key_42']['CA_SanJose']='51';
 Value inserted.





  On Mon, May 16, 2011 at 2:34 PM, Aaron Morton aa...@thelastpickle.com wrote:

 What do you mean by composite column names?

 Do the data type functions supported by get and set help? Or the assume
 statement?

 Aaron
 On 17/05/2011, at 3:21 AM, David Boxenhorn da...@taotown.com wrote:

  Is there a way to view composite column names in the CLI?
 
  Is there a way to input them (i.e. in the set command)?
 





Re: Using composite column names in the CLI

2011-05-17 Thread David Boxenhorn
Excellent!

(I presume there is some way of representing :, like \:?)


 On Tue, May 17, 2011 at 11:44 AM, Sylvain Lebresne sylv...@datastax.com wrote:

 Provided you're working on a branch that has CASSANDRA-2231 applied (that's
 either the cassandra-0.8.1 branch or trunk), this works 'out of the box':

 The setup will look like:
 [default@unknown] create keyspace test;
 [default@unknown] use test;
 [default@test] create column family testCF with
 comparator='CompositeType(AsciiType, IntegerType(reversed=true),
 IntegerType)' and default_validation_class=AsciiType;

 Then:
 [default@test] set testCF[a]['foo:24:24'] = 'v1';
 Value inserted.
 [default@test] set testCF[a]['foo:42:24'] = 'v2';
 Value inserted.
 [default@test] set testCF[a]['foobar:42:24'] = 'v3';
 Value inserted.
 [default@test] set testCF[a]['boobar:42:24'] = 'v4';
 Value inserted.
 [default@test] set testCF[a]['boobar:42:42'] = 'v5';
 Value inserted.
 [default@test] get testCF[a];
 = (column=boobar:42:24, value=v4, timestamp=1305621115813000)
 = (column=boobar:42:42, value=v5, timestamp=1305621125563000)
 = (column=foo:42:24, value=v2, timestamp=1305621096473000)
 = (column=foo:24:24, value=v1, timestamp=1305621085548000)
 = (column=foobar:42:24, value=v3, timestamp=1305621110813000)
 Returned 5 results.

 --
 Sylvain

 On Tue, May 17, 2011 at 9:20 AM, David Boxenhorn da...@taotown.com
 wrote:
  This is what I'm talking about
 
  https://issues.apache.org/jira/browse/CASSANDRA-2231
 
  The on-disk format is
 
  (short)lengthconstituentend byte =
 0(short)lengthconstituentend
  byte = 0...
 
  I would like to be able to input these kinds of keys into the CLI,
 something
  like
 
  set cf[key]['constituent1':'constituent2':'constituent3'] = val
 
 
  On Tue, May 17, 2011 at 2:15 AM, Sameer Farooqui 
 cassandral...@gmail.com
  wrote:
 
  Cassandra wouldn't know that the column name is composite of two
 different
  things. So you could just request the column names and values for a
 specific
  key like this and then just look at the column names that get returned:
  [default@MyKeyspace] get DemoCF[ascii('key_42')];
  = (column=CA_SanJose, value=50, timestamp=1305236885112000)
  = (column=CA_PaloAlto, value=49, timestamp=1305236885192000)
  = (column=FL_Orlando, value=45, timestamp=130523688528)
  = (column=NY_NYC, value=40, timestamp=1305236885361000)
 
  And I'm not sure what you mean by inputting composite column names. You
  just input them like any other column name:
  [default@MyKeyspace] set DemoCF['key_42']['CA_SanJose']='51';
  Value inserted.
 
 
 
 
  On Mon, May 16, 2011 at 2:34 PM, Aaron Morton aa...@thelastpickle.com
  wrote:
 
  What do you mean by composite column names?
 
  Do the data type functions supported by get and set help? Or the assume
  statement?
 
  Aaron
  On 17/05/2011, at 3:21 AM, David Boxenhorn da...@taotown.com wrote:
 
   Is there a way to view composite column names in the CLI?
  
   Is there a way to input them (i.e. in the set command)?
  
 
 
 



Re: Import/Export of Schema Migrations

2011-05-16 Thread David Boxenhorn
What you describe below sounds like what I want to do. I think that the only
additional thing I am requesting is to export the migrations from the dev
cluster (since Cassandra already has a table that saves them - I just want
that information!) so I can import it to the other clusters. This would
ensure that my migrations are exactly right, without being dependent on
error-prone human intervention.

To really get rid of human intervention it would be nice to be able to mark
a certain migration with a version name. Then I could say something like,
export migrations version1.2.3 to version1.2.4 and I would get the exact
migration path from one version to another.


 On Mon, May 16, 2011 at 1:04 AM, aaron morton aa...@thelastpickle.com wrote:

 personal preference

 Not sure what sort of changes you are making, but this is my approach.

 I've always managed database (my sql, sql server whatever) schema as source
 code (SQL DDL statements, CLI script etc). It makes it a lot easier to cold
 start the system, test changes and see who changed what.

  Once you have your initial schema you can hand-roll a CLI script to update
  / drop existing CFs. For the update column family statement all the
  attributes are a delta to the current setting, i.e. you do not need to say
  the comparator is ascii again. The exception is indexes: you need to specify all
  the indexes again, and those not included will be dropped.

 If you want to be able to replay multiple schema changes made during dev
 against other clusters my personal approach would be:

 - create a cli script for every change (using update and delete CF),
 prefixed with 000X so you can see the order.
 - manage the scripts in source control
 - sanity check to see if they can be collapsed
 - replay the changes in order when applying them to a cluster.
 (you will still need to manually delete data from dropped cf's)

 changes to conf/cassandra.yaml can be managed using chef
  /personal preference

 Others will have different ideas

 -
 Aaron Morton
 Freelance Cassandra Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 14 May 2011, at 00:15, David Boxenhorn wrote:

 Actually, I want a way to propagate *any* changes from development to
 staging to production, but schema changes are the most important.

 Could I use 2221 to propagate schema changes by deleting the schema in the
 target cluster, doing show schema in the source cluster, redirecting to a
 file, and running the file as a script in the target cluster?

 Of course, I would have to delete the files of dropped CFs by hand
 (something I *hate* to do, because I'm afraid of making a mistake), but it
 would be a big improvement.

 I am open to any other ideas of how to propagate changes from one cluster
 to another in an efficient non-error-prone fashion. Our development
 environment (i.e. development, staging, production) is pretty standard, so
 I'm sure that I'm not the only one with this problem!


 On Fri, May 13, 2011 at 12:51 PM, aaron morton aa...@thelastpickle.com wrote:

 What sort of schema changes are you making?  can you manage them as a CLI
 script under source control ?


 You may also be interested in  CASSANDRA-2221.

 Cheers
 Aaron
  -
 Aaron Morton
 Freelance Cassandra Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 12 May 2011, at 20:45, David Boxenhorn wrote:

 My use case is like this: I have a development cluster, a staging cluster
 and a production cluster. When I finish a set of migrations (i.e. changes)
 on the development cluster, I want to apply them to the staging cluster, and
 eventually the production cluster. I don't want to do it by hand, because
 it's a painful and error-prone process. What I would like to do is export
 the last N migrations from the development cluster as a text file, with
 exactly the same format as the original text commands, and import them to
 the staging and production clusters.

 I think the best place to do this might be the CLI, since you would
 probably want to view your migrations before exporting them. Something like
 this:

  show migrations N;              Shows the last N migrations.
  export migrations N fileName;   Exports the last N migrations to file fileName.
  import migrations fileName;     Imports migrations from fileName.

 The import process would apply the migrations one at a time giving you
 feedback like, applying migration: update column family If a migration
 fails, the process should give an appropriate message and stop.

 Is anyone else interested in this? I have created a Jira ticket for it
 here:

 https://issues.apache.org/jira/browse/CASSANDRA-2636








Using composite column names in the CLI

2011-05-16 Thread David Boxenhorn
Is there a way to view composite column names in the CLI?

Is there a way to input them (i.e. in the set command)?


Re: Import/Export of Schema Migrations

2011-05-13 Thread David Boxenhorn
Actually, I want a way to propagate *any* changes from development to
staging to production, but schema changes are the most important.

Could I use 2221 to propagate schema changes by deleting the schema in the
target cluster, doing show schema in the source cluster, redirecting to a
file, and running the file as a script in the target cluster?

Of course, I would have to delete the files of dropped CFs by hand
(something I *hate* to do, because I'm afraid of making a mistake), but it
would be a big improvement.

I am open to any other ideas of how to propagate changes from one cluster to
another in an efficient non-error-prone fashion. Our development environment
(i.e. development, staging, production) is pretty standard, so I'm sure that
I'm not the only one with this problem!


 On Fri, May 13, 2011 at 12:51 PM, aaron morton aa...@thelastpickle.com wrote:

 What sort of schema changes are you making?  can you manage them as a CLI
 script under source control ?


 You may also be interested in  CASSANDRA-2221.

 Cheers
 Aaron
 -
 Aaron Morton
 Freelance Cassandra Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 12 May 2011, at 20:45, David Boxenhorn wrote:

 My use case is like this: I have a development cluster, a staging cluster
 and a production cluster. When I finish a set of migrations (i.e. changes)
 on the development cluster, I want to apply them to the staging cluster, and
 eventually the production cluster. I don't want to do it by hand, because
 it's a painful and error-prone process. What I would like to do is export
 the last N migrations from the development cluster as a text file, with
 exactly the same format as the original text commands, and import them to
 the staging and production clusters.

 I think the best place to do this might be the CLI, since you would
 probably want to view your migrations before exporting them. Something like
 this:

 show migrations N;Shows the last N migrations.
 export migrations N fileName;   Exports the last N migrations to file
 fileName.
 import migrations fileName; Imports migrations from fileName.

 The import process would apply the migrations one at a time, giving you
 feedback like "applying migration: update column family ...". If a migration
 fails, the process should give an appropriate message and stop.

 Is anyone else interested in this? I have created a Jira ticket for it
 here:

 https://issues.apache.org/jira/browse/CASSANDRA-2636






Import/Export of Schema Migrations

2011-05-12 Thread David Boxenhorn
My use case is like this: I have a development cluster, a staging cluster
and a production cluster. When I finish a set of migrations (i.e. changes)
on the development cluster, I want to apply them to the staging cluster, and
eventually the production cluster. I don't want to do it by hand, because
it's a painful and error-prone process. What I would like to do is export
the last N migrations from the development cluster as a text file, with
exactly the same format as the original text commands, and import them to
the staging and production clusters.

I think the best place to do this might be the CLI, since you would probably
want to view your migrations before exporting them. Something like this:

show migrations N;Shows the last N migrations.
export migrations N fileName;   Exports the last N migrations to file
fileName.
import migrations fileName; Imports migrations from fileName.

The import process would apply the migrations one at a time, giving you
feedback like "applying migration: update column family ...". If a migration
fails, the process should give an appropriate message and stop.

Is anyone else interested in this? I have created a Jira ticket for it here:

https://issues.apache.org/jira/browse/CASSANDRA-2636


Re: compaction strategy

2011-05-09 Thread David Boxenhorn
I'm also not too much in favor of triggering major compactions, because it
mostly has a nasty effect (creating one huge sstable).

If that is the case, why can't major compactions create many,
non-overlapping SSTables?

In general, it seems to me that non-overlapping SSTables have all the
advantages of big SSTables (i.e. you know exactly where the data is) without
the disadvantages that come with being big. Why doesn't Cassandra take
advantage of that in a major way?


Re: compaction strategy

2011-05-09 Thread David Boxenhorn
If they each have their own copy of the data, then they are *not*
non-overlapping!

If you have non-overlapping SSTables (and you know the min/max keys), it's
like having one big SSTable because you know exactly where each row is, and
it becomes easy to merge a new SSTable in small batches, rather than in one
huge batch.

The only step that you have to add to the current merge process is: when
you're going to write a new SSTable, if it's too big, write N (non-overlapping!)
pieces instead.
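
A rough sketch of that last step, just to make it concrete; the Row and SSTableSink
types below are stand-ins, not Cassandra's real classes. Because compaction emits
rows in sorted key order, rolling to a new output file at a size threshold is enough
to keep the outputs non-overlapping:

    import java.util.Iterator;

    public class SplitWriter {
        // Stand-ins for the real compaction machinery.
        public interface Row { long serializedSize(); }
        public interface SSTableSink {
            void openNewTable();
            void append(Row row);
            void closeCurrentTable();
        }

        // Rows arrive in sorted key order, so each rolled-over table covers a
        // disjoint key range: the resulting SSTables never overlap.
        public static void writeSplit(Iterator<? extends Row> sortedRows,
                                      long maxBytesPerTable, SSTableSink sink) {
            long written = 0;
            sink.openNewTable();
            while (sortedRows.hasNext()) {
                Row row = sortedRows.next();
                if (written > 0 && written + row.serializedSize() > maxBytesPerTable) {
                    sink.closeCurrentTable(); // roll over before exceeding the size cap
                    sink.openNewTable();
                    written = 0;
                }
                sink.append(row);
                written += row.serializedSize();
            }
            sink.closeCurrentTable();
        }
    }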


On Mon, May 9, 2011 at 12:46 PM, Terje Marthinussen tmarthinus...@gmail.com
 wrote:

 Yes, agreed.

 I actually think cassandra has to.

 And if you do not go down to that single file, how do you avoid getting
 into a situation where you can very realistically end up with 4-5 big
 sstables each having its own copy of the same data massively increasing disk
 requirements?

 Terje

 On Mon, May 9, 2011 at 5:58 PM, David Boxenhorn da...@taotown.com wrote:

  I'm also not too much in favor of triggering major compactions, because
  it mostly has a nasty effect (creating one huge sstable).

 If that is the case, why can't major compactions create many,
 non-overlapping SSTables?

 In general, it seems to me that non-overlapping SSTables have all the
 advantages of big SSTables (i.e. you know exactly where the data is) without
 the disadvantages that come with being big. Why doesn't Cassandra take
 advantage of that in a major way?





Cassandra and JCR

2011-05-06 Thread David Boxenhorn
I think this is a question specifically for Patricio Echagüe, though I
welcome answers from anyone else who can contribute...

We are considering using Magnolia as a CMS. Magnolia uses Jackrabbit for its
data storage. Jackrabbit is a JCR implementation.

Questions:

1. Can we plug Cassandra into JCR/Jackrabbit as its data storage?

2. I see that some work has already been done on this issue (specifically, I
see that Patricio was involved in this). Where does that work stand now? Is
this a viable option for us?

3. How much work would it be for us?

4. What are the issues involved?


Cassandra CMS

2011-05-05 Thread David Boxenhorn
Does anyone know of a content management system that can be easily
customized to use Cassandra as its database?

(Even better, if it can use Cassandra without customization!)


Re: Cassandra CMS

2011-05-05 Thread David Boxenhorn
I'm looking at Magnolia at the moment (as in, this second). At first glance,
it looks like I should be able to use Cassandra as the database:

http://documentation.magnolia-cms.com/technical-guide/content-storage-and-structure.html#Persistent_storage

If it can use a filesystem as its database, it can use Cassandra, no?

On Thu, May 5, 2011 at 2:01 PM, aaron morton aa...@thelastpickle.comwrote:

 Would you think of Django as a CMS ?

 http://stackoverflow.com/questions/2369793/how-to-use-cassandra-in-django-framework


 http://stackoverflow.com/questions/2369793/how-to-use-cassandra-in-django-framework
 Cheers

 -
 Aaron Morton
 Freelance Cassandra Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 5 May 2011, at 22:54, Eric tamme wrote:

 Does anyone know of a content management system that can be easily

 customized to use Cassandra as its database?


 (Even better, if it can use Cassandra without customization!)



 I think your best bet will be to look for a CMS that uses an ORM for
 the storage layer and write a specific ORM for Cassandra that you can
 plugin to whatever frame work the CMS uses.

 -Eric





Compound columns spec

2011-05-05 Thread David Boxenhorn
Is there a spec for compound columns?

I want to know the exact format of compound columns so I can adhere to it.
For example, what is the separator - or is some other format used (e.g.
length:value or type:length:value)?


Re: Compound columns spec

2011-05-05 Thread David Boxenhorn
Thanks, yes, I was referring to the compound columns in this quote (from a
previous thread):

No, CQL will never support super columns, but later versions (not 1.0.0)
will support compound columns.  Compound columns are better; instead of
a two-deep structure, you can have one of arbitrary depth.

I would like to design my keys to take advantage of this future development,
when it comes.

On Thu, May 5, 2011 at 5:53 PM, Sylvain Lebresne sylv...@datastax.comwrote:

 I suppose it depends what you are referring to by compound columns.
 If you're talking
 about the CompositeType of CASSANDRA-2231 (which is my only guess), then
 the
 format is in the javadoc and is:
 /*
 * The encoding of a CompositeType column name should be:
 *   <component><component><component> ...
 * where <component> is:
 *   <length of value><value><'end-of-component' byte>
 * where the 'end-of-component' byte should always be 0 for actual column
 * name.  However, it can set to 1 for query bounds. This allows to query for
 * the equivalent of 'give me the full super-column'. That is, if during a
 * slice query uses:
 *   start = <3><"foo".getBytes()><0>
 *   end   = <3><"foo".getBytes()><1>
 * then he will be sure to get *all* the columns whose first component is "foo".
 * If for a component, the 'end-of-component' is != 0, there should not be any
 * following component.
 */
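
To make the layout above concrete, here is a minimal sketch that packs a composite
name by hand. The 2-byte length prefix is an assumption on my part (the javadoc
quoted above does not spell out its width), so treat this as illustrative only:

    import java.nio.ByteBuffer;

    public class CompositeNameSketch {
        // Packs <length><value><end-of-component> per component. Only the last
        // component gets a non-zero end-of-component byte (used for query bounds).
        public static ByteBuffer pack(byte lastEndOfComponent, byte[]... components) {
            int size = 0;
            for (byte[] component : components) size += 2 + component.length + 1;
            ByteBuffer out = ByteBuffer.allocate(size);
            for (int i = 0; i < components.length; i++) {
                out.putShort((short) components[i].length); // assumed 2-byte length prefix
                out.put(components[i]);
                out.put(i == components.length - 1 ? lastEndOfComponent : (byte) 0);
            }
            out.flip();
            return out;
        }

        public static void main(String[] args) throws Exception {
            byte[] foo = "foo".getBytes("UTF-8");
            ByteBuffer start = pack((byte) 0, foo); // bounds for "everything under 'foo'"
            ByteBuffer end = pack((byte) 1, foo);
            System.out.println(start.remaining() + " and " + end.remaining() + " bytes");
        }
    }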

 I'll mention that this is not committed code yet (but soon hopefully
 and the format
 shouldn't change).

 --
 Sylvain

 On Thu, May 5, 2011 at 4:44 PM, David Boxenhorn da...@taotown.com wrote:
  Is there a spec for compound columns?
 
  I want to know the exact format of compound columns so I can adhere to
 it.
  For example, what is the separator - or is some other format used (e.g.
  length:value or type:length:value)?
 



Re: Compound columns spec

2011-05-05 Thread David Boxenhorn
What is the format of <length of value>?

On Thu, May 5, 2011 at 6:14 PM, Eric Evans eev...@rackspace.com wrote:

 On Thu, 2011-05-05 at 17:44 +0300, David Boxenhorn wrote:
  Is there a spec for compound columns?
 
  I want to know the exact format of compound columns so I can adhere to
  it. For example, what is the separator - or is some other format used
  (e.g. length:value or type:length:value)?

 Tentatively, CQL will use colon delimited terms like this, yes
 (tentatively).

 --
 Eric Evans
 eev...@rackspace.com




One cluster or many?

2011-05-03 Thread David Boxenhorn
If I have a database that partitions naturally into non-overlapping
datasets, in which there are no references between datasets, where each
dataset is quite large (i.e. large enough to merit its own cluster from the
point of view of quantity of data), should I set up one cluster per dataset
or one large cluster for everything together?

As I see it:

The primary advantage of separate clusters is total isolation: if I have a
problem with one dataset, my application will continue working normally for
all other datasets.

The primary advantage of one big cluster is usage pooling: when one server
goes down in a large cluster it's much less important than when one server
goes down in a small cluster. Also, different temporal usage patterns of the
different datasets (i.e. there will be different peak hours on different
datasets) can be combined to ease capacity requirements.

Any thoughts?


Terrible CQL idea: > and < aliases of >= and <=

2011-05-02 Thread David Boxenhorn
Is this still true?

*Note: The greater-than and less-than operators (> and <) result in key
ranges that are inclusive of the terms. There is no supported notion of
“strictly” greater-than or less-than; these operators are merely supported
as aliases to >= and <=.*

I think that making > and < aliases of >= and <= is a terrible idea!

First of all, it is very misleading. Second, what will happen to old code
when > and < are really supported? (*Some* day they will be supported!)


Re: Combining all CFs into one big one

2011-05-02 Thread David Boxenhorn
I guess I'm still feeling fuzzy on this because my actual use-case isn't so
black-and-white. I don't have any CFs that are accessed purely, or even
mostly, in once-through batch mode. What I have is CFs with more and less
data, and CFs that are accessed more and less frequently.


On Mon, May 2, 2011 at 7:52 PM, Tyler Hobbs ty...@datastax.com wrote:

 On Mon, May 2, 2011 at 5:05 AM, David Boxenhorn da...@taotown.com wrote:

 Wouldn't it be the case that the once-used rows in your batch process
 would quickly be traded out of the cache, and replaced by frequently-used
 rows?


 Yes, and you'll pay a cache miss penalty for each of the replacements.


 This would be the case even if your batch process goes on for a long time,
 since caching is done on a row-by-row basis. In effect, it would mean that
 part of your cache is taken up by the batch process, much as if you
 dedicated a permanent cache to the batch - except that it isn't permanent,
 so it's better!


 Right, but we didn't want to cache any of the batch CF in the first place,
 because caching that CF is worth very little.  With separate CFs, we could
 explicitly give it no cache.  Now we have no control over how much of the
 cache it evicts.




Combining all CFs into one big one

2011-05-01 Thread David Boxenhorn
I'm having problems administering my cluster because I have too many CFs
(~40).

I'm thinking of combining them all into one big CF. I would prefix the
current CF name to the keys, repeat the CF name in a column, and index the
column (so I can loop over all rows, which I have to do sometimes, for some
CFs).

Can anyone think of any disadvantages to this approach?
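
A tiny sketch of the scheme described above, with hypothetical CF/column names and a
stand-in Store interface in place of a real client:

    import java.util.List;

    public class CombinedCf {
        // Stand-in for whatever client library is in use.
        public interface Store {
            void insert(String cf, String rowKey, String column, byte[] value);
            List<String> rowKeysWhereColumnEquals(String cf, String column, String value);
        }

        private static final String BIG_CF = "Everything";
        private final Store store;
        public CombinedCf(Store store) { this.store = store; }

        // Each old (cf, key) pair maps onto a prefixed key in the single big CF,
        // and the old CF name is repeated in an indexed column.
        public void put(String originalCf, String originalKey, String column, byte[] value) {
            String combinedKey = originalCf + ":" + originalKey; // e.g. "Users:12345"
            store.insert(BIG_CF, combinedKey, column, value);
            store.insert(BIG_CF, combinedKey, "cf_name", originalCf.getBytes());
        }

        // Looping over all rows of one old CF becomes an indexed lookup on cf_name.
        public List<String> keysOf(String originalCf) {
            return store.rowKeysWhereColumnEquals(BIG_CF, "cf_name", originalCf);
        }
    }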


Re: Combining all CFs into one big one

2011-05-01 Thread David Boxenhorn
Shouldn't these kinds of problems be solved by Cassandra? Isn't there a
maximum SSTable size?

On Sun, May 1, 2011 at 3:24 PM, shimi shim...@gmail.com wrote:

 Big sstables, long compactions, in major compaction you will need to have
 free disk space in the size of all the sstables (which you should have
 anyway).

 Shimi


 On Sun, May 1, 2011 at 2:03 PM, David Boxenhorn da...@taotown.com wrote:

 I'm having problems administering my cluster because I have too many CFs
 (~40).

 I'm thinking of combining them all into one big CF. I would prefix the
 current CF name to the keys, repeat the CF name in a column, and index the
 column (so I can loop over all rows, which I have to do sometimes, for some
 CFs).

 Can anyone think of any disadvantages to this approach?





Re: Combining all CFs into one big one

2011-05-01 Thread David Boxenhorn
If you had one big cache, wouldn't it be the case that it's mostly populated
with frequently accessed rows, and less populated with rarely accessed rows?

In fact, wouldn't one big cache dynamically and automatically give you
exactly what you want? If you try to partition the same amount of memory
manually, by guesswork, among many tables, aren't you always going to do a
worse job?


On Sun, May 1, 2011 at 10:43 PM, Tyler Hobbs ty...@datastax.com wrote:

 On Sun, May 1, 2011 at 2:16 PM, Jake Luciani jak...@gmail.com wrote:



 On Sun, May 1, 2011 at 2:58 PM, shimi shim...@gmail.com wrote:

 On Sun, May 1, 2011 at 9:48 PM, Jake Luciani jak...@gmail.com wrote:

 If you have N column families you need N * memtable size of RAM to
 support this.  If that's not an option you can merge them into one as you
 suggest but then you will have much larger SSTables, slower compactions,
 etc.



 I don't necessarily agree with Tyler that the OS cache will be less
 effective... But I do agree that if the sizes of sstables are too large for
 you then more hardware is the solution...


 If you merge CFs which are hardly accessed with one which are accessed
 frequently, when you read the SSTable you load data that is hardly accessed
 to the OS cache.


  Only the rows or portions of rows you read will be loaded into the OS
 cache.  Just because different rows are in the same file doesn't mean the
 entire file is loaded into the OS cache.  The bloom filter and index file
 will be loaded but those are not large files.


 Right -- it does depend on the page size and the average amount of data
  read.  The effect will be more pronounced on CFs with small rows than on those
 with wide rows.



Re: Indexes on heterogeneous rows

2011-04-17 Thread David Boxenhorn
Thanks, Jonathan. I think I understand now.

To sum up: Everything would work, but if your only equality is on type
(all the rest inequalities), it could be very inefficient.

Is that right?

On Thu, Apr 14, 2011 at 7:22 PM, Jonathan Ellis jbel...@gmail.com wrote:

 On Thu, Apr 14, 2011 at 6:48 AM, David Boxenhorn da...@taotown.com
 wrote:
  The reason why I put type first is that queries on type will
  always be an exact match, whereas the other clauses might be
 inequalities.

 Expression order doesn't matter, but as you imply, non-equalities
 can't be used in an index lookup and have to be checked in a nested
 loop phase afterwards.

 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of DataStax, the source for professional Cassandra support
 http://www.datastax.com



Re: Indexes on heterogeneous rows

2011-04-14 Thread David Boxenhorn
Thank you for your answer, and sorry about the sloppy terminology.

I'm thinking of the scenario where there are a small number of results in
the result set, but there are billions of rows in the first of your
secondary indexes.

That is, I want to do something like (not sure of the CQL syntax):

select * where type=2 and e=5

where there are billions of rows of type 2, but some manageable number of
those rows have e=5.

As I understand it, secondary indexes are like column families, where each
value is a column. So the billions of rows where type=2 would go into a
single row of the secondary index. This sounds like a problem to me, is it?


I'm assuming that the billions of rows that don't have column e at all
(those rows of other types) are not a problem at all...

On Thu, Apr 14, 2011 at 12:12 PM, aaron morton aa...@thelastpickle.comwrote:

 Need to clear up some terminology here.

 Rows have a key and can be retrieved by key. This is *sort of* the primary
 index, but not primary in the normal RDBMS sense.
 Rows can have different columns and the column names are sorted and can be
 efficiently selected.
 There are secondary indexes in cassandra 0.7 based on column values
 http://www.datastax.com/dev/blog/whats-new-cassandra-07-secondary-indexes

 So you could create secondary indexes on the a,e, and h columns and get
 rows that have specific values. There are some limitations to secondary
 indexes, read the linked article.

 Or you can make your own secondary indexes using row keys as the index
 values.
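
 A small sketch of that roll-your-own approach, with hypothetical CF names and a
 stand-in Store interface in place of a real client: the index CF's row key is the
 indexed value, and its column names are the keys of the matching data rows.

    import java.util.List;

    public class ManualIndex {
        // Stand-in for whatever client library is used.
        public interface Store {
            void insert(String cf, String rowKey, String column, byte[] value);
            List<String> columnNames(String cf, String rowKey);
        }

        private final Store store;
        public ManualIndex(Store store) { this.store = store; }

        // Write the data row and the index entry together.
        public void saveUser(String userKey, String city) {
            store.insert("Users", userKey, "city", city.getBytes());
            store.insert("UsersByCity", city, userKey, new byte[0]); // value is unused
        }

        // "All users in city X" is then a single (possibly wide) index row read.
        public List<String> userKeysInCity(String city) {
            return store.columnNames("UsersByCity", city);
        }
    }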

 If you have billions of rows, how many do you need to read back at once?

 Hope that helps
 Aaron

 On 14 Apr 2011, at 04:23, David Boxenhorn wrote:

 Is it possible in 0.7.x to have indexes on heterogeneous rows, which have
 different sets of columns?

 For example, let's say you have three types of objects (1, 2, 3) which each
 had three members. If your rows had the following pattern

 type=1 a=? b=? c=?
 type=2 d=? e=? f=?
 type=3 g=? h=? i=?

 could you index type as your primary index, and also index a, e, h
 as secondary indexes, to get the objects of that type that you are looking
 for?

 Would it work if you had billions of rows of each type?





Indexes on heterogeneous rows

2011-04-13 Thread David Boxenhorn
Is it possible in 0.7.x to have indexes on heterogeneous rows, which have
different sets of columns?

For example, let's say you have three types of objects (1, 2, 3) which each
had three members. If your rows had the following pattern

type=1 a=? b=? c=?
type=2 d=? e=? f=?
type=3 g=? h=? i=?

could you index type as your primary index, and also index a, e, h
as secondary indexes, to get the objects of that type that you are looking
for?

Would it work if you had billions of rows of each type?


Re: Double ColumnType and comparing

2011-03-14 Thread David Boxenhorn
If you do it, I'd recommend BigDecimal. It's an exact type, and usually what
you want.

On Mon, Mar 14, 2011 at 3:40 PM, Jonathan Ellis jbel...@gmail.com wrote:

 We'd be happy to commit a patch contributing a DoubleType.

 On Sun, Mar 13, 2011 at 7:36 PM, Paul Teasdale teasda...@gmail.com
 wrote:
  I am quite new to Cassandra and am trying to model a simple Column Family
  which uses Doubles as column names:
  Datalines: { // ColumnFamily
  dataline-1:{ // row key
  23.5: 'someValue',
  23.6: 'someValue',
  ...
 4334.99: 'someValue'
  },
  dataline-2:{
  10.5: 'someValue',
  12.6: 'someValue',
  ...
 23334.99: 'someValue'
  },
  ...
  dataline-n:{
  10.5: 'someValue',
  12.6: 'someValue',
  ...
 23334.99: 'someValue'
}
  }
  In declaring this column family, I need to specify a 'CompareWith'
 attribute
  for a Double type, but the only available values I found for this
 attribute
  are:
   * BytesType
   * AsciiType
   * UTF8Type
   * LongType
   * LexicalUUIDType
   * TimeUUIDType
  Is there any support anywhere for double values (there has to be
  something)?
  And if not, does this mean we need to extend
  org.apache.cassandra.db.marshal.AbstractType<Double>?

  package com.mycom.types;

  class DoubleType extends org.apache.cassandra.db.marshal.AbstractType<Double> {
    public int compare(ByteBuffer o1, ByteBuffer o2) {
      // trivial implementation: compare the two encoded doubles numerically
      Double d1 = o1.getDouble(0);
      Double d2 = o2.getDouble(0);
      return d1.compareTo(d2);
    }
    // ...
  }

  And declare the column family:
  <ColumnFamily CompareWith="com.mycom.types.DoubleType" Name="Datalines"/>
  Thanks,
  Paul
 
 
 
 
 
 
 



 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of DataStax, the source for professional Cassandra support
 http://www.datastax.com



Re: On 0.6.6 to 0.7.3 migration, DC-aware traffic and minimising data transfer

2011-03-14 Thread David Boxenhorn
How do you write to two versions of Cassandra from the same client? Two
versions of Hector?

On Mon, Mar 14, 2011 at 6:46 PM, Robert Coli rc...@digg.com wrote:

 On Mon, Mar 14, 2011 at 8:39 AM, Jedd Rashbrooke j...@visualdna.com
 wrote:
   But more importantly for us it would mean we'd have just the
   one major outage, rather than two (relocation and 0.6 - 0.7)

 Take zero major outages instead? :D

 a) Set up new cluster on new version.
 b) Fork application writes, so all writes go to both clusters.
 c) Backfill old data to new cluster via API writes.
 d) Flip the switch to read from the new cluster.
 e) Turn off old cluster.

 =Rob
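
A minimal sketch of the "fork application writes" step above, assuming the two
clusters are hidden behind a small application-side interface so that each
implementation can be backed by whichever client version matches its cluster
(all names here are hypothetical):

    public class ForkedWrites {
        // Hypothetical application-side abstraction: one implementation per cluster,
        // each backed by whatever client library matches that cluster's version.
        public interface UserStore {
            void write(String key, byte[] value);
            byte[] read(String key);
        }

        // Step (b): every write goes to both clusters; reads stay on the old one
        // until the switch is flipped (step d).
        public static class Forking implements UserStore {
            private final UserStore oldCluster;
            private final UserStore newCluster;
            private volatile boolean readFromNew = false;

            public Forking(UserStore oldCluster, UserStore newCluster) {
                this.oldCluster = oldCluster;
                this.newCluster = newCluster;
            }
            public void write(String key, byte[] value) {
                oldCluster.write(key, value);
                newCluster.write(key, value);
            }
            public byte[] read(String key) {
                return readFromNew ? newCluster.read(key) : oldCluster.read(key);
            }
            public void flipReadsToNewCluster() { readFromNew = true; }
        }
    }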



Re: Nodes frozen in GC

2011-03-08 Thread David Boxenhorn
If RF=2 and CL=QUORUM, you're getting no benefit from replication: a quorum of
2 is 2, so both replicas must respond. When a node is in GC it stops everything.
Set RF=3 (quorum is then 2), so when one node is busy the cluster will still
work.

On Tue, Mar 8, 2011 at 11:46 AM, ruslan usifov ruslan.usi...@gmail.comwrote:



 2011/3/8 Chris Goffinet c...@chrisgoffinet.com

 How large are your SSTables on disk? My thought was because you have so
 many on disk, we have to store the bloom filter + every 128 keys from index
 in memory.


 0.5GB
  But as I understand it, storing in memory happens only when reads happen, and I do
  only inserts. And I think that memory isn't the problem, because heap
  allocations look like a saw (at max, heap allocations get to about 5.5 GB, then
  they reduce to 2 GB)


  Also, when I increase the heap size to 7 GB the situation gets much better, but
  node freezes still happen, and in gc.log I still see:

  Total time for which application threads were stopped: 20.0686307 seconds

  lines (though not as often as before)



Re: Exceptions on 0.7.0

2011-02-22 Thread David Boxenhorn
Shimi,

I am getting the same error that you report here. What did you do to solve
it?

David

On Thu, Feb 10, 2011 at 2:54 PM, shimi shim...@gmail.com wrote:

 I upgraded the version on all the nodes but I still get the exceptions.
 I ran cleanup on one of the nodes but I don't think there is any cleanup
 going on.

 Another weird thing that I see is:
 INFO [CompactionExecutor:1] 2011-02-10 12:08:21,353 CompactionIterator.java
 (line 135) Compacting large row
 333531353730363835363237353338383836383035363036393135323132383
 73630323034313a446f20322e384c20656e67696e657320686176652061646a75737461626c65206c696674657273
 (725849473109 bytes) incrementally

 In my production version the largest row is 10259. It shouldn't be
 different in this case.

 The first exception is being thrown on 3 nodes during compaction.
 The second exception (Internal error processing get_range_slices) is being
 thrown all the time by a fourth node. I disabled gossip and any client
 traffic to it and I still get the exceptions.
 Is it possible to boot a node with gossip disable?

 Shimi

 On Thu, Feb 10, 2011 at 11:11 AM, aaron morton aa...@thelastpickle.comwrote:

 I should be able to repair, install the new version and kick off nodetool
 repair .

 If you are uncertain search for cassandra-1992 on the list, there has been
 some discussion. You can also wait till some peeps in the states wake up if
 you want to be extra sure.

  The number is the number of columns the iterator is going to return from
 the row. I'm guessing that because this is happening during compaction it's
 being asked for the maximum possible number of columns.

 Aaron



 On 10 Feb 2011, at 21:37, shimi wrote:

 On 10 Feb 2011, at 13:42, Dan Hendry wrote:

  Out of curiosity, do you really have on the order of 1,986,622,313
 elements (I believe elements=keys) in the cf?

 Dan

 No. I was too puzzled by the numbers


 On Thu, Feb 10, 2011 at 10:30 AM, aaron morton aa...@thelastpickle.com
  wrote:

 Shimi,
 You may be seeing the result of CASSANDRA-1992, are you able to test with
 the most recent 0.7 build ?
 https://hudson.apache.org/hudson/job/Cassandra-0.7/


 Aaron

 I will. I hope the data was not corrupted.



 On Thu, Feb 10, 2011 at 10:30 AM, aaron morton 
 aa...@thelastpickle.comwrote:

 Shimi,
 You may be seeing the result of CASSANDRA-1992, are you able to test with
 the most recent 0.7 build ?
 https://hudson.apache.org/hudson/job/Cassandra-0.7/


 Aaron

 On 10 Feb 2011, at 13:42, Dan Hendry wrote:

 Out of curiosity, do you really have on the order of 1,986,622,313
 elements (I believe elements=keys) in the cf?

 Dan

  *From:* shimi [mailto:shim...@gmail.com]
 *Sent:* February-09-11 15:06
 *To:* user@cassandra.apache.org
 *Subject:* Exceptions on 0.7.0

 I have a 4 node test cluster were I test the port to 0.7.0 from 0.6.X
 On 3 out of the 4 nodes I get exceptions in the log.
 I am using RP.
 Changes that I did:
 1. changed the replication factor from 3 to 4
 2. configured the nodes to use Dynamic Snitch
 3. RR of 0.33

 I run repair on 2 nodes  before I noticed the errors. One of them is
 having the first error and the other the second.
 I restart the nodes but I still get the exceptions.

 The following Exception I get from 2 nodes:
  WARN [CompactionExecutor:1] 2011-02-09 19:50:51,281 BloomFilter.java
 (line 84) Cannot provide an optimal Bloom
 Filter for 1986622313 elements (1/4 buckets per element).
 ERROR [CompactionExecutor:1] 2011-02-09 19:51:10,190
 AbstractCassandraDaemon.java (line 91) Fatal exception in
 thread Thread[CompactionExecutor:1,1,main]
 java.io.IOError: java.io.EOFException
 at
 org.apache.cassandra.io.sstable.SSTableIdentityIterator.next(SSTableIdentityIterator.java:105)
 at
 org.apache.cassandra.io.sstable.SSTableIdentityIterator.next(SSTableIdentityIterator.java:34)
 at
 org.apache.commons.collections.iterators.CollatingIterator.set(CollatingIterator.java:284)
 at
 org.apache.commons.collections.iterators.CollatingIterator.least(CollatingIterator.java:326)
 at
 org.apache.commons.collections.iterators.CollatingIterator.next(CollatingIterator.java:230)
 at
 org.apache.cassandra.utils.ReducingIterator.computeNext(ReducingIterator.java:68)
 at
 com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
 at
 com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
 at
 com.google.common.collect.Iterators$7.computeNext(Iterators.java:604)
 at
 com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
 at
 com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
 at
 org.apache.cassandra.db.ColumnIndexer.serializeInternal(ColumnIndexer.java:76)
 at
 org.apache.cassandra.db.ColumnIndexer.serialize(ColumnIndexer.java:50)
 at
  org.apache.cassandra.io.LazilyCompactedRow.<init>(LazilyCompactedRow.java:88)
 at
 

Re: Exceptions on 0.7.0

2011-02-22 Thread David Boxenhorn
Thanks, Shimi. I'll keep you posted if we make progress. Riptano is working
on this problem too.

On Tue, Feb 22, 2011 at 3:30 PM, shimi shim...@gmail.com wrote:

 I didn't solve it.
  Since it is a test cluster I deleted all the data. I copied some sstables
  from my production cluster and I tried again, and this time I didn't have this
  problem.
  I am planning on removing everything from this test cluster. I will start
  all over again with 0.6.x, then I will load it with tens of GB of data (not
  an sstable copy) and test the upgrade again.

  My mistake was that I didn't back up the data files before I upgraded.

 Shimi

 On Tue, Feb 22, 2011 at 2:24 PM, David Boxenhorn da...@lookin2.comwrote:

 Shimi,

 I am getting the same error that you report here. What did you do to solve
 it?

 David


 On Thu, Feb 10, 2011 at 2:54 PM, shimi shim...@gmail.com wrote:

  I upgraded the version on all the nodes but I still get the exceptions.
  I ran cleanup on one of the nodes but I don't think there is any cleanup
  going on.

 Another weird thing that I see is:
 INFO [CompactionExecutor:1] 2011-02-10 12:08:21,353
 CompactionIterator.java (line 135) Compacting large row
 333531353730363835363237353338383836383035363036393135323132383
 73630323034313a446f20322e384c20656e67696e657320686176652061646a75737461626c65206c696674657273
 (725849473109 bytes) incrementally

 In my production version the largest row is 10259. It shouldn't be
 different in this case.

  The first exception is being thrown on 3 nodes during compaction.
  The second exception (Internal error processing get_range_slices) is
  being thrown all the time by a fourth node. I disabled gossip and any client
  traffic to it and I still get the exceptions.
 Is it possible to boot a node with gossip disable?

 Shimi

 On Thu, Feb 10, 2011 at 11:11 AM, aaron morton 
 aa...@thelastpickle.comwrote:

 I should be able to repair, install the new version and kick off
 nodetool repair .

 If you are uncertain search for cassandra-1992 on the list, there has
 been some discussion. You can also wait till some peeps in the states wake
 up if you want to be extra sure.

   The number is the number of columns the iterator is going to return
  from the row. I'm guessing that because this is happening during compaction
  it's being asked for the maximum possible number of columns.

 Aaron



 On 10 Feb 2011, at 21:37, shimi wrote:

 On 10 Feb 2011, at 13:42, Dan Hendry wrote:

  Out of curiosity, do you really have on the order of 1,986,622,313
 elements (I believe elements=keys) in the cf?

 Dan

 No. I was too puzzled by the numbers


 On Thu, Feb 10, 2011 at 10:30 AM, aaron morton aa...@thelastpickle.com
  wrote:

 Shimi,
 You may be seeing the result of CASSANDRA-1992, are you able to test
 with the most recent 0.7 build ?
 https://hudson.apache.org/hudson/job/Cassandra-0.7/


 Aaron

 I will. I hope the data was not corrupted.



 On Thu, Feb 10, 2011 at 10:30 AM, aaron morton aa...@thelastpickle.com
  wrote:

 Shimi,
 You may be seeing the result of CASSANDRA-1992, are you able to test
 with the most recent 0.7 build ?
 https://hudson.apache.org/hudson/job/Cassandra-0.7/


 Aaron

 On 10 Feb 2011, at 13:42, Dan Hendry wrote:

 Out of curiosity, do you really have on the order of 1,986,622,313
 elements (I believe elements=keys) in the cf?

 Dan

  *From:* shimi [mailto:shim...@gmail.com]
 *Sent:* February-09-11 15:06
 *To:* user@cassandra.apache.org
 *Subject:* Exceptions on 0.7.0

 I have a 4 node test cluster were I test the port to 0.7.0 from 0.6.X
 On 3 out of the 4 nodes I get exceptions in the log.
 I am using RP.
 Changes that I did:
 1. changed the replication factor from 3 to 4
 2. configured the nodes to use Dynamic Snitch
 3. RR of 0.33

 I run repair on 2 nodes  before I noticed the errors. One of them is
 having the first error and the other the second.
 I restart the nodes but I still get the exceptions.

 The following Exception I get from 2 nodes:
  WARN [CompactionExecutor:1] 2011-02-09 19:50:51,281 BloomFilter.java
 (line 84) Cannot provide an optimal Bloom
 Filter for 1986622313 elements (1/4 buckets per element).
 ERROR [CompactionExecutor:1] 2011-02-09 19:51:10,190
 AbstractCassandraDaemon.java (line 91) Fatal exception in
 thread Thread[CompactionExecutor:1,1,main]
 java.io.IOError: java.io.EOFException
 at
 org.apache.cassandra.io.sstable.SSTableIdentityIterator.next(SSTableIdentityIterator.java:105)
 at
 org.apache.cassandra.io.sstable.SSTableIdentityIterator.next(SSTableIdentityIterator.java:34)
 at
 org.apache.commons.collections.iterators.CollatingIterator.set(CollatingIterator.java:284)
 at
 org.apache.commons.collections.iterators.CollatingIterator.least(CollatingIterator.java:326)
 at
 org.apache.commons.collections.iterators.CollatingIterator.next(CollatingIterator.java:230)
 at
 org.apache.cassandra.utils.ReducingIterator.computeNext(ReducingIterator.java:68

Distribution Factor: part of the solution to many-CF problem?

2011-02-21 Thread David Boxenhorn
Cassandra is both distributed and replicated. We have Replication Factor but
no Distribution Factor!

Distribution Factor would define over how many nodes a CF should be
distributed.

Say you want to support millions of multi-tenant users in clusters with
thousands of nodes, where you don't know the user's schema in advance, so
you can't have users share CFs.

In this case you wouldn't want to spread out each user's Column Families
over thousands of nodes! You would want something like: RF=3, DF=10 i.e.
distribute each CF over 10 nodes, within those nodes replicate 3 times.

One implementation of DF would be to hash the CF name, and use the same
strategies defined for RF to choose the N nodes in DF=N.
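
A rough sketch of that last idea, with a stand-in hash and the token ring
represented as a sorted map (a real implementation would reuse the partitioner
and the existing replication strategies):

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Map;
    import java.util.NavigableMap;

    public class DistributionFactorSketch {
        // Hash the CF name onto the ring, then walk clockwise collecting DF distinct
        // nodes. Each row's RF replicas would then be placed within this subset.
        public static List<String> nodesForCf(String cfName,
                                              NavigableMap<Long, String> tokenToNode,
                                              int df) {
            long cfToken = cfName.hashCode() & Long.MAX_VALUE; // stand-in for a real ring hash
            List<String> picked = new ArrayList<String>();
            collect(tokenToNode.tailMap(cfToken, true), picked, df);
            collect(tokenToNode, picked, df); // wrap around the ring if needed
            return picked;
        }

        private static void collect(Map<Long, String> ringPart, List<String> picked, int df) {
            for (String node : ringPart.values()) {
                if (picked.size() == df) return;
                if (!picked.contains(node)) picked.add(node);
            }
        }
    }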


Re: Distribution Factor: part of the solution to many-CF problem?

2011-02-21 Thread David Boxenhorn
No, that's not what I mean at all.

That message is about the ability to use different partitioners for
different CFs, say, RandomPartitioner for one, OPP for another.

I'm talking about defining how many nodes a CF should be distributed over,
which would be useful if you have a lot of nodes and a lot of small CFs
(small relative to the total amount of data).


On Mon, Feb 21, 2011 at 9:58 PM, Aaron Morton aa...@thelastpickle.comwrote:

 Sounds a bit like this idea
 http://www.mail-archive.com/dev@cassandra.apache.org/msg01799.html

 Aaron

 On 22/02/2011, at 1:28 AM, David Boxenhorn da...@lookin2.com wrote:

  Cassandra is both distributed and replicated. We have Replication Factor
 but no Distribution Factor!
 
  Distribution Factor would define over how many nodes a CF should be
 distributed.
 
  Say you want to support millions of multi-tenant users in clusters with
 thousands of nodes, where you don't know the user's schema in advance, so
 you can't have users share CFs.
 
  In this case you wouldn't want to spread out each user's Column Families
 over thousands of nodes! You would want something like: RF=3, DF=10 i.e.
 distribute each CF over 10 nodes, within those nodes replicate 3 times.
 
  One implementation of DF would be to hash the CF name, and use the same
 strategies defined for RF to choose the N nodes in DF=N.
 



Re: Do supercolumns have a purpose?

2011-02-13 Thread David Boxenhorn
I agree, that is the way to go. Then each piece of new functionality will
not have to be implemented twice.

On Sat, Feb 12, 2011 at 9:41 AM, Stu Hood stuh...@gmail.com wrote:

 I would like to continue to support super columns, but to slowly convert
  them into compound column names, since that is really all they are.


 On Thu, Feb 10, 2011 at 10:16 AM, Frank LoVecchio fr...@isidorey.comwrote:

 I've found super column families quite useful when using
  the RandomPartitioner on a low-maintenance cluster (as opposed to
 Byte/Ordered), e.g. returning ordered data from a TimeUUID comparator type;
 try doing that with one regular column family and secondary indexes (you
 could obviously sort on the client side, but that is tedious and not logical
 for older data).

 On Thu, Feb 10, 2011 at 12:32 AM, David Boxenhorn da...@lookin2.comwrote:

  Mike, my problem is that I have a database and codebase that already
 uses supercolumns. If I had to do it over, it wouldn't use them, for the
 reasons you point out. In fact, I have a feeling that over time supercolumns
 will become deprecated de facto, if not de jure. That's why I would like to
 see them represented internally as regular columns, with an upgrade path for
 backward compatibility.

 I would love to do it myself! (I haven't looked at the code base, but I
 don't understand why it should be so hard.) But my employer has other
 ideas...


 On Wed, Feb 9, 2011 at 8:14 PM, Mike Malone m...@simplegeo.com wrote:

 On Tue, Feb 8, 2011 at 2:03 AM, David Boxenhorn da...@lookin2.comwrote:

 Shaun, I agree with you, but marking them as deprecated is not good
 enough for me. I can't easily stop using supercolumns. I need an upgrade
 path.


 David,

 Cassandra is open source and community developed. The right thing to do
 is what's best for the community, which sometimes conflicts with what's 
 best
 for individual users. Such strife should be minimized, it will never be
 eliminated. Luckily, because this is an open source, liberal licensed
 project, if you feel strongly about something you should feel free to add
 whatever features you want yourself. I'm sure other people in your 
 situation
 will thank you for it.

 At a minimum I think it would behoove you to re-read some of the
 comments here re: why super columns aren't really needed and take another
 look at your data model and code. I would actually be quite surprised to
 find a use of super columns that could not be trivially converted to normal
 columns. In fact, it should be possible to do at the framework/client
 library layer - you probably wouldn't even need to change any application
 code.

 Mike

 On Tue, Feb 8, 2011 at 3:53 AM, Shaun Cutts sh...@cuttshome.netwrote:


 I'm a newbie here, but, with apologies for my presumptuousness, I
 think you should deprecate SuperColumns. They are already distracting 
 you,
 and as the years go by the cost of supporting them as you add more and 
 more
 functionality is only likely to get worse. It would be better to 
 concentrate
 on making the core column families better (and I'm sure we can all 
 think
 of lots of things we'd like).

 Just dropping SuperColumns would be bad for your reputation -- and for
 users like David who are currently using them. But if you mark them 
 clearly
 as deprecated and explain why and what to do instead (perhaps putting a 
 bit
 of effort into migration tools... or even a virtual layer supporting
 arbitrary hierarchical data), then you can drop them in a few years (when
 you get to 1.0, say), without people feeling betrayed.

 -- Shaun

 On Feb 6, 2011, at 3:48 AM, David Boxenhorn wrote:

  My main point was to say that I think it is better to create
 tickets for what you want, rather than for something else completely
 different that would, as a by-product, give you what you want.

 Then let me say what I want: I want supercolumn families to have any
 feature that regular column families have.

 My data model is full of supercolumns. I used them, even though I knew
 it didn't *have to*, because they were there, which implied to me that 
 I
 was supposed to use them for some good reason. Now I suspect that they 
 will
 gradually become less and less functional, as features are added to 
 regular
 column families and not supported for supercolumn families.


 On Fri, Feb 4, 2011 at 10:58 AM, Sylvain Lebresne 
 sylv...@datastax.com wrote:

 On Fri, Feb 4, 2011 at 12:35 AM, Mike Malone m...@simplegeo.comwrote:

 On Thu, Feb 3, 2011 at 6:44 AM, Sylvain Lebresne 
 sylv...@datastax.com wrote:

 On Thu, Feb 3, 2011 at 3:00 PM, David Boxenhorn da...@lookin2.com
  wrote:

 The advantage would be to enable secondary indexes on supercolumn
 families.


 Then I suggest opening a ticket for adding secondary indexes to
 supercolumn families and voting on it. This will be 1 or 2 order of
 magnitude less work than getting rid of super column internally, and
 probably a much better solution anyway.


 I realize that this is largely

Re: Do supercolumns have a purpose?

2011-02-09 Thread David Boxenhorn
Mike, my problem is that I have a database and codebase that already uses
supercolumns. If I had to do it over, it wouldn't use them, for the reasons
you point out. In fact, I have a feeling that over time supercolumns will
become deprecated de facto, if not de jure. That's why I would like to see
them represented internally as regular columns, with an upgrade path for
backward compatibility.

I would love to do it myself! (I haven't looked at the code base, but I
don't understand why it should be so hard.) But my employer has other
ideas...


On Wed, Feb 9, 2011 at 8:14 PM, Mike Malone m...@simplegeo.com wrote:

 On Tue, Feb 8, 2011 at 2:03 AM, David Boxenhorn da...@lookin2.com wrote:

 Shaun, I agree with you, but marking them as deprecated is not good enough
 for me. I can't easily stop using supercolumns. I need an upgrade path.


 David,

 Cassandra is open source and community developed. The right thing to do is
 what's best for the community, which sometimes conflicts with what's best
 for individual users. Such strife should be minimized, it will never be
 eliminated. Luckily, because this is an open source, liberal licensed
 project, if you feel strongly about something you should feel free to add
 whatever features you want yourself. I'm sure other people in your situation
 will thank you for it.

 At a minimum I think it would behoove you to re-read some of the comments
 here re: why super columns aren't really needed and take another look at
 your data model and code. I would actually be quite surprised to find a use
 of super columns that could not be trivially converted to normal columns. In
 fact, it should be possible to do at the framework/client library layer -
 you probably wouldn't even need to change any application code.

 Mike

 On Tue, Feb 8, 2011 at 3:53 AM, Shaun Cutts sh...@cuttshome.net wrote:


 I'm a newbie here, but, with apologies for my presumptuousness, I think
 you should deprecate SuperColumns. They are already distracting you, and as
 the years go by the cost of supporting them as you add more and more
 functionality is only likely to get worse. It would be better to concentrate
 on making the core column families better (and I'm sure we can all think
 of lots of things we'd like).

 Just dropping SuperColumns would be bad for your reputation -- and for
 users like David who are currently using them. But if you mark them clearly
 as deprecated and explain why and what to do instead (perhaps putting a bit
 of effort into migration tools... or even a virtual layer supporting
 arbitrary hierarchical data), then you can drop them in a few years (when
 you get to 1.0, say), without people feeling betrayed.

 -- Shaun

 On Feb 6, 2011, at 3:48 AM, David Boxenhorn wrote:

  My main point was to say that I think it is better to create tickets
 for what you want, rather than for something else completely different that
 would, as a by-product, give you what you want.

 Then let me say what I want: I want supercolumn families to have any
 feature that regular column families have.

 My data model is full of supercolumns. I used them, even though I knew it
 didn't *have to*, because they were there, which implied to me that I was
 supposed to use them for some good reason. Now I suspect that they will
 gradually become less and less functional, as features are added to regular
 column families and not supported for supercolumn families.


 On Fri, Feb 4, 2011 at 10:58 AM, Sylvain Lebresne 
 sylv...@datastax.comwrote:

 On Fri, Feb 4, 2011 at 12:35 AM, Mike Malone m...@simplegeo.comwrote:

 On Thu, Feb 3, 2011 at 6:44 AM, Sylvain Lebresne sylv...@datastax.com
  wrote:

 On Thu, Feb 3, 2011 at 3:00 PM, David Boxenhorn da...@lookin2.comwrote:

 The advantage would be to enable secondary indexes on supercolumn
 families.


 Then I suggest opening a ticket for adding secondary indexes to
 supercolumn families and voting on it. This will be 1 or 2 order of
 magnitude less work than getting rid of super column internally, and
 probably a much better solution anyway.


 I realize that this is largely subjective, and on such matters code
 speaks louder than words, but I don't think I agree with you on the issue 
 of
 which alternative is less work, or even which is a better solution.


  You are right, I probably put too much emphasis in that sentence. My main
  point was to say that I think it is better to create tickets for what
 you
 want, rather than for something else completely different that would, as a
 by-product, give you what you want.
 Then I suspect that *if* the only goal is to get secondary indexes on
 super columns, then there is a good chance this would be less work than
 getting rid of super columns. But to be fair, secondary indexes on super
 columns may not make too much sense without #598, which itself would 
 require
 quite some work, so clearly I spoke a bit quickly.


 If the goal is to have a hierarchical model, limiting the depth to two
 seems arbitrary. Why

Re: Do supercolumns have a purpose?

2011-02-08 Thread David Boxenhorn
Shaun, I agree with you, but marking them as deprecated is not good enough
for me. I can't easily stop using supercolumns. I need an upgrade path.

On Tue, Feb 8, 2011 at 3:53 AM, Shaun Cutts sh...@cuttshome.net wrote:


 I'm a newbie here, but, with apologies for my presumptuousness, I think you
 should deprecate SuperColumns. They are already distracting you, and as the
 years go by the cost of supporting them as you add more and more
 functionality is only likely to get worse. It would be better to concentrate
 on making the core column families better (and I'm sure we can all think
 of lots of things we'd like).

 Just dropping SuperColumns would be bad for your reputation -- and for
 users like David who are currently using them. But if you mark them clearly
 as deprecated and explain why and what to do instead (perhaps putting a bit
 of effort into migration tools... or even a virtual layer supporting
 arbitrary hierarchical data), then you can drop them in a few years (when
 you get to 1.0, say), without people feeling betrayed.

 -- Shaun

 On Feb 6, 2011, at 3:48 AM, David Boxenhorn wrote:

  My main point was to say that I think it is better to create tickets
 for what you want, rather than for something else completely different that
 would, as a by-product, give you what you want.

 Then let me say what I want: I want supercolumn families to have any
 feature that regular column families have.

 My data model is full of supercolumns. I used them, even though I knew it
 didn't *have to*, because they were there, which implied to me that I was
 supposed to use them for some good reason. Now I suspect that they will
 gradually become less and less functional, as features are added to regular
 column families and not supported for supercolumn families.


 On Fri, Feb 4, 2011 at 10:58 AM, Sylvain Lebresne sylv...@datastax.comwrote:

 On Fri, Feb 4, 2011 at 12:35 AM, Mike Malone m...@simplegeo.com wrote:

 On Thu, Feb 3, 2011 at 6:44 AM, Sylvain Lebresne 
 sylv...@datastax.comwrote:

 On Thu, Feb 3, 2011 at 3:00 PM, David Boxenhorn da...@lookin2.comwrote:

 The advantage would be to enable secondary indexes on supercolumn
 families.


 Then I suggest opening a ticket for adding secondary indexes to
 supercolumn families and voting on it. This will be 1 or 2 order of
 magnitude less work than getting rid of super column internally, and
 probably a much better solution anyway.


 I realize that this is largely subjective, and on such matters code
 speaks louder than words, but I don't think I agree with you on the issue of
 which alternative is less work, or even which is a better solution.


  You are right, I probably put too much emphasis in that sentence. My main
  point was to say that I think it is better to create tickets for what you
 want, rather than for something else completely different that would, as a
 by-product, give you what you want.
 Then I suspect that *if* the only goal is to get secondary indexes on
 super columns, then there is a good chance this would be less work than
 getting rid of super columns. But to be fair, secondary indexes on super
 columns may not make too much sense without #598, which itself would require
 quite some work, so clearly I spoke a bit quickly.


 If the goal is to have a hierarchical model, limiting the depth to two
 seems arbitrary. Why not go all the way and allow an arbitrarily deep
 hierarchy?

 If a more sophisticated hierarchical model is deemed unnecessary, or
 impractical, allowing a depth of two seems inconsistent and
 unnecessary. It's pretty trivial to overlay a hierarchical model on top of
 the map-of-sorted-maps model that Cassandra implements. Ed Anuff has
 implemented a custom comparator that does the job [1]. Google's Megastore
 has a similar architecture and goes even further [2].

 It seems to me that super columns are a historical artifact from
 Cassandra's early life as Facebook's inbox storage system. They needed
 posting lists of messages, sharded by user. So that's what they built. In my
 dealings with the Cassandra code, super columns end up making a mess all
 over the place when algorithms need to be special cased and branch based on
 the column/supercolumn distinction.

 I won't even mention what it does to the thrift interface.


 Actually, I agree with you, more than you know. If I were to start coding
 Cassandra now, I wouldn't include super columns (and I would probably not go
 for a depth unlimited hierarchical model either). But it's there and I'm not
 sure getting rid of them fully (meaning, including in thrift) is an option
 (it would be a big compatibility breakage). And (even though I certainly
  thought about this more than once :)) I'm slightly less enthusiastic about
 keeping them in thrift but encoding them in regular column family
 internally: it would still be a lot of work but we would still probably end
 up with nasty tricks to stick to the thrift api.

 --
 Sylvain


 Mike

 [1] http://www.anuff.com/2010/07

Re: time to live rows

2011-02-08 Thread David Boxenhorn
I hope you don't consider this a hijack of the thread...

What I'd like to know is the following:

The GC removes TTL rows some time after they expire, at its convenience. But
will they stop being returned as soon as they expire? (This is the expected
behavior...)

On Tue, Feb 8, 2011 at 5:11 PM, Kallin Nagelberg kallin.nagelb...@gmail.com
 wrote:

 So the empty row will be ultimately removed then? Is there a way to
  force the GC so I can verify this?

 Thanks,
 -Kal

 On Tue, Feb 8, 2011 at 2:21 AM, Stu Hood stuh...@gmail.com wrote:
  The expired columns were converted into tombstones, which will live for
 the
  GC timeout. The empty row will be cleaned up when those tombstones are
  removed.
  Returning the empty row is unfortunate... we'd love to find a more
  appropriate solution that might not involve endless scanning.
  See
  http://wiki.apache.org/cassandra/FAQ#i_deleted_what_gives
  http://wiki.apache.org/cassandra/FAQ#range_ghosts
 
  On Mon, Feb 7, 2011 at 1:49 PM, Kallin Nagelberg
  kallin.nagelb...@gmail.com wrote:
 
  I also tried forcing a major compaction on the column family using JMX
  but the row remains.
 
  On Mon, Feb 7, 2011 at 4:43 PM, Kallin Nagelberg
  kallin.nagelb...@gmail.com wrote:
   I tried that but I still see the row coming back on a list
   columnfamily in the CLI. My concern is that there will be a pointer
   to an empty row for all eternity.
  
   -Kal
  
   On Mon, Feb 7, 2011 at 4:38 PM, Aaron Morton aa...@thelastpickle.com
 
   wrote:
    Deleting all the columns in a row via TTL has the same effect as
    deleting the
    row; the data will physically be removed during compaction.
  
   Aaron
  
  
   On 08 Feb, 2011,at 10:24 AM, Bill Speirs bill.spe...@gmail.com
 wrote:
  
   I don't think this is supported (but I could be completely wrong).
   However, I'd love to see this functionality as well.
  
   How would one go about requesting such a feature?
  
   Bill-
  
   On Mon, Feb 7, 2011 at 4:15 PM, Kallin Nagelberg
   kallin.nagelb...@gmail.com wrote:
   Hey,
  
   I have read about the new TTL columns in Cassandra 0.7. In my case
 I'd
   like to expire an entire row automatically after a certain amount of
   time. Is this possible as well?
  
   Thanks,
   -Kal
  
  
  
 
 



Re: Using a synchronized counter that keeps track of no of users on the application using it to allot UserIds/ keys to the new users after sign up

2011-02-07 Thread David Boxenhorn
Why not synchronize on the client side? Make sure that the process that
allocates user ids runs on only a single machine, in a synchronized method,
and uses QUORUM for its reads and writes to Cassandra?
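
A minimal sketch of that client-side approach; the SequenceStore interface below is
hypothetical and is assumed to go through your client at CL=QUORUM for both calls:

    public class UserIdAllocator {
        // Hypothetical wrapper over the Cassandra client; both calls use CL=QUORUM.
        public interface SequenceStore {
            long readAtQuorum(String sequenceName);
            void writeAtQuorum(String sequenceName, long value);
        }

        private final SequenceStore store;
        public UserIdAllocator(SequenceStore store) { this.store = store; }

        // Safe only because a single allocator process runs this and the method is
        // synchronized, so the read-then-write cannot interleave with itself.
        public synchronized long nextUserId() {
            long next = store.readAtQuorum("userIdSequence") + 1;
            store.writeAtQuorum("userIdSequence", next);
            return next;
        }
    }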

On Sun, Feb 6, 2011 at 11:02 PM, Aaron Morton aa...@thelastpickle.comwrote:

 If you mix mysql and Cassandra you risk creating a single point of failure
 around the mysql system.

  If you have data that changes infrequently, a row cache in cassandra
 will give you fast reads.

 Aaron

 On 5/02/2011, at 8:13 AM, Aklin_81 asdk...@gmail.com wrote:

  Thanks so much Ryan for the links; I'll definitely take them into
  consideration.
 
  Just another thought which came to my mind:-
   perhaps it may be beneficial to store (or duplicate) some of the data,
   like the Login credentials & particularly the userId to User's Name
   mapping, etc. (which is very heavily read), in a fast MyISAM table.
   This could solve the problem of keys through auto-generated unique &
   sequential primary keys. I could use the same keys for Cassandra rows
  for that user. And also since Cassandra reads are relatively slow, it
  makes sense to store data like userId to Name mapping in MyISAM as
  this data would be required after almost all queries to the database.
 
  Regards
  -Asil
 
 
 
  On Fri, Feb 4, 2011 at 10:14 PM, Ryan King r...@twitter.com wrote:
  On Thu, Feb 3, 2011 at 9:12 PM, Aklin_81 asdk...@gmail.com wrote:
   Thanks Matthew & Ryan,
 
  The main inspiration behind me trying to generate Ids in sequential
  manner is to reduce the size of the userId, since I am using it for
  heavy denormalization. UUIDs are 16 bytes long, but I can also have a
  unique Id in just 4 bytes, and since this is just a one time process
  when the user signs-up, it makes sense to try cutting down the space
  requirements, if it is feasible without any downsides(!?).
 
   I am also using userIds to attach to the Ids of the user's other data
   in my application. If I could reduce the userId size, so that I can also
   reduce the size of the other Ids, I could drastically cut down the space
   requirements.
 
 
   [Sorry, this question is not directly related to Cassandra, but I
   think Cassandra factors in here because of its tuneable consistency]
 
  Don't generate these ids in cassandra. Use something like snowflake,
  flickr's ticket servers [2] or zookeeper sequential nodes.
 
  -ryan
 
 
  1. http://github.com/twitter/snowflake
  2.
 http://code.flickr.com/blog/2010/02/08/ticket-servers-distributed-unique-primary-keys-on-the-cheap/
 



Re: Do supercolumns have a purpose?

2011-02-06 Thread David Boxenhorn
My main point was to say that I think it is better to create tickets for
what you want, rather than for something else completely different that
would, as a by-product, give you what you want.

Then let me say what I want: I want supercolumn families to have any feature
that regular column families have.

My data model is full of supercolumns. I used them, even though I knew I
didn't *have to*, because they were there, which implied to me that I was
supposed to use them for some good reason. Now I suspect that they will
gradually become less and less functional, as features are added to regular
column families and not supported for supercolumn families.


On Fri, Feb 4, 2011 at 10:58 AM, Sylvain Lebresne sylv...@datastax.comwrote:

 On Fri, Feb 4, 2011 at 12:35 AM, Mike Malone m...@simplegeo.com wrote:

 On Thu, Feb 3, 2011 at 6:44 AM, Sylvain Lebresne sylv...@datastax.comwrote:

 On Thu, Feb 3, 2011 at 3:00 PM, David Boxenhorn da...@lookin2.comwrote:

 The advantage would be to enable secondary indexes on supercolumn
 families.


 Then I suggest opening a ticket for adding secondary indexes to
 supercolumn families and voting on it. This will be 1 or 2 order of
 magnitude less work than getting rid of super column internally, and
 probably a much better solution anyway.


 I realize that this is largely subjective, and on such matters code speaks
 louder than words, but I don't think I agree with you on the issue of which
 alternative is less work, or even which is a better solution.


  You are right, I probably put too much emphasis in that sentence. My main
  point was to say that I think it is better to create tickets for what you
 want, rather than for something else completely different that would, as a
 by-product, give you what you want.
 Then I suspect that *if* the only goal is to get secondary indexes on super
 columns, then there is a good chance this would be less work than getting
 rid of super columns. But to be fair, secondary indexes on super columns may
 not make too much sense without #598, which itself would require quite some
 work, so clearly I spoke a bit quickly.


 If the goal is to have a hierarchical model, limiting the depth to two
 seems arbitrary. Why not go all the way and allow an arbitrarily deep
 hierarchy?

 If a more sophisticated hierarchical model is deemed unnecessary, or
 impractical, allowing a depth of two seems inconsistent and
 unnecessary. It's pretty trivial to overlay a hierarchical model on top of
 the map-of-sorted-maps model that Cassandra implements. Ed Anuff has
 implemented a custom comparator that does the job [1]. Google's Megastore
 has a similar architecture and goes even further [2].

 It seems to me that super columns are a historical artifact from
 Cassandra's early life as Facebook's inbox storage system. They needed
 posting lists of messages, sharded by user. So that's what they built. In my
 dealings with the Cassandra code, super columns end up making a mess all
 over the place when algorithms need to be special cased and branch based on
 the column/supercolumn distinction.

 I won't even mention what it does to the thrift interface.


 Actually, I agree with you, more than you know. If I were to start coding
 Cassandra now, I wouldn't include super columns (and I would probably not go
 for a depth unlimited hierarchical model either). But it's there and I'm not
 sure getting rid of them fully (meaning, including in thrift) is an option
 (it would be a big compatibility breakage). And (even though I certainly
 though about this more than once :)) I'm slightly less enthusiastic about
 keeping them in thrift but encoding them in regular column family
 internally: it would still be a lot of work but we would still probably end
 up with nasty tricks to stick to the thrift api.

 --
 Sylvain


 Mike

 [1] http://www.anuff.com/2010/07/secondary-indexes-in-cassandra.html
 [2] http://www.cidrdb.org/cidr2011/Papers/CIDR11_Paper32.pdf





Do supercolumns have a purpose?

2011-02-03 Thread David Boxenhorn
Is there any advantage to using supercolumns
(columnFamilyName[superColumnName[columnName[val]]]) instead of regular
columns with concatenated keys
(columnFamilyName[superColumnName@columnName[val]])?


When I designed my data model, I used supercolumns wherever I needed two
levels of key depth - just because they were there, and I figured that they
must be there for a reason.

Now I see that in 0.7 secondary indexes don't work on supercolumns or
subcolumns (is that right?), which seems to me like a very serious
limitation of supercolumn families.

It raises the question: Is there anything that supercolumn families are good
for?

And here's a related question: Why can't Cassandra implement supercolumn
families as regular column families, internally, and give you that
functionality?
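
For what it's worth, a minimal sketch (my own illustration, not an official Cassandra or client API) of what the concatenated-key encoding can look like at the byte level: the would-be super column name and the column name are packed into one column name of a regular column family, with a length prefix so that all columns belonging to the same "super column" stay contiguous under a BytesType comparator.

    import java.nio.ByteBuffer;
    import java.nio.charset.StandardCharsets;

    // Illustration only: pack a (superColumn, column) pair into a single column name.
    public final class ConcatenatedNames {
        public static ByteBuffer pack(String superColumn, String column) {
            byte[] sc = superColumn.getBytes(StandardCharsets.UTF_8);
            byte[] c = column.getBytes(StandardCharsets.UTF_8);
            ByteBuffer out = ByteBuffer.allocate(2 + sc.length + c.length);
            out.putShort((short) sc.length); // prefix keeps "ab"+"c" distinct from "a"+"bc"
            out.put(sc).put(c);
            out.flip();
            return out;
        }
    }

Slicing all the columns of one "super column" is then a slice over the range of names that share the same prefix.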


Re: Do supercolumns have a purpose?

2011-02-03 Thread David Boxenhorn
Thanks Sylvain!

Can I vote for internally implementing supercolumn families as regular
column families? (With a smooth upgrade process that doesn't require
shutting down a live cluster.)

What if supercolumn families were supported as regular column families + an
index (on what used to be supercolumn keys)? Would that solve some problems?


On Thu, Feb 3, 2011 at 2:00 PM, Sylvain Lebresne sylv...@datastax.comwrote:

  Is there any advantage to using supercolumns
  (columnFamilyName[superColumnName[columnName[val]]]) instead of regular
  columns with concatenated keys
  (columnFamilyName[superColumnName@columnName[val]])?
 
  When I designed my data model, I used supercolumns wherever I needed two
  levels of key depth - just because they were there, and I figured that
 they
  must be there for a reason.
 
  Now I see that in 0.7 secondary indexes don't work on supercolumns or
  subcolumns (is that right?), which seems to me like a very serious
  limitation of supercolumn families.
 
  It raises the question: Is there anything that supercolumn families are
 good
  for?

 There is a bunch of queries that you cannot do (or less conveniently) if
 you
 encode super columns using regular columns with concatenated keys:

 1) If you use regular columns with concatenated keys, the count argument
 counts simple columns. With super columns it counts super columns. It means
 that you can't do "give me the first 10 super columns of this row".

 2) If you need to get x super columns by name, you'll have to issue x
 get_slice queries (one for each super column). On the client side it sucks.
 Internally in Cassandra we could do it reasonably well though.

 3) You cannot remove entire super columns since there is no support for
 range
 deletions.

 Moreover, the encoding with concatenated keys uses more disk space (and
 less disk used for the same information means fewer things to read, so it
 may have a slight impact on read performance too -- it's probably really
 slight in most usage, but nevertheless).

  And here's a related question: Why can't Cassandra implement supercolumn
  families as regular column families, internally, and give you that
  functionality?

 For the 1) and 2) above, we could deal with those internally fairly easily
 I
 think and rather well (which means it wouldn't be much worse
 performance-wise
 than with the current implementation of super columns, not that it would be
 better). For 3), range deletes are harder and would require more
 significant
 changes (that doesn't mean that Cassandra will never have it). Even without
 that, there would be the disk space lost.

 --
 Sylvain




Re: Do supercolumns have a purpose?

2011-02-03 Thread David Boxenhorn
The advantage would be to enable secondary indexes on supercolumn families.

I understand from this thread that indexes on supercolumn families are not
going to happen:

http://www.mail-archive.com/user@cassandra.apache.org/msg09527.html

Which, it seems to me, effectively deprecates supercolumn families. (I don't
see any of the three problems you brought up as overcoming this problem,
except, perhaps, for special cases.)


On Thu, Feb 3, 2011 at 3:32 PM, Sylvain Lebresne sylv...@datastax.comwrote:

 On Thu, Feb 3, 2011 at 1:33 PM, David Boxenhorn da...@lookin2.com wrote:

 Thanks Sylvain!

 Can I vote for internally implementing supercolumn families as regular
 column families? (With a smooth upgrade process that doesn't require
 shutting down a live cluster.)


 I forgot to add that I don't know if this makes a lot of sense. That would
 be a fairly major refactor (so error prone), you'd still have to deal with
 the point I mentioned in my previous mail (for range deletes you would have
 to change the on-disk format, for instance), and all this for no actual
 benefits, even downsides actually (encoded supercolumns will take more space
 on disk (and in memory)). Super columns are there and work fairly well, so
 what would be the point?

 I'm only just saying that 'in theory', super columns are not the super
 shiny magical feature that gives you stuff you can't hope to have with only
 regular column families. That doesn't make them any less nice.

 That being said, you are free to create whatever ticket you want and vote
 for it. Don't expect too much support though :)


 What if supercolumn families were supported as regular column families +
 an index (on what used to be supercolumn keys)? Would that solve some
 problems?


 You'd still have to remember for each CF if it has this index on what used
 to be supercolumn keys and handle those differently. Really not convinced
 this would make the code cleaner than how it is now. And making the code
 cleaner is really the only reason I can think of for wanting to get rid of
 super columns internally, so ...




 On Thu, Feb 3, 2011 at 2:00 PM, Sylvain Lebresne sylv...@datastax.comwrote:

  Is there any advantage to using supercolumns
  (columnFamilyName[superColumnName[columnName[val]]]) instead of regular
  columns with concatenated keys
  (columnFamilyName[superColumnName@columnName[val]])?
 
  When I designed my data model, I used supercolumns wherever I needed
 two
  levels of key depth - just because they were there, and I figured that
 they
  must be there for a reason.
 
  Now I see that in 0.7 secondary indexes don't work on supercolumns or
  subcolumns (is that right?), which seems to me like a very serious
  limitation of supercolumn families.
 
  It raises the question: Is there anything that supercolumn families are
 good
  for?

 There is a bunch of queries that you cannot do (or less conveniently) if
 you
 encode super columns using regular columns with concatenated keys:

 1) If you use regular columns with concatenated keys, the count argument
 counts simple columns. With super columns it counts super columns. It
 means
 that you can't do "give me the first 10 super columns of this row".

 2) If you need to get x super columns by name, you'll have to issue x
 get_slice queries (one for each super column). On the client side it sucks.
 Internally in Cassandra we could do it reasonably well though.

 3) You cannot remove entire super columns since there is no support for
 range
 deletions.

 Moreover, the encoding with concatenated keys uses more disk space (and
 less disk used for the same information means fewer things to read, so it
 may have a slight impact on read performance too -- it's probably really
 slight in most usage, but nevertheless).

  And here's a related question: Why can't Cassandra implement
 supercolumn
  families as regular column families, internally, and give you that
  functionality?

 For the 1) and 2) above, we could deal with those internally fairly
 easily I
 think and rather well (which means it wouldn't be much worse
 performance-wise
 than with the current implementation of super columns, not that it would be
 better). For 3), range deletes are harder and would require more
 significant
 changes (that doesn't mean that Cassandra will never have it). Even
 without
 that, there would be the disk space lost.

 --
 Sylvain






Re: Do supercolumns have a purpose?

2011-02-03 Thread David Boxenhorn
Well, I am an actual active developer and I have managed to do pretty
nice stuff with Cassandra - without secondary indexes so far. But I'm
looking forward to having secondary indexes in my arsenal when new
functional requirements come up, and I'm bummed out that my early design
decision to use supercolumns wherever I could, instead of concatenating keys,
has closed off a whole lot of possibilities. I knew when I started that
secondary indexes were in the future; if I had known that they would be only
for regular column families I wouldn't have used supercolumn families in the
first place. Now I'm pretty much stuck (too late to go back - we're
launching in March).


On Thu, Feb 3, 2011 at 4:44 PM, Sylvain Lebresne sylv...@datastax.comwrote:

 On Thu, Feb 3, 2011 at 3:00 PM, David Boxenhorn da...@lookin2.com wrote:

 The advantage would be to enable secondary indexes on supercolumn
 families.


 Then I suggest opening a ticket for adding secondary indexes to supercolumn
 families and voting on it. This will be 1 or 2 orders of magnitude less work
 than getting rid of super columns internally, and probably a much better
 solution anyway.


 I understand from this thread that indexes on supercolumn families are
 not going to happen:

 http://www.mail-archive.com/user@cassandra.apache.org/msg09527.html


 I should maybe let Jonathan answer this one, but the way I understand it is
 that adding secondary indexes to super columns is not a top priority for the
 currently active developers. Not that it will never ever happen. And voting
 for tickets in JIRA is one way to help raise its priority.

 In any case, if the goal you're pursuing is adding secondary indexes to
 super columns, then that's the ticket you should open, and if after careful
 consideration it is decided that getting rid of super columns is the best way
 to reach that goal then so be it (spoiler: it is not).


 Which, it seems to me, effectively deprecates supercolumn families. (I
 don't see any of the three problems you brought up as overcoming this
 problem, except, perhaps, for special cases.)


 You're entitled to your opinions obviously, but I doubt everyone shares that
 feeling (I don't, for instance). Before 0.7, there were no secondary indexes
 at all and still a bunch of people managed to do pretty nice stuff with
 Cassandra. In particular, denormalized views are sometimes (often?)
 preferable to secondary indexes for performance reasons. For that, super
 columns are quite handy.

 --
 Sylvain




  On Thu, Feb 3, 2011 at 3:32 PM, Sylvain Lebresne 
 sylv...@datastax.comwrote:

 On Thu, Feb 3, 2011 at 1:33 PM, David Boxenhorn da...@lookin2.comwrote:

 Thanks Sylvain!

 Can I vote for internally implementing supercolumn families as regular
 column families? (With a smooth upgrade process that doesn't require
 shutting down a live cluster.)


 I forgot to add that I don't know if this makes a lot of sense. That would
 be a fairly major refactor (so error prone), you'd still have to deal with
 the point I mentioned in my previous mail (for range deletes you would have
 to change the on-disk format, for instance), and all this for no actual
 benefits, even downsides actually (encoded supercolumns will take more space
 on disk (and in memory)). Super columns are there and work fairly well, so
 what would be the point?

 I'm only just saying that 'in theory', super columns are not the super
 shiny magical feature that gives you stuff you can't hope to have with only
 regular column families. That doesn't make them any less nice.

 That being said, you are free to create whatever ticket you want and vote
 for it. Don't expect too much support though :)


 What if supercolumn families were supported as regular column families +
 an index (on what used to be supercolumn keys)? Would that solve some
 problems?


 You'd still have to remember for each CF if it has this index on what
 used to be supercolumn keys and handle those differently. Really not
 convinced this would make the code cleaner than how it is now. And making the
 code cleaner is really the only reason I can think of for wanting to get rid
 of super columns internally, so ...




 On Thu, Feb 3, 2011 at 2:00 PM, Sylvain Lebresne 
 sylv...@datastax.comwrote:

  Is there any advantage to using supercolumns
  (columnFamilyName[superColumnName[columnName[val]]]) instead of
 regular
  columns with concatenated keys
  (columnFamilyName[superColumnName@columnName[val]])?
 
  When I designed my data model, I used supercolumns wherever I needed
 two
  levels of key depth - just because they were there, and I figured
 that they
  must be there for a reason.
 
  Now I see that in 0.7 secondary indexes don't work on supercolumns or
  subcolumns (is that right?), which seems to me like a very serious
  limitation of supercolumn families.
 
  It raises the question: Is there anything that supercolumn families
 are good
  for?

 There is a bunch of queries that you cannot do (or less conveniently)
 if you
 encode super

Re: Multi-tenancy, and authentication and authorization

2011-01-20 Thread David Boxenhorn
As far as I can tell, if Cassandra supports three levels of configuration
(server, keyspace, column family) we can support multi-tenancy. It is
trivial to give each tenant their own keyspace (e.g. just use the tenant's
id as the keyspace name) and let them go wild. (Any out-of-bounds behavior
on the CF level will be stopped at the keyspace and server level before
doing any damage.)

I don't think Cassandra needs to know about end-users. From Cassandra's
point of view the tenant is the user.

On Thu, Jan 20, 2011 at 7:00 AM, indika kumara indika.k...@gmail.comwrote:

 +1. Are there JIRAs for these requirements? I would like to contribute
 within my capacity.

 As per my understanding, to support some multi-tenant models, it is necessary
 to qualify keyspaces' names, CFs' names, etc. with the tenant namespace
 (or id). The easiest way to do this would be to modify the corresponding
 constructs transparently. I thought of a stage (optional and configurable)
 prior to authorization. Are there any better solutions? I appreciate the
 community's suggestions.

 Moreover, it is necessary to send the tenant NS (id) with the user credentials
 (a user belongs to this tenant (org.)). For that purpose, I thought of
 using the user credentials in the AuthenticationRequest. Is there any better
 solution?

 I would like to have MT support at the Cassandra level which is optional
 and configurable.

 Thanks,

 Indika


 On Wed, Jan 19, 2011 at 7:40 PM, David Boxenhorn da...@lookin2.comwrote:

 Yes, the way I see it - and it becomes even more necessary for a
 multi-tenant configuration - there should be completely separate
 configurations for applications and for servers.

 - Application configuration is based on data and usage characteristics of
 your application.
 - Server configuration is based on the specific hardware limitations of
 the server.

 Obviously, server limitations take priority over application
 configuration.

 Assuming that each tenant in a multi-tenant environment gets one keyspace,
 you would also want to enforce limitations based on keyspace (which
 correspond to the parameters that the tenant paid for).

 So now we have three levels:

 1. Server configuration (top priority)
 2. Keyspace configuration (paid-for service - second priority)
 3. Column family configuration (configuration provided by tenant - third
 priority)


 On Wed, Jan 19, 2011 at 3:15 PM, indika kumara indika.k...@gmail.comwrote:

 As the actual problem is mostly related to the number of CFs in the
 system (maybe the number of columns), I still believe that exposing
 Cassandra 'as-is' to a tenant is doable and suitable, though it needs some
 fixes. That multi-tenancy model allows a tenant to use the programming
 model of Cassandra 'as-is', enabling the seamless migration of an
 application that uses Cassandra into the cloud. Moreover, in order to
 support different SLA requirements of different tenants, the
 configurability of keyspaces, CFs, etc. per tenant may be critical.
 However, there are trade-offs among usability, memory consumption, and
 performance. I believe that it is important to consider the SLA
 requirements of different tenants when deciding the strategies for
 controlling resource consumption.

 I like the idea of system-wide parameters for controlling resource usage.
 I believe that tenant-specific parameters are equally important. There are
 resources, and each tenant can claim a portion of them based on SLA. For
 instance, if there is a threshold on the number of columns per node, it
 should be possible to decide how many columns a particular tenant can
 have. It allows selecting a suitable Cassandra cluster for a tenant based
 on his or her SLA. I believe the capability to configure
 resource-controlling parameters per keyspace would be important to support
 a keyspace-per-tenant model. Furthermore, in order to maximize the
 resource sharing among tenants, a threshold (on a resource) per keyspace
 should not be a hard limit. Rather, it should oscillate between a hard
 minimum and a maximum. For example, if a particular tenant needs more
 resources at a given time, he or she should be able to borrow from the
 others up to the maximum. The threshold is only considered when a tenant
 is assigned to a cluster - the remaining resources of a cluster should be
 equal to or higher than the resource limit of the tenant. It may be
 necessary to spread a single keyspace across multiple clusters, especially
 when there are not enough resources in a single cluster.

 I believe that it would be better to have the flexibility to seamlessly
 change between multi-tenancy implementation models such as Cassandra
 'as-is', the keyspace-per-tenant model, a keyspace for all tenants, and so
 on. Based on what I have learnt, each model requires adding the tenant id
 (namespace) to a keyspace's name or CF's name or row key, or column's
 name. Would it be better to have a kind of pluggable handler that can
 access those resources prior to doing the actual operation so that the
 required changes can be done? Maybe prior to authorization.

Re: Multi-tenancy, and authentication and authorization

2011-01-20 Thread David Boxenhorn
I have added my comments to this issue:

https://issues.apache.org/jira/browse/CASSANDRA-2006

Good luck!

On Thu, Jan 20, 2011 at 1:53 PM, indika kumara indika.k...@gmail.comwrote:

 Thanks David. We decided to do it on our client side as the initial
 implementation. I will investigate the approaches for supporting fine-grained
 control of the resources consumed by a server, tenant, and CF.

 Thanks,

 Indika

 On Thu, Jan 20, 2011 at 3:20 PM, David Boxenhorn da...@lookin2.comwrote:

 As far as I can tell, if Cassandra supports three levels of configuration
 (server, keyspace, column family) we can support multi-tenancy. It is
 trivial to give each tenant their own keyspace (e.g. just use the tenant's
 id as the keyspace name) and let them go wild. (Any out-of-bounds behavior
 on the CF level will be stopped at the keyspace and server level before
 doing any damage.)

 I don't think Cassandra needs to know about end-users. From Cassandra's
 point of view the tenant is the user.

 On Thu, Jan 20, 2011 at 7:00 AM, indika kumara indika.k...@gmail.comwrote:

 +1. Are there JIRAs for these requirements? I would like to contribute
 within my capacity.

 As per my understanding, to support some multi-tenant models, it is necessary
 to qualify keyspaces' names, CFs' names, etc. with the tenant namespace
 (or id). The easiest way to do this would be to modify the corresponding
 constructs transparently. I thought of a stage (optional and configurable)
 prior to authorization. Are there any better solutions? I appreciate the
 community's suggestions.

 Moreover, it is necessary to send the tenant NS (id) with the user
 credentials (a user belongs to this tenant (org.)). For that purpose, I
 thought of using the user credentials in the AuthenticationRequest. Is there
 any better solution?

 I would like to have MT support at the Cassandra level which is
 optional and configurable.

 Thanks,

 Indika


 On Wed, Jan 19, 2011 at 7:40 PM, David Boxenhorn da...@lookin2.comwrote:

 Yes, the way I see it - and it becomes even more necessary for a
 multi-tenant configuration - there should be completely separate
 configurations for applications and for servers.

 - Application configuration is based on data and usage characteristics
 of your application.
 - Server configuration is based on the specific hardware limitations of
 the server.

 Obviously, server limitations take priority over application
 configuration.

 Assuming that each tenant in a multi-tenant environment gets one
 keyspace, you would also want to enforce limitations based on keyspace
 (which correspond to the parameters that the tenant paid for).

 So now we have three levels:

 1. Server configuration (top priority)
 2. Keyspace configuration (paid-for service - second priority)
 3. Column family configuration (configuration provided by tenant - third
 priority)


 On Wed, Jan 19, 2011 at 3:15 PM, indika kumara 
 indika.k...@gmail.comwrote:

 As the actual problem is mostly related to the number of CFs in the
 system (maybe the number of columns), I still believe that exposing
 Cassandra 'as-is' to a tenant is doable and suitable, though it needs some
 fixes. That multi-tenancy model allows a tenant to use the programming
 model of Cassandra 'as-is', enabling the seamless migration of an
 application that uses Cassandra into the cloud. Moreover, in order to
 support different SLA requirements of different tenants, the
 configurability of keyspaces, CFs, etc. per tenant may be critical.
 However, there are trade-offs among usability, memory consumption, and
 performance. I believe that it is important to consider the SLA
 requirements of different tenants when deciding the strategies for
 controlling resource consumption.

 I like the idea of system-wide parameters for controlling resource usage.
 I believe that tenant-specific parameters are equally important. There are
 resources, and each tenant can claim a portion of them based on SLA. For
 instance, if there is a threshold on the number of columns per node, it
 should be possible to decide how many columns a particular tenant can
 have. It allows selecting a suitable Cassandra cluster for a tenant based
 on his or her SLA. I believe the capability to configure
 resource-controlling parameters per keyspace would be important to support
 a keyspace-per-tenant model. Furthermore, in order to maximize the
 resource sharing among tenants, a threshold (on a resource) per keyspace
 should not be a hard limit. Rather, it should oscillate between a hard
 minimum and a maximum. For example, if a particular tenant needs more
 resources at a given time, he or she should be able to borrow from the
 others up to the maximum. The threshold is only considered when a tenant
 is assigned to a cluster - the remaining resources of a cluster should be
 equal to or higher than the resource limit of the tenant. It may be
 necessary to spread a single keyspace across multiple clusters, especially
 when there are not enough resources in a single cluster.

Re: Use Cassandra to store 2 million records of persons

2011-01-20 Thread David Boxenhorn
Cassandra is not a good solution for data mining type problems, since it
doesn't have ad-hoc queries. Cassandra is designed to maximize throughput,
which is not usually a problem for data mining.

On Thu, Jan 20, 2011 at 2:07 PM, Surender Singh suriait2...@gmail.comwrote:

 Hi All

 I want to use Apache Cassandra to store information (like first name, last
 name, gender, address) about 2 million people, and then I need to perform
 analytics and reporting on that data.
 Do I need to store the information about 2 million people in MySQL and then
 transfer that information into Cassandra?

 Please help me as I'm new to Apache Cassandra.

 If you have any use case like that, please share.

 Thanks and regards
 Surender Singh




Re: Multi-tenancy, and authentication and authorization

2011-01-19 Thread David Boxenhorn
I'm not sure that you'd still want to retain the ability to individually
control how flushing happens on a per-cf basis in order to cater to
different workloads that benefit from different flushing behavior. It seems
to me that a good system-wide algorithm that works dynamically, and takes
into account moment-by-moment usage, can do this better than a human who is
guessing and making decisions on a static basis.

Having said that, my suggestion doesn't really depend so much on having one
memtable or many. Rather, it depends on making flushing behavior dependent
on system-wide parameters, which reflect the actual physical resources
available per node, rather than per-CF parameters (though per-CF tuning can
be taken into account, it should be a suggestion that gets overridden by
system-wide needs).



On Wed, Jan 19, 2011 at 10:48 AM, Peter Schuller 
peter.schul...@infidyne.com wrote:

  Right now there is a one-to-one mapping between memtables and SSTables.
  Instead of that, would it be possible to have one giant memtable for each
  Cassandra instance, with partial flushing to SSTs?

 I think a complication here is that, although I agree things need to
 be easier to tweak at least for the common case, I'm pretty sure you'd
 still want to retain the ability to individually control how flushing
 happens on a per-cf basis in order to cater to different workloads
 that benefit from different flushing behavior.

 I suspect the main concern here may be that there is a desire to have
 better overal control over how flushing happens and when writes start
 blocking, rather than necessarily implying that there can't be more
 than one memtable (the ticket Stu posted seems to address one such
 means of control).

 --
 / Peter Schuller



Re: Multi-tenancy, and authentication and authorization

2011-01-19 Thread David Boxenhorn
+1



On Wed, Jan 19, 2011 at 10:35 AM, Stu Hood stuh...@gmail.com wrote:

 Opened https://issues.apache.org/jira/browse/CASSANDRA-2006 with the
 solution we had suggested on the MultiTenant wiki page.


 On Tue, Jan 18, 2011 at 11:56 PM, David Boxenhorn da...@lookin2.comwrote:

 I think tuning of Cassandra is overly complex, and even with a single
 tenant you can run into problems with too many CFs.

 Right now there is a one-to-one mapping between memtables and SSTables.
 Instead of that, would it be possible to have one giant memtable for each
 Cassandra instance, with partial flushing to SSTs?

 It seems to me like a single memtable would make it MUCH easier to tune
 Cassandra, since the decision whether to (partially) flush the memtable to
 disk could be made on a node-wide basis, based on the resources you really
 have, instead of the guess-work that we are forced to do today.





Re: Multi-tenancy, and authentication and authorization

2011-01-19 Thread David Boxenhorn
Yes, the way I see it - and it becomes even more necessary for a
multi-tenant configuration - there should be completely separate
configurations for applications and for servers.

- Application configuration is based on data and usage characteristics of
your application.
- Server configuration is based on the specific hardware limitations of the
server.

Obviously, server limitations take priority over application configuration.

Assuming that each tenant in a multi-tenant environment gets one keyspace,
you would also want to enforce limitations based on keyspace (which
correspond to the parameters that the tenant paid for).

So now we have three levels:

1. Server configuration (top priority)
2. Keyspace configuration (paid-for service - second priority)
3. Column family configuration (configuration provided by tenant - third
priority)


On Wed, Jan 19, 2011 at 3:15 PM, indika kumara indika.k...@gmail.comwrote:

 As the actual problem is mostly related to the number of CFs in the
 system (maybe the number of columns), I still believe that exposing
 Cassandra 'as-is' to a tenant is doable and suitable, though it needs some
 fixes. That multi-tenancy model allows a tenant to use the programming
 model of Cassandra 'as-is', enabling the seamless migration of an
 application that uses Cassandra into the cloud. Moreover, in order to
 support different SLA requirements of different tenants, the
 configurability of keyspaces, CFs, etc. per tenant may be critical.
 However, there are trade-offs among usability, memory consumption, and
 performance. I believe that it is important to consider the SLA
 requirements of different tenants when deciding the strategies for
 controlling resource consumption.

 I like the idea of system-wide parameters for controlling resource usage.
 I believe that tenant-specific parameters are equally important. There are
 resources, and each tenant can claim a portion of them based on SLA. For
 instance, if there is a threshold on the number of columns per node, it
 should be possible to decide how many columns a particular tenant can
 have. It allows selecting a suitable Cassandra cluster for a tenant based
 on his or her SLA. I believe the capability to configure
 resource-controlling parameters per keyspace would be important to support
 a keyspace-per-tenant model. Furthermore, in order to maximize the
 resource sharing among tenants, a threshold (on a resource) per keyspace
 should not be a hard limit. Rather, it should oscillate between a hard
 minimum and a maximum. For example, if a particular tenant needs more
 resources at a given time, he or she should be able to borrow from the
 others up to the maximum. The threshold is only considered when a tenant
 is assigned to a cluster - the remaining resources of a cluster should be
 equal to or higher than the resource limit of the tenant. It may be
 necessary to spread a single keyspace across multiple clusters, especially
 when there are not enough resources in a single cluster.

 I believe that it would be better to have the flexibility to seamlessly
 change between multi-tenancy implementation models such as Cassandra
 'as-is', the keyspace-per-tenant model, a keyspace for all tenants, and so
 on. Based on what I have learnt, each model requires adding the tenant id
 (namespace) to a keyspace's name or CF's name or row key, or column's
 name. Would it be better to have a kind of pluggable handler that can
 access those resources prior to doing the actual operation so that the
 required changes can be done? Maybe prior to authorization.

 Thanks,

 Indika



Getting the version number

2011-01-19 Thread David Boxenhorn
Is there any way to use nodetool (or anything else) to get the Cassandra
version number of a deployed cluster?


Re: Getting the version number

2011-01-19 Thread David Boxenhorn
Yet another reason to move up to 0.7...

Thanks.

On Wed, Jan 19, 2011 at 5:27 PM, Daniel Lundin d...@eintr.org wrote:

 in 0.7 nodetool has a `version` command.

 On Wed, Jan 19, 2011 at 4:09 PM, David Boxenhorn da...@lookin2.com
 wrote:
  Is there any way to use nodetool (or anything else) to get the Cassandra
  version number of a deployed cluster?
 



Re: Tombstone lifespan after multiple deletions

2011-01-18 Thread David Boxenhorn
Thanks. In other words, before I delete something, I should check to see
whether it exists as a live row in the first place.

On Tue, Jan 18, 2011 at 9:24 AM, Ryan King r...@twitter.com wrote:

 On Sun, Jan 16, 2011 at 6:53 AM, David Boxenhorn da...@lookin2.com
 wrote:
  If I delete a row, and later on delete it again, before GCGraceSeconds
 has
  elapsed, does the tombstone live longer?

 Each delete is a new tombstone, which should answer your question.

 -ryan

  In other words, if I have the following scenario:
 
  GCGraceSeconds = 10 days
  On day 1 I delete a row
  On day 5 I delete the row again
 
  Will the tombstone be removed on day 10 or day 15?
 



Re: Tombstone lifespan after multiple deletions

2011-01-18 Thread David Boxenhorn
Thanks, Aaron, but I'm not 100% clear.

My situation is this: My use case spins off rows (not columns) that I no
longer need and want to delete. It is possible that these rows were never
created in the first place, or were already deleted. This is a very large
cleanup task that normally deletes a lot of rows, and the last thing that I
want to do is create tombstones for rows that didn't exist in the first
place, or lengthen the life on disk of tombstones of rows that are already
deleted.

So the question is: before I delete, do I have to retrieve the row to see if
it exists in the first place?



On Tue, Jan 18, 2011 at 11:38 AM, Aaron Morton aa...@thelastpickle.comwrote:

 AFAIK that's not necessary, there is no need to worry about previous
 deletes. You can delete stuff that does not even exist; neither batch_mutate
 nor remove is going to throw an error.

 All the columns that were (roughly speaking) present at your first deletion
 will be available for GC at the end of the first tombstone's life. Same for
 the second.

 Say you were to write a col between the two deletes with the same name as
 one present at the start. The first version of the col is avail for GC after
 tombstone 1, and the second after tombstone 2.

 Hope that helps
 Aaron

 On 18/01/2011, at 9:37 PM, David Boxenhorn da...@lookin2.com wrote:

 Thanks. In other words, before I delete something, I should check to see
 whether it exists as a live row in the first place.

 On Tue, Jan 18, 2011 at 9:24 AM, Ryan King  r...@twitter.com
 r...@twitter.com wrote:

 On Sun, Jan 16, 2011 at 6:53 AM, David Boxenhorn  da...@lookin2.com
 da...@lookin2.com wrote:
  If I delete a row, and later on delete it again, before GCGraceSeconds
 has
  elapsed, does the tombstone live longer?

 Each delete is a new tombstone, which should answer your question.

 -ryan

  In other words, if I have the following scenario:
 
  GCGraceSeconds = 10 days
  On day 1 I delete a row
  On day 5 I delete the row again
 
  Will the tombstone be removed on day 10 or day 15?
 





Re: Tombstone lifespan after multiple deletions

2011-01-18 Thread David Boxenhorn
Thanks.

On Tue, Jan 18, 2011 at 3:55 PM, Sylvain Lebresne sylv...@riptano.comwrote:

 On Tue, Jan 18, 2011 at 2:41 PM, David Boxenhorn da...@lookin2.com
 wrote:
  Thanks, Aaron, but I'm not 100% clear.
 
  My situation is this: My use case spins off rows (not columns) that I no
  longer need and want to delete. It is possible that these rows were never
  created in the first place, or were already deleted. This is a very large
  cleanup task that normally deletes a lot of rows, and the last thing that
 I
  want to do is create tombstones for rows that didn't exist in the first
  place, or lengthen the life on disk of tombstones of rows that are
 already
  deleted.
 
  So the question is: before I delete, do I have to retrieve the row to see
 if
  it exists in the first place?

 Yes, in your situation you do.
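
A rough sketch of that check-then-delete pattern. RowClient here is a hypothetical stand-in for whatever client is actually in use (Thrift get_slice/remove, Hector, ...), not a real API; the point is only that the read happens before the delete, so no tombstone is written for a row that isn't there.

    // Sketch: only delete rows that actually exist, to avoid creating tombstones
    // for rows that were never written or are already deleted.
    interface RowClient {
        // true if the row has at least one live column (e.g. a 1-column slice at QUORUM)
        boolean rowHasLiveColumns(String columnFamily, String key);
        // issues a row-level delete, which writes a tombstone
        void deleteRow(String columnFamily, String key);
    }

    public final class CleanupTask {
        private final RowClient client;
        public CleanupTask(RowClient client) { this.client = client; }

        public void deleteIfPresent(String columnFamily, String key) {
            if (client.rowHasLiveColumns(columnFamily, key)) {
                client.deleteRow(columnFamily, key);
            }
        }
    }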

 
 
 
  On Tue, Jan 18, 2011 at 11:38 AM, Aaron Morton aa...@thelastpickle.com
  wrote:
 
  AFAIK that's not necessary, there is no need to worry about previous
  deletes. You can delete stuff that does not even exist; neither
 batch_mutate
  nor remove is going to throw an error.
  All the columns that were (roughly speaking) present at your first
  deletion will be available for GC at the end of the first tombstone's
 life.
  Same for the second.
  Say you were to write a col between the two deletes with the same name
 as
  one present at the start. The first version of the col is avail for GC
 after
  tombstone 1, and the second after tombstone 2.
  Hope that helps
  Aaron
  On 18/01/2011, at 9:37 PM, David Boxenhorn da...@lookin2.com wrote:
 
  Thanks. In other words, before I delete something, I should check to see
  whether it exists as a live row in the first place.
 
  On Tue, Jan 18, 2011 at 9:24 AM, Ryan King r...@twitter.com wrote:
 
  On Sun, Jan 16, 2011 at 6:53 AM, David Boxenhorn da...@lookin2.com
  wrote:
   If I delete a row, and later on delete it again, before
 GCGraceSeconds
   has
   elapsed, does the tombstone live longer?
 
  Each delete is a new tombstone, which should answer your question.
 
  -ryan
 
   In other words, if I have the following scenario:
  
   GCGraceSeconds = 10 days
   On day 1 I delete a row
   On day 5 I delete the row again
  
   Will the tombstone be removed on day 10 or day 15?
  
 
 
 



Re: Multi-tenancy, and authentication and authorization

2011-01-18 Thread David Boxenhorn
I think tuning of Cassandra is overly complex, and even with a single tenant
you can run into problems with too many CFs.

Right now there is a one-to-one mapping between memtables and SSTables.
Instead of that, would it be possible to have one giant memtable for each
Cassandra instance, with partial flushing to SSTs?

It seems to me like a single memtable would make it MUCH easier to tune
Cassandra, since the decision whether to (partially) flush the memtable to
disk could be made on a node-wide basis, based on the resources you really
have, instead of the guess-work that we are forced to do today.


Tombstone lifespan after multiple deletions

2011-01-16 Thread David Boxenhorn
If I delete a row, and later on delete it again, before GCGraceSeconds has
elapsed, does the tombstone live longer?

In other words, if I have the following scenario:

GCGraceSeconds = 10 days
On day 1 I delete a row
On day 5 I delete the row again

Will the tombstone be removed on day 10 or day 15?


Re: Usage Pattern : "unique" value of a key.

2011-01-13 Thread David Boxenhorn
It is unlikely that both racing threads will have exactly the same
microsecond timestamp at the moment of creating a new user - so if the data
you read has exactly the same timestamp you used to write the data - it is
your data.

I think this would have to be combined with CL=QUORUM for both write and
read.

On Thu, Jan 13, 2011 at 9:57 AM, Oleg Anastasyev olega...@gmail.com wrote:

 Benoit Perroud benoit at noisette.ch writes:

 
   My idea to solve such a use case is to have both threads write the
   username, but with a column like lock-RANDOM VALUE, and then read
   the row, and find out if the first lock column appearing belongs to the
   thread. If this is the case, it can continue the process, otherwise it
   has been preempted by another thread.

 This looks ok for this task. As an alternative you can avoid creating the
 extra 'lock-random value' column and compare the timestamps of the new user
 data you just wrote. It is unlikely that both racing threads will have
 exactly the same microsecond timestamp at the moment of creating a new user
 - so if the data you read has exactly the same timestamp you used to write
 the data - it is your data.

 Another possible way is to use some external lock coordinator, e.g.
 ZooKeeper. Although for this task it looks like a bit of an overkill, this
 can become even more valuable if you have more data concurrency issues to
 solve and can bear an extra 5-10 ms of update-operation latency.
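
A sketch of the write-then-compare alternative described above. UserStore is a hypothetical stand-in for the real client; the only requirements are that the write can carry an explicit microsecond timestamp and that the read (at QUORUM) returns the surviving column's timestamp and value.

    // Sketch: claim a username by writing with our own timestamp and reading it back.
    interface UserStore {
        void writeOwner(String username, String ownerId, long timestampMicros); // at QUORUM
        long readOwnerTimestamp(String username);                               // at QUORUM
        String readOwner(String username);                                      // at QUORUM
    }

    public final class UniqueUsername {
        private final UserStore store;
        public UniqueUsername(UserStore store) { this.store = store; }

        // Returns true if this caller appears to have won the username.
        public boolean claim(String username, String myId) {
            // crude microsecond timestamp: millis * 1000 plus a sub-millisecond offset
            long myTs = System.currentTimeMillis() * 1000L + (long) (Math.random() * 1000);
            store.writeOwner(username, myId, myTs);
            // If the surviving column still carries our timestamp and id, our write won.
            // Best effort only: a concurrent write with a higher timestamp that arrives
            // after this read would still overwrite us.
            return store.readOwnerTimestamp(username) == myTs
                    && myId.equals(store.readOwner(username));
        }
    }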




Re: Reclaim deleted rows space

2011-01-12 Thread David Boxenhorn
I think that if SSTs are partitioned within the node using RP, so that each
partition is small and can be compacted independently of all other
partitions, you can implement an algorithm that will spread out the work of
compaction over time so that it never takes a node out of commission, as it
does now.

I have left a comment to that effect here:

https://issues.apache.org/jira/browse/CASSANDRA-1608?focusedCommentId=12980654page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12980654

On Mon, Jan 10, 2011 at 10:56 PM, Jonathan Ellis jbel...@gmail.com wrote:

 I'd suggest describing your approach on
 https://issues.apache.org/jira/browse/CASSANDRA-1608, and if it's
 attractive, porting it to 0.8.  It's too late for us to make deep
 changes in 0.6 and probably even 0.7 for the sake of stability.

 On Mon, Jan 10, 2011 at 8:00 AM, shimi shim...@gmail.com wrote:
  I modified the code to limit the size of the SSTables.
  I will be glad if someone can take a look at it
  https://github.com/Shimi/cassandra/tree/cassandra-0.6
  Shimi
 
  On Fri, Jan 7, 2011 at 2:04 AM, Jonathan Shook jsh...@gmail.com wrote:
 
  I believe the following condition within submitMinorIfNeeded(...)
  determines whether to continue, so it's not a hard loop.
 
   // if (sstables.size() >= minThreshold) ...
 
 
 
  On Thu, Jan 6, 2011 at 2:51 AM, shimi shim...@gmail.com wrote:
    According to the code it makes sense.
   submitMinorIfNeeded() calls doCompaction() which
   calls submitMinorIfNeeded().
   With minimumCompactionThreshold = 1 submitMinorIfNeeded() will always
   run
   compaction.
  
   Shimi
   On Thu, Jan 6, 2011 at 10:26 AM, shimi shim...@gmail.com wrote:
  
  
   On Wed, Jan 5, 2011 at 11:31 PM, Jonathan Ellis jbel...@gmail.com
   wrote:
  
   Pretty sure there's logic in there that says don't bother
 compacting
   a single sstable.
  
   No. You can do it.
   Based on the log I have a feeling that it triggers an infinite
   compaction
   loop.
  
  
   On Wed, Jan 5, 2011 at 2:26 PM, shimi shim...@gmail.com wrote:
 How is minor compaction triggered? Is it triggered only when a new
 SSTable is added?
   
I was wondering if triggering a compaction
with minimumCompactionThreshold
set to 1 would be useful. If this can happen I assume it will do
compaction
on files with similar size and remove deleted rows on the rest.
Shimi
On Tue, Jan 4, 2011 at 9:56 PM, Peter Schuller
peter.schul...@infidyne.com
wrote:
   
 I don't have a problem with disk space. I have a problem with
 the
 data
 size.
   
[snip]
   
 Bottom line is that I want to reduce the number of requests
 that
 go to
 disk. Since there is enough data that is no longer valid I can
 do
 it
 by
 reclaiming the space. The only way to do it is by running Major
 compaction.
 I can wait and let Cassandra do it for me but then the data
 size
 will
 get
 even bigger and the response time will be worse. I can do it
 manually
 but I
 prefer it to happen in the background with less impact on the
 system
   
Ok - that makes perfect sense then. Sorry for misunderstanding :)
   
So essentially, for workloads that are teetering on the edge of
cache
 warmness and are subject to significant overwrites or removals, it
may
be beneficial to perform much more aggressive background
 compaction
even though it might waste lots of CPU, to keep the in-memory
working
set down.
   
There was talk (I think in the compaction redesign ticket) about
potentially improving the use of bloom filters such that obsolete
data
in sstables could be eliminated from the read set without
necessitating actual compaction; that might help address cases
 like
these too.
   
I don't think there's a pre-existing silver bullet in a current
release; you probably have to live with the need for
greater-than-theoretically-optimal memory requirements to keep
 the
working set in memory.
   
--
/ Peter Schuller
   
   
  
  
  
   --
   Jonathan Ellis
   Project Chair, Apache Cassandra
   co-founder of Riptano, the source for professional Cassandra support
   http://riptano.com
  
  
  
 
 



 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of Riptano, the source for professional Cassandra support
 http://riptano.com



Re: Why my posts are marked as spam?

2011-01-12 Thread David Boxenhorn
What's wrong with topposting?

This email is non-plain and topposted...

On Wed, Jan 12, 2011 at 4:32 PM, zGreenfelder zgreenfel...@gmail.comwrote:

 
  On 12 January 2011 05:28, Oleg Tsvinev oleg.tsvi...@gmail.com wrote:
   Whatever I do, it happens :(
 On Wed, Jan 12, 2011 at 1:53 AM, Arijit Mukherjee ariji...@gmail.com
 wrote:
 
  I think this happens for RTF. Some of the mails in the post are RTF,
  and the reply button creates an RTF reply - that's when it happens.
  Wonder how the mail to which I replied was in RTF...
 
  Arijit
 
 
  --
  And when the night is cloudy,
  There is still a light that shines on me,
  Shine on until tomorrow, let it be.

 I think it happens for any non-plain text.. be it RTF, HTML, or
 whatever. At least that's been my limited experience with mailing
 lists.

 and for what it's worth (I just had to correct myself, so don't take
 this as huge criticism), many people are also opposed to topposting ..
 or adding a reply to the top of an email. FWIW.

 --
 Even the Magic 8 ball has an opinion on email clients: Outlook not so good.



Re: Bootstrapping taking long

2011-01-05 Thread David Boxenhorn
My nodes all have themselves in their list of seeds - always did - and
everything works. (You may ask why I did this. I don't know, I must have
copied it from an example somewhere.)

On Wed, Jan 5, 2011 at 9:42 AM, Ran Tavory ran...@gmail.com wrote:

 I was able to make the node join the ring but I'm confused.
 What I did is, first when adding the node, this node was not in the seeds
 list of itself. AFAIK this is how it's supposed to be. So it was able to
 transfer all data to itself from other nodes but then it stayed in the
 bootstrapping state.
 So what I did (and I don't know why it works), is add this node to the
 seeds list in its own storage-conf.xml file. Then restart the server and
 then I finally see it in the ring...
 If I had added the node to the seeds list of itself when first joining it,
 it would not join the ring but if I do it in two phases it did work.
 So it's either my misunderstanding or a bug...


 On Wed, Jan 5, 2011 at 7:14 AM, Ran Tavory ran...@gmail.com wrote:

 The new node does not see itself as part of the ring, it sees all others
 but itself, so from that perspective the view is consistent.
 The only problem is that the node never finishes bootstrapping. It stays in
 this state for hours (It's been 20 hours now...)


 $ bin/nodetool -p 9004 -h localhost streams
 Mode: Bootstrapping
 Not sending any streams.
 Not receiving any streams.


 On Wed, Jan 5, 2011 at 1:20 AM, Nate McCall n...@riptano.com wrote:

 Does the new node have itself in the list of seeds per chance? This
 could cause some issues if so.

 On Tue, Jan 4, 2011 at 4:10 PM, Ran Tavory ran...@gmail.com wrote:
   I'm still at a loss. I haven't been able to resolve this. I tried
   adding another node at a different location on the ring but this node
   too remains stuck in the bootstrapping state for many hours without
   any of the other nodes being busy with anti-compaction or anything
   else. I don't know what's keeping it from finishing the bootstrap: no
   CPU, no IO, files were already streamed, so what is it waiting for?
  I read the release notes of 0.6.7 and 0.6.8 and there didn't seem to
  be anything addressing a similar issue so I figured there was no point
  in upgrading. But let me know if you think there is.
  Or any other advice...
 
  On Tuesday, January 4, 2011, Ran Tavory ran...@gmail.com wrote:
  Thanks Jake, but unfortunately the streams directory is empty so I
 don't think that any of the nodes is anti-compacting data right now or has
 been in the past 5 hours. It seems that all the data was already transferred
 to the joining host but the joining node, after having received the data
 would still remain in bootstrapping mode and not join the cluster. I'm not
 sure that *all* data was transferred (perhaps other nodes need to transfer
 more data) but nothing is actually happening so I assume all has been moved.
   Perhaps it's a configuration error on my part. Should I use
  AutoBootstrap=true? Anything else I should look out for in the
  configuration file, or something else?
 
 
  On Tue, Jan 4, 2011 at 4:08 PM, Jake Luciani jak...@gmail.com
 wrote:
 
  In 0.6, locate the node doing anti-compaction and look in the
 streams subdirectory in the keyspace data dir to monitor the
 anti-compaction progress (it puts new SSTables for bootstrapping node in
 there)
 
 
  On Tue, Jan 4, 2011 at 8:01 AM, Ran Tavory ran...@gmail.com wrote:
 
 
  Running nodetool decommission didn't help. Actually the node refused
 to decommission itself (b/c it wasn't part of the ring). So I simply stopped
 the process, deleted all the data directories and started it again. It
  worked in the sense that the node bootstrapped again, but as before, after it
 had finished moving the data nothing happened for a long time (I'm still
 waiting, but nothing seems to be happening).
 
 
 
 
   Any hints on how to analyze a stuck bootstrapping node?? Thanks.
  On Tue, Jan 4, 2011 at 1:51 PM, Ran Tavory ran...@gmail.com wrote:
  Thanks Shimi, so indeed anticompaction was run on one of the other
  nodes from the same DC, but to my understanding it has already ended. A few
  hours ago...
 
 
 
   I see plenty of log messages such as [1], which ended a couple of hours
  ago, and I've seen the new node streaming and accepting the data from the
  node which performed the anticompaction, and so far it was normal, so it
  seemed that the data is in its right place. But now the new node seems sort of
 stuck. None of the other nodes is anticompacting right now or had been
 anticompacting since then.
 
 
 
 
   The new node's CPU is close to zero, its iostats are almost zero, so I
 can't find another bottleneck that would keep it hanging.
   On IRC someone suggested I maybe retry joining this node,
 e.g. decommission and rejoin it again. I'll try it now...
 
 
 
 
 
 
  [1] INFO [COMPACTION-POOL:1] 2011-01-04 04:04:09,721
 CompactionManager.java (line 338) AntiCompacting
 

Re: The CLI sometimes gets 100 results even though there are more, and sometimes gets more than 100

2011-01-05 Thread David Boxenhorn
I know that there's a limit, and I just assumed that the CLI set it to 100,
until I saw more than 100 results.

On Wed, Jan 5, 2011 at 6:56 PM, Peter Schuller
peter.schul...@infidyne.comwrote:

  The CLI sometimes gets only 100 results (even though there are more) -
 and
  sometimes gets all the results, even when there are more than 100!
 
  What is going on here? Is there some logic that says if there are too
 many
  results return 100, even though too many can be more than 100?

 API calls have a limit since streaming is not supported and you could
 potentially have almost arbitrarily large result sets. I believe
 cassandra-cli will allow you to set the limit if you look at the
 'help' output and look for the word 'limit'.

 The way to iterate over large amounts of data is to do paging, with
 multiple queries.
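
The usual paging loop, sketched against a hypothetical stand-in for get_range_slices (which returns rows in token order, so this works with RandomPartitioner too): ask for a page of keys, then reissue the query starting from the last key returned and skip that first, repeated, key.

    import java.util.List;

    // Sketch of iterating over all row keys with key-range paging.
    interface RangeClient {
        // up to 'count' row keys starting at 'startKey' (inclusive), in token order
        List<String> keyRange(String columnFamily, String startKey, int count);
    }

    public final class AllRowsIterator {
        public static void forEachKey(RangeClient client, String cf, int pageSize) {
            String start = "";            // empty start key = beginning of the ring
            boolean first = true;
            while (true) {
                List<String> page = client.keyRange(cf, start, pageSize);
                if (page.isEmpty()) break;
                for (int i = first ? 0 : 1; i < page.size(); i++) { // skip repeated start key
                    process(page.get(i));
                }
                if (page.size() < pageSize) break;      // last page
                start = page.get(page.size() - 1);      // resume from the last key seen
                first = false;
            }
        }

        private static void process(String key) {
            System.out.println(key);
        }
    }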

 --
 / Peter Schuller



Re: iterate over all the rows with RP

2010-12-13 Thread David Boxenhorn
Shimi, I am using Hector to do exactly what you want to do, with no
problems.

(In fact, the question didn't even occur to me...)

On Sun, Dec 12, 2010 at 9:03 PM, Ran Tavory ran...@gmail.com wrote:

 This should be the case, yes; semantics aren't affected by the
 connection and no state is kept. What might happen is that if you read/write
 with low consistency levels, then when you hit a different host on the
 ring it might have an inconsistent state in case of a partition.

 On Sunday, December 12, 2010, shimi shim...@gmail.com wrote:
   So if I use a different connection (Thrift via Hector), will I get
  the same results? It makes sense when you use OPP and I assume it is the
  same with RP. I just wanted to make sure this is the case and there is no
  state which is kept.
 
  Shimi
 
  On Sun, Dec 12, 2010 at 8:14 PM, Peter Schuller 
 peter.schul...@infidyne.com wrote:
 
   Is the same connection required when iterating over all the rows with
   Random Partitioner or is it possible to use a different connection for
 each
  iteration?
 
  In general, the choice of RPC connection (I assume you mean the
  underlying thrift connection) does not affect the semantics of the RPC
  calls.
 
  --
  / Peter Schuller
 
 
 

 --
 /Ran



Re: N to N relationships

2010-12-12 Thread David Boxenhorn
You want to store every value twice? That would be a pain to maintain, and
possibly lead to inconsistent data.

On Fri, Dec 10, 2010 at 3:50 AM, Nick Bailey n...@riptano.com wrote:

 I would also recommend two column families. Storing the key as NxN would
 require you to hit multiple machines to query for an entire row or column
 with RandomPartitioner. Even with OPP you would need to pick rows or columns
 to order by and the other would require hitting multiple machines. Two
 column families avoid this and avoid any problems with choosing OPP.


 On Thu, Dec 9, 2010 at 2:26 PM, Aaron Morton aa...@thelastpickle.comwrote:

 Am assuming you have one matrix and you know the dimensions. Also as you
 say the most important queries are to get an entire column or an entire row.

 I would consider using a standard CF for the Columns and one for the Rows.
 The key for each would be the col / row number, each cassandra column name
 would be the id of the other dimension and the value whatever you want.

 - when storing the data update both the Column and Row CF
 - reading a whole row/col would be simply reading from the appropriate CF.
 - reading an intersection is a get_slice to either col or row CF using the
 column_names field to identify the other dimension.

 You would not need secondary indexes to serve these queries.
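
A minimal sketch of that layout, against a hypothetical client stand-in (not a real API): every cell is written twice, once keyed by row and once keyed by column, so that a whole row or a whole column is a single-key read.

    // Sketch: dual-write a matrix cell into a row-oriented CF and a column-oriented CF.
    // The CF names "MatrixRows" and "MatrixColumns" are made up for the example.
    interface MatrixCf {
        void insert(String cfName, String key, String columnName, String value);
    }

    public final class RowColumnWriter {
        private final MatrixCf client;
        public RowColumnWriter(MatrixCf client) { this.client = client; }

        public void set(int row, int col, String value) {
            client.insert("MatrixRows", String.valueOf(row), String.valueOf(col), value);
            client.insert("MatrixColumns", String.valueOf(col), String.valueOf(row), value);
        }
    }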

 Hope that helps.
 Aaron

 On 10 Dec, 2010, at 07:02 AM, Sébastien Druon sdr...@spotuse.com wrote:

 I mean if I have secondary indexes. Apparently they are calculated in the
 background...

 On 9 December 2010 18:33, David Boxenhorn da...@lookin2.com wrote:

 What do you mean by indexing?


 On Thu, Dec 9, 2010 at 7:30 PM, Sébastien Druon sdr...@spotuse.com wrote:

 Thanks a lot for the answer

 What about the indexing when adding a new element? Is it incremental?

 Thanks again



 On 9 December 2010 14:38, David Boxenhorn da...@lookin2.com wrote:

 How about a regular CF where keys are n...@n ?

 Then, getting a matrix row would be the same cost as getting a matrix
 column (N gets), and it would be very easy to add element N+1.



 On Thu, Dec 9, 2010 at 1:48 PM, Sébastien Druon sdr...@spotuse.com wrote:

 Hello,

 For a specific case, we are thinking about representing a N to N
 relationship with a NxN Matrix in Cassandra.
 The relations will be only between a subset of elements, so the Matrix
 will mostly contain empty elements.

 We have a set of questions concerning this:
 - what is the best way to represent this matrix? what would have the
 best performance in reading? in writing?
   . a super column family with n column families, with n columns each
   . a column family with n columns and n lines

 In the second case, we would need to extract 2 kinds of information:
 - all the relations for a line: this should be no specific problem;
 - all the relations for a column: in that case we would need an index
 for the columns, right? and then get all the lines where the value of the
 column in question is not null... is that the correct way to do it?
 When using indexes, say we want to add another element N+1. What
 impact in terms of time would it have on the indexation job?

 Thanks a lot for the answers,

 Best regards,

 Sébastien Druon









Re: N to N relationships

2010-12-09 Thread David Boxenhorn
How about a regular CF where keys are n...@n ?

Then, getting a matrix row would be the same cost as getting a matrix column
(N gets), and it would be very easy to add element N+1.
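
Roughly, I mean something like the sketch below (the exact key format and
separator are just an example; any encoding that joins the two coordinates
into one key works). A whole row or a whole column is then N single-key gets:

import java.util.ArrayList;
import java.util.List;

// Sketch of the single-CF scheme: each cell is stored under one key that
// combines both coordinates, e.g. "row@col".
public class MatrixKeys {
    static String cellKey(int row, int col) {
        return row + "@" + col;
    }

    // Reading matrix row r means fetching the N keys r@0 .. r@(N-1).
    static List<String> rowKeys(int row, int n) {
        List<String> keys = new ArrayList<>();
        for (int col = 0; col < n; col++) {
            keys.add(cellKey(row, col));
        }
        return keys;
    }

    // Reading matrix column c is symmetric: 0@c .. (N-1)@c.
    static List<String> columnKeys(int col, int n) {
        List<String> keys = new ArrayList<>();
        for (int row = 0; row < n; row++) {
            keys.add(cellKey(row, col));
        }
        return keys;
    }
}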


On Thu, Dec 9, 2010 at 1:48 PM, Sébastien Druon sdr...@spotuse.com wrote:

 Hello,

 For a specific case, we are thinking about representing a N to N
 relationship with a NxN Matrix in Cassandra.
 The relations will be only between a subset of elements, so the Matrix will
 mostly contain empty elements.

 We have a set of questions concerning this:
 - what is the best way to represent this matrix? what would have the best
 performance in reading? in writing?
   . a super column family with n column families, with n columns each
   . a column family with n columns and n lines

 In the second case, we would need to extract 2 kinds of information:
 - all the relations for a line: this should be no specific problem;
 - all the relations for a column: in that case we would need an index for
 the columns, right? and then get all the lines where the value of the column
 in question is not null... is it the correct way to do?
 When using indexes, say we want to add another element N+1. What impact in
 terms of time would it have on the indexation job?

 Thanks a lot for the answers,

 Best regards,

 Sébastien Druon



Secondary indexes change everything?

2010-12-09 Thread David Boxenhorn
It seems to me that secondary indexes (new in 0.7) change everything when it
comes to data modeling.

- OOP becomes obsolete
- primary indexes become obsolete if you ever want to do a range query
(which you probably will...), better to assign a random row id

Taken together, it's likely that very little will remain of your old
database schema...

Am I right?


Re: Secondary indexes change everything?

2010-12-09 Thread David Boxenhorn
- OPP becomes obsolete (OOP is not obsolete!)
- primary indexes become obsolete if you ever want to do a range query
(which you probably will...), better to assign a random row id

Taken together, it's likely that very little will remain of your old
database schema...

Am I right?


Re: Quorum: killing 1 out of 3 server kills the cluster (?)

2010-12-09 Thread David Boxenhorn
In other words, if you want to use QUORUM, you need to set RF=3.

(I know because I had exactly the same problem.)
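
The arithmetic (quoted below) is quorum = RF/2 + 1, so a quick sketch of what
each replication factor can tolerate (plain Java, nothing Cassandra-specific):

// quorum = floor(RF / 2) + 1; a quorum operation survives RF - quorum downed replicas.
public class QuorumMath {
    static int quorum(int rf) {
        return rf / 2 + 1;   // integer division
    }

    public static void main(String[] args) {
        for (int rf = 1; rf <= 5; rf++) {
            int q = quorum(rf);
            System.out.printf("RF=%d  quorum=%d  tolerates %d down replica(s)%n",
                    rf, q, rf - q);
        }
        // RF=2 -> quorum=2, tolerates 0 down; RF=3 -> quorum=2, tolerates 1 down.
    }
}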

On Thu, Dec 9, 2010 at 6:05 PM, Sylvain Lebresne sylv...@yakaz.com wrote:

 It's 2 out of the number of replicas, not the number of nodes. At RF=2, you
 have 2 replicas. And since quorum is also 2 with that replication factor,
 you cannot lose a node, otherwise some queries will end up as UnavailableException.

 Again, this is not related to the total number of nodes. Even with 200
 nodes, if you use RF=2, you will have some queries that fail (although many
 fewer than what you are probably seeing).

 On Thu, Dec 9, 2010 at 5:00 PM, Timo Nentwig timo.nent...@toptarif.de
 wrote:
 
  On Dec 9, 2010, at 16:50, Daniel Lundin wrote:
 
   Quorum is really only useful when RF > 2, since for a quorum to
   succeed RF/2+1 replicas must be available.
 
  2/2+1==2 and I killed 1 of 3, so... don't get it.
 
  This means for RF = 2, consistency levels QUORUM and ALL yield the same
 result.
 
  /d
 
  On Thu, Dec 9, 2010 at 4:40 PM, Timo Nentwig timo.nent...@toptarif.de
 wrote:
  Hi!
 
  I've 3 servers running (0.7rc1) with a replication_factor of 2 and use
 quorum for writes. But when I shut down one of them UnavailableExceptions
  are thrown. Why is that? Isn't that the point of quorum and a fault-tolerant
  DB that it continues with the remaining 2 nodes and redistributes the data
  to the broken one as soon as it's up again?
 
  What may I be doing wrong?
 
  thx
  tcn
 
 



Re: Quorum: killing 1 out of 3 server kills the cluster (?)

2010-12-09 Thread David Boxenhorn
If that is what you want, use CL=ONE

On Thu, Dec 9, 2010 at 6:43 PM, Timo Nentwig timo.nent...@toptarif.dewrote:


 On Dec 9, 2010, at 17:39, David Boxenhorn wrote:

  In other words, if you want to use QUORUM, you need to set RF=3.
 
  (I know because I had exactly the same problem.)

  I naively assumed that if I kill either node that holds N1 (i.e. node 1 or
  3), N1 will still remain on another node, and only if both fail do I actually
  lose data. But apparently this is not how it works...

  On Thu, Dec 9, 2010 at 6:05 PM, Sylvain Lebresne sylv...@yakaz.com
 wrote:
   It's 2 out of the number of replicas, not the number of nodes. At RF=2,
   you have 2 replicas. And since quorum is also 2 with that replication factor,
   you cannot lose a node, otherwise some queries will end up as UnavailableException.
  
   Again, this is not related to the total number of nodes. Even with 200
   nodes, if you use RF=2, you will have some queries that fail (although many
   fewer than what you are probably seeing).
 
  On Thu, Dec 9, 2010 at 5:00 PM, Timo Nentwig timo.nent...@toptarif.de
 wrote:
  
   On Dec 9, 2010, at 16:50, Daniel Lundin wrote:
  
    Quorum is really only useful when RF > 2, since for a quorum to
    succeed RF/2+1 replicas must be available.
  
   2/2+1==2 and I killed 1 of 3, so... don't get it.
  
   This means for RF = 2, consistency levels QUORUM and ALL yield the
 same result.
  
   /d
  
   On Thu, Dec 9, 2010 at 4:40 PM, Timo Nentwig 
 timo.nent...@toptarif.de wrote:
   Hi!
  
   I've 3 servers running (0.7rc1) with a replication_factor of 2 and
 use quorum for writes. But when I shut down one of them
  UnavailableExceptions are thrown. Why is that? Isn't that the point of
  quorum and a fault-tolerant DB that it continues with the remaining 2 nodes
  and redistributes the data to the broken one as soon as it's up again?
  
   What may I be doing wrong?
  
   thx
   tcn
  
  
 




Re: N to N relationships

2010-12-09 Thread David Boxenhorn
What do you mean by indexing?

On Thu, Dec 9, 2010 at 7:30 PM, Sébastien Druon sdr...@spotuse.com wrote:

 Thanks a lot for the answer

 What about the indexing when adding a new element? Is it incremental?

 Thanks again


 On 9 December 2010 14:38, David Boxenhorn da...@lookin2.com wrote:

 How about a regular CF where keys are n...@n ?

 Then, getting a matrix row would be the same cost as getting a matrix
 column (N gets), and it would be very easy to add element N+1.


 On Thu, Dec 9, 2010 at 1:48 PM, Sébastien Druon sdr...@spotuse.comwrote:

 Hello,

 For a specific case, we are thinking about representing a N to N
 relationship with a NxN Matrix in Cassandra.
 The relations will be only between a subset of elements, so the Matrix
 will mostly contain empty elements.

 We have a set of questions concerning this:
 - what is the best way to represent this matrix? what would have the best
 performance in reading? in writing?
   . a super column family with n column families, with n columns each
   . a column family with n columns and n lines

 In the second case, we would need to extract 2 kinds of information:
 - all the relations for a line: this should be no specific problem;
 - all the relations for a column: in that case we would need an index for
 the columns, right? and then get all the lines where the value of the column
 in question is not null... is it the correct way to do?
 When using indexes, say we want to add another element N+1. What impact
 in terms of time would it have on the indexation job?

 Thanks a lot for the answers,

 Best regards,

 Sébastien Druon






Using mySQL to emulate Cassandra

2010-11-28 Thread David Boxenhorn
As our launch date approaches, I am getting increasingly nervous about
Cassandra tuning. It is a mysterious black art that I haven't mastered even
at the low usages that we have now. I know of a few more things I can do to
improve things, but how will I know if it is enough? All this is
particularly ironic since - as we are just starting out - we don't have
scalability problems yet, though we hope to!

Luckily, I have completely wrapped Cassandra in an entity mapper, so that I
can easily trade in something else, perhaps temporarily, until we really
need Cassandra's scalability.

So, I'm thinking of emulating Cassandra with mySQL. I would use mySQL either
as a simple key-value store, without joins, or map Cassandra supercolumns to
mySQL columns, probably of type CLOB.
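
For the key-value variant, what I have in mind is a single table keyed on
(row key, column name), roughly like the JDBC sketch below. The table and
column names are placeholders, and the exact value type (TEXT/LONGTEXT vs.
CLOB through JDBC) is still to be decided:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Rough sketch of a (row key, column name) -> value store on MySQL.
// Assumes a table like:
//   CREATE TABLE kv_store (row_key VARCHAR(255), column_name VARCHAR(255),
//                          col_value LONGTEXT,
//                          PRIMARY KEY (row_key, column_name));
public class MySqlColumnStore {
    private final Connection conn;

    public MySqlColumnStore(String url, String user, String password) throws SQLException {
        this.conn = DriverManager.getConnection(url, user, password);
    }

    public void put(String rowKey, String columnName, String value) throws SQLException {
        String sql = "INSERT INTO kv_store (row_key, column_name, col_value) VALUES (?, ?, ?) "
                   + "ON DUPLICATE KEY UPDATE col_value = VALUES(col_value)";
        PreparedStatement ps = conn.prepareStatement(sql);
        ps.setString(1, rowKey);
        ps.setString(2, columnName);
        ps.setString(3, value);
        ps.executeUpdate();
        ps.close();
    }

    public String get(String rowKey, String columnName) throws SQLException {
        String sql = "SELECT col_value FROM kv_store WHERE row_key = ? AND column_name = ?";
        PreparedStatement ps = conn.prepareStatement(sql);
        ps.setString(1, rowKey);
        ps.setString(2, columnName);
        ResultSet rs = ps.executeQuery();
        String result = rs.next() ? rs.getString(1) : null;
        rs.close();
        ps.close();
        return result;
    }
}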

Does anyone want to talk me out of this?


Taking down a node in a 3-node cluster, RF=2

2010-11-28 Thread David Boxenhorn
For the vast majority of my data usage eventual consistency is fine (i.e.
CL=ONE) but I have a small amount of critical data for which I read and
write using CL=QUORUM.

If I have a cluster with 3 nodes and RF=2, and CL=QUORUM does that mean that
a value can be read from or written to any 2 nodes, or does it have to be
the particular 2 nodes that store the data? If it is the particular 2 nodes
that store the data, that means that I can't even take down one node, since
it will be the mandatory 2nd node for 1/3 of my data...


Re: Taking down a node in a 3-node cluster, RF=2

2010-11-28 Thread David Boxenhorn
Thank you, Jake. It does... except that in another context you told me:

Hints only happen when a node is unavailable and you are writing with CL.ANY
If you never write with CL.ANY then you can turn off hinted handoff.

How do I reconcile this?


On Sun, Nov 28, 2010 at 7:11 PM, Jake Luciani jak...@gmail.com wrote:

 If you read/write data with quorum then you can safely take a node down in
 this scenario.  Subsequent writes will use hinted handoff to be passed to
 the node when it comes back up.

 More info is here: http://wiki.apache.org/cassandra/HintedHandoff

 Does that answer your question?

 -Jake


 On Sun, Nov 28, 2010 at 9:42 AM, Ran Tavory ran...@gmail.com wrote:

  to me it makes sense that if hinted handoff is off then cassandra cannot
  satisfy 2 out of every 3 writes when one of the nodes is down, since
  this node is the designated node for 2/3 of the writes.
  But I don't remember reading this anywhere. Does hinted handoff affect
  David's situation?
  (David, did you disable HH in your storage-config?
  <HintedHandoffEnabled>false</HintedHandoffEnabled>)


 On Sun, Nov 28, 2010 at 4:32 PM, David Boxenhorn da...@lookin2.comwrote:

 For the vast majority of my data usage eventual consistency is fine (i.e.
 CL=ONE) but I have a small amount of critical data for which I read and
 write using CL=QUORUM.

 If I have a cluster with 3 nodes and RF=2, and CL=QUORUM does that mean
 that a value can be read from or written to any 2 nodes, or does it have to
 be the particular 2 nodes that store the data? If it is the particular 2
 nodes that store the data, that means that I can't even take down one node,
 since it will be the mandatory 2nd node for 1/3 of my data...




 --
 /Ran





Re: Taking down a node in a 3-node cluster, RF=2

2010-11-28 Thread David Boxenhorn
OK. To sum up: RF=2 and QUORUM are incompatible (if you want to be able to
take a node down).

Right?

On Sun, Nov 28, 2010 at 7:59 PM, Jake Luciani jak...@gmail.com wrote:

 I was wrong on this scenario and I'll explain where I was incorrect.

 Hints are stored for a downed node but they don't count towards meeting a
 consistency level.
 Let's take 2 scenarios:

 RF=6, Nodes=10

  If you READ/WRITE with CL.QUORUM you will need 4 live replicas. If one is
  down you still have enough active replicas to write to; one of these will
  store a hint and update the downed node when it comes back.

 RF=2, Nodes=3

  If you READ/WRITE with CL.QUORUM you need 2 live nodes.  If one of these 2
  is down you can't meet the QUORUM level, so the write will fail.

 In your scenario your best bet is to update to RF=3, then any two nodes
 will accept QUORUM

 Sorry for the confusion,

 -Jake

 On Sun, Nov 28, 2010 at 12:26 PM, David Boxenhorn da...@lookin2.comwrote:

 Thank you, Jake. It does... except that in another context you told me:

 Hints only happen when a node is unavailable and you are writing with
 CL.ANY
 If you never write with CL.ANY then you can turn off hinted handoff.

 How do I reconcile this?


 On Sun, Nov 28, 2010 at 7:11 PM, Jake Luciani jak...@gmail.com wrote:

 If you read/write data with quorum then you can safely take a node down
 in this scenario.  Subsequent writes will use hinted handoff to be passed to
 the node when it comes back up.

 More info is here: http://wiki.apache.org/cassandra/HintedHandoff

 Does that answer your question?

 -Jake


 On Sun, Nov 28, 2010 at 9:42 AM, Ran Tavory ran...@gmail.com wrote:

 to me it makes sense that if hinted handoff is off then cassandra cannot
  satisfy 2 out of every 3 writes when one of the nodes is down, since
  this node is the designated node for 2/3 of the writes.
  But I don't remember reading this anywhere. Does hinted handoff affect
  David's situation?
  (David, did you disable HH in your storage-config?
  <HintedHandoffEnabled>false</HintedHandoffEnabled>)


 On Sun, Nov 28, 2010 at 4:32 PM, David Boxenhorn da...@lookin2.comwrote:

 For the vast majority of my data usage eventual consistency is fine
 (i.e. CL=ONE) but I have a small amount of critical data for which I read
 and write using CL=QUORUM.

 If I have a cluster with 3 nodes and RF=2, and CL=QUORUM does that mean
 that a value can be read from or written to any 2 nodes, or does it have 
 to
 be the particular 2 nodes that store the data? If it is the particular 2
 nodes that store the data, that means that I can't even take down one 
 node,
 since it will be the mandatory 2nd node for 1/3 of my data...




 --
 /Ran







Re: Facebook messaging and choice of HBase over Cassandra - what can we learn?

2010-11-22 Thread David Boxenhorn
It's true that Cassandra has tunable consistency, but if eventual
consistency is not sufficient for most of your use cases, Cassandra becomes
much less attractive. Am I wrong?



On Sun, Nov 21, 2010 at 7:56 PM, Eric Evans eev...@rackspace.com wrote:

 On Sun, 2010-11-21 at 11:32 -0500, Simon Reavely wrote:
  As a cassandra user I think the key sentence for this community is:
  We found Cassandra's eventual consistency model to be a difficult
  pattern to reconcile for our new Messages infrastructure.

  In my experience, "we needed strong consistency", in conversations like
  these, amounts to hand waving.  It's the fastest way to shut down that
 part of the discussion without having said anything at all.

  I think it would be useful to find out more about this statement from
  Kannan and the facebook team. Does anyone have any contacts in the
  Facebook team?

 Good luck.  Facebook is notoriously tight-lipped about such things.

  My goal here is to understand usage patterns and whether or not the
  Cassandra community can learn from this decision; maybe even
  understand whether the Cassandra roadmap should be influenced by this
  decision to address a target user base. Of course we might also
  conclude that its just not a Cassandra use-case!

 Understanding is a laudable goal, just try to avoid drawing conclusions
 (and call out others who are).

 rant
 This is usually the point where a frenzy kicks in and folks assume that
 the Smart Guys at Facebook know something they don't, something that
 would invalidate their decision if they'd only known.

 I seriously doubt they've uncovered some Truth that would fundamentally
 alter the reasoning behind *my* decision to use Cassandra, and so I plan
 to continue as I always have.  Following relevant research and
 development, collecting experience (my own and others), and applying it
 to the problems I face.
 /rant

 --
 Eric Evans
 eev...@rackspace.com




Re: Facebook messaging and choice of HBase over Cassandra - what can we learn?

2010-11-22 Thread David Boxenhorn
Yes, but the value is supposed to be 11, since the write failed.

On Mon, Nov 22, 2010 at 2:27 PM, André Fiedler fiedler.an...@googlemail.com
 wrote:

  Doesn't Cassandra sync all nodes once the network is up again? I think this
  was one of the reasons for storing a timestamp with every key/value pair.
  So I think the response will only temporarily be 11; once all nodes have
  synced it should be 12. Or isn't that so?

 greetings André

 2010/11/22 Samuel Carrière samuel.carri...@gmail.com

  Cassandra can work in a consistent way; see some of this discussion and
  the Consistency section here:
 http://wiki.apache.org/cassandra/ArchitectureOverview
 
 If you always read and write with CL.Quorum (or the other way discussed)
 you will have consistency. Even if some of the replicas are temporarily
 inconsistent, or off line or whatever. Your reads will be consistent, i.e.
 every client will get the same value or the read will not work. If you want
 to work at a lower or higher consistency you can.
 
 Eventually all replicas of a value will become consistent.
 
 There are a number of reasons why cassandra may not be a good fit, and I
 would guess something else would be a problem before the consistency model.
 
 Hope that helps.
 Aaron

 Hello,

 I like cassandra a lot and I'm sure it can be used in many use cases,
 but I'm not sure we can say that we have strong consistency,
 even if we read and write with CL.Quorum.

  Firstly, we can only expect consistency at the column level. Reading
  and writing with CL.Quorum gives you, most of the time,
  a consistent value for each individual column, but it does not mean it
  gives you a consistent view of your data.
 (Because cassandra gives you no isolation and no transactions, your
 application has to deal with data inconsistencies).

 Secondly, I may be wrong, but I'm not sure consistency at the column
 level is guaranteed. Here is an example, with a replication
 factor of 3.
 Imagine that the current value of col1 is 11. Your application tries
 to write col1 = 12 with CL.Quorum.
  Imagine the write arrives at node 1, but that the new value is not
 transmitted to nodes 2 and 3 because of network failures. So
 the write fails (this is the expected behaviour), but node 1 still has
 the new value (there is no rollback).

 Then, imagine that the network is back to normal, and that another
  client asks for the value of col1, with CL.Quorum. Here,
  the value of the response is not guaranteed. If the client asks
  node 2 and node 3 for the value, the response will be 11, but
  if it asks node 1 and node 2 or 3, the response will be 12.

 Am I missing something ?

 Samuel





Consulting for Rollout + Cassandra

2010-07-13 Thread David Boxenhorn
We are planning a rollout of our online product ~September 1. Cassandra is a
major part of our online system.

We need some Cassandra consulting + general online consulting for
determining our server configuration so it will support Cassandra under all
possible scenarios.

Does anybody have any ideas for us?

Thanks!


OPP + Hash on client side

2010-07-07 Thread David Boxenhorn
Is there any strategy for using OPP with a hash algorithm on the client side
to get both uniform distribution of data in the cluster *and* the ability to
do range queries?

I'm thinking of something like this:

cassKey = (key % 97) + "@" + key;

cassRange = 0 + "@" + range; 1 + "@" + range; ... 96 + "@" + range;

Would something like that work?
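
Spelled out a bit more, the client-side code would be something like the
sketch below (pure client-side string handling; the 97 buckets and the "@"
separator are just the example values from above):

import java.util.ArrayList;
import java.util.List;

// Sketch of client-side bucketing on top of OPP: keys get a bucket prefix so
// data spreads around the ring, and one logical range query becomes 97
// physical range queries (one per bucket) whose results are merged.
public class BucketedKeys {
    static final int BUCKETS = 97;

    static String cassKey(long key) {
        return (key % BUCKETS) + "@" + key;
    }

    // One (start, end) pair per bucket; each is run as a separate range query
    // and the 97 result sets are concatenated or merged on the client.
    static List<String[]> cassRanges(long start, long end) {
        List<String[]> ranges = new ArrayList<>();
        for (int b = 0; b < BUCKETS; b++) {
            ranges.add(new String[] { b + "@" + start, b + "@" + end });
        }
        return ranges;
    }
}

(One wrinkle: since OPP orders keys lexically, numeric keys would need to be
zero-padded or otherwise encoded so that string order matches numeric order
within a bucket.)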


Re: OPP + Hash on client side

2010-07-07 Thread David Boxenhorn
Aaron, thank you for the link.

What is discussed there is not exactly what I am thinking of. They propose
distributing the keys with MD5(ROWKEY).ROWKEY - which will distribute
the values in a way that cannot easily be reversed. What I am proposing is
to distribute the keys evenly among N buckets, where N is much larger than
your number of nodes, and then construct my range queries as the union of N
range queries that I actually perform on Cassandra.

"You can do range queries with the Random Partitioner in 0.6.*"

I went through this before; it's not true. What you can do is loop over your
entire set of keys in random order. There is no way to get an actual range
other than the whole range.


On Wed, Jul 7, 2010 at 1:15 PM, Aaron Morton aa...@thelastpickle.comwrote:

 That pattern is discussed here
 http://ria101.wordpress.com/2010/02/22/cassandra-randompartitioner-vs-orderpreservingpartitioner/

 It's also used in http://github.com/tjake/Lucandra

  You can do range queries with the Random Partitioner in 0.6.*; the order of
  the results is undefined and it's a bit slower.

 I think it's normally used when you want ordered range queries in some CF's
 and random distribution in others.

 Aaron


 On 07 Jul, 2010,at 09:47 PM, David Boxenhorn da...@lookin2.com wrote:

 Is there any strategy for using OPP with a hash algorithm on the client
 side to get both uniform distribution of data in the cluster *and* the
 ability to do range queries?

 I'm thinking of something like this:

 cassKey = (key % 97) + "@" + key;

 cassRange = 0 + "@" + range; 1 + "@" + range; ... 96 + "@" + range;

 Would something like that work?



