compression

2012-09-23 Thread Tamar Fraenkel
Hi! In the DataStax documentation (http://www.datastax.com/docs/1.0/ddl/column_family) there is an explanation of which CFs are a good fit for compression: When to Use Compression: Compression is best suited for column families where there are many rows, with each row having the same columns, or at least

Re: Correct model

2012-09-23 Thread Marcelo Elias Del Valle
2012/9/20 aaron morton aa...@thelastpickle.com I would consider: # User CF * row_key: user_id * columns: user properties, key=value # UserRequests CF * row_key: user_id : partition_start where partition_start is the start of a time partition that makes sense in your domain. e.g.
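The UserRequests row key suggested above (user_id : partition_start) can be sketched as follows. This is only an illustration: the function names, the ":" separator and the one-day partition width are assumptions, not something stated in the thread.

```python
from datetime import datetime, timezone


def partition_start(ts: datetime, partition_seconds: int = 86400) -> int:
    """Round a timestamp down to the start of its time partition (epoch seconds)."""
    epoch = int(ts.timestamp())
    return epoch - (epoch % partition_seconds)


def user_requests_row_key(user_id: str, ts: datetime,
                          partition_seconds: int = 86400) -> str:
    """Build a hypothetical 'user_id:partition_start' row key for the UserRequests CF."""
    return f"{user_id}:{partition_start(ts, partition_seconds)}"


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(user_requests_row_key("user42", now))
```

All requests a user makes within one partition window land in the same row, so a read for a given time range only has to touch the rows for the partitions that overlap it.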

Re: Correct model

2012-09-23 Thread Hiller, Dean
But the only advantage in this solution is to split data among partitions? You need to split data among partitions or your query won't scale as more and more data is added to the table. Having the partition means you are querying far fewer rows. What do you mean here by current partition? He

found major difference in CQL vs Scalable SQL(PlayOrm) and question

2012-09-23 Thread Hiller, Dean
I have been digging more and more into CQL vs. PlayOrm S-SQL and found a major difference that is quite interesting (thought you might be interested plus I have a question). CQL uses a composite row key with the prefix so now any other tables that want to reference that entity have references to

Re: batch_mutate and erlang

2012-09-23 Thread Tyler Hobbs
It's a pretty solid standard at this point. The large majority of client library work from this point on will be based on CQL. On Sun, Sep 23, 2012 at 12:45 AM, Bradford Toney bradford.to...@gmail.com wrote: Yeah, I've seen how it's done in CQL3, I just wasn't sure if it was a solid standard

Re: compression

2012-09-23 Thread Tyler Hobbs
Due to repetition in the column metadata, you're still likely to get a reasonable amount of compression. This is especially true if there is some amount of repetition in the column names, values, or TTLs in wide rows. Compression will almost always be beneficial unless you're already somehow CPU

Re: Cassandra Messages Dropped

2012-09-23 Thread Michael Theroux
There were no errors in the log (other than the messages dropped exception pasted below), and the node does recover. We have only a small number of secondary indexes (3 in the whole system). However, I went through the cassandra code, and I believe I've worked through this problem. Just to

Re: compression

2012-09-23 Thread Hiller, Dean
Also, your unlimited column names may all share the same prefix, right? Like accounts.rowkey56, accounts.rowkey78, etc., so the "accounts" prefix gets a ton of compression. Later, Dean From: Tyler Hobbs ty...@datastax.com Reply-To:
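The point about shared prefixes can be checked with a rough experiment. The sketch below uses Python's zlib purely as a stand-in compressor; it does not reproduce Cassandra's SSTable layout or its compressor, and the generated column names are made up for illustration.

```python
import os
import zlib


def compression_ratio(data: bytes) -> float:
    """Return compressed size divided by original size (lower is better)."""
    return len(zlib.compress(data)) / len(data)


# Hypothetical column names that all share the "accounts.rowkey" prefix.
shared_prefix_names = "".join(f"accounts.rowkey{i}" for i in range(1000)).encode()

# Random bytes of the same length, as a baseline with no repetition.
random_names = os.urandom(len(shared_prefix_names))

if __name__ == "__main__":
    print("shared prefix:", round(compression_ratio(shared_prefix_names), 3))
    print("random bytes: ", round(compression_ratio(random_names), 3))
```

The names with a common prefix should compress to a small fraction of their size, while the random baseline barely compresses at all.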

Secondary index loss on node restart

2012-09-23 Thread Michael Theroux
Hello, We have been noticing an issue where, about 50% of the time in which a node fails or is restarted, secondary indexes appear to be partially lost or corrupted. A drop and re-add of the index appears to correct the issue. There are no errors in the cassandra logs that I see. Part of

Re: [problem with OOM in nodes]

2012-09-23 Thread aaron morton
/var/log/cassandra$ cat system.log | grep "Compacting large" | grep -E "[0-9]+ bytes" -o | cut -d " " -f 1 | awk '{ foo = $1 / 1024 / 1024 ; print foo "MB" }' | sort -nr | head -n 50 Is it a bad signal? Sorry, I do not know what this is outputting. As I can see in cfstats, compacted row maximum
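For readers unsure what the pipeline above prints: it appears to pull the byte counts out of "Compacting large" log lines, convert them to MB, and list the 50 largest. A rough Python equivalent is sketched below. The exact wording of the log lines is an assumption (only "Compacting large" and "<number> bytes" are taken from the command), and the function name is hypothetical.

```python
import re

# Matches the byte count on a "Compacting large ..." line, e.g. "(2097152 bytes)".
SIZE_RE = re.compile(r"Compacting large.*?(\d+) bytes")


def largest_compacted_rows_mb(log_lines, limit=50):
    """Return the largest row sizes (in MB) reported on 'Compacting large' lines."""
    sizes = []
    for line in log_lines:
        match = SIZE_RE.search(line)
        if match:
            sizes.append(int(match.group(1)) / 1024 / 1024)
    return sorted(sizes, reverse=True)[:limit]


if __name__ == "__main__":
    with open("system.log") as log:
        for size_mb in largest_compacted_rows_mb(log):
            print(f"{size_mb:.1f} MB")
```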

Re: any ways to have compaction use less disk space?

2012-09-23 Thread Віталій Тимчишин
If you think about space, use Leveled compaction! This won't only allow you to fill more space, but also will shrink your data much faster in case of updates. Size compaction can give you 3x-4x more space used than there is live data. Consider the following (our simplified) scenario: 1) The data

Re: any ways to have compaction use less disk space?

2012-09-23 Thread Aaron Turner
On Sun, Sep 23, 2012 at 8:18 PM, Віталій Тимчишин tiv...@gmail.com wrote: If you think about space, use Leveled compaction! This won't only allow you to fill more space, but also will shrink your data much faster in case of updates. Size compaction can give you 3x-4x more space used than there

Re: CQL 2, CQL 3 and Thrift confusion

2012-09-23 Thread Sylvain Lebresne
In CQL3, names are case insensitive by default, while they were case sensitive in CQL2. You can force whatever case you want in CQL3 however using double quotes. So in other words, in CQL3, USE "TestKeyspace"; should work as expected. -- Sylvain On Sun, Sep 23, 2012 at 9:22 PM, Oleksandr Petrov
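The naming rule described above can be illustrated with a small sketch. It assumes that "case insensitive" means unquoted names are folded to lowercase, and the helper name is hypothetical; this is not how any Cassandra client actually parses identifiers.

```python
def cql3_identifier(name: str) -> str:
    """Resolve a CQL3 name: double-quoted keeps its case, unquoted folds to lowercase."""
    if len(name) >= 2 and name[0] == '"' and name[-1] == '"':
        # Quoted identifier: strip the quotes and preserve case exactly.
        return name[1:-1]
    # Unquoted identifier: treated as case insensitive (assumed lowercase folding).
    return name.lower()


if __name__ == "__main__":
    print(cql3_identifier('TestKeyspace'))
    print(cql3_identifier('"TestKeyspace"'))
```

Under this assumption, a keyspace created with a mixed-case name in CQL2 would only be reachable from CQL3 through the double-quoted form.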

Re: Disk configuration in new cluster node

2012-09-23 Thread Aaron Turner
On Fri, Sep 21, 2012 at 2:05 AM, aaron morton aa...@thelastpickle.com wrote: Would it help if I partitioned the computing resources of my physical machines into VMs? No. Just like cutting a cake into smaller pieces does not mean you can eat more without getting fat. In the general case,

Re: Varchar indexed column and IN(...)

2012-09-23 Thread aaron morton
If this is intended behavior, could somebody please point me to where this is documented? It is intended. The docs don't make it totally clear though: clause syntax is: <primary key name> { = | < | > | <= | >= } <key_value> <primary key name> IN ( <key_value> [,...] )

Re: Correct model

2012-09-23 Thread aaron morton
Yup. (Multi get is just a convenience method, it explodes into multiple gets on the server side. ) Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 24/09/2012, at 5:01 AM, Hiller, Dean dean.hil...@nrel.gov wrote: But the only advantage

Re: Cassandra Messages Dropped

2012-09-23 Thread aaron morton
To put it in other words, Cassandra will lock down all tables until all pending flush requests fit in the pending queue. This was the first issue I looked at in my Cassandra SF talk http://www.datastax.com/events/cassandrasummit2012/presentations I've seen it occur more often with

Re: Cassandra Messages Dropped

2012-09-23 Thread Michael Theroux
Love the Mars lander analogies :) On Sep 23, 2012, at 5:39 PM, aaron morton wrote: To put it in other words, Cassandra will lock down all tables until all pending flush requests fit in the pending queue. This was the first issue I looked at in my Cassandra SF talk

Re: Cassandra simulator

2012-09-23 Thread Tyler Hobbs
You might find these two projects useful: - ccm, which makes it easy to run a cluster on a single machine: https://github.com/pcmanus/ccm - Cassanova, which supports a large portion of the Thrift API with a lightweight python process: https://github.com/riptano/Cassanova On Sun, Sep 23, 2012 at

Re: [problem with OOM in nodes]

2012-09-23 Thread Denis Gabaydulin
On Sun, Sep 23, 2012 at 10:41 PM, aaron morton aa...@thelastpickle.com wrote: /var/log/cassandra$ cat system.log | grep "Compacting large" | grep -E "[0-9]+ bytes" -o | cut -d " " -f 1 | awk '{ foo = $1 / 1024 / 1024 ; print foo "MB" }' | sort -nr | head -n 50 Is it a bad signal? Sorry, I do not