Re: Commit log + Data directory on same partition (software raid)

2012-08-11 Thread Thibaut Britz
Unfortunately, SSDs are not an option at the moment; I have to use 2 regular HDDs. Has anyone tried the above scenario? Thanks, Thibaut On Fri, Aug 10, 2012 at 3:30 PM, Radim Kolar h...@filez.com wrote: I was thinking about putting both the commit log and the data directory on a software RAID

Query for last (composite) columns

2012-08-11 Thread Ersin Er
Hi, I am new to Cassandra and trying to understand whether it's a good fit for my problems. So here is a case from my domain: Assume that we're storing session events of users in composite columns within a column family partitioned by user id. This is from an example given about composite

Re: Commit log + Data directory on same partition (software raid)

2012-08-11 Thread Tom Duffield
Having both the commit log and the data directory on the same volume is generally not recommended. You would actually see a performance decrease unless most of your reads are cache hits. On Friday, August 10, 2012, Thibaut Britz wrote: Hi, Has anyone of you made some experience with

Cassandra OOM crash while mapping commitlog

2012-08-11 Thread Robin Verlangen
Hi there, I currently see Cassandra crash every couple of days. I run a 3-node cluster on version 1.1.2. Does anyone have a clue why it crashes? I couldn't find a fix for it in a newer release. Is this an actual bug or did I do something wrong? Thank you in advance for your time. Last 100 log

Re: quick question about data layout on disk

2012-08-11 Thread Aaron Turner
So how does that work? An sstable is for a single CF, but it can and likely will have multiple rows. There is no read before a write and, as I understand it, writes are append operations. So if you have an sstable with, say, 26 different rows (A-Z) already in it with a bunch of columns and you add a new

Re: quick question about data layout on disk

2012-08-11 Thread Edward Capriolo
Aaron, I have not looked closely at the data files in a while, but this is how I understand it. http://wiki.apache.org/cassandra/ArchitectureSSTable There is no need to store the row key each time with the column. Row key to columns is a one-to-many relationship. This would be a diagram of a physical

Re: quick question about data layout on disk

2012-08-11 Thread Aaron Turner
Thanks Russell, that's the info I was looking for! On Sat, Aug 11, 2012 at 11:23 AM, Russell Haering russellhaer...@gmail.com wrote: Your update doesn't go directly to an sstable (which are immutable), it is first merged to an in-memory table. Eventually the memtable is flushed to a new
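Russell's explanation of the write path (update merged into an in-memory memtable, memtable later flushed to a new immutable sstable) can be modelled in a few lines. This is a toy sketch under stated assumptions: the class name ToyLSM, the flush threshold counted in cells, and the dict-based "sstables" are all hypothetical stand-ins, not Cassandra's real structures or sizing rules. It shows why no read is needed before a write, and why a read has to merge every sstable plus the memtable, keeping the newest timestamp.

```python
from typing import Dict, List, Optional, Tuple

Cell = Tuple[int, str]               # (timestamp, value)
Table = Dict[Tuple[str, str], Cell]  # (row key, column name) -> cell


class ToyLSM:
    """Hypothetical toy model of the memtable/sstable write path."""

    def __init__(self, flush_threshold: int = 3) -> None:
        # Hypothetical: real memtables are flushed by size in MB, not by cell count.
        self.flush_threshold = flush_threshold
        self.memtable: Table = {}
        self.sstables: List[Table] = []  # by convention, never modified after a flush

    def write(self, row: str, column: str, value: str, timestamp: int) -> None:
        key = (row, column)
        current = self.memtable.get(key)
        # Merge in memory only: nothing on disk is read or rewritten.
        if current is None or timestamp >= current[0]:
            self.memtable[key] = (timestamp, value)
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        # A sorted dict copy stands in for a new, sorted sstable file.
        if self.memtable:
            self.sstables.append(dict(sorted(self.memtable.items())))
            self.memtable = {}

    def read(self, row: str, column: str) -> Optional[str]:
        # Check every sstable (oldest first) and then the memtable;
        # the cell with the newest timestamp wins.
        best: Optional[Cell] = None
        for table in self.sstables + [self.memtable]:
            cell = table.get((row, column))
            if cell is not None and (best is None or cell[0] >= best[0]):
                best = cell
        return None if best is None else best[1]
```

Writing "A" twice leaves the old value untouched in the flushed sstable; the newer value simply shadows it on read until compaction merges the files.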

Re: anyone have any performance numbers? and here are some perf numbers of my own...

2012-08-11 Thread Tyler Hobbs
One node can typically handle 30k+ inserts per second, so you should be able to insert the 9 million rows in about 5 minutes with a single node cluster. My guess is that you're inserting with a single thread, which means you're bound by network latency. Try using 100 threads, or better, just use
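The arithmetic in this reply, and why a single-threaded client is latency-bound, can be checked with a quick sketch. The 30k inserts/s and 9 million rows come from the message; the 1 ms round trip and the fake_insert function are hypothetical stand-ins for a real client call. With one request in flight per thread, throughput is roughly threads divided by round-trip time, until the server itself becomes the bottleneck (which this model ignores).

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable


def load_time_seconds(rows: int, inserts_per_second: float) -> float:
    return rows / inserts_per_second


def latency_bound_rate(threads: int, round_trip_seconds: float) -> float:
    # Each thread waits for one round trip per insert.
    return threads / round_trip_seconds


def fake_insert(row_id: int) -> int:
    time.sleep(0.001)  # hypothetical ~1 ms network round trip
    return row_id


def insert_all(row_ids: Iterable[int], threads: int) -> int:
    # Many concurrent synchronous inserts hide the per-request latency.
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return sum(1 for _ in pool.map(fake_insert, row_ids))


print(load_time_seconds(9_000_000, 30_000))  # 300.0 s, i.e. about 5 minutes
print(load_time_seconds(9_000_000, latency_bound_rate(1, 0.001)))  # 9000.0 s with one thread
```

At an assumed 1 ms per round trip, one thread tops out near 1,000 inserts/s, so the same load takes about 2.5 hours instead of 5 minutes.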

Re: Cassandra OOM crash while mapping commitlog

2012-08-11 Thread Tyler Hobbs
We've seen something similar when running on a 32-bit JVM, so make sure you're using the latest 64-bit Java 6 JVM. On Sat, Aug 11, 2012 at 11:59 AM, Robin Verlangen ro...@us2.nl wrote: Hi there, I currently see Cassandra crash every couple of days. I run a 3 node cluster on version 1.1.2. Does
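A plausible reason a 32-bit JVM matters here is that the crash happens while mapping commitlog segments, and memory-mapped files consume virtual address space, which is only a few GB in a 32-bit process. The helper below is a hedged sketch for checking which JVM is on the PATH. Assumption: HotSpot's java -version banner (printed on stderr) contains "64-Bit" on a 64-bit VM, e.g. "Java HotSpot(TM) 64-Bit Server VM"; other vendors may word it differently, and the function names are hypothetical.

```python
import shutil
import subprocess


def is_64bit_jvm_banner(version_output: str) -> bool:
    # Assumption: 64-bit HotSpot banners contain "64-Bit".
    return "64-bit" in version_output.lower()


def local_java_banner() -> str:
    java = shutil.which("java")
    if java is None:
        return ""  # no java on PATH
    # java -version writes its banner to stderr, not stdout.
    result = subprocess.run([java, "-version"], capture_output=True, text=True)
    return result.stderr
```

Run is_64bit_jvm_banner(local_java_banner()) on each node, using the same java binary the Cassandra init script starts.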

Re: Problem with version 1.1.3

2012-08-11 Thread Tyler Hobbs
On Fri, Aug 10, 2012 at 4:29 PM, Dwight Smith dwight.sm...@genesyslab.com wrote: Further info – it seems I had the seeds list backwards – it did not need both nodes – I have corrected that with each pointing to the other as a single seed entry – and it works fine. This might have worked by

Re: Project Management

2012-08-11 Thread Tyler Hobbs
On Tue, Aug 7, 2012 at 2:32 AM, Baskar Sikkayan baskar@gmail.com wrote: If I create one more column family based on my query instead of going with a secondary index, will it affect the write performance? It won't affect writes much more than the built-in secondary indexes would, and you'll
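To make the write cost of a query-specific column family concrete, here is a minimal sketch with plain dicts standing in for two column families. All names (users, users_by_city, the "city" column, save_user) are hypothetical. Every save writes to both column families, and when the indexed value changes the application also needs the old value so it can delete the stale index entry.

```python
from typing import Dict, Set

users: Dict[str, Dict[str, str]] = {}    # hypothetical CF: user id -> columns
users_by_city: Dict[str, Set[str]] = {}  # hypothetical CF: city -> user ids (as column names)


def save_user(user_id: str, city: str) -> None:
    old_city = users.get(user_id, {}).get("city")
    if old_city is not None and old_city != city:
        # Remove the stale entry from the hand-maintained index CF.
        users_by_city[old_city].discard(user_id)
    users.setdefault(user_id, {})["city"] = city          # write 1: base CF
    users_by_city.setdefault(city, set()).add(user_id)    # write 2: query CF
```

Queries by city then become a single-row read of users_by_city instead of a secondary-index lookup.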

Re: Assume Keys in cqlsh?

2012-08-11 Thread Tyler Hobbs
As far as I know, assume isn't a CQL feature, it's only part of cassandra-cli. On Tue, Aug 7, 2012 at 10:16 PM, Jason Hill jasonhill...@gmail.com wrote: Hello, I'm using: [cqlsh 2.0.0 | Cassandra 1.0.10 | CQL spec 2.0.0 | Thrift protocol 19.20.0] I have a column family with a key that is

Re: Syncing nodes + Cassandra Data Availability

2012-08-11 Thread Tyler Hobbs
On Wed, Aug 8, 2012 at 8:58 PM, Ben Kaehne ben.kae...@sirca.org.au wrote: Our application runs on a 3-node Cassandra cluster with RF of 3. We use quorum operations against this cluster in hopes of guaranteeing consistency. One scenario in which an issue can occur here is: Out of our 3
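The reasoning behind "RF 3 plus QUORUM reads and writes" is simple overlap arithmetic: QUORUM is a majority of replicas, and when the replicas read (R) plus the replicas written (W) exceed RF, every read touches at least one replica that acknowledged the latest successful write. A caveat worth keeping in mind: this covers writes that succeeded at QUORUM; a write that times out may still have been applied on fewer replicas. The function names below are illustrative.

```python
def quorum(replication_factor: int) -> int:
    # A majority of the replicas.
    return replication_factor // 2 + 1


def read_sees_latest_write(read_replicas: int, write_replicas: int, rf: int) -> bool:
    # Read and write replica sets must overlap in at least one node.
    return read_replicas + write_replicas > rf


rf = 3
q = quorum(rf)
print(q, read_sees_latest_write(q, q, rf))  # 2 True
print(rf - q)  # 1: nodes that can be down while QUORUM still succeeds
```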

Re: cassandra unable to start after upgrading to 1.1

2012-08-11 Thread Tyler Hobbs
Usually when you're using the packaged installations, you want to start Cassandra with: sudo service cassandra start On Thu, Aug 9, 2012 at 4:18 AM, Ahmed Ababne ahmedabab...@yahoo.com wrote: Hi I am running Ubuntu 12.04, and had the Cassandra Ubuntu packaged installation. I have just upgraded

Re: Key order check in sstable2json

2012-08-11 Thread Tyler Hobbs
Sounds like bad behavior. Can you open a JIRA ticket for that (once JIRA is back up :)? On Thu, Aug 9, 2012 at 9:14 AM, Mat Brown m...@brewster.com wrote: Hello, We've noticed that when passing multiple -k arguments to the sstable2json utility, we pretty much always get an IOException with

Re: Cassandra commitlog directory size increase on every restart - Cassandra 1.1.0

2012-08-11 Thread Tyler Hobbs
There have been some commitlog-related fixes in later versions of 1.1, so it's worth trying an upgrade. If that doesn't resolve the issue, open a JIRA ticket with these details. On Thu, Aug 9, 2012 at 9:15 AM, Kasun Weranga kas...@wso2.com wrote: Any idea on how to fix this? Thanks, Kasun

Re: Thrift batch_mutate erase previous data?

2012-08-11 Thread Tyler Hobbs
On Thu, Aug 9, 2012 at 10:43 AM, Cyril Auburtin cyril.aubur...@gmail.com wrote: It seems the Thrift method *batch_mutate*, with Mutations, will not update the previous data with the given mutation, but will clear it and replace it? Right? I'm not sure what you're asking. Writes in Cassandra are
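As background for the question, a sketch of column-level last-write-wins merging, assuming the usual Cassandra semantics: a mutation only touches the columns it names, other columns already in the row are kept, and for each column the higher timestamp wins. Ties here simply go to the incoming write, which is a simplification; Cassandra has its own tie-break rule. The function name is hypothetical.

```python
from typing import Dict, Tuple

Row = Dict[str, Tuple[int, str]]  # column name -> (timestamp, value)


def apply_mutation(row: Row, mutation: Row) -> Row:
    merged = dict(row)  # columns not named in the mutation are left as they are
    for column, (ts, value) in mutation.items():
        existing = merged.get(column)
        # Last write wins per column (ties favour the incoming write in this sketch).
        if existing is None or ts >= existing[0]:
            merged[column] = (ts, value)
    return merged
```

Under this model a batch that sets only "name" leaves "email" in place; nothing is cleared unless a deletion is sent explicitly.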

Re: Physical storage of rowkey

2012-08-11 Thread Tyler Hobbs
Yes, if you're using RandomPartitioner. The hash is MD5. On Thu, Aug 9, 2012 at 1:29 PM, A J s5a...@gmail.com wrote: Are row keys hashed before being physically stored in Cassandra? If so, what hash function is used to keep collisions to a minimum? Thanks. -- Tyler Hobbs DataStax
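A sketch of how RandomPartitioner derives a token from a row key, as I understand Cassandra 1.x: take the MD5 digest of the key bytes, read it as a signed 128-bit big-endian integer (as Java's BigInteger does), and use its absolute value. The token decides which nodes own the row and the order rows are sorted in; as far as I know the raw key is still stored alongside it. Treat this as an illustration, not a reference implementation.

```python
import hashlib


def random_partitioner_token(row_key: bytes) -> int:
    digest = hashlib.md5(row_key).digest()  # 16 bytes
    # Signed big-endian interpretation, then absolute value.
    return abs(int.from_bytes(digest, byteorder="big", signed=True))


tokens = {k: random_partitioner_token(k) for k in (b"user1", b"user2", b"user3")}
# Tokens fall in [0, 2**127] and bear no relation to the keys' sort order.
```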

Re: triggering the assertion at the start of ColumnFamilyStore.getRangeSlice

2012-08-11 Thread Tyler Hobbs
You can use something like the maven shade plugin to use both of the libthrift jars. On Thu, Aug 9, 2012 at 3:57 PM, Jose Flexa jose.fl...@gmail.com wrote: Hi. I've avoided the issue by disabling assertions (-da). Any suggestions on a better strategy? Thanks José On Thu, Aug 9, 2012 at

Re: problem of inserting columns of a great amount

2012-08-11 Thread Tyler Hobbs
There is a fair amount of overhead in the Thrift structures for columns and mutations, so that's a pretty large mutation. In general, you'll see better performance inserting many small batch mutations in parallel. On Fri, Aug 10, 2012 at 2:04 AM, Jin Lei jehovah.l...@gmail.com wrote: Sorry,
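The advice to send many small batch mutations in parallel instead of one huge one can be sketched as chunking plus a thread pool. Assumptions: send_batch is a hypothetical stand-in for a real client call such as Thrift batch_mutate, and the batch size of 100 and 8 threads are arbitrary example values, not tuned recommendations.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Iterator, List, Sequence, Tuple

Column = Tuple[str, str]  # (name, value)


def chunked(columns: Sequence[Column], size: int) -> Iterator[List[Column]]:
    # Yield consecutive slices of at most `size` columns.
    for start in range(0, len(columns), size):
        yield list(columns[start:start + size])


def send_batch(batch: List[Column]) -> int:
    # Hypothetical client call; here it just reports how many columns it carried.
    return len(batch)


def insert_columns(columns: Sequence[Column], batch_size: int = 100, threads: int = 8) -> int:
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return sum(pool.map(send_batch, chunked(columns, batch_size)))
```

Smaller batches keep each Thrift message modest in size and let several requests be in flight at once.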

Re: Question regarding tombstone removal and compaction

2012-08-11 Thread Tyler Hobbs
On Fri, Aug 10, 2012 at 5:54 AM, Fredrik fredrik.l.stigb...@sitevision.se wrote: We've had a bug that caused one of our column families to grow very big: 280 GB on a 500 GB disk. We're using size-tiered compaction. Since it's append-only data, I've now issued deletes of 260 GB of superfluous
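For context on when those deletes actually free space: a delete writes a tombstone, and compaction can only drop it once gc_grace_seconds (default 864000 s, i.e. 10 days) has passed, and, in this simplified model, only when the compaction includes every sstable that may still hold older data for that row. With size-tiered compaction, the large old sstables may not be compacted together with the tombstones for a long time. The function below is a hypothetical illustration of that rule, not Cassandra's code.

```python
GC_GRACE_SECONDS = 864_000  # Cassandra's default: 10 days


def tombstone_purgeable(deleted_at: int, now: int,
                        gc_grace: int = GC_GRACE_SECONDS,
                        covers_all_sstables_for_row: bool = True) -> bool:
    # Simplified: grace period must have elapsed, and no sstable outside this
    # compaction may still contain data the tombstone is shadowing.
    return now - deleted_at > gc_grace and covers_all_sstables_for_row
```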

Re: Node doesn't rejoin ring after restart

2012-08-11 Thread Tyler Hobbs
Make sure that your seed list is the same for every node. Just pick two of the three nodes and use those as the seeds everywhere. If that's not the issue, check your cassandra log to see if there are any exceptions during startup. On Fri, Aug 3, 2012 at 5:25 PM, Edward Sargisson
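The first check in this reply (identical seed list on every node, two of the three nodes as seeds) is easy to automate. In cassandra.yaml for 1.x the list is the comma-separated seeds parameter under seed_provider; the sketch below assumes you have already collected those lists per node. The node addresses and the function name are hypothetical.

```python
from typing import Dict, List


def seed_problems(seeds_by_node: Dict[str, List[str]]) -> List[str]:
    problems: List[str] = []
    distinct = {tuple(sorted(seeds)) for seeds in seeds_by_node.values()}
    if len(distinct) > 1:
        problems.append("seed lists differ between nodes")
    for node, seeds in seeds_by_node.items():
        unknown = [s for s in seeds if s not in seeds_by_node]
        if unknown:
            problems.append(f"{node} lists seeds outside the cluster: {unknown}")
    return problems


nodes = ("10.0.0.1", "10.0.0.2", "10.0.0.3")
good = {n: ["10.0.0.1", "10.0.0.2"] for n in nodes}
bad = {"10.0.0.1": ["10.0.0.2"], "10.0.0.2": ["10.0.0.1"], "10.0.0.3": ["10.0.0.1"]}
print(seed_problems(good))  # []
print(seed_problems(bad))   # ['seed lists differ between nodes']
```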