Re: Strategy to delete/expire keys in cassandra

2010-02-25 Thread Sylvain Lebresne
Hi, Should I just run a command (in the Cassandra 0.5 source folder?) like: patch -p1 -i 0001-Add-new-ExpiringColumn-class.patch for all of the five patches in your ticket? Well, actually I lied. The patches were made for a version a little after 0.5. If you really want to try, I attach a version

Re: full text search

2010-02-25 Thread Hernan Badenes
My (very brief) testing of Lucandra over Cassandra 0.5 showed that it uses a different row for every term, requiring a large number of insert() calls per document added. This was way too slow for my purposes. Do you know if anything has changed? (the schema, or if it uses some newer api

Re: full text search

2010-02-25 Thread Jared winick
It looks like this will change with Cassandra 0.6 and the addition of the batch_mutate method. From http://blog.sematext.com/2010/02/09/lucandra-a-cassandra-based-lucene-backend/ For writes Lucandra is comparatively
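To illustrate why batching helps here, the following is a minimal sketch, not the Thrift batch_mutate API and not Lucandra's code. It assumes a schema with one row per term and one column per document (the column layout is an assumption; the thread only says each term gets its own row). The class BatchingSketch and the method buildBatch are hypothetical names. The point is that all mutations for a document are grouped into one structure that could be sent in a single call, instead of one insert() call per term.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: groups per-term writes into one batch.
final class BatchingSketch {

    // Builds one mutation map: row key (term) -> column name (doc id) -> value.
    // With per-term insert() calls, a document with N terms costs N round trips;
    // a batch like this could be submitted in a single round trip.
    static Map<String, Map<String, byte[]>> buildBatch(String docId, List<String> terms) {
        Map<String, Map<String, byte[]>> batch = new LinkedHashMap<>();
        for (String term : terms) {
            // Repeated terms collapse into the same row entry.
            batch.computeIfAbsent(term, k -> new LinkedHashMap<>())
                 .put(docId, new byte[0]);
        }
        return batch;
    }

    public static void main(String[] args) {
        List<String> terms = List.of("cassandra", "lucene", "cassandra");
        Map<String, Map<String, byte[]>> batch = buildBatch("doc1", terms);
        System.out.println("insert() calls without batching: " + terms.size());
        System.out.println("rows touched by one batch: " + batch.size());
    }
}
```

The actual batch_mutate signature and its mutation types should be taken from the Cassandra 0.6 Thrift interface rather than from this sketch.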

Re: cassandra freezes

2010-02-25 Thread Jonathan Ellis
The only kind of freeze that makes sense there is your reads are i/o bound and the extra disk activity is killing you. In that case the fix is to add more RAM, or give less to the JVM so the OS can use more for buffer cache. On Thu, Feb 25, 2010 at 8:01 AM, Boris Shulman shulm...@gmail.com

Re: cassandra freezes

2010-02-25 Thread Jonathan Ellis
Then you should check GC timing with -Xverbose:gc option (see: http://wiki.apache.org/cassandra/RunningCassandra for how to modify jvm options) for a correlation. On Thu, Feb 25, 2010 at 8:09 AM, Boris Shulman shulm...@gmail.com wrote: In these tests I perform only write operations, no reads.

Re: cassandra freezes

2010-02-25 Thread Boris Shulman
I don't think it is a GC-related issue. There is no correlation between GC times and the freeze times. Moreover, I don't see any GC activity that lasts for more than 0.03 sec. But there is a correlation with disk flushing operations. I've noticed that the system freezes each time when my commit

Re: cassandra freezes

2010-02-25 Thread Ted Zlatanov
On Thu, 25 Feb 2010 08:56:25 -0600 Jonathan Ellis jbel...@gmail.com wrote: JE Are you swapping? JE http://spyced.blogspot.com/2010/01/linux-performance-basics.html JE otherwise there's something wrong w/ your vm (?), disk i/o doesn't JE block incoming writes in cassandra If the user has enough

Attach a binary stream

2010-02-25 Thread Charles Moulliard
Hi, Is it possible to attach a binary stream to a Cassandra DB? That is, is it possible to add a String containing a message, or a serialized Java object? Kind regards, Charles Moulliard Senior Enterprise Architect Apache Camel Committer * blog :

Multiple Data Directories

2010-02-25 Thread Anthony Molinaro
Hi, So is there any way to force distribution among DataFileDirectory entries when you add a new one? Looking at the nodeprobe operations, it seems like repair, which causes a major compaction, might do it? I've tried shutting a node down, moving files around by hand, and starting up, but the next

Re: Multiple Data Directories

2010-02-25 Thread Gary Dusbabek
Cassandra always compacts to the directory with the most free space. There is no way to influence this. Gary On Thu, Feb 25, 2010 at 13:23, Anthony Molinaro antho...@alumni.caltech.edu wrote: Hi, So is there any way to force distribution among DataFileDirectory entries when you add a new
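The selection rule described in this reply can be sketched as follows. This is an illustration of "pick the directory with the most free space", not Cassandra's actual implementation; the class DataDirectoryPicker and its methods are hypothetical names, and using File.getUsableSpace() as the measure of free space is an assumption.

```java
import java.io.File;
import java.util.List;
import java.util.function.ToLongFunction;

// Hypothetical illustration of choosing a compaction target directory.
final class DataDirectoryPicker {

    // Returns the directory reporting the most free bytes. The free-space
    // lookup is passed in so the rule can be exercised without real disks.
    static File pickMostFree(List<File> dirs, ToLongFunction<File> freeBytes) {
        if (dirs.isEmpty()) {
            throw new IllegalArgumentException("no data directories configured");
        }
        File best = dirs.get(0);
        for (File dir : dirs) {
            if (freeBytes.applyAsLong(dir) > freeBytes.applyAsLong(best)) {
                best = dir;
            }
        }
        return best;
    }

    // Convenience overload that asks the file system (assumed measure).
    static File pickMostFree(List<File> dirs) {
        return pickMostFree(dirs, File::getUsableSpace);
    }

    public static void main(String[] args) {
        List<File> dirs = List.of(new File("."), new File(System.getProperty("java.io.tmpdir")));
        System.out.println("would compact to: " + pickMostFree(dirs));
    }
}
```

Under a rule like this, a newly added and therefore emptier or larger directory keeps winning until the free space evens out, which matches the behavior reported later in the thread.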

Re: Multiple Data Directories

2010-02-25 Thread Jonathan Ellis
Compaction is why http://wiki.apache.org/cassandra/CassandraHardware recommends raid0-ing if you are concerned about free disk space limits. On Thu, Feb 25, 2010 at 1:36 PM, Gary Dusbabek gdusba...@gmail.com wrote: Cassandra always compacts to the directory with the most free space. There is

Re: 3 node installation

2010-02-25 Thread Jonathan Ellis
How about the debug output? On Thu, Feb 25, 2010 at 12:03 PM, Masood Mortazavi masoodmortaz...@gmail.com wrote: All nodes always agree on the ring. In fact, nodeprobe -host name ring is probably one of commands and nodeprobe one of the most reliable tools in Cassandra, as far as I

Consistency Level of CLI

2010-02-25 Thread Masood Mortazavi
What is the write and read consistency level for the CLI tool cassandra-cli? Do the set and get commands in the CLI allow the Consistency Level to be specified for a given set or get? Is there a current specification of the CLI anywhere on the wiki? (How are JIRAs related to the CLI tagged in

Re: Attach a binary stream

2010-02-25 Thread Jonathan Ellis
Cassandra column values are byte arrays. Turning your Java object into a byte[] is your responsibility. :) On Thu, Feb 25, 2010 at 10:11 AM, Charles Moulliard cmoulli...@gmail.com wrote: Hi, Is it possible to attach a binary stream to a Cassandra DB? I would like to say is it possible to
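Since the conversion to byte[] is left to the client, here is a minimal sketch of two common ways to do it with the standard Java library: UTF-8 encoding for a String, and Java serialization for a Serializable object. The class ColumnValueCodec and its method names are hypothetical, and the sketch does not show the Cassandra client call that would store the resulting bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: converts values to and from column-value bytes.
final class ColumnValueCodec {

    // Encodes a message string as UTF-8 bytes.
    static byte[] fromText(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Decodes UTF-8 bytes back into a string.
    static String toText(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Serializes an object with standard Java serialization.
    static byte[] fromObject(Serializable obj) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Deserializes bytes produced by fromObject.
    static Object toObject(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("class of stored object not on classpath", e);
        }
    }

    public static void main(String[] args) {
        byte[] value = fromText("a message");
        System.out.println(toText(value));
    }
}
```

Java serialization ties the stored bytes to the class definition, so a language-neutral encoding may be a better fit if other clients need to read the same columns.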

Re: 3 node installation

2010-02-25 Thread Masood Mortazavi
All nodes always agree on the ring. In fact, nodeprobe -host name ring is probably one of commands and nodeprobe one of the most reliable tools in Cassandra, as far as I can tell. These are good suggestions. Thanks. (I don't know whether it is worth describing this in a JIRA as a bug. I

Re: Consistency Level of CLI

2010-02-25 Thread Jonathan Ellis
CLI uses CL.ONE for reads and writes. It has no user-level documentation other than its help output. On Thu, Feb 25, 2010 at 1:08 PM, Masood Mortazavi masoodmortaz...@gmail.com wrote: What is the write and read consistency level for the CLI tool cassandra-cli? Do the set and get

Re: Multiple Data Directories

2010-02-25 Thread Anthony Molinaro
Okay, so the disk sizing seems to make sense for what I am seeing: the disk which seems to get all the data is the largest. On the new machines, which have 3 disks of equal size, compaction seems to be distributing among the disks. Raid0 would sort of defeat the purpose of being able to add

Re: Multiple Data Directories

2010-02-25 Thread Jonathan Ellis
In the worst case, compaction combines them all into a single file anyway. So I think your approach is flawed. It's designed to allow adding capacity by adding nodes, not just by adding more space, or your cpu / ram ratio will degrade. On Thu, Feb 25, 2010 at 2:48 PM, Anthony Molinaro

Re: Multiple Data Directories

2010-02-25 Thread Anthony Molinaro
What about the case where cpu and ram are underutilized, and your bottleneck is disk io (which seems to often be the case in ec2), then adding more spindles improves overall throughput of the system. I've actually tested this when adding an additional ebs, and hand moving files around, then

Re: Bulk Ingestion Issues

2010-02-25 Thread Sonny Heer
On Wed, Feb 24, 2010 at 11:36 AM, Jonathan Ellis jbel...@gmail.com wrote: the exception is unrelated, it's from the network layer (and is gone in 0.6) Thanks. How is the bulk loader supposed to be set up? I start Cassandra using a given storage file with the local IP as the seed and thrift IP.

Re: Bulk Ingestion Issues

2010-02-25 Thread Sonny Heer
On Wed, Feb 24, 2010 at 11:36 AM, Jonathan Ellis jbel...@gmail.com wrote: the exception is unrelated, it's from the network layer (and is gone in 0.6) Any other ideas as to what could be causing this? I'm getting inconsistent results between ingests. The sendOneWay method is called a lot

Re: Would deleted columns slow down reads?

2010-02-25 Thread Jonathan Ellis
Yes, that's going to hurt forward scans with no start column. (Reverse scans, or scans that start with a known live column, will still be fast b/c of the per-row column indexes.) On Thu, Feb 25, 2010 at 8:56 PM, Edmond Lau edm...@ooyala.com wrote: Given that Cassandra needs to maintain
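The effect described in this reply can be modeled with a small sketch. It is a simplified model, not Cassandra's storage format: a row is a sorted map of column names, a null value stands for a deleted column (tombstone), and seeking to a start column stands in for the per-row column index. The class TombstoneScanSketch and its methods are hypothetical names.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// Simplified model of scanning a row that contains deleted columns.
final class TombstoneScanSketch {

    // Builds a row whose first `deleted` columns are tombstones (null values),
    // followed by `live` columns that still hold data.
    static NavigableMap<String, String> sampleRow(int deleted, int live) {
        NavigableMap<String, String> row = new TreeMap<>();
        for (int i = 0; i < deleted; i++) {
            row.put(String.format("c%03d", i), null);
        }
        for (int i = deleted; i < deleted + live; i++) {
            row.put(String.format("c%03d", i), "v");
        }
        return row;
    }

    // Counts how many columns must be examined to collect `limit` live ones.
    // A null `start` means "scan from the beginning (or end, if reversed)".
    static int columnsExamined(NavigableMap<String, String> row, String start,
                               int limit, boolean reversed) {
        NavigableMap<String, String> view = reversed ? row.descendingMap() : row;
        if (start != null) {
            view = view.tailMap(start, true); // seek directly to the start column
        }
        int examined = 0;
        int live = 0;
        for (String value : view.values()) {
            if (live >= limit) {
                break;
            }
            examined++;
            if (value != null) {
                live++;
            }
        }
        return examined;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> row = sampleRow(100, 5);
        System.out.println("forward, no start: " + columnsExamined(row, null, 1, false));
        System.out.println("forward, known live start: " + columnsExamined(row, "c100", 1, false));
        System.out.println("reverse, no start: " + columnsExamined(row, null, 1, true));
    }
}
```

In this model the forward scan with no start column has to step over every tombstone before reaching live data, while the other two scans begin next to a live column.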

Re: Multiple Data Directories

2010-02-25 Thread Jonathan Ellis
On Thu, Feb 25, 2010 at 3:54 PM, Anthony Molinaro antho...@alumni.caltech.edu wrote: What about the case where cpu and ram are underutilized, and your bottleneck is disk io (which seems to often be the case in ec2), then adding more spindles improves overall throughput of the system. I've

Re: 3 node installation

2010-02-25 Thread Masood Mortazavi
If I get to repeat it, I will certainly include standard output from the servers, assuming that's what you mean by the debug report. In the meantime, couldn't this behavior be caused by some bug in the CLI's default consistency level? (I've not checked the code in this case.) It would be good

Re: Would deleted columns slow down reads?

2010-02-25 Thread Edmond Lau
Thanks for the confirmation - that's what I suspected. Edmond On Thu, Feb 25, 2010 at 7:00 PM, Jonathan Ellis jbel...@gmail.com wrote: Yes, that's going to hurt forward scans with no start column. (Reverse scans, or scans that start with a known live column, will still be fast b/c of the