performance tuning - where does the slowness come from?

2010-05-04 Thread Ran Tavory
I'm looking into performance issues on a 0.6.1 cluster. I see two symptoms: 1. Reads and writes are slow. 2. One of the hosts is doing a lot of GC. Symptom 1 is slow in the sense that in its normal state the cluster used to handle around 3-5k reads and 3-5k writes per second (6-10k operations per second), but how

Re: Cassandra and Request routing

2010-05-04 Thread Jonathan Shook
I think you may have found the "eventually" in "eventually consistent". With a replication factor of 1, you are allowing the client thread to continue to the read on node 2 before the write is replicated to node 2. Try setting your replication factor higher for different results. Jonathan On Tue, May 4, 2010 at

Re: Cassandra and Request routing

2010-05-04 Thread Olivier Mallassi
:) I think this is simpler and I am just stupid. I retried with clean data and commit log directories and everything works well. I must have missed something (maybe when I upgraded from 0.5.1 to 0.6) but anyway, I am just testing. On Tue, May 4, 2010 at 8:47 AM, Jonathan Shook

Re: How do you, Bloom filter of the false positive rate or remove the problem of distributed databases?

2010-05-04 Thread vineet daniel
Only major compactions can clean out obsolete tombstones. On Tue, May 4, 2010 at 9:59 AM, Jonathan Ellis jbel...@gmail.com wrote: On Mon, May 3, 2010 at 8:45 PM, Kauzki Aranami kazuki.aran...@gmail.com wrote: Let me rephrase my question. How does Cassandra deal with bloom filter's false

Re: Design Query

2010-05-04 Thread vineet daniel
As you haven't specified all the details pertaining to filters and your data layout (structure), what I can suggest at a very high level is that you create a separate CF for each filter. On Sat, May 1, 2010 at 5:04 PM, Rakesh Rajan rakes...@gmail.com wrote: I am evaluating cassandra to

Re: Trove maps

2010-05-04 Thread Jeff Hammerbacher
Hey, History repeating itself a bit, here: one delay in getting Cassandra into the open source world was removing its use of the Trove collections library, as the license (LGPL) is not compatible with the Apache 2.0 license. Later, Jeff On Sat, Apr 24, 2010 at 11:28 PM, Tatu Saloranta

Re: performance tuning - where does the slowness come from?

2010-05-04 Thread Jordan Pittier
I'm facing the same issue with swap. It only occurs when I perform read operations (writes are very fast :)). So I can't help you with the memory problem. But to balance the load evenly between nodes in the cluster, just set their tokens manually (the formula is i * 2^127 / nb_nodes). Jordan On Tue,
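The token formula quoted in the message above can be sketched in a few lines of Python. This is only an illustration: the function name `balanced_tokens` is hypothetical, and it assumes that i runs from 0 to nb_nodes - 1 and that integer division is acceptable, neither of which the message states.

```python
def balanced_tokens(nb_nodes):
    """Return one evenly spaced token per node using i * 2^127 / nb_nodes.

    Assumption: i ranges over 0 .. nb_nodes - 1 (not stated in the thread).
    """
    if nb_nodes <= 0:
        raise ValueError("nb_nodes must be a positive integer")
    # Integer arithmetic keeps the large token values exact.
    return [i * 2**127 // nb_nodes for i in range(nb_nodes)]


if __name__ == "__main__":
    for node, token in enumerate(balanced_tokens(4)):
        print(f"node {node}: {token}")
```

Each value would then be assigned by hand to one node, as the message suggests.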

how to fetch latest data

2010-05-04 Thread vineet daniel
Hi, In a Cassandra cluster, if we update a key/value and then run a fetch query on that same key, we get old/stale data. This can be because of Read Repair. Is there any way to fetch the latest updated data from the cluster, as old data has no significance and showing it to client

Re: how to fetch latest data

2010-05-04 Thread vineet daniel
If R + W > N, where R, W, and N are respectively the read replica count, the write replica count, and the replication factor, all client reads will see the most recent write. On Tue, May 4, 2010 at 4:39 PM, vineet daniel vineetdan...@gmail.com wrote: Hi In a cluster of cassandra if we are
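As a rough illustration of the rule in the reply above (reads see the latest write when R + W > N), the check can be written as below. The helper names `reads_see_latest_write` and `quorum` are hypothetical and are not part of any Cassandra API; the quorum size of N // 2 + 1 is the usual majority definition and is an assumption here.

```python
def quorum(n):
    """Majority of n replicas (assumed definition: n // 2 + 1)."""
    return n // 2 + 1


def reads_see_latest_write(r, w, n):
    """True when the read and write replica sets must overlap (R + W > N)."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("r and w must each be between 1 and n")
    return r + w > n


if __name__ == "__main__":
    n = 3
    print(reads_see_latest_write(1, 1, n))                  # one replica each
    print(reads_see_latest_write(quorum(n), quorum(n), n))  # quorum reads and writes
```

With N = 3, reading and writing at one replica each does not satisfy the rule, while quorum reads combined with quorum writes (2 + 2 > 3) does.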

Re: Trove maps

2010-05-04 Thread Boris Shulman
LGPL is listed as one of the forbidden licenses for Apache projects (see Excluded Licenses in http://www.apache.org/legal/3party.html)... On Tue, May 4, 2010 at 12:34 PM, Jeff Hammerbacher ham...@cloudera.com wrote: Hey, History repeating itself a bit, here: one delay in getting Cassandra

Cassandra 0.6.1 - Help Required to setup Multiple Nodes/Cluster

2010-05-04 Thread Mohammad Mamajiwala
Hi, I am very new to Cassandra 0.6.1. I have set up two nodes on two different servers. I would like to know how data distribution and replication work. Node 1 IP: 43.193.211.215 Node 2 IP: 43.193.213.160 Node 1: Configuration <Seeds> <Seed>43.193.211.215</Seed> </Seeds> Node 2: Configuration <Seeds>

Re: Cassandra 0.6.1 - Help Required to setup Multiple Nodes/Cluster

2010-05-04 Thread Shinpei Ohtani
All other parameters are identical on both servers. I have added some data from both nodes, but I am confused about which node stores the data. Is it stored on both nodes, or only on the node from which it was added? I can retrieve data from both nodes, but sometimes I cannot. Not sure

Re: Cassandra and Request routing

2010-05-04 Thread Jonathan Shook
I may be wrong here. Someone please correct me if I am. There may be a race condition if you aren't increasing your replication factor. If you insert to node A with replication factor 1, and then get from node B with replication factor 1, it should be possible (and even more likely in uneven

Re: Cassandra 0.6.1 - Help Required to setup Multiple Nodes/Cluster

2010-05-04 Thread Mohammad Mamajiwala
Thanks for the prompt reply. As per your reply, my configuration should be like this: Node 1: Configuration <Seeds> <Seed>43.193.211.215</Seed> <Seed>43.193.213.160</Seed> </Seeds> Node 2: Configuration <Seeds> <Seed>43.193.211.215</Seed> <Seed>43.193.213.160</Seed> </Seeds> About replication - In my case

Re: performance tuning - where does the slowness come from?

2010-05-04 Thread Schubert Zhang
1. When you initially start up your nodes, plan the InitialToken of each node so that the tokens are evenly spaced. 2. Set <DiskAccessMode>standard</DiskAccessMode> On Tue, May 4, 2010 at 9:09 PM, Boris Shulman shulm...@gmail.com wrote: I think that the extra (more than 4GB) memory usage comes from the mmapped I/O, that is why it

Re: performance tuning - where does the slowness come from?

2010-05-04 Thread Ran Tavory
I canceled mmap and indeed memory usage is sane again. So far performance hasn't been great, but I'll wait and see. I'm also interested in a way to cap mmap so I can take advantage of it but not swap the host to death... On Tue, May 4, 2010 at 9:38 PM, Kyusik Chung kyu...@discovereads.com wrote:

Re: performance tuning - where does the slowness come from?

2010-05-04 Thread Nathan McCall
You could try mmap_index_only - this would restrict mmap usage to the index files. -Nate On Tue, May 4, 2010 at 11:57 AM, Ran Tavory ran...@gmail.com wrote: I canceled mmap and indeed memory usage is sane again. So far performance hasn't been great, but I'll wait and see. I'm also interested

Re: performance tuning - where does the slowness come from?

2010-05-04 Thread Jonathan Ellis
Are you using 32-bit hosts? If not, don't be scared of mmap using a lot of address space; you have plenty. It won't make you swap more than using buffered I/O. On Tue, May 4, 2010 at 1:57 PM, Ran Tavory ran...@gmail.com wrote: I canceled mmap and indeed memory usage is sane again. So far

Re: performance tuning - where does the slowness come from?

2010-05-04 Thread Ran Tavory
It's a 64-bit host. When I cancel mmap I see less memory used and zero swapping, but it's slowly growing so I'll have to wait and see. Performance isn't much better; not sure what the bottleneck is now (could also be the application). Now on the same host I see: top - 15:43:59 up 12 days, 4:23, 1

Getting all the keys from a ColumnFamily ?

2010-05-04 Thread Chris Dean
I have a ColumnFamily with a small number of keys, but each key has a large number of columns. What's the best way to get just the keys back? I don't want to load all the columns if I don't have to. There also aren't necessarily any column names in common between the different rows. Cheers,

BloomFilter is taking too much memory

2010-05-04 Thread Weijun Li
Hello, We stored about 47mil keys in one Cassandra node, and a memory dump shows the following for one of the SSTableReaders: SSTableReader: 386MB. Of this 386MB, IndexSummary takes about 231MB but BloomFilter takes 155MB with an embedded huge array long[19.4mil]. It seems that BloomFilter is
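The 155MB figure in the message above is consistent with the size of the array it mentions, as this small arithmetic sketch shows. It assumes 8 bytes per Java `long` and ignores object headers. The bits-per-key figure additionally assumes that all 47 million keys sit in that single SSTable, which the message does not confirm. The function names are hypothetical.

```python
BYTES_PER_LONG = 8  # size of a Java long; array/object headers are ignored


def long_array_bytes(num_longs):
    """Approximate heap size of a long[] with num_longs elements."""
    return num_longs * BYTES_PER_LONG


def bits_per_key(num_longs, num_keys):
    """Bloom filter bits per key, assuming every key is in this one filter."""
    return num_longs * 64 / num_keys


if __name__ == "__main__":
    size = long_array_bytes(19_400_000)
    print(f"{size / 1e6:.1f} MB")  # decimal megabytes
    print(f"{bits_per_key(19_400_000, 47_000_000):.1f} bits per key")
```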

Cassandra training on May 21 in Palo Alto

2010-05-04 Thread Jonathan Ellis
I'll be running a day-long Cassandra training class on Friday, May 21. I'll cover - Installation and configuration - Application design - Basics of Cassandra internals - Operations - Tuning and troubleshooting Details at http://riptanobayarea20100521.eventbrite.com/ -- Jonathan Ellis Project

Re: strange get_range_slices behaviour v0.6.1

2010-05-04 Thread aaron
Thanks Jonathan. After looking at the Lucandra code I realized my confusion has to do with get_range_slices and the RandomPartitioner. When I switched to the OPP I got the expected behaviour. I was noticing cases under the random partitioner where keys I expected to be returned were not.

Re: strange get_range_slices behaviour v0.6.1

2010-05-04 Thread Jonathan Ellis
On Tue, May 4, 2010 at 4:17 PM, aaron aa...@thelastpickle.com wrote: I was noticing cases under the random partitioner where keys I expected to be returned were not. Can you give a little advice on the expected behaviour of get_range_slices with the RP and I'll try to write a JUnit for it.

sstable2jason bat script on windows

2010-05-04 Thread Dop Sun
Hi, As of 0.6.1, I can't find sstable2json.bat. Did I miss something? It would be good to have one, as it can help import/export data into and out of a development machine. Thanks, Regards, Dop

Re: Trove maps

2010-05-04 Thread Prashant Malik
;) Ya, it was painful. On Tue, May 4, 2010 at 10:53 AM, Avinash Lakshman avinash.laksh...@gmail.com wrote: Hahaha, Jeff - I remember scampering to remove those references to the Trove maps, I think around 2 years ago. Avinash On Tue, May 4, 2010 at 2:34 AM, Jeff Hammerbacher

Re: Cassandra and Request routing

2010-05-04 Thread Jonathan Shook
Ah! Thank you. Explained better here: http://www.slideshare.net/benjaminblack/introduction-to-cassandra-replication-and-consistency On Tue, May 4, 2010 at 8:38 PM, Robert Coli rc...@digg.com wrote: On 5/4/10 7:16 AM, Jonathan Shook wrote: I may be wrong here. Someone please correct me if I

Export to another cassandra cluster

2010-05-04 Thread Joost Ouwerkerk
I want to export data from one cassandra cluster (production) to another (development). This is not a case of replication, because I just want a snapshot, not a continuous synchronization. I guess my options include 'nodetool snapshot' and 'sstable2json'. In our case, however, the development

Re: Getting all the keys from a ColumnFamily ?

2010-05-04 Thread Jonathan Ellis
get_range_slices with an empty list of column names should work. On Tue, May 4, 2010 at 3:02 PM, Chris Dean ctd...@sokitomi.com wrote: I have a ColumnFamily with a small number of keys, but each key has a large number of columns. What's the best way to get just the keys back? I don't want to

Appropriate use for Cassandra?

2010-05-04 Thread Denis Haskin
I've been reading everything I can get my hands on about Cassandra and it sounds like a possibly very good framework for our data needs; I'm about to take the plunge and do some prototyping, but I thought I'd see if I can get a reality check here on whether it makes sense. Our schema should be