Re: 10G Ethernet / Infiniband

2010-10-27 Thread Juho Mäkinen
The disks in a Cassandra node will most probably be your bottleneck. I'd suggest (I haven't tried it; this is just based on my intuition) investing in SSD disks first and only after that thinking about going 10Gbps. - Garo On Tue, Oct 26, 2010 at 10:11 PM, Wayne wav...@gmail.com wrote: Is anyone out

Re: Cassandra newbie question

2010-10-27 Thread Arijit Mukherjee
Hi All, I have another related question. I am using a stream of records of the form (A, B, n) where the pair (A, B) can occur multiple times. For example, you could have the following set of records: (A, B, 2), (P, Q, 5), (X, Y, 3), (A, B, 8), (A, B, 2) ... The data store has a set of columns - (key, count,
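
A minimal sketch of one client-side way to handle the repeated (A, B) pairs: sum n per pair within a batch before writing, so each pair is written once per batch instead of being read-modified-written once per record. It assumes records arrive as Python tuples; `aggregate_counts` is a hypothetical helper, not part of any Cassandra client, and it does nothing to coordinate concurrent writers in other processes.

```python
from collections import Counter
from typing import Iterable, Tuple

# One record from the stream: (A, B, n)
Record = Tuple[str, str, int]


def aggregate_counts(records: Iterable[Record]) -> Counter:
    """Sum n for each (A, B) pair seen in one batch of records."""
    totals: Counter = Counter()
    for a, b, n in records:
        totals[(a, b)] += n
    return totals


if __name__ == "__main__":
    # The example records from the question
    stream = [("A", "B", 2), ("P", "Q", 5), ("X", "Y", 3), ("A", "B", 8), ("A", "B", 2)]
    for pair, total in sorted(aggregate_counts(stream).items()):
        print(pair, total)
```

For the example records this yields 12 for (A, B), 5 for (P, Q) and 3 for (X, Y).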

Re: New nodes won't bootstrap on .66

2010-10-27 Thread Dimitry Lvovsky
Hi Aaron, Thanks for your reply. We still haven't solved this, unfortunately. How did you start the bootstrap for the .18 node? Standard way: we set AutoBootstrap to true and added all the servers from the working ring as seeds. Was it the .18 or the .17 node you tried to add? We first

Re: What happens if there is a collision?

2010-10-27 Thread Jérôme Verstrynge
Peter, many thanks for all this information. On 26/10/2010 21:17, Peter Sculler wrote: It does mention that timestamps are used for conflict resolution but does not really dwell on the issue, and the remainder elides timestamps. So perhaps it's easy to miss. I also notice that the phrasing is

Time to wait for CF to be consistent after stopping writes.

2010-10-27 Thread Utku Can Topçu
Hi, For a column family in a keyspace which has RF=3, I'm issuing writes with ConsistencyLevel.ONE. In the configuration I have: - memtable_flush_after_mins: 30 - memtable_throughput_in_mb: 32. I'm writing to this column family continuously for about 1 hour and then stop writing. So the question

High BloomFilterFalseRation

2010-10-27 Thread Daniel Doubleday
Hi people, We are currently moving our second use case from MySQL to Cassandra. While importing the data (ongoing) I noticed that the BloomFilterFalseRatio seems to be pretty high compared to another CF which is in use in production right now. It's a hierarchical data model and I cannot avoid
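
For a baseline on what counts as "pretty high", the textbook Bloom filter estimate is p = (1 - e^(-kn/m))^k for n keys, m bits and k hash functions. The sketch below only evaluates that formula; it is not Cassandra's implementation, and the bits-per-key values are assumptions for illustration. The formula gives the chance that a lookup for a key absent from an SSTable is wrongly reported as present; how a server-side ratio counter relates to that probability depends on how the counter is defined and on the mix of lookups (for example, many reads for keys that a given SSTable does not contain).

```python
import math


def expected_false_positive_rate(n_keys: int, m_bits: int, k_hashes: int) -> float:
    """Textbook approximation (1 - e^(-k*n/m))^k of a Bloom filter's false-positive probability."""
    if n_keys <= 0 or m_bits <= 0 or k_hashes <= 0:
        raise ValueError("n_keys, m_bits and k_hashes must be positive")
    return (1.0 - math.exp(-k_hashes * n_keys / m_bits)) ** k_hashes


def optimal_hash_count(n_keys: int, m_bits: int) -> int:
    """k = (m/n) * ln 2 minimises the approximation above; round to a whole number of hashes."""
    return max(1, round(m_bits / n_keys * math.log(2)))


if __name__ == "__main__":
    n = 1_000_000
    for bits_per_key in (5, 10, 15):  # hypothetical filter sizes
        m = bits_per_key * n
        k = optimal_hash_count(n, m)
        p = expected_false_positive_rate(n, m, k)
        print(f"{bits_per_key} bits/key, k={k}: ~{p:.4%} false positives")
```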

Re: Cassandra newbie question

2010-10-27 Thread Gary Dusbabek
On Wed, Oct 27, 2010 at 03:24, Arijit Mukherjee ariji...@gmail.com wrote: Hi All I've another related question. I am using a stream of records of the form (A, B, n) where the pair (A,B) can occur multiple times. For example, you could have the following set of records - A, B, 2 P, Q, 5

Re: Time to wait for CF to be consistent after stopping writes.

2010-10-27 Thread Gary Dusbabek
On Wed, Oct 27, 2010 at 05:08, Utku Can Topçu u...@topcu.gen.tr wrote: Hi, For a columnfamily in a keyspace which has RF=3, I'm issuing writes with ConsistencyLevel.ONE. in the configuration I have: - memtable_flush_after_mins : 30 - memtable_throughput_in_mb : 32 I'm writing to this

Re: High BloomFilterFalseRation

2010-10-27 Thread Daniel Doubleday
Hm - not sure if I understand the random question. We are using RP. But I wouldn't know why that should matter. I thought that the bloom filter hash function should evenly distribute no matter what keys come in. Keys are '/' separated strings (aka paths :-)) I do bulk inserts like: (1000

Re: High BloomFilterFalseRation

2010-10-27 Thread Jonathan Ellis
Do you have a key a/b then? What columns does it have? On Wed, Oct 27, 2010 at 9:14 AM, Daniel Doubleday daniel.double...@gmx.net wrote: Hm - not sure if I understand the random question. We are using RP. But I wouldn't know why that should matter. I thought that the bloom filter hash

Re: Cluster load balancing?

2010-10-27 Thread Tyler Hobbs
With OrderPreservingPartitioner, you have to keep the ring balanced manually. This is why people frequently suggest that you use RandomPartitioner unless you absolutely have to do otherwise. With OPP, keys are *not* evenly distributed around the ring. Apparently you have lots of keys that are
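
To see why ordered keys can pile up on one node, here is a small simulation. It approximates a RandomPartitioner-style token as the absolute value of the key's MD5 digest read as a signed 128-bit integer (my understanding of how RP derives tokens; treat it as an assumption) and compares that with placing the raw key strings on a ring of hypothetical string tokens, as an order-preserving partitioner would. The keys are hypothetical '/'-separated paths that share a prefix.

```python
import bisect
import hashlib
from collections import Counter
from typing import Callable, List, Sequence

RING_SIZE = 2 ** 127  # RandomPartitioner-style token space is [0, 2**127]


def md5_token(key: str) -> int:
    """Approximate RP-style token: |MD5 digest read as a signed 128-bit integer|."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return abs(int.from_bytes(digest, "big", signed=True))


def owner(tokens: Sequence, key_token) -> int:
    """Index of the owning node: first node token >= key_token, wrapping to node 0."""
    return bisect.bisect_left(tokens, key_token) % len(tokens)


def distribution(keys: List[str], tokens: Sequence, to_token: Callable) -> Counter:
    """Count how many keys each node (by index) would own."""
    return Counter(owner(tokens, to_token(k)) for k in keys)


if __name__ == "__main__":
    keys = [f"user/{i:05d}" for i in range(10_000)]
    hashed_tokens = [i * RING_SIZE // 4 for i in range(4)]  # evenly spaced over the ring
    ordered_tokens = ["a", "h", "p", "w"]  # hypothetical string tokens, sorted
    print("hashed: ", sorted(distribution(keys, hashed_tokens, md5_token).items()))
    print("ordered:", sorted(distribution(keys, ordered_tokens, lambda k: k).items()))
```

With hashed tokens each node ends up with roughly a quarter of the keys; with ordered tokens every key sorts between "p" and "w", so node 3 owns all 10,000.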

java.lang.OutOfMemoryError: Map failed

2010-10-27 Thread Koert Kuipers
While bootstrapping a new node, the existing node that is supposed to provide the data throws an error, and the bootstrapping hangs. The log from the existing node is below. Both nodes have little memory (only 2 GB; they are Windows machines). I used default configurations (Cassandra 0.7). Any

Re: High BloomFilterFalseRation

2010-10-27 Thread Daniel Doubleday
Ah of course - question makes total sense. But no: this is not the case: I am not constantly asking the same question since the tree is deep enough. Most data nodes are level 5 from the root. So the parents getting queried will be different most of the time. Since the parent nodes are

Re: Cluster load balancing?

2010-10-27 Thread Thibaut Britz
Depending on the range I choose, choosing a token manually will also fail (the node will never exit bootstrap, and streams doesn't list any open streams).
INFO [Thread-53] 2010-10-27 20:33:37,399 SSTableReader.java (line 120) Sampling index for /hd2/cassandra/data/table_xyz/table_xyz-3-Data.db
INFO

Re: java.lang.OutOfMemoryError: Map failed

2010-10-27 Thread Matthew Dennis
2 GiB is pretty small for a C* node. You can also try reducing all the caching to zero with so little memory. If you have lots of CFs you probably want to reduce the memtable throughput too. On Wed, Oct 27, 2010 at 12:43 PM, Koert Kuipers koert.kuip...@diamondnotch.com wrote: While

Re: Adding nodes wrong/data not balanced across nodes

2010-10-27 Thread Matthew Dennis
You need to specify your initial tokens. LoadBalance really doesn't do a good job of balancing the load. Take a look at Load Balancing in http://wiki.apache.org/cassandra/Operations There is a little python script in there to help you pick tokens for a given cluster size. If you don't want to
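
The token calculation for a RandomPartitioner ring comes down to spacing N tokens evenly over 0..2**127. The sketch below is written in the spirit of the wiki's script rather than copied from it, and the function name is hypothetical; each value is meant to be assigned to one node as its initial token.

```python
from typing import List

RING_SIZE = 2 ** 127  # RandomPartitioner token space


def initial_tokens(node_count: int) -> List[int]:
    """Evenly spaced tokens i * 2**127 / N for a ring of node_count nodes."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    # Integer arithmetic keeps the tokens exact; floats would lose precision at this size.
    return [i * RING_SIZE // node_count for i in range(node_count)]


if __name__ == "__main__":
    for node, token in enumerate(initial_tokens(4)):
        print(f"node {node}: {token}")
```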

0.7 problem on cygwin

2010-10-27 Thread Chris Oei
Hi all, I'm getting the following when I try to bootstrap my Cassandra cluster on a Windows machine.
INFO 11:47:10,300 Joining: sleeping 3 ms for pending range setup
INFO 11:47:40,302 Bootstrapping
ERROR 11:47:40,453 Fatal exception in thread Thread[Thread-5,5,main]

Re: 0.7 problem on cygwin

2010-10-27 Thread ruslan usifov
It occurs because of the difference between the path separator characters on Windows (\) and on Unix or Mac OS (/), and Cassandra doesn't handle this. If you're interested, I can send you a patch that solves this problem. Why is it so? I don't know; that's a question for the Cassandra developers. 2010/10/27 Chris Oei

Re: 0.7 problem on cygwin

2010-10-27 Thread Chris Oei
Sorry -- I don't quite understand: what is not supported by cassandra? The bin directory contains cassandra.bat, so I assumed cassandra works on Windows. Do you mean that cassandra works on Windows but not on cygwin? I had already checked my cassandra.yaml file to make sure that I used backslashes

Re: 0.7 problem on cygwin

2010-10-27 Thread ruslan usifov
Sorry for my bad English. During bootstrap, Cassandra sends full file paths between nodes. So, for example, a Windows node decides to send the value-e-27-Data.db file to a Unix node (Cygwin in your case). The Unix node receives the full path of value-e-27-Data.db on the Windows node, i.e.
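
The mismatch described here can be reproduced outside Cassandra with Python's `ntpath` and `posixpath` modules, which split paths the Windows way and the POSIX way regardless of the host OS. The path below is hypothetical; the point is that a POSIX-style parser finds no directory component at all in a backslash-separated path.

```python
import ntpath
import posixpath
from typing import Dict, Tuple


def split_as_seen_by(path: str) -> Dict[str, Tuple[str, str]]:
    """Return (directory, file name) for a path as Windows- and POSIX-style parsers split it."""
    return {
        "windows": (ntpath.dirname(path), ntpath.basename(path)),
        "posix": (posixpath.dirname(path), posixpath.basename(path)),
    }


if __name__ == "__main__":
    remote = r"C:\cassandra\data\Keyspace1\value-e-27-Data.db"  # hypothetical Windows path
    for side, (directory, name) in split_as_seen_by(remote).items():
        print(f"{side:8} directory={directory!r} name={name!r}")
```

An empty directory like this would fit the "Filename must include parent directory" error reported later in the thread, but that link is an inference, not something confirmed from Cassandra's source.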

Re: How to get the result from the closest node

2010-10-27 Thread Joe Alex
Thanks much, verifies what I thought it is doing when connecting to a random node. Will play with RackAware and DCQUORUM. Wanted to see if anybody else has a case where they want to connect to local Data Center always. A case where the Nodes are geographically apart like A (NY) and D (London).

Re: 0.7.0beta2 spinning/wedged after aggressive overnight writing

2010-10-27 Thread Jonathan Ellis
(moving to user list.) This sounds like you are GC storming (gcinspector lines in the log could confirm/refute this) and if I were to guess it would be that the memtable thresholds picked by b2 are too high. We cut them in half for rc1 in http://issues.apache.org/jira/browse/CASSANDRA-1641, but

Re: cassandra + avro | python client vs java client

2010-10-27 Thread Jonathan Ellis
Does Avro have a Python C extension yet? If not, 10x is right in line with how much faster I would expect Java to be than pure Python. On Wed, Oct 27, 2010 at 11:59 AM, Koert Kuipers koert.kuip...@diamondnotch.com wrote: Hey all, I have Cassandra 0.7 (nightly build from halfway September)

Re: java.lang.OutOfMemoryError: Map failed

2010-10-27 Thread Jonathan Ellis
Sounds like either you are running on a 32bit architecture or JVM or you don't have OS level permissions to mmap large Cassandra data files. One workaround may be to switch to mmap_index_only mode. On Wed, Oct 27, 2010 at 1:49 PM, Matthew Dennis mden...@riptano.com wrote: 2 GiB is pretty small

Re: 0.7 problem on cygwin

2010-10-27 Thread Jonathan Ellis
Short version: don't mix nodes on different architectures in the same cluster. On Wed, Oct 27, 2010 at 2:09 PM, Chris Oei chris@nestria.com wrote: Hi all, I'm getting the following when I try to bootstrap my Cassandra cluster on a Windows machine. INFO 11:47:10,300 Joining: sleeping

RE: cassandra + avro | python client vs java client

2010-10-27 Thread Koert Kuipers
It does not have a C extension as far as I know. -----Original Message----- From: Jonathan Ellis [mailto:jbel...@gmail.com] Sent: Wednesday, October 27, 2010 5:01 PM To: user Subject: Re: cassandra + avro | python client vs java client Does Avro have a Python C extension yet? If not, 10x is

Re: 0.7 problem on cygwin

2010-10-27 Thread Chris Oei
I guess so. I tried hacking a quick workaround for the "Filename must include parent directory" error, but I got another error (below). So, since it appears that mixing architectures is not officially supported, I think I'll give up on this. Goodbye, Windows 7. Thanks, Chris ERROR 14:07:47,534 Fatal

Re: cassandra + avro | python client vs java client

2010-10-27 Thread Jonathan Ellis
Then you should use Thrift from Python if you are concerned about speed. (I think the speed penalty there is only about 2x w/ the extension.) On Wed, Oct 27, 2010 at 4:15 PM, Koert Kuipers koert.kuip...@diamondnotch.com wrote: It does not have a c extension as far as I know -Original

Re: Cluster load balancing?

2010-10-27 Thread Tyler Hobbs
Not sure if this is the cause, but do all of your nodes have the same seed list? Did you bring up the seeds first? - Tyler On Wed, Oct 27, 2010 at 1:46 PM, Thibaut Britz thibaut.br...@trendiction.com wrote: Depending on the range I choose, choosing manually a token will also fail. (node

Config Maximum heap size for Cassandra

2010-10-27 Thread JKnight JKnight
Hi all, When I set the maximum heap size to -Xmx4G, memory usage grows to 3.5G. When I call Perform GC (jconsole), the used memory drops to 1G. When I set the maximum heap size to -Xmx2G, Cassandra runs well. Is that a Cassandra problem? I want Cassandra to use memory more effectively. How can I do

Re: Config Maximum heap size for Cassandra

2010-10-27 Thread Nicholas Knight
Presumably you're on a 32-bit architecture (or at least a 32-bit JVM). 32-bit processes won't be able to address more than X amount of memory, where X would usually be >= 2GB and < 4GB. The reason you can't use a full 4GB is that part of the address space is necessarily reserved by the OS
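
The arithmetic behind this: a 32-bit pointer can address 2**32 bytes = 4 GiB, whatever the OS reserves comes off the top, and the Java heap then competes with thread stacks, JVM code and memory-mapped files for the rest. The two splits below (2/2 GiB, commonly the default on 32-bit Windows, and 3/1 GiB, common on 32-bit Linux kernels) are illustrative examples, not a statement about any particular system; the function name is hypothetical.

```python
GIB = 1024 ** 3


def user_address_space(pointer_bits: int, os_reserved: int) -> int:
    """Bytes of virtual address space left to one process after the OS reservation."""
    total = 2 ** pointer_bits
    if not 0 <= os_reserved < total:
        raise ValueError("os_reserved must be in [0, 2**pointer_bits)")
    return total - os_reserved


if __name__ == "__main__":
    # Example 32-bit kernel/user splits; the real split depends on the OS and its settings.
    for label, reserved in (("2 GiB kernel / 2 GiB user", 2 * GIB), ("1 GiB kernel / 3 GiB user", 1 * GIB)):
        usable = user_address_space(32, reserved) / GIB
        print(f"{label}: {usable:.0f} GiB for heap, thread stacks, JVM code and mmapped files")
```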

Re: Config Maximum heap size for Cassandra

2010-10-27 Thread JKnight JKnight
Could you tell me why Cassandra uses more memory than it needs? On Thu, Oct 28, 2010 at 9:15 AM, Nicholas Knight nkni...@runawaynet.com wrote: Presumably you're on a 32-bit architecture (or at least a 32-bit JVM). 32-bit processes won't be able to address more than X amount of memory, where X

Re: Config Maximum heap size for Cassandra

2010-10-27 Thread Nicholas Knight
Cassandra needs all the RAM you can give it so it can cache things for optimum performance. If you need it to use less, give it less. -NK On Oct 28, 2010, at 10:30 AM, JKnight JKnight wrote: Could you tell me why Cassandra use memory more than needed? On Thu, Oct 28, 2010 at 9:15 AM,

Re: Config Maximum heap size for Cassandra

2010-10-27 Thread Jonathan Ellis
http://www.oracle.com/technetwork/java/javase/gc-tuning-6-140523.html On Wed, Oct 27, 2010 at 9:30 PM, JKnight JKnight beukni...@gmail.com wrote: Could you tell me why Cassandra use memory more than needed? On Thu, Oct 28, 2010 at 9:15 AM, Nicholas Knight nkni...@runawaynet.com wrote:

Re: Cassandra newbie question

2010-10-27 Thread Arijit Mukherjee
Thanks, Gary. I was thinking of using range partitioning to break up the input. Say, we could have different threads handling different ranges: (A-J) by thread1, (K-P) by thread2. This way, there probably won't be any chance of collision. But the thread which actually performs the distribution
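
A minimal sketch of the range-partitioning idea, assuming records are routed on the first letter of A by a single dispatcher (the "thread which actually performs the distribution") and then summed by one worker thread per range. The range table, function names and record layout are hypothetical. Because every update for a given pair goes through the same worker, those updates can't interleave within this process; the sketch does not coordinate with other client processes writing to the same rows.

```python
import threading
from collections import Counter
from typing import List, Tuple

Record = Tuple[str, str, int]

# Hypothetical ranges on the first letter of A: the post mentions A-J and K-P;
# Q-Z is added here so every letter has an owner.
RANGES = [("A", "J"), ("K", "P"), ("Q", "Z")]


def range_index(key: str) -> int:
    """Which range (and so which worker) owns a key, by its first letter."""
    first = key[:1].upper()
    for i, (lo, hi) in enumerate(RANGES):
        if lo <= first <= hi:
            return i
    raise ValueError(f"no range owns key {key!r}")


def partition(records: List[Record]) -> List[List[Record]]:
    """Single dispatcher: route each record to the bucket for its range."""
    buckets: List[List[Record]] = [[] for _ in RANGES]
    for record in records:
        buckets[range_index(record[0])].append(record)
    return buckets


def aggregate_in_parallel(records: List[Record]) -> List[Counter]:
    """One thread per range; each thread only touches its own Counter, so no lock is needed."""
    buckets = partition(records)
    results = [Counter() for _ in RANGES]

    def worker(i: int) -> None:
        for a, b, n in buckets[i]:
            results[i][(a, b)] += n

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(len(RANGES))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    stream = [("A", "B", 2), ("P", "Q", 5), ("X", "Y", 3), ("A", "B", 8), ("A", "B", 2)]
    for (lo, hi), counts in zip(RANGES, aggregate_in_parallel(stream)):
        print(f"{lo}-{hi}: {dict(counts)}")
```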