The disks in a Cassandra node will most likely be your bottleneck. I'd
suggest (I haven't tried this; it's just my intuition) investing
in SSDs first and only after that thinking about going to 10 Gbps.
- Garo
On Tue, Oct 26, 2010 at 10:11 PM, Wayne wav...@gmail.com wrote:
Is anyone out
Hi All
I've another related question.
I am using a stream of records of the form (A, B, n) where the pair
(A,B) can occur multiple times. For example, you could have the
following set of records -
A, B, 2
P, Q, 5
X, Y, 3
A, B, 8
A, B, 2
...
The data store has a set of columns - (key, count,
Hi Aaron,
Thanks for your reply.
We still haven't solved this unfortunately.
How did you start the bootstrap for the .18 node ?
Standard way: we set AutoBootstrap to true and added all the servers from
the working ring as seeds.
Was it the .18 or the .17 node you tried to add
We first
Peter, many thanks for all this information.
On 26/10/2010 21:17, Peter Sculler wrote:
It does mention that timestamps are used for conflict resolution but
does not really dwell on the issue, and the remainder elides
timestamps. So perhaps it's easy to miss. I also notice that the
phrasing is
Hi,
For a columnfamily in a keyspace which has RF=3, I'm issuing writes with
ConsistencyLevel.ONE.
in the configuration I have:
- memtable_flush_after_mins : 30
- memtable_throughput_in_mb : 32
I'm writing to this columnfamily continuously for about 1 hour then stop
writing.
So the question
Hi people
We are currently moving our second use case from mysql to cassandra. While
importing the data (ongoing) I noticed that the BloomFilterFalseRatio seems to
be pretty high compared to another CF which is in use in production right now.
It's a hierarchical data model and I cannot avoid
On Wed, Oct 27, 2010 at 03:24, Arijit Mukherjee ariji...@gmail.com wrote:
Hi All
I've another related question.
I am using a stream of records of the form (A, B, n) where the pair
(A,B) can occur multiple times. For example, you could have the
following set of records -
A, B, 2
P, Q, 5
On Wed, Oct 27, 2010 at 05:08, Utku Can Topçu u...@topcu.gen.tr wrote:
Hi,
For a columnfamily in a keyspace which has RF=3, I'm issuing writes with
ConsistencyLevel.ONE.
in the configuration I have:
- memtable_flush_after_mins : 30
- memtable_throughput_in_mb : 32
I'm writing to this
Hm -
not sure if I understand the random question. We are using RP. But I wouldn't
know why that should matter.
I thought that the bloom filter hash function should evenly distribute no
matter what keys come in.
Keys are '/' separated strings (aka paths :-))
I do bulk inserts like: (1000
Do you have a key a/b then? What columns does it have?
On Wed, Oct 27, 2010 at 9:14 AM, Daniel Doubleday
daniel.double...@gmx.net wrote:
Hm -
not sure if I understand the random question. We are using RP. But I wouldn't
know why that should matter.
I thought that the bloom filter hash
With OrderPreservingPartitioner, you have to keep the ring balanced
manually.
This is why people frequently suggest that you use RandomPartitioner unless
you absolutely have to do otherwise. With OPP, keys are *not* evenly
distributed
around the ring.
Apparently you have lots of keys that are
While bootstrapping a new node, the existing node that is supposed to provide
the data throws an error, and the bootstrapping hangs. The log from the
existing node is below. Both nodes have little memory (only 2 Gig, windows
machines). I used default configurations (Cassandra 0.7). Any
Ah of course - question makes total sense.
But no: this is not the case: I am not constantly asking the same
question since the tree is deep enough. Most data nodes are level 5 from
the root. So the parents getting queried will be different most of the time.
Since the parent nodes are
Depending on the range I choose, choosing manually a token will also fail.
(The node never exits bootstrap, and streams doesn't list any open streams.)
INFO [Thread-53] 2010-10-27 20:33:37,399 SSTableReader.java (line 120)
Sampling index for /hd2/cassandra/data/table_xyz/table_xyz-3-Data.db
INFO
2 GiB is pretty small for a C* node. You can also try reducing all the
caching to zero with so little memory. If you have lots of CFs you probably
want to reduce the memtable throughput too.
On Wed, Oct 27, 2010 at 12:43 PM, Koert Kuipers
koert.kuip...@diamondnotch.com wrote:
While
You need to specify your initial tokens. LoadBalance really doesn't do a
good job of balancing the load. Take a look at Load Balancing in
http://wiki.apache.org/cassandra/Operations There is a little python script
in there to help you pick tokens for a given cluster size.
If you don't want to
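A minimal sketch of the token calculation suggested above, assuming RandomPartitioner and its 0 to 2**127 token space. The function name initial_tokens is made up for illustration and this is not the script from the wiki page:

```python
def initial_tokens(node_count):
    """Return evenly spaced initial tokens for node_count nodes.

    Assumes RandomPartitioner, whose token space runs from 0 to 2**127.
    """
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    # Integer division keeps the tokens exact for very large values.
    return [i * (2 ** 127) // node_count for i in range(node_count)]


if __name__ == "__main__":
    # One token per node; each value would go into that node's initial token setting.
    for index, token in enumerate(initial_tokens(4)):
        print("node %d: %d" % (index, token))
```

Each node gets one of the printed values as its initial token, so every node owns an equal slice of the ring.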
Hi all,
I'm getting the following when I try to bootstrap my Cassandra cluster on a
Windows
machine.
INFO 11:47:10,300 Joining: sleeping 3 ms for pending range setup
INFO 11:47:40,302 Bootstrapping
ERROR 11:47:40,453 Fatal exception in thread Thread[Thread-5,5,main]
This happens because of the difference between the path separator characters
on Windows (\) and Unix or Mac OS (/), which Cassandra doesn't support. If you
are interested, I can send you a patch that solves this problem. Why is it
like this? I don't know; that is a question for the Cassandra developers.
2010/10/27 Chris Oei
Sorry -- I don't quite understand: what is not supported by cassandra? The
bin directory contains
cassandra.bat, so I assumed cassandra works on Windows. Do you mean that
cassandra works on
Windows but not on cygwin? I had already checked my cassandra.yaml file to
make sure that
I used backslashes
Sorry for my bad English.
During bootstrap, Cassandra sends the full file path between nodes. So, for
example, a Windows node decides to send the file value-e-27-Data.db to a Unix
node (Cygwin in your case). The Unix node receives the full path of
value-e-27-Data.db as it exists on the Windows node,
i.e.
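The separator mismatch described above can be illustrated with Python's standard ntpath and posixpath modules. This is only a sketch of the general problem, not Cassandra's actual code; the directory in the example path is hypothetical, and only the file name value-e-27-Data.db comes from the thread:

```python
import ntpath
import posixpath

# Hypothetical full path as a Windows node might send it. Only the file name
# is taken from the thread; the directory part is made up for illustration.
windows_path = r"C:\cassandra\data\Keyspace1\value-e-27-Data.db"

# Split with Windows rules: the backslash is the separator, so the parent
# directory and the file name come apart as expected.
win_parent, win_name = ntpath.split(windows_path)

# Split with Unix rules: the backslash is an ordinary character, so the whole
# string is treated as a file name with no parent directory at all.
unix_parent, unix_name = posixpath.split(windows_path)

if __name__ == "__main__":
    print("Windows rules:", repr(win_parent), repr(win_name))
    print("Unix rules:   ", repr(unix_parent), repr(unix_name))
```

Under Unix rules the parent directory comes back empty, which would be consistent with a receiving node complaining that the file name has no parent directory.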
Thanks much, verifies what I thought it is doing when connecting to a
random node.
Will play with RackAware and DCQUORUM.
Wanted to see if anybody else has a case where they want to connect to
local Data Center always. A case where the Nodes are geographically
apart like A (NY) and D (London).
(moving to user list.)
This sounds like you are GC storming (gcinspector lines in the log
could confirm/refute this) and if I were to guess it would be that the
memtable thresholds picked by b2 are too high. We cut them in half
for rc1 in http://issues.apache.org/jira/browse/CASSANDRA-1641, but
Does Avro have a Python C extension yet?
If not, 10x is right in line with how much faster I would expect Java
to be than pure Python.
On Wed, Oct 27, 2010 at 11:59 AM, Koert Kuipers
koert.kuip...@diamondnotch.com wrote:
Hey all,
I have Cassandra 0.7 (nightly build from halfway September)
Sounds like either you are running on a 32bit architecture or JVM or
you don't have OS level permissions to mmap large Cassandra data
files.
One workaround may be to switch to mmap_index_only mode.
On Wed, Oct 27, 2010 at 1:49 PM, Matthew Dennis mden...@riptano.com wrote:
2 GiB is pretty small
Short version: don't mix nodes on different architectures in the same cluster.
On Wed, Oct 27, 2010 at 2:09 PM, Chris Oei chris@nestria.com wrote:
Hi all,
I'm getting the following when I try to bootstrap my Cassandra cluster on a
Windows
machine.
INFO 11:47:10,300 Joining: sleeping
It does not have a c extension as far as I know
-Original Message-
From: Jonathan Ellis [mailto:jbel...@gmail.com]
Sent: Wednesday, October 27, 2010 5:01 PM
To: user
Subject: Re: cassandra + avro | python client vs java client
Does Avro have a Python C extension yet?
If not, 10x is
I guess so. I tried hacking a quick workaround for the "Filename must
include parent directory" error, but I got another error (below).
So, since it appears that mixing architectures is not officially supported,
I think I'll
give up on this. Goodbye, Windows 7.
Thanks,
Chris
ERROR 14:07:47,534 Fatal
Then you should use Thrift from Python if you are concerned about
speed. (I think the speed penalty there is only about 2x w/ the
extension.)
On Wed, Oct 27, 2010 at 4:15 PM, Koert Kuipers
koert.kuip...@diamondnotch.com wrote:
It does not have a c extension as far as I know
-Original
Not sure if this is the cause, but do all of your nodes have the same seed
list? Did you bring up the seeds first?
- Tyler
On Wed, Oct 27, 2010 at 1:46 PM, Thibaut Britz
thibaut.br...@trendiction.com wrote:
Depending on the range I choose, choosing manually a token will also fail.
(node
Hi all,
When I set the maximum heap size to -Xmx4G, memory consumption grows to 3.5G.
When I invoke Perform GC (jconsole), the used memory drops to 1G.
When I set the maximum heap size to -Xmx2G, Cassandra runs well.
Is that a Cassandra problem?
I want Cassandra to use memory more effectively. How can I do
Presumably you're on a 32-bit architecture (or at least a 32-bit JVM). 32-bit
processes won't be able to address more than X amount of memory, where X
would usually be >= 2GB and < 4GB.
The reason you can't use a full 4GB is that part of the address space is
necessarily reserved by the OS
Could you tell me why Cassandra uses more memory than needed?
On Thu, Oct 28, 2010 at 9:15 AM, Nicholas Knight nkni...@runawaynet.com wrote:
Presumably you're on a 32-bit architecture (or at least a 32-bit JVM).
32-bit processes won't be able to address more than X amount of memory,
where X
Cassandra needs all the RAM you can give it so it can cache things for optimum
performance. If you need it to use less, give it less.
-NK
On Oct 28, 2010, at 10:30 AM, JKnight JKnight wrote:
Could you tell me why Cassandra uses more memory than needed?
On Thu, Oct 28, 2010 at 9:15 AM,
http://www.oracle.com/technetwork/java/javase/gc-tuning-6-140523.html
On Wed, Oct 27, 2010 at 9:30 PM, JKnight JKnight beukni...@gmail.com wrote:
Could you tell me why Cassandra uses more memory than needed?
On Thu, Oct 28, 2010 at 9:15 AM, Nicholas Knight nkni...@runawaynet.com
wrote:
Thanx Gary.
I was thinking of using range partitioning for breaking the input.
Say, we could have different threads handling different ranges - (A-J)
by thread1, (K-P) by thread2. This way, there won't probably be any
chance of collision. But the thread which actually performs the
distribution
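A rough sketch of the range-partitioning idea above, under several assumptions: records are (A, B, n) tuples as in the earlier example, the goal is to sum n per (A, B) pair, and ranges are chosen by the first letter of A. The (Q-Z) range and all function names are hypothetical; the thread only mentions (A-J) and (K-P).

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Ranges by first letter of A. (A-J) and (K-P) come from the thread;
# (Q-Z) is an assumed third range added so every letter is covered.
RANGES = [("A", "J"), ("K", "P"), ("Q", "Z")]


def range_index(key):
    """Return the index of the range that owns this key."""
    first = key[0].upper()
    for index, (low, high) in enumerate(RANGES):
        if low <= first <= high:
            return index
    raise ValueError("no range covers key %r" % key)


def sum_counts(records):
    """Sum n per (A, B) pair for the records of a single range."""
    totals = Counter()
    for a, b, n in records:
        totals[(a, b)] += n
    return totals


def aggregate(records):
    """Distribute records to one bucket per range, then sum each bucket in its own thread."""
    buckets = [[] for _ in RANGES]
    for record in records:
        buckets[range_index(record[0])].append(record)
    with ThreadPoolExecutor(max_workers=len(RANGES)) as pool:
        partials = list(pool.map(sum_counts, buckets))
    # Each pair lives in exactly one range, so merging cannot collide.
    merged = {}
    for partial in partials:
        merged.update(partial)
    return merged


if __name__ == "__main__":
    stream = [("A", "B", 2), ("P", "Q", 5), ("X", "Y", 3), ("A", "B", 8), ("A", "B", 2)]
    print(aggregate(stream))
```

Because every (A, B) pair maps to exactly one range, no two threads ever update the same pair. The single loop that fills the buckets is still the one place where all records pass through.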
35 matches