I think it would be a good idea to add a bit more explanation to
storage-conf.xml/wiki regarding the replication factor. It caused some
confusion until we dug around the mail archive to realize that our
UnavailableExceptions were caused by our incorrect assumption and that RF=1
does NOT mean
Hi all,
Can someone post an example of how to define keyspaces in Cassandra 0.7?
My initial Cassandra node does not load the keyspaces defined at
Cassandra.yaml. Is there a way to define the keyspaces at startup or is
runtime defining an absolute must?
thanks,
BoriS
Defining at runtime is, very intentionally, an absolute must. It
would have been very simple and perhaps user-friendly to add a flag
that loads the schema specified in yaml when cassandra starts up. I
decided against it when implementing the feature because I figured it
would have been a
Yes, as the size of the data on disk increases and the OS cannot avoid disk
seeks, the read performance degrades. You can see this in the results from the
original post: as the number of keys in the test goes from 10M to 100M, the
reads drop from 4,600/s to 200/s. 10M keys in the stress.py
Added: http://wiki.apache.org/cassandra/StorageConfiguration
On Mon, Jul 19, 2010 at 2:55 AM, Dimitry Lvovsky dimi...@reviewpro.com wrote:
I think it would be a good idea to add a bit more explanation to
storage-conf.xml/wiki regarding the replication factor. It caused some
confusion until we
Thanks ;-).
On Mon, Jul 19, 2010 at 5:55 PM, Dave Viner davevi...@pobox.com wrote:
Added: http://wiki.apache.org/cassandra/StorageConfiguration
On Mon, Jul 19, 2010 at 2:55 AM, Dimitry Lvovsky dimi...@reviewpro.com wrote:
I think it would be a good idea to add a bit more explanation
Hello all, I'm Oren's partner in crime on all this. I've got a few more numbers
to add.
In an effort to eliminate everything but the scaling issue, I set up a cluster
on dedicated hardware (non-virtualized; 8-core, 16G RAM).
No data was loaded into Cassandra -- 100% of requests were misses.
This may be too much work... but you might consider building an Amazon EC2
AMI of your nodes. This would let others quickly boot up your nodes and run
the stress test against it.
I know you mentioned that you're using Rackspace Cloud. I'm not super
familiar with the internals of RSCloud, but
Another thing: Is the py_stress traffic definitely non-deterministic
such that each client will generate a definitely unique series of
requests? If all clients are deterministically requesting the same
sequence of keys, it would otherwise be plausible that they end up in
effective lock-step, if the
How many physical client machines are running stress.py?
One with 50 threads; it is remote from the cluster but within the same
DC in both cases. I also run the test with multiple clients and saw
similar results when summing the reqs/sec.
On Mon, Jul 19, 2010 at 1:22 PM, Stu Hood
Another thing: Is the py_stress traffic definitely non-deterministic
such that each client will generate a definitely unique series of
requests?
The tests were run both with --random and --std 0.1; in both cases, the
key-sequence is non-deterministic.
Cheers,
Dave
On Jul 19, 2010, at
This is absolutely your bottleneck, as Brandon mentioned before. Your client
machine is maxing out at 37K requests per second.
-Original Message-
From: David Schoonover david.schoono...@gmail.com
Sent: Monday, July 19, 2010 12:30pm
To: user@cassandra.apache.org
Subject: Re: Cassandra
One with 50 threads; it is remote from the cluster but within the same
DC in both cases. I also run the test with multiple clients and saw
similar results when summing the reqs/sec.
Multiple client processes, or multiple client machines?
In particular, note that the way CPython works, if
stress.py uses multiprocessing if it is present, circumventing the GIL; we ran
the tests with python 2.6.5.
David Schoonover
On Jul 19, 2010, at 1:51 PM, Peter Schuller wrote:
One with 50 threads; it is remote from the cluster but within the same
DC in both cases. I also run the test with
Multiple client processes, or multiple client machines?
I ran it with both one and two client machines making requests, and ensured the
sum of the request threads across the clients was 50. That was on the cloud. I
am re-running the multi-host test against the 4-node cluster on dedicated
If you put 25 processes on each of the 2 machines, all you are testing is how
fast 50 processes can hit Cassandra... the point of using more machines is that
you can use more processes.
Presumably, for a single machine, there is some limit (K) to the number of
processes that will give you
I'm reading this thread and I am a little lost. What should the
expected behavior be?
Should it maintain 53K regardless of nodes?
nodes reads/sec
1 53,000
2 37,000
4 37,000
I ran this test previously on the cloud, with similar results:
nodes reads/sec
1
On Mon, Jul 19, 2010 at 11:02 AM, David Schoonover
david.schoono...@gmail.com wrote:
Multiple client processes, or multiple client machines?
I ran it with both one and two client machines making requests, and ensured
the sum of the request threads across the clients was 50. That was on the
Hi Torsten,
When I run bmt_example, M/R job gets executed, cassandra server gets the
data but it goes as HintedHandoff to 127.0.0.2 and it is trying to send data
to 127.0.0.2 as if 127.0.0.2 is an actual node. When the job was done,
close() stop the StorageService instance. Any idea, why does
When I run bmt_example, M/R job gets executed, cassandra server gets the
data but it goes as HintedHandoff to 127.0.0.2 and it is trying to send data
to 127.0.0.2 as if 127.0.0.2 is an actual node.
Well, it kind of becomes an actual node.
Any idea, why does StorageService
returns 127.0.0.2
Hi,
Being fairly new to Cassandra I have a question on the eventual
consistency. I'm currently performing experiments with a single-node
Cassandra system and a single client. In some of my tests I perform an
update to an existing subcolumn in a row and subsequently read it back
from the same
stress.py uses multiprocessing if it is present, circumventing the GIL; we
ran the tests with python 2.6.5.
Ah, sorry about that. I was mis-remembering because I had to use
threading with pystress because multiprocessing was broken/unavailable
(can't remember which) on FreeBSD.
I agree with
if your test case is correct then it sounds like a bug to me. With one node,
unless you're writing with CL=0 you should get full consistency.
On Mon, Jul 19, 2010 at 10:14 PM, Hugo h...@unitedgames.com wrote:
Hi,
Being fairly new to Cassandra I have a question on the eventual
consistency.
I'm using CL=QUORUM (the Hector default) for both reads and writes. Most of
the time, the test passes, but sometimes it fails because I get back
the old value. Since the test is single-threaded, I guess it is a bug.
I'll try to reduce the test to something smaller that can be used for
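
For reference, the arithmetic behind the replies above (a single node, QUORUM for both reads and writes) can be sketched as follows. This is a minimal illustration, not Cassandra or Hector code: the function names are hypothetical, and it assumes the usual rules that QUORUM is a strict majority of the replicas and that a read is guaranteed to see the latest write when the read and write replica counts together exceed the replication factor.

```python
def quorum(replication_factor):
    """Replicas that must respond for QUORUM: a strict majority."""
    return replication_factor // 2 + 1


def read_sees_latest_write(replication_factor, write_replicas, read_replicas):
    """True when every read set must overlap every write set (R + W > RF)."""
    return read_replicas + write_replicas > replication_factor


# Single node, RF=1: QUORUM is one replica, so a read always reaches
# the replica that accepted the write.
rf = 1
q = quorum(rf)
print(q, read_sees_latest_write(rf, q, q))

# RF=3: QUORUM reads and writes overlap; one-replica reads and writes do not.
print(read_sees_latest_write(3, quorum(3), quorum(3)))
print(read_sees_latest_write(3, 1, 1))
```

Under these assumptions a single-threaded test on one node should never read back the old value, which is why the behaviour described here looks like a bug rather than eventual consistency.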
Sorry, mixed signals in my response. I was partially replying to suggestions
that we were limited by the box's NIC or DC's bandwidth (which is gigabit, no
dice there). I also ran the tests with -t50 on multiple tester machines in the
cloud with no change in performance; I've now rerun those
I'll just add that CPU usage hovered around 50% during these tests.
On Jul 19, 2010, at 3:51 PM, David Schoonover wrote:
Sorry, mixed signals in my response. I was partially replying to suggestions
that we were limited by the box's NIC or DC's bandwidth (which is gigabit, no
dice there). I
See my test case attached below. In my setup it usually fails around the
800th try...
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;
import me.prettyprint.cassandra.service.CassandraClient;
I'm about to extend my two-node cluster with four dedicated nodes and
remove one of the old nodes, leaving a five-node cluster. The
cluster is in production, but I can spare it to do some stress testing
in the meantime as I'm also interested in my cluster's performance. I
can't dedicate the
Thanks a ton, Juho.
The command was:
./stress.py -o read -t 50 -d $NODELIST -n 7500 -k -i 2
I made a few minor modifications to stress.py to count errors instead of
logging them, and avoid the pointless try-catch on missing keys. (There are
also unrelated edits to restart long
Did you see about equal CPU usage on the cassandra nodes during the
test? Is it possible that most or all of the keys generated by
stress.py simply fall on a single node?
CPU was approximately equal across the cluster; it was around 50%.
stress.py generates keys randomly or using a gaussian
Now keep adding clients until it stops making the numbers go up...
On Mon, Jul 19, 2010 at 2:51 PM, David Schoonover
david.schoono...@gmail.com wrote:
Sorry, mixed signals in my response. I was partially replying to suggestions
that we were limited by the box's NIC or DC's bandwidth (which is
CPU was approximately equal across the cluster; it was around 50%.
stress.py generates keys randomly or using a gaussian distribution, both
methods showed the same results.
Finally, we're using a random partitioner, so Cassandra will hash the keys
using md5 to map it to a position on the
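
The point about the random partitioner can be illustrated with a small sketch. It is a simplified model, not Cassandra's implementation: the 2**127 token space, the evenly spaced node tokens, the node names, and the key format are all assumptions made for the example. It hashes each key with MD5, maps the result to a token, and assigns the key to the first node whose token is at or past that value, wrapping around the ring.

```python
import bisect
import hashlib
from collections import Counter

RING_SIZE = 2 ** 127  # assumed token space for this sketch


def token_for(key):
    """Hash a key with MD5 and fold the digest into the token space."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % RING_SIZE


def build_ring(node_names):
    """Give each node an evenly spaced token; returns sorted (token, node) pairs."""
    step = RING_SIZE // len(node_names)
    return [(i * step, name) for i, name in enumerate(node_names)]


def node_for(ring, key):
    """Pick the first node whose token is >= the key's token, wrapping around."""
    tokens = [token for token, _ in ring]
    index = bisect.bisect_left(tokens, token_for(key))
    return ring[index % len(ring)][1]


# Hypothetical 4-node cluster and 10,000 hypothetical keys.
ring = build_ring(["node1", "node2", "node3", "node4"])
counts = Counter(node_for(ring, "key%d" % i) for i in range(10000))
for name in sorted(counts):
    print(name, counts[name])
```

Because MD5 output is close to uniform, each node in this model ends up with roughly a quarter of the keys regardless of how the keys themselves were generated.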
When the test fails what value does the verify array have ? Is it null
or a previous value?
Aaron
On 20 Jul, 2010, at 08:22 AM, Hugo h...@unitedgames.com wrote:
See my test case attached below. In my setup it usually fails around
the 800th try...
import java.util.ArrayList;
import
What gets logged on the old nodes at debug, when you try to add a
single new machine after a full cluster restart?
Removing Location would blow away the nodes' token information... It
should be safe if you set the InitialToken to what it used to be on
each machine before bringing it up after
cassandra get system.LocationInfo['L']
Exception Internal error processing get_slice
What's wrong?
Thanks.
Shen
Hi all,
I am new to Cassandra...
I want to use Cassandra for a billing system.
I have read in many places that joins are not supported in BigTable-style
implementations, but I feel I need them for my app.
I am unable to get data from multiple tables (column families) like
products and inventory
As I am
Hi, Stuart,
If I may paraphrase what Jonathan said, typically your batch_mutate
operation is idempotent.
That is, you can replay / retry the same operation within a short timeframe
without any undesirable side effect.
The assumption behind the short timeframe here refers to: there is no
other
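
The idea of an idempotent retry can be shown with a toy model. This is not the batch_mutate API: the in-memory store, the `apply_mutation` helper, and the column names are hypothetical, and the sketch assumes each column write carries a timestamp and that the highest timestamp wins. Replaying the same mutation with the same timestamps then leaves the store unchanged.

```python
def apply_mutation(store, mutation):
    """Apply column writes; for each column the highest timestamp wins."""
    for column, (value, timestamp) in mutation.items():
        current = store.get(column)
        if current is None or timestamp >= current[1]:
            store[column] = (value, timestamp)
    return store


store = {}
# Hypothetical mutation: column name -> (value, client-supplied timestamp).
mutation = {"name": ("widget", 100), "price": ("9.99", 100)}

apply_mutation(store, mutation)
after_first = dict(store)

# Retry the identical mutation, for example after a timeout.
apply_mutation(store, mutation)
print(store == after_first)
```

The retry is safe in this model only because it reuses the original timestamps; a retry that generated fresh timestamps would be a new write rather than a replay.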
Cassandra may not be the best fit for a billing system. I'm guessing the lack
of transactions would be a problem if you want to update inventory levels.
If you want to get data from multiple column families you will need to make
multiple calls, or de-normalise the data so you can get all the data
It's the previous value. I've checked.
Regards, Hugo.
On 20 jul 2010, at 00:19, Aaron Morton aa...@thelastpickle.com wrote:
When the test fails what value does the verify array have ? Is it
null or a previous value?
Aaron
On 20 Jul, 2010,at 08:22 AM, Hugo h...@unitedgames.com wrote:
See