This is caused by Cassandra warming up as it starts. On startup it begins
fetching the rows that were cached; these have to be loaded from disk,
because there is nothing in the cache yet. You can read more
about this at http://wiki.apache.org/cassandra/LargeDataSetConsiderations
2012/2/13 R. Verlangen ro...@us2.nl
This is caused by Cassandra warming up as it starts. On startup it begins
fetching the rows that were cached; these have to be loaded from disk,
because there is nothing in the cache yet. You can read more
about this at
I also noticed that Cassandra appears to perform better under a continuous
load.
Are you sure the rows you're querying are actually in the cache?
2012/2/13 Franc Carter franc.car...@sirca.org.au
2012/2/13 R. Verlangen ro...@us2.nl
This is because of the warm up of Cassandra as it starts. On a
I actually have the opposite 'problem'. I have a pair of servers that have
been static since mid last week, but have seen performance vary
significantly (x10) for exactly the same query. I hypothesised it was
various caches so I shut down Cassandra, flushed the O/S buffer cache and
then brought
On Mon, Feb 13, 2012 at 7:21 PM, Peter Schuller peter.schul...@infidyne.com
wrote:
I actually have the opposite 'problem'. I have a pair of servers that have
been static since mid last week, but have seen performance vary
significantly (x10) for exactly the same query. I hypothesised it was
2012/2/13 R. Verlangen ro...@us2.nl
I also noticed that Cassandra appears to perform better under a continuous
load.
Are you sure the rows you're querying are actually in the cache?
I'm making an assumption . . . I don't yet know enough about cassandra to
prove they are in the cache. I have
Yep - I've been looking at these - I don't see anything in iostat/dstat etc
that point strongly to a problem. There is quite a bit of I/O load, but it
looks roughly uniform on slow and fast instances of the queries. The last
compaction ran 4 days ago - which was before I started seeing
I'm making an assumption . . . I don't yet know enough about cassandra to
prove they are in the cache. I have my keycache set to 2 million, and am
only querying ~900,000 keys. so after the first time I'm assuming they are
in the cache.
Note that the key cache only caches the index positions
For one thing, what does ReadStage's pending look like if you
repeatedly run nodetool tpstats on these nodes? If you're simply
bottlenecking on I/O on reads, that is the easiest and most direct way to
observe this empirically. If you're saturated, you'll see active close
to maximum at all times, and
On Mon, Feb 13, 2012 at 7:49 PM, Peter Schuller peter.schul...@infidyne.com
wrote:
I'm making an assumption . . . I don't yet know enough about cassandra
to
prove they are in the cache. I have my keycache set to 2 million, and am
only querying ~900,000 keys. so after the first time I'm
What is your total data size (nodetool info/nodetool ring) per node,
your heap size, and the amount of memory on the system?
--
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)
On Mon, Feb 13, 2012 at 7:48 PM, Peter Schuller peter.schul...@infidyne.com
wrote:
Yep - I've been looking at these - I don't see anything in iostat/dstat
etc
that point strongly to a problem. There is quite a bit of I/O load, but
it
looks roughly uniform on slow and fast instances of
On Mon, Feb 13, 2012 at 7:51 PM, Peter Schuller peter.schul...@infidyne.com
wrote:
For one thing, what does ReadStage's pending look like if you
repeatedly run nodetool tpstats on these nodes? If you're simply
bottlenecking on I/O on reads, that is the easiest and most direct way to
observe
the servers spending 50% of the time in io-wait
Note that I/O wait is not necessarily a good indicator, depending on
situation. In particular if you have multiple drives, I/O wait can
mostly be ignored. Similarly if you have non-trivial CPU usage in
addition to disk I/O, it is also not a good
Yep, the readstage is backlogging consistently - but the thing I am trying
to explain is why it is good sometimes in an environment that is pretty well
controlled - other than being on ec2
So pending is constantly 0? What are the clients? Is it batch jobs
or something similar where there is a
On Mon, Feb 13, 2012 at 8:00 PM, Peter Schuller peter.schul...@infidyne.com
wrote:
What is your total data size (nodetool info/nodetool ring) per node,
your heap size, and the amount of memory on the system?
2 Node cluster, 7.9GB of ram (ec2 m1.large)
RF=2
11GB per node
Quorum reads
122
2 Node cluster, 7.9GB of ram (ec2 m1.large)
RF=2
11GB per node
Quorum reads
122 million keys
heap size is 1867M (default from the AMI I am running)
I'm reading about 900k keys
Ok, so basically a very significant portion of the data fits in page
cache, but not all.
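The estimate above can be checked with some rough arithmetic. This is only a sketch using the figures quoted in this thread (7.9GB of RAM, a 1867M heap, 11GB of data per node); it assumes that everything outside the JVM heap is available to the page cache, which overstates the real figure because the OS and off-heap structures also need memory.

```python
# Rough upper bound on how much of one node's data can sit in the OS page cache.
# Figures are taken from the thread; the "everything outside the heap is page
# cache" simplification is an assumption and is optimistic.
MB_PER_GB = 1024

ram_mb = 7.9 * MB_PER_GB   # m1.large memory as reported
heap_mb = 1867             # JVM heap size as reported
data_mb = 11 * MB_PER_GB   # on-disk data per node as reported

page_cache_upper_bound_mb = ram_mb - heap_mb
cacheable_fraction = min(1.0, page_cache_upper_bound_mb / data_mb)

print(f"page cache upper bound: {page_cache_upper_bound_mb:.0f} MB")
print(f"fraction of data that could be cached: {cacheable_fraction:.0%}")
```

Under these assumptions a little over half of the data set can be cached at best, so some reads will always have to go to disk.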
As I was just going
On Mon, Feb 13, 2012 at 8:09 PM, Peter Schuller peter.schul...@infidyne.com
wrote:
the servers spending 50% of the time in io-wait
Note that I/O wait is not necessarily a good indicator, depending on
situation. In particular if you have multiple drives, I/O wait can
mostly be ignored.
Are there plans to write a partitioner based on a faster hash algorithm
than MD5? I profiled Cassandra and a lot of time is spent inside the MD5
function.
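For context, here is a minimal sketch of how an MD5-based token can be derived from a row key, with a crude timing loop. It assumes the token is the absolute value of the MD5 digest read as a signed big-endian integer; that is my understanding of RandomPartitioner, not something stated in this thread. The name `md5_token` and the sample key are made up for illustration.

```python
import hashlib
import timeit


def md5_token(key: bytes) -> int:
    """Hypothetical helper: map a row key to a token in [0, 2**127]."""
    digest = hashlib.md5(key).digest()
    # Read the 16-byte digest as a signed integer, then take the absolute value.
    return abs(int.from_bytes(digest, "big", signed=True))


sample_key = b"example-row-key"
token = md5_token(sample_key)

# Very rough per-call cost; the result depends entirely on the machine.
calls = 10_000
seconds_per_call = timeit.timeit(lambda: md5_token(sample_key), number=calls) / calls

print(f"token: {token}")
print(f"approx. {seconds_per_call * 1e6:.2f} microseconds per token")
```

A profile of a real node is the only reliable measure of how much this costs in practice; the loop above only shows that the hash is computed once per key lookup.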
On Mon, Feb 13, 2012 at 8:15 PM, Peter Schuller peter.schul...@infidyne.com
wrote:
2 Node cluster, 7.9GB of ram (ec2 m1.large)
RF=2
11GB per node
Quorum reads
122 million keys
heap size is 1867M (default from the AMI I am running)
I'm reading about 900k keys
Ok, so basically a
https://issues.apache.org/jira/browse/CASSANDRA-3772
2012/2/13 Radim Kolar h...@sendmail.cz:
Are there plans to write a partitioner based on a faster hash algorithm than
MD5? I profiled Cassandra and a lot of time is spent inside the MD5 function.
The Cassandra team is pleased to announce the release of Apache Cassandra
version 0.8.10.
Cassandra is a highly scalable second-generation distributed database,
bringing together Dynamo's fully distributed design and Bigtable's
ColumnFamily-based data model. You can read more here:
Hi Cassandra Users,
I have heard that indexing a field with high cardinality is not good. Suppose
we create a CF to store the index information, with the indexed field as the
row key and the keys of the original CF as columns in the row. Will there be
any performance improvement? Is this the way secondary indexes are
Hello everybody
I have a very simple cluster containing 2 servers. Replication_factor = 2,
Consistency_level of reads and writes = 1
10.111.1.141  datacenter1  rack1  Up  Normal  1.5 TB   100.00%  vjpigMzv4KkX3x7z
10.111.1.142  datacenter1  rack1  Up  Normal  1.41 TB
Hi Nikolay,
Some points that may be useful:
1/ auto_bootstrap = true is used for telling a new node to join the ring
(the cluster). It has nothing to do with hinted handoff
2/ both of your nodes seem to be using the same token? The output indicates
that 100% of your key range is assigned to
Sorry if this is a 4th copy of letter, but cassandra.apache.org constantly
tells me that my message looks like spam...
2/ both of your nodes seem to be using the same token? The output indicates
that 100% of your key range is assigned to 10.111.1.141 (and
therefore 10.111.1.142 holds
Hi Guys,
A very trivial question on batch mutation provided by Hector. Is the execution
of the batch sequential? (in the order data is added).
Also, say there are 10 operations in a batch and the 3rd fails: will it try
the remaining 7?
Is execution of batch mutator multi threaded ?
Regards,
Hi all,
I am nursing an overloaded 0.6 cluster through compaction to get its disk
usage under 50%. The content of many rows has been replaced so that after
compaction there will be plenty of room, but a couple of nodes are
currently at 95%.
One strategy I considered is temporarily moving a couple of
My understanding is you expected to see
111:ticks
222:ticks
333:ticks
444:ticks
But instead you are getting
111:ticks
111:quote
222:ticks
222:quote
333:ticks
333:quote
444:ticks
If that is the case things are working as expected.
The slice operation gets a column range. So if you start at
What CL are you reading at ?
Write ops go to RF nodes. Read ops go to RF nodes 10% of the time (the
default probability that Read Repair will run) and to CL nodes the other 90%
of the time. With 2 nodes and RF 2, QUORUM is 2, so every request
will involve all nodes.
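The arithmetic behind that statement can be sketched as follows. The function names are made up for illustration; the 10% read repair chance is the default mentioned above, and the quorum formula is the usual majority rule (RF / 2, rounded down, plus 1).

```python
def quorum(replication_factor: int) -> int:
    """Number of replicas that make up a majority of RF."""
    return replication_factor // 2 + 1


def expected_replicas_per_read(replication_factor: int,
                               cl_nodes: int,
                               read_repair_chance: float = 0.1) -> float:
    """Average number of replicas a read touches.

    With probability read_repair_chance the read goes to all RF replicas,
    otherwise only to the cl_nodes replicas the consistency level requires.
    """
    return (read_repair_chance * replication_factor
            + (1.0 - read_repair_chance) * cl_nodes)


rf = 2
q = quorum(rf)
print(f"RF={rf}: QUORUM needs {q} replicas")
print(f"average replicas per QUORUM read: {expected_replicas_per_read(rf, q):.2f}")
print(f"average replicas per CL ONE read: {expected_replicas_per_read(rf, 1):.2f}")
```

With RF 2, QUORUM equals RF, so read repair makes no difference to how many nodes are involved; only a read at CL ONE would touch a single replica most of the time.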
Heard that indexing a field with high cardinality is not good.
http://www.datastax.com/docs/0.7/data_model/secondary_indexes
Will there be any performance improvement? Is this the way secondary indexes
are maintained?
Updating secondary indexes requires a read and a write.
Also this makes
Sorry if this is a 4th copy of letter, but cassandra.apache.org constantly
tells me that my message looks like spam…
Send as text.
What version are you using ?
It looks like you are using the ByteOrderedPartitioner , is that correct ?
I would try to get the repair done first, what was the
Is the execution of the batch sequential? (in the order data is added).
No, it runs in parallel; see concurrent_writes in cassandra.yaml
Also say there are 10 operations in a batch and 3rd fails will it try the
remaining 7?
http://wiki.apache.org/cassandra/FAQ#batch_mutate_atomic
Cheers
If the composite column was rearranged as ticks:111, wouldn't the result be as
desired?
----- Original Message -----
From: "aaron morton" <aa...@thelastpickle.com>
If you want to get all the tick between two integers yes.
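A small model of why the component order matters, using Python tuples in place of composite column names. This is only an illustration: tuple comparison stands in for the composite comparator, the slice is treated as an inclusive range over sorted names, and the exact columns returned by a real cluster depend on the comparator configured for the column family.

```python
# Composite column names modelled as tuples; sorted() stands in for the
# on-disk column order. All names here are illustrative.
ids = (111, 222, 333, 444)
kinds = ("ticks", "quote")


def column_slice(columns, start, finish):
    """Return every column name in the inclusive range [start, finish]."""
    return [name for name in columns if start <= name <= finish]


# Layout 1: id first, then kind. Kinds for the same id are adjacent.
id_first = sorted((i, kind) for i in ids for kind in kinds)
mixed = column_slice(id_first, (111, "ticks"), (444, "ticks"))

# Layout 2: kind first, then id. All ticks columns are contiguous.
kind_first = sorted((kind, i) for i in ids for kind in kinds)
ticks_only = column_slice(kind_first, ("ticks", 111), ("ticks", 444))

print(mixed)
print(ticks_only)
```

In the first layout the range necessarily spans quote columns that sort between the two end points; in the second, a range between two integers under the same leading component returns only ticks columns.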
A
-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com
On 14/02/2012, at 8:36 AM, Dave Brosius wrote:
if the composite column was rearranged as
ticks:111
wouldn't the result be as
Too easy. Does anybody have a more difficult approach? :) Just kidding.
Thanks, Aaron.
On Mon, Feb 13, 2012 at 11:43 AM, aaron morton aa...@thelastpickle.com wrote:
I am nursing an overloaded 0.6 cluster
Shine on you crazy diamond.
If you have some additional storage available I would:
1)
Hi All
I am using expiring columns in my column family, and need to search for
the rows where a particular column expired (and no longer exists). I am
using Hector client. How can I make a query to find the rows of my interest?
thanks
asankha
--
Asankha C. Perera
AdroitLogic,
Hi all,
Those in the UK might be interested in the next Cassandra London events:
Monday 20th February
Two talks: Cassandra as an email storage system and CQL - then and now
http://www.meetup.com/Cassandra-London/events/29569461/
Tuesday 6th March
How Netflix uses Cassandra with Adrian
Hi Experts,
My program is such that it queries all keys on Cassandra. I want to do this
as quickly as possible, in order to get as close to real-time as possible.
One solution I heard was to use the sstables2json tool, and read the data
in as JSON. I understand that reading from each line in
On Tue, Feb 14, 2012 at 6:06 AM, aaron morton aa...@thelastpickle.com wrote:
What CL are you reading at ?
Quorum
Write ops go to RF number of nodes, read ops go to RF number of nodes 10%
(the default probability that Read Repair will be running) of the time and
CL number of nodes 90% of
Hi
I got the exception below in system.log after upgrading from 1.0.6 to
1.0.7. I am using the same configuration files that I used with version
1.0.6.
2012-02-14 10:48:12,379 ERROR [AbstractCassandraDaemon] Fatal exception in
thread Thread[OptionalTasks:1,5,main]
Perfect, Aaron, Thanks a lot
From: aaron morton [mailto:aa...@thelastpickle.com]
Sent: Tuesday, February 14, 2012 12:54 AM
To: user@cassandra.apache.org
Subject: Re: Secondary indexes and cardinality
Heard that indexing a field with high cardinality is not good.