Re: keycache persisted to disk ?

2012-02-13 Thread R. Verlangen
This is because of the warm up of Cassandra as it starts. On a start it will start fetching the rows that were cached: this will have to be loaded from the disk, as there is nothing in the cache yet. You can read more about this at http://wiki.apache.org/cassandra/LargeDataSetConsiderations

Re: keycache persisted to disk ?

2012-02-13 Thread Franc Carter
2012/2/13 R. Verlangen ro...@us2.nl This is because of the warm up of Cassandra as it starts. On a start it will start fetching the rows that were cached: this will have to be loaded from the disk, as there is nothing in the cache yet. You can read more about this at

Re: keycache persisted to disk ?

2012-02-13 Thread R. Verlangen
I also noticed that Cassandra appears to perform better under a continuous load. Are you sure the rows you're querying are actually in the cache? 2012/2/13 Franc Carter franc.car...@sirca.org.au 2012/2/13 R. Verlangen ro...@us2.nl This is because of the warm up of Cassandra as it starts. On a

Re: keycache persisted to disk ?

2012-02-13 Thread Peter Schuller
I actually have the opposite 'problem'. I have a pair of servers that have been static since mid last week, but have seen performance vary significantly (x10) for exactly the same query. I hypothesised it was various caches so I shut down Cassandra, flushed the O/S buffer cache and then bought

Re: keycache persisted to disk ?

2012-02-13 Thread Franc Carter
On Mon, Feb 13, 2012 at 7:21 PM, Peter Schuller peter.schul...@infidyne.com wrote: I actually have the opposite 'problem'. I have a pair of servers that have been static since mid last week, but have seen performance vary significantly (x10) for exactly the same query. I hypothesised it was

Re: keycache persisted to disk ?

2012-02-13 Thread Franc Carter
2012/2/13 R. Verlangen ro...@us2.nl I also noticed that Cassandra appears to perform better under a continuous load. Are you sure the rows you're querying are actually in the cache? I'm making an assumption . . . I don't yet know enough about Cassandra to prove they are in the cache. I have

Re: keycache persisted to disk ?

2012-02-13 Thread Peter Schuller
Yep - I've been looking at these - I don't see anything in iostat/dstat etc that points strongly to a problem. There is quite a bit of I/O load, but it looks roughly uniform on slow and fast instances of the queries. The last compaction ran 4 days ago - which was before I started seeing

Re: keycache persisted to disk ?

2012-02-13 Thread Peter Schuller
I'm making an assumption . . . I don't yet know enough about Cassandra to prove they are in the cache. I have my keycache set to 2 million, and am only querying ~900,000 keys, so after the first time I'm assuming they are in the cache. Note that the key cache only caches the index positions

Re: keycache persisted to disk ?

2012-02-13 Thread Peter Schuller
For one thing, what does ReadStage's pending look like if you repeatedly run nodetool tpstats on these nodes? If you're simply bottlenecking on I/O on reads, that is the easiest and most direct way to observe this empirically. If you're saturated, you'll see active close to maximum at all times, and
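The repeated `nodetool tpstats` check suggested above could be scripted roughly as follows. This is only a sketch: it assumes `nodetool` is on the PATH and talks to the local node, and it assumes the ReadStage row lists the pool name followed by the active and pending counts. `SAMPLE_TPSTATS` is made-up illustrative text, not output captured from a real node, and the function names are hypothetical.

```python
import subprocess
import time

# Illustrative text only: the column layout is an assumption and the
# numbers are invented. It is not output from a real cluster.
SAMPLE_TPSTATS = """\
Pool Name                    Active   Pending      Completed
ReadStage                        32       117        1234567
MutationStage                     0         0         765432
"""


def parse_read_stage(tpstats_text):
    """Return (active, pending) for the ReadStage pool, or None if absent."""
    for line in tpstats_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0] == "ReadStage":
            return int(fields[1]), int(fields[2])
    return None


def poll_read_stage(samples=10, interval_seconds=1.0):
    """Run `nodetool tpstats` repeatedly and print ReadStage (active, pending)."""
    for _ in range(samples):
        result = subprocess.run(
            ["nodetool", "tpstats"],
            capture_output=True,
            text=True,
            check=True,
        )
        print(parse_read_stage(result.stdout))
        time.sleep(interval_seconds)


if __name__ == "__main__":
    # Parse the illustrative sample; call poll_read_stage() on a real node.
    print(parse_read_stage(SAMPLE_TPSTATS))
```

A pending count that stays well above zero across samples, with active pinned near its maximum, would be the saturation pattern described in the message.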

Re: keycache persisted to disk ?

2012-02-13 Thread Franc Carter
On Mon, Feb 13, 2012 at 7:49 PM, Peter Schuller peter.schul...@infidyne.com wrote: I'm making an assumption . . . I don't yet know enough about Cassandra to prove they are in the cache. I have my keycache set to 2 million, and am only querying ~900,000 keys, so after the first time I'm

Re: keycache persisted to disk ?

2012-02-13 Thread Peter Schuller
What is your total data size (nodetool info/nodetool ring) per node, your heap size, and the amount of memory on the system? -- / Peter Schuller (@scode, http://worldmodscode.wordpress.com)

Re: keycache persisted to disk ?

2012-02-13 Thread Franc Carter
On Mon, Feb 13, 2012 at 7:48 PM, Peter Schuller peter.schul...@infidyne.com wrote: Yep - I've been looking at these - I don't see anything in iostat/dstat etc that points strongly to a problem. There is quite a bit of I/O load, but it looks roughly uniform on slow and fast instances of

Re: keycache persisted to disk ?

2012-02-13 Thread Franc Carter
On Mon, Feb 13, 2012 at 7:51 PM, Peter Schuller peter.schul...@infidyne.com wrote: For one thing, what does ReadStage's pending look like if you repeatedly run nodetool tpstats on these nodes? If you're simply bottlenecking on I/O on reads, that is the easiest and most direct way to observe

Re: keycache persisted to disk ?

2012-02-13 Thread Peter Schuller
the servers spending 50% of the time in io-wait Note that I/O wait is not necessarily a good indicator, depending on situation. In particular if you have multiple drives, I/O wait can mostly be ignored. Similarly if you have non-trivial CPU usage in addition to disk I/O, it is also not a good

Re: keycache persisted to disk ?

2012-02-13 Thread Peter Schuller
Yep, the readstage is backlogging consistently - but the thing I am trying to explain is why it is good sometimes in an environment that is pretty well controlled - other than being on ec2 So pending is constantly 0? What are the clients? Is it batch jobs or something similar where there is a

Re: keycache persisted to disk ?

2012-02-13 Thread Franc Carter
On Mon, Feb 13, 2012 at 8:00 PM, Peter Schuller peter.schul...@infidyne.com wrote: What is your total data size (nodetool info/nodetool ring) per node, your heap size, and the amount of memory on the system? 2-node cluster; 7.9GB of RAM (ec2 m1.large); RF=2; 11GB per node; Quorum reads; 122

Re: keycache persisted to disk ?

2012-02-13 Thread Peter Schuller
2-node cluster; 7.9GB of RAM (ec2 m1.large); RF=2; 11GB per node; Quorum reads; 122 million keys; heap size is 1867M (default from the AMI I am running). I'm reading about 900k keys. Ok, so basically a very significant portion of the data fits in page cache, but not all. As I was just going
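As a rough check on the page cache remark, the quoted figures (7.9GB of RAM, a 1867M heap, 11GB of data per node) can be turned into an upper bound on the fraction of data that could sit in page cache. The sketch below is a back-of-the-envelope calculation only: it assumes everything outside the JVM heap is available for page cache, which ignores the OS, off-heap JVM memory and other processes, so the real figure would be lower. The function name is hypothetical.

```python
def page_cache_fraction(ram_gb, heap_mb, data_gb):
    """Upper bound on the fraction of on-disk data that fits in page cache.

    Assumes all RAM not used by the JVM heap is usable as page cache,
    which is optimistic.
    """
    ram_mb = ram_gb * 1024
    data_mb = data_gb * 1024
    available_mb = ram_mb - heap_mb
    return min(1.0, available_mb / data_mb)


if __name__ == "__main__":
    fraction = page_cache_fraction(ram_gb=7.9, heap_mb=1867, data_gb=11)
    print(f"At most about {fraction:.0%} of the data fits in page cache")
```

With these numbers the bound comes out at roughly 55%, which is consistent with a large part of the data being cached, but not all of it.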

Re: keycache persisted to disk ?

2012-02-13 Thread Franc Carter
On Mon, Feb 13, 2012 at 8:09 PM, Peter Schuller peter.schul...@infidyne.com wrote: the servers spending 50% of the time in io-wait Note that I/O wait is not necessarily a good indicator, depending on situation. In particular if you have multiple drives, I/O wait can mostly be ignored.

murmurhash partitioner

2012-02-13 Thread Radim Kolar
Are there plans to write a partitioner based on a faster hash algorithm instead of MD5? I profiled Cassandra and a lot of time is spent inside the MD5 function.

Re: keycache persisted to disk ?

2012-02-13 Thread Franc Carter
On Mon, Feb 13, 2012 at 8:15 PM, Peter Schuller peter.schul...@infidyne.com wrote: 2-node cluster; 7.9GB of RAM (ec2 m1.large); RF=2; 11GB per node; Quorum reads; 122 million keys; heap size is 1867M (default from the AMI I am running). I'm reading about 900k keys. Ok, so basically a

Re: murmurhash partitioner

2012-02-13 Thread Sylvain Lebresne
https://issues.apache.org/jira/browse/CASSANDRA-3772 2012/2/13 Radim Kolar h...@sendmail.cz: Are there plans to write a partitioner based on a faster hash algorithm instead of MD5? I profiled Cassandra and a lot of time is spent inside the MD5 function.

[RELEASE] Apache Cassandra 0.8.10 released

2012-02-13 Thread Sylvain Lebresne
The Cassandra team is pleased to announce the release of Apache Cassandra version 0.8.10. Cassandra is a highly scalable second-generation distributed database, bringing together Dynamo's fully distributed design and Bigtable's ColumnFamily-based data model. You can read more here:

Secondary indexes and cardinality

2012-02-13 Thread Tiwari, Dushyant
Hi Cassandra Users, Heard that indexing a field with high cardinality is not good. If we create a CF to store the index information, with the indexed field as the row key and the keys of the original CF as columns in the row, will there be any performance improvement? Is this the way secondary indexes are
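The hand-rolled index layout described in the question (indexed value as the row key, keys of the original CF as columns) can be pictured with a small in-memory model. This is a sketch only: plain Python dictionaries stand in for the two column families, and the class, method and field names are hypothetical. It is not a statement about how Cassandra maintains its built-in secondary indexes.

```python
from collections import defaultdict


class IndexedStore:
    """Toy model: a data CF plus a manually maintained index CF."""

    def __init__(self, indexed_field):
        self.indexed_field = indexed_field
        self.rows = {}                  # data CF: row key -> {column: value}
        self.index = defaultdict(set)   # index CF: indexed value -> row keys

    def put(self, row_key, columns):
        # Read the old indexed value first so a stale index entry can be removed.
        old = self.rows.get(row_key, {}).get(self.indexed_field)
        new = columns.get(self.indexed_field, old)
        if old is not None and old != new:
            self.index[old].discard(row_key)
            if not self.index[old]:
                del self.index[old]
        if new is not None:
            self.index[new].add(row_key)
        self.rows.setdefault(row_key, {}).update(columns)

    def find(self, value):
        """Return the row keys whose indexed field equals `value`."""
        return sorted(self.index.get(value, ()))


if __name__ == "__main__":
    store = IndexedStore("city")
    store.put("u1", {"city": "Pune"})
    store.put("u2", {"city": "Pune"})
    store.put("u1", {"city": "Delhi"})
    print(store.find("Pune"), store.find("Delhi"))
```

Note that each update has to read the previous value before writing, otherwise the index row for the old value keeps pointing at the key.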

How to bring cluster to consistency

2012-02-13 Thread Nikolay Kоvshov
Hello everybody, I have a very simple cluster containing 2 servers. Replication_factor = 2, Consistency_level of reads and writes = 1. 10.111.1.141 datacenter1 rack1 Up Normal 1.5 TB 100.00% vjpigMzv4KkX3x7z 10.111.1.142 datacenter1 rack1 Up Normal 1.41 TB

Re: How to bring cluster to consistency

2012-02-13 Thread Dominic Williams
Hi Nikolay, Some points that may be useful: 1/ auto_bootstrap = true is used for telling a new node to join the ring (the cluster). It has nothing to do with hinted handoff. 2/ both of your nodes seem to be using the same token? The output indicates that 100% of your key range is assigned to

Re: How to bring cluster to consistency

2012-02-13 Thread Nikolay Kоvshov
Sorry if this is a 4th copy of this letter, but cassandra.apache.org constantly tells me that my message looks like spam... 2/ both of your nodes seem to be using the same token? The output indicates that 100% of your key range is assigned to 10.111.1.141 (and therefore 10.111.1.142 holds

Hector and batch mutation

2012-02-13 Thread Tiwari, Dushyant
Hi Guys, A very trivial question on batch mutation provided by Hector. Is the execution of the batch sequential? (in the order data is added). Also say there are 10 operations in a batch and the 3rd fails, will it try the remaining 7? Is execution of the batch mutator multi-threaded? Regards,

SSTable symlinks

2012-02-13 Thread Dan Retzlaff
Hi all, I am nursing an overloaded 0.6 cluster through compaction to get its disk usage under 50%. Many rows' content has been replaced so that after compaction there will be plenty of room, but a couple of nodes are currently at 95%. One strategy I considered is temporarily moving a couple of

Re: problem with sliceQuery with composite column

2012-02-13 Thread aaron morton
My understanding is you expected to see 111:ticks 222:ticks 333:ticks 444:ticks But instead you are getting 111:ticks 111:quote 222:ticks 222:quote 333:ticks 333:quote 444:ticks If that is the case things are working as expected. The slice operation gets a column range. So if you start at
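The point that a slice returns a contiguous column range can be illustrated with sorted tuples. This is a sketch under assumptions: Python tuple ordering stands in for the composite comparator (the real ordering depends on the comparator configured for the column family), and `slice_columns` is a hypothetical helper, not a Hector or Cassandra call.

```python
from bisect import bisect_left, bisect_right


def slice_columns(sorted_names, start, finish):
    """Return every column name in the inclusive range [start, finish]."""
    lo = bisect_left(sorted_names, start)
    hi = bisect_right(sorted_names, finish)
    return sorted_names[lo:hi]


IDS = [111, 222, 333, 444]
KINDS = ("ticks", "quote")

# Layout 1: integer first, then the name, as in the thread (111:ticks, ...).
by_id = sorted((i, kind) for i in IDS for kind in KINDS)

# Layout 2: name first, then the integer (ticks:111, ...).
by_kind = sorted((kind, i) for i in IDS for kind in KINDS)

if __name__ == "__main__":
    # The range covers everything sorted between the bounds, quotes included.
    print(slice_columns(by_id, (111, "ticks"), (444, "ticks")))
    # With the name first, all ticks columns are adjacent.
    print(slice_columns(by_kind, ("ticks", 111), ("ticks", 444)))
```

In the first layout the quote columns fall inside the range and come back with the ticks; in the second, the four ticks columns form one contiguous run.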

Re: active/pending queue lengths

2012-02-13 Thread aaron morton
What CL are you reading at? Write ops go to RF nodes. Read ops go to RF nodes 10% of the time (the default probability that Read Repair will run) and to CL nodes the other 90% of the time. With 2 nodes and RF 2, QUORUM is 2, so every request will involve all nodes.
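The arithmetic behind this can be written out as a short sketch. It assumes the usual majority rule for quorum (RF // 2 + 1) and takes the 10% read repair probability from the message above; the function names are hypothetical.

```python
def quorum(replication_factor):
    """Quorum size: a strict majority of the replicas."""
    return replication_factor // 2 + 1


def expected_replicas_per_read(replication_factor, cl_nodes, read_repair_chance=0.1):
    """Average number of replicas a read touches.

    With probability `read_repair_chance` the read goes to all RF replicas,
    otherwise only to the number of replicas the consistency level requires.
    """
    return (read_repair_chance * replication_factor
            + (1.0 - read_repair_chance) * cl_nodes)


if __name__ == "__main__":
    rf = 2
    cl = quorum(rf)
    print(cl, expected_replicas_per_read(rf, cl))
```

For RF 2 the quorum is 2, so both terms contribute the same node count and every read touches both replicas regardless of read repair. With RF 3 and QUORUM the average would drop to about 2.1 replicas per read.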

Re: Secondary indexes and cardinality

2012-02-13 Thread aaron morton
Heard that indexing a field with high cardinality is not good. http://www.datastax.com/docs/0.7/data_model/secondary_indexes Will there be any performance improvement? Is this the way secondary indexes are maintained? Updating secondary indexes requires a read and a write. Also this makes

Re: How to bring cluster to consistency

2012-02-13 Thread aaron morton
Sorry if this is a 4th copy of this letter, but cassandra.apache.org constantly tells me that my message looks like spam… Send as text. What version are you using? It looks like you are using the ByteOrderedPartitioner, is that correct? I would try to get the repair done first, what was the

Re: Hector and batch mutation

2012-02-13 Thread aaron morton
Is the execution of the batch sequential? (in the order data is added). No, parallel; see concurrent_writes in cassandra.yaml. Also say there are 10 operations in a batch and the 3rd fails, will it try the remaining 7? http://wiki.apache.org/cassandra/FAQ#batch_mutate_atomic Cheers

Re: problem with sliceQuery with composite column

2012-02-13 Thread Dave Brosius
If the composite column was rearranged as ticks:111, wouldn't the result be as desired? - Original Message - From: "aaron morton" aa...@thelastpickle.com

Re: problem with sliceQuery with composite column

2012-02-13 Thread aaron morton
If you want to get all the ticks between two integers, yes. A - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 14/02/2012, at 8:36 AM, Dave Brosius wrote: if the composite column was rearranged as ticks:111 wouldn't the result be as

Re: SSTable symlinks

2012-02-13 Thread Dan Retzlaff
Too easy. Does anybody have a more difficult approach? :) Just kidding. Thanks, Aaron. On Mon, Feb 13, 2012 at 11:43 AM, aaron morton aa...@thelastpickle.com wrote: I am nursing an overloaded 0.6 cluster Shine on you crazy diamond. If you have some additional storage available I would: 1)

Querying for rows without a particular column

2012-02-13 Thread Asankha C. Perera
Hi All, I am using expiring columns in my column family, and need to search for the rows where a particular column expired (and no longer exists). I am using the Hector client. How can I make a query to find the rows of interest? thanks asankha -- Asankha C. Perera AdroitLogic,

London meetup - upcoming events

2012-02-13 Thread Dave Gardner
Hi all, Those in the UK might be interested in the next Cassandra London events: Monday 20th February Two talks: Cassandra as an email storage system and CQL - then and now http://www.meetup.com/Cassandra-London/events/29569461/ Tuesday 6th March How Netflix uses Cassandra with Adrian

Querying all keys in a column family

2012-02-13 Thread Martin Arrowsmith
Hi Experts, My program is such that it queries all keys on Cassandra. I want to do this as quickly as possible, in order to get as close to real-time as possible. One solution I heard was to use the sstables2json tool, and read the data in as JSON. I understand that reading from each line in

Re: active/pending queue lengths

2012-02-13 Thread Franc Carter
On Tue, Feb 14, 2012 at 6:06 AM, aaron morton aa...@thelastpickle.com wrote: What CL are you reading at ? Quorum Write ops go to RF number of nodes, read ops go to RF number of nodes 10% (the default probability that Read Repair will be running) of the time and CL number of nodes 90% of

Got fatal exception after upgrade to 1.0.7 from 1.0.6

2012-02-13 Thread Roshan
Hi, I got the exception below in system.log after upgrading from 1.0.6 to 1.0.7. I am using the same configuration files that I used with 1.0.6. 2012-02-14 10:48:12,379 ERROR [AbstractCassandraDaemon] Fatal exception in thread Thread[OptionalTasks:1,5,main]

RE: Secondary indexes and cardinality

2012-02-13 Thread Tiwari, Dushyant
Perfect, Aaron, Thanks a lot From: aaron morton [mailto:aa...@thelastpickle.com] Sent: Tuesday, February 14, 2012 12:54 AM To: user@cassandra.apache.org Subject: Re: Secondary indexes and cardinality Heard that indexing a field with high cardinality is not good.