Re: Large number of row keys in query kills cluster

2014-06-12 Thread Peter Sanford
On Wed, Jun 11, 2014 at 9:17 PM, Jack Krupansky j...@basetechnology.com wrote: Hmmm... that multiple-gets section is not present in the 2.0 doc: http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/architecturePlanningAntiPatterns_c.html Was that intentional – is that

Re: Large number of row keys in query kills cluster

2014-06-12 Thread Jeremy Jongsma
Good to know, thanks Peter. I am worried about client-to-node latency if I have to do 20,000 individual queries, but that makes it clearer that at least batching in smaller sizes is a good idea. On Wed, Jun 11, 2014 at 6:34 PM, Peter Sanford psanf...@retailnext.net wrote: On Wed, Jun 11, 2014

Re: Large number of row keys in query kills cluster

2014-06-12 Thread Laing, Michael
Just an FYI, my benchmarking of the new python driver, which uses the asynchronous CQL native transport, indicates that one can largely overcome client-to-node latency effects if you employ a suitable level of concurrency and non-blocking techniques. Of course response size and other factors come
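
A minimal sketch of the "suitable level of concurrency and non-blocking techniques" idea, using only Python's standard asyncio. It is not the driver's API: `fetch_one` is a hypothetical coroutine standing in for whatever non-blocking single-row read the client offers, and the `concurrency` default of 64 is an arbitrary assumption, not a benchmarked value.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Iterable


async def fetch_all(
    keys: Iterable[str],
    fetch_one: Callable[[str], Awaitable[object]],  # hypothetical single-row read
    concurrency: int = 64,  # assumed cap on in-flight requests; tune per cluster
) -> Dict[str, object]:
    """Fetch every key with at most `concurrency` requests in flight."""
    semaphore = asyncio.Semaphore(concurrency)

    async def bounded(key: str):
        # The semaphore keeps the client from flooding the nodes
        # while still overlapping many round trips.
        async with semaphore:
            return key, await fetch_one(key)

    pairs = await asyncio.gather(*(bounded(key) for key in keys))
    return dict(pairs)
```

Because the round trips overlap, the total wall-clock time is governed by the concurrency cap and per-request latency rather than by the sum of all individual latencies.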

Re: Large number of row keys in query kills cluster

2014-06-11 Thread Jeremy Jongsma
I'm using Astyanax with a query like this:

    clusterContext
        .getClient()
        .getKeyspace(instruments)
        .prepareQuery(INSTRUMENTS_CF)
        .setConsistencyLevel(ConsistencyLevel.CL_LOCAL_QUORUM)
        .getKeySlice(new String[] {
            ROW1,
            ROW2,
            // 20,000 keys here...
            ROW2
        })

Re: Large number of row keys in query kills cluster

2014-06-11 Thread Jeremy Jongsma
The big problem seems to have been requesting a large number of row keys combined with a large number of named columns in a query. 20K rows with 20K columns destroyed my cluster. Splitting it into slices of 100 sequential queries fixed the performance issue. When updating 20K rows at a time, I
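
A rough sketch of the slicing workaround described above, read here as "split the key list into slices of 100 keys and query them one after another". `fetch_rows` is a hypothetical callable standing in for the real multi-key read (for example, a wrapper around the Astyanax key-slice query); it is not part of any library.

```python
from typing import Callable, Dict, Iterator, Sequence


def chunked(keys: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of `keys`, each holding at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(keys), size):
        yield keys[start:start + size]


def fetch_in_slices(
    keys: Sequence[str],
    fetch_rows: Callable[[Sequence[str]], Dict[str, object]],  # hypothetical multi-key read
    slice_size: int = 100,  # the slice size reported in the thread
) -> Dict[str, object]:
    """Run one query per slice, sequentially, and merge the results."""
    results: Dict[str, object] = {}
    for key_slice in chunked(keys, slice_size):
        # Each request now touches at most `slice_size` rows.
        results.update(fetch_rows(key_slice))
    return results
```

With 20,000 keys and a slice size of 100 this issues 200 sequential requests, so no single request asks the cluster for more than 100 rows at once.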

Re: Large number of row keys in query kills cluster

2014-06-11 Thread Robert Coli
On Wed, Jun 11, 2014 at 10:12 AM, Jeremy Jongsma jer...@barchart.com wrote: Is there any documentation on this? Obviously these limits will vary by cluster capacity, but for new users it would be great to know that you can run into problems with large queries, and how they present themselves

Re: Large number of row keys in query kills cluster

2014-06-11 Thread Peter Sanford
On Wed, Jun 11, 2014 at 10:12 AM, Jeremy Jongsma jer...@barchart.com wrote: The big problem seems to have been requesting a large number of row keys combined with a large number of named columns in a query. 20K rows with 20K columns destroyed my cluster. Splitting it into slices of 100

Re: Large number of row keys in query kills cluster

2014-06-11 Thread Jack Krupansky
batches” as an anti-pattern: http://www.slideshare.net/mattdennis -- Jack Krupansky From: Peter Sanford Sent: Wednesday, June 11, 2014 7:34 PM To: user@cassandra.apache.org Subject: Re: Large number of row keys in query kills cluster On Wed, Jun 11, 2014 at 10:12 AM, Jeremy Jongsma jer

Large number of row keys in query kills cluster

2014-06-10 Thread Jeremy Jongsma
I ran an application today that attempted to fetch 20,000+ unique row keys in one query against a set of completely empty column families. On a 4-node cluster (EC2 m1.large instances) with the recommended memory settings (2 GB heap), every single node immediately ran out of memory and became

Re: Large number of row keys in query kills cluster

2014-06-10 Thread DuyHai Doan
Hello Jeremy, Basically what you are doing is asking Cassandra to do a distributed full scan on all the partitions across the cluster; it's normal that the nodes are somewhat stressed. How did you make the query? Are you using the Thrift or CQL3 API? Please note that there is another way to get

Re: Large number of row keys in query kills cluster

2014-06-10 Thread Jeremy Jongsma
I didn't explain clearly - I'm not requesting 20,000 unknown keys (resulting in a full scan), I'm requesting 20,000 specific rows by key. On Jun 10, 2014 6:02 PM, DuyHai Doan doanduy...@gmail.com wrote: Hello Jeremy Basically what you are doing is to ask Cassandra to do a distributed full scan

Re: Large number of row keys in query kills cluster

2014-06-10 Thread Laing, Michael
Perhaps if you described both the schema and the query in more detail, we could help... e.g. did the query have an IN clause with 20,000 keys? Or is the key compound? More detail will help. On Tue, Jun 10, 2014 at 7:15 PM, Jeremy Jongsma jer...@barchart.com wrote: I didn't explain clearly - I'm