On Wed, Jun 11, 2014 at 9:17 PM, Jack Krupansky j...@basetechnology.com
wrote:
Hmmm... that multiple-gets section is not present in the 2.0 doc:
http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/architecturePlanningAntiPatterns_c.html
Was that intentional – is that
Good to know, thanks Peter. I am worried about client-to-node latency if I
have to do 20,000 individual queries, but this makes it clear that batching
in smaller sizes is at least a good idea.
On Wed, Jun 11, 2014 at 6:34 PM, Peter Sanford psanf...@retailnext.net
wrote:
On Wed, Jun 11, 2014
Just an FYI, my benchmarking of the new Python driver, which uses the
asynchronous CQL native transport, indicates that you can largely overcome
client-to-node latency effects if you employ a suitable level of
concurrency and non-blocking techniques.
Of course response size and other factors come
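Peter's point above can be sketched in outline. The snippet below is a minimal, hypothetical illustration of bounded concurrency, not the actual driver API: fetchRow is a stand-in for a real per-key async query, and the fixed-size pool bounds how many requests are in flight at once.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ConcurrentFetch {
    // Stand-in for a per-key query; a real version would invoke the
    // driver's async execute method instead of computing a string.
    static String fetchRow(String key) {
        return "value-for-" + key;
    }

    // Issue all lookups on a bounded pool and collect the results, rather
    // than paying one network round trip per key sequentially.
    static List<String> fetchAll(List<String> keys, int concurrency) {
        ExecutorService pool = Executors.newFixedThreadPool(concurrency);
        try {
            List<CompletableFuture<String>> futures = new ArrayList<>();
            for (String key : keys) {
                futures.add(CompletableFuture.supplyAsync(() -> fetchRow(key), pool));
            }
            List<String> results = new ArrayList<>();
            for (CompletableFuture<String> f : futures) {
                results.add(f.join()); // preserves submission order
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

With a real driver, the same shape applies: submit the async queries up front, then wait on the futures, keeping the in-flight count bounded so you don't overwhelm the coordinator.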
I'm using Astyanax with a query like this:
clusterContext
    .getClient()
    .getKeyspace(instruments)
    .prepareQuery(INSTRUMENTS_CF)
    .setConsistencyLevel(ConsistencyLevel.CL_LOCAL_QUORUM)
    .getKeySlice(new String[] {
        ROW1,
        ROW2,
        // 20,000 keys here...
        ROW2
    })
    .execute();
The big problem seems to have been requesting a large number of row keys
combined with a large number of named columns in a query. 20K rows with 20K
columns destroyed my cluster. Splitting it into slices of 100 sequential
queries fixed the performance issue.
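The "slices of 100" fix Jeremy describes can be sketched as a simple partitioning helper (a hypothetical illustration; the class and method names are mine, and each slice would then be passed to a separate getKeySlice call):

```java
import java.util.ArrayList;
import java.util.List;

public class KeySlicer {
    // Split a large key list into fixed-size slices so each query stays
    // small; the slice size (100 in the thread) is tunable per cluster.
    static <T> List<List<T>> slices(List<T> keys, int sliceSize) {
        List<List<T>> out = new ArrayList<>();
        for (int i = 0; i < keys.size(); i += sliceSize) {
            // copy the sublist so each slice is independent of the source
            out.add(new ArrayList<>(keys.subList(i, Math.min(i + sliceSize, keys.size()))));
        }
        return out;
    }
}
```

Each slice is then issued as its own query, sequentially or with bounded concurrency, so no single request asks the coordinator to materialize 20,000 rows at once.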
When updating 20K rows at a time, I
On Wed, Jun 11, 2014 at 10:12 AM, Jeremy Jongsma jer...@barchart.com
wrote:
Is there any documentation on this? Obviously these limits will vary by
cluster capacity, but for new users it would be great to know that you can
run into problems with large queries, and how they present themselves
On Wed, Jun 11, 2014 at 10:12 AM, Jeremy Jongsma jer...@barchart.com
wrote:
The big problem seems to have been requesting a large number of row keys
combined with a large number of named columns in a query. 20K rows with 20K
columns destroyed my cluster. Splitting it into slices of 100
batches” as an anti-pattern:
http://www.slideshare.net/mattdennis
-- Jack Krupansky
From: Peter Sanford
Sent: Wednesday, June 11, 2014 7:34 PM
To: user@cassandra.apache.org
Subject: Re: Large number of row keys in query kills cluster
On Wed, Jun 11, 2014 at 10:12 AM, Jeremy Jongsma jer
I ran an application today that attempted to fetch 20,000+ unique row keys
in one query against a set of completely empty column families. On a 4-node
cluster (EC2 m1.large instances) with the recommended memory settings (2 GB
heap), every single node immediately ran out of memory and became
Hello Jeremy
Basically, what you are doing is asking Cassandra to perform a distributed full
scan of all the partitions across the cluster, so it's normal that the nodes
are somewhat stressed.
How did you make the query? Are you using the Thrift or CQL3 API?
Please note that there is another way to get
I didn't explain clearly - I'm not requesting 20,000 unknown keys (resulting
in a full scan), I'm requesting 20,000 specific rows by key.
On Jun 10, 2014 6:02 PM, DuyHai Doan doanduy...@gmail.com wrote:
Hello Jeremy
Basically what you are doing is to ask Cassandra to do a distributed full
scan
Perhaps if you described both the schema and the query in more detail, we
could help... e.g. did the query have an IN clause with 20,000 keys? Or is
the key compound? More detail will help.
On Tue, Jun 10, 2014 at 7:15 PM, Jeremy Jongsma jer...@barchart.com wrote:
I didn't explain clearly - I'm