Re: Using Per-Table Keyspaces for Tunable Replication

2014-12-12 Thread Tyler Hobbs
On Fri, Dec 12, 2014 at 4:50 PM, Eric Stevens wrote: > > I know that Thrift includes keyspace as part of the connection details, so > if you're reading or writing to many keyspaces, you'll end up having to > make a lot of additional round trips, and it will hurt your throughput. I > may be wrong

Re: Using Per-Table Keyspaces for Tunable Replication

2014-12-12 Thread Eric Stevens
Well we started with the thought that we'd have two keyspaces, one for searchables and one for non-searchables like you mentioned. But our concern is that we may change our mind about what column families are available for search in the future. Separate keyspaces per table give us greater flexibi

Re: `nodetool cfhistogram` utility script

2014-12-12 Thread Matt Brown
You can also collect these stats from the server via JMX; I believe the name of the MBean object is org.apache.cassandra.metrics:type=ColumnFamily,keyspace=KEYSPACE,scope=SCOPE,name=SSTablesPerReadHistogram where KEYSPACE is your keyspace and SCOPE is Read or Write. This has attributes for 50thPercentile, 7

Re: nodetool breaks on firewall ?

2014-12-12 Thread Ryan Svihla
Well, did you restart Cassandra after changing the JVM_OPTS to match your desired address? On Fri, Dec 12, 2014 at 2:34 PM, Kevin Burton wrote: > > Oh. and if I specify --host it still doesn’t work. Very weird. > > On Fri, Dec 12, 2014 at 12:33 PM, Kevin Burton wrote: > >> OK..I’m stracing it and

Re: nodetool breaks on firewall ?

2014-12-12 Thread Kevin Burton
OK..I’m stracing it and it’s definitely trying to connect to 173… here’s the log line below. (anonymized). the question is why.. is cassandra configured to return something on the public address via JMX? I guess I could dump all of JMX metrics and figure it out. [pid 32331] connect(41, {sa_famil

Re: nodetool breaks on firewall ?

2014-12-12 Thread Kevin Burton
Oh, and if I specify --host it still doesn’t work. Very weird. On Fri, Dec 12, 2014 at 12:33 PM, Kevin Burton wrote: > OK..I’m stracing it and it’s definitely trying to connect to 173… here’s > the log line below. (anonymized). > > the question is why.. is cassandra configured to return somethi

Re: nodetool breaks on firewall ?

2014-12-12 Thread Ryan Svihla
hmm I was hoping it was changed in 2.1 https://github.com/apache/cassandra/blob/cassandra-2.1/src/java/org/apache/cassandra/tools/NodeTool.java but still localhost, sorry I can't tell you why it would go to the public interface..maybe someone added a shell alias? On Fri, Dec 12, 2014 at 2:20 PM,

Re: nodetool breaks on firewall ?

2014-12-12 Thread Ryan Svihla
It appears to be localhost; I imagine the issue is more that you changed the rpc_address to not be localhost anymore. See https://github.com/apache/cassandra/blob/cassandra-2.0/src/java/org/apache/cassandra/tools/NodeCmd.java lines 87 and 88: private static final String DEFAULT_HOST = "127.0.0.1"; private s

Re: nodetool breaks on firewall ?

2014-12-12 Thread Kevin Burton
AH! … ok. I didn’t see that nodetool took a host. Hm.. How does it determine the host to read from by default? The problem is that somehow it wants to read from the public interface (which is firewalled). On Fri, Dec 12, 2014 at 5:19 AM, Ryan Svihla wrote: > yes the node needs to restart to ha

Re: upgrade cassandra from 2.0.6 to 2.1.2

2014-12-12 Thread Robert Coli
On Tue, Dec 9, 2014 at 11:26 PM, Jonathan Haddad wrote: > Yes. It is, in general, a best practice to upgrade to the latest bug fix > release before doing an upgrade to the next point release. AFAIK, no one is formally testing upgrades from the middle of a series, so... +1, users should consid

Fwd: getting column names

2014-12-12 Thread Stephen Jones
Hello there - I'm using the python-driver to get my queried rows with a row factory that's a dictionary. When I get back my row list, each list item is a dictionary, but the keys are hashes. Is there any way to use the column family metadata to decrypt the dictionary keys to their original column nam

Re: `nodetool cfhistogram` utility script

2014-12-12 Thread Jonathan Haddad
Hey Jens, Unfortunately the output of the nodetool histograms changes between versions. While I think your script is useful, it's likely to break between versions. You might be interested to weigh in on the JIRA ticket to make the nodetool output machine friendly: https://issues.apache.org/jira/

Re: batch_size_warn_threshold_in_kb

2014-12-12 Thread Jonathan Haddad
The really important thing to take away from Ryan's original post is that batches are not there for performance. The only case I consider batches to be useful for is when you absolutely need to know that several tables all get a mutation (via logged batches). The use case for this is when

Re: Using Per-Table Keyspaces for Tunable Replication

2014-12-12 Thread Ryan Svihla
It would make more sense to just have a keyspace for each. Something like solr_tables, and cassandra_tables. I've done similar with most customers using DSE search (not a DSE mailing list, but the information is interesting background for your question). there is a cost to each keyspace and you'll

Re: Using Per-Table Keyspaces for Tunable Replication

2014-12-12 Thread Ryan Svihla
Clarification "keyspace for each" should be "keyspace for cassandra tables and solr tables" On Fri, Dec 12, 2014 at 11:25 AM, Ryan Svihla wrote: > > It would make more sense to just have a keyspace for each. Something like > solr_tables, and cassandra_tables. I've done similar with most customers

Using Per-Table Keyspaces for Tunable Replication

2014-12-12 Thread Eric Stevens
We're considering moving to a model where we put each of our tables in a dedicated keyspace. This is so we can tune replication per table, and change our mind about that replication on a per-table basis without a major migration. The biggest driver for this is Solr integration, we want to tune RF
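
As a rough illustration of the per-table keyspace idea above, the sketch below generates the CQL that creates or later re-tunes a dedicated keyspace for one table. The `ks_` naming prefix, the datacenter names `Cassandra` and `Solr`, and the choice of `NetworkTopologyStrategy` are assumptions made for the example, not details from the thread.

```python
def replication_map(dc_factors):
    """Render a CQL replication map from a {datacenter: replication_factor} dict."""
    parts = ["'class': 'NetworkTopologyStrategy'"]
    for dc in sorted(dc_factors):  # sorted so the output is deterministic
        rf = dc_factors[dc]
        if rf < 0:
            raise ValueError("replication factor must not be negative")
        parts.append("'%s': %d" % (dc, rf))
    return "{" + ", ".join(parts) + "}"


def keyspace_statement(verb, table, dc_factors):
    """Build a CREATE or ALTER KEYSPACE statement for a table's dedicated keyspace."""
    if verb not in ("CREATE", "ALTER"):
        raise ValueError("verb must be CREATE or ALTER")
    keyspace = "ks_" + table  # hypothetical naming scheme: one keyspace per table
    return "%s KEYSPACE %s WITH replication = %s;" % (
        verb, keyspace, replication_map(dc_factors))


if __name__ == "__main__":
    # Create the keyspace, then change only this table's replication later.
    print(keyspace_statement("CREATE", "events", {"Cassandra": 3, "Solr": 1}))
    print(keyspace_statement("ALTER", "events", {"Cassandra": 3, "Solr": 2}))
```

Changing a keyspace's replication settings normally has to be followed by a repair so that the new replicas receive the existing data.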

`nodetool cfhistogram` utility script

2014-12-12 Thread Jens Rantil
Hi, I just quickly put together a tiny utility script to estimate average/mean/min/max/percentiles for `nodetool cfhistogram` latency output. Maybe could be useful to someone else, don’t know. You can find it here: https://gist.github.com/JensRantil/3da67e39f50aaf4f5bce Future improvements
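
For readers who want a feel for what such a script computes, here is a minimal sketch that is not taken from the linked gist. It assumes the histogram has already been parsed into `(value, count)` pairs; the parsing step is left out because the `nodetool` output format differs between versions, and the function name `summarize` is hypothetical.

```python
def summarize(buckets):
    """Estimate min, max, mean and percentiles from (value, count) histogram buckets."""
    pairs = sorted((v, c) for v, c in buckets if c > 0)
    total = sum(c for _, c in pairs)
    if total == 0:
        raise ValueError("histogram is empty")

    def percentile(p):
        # Walk the buckets in ascending order until the running count
        # covers p percent of all samples; report that bucket's value.
        threshold = p / 100.0 * total
        running = 0
        for value, count in pairs:
            running += count
            if running >= threshold:
                return value
        return pairs[-1][0]

    return {
        "min": pairs[0][0],
        "max": pairs[-1][0],
        "mean": sum(v * c for v, c in pairs) / float(total),
        "p50": percentile(50),
        "p99": percentile(99),
    }


if __name__ == "__main__":
    # Made-up sample data: (latency bucket value, number of operations).
    print(summarize([(10, 5), (20, 3), (50, 2)]))
```

Because each sample is only known to fall in a bucket, the results are estimates at bucket resolution rather than exact values.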

Re: nodetool breaks on firewall ?

2014-12-12 Thread Ryan Svihla
Yes, the node needs to restart for cassandra-env.sh to take effect. The links you're providing are about making Cassandra's JMX bind to the interface you want, so nodetool isn't really the issue; nodetool can just take an IP argument to connect to the interface you desire. Something like: nodet

Re: batch_size_warn_threshold_in_kb

2014-12-12 Thread Ryan Svihla
Any insert, update, or delete. On Fri, Dec 12, 2014 at 1:31 AM, Jens Rantil wrote: > > Maybe slightly off-topic, but what is a mutation? Is it equivalent to a > CQL row? Or maybe a column in a row? Does it include tombstones within the > selected range? > > Thanks, > Jens > > > > On Thu, Dec 11, 2014

Re: batch_size_warn_threshold_in_kb

2014-12-12 Thread Ryan Svihla
It's a rough observation and estimate, nothing more. In other words, some clusters can handle more, some can't, it depends on how many writes per second you're doing, cluster sizing, how far over that 5kb limit you are, heap size, disk IO, cpu speed, and many more factors. This is why it's just a w

Re: Get column family size

2014-12-12 Thread Ryan Svihla
What version are you on (key estimate I see in 1.2 and 2.0) ? What size is your heap (ideally 8GB, can be lower, but it requires a lot of tuning)? What kind of disk do you have (SANs are going to cause you problems)? Assuming all of those are the right answer, then you have the following options to