multitenant support with key spaces

2013-05-06 Thread Darren Smythe
How many keyspaces can you reasonably have? We have around 500 customers and expect that to double by the end of the year. We're looking into C* and wondering whether it makes sense to use a separate KS per customer. If we have 1000 customers, so one KS per customer is 1000 keyspaces. Is that something C* can

Re: Cassandra won't restart : 7365....6c73 is not defined as a collection

2013-05-06 Thread aaron morton
Do you have the table definitions? Any example data? Something is confused about a set / map / list type. It's failing when replaying the log; if you want to work around it, move the commit log file out of the directory. There is a chance of data loss if this row mutation is being replayed on all

Re: Repair session failed

2013-05-06 Thread aaron morton
Can you raise a ticket at https://issues.apache.org/jira/browse/CASSANDRA and update the thread with the link? Please include: * nodetool status * nodetool ring (so we have all the token assignments) * The IP you started repair on * As much log as you can share, if you can run DEBUG for the

Re: How does a healthy node look like?

2013-05-06 Thread aaron morton
Confirm whether your write timeouts are client-side socket timeouts or the TimedOutException from the server. Typically write latency is related to GC problems, like you are seeing. I'm unsure how much CPU resources each Cassandra instance has. Is there one node on a machine with 6 cores? How

Re: Slow retrieval using secondary indexes

2013-05-06 Thread aaron morton
cqlsh:Sessions> select * from Items where mahoutItemid = 610866442877251584; key | mahoutItemid + 687474703a2f2f6573706f7 | 610866442877251584 unsupported operand type(s) for /: 'NoneType' and 'float' Can you put together

Re: Error on Range queries

2013-05-06 Thread aaron morton
Bad Request: No indexed columns present in by-columns clause with Equal operator Perhaps you meant to use CQL 2? Try using the -2 option when starting cqlsh. My query is: select * from temp where min_update 10 limit 5; You have to have at least one indexed column in the where clause that

Re: Cassandra multi-datacenter

2013-05-06 Thread aaron morton
The broadcast_address can be set manually without using the EC2MultiRegionSnitch. It's the address the node wants other nodes to talk to it on http://www.datastax.com/docs/1.2/configuration/node_configuration#broadcast-address You may find it easier to run a VPN between the colo nodes and the

Re: How much heap does Cassandra 1.1.11 really need ?

2013-05-06 Thread aaron morton
My general "I can haz heap space?" approach. * determine total row count for the node from cfstats * determine if wide (10's of MB) rows are in use * determine total bloom filter space for the node from cfstats * enable full GC logging in cassandra-env.sh * determine tenured heap low point not

Re: multitenant support with key spaces

2013-05-06 Thread Brian O'Neill
You may want to look at using virtual keyspaces: http://hector-client.github.io/hector/build/html/content/virtual_keyspaces.html And follow these tickets: http://wiki.apache.org/cassandra/MultiTenant -brian On May 6, 2013, at 2:37 AM, Darren Smythe wrote: How many keyspaces can you
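
A rough sketch of the idea behind virtual keyspaces, as an alternative to one physical keyspace per customer: all tenants share one keyspace, and each row key carries a tenant prefix. This is only an illustration of the prefixing idea, not Hector's actual API; the function names and the ":" separator are assumptions made for the example.

```python
from typing import Tuple


def tenant_key(tenant_id: str, row_key: str, sep: str = ":") -> str:
    """Build the physical row key by prefixing the tenant id (hypothetical scheme)."""
    if sep in tenant_id:
        # The separator must be unambiguous so the prefix can be split off again.
        raise ValueError("tenant id must not contain the separator")
    return f"{tenant_id}{sep}{row_key}"


def split_tenant_key(physical_key: str, sep: str = ":") -> Tuple[str, str]:
    """Recover (tenant id, logical row key) from a physical row key."""
    tenant_id, _, row_key = physical_key.partition(sep)
    return tenant_id, row_key
```

With this scheme 1000 customers still need only one keyspace and one set of column families; isolation between tenants then depends on every read and write going through the prefixing layer.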

Re: hector or astyanax

2013-05-06 Thread Hiller, Dean
I was under the impression that it is multiple requests using a single connection in PARALLEL, not serial, as they have request ids and the responses do as well, so you can send a request while a previous request has no response just yet. I think you do get a big speed advantage from the asynchronous

RE: Node went down and came back up

2013-05-06 Thread Dan Kogan
It seems that we did not have the JMX ports (1024+) opened in our firewall. Once we opened ports 1024+ the hinted handoffs completed and it seems that the cluster went back to normal. Does that make sense? Thanks, Dan This is what we saw in the logs after opening the ports: INFO

Cleanup the peers columnfamily

2013-05-06 Thread Shahryar Sedghi
I had a 4 node cluster in my dev environment and, due to resource limitations, I had to remove two nodes. Nodetool status shows only two nodes on both machines, but the peers table on one machine still shows entries of the nodes with a null rpc address. Thrift has no problem with it but new Binary

Re: Cleanup the peers columnfamily

2013-05-06 Thread Sylvain Lebresne
What version of Cassandra are you using? If you're using 1.2.0 (or *were* using 1.2.0 when the 2 nodes were removed), you might be seeing https://issues.apache.org/jira/browse/CASSANDRA-5167. Or I have to delete the row in the table That should work. On Mon, May 6, 2013 at 4:22 PM, Shahryar

Re: Hadoop jobs and data locality

2013-05-06 Thread cscetbon.ext
Unfortunately I've just tried with a new cluster with RandomPartitioner and it doesn't work better : it may come from hadoop/pig modifications : 18:02:53|elia:hadoop cyril$ git diff --stat cassandra-1.1.5..cassandra-1.2.1 . .../apache/cassandra/hadoop/BulkOutputFormat.java | 27 +--

Re: hector or astyanax

2013-05-06 Thread Aaron Turner
Just because you can batch queries or have the server process them out of order doesn't make it fully parallel. You're still using a single TCP connection which is by definition a serial data stream. Basically, if you send a bunch of queries which each return a large amount of data you've

Re: Cassandra won't restart : 7365....6c73 is not defined as a collection

2013-05-06 Thread Blair Zajac
Hi Aaron, The keyspace consists of 3 column families for user management, see below. I have dropped these tables multiple times since I'm testing a script to automatically create the column families if they do not exist. I have also been changing types, e.g. lock_tokens__ from MAP<UUID,

Re: multitenant support with key spaces

2013-05-06 Thread Robert Coli
On Sun, May 5, 2013 at 11:37 PM, Darren Smythe darren1...@gmail.com wrote: How many keyspaces can you reasonably have? Very Low Hundreds, though this relates more to CFs than Ks. If we have 1000 customers, so one KS per customer is 1000 keyspaces. Is that something C* can handle efficiently?

Re: Node went down and came back up

2013-05-06 Thread Robert Coli
On Mon, May 6, 2013 at 6:20 AM, Dan Kogan d...@iqtell.com wrote: It seems that we did not have the JMX ports (1024+) opened in our firewall. Once we opened ports 1024+ the hinted handoffs completed and it seems that the cluster went back to normal. Does that make sense? No, JMX should not

Re: SSTables not opened on new cluste

2013-05-06 Thread Robert Coli
On Sat, May 4, 2013 at 5:41 AM, Philippe watche...@gmail.com wrote: After trying every possible combination of parameters, config and the rest, I ended up downgrading the new node from 1.1.11 to 1.1.2 to match the existing 3 nodes. And that solved the issue immediately : the schema was

Re: Cassandra running High Load with no one using the cluster

2013-05-06 Thread Robert Coli
On Sat, May 4, 2013 at 9:22 PM, Aiman Parvaiz ai...@grapheffect.com wrote: We are using cassandra 1.1.0 and open-6-jdk 1.1.0 has significant issues, including non-working Hinted Handoff. Also, OpenJDK is not officially supported. Upgrade to 1.1.11 and Sun JDK. =Rob

Re: hector or astyanax

2013-05-06 Thread Hiller, Dean
You have me thinking more. I wonder in practice if 3 sockets is any faster than 1 socket when doing NIO. If your buffer sizes were small, maybe that would be the case. Usually the NIC buffers are big so when the selector fires it is reading from 3 buffers for 3 sockets or 1 buffer for one

Re: multitenant support with key spaces

2013-05-06 Thread Hiller, Dean
Another option may be virtual column families with PlayOrm. We currently do around 60,000 column families to store data from 60,000 different sensors that keep feeding us information. Dean On 5/6/13 11:18 AM, Robert Coli rc...@eventbrite.com wrote: On Sun, May 5, 2013 at 11:37 PM, Darren

RE: cost estimate about some Cassandra patchs

2013-05-06 Thread DE VITO Dominique
From: aaron morton [mailto:aa...@thelastpickle.com] Sent: Sunday, April 28, 2013 22:54 To: user@cassandra.apache.org Subject: Re: cost estimate about some Cassandra patchs Does anyone know enough of the inner working of Cassandra to tell me how much work is needed to patch Cassandra to

Re: Cassandra running High Load with no one using the cluster

2013-05-06 Thread Aiman Parvaiz
Correction, there was a typo in my original question, we are running cassandra 1.1.10 Thanks and sorry for the inconvenience. On May 6, 2013, at 10:23 AM, Robert Coli rc...@eventbrite.com wrote: including non-working Hinted Handoff

Re: hector or astyanax

2013-05-06 Thread Aaron Turner
From my experience, your NIC buffers generally aren't the problem (or at least it's easy to tune them to fix). It's TCP. Simply put, your raw NIC throughput > single TCP socket throughput on most modern hardware/OS combinations. This is especially true as latency increases between the two hosts.
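
The latency point can be made concrete with the standard window/RTT bound: a single TCP connection cannot move more than one window of data per round trip. The sketch below is plain arithmetic; the 64 KiB window and the round-trip times are assumed values for illustration, not measurements from this thread.

```python
def max_tcp_throughput_mbps(window_bytes: int, rtt_ms: float) -> float:
    """Upper bound on single-connection throughput: one window per round trip."""
    if rtt_ms <= 0:
        raise ValueError("rtt_ms must be positive")
    bits_per_second = window_bytes * 8 * 1000.0 / rtt_ms
    return bits_per_second / 1e6


# Assumed example values: a 64 KiB window at different round-trip times.
WINDOW = 64 * 1024
same_rack = max_tcp_throughput_mbps(WINDOW, 1.0)    # about 524 Mbit/s
cross_dc = max_tcp_throughput_mbps(WINDOW, 50.0)    # about 10.5 Mbit/s
```

Under these assumptions the same socket drops from roughly half a gigabit to about ten megabits as the round trip grows from 1 ms to 50 ms, regardless of how fast the NIC is.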

Re:Hadoop jobs and data locality

2013-05-06 Thread Shamim
I think it would be better to open an issue in Jira. Best regards, Shamim A. Unfortunately I've just tried with a new cluster with RandomPartitioner and it doesn't work better : it may come from hadoop/pig modifications : 18:02:53|elia:hadoop cyril$ git diff --stat

Re: hector or astyanax

2013-05-06 Thread Derek Williams
Also have to keep in mind that it should be rare to only use a single socket since you are usually making at least 1 connection per node in the cluster (or local datacenter). There is also nothing enforcing that a single client cannot open more than 1 connection to a node. In the end it should

RE: Node went down and came back up

2013-05-06 Thread Dan Kogan
Thanks. So then, Hinted Handoff should be sent over port 7000 (or 7001 with SSL), correct? -Original Message- From: Robert Coli [mailto:rc...@eventbrite.com] Sent: Monday, May 06, 2013 1:19 PM To: user@cassandra.apache.org Subject: Re: Node went down and came back up On Mon, May 6,

Re: Node went down and came back up

2013-05-06 Thread Robert Coli
On Mon, May 6, 2013 at 12:31 PM, Dan Kogan d...@iqtell.com wrote: Thanks. So then, Hinted Handoff should be sent over port 7000 (or 7001 with SSL), correct? Yes, hinted handoff goes over the storage protocol port, which is shared with the gossip port, 7000/1. =Rob
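
A quick way to check from one node whether the storage port on a peer is reachable through the firewall is a plain TCP connect. This is a minimal sketch using only the Python standard library; the helper name and the peer address in the usage comment are made up for illustration, and a successful connect only shows the port is open, not that hints are flowing.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


# Example usage (10.0.0.2 is a hypothetical peer address):
# for port in (7000, 7001):
#     print(port, port_is_open("10.0.0.2", port))
```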

Re: Cassandra running High Load with no one using the cluster

2013-05-06 Thread Bryan Talbot
On Sat, May 4, 2013 at 9:22 PM, Aiman Parvaiz ai...@grapheffect.com wrote: When starting this cluster we set JVM_OPTS=$JVM_OPTS -Xss1000k Why did you increase the stack-size to 5.5 times greater than recommended? Since each thread now uses 1000KB minimum just for the stack, a large
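
To put the stack-size remark in numbers, the sketch below multiplies a per-thread stack size by a thread count. The 180 KB baseline is inferred from the "5.5 times" ratio quoted above, and the figure of 500 threads is purely an assumption for illustration.

```python
def stack_reservation_mb(thread_count: int, xss_kb: int) -> float:
    """Total stack space in MB if every thread gets xss_kb of stack."""
    return thread_count * xss_kb / 1024.0


THREADS = 500  # assumed thread count, not taken from the thread

with_1000k = stack_reservation_mb(THREADS, 1000)  # 488.28125 MB
with_180k = stack_reservation_mb(THREADS, 180)    # 87.890625 MB
```

Under these assumptions the larger setting accounts for roughly 400 MB of additional stack space outside the Java heap.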

Re: How to use Write Consistency 'ANY' with SSTABLELOADER - DSE Cassandra 1.1.9

2013-05-06 Thread aaron morton
While reading we are planning to use a CL of Quorum. So, we are hoping we will not hit any consistency issues before repair is run. There will be a chance of getting inconsistencies if fewer than QUORUM nodes were involved in the load for each row. Assuming RF 3, if you have two adjacent nodes
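
The reasoning here follows the usual overlap rule: a read is guaranteed to see the latest write only when the replicas written plus the replicas read exceed the replication factor. A small sketch of that arithmetic, with function names chosen for this example:

```python
def quorum(replication_factor: int) -> int:
    """Number of replicas that make up a quorum."""
    return replication_factor // 2 + 1


def read_sees_latest_write(rf: int, write_replicas: int, read_replicas: int) -> bool:
    """True when the read and write replica sets must overlap in at least one node."""
    return write_replicas + read_replicas > rf


RF = 3
q = quorum(RF)  # 2
loaded_on_one = read_sees_latest_write(RF, 1, q)  # False: a QUORUM read may miss the row
loaded_on_two = read_sees_latest_write(RF, 2, q)  # True: a QUORUM read must hit a loaded replica
```

So with RF 3, a row that reached only one replica during the load can be missed by a QUORUM read until repair has run.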

Re: index_interval

2013-05-06 Thread aaron morton
This is the closest I can find in Jira https://issues.apache.org/jira/browse/CASSANDRA-4478 It's a pretty handy tool to have in your tool kit, especially when you start to have over 1 billion rows per node. A - Aaron Morton Freelance Cassandra Consultant New Zealand

Re: Error on Range queries

2013-05-06 Thread himanshu.joshi
Thanks aaron.. -- Regards Himanshu Joshi On 05/06/2013 02:22 PM, aaron morton wrote: Bad Request: No indexed columns present in by-columns clause with Equal operator Perhaps you meant to use CQL 2? Try using the -2 option when starting cqlsh. My query is: select * from temp where min_update