Re: Reasonable range for the max number of tables?
Is there any mention of this limitation anywhere in the Cassandra documentation? I don't see it mentioned in the 'Anti-patterns in Cassandra' section of the DataStax 2.0 documentation or anywhere else. When starting out with Cassandra as a store for a multi-tenant application, it seems very attractive to segregate data for each tenant using a tenant-specific keyspace, each with its own set of tables. It's not until you start browsing through forums such as this that you find out that it isn't going to scale beyond a few tenants. If you want to be able to segregate customer data in Cassandra, is it the accepted practice to have multiple Cassandra installations?
Node stuck during nodetool rebuild
Hello All, We are on 1.2.18 (running on Ubuntu 12.04) and we recently tried to add a second DC on our demo environment, just before trying it on live. The existing DC1 has two nodes which approximately hold 10G of data (RF=2). In order to add the second DC, DC2, we followed this procedure:
On DC1 nodes:
1. Changed the Snitch in the cassandra.yaml from default to GossipingPropertyFileSnitch.
2. Configured the cassandra-rackdc.properties (DC1, RAC1).
3. Rolling restart.
4. Updated the replication strategy for each keyspace, for example: ALTER KEYSPACE keyspace WITH REPLICATION = {'class':'NetworkTopologyStrategy','DC1':2};
On DC2 nodes:
5. Edited the cassandra.yaml with: auto_bootstrap: false, seeds (one IP from DC1), cluster name to match whatever we have on DC1 nodes, correct IP settings, num_tokens, initial_token left unset and finally the snitch (GossipingPropertyFileSnitch, as in DC1).
6. Changed the cassandra-rackdc.properties (DC2, RAC1).
On the Application:
7. Changed the C# DataStax driver load balancing policy to be DCAwareRoundRobinPolicy.
8. Changed the application consistency level from QUORUM to LOCAL_QUORUM.
9. After deleting the data, commitlog and saved_caches directories we started cassandra on both nodes in the new DC, DC2. According to the logs at this point all nodes were able to see all other nodes, with the correct/expected output when running nodetool status.
On DC1 nodes:
10. After cassandra was running on DC2, we changed the keyspace RF to include the new DC as follows: ALTER KEYSPACE keyspace WITH REPLICATION = {'class':'NetworkTopologyStrategy','DC1':2, 'DC2':2};
11. As a last step, and in order to stream the data across to the second DC, we ran this on node1 of DC2: nodetool rebuild DC1. After the successful completion of this, we were planning to run the same on node2 of DC2.
The problem is that nodetool rebuild seems to be stuck: nodetool netstats on node1 of DC2 appears to be stuck at 10% streaming a 5G file from node2 at DC1. This doesn't tally with nodetool netstats when running it against either of the DC1 nodes. The DC1 nodes don't think they are streaming anything to DC2. It is worth pointing out that initially we tried to run 'nodetool rebuild DC1' on both nodes at DC2, given the small amount of data to be streamed in total (approximately 10G as I explained above). We experienced the same problem, with the only difference being that 'nodetool rebuild DC1' got stuck on both nodes at DC2 very soon after running it, whereas now it happened only after running it for an hour or so. We thought the problem was that we tried to run nodetool against both nodes at the same time. So, we tried running it only against node1 after we deleted all the data, commitlog and caches on both nodes and started from step (9) again. Now nodetool rebuild has been running against node1 at DC2 for more than 12 hours with no luck... The weird thing is that the cassandra logs appear to be clean and the VPN between the two DCs has no problems at all. Any thoughts? Have we missed something in the steps I described? Is anything wrong in the procedure? Any help would be much appreciated. Thanks, Vasilis
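For reference, the command-level skeleton of the procedure described above is roughly the following sketch; the keyspace name and paths are placeholders, and this does not by itself explain the stuck stream:

# DC1 nodes: switch to GossipingPropertyFileSnitch (cassandra-rackdc.properties
# containing dc=DC1, rack=RAC1), rolling restart, then for each keyspace:
echo "ALTER KEYSPACE my_keyspace WITH REPLICATION = {'class':'NetworkTopologyStrategy','DC1':2};" | cqlsh

# DC2 nodes: cassandra.yaml with auto_bootstrap: false, a DC1 seed, matching
# cluster_name, num_tokens set and initial_token unset; start Cassandra, then:
echo "ALTER KEYSPACE my_keyspace WITH REPLICATION = {'class':'NetworkTopologyStrategy','DC1':2,'DC2':2};" | cqlsh

# one DC2 node at a time, stream the existing data from DC1 and watch progress:
nodetool rebuild DC1
nodetool netstats
nodetool compactionstats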
A question about using 'update keyspace with strategyoptions' command
Hi, All, I want to run 'update keyspace with strategy_options={dc1:3, dc2:3}' from cassandra-cli to update the strategy options of some keyspace in a multi-DC environment. When the command returns successfully, does it mean that the strategy options have been updated successfully or I need to wait some time for the change to be propagated to all DCs? Thanks Boying
Re: A question about using 'update keyspace with strategyoptions' command
Try the show keyspaces command and look for Options under each keyspace. Thanks Rahul On Tue, Aug 5, 2014 at 2:01 PM, Lu, Boying boying...@emc.com wrote: Hi, All, I want to run ‘update keyspace with strategy_options={dc1:3, dc2:3}’ from cassandra-cli to update the strategy options of some keyspace in a multi-DC environment. When the command returns successfully, does it mean that the strategy options have been updated successfully or I need to wait some time for the change to be propagated to all DCs? Thanks Boying
RE: A question about using 'update keyspace with strategyoptions' command
Thanks, yes. I can use the ‘show keyspace’ command to check, and I see the strategy has indeed changed. But what I want to know is if the ‘update keyspace with strategy_options …’ command is a ‘sync’ operation or an ‘async’ operation. From: Rahul Menon [mailto:ra...@apigee.com] Sent: 5 August 2014 16:38 To: user Subject: Re: A question about using 'update keyspace with strategyoptions' command Try the show keyspaces command and look for Options under each keyspace. Thanks Rahul On Tue, Aug 5, 2014 at 2:01 PM, Lu, Boying boying...@emc.com wrote: Hi, All, I want to run ‘update keyspace with strategy_options={dc1:3, dc2:3}’ from cassandra-cli to update the strategy options of some keyspace in a multi-DC environment. When the command returns successfully, does it mean that the strategy options have been updated successfully or do I need to wait some time for the change to be propagated to all DCs? Thanks Boying
Re: A question about using 'update keyspace with strategyoptions' command
Changing the strategy options, and in particular the replication factor, does not perform any data replication by itself. You need to run a repair to ensure data is replicated following the new replication. On Tue, Aug 5, 2014 at 10:52 AM, Lu, Boying boying...@emc.com wrote: Thanks. yes. I can use the ‘show keyspace’ command to check and see the strategy does changed. But what I want to know is if the ‘update keyspace with strategy_options …’ command is a ‘sync’ operation or a ‘async’ operation. *From:* Rahul Menon [mailto:ra...@apigee.com] *Sent:* 2014年8月5日 16:38 *To:* user *Subject:* Re: A question about using 'update keyspace with strategyoptions' command Try the show keyspaces command and look for Options under each keyspace. Thanks Rahul On Tue, Aug 5, 2014 at 2:01 PM, Lu, Boying boying...@emc.com wrote: Hi, All, I want to run ‘update keyspace with strategy_options={dc1:3, dc2:3}’ from cassandra-cli to update the strategy options of some keyspace in a multi-DC environment. When the command returns successfully, does it mean that the strategy options have been updated successfully or I need to wait some time for the change to be propagated to all DCs? Thanks Boying
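A minimal sketch of the two steps being discussed, using cassandra-cli as in the original question; the keyspace and DC names are placeholders and flags may differ slightly by version:

echo "update keyspace my_keyspace with strategy_options = {dc1:3, dc2:3};" > change_rf.cli
cassandra-cli -h 127.0.0.1 -f change_rf.cli
# the schema change itself returns quickly, but no data moves until a repair
# is run for the keyspace on the affected nodes:
nodetool repair my_keyspace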
Re: Reasonable range for the max number of tables?
Hi Phil, In theory, the max number of column families would be in the low hundreds. In practice the limit is related to the amount of heap you have, as each column family will consume 1 MB of heap due to arena allocation. To segregate customer data, you could:
- Use customer-specific column families under a single keyspace
- Use a keyspace per customer
- Use the same column families and have a column that identifies the customer. On the application layer, ensure that there are sufficient checks so one customer can't read another customer's data
Mark On Tue, Aug 5, 2014 at 9:09 AM, Phil Luckhurst phil.luckhu...@powerassure.com wrote: Is there any mention of this limitation anywhere in the Cassandra documentation? I don't see it mentioned in the 'Anti-patterns in Cassandra' section of the DataStax 2.0 documentation or anywhere else. When starting out with Cassandra as a store for a multi-tenant application it seems very attractive to segregate data for each tenant using a tenant specific keyspace each with their own set of tables. It's not until you start browsing through forums such as this that you find out that it isn't going to scale above a few tenants. If you want to be able to segregate customer data in Cassandra is it the accepted practice to have multiple Cassandra installations? -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Reasonable-range-for-the-max-number-of-tables-tp7596094p7596106.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
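As a rough illustration of the third option, a shared table keyed by a tenant identifier; the keyspace, table and column names below are invented:

cqlsh <<'EOF'
CREATE TABLE my_keyspace.orders (
    tenant_id uuid,
    order_id  timeuuid,
    payload   text,
    PRIMARY KEY ((tenant_id), order_id)
);
-- every query is then scoped to a single tenant's partition, and the
-- application layer must ensure tenant_id always comes from the caller's
-- own credentials:
SELECT * FROM my_keyspace.orders WHERE tenant_id = 00000000-0000-0000-0000-000000000000;
EOF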
RE: A question about using 'update keyspace with strategyoptions' command
Yes. Sorry for not saying it clearly. What I want to know is: has the strategy changed after the ‘update keyspace with strategy_options…’ command returns successfully? Not the data change. E.g. say I run the command ‘update keyspace with strategy_options={dc1:3, dc2:3}’; when this command returns, are the strategy options already changed? Or do I need to wait some time for the strategy to be changed? From: Sylvain Lebresne [mailto:sylv...@datastax.com] Sent: 5 August 2014 16:59 To: user@cassandra.apache.org Subject: Re: A question about using 'update keyspace with strategyoptions' command Changing the strategy options, and in particular the replication factor, does not perform any data replication by itself. You need to run a repair to ensure data is replicated following the new replication. On Tue, Aug 5, 2014 at 10:52 AM, Lu, Boying boying...@emc.com wrote: Thanks. yes. I can use the ‘show keyspace’ command to check and see the strategy does changed. But what I want to know is if the ‘update keyspace with strategy_options …’ command is a ‘sync’ operation or a ‘async’ operation. From: Rahul Menon [mailto:ra...@apigee.com] Sent: 5 August 2014 16:38 To: user Subject: Re: A question about using 'update keyspace with strategyoptions' command Try the show keyspaces command and look for Options under each keyspace. Thanks Rahul On Tue, Aug 5, 2014 at 2:01 PM, Lu, Boying boying...@emc.com wrote: Hi, All, I want to run ‘update keyspace with strategy_options={dc1:3, dc2:3}’ from cassandra-cli to update the strategy options of some keyspace in a multi-DC environment. When the command returns successfully, does it mean that the strategy options have been updated successfully or I need to wait some time for the change to be propagated to all DCs? Thanks Boying
Re: A question about using 'update keyspace with strategyoptions' command
On Tue, Aug 5, 2014 at 11:40 AM, Lu, Boying boying...@emc.com wrote: What I want to know is “are the *strategy* changed ?’ after the ‘udpate keyspace with strategy_options…’ command returns successfully Like all schema changes, not necessarily on all nodes. You will have to check for schema agreement between nodes. Not the *data* change. e.g. say I run the command ‘update keyspace with strategy_opitons [dc1: 3, dc2:3]’ , when this command returns, are the *strategy* options already changed? Or I need to wait some time for the strategy to be changed? *From:* Sylvain Lebresne [mailto:sylv...@datastax.com] *Sent:* 2014年8月5日 16:59 *To:* user@cassandra.apache.org *Subject:* Re: A question about using 'update keyspace with strategyoptions' command Changing the strategy options, and in particular the replication factor, does not perform any data replication by itself. You need to run a repair to ensure data is replicated following the new replication. On Tue, Aug 5, 2014 at 10:52 AM, Lu, Boying boying...@emc.com wrote: Thanks. yes. I can use the ‘show keyspace’ command to check and see the strategy does changed. But what I want to know is if the ‘update keyspace with strategy_options …’ command is a ‘sync’ operation or a ‘async’ operation. *From:* Rahul Menon [mailto:ra...@apigee.com] *Sent:* 2014年8月5日 16:38 *To:* user *Subject:* Re: A question about using 'update keyspace with strategyoptions' command Try the show keyspaces command and look for Options under each keyspace. Thanks Rahul On Tue, Aug 5, 2014 at 2:01 PM, Lu, Boying boying...@emc.com wrote: Hi, All, I want to run ‘update keyspace with strategy_options={dc1:3, dc2:3}’ from cassandra-cli to update the strategy options of some keyspace in a multi-DC environment. When the command returns successfully, does it mean that the strategy options have been updated successfully or I need to wait some time for the change to be propagated to all DCs? Thanks Boying
Re: A question about using 'update keyspace with strategyoptions' command
Try running describe cluster from Cassandra-CLI to see if all nodes have the same schema version. Rahul Neelakantan On Aug 5, 2014, at 6:13 AM, Sylvain Lebresne sylv...@datastax.com wrote: On Tue, Aug 5, 2014 at 11:40 AM, Lu, Boying boying...@emc.com wrote: What I want to know is “are the strategy changed ?’ after the ‘udpate keyspace with strategy_options…’ command returns successfully Like all schema changes, not necessarily on all nodes. You will have to check for schema agreement between nodes. Not the data change. e.g. say I run the command ‘update keyspace with strategy_opitons [dc1: 3, dc2:3]’ , when this command returns, are the strategy options already changed? Or I need to wait some time for the strategy to be changed? From: Sylvain Lebresne [mailto:sylv...@datastax.com] Sent: 2014年8月5日 16:59 To: user@cassandra.apache.org Subject: Re: A question about using 'update keyspace with strategyoptions' command Changing the strategy options, and in particular the replication factor, does not perform any data replication by itself. You need to run a repair to ensure data is replicated following the new replication. On Tue, Aug 5, 2014 at 10:52 AM, Lu, Boying boying...@emc.com wrote: Thanks. yes. I can use the ‘show keyspace’ command to check and see the strategy does changed. But what I want to know is if the ‘update keyspace with strategy_options …’ command is a ‘sync’ operation or a ‘async’ operation. From: Rahul Menon [mailto:ra...@apigee.com] Sent: 2014年8月5日 16:38 To: user Subject: Re: A question about using 'update keyspace with strategyoptions' command Try the show keyspaces command and look for Options under each keyspace. Thanks Rahul On Tue, Aug 5, 2014 at 2:01 PM, Lu, Boying boying...@emc.com wrote: Hi, All, I want to run ‘update keyspace with strategy_options={dc1:3, dc2:3}’ from cassandra-cli to update the strategy options of some keyspace in a multi-DC environment. When the command returns successfully, does it mean that the strategy options have been updated successfully or I need to wait some time for the change to be propagated to all DCs? Thanks Boying
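A quick sketch of that check; the host and flags are illustrative, and cassandra-cli also accepts the command interactively:

echo "describe cluster;" > check_schema.cli
cassandra-cli -h 127.0.0.1 -f check_schema.cli
# a single entry under "Schema versions" means the change has propagated to
# all reachable nodes; multiple versions mean some nodes have not applied it yet.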
Make an existing cluster multi data-center compatible.
Hi all, I want to add a data-center to an existing single data-center cluster. First I have to make the existing cluster multi data-center compatible. The existing cluster is a 12 node cluster with:
- Replication factor = 3
- Placement strategy = SimpleStrategy
- Endpoint snitch = SimpleSnitch
If I change the following:
- Placement strategy = NetworkTopologyStrategy
- Endpoint snitch = PropertyFileSnitch - all 12 nodes in this file belong to the same data-center and rack.
Do I have to run full repairs after this change? Because the yaml file states: IF YOU CHANGE THE SNITCH AFTER DATA IS INSERTED INTO THE CLUSTER, YOU MUST RUN A FULL REPAIR, SINCE THE SNITCH AFFECTS WHERE REPLICAS ARE PLACED. Thanks! Rene
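For illustration, the intermediate single-DC topology might look roughly like this; IPs, names and paths are placeholders, and the GossipingPropertyFileSnitch/cassandra-rackdc.properties route is a common alternative:

# conf/cassandra-topology.properties, identical on every node (PropertyFileSnitch):
cat > conf/cassandra-topology.properties <<'EOF'
# all 12 nodes mapped to the same data-center and rack
10.0.0.1=DC1:RAC1
10.0.0.2=DC1:RAC1
# ... one line per node ...
default=DC1:RAC1
EOF

# after a rolling restart, switch each keyspace over:
echo "ALTER KEYSPACE my_keyspace WITH REPLICATION = {'class':'NetworkTopologyStrategy','DC1':3};" | cqlsh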
Re: Make an existing cluster multi data-center compatible.
Yes, you must run a full repair for the reasons stated in the yaml file. Mark On Tue, Aug 5, 2014 at 11:52 AM, Rene Kochen rene.koc...@schange.com wrote: Hi all, I want to add a data-center to an existing single data-center cluster. First I have to make the existing cluster multi data-center compatible. The existing cluster is a 12 node cluster with: - Replication factor = 3 - Placement strategy = SimpleStrategy - Endpoint snitch = SimpleSnitch If I change the following: - Placement strategy = NetworkTopologyStrategy - Endpoint snitch = PropertyFileSnitch - all 12 nodes in this file belong to the same data-center and rack. Do I have to run full repairs after this change? Because the yaml file states: IF YOU CHANGE THE SNITCH AFTER DATA IS INSERTED INTO THE CLUSTER, YOU MUST RUN A FULL REPAIR, SINCE THE SNITCH AFFECTS WHERE REPLICAS ARE PLACED. Thanks! Rene
Re: Make an existing cluster multi data-center compatible.
What I understand is that SimpleStrategy determines the endpoints for replicas by traversing the ring clockwise. NetworkTopologyStrategy determines the replicas by traversing the ring clockwise and taking into account the racks and DC locations. Since the file used by PropertyFileSnitch puts all endpoints in the same data-center and rack, isn't the result of the endpoint selection basically the same? Thanks! Rene 2014-08-05 12:56 GMT+02:00 Mark Reddy mark.re...@boxever.com: Yes, you must run a full repair for the reasons stated in the yaml file. Mark On Tue, Aug 5, 2014 at 11:52 AM, Rene Kochen rene.koc...@schange.com wrote: Hi all, I want to add a data-center to an existing single data-center cluster. First I have to make the existing cluster multi data-center compatible. The existing cluster is a 12 node cluster with: - Replication factor = 3 - Placement strategy = SimpleStrategy - Endpoint snitch = SimpleSnitch If I change the following: - Placement strategy = NetworkTopologyStrategy - Endpoint snitch = PropertyFileSnitch - all 12 nodes in this file belong to the same data-center and rack. Do I have to run full repairs after this change? Because the yaml file states: IF YOU CHANGE THE SNITCH AFTER DATA IS INSERTED INTO THE CLUSTER, YOU MUST RUN A FULL REPAIR, SINCE THE SNITCH AFFECTS WHERE REPLICAS ARE PLACED. Thanks! Rene
Re: Reasonable range for the max number of tables?
Hi Mark, Mark Reddy wrote To segregate customer data, you could: - Use customer specific column families under a single keyspace - Use a keyspace per customer These effectively amount to the same thing and they both fall foul of the limit on the number of column families, so they do not scale. Mark Reddy wrote - Use the same column families and have a column that identifies the customer. On the application layer ensure that there are sufficient checks so one customer can't read another customers data And while this gets around the column family limit, it does not allow the same level of data segregation. For example, with a separate keyspace or column families it is trivial to remove a single customer's data or move that data to another system. With one set of column families for all customers these types of actions become much more difficult, as any change impacts all customers, but perhaps that's the price we have to pay to scale. And I still think this needs to be made more prominent in the documentation. Thanks Phil
Issue with ALLOW FILTERING
Hi, I'm having an issue with ALLOW FILTERING with Cassandra 2.0.8. See a minimal example here: https://gist.github.com/JensRantil/ec43622c26acb56e5bc9 I expect the second-to-last query to fail, but the last query to return a single row. In particular I expect the last SELECT to first select using the clustering primary id and then do filtering. I've been reading the ALLOW FILTERING section at https://cassandra.apache.org/doc/cql3/CQL.html#selectStmt and can't wrap my head around why this won't work. Could anyone clarify this for me? Thanks, Jens
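Without the gist contents inline it is hard to be precise, but the kind of statement being described (restrict on the partition key, then filter on a non-key column) has roughly this shape; the keyspace, table and column names are invented, and whether 2.0.8 accepts it depends on the actual schema and any indexes:

echo "SELECT * FROM my_keyspace.my_table WHERE id = 1 AND other_col = 'x' ALLOW FILTERING;" | cqlsh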
Re: Reasonable range for the max number of tables?
Multi-tenant remain a challenge - for most technologies. Yes, you can do what you suggest, but... you need to exercise great care and test and provision your cluster with great care. It's not like a free resource that scales wildly in all directions with no forethought or care. It is something that does work, sort of, but it wasn't one of the design goals or core strengths of Cassandra. IOW, it was/is more of a side effect rather than a core pattern. Anti-pattern simply means that it is not guaranteed to be a full-fledged, first-class feature. It means you can do it, and if it works well for you for your particular use case, great, but don't complain too loudly here if it doesn't. That said, anybody who has great success - or great failure - with multi-tenant for Cassandra, or any other technology, should definitely share their experiences here. And the bottom line is that dozens or low hundreds remains the recommended limit for tables in a single Cassandra cluster. Not a hard limit, but just a recommendation. Multi-tenant is an area of great interest, so I suspect Cassandra - and all other technologies - will see a lot of evolution in the coming years in this area. -- Jack Krupansky -Original Message- From: Phil Luckhurst Sent: Tuesday, August 5, 2014 4:09 AM To: cassandra-u...@incubator.apache.org Subject: Re: Reasonable range for the max number of tables? Is there any mention of this limitation anywhere in the Cassandra documentation? I don't see it mentioned in the 'Anti-patterns in Cassandra' section of the DataStax 2.0 documentation or anywhere else. When starting out with Cassandra as a store for a multi-tenant application it seems very attractive to segregate data for each tenant using a tenant specific keyspace each with their own set of tables. It's not until you start browsing through forums such as this that you find out that it isn't going to scale above a few tenants. If you want to be able to segregate customer data in Cassandra is it the accepted practice to have multiple Cassandra installations? -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Reasonable-range-for-the-max-number-of-tables-tp7596094p7596106.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Re: Reasonable range for the max number of tables?
- Use a keyspace per customer These effectively amount to the same thing and they both fall foul to the limit in the number of column families so do not scale. But then you can scale by moving some of the customers to a new cluster easily. If you keep everything in a single keyspace or - worse - if you do your multitenancy by prefixing row keys with customer ids of some kind, it won't be that easy, as you wrote later in your e-mail. M. Kind regards, Michał Michalski, michal.michal...@boxever.com On 5 August 2014 12:36, Phil Luckhurst phil.luckhu...@powerassure.com wrote: Hi Mark, Mark Reddy wrote To segregate customer data, you could: - Use customer specific column families under a single keyspace - Use a keyspace per customer These effectively amount to the same thing and they both fall foul to the limit in the number of column families so do not scale. Mark Reddy wrote - Use the same column families and have a column that identifies the customer. On the application layer ensure that there are sufficient checks so one customer can't read another customers data And while this gets around the column family limit it does not allow the same level of data segregation. For example with a separate keyspace or column families it is trivial to remove a single customer's data or move that data to another system. With one set of column families for all customers these types of actions become much more difficult as any change impacts all customers but perhaps that's the price we have to pay to scale. And I still think this needs to be made more prominent in the documentation. Thanks Phil -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Reasonable-range-for-the-max-number-of-tables-tp7596094p7596119.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Fail to reconnect to other nodes after intermittent network failure
Hi, we experienced a strange problem after an intermittent network failure: the affected node did not reconnect to the rest of the cluster but did allow users to authenticate (which was not possible during the actual network outage, see below). The cluster consists of 1 node in each of 3 datacenters; it uses C* 1.2.16 with SSL enabled both to clients and between C* nodes. Authentication is enabled as well. The problem started around 2014-08-01 when Cassandra first noticed a network problem:
INFO [GossipTasks:1] 2014-08-01 07:47:52,618 Gossiper.java (line 823) InetAddress /77.234.44.20 is now DOWN
INFO [GossipTasks:1] 2014-08-01 07:47:55,619 Gossiper.java (line 823) InetAddress mia10.ff.avast.com/77.234.42.20 is now DOWN
The network came up for a while:
INFO [GossipStage:1] 2014-08-01 07:51:29,380 Gossiper.java (line 809) InetAddress /77.234.42.20 is now UP
INFO [HintedHandoff:1] 2014-08-01 07:51:29,381 HintedHandOffManager.java (line 296) Started hinted handoff for host: 9252f37c-1c9a-418b-a49f-6065511946e4 with IP: /77.234.42.20
INFO [GossipStage:1] 2014-08-01 07:51:29,381 Gossiper.java (line 809) InetAddress /77.234.44.20 is now UP
INFO [HintedHandoff:2] 2014-08-01 07:51:29,385 HintedHandOffManager.java (line 296) Started hinted handoff for host: 97b1943a-3689-4e4a-a39d-d5a11c0cc309 with IP: /77.234.44.20
But it failed to send hints:
INFO [HintedHandoff:1] 2014-08-01 07:51:39,389 HintedHandOffManager.java (line 427) Timed out replaying hints to /77.234.42.20; aborting (0 delivered)
INFO [HintedHandoff:2] 2014-08-01 07:51:39,390 HintedHandOffManager.java (line 427) Timed out replaying hints to /77.234.44.20; aborting (0 delivered)
Also, the log started to be flooded with failed authentication attempts. My understanding is that authentication data are read at QUORUM, which failed as the other two nodes were down:
ERROR [Native-Transport-Requests:446116] 2014-08-01 07:51:39,985 QueryMessage.java (line 97) Unexpected error during query com.google.common.util.concurrent.UncheckedExecutionException: java.lang.RuntimeException: org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 0 responses.
at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2258) at com.google.common.cache.LocalCache.get(LocalCache.java:3990) at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3994) at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4878) at org.apache.cassandra.service.ClientState.authorize(ClientState.java:292) at org.apache.cassandra.service.ClientState.ensureHasPermission(ClientState.java:172) at org.apache.cassandra.service.ClientState.hasAccess(ClientState.java:165) at org.apache.cassandra.service.ClientState.hasColumnFamilyAccess(ClientState.java:149) at org.apache.cassandra.cql3.statements.SelectStatement.checkAccess(SelectStatement.java:116) at org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:102) at org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:113) at org.apache.cassandra.transport.messages.QueryMessage.execute(QueryMessage.java:87) at org.apache.cassandra.transport.Message$Dispatcher.messageReceived(Message.java:287) at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70) at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791) at org.jboss.netty.handler.execution.ChannelUpstreamEventRunnable.doRun(ChannelUpstreamEventRunnable.java:43) at org.jboss.netty.handler.execution.ChannelEventRunnable.run(ChannelEventRunnable.java:67) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:722) Caused by: java.lang.RuntimeException: org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 0 responses. at org.apache.cassandra.auth.Auth.selectUser(Auth.java:256) at org.apache.cassandra.auth.Auth.isSuperuser(Auth.java:84) at org.apache.cassandra.auth.AuthenticatedUser.isSuper(AuthenticatedUser.java:50) at org.apache.cassandra.auth.CassandraAuthorizer.authorize(CassandraAuthorizer.java:68) at org.apache.cassandra.service.ClientState$1.load(ClientState.java:278) at org.apache.cassandra.service.ClientState$1.load(ClientState.java:275) at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3589) at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2374) at
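One thing worth checking in a layout like this, offered only as a guess prompted by the QUORUM timeouts above: the system_auth keyspace defaults to SimpleStrategy with RF=1, so a multi-DC cluster usually wants it replicated to every DC (the DC names below are placeholders), and as far as I know the built-in 'cassandra' superuser is always read at QUORUM, so it can still be locked out during a partition even then:

echo "ALTER KEYSPACE system_auth WITH REPLICATION = {'class':'NetworkTopologyStrategy','DC1':1,'DC2':1,'DC3':1};" | cqlsh -u cassandra -p cassandra
# then on each node, so every DC holds the auth data locally:
nodetool repair system_auth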
Re: data type is object when metric instrument using Gauge?
If you look at VisualVM metadata, it'll show that what's returned is java.lang.Object, which is different from Meters or Counters. Looking at the source for metrics-core, it seems that this is a feature of Gauges because, unlike Meters or Counters, Gauges can be of various types -- long, double, etc. The Cassandra source sets them up as longs, however the JMXReporter class in metrics-core always exposes them as Objects. On Mon, Aug 4, 2014 at 7:32 PM, Patricia Gorla patri...@thelastpickle.com wrote: Mike, What metrics reporter are you using? How are you attempting to access the metric? On Sat, Aug 2, 2014 at 7:30 AM, mike maomao...@gmail.com wrote: Dear All We are trying to monitor Cassandra using JMX. The monitoring tool we are using works fine for meters. However, if the metrics are collected using a gauge, the data type is object, and then our tool treats it as a string instead of a double. For example org.apache.cassandra.metrics:type=Cache,scope=KeyCache,name=Capacity The Type of Attribute (Value) is java.lang.Object Is it possible to implement the datatype of gauge as numeric types instead of object, or the other way around, for example using a metric reporter...etc? Thanks a lot for any suggestion! Best Regards! Mike -- Patricia Gorla @patriciagorla Consultant Apache Cassandra Consulting http://www.thelastpickle.com http://thelastpickle.com -- Ken Hancock | System Architect, Advanced Advertising SeaChange International 50 Nagog Park Acton, Massachusetts 01720 ken.hanc...@schange.com | www.schange.com | NASDAQ:SEAC http://www.schange.com/en-US/Company/InvestorRelations.aspx Office: +1 (978) 889-3329 | Google Talk: ken.hanc...@schange.com | Skype: hancockks | Yahoo IM: hancockks | LinkedIn: http://www.linkedin.com/in/kenhancock | SeaChange International http://www.schange.com/ This e-mail and any attachments may contain information which is SeaChange International confidential. The information enclosed is intended only for the addressees herein and may not be copied or forwarded without permission from SeaChange International.
Re: Node bootstrap
Thanks Patricia for your response! On the new node, I just see a lot of the following:
INFO [FlushWriter:75] 2014-08-05 09:53:04,394 Memtable.java (line 400) Writing Memtable
INFO [CompactionExecutor:3] 2014-08-05 09:53:11,132 CompactionTask.java (line 262) Compacted 12 sstables to ...
so basically it is just busy flushing and compacting. Would you have any ideas on why the disk usage has blown up to 2x? My understanding was that if initial_token is left empty on the new node, it just contacts the heaviest node and bisects its token range. And the heaviest node is around 2.1 TB, and the new node is already at 4 TB. Could this be because compaction is falling behind? Ruchir On Mon, Aug 4, 2014 at 7:23 PM, Patricia Gorla patri...@thelastpickle.com wrote: Ruchir, What exactly are you seeing in the logs? Are you running major compactions on the new bootstrapping node? With respect to the seed list, it is generally advisable to use 3 seed nodes per AZ / DC. Cheers, On Mon, Aug 4, 2014 at 11:41 AM, Ruchir Jha ruchir@gmail.com wrote: I am trying to bootstrap the thirteenth node in a 12 node cluster where the average data size per node is about 2.1 TB. The bootstrap streaming has been going on for 2 days now, and the disk size on the new node is already above 4 TB and still going. Is this because the new node is running major compactions while the streaming is going on? One thing that I noticed that seemed off was the seeds property in the yaml of the 13th node comprises of 1..12. Where as the seeds property on the existing 12 nodes consists of all the other nodes except the thirteenth node. Is this an issue? Any other insight is appreciated? Ruchir. -- Patricia Gorla @patriciagorla Consultant Apache Cassandra Consulting http://www.thelastpickle.com http://thelastpickle.com
RE: A question about using 'update keyspace with strategyoptions' command
Thanks a lot. So the ‘strategy’ change may not be seen by all nodes when the ‘upgrade keyspace …’ command returns and I can use ’describe cluster’ to check if the change has taken effect on all nodes right? From: Rahul Neelakantan [mailto:ra...@rahul.be] Sent: 2014年8月5日 18:46 To: user@cassandra.apache.org Subject: Re: A question about using 'update keyspace with strategyoptions' command Try running describe cluster from Cassandra-CLI to see if all nodes have the same schema version. Rahul Neelakantan On Aug 5, 2014, at 6:13 AM, Sylvain Lebresne sylv...@datastax.commailto:sylv...@datastax.com wrote: On Tue, Aug 5, 2014 at 11:40 AM, Lu, Boying boying...@emc.commailto:boying...@emc.com wrote: What I want to know is “are the strategy changed ?’ after the ‘udpate keyspace with strategy_options…’ command returns successfully Like all schema changes, not necessarily on all nodes. You will have to check for schema agreement between nodes. Not the data change. e.g. say I run the command ‘update keyspace with strategy_opitons [dc1: 3, dc2:3]’ , when this command returns, are the strategy options already changed? Or I need to wait some time for the strategy to be changed? From: Sylvain Lebresne [mailto:sylv...@datastax.commailto:sylv...@datastax.com] Sent: 2014年8月5日 16:59 To: user@cassandra.apache.orgmailto:user@cassandra.apache.org Subject: Re: A question about using 'update keyspace with strategyoptions' command Changing the strategy options, and in particular the replication factor, does not perform any data replication by itself. You need to run a repair to ensure data is replicated following the new replication. On Tue, Aug 5, 2014 at 10:52 AM, Lu, Boying boying...@emc.commailto:boying...@emc.com wrote: Thanks. yes. I can use the ‘show keyspace’ command to check and see the strategy does changed. But what I want to know is if the ‘update keyspace with strategy_options …’ command is a ‘sync’ operation or a ‘async’ operation. From: Rahul Menon [mailto:ra...@apigee.commailto:ra...@apigee.com] Sent: 2014年8月5日 16:38 To: user Subject: Re: A question about using 'update keyspace with strategyoptions' command Try the show keyspaces command and look for Options under each keyspace. Thanks Rahul On Tue, Aug 5, 2014 at 2:01 PM, Lu, Boying boying...@emc.commailto:boying...@emc.com wrote: Hi, All, I want to run ‘update keyspace with strategy_options={dc1:3, dc2:3}’ from cassandra-cli to update the strategy options of some keyspace in a multi-DC environment. When the command returns successfully, does it mean that the strategy options have been updated successfully or I need to wait some time for the change to be propagated to all DCs? Thanks Boying
Re: Node bootstrap
Yes num_tokens is set to 256. initial_token is blank on all nodes including the new one. On Tue, Aug 5, 2014 at 10:03 AM, Mark Reddy mark.re...@boxever.com wrote: My understanding was that if initial_token is left empty on the new node, it just contacts the heaviest node and bisects its token range. If you are using vnodes and you have num_tokens set to 256 the new node will take token ranges dynamically. What is the configuration of your other nodes, are you setting num_tokens or initial_token on those? Mark On Tue, Aug 5, 2014 at 2:57 PM, Ruchir Jha ruchir@gmail.com wrote: Thanks Patricia for your response! On the new node, I just see a lot of the following: INFO [FlushWriter:75] 2014-08-05 09:53:04,394 Memtable.java (line 400) Writing Memtable INFO [CompactionExecutor:3] 2014-08-05 09:53:11,132 CompactionTask.java (line 262) Compacted 12 sstables to so basically it is just busy flushing, and compacting. Would you have any ideas on why the 2x disk space blow up. My understanding was that if initial_token is left empty on the new node, it just contacts the heaviest node and bisects its token range. And the heaviest node is around 2.1 TB, and the new node is already at 4 TB. Could this be because compaction is falling behind? Ruchir On Mon, Aug 4, 2014 at 7:23 PM, Patricia Gorla patri...@thelastpickle.com wrote: Ruchir, What exactly are you seeing in the logs? Are you running major compactions on the new bootstrapping node? With respect to the seed list, it is generally advisable to use 3 seed nodes per AZ / DC. Cheers, On Mon, Aug 4, 2014 at 11:41 AM, Ruchir Jha ruchir@gmail.com wrote: I am trying to bootstrap the thirteenth node in a 12 node cluster where the average data size per node is about 2.1 TB. The bootstrap streaming has been going on for 2 days now, and the disk size on the new node is already above 4 TB and still going. Is this because the new node is running major compactions while the streaming is going on? One thing that I noticed that seemed off was the seeds property in the yaml of the 13th node comprises of 1..12. Where as the seeds property on the existing 12 nodes consists of all the other nodes except the thirteenth node. Is this an issue? Any other insight is appreciated? Ruchir. -- Patricia Gorla @patriciagorla Consultant Apache Cassandra Consulting http://www.thelastpickle.com http://thelastpickle.com
Re: Node bootstrap
Also not sure if this is relevant but just noticed the nodetool tpstats output: Pool NameActive Pending Completed Blocked All time blocked FlushWriter 0 0 1136 0 512 Looks like about 50% of flushes are blocked. On Tue, Aug 5, 2014 at 10:14 AM, Ruchir Jha ruchir@gmail.com wrote: Yes num_tokens is set to 256. initial_token is blank on all nodes including the new one. On Tue, Aug 5, 2014 at 10:03 AM, Mark Reddy mark.re...@boxever.com wrote: My understanding was that if initial_token is left empty on the new node, it just contacts the heaviest node and bisects its token range. If you are using vnodes and you have num_tokens set to 256 the new node will take token ranges dynamically. What is the configuration of your other nodes, are you setting num_tokens or initial_token on those? Mark On Tue, Aug 5, 2014 at 2:57 PM, Ruchir Jha ruchir@gmail.com wrote: Thanks Patricia for your response! On the new node, I just see a lot of the following: INFO [FlushWriter:75] 2014-08-05 09:53:04,394 Memtable.java (line 400) Writing Memtable INFO [CompactionExecutor:3] 2014-08-05 09:53:11,132 CompactionTask.java (line 262) Compacted 12 sstables to so basically it is just busy flushing, and compacting. Would you have any ideas on why the 2x disk space blow up. My understanding was that if initial_token is left empty on the new node, it just contacts the heaviest node and bisects its token range. And the heaviest node is around 2.1 TB, and the new node is already at 4 TB. Could this be because compaction is falling behind? Ruchir On Mon, Aug 4, 2014 at 7:23 PM, Patricia Gorla patri...@thelastpickle.com wrote: Ruchir, What exactly are you seeing in the logs? Are you running major compactions on the new bootstrapping node? With respect to the seed list, it is generally advisable to use 3 seed nodes per AZ / DC. Cheers, On Mon, Aug 4, 2014 at 11:41 AM, Ruchir Jha ruchir@gmail.com wrote: I am trying to bootstrap the thirteenth node in a 12 node cluster where the average data size per node is about 2.1 TB. The bootstrap streaming has been going on for 2 days now, and the disk size on the new node is already above 4 TB and still going. Is this because the new node is running major compactions while the streaming is going on? One thing that I noticed that seemed off was the seeds property in the yaml of the 13th node comprises of 1..12. Where as the seeds property on the existing 12 nodes consists of all the other nodes except the thirteenth node. Is this an issue? Any other insight is appreciated? Ruchir. -- Patricia Gorla @patriciagorla Consultant Apache Cassandra Consulting http://www.thelastpickle.com http://thelastpickle.com
Re: Node bootstrap
Yes num_tokens is set to 256. initial_token is blank on all nodes including the new one. Ok so you have num_tokens set to 256 for all nodes with initial_token commented out, this means you are using vnodes and the new node will automatically grab a list of tokens to take over responsibility for. Pool NameActive Pending Completed Blocked All time blocked FlushWriter 0 0 1136 0 512 Looks like about 50% of flushes are blocked. This is a problem as it indicates that the IO system cannot keep up. Just ran this on the new node: nodetool netstats | grep Streaming from | wc -l 10 This is normal as the new node will most likely take tokens from all nodes in the cluster. Sorry for the multiple updates, but another thing I found was all the other existing nodes have themselves in the seeds list, but the new node does not have itself in the seeds list. Can that cause this issue? Seeds are only used when a new node is bootstrapping into the cluster and needs a set of ips to contact and discover the cluster, so this would have no impact on data sizes or streaming. In general it would be considered best practice to have a set of 2-3 seeds from each data center, with all nodes having the same seed list. What is the current output of 'nodetool compactionstats'? Could you also paste the output of nodetool status keyspace? Mark On Tue, Aug 5, 2014 at 3:59 PM, Ruchir Jha ruchir@gmail.com wrote: Sorry for the multiple updates, but another thing I found was all the other existing nodes have themselves in the seeds list, but the new node does not have itself in the seeds list. Can that cause this issue? On Tue, Aug 5, 2014 at 10:30 AM, Ruchir Jha ruchir@gmail.com wrote: Just ran this on the new node: nodetool netstats | grep Streaming from | wc -l 10 Seems like the new node is receiving data from 10 other nodes. Is that expected in a vnodes enabled environment? Ruchir. On Tue, Aug 5, 2014 at 10:21 AM, Ruchir Jha ruchir@gmail.com wrote: Also not sure if this is relevant but just noticed the nodetool tpstats output: Pool NameActive Pending Completed Blocked All time blocked FlushWriter 0 0 1136 0 512 Looks like about 50% of flushes are blocked. On Tue, Aug 5, 2014 at 10:14 AM, Ruchir Jha ruchir@gmail.com wrote: Yes num_tokens is set to 256. initial_token is blank on all nodes including the new one. On Tue, Aug 5, 2014 at 10:03 AM, Mark Reddy mark.re...@boxever.com wrote: My understanding was that if initial_token is left empty on the new node, it just contacts the heaviest node and bisects its token range. If you are using vnodes and you have num_tokens set to 256 the new node will take token ranges dynamically. What is the configuration of your other nodes, are you setting num_tokens or initial_token on those? Mark On Tue, Aug 5, 2014 at 2:57 PM, Ruchir Jha ruchir@gmail.com wrote: Thanks Patricia for your response! On the new node, I just see a lot of the following: INFO [FlushWriter:75] 2014-08-05 09:53:04,394 Memtable.java (line 400) Writing Memtable INFO [CompactionExecutor:3] 2014-08-05 09:53:11,132 CompactionTask.java (line 262) Compacted 12 sstables to so basically it is just busy flushing, and compacting. Would you have any ideas on why the 2x disk space blow up. My understanding was that if initial_token is left empty on the new node, it just contacts the heaviest node and bisects its token range. And the heaviest node is around 2.1 TB, and the new node is already at 4 TB. Could this be because compaction is falling behind? 
Ruchir On Mon, Aug 4, 2014 at 7:23 PM, Patricia Gorla patri...@thelastpickle.com wrote: Ruchir, What exactly are you seeing in the logs? Are you running major compactions on the new bootstrapping node? With respect to the seed list, it is generally advisable to use 3 seed nodes per AZ / DC. Cheers, On Mon, Aug 4, 2014 at 11:41 AM, Ruchir Jha ruchir@gmail.com wrote: I am trying to bootstrap the thirteenth node in a 12 node cluster where the average data size per node is about 2.1 TB. The bootstrap streaming has been going on for 2 days now, and the disk size on the new node is already above 4 TB and still going. Is this because the new node is running major compactions while the streaming is going on? One thing that I noticed that seemed off was the seeds property in the yaml of the 13th node comprises of 1..12. Where as the seeds property on the existing 12 nodes consists of all the other nodes except the thirteenth node. Is this an issue? Any other insight is appreciated? Ruchir. -- Patricia Gorla @patriciagorla Consultant Apache Cassandra Consulting http://www.thelastpickle.com http://thelastpickle.com
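A few diagnostics and knobs that are commonly used in this situation; whether to unthrottle compaction is a judgment call and depends on what the disks can absorb:

# how far behind compaction is on the joining node, and what is still streaming in
nodetool compactionstats
nodetool tpstats
nodetool netstats

# temporarily raise or remove the compaction throttle (0 disables it;
# the default compaction_throughput_mb_per_sec is 16)
nodetool setcompactionthroughput 0

# streaming itself is limited by stream_throughput_outbound_megabits_per_sec
# in cassandra.yaml on the sending nodes (path varies by install)
grep stream_throughput_outbound /etc/cassandra/cassandra.yaml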
Re: vnode and NetworkTopologyStrategy: not playing well together ?
This is incorrect. Network Topology w/ Vnodes will be fine, assuming you've got RF= # of racks. For each token, replicas are chosen based on the strategy. Essentially, you could have a wild imbalance in token ownership, but it wouldn't matter because the replicas would be distributed across the rest of the machines. http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/architectureDataDistributeReplication_c.html On Tue, Aug 5, 2014 at 8:19 AM, DE VITO Dominique dominique.dev...@thalesgroup.com wrote: Hi, My understanding is that NetworkTopologyStrategy does NOT play well with vnodes, due to: · Vnode = tokens are (usually) randomly generated (AFAIK) · NetworkTopologyStrategy = required carefully choosen tokens for all nodes in order to not to get a VERY unbalanced ring like in https://issues.apache.org/jira/browse/CASSANDRA-3810 When playing with vnodes, is the recommendation to define one rack for the entire cluster ? Thanks. Regards, Dominique -- Jon Haddad http://www.rustyrazorblade.com skype: rustyrazorblade
Re: vnode and NetworkTopologyStrategy: not playing well together ?
* When I say wild imbalance, I do not mean all tokens on 1 node in the cluster, I really should have said slightly imbalanced On Tue, Aug 5, 2014 at 8:43 AM, Jonathan Haddad j...@jonhaddad.com wrote: This is incorrect. Network Topology w/ Vnodes will be fine, assuming you've got RF= # of racks. For each token, replicas are chosen based on the strategy. Essentially, you could have a wild imbalance in token ownership, but it wouldn't matter because the replicas would be distributed across the rest of the machines. http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/architectureDataDistributeReplication_c.html On Tue, Aug 5, 2014 at 8:19 AM, DE VITO Dominique dominique.dev...@thalesgroup.com wrote: Hi, My understanding is that NetworkTopologyStrategy does NOT play well with vnodes, due to: · Vnode = tokens are (usually) randomly generated (AFAIK) · NetworkTopologyStrategy = required carefully choosen tokens for all nodes in order to not to get a VERY unbalanced ring like in https://issues.apache.org/jira/browse/CASSANDRA-3810 When playing with vnodes, is the recommendation to define one rack for the entire cluster ? Thanks. Regards, Dominique -- Jon Haddad http://www.rustyrazorblade.com skype: rustyrazorblade -- Jon Haddad http://www.rustyrazorblade.com skype: rustyrazorblade
RE: vnode and NetworkTopologyStrategy: not playing well together ?
First, thanks for your answer. This is incorrect. Network Topology w/ Vnodes will be fine, assuming you've got RF= # of racks. IMHO, it's not a good enough condition. Let's use an example with RF=2 N1/rack_1 N2/rack_1 N3/rack_1 N4/rack_2 Here, you have RF= # of racks And due to NetworkTopologyStrategy, N4 will store *all* the cluster data, leading to a completely imbalanced cluster. IMHO, it happens when using nodes *or* vnodes. As well-balanced clusters with NetworkTopologyStrategy rely on carefully chosen token distribution/path along the ring *and* as tokens are randomly-generated with vnodes, my guess is that with vnodes and NetworkTopologyStrategy, it's better to define a single (logical) rack // due to carefully chosen tokens vs randomly-generated token clash. I don't see other options left. Do you see other ones ? Regards, Dominique -Message d'origine- De : jonathan.had...@gmail.com [mailto:jonathan.had...@gmail.com] De la part de Jonathan Haddad Envoyé : mardi 5 août 2014 17:43 À : user@cassandra.apache.org Objet : Re: vnode and NetworkTopologyStrategy: not playing well together ? This is incorrect. Network Topology w/ Vnodes will be fine, assuming you've got RF= # of racks. For each token, replicas are chosen based on the strategy. Essentially, you could have a wild imbalance in token ownership, but it wouldn't matter because the replicas would be distributed across the rest of the machines. http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/architectureDataDistributeReplication_c.html On Tue, Aug 5, 2014 at 8:19 AM, DE VITO Dominique dominique.dev...@thalesgroup.com wrote: Hi, My understanding is that NetworkTopologyStrategy does NOT play well with vnodes, due to: · Vnode = tokens are (usually) randomly generated (AFAIK) · NetworkTopologyStrategy = required carefully choosen tokens for all nodes in order to not to get a VERY unbalanced ring like in https://issues.apache.org/jira/browse/CASSANDRA-3810 When playing with vnodes, is the recommendation to define one rack for the entire cluster ? Thanks. Regards, Dominique -- Jon Haddad http://www.rustyrazorblade.com skype: rustyrazorblade
Re: vnode and NetworkTopologyStrategy: not playing well together ?
If your nodes are not actually evenly distributed across physical racks for redundancy, don't use multiple racks. On Tue, Aug 5, 2014 at 10:57 AM, DE VITO Dominique dominique.dev...@thalesgroup.com wrote: First, thanks for your answer. This is incorrect. Network Topology w/ Vnodes will be fine, assuming you've got RF= # of racks. IMHO, it's not a good enough condition. Let's use an example with RF=2 N1/rack_1 N2/rack_1 N3/rack_1 N4/rack_2 Here, you have RF= # of racks And due to NetworkTopologyStrategy, N4 will store *all* the cluster data, leading to a completely imbalanced cluster. IMHO, it happens when using nodes *or* vnodes. As well-balanced clusters with NetworkTopologyStrategy rely on carefully chosen token distribution/path along the ring *and* as tokens are randomly-generated with vnodes, my guess is that with vnodes and NetworkTopologyStrategy, it's better to define a single (logical) rack // due to carefully chosen tokens vs randomly-generated token clash. I don't see other options left. Do you see other ones ? Regards, Dominique -Message d'origine- De : jonathan.had...@gmail.com [mailto:jonathan.had...@gmail.com] De la part de Jonathan Haddad Envoyé : mardi 5 août 2014 17:43 À : user@cassandra.apache.org Objet : Re: vnode and NetworkTopologyStrategy: not playing well together ? This is incorrect. Network Topology w/ Vnodes will be fine, assuming you've got RF= # of racks. For each token, replicas are chosen based on the strategy. Essentially, you could have a wild imbalance in token ownership, but it wouldn't matter because the replicas would be distributed across the rest of the machines. http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/architectureDataDistributeReplication_c.html On Tue, Aug 5, 2014 at 8:19 AM, DE VITO Dominique dominique.dev...@thalesgroup.com wrote: Hi, My understanding is that NetworkTopologyStrategy does NOT play well with vnodes, due to: · Vnode = tokens are (usually) randomly generated (AFAIK) · NetworkTopologyStrategy = required carefully choosen tokens for all nodes in order to not to get a VERY unbalanced ring like in https://issues.apache.org/jira/browse/CASSANDRA-3810 When playing with vnodes, is the recommendation to define one rack for the entire cluster ? Thanks. Regards, Dominique -- Jon Haddad http://www.rustyrazorblade.com skype: rustyrazorblade
Re: vnode and NetworkTopologyStrategy: not playing well together ?
Yes, if you have only 1 machine in a rack then your cluster will be imbalanced. You're going to be able to dream up all sorts of weird failure cases when you choose a scenario like RF=2 totally imbalanced network arch. Vnodes attempt to solve the problem of imbalanced rings by choosing so many tokens that it's improbable that the ring will be imbalanced. On Tue, Aug 5, 2014 at 8:57 AM, DE VITO Dominique dominique.dev...@thalesgroup.com wrote: First, thanks for your answer. This is incorrect. Network Topology w/ Vnodes will be fine, assuming you've got RF= # of racks. IMHO, it's not a good enough condition. Let's use an example with RF=2 N1/rack_1 N2/rack_1 N3/rack_1 N4/rack_2 Here, you have RF= # of racks And due to NetworkTopologyStrategy, N4 will store *all* the cluster data, leading to a completely imbalanced cluster. IMHO, it happens when using nodes *or* vnodes. As well-balanced clusters with NetworkTopologyStrategy rely on carefully chosen token distribution/path along the ring *and* as tokens are randomly-generated with vnodes, my guess is that with vnodes and NetworkTopologyStrategy, it's better to define a single (logical) rack // due to carefully chosen tokens vs randomly-generated token clash. I don't see other options left. Do you see other ones ? Regards, Dominique -Message d'origine- De : jonathan.had...@gmail.com [mailto:jonathan.had...@gmail.com] De la part de Jonathan Haddad Envoyé : mardi 5 août 2014 17:43 À : user@cassandra.apache.org Objet : Re: vnode and NetworkTopologyStrategy: not playing well together ? This is incorrect. Network Topology w/ Vnodes will be fine, assuming you've got RF= # of racks. For each token, replicas are chosen based on the strategy. Essentially, you could have a wild imbalance in token ownership, but it wouldn't matter because the replicas would be distributed across the rest of the machines. http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/architectureDataDistributeReplication_c.html On Tue, Aug 5, 2014 at 8:19 AM, DE VITO Dominique dominique.dev...@thalesgroup.com wrote: Hi, My understanding is that NetworkTopologyStrategy does NOT play well with vnodes, due to: · Vnode = tokens are (usually) randomly generated (AFAIK) · NetworkTopologyStrategy = required carefully choosen tokens for all nodes in order to not to get a VERY unbalanced ring like in https://issues.apache.org/jira/browse/CASSANDRA-3810 When playing with vnodes, is the recommendation to define one rack for the entire cluster ? Thanks. Regards, Dominique -- Jon Haddad http://www.rustyrazorblade.com skype: rustyrazorblade -- Jon Haddad http://www.rustyrazorblade.com skype: rustyrazorblade
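For completeness, a sketch of the evenly-distributed layout being recommended, assuming GossipingPropertyFileSnitch; the DC, rack and keyspace names are placeholders:

# on each node, conf/cassandra-rackdc.properties says where it lives; with RF=3,
# keeping the same number of nodes in each of three racks avoids the imbalance
# described above (e.g. four nodes each in RAC1, RAC2 and RAC3):
cat > conf/cassandra-rackdc.properties <<'EOF'
dc=DC1
rack=RAC1
EOF

echo "CREATE KEYSPACE my_keyspace WITH REPLICATION = {'class':'NetworkTopologyStrategy','DC1':3};" | cqlsh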
Re: A question about using 'update keyspace with strategyoptions' command
So the ‘strategy’ change may not be seen by all nodes when the ‘upgrade keyspace …’ command returns and I can use ’describe cluster’ to check if the change has taken effect on all nodes right? Correct, the change may take time to propagate to all nodes. As Rahul said you can check describe cluster in cli to be sure. Mark On Tue, Aug 5, 2014 at 3:06 PM, Lu, Boying boying...@emc.com wrote: Thanks a lot. So the ‘strategy’ change may not be seen by all nodes when the ‘upgrade keyspace …’ command returns and I can use ’describe cluster’ to check if the change has taken effect on all nodes right? *From:* Rahul Neelakantan [mailto:ra...@rahul.be] *Sent:* 2014年8月5日 18:46 *To:* user@cassandra.apache.org *Subject:* Re: A question about using 'update keyspace with strategyoptions' command Try running describe cluster from Cassandra-CLI to see if all nodes have the same schema version. Rahul Neelakantan On Aug 5, 2014, at 6:13 AM, Sylvain Lebresne sylv...@datastax.com wrote: On Tue, Aug 5, 2014 at 11:40 AM, Lu, Boying boying...@emc.com wrote: What I want to know is “are the *strategy* changed ?’ after the ‘udpate keyspace with strategy_options…’ command returns successfully Like all schema changes, not necessarily on all nodes. You will have to check for schema agreement between nodes. Not the *data* change. e.g. say I run the command ‘update keyspace with strategy_opitons [dc1: 3, dc2:3]’ , when this command returns, are the *strategy* options already changed? Or I need to wait some time for the strategy to be changed? *From:* Sylvain Lebresne [mailto:sylv...@datastax.com] *Sent:* 2014年8月5日 16:59 *To:* user@cassandra.apache.org *Subject:* Re: A question about using 'update keyspace with strategyoptions' command Changing the strategy options, and in particular the replication factor, does not perform any data replication by itself. You need to run a repair to ensure data is replicated following the new replication. On Tue, Aug 5, 2014 at 10:52 AM, Lu, Boying boying...@emc.com wrote: Thanks. yes. I can use the ‘show keyspace’ command to check and see the strategy does changed. But what I want to know is if the ‘update keyspace with strategy_options …’ command is a ‘sync’ operation or a ‘async’ operation. *From:* Rahul Menon [mailto:ra...@apigee.com] *Sent:* 2014年8月5日 16:38 *To:* user *Subject:* Re: A question about using 'update keyspace with strategyoptions' command Try the show keyspaces command and look for Options under each keyspace. Thanks Rahul On Tue, Aug 5, 2014 at 2:01 PM, Lu, Boying boying...@emc.com wrote: Hi, All, I want to run ‘update keyspace with strategy_options={dc1:3, dc2:3}’ from cassandra-cli to update the strategy options of some keyspace in a multi-DC environment. When the command returns successfully, does it mean that the strategy options have been updated successfully or I need to wait some time for the change to be propagated to all DCs? Thanks Boying
Re: Node bootstrap
nodetool status: Datacenter: datacenter1 === Status=Up/Down |/ State=Normal/Leaving/Joining/Moving -- Address Load Tokens Owns (effective) Host ID Rack UN 10.10.20.27 1.89 TB256 25.4% 76023cdd-c42d-4068-8b53-ae94584b8b04 rack1 UN 10.10.20.62 1.83 TB256 25.5% 84b47313-da75-4519-94f3-3951d554a3e5 rack1 UN 10.10.20.47 1.87 TB256 24.7% bcd51a92-3150-41ae-9c51-104ea154f6fa rack1 UN 10.10.20.45 1.7 TB 256 22.6% 8d6bce33-8179-4660-8443-2cf822074ca4 rack1 UN 10.10.20.15 1.86 TB256 24.5% 01a01f07-4df2-4c87-98e9-8dd38b3e4aee rack1 UN 10.10.20.31 1.87 TB256 24.9% 1435acf9-c64d-4bcd-b6a4-abcec209815e rack1 UN 10.10.20.35 1.86 TB256 25.8% 17cb8772-2444-46ff-8525-33746514727d rack1 UN 10.10.20.51 1.89 TB256 25.0% 0343cd58-3686-465f-8280-56fb72d161e2 rack1 UN 10.10.20.19 1.91 TB256 25.5% 30ddf003-4d59-4a3e-85fa-e94e4adba1cb rack1 UN 10.10.20.39 1.93 TB256 26.0% b7d44c26-4d75-4d36-a779-b7e7bdaecbc9 rack1 UN 10.10.20.52 1.81 TB256 25.4% 6b5aca07-1b14-4bc2-a7ba-96f026fa0e4e rack1 UN 10.10.20.22 1.89 TB256 24.8% 46af9664-8975-4c91-847f-3f7b8f8d5ce2 rack1 Note: The new node is not part of the above list. nodetool compactionstats: pending tasks: 1649 compaction typekeyspace column family completed total unit progress Compaction iprod customerorder 1682804084 17956558077 bytes 9.37% Compactionprodgatecustomerorder 1664239271 1693502275 bytes98.27% Compaction qa_config_bkupfixsessionconfig_hist 2443 27253 bytes 8.96% Compactionprodgatecustomerorder_hist 1770577280 5026699390 bytes35.22% Compaction iprodgatecustomerorder_hist 2959560205312350192622 bytes 0.95% On Tue, Aug 5, 2014 at 11:37 AM, Mark Reddy mark.re...@boxever.com wrote: Yes num_tokens is set to 256. initial_token is blank on all nodes including the new one. Ok so you have num_tokens set to 256 for all nodes with initial_token commented out, this means you are using vnodes and the new node will automatically grab a list of tokens to take over responsibility for. Pool NameActive Pending Completed Blocked All time blocked FlushWriter 0 0 1136 0 512 Looks like about 50% of flushes are blocked. This is a problem as it indicates that the IO system cannot keep up. Just ran this on the new node: nodetool netstats | grep Streaming from | wc -l 10 This is normal as the new node will most likely take tokens from all nodes in the cluster. Sorry for the multiple updates, but another thing I found was all the other existing nodes have themselves in the seeds list, but the new node does not have itself in the seeds list. Can that cause this issue? Seeds are only used when a new node is bootstrapping into the cluster and needs a set of ips to contact and discover the cluster, so this would have no impact on data sizes or streaming. In general it would be considered best practice to have a set of 2-3 seeds from each data center, with all nodes having the same seed list. What is the current output of 'nodetool compactionstats'? Could you also paste the output of nodetool status keyspace? Mark On Tue, Aug 5, 2014 at 3:59 PM, Ruchir Jha ruchir@gmail.com wrote: Sorry for the multiple updates, but another thing I found was all the other existing nodes have themselves in the seeds list, but the new node does not have itself in the seeds list. Can that cause this issue? On Tue, Aug 5, 2014 at 10:30 AM, Ruchir Jha ruchir@gmail.com wrote: Just ran this on the new node: nodetool netstats | grep Streaming from | wc -l 10 Seems like the new node is receiving data from 10 other nodes. Is that expected in a vnodes enabled environment? Ruchir. 
On Tue, Aug 5, 2014 at 10:21 AM, Ruchir Jha ruchir@gmail.com wrote: Also not sure if this is relevant but just noticed the nodetool tpstats output: Pool NameActive Pending Completed Blocked All time blocked FlushWriter 0 0 1136 0 512 Looks like about 50% of flushes are blocked. On Tue, Aug 5, 2014 at 10:14 AM, Ruchir Jha ruchir@gmail.com wrote: Yes num_tokens is set to 256. initial_token is blank on all nodes including the new one. On Tue, Aug 5, 2014 at 10:03 AM, Mark Reddy mark.re...@boxever.com wrote: My understanding was that if initial_token is left empty on the new node, it just contacts the heaviest node and bisects its token range. If you are using vnodes and you have num_tokens set to 256 the new node will take token ranges dynamically.
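For reference, a minimal sketch of the cassandra.yaml lines that put a node into this vnode mode (the value 256 is the one from this thread; the snippet is illustrative, not a complete configuration):

# cassandra.yaml on every node, including the bootstrapping one
num_tokens: 256
# initial_token:   <- left blank/commented out, so the joining node picks
#                     its 256 token ranges automatically during bootstrap

With num_tokens set and initial_token unset, the new node selects tokens spread across the whole ring, which is why it streams from many existing nodes at once rather than bisecting a single neighbour's range.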
RE: vnode and NetworkTopologyStrategy: not playing well together ?
Jonathan wrote: Yes, if you have only 1 machine in a rack then your cluster will be imbalanced. You're going to be able to dream up all sorts of weird failure cases when you choose a scenario like RF=2 totally imbalanced network arch. Vnodes attempt to solve the problem of imbalanced rings by choosing so many tokens that it's improbable that the ring will be imbalanced. Storage/load distro = function(1st replica placement, other replica placement) vnode solves the balancing pb for 1st replica placement // so, yes, I agree with you, but for 1st replica placement only But NetworkTopologyStrategy (NTS) influences other (2+) replica placement = as NTS best behavior relies on token distro, and you have no control on tokens with vnodes, the best option I see with **vnode** is to use only one rack with NTS. Dominique -Message d'origine- De : jonathan.had...@gmail.com [mailto:jonathan.had...@gmail.com] De la part de Jonathan Haddad Envoyé : mardi 5 août 2014 18:04 À : user@cassandra.apache.org Objet : Re: vnode and NetworkTopologyStrategy: not playing well together ? Yes, if you have only 1 machine in a rack then your cluster will be imbalanced. You're going to be able to dream up all sorts of weird failure cases when you choose a scenario like RF=2 totally imbalanced network arch. Vnodes attempt to solve the problem of imbalanced rings by choosing so many tokens that it's improbable that the ring will be imbalanced. On Tue, Aug 5, 2014 at 8:57 AM, DE VITO Dominique dominique.dev...@thalesgroup.com wrote: First, thanks for your answer. This is incorrect. Network Topology w/ Vnodes will be fine, assuming you've got RF= # of racks. IMHO, it's not a good enough condition. Let's use an example with RF=2 N1/rack_1 N2/rack_1 N3/rack_1 N4/rack_2 Here, you have RF= # of racks And due to NetworkTopologyStrategy, N4 will store *all* the cluster data, leading to a completely imbalanced cluster. IMHO, it happens when using nodes *or* vnodes. As well-balanced clusters with NetworkTopologyStrategy rely on carefully chosen token distribution/path along the ring *and* as tokens are randomly-generated with vnodes, my guess is that with vnodes and NetworkTopologyStrategy, it's better to define a single (logical) rack // due to carefully chosen tokens vs randomly-generated token clash. I don't see other options left. Do you see other ones ? Regards, Dominique -Message d'origine- De : jonathan.had...@gmail.com [mailto:jonathan.had...@gmail.com] De la part de Jonathan Haddad Envoyé : mardi 5 août 2014 17:43 À : user@cassandra.apache.org Objet : Re: vnode and NetworkTopologyStrategy: not playing well together ? This is incorrect. Network Topology w/ Vnodes will be fine, assuming you've got RF= # of racks. For each token, replicas are chosen based on the strategy. Essentially, you could have a wild imbalance in token ownership, but it wouldn't matter because the replicas would be distributed across the rest of the machines. 
http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/architectureDataDistributeReplication_c.html On Tue, Aug 5, 2014 at 8:19 AM, DE VITO Dominique dominique.dev...@thalesgroup.com wrote: Hi, My understanding is that NetworkTopologyStrategy does NOT play well with vnodes, due to: · Vnode = tokens are (usually) randomly generated (AFAIK) · NetworkTopologyStrategy = required carefully choosen tokens for all nodes in order to not to get a VERY unbalanced ring like in https://issues.apache.org/jira/browse/CASSANDRA-3810 When playing with vnodes, is the recommendation to define one rack for the entire cluster ? Thanks. Regards, Dominique -- Jon Haddad http://www.rustyrazorblade.com skype: rustyrazorblade -- Jon Haddad http://www.rustyrazorblade.com skype: rustyrazorblade
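To make the 'single (logical) rack' option concrete, a minimal sketch assuming GossipingPropertyFileSnitch (the dc/rack names below are placeholders):

# cassandra-rackdc.properties, identical on every node in the data center
dc=DC1
rack=RAC1

With every node reporting the same rack, NetworkTopologyStrategy's rack-spreading rule has nothing to spread across, so replicas simply go to the next distinct nodes along the (vnode) ring, which sidesteps the imbalance described above.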
Re: Node bootstrap
Also Mark to your comment on my tpstats output, below is my iostat output, and the iowait is at 4.59%, which means no IO pressure, but we are still seeing the bad flush performance. Should we try increasing the flush writers? Linux 2.6.32-358.el6.x86_64 (ny4lpcas13.fusionts.corp) 08/05/2014 _x86_64_(24 CPU) avg-cpu: %user %nice %system %iowait %steal %idle 5.80 10.250.654.590.00 78.72 Device:tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn sda 103.83 9630.62 11982.60 3231174328 4020290310 dm-0 13.57 160.1781.12 53739546 27217432 dm-1 7.5916.9443.775682200 14686784 dm-2 5792.76 32242.66 45427.12 10817753530 15241278360 sdb 206.09 22789.19 33569.27 7646015080 11262843224 On Tue, Aug 5, 2014 at 12:13 PM, Ruchir Jha ruchir@gmail.com wrote: nodetool status: Datacenter: datacenter1 === Status=Up/Down |/ State=Normal/Leaving/Joining/Moving -- Address Load Tokens Owns (effective) Host ID Rack UN 10.10.20.27 1.89 TB256 25.4% 76023cdd-c42d-4068-8b53-ae94584b8b04 rack1 UN 10.10.20.62 1.83 TB256 25.5% 84b47313-da75-4519-94f3-3951d554a3e5 rack1 UN 10.10.20.47 1.87 TB256 24.7% bcd51a92-3150-41ae-9c51-104ea154f6fa rack1 UN 10.10.20.45 1.7 TB 256 22.6% 8d6bce33-8179-4660-8443-2cf822074ca4 rack1 UN 10.10.20.15 1.86 TB256 24.5% 01a01f07-4df2-4c87-98e9-8dd38b3e4aee rack1 UN 10.10.20.31 1.87 TB256 24.9% 1435acf9-c64d-4bcd-b6a4-abcec209815e rack1 UN 10.10.20.35 1.86 TB256 25.8% 17cb8772-2444-46ff-8525-33746514727d rack1 UN 10.10.20.51 1.89 TB256 25.0% 0343cd58-3686-465f-8280-56fb72d161e2 rack1 UN 10.10.20.19 1.91 TB256 25.5% 30ddf003-4d59-4a3e-85fa-e94e4adba1cb rack1 UN 10.10.20.39 1.93 TB256 26.0% b7d44c26-4d75-4d36-a779-b7e7bdaecbc9 rack1 UN 10.10.20.52 1.81 TB256 25.4% 6b5aca07-1b14-4bc2-a7ba-96f026fa0e4e rack1 UN 10.10.20.22 1.89 TB256 24.8% 46af9664-8975-4c91-847f-3f7b8f8d5ce2 rack1 Note: The new node is not part of the above list. nodetool compactionstats: pending tasks: 1649 compaction typekeyspace column family completed total unit progress Compaction iprod customerorder 1682804084 17956558077 bytes 9.37% Compactionprodgatecustomerorder 1664239271 1693502275 bytes98.27% Compaction qa_config_bkupfixsessionconfig_hist 2443 27253 bytes 8.96% Compactionprodgatecustomerorder_hist 1770577280 5026699390 bytes35.22% Compaction iprodgatecustomerorder_hist 2959560205312350192622 bytes 0.95% On Tue, Aug 5, 2014 at 11:37 AM, Mark Reddy mark.re...@boxever.com wrote: Yes num_tokens is set to 256. initial_token is blank on all nodes including the new one. Ok so you have num_tokens set to 256 for all nodes with initial_token commented out, this means you are using vnodes and the new node will automatically grab a list of tokens to take over responsibility for. Pool NameActive Pending Completed Blocked All time blocked FlushWriter 0 0 1136 0 512 Looks like about 50% of flushes are blocked. This is a problem as it indicates that the IO system cannot keep up. Just ran this on the new node: nodetool netstats | grep Streaming from | wc -l 10 This is normal as the new node will most likely take tokens from all nodes in the cluster. Sorry for the multiple updates, but another thing I found was all the other existing nodes have themselves in the seeds list, but the new node does not have itself in the seeds list. Can that cause this issue? Seeds are only used when a new node is bootstrapping into the cluster and needs a set of ips to contact and discover the cluster, so this would have no impact on data sizes or streaming. 
In general it would be considered best practice to have a set of 2-3 seeds from each data center, with all nodes having the same seed list. What is the current output of 'nodetool compactionstats'? Could you also paste the output of nodetool status keyspace? Mark On Tue, Aug 5, 2014 at 3:59 PM, Ruchir Jha ruchir@gmail.com wrote: Sorry for the multiple updates, but another thing I found was all the other existing nodes have themselves in the seeds list, but the new node does not have itself in the seeds list. Can that cause this issue? On Tue, Aug 5, 2014 at 10:30 AM, Ruchir Jha ruchir@gmail.com wrote: Just ran this on the new node: nodetool netstats | grep Streaming from | wc -l 10 Seems like
Read timeouts with ALLOW FILTERING turned on
Hi all, Allow me to rephrase a question I asked last week. I am performing some queries with ALLOW FILTERING and getting consistent read timeouts like the following: com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra timeout during read query at consistency ONE (1 responses were required but only 0 replica responded) These errors occur only during multi-row scans, and only during integration tests on our build server. I tried to see if I could replicate this error by reducing read_request_timeout_in_ms when I run Cassandra on my local machine (where I have not seen this error), but that is not working. Are there any other parameters that I need to adjust? I'd feel better if I could at least replicate this failure by reducing the read_request_timeout_in_ms (since doing so would mean I actually understand what is going wrong...). Best regards, Clint
Re: Read timeouts with ALLOW FILTERING turned on
On Tue, Aug 5, 2014 at 10:01 AM, Clint Kelly clint.ke...@gmail.com wrote: Allow me to rephrase a question I asked last week. I am performing some queries with ALLOW FILTERING and getting consistent read timeouts like the following: ALLOW FILTERING should be renamed PROBABLY TIMEOUT in order to properly describe its typical performance. As a general statement, if you have to ALLOW FILTERING, you are probably Doing It Wrong in terms of schema design. A correctly operated cluster is unlikely to need to increase the default timeouts. If you find yourself needing to do so, you are, again, probably Doing It Wrong. =Rob
Re: Node bootstrap
Hi Ruchir, With the large number of blocked flushes and the number of pending compactions would still indicate IO contention. Can you post the output of 'iostat -x 5 5' If you do in fact have spare IO, there are several configuration options you can tune such as increasing the number of flush writers and compaction_throughput_mb_per_sec Mark On Tue, Aug 5, 2014 at 5:22 PM, Ruchir Jha ruchir@gmail.com wrote: Also Mark to your comment on my tpstats output, below is my iostat output, and the iowait is at 4.59%, which means no IO pressure, but we are still seeing the bad flush performance. Should we try increasing the flush writers? Linux 2.6.32-358.el6.x86_64 (ny4lpcas13.fusionts.corp) 08/05/2014 _x86_64_(24 CPU) avg-cpu: %user %nice %system %iowait %steal %idle 5.80 10.250.654.590.00 78.72 Device:tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn sda 103.83 9630.62 11982.60 3231174328 4020290310 dm-0 13.57 160.1781.12 53739546 27217432 dm-1 7.5916.9443.775682200 14686784 dm-2 5792.76 32242.66 45427.12 10817753530 15241278360 sdb 206.09 22789.19 33569.27 7646015080 11262843224 On Tue, Aug 5, 2014 at 12:13 PM, Ruchir Jha ruchir@gmail.com wrote: nodetool status: Datacenter: datacenter1 === Status=Up/Down |/ State=Normal/Leaving/Joining/Moving -- Address Load Tokens Owns (effective) Host ID Rack UN 10.10.20.27 1.89 TB256 25.4% 76023cdd-c42d-4068-8b53-ae94584b8b04 rack1 UN 10.10.20.62 1.83 TB256 25.5% 84b47313-da75-4519-94f3-3951d554a3e5 rack1 UN 10.10.20.47 1.87 TB256 24.7% bcd51a92-3150-41ae-9c51-104ea154f6fa rack1 UN 10.10.20.45 1.7 TB 256 22.6% 8d6bce33-8179-4660-8443-2cf822074ca4 rack1 UN 10.10.20.15 1.86 TB256 24.5% 01a01f07-4df2-4c87-98e9-8dd38b3e4aee rack1 UN 10.10.20.31 1.87 TB256 24.9% 1435acf9-c64d-4bcd-b6a4-abcec209815e rack1 UN 10.10.20.35 1.86 TB256 25.8% 17cb8772-2444-46ff-8525-33746514727d rack1 UN 10.10.20.51 1.89 TB256 25.0% 0343cd58-3686-465f-8280-56fb72d161e2 rack1 UN 10.10.20.19 1.91 TB256 25.5% 30ddf003-4d59-4a3e-85fa-e94e4adba1cb rack1 UN 10.10.20.39 1.93 TB256 26.0% b7d44c26-4d75-4d36-a779-b7e7bdaecbc9 rack1 UN 10.10.20.52 1.81 TB256 25.4% 6b5aca07-1b14-4bc2-a7ba-96f026fa0e4e rack1 UN 10.10.20.22 1.89 TB256 24.8% 46af9664-8975-4c91-847f-3f7b8f8d5ce2 rack1 Note: The new node is not part of the above list. nodetool compactionstats: pending tasks: 1649 compaction typekeyspace column family completed total unit progress Compaction iprod customerorder 1682804084 17956558077 bytes 9.37% Compactionprodgatecustomerorder 1664239271 1693502275 bytes98.27% Compaction qa_config_bkupfixsessionconfig_hist 2443 27253 bytes 8.96% Compactionprodgatecustomerorder_hist 1770577280 5026699390 bytes35.22% Compaction iprodgatecustomerorder_hist 2959560205312350192622 bytes 0.95% On Tue, Aug 5, 2014 at 11:37 AM, Mark Reddy mark.re...@boxever.com wrote: Yes num_tokens is set to 256. initial_token is blank on all nodes including the new one. Ok so you have num_tokens set to 256 for all nodes with initial_token commented out, this means you are using vnodes and the new node will automatically grab a list of tokens to take over responsibility for. Pool NameActive Pending Completed Blocked All time blocked FlushWriter 0 0 1136 0 512 Looks like about 50% of flushes are blocked. This is a problem as it indicates that the IO system cannot keep up. Just ran this on the new node: nodetool netstats | grep Streaming from | wc -l 10 This is normal as the new node will most likely take tokens from all nodes in the cluster. 
Sorry for the multiple updates, but another thing I found was all the other existing nodes have themselves in the seeds list, but the new node does not have itself in the seeds list. Can that cause this issue? Seeds are only used when a new node is bootstrapping into the cluster and needs a set of ips to contact and discover the cluster, so this would have no impact on data sizes or streaming. In general it would be considered best practice to have a set of 2-3 seeds from each data center, with all nodes having the same seed list. What is the current output of 'nodetool compactionstats'? Could you also paste the output of nodetool status keyspace? Mark On Tue, Aug 5, 2014 at
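For reference, the two knobs mentioned above are cassandra.yaml settings; a minimal sketch (the numbers are examples only and should be sized against the node's actual spare IO and CPU):

# cassandra.yaml
memtable_flush_writers: 4             # default is low (roughly one per data directory)
compaction_throughput_mb_per_sec: 32  # default is 16; 0 disables throttling entirely

Both require a restart to pick up from the yaml, although compaction throughput can also be adjusted on a running node with 'nodetool setcompactionthroughput'.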
Re: Fail to reconnect to other nodes after intermittent network failure
On Tue, Aug 5, 2014 at 5:48 AM, Jiri Horky ho...@avast.com wrote: What puzzles me is the fact that the authentization apparently started to work after the network recovered but the exchange of data did not. I would like to understand what could caused the problems and how to avoid them in the future. Very few people use SSL and very few people use auth, you have probably hit an edge case. I would file a JIRA with the details you described above. =Rob
Re: Read timeouts with ALLOW FILTERING turned on
By how much did you reduce *read_request_timeout_in_ms* on your local machine? The read timeout you hit against a cluster is higher than against a single machine because the coordinator has to run the read operation across several servers (so network traffic is involved). 2014-08-05 14:54 GMT-03:00 Robert Coli rc...@eventbrite.com: On Tue, Aug 5, 2014 at 10:01 AM, Clint Kelly clint.ke...@gmail.com wrote: Allow me to rephrase a question I asked last week. I am performing some queries with ALLOW FILTERING and getting consistent read timeouts like the following: ALLOW FILTERING should be renamed PROBABLY TIMEOUT in order to properly describe its typical performance. As a general statement, if you have to ALLOW FILTERING, you are probably Doing It Wrong in terms of schema design. A correctly operated cluster is unlikely to need to increase the default timeouts. If you find yourself needing to do so, you are, again, probably Doing It Wrong. =Rob -- Best regards, Sávio S. Teles de Oliveira voice: +55 62 9136 6996 http://br.linkedin.com/in/savioteles MSc student in Computer Science - UFG Software Architect CUIA Internet Brasil
Re: Make an existing cluster multi data-center compatible.
On Tue, Aug 5, 2014 at 3:52 AM, Rene Kochen rene.koc...@schange.com wrote: Do I have to run full repairs after this change? Because the yaml file states: IF YOU CHANGE THE SNITCH AFTER DATA IS INSERTED INTO THE CLUSTER, YOU MUST RUN A FULL REPAIR, SINCE THE SNITCH AFFECTS WHERE REPLICAS ARE PLACED. As long as you correctly configure the new snitch so that the replica sets do not change, no, you do not need to repair. Barring that, if you manage to transform the replica set in such a way that you always have one (fully repaired) replica from the old set, repair will help. I do not recommend this very risky practice. In practice the only transformation of snitch in a cluster with data which is likely to be safe is one whose result is a NOOP in terms of replica placement. In fact, the yaml file is stating something unreasonable there, because repair cannot protect against this case: - 6 node cluster, A B C D E F, RF = 2 1) Start with SimpleSnitch so that A, B have the two replicas of row key X. 2) Write row key X, value Y, to nodes A and B. 3) Change to OtherSnitch so that now C, D are responsible for row key X. 4) Repair and notice that neither C nor D answer Y when asked for row X. =Rob
RE: vnode and NetworkTopologyStrategy: not playing well together ?
The discussion about racks NTS is also mentioned in this recent article : planetcassandra.org/multi-data-center-replication-in-nosql-databases/ The last section may be of interest for you Le 5 août 2014 18:14, DE VITO Dominique dominique.dev...@thalesgroup.com a écrit : Jonathan wrote: Yes, if you have only 1 machine in a rack then your cluster will be imbalanced. You're going to be able to dream up all sorts of weird failure cases when you choose a scenario like RF=2 totally imbalanced network arch. Vnodes attempt to solve the problem of imbalanced rings by choosing so many tokens that it's improbable that the ring will be imbalanced. Storage/load distro = function(1st replica placement, other replica placement) vnode solves the balancing pb for 1st replica placement // so, yes, I agree with you, but for 1st replica placement only But NetworkTopologyStrategy (NTS) influences other (2+) replica placement = as NTS best behavior relies on token distro, and you have no control on tokens with vnodes, the best option I see with **vnode** is to use only one rack with NTS. Dominique -Message d'origine- De : jonathan.had...@gmail.com [mailto:jonathan.had...@gmail.com] De la part de Jonathan Haddad Envoyé : mardi 5 août 2014 18:04 À : user@cassandra.apache.org Objet : Re: vnode and NetworkTopologyStrategy: not playing well together ? Yes, if you have only 1 machine in a rack then your cluster will be imbalanced. You're going to be able to dream up all sorts of weird failure cases when you choose a scenario like RF=2 totally imbalanced network arch. Vnodes attempt to solve the problem of imbalanced rings by choosing so many tokens that it's improbable that the ring will be imbalanced. On Tue, Aug 5, 2014 at 8:57 AM, DE VITO Dominique dominique.dev...@thalesgroup.com wrote: First, thanks for your answer. This is incorrect. Network Topology w/ Vnodes will be fine, assuming you've got RF= # of racks. IMHO, it's not a good enough condition. Let's use an example with RF=2 N1/rack_1 N2/rack_1 N3/rack_1 N4/rack_2 Here, you have RF= # of racks And due to NetworkTopologyStrategy, N4 will store *all* the cluster data, leading to a completely imbalanced cluster. IMHO, it happens when using nodes *or* vnodes. As well-balanced clusters with NetworkTopologyStrategy rely on carefully chosen token distribution/path along the ring *and* as tokens are randomly-generated with vnodes, my guess is that with vnodes and NetworkTopologyStrategy, it's better to define a single (logical) rack // due to carefully chosen tokens vs randomly-generated token clash. I don't see other options left. Do you see other ones ? Regards, Dominique -Message d'origine- De : jonathan.had...@gmail.com [mailto:jonathan.had...@gmail.com] De la part de Jonathan Haddad Envoyé : mardi 5 août 2014 17:43 À : user@cassandra.apache.org Objet : Re: vnode and NetworkTopologyStrategy: not playing well together ? This is incorrect. Network Topology w/ Vnodes will be fine, assuming you've got RF= # of racks. For each token, replicas are chosen based on the strategy. Essentially, you could have a wild imbalance in token ownership, but it wouldn't matter because the replicas would be distributed across the rest of the machines. 
http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/architectureDataDistributeReplication_c.html On Tue, Aug 5, 2014 at 8:19 AM, DE VITO Dominique dominique.dev...@thalesgroup.com wrote: Hi, My understanding is that NetworkTopologyStrategy does NOT play well with vnodes, due to: · Vnode = tokens are (usually) randomly generated (AFAIK) · NetworkTopologyStrategy = required carefully choosen tokens for all nodes in order to not to get a VERY unbalanced ring like in https://issues.apache.org/jira/browse/CASSANDRA-3810 When playing with vnodes, is the recommendation to define one rack for the entire cluster ? Thanks. Regards, Dominique -- Jon Haddad http://www.rustyrazorblade.com skype: rustyrazorblade -- Jon Haddad http://www.rustyrazorblade.com skype: rustyrazorblade
Re: Issue with ALLOW FILTERING
You need to create an index on attribute *c*. 2014-08-05 9:24 GMT-03:00 Jens Rantil jens.ran...@tink.se: Hi, I'm having an issue with ALLOW FILTERING with Cassandra 2.0.8. See a minimal example here: https://gist.github.com/JensRantil/ec43622c26acb56e5bc9 I expect the second-to-last query to fail, but the last query to return a single row. In particular I expect the last SELECT to first select using the clustering primary id and then do filtering. I've been reading https://cassandra.apache.org/doc/cql3/CQL.html#selectStmt ALLOW FILTERING and can't wrap my head around why this won't work. Could anyone clarify this for me? Thanks, Jens -- Best regards, Sávio S. Teles de Oliveira voice: +55 62 9136 6996 http://br.linkedin.com/in/savioteles MSc student in Computer Science - UFG Software Architect CUIA Internet Brasil
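As a rough illustration of the suggestion (the table and column names below are made up, since the actual schema lives in the linked gist):

CREATE TABLE demo (id text PRIMARY KEY, c text, v text);
CREATE INDEX ON demo (c);
-- with the secondary index in place this restriction is served by the index:
SELECT * FROM demo WHERE c = 'some-value';

Without an index on c, a WHERE clause on it can only be satisfied by scanning, which is exactly the case CQL makes you spell out with ALLOW FILTERING.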
Re: moving older tables from SSD to HDD?
Have you looked at nodetool? http://www.datastax.com/documentation/cassandra/2.0/cassandra/tools/toolsNodetool_r.html 2014-08-04 16:43 GMT-03:00 Kevin Burton bur...@spinn3r.com: Is it possible to take older tables, which are immutable, and move them from SSD to HDD? We lower the SLA on older data so keeping it on HDD is totally fine. MySQL can *sort* of do this… and I think that Cassandra could if it was handled properly. Kevin -- Founder/CEO Spinn3r.com Location: *San Francisco, CA* blog: http://burtonator.wordpress.com … or check out my Google+ profile https://plus.google.com/102718274791889610666/posts http://spinn3r.com -- Best regards, Sávio S. Teles de Oliveira voice: +55 62 9136 6996 http://br.linkedin.com/in/savioteles MSc student in Computer Science - UFG Software Architect CUIA Internet Brasil
Re: Node stuck during nodetool rebuild
On Tue, Aug 5, 2014 at 1:28 AM, Vasileios Vlachos vasileiosvlac...@gmail.com wrote: The problem is that the nodetool seems to be stuck, and nodetool netstats on node1 of DC2 appears to be stuck at 10% streaming a 5G file from node2 at DC1. This doesn't tally with nodetool netstats when running it against either of the DC1 nodes. The DC1 nodes don't think they stream anything to DC2. Yes, streaming is fragile and breaks and hangs forever and your only option in most cases is to stop the rebuilding node, nuke its data, and start again. I believe you might be able to tune the phi detector threshold to help this operation complete, hopefully someone with direct experience of same will chime in. =Rob
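A rough sketch of that stop-wipe-retry sequence on the stuck node (the service name and data paths assume a default package install; adjust them to your layout):

sudo service cassandra stop
sudo rm -rf /var/lib/cassandra/data/* /var/lib/cassandra/commitlog/* /var/lib/cassandra/saved_caches/*
sudo service cassandra start
nodetool rebuild DC1      # re-run the rebuild, naming the source data center

The wipe matters because a half-finished rebuild can leave partially streamed SSTables behind, and starting over from a clean data directory avoids mixing those with the new attempt.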
Re: moving older tables from SSD to HDD?
Hi Kevin, This is something we do plan to support, but don't right now. You can see the discussion around this and related issues here https://issues.apache.org/jira/browse/CASSANDRA-5863 (although it may seem unrelated at first glance). On Mon, Aug 4, 2014 at 8:43 PM, Kevin Burton bur...@spinn3r.com wrote: Is it possible to take older tables, which are immutable, and move them from SSD to HDD? We lower the SLA on older data so keeping it on HDD is totally fine. MySQL can *sort* of do this… and I think that Cassandra could if it was handled properly. Kevin -- Founder/CEO Spinn3r.com Location: *San Francisco, CA* blog: http://burtonator.wordpress.com … or check out my Google+ profile https://plus.google.com/102718274791889610666/posts http://spinn3r.com
Re: Read timeouts with ALLOW FILTERING turned on
Hi Rob, Thanks for your feedback. I understand that use of ALLOW FILTERING is not a best practice. In this case, however, I am building a tool on top of Cassandra that allows users to sometimes do things that are less than optimal. When they try to do expensive queries like this, I'd rather provide a higher limit before timing out, but I can't seem to change the behavior of Cassandra by tweaking any of the parameters in the cassandra.yaml file or in the DataStax Java driver's Cluster object. FWIW these queries are also in batch jobs where we can tolerate the extra latency. Thanks for your help! Best regards, Clint On Tue, Aug 5, 2014 at 10:54 AM, Robert Coli rc...@eventbrite.com wrote: On Tue, Aug 5, 2014 at 10:01 AM, Clint Kelly clint.ke...@gmail.com wrote: Allow me to rephrase a question I asked last week. I am performing some queries with ALLOW FILTERING and getting consistent read timeouts like the following: ALLOW FILTERING should be renamed PROBABLY TIMEOUT in order to properly describe its typical performance. As a general statement, if you have to ALLOW FILTERING, you are probably Doing It Wrong in terms of schema design. A correctly operated cluster is unlikely to need to increase the default timeouts. If you find yourself needing to do so, you are, again, probably Doing It Wrong. =Rob
Re: Node stuck during nodetool rebuild
Hi Vasilis, To further on what Rob said I believe you might be able to tune the phi detector threshold to help this operation complete, hopefully someone with direct experience of same will chime in. I have been through this operation where streams break due to a node falsely being marked down (flapping). In an attempt to mitigate this I increase the phi_convict_threshold in cassandra.yaml from 8 to 10, after which the rebuild was able to successfully complete. The default value for phi_convict_threshold is 8 with 12 being the maximum recommended value. Mark On Tue, Aug 5, 2014 at 7:22 PM, Robert Coli rc...@eventbrite.com wrote: On Tue, Aug 5, 2014 at 1:28 AM, Vasileios Vlachos vasileiosvlac...@gmail.com wrote: The problem is that the nodetool seems to be stuck, and nodetool netstats on node1 of DC2 appears to be stuck at 10% streaming a 5G file from node2 at DC1. This doesn't tally with nodetool netstats when running it against either of the DC1 nodes. The DC1 nodes don't think they stream anything to DC2. Yes, streaming is fragile and breaks and hangs forever and your only option in most cases is to stop the rebuilding node, nuke its data, and start again. I believe you might be able to tune the phi detector threshold to help this operation complete, hopefully someone with direct experience of same will chime in. =Rob
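For reference, a minimal sketch of the change Mark describes (the setting ships commented out in cassandra.yaml, and a restart is needed for it to take effect):

# cassandra.yaml
phi_convict_threshold: 10   # default is 8; 12 is the maximum recommended value

Raising it makes the failure detector slower to declare a peer dead, which gives long-running streams such as rebuild a better chance of surviving brief network hiccups.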
Re: Read timeouts with ALLOW FILTERING turned on
Ah FWIW I was able to reproduce the problem by reducing range_request_timeout_in_ms. This is great since I want to increase the timeout for batch jobs where we scan a large set of rows, but leave the timeout for single-row queries alone. Best regards, Clint On Tue, Aug 5, 2014 at 11:42 AM, Clint Kelly clint.ke...@gmail.com wrote: Hi Rob, Thanks for your feedback. I understand that use of ALLOW FILTERING is not a best practice. In this case, however, I am building a tool on top of Cassandra that allows users to sometimes do things that are less than optimal. When they try to do expensive queries like this, I'd rather provide a higher limit before timing out, but I can't seem to change the behavior of Cassandra by tweaking any of the parameters in the cassandra.yaml file or in the DataStax Java driver's Cluster object. FWIW these queries are also in batch jobs where we can tolerate the extra latency. Thanks for your help! Best regards, Clint On Tue, Aug 5, 2014 at 10:54 AM, Robert Coli rc...@eventbrite.com wrote: On Tue, Aug 5, 2014 at 10:01 AM, Clint Kelly clint.ke...@gmail.com wrote: Allow me to rephrase a question I asked last week. I am performing some queries with ALLOW FILTERING and getting consistent read timeouts like the following: ALLOW FILTERING should be renamed PROBABLY TIMEOUT in order to properly describe its typical performance. As a general statement, if you have to ALLOW FILTERING, you are probably Doing It Wrong in terms of schema design. A correctly operated cluster is unlikely to need to increase the default timeouts. If you find yourself needing to do so, you are, again, probably Doing It Wrong. =Rob
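For reference, these are separate server-side settings in cassandra.yaml; a minimal sketch (the values shown are illustrative, so check the defaults in the yaml that ships with your version):

# cassandra.yaml
read_request_timeout_in_ms: 5000      # single-partition reads
range_request_timeout_in_ms: 10000    # multi-row / range scans, e.g. the ALLOW FILTERING queries above
request_timeout_in_ms: 10000          # catch-all for other operations

Raising only range_request_timeout_in_ms is what lets the batch-style scans run longer while leaving single-row read behaviour untouched.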
Re: Node bootstrap
Right now, we have 6 flush writers and compaction_throughput_mb_per_sec is set to 0, which I believe disables throttling. Also, Here is the iostat -x 5 5 output: Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util sda 10.00 1450.35 50.79 55.92 9775.97 12030.14 204.34 1.56 14.62 1.05 11.21 dm-0 0.00 0.003.59 18.82 166.52 150.3514.14 0.44 19.49 0.54 1.22 dm-1 0.00 0.002.325.3718.5642.98 8.00 0.76 98.82 0.43 0.33 dm-2 0.00 0.00 162.17 5836.66 32714.46 47040.8713.30 5.570.90 0.06 36.00 sdb 0.40 4251.90 106.72 107.35 23123.61 35204.09 272.46 4.43 20.68 1.29 27.64 avg-cpu: %user %nice %system %iowait %steal %idle 14.64 10.751.81 13.500.00 59.29 Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util sda 15.40 1344.60 68.80 145.60 4964.80 11790.4078.15 0.381.80 0.80 17.10 dm-0 0.00 0.00 43.00 1186.20 2292.80 9489.60 9.59 4.883.90 0.09 11.58 dm-1 0.00 0.001.600.0012.80 0.00 8.00 0.03 16.00 2.00 0.32 dm-2 0.00 0.00 197.20 17583.80 35152.00 140664.00 9.89 2847.50 109.52 0.05 93.50 sdb 13.20 16552.20 159.00 742.20 32745.60 129129.60 179.62 72.88 66.01 1.04 93.42 avg-cpu: %user %nice %system %iowait %steal %idle 15.51 19.771.975.020.00 57.73 Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util sda 16.20 523.40 60.00 285.00 5220.80 5913.6032.27 0.250.72 0.60 20.86 dm-0 0.00 0.000.801.4032.0011.2019.64 0.013.18 1.55 0.34 dm-1 0.00 0.001.600.0012.80 0.00 8.00 0.03 21.00 2.62 0.42 dm-2 0.00 0.00 339.40 5886.80 66219.20 47092.8018.20 251.66 184.72 0.10 63.48 sdb 1.00 5025.40 264.20 209.20 60992.00 50422.40 235.35 5.98 40.92 1.23 58.28 avg-cpu: %user %nice %system %iowait %steal %idle 16.59 16.342.039.010.00 56.04 Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util sda 5.40 320.00 37.40 159.80 2483.20 3529.6030.49 0.100.52 0.39 7.76 dm-0 0.00 0.000.203.60 1.6028.80 8.00 0.000.68 0.68 0.26 dm-1 0.00 0.000.000.00 0.00 0.00 0.00 0.000.00 0.00 0.00 dm-2 0.00 0.00 287.20 13108.20 53985.60 104864.00 11.86 869.18 48.82 0.06 76.96 sdb 5.20 12163.40 238.20 532.00 51235.20 93753.60 188.25 21.46 23.75 0.97 75.08 On Tue, Aug 5, 2014 at 1:55 PM, Mark Reddy mark.re...@boxever.com wrote: Hi Ruchir, With the large number of blocked flushes and the number of pending compactions would still indicate IO contention. Can you post the output of 'iostat -x 5 5' If you do in fact have spare IO, there are several configuration options you can tune such as increasing the number of flush writers and compaction_throughput_mb_per_sec Mark On Tue, Aug 5, 2014 at 5:22 PM, Ruchir Jha ruchir@gmail.com wrote: Also Mark to your comment on my tpstats output, below is my iostat output, and the iowait is at 4.59%, which means no IO pressure, but we are still seeing the bad flush performance. Should we try increasing the flush writers? 
Linux 2.6.32-358.el6.x86_64 (ny4lpcas13.fusionts.corp) 08/05/2014 _x86_64_(24 CPU) avg-cpu: %user %nice %system %iowait %steal %idle 5.80 10.250.654.590.00 78.72 Device:tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn sda 103.83 9630.62 11982.60 3231174328 4020290310 dm-0 13.57 160.1781.12 53739546 27217432 dm-1 7.5916.9443.775682200 14686784 dm-2 5792.76 32242.66 45427.12 10817753530 15241278360 sdb 206.09 22789.19 33569.27 7646015080 11262843224 On Tue, Aug 5, 2014 at 12:13 PM, Ruchir Jha ruchir@gmail.com wrote: nodetool status: Datacenter: datacenter1 === Status=Up/Down |/ State=Normal/Leaving/Joining/Moving -- Address Load Tokens Owns (effective) Host ID Rack UN 10.10.20.27 1.89 TB256 25.4% 76023cdd-c42d-4068-8b53-ae94584b8b04 rack1 UN 10.10.20.62 1.83 TB256 25.5% 84b47313-da75-4519-94f3-3951d554a3e5 rack1 UN 10.10.20.47 1.87 TB256 24.7%
Re: Node bootstrap
Also, right now the top command shows that we are at 500-700% CPU, and we have 23 total processors, which means we have a lot of idle CPU left over, so throwing more threads at compaction and flush should alleviate the problem? On Tue, Aug 5, 2014 at 2:57 PM, Ruchir Jha ruchir@gmail.com wrote: Right now, we have 6 flush writers and compaction_throughput_mb_per_sec is set to 0, which I believe disables throttling. Also, Here is the iostat -x 5 5 output: Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util sda 10.00 1450.35 50.79 55.92 9775.97 12030.14 204.34 1.56 14.62 1.05 11.21 dm-0 0.00 0.003.59 18.82 166.52 150.3514.14 0.44 19.49 0.54 1.22 dm-1 0.00 0.002.325.3718.5642.98 8.00 0.76 98.82 0.43 0.33 dm-2 0.00 0.00 162.17 5836.66 32714.46 47040.8713.30 5.570.90 0.06 36.00 sdb 0.40 4251.90 106.72 107.35 23123.61 35204.09 272.46 4.43 20.68 1.29 27.64 avg-cpu: %user %nice %system %iowait %steal %idle 14.64 10.751.81 13.500.00 59.29 Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util sda 15.40 1344.60 68.80 145.60 4964.80 11790.4078.15 0.381.80 0.80 17.10 dm-0 0.00 0.00 43.00 1186.20 2292.80 9489.60 9.59 4.883.90 0.09 11.58 dm-1 0.00 0.001.600.0012.80 0.00 8.00 0.03 16.00 2.00 0.32 dm-2 0.00 0.00 197.20 17583.80 35152.00 140664.00 9.89 2847.50 109.52 0.05 93.50 sdb 13.20 16552.20 159.00 742.20 32745.60 129129.60 179.6272.88 66.01 1.04 93.42 avg-cpu: %user %nice %system %iowait %steal %idle 15.51 19.771.975.020.00 57.73 Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util sda 16.20 523.40 60.00 285.00 5220.80 5913.6032.27 0.250.72 0.60 20.86 dm-0 0.00 0.000.801.4032.0011.2019.64 0.013.18 1.55 0.34 dm-1 0.00 0.001.600.0012.80 0.00 8.00 0.03 21.00 2.62 0.42 dm-2 0.00 0.00 339.40 5886.80 66219.20 47092.8018.20 251.66 184.72 0.10 63.48 sdb 1.00 5025.40 264.20 209.20 60992.00 50422.40 235.35 5.98 40.92 1.23 58.28 avg-cpu: %user %nice %system %iowait %steal %idle 16.59 16.342.039.010.00 56.04 Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util sda 5.40 320.00 37.40 159.80 2483.20 3529.6030.49 0.100.52 0.39 7.76 dm-0 0.00 0.000.203.60 1.6028.80 8.00 0.000.68 0.68 0.26 dm-1 0.00 0.000.000.00 0.00 0.00 0.00 0.000.00 0.00 0.00 dm-2 0.00 0.00 287.20 13108.20 53985.60 104864.00 11.86 869.18 48.82 0.06 76.96 sdb 5.20 12163.40 238.20 532.00 51235.20 93753.60 188.25 21.46 23.75 0.97 75.08 On Tue, Aug 5, 2014 at 1:55 PM, Mark Reddy mark.re...@boxever.com wrote: Hi Ruchir, With the large number of blocked flushes and the number of pending compactions would still indicate IO contention. Can you post the output of 'iostat -x 5 5' If you do in fact have spare IO, there are several configuration options you can tune such as increasing the number of flush writers and compaction_throughput_mb_per_sec Mark On Tue, Aug 5, 2014 at 5:22 PM, Ruchir Jha ruchir@gmail.com wrote: Also Mark to your comment on my tpstats output, below is my iostat output, and the iowait is at 4.59%, which means no IO pressure, but we are still seeing the bad flush performance. Should we try increasing the flush writers? 
Linux 2.6.32-358.el6.x86_64 (ny4lpcas13.fusionts.corp) 08/05/2014 _x86_64_(24 CPU) avg-cpu: %user %nice %system %iowait %steal %idle 5.80 10.250.654.590.00 78.72 Device:tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn sda 103.83 9630.62 11982.60 3231174328 4020290310 dm-0 13.57 160.1781.12 53739546 27217432 dm-1 7.5916.9443.775682200 14686784 dm-2 5792.76 32242.66 45427.12 10817753530 15241278360 sdb 206.09 22789.19 33569.27 7646015080 11262843224 On Tue, Aug 5, 2014 at 12:13 PM, Ruchir Jha ruchir@gmail.com wrote: nodetool status: Datacenter: datacenter1 === Status=Up/Down |/
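If the IO headroom really is there, some of this can be exercised on the running node without a restart; a hedged sketch (the numbers are examples only):

nodetool setcompactionthroughput 64   # re-impose a generous cap in MB/s (0 means unthrottled, as configured here)
nodetool compactionstats              # watch whether the pending task count starts to drain
nodetool tpstats                      # watch whether FlushWriter 'All time blocked' keeps growing

Raising memtable_flush_writers, by contrast, is a cassandra.yaml change and only takes effect after a restart.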
Re: Fail to reconnect to other nodes after intermittent network failure
OK, ticket 7696 [1] created. Jiri Horky https://issues.apache.org/jira/browse/CASSANDRA-7696 On 08/05/2014 07:57 PM, Robert Coli wrote: On Tue, Aug 5, 2014 at 5:48 AM, Jiri Horky ho...@avast.com mailto:ho...@avast.com wrote: What puzzles me is the fact that the authentization apparently started to work after the network recovered but the exchange of data did not. I would like to understand what could caused the problems and how to avoid them in the future. Very few people use SSL and very few people use auth, you have probably hit an edge case. I would file a JIRA with the details you described above. =Rob
Re: Read timeouts with ALLOW FILTERING turned on
On Tue, Aug 5, 2014 at 11:53 AM, Clint Kelly clint.ke...@gmail.com wrote: Ah FWIW I was able to reproduce the problem by reducing range_request_timeout_in_ms. This is great since I want to increase the timeout for batch jobs where we scan a large set of rows, but leave the timeout for single-row queries alone. You have just explicated (a subset of) the reason the timeouts were broken out. https://issues.apache.org/jira/browse/CASSANDRA-2819 =Rob
Re: Make an existing cluster multi data-center compatible.
As long as you correctly configure the new snitch so that the replica sets do not change, no, you do not need to repair. Is the following correct: The replica sets do not change if you modify the snitch from SimpleSnitch to NetworkTopologyStrategy and the topology file puts all nodes in the same data-center and rack. Thanks again! Rene 2014-08-05 20:05 GMT+02:00 Robert Coli rc...@eventbrite.com: On Tue, Aug 5, 2014 at 3:52 AM, Rene Kochen rene.koc...@schange.com wrote: Do I have to run full repairs after this change? Because the yaml file states: IF YOU CHANGE THE SNITCH AFTER DATA IS INSERTED INTO THE CLUSTER, YOU MUST RUN A FULL REPAIR, SINCE THE SNITCH AFFECTS WHERE REPLICAS ARE PLACED. As long as you correctly configure the new snitch so that the replica sets do not change, no, you do not need to repair. Barring that, if you manage to transform the replica set in such a way that you always have one (fully repaired) replica from the old set, repair will help. I do not recommend this very risky practice. In practice the only transformation of snitch in a cluster with data which is likely to be safe is one whose result is a NOOP in terms of replica placement. In fact, the yaml file is stating something unreasonable there, because repair cannot protect against this case : - 6 node cluster, A B C D E F, RF = 2 1) Start with SimpleSnitch so that A, B have the two replicas of row key X. 2) Write row key X, value Y, to nodes A and B. 2) Change to OtherSnitch so that now C,D are responsible for row key X. 3) Repair and notice that neither C nor D answer Y when asked for row X. =Rob
Re: Make an existing cluster multi data-center compatible.
On Tue, Aug 5, 2014 at 2:27 PM, Rene Kochen rene.koc...@schange.com wrote: As long as you correctly configure the new snitch so that the replica sets do not change, no, you do not need to repair. Is the following correct: The replica sets do not change if you modify the snitch from SimpleSnitch to NetworkTopologyStrategy and the topology file puts all nodes in the same data-center and rack. Yes, you can use nodetool getendpoints to illustrate this programatically. 1) make a set of keys with a key from each range 2) getendpoints for this set of keys 3) change snitch 4) getendpoints again =Rob
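A concrete sketch of that check (the keyspace, table and key below are placeholders):

nodetool getendpoints my_keyspace my_table some_key   # run before the snitch change
nodetool getendpoints my_keyspace my_table some_key   # run again after the snitch change

If the replica IP lists come back identical for every sampled key, the replica sets have not moved and, per the reasoning above, no repair is required.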
Re: Make an existing cluster multi data-center compatible.
I think the RAC placement of these 12 nodes will become important. As the 12 nodes are currently placed with SimpleSnitch, which is not RAC aware, it would be good to keep them in a single RAC in the property file snitch as well, at least initially. Node repair is a safe option. If you need to change the RAC placement, my take would be to increase the replication factor to at least 3 and then distribute the nodes across different RACs. This is not an expert opinion but a newbie thought. Regards, Rameez On Tue, Aug 5, 2014 at 11:35 PM, Robert Coli rc...@eventbrite.com wrote: On Tue, Aug 5, 2014 at 3:52 AM, Rene Kochen rene.koc...@schange.com wrote: Do I have to run full repairs after this change? Because the yaml file states: IF YOU CHANGE THE SNITCH AFTER DATA IS INSERTED INTO THE CLUSTER, YOU MUST RUN A FULL REPAIR, SINCE THE SNITCH AFFECTS WHERE REPLICAS ARE PLACED. As long as you correctly configure the new snitch so that the replica sets do not change, no, you do not need to repair. Barring that, if you manage to transform the replica set in such a way that you always have one (fully repaired) replica from the old set, repair will help. I do not recommend this very risky practice. In practice the only transformation of snitch in a cluster with data which is likely to be safe is one whose result is a NOOP in terms of replica placement. In fact, the yaml file is stating something unreasonable there, because repair cannot protect against this case: - 6 node cluster, A B C D E F, RF = 2 1) Start with SimpleSnitch so that A, B have the two replicas of row key X. 2) Write row key X, value Y, to nodes A and B. 3) Change to OtherSnitch so that now C, D are responsible for row key X. 4) Repair and notice that neither C nor D answer Y when asked for row X. =Rob
Cassandra process exiting mysteriously
Hi everyone, For some integration tests, we start up a CassandraDaemon in a separate process (using the Java 7 ProcessBuilder API). All of my integration tests run beautifully on my laptop, but one of them fails on our Jenkins cluster. The failing integration test does around 10k writes to different rows and then 10k reads. After running some number of reads, the job dies with this error: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: /127.0.0.10:58209 (com.datastax.driver.core.exceptions.DriverException: Timeout during read)) This error appears to have occurred because the Cassandra process has stopped. The logs for the Cassandra process show some warnings during batch writes (the batches are too big), no activity for a few minutes (I assume this is because all of the read operations were proceeding smoothly), and then look like the following: INFO [StorageServiceShutdownHook] 2014-08-05 19:14:51,903 ThriftServer.java (line 141) Stop listening to thrift clients INFO [StorageServiceShutdownHook] 2014-08-05 19:14:51,920 Server.java (line 182) Stop listening for CQL clients INFO [StorageServiceShutdownHook] 2014-08-05 19:14:51,930 Gossiper.java (line 1279) Announcing shutdown INFO [StorageServiceShutdownHook] 2014-08-05 19:14:53,930 MessagingService.java (line 683) Waiting for messaging service to quiesce INFO [ACCEPT-/127.0.0.10] 2014-08-05 19:14:53,931 MessagingService.java (line 923) MessagingService has terminated the accept() thread Does anyone have any ideas about how to debug this? Looking around on google I found some threads suggesting that this could occur from an OOM error (http://stackoverflow.com/questions/23755040/cassandra-exits-with-no-errors). Wouldn't such an error be logged, however? The test that fails is a test of our MapReduce Hadoop InputFormat and as such it does some pretty big queries across multiple rows (over a range of partitioning key tokens). The default fetch size I believe is 5000 rows, and the values in the rows I am fetching are just simple strings, so I would not think the amount of data in a single read would be too big. FWIW I don't see any log messages about garbage collection for at least 3min before the process shuts down (and no GC messages after the test stops doing writes and starts doing reads). I'd greatly appreciate any help before my team kills me for breaking our Jenkins build so consistently! :) Best regards, Clint
Re: Cassandra process exiting mysteriously
If there is an oom it will be in the logs. On Aug 5, 2014 8:17 PM, Clint Kelly clint.ke...@gmail.com wrote: Hi everyone, For some integration tests, we start up a CassandraDaemon in a separate process (using the Java 7 ProcessBuilder API). All of my integration tests run beautifully on my laptop, but one of them fails on our Jenkins cluster. The failing integration test does around 10k writes to different rows and then 10k reads. After running some number of reads, the job dies with this error: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: /127.0.0.10:58209 (com.datastax.driver.core.exceptions.DriverException: Timeout during read)) This error appears to have occurred because the Cassandra process has stopped. The logs for the Cassandra process show some warnings during batch writes (the batches are too big), no activity for a few minutes (I assume this is because all of the read operations were proceeding smoothly), and then look like the following: INFO [StorageServiceShutdownHook] 2014-08-05 19:14:51,903 ThriftServer.java (line 141) Stop listening to thrift clients INFO [StorageServiceShutdownHook] 2014-08-05 19:14:51,920 Server.java (line 182) Stop listening for CQL clients INFO [StorageServiceShutdownHook] 2014-08-05 19:14:51,930 Gossiper.java (line 1279) Announcing shutdown INFO [StorageServiceShutdownHook] 2014-08-05 19:14:53,930 MessagingService.java (line 683) Waiting for messaging service to quiesce INFO [ACCEPT-/127.0.0.10] 2014-08-05 19:14:53,931 MessagingService.java (line 923) MessagingService has terminated the accept() thread Does anyone have any ideas about how to debug this? Looking around on google I found some threads suggesting that this could occur from an OOM error ( http://stackoverflow.com/questions/23755040/cassandra-exits-with-no-errors ). Wouldn't such an error be logged, however? The test that fails is a test of our MapReduce Hadoop InputFormat and as such it does some pretty big queries across multiple rows (over a range of partitioning key tokens). The default fetch size I believe is 5000 rows, and the values in the rows I am fetching are just simple strings, so I would not think the amount of data in a single read would be too big. FWIW I don't see any log messages about garbage collection for at least 3min before the process shuts down (and no GC messages after the test stops doing writes and starts doing reads). I'd greatly appreciate any help before my team kills me for breaking our Jenkins build so consistently! :) Best regards, Clint
Re: Cassandra process exiting mysteriously
HI Kevin, Thanks for your reply. That is what I assumed, but some of the posts I read on Stack Overflow (e.g., the one that I referenced in my mail) suggested otherwise. I was just curious if others had experienced OOM problems that weren't logged or if there were other common culprits. Best regards, Clint On Tue, Aug 5, 2014 at 9:29 PM, Kevin Burton bur...@spinn3r.com wrote: If there is an oom it will be in the logs. On Aug 5, 2014 8:17 PM, Clint Kelly clint.ke...@gmail.com wrote: Hi everyone, For some integration tests, we start up a CassandraDaemon in a separate process (using the Java 7 ProcessBuilder API). All of my integration tests run beautifully on my laptop, but one of them fails on our Jenkins cluster. The failing integration test does around 10k writes to different rows and then 10k reads. After running some number of reads, the job dies with this error: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: /127.0.0.10:58209 (com.datastax.driver.core.exceptions.DriverException: Timeout during read)) This error appears to have occurred because the Cassandra process has stopped. The logs for the Cassandra process show some warnings during batch writes (the batches are too big), no activity for a few minutes (I assume this is because all of the read operations were proceeding smoothly), and then look like the following: INFO [StorageServiceShutdownHook] 2014-08-05 19:14:51,903 ThriftServer.java (line 141) Stop listening to thrift clients INFO [StorageServiceShutdownHook] 2014-08-05 19:14:51,920 Server.java (line 182) Stop listening for CQL clients INFO [StorageServiceShutdownHook] 2014-08-05 19:14:51,930 Gossiper.java (line 1279) Announcing shutdown INFO [StorageServiceShutdownHook] 2014-08-05 19:14:53,930 MessagingService.java (line 683) Waiting for messaging service to quiesce INFO [ACCEPT-/127.0.0.10] 2014-08-05 19:14:53,931 MessagingService.java (line 923) MessagingService has terminated the accept() thread Does anyone have any ideas about how to debug this? Looking around on google I found some threads suggesting that this could occur from an OOM error (http://stackoverflow.com/questions/23755040/cassandra-exits-with-no-errors). Wouldn't such an error be logged, however? The test that fails is a test of our MapReduce Hadoop InputFormat and as such it does some pretty big queries across multiple rows (over a range of partitioning key tokens). The default fetch size I believe is 5000 rows, and the values in the rows I am fetching are just simple strings, so I would not think the amount of data in a single read would be too big. FWIW I don't see any log messages about garbage collection for at least 3min before the process shuts down (and no GC messages after the test stops doing writes and starts doing reads). I'd greatly appreciate any help before my team kills me for breaking our Jenkins build so consistently! :) Best regards, Clint
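For completeness, a quick sketch of how to rule the two out-of-memory cases in or out on the build host (the log path is an assumption and depends on how the test process is configured): a JVM OutOfMemoryError is written to Cassandra's own log, whereas a kill by the Linux OOM killer is only recorded in the kernel log and leaves the application log silent.

grep -i "OutOfMemory" /path/to/cassandra/system.log
dmesg | grep -iE "oom-killer|out of memory"

If neither turns anything up, the orderly StorageServiceShutdownHook messages quoted above suggest the process received a normal shutdown request rather than being killed outright, which points at the test harness or something external stopping the daemon.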