Re: Reasonable range for the max number of tables?

2014-08-05 Thread Phil Luckhurst
Is there any mention of this limitation anywhere in the Cassandra
documentation? I don't see it mentioned in the 'Anti-patterns in Cassandra'
section of the DataStax 2.0 documentation or anywhere else.

When starting out with Cassandra as a store for a multi-tenant application
it seems very attractive to segregate data for each tenant using a
tenant-specific keyspace, each with its own set of tables. It's not until you
start browsing through forums such as this that you find out that it isn't
going to scale above a few tenants.

If you want to be able to segregate customer data in Cassandra, is it the
accepted practice to have multiple Cassandra installations?



--
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Reasonable-range-for-the-max-number-of-tables-tp7596094p7596106.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Node stuck during nodetool rebuild

2014-08-05 Thread Vasileios Vlachos
Hello All,

We are on 1.2.18 (running on Ubuntu 12.04) and we recently tried to add a
second DC on our demo environment, just before trying it on live. The
existing DC1 has two nodes which approximately hold 10G of data (RF=2). In
order to add the second DC, DC2, we followed this procedure:

On DC1 nodes:
1. Changed the Snitch in the cassandra.yaml from default to
GossipingPropertyFileSnitch.
2. Configured the cassandra-rackdc.properties (DC1, RAC1).
3. Rolling restart
4. Update replication strategy for each keyspace, for example: ALTER
KEYSPACE keyspace WITH REPLICATION =
{'class':'NetworkTopologyStrategy','DC1':2};

On DC2 nodes:
5. Edit the cassandra.yaml with: auto_bootstrap: false, seeds (one IP from
DC1), cluster name to match whatever we have on DC1 nodes, correct IP
settings, num_tokens, initial_token left unset and finally the snitch
(GossipingPropertyFileSnitch, as in DC1).
6. Changed the cassandra-rackdc.properties (DC2, RAC1)

On the Application:
7. Changed the C# DataStax driver load balancing policy to be
DCAwareRoundRobinPolicy
8. Changed the application consistency level from QUORUM to LOCAL_QUORUM
9. After deleting the data, commitlog and saved_caches directories we started
Cassandra on both nodes in the new DC, DC2. According to the logs at this
point all nodes were able to see all other nodes with the correct/expected
output when running nodetool status.

On DC1 nodes:
10. After cassandra was running on DC2, we changed the Keyspace RF to
include the new DC as follows:  ALTER KEYSPACE keyspace WITH REPLICATION
= {'class':'NetworkTopologyStrategy','DC1':2, 'DC2':2};
11. As a last step and in order to stream the data across to the second DC,
we ran this on node1 of DC2: nodetool rebuild DC1. After the successful
completion of this, we were planning to run the same on node2 of DC2.
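For reference, the cassandra-rackdc.properties mentioned in steps 2 and 6
is essentially just the dc/rack pair, roughly:

# on DC1 nodes (step 2)
dc=DC1
rack=RAC1

# on DC2 nodes (step 6)
dc=DC2
rack=RAC1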

The problem is that nodetool rebuild seems to be stuck: nodetool netstats
on node1 of DC2 appears to be stuck at 10% while streaming a 5G file from node2
at DC1. This doesn't tally with nodetool netstats when running it against
either of the DC1 nodes; the DC1 nodes don't appear to be streaming anything to
DC2.

It is worth pointing out that initially we tried to run 'nodetool rebuild DC1'
on both nodes at DC2, given the small amount of data to be streamed in
total (approximately 10G as I explained above). We experienced the same
problem, with the only difference being that 'nodetool rebuild DC1' got stuck
on both nodes at DC2 very soon after running it, whereas now it happened
only after running it for an hour or so. We thought the problem was that we
tried to run nodetool against both nodes at the same time. So, we tried
running it only against node1 after we deleted all the data, commitlog and
caches on both nodes and started from step (9) again. Now nodetool rebuild
has been running against node1 at DC2 for more than 12 hours with no luck... The
weird thing is that the Cassandra logs appear to be clean and the VPN
between the two DCs has no problems at all.

Any thoughts? Have we missed something in the steps I described? Is
anything wrong in the procedure? Any help would be much appreciated.

Thanks,

Vasilis


A question about using 'update keyspace with strategyoptions' command

2014-08-05 Thread Lu, Boying
Hi, All,

I want to run 'update keyspace with strategy_options={dc1:3, dc2:3}' from 
cassandra-cli to update the strategy options of some keyspace
in a multi-DC environment.

When the command returns successfully, does it mean that the strategy options 
have been updated successfully, or do I need to wait
some time for the change to be propagated to all DCs?

Thanks

Boying



Re: A question about using 'update keyspace with strategyoptions' command

2014-08-05 Thread Rahul Menon
Try the show keyspaces command and look for Options under each keyspace.

Thanks
Rahul


On Tue, Aug 5, 2014 at 2:01 PM, Lu, Boying boying...@emc.com wrote:

 Hi, All,



 I want to run ‘update keyspace with strategy_options={dc1:3, dc2:3}’ from
 cassandra-cli to update the strategy options of some keyspace

 in a multi-DC environment.



 When the command returns successfully, does it mean that the strategy
 options have been updated successfully or I need to wait

 some time for the change to be propagated  to all DCs?



 Thanks



 Boying





RE: A question about using 'update keyspace with strategyoptions' command

2014-08-05 Thread Lu, Boying
Thanks, yes. I can use the ‘show keyspace’ command to check and see that the 
strategy did change.

But what I want to know is whether the ‘update keyspace with strategy_options …’ 
command is
a ‘sync’ operation or an ‘async’ operation.



From: Rahul Menon [mailto:ra...@apigee.com]
Sent: 5 August 2014 16:38
To: user
Subject: Re: A question about using 'update keyspace with strategyoptions' 
command

Try the show keyspaces command and look for Options under each keyspace.

Thanks
Rahul

On Tue, Aug 5, 2014 at 2:01 PM, Lu, Boying 
boying...@emc.commailto:boying...@emc.com wrote:
Hi, All,

I want to run ‘update keyspace with strategy_options={dc1:3, dc2:3}’ from 
cassandra-cli to update the strategy options of some keyspace
in a multi-DC environment.

When the command returns successfully, does it mean that the strategy options 
have been updated successfully or I need to wait
some time for the change to be propagated  to all DCs?

Thanks

Boying




Re: A question about using 'update keyspace with strategyoptions' command

2014-08-05 Thread Sylvain Lebresne
Changing the strategy options, and in particular the replication factor,
does not perform any data replication by itself. You need to run a repair
to ensure data is replicated following the new replication.
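For example (keyspace name is a placeholder), the rough sequence is to run,
in cassandra-cli:

    update keyspace my_keyspace with strategy_options = {dc1:3, dc2:3};

and then, on each node in turn:

    nodetool repair my_keyspace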


On Tue, Aug 5, 2014 at 10:52 AM, Lu, Boying boying...@emc.com wrote:

 Thanks. yes. I can use the ‘show keyspace’ command to check and see the
 strategy does changed.



 But what I want to know is if the ‘update keyspace with strategy_options
 …’ command is

 a ‘sync’ operation or a ‘async’ operation.







 *From:* Rahul Menon [mailto:ra...@apigee.com]
 *Sent:* 5 August 2014 16:38
 *To:* user
 *Subject:* Re: A question about using 'update keyspace with
 strategyoptions' command



 Try the show keyspaces command and look for Options under each keyspace.



 Thanks

 Rahul



 On Tue, Aug 5, 2014 at 2:01 PM, Lu, Boying boying...@emc.com wrote:

 Hi, All,



 I want to run ‘update keyspace with strategy_options={dc1:3, dc2:3}’ from
 cassandra-cli to update the strategy options of some keyspace

 in a multi-DC environment.



 When the command returns successfully, does it mean that the strategy
 options have been updated successfully or I need to wait

 some time for the change to be propagated  to all DCs?



 Thanks



 Boying







Re: Reasonable range for the max number of tables?

2014-08-05 Thread Mark Reddy
Hi Phil,

In theory, the max number of column families would be in the low hundreds.
In practice the limit is related to the amount of heap you have, as
each column family will consume 1 MB of heap due to arena allocation.

To segregate customer data, you could:
- Use customer specific column families under a single keyspace
- Use a keyspace per customer
- Use the same column families and have a column that identifies the
customer. On the application layer ensure that there are sufficient checks
so one customer can't read another customer's data
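As a rough CQL sketch of the last option (table and column names are
hypothetical), the customer identifier becomes part of the partition key
and every query is scoped to it:

    CREATE TABLE events (
        customer_id uuid,
        event_time  timestamp,
        payload     text,
        PRIMARY KEY (customer_id, event_time)
    );

    SELECT * FROM events WHERE customer_id = ?;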


Mark


On Tue, Aug 5, 2014 at 9:09 AM, Phil Luckhurst 
phil.luckhu...@powerassure.com wrote:

 Is there any mention of this limitation anywhere in the Cassandra
 documentation? I don't see it mentioned in the 'Anti-patterns in Cassandra'
 section of the DataStax 2.0 documentation or anywhere else.

 When starting out with Cassandra as a store for a multi-tenant application
 it seems very attractive to segregate data for each tenant using a tenant
 specific keyspace each with their own set of tables. It's not until you
 start browsing through forums such as this that you find out that it isn't
 going to scale above a few tenants.

 If you want to be able to segregate customer data in Cassandra is it the
 accepted practice to have multiple Cassandra installations?



 --
 View this message in context:
 http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Reasonable-range-for-the-max-number-of-tables-tp7596094p7596106.html
 Sent from the cassandra-u...@incubator.apache.org mailing list archive at
 Nabble.com.



RE: A question about using 'update keyspace with strategyoptions' command

2014-08-05 Thread Lu, Boying
Yes.

Sorry for not saying it clearly.

What I want to know is “has the strategy changed?” after the ‘update keyspace 
with strategy_options…’ command returns successfully,
not whether the data has changed.

E.g., say I run the command ‘update keyspace with strategy_options {dc1:3, 
dc2:3}’; when this command returns,
are the strategy options already changed? Or do I need to wait some time for the 
strategy to be changed?


From: Sylvain Lebresne [mailto:sylv...@datastax.com]
Sent: 5 August 2014 16:59
To: user@cassandra.apache.org
Subject: Re: A question about using 'update keyspace with strategyoptions' 
command

Changing the strategy options, and in particular the replication factor, does 
not perform any data replication by itself. You need to run a repair to ensure 
data is replicated following the new replication.

On Tue, Aug 5, 2014 at 10:52 AM, Lu, Boying 
boying...@emc.commailto:boying...@emc.com wrote:
Thanks. yes. I can use the ‘show keyspace’ command to check and see the 
strategy does changed.

But what I want to know is if the ‘update keyspace with strategy_options …’ 
command is
a ‘sync’ operation or a ‘async’ operation.



From: Rahul Menon [mailto:ra...@apigee.commailto:ra...@apigee.com]
Sent: 5 August 2014 16:38
To: user
Subject: Re: A question about using 'update keyspace with strategyoptions' 
command

Try the show keyspaces command and look for Options under each keyspace.

Thanks
Rahul

On Tue, Aug 5, 2014 at 2:01 PM, Lu, Boying 
boying...@emc.commailto:boying...@emc.com wrote:
Hi, All,

I want to run ‘update keyspace with strategy_options={dc1:3, dc2:3}’ from 
cassandra-cli to update the strategy options of some keyspace
in a multi-DC environment.

When the command returns successfully, does it mean that the strategy options 
have been updated successfully or I need to wait
some time for the change to be propagated  to all DCs?

Thanks

Boying





Re: A question about using 'update keyspace with strategyoptions' command

2014-08-05 Thread Sylvain Lebresne
On Tue, Aug 5, 2014 at 11:40 AM, Lu, Boying boying...@emc.com wrote:

 What I want to know is “are the *strategy* changed ?’ after the ‘udpate
 keyspace with strategy_options…’ command returns successfully


Like all schema changes, not necessarily on all nodes. You will have to
check for schema agreement between nodes.



 Not the *data* change.



 e.g. say I run the command ‘update keyspace with strategy_opitons [dc1: 3,
 dc2:3]’ , when this command returns,

 are the *strategy* options already changed? Or I need to wait some time
 for the strategy to be changed?





 *From:* Sylvain Lebresne [mailto:sylv...@datastax.com]
 *Sent:* 5 August 2014 16:59
 *To:* user@cassandra.apache.org

 *Subject:* Re: A question about using 'update keyspace with
 strategyoptions' command



 Changing the strategy options, and in particular the replication factor,
 does not perform any data replication by itself. You need to run a repair
 to ensure data is replicated following the new replication.



 On Tue, Aug 5, 2014 at 10:52 AM, Lu, Boying boying...@emc.com wrote:

 Thanks. yes. I can use the ‘show keyspace’ command to check and see the
 strategy does changed.



 But what I want to know is if the ‘update keyspace with strategy_options
 …’ command is

 a ‘sync’ operation or a ‘async’ operation.







 *From:* Rahul Menon [mailto:ra...@apigee.com]
 *Sent:* 5 August 2014 16:38
 *To:* user
 *Subject:* Re: A question about using 'update keyspace with
 strategyoptions' command



 Try the show keyspaces command and look for Options under each keyspace.



 Thanks

 Rahul



 On Tue, Aug 5, 2014 at 2:01 PM, Lu, Boying boying...@emc.com wrote:

 Hi, All,



 I want to run ‘update keyspace with strategy_options={dc1:3, dc2:3}’ from
 cassandra-cli to update the strategy options of some keyspace

 in a multi-DC environment.



 When the command returns successfully, does it mean that the strategy
 options have been updated successfully or I need to wait

 some time for the change to be propagated  to all DCs?



 Thanks



 Boying









Re: A question about using 'update keyspace with strategyoptions' command

2014-08-05 Thread Rahul Neelakantan
Try running describe cluster from Cassandra-CLI to see if all nodes have the 
same schema version.
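For example: inside cassandra-cli, connected to any node, run

    describe cluster;

or, if your nodetool has it,

    nodetool describecluster

Both list the schema versions; once the change has propagated, every node
should be reported under the same schema version.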

Rahul Neelakantan

 On Aug 5, 2014, at 6:13 AM, Sylvain Lebresne sylv...@datastax.com wrote:
 
 On Tue, Aug 5, 2014 at 11:40 AM, Lu, Boying boying...@emc.com wrote: 
 What I want to know is “are the strategy changed ?’ after the ‘udpate 
 keyspace with strategy_options…’ command returns successfully
 
 
 Like all schema changes, not necessarily on all nodes. You will have to check 
 for schema agreement between nodes.
 
  
 Not the data change.
 
  
 
 e.g. say I run the command ‘update keyspace with strategy_opitons [dc1: 3, 
 dc2:3]’ , when this command returns,
 
 are the strategy options already changed? Or I need to wait some time for 
 the strategy to be changed?
 
  
 
  
 
 From: Sylvain Lebresne [mailto:sylv...@datastax.com] 
 Sent: 5 August 2014 16:59
 To: user@cassandra.apache.org
 
 
 Subject: Re: A question about using 'update keyspace with strategyoptions' 
 command
  
 
 Changing the strategy options, and in particular the replication factor, 
 does not perform any data replication by itself. You need to run a repair to 
 ensure data is replicated following the new replication.
 
  
 
 On Tue, Aug 5, 2014 at 10:52 AM, Lu, Boying boying...@emc.com wrote:
 
 Thanks. yes. I can use the ‘show keyspace’ command to check and see the 
 strategy does changed.
 
  
 
 But what I want to know is if the ‘update keyspace with strategy_options …’ 
 command is
 
 a ‘sync’ operation or a ‘async’ operation.
 
  
 
  
 
  
 
 From: Rahul Menon [mailto:ra...@apigee.com] 
 Sent: 5 August 2014 16:38
 To: user
 Subject: Re: A question about using 'update keyspace with strategyoptions' 
 command
 
  
 
 Try the show keyspaces command and look for Options under each keyspace. 
 
  
 
 Thanks
 
 Rahul
 
  
 
 On Tue, Aug 5, 2014 at 2:01 PM, Lu, Boying boying...@emc.com wrote:
 
 Hi, All,
 
  
 
 I want to run ‘update keyspace with strategy_options={dc1:3, dc2:3}’ from 
 cassandra-cli to update the strategy options of some keyspace
 
 in a multi-DC environment.
 
  
 
 When the command returns successfully, does it mean that the strategy 
 options have been updated successfully or I need to wait
 
 some time for the change to be propagated  to all DCs?
 
  
 
 Thanks
 
  
 
 Boying
 
 


Make an existing cluster multi data-center compatible.

2014-08-05 Thread Rene Kochen
Hi all,

I want to add a data-center to an existing single data-center cluster.
First I have to make the existing cluster multi data-center compatible.

The existing cluster is a 12 node cluster with:
- Replication factor = 3
- Placement strategy = SimpleStrategy
- Endpoint snitch = SimpleSnitch

If I change the following:
- Placement strategy = NetworkTopologyStrategy
- Endpoint snitch = PropertyFileSnitch, with all 12 nodes in this file assigned
to the same data-center and rack (see the sketch below).
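Roughly, that cassandra-topology.properties would be (IPs are placeholders):

# one line per node, all mapped to the same data-center and rack
192.168.1.1=DC1:RAC1
192.168.1.2=DC1:RAC1
# ... remaining ten nodes identical ...
default=DC1:RAC1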

Do I have to run full repairs after this change? Because the yaml file
states: IF YOU CHANGE THE SNITCH AFTER DATA IS INSERTED INTO THE CLUSTER,
YOU MUST RUN A FULL REPAIR, SINCE THE SNITCH AFFECTS WHERE REPLICAS ARE
PLACED.

Thanks!

Rene


Re: Make an existing cluster multi data-center compatible.

2014-08-05 Thread Mark Reddy
Yes, you must run a full repair for the reasons stated in the yaml file.
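Roughly, once the snitch and strategy changes are in place on all nodes,
that means running, on each of the 12 nodes one at a time:

    nodetool repair

which repairs every keyspace the node holds replicas for.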


Mark


On Tue, Aug 5, 2014 at 11:52 AM, Rene Kochen rene.koc...@schange.com
wrote:

 Hi all,

 I want to add a data-center to an existing single data-center cluster.
 First I have to make the existing cluster multi data-center compatible.

 The existing cluster is a 12 node cluster with:
 - Replication factor = 3
 - Placement strategy = SimpleStrategy
 - Endpoint snitch = SimpleSnitch

 If I change the following:
 - Placement strategy = NetworkTopologyStrategy
 - Endpoint snitch = PropertyFileSnitch - all 12 nodes in this file belong
 to the same data-center and rack.

 Do I have to run full repairs after this change? Because the yaml file
 states: IF YOU CHANGE THE SNITCH AFTER DATA IS INSERTED INTO THE CLUSTER,
 YOU MUST RUN A FULL REPAIR, SINCE THE SNITCH AFFECTS WHERE REPLICAS ARE
 PLACED.

 Thanks!

 Rene





Re: Make an existing cluster multi data-center compatible.

2014-08-05 Thread Rene Kochen
What I understand is that SimpleStrategy determines the endpoints for
replicas by traversing the ring clockwise.

NetworkTopologyStrategy determines the replicas by traversing the ring
clockwise while taking into account the racks and DC locations.

Since the file used by PropertyFileSnitch puts all endpoints in the same
data-center and rack, isn't the result of the endpoint selection basically
the same?

Thanks!

Rene


2014-08-05 12:56 GMT+02:00 Mark Reddy mark.re...@boxever.com:

 Yes, you must run a full repair for the reasons stated in the yaml file.


 Mark


 On Tue, Aug 5, 2014 at 11:52 AM, Rene Kochen rene.koc...@schange.com
 wrote:

 Hi all,

 I want to add a data-center to an existing single data-center cluster.
 First I have to make the existing cluster multi data-center compatible.

 The existing cluster is a 12 node cluster with:
 - Replication factor = 3
 - Placement strategy = SimpleStrategy
 - Endpoint snitch = SimpleSnitch

 If I change the following:
 - Placement strategy = NetworkTopologyStrategy
 - Endpoint snitch = PropertyFileSnitch - all 12 nodes in this file
 belong to the same data-center and rack.

 Do I have to run full repairs after this change? Because the yaml file
 states: IF YOU CHANGE THE SNITCH AFTER DATA IS INSERTED INTO THE CLUSTER,
 YOU MUST RUN A FULL REPAIR, SINCE THE SNITCH AFFECTS WHERE REPLICAS ARE
 PLACED.

 Thanks!

 Rene






Re: Reasonable range for the max number of tables?

2014-08-05 Thread Phil Luckhurst
Hi Mark,

Mark Reddy wrote
 To segregate customer data, you could:
 - Use customer specific column families under a single keyspace
 - Use a keyspace per customer

These effectively amount to the same thing and they both fall foul of the
limit on the number of column families, so they do not scale.


Mark Reddy wrote
 - Use the same column families and have a column that identifies the
 customer. On the application layer ensure that there are sufficient checks
 so one customer can't read another customers data

And while this gets around the column family limit, it does not allow the
same level of data segregation. For example, with a separate keyspace or
column families it is trivial to remove a single customer's data or move
that data to another system. With one set of column families for all
customers these types of actions become much more difficult, as any change
impacts all customers; but perhaps that's the price we have to pay to scale.
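To make the difference concrete (names are hypothetical): with a keyspace
per customer the clean-up is a single statement, whereas with shared tables
you have to delete that customer's partitions from every table that holds
their data:

    -- keyspace per customer
    DROP KEYSPACE customer_42;

    -- shared tables
    DELETE FROM events WHERE customer_id = 42;
    DELETE FROM profiles WHERE customer_id = 42;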

And I still think this needs to be made more prominent in the documentation.

Thanks
Phil



--
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Reasonable-range-for-the-max-number-of-tables-tp7596094p7596119.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Issue with ALLOW FILTERING

2014-08-05 Thread Jens Rantil
Hi,

I'm having an issue with ALLOW FILTERING with Cassandra 2.0.8. See a
minimal example here:
https://gist.github.com/JensRantil/ec43622c26acb56e5bc9

I expect the second-to-last query to fail, but the last query to return a
single row. In particular, I expect the last SELECT to first select using the
clustering primary id and then do the filtering.

I've been reading https://cassandra.apache.org/doc/cql3/CQL.html#selectStmt
ALLOW FILTERING and can't wrap my head around why this won't work.

Could anyone clarify this for me?

Thanks,
Jens


Re: Reasonable range for the max number of tables?

2014-08-05 Thread Jack Krupansky
Multi-tenancy remains a challenge - for most technologies. Yes, you can do 
what you suggest, but... you need to exercise great care and test and 
provision your cluster carefully. It's not a free resource that 
scales wildly in all directions with no forethought.


It is something that does work, sort of, but it wasn't one of the design 
goals or core strengths of Cassandra. IOW, it was/is more of a side effect 
than a core pattern. Anti-pattern simply means that it is not 
guaranteed to be a full-fledged, first-class feature. It means you can do 
it, and if it works well for you for your particular use case, great, but 
don't complain too loudly here if it doesn't.


That said, anybody who has great success - or great failure - with 
multi-tenant for Cassandra, or any other technology, should definitely share 
their experiences here.


And the bottom line is that dozens or low hundreds remains the 
recommended limit for tables in a single Cassandra cluster. Not a hard 
limit, but just a recommendation.


Multi-tenant is an area of great interest, so I suspect Cassandra - and all 
other technologies - will see a lot of evolution in the coming years in this 
area.


-- Jack Krupansky

-Original Message- 
From: Phil Luckhurst

Sent: Tuesday, August 5, 2014 4:09 AM
To: cassandra-u...@incubator.apache.org
Subject: Re: Reasonable range for the max number of tables?

Is there any mention of this limitation anywhere in the Cassandra
documentation? I don't see it mentioned in the 'Anti-patterns in Cassandra'
section of the DataStax 2.0 documentation or anywhere else.

When starting out with Cassandra as a store for a multi-tenant application
it seems very attractive to segregate data for each tenant using a tenant
specific keyspace each with their own set of tables. It's not until you
start browsing through forums such as this that you find out that it isn't
going to scale above a few tenants.

If you want to be able to segregate customer data in Cassandra is it the
accepted practice to have multiple Cassandra installations?



--
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Reasonable-range-for-the-max-number-of-tables-tp7596094p7596106.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com. 



Re: Reasonable range for the max number of tables?

2014-08-05 Thread Michal Michalski
 - Use a keyspace per customer
 These effectively amount to the same thing and they both fall foul to the
 limit in the number of column families so do not scale.

But then you can scale by moving some of the customers to a new cluster
easily. If you keep everything in a single keyspace or - worse - if you do
your multitenancy by prefixing row keys with customer ids of some kind, it
won't be that easy, as you wrote later in your e-mail.

M.



Kind regards,
Michał Michalski,
michal.michal...@boxever.com


On 5 August 2014 12:36, Phil Luckhurst phil.luckhu...@powerassure.com
wrote:

 Hi Mark,

 Mark Reddy wrote
  To segregate customer data, you could:
  - Use customer specific column families under a single keyspace
  - Use a keyspace per customer

 These effectively amount to the same thing and they both fall foul to the
 limit in the number of column families so do not scale.


 Mark Reddy wrote
  - Use the same column families and have a column that identifies the
  customer. On the application layer ensure that there are sufficient
 checks
  so one customer can't read another customers data

 And while this gets around the column family limit it does not allow the
 same level of data segregation. For example with a separate keyspace or
 column families it is trivial to remove a single customer's data or move
 that data to another system. With one set of column families for all
 customers these types of actions become much more difficult as any change
 impacts all customers but perhaps that's the price we have to pay to scale.

 And I still think this needs to be made more prominent in the
 documentation.

 Thanks
 Phil



 --
 View this message in context:
 http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Reasonable-range-for-the-max-number-of-tables-tp7596094p7596119.html
 Sent from the cassandra-u...@incubator.apache.org mailing list archive at
 Nabble.com.



Fail to reconnect to other nodes after intermittent network failure

2014-08-05 Thread Jiri Horky
Hi,

we experienced a strange problem after an intermittent network failure:
the affected node did not reconnect to the rest of the cluster but did
allow users to authenticate (which was not possible during the actual
network outage, see below). The cluster consists of 1 node in each of 3
datacenters; it uses C* 1.2.16 with SSL enabled both to clients and
between C* nodes. Authentication is enabled as well.

The problem started around 2014-08-01 when Cassandra first noticed a
network problem:

 INFO [GossipTasks:1] 2014-08-01 07:47:52,618 Gossiper.java (line 823)
InetAddress /77.234.44.20 is now DOWN
 INFO [GossipTasks:1] 2014-08-01 07:47:55,619 Gossiper.java (line 823)
InetAddress mia10.ff.avast.com/77.234.42.20 is now DOWN

The network came up for a while:

INFO [GossipStage:1] 2014-08-01 07:51:29,380 Gossiper.java (line 809)
InetAddress /77.234.42.20 is now UP
 INFO [HintedHandoff:1] 2014-08-01 07:51:29,381
HintedHandOffManager.java (line 296) Started hinted handoff for host:
9252f37c-1c9a-418b-a49f-6065511946e4 with IP: /77.234.42.20
 INFO [GossipStage:1] 2014-08-01 07:51:29,381 Gossiper.java (line 809)
InetAddress /77.234.44.20 is now UP
 INFO [HintedHandoff:2] 2014-08-01 07:51:29,385
HintedHandOffManager.java (line 296) Started hinted handoff for host:
97b1943a-3689-4e4a-a39d-d5a11c0cc309 with IP: /77.234.44.20

But it failed to send hints:


 INFO [HintedHandoff:1] 2014-08-01 07:51:39,389
HintedHandOffManager.java (line 427) Timed out replaying hints to
/77.234.42.20; aborting (0 delivered)
 INFO [HintedHandoff:2] 2014-08-01 07:51:39,390
HintedHandOffManager.java (line 427) Timed out replaying hints to
/77.234.44.20; aborting (0 delivered)

Also, the log started to be flooded with failed authentication attempts.
My understanding is that authentication data are read at QUORUM, which
failed as the other two nodes were down:


ERROR [Native-Transport-Requests:446116] 2014-08-01 07:51:39,985
QueryMessage.java (line 97) Unexpected error during query
com.google.common.util.concurrent.UncheckedExecutionException:
java.lang.RuntimeException:
org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed
out - received only 0 responses.
at
com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2258)
at com.google.common.cache.LocalCache.get(LocalCache.java:3990)
at
com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3994)
at
com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4878)
at
org.apache.cassandra.service.ClientState.authorize(ClientState.java:292)
at
org.apache.cassandra.service.ClientState.ensureHasPermission(ClientState.java:172)
at
org.apache.cassandra.service.ClientState.hasAccess(ClientState.java:165)
at
org.apache.cassandra.service.ClientState.hasColumnFamilyAccess(ClientState.java:149)
at
org.apache.cassandra.cql3.statements.SelectStatement.checkAccess(SelectStatement.java:116)
at
org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:102)
at
org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:113)
at
org.apache.cassandra.transport.messages.QueryMessage.execute(QueryMessage.java:87)
at
org.apache.cassandra.transport.Message$Dispatcher.messageReceived(Message.java:287)
at
org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
at
org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at
org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
at
org.jboss.netty.handler.execution.ChannelUpstreamEventRunnable.doRun(ChannelUpstreamEventRunnable.java:43)
at
org.jboss.netty.handler.execution.ChannelEventRunnable.run(ChannelEventRunnable.java:67)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:722)
Caused by: java.lang.RuntimeException:
org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed
out - received only 0 responses.
at org.apache.cassandra.auth.Auth.selectUser(Auth.java:256)
at org.apache.cassandra.auth.Auth.isSuperuser(Auth.java:84)
at
org.apache.cassandra.auth.AuthenticatedUser.isSuper(AuthenticatedUser.java:50)
at
org.apache.cassandra.auth.CassandraAuthorizer.authorize(CassandraAuthorizer.java:68)
at
org.apache.cassandra.service.ClientState$1.load(ClientState.java:278)
at
org.apache.cassandra.service.ClientState$1.load(ClientState.java:275)
at
com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3589)
at
com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2374)
at

Re: data type is object when metric instrument using Gauge?

2014-08-05 Thread Ken Hancock
If you look at the VisualVM metadata, it'll show that what's returned is
java.lang.Object, which is different from Meters or Counters.

Looking at the source for metrics-core, it seems that this is a feature
of Gauges because unlike Meters or Counters, Gauges can be of various types
-- long, double, etc.  The Cassandra source sets them up as longs; however, the
JMXReporter class in metrics-core always exposes them as Objects.




On Mon, Aug 4, 2014 at 7:32 PM, Patricia Gorla patri...@thelastpickle.com
wrote:

 Mike,

 What metrics reporter are you using? How are you attempting to access the
 metric?



 On Sat, Aug 2, 2014 at 7:30 AM, mike maomao...@gmail.com wrote:

 Dear All

   We are trying to monitor Cassandra using JMX. The monitoring tool we
 are using works fine for meters, However, if the metrcis are collected
 using gauge, the data type is object, then, our tool treat it as a string
 instead of a double. for example

 org.apache.cassandra.metrics:type=Cache,scope=KeyCache,name=Capacity

 The Type of Attribute (Value) is java.lang.Object

 is it possible to implement the datatype of gauge as numeric types
 instead of object, or other way around for example using metric
 reporter...etc?

 Thanks a lot for any suggestion!

 Best Regard!
   Mike






 --
 Patricia Gorla
 @patriciagorla

 Consultant
 Apache Cassandra Consulting
 http://www.thelastpickle.com http://thelastpickle.com




-- 
*Ken Hancock *| System Architect, Advanced Advertising
SeaChange International
50 Nagog Park
Acton, Massachusetts 01720
ken.hanc...@schange.com | www.schange.com | NASDAQ:SEAC
http://www.schange.com/en-US/Company/InvestorRelations.aspx
Office: +1 (978) 889-3329 | Google Talk: ken.hanc...@schange.com |
Skype: hancockks | Yahoo IM: hancockks | LinkedIn:
http://www.linkedin.com/in/kenhancock

SeaChange International - http://www.schange.com/
This e-mail and any attachments may contain
information which is SeaChange International confidential. The information
enclosed is intended only for the addressees herein and may not be copied
or forwarded without permission from SeaChange International.


Re: Node bootstrap

2014-08-05 Thread Ruchir Jha
Thanks Patricia for your response!

On the new node, I just see a lot of the following:

INFO [FlushWriter:75] 2014-08-05 09:53:04,394 Memtable.java (line 400)
Writing Memtable
INFO [CompactionExecutor:3] 2014-08-05 09:53:11,132 CompactionTask.java
(line 262) Compacted 12 sstables to

so basically it is just busy flushing and compacting. Would you have any
ideas on why there is a 2x disk space blow-up? My understanding was that if
initial_token is left empty on the new node, it just contacts the heaviest
node and bisects its token range. The heaviest node is around 2.1 TB,
and the new node is already at 4 TB. Could this be because compaction is
falling behind?

Ruchir


On Mon, Aug 4, 2014 at 7:23 PM, Patricia Gorla patri...@thelastpickle.com
wrote:

 Ruchir,

 What exactly are you seeing in the logs? Are you running major compactions
 on the new bootstrapping node?

 With respect to the seed list, it is generally advisable to use 3 seed
 nodes per AZ / DC.

 Cheers,


 On Mon, Aug 4, 2014 at 11:41 AM, Ruchir Jha ruchir@gmail.com wrote:

 I am trying to bootstrap the thirteenth node in a 12 node cluster where
 the average data size per node is about 2.1 TB. The bootstrap streaming has
 been going on for 2 days now, and the disk size on the new node is already
 above 4 TB and still going. Is this because the new node is running major
 compactions while the streaming is going on?

 One thing that I noticed that seemed off was the seeds property in the
 yaml of the 13th node comprises of 1..12. Where as the seeds property on
 the existing 12 nodes consists of all the other nodes except the thirteenth
 node. Is this an issue?

 Any other insight is appreciated?

 Ruchir.





 --
 Patricia Gorla
 @patriciagorla

 Consultant
 Apache Cassandra Consulting
 http://www.thelastpickle.com http://thelastpickle.com



RE: A question about using 'update keyspace with strategyoptions' command

2014-08-05 Thread Lu, Boying
Thanks a lot.

So the ‘strategy’ change may not be seen by all nodes when the ‘update 
keyspace …’ command returns, and I can use ‘describe cluster’ to check whether
the change has taken effect on all nodes, right?

From: Rahul Neelakantan [mailto:ra...@rahul.be]
Sent: 5 August 2014 18:46
To: user@cassandra.apache.org
Subject: Re: A question about using 'update keyspace with strategyoptions' 
command

Try running describe cluster from Cassandra-CLI to see if all nodes have the 
same schema version.
Rahul Neelakantan

On Aug 5, 2014, at 6:13 AM, Sylvain Lebresne 
sylv...@datastax.commailto:sylv...@datastax.com wrote:
On Tue, Aug 5, 2014 at 11:40 AM, Lu, Boying 
boying...@emc.commailto:boying...@emc.com wrote:
What I want to know is “are the strategy changed ?’ after the ‘udpate keyspace 
with strategy_options…’ command returns successfully

Like all schema changes, not necessarily on all nodes. You will have to check 
for schema agreement between nodes.


Not the data change.

e.g. say I run the command ‘update keyspace with strategy_opitons [dc1: 3, 
dc2:3]’ , when this command returns,
are the strategy options already changed? Or I need to wait some time for the 
strategy to be changed?


From: Sylvain Lebresne 
[mailto:sylv...@datastax.commailto:sylv...@datastax.com]
Sent: 5 August 2014 16:59
To: user@cassandra.apache.orgmailto:user@cassandra.apache.org

Subject: Re: A question about using 'update keyspace with strategyoptions' 
command

Changing the strategy options, and in particular the replication factor, does 
not perform any data replication by itself. You need to run a repair to ensure 
data is replicated following the new replication.

On Tue, Aug 5, 2014 at 10:52 AM, Lu, Boying 
boying...@emc.commailto:boying...@emc.com wrote:
Thanks. yes. I can use the ‘show keyspace’ command to check and see the 
strategy does changed.

But what I want to know is if the ‘update keyspace with strategy_options …’ 
command is
a ‘sync’ operation or a ‘async’ operation.



From: Rahul Menon [mailto:ra...@apigee.commailto:ra...@apigee.com]
Sent: 5 August 2014 16:38
To: user
Subject: Re: A question about using 'update keyspace with strategyoptions' 
command

Try the show keyspaces command and look for Options under each keyspace.

Thanks
Rahul

On Tue, Aug 5, 2014 at 2:01 PM, Lu, Boying 
boying...@emc.commailto:boying...@emc.com wrote:
Hi, All,

I want to run ‘update keyspace with strategy_options={dc1:3, dc2:3}’ from 
cassandra-cli to update the strategy options of some keyspace
in a multi-DC environment.

When the command returns successfully, does it mean that the strategy options 
have been updated successfully or I need to wait
some time for the change to be propagated  to all DCs?

Thanks

Boying






Re: Node bootstrap

2014-08-05 Thread Ruchir Jha
Yes num_tokens is set to 256. initial_token is blank on all nodes including
the new one.


On Tue, Aug 5, 2014 at 10:03 AM, Mark Reddy mark.re...@boxever.com wrote:

 My understanding was that if initial_token is left empty on the new node,
 it just contacts the heaviest node and bisects its token range.


 If you are using vnodes and you have num_tokens set to 256 the new node
 will take token ranges dynamically. What is the configuration of your other
 nodes, are you setting num_tokens or initial_token on those?


 Mark


 On Tue, Aug 5, 2014 at 2:57 PM, Ruchir Jha ruchir@gmail.com wrote:

 Thanks Patricia for your response!

 On the new node, I just see a lot of the following:

 INFO [FlushWriter:75] 2014-08-05 09:53:04,394 Memtable.java (line 400)
 Writing Memtable
 INFO [CompactionExecutor:3] 2014-08-05 09:53:11,132 CompactionTask.java
 (line 262) Compacted 12 sstables to

 so basically it is just busy flushing, and compacting. Would you have any
 ideas on why the 2x disk space blow up. My understanding was that if
 initial_token is left empty on the new node, it just contacts the heaviest
 node and bisects its token range. And the heaviest node is around 2.1 TB,
 and the new node is already at 4 TB. Could this be because compaction is
 falling behind?

 Ruchir


 On Mon, Aug 4, 2014 at 7:23 PM, Patricia Gorla 
 patri...@thelastpickle.com wrote:

 Ruchir,

 What exactly are you seeing in the logs? Are you running major
 compactions on the new bootstrapping node?

 With respect to the seed list, it is generally advisable to use 3 seed
 nodes per AZ / DC.

 Cheers,


 On Mon, Aug 4, 2014 at 11:41 AM, Ruchir Jha ruchir@gmail.com
 wrote:

 I am trying to bootstrap the thirteenth node in a 12 node cluster where
 the average data size per node is about 2.1 TB. The bootstrap streaming has
 been going on for 2 days now, and the disk size on the new node is already
 above 4 TB and still going. Is this because the new node is running major
 compactions while the streaming is going on?

 One thing that I noticed that seemed off was the seeds property in the
 yaml of the 13th node comprises of 1..12. Where as the seeds property on
 the existing 12 nodes consists of all the other nodes except the thirteenth
 node. Is this an issue?

 Any other insight is appreciated?

 Ruchir.





 --
 Patricia Gorla
 @patriciagorla

 Consultant
 Apache Cassandra Consulting
 http://www.thelastpickle.com http://thelastpickle.com






Re: Node bootstrap

2014-08-05 Thread Ruchir Jha
Also not sure if this is relevant but just noticed the nodetool tpstats
output:

Pool NameActive   Pending  Completed   Blocked  All
time blocked
FlushWriter   0 0   1136 0
  512

Looks like about 50% of flushes are blocked.


On Tue, Aug 5, 2014 at 10:14 AM, Ruchir Jha ruchir@gmail.com wrote:

 Yes num_tokens is set to 256. initial_token is blank on all nodes
 including the new one.


 On Tue, Aug 5, 2014 at 10:03 AM, Mark Reddy mark.re...@boxever.com
 wrote:

 My understanding was that if initial_token is left empty on the new node,
 it just contacts the heaviest node and bisects its token range.


 If you are using vnodes and you have num_tokens set to 256 the new node
 will take token ranges dynamically. What is the configuration of your other
 nodes, are you setting num_tokens or initial_token on those?


 Mark


 On Tue, Aug 5, 2014 at 2:57 PM, Ruchir Jha ruchir@gmail.com wrote:

 Thanks Patricia for your response!

 On the new node, I just see a lot of the following:

 INFO [FlushWriter:75] 2014-08-05 09:53:04,394 Memtable.java (line 400)
 Writing Memtable
 INFO [CompactionExecutor:3] 2014-08-05 09:53:11,132 CompactionTask.java
 (line 262) Compacted 12 sstables to

 so basically it is just busy flushing, and compacting. Would you have
 any ideas on why the 2x disk space blow up. My understanding was that if
 initial_token is left empty on the new node, it just contacts the heaviest
 node and bisects its token range. And the heaviest node is around 2.1 TB,
 and the new node is already at 4 TB. Could this be because compaction is
 falling behind?

 Ruchir


 On Mon, Aug 4, 2014 at 7:23 PM, Patricia Gorla 
 patri...@thelastpickle.com wrote:

 Ruchir,

 What exactly are you seeing in the logs? Are you running major
 compactions on the new bootstrapping node?

 With respect to the seed list, it is generally advisable to use 3 seed
 nodes per AZ / DC.

 Cheers,


 On Mon, Aug 4, 2014 at 11:41 AM, Ruchir Jha ruchir@gmail.com
 wrote:

 I am trying to bootstrap the thirteenth node in a 12 node cluster
 where the average data size per node is about 2.1 TB. The bootstrap
 streaming has been going on for 2 days now, and the disk size on the new
 node is already above 4 TB and still going. Is this because the new node 
 is
 running major compactions while the streaming is going on?

 One thing that I noticed that seemed off was the seeds property in the
 yaml of the 13th node comprises of 1..12. Where as the seeds property on
 the existing 12 nodes consists of all the other nodes except the 
 thirteenth
 node. Is this an issue?

 Any other insight is appreciated?

 Ruchir.





 --
 Patricia Gorla
 @patriciagorla

 Consultant
 Apache Cassandra Consulting
 http://www.thelastpickle.com http://thelastpickle.com







Re: Node bootstrap

2014-08-05 Thread Mark Reddy

 Yes num_tokens is set to 256. initial_token is blank on all nodes
 including the new one.


OK, so you have num_tokens set to 256 for all nodes with initial_token
commented out. This means you are using vnodes, and the new node will
automatically grab a list of tokens to take responsibility for.

Pool NameActive   Pending  Completed   Blocked  All
 time blocked
 FlushWriter   0 0   1136 0
   512

 Looks like about 50% of flushes are blocked.


This is a problem as it indicates that the IO system cannot keep up.

Just ran this on the new node:
 nodetool netstats | grep Streaming from | wc -l
 10


This is normal as the new node will most likely take tokens from all nodes
in the cluster.

Sorry for the multiple updates, but another thing I found was all the other
 existing nodes have themselves in the seeds list, but the new node does not
 have itself in the seeds list. Can that cause this issue?


Seeds are only used when a new node is bootstrapping into the cluster and
needs a set of IPs to contact and discover the cluster, so this would have
no impact on data sizes or streaming. In general it would be considered
best practice to have a set of 2-3 seeds from each data center, with all
nodes having the same seed list.
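For example, a cassandra.yaml seed entry along these lines (IPs
illustrative), identical on every node:

seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "10.0.1.1,10.0.1.2,10.0.1.3"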


What is the current output of 'nodetool compactionstats'? Could you also
paste the output of nodetool status keyspace?

Mark



On Tue, Aug 5, 2014 at 3:59 PM, Ruchir Jha ruchir@gmail.com wrote:

 Sorry for the multiple updates, but another thing I found was all the
 other existing nodes have themselves in the seeds list, but the new node
 does not have itself in the seeds list. Can that cause this issue?


 On Tue, Aug 5, 2014 at 10:30 AM, Ruchir Jha ruchir@gmail.com wrote:

 Just ran this on the new node:

 nodetool netstats | grep Streaming from | wc -l
 10

 Seems like the new node is receiving data from 10 other nodes. Is that
 expected in a vnodes enabled environment?

 Ruchir.



 On Tue, Aug 5, 2014 at 10:21 AM, Ruchir Jha ruchir@gmail.com wrote:

 Also not sure if this is relevant but just noticed the nodetool tpstats
 output:

 Pool NameActive   Pending  Completed   Blocked
  All time blocked
 FlushWriter   0 0   1136 0
 512

 Looks like about 50% of flushes are blocked.


 On Tue, Aug 5, 2014 at 10:14 AM, Ruchir Jha ruchir@gmail.com
 wrote:

 Yes num_tokens is set to 256. initial_token is blank on all nodes
 including the new one.


 On Tue, Aug 5, 2014 at 10:03 AM, Mark Reddy mark.re...@boxever.com
 wrote:

 My understanding was that if initial_token is left empty on the new
 node, it just contacts the heaviest node and bisects its token range.


 If you are using vnodes and you have num_tokens set to 256 the new
 node will take token ranges dynamically. What is the configuration of your
 other nodes, are you setting num_tokens or initial_token on those?


 Mark


 On Tue, Aug 5, 2014 at 2:57 PM, Ruchir Jha ruchir@gmail.com
 wrote:

 Thanks Patricia for your response!

 On the new node, I just see a lot of the following:

 INFO [FlushWriter:75] 2014-08-05 09:53:04,394 Memtable.java (line
 400) Writing Memtable
 INFO [CompactionExecutor:3] 2014-08-05 09:53:11,132
 CompactionTask.java (line 262) Compacted 12 sstables to

 so basically it is just busy flushing, and compacting. Would you have
 any ideas on why the 2x disk space blow up. My understanding was that if
 initial_token is left empty on the new node, it just contacts the 
 heaviest
 node and bisects its token range. And the heaviest node is around 2.1 TB,
 and the new node is already at 4 TB. Could this be because compaction is
 falling behind?

 Ruchir


 On Mon, Aug 4, 2014 at 7:23 PM, Patricia Gorla 
 patri...@thelastpickle.com wrote:

 Ruchir,

 What exactly are you seeing in the logs? Are you running major
 compactions on the new bootstrapping node?

 With respect to the seed list, it is generally advisable to use 3
 seed nodes per AZ / DC.

 Cheers,


 On Mon, Aug 4, 2014 at 11:41 AM, Ruchir Jha ruchir@gmail.com
 wrote:

 I am trying to bootstrap the thirteenth node in a 12 node cluster
 where the average data size per node is about 2.1 TB. The bootstrap
 streaming has been going on for 2 days now, and the disk size on the 
 new
 node is already above 4 TB and still going. Is this because the new 
 node is
 running major compactions while the streaming is going on?

 One thing that I noticed that seemed off was the seeds property in
 the yaml of the 13th node comprises of 1..12. Where as the seeds 
 property
 on the existing 12 nodes consists of all the other nodes except the
 thirteenth node. Is this an issue?

 Any other insight is appreciated?

 Ruchir.





 --
 Patricia Gorla
 @patriciagorla

 Consultant
 Apache Cassandra Consulting
 http://www.thelastpickle.com http://thelastpickle.com










Re: vnode and NetworkTopologyStrategy: not playing well together ?

2014-08-05 Thread Jonathan Haddad
This is incorrect.  Network Topology w/ Vnodes will be fine, assuming
you've got RF= # of racks.  For each token, replicas are chosen based
on the strategy.  Essentially, you could have a wild imbalance in
token ownership, but it wouldn't matter because the replicas would be
distributed across the rest of the machines.

http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/architectureDataDistributeReplication_c.html

On Tue, Aug 5, 2014 at 8:19 AM, DE VITO Dominique
dominique.dev...@thalesgroup.com wrote:
 Hi,



 My understanding is that NetworkTopologyStrategy does NOT play well with
 vnodes, due to:

 · Vnode = tokens are (usually) randomly generated (AFAIK)

 · NetworkTopologyStrategy = requires carefully chosen tokens for
 all nodes in order not to get a VERY unbalanced ring like in
 https://issues.apache.org/jira/browse/CASSANDRA-3810



 When playing with vnodes, is the recommendation to define one rack for the
 entire cluster ?



 Thanks.



 Regards,

 Dominique







-- 
Jon Haddad
http://www.rustyrazorblade.com
skype: rustyrazorblade


Re: vnode and NetworkTopologyStrategy: not playing well together ?

2014-08-05 Thread Jonathan Haddad
* When I say wild imbalance, I do not mean all tokens on 1 node in the
cluster, I really should have said slightly imbalanced

On Tue, Aug 5, 2014 at 8:43 AM, Jonathan Haddad j...@jonhaddad.com wrote:
 This is incorrect.  Network Topology w/ Vnodes will be fine, assuming
 you've got RF= # of racks.  For each token, replicas are chosen based
 on the strategy.  Essentially, you could have a wild imbalance in
 token ownership, but it wouldn't matter because the replicas would be
 distributed across the rest of the machines.

 http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/architectureDataDistributeReplication_c.html

 On Tue, Aug 5, 2014 at 8:19 AM, DE VITO Dominique
 dominique.dev...@thalesgroup.com wrote:
 Hi,



 My understanding is that NetworkTopologyStrategy does NOT play well with
 vnodes, due to:

 · Vnode = tokens are (usually) randomly generated (AFAIK)

 · NetworkTopologyStrategy = required carefully choosen tokens for
 all nodes in order to not to get a VERY unbalanced ring like in
 https://issues.apache.org/jira/browse/CASSANDRA-3810



 When playing with vnodes, is the recommendation to define one rack for the
 entire cluster ?



 Thanks.



 Regards,

 Dominique







 --
 Jon Haddad
 http://www.rustyrazorblade.com
 skype: rustyrazorblade



-- 
Jon Haddad
http://www.rustyrazorblade.com
skype: rustyrazorblade


RE: vnode and NetworkTopologyStrategy: not playing well together ?

2014-08-05 Thread DE VITO Dominique
First, thanks for your answer.

 This is incorrect.  Network Topology w/ Vnodes will be fine, assuming you've 
 got RF= # of racks.  

IMHO, it's not a good enough condition.
Let's use an example with RF=2

N1/rack_1   N2/rack_1   N3/rack_1   N4/rack_2

Here, you have RF= # of racks
And due to NetworkTopologyStrategy, N4 will store *all* the cluster data, 
leading to a completely imbalanced cluster.

IMHO, it happens when using nodes *or* vnodes.

As well-balanced clusters with NetworkTopologyStrategy rely on a carefully chosen 
token distribution along the ring, *and* as tokens are randomly generated 
with vnodes, my guess is that with vnodes and NetworkTopologyStrategy it's 
better to define a single (logical) rack // carefully chosen tokens and 
randomly generated tokens clash.

I don't see other options left.
Do you see other ones ?

Regards,
Dominique




-Original Message-
From: jonathan.had...@gmail.com [mailto:jonathan.had...@gmail.com] On behalf of 
Jonathan Haddad
Sent: Tuesday, 5 August 2014 17:43
To: user@cassandra.apache.org
Subject: Re: vnode and NetworkTopologyStrategy: not playing well together ?

This is incorrect.  Network Topology w/ Vnodes will be fine, assuming you've 
got RF= # of racks.  For each token, replicas are chosen based on the strategy. 
 Essentially, you could have a wild imbalance in token ownership, but it 
wouldn't matter because the replicas would be distributed across the rest of 
the machines.

http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/architectureDataDistributeReplication_c.html

On Tue, Aug 5, 2014 at 8:19 AM, DE VITO Dominique 
dominique.dev...@thalesgroup.com wrote:
 Hi,



 My understanding is that NetworkTopologyStrategy does NOT play well 
 with vnodes, due to:

 · Vnode = tokens are (usually) randomly generated (AFAIK)

 · NetworkTopologyStrategy = required carefully choosen tokens for
 all nodes in order to not to get a VERY unbalanced ring like in
 https://issues.apache.org/jira/browse/CASSANDRA-3810



 When playing with vnodes, is the recommendation to define one rack for 
 the entire cluster ?



 Thanks.



 Regards,

 Dominique







--
Jon Haddad
http://www.rustyrazorblade.com
skype: rustyrazorblade


Re: vnode and NetworkTopologyStrategy: not playing well together ?

2014-08-05 Thread Jeremy Jongsma
If your nodes are not actually evenly distributed across physical racks for
redundancy, don't use multiple racks.


On Tue, Aug 5, 2014 at 10:57 AM, DE VITO Dominique 
dominique.dev...@thalesgroup.com wrote:

 First, thanks for your answer.

  This is incorrect.  Network Topology w/ Vnodes will be fine, assuming
 you've got RF= # of racks.

 IMHO, it's not a good enough condition.
 Let's use an example with RF=2

 N1/rack_1   N2/rack_1   N3/rack_1   N4/rack_2

 Here, you have RF= # of racks
 And due to NetworkTopologyStrategy, N4 will store *all* the cluster data,
 leading to a completely imbalanced cluster.

 IMHO, it happens when using nodes *or* vnodes.

 As well-balanced clusters with NetworkTopologyStrategy rely on carefully
 chosen token distribution/path along the ring *and* as tokens are
 randomly-generated with vnodes, my guess is that with vnodes and
 NetworkTopologyStrategy, it's better to define a single (logical) rack //
 due to carefully chosen tokens vs randomly-generated token clash.

 I don't see other options left.
 Do you see other ones ?

 Regards,
 Dominique




 -Original Message-
 From: jonathan.had...@gmail.com [mailto:jonathan.had...@gmail.com] On
 behalf of Jonathan Haddad
 Sent: Tuesday, 5 August 2014 17:43
 To: user@cassandra.apache.org
 Subject: Re: vnode and NetworkTopologyStrategy: not playing well together ?

 This is incorrect.  Network Topology w/ Vnodes will be fine, assuming
 you've got RF= # of racks.  For each token, replicas are chosen based on
 the strategy.  Essentially, you could have a wild imbalance in token
 ownership, but it wouldn't matter because the replicas would be distributed
 across the rest of the machines.


 http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/architectureDataDistributeReplication_c.html

 On Tue, Aug 5, 2014 at 8:19 AM, DE VITO Dominique 
 dominique.dev...@thalesgroup.com wrote:
  Hi,
 
 
 
  My understanding is that NetworkTopologyStrategy does NOT play well
  with vnodes, due to:
 
  · Vnode = tokens are (usually) randomly generated (AFAIK)
 
  · NetworkTopologyStrategy = required carefully choosen tokens
 for
  all nodes in order to not to get a VERY unbalanced ring like in
  https://issues.apache.org/jira/browse/CASSANDRA-3810
 
 
 
  When playing with vnodes, is the recommendation to define one rack for
  the entire cluster ?
 
 
 
  Thanks.
 
 
 
  Regards,
 
  Dominique
 
 
 
 



 --
 Jon Haddad
 http://www.rustyrazorblade.com
 skype: rustyrazorblade



Re: vnode and NetworkTopologyStrategy: not playing well together ?

2014-08-05 Thread Jonathan Haddad
Yes, if you have only 1 machine in a rack then your cluster will be
imbalanced.  You're going to be able to dream up all sorts of weird
failure cases when you choose a scenario like RF=2 plus a totally
imbalanced network arch.

Vnodes attempt to solve the problem of imbalanced rings by choosing so
many tokens that it's improbable that the ring will be imbalanced.



On Tue, Aug 5, 2014 at 8:57 AM, DE VITO Dominique
dominique.dev...@thalesgroup.com wrote:
 First, thanks for your answer.

 This is incorrect.  Network Topology w/ Vnodes will be fine, assuming you've 
 got RF= # of racks.

 IMHO, it's not a good enough condition.
 Let's use an example with RF=2

 N1/rack_1   N2/rack_1   N3/rack_1   N4/rack_2

 Here, you have RF= # of racks
 And due to NetworkTopologyStrategy, N4 will store *all* the cluster data, 
 leading to a completely imbalanced cluster.

 IMHO, it happens when using nodes *or* vnodes.

 As well-balanced clusters with NetworkTopologyStrategy rely on carefully 
 chosen token distribution/path along the ring *and* as tokens are 
 randomly-generated with vnodes, my guess is that with vnodes and 
 NetworkTopologyStrategy, it's better to define a single (logical) rack // due 
 to carefully chosen tokens vs randomly-generated token clash.

 I don't see other options left.
 Do you see other ones ?

 Regards,
 Dominique




 -Original Message-
 From: jonathan.had...@gmail.com [mailto:jonathan.had...@gmail.com] On behalf 
 of Jonathan Haddad
 Sent: Tuesday, 5 August 2014 17:43
 To: user@cassandra.apache.org
 Subject: Re: vnode and NetworkTopologyStrategy: not playing well together ?

 This is incorrect.  Network Topology w/ Vnodes will be fine, assuming you've 
 got RF= # of racks.  For each token, replicas are chosen based on the 
 strategy.  Essentially, you could have a wild imbalance in token ownership, 
 but it wouldn't matter because the replicas would be distributed across the 
 rest of the machines.

 http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/architectureDataDistributeReplication_c.html

 On Tue, Aug 5, 2014 at 8:19 AM, DE VITO Dominique 
 dominique.dev...@thalesgroup.com wrote:
 Hi,



 My understanding is that NetworkTopologyStrategy does NOT play well
 with vnodes, due to:

 · Vnode = tokens are (usually) randomly generated (AFAIK)

  · NetworkTopologyStrategy = requires carefully chosen tokens for
  all nodes in order not to get a VERY unbalanced ring like in
 https://issues.apache.org/jira/browse/CASSANDRA-3810



 When playing with vnodes, is the recommendation to define one rack for
 the entire cluster ?



 Thanks.



 Regards,

 Dominique







 --
 Jon Haddad
 http://www.rustyrazorblade.com
 skype: rustyrazorblade



-- 
Jon Haddad
http://www.rustyrazorblade.com
skype: rustyrazorblade


Re: A question about using 'update keyspace with strategyoptions' command

2014-08-05 Thread Mark Reddy

 So the ‘strategy’ change may not be seen by all nodes when the ‘update
 keyspace …’ command returns and I can use ’describe cluster’ to check if
 the change has taken effect on all nodes right?


Correct, the change may take time to propagate to all nodes. As Rahul said
you can check describe cluster in cli to be sure.
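
For example, either of these (the cli command is the one Rahul mentioned;
nodetool describecluster is available on recent versions as well):

  describe cluster;          (from cassandra-cli)
  nodetool describecluster   (from a shell)

Both list the schema versions reported by the nodes; the change has fully
propagated once every reachable node reports the same schema version.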


Mark


On Tue, Aug 5, 2014 at 3:06 PM, Lu, Boying boying...@emc.com wrote:

 Thanks a lot.



 So the ‘strategy’ change may not be seen by all nodes when the ‘update
 keyspace …’ command returns and I can use ’describe cluster’ to check if

 the change has taken effect on all nodes right?



 *From:* Rahul Neelakantan [mailto:ra...@rahul.be]
 *Sent:* 5 August 2014 18:46
 *To:* user@cassandra.apache.org
 *Subject:* Re: A question about using 'update keyspace with
 strategyoptions' command



 Try running describe cluster from Cassandra-CLI to see if all nodes have
 the same schema version.

 Rahul Neelakantan


 On Aug 5, 2014, at 6:13 AM, Sylvain Lebresne sylv...@datastax.com wrote:

 On Tue, Aug 5, 2014 at 11:40 AM, Lu, Boying boying...@emc.com wrote:

 What I want to know is “is the *strategy* changed?” after the ‘update
 keyspace with strategy_options…’ command returns successfully



 Like all schema changes, not necessarily on all nodes. You will have to
 check for schema agreement between nodes.




 Not the *data* change.



 e.g. say I run the command ‘update keyspace with strategy_options [dc1: 3,
 dc2:3]’ , when this command returns,

 are the *strategy* options already changed? Or do I need to wait some time
 for the strategy to be changed?





 *From:* Sylvain Lebresne [mailto:sylv...@datastax.com]
 *Sent:* 5 August 2014 16:59
 *To:* user@cassandra.apache.org


 *Subject:* Re: A question about using 'update keyspace with
 strategyoptions' command



 Changing the strategy options, and in particular the replication factor,
 does not perform any data replication by itself. You need to run a repair
 to ensure data is replicated following the new replication.
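
As a minimal sketch (the keyspace name ks is just a placeholder; cli syntax as
used elsewhere in this thread), first in cassandra-cli:

  update keyspace ks with strategy_options = {dc1:3, dc2:3};

and then, on each node that now holds new replicas:

  nodetool repair ks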



 On Tue, Aug 5, 2014 at 10:52 AM, Lu, Boying boying...@emc.com wrote:

 Thanks. yes. I can use the ‘show keyspace’ command to check and see the
 strategy does changed.



 But what I want to know is if the ‘update keyspace with strategy_options
 …’ command is

 a ‘sync’ operation or an ‘async’ operation.







 *From:* Rahul Menon [mailto:ra...@apigee.com]
 *Sent:* 5 August 2014 16:38
 *To:* user
 *Subject:* Re: A question about using 'update keyspace with
 strategyoptions' command



 Try the show keyspaces command and look for Options under each keyspace.



 Thanks

 Rahul



 On Tue, Aug 5, 2014 at 2:01 PM, Lu, Boying boying...@emc.com wrote:

 Hi, All,



 I want to run ‘update keyspace with strategy_options={dc1:3, dc2:3}’ from
 cassandra-cli to update the strategy options of some keyspace

 in a multi-DC environment.



 When the command returns successfully, does it mean that the strategy
 options have been updated successfully, or do I need to wait

 some time for the change to be propagated  to all DCs?



 Thanks



 Boying












Re: Node bootstrap

2014-08-05 Thread Ruchir Jha
nodetool status:

Datacenter: datacenter1
===
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address  Load   Tokens  Owns (effective)  Host ID
Rack
UN  10.10.20.27  1.89 TB256 25.4%
76023cdd-c42d-4068-8b53-ae94584b8b04  rack1
UN  10.10.20.62  1.83 TB256 25.5%
84b47313-da75-4519-94f3-3951d554a3e5  rack1
UN  10.10.20.47  1.87 TB256 24.7%
bcd51a92-3150-41ae-9c51-104ea154f6fa  rack1
UN  10.10.20.45  1.7 TB 256 22.6%
8d6bce33-8179-4660-8443-2cf822074ca4  rack1
UN  10.10.20.15  1.86 TB256 24.5%
01a01f07-4df2-4c87-98e9-8dd38b3e4aee  rack1
UN  10.10.20.31  1.87 TB256 24.9%
1435acf9-c64d-4bcd-b6a4-abcec209815e  rack1
UN  10.10.20.35  1.86 TB256 25.8%
17cb8772-2444-46ff-8525-33746514727d  rack1
UN  10.10.20.51  1.89 TB256 25.0%
0343cd58-3686-465f-8280-56fb72d161e2  rack1
UN  10.10.20.19  1.91 TB256 25.5%
30ddf003-4d59-4a3e-85fa-e94e4adba1cb  rack1
UN  10.10.20.39  1.93 TB256 26.0%
b7d44c26-4d75-4d36-a779-b7e7bdaecbc9  rack1
UN  10.10.20.52  1.81 TB256 25.4%
6b5aca07-1b14-4bc2-a7ba-96f026fa0e4e  rack1
UN  10.10.20.22  1.89 TB256 24.8%
46af9664-8975-4c91-847f-3f7b8f8d5ce2  rack1


Note: The new node is not part of the above list.

nodetool compactionstats:

pending tasks: 1649
  compaction typekeyspace   column family   completed
total  unit  progress
   Compaction   iprod   customerorder  1682804084
  17956558077 bytes 9.37%
   Compactionprodgatecustomerorder  1664239271
 1693502275 bytes    98.27%
   Compaction  qa_config_bkupfixsessionconfig_hist
 2443   27253 bytes 8.96%
   Compactionprodgatecustomerorder_hist
 1770577280  5026699390 bytes    35.22%
   Compaction   iprodgatecustomerorder_hist
 2959560205    312350192622 bytes     0.95%




On Tue, Aug 5, 2014 at 11:37 AM, Mark Reddy mark.re...@boxever.com wrote:

 Yes num_tokens is set to 256. initial_token is blank on all nodes
 including the new one.


 Ok so you have num_tokens set to 256 for all nodes with initial_token
 commented out, this means you are using vnodes and the new node will
 automatically grab a list of tokens to take over responsibility for.
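
 In cassandra.yaml terms that is simply (fragment, values as described above):

   num_tokens: 256
   # initial_token:

 With num_tokens set and initial_token left blank or commented out, the joining
 node picks 256 random tokens and bootstraps the corresponding ranges from the
 existing nodes.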

 Pool NameActive   Pending  Completed   Blocked
  All time blocked
 FlushWriter   0 0   1136 0
 512

 Looks like about 50% of flushes are blocked.


 This is a problem as it indicates that the IO system cannot keep up.

 Just ran this on the new node:
 nodetool netstats | grep Streaming from | wc -l
 10


 This is normal as the new node will most likely take tokens from all nodes
 in the cluster.

 Sorry for the multiple updates, but another thing I found was all the
 other existing nodes have themselves in the seeds list, but the new node
 does not have itself in the seeds list. Can that cause this issue?


 Seeds are only used when a new node is bootstrapping into the cluster and
 needs a set of ips to contact and discover the cluster, so this would have
 no impact on data sizes or streaming. In general it would be considered
 best practice to have a set of 2-3 seeds from each data center, with all
 nodes having the same seed list.
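
 As a sketch, every node (seeds included) would then carry the same list in
 cassandra.yaml, e.g. reusing three of the addresses from the nodetool status
 output above purely as an illustration:

   seed_provider:
       - class_name: org.apache.cassandra.locator.SimpleSeedProvider
         parameters:
             - seeds: "10.10.20.27,10.10.20.62,10.10.20.15"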


 What is the current output of 'nodetool compactionstats'? Could you also
 paste the output of nodetool status keyspace?

 Mark



 On Tue, Aug 5, 2014 at 3:59 PM, Ruchir Jha ruchir@gmail.com wrote:

 Sorry for the multiple updates, but another thing I found was all the
 other existing nodes have themselves in the seeds list, but the new node
 does not have itself in the seeds list. Can that cause this issue?


 On Tue, Aug 5, 2014 at 10:30 AM, Ruchir Jha ruchir@gmail.com wrote:

 Just ran this on the new node:

 nodetool netstats | grep Streaming from | wc -l
 10

 Seems like the new node is receiving data from 10 other nodes. Is that
 expected in a vnodes enabled environment?

 Ruchir.



 On Tue, Aug 5, 2014 at 10:21 AM, Ruchir Jha ruchir@gmail.com
 wrote:

 Also not sure if this is relevant but just noticed the nodetool tpstats
 output:

 Pool NameActive   Pending  Completed   Blocked
  All time blocked
 FlushWriter   0 0   1136 0
   512

 Looks like about 50% of flushes are blocked.


 On Tue, Aug 5, 2014 at 10:14 AM, Ruchir Jha ruchir@gmail.com
 wrote:

 Yes num_tokens is set to 256. initial_token is blank on all nodes
 including the new one.


 On Tue, Aug 5, 2014 at 10:03 AM, Mark Reddy mark.re...@boxever.com
 wrote:

 My understanding was that if initial_token is left empty on the new
 node, it just contacts the heaviest node and bisects its token range.


 If you are using vnodes and you have num_tokens set to 256 the new
 node will take token ranges dynamically. 

RE: vnode and NetworkTopologyStrategy: not playing well together ?

2014-08-05 Thread DE VITO Dominique
 Jonathan wrote:

 Yes, if you have only 1 machine in a rack then your cluster will be 
 imbalanced.  You're going to be able to dream up all sorts of weird failure 
 cases when you choose a scenario like RF=2 & totally imbalanced network arch.
 
 Vnodes attempt to solve the problem of imbalanced rings by choosing so many 
 tokens that it's improbable that the ring will be imbalanced.

Storage/load distro = function(1st replica placement, other replica placement)

vnodes solve the balancing problem for 1st replica placement // so, yes, I agree
with you, but for 1st replica placement only

But NetworkTopologyStrategy (NTS) influences the placement of the other (2+)
replicas: as NTS's best behavior relies on the token distribution, and you have
no control over tokens with vnodes, the best option I see with **vnodes** is to
use only one rack with NTS.

Dominique


-----Original Message-----
From: jonathan.had...@gmail.com [mailto:jonathan.had...@gmail.com] On behalf of Jonathan Haddad
Sent: Tuesday, 5 August 2014 18:04
To: user@cassandra.apache.org
Subject: Re: vnode and NetworkTopologyStrategy: not playing well together ?

Yes, if you have only 1 machine in a rack then your cluster will be imbalanced. 
 You're going to be able to dream up all sorts of weird failure cases when you 
choose a scenario like RF=2 & totally imbalanced network arch.

Vnodes attempt to solve the problem of imbalanced rings by choosing so many 
tokens that it's improbable that the ring will be imbalanced.



On Tue, Aug 5, 2014 at 8:57 AM, DE VITO Dominique 
dominique.dev...@thalesgroup.com wrote:
 First, thanks for your answer.

 This is incorrect.  Network Topology w/ Vnodes will be fine, assuming you've 
 got RF= # of racks.

 IMHO, it's not a good enough condition.
 Let's use an example with RF=2

 N1/rack_1   N2/rack_1   N3/rack_1   N4/rack_2

 Here, you have RF= # of racks
 And due to NetworkTopologyStrategy, N4 will store *all* the cluster data, 
 leading to a completely imbalanced cluster.

 IMHO, it happens when using nodes *or* vnodes.

 As well-balanced clusters with NetworkTopologyStrategy rely on carefully 
 chosen token distribution/path along the ring *and* as tokens are 
 randomly-generated with vnodes, my guess is that with vnodes and 
 NetworkTopologyStrategy, it's better to define a single (logical) rack // due 
 to carefully chosen tokens vs randomly-generated token clash.

 I don't see other options left.
 Do you see other ones ?

 Regards,
 Dominique




 -----Original Message-----
 From: jonathan.had...@gmail.com [mailto:jonathan.had...@gmail.com] On behalf of Jonathan Haddad
 Sent: Tuesday, 5 August 2014 17:43
 To: user@cassandra.apache.org
 Subject: Re: vnode and NetworkTopologyStrategy: not playing well together ?

 This is incorrect.  Network Topology w/ Vnodes will be fine, assuming you've 
 got RF= # of racks.  For each token, replicas are chosen based on the 
 strategy.  Essentially, you could have a wild imbalance in token ownership, 
 but it wouldn't matter because the replicas would be distributed across the 
 rest of the machines.

 http://www.datastax.com/documentation/cassandra/2.0/cassandra/architec
 ture/architectureDataDistributeReplication_c.html

 On Tue, Aug 5, 2014 at 8:19 AM, DE VITO Dominique 
 dominique.dev...@thalesgroup.com wrote:
 Hi,



 My understanding is that NetworkTopologyStrategy does NOT play well 
 with vnodes, due to:

 · Vnode = tokens are (usually) randomly generated (AFAIK)

 · NetworkTopologyStrategy = requires carefully chosen tokens for
 all nodes in order not to get a VERY unbalanced ring like in
 https://issues.apache.org/jira/browse/CASSANDRA-3810



 When playing with vnodes, is the recommendation to define one rack 
 for the entire cluster ?



 Thanks.



 Regards,

 Dominique







 --
 Jon Haddad
 http://www.rustyrazorblade.com
 skype: rustyrazorblade



--
Jon Haddad
http://www.rustyrazorblade.com
skype: rustyrazorblade


Re: Node bootstrap

2014-08-05 Thread Ruchir Jha
Also Mark to your comment on my tpstats output, below is my iostat output,
and the iowait is at 4.59%, which means no IO pressure, but we are still
seeing the bad flush performance. Should we try increasing the flush
writers?


Linux 2.6.32-358.el6.x86_64 (ny4lpcas13.fusionts.corp)  08/05/2014
 _x86_64_(24 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
  5.80   10.250.654.590.00   78.72

Device:tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda 103.83  9630.62 11982.60 3231174328 4020290310
dm-0 13.57   160.1781.12   53739546   27217432
dm-1  7.5916.9443.775682200   14686784
dm-2   5792.76 32242.66 45427.12 10817753530 15241278360
sdb 206.09 22789.19 33569.27 7646015080 11262843224



On Tue, Aug 5, 2014 at 12:13 PM, Ruchir Jha ruchir@gmail.com wrote:

 nodetool status:

 Datacenter: datacenter1
 ===
 Status=Up/Down
 |/ State=Normal/Leaving/Joining/Moving
 --  Address  Load   Tokens  Owns (effective)  Host ID
   Rack
 UN  10.10.20.27  1.89 TB256 25.4%
 76023cdd-c42d-4068-8b53-ae94584b8b04  rack1
 UN  10.10.20.62  1.83 TB256 25.5%
 84b47313-da75-4519-94f3-3951d554a3e5  rack1
 UN  10.10.20.47  1.87 TB256 24.7%
 bcd51a92-3150-41ae-9c51-104ea154f6fa  rack1
 UN  10.10.20.45  1.7 TB 256 22.6%
 8d6bce33-8179-4660-8443-2cf822074ca4  rack1
 UN  10.10.20.15  1.86 TB256 24.5%
 01a01f07-4df2-4c87-98e9-8dd38b3e4aee  rack1
 UN  10.10.20.31  1.87 TB256 24.9%
 1435acf9-c64d-4bcd-b6a4-abcec209815e  rack1
 UN  10.10.20.35  1.86 TB256 25.8%
 17cb8772-2444-46ff-8525-33746514727d  rack1
 UN  10.10.20.51  1.89 TB256 25.0%
 0343cd58-3686-465f-8280-56fb72d161e2  rack1
 UN  10.10.20.19  1.91 TB256 25.5%
 30ddf003-4d59-4a3e-85fa-e94e4adba1cb  rack1
 UN  10.10.20.39  1.93 TB256 26.0%
 b7d44c26-4d75-4d36-a779-b7e7bdaecbc9  rack1
 UN  10.10.20.52  1.81 TB256 25.4%
 6b5aca07-1b14-4bc2-a7ba-96f026fa0e4e  rack1
 UN  10.10.20.22  1.89 TB256 24.8%
 46af9664-8975-4c91-847f-3f7b8f8d5ce2  rack1


 Note: The new node is not part of the above list.

 nodetool compactionstats:

 pending tasks: 1649
   compaction typekeyspace   column family   completed
   total  unit  progress
Compaction   iprod   customerorder  1682804084
 17956558077 bytes 9.37%
Compactionprodgatecustomerorder  1664239271
  1693502275 bytes98.27%
Compaction  qa_config_bkupfixsessionconfig_hist
  2443   27253 bytes 8.96%
Compactionprodgatecustomerorder_hist
  1770577280  5026699390 bytes35.22%
Compaction   iprodgatecustomerorder_hist
  2959560205312350192622 bytes 0.95%




 On Tue, Aug 5, 2014 at 11:37 AM, Mark Reddy mark.re...@boxever.com
 wrote:

 Yes num_tokens is set to 256. initial_token is blank on all nodes
 including the new one.


 Ok so you have num_tokens set to 256 for all nodes with initial_token
 commented out, this means you are using vnodes and the new node will
 automatically grab a list of tokens to take over responsibility for.

 Pool NameActive   Pending  Completed   Blocked
  All time blocked
 FlushWriter   0 0   1136 0
 512

 Looks like about 50% of flushes are blocked.


 This is a problem as it indicates that the IO system cannot keep up.

 Just ran this on the new node:
 nodetool netstats | grep Streaming from | wc -l
 10


 This is normal as the new node will most likely take tokens from all
 nodes in the cluster.

 Sorry for the multiple updates, but another thing I found was all the
 other existing nodes have themselves in the seeds list, but the new node
 does not have itself in the seeds list. Can that cause this issue?


 Seeds are only used when a new node is bootstrapping into the cluster and
 needs a set of ips to contact and discover the cluster, so this would have
 no impact on data sizes or streaming. In general it would be considered
 best practice to have a set of 2-3 seeds from each data center, with all
 nodes having the same seed list.


 What is the current output of 'nodetool compactionstats'? Could you also
 paste the output of nodetool status keyspace?

 Mark



 On Tue, Aug 5, 2014 at 3:59 PM, Ruchir Jha ruchir@gmail.com wrote:

 Sorry for the multiple updates, but another thing I found was all the
 other existing nodes have themselves in the seeds list, but the new node
 does not have itself in the seeds list. Can that cause this issue?


 On Tue, Aug 5, 2014 at 10:30 AM, Ruchir Jha ruchir@gmail.com
 wrote:

 Just ran this on the new node:

 nodetool netstats | grep Streaming from | wc -l
 10

 Seems like 

Read timeouts with ALLOW FILTERING turned on

2014-08-05 Thread Clint Kelly
Hi all,

Allow me to rephrase a question I asked last week.  I am performing some
queries with ALLOW FILTERING and getting consistent read timeouts like the
following:



com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra
timeout during read query at consistency ONE (1 responses were
required but only 0 replica responded)


These errors occur only during multi-row scans, and only during integration
tests on our build server.

I tried to see if I could replicate this error by reducing
read_request_timeout_in_ms when I run Cassandra on my local machine
(where I have not seen this error), but that is not working.  Are there any
other parameters that I need to adjust?  I'd feel better if I could at
least replicate this failure by reducing the read_request_timeout_in_ms
(since doing so would mean I actually understand what is going wrong...).

Best regards,
Clint


Re: Read timeouts with ALLOW FILTERING turned on

2014-08-05 Thread Robert Coli
On Tue, Aug 5, 2014 at 10:01 AM, Clint Kelly clint.ke...@gmail.com wrote:

 Allow me to rephrase a question I asked last week.  I am performing some
 queries with ALLOW FILTERING and getting consistent read timeouts like the
 following:


ALLOW FILTERING should be renamed PROBABLY TIMEOUT in order to properly
describe its typical performance.

As a general statement, if you have to ALLOW FILTERING, you are probably
Doing It Wrong in terms of schema design.

A correctly operated cluster is unlikely to need to increase the default
timeouts. If you find yourself needing to do so, you are, again, probably
Doing It Wrong.

=Rob


Re: Node bootstrap

2014-08-05 Thread Mark Reddy
Hi Ruchir,

The large number of blocked flushes and the number of pending
compactions would still indicate IO contention. Can you post the output of
'iostat -x 5 5'

If you do in fact have spare IO, there are several configuration options
you can tune such as increasing the number of flush writers and
compaction_throughput_mb_per_sec
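
For reference, the relevant cassandra.yaml fragment looks like this (the values
here are only an illustration, not a recommendation; note that setting
compaction_throughput_mb_per_sec to 0 disables compaction throttling entirely):

  memtable_flush_writers: 4
  compaction_throughput_mb_per_sec: 32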

Mark


On Tue, Aug 5, 2014 at 5:22 PM, Ruchir Jha ruchir@gmail.com wrote:

 Also Mark to your comment on my tpstats output, below is my iostat output,
 and the iowait is at 4.59%, which means no IO pressure, but we are still
 seeing the bad flush performance. Should we try increasing the flush
 writers?


 Linux 2.6.32-358.el6.x86_64 (ny4lpcas13.fusionts.corp)  08/05/2014
  _x86_64_(24 CPU)

 avg-cpu:  %user   %nice %system %iowait  %steal   %idle
   5.80   10.250.654.590.00   78.72

 Device:tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
 sda 103.83  9630.62 11982.60 3231174328 4020290310
 dm-0 13.57   160.1781.12   53739546   27217432
 dm-1  7.5916.9443.775682200   14686784
 dm-2   5792.76 32242.66 45427.12 10817753530 15241278360
 sdb 206.09 22789.19 33569.27 7646015080 11262843224



 On Tue, Aug 5, 2014 at 12:13 PM, Ruchir Jha ruchir@gmail.com wrote:

 nodetool status:

 Datacenter: datacenter1
 ===
 Status=Up/Down
 |/ State=Normal/Leaving/Joining/Moving
 --  Address  Load   Tokens  Owns (effective)  Host ID
   Rack
 UN  10.10.20.27  1.89 TB256 25.4%
 76023cdd-c42d-4068-8b53-ae94584b8b04  rack1
 UN  10.10.20.62  1.83 TB256 25.5%
 84b47313-da75-4519-94f3-3951d554a3e5  rack1
 UN  10.10.20.47  1.87 TB256 24.7%
 bcd51a92-3150-41ae-9c51-104ea154f6fa  rack1
 UN  10.10.20.45  1.7 TB 256 22.6%
 8d6bce33-8179-4660-8443-2cf822074ca4  rack1
 UN  10.10.20.15  1.86 TB256 24.5%
 01a01f07-4df2-4c87-98e9-8dd38b3e4aee  rack1
 UN  10.10.20.31  1.87 TB256 24.9%
 1435acf9-c64d-4bcd-b6a4-abcec209815e  rack1
 UN  10.10.20.35  1.86 TB256 25.8%
 17cb8772-2444-46ff-8525-33746514727d  rack1
 UN  10.10.20.51  1.89 TB256 25.0%
 0343cd58-3686-465f-8280-56fb72d161e2  rack1
 UN  10.10.20.19  1.91 TB256 25.5%
 30ddf003-4d59-4a3e-85fa-e94e4adba1cb  rack1
 UN  10.10.20.39  1.93 TB256 26.0%
 b7d44c26-4d75-4d36-a779-b7e7bdaecbc9  rack1
 UN  10.10.20.52  1.81 TB256 25.4%
 6b5aca07-1b14-4bc2-a7ba-96f026fa0e4e  rack1
 UN  10.10.20.22  1.89 TB256 24.8%
 46af9664-8975-4c91-847f-3f7b8f8d5ce2  rack1


 Note: The new node is not part of the above list.

 nodetool compactionstats:

 pending tasks: 1649
   compaction typekeyspace   column family   completed
   total  unit  progress
Compaction   iprod   customerorder  1682804084
 17956558077 bytes 9.37%
Compactionprodgatecustomerorder
  1664239271  1693502275 bytes98.27%
Compaction  qa_config_bkupfixsessionconfig_hist
  2443   27253 bytes 8.96%
Compactionprodgatecustomerorder_hist
  1770577280  5026699390 bytes35.22%
Compaction   iprodgatecustomerorder_hist
  2959560205312350192622 bytes 0.95%




 On Tue, Aug 5, 2014 at 11:37 AM, Mark Reddy mark.re...@boxever.com
 wrote:

 Yes num_tokens is set to 256. initial_token is blank on all nodes
 including the new one.


 Ok so you have num_tokens set to 256 for all nodes with initial_token
 commented out, this means you are using vnodes and the new node will
 automatically grab a list of tokens to take over responsibility for.

 Pool NameActive   Pending  Completed   Blocked
  All time blocked
 FlushWriter   0 0   1136 0
   512

 Looks like about 50% of flushes are blocked.


 This is a problem as it indicates that the IO system cannot keep up.

 Just ran this on the new node:
 nodetool netstats | grep Streaming from | wc -l
 10


 This is normal as the new node will most likely take tokens from all
 nodes in the cluster.

 Sorry for the multiple updates, but another thing I found was all the
 other existing nodes have themselves in the seeds list, but the new node
 does not have itself in the seeds list. Can that cause this issue?


 Seeds are only used when a new node is bootstrapping into the cluster
 and needs a set of ips to contact and discover the cluster, so this would
 have no impact on data sizes or streaming. In general it would be
 considered best practice to have a set of 2-3 seeds from each data center,
 with all nodes having the same seed list.


 What is the current output of 'nodetool compactionstats'? Could you also
 paste the output of nodetool status keyspace?

 Mark



 On Tue, Aug 5, 2014 at 

Re: Fail to reconnect to other nodes after intermittent network failure

2014-08-05 Thread Robert Coli
On Tue, Aug 5, 2014 at 5:48 AM, Jiri Horky ho...@avast.com wrote:

 What puzzles me is the fact that the authentication apparently started
 to work after the network recovered but the exchange of data did not.

 I would like to understand what could caused the problems and how to
 avoid  them in the future.


Very few people use SSL and very few people use auth, you have probably hit
an edge case.

I would file a JIRA with the details you described above.

=Rob


Re: Read timeouts with ALLOW FILTERING turned on

2014-08-05 Thread Sávio S . Teles de Oliveira
How much did you reduce *read_request_timeout_in_ms* on your local machine?
The Cassandra read timeout is higher against a cluster than against a single
machine because the coordinator must run the read operation on more servers
(so there is network traffic involved).


2014-08-05 14:54 GMT-03:00 Robert Coli rc...@eventbrite.com:

 On Tue, Aug 5, 2014 at 10:01 AM, Clint Kelly clint.ke...@gmail.com
 wrote:

 Allow me to rephrase a question I asked last week.  I am performing some
 queries with ALLOW FILTERING and getting consistent read timeouts like the
 following:


 ALLOW FILTERING should be renamed PROBABLY TIMEOUT in order to properly
 describe its typical performance.

 As a general statement, if you have to ALLOW FILTERING, you are probably
 Doing It Wrong in terms of schema design.

 A correctly operated cluster is unlikely to need to increase the default
 timeouts. If you find yourself needing to do so, you are, again, probably
 Doing It Wrong.

 =Rob




-- 
Atenciosamente,
Sávio S. Teles de Oliveira
voice: +55 62 9136 6996
http://br.linkedin.com/in/savioteles
Mestrando em Ciências da Computação - UFG
Arquiteto de Software
CUIA Internet Brasil


Re: Make an existing cluster multi data-center compatible.

2014-08-05 Thread Robert Coli
On Tue, Aug 5, 2014 at 3:52 AM, Rene Kochen rene.koc...@schange.com wrote:

 Do I have to run full repairs after this change? Because the yaml file
 states: IF YOU CHANGE THE SNITCH AFTER DATA IS INSERTED INTO THE CLUSTER,
 YOU MUST RUN A FULL REPAIR, SINCE THE SNITCH AFFECTS WHERE REPLICAS ARE
 PLACED.


As long as you correctly configure the new snitch so that the replica sets
do not change, no, you do not need to repair.

Barring that, if you manage to transform the replica set in such a way that
you always have one (fully repaired) replica from the old set, repair will
help. I do not recommend this very risky practice. In practice the only
transformation of snitch in a cluster with data which is likely to be safe
is one whose result is a NOOP in terms of replica placement.
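
For instance, a placement-preserving switch from SimpleSnitch to
PropertyFileSnitch could use a cassandra-topology.properties along these lines
(addresses illustrative), keeping every node in one data center and rack:

  10.10.1.1=DC1:RAC1
  10.10.1.2=DC1:RAC1
  10.10.1.3=DC1:RAC1
  default=DC1:RAC1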

In fact, the yaml file is stating something unreasonable there, because
repair cannot protect against this case :

- 6 node cluster, A B C D E F,  RF = 2

1) Start with SimpleSnitch so that A, B have the two replicas of row key X.
2) Write row key X, value Y, to nodes A and B.
3) Change to OtherSnitch so that now C,D are responsible for row key X.
4) Repair and notice that neither C nor D answer Y when asked for row X.

=Rob


RE: vnode and NetworkTopologyStrategy: not playing well together ?

2014-08-05 Thread DuyHai Doan
The discussion about racks & NTS is also mentioned in this recent article:
planetcassandra.org/multi-data-center-replication-in-nosql-databases/

The last section may be of interest for you
On 5 August 2014 at 18:14, DE VITO Dominique dominique.dev...@thalesgroup.com
wrote:

  Jonathan wrote:
 
  Yes, if you have only 1 machine in a rack then your cluster will be
 imbalanced.  You're going to be able to dream up all sorts of weird failure
 cases when you choose a scenario like RF=2 & totally imbalanced network
 arch.
 
  Vnodes attempt to solve the problem of imbalanced rings by choosing so
 many tokens that it's improbable that the ring will be imbalanced.

 Storage/load distro = function(1st replica placement, other replica
 placement)

 vnode solves the balancing pb for 1st replica placement // so, yes, I
 agree with you, but for 1st replica placement only

 But NetworkTopologyStrategy (NTS) influences other (2+) replica placement
 = as NTS best behavior relies on token distro, and you have no control on
 tokens with vnodes, the best option I see with **vnode** is to use only one
 rack with NTS.

 Dominique


 -----Original Message-----
 From: jonathan.had...@gmail.com [mailto:jonathan.had...@gmail.com] On behalf of Jonathan Haddad
 Sent: Tuesday, 5 August 2014 18:04
 To: user@cassandra.apache.org
 Subject: Re: vnode and NetworkTopologyStrategy: not playing well together ?

 Yes, if you have only 1 machine in a rack then your cluster will be
 imbalanced.  You're going to be able to dream up all sorts of weird failure
 cases when you choose a scenario like RF=2 & totally imbalanced network
 arch.

 Vnodes attempt to solve the problem of imbalanced rings by choosing so
 many tokens that it's improbable that the ring will be imbalanced.



 On Tue, Aug 5, 2014 at 8:57 AM, DE VITO Dominique 
 dominique.dev...@thalesgroup.com wrote:
  First, thanks for your answer.
 
  This is incorrect.  Network Topology w/ Vnodes will be fine, assuming
 you've got RF= # of racks.
 
  IMHO, it's not a good enough condition.
  Let's use an example with RF=2
 
  N1/rack_1   N2/rack_1   N3/rack_1   N4/rack_2
 
  Here, you have RF= # of racks
  And due to NetworkTopologyStrategy, N4 will store *all* the cluster
 data, leading to a completely imbalanced cluster.
 
  IMHO, it happens when using nodes *or* vnodes.
 
  As well-balanced clusters with NetworkTopologyStrategy rely on carefully
 chosen token distribution/path along the ring *and* as tokens are
 randomly-generated with vnodes, my guess is that with vnodes and
 NetworkTopologyStrategy, it's better to define a single (logical) rack //
 due to carefully chosen tokens vs randomly-generated token clash.
 
  I don't see other options left.
  Do you see other ones ?
 
  Regards,
  Dominique
 
 
 
 
  -----Original Message-----
  From: jonathan.had...@gmail.com [mailto:jonathan.had...@gmail.com] On behalf of Jonathan Haddad
  Sent: Tuesday, 5 August 2014 17:43
  To: user@cassandra.apache.org
  Subject: Re: vnode and NetworkTopologyStrategy: not playing well together ?
 
  This is incorrect.  Network Topology w/ Vnodes will be fine, assuming
 you've got RF= # of racks.  For each token, replicas are chosen based on
 the strategy.  Essentially, you could have a wild imbalance in token
 ownership, but it wouldn't matter because the replicas would be distributed
 across the rest of the machines.
 
  http://www.datastax.com/documentation/cassandra/2.0/cassandra/architec
  ture/architectureDataDistributeReplication_c.html
 
  On Tue, Aug 5, 2014 at 8:19 AM, DE VITO Dominique 
 dominique.dev...@thalesgroup.com wrote:
  Hi,
 
 
 
  My understanding is that NetworkTopologyStrategy does NOT play well
  with vnodes, due to:
 
  · Vnode = tokens are (usually) randomly generated (AFAIK)
 
  · NetworkTopologyStrategy = requires carefully chosen tokens for
  all nodes in order not to get a VERY unbalanced ring like in
  https://issues.apache.org/jira/browse/CASSANDRA-3810
 
 
 
  When playing with vnodes, is the recommendation to define one rack
  for the entire cluster ?
 
 
 
  Thanks.
 
 
 
  Regards,
 
  Dominique
 
 
 
 
 
 
 
  --
  Jon Haddad
  http://www.rustyrazorblade.com
  skype: rustyrazorblade



 --
 Jon Haddad
 http://www.rustyrazorblade.com
 skype: rustyrazorblade



Re: Issue with ALLOW FILTERING

2014-08-05 Thread Sávio S . Teles de Oliveira
You need to create an index on attribute *c*.
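
Something along these lines (table and column names here are placeholders for
the ones in the gist):

  CREATE INDEX ON mytable (c);

With the secondary index in place, an equality restriction on c can be served
from the index rather than relying on ALLOW FILTERING for that column alone.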


2014-08-05 9:24 GMT-03:00 Jens Rantil jens.ran...@tink.se:

 Hi,

 I'm having an issue with ALLOW FILTERING with Cassandra 2.0.8. See a
 minimal example here:
 https://gist.github.com/JensRantil/ec43622c26acb56e5bc9

 I expect the second last to fail, but the last query to return a single
 row. In particular I expect the last SELECT to first select using the
 clustering primary id and then do filtering.

 I've been reading
 https://cassandra.apache.org/doc/cql3/CQL.html#selectStmt ALLOW
 FILTERING and can't wrap my head around why this won't work.

 Could anyone clarify this for me?

 Thanks,
 Jens




-- 
Atenciosamente,
Sávio S. Teles de Oliveira
voice: +55 62 9136 6996
http://br.linkedin.com/in/savioteles
Mestrando em Ciências da Computação - UFG
Arquiteto de Software
CUIA Internet Brasil


Re: moving older tables from SSD to HDD?

2014-08-05 Thread Sávio S . Teles de Oliveira
Have you looked at nodetool?
http://www.datastax.com/documentation/cassandra/2.0/cassandra/tools/toolsNodetool_r.html


2014-08-04 16:43 GMT-03:00 Kevin Burton bur...@spinn3r.com:

 Is it possible to take older tables, which are immutable, and move them
 from SSD to HDD?

 We lower the SLA on older data so keeping it on HDD is totally fine.

 MySQL can *sort* of do this… and I think that Cassandra could if it was
 handled properly.

 Kevin

 --

 Founder/CEO Spinn3r.com
 Location: *San Francisco, CA*
 blog: http://burtonator.wordpress.com
 … or check out my Google+ profile
 https://plus.google.com/102718274791889610666/posts
 http://spinn3r.com




-- 
Atenciosamente,
Sávio S. Teles de Oliveira
voice: +55 62 9136 6996
http://br.linkedin.com/in/savioteles
Mestrando em Ciências da Computação - UFG
Arquiteto de Software
CUIA Internet Brasil


Re: Node stuck during nodetool rebuild

2014-08-05 Thread Robert Coli
On Tue, Aug 5, 2014 at 1:28 AM, Vasileios Vlachos 
vasileiosvlac...@gmail.com wrote:

 The problem is that the nodetool seems to be stuck, and nodetool netstats
 on node1 of DC2 appears to be stuck at 10% streaming a 5G file from node2
 at DC1. This doesn't tally with nodetool netstats when running it against
 either of the DC1 nodes. The DC1 nodes don't think they stream anything to
 DC2.


Yes, streaming is fragile and breaks and hangs forever and your only option
in most cases is to stop the rebuilding node, nuke its data, and start
again.
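
In concrete terms that is roughly the following (package-install paths assumed;
adjust to your layout):

  sudo service cassandra stop
  sudo rm -rf /var/lib/cassandra/data/* /var/lib/cassandra/commitlog/* /var/lib/cassandra/saved_caches/*
  sudo service cassandra start
  nodetool rebuild DC1      (once the node is back up in the ring)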

I believe you might be able to tune the phi detector threshold to help this
operation complete, hopefully someone with direct experience of same will
chime in.

=Rob


Re: moving older tables from SSD to HDD?

2014-08-05 Thread Benedict Elliott Smith
Hi Kevin,

This is something we do plan to support, but don't right now. You can see
the discussion around this and related issues here
https://issues.apache.org/jira/browse/CASSANDRA-5863 (although it may
seem unrelated at first glance).




On Mon, Aug 4, 2014 at 8:43 PM, Kevin Burton bur...@spinn3r.com wrote:

 Is it possible to take older tables, which are immutable, and move them
 from SSD to HDD?

 We lower the SLA on older data so keeping it on HDD is totally fine.

 MySQL can *sort* of do this… and I think that Cassandra could if it was
 handled properly.

 Kevin

 --

 Founder/CEO Spinn3r.com
 Location: *San Francisco, CA*
 blog: http://burtonator.wordpress.com
 … or check out my Google+ profile
 https://plus.google.com/102718274791889610666/posts
 http://spinn3r.com




Re: Read timeouts with ALLOW FILTERING turned on

2014-08-05 Thread Clint Kelly
Hi Rob,

Thanks for your feedback.  I understand that use of ALLOW FILTERING is
not a best practice.  In this case, however, I am building a tool on
top of Cassandra that allows users to sometimes do things that are
less than optimal.  When they try to do expensive queries like this,
I'd rather provide a higher limit before timing out, but I can't seem
to change the behavior of Cassandra by tweaking any of the parameters
in the cassandra.yaml file or in the DataStax Java driver's Cluster
object.

FWIW these queries are also in batch jobs where we can tolerate the
extra latency.
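
One client-side detail that is easy to miss, shown as a sketch with the 2.x
DataStax Java driver (the contact point and the 120s value are illustrative):
the driver keeps its own per-request socket read timeout, which also has to be
raised or the client gives up before the server does.

  import com.datastax.driver.core.Cluster;
  import com.datastax.driver.core.SocketOptions;

  public class TimeoutExample {
      public static void main(String[] args) {
          // Raise the driver-side socket read timeout so that long scans are not
          // abandoned by the client before the server-side timeout fires.
          Cluster cluster = Cluster.builder()
                  .addContactPoint("127.0.0.1")
                  .withSocketOptions(new SocketOptions().setReadTimeoutMillis(120000))
                  .build();
          System.out.println(cluster.getClusterName());
          cluster.close();
      }
  }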

Thanks for your help!

Best regards,
Clint


On Tue, Aug 5, 2014 at 10:54 AM, Robert Coli rc...@eventbrite.com wrote:
 On Tue, Aug 5, 2014 at 10:01 AM, Clint Kelly clint.ke...@gmail.com wrote:

 Allow me to rephrase a question I asked last week.  I am performing some
 queries with ALLOW FILTERING and getting consistent read timeouts like the
 following:


 ALLOW FILTERING should be renamed PROBABLY TIMEOUT in order to properly
 describe its typical performance.

 As a general statement, if you have to ALLOW FILTERING, you are probably
 Doing It Wrong in terms of schema design.

 A correctly operated cluster is unlikely to need to increase the default
 timeouts. If you find yourself needing to do so, you are, again, probably
 Doing It Wrong.

 =Rob


Re: Node stuck during nodetool rebuild

2014-08-05 Thread Mark Reddy
Hi Vasilis,

To further on what Rob said

I believe you might be able to tune the phi detector threshold to help this
 operation complete, hopefully someone with direct experience of same will
 chime in.


I have been through this operation where streams break due to a node
falsely being marked down (flapping). In an attempt to mitigate this I
increased the phi_convict_threshold in cassandra.yaml from 8 to 10, after
which the rebuild was able to complete successfully. The default value for
phi_convict_threshold is 8 with 12 being the maximum recommended value.
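
i.e. in cassandra.yaml on the nodes involved in the streaming:

  # default is 8; 12 is the maximum recommended value
  phi_convict_threshold: 10

followed by a restart of those nodes so the new value takes effect.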


Mark


On Tue, Aug 5, 2014 at 7:22 PM, Robert Coli rc...@eventbrite.com wrote:

 On Tue, Aug 5, 2014 at 1:28 AM, Vasileios Vlachos 
 vasileiosvlac...@gmail.com wrote:

 The problem is that the nodetool seems to be stuck, and nodetool netstats
 on node1 of DC2 appears to be stuck at 10% streaming a 5G file from node2
 at DC1. This doesn't tally with nodetool netstats when running it against
 either of the DC1 nodes. The DC1 nodes don't think they stream anything to
 DC2.


 Yes, streaming is fragile and breaks and hangs forever and your only
 option in most cases is to stop the rebuilding node, nuke its data, and
 start again.

 I believe you might be able to tune the phi detector threshold to help
 this operation complete, hopefully someone with direct experience of same
 will chime in.

 =Rob




Re: Read timeouts with ALLOW FILTERING turned on

2014-08-05 Thread Clint Kelly
Ah FWIW I was able to reproduce the problem by reducing
range_request_timeout_in_ms.  This is great since I want to increase
the timeout for batch jobs where we scan a large set of rows, but
leave the timeout for single-row queries alone.
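
For reference, the two settings sit side by side in cassandra.yaml; to the best
of my recollection the 2.0 defaults are:

  read_request_timeout_in_ms: 5000
  range_request_timeout_in_ms: 10000

so raising only range_request_timeout_in_ms affects the multi-row scans without
touching single-row reads.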

Best regards,
Clint


On Tue, Aug 5, 2014 at 11:42 AM, Clint Kelly clint.ke...@gmail.com wrote:
 Hi Rob,

 Thanks for your feedback.  I understand that use of ALLOW FILTERING is
 not a best practice.  In this case, however, I am building a tool on
 top of Cassandra that allows users to sometimes do things that are
 less than optimal.  When they try to do expensive queries like this,
 I'd rather provide a higher limit before timing out, but I can't seem
 to change the behavior of Cassandra by tweaking any of the parameters
 in the cassandra.yaml file or in the DataStax Java driver's Cluster
 object.

 FWIW these queries are also in batch jobs where we can tolerate the
 extra latency.

 Thanks for your help!

 Best regards,
 Clint


 On Tue, Aug 5, 2014 at 10:54 AM, Robert Coli rc...@eventbrite.com wrote:
 On Tue, Aug 5, 2014 at 10:01 AM, Clint Kelly clint.ke...@gmail.com wrote:

 Allow me to rephrase a question I asked last week.  I am performing some
 queries with ALLOW FILTERING and getting consistent read timeouts like the
 following:


 ALLOW FILTERING should be renamed PROBABLY TIMEOUT in order to properly
 describe its typical performance.

 As a general statement, if you have to ALLOW FILTERING, you are probably
 Doing It Wrong in terms of schema design.

 A correctly operated cluster is unlikely to need to increase the default
 timeouts. If you find yourself needing to do so, you are, again, probably
 Doing It Wrong.

 =Rob


Re: Node bootstrap

2014-08-05 Thread Ruchir Jha
Right now, we have 6 flush writers and compaction_throughput_mb_per_sec is
set to 0, which I believe disables throttling.

Also, Here is the iostat -x 5 5 output:


Device: rrqm/s   wrqm/s r/s w/s   rsec/s   wsec/s avgrq-sz
avgqu-sz   await  svctm  %util
sda  10.00  1450.35   50.79   55.92  9775.97 12030.14   204.34
1.56   14.62   1.05  11.21
dm-0  0.00 0.003.59   18.82   166.52   150.3514.14
0.44   19.49   0.54   1.22
dm-1  0.00 0.002.325.3718.5642.98 8.00
0.76   98.82   0.43   0.33
dm-2  0.00 0.00  162.17 5836.66 32714.46 47040.8713.30
5.570.90   0.06  36.00
sdb   0.40  4251.90  106.72  107.35 23123.61 35204.09   272.46
4.43   20.68   1.29  27.64

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
14.64   10.751.81   13.500.00   59.29

Device: rrqm/s   wrqm/s r/s w/s   rsec/s   wsec/s avgrq-sz
avgqu-sz   await  svctm  %util
sda  15.40  1344.60   68.80  145.60  4964.80 11790.4078.15
0.381.80   0.80  17.10
dm-0  0.00 0.00   43.00 1186.20  2292.80  9489.60 9.59
4.883.90   0.09  11.58
dm-1  0.00 0.001.600.0012.80 0.00 8.00
0.03   16.00   2.00   0.32
dm-2  0.00 0.00  197.20 17583.80 35152.00 140664.00
9.89  2847.50  109.52   0.05  93.50
sdb  13.20 16552.20  159.00  742.20 32745.60 129129.60   179.62
   72.88   66.01   1.04  93.42

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
  15.51   19.771.975.020.00   57.73

Device: rrqm/s   wrqm/s r/s w/s   rsec/s   wsec/s avgrq-sz
avgqu-sz   await  svctm  %util
sda  16.20   523.40   60.00  285.00  5220.80  5913.6032.27
0.250.72   0.60  20.86
dm-0  0.00 0.000.801.4032.0011.2019.64
0.013.18   1.55   0.34
dm-1  0.00 0.001.600.0012.80 0.00 8.00
0.03   21.00   2.62   0.42
dm-2  0.00 0.00  339.40 5886.80 66219.20 47092.8018.20
  251.66  184.72   0.10  63.48
sdb   1.00  5025.40  264.20  209.20 60992.00 50422.40   235.35
5.98   40.92   1.23  58.28

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
  16.59   16.342.039.010.00   56.04

Device: rrqm/s   wrqm/s r/s w/s   rsec/s   wsec/s avgrq-sz
avgqu-sz   await  svctm  %util
sda   5.40   320.00   37.40  159.80  2483.20  3529.6030.49
0.100.52   0.39   7.76
dm-0  0.00 0.000.203.60 1.6028.80 8.00
0.000.68   0.68   0.26
dm-1  0.00 0.000.000.00 0.00 0.00 0.00
0.000.00   0.00   0.00
dm-2  0.00 0.00  287.20 13108.20 53985.60 104864.00
 11.86   869.18   48.82   0.06  76.96
sdb   5.20 12163.40  238.20  532.00 51235.20 93753.60   188.25
   21.46   23.75   0.97  75.08



On Tue, Aug 5, 2014 at 1:55 PM, Mark Reddy mark.re...@boxever.com wrote:

 Hi Ruchir,

 The large number of blocked flushes and the number of pending
 compactions would still indicate IO contention. Can you post the output of
 'iostat -x 5 5'

 If you do in fact have spare IO, there are several configuration options
 you can tune such as increasing the number of flush writers and
 compaction_throughput_mb_per_sec

 Mark


 On Tue, Aug 5, 2014 at 5:22 PM, Ruchir Jha ruchir@gmail.com wrote:

 Also Mark to your comment on my tpstats output, below is my iostat
 output, and the iowait is at 4.59%, which means no IO pressure, but we are
 still seeing the bad flush performance. Should we try increasing the flush
 writers?


 Linux 2.6.32-358.el6.x86_64 (ny4lpcas13.fusionts.corp)  08/05/2014
  _x86_64_(24 CPU)

 avg-cpu:  %user   %nice %system %iowait  %steal   %idle
   5.80   10.250.654.590.00   78.72

 Device:tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
 sda 103.83  9630.62 11982.60 3231174328 4020290310
 dm-0 13.57   160.1781.12   53739546   27217432
 dm-1  7.5916.9443.775682200   14686784
 dm-2   5792.76 32242.66 45427.12 10817753530 15241278360
 sdb 206.09 22789.19 33569.27 7646015080 11262843224



 On Tue, Aug 5, 2014 at 12:13 PM, Ruchir Jha ruchir@gmail.com wrote:

 nodetool status:

 Datacenter: datacenter1
 ===
 Status=Up/Down
 |/ State=Normal/Leaving/Joining/Moving
 --  Address  Load   Tokens  Owns (effective)  Host ID
 Rack
 UN  10.10.20.27  1.89 TB256 25.4%
 76023cdd-c42d-4068-8b53-ae94584b8b04  rack1
 UN  10.10.20.62  1.83 TB256 25.5%
 84b47313-da75-4519-94f3-3951d554a3e5  rack1
 UN  10.10.20.47  1.87 TB256 24.7%
 

Re: Node bootstrap

2014-08-05 Thread Ruchir Jha
Also, right now the top command shows that we are at 500-700% CPU, and we
have 23 total processors, which means we have a lot of idle CPU left over,
so throwing more threads at compaction and flush should alleviate the
problem?


On Tue, Aug 5, 2014 at 2:57 PM, Ruchir Jha ruchir@gmail.com wrote:


 Right now, we have 6 flush writers and compaction_throughput_mb_per_sec is
 set to 0, which I believe disables throttling.

 Also, Here is the iostat -x 5 5 output:


 Device: rrqm/s   wrqm/s r/s w/s   rsec/s   wsec/s avgrq-sz
 avgqu-sz   await  svctm  %util
 sda  10.00  1450.35   50.79   55.92  9775.97 12030.14   204.34
 1.56   14.62   1.05  11.21
 dm-0  0.00 0.003.59   18.82   166.52   150.3514.14
 0.44   19.49   0.54   1.22
 dm-1  0.00 0.002.325.3718.5642.98 8.00
 0.76   98.82   0.43   0.33
 dm-2  0.00 0.00  162.17 5836.66 32714.46 47040.8713.30
 5.570.90   0.06  36.00
 sdb   0.40  4251.90  106.72  107.35 23123.61 35204.09   272.46
 4.43   20.68   1.29  27.64

 avg-cpu:  %user   %nice %system %iowait  %steal   %idle
 14.64   10.751.81   13.500.00   59.29

 Device: rrqm/s   wrqm/s r/s w/s   rsec/s   wsec/s avgrq-sz
 avgqu-sz   await  svctm  %util
 sda  15.40  1344.60   68.80  145.60  4964.80 11790.4078.15
 0.381.80   0.80  17.10
 dm-0  0.00 0.00   43.00 1186.20  2292.80  9489.60 9.59
 4.883.90   0.09  11.58
 dm-1  0.00 0.001.600.0012.80 0.00 8.00
 0.03   16.00   2.00   0.32
 dm-2  0.00 0.00  197.20 17583.80 35152.00 140664.00
 9.89  2847.50  109.52   0.05  93.50
 sdb  13.20 16552.20  159.00  742.20 32745.60 129129.60
 179.6272.88   66.01   1.04  93.42

 avg-cpu:  %user   %nice %system %iowait  %steal   %idle
   15.51   19.771.975.020.00   57.73

 Device: rrqm/s   wrqm/s r/s w/s   rsec/s   wsec/s avgrq-sz
 avgqu-sz   await  svctm  %util
 sda  16.20   523.40   60.00  285.00  5220.80  5913.6032.27
 0.250.72   0.60  20.86
 dm-0  0.00 0.000.801.4032.0011.2019.64
 0.013.18   1.55   0.34
 dm-1  0.00 0.001.600.0012.80 0.00 8.00
 0.03   21.00   2.62   0.42
 dm-2  0.00 0.00  339.40 5886.80 66219.20 47092.8018.20
   251.66  184.72   0.10  63.48
 sdb   1.00  5025.40  264.20  209.20 60992.00 50422.40   235.35
 5.98   40.92   1.23  58.28

 avg-cpu:  %user   %nice %system %iowait  %steal   %idle
   16.59   16.342.039.010.00   56.04

 Device: rrqm/s   wrqm/s r/s w/s   rsec/s   wsec/s avgrq-sz
 avgqu-sz   await  svctm  %util
 sda   5.40   320.00   37.40  159.80  2483.20  3529.6030.49
 0.100.52   0.39   7.76
 dm-0  0.00 0.000.203.60 1.6028.80 8.00
 0.000.68   0.68   0.26
 dm-1  0.00 0.000.000.00 0.00 0.00 0.00
 0.000.00   0.00   0.00
 dm-2  0.00 0.00  287.20 13108.20 53985.60 104864.00
  11.86   869.18   48.82   0.06  76.96
 sdb   5.20 12163.40  238.20  532.00 51235.20 93753.60   188.25
21.46   23.75   0.97  75.08



 On Tue, Aug 5, 2014 at 1:55 PM, Mark Reddy mark.re...@boxever.com wrote:

 Hi Ruchir,

 The large number of blocked flushes and the number of pending
 compactions would still indicate IO contention. Can you post the output of
 'iostat -x 5 5'

 If you do in fact have spare IO, there are several configuration options
 you can tune such as increasing the number of flush writers and
 compaction_throughput_mb_per_sec

 Mark


 On Tue, Aug 5, 2014 at 5:22 PM, Ruchir Jha ruchir@gmail.com wrote:

 Also Mark to your comment on my tpstats output, below is my iostat
 output, and the iowait is at 4.59%, which means no IO pressure, but we are
 still seeing the bad flush performance. Should we try increasing the flush
 writers?


 Linux 2.6.32-358.el6.x86_64 (ny4lpcas13.fusionts.corp)  08/05/2014
  _x86_64_(24 CPU)

 avg-cpu:  %user   %nice %system %iowait  %steal   %idle
   5.80   10.250.654.590.00   78.72

 Device:tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
 sda 103.83  9630.62 11982.60 3231174328 4020290310
 dm-0 13.57   160.1781.12   53739546   27217432
 dm-1  7.5916.9443.775682200   14686784
 dm-2   5792.76 32242.66 45427.12 10817753530 15241278360
 sdb 206.09 22789.19 33569.27 7646015080 11262843224



 On Tue, Aug 5, 2014 at 12:13 PM, Ruchir Jha ruchir@gmail.com
 wrote:

 nodetool status:

 Datacenter: datacenter1
 ===
 Status=Up/Down
 |/ 

Re: Fail to reconnect to other nodes after intermittent network failure

2014-08-05 Thread Jiri Horky
OK, ticket 7696 [1] created.

Jiri Horky

[1] https://issues.apache.org/jira/browse/CASSANDRA-7696

On 08/05/2014 07:57 PM, Robert Coli wrote:

 On Tue, Aug 5, 2014 at 5:48 AM, Jiri Horky ho...@avast.com
 mailto:ho...@avast.com wrote:

 What puzzles me is the fact that the authentication apparently started
 to work after the network recovered but the exchange of data did not.

 I would like to understand what could caused the problems and how to
 avoid  them in the future.


 Very few people use SSL and very few people use auth, you have
 probably hit an edge case.

 I would file a JIRA with the details you described above.

 =Rob
  



Re: Read timeouts with ALLOW FILTERING turned on

2014-08-05 Thread Robert Coli
On Tue, Aug 5, 2014 at 11:53 AM, Clint Kelly clint.ke...@gmail.com wrote:

 Ah FWIW I was able to reproduce the problem by reducing
 range_request_timeout_in_ms.  This is great since I want to increase
 the timeout for batch jobs where we scan a large set of rows, but
 leave the timeout for single-row queries alone.


You have just explicated (a subset of) the reason the timeouts were broken
out.

https://issues.apache.org/jira/browse/CASSANDRA-2819

=Rob


Re: Make an existing cluster multi data-center compatible.

2014-08-05 Thread Rene Kochen
As long as you correctly configure the new snitch so that the replica sets
do not change, no, you do not need to repair.

Is the following correct:

The replica sets do not change if you modify the snitch from SimpleSnitch
to PropertyFileSnitch and the topology file puts all nodes in the same
data-center and rack.

Thanks again!

Rene


2014-08-05 20:05 GMT+02:00 Robert Coli rc...@eventbrite.com:

 On Tue, Aug 5, 2014 at 3:52 AM, Rene Kochen rene.koc...@schange.com
 wrote:

 Do I have to run full repairs after this change? Because the yaml file
 states: IF YOU CHANGE THE SNITCH AFTER DATA IS INSERTED INTO THE CLUSTER,
 YOU MUST RUN A FULL REPAIR, SINCE THE SNITCH AFFECTS WHERE REPLICAS ARE
 PLACED.


 As long as you correctly configure the new snitch so that the replica sets
 do not change, no, you do not need to repair.

 Barring that, if you manage to transform the replica set in such a way
 that you always have one (fully repaired) replica from the old set, repair
 will help. I do not recommend this very risky practice. In practice the
 only transformation of snitch in a cluster with data which is likely to be
 safe is one whose result is a NOOP in terms of replica placement.

 In fact, the yaml file is stating something unreasonable there, because
 repair cannot protect against this case :

 - 6 node cluster, A B C D E F,  RF = 2

 1) Start with SimpleSnitch so that A, B have the two replicas of row key X.
 2) Write row key X, value Y, to nodes A and B.
 3) Change to OtherSnitch so that now C,D are responsible for row key X.
 4) Repair and notice that neither C nor D answer Y when asked for row X.

 =Rob




Re: Make an existing cluster multi data-center compatible.

2014-08-05 Thread Robert Coli
On Tue, Aug 5, 2014 at 2:27 PM, Rene Kochen rene.koc...@schange.com wrote:

 As long as you correctly configure the new snitch so that the replica
 sets do not change, no, you do not need to repair.

 Is the following correct:

 The replica sets do not change if you modify the snitch from SimpleSnitch
 to PropertyFileSnitch and the topology file puts all nodes in the same
 data-center and rack.


Yes, you can use nodetool getendpoints to illustrate this programmatically; a
small sketch follows the steps below.

1) make a set of keys with a key from each range
2) getendpoints for this set of keys
3) change snitch
4) getendpoints again

=Rob


Re: Make an existing cluster multi data-center compatible.

2014-08-05 Thread Rameez Thonnakkal
I think the rack placement of these 12 nodes will become important. As the
12 nodes are currently under SimpleSnitch, which is not rack aware, it would
be good to keep them in a single rack in the property file snitch initially
as well. A node repair is a safe option. If you need to change the rack
placement, my take would be to increase the replication factor to at least 3
and then distribute the nodes across different racks.

This is not an expert opinion but a newbie thought.

Regards,
Rameez


On Tue, Aug 5, 2014 at 11:35 PM, Robert Coli rc...@eventbrite.com wrote:

 On Tue, Aug 5, 2014 at 3:52 AM, Rene Kochen rene.koc...@schange.com
 wrote:

 Do I have to run full repairs after this change? Because the yaml file
 states: IF YOU CHANGE THE SNITCH AFTER DATA IS INSERTED INTO THE CLUSTER,
 YOU MUST RUN A FULL REPAIR, SINCE THE SNITCH AFFECTS WHERE REPLICAS ARE
 PLACED.


 As long as you correctly configure the new snitch so that the replica sets
 do not change, no, you do not need to repair.

 Barring that, if you manage to transform the replica set in such a way
 that you always have one (fully repaired) replica from the old set, repair
 will help. I do not recommend this very risky practice. In practice the
 only transformation of snitch in a cluster with data which is likely to be
 safe is one whose result is a NOOP in terms of replica placement.

 In fact, the yaml file is stating something unreasonable there, because
 repair cannot protect against this case :

 - 6 node cluster, A B C D E F,  RF = 2

 1) Start with SimpleSnitch so that A, B have the two replicas of row key X.
 2) Write row key X, value Y, to nodes A and B.
 3) Change to OtherSnitch so that now C,D are responsible for row key X.
 4) Repair and notice that neither C nor D answer Y when asked for row X.

 =Rob




Cassandra process exiting mysteriously

2014-08-05 Thread Clint Kelly
Hi everyone,

For some integration tests, we start up a CassandraDaemon in a
separate process (using the Java 7 ProcessBuilder API).  All of my
integration tests run beautifully on my laptop, but one of them fails
on our Jenkins cluster.

The failing integration test does around 10k writes to different rows
and then 10k reads.  After running some number of reads, the job dies
with this error:

com.datastax.driver.core.exceptions.NoHostAvailableException: All
host(s) tried for query failed (tried: /127.0.0.10:58209
(com.datastax.driver.core.exceptions.DriverException: Timeout during
read))

This error appears to have occurred because the Cassandra process has
stopped.  The logs for the Cassandra process show some warnings during
batch writes (the batches are too big), no activity for a few minutes
(I assume this is because all of the read operations were proceeding
smoothly), and then look like the following:

INFO [StorageServiceShutdownHook] 2014-08-05 19:14:51,903
ThriftServer.java (line 141) Stop listening to thrift clients
 INFO [StorageServiceShutdownHook] 2014-08-05 19:14:51,920 Server.java
(line 182) Stop listening for CQL clients
 INFO [StorageServiceShutdownHook] 2014-08-05 19:14:51,930
Gossiper.java (line 1279) Announcing shutdown
 INFO [StorageServiceShutdownHook] 2014-08-05 19:14:53,930
MessagingService.java (line 683) Waiting for messaging service to
quiesce
 INFO [ACCEPT-/127.0.0.10] 2014-08-05 19:14:53,931
MessagingService.java (line 923) MessagingService has terminated the
accept() thread

Does anyone have any ideas about how to debug this?  Looking around on
google I found some threads suggesting that this could occur from an
OOM error 
(http://stackoverflow.com/questions/23755040/cassandra-exits-with-no-errors).
Wouldn't such an error be logged, however?

The test that fails is a test of our MapReduce Hadoop InputFormat and
as such it does some pretty big queries across multiple rows (over a
range of partitioning key tokens).  The default fetch size I believe
is 5000 rows, and the values in the rows I am fetching are just simple
strings, so I would not think the amount of data in a single read
would be too big.

FWIW I don't see any log messages about garbage collection for at
least 3min before the process shuts down (and no GC messages after the
test stops doing writes and starts doing reads).

I'd greatly appreciate any help before my team kills me for breaking
our Jenkins build so consistently!  :)

Best regards,
Clint


Re: Cassandra process exiting mysteriously

2014-08-05 Thread Kevin Burton
If there is an oom it will be in the logs.
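
One caveat, offered only as a suggestion: a JVM-level OutOfMemoryError does end
up in system.log, but a kill by the Linux kernel's OOM killer never reaches the
JVM and leaves no trace there. On a Linux build slave it can be ruled out
quickly with something like:

  dmesg | grep -i -E 'killed process|out of memory'

That said, the StorageServiceShutdownHook lines in the log look like an orderly
shutdown (shutdown hooks do not run on SIGKILL), which points more towards
something sending the process a normal termination signal than towards an
abrupt kill.
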


Re: Cassandra process exiting mysteriously

2014-08-05 Thread Clint Kelly
Hi Kevin,

Thanks for your reply.  That is what I assumed, but some of the posts
I read on Stack Overflow (e.g., the one that I referenced in my mail)
suggested otherwise.  I was just curious if others had experienced OOM
problems that weren't logged or if there were other common culprits.

Best regards,
Clint



On Tue, Aug 5, 2014 at 9:29 PM, Kevin Burton bur...@spinn3r.com wrote:
 If there is an OOM, it will be in the logs.
