Re: Make an existing cluster multi data-center compatible.

2014-08-05 Thread Rameez Thonnakkal
I think the rack placement of these 12 nodes will become important. As the
12 nodes are currently on SimpleSnitch, which is not rack aware, it would be
good to keep them in a single rack in the property file snitch initially as
well. A node repair is a safe option. If you need to change the rack
placement, my take would be to increase the replication factor to at least 3
and then distribute the nodes across different racks.
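For illustration, a minimal sketch of a cassandra-topology.properties that
keeps every node in one data center and one rack (the IP addresses below are
hypothetical), so the placement stays equivalent to what SimpleSnitch was
doing:

# cassandra-topology.properties -- all nodes in a single DC/rack,
# mirroring the flat placement SimpleSnitch used
192.168.1.101=DC1:RAC1
192.168.1.102=DC1:RAC1
# ... one line per node ...

# unknown nodes fall back to the same DC/rack
default=DC1:RAC1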

This is not an expert opinion but a newbie thought.

Regards,
Rameez


On Tue, Aug 5, 2014 at 11:35 PM, Robert Coli rc...@eventbrite.com wrote:

 On Tue, Aug 5, 2014 at 3:52 AM, Rene Kochen rene.koc...@schange.com
 wrote:

 Do I have to run full repairs after this change? Because the yaml file
 states: IF YOU CHANGE THE SNITCH AFTER DATA IS INSERTED INTO THE CLUSTER,
 YOU MUST RUN A FULL REPAIR, SINCE THE SNITCH AFFECTS WHERE REPLICAS ARE
 PLACED.


 As long as you correctly configure the new snitch so that the replica sets
 do not change, no, you do not need to repair.

 Barring that, if you manage to transform the replica set in such a way
 that you always have one (fully repaired) replica from the old set, repair
 will help. I do not recommend this very risky practice. In practice, the
 only snitch transformation in a cluster with data that is likely to be safe
 is one whose result is a no-op in terms of replica placement.
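
 One way to sanity-check that a snitch change really is a no-op (a rough
 sketch; the keyspace, table and key names are placeholders) is to record
 the replica placement of a few sample keys before the change and compare it
 afterwards:

 # before changing the snitch
 nodetool getendpoints my_keyspace my_table sample_key_1
 nodetool getendpoints my_keyspace my_table sample_key_2

 # after changing the snitch and restarting, the same commands should print
 # exactly the same set of replica addresses for each key
 nodetool getendpoints my_keyspace my_table sample_key_1
 nodetool getendpoints my_keyspace my_table sample_key_2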

 In fact, the yaml file is stating something unreasonable there, because
 repair cannot protect against this case:

 - 6 node cluster, A B C D E F, RF = 2

 1) Start with SimpleSnitch so that A and B have the two replicas of row key X.
 2) Write row key X, value Y, to nodes A and B.
 3) Change to OtherSnitch so that now C and D are responsible for row key X.
 4) Repair and notice that neither C nor D answers Y when asked for row X.

 =Rob




Re: How to perform Range Queries in Cassandra

2014-07-06 Thread Rameez Thonnakkal
Won't the performance improve significantly if you increase the number of
nodes, even on a commodity hardware profile?
On 5 Jul 2014 01:38, Jens Rantil jens.ran...@tink.se wrote:

 Hi Mike,

 To get subsecond performance on your queries using _any_ database,
 you need to use proper indexing. Like Jeremy said, Solr will do this.

 If you'd like to try to solve this using Cassandra, you need to learn the
 difference between the partition key and clustering columns in your primary
 key, and understand that you need a clustering column to do any kind of
 range query.
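
 As a minimal sketch of what that looks like (the table and column names
 below are illustrative, not Mike's actual schema): put the equality column
 in the partition key and the column you want to range over in a clustering
 column.

 CREATE TABLE measurements (
     d   int,           -- partition key: must be fixed with '=' in queries
     v1  int,           -- clustering column: rows are stored sorted by v1
     key int,
     PRIMARY KEY (d, v1, key)
 );

 -- a range over the clustering column within one partition is efficient
 SELECT * FROM measurements WHERE d = 1 AND v1 > 10 AND v1 < 70;

 Note that CQL only allows a range restriction on one clustering column per
 query (the columns before it must be fixed with equality), which is part of
 why the original multi-dimensional filter does not map well onto a single
 table.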

 Also, COUNTs in Cassandra are generally fairly slow.

 Cheers,
 Jens
 —
 Sent from Mailbox https://www.dropbox.com/mailbox


 On Tue, Jun 24, 2014 at 10:09 AM, Mike Carter jaloos...@gmail.com wrote:

 Hello!


 I'm a beginner with C* and I'm struggling with it quite a bit.

 I’d like to measure the performance of some Cassandra range queries. The
 idea is to execute multidimensional range queries on Cassandra. E.g. there
 is a given table of 1 million rows with 10 columns, and I'd like to execute
 queries like “select count(*) from testable where d = 1 and v1 < 10 and
 v2 < 20 and v3 < 45 and v4 < 70 … allow filtering”. This kind of query is
 very slow in C*, and as soon as the tables get bigger I get a read timeout,
 probably caused by long scan operations.

 In further tests I'd like to extend the dimensions to more than 200 and the
 rows to 100 million, but at the moment I can't even handle this small table.
 Should I reorganize the data, or is it impossible to perform such highly
 multi-dimensional queries on Cassandra?





 The setup:

 Cassandra is installed on a single node with 2 TB of disk space and 180 GB
 of RAM.

 Connected to Test Cluster at localhost:9160.

 [cqlsh 4.1.1 | Cassandra 2.0.7 | CQL spec 3.1.1 | Thrift protocol 19.39.0]



 Keyspace:

 CREATE KEYSPACE test WITH replication = {
   'class': 'SimpleStrategy',
   'replication_factor': '1'
 };





 Table:

 CREATE TABLE testc21 (
   key int,
   d int,
   v1 int,
   v10 int,
   v2 int,
   v3 int,
   v4 int,
   v5 int,
   v6 int,
   v7 int,
   v8 int,
   v9 int,
   PRIMARY KEY (key)
 ) WITH
   bloom_filter_fp_chance=0.01 AND
   caching='ROWS_ONLY' AND
   comment='' AND
   dclocal_read_repair_chance=0.00 AND
   gc_grace_seconds=864000 AND
   index_interval=128 AND
   read_repair_chance=0.10 AND
   replicate_on_write='true' AND
   populate_io_cache_on_flush='false' AND
   default_time_to_live=0 AND
   speculative_retry='99.0PERCENTILE' AND
   memtable_flush_period_in_ms=0 AND
   compaction={'class': 'SizeTieredCompactionStrategy'} AND
   compression={'sstable_compression': 'LZ4Compressor'};

 CREATE INDEX testc21_d_idx ON testc21 (d);



 select * from testc21 limit 10;

  key    | d | v1 | v10 | v2 | v3 | v4  | v5 | v6 | v7 | v8 | v9
 --------+---+----+-----+----+----+-----+----+----+----+----+-----
  302602 | 1 | 56 |  55 | 26 | 45 |  67 | 75 | 25 | 50 | 26 |  54
  531141 | 1 | 90 |  77 | 86 | 42 |  76 | 91 | 47 | 31 | 77 |  27
  693077 | 1 | 67 |  71 | 14 | 59 | 100 | 90 | 11 | 15 |  6 |  19
    4317 | 1 | 70 |  77 | 44 | 77 |  41 | 68 | 33 |  0 | 99 |  14
  927961 | 1 | 15 |  97 | 95 | 80 |  35 | 36 | 45 |  8 | 11 | 100
  313395 | 1 | 68 |  62 | 56 | 85 |  14 | 96 | 43 |  6 | 32 |   7
  368168 | 1 |  3 |  63 | 55 | 32 |  18 | 95 | 67 | 78 | 83 |  52
  671830 | 1 | 14 |  29 | 28 | 17 |  42 | 42 |  4 |  6 | 61 |  93
   62693 | 1 | 26 |  48 | 15 | 22 |  73 | 94 | 86 |  4 | 66 |  63
  488360 | 1 |  8 |  57 | 86 | 31 |  51 |  9 | 40 | 52 | 91 |  45

 Mike





Disable vnode

2014-07-04 Thread Rameez Thonnakkal
Hello team,

I am looking for a standard operating procedure to disable vnodes in a
production cluster.
This is to enable Solr, which doesn't work with a Cassandra cluster that
has vnodes enabled.

Any suggestions?

Thanks,
Rameez


Re: Disable vnode

2014-07-04 Thread Rameez Thonnakkal
Thanks Mark.
The procedure you shared is useful. I think I had missed the nodetool
rebuild command.
I am trying it out in a non-prod environment.

num_tokens is set to 1 and initial_token is set to a different value on each
node (mine is a 6-node cluster with 3 nodes in each datacenter).
I tried a rolling restart of the cluster. That didn't help.
I tried a cold restart of the cluster. That also didn't work.

I will try nodetool rebuild and see whether anything changes.

Thanks,
rameez



On Fri, Jul 4, 2014 at 7:19 PM, Mark Reddy mark.re...@boxever.com wrote:

 Hi Rameez,

 I have never done a migration from vnodes to non-vnodes; however, I would
 imagine that the procedure would be the same as its counterpart. As always,
 testing in dev should be done first.

 To move from vnodes to non-vnodes I would add a new datacenter to the
 cluster with vnodes disabled and rebuild from your vnode cluster.

 You can find some more details about adding a data center to your cluster
 here:
 http://datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_add_dc_to_cluster_t.html?scroll=task_ds_hmp_54q_gk__task_ds_hmp_54q_gk_unique_1
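
 A rough sketch of that approach (the datacenter name and the yaml values
 below are placeholders, and the exact steps should be checked against the
 docs linked above): each node in the new, non-vnode datacenter is started
 with a single token and without bootstrapping, and then streams its data
 from the existing datacenter.

 # cassandra.yaml on each node of the new DC (single token instead of vnodes)
 #   num_tokens: commented out / unset
 #   initial_token: <token chosen for this node>
 #   auto_bootstrap: false

 # once the keyspaces replicate to the new DC, run on each new-DC node,
 # naming the existing (vnode) datacenter as the streaming source:
 nodetool rebuild -- EXISTING_DC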



 Mark



 On Fri, Jul 4, 2014 at 2:43 PM, Rameez Thonnakkal ssram...@gmail.com
 wrote:

 Hello team,

 I am looking for a standard operating procedure to disable vnodes in a
 production cluster.
 This is to enable Solr, which doesn't work with a Cassandra cluster that
 has vnodes enabled.

 Any suggestions?

 Thanks,
 Rameez





Re: Disable vnode

2014-07-04 Thread Rameez Thonnakkal
I did a nodetool rebuild on one of the nodes.

Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address       Load      Tokens  Owns   Host ID                               Rack
*UN  10.123.75.51  10.54 GB  256     16.0%  d2f980c1-cf82-4659-95ce-ffa3e50ed7c1  RAC1*
UN  10.123.75.53  5.18 GB   256     16.5%  bab7739d-c424-42ef-a8f6-2ba82fcdd0b9  RAC1
UN  10.123.75.52  5.51 GB   256     18.3%  70469a76-939b-4b8c-9512-33aedec6fd3e  RAC1

Datacenter: DC2
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address       Load      Tokens  Owns   Host ID                               Rack
UN  10.123.75.51  5.3 GB    256     16.1%  106d5001-2d44-4d81-8af8-5cf841a1575e  RAC1
UN  10.123.75.52  5.34 GB   256     16.2%  c4333d90-476a-4b44-bc23-5fca7ba6a2e7  RAC1
UN  10.123.75.53  5.11 GB   256     16.8%  8154288e-a0fb-45f8-b3fb-6c3d645ba8f3  RAC1

On that node the load has increased from around 5 GB to 10 GB, but the token
count remains the same (256). My expectation was that it would come down to 1.

I will continue to rebuild the remaining nodes, but I am not sure whether this
is helping.
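
For what it's worth, nodetool rebuild only streams data into a node; it does
not change the tokens the node registered when it bootstrapped, so the Tokens
column staying at 256 is expected. A rough way to count how many tokens a
node currently holds (the IP below is just a placeholder):

# nodetool ring prints one line per token owned by each node
nodetool ring | grep 10.123.75.51 | wc -l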



On Fri, Jul 4, 2014 at 7:28 PM, Rameez Thonnakkal ssram...@gmail.com
wrote:

 Thanks Mark.
 The procedure you shared is useful. I think I had missed the nodetool
 rebuild command.
 I am trying it out in a non-prod environment.

 num_tokens is set to 1 and initial_token is set to a different value on each
 node (mine is a 6-node cluster with 3 nodes in each datacenter).
 I tried a rolling restart of the cluster. That didn't help.
 I tried a cold restart of the cluster. That also didn't work.

 I will try nodetool rebuild and see whether anything changes.

 Thanks,
 rameez



 On Fri, Jul 4, 2014 at 7:19 PM, Mark Reddy mark.re...@boxever.com wrote:

 Hi Rameez,

 I have never done a migration from vnodes to non-vnodes; however, I would
 imagine that the procedure would be the same as its counterpart. As always,
 testing in dev should be done first.

 To move from vnodes to non-vnodes I would add a new datacenter to the
 cluster with vnodes disabled and rebuild from your vnode cluster.

 You can find some more details about adding a data center to your cluster
 here:
 http://datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_add_dc_to_cluster_t.html?scroll=task_ds_hmp_54q_gk__task_ds_hmp_54q_gk_unique_1



 Mark



 On Fri, Jul 4, 2014 at 2:43 PM, Rameez Thonnakkal ssram...@gmail.com
 wrote:

 Hello team,

 I am looking for a standard operating procedure to disable vnodes in a
 production cluster.
 This is to enable Solr, which doesn't work with a Cassandra cluster that
 has vnodes enabled.

 Any suggestions?

 Thanks,
 Rameez






Re: Multi-DC Repairs and Token Questions

2014-05-28 Thread Rameez Thonnakkal
As Chovatia mentioned, the keyspaces seem to be different.
Try DESCRIBE KEYSPACE SN_KEYSPACE and DESCRIBE KEYSPACE MY_KEYSPACE
from CQL.
This will give you an idea of how many replicas there are for these
keyspaces.
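
For illustration, a sketch of the kind of output DESCRIBE gives (the
replication values shown here are just an example of the shape, not
necessarily Matthew's actual settings):

cqlsh> DESCRIBE KEYSPACE "MY_KEYSPACE";

CREATE KEYSPACE "MY_KEYSPACE" WITH replication = {
  'class': 'NetworkTopologyStrategy',
  'DC_NSW': '2',
  'DC_VIC': '2'
};

The per-DC counts in the replication map are the number of replicas each
datacenter holds for that keyspace.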



On Wed, May 28, 2014 at 11:49 AM, chovatia jaydeep 
chovatia_jayd...@yahoo.co.in wrote:

 What is your partitioner type? Is
 it org.apache.cassandra.dht.Murmur3Partitioner?
 In your repair commands I do see two different keyspaces,
 MY_KEYSPACE and SN_KEYSPACE. Are these two separate keyspaces or a typo?

 -jaydeep


   On Tuesday, 27 May 2014 10:26 PM, Matthew Allen 
 matthew.j.al...@gmail.com wrote:


 Hi,

 Am a bit confused regarding data ownership in a multi-dc environment.

 I have the following setup in a test cluster with a keyspace with
 (placement_strategy = 'NetworkTopologyStrategy' and strategy_options =
 {'DC_NSW':2,'DC_VIC':2};)

 Datacenter: DC_NSW
 ==================
 Replicas: 2
 Address  Rack   Status  State   Load        Owns     Token
                                                      0
 nsw1     rack1  Up      Normal  1007.43 MB  100.00%  -9223372036854775808
 nsw2     rack1  Up      Normal  1008.08 MB  100.00%  0

 Datacenter: DC_VIC
 ==================
 Replicas: 2
 Address  Rack   Status  State   Load        Owns     Token
                                                      100
 vic1     rack1  Up      Normal  1015.1 MB   100.00%  -9223372036854775708
 vic2     rack1  Up      Normal  1015.13 MB  100.00%  100

 My understanding is that both datacenters have a complete copy of the
 data, but when I run a repair -pr on each of the nodes, the vic hosts only
 take a couple of seconds, while the nsw nodes take about 5 minutes each.

 Does this mean that the nsw nodes own the majority of the data given their
 key ranges, and that repairs will need to cross datacenters?

 Thanks

 Matt

 command> nodetool -h vic1 repair -pr   (takes seconds)
 Starting NodeTool
 [2014-05-28 15:11:02,783] Starting repair command #1, repairing 1 ranges
 for keyspace MY_KEYSPACE
 [2014-05-28 15:11:03,110] Repair session
 76d170f0-e626-11e3-af4e-218541ad23a1 for range
 (-9223372036854775808,-9223372036854775708] finished
 [2014-05-28 15:11:03,110] Repair command #1 finished
 [2014-05-28 15:11:03,126] Nothing to repair for keyspace 'system'
 [2014-05-28 15:11:03,126] Nothing to repair for keyspace 'system_traces'

 command> nodetool -h vic2 repair -pr   (takes seconds)
 Starting NodeTool
 [2014-05-28 15:11:28,746] Starting repair command #1, repairing 1 ranges
 for keyspace MY_KEYSPACE
 [2014-05-28 15:11:28,840] Repair session
 864b14a0-e626-11e3-9612-07b0c029e3c7 for range (0,100] finished
 [2014-05-28 15:11:28,840] Repair command #1 finished
 [2014-05-28 15:11:28,866] Nothing to repair for keyspace 'system'
 [2014-05-28 15:11:28,866] Nothing to repair for keyspace 'system_traces'

 command> nodetool -h nsw1 repair -pr   (takes minutes)
 Starting NodeTool
 [2014-05-28 15:11:32,579] Starting repair command #1, repairing 1 ranges
 for keyspace SN_KEYSPACE
 [2014-05-28 15:14:07,187] Repair session
 88966430-e626-11e3-81eb-c991646ac2bf for range (100,-9223372036854775808]
 finished
 [2014-05-28 15:14:07,187] Repair command #1 finished
 [2014-05-28 15:14:11,393] Nothing to repair for keyspace 'system'
 [2014-05-28 15:14:11,440] Nothing to repair for keyspace 'system_traces'

 command> nodetool -h nsw2 repair -pr   (takes minutes)
 Starting NodeTool
 [2014-05-28 15:14:18,670] Starting repair command #1, repairing 1 ranges
 for keyspace SN_KEYSPACE
 [2014-05-28 15:17:27,300] Repair session
 eb936ce0-e626-11e3-81e2-8790242f886e for range (-9223372036854775708,0]
 finished
 [2014-05-28 15:17:27,300] Repair command #1 finished
 [2014-05-28 15:17:32,017] Nothing to repair for keyspace 'system'
 [2014-05-28 15:17:32,064] Nothing to repair for keyspace 'system_traces'





Re: ownership not equally distributed

2014-05-21 Thread Rameez Thonnakkal
This issue is resolved, though I don't know the exact root cause.
I re-imaged the server that was taking less token ownership and redid the
configuration through Chef.

Thanks,
Rameez



On Sat, May 17, 2014 at 1:06 AM, Rameez Thonnakkal ssram...@gmail.com wrote:

 Hello

 I have a 4-node cluster where 2 nodes are in one data center and the
 other 2 are in a different one.

 But in the first data center the token ownership is not equally
 distributed. I am using the vnode feature.

 num_tokens is set to 256 on all nodes.
 initial_token is left blank.

 Datacenter: DC1
 ===============
 Status=Up/Down
 |/ State=Normal/Leaving/Joining/Moving
 --  Address        Load       Tokens  Owns    Host ID                               Rack
 UN  10.145.84.167  84.58 MB   256     *0.4%*  ce5ddceb-b1d4-47ac-8d85-249aa7c5e971  RAC1
 UN  10.145.84.166  692.69 MB  255     44.2%   e6b5a0fd-20b7-4bf9-9a8e-715cfc823be6  RAC1

 Datacenter: DC2
 ===============
 Status=Up/Down
 |/ State=Normal/Leaving/Joining/Moving
 --  Address        Load       Tokens  Owns    Host ID                               Rack
 UN  10.168.67.43   476 MB     256     27.8%   05dc7ea6-0328-43b8-8b70-bcea856ba41e  RAC1
 UN  10.168.67.42   413.15 MB  256     27.7%   677025f0-780c-45dc-bb3b-17ad260fba7d  RAC1


 I have run nodetool repair a couple of times, but it didn't help.

 On the node with less ownership, I have seen frequent full GCs occurring a
 couple of times and had to restart Cassandra.


 Any suggestions on how to resolve this are highly appreciated.

 Regards,
 Rameez




ownership not equally distributed

2014-05-16 Thread Rameez Thonnakkal
Hello

I have a 4-node cluster where 2 nodes are in one data center and the
other 2 are in a different one.

But in the first data center the token ownership is not equally
distributed. I am using the vnode feature.

num_tokens is set to 256 on all nodes.
initial_token is left blank.

Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens  Owns    Host ID                               Rack
UN  10.145.84.167  84.58 MB   256     *0.4%*  ce5ddceb-b1d4-47ac-8d85-249aa7c5e971  RAC1
UN  10.145.84.166  692.69 MB  255     44.2%   e6b5a0fd-20b7-4bf9-9a8e-715cfc823be6  RAC1

Datacenter: DC2
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens  Owns    Host ID                               Rack
UN  10.168.67.43   476 MB     256     27.8%   05dc7ea6-0328-43b8-8b70-bcea856ba41e  RAC1
UN  10.168.67.42   413.15 MB  256     27.7%   677025f0-780c-45dc-bb3b-17ad260fba7d  RAC1


I have run nodetool repair a couple of times, but it didn't help.

On the node with less ownership, I have seen frequent full GCs occurring a
couple of times and had to restart Cassandra.


Any suggestions on how to resolve this are highly appreciated.

Regards,
Rameez