Re: Multinode Cassandra and sstableloader

2015-04-02 Thread Serega Sheypak
So, sstableloader streams only the portion of data stored in the local
/var/lib/cassandra/data/keyspace/table directory.
If we have 3 nodes and RF=3, then only 1/3 of the data would be streamed to
the other cluster.
Problem is solved.



Re: Multinode Cassandra and sstableloader

2015-04-01 Thread Alain RODRIGUEZ
From Michael Laing, posted on the wrong thread:

"We use Alain's solution as well to make major operational revisions.

We have a "red team" and a "blue team" in each AWS region, so we just add
and drop datacenters to get where we want to be.

Pretty simple."


Re: Multinode Cassandra and sstableloader

2015-03-31 Thread Alain RODRIGUEZ
IMHO, the most straightforward solution is to add cluster2 as a new DC for
mykeyspace and then drop the old DC.

That's how we migrated to VPC (AWS), and we love this approach since you
don't have to mess with your existing cluster. Plus, the data is synced
automatically and you can then safely drop your old DC once you are sure.

I posted the steps on this mailing list a long time ago:
https://mail-archives.apache.org/mod_mbox/incubator-cassandra-user/201406.mbox/%3cca+vsrlopop7th8nx20aoz3as75g2jrjm3ryx119deklynhq...@mail.gmail.com%3E
Also Datastax docs:
https://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_add_dc_to_cluster_t.html

"get data from cluster1,
put it to cluster2
wipe cluster1"

I would definitely use the new-DC method to do this (I have actually done so
already, multiple times).

Up to you. I once heard that there are almost as many ways of doing
operations on Cassandra as there are operators :). You should go with a
method you are confident with. I can assure you the one I propose is quite
safe.

C*heers,

Alain


Re: Multinode Cassandra and sstableloader

2015-03-31 Thread Serega Sheypak
>I have to ask you if you considered doing an Alter keyspace, change RF
The idea is dead simple:
get data from cluster1,
put it to cluster2,
wipe cluster1.

I understand the drawbacks of the streaming sstableloader approach, but right
now I need something easy. Later we will consider switching to Priam, since it
does backup/restore the right way.


Re: Multinode Cassandra and sstableloader

2015-03-31 Thread Alain RODRIGUEZ
Hi,

Despite your "I understand that it's not the best solution, I need it for
testing purposes", I have to ask whether you have considered adding cluster2
as a new DC: do an ALTER KEYSPACE to set RF > 1 for mykeyspace on cluster2,
then run "nodetool rebuild"?
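
As a rough illustration of that new-DC route, the sketch below only builds the CQL statement and the nodetool command one would typically run; it does not contact any cluster. The datacenter names DC1 and DC2 and the replica counts are placeholders that do not come from this thread, and it assumes the keyspace uses NetworkTopologyStrategy.

```python
def alter_keyspace_cql(keyspace, replicas_per_dc):
    """Build an ALTER KEYSPACE statement for NetworkTopologyStrategy.

    replicas_per_dc maps a datacenter name to its replication factor.
    The datacenter names used here are hypothetical.
    """
    options = ["'class': 'NetworkTopologyStrategy'"]
    for dc, rf in sorted(replicas_per_dc.items()):
        options.append("'%s': %d" % (dc, rf))
    return "ALTER KEYSPACE %s WITH replication = {%s};" % (
        keyspace, ", ".join(options))


def rebuild_command(source_dc):
    """Argument list for the command to run on each node of the new DC."""
    return ["nodetool", "rebuild", source_dc]


print(alter_keyspace_cql("mykeyspace", {"DC1": 3, "DC2": 3}))
print(" ".join(rebuild_command("DC1")))
```

The statement would be run once through cqlsh, and the rebuild on every node of the new datacenter, naming the old datacenter as the streaming source.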

If you go your way (sstableloader), I also advise you to take a
snapshot (instead of just flushing) to avoid failures due to compactions on
your active cluster1.
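
A minimal sketch of the snapshot step, under a few assumptions: the snapshot tag "pre_load" and the staging directory are made up for illustration, and snapshot files are assumed to live under <table directory>/snapshots/<tag>/. Because sstableloader takes the keyspace and table names from the last two components of the path it is given, the sketch copies the snapshot files into a <keyspace>/<table>/ layout before loading.

```python
import shutil
from pathlib import Path


def snapshot_command(keyspace, tag):
    """Argument list for taking a named snapshot of one keyspace."""
    return ["nodetool", "snapshot", "-t", tag, keyspace]


def stage_snapshot(table_dir, tag, keyspace, table, staging_root):
    """Copy one table's snapshot files to <staging_root>/<keyspace>/<table>/.

    table_dir is the live table directory on the source node; the staging
    location is a hypothetical scratch directory with enough free space.
    """
    source = Path(table_dir) / "snapshots" / tag
    target = Path(staging_root) / keyspace / table
    target.mkdir(parents=True, exist_ok=True)
    for f in sorted(source.iterdir()):
        if f.is_file():
            shutil.copy2(f, target / f.name)
    return target
```

The returned directory is what would then be handed to sstableloader, so compactions on the live table directory cannot remove files in the middle of a load.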

To answer your question, sstableloader is supposed to distribute data
correctly on the new cluster, according to its RF and topology.
Basically, if you run sstableloader only on the sstables of c1.node1, my
guess is that all the data present on c1.node1 will be stored on the new c2
(each piece of data on its corresponding node). So if you have RF=3 on c1,
you should get all the data on c2 just by running sstableloader from
c1.node1; if you are using RF=1 on c1, then you need to load data from every
c1 node. I suppose it doesn't matter which cluster2.nodeXXX you pick, as it
only acts as a coordinator.
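
The RF reasoning above can be checked with a toy model. This is only a sketch under simplifying assumptions: one equal-sized token range per node and SimpleStrategy-like placement (the owner of a range plus the next RF-1 nodes clockwise). Real clusters with vnodes or uneven tokens will differ in detail.

```python
def replicas(ring, index, rf):
    """Nodes holding the range at `index`: its owner plus the next rf-1 nodes."""
    return {ring[(index + i) % len(ring)] for i in range(rf)}


def fraction_on_node(ring, node, rf):
    """Fraction of token ranges for which `node` holds a replica."""
    held = sum(1 for i in range(len(ring)) if node in replicas(ring, i, rf))
    return held / len(ring)


ring = ["c1.node1", "c1.node2", "c1.node3"]
for rf in (1, 2, 3):
    print(rf, fraction_on_node(ring, "c1.node1", rf))
```

In this model a single node of a 3-node ring holds a third of the ranges at RF=1 and all of them at RF=3, which is why loading from one node would only be enough in the RF=3 case.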

I have never used the tool, but that is what would be "logical" imho. Wait
for a confirmation, as I wouldn't want to lead you to a failure of any kind.
Also, I don't know whether sstableloader replicates the data directly or
whether you need to repair c2 after loading it.

C*heers,

Alain


Multinode Cassandra and sstableloader

2015-03-31 Thread Serega Sheypak
Hi, I have a simple question and can't find related info in the docs.

I have cluster1 with 3 nodes and cluster2 with 5 nodes. I want to transfer
all the data of the keyspace named 'mykeyspace' from cluster1 to cluster2
using sstableloader. I understand that it's not the best solution; I need it
for testing purposes.

What I'm going to do:

   1. Recreate keyspace schema on cluster2 using schema from cluster1
   2. nodetool flush for mykeyspace.source_table being exported from
   cluster1 to cluster2
   3. Run sstableloader for each table on cluster1.node01

   sstableloader -d cluster2.nodeXXX.com /var/lib/cassandra/data/mykeyspace/source_table-83f369e0d6e511e4b3a6010e8d2b68af/

What should I get as a result on cluster2?

*ALL* data from source_table?

or

Just data stored in *partition of source_table*

I'm confused. The docs say I just run this command to export a table from
cluster1 to cluster2, but I specify the path to only a part of the
source_table data, since the other parts of the table should be on other nodes.
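
For the "run sstableloader for each table" step, here is a small sketch that only builds the command lines, one per table directory under the keyspace directory. The helper name is made up for illustration; the -d option takes a comma-separated list of initial hosts on the target cluster. Nothing is executed here.

```python
from pathlib import Path


def sstableloader_commands(keyspace_dir, target_hosts):
    """Return one sstableloader argument list per table directory.

    keyspace_dir is the keyspace data directory on the source node, e.g.
    /var/lib/cassandra/data/mykeyspace; target_hosts are nodes of cluster2.
    """
    hosts = ",".join(target_hosts)
    commands = []
    for table_dir in sorted(Path(keyspace_dir).iterdir()):
        if table_dir.is_dir():
            commands.append(["sstableloader", "-d", hosts, str(table_dir)])
    return commands
```

Each list could be passed to subprocess.run on the source node. Whether this has to be repeated on the other cluster1 nodes depends on how much of the data each node holds.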