Hi,

Despite your "I understand that it's not the best solution, I need it for
testing purposes", I have to ask whether you considered doing an ALTER
KEYSPACE to set RF > 1 for mykeyspace on cluster2, then running "nodetool
rebuild" to add cluster2 as a new DC?

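If you do go your way (sstableloader), I also advise you to take a
snapshot (instead of just flushing) to avoid failures due to compactions on
your active cluster1. A snapshot hard-links the sstables, so the files stay
in place even if a compaction removes the live ones during the load.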
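
To make the snapshot idea concrete, here is a rough dry-run sketch that only
prints the commands I have in mind; it does not execute anything. The tag
"pre_load" and the staging directory "/tmp/sstable_staging" are names I made
up, and I am assuming that sstableloader wants a directory ending in
<keyspace>/<table>, which is why the snapshot files are copied first. Please
check this against your version before relying on it.

```python
import shlex


def plan_snapshot_load(keyspace, table, table_dir, tag, staging_root, target_host):
    """Return the commands (as argument lists) for a snapshot-based load.

    Assumptions: snapshots land in <table_dir>/snapshots/<tag>, and
    sstableloader expects a path ending in <keyspace>/<table>.
    """
    snapshot_dir = f"{table_dir}/snapshots/{tag}"
    staging_dir = f"{staging_root}/{keyspace}/{table}"
    return [
        # Hard-link the current sstables under a named snapshot.
        ["nodetool", "snapshot", "-t", tag, keyspace],
        # Stage the snapshot files under a keyspace/table directory.
        ["mkdir", "-p", staging_dir],
        ["cp", "-r", f"{snapshot_dir}/.", staging_dir],
        # Stream the staged sstables to the target cluster.
        ["sstableloader", "-d", target_host, staging_dir],
    ]


commands = plan_snapshot_load(
    keyspace="mykeyspace",
    table="source_table",
    table_dir="/var/lib/cassandra/data/mykeyspace/"
              "source_table-83f369e0d6e511e4b3a6010e8d2b68af",
    tag="pre_load",                       # hypothetical tag name
    staging_root="/tmp/sstable_staging",  # hypothetical staging location
    target_host="cluster2.nodeXXX.com",
)
for cmd in commands:
    print(shlex.join(cmd))
```

You would repeat this for each table you want to export.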

To answer your question, sstableloader is supposed to distribute the data
correctly on the new cluster according to that cluster's RF and topology.
Basically, if you run sstableloader only on the sstables of c1.node1, my
guess is that all the data present on c1.node1 will end up on c2 (each row
on the corresponding node). So if you have RF=3 on c1, you should get all
the data on c2 just by running sstableloader from c1.node1; if you are
using RF=1 on c1, you need to load the data from every c1 node. I suppose
the choice of cluster2.nodeXXX doesn't matter and that it only acts as a
coordinator.
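
The RF reasoning above can be checked with a toy model of a token ring. This
is only a sketch under simplifying assumptions: one evenly spaced token per
node (no vnodes), MD5 instead of Cassandra's real partitioner, and replicas
placed on the next nodes clockwise. All function names are mine, not
Cassandra's.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 32  # toy token space, not Cassandra's real range


def token_for(key):
    """Hash a partition key onto the toy ring (MD5 is a stand-in)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def build_ring(nodes):
    """Give each node one evenly spaced token; result is sorted by token."""
    step = RING_SIZE // len(nodes)
    return [(i * step, name) for i, name in enumerate(nodes)]


def replicas_for(key, ring, rf):
    """Owner is the first node whose token >= key token; then walk clockwise."""
    tokens = [token for token, _ in ring]
    start = bisect.bisect_left(tokens, token_for(key)) % len(ring)
    return [ring[(start + i) % len(ring)][1] for i in range(min(rf, len(ring)))]


def fraction_held(node, ring, rf, keys):
    """Share of keys for which `node` holds a replica."""
    held = sum(1 for key in keys if node in replicas_for(key, ring, rf))
    return held / len(keys)


keys = [f"row-{i}" for i in range(3000)]
c1 = build_ring(["c1.node1", "c1.node2", "c1.node3"])

# With RF=3 on 3 nodes, c1.node1 holds a replica of every row.
print("RF=3:", fraction_held("c1.node1", c1, 3, keys))
# With RF=1, c1.node1 holds only roughly a third of the rows.
print("RF=1:", fraction_held("c1.node1", c1, 1, keys))
```

In this model c1.node1 alone is a complete source only when RF equals the
number of nodes in c1; with RF=1 each node contributes a disjoint share, so
every node has to be loaded.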

I have never used the tool, but that's what would be "logical" imho. Wait
for a confirmation, as I wouldn't want to lead you to a failure of any kind.
Also, I don't know whether sstableloader replicates the data directly or
whether you need to repair c2 after loading the data.

C*heers,

Alain

2015-03-31 13:21 GMT+02:00 Serega Sheypak <serega.shey...@gmail.com>:

>  Hi, I have a simple question and can't find related info in docs.
>
> I have cluster1 with 3 nodes and cluster2 with 5 nodes. I want to transfer
> whole keyspace named 'mykeyspace' data from cluster1 to cluster2 using
> sstableloader. I understand that it's not the best solution, I need it for
> testing purposes.
>
> What I'm going to do:
>
>    1. Recreate keyspace schema on cluster2 using schema from cluster1
>    2. nodetool flush for mykeyspace.source_table being exported from
>    cluster1 to cluster2
>    3. Run sstableloader for each table on cluster1.node01:
>
>    sstableloader -d cluster2.nodeXXX.com /var/lib/cassandra/data/mykeyspace/source_table-83f369e0d6e511e4b3a6010e8d2b68af/
>
> What should I get as a result on cluster2?
>
> *ALL* data from source_table?
>
> or
>
> Just data stored in *partition of source_table*
>
> I'm confused. Doc says I just run this command to export table from
> cluster1 to cluster2, but I specify path to a part of source_table data,
> since other parts of table should be on other nodes.
>
