OK Marcelo, I'll work on it today. -ml
On Tue, Jun 3, 2014 at 8:24 PM, Marcelo Elias Del Valle <marc...@s1mbi0se.com.br> wrote:

> Hi Michael,
>
> For sure I would be interested in this program!
>
> I am new both to Python and to CQL. I started creating this copier, but
> was having problems with timeouts. Alex solved my problem here on the list,
> but I think I will still have a lot of trouble making the copy work well.
>
> I open-sourced my version here:
> https://github.com/s1mbi0se/cql_record_processor
>
> Just in case it's useful for anything.
>
> However, I saw CQL has support for concurrency itself, and having something
> made by someone who knows the Python CQL driver better would be very helpful.
>
> My two servers today are at OVH (ovh.com); we have servers at AWS too, but in
> several cases we prefer other hosts. Both servers have SSD and 64 GB RAM; I
> could use the script as a benchmark for you if you want. Besides, we have
> some bigger clusters; I could run it on those just to test the speed, if this
> is going to help.
>
> Regards,
> Marcelo.
>
>
> 2014-06-03 11:40 GMT-03:00 Laing, Michael <michael.la...@nytimes.com>:
>
>> Hi Marcelo,
>>
>> I could create a fast copy program by repurposing some Python apps that I
>> am using for benchmarking the Python driver - do you still need this?
>>
>> With high levels of concurrency and multiple subprocess workers, based on
>> my current actual benchmarks, I think I can get well over 1,000 rows/second
>> on my Mac and significantly more in AWS. I'm using variable-size rows
>> averaging 5 kB.
>>
>> This would be the initial version of a piece of the benchmark suite we
>> will release as part of our nyt⨍aбrik project on 21 June for my
>> Cassandra Day NYC talk re the Python driver.
>>
>> ml
>>
>>
>> On Mon, Jun 2, 2014 at 2:15 PM, Marcelo Elias Del Valle <marc...@s1mbi0se.com.br> wrote:
>>
>>> Hi Jens,
>>>
>>> Thanks for trying to help.
>>>
>>> Indeed, I know I can't do it using just CQL. But what would you use to
>>> migrate the data manually?
>>> I tried to create a Python program using automatic
>>> paging, but I am getting timeouts. I also tried Hive, but with no success.
>>> I only have two nodes and less than 200 GB in this cluster; any simple
>>> way to extract the data quickly would be good enough for me.
>>>
>>> Best regards,
>>> Marcelo.
>>>
>>>
>>>
>>> 2014-06-02 15:08 GMT-03:00 Jens Rantil <jens.ran...@tink.se>:
>>>
>>>> Hi Marcelo,
>>>>
>>>> Looks like you can't do this without migrating your data manually:
>>>> https://stackoverflow.com/questions/18421668/alter-cassandra-column-family-primary-key-using-cassandra-cli-or-cql
>>>>
>>>> Cheers,
>>>> Jens
>>>>
>>>>
>>>> On Mon, Jun 2, 2014 at 7:48 PM, Marcelo Elias Del Valle <marc...@s1mbi0se.com.br> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I have some CQL column families in a 2-node Cassandra 2.0.8 cluster.
>>>>>
>>>>> I realized I created my column family with the wrong partition key.
>>>>> Instead of:
>>>>>
>>>>> CREATE TABLE IF NOT EXISTS entity_lookup (
>>>>>   name varchar,
>>>>>   value varchar,
>>>>>   entity_id uuid,
>>>>>   PRIMARY KEY ((name, value), entity_id))
>>>>> WITH caching=all;
>>>>>
>>>>> I used:
>>>>>
>>>>> CREATE TABLE IF NOT EXISTS entitylookup (
>>>>>   name varchar,
>>>>>   value varchar,
>>>>>   entity_id uuid,
>>>>>   PRIMARY KEY (name, value, entity_id))
>>>>> WITH caching=all;
>>>>>
>>>>> Now I need to migrate the data from the second CF to the first one.
>>>>> I am using DataStax Community Edition.
>>>>>
>>>>> What would be the best way to convert data from one CF to the other?
>>>>>
>>>>> Best regards,
>>>>> Marcelo.
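For the archives: the copy the thread describes boils down to paging through the old table and re-inserting each row under the new composite partition key, which the DataStax Python driver's `execute_concurrent_with_args` can pipeline. A minimal sketch, assuming a reachable cluster; the keyspace name `mykeyspace`, the contact points, and the fetch/chunk/concurrency sizes are placeholders, while the table and column names come from Marcelo's schema:

```python
# Sketch only: copy rows from entitylookup (old key) into entity_lookup
# (new composite key). Keyspace and contact points are assumptions.

def chunked(iterable, size):
    """Yield lists of at most `size` items, keeping memory bounded."""
    chunk = []
    for item in iterable:
        chunk.append(item)
        if len(chunk) == size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk

def migrate(contact_points=('127.0.0.1',), keyspace='mykeyspace'):
    # Driver imports are local so the helper above stays dependency-free.
    from cassandra.cluster import Cluster
    from cassandra.concurrent import execute_concurrent_with_args

    cluster = Cluster(list(contact_points))
    session = cluster.connect(keyspace)
    session.default_fetch_size = 1000  # automatic paging on the SELECT

    insert = session.prepare(
        "INSERT INTO entity_lookup (name, value, entity_id) "
        "VALUES (?, ?, ?)")

    rows = session.execute("SELECT name, value, entity_id FROM entitylookup")
    for chunk in chunked(rows, 1000):
        # Pipelines the inserts with bounded concurrency;
        # raises on the first failed insert by default.
        execute_concurrent_with_args(
            session, insert,
            [(r.name, r.value, r.entity_id) for r in chunk],
            concurrency=100)
    cluster.shutdown()
```

Tuning `default_fetch_size` and `concurrency` (and running several such workers in subprocesses) is what the rows/second figures discussed above depend on; the timeouts Marcelo hit are what bounded paging and bounded concurrency are meant to avoid.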