Re: Academic paper about Cassandra database compaction

2018-05-15 Thread Jeff Jirsa
On Mon, May 14, 2018 at 11:04 AM, Lucas Benevides < lu...@maurobenevides.com.br> wrote: > Thank you Jeff Jirsa for your comments, > > How can we do this: "fix this by not scheduling the major compaction > until we know all of the sstables in the window are available to be > compacted"? > > Would r

Suggestions for migrating data from cassandra

2018-05-15 Thread Jing Meng
Hi guys, for some historical reasons our cassandra cluster is currently overloaded, and operating it has somehow become a nightmare. Anyway, (sadly) we're planning to migrate the cassandra data back to mysql... So we're not quite clear on how to migrate the historical data from cassandra. While as I

Re: Suggestions for migrating data from cassandra

2018-05-15 Thread Michael Dykman
I don't know that there are any projects out there addressing this but I advise you to study LOAD ... INFILE in the MySQL manual specific to your target version. It basically describes a CSV format, where a given file represents a subset of data for a specific table. It is far and away the fastest
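The path Michael describes can be sketched as below: dump rows to CSV, then load them with MySQL's `LOAD DATA INFILE`. The table name, file path, and the FIELDS/LINES clauses are placeholder assumptions and must match however the data was exported.

```python
import csv

def rows_to_csv(rows, path):
    """Dump an iterable of row tuples (e.g. fetched from Cassandra) to CSV."""
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(rows)

def load_data_stmt(table, path):
    """Build a MySQL LOAD DATA INFILE statement matching the CSV written above."""
    return (
        f"LOAD DATA INFILE '{path}' INTO TABLE {table} "
        "FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"' "
        "LINES TERMINATED BY '\\n'"
    )
```

Check the manual for your target MySQL version: `secure_file_priv` may restrict where the server is allowed to read files from.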

Re: Suggestions for migrating data from cassandra

2018-05-15 Thread kurt greaves
COPY might work, but over hundreds of gigabytes you'll probably run into issues if the cluster is overloaded. If you've got access to Spark, that would be an efficient way to pull down an entire table and dump it out using the spark-cassandra-connector.
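For reference, the COPY option kurt mentions is cqlsh's `COPY ... TO`, which exports a table to CSV. A small sketch of building such a command, with the paging options turned down to go easier on an overloaded cluster (keyspace, table, and output path are placeholders):

```python
def copy_to_cmd(keyspace, table, out_csv, page_size=1000, page_timeout=20):
    """Build a cqlsh COPY ... TO command string.

    PAGESIZE and PAGETIMEOUT are standard COPY options; a smaller page
    size reduces pressure on a cluster that is already overloaded.
    """
    return (f"COPY {keyspace}.{table} TO '{out_csv}' WITH HEADER = TRUE "
            f"AND PAGESIZE = {page_size} AND PAGETIMEOUT = {page_timeout};")
```

The resulting statement would be pasted into (or piped to) a cqlsh session.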

Re: Suggestions for migrating data from cassandra

2018-05-15 Thread Arbab Khalil
Spark supports both C* and MySQL. For C*, the datastax:spark-cassandra-connector is needed. Reading and writing data in Spark is very simple. To read a C* table use: df = spark.read.format("org.apache.spark.sql.cassandra").options(keyspace='test', table='test_table').load() a
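Spelling out the truncated read example above as a minimal PySpark sketch; the keyspace, table, and connector version are placeholder assumptions:

```python
def cassandra_read_options(keyspace, table):
    """Options dict for spark.read.format("org.apache.spark.sql.cassandra")."""
    return {"keyspace": keyspace, "table": table}

def read_cassandra_table(spark, keyspace, table):
    """Load a Cassandra table into a Spark DataFrame.

    Assumes the connector jar was supplied at spark-submit time, e.g.
    --packages com.datastax.spark:spark-cassandra-connector_2.11:2.3.0
    (version is an assumption), and that spark.cassandra.connection.host
    points at the cluster.
    """
    return (spark.read
            .format("org.apache.spark.sql.cassandra")
            .options(**cassandra_read_options(keyspace, table))
            .load())
```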

Re: Suggestions for migrating data from cassandra

2018-05-15 Thread Joseph Arriola
Hi Jing. How much information do you need to migrate, in volume and number of tables? With Spark you could do the following: - Read the data and export directly to MySQL. - Read the data and export to csv files, then load those into MySQL. You could also use other paths such as: - StreamSets
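The first path Joseph lists (read from C*, write directly to MySQL) can be sketched with Spark's JDBC writer; the host, port, database, table names, and credentials here are placeholder assumptions, and the MySQL JDBC driver (mysql-connector-java) must be on the Spark classpath:

```python
def mysql_jdbc_url(host, port, database):
    """Build the JDBC URL expected by Spark's DataFrameWriter.jdbc."""
    return f"jdbc:mysql://{host}:{port}/{database}"

def copy_table_to_mysql(spark, keyspace, table, jdbc_url, mysql_props):
    """Read one Cassandra table and append it into a same-named MySQL table."""
    df = (spark.read
          .format("org.apache.spark.sql.cassandra")
          .options(keyspace=keyspace, table=table)
          .load())
    # mysql_props is e.g. {"user": "...", "password": "...",
    #                      "driver": "com.mysql.jdbc.Driver"}
    df.write.mode("append").jdbc(jdbc_url, table, properties=mysql_props)
```

Writing directly avoids the intermediate CSV step, at the cost of putting load on both databases at once.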