Re: Write to SSTables to do really fast initial load of database (e.g. for migration)

2020-04-22 Thread Tobias Eriksson
Thanks, all, for the good tips. -Tobias From: Eric Evans Reply to: "user@cassandra.apache.org" Date: Tuesday, 21 April 2020 at 16:02 To: "user@cassandra.apache.org" Subject: Re: Write to SSTables to do really fast initial load of database (e.g. for migration) On Tue, Apr 21, 2020 at 4:16 AM

Re: Issues, understanding how CQL works

2020-04-22 Thread Alex Ott
Not directly related, but you can try zstd for compression: in my tests it gave a faster offload, with a slightly worse compression ratio. Marc Richter at "Wed, 22 Apr 2020 17:57:44 +0200" wrote: MR> Seems as if sstable2json is deprecated; see [1] and [2]. MR> So, dsbulk [3] it is, I

Re: Issues, understanding how CQL works

2020-04-22 Thread Marc Richter
Seems as if sstable2json is deprecated; see [1] and [2]. So, dsbulk [3] it is, I guess. I downloaded it and crafted the following command line from the docs [4] for my use case: $ ../dsbulk-1.5.0/bin/dsbulk unload -h '["MY_CASSANDRA_IP"]' \ --driver.advanced.auth-provider.class

Re: Impact of setting low value for flag -XX:MaxDirectMemorySize

2020-04-22 Thread Reid Pinchback
If the memory wasn’t being used, and it got pushed to swap, then the right thing happened. It’s a common misconception that swap is bad. The use of swap isn’t bad. What is bad is if you find data churning in and out of swap space a lot so that your latency increases either due to the page
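The distinction drawn above (swap being used versus data churning in and out of swap) can be checked on a Linux host by comparing two readings of the cumulative `pswpin` and `pswpout` counters in `/proc/vmstat`. The following is only a minimal sketch under that assumption; the function names and the sampling interval are hypothetical and not something proposed in the thread.

```python
import time


def parse_swap_counters(vmstat_text):
    """Extract the cumulative pswpin/pswpout page counters from /proc/vmstat text."""
    counters = {}
    for line in vmstat_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in ("pswpin", "pswpout"):
            counters[parts[0]] = int(parts[1])
    return counters


def swap_churn(before, after):
    """Pages swapped in and out between two readings of the counters."""
    return {name: after[name] - before[name] for name in ("pswpin", "pswpout")}


def sample_swap_churn(interval_seconds=10, path="/proc/vmstat"):
    """Read the counters twice, interval_seconds apart (Linux only; path is an assumption)."""
    with open(path) as handle:
        before = parse_swap_counters(handle.read())
    time.sleep(interval_seconds)
    with open(path) as handle:
        after = parse_swap_counters(handle.read())
    return swap_churn(before, after)
```

If both deltas stay near zero while swap is occupied, the pages were simply parked there; deltas that keep growing under load would point to the churn that hurts latency.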

Re: Issues, understanding how CQL works

2020-04-22 Thread Aakash Pandhi
Marc, In DSE, cqlsh offers a command called CAPTURE, which can save the output of a query to a specified file. Maybe you can use that option to save all the values you need in that file, to see all signalids or whichever columns you need. The file may grow big depending on your dataset, so I am not sure what limit it

Re: Issues, understanding how CQL works

2020-04-22 Thread Marc Richter
This sounds like a promising way; thank you for bringing this up! I will see if I can manage it with this approach. Best regards, Marc Richter On 22.04.20 15:38, Durity, Sean R wrote: I thought this might be a single-time use case request. I think my first approach would be to use

Re: Issues, understanding how CQL works

2020-04-22 Thread Alex Ott
DSBulk also works with JSON... If the data transformations are complex, I would go with Spark running in local mode and process the data there... On Wed, Apr 22, 2020 at 3:38 PM Durity, Sean R wrote: > I thought this might be a single-time use case request. I think my first > approach would be to use

RE: Issues, understanding how CQL works

2020-04-22 Thread Durity, Sean R
I thought this might be a single-time use case request. I think my first approach would be to use something like dsbulk to unload the data and then reload it into a table designed for the query you want to do (as long as you have adequate disk space). I think like a DBA/admin first. Dsbulk

Re: Impact of setting low value for flag -XX:MaxDirectMemorySize

2020-04-22 Thread manish khandelwal
I am running Spark (max heap 4G) and a Java application (4G) with my Cassandra server (8G). After heavy loading, if I run a Spark process, some main memory is pushed into swap. But if I restart Cassandra and execute the Spark process, memory is not pushed into the swap. Idea behind asking the

Re: Issues, understanding how CQL works

2020-04-22 Thread Marc Richter
Hi Jeff, thank you for your exhaustive and verbose answer! Also, a very big "Thank you!" to all the others who replied; I hope you understand that I summarize all your feedback in this single answer. From what I understand from your answers, Cassandra seems to be optimized to store (and read)

Re: Issues, understanding how CQL works

2020-04-22 Thread Pekka Enberg
Hi Marc, On Tue, Apr 21, 2020 at 4:20 PM Marc Richter wrote: > The database is already roughly 260 GB in size. > I now need to know what is the most recent entry in it; the correct > column to learn this would be "insertdate". > > In SQL I would do something like this: > > SELECT
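For the one-time question of finding the most recent "insertdate", one option in line with the unload suggestion elsewhere in this thread is to export the table to CSV and scan the file once, keeping only the largest timestamp seen. The following is a minimal sketch, not a tested recipe: it assumes the export has a header row, that the column is named `insertdate`, and that every value is an ISO-8601 timestamp in the same format. The function name is hypothetical.

```python
import csv
from datetime import datetime


def latest_insertdate(lines, column="insertdate"):
    """Stream CSV lines (header row first) and return the newest timestamp, or None."""
    latest = None
    for row in csv.DictReader(lines):
        value = row.get(column)
        if not value:
            continue  # skip rows where the column is empty or missing
        # Assumption: ISO-8601 values; map a trailing "Z" to an explicit UTC offset
        # so that older Python versions can parse it too.
        if value.endswith("Z"):
            value = value[:-1] + "+00:00"
        timestamp = datetime.fromisoformat(value)
        if latest is None or timestamp > latest:
            latest = timestamp
    return latest
```

Because the function accepts any iterable of lines, an open file object can be passed directly, so memory use stays constant regardless of the size of the export.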