Cassandra CDC updating

2019-01-10 Thread Hao Zhang
Hi All, I enabled CDC through the yaml. When I insert 100k small rows, I don't see CDC files being created or updated unless I restart the Cassandra service after each update. However, when I insert rows with 1MB columns, I start to see CDC files added. I looked at the commit log; they are updated in a
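The behaviour described above is consistent with how CDC segments are surfaced: a commit log segment is only handed over under cdc_raw once it is closed, so small writes that never fill a segment (32 MB by default) produce no CDC files until a restart or segment rollover, while 1MB columns fill segments quickly. A minimal sketch of the settings involved, with the directory path and keyspace/table names as placeholder assumptions:

    # cassandra.yaml (excerpt)
    cdc_enabled: true
    cdc_raw_directory: /var/lib/cassandra/cdc_raw   # assumed path
    cdc_total_space_in_mb: 4096

    -- CQL: CDC must also be switched on per table
    ALTER TABLE my_ks.my_table WITH cdc = true;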

RE: [EXTERNAL] Re: Good way of configuring Apache spark with Apache Cassandra

2019-01-10 Thread Durity, Sean R
RF in the Analytics DC can be 2 (or even 1) if storage cost is more important than availability. There is a storage (and CPU and network latency) cost for a separate Spark cluster. So, the variables of your specific use case may swing the decision in different directions. Sean Durity
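As a sketch of the per-DC replication factors being discussed, a keyspace can give the analytics data center a lower RF than the transactional one (keyspace and data center names below are placeholders):

    -- CQL: lower RF in the Spark/analytics DC than in the OLTP DC
    CREATE KEYSPACE my_ks WITH replication = {
      'class': 'NetworkTopologyStrategy',
      'dc_oltp': '3',
      'dc_analytics': '2'
    };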

RE: [EXTERNAL] Re: Good way of configuring Apache spark with Apache Cassandra

2019-01-10 Thread Durity, Sean R
At this point, I would be talking to DataStax. They already have Spark and Solr/search fully embedded in their product. You can look at their docs for some idea of the RAM and CPU required for combined Search/Analytics use cases. I would expect this to be a much faster route to production.

nodetool compactionstats

2019-01-10 Thread vale...@tortugatech.com
I see that nodetool compactionstats reports uncompressed byte size, but does anyone know why? It seems that for all use cases, the true (compressed) size would be most useful. Thanks, Valerie

Re: Cassandra and Apache Arrow

2019-01-10 Thread Uwe L. Korn
Hello, seeing this mail thread pop up in my search filter, I want to give some insights as one of the Arrow PMC members. I have not yet heard of anyone currently working on Cassandra + Arrow. This would definitely be a great combination to better support performant clients that send/receive larger