Re: [DISCUSS] CEP-28: Reading and Writing Cassandra Data with Spark Bulk Analytics

2023-04-12 Thread James Berragan
Hi Stefan, CDC is something we are also thinking about, and worthy of a separate discussion. We have tested Spark Streaming for CDC and I hope we can bolt it on in the future, but streaming technologies also come with more caveats and nuances (we have found it beneficial with CDC to store a small

Re: [DISCUSS] CEP-28: Reading and Writing Cassandra Data with Spark Bulk Analytics

2023-03-27 Thread James Berragan
that will accompany this CEP to help readers understand it better. As a reminder, please keep the discussion here on the dev list vs. in the wiki, as we've found it easier to manage via email. Sincerely, Doug Rohrer & James Berragan

Re: [DISCUSS] CEP-28: Reading and Writing Cassandra Data with Spark Bulk Analytics

2023-03-27 Thread James Berragan
On the Sidecar discussion, while Sidecar is the preferred mechanism for the reasons described, the API is generic enough to plug in user implementations (essentially provide a list of sstables for a token range, and a mechanism to open an InputStream on any SSTable file
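To make that plug-in point concrete, here is a minimal Python sketch of the two operations described: list the SSTables for a token range, and open a stream over any SSTable file. The names `SSTable`, `SSTableSource`, `sstables_for_range`, `open`, and `InMemorySource` are hypothetical and are not the CEP-28 API; the sketch also treats token ranges as closed intervals and ignores ring wraparound.

```python
import io
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import BinaryIO, Dict, List


@dataclass(frozen=True)
class SSTable:
    """Hypothetical descriptor: a file name plus the token bounds it covers."""
    name: str
    first_token: int
    last_token: int


class SSTableSource(ABC):
    """Hypothetical plug-in interface mirroring the two operations in the thread."""

    @abstractmethod
    def sstables_for_range(self, start: int, end: int) -> List[SSTable]:
        """Return every SSTable that may hold tokens in [start, end]."""

    @abstractmethod
    def open(self, sstable: SSTable) -> BinaryIO:
        """Open an input stream over the SSTable file's bytes."""


class InMemorySource(SSTableSource):
    """Toy implementation backed by a dict, standing in for Sidecar or a custom store."""

    def __init__(self, files: Dict[SSTable, bytes]):
        self._files = files

    def sstables_for_range(self, start: int, end: int) -> List[SSTable]:
        # An SSTable overlaps [start, end] unless it ends before start or begins after end.
        # Assumption: no token-ring wraparound.
        overlapping = (
            t for t in self._files
            if not (t.last_token < start or t.first_token > end)
        )
        return sorted(overlapping, key=lambda t: t.name)

    def open(self, sstable: SSTable) -> BinaryIO:
        return io.BytesIO(self._files[sstable])


if __name__ == "__main__":
    # Illustrative file names and token bounds only.
    a = SSTable("nb-1-big-Data.db", -100, 0)
    b = SSTable("nb-2-big-Data.db", 50, 200)
    src = InMemorySource({a: b"aaa", b: b"bbb"})
    print([t.name for t in src.sstables_for_range(-10, 10)])  # ['nb-1-big-Data.db']
```

Keeping the interface this narrow means a Sidecar-backed implementation and a user-supplied one (for example, reading from object storage) differ only in how they enumerate files and open streams.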

Spark-Cassandra Bulk Reader: CASSANDRA-16222

2020-10-23 Thread James Berragan
Hi everyone, I want to highlight to the dev community CASSANDRA-16222, a Spark library we have been working on that can compact and read raw Cassandra SSTables into SparkSQL. By reading the sstables directly from a snapshot directory we are
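Reading raw SSTables means merging several files that may hold different versions of the same row. The sketch below assumes a much simplified model: each SSTable is an iterable of `(key, timestamp, value)` rows already sorted by key, a `None` value marks a tombstone, and whole rows are reconciled last-write-wins. Real Cassandra compaction reconciles at the cell level and also handles TTLs, range tombstones, and clustering order, so treat `merge_sstables` (a hypothetical name) only as an illustration of the k-way merge idea.

```python
import heapq
from itertools import groupby
from operator import itemgetter
from typing import Iterable, Iterator, List, Optional, Tuple

# Simplified row model (assumption): (partition key, write timestamp, value).
# A value of None stands in for a tombstone (deletion marker).
Row = Tuple[str, int, Optional[str]]


def merge_sstables(sstables: List[Iterable[Row]]) -> Iterator[Tuple[str, str]]:
    """Stream rows from several key-sorted SSTables, keeping the newest write per key."""
    # heapq.merge lazily interleaves the already-sorted inputs by key,
    # so memory use stays proportional to the number of inputs.
    merged = heapq.merge(*sstables, key=itemgetter(0))
    for key, versions in groupby(merged, key=itemgetter(0)):
        # Last write wins: the version with the largest timestamp shadows the others.
        _, _, value = max(versions, key=itemgetter(1))
        if value is not None:  # drop keys whose newest version is a tombstone
            yield key, value


if __name__ == "__main__":
    s1 = [("a", 1, "x"), ("c", 5, "old")]
    s2 = [("a", 3, "y"), ("b", 2, None), ("c", 4, "older")]
    print(list(merge_sstables([s1, s2])))  # [('a', 'y'), ('c', 'old')]
```

Because the inputs are consumed as sorted streams, a reader built this way can feed merged rows to Spark partition by partition instead of materializing each SSTable in memory.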