Why aren’t you using saveToCassandra 
(https://github.com/datastax/spark-cassandra-connector/blob/master/doc/5_saving.md)? 
It has a number of locality-aware optimizations that will probably beat your 
hand-rolled bulk loading (especially if you’re not doing it inside something 
like foreachPartition).

Also, you can easily tune the size of those tasks, and therefore the batches, 
up and down to minimize the impact on the prod system.
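For reference, a minimal sketch of what that might look like with the connector’s WriteConf (the keyspace, table, and column names here are hypothetical, and `sc` is assumed to be an existing SparkContext configured with `spark.cassandra.connection.host`):

```scala
import com.datastax.spark.connector._
import com.datastax.spark.connector.writer.{RowsInBatch, WriteConf}

// Hypothetical data: (id, value) pairs to be written.
val rows = sc.parallelize(Seq(("a", 1), ("b", 2)))

rows.saveToCassandra("my_keyspace", "my_table",
  SomeColumns("id", "value"),
  writeConf = WriteConf(
    batchSize = RowsInBatch(64),   // smaller unlogged batches are gentler on prod
    throughputMiBPS = 1.0))        // throttle write throughput per executor core
```

The same knobs can be set cluster-wide via `spark.cassandra.output.batch.size.rows` and `spark.cassandra.output.throughput_mb_per_sec` instead of passing a WriteConf explicitly.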

> On Sep 24, 2015, at 5:37 PM, Benyi Wang <bewang.t...@gmail.com> wrote:
> 
> I use Spark and spark-cassandra-connector with a customized Cassandra writer 
> (spark-cassandra-connector doesn’t support DELETE). Basically the writer 
> works as follows:
> 
> 1. Bind each row in the Spark RDD to either an INSERT or DELETE PreparedStatement
> 2. Create a BatchStatement for multiple rows
> 3. Write to Cassandra
> 
> I knew using CQLBulkOutputFormat would be better, but it doesn't support 
> DELETE. 
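The steps above can be sketched roughly like this with foreachPartition and the DataStax Java driver (the table, columns, row shape, and contact point are hypothetical; in connector-based code you would normally obtain the session via `CassandraConnector.withSessionDo` rather than building a Cluster by hand):

```scala
import com.datastax.driver.core.{BatchStatement, Cluster}

// Hypothetical row type: (id: String, value: Int, isDelete: Boolean)
rdd.foreachPartition { rows =>
  val cluster = Cluster.builder().addContactPoint("127.0.0.1").build()
  val session = cluster.connect("my_keyspace")
  val insert  = session.prepare("INSERT INTO my_table (id, value) VALUES (?, ?)")
  val delete  = session.prepare("DELETE FROM my_table WHERE id = ?")

  // Group rows into modest unlogged batches to limit pressure on the cluster.
  rows.grouped(64).foreach { group =>
    val batch = new BatchStatement(BatchStatement.Type.UNLOGGED)
    group.foreach {
      case (id, _, true)      => batch.add(delete.bind(id))
      case (id, value, false) => batch.add(insert.bind(id, Int.box(value)))
    }
    session.execute(batch)
  }
  session.close()
  cluster.close()
}
```

Note that large multi-partition batches amplify load on the coordinator; keeping batch sizes small (or batching only rows sharing a partition key) is usually kinder to a cluster that is also serving reads.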
> 
> On Thu, Sep 24, 2015 at 1:27 PM, Gerard Maas <gerard.m...@gmail.com> wrote:
> How are you loading the data? I mean, what insert method are you using?
> 
> On Thu, Sep 24, 2015 at 9:58 PM, Benyi Wang <bewang.t...@gmail.com> wrote:
> I have a Cassandra cluster that provides data to a web service, and there is 
> a daily batch load writing data into the cluster.
> 
> Without the batch load, the service’s latency 99thPercentile is 3 ms, but 
> during the load it jumps to 90 ms. I checked the Cassandra keyspace’s 
> ReadLatency.99thPercentile, which jumps from 600 microseconds to 1 ms, and 
> the service’s Cassandra Java driver request 99thPercentile was 90 ms during 
> the load. The Java driver took the most time. I know the Cassandra servers 
> are busy writing, but I want to know what kinds of metrics can identify 
> where the bottleneck is so that I can tune it.
> 
> I’m using Cassandra 2.1.8 and Cassandra Java Driver 2.1.5.
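On the client side, one place to look is the Java driver’s own metrics, which expose request latencies and error counters (and are also published over JMX). A hedged sketch, assuming direct access to the driver’s Cluster object and following Java driver 2.1’s Metrics API; the contact point is hypothetical:

```scala
import com.datastax.driver.core.Cluster

val cluster = Cluster.builder().addContactPoint("127.0.0.1").build()

// Dropwizard metrics maintained by the driver: request latency timer,
// per-category error counters, open connections, etc.
val metrics = cluster.getMetrics
val timer   = metrics.getRequestsTimer

// Snapshot values are in nanoseconds.
println(s"p99 request latency (ns): ${timer.getSnapshot.get99thPercentile}")
println(s"connection errors: ${metrics.getErrorMetrics.getConnectionErrors.getCount}")
```

Comparing the driver-side request timer against the server-side ReadLatency histogram can help separate time spent in Cassandra itself from time spent queuing in the driver or on the network.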
> 

Regards,

Ryan Svihla
