Re: Inconsistent data in Cassandra
Hi Padma,

Have you considered reducing the dataset before writing it to Cassandra? It looks like this consistency problem could be avoided by removing the unnecessary records from the dataset before persisting it:

    val onlyMax = rddByPrimaryKey.reduceByKey { case (x, y) => Max(x, y) }
    // Max is a placeholder: your max function will need to pick the right
    // record from those attached to the same primary key

-kr, Gerard.
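To make the suggested reduction concrete, here is a minimal sketch of a "pick the latest envelope" function. The Ticket case class and its field names are assumptions for illustration; they do not come from the original post. Plain Scala collections (Scala 2.13) stand in for the RDD so the snippet runs without a cluster; on a pair RDD keyed by ticket number, the same latest function could be passed to reduceByKey.

```scala
// Hypothetical record shape: field names are assumptions, not from the thread.
final case class Ticket(ticketNumber: String, envelope: Int, payload: String)

object DedupSketch {
  // Keep whichever of the two records carries the higher envelope number.
  def latest(a: Ticket, b: Ticket): Ticket =
    if (a.envelope >= b.envelope) a else b

  // Collections stand-in for:
  //   rdd.map(t => (t.ticketNumber, t)).reduceByKey(latest)
  // Groups the batch by primary key and keeps one record per key.
  def onlyLatest(batch: Seq[Ticket]): Map[String, Ticket] =
    batch.groupMapReduce(_.ticketNumber)(identity)(latest)
}
```

With one surviving record per ticket number in each batch, the write no longer depends on the order in which rows sharing a primary key reach Cassandra.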
Inconsistent data in Cassandra
Hi All,

I have the following scenario when writing rows to Cassandra from Spark Streaming: in a 1-second batch, I have 3 tickets with the same ticket number (the primary key) but with different envelope numbers (i.e. envelope 1, envelope 2, envelope 3). I am writing these messages to Cassandra using saveToCassandra.

When I check the C* DB, I see that some rows were updated by envelope 1 and other rows by envelope 3, so the rows are inconsistent. Ideally, all the rows should contain the data of envelope 3.

I have not set any parameters such as:

spark.cassandra.output.batch.size.rows
spark.cassandra.output.batch.buffer.size
spark.cassandra.output.concurrent.writes

What would be the default values for these? Can someone shed light on the issue?

Regards,
Padma Ch