Re: Inconsistent data in Cassandra

2015-12-13 Thread Gerard Maas
Hi Padma,

Have you considered reducing the dataset before writing it to Cassandra? Looks 
like this consistency problem could be avoided by cleaning the dataset of 
unnecessary records before persisting it:

val onlyMax = rddByPrimaryKey.reduceByKey { case (x, y) => max(x, y) } // your
max function here will need to pick the right record from those attached to
the same primary key
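To make that concrete, here is a minimal sketch of what the max function could look like. The `Ticket` case class and the field names (`ticketNumber`, `envelope`, `payload`) are assumptions standing in for your actual schema; the local `groupBy`/`reduce` only illustrates what `reduceByKey` does per key on an RDD:

```scala
// Hypothetical record type -- adjust to match your actual schema.
case class Ticket(ticketNumber: String, envelope: Int, payload: String)

// Pick the record with the highest envelope number for a given key.
def latest(x: Ticket, y: Ticket): Ticket =
  if (x.envelope >= y.envelope) x else y

// Local illustration of the per-key reduction reduceByKey performs:
val batch = Seq(
  Ticket("T-1", 1, "a"),
  Ticket("T-1", 2, "b"),
  Ticket("T-1", 3, "c"))
val deduped = batch.groupBy(_.ticketNumber).values.map(_.reduce(latest))
// deduped keeps only the envelope-3 record for ticket T-1

// In the Spark job the same function plugs straight into reduceByKey
// (sketch, assuming an RDD keyed by primary key):
// val rddByPrimaryKey: RDD[(String, Ticket)] = ...
// rddByPrimaryKey.reduceByKey(latest).values.saveToCassandra("ks", "tickets")
```

Since only one row per primary key survives the reduction, there is no longer a race between concurrent writes for the same key inside a batch.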

-kr, Gerard.


Inconsistent data in Cassandra

2015-12-06 Thread Priya Ch
Hi All,

  I have the following scenario when writing rows to Cassandra from Spark
Streaming:

In a 1-second batch, I have 3 tickets with the same ticket number (the
primary key) but with different envelope numbers (i.e. envelope 1, envelope
2, envelope 3). I am writing these messages to Cassandra using
saveToCassandra. When I verify the C* DB, I see that some rows are updated
by envelope 1 and other rows by envelope 3, which means the rows are
inconsistent. Ideally all the rows should contain the data of envelope 3.
I have not set any of the following parameters:
spark.cassandra.output.batch.size.rows
spark.cassandra.output.batch.buffer.size
spark.cassandra.output.concurrent.writes

What are the default values for these?

Can someone shed light on this issue?

Regards,
Padma Ch