Ideally the two messages read from Kafka should differ on at least one
field; otherwise they are logically the same.
As a solution to your problem, if the message content is the same, you
could add a new UUID field, which could serve as the partition key when
inserting the two messages into Cassandra.
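A minimal sketch of that idea, using a hypothetical message envelope (the field names are assumptions, not from the thread): each payload gets a fresh UUID, so two identical Kafka messages map to distinct Cassandra partition keys instead of overwriting each other.

```scala
import java.util.UUID

// Hypothetical envelope: payload is the Kafka message body, id is the
// synthetic UUID that would serve as (part of) the Cassandra partition key.
case class KeyedMessage(id: UUID, payload: String)

def withUniqueId(payload: String): KeyedMessage =
  KeyedMessage(UUID.randomUUID(), payload)

val m1 = withUniqueId("same-content")
val m2 = withUniqueId("same-content")
// Identical payloads, but distinct keys, so neither insert clobbers the other.
```

Note the trade-off: a random UUID makes every message unique, which also means you lose Cassandra's natural upsert-based deduplication for genuine retries.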
Msg1 -
(removing dev from the To: line, as it's not relevant)
It would be good to see some sample data and the Cassandra schema to get a
more concrete idea of the problem space.
Some thoughts: reduceByKey could still be used to 'pick' one element. An
example of arbitrarily choosing the first one: reduceByKey { case (e1, e2) => e1 }
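The reduceByKey idea can be modeled on plain Scala collections (a sketch: on a real RDD or pair DStream the call would be pairs.reduceByKey { case (e1, e2) => e1 }; note that unlike this ordered collection model, Spark gives no guarantee about which duplicate arrives first):

```scala
// Duplicate keys with different values, as they might arrive from Kafka.
val pairs = Seq(("key-a", "first"), ("key-a", "second"), ("key-b", "only"))

// groupBy + reduce mirrors reduceByKey's semantics: one element survives
// per key; the reduce function (e1, e2) => e1 keeps the first seen.
val deduped: Map[String, String] =
  pairs
    .groupBy(_._1)
    .map { case (k, vs) => k -> vs.map(_._2).reduce((e1, e2) => e1) }
```

This collapses all duplicates per key before the Cassandra write, so only one row per key is ever inserted.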
Hi,
Just my two cents. I understand your problem is that you have messages with
the same key in two different DStreams. What I would do is make a union of
all the DStreams with StreamingContext.union or several calls to
DStream.union, and then create a pair DStream
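The union-then-key step can be sketched at the collection level (field names here are assumptions; in a real job this would be StreamingContext.union(Seq(d1, d2)) followed by .map(m => (m.key, m)) to build the pair DStream):

```scala
// Hypothetical message shape; key is whatever identifies a logical message.
case class Msg(key: String, body: String)

val stream1 = Seq(Msg("k1", "from stream 1"))
val stream2 = Seq(Msg("k1", "from stream 2"), Msg("k2", "other"))

// Union the streams, then key each element so duplicates across the
// original streams land under the same key, ready for reduceByKey.
val unioned = stream1 ++ stream2
val paired: Seq[(String, Msg)] = unioned.map(m => (m.key, m))
```

Once everything is in one keyed stream, the reduceByKey trick above applies across all sources at once.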
Hi All,
Can someone share some insight on this?
On Wed, Jul 29, 2015 at 8:29 AM, Priya Ch
wrote:
>
>
> Hi TD,
>
> Thanks for the info. I have the scenario like this.
>
> I am reading the data from kafka topic. Let's say kafka has 3 partitions
> for the topic. In my streaming application, I woul
You have to partition the data in Spark Streaming by the primary key, and
then make sure you insert the data into Cassandra atomically per key, or per
set of keys in the partition. You can use the combination of the (batch
time, partition id) of the RDD inside foreachRDD as the unique id for
the d
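A small sketch of composing that unique id (the function and values below are illustrative assumptions; in a real job batchTimeMs would come from the Time passed to foreachRDD's closure, e.g. time.milliseconds, and partitionId from TaskContext.getPartitionId()):

```scala
// Deterministic id per (batch, partition): the same batch replayed after a
// failure produces the same id, which is what makes idempotent writes possible.
def uniqueBatchId(batchTimeMs: Long, partitionId: Int): String =
  s"$batchTimeMs-$partitionId"

val id = uniqueBatchId(1438166940000L, 2)
```

Because the id is derived rather than random, a re-executed batch overwrites its own earlier partial output instead of duplicating it.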