I have a a Spark Streaming process that consumes records off a Kafka topic, 
processes them and sends them to a producer to publish on another topic. I 
would like to add a sequence number column that can be used to identify records 
that have the same key and be incremented for each duplicate reoccurence of 
that key. For example if the output sent to the producer is

Key, col1, col2, seqnum 
A, 67, dog, 1 
B, 56, cat, 1 
C, 89, fish, 1
then if A reoccurs within a reasonable time interval Spark would produce the 
following:

A, 67, dog, 2 
B, 56, cat, 2
etc. How would I do that ? I suspect that this is a pattern that occurs 
frequently, but I haven't found any examples.



Sent from my iPhone

Reply via email to