I have a Spark Streaming process that consumes records from a Kafka topic, processes them, and sends them to a producer that publishes them on another topic. I would like to add a sequence number column that identifies records with the same key and is incremented each time that key reoccurs. For example, if the output sent to the producer is

Key, col1, col2, seqnum
A, 67, dog, 1
B, 56, cat, 1
C, 89, fish, 1

then if A reoccurs within a reasonable time interval, Spark would produce the following:

A, 67, dog, 2
B, 56, cat, 2

etc. How would I do that? I suspect that this is a pattern that occurs frequently, but I haven't found any examples.
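
To make the requirement concrete, here is a minimal sketch in plain Python (not Spark code) of the behaviour I am after. It only illustrates the semantics, not how to keep this state in a streaming job. The names `KeySequencer`, `tag_records`, and `ttl_seconds` are hypothetical, and I am assuming that "a reasonable time interval" means a fixed timeout measured from the last time the key was seen.

```python
from typing import Dict, List, Tuple

# Input record: (key, col1, col2, event_time_in_seconds)
Record = Tuple[str, int, str, float]
# Output record: (key, col1, col2, seqnum)
Tagged = Tuple[str, int, str, int]


class KeySequencer:
    """Per-key counter that restarts at 1 once a key has been idle too long."""

    def __init__(self, ttl_seconds: float) -> None:
        # Assumption: the "reasonable time interval" is a fixed idle timeout.
        self.ttl_seconds = ttl_seconds
        # key -> (last sequence number issued, time the key was last seen)
        self._state: Dict[str, Tuple[int, float]] = {}

    def next_seqnum(self, key: str, event_time: float) -> int:
        last = self._state.get(key)
        if last is not None and event_time - last[1] <= self.ttl_seconds:
            seq = last[0] + 1  # key reoccurred within the interval
        else:
            seq = 1            # first occurrence, or the interval has elapsed
        self._state[key] = (seq, event_time)
        return seq


def tag_records(records: List[Record], sequencer: KeySequencer) -> List[Tagged]:
    """Append a seqnum column to each record, in arrival order."""
    return [
        (key, col1, col2, sequencer.next_seqnum(key, ts))
        for key, col1, col2, ts in records
    ]


if __name__ == "__main__":
    sequencer = KeySequencer(ttl_seconds=60.0)
    batch = [
        ("A", 67, "dog", 0.0),
        ("B", 56, "cat", 1.0),
        ("C", 89, "fish", 2.0),
        ("A", 67, "dog", 10.0),
        ("B", 56, "cat", 11.0),
    ]
    for row in tag_records(batch, sequencer):
        print(", ".join(str(field) for field in row))
```

In the streaming job the dictionary would have to be replaced by state that Spark maintains per key across batches, and that is the part I do not know how to do.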