Are there any known strategies for handling duplicate events during ingestion? I use Flume to ingest Apache logs and parse each request with Morphlines, and some requests are duplicates that differ only in certain query parameters. I would like to handle these after I have parsed the request and split the query parameters into tokens in Morphlines. How can I look up previous events in the stream (say, within a 5-minute window) and de-duplicate before writing to the sink?
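To make the idea concrete, here is a rough sketch of the windowed lookup I have in mind, written in plain Python rather than against any Flume or Morphlines API. All names (`WindowDeduper`, `dedup_key`, the ignored `ts` parameter) are hypothetical, and it assumes events arrive with non-decreasing timestamps and that the state fits in the memory of a single agent.

```python
from collections import OrderedDict


def dedup_key(path, params, ignored=("ts",)):
    """Build a hashable key from the parsed request, skipping the
    query parameters that are allowed to differ between duplicates.
    The default ignored parameter "ts" is only an illustration."""
    kept = sorted((k, v) for k, v in params.items() if k not in ignored)
    return (path, tuple(kept))


class WindowDeduper:
    """Flag events whose key was already seen within the last
    `window_seconds` (measured from the first sighting of the key)."""

    def __init__(self, window_seconds=300):
        self.window_seconds = window_seconds
        # key -> timestamp of first sighting, in insertion (oldest-first) order
        self._seen = OrderedDict()

    def _evict(self, now):
        # Drop keys whose first sighting has fallen out of the window.
        # Relies on timestamps being non-decreasing.
        while self._seen:
            key, ts = next(iter(self._seen.items()))
            if now - ts < self.window_seconds:
                break
            self._seen.popitem(last=False)

    def is_duplicate(self, key, now):
        self._evict(now)
        if key in self._seen:
            return True
        self._seen[key] = now
        return False
```

In this sketch an event would be dropped when `is_duplicate(dedup_key(path, params), event_time)` returns true. What I am unsure about is where logic like this could live in the Flume pipeline, and whether keeping such state in memory is reasonable.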
Thanks!
