That would have to be done outside Flume, perhaps using something like Spark Streaming or Storm.
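For example, here is a minimal Spark Streaming sketch of the idea, not a drop-in solution: it assumes your Flume agent has an Avro sink pushing to Spark on localhost:4141 (both placeholders), and dedupeKey is a hypothetical helper you would replace with whatever key you derive after parsing the request and dropping the query params that are allowed to differ.

import java.nio.charset.StandardCharsets

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Minutes, Seconds, StreamingContext}
import org.apache.spark.streaming.flume.FlumeUtils

object WindowedDedupe {
  // Placeholder: derive the dedupe key from a parsed log line,
  // e.g. request path plus only the query params you care about.
  def dedupeKey(line: String): String = line

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("FlumeWindowedDedupe")
    val ssc = new StreamingContext(conf, Seconds(30))

    // Receive events pushed by the Flume Avro sink (host/port are placeholders).
    val flumeStream = FlumeUtils.createStream(ssc, "localhost", 4141)

    // Decode the raw event body into a log line.
    val lines = flumeStream.map(e =>
      StandardCharsets.UTF_8.decode(e.event.getBody).toString)

    // Keep only the first event seen per key within each 5-minute window,
    // sliding every 30 seconds.
    val deduped = lines
      .map(line => (dedupeKey(line), line))
      .reduceByKeyAndWindow((first: String, _: String) => first, Minutes(5), Seconds(30))
      .map(_._2)

    // Hand the de-duplicated lines off to whatever sink you use downstream.
    deduped.print()

    ssc.start()
    ssc.awaitTermination()
  }
}

One caveat with a plain sliding window: a duplicate that spans two overlapping windows can still be emitted once per window. If you need stricter semantics you would track seen keys with updateStateByKey (or mapWithState in newer Spark versions) and expire them after 5 minutes instead.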
Thanks,
Hari

On Fri, Apr 17, 2015 at 12:15 AM, Buntu Dev <[email protected]> wrote:
> Are there any known strategies to handle duplicate events during ingestion?
> I use Flume to ingest apache logs to parse the request using Morphlines and
> there are some duplicate requests with certain query params differing. I
> would like to handle these once I parse and split the query params into
> tokens in Morphlines. How does one lookup previous events in the stream
> (say in the 5min window) and de-dupe before writing to the sink?
> Thanks!
