Thanks, Hari. Couldn't one use some sort of lookup (maybe HBase) from an interceptor to check whether a given combination of query params (user + page + action key) has already been seen in the past 5 minutes, and skip the current event if it has?
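
To make the question concrete, here is a minimal sketch of that kind of windowed lookup. It is only an illustration under stated assumptions: an in-memory map stands in for HBase, and the class name `WindowedDeduper` and its methods are hypothetical, not part of Flume or Morphlines.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical helper (not a Flume class): remembers keys seen within a time window. */
class WindowedDeduper {
    private final long windowMillis;
    // Maps a de-dupe key to the timestamp (ms) at which it was last seen.
    // Assumption: an in-memory map stands in for an external store such as HBase.
    private final Map<String, Long> lastSeen = new ConcurrentHashMap<>();

    WindowedDeduper(long windowMillis) {
        this.windowMillis = windowMillis;
    }

    /** Builds the de-dupe key from the parsed query params. */
    static String key(String user, String page, String action) {
        return user + "|" + page + "|" + action;
    }

    /**
     * Records the key at nowMillis and returns true if the same key was
     * already seen less than windowMillis earlier. Every sighting refreshes
     * the stored timestamp, including sightings reported as duplicates.
     */
    boolean isDuplicate(String key, long nowMillis) {
        Long previous = lastSeen.put(key, nowMillis);
        return previous != null && nowMillis - previous < windowMillis;
    }

    /** Drops entries older than the window so the map does not grow without bound. */
    void evictExpired(long nowMillis) {
        lastSeen.entrySet().removeIf(e -> nowMillis - e.getValue() >= windowMillis);
    }

    int size() {
        return lastSeen.size();
    }
}
```

An interceptor could call `isDuplicate(...)` per event and drop the event when it returns true. Because this state lives in a single JVM, it would only catch duplicates that pass through the same agent and would be lost on restart; a shared store would be needed otherwise.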

On Fri, Apr 17, 2015 at 1:56 PM, Hari Shreedharan <[email protected]> wrote:

> That would have to be done outside Flume, perhaps using something like
> Spark Streaming, or Storm.
>
> Thanks,
> Hari
>
> On Fri, Apr 17, 2015 at 12:15 AM, Buntu Dev <[email protected]> wrote:
>
>> Are there any known strategies to handle duplicate events during
>> ingestion? I use Flume to ingest apache logs to parse the request using
>> Morphlines and there are some duplicate requests with certain query params
>> differing. I would like to handle these once I parse and split the query
>> params into tokens in Morphlines. How does one lookup previous events in
>> the stream (say in the 5min window) and de-dupe before writing to the sink?
>>
>> Thanks!
