Thanks Hari. Couldn't one use an interceptor with some sort of lookup
(maybe HBase) to check whether a given combination of query params
(user+page+action key) was already seen in the past 5 minutes, and skip
the current event if so?
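
To make the question concrete, here is a rough sketch of the kind of check
being asked about. It is only an illustration under assumptions: the class
name WindowDeduper and its methods are hypothetical, an in-memory map stands
in for the HBase lookup, and nothing here is Flume or Morphlines API. In a
real setup this logic would have to live inside a custom interceptor.

```python
import time
from collections import OrderedDict


class WindowDeduper:
    """Hypothetical helper: remembers keys seen in the last `window_seconds`."""

    def __init__(self, window_seconds=300, clock=time.monotonic):
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the window can be tested without sleeping
        # key -> time it was first seen; insertion order is oldest first
        self._first_seen = OrderedDict()

    def _evict_expired(self, now):
        # Drop keys whose first sighting has fallen out of the window.
        while self._first_seen:
            _, seen_at = next(iter(self._first_seen.items()))
            if now - seen_at < self.window_seconds:
                break
            self._first_seen.popitem(last=False)

    def is_duplicate(self, user, page, action):
        """Return True if this (user, page, action) key is still in the window."""
        now = self.clock()
        self._evict_expired(now)
        key = (user, page, action)
        if key in self._first_seen:
            return True  # caller would skip the event
        self._first_seen[key] = now
        return False  # first sighting, caller would pass the event on
```

The window here starts at the first sighting of a key and is not extended by
later duplicates. The state is local to one process and is lost on restart,
so with several agents a shared store (the HBase idea above) would presumably
still be needed.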



On Fri, Apr 17, 2015 at 1:56 PM, Hari Shreedharan <[email protected]> wrote:

> That would have to be done outside Flume, perhaps using something like
> Spark Streaming, or Storm.
>
> Thanks,
> Hari
>
>
> On Fri, Apr 17, 2015 at 12:15 AM, Buntu Dev <[email protected]> wrote:
>
>> Are there any known strategies for handling duplicate events during
>> ingestion? I use Flume to ingest Apache logs and parse each request with
>> Morphlines, and there are some duplicate requests that differ only in
>> certain query params. I would like to handle these once I have parsed and
>> split the query params into tokens in Morphlines. How does one look up
>> previous events in the stream (say, within a 5-minute window) and de-dupe
>> before writing to the sink?
>>
>> Thanks!
>>
>
>
