You can (and we did); just note that the HBase lookup will add at least 5 ms of latency per event.
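The quoted message below asks how to check whether a user+page+action key was already seen in the last 5 minutes. The following is a minimal sketch of that check. It keeps state in an in-memory map instead of HBase, so it is not the HBase setup described above. `WindowedDeduper` and `shouldForward` are hypothetical names and are not part of any Flume or HBase API. The code assumes each parsed event carries an event timestamp in milliseconds. Calling it from a custom interceptor is also an assumption.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical in-memory stand-in for the "seen user+page+action in the
 * last N minutes?" lookup. Not a Flume or HBase API; state is per process
 * and is lost on restart.
 */
class WindowedDeduper {
    private final long windowMillis;
    // key -> timestamp (ms) of the first forwarded event for that key
    private final Map<String, Long> firstSeen = new HashMap<>();
    private long lastEvictionMillis = Long.MIN_VALUE;

    WindowedDeduper(long windowMillis) {
        this.windowMillis = windowMillis;
    }

    /** Returns true if the event should be forwarded, false if it is a duplicate. */
    boolean shouldForward(String user, String page, String action, long eventTimeMillis) {
        evictExpired(eventTimeMillis);
        // Join with a control character so ("ab","c") and ("a","bc") don't collide.
        String key = user + "\u0001" + page + "\u0001" + action;
        Long seen = firstSeen.get(key);
        if (seen != null && eventTimeMillis - seen < windowMillis) {
            return false; // same key already forwarded within the window: skip it
        }
        firstSeen.put(key, eventTimeMillis);
        return true;
    }

    // Remove keys whose window has passed, at most once per window,
    // so memory stays bounded by the distinct keys seen in about one window.
    private void evictExpired(long nowMillis) {
        if (lastEvictionMillis != Long.MIN_VALUE && nowMillis - lastEvictionMillis < windowMillis) {
            return;
        }
        firstSeen.values().removeIf(t -> nowMillis - t >= windowMillis);
        lastEvictionMillis = nowMillis;
    }

    public static void main(String[] args) {
        WindowedDeduper dedupe = new WindowedDeduper(5 * 60 * 1000L); // 5-minute window
        System.out.println(dedupe.shouldForward("u1", "/cart", "add", 0L));       // true
        System.out.println(dedupe.shouldForward("u1", "/cart", "add", 60_000L));  // false, 1 min later
        System.out.println(dedupe.shouldForward("u2", "/cart", "add", 60_000L));  // true, different user
        System.out.println(dedupe.shouldForward("u1", "/cart", "add", 301_000L)); // true, window expired
    }
}
```

The window is measured from the first forwarded occurrence. A duplicate does not extend it, so a steady stream of repeats is still let through once every 5 minutes. The main trade-off against the HBase approach is scope. This map only sees events passing through one agent. Dedup across several agents, or across restarts, needs shared storage such as HBase, which brings the per-event latency mentioned above.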
On Fri, Apr 17, 2015 at 5:39 PM, Buntu Dev <[email protected]> wrote:
> Thanks Hari. Can't one use some sort of lookup (maybe HBase) in the
> interceptors to check whether a given combination of query params
> (user+page+action key) was already seen in the past 5 mins, and skip the
> current event?
>
> On Fri, Apr 17, 2015 at 1:56 PM, Hari Shreedharan
> <[email protected]> wrote:
>>
>> That would have to be done outside Flume, perhaps using something like
>> Spark Streaming or Storm.
>>
>> Thanks,
>> Hari
>>
>> On Fri, Apr 17, 2015 at 12:15 AM, Buntu Dev <[email protected]> wrote:
>>>
>>> Are there any known strategies for handling duplicate events during
>>> ingestion? I use Flume to ingest Apache logs and parse the requests with
>>> Morphlines, and some requests are duplicates that differ only in certain
>>> query params. I would like to handle these once I have parsed the query
>>> params and split them into tokens in Morphlines. How does one look up
>>> previous events in the stream (say, within a 5-min window) and de-dupe
>>> them before writing to the sink?
>>>
>>> Thanks!
