Hi Andrew, 

Really happy to hear that the Wikimedia Foundation is considering Flume. If you 
find such a source useful, I am fairly sure others would find it useful too. 
I'd recommend filing a JIRA and starting a discussion there, and then 
submitting the patch. We would be happy to review and commit it. 


Thanks,
Hari

-- 
Hari Shreedharan


On Monday, January 14, 2013 at 9:29 AM, Andrew Otto wrote:

> Hi all,
> 
> I'm a Systems Engineer at the Wikimedia Foundation, and we're investigating 
> using Flume for our web request log HDFS imports. We've previously been using 
> Kafka, but have had to change short-term architecture plans in order to get 
> data into HDFS reliably and regularly soon.
> 
> Our current web request logs are available for consumption over a multicast 
> UDP stream. I could hack something together to try and pipe this into Flume 
> using the existing sources (SyslogUDPSource, or maybe some combination of 
> socat + NetcatSource), but I'd rather reduce the number of moving parts. I'd 
> like to consume directly from the multicast UDP stream as a Flume source.
> 
> I coded up a proof of concept based on the SyslogUDPSource, mainly just 
> stripping out the syslog event header extraction, and adding in multicast 
> Datagram connection code. I plan on cleaning this up, and making this a 
> generic raw UDP source, with multicast being a configuration option.
> 
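
For anyone following along, the receive side of such a source could look 
roughly like the sketch below. It is not Andrew's proof of concept and it 
does not touch any Flume API: it uses only the JDK's MulticastSocket, and the 
class name, method names, and argument layout are all made up for 
illustration. It shows multicast as an option on top of a plain UDP listener, 
with each datagram's payload copied out whole, with no syslog header parsing.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.MulticastSocket;
import java.net.UnknownHostException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch; names are illustrative and not part of Flume.
class RawUdpReceiverSketch {

    // True if the host resolves to a multicast group address (224.0.0.0/4 for IPv4).
    static boolean isMulticastGroup(String host) {
        try {
            return InetAddress.getByName(host).isMulticastAddress();
        } catch (UnknownHostException e) {
            return false;
        }
    }

    // Copy only the bytes that were received; the backing buffer is reused per packet.
    static byte[] payloadOf(DatagramPacket packet) {
        int start = packet.getOffset();
        return Arrays.copyOfRange(packet.getData(), start, start + packet.getLength());
    }

    // Bind a UDP socket to the port; join the group only when multicast is enabled.
    // In plain UDP mode the host argument is ignored and the wildcard address is used.
    static MulticastSocket open(String host, int port, boolean multicast) throws IOException {
        MulticastSocket socket = new MulticastSocket(port);
        if (multicast) {
            if (!isMulticastGroup(host)) {
                socket.close();
                throw new IllegalArgumentException(host + " is not a multicast group address");
            }
            // A null interface defers to the socket's default network interface.
            socket.joinGroup(new InetSocketAddress(InetAddress.getByName(host), port), null);
        }
        return socket;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.out.println("usage: RawUdpReceiverSketch <host> <port> [multicast]");
            return;
        }
        boolean multicast = args.length > 2 && Boolean.parseBoolean(args[2]);
        try (MulticastSocket socket = open(args[0], Integer.parseInt(args[1]), multicast)) {
            byte[] buffer = new byte[65535]; // large enough for any single UDP datagram
            while (true) {
                DatagramPacket packet = new DatagramPacket(buffer, buffer.length);
                socket.receive(packet);
                // A real source would hand these bytes on as an event body instead of printing.
                System.out.println(new String(payloadOf(packet), StandardCharsets.UTF_8));
            }
        }
    }
}
```

Run with a group address, a port, and "true" as the third argument to join 
the group; leave the third argument off for plain UDP on that port.
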
> My question to you guys is, is this something the Flume community would find 
> useful? If so, should I open up a JIRA to track this? I've got a fork of the 
> Flume git repo over on GitHub and will be doing my work there. I'd love to 
> share it upstream if it would be useful.
> 
> Thanks!
> -Andrew Otto
> Systems Engineer
> Wikimedia Foundation
> 
> 

