Hi Andrew,

Really happy to hear that the Wikimedia Foundation is considering Flume. If you find such a source useful, I am fairly sure others would find it useful too. I'd recommend filing a JIRA to start a discussion, and then submitting the patch. We would be happy to review and commit it.

Thanks,
Hari

-- 
Hari Shreedharan

On Monday, January 14, 2013 at 9:29 AM, Andrew Otto wrote:

> Hi all,
>
> I'm a Systems Engineer at the Wikimedia Foundation, and we're investigating
> using Flume for our web request log HDFS imports. We've previously been using
> Kafka, but have had to change short-term architecture plans in order to get
> data into HDFS reliably and regularly soon.
>
> Our current web request logs are available for consumption over a multicast
> UDP stream. I could hack something together to try and pipe this into Flume
> using the existing sources (SyslogUDPSource, or maybe some combination of
> socat + NetcatSource), but I'd rather reduce the number of moving parts. I'd
> like to consume directly from the multicast UDP stream as a Flume source.
>
> I coded up a proof of concept based on the SyslogUDPSource, mainly just
> stripping out the syslog event header extraction, and adding in multicast
> Datagram connection code. I plan on cleaning this up, and making this a
> generic raw UDP source, with multicast being a configuration option.
>
> My question to you guys is, is this something the Flume community would find
> useful? If so, should I open up a JIRA to track this? I've got a fork of the
> Flume git repo over on GitHub and will be doing my work there. I'd love to
> share it upstream if it would be useful.
>
> Thanks!
> -Andrew Otto
> Systems Engineer
> Wikimedia Foundation
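
For anyone following the thread, the idea in the quoted message (take raw UDP datagrams as-is, with multicast as an option) might look roughly like the sketch below. This is not Andrew's proof of concept and it does not use Flume's source API; it relies only on java.net. The class RawUdpReceiver and its methods extractBody, isMulticast, open and receiveOne are hypothetical names made up for illustration.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.MulticastSocket;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of Flume: receives raw UDP datagrams,
// optionally after joining a multicast group.
class RawUdpReceiver {

    // Copy only the bytes that were actually received. Unlike a syslog
    // source, no header is parsed or stripped here.
    static byte[] extractBody(DatagramPacket packet) {
        int start = packet.getOffset();
        return Arrays.copyOfRange(packet.getData(), start, start + packet.getLength());
    }

    // True if the host resolves to a multicast address (e.g. 224.0.0.0/4 for IPv4).
    static boolean isMulticast(String host) throws IOException {
        return InetAddress.getByName(host).isMulticastAddress();
    }

    // Bind to the port; join the group only when the multicast option is set.
    static MulticastSocket open(String host, int port, boolean multicast) throws IOException {
        MulticastSocket socket = new MulticastSocket(port);
        if (multicast) {
            if (!isMulticast(host)) {
                socket.close();
                throw new IllegalArgumentException(host + " is not a multicast address");
            }
            // A null interface defers to the socket's default network interface.
            socket.joinGroup(new InetSocketAddress(InetAddress.getByName(host), port), null);
        }
        return socket;
    }

    // Block until one datagram arrives and return its payload.
    static byte[] receiveOne(MulticastSocket socket, int maxSize) throws IOException {
        DatagramPacket packet = new DatagramPacket(new byte[maxSize], maxSize);
        socket.receive(packet);
        return extractBody(packet);
    }

    // Network-free demo: only the first 9 bytes of the buffer count as payload.
    public static void main(String[] args) {
        byte[] buffer = "GET /wiki\n(unused buffer space)".getBytes(StandardCharsets.UTF_8);
        DatagramPacket packet = new DatagramPacket(buffer, 0, 9);
        System.out.println(new String(extractBody(packet), StandardCharsets.UTF_8));
    }
}
```

A real source would call receiveOne in a loop and hand each payload to a Flume channel as an event body; that wiring is left out here because it depends on the Flume version and on how the patch ends up being structured.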
