Or, if that doesn't work, try the Netcat source.

Sent from my iPhone

> On Oct 10, 2013, at 11:46 PM, Mike Percy <[email protected]> wrote:
>
> Check out the latest trunk code... We just committed FLUME-1666 courtesy of
> Jeff Lord this week.
>
> Mike
>
> Sent from my iPhone
>
>> On Oct 10, 2013, at 11:56 AM, DSuiter RDX <[email protected]> wrote:
>>
>> Hi all,
>>
>> We set up a pipeline to get rsyslog input from a remote server via TCP using
>> rsyslog remote TCP forwarding functionality. The data gets sent from the
>> server to a syslogTCP source, delivered to an Avro sink via memory channel,
>> which then delivers it to an Avro source channeled to an HDFS sink. It is
>> moving from source to destination fine, but the output is messy in HDFS. I
>> realize some of it is Avro schema being defined, but there are Severity and
>> Facility markers, and extra timestamps that do not appear in
>> /var/log/messages in the original server.
>>
>> I am wondering if anyone can help us eliminate them? The extra information
>> is not useful, so if we could get the information down to what is showing up
>> in the /var/log/messages, that would simplify the next task of sorting the
>> data in MapReduce.
>>
>> Here is the agent recipe, and a scrubbed sample of the data we are getting.
>>
>> Recipe:
>> RT_syslog.sources = syslogTCP_RT_Tier1_Source avro_RT_Tier2_Source
>> RT_syslog.sinks = avro_RT_Tier1_Sink HDFS_RT_Tier2_Sink
>> RT_syslog.channels = memory_RT_Tier1_Channel memory_RT_Tier2_Channel
>>
>> # sources
>> RT_syslog.sources.syslogTCP_RT_Tier1_Source.type = syslogtcp
>> RT_syslog.sources.syslogTCP_RT_Tier1_Source.host = 12.34.56.78
>> RT_syslog.sources.syslogTCP_RT_Tier1_Source.port = 5140
>> RT_syslog.sources.syslogTCP_RT_Tier1_Source.channels = memory_RT_Tier1_Channel
>>
>> # channels
>> RT_syslog.channels.memory_RT_Tier1_Channel.type = memory
>> RT_syslog.channels.memory_RT_Tier1_Channel.capacity = 1500
>> RT_syslog.channels.memory_RT_Tier1_Channel.transactionCapacity = 1500
>>
>> # sinks
>> RT_syslog.sinks.avro_RT_Tier1_Sink.type = avro
>> RT_syslog.sinks.avro_RT_Tier1_Sink.hostname = 12.34.56.78
>> RT_syslog.sinks.avro_RT_Tier1_Sink.port = 5141
>> RT_syslog.sinks.avro_RT_Tier1_Sink.batch-size = 1500
>> RT_syslog.sinks.avro_RT_Tier1_Sink.channel = memory_RT_Tier1_Channel
>>
>> # sources
>> RT_syslog.sources.avro_RT_Tier2_Source.type = avro
>> RT_syslog.sources.avro_RT_Tier2_Source.bind = 12.34.56.78
>> RT_syslog.sources.avro_RT_Tier2_Source.port = 5141
>> RT_syslog.sources.avro_RT_Tier2_Source.channels = memory_RT_Tier2_Channel
>>
>> # channels
>> RT_syslog.channels.memory_RT_Tier2_Channel.type = memory
>> RT_syslog.channels.memory_RT_Tier2_Channel.capacity = 15000
>> RT_syslog.channels.memory_RT_Tier2_Channel.transactionCapacity = 15000
>>
>> # sinks
>> RT_syslog.sinks.HDFS_RT_Tier2_Sink.type = hdfs
>> RT_syslog.sinks.HDFS_RT_Tier2_Sink.channel = memory_RT_Tier2_Channel
>> RT_syslog.sinks.HDFS_RT_Tier2_Sink.hdfs.path = /user/flume/RT_syslog
>> RT_syslog.sinks.HDFS_RT_Tier2_Sink.hdfs.fileSuffix = .avro
>> RT_syslog.sinks.HDFS_RT_Tier2_Sink.serializer = avro_event
>> RT_syslog.sinks.HDFS_RT_Tier2_Sink.hdfs.fileType = DataStream
>> RT_syslog.sinks.HDFS_RT_Tier2_Sink.hdfs.rollInterval = 86400
>> RT_syslog.sinks.HDFS_RT_Tier2_Sink.hdfs.rollSize = 134217728
>> RT_syslog.sinks.HDFS_RT_Tier2_Sink.hdfs.batchSize = 15000
>> RT_syslog.sinks.HDFS_RT_Tier2_Sink.hdfs.rollCount = 0
>>
>> Data we are getting in HDFS:
>>
>> {u'body': "RT: Ticket XXXXXX created in queue 'General' by info
>> (/opt/rt4/sbin/../lib/RT/Ticket.pm:694)",
>> u'headers': {u'timestamp': u'1381256530000', u'host': u'server001',
>> u'Severity': u'6', u'Facility': u'1'}}
>>
>> What that looks like in original form:
>>
>> Oct 10 11:33:42 server001 RT: Ticket XXXXXX created in queue 'General' by
>> info (/opt/rt4/sbin/../lib/RT/Ticket.pm:694)
>>
>> Thanks!
>> Devin Suiter
>> Jr. Data Solutions Software Engineer
>>
>> 100 Sandusky Street | 2nd Floor | Pittsburgh, PA 15212
>> Google Voice: 412-256-8556 | www.rdx.com
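
If the events have already landed in HDFS in the shape shown above, a rough way to get back to something like the /var/log/messages line is to rebuild it from the headers and body after the fact. The sketch below is only an illustration, not anything Flume ships: the function name format_syslog_line is made up, it assumes the 'timestamp' header is epoch milliseconds, that the body has already been decoded to text, and it renders the time in UTC because the original server's time zone is not known from the sample. The Severity and Facility headers are simply dropped.

```python
from datetime import datetime, timezone

# Month abbreviations spelled out so the result does not depend on the locale.
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def format_syslog_line(event):
    """Rebuild a '<Mon> <d> <HH:MM:SS> <host> <message>' line from one event.

    `event` is assumed to be a dict with a 'body' string and a 'headers'
    dict holding 'timestamp' (epoch milliseconds, as a string) and 'host'.
    Any other headers, such as Severity and Facility, are ignored.
    """
    headers = event["headers"]
    seconds = int(headers["timestamp"]) // 1000
    # Assumption: render in UTC; swap in the server's zone if it is known.
    when = datetime.fromtimestamp(seconds, tz=timezone.utc)
    # Classic syslog pads the day of month with a space, not a zero.
    stamp = "%s %2d %02d:%02d:%02d" % (
        MONTHS[when.month - 1], when.day, when.hour, when.minute, when.second)
    return "%s %s %s" % (stamp, headers["host"], event["body"])


# The scrubbed sample from the thread, as a plain dict.
sample = {
    "body": "RT: Ticket XXXXXX created in queue 'General' by info "
            "(/opt/rt4/sbin/../lib/RT/Ticket.pm:694)",
    "headers": {"timestamp": "1381256530000", "host": "server001",
                "Severity": "6", "Facility": "1"},
}

if __name__ == "__main__":
    print(format_syslog_line(sample))
```

Note that the scrubbed timestamp in the sample does not correspond to the Oct 10 11:33:42 line quoted above, so the rebuilt line will not match it exactly.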
