Or, if that doesn't work, try the Netcat source.
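
A minimal sketch of that swap, assuming the Tier 1 source keeps its existing name, address, port, and channel from the recipe quoted below; only the source block changes. The Netcat source does no syslog parsing: it turns each line of text into one event body, so it should not add Severity or Facility headers, but whatever rsyslog puts on the wire (including any leading <PRI> tag) stays in the body. The ack-every-event and max-line-length values here are assumptions to tune, not settings taken from this thread.

```properties
# Tier 1 source switched from syslogtcp to netcat (a sketch, not a tested recipe)
RT_syslog.sources.syslogTCP_RT_Tier1_Source.type = netcat
RT_syslog.sources.syslogTCP_RT_Tier1_Source.bind = 12.34.56.78
RT_syslog.sources.syslogTCP_RT_Tier1_Source.port = 5140
# Assumption: do not write "OK" back to rsyslog for every event received
RT_syslog.sources.syslogTCP_RT_Tier1_Source.ack-every-event = false
# Assumption: raise the 512-byte default so longer messages fit in one event
RT_syslog.sources.syslogTCP_RT_Tier1_Source.max-line-length = 2048
RT_syslog.sources.syslogTCP_RT_Tier1_Source.channels = memory_RT_Tier1_Channel
```

Note that the netcat source takes "bind" where the syslogtcp source takes "host"; the rest of the recipe can stay as it is.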

> On Oct 10, 2013, at 11:46 PM, Mike Percy <[email protected]> wrote:
> 
> Check out the latest trunk code... We just committed FLUME-1666 courtesy of 
> Jeff Lord this week.
> 
> Mike
> 
>> On Oct 10, 2013, at 11:56 AM, DSuiter RDX <[email protected]> wrote:
>> 
>> Hi all,
>> 
>> We set up a pipeline that receives rsyslog input from a remote server using 
>> rsyslog's remote TCP forwarding. The server sends the data to a syslogTCP 
>> source, which passes it through a memory channel to an Avro sink; that sink 
>> delivers it to an Avro source, which is channeled to an HDFS sink. The data 
>> moves from source to destination fine, but the output in HDFS is messy. I 
>> realize some of that comes from the Avro schema, but there are also Severity 
>> and Facility markers and extra timestamps that do not appear in 
>> /var/log/messages on the original server.
>> 
>> I am wondering if anyone can help us eliminate them? The extra information 
>> is not useful, so if we could get the information down to what is showing up 
>> in the /var/log/messages, that would simplify the next task of sorting the 
>> data in MapReduce.
>> 
>> Here is the agent recipe, and a scrubbed sample of the data we are getting.
>> 
>> Recipe:
>> RT_syslog.sources = syslogTCP_RT_Tier1_Source avro_RT_Tier2_Source
>> RT_syslog.sinks = avro_RT_Tier1_Sink HDFS_RT_Tier2_Sink
>> RT_syslog.channels = memory_RT_Tier1_Channel memory_RT_Tier2_Channel
>> 
>> # sources
>> RT_syslog.sources.syslogTCP_RT_Tier1_Source.type = syslogtcp
>> RT_syslog.sources.syslogTCP_RT_Tier1_Source.host = 12.34.56.78
>> RT_syslog.sources.syslogTCP_RT_Tier1_Source.port = 5140
>> RT_syslog.sources.syslogTCP_RT_Tier1_Source.channels = memory_RT_Tier1_Channel
>> 
>> # channels
>> RT_syslog.channels.memory_RT_Tier1_Channel.type = memory
>> RT_syslog.channels.memory_RT_Tier1_Channel.capacity = 1500
>> RT_syslog.channels.memory_RT_Tier1_Channel.transactionCapacity = 1500
>> 
>> # sinks
>> RT_syslog.sinks.avro_RT_Tier1_Sink.type = avro
>> RT_syslog.sinks.avro_RT_Tier1_Sink.hostname = 12.34.56.78
>> RT_syslog.sinks.avro_RT_Tier1_Sink.port = 5141
>> RT_syslog.sinks.avro_RT_Tier1_Sink.batch-size = 1500
>> RT_syslog.sinks.avro_RT_Tier1_Sink.channel = memory_RT_Tier1_Channel
>> 
>> # sources
>> RT_syslog.sources.avro_RT_Tier2_Source.type = avro
>> RT_syslog.sources.avro_RT_Tier2_Source.bind = 12.34.56.78
>> RT_syslog.sources.avro_RT_Tier2_Source.port = 5141
>> RT_syslog.sources.avro_RT_Tier2_Source.channels = memory_RT_Tier2_Channel
>> 
>> # channels
>> RT_syslog.channels.memory_RT_Tier2_Channel.type = memory
>> RT_syslog.channels.memory_RT_Tier2_Channel.capacity = 15000
>> RT_syslog.channels.memory_RT_Tier2_Channel.transactionCapacity = 15000
>> 
>> # sinks
>> RT_syslog.sinks.HDFS_RT_Tier2_Sink.type = hdfs
>> RT_syslog.sinks.HDFS_RT_Tier2_Sink.channel = memory_RT_Tier2_Channel
>> RT_syslog.sinks.HDFS_RT_Tier2_Sink.hdfs.path = /user/flume/RT_syslog
>> RT_syslog.sinks.HDFS_RT_Tier2_Sink.hdfs.fileSuffix = .avro
>> RT_syslog.sinks.HDFS_RT_Tier2_Sink.serializer = avro_event
>> RT_syslog.sinks.HDFS_RT_Tier2_Sink.hdfs.fileType = DataStream
>> RT_syslog.sinks.HDFS_RT_Tier2_Sink.hdfs.rollInterval = 86400
>> RT_syslog.sinks.HDFS_RT_Tier2_Sink.hdfs.rollSize = 134217728
>> RT_syslog.sinks.HDFS_RT_Tier2_Sink.hdfs.batchSize = 15000
>> RT_syslog.sinks.HDFS_RT_Tier2_Sink.hdfs.rollCount = 0
>> 
>> Data we are getting in HDFS:
>> 
>> {u'body': "RT: Ticket XXXXXX created in queue 'General' by info 
>> (/opt/rt4/sbin/../lib/RT/Ticket.pm:694)",
>> u'headers': {u'timestamp': u'1381256530000', u'host': u'server001', 
>> u'Severity': u'6', u'Facility': u'1'}}
>> 
>> What that looks like in original form:
>> 
>> Oct 10 11:33:42 server001 RT: Ticket XXXXXX created in queue 'General' by 
>> info (/opt/rt4/sbin/../lib/RT/Ticket.pm:694)
>> 
>> Thanks!
>> Devin Suiter
>> Jr. Data Solutions Software Engineer
>> 
>> 100 Sandusky Street | 2nd Floor | Pittsburgh, PA 15212
>> Google Voice: 412-256-8556 | www.rdx.com