Thanks for the tip! I was indeed missing the interceptors. I've added them now, but the timestamp and hostname are still not showing up in the HDFS log. Any advice?

------- sample event in HDFS ------
SEQ!org.apache.hadoop.io.LongWritable"org.apache.hadoop.io.BytesWritable??????cc?c??I?[???\?????`?????E?????Tsu[28432]: pam_unix(su:session): session opened for user root by myuser(uid=31043)

------ same event in syslog ------
Mar 31 16:18:32 hadoop-t1 su[28432]: pam_unix(su:session): session opened for user root by myuser(uid=31043)
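For what it's worth, my understanding is that the syslogtcp source parses the priority, timestamp, and hostname out of each line into event headers, which would explain why the body stored in HDFS starts at "su[28432]:". A rough Python sketch of that split (illustrative only; the regex and names are mine, not Flume's actual parser):

```python
import re

# Approximate RFC 3164 split: <PRI>, timestamp, and hostname become
# headers; the remainder becomes the event body. Illustration only.
SYSLOG_RE = re.compile(
    r'^(?:<(?P<pri>\d+)>)?'                    # optional <PRI>
    r'(?P<ts>\w{3} [ \d]\d \d\d:\d\d:\d\d) '   # e.g. "Mar 31 16:18:32"
    r'(?P<host>\S+) '                          # e.g. "hadoop-t1"
    r'(?P<body>.*)$'
)

def parse_syslog(line):
    m = SYSLOG_RE.match(line)
    headers = {'timestamp': m.group('ts'), 'host': m.group('host')}
    return headers, m.group('body')

headers, body = parse_syslog(
    'Mar 31 16:18:32 hadoop-t1 su[28432]: pam_unix(su:session): '
    'session opened for user root by myuser(uid=31043)'
)
print(headers['host'])  # hadoop-t1
print(body)             # su[28432]: pam_unix(su:session): ...
```

So the information isn't lost, it has just moved into headers, and the question is how to get the sink to write those headers back out.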

------- flume-conf.properties --------
# Name the components on this agent
hadoop-t1.sources = r1
hadoop-t1.sinks = s1
hadoop-t1.channels = mem1

# Describe/configure the source
hadoop-t1.sources.r1.type = syslogtcp
hadoop-t1.sources.r1.host = localhost
hadoop-t1.sources.r1.port = 10005
hadoop-t1.sources.r1.portHeader = port
hadoop-t1.sources.r1.interceptors = i1 i2
hadoop-t1.sources.r1.interceptors.i1.type = timestamp
hadoop-t1.sources.r1.interceptors.i2.type = host
hadoop-t1.sources.r1.interceptors.i2.hostHeader = hostname

##HDFS Sink
hadoop-t1.sinks.s1.type = hdfs
hadoop-t1.sinks.s1.hdfs.path = hdfs://hadoop-t1.mydomain.org:8020/opt/logs/%{host}/%Y-%m-%d
hadoop-t1.sinks.s1.hdfs.batchSize = 1
hadoop-t1.sinks.s1.serializer = org.apache.flume.serialization.HeaderAndBodyTextEventSerializer$Builder
hadoop-t1.sinks.s1.serializer.columns = timestamp hostname
hadoop-t1.sinks.s1.serializer.format = CSV
hadoop-t1.sinks.s1.serializer.appendNewline = true
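As a sanity check on the escape sequences in hdfs.path above: my understanding is that %{name} pulls a header value, and the date escapes are formatted from the 'timestamp' header (epoch milliseconds, set by the timestamp interceptor). A rough Python illustration of that expansion (my own code, not Flume's):

```python
import re
import time

# Illustration of how I understand hdfs.path escapes to expand:
# %{name} is replaced by the header of that name, then %Y/%m/%d are
# formatted from the 'timestamp' header (epoch millis) via strftime.
def expand_path(path, headers):
    ts = int(headers['timestamp']) / 1000.0
    path = re.sub(r'%\{(\w+)\}', lambda m: headers[m.group(1)], path)
    return time.strftime(path, time.gmtime(ts))

headers = {'timestamp': '1396275512000', 'host': 'hadoop-t1'}
print(expand_path('/opt/logs/%{host}/%Y-%m-%d', headers))
# -> /opt/logs/hadoop-t1/2014-03-31
```

Note this assumes the %{host} escape matches the header name the interceptor actually writes; the host interceptor above is configured with hostHeader = hostname.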

## MEM  Use a channel which buffers events in memory
hadoop-t1.channels.mem1.type = memory
hadoop-t1.channels.mem1.capacity = 1000
hadoop-t1.channels.mem1.transactionCapacity = 100

# Bind the source and sink to the channel
hadoop-t1.sources.r1.channels = mem1
hadoop-t1.sinks.s1.channel = mem1
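One thing I'm still wondering about: the HDFS output above starts with a SequenceFile header (SEQ!org.apache.hadoop.io.LongWritable...), so maybe the sink is ignoring the text serializer because hdfs.fileType defaults to SequenceFile? If so, I'm guessing I'd also need something like the following (untested on my end):

```
hadoop-t1.sinks.s1.hdfs.fileType = DataStream
hadoop-t1.sinks.s1.hdfs.writeFormat = Text
```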


On 14-03-28 3:37 PM, Jeff Lord wrote:
Do you have the appropriate interceptors configured?


On Fri, Mar 28, 2014 at 12:28 PM, Ryan Suarez <[email protected]> wrote:

    RTFM indicates I need the following sink properties:

    ---
    hadoop-t1.sinks.hdfs1.serializer = org.apache.flume.serialization.HeaderAndBodyTextEventSerializer
    hadoop-t1.sinks.hdfs1.serializer.columns = timestamp hostname msg
    hadoop-t1.sinks.hdfs1.serializer.format = CSV
    hadoop-t1.sinks.hdfs1.serializer.appendNewline = true
    ---

    But I'm still not getting timestamp information.  How would I get
    hostname and timestamp information in the logs?


    On 14-03-26 3:02 PM, Ryan Suarez wrote:

        Greetings,

        I'm running flume that's shipped with Hortonworks HDP2 to feed
        syslogs to hdfs.  The problem is the timestamp and hostname of
        the event is not logged to hdfs.

        ---
        flume@hadoop-t1:~$ hadoop fs -cat /opt/logs/hadoop-t1/2014-03-26/FlumeData.1395859766307
        SEQ!org.apache.hadoop.io.LongWritable"org.apache.hadoop.io.BytesWritable??Ak?i<??G??`D??$hTsu[22209]:
        pam_unix(su:session): session opened for user root by
        someuser(uid=11111)
        ---

        How do I configure the sink to add hostname and timestamp info
        to the event?

        Here's my flume-conf.properties:

        ---
        flume@hadoop-t1:/etc/flume/conf$ cat flume-conf.properties
        # Name the components on this agent
        hadoop-t1.sources = syslog1
        hadoop-t1.sinks = hdfs1
        hadoop-t1.channels = mem1

        # Describe/configure the source
        hadoop-t1.sources.syslog1.type = syslogtcp
        hadoop-t1.sources.syslog1.host = localhost
        hadoop-t1.sources.syslog1.port = 10005
        hadoop-t1.sources.syslog1.portHeader = port

        ##HDFS Sink
        hadoop-t1.sinks.hdfs1.type = hdfs
        hadoop-t1.sinks.hdfs1.hdfs.path = hdfs://hadoop-t1.mydomain.org:8020/opt/logs/%{host}/%Y-%m-%d
        hadoop-t1.sinks.hdfs1.hdfs.batchSize = 1

        # Use a channel which buffers events in memory
        hadoop-t1.channels.mem1.type = memory
        hadoop-t1.channels.mem1.capacity = 1000
        hadoop-t1.channels.mem1.transactionCapacity = 100

        # Bind the source and sink to the channel
        hadoop-t1.sources.syslog1.channels = mem1
        hadoop-t1.sinks.hdfs1.channel = mem1
        ---

        ---
        flume@hadoop-t1:~$ flume-ng version
        Flume 1.4.0.2.0.11.0-1
        Source code repository: https://git-wip-us.apache.org/repos/asf/flume.git
        Revision: fcdc3d29a1f249bef653b10b149aea2bc5df892e
        Compiled by jenkins on Wed Mar 12 05:11:30 PDT 2014
        From source with checksum dea9ae30ce2c27486ae7c76ab7aba020
        ---



