Well, you are writing a sequence file (the default). Is that what you want?
If you want text, use:

hdfs.fileType = DataStream

and for the serializer you should be able to just use:

a1.sinks.k1.serializer = header_and_text
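Putting the two together with the agent and sink names from your config below
(hadoop-t1 / s1), the sink section would look something like this -- an untested
sketch, but it matches what the Flume 1.4 user guide describes:

---
## HDFS Sink: plain text output, event headers prepended to each line
hadoop-t1.sinks.s1.type = hdfs
hadoop-t1.sinks.s1.hdfs.path = hdfs://hadoop-t1.mydomain.org:8020/opt/logs/%{host}/%Y-%m-%d
hadoop-t1.sinks.s1.hdfs.batchSize = 1
# DataStream writes raw text; the default SequenceFile is what produces the
# SEQ!org.apache.hadoop.io.LongWritable... binary header you are seeing
hadoop-t1.sinks.s1.hdfs.fileType = DataStream
# header_and_text is the built-in alias for
# org.apache.flume.serialization.HeaderAndBodyTextEventSerializer$Builder;
# it writes the header map (your timestamp and hostname) before the body
hadoop-t1.sinks.s1.serializer = header_and_text
---

Note the serializer key hangs directly off the sink (s1.serializer), not under the
hdfs. prefix. One other thing to check: your host interceptor renames its header to
"hostname" (hostHeader = hostname), so %{host} in hdfs.path will come up empty --
either drop that line or use %{hostname} in the path. To sanity-check, you can push
a well-formed syslog line through the source and cat the newest file, e.g.:

echo "<13>Apr  1 08:00:00 hadoop-t1 test: hello flume" | nc localhost 10005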
On Tue, Apr 1, 2014 at 8:02 AM, Ryan Suarez <[email protected]> wrote:

> Thanks for the tip! I was indeed missing the interceptors. I've added
> them now, but the timestamp and hostname are still not showing up in the
> hdfs log. Any advice?
>
> ------- sample event in HDFS ------
> SEQ!org.apache.hadoop.io.LongWritable"org.apache.hadoop.io.BytesWritable...su[28432]:
> pam_unix(su:session): session opened for user root by myuser(uid=31043)
>
> ------ same event in syslog ------
> Mar 31 16:18:32 hadoop-t1 su[28432]: pam_unix(su:session): session opened
> for user root by myuser(uid=31043)
>
> ------- flume-conf.properties --------
>
> # Name the components on this agent
> hadoop-t1.sources = r1
> hadoop-t1.sinks = s1
> hadoop-t1.channels = mem1
>
> # Describe/configure the source
> hadoop-t1.sources.r1.type = syslogtcp
> hadoop-t1.sources.r1.host = localhost
> hadoop-t1.sources.r1.port = 10005
> hadoop-t1.sources.r1.portHeader = port
> hadoop-t1.sources.r1.interceptors = i1 i2
> hadoop-t1.sources.r1.interceptors.i1.type = timestamp
> hadoop-t1.sources.r1.interceptors.i2.type = host
> hadoop-t1.sources.r1.interceptors.i2.hostHeader = hostname
>
> ## HDFS Sink
> hadoop-t1.sinks.s1.type = hdfs
> hadoop-t1.sinks.s1.hdfs.path = hdfs://hadoop-t1.mydomain.org:8020/opt/logs/%{host}/%Y-%m-%d
> hadoop-t1.sinks.s1.hdfs.batchSize = 1
> hadoop-t1.sinks.s1.serializer = org.apache.flume.serialization.HeaderAndBodyTextEventSerializer$Builder
> hadoop-t1.sinks.s1.serializer.columns = timestamp hostname
> hadoop-t1.sinks.s1.serializer.format = CSV
> hadoop-t1.sinks.s1.serializer.appendNewline = true
>
> ## MEM Use a channel which buffers events in memory
> hadoop-t1.channels.mem1.type = memory
> hadoop-t1.channels.mem1.capacity = 1000
> hadoop-t1.channels.mem1.transactionCapacity = 100
>
> # Bind the source and sink to the channel
> hadoop-t1.sources.r1.channels = mem1
> hadoop-t1.sinks.s1.channel = mem1
>
>
> On 14-03-28 3:37 PM, Jeff Lord wrote:
>
> Do you have the appropriate interceptors configured?
>
>
> On Fri, Mar 28, 2014 at 12:28 PM, Ryan Suarez <[email protected]> wrote:
>
>> RTFM indicates I need the following sink properties:
>>
>> ---
>> hadoop-t1.sinks.hdfs1.serializer =
>> org.apache.flume.serialization.HeaderAndBodyTextEventSerializer
>> hadoop-t1.sinks.hdfs1.serializer.columns = timestamp hostname msg
>> hadoop-t1.sinks.hdfs1.serializer.format = CSV
>> hadoop-t1.sinks.hdfs1.serializer.appendNewline = true
>> ---
>>
>> But I'm still not getting timestamp information. How would I get
>> hostname and timestamp information in the logs?
>>
>>
>> On 14-03-26 3:02 PM, Ryan Suarez wrote:
>>
>>> Greetings,
>>>
>>> I'm running the flume that ships with Hortonworks HDP2 to feed syslogs
>>> to hdfs. The problem is that the timestamp and hostname of the event are
>>> not logged to hdfs.
>>>
>>> ---
>>> flume@hadoop-t1:~$ hadoop fs -cat /opt/logs/hadoop-t1/2014-03-26/FlumeData.1395859766307
>>> SEQ!org.apache.hadoop.io.LongWritable"org.apache.hadoop.io.BytesWritable??Ak?i<??G??`D??$hTsu[22209]:
>>> pam_unix(su:session): session opened for user root by someuser(uid=11111)
>>> ---
>>>
>>> How do I configure the sink to add hostname and timestamp info to the
>>> event?
>>>
>>> Here's my flume-conf.properties:
>>>
>>> ---
>>> flume@hadoop-t1:/etc/flume/conf$ cat flume-conf.properties
>>> # Name the components on this agent
>>> hadoop-t1.sources = syslog1
>>> hadoop-t1.sinks = hdfs1
>>> hadoop-t1.channels = mem1
>>>
>>> # Describe/configure the source
>>> hadoop-t1.sources.syslog1.type = syslogtcp
>>> hadoop-t1.sources.syslog1.host = localhost
>>> hadoop-t1.sources.syslog1.port = 10005
>>> hadoop-t1.sources.syslog1.portHeader = port
>>>
>>> ## HDFS Sink
>>> hadoop-t1.sinks.hdfs1.type = hdfs
>>> hadoop-t1.sinks.hdfs1.hdfs.path = hdfs://hadoop-t1.mydomain.org:8020/opt/logs/%{host}/%Y-%m-%d
>>> hadoop-t1.sinks.hdfs1.hdfs.batchSize = 1
>>>
>>> # Use a channel which buffers events in memory
>>> hadoop-t1.channels.mem1.type = memory
>>> hadoop-t1.channels.mem1.capacity = 1000
>>> hadoop-t1.channels.mem1.transactionCapacity = 100
>>>
>>> # Bind the source and sink to the channel
>>> hadoop-t1.sources.syslog1.channels = mem1
>>> hadoop-t1.sinks.hdfs1.channel = mem1
>>> ---
>>>
>>> ---
>>> flume@hadoop-t1:~$ flume-ng version
>>> Flume 1.4.0.2.0.11.0-1
>>> Source code repository: https://git-wip-us.apache.org/repos/asf/flume.git
>>> Revision: fcdc3d29a1f249bef653b10b149aea2bc5df892e
>>> Compiled by jenkins on Wed Mar 12 05:11:30 PDT 2014
>>> From source with checksum dea9ae30ce2c27486ae7c76ab7aba020
>>> ---
