Ashish,

Thanks for responding. I was verifying by looking at the file once it was in
HDFS. The goal is to have several Flume clients append host and static
interceptor information to their event headers, so that when they stream to
the same HDFS destination I can figure out where each line entry came from.
The reason I want to combine several log locations is largely to pull
together one larger file for Hadoop to process instead of many smaller files.
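
As an aside on the attribution goal, and only as an unverified sketch: the
HDFS sink can substitute event-header values into hdfs.path through
%{headerName} escapes, so the environment and hostname headers could drive
the directory layout instead of being written into each line. That splits
the output per client again, which works against the larger-file goal, so it
is just an alternative worth noting; the layout below is illustrative.

collector.sinks.k1.type = hdfs
# %{environment} and %{hostname} are filled from the static/host interceptor
# headers set on the client; the timestamp interceptor on the collector's
# Avro source supplies the %y%m%d part.
collector.sinks.k1.hdfs.path = /flume/%{environment}/%{hostname}/%y%m%d
collector.sinks.k1.hdfs.fileType = DataStream
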
Does anyone know if there is a way to use the static/host interceptor
headers across an Avro stream into an HDFS sink?

On Tue, Sep 16, 2014 at 11:51 PM, Ashish <[email protected]> wrote:

> How are you verifying the data?
>
> When you are using the Avro Sink, data is sent to an Avro Source, and the
> serializer defined in the config (header and text) is not used there.
>
> client.sinks.ki.sink.serializer = HEADER_AND_TEXT has no effect
>
> Not sure about the HDFS sink.
>
> Let's see if someone else can help here. Anyone?
>
> On Wed, Sep 17, 2014 at 9:29 AM, chris <[email protected]> wrote:
>
>> Thanks, that worked for the config I have below.
>> Now that I have changed this to stream to an Avro sink, it stopped working.
>> I even tried adding it to the HDFS conf, but it doesn't produce the static
>> interceptor headers.
>>
>> Any ideas on why it works with a file_roll sink but not an Avro sink?
>>
>>
>> Client.config
>> client.channels=ch1
>> client.channels.ch1.type=memory
>> client.channels.ch1.capacity=100000
>> client.channels.ch1.transactionCapacity=1000000000000
>>
>> client.sources=src-1
>> client.sources.src-1.type=spooldir
>> client.sources.src-1.spoolDir=/root/unpack
>> client.sources.src-1.deserializer.maxLineLength=10000
>> client.sources.src-1.interceptors = i2 i1
>> client.sources.src-1.interceptors.i1.type = host
>> client.sources.src-1.interceptors.i1.hostHeader = hostname
>> #client.sources.src-1.interceptors.i1.useIP = true
>> client.sources.src-1.interceptors.i2.type = static
>> client.sources.src-1.interceptors.i2.key = environment
>> client.sources.src-1.interceptors.i2.value = sqa
>> client.sinks=k1
>> client.sinks.k1.type=avro
>> client.sinks.k1.hostname=localhost
>> client.sinks.k1.port=42424
>> client.sinks.ki.sink.serializer = HEADER_AND_TEXT
>> ## Debugging Sink, Comment out AvroSink if you use this one
>> # http://flume.apache.org/FlumeUserGuide.html#file-roll-sink
>> #client.sinks.k1.type = file_roll
>> #client.sinks.k1.sink.directory = /root/sink
>> #client.sinks.k1.sink.rollInterval = 0
>> #client.sinks.k1.sink.serializer = HEADER_AND_TEXT
>>
>> # Connect source and sink with channel
>> client.sources.src-1.channels=ch1
>> client.sinks.k1.channel=ch1
>>
>> HDFS conf
>> collector.sources=av1
>> collector.sources.av1.interceptors = i2
>> collector.sources.av1.interceptors.i2.type = timestamp
>> collector.sources.av1.type=avro
>> collector.sources.av1.bind=0.0.0.0
>> collector.sources.av1.port=42424
>> collector.sources.av1.channels=ch1
>> collector.channels=ch1
>> collector.channels.ch1.type=memory
>> collector.channels.ch1.capacity = 100000
>> collector.channels.ch1.transactionCapacity = 1000000000000
>> collector.sinks=k1
>> collector.sinks.k1.type=hdfs
>> collector.sinks.k1.channel=ch1
>> collector.sinks.k1.hdfs.path=/flume/%y%m%d
>> collector.sinks.k1.hdfs.fileType = DataStream
>> collector.sinks.k1.hdfs.rollInterval = 86400
>> collector.sinks.k1.hdfs.rollSize = 0
>> collector.sinks.k1.hdfs.rollCount = 0
>> collector.sinks.k1.hdfs.serializer = HEADER_AND_TEXT
>>
>>
>> On 9/16/14 11:10 AM, Ashish wrote:
>>
>> Try using HEADER_AND_TEXT as the serializer for the sink; the default is
>> the Text serializer, which writes only the Event body.
>>
>> On Tue, Sep 16, 2014 at 7:31 PM, christopher palm <[email protected]> wrote:
>>
>> > All,
>> >
>> > I am trying to get the static interceptor to insert key/value information
>> > in each line that is written out in my data sink.
>> > I have tried this with various configurations, but can't seem to get any
>> > output from the interceptor to show up in the output files produced by
>> > Flume in the target data directory.
>> > Below is my latest config, using spooldir as the source and a file_roll
>> > sink as the output.
>> >
>> > Any suggestions as to what I am configuring wrong here?
>> >
>> > Thanks,
>> > Chris
>> >
>> > client.channels=ch1
>> > client.channels.ch1.type=memory
>> > client.channels.ch1.capacity=100000
>> > client.channels.ch1.transactionCapacity=100000
>> >
>> > client.sources=src-1
>> > client.sources.src-1.type=spooldir
>> > client.sources.src-1.spoolDir=/opt/app/solr/flume/sinkIn
>> > client.sources.src-1.deserializer.maxLineLength=10000
>> > client.sources.src-1.interceptors = i1
>> > client.sources.src-1.interceptors.i1.type = static
>> > client.sources.src-1.interceptors.i1.preserveExisting = false
>> > client.sources.src-1.interceptors.i1.key = datacenter
>> > client.sources.src-1.interceptors.i1.value = NYC_01
>> > client.sinks=k1
>> > #client.sinks.k1.type=avro
>> > #client.sinks.k1.hostname=localhost
>> > #client.sinks.k1.port=42424
>> > ## Debugging Sink, Comment out AvroSink if you use this one
>> > # http://flume.apache.org/FlumeUserGuide.html#file-roll-sink
>> > client.sinks.k1.type = file_roll
>> > client.sinks.k1.sink.directory = /opt/app/solr/flume/sinkOut
>> > client.sinks.k1.sink.rollInterval = 0
>> >
>> > # Connect source and sink with channel
>> > client.sources.src-1.channels=ch1
>> > client.sinks.k1.channel=ch1
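
A closing note on the configs above, as a guess rather than a verified fix:
the Avro sink to Avro source hop does carry the event headers, so the host
and static interceptor values should still be on the events when they reach
the collector. What the HDFS sink writes out is controlled by its serializer
property, which sits directly on the sink rather than under the hdfs.
prefix, so collector.sinks.k1.hdfs.serializer = HEADER_AND_TEXT is probably
being ignored. (On the client side, client.sinks.ki.sink.serializer also has
a ki/k1 typo, though the Avro sink ignores that setting anyway, as Ashish
pointed out.) A minimal sketch of the collector sink with the serializer
keyed the way the HDFS sink expects:

collector.sinks = k1
collector.sinks.k1.type = hdfs
collector.sinks.k1.channel = ch1
collector.sinks.k1.hdfs.path = /flume/%y%m%d
collector.sinks.k1.hdfs.fileType = DataStream
# serializer is a sink-level property, not under the hdfs. prefix
collector.sinks.k1.serializer = HEADER_AND_TEXT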
