On Tue, Aug 21, 2012 at 2:25 AM, バーチャル クリストファー <[email protected]> wrote:
> Hi David,
>
> Currently there is no way to write headers to HDFS using the built-in
> Flume functionality.

This is not entirely true; the following combination will write headers
to HDFS in Avro data file format (binary):

agent.sinks.hdfsBinarySink.hdfs.fileType = DataStream
agent.sinks.hdfsBinarySink.serializer = avro_event
agent.sinks.hdfsBinarySink.hdfs.writeFormat = writable

The serializer used here is part of the Flume distribution, viz.
flume-ng-core/src/main/java/org/apache/flume/serialization/FlumeEventAvroEventSerializer.java

A file written this way can be processed with the Avro MapReduce API
found in the Avro distribution. Also note that simply using DataStream
doesn't mean the file is text; the serializer and hdfs.writeFormat also
determine whether the file is text or binary.

I've read the entire HDFS sink code and experimented with it a lot, so
if you want more details, let me know.

> If you are writing to text or binary files on HDFS (i.e. you have set
> hdfs.fileType = DataStream or CompressedStream in your config), then
> you can supply your own custom serializer, which will allow you to
> write headers to HDFS. You will need to write a serializer that
> implements org.apache.flume.serialization.EventSerializer.
>
> If, on the other hand, you are writing to HDFS SequenceFiles, then
> unfortunately there is no way to customize the way that events are
> serialized, so you cannot write event headers to HDFS. This is a known
> issue (FLUME-1100) and I have supplied a patch to fix it.
>
> Chris.
>
> On 2012/08/21 11:36, David Capwell wrote:
>> I was wondering if I pass random data to an event's header, can the
>> HDFSSink write it to HDFS? I know it can use the headers to split the
>> data into different paths, but what about writing the data to HDFS
>> itself?
>>
>> Thanks for your time reading this email.
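P.S. In case a concrete example helps: rather than the full MapReduce
API, here is a minimal standalone sketch that dumps one of these Avro
data files using Avro's DataFileReader. The class name DumpFlumeAvroFile
is just a placeholder of mine; note that no schema needs to be supplied
up front, because the writer's schema is embedded in the container file
itself.

import java.io.File;
import java.nio.ByteBuffer;
import java.util.Map;

import org.apache.avro.file.DataFileReader;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;

public class DumpFlumeAvroFile {
  public static void main(String[] args) throws Exception {
    // The writer's schema travels inside the Avro container file,
    // so a GenericDatumReader needs no schema up front.
    GenericDatumReader<GenericRecord> datumReader =
        new GenericDatumReader<GenericRecord>();
    DataFileReader<GenericRecord> fileReader =
        new DataFileReader<GenericRecord>(new File(args[0]), datumReader);
    try {
      for (GenericRecord event : fileReader) {
        // FlumeEventAvroEventSerializer writes one record per Flume
        // event: a "headers" map plus a "body" bytes field.
        Map<?, ?> headers = (Map<?, ?>) event.get("headers");
        ByteBuffer body = (ByteBuffer) event.get("body");
        System.out.println("headers=" + headers
            + " bodyLength=" + body.remaining());
      }
    } finally {
      fileReader.close();
    }
  }
}

Since each record carries the headers as a map and the body as bytes,
the headers written by the HDFS sink are fully recoverable.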

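And for the custom-serializer route Chris describes above, here is a
rough, untested sketch of what an implementation of
org.apache.flume.serialization.EventSerializer could look like. The
class name HeaderAndBodyTextSerializer and its key=value<TAB>body
output format are purely illustrative.

import java.io.IOException;
import java.io.OutputStream;
import java.util.Map;

import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.serialization.EventSerializer;

// Example serializer that prepends the event headers to each line.
public class HeaderAndBodyTextSerializer implements EventSerializer {

  private final OutputStream out;

  private HeaderAndBodyTextSerializer(OutputStream out, Context ctx) {
    this.out = out;
  }

  @Override
  public void afterCreate() throws IOException {
    // No file header needed for this format.
  }

  @Override
  public void afterReopen() throws IOException {
    // Nothing to re-establish when appending to an existing file.
  }

  @Override
  public void write(Event event) throws IOException {
    // Emit "key=value,key=value<TAB>body" followed by a newline.
    StringBuilder sb = new StringBuilder();
    for (Map.Entry<String, String> e : event.getHeaders().entrySet()) {
      if (sb.length() > 0) {
        sb.append(',');
      }
      sb.append(e.getKey()).append('=').append(e.getValue());
    }
    sb.append('\t');
    out.write(sb.toString().getBytes("UTF-8"));
    out.write(event.getBody());
    out.write('\n');
  }

  @Override
  public void flush() throws IOException {
    // The HDFS sink flushes the underlying stream itself.
  }

  @Override
  public void beforeClose() throws IOException {
    // No trailer needed for this format.
  }

  @Override
  public boolean supportsReopen() {
    return true;
  }

  public static class Builder implements EventSerializer.Builder {
    @Override
    public EventSerializer build(Context context, OutputStream out) {
      return new HeaderAndBodyTextSerializer(out, context);
    }
  }
}

You would then point the sink at the nested Builder by its fully
qualified class name, e.g.:

agent.sinks.hdfsTextSink.serializer = com.example.HeaderAndBodyTextSerializer$Builder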