To add to the question, I've set up 4 HDFS sinks as follows:

a) seqaeSink: serializer = avro_event, fileType = SequenceFile
b) seqtSink: serializer = text, fileType = SequenceFile
c) dsaeSink: serializer = avro_event, fileType = DataStream
d) dstSink: serializer = text, fileType = DataStream, writeFormat = text

The problem is that seqaeSink doesn't write Avro event records; instead it writes a SequenceFile of LongWritable, BytesWritable, and this is WRONG. The SequenceFile should contain Avro events.

The seqtSink works correctly, in that it writes a SequenceFile of LongWritable, BytesWritable.

The dsaeSink writes a DataStream file (each event separated by a newline) of Avro events, and the dstSink writes the plain message body to the file, which is correct too.

So, in conclusion, the combination serializer = avro_event, fileType = SequenceFile is not working as expected; it behaves just like the combination serializer = text, fileType = SequenceFile.

On Tue, Jul 31, 2012 at 10:11 AM, Gumnaam Sur <[email protected]> wrote:
> Hi,
> For the HDFS sink we have 3 properties which determine the type and content
> that gets written to the file:
>
> writeFormat = text | writable
> fileType = SequenceFile | DataStream | CompressedStream
> serializer = text | avro_event | <custom>
>
> Can one of the devs explain these in detail, along with the output expected
> from the various permutations/combinations of the 3 values, and whether any
> combination is invalid, etc.?
>
> e.g. what's the difference between the combo
> serializer = avro_event, fileType = SequenceFile
> and
> serializer = avro_event, fileType = DataStream
>
> And what's the difference between writeFormat = 'text' and
> writeFormat = 'writable'?
>
> To give some background, I am looking to serialize Avro events to HDFS in
> SequenceFiles, and trying to use org.apache.avro.mapreduce.* from my Hadoop
> jobs. I figure using SequenceFile should give better performance over text,
> but I am not exactly sure about the various Flume options I mentioned above.
>
> thanks
>
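For concreteness, here is roughly how I have the four sinks defined in flume.conf. The agent name (a1), channel name (c1), and HDFS paths are placeholders of my own; the fileType / writeFormat / serializer settings are the ones described above:

```
# Hypothetical agent "a1" with channel "c1"; paths are placeholders.
a1.sinks = seqaeSink seqtSink dsaeSink dstSink

# a) SequenceFile + avro_event (the problematic combination)
a1.sinks.seqaeSink.type = hdfs
a1.sinks.seqaeSink.channel = c1
a1.sinks.seqaeSink.hdfs.path = hdfs://namenode/flume/seqae
a1.sinks.seqaeSink.hdfs.fileType = SequenceFile
a1.sinks.seqaeSink.serializer = avro_event

# b) SequenceFile + text
a1.sinks.seqtSink.type = hdfs
a1.sinks.seqtSink.channel = c1
a1.sinks.seqtSink.hdfs.path = hdfs://namenode/flume/seqt
a1.sinks.seqtSink.hdfs.fileType = SequenceFile
a1.sinks.seqtSink.serializer = text

# c) DataStream + avro_event
a1.sinks.dsaeSink.type = hdfs
a1.sinks.dsaeSink.channel = c1
a1.sinks.dsaeSink.hdfs.path = hdfs://namenode/flume/dsae
a1.sinks.dsaeSink.hdfs.fileType = DataStream
a1.sinks.dsaeSink.serializer = avro_event

# d) DataStream + text
a1.sinks.dstSink.type = hdfs
a1.sinks.dstSink.channel = c1
a1.sinks.dstSink.hdfs.path = hdfs://namenode/flume/dst
a1.sinks.dstSink.hdfs.fileType = DataStream
a1.sinks.dstSink.hdfs.writeFormat = Text
a1.sinks.dstSink.serializer = text
```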
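In case it helps anyone reproduce this: the way I checked what each sink actually wrote was to look at the key/value class names recorded in the SequenceFile header. A minimal sketch of that check in Python (my own helper, not part of Flume or Hadoop; it assumes the Hadoop vint length prefixes fit in a single byte, which holds for class names shorter than 113 bytes):

```python
def sequencefile_classes(data: bytes):
    """Return the (key_class, value_class) names declared in a
    SequenceFile header.

    Header layout: 3-byte magic "SEQ", 1-byte version, then the key and
    value class names, each written as a vint length plus UTF-8 bytes.
    Assumes single-byte vint lengths (true for typical class names).
    """
    if data[:3] != b"SEQ":
        raise ValueError("not a SequenceFile")
    pos = 4  # skip magic "SEQ" and the one-byte version
    names = []
    for _ in range(2):
        length = data[pos]  # single-byte vint length
        pos += 1
        names.append(data[pos:pos + length].decode("utf-8"))
        pos += length
    return tuple(names)


# Synthetic header for illustration, built the way Hadoop writes it.
# This matches what the seqae sink actually declares (LongWritable /
# BytesWritable) even though avro_event was configured.
key = b"org.apache.hadoop.io.LongWritable"
val = b"org.apache.hadoop.io.BytesWritable"
header = b"SEQ\x06" + bytes([len(key)]) + key + bytes([len(val)]) + val
print(sequencefile_classes(header))
```

In practice I ran this over the first few hundred bytes of each file pulled down with `hadoop fs -get`; only the header is needed, not the whole file.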
