Thanks, Martin. I was using C# on Windows to generate the Avro files, and am now learning to use Java.
The reason I want to combine the Avro files is that MR and HDFS work better with large files. If we split the original log files, there would be too many small files.

2014/1/8 Martin Kleppmann <[email protected]>

> Which language are you using to generate the Avro files?
>
> If you're using MapReduce to process the Avro files, it should be able to
> handle multiple input files anyway, so there shouldn't be any need to
> combine the compressed files. Can you just leave them as smaller files?
>
> Martin
>
>
> On 6 January 2014 02:30, Fengyun RAO <[email protected]> wrote:
>
>> Hi, all
>>
>> I have some IIS log files whose format depends on the "#Fields" line inside
>> the log, which makes the files not splittable and not suitable for an MR job.
>> So I want to preprocess the files into Avro files. It's simple and fast to
>> transform each line into an Avro record, but the serialization and
>> compression are too slow.
>>
>> Is there a way to serialize and compress in parallel while writing
>> sequentially? In principle I could even split the records into several
>> files, which could be serialized and compressed in parallel, but I can't
>> find a way to combine them.
>>
>> Any suggestions? Thanks!
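
For reference, the "serialize and compress in parallel, write sequentially" idea from the original question could look roughly like the sketch below in Java. It uses only the standard library and is not the Avro file format or the Avro API: the class name ParallelBlockWriter, the length-prefixed block layout, and the newline-delimited "records" are all assumptions made for illustration. The point is only that independently compressed blocks can be produced on worker threads and then appended to one output in their original order, which is similar in spirit to how an Avro container file is made of separately compressed blocks.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical helper, not part of Avro: compresses blocks in parallel
// and writes them to a single sink in submission order.
public class ParallelBlockWriter {

    // Serialize and compress one block independently of all other blocks.
    static byte[] compressBlock(List<String> records) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DeflaterOutputStream out = new DeflaterOutputStream(buf)) {
            for (String record : records) {
                out.write(record.getBytes(StandardCharsets.UTF_8));
                out.write('\n');
            }
        }
        return buf.toByteArray();
    }

    // Compress all blocks on a thread pool, then write them one by one.
    // Returns the number of blocks written.
    static int writeBlocks(List<List<String>> blocks, OutputStream sink, int threads)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<byte[]>> pending = new ArrayList<>();
            for (List<String> block : blocks) {
                pending.add(pool.submit(() -> compressBlock(block)));
            }
            DataOutputStream out = new DataOutputStream(sink);
            int written = 0;
            // Iterating the futures in submission order keeps the output
            // order deterministic even if a later block finishes first.
            for (Future<byte[]> future : pending) {
                byte[] compressed = future.get();
                out.writeInt(compressed.length); // length prefix (assumed layout)
                out.write(compressed);
                written++;
            }
            out.flush();
            return written;
        } finally {
            pool.shutdown();
        }
    }

    // Read the blocks back and return all records in order.
    static List<String> readBlocks(byte[] data) throws IOException {
        List<String> records = new ArrayList<>();
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        while (in.available() > 0) {
            byte[] compressed = new byte[in.readInt()];
            in.readFully(compressed);
            try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                    new InflaterInputStream(new ByteArrayInputStream(compressed)),
                    StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    records.add(line);
                }
            }
        }
        return records;
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample lines standing in for parsed log records.
        List<List<String>> blocks = Arrays.asList(
                Arrays.asList("GET /index.html 200", "GET /a.png 304"),
                Arrays.asList("POST /login 302"));
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        int written = writeBlocks(blocks, sink, 2);
        System.out.println(written + " blocks, " + readBlocks(sink.toByteArray()).size() + " records");
    }
}
```

In this sketch all compressed blocks are held in memory until they are written. For large inputs one would probably want to bound the number of outstanding blocks, for example by submitting work in batches.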
