Which language are you using to generate the Avro files? If you're using MapReduce to process the Avro files, it should be able to handle multiple input files anyway, so there shouldn't be any need to combine the compressed files. Can you just leave them as smaller files?
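
For what it's worth, here is a rough sketch of the "leave them as smaller files" idea: split the records into a few chunks and let each worker serialize, compress and write its own part file. This is only an illustration in Python, not Avro code. JSON lines plus zlib stand in for Avro serialization and its codec, and the names `write_part`, `write_parts`, `read_part` and the `part-NNNNN.jsonl.z` file naming are all made up for the example.

```python
import json
import os
import zlib
from concurrent.futures import ThreadPoolExecutor


def write_part(job):
    """Serialize and compress one chunk of records into its own part file."""
    part_path, records = job
    # Assumption: JSON lines stand in for Avro record serialization.
    payload = "\n".join(json.dumps(r) for r in records).encode("utf-8")
    with open(part_path, "wb") as f:
        # Assumption: zlib stands in for the Avro file codec.
        f.write(zlib.compress(payload))
    return part_path


def write_parts(records, out_dir, num_parts=4):
    """Split records round-robin into num_parts chunks, one file per chunk."""
    chunks = [records[i::num_parts] for i in range(num_parts)]
    jobs = [
        (os.path.join(out_dir, f"part-{i:05d}.jsonl.z"), chunk)
        for i, chunk in enumerate(chunks)
    ]
    # Each worker handles one part file; no step combines the outputs.
    with ThreadPoolExecutor(max_workers=num_parts) as pool:
        return list(pool.map(write_part, jobs))


def read_part(part_path):
    """Read one part file back into a list of records."""
    with open(part_path, "rb") as f:
        data = zlib.decompress(f.read()).decode("utf-8")
    return [json.loads(line) for line in data.splitlines() if line]
```

A job that accepts a directory of inputs can then read every part file independently. Threads keep the sketch short; if the serialization step itself is CPU-bound, a process pool may be the better fit.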

Martin

On 6 January 2014 02:30, Fengyun RAO <[email protected]> wrote:
> Hi, all
>
> I have some IIS log files whose format depends on "#Fields" line inside
> the log, which make the file not splitable and not suitable for MR job. So
> I want to preprocess the files to Avro files. It's simple and fast to
> transform each line to an Avro record, but the serialization and
> compression is too slow.
>
> Is there a way that the serialize and compress in parallel, while write
> sequentially? In principle I could even split the records to several files,
> which could serialize and compress in parallel, but I can't find a way to
> combine them.
>
> any suggestions? Thanks!
>
