Thanks, Martin.

I was using C# on Windows to generate the Avro files, and am now learning
to use Java.

The reason I want to combine the Avro files is that MR and HDFS work better
with large files. If we split the original log files, there would be too
many small files.
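
A minimal sketch of the idea from the original question (compress in
parallel, write sequentially), using only the Java standard library. It
assumes the records have already been serialized into byte blocks. The
class name ParallelBlockWriter and the length-prefixed output layout are
hypothetical, chosen for illustration; this is not the Avro container
format. Worker threads do the compression, and a single thread writes the
results in their original order.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

public class ParallelBlockWriter {

    // CPU-bound step: deflate one block of already-serialized records.
    static byte[] compress(byte[] block) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DeflaterOutputStream out = new DeflaterOutputStream(buf)) {
            out.write(block);
        }
        return buf.toByteArray();
    }

    // Compress blocks on a thread pool, then write them from the calling
    // thread in submission order, each prefixed with its compressed length.
    // Note: this submits every block up front, so all blocks are held in
    // memory; a real tool would bound the number of blocks in flight.
    static void writeBlocks(List<byte[]> blocks, OutputStream sink, int threads)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<byte[]>> pending = new ArrayList<>();
            for (byte[] block : blocks) {
                pending.add(pool.submit(() -> compress(block)));
            }
            DataOutputStream out = new DataOutputStream(sink);
            for (Future<byte[]> f : pending) {
                byte[] compressed = f.get(); // blocks until this one is ready
                out.writeInt(compressed.length);
                out.write(compressed);
            }
            out.flush();
        } finally {
            pool.shutdown();
        }
    }

    // Read the length-prefixed blocks back and inflate them (for checking).
    static List<byte[]> readBlocks(InputStream source) throws IOException {
        DataInputStream in = new DataInputStream(source);
        List<byte[]> blocks = new ArrayList<>();
        while (true) {
            int length;
            try {
                length = in.readInt();
            } catch (EOFException endOfInput) {
                break;
            }
            byte[] compressed = new byte[length];
            in.readFully(compressed);
            ByteArrayOutputStream plain = new ByteArrayOutputStream();
            try (InflaterInputStream inflater =
                     new InflaterInputStream(new ByteArrayInputStream(compressed))) {
                byte[] chunk = new byte[4096];
                int n;
                while ((n = inflater.read(chunk)) != -1) {
                    plain.write(chunk, 0, n);
                }
            }
            blocks.add(plain.toByteArray());
        }
        return blocks;
    }

    public static void main(String[] args) throws Exception {
        List<byte[]> blocks = new ArrayList<>();
        for (int i = 0; i < 4; i++) {
            blocks.add(("log lines for block " + i).getBytes(StandardCharsets.UTF_8));
        }
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        writeBlocks(blocks, sink, 2);
        List<byte[]> restored = readBlocks(new ByteArrayInputStream(sink.toByteArray()));
        System.out.println("restored blocks: " + restored.size());
    }
}
```

Because the futures are consumed in the order they were submitted, the
output order matches the input order even when a later block finishes
compressing first. For real Avro files in Java, DataFileWriter.appendAllFrom
looks like the relevant call for appending the blocks of one container file
to another with the same schema, but I have not verified its behaviour here.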


2014/1/8 Martin Kleppmann <[email protected]>

> Which language are you using to generate the Avro files?
>
> If you're using MapReduce to process the Avro files, it should be able to
> handle multiple input files anyway, so there shouldn't be any need to
> combine the compressed files. Can you just leave them as smaller files?
>
> Martin
>
>
> On 6 January 2014 02:30, Fengyun RAO <[email protected]> wrote:
>
>> Hi, all
>>
>> I have some IIS log files whose format depends on the "#Fields" line
>> inside the log, which makes the files unsplittable and unsuitable for an
>> MR job. So I want to preprocess the files into Avro files. It's simple and
>> fast to transform each line into an Avro record, but the serialization and
>> compression are too slow.
>>
>> Is there a way to serialize and compress in parallel while writing
>> sequentially? In principle I could even split the records into several
>> files, which could be serialized and compressed in parallel, but I can't
>> find a way to combine them.
>>
>> Any suggestions? Thanks!
>>
>
>
