There's an open-source tool called FileCrush
(http://www.jointhegrid.com/hadoop_filecrush/index.jsp) that might be
helpful with this. Not sure if it supports lzo out of the box.
Basically, you want to balance the desire for fewer, larger files
against the amount of data you want a single reducer to write. Aim for
3-5 GB per reducer.
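
For example, a merge pass in Pig might look like the sketch below
(untested; the input/output paths, the PARALLEL value, and the
elephant-bird LzoTextLoader are assumptions for illustration, sized for
roughly 15-25 GB of input at 3-5 GB per reducer):

  -- write lzop-compressed output; the codec class comes from hadoop-lzo
  set mapred.output.compress true;
  set mapred.output.compression.codec com.hadoop.compression.lzo.LzopCodec;

  logs = LOAD '/logs/in/*.lzo'
         USING com.twitter.elephantbird.pig.load.LzoTextLoader()
         AS (line:chararray);

  -- group on a random key purely to force a reduce phase; PARALLEL
  -- controls how many merged output files you end up with
  grouped = GROUP logs BY ROUND(RANDOM() * 1000) PARALLEL 5;
  merged = FOREACH grouped GENERATE FLATTEN(logs);

  STORE merged INTO '/logs/merged' USING PigStorage();

You'd then run the lzo indexer over /logs/merged as before.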

D

On Fri, Dec 16, 2011 at 12:19 PM, Cameron Gandevia <[email protected]> wrote:
> Thanks Dmitriy. One more question. Is the correct way to merge them using a
> map/reduce job with a single reducer, or is there some Hadoop tool that can
> do this?
>
> On Fri, Dec 16, 2011 at 12:14 PM, Dmitriy Ryaboy <[email protected]> wrote:
>
>> Yes, the indexer isn't giving you anything if your files are so small.
>> You should merge your files if that's an option.
>>
>> On Fri, Dec 16, 2011 at 12:06 PM, Cameron Gandevia <[email protected]>
>> wrote:
>> > Hi
>> >
>> > I am currently ingesting lzo compressed log files, running the lzo
>> > indexer on them, and then running a bunch of pig scripts. The typical
>> > size of the lzo files is around 100 MB. I am wondering if I should run
>> > a map/reduce job to merge the files into a larger file prior to
>> > running the lzo indexer?
>> >
>> > Thanks
>>
>
>
>
> --
> Thanks
>
> Cameron Gandevia
