My pig.properties specifies only these parameters: log4jconf,
fs.default.name, and mapred.job.tracker. So it should use the
CombineFileInputFormat by default. I have 100,000 files of around 16 KB each.
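
As a rough sanity check on what split combination should buy with this input, here is a minimal Python sketch (not Pig code). It assumes a 64 MB block size, which is only a guess for illustration, and a hypothetical helper estimate_combined_splits that packs files greedily by size alone. Pig's real combination logic may also take data locality into account, so the actual count can differ.

```python
def estimate_combined_splits(file_sizes, max_combined_split_size):
    """Greedily pack files into splits of at most max_combined_split_size bytes.

    Hypothetical helper for illustration only; it ignores data locality.
    """
    splits = 0
    current = 0  # bytes accumulated in the split being built
    for size in file_sizes:
        # Close the current split if adding this file would overflow it.
        if current > 0 and current + size > max_combined_split_size:
            splits += 1
            current = 0
        current += size
    if current > 0:
        splits += 1  # count the last, partially filled split
    return splits


KB = 1024
MB = 1024 * KB

file_sizes = [16 * KB] * 100_000   # 100,000 files of about 16 KB each
assumed_block_size = 64 * MB       # assumption: stands in for pig.maxCombinedSplitSize

print("splits without combination:", len(file_sizes))
print("splits with combination:   ",
      estimate_combined_splits(file_sizes, assumed_block_size))
```

Under these assumptions the input drops from 100,000 splits (one per file) to 25, so the "(combined)" count in the job output should be far smaller than the number of input files.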

2012/1/11 Prashant Kommireddi <[email protected]>

> Hi Marcel,
>
> You might not find "pig.splitCombination" in your configuration if it is
> not set manually. Pig internally defaults it to true.
>
> What is the value of "pig.maxCombinedSplitSize"? If you are not setting it
> manually, it should be equal to your block size. What is the individual
> file size of the small files?
>
> Thanks,
> Prashant
>
>
> On Wed, Jan 11, 2012 at 3:18 PM, Marcel Holle
> <[email protected]> wrote:
>
> > If I got it right I should see an output like "Total input paths
> > (combined) to process : 7" when I run a pig script, but I'm missing the
> > "(combined)" part, so CombineFileInputFormat is not used? Where could I
> > find the pig configuration? I think I have to check the
> > "pig.splitCombination" value.
> >
> > 2012/1/11 Daniel Dai <[email protected]>
> >
> > > Check PIG-1518.
> > >
> > > Daniel
> > >
> > > On Wed, Jan 11, 2012 at 11:01 AM, Marcel Holle
> > > <[email protected]> wrote:
> > >
> > > > How could I verify this information? Could you point me to a config
> > > > or the source code?
> > > >
> > > > 2012/1/11 Daniel Dai <[email protected]>
> > > >
> > > > > It is default in 0.8 as well.
> > > > >
> > > > > Daniel
> > > > >
> > > > > On Wed, Jan 11, 2012 at 10:43 AM, Marcel Holle
> > > > > <[email protected]> wrote:
> > > > >
> > > > > > Is there also a way to activate the CombineFileInputFormat in Pig
> > > > > > 0.8.1?
> > > > > >
> > > > > > 2012/1/10 Alex Rovner <[email protected]>
> > > > > >
> > > > > > > In versions 0.9+ the default is CombineFileInputFormat
> > > > > > >
> > > > > > > On Tue, Jan 10, 2012 at 8:10 PM, Marcel Holle
> > > > > > > <[email protected]> wrote:
> > > > > > >
> > > > > > > > How could I use the CombineFileInputFormat in Pig? I have a
> > > > > > > > performance issue with lots of small files which I want to get
> > > > > > > > rid of. I think by default the FileInputFormat is used.
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>