My pig.properties specifies only these parameters: log4jconf, fs.default.name, and mapred.job.tracker. So it should use CombineFileInputFormat by default. I have 100,000 files of around 16K each.
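As a rough sanity check on those numbers, the sketch below estimates how many splits one would expect with and without split combination. It assumes a 64 MB block size (not stated anywhere in this thread) and that "pig.maxCombinedSplitSize" falls back to the block size, as described in the quoted reply. The helper name is made up for illustration, and the real count can differ because Pig also takes block locations into account when combining.

```python
import math


def estimate_combined_splits(num_files, file_size_bytes, max_combined_split_bytes):
    """Rough lower bound on the number of splits when small files are packed
    together up to max_combined_split_bytes. Hypothetical helper, not Pig code."""
    total_bytes = num_files * file_size_bytes
    return math.ceil(total_bytes / max_combined_split_bytes)


KIB = 1024
MIB = 1024 * KIB

num_files = 100_000          # from the message above
file_size = 16 * KIB         # "around 16K" per file
block_size = 64 * MIB        # ASSUMPTION: block size is not given in the thread

combined = estimate_combined_splits(num_files, file_size, block_size)
print(f"without combination: {num_files} splits (one per file)")
print(f"with combination:    about {combined} splits")
```

Under these assumptions the job would go from one split per file to a few dozen, so a job that still launches roughly 100,000 map tasks would suggest that combination is not taking effect.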
2012/1/11 Prashant Kommireddi <[email protected]>

> Hi Marcel,
>
> You might not find "pig.splitCombination" in your configuration if not
> manually set. Pig internally defaults it to true.
>
> What is the value of "pig.maxCombinedSplitSize", if you are not setting it
> manually this should be equal to your block size. What is the individual
> filesize of the small files?
>
> Thanks,
> Prashant
>
> On Wed, Jan 11, 2012 at 3:18 PM, Marcel Holle <[email protected]> wrote:
>
> > If I got it right I should see an output like "Total input paths (combined)
> > to process : 7" when I run a pig script, but I'm missing the "(combined)"
> > part, so CombineFileInputFormat is not used? Where could I find the pig
> > configuration? I think I have to check the "pig.splitCombination" value.
> >
> > 2012/1/11 Daniel Dai <[email protected]>
> >
> > > Check PIG-1518.
> > >
> > > Daniel
> > >
> > > On Wed, Jan 11, 2012 at 11:01 AM, Marcel Holle <[email protected]> wrote:
> > >
> > > > How could I verify this information? Could you point me to a config or
> > > > the source code?
> > > >
> > > > 2012/1/11 Daniel Dai <[email protected]>
> > > >
> > > > > It is default in 0.8 as well.
> > > > >
> > > > > Daniel
> > > > >
> > > > > On Wed, Jan 11, 2012 at 10:43 AM, Marcel Holle <[email protected]> wrote:
> > > > >
> > > > > > Is there also a way to activate the CombineFileInputFormat in Pig
> > > > > > 0.8.1?
> > > > > >
> > > > > > 2012/1/10 Alex Rovner <[email protected]>
> > > > > >
> > > > > > > In versions 9+ default is CombineFileInputFormat
> > > > > > >
> > > > > > > On Tue, Jan 10, 2012 at 8:10 PM, Marcel Holle <[email protected]> wrote:
> > > > > > >
> > > > > > > > How could I use the CombineFileInputFormat in Pig? I have a
> > > > > > > > performance issue with lots of small files which I want to get
> > > > > > > > rid of. I think by default the FileInputFormat is used.
