What load function are you using? If it implements some of the interfaces specified here, split combination is turned off:
http://pig.apache.org/docs/r0.9.1/perf.html#combine-files

-Thejas


On 1/11/12 11:07 PM, Marcel Holle wrote:
Only these parameters are specified in my pig.properties: log4jconf,
fs.default.name, mapred.job.tracker. So it should use the
CombineFileInputFormat by default. I have 100,000 files of around 16 KB each.
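
For a rough sense of scale, here is a back-of-the-envelope sketch in Python (not Pig) of how many map inputs to expect with and without split combination. The 64 MB limit is an assumption (a common HDFS block size at the time; the thread does not state the cluster's block size), and the function name is made up for illustration. Pig also considers data locality when combining, so the combined figure is best read as a lower bound, not a prediction.

```python
import math

def estimate_map_inputs(num_files, file_size_bytes, max_combined_split_bytes):
    """Return (uncombined, combined) map-input estimates for equally sized files."""
    # Without combination, each small file becomes its own split.
    uncombined = num_files
    # With combination, files are packed together up to the size limit.
    # If a file is larger than the limit, fall back to one file per split.
    files_per_split = max(1, max_combined_split_bytes // file_size_bytes)
    combined = math.ceil(num_files / files_per_split)
    return uncombined, combined

KB = 1024
MB = 1024 * KB

# 100,000 files of about 16 KB, as described above; the 64 MB limit is assumed.
uncombined, combined = estimate_map_inputs(100_000, 16 * KB, 64 * MB)
print(uncombined, combined)  # prints "100000 25"
```

Under these assumptions the job would drop from 100,000 map inputs to a few dozen, which is why the missing "(combined)" in the log output matters here.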

2012/1/11 Prashant Kommireddi <[email protected]>

Hi Marcel,

You might not find "pig.splitCombination" in your configuration if it has
not been set manually. Pig internally defaults it to true.

What is the value of "pig.maxCombinedSplitSize"? If you are not setting it
manually, it should be equal to your block size. What is the size of each
of the small files?

Thanks,
Prashant
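
For reference, the two properties discussed above could be set explicitly in pig.properties along these lines. This is a sketch only: the property names are the ones quoted in this thread, but the 128 MB value is an arbitrary illustration, not a recommendation, and both lines can be left out if the defaults are wanted.

```properties
# Split combination is on by default; stated here only to make it explicit.
pig.splitCombination=true
# Upper bound, in bytes, on the data combined into one map input (128 MB, illustrative).
pig.maxCombinedSplitSize=134217728
```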


On Wed, Jan 11, 2012 at 3:18 PM, Marcel Holle <[email protected]> wrote:

If I got it right, I should see output like "Total input paths (combined)
to process : 7" when I run a Pig script, but the "(combined)" part is
missing. Does that mean CombineFileInputFormat is not being used? Where can
I find the Pig configuration? I think I have to check the
"pig.splitCombination" value.

2012/1/11 Daniel Dai <[email protected]>

Check PIG-1518.

Daniel

On Wed, Jan 11, 2012 at 11:01 AM, Marcel Holle <[email protected]> wrote:

How could I verify this information? Could you point me to a config or
the source code?

2012/1/11 Daniel Dai <[email protected]>

It is the default in 0.8 as well.

Daniel

On Wed, Jan 11, 2012 at 10:43 AM, Marcel Holle <[email protected]> wrote:

Is there also a way to activate the CombineFileInputFormat in Pig 0.8.1?

2012/1/10 Alex Rovner <[email protected]>

In versions 0.9+ the default is CombineFileInputFormat.

On Tue, Jan 10, 2012 at 8:10 PM, Marcel Holle <[email protected]> wrote:

How could I use the CombineFileInputFormat in Pig? I have a performance
issue with lots of small files which I want to get rid of. I think by
default the FileInputFormat is used.









