I'm using Pig 0.6.0, and the fix for bug PIG-619 is causing a performance issue with some of my jobs. In Pig 0.3.0 a fix was added that creates an empty slice for every file with a zero file length. In some cases this causes a number of unneeded map tasks to run. I tried to duplicate the problem described in PIG-619 on Pig 0.6.0 running on Hadoop 0.20.2 by removing the change and running the scenarios from the issue, but wasn't able to reproduce it. I have a couple of questions I'm hoping someone can answer.
Does anybody know whether the underlying issue was fixed by the move to Hadoop 0.20.2, so that the workaround could simply be removed? The fix was added in Pig 0.3.0, which ran on Hadoop 0.18.0. Alternatively, can I change the code to add just one empty slice when all the files are empty, instead of an empty slice for every empty file? If I could duplicate the original problem I would feel better about making the change.
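
To make the second question concrete, here is a rough sketch of the behavior I have in mind. This is not the actual Pig slicing code: the class `EmptySliceSketch`, the nested `Slice` type, and the method `planSlices` are hypothetical names made up for illustration, and plain file lengths stand in for the real file status objects.

```java
import java.util.ArrayList;
import java.util.List;

public class EmptySliceSketch {

    // Hypothetical stand-in for a slice: which file it covers and how many bytes.
    static final class Slice {
        final int fileIndex;
        final long length;

        Slice(int fileIndex, long length) {
            this.fileIndex = fileIndex;
            this.length = length;
        }
    }

    // Assumed behavior: one slice per non-empty file; zero-length files are
    // skipped unless every file is empty, in which case a single empty slice
    // is emitted so that at least one map task still runs.
    static List<Slice> planSlices(long[] fileLengths) {
        List<Slice> slices = new ArrayList<Slice>();
        int firstEmpty = -1;
        for (int i = 0; i < fileLengths.length; i++) {
            if (fileLengths[i] > 0) {
                slices.add(new Slice(i, fileLengths[i]));
            } else if (firstEmpty < 0) {
                firstEmpty = i; // remember one empty file as a fallback
            }
        }
        if (slices.isEmpty() && firstEmpty >= 0) {
            slices.add(new Slice(firstEmpty, 0L));
        }
        return slices;
    }

    public static void main(String[] args) {
        // All files empty: a single empty slice.
        System.out.println(planSlices(new long[] {0, 0, 0}).size());
        // Mixed input: only the two non-empty files get slices.
        System.out.println(planSlices(new long[] {0, 128, 0, 64}).size());
    }
}
```

With the current fix, the first case would produce three empty slices and the second would produce four slices in total, which is where the unneeded map tasks come from.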
