Hello.
My Pig job always runs with only one reduce task in version 0.12.0-h2, ... because
in the InputSizeReducerEstimator class the input file size is always -1.
I'm not sure of the reason, but in fact the PlanHelper.getPhysicalOperators
method always returns an empty list (size 0). Here is the estimator method:
> public int estimateNumberOfReducers(Job job, MapReduceOper mapReduceOper)
>         throws IOException {
>     Configuration conf = job.getConfiguration();
>     long bytesPerReducer = conf.getLong(BYTES_PER_REDUCER_PARAM,
>             DEFAULT_BYTES_PER_REDUCER);
>     int maxReducers = conf.getInt(MAX_REDUCER_COUNT_PARAM,
>             DEFAULT_MAX_REDUCER_COUNT_PARAM);
>     List<POLoad> poLoads =
>             PlanHelper.getPhysicalOperators(mapReduceOper.mapPlan, POLoad.class);
>     long totalInputFileSize = getTotalInputFileSize(conf, poLoads, job);
>     log.info("BytesPerReducer=" + bytesPerReducer + " maxReducers="
>             + maxReducers + " totalInputFileSize=" + totalInputFileSize);
>     // if totalInputFileSize == -1, we couldn't get the input size so
>     // we can't estimate.
>     if (totalInputFileSize == -1) { return -1; }
>     int reducers = (int) Math.ceil((double) totalInputFileSize /
>             bytesPerReducer);
>     reducers = Math.max(1, reducers);
>     reducers = Math.min(maxReducers, reducers);
>     return reducers;
> }
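The arithmetic in the quoted method can be reproduced outside Pig to see why an input size of -1 means no estimate at all, while any known size yields at least one reducer. This is a minimal standalone sketch, not Pig's code: the class name, the `totalInputFileSize(List<Long>)` helper, and the byte/reducer values in `main` are assumptions for illustration. The helper only models the hypothesis above (an empty list of loads gives -1); Pig's real `getTotalInputFileSize()` does more than this.

```java
import java.util.Arrays;
import java.util.List;

public class ReducerEstimateSketch {

    // HYPOTHETICAL simplification of getTotalInputFileSize(): sums the known
    // input sizes and reports -1 when there are no load operators at all.
    static long totalInputFileSize(List<Long> loadSizes) {
        if (loadSizes.isEmpty()) {
            return -1; // nothing to measure, so the size is "unknown"
        }
        long total = 0;
        for (long size : loadSizes) {
            total += size;
        }
        return total;
    }

    // Same arithmetic as the quoted estimateNumberOfReducers():
    // -1 means "no estimate"; otherwise one reducer per bytesPerReducer bytes,
    // clamped to the range [1, maxReducers].
    static int estimate(long totalInputFileSize, long bytesPerReducer, int maxReducers) {
        if (totalInputFileSize == -1) {
            return -1;
        }
        int reducers = (int) Math.ceil((double) totalInputFileSize / bytesPerReducer);
        reducers = Math.max(1, reducers);
        reducers = Math.min(maxReducers, reducers);
        return reducers;
    }

    public static void main(String[] args) {
        long bytesPerReducer = 1_000_000_000L; // illustrative value (1 GB per reducer)
        int maxReducers = 999;                 // illustrative value

        // Two inputs totalling 2.5 GB: ceil(2.5) = 3 reducers.
        long known = totalInputFileSize(Arrays.asList(1_500_000_000L, 1_000_000_000L));
        System.out.println(estimate(known, bytesPerReducer, maxReducers));

        // No load operators found: size is -1, so the estimator returns -1.
        long unknown = totalInputFileSize(Arrays.<Long>asList());
        System.out.println(estimate(unknown, bytesPerReducer, maxReducers));
    }
}
```

Running `main` prints `3` and then `-1`: with a known 2.5 GB input the sketch asks for three reducers, but when the list of loads is empty it never gets to the division at all.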
The Pig job finishes successfully.
However, only one reduce task is planned, so the job takes a very long time.
I tried it with Apache Hadoop 2.2.0 and Pig 0.12.0 (h2),
and also with another installation set up through Ambari 1.4.3.
The result is always the same.
What is wrong?