Uh-oh. The current development version is not usable for my service: my Pig job failed with a different problem. I would like to find out why the job failed, but I don't have much time, and it looks like a more serious problem.
Right now, pig-0.12.0-h2 is more stable in my case, so I have decided to keep using pig-0.12.0-h2 and wait for the public release of pig-0.12.1-h2.

2014-02-07 18:35 GMT+09:00 최종원 <[email protected]>:

> Finally, I solved the problem. Thank you.
>
> It is fixed in version 0.12.1.
>
> I downloaded the source code from GitHub, changed the Pig version,
> and built version 0.12.1 with the h2 option.
>
> It now calculates the size of the input source files and creates multiple reduce tasks.
>
> Thank you very much. Happy weekend, bye.
>
> 2014-02-07 15:48 GMT+09:00 최종원 <[email protected]>:
>
>> Thank you for your answer.
>>
>> But where can I find the Pig source for pig-0.12.0-h2?
>> I think there must be a difference between pig-0.12.0 and pig-0.12.0-h2,
>> but I cannot find the source for version 0.12.0-h2.
>>
>> When I extract the jar file, there are additional packages (like
>> org.apache.pig.backend.hadoop23.PigJobControl ...).
>>
>> 2014-02-07 10:59 GMT+09:00 Cheolsoo Park <[email protected]>:
>>
>>> Hi,
>>>
>>> Sounds like you're bitten by PIG-3512 -
>>> https://issues.apache.org/jira/browse/PIG-3512
>>>
>>> Can you try to apply the patch and rebuild the jar?
>>>
>>> Thanks,
>>> Cheolsoo
>>>
>>> On Thu, Feb 6, 2014 at 7:27 PM, 최종원 <[email protected]> wrote:
>>>
>>> > This is the log ...
>>> >
>>> > 2014-02-06 17:29:19,087 [Thread-42] INFO
>>> > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
>>> > - Reduce phase detected, estimating # of required reducers.
>>> > 2014-02-06 17:29:19,087 [Thread-42] INFO
>>> > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
>>> > - Using reducer estimator:
>>> > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator
>>> > 2014-02-06 17:29:19,087 [Thread-42] INFO
>>> > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator
>>> > - BytesPerReducer=100000000 maxReducers=999 totalInputFileSize=-1
>>> > 2014-02-06 17:29:19,087 [Thread-42] INFO
>>> > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
>>> > - Could not estimate number of reducers and no requested or default
>>> > parallelism set. Defaulting to 1 reducer.
>>> > 2014-02-06 17:29:19,087 [Thread-42] INFO
>>> > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
>>> > - Setting Parallelism to 1
>>> > 2014-02-06 17:29:19,104 [Thread-42] INFO
>>> > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
>>> > - 1 map-reduce job(s) waiting for submission.
>>> >
>>> > InputSizeReducerEstimator cannot calculate the size of the map input files,
>>> > so it doesn't estimate the number of reducers.
>>> > But I think I gave the right Hadoop file path.
>>> > I tried many possible paths, like:
>>> >
>>> > relative-path/to/file
>>> > /user/myuser/absolute-path/to/file
>>> > hdfs://host:8020/user/myuser/absolute-path/to/file
>>> > hdfs://host:9000/user/myuser/absolute-path/to/file/change-the-hdfs-port
>>> >
>>> > etc...
>>> >
>>> > but Pig still failed to estimate the number of reducers.
>>> >
>>> > I am almost defeated by this problem.
>>> >
>>> > 2014-02-06 21:31 GMT+09:00 최종원 <[email protected]>:
>>> >
>>> > > Hello.
>>> > >
>>> > > My Pig job always creates a single reduce task in version 0.12.0-h2, because
>>> > > the InputSizeReducerEstimator class always returns -1 as the input file size.
>>> > >
>>> > > I'm not sure of the reason, but the PlanHelper.getPhysicalOperators
>>> > > method always returns an empty list.
>>> > >
>>> > >> public int estimateNumberOfReducers(Job job, MapReduceOper mapReduceOper) throws IOException {
>>> > >>     Configuration conf = job.getConfiguration();
>>> > >>     long bytesPerReducer = conf.getLong(BYTES_PER_REDUCER_PARAM, DEFAULT_BYTES_PER_REDUCER);
>>> > >>     int maxReducers = conf.getInt(MAX_REDUCER_COUNT_PARAM, DEFAULT_MAX_REDUCER_COUNT_PARAM);
>>> > >>     List<POLoad> poLoads = PlanHelper.getPhysicalOperators(mapReduceOper.mapPlan, POLoad.class);
>>> > >>     long totalInputFileSize = getTotalInputFileSize(conf, poLoads, job);
>>> > >>     log.info("BytesPerReducer=" + bytesPerReducer + " maxReducers="
>>> > >>         + maxReducers + " totalInputFileSize=" + totalInputFileSize);
>>> > >>     // if totalInputFileSize == -1, we couldn't get the input size so we can't estimate.
>>> > >>     if (totalInputFileSize == -1) { return -1; }
>>> > >>     int reducers = (int)Math.ceil((double)totalInputFileSize / bytesPerReducer);
>>> > >>     reducers = Math.max(1, reducers);
>>> > >>     reducers = Math.min(maxReducers, reducers);
>>> > >>     return reducers;
>>> > >> }
>>> > >
>>> > > and the Pig job ends successfully.
>>> > >
>>> > > But because the reduce phase is planned as only one task, it takes a very long time.
>>> > >
>>> > > I tried it with Apache Hadoop 2.2.0 and Pig 0.12.0 (h2).
>>> > >
>>> > > I also tried another version, installed via Ambari 1.4.3.
>>> > >
>>> > > The result is always the same.
>>> > >
>>> > > What is wrong?
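
For reference, here is a minimal sketch of the arithmetic in the estimateNumberOfReducers method quoted above, pulled out into a standalone class so it can be run without Pig or Hadoop. The class name ReducerEstimateSketch and the method estimateReducers are hypothetical names chosen for illustration, and the input sizes in main are made-up sample values. Only the arithmetic and the 100000000 / 999 settings come from the quoted code and log. It is meant to show why an input size of -1 yields no estimate, which is the case where the log reports falling back to 1 reducer.

```java
public class ReducerEstimateSketch {

    /**
     * Hypothetical helper mirroring the arithmetic of the quoted
     * estimateNumberOfReducers(); this is not Pig's actual class.
     */
    public static int estimateReducers(long totalInputFileSize,
                                       long bytesPerReducer,
                                       int maxReducers) {
        // -1 means the input size could not be determined, so no estimate is possible.
        if (totalInputFileSize == -1) {
            return -1;
        }
        // One reducer per bytesPerReducer of input, rounded up.
        int reducers = (int) Math.ceil((double) totalInputFileSize / bytesPerReducer);
        // Clamp to the range [1, maxReducers].
        reducers = Math.max(1, reducers);
        reducers = Math.min(maxReducers, reducers);
        return reducers;
    }

    public static void main(String[] args) {
        long bytesPerReducer = 100_000_000L; // BytesPerReducer value from the log
        int maxReducers = 999;               // maxReducers value from the log

        // Unknown input size, as in the failing job: prints -1
        System.out.println(estimateReducers(-1L, bytesPerReducer, maxReducers));
        // Sample input of 250,000,000 bytes: prints 3
        System.out.println(estimateReducers(250_000_000L, bytesPerReducer, maxReducers));
        // Sample input of 500,000,000,000 bytes, capped at maxReducers: prints 999
        System.out.println(estimateReducers(500_000_000_000L, bytesPerReducer, maxReducers));
    }
}
```

With a known input size the estimate scales with the data, so the single reducer seen in the log follows from the -1 input size rather than from the BytesPerReducer or maxReducers settings.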
