I seem to be hitting this issue in pig-0.12, although the JIRA says it was fixed in pig-0.12:

https://issues.apache.org/jira/browse/PIG-3395
Large filter expression makes Pig hang

Cheers,
Suhas.


On Thu, Mar 6, 2014 at 4:26 PM, Suhas Satish <[email protected]> wrote:

> This is the pig script -
>
> %default previousPeriod $pPeriod
>
> tWeek = LOAD '/tmp/test_data.txt' USING PigStorage ('|') AS (WEEK:int,
> DESCRIPTION:chararray, END_DATE:chararray, PERIOD:int);
>
> gTWeek = FOREACH tWeek GENERATE WEEK AS WEEK, PERIOD AS PERIOD;
>
> *pWeek = FILTER gTWeek BY PERIOD == $previousPeriod;*
>
> pWeekRanked = RANK pWeek BY WEEK ASC DENSE;
>
> gpWeekRanked = FOREACH pWeekRanked GENERATE $0;
> store gpWeekRanked INTO 'gpWeekRanked';
> describe gpWeekRanked;
>
>
> Without the filter statement, the code runs without hanging.
>
> Cheers,
> Suhas.
>
>
> On Thu, Mar 6, 2014 at 3:05 PM, Suhas Satish <[email protected]> wrote:
>
>> Hi
>> I launched the attached pig job on pig-12 with hadoop MRv1 with the
>> attached data, but the FILTER function causes the job to get stuck in an
>> infinite loop.
>>
>> pig -p pPeriod=201312 -f test.pig
>>
>> The thread in question seems to be stuck forever inside while loop of
>> runPipeline method.
>>
>> stack trace:
>> -----------
>>
>> "main" prio=10 tid=0x00007fd74800b000 nid=0x2f63 runnable [0x00007fd750d50000]
>>    java.lang.Thread.State: RUNNABLE
>>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNextTuple(POForEach.java:217)
>>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:282)
>>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:277)
>>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
>>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:680)
>>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:346)
>>         at org.apache.hadoop.mapred.Child$4.run(Child.java:282)
>>         at java.security.AccessController.doPrivileged(Native Method)
>>         at javax.security.auth.Subject.doAs(Subject.java:415)
>>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1117)
>>         at org.apache.hadoop.mapred.Child.main(Child.java:271)
>>
>>
>> org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigGenericMapBase.java:
>>
>> protected void *runPipeline*(PhysicalOperator leaf) throws IOException,
>> InterruptedException {
>>         while(true){
>>             Result res = leaf.getNext(DUMMYTUPLE);
>>             if(res.returnStatus==POStatus.STATUS_OK){
>>                 collect(outputCollector,(Tuple)res.result);
>>                 continue;
>>             }
>>             ....
>>
>>
>> Whats the suggested code fix here?
>>
>>
>> Thanks,
>> Suhas.
>>
>
>
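For anyone following the thread, here is a simplified, self-contained model of the kind of status-driven pull loop quoted above. It is only a sketch and not Pig's code: the `Status` enum, `Result`, `Operator`, `fromList`, `drain` and the `maxPulls` guard are all hypothetical names made up for illustration, and the part of `runPipeline` elided by "...." is not reproduced. It shows why such a loop can only terminate if the leaf operator eventually returns something other than OK, and how a pull cap can turn a silent hang into a visible failure while debugging.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class PipelineLoopSketch {
    // Hypothetical status codes, loosely modelled on the idea of POStatus; not Pig's enum.
    enum Status { OK, NULL, EOP, ERR }

    static final class Result {
        final Status status;
        final Object value;
        Result(Status status, Object value) { this.status = status; this.value = value; }
    }

    /** Hypothetical stand-in for a physical operator that is pulled for tuples. */
    interface Operator {
        Result getNext();
    }

    /** Operator that yields each input once, then reports end of processing. */
    static Operator fromList(List<?> inputs) {
        Iterator<?> it = inputs.iterator();
        return () -> it.hasNext() ? new Result(Status.OK, it.next())
                                  : new Result(Status.EOP, null);
    }

    /**
     * Pulls from the leaf until it reports EOP. The maxPulls cap is a debugging
     * guard added for this sketch: an operator that returns OK forever would
     * otherwise keep the loop spinning, like the RUNNABLE thread in the trace.
     */
    static List<Object> drain(Operator leaf, int maxPulls) {
        List<Object> out = new ArrayList<>();
        for (int pulls = 0; pulls < maxPulls; pulls++) {
            Result res = leaf.getNext();
            switch (res.status) {
                case OK:
                    out.add(res.value);   // collect the tuple and pull again
                    break;
                case NULL:
                    break;                // nothing to emit for this pull
                case EOP:
                    return out;           // the only normal way out of the loop
                default:
                    throw new IllegalStateException("operator reported an error");
            }
        }
        throw new IllegalStateException("no EOP after " + maxPulls + " pulls");
    }

    public static void main(String[] args) {
        // Well-behaved operator: the loop ends once the input is exhausted.
        System.out.println(drain(fromList(Arrays.asList(1, 2, 3)), 100));

        // Misbehaving operator: always OK, so only the guard stops the loop.
        Operator stuck = () -> new Result(Status.OK, "same tuple");
        try {
            drain(stuck, 100);
        } catch (IllegalStateException e) {
            System.out.println("detected: " + e.getMessage());
        }
    }
}
```

Whether the real hang comes from an operator that never signals end of processing or from something else in the FILTER plan is not established in this thread; the model only illustrates the loop structure that the stack trace points at.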
