I seem to be hitting this issue in pig-0.12, although the issue claims to
be fixed in pig-0.12:
https://issues.apache.org/jira/browse/PIG-3395
"Large filter expression makes Pig hang"

Cheers,
Suhas.


On Thu, Mar 6, 2014 at 4:26 PM, Suhas Satish <[email protected]> wrote:

> This is the pig script -
>
> %default previousPeriod $pPeriod
>
> tWeek = LOAD '/tmp/test_data.txt' USING PigStorage ('|') AS (WEEK:int,
> DESCRIPTION:chararray, END_DATE:chararray, PERIOD:int);
>
> gTWeek = FOREACH tWeek GENERATE WEEK AS WEEK, PERIOD AS PERIOD;
>
> pWeek = FILTER gTWeek BY PERIOD == $previousPeriod;
>
> pWeekRanked = RANK pWeek BY WEEK ASC DENSE;
>
> gpWeekRanked = FOREACH pWeekRanked GENERATE $0;
> store gpWeekRanked INTO 'gpWeekRanked';
> describe gpWeekRanked;
>
>
> Without the filter statement, the code runs without hanging.
>
> Cheers,
> Suhas.
>
>
> On Thu, Mar 6, 2014 at 3:05 PM, Suhas Satish <[email protected]> wrote:
>
>> Hi,
>> I launched the attached Pig job on pig-0.12 with Hadoop MRv1 and the
>> attached data, but the FILTER statement causes the job to get stuck in an
>> infinite loop.
>>
>> pig -p pPeriod=201312 -f test.pig
>>
>> The thread in question seems to be stuck forever inside the while loop of
>> the runPipeline method.
>>
>> stack trace:
>> -----------
>>
>> "main" prio=10 tid=0x00007fd74800b000 nid=0x2f63 runnable [0x00007fd750d50000]
>>    java.lang.Thread.State: RUNNABLE
>>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNextTuple(POForEach.java:217)
>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:282)
>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:277)
>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
>>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:680)
>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:346)
>>     at org.apache.hadoop.mapred.Child$4.run(Child.java:282)
>>     at java.security.AccessController.doPrivileged(Native Method)
>>     at javax.security.auth.Subject.doAs(Subject.java:415)
>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1117)
>>     at org.apache.hadoop.mapred.Child.main(Child.java:271)
>>
>>
>>
>>
>> org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigGenericMapBase.java:
>>
>> protected void runPipeline(PhysicalOperator leaf) throws IOException,
>> InterruptedException {
>>         while(true){
>>             Result res = leaf.getNext(DUMMYTUPLE);
>>             if(res.returnStatus==POStatus.STATUS_OK){
>>                 collect(outputCollector,(Tuple)res.result);
>>                 continue;
>>             }
>> ....
>>
>>
>>
>> What's the suggested code fix here?
>>
>>
>> Thanks,
>> Suhas.
>>
>
>
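
For anyone reading the stack trace quoted above, the shape of the loop can be modeled in isolation. The following is only a minimal sketch and is not Pig's code: the Status enum, the Leaf interface, the fromList helper and the maxCalls cap are all hypothetical names introduced for illustration, and the elided part of the quoted runPipeline method is not reproduced. It assumes, purely for the sake of the example, that the loop stops only when the leaf reports an end-of-input status, so a leaf that never reports one keeps the loop spinning.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class PipelineLoopSketch {
    // Hypothetical status codes; not Pig's POStatus constants.
    enum Status { OK, NULL, EOP }

    // Hypothetical stand-in for a leaf operator: each call yields one status.
    interface Leaf {
        Status next();
    }

    // Polls the leaf in a loop: count a record on OK, return on EOP,
    // poll again on anything else. maxCalls is a safety cap added for this
    // sketch so that a stuck leaf is reported (-1) instead of hanging.
    static int drain(Leaf leaf, int maxCalls) {
        int collected = 0;
        for (int calls = 0; calls < maxCalls; calls++) {
            Status s = leaf.next();
            if (s == Status.OK) {
                collected++;
                continue;
            }
            if (s == Status.EOP) {
                return collected;
            }
            // Any other status: fall through and poll again.
        }
        return -1; // cap reached: the leaf never signalled end of input
    }

    // Builds a leaf that replays the given statuses, then reports EOP forever.
    static Leaf fromList(List<Status> statuses) {
        Iterator<Status> it = statuses.iterator();
        return () -> it.hasNext() ? it.next() : Status.EOP;
    }

    public static void main(String[] args) {
        Leaf healthy = fromList(Arrays.asList(Status.OK, Status.OK, Status.NULL, Status.OK));
        System.out.println(drain(healthy, 1000)); // prints 3

        Leaf stuck = () -> Status.NULL; // never reports OK or EOP
        System.out.println(drain(stuck, 1000)); // prints -1
    }
}
```

Without the cap, the second call would never return, which matches the symptom of a thread that stays RUNNABLE inside the loop. Whether that is what actually happens in this job is not something the sketch can show.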
