I want to know whether it's possible to write a loop in Pig and end the
loop based on some feedback variable.
More specifically:
1. I want to read a set of files/directories with different names,
process each of them with the same workflow, and then join all of the
processed results.
e.g.
A = load 'a.txt' as (a,b,c);
AGroup = foreach (group A by a) generate group as a, COUNT(A) as ACount;
B = load 'b.txt' as (a,b,c);
BGroup = foreach (group B by a) generate group as a, COUNT(B) as BCount;
C = load 'c.txt' as (a,b,c);
CGroup = foreach (group C by a) generate group as a, COUNT(C) as CCount;
....
X = load 'x.txt' as (a,b,c);
XGroup = foreach (group X by a) generate group as a, COUNT(X) as XCount;
Result = foreach (join AGroup by a, BGroup by a, CGroup by a, ...,
XGroup by a) generate AGroup::a, ACount, BCount, CCount, ...., XCount;
Is it possible to simplify these statements by using loop-like statements?
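
For illustration, one way the repeated load/group/count pattern in item 1 could be produced without writing each statement by hand is to generate the Pig Latin text from a host language. The sketch below uses Python purely as a text generator. The function name build_pig_script is hypothetical, and it assumes every input is named '<name>.txt' with the same (a, b, c) schema and that at least two names are given. It does not run Pig; it only builds the script text.

```python
def build_pig_script(names):
    """Return Pig Latin text that counts rows per key `a` in each
    '<name>.txt' file and joins the per-file counts on `a`.

    `names` is a list of short file stems such as ["a", "b", "c"];
    this naming scheme is an assumption made for the example.
    """
    lines = []
    for name in names:
        alias = name.upper()
        lines.append(f"{alias} = load '{name}.txt' as (a, b, c);")
        lines.append(
            f"{alias}Group = foreach (group {alias} by a) "
            f"generate group as a, COUNT({alias}) as {alias}Count;"
        )
    # One inner join across all per-file count relations.
    join_keys = ", ".join(f"{n.upper()}Group by a" for n in names)
    counts = ", ".join(f"{n.upper()}Count" for n in names)
    first = names[0].upper()
    lines.append(
        f"Result = foreach (join {join_keys}) "
        f"generate {first}Group::a, {counts};"
    )
    return "\n".join(lines)


script = build_pig_script(["a", "b", "c"])
print(script)
```

The printed text could then be saved to a file and passed to Pig. As in the hand-written version, the inner join keeps only keys that appear in every input.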
2. I want to run one statement again and again until one UDF's value is 0,
e.g. I want something like the following:
A = load something;
while(true){
A= foreach A generate UDF1(a), FEEDBACKUDF(a) as Signal;
if(Signal==0)
break;
}
Is it possible to do the above in Pig, and if so, how?
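
To make the control flow in item 2 concrete, here is a minimal sketch of the same loop written as an external driver in Python. Everything in it is an assumption for illustration: iterate_until_zero and run_step are hypothetical names, and run_step stands in for "launch one Pig job and read back a single signal value". Since Signal in the pseudocode is a per-row field, a real driver would need the job to reduce it to one value first. The cap on iterations is also an addition, so that a signal that never reaches 0 does not loop forever.

```python
def iterate_until_zero(run_step, state, max_iterations=100):
    """Call run_step repeatedly until it reports a signal of 0.

    run_step(state) must return (new_state, signal). It is a
    hypothetical callback; in practice it would run one Pig job
    and return the job's output location plus the feedback value.
    Returns (final_state, number_of_iterations_run).
    """
    for iteration in range(1, max_iterations + 1):
        state, signal = run_step(state)
        if signal == 0:
            return state, iteration
    raise RuntimeError("signal did not reach 0 within max_iterations")


def fake_step(value):
    """Stand-in for a Pig job: halve the value and use it as the signal."""
    new_value = value // 2
    return new_value, new_value


final_state, rounds = iterate_until_zero(fake_step, 20)
print(final_state, rounds)
```

Here fake_step only simulates the feedback so that the loop can be run locally; it says nothing about how UDF1 or FEEDBACKUDF behave.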
Thanks a lot!
--
Regards,
Yong-gang Cao
Seattle,WA,98104