I don't think there's a need to reinvent, or reimplement, the wheel here. You are just talking about templates. Try http://template-toolkit.org/ (or any of the ruby / python variants on the theme).
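For instance, here is a minimal sketch of the template approach using Python's standard-library string.Template. This is an assumption about how one might generate the script, not something Pig provides: the helper name jordi_group, the "out" parameter and the "_G" suffix for the intermediate alias are all made up for illustration.

```python
from string import Template

# Hypothetical template for the "JordiGroup" macro from the quoted message.
# Template placeholders use "$name", which clashes with Pig's positional
# references such as "$5". Those are therefore passed in as values, which
# Template.substitute() inserts verbatim without rescanning them.
JORDI_GROUP = Template(
    "${out}_G = GROUP $t1 BY $f1, $t2 BY $f2;\n"
    "$out = FOREACH ${out}_G GENERATE FLATTEN(my.udf($t1, $t2));\n"
)


def jordi_group(out, t1, t2, f1, f2):
    """Expand the template into two Pig statements that assign to alias `out`."""
    return JORDI_GROUP.substitute(out=out, t1=t1, t2=t2, f1=f1, f2=f2)


# Build the full script: the two LOADs are written by hand, the rest is generated.
script = (
    "A = LOAD 'a' AS (group, value);\n"
    "B = LOAD 'b';\n"
    + jordi_group("R", "A", "B", "group", "$5")
)

if __name__ == "__main__":
    print(script, end="")
```

The generated text can be written to a file and run with pig as usual. Prefixing the intermediate alias with the output name keeps two expansions from overwriting each other's G.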
Or the ruby Oink DSL.

-D

On Fri, Apr 30, 2010 at 9:45 AM, hc busy <[email protected]> wrote:

> Sometimes, I find it necessary to project before performing the group by.
> Because there isn't support for functions or #def's it's not possible to
> pass in which column to group by, except to project before grouping.
>
> A = LOAD 'a' AS (group, value);
> B = LOAD 'b';
> B2 = foreach B generate $5 as group, *;
> G = GROUP A BY group, *B2 BY group*;
> R = FOREACH G GENERATE FLATTEN(my.udf(A,B2));
>
> Wouldn't introducing #define in pig speed this up? Add a preprocessor
> similar to the parameter substitution to support basic #define would be
> cool.
>
> #define JordiGroup(t1, t2, f1, f2){
> G = group t1 by f1, t2 by f2;
> FOREACH G GENERATE FLATTEN(my.udf(t1,t2));
>
> }
>
> ... and later on
>
> R = JordiGroup(A, B, group, $5);
>
> Where the result of the #define is the last line; The implementation would
> have a really simple parser to ensure () [] and {}'s match for blocks
> starting with '#define'. Then it performs substitution in order the macro's
> appear, no recursion is allowed.
>
>
> On Fri, Apr 30, 2010 at 8:51 AM, Alan Gates <[email protected]> wrote:
>
>> You need to change your group to a cogroup so that both bags are in your
>> data stream. If you don't want to group bag b by the same keys as a (that
>> is, you want all of b available for each group of a) then you can load b as
>> a side file inside your udf.
>>
>> Alan.
>>
>>
>> On Apr 30, 2010, at 4:32 AM, Jordi Deu-Pons wrote:
>>
>> Hi,
>>>
>>> I've developed an UDF that receives two bags as inputs and outputs one
>>> bag.
>>>
>>> One of the bags is different in every group and the other is always the
>>> same.
>>>
>>> Example code:
>>>
>>> A = LOAD 'a' AS (group, value);
>>> B = LOAD 'b';
>>> G = GROUP A BY group;
>>> R = FOREACH G GENERATE FLATTEN(my.udf(A,B));
>>>
>>> This give an error "Error during parsing. Invalid alias: B".
>>> I can understand this error, but I cannot realize another
>>> way to do this.
>>>
>>> Do you know which is the best way to do this?
>>>
>>> Thanks
>>>
>>> --
>>> a10! i fins aviat.
>>> J:-Deu
>>>
