But we don't want to extend PigLatin to have #define... ?

On Fri, Apr 30, 2010 at 10:04 AM, Dmitriy Ryaboy <[email protected]> wrote:
> http://www.stringtemplate.org/
>
> On Fri, Apr 30, 2010 at 9:57 AM, hc busy <[email protected]> wrote:
> > Is there a Java preprocessor?
> >
> > On Fri, Apr 30, 2010 at 9:54 AM, Dmitriy Ryaboy <[email protected]> wrote:
> >
> >> I don't think there's a need to reinvent, or reimplement, the wheel here.
> >>
> >> You are just talking about templates. Try http://template-toolkit.org/
> >> (or any of the ruby / python variants on the theme).
> >>
> >> Or the ruby Oink DSL.
> >>
> >> -D
> >>
> >> On Fri, Apr 30, 2010 at 9:45 AM, hc busy <[email protected]> wrote:
> >> > Sometimes, I find it necessary to project before performing the group by.
> >> > Because there isn't support for functions or #def's it's not possible to
> >> > pass in which column to group by, except to project before grouping.
> >> >
> >> > A = LOAD 'a' AS (group, value);
> >> > B = LOAD 'b';
> >> > B2 = foreach B generate $5 as group, *;
> >> > G = GROUP A BY group, *B2 BY group*;
> >> > R = FOREACH G GENERATE FLATTEN(my.udf(A,B2));
> >> >
> >> > Wouldn't introducing #define in pig speed this up? Add a preprocessor
> >> > similar to the parameter substitution to support basic #define would be
> >> > cool.
> >> >
> >> > #define JordiGroup(t1, t2, f1, f2){
> >> > G = group t1 by f1, t2 by f2;
> >> > FOREACH G GENERATE FLATTEN(my.udf(t1,t2));
> >> >
> >> > }
> >> >
> >> > ... and later on
> >> >
> >> > R = JordiGroup(A, B, group, $5);
> >> >
> >> > Where the result of the #define is the last line; The implementation would
> >> > have a really simple parser to ensure () [] and {}'s match for blocks
> >> > starting with '#define'. Then it performs substitution in order the macro's
> >> > appear, no recursion is allowed.
> >> >
> >> > On Fri, Apr 30, 2010 at 8:51 AM, Alan Gates <[email protected]> wrote:
> >> >
> >> >> You need to change your group to a cogroup so that both bags are in your
> >> >> data stream. If you don't want to group bag b by the same keys as a (that
> >> >> is, you want all of b available for each group of a) then you can load b as
> >> >> a side file inside your udf.
> >> >>
> >> >> Alan.
> >> >>
> >> >> On Apr 30, 2010, at 4:32 AM, Jordi Deu-Pons wrote:
> >> >>
> >> >>> Hi,
> >> >>>
> >> >>> I've developed an UDF that receives two bags as inputs and outputs one
> >> >>> bag.
> >> >>>
> >> >>> One of the bags is different in every group and the other is always the
> >> >>> same.
> >> >>>
> >> >>> Example code:
> >> >>>
> >> >>> A = LOAD 'a' AS (group, value);
> >> >>> B = LOAD 'b';
> >> >>> G = GROUP A BY group;
> >> >>> R = FOREACH G GENERATE FLATTEN(my.udf(A,B));
> >> >>>
> >> >>> This give an error "Error during parsing. Invalid alias: B".
> >> >>> I can understand this error, but I cannot realize another
> >> >>> way to do this.
> >> >>>
> >> >>> Do you know which is the best way to do this?
> >> >>>
> >> >>> Thanks
> >> >>>
> >> >>> --
> >> >>> a10! i fins aviat.
> >> >>> J:-Deu
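
For anyone who wants to try the #define idea from this thread without touching Pig itself, here is a rough sketch of an external preprocessor in Python. It is not part of Pig and not anything proposed in code form in the thread. The function names (find_block_end, extract_macros, expand) are made up for illustration, and it assumes macro bodies are plain ';'-separated statements with no brackets or semicolons inside string literals. It checks that () [] and {} match inside each '#define' block, treats the last statement as the macro's result, and applies macros once each in definition order, so there is no recursion.

```python
import re

# Matches the header of a block such as: #define Name(p1, p2){
DEFINE_RE = re.compile(r"#define\s+(\w+)\s*\(([^)]*)\)\s*\{")


def find_block_end(text, start):
    """Return the index just past the '}' closing the '{' at text[start].

    Raises ValueError if (), [] or {} are mismatched inside the block.
    Assumption: no brackets appear inside quoted strings.
    """
    closers = {")": "(", "]": "[", "}": "{"}
    stack = []
    for i in range(start, len(text)):
        ch = text[i]
        if ch in "([{":
            stack.append(ch)
        elif ch in closers:
            if not stack or stack.pop() != closers[ch]:
                raise ValueError(f"mismatched {ch!r} at offset {i}")
            if not stack:
                return i + 1
    raise ValueError("unterminated #define block")


def extract_macros(script):
    """Strip '#define' blocks; return (macros, remaining_script)."""
    macros = {}  # dicts keep insertion order, i.e. definition order
    kept = []
    pos = 0
    while True:
        m = DEFINE_RE.search(script, pos)
        if m is None:
            kept.append(script[pos:])
            break
        kept.append(script[pos:m.start()])
        end = find_block_end(script, m.end() - 1)  # m.end() - 1 is the '{'
        params = [p.strip() for p in m.group(2).split(",") if p.strip()]
        body = script[m.end():end - 1]
        statements = [s.strip() for s in body.split(";") if s.strip()]
        if not statements:
            raise ValueError(f"macro {m.group(1)} has an empty body")
        macros[m.group(1)] = (params, statements)
        pos = end
    return macros, "".join(kept)


def expand(script):
    """Expand 'alias = Macro(args);' calls, one macro at a time."""
    macros, rest = extract_macros(script)
    for name, (params, statements) in macros.items():
        call_re = re.compile(
            r"(\w+)\s*=\s*" + re.escape(name) + r"\s*\(([^)]*)\)\s*;"
        )

        def replace_call(m, name=name, params=params, statements=statements):
            args = [a.strip() for a in m.group(2).split(",") if a.strip()]
            if len(args) != len(params):
                raise ValueError(f"{name} expects {len(params)} arguments")
            lines = list(statements)
            if params:
                binding = dict(zip(params, args))
                param_re = re.compile(
                    r"\b(" + "|".join(map(re.escape, params)) + r")\b"
                )
                # One pass per statement, so substituted arguments are
                # never themselves rescanned for parameter names.
                lines = [
                    param_re.sub(lambda p: binding[p.group(1)], s)
                    for s in lines
                ]
            # The last statement is the macro's result: give it the alias.
            lines[-1] = f"{m.group(1)} = {lines[-1]}"
            return "\n".join(line + ";" for line in lines)

        # The expansion of a macro is not rescanned for that same macro.
        rest = call_re.sub(replace_call, rest)
    return rest


EXAMPLE = """#define JordiGroup(t1, t2, f1, f2){
G = group t1 by f1, t2 by f2;
FOREACH G GENERATE FLATTEN(my.udf(t1,t2));
}
A = LOAD 'a' AS (group, value);
B = LOAD 'b';
R = JordiGroup(A, B, group, $5);
"""

if __name__ == "__main__":
    print(expand(EXAMPLE))
```

Run on the JordiGroup example, the call line is replaced by "G = group A by group, B by $5;" followed by "R = FOREACH G GENERATE FLATTEN(my.udf(A,B));". Note that the intermediate alias G is copied verbatim, so two calls to the same macro would both define G; a real implementation would need to rename such aliases.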
