Re: UDF with two Bag one per group and one 'static'

Dmitriy Ryaboy Fri, 30 Apr 2010 10:04:30 -0700

http://www.stringtemplate.org/


On Fri, Apr 30, 2010 at 9:57 AM, hc busy <[email protected]> wrote:
> Is there a Java preprocessor?
>
> On Fri, Apr 30, 2010 at 9:54 AM, Dmitriy Ryaboy <[email protected]> wrote:
>
>> I don't think there's a need to reinvent, or reimplement, the wheel here.
>>
>> You are just talking about templates. Try http://template-toolkit.org/
>> (or any of the ruby / python variants on the theme).
>>
>> Or the ruby Oink DSL.
>>
>> -D
>>
>> On Fri, Apr 30, 2010 at 9:45 AM, hc busy <[email protected]> wrote:
>> > Sometimes, I find it necessary to project before performing the group by.
>> > Because there isn't support for functions or #def's it's not possible to
>> > pass in which column to group by, except to project before grouping.
>> >
>> > A = LOAD 'a' AS (group, value);
>> > B = LOAD 'b';
>> > B2 = foreach B generate $5 as group, *;
>> > G = GROUP A BY group, B2 BY group;
>> > R = FOREACH G GENERATE FLATTEN(my.udf(A,B2));
>> >
>> > Wouldn't introducing #define in pig speed this up? Adding a preprocessor
>> > similar to parameter substitution to support a basic #define would be
>> > cool.
>> >
>> > #define JordiGroup(t1, t2, f1, f2){
>> >           G = group t1 by f1, t2 by f2;
>> >           FOREACH G GENERATE FLATTEN(my.udf(t1,t2));
>> >
>> > }
>> >
>> > ... and later on
>> >
>> > R = JordiGroup(A, B, group, $5);
>> >
>> > Where the result of the #define is its last line. The implementation would
>> > have a really simple parser to ensure that ()'s, []'s and {}'s match for
>> > blocks starting with '#define'. Then it performs substitution in the order
>> > the macros appear; no recursion is allowed.
>> >
>> >
>> >
>> >
>> > On Fri, Apr 30, 2010 at 8:51 AM, Alan Gates <[email protected]> wrote:
>> >
>> >> You need to change your group to a cogroup so that both bags are in your
>> >> data stream.  If you don't want to group bag b by the same keys as a (that
>> >> is, you want all of b available for each group of a) then you can load b as
>> >> a side file inside your udf.
>> >>
>> >> Alan.
>> >>
>> >>
>> >> On Apr 30, 2010, at 4:32 AM, Jordi Deu-Pons wrote:
>> >>
>> >>> Hi,
>> >>>
>> >>> I've developed a UDF that receives two bags as inputs and outputs one
>> >>> bag.
>> >>>
>> >>> One of the bags is different in every group and the other is always the
>> >>> same.
>> >>>
>> >>> Example code:
>> >>>
>> >>> A = LOAD 'a' AS (group, value);
>> >>> B = LOAD 'b';
>> >>> G = GROUP A BY group;
>> >>> R = FOREACH G GENERATE FLATTEN(my.udf(A,B));
>> >>>
>> >>> This gives an error: "Error during parsing. Invalid alias: B".
>> >>> I can understand this error, but I cannot figure out another
>> >>> way to do this.
>> >>>
>> >>> Do you know the best way to do this?
>> >>>
>> >>> Thanks
>> >>>
>> >>> --
>> >>> a10! i fins aviat.
>> >>> J:-Deu
>> >>>
>> >>
>> >>
>> >
>>
>
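
The template approach suggested in this thread could look roughly like the sketch below, which uses Python's standard string.Template to expand a JordiGroup-style macro into Pig Latin text before the script is handed to pig. Everything named here is an assumption made for illustration: the jordi_group helper, the extra "out" parameter for the result alias, the intermediate alias suffix "_G", and the field name grp (used instead of "group" so it does not collide with the Pig keyword). It only generates text; it does not run Pig.

```python
from string import Template

# Hypothetical template mirroring the JordiGroup macro proposed in the thread.
# ${out} names the result alias; "${out}_G" is an assumed intermediate alias.
JORDI_GROUP = Template(
    "${out}_G = COGROUP ${t1} BY ${f1}, ${t2} BY ${f2};\n"
    "${out} = FOREACH ${out}_G GENERATE FLATTEN(my.udf(${t1}, ${t2}));\n"
)


def jordi_group(out, t1, t2, f1, f2):
    """Expand the template into Pig Latin text for one call site."""
    # substitute() is not recursive, so a value such as "$5" is copied
    # through unchanged rather than being treated as a placeholder.
    return JORDI_GROUP.substitute(out=out, t1=t1, t2=t2, f1=f1, f2=f2)


# Assemble a full script: plain Pig statements plus one expanded "macro".
script = (
    "A = LOAD 'a' AS (grp, value);\n"
    "B = LOAD 'b';\n"
    + jordi_group("R", "A", "B", "grp", "$5")
)

if __name__ == "__main__":
    print(script)
```

The generated text can be written to a file and passed to pig like any other script. Because the expansion is a single non-recursive substitution pass, it matches the "no recursion" restriction in the #define proposal, and the grouping column can be a positional reference, so the extra projection step is not needed.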
