No, you want a bag of single-field tuples, one per label, so it should be this:

B = foreach A generate name,days_ago, FLATTEN(((days_ago == 1)?{('yesterday'),('week'),('month'),('quarter')}:((...)?:));
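
If the nested bincond gets unwieldy, the UDF route discussed further down the thread is another option. Here is a minimal sketch, assuming a Jython UDF file called udfs.py as in Scott's message. The function name days_ago_periods and the schema string are my own illustrative choices, and the bucket boundaries are inferred from Dexin's expected output (0 maps to 'today' only), so adjust them if that reading is wrong. The try/except shim is only there so the file also loads in plain Python, where the outputSchema decorator is not defined.

```python
# udfs.py: hypothetical Jython UDF file; all names here are illustrative.

try:
    outputSchema  # defined by Pig when the file is registered "using jython"
except NameError:
    # Fallback so the module can be imported and tested outside Pig.
    def outputSchema(schema):
        def wrap(func):
            return func
        return wrap


@outputSchema("periods:bag{t:tuple(period:chararray)}")
def days_ago_periods(days_ago):
    """Return a list of 1-tuples (a bag on the Pig side), one per matching period."""
    # Assumption: null or negative input yields an empty bag.
    if days_ago is None or days_ago < 0:
        return []
    # Assumption from the expected output in the thread: 0 is 'today' and nothing else.
    if days_ago == 0:
        return [("today",)]
    periods = []
    if days_ago == 1:
        periods.append(("yesterday",))
    if days_ago <= 7:
        periods.append(("week",))
    if days_ago <= 30:
        periods.append(("month",))
    if days_ago <= 90:
        periods.append(("quarter",))
    return periods
```

On the Pig side, combining Scott's register line with Raghu's FLATTEN suggestion would look roughly like this (untested):

    register 'udfs.py' using jython as udf;
    B = foreach A generate name, days_ago, FLATTEN(udf.days_ago_periods(days_ago));

Rows older than 90 days return an empty bag, and FLATTEN of an empty bag drops the row.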

On Mon, Jul 25, 2011 at 10:25 AM, Xiaomeng Wan <[email protected]> wrote:
> maybe you can try something like this:
>
> B = foreach A generate name,days_ago, FLATTEN(((days_ago == 1)?{('yesterday','week','month','quarter')}:((...)?:));
>
> Shawn
>
> On Sat, Jul 23, 2011 at 7:44 PM, Raghu Angadi <[email protected]> wrote:
>> I see 3 independent questions :
>>
>>  1. How can we pass the entire row tuple to a UDF, as in 'B = FOREACH A GENERATE
>> myudf(A)', without knowing the schema? I don't know if that is possible. It does
>> feel like it should be.
>>
>>  2. How can I return an augmented Tuple? Your UDF can make a copy of the
>> input tuple, add whatever you like to it, and return it. Maybe your
>> question is not this simple.
>>
>>  3. How can I make a UDF produce multiple rows for one input row, as in your
>> example:
>>       - your UDF needs to return bag of row tuples. For (b,1) it would
>> return {(b,1,yesterday), (b,1,week), ... }
>>       - your pig script would flatten the output of the UDF :
>>         B = foreach A generate FLATTEN( myUDF(name, days_ago) );
>>
>> Raghu.
>>
>> On Fri, Jul 22, 2011 at 6:10 PM, Dexin Wang <[email protected]> wrote:
>>
>>> Thanks. I'm not familiar with Python, but I have written a bunch of UDFs in Java.
>>>
>>> One question though: how do I pass the entire tuple to the UDF? I mean, I
>>> can't do something like this:
>>>
>>>    B = FOREACH A GENERATE myudf(A)
>>>
>>> Essentially, given a tuple, I want to enrich it by adding one more field,
>>> where the value of the new field depends on the values of some existing
>>> fields in the tuple.
>>>
>>> (a,1) -> (a,1,yesterday)
>>>
>>> how would I do that?
>>>
>>> I imagine I can do
>>>   B = GROUP A BY random;
>>>   C = FOREACH B GENERATE myudf(A);
>>>
>>> But I really don't like adding another GROUP BY here.
>>>
>>> On Fri, Jul 22, 2011 at 5:23 PM, Scott Foster <[email protected]> wrote:
>>>
>>> > Hi Dexin,
>>> > This is the sort of thing I've started using Python UDFs for. See:
>>> > http://wiki.apache.org/pig/UDFsUsingScriptingLanguages for examples of
>>> > how to write the python code.
>>> >
>>> > If your udf was implemented in Python you could then do this...
>>> >
>>> > register 'udfs.py' using jython as udf;
>>> > ...
>>> > B = FOREACH A generate name, udf.daysAgoString(days_ago);
>>> >
>>> > scott.
>>> >
>>> > On Fri, Jul 22, 2011 at 4:42 PM, Dexin Wang <[email protected]> wrote:
>>> > > Is it possible to do a conditional and more than one generate inside a foreach?
>>> > >
>>> > > For example, I have tuples like this (name, days_ago):
>>> > >
>>> > > (a,0)
>>> > > (b,1)
>>> > > (c,9)
>>> > > (d,40)
>>> > >
>>> > > b showed up 1 day ago, so it belongs to all of the following: yesterday,
>>> > > last week, last month, and last quarter. So I'd like to turn the above into:
>>> > >
>>> > > (a,0,today)
>>> > > (b,1,yesterday)
>>> > > (b,1,week)
>>> > > (b,1,month)
>>> > > (b,1,quarter)
>>> > > (c,9,month)
>>> > > (c,9,quarter)
>>> > > (d,40,quarter)
>>> > >
>>> > > I imagine/dream I could do something like this
>>> > >
>>> > > B = FOREACH A
>>> > >  {
>>> > >        if (days_ago <= 90) generate name,days_ago,'quarter';
>>> > >        if (days_ago <= 30) generate name,days_ago,'month';
>>> > >        if (days_ago <= 7)   generate name,days_ago,'week';
>>> > >        if (days_ago == 1)   generate name,days_ago,'yesterday';
>>> > >        if (days_ago == 0)   generate name,days_ago,'today';
>>> > >  }
>>> > >
>>> > > Of course that's not valid syntax. I could write my own UDF, but it would be
>>> > > nice if there were some way to get what I want without a UDF.
>>> > >
>>> > > Thanks!
>>> > > Dexin
>>> > >
>>> >
>>>
>>
>
