No, Pig Latin does data flow only, not control flow. But I'm not sure what
your code would mean. Y is a bag (or a relation). Y.$0 is still a bag, not a
single-valued entity, as your pseudocode implies. Do you really mean
something like:
    foreach row in Y:
        if Y[row][0] == 'foo':
            generate X[row][0]
If that's the case you can write a UDF that would do that, and invoke it as:
    foreach X {
        ...
        generate yourUDF(X, Y);
    }
This would give you the freedom to decide inside the UDF when to emit a
record. In the case where no records in Y meet your filter condition, the UDF
would still have to return a null, so you might still want to filter out
nulls afterwards.
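
As a rough illustration of that kind of UDF, here is a minimal sketch written
as a Python (Jython) UDF rather than Java. The function name emit_if_match,
the 'foo' default, and the one-field output schema are made up for the
example. It assumes Pig hands the bag to the function as a list of tuples,
and it defines a no-op stand-in for the outputSchema decorator so the file
also runs under plain Python.

```python
# outputSchema is supplied by Pig when it loads a Jython UDF file.
# Outside Pig the name is undefined, so fall back to a no-op decorator.
try:
    outputSchema
except NameError:
    def outputSchema(schema):
        def wrap(func):
            return func
        return wrap


@outputSchema("result:chararray")
def emit_if_match(x_value, y_bag, wanted="foo"):
    """Return x_value if any tuple in y_bag has `wanted` as its first field.

    Returns None (a Pig null) otherwise, including for an empty or null bag.
    """
    if y_bag is None:
        return None
    for row in y_bag:
        # Skip empty tuples; compare only the first field.
        if row and row[0] == wanted:
            return x_value
    return None
```

If this were saved as udfs.py (a hypothetical file name), it could be
registered with something like "register 'udfs.py' using jython as myudfs;",
called inside the foreach, and followed by a filter on "is not null" to drop
the rows where nothing in Y matched.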
Alan.
On Oct 3, 2011, at 11:18 AM, Stan Rosenberg wrote:
> Alan,
>
> Let me abstract my previous example to:
>
> foreach X {
> -- do some processing and store results in Y
> if (Y.$0 == 'foo') {
> generate X.$0, ...
> }
> }
>
> Does pig support this type of control-flow?
>
> Many thanks,
>
> stan
>
> On Mon, Oct 3, 2011 at 11:42 AM, Alan Gates <[email protected]> wrote:
>
>> I'm not sure what you mean by t being non empty, since it's a relation and
>> not an expression. But guessing that you mean there is some bag in t you
>> want to check for non-emptiness, isn't the following equivalent?
>>
>> foreach X {
>> ...
>> t = distinct ...
>> tnotempty = filter t by not IsEmpty(...);
>> generate foo, ...
>> }
>>
>> Alan.
>>
>>
>> On Oct 3, 2011, at 12:46 AM, Dmitriy Ryaboy wrote:
>>
>>> Why not this:
>>>
>>> Y = foreach X { ..
>>> t = distinct ...
>>> generate t, foo...
>>> }
>>>
>>> Z = filter Y by not IsEmpty(t);
>>>
>>> OR: t can't be empty if the thing you are distincting is not empty, so this
>>> should work:
>>>
>>> Y = filter X by not IsEmpty(thing_you_wanted_to_distinct);
>>> Z = foreach Y {
>>> -- the thing you are distincting is now guaranteed to have at least 1 value
>>> t = distinct ..
>>> generate foo...
>>> }
>>>
>>> On Sun, Oct 2, 2011 at 9:28 AM, Stan Rosenberg <
>>> [email protected]> wrote:
>>>
>>>> Hi Folks,
>>>>
>>>> I came across a use case where I'd like to do something like this:
>>>>
>>>> FOREACH X {
>>>> ...
>>>> t = DISTINCT (...)
>>>> if (!IsEmpty(t))
>>>> GENERATE foo, ...
>>>> }
>>>>
>>>> Thus, 'generate' is conditionally executed and the control flow depends on
>>>> the value of some tuple 't'.
>>>> Can this be done in pig?
>>>>
>>>> Thanks,
>>>>
>>>> stan
>>>>
>>>> P.S. Please ignore my previous email; I accidentally triggered send before
>>>> I had a chance to finish it.
>>>>
>>
>>