There's a fair bit of overhead in the SPLIT/UNION approach.

UDFs are OK and normal in Pig. Everything is done with them. Don't be afraid
of UDFs :).

The pain is in the Java compile cycle (edit code in Java, test, compile,
jar, register...). That's where inline Python (Jython) UDFs come in handy!
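As an illustration, a minimal Jython UDF for the CASE-style mapping Eli asked about could look like the sketch below. The file name number_words.py and the function names to_word and bucket are hypothetical. Pig supplies the outputSchema decorator when a file is registered USING jython, and the fallback definition exists only so the file also runs as plain Python. This is a sketch, not a tested production UDF.

```python
# number_words.py (hypothetical file name), written as a Pig Jython UDF.
# When Pig registers this file USING jython it provides the outputSchema
# decorator; the fallback below only lets the file run as plain Python.
try:
    outputSchema
except NameError:
    def outputSchema(schema):
        def wrap(func):
            return func  # no-op outside Pig
        return wrap

# Lookup table for the exact-match case from the original question.
WORDS = {1: 'one', 2: 'two', 3: 'three', 4: 'four'}

@outputSchema("name:chararray")
def to_word(number):
    # Unmapped values (and nulls) come back as None, which Pig sees as null.
    return WORDS.get(number)

@outputSchema("label:chararray")
def bucket(number):
    # Range-based CASE: the if/elif chain stops at the first true branch,
    # so each input gets at most one label.
    if number is None:
        return None
    if number == 1:
        return 'one'
    elif 2 <= number <= 4:
        return 'between two and four'
    elif number > 5:
        return 'greater than five'
    return None  # e.g. 5 matches no branch
```

Usage, with casefuncs as a hypothetical namespace: REGISTER 'number_words.py' USING jython AS casefuncs; then PRETTY = FOREACH EXAMPLE_SOURCE GENERATE item, casefuncs.bucket(number);. The if/elif chain returns on the first match, so the cases are mutually exclusive like SQL CASE. SPLIT conditions, by contrast, can overlap and send one row to several branches.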

D

On Wed, Sep 14, 2011 at 2:53 PM, Eli Finkelshteyn wrote:

> Ah, neat! That would do the trick. Seems like a lot of extra steps, but
> I'll take it if that's how it's done in Pig. Thanks!
>
>
> On 9/14/11 5:51 PM, Ryan Hoegg wrote:
>
>> What about trying something with SPLIT and UNION:
>>
>> SPLIT EXAMPLE_SOURCE INTO
>>     GOOD IF number > 5,
>>     BETTER IF (number >= 2 AND number <= 4),
>>     BEST IF (number >= 5);
>>
>> I then did a few FOREACHs (adding a label to each branch) and a UNION,
>> and got this:
>> (a,6,best)
>> (b,5,best)
>> (d,8,best)
>> (a,6,good)
>> (d,8,good)
>> (a,2,better)
>> (b,2,better)
>> (c,3,better)
>> (d,3,better)
>> (d,4,better)
>>
>> --
>> Ryan Hoegg
>>
>> On Wed, Sep 14, 2011 at 4:24 PM, Eli Finkelshteyn <iefinkel@gmail.com>
>> wrote:
>>
>>> Sorry, bad example, I guess. I want something I can write case statements
>>> with. In this case I could use a mapping instead, but with less
>>> straightforward cases (e.g. one case where number == 1, another where
>>> number is between 2 and 4, another where number is greater than 5, etc.),
>>> mapping would be much more difficult.
>>>
>>> Again, I know I can do this with UDFs, but it seemed light enough to be
>>> built into Pig itself, so I was hoping there was a way to do it without
>>> writing a UDF every time I have a new transformation to make.
>>>
>>> Eli
>>>
>>> On 9/14/11 5:07 PM, Ryan Hoegg wrote:
>>>
>>>> What about putting the mappings into their own relation? I tried this
>>>> with Pig 0.9.0:
>>>>
>>>> example.txt:
>>>> a,1
>>>> a,2
>>>> b,2
>>>> c,1
>>>> d,3
>>>> d,4
>>>>
>>>> mapping.txt:
>>>> 1,one
>>>> 2,two
>>>> 3,three
>>>> 4,four
>>>>
>>>> MAPPINGS = LOAD 'mapping.txt' USING PigStorage(',') AS
>>>> (number:int,name:chararray);
>>>> EXAMPLE_SOURCE = LOAD 'example.txt' USING PigStorage(',') AS
>>>> (item:chararray,number:int);
>>>> MAPPED = JOIN EXAMPLE_SOURCE BY number LEFT OUTER, MAPPINGS BY number;
>>>> PRETTY = FOREACH MAPPED GENERATE item, name;
>>>> DUMP PRETTY;
>>>> (a,one)
>>>> (c,one)
>>>> (a,two)
>>>> (b,two)
>>>> (d,three)
>>>> (d,four)
>>>>
>>>> --
>>>> Ryan Hoegg
>>>>
>>>> On Wed, Sep 14, 2011 at 3:27 PM, Eli Finkelshteyn <iefinkel@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I'd like to generate values based on mutually exclusive conditions
>>>>> (something like the CASE statement in SQL). An example:
>>>>>
>>>>> Say I have data that looks like:
>>>>>
>>>>> (a, 1)
>>>>> (a, 2)
>>>>> (b, 2)
>>>>> (c, 1)
>>>>> (d, 3)
>>>>> (d, 4)
>>>>>
>>>>> I want to convert each number to its written form, to get:
>>>>>
>>>>> (a, one)
>>>>> (a, two)
>>>>> (b, two)
>>>>> (c, one)
>>>>> (d, three)
>>>>> (d, four)
>>>>>
>>>>> Would I need to write a UDF for that, or is there some simple way to do
>>>>> it using cases? I know I can nest bincond (?:) expressions one inside
>>>>> the other to achieve this, like:
>>>>>
>>>>> FOREACH rel GENERATE $0,
>>>>>     (($1 == 1) ? 'one' :
>>>>>     (($1 == 2) ? 'two' :
>>>>>     (($1 == 3) ? 'three' : 'four')));
>>>>>
>>>>> but that seems too messy. I'd appreciate any advice.
>>>>>
>>>>> Thanks!
>>>>> Eli
>
