There's a fair bit of overhead in that SPLIT/UNION approach. UDFs are OK and normal in Pig. Everything is done with them. Don't be afraid of UDFs :)
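To give a feel for how small such a UDF can be, here is a minimal sketch of a Python UDF for the range-based cases Eli describes further down the thread (number == 1, number between 2 and 4, number greater than 5). It assumes Pig's Jython UDF support, in which Pig itself provides the `outputSchema` decorator when the file is registered. The file name `case_udfs.py`, the namespace `myudfs`, the function `bucket` and the label strings are all hypothetical, and the no-op fallback decorator exists only so the file can also be run and tested with plain Python.

```python
# Hypothetical Pig Jython UDF file, e.g. case_udfs.py.
# Assumption: when registered in Pig (REGISTER 'case_udfs.py' USING jython AS myudfs;)
# Pig's Jython runtime supplies outputSchema. The fallback below is only for
# running this file outside Pig.
try:
    outputSchema
except NameError:
    def outputSchema(schema):
        """No-op stand-in for Pig's decorator when running outside Pig."""
        def wrap(func):
            return func
        return wrap


@outputSchema("bucket:chararray")
def bucket(number):
    """Exclusive, first-match-wins classification, like an SQL CASE.

    The ranges follow the cases described in the thread; the labels are
    made up for illustration. Pig hands a null field to Python as None.
    """
    if number is None:
        return None
    if number == 1:
        return 'one'
    if 2 <= number <= 4:
        return 'two_to_four'
    if number > 5:
        return 'over_five'
    # No case matched (e.g. 0 or 5): return None, which Pig treats as null,
    # much like an SQL CASE without an ELSE branch.
    return None
```

In a script it would then be used roughly as `REGISTER 'case_udfs.py' USING jython AS myudfs;` followed by `FOREACH EXAMPLE_SOURCE GENERATE item, myudfs.bucket(number);`. Unlike the SPLIT approach, each input tuple yields exactly one output tuple, and values that match no case come out as null.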
There's some pain with the compile cycle (edit code in Java, test, compile, jar, register...). That's where inline Python UDFs become handy!

D

On Wed, Sep 14, 2011 at 2:53 PM, Eli Finkelshteyn wrote:
> Ah, neat! That would do the trick. It seems like a lot of extra steps, but
> I'll take it if that's how it's done in PIG. Thanks!
>
> On 9/14/11 5:51 PM, Ryan Hoegg wrote:
>> What about trying something with SPLIT and UNION:
>>
>> SPLIT EXAMPLE_SOURCE INTO GOOD IF number>5, BETTER IF (number>=2 AND
>> number<=4), BEST IF (number>=5);
>>
>> I did a few FOREACHes and a UNION, and got this:
>> (a,6,best)
>> (b,5,best)
>> (d,8,best)
>> (a,6,good)
>> (d,8,good)
>> (a,2,better)
>> (b,2,better)
>> (c,3,better)
>> (d,3,better)
>> (d,4,better)
>>
>> --
>> Ryan Hoegg
>>
>> On Wed, Sep 14, 2011 at 4:24 PM, Eli Finkelshteyn wrote:
>>> Sorry, bad example, I guess. I want something I can do case statements
>>> with. In this case I could map instead, but if I wanted to use less
>>> straightforward cases (e.g. one case where number == 1, another where
>>> number is between 2 and 4, another where number is greater than 5, etc.),
>>> it would be much more difficult to do with mapping.
>>>
>>> Again, I know this is something I can do with UDFs, but it seemed like
>>> something light enough to be built into PIG itself, so I was hoping there
>>> was a way to do it without needing to write a UDF every time I have a new
>>> transformation to make.
>>>
>>> Eli
>>>
>>> On 9/14/11 5:07 PM, Ryan Hoegg wrote:
>>>> What about putting the mappings into their own relation?
>>>> I tried this with 0.9.0:
>>>>
>>>> example.txt:
>>>> a,1
>>>> a,2
>>>> b,2
>>>> c,1
>>>> d,3
>>>> d,4
>>>>
>>>> mapping.txt:
>>>> 1,one
>>>> 2,two
>>>> 3,three
>>>> 4,four
>>>>
>>>> MAPPINGS = LOAD 'mapping.txt' USING PigStorage(',') AS (number:int,name:chararray);
>>>> EXAMPLE_SOURCE = LOAD 'example.txt' USING PigStorage(',') AS (item:chararray,number:int);
>>>> MAPPED = JOIN EXAMPLE_SOURCE BY number LEFT OUTER, MAPPINGS BY number;
>>>> PRETTY = FOREACH MAPPED GENERATE item, name;
>>>> DUMP PRETTY;
>>>> (a,one)
>>>> (c,one)
>>>> (a,two)
>>>> (b,two)
>>>> (d,three)
>>>> (d,four)
>>>>
>>>> --
>>>> Ryan Hoegg
>>>>
>>>> On Wed, Sep 14, 2011 at 3:27 PM, Eli Finkelshteyn wrote:
>>>>> Hi,
>>>>>
>>>>> I'd like to generate based on exclusive conditions (something like the
>>>>> CASE statement in SQL). An example:
>>>>>
>>>>> Say I have data that looks like:
>>>>>
>>>>> (a, 1)
>>>>> (a, 2)
>>>>> (b, 2)
>>>>> (c, 1)
>>>>> (d, 3)
>>>>> (d, 4)
>>>>>
>>>>> And I want to just convert each of the numbers to its written form to
>>>>> get:
>>>>>
>>>>> (a, one)
>>>>> (a, two)
>>>>> (b, two)
>>>>> (c, one)
>>>>> (d, three)
>>>>> (d, four)
>>>>>
>>>>> Would I need to write a UDF for that, or is there some simple way to do
>>>>> it using cases? I know I can nest a bunch of bincond (?:) expressions
>>>>> one inside the other to achieve this, like:
>>>>>
>>>>> FOREACH rel GENERATE $0, (($1 == 1) ? 'one' : (($1 == 2) ? 'two' : (($1 == 3) ? 'three' : 'four')));
>>>>>
>>>>> but that seems too messy. I'd appreciate any advice.
>>>>>
>>>>> Thanks!
>>>>> Eli
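Ryan's SPLIT output lists (a,6) and (d,8) under both best and good because SPLIT routes a tuple to every relation whose condition it satisfies (and to none if it satisfies no condition), whereas an SQL CASE picks only the first matching branch. The following is a plain-Python model of that difference, not Pig code. It uses the (item, number) tuples that appear in Ryan's output as input, and the names `split_then_union` and `first_match` are made up for illustration.

```python
# Plain-Python model of two semantics discussed in the thread; this is NOT Pig
# code. Input: the (item, number) tuples that appear in Ryan's SPLIT output.
RECORDS = [('a', 6), ('b', 5), ('d', 8), ('a', 2), ('b', 2),
           ('c', 3), ('d', 3), ('d', 4)]

# The conditions from Ryan's SPLIT statement, in the order he wrote them.
CONDITIONS = [
    ('good', lambda n: n > 5),
    ('better', lambda n: 2 <= n <= 4),
    ('best', lambda n: n >= 5),
]


def split_then_union(records, conditions):
    """Model of SPLIT followed by FOREACH ... GENERATE and UNION: a record is
    emitted once for every condition it satisfies, and not at all if it
    satisfies none of them."""
    out = []
    for label, pred in conditions:
        for item, number in records:
            if pred(number):
                out.append((item, number, label))
    return out


def first_match(records, conditions):
    """Model of an SQL-style CASE: every record is emitted exactly once,
    labelled by the first condition it satisfies, or None if none match."""
    out = []
    for item, number in records:
        label = next((lbl for lbl, pred in conditions if pred(number)), None)
        out.append((item, number, label))
    return out


if __name__ == '__main__':
    for row in split_then_union(RECORDS, CONDITIONS):
        print(row)  # 10 rows; (a, 6) and (d, 8) each appear twice
    print('---')
    for row in first_match(RECORDS, CONDITIONS):
        print(row)  # 8 rows, one per input record
```

Reordering `CONDITIONS` changes what `first_match` returns for overlapping values such as 6, which is the ordering behavior a CASE expression relies on; the set of rows produced by the SPLIT model does not depend on that order.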
