Ah, neat! That would do the trick. Seems like a lot of extra steps, but I'll take it if that's how it's done in PIG. Thanks!

On 9/14/11 5:51 PM, Ryan Hoegg wrote:
What about trying something with SPLIT and UNION:

SPLIT EXAMPLE_SOURCE INTO
    GOOD   IF number > 5,
    BETTER IF (number >= 2 AND number <= 4),
    BEST   IF (number >= 5);

I did a few FOREACH and a UNION, and got this:
(a,6,best)
(b,5,best)
(d,8,best)
(a,6,good)
(d,8,good)
(a,2,better)
(b,2,better)
(c,3,better)
(d,3,better)
(d,4,better)
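
The FOREACH and UNION steps are not shown in the message above. A minimal sketch of what they might look like, assuming EXAMPLE_SOURCE has the fields (item, number) as in the earlier LOAD statement; the *_LABELED aliases and the "label" field name are made up for illustration:

```pig
-- Tag each branch of the SPLIT with a constant label (aliases are hypothetical).
GOOD_LABELED   = FOREACH GOOD   GENERATE item, number, 'good'   AS label;
BETTER_LABELED = FOREACH BETTER GENERATE item, number, 'better' AS label;
BEST_LABELED   = FOREACH BEST   GENERATE item, number, 'best'   AS label;

-- Recombine the labeled branches into one relation.
LABELED = UNION BEST_LABELED, GOOD_LABELED, BETTER_LABELED;
DUMP LABELED;
```

Note that SPLIT does not make the branches exclusive: the GOOD and BEST conditions above overlap, which is why (a,6) and (d,8) show up under both labels. For CASE-like behavior the conditions would need to be written so they do not overlap.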

--
Ryan Hoegg

On Wed, Sep 14, 2011 at 4:24 PM, Eli Finkelshteyn <[email protected]> wrote:

Sorry, bad example, I guess. I want something I can do case statements
with. In this case I could map instead, but if I wanted to use less
straightforward cases (e.g. one case where number == 1, another where
number is between 2 and 4, another where number is greater than 5, etc.), it
would be much more difficult to do with mapping.
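
For reference, range conditions like these can be written as nested bincond expressions without a udf. A rough sketch, assuming a relation EXAMPLE_SOURCE with fields (item:chararray, number:int); the label strings and the LABELED alias are illustrative only:

```pig
-- Nested bincond acting as a CASE: the first matching condition wins.
LABELED = FOREACH EXAMPLE_SOURCE GENERATE item,
    ((number == 1) ? 'one'
    : ((number >= 2 AND number <= 4) ? 'two-to-four'
    : ((number > 5) ? 'over-five'
    : 'other'))) AS label;  -- 'other' catches anything not matched above (e.g. 5)
DUMP LABELED;
```

Because each bincond needs both branches, the final 'other' is required as an explicit fall-through. As far as I know, later Pig releases (0.12 and up) also provide a CASE expression, which is not available in the 0.9.0 version discussed in this thread.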

Again, I know this is something I can do with udfs, but it seemed like
something light enough to be built into PIG itself, so I was hoping there
was a way to do it without needing to write a udf every time I have a new
transformation to make.

Eli

On 9/14/11 5:07 PM, Ryan Hoegg wrote:

What about putting the mappings into their own relation? I tried this
with 0.9.0:

example.txt:
a,1
a,2
b,2
c,1
d,3
d,4

mapping.txt:
1,one
2,two
3,three
4,four

MAPPINGS = LOAD 'mapping.txt' USING PigStorage(',')
    AS (number:int, name:chararray);
EXAMPLE_SOURCE = LOAD 'example.txt' USING PigStorage(',')
    AS (item:chararray, number:int);
MAPPED = JOIN EXAMPLE_SOURCE BY number LEFT OUTER, MAPPINGS BY number;
PRETTY = FOREACH MAPPED GENERATE item, name;
DUMP PRETTY;
(a,one)
(c,one)
(a,two)
(b,two)
(d,three)
(d,four)
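
One thing to watch with the LEFT OUTER join above: a number with no row in mapping.txt comes through with a null name. A minimal sketch of filling in a default, where the 'unknown' label and the PRETTY_DEFAULTED alias are assumptions for illustration and not part of the original script:

```pig
-- Replace a missing mapping (null name from the outer join) with a default.
PRETTY_DEFAULTED = FOREACH MAPPED GENERATE item,
    ((name is null) ? 'unknown' : name) AS name;
DUMP PRETTY_DEFAULTED;
```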

--
Ryan Hoegg

On Wed, Sep 14, 2011 at 3:27 PM, Eli Finkelshteyn <[email protected]> wrote:

Hi,
I'd like to generate values based on mutually exclusive conditions
(something like the CASE statement in SQL). An example:

Say I have data that looks like:

(a, 1)
(a, 2)
(b, 2)
(c, 1)
(d, 3)
(d, 4)

And I want to just convert each of the numbers to its written form to get:

(a, one)
(a, two)
(b, two)
(c, one)
(d, three)
(d, four)

Would I need to write a udf for that, or is there some simple way to do it
using cases? I know I can do a bunch of nested bincond (?:) generates, one
inside the other, to achieve this, like:

FOREACH rel GENERATE $0,
    (($1 == 1) ? 'one' : (($1 == 2) ? 'two' : (($1 == 3) ? 'three' : 'four')));

but that seems too messy. I'd appreciate any advice.

Thanks!
Eli




