I have a complex algorithm that I'm mapping to Pig.

It's basically two steps.

The first step takes a ton of data and boils it down to ONE variable.

That variable needs to be used in a number of places in the next steps.

It doesn't make sense to create a temporary file like:

1, VAR
2, VAR
3, VAR

… but instead it seems cleaner to just use something like

result = FOREACH input GENERATE $0 * VARIABLE;

… but the question is how do I get the variable into Pig.

I don't see a straightforward way to do this.

One thing I was thinking of doing is splitting the job into two Pig
scripts: run the first, read the resulting value, and pass it as a param
into the remaining scripts.
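
To make the two-script idea concrete, here is a rough sketch of what the second script might look like if the value is passed in with Pig's parameter substitution. The file name step2.pig, the INPUT and OUTPUT parameters, and the single-column schema are made up for illustration; only VARIABLE comes from the example above.

```pig
-- step2.pig (hypothetical file name)
-- Assumed invocation, after the first script has produced the value:
--   pig -param VARIABLE=<value> -param INPUT=<path> -param OUTPUT=<path> step2.pig

-- Load the data for the second step; a single numeric column is assumed.
data = LOAD '$INPUT' AS (x:double);

-- The parameter is substituted as plain text before the script is parsed,
-- so it can be used like a literal anywhere in the script.
result = FOREACH data GENERATE x * $VARIABLE;

STORE result INTO '$OUTPUT';
```

Whatever launches the two scripts would still have to read the single value out of the first script's output before calling the second.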

Is this what pretty much everyone else does?

Maybe this should be in the FAQ.

-- 

Founder/CEO Spinn3r.com

Location: *San Francisco, CA*
Skype: *burtonator*

Skype-in: *(415) 871-0687*
