I have a complex algorithm that I'm mapping to Pig. It's basically two steps.
The first step takes a ton of data and boils it down to ONE variable. That variable then needs to be used in a number of places in the second step.

It doesn't make sense to create a temporary file like:

    1, VAR
    2, VAR
    3, VAR
    …

Instead, it seems cleaner to just use something like:

    result = FOREACH input GENERATE $0 * VARIABLE;

…but the question is how to get the variable into Pig. I don't see a straightforward way.

One thing I was thinking of doing is splitting the job into two Pig files: run the first, get the variable, and pass it as a parameter into the remaining scripts.

Is this what pretty much everyone else does? Maybe this should be in the FAQ.

--
Founder/CEO Spinn3r.com
Location: San Francisco, CA
Skype: burtonator
Skype-in: (415) 871-0687
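For reference, the two-file approach from the question might look roughly like the following Pig Latin. This is a minimal sketch under stated assumptions, not a definitive implementation: the file names `step1.pig` and `step2.pig`, the paths `'input'`, `'step1_out'` and `'step2_out'`, the single `x:double` column, and the use of `SUM` as the "boil it down" reduction are all hypothetical stand-ins. The part that carries over is Pig's parameter substitution: `$VARIABLE` in the second script is replaced textually with whatever is passed via `pig -param VARIABLE=...` before the script is parsed.

```pig
-- ===== step1.pig (hypothetical name): boil the whole input down to ONE value =====
-- Assumption: a single numeric column; SUM stands in for the real reduction.
rows    = LOAD 'input' AS (x:double);
-- GROUP ... ALL puts every row into a single group.
grouped = GROUP rows ALL;
total   = FOREACH grouped GENERATE SUM(rows.x) AS s;
STORE total INTO 'step1_out';

-- Between the two runs (outside Pig), read the single value back, e.g.:
--   hadoop fs -cat step1_out/part-*
-- and hand it to the second script as a parameter:
--   pig -param VARIABLE=<value> step2.pig

-- ===== step2.pig (hypothetical name): VARIABLE arrives via -param =====
-- Pig substitutes $VARIABLE with the literal value before parsing, so
-- `pig -param VARIABLE=42.0 step2.pig` effectively runs `$0 * 42.0`.
-- Positional references such as $0 are not affected by substitution.
rows   = LOAD 'input' AS (x:double);
result = FOREACH rows GENERATE $0 * $VARIABLE;
STORE result INTO 'step2_out';
```

The trade-off in this design is that the scalar lives outside Pig between runs, so the second script sees it as a plain constant (no join against a temporary file), at the cost of a second job launch and a small driver to read the value and pass it along.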
