I was suggesting using the Set only as a way to deduplicate your input data. You'll want to iterate over the set and add each item to the response tuple, so that you return a tuple of N unique objects instead of a tuple containing a single Set of N objects.
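
Here is a rough sketch of that step in plain Java, so it runs without the Pig jars. The class name DistinctSketch and the method distinct are made up for illustration, and a List stands in for the tuple's fields. Inside your UDF the input would come from tuple.getAll() and, as far as I know, the returned list could be handed to TupleFactory.getInstance().newTuple(List). I believe passing the Set itself to newTuple ends up creating a one-field tuple that holds the whole Set, which is the behavior described above, but treat that as an assumption and check it against your Pig version.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-alone sketch: a List plays the role of the tuple's fields.
class DistinctSketch {

    // Returns the distinct values of `fields`, keeping first-seen order.
    static List<Object> distinct(List<?> fields) {
        // LinkedHashSet drops duplicates and remembers insertion order;
        // a plain HashSet would also deduplicate but gives no ordering guarantee.
        Set<Object> unique = new LinkedHashSet<Object>(fields);

        // Copy each unique item into a flat list, one element per item.
        // In the UDF this list (not the Set) is what would go to newTuple(...).
        List<Object> result = new ArrayList<Object>(unique.size());
        for (Object item : unique) {
            result.add(item);
        }
        return result;
    }

    public static void main(String[] args) {
        // The example tuple from the original question.
        List<Integer> input = Arrays.asList(5, 5, 4, 7, 2, 3, 4, 9);
        System.out.println(distinct(input)); // prints [5, 4, 7, 2, 3, 9]
    }
}
```

I used a LinkedHashSet here only so the output matches the order in your example; if the order of the values does not matter to you, the HashSet you already have works the same way.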

On Sun, Apr 3, 2011 at 9:57 AM, Jonathan Coveney <jcove...@gmail.com> wrote:
> I do not know if this is it, but I am not sure that pig likes it when you
> use the result variable in its own declaration. That is to say, try doing
> rows2 = Foreach rows generate etc.
>
> 2011/4/3 Mark <static.void....@gmail.com>
>
>> I have a simple EvalFunc as so:
>>
>> public class Set extends EvalFunc<Tuple> {
>>     public Tuple exec(Tuple tuple) throws IOException {
>>         Set<Object> unique = new HashSet<Object>();
>>         unique.addAll(tuple.getAll());
>>         return TupleFactory.getInstance().newTuple(unique);
>>     }
>> }
>>
>> How can I apply this to a result set though? When I try:
>>
>> rows = LOAD 'foo';
>> rows = FOREACH rows GENERATE com.mycompany.piggybank.Set(rows);
>> 2011-04-03 09:16:25,423 [main] ERROR org.apache.pig.tools.grunt.Grunt -
>> ERROR 1000: Error during parsing. Scalars can be only used with projections
>>
>> I get the above error? Should I be using something other than a EvalFunc?
>>
>> Thanks
>>
>> On 4/3/11 8:53 AM, Bill Graham wrote:
>>
>>> You could add all the values to a set in a udf and the return it's
>>> contents.
>>>
>>> On Sunday, April 3, 2011, Mark<static.void....@gmail.com> wrote:
>>>
>>>> If I have a tuple of values, is there a way to eliminate duplicate values
>>>> per tuple?
>>>>
>>>> Example:
>>>> (5,5,4,7,2,3,4,9) = (5,4,7,2,3,9)
>>>>
>>>> Thanks