I was suggesting the Set only as a way to de-duplicate your input
data. You'll want to iterate over the set and add each item to the
response tuple, so that you return a tuple of N unique objects
instead of a tuple containing a single Set of N objects.
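
To make that concrete, here is a rough sketch of the de-duplication step
in plain Java, with no Pig classes so it can be run on its own. The
class name UniqueFields and the method unique() are made up for
illustration. I'm also assuming you want first-seen order, as in your
(5,5,4,7,2,3,4,9) example, so it uses a LinkedHashSet rather than the
HashSet in your UDF. Inside exec() you would pass tuple.getAll() and, I
believe, hand the resulting List to TupleFactory.getInstance().newTuple(...)
so that each value becomes its own field.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical helper, not part of Pig: shows only the de-duplication step.
public class UniqueFields {

    // Returns the distinct values of `fields` as a flat list, in first-seen order.
    public static List<Object> unique(List<?> fields) {
        // LinkedHashSet drops duplicates and keeps insertion order;
        // a plain HashSet would also de-duplicate, but with no order guarantee.
        LinkedHashSet<Object> seen = new LinkedHashSet<Object>(fields);
        // Copy the items out one by one into a list, instead of returning the set itself.
        return new ArrayList<Object>(seen);
    }

    public static void main(String[] args) {
        List<Object> input = Arrays.<Object>asList(5, 5, 4, 7, 2, 3, 4, 9);
        System.out.println(unique(input)); // prints [5, 4, 7, 2, 3, 9]
    }
}
```

The point is only that the thing you return holds the N items themselves,
not one object that happens to be a Set.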

On Sun, Apr 3, 2011 at 9:57 AM, Jonathan Coveney <jcove...@gmail.com> wrote:
> I do not know if this is it, but I am not sure that Pig likes it when you
> use the result variable in its own declaration. That is to say, try doing
> rows2 = FOREACH rows GENERATE etc.
>
> 2011/4/3 Mark <static.void....@gmail.com>
>
>> I have a simple EvalFunc like so:
>>
>> public class Set extends EvalFunc<Tuple> {
>>  public Tuple exec(Tuple tuple) throws IOException {
>>    Set<Object> unique = new HashSet<Object>();
>>    unique.addAll(tuple.getAll());
>>    return TupleFactory.getInstance().newTuple(unique);
>>  }
>> }
>>
>> How can I apply this to a result set though?  When I try:
>>
>> rows = LOAD 'foo';
>> rows = FOREACH rows GENERATE com.mycompany.piggybank.Set(rows);
>> 2011-04-03 09:16:25,423 [main] ERROR org.apache.pig.tools.grunt.Grunt -
>> ERROR 1000: Error during parsing. Scalars can be only used with projections
>>
>> I get the above error. Should I be using something other than an EvalFunc?
>>
>> Thanks
>>
>>
>>
>> On 4/3/11 8:53 AM, Bill Graham wrote:
>>
>>> You could add all the values to a set in a UDF and then return its
>>> contents.
>>>
>>> On Sunday, April 3, 2011, Mark <static.void....@gmail.com> wrote:
>>>
>>>> If I have a tuple of values, is there a way to eliminate duplicate values
>>>> per tuple?
>>>>
>>>> Example:
>>>> (5,5,4,7,2,3,4,9) = (5,4,7,2,3,9)
>>>>
>>>> Thanks
>>>>
>>>>
>>>>
>>>>
>>>>
>
