Hi Rekha,

You are right, I mistook
org.apache.pig.piggybank.evaluation.math.MAX for the built-in UDF.
The default is actually org.apache.pig.builtin.MAX, which is what an
unqualified MAX resolves to.
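For completeness, an untested sketch of how the original merge could be done with built-ins only: take the per-group max with MAX, then join back against the union to recover the full row. Relation names follow Mads' script; note that if two rows tie on the max value, the join would keep both.

```pig
INPUT1   = LOAD 'file1' USING PigStorage(',') AS (id:int, myval:int, name:chararray);
INPUT2   = LOAD 'file2' USING PigStorage(',') AS (id:int, myval:int, name:chararray);
combined = UNION INPUT1, INPUT2;
grouped  = GROUP combined BY id;
-- one (id, max of myval) pair per group, using the built-in MAX
maxed    = FOREACH grouped GENERATE group AS id, MAX(combined.myval) AS myval;
-- join back on both fields to recover the name column of the winning row
joined   = JOIN combined BY (id, myval), maxed BY (id, myval);
myoutput = FOREACH joined GENERATE combined::id, combined::myval, combined::name;
STORE myoutput INTO 'output.csv' USING PigStorage(',');
```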


On Tue, May 11, 2010 at 2:00 PM, Rekha Joshi <[email protected]> wrote:
> Hi Jeff,
>
> Not sure if we are on the same page; but as you disagree, I ran the datasets
> in grunt, and the default MAX works as expected for getting the max.
> Please let me know.
>
> Thanks & Regards,
> /R
>
> grunt> A = load '99.txt' using PigStorage(',') as (f1:int,f2:int, 
> f3:chararray);
> grunt> dump A;
> 2010-05-11 05:55:07,702 [main] INFO  
> org.apache.pig.impl.logicalLayer.optimizer.PruneColumns - No column pruned 
> for A
> 2010-05-11 05:55:07,702 [main] INFO  
> org.apache.pig.impl.logicalLayer.optimizer.PruneColumns - No map keys pruned 
> for A
> 2010-05-11 05:55:07,720 [main] WARN  org.apache.pig.impl.io.FileLocalizer - 
> FileLocalizer.create: failed to create /tmp/temp1359137963
> 2010-05-11 05:55:07,827 [main] INFO  
> org.apache.pig.backend.local.executionengine.LocalPigLauncher - Successfully 
> stored result in: "file:/tmp/temp1359137963/tmp-2084075282"
> 2010-05-11 05:55:07,827 [main] INFO  
> org.apache.pig.backend.local.executionengine.LocalPigLauncher - Records 
> written : 4
> 2010-05-11 05:55:07,827 [main] INFO  
> org.apache.pig.backend.local.executionengine.LocalPigLauncher - Bytes written 
> : 96
> 2010-05-11 05:55:07,827 [main] INFO  
> org.apache.pig.backend.local.executionengine.LocalPigLauncher - 100% complete!
> 2010-05-11 05:55:07,827 [main] INFO  
> org.apache.pig.backend.local.executionengine.LocalPigLauncher - Success!!
> (10,155,ABC)
> (20,100,DEF)
> (30,200,XYZ)
> (40,100,XXX)
> grunt> B = load '91.txt' using PigStorage(',') as (f4:int, f5:int, 
> f6:chararray);
> grunt> dump B;
> 2010-05-11 05:55:38,530 [main] INFO  
> org.apache.pig.impl.logicalLayer.optimizer.PruneColumns - No column pruned 
> for B
> 2010-05-11 05:55:38,530 [main] INFO  
> org.apache.pig.impl.logicalLayer.optimizer.PruneColumns - No map keys pruned 
> for B
> 2010-05-11 05:55:38,546 [main] WARN  org.apache.pig.impl.io.FileLocalizer - 
> FileLocalizer.create: failed to create /tmp/temp1359137963
> 2010-05-11 05:55:38,604 [main] INFO  
> org.apache.pig.backend.local.executionengine.LocalPigLauncher - Successfully 
> stored result in: "file:/tmp/temp1359137963/tmp511625931"
> 2010-05-11 05:55:38,604 [main] INFO  
> org.apache.pig.backend.local.executionengine.LocalPigLauncher - Records 
> written : 3
> 2010-05-11 05:55:38,605 [main] INFO  
> org.apache.pig.backend.local.executionengine.LocalPigLauncher - Bytes written 
> : 72
> 2010-05-11 05:55:38,605 [main] INFO  
> org.apache.pig.backend.local.executionengine.LocalPigLauncher - 100% complete!
> 2010-05-11 05:55:38,605 [main] INFO  
> org.apache.pig.backend.local.executionengine.LocalPigLauncher - Success!!
> (10,160,CBA)
> (20,90,QQQ)
> (40,150,AAA)
> grunt> D = union A, B;
> grunt> dump D;
> 2010-05-11 05:56:06,399 [main] INFO  
> org.apache.pig.impl.logicalLayer.optimizer.PruneColumns - No column pruned 
> for B
> 2010-05-11 05:56:06,399 [main] INFO  
> org.apache.pig.impl.logicalLayer.optimizer.PruneColumns - No map keys pruned 
> for B
> 2010-05-11 05:56:06,399 [main] INFO  
> org.apache.pig.impl.logicalLayer.optimizer.PruneColumns - No column pruned 
> for A
> 2010-05-11 05:56:06,399 [main] INFO  
> org.apache.pig.impl.logicalLayer.optimizer.PruneColumns - No map keys pruned 
> for A
> 2010-05-11 05:56:06,426 [main] WARN  org.apache.pig.impl.io.FileLocalizer - 
> FileLocalizer.create: failed to create /tmp/temp1359137963
> 2010-05-11 05:56:06,573 [main] INFO  
> org.apache.pig.backend.local.executionengine.LocalPigLauncher - Successfully 
> stored result in: "file:/tmp/temp1359137963/tmp-1380306413"
> 2010-05-11 05:56:06,573 [main] INFO  
> org.apache.pig.backend.local.executionengine.LocalPigLauncher - Records 
> written : 7
> 2010-05-11 05:56:06,573 [main] INFO  
> org.apache.pig.backend.local.executionengine.LocalPigLauncher - Bytes written 
> : 168
> 2010-05-11 05:56:06,573 [main] INFO  
> org.apache.pig.backend.local.executionengine.LocalPigLauncher - 100% complete!
> 2010-05-11 05:56:06,573 [main] INFO  
> org.apache.pig.backend.local.executionengine.LocalPigLauncher - Success!!
> (10,155,ABC)
> (10,160,CBA)
> (20,100,DEF)
> (20,90,QQQ)
> (30,200,XYZ)
> (40,150,AAA)
> (40,100,XXX)
> grunt> E = group D by $0;
> grunt> dump E;
> 2010-05-11 05:56:39,830 [main] INFO  
> org.apache.pig.impl.logicalLayer.optimizer.PruneColumns - No column pruned 
> for B
> 2010-05-11 05:56:39,831 [main] INFO  
> org.apache.pig.impl.logicalLayer.optimizer.PruneColumns - No map keys pruned 
> for B
> 2010-05-11 05:56:39,831 [main] INFO  
> org.apache.pig.impl.logicalLayer.optimizer.PruneColumns - No column pruned 
> for A
> 2010-05-11 05:56:39,831 [main] INFO  
> org.apache.pig.impl.logicalLayer.optimizer.PruneColumns - No map keys pruned 
> for A
> 2010-05-11 05:56:39,857 [main] WARN  org.apache.pig.impl.io.FileLocalizer - 
> FileLocalizer.create: failed to create /tmp/temp1359137963
> 2010-05-11 05:56:39,995 [main] INFO  
> org.apache.pig.backend.local.executionengine.LocalPigLauncher - Successfully 
> stored result in: "file:/tmp/temp1359137963/tmp-1896759683"
> 2010-05-11 05:56:39,995 [main] INFO  
> org.apache.pig.backend.local.executionengine.LocalPigLauncher - Records 
> written : 4
> 2010-05-11 05:56:39,995 [main] INFO  
> org.apache.pig.backend.local.executionengine.LocalPigLauncher - Bytes written 
> : 235
> 2010-05-11 05:56:39,995 [main] INFO  
> org.apache.pig.backend.local.executionengine.LocalPigLauncher - 100% complete!
> 2010-05-11 05:56:39,995 [main] INFO  
> org.apache.pig.backend.local.executionengine.LocalPigLauncher - Success!!
> (10,{(10,155,ABC),(10,160,CBA)})
> (20,{(20,100,DEF),(20,90,QQQ)})
> (30,{(30,200,XYZ)})
> (40,{(40,150,AAA),(40,100,XXX)})
> grunt> describe E;
> E: {group: int,D: {f1: int,f2: int,f3: chararray}}
> grunt> F = foreach E generate group, MAX(D.f2);
> grunt> dump F;
> 2010-05-11 05:57:10,378 [main] INFO  
> org.apache.pig.impl.logicalLayer.optimizer.PruneColumns - No column pruned 
> for B
> 2010-05-11 05:57:10,378 [main] INFO  
> org.apache.pig.impl.logicalLayer.optimizer.PruneColumns - No map keys pruned 
> for B
> 2010-05-11 05:57:10,378 [main] INFO  
> org.apache.pig.impl.logicalLayer.optimizer.PruneColumns - No column pruned 
> for A
> 2010-05-11 05:57:10,378 [main] INFO  
> org.apache.pig.impl.logicalLayer.optimizer.PruneColumns - No map keys pruned 
> for A
> 2010-05-11 05:57:10,430 [main] WARN  org.apache.pig.impl.io.FileLocalizer - 
> FileLocalizer.create: failed to create /tmp/temp1359137963
> 2010-05-11 05:57:10,555 [main] INFO  
> org.apache.pig.backend.local.executionengine.LocalPigLauncher - Successfully 
> stored result in: "file:/tmp/temp1359137963/tmp1923083174"
> 2010-05-11 05:57:10,555 [main] INFO  
> org.apache.pig.backend.local.executionengine.LocalPigLauncher - Records 
> written : 4
> 2010-05-11 05:57:10,555 [main] INFO  
> org.apache.pig.backend.local.executionengine.LocalPigLauncher - Bytes written 
> : 72
> 2010-05-11 05:57:10,555 [main] INFO  
> org.apache.pig.backend.local.executionengine.LocalPigLauncher - 100% complete!
> 2010-05-11 05:57:10,555 [main] INFO  
> org.apache.pig.backend.local.executionengine.LocalPigLauncher - Success!!
> (10,160)
> (20,100)
> (30,200)
> (40,150)
> grunt>
>
>
>
> On 5/11/10 11:21 AM, "Jeff Zhang" <[email protected]> wrote:
>
> Hi Rekha,
>
> I looked at the source code; the built-in MAX udf accepts a Tuple rather
> than a DataBag, and it can only handle two values.
>
>
> On Tue, May 11, 2010 at 1:35 PM, Rekha Joshi <[email protected]> wrote:
>> Hi Moeller,
>>
>> I think the default MAX udf can get the max of the second element within
>> the group, as the group would be something like below.
>> (10,{(10,155,ABC),(10,160,CBA)})
>> (20,{(20,100,DEF),(20,90,QQQ)})
>> (30,{(30,200,XYZ)})
>> (40,{(40,150,AAA),(40,100,XXX)})
>>
>> Something like: Z = foreach Y generate group, MAX(X.f2);
>> (10,160)
>> (20,100)
>> (30,200)
>> (40,150)
>>
>> Refer http://hadoop.apache.org/pig/docs/r0.6.0/piglatin_ref2.html
>>
>> You might have to resort to another join, or maybe a conditional expr, to
>> get the third element, or write your own udf.
>>
>> Thanks & Regards
>> /
>>
>> On 5/11/10 10:44 AM, "Jeff Zhang" <[email protected]> wrote:
>>
>> myoutput = FOREACH grouped GENERATE
>> group,org.apache.pig.piggybank.myudf.MaxElement($1.$1);
>>
>> And the following is the udf:
>>
>> package org.apache.pig.piggybank.myudf;
>>
>> import java.io.IOException;
>>
>> import org.apache.pig.EvalFunc;
>> import org.apache.pig.data.DataBag;
>> import org.apache.pig.data.Tuple;
>>
>> public class MaxElement extends EvalFunc<Integer> {
>>
>>    @Override
>>    public Integer exec(Tuple input) throws IOException {
>>        int max = Integer.MIN_VALUE;
>>        DataBag bag = (DataBag) input.get(0);
>>        for (Tuple tuple : bag) {
>>            Integer value = (Integer)tuple.get(0);
>>            if (value > max){
>>                max=value;
>>            }
>>        }
>>        return max;
>>    }
>>
>> }
>>
>>
>>
>> On Tue, May 11, 2010 at 12:50 PM, Mads Moeller <[email protected]> wrote:
>>> Hi all,
>>>
>>> I am new to Pig/Hadoop and I am trying to figure out how I can merge
>>> two (or more) input files, based on the value in one of the data
>>> fields. E.g. from the below input files (INPUT 1 and INPUT 2), I want
>>> to join on $0 and keep the row containing the highest value in $1 of
>>> each row.
>>>
>>> INPUT 1
>>> 10,155,ABC
>>> 20,100,DEF
>>> 30,200,XYZ
>>> 40,100,XXX
>>>
>>> INPUT 2
>>> 10,160,CBA
>>> 20,90,QQQ
>>> 40,150,AAA
>>>
>>> DESIRED OUTPUT
>>> 10,160,CBA
>>> 20,100,DEF
>>> 30,200,XYZ
>>> 40,150,AAA
>>>
>>> -- pig script start
>>> INPUT1 = LOAD 'file1' USING PigStorage(',') AS (id:int, myval:int,
>>> name:chararray);
>>> INPUT2 = LOAD 'file2' USING PigStorage(',') AS (id:int, myval:int,
>>> name:chararray);
>>> combined = UNION INPUT1, INPUT2;
>>> grouped = GROUP combined BY id;
>>> -- myoutput = FOREACH grouped GENERATE ??? -- I am stuck :-)
>>>
>>> STORE myoutput INTO 'output.csv' USING PigStorage(',');
>>> -- pig script end
>>>
>>> Any suggestions to how this can be accomplished?
>>>
>>> Thanks.
>>>
>>
>>
>>
>> --
>> Best Regards
>>
>> Jeff Zhang
>>
>>
>
>
>
> --
> Best Regards
>
> Jeff Zhang
>
>



-- 
Best Regards

Jeff Zhang
