SUM function requires that you specify the specific element from the grouping. In this case, U_tm and U_cnt are both within group/bags and need to be accessed as "reqd.U_tm" and "reqd.U_cnt".
--Sum the User Counts and Times G3 = foreach G2 generate group,SUM(reqd.U_tm)as time,SUM(reqd.U_cnt)as count; On Nov 13, 2012, at 9:58 AM, ingvay7 <[email protected]> wrote: (Apologies for resending but corrected script below) This is the error I got: ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1045: Could not infer the matching function for org.apache.pig.builtin.SUM as multiple or none of them fit. Please use an explicit cast. Updated code: a = LOAD 'Report' AS ( dt:chararray, Server:chararray, Type:chararray, Ops:chararray, UserID:chararray, U_cnt:int, U_tm:int, U_min_tm:int, U_max_tm:int, U_avg_tm:float ); --Remove Test Servers remtest = filter a by not Server matches 'Test%'; -- Filter to required columns reqd = foreach remtest generate $1,$2,$3,$4,$5,$6; --Groupby G2 = group reqd by (Server,Type,Ops); --Sum the User Counts and Times G3 = foreach G2 generate group,SUM(U_tm)as time,SUM(U_cnt)as count; store G3 into 'Servertest'; ----- Original Message ----- From: Prashant Kommireddi <[email protected]> To: "[email protected]" <[email protected]> Cc: Sent: Tuesday, November 13, 2012 11:59 AM Subject: Re: Help with Script Hi, Can you paste the error message here? Sent from my iPhone On Nov 13, 2012, at 8:34 AM, "[email protected]" <[email protected]> wrote: hey all, Very new Pig user here. I think I'm trying to get something very simple done but getting a few errors. See me script below.Any guidance will be appreciated.Thanks. I get errors such as Error during parsing. Invalid alias: serverin {time: double,count: double} I am basically trying to duplicate the following SQL query: select Server, Type, Ops, count(*) users, sum(U_tm) , sum(U_cnt) from TableA group by 1, 2, 3; My script is as follows: a = LOAD 'Report' AS ( dt:chararray, Server:chararray, Type:chararray, Ops:chararray, UserID:chararray, U_cnt:int, U_tm:int, U_min_tm:int, U_max_tm:int, U_avg_tm:float, ); --Remove Test Servers remtest = filter a by not Server matches 'Test%'; -- Filter to required columns reqd = foreach remtest generate $1,$2,$3,$4,$5,$6; --Groupby G2 = group reqd by Server,Type,Ops; --Sum the User Counts and Times G3 = foreach G2 generate group,SUM(U_tm)as time,SUM(U_cnt)as count; --byServeroperation = order G3 by Server; store G3 into 'Servertest'; ingvay7
