Hi,

my Pig code is like this:

register myudf.jar
a = load 'testurls' as (info:chararray);
b = foreach a generate info, com.company.pig.GetInfoScore($0) as m;
dump b;

The output is like this:

(65RFPRO800863GPT,[108#0.2])
(6JL6U6EA00863J0J,[352#0.5,25#0.15,108#0.07,26#0.06,4#0.16])
(6B7FF3E300052E97,[25#0.28,405#0.05,4#0.05])
(5498267_31,[108#0.05,25#0.19,12#0.19])

I want to group by the map key and count the infos, like the output below:

108 3 /*65RFPRO800863GPT 6JL6U6EA00863J0J 5498267_31*/
352 1 /*6JL6U6EA00863J0J*/
25 3 /*6JL6U6EA00863J0J 6B7FF3E300052E97 5498267_31*/
26 1 /*6JL6U6EA00863J0J*/
4 2 /*6JL6U6EA00863J0J 6B7FF3E300052E97*/
405 1 /*6B7FF3E300052E97*/
12 1 /*5498267_31*/

I think I have to split each map into many rows, one per map entry, like this:

(65RFPRO800863GPT, 108, 0.2)
(6JL6U6EA00863J0J, 352, 0.5)
(6JL6U6EA00863J0J, 25, 0.15)
(6JL6U6EA00863J0J, 108, 0.07)
(6JL6U6EA00863J0J, 26, 0.06)
(6JL6U6EA00863J0J, 4, 0.16)
(6B7FF3E300052E97, 25, 0.28)
(6B7FF3E300052E97, 405, 0.05)
(6B7FF3E300052E97, 4, 0.05)
(5498267_31, 108, 0.05)
(5498267_31, 25, 0.19)
(5498267_31, 12, 0.19)

Then it is easy to group and count. Am I right?
I have no idea how to split the map into many rows as shown above.

Help.
Thanks.

2011/5/25 Alan Gates <[email protected]>

> Can't you mimic dynamic key support with static keys by making your map
> have two static keys 'key' and 'value'?
>
> Alan.
>
>
> On May 24, 2011, at 3:05 AM, Jameson Li wrote:
>
> OK.OK.I know that just write UDFs.
>> I have to write UDFs, and see you......
>> And I still think there should be grammar support for map operation both
>> static key and dynamic key.............
>>
>> Thanks.
>>
>> 2011/5/24 Daniel Dai <[email protected]>
>>
>> GetKey(m) already get the key, so you can filter the key. For value, you
>>> may need to put into UDF.
>>>
>>> Grammar support for map is based on static key, eg: m#'key1'. Your use
>>> case
>>> is mostly dealing dynamic keys, which you may rely on yourself currently.
>>>
>>> Daniel
>>>
>>> -----Original Message-----
>>> From: Jameson Li
>>> Sent: Monday, May 23, 2011 7:07 PM
>>> To: Daniel Dai
>>> Cc: [email protected]
>>> Subject: Re: how to operate a map type
>>>
>>>
>>> And how to filter a map key or a map value? And also only UDF?
>>>
>>> b = foreach ruls generate com.company.pig.GetURLContent($0,3,0.1) as m;
>>> c = filter b by m.key == 'aaa' or m.value> 0.2;
>>>
>>> How could I write the code?
>>> Any other way without writing UDF?
>>>
>>> And I have a doubt since only writing UDF can operate a map type, why not
>>> have the official functions about the map type?
>>>
>>> Thanks.
>>>
>>> 2011/5/24 Daniel Dai <[email protected]>
>>>
>>> I cannot think of a way without writing UDF. You can write two UDF:
>>>
>>>> * GetKey, input a map, output the key of the map
>>>> * GetValues, input a bag of map, output a bag of map values
>>>>
>>>> The script is like:
>>>> b = foreach ruls generate com.company.pig.GetURLContent($0,3,0.1) as m;
>>>> c = foreach b generate GetKey(m) as key, m;
>>>> d = group c by key;
>>>> e = foreach d generate group, SUM(GetValues(c.m));
>>>>
>>>>
>>>> Daniel
>>>>
>>>>
>>>> On 05/23/2011 07:06 AM, Jameson Li wrote:
>>>>
>>>> Hi all,
>>>>
>>>>>
>>>>> I have the below pig code:
>>>>>
>>>>> register /home/uu/project/lib/pigudfs.jar
>>>>> ruls = load 'testurl' as (url:chararray);
>>>>>
>>>>> b = foreach ruls generate com.company.pig.GetURLContent($0,3,0.1);
>>>>>
>>>>> here when dump b, it will return:
>>>>> ([4#0.1677963])
>>>>> ([193#0.16985779,81#0.10994483])
>>>>> ([418#0.14138427,9#0.1107544,282#0.18699136])
>>>>>
>>>>> I just want group by the map key and sum the map value just like:
>>>>> c = group b by $0#key;
>>>>> d = foreach c generate group,SUM(b.$0#value);
>>>>>
>>>>> How could I write the code?
>>>>>
>>>>> Thanks,
>>>>> Jameson Li.
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>
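
For reference, the "split the map into rows, then group and count" idea from the top of this message can be sketched outside Pig. This is only a plain Python illustration of the data flow, not Pig code and not a UDF: the function names split_map_to_rows and group_and_count are made up for the example, and the sample rows are copied from the dump output above. In Pig itself the same shape would presumably need a UDF that returns a bag of (key, value) tuples, followed by FLATTEN, then GROUP and COUNT.

```python
from collections import defaultdict

# Sample rows copied from the `dump b` output at the top of this message:
# each row is (info, map of key -> score).
rows = [
    ("65RFPRO800863GPT", {"108": 0.2}),
    ("6JL6U6EA00863J0J", {"352": 0.5, "25": 0.15, "108": 0.07, "26": 0.06, "4": 0.16}),
    ("6B7FF3E300052E97", {"25": 0.28, "405": 0.05, "4": 0.05}),
    ("5498267_31", {"108": 0.05, "25": 0.19, "12": 0.19}),
]


def split_map_to_rows(rows):
    """Turn each (info, map) row into one (info, key, value) row per map entry."""
    return [(info, key, value) for info, m in rows for key, value in m.items()]


def group_and_count(flat_rows):
    """Group flattened rows by map key; return key -> (count, list of infos)."""
    groups = defaultdict(list)
    for info, key, _value in flat_rows:
        groups[key].append(info)
    return {key: (len(infos), infos) for key, infos in groups.items()}


flat = split_map_to_rows(rows)
counts = group_and_count(flat)

# Print one line per map key in the "key count /*infos*/" layout used above.
for key, (n, infos) in counts.items():
    print(key, n, "/*" + " ".join(infos) + "*/")
```

The flattening step is the only part that depends on the map's dynamic keys; once every map entry is its own row, the grouping and counting need nothing map-specific.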
