Another alternative is to write a UDF that returns all keys in a map as a bag. I think this would be a useful addition to piggybank. It would also be useful to have getEntries(Map) and getValues(Map) UDFs in piggybank. If you choose this option and are in a position to contribute the UDF code, please do so.
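A minimal sketch of the logic a getEntries-style UDF would need, written as plain Java so that it runs without Pig on the classpath. The class and method names (MapEntriesSketch, explode, countByKey, mapOf, sampleRows) are hypothetical and not part of piggybank; a real UDF would presumably extend Pig's EvalFunc and return a bag of tuples rather than a List. The sample data is the dump output quoted further down in this thread.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration only; all names here are hypothetical.
final class MapEntriesSketch {

    // Turns one record's map into (id, key, value) rows, one per map entry.
    // This mirrors what "return a bag of entries, then flatten" would produce.
    static List<Object[]> explode(String id, Map<String, Double> scores) {
        List<Object[]> rows = new ArrayList<>();
        for (Map.Entry<String, Double> entry : scores.entrySet()) {
            rows.add(new Object[] {id, entry.getKey(), entry.getValue()});
        }
        return rows;
    }

    // Counts rows per map key: the "group by key, then count" step.
    static Map<String, Integer> countByKey(List<Object[]> rows) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Object[] row : rows) {
            counts.merge((String) row[1], 1, Integer::sum);
        }
        return counts;
    }

    // Builds an insertion-ordered map from alternating key/value arguments.
    static Map<String, Double> mapOf(Object... keyValues) {
        Map<String, Double> map = new LinkedHashMap<>();
        for (int i = 0; i < keyValues.length; i += 2) {
            map.put((String) keyValues[i], (Double) keyValues[i + 1]);
        }
        return map;
    }

    // The four sample records from the dump output in this thread.
    static List<Object[]> sampleRows() {
        List<Object[]> rows = new ArrayList<>();
        rows.addAll(explode("65RFPRO800863GPT", mapOf("108", 0.2)));
        rows.addAll(explode("6JL6U6EA00863J0J",
                mapOf("352", 0.5, "25", 0.15, "108", 0.07, "26", 0.06, "4", 0.16)));
        rows.addAll(explode("6B7FF3E300052E97",
                mapOf("25", 0.28, "405", 0.05, "4", 0.05)));
        rows.addAll(explode("5498267_31",
                mapOf("108", 0.05, "25", 0.19, "12", 0.19)));
        return rows;
    }

    public static void main(String[] args) {
        // Print one "key<TAB>count" line per distinct map key.
        for (Map.Entry<String, Integer> e : countByKey(sampleRows()).entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

On the four sample records this produces 12 rows, with counts of 3 for keys 108 and 25, 2 for key 4, and 1 each for 352, 26, 405 and 12, which matches the output requested in the thread.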
Thanks,
Thejas

On 6/2/11 8:55 AM, "Xiaomeng Wan" <[email protected]> wrote:

Can't your UDF return a bag of tuples with two fields (i.e. key and value), which you then flatten?

Shawn

On Thu, Jun 2, 2011 at 7:28 AM, Jameson Li <[email protected]> wrote:
> Hi,
>
> my pig code is like this:
> register myudf.jar
> a = load 'testurls' as (info:chararray);
> b = foreach a generate info,com.company.pig.GetInfoScore($0) as m;
> dump b;
>
> The output is like this:
> (65RFPRO800863GPT,[108#0.2])
> (6JL6U6EA00863J0J,[352#0.5,25#0.15,108#0.07,26#0.06,4#0.16])
> (6B7FF3E300052E97,[25#0.28,405#0.05,4#0.05])
> (5498267_31,[108#0.05,25#0.19,12#0.19])
>
> And I want to group by the map key and count the info, like the output
> below:
> 108 3 /*65RFPRO800863GPT 6JL6U6EA00863J0J 5498267_31 */
> 352 1 /*6JL6U6EA00863J0J*/
> 25 3 /*6JL6U6EA00863J0J 6B7FF3E300052E97 5498267_31 */
> 26 1 /*6JL6U6EA00863J0J*/
> 4 2 /*6JL6U6EA00863J0J 6B7FF3E300052E97*/
> 405 1 /*6B7FF3E300052E97*/
> 12 1 /*5498267_31*/
>
> I think I have to split the map into many rows, like this:
> (65RFPRO800863GPT, 108, 0.2)
> (6JL6U6EA00863J0J, 352, 0.5)
> (6JL6U6EA00863J0J, 25, 0.15)
> (6JL6U6EA00863J0J, 108, 0.07)
> (6JL6U6EA00863J0J, 26, 0.06)
> (6JL6U6EA00863J0J, 4, 0.16)
> (6B7FF3E300052E97, 25, 0.28)
> (6B7FF3E300052E97, 405, 0.05)
> (6B7FF3E300052E97, 4, 0.05)
> (5498267_31, 108, 0.05)
> (5498267_31, 25, 0.19)
> (5498267_31, 12, 0.19)
>
> And then it is easy to group and count.
> Am I right?
> I have no idea how to split the map into many rows as shown above.
> Help.
>
> Thanks.
>
> 2011/5/25 Alan Gates <[email protected]>
>
>> Can't you mimic dynamic key support with static keys by making your map
>> have two static keys 'key' and 'value'?
>>
>> Alan.
>>
>>
>> On May 24, 2011, at 3:05 AM, Jameson Li wrote:
>>
>>> OK, OK. I know that I just have to write UDFs.
>>> I have to write UDFs, and see you......
>>> And I still think there should be grammar support for map operations with
>>> both static and dynamic keys.
>>>
>>> Thanks.
>>>
>>> 2011/5/24 Daniel Dai <[email protected]>
>>>
>>>> GetKey(m) already gets the key, so you can filter on the key. For the
>>>> value, you may need to put the logic into a UDF.
>>>>
>>>> Grammar support for maps is based on static keys, e.g. m#'key1'. Your use
>>>> case mostly deals with dynamic keys, which you currently have to handle
>>>> yourself.
>>>>
>>>> Daniel
>>>>
>>>> -----Original Message-----
>>>> From: Jameson Li
>>>> Sent: Monday, May 23, 2011 7:07 PM
>>>> To: Daniel Dai
>>>> Cc: [email protected]
>>>> Subject: Re: how to operate a map type
>>>>
>>>>
>>>> And how do I filter on a map key or a map value? Is that also only
>>>> possible with a UDF?
>>>>
>>>> b = foreach ruls generate com.company.pig.GetURLContent($0,3,0.1) as m;
>>>> c = filter b by m.key == 'aaa' or m.value > 0.2;
>>>>
>>>> How could I write the code?
>>>> Is there any other way without writing a UDF?
>>>>
>>>> And I wonder: since a map type can only be operated on by writing UDFs,
>>>> why are there no official functions for the map type?
>>>>
>>>> Thanks.
>>>>
>>>> 2011/5/24 Daniel Dai <[email protected]>
>>>>
>>>>> I cannot think of a way without writing a UDF.
>>>>> You can write two UDFs:
>>>>>
>>>>> * GetKey: input a map, output the key of the map
>>>>> * GetValues: input a bag of maps, output a bag of map values
>>>>>
>>>>> The script is like:
>>>>> b = foreach ruls generate com.company.pig.GetURLContent($0,3,0.1) as m;
>>>>> c = foreach b generate GetKey(m) as key, m;
>>>>> d = group c by key;
>>>>> e = foreach d generate group, SUM(GetValues(c.m));
>>>>>
>>>>>
>>>>> Daniel
>>>>>
>>>>>
>>>>> On 05/23/2011 07:06 AM, Jameson Li wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I have the below pig code:
>>>>>>
>>>>>> register /home/uu/project/lib/pigudfs.jar
>>>>>> ruls = load 'testurl' as (url:chararray);
>>>>>>
>>>>>> b = foreach ruls generate com.company.pig.GetURLContent($0,3,0.1);
>>>>>>
>>>>>> Here, when I dump b, it returns:
>>>>>> ([4#0.1677963])
>>>>>> ([193#0.16985779,81#0.10994483])
>>>>>> ([418#0.14138427,9#0.1107544,282#0.18699136])
>>>>>>
>>>>>> I just want to group by the map key and sum the map value, like:
>>>>>> c = group b by $0#key;
>>>>>> d = foreach c generate group,SUM(b.$0#value);
>>>>>>
>>>>>> How could I write the code?
>>>>>>
>>>>>> Thanks,
>>>>>> Jameson Li.
