Well, I think I should start with the first step. My UDF works on grouped data, and I would like some suggestions on how to retain the schema of the input grouped data in my outputSchema method. Thanks.
Regards
Syed Wasti

On 6/11/10 4:28 PM, "Syed Wasti" <[email protected]> wrote:

> Hi,
> I have written a UDF to sort the grouped data on a given field (in my case
> a date field) and return the sorted data in a databag. I want my method to
> get the schema of the fields within the input (which is in a bag), and the
> returned bag should carry this schema.
> In the outputSchema method the input schema is treated as a tuple schema, so
> I have to add this schema to a tuple and push this tuple into a bag. My
> output then looks something like this:
>
> grunt> grp_frds = GROUP gen_frds BY id;
> grunt> grp_out = FOREACH grp_frds GENERATE FLATTEN(PartByDesc(gen_frds, 3));
> --(second parameter is the field on which I want to sort my bag)
> grunt> describe grp_out;
> grp_out: {bag_of_tokenTuples::gen_frds: {id: long,dep_id: long,grp: int,date: chararray}}
>
> In my case I don't want the date field any more, so in the next operator:
>
> forc = FOREACH grp_gen GENERATE FLATTEN(bag_of_tokenTuples::gen_frds.id) AS
> id, FLATTEN(bag_of_tokenTuples::gen_frds.dep_id) AS dep_id,
> FLATTEN(bag_of_tokenTuples::gen_frds.grp) AS grp;
>
> All this works as I want it to, but I was expecting the FLATTEN keyword I am
> using over my UDF to eliminate all the nesting, or, within
> "bag_of_tokenTuples", to eliminate the "gen_frds" bag and leave the fields
> directly within bag_of_tokenTuples.
> Looking for suggestions please.
>
> Thanks
> Syed Wasti
