I am writing a GenericUDTF now, but notice on
http://wiki.apache.org/hadoop/Hive/DeveloperGuide/UDTF
the method docs show:

    /**
     * Called to notify the UDTF that there are no more rows to process. Note that
     * forward() should not be called in this function. Only clean up code should
     * be run.
     */
    public abstract void close() throws HiveException;

but the example does exactly that:

    @Override
    public void close() throws HiveException {
      forwardObj[0] = count;
      forward(forwardObj);
      forward(forwardObj);
    }

I'll assume the example is correct and continue, but it might be worth
fixing that page.

Cheers,
Tim

On Mon, Nov 8, 2010 at 7:35 AM, Tim Robertson <timrobertson...@gmail.com> wrote:
> Thank you both,
>
> A quick glance looks like that is what I am looking for. When I get
> it working, I'll post the solution.
>
> Cheers,
> Tim
>
> On Mon, Nov 8, 2010 at 6:55 AM, Namit Jain <nj...@facebook.com> wrote:
>> Another option would be to create a wrapper script (using neither a UDF
>> nor a UDTF).
>> That script, in any language, can emit any number of output rows per input
>> row.
>>
>> Look at:
>> http://wiki.apache.org/hadoop/Hive/LanguageManual/Transform
>> for details
>>
>> ________________________________
>> From: Sonal Goyal [sonalgoy...@gmail.com]
>> Sent: Sunday, November 07, 2010 8:40 PM
>> To: user@hive.apache.org
>> Subject: Re: Unions causing many scans of input - workaround?
>>
>> Hey Tim,
>>
>> You have an interesting problem. Have you tried creating a UDTF for your
>> case, so that you can possibly emit more than one record for each row of
>> your input?
>>
>> http://wiki.apache.org/hadoop/Hive/DeveloperGuide/UDTF
>>
>> Thanks and Regards,
>> Sonal
>>
>> Sonal Goyal | Founder and CEO | Nube Technologies LLP
>> http://www.nubetech.co | http://in.linkedin.com/in/sonalgoyal
>>
>> On Mon, Nov 8, 2010 at 2:31 AM, Tim Robertson <timrobertson...@gmail.com>
>> wrote:
>>>
>>> Hi all,
>>>
>>> I am porting custom MR code to Hive and have written working UDFs
>>> where I need them.
>>> Is there a workaround to having to do this in Hive:
>>>
>>> select * from
>>> (
>>>   select name_id, toTileX(longitude,0) as x, toTileY(latitude,0) as y,
>>>     0 as zoom, funct2(longitude,0) as f2_x, funct2(latitude,0) as f2_y,
>>>     count(1) as count
>>>   from table
>>>   group by name_id, x, y, f2_x, f2_y
>>>
>>>   UNION ALL
>>>
>>>   select name_id, toTileX(longitude,1) as x, toTileY(latitude,1) as y,
>>>     1 as zoom, funct2(longitude,1) as f2_x, funct2(latitude,1) as f2_y,
>>>     count(1) as count
>>>   from table
>>>   group by name_id, x, y, f2_x, f2_y
>>>
>>>   -- etc. etc., increasing in zoom
>>> )
>>>
>>> The issue is that this makes many passes over the table, whereas
>>> previously in my Map() I would just emit many times from the same
>>> input record and then let it all group in the shuffle and sort.
>>> I actually emit 184 times per input record (23 zoom levels of
>>> Google Maps, and 8 ways to derive the name_id), which would mean
>>> 184 union statements - is it possible in Hive to force it to emit
>>> many times from the source record in the stage-1 map?
>>>
>>> (ahem) Does anyone know if Pig can do this, if Hive cannot?
>>>
>>> I hope I have explained this well enough to make sense.
>>>
>>> Thanks in advance,
>>> Tim
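For reference, the wrapper-script route Namit points at (Hive's TRANSFORM, per the LanguageManual/Transform page) can be sketched as a small Python script that reads one row per line from stdin and emits one output row per zoom level. This is only a sketch: the column layout (tab-separated name_id, latitude, longitude), the script name, and the tile math (standard Web Mercator formulas, not Tim's actual toTileX/toTileY/funct2) are all assumptions, not taken from the thread.

```python
#!/usr/bin/env python
# Sketch of the TRANSFORM wrapper-script idea: one input row in, many
# output rows out (one per zoom level). Input columns and the Web
# Mercator tile formulas are illustrative assumptions.
import math
import sys

MAX_ZOOM = 3  # Tim uses 23 zoom levels; kept small here for illustration


def tile_x(lon, zoom):
    # Standard Web Mercator longitude -> tile column at this zoom
    return int((lon + 180.0) / 360.0 * (1 << zoom))


def tile_y(lat, zoom):
    # Standard Web Mercator latitude -> tile row at this zoom
    rad = math.radians(lat)
    return int((1.0 - math.log(math.tan(rad) + 1.0 / math.cos(rad)) / math.pi)
               / 2.0 * (1 << zoom))


def emit_rows(line):
    # One tab-separated input row -> one output row per zoom level
    name_id, lat, lon = line.rstrip("\n").split("\t")
    lat, lon = float(lat), float(lon)
    for zoom in range(MAX_ZOOM + 1):
        yield "\t".join([name_id, str(tile_x(lon, zoom)),
                         str(tile_y(lat, zoom)), str(zoom)])


if __name__ == "__main__":
    for line in sys.stdin:
        for row in emit_rows(line):
            print(row)
```

From Hive, something roughly like the following (syntax per the Transform wiki page; names are hypothetical) would then feed a single GROUP BY instead of 184 UNION ALL branches:

    SELECT TRANSFORM (name_id, latitude, longitude)
      USING 'python tiles.py'
      AS (name_id, x, y, zoom)
    FROM table;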