Right. Russell, the reason the DynamicInvokers weren't working is that InvokeForString expects a method that returns a String. randomUUID returns a UUID, not a String. You could of course call this trivially using JRuby UDFs (less work than the Java version).
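To make the type mismatch concrete, here is a minimal Java sketch. The class name UuidStrings and the method randomUuidString are made up for illustration and are not part of Pig or the JDK. It only shows that randomUUID is a static call returning a UUID, that toString is an instance call on that UUID, and that a static wrapper is needed to get a String back in one step. Whether InvokeForString can then be pointed at a zero-argument method like this one is not confirmed here.

```java
import java.util.UUID;

// Hypothetical helper, for illustration only.
final class UuidStrings {
    private UuidStrings() {}

    // Static wrapper that returns a String, which is the return type
    // a String invoker expects.
    static String randomUuidString() {
        UUID id = UUID.randomUUID(); // static factory, returns java.util.UUID
        return id.toString();        // instance method, returns String
    }

    public static void main(String[] args) {
        System.out.println(randomUuidString());
    }
}
```

This is the same two-step call that the CreateUUID UDF quoted below performs inside exec().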
D

On Sun, May 27, 2012 at 2:39 PM, Dragan Nedeljkovic <[email protected]> wrote:
> You have to call UUID.randomUUID() to get an UUID, but you cannot use DEFINE
> to do that since DEFINE does not support methods that return arbitrary
> classes.
>
> Wrapping it into an UDF, works just fine,
>
> package piggybank;
>
> import java.io.IOException;
> import java.util.UUID;
>
> import org.apache.pig.EvalFunc;
> import org.apache.pig.data.Tuple;
>
> public class CreateUUID
>     extends EvalFunc<String>
> {
>     public String exec(Tuple input)
>         throws IOException
>     {
>         try
>         {
>             return UUID.randomUUID().toString();
>         }
>         catch(Exception e)
>         {
>             // Throwing an exception will cause the task to fail.
>             throw new IOException("Something bad happened!", e);
>         }
>     }
> }
> // eof
>
>
> register 'mypiggybank.jar';
> define CreateUUID piggybank.CreateUUID();
>
> input_lines = LOAD 'test_CreateUUID.in' AS (line:chararray);
> describe input_lines;
> dump input_lines;
>
> new_list = FOREACH input_lines GENERATE line, CreateUUID();
> describe new_list;
> dump new_list;
>
> -- eof
>
>
>>________________________________
>> From: Russell Jurney <[email protected]>
>>To: [email protected]
>>Sent: Sunday, May 27, 2012 4:56:07 PM
>>Subject: Re: Create rdbms like sequence in Pig on Pig Relation
>>
>>It helps, but I am not able to invoke java.util.UUID.toString, maybe
>>because it doesn't take an argument. This is from the docs:
>>
>>DEFINE UrlDecode InvokeForString('java.net.URLDecoder.decode', 'String String');
>>encoded_strings = LOAD 'encoded_strings.txt' as (encoded:chararray);
>>decoded_strings = FOREACH encoded_strings GENERATE UrlDecode(encoded, 'UTF-8');
>>
>>
>>Maybe I forgot, but is this how I do it?
>>
>>DEFINE UUID InvokeForString('java.util.UUID.toString');
>>with_uuid = FOREACH my_stuff generate UUID(), *;
>>
>>
>>Sorry, I only understand example code - not APIs. My Java is quite weak.
>>
>>http://docs.oracle.com/javase/6/docs/api/java/util/UUID.html#toString()
>>
>>On Sun, May 27, 2012 at 2:33 AM, Subir S <[email protected]> wrote:
>>
>>> I hope this helps. DynamicInvoker feature in Pig. Added in 0.8.0
>>>
>>> http://squarecog.wordpress.com/2010/08/20/upcoming-features-in-pig-0-8-dynamic-invokers/
>>>
>>> Thanks
>>>
>>> On 5/24/12, Russell Jurney <[email protected]> wrote:
>>> > Thanks, I mean how do you invoke it directly in grunt> from Pig?
>>> >
>>> > I keep messing it up for the last 30 minutes. Should I check the settings
>>> > on my pacemaker, I feel like Fabio on NyQuil messing with this.
>>> >
>>> > On Wed, May 23, 2012 at 10:19 PM, Subir S <[email protected]>
>>> > wrote:
>>> >
>>> >> Hope this helps ->
>>> >> http://www.javapractices.com/topic/TopicAction.do?Id=56
>>> >>
>>> >> and this ->
>>> >>
>>> >> http://docs.oracle.com/javase/1.5.0/docs/api/java/util/UUID.html#randomUUID%28%29
>>> >>
>>> >> Thanks
>>> >>
>>> >>
>>> >> On Thu, May 24, 2012 at 10:42 AM, Russell Jurney
>>> >> <[email protected]>wrote:
>>> >>
>>> >> > How do you invoke java.util.UUID.randomUUID? There is no invoker that
>>> >> > doesn't take an arg?
>>> >> >
>>> >> > On Sun, May 20, 2012 at 6:26 PM, Rajesh Balamohan <
>>> >> > [email protected]> wrote:
>>> >> >
>>> >> > > I dont think so. However, its a single line java command. You can create
>>> >> > > customUDF for this and use in your code.
>>> >> > >
>>> >> > > java.util.UUID.randomUUID();
>>> >> > >
>>> >> > > ~Rajesh.B
>>> >> > >
>>> >> > > On Sun, May 20, 2012 at 8:15 AM, DIPESH KUMAR SINGH
>>> >> > > <[email protected]>wrote:
>>> >> > >
>>> >> > > > Thanks Rajesh.
>>> >> > > >
>>> >> > > > Is GUID a built in UDF?
>>> >> > > >
>>> >> > > > --
>>> >> > > > Dipesh
>>> >> > > >
>>> >> > > > On Sun, May 20, 2012 at 8:06 AM, Rajesh Balamohan <
>>> >> > > > [email protected]> wrote:
>>> >> > > >
>>> >> > > > > If you do not bother about sequence number and the intention is to create
>>> >> > > > > just unique key, you can just use GUID which doesn't require any
>>> >> > > > > synchronization at all (all mappers can run in parallel).
>>> >> > > > >
>>> >> > > > > The approached I suggested in earlier mail comes into picture mainly for
>>> >> > > > > sequence number.
>>> >> > > > >
>>> >> > > > > ~Rajesh.B
>>> >> > > > >
>>> >> > > > > On Sun, May 20, 2012 at 8:02 AM, Rajesh Balamohan <
>>> >> > > > > [email protected]> wrote:
>>> >> > > > >
>>> >> > > > > > Pig doesn't have that facility yet. Moreover, its not very efficient to do
>>> >> > > > > > this in PIG/MR as it requires synchronization.
>>> >> > > > > >
>>> >> > > > > > However, if this is unavoidable situation for you, following things can be
>>> >> > > > > > considered
>>> >> > > > > >
>>> >> > > > > > 1. Maintaining the seq number details in zookeeper
>>> >> > > > > > 2. Having a simple structure in HBase table (seqNumber --> Value). You can
>>> >> > > > > > get a bucket of values (ex: 1000-2000) from this and use it in your UDF.
>>> >> > > > > > When the range depletes, you have to query/update HBase table (ex:
>>> >> > > > > > 3000-4000). There are corner cases which needs to be handled.
>>> >> > > > > >
>>> >> > > > > >
>>> >> > > > > > ~Rajesh.B
>>> >> > > > > >
>>> >> > > > > >
>>> >> > > > > > On Sat, May 19, 2012 at 12:04 AM, DIPESH KUMAR SINGH <
>>> >> > > > > > [email protected]> wrote:
>>> >> > > > > >
>>> >> > > > > >> Sorry, if my point was not clear.
>>> >> > > > > >>
>>> >> > > > > >> I wish to create a sequence on a pig relation.
>>> >> > > > > >>
>>> >> > > > > >> Say For example i have a relation with data:
>>> >> > > > > >> (John, A-1)
>>> >> > > > > >> (Jack, B-2)
>>> >> > > > > >> (Jim, C-1)
>>> >> > > > > >>
>>> >> > > > > >> I want to create sequence i.e to add one more column to the relation, like
>>> >> > > > > >> a counter and keep on increasing the count for each record read. Expected
>>> >> > > > > >> output should be something like this:
>>> >> > > > > >>
>>> >> > > > > >> (If 200 is the start sequence. )
>>> >> > > > > >> (John, A-1, 201)
>>> >> > > > > >> (Jack, B-2, 202)
>>> >> > > > > >> (Jim, C-1, 203)
>>> >> > > > > >>
>>> >> > > > > >> Could you please suggest to proceed on this?
>>> >> > > > > >>
>>> >> > > > > >> Thanks,
>>> >> > > > > >> Dipesh
>>> >> > > > > >>
>>> >> > > > > >> On Fri, May 18, 2012 at 6:50 AM, Thejas Nair <[email protected]>
>>> >> > > > > >> wrote:
>>> >> > > > > >>
>>> >> > > > > >> > What do you mean by 'rdbms like sequence' ?
>>> >> > > > > >> > Thanks,
>>> >> > > > > >> > Thejas
>>> >> > > > > >> >
>>> >> > > > > >> >
>>> >> > > > > >> > On 5/16/12 10:41 AM, DIPESH KUMAR SINGH wrote:
>>> >> > > > > >> >
>>> >> > > > > >> >> I want to create a rdbms like sequence on a Pig relation.
>>> >> > > > > >> >>
>>> >> > > > > >> >> Is there any existing UDF which could do this?
>>> >> > > > > >> >>
>>> >> > > > > >> >> I am bit new to pig, Kindly suggest how to proceed?
>>> >> > > > > >> >>
>>> >> > > > > >> >>
>>> >> > > > > >> >> Thanks& Regards,
>>> >> > > > > >> >>
>>> >> > > > > >> >
>>> >> > > > > >>
>>> >> > > > > >> --
>>> >> > > > > >> Dipesh Kr. Singh
>>> >> > > > > >>
>>> >> > > > > >
>>> >> > > > > > --
>>> >> > > > > > ~Rajesh.B
>>> >> > > > > >
>>> >> > > > >
>>> >> > > > > --
>>> >> > > > > ~Rajesh.B
>>> >> > > > >
>>> >> > > >
>>> >> > > > --
>>> >> > > > Dipesh Kr. Singh
>>> >> > > >
>>> >> > >
>>> >> > > --
>>> >> > > ~Rajesh.B
>>> >> > >
>>> >> >
>>> >> > --
>>> >> > Russell Jurney twitter.com/rjurney [email protected]
>>> >> > datasyndrome.com
>>> >> >
>>> >>
>>> >
>>> > --
>>> > Russell Jurney twitter.com/rjurney [email protected]
>>> > datasyndrome.com
>>> >
>>>
>>
>>--
>>Russell Jurney twitter.com/rjurney [email protected] datasyndrome.com
>>
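For the sequence-number side of the thread, the "bucket of values" idea Rajesh describes can be sketched in plain Java. This is only an illustration under stated assumptions: SequenceStore and BlockSequence are hypothetical names, and the AtomicLong is an in-memory stand-in for the shared counter that the thread suggests keeping in ZooKeeper or an HBase row. No Pig, HBase, or ZooKeeper API is used.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical in-memory stand-in for the shared counter
// (ZooKeeper or an HBase row in the thread's suggestion).
final class SequenceStore {
    private final AtomicLong next;

    SequenceStore(long first) {
        this.next = new AtomicLong(first);
    }

    // Atomically reserves blockSize consecutive ids and returns the first one.
    long reserve(int blockSize) {
        return next.getAndAdd(blockSize);
    }
}

// Hypothetical per-task id source: hands out ids from a locally held
// block and goes back to the store only when that block is used up.
// Not thread-safe; intended for use by a single task.
final class BlockSequence {
    private final SequenceStore store;
    private final int blockSize;
    private long current; // next id to hand out
    private long limit;   // end of the held block (exclusive)

    BlockSequence(SequenceStore store, int blockSize) {
        this.store = store;
        this.blockSize = blockSize;
    }

    long nextId() {
        if (current == limit) { // block exhausted, or nothing reserved yet
            current = store.reserve(blockSize);
            limit = current + blockSize;
        }
        return current++;
    }
}
```

Each task pays the synchronization cost once per block rather than once per record. The ids are unique across tasks but not contiguous in record order, and ids left in a partly used block are lost if a task fails, which is one of the corner cases mentioned in the thread.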
