Are there examples of JRuby UDFs? I couldn't figure it out.

Russell Jurney twitter.com/rjurney [email protected] datasyndrome.com

On May 27, 2012, at 4:01 PM, Dmitriy Ryaboy <[email protected]> wrote:
> Right. Russell, the reason DynamicInvokers weren't working is that
> InvokeForString expects a function that returns a String. randomUUID
> returns a UUID, not a String.
> You could of course call this trivially using JRuby UDFs (less work
> than the Java version).
>
> D
>
> On Sun, May 27, 2012 at 2:39 PM, Dragan Nedeljkovic <[email protected]>
> wrote:
>> You have to call UUID.randomUUID() to get a UUID, but you cannot use DEFINE
>> to do that since DEFINE does not support methods that return arbitrary
>> classes.
>>
>> Wrapping it into a UDF works just fine:
>>
>> package piggybank;
>>
>> import java.io.IOException;
>> import java.util.UUID;
>>
>> import org.apache.pig.EvalFunc;
>> import org.apache.pig.data.Tuple;
>>
>> public class CreateUUID
>>     extends EvalFunc<String>
>> {
>>     public String exec(Tuple input)
>>         throws IOException
>>     {
>>         try
>>         {
>>             return UUID.randomUUID().toString();
>>         }
>>         catch(Exception e)
>>         {
>>             // Throwing an exception will cause the task to fail.
>>             throw new IOException("Something bad happened!", e);
>>         }
>>     }
>> }
>> // eof
>>
>>
>> register 'mypiggybank.jar';
>> define CreateUUID piggybank.CreateUUID();
>>
>> input_lines = LOAD 'test_CreateUUID.in' AS (line:chararray);
>> describe input_lines;
>> dump input_lines;
>>
>> new_list = FOREACH input_lines GENERATE line, CreateUUID();
>> describe new_list;
>> dump new_list;
>>
>> -- eof
>>
>>> ________________________________
>>> From: Russell Jurney <[email protected]>
>>> To: [email protected]
>>> Sent: Sunday, May 27, 2012 4:56:07 PM
>>> Subject: Re: Create rdbms like sequence in Pig on Pig Relation
>>>
>>> It helps, but I am not able to invoke java.util.UUID.toString, maybe
>>> because it doesn't take an argument.
>>> This is from the docs:
>>>
>>> DEFINE UrlDecode InvokeForString('java.net.URLDecoder.decode', 'String String');
>>> encoded_strings = LOAD 'encoded_strings.txt' as (encoded:chararray);
>>> decoded_strings = FOREACH encoded_strings GENERATE UrlDecode(encoded, 'UTF-8');
>>>
>>> Maybe I forgot, but is this how I do it?
>>>
>>> DEFINE UUID InvokeForString('java.util.UUID.toString');
>>> with_uuid = FOREACH my_stuff generate UUID(), *;
>>>
>>> Sorry, I only understand example code - not APIs. My Java is quite weak.
>>>
>>> http://docs.oracle.com/javase/6/docs/api/java/util/UUID.html#toString()
>>>
>>> On Sun, May 27, 2012 at 2:33 AM, Subir S <[email protected]> wrote:
>>>
>>>> I hope this helps. DynamicInvoker feature in Pig. Added in 0.8.0
>>>>
>>>> http://squarecog.wordpress.com/2010/08/20/upcoming-features-in-pig-0-8-dynamic-invokers/
>>>>
>>>> Thanks
>>>>
>>>> On 5/24/12, Russell Jurney <[email protected]> wrote:
>>>>> Thanks, I mean how do you invoke it directly in grunt> from Pig?
>>>>>
>>>>> I keep messing it up for the last 30 minutes. Should I check the settings
>>>>> on my pacemaker, I feel like Fabio on NyQuil messing with this.
>>>>>
>>>>> On Wed, May 23, 2012 at 10:19 PM, Subir S <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hope this helps ->
>>>>>> http://www.javapractices.com/topic/TopicAction.do?Id=56
>>>>>>
>>>>>> and this ->
>>>>>>
>>>>>> http://docs.oracle.com/javase/1.5.0/docs/api/java/util/UUID.html#randomUUID%28%29
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> On Thu, May 24, 2012 at 10:42 AM, Russell Jurney
>>>>>> <[email protected]> wrote:
>>>>>>
>>>>>>> How do you invoke java.util.UUID.randomUUID? There is no invoker that
>>>>>>> doesn't take an arg?
>>>>>>>
>>>>>>> On Sun, May 20, 2012 at 6:26 PM, Rajesh Balamohan <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> I don't think so. However, it's a single-line Java command. You can
>>>>>>>> create a custom UDF for this and use it in your code.
>>>>>>>>
>>>>>>>> java.util.UUID.randomUUID();
>>>>>>>>
>>>>>>>> ~Rajesh.B
>>>>>>>>
>>>>>>>> On Sun, May 20, 2012 at 8:15 AM, DIPESH KUMAR SINGH
>>>>>>>> <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Thanks Rajesh.
>>>>>>>>>
>>>>>>>>> Is GUID a built-in UDF?
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Dipesh
>>>>>>>>>
>>>>>>>>> On Sun, May 20, 2012 at 8:06 AM, Rajesh Balamohan <
>>>>>>>>> [email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> If you do not care about the sequence number and the intention is
>>>>>>>>>> just to create a unique key, you can use a GUID, which doesn't require
>>>>>>>>>> any synchronization at all (all mappers can run in parallel).
>>>>>>>>>>
>>>>>>>>>> The approach I suggested in the earlier mail comes into the picture
>>>>>>>>>> mainly for a sequence number.
>>>>>>>>>>
>>>>>>>>>> ~Rajesh.B
>>>>>>>>>>
>>>>>>>>>> On Sun, May 20, 2012 at 8:02 AM, Rajesh Balamohan <
>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Pig doesn't have that facility yet. Moreover, it's not very efficient
>>>>>>>>>>> to do this in Pig/MR as it requires synchronization.
>>>>>>>>>>>
>>>>>>>>>>> However, if this is an unavoidable situation for you, the following
>>>>>>>>>>> things can be considered:
>>>>>>>>>>>
>>>>>>>>>>> 1. Maintaining the seq number details in ZooKeeper
>>>>>>>>>>> 2. Having a simple structure in an HBase table (seqNumber --> Value).
>>>>>>>>>>> You can get a bucket of values (ex: 1000-2000) from this and use it in
>>>>>>>>>>> your UDF. When the range depletes, you have to query/update the HBase
>>>>>>>>>>> table (ex: 3000-4000). There are corner cases which need to be handled.
>>>>>>>>>>>
>>>>>>>>>>> ~Rajesh.B
>>>>>>>>>>>
>>>>>>>>>>> On Sat, May 19, 2012 at 12:04 AM, DIPESH KUMAR SINGH <
>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Sorry if my point was not clear.
>>>>>>>>>>>>
>>>>>>>>>>>> I wish to create a sequence on a Pig relation.
>>>>>>>>>>>>
>>>>>>>>>>>> Say, for example, I have a relation with data:
>>>>>>>>>>>> (John, A-1)
>>>>>>>>>>>> (Jack, B-2)
>>>>>>>>>>>> (Jim, C-1)
>>>>>>>>>>>>
>>>>>>>>>>>> I want to create a sequence, i.e. to add one more column to the
>>>>>>>>>>>> relation, like a counter, and keep on increasing the count for each
>>>>>>>>>>>> record read. Expected output should be something like this:
>>>>>>>>>>>>
>>>>>>>>>>>> (If 200 is the start sequence.)
>>>>>>>>>>>> (John, A-1, 201)
>>>>>>>>>>>> (Jack, B-2, 202)
>>>>>>>>>>>> (Jim, C-1, 203)
>>>>>>>>>>>>
>>>>>>>>>>>> Could you please suggest how to proceed on this?
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Dipesh
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, May 18, 2012 at 6:50 AM, Thejas Nair <[email protected]>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> What do you mean by 'rdbms like sequence'?
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Thejas
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 5/16/12 10:41 AM, DIPESH KUMAR SINGH wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> I want to create an rdbms-like sequence on a Pig relation.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Is there any existing UDF which could do this?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I am a bit new to Pig. Kindly suggest how to proceed.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks & Regards,
>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> Dipesh Kr. Singh
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> ~Rajesh.B
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> ~Rajesh.B
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Dipesh Kr. Singh
>>>>>>>>
>>>>>>>> --
>>>>>>>> ~Rajesh.B
>>>>>>>
>>>>>>> --
>>>>>>> Russell Jurney twitter.com/rjurney [email protected] datasyndrome.com
>>>>>
>>>>> --
>>>>> Russell Jurney twitter.com/rjurney [email protected] datasyndrome.com
>>>
>>> --
>>> Russell Jurney twitter.com/rjurney [email protected] datasyndrome.com
>>>
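
One way to read Dmitriy's explanation in the thread: InvokeForString needs a method whose return type is String, while UUID.randomUUID() returns a UUID. A minimal Java sketch of a static wrapper that closes that gap follows. The class name `UuidStrings` and the method `randomUuidString` are hypothetical, not part of Pig or the piggybank, and whether InvokeForString accepts a no-argument static method in a given Pig version is an assumption to verify before relying on it.

```java
import java.util.UUID;

// Hypothetical helper class; the name is illustrative and not part of Pig.
public class UuidStrings {

    // Returns a freshly generated random UUID in its canonical
    // 36-character string form, so the declared return type is String
    // rather than UUID.
    public static String randomUuidString() {
        return UUID.randomUUID().toString();
    }

    // Small demo: print one generated value.
    public static void main(String[] args) {
        System.out.println(randomUuidString());
    }
}
```

If that assumption holds, the Pig side would presumably be a DEFINE pointing at the fully qualified method name, in the same form as the UrlDecode example quoted from the docs. That part is untested here; the CreateUUID EvalFunc quoted in the thread is the approach a poster reported as working.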
