Are there examples of JRuby UDFs? I couldn't figure it out.

Russell Jurney
twitter.com/rjurney
[email protected]
datasyndrome.com

On May 27, 2012, at 4:01 PM, Dmitriy Ryaboy <[email protected]> wrote:

> Right. Russell, the reason DynamicInvokers weren't working is that
> InvokeForString expects a function that returns a String. randomUUID
> returns a UUID, not a String.
> You could of course call this trivially using JRuby UDFs (less work
> than the Java version).
>
> D
>
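The type mismatch described above can be checked with plain Java reflection. This is a minimal sketch and does not use Pig at all. The class name `UuidInvokerCheck` and its helper methods are hypothetical, and the rule it tests (a no-argument invoker target must be a static method returning String) is an assumption about what a String invoker needs, not a statement of Pig's exact lookup logic.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.UUID;

// Illustrative only: inspects java.util.UUID to show why neither
// randomUUID nor toString matches the shape assumed for a String invoker.
class UuidInvokerCheck {

    // True when the named no-arg public method on UUID is static AND
    // returns String (the shape this sketch assumes the invoker needs).
    static boolean isStaticStringMethod(String name) {
        try {
            Method m = UUID.class.getMethod(name);
            return Modifier.isStatic(m.getModifiers())
                    && m.getReturnType() == String.class;
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException("No such method on UUID: " + name, e);
        }
    }

    // What a wrapping UDF effectively provides: one call that yields a String.
    static String randomUuidString() {
        return UUID.randomUUID().toString();
    }

    public static void main(String[] args) {
        // randomUUID is static but returns UUID; toString returns String
        // but is an instance method, so it needs a UUID to be called on.
        System.out.println("randomUUID fits: " + isStaticStringMethod("randomUUID"));
        System.out.println("toString fits:   " + isStaticStringMethod("toString"));
        System.out.println("wrapped value:   " + randomUuidString());
    }
}
```

Neither method fits on its own, which is why a small wrapper that chains the two calls, such as the UDF quoted below, is the practical route.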
> On Sun, May 27, 2012 at 2:39 PM, Dragan Nedeljkovic <[email protected]> wrote:
>> You have to call UUID.randomUUID() to get a UUID, but you cannot use DEFINE
>> to do that, since DEFINE does not support methods that return arbitrary
>> classes.
>>
>> Wrapping it in a UDF works just fine:
>>
>> package piggybank;
>>
>> import java.io.IOException;
>> import java.util.UUID;
>>
>> import org.apache.pig.EvalFunc;
>> import org.apache.pig.data.Tuple;
>>
>> // Pig UDF that ignores its input and returns a random UUID as a string.
>> public class CreateUUID extends EvalFunc<String> {
>>     @Override
>>     public String exec(Tuple input) throws IOException {
>>         try {
>>             return UUID.randomUUID().toString();
>>         } catch (Exception e) {
>>             // Throwing an exception will cause the task to fail.
>>             throw new IOException("Something bad happened!", e);
>>         }
>>     }
>> }
>> // eof
>>
>>
>> register 'mypiggybank.jar';
>> define CreateUUID piggybank.CreateUUID();
>>
>> input_lines = LOAD 'test_CreateUUID.in' AS (line:chararray);
>> describe input_lines;
>> dump input_lines;
>>
>> new_list = FOREACH input_lines GENERATE line, CreateUUID();
>> describe new_list;
>> dump new_list;
>>
>> -- eof
>>
>>
>>> ________________________________
>>> From: Russell Jurney <[email protected]>
>>> To: [email protected]
>>> Sent: Sunday, May 27, 2012 4:56:07 PM
>>> Subject: Re: Create rdbms like sequence in Pig on Pig Relation
>>>
>>> It helps, but I am not able to invoke java.util.UUID.toString, maybe
>>> because it doesn't take an argument.  This is from the docs:
>>>
>>> DEFINE UrlDecode InvokeForString('java.net.URLDecoder.decode', 'String String');
>>> encoded_strings = LOAD 'encoded_strings.txt' as (encoded:chararray);
>>> decoded_strings = FOREACH encoded_strings GENERATE UrlDecode(encoded, 'UTF-8');
>>>
>>>
>>> Maybe I forgot, but is this how I do it?
>>>
>>> DEFINE UUID InvokeForString('java.util.UUID.toString');
>>> with_uuid = FOREACH my_stuff generate UUID(), *;
>>>
>>>
>>> Sorry, I only understand example code - not APIs. My Java is quite weak.
>>>
>>> http://docs.oracle.com/javase/6/docs/api/java/util/UUID.html#toString()
>>>
>>> On Sun, May 27, 2012 at 2:33 AM, Subir S <[email protected]> wrote:
>>>
>>>> I hope this helps. DynamicInvoker feature in Pig. Added in 0.8.0
>>>>
>>>>
>>>> http://squarecog.wordpress.com/2010/08/20/upcoming-features-in-pig-0-8-dynamic-invokers/
>>>>
>>>> Thanks
>>>>
>>>> On 5/24/12, Russell Jurney <[email protected]> wrote:
>>>>> Thanks, I mean how do you invoke it directly in grunt> from Pig?
>>>>>
>>>>> I have kept messing it up for the last 30 minutes. Should I check the
>>>>> settings on my pacemaker? I feel like Fabio on NyQuil messing with this.
>>>>>
>>>>> On Wed, May 23, 2012 at 10:19 PM, Subir S <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hope this helps ->
>>>>>> http://www.javapractices.com/topic/TopicAction.do?Id=56
>>>>>>
>>>>>> and this ->
>>>>>> http://docs.oracle.com/javase/1.5.0/docs/api/java/util/UUID.html#randomUUID%28%29
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, May 24, 2012 at 10:42 AM, Russell Jurney
>>>>>> <[email protected]> wrote:
>>>>>>
>>>>>>> How do you invoke java.util.UUID.randomUUID?  There is no invoker that
>>>>>>> doesn't take an arg?
>>>>>>>
>>>>>>> On Sun, May 20, 2012 at 6:26 PM, Rajesh Balamohan <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> I don't think so. However, it's a single-line Java command. You can
>>>>>>>> create a custom UDF for this and use it in your code.
>>>>>>>>
>>>>>>>> java.util.UUID.randomUUID();
>>>>>>>>
>>>>>>>> ~Rajesh.B
>>>>>>>>
>>>>>>>> On Sun, May 20, 2012 at 8:15 AM, DIPESH KUMAR SINGH
>>>>>>>> <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Thanks Rajesh.
>>>>>>>>>
>>>>>>>>> Is GUID a built-in UDF?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Dipesh
>>>>>>>>>
>>>>>>>>> On Sun, May 20, 2012 at 8:06 AM, Rajesh Balamohan <
>>>>>>>>> [email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> If you do not care about sequence numbers and the intention is just to
>>>>>>>>>> create a unique key, you can use a GUID, which doesn't require any
>>>>>>>>>> synchronization at all (all mappers can run in parallel).
>>>>>>>>>>
>>>>>>>>>> The approach I suggested in the earlier mail mainly applies when you
>>>>>>>>>> need a sequence number.
>>>>>>>>>>
>>>>>>>>>> ~Rajesh.B
>>>>>>>>>>
>>>>>>>>>> On Sun, May 20, 2012 at 8:02 AM, Rajesh Balamohan <
>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Pig doesn't have that facility yet. Moreover, it's not very efficient to
>>>>>>>>>>> do this in Pig/MR, as it requires synchronization.
>>>>>>>>>>>
>>>>>>>>>>> However, if this is unavoidable for you, the following options can be
>>>>>>>>>>> considered:
>>>>>>>>>>>
>>>>>>>>>>> 1. Maintaining the sequence number details in ZooKeeper.
>>>>>>>>>>> 2. Having a simple structure in an HBase table (seqNumber --> Value). You
>>>>>>>>>>> can get a bucket of values (ex: 1000-2000) from this and use it in your
>>>>>>>>>>> UDF. When the range depletes, you have to query/update the HBase table
>>>>>>>>>>> (ex: 3000-4000). There are corner cases which need to be handled.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> ~Rajesh.B
>>>>>>>>>>>
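The bucket idea in option 2 above can be sketched in plain Java. This is only an illustration under stated assumptions: `RangeSource`, `InMemoryRangeSource`, and `BlockSequence` are hypothetical names, and the in-memory counter stands in for the shared store (ZooKeeper or an HBase counter), which in practice would need an atomic increment and failure handling. Values handed out this way are unique but not contiguous or ordered across tasks, which is one of the corner cases mentioned.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for the shared store that hands out ranges.
interface RangeSource {
    // Reserves `size` consecutive values and returns the first one.
    long reserve(int size);
}

// In-memory substitute for ZooKeeper/HBase, for illustration only.
class InMemoryRangeSource implements RangeSource {
    private final AtomicLong next;

    InMemoryRangeSource(long start) {
        this.next = new AtomicLong(start);
    }

    @Override
    public long reserve(int size) {
        return next.getAndAdd(size);
    }
}

// Hands out values from a locally held block and only goes back to the
// shared source when the block is used up.
class BlockSequence {
    private final RangeSource source;
    private final int blockSize;
    private long current = 0; // next value to hand out
    private long limit = 0;   // exclusive end of the held block (empty at start)

    BlockSequence(RangeSource source, int blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        this.source = source;
        this.blockSize = blockSize;
    }

    synchronized long next() {
        if (current >= limit) {
            // Block depleted: one round trip to the shared source per block.
            current = source.reserve(blockSize);
            limit = current + blockSize;
        }
        return current++;
    }
}

class BlockSequenceDemo {
    public static void main(String[] args) {
        RangeSource shared = new InMemoryRangeSource(201);
        BlockSequence taskA = new BlockSequence(shared, 3);
        BlockSequence taskB = new BlockSequence(shared, 3);
        for (int i = 0; i < 4; i++) {
            System.out.println("A: " + taskA.next() + "  B: " + taskB.next());
        }
    }
}
```

Each task touches the shared source once per block rather than once per record, which is where the saving over per-record synchronization comes from.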
>>>>>>>>>>>
>>>>>>>>>>> On Sat, May 19, 2012 at 12:04 AM, DIPESH KUMAR SINGH <
>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Sorry if my point was not clear.
>>>>>>>>>>>>
>>>>>>>>>>>> I wish to create a sequence on a Pig relation.
>>>>>>>>>>>>
>>>>>>>>>>>> Say, for example, I have a relation with data:
>>>>>>>>>>>> (John, A-1)
>>>>>>>>>>>> (Jack, B-2)
>>>>>>>>>>>> (Jim, C-1)
>>>>>>>>>>>>
>>>>>>>>>>>> I want to create a sequence, i.e. add one more column to the relation,
>>>>>>>>>>>> like a counter, and keep increasing the count for each record read.
>>>>>>>>>>>> The expected output should be something like this:
>>>>>>>>>>>>
>>>>>>>>>>>> (If 200 is the start of the sequence.)
>>>>>>>>>>>> (John, A-1, 201)
>>>>>>>>>>>> (Jack, B-2, 202)
>>>>>>>>>>>> (Jim, C-1, 203)
>>>>>>>>>>>>
>>>>>>>>>>>> Could you please suggest how to proceed on this?
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Dipesh
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, May 18, 2012 at 6:50 AM, Thejas Nair <[email protected]>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> What do you mean by 'rdbms like sequence'?
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Thejas
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 5/16/12 10:41 AM, DIPESH KUMAR SINGH wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> I want to create an RDBMS-like sequence on a Pig relation.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Is there any existing UDF which could do this?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I am a bit new to Pig. Could you kindly suggest how to proceed?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks & Regards,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> Dipesh Kr. Singh
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> ~Rajesh.B
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> ~Rajesh.B
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Dipesh Kr. Singh
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> ~Rajesh.B
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Russell Jurney twitter.com/rjurney [email protected]
>>>>>>> datasyndrome.com
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Russell Jurney twitter.com/rjurney [email protected]
>>>>> datasyndrome.com
>>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> Russell Jurney twitter.com/rjurney [email protected] datasyndrome.com
>>>
>>>
>>>
