You have to call UUID.randomUUID() to get a UUID, but you cannot use DEFINE
with a dynamic invoker to do that, since the invokers do not support methods
that return arbitrary classes (randomUUID() returns a UUID object, not a
String).

Wrapping it in a UDF works just fine:

package piggybank;

import java.io.IOException;
import java.util.UUID;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

public class CreateUUID
    extends EvalFunc<String>
{
    // The input tuple is ignored; every call returns a fresh random UUID.
    @Override
    public String exec(Tuple input)
        throws IOException
    {
        try
        {
            return UUID.randomUUID().toString();
        }
        catch (Exception e)
        {
            // Throwing an exception will cause the task to fail.
            throw new IOException("Something bad happened!", e);
        }
    }
}
// eof
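
If you want to check what exec() hands back without a Pig install, here is a
minimal standalone sketch that makes the same java.util.UUID call. The class
name UuidDemo and the helper newId() are made up for illustration and are not
part of the jar above.

```java
import java.util.UUID;

public class UuidDemo {
    // Hypothetical helper: the same call the UDF makes, returning the
    // random UUID in its canonical 36-character text form.
    static String newId() {
        return UUID.randomUUID().toString();
    }

    public static void main(String[] args) {
        String id = newId();
        System.out.println(id);

        // Parsing the string back confirms it is a well-formed UUID;
        // randomUUID() produces version 4 (random) UUIDs.
        UUID parsed = UUID.fromString(id);
        System.out.println("version = " + parsed.version());
    }
}
```

Each call is independent, so no coordination between mappers is needed.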


register 'mypiggybank.jar';
define CreateUUID piggybank.CreateUUID();

input_lines = LOAD 'test_CreateUUID.in' AS (line:chararray);
describe input_lines;
dump input_lines;

new_list = FOREACH input_lines GENERATE line, CreateUUID();
describe new_list;
dump new_list;

-- eof
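
For the original sequence-number question, the "bucket of values" idea Rajesh
describes further down in the thread could look roughly like the sketch below.
This is only an illustration under assumptions: SharedCounter is a
hypothetical in-memory stand-in for whatever ZooKeeper or HBase call would
really reserve a range, and none of these class names come from Pig.

```java
import java.util.concurrent.atomic.AtomicLong;

public class SequenceBlocks {
    /** Hypothetical stand-in for the shared store (ZooKeeper node or HBase row). */
    static class SharedCounter {
        private final AtomicLong next;

        SharedCounter(long start) {
            next = new AtomicLong(start);
        }

        // Atomically reserve `size` consecutive values and return the first one.
        long reserve(int size) {
            return next.getAndAdd(size);
        }
    }

    /** Per-task allocator: it contacts the shared counter once per block. */
    static class BlockAllocator {
        private final SharedCounter counter;
        private final int blockSize;
        private long current; // next value to hand out
        private long limit;   // exclusive end of the reserved block

        BlockAllocator(SharedCounter counter, int blockSize) {
            this.counter = counter;
            this.blockSize = blockSize;
        }

        long next() {
            if (current == limit) {
                // Block used up (or never reserved): fetch a new range.
                current = counter.reserve(blockSize);
                limit = current + blockSize;
            }
            return current++;
        }
    }

    public static void main(String[] args) {
        SharedCounter shared = new SharedCounter(201);
        BlockAllocator taskA = new BlockAllocator(shared, 1000);
        BlockAllocator taskB = new BlockAllocator(shared, 1000);

        System.out.println(taskA.next()); // 201
        System.out.println(taskA.next()); // 202
        System.out.println(taskB.next()); // 1201
    }
}
```

The values are unique across tasks, but they are neither gap-free nor in
record order, which is part of the corner cases mentioned below.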


>________________________________
> From: Russell Jurney <[email protected]>
>To: [email protected] 
>Sent: Sunday, May 27, 2012 4:56:07 PM
>Subject: Re: Create rdbms like sequence in Pig on Pig Relation
> 
>It helps, but I am not able to invoke java.util.UUID.toString, maybe
>because it doesn't take an argument.  This is from the docs:
>
>DEFINE UrlDecode InvokeForString('java.net.URLDecoder.decode', 'String String');
>encoded_strings = LOAD 'encoded_strings.txt' as (encoded:chararray);
>decoded_strings = FOREACH encoded_strings GENERATE UrlDecode(encoded, 'UTF-8');
>
>
>Maybe I forgot, but is this how I do it?
>
>DEFINE UUID InvokeForString('java.util.UUID.toString');
>with_uuid = FOREACH my_stuff generate UUID(), *;
>
>
>Sorry, I only understand example code - not APIs. My Java is quite weak.
>
>http://docs.oracle.com/javase/6/docs/api/java/util/UUID.html#toString()
>
>On Sun, May 27, 2012 at 2:33 AM, Subir S <[email protected]> wrote:
>
>> I hope this helps: the DynamicInvoker feature in Pig, added in 0.8.0.
>>
>>
>> http://squarecog.wordpress.com/2010/08/20/upcoming-features-in-pig-0-8-dynamic-invokers/
>>
>> Thanks
>>
>> On 5/24/12, Russell Jurney <[email protected]> wrote:
>> > Thanks, I mean how do you invoke it directly in grunt> from Pig?
>> >
>> > I keep messing it up for the last 30 minutes. Should I check the settings
>> > on my pacemaker, I feel like Fabio on NyQuil messing with this.
>> >
>> > On Wed, May 23, 2012 at 10:19 PM, Subir S <[email protected]>
>> > wrote:
>> >
>> >> Hope this helps ->
>> >> http://www.javapractices.com/topic/TopicAction.do?Id=56
>> >>
>> >> and this ->
>> >>
>> >>
>> >> http://docs.oracle.com/javase/1.5.0/docs/api/java/util/UUID.html#randomUUID%28%29
>> >>
>> >> Thanks
>> >>
>> >>
>> >>
>> >> On Thu, May 24, 2012 at 10:42 AM, Russell Jurney
>> >> <[email protected]> wrote:
>> >>
>> >> > How do you invoke java.util.UUID.randomUUID?  There is no invoker that
>> >> > doesn't take an arg?
>> >> >
>> >> > On Sun, May 20, 2012 at 6:26 PM, Rajesh Balamohan <
>> >> > [email protected]> wrote:
>> >> >
>> >> > > I don't think so. However, it's a single-line Java command. You can
>> >> > > create a custom UDF for this and use it in your code.
>> >> > >
>> >> > > java.util.UUID.randomUUID();
>> >> > >
>> >> > > ~Rajesh.B
>> >> > >
>> >> > > On Sun, May 20, 2012 at 8:15 AM, DIPESH KUMAR SINGH
>> >> > > <[email protected]> wrote:
>> >> > >
>> >> > > > Thanks Rajesh.
>> >> > > >
>> >> > > > Is GUID a built-in UDF?
>> >> > > >
>> >> > > >
>> >> > > > --
>> >> > > > Dipesh
>> >> > > >
>> >> > > > On Sun, May 20, 2012 at 8:06 AM, Rajesh Balamohan <
>> >> > > > [email protected]> wrote:
>> >> > > >
>> >> > > > > If you do not care about the sequence number and the intention is
>> >> > > > > just to create a unique key, you can use a GUID, which doesn't
>> >> > > > > require any synchronization at all (all mappers can run in
>> >> > > > > parallel).
>> >> > > > >
>> >> > > > > The approach I suggested in my earlier mail mainly comes into the
>> >> > > > > picture when you need a sequence number.
>> >> > > > >
>> >> > > > > ~Rajesh.B
>> >> > > > >
>> >> > > > > On Sun, May 20, 2012 at 8:02 AM, Rajesh Balamohan <
>> >> > > > > [email protected]> wrote:
>> >> > > > >
>> >> > > > > > Pig doesn't have that facility yet. Moreover, it's not very
>> >> > > > > > efficient to do this in Pig/MR, as it requires synchronization.
>> >> > > > > >
>> >> > > > > > However, if this is unavoidable for you, the following things
>> >> > > > > > can be considered:
>> >> > > > > >
>> >> > > > > > 1. Maintaining the sequence number details in ZooKeeper.
>> >> > > > > > 2. Having a simple structure in an HBase table (seqNumber -->
>> >> > > > > > Value). You can get a bucket of values (ex: 1000-2000) from it
>> >> > > > > > and use it in your UDF. When the range is depleted, you have to
>> >> > > > > > query/update the HBase table again (ex: 3000-4000). There are
>> >> > > > > > corner cases which need to be handled.
>> >> > > > > >
>> >> > > > > >
>> >> > > > > > ~Rajesh.B
>> >> > > > > >
>> >> > > > > >
>> >> > > > > > On Sat, May 19, 2012 at 12:04 AM, DIPESH KUMAR SINGH <
>> >> > > > > > [email protected]> wrote:
>> >> > > > > >
>> >> > > > > >> Sorry if my point was not clear.
>> >> > > > > >>
>> >> > > > > >> I wish to create a sequence on a Pig relation.
>> >> > > > > >>
>> >> > > > > >> Say, for example, I have a relation with data:
>> >> > > > > >> (John, A-1)
>> >> > > > > >> (Jack, B-2)
>> >> > > > > >> (Jim, C-1)
>> >> > > > > >>
>> >> > > > > >> I want to create a sequence, i.e. add one more column to the
>> >> > > > > >> relation, like a counter, and keep increasing the count for
>> >> > > > > >> each record read. The expected output should be something
>> >> > > > > >> like this:
>> >> > > > > >>
>> >> > > > > >> (If 200 is the start of the sequence.)
>> >> > > > > >> (John, A-1, 201)
>> >> > > > > >> (Jack, B-2, 202)
>> >> > > > > >> (Jim, C-1, 203)
>> >> > > > > >>
>> >> > > > > >> Could you please suggest how to proceed on this?
>> >> > > > > >>
>> >> > > > > >> Thanks,
>> >> > > > > >> Dipesh
>> >> > > > > >>
>> >> > > > > >> On Fri, May 18, 2012 at 6:50 AM, Thejas Nair
>> >> > > > > >> <[email protected]> wrote:
>> >> > > > > >>
>> >> > > > > >> > What do you mean by 'rdbms like sequence' ?
>> >> > > > > >> > Thanks,
>> >> > > > > >> > Thejas
>> >> > > > > >> >
>> >> > > > > >> >
>> >> > > > > >> > On 5/16/12 10:41 AM, DIPESH KUMAR SINGH wrote:
>> >> > > > > >> >
>> >> > > > > >> >> I want to create an RDBMS-like sequence on a Pig relation.
>> >> > > > > >> >>
>> >> > > > > >> >> Is there any existing UDF which could do this?
>> >> > > > > >> >>
>> >> > > > > >> >> I am a bit new to Pig. Kindly suggest how to proceed.
>> >> > > > > >> >>
>> >> > > > > >> >>
>> >> > > > > >> >> Thanks & Regards,
>> >> > > > > >> >>
>> >> > > > > >> >
>> >> > > > > >> >
>> >> > > > > >>
>> >> > > > > >>
>> >> > > > > >> --
>> >> > > > > >> Dipesh Kr. Singh
>> >> > > > > >>
>> >> > > > > >
>> >> > > > > >
>> >> > > > > >
>> >> > > > > > --
>> >> > > > > > ~Rajesh.B
>> >> > > > > >
>> >> > > > >
>> >> > > > >
>> >> > > > >
>> >> > > > > --
>> >> > > > > ~Rajesh.B
>> >> > > > >
>> >> > > >
>> >> > > >
>> >> > > >
>> >> > > > --
>> >> > > > Dipesh Kr. Singh
>> >> > > >
>> >> > >
>> >> > >
>> >> > >
>> >> > > --
>> >> > > ~Rajesh.B
>> >> > >
>> >> >
>> >> >
>> >> >
>> >> > --
>> >> > Russell Jurney twitter.com/rjurney [email protected]
>> >> > datasyndrome.com
>> >> >
>> >>
>> >
>> >
>> >
>> > --
>> > Russell Jurney twitter.com/rjurney [email protected]
>> > datasyndrome.com
>> >
>>
>
>
>
>-- 
>Russell Jurney twitter.com/rjurney [email protected] datasyndrome.com
>
>
>
