There is an uncommitted Piggybank UDF that may help you: https://issues.apache.org/jira/browse/PIG-1229 . You can try the first patch listed on the page (pig-1229.2.patch by Ankur). It does a different thing, writing rows from Pig into the DB, but you can borrow the DB connection part from it.
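
For reference, the lazy-initialization idea Dmitriy describes in the quoted thread below (keep the handle null, open it on first use, reopen it if it has gone away) could look roughly like this. This is only a sketch, not code from the PIG-1229 patch: the class name LazyDbHandle and the ConnectionFactory interface are made up for illustration, and Pig itself is left out so the snippet compiles with just the JDK. In a real UDF the holder would be a field of your EvalFunc subclass and get() would be called at the top of exec().

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

/**
 * Hypothetical holder for a lazily opened JDBC connection.
 * Not part of Pig or Piggybank; names are illustrative only.
 */
final class LazyDbHandle {

    /** Hypothetical factory so the way a connection is opened can be swapped. */
    interface ConnectionFactory {
        Connection open() throws SQLException;
    }

    private final ConnectionFactory factory;
    private Connection dbh; // stays null until the first get()

    LazyDbHandle(ConnectionFactory factory) {
        this.factory = factory;
    }

    /** Convenience constructor; URL and credentials are whatever your DB needs. */
    LazyDbHandle(String jdbcUrl, String user, String password) {
        this(() -> DriverManager.getConnection(jdbcUrl, user, password));
    }

    /**
     * Returns the shared connection, opening it on first use and reopening it
     * if it was closed. Note that isClosed() only reports an explicit close();
     * detecting a silently dropped connection needs extra checks.
     */
    synchronized Connection get() throws SQLException {
        if (dbh == null || dbh.isClosed()) {
            dbh = factory.open();
        }
        return dbh;
    }

    /** Best-effort cleanup; the next get() opens a fresh connection. */
    synchronized void close() {
        if (dbh != null) {
            try {
                dbh.close();
            } catch (SQLException ignored) {
                // nothing useful to do if close itself fails
            }
            dbh = null;
        }
    }
}
```

Holding the LazyDbHandle in a static field would give the sharing across tasks that Dmitriy mentions, assuming the JVM is reused. Either way, each task JVM still opens its own connection, so his warning about cluster size applies.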

Note to self: I really want to get this patch committed before more people reinvent the wheel of making Pig talk to DB.

On Thu, Jul 1, 2010 at 09:48, Dmitriy Ryaboy <[email protected]> wrote:
> Also -- I hope your cluster is not too big. It's really easy to DDOS your
> database using hadoop.
>
> On Thu, Jul 1, 2010 at 9:47 AM, Dmitriy Ryaboy <[email protected]> wrote:
>
>> The simplest thing you can do is to have database handle at the object
>> level, set it to null, and just initialize it in eval() if you see that it's
>> null.
>> You can also init the connection in the constructor.
>> A static dbh will let you share it across tasks, if you persist the jvm.
>> Naturally you will want to throw in some code to handle dropped connections
>> and all that.
>>
>> On Thu, Jul 1, 2010 at 9:01 AM, Dave Viner <[email protected]> wrote:
>>
>>> In a custom UDF, what's the most appropriate way to initialize and connect
>>> to a old-fashioned rdbms?
>>>
>>> I wrote a simple UDF which opens/closes a connection on each exec(), but
>>> this feels a bit like overkill. Is there an "init()" method that is invoked
>>> in a UDF to help with one-time initialization (like a database connection or
>>> sql query preparation)?
>>>
>>> Thanks
>>> Dave Viner
