Hello,

regarding MR-job deployment, I read this Cloudera blog article:
http://www.cloudera.com/blog/2011/01/how-to-include-third-party-libraries-in-your-map-reduce-job/

In my case, I have to deploy the Oracle JDBC driver. I've tried the various 
options discussed in the article, and the only one that worked out of the box 
was including the JDBC jar in the lib folder inside my job JAR. Copying the 
JDBC jar into HADOOP_HOME/lib etc. didn't work. Whenever the MR job couldn't 
locate the JDBC driver, I got the infamous exception:


java.io.IOException
        at org.apache.hadoop.mapreduce.lib.db.DBOutputFormat.getRecordWriter(DBOutputFormat.java:180)
        at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:559)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:414)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
        at org.apache.hadoop.mapred.Child.main(Child.java:264)


While I can embed the JDBC library with each build of our MR job, I would 
rather deploy it into HADOOP_HOME/lib, because it is fairly static and other 
MR jobs might depend on it as well. The interesting thing is that when working 
with the Cloudera VMware image, a reboot after copying the library into 
HADOOP_HOME/lib helped. So, how do you deploy your MR jobs into a real/live 
cluster without having to restart anything?

Thanks a lot!

Thomas


_______________________________________________________ 
DI Thomas Steinmaurer
Industrial Researcher
Software Competence Center Hagenberg GmbH
Softwarepark 21, A-4232 Hagenberg, Austria
UID: ATU 48056909 - FN: 184145b, Landesgericht Linz
Tel. +43 7236 3343-896
Fax +43 7236 3343-888
mailto:[email protected]
http://www.scch.at/
