Hello,
regarding MR-job deployment, I read this Cloudera blog article:
http://www.cloudera.com/blog/2011/01/how-to-include-third-party-libraries-in-your-map-reduce-job/
In my case, I have to deploy the Oracle JDBC driver. I've tried the various
options discussed in the article, and the only one that worked out of the box
was bundling the JDBC jar inside my job JAR's lib folder. Copying the
JDBC jar into HADOOP_HOME/lib etc ... didn't work. Whenever the MR job isn't
able to locate the JDBC driver, I get the infamous exception:
java.io.IOException
    at org.apache.hadoop.mapreduce.lib.db.DBOutputFormat.getRecordWriter(DBOutputFormat.java:180)
    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:559)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:414)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
    at org.apache.hadoop.mapred.Child.main(Child.java:264)
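For reference, the variant that worked for me (driver jar inside the job JAR's
lib folder) can be sketched roughly like this; the file names (myjob.jar,
ojdbc6.jar, build/classes) are just placeholders for illustration:

```shell
# Sketch: bundle the JDBC driver under lib/ inside the job JAR.
# Hadoop unpacks the job JAR on the task nodes and adds lib/*.jar
# to the task classpath, so the driver is found at runtime.
mkdir -p build/lib
cp ojdbc6.jar build/lib/                 # driver goes into lib/ inside the jar
cp -r classes/* build/                   # compiled job classes
jar cf myjob.jar -C build .              # jar now contains classes + lib/ojdbc6.jar
hadoop jar myjob.jar com.example.MyJob input output
```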
While I can embed the JDBC library in each build of our MR job, I would rather
deploy it into HADOOP_HOME/lib, because it rarely changes and other MR jobs
might depend on it as well. The interesting thing is, when working with the
Cloudera VMware image, rebooting after copying the library into HADOOP_HOME/lib
helped. So, how do you deploy your MR jobs to a real/live cluster without
having to restart anything?
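One of the other options from the Cloudera article that avoids touching
HADOOP_HOME/lib is shipping the driver per job with -libjars (the job's main
class has to go through ToolRunner/GenericOptionsParser for this to take
effect). A sketch, with illustrative paths and class names:

```shell
# Sketch: ship the driver with the job instead of installing it cluster-wide.
# The client also needs the driver on its own classpath for job setup.
export HADOOP_CLASSPATH=/opt/jdbc/ojdbc6.jar
hadoop jar myjob.jar com.example.MyJob \
    -libjars /opt/jdbc/ojdbc6.jar \
    input output
```

With this approach the driver is placed in the distributed cache and added to
each task's classpath, so no daemon restart should be needed; I can't say from
the article alone whether it works around the DBOutputFormat case above.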
Thanks a lot!
Thomas
_______________________________________________________
DI Thomas Steinmaurer
Industrial Researcher
Software Competence Center Hagenberg GmbH
Softwarepark 21, A-4232 Hagenberg, Austria
UID: ATU 48056909 - FN: 184145b, Landesgericht Linz
Tel. +43 7236 3343-896
Fax +43 7236 3343-888
mailto:[email protected]
http://www.scch.at/