We use a mix of approaches: libraries dropped on the Hadoop classpath,
and jars included within the job's jar.

BTW, the blog post does mention, when using option 3, to:

"Restart the TaskTrackers when you are done. Do not forget to update
the jar when the underlying software changes."

Getting the classpath configuration right can be a pain; it helps to
print it when a task starts (if you use HBase, the classpath is
printed when it starts its ZooKeeper client).
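If you would rather not bundle the driver into every build nor install it cluster-wide, a restart-free alternative is to ship the jar at submit time with the generic -libjars option; Hadoop copies it to the distributed cache and puts it on the task classpath. This assumes your driver class runs through ToolRunner/GenericOptionsParser, which is what honours the generic options; the class name and paths below are examples:

```shell
# Ship the JDBC driver with the job at submit time; no TaskTracker
# restart and no copy into HADOOP_HOME/lib needed.
hadoop jar myjob.jar com.example.MyDriver \
    -libjars /path/to/ojdbc6.jar \
    input output
```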

J-D

On Sun, Sep 25, 2011 at 10:43 PM, Steinmaurer Thomas
<[email protected]> wrote:
> Hello,
>
> regarding MR-job deployment, I read this Cloudera blog article:
> http://www.cloudera.com/blog/2011/01/how-to-include-third-party-libraries-in-your-map-reduce-job/
>
> In my case, I have to deploy the Oracle JDBC driver. I've tried the various 
> options discussed in the article, and the only one that worked out of the box 
> was including the JDBC jar in my job jar's lib folder. Copying the JDBC jar 
> into HADOOP_HOME/lib etc ... didn't work. Whenever the MR job wasn't able to 
> locate the JDBC driver, I got the infamous exception:
>
>
> java.io.IOException
>        at 
> org.apache.hadoop.mapreduce.lib.db.DBOutputFormat.getRecordWriter(DBOutputFormat.java:180)
>        at 
> org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:559)
>        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:414)
>        at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
>        at java.security.AccessController.doPrivileged(Native Method)
>        at javax.security.auth.Subject.doAs(Subject.java:396)
>        at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
>        at org.apache.hadoop.mapred.Child.main(Child.java:264)
>
>
> While I can embed the JDBC library with each build of our MR job, I would 
> rather deploy it into HADOOP_HOME/lib, because it is fairly static and other 
> MR jobs might depend on it as well. The interesting thing is, when working 
> with the Cloudera VMware image, a reboot after copying the library into 
> HADOOP_HOME/lib helped. So, how do you deploy your MR jobs to a real/live 
> cluster without having to restart anything?
>
> Thanks a lot!
>
> Thomas
>
>
> _______________________________________________________
> DI Thomas Steinmaurer
> Industrial Researcher
> Software Competence Center Hagenberg GmbH
> Softwarepark 21, A-4232 Hagenberg, Austria
> UID: ATU 48056909 - FN: 184145b, Landesgericht Linz
> Tel. +43 7236 3343-896
> Fax +43 7236 3343-888
> mailto:[email protected]
> http://www.scch.at/
>
>
