Re: nutch doesn't run on cdh and hadoop 0.21 [FIXED with workaround]

Andrzej Bialecki Tue, 23 Nov 2010 04:50:08 -0800

On 2010-11-23 12:38, Claudio Martella wrote:
> Hello list,
> 
> In my previous posts I reported being unable to run Nutch on a
> Hadoop cluster running Cloudera's CDH 0.20.2+737.
> (http://search.lucidimagination.com/search/document/b66fa844b87b2654/failure_running_on_hadoop#52c43d8c4137ea8c
> and
> http://search.lucidimagination.com/search/document/a2b151e6a7041c13/nutch_1_x_doesn_t_run_on_cloudera_s_cdh3#2991508ce0ae5d52)
> 
> Basically, the problem was Hadoop not finding some Nutch plugin classes
> such as URLNormalizer.
> 
> I reported back to Cloudera directly
> (https://groups.google.com/a/cloudera.org/group/cdh-user/browse_thread/thread/9147acfc4d18cfaf#)
> and it looks like the problem is connected to MAPREDUCE-967 (now part of
> Hadoop 0.21 and backported by Cloudera to their CDH 0.20.2).
> 
> The patch changes the way MapReduce unpacks the job's jar. Previously the
> whole jar was unpacked; now only classes/ and lib/ are unpacked, so Nutch
> is missing the plugins/ directory. The Nutch job format should be changed
> accordingly.
> 
> Todd Lipcon suggested a workaround until then: setting the
> 'mapreduce.job.jar.unpack.pattern' configuration property to
> "(?:classes/|lib/|plugins/).*"
> 
> Should I file a JIRA?


Thank you for tracking this down! Yes, please do. I think that Nutch 1.3
(if it's ever released ;) ) is going to use a more recent version of
Hadoop, the same with trunk, but at least people will be able to find
this information.
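
For anyone finding this thread later, here is a rough sketch of why the
workaround helps. It is written in Python purely for illustration and is not
Hadoop's own code. It assumes that the unpack pattern must match a jar entry's
full path for the entry to be extracted, and that the default pattern after
MAPREDUCE-967 covers only classes/ and lib/. The entry names are hypothetical
examples, not a real listing of the Nutch job file.

```python
import re

# Pattern from the workaround quoted above.
WORKAROUND_PATTERN = re.compile(r"(?:classes/|lib/|plugins/).*")

# Assumption: the post-MAPREDUCE-967 default only covers classes/ and lib/.
ASSUMED_DEFAULT_PATTERN = re.compile(r"(?:classes/|lib/).*")


def unpacked(entries, pattern):
    """Return the entries whose whole path matches the unpack pattern."""
    return [name for name in entries if pattern.fullmatch(name)]


# Hypothetical job-jar entries, for illustration only.
entries = [
    "classes/org/apache/nutch/crawl/Injector.class",
    "lib/some-dependency.jar",
    "plugins/urlnormalizer-basic/plugin.xml",
    "nutch-default.xml",
]

# With the assumed default, nothing under plugins/ is extracted.
print(unpacked(entries, ASSUMED_DEFAULT_PATTERN))

# With the workaround pattern, plugins/ entries are extracted as well.
print(unpacked(entries, WORKAROUND_PATTERN))
```

Under these assumptions, files at the top level of the jar are still skipped
with either pattern, so the workaround only restores the plugins/ directory.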


-- 
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
