On 2010-11-23 12:38, Claudio Martella wrote:
> Hello list,
>
> in my previous posts i reported about not being able to run nutch on a
> hadoop cluster running cloudera's cdh 0.20.2+737.
> (http://search.lucidimagination.com/search/document/b66fa844b87b2654/failure_running_on_hadoop#52c43d8c4137ea8c
> and
> http://search.lucidimagination.com/search/document/a2b151e6a7041c13/nutch_1_x_doesn_t_run_on_cloudera_s_cdh3#2991508ce0ae5d52)
>
> Basically the problem was hadoop not finding some nutch plugin classes
> like URLNormalizer etc.
>
> I reported back to cloudera directly
> (https://groups.google.com/a/cloudera.org/group/cdh-user/browse_thread/thread/9147acfc4d18cfaf#)
> and it looks like the problem is connected to MAPREDUCE-967 (now part of
> hadoop 0.21 and backported by cloudera to their cdh 0.20.2).
>
> What the patch does is basically modify the way MapReduce unpacks the
> job's jar. The old way was to unpack the whole of it, now only classes/
> and lib/ are unpacked. This way nutch is missing the plugins/ directory.
> The nutch job format should be changed accordingly.
>
> Todd Lipcon suggested a workaround until that moment: setting
> 'mapreduce.job.jar.unpack.pattern' configuration to
> "(?:classes/|lib/|plugins/).*"
>
> Should I file a JIRA?
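For anyone finding this thread later, here is a rough sketch of what the unpack pattern quoted above does. It is only an illustration in Python, not Hadoop code. The entry names in JOB_ENTRIES are made up, the "default" pattern is inferred from the description above (only classes/ and lib/ are unpacked), and I am assuming the pattern has to match the whole entry name.

```python
import re

# Workaround pattern quoted in the message above.
WORKAROUND_PATTERN = re.compile(r"(?:classes/|lib/|plugins/).*")

# Assumption: a pattern equivalent to "only classes/ and lib/ are unpacked",
# as described in the message; not copied from Hadoop itself.
DEFAULT_PATTERN = re.compile(r"(?:classes/|lib/).*")

# Hypothetical entries of a Nutch job file, for illustration only.
JOB_ENTRIES = [
    "classes/org/apache/nutch/crawl/Crawl.class",
    "lib/some-dependency.jar",
    "plugins/urlnormalizer-basic/plugin.xml",
    "META-INF/MANIFEST.MF",
]


def unpacked_entries(entries, pattern):
    """Return the entries whose full name matches the unpack pattern."""
    # Assumption: the whole entry name must match, hence fullmatch().
    return [name for name in entries if pattern.fullmatch(name)]


if __name__ == "__main__":
    print("default:   ", unpacked_entries(JOB_ENTRIES, DEFAULT_PATTERN))
    print("workaround:", unpacked_entries(JOB_ENTRIES, WORKAROUND_PATTERN))
```

With the narrower pattern the plugins/ entry is skipped, which is consistent with the missing plugin classes reported above. With the workaround pattern it is unpacked along with classes/ and lib/.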

Thank you for tracking this down! Yes, please do. I think that Nutch 1.3
(if it's ever released ;) ) is going to use a more recent version of
Hadoop, the same with trunk, but at least people will be able to find
this information.

-- 
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com

