Scott Green wrote:
> Well, why do all resources need to be packed?

Because when you run Nutch on a Hadoop cluster, Hadoop requires that all 
job resources be packed into a job JAR, which is then submitted to each 
tasktracker as a part of the job. So, if you want to run in non-local 
mode you have to build the nutch-xxx.job JAR ("ant job" target).

Apparently you are running in so-called "local" mode, where these issues 
are masked - but as soon as you try to run the job on a cluster your 
method will stop working.


> The build result may look like:
>
> xxx-plugin
>  `--- conf
>  `--- web
>  `--- xxx-plugin.jar
>  `--- deps.jar
>  `--- plugin.xml

Again: in the "local" mode this may work, but these unpacked plugins are 
not available for jobs executing on a Hadoop cluster.

>
>> Now, you may have tested your method and found that it does indeed work
>> - but the reason is a bit obscure: the bin/nutch and bin/hadoop scripts
>> add your build/ directory to the classpath, so that you can locally test
>> the latest versions of the code without creating the *.job file.
>> However, when you run your code on a Hadoop cluster your local build/
>> directory is no longer accessible, and your method will mysteriously
>> fail - or even worse, you may get a different version of a resource from
>> an older version of the build/ directory found on Hadoop tasktracker
>> nodes ...
>
> If you pack everything into jar(s), it is possible that the jar on
> a hadoop tasktracker node is an old version, right?

No. The job jar is always up to date, because it is sent with every job.

But if you don't load the resources from this jar, and instead rely on 
java.io.File-s, you may pick up some old cruft from a local build/ 
directory that you may have accidentally deployed to your tasktrackers ...
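
To illustrate the difference, here is a minimal sketch using only the plain 
JDK class loader API. It is not Nutch's own resource-loading code; the class 
name ResourceLookup and the helper readResource are made up for this example. 
The idea is to ask the class loader for a resource by name, so that it is 
found wherever it sits on the classpath (including inside a jar), instead of 
opening a java.io.File at a path that may not exist on another machine.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, for illustration only (not part of Nutch or Hadoop).
public class ResourceLookup {

    /**
     * Reads a named resource through the class loader.
     * Returns null if the resource is not on the classpath.
     */
    public static byte[] readResource(String name) throws IOException {
        // Prefer the context class loader; fall back to this class's loader.
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ResourceLookup.class.getClassLoader();
        }
        try (InputStream in = loader.getResourceAsStream(name)) {
            if (in == null) {
                return null; // not found anywhere on the classpath
            }
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        }
    }

    public static void main(String[] args) throws IOException {
        // "plugin.xml" is just an example name taken from the layout above.
        byte[] data = readResource("plugin.xml");
        System.out.println(data == null
                ? "plugin.xml is not on the classpath"
                : "read " + data.length + " bytes from the classpath");
    }
}
```

A lookup like this does not care whether the bytes come from a directory or 
from inside a jar, whereas new File("build/...") only works where that 
directory happens to exist.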

-- 
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com



_______________________________________________
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers
