Thank you for the detailed explanation, Andrzej.

My plugin includes one language model (a configuration file) that is
about 40 MB in size. Could you please suggest where the model file
should go?
 a) in the nutch/conf dir, like the "regex-urlfilter.txt" file
 b) inside the plugin's JAR package

On 1/17/07, Andrzej Bialecki <[EMAIL PROTECTED]> wrote:
> Scott Green wrote:
> > Well, why do all resources need to be packed?
>
> Because when you run Nutch on a Hadoop cluster, Hadoop requires that all
> job resources be packed into a job JAR, which is then submitted to each
> tasktracker as a part of the job. So, if you want to run in non-local
> mode you have to build the nutch-xxx.job JAR ("ant job" target).
>
> Apparently you are running in so-called "local" mode, where these issues
> are quite muddy - but as soon as you try to execute it on a cluster your
> method will stop working.
>
>
> > The built result may look like:
> >
> > xxx-plugin
> >  `--- conf
> >  `--- web
> >  `--- xxx-plugin.jar
> >  `--- deps.jar
> >  `--- plugin.xml
>
> Again: in the "local" mode this may work, but these unpacked plugins are
> not available for jobs executing on a Hadoop cluster.
>
> >
> >> Now, you may have tested your method and found that it does indeed work
> >> - but the reason is a bit obscure: the bin/nutch and bin/hadoop scripts
> >> add your build/ directory to the classpath, so that you can locally test
> >> the latest versions of the code without creating the *.job file.
> >> However, when you run your code on a Hadoop cluster your local build/
> >> directory is no longer accessible, and your method will mysteriously
> >> fail - or even worse, you may get a different version of a resource from
> >> an older version of the build/ directory found on Hadoop tasktracker
> >> nodes ...
> >
> > If you pack everything into JAR(s), it is possible that the JAR on
> > the Hadoop tasktracker node is an old version, right?
>
> No. The job jar is always up to date, because it is sent with every job.
>
> But if you don't get the resources from this jar, and instead rely on
> using java.io.File-s, you may pick some old cruft from the local build/
> directory that you may have accidentally deployed to your tasktrackers ...
>
> --
> Best regards,
> Andrzej Bialecki     <><
>  ___. ___ ___ ___ _ _   __________________________________
> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> ___|||__||  \|  ||  |  Embedded Unix, System Integration
> http://www.sigram.com  Contact: info at sigram dot com
>
>
>
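
For reference, here is how I understand the difference between reading
through java.io.File and reading from the job JAR. This is only a
minimal sketch in plain Java, not Nutch's or Hadoop's own API: the class
name ResourceLoading, the method readResource and the resource name
"language-model.bin" are all made up for illustration. The resource is
looked up through a ClassLoader, so it is found wherever the classpath
points (including inside a JAR), instead of at a fixed local path.

```java
import java.io.ByteArrayOutputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;

public class ResourceLoading {

    /**
     * Reads a classpath resource fully into memory.
     * Unlike new File(path), this also works when the resource
     * lives inside a JAR on the classpath.
     */
    public static byte[] readResource(ClassLoader loader, String name)
            throws IOException {
        try (InputStream in = loader.getResourceAsStream(name)) {
            if (in == null) {
                // getResourceAsStream returns null when nothing matches
                throw new FileNotFoundException(
                        "Resource not on classpath: " + name);
            }
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        }
    }

    public static void main(String[] args) throws IOException {
        ClassLoader loader = ResourceLoading.class.getClassLoader();
        try {
            // Hypothetical resource name, used only for illustration.
            byte[] model = readResource(loader, "language-model.bin");
            System.out.println("Loaded " + model.length + " bytes");
        } catch (FileNotFoundException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

For a 40 MB model it would probably be better to consume the stream
incrementally rather than buffer the whole file in memory as this
sketch does.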

_______________________________________________
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers