Thank you for the detailed explanation, Andrzej. My plugin contains a language model (a configuration file) that is 40 MB in size. Could you please suggest where the model file should go?

a) in the nutch/conf directory, like the "regex-urlfilter.txt" file, or
b) inside the plugin's JAR package.
On 1/17/07, Andrzej Bialecki <[EMAIL PROTECTED]> wrote:
> Scott Green wrote:
> > Well, why do all resources need to be packed?
>
> Because when you run Nutch on a Hadoop cluster, Hadoop requires that all
> job resources be packed into a job JAR, which is then submitted to each
> tasktracker as a part of the job. So, if you want to run in non-local
> mode you have to build the nutch-xxx.job JAR ("ant job" target).
>
> Apparently you are running in so-called "local" mode, where these issues
> are quite muddy - but as soon as you try to execute it on a cluster your
> method will stop working.
>
> >
> > The built result may look like:
> >
> > xxx-plugin
> >   `--- conf
> >   `--- web
> >   `--- xxx-plugin.jar
> >   `--- deps.jar
> >   `--- plugin.xml
>
> Again: in the "local" mode this may work, but these unpacked plugins are
> not available for jobs executing on a Hadoop cluster.
>
> >
> >> Now, you may have tested your method and found that it does indeed work
> >> - but the reason is a bit obscure: the bin/nutch and bin/hadoop scripts
> >> add your build/ directory to the classpath, so that you can locally test
> >> the latest versions of the code without creating the *.job file.
> >> However, when you run your code on a Hadoop cluster your local build/
> >> directory is no longer accessible, and your method will mysteriously
> >> fail - or even worse, you may get a different version of a resource from
> >> an older version of the build/ directory found on Hadoop tasktracker
> >> nodes ...
> >
> > If you pack everything into jar(s), it is possible that the jar on a
> > Hadoop tasktracker node is an old version, right?
>
> No. The job jar is always up to date, because it is sent with every job.
>
> But if you don't get the resources from this jar, and instead rely on
> using java.io.File-s, you may pick up some old cruft from the local build/
> directory that you may have accidentally deployed to your tasktrackers ...
>
> --
> Best regards,
> Andrzej Bialecki <><
>  ___. ___ ___ ___ _ _   __________________________________
> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> ___|||__||  \|  ||  |  Embedded Unix, System Integration
> http://www.sigram.com  Contact: info at sigram dot com

_______________________________________________
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers
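To illustrate Andrzej's point about reading resources from the JAR rather than through java.io.File, here is a minimal sketch in plain Java. It assumes the model file is packed next to the plugin's classes, so that the class loader that loaded the plugin class can see it; the class name ModelLoader and the resource name /lang-model.bin are hypothetical, not part of Nutch's API.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

/**
 * Hypothetical helper: loads a resource through the class loader, so it is
 * resolved from whatever JAR is on the classpath (e.g. a plugin JAR inside
 * the job JAR) instead of from a path on the local disk.
 */
public class ModelLoader {

    /**
     * Reads the named classpath resource fully into memory.
     * Names without a leading '/' are resolved relative to this class's
     * package; a leading '/' makes the name absolute on the classpath.
     * Returns null if the resource is not visible to this class's loader.
     */
    public static byte[] loadResource(String name) {
        // try-with-resources skips close() when the stream is null
        try (InputStream in = ModelLoader.class.getResourceAsStream(name)) {
            if (in == null) {
                return null;
            }
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical resource name for the language model
        byte[] model = loadResource("/lang-model.bin");
        System.out.println(model == null
                ? "model not found on classpath"
                : "loaded " + model.length + " bytes");
    }
}
```

Because the lookup goes through the class loader, the same code should pick up the copy shipped inside the current job JAR on each tasktracker rather than a stale file in a local build/ directory. For a 40 MB model, a real implementation would probably parse directly from the InputStream instead of buffering the whole file as this sketch does.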