Hi,
I don't know which plugin you modified, but before running Ant, delete the plugin you modified from /nutch-1.0/build/. After running Ant, copy the rebuilt plugin from the build folder to /nutch-1.0/plugins/.

Thanks

> Date: Thu, 8 Oct 2009 21:46:42 +0200
> Subject: Only indexing pages meeting certain criteria
> From: magg...@gmail.com
> To: nutch-user@lucene.apache.org
>
> Hi,
> I want nutch to only index some of the documents that it crawls, I have
> tried what is suggested here:
> http://www.mail-archive.com/nutch-user@lucene.apache.org/msg11649.html
>
> That is in an IndexingFilter I check for the condition whether to index the
> document and if not I return null.
>
> When I then run the crawl I get the following error:
> Exception in thread "main" java.io.IOException: Job failed!
> at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:604)
> at org.apache.nutch.indexer.Indexer.index(Indexer.java:273)
> at org.apache.nutch.crawl.Crawl.main(Crawl.java:134)
>
> I am on nutch 0.9 few months older than the date in the original post, does
> anyone know what I might be doing wrong or why this is not working any more?
> If this has changed can anyone tell me how I can do this?
>
> best regards,
> Magnus