Hi,

I don't know which plugin you modified, but before running Ant, delete the
plugin you modified from /nutch-1.0/build/. After running Ant, copy the
freshly built plugin from the build folder to /nutch-1.0/plugins/.
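
The steps above could be sketched as a small shell function. This is only a
sketch under assumptions: the plugin directory name is a placeholder (the
thread does not say which plugin was modified), the function name and the
ANT_CMD variable are made up for illustration, and removing the old installed
copy before copying is an extra precaution rather than something required.

```shell
#!/bin/sh
# Sketch only. refresh_plugin and ANT_CMD are hypothetical names;
# ANT_CMD defaults to plain `ant` when it is not set.
refresh_plugin() {
    nutch_home="$1"   # e.g. /nutch-1.0
    plugin="$2"       # placeholder: directory name of the modified plugin

    # 1. Remove the stale build output so Ant rebuilds the plugin.
    rm -rf "$nutch_home/build/$plugin"

    # 2. Rebuild from the Nutch root directory; stop if the build fails.
    (cd "$nutch_home" && ${ANT_CMD:-ant}) || return 1

    # 3. Replace the installed copy with the fresh build.
    #    (Deleting the old copy first is an added precaution.)
    rm -rf "$nutch_home/plugins/$plugin"
    mkdir -p "$nutch_home/plugins"
    cp -r "$nutch_home/build/$plugin" "$nutch_home/plugins/"
}
```

It would be called as `refresh_plugin /nutch-1.0 my-plugin`, with `my-plugin`
replaced by the directory name of the plugin that was changed.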

Thanks



> Date: Thu, 8 Oct 2009 21:46:42 +0200
> Subject: Only indexing pages meeting certain criteria
> From: magg...@gmail.com
> To: nutch-user@lucene.apache.org
> 
> Hi,
> I want Nutch to index only some of the documents that it crawls. I have
> tried what is suggested here:
> http://www.mail-archive.com/nutch-user@lucene.apache.org/msg11649.html
> 
> That is, in an IndexingFilter I check the condition for whether to index the
> document, and if it should not be indexed I return null.
> 
> When I then run the crawl I get the following error:
> Exception in thread "main" java.io.IOException: Job failed!
>         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:604)
>         at org.apache.nutch.indexer.Indexer.index(Indexer.java:273)
>         at org.apache.nutch.crawl.Crawl.main(Crawl.java:134)
> 
> I am on Nutch 0.9, a few months older than the date in the original post. Does
> anyone know what I might be doing wrong, or why this no longer works?
> If this has changed, can anyone tell me how I can do this?
> 
> best regards,
> Magnus
                                          