I'm on 1.0 and it works fine: returning null from the IndexingFilter actually prevents the document from being indexed.
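For illustration, here is a minimal sketch of that "return null to skip" pattern. It does not use the Nutch API: Doc, Filter, ArticlesOnlyFilter and the "/articles/" condition are hypothetical stand-ins, and as far as I know the real org.apache.nutch.indexer.IndexingFilter method takes further arguments (parse, URL, crawl datum, inlinks). It only shows how a filter chain can treat a null return as "do not index this document".

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class NullFilterSketch {

    // Hypothetical stand-in for the indexer's document type (not a Nutch class).
    static class Doc {
        final Map<String, String> fields = new HashMap<>();

        Doc(String url, String content) {
            fields.put("url", url);
            fields.put("content", content);
        }
    }

    // Hypothetical stand-in for an indexing filter: returns the document to
    // keep it, or null to discard it.
    interface Filter {
        Doc filter(Doc doc);
    }

    // Assumed example condition: only index documents whose URL contains "/articles/".
    static class ArticlesOnlyFilter implements Filter {
        @Override
        public Doc filter(Doc doc) {
            String url = doc.fields.get("url");
            if (url == null || !url.contains("/articles/")) {
                return null; // signal: do not index this document
            }
            return doc;
        }
    }

    // Runs the chain and stops as soon as one filter discards the document.
    static Doc runFilters(List<Filter> filters, Doc doc) {
        for (Filter f : filters) {
            doc = f.filter(doc);
            if (doc == null) {
                return null;
            }
        }
        return doc;
    }

    // Collects the URLs of the documents that survive the chain.
    static List<String> indexedUrls(List<Filter> filters, List<Doc> crawled) {
        List<String> out = new ArrayList<>();
        for (Doc d : crawled) {
            Doc kept = runFilters(filters, d);
            if (kept != null) {
                out.add(kept.fields.get("url"));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Filter> filters = new ArrayList<>();
        filters.add(new ArticlesOnlyFilter());

        List<Doc> crawled = new ArrayList<>();
        crawled.add(new Doc("http://example.com/articles/1", "keep me"));
        crawled.add(new Doc("http://example.com/about", "skip me"));

        System.out.println(indexedUrls(filters, crawled));
    }
}
```

The important part is the null check in the caller: if the code that drives the filters does not guard against a null document, a filter that returns null will likely make the job fail instead of skipping the page.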
So you could consider switching to 1.0.

2009/10/8 Magnús Skúlason <magg...@gmail.com>

> Hi,
>
> I want nutch to only index some of the documents that it crawls, I have
> tried what is suggested here:
> http://www.mail-archive.com/nutch-user@lucene.apache.org/msg11649.html
>
> That is in an IndexingFilter I check for the condition whether to index the
> document and if not I return null.
>
> When I then run the crawl I get the following error:
> Exception in thread "main" java.io.IOException: Job failed!
>     at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:604)
>     at org.apache.nutch.indexer.Indexer.index(Indexer.java:273)
>     at org.apache.nutch.crawl.Crawl.main(Crawl.java:134)
>
> I am on nutch 0.9 few months older than the date in the original post, does
> anyone know what I might be doing wrong or why this is not working any
> more?
> If this has changed can anyone tell me how I can do this?
>
> best regards,
> Magnus
>

--
-MilleBii-