Re: Only indexing pages meeting certain criteria

MilleBii Fri, 09 Oct 2009 08:50:35 -0700

I'm on 1.0 and it works fine: returning null from the IndexingFilter actually
prevents the document from being indexed.


So you could consider switching to 1.0.
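
For illustration, here is a minimal sketch of the "return null to skip" pattern.
It is not Nutch code: Doc, DocFilter, ProductPagesOnly and the "/products/"
criterion are hypothetical stand-ins, used so the example runs on a plain JDK.
In a real plugin the same check would go inside the filter method of your
IndexingFilter implementation, whose exact signature depends on the Nutch
version you build against.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SelectiveIndexingSketch {

    /** Stand-in for a document headed to the index (NOT Nutch's document class). */
    static final class Doc {
        final String url;
        final Map<String, String> fields = new LinkedHashMap<String, String>();

        Doc(String url) {
            this.url = url;
        }
    }

    /** Stand-in filter contract: return the doc to keep it, or null to skip it. */
    interface DocFilter {
        Doc filter(Doc doc);
    }

    /** Hypothetical criterion: keep only pages whose URL contains "/products/". */
    static final class ProductPagesOnly implements DocFilter {
        @Override
        public Doc filter(Doc doc) {
            if (!doc.url.contains("/products/")) {
                return null; // signals "do not index this page"
            }
            doc.fields.put("section", "products");
            return doc;
        }
    }

    /** Runs the filters in order and drops a doc as soon as one returns null. */
    static List<Doc> index(List<Doc> crawled, List<DocFilter> filters) {
        List<Doc> indexed = new ArrayList<Doc>();
        for (Doc doc : crawled) {
            Doc current = doc;
            for (DocFilter f : filters) {
                current = f.filter(current);
                if (current == null) {
                    break; // the caller must tolerate null, otherwise the job fails
                }
            }
            if (current != null) {
                indexed.add(current);
            }
        }
        return indexed;
    }

    public static void main(String[] args) {
        List<Doc> crawled = Arrays.asList(
                new Doc("http://example.com/products/1"),
                new Doc("http://example.com/about"));
        List<DocFilter> filters = Arrays.<DocFilter>asList(new ProductPagesOnly());
        for (Doc d : index(crawled, filters)) {
            System.out.println(d.url);
        }
    }
}
```

The point of the sketch is that skipping only works if the code calling the
filters checks for null before writing the document, which is why the same
filter can behave differently from one indexer version to another.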


2009/10/8 Magnús Skúlason <magg...@gmail.com>

> Hi,
> I want Nutch to index only some of the documents that it crawls. I have
> tried what is suggested here:
> http://www.mail-archive.com/nutch-user@lucene.apache.org/msg11649.html
>
> That is, in an IndexingFilter I check whether the document should be
> indexed, and if not I return null.
>
> When I then run the crawl I get the following error:
> Exception in thread "main" java.io.IOException: Job failed!
>        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:604)
>        at org.apache.nutch.indexer.Indexer.index(Indexer.java:273)
>        at org.apache.nutch.crawl.Crawl.main(Crawl.java:134)
>
> I am on Nutch 0.9, a few months older than the date of the original post.
> Does anyone know what I might be doing wrong, or why this no longer works?
> If this has changed, can anyone tell me how I can do this?
>
> best regards,
> Magnus
>



-- 
-MilleBii-
