Re: Problem with crawling using the latest 1.0 trunk

Andrzej Bialecki Mon, 02 Mar 2009 11:19:25 -0800

ahammad wrote:

I am aware that this is still a development version, but I need to test a few
things with Nutch/Solr so I installed the latest dev version of Nutch 1.0.


I tried running a crawl like I did with the working 0.9 version. From the
log, it seems to fetch all the pages properly, but it fails at the indexing:

CrawlDb update: starting
CrawlDb update: db: kb/crawldb
CrawlDb update: segments: [kb/segments/20090302135858]
CrawlDb update: additions allowed: true
CrawlDb update: URL normalizing: true
CrawlDb update: URL filtering: true
CrawlDb update: Merging segment data into db.
CrawlDb update: done
LinkDb: starting
LinkDb: linkdb: kb/linkdb
LinkDb: URL normalize: true
LinkDb: URL filter: true
LinkDb: adding segment:
file:/c:/nutch-2009-03-02_04-01-53/kb/segments/20090302135757
LinkDb: adding segment:
file:/c:/nutch-2009-03-02_04-01-53/kb/segments/20090302135807
LinkDb: adding segment:
file:/c:/nutch-2009-03-02_04-01-53/kb/segments/20090302135858
LinkDb: done
Indexer: starting
Exception in thread "main" java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
        at org.apache.nutch.indexer.Indexer.index(Indexer.java:72)
        at org.apache.nutch.crawl.Crawl.main(Crawl.java:146)


I took a look at all the configuration and as far as I can tell, I did the
same thing with my 0.9 install. Could it be that I didn't install it
properly? I unzipped it and ran ant and ant war in the root directory.

Please check the logs in the logs/ directory - the above message is not
informative; the real reason for the failure can be found in the logs.
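
If the log files are large, a small script can help pull out the relevant lines. The following is only a rough sketch, not part of Nutch: it assumes the logs are plain-text files matching *.log* under logs/ (hadoop.log is the usual name, but check your own install), and the function name find_failures and the match pattern are made up for illustration.

```python
import re
from pathlib import Path

# Assumed heuristic: log4j-style ERROR/FATAL levels, exception class names,
# and Java stack frames ("    at org.example...") mark the interesting lines.
PATTERN = re.compile(r"\b(ERROR|FATAL)\b|Exception|^\s+at ")


def find_failures(log_dir="logs"):
    """Yield (file name, line number, text) for suspicious lines in log files."""
    for path in sorted(Path(log_dir).glob("*.log*")):
        if not path.is_file():
            continue
        # errors="replace" keeps the scan going if a log has odd bytes in it.
        with path.open(errors="replace") as fh:
            for lineno, line in enumerate(fh, 1):
                if PATTERN.search(line):
                    yield path.name, lineno, line.rstrip()


if __name__ == "__main__":
    for name, lineno, text in find_failures():
        print(f"{name}:{lineno}: {text}")
```

Run it from the Nutch root directory right after the failed crawl; the first exception it prints near the end of the indexing step is normally the one to look at.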


--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
