ahammad wrote:
> I am aware that this is still a development version, but I need to test a few things with Nutch/Solr, so I installed the latest dev version of Nutch 1.0. I tried running a crawl like I did with the working 0.9 version. From the log, it seems to fetch all the pages properly, but it fails at the indexing:
>
> CrawlDb update: starting
> CrawlDb update: db: kb/crawldb
> CrawlDb update: segments: [kb/segments/20090302135858]
> CrawlDb update: additions allowed: true
> CrawlDb update: URL normalizing: true
> CrawlDb update: URL filtering: true
> CrawlDb update: Merging segment data into db.
> CrawlDb update: done
> LinkDb: starting
> LinkDb: linkdb: kb/linkdb
> LinkDb: URL normalize: true
> LinkDb: URL filter: true
> LinkDb: adding segment: file:/c:/nutch-2009-03-02_04-01-53/kb/segments/20090302135757
> LinkDb: adding segment: file:/c:/nutch-2009-03-02_04-01-53/kb/segments/20090302135807
> LinkDb: adding segment: file:/c:/nutch-2009-03-02_04-01-53/kb/segments/20090302135858
> LinkDb: done
> Indexer: starting
> Exception in thread "main" java.io.IOException: Job failed!
>         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
>         at org.apache.nutch.indexer.Indexer.index(Indexer.java:72)
>         at org.apache.nutch.crawl.Crawl.main(Crawl.java:146)
>
> I took a look at all the configuration and, as far as I can tell, I did the same thing as with my 0.9 install. Could it be that I didn't install it properly? I unzipped it and ran ant and ant war in the root directory.
Please check the logs in the logs/ directory - the above message is not informative; the real reason for the failure can be found in the logs.
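For example, assuming the default log4j setup (which writes to logs/hadoop.log; the file name may differ if you changed conf/log4j.properties), something like this should surface the underlying exception:

    # show the end of the log, where the Indexer failure was recorded
    tail -n 100 logs/hadoop.log

    # or search for the first stack trace directly
    grep -n -A 10 "Exception" logs/hadoop.log

The full stack trace there (rather than the generic "Job failed!" wrapper from JobClient) usually points at the actual cause.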
--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
