Hi,
and thanks for being persistent. Which version of Nutch are you running?
Is it a nightly build (if so, which one?), or did you check out the svn
trunk? And just to be sure: are you running with the default
configuration?
--
Sami Siren
ahammad wrote:
I checked hadoop.log and this is what it has:
java.lang.IllegalArgumentException: it doesn't make sense to have a field that is neither indexed nor stored
at org.apache.lucene.document.Field.<init>(Field.java:279)
at org.apache.nutch.indexer.lucene.LuceneWriter.createLuceneDoc(LuceneWriter.java:133)
at org.apache.nutch.indexer.lucene.LuceneWriter.write(LuceneWriter.java:239)
at org.apache.nutch.indexer.IndexerOutputFormat$1.write(IndexerOutputFormat.java:50)
at org.apache.nutch.indexer.IndexerOutputFormat$1.write(IndexerOutputFormat.java:40)
at org.apache.hadoop.mapred.ReduceTask$3.collect(ReduceTask.java:410)
at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:158)
at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:50)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:436)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:170)
I don't understand what that refers to specifically. I'm running it with its
default configuration, without any of the advanced indexing that I have in
my 0.9 install.
Cheers.
Andrzej Bialecki wrote:
ahammad wrote:
I am aware that this is still a development version, but I need to test a
few things with Nutch/Solr, so I installed the latest dev version of
Nutch 1.0.
I tried running a crawl like I did with the working 0.9 version. From the
log, it seems to fetch all the pages properly, but it fails at the
indexing:
CrawlDb update: starting
CrawlDb update: db: kb/crawldb
CrawlDb update: segments: [kb/segments/20090302135858]
CrawlDb update: additions allowed: true
CrawlDb update: URL normalizing: true
CrawlDb update: URL filtering: true
CrawlDb update: Merging segment data into db.
CrawlDb update: done
LinkDb: starting
LinkDb: linkdb: kb/linkdb
LinkDb: URL normalize: true
LinkDb: URL filter: true
LinkDb: adding segment: file:/c:/nutch-2009-03-02_04-01-53/kb/segments/20090302135757
LinkDb: adding segment: file:/c:/nutch-2009-03-02_04-01-53/kb/segments/20090302135807
LinkDb: adding segment: file:/c:/nutch-2009-03-02_04-01-53/kb/segments/20090302135858
LinkDb: done
Indexer: starting
Exception in thread "main" java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
at org.apache.nutch.indexer.Indexer.index(Indexer.java:72)
at org.apache.nutch.crawl.Crawl.main(Crawl.java:146)
I took a look at all the configuration and, as far as I can tell, I did
the same thing with my 0.9 install. Could it be that I didn't install it
properly? I unzipped it and ran ant and ant war in the root directory.
Please check the logs in the logs/ directory. The message above is not
informative; the real reason for the failure can be found in the logs.
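
If the log is long, a small script can pull out just the exceptions. The
following is only a rough sketch, not part of Nutch: the function name
find_exceptions, the max_frames parameter and the logs/hadoop.log path are
assumptions for illustration. It also assumes that an exception appears on
a line of its own, followed by "at ..." stack frame lines.

```python
import re
from pathlib import Path

# A line such as "java.lang.IllegalArgumentException: some message"
HEADER = re.compile(r"^(?:Caused by: )?([\w.$]+(?:Exception|Error))(?::\s*(.*))?$")
# A stack frame line such as "    at org.example.Foo.bar(Foo.java:12)"
FRAME = re.compile(r"^\s*at\s+(\S.*)$")


def find_exceptions(log_text, max_frames=3):
    """Return a list of (exception_class, message, first_frames) tuples."""
    found = []
    current = None
    for line in log_text.splitlines():
        header = HEADER.match(line.strip())
        frame = FRAME.match(line)
        if header:
            # Start a new trace; frames are collected into the list.
            current = (header.group(1), header.group(2) or "", [])
            found.append(current)
        elif frame and current is not None:
            if len(current[2]) < max_frames:
                current[2].append(frame.group(1))
        else:
            current = None  # any other line ends the current trace
    return found


if __name__ == "__main__":
    log_path = Path("logs/hadoop.log")  # assumed location, adjust as needed
    if log_path.exists():
        text = log_path.read_text(errors="replace")
        for cls, msg, frames in find_exceptions(text):
            print(f"{cls}: {msg}")
            for f in frames:
                print(f"    at {f}")
```

The first frames are usually enough to see which component rejected the
document; raise max_frames if you need more of the trace.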
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com