Nutch is looking good. Is Solr running at all? The example doesn't have a log output, it just writes to stdout when running. If you're running under Tomcat, Solr logs are written to catalina.out by default.
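A quick way to answer the "is Solr running at all?" question is to probe the update endpoint directly. The following is only a minimal sketch in Python: the two base URLs are the ones that appear in this thread (the `/solr` path from the hadoop.log request and the `/mywebapp` path from the solrindex command), the helper names `update_url`, `probe` and `describe` are made up for illustration, and it assumes the update handler lives at `/update` under the base URL, as the logged request suggests.

```python
import http.client
import urllib.error
import urllib.request


def update_url(base_url):
    """Build the update endpoint under a Solr base URL (trailing slash tolerated)."""
    return base_url.rstrip("/") + "/update"


def probe(url, timeout=5.0):
    """Return the HTTP status code for url, or None if nothing answers."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404.
        return err.code
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, malformed response, and so on.
        return None


def describe(status):
    """Turn a probe result into a short human-readable hint."""
    if status is None:
        return "no answer: is the servlet container running on that port?"
    if status == 404:
        return "404 Not Found: the container is up but nothing is mapped at this path"
    return "HTTP %d" % status


if __name__ == "__main__":
    # Base URLs taken from the thread; adjust to your own deployment.
    for base in ("http://127.0.0.1:8080/solr", "http://127.0.0.1:8080/mywebapp"):
        url = update_url(base)
        print(url, "->", describe(probe(url)))
```

A 404 here corresponds to the "Not Found" SolrException in hadoop.log; any other status means something is at least answering at that path, and the Solr log is the next place to look.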

On Thursday 10 February 2011 14:31:27 McGibbney, Lewis John wrote:
> Hi Markus
>
> Ok first is first, here is Hadoop.log
>
> 2011-02-09 23:24:11,826 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
> 2011-02-09 23:24:11,828 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
> 2011-02-09 23:24:11,875 INFO solr.SolrMappingReader - source: content dest: content
> 2011-02-09 23:24:11,876 INFO solr.SolrMappingReader - source: site dest: site
> 2011-02-09 23:24:11,876 INFO solr.SolrMappingReader - source: title dest: title
> 2011-02-09 23:24:11,876 INFO solr.SolrMappingReader - source: host dest: host
> 2011-02-09 23:24:11,876 INFO solr.SolrMappingReader - source: segment dest: segment
> 2011-02-09 23:24:11,876 INFO solr.SolrMappingReader - source: boost dest: boost
> 2011-02-09 23:24:11,876 INFO solr.SolrMappingReader - source: digest dest: digest
> 2011-02-09 23:24:11,876 INFO solr.SolrMappingReader - source: tstamp dest: tstamp
> 2011-02-09 23:24:11,876 INFO solr.SolrMappingReader - source: url dest: id
> 2011-02-09 23:24:11,876 INFO solr.SolrMappingReader - source: url dest: url
> 2011-02-09 23:24:13,626 WARN mapred.LocalJobRunner - job_local_0001
> org.apache.solr.common.SolrException: Not Found
>
> Not Found
>
> request: http://127.0.0.1:8080/solr/update?wt=javabin&version=1
>     at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:424)
>     at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:243)
>     at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
>     at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:49)
>     at org.apache.nutch.indexer.solr.SolrWriter.write(SolrWriter.java:64)
>     at org.apache.nutch.indexer.IndexerOutputFormat$1.write(IndexerOutputFormat.java:54)
>     at org.apache.nutch.indexer.IndexerOutputFormat$1.write(IndexerOutputFormat.java:44)
>     at org.apache.hadoop.mapred.ReduceTask$3.collect(ReduceTask.java:440)
>     at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:159)
>     at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:50)
>     at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:463)
>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
> 2011-02-09 23:24:14,128 ERROR solr.SolrIndexer - java.io.IOException: Job failed!
>
> I am unsure of where to get Solr output as I have been unable to progress
> past the stage above. I have been indexing directly from Nutch to vanilla
> Solr 1.4.1 dist, but this is my first attempt at indexing to my own app.
> Within my web app I have added following dirs:
>
> bin (empty)
> conf (usual nutch schema, solrconfig with Nutch requestHandler, scripts, synonyms, etc)
> data (index and spellchecker dirs! Each containing segments.gen and segments_1)
> dist (as per 1.4.1 solr version)
> lib (as above)
>
> I hope that this is sufficient
>
> Lewis
> ________________________________________
> From: Markus Jelsma [[email protected]]
> Sent: 10 February 2011 10:58
> To: [email protected]
> Cc: McGibbney, Lewis John
> Subject: Re: Index with Solr to my own webapp
>
> Yes, please show us the hadoop.log output and the Solr output. The latter
> is in this stage usually more important. You might write to not-existing
> fields or writing multiple values to a single valued field or...
> whatever's happening.
>
> On Thursday 10 February 2011 00:36:21 McGibbney, Lewis John wrote:
> > Hi list,
> >
> > I am at Solr indexing stage and seem to have hit trouble when sending
> > crawldb linkdb and segments/* to Solr to be indexed. I have added xml
> > file to $CATALINA_HOME/conf/catalina/localhost with my webapp specifics.
> > My Solr 1.4.1 implementation resides within my web app at following
> > location /home/lewis/Downloads/mywebapp but when I send this command to
> > index with Solr
> >
> > lewis@lewis-01:~/Downloads/nutch-1.2$ bin/nutch solrindex http://127.0.0.1:8080/mywebapp crawl/crawldb crawl/linkdb crawl/segments/*
> >
> > I am getting java.io.IOException: Job failed!
> >
> > I had experienced this before when I was using the Solrindex command
> > incorrectly, I am hoping that this is not the case, however, it is late
> > and I might have missed something simple.
> >
> > I have Hadoop.log if this would help at all.
> >
> > Any suggestions please. Thanks
> >
> > Lewis
> >
> > Glasgow Caledonian University is a registered Scottish charity, number
> > SC021474
> >
> > Winner: Times Higher Education’s Widening Participation Initiative of the
> > Year 2009 and Herald Society’s Education Initiative of the Year 2009.
> > http://www.gcu.ac.uk/newsevents/news/bycategory/theuniversity/1/name,6219,en.html
> >
> > Winner: Times Higher Education’s Outstanding Support for Early Career
> > Researchers of the Year 2010, GCU as a lead with Universities Scotland
> > partners.
> > http://www.gcu.ac.uk/newsevents/news/bycategory/theuniversity/1/name,15691,en.html
>
> --
> Markus Jelsma - CTO - Openindex
> http://www.linkedin.com/in/markus17
> 050-8536620 / 06-50258350

--
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350

