Nutch is looking good. Is Solr running at all? The example doesn't have a log
output; it just writes to stdout when running. If you're running under Tomcat,
Solr logs are written to catalina.out by default.
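
A quick way to check both points is sketched below. This is only a sketch:
the helper names solr_update_url and http_status are made up for
illustration, and the catalina.out location assumes a default Tomcat layout.
The host, port and paths come from the quoted messages below. Note that the
"request:" line in hadoop.log points at /solr/update, while the solrindex
command quoted further down uses /mywebapp, so it is worth probing both.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of Nutch or Solr.

# solrindex posts documents to <base URL>/update, as the "request:" line
# in hadoop.log shows. Strip one trailing slash before appending.
solr_update_url() {
  printf '%s/update\n' "${1%/}"
}

# Print the HTTP status code a URL returns (curl prints 000 if nothing
# answers at all).
http_status() {
  curl -s -o /dev/null -w '%{http_code}' "$1"
}

# Usage (not run automatically):
#   http_status "$(solr_update_url http://127.0.0.1:8080/mywebapp)"
#   http_status "$(solr_update_url http://127.0.0.1:8080/solr)"
#   tail -f "$CATALINA_HOME/logs/catalina.out"   # assumed default location
```

A 404 from the first probe would match the "Not Found" in hadoop.log and
suggest that nothing is deployed at that context path.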

On Thursday 10 February 2011 14:31:27 McGibbney, Lewis John wrote:
> Hi Markus
> 
> OK, first things first, here is hadoop.log:
> 
> 2011-02-09 23:24:11,826 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
> 2011-02-09 23:24:11,828 INFO  indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
> 2011-02-09 23:24:11,875 INFO  solr.SolrMappingReader - source: content dest: content
> 2011-02-09 23:24:11,876 INFO  solr.SolrMappingReader - source: site dest: site
> 2011-02-09 23:24:11,876 INFO  solr.SolrMappingReader - source: title dest: title
> 2011-02-09 23:24:11,876 INFO  solr.SolrMappingReader - source: host dest: host
> 2011-02-09 23:24:11,876 INFO  solr.SolrMappingReader - source: segment dest: segment
> 2011-02-09 23:24:11,876 INFO  solr.SolrMappingReader - source: boost dest: boost
> 2011-02-09 23:24:11,876 INFO  solr.SolrMappingReader - source: digest dest: digest
> 2011-02-09 23:24:11,876 INFO  solr.SolrMappingReader - source: tstamp dest: tstamp
> 2011-02-09 23:24:11,876 INFO  solr.SolrMappingReader - source: url dest: id
> 2011-02-09 23:24:11,876 INFO  solr.SolrMappingReader - source: url dest: url
> 2011-02-09 23:24:13,626 WARN  mapred.LocalJobRunner - job_local_0001
> org.apache.solr.common.SolrException: Not Found
> 
> Not Found
> 
> request: http://127.0.0.1:8080/solr/update?wt=javabin&version=1
>         at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:424)
>         at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:243)
>         at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
>         at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:49)
>         at org.apache.nutch.indexer.solr.SolrWriter.write(SolrWriter.java:64)
>         at org.apache.nutch.indexer.IndexerOutputFormat$1.write(IndexerOutputFormat.java:54)
>         at org.apache.nutch.indexer.IndexerOutputFormat$1.write(IndexerOutputFormat.java:44)
>         at org.apache.hadoop.mapred.ReduceTask$3.collect(ReduceTask.java:440)
>         at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:159)
>         at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:50)
>         at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:463)
>         at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
>         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
> 2011-02-09 23:24:14,128 ERROR solr.SolrIndexer - java.io.IOException: Job failed!
> 
> I am unsure of where to get the Solr output, as I have been unable to
> progress past the stage above. Until now I have been indexing directly from
> Nutch to a vanilla Solr 1.4.1 dist; this is my first attempt at indexing to
> my own app. Within my web app I have added the following dirs:
> 
> bin (empty)
> conf (usual nutch schema, solrconfig with Nutch requestHandler, scripts,
>       synonyms, etc)
> data (index and spellchecker dirs! Each containing segments.gen and
>       segments_1)
> dist (as per 1.4.1 solr version)
> lib (as above)
> 
> I hope that this is sufficient
> 
> Lewis
> ________________________________________
> From: Markus Jelsma [[email protected]]
> Sent: 10 February 2011 10:58
> To: [email protected]
> Cc: McGibbney, Lewis John
> Subject: Re: Index with Solr to my own webapp
> 
> Yes, please show us the hadoop.log output and the Solr output. The latter
> is usually more important at this stage. You might be writing to
> non-existent fields, or writing multiple values to a single-valued field,
> or... whatever's happening.
> 
> On Thursday 10 February 2011 00:36:21 McGibbney, Lewis John wrote:
> > Hi list,
> > 
> > I am at the Solr indexing stage and seem to have hit trouble when sending
> > crawldb, linkdb and segments/* to Solr to be indexed. I have added an XML
> > file to $CATALINA_HOME/conf/Catalina/localhost with my webapp specifics.
> > My Solr 1.4.1 implementation resides within my web app at the following
> > location: /home/lewis/Downloads/mywebapp. But when I send this command to
> > index with Solr
> > 
> > lewis@lewis-01:~/Downloads/nutch-1.2$ bin/nutch solrindex http://127.0.0.1:8080/mywebapp crawl/crawldb crawl/linkdb crawl/segments/*
> > 
> > I am getting java.io.IOException: Job failed!
> > 
> > I experienced this before when I was using the solrindex command
> > incorrectly. I am hoping that this is not the case; however, it is late
> > and I might have missed something simple.
> > 
> > I have Hadoop.log if this would help at all.
> > 
> > Any suggestions please. Thanks
> > 
> > Lewis
> > 
> > Glasgow Caledonian University is a registered Scottish charity, number
> > SC021474
> > 
> > Winner: Times Higher Education’s Widening Participation Initiative of the
> > Year 2009 and Herald Society’s Education Initiative of the Year 2009.
> > http://www.gcu.ac.uk/newsevents/news/bycategory/theuniversity/1/name,6219,en.html
> > 
> > Winner: Times Higher Education’s Outstanding Support for Early Career
> > Researchers of the Year 2010, GCU as a lead with Universities Scotland
> > partners.
> > http://www.gcu.ac.uk/newsevents/news/bycategory/theuniversity/1/name,15691,en.html
> 
> --
> Markus Jelsma - CTO - Openindex
> http://www.linkedin.com/in/markus17
> 050-8536620 / 06-50258350

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350
