Hello Jim and Tolga,

Thanks for this. I copied Nutch's schema.xml to Solr and it works.
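The unknown field 'site' error quoted below happens when Solr rejects a document that contains a field its schema does not declare. Copying Nutch's schema.xml, or adding a site field by hand as Jim suggests, fixes it. Before restarting Solr, you could run a rough check like the one sketched here. It lists which expected field names have no plain <field name="..."> declaration in a schema. SchemaFieldCheck, missingFields and the sample schema are hypothetical names used for illustration. The check ignores <dynamicField> patterns, so it can report a field as missing even when a wildcard would accept it.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/** Hypothetical helper: which expected fields does a Solr schema not declare? */
public class SchemaFieldCheck {

    static List<String> missingFields(String schemaXml, List<String> expected) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(schemaXml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse schema", e);
        }
        // Collect the name attribute of every <field> element.
        // <dynamicField> and <copyField> have different tag names and are skipped.
        Set<String> declared = new HashSet<>();
        NodeList fields = doc.getElementsByTagName("field");
        for (int i = 0; i < fields.getLength(); i++) {
            declared.add(((Element) fields.item(i)).getAttribute("name"));
        }
        // Keep the expected names that were never declared, in their original order.
        List<String> missing = new ArrayList<>();
        for (String name : expected) {
            if (!declared.contains(name)) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Illustrative schema fragment with no "site" field.
        String schema = "<schema name=\"example\"><fields>"
                + "<field name=\"url\" type=\"string\" indexed=\"true\" stored=\"true\"/>"
                + "<field name=\"title\" type=\"text\" indexed=\"true\" stored=\"true\"/>"
                + "</fields></schema>";
        System.out.println(missingFields(schema, Arrays.asList("url", "title", "site")));
    }
}
```

Running main prints [site]. If you pass in the contents of the real schema.xml instead, site should disappear from the output once it is declared.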
When running

bin/nutch crawl urls -solr http://127.0.0.1:8983/solr -threads 3 -depth 5 topN 1000

it only seems to index 8 docs: a *:* query in Solr's admin page returns only 8 results. I have tried stopping and restarting Solr and running Nutch again (with different depth and topN parameters), and the result is always the same. I have also tried adding more seeds to urls\seeds.txt, one URL per line, with the same result.

What Nutch commands can I use to crawl the site again and add the new pages to Solr's index? I tried

bin/nutch crawl urls -solr http://127.0.0.1:8983/solr -threads 3 -depth 5 topN 1000 solrindex

but this gives the error

Exception in thread "main" org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:c:/nutch14/runtime/local/solrindex

Thank you

On Fri, May 18, 2012 at 9:20 PM, Jim Chandler wrote:
> You need to add the site field to the schema.xml in your Solr.
>
> Jim
>
> On Fri, May 18, 2012 at 12:58 AM, cameron tran wrote:
>
> > Hello
> >
> > I am trying to get Nutch 1.4 (downloaded binary) to run solrindex against
> > http://127.0.0.1:8983/solr/ but am getting the following error. I am using
> > Solr 3.6.0. Please see the error in bold below.
> >
> > Is there some incompatibility issue?
> >
> > Ran
> >
> > bin/nutch crawl urls -solr http://127.0.0.1:8983/solr -threads 3 -depth 3 topN 300
> >
> > Thank you for your help
> >
> > org.apache.solr.common.SolrException: ERROR: [doc=http://www.website.com/] unknown field 'site'
> >
> > *ERROR: [doc=http://www.website.com/] unknown field 'site'*
> >
> > request: http://127.0.0.1:8983/solr/update?wt=javabin&version=2
> >     at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:430)
> >     at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
> >     at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
> >     at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:49)
> >     at org.apache.nutch.indexer.solr.SolrWriter.close(SolrWriter.java:93)
> >     at org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:48)
> >     at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:474)
> >     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
> >     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
> > *2012-05-18 14:21:46,921 ERROR solr.SolrIndexer - java.io.IOException: Job failed!*
> > 2012-05-18 14:21:46,921 INFO solr.SolrDeleteDuplicates - SolrDeleteDuplicates: starting at 2012-05-18 14:21:46
> > 2012-05-18 14:21:46,921 INFO solr.SolrDeleteDuplicates - SolrDeleteDuplicates: Solr url: http://127.0.0.1:8983/solr
> > 2012-05-18 14:21:48,640 INFO solr.SolrDeleteDuplicates - SolrDeleteDuplicates: finished at 2012-05-18 14:21:48, elapsed: 00:00:01
> > 2012-05-18 14:21:48,640 INFO crawl.Crawl - crawl finished: crawl-20120518141951
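The "Input path does not exist: .../solrindex" error in the message above fits the way the crawl command appears to read its arguments. This sketch assumes that Nutch 1.4's crawl command behaves roughly like the model below. Each option it recognises (-solr, -dir, -threads, -depth, -topN) consumes the value that follows it. Any other word is taken as the seed URL directory. A trailing solrindex would therefore replace urls as the input path. CrawlArgsSketch and its fields are hypothetical names written for illustration. This is not the Nutch source.

```java
/** Hypothetical, simplified model of a Nutch 1.x style crawl argument loop. */
public class CrawlArgsSketch {

    /** Parsed settings; field names are illustrative only. */
    static final class CrawlArgs {
        String rootUrlDir;           // seed directory, i.e. the input path that must exist
        String crawlDir;             // null here stands for "use a default directory"
        String solrUrl;
        int threads = 10;
        int depth = 5;
        long topN = Long.MAX_VALUE;  // no limit unless -topN is given
    }

    static CrawlArgs parse(String[] args) {
        CrawlArgs a = new CrawlArgs();
        for (int i = 0; i < args.length; i++) {
            switch (args[i]) {
                // Recognised options consume the next argument as their value.
                case "-dir":     a.crawlDir = args[++i]; break;
                case "-solr":    a.solrUrl = args[++i]; break;
                case "-threads": a.threads = Integer.parseInt(args[++i]); break;
                case "-depth":   a.depth = Integer.parseInt(args[++i]); break;
                case "-topN":    a.topN = Long.parseLong(args[++i]); break;
                // Anything else is taken as the seed directory, so a later
                // stray word silently replaces the earlier "urls".
                default:         a.rootUrlDir = args[i]; break;
            }
        }
        return a;
    }

    public static void main(String[] argv) {
        String[] cmd = {"urls", "-solr", "http://127.0.0.1:8983/solr",
                        "-threads", "3", "-depth", "5", "-topN", "1000", "solrindex"};
        System.out.println("seed directory = " + parse(cmd).rootUrlDir);
    }
}
```

Running main prints seed directory = solrindex. Under the same assumption, an option typed without its leading dash, such as topN, would also reach the default branch and be treated as the seed directory. This is why solrindex is normally run as its own bin/nutch solrindex command rather than appended to the crawl command line.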

