Hello Jim and Tolga

Thanks for this. I copied Nutch's schema.xml to Solr and it works.
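For anyone else who hits the `unknown field 'site'` error quoted further down, here is a minimal Python sketch (my own, not part of Nutch or Solr) that lists which field names a Solr schema.xml does not declare. The field list is an assumption: it only contains `site`, the field from this thread's error, so extend it with any other name Solr reports as unknown. A name also counts as declared if it matches a `<dynamicField>` pattern such as `*_s`.

```python
from __future__ import annotations

import sys
import xml.etree.ElementTree as ET
from fnmatch import fnmatchcase

# Field names to check for. 'site' is the one from the error in this thread;
# this list is an assumption, extend it with any other field Solr rejects.
REQUIRED_FIELDS = ["site"]


def missing_fields(schema_xml: str | bytes, required=REQUIRED_FIELDS) -> list[str]:
    """Return the names in `required` that the schema neither declares as a
    <field> nor matches with a <dynamicField> pattern such as '*_s'."""
    root = ET.fromstring(schema_xml)
    fields = {f.get("name") for f in root.iter("field")}
    patterns = [d.get("name") for d in root.iter("dynamicField") if d.get("name")]
    return [
        name for name in required
        if name not in fields and not any(fnmatchcase(name, p) for p in patterns)
    ]


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. python check_schema.py <solr home>/conf/schema.xml
    with open(sys.argv[1], "rb") as fh:
        missing = missing_fields(fh.read())
    print("missing:", ", ".join(missing) if missing else "none")
```

Run against the schema.xml Solr actually loads, it should print `missing: none` once that schema declares `site`.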

When running

bin/nutch crawl urls -solr http://127.0.0.1:8983/solr -threads 3 -depth 5 -topN 1000

it only seems to index 8 docs: a query string search for *:* in Solr's admin
returns only 8 docs in the results.
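The same count can be scripted instead of read off the admin page. A minimal sketch, assuming Solr's standard `/select` handler and the JSON response writer (`wt=json`), both available in Solr 3.6; `rows=0` asks only for the hit count, which Solr reports as `response.numFound`. The helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def select_url(solr_url: str, query: str = "*:*") -> str:
    """Build a /select URL that returns only the hit count (rows=0) as JSON."""
    params = urllib.parse.urlencode({"q": query, "rows": 0, "wt": "json"})
    return f"{solr_url.rstrip('/')}/select?{params}"


def parse_num_found(body: str) -> int:
    """Read response.numFound from a Solr JSON response body."""
    return int(json.loads(body)["response"]["numFound"])


def count_docs(solr_url: str = "http://127.0.0.1:8983/solr", query: str = "*:*") -> int:
    """Ask a running Solr how many documents match `query` (all of them by default)."""
    with urllib.request.urlopen(select_url(solr_url, query), timeout=10) as resp:
        return parse_num_found(resp.read().decode("utf-8"))
```

Calling `count_docs()` before and after a crawl shows whether that run added anything to the index.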

I have tried stopping and starting Solr and running Nutch again (with
different -depth and -topN values), and the result is always the same.
I have also tried adding more seeds to urls\seeds.txt, one URL per line,
with the same result.

What commands in Nutch can I use to crawl the site again and add the
results to Solr's index?
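One possible approach, sketched as a Python driver for the individual bin/nutch steps rather than the one-step `crawl` command. Judging by the `crawl finished: crawl-20120518141951` line in the log quoted below, `crawl` appears to write to a new timestamped directory unless given `-dir`, so each run starts again from the seeds. The sketch instead reuses an existing crawl directory: inject any new seeds, then for each depth level generate, fetch, parse and updatedb, and finally invertlinks and solrindex. The step names are Nutch 1.4 commands, but the argument order (especially for solrindex) is as I understand the 1.4 usage; running a command with no arguments prints its exact usage. `recrawl` and `run_nutch` are hypothetical helpers, and on Windows `bin/nutch` needs a Cygwin shell, so `NUTCH` may need adjusting.

```python
import os
import subprocess

NUTCH = os.path.join("bin", "nutch")  # assumes the script is run from runtime/local


def run_nutch(*args: str, check: bool = True) -> int:
    """Run one bin/nutch step; raise CalledProcessError on failure unless check=False."""
    return subprocess.run([NUTCH, *args], check=check).returncode


def _list(path: str) -> set:
    return set(os.listdir(path)) if os.path.isdir(path) else set()


def recrawl(crawl_dir: str, solr_url: str, seed_dir: str = "urls",
            depth: int = 5, top_n: int = 1000, threads: int = 3,
            run=run_nutch) -> list:
    """Continue an existing crawl in crawl_dir and send all segments to Solr.

    Returns the paths of the segments fetched during this run.
    """
    crawldb = os.path.join(crawl_dir, "crawldb")
    linkdb = os.path.join(crawl_dir, "linkdb")
    segments = os.path.join(crawl_dir, "segments")

    # Add any seeds appended to the seed directory since the last run.
    run("inject", crawldb, seed_dir)

    fetched = []
    for _ in range(depth):
        before = _list(segments)
        # check=False: when nothing is due for fetching, generate writes no new
        # segment (and may exit non-zero); the check below then ends the loop.
        # Note this also means a genuine generate failure just stops the loop.
        run("generate", crawldb, segments, "-topN", str(top_n), check=False)
        new = sorted(_list(segments) - before)
        if not new:
            break
        segment = os.path.join(segments, new[-1])  # segment names are timestamps
        run("fetch", segment, "-threads", str(threads))
        run("parse", segment)
        run("updatedb", crawldb, segment)
        fetched.append(segment)

    run("invertlinks", linkdb, "-dir", segments)
    # Nutch 1.4 usage, as I understand it:
    #   solrindex <solr url> <crawldb> <linkdb> (<segment> ... | -dir <segments>)
    run("solrindex", solr_url, crawldb, linkdb, "-dir", segments)
    return fetched
```

For example, `recrawl("crawl-20120518141951", "http://127.0.0.1:8983/solr", depth=5, top_n=1000)` would continue the crawl from the log below. Passing `run` as a parameter keeps the step order testable without a Nutch install.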

I tried
bin/nutch crawl urls -solr http://127.0.0.1:8983/solr -threads 3 -depth 5 -topN 1000 solrindex

But this gives the following error:

Exception in thread "main" org.apache.hadoop.mapred.InvalidInputException:
Input path does not exist: file:c:/nutch14/runtime/local/solrindex
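Judging by the path in this message, `crawl` took the extra word `solrindex` as the seed-URL directory in place of `urls`; `solrindex` is a separate command (`bin/nutch solrindex ...`), not an option to `crawl`. Below is a simplified Python model of that argument handling, my own approximation and not Nutch's source: the flags listed take a value, and any other word is treated as the seed directory, with the last one winning. The flag set is an assumption based on the crawl usage line.

```python
def parse_crawl_args(argv):
    """Simplified model of how the one-step `crawl` command reads arguments.

    Known flags consume the next token; any other token is taken as the seed
    directory, so a stray trailing word (here 'solrindex') replaces 'urls'.
    """
    flags = {"-dir", "-threads", "-depth", "-topN", "-solr"}  # assumed flag set
    opts = {}
    seed_dir = None
    i = 0
    while i < len(argv):
        tok = argv[i]
        if tok in flags and i + 1 < len(argv):
            opts[tok] = argv[i + 1]  # flag value
            i += 2
        else:
            seed_dir = tok  # last unrecognized token wins
            i += 1
    return seed_dir, opts
```

Under this model a missing dash matters too: `topN 1000` would make `1000` the seed directory.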

Thank you


On Fri, May 18, 2012 at 9:20 PM, Jim Chandler <[email protected]> wrote:

> You need to add the site field to the schema.xml in your Solr installation.
>
> Jim
>
> On Fri, May 18, 2012 at 12:58 AM, cameron tran <[email protected]> wrote:
>
> > Hello
> >
> > I am trying to get Nutch 1.4 (downloaded binary) to run solrindex against
> > http://127.0.0.1:8983/solr/ but I am getting the following error. I am
> > using Solr 3.6.0. Please see the error in bold below.
> >
> > Is there some incompatibility issue?
> >
> > I ran
> > bin/nutch crawl urls -solr http://127.0.0.1:8983/solr -threads 3 -depth 3 -topN 300
> >
> > Thank you for your help
> >
> > org.apache.solr.common.SolrException: ERROR: [doc=http://www.website.com/] unknown field 'site'
> >
> > *ERROR: [doc=http://www.website.com/] unknown field 'site'*
> >
> > request: http://127.0.0.1:8983/solr/update?wt=javabin&version=2
> >    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:430)
> >    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
> >    at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
> >    at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:49)
> >    at org.apache.nutch.indexer.solr.SolrWriter.close(SolrWriter.java:93)
> >    at org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:48)
> >    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:474)
> >    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
> >    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
> > *2012-05-18 14:21:46,921 ERROR solr.SolrIndexer - java.io.IOException: Job failed!*
> > 2012-05-18 14:21:46,921 INFO  solr.SolrDeleteDuplicates - SolrDeleteDuplicates: starting at 2012-05-18 14:21:46
> > 2012-05-18 14:21:46,921 INFO  solr.SolrDeleteDuplicates - SolrDeleteDuplicates: Solr url: http://127.0.0.1:8983/solr
> > 2012-05-18 14:21:48,640 INFO  solr.SolrDeleteDuplicates - SolrDeleteDuplicates: finished at 2012-05-18 14:21:48, elapsed: 00:00:01
> > 2012-05-18 14:21:48,640 INFO  crawl.Crawl - crawl finished: crawl-20120518141951
> >
>
