Try changing uniqueKey from id to url in schema.xml, as below, and then
restart Solr:
<uniqueKey>url</uniqueKey>
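
For context, the url-to-id mapping that produces the SolrMappingReader log
lines further down comes from Nutch's conf/solrindex-mapping.xml. A trimmed
sketch of what the stock Nutch 1.3 file looks like (field list reconstructed
from the log output in your mail, so verify it against your own copy):

```xml
<mapping>
  <fields>
    <field dest="content" source="content"/>
    <field dest="site" source="site"/>
    <field dest="title" source="title"/>
    <field dest="host" source="host"/>
    <field dest="segment" source="segment"/>
    <field dest="boost" source="boost"/>
    <field dest="digest" source="digest"/>
    <field dest="tstamp" source="tstamp"/>
    <!-- Solr's uniqueKey (id) is filled from the crawled url -->
    <field dest="id" source="url"/>
    <field dest="url" source="url"/>
  </fields>
  <uniqueKey>id</uniqueKey>
</mapping>
```

If you change schema.xml to use url as the uniqueKey, keep this mapping file
consistent with it, otherwise the indexer and Solr will disagree about which
field identifies a document.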

If that still does not work, it means some of your documents have an empty
url field. We can fix that.
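
Once the schema and mapping agree, re-run the indexing step. The Nutch 1.3
tutorial invocation looks roughly like the sketch below; the crawl/ paths are
placeholders for whatever directories your earlier crawl actually produced:

```
# re-index the crawl into the local Solr instance (paths are placeholders)
bin/nutch solrindex http://localhost:8983/solr/ crawl/crawldb crawl/linkdb crawl/segments/*
```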


On Mon, Aug 1, 2011 at 12:45 PM, John R. Brinkema <[email protected]
> wrote:

> Friends,
>
> I am having the worst time getting nutch and solr to play together nicely.
>
> I downloaded and installed the current binaries for both nutch and solr.  I
> edited the nutch-site.xml file to include:
>
> <property>
> <name>http.agent.name</name>
> <value>Solr/Nutch Search</value>
> </property>
> <property>
> <name>plugin.includes</name>
> <value>protocol-http|urlfilter-regex|parse-(text|html|tika)|
> index-basic|query-(basic|stemmer|site|url)|summary-basic|scoring-opic|
> urlnormalizer-(pass|regex|basic)</value>
> </property>
> <property>
> <name>http.content.limit</name>
> <value>65536</value>
> </property>
> <property>
> <name>searcher.dir</name>
> <value>/opt/SolrSearch</value>
> </property>
>
>
> I installed them and tested them according to each of their respective
> tutorials; in other words I believe each is working, separately.  I crawled
> a url and the 'readdb -stats' report shows that I have successfully
> collected some links.  Most of the links are to '.pdf' files.
>
> I followed the instructions to link nutch and solr; e.g. copy the nutch
> schema to become the solr schema.
>
> When I run the bin/nutch solrindex ... command I get the following error:
>
> java.io.IOException: Job failed!
>
> When I look in the log/hadoop.log file I see:
>
> 2011-08-01 13:10:00,086 INFO  solr.SolrMappingReader - source: content
> dest: content
> 2011-08-01 13:10:00,087 INFO  solr.SolrMappingReader - source: site dest:
> site
> 2011-08-01 13:10:00,087 INFO  solr.SolrMappingReader - source: title dest:
> title
> 2011-08-01 13:10:00,087 INFO  solr.SolrMappingReader - source: host dest:
> host
> 2011-08-01 13:10:00,087 INFO  solr.SolrMappingReader - source: segment
> dest: segment
> 2011-08-01 13:10:00,087 INFO  solr.SolrMappingReader - source: boost dest:
> boost
> 2011-08-01 13:10:00,087 INFO  solr.SolrMappingReader - source: digest dest:
> digest
> 2011-08-01 13:10:00,087 INFO  solr.SolrMappingReader - source: tstamp dest:
> tstamp
> 2011-08-01 13:10:00,087 INFO  solr.SolrMappingReader - source: url dest: id
> 2011-08-01 13:10:00,087 INFO  solr.SolrMappingReader - source: url dest:
> url
> 2011-08-01 13:10:00,537 WARN  mapred.LocalJobRunner - job_local_0001
> org.apache.solr.common.SolrException: Document [null] missing required
> field: id
>
> Document [null] missing required field: id
>
> request: http://localhost:8983/solr/update?wt=javabin&version=2
>        at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:435)
>        at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
>        at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
>        at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:49)
>        at org.apache.nutch.indexer.solr.SolrWriter.close(SolrWriter.java:82)
>        at org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:48)
>        at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:474)
>        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
>        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
> 2011-08-01 13:10:01,050 ERROR solr.SolrIndexer - java.io.IOException: Job
> failed!
>
> The same error appears in the solr log.
>
> I have tried the 'sync solrj libraries' fix; that is, I copied
> apache-solr-solrj-3.3.0.jar from the solr lib to the nutch lib with no
> effect.  Since I am running binaries, I, of course, did not run ant job.  Is
> that the magic?
>
> Any suggestions?
