Glad it worked for you on Solr 3.2. I did try Nutch 1.3 with Solr 3.3, but I haven't updated my blog for Solr 3.3 yet. ;-)
have fun!

On Mon, Aug 8, 2011 at 1:57 PM, John R. Brinkema <[email protected]> wrote:

> On 8/2/2011 11:21 PM, Way Cool wrote:
>> Try changing uniqueKey from id to url as below in schema.xml and restart Solr:
>>
>> <uniqueKey>url</uniqueKey>
>>
>> If that still doesn't work, it means you have an empty url. We can fix that.
>>
>> On Mon, Aug 1, 2011 at 12:45 PM, John R. Brinkema <[email protected]> wrote:
>>> Friends,
>>>
>>> I am having the worst time getting nutch and solr to play together nicely.
>>>
>>> I downloaded and installed the current binaries for both nutch and solr.
>>> I edited the nutch-site.xml file to include:
>>>
>>> <property>
>>>   <name>http.agent.name</name>
>>>   <value>Solr/Nutch Search</value>
>>> </property>
>>> <property>
>>>   <name>plugin.includes</name>
>>>   <value>protocol-http|urlfilter-regex|parse-(text|html|tika)|index-basic|query-(basic|stemmer|site|url)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
>>> </property>
>>> <property>
>>>   <name>http.content.limit</name>
>>>   <value>65536</value>
>>> </property>
>>> <property>
>>>   <name>searcher.dir</name>
>>>   <value>/opt/SolrSearch</value>
>>> </property>
>>>
>>> I installed and tested them according to their respective tutorials; in
>>> other words, I believe each is working separately. I crawled a url and
>>> the 'readdb -stats' report shows that I have successfully collected some
>>> links. Most of the links are to '.pdf' files.
>>>
>>> I followed the instructions to link nutch and solr; e.g. copied the nutch
>>> schema to become the solr schema.
>>>
>>> When I run the bin/nutch solrindex ... command I get the following error:
>>>
>>> java.io.IOException: Job failed!
>>>
>>> When I look in the log/hadoop.log file I see:
>>>
>>> 2011-08-01 13:10:00,086 INFO solr.SolrMappingReader - source: content dest: content
>>> 2011-08-01 13:10:00,087 INFO solr.SolrMappingReader - source: site dest: site
>>> 2011-08-01 13:10:00,087 INFO solr.SolrMappingReader - source: title dest: title
>>> 2011-08-01 13:10:00,087 INFO solr.SolrMappingReader - source: host dest: host
>>> 2011-08-01 13:10:00,087 INFO solr.SolrMappingReader - source: segment dest: segment
>>> 2011-08-01 13:10:00,087 INFO solr.SolrMappingReader - source: boost dest: boost
>>> 2011-08-01 13:10:00,087 INFO solr.SolrMappingReader - source: digest dest: digest
>>> 2011-08-01 13:10:00,087 INFO solr.SolrMappingReader - source: tstamp dest: tstamp
>>> 2011-08-01 13:10:00,087 INFO solr.SolrMappingReader - source: url dest: id
>>> 2011-08-01 13:10:00,087 INFO solr.SolrMappingReader - source: url dest: url
>>> 2011-08-01 13:10:00,537 WARN mapred.LocalJobRunner - job_local_0001
>>> org.apache.solr.common.SolrException: Document [null] missing required field: id
>>>
>>> Document [null] missing required field: id
>>>
>>> request: http://localhost:8983/solr/update?wt=javabin&version=2
>>>
>>>     at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:435)
>>>     at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
>>>     at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
>>>     at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:49)
>>>     at org.apache.nutch.indexer.solr.SolrWriter.close(SolrWriter.java:82)
>>>     at org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:48)
>>>     at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:474)
>>>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
>>>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
>>> 2011-08-01 13:10:01,050 ERROR solr.SolrIndexer - java.io.IOException: Job failed!
>>>
>>> The same error appears in the solr log.
>>>
>>> I have tried the 'sync solrj libraries' fix; that is, I copied
>>> apache-solr-solrj-3.3.0.jar from the solr lib to the nutch lib with no
>>> effect. Since I am running binaries, I, of course, did not run ant job.
>>> Is that the magic?
>>>
>>> Any suggestions?
>
> Update from the trenches ....
>
> I followed Way Cool's suggestion (now called Dr. Cool since he has been so
> helpful) of using Nutch 1.3 and Solr 3.2 ... which worked just fine.
>
> I am off using this pair until I get a breather and then try Nutch 1.3 and
> Solr 3.3 again, this time with Dr. Cool's latest suggestion.
>
> Thanks to all. /jb
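For anyone finding this thread later: the uniqueKey change can be made and sanity-checked from the command line before restarting Solr. The sketch below is illustrative only: the file path and the miniature schema are made-up assumptions, not an actual Nutch-generated schema.xml (which has many more fields), and on a real install you would edit the copy under Solr's conf directory and then restart Solr.

```shell
# Hypothetical miniature schema.xml, just to demonstrate the edit.
cat > /tmp/schema.xml <<'EOF'
<schema name="nutch" version="1.1">
  <uniqueKey>id</uniqueKey>
</schema>
EOF

# Swap the uniqueKey from "id" to "url", as suggested above.
# (GNU sed; on BSD/macOS use: sed -i '' 's|...|...|')
sed -i 's|<uniqueKey>id</uniqueKey>|<uniqueKey>url</uniqueKey>|' /tmp/schema.xml

# Confirm the change took effect before restarting Solr.
grep '<uniqueKey>' /tmp/schema.xml
```

The point of the change is that Nutch's solrindex mapping (visible in the SolrMappingReader log lines above) sends the crawled url into both the url and id fields; if the schema's required uniqueKey doesn't line up with what Nutch actually populates, the update request fails with the "missing required field" error shown in the trace.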

