3.3 will work perfectly, as there are no changes to the javabin format. However, one should update the schema version to reflect recent changes in branch 3.4-dev. This branch is likely to be released earlier than Nutch 1.4, which should be compatible with the most recent stable Solr release.
> Glad it worked for you on Solr 3.2. I did try Nutch 1.3 and Solr 3.3;
> however, I have not yet updated my blog for Solr 3.3. ;-)
>
> have fun!
>
> On Mon, Aug 8, 2011 at 1:57 PM, John R. Brinkema <[email protected]> wrote:
>
> > On 8/2/2011 11:21 PM, Way Cool wrote:
> >> Try changing uniqueKey from id to url as below in schema.xml and
> >> restart Solr:
> >>
> >>   <uniqueKey>url</uniqueKey>
> >>
> >> If that still did not work, that means you have an empty url. We
> >> can fix that.
> >>
> >> On Mon, Aug 1, 2011 at 12:45 PM, John R. Brinkema <[email protected]> wrote:
> >>
> >>> Friends,
> >>>
> >>> I am having the worst time getting Nutch and Solr to play together
> >>> nicely.
> >>>
> >>> I downloaded and installed the current binaries for both Nutch and
> >>> Solr. I edited the nutch-site.xml file to include:
> >>>
> >>>   <property>
> >>>     <name>http.agent.name</name>
> >>>     <value>Solr/Nutch Search</value>
> >>>   </property>
> >>>   <property>
> >>>     <name>plugin.includes</name>
> >>>     <value>protocol-http|urlfilter-regex|parse-(text|html|tika)|index-basic|query-(basic|stemmer|site|url)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
> >>>   </property>
> >>>   <property>
> >>>     <name>http.content.limit</name>
> >>>     <value>65536</value>
> >>>   </property>
> >>>   <property>
> >>>     <name>searcher.dir</name>
> >>>     <value>/opt/SolrSearch</value>
> >>>   </property>
> >>>
> >>> I installed them and tested them according to each of their respective
> >>> tutorials; in other words, I believe each is working separately. I
> >>> crawled a URL and the 'readdb -stats' report shows that I have
> >>> successfully collected some links. Most of the links are to '.pdf'
> >>> files.
> >>>
> >>> I followed the instructions to link Nutch and Solr; e.g. copy the
> >>> Nutch schema to become the Solr schema.
> >>> When I run the bin/nutch solrindex ... command I get the following
> >>> error:
> >>>
> >>>   java.io.IOException: Job failed!
> >>>
> >>> When I look in the log/hadoop.log file I see:
> >>>
> >>>   2011-08-01 13:10:00,086 INFO  solr.SolrMappingReader - source: content dest: content
> >>>   2011-08-01 13:10:00,087 INFO  solr.SolrMappingReader - source: site dest: site
> >>>   2011-08-01 13:10:00,087 INFO  solr.SolrMappingReader - source: title dest: title
> >>>   2011-08-01 13:10:00,087 INFO  solr.SolrMappingReader - source: host dest: host
> >>>   2011-08-01 13:10:00,087 INFO  solr.SolrMappingReader - source: segment dest: segment
> >>>   2011-08-01 13:10:00,087 INFO  solr.SolrMappingReader - source: boost dest: boost
> >>>   2011-08-01 13:10:00,087 INFO  solr.SolrMappingReader - source: digest dest: digest
> >>>   2011-08-01 13:10:00,087 INFO  solr.SolrMappingReader - source: tstamp dest: tstamp
> >>>   2011-08-01 13:10:00,087 INFO  solr.SolrMappingReader - source: url dest: id
> >>>   2011-08-01 13:10:00,087 INFO  solr.SolrMappingReader - source: url dest: url
> >>>   2011-08-01 13:10:00,537 WARN  mapred.LocalJobRunner - job_local_0001
> >>>   org.apache.solr.common.SolrException: Document [null] missing required field: id
> >>>
> >>>   Document [null] missing required field: id
> >>>
> >>>   request: http://localhost:8983/solr/update?wt=javabin&version=2
> >>>
> >>>     at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:435)
> >>>     at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
> >>>     at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
> >>>     at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:49)
> >>>     at org.apache.nutch.indexer.solr.SolrWriter.close(SolrWriter.java:82)
> >>>     at org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:48)
> >>>     at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:474)
> >>>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
> >>>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
> >>>   2011-08-01 13:10:01,050 ERROR solr.SolrIndexer - java.io.IOException: Job failed!
> >>>
> >>> The same error appears in the Solr log.
> >>>
> >>> I have tried the 'sync solrj libraries' fix; that is, I copied
> >>> apache-solr-solrj-3.3.0.jar from the Solr lib to the Nutch lib with no
> >>> effect. Since I am running binaries, I did not, of course, run ant
> >>> job. Is that the magic?
> >>>
> >>> Any suggestions?
> >
> > Update from the trenches ....
> >
> > I followed Way Cool's suggestion (now called Dr. Cool since he has been
> > so helpful) of using Nutch 1.3 and Solr 3.2 ... which worked just fine.
> >
> > I am off using this pair until I get a breather, and then I will try
> > Nutch 1.3 and Solr 3.3 again, this time with Dr. Cool's latest
> > suggestion.
> >
> > Thanks to all. /jb
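For readers hitting the same error: the log lines above show that Nutch's mapping sends the crawled url into both the id and url Solr fields, yet the documents arrive without an id, so Solr rejects them against its required uniqueKey. The fix Way Cool proposes amounts to a one-line edit in the schema.xml that was copied over from Nutch. A minimal sketch, assuming the stock Nutch 1.3 schema (the exact surrounding field definitions may differ in your copy):

```xml
<!-- schema.xml (the copy served by Solr, originally from Nutch's conf/).
     Change the uniqueKey from "id" to "url": the solrindex mapping fills
     "url" for every document, so keying on it sidesteps the
     "Document [null] missing required field: id" rejection. -->
<uniqueKey>url</uniqueKey>
```

Restart Solr after editing so the schema is reloaded, then rerun the bin/nutch solrindex command.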

