Solr's schema has its own version, which is 1.4 in the current 3.x branch.

See inline comments:
http://svn.apache.org/viewvc/lucene/dev/branches/branch_3x/solr/example/solr/conf/schema.xml?view=markup
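
If you want to check which version a given schema.xml declares, the value is the "version" attribute on the root <schema> element. Below is a minimal sketch in Python. The sample schema is a trimmed-down, hypothetical stand-in rather than the real Nutch or Solr file, and describe_schema is a helper name made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down schema used only for illustration;
# a real schema.xml has many more field and type definitions.
SAMPLE_SCHEMA = """<schema name="nutch" version="1.4">
  <fields>
    <field name="id" type="string" indexed="true" stored="true" required="true"/>
    <field name="url" type="string" indexed="true" stored="true"/>
  </fields>
  <uniqueKey>id</uniqueKey>
</schema>
"""


def describe_schema(xml_text):
    """Return (version attribute of <schema>, text of <uniqueKey>)."""
    root = ET.fromstring(xml_text)
    # Both values are None if the attribute or element is missing.
    return root.get("version"), root.findtext("uniqueKey")


if __name__ == "__main__":
    version, unique_key = describe_schema(SAMPLE_SCHEMA)
    print("schema version:", version)
    print("uniqueKey:", unique_key)
```

To inspect a file on disk, read its contents and pass the text to describe_schema; the path depends on where your Solr conf directory lives.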

> Markus,
> 
> What do you mean by "update the schema version"?  Nutch's or Solr's?
> And are we talking about simple copies or line-by-line merges?  And what
> about the schema copy specified in the RunningNutchAndSolr tutorial?
> 
> This sounds like the answer, I just don't know enough to do it.  tnx.
> 
> On 8/8/2011 8:04 PM, Markus Jelsma wrote:
> > 3.3 will work perfectly as there are no changes to the javabin format.
> > However, one should update the schema version to reflect recent changes
> > in branch 3.4-dev. It's likely this branch version will be released earlier
> > than Nutch 1.4, which should be compatible with the most recent stable
> > Solr release.
> > 
> >> Glad it worked for you on Solr 3.2. I did try Nutch 1.3 and Solr 3.3,
> >> however I did not update my blog yet with Solr 3.3. ;-)
> >> 
> >> have fun!
> >> 
> >> On Mon, Aug 8, 2011 at 1:57 PM, John R. Brinkema <[email protected]> wrote:
> >>> On 8/2/2011 11:21 PM, Way Cool wrote:
> >>>> Try changing uniqueKey from id to url as below in schema.xml and
> >>>> restart Solr:
> >>>> <uniqueKey>url</uniqueKey>
> >>>> 
> >>>> If that still does not work, it means you have an empty url. We
> >>>> can fix that.
> >>>> 
> >>>> 
> >>>> On Mon, Aug 1, 2011 at 12:45 PM, John R. Brinkema <[email protected]> wrote:
> >>>>> Friends,
> >>>>> 
> >>>>> I am having the worst time getting nutch and solr to play together
> >>>>> nicely.
> >>>>> 
> >>>>> I downloaded and installed the current binaries for both nutch and
> >>>>> solr.  I edited the nutch-site.xml file to include:
> >>>>> 
> >>>>> <property>
> >>>>> <name>http.agent.name</name>
> >>>>> <value>Solr/Nutch Search</value>
> >>>>> </property>
> >>>>> <property>
> >>>>> <name>plugin.includes</name>
> >>>>> <value>protocol-http|urlfilter-regex|parse-(text|html|tika)|index-basic|query-(basic|stemmer|site|url)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
> >>>>> </property>
> >>>>> <property>
> >>>>> <name>http.content.limit</name>
> >>>>> <value>65536</value>
> >>>>> </property>
> >>>>> <property>
> >>>>> <name>searcher.dir</name>
> >>>>> <value>/opt/SolrSearch</value>
> >>>>> </property>
> >>>>> 
> >>>>> 
> >>>>> I installed them and tested them according to each of their
> >>>>> respective tutorials; in other words I believe each is working,
> >>>>> separately.  I crawled a url and the 'readdb -stats' report shows
> >>>>> that I have successfully collected some links.  Most of the links
> >>>>> are to '.pdf' files.
> >>>>> 
> >>>>> I followed the instructions to link nutch and solr; e.g. copy the
> >>>>> nutch schema to become the solr schema.
> >>>>> 
> >>>>> When I run the bin/nutch solrindex ... command I get the following
> >>>>> error:
> >>>>> 
> >>>>> java.io.IOException: Job failed!
> >>>>> 
> >>>>> When I look in the log/hadoop.log file I see:
> >>>>> 
> >>>>> 2011-08-01 13:10:00,086 INFO  solr.SolrMappingReader - source: content dest: content
> >>>>> 2011-08-01 13:10:00,087 INFO  solr.SolrMappingReader - source: site dest: site
> >>>>> 2011-08-01 13:10:00,087 INFO  solr.SolrMappingReader - source: title dest: title
> >>>>> 2011-08-01 13:10:00,087 INFO  solr.SolrMappingReader - source: host dest: host
> >>>>> 2011-08-01 13:10:00,087 INFO  solr.SolrMappingReader - source: segment dest: segment
> >>>>> 2011-08-01 13:10:00,087 INFO  solr.SolrMappingReader - source: boost dest: boost
> >>>>> 2011-08-01 13:10:00,087 INFO  solr.SolrMappingReader - source: digest dest: digest
> >>>>> 2011-08-01 13:10:00,087 INFO  solr.SolrMappingReader - source: tstamp dest: tstamp
> >>>>> 2011-08-01 13:10:00,087 INFO  solr.SolrMappingReader - source: url dest: id
> >>>>> 2011-08-01 13:10:00,087 INFO  solr.SolrMappingReader - source: url dest: url
> >>>>> 2011-08-01 13:10:00,537 WARN  mapred.LocalJobRunner - job_local_0001
> >>>>> org.apache.solr.common.SolrException: Document [null] missing required field: id
> >>>>> 
> >>>>> Document [null] missing required field: id
> >>>>> 
> >>>>> request: http://localhost:8983/solr/update?wt=javabin&version=2
> >>>>> 
> >>>>>         at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:435)
> >>>>>         at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
> >>>>>         at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
> >>>>>         at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:49)
> >>>>>         at org.apache.nutch.indexer.solr.SolrWriter.close(SolrWriter.java:82)
> >>>>>         at org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:48)
> >>>>>         at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:474)
> >>>>>         at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
> >>>>>         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
> >>>>> 2011-08-01 13:10:01,050 ERROR solr.SolrIndexer - java.io.IOException: Job failed!
> >>>>> 
> >>>>> The same error appears in the solr log.
> >>>>> 
> >>>>> I have tried the 'sync solrj libraries' fix; that is, I copied
> >>>>> apache-solr-solrj-3.3.0.jar from the solr lib to the nutch lib with
> >>>>> no effect.  Since I am running binaries, I, of course, did not run
> >>>>> ant job.  Is that the magic?
> >>>>> 
> >>>>> Any suggestions?
> >>>>> 
> >>> Update from the trenches ....
> >>> 
> >>> I followed Way Cool's suggestion (now called Dr. Cool since he has
> >>> been so helpful) of using Nutch 1.3 and Solr 3.2 ... which worked just
> >>> fine.
> >>> 
> >>> I am off using this pair until I get a breather and then try Nutch 1.3
> >>> and Solr 3.3 again, this time with Dr. Cool's latest suggestion.
> >>> 
> >>> Thanks to all.  /jb
