Have you tried updating the Solr schema.xml so that it allows multiple values for that field (id, according to the error quoted below)?
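
A rough sketch of what that change might look like, in case it helps. Only the multiValued attribute is the point here; the type name and the other attribute values are assumptions on my part and should be taken from your own schema.xml:

```xml
<!-- Fragment of schema.xml. Everything except multiValued="true" is an assumed placeholder. -->
<fields>
  <!-- Allow the field named in the error to hold more than one value. -->
  <field name="id" type="string" stored="true" indexed="true" multiValued="true"/>
</fields>
```

One caveat: if id is also the uniqueKey in your schema, making it multiValued may not be what you want. The Nutch log below already shows url being mapped to id, so if the old 1.0 schema.xml also contains a copyField with dest="id" (an assumption, I have not checked), removing that copyField might be the cleaner route.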
|-----Original Message-----
|From: Markus Jelsma [mailto:[email protected]]
|Sent: Tuesday, May 25, 2010 4:49 AM
|To: [email protected]
|Subject: Re: Solr integration in nutch-1.1dev
|
|Hello Julien,
|
|I picked today's build from your URL but the problem persists as reported
|earlier. Any more ideas on how to tackle this?
|
|Cheers,
|
|On Monday 17 May 2010 15:50:55 Julien Nioche wrote:
|> Hi Markus,
|>
|> This has been solved last week and is in the trunk of the SVN repository.
|> The nightly build has just been fixed after the move to the TLP so the
|> version you are using does not have the fix yet. Check
|> http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ to get the latest
|> build or check it out from SVN
|>
|> J.
|>
|> > Hi,
|> >
|> > I've got a copy of the nutch-2010-05-11_04-34-41 nightly build because i
|> > need Tika to parse JPEG images and that would be in 1.1 as i read
|> > somewhere [1].
|> >
|> > First i fetch only a single HTML page and send it to Solr as i did with
|> > 1.0 but it fails now. Here's what Solr thinks of the request:
|> >
|> > ---------------
|> > May 17, 2010 2:25:32 PM org.apache.solr.common.SolrException log
|> > SEVERE: org.apache.solr.common.SolrException: ERROR: multiple values encountered for non multiValued copy field id: <URL HERE>
|> >     at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:260)
|> >     at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:60)
|> >     at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:94)
|> >     at org.apache.solr.update.processor.SignatureUpdateProcessorFactory$SignatureUpdateProcessor.processAdd(SignatureUpdateProcessorFactory.java:162)
|> >     at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:139)
|> >     at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
|> >     at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
|> >     at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
|> >     at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
|> >     at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
|> >     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
|> >     at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
|> >     at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
|> >     at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
|> >     at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
|> >     at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
|> >     at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
|> >     at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
|> >     at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
|> >     at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845)
|> >     at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
|> >     at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
|> >     at java.lang.Thread.run(Thread.java:619)
|> > ---------------
|> >
|> > Well, this obviously is wrong. Although i am still using the old 1.0
|> > schema.xml, it still isn't multiValued in the nightly build's schema.xml
|> > file.
|> >
|> > Below Nutch's relevant log lines:
|> >
|> > ---------------
|> > 2010-05-17 14:25:31,776 INFO solr.SolrMappingReader - source: content dest: content
|> > 2010-05-17 14:25:31,776 INFO solr.SolrMappingReader - source: site dest: site
|> > 2010-05-17 14:25:31,776 INFO solr.SolrMappingReader - source: title dest: title
|> > 2010-05-17 14:25:31,776 INFO solr.SolrMappingReader - source: host dest: host
|> > 2010-05-17 14:25:31,776 INFO solr.SolrMappingReader - source: segment dest: segment
|> > 2010-05-17 14:25:31,777 INFO solr.SolrMappingReader - source: boost dest: boost
|> > 2010-05-17 14:25:31,777 INFO solr.SolrMappingReader - source: digest dest: digest
|> > 2010-05-17 14:25:31,777 INFO solr.SolrMappingReader - source: tstamp dest: tstamp
|> > 2010-05-17 14:25:31,777 INFO solr.SolrMappingReader - source: url dest: id
|> > 2010-05-17 14:25:31,777 INFO solr.SolrMappingReader - source: url dest: url
|> > 2010-05-17 14:25:31,821 INFO collection.CollectionManager - Instantiating CollectionManager
|> > 2010-05-17 14:25:31,822 INFO collection.CollectionManager - initializing CollectionManager
|> > 2010-05-17 14:25:31,849 INFO collection.CollectionManager - file has1 elements
|> > 2010-05-17 14:25:32,474 WARN mapred.LocalJobRunner - job_local_0001
|> > org.apache.solr.common.SolrException: Bad Request
|> >
|> > Bad Request
|> >
|> > request: http://127.0.0.1:8080/solr/update?wt=javabin&version=1
|> >     at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:424)
|> >     at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:243)
|> >     at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
|> >     at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:49)
|> >     at org.apache.nutch.indexer.solr.SolrWriter.close(SolrWriter.java:74)
|> >     at org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:48)
|> >     at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:474)
|> >     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
|> >     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
|> > ---------------
|> >
|> > Because i still use my old 1.0 configuration files i get the following
|> > warning from Nutch but doesn't look like it's related to the Sorl
|> > integration:
|> >
|> > ---------------
|> > 2010-05-17 14:34:11,529 WARN conf.Configuration - DEPRECATED: hadoop-site.xml found in the classpath. Usage of hadoop-site.xml is deprecated. Instead use core-site.xml, mapred-site.xml and hdfs-site.xml to override properties of core-default.xml, mapred-default.xml and hdfs-default.xml respectively
|> > ---------------
|> >
|> > Did i just stumble upon a regression in 1.1dev and should i file a bug or
|> > could something else spoil the fun?
|> >
|> > [1]: http://lucene.472066.n3.nabble.com/Adding-jpeg-parser-to-nutch-td710135.html
|> >
|> > Cheers,
|> >
|> > Markus Jelsma - Technisch Architect - Buyways BV
|> > http://www.linkedin.com/in/markus17
|> > 050-8536620 / 06-50258350
|
|Markus Jelsma - Technisch Architect - Buyways BV
|http://www.linkedin.com/in/markus17
|050-8536620 / 06-50258350

