Confirmed! It was the old schema.xml file. Next time I'd better check for differences :)
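
A minimal sketch of that difference check, assuming Python is at hand. It reads the `source: ... dest: ...` lines that `solr.SolrMappingReader` writes to the Nutch log, parses a schema.xml, and reports single-valued fields that are filled both by a Nutch mapping and by a Solr copyField. The function name `conflicting_fields` and the constants are hypothetical, and the two schema snippets are trimmed-down stand-ins rather than the real Nutch schema files. Only an explicit `multiValued="true"` on a `<field>` is recognised.

```python
import re
import xml.etree.ElementTree as ET

# Matches Nutch log lines such as:
#   "... INFO solr.SolrMappingReader - source: url dest: id"
MAPPING_LINE = re.compile(r"SolrMappingReader - source:\s*(\S+)\s+dest:\s*(\S+)")


def conflicting_fields(schema_xml, nutch_log):
    """Map each single-valued field fed by both Nutch and a copyField
    to the list of copyField sources that write into it."""
    root = ET.fromstring(schema_xml)
    # Fields explicitly declared as able to hold more than one value.
    multi_valued = {
        f.get("name") for f in root.iter("field") if f.get("multiValued") == "true"
    }
    # Destination fields Nutch already fills, according to its own log.
    mapped = {dest for _source, dest in MAPPING_LINE.findall(nutch_log)}
    conflicts = {}
    # Commented-out <copyField> elements are dropped by the XML parser.
    for cf in root.iter("copyField"):
        dest = cf.get("dest")
        if dest in mapped and dest not in multi_valued:
            conflicts.setdefault(dest, []).append(cf.get("source"))
    return conflicts


# Hypothetical, heavily trimmed stand-in for the 1.0 schema.xml.
OLD_SCHEMA = """<schema name="nutch" version="1.1">
  <fields>
    <field name="id" type="string" stored="true" indexed="true"/>
    <field name="url" type="string" stored="true" indexed="true"/>
  </fields>
  <uniqueKey>id</uniqueKey>
  <copyField source="url" dest="id"/>
</schema>"""

# Same schema with the directive commented out, as suggested in the thread.
FIXED_SCHEMA = OLD_SCHEMA.replace(
    '<copyField source="url" dest="id"/>',
    '<!-- copyField source="url" dest="id"/ -->',
)

# Two of the mapping lines from the Nutch log quoted in the thread.
NUTCH_LOG = """\
2010-05-17 14:25:31,777 INFO solr.SolrMappingReader - source: url dest: id
2010-05-17 14:25:31,777 INFO solr.SolrMappingReader - source: url dest: url
"""

if __name__ == "__main__":
    print(conflicting_fields(OLD_SCHEMA, NUTCH_LOG))    # {'id': ['url']}
    print(conflicting_fields(FIXED_SCHEMA, NUTCH_LOG))  # {}
```

With the old schema the `id` field shows up because it receives the URL twice, once from the Nutch mapping and once from the copyField; with the directive commented out nothing is reported.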

On Tuesday 25 May 2010 21:38:45 Markus Jelsma wrote:
> Hi Brian,
>
> Again, thanks for the help. I have looked up the schema file from the trunk
> and 1.0 tag using web svn. It seems you are right, although I cannot
> confirm as of yet, I will return to work tomorrow. Anyway, the
> solrindex-mapping configuration in 1.1-dev does not show any weird stuff
> related to the ID field. It does, however, have a copyField from url to
> url which makes no sense to me. The suspect is the copyField directive in
> the schema.xml from the 1.0 tag, it contains a copyField directive from
> URL to ID which disappeared in trunk some time ago. 1.1-dev, according to
> svn, introduced the solrindex-mapping configuration which already maps the
> URL to the ID field and because I have the old schema.xml file in my Solr
> instance, it would, of course, copyField to an already occupied ID field.
>
> I'd bet that's the issue here and if so, perhaps it would be best to
> investigate all new relevant configuration files the next time instead of
> assuming the schema.xml file wouldn't change. Back on this tomorrow and
> thanks for the useful pointer!
>
> Cheers,
>
> -----Original message-----
> From: Brian Tingle <brian.tin...@ucop.edu>
> Sent: Tue 25-05-2010 21:11
> To: user@nutch.apache.org;
> Subject: RE: Solr integration in nutch-1.1dev
>
> I think I had the same problem, I just checked my schema.xml ... it looks
> like I just commented out the copyField source="url" dest="id"
>
> <!-- copyField source="url" dest="id"/ -->
>
> |-----Original Message-----
> |From: Markus Jelsma [mailto:markus.jel...@buyways.nl]
> |Sent: Tuesday, May 25, 2010 12:04 PM
> |To: user@nutch.apache.org
> |Subject: RE: Solr integration in nutch-1.1dev
> |
> |Hi Brian,
> |
> |Thanks for your reply. But as can be seen in the stacktrace, it's the ID
> |field of a document. It cannot be set to accommodate multiple values and
> |it wouldn't make sense either. The ID field should contain the URL of the
> |fetched and parsed content. Also, you can clearly see the mapping in the
> |included Nutch logs; it maps the URL field to Solr's ID field as well as
> |mapping the URL to the URL field which doesn't make sense but it's still
> |the example schema and mapping configuration. Also, I can't imagine
> |multiple values for a URL field in Nutch itself making any sense at all;
> |how would a piece of content on a distinct URL have more than one URL?
> |
> |Do you or anybody else have an idea to solve this mystery? I'm also not
> |getting much from Nutch's logs, they don't mention anything else except
> |that sending the data over to a Solr instance failed.
> |
> |Cheers,
> |
> |-----Original message-----
> |From: Brian Tingle <brian.tin...@ucop.edu>
> |Sent: Tue 25-05-2010 20:47
> |To: user@nutch.apache.org; Markus Jelsma <markus.jel...@buyways.nl>;
> |Subject: RE: Solr integration in nutch-1.1dev
> |
> |Update the solr schema.xml so that it allows multiple values for that
> |field?
> |
> ||-----Original Message-----
> ||From: Markus Jelsma [mailto:markus.jel...@buyways.nl]
> ||Sent: Tuesday, May 25, 2010 4:49 AM
> ||To: user@nutch.apache.org
> ||Subject: Re: Solr integration in nutch-1.1dev
> ||
> ||Hello Julien,
> ||
> ||I picked today's build from your URL but the problem persists as reported
> ||earlier. Any more ideas on how to tackle this?
> ||
> ||Cheers,
> ||
> ||On Monday 17 May 2010 15:50:55 Julien Nioche wrote:
> ||> Hi Markus,
> ||>
> ||> This has been solved last week and is in the trunk of the SVN
> ||> repository. The nightly build has just been fixed after the move to the
> ||> TLP so the version you are using does not have the fix yet. Check
> ||> http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ to get the
> ||> latest build or check it out from SVN
> ||>
> ||> J.
> ||>
> ||> > Hi,
> ||> >
> ||> > I've got a copy of the nutch-2010-05-11_04-34-41 nightly build
> ||> > because I need Tika to parse JPEG images and that would be in 1.1
> ||> > as I read somewhere [1].
> ||> >
> ||> > First I fetch only a single HTML page and send it to Solr as I did
> ||> > with 1.0 but it fails now. Here's what Solr thinks of the request:
> ||> >
> ||> > ---------------
> ||> > May 17, 2010 2:25:32 PM org.apache.solr.common.SolrException log
> ||> > SEVERE: org.apache.solr.common.SolrException: ERROR: multiple values encountered for non multiValued copy field id: <URL HERE>
> ||> >   at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:260)
> ||> >   at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:60)
> ||> >   at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:94)
> ||> >   at org.apache.solr.update.processor.SignatureUpdateProcessorFactory$SignatureUpdateProcessor.processAdd(SignatureUpdateProcessorFactory.java:162)
> ||> >   at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:139)
> ||> >   at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
> ||> >   at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
> ||> >   at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
> ||> >   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
> ||> >   at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
> ||> >   at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
> ||> >   at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
> ||> >   at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
> ||> >   at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
> ||> >   at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
> ||> >   at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
> ||> >   at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
> ||> >   at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
> ||> >   at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
> ||> >   at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845)
> ||> >   at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
> ||> >   at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
> ||> >   at java.lang.Thread.run(Thread.java:619)
> ||> > ---------------
> ||> >
> ||> > Well, this obviously is wrong. Although I am still using the old 1.0
> ||> > schema.xml, it still isn't multiValued in the nightly build's
> ||> > schema.xml file.
> ||> >
> ||> > Below Nutch's relevant log lines:
> ||> >
> ||> > ---------------
> ||> > 2010-05-17 14:25:31,776 INFO solr.SolrMappingReader - source: content dest: content
> ||> > 2010-05-17 14:25:31,776 INFO solr.SolrMappingReader - source: site dest: site
> ||> > 2010-05-17 14:25:31,776 INFO solr.SolrMappingReader - source: title dest: title
> ||> > 2010-05-17 14:25:31,776 INFO solr.SolrMappingReader - source: host dest: host
> ||> > 2010-05-17 14:25:31,776 INFO solr.SolrMappingReader - source: segment dest: segment
> ||> > 2010-05-17 14:25:31,777 INFO solr.SolrMappingReader - source: boost dest: boost
> ||> > 2010-05-17 14:25:31,777 INFO solr.SolrMappingReader - source: digest dest: digest
> ||> > 2010-05-17 14:25:31,777 INFO solr.SolrMappingReader - source: tstamp dest: tstamp
> ||> > 2010-05-17 14:25:31,777 INFO solr.SolrMappingReader - source: url dest: id
> ||> > 2010-05-17 14:25:31,777 INFO solr.SolrMappingReader - source: url dest: url
> ||> > 2010-05-17 14:25:31,821 INFO collection.CollectionManager - Instantiating CollectionManager
> ||> > 2010-05-17 14:25:31,822 INFO collection.CollectionManager - initializing CollectionManager
> ||> > 2010-05-17 14:25:31,849 INFO collection.CollectionManager - file has1 elements
> ||> > 2010-05-17 14:25:32,474 WARN mapred.LocalJobRunner - job_local_0001
> ||> > org.apache.solr.common.SolrException: Bad Request
> ||> >
> ||> > Bad Request
> ||> >
> ||> > request: http://127.0.0.1:8080/solr/update?wt=javabin&version=1
> ||> >   at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:424)
> ||> >   at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:243)
> ||> >   at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
> ||> >   at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:49)
> ||> >   at org.apache.nutch.indexer.solr.SolrWriter.close(SolrWriter.java:74)
> ||> >   at org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:48)
> ||> >   at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:474)
> ||> >   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
> ||> >   at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
> ||> > ---------------
> ||> >
> ||> > Because I still use my old 1.0 configuration files I get the
> ||> > following warning from Nutch, but it doesn't look like it's related
> ||> > to the Solr integration:
> ||> >
> ||> > ---------------
> ||> > 2010-05-17 14:34:11,529 WARN conf.Configuration - DEPRECATED: hadoop-site.xml found in the classpath. Usage of hadoop-site.xml is deprecated. Instead use core-site.xml, mapred-site.xml and hdfs-site.xml to override properties of core-default.xml, mapred-default.xml and hdfs-default.xml respectively
> ||> > ---------------
> ||> >
> ||> > Did I just stumble upon a regression in 1.1dev and should I file a
> ||> > bug or could something else spoil the fun?
> ||> >
> ||> > [1]: http://lucene.472066.n3.nabble.com/Adding-jpeg-parser-to-nutch-td710135.html
> ||> >
> ||> > Cheers,
> ||> >
> ||> > Markus Jelsma - Technisch Architect - Buyways BV
> ||> > http://www.linkedin.com/in/markus17
> ||> > 050-8536620 / 06-50258350
> ||
> ||Markus Jelsma - Technisch Architect - Buyways BV
> ||http://www.linkedin.com/in/markus17
> ||050-8536620 / 06-50258350

Markus Jelsma - Technisch Architect - Buyways BV
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350