Hi Markus,

This was fixed last week and the fix is in the trunk of the SVN repository. The nightly build has only just been repaired after the move to the TLP, so the version you are using does not have the fix yet. Check http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ to get the latest build, or check the trunk out from SVN.

J.

--
DigitalPebble Ltd
http://www.digitalpebble.com

On 17 May 2010 14:26, Markus Jelsma <[email protected]> wrote:

> Hi,
>
> I've got a copy of the nutch-2010-05-11_04-34-41 nightly build because I need
> Tika to parse JPEG images, and that would be in 1.1 as I read somewhere [1].
>
> First I fetch only a single HTML page and send it to Solr as I did with 1.0,
> but it fails now. Here's what Solr thinks of the request:
>
> ---------------
> May 17, 2010 2:25:32 PM org.apache.solr.common.SolrException log
> SEVERE: org.apache.solr.common.SolrException: ERROR: multiple values encountered for non multiValued copy field id: <URL HERE>
>     at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:260)
>     at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:60)
>     at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:94)
>     at org.apache.solr.update.processor.SignatureUpdateProcessorFactory$SignatureUpdateProcessor.processAdd(SignatureUpdateProcessorFactory.java:162)
>     at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:139)
>     at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
>     at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
>     at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
>     at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
>     at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
>     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
>     at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
>     at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>     at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
>     at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
>     at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
>     at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
>     at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
>     at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
>     at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845)
>     at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
>     at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
>     at java.lang.Thread.run(Thread.java:619)
> ---------------
>
> Well, this obviously is wrong. Although I am still using the old 1.0
> schema.xml, it still isn't multiValued in the nightly build's schema.xml
> file.
>
> Below are Nutch's relevant log lines:
>
> ---------------
> 2010-05-17 14:25:31,776 INFO solr.SolrMappingReader - source: content dest: content
> 2010-05-17 14:25:31,776 INFO solr.SolrMappingReader - source: site dest: site
> 2010-05-17 14:25:31,776 INFO solr.SolrMappingReader - source: title dest: title
> 2010-05-17 14:25:31,776 INFO solr.SolrMappingReader - source: host dest: host
> 2010-05-17 14:25:31,776 INFO solr.SolrMappingReader - source: segment dest: segment
> 2010-05-17 14:25:31,777 INFO solr.SolrMappingReader - source: boost dest: boost
> 2010-05-17 14:25:31,777 INFO solr.SolrMappingReader - source: digest dest: digest
> 2010-05-17 14:25:31,777 INFO solr.SolrMappingReader - source: tstamp dest: tstamp
> 2010-05-17 14:25:31,777 INFO solr.SolrMappingReader - source: url dest: id
> 2010-05-17 14:25:31,777 INFO solr.SolrMappingReader - source: url dest: url
> 2010-05-17 14:25:31,821 INFO collection.CollectionManager - Instantiating CollectionManager
> 2010-05-17 14:25:31,822 INFO collection.CollectionManager - initializing CollectionManager
> 2010-05-17 14:25:31,849 INFO collection.CollectionManager - file has1 elements
> 2010-05-17 14:25:32,474 WARN mapred.LocalJobRunner - job_local_0001
> org.apache.solr.common.SolrException: Bad Request
>
> Bad Request
>
> request: http://127.0.0.1:8080/solr/update?wt=javabin&version=1
>     at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:424)
>     at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:243)
>     at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
>     at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:49)
>     at org.apache.nutch.indexer.solr.SolrWriter.close(SolrWriter.java:74)
>     at org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:48)
>     at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:474)
>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
> ---------------
>
> Because I still use my old 1.0 configuration files, I get the following
> warning from Nutch, but it doesn't look like it's related to the Solr
> integration:
>
> ---------------
> 2010-05-17 14:34:11,529 WARN conf.Configuration - DEPRECATED: hadoop-site.xml
> found in the classpath. Usage of hadoop-site.xml is deprecated. Instead use
> core-site.xml, mapred-site.xml and hdfs-site.xml to override properties of
> core-default.xml, mapred-default.xml and hdfs-default.xml respectively
> ---------------
>
> Did I just stumble upon a regression in 1.1dev and should I file a bug, or
> could something else spoil the fun?
>
> [1]: http://lucene.472066.n3.nabble.com/Adding-jpeg-parser-to-nutch-td710135.html
>
> Cheers,
>
> Markus Jelsma - Technisch Architect - Buyways BV
> http://www.linkedin.com/in/markus17
> 050-8536620 / 06-50258350

