Hi,

I've got a copy of the nutch-2010-05-11_04-34-41 nightly build because I need 
Tika to parse JPEG images, and as I read somewhere [1], that will be in 1.1.

First, I fetch only a single HTML page and send it to Solr, as I did with 1.0, 
but now it fails. Here's what Solr thinks of the request:


---------------
May 17, 2010 2:25:32 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: ERROR: multiple values encountered for non multiValued copy field id: <URL HERE>
        at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:260)
        at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:60)
        at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:94)
        at org.apache.solr.update.processor.SignatureUpdateProcessorFactory$SignatureUpdateProcessor.processAdd(SignatureUpdateProcessorFactory.java:162)
        at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:139)
        at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
        at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
        at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845)
        at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
        at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
        at java.lang.Thread.run(Thread.java:619)
---------------


Well, this is obviously wrong. Although I am still using the old 1.0 
schema.xml, the id field isn't multiValued in the nightly build's schema.xml either. 

Below are the relevant lines from Nutch's log:


---------------
2010-05-17 14:25:31,776 INFO  solr.SolrMappingReader - source: content dest: content
2010-05-17 14:25:31,776 INFO  solr.SolrMappingReader - source: site dest: site
2010-05-17 14:25:31,776 INFO  solr.SolrMappingReader - source: title dest: title
2010-05-17 14:25:31,776 INFO  solr.SolrMappingReader - source: host dest: host
2010-05-17 14:25:31,776 INFO  solr.SolrMappingReader - source: segment dest: segment
2010-05-17 14:25:31,777 INFO  solr.SolrMappingReader - source: boost dest: boost
2010-05-17 14:25:31,777 INFO  solr.SolrMappingReader - source: digest dest: digest
2010-05-17 14:25:31,777 INFO  solr.SolrMappingReader - source: tstamp dest: tstamp
2010-05-17 14:25:31,777 INFO  solr.SolrMappingReader - source: url dest: id
2010-05-17 14:25:31,777 INFO  solr.SolrMappingReader - source: url dest: url
2010-05-17 14:25:31,821 INFO  collection.CollectionManager - Instantiating CollectionManager
2010-05-17 14:25:31,822 INFO  collection.CollectionManager - initializing CollectionManager
2010-05-17 14:25:31,849 INFO  collection.CollectionManager - file has1 elements
2010-05-17 14:25:32,474 WARN  mapred.LocalJobRunner - job_local_0001
org.apache.solr.common.SolrException: Bad Request

Bad Request

request: http://127.0.0.1:8080/solr/update?wt=javabin&version=1
        at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:424)
        at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:243)
        at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
        at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:49)
        at org.apache.nutch.indexer.solr.SolrWriter.close(SolrWriter.java:74)
        at org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:48)
        at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:474)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
---------------
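
A minimal Java sketch of how the id field could end up with two values. It rests on two assumptions, neither taken from the Nutch or Solr source: that the Nutch mapping writes the URL into id (the "source: url dest: id" log line above), and that the schema also copies url into id (the "copy field id" wording of the Solr error hints at this). The class and method names are hypothetical, and the check only imitates Solr's single-valued rule rather than reproducing it.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model, not Nutch or Solr code. Assumes:
//  1) the Nutch mapping sends the URL to both "id" and "url" (as the log shows);
//  2) the schema also has a copyField from "url" to "id" (a guess from the error text).
class CopyFieldSketch {

    // Builds field values the way the url->id and url->url mappings would.
    static Map<String, List<String>> mapNutchFields(String url) {
        Map<String, List<String>> doc = new LinkedHashMap<>();
        doc.computeIfAbsent("id", k -> new ArrayList<>()).add(url);
        doc.computeIfAbsent("url", k -> new ArrayList<>()).add(url);
        return doc;
    }

    // Imitates a copyField: appends every value of the source field to the dest field.
    static void applyCopyField(Map<String, List<String>> doc, String source, String dest) {
        List<String> values = new ArrayList<>(doc.getOrDefault(source, List.of()));
        doc.computeIfAbsent(dest, k -> new ArrayList<>()).addAll(values);
    }

    // Rejects a document whose single-valued field holds more than one value.
    static void checkSingleValued(Map<String, List<String>> doc, String field) {
        List<String> values = doc.getOrDefault(field, List.of());
        if (values.size() > 1) {
            throw new IllegalStateException(
                "multiple values encountered for non multiValued field " + field + ": " + values);
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> doc = mapNutchFields("http://example.com/");
        applyCopyField(doc, "url", "id");
        System.out.println(doc.get("id")); // [http://example.com/, http://example.com/]
        try {
            checkSingleValued(doc, "id");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Under these assumptions, the duplicate comes from two independent writers targeting the same field, not from the URL being repeated in the fetched data.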

Because I still use my old 1.0 configuration files, I get the following warning 
from Nutch, but it doesn't look related to the Solr integration:

---------------
2010-05-17 14:34:11,529 WARN  conf.Configuration - DEPRECATED: hadoop-site.xml found in the classpath. Usage of hadoop-site.xml is deprecated. Instead use core-site.xml, mapred-site.xml and hdfs-site.xml to override properties of core-default.xml, mapred-default.xml and hdfs-default.xml respectively
---------------

Did I just stumble upon a regression in 1.1dev, and should I file a bug, or 
could something else be spoiling the fun?



[1]: http://lucene.472066.n3.nabble.com/Adding-jpeg-parser-to-nutch-td710135.html

Cheers,

Markus Jelsma - Technisch Architect - Buyways BV
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350
