I think I had the same problem. I just checked my schema.xml, and it looks like
I fixed it by commenting out the copyField from url to id:

<!-- copyField source="url" dest="id"/ -->
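
My guess at why this helps (an assumption based on the Nutch log lines quoted below, not something I have checked against the Nutch source): the Nutch side already maps url to id ("source: url dest: id"), so a copyField in schema.xml that also copies url into id would give Solr a second value for the single-valued id field. That would match the "multiple values encountered for non multiValued copy field id" error. A rough sketch of the relevant part of schema.xml follows; the type and attributes on the id field are illustrative and may differ in your copy:

```xml
<!-- Illustrative field definition; id stays single-valued (no multiValued="true"). -->
<field name="id" type="string" stored="true" indexed="true"/>

<!-- id is the document's unique key, so it should hold exactly one value. -->
<uniqueKey>id</uniqueKey>

<!-- Disabled: Nutch already sends the URL as id, so this copy would add a second value. -->
<!-- copyField source="url" dest="id"/ -->
```

Solr only reads schema.xml at startup or on a core reload, so it needs a restart before the change takes effect.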

|-----Original Message-----
|From: Markus Jelsma [mailto:[email protected]]
|Sent: Tuesday, May 25, 2010 12:04 PM
|To: [email protected]
|Subject: RE: Solr integration in nutch-1.1dev
|
|Hi Brian,
|
|
|
|Thanks for your reply. But as can be seen in the stacktrace, it's the ID
|field of a document. It cannot be set to accommodate multiple values and it
|wouldn't make sense either. The ID field should contain the URL of the
|fetched and parsed content. Also, you can clearly see the mapping in the
|included Nutch logs; it maps the URL field to Solr's ID field as well as
|mapping the URL to the URL field, which doesn't make sense, but it's still the
|example schema and mapping configuration. Also, I can't imagine that multiple
|values for a URL field in Nutch itself would make any sense at all: how would a
|piece of content on a distinct URL have more than one URL?
|
|
|
|Do you or anybody else have an idea how to solve this mystery? I'm also not
|getting much from Nutch's logs; they don't mention anything else except that
|sending the data over to a Solr instance failed.
|
|
|
|Cheers,
|
|-----Original message-----
|From: Brian Tingle <[email protected]>
|Sent: Tue 25-05-2010 20:47
|To: [email protected]; Markus Jelsma <[email protected]>;
|Subject: RE: Solr integration in nutch-1.1dev
|
|Update the solr schema.xml so that it allows multiple values for that field?
|
||-----Original Message-----
||From: Markus Jelsma [mailto:[email protected]]
||Sent: Tuesday, May 25, 2010 4:49 AM
||To: [email protected]
||Subject: Re: Solr integration in nutch-1.1dev
||
||Hello Julien,
||
||
||I picked today's build from your URL but the problem persists as reported
||earlier. Any more ideas on how to tackle this?
||
||
||Cheers,
||
||On Monday 17 May 2010 15:50:55 Julien Nioche wrote:
||> Hi Markus,
||>
||> This has been solved last week and is in the trunk of the SVN repository.
||> The nightly build has just been fixed after the move to the TLP so the
||> version you are using does not have the fix yet. Check
||> http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ to get the latest
||> build or check it out from SVN
||>
||> J.
||>
||> > Hi,
||> >
||> >
||> > I've got a copy of the nutch-2010-05-11_04-34-41 nightly build because I
||> > need Tika to parse JPEG images and that would be in 1.1 as I read
||> > somewhere [1].
||> >
||> > First I fetch only a single HTML page and send it to Solr as I did with
||> > 1.0, but it fails now. Here's what Solr thinks of the request:
||> >
||> >
||> > ---------------
||> > May 17, 2010 2:25:32 PM org.apache.solr.common.SolrException log
||> > SEVERE: org.apache.solr.common.SolrException: ERROR: multiple values encountered for non multiValued copy field id: <URL HERE>
||> >        at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:260)
||> >        at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:60)
||> >        at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:94)
||> >        at org.apache.solr.update.processor.SignatureUpdateProcessorFactory$SignatureUpdateProcessor.processAdd(SignatureUpdateProcessorFactory.java:162)
||> >        at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:139)
||> >        at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
||> >        at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
||> >        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
||> >        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
||> >        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
||> >        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
||> >        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
||> >        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
||> >        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
||> >        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
||> >        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
||> >        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
||> >        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
||> >        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
||> >        at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845)
||> >        at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
||> >        at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
||> >        at java.lang.Thread.run(Thread.java:619)
||> > ---------------
||> >
||> >
||> > Well, this obviously is wrong. Although I am still using the old 1.0
||> > schema.xml, the id field isn't multiValued in the nightly build's
||> > schema.xml file either.
||> >
||> > Below Nutch's relevant log lines:
||> >
||> >
||> > ---------------
||> > 2010-05-17 14:25:31,776 INFO  solr.SolrMappingReader - source: content dest: content
||> > 2010-05-17 14:25:31,776 INFO  solr.SolrMappingReader - source: site dest: site
||> > 2010-05-17 14:25:31,776 INFO  solr.SolrMappingReader - source: title dest: title
||> > 2010-05-17 14:25:31,776 INFO  solr.SolrMappingReader - source: host dest: host
||> > 2010-05-17 14:25:31,776 INFO  solr.SolrMappingReader - source: segment dest: segment
||> > 2010-05-17 14:25:31,777 INFO  solr.SolrMappingReader - source: boost dest: boost
||> > 2010-05-17 14:25:31,777 INFO  solr.SolrMappingReader - source: digest dest: digest
||> > 2010-05-17 14:25:31,777 INFO  solr.SolrMappingReader - source: tstamp dest: tstamp
||> > 2010-05-17 14:25:31,777 INFO  solr.SolrMappingReader - source: url dest: id
||> > 2010-05-17 14:25:31,777 INFO  solr.SolrMappingReader - source: url dest: url
||> > 2010-05-17 14:25:31,821 INFO  collection.CollectionManager - Instantiating CollectionManager
||> > 2010-05-17 14:25:31,822 INFO  collection.CollectionManager - initializing CollectionManager
||> > 2010-05-17 14:25:31,849 INFO  collection.CollectionManager - file has1 elements
||> > 2010-05-17 14:25:32,474 WARN  mapred.LocalJobRunner - job_local_0001
||> > org.apache.solr.common.SolrException: Bad Request
||> >
||> > Bad Request
||> >
||> > request: http://127.0.0.1:8080/solr/update?wt=javabin&version=1
||> >        at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:424)
||> >        at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:243)
||> >        at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
||> >        at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:49)
||> >        at org.apache.nutch.indexer.solr.SolrWriter.close(SolrWriter.java:74)
||> >        at org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:48)
||> >        at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:474)
||> >        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
||> >        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
||> > ---------------
||> >
||> > Because I still use my old 1.0 configuration files, I get the following
||> > warning from Nutch, but it doesn't look like it's related to the Solr
||> > integration:
||> >
||> > ---------------
||> > 2010-05-17 14:34:11,529 WARN  conf.Configuration - DEPRECATED: hadoop-site.xml found in the classpath. Usage of hadoop-site.xml is deprecated. Instead use core-site.xml, mapred-site.xml and hdfs-site.xml to override properties of core-default.xml, mapred-default.xml and hdfs-default.xml respectively
||> > ---------------
||> >
||> > Did I just stumble upon a regression in 1.1dev and should I file a bug,
||> > or could something else spoil the fun?
||> >
||> >
||> >
||> > [1]: http://lucene.472066.n3.nabble.com/Adding-jpeg-parser-to-nutch-td710135.html
||> >
||> > Cheers,
||> >
||> > Markus Jelsma - Technisch Architect - Buyways BV
||> > http://www.linkedin.com/in/markus17
||> > 050-8536620 / 06-50258350
||>
||
||Markus Jelsma - Technisch Architect - Buyways BV
||http://www.linkedin.com/in/markus17
||050-8536620 / 06-50258350
