Confirmed! It was the old schema.xml file. Next time I'd better check for
differences :)
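
For anyone who hits the same error, here is a minimal sketch of that kind of
check. It lists copyField directives in a Solr schema.xml whose destination is
a single-valued field that the Nutch mapping already fills. The function name
and the sample XML are made up for illustration, and the element layout of
solrindex-mapping.xml (entries carrying source/dest attributes) is an
assumption based on the log lines quoted below, not something taken from the
Nutch sources.

```python
import xml.etree.ElementTree as ET


def conflicting_copy_fields(schema_xml, mapping_xml):
    """Return (source, dest) copyField pairs from the schema whose dest is
    single-valued and is also written by the Nutch mapping (hypothetical helper)."""
    schema = ET.fromstring(schema_xml)
    mapping = ET.fromstring(mapping_xml)

    # Every destination the mapping writes to (assumed: elements with a dest attribute).
    mapped = {e.get("dest") for e in mapping.iter() if e.get("dest")}

    # Schema fields explicitly declared multiValued="true" may hold several values.
    multi = {f.get("name") for f in schema.iter("field")
             if f.get("multiValued") == "true"}

    return [(c.get("source"), c.get("dest"))
            for c in schema.iter("copyField")
            if c.get("dest") in mapped and c.get("dest") not in multi]


# Illustrative inputs only: an old-style schema plus a mapping that already fills id.
SCHEMA = """<schema>
  <fields>
    <field name="id" type="string"/>
    <field name="url" type="string"/>
  </fields>
  <copyField source="url" dest="id"/>
</schema>"""

MAPPING = """<mapping>
  <fields>
    <field dest="id" source="url"/>
    <copyField source="url" dest="url"/>
  </fields>
</mapping>"""

print(conflicting_copy_fields(SCHEMA, MAPPING))  # [('url', 'id')]
```

Commenting out the copyField in the schema, as Brian did, makes the list come
back empty, because the XML parser skips comments.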

On Tuesday 25 May 2010 21:38:45 Markus Jelsma wrote:
> Hi Brian,
> 
>  
> 
> Again, thanks for the help. I have looked up the schema file from the trunk
>  and the 1.0 tag using web svn. It seems you are right, although I cannot
>  confirm it yet; I will return to work tomorrow. Anyway, the
>  solrindex-mapping configuration in 1.1-dev does not show any weird stuff
>  related to the ID field. It does, however, have a copyField from url to
>  url which makes no sense to me. The suspect is the schema.xml from the
>  1.0 tag: it contains a copyField directive from URL to ID which
>  disappeared in trunk some time ago. 1.1-dev, according to svn, introduced
>  the solrindex-mapping configuration which already maps the URL to the ID
>  field, and because I have the old schema.xml file in my Solr instance, it
>  would, of course, copyField to an already occupied ID field.
> 
>  
> 
> I'd bet that's the issue here and if so, perhaps it would be best to
>  investigate all new relevant configuration files the next time instead of
>  assuming the schema.xml file wouldn't change. Back on this tomorrow and
>  thanks for the useful pointer!
> 
>  
> 
> Cheers,
> 
> 
>  
> -----Original message-----
> From: Brian Tingle <brian.tin...@ucop.edu>
> Sent: Tue 25-05-2010 21:11
> To: user@nutch.apache.org;
> Subject: RE: Solr integration in nutch-1.1dev
> 
> I think I had the same problem, I just checked my schema.xml ... it looks
>  like I just commented out the copyField source="url" dest="id"
> 
> <!-- copyField source="url" dest="id"/ -->
> 
> |-----Original Message-----
> |From: Markus Jelsma [mailto:markus.jel...@buyways.nl]
> |Sent: Tuesday, May 25, 2010 12:04 PM
> |To: user@nutch.apache.org
> |Subject: RE: Solr integration in nutch-1.1dev
> |
> |Hi Brian,
> |
> |
> |
> |Thanks for your reply. But as can be seen in the stacktrace, it's the ID
> |field of a document. It cannot be set to accommodate multiple values and
> | it wouldn't make sense either. The ID field should contain the URL of the
> | fetched and parsed content. Also, you can clearly see the mapping in the
> | included Nutch logs; it maps the URL field to Solr's ID field as well as
> | mapping the URL to the URL field, which doesn't make sense, but it's still
> | the example schema and mapping configuration. Also, I can't imagine that
> | multiple values for a URL field in Nutch itself make any sense at all:
> | how would a piece of content on a distinct URL have more than one URL?
> |
> |
> |
> |Do you or anybody else have an idea to solve this mystery? I'm also not
> |getting much from Nutch's logs, they don't mention anything else except
> | that sending the data over to a Solr instance failed.
> |
> |
> |
> |Cheers,
> |
> |-----Original message-----
> |From: Brian Tingle <brian.tin...@ucop.edu>
> |Sent: Tue 25-05-2010 20:47
> |To: user@nutch.apache.org; Markus Jelsma <markus.jel...@buyways.nl>;
> |Subject: RE: Solr integration in nutch-1.1dev
> |
> |Update the solr schema.xml so that it allows multiple values for that
> | field?
> |
> ||-----Original Message-----
> ||From: Markus Jelsma [mailto:markus.jel...@buyways.nl]
> ||Sent: Tuesday, May 25, 2010 4:49 AM
> ||To: user@nutch.apache.org
> ||Subject: Re: Solr integration in nutch-1.1dev
> ||
> ||Hello Julien,
> ||
> ||
> ||I picked today's build from your URL but the problem persists as reported
> ||earlier. Any more ideas on how to tackle this?
> ||
> ||
> ||Cheers,
> ||
> ||On Monday 17 May 2010 15:50:55 Julien Nioche wrote:
> ||> Hi Markus,
> ||>
> ||> This has been solved last week and is in the trunk of the SVN
> ||> repository. The nightly build has just been fixed after the move to the
> ||> TLP so the version you are using does not have the fix yet. Check
> ||> http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ to get the
> ||> latest build or check it out from SVN
> ||>
> ||> J.
> ||>
> ||> > Hi,
> ||> >
> ||> >
> ||> > I've got a copy of the nutch-2010-05-11_04-34-41 nightly build
> ||> > because I need Tika to parse JPEG images and that would be in 1.1
> ||> > as I read somewhere [1].
> ||> >
> ||> > First I fetch only a single HTML page and send it to Solr as I did
> ||> > with 1.0 but it fails now. Here's what Solr thinks of the request:
> ||> >
> ||> >
> ||> > ---------------
> ||> > May 17, 2010 2:25:32 PM org.apache.solr.common.SolrException log
> ||> > SEVERE: org.apache.solr.common.SolrException: ERROR: multiple values
> ||> > encountered for non multiValued copy field id: <URL HERE>
> ||> >        at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:260)
> ||> >        at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:60)
> ||> >        at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:94)
> ||> >        at org.apache.solr.update.processor.SignatureUpdateProcessorFactory$SignatureUpdateProcessor.processAdd(SignatureUpdateProcessorFactory.java:162)
> ||> >        at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:139)
> ||> >        at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
> ||> >        at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
> ||> >        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
> ||> >        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
> ||> >        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
> ||> >        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
> ||> >        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
> ||> >        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
> ||> >        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
> ||> >        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
> ||> >        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
> ||> >        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
> ||> >        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
> ||> >        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
> ||> >        at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845)
> ||> >        at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
> ||> >        at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
> ||> >        at java.lang.Thread.run(Thread.java:619)
> ||> > ---------------
> ||> >
> ||> >
> ||> > Well, this obviously is wrong. Although I am still using the old 1.0
> ||> > schema.xml, it still isn't multiValued in the nightly build's
> ||> > schema.xml file.
> ||> >
> ||> > Below Nutch's relevant log lines:
> ||> >
> ||> >
> ||> > ---------------
> ||> > 2010-05-17 14:25:31,776 INFO  solr.SolrMappingReader - source: content dest: content
> ||> > 2010-05-17 14:25:31,776 INFO  solr.SolrMappingReader - source: site dest: site
> ||> > 2010-05-17 14:25:31,776 INFO  solr.SolrMappingReader - source: title dest: title
> ||> > 2010-05-17 14:25:31,776 INFO  solr.SolrMappingReader - source: host dest: host
> ||> > 2010-05-17 14:25:31,776 INFO  solr.SolrMappingReader - source: segment dest: segment
> ||> > 2010-05-17 14:25:31,777 INFO  solr.SolrMappingReader - source: boost dest: boost
> ||> > 2010-05-17 14:25:31,777 INFO  solr.SolrMappingReader - source: digest dest: digest
> ||> > 2010-05-17 14:25:31,777 INFO  solr.SolrMappingReader - source: tstamp dest: tstamp
> ||> > 2010-05-17 14:25:31,777 INFO  solr.SolrMappingReader - source: url dest: id
> ||> > 2010-05-17 14:25:31,777 INFO  solr.SolrMappingReader - source: url dest: url
> ||> > 2010-05-17 14:25:31,821 INFO  collection.CollectionManager - Instantiating CollectionManager
> ||> > 2010-05-17 14:25:31,822 INFO  collection.CollectionManager - initializing CollectionManager
> ||> > 2010-05-17 14:25:31,849 INFO  collection.CollectionManager - file has1 elements
> ||> > 2010-05-17 14:25:32,474 WARN  mapred.LocalJobRunner - job_local_0001
> ||> > org.apache.solr.common.SolrException: Bad Request
> ||> >
> ||> > Bad Request
> ||> >
> ||> > request: http://127.0.0.1:8080/solr/update?wt=javabin&version=1
> ||> >        at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:424)
> ||> >        at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:243)
> ||> >        at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
> ||> >        at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:49)
> ||> >        at org.apache.nutch.indexer.solr.SolrWriter.close(SolrWriter.java:74)
> ||> >        at org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:48)
> ||> >        at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:474)
> ||> >        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
> ||> >        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
> |
> ||> > ---------------
> ||> >
> ||> > Because I still use my old 1.0 configuration files I get the
> ||> > following warning from Nutch, but it doesn't look like it's related
> ||> > to the Solr integration:
> ||> >
> ||> > ---------------
> ||> > 2010-05-17 14:34:11,529 WARN  conf.Configuration - DEPRECATED:
> ||> > hadoop-site.xml found in the classpath. Usage of hadoop-site.xml is
> ||> > deprecated. Instead use core-site.xml, mapred-site.xml and
> ||> > hdfs-site.xml to override properties of core-default.xml,
> ||> > mapred-default.xml and hdfs-default.xml respectively
> ||> > ---------------
> ||> >
> ||> > Did I just stumble upon a regression in 1.1dev and should I file a
> ||> > bug, or could something else spoil the fun?
> ||> >
> ||> >
> ||> >
> ||> > [1]: http://lucene.472066.n3.nabble.com/Adding-jpeg-parser-to-nutch-td710135.html
> ||> >
> ||> > Cheers,
> ||> >
> ||> > Markus Jelsma - Technisch Architect - Buyways BV
> ||> > http://www.linkedin.com/in/markus17
> ||> > 050-8536620 / 06-50258350
> ||
> ||Markus Jelsma - Technisch Architect - Buyways BV
> ||http://www.linkedin.com/in/markus17
> ||050-8536620 / 06-50258350
> 

Markus Jelsma - Technisch Architect - Buyways BV
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350
