Thanks a lot, Markus, for your answer. My English is not very good. I have read your reply, but I still don't know how to fix the problem. Could you please explain the solution in detail? I looked in the conf directory, but I can't find how to map one MIME type to another. Do I need to replace the index-more plugin? I also looked at the link you suggested and saw a NUTCH-1262-1.5-1.patch, but I don't know how to use that patch. Please tell me whether I need to delete the index completely, or whether there is a way to replace application/xhtml+xml with text/html in the Solr index.
-----Original Message-----
From: Markus Jelsma [mailto:[email protected]]
Sent: Sunday, November 25, 2012 4:33 AM
To: [email protected]
Subject: RE: problem with text/html content type of documents appears application/xhtml+xml in solr index

Hi - trunk's more indexing filter can map MIME types to any target. With it you can map both (X)HTML MIME types to text/html or to `web page`.
https://issues.apache.org/jira/browse/NUTCH-1262

-----Original message-----
> From: Eyeris Rodriguez Rueda <[email protected]>
> Sent: Sun 25-Nov-2012 00:48
> To: [email protected]
> Subject: problem with text/html content type of documents appears application/xhtml+xml in solr index
>
> Hi.
>
> I have upgraded Nutch from 1.4 to 1.5.1 and have noticed a problem with the content type of some documents: some pages that are text/html appear in the Solr index as application/xhtml+xml, although when I check the links, the browser tells me that they are indeed text/html.
> Can anybody help me fix this problem? I thought about changing the content type to text/html manually in the Solr index, but that is not a good solution for me.
> Any suggestion or advice is welcome.

10th ANNIVERSARY OF THE FOUNDING OF THE UNIVERSIDAD DE LAS CIENCIAS INFORMATICAS... CONNECTED TO THE FUTURE, CONNECTED TO THE REVOLUTION
http://www.uci.cu
http://www.facebook.com/universidad.uci
http://www.flickr.com/photos/universidad_uci
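To make the mapping Markus describes more concrete, here is a small Python sketch of the idea. It assumes the tab-separated format of `conf/contenttype-mapping.txt` in Nutch releases that include NUTCH-1262 (one target type, followed by the source types that should be mapped to it); the function names are hypothetical and this is not Nutch's own code. In Nutch itself the mapping is applied by the index-more plugin and is, as far as I know, switched on with the `moreIndexingFilter.mapMimeTypes` property, so check `nutch-default.xml` in your version before relying on it.

```python
# Hypothetical sketch, not Nutch code: mimics the assumed tab-separated
# mapping format "target<TAB>source1<TAB>source2 ..." of contenttype-mapping.txt.
from typing import Dict, Iterable


def load_mime_mapping(lines: Iterable[str]) -> Dict[str, str]:
    """Build a source-MIME -> target-MIME lookup table from mapping lines."""
    mapping: Dict[str, str] = {}
    for raw in lines:
        line = raw.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        target, *sources = line.split("\t")
        for source in sources:
            mapping[source.strip()] = target.strip()
    return mapping


def map_mime_type(mime: str, mapping: Dict[str, str]) -> str:
    """Return the mapped type, or the original type if no rule matches."""
    return mapping.get(mime, mime)


if __name__ == "__main__":
    # Map both XHTML types to text/html, as suggested in the thread.
    rules = [
        "# target<TAB>source types...",
        "text/html\tapplication/xhtml+xml\ttext/xhtml",
    ]
    table = load_mime_mapping(rules)
    for mime in ("application/xhtml+xml", "text/xhtml", "text/html", "application/pdf"):
        print(mime, "->", map_mime_type(mime, table))
```

Running it prints `application/xhtml+xml -> text/html` and `text/xhtml -> text/html`, while types without a rule, such as `application/pdf`, pass through unchanged.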

