Hi - the more indexing filter (index-more) in trunk can map MIME types to any target value. With it you can map both HTML MIME types (text/html and application/xhtml+xml) to text/html, or to a label such as `web page`.
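
To illustrate the idea only, here is a rough Python sketch of that kind of mapping. It is not the Nutch code: the function names and the tab-separated rule format (target first, then the source types) are assumptions made up for this example, so check the issue linked in this message for the real configuration.

```python
# Conceptual sketch of MIME type mapping; NOT the actual Nutch implementation.
# Assumed (hypothetical) rule format: target<TAB>source1<TAB>source2...

def load_mime_map(lines):
    """Build a source-type -> target lookup from tab-separated rule lines."""
    mime_map = {}
    for raw in lines:
        line = raw.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        target, *sources = [part.strip() for part in line.split("\t")]
        for source in sources:
            mime_map[source] = target
    return mime_map


def map_content_type(mime_map, content_type):
    """Return the mapped target, or the original type if no rule matches."""
    return mime_map.get(content_type, content_type)


# Hypothetical rule: index XHTML pages as text/html.
MAPPING_LINES = [
    "# target<TAB>source types",
    "text/html\tapplication/xhtml+xml",
]

if __name__ == "__main__":
    mime_map = load_mime_map(MAPPING_LINES)
    print(map_content_type(mime_map, "application/xhtml+xml"))
```

Replacing the rule with `web page<TAB>text/html<TAB>application/xhtml+xml` would send both types to the `web page` label instead. Types without a rule pass through unchanged.
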
https://issues.apache.org/jira/browse/NUTCH-1262

-----Original message-----
> From: Eyeris Rodriguez Rueda <[email protected]>
> Sent: Sun 25-Nov-2012 00:48
> To: [email protected]
> Subject: problem with text/html content type of documents appears application/xhtml+xml in solr index
>
> Hi.
>
> I have changed my Nutch version from 1.4 to 1.5.1 and I have detected a
> problem with the content type of some documents: some pages served as
> text/html appear in the Solr index as application/xhtml+xml. When I check
> the links, the browser tells me that they are in fact text/html.
> Can anybody help me fix this problem? I have thought of changing the
> content type manually in the Solr index to text/html, but that is not a
> good solution for me.
> Any suggestion or advice is welcome.

