Hi,

A quite awful issue just occurred, and I traced it back down the line. Apparently the parser translates HTML entities back to their original characters: &lt; becomes <, &gt; becomes >, and so on. This is no problem for searching, as I strip them away, but the decoded text also gets stored, and it is the stored fields that are used to display the results.
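To make the effect concrete, here is a toy sketch of what entity decoding does to text taken from a page like the one below. The `decode` method is hypothetical and handles only a few named entities; it is not Nutch's parser code, it only mimics the behaviour described above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class EntityDecodeDemo {
    // Hypothetical, minimal decoder for a handful of named entities.
    // NOT Nutch's parser; it only reproduces the described effect.
    private static final Map<String, String> ENTITIES = new LinkedHashMap<>();
    static {
        ENTITIES.put("&lt;", "<");
        ENTITIES.put("&gt;", ">");
        ENTITIES.put("&quot;", "\"");
        ENTITIES.put("&amp;", "&"); // decoded last so "&amp;lt;" becomes "&lt;", not "<"
    }

    static String decode(String s) {
        String out = s;
        for (Map.Entry<String, String> e : ENTITIES.entrySet()) {
            out = out.replace(e.getKey(), e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        String source = "Use &lt;b&gt;bold&lt;/b&gt; for emphasis";
        // Prints: Use <b>bold</b> for emphasis
        System.out.println(decode(source));
    }
}
```

Once the decoded string is stored and later written into a results page unescaped, the browser sees a real `<b>` element instead of the literal text.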
For example:

bin/nutch org.apache.nutch.parse.ParserChecker -dumpText http://www.w3schools.com/tags/ref_entities.asp

As you can see, the original entities become valid HTML elements and will be parsed by the browser if displayed as part of the search results. The question is why the entities get translated and how to turn this off. Searching the internet and the configuration didn't point me in the right direction. Doing some escaping on the front end isn't the solution I'm looking for, as my highlighting elements would be escaped as well. Escaping there and restoring the highlighting elements afterwards is only a temporary workaround in this case.

Cheers,

--
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536600 / 06-50258350
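For comparison, the front-end escaping problem can be sidestepped by escaping each text segment before the highlight markup is added, rather than escaping everything and restoring the highlight elements afterwards. This is a swapped-in technique, not Nutch's or any highlighter's API: the hit-offset input (sorted, non-overlapping `[start, end)` pairs) and the `highlight` method are hypothetical and only sketch the idea.

```java
import java.util.ArrayList;
import java.util.List;

public class SafeHighlight {
    // Escapes the characters that are significant in HTML text and attributes.
    static String escapeHtml(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '&': sb.append("&amp;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&#39;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    // Hypothetical helper: hits are assumed sorted and non-overlapping [start, end) offsets.
    // Each text segment is escaped first; pre/post tags are appended afterwards,
    // so the highlight markup itself is never escaped.
    static String highlight(String text, List<int[]> hits, String pre, String post) {
        StringBuilder out = new StringBuilder();
        int pos = 0;
        for (int[] hit : hits) {
            out.append(escapeHtml(text.substring(pos, hit[0])));
            out.append(pre).append(escapeHtml(text.substring(hit[0], hit[1]))).append(post);
            pos = hit[1];
        }
        out.append(escapeHtml(text.substring(pos)));
        return out.toString();
    }

    public static void main(String[] args) {
        String stored = "the <b> element";   // entity already decoded by the parser
        List<int[]> hits = new ArrayList<>();
        hits.add(new int[] {4, 7});           // hypothetical hit on "<b>"
        // Prints: the <em>&lt;b&gt;</em> element
        System.out.println(highlight(stored, hits, "<em>", "</em>"));
    }
}
```

The design choice is simply ordering: escaping happens on the raw stored text, and only trusted markup (the highlight tags) is concatenated in afterwards.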