Byron Miller wrote:
Oh yeah, does anyone have any tips on cleaning up the SUMMARIES so that any lingering code, control characters, or characters that are not valid in XML don't come through? The following search causes it to barf:
http://www.mozdex.com/open.jsp?query=opensearch
Well, the problem with this particular offending page comes from the fact that the original HTML content had a different encoding than expected, so some non-latin characters ended up as control characters after invalid re-encoding.
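To illustrate how a wrong encoding guess can turn ordinary letters into control characters, here is a minimal sketch. It assumes the page was UTF-8 and was decoded as ISO-8859-1, which is only one possible scenario and not necessarily what happened with this page. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class MisdecodeDemo {
    // Hypothetical helper: encode text as UTF-8, then decode those bytes
    // as if they were ISO-8859-1 (the wrong charset guess).
    static String misdecodeAsLatin1(String original) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // U+0142 is the Polish letter "l with stroke"; in UTF-8 it is two bytes.
        String garbled = misdecodeAsLatin1(String.valueOf((char) 0x142));
        for (char c : garbled.toCharArray()) {
            // Print each resulting code unit and whether it is a control character.
            System.out.printf("U+%04X control=%b%n", (int) c, Character.isISOControl(c));
        }
    }
}
```

In this scenario the single letter becomes two characters, and the second one is U+0082, which falls in the C1 control range.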
But if you ignore this for a moment, the XML error comes from the fact that this offending character falls outside the declared encoding, which is Latin1.
Is there any particular reason why you use ISO-8859-1 instead of UTF-8? I think you need to use the latter in order to properly present international content. And then, you need to encode the data that you put in the response so that it follows the UTF-8 encoding - whether through your servlet container, or by simply calling String.getBytes("UTF-8") and writing these to the output...
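As a rough sketch of both steps, the snippet below drops characters that XML 1.0 does not allow and then writes the result as UTF-8 bytes. The class and method names are hypothetical and this is not existing Nutch or mozdex code. Dropping the C1 controls (U+007F to U+009F) as well is my own assumption: XML 1.0 permits them but discourages them. Note that this does not escape `<`, `>` or `&`, which still has to be done separately, and the XML declaration must say encoding="UTF-8" to match.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class XmlSummaryCleaner {
    // True if the code point matches the XML 1.0 "Char" production.
    static boolean isValidXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Assumption: besides invalid characters, also drop every control
    // character except tab, line feed and carriage return.
    static boolean keep(int cp) {
        boolean whitespace = cp == 0x9 || cp == 0xA || cp == 0xD;
        return isValidXmlChar(cp) && (whitespace || !Character.isISOControl(cp));
    }

    // Returns a copy of the summary with all unwanted code points removed.
    static String clean(String summary) {
        StringBuilder sb = new StringBuilder(summary.length());
        for (int i = 0; i < summary.length(); ) {
            int cp = summary.codePointAt(i);
            if (keep(cp)) {
                sb.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    // Writes text to the output as UTF-8 bytes.
    static void writeUtf8(OutputStream out, String text) throws IOException {
        out.write(text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        String dirty = "open" + (char) 0x1 + "search" + (char) 0x82;
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeUtf8(out, clean(dirty));
        System.out.println(out.size() + " bytes written");
    }
}
```

In a servlet or JSP you would pass the response's output stream instead of the ByteArrayOutputStream used here, after setting the content type's charset to UTF-8.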
--
Best regards,
Andrzej Bialecki
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com
