Byron Miller wrote:

Oh yeah, does anyone have any tips on cleaning up the SUMMARIES so any
lingering code, control characters or non-XML-valid characters don't come
through?  The following search causes it to barf:

http://www.mozdex.com/open.jsp?query=opensearch

Well, the problem with this particular page is that the original HTML content was in a different encoding than expected, so some non-Latin characters ended up as control characters after being decoded incorrectly.


Setting that aside for a moment, the XML error itself occurs because the offending character falls outside the declared encoding, which is Latin-1 (ISO-8859-1).

Is there any particular reason why you use ISO-8859-1 instead of UTF-8? I think you need to use the latter in order to present international content properly. You then need to encode the data you put in the response as UTF-8, either through your servlet container or by simply calling String.getBytes("UTF-8") and writing the resulting bytes to the output.
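
As for cleaning the summaries themselves, here is a rough sketch of one way to do it, not necessarily how it should be wired into the search front end. It drops every code point that the XML 1.0 "Char" production does not allow and then encodes the result as UTF-8. The class and method names (XmlTextCleaner, stripInvalidXmlChars, toUtf8) are made up for illustration, and StandardCharsets.UTF_8 assumes Java 7 or later; on older JVMs String.getBytes("UTF-8") does the same job.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
final class XmlTextCleaner {
    private XmlTextCleaner() {}

    /** True if the code point is allowed by the XML 1.0 "Char" production. */
    static boolean isValidXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Returns a copy of text with all code points that XML 1.0 forbids removed. */
    static String stripInvalidXmlChars(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            // codePointAt returns the bare surrogate value for an unpaired
            // surrogate, which isValidXmlChar rejects.
            int cp = text.codePointAt(i);
            if (isValidXmlChar(cp)) {
                out.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    /** Encodes text as UTF-8 bytes, ready to be written to the response stream. */
    static byte[] toUtf8(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }
}
```

You would call XmlTextCleaner.stripInvalidXmlChars(summary) before putting the summary into the XML, and declare encoding="UTF-8" in the XML prolog so the declaration matches the bytes you write. Note that this only removes characters that XML forbids outright; it keeps the C1 range (0x80-0x9F), which XML 1.0 permits, and it does not escape markup characters such as < and &, which still has to be done separately.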

--
Best regards,
Andrzej Bialecki
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com


