Hello Michi,

Tuesday, January 15, 2002, 5:17:30 PM, you wrote:

MR> Hi!

MR> I want to write a method, that converts text into html-readable characters.
MR> so I have to replace "<", ">", "&" and "\" with their named entities - that is
MR> clear.

Yup, looks like everybody has to write such a function at least a dozen
times in his life :-)!

MR> but what about unicode and characters above the ASCII-128.
MR> I think, if I have got a text (with or without unicode-characters) it is ok, to
MR> substitute all characters above ASCII-128 and all unicode characters with &#xxx;
Well, it's perfectly okay as long as you do not mind spending 6 or more bytes
per char (&#xxx; takes at least 6 bytes).
It depends on what charset you use. (You set it with
response.setContentType("text/html; charset=ISO-8859-1");
)
If you set ISO-8859-1, or leave the content type alone and get this charset by
default, then you have to encode every char above 255 as &#xxx;
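
To illustrate "it depends on what charset you use", here is a minimal sketch that asks the charset itself whether it can represent a character and falls back to a decimal &#xxx; reference only when it cannot. The class and method names (CharsetAwareEscaper, encodeFor) are made up for this example, and the sketch does not touch the markup characters; it only handles the charset question.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the servlet API.
final class CharsetAwareEscaper {

    // Keeps every character the given charset can encode and replaces
    // the rest with a decimal numeric character reference.
    static String encodeFor(Charset charset, String text) {
        CharsetEncoder encoder = charset.newEncoder();
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);          // full code point, not a half surrogate
            int len = Character.charCount(cp);
            String chunk = text.substring(i, i + len);
            if (encoder.canEncode(chunk)) {
                out.append(chunk);                 // representable: send as-is
            } else {
                out.append("&#").append(cp).append(';');
            }
            i += len;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // U+00E9 fits in ISO-8859-1, U+0370 does not.
        System.out.println(encodeFor(StandardCharsets.ISO_8859_1, "\u00e9\u0370"));
    }
}
```

With this approach the same text costs nothing extra under UTF-8, while under US-ASCII everything above 127 becomes a reference.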

MR> but how to know the right unicode-encoding for the ASCII-characters 128-255 ???

MR> I think the first 256 unicode-characters are identical to iso-8859-1 (is this
MR> correct???).
Yes, that's correct: Unicode code points 0-255 are the same as ISO-8859-1.
So with an ISO-8859-1 page you do not have to replace 128-255. Leave them as they are.
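
If you would rather check this than take my word for it, a small sketch like the following decodes every byte value 0-255 as ISO-8859-1 and compares the result with the Unicode character of the same number. The class name Latin1Check is just an illustration.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical check, only meant to confirm the claim above.
final class Latin1Check {

    // Returns true if each byte 0..255, decoded as ISO-8859-1,
    // gives the Unicode character with the same numeric value.
    static boolean latin1MatchesUnicode() {
        byte[] bytes = new byte[256];
        for (int i = 0; i < 256; i++) {
            bytes[i] = (byte) i;
        }
        String decoded = new String(bytes, StandardCharsets.ISO_8859_1);
        if (decoded.length() != 256) {
            return false;
        }
        for (int i = 0; i < 256; i++) {
            if (decoded.charAt(i) != i) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(latin1MatchesUnicode());
    }
}
```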

MR> so what if I want to substitute greek-characters (0370-03FF unicode and
MR> iso-8859-7)!
MR> how do I know, how to substitute each character?
I see no problem here: just replace any character over 255
with the appropriate &#abc; sequence. For example, replace the U+0370
char with &#880; or with &#x370;, whichever you like better.
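
Putting the pieces together, a conversion method for an ISO-8859-1 page could look roughly like the sketch below. It is only one way to do it: the class and method names are invented, I am assuming the fourth character to replace is the double quote, and it walks code points so that characters outside the 16-bit range get a single reference.

```java
// Hypothetical helper for pages served as ISO-8859-1.
final class HtmlTextEscaper {

    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            switch (cp) {
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '&': out.append("&amp;"); break;
                case '"': out.append("&quot;"); break;   // assumption: quote is the 4th char
                default:
                    if (cp > 255) {
                        // Not representable in ISO-8859-1: decimal reference.
                        out.append("&#").append(cp).append(';');
                    } else {
                        // 0..255 is identical to ISO-8859-1, leave as-is.
                        out.append((char) cp);
                    }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("<b>\u0370 & \u00e9</b>"));
    }
}
```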

The number that follows &# in HTML 4.0 is the Unicode code point of the char
(decimal in the case of &#abc;, hex in the case of &#xabc;).
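
For what it's worth, both forms are easy to produce from the same code point. A tiny sketch, with a made-up class name:

```java
// Hypothetical helper showing the two numeric reference forms.
final class NumericCharRef {

    // Decimal form, e.g. U+0370 gives "&#880;".
    static String decimal(int codePoint) {
        return "&#" + codePoint + ";";
    }

    // Hexadecimal form, e.g. U+0370 gives "&#x370;".
    static String hex(int codePoint) {
        return "&#x" + Integer.toHexString(codePoint) + ";";
    }

    public static void main(String[] args) {
        System.out.println(decimal(0x370) + " " + hex(0x370));
    }
}
```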

Good luck!

--
Best regards,
 Anton Tagunov                            mailto:[EMAIL PROTECTED]

