On 18.04.2004 07:32, Upayavira wrote:

1) I have some Polish text that uses character entities, such as "się". How can I translate this into a single- or double-byte character in either ISO-8859-1, ISO-8859-2 or UTF-8?

They are not translated. While the XML files contain the entity representation, in Java you have characters. Only the serializer decides whether it writes them out as characters or as character entities. In general this can't be influenced, though some serializers may have configuration options for it. But in the end (i.e. in the browser or wherever) it should work for both the entity and the character, as they represent the same "thing".
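
To illustrate the point, here is a minimal sketch in Python rather than Java. It uses Python's standard xml.etree.ElementTree, not a Cocoon serializer, and the element name "p" and the numeric form of the entity are assumptions chosen only for the example. After parsing, the entity and the literal character are the same string; only the output encoding chosen at serialization time decides which form is written.

```python
import xml.etree.ElementTree as ET

# The same word, once with a numeric character entity and once as a
# literal character (sample data, assumed for illustration).
with_entity = ET.fromstring("<p>si&#281;</p>")
with_char = ET.fromstring("<p>się</p>")

# After parsing, both hold the identical character U+0119.
same_text = with_entity.text == with_char.text

# The serializer decides the output form. With an ASCII output encoding
# the character has to be written as a numeric character reference ...
as_ascii = ET.tostring(with_char, encoding="us-ascii")

# ... while serializing to a Unicode string keeps the literal character.
as_text = ET.tostring(with_entity, encoding="unicode")

print(same_text, as_ascii, as_text)
```

Both outputs describe the same document; a consumer that parses them again gets the same characters back.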

Ah. But what if I want to convert the entities into characters as a one-off offline event (e.g. in a text editor, or a Perl script)?

Sorry, but I don't understand the question. When and where do you want to do the conversion?


So UTF-8 is a good encoding to use, it sounds like. So, if I have multiple languages, it is best to aim for UTF-8 as a source encoding, and serialize to that.

You do not necessarily need UTF-8 for the source; you can also write the files in their language-specific encoding, e.g. Polish in the ISO-8859-2 mentioned above. This does absolutely no harm and causes no problems.
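
A small sketch of why the source encoding does not matter, again in Python and with an assumed sample word: the bytes on disk differ between encodings, but once each file is decoded with its own encoding the characters are identical, and they can be serialized as UTF-8 either way.

```python
word = "się"  # sample word, assumed for illustration

# The byte representation of the source depends on the file encoding ...
latin2_bytes = word.encode("iso-8859-2")  # ę is the single byte 0xEA
utf8_bytes = word.encode("utf-8")         # ę is the two bytes 0xC4 0x99

# ... but once decoded with the matching encoding, the characters are identical.
from_latin2 = latin2_bytes.decode("iso-8859-2")
from_utf8 = utf8_bytes.decode("utf-8")

# The output can then be serialized as UTF-8 regardless of the source encoding.
output = from_latin2.encode("utf-8")

print(latin2_bytes, utf8_bytes, from_latin2 == from_utf8, output)
```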


UTF-8 only simplifies the handling with the browser, but that massively. If you deliver pages in different encodings to different users depending on the locale, you also have to parse their requests with different encodings (a.k.a. form encoding).
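
The form encoding issue can be sketched as follows. This is a Python illustration using urllib.parse, not Cocoon code, and it assumes that the browser submits form data in the encoding of the page it was served, which is the usual behaviour. The request parameter only decodes correctly when it is parsed with that same encoding.

```python
from urllib.parse import quote, unquote

word = "się"  # sample form value, assumed for illustration

# What would typically be sent for a page served as ISO-8859-2 vs. UTF-8.
sent_latin2 = quote(word, encoding="iso-8859-2")
sent_utf8 = quote(word, encoding="utf-8")

# Parsing with the encoding the page was served in recovers the word.
ok = unquote(sent_latin2, encoding="iso-8859-2")

# Parsing with the wrong encoding does not: the byte 0xEA is not valid
# UTF-8 here, so it is replaced with U+FFFD.
wrong = unquote(sent_latin2, encoding="utf-8")

print(sent_latin2, sent_utf8, ok, wrong)
```

With UTF-8 used for every page, a single request encoding covers all locales.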

But, if I have got these characters as characters not entities, then I could encode it as iso-8859-1, and serialize as UTF-8, and the necessary translation would happen?

IIUC this question is answered with the section above, isn't it?


Joerg
