On 17.04.2004 22:33, Upayavira wrote:
A few I18N questions:
1) I have some Polish text that uses character entities, such as "się". How can I translate this into a single- or double-byte character in either ISO-8859-1, ISO-8859-2 or UTF-8?
They are not translated. While you have the entity representation in the XML files, you have characters in Java. Only the serializer decides whether it writes them out as a character or as a character entity. In general this can't be influenced, but one serializer or another might have configuration options for it. In the end (i.e. in the browser or wherever) both the entity and the character should work, as they represent the same "thing".
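To illustrate the point outside Cocoon, here is a minimal Python sketch using the standard library's `xml.etree.ElementTree` (not a Cocoon serializer). It assumes a tiny made-up `<p>` document: the parser yields the same in-memory string for the entity and for the literal character, and only the output encoding decides which form is written.

```python
import xml.etree.ElementTree as ET

# Two inputs for the same text: a numeric character reference and the
# literal character U+0119 (e with ogonek).
from_entity = ET.fromstring("<p>si&#281;</p>")
from_char = ET.fromstring("<p>si\u0119</p>")

# After parsing, both are the same string in memory.
same_text = from_entity.text == from_char.text

# The serializer alone decides the output form:
# US-ASCII cannot represent U+0119, so it is written as a reference;
# UTF-8 can, so the character is written as raw bytes.
as_ascii = ET.tostring(from_char, encoding="us-ascii")
as_utf8 = ET.tostring(from_char, encoding="utf-8")
```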
Ah. But if I want to convert the entities into characters as a one-off offline event (e.g. in a text editor, or perl script)?
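One possible way to do such a one-off conversion is sketched below in Python rather than perl. The function name `expand_numeric_refs` is made up for this example. It assumes the entities are numeric references such as `&#281;` or `&#x119;`, and it deliberately leaves named entities and references to XML markup characters alone so the file stays well-formed.

```python
import re

# Matches decimal (&#281;) and hexadecimal (&#x119;) character references.
_NUMERIC_REF = re.compile(r"&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));")


def expand_numeric_refs(text):
    """Replace numeric character references with the characters they name."""

    def repl(match):
        hex_digits, dec_digits = match.group(1), match.group(2)
        code = int(hex_digits, 16) if hex_digits else int(dec_digits)
        if code > 0x10FFFF:
            return match.group(0)  # not a valid code point, keep as-is
        char = chr(code)
        # Keep references to markup characters so the XML stays well-formed.
        return match.group(0) if char in "<>&\"'" else char

    return _NUMERIC_REF.sub(repl, text)
```

To apply it to a file, read the file with the encoding it was saved in, pass the text through `expand_numeric_refs`, and write the result back out as UTF-8.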
2) I can set the encoding of a page in the serialiser configuration. How do I deal with the situation where the best encoding depends upon the language, which means that the encoding should be chosen based upon the encoding of a source file?
That's not possible. As written above, you have more or less encoding-neutral characters in Java (obviously not completely, as somewhere in memory they are also just bytes). But at least they are independent of the encoding of the original file. You do not know which encoding the XML file was in. You have to decide the serializer's encoding based only on the possible character range. If the characters are spread across several ISO character sets, it is generally better to use UTF-8. Another option would be to use a selector based on the user's locale, which chooses the serializer (with a specific encoding).
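The idea of choosing an encoding from the character range alone can be sketched in Python as follows. This is only an illustration of the principle, not Cocoon configuration; the function name `pick_encoding` and the candidate list are assumptions made for the example.

```python
def pick_encoding(text, candidates=("iso-8859-1", "iso-8859-2")):
    """Return the first candidate that can represent every character.

    Falls back to UTF-8, which can represent any character.
    """
    for name in candidates:
        try:
            text.encode(name)
        except UnicodeEncodeError:
            continue  # some character is outside this charset
        return name
    return "utf-8"
```

Text that mixes characters from several ISO character sets falls through every candidate and ends up as UTF-8, which matches the advice above.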
So UTF-8 is a good encoding to use, it sounds like. So, if I have multiple languages, it is best to aim for UTF-8 as a source encoding, and serialize to that.
But, if I have got these characters as characters not entities, then I could encode it as iso-8859-1, and serialize as UTF-8, and the necessary translation would happen?
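A minimal Python sketch of that round trip, assuming the source bytes are ISO-8859-2 (Polish letters such as ę exist in ISO-8859-2 but not in ISO-8859-1). The conversion only comes out right if the bytes are decoded with the encoding they were actually written in.

```python
# The word "sie" with e-ogonek, stored as ISO-8859-2 bytes (0xEA).
raw = b"si\xea"

# Decode with the real source encoding, then encode as UTF-8.
text = raw.decode("iso-8859-2")
utf8_bytes = text.encode("utf-8")

# Decoding with the wrong encoding does not fail; it silently yields a
# different character (U+00EA, e with circumflex) for the same byte.
wrong = raw.decode("iso-8859-1")
```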
Those i18n-ignorant Englishmen! ;-)
Yup. But at least this one's willing to learn (at last!)
Regards, Upayavira
