From: Dmitry I. Platonoff [mailto:[EMAIL PROTECTED]]
> There have been a number of discussions lately about the problems with
> incorrect character encodings. Commonly, people face the situation where the
> parameters of a servlet (which usually come from a form submitted by the user)
> are parsed and built using the wrong character encoding, which results in
> complete and unrecoverable loss of any text information supplied by the user.

As a start, I would suggest *accepting* UTF8 everywhere ASCII (8859_1)
is accepted now.  I don't think this will break anything for 7-bit ASCII
input, and it would support all (or most?) local character sets.

I think this is pretty much a no-brainer.
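
To make the "accepting" case concrete, here is a minimal sketch, assuming
Java 7 or later for StandardCharsets.  The class and helper names are made
up for illustration and are not part of the servlet API.  It decodes the same
raw bytes both ways: plain ASCII bytes come out identical, UTF8 bytes only
come out right under UTF8, and a lone 8859_1 byte above 0x7F is not valid
UTF8, which is the one case where "won't break anything" needs care.

```java
import java.nio.charset.StandardCharsets;

class AcceptUtf8Sketch {
    // Hypothetical helper: decode raw parameter bytes as UTF-8.
    static String decodeUtf8(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    // Hypothetical helper: decode the same bytes as ISO 8859-1 (Latin-1).
    static String decodeLatin1(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] ascii = {0x6e, 0x61, 0x6d, 0x65};          // "name" in 7-bit ASCII
        byte[] utf8 = {(byte) 0xD0, (byte) 0x94};         // U+0414 encoded as UTF-8
        byte[] latin1 = {(byte) 0xE9};                    // U+00E9 encoded as 8859_1

        // ASCII bytes decode to the same string either way.
        System.out.println(decodeUtf8(ascii).equals(decodeLatin1(ascii)));
        // UTF-8 bytes give one character under UTF-8, two wrong ones under 8859_1.
        System.out.println(decodeUtf8(utf8).length());
        System.out.println(decodeLatin1(utf8).length());
        // A lone high 8859_1 byte is malformed UTF-8 and becomes U+FFFD.
        System.out.println(decodeUtf8(latin1).equals("\uFFFD"));
    }
}
```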

Second, you might consider *generating* UTF8 everywhere ASCII (8859_1)
is generated now.  I don't think this will break anything, as any code
that successfully generates 7-bit ASCII now will continue to encode into the
exact same bytes under UTF8 (only 8859_1 characters above 127 come out
differently, as two bytes each).  Code that attempts to encode non-ASCII
characters into ASCII currently fails or loses data (Java typically
substitutes '?').  If the default encoding were changed to UTF8, that
encoding would always succeed.

I think this *might* be a no-brainer.
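
A rough sketch of the "generating" case, again assuming Java 7 or later and
using made-up class and method names.  It compares the bytes produced by
8859_1 and UTF8 for the same string, and shows what String.getBytes does
when a character does not fit in ASCII.  For the ASCII range the bytes are
identical; characters above 127 take two bytes in UTF8.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class GenerateUtf8Sketch {
    // True when 8859_1 and UTF-8 produce byte-for-byte identical output.
    static boolean sameBytes(String s) {
        return Arrays.equals(s.getBytes(StandardCharsets.ISO_8859_1),
                             s.getBytes(StandardCharsets.UTF_8));
    }

    // Number of bytes the string occupies once encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Encode to ASCII and decode again; unmappable characters are replaced by '?'.
    static String throughAscii(String s) {
        return new String(s.getBytes(StandardCharsets.US_ASCII),
                          StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        System.out.println(sameBytes("name=value"));   // pure ASCII text
        System.out.println(sameBytes("caf\u00e9"));    // contains U+00E9
        System.out.println(utf8Length("caf\u00e9"));   // 3 ASCII bytes + 2 for U+00E9
        System.out.println(throughAscii("\u0414"));    // Cyrillic letter, not in ASCII
    }
}
```

Note that String.getBytes substitutes silently; getting an exception instead
would take a CharsetEncoder configured to report unmappable characters.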

Third, you might want the default file encoding for files read and written
to disk to be UTF8, so that any data received off the web (in any encoding)
can always be successfully saved to disk using the default encoding.

This means invoking Java like: "java -Dfile.encoding=UTF8 ...".

If the application needs to store some local files in another encoding
(like EBCDIC :) then this would be explicitly coded for (say) files saved
into a particular directory.
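
As a sketch of "explicitly coded for", the following passes the encoding to
the reader and writer directly instead of relying on the file.encoding
default.  It assumes Java 8 or later; the class name, method names, and temp
file prefix are invented for the example.  An EBCDIC charset name such as
"Cp037" could be passed the same way, assuming the JRE in use ships that
charset.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;

class ExplicitFileEncodingSketch {
    // Write text using the given charset rather than the platform default.
    static void write(File file, String text, Charset cs) throws IOException {
        try (Writer out = new OutputStreamWriter(new FileOutputStream(file), cs)) {
            out.write(text);
        }
    }

    // Read the whole file back using the given charset.
    static String read(File file, Charset cs) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader in = new InputStreamReader(new FileInputStream(file), cs)) {
            char[] buf = new char[1024];
            int n;
            while ((n = in.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    // Save text to a temporary file in the named charset and read it back.
    static String roundTrip(String text, String charsetName) {
        Charset cs = Charset.forName(charsetName);
        try {
            File tmp = File.createTempFile("encoding-sketch", ".txt");
            try {
                write(tmp, text, cs);
                return read(tmp, cs);
            } finally {
                tmp.delete();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String text = "\u0414\u043c\u0438\u0442\u0440\u0438\u0439"; // Cyrillic sample
        // UTF-8 can represent the text, so it survives the trip to disk.
        System.out.println(roundTrip(text, "UTF-8").equals(text));
    }
}
```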


Of course, as an English-speaking (only) US programmer,
I'm perfectly happy to continue using ASCII... :).


> 1. INTRODUCTION.
>
> I don't know who invented the ASCII table. But I want these people to be
> sorry for what they did. :) The lion's share of all the i18n problems we
> have is caused by this perfect example of selfishness and ignorance.
>
> This is supposed to be a joke, and I do understand the conditions and
> limitations people had back then, but this is just one more sad example of
> what will happen if we forget to think about the interests of others. And
> we will have to clean up this particular mess for years still.

I think you need more of a sense of history here :).

Old machines could use even fewer bits per character (6 bits), as memory and
disk were *much* more expensive, and the ability to store 25% more in your
*very* expensive computer was worth the limitations.  It wasn't a lack of
foresight so much as the extreme expense of then-current hardware.

So give "these people" a break :).
