On 5/18/11 12:11 AM, James Kosin wrote:
+1

But I'd also like to see a mapping from each language to a default
encoding.  Or automatic support in Java for the language and encoding,
taken from the OS first, with override options for those working with
languages other than the native one.

Making the encoding depend on the language is not really well-defined:
which encoding do I end up with when I specify French as the language?

The encoding could default to the platform encoding and additionally be overridable by the user.
We decided instead that the user must always specify the encoding, because
then he has to think about which encoding the training/test data is in.

Since training is often done for foreign languages, I believe this prevents many users from
simply running with an incorrect default encoding.
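
To illustrate the point, here is a minimal Java sketch, not the actual OpenNLP code. The class name ExplicitEncodingSketch and the helpers requireEncoding and decode are made up for this example. It refuses to fall back to the platform default, and it shows what happens when UTF-8 data is read with the wrong encoding (ISO-8859-1 stands in for a mismatched platform default here).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ExplicitEncodingSketch {

    // Hypothetical helper: resolves the encoding name given by the user
    // and never falls back to Charset.defaultCharset().
    static Charset requireEncoding(String encodingName) {
        if (encodingName == null || encodingName.trim().isEmpty()) {
            throw new IllegalArgumentException(
                "An encoding must be specified explicitly, e.g. UTF-8");
        }
        // Charset.forName throws if the name is illegal or unsupported.
        return Charset.forName(encodingName.trim());
    }

    // Hypothetical helper: decodes raw bytes with the user-specified encoding.
    static String decode(byte[] data, String encodingName) {
        return new String(data, requireEncoding(encodingName));
    }

    public static void main(String[] args) {
        // "caf\u00e9" is written with a Unicode escape so that this source
        // file itself does not depend on the compiler's input encoding.
        byte[] utf8 = "caf\u00e9".getBytes(StandardCharsets.UTF_8);

        String right = decode(utf8, "UTF-8");
        String wrong = decode(utf8, "ISO-8859-1");

        // The correctly decoded text round-trips; the other one does not.
        System.out.println("UTF-8 matches original:      " + right.equals("caf\u00e9"));
        System.out.println("ISO-8859-1 matches original: " + wrong.equals("caf\u00e9"));
        // For comparison only: what the platform would have picked silently.
        System.out.println("Platform default: " + Charset.defaultCharset());
    }
}
```

With the wrong encoding nothing fails at read time; the text is simply decoded into different characters, which is why an explicit choice seems safer than a silent default.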

Anyway, I also use OS X, where MacRoman is the default encoding, which is simply incompatible
with all the training data I have.

Jörn

