My understanding is that Java's default encoding is OS- and locale-
dependent, not UTF-8. For example, on my US-based Mac, I get
"MacRoman" as the default character set when running
"System.out.println(java.nio.charset.Charset.defaultCharset())".
Changing this is not hard, but may confuse other users who rely on
the platform-dependent nature of the default. I'm somewhat reluctant
to impose UTF-8 as the default, so I'll look into providing an
explicit command line option. In the meantime, I suggest a little
script that reads your file in UTF-8 (or whatever your preferred
encoding is) and converts non-ASCII characters into Unicode escapes.
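A minimal sketch of such a script, in Java, might look like the following. It is only an illustration under a few assumptions: the class name EscapeNonAscii and the two command line arguments (input path, output path) are made up for this example, the input is assumed to be UTF-8, and every non-ASCII character is escaped wherever it occurs, not only inside character and string literals. Characters outside the Basic Multilingual Plane come out as two escapes (a surrogate pair).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class EscapeNonAscii {

  /** Replace each non-ASCII UTF-16 code unit with a backslash-u escape of four hex digits. */
  public static String escape(String text) {
    StringBuilder out = new StringBuilder(text.length());
    for (int i = 0; i < text.length(); i++) {
      char c = text.charAt(i);
      if (c < 0x80) {
        out.append(c);                                  // plain ASCII passes through
      } else {
        out.append(String.format("\\u%04x", (int) c));  // e.g. U+00E9 becomes backslash, 'u', 00e9
      }
    }
    return out.toString();
  }

  public static void main(String[] args) throws IOException {
    // args[0]: UTF-8 encoded input file, args[1]: ASCII-only output file
    // (both are hypothetical paths chosen by the caller).
    byte[] raw = Files.readAllBytes(Paths.get(args[0]));
    String text = new String(raw, StandardCharsets.UTF_8);

    // Drop a leading byte order mark, which some editors write at the start of the file.
    if (!text.isEmpty() && text.charAt(0) == 0xFEFF) {
      text = text.substring(1);
    }

    Files.write(Paths.get(args[1]), escape(text).getBytes(StandardCharsets.US_ASCII));
  }
}
```

Something like "java EscapeNonAscii MyGrammar.utf8 MyGrammar.rats" (file names are placeholders) would then produce an ASCII-only file that reads the same under any default encoding.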
Robert
On Feb 5, 2007, at 7:45 PM, Steven Foster wrote:
Thanks, Robert. I am puzzled because I thought that UTF-8 IS the
default charset for Java. Is that incorrect?
How difficult would it be to change Rats! to read UTF-8 files?
For our purposes, it will be much easier to incorporate UTF-8 text in
the rats file than to represent it as escaped hex.
- Steven
Robert Grimm wrote:
Right now, Rats! just uses Java's default character encoding for
reading and writing files. I guess I might/should change that to
UTF-8. You can, however, use unicode escapes ('\\' 'u' hex hex hex
hex) in character and string literals to denote non-ASCII characters.
Robert
On Feb 5, 2007, at 7:11 PM, Steven Foster wrote:
Hi everyone,
It seems I can't write *.rats source files in Unicode encodings
(neither UTF-8, UTF-16, nor UCS-2). Rats! immediately gives an
error message about invalid characters at the beginning of the file.
Do I need to recompile Rats! to read Unicode source files?
Or have I edited the *.rats file with the wrong editor? (I tried
several.)
I have tried both big-endian and little-endian utf-16.
Thank you very much!
Steven
_______________________________________________
PEG mailing list
PEG@lists.csail.mit.edu
https://lists.csail.mit.edu/mailman/listinfo/peg