Hi!

I've been following the recent SVN log entries about encoding information in 
CHARSXPs with great interest. This looks like a very nice addition. While 
this is still a work in progress, I'd like to suggest the following addition:

At least in RKWard, all displayed strings need to be converted to UTF-8 (the 
internal storage format used in Qt QStrings). This needs to be done 
independently of the current locale and of the encoding used in the embedded R 
process. I imagine other graphical or non-graphical toolkits will similarly 
use UTF-8 to store strings internally.

For this reason, an addition such as

char* Rf_translateCharToUTF8(SEXP);

would be nice. This function would translate to UTF-8 independently of the 
current LC_CTYPE. It is possible to achieve the same effect by first 
translating the strings to the current LC_CTYPE encoding (using 
Rf_translateChar()), and then translating to UTF-8 in a second step (using 
custom means, if needed). However, being able to do this conversion in a 
single step would be more elegant, and could also avoid expensive recoding 
steps.
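
To illustrate the second step of that two-step route, here is a rough sketch 
using POSIX iconv. It is only an illustration under assumptions: to_utf8() is 
a made-up helper name, not part of R, and the "from" argument would have to be 
the encoding that Rf_translateChar() translated to, i.e. the codeset of the 
current locale (on POSIX systems, nl_langinfo(CODESET) could supply that).

```c
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of R): convert the NUL-terminated string
 * `in` from the encoding named `from` to UTF-8.
 * Returns a malloc()ed string that the caller must free(), or NULL on error. */
char *to_utf8(const char *in, const char *from)
{
    iconv_t cd = iconv_open("UTF-8", from);
    if (cd == (iconv_t) -1)
        return NULL;                    /* conversion not supported */

    size_t inleft = strlen(in);
    size_t outsize = 4 * inleft + 1;    /* generous upper bound, plus NUL */
    size_t outleft = outsize - 1;
    char *out = malloc(outsize);
    if (out == NULL) {
        iconv_close(cd);
        return NULL;
    }

    char *inp = (char *) in;            /* iconv() does not modify the input */
    char *outp = out;
    if (iconv(cd, &inp, &inleft, &outp, &outleft) == (size_t) -1) {
        free(out);                      /* invalid or incomplete input */
        iconv_close(cd);
        return NULL;
    }
    *outp = '\0';
    iconv_close(cd);
    return out;
}
```

A caller would pass the result of Rf_translateChar() as "in". Note that any 
character that cannot be represented in the locale's encoding is already lost 
by then, which is one more reason to prefer a single-step translation.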

Alternatively, having access to the IS_UTF8 and IS_LATIN1 macros from C would 
be good enough to hand-code efficient conversion to UTF-8 (but may be too 
close to the internals).
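
As a sketch of what such a hand-coded conversion could look like for the 
Latin-1 case: latin1_to_utf8() below is a hypothetical name, and I am assuming 
the caller has already established (e.g. via something like IS_LATIN1) that 
the string really is Latin-1. Strings already marked as UTF-8 could simply be 
passed through unchanged.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of R): convert a NUL-terminated Latin-1
 * string to UTF-8. Every Latin-1 byte maps to the Unicode code point of the
 * same value, so bytes below 0x80 are copied and all others become two bytes.
 * Returns a malloc()ed string that the caller must free(), or NULL if
 * allocation fails. */
char *latin1_to_utf8(const char *in)
{
    size_t len = strlen(in);
    char *out = malloc(2 * len + 1);    /* at most 2 output bytes per input byte */
    if (out == NULL)
        return NULL;

    unsigned char *p = (unsigned char *) out;
    for (const unsigned char *s = (const unsigned char *) in; *s; ++s) {
        if (*s < 0x80) {
            *p++ = *s;                                   /* plain ASCII */
        } else {
            *p++ = (unsigned char) (0xC0 | (*s >> 6));   /* lead byte */
            *p++ = (unsigned char) (0x80 | (*s & 0x3F)); /* continuation byte */
        }
    }
    *p = '\0';
    return out;
}
```

This needs no iconv at all, which is why access to the encoding flags alone 
would already be sufficient for my purposes.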

I'm not sure whether this is considered important enough to warrant inclusion 
in the API, but I wanted to throw in the idea in time.

Regards
Thomas Friedrichsmeier

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
