[Rd] latin1,utf-8...encoding and data

Stéphane Dray Wed, 18 Oct 2006 08:13:34 -0700

Hello,
I have some questions about encoding and package distribution. We 
develop the ade4 package. Some data sets included in the package 
contain accented characters (e.g. é, è...). These data sets were 
saved using latin1 encoding, but some of us use utf-8 and cannot 
display the data sets that contain accented characters.
For example:


library(ade4)
data(rankrock)
rankrock

In this case, the characters are in the row names. Other data sets have 
such characters in the data themselves (e.g. levels of factors). One 
solution is to use iconv. This is quite easy for us, but perhaps more 
difficult for a user who may have no idea of the problem. The problem is 
quite marginal for the moment, but some Linux distributions use utf-8 by 
default (e.g. Ubuntu), and I suppose that it will become more and more 
common in the future.
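
A minimal sketch of the iconv workaround, assuming the affected object is a data frame whose row names, column names, character columns and factor levels hold latin1 bytes. The helper names latin1_to_utf8 and latin1_string and the toy data are hypothetical and are not part of ade4.

```r
# Hypothetical helper: re-encode the latin1 strings of a data frame as UTF-8.
# Assumption: every string in 'df' really is latin1; strings that are already
# UTF-8 would be double-encoded by this conversion.
latin1_to_utf8 <- function(df) {
  conv <- function(x) iconv(x, from = "latin1", to = "UTF-8")
  rownames(df) <- conv(rownames(df))
  names(df) <- conv(names(df))
  for (j in seq_along(df)) {
    if (is.factor(df[[j]])) {
      # Factors store their labels in the levels attribute.
      levels(df[[j]]) <- conv(levels(df[[j]]))
    } else if (is.character(df[[j]])) {
      df[[j]] <- conv(df[[j]])
    }
  }
  df
}

# Hypothetical helper: build a latin1 string from raw bytes, so that this
# example does not depend on the encoding of the source file.
latin1_string <- function(bytes) {
  s <- rawToChar(as.raw(bytes))
  Encoding(s) <- "latin1"
  s
}

# Toy data standing in for a data set such as rankrock.
toy <- data.frame(
  rock = factor(c(latin1_string(c(0x67, 0x72, 0xe8, 0x73)), "granite")),
  row.names = c(latin1_string(c(0xe9, 0x74, 0xe9)), "hiver")
)

fixed <- latin1_to_utf8(toy)
Encoding(rownames(fixed))
```

The converted object could then be saved again with save(), so that users do not have to call iconv themselves. Whether that is the proper way to distribute such data sets is exactly the open question.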

So we wonder whether there is a proper way to encode and save these data 
sets. I have found some documents by B. Ripley and this note:

http://developer.r-project.org/210update.txt

  -  Names in data objects (e.g. in .rda files) are problematic.  It
     is likely that by release time these will be treated as in
     Latin-1.

As far as I can tell, I have not found an answer to this problem.

What are the plans of the R gurus on this question?
Thanks a lot.
Sincerely.

Please include my address in replies, as I am not a subscriber of this list.


-- 
Stéphane DRAY ([EMAIL PROTECTED] )
Laboratoire BBE-CNRS-UMR-5558, Univ. C. Bernard - Lyon I
43, Bd du 11 Novembre 1918, 69622 Villeurbanne Cedex, France
Tel: 33 4 72 43 27 57       Fax: 33 4 72 43 13 88
http://biomserv.univ-lyon1.fr/~dray/

______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
