Re: [R] Umlaut read from csv-file

Prof Brian Ripley Fri, 07 Nov 2008 23:06:14 -0800

We have no idea what you understood (you didn't tell us), but the help says


encoding: character vector.  The encoding(s) to be assumed when 'file'
          is a character string: see 'file'.  A possible value is
          '"unknown"': see the ‘Details’.


...
     This paragraph applies if 'file' is a filename (rather than a
     connection).  If 'encoding = "unknown"', an attempt is made to
     guess the encoding.  The result of 'localeToCharset()' is used as
     a guide.  If 'encoding' has two or more elements, they are tried
     in turn until the file/URL can be read without error in the trial
     encoding.

So source(encoding="latin1") says the file is encoded in Latin-1 and should be re-encoded if necessary (e.g. in a UTF-8 locale).


Setting the Encoding of parsed character strings is not mentioned.

You could have written out a data frame with write.csv() and re-read it with read.csv(encoding = "latin1"): that was the workaround you were given earlier (not to use source).
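
For illustration, a minimal sketch of that workaround (not part of the original thread). The temporary file is written with writeLines(useBytes = TRUE) instead of write.csv() so that its bytes are Latin-1 whatever locale the sketch is run in; the names us_latin1, tmp and dat are made up for the example.

```r
# Build the example vector with \u escapes so the script itself is pure ASCII.
us <- c("a", "b", "c", "\u00e4", "\u00f6", "\u00fc")

# Convert to Latin-1 bytes; iconv() marks the non-ASCII results as "latin1".
us_latin1 <- iconv(enc2utf8(us), from = "UTF-8", to = "latin1")

# Write a one-column CSV (header "us") whose bytes are Latin-1 in any locale.
tmp <- tempfile(fileext = ".csv")
writeLines(c("us", us_latin1), tmp, useBytes = TRUE)

# encoding = "latin1" declares the encoding of the strings read from the file:
# it marks them as Latin-1 rather than re-encoding them.
dat <- read.csv(tmp, encoding = "latin1", stringsAsFactors = FALSE)

Encoding(dat$us)   # pure ASCII elements are never marked
enc2utf8(dat$us)   # the same characters, converted to UTF-8 where needed

unlink(tmp)
```

If the file should instead be converted to the current locale while it is read, read.csv(fileEncoding = "latin1") is the argument that requests re-encoding.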


On Sat, 8 Nov 2008, Heinz Tuechler wrote:

At 16:52 07.11.2008, Prof Brian Ripley wrote:
On Fri, 7 Nov 2008, Peter Dalgaard wrote:
Heinz Tuechler wrote:
Dear Prof. Ripley!

Thank you very much for your attention. In the given example, Encoding()
or the encoding parameter of read.csv solves the problem. I hope your
patch will also solve the problem when I read an SPSS file with
spss.get(), since this function has no encoding parameter and my real
problem originated there.
read.spss() (package foreign) does have a reencode argument, though; and
this is called by spss.get(), so it looks like an easy hack to add it
there.
Yes, older software like spss.get needs to get updated for the internationalization age. Modifying it to have a ... argument passed to read.spss would be a good idea (and future-proofing).
In cases like this it is likely that the SPSS file does contain its encoding (although sometimes it does not and occasionally it is wrong), so it is helpful to make use of the info if it is there. However, the default is read.spss(reencode=NA) because the problems of assuming that the info is correct when it is not are worse.
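
A sketch of the '...' pass-through suggested above, written as a stand-alone wrapper rather than as a change to spss.get() itself. The function name read_spss_df and the file name are hypothetical; the only read.spss() arguments used are the to.data.frame and reencode arguments documented in package foreign.

```r
# Hypothetical wrapper: forwards any extra arguments (such as reencode)
# to foreign::read.spss() through '...'.
read_spss_df <- function(file, ...) {
  foreign::read.spss(file, to.data.frame = TRUE, ...)
}

# Usage, assuming a file "survey.sav" that is known to be encoded in Latin-1:
# dat <- read_spss_df("survey.sav", reencode = "latin1")
```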
The reason I tried the example below was to solve the encoding problem by dumping and then re-sourcing a data.frame with the encoding parameter set to latin1. As you can see, source(x, encoding='latin1') does not have the effect I expected. Unfortunately I do not have any idea what I misunderstood regarding the meaning of encoding='latin1'.
Heinz Tüchler


us <- c("a", "b", "c", "ä", "ö", "ü")
Encoding(us)
[1] "unknown" "unknown" "unknown" "latin1"  "latin1"  "latin1"
dump('us', 'us_dump.txt')
rm(us)
source('us_dump.txt', encoding='latin1')
us
[1] "a" "b" "c" "ä" "ö" "ü"
Encoding(us)
[1] "unknown" "unknown" "unknown" "unknown" "unknown" "unknown"
unlink('us_dump.txt')
--
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


