On Wed, 2 Aug 2006, Thomas Kuster wrote:
Hello
When I read an SPSS *.por file with read.spss, everything after an umlaut is
missing:
This sounds like a conflict between encodings -- e.g. if R is assuming UTF-8
and the file is encoded in Latin-1, then the sequence
U+00FC : LATIN SMALL LETTER U WITH DIAERESIS
U+0072 : LATIN SMALL LETTER R
is coded as FC72 in the file, which is an illegal byte sequence in UTF-8.
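A small sketch of the byte-level point above, assuming a recent R (validUTF8() was added in R 3.3.0, so it is not available in the R 2.1.0 mentioned in this thread). The variable names are only for illustration; this does not touch the .por file itself.

```r
# The two bytes a Latin-1 file would contain for "u with diaeresis" + "r"
latin1_bytes <- as.raw(c(0xfc, 0x72))
x <- rawToChar(latin1_bytes)

# Read as UTF-8, this byte sequence is not valid
validUTF8(x)                        # FALSE

# Converting from Latin-1 gives the UTF-8 form of the same two characters
y <- iconv(x, from = "latin1", to = "UTF-8")
charToRaw(y)                        # c3 bc 72
validUTF8(y)                        # TRUE
```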
The underlying C code (written in the US quite a long time ago) doesn't
know about encodings, and I don't know what the rules are in SPSS for valid
characters (I suspect that with these old portable file formats it probably
just reads and writes bytes, leaving it up to the OS to interpret them).
You could try running R in a non-UTF-8 locale to see if it helps.
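One way to try this from inside a session is sketched below. The helper with_ctype() is hypothetical (not part of R or the foreign package), and the locale name "de_CH.ISO-8859-1" is an assumption; the names actually available depend on the system (on Linux, `locale -a` lists them).

```r
# Hypothetical helper: evaluate `expr` with LC_CTYPE temporarily set to
# `locale`, restoring the previous setting afterwards.
with_ctype <- function(locale, expr) {
  old <- Sys.getlocale("LC_CTYPE")
  on.exit(Sys.setlocale("LC_CTYPE", old), add = TRUE)
  # Sys.setlocale() returns "" (with a warning) if the locale is unavailable
  new <- suppressWarnings(Sys.setlocale("LC_CTYPE", locale))
  if (identical(new, "")) stop("locale not available: ", locale)
  expr  # lazily evaluated here, i.e. after the locale switch
}

# Assumed locale name; only run if the file from the question is present
if (file.exists("projets.por")) {
  spssdaten <- with_ctype("de_CH.ISO-8859-1",
                          foreign::read.spss("projets.por"))
  print(attr(spssdaten$PROJETX, "value.labels")[1:20])
}
```

Switching LC_CTYPE inside a running session may not affect everything, so starting R with the locale already set in the environment (for example via LC_ALL) is the more direct test of the suggestion.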
If anyone has definitive information about how SPSS represents strings and
decides on valid characters that might be useful too.
-thomas
library("foreign")
spssdaten <- read.spss("projets.por")
attr(spssdaten$PROJETX, "value.labels")[1:20]
Bg Stammzellenforschung Bb
863 862
Bb Neugestaltung des Finanzausgleichs
861 854
EV Postdienste f Bb
853 852
Bb Bg Steuerpaket
851 843
Bb Anhebung der Mehrwertsteuer s 11. AHV-Revision
842 841
Volkinitiative Lebenslange Verwahrung
833 832
Gegenentwurf zur Avanti EV Lehrstellen-Initiative
831 824
EV Moratorium Plus EV Strom ohne Atom
823 822
EV Ja zu fairen Mieten EV Gleiche Rechte f
821 815
EV Gesundheitsinitiative EV Sonntags-Initiative
814 813
The SPSS-File is okay:
system("cat projets.por |grep Postdienste")
echtserwerb 3. GenerationSD/N/EV Postdienste für alleSE/16/Änderrung Bg EOG
Mut
How can I read the SPSS-File with the Umlaut?
Bye
Thomas Kuster
R: 2.1.0 (2005-04-18)
OS: Debian Linux, 2.6.10-isgee-neptun-1
______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Thomas Lumley Assoc. Professor, Biostatistics
[EMAIL PROTECTED] University of Washington, Seattle