On Wed, 2 Aug 2006, Thomas Kuster wrote:

Hello

When I read an SPSS *.por file with read.spss, everything after an umlaut is
missing:

This sounds like a conflict between encodings. For example, if R is assuming UTF-8 and the file is encoded in Latin-1, then the sequence
U+00FC : LATIN SMALL LETTER U WITH DIAERESIS
U+0072 : LATIN SMALL LETTER R
is coded as the two bytes FC 72 in the file, which is an illegal byte sequence in UTF-8 (0x72 is not a continuation byte, so nothing valid can start with FC 72).
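
As a rough illustration of the point above, the two bytes can be inspected directly in R. This is only a sketch: validUTF8() needs a newer R (3.3.0 or later, as far as I know) than the 2.1.0 mentioned in this thread, and the variable names are made up for the example.

```r
# Build the two-byte string FC 72 ("ür" in Latin-1) from raw bytes.
latin1_bytes <- as.raw(c(0xFC, 0x72))
x <- rawToChar(latin1_bytes)

# FC 72 is not well-formed UTF-8, so this check fails.
is_valid <- validUTF8(x)

# Declare the source encoding explicitly and convert to UTF-8.
y <- iconv(x, from = "latin1", to = "UTF-8")

# In UTF-8 the u-umlaut becomes the two bytes C3 BC, followed by 72.
utf8_bytes <- charToRaw(y)
print(is_valid)
print(utf8_bytes)
```

Note that this only shows why the bytes are rejected as UTF-8. It does not show where the label text gets cut off when read.spss is used.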

The underlying C code (written in the US quite a long time ago) doesn't know about encodings, and I don't know what the rules are in SPSS for valid characters. I suspect that with these old portable file formats it probably just reads and writes bytes, leaving it up to the OS to interpret them.

You could try running R in a non-UTF-8 locale to see if it helps.
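
One way to try this from inside R is sketched below. The locale name is an assumption (it has to be one that is actually installed; `locale -a` lists them on Debian), and the helper name with_ctype is invented for illustration. Whether this changes what read.spss returns is exactly what the suggestion is meant to find out; I have not verified it.

```r
# Hypothetical helper: switch LC_CTYPE to another locale, run a reader
# function, then restore the previous setting on exit.
with_ctype <- function(locale, expr_fun) {
  old <- Sys.getlocale("LC_CTYPE")
  on.exit(Sys.setlocale("LC_CTYPE", old), add = TRUE)
  # Sys.setlocale returns "" (with a warning) if the locale is unavailable.
  new <- suppressWarnings(Sys.setlocale("LC_CTYPE", locale))
  if (identical(new, "")) {
    warning("locale not available: ", locale)
  }
  expr_fun()
}

# Usage (file name taken from the original post; locale name assumed):
# library("foreign")
# spssdaten <- with_ctype("de_CH.ISO-8859-1",
#                         function() read.spss("projets.por"))
```

Restoring the old locale in on.exit keeps the rest of the session unaffected, so the test can be repeated with different locale names.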

If anyone has definitive information about how SPSS represents strings and decides on valid characters, that would be useful too.

        -thomas

library("foreign")
spssdaten <- read.spss("projets.por")
attr(spssdaten$PROJETX, "value.labels")[1:20]
             Bg Stammzellenforschung                                  Bb
                                 863                                   862
Bb Neugestaltung des Finanzausgleichs
                                 861                                   854
                    EV Postdienste f                                   Bb
                                 853                                   852
                                 Bb                         Bg Steuerpaket
                                 851                                   843
    Bb Anhebung der Mehrwertsteuer s                      11. AHV-Revision
                                 842                                   841
Volkinitiative Lebenslange Verwahrung
                                 833                                   832
             Gegenentwurf zur Avanti             EV Lehrstellen-Initiative
                                 831                                   824
                  EV Moratorium Plus                    EV Strom ohne Atom
                                 823                                   822
              EV Ja zu fairen Mieten                   EV Gleiche Rechte f
                                 821                                   815
            EV Gesundheitsinitiative                EV Sonntags-Initiative
                                 814                                   813

The SPSS file itself is okay:
system("cat projets.por |grep Postdienste")
echtserwerb 3. GenerationSD/N/EV Postdienste für alleSE/16/Änderrung Bg  EOG
Mut

How can I read the SPSS file with the umlauts intact?

Bye
Thomas Kuster

R: 2.1.0 (2005-04-18)
OS: Debian Linux, 2.6.10-isgee-neptun-1

______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Thomas Lumley                   Assoc. Professor, Biostatistics
[EMAIL PROTECTED]       University of Washington, Seattle