[R] read.spss and encodings

Thomas Friedrichsmeier Thu, 01 Feb 2007 04:54:42 -0800

Hi!

I'm having trouble importing SPSS files containing non-ASCII characters 
(R 2.4.1, Debian Linux, i386). To reproduce:


Download the following file: 
http://statmath.wu-wien.ac.at/data/spss/de/comphomeneu.sav

require (foreign)
Sys.setlocale (locale="C")
read.spss("comphomeneu.sav")$ARBEIT[1]
# prints:
# [1] im B\374ro
# Levels: im B\374ro zuhause

\374 is of course the octal escape for byte 0xFC, which is a u-umlaut in 
Latin-1. I guess in the C locale it's not expected to print as such. But now 
try this (use any UTF-8 locale you may have installed):

Sys.setlocale (locale="de_DE.UTF-8")
read.spss("comphomeneu.sav")$ARBEIT[1]
# prints:
# [1]Error in print.default(xx, quote = quote, ...) :
#        invalid multibyte string
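
A likely explanation, sketched below under the assumption that the file stores 
its strings in a single-byte encoding such as Latin-1: the lone byte 0xFC 
(octal 374) is not a well-formed UTF-8 sequence, so printing fails in a UTF-8 
locale. Note that validUTF8() only exists in much newer R versions than 2.4.1, 
so this is an illustration, not something that runs on the setup above.

```r
# Bytes of the first level as they presumably sit in the file
# (assumption: Latin-1, i.e. "im B", then byte 0xFC, then "ro").
raw_level <- rawToChar(as.raw(c(0x69, 0x6d, 0x20, 0x42, 0xfc, 0x72, 0x6f)))

# 0xFC is octal 374, matching the \374 escape printed in the C locale.
is_octal_374 <- as.integer(as.raw(0xfc)) == strtoi("374", base = 8L)

# Check whether the byte sequence is well-formed UTF-8.
is_valid <- validUTF8(raw_level)
```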

To me it looks like read.spss() probably needs an encoding parameter, 
and / or some iconv() magic. Now, locale conversion always makes my head 
spin, so I thought I'd better post here before calling this a bug in 
R. Two questions:

1) Is there some way to work around this, i.e. make sure it is converted to 
proper UTF-8 while importing? Am I missing something obvious?
2) Should I submit this as a bug report?
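
For question 1, the kind of iconv() post-processing I have in mind could look 
roughly like the sketch below. It is only a sketch: to_utf8 is a made-up helper 
name (not part of foreign), and the source encoding "latin1" is my assumption 
about the file, not something read.spss() reports.

```r
# Hypothetical helper (not part of foreign): convert character vectors and
# factor levels from an assumed source encoding to UTF-8.
to_utf8 <- function(x, from = "latin1") {
  if (is.factor(x)) {
    levels(x) <- iconv(levels(x), from = from, to = "UTF-8")
  } else if (is.character(x)) {
    x <- iconv(x, from = from, to = "UTF-8")
  }
  x  # anything else (e.g. numeric) is returned unchanged
}

# Stand-in for one imported variable: "im B", byte 0xFC, "ro" as raw bytes.
buero  <- rawToChar(as.raw(c(0x69, 0x6d, 0x20, 0x42, 0xfc, 0x72, 0x6f)))
arbeit <- factor(c(buero, "zuhause"), levels = c(buero, "zuhause"))

fixed <- to_utf8(arbeit)
```

On the real data this would presumably be applied as 
dat <- read.spss("comphomeneu.sav"); dat[] <- lapply(dat, to_utf8). 
Strings stored in attributes (such as variable labels) would still need the 
same treatment separately.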

Thanks!
Thomas Friedrichsmeier


