Re: [Rd] importing explicitly declared missing values in read.spss (foreign)

Prof Brian Ripley Sun, 03 Aug 2008 22:39:59 -0700

From the messages you get I do not believe this is a recent version of
read.spss (message 2 no longer appears), and you haven't followed the
posting guide and told us. However, your message 3 does still appear, and
that might be significant.


A small amount of googling came up with


https://stat.ethz.ch/pipermail/r-help/2008-April/159342.html

and I guess this is the same issue. A quick look at the code for
read.spss() suggests that the information on user-defined missing values
is being read in, and that there are yet more possible types of
missingness (only some of which I understand). So what is needed is to
return that info to the R user: now we have an example, at least something
should be possible.


On Fri, 1 Aug 2008, Jeroen Ooms wrote:


There is a problem when importing an spss-file containing explicitly declared
missing values in R using the read.spss function from the foreign package.
I'm not sure these problems are the same in every version of spss, I am
using the latest version 16.0.2.

I included missingdata.sav
(http://www.nabble.com/file/p18776776/missingdata.sav) and frequencies.jpg
(http://www.nabble.com/file/p18776776/frequencies.jpg) as an example. The
data contains 3 types of missing data: 2
are explicitly declared as a missing-value ('8' = NA and '9' = NAP), the
third type are the system missings. When this file is imported in R, only
the system missings are recognized as missing values, the others are just
imported as levels in the nominal case, and as (labeled) real values 8 and 9
in the continuous case. There are also no attributes in the object returned
by read.spss that contain information about which values/levels are the
missing values; their missingness seems to be completely ignored by the
function.

Is there some way or other function to be able to import spss files, with an
option that replaces all missing values with <NA>'s in R? Of course this
comes with the trade-off of losing the meaning of the missingness when there
are multiple types of missingness, but I think this is far less harmfull
than treating all missing values as normal values.

If the missingness information were returned others are likely to
disagree, especially for factors. All that is 'harmfull' is that you are
not told that value labels NA and NAP were to be regarded as 'missing' in
SPSS. We've no idea whether it would be a more or less egregious choice
to map them to R's NA, and certainly are not in a position to assert 'far
less harmfull' in general.

[code]

mydata <- read.spss("c:/users/jeroen/desktop/missingdata.sav",
to.data.frame=T)

Warning messages:
1: In read.spss("c:/users/jeroen/desktop/missingdata.sav", to.data.frame =
T) :
 c:/users/jeroen/desktop/missingdata.sav: File-indicated character
representation code (1252) looks like a Windows codepage
2: In read.spss("c:/users/jeroen/desktop/missingdata.sav", to.data.frame =
T) :
 c:/users/jeroen/desktop/missingdata.sav: Unrecognized record type 7,
subtype 16 encountered in system file
3: In read.spss("c:/users/jeroen/desktop/missingdata.sav", to.data.frame =
T) :
 c:/users/jeroen/desktop/missingdata.sav: Unrecognized record type 7,
subtype 20 encountered in system file

mydata

  SUBJECT CATEGORI CONTINUO
1        1      yes     3.11
2        2      yes     2.10
3        3      yes     5.34
4        4      yes     1.54
5        5      yes     3.89
6        6       no     2.98
7        7       no     4.53
8        8       no     1.98
9        9       no     3.68
10      10       no     2.94
11      11       NA     8.00
12      12       NA     8.00
13      13       NA     8.00
14      14       NA     8.00
15      15       NA     8.00
16      16      NAP     9.00
17      17      NAP     9.00
18      18      NAP     9.00
19      19      NAP     9.00
20      20      NAP     9.00
21      21     <NA>       NA
22      22     <NA>       NA
23      23     <NA>       NA
24      24     <NA>       NA
25      25     <NA>       NA

is.na(mydata$CONTINUO)

[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE
TRUE

is.na(mydata$CATEGORI)

[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE
TRUE

summary(mydata)

   SUBJECT   CATEGORI    CONTINUO
Min.   : 1   yes :5   Min.   :1.540
1st Qu.: 7   no  :5   1st Qu.:3.078
Median :13   NA  :5   Median :6.670
Mean   :13   NAP :5   Mean   :5.854
3rd Qu.:19   NA's:5   3rd Qu.:8.250
Max.   :25            Max.   :9.000
                      NA's   :5.000
[/code]
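
Until read.spss returns the missingness information, one possible
workaround is to recode the declared missing codes by hand after import.
The sketch below is only an illustration under stated assumptions:
recode_user_missing is a hypothetical helper (it is not part of the foreign
package), the codes 8 and 9 and the levels "NA" and "NAP" are taken from the
example above, and the data frame is rebuilt by hand rather than read from
missingdata.sav. Whether mapping every declared code to R's NA is
appropriate depends on the analysis, since the distinction between the
kinds of missingness is lost.

```r
# Hypothetical helper, not part of 'foreign': set user-declared missing
# codes to NA, one column at a time.
recode_user_missing <- function(df, missing_codes) {
  for (col in names(missing_codes)) {
    codes <- missing_codes[[col]]
    x <- df[[col]]
    # %in% never matches a true NA, so existing system missings are untouched
    x[x %in% codes] <- NA
    if (is.factor(x)) {
      # Remove the now-unused levels (here "NA" and "NAP")
      x <- droplevels(x)
    }
    df[[col]] <- x
  }
  df
}

# Stand-in for the imported data shown above (assumption: built by hand)
mydata <- data.frame(
  SUBJECT  = 1:25,
  CATEGORI = factor(rep(c("yes", "no", "NA", "NAP", NA), each = 5),
                    levels = c("yes", "no", "NA", "NAP")),
  CONTINUO = c(3.11, 2.10, 5.34, 1.54, 3.89, 2.98, 4.53, 1.98, 3.68, 2.94,
               rep(8, 5), rep(9, 5), rep(NA, 5))
)

cleaned <- recode_user_missing(
  mydata,
  list(CATEGORI = c("NA", "NAP"), CONTINUO = c(8, 9))
)

sum(is.na(cleaned$CATEGORI))  # 15: declared plus system missings
sum(is.na(cleaned$CONTINUO))  # 15
levels(cleaned$CATEGORI)      # "yes" "no"
```

Note that for a continuous variable this recodes every 8 and 9, so it is
only safe when those codes cannot occur as genuine measurements.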


--
View this message in context: 
http://www.nabble.com/importing-explicitly-declared-missing-values-in-read.spss-%28foreign%29-tp18776776p18776776.html
Sent from the R devel mailing list archive at Nabble.com.

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


--
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
