Re: [Rd] importing explicitly declared missing values in read.spss (foreign)

2008-08-05 Thread Jeroen Ooms

First of all, apologies if you feel misquoted; I was only trying to keep
things clear. I have now installed and tried the new version of the package,
and it works perfectly. It does exactly what it should do. I tested it on
some large SPSS sample files that contain many variables with
several types of missingness, and all missing values were correctly
converted to R NA values. I find this a very big improvement, and it makes
the transition from SPSS to R even easier. Thank you very much!






 

-- 
View this message in context: 
http://www.nabble.com/importing-explicitly-declared-missing-values-in-read.spss-%28foreign%29-tp18776776p18829484.html
Sent from the R devel mailing list archive at Nabble.com.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] importing explicitly declared missing values in read.spss (foreign)

2008-08-04 Thread Jeroen Ooms


Prof Brian Ripley wrote:

> From the messages you get I do not believe this is a recent version of
> read.spss (message 2 no longer appears)...

I am sorry, you are right here; I was using an outdated version of foreign. I
have updated my packages. My current version is now R version 2.7.1
(2008-06-23) with foreign_0.8-28.

I have experimented with importing some SPSS data files, mostly from the sample
data files that are included with SPSS. Most of these files do not generate
any warnings, so I am not sure this is related to the missingness. However,
the problem of read.spss() not returning any information on missingness
persists in all of these data files.


Prof Brian Ripley wrote:

> All that is 'harmful' is that you are not told that value labels NA and
> NAP were to be regarded as 'missing' in SPSS.  We've no idea whether it
> would be a more or less egregious choice to map them to R's NA, and
> certainly are not in a position to assert 'far less harmful' in general.

Of course the 'least harmful' behavior of the function depends entirely
on the data and the user's intentions. I was explicitly suggesting making
the mapping of missing values to NA optional, so that users who consider
it appropriate have the option to replace these missing values. I do not claim
this to be the best default behavior, just a very useful feature.




Re: [Rd] importing explicitly declared missing values in read.spss (foreign)

2008-08-04 Thread Prof Brian Ripley

I've put up an experimental version at

http://www.stats.ox.ac.uk/pub/R/foreign_0.8-28.1.tar.gz

See the new 'use.missings' argument.  It does what I think should happen
in your example and the other one I tried, but more experience would be
helpful.
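
A minimal sketch of how the new argument might be tried out, assuming the experimental foreign build above is installed and that a copy of the example file sits at the hypothetical path `missingdata.sav` in the working directory. The comparison is skipped when either is absent, so the snippet is safe to run as is.

```r
# Hypothetical local path to the example file from this thread.
sav_file <- "missingdata.sav"

if (requireNamespace("foreign", quietly = TRUE) && file.exists(sav_file)) {
  # Import twice: once leaving the declared missing codes as ordinary values,
  # once asking read.spss() to convert them to NA.
  raw   <- foreign::read.spss(sav_file, to.data.frame = TRUE,
                              use.missings = FALSE)
  clean <- foreign::read.spss(sav_file, to.data.frame = TRUE,
                              use.missings = TRUE)

  # Count the NAs per column under each setting to see what changed.
  print(colSums(is.na(raw)))
  print(colSums(is.na(clean)))
} else {
  message("foreign or ", sav_file, " not available; skipping the comparison")
}
```

Comparing the two NA counts column by column is a quick way to check that only the declared codes, and nothing else, were converted.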

On Mon, 4 Aug 2008, Jeroen Ooms wrote:

Please don't silently excise context -- see the posting guide for the
rights of posters to be quoted fairly (and your usage of my posting fails
to be fair).

> Prof Brian Ripley wrote:
>
>> From the messages you get I do not believe this is a recent version of
>> read.spss (message 2 no longer appears)...
>
> I am sorry, you are right here; I was using an outdated version of foreign. I
> have updated my packages. My current version is now R version 2.7.1
> (2008-06-23) with foreign_0.8-28.
>
> I have experimented with importing some SPSS data files, mostly from the sample
> data files that are included with SPSS. Most of these files do not generate
> any warnings, so I am not sure this is related to the missingness. However,
> the problem of read.spss() not returning any information on missingness
> persists in all of these data files.
>
> Prof Brian Ripley wrote:
>
>> All that is 'harmful' is that you are not told that value labels NA and
>> NAP were to be regarded as 'missing' in SPSS.  We've no idea whether it
>> would be a more or less egregious choice to map them to R's NA, and
>> certainly are not in a position to assert 'far less harmful' in general.
>
> Of course the 'least harmful' behavior of the function depends entirely
> on the data and the user's intentions. I was explicitly suggesting making
> the mapping of missing values to NA optional, so that users who consider
> it appropriate have the option to replace these missing values. I do not claim
> this to be the best default behavior, just a very useful feature.


--
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



Re: [Rd] importing explicitly declared missing values in read.spss (foreign)

2008-08-03 Thread Prof Brian Ripley

From the messages you get I do not believe this is a recent version of
read.spss (message 2 no longer appears), and you haven't followed the
posting guide and told us.  However, your message 3 does still appear, and
that might be significant.

A small amount of googling came up with

https://stat.ethz.ch/pipermail/r-help/2008-April/159342.html

and I guess this is the same issue.  A quick look at the code for
read.spss() suggests that the information on user-defined missing values
is being read in, and that there are yet more possible types of
missingness (only some of which I understand).  So what is needed is to
return that info to the R user: now we have an example, at least something
should be possible.
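
One way "returning that info" could look from the user's side is sketched below in base R. It is an illustration only, not what read.spss() does: the helper `tag_missing` and the attribute name `declared_missing` are made up here, and the sketch covers numeric vectors only.

```r
# Hypothetical helper: set declared codes to NA but remember which code each
# position held, so the kind of missingness is not lost.
tag_missing <- function(x, codes) {
  hit <- x %in% codes              # NA never matches a code, so stays FALSE
  reason <- rep(NA, length(x))
  reason[hit] <- x[hit]            # keep the original code (8, 9, ...)
  x[hit] <- NA
  attr(x, "declared_missing") <- reason  # attribute name is an assumption
  x
}

# A few values in the style of the CONTINUO column from the example.
continuo <- c(3.11, 2.10, 8, 8, 9, NA)
tagged <- tag_missing(continuo, codes = c(8, 9))

is.na(tagged)                      # declared and system missings are both NA
attr(tagged, "declared_missing")   # codes for declared missings, NA elsewhere
```

Note that a plain attribute like this is dropped by most subsetting operations, which is one reason a real interface would need more thought.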


On Fri, 1 Aug 2008, Jeroen Ooms wrote:

> There is a problem when importing an SPSS file containing explicitly declared
> missing values into R using the read.spss function from the foreign package.
> I'm not sure whether these problems are the same in every version of SPSS; I am
> using the latest version, 16.0.2.
>
> I included missingdata.sav
> (http://www.nabble.com/file/p18776776/missingdata.sav) and frequencies.jpg
> (http://www.nabble.com/file/p18776776/frequencies.jpg) as an example. The
> data contain 3 types of missing data: 2 are explicitly declared as missing
> values ('8' = NA and '9' = NAP), and the third type is the system missings.
> When this file is imported into R, only the system missings are recognized as
> missing values; the others are imported as levels in the nominal case, and
> as (labeled) real values 8 and 9 in the continuous case. There are also no
> attributes in the object returned by read.spss that contain information
> about which values/levels are the missing values; their missingness seems to
> be completely ignored by the function.
>
> Is there some way, or some other function, to import SPSS files with an
> option that replaces all missing values with NAs in R? Of course this
> comes with the trade-off of losing the meaning of the missingness when there
> are multiple types of missingness, but I think this is far less harmful
> than treating all missing values as normal values.

If the missingness information were returned others are likely to
disagree, especially for factors.  All that is 'harmful' is that you are
not told that value labels NA and NAP were to be regarded as 'missing' in
SPSS.  We've no idea whether it would be a more or less egregious choice
to map them to R's NA, and certainly are not in a position to assert 'far
less harmful' in general.





--
Brian D. Ripley,

[Rd] importing explicitly declared missing values in read.spss (foreign)

2008-08-01 Thread Jeroen Ooms

There is a problem when importing an SPSS file containing explicitly declared
missing values into R using the read.spss function from the foreign package.
I'm not sure whether these problems are the same in every version of SPSS; I am
using the latest version, 16.0.2.

I included missingdata.sav
(http://www.nabble.com/file/p18776776/missingdata.sav) and frequencies.jpg
(http://www.nabble.com/file/p18776776/frequencies.jpg) as an example. The
data contain 3 types of missing data: 2 are explicitly declared as missing
values ('8' = NA and '9' = NAP), and the third type is the system missings.
When this file is imported into R, only the system missings are recognized as
missing values; the others are imported as levels in the nominal case, and
as (labeled) real values 8 and 9 in the continuous case. There are also no
attributes in the object returned by read.spss that contain information
about which values/levels are the missing values; their missingness seems to
be completely ignored by the function.

Is there some way, or some other function, to import SPSS files with an
option that replaces all missing values with NAs in R? Of course this
comes with the trade-off of losing the meaning of the missingness when there
are multiple types of missingness, but I think this is far less harmful
than treating all missing values as normal values.

[code]
> mydata <- read.spss("c:/users/jeroen/desktop/missingdata.sav",
+   to.data.frame=T)
Warning messages:
1: In read.spss("c:/users/jeroen/desktop/missingdata.sav", to.data.frame = T) :
  c:/users/jeroen/desktop/missingdata.sav: File-indicated character
  representation code (1252) looks like a Windows codepage
2: In read.spss("c:/users/jeroen/desktop/missingdata.sav", to.data.frame = T) :
  c:/users/jeroen/desktop/missingdata.sav: Unrecognized record type 7,
  subtype 16 encountered in system file
3: In read.spss("c:/users/jeroen/desktop/missingdata.sav", to.data.frame = T) :
  c:/users/jeroen/desktop/missingdata.sav: Unrecognized record type 7,
  subtype 20 encountered in system file

> mydata
   SUBJECT CATEGORI CONTINUO
1        1      yes     3.11
2        2      yes     2.10
3        3      yes     5.34
4        4      yes     1.54
5        5      yes     3.89
6        6       no     2.98
7        7       no     4.53
8        8       no     1.98
9        9       no     3.68
10      10       no     2.94
11      11       NA     8.00
12      12       NA     8.00
13      13       NA     8.00
14      14       NA     8.00
15      15       NA     8.00
16      16      NAP     9.00
17      17      NAP     9.00
18      18      NAP     9.00
19      19      NAP     9.00
20      20      NAP     9.00
21      21     <NA>       NA
22      22     <NA>       NA
23      23     <NA>       NA
24      24     <NA>       NA
25      25     <NA>       NA

> is.na(mydata$CONTINUO)
 [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE
TRUE

> is.na(mydata$CATEGORI)
 [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE
TRUE

> summary(mydata)
    SUBJECT    CATEGORI     CONTINUO
 Min.   : 1   yes :5   Min.   :1.540
 1st Qu.: 7   no  :5   1st Qu.:3.078
 Median :13   NA  :5   Median :6.670
 Mean   :13   NAP :5   Mean   :5.854
 3rd Qu.:19   NA's:5   3rd Qu.:8.250
 Max.   :25            Max.   :9.000
                       NA's   :5.000
[/code]
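
For anyone following along without the .sav file, the data frame printed above can be rebuilt by hand and the two declared codes recoded manually. This is only a sketch in base R: the values are copied from the printout, and the helper name `declared_to_na` is made up for illustration and is not part of foreign.

```r
# Rebuild the data frame from the printout above (values copied by hand).
# "NA" and "NAP" are ordinary factor levels here; the last five rows are
# genuine missing values (SPSS system missings).
mydata <- data.frame(
  SUBJECT  = 1:25,
  CATEGORI = factor(
    c(rep(c("yes", "no", "NA", "NAP"), each = 5), rep(NA, 5)),
    levels = c("yes", "no", "NA", "NAP")
  ),
  CONTINUO = c(3.11, 2.10, 5.34, 1.54, 3.89,
               2.98, 4.53, 1.98, 3.68, 2.94,
               rep(8, 5), rep(9, 5), rep(NA, 5))
)

# Hypothetical helper: map user-declared missing codes or labels to R's NA.
declared_to_na <- function(x, codes) {
  x[x %in% codes] <- NA                # existing NAs never match a code
  if (is.factor(x)) droplevels(x) else x
}

clean <- mydata
clean$CATEGORI <- declared_to_na(clean$CATEGORI, c("NA", "NAP"))
clean$CONTINUO <- declared_to_na(clean$CONTINUO, c(8, 9))

# Declared and system missings are now all counted as NA.
colSums(is.na(clean))
mean(clean$CONTINUO, na.rm = TRUE)
```

Recoding 8 and 9 in a continuous variable like this is only safe when those values cannot occur as real measurements, which is part of why making the mapping optional seems preferable to making it automatic.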

