Re: [Rd] importing explicitly declared missing values in read.spss (foreign)
First of all, apologies if you feel misquoted; I was only trying to keep things clear.

I have now installed and tried the new version of the package, and it works perfectly: it does exactly what it should do. I tested it on some large SPSS sample files that contain many variables with several types of missingness, and all missing values were correctly converted to R NA values. I find this a very big improvement, and it makes the transition from SPSS to R even easier. Thank you very much!

Prof Brian Ripley wrote:
> I've put up an experimental version at
> http://www.stats.ox.ac.uk/pub/R/foreign_0.8-28.1.tar.gz
>
> See the new 'use.missings' argument. It does what I think should happen in your example and the other one I tried, but more experience would be helpful.
>
> On Mon, 4 Aug 2008, Jeroen Ooms wrote:
>
> Please don't silently excise context -- see the posting guide for the rights of posters to be quoted fairly (and your usage of my posting fails to be fair).
>
>> Prof Brian Ripley wrote:
>>> From the messages you get I do not believe this is a recent version of read.spss (message 2 no longer appears)...
>>
>> I am sorry, you are right here: I was using an outdated version of foreign. I have updated my packages; my current setup is R version 2.7.1 (2008-06-23) with foreign_0.8-28. I have experimented with importing some SPSS data files, mostly the sample data files that are included with SPSS. Most of these files do not generate any warnings, so I am not sure the warnings are related to the missingness. However, the problem of read.spss() not returning any information on missingness persists in all of these data files.
>>
>> Prof Brian Ripley wrote:
>>> All that is 'harmful' is that you are not told that value labels NA and NAP were to be regarded as 'missing' in SPSS. We've no idea whether it would be a more or less egregious choice to map them to R's NA, and certainly are not in a position to assert 'far less harmful' in general.
>>
>> Of course the 'least harmful' behavior of the function depends completely on the data and the user's intentions. I was explicitly suggesting making the mapping of missing values to NA optional, to give users who consider it appropriate the option to replace these missing values. I do not claim this to be the best default behavior, just a very useful feature.
>
> --
> Brian D. Ripley,                  [EMAIL PROTECTED]
> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> University of Oxford,             Tel:  +44 1865 272861 (self)
> 1 South Parks Road,                     +44 1865 272866 (PA)
> Oxford OX1 3TG, UK                Fax:  +44 1865 272595

--
View this message in context: http://www.nabble.com/importing-explicitly-declared-missing-values-in-read.spss-%28foreign%29-tp18776776p18829484.html
Sent from the R devel mailing list archive at Nabble.com.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] importing explicitly declared missing values in read.spss (foreign)
Prof Brian Ripley wrote:
> From the messages you get I do not believe this is a recent version of read.spss (message 2 no longer appears)...

I am sorry, you are right here: I was using an outdated version of foreign. I have updated my packages; my current setup is R version 2.7.1 (2008-06-23) with foreign_0.8-28. I have experimented with importing some SPSS data files, mostly the sample data files that are included with SPSS. Most of these files do not generate any warnings, so I am not sure the warnings are related to the missingness. However, the problem of read.spss() not returning any information on missingness persists in all of these data files.

Prof Brian Ripley wrote:
> All that is 'harmful' is that you are not told that value labels NA and NAP were to be regarded as 'missing' in SPSS. We've no idea whether it would be a more or less egregious choice to map them to R's NA, and certainly are not in a position to assert 'far less harmful' in general.

Of course the 'least harmful' behavior of the function depends completely on the data and the user's intentions. I was explicitly suggesting making the mapping of missing values to NA optional, to give users who consider it appropriate the option to replace these missing values. I do not claim this to be the best default behavior, just a very useful feature.

--
View this message in context: http://www.nabble.com/importing-explicitly-declared-missing-values-in-read.spss-%28foreign%29-tp18776776p18809176.html
Re: [Rd] importing explicitly declared missing values in read.spss (foreign)
I've put up an experimental version at
http://www.stats.ox.ac.uk/pub/R/foreign_0.8-28.1.tar.gz

See the new 'use.missings' argument. It does what I think should happen in your example and the other one I tried, but more experience would be helpful.

On Mon, 4 Aug 2008, Jeroen Ooms wrote:

Please don't silently excise context -- see the posting guide for the rights of posters to be quoted fairly (and your usage of my posting fails to be fair).

> Prof Brian Ripley wrote:
>> From the messages you get I do not believe this is a recent version of read.spss (message 2 no longer appears)...
>
> I am sorry, you are right here: I was using an outdated version of foreign. I have updated my packages; my current setup is R version 2.7.1 (2008-06-23) with foreign_0.8-28. I have experimented with importing some SPSS data files, mostly the sample data files that are included with SPSS. Most of these files do not generate any warnings, so I am not sure the warnings are related to the missingness. However, the problem of read.spss() not returning any information on missingness persists in all of these data files.
>
> Prof Brian Ripley wrote:
>> All that is 'harmful' is that you are not told that value labels NA and NAP were to be regarded as 'missing' in SPSS. We've no idea whether it would be a more or less egregious choice to map them to R's NA, and certainly are not in a position to assert 'far less harmful' in general.
>
> Of course the 'least harmful' behavior of the function depends completely on the data and the user's intentions. I was explicitly suggesting making the mapping of missing values to NA optional, to give users who consider it appropriate the option to replace these missing values. I do not claim this to be the best default behavior, just a very useful feature.
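A minimal sketch of how the new argument might be used, assuming a version of foreign that provides 'use.missings' (the experimental 0.8-28.1 above, or a later release). The wrapper name read_spss_na and the file name in the usage comment are hypothetical, chosen only for illustration.

```r
library(foreign)

# Hypothetical convenience wrapper (not part of foreign): import an SPSS
# system file as a data frame and ask read.spss() to convert the
# user-declared missing values to NA via the 'use.missings' argument.
read_spss_na <- function(path) {
  read.spss(path, to.data.frame = TRUE, use.missings = TRUE)
}

# Usage, assuming the example file has been downloaded locally:
# mydata <- read_spss_na("missingdata.sav")
# colSums(is.na(mydata))   # count of NA values per variable
```

Setting use.missings = FALSE instead should keep the declared codes as ordinary values, for cases where the distinction between types of missingness matters.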
Re: [Rd] importing explicitly declared missing values in read.spss (foreign)
From the messages you get I do not believe this is a recent version of read.spss (message 2 no longer appears), and you haven't followed the posting guide and told us. However, your message 3 does still appear, and that might be significant.

A small amount of googling came up with https://stat.ethz.ch/pipermail/r-help/2008-April/159342.html and I guess this is the same issue.

A quick look at the code for read.spss() suggests that the information on user-defined missing values is being read in, and that there are yet more possible types of missingness (only some of which I understand). So what is needed is to return that info to the R user: now that we have an example, at least something should be possible.

On Fri, 1 Aug 2008, Jeroen Ooms wrote:

> There is a problem when importing an SPSS file containing explicitly declared missing values into R using the read.spss function from the foreign package. I'm not sure whether these problems are the same in every version of SPSS; I am using the latest version, 16.0.2. I included missingdata.sav (http://www.nabble.com/file/p18776776/missingdata.sav) and frequencies.jpg (http://www.nabble.com/file/p18776776/frequencies.jpg) as an example.
>
> The data contain three types of missing data: two are explicitly declared as missing values ('8' = NA and '9' = NAP), and the third is the system-missing value. When this file is imported into R, only the system-missing values are recognized as missing; the others are imported as ordinary levels in the nominal case, and as (labeled) real values 8 and 9 in the continuous case. There are also no attributes in the object returned by read.spss that contain information about which values/levels are the missing values; their missingness seems to be completely ignored by the function.
>
> Is there some way, or some other function, to import SPSS files with an option that replaces all missing values with NA in R? Of course this comes with the trade-off of losing the meaning of the missingness when there are multiple types of missingness, but I think this is far less harmful than treating all missing values as normal values.

If the missingness information were returned, others are likely to disagree, especially for factors. All that is 'harmful' is that you are not told that value labels NA and NAP were to be regarded as 'missing' in SPSS. We've no idea whether it would be a more or less egregious choice to map them to R's NA, and certainly are not in a position to assert 'far less harmful' in general.
[Rd] importing explicitly declared missing values in read.spss (foreign)
There is a problem when importing an SPSS file containing explicitly declared missing values into R using the read.spss function from the foreign package. I'm not sure whether these problems are the same in every version of SPSS; I am using the latest version, 16.0.2. I included missingdata.sav (http://www.nabble.com/file/p18776776/missingdata.sav) and frequencies.jpg (http://www.nabble.com/file/p18776776/frequencies.jpg) as an example.

The data contain three types of missing data: two are explicitly declared as missing values ('8' = NA and '9' = NAP), and the third is the system-missing value. When this file is imported into R, only the system-missing values are recognized as missing; the others are imported as ordinary levels in the nominal case, and as (labeled) real values 8 and 9 in the continuous case. There are also no attributes in the object returned by read.spss that contain information about which values/levels are the missing values; their missingness seems to be completely ignored by the function.

Is there some way, or some other function, to import SPSS files with an option that replaces all missing values with NA in R? Of course this comes with the trade-off of losing the meaning of the missingness when there are multiple types of missingness, but I think this is far less harmful than treating all missing values as normal values.
[code]
> mydata <- read.spss("c:/users/jeroen/desktop/missingdata.sav", to.data.frame=T)
Warning messages:
1: In read.spss("c:/users/jeroen/desktop/missingdata.sav", to.data.frame = T) :
  c:/users/jeroen/desktop/missingdata.sav: File-indicated character representation code (1252) looks like a Windows codepage
2: In read.spss("c:/users/jeroen/desktop/missingdata.sav", to.data.frame = T) :
  c:/users/jeroen/desktop/missingdata.sav: Unrecognized record type 7, subtype 16 encountered in system file
3: In read.spss("c:/users/jeroen/desktop/missingdata.sav", to.data.frame = T) :
  c:/users/jeroen/desktop/missingdata.sav: Unrecognized record type 7, subtype 20 encountered in system file
> mydata
   SUBJECT CATEGORI CONTINUO
1        1      yes     3.11
2        2      yes     2.10
3        3      yes     5.34
4        4      yes     1.54
5        5      yes     3.89
6        6       no     2.98
7        7       no     4.53
8        8       no     1.98
9        9       no     3.68
10      10       no     2.94
11      11       NA     8.00
12      12       NA     8.00
13      13       NA     8.00
14      14       NA     8.00
15      15       NA     8.00
16      16      NAP     9.00
17      17      NAP     9.00
18      18      NAP     9.00
19      19      NAP     9.00
20      20      NAP     9.00
21      21     <NA>       NA
22      22     <NA>       NA
23      23     <NA>       NA
24      24     <NA>       NA
25      25     <NA>       NA
> is.na(mydata$CONTINUO)
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE
> is.na(mydata$CATEGORI)
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE
> summary(mydata)
    SUBJECT   CATEGORI    CONTINUO
 Min.   : 1   yes :5   Min.   :1.540
 1st Qu.: 7   no  :5   1st Qu.:3.078
 Median :13   NA  :5   Median :6.670
 Mean   :13   NAP :5   Mean   :5.854
 3rd Qu.:19   NA's:5   3rd Qu.:8.250
 Max.   :25            Max.   :9.000
                       NA's   :5.000
[/code]

--
View this message in context: http://www.nabble.com/importing-explicitly-declared-missing-values-in-read.spss-%28foreign%29-tp18776776p18776776.html
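Independently of read.spss(), the declared missing codes can also be recoded by hand after import. The sketch below is only an illustration under stated assumptions, not part of foreign: it builds a small toy data frame shaped like the output above (six made-up rows, not the real file) and uses a hypothetical helper, recode_missing, to turn the codes 8 and 9 and the levels "NA" and "NAP" into R's NA.

```r
# Toy data shaped like the imported file above; the rows are illustrative
# and do not reproduce missingdata.sav. Note that the level "NA" is an
# ordinary string here, distinct from a real missing value.
mydata <- data.frame(
  SUBJECT  = 1:6,
  CATEGORI = factor(c("yes", "no", "NA", "NAP", NA, "yes"),
                    levels = c("yes", "no", "NA", "NAP")),
  CONTINUO = c(3.11, 2.98, 8, 9, NA, 2.10)
)

# Hypothetical helper: set every element matching one of 'codes' to NA.
# For factors, the now-unused levels are dropped as well.
recode_missing <- function(x, codes) {
  x[x %in% codes] <- NA
  if (is.factor(x)) droplevels(x) else x
}

mydata$CATEGORI <- recode_missing(mydata$CATEGORI, c("NA", "NAP"))
mydata$CONTINUO <- recode_missing(mydata$CONTINUO, c(8, 9))

# Declared and system missings are now all reported by is.na()
colSums(is.na(mydata))
```

This discards the distinction between NA and NAP, which is exactly the trade-off mentioned in the post, so it only suits analyses where the type of missingness is irrelevant.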