[R] missing values
Hello all,

I would like to perform multiple imputation using the norm library, but I get the following error when I use the da.norm function:

  Error in as.double.default(list(V1 = c(0.058177827, 0.123076923, 0.138713745, :
    (list) object cannot be coerced to 'double'

Can anyone help? Thanking you in advance.

Allan Clark
Lecturer in Statistical Sciences Department
University of Cape Town
7701 Rondebosch, South Africa
TEL (Office): +27-21-650-3228
FAX: +27-21-650-4773
http://web.uct.ac.za/depts/stats/aclark.htm
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] missing values
When it says 'matrix' it means it, not 'data frame'.

On Wed, 30 May 2007, Allan Clark wrote:

> hello all
> i would like to perform multiple imputation using the norm library. but i seem to get the following error when i use the da.norm function.
>
> Error in as.double.default(list(V1 = c(0.058177827, 0.123076923, 0.138713745, :
>   (list) object cannot be coerced to 'double'
>
> can anyone help?
> thanking you in advance
> Allan Clark

--
Brian D. Ripley, [EMAIL PROTECTED]
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK, Fax: +44 1865 272595
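The reply above can be illustrated with a short sketch. The data frame `df` and its columns are made up for the example (they are not the poster's data), and the norm calls in the guarded block are one plausible sequence, run only if the package happens to be installed.

```r
# Made-up data frame standing in for the poster's data.
set.seed(1)
df <- data.frame(V1 = rnorm(50), V2 = rnorm(50), V3 = rnorm(50))
df$V1[c(3, 17)] <- NA
df$V3[c(8, 40)] <- NA

# A data frame is a list of columns, so coercing it to double fails;
# capture the message instead of stopping.
coerce_error <- tryCatch(as.double(df), error = function(e) conditionMessage(e))

# Convert to a numeric matrix before handing it to the norm functions.
x <- data.matrix(df)

if (requireNamespace("norm", quietly = TRUE)) {
  s     <- norm::prelim.norm(x)                  # preliminary bookkeeping on x
  theta <- norm::em.norm(s, showits = FALSE)     # EM estimate as a starting value
  norm::rngseed(1234)                            # seed must be set before da.norm()
  theta <- norm::da.norm(s, theta, steps = 20)   # data augmentation
  ximp  <- norm::imp.norm(s, theta, x)           # one imputed copy of x
}
```

If the data frame contains factor or character columns, `data.matrix()` converts them to integer codes, which is probably not what a multivariate normal model should see, so it is worth checking `str(df)` first.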
[R] missing values
Hello,

I need your help with this example:

  > for(k in LR) {
  +   donGeno[[k]] <- as.numeric(levels(factor(subset(don2, Id_Essai == 1006961 & Id_Cara == LC[1] & Id_Rep == k, select = Id_Geno)[,1])))
  +   print(donGeno[[k]])}
  [1] 65125 65126 65127 65128 65129 65130 65131 65132 65133 65134 65135 65136 65137 65138 65139 65140 65141 65142 65143 65144 65171
  [1] 65126 65127 65128 65129 65130 65131 65132 65133 65134 65135 65136 65137 65138 65139 65140 65141 65142 65143 65144 65171
  [1] 65125 65126 65127 65128 65129 65130 65131 65132 65133 65134 65135 65136 65137 65138 65139 65140 65141 65142 65143 65144 65171

There is a missing value in the vector donGeno[[2]]: the value 65125 is not there. I want to cut this value from the other vectors, and I tried to do this as follows:

  C <- vector()
  for(k in LR) {
    C[k] <- length(donGeno[[k]])
  }
  print(C)
  na=match(rep(0,length(C)-sum(match(C,C[1],nomatch=0))),match(C,C[1],nomatch=0))
  #print(na)
  if(na==length(C)){
    pos=match(0,match(donGeno[[na-1]],donGeno[[na]],nomatch=0))
    for(k in 1:(na-1)) {
      donGeno[[k]] <- donGeno[[k]][1:(na-1)]
    }
  } else{
    pos=match(0,match(donGeno[[na+1]],donGeno[[na]],nomatch=0))
    for(k in 1:(.))
  }

But I wonder if there is a better way than this script?
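One possible simpler approach, sketched here on a shortened, made-up version of donGeno, is to keep only the ids that occur in every vector. This assumes that "cut this value in the other vectors" means keeping the common ids; the list below is illustrative, not the poster's data.

```r
# Toy stand-in for donGeno: one vector of genotype ids per replicate.
donGeno <- list(c(65125, 65126, 65127, 65171),
                c(65126, 65127, 65171),          # 65125 is missing here
                c(65125, 65126, 65127, 65171))

# Ids present in every replicate.
common <- Reduce(intersect, donGeno)

# Keep only those ids in each vector, preserving each vector's own order.
donGeno <- lapply(donGeno, function(v) v[v %in% common])
```

This avoids comparing lengths and positions by hand, and it also copes with more than one vector having gaps.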
Re: [R] Missing values detected when there are no missing values
Hi

On 22 Apr 2006 at 23:29, Bob Green wrote:

> I am hoping for some advice on the following matters.
>
> I have a csv data file with 153 variables x 92 rows. To determine what the variables looked like I ran the summary command. One variable had a large number of missing values 54/92. For some reason, all subsequent 74 variables are reported as having 92 NA values, irrespective of whether the original csv variable was complete or not.

I have not seen any answer yet, so I will try to give one. First, how do you know there are no missing values in your csv file?

> Below are the commands I ran:
>
> study1dat <- read.csv("c:\\study1r.csv",header=T)
> attach(study1dat)
> names(study1dat)
> summary(study1dat)

You showed what you did, but we cannot know much about study1r.csv, so my answer is only a guess. Let's assume the csv was exported from Excel: could there be a problem in how it was constructed? For example, spaces in some columns which are not visible in Excel but are exported to the csv and read into R as NA values? What does str(study1dat) say about your data? And does the file really use "," as the value separator and "." as the decimal separator, as read.csv expects?

> The second puzzling issue, is that one variable with no missing values is reported in R as having 3 missing values, whereas there are no missing values in the csv file.
>
> The only errors in reading the data I received were:

These appear not when reading the file but when attaching the data frame. Some names in your data frame are the same as the names of functions in the packages mentioned. That is not an error; R just tells you that this has happened so that you are aware of it.

HTH
Petr

> The following object(s) are masked from package:stats :
>   time
> The following object(s) are masked from package:graphics :
>   screen
> The following object(s) are masked from package:datasets :
>   sleep
> The following object(s) are masked from package:base :
>   pipe
>
> I am happy to send the csv file if required.
> Any advice that can offered is appreciated,
> Bob

Petr Pikal
[EMAIL PROTECTED]
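The checks suggested above can be tried on a small stand-in file. This is only a sketch: the CSV text, column names and the choice of "." as a missing-value marker are invented for the example and say nothing about what study1r.csv actually contains.

```r
# Made-up CSV text imitating a spreadsheet export: a cell holding only a
# space, a "." used as a missing marker, and an empty cell.
csv_text <- "id,score,time\n1,3.5,10\n2, ,12\n3,.,\n4,4.1,9\n"

raw <- read.csv(text = csv_text, header = TRUE)
str(raw)   # 'score' is not numeric here (character or factor, depending on R version)

# Tell read.csv which strings mean "missing" and strip stray blanks.
dat <- read.csv(text = csv_text, header = TRUE,
                na.strings = c("NA", "", "."), strip.white = TRUE)
str(dat)

# NA count for every variable, easier to scan than summary() for 153 columns.
na_per_column <- colSums(is.na(dat))
```

A column that is entirely NA after import is often a sign that the field separator or the decimal mark in the file differs from what `read.csv` assumes.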
[R] Missing values detected when there are no missing values
I am hoping for some advice on the following matters.

I have a csv data file with 153 variables x 92 rows. To determine what the variables looked like I ran the summary command. One variable had a large number of missing values, 54/92. For some reason, all subsequent 74 variables are reported as having 92 NA values, irrespective of whether the original csv variable was complete or not.

Below are the commands I ran:

  study1dat <- read.csv("c:\\study1r.csv",header=T)
  attach(study1dat)
  names(study1dat)
  summary(study1dat)

The second puzzling issue is that one variable with no missing values is reported in R as having 3 missing values, whereas there are no missing values in the csv file.

The only errors in reading the data I received were:

  The following object(s) are masked from package:stats :
    time
  The following object(s) are masked from package:graphics :
    screen
  The following object(s) are masked from package:datasets :
    sleep
  The following object(s) are masked from package:base :
    pipe

I am happy to send the csv file if required.

Any advice that can be offered is appreciated,

Bob
Re: [R] missing values in step procedure
At 11:11 7/10/2005, you wrote:

> Hi,
> I have the problem that for the step procedure stops due to missing values. There are no options in Step or stepAIC to handle missing values. Is there any way to run stepwise modelselection in R in an automated way in this case? Here is the last step before it stops. Hope someone knows.
> Best regards,
> Andreas
>
> Step: AIC= 1999.16
>  EF ~ SF120_KS + SF120_PS + HADA0 + SOZU0 + LVEDD + logPROBNP + ALTER + SD0_01 + ASE_UK + DS140POS + RSQSICH0 + SD0_01:ASE_UK + SD0_01:DS140POS + SD0_01:RSQSICH0 + ASE_UK:DS140POS + ASE_UK:RSQSICH0 + DS140POS:RSQSICH0 + SD0_01:ASE_UK:RSQSICH0 + SD0_01:DS140POS:RSQSICH0 + ASE_UK:DS140POS:RSQSICH0
>
>                             Df Sum of Sq     RSS    AIC
> - SOZU0                      1       3.0 25356.0 1997.2
> - HADA0                      1       7.6 25360.6 1997.3
> - ALTER                      1      13.0 25365.9 1997.4
> - SF120_PS                   1      14.7 25367.6 1997.5
> - ASE_UK:DS140POS:RSQSICH0   1      20.1 25373.1 1997.6
> - SD0_01:DS140POS:RSQSICH0   1      44.8 25397.7 1998.0
> - SD0_01:ASE_UK:RSQSICH0     1      54.4 25407.4 1998.2
> <none>                                   25352.9 1999.2
> - LVEDD                      1     382.2 25735.1 2004.6
> - SF120_KS                   1     476.4 25829.3 2006.4
> - logPROBNP                  1     891.9 26244.9 2014.4
> Error in step(mod2, direction = "back") : number of rows in use has changed: remove missing values?
>
> Andreas

Try

  data <- na.omit(original_database)

(where original_database is your original data frame) before you run step() or stepAIC().

Bernardo Rangel Tura, MD, MSc
National Institute of Cardiology Laranjeiras
Rio de Janeiro Brazil
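A minimal sketch of that advice, using the built-in airquality data rather than the poster's data set (the model below is an arbitrary example). Restricting na.omit() to the variables of the largest model is an assumption on my part: it avoids discarding rows that are incomplete only in variables the model never uses.

```r
# Built-in data with missing values in Ozone and Solar.R.
vars <- c("Ozone", "Solar.R", "Wind", "Temp")

# Drop incomplete rows once, so that every candidate model
# is fitted to exactly the same rows.
aq <- na.omit(airquality[, vars])

full <- lm(Ozone ~ Solar.R + Wind + Temp, data = aq)
sel  <- step(full, direction = "backward", trace = 0)
formula(sel)
```

Because `aq` has no missing values, dropping a term can no longer change the number of rows in use, which is the condition the error message complains about.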
[R] missing values in step procedure
Hi,

I have the problem that the step procedure stops due to missing values. There are no options in step or stepAIC to handle missing values. Is there any way to run stepwise model selection in R in an automated way in this case? Here is the last step before it stops. Hope someone knows.

Best regards,
Andreas

Step: AIC= 1999.16
 EF ~ SF120_KS + SF120_PS + HADA0 + SOZU0 + LVEDD + logPROBNP + ALTER + SD0_01 + ASE_UK + DS140POS + RSQSICH0 + SD0_01:ASE_UK + SD0_01:DS140POS + SD0_01:RSQSICH0 + ASE_UK:DS140POS + ASE_UK:RSQSICH0 + DS140POS:RSQSICH0 + SD0_01:ASE_UK:RSQSICH0 + SD0_01:DS140POS:RSQSICH0 + ASE_UK:DS140POS:RSQSICH0

                            Df Sum of Sq     RSS    AIC
- SOZU0                      1       3.0 25356.0 1997.2
- HADA0                      1       7.6 25360.6 1997.3
- ALTER                      1      13.0 25365.9 1997.4
- SF120_PS                   1      14.7 25367.6 1997.5
- ASE_UK:DS140POS:RSQSICH0   1      20.1 25373.1 1997.6
- SD0_01:DS140POS:RSQSICH0   1      44.8 25397.7 1998.0
- SD0_01:ASE_UK:RSQSICH0     1      54.4 25407.4 1998.2
<none>                                   25352.9 1999.2
- LVEDD                      1     382.2 25735.1 2004.6
- SF120_KS                   1     476.4 25829.3 2006.4
- logPROBNP                  1     891.9 26244.9 2014.4
Error in step(mod2, direction = "back") : number of rows in use has changed: remove missing values?
Re: [R] missing values in step procedure
On Fri, 7 Oct 2005, Andreas Cordes wrote:

> I have the problem that for the step procedure stops due to missing values. There are no options in Step or stepAIC to handle missing values. Is there any way to run stepwise modelselection in R in an automated way in this case?

Try the hint it gives you, or see the help page (which covers this in a warning with an explanation).

> [...]
> Error in step(mod2, direction = "back") : number of rows in use has changed: remove missing values?

--
Brian D. Ripley, [EMAIL PROTECTED]
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK, Fax: +44 1865 272595
[R] Missing values in argument of .Fortran.
I wish to pass a vector ``y'', some of whose entries are NAs, to a fortran subroutine which I am dynamically loading and calling by means of .Fortran(). The subroutine runs through the vector entry by entry; obviously I want to have it do one thing if y[i] is present and a different thing if it is missing.

The way I am thinking of proceeding is along the lines of:

  ymiss <- is.na(y)
  rslt <- .Fortran( "foo", NAOK=TRUE, as.double(y), as.logical(ymiss), etc, etc )

and inside ``foo'' have a logical branch based on the value of ymiss(i).

Questions:

(1) Is there a sexier way to proceed? E.g. is it possible within (g77) fortran to detect the fact that y(i) is/was an NA (or not) and make the nature of y(i) the basis of an if-statement?

(2) Are there any lurking pitfalls in the use of the NAOK=TRUE argument?

(3) Is there an entirely different and better way to proceed?

TIA.

cheers,
Rolf Turner
[EMAIL PROTECTED]

P. S. I'm running R 2.0.1 under (Red Hat) Linux. (Sigh. Yes I must get around to upgrading real soon now.)
R. T.
Re: [R] Missing values in argument of .Fortran.
On 6/6/2005 9:52 AM, Rolf Turner wrote:

> I wish to pass a vector ``y'', some of whose entries are NAs to a fortran subroutine which I am dynamically loading and calling by means of .Fortran(). The subroutine runs through the vector entry by entry; obviously I want to have it do one thing if y[i] is present and a different thing if it is missing.
>
> The way I am thinking of proceeding is along the lines of:
>
>   ymiss <- is.na(y)
>   rslt <- .Fortran( "foo", NAOK=TRUE, as.double(y), as.logical(ymiss), etc, etc )
>
> and inside ``foo'' have a logical branch based on the value of ymiss(i).
>
> Questions:
>
> (1) Is there a sexier way to proceed? E.g. is it possible within (g77) fortran to detect the fact that y(i) is/was an NA (or not) and make the nature of y(i) the basis of an if-statement?

In C you can use the macros

  ISNA(x)      True for R's NA only
  ISNAN(x)     True for R's NA and IEEE NaN
  R_FINITE(x)  False for Inf, -Inf, NA, NaN

where the R function is.na() is closest to ISNAN(), I think. There's no supplied way to do these things in Fortran, but presumably you could call a C function which did one of these tests.

> (2) Are there any lurking pitfalls in the use of the NAOK=TRUE argument?

I think the way you did it looks perfectly safe. Following my advice above will be a little trickier, because some other user of your code might use a different Fortran compiler, and it might handle C functions differently.

> (3) Is there an entirely different and better way to proceed?

I'd do it your way if I was using Fortran.

Duncan Murdoch
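The correspondence between the C macros and R-level tests can be checked from R itself. The sketch below only prepares the arguments on the R side; no compiled routine is called, and passing the mask as 0/1 integers is a hypothetical calling convention chosen for the example, not something stated in the thread.

```r
y <- c(1.5, NA, NaN, Inf, -2)

ymiss   <- is.na(y)               # TRUE for NA and NaN, like ISNAN()
only_na <- is.na(y) & !is.nan(y)  # TRUE for R's NA only, like ISNA()
finite  <- is.finite(y)           # FALSE for Inf, -Inf, NA, NaN, like R_FINITE()

# What would be handed to the compiled routine: the data as double and
# the missingness mask as 0/1 integers, plus the length.
args <- list(y = as.double(y), ymiss = as.integer(ymiss), n = length(y))
```

Computing the mask in R keeps the Fortran side free of any assumption about how NA is represented in memory.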
Re: [R] missing values
Hello,

Thanks for the instructive responses. But two questions arise.

First of all, I can't manage to load the mice library. I'm using R 2.0.1 on Debian. I tried just copying the package into my library, /usr/lib/R/library, but when I do

  library()

I get

  ...
  mice    ** No title available (pre-2.0.0 install?) **
  ...

and when I do

  library(mice)

I get

  Error in library(mice) : 'mice' is not a valid package -- installed < 2.0.0?

The second question is more statistical: aregImpute() seems to give good results, but I would like to compare the different methods, not just graphically. Is that possible?

I also have other meteorological stations whose data are correlated with the data of the station I'm using. Can I use those data to improve my imputation method?

Regards,
Giordano
RE: [R] missing values
Hello,

In my experience, mice works fine with R 1.9 but not necessarily with newer versions...

Bruno

Bruno Falissard
INSERM U669, PSIGIAM
Paris Sud Innovation Group in Adolescent Mental Health
Maison de Solenn
97 Boulevard de Port Royal
75679 Paris cedex 14, France
tel : (+33) 6 81 82 70 76
fax : (+33) 1 45 59 34 18
web site : http://perso.wanadoo.fr/bruno.falissard/

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Giordano Sanchez
Sent: Tuesday, 26 April 2005 11:58
To: r-help@stat.math.ethz.ch
Subject: Re: [R] missing values

Hello, Thanks for the instructive responses. But two questions arise. Firstable I can't manage to load the library mice. I'm using R 2.0.1 on my Debian I try just copying the package in my library /usr/lib/R/library . but when i do library() ... mice ** No title available (pre-2.0.0 install?) ** ... and when i do library(mice) Error in library(mice) : 'mice' is not a valid package -- installed < 2.0.0?

The second question is more statistical: aregImpute() seems to give good results but i would like to compare the different methods not just graphically. It'is possible?

I also have other meteorological stations that have correleted data with the data station I'm using? Can I use those data to improve my imputation method.

Regards,
Giordano
Re: [R] missing values
On 04/26/05 09:58, Giordano Sanchez wrote:

> Hello,
> Thanks for the instructive responses. But two questions arise.
> Firstable I can't manage to load the library mice. I'm using R 2.0.1 on my Debian

The package called norm also has functions for missing data. When I tried it, the values it gave were not sensible for my problem, but I may have done something wrong. (This was a simple problem that did not involve multiple imputation.)

> The second question is more statistical: aregImpute() seems to give good results but i would like to compare the different methods not just graphically. It'is possible?

What different methods? Compare how? Are you assuming that we remember your last post?

> I also have other meteorological stations that have correleted data with the data station I'm using? Can I use those data to improve my imputation method.

This sounds like exactly what aregImpute() is good for, or transcan(), depending on whether you need to make inferences (and hence do multiple imputation).

Jon
--
Jonathan Baron, Professor of Psychology, University of Pennsylvania
Home page: http://www.sas.upenn.edu/~baron
Re: [R] missing values
Jonathan Baron wrote:

> On 04/26/05 09:58, Giordano Sanchez wrote:
>
>> Hello,
>> Thanks for the instructive responses. But two questions arise.
>> Firstable I can't manage to load the library mice. I'm using R 2.0.1 on my Debian
>
> The package called norm also has functions for missing data. When I tried it, the values it gave were not sensible for my problem, but I may have done something wrong. (This was a simple problem that did not involve multiple imputation.)
>
>> The second question is more statistical: aregImpute() seems to give good results but i would like to compare the different methods not just graphically. It'is possible?
>
> What different methods? Compare how? Are you assuming that we remember your last post?
>
>> I also have other meteorological stations that have correleted data with the data station I'm using? Can I use those data to improve my imputation method.
>
> This sounds like exactly what aregImpute() is good for, or transcan(), depending on whether you need to make inferences (and hence do multiple imputation).
>
> Jon

For those interested I have preprints of a paper comparing MICE, aregImpute, and transcan on the basis of simulations.

--
Frank E Harrell Jr
Professor and Chair, School of Medicine
Department of Biostatistics, Vanderbilt University
Re: [R] missing values
On 26-Apr-05 Jonathan Baron wrote:

> On 04/26/05 09:58, Giordano Sanchez wrote:
>
>> Hello,
>> Thanks for the instructive responses. But two questions arise.
>> Firstable I can't manage to load the library mice. I'm using R 2.0.1 on my Debian
>
> The package called norm also has functions for missing data. When I tried it, the values it gave were not sensible for my problem, but I may have done something wrong. (This was a simple problem that did not involve multiple imputation.)

Hi Jonathan,

Would you be kind enough to give sufficient detail to reproduce such a case? I've used 'norm' (and 'cat' and 'mix') quite extensively, without encountering non-sensible results (at any rate in situations where the packages were not being abused, which one can do in certain circumstances -- imputing missing values can depend quite strongly on supplying realistic constraints, and on not expecting too much when the proportion of missing data is substantial: this methodology does not have magical powers!).

Best wishes,
Ted.

E-Mail: (Ted Harding) [EMAIL PROTECTED]
Fax-to-email: +44 (0)870 094 0861
Date: 26-Apr-05 Time: 12:47:42
-- XFMail --
Re: [R] missing values
Dear Giordano,

The Hmisc package, by Frank Harrell, contains several functions for imputation which I have found extremely useful.

Best,
R.

On Tuesday 26 April 2005 11:58, Giordano Sanchez wrote:

> Hello,
> Thanks for the instructive responses. But two questions arise.
> Firstable I can't manage to load the library mice. I'm using R 2.0.1 on my Debian I try just copying the package in my library /usr/lib/R/library . but when i do library() ... mice ** No title available (pre-2.0.0 install?) ** ... and when i do library(mice) Error in library(mice) : 'mice' is not a valid package -- installed < 2.0.0?
>
> The second question is more statistical: aregImpute() seems to give good results but i would like to compare the different methods not just graphically. It'is possible?
>
> I also have other meteorological stations that have correleted data with the data station I'm using? Can I use those data to improve my imputation method.
>
> Regards,
> Giordano

--
Ramón Díaz-Uriarte
Bioinformatics Unit
Centro Nacional de Investigaciones Oncológicas (CNIO)
(Spanish National Cancer Center)
Melchor Fernández Almagro, 3
28029 Madrid (Spain)
Fax: +-34-91-224-6972
Phone: +-34-91-224-6900
http://ligarto.org/rdiaz
PGP KeyID: 0xE89B3462 (http://ligarto.org/rdiaz/0xE89B3462.asc)
Re: [R] missing values
On 04/26/05 12:54, Ted Harding wrote:

> Would you be kind enough to give sufficient detail to reproduce such a case? I've used 'norm' (and 'cat' and 'mix') quite extensively, without encountering non-sensible results (at any rate in situations where the packages were not being abused, which one can do in certain circumstances -- imputing missing values can depend quite strongly on supplying realistic constraints, and on not expecting too much when the proportion of missing data is substantial: this methodology does not have magical powers!).

OK. Here you go. First the data without any names:

41,43,41,43,44
43,40,40,42,41
43,44,NA,43,44
42,43,NA,44,44
41,44,42,42,42
43,43,41,42,42
47,48,46,47,46
39,35,35,39,38
40,39,36,40,38
40,40,40,40,40
48,46,46,48,46
45,45,42,44,45
41,40,40,41,41
40,39,37,40,38
41,42,40,41,41
41,42,41,43,43
46,46,45,46,46
40,40,41,40,41
39,41,40,41,41
40,43,38,40,39
37,36,37,36,39
45,46,45,46,46
43,44,42,43,44
42,42,48,42,43
45,46,45,46,45
37,36,36,36,38
37,34,39,37,39
NA,43,41,44,43
45,44,45,44,45
38,38,37,39,38
45,44,44,44,45
NA,42,43,43,43
45,45,44,44,45
40,35,37,40,38
43,43,43,43,43
39,34,37,36,39
38,38,38,39,39
43,41,40,42,43
46,43,42,45,45
46,45,41,44,44
40,40,38,39,40
39,37,39,38,39

Now the commands I used in norm, and the result:

  m1 <- as.matrix(read.csv("test.data"))
  s1 <- prelim.norm(m1)
  thetahat <- em.norm(s1)
  rngseed(1234564)
  ximp <- imp.norm(s1,thetahat,m1)
  ximp

 1     41.0 43     41.0 43 44
 2     43.0 40     40.0 42 41
 3     43.0 44 43.72409 43 44
 4     42.0 43 43.36864 44 44
 5     41.0 44     42.0 42 42
 6     43.0 43     41.0 42 42
 7     47.0 48     46.0 47 46
 8     39.0 35     35.0 39 38
 9     40.0 39     36.0 40 38
10     40.0 40     40.0 40 40
11     48.0 46     46.0 48 46
12     45.0 45     42.0 44 45
13     41.0 40     40.0 41 41
14     40.0 39     37.0 40 38
15     41.0 42     40.0 41 41
16     41.0 42     41.0 43 43
17     46.0 46     45.0 46 46
18     40.0 40     41.0 40 41
19     39.0 41     40.0 41 41
20     40.0 43     38.0 40 39
21     37.0 36     37.0 36 39
22     45.0 46     45.0 46 46
23     43.0 44     42.0 43 44
24     42.0 42     48.0 42 43
25     45.0 46     45.0 46 45
26     37.0 36     36.0 36 38
27     37.0 34     39.0 37 39
28 44.13337 43     41.0 44 43
29     45.0 44     45.0 44 45
30     38.0 38     37.0 39 38
31     45.0 44     44.0 44 45
32 41.25152 42     43.0 43 43
33     45.0 45     44.0 44 45
34     40.0 35     37.0 40 38
35     43.0 43     43.0 43 43
36     39.0 34     37.0 36 39
37     38.0 38     38.0 39 39
38     43.0 41     40.0 42 43
39     46.0 43     42.0 45 45
40     46.0 45     41.0 44 44
41     40.0 40     38.0 39 40
42     39.0 37     39.0 38 39

What seemed odd to me, and maybe they aren't, were the imputed values in rows 3 and 4. They seemed high, knowing the rater in question and the students.

Here is the output of transcan, for the same cases, which looks more in line with what I expected:

 1     41.0 43     41.0 43 44
 2     43.0 40     40.0 42 41
 3     43.0 44 43.09469 43 44
 4     42.0 43 43.39897 44 44
 5     41.0 44     42.0 42 42
 6     43.0 43     41.0 42 42
 7     47.0 48     46.0 47 46
 8     39.0 35     35.0 39 38
 9     40.0 39     36.0 40 38
10     40.0 40     40.0 40 40
11     48.0 46     46.0 48 46
12     45.0 45     42.0 44 45
13     41.0 40     40.0 41 41
14     40.0 39     37.0 40 38
15     41.0 42     40.0 41 41
16     41.0 42     41.0 43 43
17     46.0 46     45.0 46 46
18     40.0 40     41.0 40 41
19     39.0 41     40.0 41 41
20     40.0 43     38.0 40 39
21     37.0 36     37.0 36 39
22     45.0 46     45.0 46 46
23     43.0 44     42.0 43 44
24     42.0 42     48.0 42 43
25     45.0 46     45.0 46 45
26     37.0 36     36.0 36 38
27     37.0 34     39.0 37 39
28 43.80165 43     41.0 44 43
29     45.0 44     45.0 44 45
30     38.0 38     37.0 39 38
31     45.0 44     44.0 44 45
32 42.91116 42     43.0 43 43
33     45.0 45     44.0 44 45
34     40.0 35     37.0 40 38
35     43.0 43     43.0 43 43
36     39.0 34     37.0 36 39
37     38.0 38     38.0 39 39
38     43.0 41     40.0 42 43
39     46.0 43     42.0 45 45
40     46.0 45     41.0 44 44
41     40.0 40     38.0 39 40
42     39.0 37     39.0 38 39

The commands here were

  s.imp <- transcan(m1,asis="*",data=m1,imputed=T,long=T,pl=F)
  s.na <- is.na(m1) # which ratings are imputed
  m1[which(s.na)] <- unlist(s.imp$imputed)

(I wish I could find a more elegant way to replace the NAs.)

Jon
--
Jonathan Baron, Professor of Psychology, University of Pennsylvania
Home page: http://www.sas.upenn.edu/~baron
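On the wish for a tidier way to replace the NAs: one possibility is to fill column by column, which makes the alignment between gaps and imputed values explicit. The sketch below uses base R only. The matrix `m` and the list `imputed` are made up, and the assumption that the imputed values come as one numeric vector per column, in row order, is my reading of how `s.imp$imputed` is used in the post, not a documented guarantee.

```r
# Small made-up matrix with the same kind of gaps as the ratings data.
m <- cbind(r1 = c(41, 43, NA, 42),
           r2 = c(43, 40, 44, 43),
           r3 = c(41, NA, NA, 44))

# Hypothetical imputed values: one numeric vector per column, in row order.
imputed <- list(r1 = 43.5, r2 = numeric(0), r3 = c(40.2, 43.7))

filled <- m
for (v in colnames(m)) {
  gaps <- is.na(m[, v])
  # Guard against misalignment between gaps and supplied values.
  stopifnot(sum(gaps) == length(imputed[[v]]))
  if (any(gaps)) filled[gaps, v] <- imputed[[v]]
}
```

Compared with `m1[which(s.na)] <- unlist(...)`, this does not depend on both sides happening to be in column-major order, and it stops with an error if the counts disagree.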
[R] missing values
Hello,

I have climatic data for various years with many missing values. I would like to know which tools in R are best suited to estimating these missing values. (New to R and quite new to statistics.)

Thanks,
G
Re: [R] missing values
Turns out that this is not a simple question. Depending on what you want to do, some statistical methods will just deal with missing data and use what is available, in different ways, e.g., cor().

For other purposes, you might want to impute (fill in) the missing values, and then there are many ways to do this, depending on what else you have (correlated variables?) and what assumptions you are willing to make. Two methods (among many) that I have found useful are in aregImpute() and transcan(), both in the Hmisc package.

To learn more, see my R search page: http://finzi.psych.upenn.edu/ and I also have an example of aregImpute() in http://www.psych.upenn.edu/~baron/rpsych/rpsych.html but see the help files first.

I found the following article very helpful when I was a beginner with respect to this topic (which is still close to true):

Schafer, J. L., Graham, J. W. (2002). Missing data: Our view of the state of the art. Psychological Methods, 7, 147-177.

Jon

On 04/24/05 10:15, Giordano Sanchez wrote:

> Hello,
> I have climatic data of various years with many missing values. I would like to know what tools in R are most suited to estimate this missing values. (New in R and quite new on statistics).

--
Jonathan Baron, Professor of Psychology, University of Pennsylvania
Home page: http://www.sas.upenn.edu/~baron
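As an illustration of "use what is available, in different ways", cor() offers several settings for its `use` argument. The data frame below is simulated purely for the example.

```r
set.seed(42)
d <- data.frame(a = rnorm(30), b = rnorm(30), c = rnorm(30))
d$a[c(2, 5)]     <- NA
d$c[c(5, 9, 20)] <- NA

r_default  <- cor(d)                                 # NA for pairs involving a or c
r_complete <- cor(d, use = "complete.obs")           # only rows complete in all columns
r_pairwise <- cor(d, use = "pairwise.complete.obs")  # rows complete for each pair
```

The two non-default settings generally give different matrices, because "pairwise.complete.obs" uses a different subset of rows for each pair of variables.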
RE: [R] missing values
Hello,

The mice package http://web.inter.nl.net/users/S.van.Buuren/mi/hmtl/mice.htm is also potentially interesting. It works with R 1.9 but not always with newer versions.

Best regards,
Bruno

Bruno Falissard
Département de santé publique
Hôpital Paul Brousse
14 Avenue Paul Vaillant Couturier
94804 Villejuif cedex, France
tel : (+33) 6 81 82 70 76
fax : (+33) 1 45 59 34 18
web site : http://perso.wanadoo.fr/bruno.falissard/

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Giordano Sanchez
Sent: Sunday, 24 April 2005 12:15
To: r-help@stat.math.ethz.ch
Subject: [R] missing values

Hello, I have climatic data of various years with many missing values. I would like to know what tools in R are most suited to estimate this missing values. (New in R and quite new on statistics).

Thanks,
G
[R] Missing Values
I have just started using R for my PhD. I am importing my data from Excel via notepad into Word. Unfortunately, my data has many missing values. I have put '.' in their place, and this allowed me to import the data into R. However, I now want to interpolate these missing values. Please can someone give me some pointers as to the method/code I could use?

Thank you,
Lillian.
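One way this could be approached, sketched on an invented series: read the '.' markers as NA, then interpolate linearly between the observed neighbours with approx(). The column names `day` and `temp` are assumptions for the example, and linear interpolation is just one simple choice among many.

```r
# Made-up series as it might arrive from a text export, with "." for missing.
txt <- "day,temp\n1,10.0\n2,.\n3,14.0\n4,.\n5,.\n6,20.0\n"
d <- read.csv(text = txt, na.strings = ".")

# Interpolate linearly at every day, using only the observed points.
ok <- !is.na(d$temp)
d$temp_filled <- approx(d$day[ok], d$temp[ok], xout = d$day)$y
```

With the default settings approx() returns NA outside the range of the observed points, so leading or trailing gaps are left unfilled rather than extrapolated.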
[R] missing values in logistic regression
Dear R help list,

I am trying to do a logistic regression where I have a categorical response variable Y and two numerical predictors X1 and X2. There are quite a lot of missing values for predictor X2, e.g.,

  Y      X1    X2
  red    0.6   0.2   *
  red    0.5   0.2   *
  red    0.5   NA
  red    0.5   NA
  green  0.2   0.1   *
  green  0.1   NA
  green  0.1   NA
  green  0.05  0.05  *

I am wondering can I combine X1 and X2 in a logistic regression to predict Y, using all the data for X1, even though there are NAs in the X2 data? Or do I have to take only the cases for which there is data for both X1 and X2? (marked with *s above)

I will be very grateful for any help,

sincerely,
Avril Coghlan
University College Dublin, Ireland
Re: [R] missing values in logistic regression
Avril Coghlan [EMAIL PROTECTED] writes:

> Dear R help list,
>
> I am trying to do a logistic regression where I have a categorical response variable Y and two numerical predictors X1 and X2. There are quite a lot of missing values for predictor X2. eg.,
>
>   Y      X1    X2
>   red    0.6   0.2   *
>   red    0.5   0.2   *
>   red    0.5   NA
>   red    0.5   NA
>   green  0.2   0.1   *
>   green  0.1   NA
>   green  0.1   NA
>   green  0.05  0.05  *
>
> I am wondering can I combine X1 and X2 in a logistic regression to predict Y, using all the data for X1, even though there are NAs in the X2 data? Or do I have to take only the cases for which there is data for both X1 and X2? (marked with *s above)
>
> I will be very grateful for any help,

The built-in function (glm) for logistic regression will give you a complete-case analysis. For more advanced handling of missing values, you need to look into imputation methods. Two CRAN packages (at least) are dealing with this, namely mix and mitools. The former is support software for a book, which you'll probably want to consult.

--
   O__  ---- Peter Dalgaard             Blegdamsvej 3
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - ([EMAIL PROTECTED])                     FAX: (+45) 35327907
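What "complete-case analysis" means in practice can be seen by counting the rows glm() actually uses. The data here are simulated for illustration (the toy table in the question is too small to fit sensibly); the coefficients and the number of missing values are arbitrary.

```r
set.seed(7)
n  <- 200
x1 <- runif(n)
x2 <- runif(n)
y  <- rbinom(n, 1, plogis(-1 + 2 * x1 + 1 * x2))
x2[sample(n, 60)] <- NA              # make X2 missing for 60 cases
d  <- data.frame(y, x1, x2)

fit <- glm(y ~ x1 + x2, family = binomial, data = d)

n_used    <- nobs(fit)               # rows actually used in the fit
n_dropped <- length(fit$na.action)   # rows dropped because of an NA
```

Here 60 of the 200 rows are silently dropped, even though X1 and Y are observed for all of them.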
Re: [R] missing values in logistic regression
On 29 Oct 2004, Avril Coghlan wrote:

> [...] I am wondering can I combine X1 and X2 in a logistic regression to predict Y, using all the data for X1, even though there are NAs in the X2 data? Or do I have to take only the cases for which there is data for both X1 and X2? (marked with *s above)

You need to either

1) Train separate models for Y | X1 and Y | X1, X2 and use the appropriate one.

2) Produce an imputation model for X2 | X1, and use multiple imputation.

Given that the latter look like [0, 1] scores, mix (as suggested by PD) is not likely to be appropriate, but e.g. a 2D kde fit may well be.

-- 
Brian D. Ripley, [EMAIL PROTECTED]
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
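Option 1 might be sketched roughly as follows. This is one reading of the suggestion, not code from the thread: it assumes the Y | X1 model is fitted on all rows and the Y | X1, X2 model on the complete cases, and all data and names are simulated and hypothetical.

```r
# Simulated data (hypothetical): predictors on [0, 1], x2 partly missing.
set.seed(2)
n  <- 300
x1 <- runif(n)
x2 <- runif(n)
y  <- rbinom(n, 1, plogis(-2 + 2 * x1 + 2 * x2))
x2[sample(n, 120)] <- NA
d  <- data.frame(y = y, x1 = x1, x2 = x2)

# Model for Y | X1: uses every row, since x1 is always observed.
fit_x1 <- glm(y ~ x1, family = binomial, data = d)

# Model for Y | X1, X2: glm() keeps only the rows where x2 is observed.
fit_both <- glm(y ~ x1 + x2, family = binomial, data = d)

# For each case, predict from the model that matches what was observed.
# predict() returns NA for fit_both where x2 is NA; ifelse() never picks those.
p_x1   <- predict(fit_x1,   newdata = d, type = "response")
p_both <- predict(fit_both, newdata = d, type = "response")
p_hat  <- ifelse(is.na(d$x2), p_x1, p_both)

summary(p_hat)
```

This gives a prediction for every case, but it does not by itself give a single set of coefficients for X1 and X2 estimated from all the data; that is what the imputation route in option 2 is for.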
RE: [R] missing values in logistic regression
On 29-Oct-04 Avril Coghlan wrote:

> [...] I am wondering can I combine X1 and X2 in a logistic regression to predict Y, using all the data for X1, even though there are NAs in the X2 data? Or do I have to take only the cases for which there is data for both X1 and X2? (marked with *s above)

I don't know of any R routine directly aimed at logistic regression with missing values as you describe. However, if you are prepared to assume (or try to arrange by a judiciously chosen transformation) that the distribution of (X1,X2) is bivariate normal, with mean dependent on the value of Y but with the same variance-covariance matrix throughout, then you should be able to make progress along the following lines. This ties in with Peter Dalgaard's suggestion of mix.

I shall assume for this explanation that your Y categories take only two values A and B (as "red", "green"), though the method can be directly extended to several categories in Y.

The underlying theoretical point is that a linear logistic regression is equivalent to a Bayesian discrimination between two normally-distributed clusters.

Let the vector of means for (X1,X2) be mA for group A, and mB for group B; and let the covariance matrix be V. Let x denote (X1,X2). Then

  P(A|x) = [f(x|A)*p(A)]/[f(x|A)*p(A) + f(x|B)*p(B)]

where p(A) and p(B) are the prior probabilities of a group A or a group B item. Now substitute

  f(x|A) = C*exp(-0.5*(x-mA)'%*%W%*%(x-mA))

and similar for f(x|B); C is the constant 1/sqrt((2*pi)^k * det(V)), where k is the dimension of x, and W is the inverse of V. (C is the same for both groups, so it cancels in P(A|x).)
Then, with a bit of algebra,

  P(A|x) = 1/(1 + exp(a + b%*%x))

(a logistic regression) where a is the scalar

  log(p(B)/p(A)) + 0.5*(mA'%*%W%*%mA - mB'%*%W%*%mB)

and b is the vector

  (mB - mA)'%*%W

Now you can come back to the mix package. This is for multiple imputation of missing values in a dataset consisting of variables of two kinds: categorical and continuous. The joint probability model for all the variables is expressed as a product of the multinomial distribution for the categorical variables, with a multivariate normal distribution for the continuous variables where it is assumed that the covariance matrix is the same for every combination of the values of the categorical variables, while the multivariate means may differ at different levels of the categoricals.

Hence the underlying model for the mix package is exactly what is needed for the above.

The primary output from imputation runs with mix is a set of completed datasets (with missing values filled in). You can then run a logistic regression on each completed dataset, obtaining for each dataset the estimates of the regression parameters and their standard errors. These can then be combined using the function mi.inference in the mix library.

You can also, however, extract the parameter values (multinomial probabilities and multivariate means and covariance matrix) used in a particular imputation using the function getparam.mix in the mix library. This function needs parameters s (evaluated by the preliminary processor prelim.mix), and theta, evaluated for each imputation by a data augmentation function such as da.mix.

Then you can substitute these in the above formulae for a and b to get a and b directly, without needing to do an explicit logistic regression on the completed dataset.

Hoping this helps!
Ted.

E-Mail: (Ted Harding) [EMAIL PROTECTED]
Fax-to-email: +44 (0)870 094 0861 [NB: New number!]
Date: 29-Oct-04 Time: 13:45:46 -- XFMail --
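The algebra in the message above can be checked numerically. The sketch below uses made-up values for mA, mB, V and the priors (none of them come from the thread) and compares P(A|x) computed directly from Bayes' rule with the logistic form built from a and b.

```r
# Hypothetical parameter values, chosen only for illustration.
mA <- c(0.5, 0.2)                      # mean of (X1, X2) in group A
mB <- c(0.1, 0.08)                     # mean of (X1, X2) in group B
V  <- matrix(c(0.02, 0.005,
               0.005, 0.01), 2, 2)     # common covariance matrix
pA <- 0.5                              # prior probability of group A
pB <- 0.5                              # prior probability of group B
W  <- solve(V)                         # W is the inverse of V

# Intercept a (scalar) and slope vector b, as given in the message.
a <- log(pB / pA) + 0.5 * drop(t(mA) %*% W %*% mA - t(mB) %*% W %*% mB)
b <- drop(t(mB - mA) %*% W)

# Multivariate normal density with mean m and covariance V.
dmvn <- function(x, m, V) {
  k <- length(x)
  q <- drop(t(x - m) %*% solve(V) %*% (x - m))
  exp(-0.5 * q) / sqrt((2 * pi)^k * det(V))
}

# Compare the two expressions for P(A|x) at an arbitrary point x.
x <- c(0.3, 0.15)
p_bayes    <- dmvn(x, mA, V) * pA /
              (dmvn(x, mA, V) * pA + dmvn(x, mB, V) * pB)
p_logistic <- 1 / (1 + exp(a + sum(b * x)))

all.equal(p_bayes, p_logistic)
```

Because the normalising constant is the same for both groups, it cancels in the ratio, so only the means, the common covariance matrix and the priors enter a and b.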
Re: [R] missing values in logistic regression
(Ted Harding) wrote:

> On 29-Oct-04 Avril Coghlan wrote:
>
> > [...] Or do I have to take only the cases for which there is data for both X1 and X2? (marked with *s above)
>
> I don't know of any R routine directly aimed at logistic regression with missing values as you describe.

The aregImpute function in the Hmisc package can handle this, using predictive mean matching with weighted multinomial sampling of donor observations' binary covariate values.

> [...]

-- 
Frank E Harrell Jr
Professor and Chair, School of Medicine
Department of Biostatistics, Vanderbilt University
[R] missing values imputation
What R functionality is there for missing-value imputation (with a substantial proportion of missing data)? I would prefer to use maximum likelihood methods; is the EM algorithm implemented? In which package?

Thanks
Anne

Anne Piotet
Tel: +41 79 359 83 32 (mobile) Email: [EMAIL PROTECTED]
M-TD Modelling and Technology Development
PSE-C, CH-1015 Lausanne, Switzerland
Tel: +41 21 693 83 98 Fax: +41 21 646 41 33
Re: [R] missing values imputation
Anne Piotet wrote:

> [...] I would prefer to use maximum likelihood methods; is the EM algorithm implemented? In which package?

The so-called ``EM algorithm'' is ***NOT*** an algorithm. It is a methodology or a unifying concept. It would be impossible to ``implement'' it. (Except possibly by means of some extremely advanced and sophisticated Artificial Intelligence software.)

cheers,
Rolf Turner [EMAIL PROTECTED]
RE: [R] missing values imputation
From: Rolf Turner

> Anne Piotet wrote:
>
> > [...] is the EM algorithm implemented? In which package?
>
> The so-called ``EM algorithm'' is ***NOT*** an algorithm. It is a methodology or a unifying concept. It would be impossible to ``implement'' it. [...]

Yes, but EM for missing value imputation is a bit narrower, I guess. At least the `norm' package on CRAN has em.norm() for multivariate Gaussian data...

Andy
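A minimal sketch of how em.norm() is typically called, assuming the norm package is installed (the code does nothing otherwise). The data are simulated and the column names V1 and V2 are hypothetical. Note that the input must be a numeric matrix, not a data frame; use as.matrix() first if needed.

```r
# Hypothetical toy data: a numeric matrix with some entries set to NA.
set.seed(42)
x <- cbind(V1 = rnorm(50), V2 = rnorm(50))
x[sample(50, 15), "V2"] <- NA

if (requireNamespace("norm", quietly = TRUE)) {
  s     <- norm::prelim.norm(x)                 # preprocess: missingness patterns etc.
  theta <- norm::em.norm(s, showits = FALSE)    # EM estimate, in packed form
  est   <- norm::getparam.norm(s, theta)        # unpack into list(mu, sigma)
  print(est$mu)                                 # estimated means
  print(est$sigma)                              # estimated covariance matrix
}
```

The same preprocessed object s and the EM estimate theta are what the package's data augmentation and imputation functions take as input, so this is usually the first step before any multiple imputation with norm.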
Re: [R] missing values imputation
On 12-May-04 Rolf Turner wrote:

> Anne Piotet wrote:
>
> > [...] is the EM algorithm implemented? In which package?
>
> The so-called ``EM algorithm'' is ***NOT*** an algorithm. It is a methodology or a unifying concept. It would be impossible to ``implement'' it. (Except possibly by means of some extremely advanced and sophisticated Artificial Intelligence software.)

Do we understand the same thing by "EM Algorithm"?

The one I'm thinking of -- formulated under that name by Dempster, Laird and Rubin in 1977 ("Maximum likelihood estimation from incomplete data via the EM algorithm", JRSS(B) 39, 1-38) -- is indeed an algorithm in exactly the same sense as any iterative search for the maximum of a function.

Essentially, in the context of data modelled by an underlying exponential family distribution where there is incomplete information about the values which have this distribution, it proceeds by

Start: Choose starting estimates for the parameters of the distribution.

E: Using the current parameter values, compute the expected values of the sufficient statistics conditional on the observed information.

M: Solve the maximum-likelihood equations (which are functions of the sufficient statistics) using the expected values computed in (E).

If sufficiently converged, stop. Otherwise, make the current parameter values equal to the values estimated in (M) and return to (E).

Algorithm, this, or not?

And where does "extremely advanced and sophisticated Artificial Intelligence software" come into it? You can, in some cases, perform the above EM algorithm by hand.

Which "EM Algorithm" are you thinking of?

Best wishes,
Ted.

E-Mail: (Ted Harding) [EMAIL PROTECTED]
Fax-to-email: +44 (0)870 167 1972
Date: 12-May-04 Time: 17:57:53 -- XFMail --
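For one specific model, the Start/E/M steps above can be written out in a few lines of base R. The sketch below is an illustration for a bivariate normal in which x1 is fully observed and x2 has missing values; the function name em_bvn and the simulated data are hypothetical, and this is not the code used by any package mentioned in the thread.

```r
# EM for a bivariate normal (x1, x2) where only x2 has missing values.
em_bvn <- function(x1, x2, tol = 1e-8, max_iter = 1000) {
  miss <- is.na(x2)
  # Start: estimates from the complete cases.
  mu <- c(mean(x1), mean(x2[!miss]))
  S  <- cov(cbind(x1, x2)[!miss, , drop = FALSE])
  for (iter in seq_len(max_iter)) {
    # E step: expected x2 and x2^2 given x1, for the missing entries,
    # under the current parameter values.
    beta   <- S[1, 2] / S[1, 1]
    resvar <- S[2, 2] - S[1, 2]^2 / S[1, 1]
    ex2    <- ifelse(miss, mu[2] + beta * (x1 - mu[1]), x2)
    ex2sq  <- ifelse(miss, ex2^2 + resvar, x2^2)
    # M step: maximum-likelihood estimates from the expected
    # sufficient statistics (divisor n, not n - 1).
    mu_new <- c(mean(x1), mean(ex2))
    s11    <- mean(x1^2)     - mu_new[1]^2
    s12    <- mean(x1 * ex2) - mu_new[1] * mu_new[2]
    s22    <- mean(ex2sq)    - mu_new[2]^2
    S_new  <- matrix(c(s11, s12, s12, s22), 2, 2)
    # Convergence check on the largest parameter change.
    delta  <- max(abs(c(mu_new - mu, S_new - S)))
    mu <- mu_new
    S  <- S_new
    if (delta < tol) break
  }
  list(mu = mu, sigma = S, iterations = iter)
}

# Simulated example: 200 of 500 values of x2 removed at random.
set.seed(3)
n  <- 500
x1 <- rnorm(n)
x2 <- 1 + 0.5 * x1 + rnorm(n, sd = 0.8)
x2[sample(n, 200)] <- NA

fit <- em_bvn(x1, x2)
fit$mu
fit$sigma
```

For this missingness pattern the maximum-likelihood estimate of the mean of x2 has a closed form: the complete-case regression of x2 on x1, evaluated at the mean of all the x1 values. That gives an independent check that the iteration has converged to the right place.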
Re: [R] missing values imputation
That's not an algorithm. It is a recipe for deriving an algorithm.

  algorithm - A detailed sequence of actions to perform to accomplish some task. Named after an Iranian mathematician, Al-Khawarizmi. Technically, an algorithm must reach a result after a finite number of steps, thus ruling out brute force search methods for certain problems, though some might claim that brute force search was also a valid (generic) algorithm. The term is also used loosely for any sequence of actions (which may or may not terminate).

  Paul E. Black's Dictionary of Algorithms, Data Structures, and Problems.

On Wed, 12 May 2004 [EMAIL PROTECTED] wrote:

> [...]
> Start: Choose starting estimates for the parameters of the distribution
> E: Using the current parameter values, compute the expected values of the sufficient statistics conditional on the observed information
> M: Solve the maximum-likelihood equations (which are functions of the sufficient statistics) using the expected values computed in (E)
> If sufficiently converged, stop. Otherwise, make the current parameter values equal to the values estimated in (M) and return to (E).
>
> Algorithm, this, or not? [...]

-- 
Brian D. Ripley, [EMAIL PROTECTED]
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
Re: [R] missing values imputation
(Ted Harding) [EMAIL PROTECTED] writes:

> [...] If sufficiently converged, stop. Otherwise, make the current parameter values equal to the values estimated in (M) and return to (E).
>
> Algorithm, this, or not? [...]

Thanks, Ted :-) -- to extend it a bit, one can imagine the use of approximate solutions to the 2 steps (simulation methods to get expected values, similar range of approaches for the maximization) and get a general (but possibly not robust) computational solution for the parametric problem. Just plug in a formula for the likelihood and the sufficient statistics...

Of course, thousands of papers have been written on these variations (likelihood, specific implementations of the E and M steps).

best,
-tony

-- 
[EMAIL PROTECTED] http://www.analytics.washington.edu/
Biomedical and Health Informatics, University of Washington
Biostatistics, SCHARP/HVTN, Fred Hutchinson Cancer Research Center
UW (Tu/Th/F): 206-616-7630 FAX=206-543-3461 | Voicemail is unreliable
FHCRC (M/W): 206-667-7025 FAX=206-667-4812 | use Email
Re: [R] missing values imputation
Picky, picky. Details are in the eyes of the beholder.

Prof Brian Ripley [EMAIL PROTECTED] writes:

> That's not an algorithm. It is a recipe for deriving an algorithm.
>
> [...]

-- 
[EMAIL PROTECTED] http://www.analytics.washington.edu/
Biomedical and Health Informatics, University of Washington
Re: [R] missing values imputation
A.J. Rossini wrote:

> Picky, picky. Details are in the eyes of the beholder.
>
> > algorithm - A detailed sequence of actions to perform to accomplish some task. Named after an Iranian mathematician, Al-Khawarizmi.

Personally I like the first definition of 'Algorism, Algorithm' in the 1913 Webster's Revised Unabridged:

  1. The art of calculating by nine figures and zero.

Barry
Re: [R] missing values imputation
On 12-May-04 A.J. Rossini wrote:

> (Ted Harding) [EMAIL PROTECTED] writes:
>
> > [...] Algorithm, this, or not [...]
>
> Thanks, Ted :-) -- to extend it a bit, one can imagine the use of approximate solutions to the 2 steps (simulation methods to get expected values, similar range of approaches for the maximization) and get a general (but possibly not robust) computational solution for the parametric problem. Just plug in a formula for the likelihood and the sufficient statistics...

And thank you, Tony! I confess to having deliberately been a bit provocative, since I see an issue here on which I have a view (apparently shared by Tony). For example:

Question: In your view, is the following "exchange sort" procedure an algorithm? Or merely a recipe for deriving an algorithm?

A: Starting at the intended "low" end of the line, compare each i-th item X[i] with the (i+1)-th item X[i+1] for i=1,2,...

B: If you find an i such that X[i] > X[i+1], exchange the positions of X[i] and X[i+1].

C: If you have reached the end of the line, stop. Otherwise, go to (A).

Now, I think this is an algorithm. However, before reading on, please decide what you think yourself about this question.

Well, you could use this to sort a line of people into order of increasing height, without recourse to a measuring scale. Just get X[i] and X[i+1] to stand up straight and look into each other's eyes. If X[i] has to look down into the eyes of X[i+1], then X[i] > X[i+1]; otherwise not.

The point is, illustrated naively by this example, that the above description of "exchange sort" doesn't explain anything about ">". So something has to be "plugged in" (in Tony's words) for ">", and hence the algorithm, to have a meaning or an implementation. There has to be a "sort key" with respect to which there is an implementation of ">" in order to render the algorithm (my terminology ... ) concrete.

So yes, if being picky, the above description of "exchange sort" could be called "a recipe for deriving an algorithm". But then a different algorithm would result for

(a) every different kind of thing which could be sorted;

(b) every different kind of interpretation of ">" (e.g. it would not then be the same algorithm if you measured people's heights with a scale).

OK, now perhaps being picky in my turn ...

However, the general point is that an algorithm, in my and no doubt Tony's notion of it, usually needs a plugin or two or several in order to be implemented for any particular case.

So for the EM algorithm. It needs, specifically, a specification of the exponential-family distribution, a means for computing a conditional expected value with this distribution, and a solver for the complete-data maximum likelihood equations. Once these are provided, the implementation is complete.

Just as a coded computer routine can call a subroutine or co-routine, so also one can envisage an algorithm calling a sub-algorithm.

Final question: What, for instance, is the status of the R function "integrate"?

  plugin <- function(x){x*(1-x)}
  integrate(plugin,0,1)

uses (I quote): "For a finite interval, globally adaptive interval subdivision is used in connection with extrapolation by the Epsilon algorithm."

If "plugin" has not been specified, does the code for "integrate" represent an algorithm or not? Well, I rather think it does!

Best wishes to all,
Ted.

E-Mail: (Ted Harding) [EMAIL PROTECTED]
Fax-to-email: +44 (0)870 167 1972
Date: 12-May-04 Time: 19:44:20 -- XFMail --
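The "plugin" idea can be made concrete in R by passing the comparison in as a function, just as integrate() takes the integrand. The sketch below is a common variant of exchange sort (it repeats full passes until no exchange occurs), not a literal transcription of steps A to C; the names exchange_sort and gt are hypothetical.

```r
# Exchange sort with the comparison ">" supplied by the caller.
# gt(a, b) must return TRUE when a should come after b.
exchange_sort <- function(x, gt) {
  repeat {
    swapped <- FALSE
    for (i in seq_len(max(length(x) - 1, 0))) {
      if (gt(x[[i]], x[[i + 1]])) {
        tmp        <- x[[i]]        # exchange neighbours i and i + 1
        x[[i]]     <- x[[i + 1]]
        x[[i + 1]] <- tmp
        swapped    <- TRUE
      }
    }
    if (!swapped) break             # a full pass with no exchange: sorted
  }
  x
}

# Same procedure, two different plugins for ">".
heights   <- c(170, 155, 182, 160)
by_height <- exchange_sort(heights, function(a, b) a > b)

fruit     <- c("pear", "fig", "banana")
by_length <- exchange_sort(fruit, function(a, b) nchar(a) > nchar(b))

# integrate() is written the same way: the integrand is the plugin.
plugin <- function(x) { x * (1 - x) }
area   <- integrate(plugin, 0, 1)$value

by_height
by_length
area
```

The sorting code never changes; only the function handed in for the comparison does, which is the sense in which the procedure needs a plugin before it can be run on any particular kind of item.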
Re: [R] missing values imputation
Thanks Brian.

The EM algorithm requires an ``E'' step and an ``M'' step. Harding and Rossini appear to be seriously suggesting that an R function could be written which would

(a) Perform the E step in arbitrary contexts, and

(b) For that given expected value, work out a procedure to effect its maximization.

Or maybe they're not serious.

For the M step (b), general numerical optimization would theoretically do the trick. (But would be fraught with peril.) For the E step (a), forget it.

The point is, the EM ``algorithm'' is NOT an algorithm which could be effected by an R function. This is in complete contrast with integrate() --- it's there; the code is written. Hand integrate() an integration problem, and it'll do it.

One of the differences is that the input to an integration problem is clearly defined and readily specifiable as an R function. The input to a general missing values problem is amorphous.

Arguing about what constitutes an algorithm according to some abstract definition is mindless. If you define ``algorithm'' to suit yourself, then the EM algorithm is an algorithm; otherwise not.

The original questioner wanted an R function to effect the EM algorithm. My point was that this is a silly request because such a function would be impossible to write.

Call the EM algorithm an algorithm if it makes you happy. But remember that by doing so you'll mislead the naive inquirer who will expect there to be a real live implementation of that algorithm. In computer (R) code. Like integrate().

If you can write an R function to effect the EM ``algorithm'' --- in general, not just in a special case --- you'll win the Chambers Prize in computing and a few other things as well.

cheers,
Rolf Turner [EMAIL PROTECTED]
Re: [R] missing values imputation
Rolf Turner [EMAIL PROTECTED] writes:

> The EM algorithm requires an ``E'' step and an ``M'' step. Harding and Rossini appear to be seriously suggesting that an R function could be written which would (a) Perform the E step in arbitrary contexts, and (b) For that given expected value, work out a procedure to effect its maximization. Or maybe they're not serious.

Serious for a range of reasonable specific problems and appropriate specification of the function (remember that sufficient statistics aren't unique, and would have to be specified!). Think of it as a macro. Exercise left to the reader, see below.

> If you can write an R function to effect the EM ``algorithm'' --- in general, not just in a special case --- you'll win the Chambers Prize in computing and a few other things as well.

I believe there is an eligibility issue with the award you mention (perhaps you are thinking of the ACM award?), but I suspect the results, as in most software publications, would be severe headaches and grief from having to listen to complaints, gripes, and groaning. Seldom are prizes, credit, and gratitude given, else Brian would be drowning in them.

best,
-tony

-- 
[EMAIL PROTECTED] http://www.analytics.washington.edu/
Biomedical and Health Informatics, University of Washington
Re: [R] missing values imputation
On 12-May-04 Rolf Turner wrote:

> The EM algorithm requires an ``E'' step and an ``M'' step. Harding and Rossini appear to be seriously suggesting that an R function could be written which would (a) Perform the E step in arbitrary contexts, and (b) For that given expected value, work out a procedure to effect its maximization. Or maybe they're not serious.
>
> For the M step (b) general numerical optimization would theoretically do the trick. (But would be fraught with peril.) For the E step (a), forget it.
>
> The point is, the EM ``algorithm'' is NOT an algorithm which could be effected by an R function. [...]
>
> The original questioner wanted an R function to effect the EM algorithm. My point was that this is a silly request because such a function would be impossible to write.

Well, I think there's been enough hair-splitting on the "algorithm" issue!

To revert to the point about the original query from Anne Piotet. She said she would prefer to use maximum likelihood methods, and asked if the EM algorithm was available, in the context of imputing missing data.

I don't think she was asking about whether R was blessed with a universal EM algorithm into which any incomplete-data problem could be plugged (and I agree that the generality of the problem, especially expressing the conditioning corresponding to arbitrary incompleteness, would make such a thing very elusive).

What I believe she *was* asking was whether, using R, she could do imputation with maximum-likelihood methods using the EM algorithm. There are plenty of imputation methods which dodge likelihood altogether, and thereby lose efficiency, so the question has a lot of point, and the EM algorithm is of course the natural approach since no information is more manifestly incomplete than when there are holes in the data.

Schafer's methods (and thanks, Chuck, for the pointer to "pan") all implement the EM algorithm to obtain maximum likelihood estimates in the first instance.

As far as replying to Anne was concerned, I think all that was needed was to give this information. To receive a response which asserted (in effect) that it was unimplementable must have come as a bit of a surprise, in the context!

Anyway, 'nuff said, probably ...

Best wishes to all,
Ted.

E-Mail: (Ted Harding) [EMAIL PROTECTED]
Fax-to-email: +44 (0)870 167 1972
Date: 12-May-04 Time: 22:05:08 -- XFMail --
Re: [R] missing values and survival analysis
On Sun, 11 Apr 2004 23:37:22 -0400 [EMAIL PROTECTED] wrote:

> Hi everyone, I'm analysing a survival analysis data set at the moment with missing values in the covariate and survival vectors (I have about 60 variables). I know there are some functions on the CRAN network to deal with missing values in general multivariate data. Does anybody know of any package that deals with missing data specifically in the context of survival analysis. Any help would be greatly appreciated. Thanks, john.

Consider using the aregImpute function in the Hmisc package with right-censored survival times by predicting baseline covariates using the follow-up time, event indicator/censoring, and the product of the two, using multiple imputation. I am not comfortable with imputing follow-up time and event indicators from covariates though. If the follow-up time is completely missing you might consider discarding the observation.

---
Frank E Harrell Jr
Professor and Chair, School of Medicine
Department of Biostatistics, Vanderbilt University
[R] missing values and survival analysis
Hi everyone, I'm analysing a survival analysis data set at the moment with missing values in the covariate and survival vectors (I have about 60 variables). I know there are some functions on the CRAN network to deal with missing values in general multivariate data. Does anybody know of any package that deals with missing data specifically in the context of survival analysis? Any help would be greatly appreciated. Thanks, john.
Re: [R] missing values for mda package
Thanks. I was able to use na.omit to remove NAs. But it seems to me this kills one of the advantages of the original algorithm for handling missing values.

On Tue, 2004-04-06 at 11:54, Uwe Ligges wrote:

zhu wang wrote:

Dear helpers, I am trying to use the mda package downloaded from the R website, but the data set has missing values so I got an error message. Should I manually handle these missing values? I was trying to read the documents to specify any option related to missing values, but I did not find it. Please forgive me if I ignore something obvious.

If it is not documented (hence probably not available) and you don't know how to tell the functions to handle missing values, try to do it yourself. ?NA suggests:

See Also: [...] 'na.action', 'na.omit', 'na.fail' on how methods can be tuned to deal with missing values.

Uwe Ligges

-- Zhu Wang, Statistical Science Department, Southern Methodist University, Dallas, TX 75275-0332
Re: [R] missing values for mda package
Package mda covers many things, including bruto, mars, polyreg and mda itself. Which `the original algorithm' for which option did you have in mind? More concretely, what were you trying to do with the package?

Given that the package is the original authors' own code, it seems unlikely that they `killed one of the advantages' of their methodology, so elucidation is sorely needed.

On 9 Apr 2004, zhu wang wrote:

Thanks. I was able to use na.omit to remove NAs. But it seems to me this kills one of the advantages of the original algorithm for handling missing values.

-- Brian D. Ripley, Professor of Applied Statistics, University of Oxford, http://www.stats.ox.ac.uk/~ripley/
RE: [R] missing values for mda package
I basically wanted to use MARS to reproduce results using the dataset Marketing in the following book: http://www-stat-class.stanford.edu/~tibs/ElemStatLearn/

The authors actually provided S-Plus functions for mars, bruto, etc. I used all default options of mars in R but there was an error due to NAs and I could not find any option to handle missing values.

Zhu Wang
RE: [R] missing values for MARS (was mda package)
On Fri, 9 Apr 2004, Wang, Zhu wrote:

I basically wanted to use MARS to reproduce results using the dataset Marketing in the following book: http://www-stat-class.stanford.edu/~tibs/ElemStatLearn/ The authors actually provided S-Plus functions for mars, bruto, etc. I used all default options of mars in R but there was an error due to NAs and I could not find any option to handle missing values.

Friedman originated MARS and has code for it. The code in mda by Hastie/Tibshirani is different, and the code on that website is a direct ancestor of the mda package for R. I see no option in the code for mars there to handle missing values, so you would do better to ask the authors how they did it (if you really believe they have such an option).

And PLEASE read the posting guide and try to learn to ask precise questions with enough background information!

-- Brian D. Ripley, Professor of Applied Statistics, University of Oxford, http://www.stats.ox.ac.uk/~ripley/
[R] missing values for mda package
Dear helpers, I am trying to use the mda package downloaded from the R website, but the data set has missing values so I got an error message. Should I manually handle these missing values? I was trying to read the documents to specify any option related to missing values, but I did not find it. Please forgive me if I ignore something obvious.

Thanks, Zhu Wang, Statistical Science Department, Southern Methodist University, Dallas, TX 75275-0332
Re: [R] missing values for mda package
zhu wang wrote:

Dear helpers, I am trying to use the mda package downloaded from the R website, but the data set has missing values so I got an error message. Should I manually handle these missing values? I was trying to read the documents to specify any option related to missing values, but I did not find it. Please forgive me if I ignore something obvious.

If it is not documented (hence probably not available) and you don't know how to tell the functions to handle missing values, try to do it yourself. ?NA suggests:

See Also: [...] 'na.action', 'na.omit', 'na.fail' on how methods can be tuned to deal with missing values.

Uwe Ligges
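"Doing it yourself" for a function that takes a predictor matrix and a response vector could be sketched as below. The data are simulated and the object names are illustrative; the commented-out mars call assumes the mda package is installed and is shown only to indicate where the cleaned data would go.

```r
# Drop every row that has an NA in the predictors or in the response
set.seed(1)
x <- matrix(rnorm(60), ncol = 3,
            dimnames = list(NULL, c("x1", "x2", "x3")))
y <- rnorm(20)
x[c(2, 7), 1] <- NA      # two incomplete predictor rows
y[5] <- NA               # one missing response

ok   <- complete.cases(x, y)   # TRUE where a row is complete in both
x_cc <- x[ok, , drop = FALSE]
y_cc <- y[ok]

# fit <- mda::mars(x_cc, y_cc)  # assumed usage; needs the mda package
c(rows_before = nrow(x), rows_after = nrow(x_cc))
```

Using complete.cases on x and y together keeps the rows of the two objects aligned, which separate calls to na.omit on each would not guarantee.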
[R] missing values
How can I deal with missing values in the Excel file? I used read.csv to import the data; however, there are missing values in the csv file. When I use names(), I get an error message:

names attribute must be the same length as the vector

What can I do with the missing values? Thanks
Re: [R] missing values
Grace Conlon [EMAIL PROTECTED] writes:

How can I deal with missing values in the excel file? I used read.csv to imports data, how ever there are missing values in the csv file. When I use names(), it turns out a error message: names attribute must be the same length as the vector What can i do with the missing values?

What were you trying to do with names and what has it got to do with missing values??

How are the missing values coded in the csv file? If they are empty fields, read.csv (btw, isn't it easier to export as delimited file and use read.delim?) should handle them automatically; if not, try using the na.strings argument.

-- Peter Dalgaard, Dept. of Biostatistics, University of Copenhagen, Blegdamsvej 3, 2200 Cph. N, Denmark
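A small illustration of the na.strings argument, assuming a file in which missing values appear both as empty fields and as the text "n/a". The inline csv_text stands in for the real file, and its column names are invented for the example.

```r
# Inline stand-in for the exported csv file (hypothetical contents)
csv_text <- "id,height,group
1,170,a
2,,b
3,n/a,
4,165,a"

# Treat empty fields, "NA" and "n/a" as missing while reading
dat <- read.csv(text = csv_text,
                na.strings = c("", "NA", "n/a"),
                stringsAsFactors = FALSE)

names(dat)            # column names come from the header line
colSums(is.na(dat))   # number of missing values per column
```

Without "n/a" in na.strings the height column would be read as text rather than as numbers, which is often the first visible symptom of an unrecognised missing-value code.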
[R] missing values and gam (was: how to handle missing values)
Thank you for all the responses on generalized additive models (gam) and missing values. I am now able to set up a model using gam and have a certain understanding of how R deals with missing values. The problem is, however, that I am still not able to fit a gam model from a dataset that contains missing values. The call

C <- gam(depvar~var1+var2+s(var3), data=dataset)

returns the error

Error in na.omit.default() : Argument object is missing, with no default

Again, can anyone help a newbie?

Tor
Re: [R] missing values and gam (was: how to handle missing values)
mgcv 0.9 will handle missing values properly (provided you are happy that dropping them is 'proper'). There is a pre-release version at: www.stats.gla.ac.uk/~simon/simon/mgcv.html (it is a pre-release version, so there will be bugs, reports of which gratefully received!)

simon

Thank you for all the responses on generalized additive models (gam) and missing values. I am now able to set up a model using gam and have a certain understanding of how R deals with missing values. The problem is, however, that I am still not able to fit a gam model from a dataset that contains missing values. The call

C <- gam(depvar~var1+var2+s(var3), data=dataset)

returns the error

Error in na.omit.default() : Argument object is missing, with no default

Again, can anyone help a newbie?

Tor
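For reference, a sketch of what "dropping them" looks like with a current mgcv, which ships with standard R installations. The variable names follow the question, but the data are simulated and the number of missing values is an arbitrary assumption.

```r
library(mgcv)

set.seed(1)
n <- 200
dataset <- data.frame(var1 = rnorm(n), var2 = rnorm(n), var3 = runif(n))
dataset$depvar <- with(dataset,
  1 + 0.5 * var1 - 0.3 * var2 + sin(2 * pi * var3) + rnorm(n, sd = 0.3))
dataset$var3[sample(n, 15)]   <- NA   # holes in a predictor
dataset$depvar[sample(n, 10)] <- NA   # holes in the response

# Rows with any NA in the model variables are dropped before fitting
fit <- gam(depvar ~ var1 + var2 + s(var3), data = dataset,
           na.action = na.omit)

n_used <- nrow(fit$model)   # rows that actually entered the fit
c(total = n, used = n_used)
```

Comparing n_used with the number of rows in the data is a quick way to see how much information the complete-case approach discards.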
[R] missing values
Dear list members, I'm relatively new to this list; can anyone tell me how to declare missing values once a dataset has been attached? For example here:

   VAR1
1     1
2     2
3     1
4     3
5     2
6     1
7     3
8     3
9     1
10    2
11   98
12    2
13   97
14   99
15   NA
16    3

I would like values 97, 98 and 99 to be treated as missing values. I read everything about is.na but I just can't figure out how to do it.

Many thanks, Adrian

Adrian Dusa ([EMAIL PROTECTED]), Romanian Social Data Archive (www.roda.ro), 1, Schitu Magureanu Bd., 76625 Bucharest sector 5, Romania
Re: [R] missing values
On Wed, 28 May 2003, Adrian Dusa wrote:

I would like values 97, 98 and 99 to be treated as missing values.

VAR1[VAR1 %in% c(97,98,99)] <- NA

-thomas
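Applied to the values from the question, the recoding works as sketched below. The data frame part is an extension for the common case where several columns share the same codes; the names survey, q1 and q2 are invented for the illustration.

```r
# The vector from the question
VAR1 <- c(1, 2, 1, 3, 2, 1, 3, 3, 1, 2, 98, 2, 97, 99, NA, 3)

# %in% never returns NA, so the existing NA does not disturb the index
VAR1[VAR1 %in% c(97, 98, 99)] <- NA
VAR1

# Same idea for several columns of a (hypothetical) data frame
survey <- data.frame(q1 = c(1, 97, 3), q2 = c(99, 2, 98))
codes  <- c(97, 98, 99)
survey[] <- lapply(survey, function(v) replace(v, v %in% codes, NA))
survey
```

Note that if the data frame was attached before recoding, the change should be made to the column inside the data frame (or the data frame re-attached), since attach works on a copy.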