[R] missing values

2007-05-30 Thread Allan Clark

Hello all,

I would like to perform multiple imputation using the norm library, but I get the following error when I use the da.norm function:

Error in as.double.default(list(V1 = c(0.058177827, 0.123076923, 0.138713745,  :
  (list) object cannot be coerced to 'double'

Can anyone help?

Thanking you in advance,

Allan Clark

Lecturer in Statistical Sciences Department
University of Cape Town
7701 Rondebosch
South Africa
TEL (Office): +27-21-650-3228
FAX: +27-21-650-4773
http://web.uct.ac.za/depts/stats/aclark.htm

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] missing values

2007-05-30 Thread Prof Brian Ripley

When it says 'matrix' it means it, not 'data frame'.
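
To illustrate the point, here is a minimal sketch in base R. The data frame `df` is a hypothetical stand-in for the poster's data, not the original. It shows that a data frame is a list of columns, which is why coercion to double fails, and that `data.matrix()` produces the numeric matrix that matrix-only functions expect.

```r
# Hypothetical data frame with missing values (not the poster's data).
df <- data.frame(V1 = c(0.058177827, 0.123076923, NA, 0.138713745),
                 V2 = c(1.2, NA, 0.7, 0.9))

# A data frame is a list of columns, so direct coercion to double fails.
coercion_failed <- inherits(try(as.double(df), silent = TRUE), "try-error")

# data.matrix() converts the data frame to a numeric matrix, keeping the NAs.
m <- data.matrix(df)
is.matrix(m)
is.double(m)
```

The matrix `m`, rather than `df`, would then be the object to hand to the norm functions.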

On Wed, 30 May 2007, Allan Clark wrote:

> hello all
>
> i would like to perform multiple imputation using the norm library.
>
> but i seem to get the following error when i use the da.norm function.
>
> Error in as.double.default(list(V1 = c(0.058177827, 0.123076923, 0.138713745,  :
>   (list) object cannot be coerced to 'double'
>
> can anyone help?
>
> thanking you in advance


-- 
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


[R] missing values

2007-05-02 Thread elyakhlifi mustapha

Hello,
I need your help with this example:

> for(k in LR) {
+ donGeno[[k]] <- as.numeric(levels(factor(subset(don2, Id_Essai == 1006961 &
+ Id_Cara == LC[1] & Id_Rep == k, select = Id_Geno)[,1])))
+ print(donGeno[[k]])}
 [1] 65125 65126 65127 65128 65129 65130 65131 65132 65133 65134 65135 65136
65137 65138 65139 65140 65141 65142 65143 65144 65171
 [1] 65126 65127 65128 65129 65130 65131 65132 65133 65134 65135 65136 65137
65138 65139 65140 65141 65142 65143 65144 65171
 [1] 65125 65126 65127 65128 65129 65130 65131 65132 65133 65134 65135 65136
65137 65138 65139 65140 65141 65142 65143 65144 65171

The vector donGeno[[2]] is missing a value: it does not contain 65125. I want to
remove this value from the other vectors as well, and I tried to do so as
follows:

C <- vector()
for(k in LR) {
C[k] <- length(donGeno[[k]])
}
print(C)
na=match(rep(0,length(C)-sum(match(C,C[1],nomatch=0))),match(C,C[1],nomatch=0))
#print(na)
if(na==length(C)){
pos=match(0,match(donGeno[[na-1]],donGeno[[na]],nomatch=0))
for(k in 1:(na-1)) {
donGeno[[k]] <- donGeno[[k]][1:(na-1)]
}
} else {
pos=match(0,match(donGeno[[na+1]],donGeno[[na]],nomatch=0))
for(k in 1:(.))
}

Is there a better way to do this than this script?

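One possible simplification of the task described in this message, offered only as a sketch: keep in every vector just the values that occur in all of them. The short `donGeno` list below is a hypothetical stand-in for the poster's data, and `common` is a name introduced here for illustration.

```r
# Hypothetical stand-in for the poster's list (one vector per replicate).
donGeno <- list(c(65125, 65126, 65127, 65171),
                c(65126, 65127, 65171),
                c(65125, 65126, 65127, 65171))

# Values that are present in every vector of the list.
common <- Reduce(intersect, donGeno)

# Drop from each vector the values that are not shared by all of them.
donGeno <- lapply(donGeno, function(v) v[v %in% common])
```

This avoids comparing vector lengths and locating the position of the missing value by hand.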

  

Re: [R] Missing values detected when there are no missing values

2006-04-24 Thread Petr Pikal

Hi

On 22 Apr 2006 at 23:29, Bob Green wrote:

Date sent:  Sat, 22 Apr 2006 23:29:02 +1000
To:         r-help@stat.math.ethz.ch
From:       Bob Green [EMAIL PROTECTED]
Subject:    [R] Missing values detected when there are no missing values

> I am hoping for some advice on the following matters.
>
> I have a csv data file with 153 variables x 92 rows.   To determine
> what the variables looked like I ran the summary command.  One
> variable had a large number of missing values  54/92.  For some
> reason, all subsequent 74 variables are reported as having 92 NA
> values, irrespective of whether the original csv variable was complete
> or not.

I have not seen any answer yet, so I will try to offer one.

First, how do you know that there are no missing values in your csv
file?

> Below are the commands I ran:
>
>   study1dat <- read.csv("c:\\study1r.csv",header=T)
>   attach(study1dat)
>   names(study1dat)
>   summary(study1dat)

You showed what you did, but we cannot know much about study1r.csv, so
my answer is only a guess. Let's assume that the csv was constructed from
Excel: could there be a problem in its construction? Perhaps some columns
contain spaces that are not visible in Excel but are exported to the csv
and read into R as NA values?

What does str(study1dat) say about your data?
And are there really "," value separators and "." decimal separators,
as required by read.csv?

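The checks suggested above can be sketched in base R as follows. The CSV text is a hypothetical stand-in for study1r.csv, with one empty numeric cell and one empty text cell, and `na_counts` is a name introduced here. Passing `""` in `na.strings` is an assumption about how empty text cells should be treated.

```r
# Hypothetical CSV content standing in for study1r.csv.
csv_text <- "id,score,label\n1,10,a\n2,,b\n3,12,\n"

# Read it, treating both "NA" and empty fields as missing.
dat <- read.csv(text = csv_text, header = TRUE,
                na.strings = c("NA", ""), stringsAsFactors = FALSE)

# How was each column read (integer, numeric, character, ...)?
str(dat)

# Number of missing values per column.
na_counts <- colSums(is.na(dat))
na_counts
```

A column that was expected to be numeric but shows up as character or factor in `str()` usually points to stray text or a wrong separator in the file.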
> The second puzzling issue, is that one variable with no missing values
> is reported in R as having 3 missing values, whereas there are no
> missing values in the csv file. The only errors in reading the data I
> received were:

These did not arise when reading the data but when attaching the data
frame. Some names in your data frame are the same as the names of some
functions in the mentioned packages. This is not an error; R just tells
you that this has happened so that you are aware of it.

HTH
Petr

> The following object(s) are masked from package:stats :
>    time
>
> The following object(s) are masked from package:graphics :
>    screen
>
> The following object(s) are masked from package:datasets :
>    sleep
>
> The following object(s) are masked from package:base :
>    pipe
>
> I am happy to send the csv file if required. Any advice that can
> offered is appreciated,
>
> Bob


Petr Pikal
[EMAIL PROTECTED]


[R] Missing values detected when there are no missing values

2006-04-22 Thread Bob Green

I am hoping for some advice on the following matters.

I have a csv data file with 153 variables x 92 rows.   To determine what 
the variables looked like I ran the summary command.  One variable had a 
large number of missing values  54/92.  For some reason, all subsequent 74 
variables are reported as having 92 NA values, irrespective of whether the 
original csv variable was complete or not.

Below are the commands I ran:

  study1dat <- read.csv("c:\\study1r.csv",header=T)
  attach(study1dat)
  names(study1dat)
  summary(study1dat)

The second puzzling issue is that one variable with no missing values is 
reported in R as having 3 missing values, whereas there are no missing 
values in the csv file.
The only errors in reading the data I received were:

The following object(s) are masked from package:stats :
  time

 The following object(s) are masked from package:graphics :
  screen

 The following object(s) are masked from package:datasets :
  sleep

 The following object(s) are masked from package:base :
pipe

I am happy to send the csv file if required. Any advice that can be offered is 
appreciated.

Bob


Re: [R] missing values in step procedure

2005-10-08 Thread Bernardo Rangel Tura

At 11:11 7/10/2005, you wrote:

> Hi,
> I have the problem that the step procedure stops due to missing
> values. There are no options in step or stepAIC to handle missing
> values. Is there any way to run stepwise model selection in R in an
> automated way in this case?
>
> Here is the last step before it stops. Hope someone knows. Best regards,
> Andreas
>
> Step:  AIC= 1999.16
>  EF ~ SF120_KS + SF120_PS + HADA0 + SOZU0 + LVEDD + logPROBNP +
>     ALTER + SD0_01 + ASE_UK + DS140POS + RSQSICH0 + SD0_01:ASE_UK +
>     SD0_01:DS140POS + SD0_01:RSQSICH0 + ASE_UK:DS140POS + ASE_UK:RSQSICH0 +
>     DS140POS:RSQSICH0 + SD0_01:ASE_UK:RSQSICH0 + SD0_01:DS140POS:RSQSICH0 +
>     ASE_UK:DS140POS:RSQSICH0
>
>                            Df Sum of Sq     RSS     AIC
> - SOZU0                     1       3.0 25356.0  1997.2
> - HADA0                     1       7.6 25360.6  1997.3
> - ALTER                     1      13.0 25365.9  1997.4
> - SF120_PS                  1      14.7 25367.6  1997.5
> - ASE_UK:DS140POS:RSQSICH0  1      20.1 25373.1  1997.6
> - SD0_01:DS140POS:RSQSICH0  1      44.8 25397.7  1998.0
> - SD0_01:ASE_UK:RSQSICH0    1      54.4 25407.4  1998.2
> <none>                                  25352.9  1999.2
> - LVEDD                     1     382.2 25735.1  2004.6
> - SF120_KS                  1     476.4 25829.3  2006.4
> - logPROBNP                 1     891.9 26244.9  2014.4
> Error in step(mod2, direction = "back") :
>   number of rows in use has changed: remove missing values?
>
> Andreas

Try

data <- na.omit(original database) before you run step() or stepAIC()

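A minimal sketch of this advice, using only base R. The built-in `airquality` data set stands in for the poster's data, and the model formula is purely illustrative. Restricting `na.omit()` to the variables that enter the model avoids dropping rows because of NAs in unrelated columns.

```r
# Built-in data set with missing values, used as a stand-in.
vars <- c("Ozone", "Solar.R", "Wind", "Temp")

# Keep only the model variables, then drop every row with a missing value.
aq <- na.omit(airquality[, vars])

# Fit the full model on the complete cases.
full <- lm(Ozone ~ Solar.R + Wind + Temp, data = aq)

# Backward selection now sees the same rows at every step.
sel <- step(full, direction = "backward", trace = 0)
formula(sel)
```

Because every candidate model is fitted to the same rows, the "number of rows in use has changed" error cannot arise.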



Bernardo Rangel Tura, MD, MSc
National Institute of Cardiology Laranjeiras
Rio de Janeiro Brazil  



[R] missing values in step procedure

2005-10-07 Thread Andreas Cordes

Hi,
I have the problem that the step procedure stops due to missing 
values. There are no options in step or stepAIC to handle missing 
values. Is there any way to run stepwise model selection in R in an 
automated way in this case?

Here is the last step before it stops. Hope someone knows. Best regards, 
Andreas

Step:  AIC= 1999.16
 EF ~ SF120_KS + SF120_PS + HADA0 + SOZU0 + LVEDD + logPROBNP +
    ALTER + SD0_01 + ASE_UK + DS140POS + RSQSICH0 + SD0_01:ASE_UK +
    SD0_01:DS140POS + SD0_01:RSQSICH0 + ASE_UK:DS140POS + ASE_UK:RSQSICH0 +
    DS140POS:RSQSICH0 + SD0_01:ASE_UK:RSQSICH0 + SD0_01:DS140POS:RSQSICH0 +
    ASE_UK:DS140POS:RSQSICH0

                           Df Sum of Sq     RSS     AIC
- SOZU0                     1       3.0 25356.0  1997.2
- HADA0                     1       7.6 25360.6  1997.3
- ALTER                     1      13.0 25365.9  1997.4
- SF120_PS                  1      14.7 25367.6  1997.5
- ASE_UK:DS140POS:RSQSICH0  1      20.1 25373.1  1997.6
- SD0_01:DS140POS:RSQSICH0  1      44.8 25397.7  1998.0
- SD0_01:ASE_UK:RSQSICH0    1      54.4 25407.4  1998.2
<none>                                  25352.9  1999.2
- LVEDD                     1     382.2 25735.1  2004.6
- SF120_KS                  1     476.4 25829.3  2006.4
- logPROBNP                 1     891.9 26244.9  2014.4
Error in step(mod2, direction = "back") :
  number of rows in use has changed: remove missing values?


Re: [R] missing values in step procedure

2005-10-07 Thread Prof Brian Ripley

On Fri, 7 Oct 2005, Andreas Cordes wrote:

> I have the problem that the step procedure stops due to missing
> values. There are no options in step or stepAIC to handle missing
> values. Is there any way to run stepwise model selection in R in an
> automated way in this case?

Try the hint it gives you, or see the help page (which covers this in a 
warning with an explanation).

[...]

> Error in step(mod2, direction = "back") :
>   number of rows in use has changed: remove missing values?

-- 
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


[R] Missing values in argument of .Fortran.

2005-06-06 Thread Rolf Turner

I wish to pass a vector ``y'', some of whose entries are NAs to a
fortran subroutine which I am dynamically loading and calling by
means of .Fortran().  The subroutine runs through the vector entry by
entry; obviously I want to have it do one thing if y[i] is present
and a different thing if it is missing.

The way I am thinking of proceeding is along the lines of:

ymiss <- is.na(y)
rslt <- .Fortran(
    "foo",
    NAOK=TRUE,
    as.double(y),
    as.logical(ymiss),
    etc,
    etc
)

and inside ``foo'' have a logical branch based on the value of
ymiss(i).

Questions:

(1) Is there a sexier way to proceed?  E.g. is it possible
within (g77) fortran to detect the fact that y(i) is/was an
NA (or not) and make the nature of y(i) the basis of an
if-statement?

(2) Are there any lurking pitfalls in the use of the NAOK=TRUE
argument?

(3) Is there an entirely different and better way to proceed?

TIA.

cheers,

Rolf Turner
[EMAIL PROTECTED]

P. S. I'm running R 2.0.1 under (Red Hat) Linux.  (Sigh.  Yes I must
get around to upgrading real soon now.)

R. T.


Re: [R] Missing values in argument of .Fortran.

2005-06-06 Thread Duncan Murdoch


On 6/6/2005 9:52 AM, Rolf Turner wrote:

> I wish to pass a vector ``y'', some of whose entries are NAs to a
> fortran subroutine which I am dynamically loading and calling by
> means of .Fortran().  The subroutine runs through the vector entry by
> entry; obviously I want to have it do one thing if y[i] is present
> and a different thing if it is missing.
>
> The way I am thinking of proceeding is along the lines of:
>
> ymiss <- is.na(y)
> rslt <- .Fortran(
>     "foo",
>     NAOK=TRUE,
>     as.double(y),
>     as.logical(ymiss),
>     etc,
>     etc
> )
>
> and inside ``foo'' have a logical branch based on the value of
> ymiss(i).
>
> Questions:
>
> (1) Is there a sexier way to proceed?  E.g. is it possible
> within (g77) fortran to detect the fact that y(i) is/was an
> NA (or not) and make the nature of y(i) the basis of an
> if-statement?


In C you can use the macros

ISNA(x)      True for R's NA only
ISNAN(x)     True for R's NA and IEEE NaN
R_FINITE(x)  False for Inf, -Inf, NA, NaN

where the R function is.na() is closest to ISNAN(), I think.  There's no 
supplied way to do these things in Fortran, but presumably you could 
call a C function which did one of these tests.



> (2) Are there any lurking pitfalls in the use of the NAOK=TRUE
> argument?


I think the way you did it looks perfectly safe.  Following my advice 
above will be a little trickier, because some other user of your code 
might use a different Fortran compiler, and it might handle C functions 
differently.



> (3) Is there an entirely different and better way to proceed?


I'd do it your way if I was using Fortran.
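
The R side of the mask approach discussed in this thread can be sketched as follows. No compiled routine is available here, so `.Fortran()` is not actually called: `foo_r` is a hypothetical pure-R stand-in for the branch that the subroutine ``foo'' would perform, and `args` and `n` are names introduced only for this illustration.

```r
# Hypothetical input vector with missing entries.
y <- c(1.5, NA, 3.2, NA, 0.7)

# Logical mask computed on the R side.
ymiss <- is.na(y)

# Arguments as they would be prepared for .Fortran("foo", ..., NAOK = TRUE).
args <- list(y = as.double(y),
             ymiss = as.logical(ymiss),
             n = as.integer(length(y)))

# Pure-R stand-in for the subroutine: sum and count the entries that are
# present, taking the other branch (skip) for the missing ones.
foo_r <- function(y, ymiss, n) {
  total <- 0
  npresent <- 0L
  for (i in seq_len(n)) {
    if (!ymiss[i]) {
      total <- total + y[i]
      npresent <- npresent + 1L
    }
  }
  list(total = total, npresent = npresent)
}

res <- do.call(foo_r, args)
```

The subroutine never has to inspect y(i) itself when the mask says it is missing, which is what makes this approach independent of how the compiler represents NA.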

Duncan Murdoch


Re: [R] missing values

2005-04-26 Thread Giordano Sanchez

Hello,
Thanks for the instructive responses. But two questions arise.
First of all, I can't manage to load the library mice.
I'm using R 2.0.1 on Debian.
I tried just copying the package into my library, /usr/lib/R/library,
but when I do library()
   ...
   mice   ** No title available (pre-2.0.0 install?) **
   ...
and when I do library(mice)
  Error in library(mice) : 'mice' is not a valid package -- installed < 2.0.0?

The second question is more statistical:
aregImpute() seems to give good results, but I would like to compare the 
different methods not just graphically. Is that possible?
I also have other meteorological stations whose data are correlated with the 
data from the station I'm using. Can I use those data to improve my imputation 
method?

Regards,
Giordano

RE: [R] missing values

2005-04-26 Thread falissard

Hello,
In my experience, mice works fine with R 1.9 but not necessarily with newer
versions...
Bruno


Bruno Falissard
INSERM U669, PSIGIAM
Paris Sud Innovation Group in Adolescent Mental Health
Maison de Solenn
97 Boulevard de Port Royal
75679 Paris cedex 14, France
tel : (+33) 6 81 82 70 76
fax : (+33) 1 45 59 34 18
web site : http://perso.wanadoo.fr/bruno.falissard/

 

Re: [R] missing values

2005-04-26 Thread Jonathan Baron

On 04/26/05 09:58, Giordano Sanchez wrote:
> Hello,
>
> Thanks for the instructive responses. But two questions arise.
> First of all, I can't manage to load the library mice.
> I'm using R 2.0.1 on Debian.

The package called norm also has functions for missing data.
When I tried it, the values it gave were not sensible for my
problem, but I may have done something wrong.  (This was a simple 
problem that did not involve multiple imputation.)

> The second question is more statistical:
> aregImpute() seems to give good results, but I would like to compare the
> different methods not just graphically. Is that possible?

What different methods?  Compare how?  Are you assuming that we
remember your last post?

> I also have other meteorological stations whose data are correlated with the
> data from the station I'm using. Can I use those data to improve my imputation
> method?

This sounds like exactly what aregImpute() is good for, or
transcan(), depending on whether you need to make inferences (and 
hence do multiple imputation).

Jon
-- 
Jonathan Baron, Professor of Psychology, University of Pennsylvania
Home page: http://www.sas.upenn.edu/~baron


Re: [R] missing values

2005-04-26 Thread Frank E Harrell Jr

Jonathan Baron wrote:
> On 04/26/05 09:58, Giordano Sanchez wrote:
>> Hello,
>>
>> Thanks for the instructive responses. But two questions arise.
>> First of all, I can't manage to load the library mice.
>> I'm using R 2.0.1 on Debian.
>
> The package called norm also has functions for missing data.
> When I tried it, the values it gave were not sensible for my
> problem, but I may have done something wrong.  (This was a simple
> problem that did not involve multiple imputation.)
>
>> The second question is more statistical:
>> aregImpute() seems to give good results, but I would like to compare the
>> different methods not just graphically. Is that possible?
>
> What different methods?  Compare how?  Are you assuming that we
> remember your last post?
>
>> I also have other meteorological stations whose data are correlated with the
>> data from the station I'm using. Can I use those data to improve my imputation
>> method?
>
> This sounds like exactly what aregImpute() is good for, or
> transcan(), depending on whether you need to make inferences (and
> hence do multiple imputation).
>
> Jon

For those interested I have preprints of a paper comparing MICE, 
aregImpute, and transcan on the basis of simulations.

--
Frank E Harrell Jr   Professor and Chair   School of Medicine
 Department of Biostatistics   Vanderbilt University

Re: [R] missing values

2005-04-26 Thread Ted Harding

On 26-Apr-05 Jonathan Baron wrote:
> On 04/26/05 09:58, Giordano Sanchez wrote:
>> Hello,
>>
>> Thanks for the instructive responses. But two questions arise.
>> First of all, I can't manage to load the library mice.
>> I'm using R 2.0.1 on Debian.
>
> The package called norm also has functions for missing data.
> When I tried it, the values it gave were not sensible for my
> problem, but I may have done something wrong.  (This was a simple
> problem that did not involve multiple imputation.)

Hi Jonathan,
Would you be kind enough to give sufficient detail to reproduce
such a case? I've used 'norm' (and 'cat' and 'mix') quite
extensively, without encountering non-sensible results (at any
rate in situations where the packages were not being abused,
which one can do in certain circumstances -- imputing missing
values can depend quite strongly on supplying realistic constraints,
and on not expecting too much when the proportion of missing data
is substantial: this methodology does not have magical powers!).

best wishes,
Ted.



E-Mail: (Ted Harding) [EMAIL PROTECTED]
Fax-to-email: +44 (0)870 094 0861
Date: 26-Apr-05   Time: 12:47:42
-- XFMail --


Re: [R] missing values

2005-04-26 Thread Ramon Diaz-Uriarte

Dear Giordano,

Library Hmisc, by Frank Harrell, contains several functions for imputation 
which I have found extremely useful.

Best,

R.




-- 
Ramón Díaz-Uriarte
Bioinformatics Unit
Centro Nacional de Investigaciones Oncológicas (CNIO)
(Spanish National Cancer Center)
Melchor Fernández Almagro, 3
28029 Madrid (Spain)
Fax: +-34-91-224-6972
Phone: +-34-91-224-6900

http://ligarto.org/rdiaz
PGP KeyID: 0xE89B3462
(http://ligarto.org/rdiaz/0xE89B3462.asc)





Re: [R] missing values

2005-04-26 Thread Jonathan Baron


On 04/26/05 12:54, Ted Harding wrote:
> Would you be kind enough to give sufficient detail to reproduce
> such a case? I've used 'norm' (and 'cat' and 'mix') quite
> extensively, without encountering non-sensible results (at any
> rate in situations where the packages were not being abused,
> which one can do in certain circumstances -- imputing missing
> values can depend quite strongly on supplying realistic constraints,
> and on not expecting too much when the proportion of missing data
> is substantial: this methodology does not have magical powers!).

OK.  Here you go.  First the data without any names:

41,43,41,43,44
43,40,40,42,41
43,44,NA,43,44
42,43,NA,44,44
41,44,42,42,42
43,43,41,42,42
47,48,46,47,46
39,35,35,39,38
40,39,36,40,38
40,40,40,40,40
48,46,46,48,46
45,45,42,44,45
41,40,40,41,41
40,39,37,40,38
41,42,40,41,41
41,42,41,43,43
46,46,45,46,46
40,40,41,40,41
39,41,40,41,41
40,43,38,40,39
37,36,37,36,39
45,46,45,46,46
43,44,42,43,44
42,42,48,42,43
45,46,45,46,45
37,36,36,36,38
37,34,39,37,39
NA,43,41,44,43
45,44,45,44,45
38,38,37,39,38
45,44,44,44,45
NA,42,43,43,43
45,45,44,44,45
40,35,37,40,38
43,43,43,43,43
39,34,37,36,39
38,38,38,39,39
43,41,40,42,43
46,43,42,45,45
46,45,41,44,44
40,40,38,39,40
39,37,39,38,39

Now the commands I used in norm, and the result:

m1 <- as.matrix(read.csv("test.data"))
s1 <- prelim.norm(m1)
thetahat <- em.norm(s1)
rngseed(1234564)
ximp <- imp.norm(s1,thetahat,m1)
ximp

1  41.0 43 41.0 43 44
2  43.0 40 40.0 42 41
3  43.0 44 43.72409 43 44
4  42.0 43 43.36864 44 44
5  41.0 44 42.0 42 42
6  43.0 43 41.0 42 42
7  47.0 48 46.0 47 46
8  39.0 35 35.0 39 38
9  40.0 39 36.0 40 38
10 40.0 40 40.0 40 40
11 48.0 46 46.0 48 46
12 45.0 45 42.0 44 45
13 41.0 40 40.0 41 41
14 40.0 39 37.0 40 38
15 41.0 42 40.0 41 41
16 41.0 42 41.0 43 43
17 46.0 46 45.0 46 46
18 40.0 40 41.0 40 41
19 39.0 41 40.0 41 41
20 40.0 43 38.0 40 39
21 37.0 36 37.0 36 39
22 45.0 46 45.0 46 46
23 43.0 44 42.0 43 44
24 42.0 42 48.0 42 43
25 45.0 46 45.0 46 45
26 37.0 36 36.0 36 38
27 37.0 34 39.0 37 39
28 44.13337 43 41.0 44 43
29 45.0 44 45.0 44 45
30 38.0 38 37.0 39 38
31 45.0 44 44.0 44 45
32 41.25152 42 43.0 43 43
33 45.0 45 44.0 44 45
34 40.0 35 37.0 40 38
35 43.0 43 43.0 43 43
36 39.0 34 37.0 36 39
37 38.0 38 38.0 39 39
38 43.0 41 40.0 42 43
39 46.0 43 42.0 45 45
40 46.0 45 41.0 44 44
41 40.0 40 38.0 39 40
42 39.0 37 39.0 38 39

What seemed odd to me, and maybe they aren't, were the imputed
values in rows 3 and 4.  They seemed high, knowing the rater in
question and the students.  Here is the output of transcan, for
the same cases, which looks more in line with what I expected:

1  41.0 43 41.0 43 44
2  43.0 40 40.0 42 41
3  43.0 44 43.09469 43 44
4  42.0 43 43.39897 44 44
5  41.0 44 42.0 42 42
6  43.0 43 41.0 42 42
7  47.0 48 46.0 47 46
8  39.0 35 35.0 39 38
9  40.0 39 36.0 40 38
10 40.0 40 40.0 40 40
11 48.0 46 46.0 48 46
12 45.0 45 42.0 44 45
13 41.0 40 40.0 41 41
14 40.0 39 37.0 40 38
15 41.0 42 40.0 41 41
16 41.0 42 41.0 43 43
17 46.0 46 45.0 46 46
18 40.0 40 41.0 40 41
19 39.0 41 40.0 41 41
20 40.0 43 38.0 40 39
21 37.0 36 37.0 36 39
22 45.0 46 45.0 46 46
23 43.0 44 42.0 43 44
24 42.0 42 48.0 42 43
25 45.0 46 45.0 46 45
26 37.0 36 36.0 36 38
27 37.0 34 39.0 37 39
28 43.80165 43 41.0 44 43
29 45.0 44 45.0 44 45
30 38.0 38 37.0 39 38
31 45.0 44 44.0 44 45
32 42.91116 42 43.0 43 43
33 45.0 45 44.0 44 45
34 40.0 35 37.0 40 38
35 43.0 43 43.0 43 43
36 39.0 34 37.0 36 39
37 38.0 38 38.0 39 39
38 43.0 41 40.0 42 43
39 46.0 43 42.0 45 45
40 46.0 45 41.0 44 44
41 40.0 40 38.0 39 40
42 39.0 37 39.0 38 39

The commands here were

s.imp <- transcan(m1,asis="*",data=m1,imputed=T,long=T,pl=F)
s.na <- is.na(m1) # which ratings are imputed
m1[which(s.na)] <- unlist(s.imp$imputed)

(I wish I could find a more elegant way to replace the NAs.)

Jon
- 
Jonathan Baron, Professor of Psychology, University of Pennsylvania
Home page: http://www.sas.upenn.edu/~baron


[R] missing values

2005-04-24 Thread Giordano Sanchez

Hello,
I have climatic data for various years with many missing values. I would like 
to know which tools in R are best suited to estimating these missing values. 
(I am new to R and quite new to statistics.)

Thanks,
G

Re: [R] missing values

2005-04-24 Thread Jonathan Baron

Turns out that this is not a simple question.  Depending on what
you want to do, some statistical methods will just deal with
missing data and use what is available, in different ways, e.g.,
cor().  For other purposes, you might want to impute (fill in)
the missing values, and then there are many ways to do this,
depending on what else you have (correlated variables?) and what
assumptions you are willing to make.  Two methods (among many)
that I have found useful are in aregImpute() and transcan(), both
in the Hmisc package.
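
As a small illustration of methods that simply use what is available, the sketch below shows how the `use` argument of `cor()` changes the treatment of missing values. The matrix `x` is made-up data, and the three result names are introduced here for illustration.

```r
# Made-up data: columns a and b each have one missing value.
x <- cbind(a = c(1, 2, 3, 4, NA),
           b = c(2, 4, 6, NA, 10),
           c = c(5, 3, 4, 1, 2))

# Default: any pair involving a column with an NA yields NA.
r_default <- cor(x)

# Use only the rows in which every column is observed (rows 1 to 3 here).
r_complete <- cor(x, use = "complete.obs")

# For each pair of columns, use all rows in which both are observed.
r_pairwise <- cor(x, use = "pairwise.complete.obs")
```

The pairwise option uses more of the data, but each correlation may then be based on a different set of rows.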

To learn more, see my R search page:
http://finzi.psych.upenn.edu/

and I also have an example of aregImpute() in 
http://www.psych.upenn.edu/~baron/rpsych/rpsych.html

but see the help files first.

I found the following article very helpful when I was a beginner
with respect to this topic (which is still close to true):

Schafer, J. L., & Graham, J. W. (2002).  Missing data: Our view
of the state of the art.  Psychological Methods, 7, 147-177.

Jon

On 04/24/05 10:15, Giordano Sanchez wrote:
> Hello,
>
> I have climatic data for various years with many missing values. I would like
> to know which tools in R are best suited to estimating these missing values.
> (I am new to R and quite new to statistics.)

-- 
Jonathan Baron, Professor of Psychology, University of Pennsylvania
Home page: http://www.sas.upenn.edu/~baron


RE: [R] missing values

2005-04-24 Thread falissard

Hello,

The mice package http://web.inter.nl.net/users/S.van.Buuren/mi/hmtl/mice.htm
is also potentially interesting.
It works with R 1.9 but not always with newer versions.
Best regards,

Bruno


Bruno Falissard
Département de santé publique
Hôpital Paul Brousse
14 Avenue Paul Vaillant Couturier
94804 Villejuif cedex, France
tel : (+33) 6 81 82 70 76
fax : (+33) 1 45 59 34 18
web site : http://perso.wanadoo.fr/bruno.falissard/



[R] Missing Values

2004-12-06 Thread nhy303

I have just started using R for my PhD.  I am importing my data from Excel
via notepad into Word.  Unfortunately, my data has many missing values.  I
have put '.' and this allowed me to import the data into R.  However, I
now want to interpolate these missing values.  Please can someone give me
some pointers as to the method/code I could use?

Thank you,

Lillian.


[R] missing values in logistic regression

2004-10-29 Thread Avril Coghlan

Dear R help list,

   I am trying to do a logistic regression
where I have a categorical response variable Y
and two numerical predictors X1 and X2. There
are quite a lot of missing values for predictor X2.
e.g.,

Y      X1    X2
red    0.6   0.2   *
red    0.5   0.2   *
red    0.5   NA
red    0.5   NA
green  0.2   0.1   *
green  0.1   NA
green  0.1   NA
green  0.05  0.05  *


I am wondering can I combine X1 and X2 in
a logistic regression to predict Y, using
all the data for X1, even though there are NAs in
the X2 data?

Or do I have to take only the cases for which
there is data for both X1 and X2? (marked
with *s above)

I will be very grateful for any help,

sincerely,
Avril Coghlan
University College Dublin, Ireland


Re: [R] missing values in logistic regression

2004-10-29 Thread Peter Dalgaard

Avril Coghlan [EMAIL PROTECTED] writes:

 Dear R help list,
 
I am trying to do a logistic regression
 where I have a categorical response variable Y
 and two numerical predictors X1 and X2. There
 are quite a lot of missing values for predictor X2.
 e.g.,
 
 Y      X1    X2
 red    0.6   0.2   *
 red    0.5   0.2   *
 red    0.5   NA
 red    0.5   NA
 green  0.2   0.1   *
 green  0.1   NA
 green  0.1   NA
 green  0.05  0.05  *
 
 
 I am wondering can I combine X1 and X2 in
 a logistic regression to predict Y, using
 all the data for X1, even though there are NAs in
 the X2 data?
 
 Or do I have to take only the cases for which
 there is data for both X1 and X2? (marked
 with *s above)
 
 I will be very grateful for any help,

The built-in function (glm) for logistic regression will give you
a complete-case analysis. 
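
A minimal sketch of what "complete-case analysis" means in practice, using simulated data rather than the poster's eight rows (which are too few to fit sensibly). The names dat, n_used and n_dropped are made up for illustration, and na.action = na.omit is passed explicitly so the sketch does not depend on the session's options():

```r
# Simulated stand-in for the poster's data: X2 is missing in 40 of 100 rows.
set.seed(123)
n  <- 100
X1 <- runif(n)
X2 <- runif(n)
Y  <- factor(ifelse(runif(n) < plogis(-1 + 2 * X1), "red", "green"))
X2[1:40] <- NA
dat <- data.frame(Y, X1, X2)

# Any row with an NA in Y, X1 or X2 is dropped before the model is fitted.
fit <- glm(Y ~ X1 + X2, family = binomial, data = dat, na.action = na.omit)

n_used    <- nobs(fit)               # rows that entered the fit
n_dropped <- length(fit$na.action)   # rows removed because of NAs
```

Here n_used is 60 and n_dropped is 40: the information in X1 for the dropped rows plays no part in the fit.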

For more advanced handling of missing values, you need to look into
imputation methods. Two CRAN packages (at least) are dealing with
this, namely mix and mitools. The former is support software for a
book, which you'll probably want to consult.

-- 
   O__   Peter Dalgaard Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics 2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark  Ph: (+45) 35327918
~~ - ([EMAIL PROTECTED]) FAX: (+45) 35327907


Re: [R] missing values in logistic regression

2004-10-29 Thread Prof Brian Ripley

On 29 Oct 2004, Avril Coghlan wrote:

 Dear R help list,
 
I am trying to do a logistic regression
 where I have a categorical response variable Y
 and two numerical predictors X1 and X2. There
 are quite a lot of missing values for predictor X2.
 e.g.,
 
 Y      X1    X2
 red    0.6   0.2   *
 red    0.5   0.2   *
 red    0.5   NA
 red    0.5   NA
 green  0.2   0.1   *
 green  0.1   NA
 green  0.1   NA
 green  0.05  0.05  *
 
 
 I am wondering can I combine X1 and X2 in
 a logistic regression to predict Y, using
 all the data for X1, even though there are NAs in
 the X2 data?
 
 Or do I have to take only the cases for which
 there is data for both X1 and X2? (marked
 with *s above)

You need to either

1) Train separate models for Y | X1 and Y | X1, X2 and use the appropriate 
one.

2) Produce an imputation model for X2 | X1, and use multiple imputation.

Given that the latter look like [0, 1] scores, mix (as suggested by PD) 
is not likely to be appropriate, but e.g. a 2D kde fit may well be.
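
Option 1 might be sketched as follows. This is only an illustration on simulated data; the data-generating model and the helper name predict_either are assumptions made here, not anything from the thread. One model is fitted on X1 alone (all rows), one on X1 and X2 (complete rows), and each case is scored with whichever model matches the predictors it actually has:

```r
# Simulated data with X2 missing for 80 of 200 cases (illustrative only).
set.seed(1)
n  <- 200
X1 <- runif(n)
X2 <- pmin(pmax(0.5 * X1 + rnorm(n, sd = 0.1), 0), 1)
Y  <- factor(ifelse(runif(n) < plogis(-2 + 3 * X1 + 2 * X2), "red", "green"))
X2[sample(n, 80)] <- NA
dat <- data.frame(Y, X1, X2)

fit1  <- glm(Y ~ X1,      family = binomial, data = dat)  # uses every row
fit12 <- glm(Y ~ X1 + X2, family = binomial, data = dat)  # complete rows only

# Hypothetical helper: pick the model according to whether X2 is available.
predict_either <- function(newdata) {
  has_x2 <- !is.na(newdata$X2)
  p <- numeric(nrow(newdata))
  if (any(has_x2)) {
    p[has_x2] <- predict(fit12, newdata[has_x2, , drop = FALSE],
                         type = "response")
  }
  if (any(!has_x2)) {
    p[!has_x2] <- predict(fit1, newdata[!has_x2, , drop = FALSE],
                          type = "response")
  }
  p
}

p <- predict_either(dat)   # fitted P(Y = "red") for every case
```

Because the factor levels are ordered alphabetically, both models describe the probability of "red".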


-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595


RE: [R] missing values in logistic regression

2004-10-29 Thread Ted Harding

On 29-Oct-04 Avril Coghlan wrote:
 Dear R help list,
 
I am trying to do a logistic regression
 where I have a categorical response variable Y
 and two numerical predictors X1 and X2. There
 are quite a lot of missing values for predictor X2.
 e.g.,
 
 Y      X1    X2
 red    0.6   0.2   *
 red    0.5   0.2   *
 red    0.5   NA
 red    0.5   NA
 green  0.2   0.1   *
 green  0.1   NA
 green  0.1   NA
 green  0.05  0.05  *
 
 I am wondering can I combine X1 and X2 in
 a logistic regression to predict Y, using
 all the data for X1, even though there are NAs in
 the X2 data?
 
 Or do I have to take only the cases for which
 there is data for both X1 and X2? (marked
 with *s above)

I don't know of any R routine directly aimed at logistic regression
with missing values as you describe.

However, if you are prepared to assume (or try to arrange by a
judiciously chosen transformation) that the distribution of (X1,X2)
is bivariate normal, with mean dependent on the value of Y but
with the same variance-covariance matrix throughout, then you
should be able to make progress along the following lines.
This ties in with Peter Dalgaard's suggestion of mix.
I shall assume for this explanation that your Y categories take
only two values, A and B (like red and green), though the method can
be directly extended to several categories in Y.

The underlying theoretical point is that a linear logistic
regression is equivalent to a Bayesian discrimination between
two normally-distributed clusters. Let the vector of means for
(X1,X2) be mA for group A, and mB for group B; and let the
covariance matrix be V. Let x denote (X1,X2).

Then P(A|x) = [f(x|A)*p(A)]/[f(x|A)*p(A) + f(x|B)*p(B)]

where p(A) and p(B) are the prior probabilities of a group A
or a group B item.

Now substitute

 f(x|A) = C*exp(-0.5*(x-mA)'%*%W%*%(x-mA))

and similarly for f(x|B); C is the constant 1/sqrt((2*pi)^k*det(V)),
where k is the dimension of x, and W is the inverse of V.

Then, with a bit of algebra,

 P(A|x) = 1/(1 + exp(a + b%*%x))

(a logistic regression) where a is the scalar

 log(p(B)/p(A)) + 0.5*(mA'%*%W%*%mA - mB'%*%W%*%mB)

and b is the vector

 (mB - mA)'%*%W
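
The algebra can be checked numerically. The sketch below uses made-up values for mA, mB, V and the priors, and a small hand-written density function (dmvn is an illustrative name, not a package function). It computes P(A|x) once by Bayes' rule and once from the logistic form with a and b as defined above:

```r
# Illustrative parameter values (assumptions, not from the thread).
mA <- c(0.5, 0.2)
mB <- c(0.1, 0.1)
V  <- matrix(c(0.04, 0.01, 0.01, 0.02), 2, 2)   # common covariance matrix
W  <- solve(V)                                  # its inverse
pA <- 0.4
pB <- 0.6

# Multivariate normal density with mean m and covariance V.
dmvn <- function(x, m, V) {
  k <- length(x)
  d <- x - m
  drop(exp(-0.5 * t(d) %*% solve(V) %*% d)) / sqrt((2 * pi)^k * det(V))
}

x <- c(0.3, 0.15)

# Posterior probability of group A by Bayes' rule.
post_bayes <- dmvn(x, mA, V) * pA /
  (dmvn(x, mA, V) * pA + dmvn(x, mB, V) * pB)

# The same probability from the logistic form.
a <- log(pB / pA) + 0.5 * drop(t(mA) %*% W %*% mA - t(mB) %*% W %*% mB)
b <- t(mB - mA) %*% W
post_logit <- 1 / (1 + exp(a + drop(b %*% x)))
```

The two results agree to rounding error; the normalising constant cancels because both groups share the same V.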

Now you can come back to the mix package. This is for multiple
imputation of missing values in a dataset consisting of variables
of two kinds: categorical and continuous.

The joint probability model for all the variables is expressed as a
product of the multinomial distribution for the categorical variables,
with a multivariate normal distribution for the continuous variables
where it is assumed that the covariance matrix is the same for every
combination of the values of the categorical variables, while the
multivariate means may differ at different levels of the categoricals.
Hence the underlying model for the mix package is exactly what is
needed for the above.

The primary output from imputation runs with mix is a set of
completed datasets (with missing values filled in). You can then
run a logistic regression on each completed dataset, obtaining
for each dataset the estimates of the regression parameters and
their standard errors. These can then be combined using the function
mi.inference in the mix library.
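
For orientation, the combining step is understood to follow Rubin's rules for multiple imputation. The sketch below applies those rules by hand in base R instead of calling mi.inference, and the estimates and standard errors are made-up numbers standing in for one regression coefficient from five completed datasets:

```r
# Hypothetical results for one coefficient from m = 5 completed datasets.
est <- c(1.92, 2.10, 2.03, 1.87, 2.15)   # made-up estimates
se  <- c(0.41, 0.39, 0.43, 0.40, 0.42)   # made-up standard errors
m   <- length(est)

pooled <- mean(est)                  # combined point estimate
W_bar  <- mean(se^2)                 # average within-imputation variance
B      <- var(est)                   # between-imputation variance
T_var  <- W_bar + (1 + 1 / m) * B    # total variance of the pooled estimate

# Rubin's degrees of freedom for the t reference distribution.
df <- (m - 1) * (1 + W_bar / ((1 + 1 / m) * B))^2
ci <- pooled + c(-1, 1) * qt(0.975, df) * sqrt(T_var)
```

The total variance is always at least the average within-imputation variance; the excess reflects the uncertainty due to the missing values.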

You can also, however, extract the parameter values (multinomial
probabilities and multivariate means and covariance matrix) used
in a particular imputation using the function getparam.mix in
the mix library. This function needs parameters s (evaluated
by the preliminary processor prelim.mix), and theta, evaluated
for each imputation by a data augmentation function such as da.mix.
Then you can substitute these in the above formulae for a and b to get
a and b directly, without needing to do an explicit logistic regression
on the completed dataset.

Hoping this helps!
Ted.



E-Mail: (Ted Harding) [EMAIL PROTECTED]
Fax-to-email: +44 (0)870 094 0861  [NB: New number!]
Date: 29-Oct-04   Time: 13:45:46
-- XFMail --


Re: [R] missing values in logistic regression

2004-10-29 Thread Frank E Harrell Jr

(Ted Harding) wrote:
On 29-Oct-04 Avril Coghlan wrote:
Dear R help list,
  I am trying to do a logistic regression
where I have a categorical response variable Y
and two numerical predictors X1 and X2. There
are quite a lot of missing values for predictor X2.
e.g.,
Y      X1    X2
red    0.6   0.2   *
red    0.5   0.2   *
red    0.5   NA
red    0.5   NA
green  0.2   0.1   *
green  0.1   NA
green  0.1   NA
green  0.05  0.05  *
I am wondering can I combine X1 and X2 in
a logistic regression to predict Y, using
all the data for X1, even though there are NAs in
the X2 data?
Or do I have to take only the cases for which
there is data for both X1 and X2? (marked
with *s above)

I don't know of any R routine directly aimed at logistic regression
with missing values as you describe.

The aregImpute function in the Hmisc package can handle this, using 
predictive mean matching with weighted multinomial sampling of donor 
observations' binary covariate values.

. . ..
Ted.

E-Mail: (Ted Harding) [EMAIL PROTECTED]

--
Frank E Harrell Jr   Professor and Chair   School of Medicine
 Department of Biostatistics   Vanderbilt University

[R] missing values imputation

2004-05-12 Thread Anne

 What R functionalities are there to do missing values imputation (substantial 
proportion of missing data)? 
I would prefer to use maximum likelihood methods; is the EM algorithm implemented? In 
which package?


Thanks

Anne



Anne Piotet
Tel: +41 79 359 83 32 (mobile)
Email: [EMAIL PROTECTED]
---
M-TD Modelling and Technology Development
PSE-C
CH-1015 Lausanne
Switzerland
Tel: +41 21 693 83 98
Fax: +41 21 646 41 33
--
 

Re: [R] missing values imputation

2004-05-12 Thread Rolf Turner

Anne Piotet wrote:

 What R functionalities are there to do missing values imputation
 (substantial proportion of missing data)?  I would prefer to use
 maximum likelihood methods; is the EM algorithm implemented? in
 which package?

The so-called ``EM algorithm'' is ***NOT*** an
algorithm.  It is a methodology or a unifying concept.
It would be impossible to ``implement'' it.  (Except
possibly by means of some extremely advanced and
sophisticated Artificial Intelligence software.)

cheers,

Rolf Turner
[EMAIL PROTECTED]


RE: [R] missing values imputation

2004-05-12 Thread Liaw, Andy

 From: Rolf Turner
 
 Anne Piotet wrote:
 
  What R functionalities are there to do missing values imputation
  (substantial proportion of missing data)?  I would prefer to use
  maximum likelihood methods; is the EM algorithm implemented? in
  which package?
 
   The so-called ``EM algorithm'' is ***NOT*** an
   algorithm.  It is a methodology or a unifying concept.
   It would be impossible to ``implement'' it.  (Except
   possibly by means of some extremely advanced and
   sophisticated Artificial Intelligence software.)

Yes, but EM for missing value imputation is a bit narrower, I guess.  At
least the `norm' package on CRAN has em.norm() for multivariate gaussian...

Andy

 
   cheers,
 
   Rolf Turner
   [EMAIL PROTECTED]
 
 



Re: [R] missing values imputation

2004-05-12 Thread Ted Harding

On 12-May-04 Rolf Turner wrote:
 Anne Piotet wrote:
 
 What R functionalities are there to do missing values imputation
 (substantial proportion of missing data)?  I would prefer to use
 maximum likelihood methods; is the EM algorithm implemented? in
 which package?
 
   The so-called ``EM algorithm'' is ***NOT*** an
   algorithm.  It is a methodology or a unifying concept.
   It would be impossible to ``implement'' it.  (Except
   possibly by means of some extremely advanced and
   sophisticated Artificial Intelligence software.)

Do we understand the same thing by EM Algorithm?

The one I'm thinking of -- formulated under that name by Dempster,
Laird and Rubin in 1977 ("Maximum likelihood estimation from incomplete
data via the EM algorithm", JRSS(B) 39, 1-38) -- is indeed an algorithm
in exactly the same sense as any iterative search for the maximum of a
function.

Essentially, in the context of data modelled by an underlying exponential
family distribution where there is incomplete information about the
values which have this distribution, it proceeds by

Start: Choose starting estimates for the parameters of the distribution
E: Using the current parameter values, compute the expected values
   of the sufficient statistics conditional on the observed information
M: Solve the maximum-likelihood equations (which are functions of the
   sufficient statistics) using the expected values computed in (E)
If sufficiently converged, stop. Otherwise, make the current parameter
values equal to the values estimated in (M) and return to (E).
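
As one concrete instance of these steps, here is a minimal base-R sketch for a bivariate normal sample in which x1 is always observed and x2 is missing for some cases. The simulated data, the variable names and the convergence tolerance are all assumptions made for illustration; this is not the code of any package mentioned in the thread:

```r
# Simulated data: x2 is missing completely at random in 150 of 500 cases.
set.seed(42)
n  <- 500
x1 <- rnorm(n, mean = 1, sd = 1)
x2 <- 2 + 0.8 * (x1 - 1) + rnorm(n, sd = 0.6)
x2[sample(n, 150)] <- NA
miss <- is.na(x2)

# Start: crude estimates from the observed values.
mu    <- c(mean(x1), mean(x2, na.rm = TRUE))
Sigma <- diag(c(var(x1), var(x2, na.rm = TRUE)))

for (iter in 1:200) {
  # E step: expected sufficient statistics for the missing x2, given x1
  # and the current parameter values.
  beta <- Sigma[1, 2] / Sigma[1, 1]
  resv <- Sigma[2, 2] - Sigma[1, 2]^2 / Sigma[1, 1]
  e2   <- ifelse(miss, mu[2] + beta * (x1 - mu[1]), x2)   # E[x2 | x1]
  e22  <- ifelse(miss, e2^2 + resv, x2^2)                 # E[x2^2 | x1]

  # M step: complete-data ML equations with the expected values plugged in.
  mu_new <- c(mean(x1), mean(e2))
  s11 <- mean(x1^2) - mu_new[1]^2
  s12 <- mean(x1 * e2) - mu_new[1] * mu_new[2]
  s22 <- mean(e22) - mu_new[2]^2
  Sigma_new <- matrix(c(s11, s12, s12, s22), 2, 2)

  # Stop when the parameters have essentially stopped changing.
  done  <- max(abs(mu_new - mu), abs(Sigma_new - Sigma)) < 1e-8
  mu    <- mu_new
  Sigma <- Sigma_new
  if (done) break
}
```

For this particular missingness pattern the maximum-likelihood estimates are also available in closed form (regress x2 on x1 in the complete cases, then evaluate the fitted line at the overall mean of x1), which gives a direct check on where the iteration ends up.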

Algorithm, this, or not?

And where does extremely advanced and sophisticated Artificial
Intelligence software come into it? You can, in some cases, perform
the above EM algorithm by hand.

Which EM Algorithm are you thinking of?

Best wishes,
Ted.



E-Mail: (Ted Harding) [EMAIL PROTECTED]
Fax-to-email: +44 (0)870 167 1972
Date: 12-May-04   Time: 17:57:53
-- XFMail --


Re: [R] missing values imputation

2004-05-12 Thread Prof Brian Ripley

That's not an algorithm.  It is a recipe for deriving an algorithm.

  algorithm - A detailed sequence of actions to perform to accomplish some
  task. Named after an Iranian mathematician, Al-Khawarizmi.

  Technically, an algorithm must reach a result after a finite number of
  steps, thus ruling out brute force search methods for certain problems,
  though some might claim that brute force search was also a valid (generic)  
  algorithm. The term is also used loosely for any sequence of actions
  (which may or may not terminate).

Paul E. Black's Dictionary of Algorithms, Data Structures, and Problems.

On Wed, 12 May 2004 [EMAIL PROTECTED] wrote:

 On 12-May-04 Rolf Turner wrote:
  Anne Piotet wrote:
  
  What R functionalities are there to do missing values imputation
  (substantial proportion of missing data)?  I would prefer to use
  maximum likelihood methods; is the EM algorithm implemented? in
  which package?
  
The so-called ``EM algorithm'' is ***NOT*** an
algorithm.  It is a methodology or a unifying concept.
It would be impossible to ``implement'' it.  (Except
possibly by means of some extremely advanced and
sophisticated Artificial Intelligence software.)
 
 Do we understand the same thing by EM Algorithm?
 
 The one I'm thinking of -- formulated under that name by Dempster,
 Laird and Rubin in 1977 (Maximum likelihood estimation from incomplete
 data via the EM  algorithm, JRSS(B) 39, 1-38) -- is indeed an algorithm
 in exactly the same sense as any iterative search for the maximum of a
 function.
 
 Essentially, in the context of data modelled by an underlying exponential
 family distribution where there is incomplete information about the
 values which have this distribution, it proceeds by
 
 Start: Choose starting estimates for the parameters of the distribution
 E: Using the current parameter values, compute the expected values
    of the sufficient statistics conditional on the observed information
 M: Solve the maximum-likelihood equations (which are functions of the
    sufficient statistics) using the expected values computed in (E)
 If sufficiently converged, stop. Otherwise, make the current parameter
 values equal to the values estimated in (M) and return to (E).
 
 Algorithm, this, or not
 
 And where does extremely advanced and sophisticated Artificial
 Intelligence software come into it? You can, in some cases, perform
 the above EM algorithm by hand.
 
 Which EM Algorithm are you thinking of?
 
 Best wishes,
 Ted.
 
 
 
 E-Mail: (Ted Harding) [EMAIL PROTECTED]
 Fax-to-email: +44 (0)870 167 1972
 Date: 12-May-04   Time: 17:57:53
 -- XFMail --
 
 
 

-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595


Re: [R] missing values imputation

2004-05-12 Thread A.J. Rossini

(Ted Harding) [EMAIL PROTECTED] writes:

 On 12-May-04 Rolf Turner wrote:
 Anne Piotet wrote:
 
 What R functionalities are there to do missing values imputation
 (substantial proportion of missing data)?  I would prefer to use
 maximum likelihood methods; is the EM algorithm implemented? in
 which package?
 
   The so-called ``EM algorithm'' is ***NOT*** an
   algorithm.  It is a methodology or a unifying concept.
   It would be impossible to ``implement'' it.  (Except
   possibly by means of some extremely advanced and
   sophisticated Artificial Intelligence software.)

 Do we understand the same thing by EM Algorithm?

 The one I'm thinking of -- formulated under that name by Dempster,
 Laird and Rubin in 1977 (Maximum likelihood estimation from incomplete
 data via the EM  algorithm, JRSS(B) 39, 1-38) -- is indeed an algorithm
 in exactly the same sense as any iterative search for the maximum of a
 function.

 Essentially, in the context of data modelled by an underlying exponential
 family distribution where there is incomplete information about the
 values which have this distribution, it proceeds by

 Start: Choose starting estimates for the parameters of the distribution
 E: Using the current parameter values, compute the expected values
    of the sufficient statistics conditional on the observed information
 M: Solve the maximum-likelihood equations (which are functions of the
    sufficient statistics) using the expected values computed in (E)
 If sufficiently converged, stop. Otherwise, make the current parameter
 values equal to the values estimated in (M) and return to (E).

 Algorithm, this, or not

 And where does extremely advanced and sophisticated Artificial
 Intelligence software come into it? You can, in some cases, perform
 the above EM algorithm by hand.

 Which EM Algorithm are you thinking of?

Thanks, Ted :-) -- to extend it a bit, one can imagine the use of
approximate solutions to the 2 steps (simulation methods to get
expected values, similar range of approaches for the maximization) and
get a general (but possibly not robust)  computational solution for
the parametric problem.  Just plug in a formula for the likelihood and
the sufficient statistics...

Of course, thousands of papers have been written on these variations
(likelihood, specific implementations of the E and M steps).  

best,
-tony

-- 
[EMAIL PROTECTED]http://www.analytics.washington.edu/ 
Biomedical and Health Informatics   University of Washington
Biostatistics, SCHARP/HVTN  Fred Hutchinson Cancer Research Center
UW (Tu/Th/F): 206-616-7630 FAX=206-543-3461 | Voicemail is unreliable
FHCRC  (M/W): 206-667-7025 FAX=206-667-4812 | use Email


Re: [R] missing values imputation

2004-05-12 Thread A.J. Rossini


Picky, picky.

Details are in the eyes of the beholder.

Prof Brian Ripley [EMAIL PROTECTED] writes:

 That's not an algorithm.  It is a recipe for deriving an algorithm.

   algorithm - A detailed sequence of actions to perform to accomplish some
   task. Named after an Iranian mathematician, Al-Khawarizmi.

   Technically, an algorithm must reach a result after a finite number of
   steps, thus ruling out brute force search methods for certain problems,
   though some might claim that brute force search was also a valid (generic)  
   algorithm. The term is also used loosely for any sequence of actions
   (which may or may not terminate).

 Paul E. Black's Dictionary of Algorithms, Data Structures, and Problems.

 On Wed, 12 May 2004 [EMAIL PROTECTED] wrote:

 On 12-May-04 Rolf Turner wrote:
  Anne Piotet wrote:
  
  What R functionalities are there to do missing values imputation
  (substantial proportion of missing data)?  I would prefer to use
  maximum likelihood methods; is the EM algorithm implemented? in
  which package?
  
The so-called ``EM algorithm'' is ***NOT*** an
algorithm.  It is a methodology or a unifying concept.
It would be impossible to ``implement'' it.  (Except
possibly by means of some extremely advanced and
sophisticated Artificial Intelligence software.)
 
 Do we understand the same thing by EM Algorithm?
 
 The one I'm thinking of -- formulated under that name by Dempster,
 Laird and Rubin in 1977 (Maximum likelihood estimation from incomplete
 data via the EM  algorithm, JRSS(B) 39, 1-38) -- is indeed an algorithm
 in exactly the same sense as any iterative search for the maximum of a
 function.
 
 Essentially, in the context of data modelled by an underlying exponential
 family distribution where there is incomplete information about the
 values which have this distribution, it proceeds by
 
 Start: Choose starting estimates for the parameters of the distribution
 E: Using the current parameter values, compute the expected values
    of the sufficient statistics conditional on the observed information
 M: Solve the maximum-likelihood equations (which are functions of the
    sufficient statistics) using the expected values computed in (E)
 If sufficiently converged, stop. Otherwise, make the current parameter
 values equal to the values estimated in (M) and return to (E).
 
 Algorithm, this, or not
 
 And where does extremely advanced and sophisticated Artificial
 Intelligence software come into it? You can, in some cases, perform
 the above EM algorithm by hand.
 
 Which EM Algorithm are you thinking of?
 
 Best wishes,
 Ted.
 
 
 
 E-Mail: (Ted Harding) [EMAIL PROTECTED]
 Fax-to-email: +44 (0)870 167 1972
 Date: 12-May-04   Time: 17:57:53
 -- XFMail --
 
 
 

 -- 
 Brian D. Ripley,  [EMAIL PROTECTED]
 Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
 University of Oxford, Tel:  +44 1865 272861 (self)
 1 South Parks Road, +44 1865 272866 (PA)
 Oxford OX1 3TG, UKFax:  +44 1865 272595



-- 
[EMAIL PROTECTED]http://www.analytics.washington.edu/ 
Biomedical and Health Informatics   University of Washington
Biostatistics, SCHARP/HVTN  Fred Hutchinson Cancer Research Center
UW (Tu/Th/F): 206-616-7630 FAX=206-543-3461 | Voicemail is unreliable
FHCRC  (M/W): 206-667-7025 FAX=206-667-4812 | use Email


Re: [R] missing values imputation

2004-05-12 Thread Barry Rowlingson

A.J. Rossini wrote:
Picky, picky.

Details are in the eyes of the beholder.

 algorithm - A detailed sequence of actions to perform to accomplish some
 task. Named after an Iranian mathematician, Al-Khawarizmi.

Personally I like the first definition of 'Algorism, Algorithm' in the 
1913 Webster's Revised Unabridged:

 1. The art of calculating by nine figures and zero.

Barry


Re: [R] missing values imputation

2004-05-12 Thread Ted Harding

On 12-May-04 A.J. Rossini wrote:
 (Ted Harding) [EMAIL PROTECTED] writes:
 [...]
 Algorithm, this, or not
 [...]
 Thanks, Ted :-) -- to extend it a bit, one can imagine the use of
 approximate solutions to the 2 steps (simulation methods to get
 expected values, similar range of approaches for the maximization) and
 get a general (but possibly not robust)  computational solution for
 the parametric problem.  Just plug in a formula for the likelihood and
 the sufficient statistics...

And thank you, Tony!

I confess to having deliberately been a bit provocative, since I see an
issue here on which I have a view (apparently shared by Tony).

For example:

Question: In your view, is the following exchange sort procedure
an algorithm? Or merely a recipe for deriving an algorithm?

A: Starting at the intended low end of the line compare each
   i-th item X[i] with the (i+1)-th item X[i+1] for i=1,2,...
B: If you find an i such that X[i] > X[i+1],
   exchange the positions of X[i] and X[i+1]
C: If you have reached the end of the line, stop.
   Otherwise, go to (A).

Now, I think this is an algorithm. However, before reading on,
please decide what you think yourself about this question.

















Well, you could use this to sort a line of people into order of
increasing height, without recourse to a measuring scale.

Just get X[i] and X[i+1] to stand up straight and look into each
other's eyes. If X[i] has to look down into the eyes of X[i+1], then
X[i] > X[i+1]; otherwise not.

The point is, illustrated naively by this example, that the above
description of exchange sort doesn't explain anything about ">".
So something has to be "plugged in" (in Tony's words) for ">", and
hence the algorithm, to have a meaning or an implementation. There
has to be a sort key with respect to which there is an implementation
of ">" in order to render the algorithm (my terminology ... ) concrete.
So yes, if being picky, the above description of exchange sort could be
called a recipe for deriving an algorithm. But then a different
algorithm would result for (a) every different kind of thing which could
be sorted; (b) every different kind of interpretation of ">" (e.g. it
would not then be the same algorithm if you measured people's heights
with a scale). OK, now perhaps being picky in my turn ...

However, the general point is that an algorithm, in my and no doubt
Tony's notion of it, usually needs a plugin or two or several in order
to be implemented for any particular case.
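
The plug-in idea can be made concrete with a short sketch. The function below is one common reading of the exchange sort described above (repeat passes along the line until a pass makes no exchange), with the comparison supplied as an argument. The names exchange_sort, gt and the sample data are made up for illustration:

```r
# Exchange sort with the "greater than" comparison supplied as a plug-in.
exchange_sort <- function(x, gt = function(a, b) a > b) {
  n <- length(x)
  repeat {
    swapped <- FALSE
    for (i in seq_len(max(n - 1, 0))) {
      if (gt(x[[i]], x[[i + 1]])) {
        x[c(i, i + 1)] <- x[c(i + 1, i)]   # exchange neighbouring items
        swapped <- TRUE
      }
    }
    if (!swapped) break                    # a full pass with no exchange: done
  }
  x
}

# Plug-in 1: the ordinary numeric comparison (the default).
sorted_numbers <- exchange_sort(c(3, 1, 2))

# Plug-in 2: compare people by height, as in the line-up example.
people <- list(list(name = "Ann", height = 170),
               list(name = "Bob", height = 160),
               list(name = "Cy",  height = 180))
by_height <- exchange_sort(people, gt = function(a, b) a$height > b$height)
names_in_order <- vapply(by_height, function(p) p$name, character(1))
```

The sorting code is identical in both calls; only the comparison changes, which is the sense in which the procedure needs something plugged in before it can run.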

So for the EM algorithm.

It needs, specifically, a specification of the exponential-family
distribution, a means for computing a conditional expected value
with this distribution, and a solver for the complete-data maximum
likelihood equations. Once these are provided, the implementation
is complete.

Just as a coded computer routine can call a subroutine or co-routine,
so also one can envisage an algorithm calling a sub-algorithm.

Final question: What, for instance, is the status of the R function
integrate?

  plugin <- function(x){x*(1-x)}
  integrate(plugin,0,1)

uses (I quote):

  For a finite interval, globally adaptive interval subdivision is
  used in connection with extrapolation by the Epsilon algorithm.

If plugin has not been specified, does the code for integrate
represent an algorithm or not? Well, I rather think it does!

Best wishes to all,
Ted.



E-Mail: (Ted Harding) [EMAIL PROTECTED]
Fax-to-email: +44 (0)870 167 1972
Date: 12-May-04   Time: 19:44:20
-- XFMail --


Re: [R] missing values imputation

2004-05-12 Thread Rolf Turner

Thanks  Brian.

The EM algorithm requires an ``E'' step and an ``M'' step.  Harding
and Rossini appear to be seriously suggesting that an R function
could be written which would

(a) Perform the E step in arbitrary contexts, and
(b) For that given expected value, work out a procedure
to effect its maximization.

Or maybe they're not serious.

For the M step (b) general numerical optimization would theoretically
do the trick.  (But would be fraught with peril.)  For the E step
(a), forget it.

The point is, the EM ``algorithm'' is NOT an algorithm which could be
effected by an R function.  This is in complete contrast with
integrate() --- it's there; the code is written.  Hand integrate() an
integration problem, and it'll do it.  One of the differences is that
the input to an integration problem is clearly defined and readily
specifiable as an R function.  The input to a general missing values
problem is amorphous.

Arguing about what constitutes an algorithm according to some
abstract definition is mindless.  If you define ``algorithm'' to suit
yourself, then the EM algorithm is an algorithm; otherwise not.

The original questioner wanted an R function to effect the EM
algorithm.  My point was that this is a silly request because such a
function would be impossible to write.

Call the EM algorithm an algorithm if it makes you happy.  But
remember that by doing so you'll mislead the naive inquirer
who will expect there to be a real live implementation of that
algorithm.  In computer (R) code.  Like integrate().

If you can write an R function to effect the EM ``algorithm'' --- in
general, not just in a special case --- you'll win the Chambers Prize
in computing and a few other things as well.


cheers,

Rolf Turner
[EMAIL PROTECTED]


Re: [R] missing values imputation

2004-05-12 Thread A.J. Rossini

Rolf Turner [EMAIL PROTECTED] writes:

 The EM algorithm requires an ``E'' step and an ``M'' step.  Harding
 and Rossini appear to be seriously suggesting that an R function
 could be written which would

   (a) Perform the E step in arbitrary contexts, and
   (b) For that given expected value, work out a procedure
   to effect its maximization.

 Or maybe they're not serious.

Serious for a range of reasonable specific problems and appropriate
specification of the function (Remember that sufficient statistics
aren't unique, and would have to be specified!).  Think of it as a
macro.  Exercise left to the reader, see below.

 If you can write an R function to effect the EM ``algorithm'' --- in
 general, not just in a special case --- you'll win the Chambers Prize
 in computing and a few other things as well.

I believe there is an eligibility issue with the award you mention
(perhaps you are thinking of the ACM award?), but I suspect the
results, as in most software publications, would be severe headaches
and grief from having to listen to complaints, gripes, and groaning.

Seldom are prizes, credit, and gratitude given, else Brian would be
drowning in them.

best,
-tony

-- 
[EMAIL PROTECTED]    http://www.analytics.washington.edu/
Biomedical and Health Informatics   University of Washington
Biostatistics, SCHARP/HVTN  Fred Hutchinson Cancer Research Center
UW (Tu/Th/F): 206-616-7630 FAX=206-543-3461 | Voicemail is unreliable
FHCRC  (M/W): 206-667-7025 FAX=206-667-4812 | use Email


Re: [R] missing values imputation

2004-05-12 Thread Ted Harding

On 12-May-04 Rolf Turner wrote:
 The EM algorithm requires an ``E'' step and an ``M'' step.  Harding
 and Rossini appear to be seriously suggesting that an R function
 could be written which would
 
   (a) Perform the E step in arbitrary contexts, and
   (b) For that given expected value, work out a procedure
   to effect its maximization.
 
 Or maybe they're not serious.
 
 For the M step (b) general numerical optimization would theoretically
 do the trick.  (But would be fraught with peril.)  For the E step
 (a), forget it.
 
 The point is, the EM ``algorithm'' is NOT an algorithm which could be
 effected by an R function.
 [...]
 The original questioner wanted an R function to effect the EM
 algorithm.  My point was that this is a silly request because such a
 function would be impossible to write.

Well, I think there's been enough hair-splitting on the algorithm
issue!

To revert to the point about the original query from Anne Piotet.
She said she would prefer to use maximum likelihood methods, and asked
if the EM algorithm was available, in the context of imputing missing
data.

I don't think she was asking about whether R was blessed with a universal
EM algorithm into which any incomplete-data problem could be plugged
(and I agree that the generality of the problem, especially expressing
the conditioning corresponding to arbitrary incompleteness, would make
such a thing very elusive).

What I believe she *was* asking was whether, using R, she could do
imputation with maximum-likelihood methods using the EM algorithm.
There are plenty of imputation methods which dodge likelihood altogether,
and thereby lose efficiency, so the question has a lot of point, and
the EM algorithm is of course the natural approach since no information
is more manifestly incomplete than when there are holes in the data.

Schafer's methods (and thanks, Chuck, for the pointer to pan) all
implement the EM algorithm to obtain maximum likelihood estimates in
the first instance. As far as replying to Anne was concerned, I think
all that was needed was to give this information.

To receive a response which asserted (in effect) that it was
unimplementable must have come as a bit of a surprise, in the context!

Anyway, 'nuff said, probably ...

Best wishes to all,
Ted.



E-Mail: (Ted Harding) [EMAIL PROTECTED]
Fax-to-email: +44 (0)870 167 1972
Date: 12-May-04   Time: 22:05:08
-- XFMail --


Re: [R] missing values and survival analysis

2004-04-12 Thread Frank E Harrell Jr

On Sun, 11 Apr 2004 23:37:22 -0400
[EMAIL PROTECTED] wrote:

 
 Hi everyone,
 
 I'm analysing a survival analysis data set at the moment with missing 
 values in the covariate and survival vectors (I have about 60 
 variables).  I know there are some functions on the CRAN network to 
 deal with missing values in general multivariate data.  Does anybody 
 know of any package that deals with missing data specifically in the 
 context of survival analysis.  Any help would be greatly appreciated.
 
 Thanks,
 
 john.

Consider using the aregImpute function in the Hmisc package with
right-censored survival times by predicting baseline covariates using the
follow-up time, event indicator/censoring, and the product of the two,
using multiple imputation.  I am not comfortable with imputing follow-up
time and event indicators from covariates though.  If the follow-up time
is completely missing you might consider discarding the observation.

---
Frank E Harrell Jr   Professor and Chair   School of Medicine
 Department of Biostatistics   Vanderbilt University


[R] missing values and survival analysis

2004-04-11 Thread john . ferguson


Hi everyone,

I'm analysing a survival analysis data set at the moment with missing 
values in the covariate and survival vectors (I have about 60 
variables).  I know there are some functions on the CRAN network to 
deal with missing values in general multivariate data.  Does anybody 
know of any package that deals with missing data specifically in the 
context of survival analysis.  Any help would be greatly appreciated.

Thanks,

john.


Re: [R] missing values for mda package

2004-04-09 Thread zhu wang

Thanks. I was able to use na.omit to remove NAs. But it seems to me this
kills one of the advantages of the original algorithm for handling
missing values.

On Tue, 2004-04-06 at 11:54, Uwe Ligges wrote:
 zhu wang wrote:
  
  Dear helpers,
  
  I am trying to use the mda package downloaded from the R website, but
  the data set has missing values so I got an error message. Should I
  manually handle these missing values? I was trying to read the documents
  to specify any option related to missing values, but I did not find it.
  Please forgive me if I ignore something obvious.
 
 If it is not documented (hence probably not available) and you don't
 know how to tell the functions to handle missing values, try to do it
 yourself. ?NA suggests:
 See Also: [...] 'na.action', 'na.omit', 'na.fail' on how methods can be
 tuned to deal with missing values.
 
 Uwe Ligges
 
 
 
  Thanks,
  
  Zhu Wang
  
  Statistical Science Department
  Southern Methodist University
  Dallas, TX 75275-0332
  Phone: (214)768-2453
  Fax: (214)768-4035
  Email: [EMAIL PROTECTED]
  
-- 
Zhu Wang

Statistical Science Department
Southern Methodist University
Dallas, TX 75275-0332
Phone: (214)768-2453
Fax: (214)768-4035
Email: [EMAIL PROTECTED]


Re: [R] missing values for mda package

2004-04-09 Thread Prof Brian Ripley

Package mda covers many things, including bruto, mars, polyreg and mda
itself.  Which `the original algorithm' for which option did you have in
mind?  More concretely, what were you trying to do with the package?

Given that the package is the original authors' own code, it seems
unlikely that they `killed one of the advantages' of their methodology, so 
elucidation is sorely needed.

On 9 Apr 2004, zhu wang wrote:

 Thanks. I was able to use na.omit to remove NAs. But it seems to me this
 kills one of the advantages of the original algorithm for handling
 missing values.
 
 On Tue, 2004-04-06 at 11:54, Uwe Ligges wrote:
  zhu wang wrote:
   
   Dear helpers,
   
   I am trying to use the mda package downloaded from the R website, but
   the data set has missing values so I got an error message. Should I
   manually handle these missing values? I was trying to read the documents
   to specify any option related to missing values, but I did not find it.
   Please forgive me if I ignore something obvious.
  
  If it is not documented (hence probably not available) and you don't
  know how to tell the functions to handle missing values, try to do it
  yourself. ?NA suggests:
  See Also: [...] 'na.action', 'na.omit', 'na.fail' on how methods can be
  tuned to deal with missing values.
  
  Uwe Ligges
  
  
  
   Thanks,
   
   Zhu Wang
   
   Statistical Science Department
   Southern Methodist University
   Dallas, TX 75275-0332
   Phone: (214)768-2453
   Fax: (214)768-4035
   Email: [EMAIL PROTECTED]
   
 

-- 
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


RE: [R] missing values for mda package

2004-04-09 Thread Wang, Zhu

I basically wanted to use MARS to reproduce the results for the Marketing dataset in
the following book:

http://www-stat-class.stanford.edu/~tibs/ElemStatLearn/

The authors actually provided S-Plus functions for mars, bruto, etc. I used all the
default options of mars in R, but there was an error due to NAs, and I could not find
any option to handle missing values.

Zhu Wang 


RE: [R] missing values for MARS (was mda package)

2004-04-09 Thread Prof Brian Ripley

On Fri, 9 Apr 2004, Wang, Zhu wrote:

 I basically wanted to use MARS to reproduce results using the dataset
 Marketing in the following book
 
 http://www-stat-class.stanford.edu/~tibs/ElemStatLearn/
 
 The authors actually provided S-Plus functions for mars, bruto ,etc. I
 used all default options of mars in R but there was an error due to NAs
 and I could not find any option to handle missing values.

Friedman originated MARS and has code for it.  The code in mda by 
Hastie/Tibshirani is different, and the code on that website is a direct 
ancestor of the mda package for R.  I see no option in the code for mars 
there to handle missing values, so you would do better to ask the authors 
how they did it (if you really believe they have such an option).

And PLEASE read the posting guide and try to learn to ask precise
questions with enough background information!

-- 
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


[R] missing values for mda package

2004-04-06 Thread zhu wang

Dear helpers,

I am trying to use the mda package downloaded from the R website, but
the data set has missing values so I got an error message. Should I
manually handle these missing values? I was trying to read the documents
to specify any option related to missing values, but I did not find it.
Please forgive me if I ignore something obvious.

Thanks,

Zhu Wang

Statistical Science Department
Southern Methodist University
Dallas, TX 75275-0332
Phone: (214)768-2453
Fax: (214)768-4035
Email: [EMAIL PROTECTED]


Re: [R] missing values for mda package

2004-04-06 Thread Uwe Ligges



zhu wang wrote:
 
 Dear helpers,
 
 I am trying to use the mda package downloaded from the R website, but
 the data set has missing values so I got an error message. Should I
 manually handle these missing values? I was trying to read the documents
 to specify any option related to missing values, but I did not find it.
 Please forgive me if I ignore something obvious.

If it is not documented (hence probably not available) and you don't
know how to tell the functions to handle missing values, try to do it
yourself. ?NA suggests:
See Also: [...] 'na.action', 'na.omit', 'na.fail' on how methods can be
tuned to deal with missing values.
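
A minimal sketch of "doing it yourself" in base R, on made-up data: drop the incomplete rows before handing predictors and response to a fitting function that has no NA handling of its own. The data frame d and its column names are hypothetical.

```r
# Toy data with missing values in the response and in one predictor.
d <- data.frame(
  y  = c(1.2, NA, 3.1, 4.0, 5.5),
  x1 = c(0.5, 1.0, NA, 2.0, 2.5),
  x2 = c(10, 20, 30, 40, 50)
)

# Keep only the rows with no NA in any column.
d_complete <- na.omit(d)

# na.omit() records which rows were dropped.
dropped <- attr(d_complete, "na.action")

# Predictor matrix and response vector, now free of NAs.
x <- as.matrix(d_complete[, c("x1", "x2")])
y <- d_complete$y
```

Note that this is complete-case analysis: any row with a single missing entry is lost entirely.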

Uwe Ligges



 Thanks,
 
 Zhu Wang
 
 Statistical Science Department
 Southern Methodist University
 Dallas, TX 75275-0332
 Phone: (214)768-2453
 Fax: (214)768-4035
 Email: [EMAIL PROTECTED]
 

[R] missing values

2004-03-07 Thread Grace Conlon

How can I deal with missing values in an Excel file?
I used read.csv to import the data; however, there are missing values in the csv file.
When I use names(), I get an error message: names attribute must be the same
length as the vector
What can I do with the missing values?
Thanks



Re: [R] missing values

2004-03-07 Thread Peter Dalgaard

Grace Conlon [EMAIL PROTECTED] writes:

 How can I deal with missing values in an Excel file?
 I used read.csv to import the data; however, there are missing values in the csv file.
 When I use names(), I get an error message: names attribute must be the same
 length as the vector
 What can I do with the missing values?

What were you trying to do with names and what has it got to do with
missing values??

How are the missing values coded in the csv file? If they are empty
fields, read.csv (btw, isn't it easier to export as delimited file and
use read.delim?) should handle them automatically, if not, try using
the na.strings argument.
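
A small sketch of both cases, using an inline CSV so that it is self-contained. The column names and the missing-value codes "N/A" and "-999" are made up for illustration.

```r
# An inline stand-in for the exported file.
csv_text <- paste(
  "id,height,weight",
  "1,170,65",
  "2,,72",        # empty field: becomes NA in a numeric column automatically
  "3,165,-999",   # -999 used as a missing-value code
  "4,N/A,80",     # N/A used as a missing-value code
  sep = "\n"
)

# na.strings lists every string that should be read as NA.
d <- read.csv(text = csv_text, na.strings = c("NA", "N/A", "-999"))

colSums(is.na(d))   # number of missing values per column
names(d)            # column names taken from the header line
```

Without "N/A" in na.strings the height column would be read as character rather than numeric, which is usually the first visible symptom of an undeclared missing-value code.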

-- 
   O__   Peter Dalgaard Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics 2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark  Ph: (+45) 35327918
~~ - ([EMAIL PROTECTED]) FAX: (+45) 35327907


[R] missing values and gam (was: how to handle missing values)

2003-07-17 Thread Tor A Strand

Thank you for all the responses on generalized additive models (gam) and
missing values. I am now able to set up a model using gam and have a certain
understanding of how R deals with missing values.

The problem is, however, that I am still not able to fit a gam model to a
dataset that contains missing values.

The call

C <- gam(depvar ~ var1 + var2 + s(var3), data = dataset)

returns the error

Error in na.omit.default() : Argument object is missing, with no default

Again, can anyone help a newbie?

Tor


Re: [R] missing values and gam (was: how to handle missing values)

2003-07-17 Thread Simon Wood

mgcv 0.9 will handle missing values properly (provided you are happy that 
dropping them is 'proper'). There is a pre-release version at:

www.stats.gla.ac.uk/~simon/simon/mgcv.html

(it is a pre-release version, so there will be bugs, reports of which
gratefully received!)
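
For illustration, a minimal sketch of such a fit with simulated data, assuming a current mgcv installation (not the pre-release mentioned above). The variable names follow the original question; the data and coefficients are invented.

```r
library(mgcv)

set.seed(42)
n <- 200
dataset <- data.frame(var1 = rnorm(n), var2 = rnorm(n), var3 = runif(n))
dataset$depvar <- 1 + 0.5 * dataset$var1 - 0.3 * dataset$var2 +
  sin(2 * pi * dataset$var3) + rnorm(n, sd = 0.3)

# Introduce missing values in a smooth term and in the response.
dataset$var3[c(5, 17, 60)] <- NA
dataset$depvar[100] <- NA

# Rows containing any NA in the model variables are dropped before fitting.
fit <- gam(depvar ~ var1 + var2 + s(var3), data = dataset,
           na.action = na.omit)

length(residuals(fit))   # number of rows actually used in the fit
```

Here four rows are incomplete, so the model is fitted to the remaining 196. Whether dropping them is appropriate depends on why the values are missing.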

simon

 Thank you for all the responses on generalized additive models (gam) and
 missing values. I am now able to set up a model using gam and have a certain
 understanding of how R deals with missing values.
 
 The problem is, however, that I am still not able to fit a gam model to a
 dataset that contains missing values.
 
 The call
 
 C <- gam(depvar ~ var1 + var2 + s(var3), data = dataset)
 
 returns the error
 
 Error in na.omit.default() : Argument object is missing, with no default
 
 Again, can anyone help a newbie?
 
 Tor
 

[R] missing values

2003-05-29 Thread Adrian Dusa

Dear list members,

I'm relatively new to this list; can anyone tell me how to declare
missing values once a dataset has been attached?

For example here:

   VAR1
1     1
2     2
3     1
4     3
5     2
6     1
7     3
8     3
9     1
10    2
11   98
12    2
13   97
14   99
15   NA
16    3

I would like values 97, 98 and 99 to be treated as missing values.

I read everything about is.na but I just can't figure out how to do it.

Many thanks,

Adrian

 


Adrian Dusa ([EMAIL PROTECTED])
Romanian Social Data Archive (www.roda.ro)
1, Schitu Magureanu Bd.
76625 Bucharest sector 5
Romania
Tel./Fax: +40 (21) 312.66.18

 



Re: [R] missing values

2003-05-29 Thread Thomas Lumley

On Wed, 28 May 2003, Adrian Dusa wrote:



 I would like values 97, 98 and 99 to be treated as missing values.


VAR1[VAR1 %in% c(97,98,99)] <- NA
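
A short sketch applying the same recode to the posted values, held in a data frame called dat (a hypothetical name). One caveat about the attached case: assigning to VAR1 after attach() changes a copy in the workspace, not the column in the data frame, so it is usually safer to recode the data frame column directly.

```r
# The sixteen values from the original post.
dat <- data.frame(
  VAR1 = c(1, 2, 1, 3, 2, 1, 3, 3, 1, 2, 98, 2, 97, 99, NA, 3)
)

# Recode the special codes to NA in the data frame itself.
dat$VAR1[dat$VAR1 %in% c(97, 98, 99)] <- NA

sum(is.na(dat$VAR1))              # count of missing values after recoding
table(dat$VAR1, useNA = "ifany")  # remaining categories plus the NA count
```

An equivalent form that uses the is.na replacement function is is.na(dat$VAR1) <- dat$VAR1 %in% c(97, 98, 99).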

-thomas
