Re: [R] Inconsistence in specifying action for missing data

2005-09-04 Thread Peter Dalgaard
Martin Maechler [EMAIL PROTECTED] writes:

  Duncan == Duncan Murdoch [EMAIL PROTECTED]
  on Sat, 03 Sep 2005 11:40:18 -0400 writes:
 
 Duncan John Sorkin wrote:
  A question for R (and perhaps S and SPlus) historians.
  
  Does anyone know the reason for the inconsistency in the
  way that the action that should be taken when data are
  missing is specified? There are several variants,
  na.action, na.omit, T, TRUE, etc. I know that a foolish
  consistency is the hobgoblin of a small mind, but
  consistency can make things easier.
  
  My question is not meant as a complaint. I very much
  admire the R development team. I simply am curious.
 
 Duncan R and S have been developed by lots of people, over
 Duncan a long time.  I think that's it.
 
 yes, but there's a bit more to it.
 
 First, the question was wrong (don't you just hate such an answer?):
 A more interesting  question would have asked why there was 
   'na.rm = {TRUE, FALSE}' 
 on one hand and
   'na.action =  {na.omit, na.replace, .}'
 on the other hand,
 since only these two appear as function *arguments* 
 {at least in `decent' S and R functions}.

So cor() is indecent (with its use= argument)? ;-)
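
For context, cor() chooses its missing-value handling through a character argument, use=, rather than through na.rm or na.action. A minimal sketch with made-up vectors (x and y are illustrative names only; the NA result for the default call assumes a current R version, where use defaults to "everything"):

```r
# Illustrative data, not from the thread.
x <- c(1, 2, NA, 4)
y <- c(2, 4, 6, 8)

# With the default use = "everything", a missing value propagates.
r_default <- cor(x, y)

# use = "complete.obs" drops incomplete pairs before computing.
r_complete <- cor(x, y, use = "complete.obs")

print(r_default)   # NA
print(r_complete)  # 1, up to floating point (y is exactly 2 * x)
```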


-- 
   O__   Peter Dalgaard Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark  Ph:  (+45) 35327918
~~ - ([EMAIL PROTECTED])  FAX: (+45) 35327907

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Inconsistence in specifying action for missing data

2005-09-04 Thread Thomas Lumley
On Sat, 3 Sep 2005, John Sorkin wrote:

 A question for R (and perhaps S and SPlus) historians.

 Does anyone know the reason for the inconsistency in the way that the
 action that should be taken when data are missing is specified? There
 are several variants, na.action, na.omit, T, TRUE,  etc. I know that a
 foolish consistency is the hobgoblin of a small mind, but consistency
 can make things easier.


There's actually a little more consistency than first appears.  The two 
most common ways to refer to missingness are na.rm and na.action.  Usually 
na.rm has default FALSE (using T instead of TRUE is a bug) and, when set to 
TRUE, removes NAs from one vector at a time.

na.action usually has default na.omit and works on whole data frames, e.g. 
na.omit and na.exclude do casewise deletion if any variable is NA.
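
A minimal sketch of that distinction, assuming a small made-up vector and data frame (x, df and complete are illustrative names, not anything from the original question):

```r
# Illustrative data only.
x <- c(1, NA, 3)

# na.rm is a logical flag and acts on one vector at a time.
m_keep <- mean(x)                # NA: missing values propagate
m_drop <- mean(x, na.rm = TRUE)  # 2: the NA is removed first

# na.omit() acts on a whole data frame: a row is dropped
# if any of its variables is NA (casewise deletion).
df <- data.frame(a = c(1, NA, 3), b = c(10, 20, NA))
complete <- na.omit(df)
print(nrow(complete))            # 1: only the first row is complete
```

Modelling functions such as lm() accept the same na.omit function through their na.action argument.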

These aren't completely uniform, and that is simply historical. I think 
there was once an attempt to make na.fail() the default na.action, but 
there was too much resistance to change.

-thomas



Re: [R] Inconsistence in specifying action for missing data

2005-09-03 Thread Duncan Murdoch
John Sorkin wrote:
 A question for R (and perhaps S and SPlus) historians.
 
 Does anyone know the reason for the inconsistency in the way that the
 action that should be taken when data are missing is specified? There
 are several variants, na.action, na.omit, T, TRUE,  etc. I know that a
 foolish consistency is the hobgoblin of a small mind, but consistency
 can make things easier.
 
 My question is not meant as a complaint. I very much admire the R
 development team. I simply am curious.

R and S have been developed by lots of people, over a long time.  I 
think that's it.

Duncan Murdoch



Re: [R] Inconsistence in specifying action for missing data

2005-09-03 Thread Martin Maechler
 Duncan == Duncan Murdoch [EMAIL PROTECTED]
 on Sat, 03 Sep 2005 11:40:18 -0400 writes:

Duncan John Sorkin wrote:
 A question for R (and perhaps S and SPlus) historians.
 
 Does anyone know the reason for the inconsistency in the
 way that the action that should be taken when data are
 missing is specified? There are several variants,
 na.action, na.omit, T, TRUE, etc. I know that a foolish
 consistency is the hobgoblin of a small mind, but
 consistency can make things easier.
 
 My question is not meant as a complaint. I very much
 admire the R development team. I simply am curious.

Duncan R and S have been developed by lots of people, over
Duncan a long time.  I think that's it.

yes, but there's a bit more to it.

First, the question was wrong (don't you just hate such an answer?):
A more interesting  question would have asked why there was 
  'na.rm = {TRUE, FALSE}' 
on one hand and
  'na.action =  {na.omit, na.replace, .}'
on the other hand,
since only these two appear as function *arguments* 
{at least in `decent' S and R functions}.

There, the answer has at least two parts:
- First, for some functionalities,  na.rm = TRUE/FALSE is the
  only thing that makes sense, so why should you have to use
  something more complicated?

- IIRC, 'na.rm' was introduced much earlier (S version 2)
  than 'na.action' (S version 3; with na.replace much later, IIRC);
  na.action only really became relevant when thinking
  about model fitting and non-trivial missing value treatment.
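
To illustrate why model fitting needs more than a TRUE/FALSE switch, here is a hedged sketch using lm() on made-up data (the data frame d and the fit_* names are hypothetical). Both na.action choices fit the same complete cases, but they differ in how residuals are reported afterwards:

```r
# Illustrative data with one missing response.
d <- data.frame(y = c(1.1, 2.0, NA, 4.2, 4.9), x = 1:5)

# na.omit drops the incomplete row and forgets about it.
fit_omit <- lm(y ~ x, data = d, na.action = na.omit)

# na.exclude drops it for fitting, but pads residuals and
# fitted values with NA so they line up with the original rows.
fit_exclude <- lm(y ~ x, data = d, na.action = na.exclude)

print(length(residuals(fit_omit)))     # 4
print(length(residuals(fit_exclude)))  # 5, with NA in position 3
```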

Martin Maechler, ETH Zurich
