[R] Effect of na.omit()

2009-12-29 Thread James Rome
I had an NA in one row of my data frame, so I called na.omit(). But I do
not understand where that row disappeared to.

> fri=na.omit(fri)
> fri
     Date.Only    DAY Hour Min15 Quarter Arrival.Val Arrival4
109/05/2008 Friday833   3  328
210/24/2008 Friday   2186   4  287
310/31/2008 Friday833   4  205
410/31/2008 Friday834   4  205
510/31/2008 Friday835   4  123

1233 08/28/2009 Friday0 2   3  123
1234 09/18/2009 Friday   2292   3   82
1235 09/18/2009 Friday   2393   3  205
> fri[1235,]
   Date.Only  DAY Hour Min15 Quarter Arrival.Val Arrival4
NA        NA   NA   NA    NA      NA          NA       NA
> fri[1234,]
     Date.Only    DAY Hour Min15 Quarter Arrival.Val Arrival4
1235 09/18/2009 Friday   2393   3  205

So, the index numbers of the rows do not seem to have been updated. They
are not part of my data frame (I think), so why didn't the rows renumber
themselves?

Thanks,
Jim Rome

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Effect of na.omit()

2009-12-29 Thread Ted Harding
On 29-Dec-09 21:11:38, James Rome wrote:
> I had an NA in one row of my data frame, so I called na.omit().
> But I do not understand where that row disappeared to.
>
> > fri=na.omit(fri)
> > fri
>      Date.Only    DAY Hour Min15 Quarter Arrival.Val Arrival4
> 109/05/2008 Friday833   3  328
> 210/24/2008 Friday   2186   4  287
> 310/31/2008 Friday833   4  205
> 410/31/2008 Friday834   4  205
> 510/31/2008 Friday835   4  123
>
> 1233 08/28/2009 Friday0 2   3  123
> 1234 09/18/2009 Friday   2292   3   82
> 1235 09/18/2009 Friday   2393   3  205
> > fri[1235,]
>    Date.Only  DAY Hour Min15 Quarter Arrival.Val Arrival4
> NA        NA   NA   NA    NA      NA          NA       NA
> > fri[1234,]
>      Date.Only    DAY Hour Min15 Quarter Arrival.Val Arrival4
> 1235 09/18/2009 Friday   2393   3  205
>
> So, the index numbers of the rows do not seem to have been updated.
> They are not part of my data frame (I think), so why didn't the rows
> renumber themselves?
>
> Thanks,
> Jim Rome

Because the numbers which are displayed at the left of the rows
are not the row numbers of the structure being displayed, but
they are in fact row *names*!

These are so assigned (by default) when the dataframe is created.
Example:

  DF <- data.frame(col1=c(1,2,3,4),col2=c(2,3,4,5),col3=c(3,4,5,6))
  DF
  #   col1 col2 col3
  # 1    1    2    3
  # 2    2    3    4
  # 3    3    4    5
  # 4    4    5    6
  row.names(DF)
  # [1] "1" "2" "3" "4"

  DF[c(1,3,4),]
  #   col1 col2 col3
  # 1    1    2    3
  # 3    3    4    5
  # 4    4    5    6

  row.names(DF) <- c("A","B","C","D")
  DF
  #   col1 col2 col3
  # A    1    2    3
  # B    2    3    4
  # C    3    4    5
  # D    4    5    6

  DF[c(1,3,4),]
  #   col1 col2 col3
  # A    1    2    3
  # C    3    4    5
  # D    4    5    6

So the (1,2,3,4) row-names -> (1,3,4) are treated exactly like
the row-names (A,B,C,D) -> (A,C,D).
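
To tie this back to the na.omit() question, here is a minimal sketch on
a made-up three-row data frame (the names df and clean and the data are
illustrative assumptions, not Jim's data). It shows that the surviving
rows keep their original row names, so selecting by position and
selecting by name give different answers:

```r
# Hypothetical data: three rows, with an NA in the second row.
df <- data.frame(x = c(10, NA, 30), y = c("a", "b", "c"))

clean <- na.omit(df)      # drops the row containing the NA

row.names(clean)          # the surviving rows keep the names "1" and "3"
clean[2, ]                # 2nd row by position; its row name is "3"
clean["3", ]              # the same row, selected by its row name
clean[3, ]                # there is no 3rd position, so a row of NAs comes back

# na.omit() also records which rows it removed.
attr(clean, "na.action")
```

This is the same pattern as fri[1235,] returning NAs while fri[1234,]
shows the row named 1235.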

If you want to re-number the rows after eliminating some rows
(with na.omit) then you could do

row.names(fri) <- (1:nrow(fri))

Example:

  DF1 <- DF[c(1,3,4),]
  DF1
  #   col1 col2 col3
  # A    1    2    3
  # C    3    4    5
  # D    4    5    6
  row.names(DF1) <- (1:nrow(DF1))
  DF1
  #   col1 col2 col3
  # 1    1    2    3
  # 2    3    4    5
  # 3    4    5    6
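
As an alternative sketch (not part of the suggestion above), assigning
NULL to the row names also resets them to the default 1..n sequence.
The data frame below is rebuilt by hand so the snippet runs on its own;
its name and contents are illustrative only:

```r
# Illustrative data frame with non-default row names.
DF1 <- data.frame(col1 = c(1, 3, 4),
                  col2 = c(2, 4, 5),
                  col3 = c(3, 5, 6),
                  row.names = c("A", "C", "D"))

rownames(DF1) <- NULL     # discard the names; R falls back to 1, 2, 3, ...
row.names(DF1)            # now "1" "2" "3"
```

Either form gives the same result; the NULL form just avoids having to
work out nrow().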

However, it is often very useful to keep the original numbering
(i.e. the numerical row-names), since this is then a record of
which rows in the dataframe got used. For example, in a regression
with some missing data coded as NA, the model-matrix will retain
the original numbering, so you can identify which cases (rows)
got used by looking at the row.names() of the model matrix.

Note that row.names() returns these as character strings, so convert
them with as.integer() before using the result as a numeric index
into the original dataset (or use the strings directly to index by
row name).
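
A small sketch of the regression case, on made-up data (the data frame
d, the model y ~ x and the variable names are assumptions for
illustration). It relies on R's default setting of
options("na.action"), which is na.omit:

```r
# Hypothetical data with one missing response value in row 3.
d <- data.frame(x = 1:6,
                y = c(2.1, 3.9, NA, 8.2, 9.8, 12.1))

fit <- lm(y ~ x, data = d)            # row 3 is dropped before fitting

used <- row.names(model.matrix(fit))  # character row names of the rows used
used

d[as.integer(used), ]                 # rows used in the fit, by position
d[used, ]                             # the same rows, selected by name
```

If the row names had been renumbered after dropping the NA, this link
back to the original rows would be lost.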

Hoping this helps,
Ted.


E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 29-Dec-09   Time: 21:33:10
-- XFMail --
