On 29-Dec-09 21:11:38, James Rome wrote:
I had an NA in one row of my data frame, so I called na.omit().
But I do not understand where that row disappeared to.
fri=na.omit(fri)
fri
      Date.Only    DAY Hour Min15 Quarter Arrival.Val Arrival4
1    09/05/2008 Friday    8    33       3          32        8
2    10/24/2008 Friday   21    86       4          28        7
3    10/31/2008 Friday    8    33       4          20        5
4    10/31/2008 Friday    8    34       4          20        5
5    10/31/2008 Friday    8    35       4          12        3
1233 08/28/2009 Friday    0     2       3          12        3
1234 09/18/2009 Friday   22    92       3           8        2
1235 09/18/2009 Friday   23    93       3          20        5
fri[1235,]
   Date.Only  DAY Hour Min15 Quarter Arrival.Val Arrival4
NA      <NA> <NA>   NA    NA      NA          NA       NA
fri[1234,]
      Date.Only    DAY Hour Min15 Quarter Arrival.Val Arrival4
1235 09/18/2009 Friday   23    93       3          20        5
So, the index numbers of the rows do not seem to have been updated.
They are not part of my data frame (I think), so why didn't the rows
renumber themselves?
Thanks,
Jim Rome
Because the numbers which are displayed at the left of the rows
are not the row numbers of the structure being displayed, but
they are in fact row *names*!
These are so assigned (by default) when the dataframe is created.
Example:
DF <- data.frame(col1=c(1,2,3,4),col2=c(2,3,4,5),col3=c(3,4,5,6))
DF
#   col1 col2 col3
# 1    1    2    3
# 2    2    3    4
# 3    3    4    5
# 4    4    5    6
row.names(DF)
# [1] "1" "2" "3" "4"
DF[c(1,3,4),]
#   col1 col2 col3
# 1    1    2    3
# 3    3    4    5
# 4    4    5    6
row.names(DF) <- c("A","B","C","D")
DF
#   col1 col2 col3
# A    1    2    3
# B    2    3    4
# C    3    4    5
# D    4    5    6
DF[c(1,3,4),]
#   col1 col2 col3
# A    1    2    3
# C    3    4    5
# D    4    5    6
So the row-names ("1","2","3","4") -> ("1","3","4") are treated exactly like
the row-names ("A","B","C","D") -> ("A","C","D").
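To tie this back to the original question, here is a small sketch of the difference between indexing by position and indexing by row name after na.omit(). The data frame d is made up for illustration and is not Jim's data; d2, by_pos, by_name and third are hypothetical names.

```r
# Hypothetical toy data: row 3 contains an NA
d  <- data.frame(x = c(10, 20, NA, 40), y = c("a", "b", "c", "d"))
d2 <- na.omit(d)

row.names(d2)          # the row names are carried over from d
# [1] "1" "2" "4"
nrow(d2)               # but only three rows remain
# [1] 3

by_pos  <- d2[4, ]     # position 4 does not exist: a row of NAs
by_name <- d2["4", ]   # the row *named* "4": the original fourth row
third   <- d2[3, ]     # position 3 is that same row
```

A bare number in the row slot is always a position; to pick a row by the label printed at the left, quote it.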
If you want to re-number the rows after eliminating some rows
(with na.omit) then you could do
row.names(fri) <- (1:nrow(fri))
Example:
DF1 <- DF[c(1,3,4),]
DF1
#   col1 col2 col3
# A    1    2    3
# C    3    4    5
# D    4    5    6
row.names(DF1) <- (1:nrow(DF1))
DF1
#   col1 col2 col3
# 1    1    2    3
# 2    3    4    5
# 3    4    5    6
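As an alternative sketch (assuming a reasonably recent version of R), assigning NULL to the row names resets them to the default 1..n sequence, which should give the same visible result as the assignment above. DF and DF1 are rebuilt here, with default row names, so that the block runs on its own.

```r
DF  <- data.frame(col1=c(1,2,3,4), col2=c(2,3,4,5), col3=c(3,4,5,6))
DF1 <- DF[c(1,3,4),]
row.names(DF1)
# [1] "1" "3" "4"
row.names(DF1) <- NULL   # reset to the automatic row names
row.names(DF1)
# [1] "1" "2" "3"
```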
However, it is often very useful to keep the original numbering
(i.e. the numerical row-names), since this is then a record of
which rows in the dataframe got used. For example, in a regression
with some missing data coded as NA, the model-matrix will retain
the original numbering, so you can identify which cases (rows)
got used by looking at the row.names() of the model matrix.
These are returned as character strings, so the result can be used
directly to index the original dataset by row name, or converted
with as.integer() to index it by position (given default row names).
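A minimal sketch of the regression case, using made-up data (dat, fit, used, by_name and by_pos are illustrative names only). Note that rownames() gives the labels as character strings, so the sketch indexes by name and, as an alternative, converts to integer first.

```r
# Hypothetical toy data with one missing response (row 3)
dat <- data.frame(x = c(1, 2, 3, 4, 5),
                  y = c(2.1, 3.9, NA, 8.2, 9.8))

# na.omit is passed explicitly so that the incomplete row is dropped
fit <- lm(y ~ x, data = dat, na.action = na.omit)

used <- rownames(model.matrix(fit))   # the cases actually used in the fit
used
# [1] "1" "2" "4" "5"

by_name <- dat[used, ]                # index the original data by row name
by_pos  <- dat[as.integer(used), ]    # or by position, after conversion
```

Indexing by name is the safer of the two, since it still works if the original data frame does not have the default 1..n row names.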
Hoping this helps,
Ted.
E-Mail: (Ted Harding) ted.hard...@manchester.ac.uk
Fax-to-email: +44 (0)870 094 0861
Date: 29-Dec-09 Time: 21:33:10