Alan Kelly wrote:
Dear list,
I have a data frame (birth) with mixed variables (numeric and
alphanumeric). One variable "t1stvisit" was originally coded as numeric
with values 1,2, and 3. After attaching the data frame, this is what I
see when I use str(t1stvisit)
actually, str(birth), I suspect, but not important.
$ t1stvisit: int 1 1 1 1 1 1 1 1 2 2 ...
This is as expected.
I then convert t1stvisit to a factor and to avoid creating a second copy
of this variable independent of the data frame I use:
birth$t1stvisit = as.factor(birth$t1stvisit)
If I check that the conversion has worked:
is.factor(t1stvisit)
[1] FALSE
Now the only object present in the workspace is the data frame "birth"
and, as noted, I have not created any new variables. So why does R
still treat t1stvisit as numeric?
is.factor(t1stvisit)
[1] FALSE
Yet when I try the following:
> is.factor(birth$t1stvisit)
[1] TRUE
So, there appear to be two versions of "t1stvisit" - the original
numeric version and the correct factor version although ls() only shows
"birth" as present in the workspace.
If I type:
> summary(t1stvisit)
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
1.000 1.000 2.000 1.574 2.000 3.000 29.000
I get the numeric version, but if I try
summary(birth$t1stvisit)
1 2 3 NA's
180 169 22 29
I get the factor version.
Frankly I feel that this behaviour is non-intuitive and potentially
problematic. Nor have I seen warnings about this in the various text
books on R.
Can anyone comment on why this should occur?
I haven't looked at discussions of 'attach()' for a while,
since I rarely use it nowadays (I find with() more convenient
most of the time), but Chapter 6 in 'An Introduction to R'
does discuss it.
There are indeed two versions of 'birth'.
Your basic problem is which version of 'birth' is being modified.
Hint: it's NOT the attached version.
Small example:
dat <- data.frame(x=1:3)
attach(dat)
dat$y <- 4:6
y
#Error: object 'y' not found
dat$y
#[1] 4 5 6
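The same small example can be pushed a bit further to mirror the factor conversion in the question. This is only a sketch on a toy data frame: the result names (before, in_frame, after) are made up for illustration. It shows that the attached copy is a snapshot taken at attach() time, and that detaching and re-attaching picks up the modified data frame.

```r
dat <- data.frame(x = 1:3)
attach(dat)                   # places a copy of dat on the search path

dat$x <- factor(dat$x)        # changes dat in the workspace only

before   <- is.factor(x)      # FALSE: 'x' is found in the attached copy
in_frame <- is.factor(dat$x)  # TRUE: the workspace version is a factor

detach(dat)                   # drop the stale copy
attach(dat)                   # re-attach the modified data frame
after <- is.factor(x)         # TRUE: the attached copy is now up to date
detach(dat)                   # clean up the search path
```

So after any change to the data frame, the attached copy has to be refreshed by hand, which is easy to forget.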
BTW, you don't need as.factor(); use factor().
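For what it's worth, here is a rough sketch of the with() approach I mentioned, which avoids attach() altogether. The values below are invented for illustration and are not the real 'birth' data; the names is_fac and tab are likewise just placeholders.

```r
# Hypothetical stand-in for the real data frame (values are made up)
birth <- data.frame(t1stvisit = c(1, 1, 2, 3, NA, 2))

# Convert in place; no attached copy exists to go stale
birth$t1stvisit <- factor(birth$t1stvisit)

# with() evaluates the expression using the current contents of birth
is_fac <- with(birth, is.factor(t1stvisit))   # TRUE
tab    <- with(birth, summary(t1stvisit))     # counts per level, plus NA's
```

Because with() looks the variable up in the data frame at the time of the call, it always sees the current version.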
-Peter Ehlers
Many thanks,
Alan Kelly
Dr. Alan Kelly
Department of Public Health & Primary Care
Trinity College Dublin
______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.