If you have a million levels, is it really necessary to use a factor? I'm
not sure what advantage it would have over a string in this circumstance
(especially since you don't seem to know the levels a priori but have to
learn them from the data).
Hadley
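A minimal sketch of the trade-off raised above, using small made-up vectors (the names `a`, `b`, `lev` and the values are illustrative only). Plain character vectors can be compared directly. Factors built separately from each column end up with different level sets, and R refuses to compare them with `==` until both share one level set.

```r
# Toy data: two columns drawn from overlapping "universes".
a <- c("a", "b", "c")
b <- c("b", "c", "d")

# Character vectors compare element-wise with no extra setup.
same_chr <- a == b

# Factors whose levels were learned separately from each vector have
# different level sets, so `==` signals an error; capture its message.
fa <- factor(a)
fb <- factor(b)
cmp <- tryCatch(fa == fb, error = function(e) conditionMessage(e))

# Giving both factors the same level set makes the comparison legal.
lev <- union(a, b)
same_fac <- factor(a, levels = lev) == factor(b, levels = lev)
```

With strings there is no level set to keep in sync. That bookkeeping is the cost the question above is weighing against whatever a factor buys.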
On Sunday, September 16, 2012, Sam
I have a data frame with columns which draw on the same underlying
universe, so I want them to be factors with the same level set:
z <- data.frame(a=c("a","b","c"), b=c("b","c","d"), stringsAsFactors=FALSE)
str(z)
'data.frame': 3 obs. of 2
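One way to give every column the same level set, sketched under the assumption that all columns of `z` are character vectors: learn the levels once from all columns together, then re-encode each column against that single set. This is an illustration, not necessarily what the thread settled on.

```r
z <- data.frame(a = c("a", "b", "c"), b = c("b", "c", "d"),
                stringsAsFactors = FALSE)

# Learn the shared level set once, from all columns together
# (levels are kept in order of first appearance).
lev <- unique(unlist(z, use.names = FALSE))

# Re-encode every column against that one level set; assigning into
# `z[]` replaces the columns while keeping the data.frame structure.
z[] <- lapply(z, factor, levels = lev)

str(z)
```

The same two lines work unchanged for any number of columns, which avoids repeating a `union()` call per column.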
Hello,
The obvious simplification is to call union() only once; with 10M rows
that should save time.
Then I asked myself whether unique() would be faster.
f1 <- function(x){
    x[[1]] <- factor(x[[1]], levels = union(x[[1]], x[[2]]))
    x[[2]] <- factor(x[[2]], levels = union(x[[1]],
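A sketch of the two ideas from this reply: computing the levels once, either with `union()` or with `unique()` on the concatenated columns. The function names `f2` and `f3`, the data sizes and the random data are hypothetical. For plain character vectors both approaches yield the same levels in the same order, so the results should be identical. `system.time()` is included only to show how one might compare them; timings depend on the machine.

```r
# Hypothetical variant: compute the shared levels once with union().
f2 <- function(x) {
    lev <- union(x[[1]], x[[2]])
    x[[1]] <- factor(x[[1]], levels = lev)
    x[[2]] <- factor(x[[2]], levels = lev)
    x
}

# Hypothetical variant: the same levels via unique() on the concatenation.
f3 <- function(x) {
    lev <- unique(c(x[[1]], x[[2]]))
    x[[1]] <- factor(x[[1]], levels = lev)
    x[[2]] <- factor(x[[2]], levels = lev)
    x
}

# Arbitrary test data: 1e5 rows of single-letter strings.
set.seed(1)
n <- 1e5
big <- data.frame(a = sample(letters, n, TRUE),
                  b = sample(LETTERS, n, TRUE),
                  stringsAsFactors = FALSE)

r2 <- f2(big)
r3 <- f3(big)
identical(r2, r3)

# Rough timing comparison; results vary by machine and data.
system.time(f2(big))
system.time(f3(big))
```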