Re: [R] multi-column factor

2012-09-17 Thread Hadley Wickham
If you have a million levels, is it really necessary to use a factor? I'm not sure what advantage it would have over a string in this circumstance (especially since you don't seem to know the levels a priori but have to learn them from the data). Hadley On Sunday, September 16, 2012, Sam
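A minimal sketch of the alternative suggested above, assuming the columns are simply left as character vectors: the shared set of values is learned from the data only when it is needed, and base R's `match()` gives integer codes against that set if a numeric encoding is ever required. The names `z`, `lev`, `code_a` and `code_b` are illustrative, not from the thread.

```r
# Keep both columns as plain character vectors (no factor conversion)
z <- data.frame(a = c("a", "b", "c"), b = c("b", "c", "d"),
                stringsAsFactors = FALSE)

# Learn the shared set of values from the data itself
lev <- unique(c(z$a, z$b))

# Integer codes against the shared set, only if a numeric encoding is needed
code_a <- match(z$a, lev)
code_b <- match(z$b, lev)

# String comparison across columns works directly, no common levels required
same <- z$a == z$b
```

Because the codes come from `match()` against one shared vector, they are comparable across columns, which is the main thing a common factor level set would provide.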

[R] multi-column factor

2012-09-16 Thread Sam Steingold
I have a data frame with columns which draw on the same underlying universe, so I want them to be factors with the same level set: --8<---cut here---start->8--- z <- data.frame(a=c("a","b","c"),b=c("b","c","d"),stringsAsFactors=FALSE) str(z) 'data.frame': 3 obs. of 2
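One way to give both columns the same level set, sketched here under the assumption that the data look like the small example `z` above: compute the combined set of values once with `union()` and pass it as `levels` to `factor()` for each column. The variable name `lev` is illustrative.

```r
# Example data from the question: two character columns over one universe
z <- data.frame(a = c("a", "b", "c"), b = c("b", "c", "d"),
                stringsAsFactors = FALSE)

# Combined set of values across both columns
lev <- union(z$a, z$b)

# Convert each column to a factor using the same level set
z$a <- factor(z$a, levels = lev)
z$b <- factor(z$b, levels = lev)

str(z)
```

With shared levels, the underlying integer codes mean the same thing in both columns (here `"b"` is code 2 in `z$a` and in `z$b`).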

Re: [R] multi-column factor

2012-09-16 Thread Rui Barradas
Hello, The obvious simplification is to call union() only once. With 10M rows it should save time. Then I asked myself whether unique() would be faster. f1 <- function(x){ x[[1]] <- factor(x[[1]], levels = union(x[[1]], x[[2]])) x[[2]] <- factor(x[[2]], levels = union(x[[1]],
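The two ideas above can be sketched as follows; `f2` and `f3` are hypothetical names, not the functions from the original message. `f2` computes `union()` once and reuses it for both columns; `f3` uses `unique()` on all values at once, which also generalizes to any number of columns. In base R, `union(x, y)` is essentially `unique(c(x, y))`, so the two should give the same level order; which is faster on 10M rows is best checked with `system.time()` on the real data (no timings are claimed here).

```r
# union() computed once and reused for both columns
f2 <- function(x) {
  lev <- union(x[[1]], x[[2]])
  x[[1]] <- factor(x[[1]], levels = lev)
  x[[2]] <- factor(x[[2]], levels = lev)
  x
}

# unique() over every column at once; works for any number of columns
f3 <- function(x) {
  lev <- unique(unlist(x, use.names = FALSE))
  x[] <- lapply(x, factor, levels = lev)  # x[] keeps the data.frame structure
  x
}

z <- data.frame(a = c("a", "b", "c"), b = c("b", "c", "d"),
                stringsAsFactors = FALSE)
r2 <- f2(z)
r3 <- f3(z)
```

Usage on larger data: `system.time(f2(big))` versus `system.time(f3(big))`, where `big` is a hypothetical data frame of the same shape.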