Simon Urbanek wrote:

>> I could *not* reproduce it; that is, ‘table’ is as fast on the non-ASCII
>> factor as it is on the ASCII factor.
>
> Strange - are you sure you get the right locale names? Make sure it's
> listed in locale -a.

Yes, I managed to reproduce it now, using a locale listed in ‘locale -a’.
There is a performance hit, though *much* smaller than on Windows.

> FWIW if you care about speed you should use tabulate() instead - it's much
> faster and incurs no penalty:

Yes, that’s the solution I ended up using:

    res = tabulate(x, nbins=nlevels(x)) # nbins needed for levels that don’t occur
    names(res) = levels(x)
    res

(Though I’m not sure it’s *guaranteed* that factors are internally stored
in a way that makes this work, i.e., as the numbers 1, 2, ... for
levels 1, 2, ...)

Anyway, do you think it’s worth trying to change the ‘table’ function the
way I outlined in my first post¹? This should eliminate the performance hit
on all platforms. However, it will introduce a performance hit (CPU and
memory use) if the elements of ‘exclude’ make up a large part of the
factor(s).

¹ http://permalink.gmane.org/gmane.comp.lang.r.devel/26576

-- 
Karl Ove Hufthammer

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
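
As a small check of the tabulate() approach described above, here is a minimal sketch on a made-up factor. The data and the level names are purely illustrative and not taken from the thread. It relies on the factor's integer codes running 1, 2, ... in level order, which is what as.integer() returns here. The final comparison against table() is only a sanity check on this one example, not a proof of the general guarantee.

```r
# Hypothetical example data: level "c" is declared but never occurs.
x <- factor(c("a", "b", "a", "d"), levels = c("a", "b", "c", "d"))

# The integer codes behind the factor, one per element, indexing into levels(x).
codes <- as.integer(x)   # 1 2 1 4

# Count occurrences of each code; nbins makes sure unused levels get a 0.
res <- tabulate(x, nbins = nlevels(x))
names(res) <- levels(x)
res                      # a = 2, b = 1, c = 0, d = 1

# Sanity check: same counts as table() on this example.
stopifnot(identical(unname(res), as.vector(table(x))))
```

Without nbins, tabulate() stops at the largest code actually present, so a trailing unused level would be dropped and the names() assignment would fail on a length mismatch.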