>>> Method dispatch for `vec_c()` is quite simple because associativity and
>>> commutativity mean that we can determine the output type only by
>>> considering a pair of inputs at a time. To this end, vctrs provides
>>> `vec_type2()` which takes two inputs and returns their common type
>>> (represented as a zero-length vector):
>>>
>>> str(vec_type2(integer(), double()))
>>> #> num(0)
>>>
>>> str(vec_type2(factor("a"), factor("b")))
>>> #> Factor w/ 2 levels "a","b":
>>
>>
>> What is the reasoning behind taking the union of the levels here? I'm not
>> sure that is actually the behavior I would want if I have a vector of
>> factors and I try to append some new data to it. I might want/expect to
>> retain the existing levels and get either NAs or an error if the new data
>> has (present) levels not in the first data. The behavior as above doesn't
>> seem in line with what I understand the purpose of factors to be (explicit
>> restriction of possible values).
>
> Originally (like a week ago 😀), we threw an error if the factors
> didn't have the same levels, and provided an optional coercion to
> character. I decided that while correct (the factor levels are a
> parameter of the type, and hence factors with different levels aren't
> comparable), this fights too much against how people actually use
> factors in practice. It also seems like base R is moving more in this
> direction, i.e. in 3.4 factor("a") == factor("b") is an error, whereas
> in R 3.5 it returns FALSE.
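For anyone who wants to experiment, the pairwise rules discussed in this thread can be mimicked in a few lines of base R. This is a rough sketch under stated assumptions, not the vctrs implementation (vctrs uses double dispatch and its own `vec_type2()`): the helpers `is_chr_like()`, `common_type2()` and `common_type()` are hypothetical names, a type is represented by a zero-length vector as in the quoted examples, and only three rules are covered: the logical < integer < double ladder for bare vectors, the union of levels for two factors, and character as the common type of a factor and a character vector. Because each pairwise rule is associative and commutative, `Reduce()` can fold it over any number of inputs.

```r
# Hypothetical base-R sketch (not the vctrs implementation).
# A "type" is represented by a zero-length vector, as in the examples above.

# TRUE for inputs that behave like character data.
is_chr_like <- function(v) is.factor(v) || is.character(v)

# Common type of exactly two zero-length prototypes.
common_type2 <- function(x, y) {
  # Two factors: the union of the levels is finer than either input.
  if (is.factor(x) && is.factor(y)) {
    return(factor(character(), levels = union(levels(x), levels(y))))
  }
  # A character vector acts like a factor with every possible level.
  if (is_chr_like(x) && is_chr_like(y)) {
    return(character())
  }
  # Resolution ladder for bare vectors: logical < integer < double.
  # Only inputs without attributes are handled, which excludes factors
  # (typeof() reports them as "integer") and classed doubles such as Dates.
  ladder <- c("logical", "integer", "double")
  rx <- match(typeof(x), ladder)
  ry <- match(typeof(y), ladder)
  if (is.null(attributes(x)) && is.null(attributes(y)) &&
      !is.na(rx) && !is.na(ry)) {
    return(vector(ladder[max(rx, ry)], 0))
  }
  stop("no common type in this sketch")
}

# Common type of any number of inputs: reduce each to a zero-length
# prototype, then fold the pairwise rule over them.
common_type <- function(...) {
  Reduce(common_type2, lapply(list(...), function(v) v[0]))
}

# Usage
typeof(common_type(TRUE, 1L, 2.5))
#> [1] "double"
levels(common_type(factor("a"), factor("b", levels = c("b", "c"))))
#> [1] "a" "b" "c"
class(common_type(factor("a"), "b"))
#> [1] "character"
```

Any pair not covered by these rules, such as a factor and an integer, simply signals an error in this sketch; the real package may handle such cases differently.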
I now have a better argument, I think. If you squint your brain a little,
you can see that each set of automatic coercions is about increasing
resolution. Integers are low-resolution versions of doubles, and dates are
low-resolution versions of date-times. Logicals are low-resolution versions
of integers because there's a strong convention that `TRUE` and `FALSE` can
be used interchangeably with `1` and `0`.

But what is the resolution of a factor? We must take a somewhat pragmatic
approach because base R often converts character vectors to factors, and we
don't want to be burdensome to users. So we say that a factor `x` has finer
resolution than a factor `y` if the levels of `y` are contained in the
levels of `x`. To find the common type of two factors, we take the union of
their levels, giving a factor that has finer resolution than both. Finally,
you can think of a character vector as a factor with every possible level,
so factors and character vectors are coercible: their common type is a
character vector.

(Extracted from the in-progress vignette explaining how to extend vctrs to
work with your own vectors, now that vctrs has been rewritten to use double
dispatch.)

Hadley

--
http://hadley.nz

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel