If each input column is encoded as integers from 0 to (number of possible
values for that column - 1), then n_values for that column should be the
highest value + 1, which is also the number of levels in that column.
Does that make sense?
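
To make that concrete, here is a small sketch using plain NumPy. The array X is made-up toy data, and n_levels is just a name I've picked for the cross-check. Note that the two counts only agree when every level of a column actually appears in the data.

```python
import numpy as np

# Hypothetical toy data: 4 samples, 2 categorical columns,
# each integer-encoded starting from 0.
X = np.array([[0, 2],
              [1, 0],
              [2, 1],
              [0, 2]])

# Highest value in each column, plus one.
n_values = X.max(axis=0) + 1

# Cross-check: number of distinct values observed in each column.
n_levels = np.array([np.unique(X[:, j]).size for j in range(X.shape[1])])

print(n_values, n_levels)
```

If a level is missing from the sample (say a column only contains 0 and 2), the highest value + 1 still gives 3, while the count of observed values gives 2.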

Actually, I've realised there's a somewhat slow and unnecessary step in the
one-hot encoder: the conversion of the COO matrix to CSR. I suspect this was
done because most of our ML algorithms perform better on CSR input, or else
to maintain backwards compatibility with an earlier implementation.
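
For anyone unfamiliar with the step in question, here is a rough sketch of building a one-hot matrix in COO format and then converting it to CSR with scipy.sparse. This is my own illustration, not the actual encoder code; the toy data and the variable names (offsets, rows, cols) are assumptions.

```python
import numpy as np
from scipy import sparse

# Hypothetical toy data: integer-encoded categorical columns.
X = np.array([[0, 2],
              [1, 0],
              [2, 1],
              [0, 2]])
n_samples, n_features = X.shape

# Number of levels per column, and the starting output column for each feature.
n_values = X.max(axis=0) + 1
offsets = np.concatenate(([0], np.cumsum(n_values)[:-1]))

# One nonzero entry per (sample, feature) pair.
rows = np.repeat(np.arange(n_samples), n_features)
cols = (X + offsets).ravel()
data = np.ones(n_samples * n_features)

# COO is the natural format to build from (row, col, value) triples.
coo = sparse.coo_matrix((data, (rows, cols)),
                        shape=(n_samples, int(n_values.sum())))

# The conversion step under discussion: COO -> CSR.
csr = coo.tocsr()
```

Both matrices hold the same values; only the storage format differs, so the conversion is pure overhead unless the consumer needs CSR.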
scikit-learn mailing list