If each input column is encoded as integers from 0 to (number of possible values for that column - 1), then n_values for that column should be the highest value + 1, which is also the number of levels in that column. Does that make sense?
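To make that concrete, here is a small sketch using plain NumPy. The array X is made-up toy data, and the names n_values and n_levels are just local variables for illustration, not calls into the encoder itself.

```python
import numpy as np

# Hypothetical toy data: 4 samples, 2 categorical columns, each already
# encoded as integers starting at 0.
X = np.array([[0, 1],
              [1, 0],
              [2, 1],
              [0, 0]])

# Highest value in each column, plus one.
n_values = X.max(axis=0) + 1

# Cross-check: count the distinct values actually present in each column.
n_levels = np.array([len(np.unique(X[:, j])) for j in range(X.shape[1])])

print(n_values)
print(n_levels)
```

The two counts agree here because every level from 0 up to the maximum appears in the data. If a level were missing from a column, the highest value + 1 would still give the intended number of levels, while the count of distinct values would come out smaller.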
Actually, I've realised there's a somewhat slow and unnecessary bit of code in the one-hot encoder: where the COO matrix is converted to CSR. I suspect this was done because most of our ML algorithms perform better on CSR, or else to maintain backwards compatibility with an earlier implementation.
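As a rough illustration of why the conversion may be avoidable (this is my own sketch under assumptions, not the encoder's actual code): when every row has exactly one non-zero entry per input column, the CSR row pointer is just a regular arithmetic sequence, so the CSR arrays can be built directly. The variable names and toy data below are hypothetical.

```python
import numpy as np
from scipy import sparse

# Hypothetical toy data: integer-encoded categories, as above.
X = np.array([[0, 1],
              [1, 0],
              [2, 1],
              [0, 0]])
n_samples, n_features = X.shape
n_values = X.max(axis=0) + 1

# Each input column owns a block of output columns; offsets marks where
# each block starts.
offsets = np.concatenate(([0], np.cumsum(n_values)[:-1]))
cols = (X + offsets).ravel()          # output column of every non-zero
data = np.ones(n_samples * n_features)
shape = (n_samples, int(n_values.sum()))

# Route 1: build a COO matrix, then convert it to CSR.
rows = np.repeat(np.arange(n_samples), n_features)
via_coo = sparse.coo_matrix((data, (rows, cols)), shape=shape).tocsr()

# Route 2: build CSR directly. Every row holds exactly n_features
# non-zeros, so the row pointer advances by n_features per row.
indptr = np.arange(0, n_samples * n_features + 1, n_features)
direct = sparse.csr_matrix((data, cols, indptr), shape=shape)

# Both routes should describe the same matrix.
print((via_coo != direct).nnz == 0)
```

Whether the direct route is noticeably faster in practice would need measuring; the sketch only shows that the two constructions yield the same matrix.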
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn