You can effectively merge features through matrix multiplication: multiply the CountVectorizer output by a sparse matrix of shape (n_features_in, n_features_out) that has a 1 at position (i, j) wherever input feature i should map to output feature j. Several input features (e.g. spelling variants) can point to the same output column, so their counts are summed. Your spelling correction then amounts to building this mapping matrix.
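A minimal sketch of the idea, in case it helps. The documents and the `corrections` dictionary below are made up for illustration; in practice the mapping would come from whatever spelling-correction step you use.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.feature_extraction.text import CountVectorizer

# Toy corpus (illustrative only).
docs = ["the color red", "the colour red", "colour and color"]

# Hypothetical correction table: variant -> canonical spelling.
corrections = {"colour": "color"}

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)      # shape (n_docs, n_features_in)
in_vocab = vectorizer.vocabulary_       # term -> input column index

# The output vocabulary is the set of canonical terms.
out_terms = sorted({corrections.get(term, term) for term in in_vocab})
out_index = {term: j for j, term in enumerate(out_terms)}

# One nonzero per input feature: row = input column, col = output column.
rows, cols = [], []
for term, i in in_vocab.items():
    rows.append(i)
    cols.append(out_index[corrections.get(term, term)])

data = np.ones(len(rows), dtype=X.dtype)
M = csr_matrix((data, (rows, cols)),
               shape=(len(in_vocab), len(out_terms)))

# Merged counts: columns for "color" and "colour" are summed into "color".
X_merged = X @ M                        # shape (n_docs, n_features_out)
```

Because every input feature maps to exactly one output feature, each row of `X_merged` keeps the same total count as the corresponding row of `X`; only the columns are merged.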