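Use a Pipeline to help avoid this kind of issue (and others): it fits each transformer once on the training data and reuses those same fitted transformers at prediction time. To combine the text column with the categorical column, you might also want to use a FeatureUnion, as in http://scikit-learn.org/stable/auto_examples/hetero_feature_union.html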
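
A dimension mismatch at predict time often comes from fitting the vectorizer or encoder again on the test data, which yields a different number of columns than the model was trained on. Here is a minimal sketch of the Pipeline approach. It makes several assumptions: the toy data and part numbers are made up, it uses ColumnTransformer (available in scikit-learn 0.20 and later) in place of the FeatureUnion from the linked example, and it uses TfidfVectorizer as a stand-in for CountVectorizer followed by TfidfTransformer.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data: column 0 is the complaint text, column 1 the part #.
X_train = np.array([
    ["engine stalls when cold", "P100"],
    ["paint peeling on hood", "P200"],
    ["engine makes loud noise", "P100"],
    ["radio display flickers", "P300"],
], dtype=object)
y_train = np.array([1, 0, 1, 0])

X_test = np.array([
    ["engine stalls at idle", "P100"],
    ["door handle is loose", "P999"],  # part # never seen during fit
], dtype=object)

features = ColumnTransformer([
    # A scalar column index hands the vectorizer a 1-D array of strings.
    ("text", TfidfVectorizer(), 0),
    # A list of indices hands the encoder a 2-D array; categories unseen
    # during fit are encoded as all zeros instead of raising an error.
    ("part", OneHotEncoder(handle_unknown="ignore"), [1]),
])

model = Pipeline([("features", features), ("clf", MultinomialNB())])

# fit() learns the vocabulary and the categories from the training data only.
model.fit(X_train, y_train)

# predict() only transforms the test data with the already-fitted steps,
# so the feature matrix keeps the width the classifier was trained on.
predictions = model.predict(X_test)

n_train_cols = model.named_steps["features"].transform(X_train).shape[1]
n_test_cols = model.named_steps["features"].transform(X_test).shape[1]
print(n_train_cols, n_test_cols, predictions)
```

The key point is that nothing is ever fitted on X_test: only fit() learns from data, and predict() reuses what was learned.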

On 3 August 2017 at 12:01, pybokeh <pybo...@gmail.com> wrote:

> Hello,
>
> I am studying this example from scikit-learn's site:
> http://scikit-learn.org/stable/tutorial/text_analytics/working_with_text_data.html
>
> The problem that I need to solve is very similar to this example, except I
> have one additional feature column (part #) that is categorical of type
> string. My label or target values consist of just 2 values: 0 or 1.
>
> With that additional feature column, I am transforming it with a
> LabelEncoder and then I am encoding it with the OneHotEncoder.
>
> Then I am concatenating that one-hot encoded column (part #) to the
> text/document feature column (complaint), which I had applied the
> CountVectorizer and TfidfTransformer transformations.
>
> Then I chose the MultinomialNB model to fit my concatenated training data
> with.
>
> The problem I run into is when I invoke the prediction, I get a dimension
> mis-match error.
>
> Here's my jupyter notebook gist:
> http://nbviewer.jupyter.org/gist/anonymous/59ba930a783571c85ef86ba41424b311
>
> I would gladly appreciate it if someone can guide me where I went wrong.
> Thanks!
>
> - Daniel
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn@python.org
> https://mail.python.org/mailman/listinfo/scikit-learn