I found my problem. When I one-hot encoded the part # for my test row, the result was a 1x1 matrix when I needed a 1x153 matrix. This happened because I left n_values at its default ('auto'), so the encoder sized its output from the single test value it was given; I needed to set it to 153. Now when I horizontally stack it with my other feature matrix, the total number of columns correctly comes to 1294 instead of 1142. Looking back, I'm not sure whether using Pipeline or FeatureUnion would have prevented this, since the error occurred on the prediction side, not on the training or modeling side.
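For anyone reading this thread later, here is a minimal sketch of the same fix done another way: fit the encoder once on the training part numbers and reuse it at prediction time, so the one-hot width is fixed by the training data. It assumes a newer scikit-learn release, where OneHotEncoder accepts string categories directly and n_values is no longer available. The part numbers, complaint strings, and variable names are made up for illustration, and TfidfVectorizer stands in for the CountVectorizer + TfidfTransformer pair.

```python
import numpy as np
from scipy.sparse import hstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.preprocessing import OneHotEncoder

# Toy stand-ins for the real data (hypothetical values).
train_parts = np.array([["A100"], ["B200"], ["C300"], ["A100"]])
train_text = ["engine noise at idle", "brake squeal",
              "paint peeling", "engine stalls"]
y_train = [1, 0, 0, 1]

# Fit both transformers on the training data only.
part_encoder = OneHotEncoder(handle_unknown="ignore")
text_vectorizer = TfidfVectorizer()
X_train = hstack([
    text_vectorizer.fit_transform(train_text),
    part_encoder.fit_transform(train_parts),
]).tocsr()

model = MultinomialNB().fit(X_train, y_train)

# At prediction time, call transform() on the already-fitted objects,
# not fit_transform(), so the column count matches the training matrix.
test_parts = np.array([["B200"]])
test_text = ["engine noise when cold"]
X_test = hstack([
    text_vectorizer.transform(test_text),
    part_encoder.transform(test_parts),
]).tocsr()

# Refitting an encoder on the single test row reproduces the mistake:
# it only sees one category, so it emits a single column.
refit_width = OneHotEncoder().fit_transform(test_parts).shape[1]

prediction = model.predict(X_test)
print(X_train.shape[1], X_test.shape[1], refit_width)
```

A Pipeline helps for the same reason: it keeps the fitted transformers together with the model, so predict() can only call transform() on them.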
On Wed, Aug 2, 2017 at 10:38 PM, Joel Nothman <joel.noth...@gmail.com> wrote:

> Use a Pipeline to help avoid this kind of issue (and others). You might
> also want to do something like
> http://scikit-learn.org/stable/auto_examples/hetero_feature_union.html
>
> On 3 August 2017 at 12:01, pybokeh <pybo...@gmail.com> wrote:
>
>> Hello,
>> I am studying this example from scikit-learn's site:
>> http://scikit-learn.org/stable/tutorial/text_analytics/working_with_text_data.html
>>
>> The problem that I need to solve is very similar to this example, except
>> I have one additional feature column (part #) that is categorical of type
>> string. My label or target values consist of just 2 values: 0 or 1.
>>
>> With that additional feature column, I am transforming it with a
>> LabelEncoder and then I am encoding it with the OneHotEncoder.
>>
>> Then I am concatenating that one-hot encoded column (part #) to the
>> text/document feature column (complaint), which I had applied the
>> CountVectorizer and TfidfTransformer transformations.
>>
>> Then I chose the MultinomialNB model to fit my concatenated training data
>> with.
>>
>> The problem I run into is when I invoke the prediction, I get a dimension
>> mis-match error.
>>
>> Here's my jupyter notebook gist:
>> http://nbviewer.jupyter.org/gist/anonymous/59ba930a783571c85ef86ba41424b311
>>
>> I would gladly appreciate it if someone can guide me where I went wrong.
>> Thanks!
>>
>> - Daniel
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn