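A Pipeline helps at prediction time too: the transformers fitted on the training data are reused on new data, so the encoded test features come out with the same number of columns.

On 4 Aug 2017 7:49 am, "pybokeh" <pybo...@gmail.com> wrote: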
> I found my problem. When I one-hot encoded my test part #, it resulted in
> a 1x1 matrix, when I needed it to be 1x153. This happened because I
> used the default setting ('auto') for n_values, when I needed to set it to
> 153. Now when I horizontally stack it onto my other feature matrix, the
> resulting total # of columns correctly comes to 1294, instead of
> 1142. Looking back now, I am not sure if using Pipeline or FeatureUnion
> would have helped in this case or prevented this, since the error occurred
> on the prediction side, not on the training or modeling side.
>
> On Wed, Aug 2, 2017 at 10:38 PM, Joel Nothman <joel.noth...@gmail.com>
> wrote:
>
>> Use a Pipeline to help avoid this kind of issue (and others). You might
>> also want to do something like
>> http://scikit-learn.org/stable/auto_examples/hetero_feature_union.html
>>
>> On 3 August 2017 at 12:01, pybokeh <pybo...@gmail.com> wrote:
>>
>>> Hello,
>>> I am studying this example from scikit-learn's site:
>>> http://scikit-learn.org/stable/tutorial/text_analytics/working_with_text_data.html
>>>
>>> The problem that I need to solve is very similar to this example, except
>>> that I have one additional feature column (part #) that is categorical,
>>> of type string. My label or target values consist of just 2 values:
>>> 0 or 1.
>>>
>>> I am transforming that additional feature column with a LabelEncoder
>>> and then encoding it with the OneHotEncoder.
>>>
>>> Then I am concatenating that one-hot encoded column (part #) to the
>>> text/document feature column (complaint), to which I had applied the
>>> CountVectorizer and TfidfTransformer transformations.
>>>
>>> Then I chose the MultinomialNB model to fit my concatenated training
>>> data with.
>>>
>>> The problem I run into is that when I invoke the prediction, I get a
>>> dimension mismatch error.
>>>
>>> Here's my jupyter notebook gist:
>>> http://nbviewer.jupyter.org/gist/anonymous/59ba930a783571c85ef86ba41424b311
>>>
>>> I would gladly appreciate it if someone can guide me where I went
>>> wrong. Thanks!
>>>
>>> - Daniel
>>>
>>> _______________________________________________
>>> scikit-learn mailing list
>>> scikit-learn@python.org
>>> https://mail.python.org/mailman/listinfo/scikit-learn
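
For illustration, here is a minimal sketch of the setup described in the quoted messages, written as a single Pipeline. It is not the code from the notebook. It assumes scikit-learn 0.20 or later (newer than the version used in this thread), where ColumnTransformer exists and OneHotEncoder accepts string categories directly, so no LabelEncoder or n_values setting is needed. TfidfVectorizer stands in for CountVectorizer followed by TfidfTransformer. The toy data, the part numbers and the column layout are all made up.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data: column 0 is the complaint text, column 1 the part #.
X_train = np.array(
    [
        ["engine stalls at idle", "P-100"],
        ["brake pedal feels soft", "P-200"],
        ["engine noise when cold", "P-100"],
        ["paint is peeling off", "P-300"],
    ],
    dtype=object,
)
y_train = np.array([1, 1, 1, 0])

TEXT_COL, PART_COL = 0, 1

features = ColumnTransformer(
    [
        # A scalar column index hands the vectorizer a 1-D array of strings.
        ("text", TfidfVectorizer(), TEXT_COL),
        # A list index keeps the column 2-D, as OneHotEncoder expects.
        # handle_unknown="ignore" encodes unseen part numbers as all zeros.
        ("part", OneHotEncoder(handle_unknown="ignore"), [PART_COL]),
    ]
)

model = Pipeline([("features", features), ("clf", MultinomialNB())])
model.fit(X_train, y_train)

# A single new row is transformed by the already-fitted steps, so it gets
# the same number of feature columns as the training matrix.
X_new = np.array([["engine stalls on the highway", "P-200"]], dtype=object)
fitted_features = model.named_steps["features"]
n_train_cols = fitted_features.transform(X_train).shape[1]
n_new_cols = fitted_features.transform(X_new).shape[1]

prediction = model.predict(X_new)
```

Because predict() only calls transform() on the fitted steps, the one-hot block for a single test row has one column per part number seen during training, rather than the single column you get when an encoder is fitted again on the test row alone.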