Hello, I am studying this example from scikit-learn's site: http://scikit-learn.org/stable/tutorial/text_analytics/working_with_text_data.html
The problem I need to solve is very similar to this example, except that I have one additional feature column (part #), which is categorical and of type string. My label (target) takes just 2 values: 0 or 1.

I transform the additional feature column with a LabelEncoder and then encode it with the OneHotEncoder. I then concatenate that one-hot encoded column (part #) with the text/document feature column (complaint), to which I had applied the CountVectorizer and TfidfTransformer transformations. Finally, I fit the MultinomialNB model on the concatenated training data.

The problem I run into is that when I invoke the prediction, I get a dimension mismatch error. Here's my jupyter notebook gist: http://nbviewer.jupyter.org/gist/anonymous/59ba930a783571c85ef86ba41424b311

I would appreciate it if someone could point out where I went wrong. Thanks!

- Daniel
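
For reference, here is a minimal sketch of the setup described above. It is not the notebook's code: the data, variable names, and part numbers are made up for illustration. It also assumes a recent scikit-learn in which OneHotEncoder accepts string categories directly, so the LabelEncoder step is skipped. The point it illustrates is that each transformer is fitted on the training data only and then reused with transform() on the test data, which keeps the number of columns identical between the two matrices. Whether that is the cause of the error in the notebook is not confirmed here.

```python
import numpy as np
from scipy.sparse import hstack
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data: complaint text, part number, and a 0/1 label.
train_text = ["engine makes noise", "brake pedal stuck",
              "engine stalls often", "paint is peeling"]
train_part = np.array([["P100"], ["P200"], ["P100"], ["P300"]])
train_y = [1, 1, 1, 0]

test_text = ["engine noise again", "door paint peeling"]
test_part = np.array([["P200"], ["P999"]])  # "P999" never appears in training

count_vect = CountVectorizer()
tfidf = TfidfTransformer()
# handle_unknown="ignore" encodes unseen part numbers as an all-zero row
# instead of raising an error at transform time.
onehot = OneHotEncoder(handle_unknown="ignore")

# Fit every transformer on the training data only.
X_train_text = tfidf.fit_transform(count_vect.fit_transform(train_text))
X_train_part = onehot.fit_transform(train_part)
X_train = hstack([X_train_text, X_train_part]).tocsr()

clf = MultinomialNB().fit(X_train, train_y)

# Reuse the fitted transformers: transform() only, no fit on the test data.
X_test_text = tfidf.transform(count_vect.transform(test_text))
X_test_part = onehot.transform(test_part)
X_test = hstack([X_test_text, X_test_part]).tocsr()

# Both matrices have the same number of columns, so predict() accepts X_test.
predictions = clf.predict(X_test)
```

If fit_transform() were called again on the test data, the vocabulary and the set of part-number categories would be rebuilt from the test set, and the resulting column count would generally differ from what the classifier was trained on.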
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn