Hi,
I'm having trouble integrating a HashingVectorizer into a pipeline using
heterogeneous features. I've tried to construct my pipeline like this:
pipeline = Pipeline([
    # Extract review text and stars
    ('text_stars', TextStarExtractor()),
    # Use FeatureUnion to combine features of text and star ratings
    ('union', FeatureUnion(
        transformer_list=[
            # Pipeline for pulling tf-idf from review text
            ('review_bow', Pipeline([
                ('selector', ItemSelector(key='body')),
                ('hasher', HashingVectorizer()),
                ('tfidf', TfidfVectorizer()),
            ])),
            # Pipeline for pulling ad hoc features from post's body
            ('body_stats', Pipeline([
                ('selector', ItemSelector(key='body')),
                # returns a list of dicts
                ('stats', TextStats()),
                # list of dicts -> feature matrix
                ('vect', DictVectorizer(sparse=False)),
            ])),
            # Pipeline for pulling ad hoc features from star ratings
            ('star_stats', Pipeline([
                ('selector', ItemSelector(key='stars')),
                # returns a list of dicts
                ('rating_stats', StarStats()),
                # list of dicts -> feature matrix
                ('star_vect', DictVectorizer(sparse=False)),
            ])),
        ],
        # weight components in FeatureUnion
        transformer_weights={
            'review_bow': 0.8,
            'body_stats': 0.5,
            'star_stats': 1.0,
        },
    )),
    # Use an SGD classifier
    ('clf', SGDClassifier()),
])

parameters = {
    'clf__alpha': (0.00001, 0.000001),
}

grid_search = GridSearchCV(pipeline, parameters, verbose=1, cv=3)
grid_search.fit(data_dcts, training_targets)
It works without the HashingVectorizer. What am I doing wrong? I can
include the entire code if you'd like.
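
In case it helps narrow things down, here is a minimal sketch of just the text branch on toy data. It rests on a guess that is not confirmed anywhere above: that the failure comes from TfidfVectorizer receiving the hashed sparse matrix instead of raw strings. TfidfTransformer is the scikit-learn class that takes an existing count matrix, so the sketch swaps it in. The toy documents and the n_features value are made up for illustration.

```python
from sklearn.feature_extraction.text import HashingVectorizer, TfidfTransformer
from sklearn.pipeline import Pipeline

# Toy stand-ins for the review bodies (hypothetical data).
docs = [
    "great food and friendly staff",
    "terrible service, cold food",
    "friendly staff but slow service",
]

# HashingVectorizer maps raw strings to a sparse matrix;
# TfidfTransformer then reweights that matrix. TfidfVectorizer
# would instead expect raw strings as its input.
review_bow = Pipeline([
    ('hasher', HashingVectorizer(n_features=2 ** 10)),  # n_features chosen arbitrarily
    ('tfidf', TfidfTransformer()),
])

X = review_bow.fit_transform(docs)
print(X.shape)
```

If this runs cleanly, the same two steps could replace the 'hasher' and 'tfidf' entries in the 'review_bow' pipeline above, after the ItemSelector step.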
- Adam
--
*Adam Goodkind *
adamgoodkind.com <http://www.adamgoodkind.com>
@adamgreatkind <https://twitter.com/#!/adamgreatkind>
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general