2011/12/14 Joel Schaerer <joel.schae...@gmail.com>:
> Moving to the point: the features I'm going to use are a mixture of text
> (link title, description, content I can scrape on the page the link
> points to) and other things (numerical number of "upvotes" and
> "downvotes", link author, domain the link points to, subreddit
> (subforum) name, etc.)
>
> If I mix all these features into a single classifier, I fear the very
> numerous features extracted from the text will drown the other
> (important!) features. Some people suggested running a separate
> classifier for the text, and then using the output of this classifier as
> a single feature.

Last summer, I spoke to some folks who combined text and SIFT[1]
features for classifying images on Flickr. They just concatenated the
feature vectors end-to-end and trained a single SVM on the result,
with pretty good performance. So this is not necessarily a bad idea.

The text features being numerous is not in itself a problem. If none of
them turn out to be very discriminative, but some of your other features
are, then the text features should be largely ignored by a classifier
trained on a mix of features. So I'd first try this simple approach
before digging into classifier combinations.
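
A minimal sketch of the simple approach, assuming a recent scikit-learn
and SciPy. The toy titles, vote counts and labels are made up for
illustration, and scaling the numeric columns is my own assumption, not
something the Flickr folks reported doing:

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Hypothetical toy data: link titles, (upvotes, downvotes), and a label.
titles = [
    "new python release announced",
    "funny cat picture",
    "python tutorial for beginners",
    "cute cat video",
]
votes = np.array([[120, 4], [15, 30], [200, 10], [8, 25]], dtype=float)
labels = np.array([1, 0, 1, 0])

# Sparse tf-idf block for the text features.
text_vectorizer = TfidfVectorizer()
X_text = text_vectorizer.fit_transform(titles)

# Numeric block, standardized so raw vote counts do not dwarf tf-idf values
# (an assumption on my part; other scalings may suit your data better).
scaler = StandardScaler()
X_num = csr_matrix(scaler.fit_transform(votes))

# Concatenate the two blocks end-to-end: one row per link.
X = hstack([X_text, X_num]).tocsr()

# Train a single linear SVM on the combined feature matrix.
clf = LinearSVC()
clf.fit(X, labels)
predictions = clf.predict(X)
```

Categorical features such as author, domain or subreddit could be
one-hot encoded and stacked on in the same way. If this does badly on
held-out data, that would be the point to look at combining classifiers.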

Be sure to ask around on http://metaoptimize.com/qa for more advice.

[1] https://en.wikipedia.org/wiki/Scale-invariant_feature_transform

-- 
Lars Buitinck
Scientific programmer, ILPS
University of Amsterdam

_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general