Thanks Ted and Hector. Makes sense.
On 6/27/11 9:55 PM, Ted Dunning wrote:
Yeah... what Hector says.
You can even make the output of preliminary classifiers be features for new
classifiers.
Or if you have two different target variables, you can make a model that
predicts one target be a feature in the model that predicts the other.
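As a rough sketch of that stacking idea in plain Java (the Classifier
interface below is just a placeholder for whatever model you use, not a
Mahout API):

  // The preliminary model's prediction is appended as one more feature
  // for a second-stage model.
  interface Classifier {
      double score(double[] features);   // e.g. P(spam | features)
  }

  class StackedFeatures {
      static double[] withPreliminaryScore(double[] baseFeatures, Classifier preliminary) {
          double[] extended = new double[baseFeatures.length + 1];
          System.arraycopy(baseFeatures, 0, extended, 0, baseFeatures.length);
          // The preliminary classifier's output becomes the last feature.
          extended[baseFeatures.length] = preliminary.score(baseFeatures);
          return extended;
      }
  }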
Feature extraction generally has more potential for performance improvements
than any algorithm changes.
On Mon, Jun 27, 2011 at 7:22 PM, Hector Yee<[email protected]> wrote:
Redacted to pass the overly aggressive spam filter.
On Mon, Jun 27, 2011 at 7:19 PM, Hector Yee<[email protected]> wrote:
Just make the pattern a feature and feed it into the machine learning
algorithm. E.g. if it's a spam model and you notice v**gra is a spam term,
just make feature 0 = "v**gra count" and keep the rest as your regular bag
of words.
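Something like this, as a rough sketch in plain Java (the
obfuscation-tolerant pattern and the vocabulary map are hypothetical
stand-ins for whatever you already have):

  import java.util.List;
  import java.util.Map;
  import java.util.regex.Pattern;

  class HeuristicPlusBagOfWords {
      // Hypothetical pattern standing in for an obfuscation-tolerant spam-term match.
      private static final Pattern SPAM_TERM =
          Pattern.compile("v\\S{0,2}gra", Pattern.CASE_INSENSITIVE);

      // Feature 0 is the heuristic pattern count; the remaining slots are plain
      // bag-of-words counts, shifted by one to make room for it.
      static double[] encode(List<String> tokens, Map<String, Integer> vocabulary) {
          double[] features = new double[vocabulary.size() + 1];
          for (String token : tokens) {
              if (SPAM_TERM.matcher(token).matches()) {
                  features[0] += 1.0;
              }
              Integer index = vocabulary.get(token);
              if (index != null) {
                  features[index + 1] += 1.0;
              }
          }
          return features;
      }
  }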
The only thing you have to be careful of is the relative weights between
each feature category. A typical normalization is to L2-norm each feature
category separately before concatenation.
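A minimal sketch of that per-category normalization (the feature blocks
are whatever categories you use, e.g. heuristic counts and bag of words):

  // Each feature block is scaled to unit L2 length on its own before the
  // blocks are concatenated, so no block dominates just because its raw
  // magnitudes happen to be larger.
  class BlockwiseL2 {
      static double[] normalizeAndConcat(double[]... featureBlocks) {
          int total = 0;
          for (double[] block : featureBlocks) {
              total += block.length;
          }
          double[] result = new double[total];
          int offset = 0;
          for (double[] block : featureBlocks) {
              double norm = 0.0;
              for (double v : block) {
                  norm += v * v;
              }
              norm = Math.sqrt(norm);
              for (int i = 0; i < block.length; i++) {
                  result[offset + i] = norm > 0 ? block[i] / norm : 0.0;
              }
              offset += block.length;
          }
          return result;
      }
  }

You would call it as normalizeAndConcat(heuristicFeatures, bagOfWordsFeatures)
before handing the vector to the classifier.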
Another option is to use a "scale free" classification algorithm like
adaboost.
On Mon, Jun 27, 2011 at 5:51 PM, Patrick Collins <[email protected]> wrote:
Has anyone got any advice on how to combine heuristics and
classification?
When preparing my data to build out the features to feed into my
classification model, I keep noticing patterns of text which I know with
99.99% probability imply a certain outcome.
How would you construct the data/features in order to pre-classify this
data so that it is much more likely that the classifier comes to the
"correct" conclusion?
For example, I remember seeing an anti-spam system which used a
combination of fuzzy logic and then classification to produce a better
outcome (but the author did not detail how it was actually implemented). He
used a whole range of heuristics to determine that a certain sender is known
to be a spammer rather than just blindly passing this data into the classifier.
In my dataset I have a LOT of patterns like this that I can identify and
then use to determine the outcome with very high probability. I say high
probability, but I cannot say absolutely. Ideally, if I could precompute a
lot of this data using heuristics, I could feed this information into the
classifier and greatly reduce the number of features. But the classifiers
do not let me provide a "weight" for a certain feature.
Other than "well, just try and see what works", I was wondering how
people deal with this problem. Do they just leave it to the classifier and
hope that it picks up the same patterns?
I'm a bit new to Mahout and classification algorithms, so I'm just
trying to get some input on how others might see this problem and whether
I'm barking up the wrong tree.
Patrick.
--
Yee Yang Li Hector
http://hectorgon.blogspot.com/ (tech + travel)
http://hectorgon.com (book reviews)