Redacted to pass the overly aggressive spam filter.

On Mon, Jun 27, 2011 at 7:19 PM, Hector Yee <[email protected]> wrote:

> Just make the pattern a feature and feed it into the machine learning model.
>
> E.g., if it's a spam model and you notice v**gra is a spam term, just make
> feature 0 = "v**gra count" and the rest your regular bag of words.
>
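
A minimal sketch of the idea above, in Python rather than Mahout: feature 0 counts matches of a hand-written pattern, and the remaining features are plain word counts. The regular expression, the tokenizer, and the names SPAM_PATTERN and featurize are assumptions made for illustration; none of them come from this thread or from Mahout.

```python
import re
from collections import Counter

# Hypothetical heuristic: "v", one or two non-space characters, then "gra".
SPAM_PATTERN = re.compile(r"v\S{1,2}gra", re.IGNORECASE)


def featurize(text, vocabulary):
    """Return [heuristic count] followed by bag-of-words counts."""
    # Feature 0: how many times the heuristic pattern fires.
    heuristic_count = len(SPAM_PATTERN.findall(text))
    # Remaining features: counts of each vocabulary word (crude tokenizer).
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    bag_of_words = [counts[word] for word in vocabulary]
    return [heuristic_count] + bag_of_words


print(featurize("Buy v**gra now, cheap v**gra", ["buy", "cheap", "now"]))
# [2, 1, 1, 1]
```

The classifier then sees the heuristic as just another input column and is free to learn how much to trust it.
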
> The only thing you have to be careful of is the relative weights of the
> feature categories, since a category with larger raw values can dominate
> the others. A typical normalization is to L2-normalize each feature
> category separately before concatenation.
> Another option is to use a "scale-free" classification algorithm like
> AdaBoost.
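
The per-category normalization could look roughly like this. It is only a sketch: l2_normalize and combine_blocks are hypothetical helper names, and leaving an all-zero block untouched is my own assumption about how to handle that edge case.

```python
import math


def l2_normalize(block):
    """Scale one feature block to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in block))
    if norm == 0:
        # Assumption: an all-zero block is returned unchanged.
        return list(block)
    return [x / norm for x in block]


def combine_blocks(*blocks):
    """Normalize each feature category separately, then concatenate."""
    combined = []
    for block in blocks:
        combined.extend(l2_normalize(block))
    return combined


heuristic_block = [3.0, 4.0]   # e.g. counts from hand-written patterns
bag_of_words_block = [0.0, 5.0]  # e.g. word counts
print(combine_blocks(heuristic_block, bag_of_words_block))
```

After this step every category contributes a vector of length 1 (or 0), so no category outweighs another simply because its raw counts are larger.
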
>
>
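
To illustrate the "scale-free" remark: AdaBoost is commonly run on threshold-based weak learners (decision stumps), and a stump's predictions do not change when a feature is multiplied by a positive constant. The sketch below is only a single stump, not a full AdaBoost implementation, and fit_stump and predict are hypothetical names.

```python
def fit_stump(values, labels):
    """Pick the (threshold, polarity) with the fewest training errors.

    Assumes `values` contains at least two distinct numbers and that
    labels are +1 or -1.
    """
    ordered = sorted(set(values))
    # Candidate thresholds: midpoints between consecutive distinct values.
    candidates = [(a + b) / 2 for a, b in zip(ordered, ordered[1:])]
    best = None
    for threshold in candidates:
        for polarity in (1, -1):
            errors = sum(
                1
                for v, y in zip(values, labels)
                if (polarity if v > threshold else -polarity) != y
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, polarity)
    return best[1], best[2]


def predict(stump, values):
    """Apply a fitted stump to a list of feature values."""
    threshold, polarity = stump
    return [polarity if v > threshold else -polarity for v in values]


counts = [0, 0, 1, 3, 5]            # heuristic pattern counts per message
labels = [-1, -1, 1, 1, 1]          # -1 = ham, +1 = spam
scaled = [c * 1000 for c in counts]  # same feature on a different scale

same = predict(fit_stump(counts, labels), counts) == predict(
    fit_stump(scaled, labels), scaled
)
print(same)  # True
```

The learned threshold moves with the scale, which is why no separate normalization is needed for this kind of learner.
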
> On Mon, Jun 27, 2011 at 5:51 PM, Patrick Collins <
> [email protected]> wrote:
>
>> Has anyone got any advice on how to combine heuristics and classification?
>>
>> When preparing my data to build out the features to feed into my
>> classification model, I keep noticing patterns of text which I know with
>> 99.99% probability imply a certain outcome.
>>
>> How would you construct the data/features in order to pre-classify this
>> data and make it much more likely that the classifier comes to the
>> "correct" conclusion?
>>
>> For example, I remember seeing an anti-spam system which used a
>> combination of fuzzy logic and then classification to get a better outcome
>> (but the author did not detail how it was actually implemented). He used a
>> whole range of heuristics to determine that a certain sender is known to be
>> a spammer rather than just blindly passing this data into the classifier.
>>
>> In my dataset I have a LOT of patterns like this that I can identify and
>> then use to determine the outcome with very high probability. I say high
>> probability, but I cannot say absolutely. Ideally, if I could precompute a
>> lot of this data using heuristics, I could feed this information into the
>> classifier to greatly reduce the number of features. But the classifiers do
>> not let me assign a "weight" to a certain feature.
>>
>> Other than "well, just try and see what works", I was wondering how
>> people deal with this problem. Do they just leave it to the classifier and
>> hope that the classifier picks up the same patterns?
>>
>> I'm a bit new to Mahout and classification algorithms and so am just
>> trying to get some input on how others might see this problem and whether
>> I'm barking up the wrong tree.
>>
>> Patrick.
>>


-- 
Yee Yang Li Hector
http://hectorgon.blogspot.com/ (tech + travel)
http://hectorgon.com (book reviews)
