On Thu, Oct 27, 2011 at 2:08 AM, Sean Owen <[email protected]> wrote:

> How about using bigrams instead of single words?
>

Exactly what I was about to say.

You don't need to set the weights by hand; the classifier learns them.
The suggestion here is to give it word pairs as features in addition to
the single words, so your features become:

text: white sedan with dark windows
bigrams: white_sedan, sedan_with, with_dark, dark_windows
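
To make the feature list above concrete, here is a minimal sketch in plain Python of how the word and bigram features could be generated. It is only an illustration and does not use Mahout's API; the function name `word_and_bigram_features`, the whitespace tokenization, and the lowercasing are assumptions made for this example.

```python
def word_and_bigram_features(text):
    """Return single-word features followed by adjacent-word bigram features.

    Assumptions for this sketch: tokens are split on whitespace and
    lowercased, and bigrams are joined with an underscore.
    """
    words = text.lower().split()
    # Pair each word with its immediate successor: (w0, w1), (w1, w2), ...
    bigrams = [f"{first}_{second}" for first, second in zip(words, words[1:])]
    return words + bigrams


features = word_and_bigram_features("white sedan with dark windows")
print(features)
```

Each resulting string would then be encoded as one feature for the classifier, so "white_sedan" can receive its own weight, separate from "white" and "sedan".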


> On Oct 27, 2011 9:54 AM, "Loic Descotte" <[email protected]> wrote:
>
> > Hello,
> >
> > I'm working on a classification problem. My datasets are basically text
> > entries.
> >
> > To find the right class, I know that some words are very important. Is
> > there a way to tell the classifier that these words should have a greater
> > weight?
> >
> > Another very important thing is the position of these words and their
> > distance to other important words.
> >
> > Example: I want to classify black and white cars. I know that the
> > words "car", "sedan" and "limo" are very important, and that their
> > position relative to the words "white" and "black" is very important
> > too.
> >
> > The sentence "white sedan with dark windows" should be classified as
> > white cars, not black cars, even if the word "black" is present.
> > The position of the colours ("black" is further from "sedan" than
> > "white" is) should help us a lot.
> >
> >
> > Is there a way to express that with Mahout classifiers (I'm currently
> > testing with SGD)?
> > If yes, do you have any idea or example about how to do that?
> >
> > Thanks a lot for your help
> >
> > Loic
> >
> >
>
