How about using bigrams instead of single words?
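
A rough sketch of what I mean, in plain Python rather than Mahout code.
Nothing here is a Mahout API: tokenize and ngram_features are hypothetical
helper names, and the example sentence uses "black windows" instead of
"dark windows" so that both colours appear. The idea is that each pair of
adjacent words becomes its own feature, so the classifier can learn a
weight for "white sedan" that is separate from the weights for "white"
and "sedan" alone.

```python
from collections import Counter


def tokenize(text):
    # Lowercase and split on whitespace; a real pipeline would likely
    # use a proper analyzer (punctuation handling, stop words, etc.).
    return text.lower().split()


def ngram_features(text, max_n=2):
    # Count every run of 1..max_n adjacent tokens as one feature.
    tokens = tokenize(text)
    features = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            features[" ".join(tokens[i:i + n])] += 1
    return features


features = ngram_features("white sedan with black windows")
# "white sedan" and "black windows" are features; "black sedan" is not,
# because those two words are not adjacent in the text.
```

Bigrams only capture adjacency. If the important words can be a few
tokens apart, you would need longer n-grams or some explicit distance
feature, which this sketch does not attempt.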
On Oct 27, 2011 9:54 AM, "Loic Descotte" <[email protected]> wrote:
> Hello,
>
> I'm working on a classification problem. My datasets are basically text
> entries.
>
> To find the right class, I know that some words are very important. Is
> there a way to tell the classifier that these words should have a greater
> weight?
>
> Another very important thing is the position of these words and their
> distance to other important words.
>
> Example: I want to classify black and white cars. I know that the
> words "car", "sedan" and "limo" are very important, and that their
> position relative to the words "white" and "black" is very important
> too.
>
> The sentence "white sedan with dark windows" should be classified as a
> white car, not a black car, even if the word "black" is present.
> The position of the colours ("black" is farther from "sedan" than
> "white" is) should help us a lot.
>
>
> Is there a way to express that with Mahout classifiers (I'm currently
> testing with SGD)?
> If yes, do you have any idea or example about how to do that?
>
> Thanks a lot for your help
>
> Loic
>
>