On Thu, Oct 27, 2011 at 2:08 AM, Sean Owen wrote:

> How about using bigrams instead of single words?
>
Exactly what I was about to say.

The classifier worries about the weights; it learns them from the training
data. The suggestion here is to consider your features to be:

text: white sedan with dark windows
bigrams: white_sedan, sedan_with, with_dark, dark_windows

> On Oct 27, 2011 9:54 AM, "Loic Descotte" wrote:
>
> > Hello,
> >
> > I'm working on a classification problem. My datasets are basically text
> > entries.
> >
> > To find the right class, I know that some words are very important. Is
> > there a way to tell the classifier that these words should have a greater
> > weight?
> >
> > Another very important thing is the position of these words and their
> > distance from other important words.
> >
> > Example: I want to classify black and white cars. I know that the
> > words "car", "sedan" and "limo" are very important, and that their
> > position relative to the words "white" and "black" is very important
> > too.
> >
> > The sentence "white sedan with dark windows" should be classified as a
> > white car, not a black car, even though a "black" word is present.
> > The position of the colours ("black" is further from "sedan" than
> > "white" is) should help us a lot.
> >
> > Is there a way to express that with Mahout classifiers (I'm currently
> > testing with SGD)?
> > If yes, do you have any idea or example about how to do that?
> >
> > Thanks a lot for your help
> >
> > Loic
> >
>
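To make the bigram suggestion concrete, here is a minimal sketch in plain Java (no Mahout dependency) that turns one text entry into the unigram and bigram feature strings listed above. The class and method names (`BigramFeatures`, `tokenize`, `features`) are hypothetical, and the tokenization rule (lowercase, split on anything that is not a letter or digit) is an assumption; adjust it to your data.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: builds unigram + bigram feature strings for one text entry.
public class BigramFeatures {

    // Assumed tokenization: lowercase, then split on runs of non-letter/non-digit characters.
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {          // skip the empty token produced by leading separators
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Returns the single words first, followed by each adjacent pair joined with '_'.
    public static List<String> features(String text) {
        List<String> words = tokenize(text);
        List<String> out = new ArrayList<>(words);
        for (int i = 0; i + 1 < words.size(); i++) {
            out.add(words.get(i) + "_" + words.get(i + 1));
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints the unigrams followed by white_sedan, sedan_with, with_dark, dark_windows
        System.out.println(features("white sedan with dark windows"));
    }
}
```

Each resulting string would then be encoded into the SGD input vector, for example with one of Mahout's word encoders such as `StaticWordValueEncoder` (check the encoder API of your Mahout version). Because "white_sedan" and "dark_windows" are separate features, the classifier can learn from labelled examples that the colour word next to the vehicle word is the informative one, without any hand-set weights.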
