Hello everyone, this is my first post to the list. I am new to Mahout but have some background in machine learning. I am trying to understand whether Mahout can be useful for my use case, and I'll try to describe it here to get some advice or insights from any of you.
Basically, I'd like to train a classifier that applies labels to sentences in documents. I can already spot, in the training documents (and also in the ones to classify), the sentences that should be classified: let's say every sentence that contains the string "red" should be used as training input and then labeled at test time.

The thing is, the classification strategy the classifier learns should depend on a set of features that are not just "internal" to the sentence (like the words it contains). The features should include the sentence's position inside the document (e.g. start, middle, end), some words from the enclosing section's title, and also some words contained in the sentence itself.

Is it possible with Mahout (plus some custom classes) to have this flexibility and describe these kinds of features? The specific algorithm is not really important at this point; I am only concerned with the feature representation I described. Any pointers that could help me?

Thanks!
Matteo

--
Matteo Moci
http://it.linkedin.com/in/matteomoci
http://about.me/matteomoci/bio
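
P.S. To make the kind of features I mean more concrete, here is a rough sketch in plain Python (not Mahout code). It turns one sentence into a sparse vector by hashing three groups of feature names: the sentence position, the words of the section title, and the words of the sentence. All function names, the "pos=" / "title:" / "word:" prefixes, the three-way position split, and the vector size are my own assumptions for illustration only.

```python
import hashlib
import re

NUM_BUCKETS = 1024  # illustrative vector size (an assumption)


def bucket(name, num_buckets=NUM_BUCKETS):
    """Map a feature name to a stable index. md5 is used instead of the
    built-in hash() so that indices are the same on every run."""
    digest = hashlib.md5(name.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_buckets


def position_label(index, total):
    """Coarse position of sentence `index` (0-based) among `total` sentences."""
    if total <= 1:
        return "start"
    ratio = index / (total - 1)
    if ratio < 1 / 3:
        return "start"
    if ratio < 2 / 3:
        return "middle"
    return "end"


def tokens(text):
    """Lowercase alphanumeric tokens; a deliberately simple tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def feature_names(sentence, section_title, index, total):
    """Collect features from outside and inside the sentence. The prefixes
    keep a title word and a sentence word from being the same feature."""
    names = ["pos=" + position_label(index, total)]
    names += ["title:" + t for t in tokens(section_title)]
    names += ["word:" + t for t in tokens(sentence)]
    return names


def encode(sentence, section_title, index, total, num_buckets=NUM_BUCKETS):
    """Sparse vector as {bucket index: count}. Distinct features can collide
    in one bucket, which is the usual trade-off of feature hashing."""
    vector = {}
    for name in feature_names(sentence, section_title, index, total):
        i = bucket(name, num_buckets)
        vector[i] = vector.get(i, 0.0) + 1.0
    return vector


if __name__ == "__main__":
    # First of 10 sentences, in a section titled "Traffic Rules".
    print(feature_names("The red car stopped.", "Traffic Rules", 0, 10))
    print(encode("The red car stopped.", "Traffic Rules", 0, 10))
```

For the example sentence, the feature names would be pos=start, title:traffic, title:rules, word:the, word:red, word:car and word:stopped. What I would like to know is whether I can feed this kind of mixed representation to a Mahout classifier.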
