Hi,

A few quick Qs about classifiers in Mahout:

* Are any of the classifiers we have in Mahout MR-free (no MapReduce needed)?  
Or do all classifier implementations run exclusively on top of Hadoop?

* Do any classifiers offer the option of basing classification on linguistic 
rules?

Concretely, I'm wondering if there is anything in Mahout that would let me 
classify very short text comments (about 300 bytes on average).

Here is an example from another system that uses linguistic rules (of some kind 
-- I don't have the details):
One rule labels items as "Customer Wants a Callback". It matches comments such 
as "I want a manager to call me about my engine issue." and also "I want a 
refund."  It uses dictionaries and rules to find parts of speech that indicate 
a callback is needed.

Another example is a rule that finds comments about Staff Speed.  It identifies 
comments that indicate that the staff was slow in the performance of their 
duties.
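
To make the kind of rule I mean a bit more concrete, here is a rough sketch in 
Python. It is only my guess at the general idea, not how that other system 
works (I don't have its details) and not anything that exists in Mahout. The 
two labels come from the examples above; the regex patterns, the RULES table 
and the classify() function are all made up for illustration, and a real 
system would use dictionaries and part-of-speech tagging rather than bare 
patterns.

```python
import re

# Hypothetical rule table: label -> regex patterns. The labels are taken
# from the examples in this mail; the patterns are illustrative guesses.
RULES = {
    "Customer Wants a Callback": [
        r"\b(call|phone|contact)\s+me\b",
        r"\bcall\s*back\b",
        r"\b(want|need)\s+a\s+refund\b",
    ],
    "Staff Speed": [
        r"\b(slow|slowly|waited)\b",
        r"\btook\s+forever\b",
    ],
}

# Compile once so that classifying many short comments stays cheap.
COMPILED = {
    label: [re.compile(pattern, re.IGNORECASE) for pattern in patterns]
    for label, patterns in RULES.items()
}


def classify(comment):
    """Return the sorted list of labels with at least one matching pattern."""
    return sorted(
        label
        for label, patterns in COMPILED.items()
        if any(p.search(comment) for p in patterns)
    )


if __name__ == "__main__":
    for text in (
        "I want a manager to call me about my engine issue.",
        "I want a refund.",
        "The staff was very slow.",
    ):
        print(text, "->", classify(text))
```

A comment can get several labels or none, and none of this needs Hadoop, which 
is the sort of thing I'm after for short comments.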


I think we have nothing that would do the above in Mahout, but I thought I'd 
ask.
Also, I am *guessing* the existing classifiers in Mahout would not do well with 
very short pieces of text?

Thanks,
Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Hadoop ecosystem search :: http://search-hadoop.com/
