I am experimenting with OpenNLP to categorize pages of documents. These are
typically scanned images of varying quality that we OCR. The pages range from
forms (semi-structured) to letters (unstructured). I have searched for
real-world examples of using OpenNLP for document categorization, but the ones
I found only use the bigram/trigram feature generator.
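For context, the bigram/trigram features those examples rely on are just strings
built from adjacent tokens on each OCRed page. A minimal, stdlib-only Java sketch
of the idea follows. The class name `NGramFeatures`, the `ng=` prefix, and the `:`
separator are hypothetical choices for illustration, not OpenNLP's API.

```java
import java.util.ArrayList;
import java.util.List;

public class NGramFeatures {
    // Emits one feature string per n-gram of length minN..maxN.
    // Prefix "ng=" and separator ':' are hypothetical, chosen for illustration.
    public static List<String> extract(String[] tokens, int minN, int maxN) {
        List<String> features = new ArrayList<>();
        for (int n = minN; n <= maxN; n++) {
            // Slide a window of width n across the token array.
            for (int i = 0; i + n <= tokens.length; i++) {
                StringBuilder sb = new StringBuilder("ng=");
                for (int j = i; j < i + n; j++) {
                    if (j > i) sb.append(':');
                    // Lowercase so OCR casing noise does not split features.
                    sb.append(tokens[j].toLowerCase());
                }
                features.add(sb.toString());
            }
        }
        return features;
    }

    public static void main(String[] args) {
        String[] page = {"Policy", "number", "12345"};
        System.out.println(extract(page, 2, 3));
    }
}
```

On noisy OCR text, longer n-grams tend to be sparse, which is one reason people
look at feature selection on top of them.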
Could someone please provide some real-world examples of how one would go about
this? Furthermore, from my reading, there is no way to plug in feature
selection methods such as mutual information or chi-square.
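Chi-square selection itself is straightforward to compute outside the library,
for example as a pre-filter that keeps only the highest-scoring features before
training. A minimal stdlib-only Java sketch, assuming each page has already been
reduced to a set of feature strings with one category label. `ChiSquareSelector`
and `topK` are hypothetical names, and how the surviving features would be wired
into OpenNLP is left open.

```java
import java.util.*;

public class ChiSquareSelector {
    // Chi-square statistic from a 2x2 contingency table:
    // a = pages in category containing term, b = pages outside category containing term,
    // c = pages in category without term,   d = pages outside category without term.
    static double chiSquare(long a, long b, long c, long d) {
        double n = a + b + c + d;
        double num = n * Math.pow(a * d - c * b, 2);
        double den = (double) (a + c) * (b + d) * (a + b) * (c + d);
        return den == 0 ? 0.0 : num / den;
    }

    // Returns the k features with the highest chi-square score for `category`
    // (ties broken alphabetically so the result is deterministic).
    static List<String> topK(List<Set<String>> docs, List<String> labels, String category, int k) {
        long total = docs.size();
        long inCat = labels.stream().filter(category::equals).count();
        // counts[0] = pages in category with the term, counts[1] = pages outside it with the term
        Map<String, long[]> counts = new HashMap<>();
        for (int i = 0; i < docs.size(); i++) {
            boolean in = category.equals(labels.get(i));
            for (String f : docs.get(i)) {
                counts.computeIfAbsent(f, x -> new long[2])[in ? 0 : 1]++;
            }
        }
        Map<String, Double> scores = new HashMap<>();
        for (Map.Entry<String, long[]> e : counts.entrySet()) {
            long a = e.getValue()[0], b = e.getValue()[1];
            scores.put(e.getKey(), chiSquare(a, b, inCat - a, (total - inCat) - b));
        }
        List<String> terms = new ArrayList<>(scores.keySet());
        terms.sort(Comparator.comparingDouble((String t) -> scores.get(t)).reversed()
                .thenComparing(Comparator.<String>naturalOrder()));
        return new ArrayList<>(terms.subList(0, Math.min(k, terms.size())));
    }

    public static void main(String[] args) {
        List<Set<String>> docs = List.of(
                Set.of("invoice", "total"), Set.of("invoice", "date"),
                Set.of("dear", "sincerely"), Set.of("dear", "date"));
        List<String> labels = List.of("form", "form", "letter", "letter");
        System.out.println(topK(docs, labels, "form", 2));
    }
}
```

Chi-square is symmetric: a term strongly associated with *not* being in a category
(here `dear` for forms) scores as high as one strongly associated with it. Mutual
information could be swapped in by replacing `chiSquare` with an MI score over the
same four counts.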
Thanks
-- viraf
