I am experimenting with OpenNLP to categorize pages of documents. The pages are typically scanned images of varying quality that we run through OCR, and they range from forms (semi-structured) to letters (unstructured).

I have searched for real-world examples of document categorization with OpenNLP, but the only ones I have found use the bigram/trigram feature generators. Could someone please point me to, or share, a real-world example of how one would go about this?

Furthermore, as far as I can tell from my reading, there is no way to plug in feature selection methods such as mutual information or chi-square. Is that correct?

Thanks -- viraf
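For reference, the kind of chi-square scoring I have in mind looks roughly like the sketch below. It is plain Java and deliberately does not call any OpenNLP API. The class name `ChiSquareSketch`, the method names, and the toy pages are all made up for illustration, and it assumes each page has already been reduced to a set of tokens.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of OpenNLP: scores terms against one category
// using the chi-square statistic of a 2x2 term/category contingency table.
public class ChiSquareSketch {

    // a = pages in the category that contain the term
    // b = pages outside the category that contain the term
    // c = pages in the category without the term
    // d = pages outside the category without the term
    public static double chiSquare(long a, long b, long c, long d) {
        double n = a + b + c + d;
        double denom = (double) (a + b) * (c + d) * (a + c) * (b + d);
        if (denom == 0.0) {
            return 0.0; // degenerate table: a row or column total is zero
        }
        double diff = (double) a * d - (double) b * c;
        return n * diff * diff / denom;
    }

    // pages.get(i) is the token set of page i, labels.get(i) its category.
    public static Map<String, Double> scoreTerms(List<Set<String>> pages,
                                                 List<String> labels,
                                                 String category) {
        long inCategory = 0;
        Map<String, long[]> counts = new HashMap<>(); // term -> {a, b}
        for (int i = 0; i < pages.size(); i++) {
            boolean match = labels.get(i).equals(category);
            if (match) {
                inCategory++;
            }
            for (String term : pages.get(i)) {
                long[] ab = counts.computeIfAbsent(term, t -> new long[2]);
                ab[match ? 0 : 1]++;
            }
        }
        long outCategory = pages.size() - inCategory;
        Map<String, Double> scores = new HashMap<>();
        for (Map.Entry<String, long[]> e : counts.entrySet()) {
            long a = e.getValue()[0];
            long b = e.getValue()[1];
            scores.put(e.getKey(), chiSquare(a, b, inCategory - a, outCategory - b));
        }
        return scores;
    }

    public static void main(String[] args) {
        // Toy data, invented for this example.
        List<Set<String>> pages = List.of(
                Set.of("invoice", "total"),
                Set.of("invoice", "date"),
                Set.of("dear", "date"),
                Set.of("dear", "regards"));
        List<String> labels = List.of("form", "form", "letter", "letter");
        scoreTerms(pages, labels, "form")
                .forEach((term, score) -> System.out.println(term + "\t" + score));
    }
}
```

The idea would be to keep only the top-scoring terms per category and feed just those to the categorizer as features. What I cannot work out is whether OpenNLP offers a place to hook such a step in.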