Hi,

What we use is something like JODConverter, or a simple bridge to Microsoft Word or OpenOffice, to convert the document (RTF or DOC/DOCX) to HTML. Then we apply the HTMLAnnotator and HTMLConverter of UIMA Ruta to get plain text with annotations for the HTML tags. However, we do not have an analysis engine available for this complete process.
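
To illustrate the idea behind the second step (plain text plus annotations that record where each HTML tag applied), here is a rough Python sketch. It is not the UIMA Ruta API and not our actual setup: the names TagSpanExtractor and html_to_annotated_text are made up for this example, and it assumes well-formed HTML as a converter would typically produce.

```python
from html.parser import HTMLParser


class TagSpanExtractor(HTMLParser):
    """Collects plain text plus (tag, begin, end) spans over that text.

    Hypothetical helper for illustration only; it mimics the idea of
    tag annotations over a plain-text view, not UIMA Ruta itself.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []     # pieces of plain text seen so far
        self.length = 0      # current offset into the plain text
        self.open_tags = []  # stack of (tag, begin_offset)
        self.spans = []      # finished (tag, begin, end) annotations

    def handle_starttag(self, tag, attrs):
        # Remember where in the plain text this element starts.
        # Note: void elements such as <br> never get closed here and
        # therefore produce no span.
        self.open_tags.append((tag, self.length))

    def handle_endtag(self, tag):
        # Close the innermost matching open tag; ignore stray end tags.
        for i in range(len(self.open_tags) - 1, -1, -1):
            if self.open_tags[i][0] == tag:
                _, begin = self.open_tags.pop(i)
                self.spans.append((tag, begin, self.length))
                break

    def handle_data(self, data):
        # Text content becomes part of the plain-text view.
        self.chunks.append(data)
        self.length += len(data)


def html_to_annotated_text(html):
    """Return (plain_text, spans) for an HTML string."""
    parser = TagSpanExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return "".join(parser.chunks), parser.spans


if __name__ == "__main__":
    text, spans = html_to_annotated_text("<p>Total: <b>42</b> items</p>")
    print(text)
    for tag, begin, end in spans:
        print(tag, begin, end, repr(text[begin:end]))
```

Each span gives the tag name and the character offsets in the plain text, so a bold or table-cell region can be looked up by offset in the same way an annotation would be in a CAS.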

Best,

Peter

On 01.09.2013 23:42, Dave Kincaid wrote:
Before I embark on building an RTF annotator I thought I'd ask around a bit to
see if anyone had built such a thing. Most of the documents I have to handle
are in RTF format. I can pretty easily extract the text only using something
like Apache Tika, but there is important information in the formatting as well
(bold, italic, font sizes, centering, tables, etc.) that I'd like to use. Is
anyone aware of a UIMA annotator that does this already?

Thanks,

Dave Kincaid

