Re: RTF Annotator

Peter Klügl Tue, 03 Sep 2013 13:29:50 -0700

Hi,

What we are using is something like JODConverter or a simple bridge to Microsoft Word or OpenOffice in order to convert the document (RTF or DOC/DOCX) to HTML. Then we apply the HTMLAnnotator and HTMLConverter of UIMA Ruta in order to get plain text with annotations for the HTML tags. However, we do not have an (available) analysis engine for this complete process.
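
To illustrate the second step (HTML in, plain text plus tag annotations out), here is a minimal sketch. It is not the UIMA Ruta implementation and does not use UIMA at all; it only uses Python's standard html.parser module to show the idea of mapping each HTML element to a character span over the extracted text. The names TagSpanExtractor and html_to_annotated_text are made up for this example, and the conversion from RTF to HTML is assumed to have happened already.

```python
from html.parser import HTMLParser


class TagSpanExtractor(HTMLParser):
    """Collects plain text plus (tag, begin, end) character spans over it.

    Hypothetical helper for illustration; not part of UIMA Ruta.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []      # text chunks seen so far
        self.length = 0      # length of the plain text collected so far
        self.open_tags = []  # stack of (tag, begin offset) not yet closed
        self.spans = []      # finished (tag, begin, end) annotations

    def handle_starttag(self, tag, attrs):
        # Remember where in the plain text this element starts.
        self.open_tags.append((tag, self.length))

    def handle_endtag(self, tag):
        # Close the nearest matching open tag; stray end tags are ignored.
        for i in range(len(self.open_tags) - 1, -1, -1):
            if self.open_tags[i][0] == tag:
                _, begin = self.open_tags.pop(i)
                self.spans.append((tag, begin, self.length))
                break

    def handle_data(self, data):
        self.parts.append(data)
        self.length += len(data)


def html_to_annotated_text(html):
    """Return (plain_text, spans) for an HTML string."""
    parser = TagSpanExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.parts), parser.spans
```

For the input "<p>Total: <b>42</b> items</p>" this returns the text "Total: 42 items" and the spans ("b", 7, 9) and ("p", 0, 15), so the bold formatting survives as an annotation over the plain text. Elements that are never closed (for example a bare <br>) produce no span in this sketch.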


Best,

Peter

On 01.09.2013 23:42, Dave Kincaid wrote:

Before I embark on building an RTF annotator I thought I'd ask around a bit to
see if anyone had built such a thing. Most of the documents I have to handle
are in RTF format. I can pretty easily extract the text only using something
like Apache Tika, but there is important information in the formatting as well
(bold, italic, font sizes, centering, tables, etc.) that I'd like to use. Is
anyone aware of a UIMA annotator that does this already?

Thanks,

Dave Kincaid
