Hi Folks,

I am looking for feedback on how accurate cTAKES annotations are when the
input text does not consist of properly formed paragraphs.
Is malformed input known to significantly affect annotation accuracy or
performance?
Does anyone have a 'golden' input example that shows where cTAKES works
best in terms of annotation accuracy and performance?

My situation is as follows: right now I use Apache Tika to parse a
multitude of documents, and I feed the parsed output from those documents
into cTAKES for annotation. Sometimes Tika cannot form paragraphs correctly
because a paragraph is split across a page break.

Another example is when footer information (such as page numbers, DOIs,
journal names, etc.) appears in the text between pages.
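
To make the two cases above concrete, here is a rough Python sketch of the
kind of cleanup that could sit between Tika and cTAKES. It is only an
illustration under assumptions: it assumes the parsed text is already
available as one string per page (how that is obtained from Tika is not
shown), and the names FOOTER_PATTERNS, is_footer and clean_pages are
hypothetical, not part of Tika or cTAKES. The footer patterns are guesses
and would need tuning per document set.

```python
import re

# Hypothetical footer patterns; real documents will need their own list.
FOOTER_PATTERNS = [
    # Bare page numbers such as "7", "Page 7" or "Page 7 of 12".
    re.compile(r"^\s*(page\s+)?\d+(\s+of\s+\d+)?\s*$", re.IGNORECASE),
    # A line that contains only a DOI, e.g. "doi:10.1000/xyz123".
    re.compile(r"^\s*(doi:|https?://(dx\.)?doi\.org/)\s*10\.\d{4,9}/\S+\s*$",
               re.IGNORECASE),
]


def is_footer(line):
    """Return True if the line looks like footer residue."""
    return any(p.match(line) for p in FOOTER_PATTERNS)


def clean_pages(pages):
    """Drop footer-like lines, then join the pages into one string.

    If the text so far does not end in sentence-final punctuation, the
    next page is assumed to continue the same paragraph and is joined
    with a space; otherwise a blank line starts a new paragraph.
    """
    out = ""
    for page in pages:
        kept = [ln for ln in page.splitlines() if not is_footer(ln)]
        body = "\n".join(kept).strip()
        if not body:
            continue
        if not out:
            out = body
        elif out[-1] in ".!?:\"')":
            out += "\n\n" + body
        else:
            out += " " + body
    return out


pages = [
    "The patient was admitted with chest\n1\ndoi:10.1000/xyz123",
    "pain and shortness of breath.\n2",
    "A second paragraph starts here.",
]
print(clean_pages(pages))
```

The punctuation check is a crude heuristic: it would wrongly merge a heading
that has no trailing period with the text that follows it, and the
page-number pattern would also drop a legitimate line consisting only of a
number. Whether this kind of cleanup actually improves cTAKES output is
exactly what I am unsure about.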

Thanks for any feedback.
Lewis

-- 
*Lewis*
