Hi Folks,

I am looking for feedback on how accurate cTAKES annotations are when the input text is not made up of properly formed paragraphs. Is this known to significantly affect annotation accuracy or performance? Does anyone have a 'golden' example of input on which cTAKES performs best in terms of annotation accuracy and performance?
My situation is as follows: I currently use Apache Tika to parse a multitude of documents, and I feed the parsed output into cTAKES for annotation. Sometimes Tika cannot form paragraphs correctly because a paragraph is split across a page break. Another problem is footer information (such as page numbers, DOIs, journal names, etc.) appearing between pages.

Thanks for any feedback.

Lewis
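To make the two problem cases concrete, here is a rough sketch of the kind of cleanup pass that could sit between the Tika output and cTAKES. It is purely illustrative: `clean_text`, `is_footer` and the `FOOTER_PATTERNS` list are hypothetical names, not part of Tika or cTAKES, and it assumes the extracted text arrives as blocks separated by blank lines. Footer-like blocks (bare page numbers, DOI lines) are dropped, and a block that does not end in sentence punctuation is joined to the block that follows it.

```python
import re

# Hypothetical footer patterns; real documents will need their own list.
FOOTER_PATTERNS = [
    # Bare page numbers such as "12" or "Page 3 of 10"
    re.compile(r"^\s*(page\s+)?\d+(\s+of\s+\d+)?\s*$", re.IGNORECASE),
    # DOI lines such as "doi:10.1000/xyz123" or "https://doi.org/10.1000/xyz123"
    re.compile(r"^\s*(doi:|https?://(dx\.)?doi\.org/)\S+\s*$", re.IGNORECASE),
]


def is_footer(block: str) -> bool:
    """Return True if the whole block looks like page-footer residue."""
    return any(p.match(block) for p in FOOTER_PATTERNS)


def clean_text(raw: str) -> str:
    """Drop footer-like blocks and rejoin paragraphs split across a page."""
    # Split on blank lines, which is assumed to be how blocks are delimited.
    blocks = [b.strip() for b in re.split(r"\n\s*\n", raw)]
    # Remove empty and footer blocks, then collapse whitespace inside each block.
    blocks = [" ".join(b.split()) for b in blocks if b and not is_footer(b)]

    merged = []
    for block in blocks:
        # Heuristic: a block with no closing punctuation was probably cut
        # by a page break, so the next block is treated as its continuation.
        if merged and not merged[-1].endswith((".", "!", "?", ":")):
            merged[-1] = merged[-1] + " " + block
        else:
            merged.append(block)
    return "\n\n".join(merged)


sample = (
    "The patient was admitted with\n\n"
    "12\n\n"
    "doi:10.1000/xyz123\n\n"
    "chest pain and fever.\n\n"
    "She was discharged."
)
cleaned = clean_text(sample)
```

The joining rule is deliberately crude: a heading without closing punctuation would be merged into the paragraph after it, and a footer spanning several lines would not be caught. Whether a pass like this actually improves cTAKES annotation accuracy is exactly the open question.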
