Re: Paragraph Chunking in cTAKES

Mattmann, Chris A (3980) Wed, 23 Sep 2015 11:13:37 -0700

+1 really interested in the reply to this :)

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: [email protected]
WWW:  http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

-----Original Message-----
From: Lewis John Mcgibbney <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Wednesday, September 23, 2015 at 11:07 AM
To: "[email protected]" <[email protected]>
Subject: Paragraph Chunking in cTAKES

>Hi Folks,
>
>
>I am looking for some feedback on the accuracy of cTAKES annotations when
>the input text does not consist of properly formed paragraphs.
>
>Is this known to significantly affect annotation accuracy/performance?
>
>Does anyone have a 'golden' input example of where cTAKES works best for
>annotation accuracy and performance?
>
>
>My situation is as follows: right now I use Apache Tika to parse a
>multitude of documents, and I feed the parse results from those documents
>into cTAKES for annotation. Sometimes Tika is not able to form
>paragraphs correctly because the paragraphs are split across pages.
>
>
>
>Another example is when footer information (such as page numbers, DOIs,
>journal names, etc.) appears between pages.
>
>
>Thanks for any feedback.
>
>Lewis
>
>
>
>-- 
>Lewis

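For readers hitting the same problem, here is a minimal sketch of the kind of cleanup the original message describes: dropping footer lines and rejoining a paragraph that was split across pages before the text goes to cTAKES. It is plain Python, not part of Tika or cTAKES. Everything in it is an assumption made for illustration: the function names, the footer patterns, the idea that the extracted text is already available as one string per page, and the rule that a paragraph continues when the previous page ends without sentence punctuation and the next page starts with a lowercase letter.

```python
import re

# Hypothetical footer patterns: bare page numbers, "Page N of M", DOI lines.
# Real documents will need patterns tuned to their own layout.
FOOTER_RE = re.compile(
    r"^\s*(?:\d{1,4}"
    r"|page\s+\d+(?:\s+of\s+\d+)?"
    r"|(?:doi:\s*|https?://(?:dx\.)?doi\.org/)\S+)\s*$",
    re.IGNORECASE,
)

# Characters treated as a plausible end of a paragraph (an assumption).
SENTENCE_END = (".", "!", "?", ":", '"', ")")


def strip_footers(page: str) -> str:
    """Drop lines that look like page footers."""
    kept = [line for line in page.splitlines() if not FOOTER_RE.match(line)]
    return "\n".join(kept).strip()


def split_paragraphs(page: str) -> list[str]:
    """Split on blank lines and unwrap hard line breaks inside a paragraph."""
    blocks = re.split(r"\n\s*\n", page)
    return [" ".join(block.split()) for block in blocks if block.strip()]


def rejoin_pages(pages: list[str]) -> list[str]:
    """Return paragraphs, merging one that continues across a page boundary."""
    paragraphs: list[str] = []
    for page in pages:
        current = split_paragraphs(strip_footers(page))
        if paragraphs and current:
            prev, first = paragraphs[-1], current[0]
            # Heuristic: previous paragraph is unfinished and the next page
            # starts mid-sentence, so treat them as one paragraph.
            if not prev.endswith(SENTENCE_END) and first[0].islower():
                paragraphs[-1] = prev + " " + first
                current = current[1:]
        paragraphs.extend(current)
    return paragraphs
```

Calling rejoin_pages on a list of per-page strings returns a list of paragraph strings that can be joined with blank lines and passed on for annotation. The heuristics are deliberately crude: a line holding only a number (a lab value, for example) would be dropped as a page number, so the patterns should be checked against real documents before relying on them.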