One follow-up: is the constituency parser needed for good results with the assertion modules (History, Generic, Uncertainty, etc.)?
From: Miller, Timothy [mailto:[email protected]]
Sent: Monday, March 07, 2016 11:01 AM
To: [email protected]
Subject: Re: MaxentParserWrapper

Hi Brandon,

I wrote the constituency parser module. It is basically a wrapper for the OpenNLP constituency parser. All our module does is convert from our TypeSystem into tokens for the parser, run the parser, and then convert the output back into our TypeSystem.

As for the slowness, there are known issues with extremely long sentences (I believe the algorithm is n^3 in the input length, so this makes sense). However, we have found (Sean Finan pointed this out) that the problem often comes from upstream: strings of punctuation used as section delimiters get misclassified and are tokenized/segmented as extremely long sentences. I believe he implemented workarounds in some of our pipelines to recognize these "non-real" sentences and have the parser skip them, but I don't know off the top of my head where that code is or whether it's checked in. Maybe Sean can chime in with more info if that sounds familiar.

Tim

On 03/07/2016 09:06 AM, Geise, Brandon D. wrote:

Hi,

Can someone point me in the direction of where I can dig deeper into the MaxentParserWrapper? I'm seeing significant slowness once the pipeline reaches this point and would like to understand what's going on a little better.

Thanks,
Brandon
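The workaround Tim describes (recognizing "non-real" sentences and having the parser skip them) could look roughly like the pre-parse check below. This is only a sketch under stated assumptions: the class name `SentenceFilter`, the two thresholds, and the "no letters or digits" punctuation heuristic are all hypothetical. It is not Sean's actual code (the thread doesn't say where that lives), and it deliberately uses no cTAKES, UIMA, or OpenNLP API. It assumes you already have each sentence's tokens as plain strings.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical pre-parse filter (not part of cTAKES): flags token sequences that
 * are probably not real sentences, such as runs of punctuation used as section
 * delimiters, so a caller could skip them instead of sending them to the parser.
 */
public final class SentenceFilter {
    // Assumed thresholds, chosen for illustration only; tune on your own notes.
    static final int MAX_TOKENS = 100;
    static final double MAX_PUNCT_FRACTION = 0.5;

    private SentenceFilter() {}

    /** True if the token is non-empty and has no letters or digits, e.g. "-----" or "*". */
    static boolean isPunctuationOnly(String token) {
        return !token.isEmpty() && token.chars().noneMatch(Character::isLetterOrDigit);
    }

    /** True if the sequence is empty, overly long, or mostly punctuation-only tokens. */
    public static boolean shouldSkip(List<String> tokens) {
        if (tokens.isEmpty() || tokens.size() > MAX_TOKENS) {
            return true;
        }
        long punct = tokens.stream().filter(SentenceFilter::isPunctuationOnly).count();
        return (double) punct / tokens.size() > MAX_PUNCT_FRACTION;
    }

    public static void main(String[] args) {
        List<String> real = Arrays.asList("Patient", "denies", "chest", "pain", ".");
        List<String> delimiter = Arrays.asList("-----", "=====", "*****", "HISTORY", ":");
        System.out.println(shouldSkip(real));      // false: 1 of 5 tokens is punctuation
        System.out.println(shouldSkip(delimiter)); // true: 4 of 5 tokens are punctuation
    }
}
```

In a real pipeline, a check like this would presumably run just before each sentence is handed to the parser. The length cap is a blunt guard against the cost of very long inputs. The punctuation ratio targets the delimiter lines that sentence segmentation mistakes for sentences.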
