Re: sentence detector newline behavior

2014-01-29 Thread Jörn Kottmann
On 01/27/2014 08:44 PM, Tim Miller wrote: That is a good point, and something I was wondering about. Having now looked at both the ctakes and opennlp code for the sentence splitter it seems like there is a lot of overlap. I would've thought it was just a matter of converting annotations into

RE: sentence detector newline behavior

2014-01-29 Thread Chen, Pei
On 01/27/2014 08:44 PM, Tim Miller wrote: That is a good point, and something I was wondering about. Having now looked at both the ctakes and opennlp code for the sentence splitter it seems like there is a lot of overlap. I would've

Re: sentence detector newline behavior

2014-01-29 Thread Jörn Kottmann
On 01/27/2014 03:52 PM, Tim Miller wrote: OK, with the most recent version I am able to replicate the performance I was getting before. Thanks a lot Jörn! Assuming this is in the next incremental release of opennlp, how quickly can we get a re-trained model into cTAKES? I am currently

Re: sentence detector newline behavior

2014-01-27 Thread Jörn Kottmann
On 01/26/2014 11:29 PM, Miller, Timothy wrote: Yes, this fixes the whitespace sentence issue but the evaluation issue remains. I believe the problem is in SentenceSampleStream, where in the following block the whitespace trim happens before the LF character is replaced with the \n character. So
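The ordering problem Tim describes can be reproduced outside OpenNLP. The following is a minimal Python sketch, not the actual Java code of `SentenceSampleStream`. It assumes one sentence per training line, with the newline written as a standalone literal `LF` token. The function names are hypothetical.

```python
import re

# Assumption: the newline appears in the training file as a standalone "LF" token
# (not as part of a longer word such as "SELF").
LF_TOKEN = re.compile(r"(?<!\S)LF(?!\S)")


def trim_then_replace(line: str) -> str:
    """Order described in the thread: trim first, then turn LF into a newline."""
    return LF_TOKEN.sub("\n", line.strip())


def replace_then_trim(line: str) -> str:
    """Opposite order: the trim now also sees, and removes, the newline."""
    return LF_TOKEN.sub("\n", line).strip()


sample = "...BILAT HEMATOMAS. LF"
print(repr(trim_then_replace(sample)))  # '...BILAT HEMATOMAS. \n'
print(repr(replace_then_trim(sample)))  # '...BILAT HEMATOMAS.'
```

These excerpts do not settle which result is correct. That depends on whether the detector's own predicted spans include the trailing newline. The sketch only shows that the two orders disagree on a line like the `...BILAT HEMATOMAS. LF` example from the thread.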

Re: sentence detector newline behavior

2014-01-27 Thread Tim Miller
OK, with the most recent version I am able to replicate the performance I was getting before. Thanks a lot Jörn! Assuming this is in the next incremental release of opennlp, how quickly can we get a re-trained model into cTAKES? I heard from a researcher at AMIA who tried cTAKES and because

RE: sentence detector newline behavior

2014-01-27 Thread digital paula
From: timothy.mil...@childrens.harvard.edu To: dev@ctakes.apache.org Subject: Re: sentence detector newline behavior OK, with the most recent version I am able to replicate the performance I was getting before. Thanks a lot Jörn! Assuming this is in the next incremental release of opennlp

RE: sentence detector newline behavior

2014-01-27 Thread Masanz, James J.
. -- James -Original Message- From: Tim Miller [mailto:timothy.mil...@childrens.harvard.edu] Sent: Monday, January 27, 2014 8:52 AM To: dev@ctakes.apache.org Subject: Re: sentence detector newline behavior OK, with the most recent version I am able to replicate the performance I was getting

RE: sentence detector newline behavior

2014-01-27 Thread Masanz, James J.
to it were: the list of end-of-sentence candidate characters and the handling of newlines -- James -Original Message- From: Tim Miller [mailto:timothy.mil...@childrens.harvard.edu] Sent: Monday, January 27, 2014 1:45 PM To: dev@ctakes.apache.org Subject: Re: sentence detector newline

Re: sentence detector newline behavior

2014-01-27 Thread vijay garla
with it and if anything I scratch my head and doubt my competence. ;-) Regards, Paula Date: Mon, 27 Jan 2014 09:52:00 -0500 From: timothy.mil...@childrens.harvard.edu To: dev@ctakes.apache.org Subject: Re: sentence detector newline behavior OK, with the most recent version I am able to replicate

Re: sentence detector newline behavior

2014-01-27 Thread Tim Miller
struggling a bit with it and if anything I scratch my head and doubt my competence. ;-) Regards, Paula Date: Mon, 27 Jan 2014 09:52:00 -0500 From: timothy.mil...@childrens.harvard.edu To: dev@ctakes.apache.org Subject: Re: sentence detector newline behavior OK, with the most recent version I am able

Re: sentence detector newline behavior

2014-01-26 Thread Jörn Kottmann
On 01/25/2014 10:03 PM, Miller, Timothy wrote: On 01/25/2014 12:24 PM, Jörn Kottmann wrote: The code which computes the spans tries to remove white space from it. Removing the white space from a whitespace-only sentence is causing the exception you are seeing. Which response would you expect
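Jörn's question is what the detector should do with a whitespace-only sentence. One possible answer is to trim the span and return `None` instead of raising, which leaves the drop-or-report decision to the caller. The Python sketch below assumes half-open `(start, end)` character offsets. `trim_span` is a hypothetical helper, not an OpenNLP method.

```python
from typing import Optional, Tuple

Span = Tuple[int, int]  # half-open [start, end) character offsets


def trim_span(text: str, start: int, end: int) -> Optional[Span]:
    """Shrink [start, end) until it neither starts nor ends with whitespace.

    Returns None for a whitespace-only span instead of raising.
    """
    while start < end and text[start].isspace():
        start += 1
    while end > start and text[end - 1].isspace():
        end -= 1
    return (start, end) if start < end else None


text = "Stable.\n   \nNo acute findings."
spans = [(0, 8), (8, 12), (12, 30)]  # the middle "sentence" is only whitespace
trimmed = [s for s in (trim_span(text, b, e) for b, e in spans) if s is not None]
print(trimmed)  # [(0, 7), (12, 30)]
```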

Re: sentence detector newline behavior

2014-01-26 Thread Miller, Timothy
On 01/26/2014 09:59 AM, Jörn Kottmann wrote: The evaluation should ignore white spaces. I have now committed my fix; it would be nice if you could test it. There might still be something wrong. In my test data I replaced all question marks with white spaces, and the result is slightly worse
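One reading of "the evaluation should ignore white spaces" is to trim gold and predicted spans the same way before comparing them. A gold span that still carries a trailing newline then matches a prediction that does not. The Python sketch below assumes exact-match scoring over `(start, end)` offsets. The helper names are hypothetical, and this is not the OpenNLP evaluator.

```python
from typing import Iterable, Optional, Set, Tuple

Span = Tuple[int, int]  # half-open [start, end) character offsets


def _trim(text: str, span: Span) -> Optional[Span]:
    """Strip leading/trailing whitespace from a span; None if nothing is left."""
    start, end = span
    while start < end and text[start].isspace():
        start += 1
    while end > start and text[end - 1].isspace():
        end -= 1
    return (start, end) if start < end else None


def normalized(text: str, spans: Iterable[Span]) -> Set[Span]:
    """Trim every span and drop whitespace-only ones."""
    return {t for t in (_trim(text, s) for s in spans) if t is not None}


def precision_recall_f1(
    text: str, gold: Iterable[Span], predicted: Iterable[Span]
) -> Tuple[float, float, float]:
    """Exact-match scores computed on whitespace-trimmed spans."""
    g, p = normalized(text, gold), normalized(text, predicted)
    tp = len(g & p)
    precision = tp / len(p) if p else 0.0
    recall = tp / len(g) if g else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1


text = "Stable.\nNo acute findings.\n"
gold = [(0, 8), (8, 27)]       # gold spans include the trailing newlines
predicted = [(0, 7), (8, 26)]  # predicted spans are already trimmed
print(precision_recall_f1(text, gold, predicted))  # (1.0, 1.0, 1.0)
```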

Re: sentence detector newline behavior

2014-01-25 Thread Miller, Timothy
, 2014 3:42 PM To: dev@ctakes.apache.org Subject: RE: sentence detector newline behavior Thanks James but then no typical sentence ending punctuation at the end of the line Gotcha. So simply using Lines would not suffice in those cases because it would run together sentences where

Re: sentence detector newline behavior

2014-01-25 Thread Miller, Timothy
- removed in those last examples ... -Original Message- From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu] Sent: Wednesday, January 22, 2014 3:42 PM To: dev@ctakes.apache.org Subject: RE: sentence detector newline behavior Thanks James but then no typical sentence ending

Re: sentence detector newline behavior

2014-01-25 Thread Jörn Kottmann
On 01/25/2014 01:33 PM, Miller, Timothy wrote: Thanks Joern, I'll try it. My understanding is I just need to give it my training data, with the special character I used replaced with the literal string LF and each line in the file is an example sentence. Yes, exactly. Just thinking about the

Re: sentence detector newline behavior

2014-01-25 Thread Jörn Kottmann
On 01/25/2014 03:03 PM, Miller, Timothy wrote: I'm running into one issue: it gets tripped up on sentences with line-ending spaces. I could easily remove them with a script, but by default they are in there. It happens when a sentence example ends: ...BILAT HEMATOMAS. LF (There is a period,
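The cleanup script Tim mentions could be as small as the Python sketch below. It assumes one sentence per line, with the newline written as a line-final `LF` token as described in the thread. It removes only the spaces or tabs that sit directly in front of that token. The names are hypothetical.

```python
import re
from typing import Iterable, List

# Assumption: one sentence per line, newline written as a line-final "LF" token.
SPACES_BEFORE_FINAL_LF = re.compile(r"[ \t]+(?=LF$)")


def drop_line_ending_spaces(line: str) -> str:
    """Drop trailing whitespace, then any spaces/tabs directly before a final LF."""
    return SPACES_BEFORE_FINAL_LF.sub("", line.rstrip())


def clean_lines(lines: Iterable[str]) -> List[str]:
    return [drop_line_ending_spaces(line) for line in lines]


print(clean_lines(["...BILAT HEMATOMAS. LF\n", "No shift.LF\n"]))
# ['...BILAT HEMATOMAS.LF', 'No shift.LF']
```

Lines whose final word merely ends in the letters "LF" (for example `HIM SELF`) are left alone, because the pattern requires whitespace directly before the token.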

Re: sentence detector newline behavior

2014-01-25 Thread Miller, Timothy
On 01/25/2014 12:24 PM, Jörn Kottmann wrote: The code which computes the spans tries to remove white space from it. Removing the white space from a whitespace-only sentence is causing the exception you are seeing. Which response would you expect from the sentence detector? Should a white

Re: sentence detector newline behavior

2014-01-24 Thread Jörn Kottmann
On 01/23/2014 10:06 PM, Tim Miller wrote: Just an FYI, a while back I did some of these annotations myself on MIMIC to get around this issue. I replaced the newline character with a special (non-English) character, then pre-processed ctakes input to replace newlines with that character, then
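Tim's workaround swaps each newline for a single special character that the model can learn as a sentence end. The pre- and post-processing might look like the Python sketch below, which assumes `\n` line endings. The pilcrow marker and the function names are hypothetical; the thread only says "a special (non-English) character". Because one character replaces one character, sentence offsets computed on the encoded text remain valid for the original document.

```python
# Hypothetical marker: the thread only says "a special (non-English) character".
MARKER = "\u00b6"  # PILCROW SIGN


def encode_newlines(text: str) -> str:
    """Swap each newline for MARKER; one char for one char keeps every offset valid."""
    if MARKER in text:
        raise ValueError("marker already occurs in the input")
    return text.replace("\n", MARKER)


def decode_newlines(text: str) -> str:
    """Undo encode_newlines (exact, since encode refuses input containing MARKER)."""
    return text.replace(MARKER, "\n")


note = "CT HEAD\nBILAT HEMATOMAS.\nNo shift."
encoded = encode_newlines(note)
# A span found on the encoded text selects the same characters in the original.
print(encoded[8:24] == note[8:24])  # True
```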

Re: sentence detector newline behavior

2014-01-23 Thread vijay garla
Thanks James but then no typical sentence ending punctuation at the end of the line Gotcha. So simply using Lines would not suffice in those cases because it would run together sentences where there are more

Re: sentence detector newline behavior

2014-01-23 Thread Karthik Sarma
Thanks James but then no typical sentence ending punctuation at the end of the line Gotcha. So simply using Lines would not suffice in those cases because it would run together sentences where there are more than one on a line

Re: sentence detector newline behavior

2014-01-23 Thread Tim Miller
, Sean [mailto:sean.fi...@childrens.harvard.edu] Sent: Wednesday, January 22, 2014 3:42 PM To: dev@ctakes.apache.org Subject: RE: sentence detector newline behavior Thanks James but then no typical sentence ending punctuation at the end of the line Gotcha. So simply using Lines would not suffice

RE: sentence detector newline behavior

2014-01-22 Thread Masanz, James J.
=mayo@ctakes.apache.org] On Behalf Of Jörn Kottmann Sent: Tuesday, January 21, 2014 4:29 AM To: dev@ctakes.apache.org Subject: Re: sentence detector newline behavior Yes, exactly, OPENNLP-602 is about training a sentence detector model which can use a new line as an end-of-sentence character

RE: sentence detector newline behavior

2014-01-22 Thread Masanz, James J.
...@childrens.harvard.edu] Sent: Wednesday, January 22, 2014 1:33 PM To: dev@ctakes.apache.org Subject: RE: sentence detector newline behavior Just whistling in the wind here ... Perhaps before any changes are made to universally toggle cTakes in one direction or the other, we can take a poll of when and where cTakes

RE: sentence detector newline behavior

2014-01-22 Thread Finan, Sean
be done for the last bit where punctuation is missing. -Original Message- From: Masanz, James J. [mailto:masanz.ja...@mayo.edu] Sent: Wednesday, January 22, 2014 3:07 PM To: 'dev@ctakes.apache.org' Subject: RE: sentence detector newline behavior I know there are notes where

Re: sentence detector newline behavior

2014-01-21 Thread Jörn Kottmann
Yes, exactly, OPENNLP-602 is about training a sentence detector model which can use a new line as an end-of-sentence character. In case you have certain rules to split sentences, we should have a look at them. The Sentence Detector could be extended to support a user-provided rule-based
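As we understand OpenNLP's maximum-entropy sentence detector, it only considers positions that hold an end-of-sentence character and then classifies each candidate. If `\n` is not in the candidate set, a line that lacks final punctuation can never end a sentence. The Python sketch below covers only that candidate step. The function name and the default character set are illustrative assumptions, not the OpenNLP configuration.

```python
from typing import Iterable, List

# Illustrative default; not taken from OpenNLP.
DEFAULT_EOS = frozenset(".!?")


def candidate_positions(text: str, eos_chars: Iterable[str] = DEFAULT_EOS) -> List[int]:
    """Return the offset of every end-of-sentence candidate character."""
    eos = set(eos_chars)
    return [i for i, ch in enumerate(text) if ch in eos]


note = "Hx of HTN\nBP 140/90. No acute distress"
print(candidate_positions(note))                        # [19]  (period only)
print(candidate_positions(note, DEFAULT_EOS | {"\n"}))  # [9, 19]  (newline too)
```

Whether each candidate is a real boundary is still left to the trained model. Adding `\n` only makes the unpunctuated line end eligible.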

Re: sentence detector newline behavior

2014-01-20 Thread Jörn Kottmann
Hi all, I currently have quite a bit of time to work on OpenNLP and would like to help you out with this issue. Here is the follow-up issue for this change: https://issues.apache.org/jira/browse/OPENNLP-602 I am still trying to figure out what would be the best option to implement this. In

Re: sentence detector newline behavior

2013-05-23 Thread Tim Miller
OK I've started doing this, was able to get training working on a very small example, will try doing slightly bigger. Tim On 05/22/2013 08:03 AM, Jörn Kottmann wrote: On 05/22/2013 01:17 PM, Miller, Timothy wrote: That's awesome! It might be worth trying at least. How does the training