Re: question about sentence segmentation

2014-08-04 Thread Miller, Timothy
Very pleased to see so many people offer suggestions! Comparing some of
these different methods might make an interesting student project.

Sean:
 Just an fyi.  Does that make sense?  Haven't had my coffee ...
Makes perfect sense. The downside is that it requires some kind of
higher-level understanding during sentence segmentation to know what the
hierarchy is. You could imagine something that looks similar but has a
different logical structure. Long term, some big joint model that does
all things simultaneously is definitely something I'm interested in.

Steve:
 Seems like rather than specifying a set of candidate characters, we
 want to specify a candidate boundary regular expression.
This might be possible with minimal changes to the model.


John:
  why not just split sentences with regexes off a small list of defined onc
 physical exam terms?
My preference for vanilla cTAKES is always to do basic linguistic things
like tokenization and sentence segmentation without reference to
context-specific rules, because such rules make those components less
portable.
Obviously for specific use cases or applications (like what Britt is
probably doing) you will use whatever information makes sense for your
domain. But I think we could get maybe 75% of the remaining cases (which
are probably only 5% of the total # of cases) by using smarter boundary
conditions like Steve suggested.

Thanks again,
Tim


On 08/02/2014 01:26 PM, John Green wrote:
 I was thinking the same thing as Steve. That's a pretty regular onc physical
 exam, so why not just split sentences with regexes off a small list of defined
 onc physical exam terms? The interesting case would be breast, as this term
 may appear in the body of a sentence (rather than just as a term), but you
 could use a regex sub-match where you conditionally match breast first, then
 one or more key physical findings, to correctly identify THAT breast word
 token as the term, e.g. the beginning of the sentence. I would recommend red
 flag physical findings, as they are more likely to always be in the body of
 the sentence, for example, Breast: no lumps or masses palpable.
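
A minimal sketch of the term-list idea above, not part of cTAKES. The header terms, the list of findings, and the class name ExamSectionSplitter are all hypothetical and chosen only for illustration. It also assumes that a header term is always followed by a colon, and that "Breast:" counts as a header only when one of the listed findings appears before the next colon.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExamSectionSplitter {
    // Hypothetical header terms and findings (assumptions, not from cTAKES).
    // "Breast:" is accepted only if a listed finding follows before the next colon.
    static final Pattern HEADER = Pattern.compile(
        "\\b(?:Lungs|CV|Abdomen):"
        + "|\\bBreast:(?=[^:]*\\b(?:lumps|masses|swelling|erythema|seroma|hematoma)\\b)");

    /** Splits the text so that each piece starts at a matched header term. */
    static List<String> split(String text) {
        List<Integer> starts = new ArrayList<>();
        Matcher m = HEADER.matcher(text);
        while (m.find()) {
            // A header at offset 0 does not create a new piece.
            if (m.start() > 0) {
                starts.add(m.start());
            }
        }
        List<String> pieces = new ArrayList<>();
        int from = 0;
        for (int start : starts) {
            pieces.add(text.substring(from, start).trim());
            from = start;
        }
        pieces.add(text.substring(from).trim());
        return pieces;
    }

    public static void main(String[] args) {
        String note = "Lungs: clear Breast: no lumps or masses palpable "
            + "Abdomen: soft, mild breast tenderness";
        for (String piece : split(note)) {
            System.out.println(piece);
        }
    }
}
```

The lowercase "breast" in the last section is left alone because it is not followed by a colon. A real note would need a longer term list and probably case-insensitive matching.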


 I have a few other ideas if that's barking up the right tree.




 JG

 On Sat, Aug 2, 2014 at 8:58 AM, Steven Bethard steven.beth...@gmail.com
 wrote:

 On Sat, Aug 2, 2014 at 7:43 AM, Miller, Timothy
 timothy.mil...@childrens.harvard.edu wrote:
 PE: Lymphnodes: neck and axilla without adenopathy Lungs: normal and clear 
 to auscultation CV: regular rate and rhythm without murmur or gallop , S1, 
 S2 normal, no murmur, click, rub or gal*, chest is clear without rales or 
 wheezing, no pedal edema, no JVD, no hepatosplenomegaly Breast: negative 
 findings right/left breast with mild swelling, warmth, mild erythema, 
 slightly tender, no seroma or hematoma Abdomen: Abdomen soft, non-tender.

 It would be preferable to me to put sentence breaks in between the 
 sections, so the first two sentences would be:

 1) PE: Lymphnodes...
 2) Lungs: normal...
 [snip]
 Another example that breaks our model in a different way (truncated):
 1. Baseline labwork including tumor markers  2. Start DD AC on Friday 8/1 
 with RN chemo teach  3. S U parent study
 [snip]
 Here it would be preferable to get:
 1.
 Baseline labwork...
 2.
 Start DD...
 3.
 S U parent study
 Seems like rather than specifying a set of candidate characters, we
 want to specify a candidate boundary regular expression. Something
 like, \p{P}|\b\p{Lu}|\b\p{N}, should cover all of the above cases:
 sentence boundaries may appear at punctuation marks, at uppercase
 letters after word boundaries, and at numbers after word boundaries.
 Steve
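
A minimal sketch of how the candidate boundary expression quoted above could be tried out in Java. The class and method names are hypothetical, and this is not the cTAKES sentence detector. It only lists the offsets where the pattern matches; deciding which candidates are real sentence boundaries would still be the model's job.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BoundaryCandidates {
    // The candidate-boundary expression from the message above:
    // punctuation, or an uppercase letter or a number right after a word boundary.
    static final Pattern CANDIDATE =
        Pattern.compile("\\p{P}|\\b\\p{Lu}|\\b\\p{N}");

    /** Returns the character offset of every candidate boundary in the text. */
    static List<Integer> candidateOffsets(String text) {
        List<Integer> offsets = new ArrayList<>();
        Matcher m = CANDIDATE.matcher(text);
        while (m.find()) {
            offsets.add(m.start());
        }
        return offsets;
    }

    public static void main(String[] args) {
        String text = "1. Baseline labwork  2. Start DD AC";
        // Print each candidate offset with the text that starts there.
        for (int offset : candidateOffsets(text)) {
            System.out.println(offset + "\t" + text.substring(offset));
        }
    }
}
```

On the numbered-list example, the candidates include the list numbers, the periods after them, and the first letter of each capitalized word, so the pattern over-generates by design and leaves the final decision to the classifier.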



RE: LabMentions

2014-08-04 Thread Masanz, James J.

As far as I know, there isn't an annotator yet for creating LabMention 
annotations.  We would welcome a contribution.

- James Masanz

-Original Message-
From: Harpreet Khanduja [mailto:hsk5...@rit.edu] 
Sent: Friday, August 01, 2014 11:27 AM
To: dev@ctakes.apache.org
Subject: LabMentions

Hello,

 Is there a way to include the annotation LabMentions in the pipeline?

Thank you for your help.

Regards,
Harpreet


Re: LabMentions

2014-08-04 Thread Harpreet Khanduja
Thank you so much for letting me know.
I will try my best to come up with one.

Regards,
Harpreet


On Mon, Aug 4, 2014 at 4:42 PM, Masanz, James J. masanz.ja...@mayo.edu
wrote:


 As far as I know, there isn't an annotator yet for creating LabMention
 annotations.  We would welcome a contribution.

 - James Masanz

 -Original Message-
 From: Harpreet Khanduja [mailto:hsk5...@rit.edu]
 Sent: Friday, August 01, 2014 11:27 AM
 To: dev@ctakes.apache.org
 Subject: LabMentions

 Hello,

  Is there a way to include the annotation LabMentions in the pipeline?

 Thank you for your help.

 Regards,
 Harpreet