Guide on Training assertion module

2016-01-20 Thread Harish Kulkarni
Hi

Is there a guide on how to train the assertion module to get the correct
polarity, etc.?
I have some good data to train it with.
Currently, simple negation is not recognized.

Thanks
Harish


Re: Combining Knowledge- and Data-driven Methods for De-identification of Clinical Narratives

2016-01-20 Thread Pei Chen
Hi,
Sorry, I was swamped recently.
But yeah, we can even create an extended type system to store these items
temporarily and add them to the main/core type system afterwards.
There was an existing item to upgrade UIMA, but agreed, it will require much
more testing. If it works, we can upgrade it in our sandbox area or create a
branch if necessary.

—Pei

> On Jan 18, 2016, at 9:06 AM, Peter Klügl  wrote:
> 
> Hi,
> 
> a new patch is attached.
> 
> @Pei:
> Are there suitable annotation types in the cTAKES type system? One
> cTAKES project uses something like OntologyMatch... I currently map it to
> IdentifiedAnnotation, but many of its features remain empty...
> 
> @Azad:
> I changed the rules a bit, especially the capitalization, the way I normally
> use it in Ruta. The word lists are compiled to a trie by the Maven
> plugin. I also added the two regexes for URL and email, and I extended the
> URL regex. I also changed the evaluation order of some rules
> (with @). Feel free to add simple examples to examples.csv for the unit
> tests.
> 
> Let me know if you need more information about the changes.
> 
> Do you want help with the other rule sets, or should we split them up?
> 
> Best,
> 
> Peter
> 
> On 18.01.2016 at 11:04, Peter Klügl wrote:
>> Hi,
>> 
>> Great. I will integrate them into the project and the next patch.
>> 
>> Best,
>> 
>> Peter
>> 
>> Am 18.01.2016 um 00:58 schrieb Azad Dehghan:
>>> Three NERs translated and uploaded.
>>> 
>>> PS. I will validate all NERs once we have them all completed.
>>> 
>>> Cheers,
>>> Azad
>>> 
>>> On 24 November 2015 at 10:37, Azad Dehghan  wrote:
>>> 
>>>> This is on my to-do list for December as well. If there are any more volunteers
>>>> for translating JAPE to RUTA, please get in touch.
>>>>
>>>> Cheers,
>>>> Azad
>>>>
>>>> On 24 Nov 2015 09:55, "Peter Klügl" wrote:
>>>>> Hi,
>>>>>
>>>>> I just wanted to mention that I haven't forgotten about it. Unfortunately,
>>>>> there is just no spare time right now. I hope I will be able to provide
>>>>> the patches in December.
>>>>>
>>>>> Best,
>>>>>
>>>>> Peter
>>>>>
>>>>> On 06.11.2015 at 16:40, Pei Chen wrote:
>>>>>> Hi Peter,
>>>>>> I think ctakes-examples is probably a good starting point, at least
>>>>>> in terms of Maven modules, etc. I think it would be good if we use
>>>>>> the uimaFIT style as the primary approach to wiring components together and
>>>>>> generate descriptors as a secondary one...
>>>>>> I think the actual components that would be required are probably best
>>>>>> left up to what is actually required for the best-performing c-deid. The
>>>>>> output would be interesting; I'm not sure if we should treat this as
>>>>>> an independent preprocessing component or as part of a pipeline (in which
>>>>>> case we may need to propose a change to the type system, or perhaps an
>>>>>> alternative JCas view). You can probably open up that discussion to
>>>>>> the dev group as you see fit.
>>>>>>
>>>>>> My 2 cents...
>>>>>>
>>>>>>
>>>>>> On Fri, Nov 6, 2015 at 3:38 AM, Peter Klügl wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> Is there a cTAKES project that may serve as an example of how the cTAKES
>>>>>>> community develops, or of what a project should look like?
>>>>>>> I have learned that different people set up UIMA projects in quite different
>>>>>>> ways, and I do not want to be inspired by some sort of outdated
>>>>>>> approach in the cTAKES repo.
>>>>>>>
>>>>>>> Are there restrictions or preferences regarding the preprocessing components
>>>>>>> that should be used and the kind of "output" of the project?
>>>>>>> Components: which components may the components rely on: tokenizer,
>>>>>>> ..., parser, ..., dictionary lookup?
>>>>>>> "Output": should the project provide a pipeline or a single AE?
>>>>>>>
>>>>>>> More comments below.
>>>>>>>
>>>>>>> On 03.11.2015 at 16:54, Azad Dehghan wrote:
>>>>>>>>> Who else plans to provide patches for it? Just to avoid duplicate work
>>>>>>>>> and to coordinate the efforts ...
>>>>>>>>>
>>>>>>>> I would like to help with translating JAPE to RUTA.
>>>>>>> You can already go ahead with the UIMA Ruta Workbench if you want, or
>>>>>>> wait until I set up the project with Ruta integration.
>>>>>>>
>>>>>>> If any questions arise, just ask :-)
>>>>>>>
>>>>>>>>> Is there a development dataset which was utilized for the initial
>>>>>>>>> development, and if yes, is it possible to contribute it too?
>>>>>>>>>
>>>>>>>> The data set is unfortunately not publicly available; i2b2 typically
>>>>>>>> releases the data sets 12 months after a given challenge; this is done
>>>>>>>> on an individual basis and involves a Data Use Agreement.
>>>>>>>>
>>>>>>>> However, I will be able to conduct and coordinate the validation.
>>>>>>>>
>>>>>>> Ok, I'll investigate whether we already have access to the dataset here.
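Peter's January 18 patch note (quoted above) says the word lists are compiled to a trie by the Maven plugin. As a rough illustration of why that helps, here is a minimal Java sketch of a token-level trie that finds the longest dictionary entry starting at each position. It is not Ruta's actual word-list implementation; the class name `WordListTrie` and the sample entries are hypothetical, and exact, case-sensitive token matching is assumed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical token-level trie for word-list lookup (illustration only). */
class WordListTrie {
    private static final class Node {
        final Map<String, Node> children = new HashMap<>();
        boolean terminal; // true if a word-list entry ends at this node
    }

    private final Node root = new Node();

    /** Adds a possibly multi-token entry such as "social worker". */
    void add(String entry) {
        Node node = root;
        for (String token : entry.trim().split("\\s+")) {
            node = node.children.computeIfAbsent(token, k -> new Node());
        }
        node.terminal = true;
    }

    /** Returns the token length of the longest entry starting at start, or 0 if none. */
    int longestMatch(List<String> tokens, int start) {
        Node node = root;
        int best = 0;
        for (int i = start; i < tokens.size(); i++) {
            node = node.children.get(tokens.get(i));
            if (node == null) {
                break; // no entry continues with this token
            }
            if (node.terminal) {
                best = i - start + 1;
            }
        }
        return best;
    }

    /** Scans left to right and returns non-overlapping [start, end) token spans. */
    List<int[]> findAll(List<String> tokens) {
        List<int[]> spans = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            int len = longestMatch(tokens, i);
            if (len > 0) {
                spans.add(new int[] {i, i + len});
                i += len; // skip past the matched entry
            } else {
                i++;
            }
        }
        return spans;
    }

    public static void main(String[] args) {
        WordListTrie professions = new WordListTrie();
        professions.add("nurse");
        professions.add("social worker");
        List<String> tokens = List.of("Seen", "by", "social", "worker", "and", "nurse");
        for (int[] s : professions.findAll(tokens)) {
            System.out.println(String.join(" ", tokens.subList(s[0], s[1])));
        }
    }
}
```

Each lookup walks at most as many nodes as the longest entry has tokens, rather than comparing every list entry against the text, which is presumably the motivation for precompiling the lists.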

Re: Combining Knowledge- and Data-driven Methods for De-identification of Clinical Narratives

2016-01-20 Thread Azad Dehghan
> I integrated your ruta scripts and added a new patch (includes and
> replaces my last one).
>
Ok.

> I noticed some semantic differences between the ruta rules and their
> jape originals, e.g., the brackets for the user name. Are they intended?
>
Brackets should be included. I was getting some inconsistent output, so I
removed them for the time being.
> I needed to change some rule elements, e.g., "M.D." does not work as a
> literal rule element match (a very old restriction of Ruta that should be
> removed some day...). These literal string matches should be avoided
> altogether if possible, or at least the start anchor should be set to a different
> rule element.
>
OK.
> Ok, I'll let you know when I start with a rule set.

Great!

Azad
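The thread lists Email and Url among the completed NERs, and Peter's patch note mentions adding and extending regexes for them. The actual patterns are not quoted in the thread, so the Java sketch below uses deliberately simple, hypothetical patterns just to show regex-based detection. The URL pattern refuses to end on sentence punctuation, so a sentence-final period is not swallowed into the match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical, simplified patterns; NOT the regexes from the cTAKES patch. */
class ContactPatterns {
    // local-part@domain.tld
    static final Pattern EMAIL =
            Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}");

    // http(s):// or www. prefix, host with a TLD, optional path that must not
    // end in . , ; : ! or ?
    static final Pattern URL = Pattern.compile(
            "(?:https?://|www\\.)[A-Za-z0-9.-]+\\.[A-Za-z]{2,}(?:/[^\\s]*[^\\s.,;:!?])?");

    /** Returns every non-overlapping match of p in text, in order. */
    static List<String> findAll(Pattern p, String text) {
        List<String> hits = new ArrayList<>();
        Matcher m = p.matcher(text);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        String note = "Contact j.doe@example.org or see https://example.org/clinic.";
        System.out.println(findAll(EMAIL, note)); // [j.doe@example.org]
        System.out.println(findAll(URL, note));   // [https://example.org/clinic]
    }
}
```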
> On 20.01.2016 at 01:47, Azad Dehghan wrote:
> > Peter,
> >
> > So, we have Email, Url, Profession, Street, Zip, State and Username
> > completed so far.
> >
> > The following NERs remain:
> > Country, Age, Doctor, Fax, Id_num, Medicalrec_num, Patient, and Phone.
> >
> > I will do Country next. If you are able to translate the rest quickly,
> > please do :) Otherwise, just keep me posted on which ones you are working on to
> > avoid duplicate work... and we can work through the remaining NERs.
> >
> > Also, once the NERs are translated, I will prepare a number of examples for
> > unit testing -- I will also validate the NERs using the i2b2 research
> > dataset.
> >
> > Cheers,
> > Azad
> >
> > On 19 January 2016 at 09:01, Peter Klügl wrote:
> >
> >> Ok, let me know which ones I should translate.
> >>
> >> Best,
> >>
> >> Peter
> >>
> >> On 18.01.2016 at 20:13, Azad Dehghan wrote:
> >>> Peter,
> >>>
> >>> Thanks for pushing things!
> >>>
> >>> I would rather split the rules/NERs to get things moving quicker (as I
> >>> am a newbie to Ruta). I will be uploading another NER (Username) shortly.
> >>> I will look at your changes to follow suit.
> >>>
> >>> Best,
> >>> Azad
> >>>
> >>> On 18 January 2016 at 14:06, Peter Klügl wrote:

Re: Combining Knowledge- and Data-driven Methods for De-identification of Clinical Narratives

2016-01-20 Thread Peter Klügl
Hi,

I integrated your ruta scripts and added a new patch (includes and
replaces my last one).

I noticed some semantic differences between the ruta rules and their
jape originals, e.g., the brackets for the user name. Are they intended?

I needed to change some rule elements, e.g., "M.D." does not work as a
literal rule element match (a very old restriction of Ruta that should be
removed some day...). These literal string matches should be avoided
altogether if possible, or at least the start anchor should be set to a different
rule element.

Ok, I'll let you know when I start with a rule set.

Best,

Peter
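One plausible reason a literal like "M.D." fails as a single rule element is that the text is first split into small basic tokens. The Java sketch below assumes a tokenizer that makes a run of letters, a run of digits, or one punctuation character into its own token (an assumption loosely modeled on Ruta's basic token seeding, not its actual code). Under that assumption "M.D." becomes four tokens, so comparing the literal against any single token fails, while matching it as a token sequence succeeds. `LiteralTokenMatch` and its methods are hypothetical names, not Ruta API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper contrasting single-token and token-sequence literal matching. */
class LiteralTokenMatch {
    // Assumed seeding: a run of letters, a run of digits, or one other
    // non-whitespace character (e.g. a period) becomes one token.
    private static final Pattern TOKEN =
            Pattern.compile("\\p{L}+|\\p{N}+|[^\\s\\p{L}\\p{N}]");

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    /** Naive approach: the literal must equal the text of one single token. */
    static boolean matchesSingleToken(List<String> tokens, String literal) {
        return tokens.contains(literal);
    }

    /** Tokenizes the literal too and searches for it as a contiguous token sequence. */
    static int matchTokenSequence(List<String> tokens, String literal) {
        List<String> pattern = tokenize(literal);
        outer:
        for (int i = 0; i + pattern.size() <= tokens.size(); i++) {
            for (int j = 0; j < pattern.size(); j++) {
                if (!tokens.get(i + j).equals(pattern.get(j))) {
                    continue outer;
                }
            }
            return i; // index of the first token of the match
        }
        return -1; // no match
    }

    public static void main(String[] args) {
        List<String> tokens = tokenize("John Smith, M.D.");
        System.out.println(tokens);                              // [John, Smith, ,, M, ., D, .]
        System.out.println(matchesSingleToken(tokens, "M.D.")); // false
        System.out.println(matchTokenSequence(tokens, "M.D.")); // 3
    }
}
```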

On 20.01.2016 at 01:47, Azad Dehghan wrote:
> Peter,
>
> So, we have Email, Url, Profession, Street, Zip, State and Username
> completed so far.
>
> The following NERs remain:
> Country, Age, Doctor, Fax, Id_num, Medicalrec_num, Patient, and Phone.
>
> I will do Country next. If you are able to translate the rest quickly,
> please do :) Otherwise, just keep me posted on which ones you are working on to
> avoid duplicate work... and we can work through the remaining NERs.
>
> Also, once the NERs are translated, I will prepare a number of examples for
> unit testing -- I will also validate the NERs using the i2b2 research
> dataset.
>
> Cheers,
> Azad
>
> On 19 January 2016 at 09:01, Peter Klügl  wrote:
>
>> Ok, let me know which ones I should translate.
>>
>> Best,
>>
>> Peter
>>
>> On 18.01.2016 at 20:13, Azad Dehghan wrote:
>>> Peter,
>>>
>>> Thanks for pushing things!
>>>
>>> I would rather split the rules/NERs to get things moving quicker (as I
>>> am a newbie to Ruta). I will be uploading another NER (Username) shortly.
>>> I will look at your changes to follow suit.
>>>
>>> Best,
>>> Azad
>>>
>>> On 18 January 2016 at 14:06, Peter Klügl wrote: