RE: Dictionary and Assertion rules for the Aggregate Plaintext Processor-Question Answered

digital paula Wed, 20 Nov 2013 12:59:09 -0800

forgot to tag something to denote closure in the title :-)
 
From: cybersat...@hotmail.com
To: user@ctakes.apache.org
Subject: RE: Dictionary and Assertion rules for the Aggregate Plaintext 
Processor
Date: Wed, 20 Nov 2013 15:56:09 -0500

The information you provided is very helpful.  Thanks so much, James!

From: masanz.ja...@mayo.edu
To: user@ctakes.apache.org
Subject: RE: Dictionary and Assertion rules for the Aggregate Plaintext 
Processor
Date: Wed, 20 Nov 2013 20:14:55 +0000

I don't have any links handy offhand.

cTAKES uses HSQL.  For the most part you can rely on the HSQL documentation to 
learn HSQL, then point HSQL at the database in

resources\org\apache\ctakes\dictionary\lookup\umls2011ab

(which will be under wherever you unzipped the ctakes-resources you downloaded 
from SourceForge)

The only cTAKES-specific thing I can think of that you need to be aware of when 
updating cTAKES dictionaries is that you need to tokenize the (text part of 
the) dictionary entries the same way that cTAKES will tokenize the text you 
will process, so that when cTAKES compares the text it is processing to the 
dictionary entries, the tokens match up.

Tokens within the dictionary entries are separated by a space.

For example, should "left-handed" be kept together as one token without spaces, 
or should it be three separate tokens: left - handed?
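
As a rough illustration of the tokenization point, here is a minimal sketch in 
Python.  It is not the actual cTAKES tokenizer: the splitting rule (runs of 
letters or digits are tokens, every other non-space character is its own token) 
and the name normalize_entry are assumptions made for illustration only.  The 
real rule has to match whatever tokenizer your cTAKES pipeline uses.

```python
import re

# Hypothetical rule: runs of letters/digits are tokens, and every other
# non-space character is a token by itself. This is NOT the cTAKES
# tokenizer; swap in the rule your pipeline really applies.
TOKEN_PATTERN = re.compile(r"[A-Za-z0-9]+|[^A-Za-z0-9\s]")


def normalize_entry(text):
    """Return the entry text with its tokens separated by single spaces."""
    return " ".join(TOKEN_PATTERN.findall(text))


print(normalize_entry("left-handed"))    # left - handed
print(normalize_entry("pain, chronic"))  # pain , chronic
```

Under that assumed rule, "left-handed" would have to be stored in the 
dictionary as "left - handed" for the tokens to line up.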

I think the tokenization issue is described in the old forum posts I mentioned 
earlier.

- James

From: user-return-352-Masanz.James=mayo....@ctakes.apache.org 
[user-return-352-Masanz.James=mayo....@ctakes.apache.org] on behalf of digital 
paula [cybersat...@hotmail.com]
Sent: Tuesday, November 19, 2013 11:22 PM
To: user@ctakes.apache.org
Subject: RE: Dictionary and Assertion rules for the Aggregate Plaintext 
Processor

Thanks so much James for helping me to get my tiny dictionary to work.  If you 
look up James in it....it states 'a great guy'  :-)

I really appreciate your mentioning the other options, such as the lucene 
index (and the links for it), which I'm not familiar with yet.  I also like 
the better alternative you suggested of updating the SQL dictionaries.  
However, I'm not familiar with that either.  Are there any reference links or 
details that you can provide on how to update the SQL dictionaries?

Thanks.

Regards,

Paula

From: masanz.ja...@mayo.edu
To: user@ctakes.apache.org
Subject: RE: Dictionary and Assertion rules for the Aggregate Plaintext 
Processor
Date: Tue, 19 Nov 2013 03:50:10 +0000

Hi Paula,

There is a separate DictionaryLookupAnnotatorCSV.xml descriptor for using the 
delimited file that you found.  You would have to update the aggregate to 
refer to DictionaryLookupAnnotatorCSV.xml instead of 
DictionaryLookupAnnotator.xml in order to use the delimited files directly.

Or, to create a lucene index to replace the lucene index used by 
DictionaryLookupAnnotator.xml, there are some posts on the old forum that talk 
about creating dictionaries.  You could take a look at those.

https://cabig-kc.nci.nih.gov/Vocab/forums/viewtopic.php?f=28&t=80&start=20#p1459

https://cabig-kc.nci.nih.gov/Vocab/forums/viewtopic.php?f=28&t=423&p=1465

I think a better alternative would be to update the SQL dictionaries: delete 
all the data and replace it with what you want, and remove the check for the 
UMLS user ID.

Hope that helps.

-- James

From: user-return-350-Masanz.James=mayo....@ctakes.apache.org 
[mailto:user-return-350-Masanz.James=mayo....@ctakes.apache.org] 
On Behalf Of digital paula
Sent: Monday, November 18, 2013 8:48 PM
To: user@ctakes.apache.org
Subject: RE: Dictionary and Assertion rules for the Aggregate Plaintext 
Processor

Thanks James for the response. 

I'd like to update that tiny dictionary and see the changes take effect.  I 
looked in the dictionary-lookup folders and found a dictionary1.csv file with 
these entries.  I added 'elbow' to it (just typed it in and saved the file).

ankle|ankle
aspirin|aspirin
cm|cm
cm|cm is a synonym of a UMLS term Cutaneous Mastocytosis which is in SNOMED but cm is not by itself
hyperlipidemia|hyperlipidemia
knee|knee
knee|knee pain
ld|ld
ld|ld SNOMED procedure C0011911
medical|medical nutrition therapy
nutrition|nutrition
nutrition|nutrition therapy
pain|pain
pain|pain, chronic
weight|weight gain
elbow|elbow
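
For reference, the entries listed above look like a pipe-delimited file in 
which the first field is the first word of a term and the second field is the 
full term.  A minimal sketch of reading such lines into a first-word index, 
assuming that layout, is shown here.  The name load_first_word_index is 
hypothetical, and this is not how cTAKES itself loads the file.

```python
from collections import defaultdict


def load_first_word_index(lines):
    """Map each first word to the list of full terms that start with it."""
    index = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line or "|" not in line:
            continue  # skip blank or malformed lines
        # Assumed layout: first_word|full term
        first_word, term = line.split("|", 1)
        index[first_word.strip()].append(term.strip())
    return dict(index)


sample = ["knee|knee", "knee|knee pain", "weight|weight gain", "elbow|elbow"]
index = load_first_word_index(sample)
print(index["knee"])   # ['knee', 'knee pain']
print(index["elbow"])  # ['elbow']
```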

I typed the following text into CVD using the plaintext processor that uses 
the DictionaryLookupAnnotator.xml descriptor.

"patient has knee pain and gained significant weight gain due to injury.  Past 
history of elbow pain.  family history includes hyperlipidemia. ."

There was no annotation for 'elbow' or 'weight' after executing the 
AggregatePlaintext AE processor.  Does something else have to be done to get it 
to annotate?  Basically, what are the steps to add terms to the tiny dictionary?

Thanks.

Regards,

Paula

> From: masanz.ja...@mayo.edu
> To: user@ctakes.apache.org
> Subject: RE: Dictionary and Assertion rules for the Aggregate Plaintext 
> Processor
> Date: Mon, 18 Nov 2013 20:17:32 +0000

> 

> The "pain" and "aspirin" terms are from a tiny dictionary that has just a 
> handful of terms with made-up codes/CUIs. (For the record, they are in a 
> lucene index instead of an HSQL database.) I think "knee" is also in that 
> tiny dictionary.

> 

> As far as I know, the assertion component doesn't work any differently 
> whether you use the UMLS dictionary or the tiny, sample, made-up dictionary.

> 

> Which dictionary is used is determined by which Dictionary Lookup analysis 
> engine is included in the aggregate:
> DictionaryLookupAnnotator.xml for the tiny one
> DictionaryLookupAnnotatorUMLS.xml for the one that has real UMLS terms and 
> CUIs

> 

> -- James

> 

> 

> From: user-return-348-Masanz.James=mayo....@ctakes.apache.org 
> [mailto:user-return-348-Masanz.James=mayo....@ctakes.apache.org] On Behalf Of 
> digital paula
> Sent: Monday, November 18, 2013 1:37 PM
> To: user@ctakes.apache.org
> Subject: RE: Dictionary and Assertion rules for the Aggregate Plaintext 
> Processor

> 

> Oh, gosh....I meant 'Hello' again.   

>  

> ________________________________________

> From: cybersat...@hotmail.com

> To: user@ctakes.apache.org

> Subject: Dictionary and Assertion rules for the Aggregate Plaintext Processor

> Date: Mon, 18 Nov 2013 14:36:13 -0500

> Hell again cTakes Community,

>  

> I think this will be an easy question.  I've decided to start simple by 
> first exploring the Aggregate Plaintext Processor without UMLS.  Since it's 
> not using UMLS, where are the dictionary and the assertion rules defined?  
> Can these be modified easily?  For example, I see that 'pain' and 'aspirin' 
> get annotated in text, and if the text states 'family history' this is noted 
> as well.  Can someone enlighten me as to how cTAKES generates this and 
> whether it's easily customizable?

>  

> Thanks.

>  

> Regards,

> Paula
