ctakes with icd10

2015-12-08 Thread Alaa al Barari
Hi,

I downloaded the latest UMLS version, and I want to know how to make cTAKES
work with ICD-10 and ICD-9.


Thanks


RE: ctakes with icd10

2015-12-08 Thread Savova, Guergana
Hi Alaa,
You need to create a resource from the terminology/ontology you want to use (in 
this case ICD9 or ICD10), then run the cTAKES fast dictionary lookup with that 
resource. There is cTAKES code, and some documentation, on how to create 
that resource. By default, cTAKES runs with a resource created from the English 
version of SNOMED CT and RxNorm.
Hope this helps.
--Guergana



Re: ctakes with icd10

2015-12-08 Thread David Kincaid
This seems like a pretty common request, and with such an old version of the
UMLS database shipped with cTAKES, it's only going to get worse. I've been
wanting to build a dictionary using the latest UMLS release (as well as a
custom database), so I would be happy to write up the steps as I go through
it. That assumes that I can dig up the instructions on the dev list.

- Dave

On Tue, Dec 8, 2015 at 11:36 AM, Finan, Sean <
sean.fi...@childrens.harvard.edu> wrote:

> Hi Alaa,
>
> The -shortest- answer is that you'll need to run the dictionary creation
> tool.  There are instructions in older dev list threads.  By default the
> dictionary creation tool does add icd9 and icd10 tables to the dictionary.
> The problem is that in UMLS 2011AB those codes weren't very well
> populated.  The 2015AB icd# set is much richer, so those tables should be
> pretty good.  Then in cTAKES you would look up annotations by icd9 or icd10
> codes instead of by cui:
> OntologyConceptUtil.getAnnotationsByCode( jcas, lookupWindow, icdCode );
> OntologyConceptUtil.getAnnotationsByCode( jcas, icdCode );
>
> Sean


RE: ctakes with icd10

2015-12-08 Thread Geise, Brandon D.
Not to repeat the instructions yet again, but I sent these out not long ago when 
I was going through the process and Sean was helping me.

1. Change /data/default/CtakesSources.txt from "SNOMEDCT" to "SNOMEDCT_US".
2. Copy ctakesumls.properties and ctakesumls.script from memdbtemplate to the 
location where the new UMLS DB will be written.
3. Run DictionaryCreator2:
   java -cp dictionarytool.jar;lib/* org.apache.ctakes.dictionarytool.DictionaryCreator2 -umls "\pathToUmls\META" -atui ./data/tiny/CtakesAnatTuis.txt -db jdbc:hsqldb:file:pathTonewDB\snorx2015 -tbl CUI_TERMS
4. Run CodeMapCreator:
   java -cp dictionarytool.jar;lib/* org.apache.ctakes.dictionarytool.CodeMapCreator -umls "\pathToUmls\META" -atui ./data/tiny/CtakesAnatTuis.txt -db jdbc:hsqldb:file:pathTonewDB\snorx2015 -tbl CUI_TERMS
5. Copy the new DB files to their new location, create a copy of 
cTakesHsql.xml, and update the dictionary location in it.
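Brandon's commands are written for Windows: ';' separates the classpath entries and the paths use backslashes. On Linux or macOS the classpath separator is ':' instead. A minimal Java sketch that assembles the same two command lines with the running platform's separator follows; the tool class names and the -umls, -atui, -db and -tbl flags are copied from the steps above, while the class DictionaryToolCommands and every path in it are hypothetical placeholders.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Sketch only: rebuilds the command lines from steps 3 and 4 with a
// platform-appropriate classpath separator. Class names and flags are the
// ones quoted in the thread; every path below is a placeholder.
class DictionaryToolCommands {

    static final String PACKAGE = "org.apache.ctakes.dictionarytool.";

    static List<String> command(String tool, String umlsMeta, String anatTuis,
                                String jdbcUrl, String table) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-cp");
        // File.pathSeparator is ";" on Windows and ":" on Linux/macOS.
        cmd.add("dictionarytool.jar" + File.pathSeparator + "lib/*");
        cmd.add(PACKAGE + tool);
        cmd.add("-umls");
        cmd.add(umlsMeta);
        cmd.add("-atui");
        cmd.add(anatTuis);
        cmd.add("-db");
        cmd.add(jdbcUrl);
        cmd.add("-tbl");
        cmd.add(table);
        return cmd;
    }

    public static void main(String[] args) {
        String umls = "/path/to/umls/META";                       // placeholder
        String db = "jdbc:hsqldb:file:/path/to/newDB/snorx2015";  // placeholder
        for (String tool : new String[] {"DictionaryCreator2", "CodeMapCreator"}) {
            List<String> cmd = command(tool, umls, "./data/tiny/CtakesAnatTuis.txt", db, "CUI_TERMS");
            // Printed only. To run it from the dictionary tool directory instead:
            // new ProcessBuilder(cmd).inheritIO().start().waitFor();
            System.out.println(String.join(" ", cmd));
        }
    }
}
```

Passing the list to ProcessBuilder hands "lib/*" to the JVM unexpanded, and the JVM itself expands a classpath entry ending in /* to the jars in that directory, so no shell quoting is involved.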

Thanks,
Brandon



IMPORTANT WARNING: The information in this message (and the documents attached 
to it, if any) is confidential and may be legally privileged. It is intended 
solely for the addressee. Access to this message by anyone else is 
unauthorized. If you are not the intended recipient, any disclosure, copying, 
distribution or any action taken, or omitted to be taken, in reliance on it is 
prohibited and may be unlawful. If you have received this message in error, 
please delete all electronic copies of this message (and the documents attached 
to it, if any), destroy any hard copies you may have created and notify me 
immediately by replying to this email. Thank you.

Geisinger Health System utilizes an encryption process to safeguard Protected 
Health Information and other confidential data contained in external e-mail 
messages. If email is encrypted, the recipient will receive an e-mail 
instructing them to sign on to the Geisinger Health System Secure E-mail 
Message Center to retrieve the encrypted e-mail.

RE: ctakes with icd10

2015-12-08 Thread Finan, Sean
Yeah, I'm actually building a 2015AB version this very moment.  I am going to 
do a little testing and then I'll check it into SourceForge.  When I'm done 
I'll email the link.

Sean



Re: Need help to identify procedures in xml file using AggregatePlaintextFastUMLSProcessor

2015-12-08 Thread Pei Chen
Hi Reena,
If you search for "ProcedureMention" in the attached output xml, you
should be able to find the Procedures (plus the FSArray of the
associated Concepts) that were extracted...
Or am I missing something...
--Pei

On Mon, Dec 7, 2015 at 12:40 AM, Reena Duggal
 wrote:
> Sorry, I attached the wrong file in my last mail. Please find the correct XML file attached.
>
> Thanks & Regards
> Reena Duggal
> Research Scholar(Full-Time)
> Amity Institute of Information Technology
> Amity University Uttar Pradesh
> M - 09740256313
> On 12/7/2015 10:41 AM, Reena Duggal wrote:
>
> Hello
> I have set up cTAKES on my machine using the cTAKES 3.2 User Install Guide. I
> created an XML file using the CPE with AggregatePlaintextFastUMLSProcessor.
> I am attaching it to this email. Please let me know how to parse this file to
> get a list of procedures from it; I am not able to figure out that part. Also,
> please check whether this file is correct. I would really appreciate your help.
>
>
> Thanks & Regards
> Reena Duggal
> Research Scholar(Full-Time)
> Amity Institute of Information Technology
> Amity University Uttar Pradesh
> M - 09740256313
>
>
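Following Pei's pointer, the ProcedureMention elements can also be pulled out of the XMI programmatically with nothing but the JDK's DOM parser. A minimal sketch, assuming the file is UIMA XMI with a single Sofa whose sofaString attribute holds the note text, and ProcedureMention elements carrying begin/end character offsets into that text. It prints each span's offsets and covered text, and does not follow the references to the associated concepts. The class name ProcedureMentionReader is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Sketch: list procedure mentions from a cTAKES XMI output file using only the JDK.
// Assumes one Sofa (the document text) and begin/end offsets on each ProcedureMention.
class ProcedureMentionReader {

    static List<String> procedures(byte[] xmi) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // lets "*" below match any namespace prefix
        Document doc = factory.newDocumentBuilder().parse(new ByteArrayInputStream(xmi));

        // The document text is held in the sofaString attribute of the Sofa element.
        NodeList sofas = doc.getElementsByTagNameNS("*", "Sofa");
        String text = sofas.getLength() > 0
                ? ((Element) sofas.item(0)).getAttribute("sofaString") : "";

        List<String> found = new ArrayList<>();
        NodeList mentions = doc.getElementsByTagNameNS("*", "ProcedureMention");
        for (int i = 0; i < mentions.getLength(); i++) {
            Element mention = (Element) mentions.item(i);
            int begin = intAttr(mention, "begin");
            int end = intAttr(mention, "end");
            boolean inRange = 0 <= begin && begin <= end && end <= text.length();
            found.add(begin + "-" + end + ": " + (inRange ? text.substring(begin, end) : "?"));
        }
        return found;
    }

    private static int intAttr(Element e, String name) {
        String v = e.getAttribute(name);
        return v.isEmpty() ? 0 : Integer.parseInt(v); // treat a missing offset as 0
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.err.println("usage: ProcedureMentionReader <output.xmi>");
            return;
        }
        for (String p : procedures(Files.readAllBytes(Paths.get(args[0])))) {
            System.out.println(p);
        }
    }
}
```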


RE: ctakes blog

2015-12-08 Thread Finan, Sean
Fantastic idea

-Original Message-
From: Miller, Timothy [mailto:timothy.mil...@childrens.harvard.edu] 
Sent: Tuesday, December 08, 2015 4:41 PM
To: dev@ctakes.apache.org
Subject: ctakes blog

The recent discussion over dictionary building, and someone pointing out it has 
come up several times, made me think that maybe we should use the blog space 
that Apache provides. It could be used for write-ups of things that are not 
quite as formal as "documentation" but would benefit from being written down 
somewhere that is easier to search and link to than a mailing list.
Any thoughts?
Tim



RE: ctakes with icd10; 2015 versions available on sourceforge!

2015-12-08 Thread Finan, Sean
Hi Alaa,

I have a slightly updated version of the dictionary tool - only a couple of 
changes but I should check them in nonetheless after I've cleaned up a bit.  

I followed the process as emailed by Brandon Geise around 12:51 today.  My 
command parameters were:
[DictionaryCreator2]
-umls C:\Spiffy\umls\data\external\2015AB\META
-db jdbc:hsqldb:file:C:/Spiffy/rword_dict/output/umls2015icd_hsql/ctakesicd2015
-tbl CUI_TERMS
-fd ./data/tiny
-src ./data/tiny/CtakesSources.txt
-atui ./data/tiny/CtakesAnatTuis.txt
-tui ./data/tiny/CtakesSnomedTuis.txt

And I added ICD9CM and ICD10PCS to CtakesSources.txt

[CodeMapCreator]
-umls C:\Spiffy\umls\data\external\2015AB\META
-db jdbc:hsqldb:file:C:/Spiffy/rword_dict/output/umls2015icd_hsql/ctakesicd2015
-tbl kludge
-fd ./data/tiny
-src ./data/tiny/CtakesSources.txt

Obviously C:/Spiffy/ is my root of all evil ;^)
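Sean's edit to CtakesSources.txt (adding ICD9CM and ICD10PCS) can be scripted. A minimal sketch, assuming the file is a plain list with one UMLS source abbreviation per line, which matches Dave's description of his copy later in the thread. The helper rewrites the whole file with duplicates and blank lines removed, so check the result if your copy contains anything else. SourcesFileEditor is a hypothetical name.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Sketch: add source abbreviations (SABs) to CtakesSources.txt, assuming the file
// is a plain list with one SAB per line. Rewrites the file without blank lines.
class SourcesFileEditor {

    /** Adds each SAB that is not yet listed; returns the ones actually added. */
    static List<String> addSources(Path file, List<String> sabs) throws IOException {
        Set<String> all = new LinkedHashSet<>(); // keeps existing order, drops duplicates
        for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
            if (!line.trim().isEmpty()) {
                all.add(line.trim());
            }
        }
        List<String> added = new ArrayList<>();
        for (String sab : sabs) {
            if (all.add(sab)) {
                added.add(sab);
            }
        }
        Files.write(file, all, StandardCharsets.UTF_8); // one entry per line
        return added;
    }

    public static void main(String[] args) throws IOException {
        Path sources = Paths.get("./data/tiny/CtakesSources.txt");
        System.out.println("Added: " + addSources(sources, Arrays.asList("ICD9CM", "ICD10PCS")));
    }
}
```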


The .script and .properties files constitute the HSQL database, which holds 
the dictionary.  You can copy them into a new directory under the cTAKES root 
resources/ directory, parallel to the existing ctakessnorx/ directory:
[CTAKES_ROOT]/resources/org/apache/ctakes/dictionary/lookup/fast/ctakesicd2015/

Then, in ctakes-dictionary-lookup-fast-res, edit your cTakesHsql.xml file:
[CTAKES_ROOT]/ctakes-dictionary-lookup-fast-res/src/main/resources/org/apache/ctakes/dictionary/lookup/fast/cTakesHsql.xml

Change both occurrences of
value="jdbc:hsqldb:file:resources/org/apache/ctakes/dictionary/lookup/fast/ctakessnorx/ctakessnorx"/>
to
value="jdbc:hsqldb:file:resources/org/apache/ctakes/dictionary/lookup/fast/ctakesicd2015/ctakesicd2015"/>
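The copy step and the URL change can be scripted as well. A minimal sketch, assuming the build left ctakesicd2015.script and ctakesicd2015.properties in one directory (the names implied by the -db URL above), that it is run with [CTAKES_ROOT] and that directory as arguments, and that the JDBC URL appears verbatim in cTakesHsql.xml. It reports if it does not find exactly the two occurrences Sean mentions, and it does not perform the line replacements he describes next. InstallIcdDictionary is a hypothetical name.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Sketch of two manual steps: copy the built .script/.properties pair into the
// cTAKES resources tree, then point the JDBC URLs in cTakesHsql.xml at it.
class InstallIcdDictionary {

    static final String OLD_URL =
        "jdbc:hsqldb:file:resources/org/apache/ctakes/dictionary/lookup/fast/ctakessnorx/ctakessnorx";
    static final String NEW_URL =
        "jdbc:hsqldb:file:resources/org/apache/ctakes/dictionary/lookup/fast/ctakesicd2015/ctakesicd2015";

    /** Copies name.script and name.properties from buildDir into targetDir. */
    static void copyDatabase(Path buildDir, Path targetDir, String name) throws IOException {
        Files.createDirectories(targetDir);
        for (String ext : new String[] {".script", ".properties"}) {
            Files.copy(buildDir.resolve(name + ext), targetDir.resolve(name + ext),
                    StandardCopyOption.REPLACE_EXISTING);
        }
    }

    /** Rewrites every occurrence of oldUrl in the descriptor; returns how many were replaced. */
    static int repoint(Path descriptor, String oldUrl, String newUrl) throws IOException {
        String xml = new String(Files.readAllBytes(descriptor), StandardCharsets.UTF_8);
        int count = 0;
        for (int i = xml.indexOf(oldUrl); i >= 0; i = xml.indexOf(oldUrl, i + oldUrl.length())) {
            count++;
        }
        Files.write(descriptor, xml.replace(oldUrl, newUrl).getBytes(StandardCharsets.UTF_8));
        return count;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: InstallIcdDictionary <CTAKES_ROOT> <dir with ctakesicd2015.*>");
            return;
        }
        Path ctakesRoot = Paths.get(args[0]);
        copyDatabase(Paths.get(args[1]),
                ctakesRoot.resolve("resources/org/apache/ctakes/dictionary/lookup/fast/ctakesicd2015"),
                "ctakesicd2015");
        int n = repoint(ctakesRoot.resolve("ctakes-dictionary-lookup-fast-res/src/main/resources/"
                + "org/apache/ctakes/dictionary/lookup/fast/cTakesHsql.xml"), OLD_URL, NEW_URL);
        if (n != 2) {
            System.err.println("Expected 2 JDBC URLs to change, found " + n);
        }
    }
}
```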


Then get rid of the lines that look like




And replace them with






To get icd#-related information, it might be easiest to use 
org.apache.ctakes.core.util.OntologyConceptUtil:
getCodes( IdentifiedAnnotation, [schemeName] ) returns all of the codes in the 
given scheme for an annotation.
getCodes( IdentifiedAnnotations, [schemeName] ) as above, but for all given 
annotations.
getCodes( JCas, [schemeName] ) as above, but for all annotations in the cas.
getCodes( JCas, [lookupWindow], [schemeName] ) as above, but for all 
annotations in the lookup window.
getCodes( IdentifiedAnnotation ) is like the above, but returns all codes for 
all schemes.
getCodes( IdentifiedAnnotations ) as above, but for all given annotations.
getCodes( JCas, [lookupWindow] ) as above, but for all annotations in the 
lookup window.

getSchemeCodes( IdentifiedAnnotation ) returns a hashtable of all the codes 
related to the annotation.  The keys of the hashtable are the scheme names 
(icd9cm, icd10pcs, etc.) and the values are lists of all the codes in each 
scheme.
getSchemeCodes( IdentifiedAnnotations ) as above, but for all given annotations.
getSchemeCodes( JCas ) returns a hashtable with all codes in the cas, which is 
useful if you are just checking for existence.

getAnnotationsByCode( JCas, [code] ) returns all annotations in the cas with 
the given code.
getAnnotationsByCode( JCas, [lookupWindow], [code] ) as above, but in the 
lookup window.
getAnnotationsByCode( IdentifiedAnnotations, [code] ) as above, but in the 
annotation collection.

So, you could use something like:
getCodes( JCas, "ICD10PCS" ) to get all the ICD-10-PCS codes found in the 
document.  For codes of interest, use
getAnnotationsByCode( JCas, [code] ) to get all the annotations in the 
document with that code.
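To make the semantics of that two-step pattern concrete without a cTAKES build, here is a stand-in sketch in plain Java. It is not the OntologyConceptUtil API: the Mention class and the codes/byCode methods are hypothetical, modelled only on Sean's description (a per-annotation map from scheme name to codes, and lookup of annotations by a code in any scheme). The thread does not say whether the real methods match scheme names case-insensitively, so the sketch uses exact, case-sensitive matching.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Stand-in sketch, NOT the cTAKES API: Mention is a hypothetical class holding
// a scheme -> codes map, the shape described for getSchemeCodes().
class CodeQueries {

    static final class Mention {
        final String text;
        final Map<String, Collection<String>> schemeCodes;

        Mention(String text, Map<String, Collection<String>> schemeCodes) {
            this.text = text;
            this.schemeCodes = schemeCodes;
        }
    }

    /** Distinct codes for one scheme over all mentions (cf. getCodes( JCas, schemeName )). */
    static Set<String> codes(Collection<Mention> mentions, String scheme) {
        Set<String> result = new TreeSet<>();
        for (Mention m : mentions) {
            result.addAll(m.schemeCodes.getOrDefault(scheme, Collections.<String>emptyList()));
        }
        return result;
    }

    /** Mentions having the code under any scheme (cf. getAnnotationsByCode( JCas, code )). */
    static List<Mention> byCode(Collection<Mention> mentions, String code) {
        List<Mention> result = new ArrayList<>();
        for (Mention m : mentions) {
            for (Collection<String> codes : m.schemeCodes.values()) {
                if (codes.contains(code)) {
                    result.add(m);
                    break; // count each mention once
                }
            }
        }
        return result;
    }

    /** Demo helper: a mention with codes in a single scheme. */
    static Mention mention(String text, String scheme, String... codes) {
        Map<String, Collection<String>> map = new HashMap<>();
        map.put(scheme, Arrays.asList(codes));
        return new Mention(text, map);
    }

    public static void main(String[] args) {
        // Made-up placeholder codes, not real ICD entries.
        List<Mention> doc = Arrays.asList(
                mention("first", "icd10pcs", "PCS-A"),
                mention("second", "icd10pcs", "PCS-A", "PCS-B"),
                mention("third", "snomedct_us", "SCT-X"));
        // Codes first, then the annotations that carry each code.
        for (String code : codes(doc, "icd10pcs")) {
            System.out.println(code + " -> " + byCode(doc, code).size() + " mention(s)");
        }
    }
}
```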

I know that is a lot to go over at once, and I am skimming the surface a bit, 
but I hope that it helps.

Must run,
Sean



-Original Message-
From: Alaa al Barari [mailto:alaa.albar...@gmail.com] 
Sent: Tuesday, December 08, 2015 6:49 PM
To: dev@ctakes.apache.org
Subject: Re: ctakes with icd10; 2015 versions available on sourceforge!

Thank you very, very much, Finan,

I am still very much a noob with cTAKES, so please bear with me.

1- Could you please post detailed instructions on how you built the 
dictionaries, or give as many examples of the steps as you can?
2- What exactly did you upload? I only see .script and .properties files; what 
are those? And what do I need to change in cTAKES to make it work with them, 
e.g. how do I get ICD-10 codes?

I am sorry for being a noob; I hope I will soon understand the whole thing and 
be effective.

Thanks in advance


RE: ctakes with icd10; 2015 versions available on sourceforge!

2015-12-08 Thread Finan, Sean
Hi Dave,

I'm always happy to see interest in our stuff!

>Step 1
I built the tool to be able to build a dictionary using anything in the umls - 
snomed, icd9, hpo, etc. so using the veterinary extension shouldn't be a 
problem.  You just add it to the CtakesSources file (or create an alternate 
file and point to it with -src).  To answer another of your questions, there 
can be zero or more sources - you saw snomedct and snomedct_us (each valid in a 
different umls version).  
It also can include any semantic type, just add (or remove) the appropriate 
tuis in a different data file.
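Sean's description of how the 2015 dictionaries were filtered (CUIs that exist for a filter source are added, along with all text variations from all sources that contain that CUI) can be sketched in Java as below. This illustrates the idea only and is not DictionaryCreator2's code: it assumes the standard pipe-delimited MRCONSO.RRF layout (CUI in column 0, LAT in 1, SAB in 11, STR in 14), adds a language restriction as an assumption, leaves out the TUI filtering done via the semantic type files, and keeps all rows in memory, which would not scale to a full MRCONSO.RRF. SourceFilterSketch and its demo rows are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Illustration only (not DictionaryCreator2): a CUI-level source filter over
// MRCONSO.RRF lines. TUI filtering (MRSTY.RRF) is omitted.
class SourceFilterSketch {

    // Column positions in pipe-delimited MRCONSO.RRF rows.
    static final int CUI = 0, LAT = 1, SAB = 11, STR = 14;

    static Map<String, Set<String>> buildTerms(List<String> mrconsoLines,
                                               Set<String> filterSources, String language) {
        List<String[]> rows = new ArrayList<>();
        for (String line : mrconsoLines) {
            rows.add(line.split("\\|", -1)); // -1 keeps empty fields
        }
        // Pass 1: every CUI that appears under one of the filter sources.
        Set<String> keep = new HashSet<>();
        for (String[] r : rows) {
            if (filterSources.contains(r[SAB])) {
                keep.add(r[CUI]);
            }
        }
        // Pass 2: all strings for those CUIs, from any source, in the chosen language.
        Map<String, Set<String>> terms = new TreeMap<>();
        for (String[] r : rows) {
            if (keep.contains(r[CUI]) && language.equals(r[LAT])) {
                terms.computeIfAbsent(r[CUI], cui -> new TreeSet<>()).add(r[STR]);
            }
        }
        return terms;
    }

    /** Demo helper: an 18-field MRCONSO-style line with only four columns filled. */
    static String row(String cui, String lat, String sab, String str) {
        String[] f = new String[18];
        Arrays.fill(f, "");
        f[CUI] = cui;
        f[LAT] = lat;
        f[SAB] = sab;
        f[STR] = str;
        return String.join("|", f);
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                row("C1", "ENG", "ICD10PCS", "Appendectomy"),
                row("C1", "ENG", "SNOMEDCT_US", "Excision of appendix"),
                row("C1", "FRE", "MSHFRE", "Appendicectomie"),
                row("C2", "ENG", "SNOMEDCT_US", "Fever"));
        // prints {C1=[Appendectomy, Excision of appendix]}
        System.out.println(buildTerms(lines, Collections.singleton("ICD10PCS"), "ENG"));
    }
}
```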

>Step 2
You have it right - you copy the templates to another location and output to 
that location.  Otherwise you 'lose' your templates.

>Step 3 and 4
The jar is built from source.  I need to (soon) check in updates to the source, 
and at the same time I can check in a default prebuilt .jar.  The lib/ directory 
is in the source repository.

Various people have toyed with the idea of putting the tool into a ctakes 
module, putting it into an "installation package", making a gui ...  The best 
option (imo) is probably to make an easy to use gui and keep a pre-built 
version in sandbox.  Someday, after the rainbow, maybe I'll get a chance to do 
that ...

Sean


-Original Message-
From: David Kincaid [mailto:kincaid.d...@gmail.com] 
Sent: Tuesday, December 08, 2015 4:57 PM
To: dev@ctakes.apache.org
Subject: Re: ctakes with icd10; 2015 versions available on sourceforge!

Thanks, Sean! It's great that cTAKES may soon have an up-to-date database out 
of the box. Hopefully it will cut down on the need for many to build their own 
DBs. Thank you very much for doing that.

Unfortunately, I still will need to build a custom one for us. I work in 
veterinary medicine so I need to add in the veterinary extension for SNOMED-CT 
into the database.

I looked over the steps below that Brandon included and have some questions:

step 1 says to change /data/default/CtakesSources.txt from "SNOMEDCT" to 
"SNOMEDCT_US". The file that I have has two lines in it: the first line is 
SNOMED, the second line is SNOMEDCT_US. So this step doesn't really make sense.

step 2 should reference the two scripts as being in resource/memdbtemplate so 
others don't have to search for them. Not sure what it means to move them to 
"location to put new UMLS DB". Does that mean move them into a new directory 
where the newly created UMLS DB will get written?

steps 3 and 4 for running the tools reference dictionarytool.jar which doesn't 
exist. Does one need to build that somehow from the source before running it? 
The command line also adds "lib/*" to the classpath. Is that the lib directory 
inside the dictionarytool source code or some other location?

What else would I need to do to include the SNOMED-CT Veterinary Extension 
along with the snomedct and rxnorm sources?

I'll probably not have time to try this out for a while yet, but when I do I'd 
be happy to write up an easy to follow tutorial for building a custom 
dictionary assuming I am able to get it to work.

Has anyone considered making this tool available outside of the source code 
itself? Like including it in the main cTAKES release? It seems there is demand 
for it.

- Dave

On Tue, Dec 8, 2015 at 3:22 PM, Finan, Sean < sean.fi...@childrens.harvard.edu> 
wrote:

> Hi Brandon, thanks for finding and forwarding the instructions!
>
> I have checked in two new hsqldb dictionaries, both from the 2015AB 
> version of the UMLS.  They both have codes for snomedct_us, rxnorm, 
> icd9cm and icd10pcs - as well as the usual cui, tui, preferred term mappings.
>
> One uses cuis filtered by snomed and rxnorm, the other adds cuis 
> filtered by icd9 and icd10.
> What this means:  Cuis that exist for a [filter source] are added to 
> the dictionary, as are all text variations from all sources that 
> contain that cui.  Both dictionaries also use the standard ctakes 
> semantic group tui filters.
>
> The names are ctakessnorx2015 and ctakesicd2015
>
> The snomed rxnorm :
>
> http://sourceforge.net/p/ctakesresources/code/HEAD/tree/trunk/ctakes-resources-snomed-rword-hsqldb-2011ab/src/main/resources/org/apache/ctakes/dictionary/lookup/fast/ctakessnorx2015/
>
> The snomed rxnorm icd9 icd10:
>
> http://sourceforge.net/p/ctakesresources/code/HEAD/tree/trunk/ctakes-resources-snomed-rword-hsqldb-2011ab/src/main/resources/org/apache/ctakes/dictionary/lookup/fast/ctakesicd2015/
>
> The svn root for the whole ugly thing 

Re: ctakes with icd10; 2015 versions available on sourceforge!

2015-12-08 Thread Pei Chen
Brandon,
That sounds great!
Please open a Jira ticket for any contributions (anyone should be able
to create a Jira account).  There are some legal items built into the
ASF Jira attachments for accepting contributions/donations.
It will also credit the contributors appropriately.
Anyone who is interested can follow the Jira item. (It's even better if
contributions are developed through open discussion and open development.)
--Pei

On Tue, Dec 8, 2015 at 10:36 PM, Geise, Brandon D.
 wrote:
> I'd be interested in contributing to making the dictionary tool more user 
> friendly with a GUI.
>
> Thanks,
> Brandon

RE: ctakes with icd10; 2015 versions available on sourceforge!

2015-12-08 Thread Geise, Brandon D.
I'd be interested in contributing to making the dictionary tool more user 
friendly with a GUI.

Thanks,
Brandon


Re: ctakes blog

2015-12-08 Thread David Kincaid
That is a great idea! I'd be happy to help provide some content as someone
who has been trying to use cTAKES inside other projects. As I mentioned
previously I am working in veterinary medicine which has just enough
peculiarities to make cTAKES out of the box a challenge (although it is
much better now than it was a couple years ago, so kudos to those of you
who have been making it better). I also can't stand Eclipse, happily left
Subversion behind years ago and prefer to work in Clojure as much as
possible.

- Dave

On Tue, Dec 8, 2015 at 3:41 PM, Miller, Timothy <
timothy.mil...@childrens.harvard.edu> wrote:

> The recent discussion over dictionary building, and someone pointing out
> it has come up several times, made me think that maybe we should use the
> blog space that apache provides. It could be used for write-ups of
> things that are not quite as formal as "documentation" but would benefit
> from being written down somewhere that is easier to search and link to
> than a mailing list.
> Any thoughts?
> Tim
>
>


Re: ctakes blog

2015-12-08 Thread andy mcmurry
+1 I would be happy to write about the experience of creating the medgen
(medical genetics) database, especially useful for cancer and cardio
On Dec 8, 2015 2:04 PM, "David Kincaid"  wrote:

> That is a great idea! I'd be happy to help provide some content as someone
> who has been trying to use cTAKES inside other projects. As I mentioned
> previously, I am working in veterinary medicine, which has just enough
> peculiarities to make cTAKES out of the box a challenge (although it is
> much better now than it was a couple years ago, so kudos to those of you
> who have been making it better). I also can't stand Eclipse, happily left
> Subversion behind years ago and prefer to work in Clojure as much as
> possible.
>
> - Dave
>
> On Tue, Dec 8, 2015 at 3:41 PM, Miller, Timothy <
> timothy.mil...@childrens.harvard.edu> wrote:
>
> > The recent discussion over dictionary building, and someone pointing out
> > it has come up several times, made me think that maybe we should use the
> > blog space that apache provides. It could be used for write-ups of
> > things that are not quite as formal as "documentation" but would benefit
> > from being written down somewhere that is easier to search and link to
> > than a mailing list.
> > Any thoughts?
> > Tim
> >
> >
>


Re: ctakes blog

2015-12-08 Thread britt fitch
I like the blog approach.
I think it will help direct the creation of various guide-type documentation 
going forward.



Britt Fitch
Wired Informatics
265 Franklin St Ste 1702
Boston, MA 02110
http://wiredinformatics.com
britt.fi...@wiredinformatics.com

> On Dec 8, 2015, at 4:41 PM, Miller, Timothy 
>  wrote:
> 
> The recent discussion over dictionary building, and someone pointing out
> it has come up several times, made me think that maybe we should use the
> blog space that apache provides. It could be used for write-ups of
> things that are not quite as formal as "documentation" but would benefit
> from being written down somewhere that is easier to search and link to
> than a mailing list.
> Any thoughts?
> Tim
> 





Re: ctakes with icd10; 2015 versions available on sourceforge!

2015-12-08 Thread David Kincaid
Thanks, Sean! It's great that cTAKES may soon have an up-to-date database
out of the box. Hopefully it will cut down on the need for many to build
their own DBs. Thank you very much for doing that.

Unfortunately, I still will need to build a custom one for us. I work in
veterinary medicine so I need to add in the veterinary extension for
SNOMED-CT into the database.

I looked over the steps below that Brandon included and have some questions:

Step 1 says to change /data/default/CtakesSources.txt from "SNOMEDCT" to
"SNOMEDCT_US". The file that I have has two lines in it: the first line is
SNOMED, the second line is SNOMEDCT_US. So this step doesn't really make sense.

Step 2 should reference the two scripts as being in resource/memdbtemplate
so others don't have to search for them. I'm also not sure what it means to
move them to "location to put new UMLS DB". Does that mean move them into a new
directory where the newly created UMLS DB will get written?

Steps 3 and 4 for running the tools reference dictionarytool.jar, which
doesn't exist. Does one need to build that somehow from the source before
running it? The command line also adds "lib/*" to the classpath. Is that
the lib directory inside the dictionarytool source code or some other
location?

What else would I need to do to include the SNOMED-CT Veterinary Extension
along with the snomedct and rxnorm sources?

I probably won't have time to try this out for a while yet, but when I do
I'd be happy to write up an easy-to-follow tutorial for building a custom
dictionary, assuming I am able to get it to work.

Has anyone considered making this tool available outside of the source code
itself? Like including it in the main cTAKES release? It seems there is
demand for it.

- Dave

On Tue, Dec 8, 2015 at 3:22 PM, Finan, Sean <
sean.fi...@childrens.harvard.edu> wrote:

> Hi Brandon, thanks for finding and forwarding the instructions!
>
> I have checked in two new hsqldb dictionaries, both from the 2015AB
> version of the UMLS.  They both have codes for snomedct_us, rxnorm, icd9cm
> and icd10pcs - as well as the usual cui, tui, preferred term mappings.
>
> One uses cuis filtered by snomed and rxnorm, the other adds cuis filtered
> by icd9 and icd10.
> What this means:  Cuis that exist for a [filter source] are added to the
> dictionary, as are all text variations from all sources that contain that
> cui.  Both dictionaries also use the standard ctakes semantic group tui
> filters.
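The filtering rule described above (keep a CUI if any filter source has it, then keep that CUI's text variants from every source) can be sketched as below. This is a minimal illustration, not the dictionary tool's code: it assumes atoms have already been reduced to hypothetical (cui, sab, text) records, ignores the TUI (semantic group) filters, and the CUIs, texts, and SAB names in the filter sets are made up for the example.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.stream.Collectors;

/** Sketch of source filtering; Atom is a hypothetical simplified UMLS atom. */
public class SourceFilterSketch {

    record Atom(String cui, String sab, String text) {}

    static Map<String, Set<String>> filter(List<Atom> atoms, Set<String> filterSources) {
        // Pass 1: CUIs that exist for at least one filter source.
        Set<String> keptCuis = atoms.stream()
                .filter(a -> filterSources.contains(a.sab()))
                .map(Atom::cui)
                .collect(Collectors.toSet());

        // Pass 2: every text variation, from ANY source, for the kept CUIs.
        Map<String, Set<String>> terms = new TreeMap<>();
        for (Atom a : atoms) {
            if (keptCuis.contains(a.cui())) {
                terms.computeIfAbsent(a.cui(), k -> new TreeSet<>()).add(a.text());
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        // Made-up data for illustration only.
        List<Atom> atoms = List.of(
                new Atom("C001", "SNOMEDCT_US", "heart attack"),
                new Atom("C001", "MSH", "myocardial infarction"),
                new Atom("C002", "ICD10PCS", "icd-only concept"),
                new Atom("C003", "MSH", "mesh-only concept"));

        // Illustrative filter sets; the tool's actual SAB list is not given in the thread.
        Set<String> snoRx = Set.of("SNOMEDCT_US", "RXNORM");
        Set<String> snoRxIcd = Set.of("SNOMEDCT_US", "RXNORM", "ICD9CM", "ICD10PCS");

        System.out.println("snorx:    " + filter(atoms, snoRx));
        System.out.println("with icd: " + filter(atoms, snoRxIcd));
    }
}
```

In this toy data, C001 keeps its MSH synonym because its SNOMED atom qualifies it, C003 is dropped under both filters, and C002 only appears once the ICD sources are added to the filter, which mirrors why the icd dictionary ends up with more concepts and terms.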
>
> The names are ctakessnorx2015 and ctakesicd2015
>
> The snomed rxnorm:
>
> http://sourceforge.net/p/ctakesresources/code/HEAD/tree/trunk/ctakes-resources-snomed-rword-hsqldb-2011ab/src/main/resources/org/apache/ctakes/dictionary/lookup/fast/ctakessnorx2015/
>
> The snomed rxnorm icd9 icd10:
>
> http://sourceforge.net/p/ctakesresources/code/HEAD/tree/trunk/ctakes-resources-snomed-rword-hsqldb-2011ab/src/main/resources/org/apache/ctakes/dictionary/lookup/fast/ctakesicd2015/
>
> The svn root for the whole ugly thing is:
>  svn checkout svn://svn.code.sf.net/p/ctakesresources/code/trunk
>
> Stats:
> ctakessnorx2015
> 545,913 Terms
> 229,251 Concepts (Cuis)
> 272,987 Snomed codes
> 32,419 Rxnorm codes
> 11,321 icd9 codes
> 61 icd10 codes
>
> Ctakesicd2015
> 611,230 Terms
> 282,211 Concepts
> 18,626 icd9 codes
> 45,818 icd10 codes
> Snomed and Rxnorm counts are the same
>
> So, adding the icd filters gave us an extra ~53,000 concepts and ~65,000
> terms.
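As a worked check of the "~53,000 concepts and ~65,000 terms" figure, the sketch below simply subtracts the ctakessnorx2015 counts from the ctakesicd2015 counts quoted in the stats above. It does not query either database; the numbers are copied from the message.

```java
/** Worked arithmetic on the dictionary stats quoted above (no database access). */
public class DictionaryDelta {

    static int extra(int ctakesIcd2015, int ctakesSnoRx2015) {
        return ctakesIcd2015 - ctakesSnoRx2015;
    }

    public static void main(String[] args) {
        System.out.println("extra terms:       " + extra(611_230, 545_913)); // 65317
        System.out.println("extra concepts:    " + extra(282_211, 229_251)); // 52960
        System.out.println("extra icd9 codes:  " + extra(18_626, 11_321));   // 7305
        System.out.println("extra icd10 codes: " + extra(45_818, 61));       // 45757
    }
}
```

The per-code deltas show that, by code count, most of the added coverage is in ICD-10 (61 → 45,818), with a smaller gain in ICD-9 (11,321 → 18,626).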
>
> I would like to move this all to a better root (not
> ctakes-resources-snomed-rword-hsqldb-2011ab) but I wasn't able to write
> directly in trunk (??) and need to get moving on to other things.
>
> There is help on the ctakes wiki:
> https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+3.2+-+Fast+Dictionary+Lookup
> Though I should probably add a few items ...
>
>
> Sean
>
>
> -Original Message-
> From: Geise, Brandon D. [mailto:bdge...@geisinger.edu]
> Sent: Tuesday, December 08, 2015 12:51 PM
> To: dev@ctakes.apache.org
> Subject: RE: ctakes with icd10
>
> Not to repeat the instructions yet again, but I sent these out not long ago
> when I was going through the process and Sean was helping me.
>
> 1. Change /data/default/CtakesSources.txt from "SNOMEDCT" to "SNOMEDCT_US"
> 2. Copy ctakesumls.properties and ctakesumls.script from memdbtemplate to
>    location to put new UMLS DB
> 3. Run DictionaryCreator2
>    java -cp dictionarytool.jar;lib/* org.apache.ctakes.dictionarytool.DictionaryCreator2 -umls "\pathToUmls\META" -atui ./data/tiny/CtakesAnatTuis.txt -db jdbc:hsqldb:file:pathTonewDB\snorx2015 -tbl CUI_TERMS
> 4. Run CodeMapCreator
>    java -cp dictionarytool.jar;lib/* org.apache.ctakes.dictionarytool.CodeMapCreator -umls "\pathToUmls\META" -atui ./data/tiny/CtakesAnatTuis.txt -db jdbc:hsqldb:file:pathTonewDB\snorx2015 -tbl CUI_TERMS
> 5. Copy new DB files to new location and create a copy of cTakesHsql.xml
>    and update dictionary location
>
> Thanks,
> Brandon
>
>
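The commands in Brandon's steps 3 and 4 use `;` as the classpath separator, which is the Windows convention; on Linux and macOS the separator is `:`. A minimal sketch that assembles the same arguments with the separator for the current OS is shown below. It assumes dictionarytool.jar and its lib/ directory have already been built from the dictionary tool source (the thread notes the jar is not shipped), and every path is a placeholder. Both invocations reuse the same flags exactly as in the steps; whether CodeMapCreator needs all of them is not stated in the thread.

```java
import java.io.File;
import java.util.List;

/** Sketch: build the steps 3 and 4 command lines portably (paths are placeholders). */
public class DictionaryToolCommand {

    static List<String> command(String mainClass, String umlsMetaDir,
                                String anatTuiFile, String dbFile, String table) {
        // ';' on Windows, ':' elsewhere; the java launcher expands the "lib/*" wildcard itself.
        String classpath = "dictionarytool.jar" + File.pathSeparator + "lib" + File.separator + "*";
        return List.of(
                "java", "-cp", classpath,
                "org.apache.ctakes.dictionarytool." + mainClass,
                "-umls", umlsMetaDir,
                "-atui", anatTuiFile,
                "-db", "jdbc:hsqldb:file:" + dbFile,
                "-tbl", table);
    }

    public static void main(String[] args) {
        for (String mainClass : List.of("DictionaryCreator2", "CodeMapCreator")) {
            List<String> cmd = command(mainClass, "/path/to/umls/META",
                    "./data/tiny/CtakesAnatTuis.txt", "/path/to/newdb/snorx2015", "CUI_TERMS");
            System.out.println(String.join(" ", cmd));
            // Once the jar exists, the same list can be launched without a shell:
            // new ProcessBuilder(cmd).inheritIO().start().waitFor();
        }
    }
}
```

Passing the arguments as a list to `ProcessBuilder` avoids shell quoting of the `*` and of paths with spaces, since no shell is involved.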