Re: Lucene for UMLS2014
Thank you so much for your help.

Harpreet.

On Mon, Jul 21, 2014 at 6:28 PM, Finan, Sean sean.fi...@childrens.harvard.edu wrote:

Hi Harpreet,

If you are willing to use cTAKES 3.2, try the dictionary-lookup-fast module as a replacement for the default dictionary-lookup. That module has a new dictionary resource (hsql, not lucene) and slightly different methods for lookup and matching. In time trials it has been faster than the default module (hence the name). Accuracy depends upon the parameter settings, but in the tests performed so far the results are comparable or better.

The new dictionary is much leaner than the current default dictionary, small enough to port from the hsql cached version to an hsql in-memory version. Using the in-memory version makes dictionary lookup practically instantaneous (hundredths of a second).

Limited documentation is available in the module's doc/ directory. I will be on vacation for a week, but please don't hesitate to write if you have any questions.

Sean

From: Harpreet Khanduja [hsk5...@rit.edu]
Sent: Thursday, July 17, 2014 5:07 PM
To: dev@ctakes.apache.org
Subject: Lucene for UMLS2014

Hello,

I would be grateful if someone could help. I created a Lucene index for UMLS 2014, but only for the SNOMED vocabulary. I did this because I thought it would reduce the dictionary lookup time, but it is still almost the same. Is there any other way to improve the dictionary lookup time?

Thank you,
Harpreet
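Sean's point about porting a cached (on-disk) dictionary to an in-memory one can be illustrated in general terms. The sketch below is not the cTAKES or HSQLDB API: it uses SQLite from the Python standard library as a stand-in, and the table layout, term strings, and concept codes are made-up placeholders. It only shows the idea of copying a small on-disk lookup table into memory and querying both copies the same way.

```python
import os
import sqlite3
import tempfile


def build_disk_dictionary(path, terms):
    """Create a small on-disk term table (stand-in for a cached dictionary)."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE terms (text TEXT, code TEXT)")
    conn.execute("CREATE INDEX idx_text ON terms (text)")
    conn.executemany("INSERT INTO terms VALUES (?, ?)", terms)
    conn.commit()
    return conn


def load_into_memory(disk_conn):
    """Copy the whole on-disk database into a fresh in-memory database."""
    mem_conn = sqlite3.connect(":memory:")
    disk_conn.backup(mem_conn)
    return mem_conn


def lookup(conn, text):
    """Return the sorted codes stored for an exact term match."""
    rows = conn.execute("SELECT code FROM terms WHERE text = ?", (text,))
    return sorted(row[0] for row in rows)


# Placeholder rows; a real dictionary is far larger and these codes are invented.
SAMPLE_TERMS = [
    ("aspirin", "X0000001"),
    ("fever", "X0000002"),
    ("fever", "X0000003"),
]

tmp_dir = tempfile.mkdtemp()
disk = build_disk_dictionary(os.path.join(tmp_dir, "dictionary.db"), SAMPLE_TERMS)
memory = load_into_memory(disk)

print("disk  :", lookup(disk, "fever"))
print("memory:", lookup(memory, "fever"))
```

The in-memory copy only pays off when the dictionary is small enough to fit comfortably in RAM, which is the property Sean attributes to the new, leaner dictionary.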
RE: [VOTE] Release Apache cTAKES 3.2.0 (rc2)
Thanks James. I was planning on closing the vote today. In the meantime, does anyone know a quick way to clone/rename the wiki documentation for 3.2?

--Pei

-----Original Message-----
From: Masanz, James J. [mailto:masanz.ja...@mayo.edu]
Sent: Monday, July 21, 2014 4:25 PM
To: 'dev@ctakes.apache.org'
Subject: RE: [VOTE] Release Apache cTAKES 3.2.0 (rc2)

Here's the additional testing I've done:

I ran mvn test with 0 Failures and 0 Errors.
Ran the AggregateTemplateFiller.xml and received the same output (except for internal UIMA identifiers) with rc2 as I did with 3.1.1.

+1 to release

-----Original Message-----
From: Masanz, James J.
Sent: Wednesday, July 16, 2014 3:59 PM
To: 'dev@ctakes.apache.org'
Subject: RE: [VOTE] Release Apache cTAKES 3.2.0 (rc2)

FYI, so far I have done the following steps:

downloaded the source archive
compiled it using: maven compile
downloaded the separately available resources
set up classpath to include e.g. jars (from the bin distribution)
set ctakes.umlsuser and ctakes.umlspw env vars
ran runctakesCVD.bat
loaded AggregatePlaintextUMLSProcessor.xml
ran against some simple text
verified it did not throw an exception
verified some EventMention and EntityMention annotations were produced

I will do more testing tomorrow. Just giving a status update.

--James

-----Original Message-----
From: Miller, Timothy [mailto:timothy.mil...@childrens.harvard.edu]
Sent: Saturday, July 12, 2014 6:24 AM
To: dev@ctakes.apache.org
Subject: RE: [VOTE] Release Apache cTAKES 3.2.0 (rc2)

Agreed on that. I downloaded the new resources binary and was able to run my tests on the -bin version of the RC. +1 for making this the release.

Tim

From: Masanz, James J. [masanz.ja...@mayo.edu]
Sent: Friday, July 11, 2014 7:27 PM
To: 'dev@ctakes.apache.org'
Subject: RE: [VOTE] Release Apache cTAKES 3.2.0 (rc2)

I agree about keeping the thread open.

-- James

-----Original Message-----
From: Chen, Pei [mailto:pei.c...@childrens.harvard.edu]
Sent: Friday, July 11, 2014 4:28 PM
To: dev@ctakes.apache.org
Subject: RE: [VOTE] Release Apache cTAKES 3.2.0 (rc2)

Updated the lvg.properties file within ctakes-resources on sourceforge [1]. Since the Apache cTAKES artifacts didn't change, I would like to keep this VOTE thread open. Also renamed it to 3.2.0 (the two version numbers technically do not have to follow each other, but it is probably nice to keep them consistent for users, as James suggested).

[1] http://sourceforge.net/projects/ctakesresources/files/ctakes-resources-3.2.0.zip/download

-----Original Message-----
From: Masanz, James J. [mailto:masanz.ja...@mayo.edu]
Sent: Thursday, July 10, 2014 5:53 PM
To: 'dev@ctakes.apache.org'
Subject: RE: [VOTE] Release Apache cTAKES 3.2.0 (rc2)

Can you also give ctakesresources the number 3.2 or 3.2.0 instead of 3.1.3?

-----Original Message-----
From: Chen, Pei [mailto:pei.c...@childrens.harvard.edu]
Sent: Thursday, July 10, 2014 2:12 PM
To: dev@ctakes.apache.org
Subject: RE: [VOTE] Release Apache cTAKES 3.2.0 (rc2)

I think this is due to the fact that the default lvg.properties also exists in the ctakes-resources project, so if you download and replace, it will override the ctakes configured one. I think it's a bug, but it has probably always been there... I'll fix up ctakes-resources on sourceforge nevertheless, but it shouldn't require any changes to the release candidates.

-----Original Message-----
From: Masanz, James J. [mailto:masanz.ja...@mayo.edu]
Sent: Thursday, July 10, 2014 11:59 AM
To: 'dev@ctakes.apache.org'
Subject: RE: [VOTE] Release Apache cTAKES 3.2.0 (rc2)

Hi Tim,

When you say that it didn't seem to affect the run, were you comparing output to the last release or just checking if the data seemed OK at a glance?

-----Original Message-----
From: Miller, Timothy [mailto:timothy.mil...@childrens.harvard.edu]
Sent: Thursday, July 10, 2014 7:29 AM
To: dev@ctakes.apache.org
Subject: RE: [VOTE] Release Apache cTAKES 3.2.0 (rc2)

I was able to run the binary without issues this time. I also downloaded the resources from sourceforge and integrated them into the bin release and ran with the ctakes dictionary. I did get some weird exceptions thrown that didn't seem to affect the run -- looks like some hardcoded file paths in LVG? (See below)

Tim

Exception: java.io.FileNotFoundException: /export/home/lu/Development/LVG/lvg2008/data/misc/stopWords.data (No such file or directory)
** Error: problem of opening/reading stop words file: '/export/home/lu/Development/LVG/lvg2008/data/misc/stopWords.data'.
Exception: java.io.FileNotFoundException: /export/home/lu/Development/LVG/lvg2008/data/misc/nonInfoWords.data (No such file or directory)
** Error: problem of opening/reading non-Info words file:
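The exceptions Tim reports come from absolute paths that only exist on another machine, which Pei traces to an lvg.properties file overriding the configured one. A minimal sketch of how such values could be spotted before a run is shown below. It assumes a simple Java-style properties file with `key=value` lines; the key names and sample content are hypothetical and are not taken from the real lvg.properties.

```python
import os


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # or ! comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        if "=" not in line:
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def missing_absolute_paths(props):
    """Return entries whose value looks like a Unix absolute path that does not exist."""
    return {
        key: value
        for key, value in props.items()
        if value.startswith("/") and not os.path.exists(value)
    }


# Hypothetical sample; the key names and the path are illustrative only.
SAMPLE = """
# example properties
STOP_WORDS_FILE=/no/such/dir/example-lvg/data/misc/stopWords.data
MAX_RESULTS=100
"""

problems = missing_absolute_paths(parse_properties(SAMPLE))
for key, value in problems.items():
    print(f"{key} points at a missing path: {value}")
```

To check a real file, the same two functions could be applied to its contents read with `open(path).read()`; multi-line values and escaped characters, which the properties format allows, are not handled in this sketch.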
RE: [VOTE] Release Apache cTAKES 3.2.0 (rc2)
When I asked Troy that question for 3.1.1, he didn't know of a way, and I don't either, which is why I had the 3.1.1 page mostly just reference the 3.2 documentation.

-----Original Message-----
From: Chen, Pei [mailto:pei.c...@childrens.harvard.edu]
Sent: Tuesday, July 22, 2014 10:00 AM
To: dev@ctakes.apache.org
Subject: RE: [VOTE] Release Apache cTAKES 3.2.0 (rc2)

Thanks James. I was planning on closing the vote today. In the meantime, does anyone know a quick way to clone/rename the wiki documentation for 3.2?

--Pei
RE: [VOTE] Release Apache cTAKES 3.2.0 (rc2)
One page at a time. At least there's that.

Thanks
Troy

-----Original Message-----
From: Masanz, James J.
Sent: Tuesday, July 22, 2014 10:38 AM
To: 'dev@ctakes.apache.org'
Subject: RE: [VOTE] Release Apache cTAKES 3.2.0 (rc2)

When I asked Troy that question for 3.1.1, he didn't know of a way, and I don't either, which is why I had the 3.1.1 page mostly just reference the 3.2 documentation.
Re: Lucene for UMLS2014
Hello,

I am using cTAKES 3.1.1 in Eclipse and I have added my customizations to the project, but now I want to update it to 3.2 so that I can use ctakes-dictionary-lookup-fast. Is there any way to update the whole cTAKES project to 3.2 without my customizations getting removed? It would be a great help.

Thank you,
Harpreet

On Tue, Jul 22, 2014 at 10:53 AM, Harpreet Khanduja hsk5...@g.rit.edu wrote:

Thank you so much for your help.

Harpreet.
RE: Lucene for UMLS2014
Did you download the source and import it into Eclipse, or did you check out 3.1.1 from SVN? If you checked it out from SVN, did you check it out from trunk, or from the tag for 3.1.1?

-- James

-----Original Message-----
From: Harpreet Khanduja [mailto:hsk5...@rit.edu]
Sent: Tuesday, July 22, 2014 12:49 PM
To: dev@ctakes.apache.org
Subject: Re: Lucene for UMLS2014

Hello,

I am using cTAKES 3.1.1 in Eclipse and I have added my customizations to the project, but now I want to update it to 3.2 so that I can use ctakes-dictionary-lookup-fast. Is there any way to update the whole cTAKES project to 3.2 without my customizations getting removed? It would be a great help.

Thank you,
Harpreet
Re: Lucene for UMLS2014
Hello,

I checked out 3.1.1 from trunk SVN.

Thank you

On Tue, Jul 22, 2014 at 2:29 PM, Masanz, James J. masanz.ja...@mayo.edu wrote:

Did you download the source and import it into Eclipse, or did you check out 3.1.1 from SVN? If you checked it out from SVN, did you check it out from trunk, or from the tag for 3.1.1?

-- James
RE: Lucene for UMLS2014
I'm not an svn guru, but you can use Team-Update to get the latest of all the things you have not customized, plus SVN will tell you of the conflicts, and you can merge your customizations into the latest. I've done it when I haven't had many customizations to preserve.

To get the new dictionary lookup (sub)project, you might have to do something to get it imported, such as going into the SVN repository exploring view and using the Check out as Maven Project menu option on that (sub)project.

-----Original Message-----
From: Harpreet Khanduja [mailto:hsk5...@rit.edu]
Sent: Tuesday, July 22, 2014 2:32 PM
To: dev@ctakes.apache.org
Subject: Re: Lucene for UMLS2014

Hello,

I checked out 3.1.1 from trunk SVN.

Thank you
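Before updating a customized checkout as James describes, it can help to know which files carry local changes, since those are the ones that may conflict. The sketch below is one possible way to summarize the text printed by `svn status`. It assumes the usual layout of that output (status code in the first column, path after the eight-character status field), and the sample paths are hypothetical.

```python
def local_changes(status_output):
    """Group paths from `svn status` text by their first-column status code."""
    groups = {}
    for line in status_output.splitlines():
        if len(line) < 9 or line[0] == " ":
            continue  # skip blank lines and entries with only property changes
        groups.setdefault(line[0], []).append(line[8:].strip())
    return groups


# Hypothetical sample output: M = modified, ? = unversioned, C = conflicted.
SAMPLE_STATUS = "\n".join([
    f"{'M':<8}ctakes-example/src/Custom.java",
    f"{'?':<8}ctakes-example/notes.txt",
    f"{'C':<8}ctakes-example/pom.xml",
])

changes = local_changes(SAMPLE_STATUS)
print("modified:  ", changes.get("M", []))
print("conflicted:", changes.get("C", []))
```

Files listed under `M` are the customizations that an update will try to merge; anything under `C` after the update needs to be resolved by hand.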
Re: Lucene for UMLS2014
I will try to do the same.

Thank you,
Harpreet

On Tue, Jul 22, 2014 at 4:11 PM, Masanz, James J. masanz.ja...@mayo.edu wrote:

I'm not an svn guru, but you can use Team-Update to get the latest of all the things you have not customized, plus SVN will tell you of the conflicts, and you can merge your customizations into the latest. I've done it when I haven't had many customizations to preserve.

To get the new dictionary lookup (sub)project, you might have to do something to get it imported, such as going into the SVN repository exploring view and using the Check out as Maven Project menu option on that (sub)project.
RE: [VOTE] Release Apache cTAKES 3.2.0 (rc2)
There are currently no guides on the Confluence wiki for cTAKES 3.2.0. I was thinking of just cloning 3.1.1 (https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+3.1.1) and adding YTEX and any other new changes to it. I would be grateful for any help here.

-Original Message- From: John Green [mailto:john.travis.gr...@gmail.com] Sent: Tuesday, July 22, 2014 2:37 PM To: dev@ctakes.apache.org Subject: Re: [VOTE] Release Apache cTAKES 3.2.0 (rc2)
What exactly needs to be updated? I have not had the time (unfortunately) to help with this project very much because of the steep learning curve on the technology. I'm currently on some protected research time working with cTAKES as of this week and would be happy to help with some grunt work. JG

On Tue, Jul 22, 2014 at 11:39 AM, Bleeker, Troy C. bleeker.t...@mayo.edu wrote:
One page at a time. At least there's that. Thanks Troy

-Original Message- From: Masanz, James J. Sent: Tuesday, July 22, 2014 10:38 AM To: 'dev@ctakes.apache.org' Subject: RE: [VOTE] Release Apache cTAKES 3.2.0 (rc2)
When I asked Troy that question for 3.1.1, he didn't know of a way, and I don't either, which is why I had the 3.1.1 page mostly just reference the 3.2 documentation.

-Original Message- From: Chen, Pei [mailto:pei.c...@childrens.harvard.edu] Sent: Tuesday, July 22, 2014 10:00 AM To: dev@ctakes.apache.org Subject: RE: [VOTE] Release Apache cTAKES 3.2.0 (rc2)
Thanks James. I was planning on closing the vote today. In the meantime, does anyone know a quick way to clone/rename the wiki documentation for 3.2? --Pei

-Original Message- From: Masanz, James J. [mailto:masanz.ja...@mayo.edu] Sent: Monday, July 21, 2014 4:25 PM To: 'dev@ctakes.apache.org' Subject: RE: [VOTE] Release Apache cTAKES 3.2.0 (rc2)
Here's the additional testing I've done: I ran mvn test with 0 Failures and 0 Errors. Ran the AggregateTemplateFiller.xml and received the same output (except for internal UIMA identifiers) with rc2 as I did with 3.1.1. +1 to release

-Original Message- From: Masanz, James J. Sent: Wednesday, July 16, 2014 3:59 PM To: 'dev@ctakes.apache.org' Subject: RE: [VOTE] Release Apache cTAKES 3.2.0 (rc2)
FYI, so far I have done the following steps:
- downloaded the source archive
- compiled it using: mvn compile
- downloaded the separately available resources
- set up classpath to include e.g. jars (from the bin distribution)
- set ctakes.umlsuser and ctakes.umlspw env vars
- ran runctakesCVD.bat
- loaded AggregatePlaintextUMLSProcessor.xml
- ran against some simple text
- verified it did not throw an exception
- verified some EventMention and EntityMention annotations were produced
I will do more testing tomorrow. Just giving a status update. --James

-Original Message- From: Miller, Timothy [mailto:timothy.mil...@childrens.harvard.edu] Sent: Saturday, July 12, 2014 6:24 AM To: dev@ctakes.apache.org Subject: RE: [VOTE] Release Apache cTAKES 3.2.0 (rc2)
Agreed on that. I downloaded the new resources binary and was able to run my tests on the -bin version of the RC. +1 for making this the release. Tim

From: Masanz, James J. [masanz.ja...@mayo.edu] Sent: Friday, July 11, 2014 7:27 PM To: 'dev@ctakes.apache.org' Subject: RE: [VOTE] Release Apache cTAKES 3.2.0 (rc2)
I agree about keeping the thread open. -- James

-Original Message- From: Chen, Pei [mailto:pei.c...@childrens.harvard.edu] Sent: Friday, July 11, 2014 4:28 PM To: dev@ctakes.apache.org Subject: RE: [VOTE] Release Apache cTAKES 3.2.0 (rc2)
Updated the lvg.properties file within ctakes-resources on SourceForge [1]. Since the Apache cTAKES artifacts didn't change, I would like to keep this VOTE thread open. Also renamed it to 3.2.0 (even though they technically do not have to follow each other, it is probably nice to keep it consistent for users, as James suggested).
[1] http://sourceforge.net/projects/ctakesresources/files/ctakes-resources-3.2.0.zip/download

-Original Message- From: Masanz, James J. [mailto:masanz.ja...@mayo.edu] Sent: Thursday, July 10, 2014 5:53 PM To: 'dev@ctakes.apache.org' Subject: RE: [VOTE] Release Apache cTAKES 3.2.0 (rc2)
Can you also give ctakesresources the number 3.2 or 3.2.0 instead of 3.1.3?

-Original Message- From: Chen, Pei [mailto:pei.c...@childrens.harvard.edu] Sent: Thursday, July 10, 2014 2:12 PM To: dev@ctakes.apache.org Subject: RE: [VOTE] Release Apache cTAKES 3.2.0 (rc2)
I think this is due to the fact that the default lvg.properties also exists in the ctakes-resources project, so if you download and replace, it will override the
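
Editorial sketch: James's regression check in the thread above (the same AggregateTemplateFiller.xml output from rc2 and 3.1.1, apart from internal UIMA identifiers) could be automated roughly as follows. This is only a minimal illustration, not how the check was actually performed. It assumes the two outputs were saved as XMI text and that the differing identifiers appear as xmi:id="<digits>" attributes; the helper names are made up for this example, and attributes that refer to other annotations by id are not handled.

```python
import re

# Assumption: the "internal UIMA identifiers" show up as xmi:id="<digits>"
# attributes in the serialized output. References from one annotation to
# another by id are NOT normalized by this sketch.
_XMI_ID = re.compile(r'\bxmi:id="\d+"')


def normalize_ids(xmi_text: str) -> str:
    """Replace every xmi:id value with a fixed placeholder."""
    return _XMI_ID.sub('xmi:id="_"', xmi_text)


def same_ignoring_ids(old_output: str, new_output: str) -> bool:
    """Return True when two outputs differ only in their xmi:id values."""
    return normalize_ids(old_output) == normalize_ids(new_output)
```

To use it, read the two saved output files (for example one produced with 3.1.1 and one with rc2, under whatever names you saved them) into strings and pass them to same_ignoring_ids; a False result means something other than the identifiers changed.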
RE: [VOTE] Release Apache cTAKES 3.2.0 (rc2)
I'll play with it tonight or tomorrow night. JG