Pausing the CPE
Hi everybody, I've been running approximately 1 million notes through the CPE pipeline of the ctakes/ytex branch. I'm around 700k notes in, but the VM in which I'm running the pipeline has its resources fully allocated to the pipeline. I'm trying to run some data processing alongside it, which requires more heap space for the JVM. So my question is: does hitting the pause button on the CPE disrupt the pipeline? In other words, can I resume the pipeline after pausing it without losing any information? Clayton Turner
YTEX Exporting with Large Dataset
Hi everyone: So I'm doing some work with the ctakes-ytex branch of ctakes. So, in the past, I've been able to use the YTEX exporter (for going to sparsematrix) on datasets of about 300-400 notes. Now I have run my full dataset through the pipeline and want to set up the exporter. I'm getting a null pointer exception when using the big dataset, but no error occurs if I use my old, smaller dataset even though the export files are nearly identical. Are there file size limits that I am potentially hitting or is my error likely something else? Thanks, Clayton Turner
Re: YTEX Exporting with Large Dataset
Ah, so apparently YTEX does not like me using a join inside the InstanceClassQuery. This is inconvenient, but I can work around it. Clayton Turner Graduate Research Assistant at The College of Charleston Web Developer at Innovative Resource Management Email: caturn...@g.cofc.edu Phone: (843)-424-3784 Blog: claytonturner.blogspot.com -- “When scientifically investigating the natural world, the only thing worse than a blind believer is a seeing denier.” - Neil deGrasse Tyson
Re: YTEX Exporting with Large Dataset
Oops! Let me clarify in case someone else hits this or thinks I'm just messing up really badly. I had null values in my dataset and forgot to add a simple 'is not null' filter - that explains a 'null pointer exception' all right.
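For anyone who lands on this thread later, the fix amounts to filtering the nulls out in the query itself. A minimal sketch (the table and column names here are hypothetical - substitute whatever your own InstanceClassQuery selects from):

```sql
-- Hypothetical table/column names; adapt to your own InstanceClassQuery.
-- Rows with a NULL class label would otherwise trip the exporter.
select instance_id, class_label
from my_instance_table
where class_label is not null;
```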
Re: Change from SNOMEDCT to SNOMEDCT_US affecting v_snomed_fword_lookup
Awesome. This is just what I needed for the longest time. I'm having a slight issue, though. When running either the ytex pipeline or the ytex version of the AggregatePlaintextUMLSProcessor, I get an error during initialization. My DictionaryLookupAnnotator.xml raises an org.apache.uima.resource.ResourceInitializationException caused by: java.lang.ClassNotFoundException: edu.mayo.bmi.uima.lookup.ae.FirstTokenPermLookupInitializerImpl

I feel like I may have drifted away from what I need, though, because before this the CPE was complaining about a missing LookupDesc_SNOMED.xml file. I found a ytex version of this on a Google Code site somewhere and pasted it where the CPE was looking for it. Now this error is coming up. Could my problem be solved with just a re-run of the ant script (I was trying to avoid that, since it takes ages), or is it a different issue?

On Tue, Aug 19, 2014 at 12:58 PM, Tim O'Connell tim.oconn...@gmail.com wrote:

Hi John, I'm not sure what was going on with the @db.schema@ error, although I was getting it as well with my prior build of 3.1.2 - I assume that you've fixed something (thank you!) to make this go away. I rebuilt everything from scratch and it's working now.

I think one other thing I had to change was that after I had finished the install/build, the cTAKES version of LookupDesc_Db.xml doesn't work (in resources\org\apache\ctakes\dictionary\lookup) - I'm pretty sure I had to copy in an older version of the file from 3.1.1 to get the default cTAKES AggregatePlaintextUMLSProcessor pipeline working, although please double-check that, as my memory is a little foggy. But yes, here's what I have working since rebuilding:

1. ytex-pipeline.xml
2. ytex version of AggregatePlaintextUMLSProcessor.xml
3. cTAKES version of AggregatePlaintextUMLSProcessor.xml (with the LookupDesc_Db.xml file swapped as above)

I've even made modifications to the ytex version of LookupDesc_SNOMED.xml to get it tagging Disease Disorders, along with database modifications to have it store these entries as well, which is working great. Literally, everything is working perfectly now. Still so much for me to learn! Let me know if you need any more details. All the best, Tim

On Tue, Aug 19, 2014 at 4:31 AM, John Green john.travis.gr...@gmail.com wrote:

I have not had time to implement this - to clarify, out of curiosity, does this clear up the @db.schema@ error, Tim? And did you successfully run ytex with the ctakes dictionary-lookup? JG — Sent from Mailbox for iPhone

On Sat, Aug 16, 2014 at 2:53 AM, Tim O'Connell tim.oconn...@gmail.com wrote:

Hi folks, I was having an issue with the current build (from svn) of ctakes/ytex not identifying any annotations, as have some folks on this board. I traced it to the fact that the UMLS database has, at some point in the relatively recent past, changed the SAB tag in the MRCONSO table for SNOMED terms from SNOMEDCT to SNOMEDCT_US. I happened to have a newer version of UMLS that uses SNOMEDCT_US. Thus, when the install script tried to create the v_snomed_fword_lookup table, it wasn't finding any of the SNOMEDCT terms, so nothing was getting annotated.

The ytex install script was just looking for things in MRCONSO with the SNOMEDCT SAB tag when it created the ytex lookup table - so, by changing this to SNOMEDCT_US in the file CTAKES_HOME/bin/ctakes-ytex/scripts/data/mysql/umls/insert_view_template.sql, it now works (for MySQL users) to find the annotations. You can just re-run the ytex setup script, but that takes hours - instead, I just deleted all the data from the v_snomed_fword_lookup table and ran the SQL command to repopulate the table, and it worked fine. Here's the code; n.b. my schema name for my UMLS database is 'umls' - change the code below if yours is different.

delete from v_snomed_fword_lookup;

insert into v_snomed_fword_lookup (cui, tui, fword, fstem, tok_str, stem_str)
select mrc.cui, t.tui, c.fword, c.fstem, c.tok_str, c.stem_str
from umls_aui_fword c
inner join umls.MRCONSO mrc
    on c.aui = mrc.aui
    and mrc.SAB in ('SNOMEDCT_US', 'RXNORM')
inner join (
    select cui, min(tui) tui
    from umls.MRSTY sty
    where sty.tui in (
        'T019', 'T020', 'T037', 'T046', 'T047', 'T048', 'T049', 'T050',
        'T190', 'T191', 'T033', 'T184', 'T017', 'T029', 'T023', 'T030',
        'T031', 'T022', 'T025', 'T026', 'T018', 'T021', 'T024', 'T116',
        'T195', 'T123', 'T122', 'T118', 'T103', 'T120', 'T104', 'T200',
        'T111', 'T196', 'T126', 'T131', 'T125', 'T129', 'T130', 'T197',
        'T119', 'T124', 'T114', 'T109', 'T115', 'T121', 'T192', 'T110',
        'T127', 'T060', 'T065', 'T058', 'T059', 'T063', 'T062', 'T061',
        'T074', 'T075', 'T059'
    )
    group by cui
) t on t.cui = mrc.cui;

Hope it helps - cheers, Tim
Re: Change from SNOMEDCT to SNOMEDCT_US affecting v_snomed_fword_lookup
Ah, I just switched to the ytex branch and all is good now. The SNOMEDCT_US issue has been plaguing me for weeks now, so thanks a million for that.
Re: Change from SNOMEDCT to SNOMEDCT_US affecting v_snomed_fword_lookup
It didn't fix the @db.schema@ - I just went in and manually changed it whenever the CPE complained. I assume that's supposed to be read from ytex.properties, but mine was set and it didn't resolve the @db.schema@ issue.

On Thu, Aug 21, 2014 at 5:00 PM, John Green john.travis.gr...@gmail.com wrote:

Clayton - this indeed did fix the @db.schema@ for you? I'm gonna try to reproduce (haven't had time yet), then I'll close the Jira ticket out. JG — Sent from Mailbox for iPhone
Re: v_snomed_fword_lookup view
Okay, I believe I have the cTAKES dictionary-lookup-fast working now. Something I'm curious about, though, is how you extract the data in order to conduct analysis. In the past, I've been using the SparseDataExporterImpl from ytex to create a .arff file for use in Weka, but the ctakes pipeline I'm using doesn't seem to be compatible with the ytex exporting, as I'm not getting any CUIs in my .arff file. I'm using the aggregate plain text UMLS processor analysis engine from ctakes and then using the dbconsumer analysis engine from ytex (for storing into the database with regard to analysis batch). Any tips for exporting, or some simple issue I'm missing? Thanks, Clayton

On Mon, Aug 11, 2014 at 2:09 PM, Harpreet Khanduja hsk5...@rit.edu wrote:

Yes, absolutely, and no problem at all. Regards, Harpreet

On Mon, Aug 11, 2014 at 1:16 PM, Finan, Sean sean.fi...@childrens.harvard.edu wrote:

Thanks Harpreet, that is definitely necessary to build! Those lines should already be in the pom, but commented out. I think that some version/branching issues may have arisen at some point wrt this module ... If somebody beats me to it, then cheers; otherwise I will try to check it out tonight and get all the bits in place. Sean

-Original Message- From: Harpreet Khanduja [mailto:hsk5...@rit.edu] Sent: Monday, August 11, 2014 1:12 PM To: dev@ctakes.apache.org Subject: Re: v_snomed_fword_lookup view

Hello Clayton, I do not know about ytex, but I did switch from dictionary-lookup to dictionary-lookup-fast. I updated my ctakes-dictionary-lookup-fast project using Maven. I think I used Team > Update and switched to the latest revision available, then I downloaded the new 3.2 resources for UMLS, added those resources to my ctakes-dictionary-lookup-fast resources folder, and also added them to the classpath in ctakes-clinical-pipeline.

Then I changed the pom.xml file that belongs to the whole ctakes project and added these two dependencies to the file:

<dependency>
    <groupId>org.apache.ctakes</groupId>
    <artifactId>ctakes-dictionary-lookup-res</artifactId>
    <version>${ctakes.version}</version>
</dependency>
<dependency>
    <groupId>org.apache.ctakes</groupId>
    <artifactId>ctakes-dictionary-lookup-fast</artifactId>
    <version>${ctakes.version}</version>
</dependency>

After this, I also added the dependency

<dependency>
    <groupId>org.apache.ctakes</groupId>
    <artifactId>ctakes-dictionary-lookup-fast</artifactId>
</dependency>

to the pom.xml of ctakes-clinical-pipeline, and then added the resources folder in ctakes-clinical-pipeline using the build path configuration, under the add class option. After this it should work. Regards, Harpreet

On Mon, Aug 11, 2014 at 12:44 PM, Clayton Turner caturn...@g.cofc.edu wrote:

I still get the same error with the ctakes 3.2 branch. Any suggestions?

On Mon, Aug 11, 2014 at 12:06 PM, Clayton Turner caturn...@g.cofc.edu wrote:

I'm going to do a clean install through the repo rather than the binaries and see if that fixes my issue, because I think I just read a past post saying the lookup2 folders exist there.

On Mon, Aug 11, 2014 at 11:52 AM, Clayton Turner caturn...@g.cofc.edu wrote:

When navigating to ctakes-dictionary-lookup-fast\desc\analysis_engine, there are two files, presumably analysis engines: SnomedLookupAnnotator.xml and SnomedOvLookupAnnotator.xml. If I pick either, I put in my UMLS information but receive an error when trying to run the CPE:

Initialization of CAS Processor with name SnomedOvLookupAnnotator failed. Caused by: org.apache.uima.resource.ResourceConfigurationException: Initialization of CAS processor with name SnomedOvLookupAnnotator failed. Caused by: org.apache.uima.resource.ResourceInitializationException: Error initializing org.apache.uima.resource.impl.DataResource_impl from descriptor file:..SnomedLookupAnnotator.xml Caused by: org.apache.uima.resource.ResourceInitializationException: Could not access the resource data at file:org\apache\ctakes\dictionary\lookup2\Snomed2011ab_ctakesTui\cTakesSnomed.xml

Now, I don't even have a lookup2 folder and, consequently, the Tui folder and cTakesSnomed.xml file. This seems to be the problem, but I'm not sure where these files are supposed to be grabbed from.

On Mon, Aug 11, 2014 at 11:47 AM, Clayton Turner caturn...@g.cofc.edu wrote:

Hi again: How exactly do you switch to using the cTAKES dictionary-lookup-fast? Do I need to go in and alter XML files, or is it as simple as adding a certain item to the list of analysis engines?
v_snomed_fword_lookup view
Hi Everyone: I have a question about how the v_snomed_fword_lookup view works when running the CPE. My understanding is that the view is comprised of the ytex umls_aui_fword table, the umls.MRCONSO table, and bits and pieces from other UMLS tables. I feel like this is not completely correct, or my idea of how the join that creates the view works is off.

For example, let's say I want the CPE to find 'malar' (e.g. malar rash) as a concept in the annotations. It never happens after running my CPE descriptor, and I cannot find it in my v_snomed_fword_lookup view.

select count(*) from umls_aui_fword where fword='malar';  -- yields 34 results
select count(*) from umls.mrconso where str='malar';      -- yields 3 results

So clearly these two tables know what the CUI and context(s) are for 'malar'. Yet, whenever I run a gold standard set of notes through the CPE, 'malar' is constantly flagged as just a word token and the concept is never grabbed. This recurs for lots of other concepts as well; I just wanted to use one example to illustrate my issue.

Some troubleshooting I already went through: 1) Reinstalled ytex and the UMLS database objects. 2) Reinstalled a second time after redownloading UMLS through MetamorphoSys, ensuring that the SNOMED vocabularies were included (I also checked file sizes and noticed a big difference, so I know those vocabularies ARE included).

Anyone got any ideas as to what the issue could be? Thank you, Clayton Turner
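One diagnostic sketch that may help narrow this down (this assumes the 'umls' schema name used elsewhere in these threads; adjust to your install): check whether the word survives the join that builds the view, and if it doesn't, look at which SAB values its MRCONSO rows actually carry, since the install script filters on SAB:

```sql
-- Is 'malar' in the lookup view at all?
select count(*) from v_snomed_fword_lookup where fword = 'malar';

-- If it is missing, see which source vocabularies (SAB) its rows come from;
-- the view only keeps rows whose SAB matches the install script's filter
-- (e.g. SNOMEDCT vs. SNOMEDCT_US).
select distinct mrc.SAB
from umls_aui_fword c
inner join umls.MRCONSO mrc on c.aui = mrc.aui
where c.fword = 'malar';
```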
Exporting YTEX Pipeline
Hi, I'm trying to export the data I get from running the pipeline through the Collection Processing Engine. I set up the pipeline with a directory where all the XML is output to, but I am having issues at this point. I've tried using the built-in Exporter from the Data Mining section on this page https://cwiki.apache.org/confluence/display/CTAKES/User%27s+Guide but those notes are out of date. Even altering directories to match the files still gives me errors about not being able to find the ExporterImpl class. The class version of this file only exists outside of the target directory for the ctakes snapshot, and attempting to use it still fails.

I then ventured here: https://code.google.com/p/ytex/source/browse/#svn%2Ftrunk%2Fworkspace%2Fexamples%2Ffracture The files here match up to the data mining section from the previous link - so I created my export.xml file and changed everything that needed to be changed for my example (I even tried to run bone fracture), but I cannot get data exported, no matter what I do. Is there a way to use some newer implementation of the SparseDataExporterImpl class, or is there an alternative for extracting data for use with Weka? I've messaged about this in the past, but I don't believe I was thorough enough about my issues. Thanks in advance, Clayton
Re: Exporting YTEX Pipeline
Awesome!! It worked! The only things I had to change (since I'm on Windows) were flipping the slashes when necessary and removing the first slash when specifying -Dlog4j.configuration=file:/... Thank you so much for putting up with my issues. -Clayton

On Wed, Jul 30, 2014 at 2:48 PM, vijay garla vnga...@gmail.com wrote:

Can you try this: copy https://code.google.com/p/ytex/source/browse/trunk/workspace/examples/fracture/cui/export.template.xml to CTAKES_HOME\desc\ctakes-ytex\fracture\cui.xml and replace %DB_SCHEMA% with your database schema name (the value of db.schema in your ytex.properties file). Then from a command prompt, execute the following commands:

cd CTAKES_HOME
bin\setenv.bat
java -cp %CLASSPATH% -Dlog4j.configuration=file:/%CTAKES_HOME%/config/log4j.xml -Xmx256m org.apache.ctakes.ytex.kernel.SparseDataExporterImpl -prop desc\ctakes-ytex\fracture\cui.xml -type weka

Tell me if you run into any issues. I will add this to the ctakes confluence doc. Best, VJ
cTAKES CPE MySQL Exception
Hi, everyone. First off, I'd like to say awesome, and thank you for the cTAKES 3.2 release and information. I've been following those pages, and they've been really helpful for moving along in my own progress. Really cool stuff.

So I'm using the Collection Processing Engine (with ytex and UMLS), and I'm trying to process ~1 million notes (as opposed to the roughly 30 in the given demo). I've tried this the past two days, and when I come back in to check the progress, I see that I've received an error about 14,000 notes into the process:

org.apache.uima.analysis_engine.AnalysisEngineProcessException: Annotator processing failed. Caused by: org.springframework.transaction.CannotCreateTransactionException: Could not open Hibernate Session for transaction; nested exception is com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: The last packet successfully received from the server was 53,888,249 milliseconds ago. The last packet sent successfully to the server was 53,888,249 milliseconds ago, which is longer than the server configured value of 'wait_timeout'. You should consider either expiring and/or testing connection validity before use in your application, increasing the server configured values for client timeouts, or using the Connector/J connection property 'autoReconnect=true' to avoid this problem.

In my own debugging, I have ensured that autoReconnect=true was on (it always has been). I looked at my CPE output in the command prompt and noticed a PacketTooBigException, so I increased the maximum packet size to 1 GB (the maximum for MySQL). I increased the time allowed for timeouts. I'm really unsure of what to do here. Should I find a way to check whether a problematic note is giving me issues (though I can't understand how one note would make a packet too large)? Should I try some horizontal sharding and break the problem into smaller chunks (though I would think this program could handle large datasets, since it's using a query language)?

I'm just at a loss with this error, especially since it takes so long to actually spit the error out at me. Thanks in advance, everyone, Clayton
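For what it's worth, the server-side knobs that error message points at can be inspected and raised from any MySQL client. The values below are illustrative examples, not recommendations; new connections pick up changed GLOBAL settings:

```sql
-- Check the current server-side limits.
show variables like 'wait_timeout';
show variables like 'max_allowed_packet';

-- Raise them for long-running CPE batches (example values).
set global wait_timeout = 86400;              -- 24 hours
set global max_allowed_packet = 1073741824;   -- 1 GB, the MySQL maximum
```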
ytex examples
I've been following the usage component guide for ctakes 3.2 and ytex, but I'm having an issue. I get to the point where I want to export my data as a bag of words (or cuis), but the documentation on the wiki seems to be really out of date when it comes to the exporting for data mining step. The YTEX home directory doesn't seem to actually be a thing and there's no fracture example directory with a cui/word folder for the examples anymore. Is there an updated version of this documentation in the works or can someone just give me pointers on how to execute the command over the command prompt (Windows)? Thank you, Clayton
Re: ytex examples
Hi, alright, I planned on using Weka, but it might not be a bad idea to just jump in with either R or Python. I'll check out that link. Thanks!

On Thu, Jul 24, 2014 at 2:11 PM, vijay garla vnga...@gmail.com wrote:

Hi Clayton, I haven't gotten around to upgrading the docs - look here for examples: https://code.google.com/p/ytex/source/browse/#svn%2Ftrunk%2Fworkspace%2Fexamples%2Ffracture If you are using R/Matlab/Python, it is easy to generate a sparse matrix directly via database queries; I can give you a few examples. Best, VJ
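As a rough sketch of the database-query route VJ mentions: something along these lines pulls document-by-CUI counts (a bag-of-CUIs sparse matrix in long form) straight out of the ytex annotation tables. The table and column names here are from memory of the ytex schema and may not match your version - verify against your own database before relying on it:

```sql
-- Hypothetical sketch: term frequency of each CUI per document
-- for one analysis batch (table/column names unverified).
select d.document_id, o.code as cui, count(*) as tf
from document d
inner join anno_base b on b.document_id = d.document_id
inner join anno_ontology_concept o on o.anno_base_id = b.anno_base_id
where d.analysis_batch = 'my_batch'
group by d.document_id, o.code;
```

Loading that result set into R or Python and pivoting it to a document-by-CUI matrix is then a one-liner in either language.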
Re: cTAKES 3.2 Analysis Batch Issue
I don't see a log file when running the CPE. When running the CVD I have access to a log file within the GUI, but that does not seem to be present here. Is there a specific place where this log file is saved?

On Tue, Jul 8, 2014 at 3:14 AM, vijay garla vnga...@gmail.com wrote:

Hi Clayton, The screenshot is not coming through via the newsgroup emails. Can you attach the log file? vj

On Mon, Jul 7, 2014 at 5:38 PM, Clayton Turner caturn...@g.cofc.edu wrote:

Any update on this issue? I have this problem even if I don't use the ytex version of the aggregate text processor (UMLS-independent as well).

On Thu, Jul 3, 2014 at 2:33 PM, Clayton Turner caturn...@g.cofc.edu wrote:

Yes, I am running the fracture_demo.xml CPE. There is no option for the analysis batch (that's the main issue). I also get no response in my MySQL database (UMLS installed - not sure if that can be related). Here's a screenshot of my CPE (using ytex): [image: Inline image 1]

On Wed, Jul 2, 2014 at 10:48 PM, vijay garla vnga...@gmail.com wrote:

Hi Clayton, I assume you are running the fracture_demo.xml CPE - is that correct? The CPE GUI should give you the option to set the analysis batch (see attached screenshot). That being said, the analysis_batch is not required (it will default to the current date). Can you attach the log file? -vj [image: Inline image 1]

On Wed, Jul 2, 2014 at 12:22 PM, Clayton Turner caturn...@g.cofc.edu wrote:

Hi, I'm a relatively new user of cTAKES. I recently cloned cTAKES from the repository and I am using UMLS installed in my MySQL database. I have recently noticed an issue, though. When running the bone fracture demo in the CPE, I use the DBCollectionReader and Analysis Engine from the ctakes-ytex-uima directory within my CTAKES_HOME. I can get this to run successfully, but I am not able to specify an analysis batch in the CPE. Because of this, my ytex database is not being updated with the results of the CPE run (in the v_document tables). Any ideas why the analysis batch field is missing?

Side question: any update on when cTAKES 3.2 will be officially released? I see we're past the expected release date and was curious how long it will be until it officially comes out.

Thanks a lot,

--
Clayton Turner
email: caturn...@g.cofc.edu
phone: (843)-424-3784
web: claytonturner.blogspot.com
“When scientifically investigating the natural world, the only thing worse than a blind believer is a seeing denier.” - Neil deGrasse Tyson
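On the log-file question above: cTAKES logs through log4j, so where the log lands depends on the log4j configuration in the distribution, and is typically relative to the directory the CPE was launched from. When the location is unclear, a quick way to find it is to search the install tree for the most recently modified `.log` file. The sketch below is a generic helper, not part of cTAKES; the search root is a placeholder you would replace with your CTAKES_HOME.

```python
import os

def newest_log(root):
    """Return the path of the most recently modified *.log file under
    root, or None if no log files are found."""
    candidates = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if name.endswith(".log"):
                path = os.path.join(dirpath, name)
                candidates.append((os.path.getmtime(path), path))
    return max(candidates)[1] if candidates else None

# Search the current directory (replace "." with your CTAKES_HOME,
# or the directory you launched the CPE from).
print(newest_log("."))
```

If nothing turns up, the log4j config may be writing to the console only, in which case redirecting the CPE's stdout/stderr to a file when launching it captures the same information.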
Re: cTAKES 3.2 Analysis Batch Issue
Any update on this issue? I have this problem even if I don't use the ytex version of the aggregate text processor (UMLS-independent as well).

On Thu, Jul 3, 2014 at 2:33 PM, Clayton Turner caturn...@g.cofc.edu wrote:

Yes, I am running the fracture_demo.xml CPE. There is no option for the analysis batch (that's the main issue). I also get no response in my MySQL database (UMLS installed - not sure if that can be related). Here's a screenshot of my CPE (using ytex): [image: Inline image 1]

On Wed, Jul 2, 2014 at 10:48 PM, vijay garla vnga...@gmail.com wrote:

Hi Clayton, I assume you are running the fracture_demo.xml CPE - is that correct? The CPE GUI should give you the option to set the analysis batch (see attached screenshot). That being said, the analysis_batch is not required (it will default to the current date). Can you attach the log file? -vj [image: Inline image 1]

On Wed, Jul 2, 2014 at 12:22 PM, Clayton Turner caturn...@g.cofc.edu wrote:

Hi, I'm a relatively new user of cTAKES. I recently cloned cTAKES from the repository and I am using UMLS installed in my MySQL database. I have recently noticed an issue, though. When running the bone fracture demo in the CPE, I use the DBCollectionReader and Analysis Engine from the ctakes-ytex-uima directory within my CTAKES_HOME. I can get this to run successfully, but I am not able to specify an analysis batch in the CPE. Because of this, my ytex database is not being updated with the results of the CPE run (in the v_document tables). Any ideas why the analysis batch field is missing?

Side question: any update on when cTAKES 3.2 will be officially released? I see we're past the expected release date and was curious how long it will be until it officially comes out.

Thanks a lot,

--
Clayton Turner
email: caturn...@g.cofc.edu
phone: (843)-424-3784
web: claytonturner.blogspot.com
“When scientifically investigating the natural world, the only thing worse than a blind believer is a seeing denier.” - Neil deGrasse Tyson
Re: cTAKES 3.2 Analysis Batch Issue
Yes, I am running the fracture_demo.xml CPE. There is no option for the analysis batch (that's the main issue). I also get no response in my MySQL database (UMLS installed - not sure if that can be related). Here's a screenshot of my CPE (using ytex): [image: Inline image 1]

On Wed, Jul 2, 2014 at 10:48 PM, vijay garla vnga...@gmail.com wrote:

Hi Clayton, I assume you are running the fracture_demo.xml CPE - is that correct? The CPE GUI should give you the option to set the analysis batch (see attached screenshot). That being said, the analysis_batch is not required (it will default to the current date). Can you attach the log file? -vj [image: Inline image 1]

On Wed, Jul 2, 2014 at 12:22 PM, Clayton Turner caturn...@g.cofc.edu wrote:

Hi, I'm a relatively new user of cTAKES. I recently cloned cTAKES from the repository and I am using UMLS installed in my MySQL database. I have recently noticed an issue, though. When running the bone fracture demo in the CPE, I use the DBCollectionReader and Analysis Engine from the ctakes-ytex-uima directory within my CTAKES_HOME. I can get this to run successfully, but I am not able to specify an analysis batch in the CPE. Because of this, my ytex database is not being updated with the results of the CPE run (in the v_document tables). Any ideas why the analysis batch field is missing?

Side question: any update on when cTAKES 3.2 will be officially released? I see we're past the expected release date and was curious how long it will be until it officially comes out.

Thanks a lot,

--
Clayton Turner
email: caturn...@g.cofc.edu
phone: (843)-424-3784
web: claytonturner.blogspot.com
“When scientifically investigating the natural world, the only thing worse than a blind believer is a seeing denier.” - Neil deGrasse Tyson
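A practical consequence of VJ's note that `analysis_batch` "will default to the current date": even when the field cannot be set in the CPE GUI, the run's rows should still be findable in the document tables by filtering on that default. The sketch below illustrates the idea only — `v_document` and `analysis_batch` are names taken from this thread, but the rest of the schema, the exact default-batch format, and the in-memory SQLite stand-in for MySQL are all assumptions for the sake of a runnable example.

```python
import sqlite3
from datetime import date

# Assumed default batch id when none is set in the CPE GUI: the run
# date. The exact format ytex uses may differ; adjust to match what
# you actually see in your tables.
default_batch = date.today().strftime("%Y-%m-%d")

# In-memory stand-in for the ytex MySQL database. v_document and
# analysis_batch come from the thread; the rest is illustrative.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE v_document (document_id INTEGER, analysis_batch TEXT)"
)
conn.executemany(
    "INSERT INTO v_document VALUES (?, ?)",
    [(1, default_batch), (2, default_batch), (3, "fracture_demo")],
)

# With no batch specified, look for rows stamped with today's date.
n = conn.execute(
    "SELECT COUNT(*) FROM v_document WHERE analysis_batch = ?",
    (default_batch,),
).fetchone()[0]
print(n)  # 2
```

If a query like this against the real database returns zero rows for both the expected batch name and the current date, the documents were likely never written, which points back at the pipeline configuration rather than the batch field.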