Hi James,

Below are the CAS consumer details:

CAS consumer: FileWriterCasConsumer
Descriptor in collection reader: FilesInDirectoryCollectionReader.xml

The contents of AggregatePlaintextFastUMLSProcessor have not been changed, and I have always used the CPE GUI with the File->Clear all option. I have not checked the hard drive error logs yet, but I will look at them as one of the possibilities.

Could you please let me know approximately how long it took you to process files of ~2 MB (or share any other benchmarks for other file sizes you used earlier)?

Regards,
Harish.

From: James Masanz [mailto:[email protected]]
Sent: Thursday, December 14, 2017 1:21 PM
To: [email protected]
Subject: Re: Slowness in processing files [EXTERNAL]

Sorry, I meant: verify that the contents of the XML file for the fast dictionary lookup (AggregatePlaintextFastUMLSProcessor) haven't changed.

On Thu, Dec 14, 2017 at 1:20 PM, James Masanz <[email protected]> wrote:

Harish, with the AggregatePlaintextFastUMLSProcessor, it should not be taking that long. It sounds like either something outside of cTAKES is having an issue (a hard drive starting to fail) or you are accidentally running AggregatePlaintextUMLSProcessor.

I've had issues with the CPE GUI not always behaving well for me. I suggest that when you run the CPE GUI, you use File->Clear all and re-enter/re-select what you want. If that doesn't help, verify that the contents of AggregatePlaintextUMLSProcessor haven't been changed. If none of that helps, as a last resort, I'd look into the hard drive error logs.

Also, are you using a CAS consumer? If so, which one?

On Thu, Dec 14, 2017 at 12:04 PM, <[email protected]> wrote:

If a 2 KB file takes about 11 seconds, then a 2 MB file is expected to take ~11 × 1000 = 11,000 seconds, which is about 3 hours (assuming the runtime is linear in the file size). I do not know if the pipeline can be sped up. I would suggest splitting the file into smaller chunks and running the pipeline on each chunk in parallel.
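Jonas's suggestion (split the file, then process the chunks in parallel) can be sketched in Java, the language cTAKES and UIMA are written in. This is a minimal, hypothetical sketch and not part of cTAKES. `ChunkedRunner`, `splitIntoChunks` and `runInParallel` are illustrative names. Chunks are cut only at line breaks, which assumes that a line break never falls inside a sentence you care about; this may not hold for every note. `pipeline` stands in for whatever runs cTAKES on one chunk of text and is assumed to be safe to call concurrently (for example, one analysis engine per thread).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

/** Hypothetical helper, not part of cTAKES: chunk a large note and process chunks in parallel. */
class ChunkedRunner {

    /**
     * Splits text into chunks of at most maxChars characters, cutting only at
     * line breaks. A single line longer than maxChars becomes its own chunk.
     */
    static List<String> splitIntoChunks(String text, int maxChars) {
        if (maxChars <= 0) {
            throw new IllegalArgumentException("maxChars must be positive");
        }
        List<String> chunks = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        // The zero-width lookbehind keeps each '\n' at the end of its line.
        for (String line : text.split("(?<=\n)")) {
            if (current.length() > 0 && current.length() + line.length() > maxChars) {
                chunks.add(current.toString());
                current.setLength(0);
            }
            current.append(line);
        }
        if (current.length() > 0) {
            chunks.add(current.toString());
        }
        return chunks;
    }

    /**
     * Runs pipeline on each chunk using a fixed-size thread pool and returns
     * the results in chunk order. pipeline must be safe to call concurrently.
     */
    static <R> List<R> runInParallel(List<String> chunks, Function<String, R> pipeline, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<R>> futures = new ArrayList<>();
            for (String chunk : chunks) {
                futures.add(pool.submit(() -> pipeline.apply(chunk)));
            }
            List<R> results = new ArrayList<>();
            for (Future<R> future : futures) {
                try {
                    results.add(future.get()); // waits for this chunk to finish
                } catch (ExecutionException e) {
                    throw new IllegalStateException("Processing a chunk failed", e.getCause());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("Interrupted while waiting for a chunk", e);
                }
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

Annotation offsets produced for a chunk are relative to that chunk. They would need to be shifted by the chunk's starting offset before results from different chunks are merged. Cutting at line breaks instead of at a fixed character count keeps most sentences intact, at the cost of uneven chunk sizes.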
Jonas S

On 14.12.17 at 17:48, Yadav, Harish wrote:

Hi Timothy,

Sorry for the typo: I meant that I ran the AE AggregatePlaintextFastUMLSProcessor with -Xms6g -Xmx6g, but it still takes a long time (more than ~2 hours) for a single 2 MB file. It runs fine for a 2 KB file.

Regards,
Harish.

-----Original Message-----
From: Miller, Timothy [mailto:[email protected]]
Sent: Thursday, December 14, 2017 11:22 AM
To: [email protected]
Subject: Re: Slowness in processing files [EXTERNAL]

You missed the most important part of my message: do not try to use AggregatePlainTextProcessor, it is just slow. Use AggregatePlaintextFastUMLSProcessor.

Tim

On Thu, 2017-12-14 at 16:15 +0000, Yadav, Harish wrote:

Hi Timothy,

I fixed the password issues and ran the AE AggregatePlainTextProcessor with -Xms6g -Xmx6g, but it still takes a long time (more than ~2 hours) for a single 2 MB file. I have checked the memory consumption of the process and it never goes above 4.5 GB, so I am not sure it is a memory issue. AggregatePlainTextProcessor processes the 2 KB file in ~11 seconds, but most of our files are several MB, so more than 2 hours per file is not feasible. Could you please suggest something that may improve the performance?

Below are the logs for processing the 2 MB file with AggregatePlainTextProcessor:

Logs:

C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0>java -cp "C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\desc\;C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\resources\;C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\lib\*" -Dlog4j.configuration=file:\C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\config\log4j.xml -Xms6g -Xmx6g org.apache.uima.tools.cpm.CpmFrame
Dec 14, 2017 9:40:25 AM java.util.prefs.WindowsPreferences <init>
WARNING: Could not open/create prefs root node Software\JavaSoft\Prefs at root 0x80000002.
Windows RegCreateKeyEx(...) returned error code 5.
log4j: reset attribute= "false".
log4j: Threshold ="null".
log4j: Retreiving an instance of org.apache.log4j.Logger.
log4j: Setting [ProgressAppender] additivity to [false].
log4j: Level value for ProgressAppender is [INFO].
log4j: ProgressAppender level set to INFO
log4j: Class name: [org.apache.log4j.ConsoleAppender]
log4j: Parsing layout of class: "org.apache.log4j.PatternLayout"
log4j: Setting property [conversionPattern] to [%m].
log4j: Adding appender named [noEolAppender] to category [ProgressAppender].
log4j: Retreiving an instance of org.apache.log4j.Logger.
log4j: Setting [ProgressDone] additivity to [false].
log4j: Level value for ProgressDone is [INFO].
log4j: ProgressDone level set to INFO
log4j: Class name: [org.apache.log4j.ConsoleAppender]
log4j: Parsing layout of class: "org.apache.log4j.PatternLayout"
log4j: Setting property [conversionPattern] to [%m%n].
log4j: Adding appender named [eolAppender] to category [ProgressDone].
log4j: Level value for root is [INFO].
log4j: root level set to INFO
log4j: Class name: [org.apache.log4j.ConsoleAppender]
log4j: Parsing layout of class: "org.apache.log4j.PatternLayout"
log4j: Setting property [conversionPattern] to [%d{dd MMM yyyy HH:mm:ss} %5p %c{1} - %m%n].
log4j: Adding appender named [consoleAppender] to category [root].
14 Dec 2017 09:42:09 INFO Chunker - Chunker model file: org/apache/ctakes/chunker/models/chunker-model.zip
14 Dec 2017 09:42:10 INFO TokenizerAnnotatorPTB - Initializing org.apache.ctakes.core.ae.TokenizerAnnotatorPTB
14 Dec 2017 09:42:10 INFO ContextDependentTokenizerAnnotator - Finite state machines loaded.
14 Dec 2017 09:42:10 INFO AbstractJCasTermAnnotator - Using dictionary lookup window type: org.apache.ctakes.typesystem.type.textspan.Sentence
14 Dec 2017 09:42:10 INFO AbstractJCasTermAnnotator - Exclusion tagset loaded: CC CD DT EX IN LS MD PDT POS PP PP$ PRP PRP$ RP TO VB VBD VBG VBN VBP VBZ WDT WP WPS WRB
14 Dec 2017 09:42:10 INFO AbstractJCasTermAnnotator - Using minimum term text span: 3
14 Dec 2017 09:42:10 INFO AbstractJCasTermAnnotator - Using Dictionary Descriptor: org/apache/ctakes/dictionary/lookup/fast/sno_rx_16ab.xml
14 Dec 2017 09:42:10 INFO DictionaryDescriptorParser - Parsing dictionary specifications:
14 Dec 2017 09:42:10 INFO UmlsUserApprover - Checking UMLS Account at https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser for user harish1234: .
14 Dec 2017 09:42:11 INFO UmlsUserApprover - UMLS Account at https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser for user harish1234 has been validated
14 Dec 2017 09:42:11 INFO JdbcConnectionFactory - Connecting to jdbc:hsqldb:file:resources/org/apache/ctakes/dictionary/lookup/fast/sno_rx_16ab/sno_rx_16ab:
14 Dec 2017 09:42:11 INFO ENGINE - open start - state not modified ..................
14 Dec 2017 09:42:17 INFO JdbcConnectionFactory - Database connected
14 Dec 2017 09:42:17 INFO JdbcRareWordDictionary - Connected to cui and term table CUI_TERMS
14 Dec 2017 09:42:17 INFO JdbcConceptFactory - Connected to concept table TUI with class TUI
14 Dec 2017 09:42:17 INFO JdbcConceptFactory - Connected to concept table RXNORM with class LONG
14 Dec 2017 09:42:17 INFO JdbcConceptFactory - Connected to concept table PREFTERM with class PREFTERM
14 Dec 2017 09:42:17 INFO JdbcConceptFactory - Connected to concept table SNOMEDCT_US with class LONG
14 Dec 2017 09:42:17 INFO ContextAnnotator - Using left , right scope sizes: 10 , 10
14 Dec 2017 09:42:17 INFO ContextAnnotator - Using scope order: LEFT,RIGHT
14 Dec 2017 09:42:17 INFO ContextAnnotator - SCOPE ORDER: [1, 3]
14 Dec 2017 09:42:17 INFO ContextAnnotator - Using context analyzer: org.apache.ctakes.necontexts.status.StatusContextAnalyzer
14 Dec 2017 09:42:17 INFO StatusContextAnalyzer - initBoundaryData() called for ContextInitializer
14 Dec 2017 09:42:17 INFO ContextAnnotator - Using context consumer: org.apache.ctakes.necontexts.status.StatusContextHitConsumer
14 Dec 2017 09:42:17 INFO ContextAnnotator - Using lookup window type: org.apache.ctakes.typesystem.type.textspan.Sentence
14 Dec 2017 09:42:17 INFO ContextAnnotator - Using focus type: org.apache.ctakes.typesystem.type.textsem.IdentifiedAnnotation
14 Dec 2017 09:42:17 INFO ContextAnnotator - Using context type: org.apache.ctakes.typesystem.type.syntax.BaseToken
14 Dec 2017 09:42:17 INFO ContextAnnotator - Using left , right scope sizes: 7 , 7
14 Dec 2017 09:42:17 INFO ContextAnnotator - Using scope order: LEFT,RIGHT
14 Dec 2017 09:42:17 INFO ContextAnnotator - SCOPE ORDER: [1, 3]
14 Dec 2017 09:42:17 INFO ContextAnnotator - Using context analyzer: org.apache.ctakes.necontexts.negation.NegationContextAnalyzer
14 Dec 2017 09:42:17 INFO NegationContextAnalyzer - initBoundaryData() called for ContextInitializer
14 Dec 2017 09:42:17 INFO ContextAnnotator - Using context consumer: org.apache.ctakes.necontexts.negation.NegationContextHitConsumer
14 Dec 2017 09:42:17 INFO ContextAnnotator - Using lookup window type: org.apache.ctakes.typesystem.type.textspan.Sentence
14 Dec 2017 09:42:17 INFO ContextAnnotator - Using focus type: org.apache.ctakes.typesystem.type.textsem.IdentifiedAnnotation
14 Dec 2017 09:42:17 INFO ContextAnnotator - Using context type: org.apache.ctakes.typesystem.type.syntax.BaseToken
14 Dec 2017 09:42:17 INFO SentenceDetector - Sentence detector model file: org/apache/ctakes/core/sentdetect/sd-med-model.zip
14 Dec 2017 09:42:17 INFO POSTagger - POS tagger model file: org/apache/ctakes/postagger/models/mayo-pos.zip
14 Dec 2017 09:42:18 INFO LvgCmdApiResourceImpl - Loading NLM Norm and Lvg with config file = C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\resources\org\apache\ctakes\lvg\data\config\lvg.properties
14 Dec 2017 09:42:18 INFO LvgCmdApiResourceImpl - config file absolute path = C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\resources\org\apache\ctakes\lvg\data\config\lvg.properties
14 Dec 2017 09:42:18 INFO LvgCmdApiResourceImpl - cwd = C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0
14 Dec 2017 09:42:18 INFO LvgCmdApiResourceImpl - cd C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\resources\org\apache\ctakes\lvg\
14 Dec 2017 09:42:18 INFO ENGINE - open start - state not modified
14 Dec 2017 09:42:18 INFO ENGINE - dataFileCache open start
14 Dec 2017 09:42:18 INFO ENGINE - dataFileCache open end
14 Dec 2017 09:42:18 INFO LvgCmdApiResourceImpl - cd C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0
14 Dec 2017 09:42:18 INFO DrugMentionAnnotator - Finite state machines loaded.
14 Dec 2017 09:42:23 INFO ClearNLPDependencyParserAE - using Morphy analysis? true
Loading configuration.
Loading feature templates.
Loading lexica.
Loading model: ........................................................................................
Loading configuration.
Loading feature templates.
Loading model: .
Loading configuration.
Loading feature templates.
Loading lexica.
Loading model: ... <various Loading model> .
Loading configuration.
Loading feature templates.
Loading lexica.
Loading model: ................................
Loading model: .............................
14 Dec 2017 09:42:32 INFO ConstituencyParser - Initializing parser...
14 Dec 2017 09:42:33 INFO SentenceDetector - Starting processing.
14 Dec 2017 09:42:34 INFO TokenizerAnnotatorPTB - process(JCas) in org.apache.ctakes.core.ae.TokenizerAnnotatorPTB
14 Dec 2017 09:42:36 INFO LvgAnnotator - process(JCas)
14 Dec 2017 09:42:55 INFO ContextDependentTokenizerAnnotator - process(JCas)
14 Dec 2017 09:42:58 INFO POSTagger - process(JCas)
14 Dec 2017 09:43:10 INFO Chunker - process(JCas)
14 Dec 2017 09:43:46 INFO ChunkAdjuster - process(JCas)
14 Dec 2017 09:43:47 INFO ChunkAdjuster - process(JCas)
14 Dec 2017 09:43:48 INFO AbstractJCasTermAnnotator - Starting processing
14 Dec 2017 09:43:54 INFO AbstractJCasTermAnnotator - Finished processing
14 Dec 2017 09:43:54 INFO DrugMentionAnnotator - process(JCas)
14 Dec 2017 09:45:32 INFO DrugMentionAnnotator -
14 Dec 2017 09:45:32 INFO DrugMentionAnnotator -
14 Dec 2017 09:45:32 INFO DrugMentionAnnotator -
14 Dec 2017 09:45:32 INFO DrugMentionAnnotator -
14 Dec 2017 09:45:33 INFO DrugMentionAnnotator -
14 Dec 2017 09:45:33 INFO DrugMentionAnnotator -
14 Dec 2017 09:45:33 INFO DrugMentionAnnotator -
14 Dec 2017 09:45:34 INFO DrugMentionAnnotator -
14 Dec 2017 09:45:38 INFO DrugMentionAnnotator -
14 Dec 2017 09:45:39 INFO DrugMentionAnnotator -
14 Dec 2017 09:45:42 INFO DrugMentionAnnotator -
14 Dec 2017 09:45:43 INFO DrugMentionAnnotator -
14 Dec 2017 09:45:45 INFO DrugMentionAnnotator -
14 Dec 2017 09:45:48 INFO DrugMentionAnnotator -
14 Dec 2017 09:45:48 INFO DrugMentionAnnotator -
14 Dec 2017 09:45:50 INFO DrugMentionAnnotator -
14 Dec 2017 09:45:50 INFO DrugMentionAnnotator -
14 Dec 2017 09:45:53 INFO DrugMentionAnnotator -
14 Dec 2017 09:45:54 INFO DrugMentionAnnotator -
14 Dec 2017 09:45:59 INFO DrugMentionAnnotator -
14 Dec 2017 09:46:00 INFO DrugMentionAnnotator -
14 Dec 2017 09:46:04 INFO DrugMentionAnnotator -
14 Dec 2017 09:46:04 INFO DrugMentionAnnotator -
14 Dec 2017 09:46:05 INFO DrugMentionAnnotator -
14 Dec 2017 09:46:06 INFO DrugMentionAnnotator -
14 Dec 2017 09:46:08 INFO DrugMentionAnnotator -
14 Dec 2017 09:46:09 INFO DrugMentionAnnotator -
14 Dec 2017 09:46:09 INFO DrugMentionAnnotator -
14 Dec 2017 09:46:11 INFO DrugMentionAnnotator -
14 Dec 2017 09:46:16 INFO DrugMentionAnnotator -
14 Dec 2017 09:46:24 INFO DrugMentionAnnotator -
14 Dec 2017 09:46:27 INFO DrugMentionAnnotator -
14 Dec 2017 09:46:30 INFO DrugMentionAnnotator -
14 Dec 2017 09:46:32 INFO DrugMentionAnnotator -
14 Dec 2017 09:46:35 INFO DrugMentionAnnotator -
14 Dec 2017 09:46:38 INFO DrugMentionAnnotator -
14 Dec 2017 09:46:45 INFO DrugMentionAnnotator -
14 Dec 2017 09:46:46 INFO DrugMentionAnnotator -
14 Dec 2017 09:46:46 INFO DrugMentionAnnotator -
14 Dec 2017 09:46:53 INFO DrugMentionAnnotator -
14 Dec 2017 09:46:54 INFO DrugMentionAnnotator -
14 Dec 2017 09:47:02 INFO DrugMentionAnnotator -
14 Dec 2017 09:47:22 INFO DrugMentionAnnotator -
14 Dec 2017 09:47:24 INFO DrugMentionAnnotator -
14 Dec 2017 09:47:28 INFO DrugMentionAnnotator -
14 Dec 2017 09:47:29 INFO DrugMentionAnnotator -
14 Dec 2017 09:47:34 INFO DrugMentionAnnotator -
14 Dec 2017 09:47:38 INFO DrugMentionAnnotator -
14 Dec 2017 09:47:46 INFO DrugMentionAnnotator -
14 Dec 2017 09:47:49 INFO DrugMentionAnnotator -
14 Dec 2017 09:47:54 INFO DrugMentionAnnotator -
14 Dec 2017 09:47:54 INFO DrugMentionAnnotator -
14 Dec 2017 09:47:58 INFO DrugMentionAnnotator -
14 Dec 2017 09:48:45 INFO MaxentParserWrapper - Started processing: idd_secondTrial.txt
14 Dec 2017 10:20:19 INFO MaxentParserWrapper - Done parsing: idd_secondTrial.txt

Regards,
Harish.
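The -Xms6g -Xmx6g flags and the observation that the process never goes above 4.5 GB can be cross-checked from inside the JVM. Here is a minimal sketch. It assumes a separate throwaway class, `HeapCheck` (a hypothetical name, not part of cTAKES), launched with the same memory flags. `Runtime.maxMemory()` typically reports somewhat less than the -Xmx value, and process-level figures from the operating system also include non-heap memory, so the numbers are not expected to match exactly.

```java
/**
 * Hypothetical check, not part of cTAKES: reports the heap limits the running
 * JVM actually received, to confirm that -Xms/-Xmx flags were applied.
 */
class HeapCheck {

    static final long MIB = 1024L * 1024L;

    /** Upper bound the JVM will try to use; usually close to, and often a bit below, -Xmx. */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / MIB;
    }

    /** Heap currently committed by the JVM; typically starts near -Xms and can grow toward the maximum. */
    static long committedHeapMiB() {
        return Runtime.getRuntime().totalMemory() / MIB;
    }

    /** Heap currently occupied by objects, including garbage not yet collected. */
    static long usedHeapMiB() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / MIB;
    }

    public static void main(String[] args) {
        System.out.printf("max=%d MiB, committed=%d MiB, used=%d MiB%n",
                maxHeapMiB(), committedHeapMiB(), usedHeapMiB());
    }
}
```

For example, `java -Xms6g -Xmx6g HeapCheck` should report a maximum close to, and possibly a little below, 6144 MiB. A much smaller value would suggest that the flags never reached the JVM.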
-----Original Message-----
From: Miller, Timothy [mailto:[email protected]]
Sent: Thursday, December 14, 2017 9:16 AM
To: [email protected]
Subject: Re: Slowness in processing files [EXTERNAL]

Do not try to use AggregatePlainTextProcessor, it is just slow. Use the fast version and debug the password issues. Make sure you have your UMLS credentials set in two different places in:

$CTAKES_ROOT/resources/org/apache/ctakes/dictionary/lookup/fast/sno_rx_16ab.xml

Tim

On Thu, 2017-12-14 at 02:36 +0000, Yadav, Harish wrote:

Hi James,

Thanks for responding. A single 2 MB file takes ~5 hours to process with AggregatePlainTextProcessor. This is the command line, including the JVM memory arguments:

C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0>java -Dctakes.umlsuser="XXXXXXX"┬á -Dctakes.umlspw="XXXXXXXX" -cp "C:\New_Drive\apache-ctakes-4.0.0-bi apache-ctakes-4.0.0\desc\;C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\resources\;C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\lib\*" -Dlog4j. nfiguration=file:\C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\config\log4j.xml -Xms512M -Xmx3g org.apache.uima.tools.cpm.CpmFrame

Also, just now I tried to process the file with the AE AggregatePlaintextFastUMLSProcessor, but ran into a different problem: an authentication error with the same username and password that work with AggregatePlainTextProcessor. I can run AggregatePlaintextFastUMLSProcessor with -Xms5g and -Xmx5g, but could you please let me know how it is possible that AggregatePlainTextProcessor runs fine with the above username and password, while AggregatePlaintextFastUMLSProcessor throws the exception below with the same username and password?
Exception:

C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0>java -Dctakes.umlsuser="XXXXXXX"┬á -Dctakes.umlspw="XXXXXX" -cp "C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\desc\;C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\resources\;C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\lib\*" -Dlog4j.configuration=file:\C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\config\log4j.xml -Xms512M -Xmx3g org.apache.uima.tools.cpm.CpmFrame
Dec 13, 2017 9:01:20 PM java.util.prefs.WindowsPreferences <init>
WARNING: Could not open/create prefs root node Software\JavaSoft\Prefs at root 0x80000002. Windows RegCreateKeyEx(...) returned error code 5.
log4j: attributes....
13 Dec 2017 21:04:58 INFO Chunker - Chunker model file: org/apache/ctakes/chunker/models/chunker-model.zip
13 Dec 2017 21:05:00 INFO TokenizerAnnotatorPTB - Initializing org.apache.ctakes.core.ae.TokenizerAnnotatorPTB
13 Dec 2017 21:05:00 INFO ContextDependentTokenizerAnnotator - Finite state machines loaded.
13 Dec 2017 21:05:00 INFO AbstractJCasTermAnnotator - Using dictionary lookup window type: org.apache.ctakes.typesystem.type.textspan.Sentence
13 Dec 2017 21:05:00 INFO AbstractJCasTermAnnotator - Exclusion tagset loaded: CC CD DT EX IN LS MD PDT POS PP PP$ PRP PRP$ RP TO VB VBD VBG VBN VBP VBZ WDT WP WPS WRB
13 Dec 2017 21:05:00 INFO AbstractJCasTermAnnotator - Using minimum term text span: 3
13 Dec 2017 21:05:00 INFO AbstractJCasTermAnnotator - Using Dictionary Descriptor: org/apache/ctakes/dictionary/lookup/fast/sno_rx_16ab.xml
13 Dec 2017 21:05:00 INFO DictionaryDescriptorParser - Parsing dictionary specifications:
13 Dec 2017 21:05:00 INFO UmlsUserApprover - Checking UMLS Account at https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser for user harish1234-ß: ....
13 Dec 2017 21:05:02 ERROR UmlsUserApprover - UMLS Account at https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser is not valid for user XXXXXXX-ß with XXXXXXX
org.apache.uima.resource.ResourceInitializationException: Initialization of CAS Processor with name "AggregatePlaintextFastUMLSProcessor" failed.
    at org.apache.uima.collection.impl.CollectionProcessingEngine_impl.initialize(CollectionProcessingEngine_impl.java:81)
    at org.apache.uima.impl.UIMAFramework_impl._produceCollectionProcessingEngine(UIMAFramework_impl.java:420)
    at org.apache.uima.UIMAFramework.produceCollectionProcessingEngine(UIMAFramework.java:918)
    at org.apache.uima.tools.cpm.CpmPanel.startProcessing(CpmPanel.java:573)
    at org.apache.uima.tools.cpm.CpmPanel.access$000(CpmPanel.java:105)
    at org.apache.uima.tools.cpm.CpmPanel$1.run(CpmPanel.java:713)
Caused by: org.apache.uima.resource.ResourceConfigurationException: Initialization of CAS Processor with name "AggregatePlaintextFastUMLSProcessor" failed.
    at org.apache.uima.collection.impl.cpm.container.CPEFactory.produceIntegratedCasProcessor(CPEFactory.java:1101)
    at org.apache.uima.collection.impl.cpm.container.CPEFactory.getCasProcessors(CPEFactory.java:547)
    at org.apache.uima.collection.impl.cpm.BaseCPMImpl.init(BaseCPMImpl.java:253)
    at org.apache.uima.collection.impl.cpm.BaseCPMImpl.<init>(BaseCPMImpl.java:127)
    at org.apache.uima.collection.impl.CollectionProcessingEngine_impl.initialize(CollectionProcessingEngine_impl.java:73)
    ... 5 more
Caused by: org.apache.uima.resource.ResourceInitializationException: Initialization of annotator class "org.apache.ctakes.dictionary.lookup2.ae.DefaultJCasTermAnnotator" failed. (Descriptor: file:/C:/New_Drive/apache-ctakes-4.0.0-bin/apache-ctakes-4.0.0/desc/ctakes-dictionary-lookup-fast/desc/analysis_engine/UmlsLookupAnnotator.xml)
    at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initializeAnalysisComponent(PrimitiveAnalysisEngine_impl.java:271)
    at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initialize(PrimitiveAnalysisEngine_impl.java:170)
    at org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94)
    at org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62)
    at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:279)
    at org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:407)
    at org.apache.uima.analysis_engine.asb.impl.ASB_impl.setup(ASB_impl.java:256)
    at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initASB(AggregateAnalysisEngine_impl.java:429)
    at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initializeAggregateAnalysisEngine(AggregateAnalysisEngine_impl.java:373)
    at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initialize(AggregateAnalysisEngine_impl.java:186)
    at org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94)
    at org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62)
    at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:279)
    at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:331)
    at org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:448)
    at org.apache.uima.collection.impl.cpm.container.CPEFactory.produceIntegratedCasProcessor(CPEFactory.java:1085)
    ... 9 more
Caused by: org.apache.uima.resource.ResourceInitializationException: MESSAGE LOCALIZATION FAILED: Can't find resource for bundle java.util.PropertyResourceBundle, key Could not construct org.apache.ctakes.dictionary.lookup2.dictionary.UmlsJdbcRareWordDictionary
    at org.apache.ctakes.dictionary.lookup2.ae.AbstractJCasTermAnnotator.initialize(AbstractJCasTermAnnotator.java:131)
    at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initializeAnalysisComponent(PrimitiveAnalysisEngine_impl.java:266)
    ... 24 more
Caused by: org.apache.uima.analysis_engine.annotator.AnnotatorContextException: MESSAGE LOCALIZATION FAILED: Can't find resource for bundle java.util.PropertyResourceBundle, key Could not construct org.apache.ctakes.dictionary.lookup2.dictionary.UmlsJdbcRareWordDictionary
    at org.apache.ctakes.dictionary.lookup2.dictionary.DictionaryDescriptorParser.parseDictionary(DictionaryDescriptorParser.java:199)
    at org.apache.ctakes.dictionary.lookup2.dictionary.DictionaryDescriptorParser.parseDictionaries(DictionaryDescriptorParser.java:156)
    at org.apache.ctakes.dictionary.lookup2.dictionary.DictionaryDescriptorParser.parseDescriptor(DictionaryDescriptorParser.java:128)
    at org.apache.ctakes.dictionary.lookup2.ae.AbstractJCasTermAnnotator.initialize(AbstractJCasTermAnnotator.java:129)
    ... 25 more
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
    at java.lang.reflect.Constructor.newInstance(Unknown Source)
    at org.apache.ctakes.dictionary.lookup2.dictionary.DictionaryDescriptorParser.parseDictionary(DictionaryDescriptorParser.java:196)
    ... 28 more
Caused by: java.sql.SQLException: Invalid User for UMLS dictionary sno_rx_16abTerms
    at org.apache.ctakes.dictionary.lookup2.dictionary.UmlsJdbcRareWordDictionary.<init>(UmlsJdbcRareWordDictionary.java:29)
    ... 33 more

From: James Masanz [mailto:[email protected]]
Sent: Wednesday, December 13, 2017 8:56 PM
To: [email protected]
Subject: Re: Slowness in processing files

AggregatePlaintextFastUMLSProcessor is much faster than AggregatePlainTextProcessor, so I suggest that, to start with, you just use AggregatePlaintextFastUMLSProcessor.

Do you mean it is taking ~5 hours for a single file to be processed at times, or is that for a set of files?

If your JVM heap space is not set large enough, you can get very slow results. Try increasing it to 5G (or more) using the JVM parameter -Xmx5G. For faster start-up, you can also set -Xms to the same value as -Xmx, or something close to it.

-- James

On Wed, Dec 13, 2017 at 7:04 PM, Yadav, Harish <[email protected]> wrote:

Hi All,

When the medical records are run with the AE AggregatePlaintextFastUMLSProcessor or AggregatePlainTextProcessor, the processing is very slow. It is fast when smaller files (~2 KB) are fed as input, but bigger files (say, 2 MB) are very slow and take ~5 hours each to process. Any pointers would be of great help.

Regards,
Harish.
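In the failed run quoted above, the log reports the user as `harish1234-ß` (and `XXXXXXX-ß`), while the 14 December run validates `harish1234`. The failing command line also shows stray characters (`┬á`) directly after `-Dctakes.umlsuser="XXXXXXX"`. One possible explanation, not confirmed anywhere in this thread, is that an invisible character such as a non-breaking space was pasted into the command and became part of the username.

Here is a minimal sketch that checks for this. It uses a hypothetical class name, `CredentialCheck`, and lists characters outside printable ASCII in the `ctakes.umlsuser` and `ctakes.umlspw` system properties (the property names used on the command lines above) without printing the values themselves. Flagging everything outside printable ASCII is a deliberately strict assumption, so a password that legitimately contains such characters would also be reported.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical diagnostic, not part of cTAKES: reports characters in a
 * credential value that fall outside printable ASCII (0x21..0x7E), such as a
 * non-breaking space (U+00A0) pasted after a closing quote on the command line.
 */
class CredentialCheck {

    /** Describes each suspicious character as "index N: U+XXXX"; the value itself is never returned. */
    static List<String> suspiciousChars(String value) {
        List<String> problems = new ArrayList<>();
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            // Strict assumption: spaces, control characters and all non-ASCII are flagged.
            if (c < 0x21 || c > 0x7E) {
                problems.add(String.format("index %d: U+%04X", i, (int) c));
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        // Property names as passed with -D on the command lines in this thread.
        for (String key : new String[] {"ctakes.umlsuser", "ctakes.umlspw"}) {
            String value = System.getProperty(key, "");
            System.out.println(key + ": length " + value.length());
            for (String problem : suspiciousChars(value)) {
                System.out.println("  suspicious character at " + problem);
            }
        }
    }
}
```

Running it with exactly the same `-Dctakes.umlsuser=... -Dctakes.umlspw=...` arguments as the failing command would show whether anything beyond the intended characters reaches the JVM.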
