Hi Timothy,

Sorry for the typo. I meant that I ran it with the AE AggregatePlaintextFastUMLSProcessor and -Xms6g -Xmx6g, but it still takes a long time (more than 2 hours) for a single 2 MB file. It runs fine for a 2 KB file.

Regards,
Harish.

-----Original Message-----
From: Miller, Timothy [mailto:[email protected]]
Sent: Thursday, December 14, 2017 11:22 AM
To: [email protected]
Subject: Re: Slowness in processing files [EXTERNAL]

You missed the most important part of my message:

> Do not try to use AggregatePlainTextProcessor, it is just slow.

Use AggregatePlaintextFastUMLSProcessor

Tim

On Thu, 2017-12-14 at 16:15 +0000, Yadav, Harish wrote:
> Hi Timothy,
>
> I fixed the password issues and ran with AE
> AggregatePlainTextProcessor with -Xms6g -Xmx6g, but still it takes a
> lot of time ( ~more than 2 hours) for a single file of 2 Mb size. I
> have checked the memory consumption of the process and it never goes
> above 4.5 G, so I am not sure if it is the memory issue. However, AE
> AggregatePlainTextProcessor process the 2KB file in ~11 seconds, but
> most of our files are in Mbs so processing time for each file for more
> than 2 hours is not feasible.
>
> Could you please suggest something which may improve the performance.
> Below are the logs for the process of 2 Mb file with
> AggregatePlainTextProcessor:
>
> Logs:
>
> C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0>java -cp
> "C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-
> 4.0.0\desc\;C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-
> 4.0.0\resources\;C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-
> 4.0.0\lib\*" -Dlog4j.configuration=file:\C:\New_Drive\apache-ctakes-
> 4.0.0-bin\apache-ctakes-4.0.0\config\log4j.xml -Xms6g -Xmx6g
> org.apache.uima.tools.cpm.CpmFrame
> Dec 14, 2017 9:40:25 AM java.util.prefs.WindowsPreferences <init>
> WARNING: Could not open/create prefs root node Software\JavaSoft\Prefs
> at root 0x80000002. Windows
> RegCreateKeyEx(...) returned error code 5.
> log4j: reset attribute= "false".
> log4j: Threshold ="null".
> log4j: Retreiving an instance of org.apache.log4j.Logger.
> log4j: Setting [ProgressAppender] additivity to [false].
> log4j: Level value for ProgressAppender is [INFO]. > log4j: ProgressAppender level set to INFO > log4j: Class name: [org.apache.log4j.ConsoleAppender] > log4j: Parsing layout of class: "org.apache.log4j.PatternLayout" > log4j: Setting property [conversionPattern] to [%m]. > log4j: Adding appender named [noEolAppender] to category > [ProgressAppender]. > log4j: Retreiving an instance of org.apache.log4j.Logger. > log4j: Setting [ProgressDone] additivity to [false]. > log4j: Level value for ProgressDone is [INFO]. > log4j: ProgressDone level set to INFO > log4j: Class name: [org.apache.log4j.ConsoleAppender] > log4j: Parsing layout of class: "org.apache.log4j.PatternLayout" > log4j: Setting property [conversionPattern] to [%m%n]. > log4j: Adding appender named [eolAppender] to category [ProgressDone]. > log4j: Level value for root is [INFO]. > log4j: root level set to INFO > log4j: Class name: [org.apache.log4j.ConsoleAppender] > log4j: Parsing layout of class: "org.apache.log4j.PatternLayout" > log4j: Setting property [conversionPattern] to [%d{dd MMM yyyy > HH:mm:ss} %5p %c{1} - %m%n]. > log4j: Adding appender named [consoleAppender] to category [root]. > 14 Dec 2017 09:42:09 INFO Chunker - Chunker model file: > org/apache/ctakes/chunker/models/chunker-model.zip > 14 Dec 2017 09:42:10 INFO TokenizerAnnotatorPTB - Initializing > org.apache.ctakes.core.ae.TokenizerAnnotatorPTB > 14 Dec 2017 09:42:10 INFO ContextDependentTokenizerAnnotator - Finite > state machines loaded. 
> 14 Dec 2017 09:42:10 INFO AbstractJCasTermAnnotator - Using > dictionary lookup window type: > org.apache.ctakes.typesystem.type.textspan.Sentence > 14 Dec 2017 09:42:10 INFO AbstractJCasTermAnnotator - Exclusion > tagset loaded: CC CD DT EX IN LS MD PDT POS PP PP$ PRP PRP$ RP TO VB > VBD VBG VBN VBP VBZ WDT WP WPS WRB > 14 Dec 2017 09:42:10 INFO AbstractJCasTermAnnotator - Using minimum > term text span: 3 > 14 Dec 2017 09:42:10 INFO AbstractJCasTermAnnotator - Using > Dictionary Descriptor: > org/apache/ctakes/dictionary/lookup/fast/sno_rx_16ab.xml > 14 Dec 2017 09:42:10 INFO DictionaryDescriptorParser - Parsing > dictionary specifications: > 14 Dec 2017 09:42:10 INFO UmlsUserApprover - Checking UMLS Account at > https://urldefense.proofpoint.com/v2/url?u=https-3A__uts-2Dws.nlm. > nih.gov_restful_isValidUMLSUser&d=DwIGaQ&c=qS4goWBT7poplM69zy_3xhKwEW > 14JZMSdioCoppxeFU&r=Heup-IbsIg9Q1TPOylpP9FE4GTK- > OqdTDRRNQXipowRLRjx0ibQrHEo8uYx6674h&m=gE2jjaOVTYONTnzEtl8mA4LBRUcQvC > EkIgDc6DU1Nbw&s=sqr66ew_JEhLww9qWi-re1b6LLKYW49FjyfEi8IGPIE&e= for > user harish1234: > .14 Dec 2017 09:42:11 INFO UmlsUserApprover - UMLS Account at http > s://urldefense.proofpoint.com/v2/url?u=https-3A__uts- > 2Dws.nlm.nih.gov_restful_isValidUMLSUser&d=DwIGaQ&c=qS4goWBT7poplM69z > y_3xhKwEW14JZMSdioCoppxeFU&r=Heup-IbsIg9Q1TPOylpP9FE4GTK- > OqdTDRRNQXipowRLRjx0ibQrHEo8uYx6674h&m=gE2jjaOVTYONTnzEtl8mA4LBRUcQvC > EkIgDc6DU1Nbw&s=sqr66ew_JEhLww9qWi-re1b6LLKYW49FjyfEi8IGPIE&e= for > user harish1234 has been validated > > 14 Dec 2017 09:42:11 INFO JdbcConnectionFactory - Connecting to > jdbc:hsqldb:file:resources/org/apache/ctakes/dictionary/lookup/fast/s > no_rx_16ab/sno_rx_16ab: > 14 Dec 2017 09:42:11 INFO ENGINE - open start - state not modified > .................. 
> 14 Dec 2017 09:42:17 INFO JdbcConnectionFactory - Database connected > 14 Dec 2017 09:42:17 INFO JdbcRareWordDictionary - Connected to cui > and term table CUI_TERMS > 14 Dec 2017 09:42:17 INFO JdbcConceptFactory - Connected to concept > table TUI with class TUI > 14 Dec 2017 09:42:17 INFO JdbcConceptFactory - Connected to concept > table RXNORM with class LONG > 14 Dec 2017 09:42:17 INFO JdbcConceptFactory - Connected to concept > table PREFTERM with class PREFTERM > 14 Dec 2017 09:42:17 INFO JdbcConceptFactory - Connected to concept > table SNOMEDCT_US with class LONG > 14 Dec 2017 09:42:17 INFO ContextAnnotator - Using left , right scope > sizes: 10 , 10 > 14 Dec 2017 09:42:17 INFO ContextAnnotator - Using scope order: > LEFT,RIGHT > 14 Dec 2017 09:42:17 INFO ContextAnnotator - SCOPE ORDER: [1, 3] > 14 Dec 2017 09:42:17 INFO ContextAnnotator - Using context analyzer: > org.apache.ctakes.necontexts.status.StatusContextAnalyzer > 14 Dec 2017 09:42:17 INFO StatusContextAnalyzer - initBoundaryData() > called for ContextInitializer > 14 Dec 2017 09:42:17 INFO ContextAnnotator - Using context consumer: > org.apache.ctakes.necontexts.status.StatusContextHitConsumer > 14 Dec 2017 09:42:17 INFO ContextAnnotator - Using lookup window > type: org.apache.ctakes.typesystem.type.textspan.Sentence > 14 Dec 2017 09:42:17 INFO ContextAnnotator - Using focus type: > org.apache.ctakes.typesystem.type.textsem.IdentifiedAnnotation > 14 Dec 2017 09:42:17 INFO ContextAnnotator - Using context type: > org.apache.ctakes.typesystem.type.syntax.BaseToken > 14 Dec 2017 09:42:17 INFO ContextAnnotator - Using left , right scope > sizes: 7 , 7 > 14 Dec 2017 09:42:17 INFO ContextAnnotator - Using scope order: > LEFT,RIGHT > 14 Dec 2017 09:42:17 INFO ContextAnnotator - SCOPE ORDER: [1, 3] > 14 Dec 2017 09:42:17 INFO ContextAnnotator - Using context analyzer: > org.apache.ctakes.necontexts.negation.NegationContextAnalyzer > 14 Dec 2017 09:42:17 INFO NegationContextAnalyzer - > 
initBoundaryData() called for ContextInitializer > 14 Dec 2017 09:42:17 INFO ContextAnnotator - Using context consumer: > org.apache.ctakes.necontexts.negation.NegationContextHitConsumer > 14 Dec 2017 09:42:17 INFO ContextAnnotator - Using lookup window > type: org.apache.ctakes.typesystem.type.textspan.Sentence > 14 Dec 2017 09:42:17 INFO ContextAnnotator - Using focus type: > org.apache.ctakes.typesystem.type.textsem.IdentifiedAnnotation > 14 Dec 2017 09:42:17 INFO ContextAnnotator - Using context type: > org.apache.ctakes.typesystem.type.syntax.BaseToken > 14 Dec 2017 09:42:17 INFO SentenceDetector - Sentence detector model > file: org/apache/ctakes/core/sentdetect/sd-med-model.zip > 14 Dec 2017 09:42:17 INFO POSTagger - POS tagger model file: > org/apache/ctakes/postagger/models/mayo-pos.zip > 14 Dec 2017 09:42:18 INFO LvgCmdApiResourceImpl - Loading NLM Norm > and Lvg with config file = C:\New_Drive\apache-ctakes-4.0.0- > bin\apache-ctakes- > 4.0.0\resources\org\apache\ctakes\lvg\data\config\lvg.properties > 14 Dec 2017 09:42:18 INFO LvgCmdApiResourceImpl - config file > absolute path = C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes- > 4.0.0\resources\org\apache\ctakes\lvg\data\config\lvg.properties > 14 Dec 2017 09:42:18 INFO LvgCmdApiResourceImpl - cwd = > C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0 > 14 Dec 2017 09:42:18 INFO LvgCmdApiResourceImpl - cd > C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes- > 4.0.0\resources\org\apache\ctakes\lvg\ > 14 Dec 2017 09:42:18 INFO ENGINE - open start - state not modified > 14 Dec 2017 09:42:18 INFO ENGINE - dataFileCache open start > 14 Dec 2017 09:42:18 INFO ENGINE - dataFileCache open end > 14 Dec 2017 09:42:18 INFO LvgCmdApiResourceImpl - cd > C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0 > 14 Dec 2017 09:42:18 INFO DrugMentionAnnotator - Finite state > machines loaded. > 14 Dec 2017 09:42:23 INFO ClearNLPDependencyParserAE - using Morphy > analysis? true Loading configuration. 
> Loading feature templates. > Loading lexica. > Loading model: > ..................................................................... > ................... > Loading configuration. > Loading feature templates. > Loading model: > . > Loading configuration. > Loading feature templates. > Loading lexica. > Loading model: > ... > <various Loading model> > . > Loading configuration. > Loading feature templates. > Loading lexica. > Loading model: > ................................ > Loading model: > ............................. > 14 Dec 2017 09:42:32 INFO ConstituencyParser - Initializing parser... > 14 Dec 2017 09:42:33 INFO SentenceDetector - Starting processing. > 14 Dec 2017 09:42:34 INFO TokenizerAnnotatorPTB - process(JCas) in > org.apache.ctakes.core.ae.TokenizerAnnotatorPTB > 14 Dec 2017 09:42:36 INFO LvgAnnotator - process(JCas) > 14 Dec 2017 09:42:55 INFO ContextDependentTokenizerAnnotator - > process(JCas) > 14 Dec 2017 09:42:58 INFO POSTagger - process(JCas) > 14 Dec 2017 09:43:10 INFO Chunker - process(JCas) > 14 Dec 2017 09:43:46 INFO ChunkAdjuster - process(JCas) > 14 Dec 2017 09:43:47 INFO ChunkAdjuster - process(JCas) > 14 Dec 2017 09:43:48 INFO AbstractJCasTermAnnotator - Starting > processing > 14 Dec 2017 09:43:54 INFO AbstractJCasTermAnnotator - Finished > processing > 14 Dec 2017 09:43:54 INFO DrugMentionAnnotator - process(JCas) > 14 Dec 2017 09:45:32 INFO DrugMentionAnnotator - > 14 Dec 2017 09:45:32 INFO DrugMentionAnnotator - > 14 Dec 2017 09:45:32 INFO DrugMentionAnnotator - > 14 Dec 2017 09:45:32 INFO DrugMentionAnnotator - > 14 Dec 2017 09:45:33 INFO DrugMentionAnnotator - > 14 Dec 2017 09:45:33 INFO DrugMentionAnnotator - > 14 Dec 2017 09:45:33 INFO DrugMentionAnnotator - > 14 Dec 2017 09:45:34 INFO DrugMentionAnnotator - > 14 Dec 2017 09:45:38 INFO DrugMentionAnnotator - > 14 Dec 2017 09:45:39 INFO DrugMentionAnnotator - > 14 Dec 2017 09:45:42 INFO DrugMentionAnnotator - > 14 Dec 2017 09:45:43 INFO DrugMentionAnnotator - > 14 Dec 2017 
09:45:45 INFO DrugMentionAnnotator - > 14 Dec 2017 09:45:48 INFO DrugMentionAnnotator - > 14 Dec 2017 09:45:48 INFO DrugMentionAnnotator - > 14 Dec 2017 09:45:50 INFO DrugMentionAnnotator - > 14 Dec 2017 09:45:50 INFO DrugMentionAnnotator - > 14 Dec 2017 09:45:53 INFO DrugMentionAnnotator - > 14 Dec 2017 09:45:54 INFO DrugMentionAnnotator - > 14 Dec 2017 09:45:59 INFO DrugMentionAnnotator - > 14 Dec 2017 09:46:00 INFO DrugMentionAnnotator - > 14 Dec 2017 09:46:04 INFO DrugMentionAnnotator - > 14 Dec 2017 09:46:04 INFO DrugMentionAnnotator - > 14 Dec 2017 09:46:05 INFO DrugMentionAnnotator - > 14 Dec 2017 09:46:06 INFO DrugMentionAnnotator - > 14 Dec 2017 09:46:08 INFO DrugMentionAnnotator - > 14 Dec 2017 09:46:09 INFO DrugMentionAnnotator - > 14 Dec 2017 09:46:09 INFO DrugMentionAnnotator - > 14 Dec 2017 09:46:11 INFO DrugMentionAnnotator - > 14 Dec 2017 09:46:16 INFO DrugMentionAnnotator - > 14 Dec 2017 09:46:24 INFO DrugMentionAnnotator - > 14 Dec 2017 09:46:27 INFO DrugMentionAnnotator - > 14 Dec 2017 09:46:30 INFO DrugMentionAnnotator - > 14 Dec 2017 09:46:32 INFO DrugMentionAnnotator - > 14 Dec 2017 09:46:35 INFO DrugMentionAnnotator - > 14 Dec 2017 09:46:38 INFO DrugMentionAnnotator - > 14 Dec 2017 09:46:45 INFO DrugMentionAnnotator - > 14 Dec 2017 09:46:46 INFO DrugMentionAnnotator - > 14 Dec 2017 09:46:46 INFO DrugMentionAnnotator - > 14 Dec 2017 09:46:53 INFO DrugMentionAnnotator - > 14 Dec 2017 09:46:54 INFO DrugMentionAnnotator - > 14 Dec 2017 09:47:02 INFO DrugMentionAnnotator - > 14 Dec 2017 09:47:22 INFO DrugMentionAnnotator - > 14 Dec 2017 09:47:24 INFO DrugMentionAnnotator - > 14 Dec 2017 09:47:28 INFO DrugMentionAnnotator - > 14 Dec 2017 09:47:29 INFO DrugMentionAnnotator - > 14 Dec 2017 09:47:34 INFO DrugMentionAnnotator - > 14 Dec 2017 09:47:38 INFO DrugMentionAnnotator - > 14 Dec 2017 09:47:46 INFO DrugMentionAnnotator - > 14 Dec 2017 09:47:49 INFO DrugMentionAnnotator - > 14 Dec 2017 09:47:54 INFO DrugMentionAnnotator - > 14 Dec 2017 09:47:54 
INFO DrugMentionAnnotator -
> 14 Dec 2017 09:47:58 INFO DrugMentionAnnotator -
> 14 Dec 2017 09:48:45 INFO MaxentParserWrapper - Started processing: idd_secondTrial.txt
> 14 Dec 2017 10:20:19 INFO MaxentParserWrapper - Done parsing: idd_secondTrial.txt
>
> Regards,
> Harish.
>
> -----Original Message-----
> From: Miller, Timothy [mailto:[email protected]]
> Sent: Thursday, December 14, 2017 9:16 AM
> To: [email protected]
> Subject: Re: Slowness in processing files [EXTERNAL]
>
> Do not try to use AggregatePlainTextProcessor, it is just slow.
> Use the fast version and debug the password issues.
> Make sure you have your UMLS credentials set in:
> $CTAKES_ROOT/resources/org/apache/ctakes/dictionary/lookup/fast/sno_rx_16ab.xml
>
> in two different places.
>
> Tim
>
> On Thu, 2017-12-14 at 02:36 +0000, Yadav, Harish wrote:
> >
> > Hi James,
> >
> > Thanks for responding.
> >
> > Single file is taking ~5 hours to process with
> > AggregatePlainTextProcessor of size 2 Mb. This is how the process
> > looks like for JVM arguments regarding memory:
> >
> > C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0>java
> > -Dctakes.umlsuser="XXXXXXX"┬á -Dctakes.umlspw="XXXXXXXX" -cp
> > "C:\New_Drive\apache-ctakes-4.0.0-bi
> > apache-ctakes-4.0.0\desc\;C:\New_Drive\apache-ctakes-4.0.0-
> > bin\apache-ctakes-4.0.0\resources\;C:\New_Drive\apache-ctakes-
> > 4.0.0-
> > bin\apache-ctakes-4.0.0\lib\*" -Dlog4j.
> > nfiguration=file:\C:\New_Drive\apache-ctakes-4.0.0-bin\apache-
> > ctakes-
> > 4.0.0\config\log4j.xml -Xms512M -Xmx3g
> > org.apache.uima.tools.cpm.CpmFrame
> >
> > Also, just now I tried to process the file with AE
> > AggregatePlaintextFastUMLSProcessor but ran into different problem
> > of not getting authentication error with same username password
> > being used in AggregatePlainTextProcessor.
> >
> > I can run it with AggregatePlaintextFastUMLSProcessor by increasing
> > Xms 5g and Xmx5g, if you could please let me know how can it be
> > possible that with one AE AggregatePlainTextProcessor it is running
> > fine with above username and password but giving below exception
> > with same username, password with
> > AggregatePlaintextFastUMLSProcessor.
> >
> > Exception:
> >
> > C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0>java
> > -Dctakes.umlsuser="XXXXXXX"┬á -Dctakes.umlspw="XXXXXX" -cp
> > "C:\New_Drive\apache-ctakes-4.0.0-bin\ apache-ctakes-
> > 4.0.0\desc\;C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-
> > 4.0.0\resources\;C:\New_Drive\apache-ctakes-4.0.0-bin\apache-
> > ctakes-
> > 4.0.0\lib\*" -Dlog4j.co nfiguration=file:\C:\New_Drive\apache-
> > ctakes-
> > 4.0.0-bin\apache-ctakes-4.0.0\config\log4j.xml -Xms512M -Xmx3g
> > org.apache.uima.tools.cpm.CpmFrame Dec 13, 2017 9:01:20 PM
> > java.util.prefs.WindowsPreferences <init> WARNING: Could not
> > open/create prefs root node Software\JavaSoft\Prefs at root
> > 0x80000002. Windows RegCreateKeyEx(...) returned error code 5.
> > log4j:
> > attributes.... 13 Dec 2017 21:04:58 INFO Chunker - Chunker model
> > file: org/apache/ctakes/chunker/models/chunker-model.zip 13 Dec
> > 2017
> > 21:05:00 INFO TokenizerAnnotatorPTB - Initializing
> > org.apache.ctakes.core.ae.TokenizerAnnotatorPTB 13 Dec 2017
> > 21:05:00
> > INFO ContextDependentTokenizerAnnotator - Finite state machines
> > loaded.
13 Dec 2017 21:05:00 INFO AbstractJCasTermAnnotator - Using > > dictionary lookup window type: > > org.apache.ctakes.typesystem.type.textspan.Sentence 13 Dec 2017 > > 21:05:00 INFO AbstractJCasTermAnnotator - Exclusion tagset loaded: > > CC CD DT EX IN LS MD PDT POS PP PP$ PRP PRP$ RP TO VB VBD VBG VBN > > VBP VBZ WDT WP WPS WRB 13 Dec 2017 21:05:00 INFO > > AbstractJCasTermAnnotator - Using minimum term text span: 3 13 Dec > > 2017 21:05:00 INFO AbstractJCasTermAnnotator - Using Dictionary > > Descriptor: > > org/apache/ctakes/dictionary/lookup/fast/sno_rx_16ab.xml > > 13 Dec 2017 21:05:00 INFO DictionaryDescriptorParser - Parsing > > dictionary specifications: 13 Dec 2017 21:05:00 INFO > > UmlsUserApprover > > - Checking UMLS Account at https://urldefense.proofpoint.com/v2/url > > ?u=https-3A__uts- > > 2Dws.nlm.nih.go&d=DwIGaQ&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppx > > eFU&r=Heup-IbsIg9Q1TPOylpP9FE4GTK- > > OqdTDRRNQXipowRLRjx0ibQrHEo8uYx6674h&m=gE2jjaOVTYONTnzEtl8mA4LBRUcQ > > vCEkIgDc6DU1Nbw&s=v_ivdTVH9oojQd-0bohfzxVCl5UxJlSZ5FUfi7qnmxo&e= > > v/restful/isValidUMLSUser for user harish1234-ß: ....13 Dec 2017 > > 21:05:02 ERROR UmlsUserApprover - UMLS Account at https://urldefe > > nse.proofpoint.com/v2/url?u=https-3A__uts- > > 2Dws.nl&d=DwIGaQ&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=He > > up-IbsIg9Q1TPOylpP9FE4GTK- > > OqdTDRRNQXipowRLRjx0ibQrHEo8uYx6674h&m=gE2jjaOVTYONTnzEtl8mA4LBRUcQ > > vCEkIgDc6DU1Nbw&s=U8OuKgmE0YMDPABaTm39jDFIXG4tnVEeE1rrCS03cbM&e= > > m.nih.gov/restful/isValidUMLSUser is not valid for user XXXXXXX-ß > > with XXXXXXX > > org.apache.uima.resource.ResourceInitializationException: > > Initialization of CAS Processor with name > > "AggregatePlaintextFastUMLSProcessor" failed. 
at > > org.apache.uima.collection.impl.CollectionProcessingEngine_impl.ini > > ti > > alize(CollectionProcessingEngine_impl.java:81) at > > org.apache.uima.impl.UIMAFramework_impl._produceCollectionProcessin > > gE > > ngine(UIMAFramework_impl.java:420) at > > org.apache.uima.UIMAFramework.produceCollectionProcessingEngine(UIM > > AF > > ramework.java:918) at > > org.apache.uima.tools.cpm.CpmPanel.startProcessing(CpmPanel.java:57 > > 3) > > at > > org.apache.uima.tools.cpm.CpmPanel.access$000(CpmPanel.java:105) > > at > > org.apache.uima.tools.cpm.CpmPanel$1.run(CpmPanel.java:713) Caused > > by: org.apache.uima.resource.ResourceConfigurationException: > > Initialization of CAS Processor with name > > "AggregatePlaintextFastUMLSProcessor" failed. at > > org.apache.uima.collection.impl.cpm.container.CPEFactory.produceInt > > eg > > ratedCasProcessor(CPEFactory.java:1101) at > > org.apache.uima.collection.impl.cpm.container.CPEFactory.getCasProc > > es > > sors(CPEFactory.java:547) at > > org.apache.uima.collection.impl.cpm.BaseCPMImpl.init(BaseCPMImpl.ja > > va > > :253) at > > org.apache.uima.collection.impl.cpm.BaseCPMImpl.<init>(BaseCPMImpl. > > ja > > va:127) at > > org.apache.uima.collection.impl.CollectionProcessingEngine_impl.ini > > ti > > alize(CollectionProcessingEngine_impl.java:73) ... 5 more > > Caused by: > > org.apache.uima.resource.ResourceInitializationException: > > Initialization of annotator class > > "org.apache.ctakes.dictionary.lookup2.ae.DefaultJCasTermAnnotator " > > failed. 
(Descriptor: file:/C:/New_Drive/apache-ctakes-4.0.0- > > bin/apache-ctakes-4.0.0/desc/ctakes-dictionary-lookup- > > fast/desc/analysis_engine/UmlsLookupAnnotator.xml) at > > org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.i > > ni > > tializeAnalysisComponent(PrimitiveAnalysisEngine_impl.java:271) > > at > > org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.i > > ni > > tialize(PrimitiveAnalysisEngine_impl.java:170) at > > org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(Ana > > ly > > sisEngineFactory_impl.java:94) at > > org.apache.uima.impl.CompositeResourceFactory_impl.produceResource( > > Co > > mpositeResourceFactory_impl.java:62) at > > org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:27 > > 9) > > at > > org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.j > > av > > a:407) at > > org.apache.uima.analysis_engine.asb.impl.ASB_impl.setup(ASB_impl.ja > > va > > :256) at > > org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.i > > ni > > tASB(AggregateAnalysisEngine_impl.java:429) at > > org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.i > > ni > > tializeAggregateAnalysisEngine(AggregateAnalysisEngine_impl.java:37 > > 3) > > at > > org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.i > > ni > > tialize(AggregateAnalysisEngine_impl.java:186) at > > org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(Ana > > ly > > sisEngineFactory_impl.java:94) at > > org.apache.uima.impl.CompositeResourceFactory_impl.produceResource( > > Co > > mpositeResourceFactory_impl.java:62) at > > org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:27 > > 9) > > at > > org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:33 > > 1) > > at > > org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.j > > av > > a:448) at > > org.apache.uima.collection.impl.cpm.container.CPEFactory.produceInt > > eg > > 
ratedCasProcessor(CPEFactory.java:1085) ... 9 more Caused > > by: > > org.apache.uima.resource.ResourceInitializationException: MESSAGE > > LOCALIZATION FAILED: Can't find resource for bundle > > java.util.PropertyResourceBundle, key C ould not construct > > org.apache.ctakes.dictionary.lookup2.dictionary.UmlsJdbcRareWordDic > > ti > > onary at > > org.apache.ctakes.dictionary.lookup2.ae.AbstractJCasTermAnnotator.i > > ni > > tialize(AbstractJCasTermAnnotator.java:131) at > > org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.i > > ni > > tializeAnalysisComponent(PrimitiveAnalysisEngine_impl.java:266) > > ... 24 more Caused by: > > org.apache.uima.analysis_engine.annotator.AnnotatorContextException > > : > > MESSAGE LOCALIZATION FAILED: Can't find resource for bundle > > java.util.PropertyResourceBu ndle, key Could not construct > > org.apache.ctakes.dictionary.lookup2.dictionary.UmlsJdbcRareWordDic > > ti > > onary at > > org.apache.ctakes.dictionary.lookup2.dictionary.DictionaryDescripto > > rP > > arser.parseDictionary(DictionaryDescriptorParser.java:199) > > at > > org.apache.ctakes.dictionary.lookup2.dictionary.DictionaryDescripto > > rP > > arser.parseDictionaries(DictionaryDescriptorParser.java:156) > > at > > org.apache.ctakes.dictionary.lookup2.dictionary.DictionaryDescripto > > rP > > arser.parseDescriptor(DictionaryDescriptorParser.java:128) > > at > > org.apache.ctakes.dictionary.lookup2.ae.AbstractJCasTermAnnotator.i > > ni > > tialize(AbstractJCasTermAnnotator.java:129) ... 
25 more
> > Caused by: java.lang.reflect.InvocationTargetException
> >     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> >     at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
> >     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
> >     at java.lang.reflect.Constructor.newInstance(Unknown Source)
> >     at org.apache.ctakes.dictionary.lookup2.dictionary.DictionaryDescriptorParser.parseDictionary(DictionaryDescriptorParser.java:196)
> >     ... 28 more
> > Caused by: java.sql.SQLException: Invalid User for UMLS dictionary sno_rx_16abTerms
> >     at org.apache.ctakes.dictionary.lookup2.dictionary.UmlsJdbcRareWordDictionary.<init>(UmlsJdbcRareWordDictionary.java:29)
> >     ... 33 more
> >
> > From: James Masanz [mailto:[email protected]]
> > Sent: Wednesday, December 13, 2017 8:56 PM
> > To: [email protected]
> > Subject: Re: Slowness in processing files
> >
> > Using AggregatePlaintextFastUMLSProcessor is much faster than
> > AggregatePlainTextProcessor, so I suggest that to start with you
> > just use AggregatePlaintextFastUMLSProcessor.
> >
> > Do you mean it is taking ~5 hours for a single file to be processed
> > at times, or is that for a set of files?
> >
> > If your JVM heap space is not set large enough, you can get very
> > slow results.
> > Try increasing to 5G (or more) using the JVM parameter -Xmx5G For
> > faster start up, you can also set the -Xms to the same or something
> > close to -Xmx value.
> >
> > -- James
> >
> > On Wed, Dec 13, 2017 at 7:04 PM, Yadav, Harish <[email protected]> wrote:
> > Hi All,
> >
> > When the medical records are run with the AE as
> > AggregatePlaintextFastUMLSProcessor or AggregatePlainTextProcessor
> > the processing is very slow. It is pretty fast when the smaller
> > files
> > (~2 kb) are fed as input but when I am processing with bigger files
> > say, 2Mb, it is very slow and the files are taking ~5 hours to
> > process. Any pointer will be of great help.
> >
> > Regards,
> > Harish.
> >
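
A note for anyone reading this thread later: the timestamps in the quoted console log already hint at where the time goes. Below is a minimal sketch in Python, not part of cTAKES, that reads such a log and reports the time between consecutive entries. It assumes the console pattern shown in the log4j output (`%d{dd MMM yyyy HH:mm:ss} %5p %c{1} - %m%n`) and English month abbreviations. `parse_events` and `gaps` are hypothetical helper names, and charging each gap to the component that wrote the earlier line is only an approximation. The three sample lines are copied from the log in this thread.

```python
import re
from datetime import datetime

# Assumed line shape: "14 Dec 2017 09:48:45 INFO MaxentParserWrapper - message"
LOG_LINE = re.compile(
    r"(\d{2} \w{3} \d{4} \d{2}:\d{2}:\d{2})\s+[A-Z]+\s+(\S+)\s+-"
)


def parse_events(lines):
    """Return (timestamp, component) for every line that matches LOG_LINE."""
    events = []
    for line in lines:
        # Drop mail quote markers ("> ") and stray progress dots before matching.
        match = LOG_LINE.match(line.lstrip("> ."))
        if match:
            # %b expects English month abbreviations under the default locale.
            stamp = datetime.strptime(match.group(1), "%d %b %Y %H:%M:%S")
            events.append((stamp, match.group(2)))
    return events


def gaps(events):
    """Pair each component with the seconds elapsed until the next log line."""
    return [
        (component, (later - earlier).total_seconds())
        for (earlier, component), (later, _) in zip(events, events[1:])
    ]


# Three lines taken from the quoted log.
SAMPLE = [
    "> 14 Dec 2017 09:43:54 INFO DrugMentionAnnotator - process(JCas)",
    "> 14 Dec 2017 09:48:45 INFO MaxentParserWrapper - Started processing: idd_secondTrial.txt",
    "> 14 Dec 2017 10:20:19 INFO MaxentParserWrapper - Done parsing: idd_secondTrial.txt",
]

# Largest gaps first.
for component, seconds in sorted(gaps(parse_events(SAMPLE)), key=lambda g: -g[1]):
    print(f"{seconds:8.0f} s after a line from {component}")
```

For these three lines the sketch reports 1894 seconds (31 min 34 s) between the two MaxentParserWrapper entries and 291 seconds after the DrugMentionAnnotator entry. Running it over a full log file, for example with `parse_events(open(path))`, would give the same breakdown for every step that writes a timestamped line.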
