You missed the most important part of my message:
> Do not try to use AggregatePlainTextProcessor, it is just slow.

Use AggregatePlaintextFastUMLSProcessor

Tim
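
One way to see where the time goes in a log like the one quoted below is to measure the gaps between consecutive timestamps. This is a minimal sketch in Python and is not part of cTAKES: the function name `stage_gaps` and the regular expression are assumptions made for illustration, the sample lines are copied from the quoted log, and parsing "Dec" assumes an English locale. Each gap is attributed to the component that logged the earlier line, so treat the result as a rough guide only.

```python
import re
from datetime import datetime

# Matches cTAKES log4j lines such as
#   "14 Dec 2017 09:48:45  INFO MaxentParserWrapper - Started processing:"
# and tolerates leading "> " email quote markers.
LOG_LINE = re.compile(
    r"^(?:>\s*)*(\d{1,2} \w{3} \d{4} \d{2}:\d{2}:\d{2})\s+(\w+)\s+(\w+)\s+-"
)


def stage_gaps(lines):
    """Return (seconds, component) pairs, largest gap first.

    Each gap is the time between one log entry and the next, attributed
    to the component that wrote the earlier entry.
    """
    entries = []
    for line in lines:
        match = LOG_LINE.match(line)
        if match:
            stamp = datetime.strptime(match.group(1), "%d %b %Y %H:%M:%S")
            entries.append((stamp, match.group(3)))
    gaps = [
        ((later - earlier).total_seconds(), component)
        for (earlier, component), (later, _) in zip(entries, entries[1:])
    ]
    return sorted(gaps, reverse=True)


# Sample lines copied from the quoted log (wrapped continuation lines omitted).
sample = [
    "> 14 Dec 2017 09:43:48  INFO AbstractJCasTermAnnotator - Starting",
    "> 14 Dec 2017 09:43:54  INFO AbstractJCasTermAnnotator - Finished",
    "> 14 Dec 2017 09:43:54  INFO DrugMentionAnnotator - process(JCas)",
    "> 14 Dec 2017 09:45:32  INFO DrugMentionAnnotator -",
    "> 14 Dec 2017 09:47:58  INFO DrugMentionAnnotator -",
    "> 14 Dec 2017 09:48:45  INFO MaxentParserWrapper - Started processing:",
    "> 14 Dec 2017 10:20:19  INFO MaxentParserWrapper - Done parsing:",
]

for seconds, component in stage_gaps(sample)[:3]:
    print(f"{seconds:7.0f} s  {component}")
```

On these sample lines the largest gap falls between the two MaxentParserWrapper entries (09:48:45 to 10:20:19).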


On Thu, 2017-12-14 at 16:15 +0000, Yadav, Harish wrote:
> Hi Timothy,
> 
> I fixed the password issues and ran the AE
> AggregatePlainTextProcessor with -Xms6g -Xmx6g, but it still takes a
> long time (more than 2 hours) for a single 2 MB file. I checked the
> memory consumption of the process and it never goes above 4.5 GB, so
> I am not sure memory is the issue. The same AE processes a 2 KB file
> in ~11 seconds, but most of our files are several MB, so more than
> 2 hours per file is not feasible.
> 
> Could you please suggest something that may improve the performance?
> Below are the logs from processing the 2 MB file with
> AggregatePlainTextProcessor:
> 
> 
> 
> Logs:
> 
> C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0>java -cp
> "C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-
> 4.0.0\desc\;C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-
> 4.0.0\resources\;C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-
> 4.0.0\lib\*" -Dlog4j.configuration=file:\C:\New_Drive\apache-ctakes-
> 4.0.0-bin\apache-ctakes-4.0.0\config\log4j.xml -Xms6g -Xmx6g
> org.apache.uima.tools.cpm.CpmFrame
> Dec 14, 2017 9:40:25 AM java.util.prefs.WindowsPreferences <init>
> WARNING: Could not open/create prefs root node
> Software\JavaSoft\Prefs at root 0x80000002. Windows
> RegCreateKeyEx(...) returned error code 5.
> log4j: reset attribute= "false".
> log4j: Threshold ="null".
> log4j: Retreiving an instance of org.apache.log4j.Logger.
> log4j: Setting [ProgressAppender] additivity to [false].
> log4j: Level value for ProgressAppender is  [INFO].
> log4j: ProgressAppender level set to INFO
> log4j: Class name: [org.apache.log4j.ConsoleAppender]
> log4j: Parsing layout of class: "org.apache.log4j.PatternLayout"
> log4j: Setting property [conversionPattern] to [%m].
> log4j: Adding appender named [noEolAppender] to category
> [ProgressAppender].
> log4j: Retreiving an instance of org.apache.log4j.Logger.
> log4j: Setting [ProgressDone] additivity to [false].
> log4j: Level value for ProgressDone is  [INFO].
> log4j: ProgressDone level set to INFO
> log4j: Class name: [org.apache.log4j.ConsoleAppender]
> log4j: Parsing layout of class: "org.apache.log4j.PatternLayout"
> log4j: Setting property [conversionPattern] to [%m%n].
> log4j: Adding appender named [eolAppender] to category
> [ProgressDone].
> log4j: Level value for root is  [INFO].
> log4j: root level set to INFO
> log4j: Class name: [org.apache.log4j.ConsoleAppender]
> log4j: Parsing layout of class: "org.apache.log4j.PatternLayout"
> log4j: Setting property [conversionPattern] to [%d{dd MMM yyyy
> HH:mm:ss} %5p %c{1} - %m%n].
> log4j: Adding appender named [consoleAppender] to category [root].
> 14 Dec 2017 09:42:09  INFO Chunker - Chunker model file:
> org/apache/ctakes/chunker/models/chunker-model.zip
> 14 Dec 2017 09:42:10  INFO TokenizerAnnotatorPTB - Initializing
> org.apache.ctakes.core.ae.TokenizerAnnotatorPTB
> 14 Dec 2017 09:42:10  INFO ContextDependentTokenizerAnnotator -
> Finite state machines loaded.
> 14 Dec 2017 09:42:10  INFO AbstractJCasTermAnnotator - Using
> dictionary lookup window type:
> org.apache.ctakes.typesystem.type.textspan.Sentence
> 14 Dec 2017 09:42:10  INFO AbstractJCasTermAnnotator - Exclusion
> tagset loaded: CC CD DT EX IN LS MD PDT POS PP PP$ PRP PRP$ RP TO VB
> VBD VBG VBN VBP VBZ WDT WP WPS WRB
> 14 Dec 2017 09:42:10  INFO AbstractJCasTermAnnotator - Using minimum
> term text span: 3
> 14 Dec 2017 09:42:10  INFO AbstractJCasTermAnnotator - Using
> Dictionary Descriptor:
> org/apache/ctakes/dictionary/lookup/fast/sno_rx_16ab.xml
> 14 Dec 2017 09:42:10  INFO DictionaryDescriptorParser - Parsing
> dictionary specifications:
> 14 Dec 2017 09:42:10  INFO UmlsUserApprover - Checking UMLS Account
> at https://urldefense.proofpoint.com/v2/url?u=https-3A__uts-2Dws.nlm.
> nih.gov_restful_isValidUMLSUser&d=DwIGaQ&c=qS4goWBT7poplM69zy_3xhKwEW
> 14JZMSdioCoppxeFU&r=Heup-IbsIg9Q1TPOylpP9FE4GTK-
> OqdTDRRNQXipowRLRjx0ibQrHEo8uYx6674h&m=gE2jjaOVTYONTnzEtl8mA4LBRUcQvC
> EkIgDc6DU1Nbw&s=sqr66ew_JEhLww9qWi-re1b6LLKYW49FjyfEi8IGPIE&e= for
> user harish1234:
> .14 Dec 2017 09:42:11  INFO UmlsUserApprover -   UMLS Account at http
> s://urldefense.proofpoint.com/v2/url?u=https-3A__uts-
> 2Dws.nlm.nih.gov_restful_isValidUMLSUser&d=DwIGaQ&c=qS4goWBT7poplM69z
> y_3xhKwEW14JZMSdioCoppxeFU&r=Heup-IbsIg9Q1TPOylpP9FE4GTK-
> OqdTDRRNQXipowRLRjx0ibQrHEo8uYx6674h&m=gE2jjaOVTYONTnzEtl8mA4LBRUcQvC
> EkIgDc6DU1Nbw&s=sqr66ew_JEhLww9qWi-re1b6LLKYW49FjyfEi8IGPIE&e= for
> user harish1234 has been validated
> 
> 14 Dec 2017 09:42:11  INFO JdbcConnectionFactory - Connecting to
> jdbc:hsqldb:file:resources/org/apache/ctakes/dictionary/lookup/fast/s
> no_rx_16ab/sno_rx_16ab:
> 14 Dec 2017 09:42:11  INFO ENGINE - open start - state not modified
> ..................
> 14 Dec 2017 09:42:17  INFO JdbcConnectionFactory -  Database
> connected
> 14 Dec 2017 09:42:17  INFO JdbcRareWordDictionary - Connected to cui
> and term table CUI_TERMS
> 14 Dec 2017 09:42:17  INFO JdbcConceptFactory - Connected to concept
> table TUI with class TUI
> 14 Dec 2017 09:42:17  INFO JdbcConceptFactory - Connected to concept
> table RXNORM with class LONG
> 14 Dec 2017 09:42:17  INFO JdbcConceptFactory - Connected to concept
> table PREFTERM with class PREFTERM
> 14 Dec 2017 09:42:17  INFO JdbcConceptFactory - Connected to concept
> table SNOMEDCT_US with class LONG
> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using left , right
> scope sizes: 10 , 10
> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using scope order:
> LEFT,RIGHT
> 14 Dec 2017 09:42:17  INFO ContextAnnotator - SCOPE ORDER: [1, 3]
> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using context analyzer:
> org.apache.ctakes.necontexts.status.StatusContextAnalyzer
> 14 Dec 2017 09:42:17  INFO StatusContextAnalyzer - initBoundaryData()
> called for ContextInitializer
> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using context consumer:
> org.apache.ctakes.necontexts.status.StatusContextHitConsumer
> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using lookup window
> type: org.apache.ctakes.typesystem.type.textspan.Sentence
> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using focus type:
> org.apache.ctakes.typesystem.type.textsem.IdentifiedAnnotation
> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using context type:
> org.apache.ctakes.typesystem.type.syntax.BaseToken
> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using left , right
> scope sizes: 7 , 7
> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using scope order:
> LEFT,RIGHT
> 14 Dec 2017 09:42:17  INFO ContextAnnotator - SCOPE ORDER: [1, 3]
> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using context analyzer:
> org.apache.ctakes.necontexts.negation.NegationContextAnalyzer
> 14 Dec 2017 09:42:17  INFO NegationContextAnalyzer -
> initBoundaryData() called for ContextInitializer
> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using context consumer:
> org.apache.ctakes.necontexts.negation.NegationContextHitConsumer
> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using lookup window
> type: org.apache.ctakes.typesystem.type.textspan.Sentence
> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using focus type:
> org.apache.ctakes.typesystem.type.textsem.IdentifiedAnnotation
> 14 Dec 2017 09:42:17  INFO ContextAnnotator - Using context type:
> org.apache.ctakes.typesystem.type.syntax.BaseToken
> 14 Dec 2017 09:42:17  INFO SentenceDetector - Sentence detector model
> file: org/apache/ctakes/core/sentdetect/sd-med-model.zip
> 14 Dec 2017 09:42:17  INFO POSTagger - POS tagger model file:
> org/apache/ctakes/postagger/models/mayo-pos.zip
> 14 Dec 2017 09:42:18  INFO LvgCmdApiResourceImpl - Loading NLM Norm
> and Lvg with config file = C:\New_Drive\apache-ctakes-4.0.0-
> bin\apache-ctakes-
> 4.0.0\resources\org\apache\ctakes\lvg\data\config\lvg.properties
> 14 Dec 2017 09:42:18  INFO LvgCmdApiResourceImpl -   config file
> absolute path = C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-
> 4.0.0\resources\org\apache\ctakes\lvg\data\config\lvg.properties
> 14 Dec 2017 09:42:18  INFO LvgCmdApiResourceImpl - cwd =
> C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0
> 14 Dec 2017 09:42:18  INFO LvgCmdApiResourceImpl - cd
> C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-
> 4.0.0\resources\org\apache\ctakes\lvg\
> 14 Dec 2017 09:42:18  INFO ENGINE - open start - state not modified
> 14 Dec 2017 09:42:18  INFO ENGINE - dataFileCache open start
> 14 Dec 2017 09:42:18  INFO ENGINE - dataFileCache open end
> 14 Dec 2017 09:42:18  INFO LvgCmdApiResourceImpl - cd
> C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0
> 14 Dec 2017 09:42:18  INFO DrugMentionAnnotator - Finite state
> machines loaded.
> 14 Dec 2017 09:42:23  INFO ClearNLPDependencyParserAE - using Morphy
> analysis? true
> Loading configuration.
> Loading feature templates.
> Loading lexica.
> Loading model:
> .....................................................................
> ...................
> Loading configuration.
> Loading feature templates.
> Loading model:
> .
> Loading configuration.
> Loading feature templates.
> Loading lexica.
> Loading model:
> ...
> <various Loading model>
> .
> Loading configuration.
> Loading feature templates.
> Loading lexica.
> Loading model:
> ................................
> Loading model:
> .............................
> 14 Dec 2017 09:42:32  INFO ConstituencyParser - Initializing
> parser...
> 14 Dec 2017 09:42:33  INFO SentenceDetector - Starting processing.
> 14 Dec 2017 09:42:34  INFO TokenizerAnnotatorPTB - process(JCas) in
> org.apache.ctakes.core.ae.TokenizerAnnotatorPTB
> 14 Dec 2017 09:42:36  INFO LvgAnnotator - process(JCas)
> 14 Dec 2017 09:42:55  INFO ContextDependentTokenizerAnnotator -
> process(JCas)
> 14 Dec 2017 09:42:58  INFO POSTagger - process(JCas)
> 14 Dec 2017 09:43:10  INFO Chunker -  process(JCas)
> 14 Dec 2017 09:43:46  INFO ChunkAdjuster -  process(JCas)
> 14 Dec 2017 09:43:47  INFO ChunkAdjuster -  process(JCas)
> 14 Dec 2017 09:43:48  INFO AbstractJCasTermAnnotator - Starting
> processing
> 14 Dec 2017 09:43:54  INFO AbstractJCasTermAnnotator - Finished
> processing
> 14 Dec 2017 09:43:54  INFO DrugMentionAnnotator - process(JCas)
> 14 Dec 2017 09:45:32  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:45:32  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:45:32  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:45:32  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:45:33  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:45:33  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:45:33  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:45:34  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:45:38  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:45:39  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:45:42  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:45:43  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:45:45  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:45:48  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:45:48  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:45:50  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:45:50  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:45:53  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:45:54  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:45:59  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:00  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:04  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:04  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:05  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:06  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:08  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:09  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:09  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:11  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:16  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:24  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:27  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:30  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:32  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:35  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:38  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:45  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:46  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:46  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:53  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:54  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:47:02  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:47:22  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:47:24  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:47:28  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:47:29  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:47:34  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:47:38  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:47:46  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:47:49  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:47:54  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:47:54  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:47:58  INFO DrugMentionAnnotator -
> 14 Dec 2017 09:48:45  INFO MaxentParserWrapper - Started processing:
> idd_secondTrial.txt
> 14 Dec 2017 10:20:19  INFO MaxentParserWrapper - Done parsing:
> idd_secondTrial.txt
> 
> 
> 
> 
> 
> 
> 
> Regards,
> Harish.
> 
> 
> -----Original Message-----
> From: Miller, Timothy [mailto:[email protected]
> Sent: Thursday, December 14, 2017 9:16 AM
> To: [email protected]
> Subject: Re: Slowness in processing files [EXTERNAL]
> 
> Do not try to use AggregatePlainTextProcessor, it is just slow.
> Use the fast version and debug the password issues.
> Make sure you have your UMLS credentials set in:
> $CTAKES_ROOT/resources/org/apache/ctakes/dictionary/lookup/fast/sno_rx_16ab.xml
> 
> in two different places.
> 
> Tim
> 
> 
> 
> On Thu, 2017-12-14 at 02:36 +0000, Yadav, Harish wrote:
> > 
> > Hi James,
> >  
> > Thanks for responding.
> >  
> > A single 2 MB file is taking ~5 hours to process with
> > AggregatePlainTextProcessor. This is the command line, including
> > the JVM memory arguments:
> >  
> > C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0>java
> > -Dctakes.umlsuser="XXXXXXX"┬á -Dctakes.umlspw="XXXXXXXX" -cp 
> > "C:\New_Drive\apache-ctakes-4.0.0-bi
> > apache-ctakes-4.0.0\desc\;C:\New_Drive\apache-ctakes-4.0.0-
> > bin\apache-ctakes-4.0.0\resources\;C:\New_Drive\apache-ctakes-
> > 4.0.0-
> > bin\apache-ctakes-4.0.0\lib\*" -Dlog4j.
> > nfiguration=file:\C:\New_Drive\apache-ctakes-4.0.0-bin\apache-
> > ctakes-
> > 4.0.0\config\log4j.xml -Xms512M -Xmx3g 
> > org.apache.uima.tools.cpm.CpmFrame
> >  
> > Also, just now I tried to process the file with the AE
> > AggregatePlaintextFastUMLSProcessor but ran into a different
> > problem: an authentication error, with the same username and
> > password that work in AggregatePlainTextProcessor.
> >  
> > I can run AggregatePlaintextFastUMLSProcessor with -Xms5g and
> > -Xmx5g, if you could please let me know how it is possible that
> > AggregatePlainTextProcessor runs fine with the above username and
> > password, while AggregatePlaintextFastUMLSProcessor throws the
> > exception below with the same username and password.
> >  
> > Exception:
> >  
> >  C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0>java
> > -Dctakes.umlsuser="XXXXXXX"┬á -Dctakes.umlspw="XXXXXX" -cp 
> > "C:\New_Drive\apache-ctakes-4.0.0-bin\ apache-ctakes-
> > 4.0.0\desc\;C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-
> > 4.0.0\resources\;C:\New_Drive\apache-ctakes-4.0.0-bin\apache-
> > ctakes-
> > 4.0.0\lib\*" -Dlog4j.co nfiguration=file:\C:\New_Drive\apache-
> > ctakes-
> > 4.0.0-bin\apache-ctakes-4.0.0\config\log4j.xml -Xms512M -Xmx3g 
> > org.apache.uima.tools.cpm.CpmFrame
> > Dec 13, 2017 9:01:20 PM java.util.prefs.WindowsPreferences <init>
> > WARNING: Could not open/create prefs root node
> > Software\JavaSoft\Prefs at root 0x80000002. Windows
> > RegCreateKeyEx(...) returned error code 5.
> > log4j: attributes....
> > 13 Dec 2017 21:04:58  INFO Chunker - Chunker model file:
> > org/apache/ctakes/chunker/models/chunker-model.zip
> > 13 Dec 2017 21:05:00  INFO TokenizerAnnotatorPTB - Initializing
> > org.apache.ctakes.core.ae.TokenizerAnnotatorPTB
> > 13 Dec 2017 21:05:00  INFO ContextDependentTokenizerAnnotator -
> > Finite state machines loaded.
> > 13 Dec 2017 21:05:00  INFO AbstractJCasTermAnnotator - Using
> > dictionary lookup window type:
> > org.apache.ctakes.typesystem.type.textspan.Sentence
> > 13 Dec 2017 21:05:00  INFO AbstractJCasTermAnnotator - Exclusion
> > tagset loaded: CC CD DT EX IN LS MD PDT POS PP PP$ PRP PRP$ RP TO VB
> > VBD VBG VBN VBP VBZ WDT WP WPS WRB
> > 13 Dec 2017 21:05:00  INFO AbstractJCasTermAnnotator - Using minimum
> > term text span: 3
> > 13 Dec 2017 21:05:00  INFO AbstractJCasTermAnnotator - Using
> > Dictionary Descriptor:
> > org/apache/ctakes/dictionary/lookup/fast/sno_rx_16ab.xml
> > 13 Dec 2017 21:05:00  INFO DictionaryDescriptorParser - Parsing
> > dictionary specifications:
> > 13 Dec 2017 21:05:00  INFO UmlsUserApprover - Checking UMLS Account
> > at https://urldefense.proofpoint.com/v2/url?u=https-3A__uts-
> > 2Dws.nlm.nih.go&d=DwIGaQ&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppx
> > eFU&r=Heup-IbsIg9Q1TPOylpP9FE4GTK-
> > OqdTDRRNQXipowRLRjx0ibQrHEo8uYx6674h&m=gE2jjaOVTYONTnzEtl8mA4LBRUcQ
> > vCEkIgDc6DU1Nbw&s=v_ivdTVH9oojQd-0bohfzxVCl5UxJlSZ5FUfi7qnmxo&e= 
> > v/restful/isValidUMLSUser for user harish1234-ß:
> > ....13 Dec 2017 21:05:02 ERROR UmlsUserApprover -   UMLS Account at
> > https://urldefense.proofpoint.com/v2/url?u=https-3A__uts-
> > 2Dws.nl&d=DwIGaQ&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=He
> > up-IbsIg9Q1TPOylpP9FE4GTK-
> > OqdTDRRNQXipowRLRjx0ibQrHEo8uYx6674h&m=gE2jjaOVTYONTnzEtl8mA4LBRUcQ
> > vCEkIgDc6DU1Nbw&s=U8OuKgmE0YMDPABaTm39jDFIXG4tnVEeE1rrCS03cbM&e= 
> > m.nih.gov/restful/isValidUMLSUser is not valid for user XXXXXXX-ß
> > with XXXXXXX
> > org.apache.uima.resource.ResourceInitializationException: Initialization of CAS Processor with name "AggregatePlaintextFastUMLSProcessor" failed.
> >         at org.apache.uima.collection.impl.CollectionProcessingEngine_impl.initialize(CollectionProcessingEngine_impl.java:81)
> >         at org.apache.uima.impl.UIMAFramework_impl._produceCollectionProcessingEngine(UIMAFramework_impl.java:420)
> >         at org.apache.uima.UIMAFramework.produceCollectionProcessingEngine(UIMAFramework.java:918)
> >         at org.apache.uima.tools.cpm.CpmPanel.startProcessing(CpmPanel.java:573)
> >         at org.apache.uima.tools.cpm.CpmPanel.access$000(CpmPanel.java:105)
> >         at org.apache.uima.tools.cpm.CpmPanel$1.run(CpmPanel.java:713)
> > Caused by: org.apache.uima.resource.ResourceConfigurationException: Initialization of CAS Processor with name "AggregatePlaintextFastUMLSProcessor" failed.
> >         at org.apache.uima.collection.impl.cpm.container.CPEFactory.produceIntegratedCasProcessor(CPEFactory.java:1101)
> >         at org.apache.uima.collection.impl.cpm.container.CPEFactory.getCasProcessors(CPEFactory.java:547)
> >         at org.apache.uima.collection.impl.cpm.BaseCPMImpl.init(BaseCPMImpl.java:253)
> >         at org.apache.uima.collection.impl.cpm.BaseCPMImpl.<init>(BaseCPMImpl.java:127)
> >         at org.apache.uima.collection.impl.CollectionProcessingEngine_impl.initialize(CollectionProcessingEngine_impl.java:73)
> >         ... 5 more
> > Caused by: org.apache.uima.resource.ResourceInitializationException: Initialization of annotator class "org.apache.ctakes.dictionary.lookup2.ae.DefaultJCasTermAnnotator" failed.  (Descriptor: file:/C:/New_Drive/apache-ctakes-4.0.0-bin/apache-ctakes-4.0.0/desc/ctakes-dictionary-lookup-fast/desc/analysis_engine/UmlsLookupAnnotator.xml)
> >         at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initializeAnalysisComponent(PrimitiveAnalysisEngine_impl.java:271)
> >         at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initialize(PrimitiveAnalysisEngine_impl.java:170)
> >         at org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94)
> >         at org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62)
> >         at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:279)
> >         at org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:407)
> >         at org.apache.uima.analysis_engine.asb.impl.ASB_impl.setup(ASB_impl.java:256)
> >         at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initASB(AggregateAnalysisEngine_impl.java:429)
> >         at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initializeAggregateAnalysisEngine(AggregateAnalysisEngine_impl.java:373)
> >         at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initialize(AggregateAnalysisEngine_impl.java:186)
> >         at org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94)
> >         at org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62)
> >         at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:279)
> >         at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:331)
> >         at org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:448)
> >         at org.apache.uima.collection.impl.cpm.container.CPEFactory.produceIntegratedCasProcessor(CPEFactory.java:1085)
> >         ... 9 more
> > Caused by: org.apache.uima.resource.ResourceInitializationException: MESSAGE LOCALIZATION FAILED: Can't find resource for bundle java.util.PropertyResourceBundle, key Could not construct org.apache.ctakes.dictionary.lookup2.dictionary.UmlsJdbcRareWordDictionary
> >         at org.apache.ctakes.dictionary.lookup2.ae.AbstractJCasTermAnnotator.initialize(AbstractJCasTermAnnotator.java:131)
> >         at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initializeAnalysisComponent(PrimitiveAnalysisEngine_impl.java:266)
> >         ... 24 more
> > Caused by: org.apache.uima.analysis_engine.annotator.AnnotatorContextException: MESSAGE LOCALIZATION FAILED: Can't find resource for bundle java.util.PropertyResourceBundle, key Could not construct org.apache.ctakes.dictionary.lookup2.dictionary.UmlsJdbcRareWordDictionary
> >         at org.apache.ctakes.dictionary.lookup2.dictionary.DictionaryDescriptorParser.parseDictionary(DictionaryDescriptorParser.java:199)
> >         at org.apache.ctakes.dictionary.lookup2.dictionary.DictionaryDescriptorParser.parseDictionaries(DictionaryDescriptorParser.java:156)
> >         at org.apache.ctakes.dictionary.lookup2.dictionary.DictionaryDescriptorParser.parseDescriptor(DictionaryDescriptorParser.java:128)
> >         at org.apache.ctakes.dictionary.lookup2.ae.AbstractJCasTermAnnotator.initialize(AbstractJCasTermAnnotator.java:129)
> >         ... 25 more
> > Caused by: java.lang.reflect.InvocationTargetException
> >         at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> >         at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
> >         at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
> >         at java.lang.reflect.Constructor.newInstance(Unknown Source)
> >         at org.apache.ctakes.dictionary.lookup2.dictionary.DictionaryDescriptorParser.parseDictionary(DictionaryDescriptorParser.java:196)
> >         ... 28 more
> > Caused by: java.sql.SQLException: Invalid User for UMLS dictionary sno_rx_16abTerms
> >         at org.apache.ctakes.dictionary.lookup2.dictionary.UmlsJdbcRareWordDictionary.<init>(UmlsJdbcRareWordDictionary.java:29)
> >         ... 33 more
> >  
> >  
> >  
> > From: James Masanz [mailto:[email protected]]
> > Sent: Wednesday, December 13, 2017 8:56 PM
> > To: [email protected]
> > Subject: Re: Slowness in processing files
> >  
> > Using AggregatePlaintextFastUMLSProcessor  is much faster than 
> > AggregatePlainTextProcessor, so I suggest that to start with you
> > just 
> > use AggregatePlaintextFastUMLSProcessor.
> >  
> > Do you mean it is taking ~5 hours for a single file to be processed
> > at 
> > times, or is that for a set of files?
> >  
> > If your JVM heap space is not set large enough, you can get very
> > slow results.
> > Try increasing it to 5G (or more) using the JVM parameter -Xmx5G.
> > For faster start up, you can also set -Xms to the same value as
> > -Xmx, or something close to it.
> >  
> >  -- James
> >  
> > On Wed, Dec 13, 2017 at 7:04 PM, Yadav, Harish <[email protected]
> > >
> > wrote:
> > Hi All,
> >  
> > When the medical records are run with the AE
> > AggregatePlaintextFastUMLSProcessor or AggregatePlainTextProcessor,
> > processing is very slow. It is pretty fast when smaller files
> > (~2 KB) are fed as input, but with bigger files, say 2 MB, it is
> > very slow and the files take ~5 hours to process. Any pointers
> > would be a great help.
> >  
> > Regards,
> > Harish.
> >  
