You missed the most important part of my message:
> Do not try to use AggregatePlainTextProcessor, it is just slow.
Use AggregatePlaintextFastUMLSProcessor instead.
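
The timestamps in the log quoted below show where the time goes: the gap between the MaxentParserWrapper "Started processing" and "Done parsing" lines alone is over half an hour. Here is a rough sketch for pulling such gaps out of a log. It assumes each entry starts with a "dd MMM yyyy HH:mm:ss LEVEL Component -" prefix, as in the lines below. The stage_gaps helper and its regular expression are illustrative only and not part of cTAKES.

```python
import re
from datetime import datetime

# Assumed entry prefix, e.g. "14 Dec 2017 09:48:45 INFO MaxentParserWrapper -"
LINE = re.compile(r"^(\d{2} \w{3} \d{4} \d{2}:\d{2}:\d{2})\s+\w+\s+(\S+) -")


def stage_gaps(lines):
    """Return (component, seconds until the next timestamped line) pairs.

    The gap is only a proxy for stage time: it is charged to whichever
    component wrote the earlier of two consecutive timestamped lines.
    """
    stamped = []
    for line in lines:
        # Drop mail quoting ("> ") and surrounding whitespace before matching.
        m = LINE.match(line.lstrip("> ").strip())
        if m:
            # %b expects English month names under the default C locale.
            when = datetime.strptime(m.group(1), "%d %b %Y %H:%M:%S")
            stamped.append((when, m.group(2)))
    return [
        (name, int((nxt - when).total_seconds()))
        for (when, name), (nxt, _) in zip(stamped, stamped[1:])
    ]


# Excerpt of entries from the quoted log (wrapped lines rejoined).
sample = [
    "14 Dec 2017 09:43:54 INFO DrugMentionAnnotator - process(JCas)",
    "14 Dec 2017 09:45:32 INFO DrugMentionAnnotator -",
    "14 Dec 2017 09:47:58 INFO DrugMentionAnnotator -",
    "14 Dec 2017 09:48:45 INFO MaxentParserWrapper - Started processing: idd_secondTrial.txt",
    "14 Dec 2017 10:20:19 INFO MaxentParserWrapper - Done parsing: idd_secondTrial.txt",
]

for component, seconds in stage_gaps(sample):
    print(f"{component:25s} {seconds:6d} s")
```

On this excerpt the last gap is 1894 seconds (about 31.5 minutes), between the two MaxentParserWrapper lines.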
Tim
On Thu, 2017-12-14 at 16:15 +0000, Yadav, Harish wrote:
> Hi Timothy,
>
> I fixed the password issues and ran the AE
> AggregatePlainTextProcessor with -Xms6g -Xmx6g, but it still takes a
> long time (more than 2 hours) for a single 2 MB file. I checked the
> memory consumption of the process and it never goes above 4.5 GB, so
> I am not sure memory is the issue. AggregatePlainTextProcessor
> processes a 2 KB file in ~11 seconds, but most of our files are in
> the megabyte range, so a processing time of more than 2 hours per
> file is not feasible.
>
> Could you please suggest something that may improve the performance?
> Below are the logs for processing the 2 MB file with
> AggregatePlainTextProcessor:
>
>
>
> Logs:
>
> C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0>java -cp
> "C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-
> 4.0.0\desc\;C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-
> 4.0.0\resources\;C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-
> 4.0.0\lib\*" -Dlog4j.configuration=file:\C:\New_Drive\apache-ctakes-
> 4.0.0-bin\apache-ctakes-4.0.0\config\log4j.xml -Xms6g -Xmx6g
> org.apache.uima.tools.cpm.CpmFrame
> Dec 14, 2017 9:40:25 AM java.util.prefs.WindowsPreferences <init>
> WARNING: Could not open/create prefs root node
> Software\JavaSoft\Prefs at root 0x80000002. Windows
> RegCreateKeyEx(...) returned error code 5.
> log4j: reset attribute= "false".
> log4j: Threshold ="null".
> log4j: Retreiving an instance of org.apache.log4j.Logger.
> log4j: Setting [ProgressAppender] additivity to [false].
> log4j: Level value for ProgressAppender is [INFO].
> log4j: ProgressAppender level set to INFO
> log4j: Class name: [org.apache.log4j.ConsoleAppender]
> log4j: Parsing layout of class: "org.apache.log4j.PatternLayout"
> log4j: Setting property [conversionPattern] to [%m].
> log4j: Adding appender named [noEolAppender] to category
> [ProgressAppender].
> log4j: Retreiving an instance of org.apache.log4j.Logger.
> log4j: Setting [ProgressDone] additivity to [false].
> log4j: Level value for ProgressDone is [INFO].
> log4j: ProgressDone level set to INFO
> log4j: Class name: [org.apache.log4j.ConsoleAppender]
> log4j: Parsing layout of class: "org.apache.log4j.PatternLayout"
> log4j: Setting property [conversionPattern] to [%m%n].
> log4j: Adding appender named [eolAppender] to category
> [ProgressDone].
> log4j: Level value for root is [INFO].
> log4j: root level set to INFO
> log4j: Class name: [org.apache.log4j.ConsoleAppender]
> log4j: Parsing layout of class: "org.apache.log4j.PatternLayout"
> log4j: Setting property [conversionPattern] to [%d{dd MMM yyyy
> HH:mm:ss} %5p %c{1} - %m%n].
> log4j: Adding appender named [consoleAppender] to category [root].
> 14 Dec 2017 09:42:09 INFO Chunker - Chunker model file:
> org/apache/ctakes/chunker/models/chunker-model.zip
> 14 Dec 2017 09:42:10 INFO TokenizerAnnotatorPTB - Initializing
> org.apache.ctakes.core.ae.TokenizerAnnotatorPTB
> 14 Dec 2017 09:42:10 INFO ContextDependentTokenizerAnnotator -
> Finite state machines loaded.
> 14 Dec 2017 09:42:10 INFO AbstractJCasTermAnnotator - Using
> dictionary lookup window type:
> org.apache.ctakes.typesystem.type.textspan.Sentence
> 14 Dec 2017 09:42:10 INFO AbstractJCasTermAnnotator - Exclusion
> tagset loaded: CC CD DT EX IN LS MD PDT POS PP PP$ PRP PRP$ RP TO VB
> VBD VBG VBN VBP VBZ WDT WP WPS WRB
> 14 Dec 2017 09:42:10 INFO AbstractJCasTermAnnotator - Using minimum
> term text span: 3
> 14 Dec 2017 09:42:10 INFO AbstractJCasTermAnnotator - Using
> Dictionary Descriptor:
> org/apache/ctakes/dictionary/lookup/fast/sno_rx_16ab.xml
> 14 Dec 2017 09:42:10 INFO DictionaryDescriptorParser - Parsing
> dictionary specifications:
> 14 Dec 2017 09:42:10 INFO UmlsUserApprover - Checking UMLS Account
> at https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser for user
> harish1234:
> .14 Dec 2017 09:42:11 INFO UmlsUserApprover - UMLS Account at
> https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser for user
> harish1234 has been validated
>
> 14 Dec 2017 09:42:11 INFO JdbcConnectionFactory - Connecting to
> jdbc:hsqldb:file:resources/org/apache/ctakes/dictionary/lookup/fast/s
> no_rx_16ab/sno_rx_16ab:
> 14 Dec 2017 09:42:11 INFO ENGINE - open start - state not modified
> ..................
> 14 Dec 2017 09:42:17 INFO JdbcConnectionFactory - Database
> connected
> 14 Dec 2017 09:42:17 INFO JdbcRareWordDictionary - Connected to cui
> and term table CUI_TERMS
> 14 Dec 2017 09:42:17 INFO JdbcConceptFactory - Connected to concept
> table TUI with class TUI
> 14 Dec 2017 09:42:17 INFO JdbcConceptFactory - Connected to concept
> table RXNORM with class LONG
> 14 Dec 2017 09:42:17 INFO JdbcConceptFactory - Connected to concept
> table PREFTERM with class PREFTERM
> 14 Dec 2017 09:42:17 INFO JdbcConceptFactory - Connected to concept
> table SNOMEDCT_US with class LONG
> 14 Dec 2017 09:42:17 INFO ContextAnnotator - Using left , right
> scope sizes: 10 , 10
> 14 Dec 2017 09:42:17 INFO ContextAnnotator - Using scope order:
> LEFT,RIGHT
> 14 Dec 2017 09:42:17 INFO ContextAnnotator - SCOPE ORDER: [1, 3]
> 14 Dec 2017 09:42:17 INFO ContextAnnotator - Using context analyzer:
> org.apache.ctakes.necontexts.status.StatusContextAnalyzer
> 14 Dec 2017 09:42:17 INFO StatusContextAnalyzer - initBoundaryData()
> called for ContextInitializer
> 14 Dec 2017 09:42:17 INFO ContextAnnotator - Using context consumer:
> org.apache.ctakes.necontexts.status.StatusContextHitConsumer
> 14 Dec 2017 09:42:17 INFO ContextAnnotator - Using lookup window
> type: org.apache.ctakes.typesystem.type.textspan.Sentence
> 14 Dec 2017 09:42:17 INFO ContextAnnotator - Using focus type:
> org.apache.ctakes.typesystem.type.textsem.IdentifiedAnnotation
> 14 Dec 2017 09:42:17 INFO ContextAnnotator - Using context type:
> org.apache.ctakes.typesystem.type.syntax.BaseToken
> 14 Dec 2017 09:42:17 INFO ContextAnnotator - Using left , right
> scope sizes: 7 , 7
> 14 Dec 2017 09:42:17 INFO ContextAnnotator - Using scope order:
> LEFT,RIGHT
> 14 Dec 2017 09:42:17 INFO ContextAnnotator - SCOPE ORDER: [1, 3]
> 14 Dec 2017 09:42:17 INFO ContextAnnotator - Using context analyzer:
> org.apache.ctakes.necontexts.negation.NegationContextAnalyzer
> 14 Dec 2017 09:42:17 INFO NegationContextAnalyzer -
> initBoundaryData() called for ContextInitializer
> 14 Dec 2017 09:42:17 INFO ContextAnnotator - Using context consumer:
> org.apache.ctakes.necontexts.negation.NegationContextHitConsumer
> 14 Dec 2017 09:42:17 INFO ContextAnnotator - Using lookup window
> type: org.apache.ctakes.typesystem.type.textspan.Sentence
> 14 Dec 2017 09:42:17 INFO ContextAnnotator - Using focus type:
> org.apache.ctakes.typesystem.type.textsem.IdentifiedAnnotation
> 14 Dec 2017 09:42:17 INFO ContextAnnotator - Using context type:
> org.apache.ctakes.typesystem.type.syntax.BaseToken
> 14 Dec 2017 09:42:17 INFO SentenceDetector - Sentence detector model
> file: org/apache/ctakes/core/sentdetect/sd-med-model.zip
> 14 Dec 2017 09:42:17 INFO POSTagger - POS tagger model file:
> org/apache/ctakes/postagger/models/mayo-pos.zip
> 14 Dec 2017 09:42:18 INFO LvgCmdApiResourceImpl - Loading NLM Norm
> and Lvg with config file = C:\New_Drive\apache-ctakes-4.0.0-
> bin\apache-ctakes-
> 4.0.0\resources\org\apache\ctakes\lvg\data\config\lvg.properties
> 14 Dec 2017 09:42:18 INFO LvgCmdApiResourceImpl - config file
> absolute path = C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-
> 4.0.0\resources\org\apache\ctakes\lvg\data\config\lvg.properties
> 14 Dec 2017 09:42:18 INFO LvgCmdApiResourceImpl - cwd =
> C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0
> 14 Dec 2017 09:42:18 INFO LvgCmdApiResourceImpl - cd
> C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-
> 4.0.0\resources\org\apache\ctakes\lvg\
> 14 Dec 2017 09:42:18 INFO ENGINE - open start - state not modified
> 14 Dec 2017 09:42:18 INFO ENGINE - dataFileCache open start
> 14 Dec 2017 09:42:18 INFO ENGINE - dataFileCache open end
> 14 Dec 2017 09:42:18 INFO LvgCmdApiResourceImpl - cd
> C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0
> 14 Dec 2017 09:42:18 INFO DrugMentionAnnotator - Finite state
> machines loaded.
> 14 Dec 2017 09:42:23 INFO ClearNLPDependencyParserAE - using Morphy
> analysis? true
> Loading configuration.
> Loading feature templates.
> Loading lexica.
> Loading model:
> .....................................................................
> ...................
> Loading configuration.
> Loading feature templates.
> Loading model:
> .
> Loading configuration.
> Loading feature templates.
> Loading lexica.
> Loading model:
> ...
> <various Loading model>
> .
> Loading configuration.
> Loading feature templates.
> Loading lexica.
> Loading model:
> ................................
> Loading model:
> .............................
> 14 Dec 2017 09:42:32 INFO ConstituencyParser - Initializing
> parser...
> 14 Dec 2017 09:42:33 INFO SentenceDetector - Starting processing.
> 14 Dec 2017 09:42:34 INFO TokenizerAnnotatorPTB - process(JCas) in
> org.apache.ctakes.core.ae.TokenizerAnnotatorPTB
> 14 Dec 2017 09:42:36 INFO LvgAnnotator - process(JCas)
> 14 Dec 2017 09:42:55 INFO ContextDependentTokenizerAnnotator -
> process(JCas)
> 14 Dec 2017 09:42:58 INFO POSTagger - process(JCas)
> 14 Dec 2017 09:43:10 INFO Chunker - process(JCas)
> 14 Dec 2017 09:43:46 INFO ChunkAdjuster - process(JCas)
> 14 Dec 2017 09:43:47 INFO ChunkAdjuster - process(JCas)
> 14 Dec 2017 09:43:48 INFO AbstractJCasTermAnnotator - Starting
> processing
> 14 Dec 2017 09:43:54 INFO AbstractJCasTermAnnotator - Finished
> processing
> 14 Dec 2017 09:43:54 INFO DrugMentionAnnotator - process(JCas)
> 14 Dec 2017 09:45:32 INFO DrugMentionAnnotator -
> 14 Dec 2017 09:45:32 INFO DrugMentionAnnotator -
> 14 Dec 2017 09:45:32 INFO DrugMentionAnnotator -
> 14 Dec 2017 09:45:32 INFO DrugMentionAnnotator -
> 14 Dec 2017 09:45:33 INFO DrugMentionAnnotator -
> 14 Dec 2017 09:45:33 INFO DrugMentionAnnotator -
> 14 Dec 2017 09:45:33 INFO DrugMentionAnnotator -
> 14 Dec 2017 09:45:34 INFO DrugMentionAnnotator -
> 14 Dec 2017 09:45:38 INFO DrugMentionAnnotator -
> 14 Dec 2017 09:45:39 INFO DrugMentionAnnotator -
> 14 Dec 2017 09:45:42 INFO DrugMentionAnnotator -
> 14 Dec 2017 09:45:43 INFO DrugMentionAnnotator -
> 14 Dec 2017 09:45:45 INFO DrugMentionAnnotator -
> 14 Dec 2017 09:45:48 INFO DrugMentionAnnotator -
> 14 Dec 2017 09:45:48 INFO DrugMentionAnnotator -
> 14 Dec 2017 09:45:50 INFO DrugMentionAnnotator -
> 14 Dec 2017 09:45:50 INFO DrugMentionAnnotator -
> 14 Dec 2017 09:45:53 INFO DrugMentionAnnotator -
> 14 Dec 2017 09:45:54 INFO DrugMentionAnnotator -
> 14 Dec 2017 09:45:59 INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:00 INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:04 INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:04 INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:05 INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:06 INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:08 INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:09 INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:09 INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:11 INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:16 INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:24 INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:27 INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:30 INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:32 INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:35 INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:38 INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:45 INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:46 INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:46 INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:53 INFO DrugMentionAnnotator -
> 14 Dec 2017 09:46:54 INFO DrugMentionAnnotator -
> 14 Dec 2017 09:47:02 INFO DrugMentionAnnotator -
> 14 Dec 2017 09:47:22 INFO DrugMentionAnnotator -
> 14 Dec 2017 09:47:24 INFO DrugMentionAnnotator -
> 14 Dec 2017 09:47:28 INFO DrugMentionAnnotator -
> 14 Dec 2017 09:47:29 INFO DrugMentionAnnotator -
> 14 Dec 2017 09:47:34 INFO DrugMentionAnnotator -
> 14 Dec 2017 09:47:38 INFO DrugMentionAnnotator -
> 14 Dec 2017 09:47:46 INFO DrugMentionAnnotator -
> 14 Dec 2017 09:47:49 INFO DrugMentionAnnotator -
> 14 Dec 2017 09:47:54 INFO DrugMentionAnnotator -
> 14 Dec 2017 09:47:54 INFO DrugMentionAnnotator -
> 14 Dec 2017 09:47:58 INFO DrugMentionAnnotator -
> 14 Dec 2017 09:48:45 INFO MaxentParserWrapper - Started processing:
> idd_secondTrial.txt
> 14 Dec 2017 10:20:19 INFO MaxentParserWrapper - Done parsing:
> idd_secondTrial.txt
>
> Regards,
> Harish.
>
>
> -----Original Message-----
> From: Miller, Timothy [mailto:[email protected]]
> Sent: Thursday, December 14, 2017 9:16 AM
> To: [email protected]
> Subject: Re: Slowness in processing files [EXTERNAL]
>
> Do not try to use AggregatePlainTextProcessor, it is just slow.
> Use the fast version and debug the password issues.
> Make sure you have your UMLS credentials set in:
> $CTAKES_ROOT/resources/org/apache/ctakes/dictionary/lookup/fast/sno_rx_16ab.xml
>
> in two different places.
>
> Tim
>
>
>
> On Thu, 2017-12-14 at 02:36 +0000, Yadav, Harish wrote:
> >
> > Hi James,
> >
> > Thanks for responding.
> >
> > A single 2 MB file is taking ~5 hours to process with
> > AggregatePlainTextProcessor. This is how the process is launched,
> > including the JVM memory arguments:
> >
> > C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0>java
> > -Dctakes.umlsuser="XXXXXXX"  -Dctakes.umlspw="XXXXXXXX" -cp
> > "C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-
> > 4.0.0\desc\;C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-
> > 4.0.0\resources\;C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-
> > 4.0.0\lib\*" -Dlog4j.configuration=file:\C:\New_Drive\apache-ctakes-
> > 4.0.0-bin\apache-ctakes-4.0.0\config\log4j.xml -Xms512M -Xmx3g
> > org.apache.uima.tools.cpm.CpmFrame
> >
> > Also, just now I tried to process the file with the AE
> > AggregatePlaintextFastUMLSProcessor, but ran into a different
> > problem: an authentication error, even though I used the same
> > username and password as with AggregatePlainTextProcessor.
> >
> > I can run AggregatePlaintextFastUMLSProcessor with -Xms and -Xmx
> > increased to 5g. Could you please let me know how
> > AggregatePlainTextProcessor can run fine with the above username
> > and password while AggregatePlaintextFastUMLSProcessor throws the
> > exception below with the same username and password?
> >
> > Exception:
> >
> > C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0>java
> > -Dctakes.umlsuser="XXXXXXX"  -Dctakes.umlspw="XXXXXX" -cp
> > "C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-
> > 4.0.0\desc\;C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-
> > 4.0.0\resources\;C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-
> > 4.0.0\lib\*" -Dlog4j.configuration=file:\C:\New_Drive\apache-ctakes-
> > 4.0.0-bin\apache-ctakes-4.0.0\config\log4j.xml -Xms512M -Xmx3g
> > org.apache.uima.tools.cpm.CpmFrame
> > Dec 13, 2017 9:01:20 PM java.util.prefs.WindowsPreferences <init>
> > WARNING: Could not open/create prefs root node
> > Software\JavaSoft\Prefs at root 0x80000002. Windows
> > RegCreateKeyEx(...) returned error code 5.
> > log4j: attributes....
> > 13 Dec 2017 21:04:58 INFO Chunker - Chunker model file:
> > org/apache/ctakes/chunker/models/chunker-model.zip
> > 13 Dec 2017 21:05:00 INFO TokenizerAnnotatorPTB - Initializing
> > org.apache.ctakes.core.ae.TokenizerAnnotatorPTB
> > 13 Dec 2017 21:05:00 INFO ContextDependentTokenizerAnnotator -
> > Finite state machines loaded.
> > 13 Dec 2017 21:05:00 INFO AbstractJCasTermAnnotator - Using
> > dictionary lookup window type:
> > org.apache.ctakes.typesystem.type.textspan.Sentence
> > 13 Dec 2017 21:05:00 INFO AbstractJCasTermAnnotator - Exclusion
> > tagset loaded: CC CD DT EX IN LS MD PDT POS PP PP$ PRP PRP$ RP TO VB
> > VBD VBG VBN VBP VBZ WDT WP WPS WRB
> > 13 Dec 2017 21:05:00 INFO AbstractJCasTermAnnotator - Using minimum
> > term text span: 3
> > 13 Dec 2017 21:05:00 INFO AbstractJCasTermAnnotator - Using
> > Dictionary Descriptor:
> > org/apache/ctakes/dictionary/lookup/fast/sno_rx_16ab.xml
> > 13 Dec 2017 21:05:00 INFO DictionaryDescriptorParser - Parsing
> > dictionary specifications:
> > 13 Dec 2017 21:05:00 INFO UmlsUserApprover - Checking UMLS Account
> > at https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser for user
> > harish1234-ß:
> > ....13 Dec 2017 21:05:02 ERROR UmlsUserApprover - UMLS Account at
> > https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser is not valid for
> > user XXXXXXX-ß with XXXXXXX
> > org.apache.uima.resource.ResourceInitializationException:
> > Initialization of CAS Processor with name
> > "AggregatePlaintextFastUMLSProcessor" failed.
> >   at org.apache.uima.collection.impl.CollectionProcessingEngine_impl.initialize(CollectionProcessingEngine_impl.java:81)
> >   at org.apache.uima.impl.UIMAFramework_impl._produceCollectionProcessingEngine(UIMAFramework_impl.java:420)
> >   at org.apache.uima.UIMAFramework.produceCollectionProcessingEngine(UIMAFramework.java:918)
> >   at org.apache.uima.tools.cpm.CpmPanel.startProcessing(CpmPanel.java:573)
> >   at org.apache.uima.tools.cpm.CpmPanel.access$000(CpmPanel.java:105)
> >   at org.apache.uima.tools.cpm.CpmPanel$1.run(CpmPanel.java:713)
> > Caused by: org.apache.uima.resource.ResourceConfigurationException:
> > Initialization of CAS Processor with name
> > "AggregatePlaintextFastUMLSProcessor" failed.
> >   at org.apache.uima.collection.impl.cpm.container.CPEFactory.produceIntegratedCasProcessor(CPEFactory.java:1101)
> >   at org.apache.uima.collection.impl.cpm.container.CPEFactory.getCasProcessors(CPEFactory.java:547)
> >   at org.apache.uima.collection.impl.cpm.BaseCPMImpl.init(BaseCPMImpl.java:253)
> >   at org.apache.uima.collection.impl.cpm.BaseCPMImpl.<init>(BaseCPMImpl.java:127)
> >   at org.apache.uima.collection.impl.CollectionProcessingEngine_impl.initialize(CollectionProcessingEngine_impl.java:73)
> >   ... 5 more
> > Caused by: org.apache.uima.resource.ResourceInitializationException:
> > Initialization of annotator class
> > "org.apache.ctakes.dictionary.lookup2.ae.DefaultJCasTermAnnotator"
> > failed. (Descriptor:
> > file:/C:/New_Drive/apache-ctakes-4.0.0-bin/apache-ctakes-4.0.0/desc/ctakes-dictionary-lookup-fast/desc/analysis_engine/UmlsLookupAnnotator.xml)
> >   at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initializeAnalysisComponent(PrimitiveAnalysisEngine_impl.java:271)
> >   at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initialize(PrimitiveAnalysisEngine_impl.java:170)
> >   at org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94)
> >   at org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62)
> >   at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:279)
> >   at org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:407)
> >   at org.apache.uima.analysis_engine.asb.impl.ASB_impl.setup(ASB_impl.java:256)
> >   at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initASB(AggregateAnalysisEngine_impl.java:429)
> >   at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initializeAggregateAnalysisEngine(AggregateAnalysisEngine_impl.java:373)
> >   at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initialize(AggregateAnalysisEngine_impl.java:186)
> >   at org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94)
> >   at org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62)
> >   at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:279)
> >   at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:331)
> >   at org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:448)
> >   at org.apache.uima.collection.impl.cpm.container.CPEFactory.produceIntegratedCasProcessor(CPEFactory.java:1085)
> >   ... 9 more
> > Caused by: org.apache.uima.resource.ResourceInitializationException:
> > MESSAGE LOCALIZATION FAILED: Can't find resource for bundle
> > java.util.PropertyResourceBundle, key Could not construct
> > org.apache.ctakes.dictionary.lookup2.dictionary.UmlsJdbcRareWordDictionary
> >   at org.apache.ctakes.dictionary.lookup2.ae.AbstractJCasTermAnnotator.initialize(AbstractJCasTermAnnotator.java:131)
> >   at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initializeAnalysisComponent(PrimitiveAnalysisEngine_impl.java:266)
> >   ... 24 more
> > Caused by:
> > org.apache.uima.analysis_engine.annotator.AnnotatorContextException:
> > MESSAGE LOCALIZATION FAILED: Can't find resource for bundle
> > java.util.PropertyResourceBundle, key Could not construct
> > org.apache.ctakes.dictionary.lookup2.dictionary.UmlsJdbcRareWordDictionary
> >   at org.apache.ctakes.dictionary.lookup2.dictionary.DictionaryDescriptorParser.parseDictionary(DictionaryDescriptorParser.java:199)
> >   at org.apache.ctakes.dictionary.lookup2.dictionary.DictionaryDescriptorParser.parseDictionaries(DictionaryDescriptorParser.java:156)
> >   at org.apache.ctakes.dictionary.lookup2.dictionary.DictionaryDescriptorParser.parseDescriptor(DictionaryDescriptorParser.java:128)
> >   at org.apache.ctakes.dictionary.lookup2.ae.AbstractJCasTermAnnotator.initialize(AbstractJCasTermAnnotator.java:129)
> >   ... 25 more
> > Caused by: java.lang.reflect.InvocationTargetException
> >   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> >   at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
> >   at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
> >   at java.lang.reflect.Constructor.newInstance(Unknown Source)
> >   at org.apache.ctakes.dictionary.lookup2.dictionary.DictionaryDescriptorParser.parseDictionary(DictionaryDescriptorParser.java:196)
> >   ... 28 more
> > Caused by: java.sql.SQLException: Invalid User for UMLS dictionary
> > sno_rx_16abTerms
> >   at org.apache.ctakes.dictionary.lookup2.dictionary.UmlsJdbcRareWordDictionary.<init>(UmlsJdbcRareWordDictionary.java:29)
> >   ... 33 more
> >
> >
> >
> > From: James Masanz [mailto:[email protected]]
> > Sent: Wednesday, December 13, 2017 8:56 PM
> > To: [email protected]
> > Subject: Re: Slowness in processing files
> >
> > Using AggregatePlaintextFastUMLSProcessor is much faster than
> > AggregatePlainTextProcessor, so I suggest that to start with you
> > just use AggregatePlaintextFastUMLSProcessor.
> >
> > Do you mean it is taking ~5 hours for a single file to be processed
> > at times, or is that for a set of files?
> >
> > If your JVM heap space is not set large enough, you can get very
> > slow results.
> > Try increasing it to 5G (or more) using the JVM parameter -Xmx5G.
> > For faster startup, you can also set -Xms to the same value as
> > -Xmx, or something close to it.
> >
> > -- James
> >
> > On Wed, Dec 13, 2017 at 7:04 PM, Yadav, Harish <[email protected]>
> > wrote:
> > Hi All,
> >
> > When medical records are run with the AE
> > AggregatePlaintextFastUMLSProcessor or AggregatePlainTextProcessor,
> > processing is very slow. It is pretty fast when smaller files
> > (~2 KB) are fed as input, but with bigger files, say 2 MB, it is
> > very slow and the files are taking ~5 hours to process. Any
> > pointers would be of great help.
> >
> > Regards,
> > Harish.
> >