RE: Slowness in processing files [EXTERNAL]

Yadav, Harish Thu, 14 Dec 2017 08:16:30 -0800

Hi Timothy,

I fixed the password issues and ran the AggregatePlainTextProcessor AE with 
-Xms6g -Xmx6g, but it still takes a long time (more than 2 hours) for a 
single 2 MB file. I checked the memory consumption of the process, and it 
never goes above 4.5 GB, so I am not sure that memory is the issue. 
The AggregatePlainTextProcessor AE processes a 2 KB file in ~11 seconds, 
but most of our files are in the MB range, so a processing time of more than 
2 hours per file is not feasible.
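
As a rough sanity check on these numbers, the small-file time can be extrapolated to the large file. This is only a back-of-envelope sketch: it assumes the sizes are exactly 2,000 and 2,000,000 bytes, that the 11 seconds is pure processing time, and that time grows linearly with file size, none of which has been measured.

```python
# Back-of-envelope extrapolation. The byte counts and the linear-scaling
# assumption are illustrative, not measured values.
SMALL_BYTES = 2_000        # the ~2 KB file
SMALL_SECONDS = 11         # observed time for the small file
LARGE_BYTES = 2_000_000    # the ~2 MB file


def linear_estimate_seconds(small_bytes, small_seconds, large_bytes):
    """Scale the small-file time by the ratio of the file sizes."""
    return small_seconds * large_bytes / small_bytes


estimate = linear_estimate_seconds(SMALL_BYTES, SMALL_SECONDS, LARGE_BYTES)
print(f"{estimate:.0f} s = {estimate / 3600:.1f} h")  # prints "11000 s = 3.1 h"
```

Under those assumptions, a run of two to three hours for the 2 MB file is about what linear scaling from the 11-second figure would predict, so the per-byte cost of the pipeline may matter more than any sudden blow-up on large inputs.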


Could you please suggest anything that might improve the performance? Below are 
the logs from processing the 2 MB file with AggregatePlainTextProcessor:



Logs:

C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0>java -cp 
"C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\desc\;C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\resources\;C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\lib\*"
 
-Dlog4j.configuration=file:\C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\config\log4j.xml
 -Xms6g -Xmx6g org.apache.uima.tools.cpm.CpmFrame
Dec 14, 2017 9:40:25 AM java.util.prefs.WindowsPreferences <init>
WARNING: Could not open/create prefs root node Software\JavaSoft\Prefs at root 
0x80000002. Windows RegCreateKeyEx(...) returned error code 5.
log4j: reset attribute= "false".
log4j: Threshold ="null".
log4j: Retreiving an instance of org.apache.log4j.Logger.
log4j: Setting [ProgressAppender] additivity to [false].
log4j: Level value for ProgressAppender is  [INFO].
log4j: ProgressAppender level set to INFO
log4j: Class name: [org.apache.log4j.ConsoleAppender]
log4j: Parsing layout of class: "org.apache.log4j.PatternLayout"
log4j: Setting property [conversionPattern] to [%m].
log4j: Adding appender named [noEolAppender] to category [ProgressAppender].
log4j: Retreiving an instance of org.apache.log4j.Logger.
log4j: Setting [ProgressDone] additivity to [false].
log4j: Level value for ProgressDone is  [INFO].
log4j: ProgressDone level set to INFO
log4j: Class name: [org.apache.log4j.ConsoleAppender]
log4j: Parsing layout of class: "org.apache.log4j.PatternLayout"
log4j: Setting property [conversionPattern] to [%m%n].
log4j: Adding appender named [eolAppender] to category [ProgressDone].
log4j: Level value for root is  [INFO].
log4j: root level set to INFO
log4j: Class name: [org.apache.log4j.ConsoleAppender]
log4j: Parsing layout of class: "org.apache.log4j.PatternLayout"
log4j: Setting property [conversionPattern] to [%d{dd MMM yyyy HH:mm:ss} %5p 
%c{1} - %m%n].
log4j: Adding appender named [consoleAppender] to category [root].
14 Dec 2017 09:42:09  INFO Chunker - Chunker model file: 
org/apache/ctakes/chunker/models/chunker-model.zip
14 Dec 2017 09:42:10  INFO TokenizerAnnotatorPTB - Initializing 
org.apache.ctakes.core.ae.TokenizerAnnotatorPTB
14 Dec 2017 09:42:10  INFO ContextDependentTokenizerAnnotator - Finite state 
machines loaded.
14 Dec 2017 09:42:10  INFO AbstractJCasTermAnnotator - Using dictionary lookup 
window type: org.apache.ctakes.typesystem.type.textspan.Sentence
14 Dec 2017 09:42:10  INFO AbstractJCasTermAnnotator - Exclusion tagset loaded: 
CC CD DT EX IN LS MD PDT POS PP PP$ PRP PRP$ RP TO VB VBD VBG VBN VBP VBZ WDT 
WP WPS WRB
14 Dec 2017 09:42:10  INFO AbstractJCasTermAnnotator - Using minimum term text 
span: 3
14 Dec 2017 09:42:10  INFO AbstractJCasTermAnnotator - Using Dictionary 
Descriptor: org/apache/ctakes/dictionary/lookup/fast/sno_rx_16ab.xml
14 Dec 2017 09:42:10  INFO DictionaryDescriptorParser - Parsing dictionary 
specifications:
14 Dec 2017 09:42:10  INFO UmlsUserApprover - Checking UMLS Account at 
https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser for user harish1234:
.14 Dec 2017 09:42:11  INFO UmlsUserApprover -   UMLS Account at 
https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser for user harish1234 has been 
validated

14 Dec 2017 09:42:11  INFO JdbcConnectionFactory - Connecting to 
jdbc:hsqldb:file:resources/org/apache/ctakes/dictionary/lookup/fast/sno_rx_16ab/sno_rx_16ab:
14 Dec 2017 09:42:11  INFO ENGINE - open start - state not modified
..................
14 Dec 2017 09:42:17  INFO JdbcConnectionFactory -  Database connected
14 Dec 2017 09:42:17  INFO JdbcRareWordDictionary - Connected to cui and term 
table CUI_TERMS
14 Dec 2017 09:42:17  INFO JdbcConceptFactory - Connected to concept table TUI 
with class TUI
14 Dec 2017 09:42:17  INFO JdbcConceptFactory - Connected to concept table 
RXNORM with class LONG
14 Dec 2017 09:42:17  INFO JdbcConceptFactory - Connected to concept table 
PREFTERM with class PREFTERM
14 Dec 2017 09:42:17  INFO JdbcConceptFactory - Connected to concept table 
SNOMEDCT_US with class LONG
14 Dec 2017 09:42:17  INFO ContextAnnotator - Using left , right scope sizes: 
10 , 10
14 Dec 2017 09:42:17  INFO ContextAnnotator - Using scope order: LEFT,RIGHT
14 Dec 2017 09:42:17  INFO ContextAnnotator - SCOPE ORDER: [1, 3]
14 Dec 2017 09:42:17  INFO ContextAnnotator - Using context analyzer: 
org.apache.ctakes.necontexts.status.StatusContextAnalyzer
14 Dec 2017 09:42:17  INFO StatusContextAnalyzer - initBoundaryData() called 
for ContextInitializer
14 Dec 2017 09:42:17  INFO ContextAnnotator - Using context consumer: 
org.apache.ctakes.necontexts.status.StatusContextHitConsumer
14 Dec 2017 09:42:17  INFO ContextAnnotator - Using lookup window type: 
org.apache.ctakes.typesystem.type.textspan.Sentence
14 Dec 2017 09:42:17  INFO ContextAnnotator - Using focus type: 
org.apache.ctakes.typesystem.type.textsem.IdentifiedAnnotation
14 Dec 2017 09:42:17  INFO ContextAnnotator - Using context type: 
org.apache.ctakes.typesystem.type.syntax.BaseToken
14 Dec 2017 09:42:17  INFO ContextAnnotator - Using left , right scope sizes: 7 
, 7
14 Dec 2017 09:42:17  INFO ContextAnnotator - Using scope order: LEFT,RIGHT
14 Dec 2017 09:42:17  INFO ContextAnnotator - SCOPE ORDER: [1, 3]
14 Dec 2017 09:42:17  INFO ContextAnnotator - Using context analyzer: 
org.apache.ctakes.necontexts.negation.NegationContextAnalyzer
14 Dec 2017 09:42:17  INFO NegationContextAnalyzer - initBoundaryData() called 
for ContextInitializer
14 Dec 2017 09:42:17  INFO ContextAnnotator - Using context consumer: 
org.apache.ctakes.necontexts.negation.NegationContextHitConsumer
14 Dec 2017 09:42:17  INFO ContextAnnotator - Using lookup window type: 
org.apache.ctakes.typesystem.type.textspan.Sentence
14 Dec 2017 09:42:17  INFO ContextAnnotator - Using focus type: 
org.apache.ctakes.typesystem.type.textsem.IdentifiedAnnotation
14 Dec 2017 09:42:17  INFO ContextAnnotator - Using context type: 
org.apache.ctakes.typesystem.type.syntax.BaseToken
14 Dec 2017 09:42:17  INFO SentenceDetector - Sentence detector model file: 
org/apache/ctakes/core/sentdetect/sd-med-model.zip
14 Dec 2017 09:42:17  INFO POSTagger - POS tagger model file: 
org/apache/ctakes/postagger/models/mayo-pos.zip
14 Dec 2017 09:42:18  INFO LvgCmdApiResourceImpl - Loading NLM Norm and Lvg 
with config file = 
C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\resources\org\apache\ctakes\lvg\data\config\lvg.properties
14 Dec 2017 09:42:18  INFO LvgCmdApiResourceImpl -   config file absolute path 
= 
C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\resources\org\apache\ctakes\lvg\data\config\lvg.properties
14 Dec 2017 09:42:18  INFO LvgCmdApiResourceImpl - cwd = 
C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0
14 Dec 2017 09:42:18  INFO LvgCmdApiResourceImpl - cd 
C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\resources\org\apache\ctakes\lvg\
14 Dec 2017 09:42:18  INFO ENGINE - open start - state not modified
14 Dec 2017 09:42:18  INFO ENGINE - dataFileCache open start
14 Dec 2017 09:42:18  INFO ENGINE - dataFileCache open end
14 Dec 2017 09:42:18  INFO LvgCmdApiResourceImpl - cd 
C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0
14 Dec 2017 09:42:18  INFO DrugMentionAnnotator - Finite state machines loaded.
14 Dec 2017 09:42:23  INFO ClearNLPDependencyParserAE - using Morphy analysis? 
true
Loading configuration.
Loading feature templates.
Loading lexica.
Loading model:
........................................................................................
Loading configuration.
Loading feature templates.
Loading model:
.
Loading configuration.
Loading feature templates.
Loading lexica.
Loading model:
...
<various Loading model>
.
Loading configuration.
Loading feature templates.
Loading lexica.
Loading model:
................................
Loading model:
.............................
14 Dec 2017 09:42:32  INFO ConstituencyParser - Initializing parser...
14 Dec 2017 09:42:33  INFO SentenceDetector - Starting processing.
14 Dec 2017 09:42:34  INFO TokenizerAnnotatorPTB - process(JCas) in 
org.apache.ctakes.core.ae.TokenizerAnnotatorPTB
14 Dec 2017 09:42:36  INFO LvgAnnotator - process(JCas)
14 Dec 2017 09:42:55  INFO ContextDependentTokenizerAnnotator - process(JCas)
14 Dec 2017 09:42:58  INFO POSTagger - process(JCas)
14 Dec 2017 09:43:10  INFO Chunker -  process(JCas)
14 Dec 2017 09:43:46  INFO ChunkAdjuster -  process(JCas)
14 Dec 2017 09:43:47  INFO ChunkAdjuster -  process(JCas)
14 Dec 2017 09:43:48  INFO AbstractJCasTermAnnotator - Starting processing
14 Dec 2017 09:43:54  INFO AbstractJCasTermAnnotator - Finished processing
14 Dec 2017 09:43:54  INFO DrugMentionAnnotator - process(JCas)
14 Dec 2017 09:45:32  INFO DrugMentionAnnotator -
14 Dec 2017 09:45:32  INFO DrugMentionAnnotator -
14 Dec 2017 09:45:32  INFO DrugMentionAnnotator -
14 Dec 2017 09:45:32  INFO DrugMentionAnnotator -
14 Dec 2017 09:45:33  INFO DrugMentionAnnotator -
14 Dec 2017 09:45:33  INFO DrugMentionAnnotator -
14 Dec 2017 09:45:33  INFO DrugMentionAnnotator -
14 Dec 2017 09:45:34  INFO DrugMentionAnnotator -
14 Dec 2017 09:45:38  INFO DrugMentionAnnotator -
14 Dec 2017 09:45:39  INFO DrugMentionAnnotator -
14 Dec 2017 09:45:42  INFO DrugMentionAnnotator -
14 Dec 2017 09:45:43  INFO DrugMentionAnnotator -
14 Dec 2017 09:45:45  INFO DrugMentionAnnotator -
14 Dec 2017 09:45:48  INFO DrugMentionAnnotator -
14 Dec 2017 09:45:48  INFO DrugMentionAnnotator -
14 Dec 2017 09:45:50  INFO DrugMentionAnnotator -
14 Dec 2017 09:45:50  INFO DrugMentionAnnotator -
14 Dec 2017 09:45:53  INFO DrugMentionAnnotator -
14 Dec 2017 09:45:54  INFO DrugMentionAnnotator -
14 Dec 2017 09:45:59  INFO DrugMentionAnnotator -
14 Dec 2017 09:46:00  INFO DrugMentionAnnotator -
14 Dec 2017 09:46:04  INFO DrugMentionAnnotator -
14 Dec 2017 09:46:04  INFO DrugMentionAnnotator -
14 Dec 2017 09:46:05  INFO DrugMentionAnnotator -
14 Dec 2017 09:46:06  INFO DrugMentionAnnotator -
14 Dec 2017 09:46:08  INFO DrugMentionAnnotator -
14 Dec 2017 09:46:09  INFO DrugMentionAnnotator -
14 Dec 2017 09:46:09  INFO DrugMentionAnnotator -
14 Dec 2017 09:46:11  INFO DrugMentionAnnotator -
14 Dec 2017 09:46:16  INFO DrugMentionAnnotator -
14 Dec 2017 09:46:24  INFO DrugMentionAnnotator -
14 Dec 2017 09:46:27  INFO DrugMentionAnnotator -
14 Dec 2017 09:46:30  INFO DrugMentionAnnotator -
14 Dec 2017 09:46:32  INFO DrugMentionAnnotator -
14 Dec 2017 09:46:35  INFO DrugMentionAnnotator -
14 Dec 2017 09:46:38  INFO DrugMentionAnnotator -
14 Dec 2017 09:46:45  INFO DrugMentionAnnotator -
14 Dec 2017 09:46:46  INFO DrugMentionAnnotator -
14 Dec 2017 09:46:46  INFO DrugMentionAnnotator -
14 Dec 2017 09:46:53  INFO DrugMentionAnnotator -
14 Dec 2017 09:46:54  INFO DrugMentionAnnotator -
14 Dec 2017 09:47:02  INFO DrugMentionAnnotator -
14 Dec 2017 09:47:22  INFO DrugMentionAnnotator -
14 Dec 2017 09:47:24  INFO DrugMentionAnnotator -
14 Dec 2017 09:47:28  INFO DrugMentionAnnotator -
14 Dec 2017 09:47:29  INFO DrugMentionAnnotator -
14 Dec 2017 09:47:34  INFO DrugMentionAnnotator -
14 Dec 2017 09:47:38  INFO DrugMentionAnnotator -
14 Dec 2017 09:47:46  INFO DrugMentionAnnotator -
14 Dec 2017 09:47:49  INFO DrugMentionAnnotator -
14 Dec 2017 09:47:54  INFO DrugMentionAnnotator -
14 Dec 2017 09:47:54  INFO DrugMentionAnnotator -
14 Dec 2017 09:47:58  INFO DrugMentionAnnotator -
14 Dec 2017 09:48:45  INFO MaxentParserWrapper - Started processing: 
idd_secondTrial.txt
14 Dec 2017 10:20:19  INFO MaxentParserWrapper - Done parsing: 
idd_secondTrial.txt
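
To see where the time goes in a log like the one above, the gaps between consecutive timestamped lines can be ranked. The following is a minimal sketch, not part of cTAKES: the function name `slowest_gaps` is hypothetical, the pattern assumes the "dd MMM yyyy HH:mm:ss LEVEL Component -" layout shown in this log, and the sample is a handful of lines excerpted from it.

```python
import re
from datetime import datetime

# Matches lines such as "14 Dec 2017 09:48:45  INFO MaxentParserWrapper - ..."
LINE = re.compile(r"(\d{2} \w{3} \d{4} \d{2}:\d{2}:\d{2})\s+\w+\s+(\S+)\s+-")


def slowest_gaps(log_text, top=3):
    """Return the largest gaps as (seconds, component, start time), charging
    each gap to the component that logged the earlier of the two lines."""
    events = []
    for line in log_text.splitlines():
        match = LINE.search(line)
        if match:  # wrapped continuation lines carry no timestamp and are skipped
            stamp = datetime.strptime(match.group(1), "%d %b %Y %H:%M:%S")
            events.append((stamp, match.group(2)))
    gaps = [
        ((later - earlier).total_seconds(), name, earlier)
        for (earlier, name), (later, _) in zip(events, events[1:])
    ]
    return sorted(gaps, reverse=True)[:top]


# A few lines excerpted (not contiguous) from the log above.
sample = """\
14 Dec 2017 09:43:54  INFO DrugMentionAnnotator - process(JCas)
14 Dec 2017 09:45:32  INFO DrugMentionAnnotator -
14 Dec 2017 09:47:58  INFO DrugMentionAnnotator -
14 Dec 2017 09:48:45  INFO MaxentParserWrapper - Started processing:
14 Dec 2017 10:20:19  INFO MaxentParserWrapper - Done parsing:
"""

for seconds, name, start in slowest_gaps(sample):
    print(f"{seconds:6.0f} s  {name}  (from {start:%H:%M:%S})")
```

In the log above, the largest single gap is the roughly 31.5 minutes between the MaxentParserWrapper "Started processing" line (09:48:45) and its "Done parsing" line (10:20:19).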







Regards,
Harish.


-----Original Message-----
From: Miller, Timothy [mailto:[email protected]] 
Sent: Thursday, December 14, 2017 9:16 AM
To: [email protected]
Subject: Re: Slowness in processing files [EXTERNAL]

Do not try to use AggregatePlainTextProcessor; it is just slow.
Use the fast version and debug the password issues.
Make sure you have your UMLS credentials set in:
$CTAKES_ROOT/resources/org/apache/ctakes/dictionary/lookup/fast/sno_rx_16ab.xml

in two different places.
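
While debugging the password issues, one thing that may be worth ruling out is a stray invisible character in the credentials; the failed-login log line quoted below shows the user name followed by odd trailing characters. That this is the cause is only an assumption, not something confirmed in this thread. A minimal sketch of such a check (the helper name `suspicious_chars` is hypothetical):

```python
def suspicious_chars(value):
    """Return (index, code point) for every character that is whitespace,
    non-printable, or outside the printable ASCII range."""
    return [
        (index, f"U+{ord(ch):04X}")
        for index, ch in enumerate(value)
        if ch.isspace() or not ch.isprintable() or ord(ch) > 126
    ]


# A user name with a trailing non-breaking space (U+00A0), which looks like
# an ordinary space or nothing at all in most editors and consoles.
print(suspicious_chars("harish1234\u00a0"))  # prints [(10, 'U+00A0')]
print(suspicious_chars("harish1234"))        # prints []
```

The same check can be applied to the user name and password values, whether they come from the command line or from the dictionary descriptor.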

Tim



On Thu, 2017-12-14 at 02:36 +0000, Yadav, Harish wrote:
> Hi James,
>  
> Thanks for responding.
>  
> A single 2 MB file is taking ~5 hours to process with 
> AggregatePlainTextProcessor. This is the command line, including the 
> JVM memory arguments:
>  
> C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0>java
> -Dctakes.umlsuser="XXXXXXX"  -Dctakes.umlspw="XXXXXXXX" -cp 
> "C:\New_Drive\apache-ctakes-4.0.0-bi
> apache-ctakes-4.0.0\desc\;C:\New_Drive\apache-ctakes-4.0.0-
> bin\apache-ctakes-4.0.0\resources\;C:\New_Drive\apache-ctakes-4.0.0-
> bin\apache-ctakes-4.0.0\lib\*" -Dlog4j.
> nfiguration=file:\C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-
> 4.0.0\config\log4j.xml -Xms512M -Xmx3g 
> org.apache.uima.tools.cpm.CpmFrame
>  
> Also, I just tried to process the file with the AE 
> AggregatePlaintextFastUMLSProcessor but ran into a different problem: 
> an authentication failure, with the same username and password that 
> are used with AggregatePlainTextProcessor.
>  
> I can run AggregatePlaintextFastUMLSProcessor with -Xms5g and 
> -Xmx5g. Could you please let me know how it is possible that 
> AggregatePlainTextProcessor runs fine with the above username and 
> password, while AggregatePlaintextFastUMLSProcessor throws the 
> exception below with the same username and password?
>  
> Exception:
>  
> C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0>java
> -Dctakes.umlsuser="XXXXXXX"  -Dctakes.umlspw="XXXXXX" -cp
> "C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\desc\;C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\resources\;C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\lib\*"
> -Dlog4j.configuration=file:\C:\New_Drive\apache-ctakes-4.0.0-bin\apache-ctakes-4.0.0\config\log4j.xml
> -Xms512M -Xmx3g org.apache.uima.tools.cpm.CpmFrame
> Dec 13, 2017 9:01:20 PM java.util.prefs.WindowsPreferences <init>
> WARNING: Could not open/create prefs root node Software\JavaSoft\Prefs at root
> 0x80000002. Windows RegCreateKeyEx(...) returned error code 5.
> log4j: attributes....
> 13 Dec 2017 21:04:58  INFO Chunker - Chunker model file:
> org/apache/ctakes/chunker/models/chunker-model.zip
> 13 Dec 2017 21:05:00  INFO TokenizerAnnotatorPTB - Initializing
> org.apache.ctakes.core.ae.TokenizerAnnotatorPTB
> 13 Dec 2017 21:05:00  INFO ContextDependentTokenizerAnnotator - Finite state
> machines loaded.
> 13 Dec 2017 21:05:00  INFO AbstractJCasTermAnnotator - Using dictionary lookup
> window type: org.apache.ctakes.typesystem.type.textspan.Sentence
> 13 Dec 2017 21:05:00  INFO AbstractJCasTermAnnotator - Exclusion tagset loaded:
> CC CD DT EX IN LS MD PDT POS PP PP$ PRP PRP$ RP TO VB VBD VBG VBN VBP VBZ WDT
> WP WPS WRB
> 13 Dec 2017 21:05:00  INFO AbstractJCasTermAnnotator - Using minimum term text
> span: 3
> 13 Dec 2017 21:05:00  INFO AbstractJCasTermAnnotator - Using Dictionary
> Descriptor: org/apache/ctakes/dictionary/lookup/fast/sno_rx_16ab.xml
> 13 Dec 2017 21:05:00  INFO DictionaryDescriptorParser - Parsing dictionary
> specifications:
> 13 Dec 2017 21:05:00  INFO UmlsUserApprover - Checking UMLS Account at
> https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser for user harish1234-ß:
> ....13 Dec 2017 21:05:02 ERROR UmlsUserApprover -   UMLS Account at
> https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser is not valid for user
> XXXXXXX-ß with XXXXXXX
> org.apache.uima.resource.ResourceInitializationException: Initialization of CAS Processor with name "AggregatePlaintextFastUMLSProcessor" failed.
>         at org.apache.uima.collection.impl.CollectionProcessingEngine_impl.initialize(CollectionProcessingEngine_impl.java:81)
>         at org.apache.uima.impl.UIMAFramework_impl._produceCollectionProcessingEngine(UIMAFramework_impl.java:420)
>         at org.apache.uima.UIMAFramework.produceCollectionProcessingEngine(UIMAFramework.java:918)
>         at org.apache.uima.tools.cpm.CpmPanel.startProcessing(CpmPanel.java:573)
>         at org.apache.uima.tools.cpm.CpmPanel.access$000(CpmPanel.java:105)
>         at org.apache.uima.tools.cpm.CpmPanel$1.run(CpmPanel.java:713)
> Caused by: org.apache.uima.resource.ResourceConfigurationException: Initialization of CAS Processor with name "AggregatePlaintextFastUMLSProcessor" failed.
>         at org.apache.uima.collection.impl.cpm.container.CPEFactory.produceIntegratedCasProcessor(CPEFactory.java:1101)
>         at org.apache.uima.collection.impl.cpm.container.CPEFactory.getCasProcessors(CPEFactory.java:547)
>         at org.apache.uima.collection.impl.cpm.BaseCPMImpl.init(BaseCPMImpl.java:253)
>         at org.apache.uima.collection.impl.cpm.BaseCPMImpl.<init>(BaseCPMImpl.java:127)
>         at org.apache.uima.collection.impl.CollectionProcessingEngine_impl.initialize(CollectionProcessingEngine_impl.java:73)
>         ... 5 more
> Caused by: org.apache.uima.resource.ResourceInitializationException: Initialization of annotator class "org.apache.ctakes.dictionary.lookup2.ae.DefaultJCasTermAnnotator" failed.  (Descriptor: file:/C:/New_Drive/apache-ctakes-4.0.0-bin/apache-ctakes-4.0.0/desc/ctakes-dictionary-lookup-fast/desc/analysis_engine/UmlsLookupAnnotator.xml)
>         at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initializeAnalysisComponent(PrimitiveAnalysisEngine_impl.java:271)
>         at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initialize(PrimitiveAnalysisEngine_impl.java:170)
>         at org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94)
>         at org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62)
>         at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:279)
>         at org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:407)
>         at org.apache.uima.analysis_engine.asb.impl.ASB_impl.setup(ASB_impl.java:256)
>         at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initASB(AggregateAnalysisEngine_impl.java:429)
>         at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initializeAggregateAnalysisEngine(AggregateAnalysisEngine_impl.java:373)
>         at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initialize(AggregateAnalysisEngine_impl.java:186)
>         at org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94)
>         at org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62)
>         at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:279)
>         at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:331)
>         at org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:448)
>         at org.apache.uima.collection.impl.cpm.container.CPEFactory.produceIntegratedCasProcessor(CPEFactory.java:1085)
>         ... 9 more
> Caused by: org.apache.uima.resource.ResourceInitializationException: MESSAGE LOCALIZATION FAILED: Can't find resource for bundle java.util.PropertyResourceBundle, key Could not construct org.apache.ctakes.dictionary.lookup2.dictionary.UmlsJdbcRareWordDictionary
>         at org.apache.ctakes.dictionary.lookup2.ae.AbstractJCasTermAnnotator.initialize(AbstractJCasTermAnnotator.java:131)
>         at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initializeAnalysisComponent(PrimitiveAnalysisEngine_impl.java:266)
>         ... 24 more
> Caused by: org.apache.uima.analysis_engine.annotator.AnnotatorContextException: MESSAGE LOCALIZATION FAILED: Can't find resource for bundle java.util.PropertyResourceBundle, key Could not construct org.apache.ctakes.dictionary.lookup2.dictionary.UmlsJdbcRareWordDictionary
>         at org.apache.ctakes.dictionary.lookup2.dictionary.DictionaryDescriptorParser.parseDictionary(DictionaryDescriptorParser.java:199)
>         at org.apache.ctakes.dictionary.lookup2.dictionary.DictionaryDescriptorParser.parseDictionaries(DictionaryDescriptorParser.java:156)
>         at org.apache.ctakes.dictionary.lookup2.dictionary.DictionaryDescriptorParser.parseDescriptor(DictionaryDescriptorParser.java:128)
>         at org.apache.ctakes.dictionary.lookup2.ae.AbstractJCasTermAnnotator.initialize(AbstractJCasTermAnnotator.java:129)
>         ... 25 more
> Caused by: java.lang.reflect.InvocationTargetException
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
>         at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
>         at java.lang.reflect.Constructor.newInstance(Unknown Source)
>         at org.apache.ctakes.dictionary.lookup2.dictionary.DictionaryDescriptorParser.parseDictionary(DictionaryDescriptorParser.java:196)
>         ... 28 more
> Caused by: java.sql.SQLException: Invalid User for UMLS dictionary sno_rx_16abTerms
>         at org.apache.ctakes.dictionary.lookup2.dictionary.UmlsJdbcRareWordDictionary.<init>(UmlsJdbcRareWordDictionary.java:29)
>         ... 33 more
>  
>  
>  
> From: James Masanz [mailto:[email protected]]
> Sent: Wednesday, December 13, 2017 8:56 PM
> To: [email protected]
> Subject: Re: Slowness in processing files
>  
> Using AggregatePlaintextFastUMLSProcessor  is much faster than 
> AggregatePlainTextProcessor, so I suggest that to start with you just 
> use AggregatePlaintextFastUMLSProcessor.
>  
> Do you mean it is taking ~5 hours for a single file to be processed at 
> times, or is that for a set of files?
>  
> If your JVM heap space is not set large enough, you can get very slow 
> results.
> Try increasing it to 5G (or more) using the JVM parameter -Xmx5G. For 
> faster startup, you can also set -Xms to the same value as -Xmx, or 
> to something close to it.
>  
>  -- James
>  
> On Wed, Dec 13, 2017 at 7:04 PM, Yadav, Harish <[email protected]>
> wrote:
> Hi All,
>  
> When medical records are run with AggregatePlaintextFastUMLSProcessor 
> or AggregatePlainTextProcessor as the AE, processing is very slow. It 
> is pretty fast when smaller files (~2 KB) are fed as input, but 
> bigger files, say 2 MB, take ~5 hours to process. Any pointers would 
> be of great help.
>  
> Regards,
> Harish.
>
